The present disclosure relates to a method and system for prognosis of ovarian cancer, to a system and method for identifying candidate genes for use in a prognostic method, and to prognostic kits.
Ovarian cancers are very heterogeneous diseases which lack robust diagnostic, prognostic and predictive clinical biomarkers. Conventional clinical biomarkers (stages, grades, tumor mass etc) and molecular biomarkers (CA125, KRAS, p53 etc) are not appropriate for early diagnosis, differential diagnosis, prognosis and prediction of the disease outcome for individual patients. The most common type of human ovarian cancers is human epithelial ovarian cancer (EOC). This cancer is characterized by having one of the lowest survival rates among cancers.
For the past 30 years, epithelial ovarian cancer (EOC) mortality rate has remained high and unchanged, despite considerable efforts directed toward this disease (Siegel et al, 2012). This is because EOC patients are usually diagnosed at late stage with a 5-year survival rate of only 30% (Cho et al, 2009; Karst et al, 2011; Kim et al, 2012). This high-grade epithelial ovarian cancer (HG-EOC) is normally treated as a single entity, regardless of histological or molecular subtypes. However, HG-EOC frequently exhibits very high tumor heterogeneity, genome instability and altered gene expression (Levanon et al, 2008; Shih et al, 2011), which makes the proper subtype identification and signature discovery of HG-EOC essential tasks for facilitating the development of more effective therapeutic regimens.
Previous studies of OC signature discovery have focused on the differences in the gene expression profiles in OC samples or cell lines relative to normal ovarian tissue samples (Nam et al, 2008; Dahiya et al, 2008; Zhang et al, 2008; Wang et al, 2012). Given that some cell lines might not represent the actual patho-biological complexity and clonal evolution of the tumors, results from cell line based studies could not be easily interpreted in the context of a paradigm shift of OC etiology and molecular classification (Vaughan et al, 2011). Recent studies suggest that the majority of HG-EOC originates from the fimbriae of the fallopian tubes, or from metastasis of carcinoma of the breast, colon or other tissues (Tuma, 2010). Therefore, two HG-EOC tissue samples with similar histological subtype could display distinct biological and clinical heterogeneity in the cellular context (Cho et al, 2009; Shih et al, 2011; TCGA, 2011; Wang et al, 2005; Helfand et al, 2011; Calin et al, 2006; Chan et al, 2012), which implies a more complex HG-SOC pathobiology and complicates the search for signatures that characterize this disease.
MicroRNAs (miRNAs) are small regulatory RNA molecules processed from hairpin-shaped nucleotide precursors (pre-miRNAs) that can be incorporated into RNA-induced silencing complexes (RISC), and regulate mRNA translation and/or transcription (Lagos-Quintana et al, 2001). Most miRNAs play critical roles in vital cellular processes, as they are highly conserved across species. Human miRNAs can regulate both oncogenes and tumor suppressors, and modulate diverse cellular processes, such as development, metabolism, cell division, differentiation, and apoptosis (Calin et al, 2006; Chan et al, 2012; Valastyan et al, 2011). The oncogenic or tumor suppressive properties of specific miRNAs are complex and often ambiguous. For example, miR-138, which was identified previously as a tumor suppressor in multiple carcinomas, can function as a pro-survival oncomiR in malignant gliomas. Moreover, work has shown that overexpression of miR-138 in gliomas plays a vital role in tumor-initiating cells with self-renewal potential and is clinically significant as a prospective prognostic biomarker and chemotherapeutic target (Chan et al, 2012). Therefore, the function of a miRNA is often cell type- and context-dependent.
There remains a need to determine biomarkers for prognosis of EOC and to find improved methods for the prognosis of EOC.
The present invention proposes, in general terms, methods, systems and kits for providing a prognosis of overall survival or prediction of therapeutic outcome (for example, chemotherapeutic outcome) for a patient suffering from epithelial ovarian cancer, in which expression of let-7b and/or miRNAs with which it is associated and/or genes with which it is associated are used to provide the prognosis and/or prediction of the therapeutic outcome. In another aspect the invention proposes methods and systems for identifying miRNA and/or gene signatures for use in a prognosis and/or prediction of the therapeutic outcome.
Embodiments relate to an analytical method to identify biologically meaningful and survival-significant microRNA biomarkers and their pro-oncogenic functions and their direct and indirect gene interactors. The method may involve integrating transcriptomic and clinical information with biological knowledge to assist in selection of the most clinically relevant biomarkers.
In certain embodiments, integrative genomics and survival analysis are used to identify associations of tumor transcriptome variations and clinical heterogeneity of HG-EOC. One-dimensional Data-driven grouping (DDg) survival prediction (Motakis et al, 2009) and clustering analyses may be used to assess the prognostic ability of individual let-7 members and their gene network interactors. In certain embodiments, EOC patients may be stratified based on analysis of transcriptional co-expression patterns, biological pathways and networks of miRNAs, integrated with clinical information via consequent application of the DDg and a statistically-weighted voting grouping (SWVg) method (Kuznetsov et al, 1996; Kuznetsov et al, 2006), adapted here to multivariate survival prediction analyses assessing stratification performance of a patient cohort using the measure(s) that minimized intercomparable p-values of two or more Kaplan-Meier (K-M) curves. Following the DDg and SWVg analysis, biological pathway and network enrichment analyses, and categorical agreement analysis (Agresti, 2007) between clinical markers and the stratified sub-groups from the SWVg analysis, may be used to select the most patho-biologically reasonable and clinically significant biomarker(s) for prognoses or predictions of therapeutic outcome.
In certain embodiments, a method of prognosis and therapeutic outcome prediction of high-grade epithelial ovarian cancer (HG-EOC) based on the measurements of microRNA let-7b and/or a set of 21 let-7b associated miRNAs and/or a set of 36 let-7b associated mRNAs in a patient tumor sample is also provided. Embodiments may relate to both the methods of identification of gene or microRNA signatures, and the resulting signatures themselves.
Embodiments relate to prognostic methods and computational methods which employ let-7b and/or let-7 associated non-coding and protein-coding entities for the purpose of ovarian cancer patient stratification and disease survivability prognosis. The method may involve stratification of high-grade epithelial ovarian carcinoma patients with respect to their disease prognosis. Advantageously, the method may be carried out as an unsupervised patient stratification method, using a survival model (Cox proportional hazards model) which includes expression profile data for selection of the most statistically significant expressed genes, leading to identification of new complex biomarkers which form a statistically weighted combination of genes related to let-7b miRNA expression. Not only does the method select survival significant features, it also provides statistically-based optimal stratification of the patients regarding the risk of death or (chemo)therapeutic resistance.
The 36-protein-coding-gene and 21-non-coding-miRNA prognostic signatures of embodiments of the invention are based on the expression patterns, in patient samples, of protein-coding genes and non-coding miRNAs correlated with the let-7b expression pattern in the samples.
Particular examples are directed to:
(i) HG-EOC prognostic ability of let-7b and the 36 mRNAs encoded by protein-coding genes associated with expression pattern of let-7b;
(ii) HG-EOC prognostic ability of let-7b and the 21 coding/non-coding genes associated with expression pattern of let-7b and its associations;
(iii) let-7b as an individual or collective (i.e., together with other biomarkers including members of the 21-miRNA prognostic signature or 36-mRNA prognostic signature) biomarker of HG-EOC;
(iv) methods of patient stratification.
(A) Multiple sequence alignment of mature miRNA sequences of let-7 family.
(B) Heat-map of expressions of let-7 family members based on k-means clustering for TCGA dataset (top) and GSE27290 dataset (below). Greyness represents the expression values of the let-7 family members. Dark grey and light grey represent up-regulated and down-regulated miRNAs respectively.
(C) Kaplan-Meier (K-M) survival curves of three subgroups of patients (low risk 110 and 140, intermediate risk 120 and 150, high risk 130 and 160) based on SWVg analysis in TCGA (top) and GSE27290 (below) datasets, based on overall survival (OS). Stratification performance is assessed by a minimization of intercomparable p-values of K-M curves in an overall survival analysis. The log-rank P-values of the three curves are listed.
(D) K-M survival curves of two subgroups of patients with different prognosis (and risks) of death, separated by DDg analysis of the expression profiles of a possible tumor suppressor, let-7a (top), and a possible oncogene, let-7b (below), in the TCGA dataset, based on OS. The log-rank P-values of two curves are listed. In the top panel, curve 170 represents the subgroup having high expression of let-7a, and curve 175 represents the subgroup having low expression of let-7a. In the lower panel, curve 180 represents the subgroup having low expression of let-7b, and curve 185 the subgroup having high expression of let-7b.
(A-B) Heatmaps of correlation values between let-7 members and 141 miRNAs for (A) TCGA and (B) Shih's dataset.
(C-D) Heatmaps of correlation values between let-7 members and 21 significant miRNAs for (C) TCGA and (D) Shih's dataset.
(E-F) Kaplan-Meier survival curves for dataset (E) TCGA and (F) GSE27290, generated via 1DDg and SWVg. In panels E and F, curves for low-risk (L), intermediate-risk (I) and high-risk (H) subgroups are shown.
Greyness in the heatmaps represents the correlation values of miRNA-mRNA probe pairs. Dark grey and light grey represent positive and negative correlation, respectively.
(A) Frequency distribution plots of Kendall-tau correlation coefficients across all 364 samples for each member of let-7 family, compared to the let-7 family and the entire background consisting of 2,571,080 miRNA-mRNA pairs (136 miRNAs vs 18905 mRNAs). The vertical dotted lines located at Tau=−0.122 and +0.122 specify the statistically significant FDR cut-off of 0.01.
(B) Flow-chart of extracting significant probesets for GO and pathway analysis. A Benjamini-Hochberg corrected p-value (FDR or q-value) of 0.01 was imposed and 2,971 mRNA probes that were significantly correlated with let-7b in both positive and negative direction were extracted. GO analysis was performed for both the positively correlated genes and negatively correlated genes of let-7b (DAVID Bioinformatics). Venn diagram of significant GO terms (q-value <0.05) revealed that gene functions associated with positively correlated genes and negatively correlated genes are distinct.
(C) Pathway enrichment analyses on both sets of probes were performed using Metacore™ from GeneGo Inc. A total of 162 genes (corresponding to 238 probes) were extracted from significant pathways (q-value <0.001) for further survival prediction analysis and signature selection.
(D) Survival significance of each of the 162 genes was assessed using the one-dimensional data-driven grouping (DDg) method. The top-ranked survival-significant genes were further assessed via statistically weighted voting grouping (SWVg) to generate a survival gene signature. The 36-mRNA prognostic signature, with involvement in DNA damage repair, cell cycle, cell adhesion, regulation of epithelial-to-mesenchymal transition and immune response, can provide strong stratification of the patients according to Kaplan-Meier survival curves for overall survival (OS) derived by SWVg via minimization of p-values in inter-comparison of Kaplan-Meier survival curves (p-value=1.27E-19). Survival curves for low-risk (L), intermediate-risk (I) and high-risk (H) subgroups stratified using the 36-mRNA signature are shown.
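By way of illustration only, the correlation filtering summarized in panels (A) and (B) above can be sketched as follows. The sketch computes a Kendall tau coefficient for one miRNA-mRNA pair, converts it to a two-sided p-value using the standard normal approximation for data without ties, and applies the Benjamini-Hochberg correction to a list of p-values. The function names (`kendall_tau_a`, `tau_p_value`, `benjamini_hochberg`) are hypothetical, and the use of the tau-a variant and of the normal approximation are assumptions; the specification does not state which variant or test was used.

```python
from itertools import combinations
import math


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs; tied pairs add 0."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            score += 1
        elif prod < 0:
            score -= 1
    return score / (n * (n - 1) / 2)


def tau_p_value(tau, n):
    """Two-sided p-value from the normal approximation (assumes no ties)."""
    variance = 2 * (2 * n + 5) / (9 * n * (n - 1))
    z = tau / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Toy usage: keep pairs whose q-value is below the 0.01 cut-off named above.
toy_p = [0.0001, 0.2, 0.004, 0.03]
significant = [i for i, q in enumerate(benjamini_hochberg(toy_p)) if q < 0.01]
```

In a full analysis the p-values would come from every miRNA-mRNA pair, so the correction is applied once over the whole background rather than per miRNA.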
(A)-(C) Independent evaluation of the 36-mRNA prognostic signature. The three subgroups from independent datasets were predicted using the prediction model generated by our method from The Cancer Genome Atlas (TCGA) dataset (with same gene design and weight). The survival curves in Figure A, B and C were obtained from 230 tumor samples in GSE9899, 130 samples from GSE26712, and 157 samples from GSE13876, respectively. One of 36 genes (TUBB) is absent in dataset GSE13876. So, the 35 genes were utilized to generate the SWVg stratification model. L=low-risk, I=intermediate-risk, H=high-risk.
(D) Boxplots of log 2-expression levels for representative survival prognostic signature (SPS) genes that are survival significant as selected by our voting algorithm and that are also differentially expressed between the distinct prognostic (and risk) groups, as defined by the SPS.
(E) A model of let-7b-mediated transcriptional regulation in HG-SOC prognoses chemotherapy response and overall patient survival.
Bibliographic references mentioned in the present specification are for convenience listed in the form of a list of references and added at the end of the examples. The whole content of such bibliographic references is herein incorporated by reference.
The present inventors have found from computational analyses of EOC datasets that let-7b is an important member of the let-7 family exhibiting pro-oncogene characteristics and directly involved in progression of HG-EOC. Based on this, embodiments of the invention (i) identify 21 non-coding microRNAs which are significantly correlated with let-7b, (ii) identify a subset of let-7b associated genes significantly enriched for biological pathways which are critical for cancer progression and prognosis of patient survival, (iii) identify a let-7b associated 36 protein-coding gene prognostic signature from (ii) that can stratify HG-EOC patients into three survival significant clinical subgroups (low-, intermediate- and high-disease prognostic risk subgroups), significantly differentiated by the minimization of intercomparable p-values of K-M curves in the overall survival (OS) analysis, the corresponding tumors of which are considered to be distinct by virtue of the statistical significance of enrichment of the genes involved in specific biological pathways, and which differ in sensitivity to primary therapy. Embodiments also make use of the results of (i-iii) and propose the use of let-7b and/or the let-7b associated 21-miRNA prognostic signature and/or let-7b associated 36-mRNA prognostic signature in a kit or prognostic assay for prediction of overall survival time and treatment outcome of individual HG-EOC patients in a clinical setting.
The present inventors have found that genes of the 36-mRNA prognostic signature are involved in pathways of immune response, cell-adhesion, DNA damage repair, cell cycle, and regulation of epithelial-to-mesenchymal transition which could constitute, independently or in various combinations, small-dimension survival prediction signatures of HG-EOC.
Currently, patients diagnosed with stage III-IV HG-EOC have a poor prognosis, with only 20-30% surviving after 5 years. However, embodiments of the present invention can further stratify these patients into one of three disease prognostic risk subgroups, of which the low-risk subgroup has a relatively good 5-year survival rate of 65-72%. On the other hand, the intermediate- and high-risk subgroups have 5-year survival rates of 20-35% and 0-10% respectively. Furthermore, the high-risk subgroup is significantly correlated with the mesenchymal molecular subtype, which often exhibits stem-cell-like properties and chemo-resistance and does not respond favorably to treatment, which contributes to a very poor survival rate. The high-risk subgroup is also significantly associated with large tumor residual size or poor patient response after primary therapy. In contrast, the low-risk subgroup is significantly correlated with the proliferative subtype, in which the fast-dividing cancer cells could be sensitive to chemotherapy. Embodiments use the biologically and clinically relevant 36-mRNA prognostic signature as a high-confidence prognostic tool to significantly stratify HG-EOC patients into three survival-significant, molecularly different and clinically distinct subclasses, which can improve patient risk assessment, management and counseling, as well as provide a solution for the optimization of personalized medicine strategy of treating human ovarian cancers in a clinical setting. Embodiments relate to a method of prognosis and outcome prediction of high-grade epithelial ovarian cancer (HG-EOC) based on the measurements of microRNA let-7b, the 21 let-7b associated miRNAs and the 36 let-7b associated mRNAs in the patient tumor samples.
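For illustration, a 5-year survival rate of the kind quoted above is conventionally read from a Kaplan-Meier estimate of the survival function for a subgroup. The following is a minimal sketch of the standard product-limit estimator; the function names (`kaplan_meier`, `survival_at`) and the toy follow-up data are hypothetical and are not taken from the datasets analyzed here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns [(time, S(time))] at each event time.

    times  : follow-up time per patient (e.g. in years)
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function lookup)."""
    estimate = 1.0
    for time, surv in curve:
        if time > t:
            break
        estimate = surv
    return estimate


# Toy subgroup of six patients; follow-up in years (hypothetical values).
toy_curve = kaplan_meier([1, 2, 3, 4, 6, 7], [1, 1, 0, 1, 1, 0])
five_year_rate = survival_at(toy_curve, 5)
```

Censored patients leave the risk set without lowering the curve, which is why the estimate differs from the raw fraction of patients alive at 5 years.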
Embodiments relate to the methods of identification and use of the resulting gene or microRNA signatures.
Embodiments may include one or more of the following features:
i) the identification of let-7b as an important master regulator and pro-oncogenic miRNA of the let-7 family in HG-EOC. This is based on a modification of data-driven grouping (DDg) analysis method predicting patient survival based on let-7b expression level in tumor cells and correlation analyses of let-7 family members' gene expression with expression levels of direct and indirect gene targets defined in the HG-EOC patient transcriptomes using microarray signals. DDg is a computational method, which classifies the patients into low and high-risk subgroups through the optimization of statistical difference between the two (or three) Kaplan-Meier survival curves generated by the optimal expression cut-off value of each gene. The cutoff value for a gene is generated based on expression data of that gene across a plurality of patient samples.
ii) the use of expression correlation analysis to identify microRNAs which are significantly associated with let-7b. In a particular example, the expression correlation analysis generates a 21-miRNA signature.
iii) the use of expression correlation and pathway enrichment analyses to identify a representative subset of let-7b-associated mRNA genes that are both significantly correlated with let-7b across all HG-EOC patients and are involved in the most statistically significantly enriched biological pathways which are critical for progression and metastasis of cancer.
iv) the use of DDg and a statistically-weighted voting grouping (SWVg) method to identify from (iii), a subset of biologically meaningful and survival significant genes that can provide clinically distinct and statistically significant stratification of HG-EOC patients into low-, intermediate- and high-risk subgroups, defined by the SWVg method, adapted to survival prediction analysis. The SWVg is a computational disease outcome prediction method that performs a goodness-of-fit analysis to separate a cohort of patients into two or more subgroups belonging to distinct K-M curves. The K-M curves are constructed in a survival analysis using the multivariate Cox proportional hazards model. The SWVg is used to obtain a consensus grouping decision from the grouping information (e.g. groups based on individual survival significant genes) generated from the DDg method. The performance of the patient cohort split is assessed by the SWVg by minimizing the intercomparable p-values of the K-M curves in the multivariate overall survival data analysis. The log-rank p-values are used in the assessment. SWVg can be applied to data generated from different kinds of assays including but not limited to microarrays, PCR-based and sequencing-based detection systems (e.g. TaqMan, RNA-seq).
In a particular example, the combination of DDg and SWVg generates a 36-mRNA signature which provides the separation of a given patient group into three statistically different overall survival subgroups.
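The consensus step described in (iv) can be pictured with a simplified weighted-voting sketch. This is one plausible reading and not the implementation of Kuznetsov et al: each gene casts a high-risk or low-risk vote for each patient (its DDg grouping), votes are weighted by the negative log10 of that gene's DDg p-value, and the weighted average is cut into three risk classes. The weighting scheme, the fixed cut points of 1/3 and 2/3, and the function names are all assumptions made for illustration; in the method described above the split is instead chosen by minimizing the log-rank p-values of the resulting K-M curves.

```python
import math


def swv_scores(votes, p_values):
    """Weighted average vote per patient, in the range 0..1.

    votes[g][i] : 1 if gene g places patient i in its high-risk group, else 0
    p_values[g] : survival p-value of gene g (smaller means a heavier vote)
    """
    weights = [-math.log10(p) for p in p_values]
    total = sum(weights)
    n_patients = len(votes[0])
    return [
        sum(w * gene_votes[i] for w, gene_votes in zip(weights, votes)) / total
        for i in range(n_patients)
    ]


def assign_risk_groups(scores, low_cut=1 / 3, high_cut=2 / 3):
    """Map each score to a risk label using two (assumed, fixed) cut points."""
    labels = []
    for score in scores:
        if score < low_cut:
            labels.append("low")
        elif score < high_cut:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels


# Toy example: three genes voting on three patients (hypothetical values).
toy_votes = [[0, 1, 1], [0, 0, 1], [0, 0, 1]]
toy_p = [1e-4, 1e-2, 1e-2]
toy_labels = assign_risk_groups(swv_scores(toy_votes, toy_p))
```

A patient flagged as high-risk only by the most significant gene lands in the intermediate class here, which shows how the weighting lets strong single genes influence, but not dictate, the consensus.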
Embodiments of the method may involve the analysis of gene and/or miRNA expression in tumour tissue samples, which can be obtained by biopsy. Expression analysis may also be performed using peritoneal sample tests, smear tests and blood tests. Samples used in expression analysis can be obtained from body fluids, for example blood, lymph, ascites, pleural fluid, peritoneal fluid, pericardial fluid, sputum, saliva, and urine.
Embodiments of the present invention provide the following advantages:
i) provide the stratification of large cohorts of HG-EOC patients into three distinct molecular subgroups with differential overall survival based on the expression values of the let-7b and the genes of the 36-mRNA signature.
ii) facilitate the study of each molecular subgroups defined in (i), with respect to their molecular features and tumor etiology of HG-EOC. In particular, regulation of EMT appears to be a practically important mechanism, and allows identification of biomarkers which can assist in discriminating into low-, intermediate- and high-risk subgroups.
iii) be used as a prognostic and primary (chemo)therapy outcome predictive tool in the clinics for patients diagnosed with HG-EOC based on the expression values of let-7b, let-7b associated 21-miRNA non-coding genes and let-7b associated 36-mRNA protein coding genes.
Embodiments may relate to one or more of the following:
1. A method of identifying biologically meaningful (significantly enriched with specific biological categories) and survival-significant gene signatures via integrating the sub-transcriptome of the genes correlated with the expression pattern of a given microRNA, and clinical information about patient survival with biological knowledge derived by application of pathway and/or network enrichment analysis, Data-Driven Grouping (DDg) analysis followed by Statistically-weighted voting grouping (SWVg).
2. A method of identifying therapeutic gene targets via integrating the sub-transcriptome of the genes correlated with expression pattern of a given microRNA and clinical information about patient survival with biological knowledge derived by application of pathway/network enrichment analysis and Data-Driven Grouping (DDg) analysis followed by Statistically-weighted voting grouping (SWVg).
3. A method to predict therapy outcome and classify cancer patients into low-, intermediate- and high-risk subgroups by measuring the expression levels of microRNA let-7b, a 21-miRNA prognosis signature and/or a 36-mRNA prognosis signature. Prediction of therapeutic outcome includes predicting whether a patient is likely to respond to therapeutics such as chemotherapeutic agents.
4. A 36-mRNA signature for prognosis of EOC as follows—DNMT1, CFD, CD93, MMP13, ARPC1B, CD44, PIK3R1, GNG12, CCL2, PLAUR, LAMA4, COL3A1, VCL, CAV2, FZD1, CALD1, EDNRA, TGFBR2, PDGFRA, FGFR1, HGF, POLR2D, POLR2J, CDK4, CHEK1, CCT2, CDC6, TUBB, NCAPD2, NCAPG2, POLA2, MCM2, TCP1, NCAPH, CBX3, and MIS12. In exemplary embodiments, a low-risk subgroup defined by the 36-mRNA prognosis signature has a 5-year overall survival rate of 65-72%, an intermediate-risk subgroup has a 5-year overall survival rate of 20-35%, and a high-risk subgroup has a 5-year overall survival rate of 0-10%.
5. A 21-miRNA survival signature for EOC prognosis as follows—miR-107, miR-103, miR-106b, miR-18a, miR-17-5p, miR-20b, miR-183, miR-25, miR-324-5p, miR-517c, miR-200a, miR-429, miR-200b, miR-96, miR-362, miR-127, miR-214, miR-136, miR-22, miR-320 and miR-486. In exemplary embodiments, a low-risk subgroup defined by the 21-miRNA prognosis signature has a 5-year overall survival rate of 53%, an intermediate-risk subgroup has a 5-year overall survival rate of 22%, and a high-risk subgroup has a 5-year overall survival rate of 8%.
6. A method of treating cancer in a subject by modulating the expression of protein-coding and/or non-coding genes that are positively correlated or negatively correlated with let-7b.
Results of analyses performed by the present inventors suggest that genes that are positively correlated or negatively correlated with let-7b in epithelial ovarian cancer could be involved in anti-apoptotic and apoptotic processes respectively. Furthermore, classification of the patients into the three distinct risk subgroups, followed by differential expression analysis revealed that genes up-regulated in the high-risk subgroup with respect to the low-risk subgroup are significantly enriched in negative regulation of apoptosis (FDR=0.0070) and anti-apoptosis (FDR=0.0072).
The 36-mRNA prognosis signature stratifies patients into three subgroups with different overall survival and primary therapy outcome. The mRNA signature may offer some suggestions (supported by statistical testing) whether a patient is likely to respond to primary (chemo) therapy.
Advantageously, embodiments of the presently disclosed method can perform prognostic feature selection and stratification on very high-dimensional, noisy and mixed biomarker spaces. The prognostic feature selection method can be broadly used in prognosis of many types of diseases and medical conditions. Via survival data modeling and integration with statistically significant and biologically meaningful prognostic features, this method can be applied for analyzing any complex clinical data sets and used in disease subtype classification, disease prognosis prediction, treatment assignment decision-making, clinical trial design and clinical biomarker discovery.
In an exemplary embodiment, a DDg-SWVg-based analysis was used to identify a subset of 36 mRNAs associated with let-7b that could stratify HG-EOC patients into three distinct disease prognosis risk subgroups where the low-risk subgroup has a 5-year overall survival rate of 65-72%. The p-values discriminating survival subgroups are 1.27E-19 (TCGA as training dataset) and 2.54E-17 (AOCS dataset, GEO accession number GSE27290, as test dataset). The 36-mRNA prognosis signature is represented by 7 genes (FZD1, CALD1, EDNRA, TGFBR2, PDGFRA, FGFR1, and HGF) involved in regulation of epithelial-to-mesenchymal transition, which suggests that the signature reflects specific molecular mechanisms related to ovarian cancer progression and to HG-EOC patient survival. The 36-mRNA signature is represented by 6 genes (PDGFRA, CDK4, CCL2, DNMT1, LAMA4 and GNG12) which were found in the published literature to be related to ovarian cancer, and 30 genes not previously associated with ovarian cancer. The 36-mRNA signature, as a composite biomarker, is able to stratify patients with HG-EOC into survival significant subgroups based on their risk of death or (chemo)therapeutic resistance. Accordingly, embodiments of the present invention provide for classification of patients already diagnosed with the disease into more discriminative survival subgroupings/stratification as compared to previously known methods. The signature can be implemented as a test/kit for survival prognosis of the HG-EOC patients.
In another exemplary embodiment, a DDg-SWVg-based analysis was used to identify 21 microRNAs which are significantly correlated with let-7b. Among the 21 microRNAs, 14 of them (miR-107, miR-103, miR-106b, miR-18a, miR-17-5p, miR-20b, miR-183, miR-25, miR-324-5p, miR-517c, miR-200a, miR-429, miR-200b, miR-96) are negatively correlated with let-7b and let-7c, while 7 of them (miR-362, miR-127, miR-214, miR-136, miR-22, miR-320, miR-486) are positively correlated. Overexpression of the 7 miRNA subset positively correlated with expression of let-7b provides relatively poor prognosis for HG-EOC, while overexpression of the 14 miRNA subset provides relatively good prognosis for the disease. Six miRNAs (miR-324-5p, miR-320, miR-136, miR-214, miR-17, and miR-18a) are survival significant (DDg p-value 0.01). Combining the 6 miRNAs into a survival signature could provide strong classification of patients according to their survival profile (p-value=6.26E-11). Furthermore, a signature comprising all 21 miRNAs that are correlated with let-7b could provide further improvement in patient stratification (p-value=1.03E-12). The 21 miRNAs can significantly stratify patients diagnosed with HG-EOC into low-, intermediate- and high-risk subgroups, where the 5-year survival rate is 53%, 22% and 8% respectively (p-value=1E-12). This result suggests that a signature comprising the 21 miRNAs or a signature comprising a subset of the 21 miRNAs could also be used as potential biomarkers of HG-EOC patient stratification.
Advantageously, generation of biologically meaningful gene signatures can be performed in an automated and unsupervised fashion.
In certain embodiments, methods of identifying candidate genes make use of a data-driven grouping (DDg) method which stratifies a patient cohort into two partitions, as described in Motakis et al (2009), US Patent Publication 20110320390 and US Patent Publication 20120004135, the entire contents of each of which are hereby incorporated by reference. In other embodiments, a generalization of the two-partition DDg method is possible, in which the DDg method can be used to partition a patient cohort into three (or possibly more than three) partitions wherever appropriate or meaningful. Briefly, DDg is a computational statistical-based method of identification of survival significant genes. This method is based on fitting a semi-parametric Cox proportional hazard regression model, which is used to fit patients' disease free survival times (t) and events (e) to a gene's expression data (y). The model estimates the optimal partition (cut-off) of a gene's expression level by maximizing the separation of the survival curves related to the high- and low-risk of the disease behavior (for two partitions) or low, intermediate and high-risk of the disease behavior (for three partitions). The method can identify single genes that exhibit a statistically significant influence on patients' survival and can divide patients into two or three distinct subgroups. In the presently described DDg analysis, an individual gene is ranked based on its ability to significantly classify patients into two or three subgroups. As a further optional step, the SWVg procedure uses the ranked list of genes from the DDg analysis to obtain a consensus grouping decision from the respective groups generated by two or more genes. The SWVg method selects statistically significant genes which were derived from a plurality of DDg models, each of which represents a way of partitioning a set of patients based on the optimal cut-off values of gene expression. 
Those genes are identified based on which one of the models has a high prognostic significance.
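The cut-off search at the heart of the two-partition DDg step can be sketched as follows. Note the substitution: the DDg method described above fits a Cox proportional hazards model, whereas this sketch scores each candidate cut-off with a two-group log-rank test, which is simpler to write without external libraries. The function names (`logrank_p`, `best_cutoff`), the `min_group` parameter and the toy data are hypothetical.

```python
import math


def logrank_p(times, events, groups):
    """Two-group log-rank test (groups coded 0/1); returns the p-value."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)  # at risk overall
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(
            1 for x, e, g in zip(times, events, groups)
            if x == t and e and g == 1
        )
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 death count.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def best_cutoff(expression, times, events, min_group=2):
    """Scan observed expression values; return (cutoff, p) with smallest p."""
    best = (None, 1.0)
    for cut in sorted(set(expression)):
        groups = [1 if x > cut else 0 for x in expression]
        n_high = sum(groups)
        if n_high < min_group or len(groups) - n_high < min_group:
            continue  # skip splits that leave a group too small
        p = logrank_p(times, events, groups)
        if p < best[1]:
            best = (cut, p)
    return best


# Toy gene: patients with expression above 4 die early (hypothetical data).
toy_expr = [1, 2, 3, 4, 5, 6, 7, 8]
toy_time = [7, 8, 9, 10, 1, 2, 3, 4]
toy_event = [1] * 8
toy_cut, toy_p = best_cutoff(toy_expr, toy_time, toy_event)
```

Because the cut-off is chosen to minimize the p-value, that p-value is optimistically biased; ranking genes by it, as in the DDg step, is reasonable, but any resulting signature should be evaluated on an independent dataset.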
Embodiments of the present invention can be used as a prognostic tool to significantly stratify HG-EOC patients into three survival-significant, molecularly different and clinically distinct subclasses, which can improve patient risk assessment, management and counseling, as well as provide a solution for the optimization of personalized medicine strategy of treating human ovarian cancers in a clinical setting. Currently, patients diagnosed with stage III HG-EOC have a poor prognosis, with only 30% surviving after 5 years. Embodiments of the present invention, via the 36-mRNA (protein-coding) or 21-miRNA (non-protein coding) signature, can further stratify these patients into more discriminative risk subgroups (low-risk, intermediate-risk and high-risk), which is an indication of the heterogeneous nature of this disease. In a clinical setting the present methods may be used by clinicians for patient prognosis, prediction of primary (chemo)therapy efficacy as well as the design of future personalized therapeutic intervention. Let-7b, as well as individual genes, subsets, and all genes of the 36-mRNA and/or 21-miRNA prognostic signatures could be used in prognostic biomarker kits and assays.
Having now generally described the invention, the same will be more readily understood through reference to the following examples which are provided by way of illustration, and are not intended to be limiting of the present invention.
A person skilled in the art will appreciate that the present invention may be practised without undue experimentation according to the method given herein. The methods, techniques and chemicals are as described in the references given or from protocols in standard biotechnology and molecular biology text books.
As will be described in more detail below, individual let-7 members exhibited diverse evolutionary, regulatory and functional characteristics (
Thus, this methodological approach suggests the development of a novel class of combined biomarkers related to the regulatory pathways of the pro-oncogenic agent let-7b. The let-7b-associated 36-mRNA prognostic signature and 21-miRNA prognostic signature are clinically significant in HG-EOC, where patients can be classified into one of the low-, intermediate- or high-risk subgroups, with eventual implications for patient risk prognosis, assessment, management and therapy.
TCGA datasets containing miRNA and mRNA expression profiles and clinical data of SOC samples were obtained through The Cancer Genome Atlas (TCGA) data portal (Cancer Genome Atlas Research Network, 2008). The TCGA miRNA dataset contains 13 batches of 520 samples in total, with 8-47 samples in each batch. Most of the patients (>90%) in this dataset were classified as stage III SOC. The miRNA expression data were generated using the Agilent Human miRNA Microarray Platform 8X15K, based on the Sanger miRBase (release 10.1). Agilent oligo 60-mer probes used in this platform were produced by SurePrint Technology. The mRNA microarray dataset was generated from the same patient reservoir as the miRNA dataset on an Affymetrix U133A platform, which contains 22,277 probe sets. This dataset contained 11 batches of 463 primary solid ovarian cancer tissue samples, with 21-47 samples in each batch.
A second miRNA dataset, generated in the Australian Ovarian Cancer Study (AOCS) by Shih et al. consisted of 62 microRNA samples generated from advanced SOC patients (stage III and IV) (Shih et al, 2011). This dataset was obtained from the Gene Expression Omnibus (GEO) website under accession number GSE27290 (http://www.ncbi.nlm.nih.gov/geo/). The Shih et al miRNA expression dataset was generated using the Agilent Human MicroRNA Microarray Platform 8X15K, V1.0 (beta version of G4470A) based on the Sanger Database, 9.1. The Agilent oligo 60-mer probes used in this platform were also produced by SurePrint Technology.
We evaluated the performance of our signature on three independent mRNA expression datasets obtained from GEO under accession numbers GSE9899 (Tothill et al, 2008), GSE26712 (Bonome et al, 2008), and GSE13876 (Crijns et al, 2009). In the GSE9899 dataset, 246 samples with Malignant Ser/PapSer were selected. Among them, 22 samples were in stage I/II, 222 were in stage III/IV, and 2 were of an unknown stage. Ninety-six samples were in grade 1/2, 148 samples were in grade 3, and 2 were of an unknown grade. GSE26712 and GSE13876 datasets contained 185 late-stage HG-OC samples and 157 advanced-stage SOC samples, respectively.
Currently, grading systems for OC are qualitative and rather subjective, with high intra- and inter-observer variability (Hernandez et al, 1984). As there are borderline differences between low grade (grade 1/2) and high grade (3/4) SOC in the TCGA dataset, we included a small number of samples (<10%) with grade 1 and grade 2 in the TCGA and GSE9899 datasets.
For each dataset, quality assessments were initially performed within each batch to identify poor quality chips. Background correction and normalization were then conducted within each batch. Finally, data from all batches were combined after batch effect adjustment.
For miRNA expression datasets, quality assessments were performed within each batch to identify poor quality chips, utilizing several visualization methods and statistical indicators on four typical signals from the Agilent platform (MeanSignal, ProcessedSignal, TotalProbeSignal, TotalGeneSignal). The statistical indicators were the median of log2 intensity, log intensity ratio M (difference of log intensity), relative log expression (RLE), and correlation among samples. Box plot statistics were utilized to identify outliers for each of the above indicators in each signal. Density plots and MA plots were used to visualize the homogeneity of the data. Samples that failed in more than two indicators for more than two signals were identified as outliers and subsequently removed. The indicators were estimated again for the remaining samples. This procedure was performed iteratively, until no more outliers were present. Background correction and normalization were performed within each batch. We utilized invariant set normalization (ISN), in which a subset of probesets with small rank differences in their intensities in a series of arrays was selected ad hoc to serve as the reference for fitting a normalization curve. The fitted curve, a cubic smoothing spline through the probe intensities of these arrays, was used to calculate the correction for all probesets. The probe-level expression values were summarized by the median across arrays. Alternative normalization methods such as quantile normalization could also be used. Non-parametric ComBat software (http://jlab.byu.edu/ComBat/; Johnson et al., 2007) was utilized to correct for batch effects.
For the mRNA expression datasets, box plot statistics, MA plots and density plots were utilized to perform the outlier identification before pre-processing. In each batch, scale factor, average background, percentage of present call, GAPDH 3′:5′ ratio, GAPDH 3′:M ratio, Beta-actin 3′:5′ ratio, Beta-actin 3′:M ratio, slope of the RNA degradation plot, Normalized unscaled standard error (NUSE) median, NUSE IQR, Relative Log Expression (RLE) median, and RLE IQR were used as quality metrics. A sample was identified as an outlier if it was an outlier with respect to more than two of these metrics. This procedure was performed iteratively, until no more samples could be identified as outliers. Following background correction and normalization, the Model-based expression index (MBEI) method was used to calculate probe set summaries. Other probe set summary methods, such as RMA, MAS5 or PLIER of Affymetrix, are also possible. Analysis Of Variance (ANOVA)-based models (Kerr and Churchill, 2001) were adopted to correct possible batch effects in the microarray data.
Filtration of Unreliable miRNA and mRNA Microarray Probe-Sets
For the miRNA microarrays, the average expression of each of the 723 miRNA probesets was calculated across all arrays. Only 136 miRNA probesets were significantly expressed after setting a minimum untransformed (i.e., on the original scale) expression cut-off value of 25, based on the distribution of average miRNA probe expression.
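The expression filter described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `filter_expressed_probesets`, the dictionary layout and the toy values are assumptions made for the sketch, and the cut-off of 25 is treated as inclusive because the text calls it a minimum.

```python
from statistics import mean

def filter_expressed_probesets(expression, cutoff=25.0):
    """Keep probesets whose mean untransformed expression across arrays is >= cutoff.

    `expression` maps a probeset name to its list of expression values,
    one value per array (hypothetical layout chosen for this sketch).
    """
    return {
        probe: values
        for probe, values in expression.items()
        if mean(values) >= cutoff
    }

# Toy values for illustration; they are not taken from the datasets above.
toy = {
    "probe_a": [10.0, 20.0, 15.0],  # mean 15 -> dropped
    "probe_b": [30.0, 40.0, 50.0],  # mean 40 -> kept
    "probe_c": [25.0, 25.0, 25.0],  # mean 25 -> kept (cut-off is inclusive here)
}
kept = filter_expressed_probesets(toy)
print(sorted(kept))
```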
For the mRNA microarray, the APMA database (Orlov et al, 2007) was used to remove unreliable probe-sets where discrepancies were found in annotation and target sequence mapping. Subsequently, using HGNC database (downloaded on 8 Dec. 2010), existing Affymetrix symbols were converted whenever possible to approved gene symbols, and Affymetrix probesets that did not map to an approved gene symbol were removed and unused in subsequent analysis. A total of 18,905 reliable Affymetrix probe-sets were retained.
The Data-Driven grouping approach (DDg) for the two-group partitioning as described in Motakis et al. (2009) was applied to each dataset. In a generalization of DDg method, described in further detail below, a three-group partitioning of a patient cohort can be performed. DDg methods, whether they provide two-group or three-group partitioning, are based on fitting a semi-parametric Cox proportional-hazard regression model. The model was used to fit patients' overall survival (OS) times and events to gene expression data. The model estimates the optimal partition (cut-off) for the expression level of a gene by maximizing the separation of the survival curves related to the high- and low-risks of the disease behavior (for two subgroups partitioning), or low, intermediate and high-risks of the disease behavior (for three subgroups partitioning). The DDg method identifies single genes that exhibit a statistically significant influence on patients' survival or therapeutic outcome, and can divide patients into two or three distinct subgroups.
In this example, the 1D DDg method for feature selection is used. Let the M×N matrix

$$X = (x_{ij}), \quad i = 1, \dots, M, \quad j = 1, \dots, N$$

denote preprocessed expression data (as described above) for N genes in M patients, where xij is the expression level of the jth gene in the ith patient. Let numeric array T=(ti) denote the clinical outcome (survival time) of patients and nominal array E=(ei) denote the clinical event (1=deceased, 0=alive). For the jth gene, let us rank-order the M patients according to the expression level of the gene. According to our model, in the case of unfavorable clinical outcome, a positive correlation between risk of death and gene expression level could be observed; alternatively, in the case of favorable clinical outcome, a negative correlation between risk of death and gene expression level could be observed. Assuming that the clinical outcomes are negatively (or positively) correlated with the expression of gene j, patient i can be assigned to one of two subgroups (1=“high-risk”, 0=“low-risk”) at a pre-defined expression cutoff value cj of the expression level of the j-th gene with the following formulae:

$$y_{ij} = \begin{cases} 1\ (\text{high-risk}), & x_{ij} > c_j \\ 0\ (\text{low-risk}), & x_{ij} \le c_j \end{cases} \qquad (1a)$$

in the case of unfavorable clinical outcome (positive correlation between risk of death and gene expression level), and

$$y_{ij} = \begin{cases} 1\ (\text{high-risk}), & x_{ij} \le c_j \\ 0\ (\text{low-risk}), & x_{ij} > c_j \end{cases} \qquad (1b)$$

in the case of favorable clinical outcome (negative correlation between risk of death and gene expression level).
The survival curves corresponding to a favorable clinical outcome, given cutoff value cj, can be described by K-M curves, characterizing a time-course of the probability of clinical outcome/events. The K-M curves could be fitted by a Cox proportional hazard regression model:

$$\log h_{ij}(t_i \mid y_{ij}, \beta_j) = \alpha_j + \beta_j \, y_{ij}, \qquad (2)$$

where hij is the hazard function, αj=log h0j(ti) represents the unspecified log-baseline hazard function when all of the y's are zero, and βj is the regression parameter, which can be estimated by using the univariate Cox partial likelihood function:

$$L(\beta_j) = \prod_{i=1}^{M} \left[ \frac{\exp(\beta_j \, y_{ij})}{\sum_{k \in R(t_i)} \exp(\beta_j \, y_{kj})} \right]^{e_i},$$

where R(ti)={k: tk≧ti} is the risk set at time ti.
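As an illustration of how the regression parameter and its Wald statistic can be obtained from the partial likelihood, the following minimal Python sketch fits the one-covariate model by Newton-Raphson. The function names, the Breslow-style treatment of tied event times, and the toy data are assumptions made for this sketch; it is not the implementation used in the DDg procedure.

```python
import math

def _score_and_information(beta, times, events, y):
    """Score and observed information of the log partial likelihood at beta."""
    score, info = 0.0, 0.0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # censored patients contribute only through risk sets
        # Risk set R(t_i): patients still under observation at time t_i.
        risk = [k for k in range(n) if times[k] >= times[i]]
        s0 = sum(math.exp(beta * y[k]) for k in risk)
        s1 = sum(y[k] * math.exp(beta * y[k]) for k in risk)
        s2 = sum(y[k] ** 2 * math.exp(beta * y[k]) for k in risk)
        score += y[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def fit_univariate_cox(times, events, y, max_iter=50, tol=1e-10):
    """Return (beta_hat, Wald statistic, two-sided p-value) for one covariate."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(beta, times, events, y)
        if info <= 0.0:
            break  # no information about beta (e.g. all patients in one group)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_information(beta, times, events, y)
    wald = beta ** 2 * info  # equals (beta_hat / standard error) squared
    p_value = math.erfc(math.sqrt(wald / 2.0))  # chi-square with 1 d.o.f.
    return beta, wald, p_value

# Toy cohort (hypothetical): group 1 tends to die earlier than group 0.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = [1, 1, 0, 1, 0, 1, 0, 0]
beta_hat, wald, p = fit_univariate_cox(times, events, groups)
print(beta_hat > 0)  # a positive coefficient marks group 1 as higher risk
```

A positive estimate indicates that the group coded 1 carries the higher hazard; swapping the group labels flips the sign of the estimate and leaves the Wald statistic unchanged.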
For gene j at optimized cutoff value cj, the Wald statistic (W) of β̂j for each Cox proportional hazard regression model is estimated and serves as a measure of the subgroup discrimination. The genes with the largest β̂j Wald statistics (Wj's) and having a p-value equal to or smaller than a predetermined threshold (typically, p-value ≦0.05) are considered. The method uses all potential predictors (e.g. all Affymetrix microarray probesets representing the expressed genes) as an input of the univariate or multivariate survival analysis. Our method processes these potential predictors/features and selects a feature as long as the p-value of the survival test statistic (e.g. the Wald statistic) for that feature is equal to or less than the predetermined cut-off value (for instance, p≦0.05). The features providing p-values equal to or less than the cut-off value are picked up, rank-ordered by their p-value, and finally considered as the survival significant predictors.
Equations 1a and 1b suggest that the selection of prognostic-significant genes relies on the pre-defined expression cutoff value cj of gene j based on which patients could be separated into two subgroups. A data-driven method (DDg) was developed to identify ‘the optimal’ cj of gene j, which could ‘most successfully’ discriminate two subgroups corresponding to the minimum log-rank p-value with Wald estimation of βj. The optimal value cj of gene j provides a maximization of the difference between two K-M curves corresponding to the favorable and unfavorable clinical outcomes. The searching interval for optimal value cj is defined between the 10th quantile and 90th quantile of the distribution of the signal intensity values for gene j. The detailed procedure can be found in the reference by Motakis et. al. (2009), the contents of which are incorporated by reference herein.
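The cutoff search can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the score being minimized is a two-group log-rank p-value rather than the Wald p-value of a fitted Cox model, only the unfavorable direction (expression above the cutoff means high risk) is scanned, the quantiles are taken by a simple index rule, and the function names and toy data are hypothetical.

```python
import math

def logrank_p(times, events, groups):
    """Two-group log-rank test p-value (chi-square with 1 d.o.f.)."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        d1 = sum(1 for ti, ei, g in zip(times, events, groups)
                 if ti == t and ei == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0.0:
        return 1.0  # one group is empty at every event time
    return math.erfc(math.sqrt(obs_minus_exp ** 2 / var / 2.0))

def ddg_cutoff(expr, times, events, lo_q=0.10, hi_q=0.90):
    """Scan cutoffs between two quantiles; return (best_cutoff, best_p)."""
    ranked = sorted(expr)
    lo = ranked[int(lo_q * (len(ranked) - 1))]  # crude quantile by index
    hi = ranked[int(hi_q * (len(ranked) - 1))]
    best_c, best_p = None, 1.0
    for c in sorted(set(ranked)):
        if not lo <= c <= hi:
            continue
        groups = [1 if x > c else 0 for x in expr]  # grouping rule of Eq. (1a)
        p = logrank_p(times, events, groups)
        if p < best_p:
            best_c, best_p = c, p
    return best_c, best_p

# Toy gene (hypothetical values): high expression goes with early death.
expr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
times = [20, 18, 19, 17, 16, 5, 4, 3, 2, 1]
events = [1] * 10
cutoff, p_min = ddg_cutoff(expr, times, events)
print(cutoff, p_min)
```

Restricting the scan to the interval between the two quantiles keeps either subgroup from becoming too small, which is the role of the 10th to 90th quantile search interval in the text.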
When 1D-DDg analysis is applied to separating three groups, two expression cutoffs of a mRNA or miRNA corresponding to local minimum p-values (e.g. corresponding to the Wald statistics) of a potential survival plot (left panel of
Similar calculation procedures as in 1D-DDg could be applied. The data-driven “goodness-of-fit” method is utilized to identify the optimal cutoffs c1j and c2j of miRNA j, which could ‘most successfully’ discriminate three groups corresponding to two minimum values of the score estimated as a multiplication of three pairwise Wald p-values among three survival curves.
A statistically weighted voting (SWVg) procedure based on DDg was utilized to obtain consensus grouping decisions from the grouping information generated by multiple covariates (e.g. microarray expressed genes).
A list of genes is ordered by ascending p-value, as generated from the DDg procedure above. The numeric grouping value for sample i could be calculated by the formula $G_i^N = \sum_{j=1}^{N} w_j G_{ij}$, where N is the number of genes and Gij is the group allocation for sample i assigned by gene j in the DDg. The weight wj is calculated by the formula
where pj is the p-value of gene j in the DDg procedure.
In a particular example where samples are divided into two groups, patient i could be separated into two subgroups (1=“high-risk”, 0=“low-risk”) at a pre-defined cutoff value (GC) of GiN with the following formula:
A Cox proportional hazard regression model is estimated by using a univariate Cox partial likelihood function with the method described in the DDg procedure.
The Wald statistic of β̂j is estimated and serves as an indicator to evaluate the ability of group discrimination for gene j at cutoff GC. The searching space of GC is from 0.2 to 0.8, with an increment of 0.01 for each step. The GC that provides the minimum log-rank p-value in the searching space is the optimized GC. The above-described procedure is repeated for different N, which varies from 3 to the number of genes assigned. The number (Nopt) and combination of genes are optimized for minimum log-rank p-values.
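The weighted vote and the two-group split can be sketched in Python as follows. This is an illustration under explicit assumptions: the weight used here, the negative log of each gene's p-value normalized so that the weights sum to one, is a stand-in chosen for the sketch and is not taken from the text, and the strict inequality at the cutoff is likewise an assumption. All function names are hypothetical.

```python
import math

def swvg_scores(p_values, group_matrix):
    """Weighted vote G_i^N for every sample.

    p_values[j]        -- DDg p-value of gene j (0 < p_j < 1)
    group_matrix[i][j] -- DDg group of sample i according to gene j (0 or 1)

    ASSUMPTION: w_j = -log(p_j) / sum_k(-log(p_k)). This form is illustrative
    only; it simply gives more weight to genes with smaller p-values.
    """
    raw = [-math.log(p) for p in p_values]
    total = sum(raw)
    weights = [r / total for r in raw]
    return [sum(w * g for w, g in zip(weights, row)) for row in group_matrix]

def split_two_groups(scores, gc):
    """1 = high-risk, 0 = low-risk; the strict '>' at the cutoff is assumed."""
    return [1 if g > gc else 0 for g in scores]

# Candidate cutoffs GC from 0.20 to 0.80 in steps of 0.01, as in the text.
gc_grid = [round(0.20 + 0.01 * k, 2) for k in range(61)]

# Toy example with two genes and four samples (hypothetical values).
scores = swvg_scores([0.01, 0.1], [[1, 1], [1, 0], [0, 1], [0, 0]])
labels = split_two_groups(scores, 0.5)
print(labels)
```

Because the weights sum to one and each vote is 0 or 1, every score lies between 0 and 1, which is consistent with a cutoff search over the interval from 0.2 to 0.8.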
In a particular example where the samples are divided into three subgroups, two cutoff values (GC1, GC2, GC1<GC2) of yiN are calculated according to the following formula:
A Cox proportional hazard regression model and log-rank statistic estimates are computed. GC1 is searched in the range from 0.2 to 0.44, with an increment of 0.01 for each step, while GC2 is searched in the range from 0.56 to 0.8, with an increment of 0.01 for each step. GC1, GC2 and Nopt are optimized for the minimum value of the product of the pair-wise log-rank p-values of the three survival curves.
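The two-cutoff grid search can be sketched as below. The sketch assumes a caller-supplied function `pairwise_p(groups, a, b)` that returns the log-rank p-value between subgroups a and b; that function, the 0/1/2 label coding, the handling of boundary values and the rule of skipping cutoffs that leave a subgroup empty are all assumptions of this illustration.

```python
from itertools import product

def assign_three_groups(scores, gc1, gc2):
    """0 = low, 1 = intermediate, 2 = high risk (label coding is assumed)."""
    return [0 if g < gc1 else (1 if g < gc2 else 2) for g in scores]

def search_cutoffs(scores, pairwise_p):
    """Grid-search (GC1, GC2) minimizing the product of pairwise p-values.

    Returns (best_product, gc1, gc2), or None if no cutoff pair yields
    three non-empty subgroups.
    """
    gc1_grid = [round(0.20 + 0.01 * k, 2) for k in range(25)]  # 0.20 .. 0.44
    gc2_grid = [round(0.56 + 0.01 * k, 2) for k in range(25)]  # 0.56 .. 0.80
    best = None
    for gc1, gc2 in product(gc1_grid, gc2_grid):
        groups = assign_three_groups(scores, gc1, gc2)
        if len(set(groups)) < 3:
            continue  # skip cutoffs that leave a subgroup empty
        score = (pairwise_p(groups, 0, 1)
                 * pairwise_p(groups, 0, 2)
                 * pairwise_p(groups, 1, 2))
        if best is None or score < best[0]:
            best = (score, gc1, gc2)
    return best

# Demonstration with a constant stand-in for the pairwise log-rank test.
demo = search_cutoffs([0.1, 0.5, 0.9], lambda groups, a, b: 0.5)
print(demo)
```

In practice `pairwise_p` would restrict the survival data to the two named subgroups and run a log-rank test on them.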
Open source clustering software Cluster 3.0 and visualization software Java Treeview (Eisen et al, 1998) were utilized to perform K-means clustering with k=3. Kendall tau correlation was used as the distance measure. Kaplan-Meier survival analysis was used to estimate the survival of each cluster. The log-rank test was used to compare the survival distributions of the three clusters.
Gene ontology analyses were performed via DAVID Bioinformatics tools (Huang et al, 2009) and MetaCore™ (version 6.8 build 29806, from GeneGo Inc). In both analyses, the filtered list of 18,905 reliable Affymetrix probe-sets was uploaded as background to prevent any systematic bias during the statistical calculations. In DAVID Bioinformatics tools, categories of interest included OMIM, GO_BP_FAT, GO_CC_FAT, GO_MF_FAT, Panther_BP_All, Panther_MF_All, BBID, BIOCARTA, KEGG, Interpro, PIR_Superfamily, SMART and UP_TISSUE. In MetaCore, gene enrichment reports in curated pathways, processes, and diseases were generated.
Using the let-7b-associated mRNA signature comprising 36 genes, 350 patients from the TCGA ovarian cancer database were stratified into three distinct subgroups, where the low-, intermediate- and high-risk subgroups showed distinct 5-year survival rates of 64%, 12% and 10%, respectively. For each miRNA and mRNA probe, pair-wise differential expression analysis was performed among the three subgroups, which contained 106, 188 and 56 patients in the low-, intermediate- and high-risk subgroups, respectively. The significance of the differential expression was calculated using the non-parametric Mann-Whitney test and corrected for multiple probe testing (across all probesets in the U133A platform) via the Benjamini-Hochberg Step-Up FDR method. Subsequently, for each pair of risk subgroup transition (i.e., low to intermediate-risk or high to low-risk), the differentially expressed probesets (FDR≦0.05) were extracted to perform gene ontology analysis.
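For reference, the Benjamini-Hochberg step-up adjustment can be written in a few lines of Python. This is a generic sketch of the standard procedure with a hypothetical function name and toy p-values; it is not the software used for the analysis above.

```python
def benjamini_hochberg(p_values):
    """Return BH step-up adjusted p-values (FDR), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p-values (hypothetical), one per probeset.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in fdr]
print(significant)
```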
To assess the stability of the groupings obtained via 1D DDg and SWVg, a ten-fold cross validation procedure can be performed as follows:
Comparison of the patient grouping from ten-fold cross validation with the original DDg-SWVg provides strong indication that the parameters of 1D DDg and SWVg are stable, and can be applied reliably to independent patient or set of patients (Table 1,
Comparison of the Let-7b-Associated 36-mRNA Prognosis Signature with Random Gene ID Lists
Prior to survival analyses, 162 Affymetrix U133A probesets correlated with let-7b and significantly associated with biological pathways were selected. For each of these 162 probesets, the survival significance of the individual probeset was evaluated. Finally, via statistically-weighted voting, the let-7b-associated 36-mRNA prognosis signature, comprising the top 36 survival-significant genes, was able to separate patients into three distinct risk subgroups, for which the significance of separation is measured by a log-rank p-value.
To validate our biomarker selection methods, a set of negative control probesets was defined as those that were not 1D DDg survival significant (p-value >0.1). From this set of negative control probesets, 999 probeset lists, each containing 162 probesets, were randomly generated without replacement within each list. Each list was generated independently from the set of negative control probesets. For each randomly generated list, the same 1D DDg and SWVg analyses were performed on the 162 probesets to generate a corresponding 36-probeset prognosis signature.
The log-rank p-value of our actual 36-mRNA prognosis signature was compared to the distribution of the random log-rank p-values.
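One simple way to summarize such a comparison is an empirical p-value: the fraction of random lists whose log-rank p-value is at least as small as that of the actual signature. The Python sketch below shows this calculation; the function name, the add-one convention and the toy inputs are assumptions of the sketch and are not taken from the analysis itself.

```python
def empirical_p(actual_p, random_ps):
    """Fraction of random lists that do at least as well as the actual signature.

    The +1 terms count the actual signature itself, a common convention
    that is assumed here rather than taken from the text.
    """
    as_good = sum(1 for p in random_ps if p <= actual_p)
    return (as_good + 1) / (len(random_ps) + 1)

# Toy inputs (hypothetical): none of 999 random lists beats the signature.
result = empirical_p(1e-6, [0.5] * 999)
print(result)
```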
Associations between two miRNAs or between miRNA-mRNA pairs were tested using Kendall's tau correlation. To correct for multiple testing, we adjusted the P-values using the Benjamini-Hochberg step-up FDR correction. Clustering analysis of the correlation coefficients of all of the combinations of let-7s and mRNA probes was performed. We extracted a subset of Affymetrix mRNA probe-sets that showed a strong correlation (FDR <0.01) with any of the let-7 members and performed hierarchical clustering analysis.
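Kendall's tau for one miRNA-mRNA pair can be computed directly from concordant and discordant sample pairs, as in the following Python sketch. It implements the tie-corrected tau-b variant with a plain quadratic loop; the function name and toy vectors are hypothetical, and the text does not state which tau variant or software was used.

```python
def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long expression vectors."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from all counts
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = ((concordant + discordant + ties_x)
             * (concordant + discordant + ties_y)) ** 0.5
    return (concordant - discordant) / denom if denom else 0.0

# Toy expression vectors across four samples (hypothetical values).
tau_same = kendall_tau_b([1, 2, 3, 4], [10, 20, 30, 40])
tau_opposite = kendall_tau_b([1, 2, 3, 4], [40, 30, 20, 10])
print(tau_same, tau_opposite)
```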
Pathway enrichment analyses were performed for positively and negatively correlated genes of let-7b independently. Pathways that were significantly associated with the positively and negatively correlated probes of let-7b (p-value <0.001) were generated by MetaCore. The expression values of specific genes were obtained from the probes with the most significant correlation with let-7b. The values were then used in an integrative analysis of the individual gene expression with the clinical data across all patients to examine the prognostic ability of each of these genes to predict HG-SOC patients' post-surgery survivability. Significant mRNAs were utilized in a SWVg procedure, where weights were assigned to the ranked list of DDg survival-significant genes to derive a representative gene signature to discriminate patients into low-, intermediate- and high-risk post-surgery treatment outcomes.
Univariate hazard ratios (HR) were calculated with 95-percent confidence intervals (95% CI) in a Cox proportional-hazards model. Probabilities of overall survival (OS) were estimated by the Kaplan-Meier method, and the Wald test from the corresponding models was utilized to compare time-to-event distributions. Other co-variates included tumor stage, histologic grade, primary therapy outcome success, and tumor residual disease. The simultaneous prognostic effect of various factors was determined in a multivariate analysis in a Cox proportional-hazards model. The level of agreement between our predicted molecular subgroups and the clinical subgroups was evaluated by the weighted Kappa correlation value (StatXact-9). The significance of the agreement was estimated by the Mantel-Haenszel (MH) test (Agresti, 2007). All P-values are two-sided.
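The Kaplan-Meier estimate of a survival probability at a fixed horizon, such as a 5-year OS rate, can be sketched in Python as follows. The function name and the toy follow-up data are hypothetical and serve only to illustrate the product-limit calculation.

```python
def kaplan_meier(times, events, horizon):
    """Kaplan-Meier estimate of the survival probability at time `horizon`.

    times  -- follow-up time of each patient
    events -- 1 if the patient died at that time, 0 if censored
    """
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events)
                          if e == 1 and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        survival *= 1.0 - deaths / at_risk
    return survival

# Toy follow-up in years (hypothetical): three deaths, three censored patients.
times = [1, 2, 3, 4, 6, 7]
events = [1, 0, 1, 1, 0, 0]
five_year_os = kaplan_meier(times, events, horizon=5)
print(round(five_year_os, 4))
```

Censored patients remain in the risk set until their last follow-up time, which is why the estimate differs from the simple fraction of patients alive at the horizon.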
The reporting recommendations for tumor marker prognostic studies (REMARK; McShane et al, 2005) were adopted to identify potential biomarkers. We analyzed two independent miRNA expression datasets (TCGA and GSE27290, as discussed above) collected from HG-SOC patients (Tables 2 and 3).
+ median survival time is calculated from the information of the deceased patients only
* Alive patients with follow-up <5 years or patient with no follow-up information
After removing outlier samples, 514 profiles in TCGA dataset, and 49 profiles in GSE27290 qualified for the analysis (
For the GSE27290 dataset, 49 samples were separated into three risk subgroups (low-, intermediate- and high-risk), and 27 of these samples (55%) were clustered consistently by the two methods (Table 5). The log-rank test showed significant differences in the OS among the three subgroups. Specifically, the expression levels of let-7b and let-7c were higher in the high-risk subgroup than in the low-risk subgroup. In contrast, the expression levels of let-7a, let-7f and let-7g were lower in both the high- and intermediate-risk subgroups than in the low-risk subgroup. Similar sub-groupings and results were obtained by analyzing the samples in the TCGA dataset. The expression levels of let-7b and let-7c were higher in the high-risk subgroup than in the low-risk subgroup, suggesting unfavorable influences of both miRNAs on post-surgery treatment responses of HG-SOC patients (
Furthermore, we utilized an online tool MIRUMIR (Antonov et al., 2012; www.bioprofiling.de/GEO/MIRUMIR/mirumir.html) to assess the relationship between expression levels of let-7 members with clinical outcomes (particularly, OS) and found that let-7b and let-7c have different functions in different cancer types. The higher expression levels were associated with relatively poor prognosis for HG-SOC patients, relatively good prognosis for breast cancer patients and no survival significance among prostate cancer patients (
A correlation analysis of miRNA expression between let-7 members for both datasets (
Hierarchical clustering analysis was performed on the correlation coefficients of let-7 with 141 miRNAs present in both TCGA and GSE27290 datasets (
To achieve an understanding of the correlation patterns of the miRNAs across the genome, we performed correlation analysis between miRNA and mRNA probesets represented in the TCGA microarray datasets, and identified classes of protein-coding genes potentially controlled by the let-7 family. For each member, the distribution curves of correlation coefficients with all mRNA probes were compared with the background distribution. The correlation pattern associated with let-7b was distinct from the background distribution for all miRNA-mRNA pairs. Specifically, the frequency distribution of the correlation coefficients for let-7b had a wider profile, suggesting that let-7b was strongly correlated with a large number of mRNAs in the HG-SOC genome (
In total, the expression levels of 4,126 Affymetrix U133A probesets were significantly correlated with the expression levels of any of the let-7 family members (FDR<0.01,
To investigate whether mRNAs correlated with let-7b could be significantly enriched in any biological pathways, we performed enrichment analysis using MetaCore (
In contrast, from 1457 probesets that were negatively correlated with let-7b (FDR <0.01), 122 unique probesets were significantly enriched in eleven pathways associated with processes such as cell cycle regulation, metaphase checkpoints, DNA replication start, damage and DNA repair, role of BRCA1 and BRCA2 in DNA repair, spindle assembly, role of APC in cell cycle regulation, chromosome separation and condensation, apoptosis and survival (P-value<0.001,
Overall, within the significantly enriched biological pathways, a total of 238 probesets (corresponding to 162 unique genes) were significantly correlated with let-7b (
The majority of the SPS genes could be considered as novel prospective biomarkers, with only six SPS genes (PDGFRA, CDK4, CCL2, DNMT1, LAMA4 and GNG12) previously known to be in an OC signature.
Importantly, the 5-year OS rates for the low- and high-risk subgroups by our SPS signature were 64% and 10%, respectively. The univariate analysis showed that the hazard ratio (HR) of high-risk with respect to low-risk was 7.78, with a confidence interval (CI) of 4.84 to 12.52 (P-value <1E-16, Table 9).
In Table 9, patients belonging to the TCGA ovarian cancer dataset were analyzed. P-values were obtained from the Wald statistic. Only significant factors are included here.
Multivariate and survival analyses indicated that SPS could provide a strong post-surgery prognostic classification of patients that surpasses clinicopathological parameters, such as histological grade/stage, or conventional biomarkers, such as CA125, HE4, P53, or MYC (Table 10,
To validate our procedures of biomarker selection and the computational algorithms used, we randomly generated 999 probeset lists, each containing 162 probesets from a list of negative control probesets, and performed similar DDg and SWVg analyses as described earlier. Within the same TCGA dataset, our SPS significantly outperformed those of the negative controls (FDR=3E-3,
Next, we validated our SPS and prediction model on three independent datasets—GSE9899, GSE26712, and GSE13876—which contain 246 OC samples (90% in stage III/IV), 185 late-stage HG-OC samples and 157 advanced-stage SOC samples, respectively (
The 5-year survival rates were 56-71%, 21-29%, and 0-4.6% for three risk subgroups, respectively. This analysis strongly supports our SPS and suggests the potential application of SPS in clinical settings.
Kappa correlation coefficient revealed significant associations between patient subgroupings based on our risk classification and clinical parameters, such as tumor stage (P-value=3E-4), tumor residual size (P-value=0.01), and chemotherapy response (P-value=1E-3). These findings suggest the potential application of our SPS in predicting therapy outcome (Table 12).
*others/no information
#Classification
*These subcategories were not included in the calculation of Kappa coefficient.
#The 21 miRNAs, correlated with let-7b in the TCGA dataset are assessed for their patient prognostic classification using DDg and SWVg methods.
Also, we compared our patient classification with previously reported subgroupings, where patients were classified based on molecular subtypes such as differentiated-type, immunoreactive-type, mesenchymal-type and proliferative-type (TCGA, 2011). We observed that our low-risk and high-risk patients were significantly correlated with proliferative-type and mesenchymal-type, respectively (P-value=1E-18, Table 12). However, unlike our classification, which significantly stratified patients into three risk subgroups, the subgrouping based on TCGA molecular subtypes did not show prognostic significance (
DDg-SWVg was applied to high-grade epithelial ovarian carcinoma (HG-EOC) data from The Cancer Genome Atlas (TCGA) and Australian Ovarian Cancer Study (AOCS) [GEO accession no. GSE27290], where TCGA was used as a training dataset and AOCS as an independent evaluation dataset. For both datasets, data pre-processing was performed, including identification and removal of poor-quality chips, normalization of data across multiple microarray chips and finally batch effect correction as described above. In the TCGA dataset, survival analysis via the DDg method of individual members of the let-7 family first revealed the clear heterogeneity of the let-7 family, where let-7b and let-7c exhibited a pro-oncogenic pattern in HG-EOC. Next, expression correlation analysis of individual let-7 members with all mRNAs revealed the distinctly strong correlation pattern of let-7b when compared to the rest of the let-7 members. Pathway enrichment analyses were performed on two lists of genes using MetaCore from GeneGo Inc.: genes positively correlated with let-7b (Kendall-tau measure of correlation, FDR≦0.01) and genes negatively correlated with let-7b (Kendall-tau measure of correlation, FDR≦0.01). Genes that are significantly correlated with let-7b (Kendall-tau measure of correlation, FDR≦0.01) and also involved in the top significant pathway maps (P≦0.001) were extracted. In this example,
The let-7b associated 36 genes are involved in methionine metabolism (DNMT1), immune response (CFD, CD93), cell-adhesion (MMP13, ARPC1B, CD44, PIK3R1, GNG12, CCL2, PLAUR, LAMA4, COL3A1, VCL, CAV2), regulation of epithelial-to-mesenchymal transition (FZD1, CALD1, EDNRA, TGFBR2, PDGFRA, FGFR1, HGF), DNA damage repair (POLR2D, POLR2J, CDK4, CHEK1) and cell-cycle (CCT2, CDC6, TUBB, NCAPD2, NCAPG2, POLA2, MCM2, TCP1, NCAPH, CBX3, MIS12, CDK4, CHEK1). The 36-mRNA prognosis signature can further stratify these patients into three risk subgroups, of which the low-risk subgroup has a relatively good 5-year survival rate of 65%. On the other hand, the intermediate- and high-risk subgroups have 5-year survival rates of only 20% and 10%, respectively. In a test dataset (AOCS), the 36-mRNA prognosis signature, using the prediction model constructed from the TCGA dataset, could provide a similar classification of these independent patients into three risk subgroups (p-value=2.54E-17), of which the low-risk subgroup has a relatively good 5-year survival rate of 72%, while the intermediate- and high-risk subgroups have 5-year survival rates of 35% and 0%, respectively. This evaluation analysis suggests the potential application of the 36-mRNA prognosis signature in clinical settings.
The twenty-one miRNAs (miR-107, miR-103, miR-106b, miR-18a, miR-17-5p, miR-20b, miR-183, miR-25, miR-324-5p, miR-517c, miR-200a, miR-429, miR-200b, miR-96, miR-362, miR-127, miR-214, miR-136, miR-22, miR-320 and miR-486) showed strong correlations with all of the let-7 family members, with fourteen of them negatively correlated with let-7b and let-7c, while seven were positively correlated. Both the positively and the negatively correlated miRNAs include known oncogenes and tumor suppressors. Using DDg and SWVg, it was observed that the 21-miRNA signature can significantly stratify TCGA patients diagnosed with HG-EOC into high-, intermediate- and low-risk subgroups, where the 5-year survival rates are 8%, 22% and 53%, respectively (p-value=1E-12). This suggests the application of this 21-miRNA signature in potential clinical settings.
Differential expression and gene ontology analysis of the patient subgroups suggest that 26 key genes involved in HG-SOC regulatory programs could be candidate therapeutic targets.
The results of the differential expression analysis revealed a clear dichotomy of gene function enrichments associated with either transition from lower to higher-risk patients or transition from higher to lower-risk patients. Crucially, we observed that gene sets significantly up-regulated (FDR <0.05) in higher-risk patients relative to lower-risk patients were typically enriched in genes with GO functions related to ECM, response to wounding, cell motion and angiogenesis (Tables 13 to 18), while gene sets significantly up-regulated in lower-risk patients relative to higher-risk patients were enriched in genes with GO functions including cell cycle, DNA replication, mitosis and DNA repair. Therefore, distinct and specific cellular programs could dominate during transitions between different prognostic risk subgroups as defined by our SPS, and our results suggest that key genes involved in HG-EOC regulatory programs could be candidate therapeutic targets. Specifically, our analysis revealed that 26 of the 36 genes in our SPS were differentially expressed across the three risk subgroups, with pairwise significance of FDR <0.05 (Table 19). The genes include PDGFRA, CAV2, FZD1, EDNRA, MMP13, HGF, PLAUR and COL3A1, which are independently and collectively strongly survival-significant, and could be therapeutic targets (
Furthermore, results also suggest that within the 36-mRNA prognostic signature, genes associated with regulation of epithelial-to-mesenchymal transition are enriched (Table 20).
Number | Date | Country | Kind |
---|---|---|---|
201207691-5 | Oct 2012 | SG | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/SG2013/000436 | 10/11/2013 | WO | 00 |