This application relates to epigenomic marker sets for assessing the risk of cancer metastasis, and to the use of such marker sets in methods and kits. The marker sets are particularly applicable to methods and kits for use in connection with human breast cancer, colon cancer and glioma.
Breast cancer is one of the most prevalent human malignancies and is a major cause of cancer-related morbidity and mortality. Invasive ductal carcinoma (IDC) of the breast is a phenotypically diverse disease, consisting of tumors with varying pathologic and molecular characteristics (3-5). The primary biological subtypes of IDC include estrogen receptor (ER) and progesterone receptor (PR)-positive tumors (luminal A and B) and tumors that are ER/PR-negative (basal-like and HER2-enriched). These molecular determinants have significant effects on metastatic behavior, response to therapy, and clinical outcome. For example, ER/PR-positive tumors are generally associated with better clinical prognosis while basal-like (ER/PR-negative and HER2-negative, triple-negative) tumors are associated with higher rates of metastasis and death (6-9). The genomic alterations—including both genetic and epigenetic aberrations—underlying these differing metastatic potentials are ill-defined.
Significant effort has been undertaken to more accurately define the molecular alterations underlying breast cancer. For example, it has been shown that hormone receptor (HR) status is prognostic for clinical outcome. Mutations in genes such as BRCA1, PTEN, and PIK3CA help promote breast cancer oncogenesis, and are enriched in specific subgroups of IDC (10-12). Genome-wide sequencing surveys have been performed to identify the scope of mutations in breast cancers (13-15). These data demonstrate that there exists substantial biological heterogeneity between and within the ER/PR-positive and ER/PR-negative subgroups, for which the molecular foundations remain obscure (13). In addition, gene expression classifiers have been developed to help predict metastatic risk (16-18). Despite the increasing use of these classifiers in the clinic, the genomic root causes of the transcriptome differences that underlie metastatic potential remain unclear.
In addition to genomic variations of this type, changes in phenotype or gene expression caused by mechanisms other than changes in the underlying DNA sequence may occur and be involved in the onset and progression of cancers. Changes of this type are referred to as “epigenetic” or “epigenomic” variations.
Widespread changes in DNA methylation patterns have been reported to occur during oncogenesis and tumor progression (19, 20). Cancer specific changes in DNA methylation can alter genetic stability, genomic structure, and gene expression (21, 22). Promoter CpG island methylation can result in transcriptional silencing, and thus loss of function of tumor suppressor genes, and plays an important role in the oncogenic process (19). CIMP (CpG island methylator phenotype), which is associated with a strong tendency to hypermethylate specific loci, has been described in a subset of colorectal cancers, and recently in a subgroup of gliomas (2, 23). Aberrations in DNA methylation have been reported in human breast cancer but the impact of the methylome on metastasis and the presence of breast CIMP has remained elusive (24-30).
US Patent Publication No. 2010/0273164, which is incorporated herein by reference, discloses methods for detection of methylated cytosine residues in a target nucleic acid. Van der Auwera et al., PLoS ONE (2010) 5: 1-10, which is incorporated herein by reference, discloses evaluation of methylation in breast cancer to arrive at a 14-gene classifier in which methylation of NIP, CHGA, OSR1, GFRA3, KLK10, SSTR1, EFCPB2, PPARG, PRKAR1B, ABCG2, FGF5, PLTP, GRASP and PAX7 is used to distinguish inflammatory breast cancer from non-inflammatory breast cancer. Van der Auwera et al. also reported that high methylation with this classifier was observed in samples with distant metastases and poor prognosis. US Patent Publication No. 2010/0209906, which is incorporated herein by reference, relates to detection of methylation in colon cancer.
The present application provides a classifier that can be used in the prediction of metastatic risk in a patient, with particular applicability to patients diagnosed with breast cancer, colon cancer, or glioma.
In one aspect, the invention provides a method for assessing risk of metastasis in a cancer patient identified as having breast cancer, colon cancer or glioma, comprising the steps of:
(a) obtaining a sample of tumor tissue from the patient;
(b) evaluating the sample for hypermethylation of a plurality of genes, and
(c) based on the evaluation of step (b) determining whether and/or to what extent the patient is at risk of cancer metastasis. The plurality of genes includes at least three genes selected from the group consisting of: ALX4, CRABP1, ADAM23, MOXD1, CHST2, FAM89A, RNH1, B3GNT5, KCNIP1, SLC16A12, RUNX3, LYN, PSAT1, RASGRF2, SOX8, ARHGEF7, ADAM12, PYGO1, P2RY1, FLJ25477, FBN1, PROX1, FOXL2, KCNJ2, SMOC1, MCF2L2, BMP3, TRIM29, GRIK1, ALK, C2orf32, VIM, AKAP12, EIF5A2, DZIP1, FLJ34922, TMEM22, LBX1, GJA7, HAAO, KLK10, ZAR1, DPYSL5, SLIT2, RGS17, KIAA1822, PTGFR, FBN2, ST6GALNAC3, VAX1, GPR83, TBX2, SIX3, ACADL, and ASRGL1. In exemplary embodiments, 3-20 genes, for example 3-10 genes, from this list are evaluated in a patient with breast cancer.
In specific embodiments, the plurality of genes includes at least three genes selected from among RASGRF2, ARHGEF7, FBN1, SOX8, CRABP1, FOXL2, and ALX4.
In a further aspect of the invention, a set of 33 genes is provided as a single classifier that can be used in prediction of risk of metastasis in patients with breast cancer, colon cancer or glioma. This set of genes includes three genes from the set of genes above plus additional genes. Using this classifier, methylation is assessed for ADAM12, ALX4, FOXL2, ACOT12, ACTA1, AOX1, C1orf76, CD8A, DES, DMN, DMT1, DYPSL4, EYA4, FLJ14834, GCM2, GSH2, LOC112937, LOC389112937, LOC399458, MAL, MEGF10, MGC26856, NEUROG1, PDPN, RPL39L, SFRP2, SLC13A5, SYT6, TFP12, THSD3, TLR2 and TP73.
A further aspect of the invention is a kit which may be used in methods for assessment of metastatic risk. Such a kit consists essentially of materials for the evaluation of metastasis risk in a cancer patient identified as having breast cancer, colon cancer or glioma. The kit includes reagents for determination of the extent of methylation of a plurality of genes, wherein the plurality of genes includes at least three genes selected from the group consisting of: ALX4, CRABP1, ADAM23, MOXD1, CHST2, FAM89A, RNH1, B3GNT5, KCNIP1, SLC16A12, RUNX3, LYN, PSAT1, RASGRF2, SOX8, ARHGEF7, ADAM12, PYGO1, P2RY1, FLJ25477, FBN1, PROX1, FOXL2, KCNJ2, SMOC1, MCF2L2, BMP3, TRIM29, GRIK1, ALK, C2orf32, VIM, AKAP12, EIF5A2, DZIP1, FLJ34922, TMEM22, LBX1, GJA7, HAAO, KLK10, ZAR1, DPYSL5, SLIT2, RGS17, KIAA1822, PTGFR, FBN2, ST6GALNAC3, VAX1, GPR83, TBX2, SIX3, ACADL, and ASRGL1. In exemplary embodiments, the kit includes materials for assessment of fewer than 100 genes, preferably fewer than 50 genes. For example, the kit may include materials for assessing methylation in 3 to 20 genes, for example 3-10 genes, from the above list.
In another embodiment, the kit includes materials for assessing methylation in the genes ACOT12, ACTA1, AOX1, C1orf76, CD8A, DES, DMN, DMT1, DYPSL4, EYA4, FLJ14834, GCM2, GSH2, LOC112937, LOC389112937, LOC399458, MAL, MEGF10, MGC26856, NEUROG1, PDPN, RPL39L, SFRP2, SLC13A5, SYT6, TFP12, THSD3, TLR2 and TP73.
The present invention is based on a genome-wide analysis to characterize the methylomes of breast cancers with diverse metastatic behavior. This analysis led to the identification of a subset of breast tumors that display coordinate hypermethylation at a large number of genes, demonstrating the existence of a breast-CpG island methylator phenotype (B-CIMP). B-CIMP imparts a distinct epigenomic profile and is a strong determinant of metastatic behavior. B-CIMP loci are highly enriched for genes that define metastatic potential. Importantly, methylation at B-CIMP genes accounts for much of the transcriptomic diversity between breast cancers of varying prognosis, indicating a fundamental epigenomic contribution to metastatic risk.
As used in this application, the term “risk of metastasis” refers to a prognostic indication that the cancer in a particular patient, particularly a human patient, will advance to a metastatic state based on statistical predictors. Actual advance to a metastatic state is not required, and adoption of treatment modalities to try to delay or prevent the realization of such risk is anticipated to occur.
As used in this application, the term “obtaining a sample of tumor tissue from the patient” refers to obtaining a specimen of tumor, for example a biopsy specimen, or a portion of a surgically excised specimen from a patient for use in testing. The sample may be collected by the person performing the assay procedures, but will more commonly be collected by a third party and then sent for assay. Either the actual collection or the receipt of a sample for assay is within the scope of the term “obtaining a sample.”
In the assay methods and kits of the invention, the sample is evaluated for the extent of methylation, and preferably for hypermethylation of a plurality of genes. In general the number of genes will be fewer than 50 genes, and preferably will be in the range of 3 to 20 genes, more preferably 3 to 10 genes. Selection of the genes and the number of genes evaluated is suitably based on the prognostic value of the genes. Where genes with higher prognostic value are evaluated, fewer genes need to be evaluated to arrive at a reliable indication of risk of breast cancer metastasis. A gene with a high prognostic value is one that has a high correlation between hypermethylation and metastasis risk. For example, in Table 2, the q value in Column 4 is an indicator of the statistical significance of the relation between hypermethylation of the indicated gene and a decrease in metastatic risk. It can be seen that ALX4, when analysed with the probeset cg04988423, has a very high statistical significance (small q value). Thus, tools that include analysis of this gene will need fewer tests to achieve statistical reliability. On the other hand, tests that include no genes from the top 50 genes in Table 2 should evaluate more genes in the assay method and/or kit.
In some embodiments, the kits of the present invention consist essentially of materials for the evaluation of metastasis risk in a cancer patient identified as having breast cancer, colon cancer or glioma and include materials for detection of the extent of methylation in at least some specified genes. As used in this context, the term “consisting essentially of” means that the kit does not include materials that provide functionality other than the evaluation of metastasis risk to any significant extent. In particular, the kit does not encompass a set of broad screening reagents such as found on an Affymetrix® chip or an Illumina Human Methylation27 beadarray, which may include the relevant genes in combination with a multitude of genes that are not relevant to metastasis prediction. The kit might, however, include materials for evaluation of some additional genes, provided these do not change the primary purpose of the kit.
Table 1 sets forth, in order of significance, a subset of genes that have been found by the inventors to have prognostic value for prediction of metastasis risk, together with suitable probe sets for each gene, listed as differentially methylated probeset IDs from the Illumina Human Methylation27 beadarray. These beadarrays each query 27,578 CpG sites, covering 14,495 genes.
In embodiments of the invention, the genes evaluated are selected from this list. In some embodiments, all of the genes in Table 1 are evaluated. In some embodiments, at least 50 genes in Table 1 are evaluated. In some embodiments, from 3 to 20 genes in Table 1 are evaluated. In specific embodiments, the plurality of genes includes at least three genes selected from among RASGRF2, ARHGEF7, FBN1, SOX8, CRABP1, FOXL2, and ALX4. In some embodiments, the genes tested include the top 3, 5, 8 or 10 genes listed in Table 1.
Methylation of these genes may be tested in combination with other genes that have been shown to be of relevance in other CIMP classifiers without departing from the scope of the invention.
Based on the evaluation results, it is determined whether and/or to what extent the patient is at risk of breast cancer metastasis. It will be appreciated that the significance of hyper- or hypomethylation to metastatic risk depends on the gene that is hyper- or hypomethylated. For example, as discussed below, observation of hypermethylation in ALX4, ARHGEF7, and RASGRF2 correlated with a decreased incidence of metastatic relapse. This is the case for each of the genes in Table 1, and in Table 2.
On a practical level, risk is measured by detecting methylation in a subset of the CIMP genes. The genes used can include any combination of the B-CIMP genes described in Table 1 or Table 2. A panel of 3-10 genes detected using quantitative methylation-specific PCR, EpiTYPER, or MethyLight can be used in the clinic. Methylation of these genes determines whether the breast tumor is CIMP+ or CIMP−. This information is used in conjunction with standard staging and pathology to determine risk of metastasis. If risk is sufficiently high (determined on a case-by-case basis via clinical practice standards), then the patient may be offered more aggressive chemotherapy.
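By way of illustration only, the following Python sketch shows one way a CIMP+/CIMP− call could be derived from a small panel of measured loci. The description above does not state a decision rule, so the function name `call_cimp_status`, the beta-value cutoff of 0.5, the majority rule, and the example beta-values are all hypothetical placeholders rather than parameters taken from the study.

```python
def call_cimp_status(panel_betas, beta_cutoff=0.5, min_fraction_methylated=0.5):
    """Label a tumor "CIMP+" or "CIMP-" from beta-values of a small gene panel.

    panel_betas: dict mapping gene name -> beta-value in [0, 1].
    beta_cutoff and min_fraction_methylated are illustrative assumptions,
    not values disclosed in the text.
    """
    if not panel_betas:
        raise ValueError("panel_betas must contain at least one gene")
    # A locus counts as hypermethylated when its beta-value reaches the cutoff.
    methylated = [gene for gene, beta in panel_betas.items() if beta >= beta_cutoff]
    fraction = len(methylated) / len(panel_betas)
    return "CIMP+" if fraction >= min_fraction_methylated else "CIMP-"


# Hypothetical beta-values for a three-gene panel.
example_panel = {"ALX4": 0.82, "ARHGEF7": 0.71, "RASGRF2": 0.30}
print(call_cimp_status(example_panel))  # CIMP+
```

In practice any such rule would have to be calibrated against tumors of known B-CIMP status, and the resulting call would be only one input alongside standard staging and pathology.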
Other methods for detection of methylation at the nucleic acid level are also known, and may be used in the methods of the invention. For example, as described in U.S. Pat. No. 7,153,653, which is incorporated herein by reference, mapping of methylated regions in DNA may be based on Southern hybridization approaches, based on the inability of methylation-sensitive restriction enzymes to cleave sequences which contain one or more methylated CpG sites, or using methylated CpG island amplification (MCA) to enrich for methylated CpG-rich sequences. MCA coupled with Representational Difference Analysis (MCA/RDA) can recover CpG islands differentially methylated in cancer cells (Toyota, et al., Cancer Res. 59:2307-2312, 1997).
Because CpG island methylation leads to reduced expression of the gene and associated proteins, methylation can also be assessed indirectly through assessment of gene expression and expressed protein levels. These assays can be performed using an Affymetrix microarray or immunohistochemistry. By way of example, if this approach is used, assessment of risk can be made on the basis of an assay for some combination of the 102 hypermethylated and down-regulated genes of Table 3. In most cases, however, methylation assays are preferred over expression-based assays since methylation assays are more robust, less expensive, and can be used on samples that are easier to obtain from the clinic, DNA being more stable than RNA.
To facilitate the performance of the methods of the invention as prognostic evaluations on actual patient samples, the present invention provides diagnostic assay tools/kits that include reagents sufficient to do the testing without the overhead of numerous additional and less relevant reagents that might be present in a research tool. Thus, in accordance with an embodiment of the invention, the assay kits of the invention comprise reagents for determination of CpG island methylation of 100 genes or fewer, preferably 50 genes or fewer, in which at least 50% of the genes for which reagents are provided are genes that have relevance to the determination of risk of breast cancer metastases.
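The two numerical limits in this embodiment (a cap on the number of genes and a minimum share of relevant genes) can be expressed as a simple check. The Python sketch below is illustrative only; the function name `kit_meets_criteria` and the placeholder gene `GENE_X` are assumptions introduced for the example, and the set of relevant genes would in practice be drawn from Table 1 or Table 2.

```python
def kit_meets_criteria(kit_genes, relevant_genes, max_genes=100, min_relevant_fraction=0.5):
    """Check a proposed kit gene list against the size and relevance limits.

    kit_genes: genes for which the kit provides reagents.
    relevant_genes: genes considered relevant to metastasis risk.
    Defaults follow the 100-gene cap and the 50% relevance floor stated above;
    pass max_genes=50 for the preferred embodiment.
    """
    kit = set(kit_genes)
    if not kit or len(kit) > max_genes:
        return False
    relevant_count = len(kit & set(relevant_genes))
    return relevant_count / len(kit) >= min_relevant_fraction


# Seven genes named in the specific embodiments; GENE_X is a hypothetical extra.
relevant = {"RASGRF2", "ARHGEF7", "FBN1", "SOX8", "CRABP1", "FOXL2", "ALX4"}
print(kit_meets_criteria(["ALX4", "ARHGEF7", "RASGRF2", "GENE_X"], relevant))  # True
```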
In accordance with some embodiments, the kits of the invention contain reagents for detection of methylation in 3 to 20 genes. In specific embodiments, the plurality of genes includes at least three genes selected from among RASGRF2, ARHGEF7, FBN1, SOX8, CRABP1, FOXL2, and ALX4. In some embodiments, the genes tested include at least the top 3, 5, 8 or 10 genes listed in Table 1, any three genes of the top 5, any three genes of the top 8 or any three genes of the top 10 genes listed in Table 1.
In some embodiments of the invention, the target specific reagents contained in the kit include reagents for detection of methylation of a gene set as discussed above. The specific nature of the reagent will depend on the methodology employed for determination of methylation, but may include sequence specific probes or primers. Optionally, the kit may also include the non-target-specific reagents. The reagents may be provided in an array format for ease of use and interpretation.
The results summarized above are derived from a systematic, genome-wide characterization of the breast cancer methylome in breast cancers with diverse metastatic behavior. We used the Illumina Infinium HumanMethylation27 platform because it provides efficient genome-wide interrogation of CpG islands. This platform is well-validated and highly reproducible (mean correlation coefficient=0.987) (31, 32). Using this platform, analyses of replicate breast cancer samples generated highly concordant data. We first analyzed a discovery set of IDCs with differing metastatic behavior, including samples with varying ER/PR and HER2 status from patients with excellent clinical follow-up (n=39, Table 4). To identify breast cancer subgroups, we selected the most variant probes and performed consensus clustering and unsupervised hierarchical clustering. We identified two robust DNA methylation clusters, one encompassing a portion of the HR-positive tumors (defined as ER/PR-positive, cluster 2) and one encompassing tumors that were ER/PR-positive or ER/PR-negative (cluster 1). Cluster 2 breast cancer samples possessed a highly characteristic DNA methylation profile with high coordinate hypermethylation at a subset of loci, similar to the CIMP phenotype seen in colorectal cancer (2, 23). We thus designated this group (cluster 2) as having a breast CpG island methylator phenotype (B-CIMP). In our discovery set, 17 out of 39 (44%) tumors were B-CIMP-positive (B-CIMP+). Importantly, the composition of the B-CIMP+ subgroup was confirmed by two independent clustering algorithms (2D hierarchical and K-means consensus clustering); both approaches defined the same set of tumors as exhibiting B-CIMP. Cluster significance was evaluated by SigClust and class boundaries were highly significant (33). Our array results were validated using EpiTYPER (34), a mass spectrometry-based technique allowing sensitive detection of DNA methylation at base-pair resolution (
We defined the relationship between B-CIMP status, clinical co-variates, and known molecular determinants. Interestingly, CIMP+ tumors consisted almost entirely of ER/PR-positive tumors (94%,
To validate the existence of B-CIMP, we used Epityper to evaluate an independent cohort of breast cancers. We examined methylation of three loci (ALX4, ARHGEF7, and RASGRF2) that were among the most predictive for B-CIMP in our Infinium data (
We next sought to define the nature of the methylome differences between the B-CIMP subgroups and characterize the effects of these differences on the breast cancer transcriptome. Probes were filtered for analysis by ranking transformed beta-values using decreasing adjusted p-values and increasing beta-value difference to identify the top most differentially hypermethylated genes in the B-CIMP group. Of the 3297 CpG sites that were differentially methylated between CIMP+ and CIMP− tumors, 2333 (71%) were hypermethylated (Tables 5-7). There were 2543 unique genes represented within this group, including 1764 that were hypermethylated and 779 that were hypomethylated (See Table 13).
Affymetrix transcriptome data were obtained from the same breast tumors analyzed for methylation to determine genes demonstrating differential expression and B-CIMP methylation. A total of 279 genes were significantly downregulated and 238 genes were significantly upregulated (Table 8). Gene ontology (GO) analysis showed that the significantly upregulated genes were highly enriched for functional categories involving cell motion, angiogenesis, apoptosis, development, kinase activity, and DNA binding (FIG. S4A-B). The downregulated genes were enriched for functional categories involved in mitosis, cytokinesis, exocytosis, chromosomal segregation, transcription factor activity, and kinase activity. Integration of the normalized gene expression and DNA methylation gene sets identified 102 genes with both significant hypermethylation and downregulation in B-CIMP-positive tumors (Table 3). Among these genes are LYN, MMP7, KLK10, and WNT6, which are known to play a role in breast cancer outcome or epithelial-mesenchymal transition (EMT) (35,36,37,38). GO analysis showed B-CIMP-specific downregulation of genes (hypermethylated and downregulated in B-CIMP) is associated with cell motion, development, signaling, and catalytic activity as some of the most significant functional categories.
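The integration step described above amounts to intersecting two gene sets: genes that are significantly hypermethylated and genes that are significantly downregulated. The Python sketch below illustrates that intersection under stated assumptions. The function name, the input layout, the sign convention (differences taken as B-CIMP-positive minus B-CIMP-negative), the 0.05 cutoff applied to expression, and the placeholder genes and values are all hypothetical and are not taken from the study data.

```python
def repressed_by_methylation(methylation, expression, alpha=0.05):
    """Return genes that are both hypermethylated and downregulated.

    methylation: gene -> (mean beta-value difference, FDR-adjusted p-value)
    expression:  gene -> (log2 fold change, FDR-adjusted p-value)
    Differences are assumed to be CIMP-positive minus CIMP-negative.
    """
    hypermethylated = {
        gene for gene, (delta_beta, q) in methylation.items()
        if delta_beta > 0 and q < alpha
    }
    downregulated = {
        gene for gene, (log_fc, q) in expression.items()
        if log_fc < 0 and q < alpha
    }
    return sorted(hypermethylated & downregulated)


# Placeholder genes with made-up statistics, for illustration only.
methylation = {
    "GENE_A": (0.35, 0.001),   # hypermethylated, significant
    "GENE_B": (0.20, 0.20),    # hypermethylated, not significant
    "GENE_C": (-0.30, 0.002),  # hypomethylated
    "GENE_D": (0.25, 0.01),    # hypermethylated, significant
}
expression = {
    "GENE_A": (-1.8, 0.003),   # downregulated
    "GENE_B": (-2.0, 0.001),   # downregulated
    "GENE_C": (-1.1, 0.004),   # downregulated
    "GENE_D": (0.9, 0.02),     # upregulated
}
print(repressed_by_methylation(methylation, expression))  # ['GENE_A']
```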
Although mRNA expression signatures have been developed to help predict the risk of metastatic disease in breast cancer patients, the genomic foundations for these differences in gene expression are incompletely understood (17, 39, 40). Few genetic changes have been shown to be causally related to these transcriptional differences. Since B-CIMP status affects metastatic risk, we wondered whether methylation helps account for the transcriptome diversity underlying common breast cancer prognostic expression signatures. To address this question, we performed concepts mapping analysis as previously described (41). Remarkably, the methylated and down-regulated genes comprising the transcriptomic footprint of B-CIMP (B-CIMP repression signature) were markedly enriched among the most differentially expressed genes defining prognosis in multiple breast cancer cohorts. Low expression of genes comprising the B-CIMP repression signature was seen in tumors that did not metastasize and high expression of the signature was seen in tumors which metastasized and/or resulted in poor survival (Tables 7 and 8). Importantly, we observed highly significant associations between B-CIMP genes and breast cancer relapse expression signatures from multiple independent data sets, confirming the validity of our findings. Using the van't Veer cohort, we demonstrated that the presence of the B-CIMP repression signature strongly predicted survival (
To elucidate the differences in the methylation landscape between the two epigenomic subclasses, we mapped regions of the most significant methylation differences between CIMP+ and CIMP− tumors across the genome. Dense clusters of methylation were apparent in the arms of a number of chromosomes. Gene set enrichment analysis (GSEA) of the differentially methylated genes showed a highly significant enrichment for polycomb repressive complex 2 (PRC2) targets (Table 3), the vast majority being CpG island-containing genes (42). B-CIMP genes were highly enriched in many PRC2 occupancy data sets, including those involving H3K27 methylation, SUZ12 and EZH2, in both stem cells and cancer cells. GSEA using the Broad molecular signature database demonstrated that CIMP genes were most significantly enriched in polycomb (PcG) occupancy data sets, although other processes were also implicated, including EMT and Wnt signaling, which are known to play a role in metastasis (Tables 9 and 10) (43). It has been shown that the presence of a bivalent chromatin mark involving the key PcG mark, trimethylated H3K27, in stem cells may predispose specific genes to become hypermethylated and silenced in cancer and may be indicative of a contribution of stem cells to the derivation of specific cancers (44, 45). This process may be active in breast tumors of the B-CIMP subclass.
A further classifier set was also developed that allows a limited number of genes to be used for prediction of metastatic risk in multiple cancer types, specifically breast cancer, colon cancer, and glioma. We compared the CIMP-associated loci from breast cancer, colon cancer, and glioma (publicly available from The Cancer Genome Atlas, http://cancergenome.nih.gov). CIMP-associated genes were defined for glioma and colon cancer using the same methodology as above and were consistent with previous data (1, 2). Colon CIMP genes were derived from MSKCC tumors (n=24) using hierarchical clustering and confirmed as described in Materials and Methods and in Weisenberger et al. (2006). All data sets were generated using the same Infinium HumanMethylation27 platform and were directly comparable. We first wished to determine whether CIMP selectively targeted PcG targets not only in breast cancer but in other malignancies as well. All methylated loci (beta-value FDR-corrected p-value<0.05) in the three tumor types were compared with previously generated global PcG target gene sets (46, 47) (Table 11). Highly significant overlap was observed between CIMP and PcG targets in breast cancer, glioma, and colon cancer, potentially indicating that CIMP may employ similar processes across cancer types. Using the 33 most significant common predictors of CIMP, we generated a consensus signature for CIMP-positivity across these tumor types (Table 12). As CIMP imparts a favorable clinical prognosis in breast cancer, colon cancer (CIMP-hi, microsatellite unstable) (2, 48), and gliomas (1), this epigenomic signature can be used as an indicator of outcome across multiple human malignancies.
Our findings have several important implications for the understanding of breast and other cancers. First, we have definitively identified distinct epigenomic subtypes of breast cancer and documented the existence of a global CIMP in breast cancer. Aberrant hypermethylation of genes has been described in breast cancer previously (26, 49-51) and the methylation state of specific genes has been linked to outcome (52-54). However, the existence of a global CpG island methylator phenotype has remained elusive prior to our study. Our global approach robustly identifies B-CIMP as a characteristic of a subset of hormone receptor-positive tumors. B-CIMP+ tumors demonstrated a lower propensity for metastasis and a better clinical outcome than B-CIMP− tumors. Interestingly, the association of better clinical outcome with CIMP+ tumors can be seen across multiple malignancies (breast, colon, and glioma) (2, 23). In these tumors, it may be that the epigenomic defects causing CIMP initially help promote neoplastic transformation but inactivate genes that may facilitate tumor aggressiveness in later stages of cancer progression. It is important to note, however, that the association of methylation at CIMP genes with good clinical outcome is not universally applicable to methylation at all genes. Methylation of specific candidate genes or groups of genes has been associated with poorer prognosis and these may have an effect on tumor aggressiveness independent of CIMP (27, 53, 55-57). Interestingly, genes such as these, including CDKN2A, PTPRD, and BRCA1, were not included among the B-CIMP loci.
Second, the genomic basis of prognostic transcriptional signatures is unclear. Importantly, our data demonstrate, to our knowledge for the first time, that aberrations in the DNA methylome explain many of the mRNA expression differences that underlie these signatures. The tight association of these changes with a genome-wide concerted hypermethylation phenotype and their enrichment for polycomb targets argue against the inactivation of these genes being sporadic events. Rather, the B-CIMP phenotype is consistent with a global, systematic derangement in epigenetic regulation. Importantly, the methylome profiles we have derived and the associated CIMP repression signature provide a previously unknown mechanistic link between breast cancers with differing metastatic behavior and transcriptional signatures that predict metastatic relapse. It is important to note that, although we show that methylation-associated gene silencing underlies many metastasis-associated gene expression changes, genetic changes are undoubtedly important as well. Indeed, mutations of a number of genes such as BRCA1, PTEN, and ERBB2 have been shown to be associated with an increased risk of metastasis (5, 58, 59). The relationship between these mutations and the B-CIMP phenotype is unclear and it is very likely that both genetic and epigenetic alterations contribute to the metastatic phenotype. Interestingly, BRCA1 has recently been shown to up-regulate DNMT1, which may help explain the association between BRCA1 mutation, basal-type tumors, and the lack of methylation we have observed in our study among hormone receptor-negative breast cancers (60). Future studies will be required to define any potential causal relationship between mutations and derangements in the epigenomic landscape.
Breast tumors (discovery set n=39, validation set n=132) from the Memorial Sloan-Kettering Cancer Center were obtained following patient consent and with institutional review board (IRB) approval. For the primary breast tumor data, tissues from primary breast cancers were obtained from therapeutic procedures performed as part of routine clinical management. Source DNAs or RNAs were extracted from frozen or paraffin-embedded primary tumors for the methylation and expression studies. Frozen samples were snap-frozen in liquid nitrogen and were stored at −80° C. Each sample was examined histologically with H&E-stained cryostat sections. Regions were microdissected from the slides to provide a consistent tumor cell content of more than 70% in tissues used for analysis. Genomic DNA was extracted using the QIAamp DNA Mini kit or the QIAamp DNA FFPE Tissue kit (Qiagen) according to the manufacturer's instructions. RNA was extracted using Trizol (Invitrogen) according to the manufacturer's directions. Nucleic acid quality was determined using the Agilent 2100 Bioanalyzer. Nucleic acids from the discovery set were used for methylation and expression analysis as described below.
Genome-wide methylation analysis was performed using the Illumina Infinium HumanMethylation27 bead array. Bisulphite conversion of genomic DNA was done with the EZ DNA Methylation Kit (Zymo Research) by following the manufacturer's protocol with modifications for the Illumina Infinium Methylation Assay. Briefly, 1 μg of genomic DNA was mixed with 5 μl of M-Dilution Buffer and incubated at 37° C. for 15 minutes and then mixed with 100 μl of CT Conversion Reagent prepared as instructed in the protocol. Bisulphite-converted DNA samples were desulphonated and purified. Bisulphite-converted samples were used for microarray or EpiTYPER analysis. Bisulphite-converted genomic DNA was analyzed using the Infinium Human Methylation27 Beadchip Kit (Illumina, WG-311-1202) by the MSKCC Genomics Core. Processing of the array was per the manufacturer's protocol. Briefly, 4 μl of bisulphite-converted genomic DNA was denatured in 0.014N sodium hydroxide, neutralized and amplified with reagents from the kit and buffer for 20-24 hours at 37° C. Each sample was loaded onto a 12-sample array. After incubation at 48° C. for 16-20 hours, chips were washed with buffers provided in the kit and placed into a fluid flow-through station for primer-extension reaction. Chips were image-processed using Illumina's iScan scanner. Data were extracted using GenomeStudio software (Illumina). Methylation values for each CpG locus are expressed as a beta (β)-value, representing a continuous measurement from 0 (completely unmethylated) to 1 (completely methylated). This value is based on the following calculation: β-value = (signal intensity of methylation-detection probe)/(signal intensity of methylation-detection probe + signal intensity of non-methylation-detection probe). Methylation analysis controls included in vitro methylated DNA (positive control) (61) and human HCT 116 DKO DNA (from DNA methyltransferase double knock-out cells, DNMT1 and DNMT3b) (62).
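As a worked illustration of the beta-value calculation stated above, the following Python sketch computes the ratio for a single CpG locus. The function name `beta_value` and the example intensities are hypothetical. The sketch follows the formula exactly as written in the text and does not model any stabilising constant that array software may add to the denominator.

```python
def beta_value(methylated_signal, unmethylated_signal):
    """Return methylated / (methylated + unmethylated) for one CpG locus.

    The result ranges from 0 (completely unmethylated) to 1 (completely
    methylated). Input intensities are assumed to be non-negative.
    """
    if methylated_signal < 0 or unmethylated_signal < 0:
        raise ValueError("signal intensities must be non-negative")
    total = methylated_signal + unmethylated_signal
    if total == 0:
        raise ValueError("total signal intensity must be positive")
    return methylated_signal / total


# Hypothetical probe intensities for one locus.
print(beta_value(3000.0, 1000.0))  # 0.75
```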
Methods for RNA extraction, labeling, and hybridization for DNA microarray analysis have been described previously (39). Briefly, complementary DNA was synthesized from total RNA using a T7 promoter-tagged dT primer. All gene expression analysis was carried out using the Affymetrix Human Genome U133A 2.0 microarray. Image acquisition was performed using an Affymetrix GeneChip scanner. Fluorescence intensities were background-corrected, mismatch-adjusted, normalized, and summarized to yield log2-transformed gene expression data.
For expression analysis, the Affymetrix data were imported into the Partek Genomics Suite (Partek). Data were normalized, log-transformed, and median-centered for analysis. Analysis of variance (ANOVA) followed by false discovery rate (FDR) correction (63, 64) was used to identify genes that were differentially expressed between the CIMP groups (see Table 14). Hierarchical clustering was performed using either Euclidean distance or Pearson correlation. SigClust significance, as implemented in the R package sigclust, was used as described in (33). For Gene Ontology analysis, functional analysis of gene lists was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) (65, 66) and the PANTHER functional annotation classes. PANTHER categories with adjusted p-values (FDR-corrected with Benjamini-Hochberg)<0.05 were considered significantly over-represented in our gene lists.
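For illustration, a Benjamini-Hochberg adjustment of the kind named above can be sketched as follows. This is a minimal sketch with hypothetical p-values, not the Partek, DAVID, or PANTHER implementation; the function name is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four categories.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < 0.05 for p in adjusted]
```

Under the threshold stated above, an entry is retained when its adjusted p-value is below 0.05.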
For methylation analysis, Illumina data were imported into Partek using custom software. Beta-values were logit-transformed and mean-centered prior to analysis. ANOVA with false discovery rate (FDR) correction (63, 64) was used to identify genes that were differentially methylated between the CIMP groups. Probes with an FDR-corrected p-value<0.05 were considered significantly differentially methylated between the two sets of tumors. The beta-value difference between the two groups was computed by first calculating the mean beta-value across each group and then calculating the difference between the mean beta-values for each probe. The hierarchical clustering of the methylation data was performed as above using the top 5% most variant probes across the samples (defined by standard deviation). K-means consensus clustering was performed using the R statistical package. The optimum cluster number was identified by varying K and evaluating the K-means output for significance of iterations. The top 5% of the most variably methylated probes between CIMP subgroups were retained, resulting in a 1359-gene by 39-sample matrix. Consensus clustering was performed on this matrix with k-means clustering (Kmax=9) using Euclidean distance and average linkage over 1000 resampling iterations with random restart (as implemented in GenePattern v3.2.3) (67). The consensus matrix for K=2 was imported into R statistical software (v.2.11.1) and the heatmap was visualized using the gplots and colorRamps packages in Bioconductor v2.6.
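The preprocessing steps described above (logit transformation, the per-probe difference of group means, and selection of the most variable probes by standard deviation) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the clamping constant `eps`, and the probe names are hypothetical, and the sketch does not reproduce the Partek or GenePattern workflows.

```python
import math
import statistics


def logit(beta, eps=1e-6):
    """Logit-transform a beta-value; eps (an assumption) keeps 0 and 1 finite."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log(b / (1.0 - b))


def mean_beta_difference(betas, group_a, group_b):
    """Mean beta-value over group_a sample indices minus that over group_b."""
    mean_a = statistics.mean(betas[i] for i in group_a)
    mean_b = statistics.mean(betas[i] for i in group_b)
    return mean_a - mean_b


def most_variable_probes(matrix, fraction=0.05):
    """Return probe names in the top `fraction` by standard deviation."""
    ranked = sorted(
        matrix, key=lambda name: statistics.stdev(matrix[name]), reverse=True
    )
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical beta-values: probe name -> one value per sample.
example = {
    "probe_1": [0.1, 0.9, 0.1, 0.9],
    "probe_2": [0.5, 0.5, 0.5, 0.5],
}
top = most_variable_probes(example, fraction=0.5)
```

In this sketch, the retained probes would then be passed to whichever clustering routine is used.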
For identification of CIMP genes in colon cancer and glioblastoma, analysis was performed as follows. Methylation data for colon cancer were downloaded from The Cancer Genome Atlas (TCGA) data portal and imported into R statistical software. Hierarchical clustering was performed as described above for the breast cancer data using the top 5% most variant probes. Iterations using the top 3% to 20% did not significantly alter the clustering results. The cluster results were confirmed using the methylation β-values of the 5-gene panel described by Weisenberger et al. to identify CIMP+ tumors in colorectal cancers (2). The cluster of samples that exhibited hypermethylation of these marker genes was selected as CIMP positive and used for further analyses. These samples corresponded to the cluster with high coordinate hypermethylation derived by hierarchical clustering. The glioblastoma CIMP genes were identified as described in (1). Datasets are deposited in the Gene Expression Omnibus and at www.cbio.mskcc.org. The Cancer Genome Atlas Project GBM cancer datasets are publicly available at www.cancergenome.nih.gov.
Concepts module mapping was performed as follows. The methylation signature identified from our analysis (table S1) was imported into Oncomine (http://www.oncomine.org) to search for associations with molecular concepts signatures derived from independent cancer profiling studies. We report statistically significant overlaps of our methylation gene signature with the top-ranking gene expression signatures of clinical outcome using percentile cutoffs (10%). Q-values were calculated as previously described (41).
Gene Set Enrichment Analysis (GSEA) was performed using GSEA (68) software v2.0.7 and MSigDB database v2.5 (68). We assessed the significance of the gene sets with the following parameters: number of permutations=1000 and permutation_type=phenotype, with an FDR q-value cut-off of 25%. The most differentially expressed genes from statistically significant gene sets were identified using the "leading edge subset," which consists of the genes that contribute most to the enrichment score of a particular gene set. Enrichment of gene sets downloaded from the literature (as referenced in table S8) was analyzed together with the curated gene sets (MSigDB collection c2) or within each other.
DNA methylation analysis was carried out using the EpiTYPER system (Sequenom). The EpiTYPER assay is a tool for the detection and quantitative analysis of DNA methylation using base-specific cleavage of bisulfite-treated DNA and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) (69). Specific PCR primers for bisulfite-converted DNA were designed using the EpiDesigner software (www.epidesigner.com) for the entire CpG island of the genes of interest. T7-promoter tags were added to the reverse primer to obtain a product that can be in vitro transcribed, and a 10-mer tag was added to the forward primer to balance the PCR conditions. For primer sequences, target chromosomal sequence, and EpiTYPER-specific tags, see table S2. One μg of tumor DNA was subjected to bisulfite treatment using the EZ-96 DNA methylation kit (Zymo), which results in the conversion of unmethylated cytosines into uracil, following the manufacturer's instructions. PCR reactions were carried out in duplicate for each of the 2 selected primer pairs, for a total of 4 replicates per sample. For each replicate, 1 μl of bisulfite-treated DNA was used as template for a 5 μl PCR reaction in a 384-well microtiter PCR plate, using 0.2 units of Kapa2G Fast HotStart DNA polymerase (Kapa Biosystems), 200 μM dNTPs, and 400 nM of each primer. Cycling conditions were: 94° C. for 15 minutes; 45 cycles of 94° C. for 20 seconds, 56° C. for 30 seconds, and 72° C. for 1 minute; and 1 final cycle at 72° C. for 3 minutes. Unincorporated dNTPs were deactivated using 0.3 U of shrimp alkaline phosphatase (SAP) in 2 μl, at 37° C. for 20 minutes, followed by heat inactivation at 85° C. for 5 minutes.
Two μl of SAP-treated reaction were transferred into a fresh 384-well PCR plate, and in vitro transcription and T cleavage were carried out in a single 5 μl reaction mix, using the MassCleave kit (Sequenom) containing 1×T7 polymerase buffer, 3 mM DTT, 0.24 μl of T Cleavage mix, 22 units of T7 RNA and DNA polymerase, and 0.09 mg/ml of RNase A. The reaction was incubated at 37° C. for 3 h. After the addition of a cation exchange resin to remove residual salt from the reactions, 10 nl of EpiTYPER reaction product were loaded onto a 384-element SpectroCHIP II array (Sequenom). SpectroCHIPs were analyzed using a Bruker Biflex III matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometer (SpectroREADER, Sequenom). Results were analyzed using the EpiTYPER Analyzer software and manually inspected for spectra quality and peak quantification. CIMP positivity was defined as a mean methylated allelic frequency of >50% or a two-fold increase over normal breast tissue and the CIMP-negative state.
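The CIMP positivity rule stated above can be sketched as a simple decision function. This is a hedged sketch: the function name and the input values are hypothetical, and reading the two-fold criterion as requiring at least a two-fold increase over both the normal-tissue reference and the CIMP-negative reference is an assumption about how the rule is applied.

```python
import statistics


def is_cimp_positive(replicate_frequencies, normal_mean, cimp_negative_mean):
    """Apply the >50% or two-fold rule to one sample's replicate measurements.

    All frequencies are methylated allelic frequencies on a 0-1 scale.
    """
    mean_methylation = statistics.mean(replicate_frequencies)
    if mean_methylation > 0.50:
        return True
    references = (normal_mean, cimp_negative_mean)
    if min(references) <= 0:
        # Fold change is undefined against a zero reference (assumption:
        # only the 50% threshold applies in that case).
        return False
    return all(mean_methylation >= 2.0 * ref for ref in references)


# Hypothetical values for four replicates of one sample.
call = is_cimp_positive([0.38, 0.42, 0.40, 0.40], normal_mean=0.10,
                        cimp_negative_mean=0.15)
```

The four input values in the example correspond to the four PCR replicates per sample described above.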
The 295-sample set of Van't Veer microarray data (NKI295) was downloaded from the Rosetta Inpharmatics website (17). Seventy of the 102 genes in our methylation signature were represented in NKI295 and were used to test for prognostic significance. An average expression value for the gene set hypermethylated and downregulated in CIMP was calculated for each sample of NKI295 (see Table 15). A two-way classifier was developed by separating the patients into two groups based on the average expression value of our methylation signature: CIMP repression signature up-regulated if the average expression value was >0 and CIMP repression signature down-regulated otherwise. Kaplan-Meier curves comparing survival of patient subgroups were generated using SPSS statistical software.
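The two-way classifier described above can be sketched as follows. This is an illustrative sketch only: the function name and gene names are hypothetical, and the expression values are assumed to be on a centered log scale so that 0 is a meaningful cutoff.

```python
import statistics


def classify_by_signature(expression, signature_genes):
    """Classify one sample by the mean expression of the signature genes.

    expression: dict mapping gene name -> centered log expression value.
    Signature genes absent from the sample are skipped.
    """
    values = [expression[g] for g in signature_genes if g in expression]
    if not values:
        raise ValueError("no signature genes are present in this sample")
    return "up-regulated" if statistics.mean(values) > 0 else "down-regulated"


# Hypothetical sample; GENE_D is in the signature but not on the array.
sample = {"GENE_A": 0.4, "GENE_B": -0.1, "GENE_C": 0.3}
label = classify_by_signature(sample, ["GENE_A", "GENE_B", "GENE_D"])
```

Skipping absent genes mirrors the situation described above, in which only a subset of the signature genes was represented in the external data set.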
Evaluation of the extent of CIMP for a given gene can be determined using variations of bisulfite sequencing. Methylation in CpG islands occurs on cytosine bases within the sequences. Bisulfite conversion of the nucleic acid converts unmethylated cytosines to uracil, while methylated cytosines are left unconverted. Thus, sequencing of the bisulfite conversion product and comparison with a reference sequence for the gene identifies the bases that were methylated in the sample sequences. This type of procedure can be done using any type of assay platform that can distinguish between sequences containing Cs and sequences containing Us. This includes amplification of the relevant region and complete sequencing, high stringency hybridization assays that detect binding, high stringency amplification where the primer overlaps with the CpG island and amplifies only in the absence or presence of methylation, and similar techniques. One particular technique makes use of an Illumina HumanMethylation27 bead array, or a scaled-down variant in which the probe sets used are those that provide information concerning genes methylated in IDC breast cancers with metastatic potential. This technique looks at 2 CpG sites per CpG island, although more sites would be evaluated in a more focused assay. See also US Patent Publication No. 2010/0209906 relating to detection of methylation in colon cancer.
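The principle of comparing a bisulfite conversion product with a reference sequence can be sketched as follows. This is a toy illustration with a hypothetical sequence and hypothetical function names; it assumes complete conversion of unmethylated cytosines to uracil, with methylated cytosines still read as C, and a converted read already aligned base-for-base to the reference.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate conversion: unmethylated C becomes U, methylated C stays C."""
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )


def call_methylated_positions(reference, converted):
    """Return positions where the reference C is still read as C."""
    if len(reference) != len(converted):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, converted))
        if ref_base == "C" and read_base == "C"
    ]


# Hypothetical reference with cytosines at positions 1, 4, 7 and 8.
reference = "ACGTCGACCG"
converted = bisulfite_convert(reference, methylated_positions={1, 8})
print(converted)                                        # ACGTUGAUCG
print(call_methylated_positions(reference, converted))  # [1, 8]
```

Any platform that distinguishes C from U at the positions of interest supplies the same information as the comparison in this sketch.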
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/US11/66549 | 12/21/2011 | WO | 00 | 1/8/2014

Number | Date | Country
---|---|---
61425610 | Dec 2010 | US