Aspects of the present invention relate generally to colorectal cancer (CRC), and more particularly to methods and compositions (e.g., gene marker panels) for at least one of diagnosis, identification and classification of CRC. Further aspects relate to marker identification based on a comprehensive genome-scale analysis of aberrant DNA methylation and/or gene expression in CRC. Particular aspects relate to identification and/or classification of colorectal tumors, corresponding to distinctive DNA methylation-based subgroups of CRC including CpG island methylator phenotype (CIMP) groups and non-CIMP groups. Further aspects relate to correlations of genetic mutations and other epigenetic markers with said CRC subgroups for at least one of diagnosis, identification and classification of CRC including CIMP groups and non-CIMP groups.
A Sequence Listing (in .txt format) comprising SEQ ID NOS:1-278 was filed as part of this application, and is incorporated by reference herein in its entirety.
Colorectal cancer (CRC) arises through the accumulation of multiple genetic and epigenetic changes. Somatic mutations in APC, BRAF, KRAS, PIK3CA, TP53 and other genes have been frequently observed in CRC and are considered to be drivers of colorectal tumorigenesis (Wood et al., 2007). In addition, the majority of sporadic CRCs (65-70%) display chromosomal instability (CIN), characterized by aneuploidy, amplifications and deletions of subchromosomal genomic regions and loss of heterozygosity (LOH) (Pino and Chung, 2010).
Two major types of epigenetic modifications closely linked to CRC are DNA methylation and covalent histone modifications (Jones and Baylin, 2007). Aberrant DNA methylation of CpG islands has been reported in the earliest detectable lesions in the colonic mucosa, aberrant crypt foci (ACF) (Chan et al., 2002). Promoter CpG island DNA hypermethylation is associated with transcriptional gene silencing, and can cooperate with other genetic mechanisms to alter key signaling pathways critical to colorectal tumorigenesis (Baylin and Ohm, 2006). A recent large-scale comparison between genes mutated and hypermethylated in CRC revealed significant overlap between these two alterations (Chan et al., 2008). Importantly, DNA hypermethylation appeared to be the preferred mechanism when a gene can be inactivated by either mutation or promoter DNA hypermethylation.
New insights into the mechanisms and the role of CpG island hypermethylation in cancer have emerged from recent studies using integrated analyses of the two types of epigenetic modifications. We and other groups have reported that genes that are targeted by Polycomb group (PcG) proteins in embryonic stem (ES) cells are susceptible to cancer-specific DNA hypermethylation (Ohm et al., 2007; Schlesinger et al., 2007; Widschwendter et al., 2007). PcG target genes are characterized by trimethylation of histone H3 lysine 27 (H3K27me3), are maintained at a low expression state and are poised to be activated during development (Bernstein et al., 2007). More recently, it has been found that genes targeted by H3K27me3 in normal tissues acquire DNA methylation and lose the H3K27me3 mark in cancer (Gal-Yam et al., 2008; Rodriguez et al., 2008). Importantly, epigenetic switching of H3K27me3 and DNA methylation mainly occurs at genes that are not expressed in normal tissues. Furthermore, cancer-specific H3K27me3-mediated gene silencing has also been shown to inactivate tumor suppressor genes independent of DNA hypermethylation in CRC (Jiang et al., 2008; Kondo et al., 2008).
Colorectal tumors with a CpG island methylator phenotype (CIMP) exhibit a high frequency of cancer-specific DNA hypermethylation at a subset of genomic loci and are highly enriched for activating mutation of BRAF (BRAFV600E) (Weisenberger et al., 2006). CRCs with CIN and CIMP have been shown to be inversely correlated (Goel et al., 2007; Cheng et al., 2008) and appear to develop in two separate pathways (Leggett and Whitehall, 2010). DNA hypermethylation of some CIMP-associated gene promoters has been detected in early stages of colorectal tumorigenesis (Ibrahim et al., 2011). Furthermore, extensive promoter DNA hypermethylation has been observed in the histologically normal colonic mucosa of patients predisposed to multiple serrated polyps, the proposed precursors of CIMP tumors (Young and Jass, 2006). Notably, some of the distinct genetic and histopathological characteristics associated with CIMP tumors may be directly attributable to CIMP-mediated gene silencing. Applicants have reported that CIMP-associated DNA hypermethylation of MLH1 is the dominant mechanism for the development of sporadic CRC with microsatellite instability (MSI) (Weisenberger et al., 2006). Furthermore, the CIMP-specific inactivation of IGFBP7-mediated senescence and apoptosis pathways may provide a permissive environment for the acquisition of BRAF mutations in CIMP-positive tumors (Hinoue et al., 2009; Suzuki et al., 2010).
Recent studies from several groups indicated that colorectal tumors with KRAS mutations may also be associated with a unique DNA methylation profile. CIMP-low (CIMP-L) tumors were originally shown to exhibit DNA hypermethylation of a reduced number of CIMP-defining loci (Ogino et al., 2006). CIMP-L was significantly associated with KRAS mutations, was observed more commonly in men than women and appeared to be independent of MSI status. Shen and colleagues described the CIMP2 subgroup, which also showed DNA hypermethylation of CIMP-associated loci, but was highly correlated (92%) to KRAS mutations and not associated with MSI (Shen et al., 2007). Yagi et al. recently reported the intermediate-methylation epigenotype (IME), which was also associated with KRAS mutations (Yagi et al., 2010).
In light of these findings, there is confusion in the art with regard to DNA methylation subtypes in CRC. It is not established whether CIMP-L, CIMP2 or IME represent unique DNA methylation-based subgroups in CRC, as limited numbers of genomic regions were used to derive membership in these studies. Moreover, the types of genes targeted for DNA methylation in each subgroup and the effects of DNA hypermethylation on gene expression in each subtype have not yet been fully explored.
In particular aspects, four distinct DNA methylation subgroups were identified and characterized in CRC by performing comprehensive, genome-scale DNA methylation profiling of 125 primary colorectal tumors and 29 adjacent non-tumor colonic mucosa samples using the Illumina Infinium DNA methylation assay.
In certain aspects, Applicants developed diagnostic DNA methylation gene marker panels to identify CIMP (CIMP-H and CIMP-L), as well as to segregate CIMP-H tumors from CIMP-L tumors based on the Infinium DNA methylation data.
In particular aspects, a CIMP-defining marker panel consisting of B3GAT2, FOXL2, KCNK13, RAB31 and SLIT1 was identified. Using the conditions that DNA methylation of three or more markers qualifies a sample as CIMP, this panel identifies CIMP-H and CIMP-L tumors with 100% sensitivity and 95.6% specificity with 2.4% misclassification using a β-value threshold of ≧0.1.
In particular aspects, a second marker panel of FAM78A, FSTL1, KCNC1, MYOCD, and SLC6A4 specifically identifies CIMP-H tumors with 100% sensitivity and 100% specificity (0% misclassification) using conditions that three or more markers show DNA methylation β-value threshold of ≧0.1.
In certain aspects, a tumor sample is classified as CIMP-H if both marker panels are positive (three or more markers with DNA methylation for each panel).
In further aspects, a tumor sample is classified as CIMP-L if the CIMP-defining marker panel is positive while the CIMP-H specific panel is negative (0-2 genes methylated).
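By way of non-limiting illustration, the two-panel decision rule described above can be sketched in Python as follows. The gene panels, the β-value threshold of 0.1 and the three-or-more-markers condition are taken from the text; the input layout (one representative β-value per marker gene), the function names, and the handling of a sample that is positive only for the CIMP-H panel are assumptions made for this sketch and are not specified herein.

```python
from typing import Iterable, Mapping

# Panels and thresholds as stated in the text above.
CIMP_PANEL = ("B3GAT2", "FOXL2", "KCNK13", "RAB31", "SLIT1")
CIMP_H_PANEL = ("FAM78A", "FSTL1", "KCNC1", "MYOCD", "SLC6A4")
BETA_THRESHOLD = 0.1  # a marker counts as methylated when beta >= 0.1
MIN_POSITIVE = 3      # a panel is positive when >= 3 markers are methylated


def count_methylated(betas: Mapping[str, float], panel: Iterable[str]) -> int:
    """Count the panel markers whose beta-value meets the threshold."""
    return sum(1 for gene in panel if betas[gene] >= BETA_THRESHOLD)


def classify_cimp(betas: Mapping[str, float]) -> str:
    """Return 'CIMP-H', 'CIMP-L' or 'non-CIMP' for one tumor sample.

    `betas` maps each marker gene to one beta-value (hypothetical layout).
    """
    cimp_positive = count_methylated(betas, CIMP_PANEL) >= MIN_POSITIVE
    cimp_h_positive = count_methylated(betas, CIMP_H_PANEL) >= MIN_POSITIVE
    if cimp_positive and cimp_h_positive:
        return "CIMP-H"
    if cimp_positive:
        return "CIMP-L"
    # Assumption: a sample that fails the CIMP-defining panel is reported
    # as non-CIMP even if the CIMP-H panel alone happens to be positive.
    return "non-CIMP"


# Hypothetical beta-values, for illustration only.
example = {**{g: 0.45 for g in CIMP_PANEL}, **{g: 0.03 for g in CIMP_H_PANEL}}
print(classify_cimp(example))
```

In this sketch the example sample is positive for the CIMP-defining panel and negative for the CIMP-H specific panel, so it is reported as CIMP-L.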
Gene expression data was also obtained for paired tumor and adjacent normal samples in order to assess the biological implications of DNA methylation-mediated gene silencing in CRC.
Preferred Exemplary Embodiments.
Preferred aspects provide methods for at least one of diagnosing, detecting and classifying a colorectal cancer belonging to a distinct colorectal cancer (CRC) subgroup having frequent CpG island hypermethylation (CIMP CRC), comprising: determining, by analyzing a human subject biological sample comprising colorectal cancer (CRC) cell genomic DNA using a suitable assay, a CpG methylation status of at least one CpG dinucleotide from each gene of the gene marker panel of B3GAT2, FOXL2, KCNK13, RAB31 and SLIT1 (CIMP marker panel); wherein CpG hypermethylation, relative to normal control values, of at least three genes of the CIMP marker gene panel is indicative of a frequent CpG island hypermethylation colorectal cancer subgroup (CIMP CRC), and wherein a method of at least one of diagnosing, detecting and/or classifying a colorectal cancer belonging to the distinct colorectal cancer (CRC) subgroup having frequent CpG island hypermethylation (CIMP CRC) is afforded. In certain aspects, the CpG island hypermethylation colorectal cancer (CIMP CRC) comprises both CIMP-H and CIMP-L subgroups of CIMP. In particular embodiments, CIMP-H and CIMP-L tumors are identified with about 100% sensitivity and about 95.6% specificity with about 2.4% misclassification using conditions that three or more markers show DNA methylation β-value threshold of ≧0.1, as defined herein.
In certain aspects of the methods disclosed herein, determining a CpG methylation status of at least one CpG dinucleotide from each gene of the gene marker panel of B3GAT2, FOXL2, KCNK13, RAB31 and SLIT1 (CIMP marker panel), comprises determining a CpG methylation status of at least one CpG dinucleotide from each of: at least one of SEQ ID NOS:45, 46 and 278 (B3GAT2 promoter, CpG island and amplicon, respectively); at least one of SEQ ID NOS:40, 41 and 240 (FOXL2 promoter, CpG island and amplicon, respectively); at least one of SEQ ID NOS:25, 26 and 224 (KCNK13 promoter, CpG island and amplicon, respectively); at least one of SEQ ID NOS:35, 36 and 236 (RAB31 promoter, CpG island and amplicon, respectively); and at least one of SEQ ID NOS:30, 31, 228 and 232 (SLIT1 promoter, CpG island and amplicons, respectively), respectively. Additional aspects further comprise determining, by analyzing the human subject biological sample using a suitable assay, a CpG methylation status of at least one CpG dinucleotide from each gene of an additional gene marker panel of FAM78A, FSTL1, KCNC1, MYOCD, and SLC6A4 (CIMP-H marker panel), wherein a CIMP-L subgroup of CIMP is indicated where the CIMP-defining marker panel is positive (hypermethylation of at least three genes of the CIMP marker gene panel) while the CIMP-H marker panel is negative (hypermethylation of only 0-2 genes of the CIMP-H marker gene panel), and wherein a CIMP-H subgroup of CIMP is indicated where both the CIMP-defining marker panel and the CIMP-H marker panel are positive (hypermethylation of at least three genes of each marker gene panel). In additional aspects, the methods further comprise determination of at least one of KRAS, BRAF and TP53 mutant status.
In certain aspects, the BRAF mutation status comprises mutation status at codon 600 in exon 15 (e.g., BRAFV600E), wherein the KRAS mutation status comprises mutation status at codon 12 and/or 13 in exon 2, and wherein the TP53 mutation status comprises mutation status at exons 4 through 8. In certain aspects, a positive mutation status comprises at least one of missense mutations, nonsense mutations, splice-site mutations, frame-shift mutations, and in-frame deletions. Yet additional aspects further comprise determining a MLH1 gene methylation status, wherein MLH1 hypermethylation is strongly associated with CIMP-H CRC. In particular embodiments of the methods disclosed herein, determining a CpG methylation status of at least one CpG dinucleotide from each gene of the gene marker panel of FAM78A, FSTL1, KCNC1, MYOCD, and SLC6A4 (CIMP-H marker panel), comprises determining a CpG methylation status of at least one CpG dinucleotide from each of: at least one of SEQ ID NOS:50, 51 and 247 (FAM78A promoter, CpG island and amplicon, respectively); at least one of SEQ ID NOS:65, 66, 259, 263 and 265 (FSTL1 promoter, CpG island and amplicons, respectively); at least one of SEQ ID NOS:60, 61 and 255 (KCNC1 promoter, CpG island and amplicon, respectively); at least one of SEQ ID NOS:55, 56 and 251 (MYOCD promoter, CpG island and amplicon, respectively); and at least one of SEQ ID NOS:70, 71, and 269 (SLC6A4 promoter, CpG island and amplicons, respectively), respectively. In certain embodiments, determining methylation status comprises treating the genomic DNA, or a fragment thereof, with one or more reagents (e.g., bisulfite, hydrogen sulfite, disulfite, and combinations thereof) to convert cytosine bases that are unmethylated in the 5-position thereof to uracil or to another base that is detectably dissimilar to cytosine in terms of hybridization properties.
Yet further aspects provide methods for at least one of diagnosing, detecting and classifying a colorectal cancer belonging to a distinct colorectal cancer (CRC) subgroup having frequent CpG island hypermethylation (CIMP CRC), comprising: determining, by analyzing a human subject biological sample comprising colorectal cancer (CRC) cell genomic DNA using a suitable assay, a CpG methylation status of at least one CpG dinucleotide from each gene of the gene marker panel of FAM78A, FSTL1, KCNC1, MYOCD, and SLC6A4 (CIMP-H marker panel); wherein CpG hypermethylation, relative to normal control values, of at least three genes of the CIMP-H marker gene panel is indicative of a CIMP-H subgroup of CIMP CRC, and wherein a method of at least one of diagnosing, detecting and classifying a colorectal cancer belonging to the CIMP-H subgroup of CIMP CRC is afforded. In certain aspects, CIMP-H tumors are identified with about 100% sensitivity and about 100% specificity (about 0% misclassification) using conditions that three or more markers show DNA methylation β-value threshold of ≧0.1, as defined herein. Certain aspects further comprise determination of at least one of KRAS, BRAF and TP53 mutant status. In certain aspects, the BRAF mutation status comprises mutation status at codon 600 in exon 15 (e.g., BRAFV600E), wherein the KRAS mutation status comprises mutation status at codon 12 and/or 13 in exon 2, and wherein the TP53 mutation status comprises mutation status at exons 4 through 8. In particular aspects, a positive mutation comprises at least one of missense mutations, nonsense mutations, splice-site mutations, frame-shift mutations, and in-frame deletions. Certain aspects further comprise determining a MLH1 gene methylation status, wherein MLH1 hypermethylation is strongly associated with CIMP-H CRC.
In certain aspects of the methods disclosed herein, determining a CpG methylation status of at least one CpG dinucleotide from each gene of the gene marker panel of FAM78A, FSTL1, KCNC1, MYOCD, and SLC6A4 (CIMP-H marker panel), comprises determining a CpG methylation status of at least one CpG dinucleotide from each of: at least one of SEQ ID NOS:50, 51 and 247 (FAM78A promoter, CpG island and amplicon, respectively); at least one of SEQ ID NOS:65, 66, 259, 263 and 265 (FSTL1 promoter, CpG island and amplicons, respectively); at least one of SEQ ID NOS:60, 61 and 255 (KCNC1 promoter, CpG island and amplicon, respectively); at least one of SEQ ID NOS:55, 56 and 251 (MYOCD promoter, CpG island and amplicon, respectively); and at least one of SEQ ID NOS:70, 71, and 269 (SLC6A4 promoter, CpG island and amplicons, respectively), respectively. In particular embodiments, determining methylation status comprises treating the genomic DNA, or a fragment thereof, with one or more reagents (e.g., bisulfite, hydrogen sulfite, disulfite, and combinations thereof) to convert cytosine bases that are unmethylated in the 5-position thereof to uracil or to another base that is detectably dissimilar to cytosine in terms of hybridization properties.
Yet additional aspects provide kits for performing the methods, comprising, for each gene of the gene marker panel of B3GAT2, FOXL2, KCNK13, RAB31 and SLIT1, at least two oligonucleotides whose sequences in each case are identical, are complementary, or hybridize under stringent or highly stringent conditions to the respective marker gene; and optionally comprising a bisulfite reagent (e.g., bisulfite, hydrogen sulfite, disulfite, and combinations thereof). In certain aspects of the kits disclosed herein, the respective marker gene sequences comprise at least one sequence from each of: at least one of SEQ ID NOS:45, 46 and 278 (B3GAT2 promoter, CpG island and amplicon, respectively); at least one of SEQ ID NOS:40, 41 and 240 (FOXL2 promoter, CpG island and amplicon, respectively); at least one of SEQ ID NOS:25, 26 and 224 (KCNK13 promoter, CpG island and amplicon, respectively); at least one of SEQ ID NOS:35, 36 and 236 (RAB31 promoter, CpG island and amplicon, respectively); and at least one of SEQ ID NOS:30, 31, 228 and 232 (SLIT1 promoter, CpG island and amplicons, respectively), respectively.
Further aspects provide kits suitable for performing the method comprising, for each gene of the gene marker panel of FAM78A, FSTL1, KCNC1, MYOCD, and SLC6A4, at least two oligonucleotides whose sequences in each case are identical, are complementary, or hybridize under stringent or highly stringent conditions to the respective marker gene; and optionally comprising a bisulfite reagent (e.g., bisulfite, hydrogen sulfite, disulfite, and combinations thereof). In certain aspects of the kits disclosed herein, the respective marker gene sequences comprise at least one sequence from each of: at least one of SEQ ID NOS:50, 51 and 247 (FAM78A promoter, CpG island and amplicon, respectively); at least one of SEQ ID NOS:65, 66, 259, 263 and 265 (FSTL1 promoter, CpG island and amplicons, respectively); at least one of SEQ ID NOS:60, 61 and 255 (KCNC1 promoter, CpG island and amplicon, respectively); at least one of SEQ ID NOS:55, 56 and 251 (MYOCD promoter, CpG island and amplicon, respectively); and at least one of SEQ ID NOS:70, 71, and 269 (SLC6A4 promoter, CpG island and amplicons, respectively), respectively.
The data presented and discussed in this specification have also been deposited in NCBI's Gene Expression Omnibus (GEO) and are accessible through GEO Series accession numbers GSE25062 and GSE25070, incorporated by reference herein. The following links have been created to review these records: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=xpannsgssikcuvq&acc=GSE25062; and http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=rzgzzwyyqqqgklu&acc=GSE25070.
In particular aspects, “gene” refers to the respective genomic DNA sequence, including any promoter and regulatory sequences of the gene (e.g., enhancers and other gene sequences involved in regulating expression of the gene), and in particular embodiments, portions of said gene. In certain embodiments, a gene sequence may be an expressed sequence (e.g., expressed RNA, mRNA, cDNA). In particular aspects, the term “gene” shall be taken to include all transcript variants thereof (e.g., the term “B3GAT2” shall include for example its transcripts and any truncated transcript, etc.) and all promoter and regulatory elements thereof. Furthermore, where SNPs are known within genes, the term shall be taken to include all sequence variants thereof.
In particular aspects, “promoter” or “gene promoter” refers to the respective contiguous gene DNA sequence extending from 1.5 kb upstream to 1.5 kb downstream relative to the transcription start site (TSS), or contiguous portions thereof. In particular aspects, “promoter” or “gene promoter” refers to the respective contiguous gene DNA sequence extending from 1.5 kb upstream to 0.5 kb downstream relative to the TSS. In certain aspects, “promoter” or “gene promoter” refers to the respective contiguous gene DNA sequence extending from 1.5 kb upstream to the downstream edge of a CpG island that overlaps with the region from 1.5 kb upstream to 1.5 kb downstream from TSS (and in such cases, may thus extend even further beyond 1.5 kb downstream), and contiguous portions thereof. In particular aspects, with respect to any particular recited gene, any CpG dinucleotide of the particular recited gene that is coordinately methylated with the “promoter” or “gene promoter” of said recited gene, has substantial diagnostic/classification utility as disclosed herein, as one of ordinary skill in the art could readily practice the disclosed invention using any such coordinately methylated CpG dinucleotide sequences.
In particular aspects, a “CpG” island (CGI) refers to the NCBI relaxed definition defined bioinformatically as DNA sequences (200-base window) with a GC base composition greater than 50% and a CpG observed/expected ratio [o/e] of more than 0.6 (Takai & Jones Proc. Natl Acad. Sci. USA 99:3740-3745, 2002; Takai & Jones In Silico Biol. 3:235-240, 2003; see also NCBI MapViewer help document describing relaxed vs strict definition of CpG islands at www.ncbi.nlm.nih.gov/projects/mapview/static/humansearch.html#cpg; all of which are incorporated by reference herein in their entirety). In particular aspects, “CpG” island (CGI) refers to the more strict definition (Id).
“Stringent hybridisation conditions,” as defined herein, involve hybridising at 68° C. in 5×SSC/5×Denhardt's solution/1.0% SDS, and washing in 0.2×SSC/0.1% SDS at room temperature, or involve the art-recognized equivalent thereof (e.g., conditions in which a hybridisation is carried out at 60° C. in 2.5×SSC buffer, followed by several washing steps at 37° C. in a low buffer concentration, and remains stable). Moderately stringent conditions, as defined herein, involve washing in 3×SSC at 42° C., or the art-recognized equivalent thereof. The parameters of salt concentration and temperature can be varied to achieve the optimal level of identity between the probe and the target nucleic acid. Guidance regarding such conditions is available in the art, for example, in Sambrook et al. 1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, N.Y.; and Ausubel et al. (eds.), 1995, Current Protocols in Molecular Biology, (John Wiley & Sons, N.Y.; incorporated herein by reference) at Unit 2.10.
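By way of non-limiting illustration, the promoter windows defined above can be computed from a TSS coordinate as sketched below. The 1.5 kb upstream and 1.5 kb (or 0.5 kb) downstream distances come from the definitions above; the function name, the 1-based inclusive coordinate convention, the strand handling, and the reading of the CpG island variant as only ever extending the downstream boundary are assumptions made for this sketch.

```python
from typing import Optional, Tuple


def promoter_window(
    tss: int,
    strand: str,
    upstream: int = 1500,
    downstream: int = 1500,
    cpg_island: Optional[Tuple[int, int]] = None,
) -> Tuple[int, int]:
    """Return (start, end) genomic coordinates of the promoter window.

    Coordinates are treated as 1-based and inclusive (an assumption).
    On the minus strand, "upstream" lies at higher coordinates.
    If `cpg_island` (start, end) overlaps the window, the downstream
    boundary is extended to the downstream edge of the island.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")

    if cpg_island is not None:
        cgi_start, cgi_end = cpg_island
        overlaps = cgi_start <= end and cgi_end >= start
        if overlaps and strand == "+":
            end = max(end, cgi_end)        # downstream is to the right
        elif overlaps and strand == "-":
            start = min(start, cgi_start)  # downstream is to the left

    return max(start, 1), end


# Hypothetical TSS position, for illustration only.
print(promoter_window(10_000, "+"))                  # 1.5 kb / 1.5 kb window
print(promoter_window(10_000, "+", downstream=500))  # 1.5 kb / 0.5 kb window
```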
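By way of non-limiting illustration, the relaxed criteria recited above (a window of 200 bases, GC composition greater than 50%, and CpG observed/expected ratio of more than 0.6) can be checked for a single sequence window as sketched below. The observed/expected formula used here, (number of CpG × window length) / (number of C × number of G), is the commonly used form and is an assumption of this sketch, since the formula is not spelled out herein; the function names are likewise hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in the window that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq: str) -> float:
    """CpG o/e ratio: (#CpG * length) / (#C * #G); 0.0 if C or G is absent."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (n_c * n_g)


def meets_relaxed_cgi_criteria(
    seq: str, min_length: int = 200, min_gc: float = 0.5, min_oe: float = 0.6
) -> bool:
    """Apply the relaxed thresholds from the definition above to one window."""
    if len(seq) < min_length:
        return False
    return gc_fraction(seq) > min_gc and cpg_observed_expected(seq) > min_oe


# Synthetic sequences, for illustration only.
print(meets_relaxed_cgi_criteria("CG" * 100))  # CpG-dense window
print(meets_relaxed_cgi_criteria("AT" * 100))  # no G or C at all
```

A genome-wide scan would slide such a window along the sequence and merge adjacent qualifying windows; that step is omitted here.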
The term “methylation state” or “methylation status” refers to the presence or absence of 5-methylcytosine (“5-mCyt”) at one or a plurality of CpG dinucleotides within a DNA sequence. Methylation states at one or more particular CpG methylation sites (each having two CpG dinucleotide sequences) within a DNA sequence include “unmethylated,” “fully-methylated” and “hemi-methylated.”
The term “hemi-methylation” or “hemimethylation” refers to the methylation state of a double stranded DNA wherein only one strand thereof is methylated.
The term “hypermethylation” refers to the average methylation state corresponding to an increased presence of 5-mCyt at one or a plurality of CpG dinucleotides within a DNA sequence of a test DNA sample, relative to the amount of 5-mCyt found at corresponding CpG dinucleotides within a normal control DNA sample.
The term “hypomethylation” refers to the average methylation state corresponding to a decreased presence of 5-mCyt at one or a plurality of CpG dinucleotides within a DNA sequence of a test DNA sample, relative to the amount of 5-mCyt found at corresponding CpG dinucleotides within a normal control DNA sample.
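By way of non-limiting illustration, the relative definitions of hypermethylation and hypomethylation above can be expressed as a comparison of average methylation levels between a test sample and a normal control. In this sketch the methylation levels are β-values averaged over the corresponding CpG dinucleotides; the minimum difference `min_delta` and its default value are hypothetical and are not specified herein.

```python
from statistics import mean
from typing import Sequence


def methylation_change(
    test_betas: Sequence[float],
    normal_betas: Sequence[float],
    min_delta: float = 0.2,  # hypothetical cutoff, not taken from the text
) -> str:
    """Compare average methylation of a test sample with a normal control.

    Both inputs hold beta-values for the same CpG dinucleotides.
    Returns 'hypermethylated', 'hypomethylated' or 'unchanged'.
    """
    difference = mean(test_betas) - mean(normal_betas)
    if difference >= min_delta:
        return "hypermethylated"
    if difference <= -min_delta:
        return "hypomethylated"
    return "unchanged"


# Hypothetical beta-values, for illustration only.
print(methylation_change([0.80, 0.70], [0.10, 0.20]))
```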
The term “bisulfite reagent” refers to a reagent comprising bisulfite, disulfite, hydrogen sulfite or combinations thereof, useful as disclosed herein to distinguish between methylated and unmethylated CpG dinucleotide sequences.
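By way of non-limiting illustration, the way a bisulfite reagent distinguishes methylated from unmethylated cytosines can be simulated in silico: unmethylated cytosines are converted to uracil (read as thymine after amplification), while 5-methylcytosines are left unchanged. The function name and the 0-based position convention are assumptions of this sketch, which models a single strand only.

```python
from typing import Collection


def bisulfite_convert(seq: str, methylated_positions: Collection[int] = ()) -> str:
    """Simulate bisulfite conversion of one DNA strand.

    `methylated_positions` holds 0-based indices of 5-methylcytosines.
    Unmethylated C is written as T (uracil is read as T after PCR);
    methylated C and all other bases are left unchanged.
    """
    protected = set(methylated_positions)
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in protected:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Synthetic sequence, for illustration only: the C at index 1 is methylated.
print(bisulfite_convert("ACGTCG", methylated_positions={1}))
print(bisulfite_convert("ACGTCG"))
```

Comparing the two converted sequences shows that a methylated CpG remains "CG" while an unmethylated CpG reads as "TG", which is the sequence difference that the methylation assays defined below detect.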
The term “Methylation assay” refers to any assay for determining the methylation state of one or more CpG dinucleotide sequences within a sequence of DNA.
The term “MS.AP-PCR” (Methylation-Sensitive Arbitrarily-Primed Polymerase Chain Reaction) refers to the art-recognized technology that allows for a global scan of the genome using CG-rich primers to focus on the regions most likely to contain CpG dinucleotides, and described by Gonzalgo et al., Cancer Research 57:594-599, 1997.
The term “MethyLight™” refers to the art-recognized fluorescence-based real-time PCR technique described by Eads et al., Cancer Res. 59:2302-2306, 1999.
The term “HeavyMethyl™” assay, in the embodiment thereof implemented herein, refers to an assay, wherein methylation specific blocking probes (also referred to herein as blockers) covering CpG positions between, or covered by the amplification primers enable methylation-specific selective amplification of a nucleic acid sample.
The term “HeavyMethyl™ MethyLight™” assay, in the embodiment thereof implemented herein, refers to a HeavyMethyl™ MethyLight™ assay, which is a variation of the MethyLight™ assay, wherein the MethyLight™ assay is combined with methylation specific blocking probes covering CpG positions between the amplification primers.
The term “Ms-SNuPE” (Methylation-sensitive Single Nucleotide Primer Extension) refers to the art-recognized assay described by Gonzalgo & Jones, Nucleic Acids Res. 25:2529-2531, 1997.
The term “MSP” (Methylation-specific PCR) refers to the art-recognized methylation assay described by Herman et al. Proc. Natl. Acad. Sci. USA 93:9821-9826, 1996, and by U.S. Pat. No. 5,786,146.
The term “COBRA” (Combined Bisulfite Restriction Analysis) refers to the art-recognized methylation assay described by Xiong & Laird, Nucleic Acids Res. 25:2532-2534, 1997.
The term “MCA” (Methylated CpG Island Amplification) refers to the methylation assay described by Toyota et al., Cancer Res. 59:2307-12, 1999, and in WO 00/26401A1.
Colorectal cancer (CRC) is a heterogeneous disease in which unique subtypes are characterized by distinct genetic and epigenetic alterations. Comprehensive genome-scale DNA methylation profiling of 125 colorectal tumors and 29 adjacent normal tissues was performed, and four DNA methylation-based subgroups of CRC were identified using model-based cluster analyses. Each subtype shows characteristic genetic and clinical features, indicating that they represent biologically distinct subgroups.
In particular aspects, a CIMP-high (CIMP-H) subgroup, which exhibits an exceptionally high frequency of cancer-specific DNA hypermethylation, is strongly associated with MLH1 DNA hypermethylation and the BRAFV600E mutation.
In additional aspects, a CIMP-low (CIMP-L) subgroup is enriched for KRAS mutations and characterized by DNA hypermethylation of a subset of CIMP-H associated markers rather than a unique group of CpG islands.
In further aspects, non-CIMP tumors are separated into two distinct clusters. One non-CIMP subgroup is distinguished by a significantly higher frequency of TP53 mutations and frequent occurrence in the distal colon, while the tumors that belong to the fourth group exhibit a low frequency of both cancer-specific DNA hypermethylation and gene mutations, and are significantly enriched for rectal tumors.
In yet further aspects, 112 genes were identified that were downregulated more than 2-fold in CIMP-H tumors together with promoter DNA hypermethylation. These represent approximately 7% of genes that acquired promoter DNA methylation in CIMP-H tumors. Intriguingly, 48/112 genes were also transcriptionally silent in non-CIMP subgroups, but this was not attributable to promoter DNA hypermethylation.
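By way of non-limiting illustration, the selection of genes that are both downregulated more than 2-fold and promoter-hypermethylated can be sketched as a simple filter. The 2-fold cutoff comes from the text above; the record layout (linear-scale expression in tumor and matched normal tissue plus a hypermethylation flag), the function name and the gene identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

# gene -> (tumor expression, normal expression, promoter hypermethylated?)
GeneRecord = Tuple[float, float, bool]


def silenced_with_hypermethylation(
    records: Dict[str, GeneRecord], min_fold: float = 2.0
) -> List[str]:
    """Genes downregulated more than `min_fold` in tumor versus normal
    tissue that also carry promoter DNA hypermethylation.

    Expression values are assumed to be on a linear (not log) scale.
    """
    selected = []
    for gene, (tumor_expr, normal_expr, hypermethylated) in records.items():
        # normal > fold * tumor is equivalent to normal / tumor > fold and
        # also handles genes with zero expression in the tumor.
        if hypermethylated and normal_expr > min_fold * tumor_expr:
            selected.append(gene)
    return sorted(selected)


# Hypothetical records, for illustration only.
example_records = {
    "GENE_A": (10.0, 50.0, True),   # 5-fold down, hypermethylated
    "GENE_B": (40.0, 50.0, True),   # hypermethylated, not downregulated
    "GENE_C": (5.0, 50.0, False),   # downregulated, not hypermethylated
}
print(silenced_with_hypermethylation(example_records))
```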
In particular aspects, therefore, four distinct DNA methylation subgroups of CRC were identified, and provide novel insight regarding the role of CIMP-specific DNA hypermethylation in gene silencing.
CRC can be classified based on various molecular features. Identification and characterization of these subtypes has been not only essential to better understand the disease (Jass, 2007), but also valuable in selection of optimal drug treatments, prediction of patient survival, and discovery of risk factors linked to a particular subtype (Walther et al., 2009; Limsui et al., 2010). The Illumina Infinium DNA methylation assay was used herein to investigate DNA methylation-based subgroups in CRC. This BeadArray platform interrogates the gene promoter DNA methylation of all 14,495 consensus coding DNA sequence (CCDS) genes in multiple samples simultaneously and is therefore suitable for a study requiring large-scale promoter DNA methylation profiling of a large number of samples (Bibikova, 2009). Using this platform, four DNA methylation subgroups of CRC were identified herein, based on model-based unsupervised cluster analyses. Importantly, the genetic and clinical correlations observed with each subtype indicate that they represent biologically distinct subgroups.
One subgroup, designated here as CIMP-H, contained all of the CIMP-positive tumors characterized by the MethyLight five-marker panel (i.e., CACNA1G, IGF2, NEUROG1, RUNX3, SOCS1) previously developed in Applicants' laboratory (Weisenberger et al., 2006).
Six CIMP-H tumors were identified herein, based on the Infinium DNA methylation data, that did not meet the criteria for CIMP using the MethyLight five-gene panel. The MethyLight-based marker panel was developed based on the screening of 195 MethyLight markers (Weisenberger et al., 2006). In the current study, Applicants measured DNA methylation at a much larger number of loci using the Illumina Infinium DNA methylation platform (27,578 CpG sites located at 14,495 gene promoters). According to particular aspects, the additional loci present on the array more accurately identified CIMP tumors, compared to the conventional MethyLight-based five-marker panel. This increased accuracy is likely a reflection of both the inclusion of additional markers which are more tightly associated with CIMP, and the mere fact that a larger number of informative loci will usually outperform a small panel of informative loci. The limited MethyLight panel was designed to be particularly compatible with cost-effective processing of large numbers of formalin-fixed, paraffin-embedded (FFPE) samples, and the five-marker CIMP panel has nevertheless been found to be very useful in large-scale studies of FFPE samples. However, any small panel of markers will likely have some misclassification error in identifying a complex molecular profile, regardless of the composition of the panel.
According to particular aspects, the instant results provide new diagnostic DNA methylation marker panels to identify CIMP (CIMP-H and CIMP-L), as well as to segregate CIMP-H tumors from CIMP-L tumors (see EXAMPLE 6).
Ogino and colleagues proposed the CIMP-low subgroup, which showed DNA hypermethylation of CIMP-defining markers, albeit at a low frequency, and enrichment for KRAS mutations (Ogino et al., 2006). Applicants herein identified the CIMP-L subgroup through a genome-scale approach and provided a comprehensive DNA methylation profile of these tumors. Importantly, the CIMP-L-associated DNA methylation appears to occur only at a subset of CIMP-H-associated sites, as Applicants did not find evidence for strong CIMP-L-specific DNA methylation at a unique set of CpG sites. Moreover, Applicants found that although KRAS mutations are enriched in CIMP-L tumors, this subtype may not be driven by KRAS mutations, since DNA hypermethylation profiles in KRAS wild-type and mutant tumors within CIMP-L tumors were highly correlated across the CpG sites examined. The independence of KRAS mutations from CIMP-L status suggests that a more complex molecular signature exists in driving CIMP-L DNA methylation profiles. Recently, Applicants and others have hypothesized that BRAF mutations might be favorably selected in the specific environment that CIMP creates (Hinoue et al., 2009; Suzuki et al., 2010). Similar mechanisms may also result in the enrichment of KRAS mutations in the CIMP-L subgroup.
Shen and colleagues (Shen et al., 2007) reported the CIMP2 subset, along with CIMP1 (CIMP-H) and non-CIMP subsets of CRC, using a 28-gene panel. They found a very strong association of CIMP2 with KRAS mutations (92%), together with DNA hypermethylation of several CIMP-H-associated loci. The CIMP2 subgroup may be similar to the CIMP-L subgroup we identified in our study. However, the present Applicants only detected a KRAS mutation frequency of approximately 50% in CIMP-L tumors. The differences in KRAS mutation frequencies between Applicants' CIMP-L and CIMP2 of Shen et al. likely arise from differences in the CRC patient collections and in the genomic features and technologies used to analyze DNA methylation subgroups of CRC in both studies.
Applicants did not find a statistically significant association of MGMT DNA hypermethylation and CIMP-L status. However, Ogino and colleagues reported statistical significance in their recent report (Ogino et al., 2007). The differences between the instant results and those of Ogino and colleagues may arise from several sources. First, Ogino and colleagues used a different criterion for classifying CIMP-L tumors. Specifically, they classified a tumor sample as CIMP-L if one or two markers from the MethyLight-based CIMP panel showed DNA methylation. By contrast, Applicants' CIMP-L classification was based on Infinium DNA methylation data, a more robust resource of CIMP-L gene markers. Additionally, possible disparities in the CRC sample collections between the studies, such as ethnic population differences, may contribute to CIMP-L classification differences. Finally, there are differences in sample sizes between both studies, which may also contribute to statistical evaluation of CIMP in both collections of CRC tumors.
In particular aspects, Applicants also obtained gene expression profiles in pairs of CIMP-H and non-CIMP tumor-normal adjacent tissues to gain insight into the role of CIMP-specific DNA hypermethylation in colorectal tumorigenesis. Aberrant DNA methylation of promoter CpG islands has been established as an important mechanism that inactivates tumor suppressor genes in cancer (Jones and Baylin, 2007). However, many cancer-specific CpG island hypermethylation events are also found in promoter regions of genes that are not normally expressed, and these may represent “passenger” events that do not have functional consequences (Widschwendter et al., 2007; Gal-Yam et al., 2008). In additional aspects, therefore, Applicants examined effects of CIMP-associated DNA hypermethylation on gene expression, and found that only 7.3% of the CIMP-H-specific DNA methylation markers showed a strong inverse relationship with their gene expression levels (see EXAMPLE 7, and
In particular aspects, 112 genes were identified herein that showed both promoter DNA hypermethylation and reduction in gene expression in CIMP-H tumors (see EXAMPLE 7, and
In yet further aspects, Applicants observed that of the 112 genes that exhibited DNA hypermethylation and reduced gene expression in CIMP-H tumors, 48 were also silenced in non-CIMP tumors, but without substantial increases in DNA methylation. CIMP status in CRC has been found to be inversely correlated with the occurrence of chromosomal instability (CIN), which is characterized by aneuploidy, gain and loss of subchromosomal genomic regions and high frequencies of loss of heterozygosity (LOH) (Goel et al., 2007; Cheng et al., 2008). Recently, Chan and colleagues identified genes that are inactivated by both genetic mechanisms (mutation or deletion) and DNA hypermethylation in breast and colorectal cancer (Chan et al., 2008). They observed that these genetic and epigenetic changes are generally mutually exclusive in a given tumor, and that silencing of these genes was associated with poor clinical outcome (Chan et al., 2008). Together, these genes may act as key tumor suppressor genes in CRC and the gene silencing mechanisms can be determined by the underlying molecular pathways involved in colorectal tumorigenesis.
The molecular mechanisms that account for CIMP have not been identified. It has been proposed that CIMP arises through a distinct pathway originating in a variant of hyperplastic polyps and sessile serrated adenomas due to the similar histological and molecular features shared by the CIMP tumors and these lesions (O'Brien, 2007). Some individuals and families with hyperplastic polyposis syndrome have an increased risk of developing CIMP CRC, indicating the existence of a genetic predisposition that could lead to CIMP (Young et al., 2007). Environmental exposures might also influence the risk of developing CIMP CRC. Cigarette smoking was found to be associated with increased risk of developing CIMP CRC in a recent report (Limsui et al., 2010).
Applicants' present study provides the most comprehensive genome-scale analysis of DNA methylation-based subgroups of CRC to date. In particular aspects, the unique DNA methylation profiles in CRC, together with genomic changes, provide a detailed molecular landscape of colorectal tumors. According to particular aspects, the findings have substantial clinical utility for identification and diagnosis of colorectal cancer, as well as for determining particular treatments for CRC patients.
Primary Colorectal Tissue Sample Collection and Processing.
Twenty-five paired colorectal tumor and histologically normal adjacent colonic tissue samples were obtained from colorectal cancer patients who underwent surgical resection at the department of surgery in the Groene Hart Hospital, Gouda, The Netherlands. Tissue samples were stored at −80° C. within one hour after resection. Tissue sections from the surgical resection margin were examined by a pathologist (C. M. van Dijk) by microscopic observation. All patients provided written informed consent for the collection of samples and subsequent analysis. The study was approved by the Institutional Review Board of the Groene Hart Hospital in Gouda and the Leiden University Medical Center and University of Southern California. An additional collection of 100 fresh-frozen colorectal tumor samples and four matched histologically normal-adjacent colonic mucosa tissue samples were obtained from the Ontario Tumor Bank Network (The Ontario Institute for Cancer Research, Ontario, Canada). The tissue collection and analyses were approved by the University of Southern California Institutional Review Board. Genomic DNA and total RNA were extracted simultaneously from the same tissue sample using the TRIZOL®Reagent (Invitrogen, Burlington, ON) according to the manufacturer's protocol.
Mutation Analysis.
BRAF (NM_004333.4; GI:187608632) mutations at codon 600 in exon 15 and KRAS (NG_007524.1; GI:17686616) mutations at codons 12 and 13 in exon 2 were identified using the pyrosequencing assay. Specifically, a 224 bp fragment of the BRAF gene containing exon 15 was amplified from genomic DNA using the following primers: 5′ TCA TAA TGC TTG CTC TGA TAG GA 3′ (SEQ ID NO:1) and 5′Biotin-GGC CAA AAA TTT AAT CAG TGG A 3′ (SEQ ID NO:2), and genotyped with the sequencing primer 5′ CCA CTC CAT CGA GAT T 3′ (SEQ ID NO:3). Similarly, a 214 bp fragment of the KRAS gene containing exon 2 was amplified from each genomic DNA sample using the following primers: 5′Biotin-GTG TGA CAT GTT CTA ATA TAG TCA 3′ (SEQ ID NO:4) and 5′ GAA TGG TCC TGC ACC AGT AA 3′ (SEQ ID NO:5), and genotyped with the sequencing primer 5′ GCA CTC TTG CCT ACG 3′ (SEQ ID NO:6).
Mutations in TP53 exons 4 through 8 were determined by direct sequencing of PCR products. Specifically, TP53 exons 4 through 8 were amplified by PCR using three exon-specific primer sets: Exon 4, 5′-GTT CTG GTA AGG ACA AGG GTT-3′ (forward) (SEQ ID NO:7) and 5′-CCA GGC ATT GAA GTC TCA TG-3′ (reverse) (SEQ ID NO:8) (Tm=49° C.); Exons 5 and 6, 5′-GGT TGC AGG AGG TGC TTA C-3′ (forward) (SEQ ID NO:9) and 5′-CCA CTG ACA ACC ACC CTT AAC-3′ (reverse) (SEQ ID NO:10) (Tm=51° C.); Exons 7 and 8, 5′-CCT GCT TGC CAC AGG TCT C-3′ (forward) (SEQ ID NO:11) and 5′-TGA ATC TGA GGC ATA ACT GCA C-3′ (reverse) (SEQ ID NO:12) (Tm=51° C.). PCR amplification was performed using a touchdown protocol with an initial step of 95° C. for 12 minutes, then 5 cycles of 95° C. for 25 sec, Tm+15° C. for 1 min and 72° C. for 1 min, then 5 cycles of 95° C. for 25 sec, Tm+10° C. for 1 min and 72° C. for 1 min, followed by 5 cycles of 95° C. for 25 sec, Tm+5° C. for 1 min and 72° C. for 1 min, finishing with 35 cycles of 95° C. for 25 sec, Tm° C. for 1 min and 72° C. for 1 min.
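The touchdown schedule described above can be written out explicitly for each primer set. The following is a minimal sketch in Python; the function name `touchdown_schedule` and the tuple layout are illustrative assumptions and are not part of the original protocol.

```python
def touchdown_schedule(tm_celsius):
    """Return the cycling stages as (number_of_cycles, annealing_temp_C) pairs.

    Every cycle is 95 C for 25 sec, annealing for 1 min, and 72 C for 1 min.
    The single initial step (95 C for 12 min) is not listed here.
    """
    # (offset above Tm in degrees C, number of cycles), in the order stated in the text
    stages = [(15, 5), (10, 5), (5, 5), (0, 35)]
    return [(cycles, tm_celsius + offset) for offset, cycles in stages]


# Exon 4 primer set (Tm = 49 C); the exon 5/6 and exon 7/8 sets use Tm = 51 C.
for cycles, anneal in touchdown_schedule(49):
    print(f"{cycles} cycles with annealing at {anneal} C")
```

Written this way, the protocol totals 50 amplification cycles, with the annealing temperature stepping down by 5° C. every five cycles until it reaches Tm.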
Sequencing of the purified PCR products was performed using an ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Foster City, Calif.). Cycle sequencing reactions were performed in a thermal cycler for 25 cycles at 96° C. for 10 sec, annealing at 50° C. for 5 sec, and extension at 60° C. for 4 min. Prior to capillary electrophoresis, unincorporated dye terminators were removed from the extension product using a DyeEx 96 Plate (Qiagen, Valencia, Calif.) according to the manufacturer's instructions. The purified extension products were denatured at 90° C. for 2 min and placed on ice for 1 min. Sequencing was performed on an ABI PRISM 3730xl DNA Analyzer (Applied Biosystems). The sequencing output files (.ab1) were processed using the Phred/Phrap software package developed at the University of Washington (Nickerson et al., 1997; Ewing and Green, 1998; Ewing et al., 1998; Gordon et al., 1998). Sequence alignments for each exon read were viewed in the Consed Viewer Software and sequence variations were annotated and recorded.
Samples containing missense mutations, nonsense mutations, splice-site mutations, frame-shift mutations, and in-frame deletions were considered positive for a mutation.
DNA Methylation Assays.
For MethyLight-based assays, genomic DNAs were treated with sodium bisulfite using the Zymo EZ DNA Methylation Kit (Zymo Research, Orange, Calif.) and subsequently analyzed by MethyLight as previously described (Campan et al., 2009; incorporated herein by reference in its entirety). The primer and probe sequences for the MethyLight reactions for the five-gene CIMP marker panel and MLH1 were reported previously (Weisenberger et al., 2006; incorporated herein by reference in its entirety). The results of the MethyLight assays were scored as PMR (Percent of Methylated Reference) values as previously defined, and a PMR of ≧10 was used as a threshold for positive DNA methylation in each sample (Weisenberger et al., 2006; Campan et al., 2009). A sample was scored as CIMP-positive if ≧3 of the five CIMP-defining markers gave PMR values ≧10.
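The MethyLight scoring rule above (PMR ≧10 per marker, and ≧3 of five markers positive) can be expressed as a short function. This is a minimal sketch in Python; the function name and the example PMR values are hypothetical, while the marker names are those of the five-marker panel of Weisenberger et al., 2006.

```python
CIMP_PANEL = ("CACNA1G", "IGF2", "NEUROG1", "RUNX3", "SOCS1")


def is_cimp_positive(pmr_by_marker, pmr_threshold=10.0, min_positive=3):
    """Score a sample as CIMP-positive if enough panel markers reach the PMR threshold."""
    positives = sum(1 for marker in CIMP_PANEL if pmr_by_marker[marker] >= pmr_threshold)
    return positives >= min_positive


# Hypothetical PMR values for one tumor sample (illustration only).
sample = {"CACNA1G": 42.0, "IGF2": 3.1, "NEUROG1": 10.0, "RUNX3": 55.7, "SOCS1": 0.0}
print(is_cimp_positive(sample))
```

In this hypothetical sample three markers reach the threshold (a PMR of exactly 10 counts as positive), so the sample is scored as CIMP-positive.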
The Illumina Infinium HumanMethylation27 DNA methylation assay technology has been described previously (Bibikova, 2009; incorporated herein by reference in its entirety). Briefly, genomic DNA was bisulfite converted using the EZ-96 DNA Methylation Kit (Zymo Research) according to the manufacturer's instructions. The amount of bisulfite-converted DNA and completeness of bisulfite conversion were assessed using a panel of MethyLight-based quality control (QC) reactions as previously described (Campan et al., 2009). All of the samples in this study passed Applicants' QC tests and entered into the Infinium DNA methylation assay pipeline. The Infinium DNA methylation assay was performed at the USC Epigenome Center according to the manufacturer's specifications (Illumina, San Diego, Calif.). The Illumina Infinium DNA methylation assay examines DNA methylation status of 27,578 CpG sites located at promoter regions of 14,495 protein-coding genes and 110 microRNAs. A measure of the level of DNA methylation at each CpG site is scored as beta (β) values ranging from 0 to 1, with values close to 0 indicating low levels of DNA methylation and close to 1 high levels of DNA methylation (Bibikova, 2009). The detection P values measure the difference of the signal intensities at the interrogated CpG site compared to those from a set of 16 negative control probes embedded in the assay. All data points with a detection P value >0.05 were identified as not statistically significantly different from background measurements, and therefore not trustworthy measures of DNA methylation. These data points were replaced by “NA” values as previously described (Noushmehr et al., 2010). More specifically, for the Illumina Infinium DNA methylation data analysis, data points were masked as “NA” for probes that might be unreliable (see the Supplemental Methods).
Finally, probes that are designed for sequences on either the X- or Y-chromosome were excluded. DNA methylation data sets which did not contain any “NA”-masked data points were analyzed. DNA methylation β-values were normalized to eliminate the batch effects. Briefly, the batch means of β-values were brought closer to the overall mean while retaining the original range of DNA methylation data (0 to 1) (Pan et al., manuscript in preparation). Only the tumor samples were used to calculate the batch means and overall mean in estimating the scaling factor for each batch. For the gene expression analysis, unreliable probes (9%), as described by Barbosa-Morais et al., were removed from the subsequent analysis (Barbosa-Morais et al., 2010). Data points were masked as “NA” for probes that contained single-nucleotide polymorphisms (SNPs) (dbSNP NCBI build 130/hg18) within the five base pairs from the interrogated CpG site or that overlap with a repetitive element that covers the targeted CpG dinucleotide. Furthermore, data points were replaced with “NA” for probes that are not uniquely aligned to the human genome (NCBI build 36/hg18) at 20 nucleotides at the 3′ terminus of the probe sequence, and those that overlap with regions of insertions and deletions in the human genome. Together, data points for 4,484 probes were masked. The assay probe sequences and detailed information on each interrogated CpG site and the associated genomic characteristics on the HumanMethylation27 BeadChip can be obtained at www.illumina.com, and these data are incorporated herein by reference in their entirety. All Infinium DNA methylation data are available at the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE25062, and these data are incorporated herein by reference in their entirety.
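The detection P value masking and the restriction to probes without any “NA”-masked data points, as described above, can be illustrated as follows. This is a minimal sketch in Python under the assumption that β-values and detection P values are held as parallel probe-by-sample lists; the function names are hypothetical, and `None` stands in for “NA”. The SNP, repeat, alignment, and sex-chromosome filters and the batch normalization are not reproduced here.

```python
def mask_by_detection_p(beta, detection_p, p_cutoff=0.05):
    """Replace a beta value with None ("NA") when its detection P value exceeds the cutoff."""
    return [
        [b if p <= p_cutoff else None for b, p in zip(beta_row, p_row)]
        for beta_row, p_row in zip(beta, detection_p)
    ]


def complete_probe_indices(masked_beta):
    """Return indices of probes (rows) that contain no masked data points."""
    return [i for i, row in enumerate(masked_beta) if all(v is not None for v in row)]


# Toy data: 3 probes (rows) x 2 samples (columns); values are illustrative only.
beta = [[0.10, 0.85], [0.40, 0.42], [0.90, 0.05]]
detection_p = [[0.001, 0.20], [0.01, 0.03], [0.0, 0.0]]
masked = mask_by_detection_p(beta, detection_p)
print(complete_probe_indices(masked))
```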
Validation of Infinium DNA Methylation Data by MethyLight Assay.
Genomic DNA from 25 pairs of colorectal tumor and their adjacent normal samples were treated with sodium bisulfite using the Zymo EZ96 DNA Methylation Kit (Zymo Research) and subsequently analyzed by MethyLight as previously described (Campan et al., 2009). Primers and probes used for validation are as follows and are listed as 5′ to 3′: SFRP1, forward primer: 5′ GAA TTC GTT CGC GAG GGA 3′ (SEQ ID NO:13), reverse primer: 5′ AAA CGA ACC GCA CTC GTT ACC 3′ (SEQ ID NO:14), probe: 6FAM-CCG TCA CCG ACG CGA AAA CCA AT-BHQ-1 (SEQ ID NO:15); TMEFF2, forward primer: 5′ GTT AAA TTC GCG TAT GAT TTC GAG A 3′ (SEQ ID NO:16), reverse primer: 5′ TTC CCG CGT CTC CGA C 3′ (SEQ ID NO:17), probe: 6FAM-AAC GAA CGA CCC TCT CGC TCC GAA-BHQ-1 (SEQ ID NO:18); LMOD1, forward primer: 5′ TTT TAA AGA TAA GGG GTT ACG TAA TGA G 3′ (SEQ ID NO:19), reverse primer: 5′ CCG AAC TAA CGA ATT CAC CGA C 3′ (SEQ ID NO:20), probe: 6FAM-TCG TCC CTA CTT ATC TAA CTC TCC GTA-MGBNFQ (SEQ ID NO:21). The results of the MethyLight assays were scored as PMR (Percent of Methylated Reference) values as previously defined (Weisenberger et al., 2006; Campan et al., 2009).
Gene Expression Assay.
Gene expression assay was performed on 25 pairs of colorectal tumor and non-tumor adjacent tissue samples using the Illumina Ref-8 whole-genome expression BeadChip (HumanRef-8 v3.0, 24,526 transcripts) (Illumina). Scanned image and bead-level data processing were performed using the BeadStudio 3.0.1 software (Illumina). The summarized data for each bead type were then processed using the lumi package in Bioconductor (Du et al., 2008). The data were log2-transformed and normalized using Robust Spline Normalization (RSN) as implemented in the lumi package. Specifically, total RNA from 26 pairs of colorectal tumor and non-tumor adjacent tissue samples was isolated using the TRIZOL® Reagent (Invitrogen, Burlington, ON) according to the manufacturer's protocol. The concentrations of RNA samples were measured using the NanoDrop 8000 (Thermo Fisher Scientific, Waltham, Mass.). The quality of the RNA samples was assessed using the Experion RNA StdSens analysis kit (Bio-Rad, Hercules, Calif.). Expression analysis was performed using the Illumina Ref-8 whole-genome expression BeadChip (HumanRef-8 v3.0, 24,526 transcripts) (Illumina, San Diego, Calif.). Briefly, RNA samples were processed using the Illumina TotalPrep RNA Amplification Kit (Illumina). Total RNA (500 ng) from each sample was subject to reverse transcription with an oligo(dT) primer bearing a T7 promoter. The cDNA then underwent second strand synthesis and purification. Biotinylated cRNA was then generated from the double-stranded cDNA template through in vitro transcription with T7 RNA polymerase. The biotinylated cRNA (750 ng) from each patient was then hybridized to the BeadChips. The hybridized chips were stained and scanned using the Illumina HD BeadArray scanner (Illumina).
The summarized probe profile data and processed expression data are available at the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE25070, and these data are incorporated herein by reference in their entirety.
Validation of the Illumina Gene Expression Array Data by Quantitative RT-PCR Assay.
Total RNA samples from 25 pairs of colorectal tumor and non-tumor adjacent tissue samples were treated with DNase using the DNA-free™ kit (Applied Biosystems) to remove contaminating DNA. The reverse transcription reaction was performed using iScript Reverse Transcription Supermix for RT-PCR (Bio-Rad). Quantitative RT-PCR assays were performed with primers and probes obtained from Applied Biosystems (SFRP1: Hs00610060_m1_M; TMEFF2: Hs00249367_m1_M; LMOD1: Hs00201704_m1_M). The raw expression values were normalized to those of HPRT1 (Hs99999909_m1_M).
Unsupervised Clustering.
A recursively partitioned mixture model (RPMM) was used for the identification of colorectal tumor subgroups based on the Illumina Infinium DNA methylation data. RPMM is a model-based unsupervised clustering approach developed for beta-distributed DNA methylation measurements that lie between 0 and 1 and is implemented as the RPMM Bioconductor package (Houseman et al., 2008). Probes were identified that do not contain any “NA”-masked data points and then RPMM clustering was performed on 2,758 probes (ten percent of original probes) that showed the most variable DNA methylation levels across the colorectal tumor panel. A fanny algorithm (a fuzzy clustering algorithm) was used for initialization and a level-weighted version of the Bayesian information criterion (BIC) was used as the split criterion for an existing cluster as implemented in the R-based RPMM package. The logit (logistic) transformation was applied to DNA methylation β-values and each probe was median-centered across the tumor samples. Consensus clustering was then performed using the same 2,758 Infinium DNA methylation probes that were used for RPMM-based clustering. The optimal number of clusters was assessed based on 1,000 re-sampling iterations (seed value: 1022) of K-means clustering for K=2,3,4,5,6 with Pearson correlation as the distance metric as implemented in the R/Bioconductor ConsensusClusterPlus package.
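The probe pre-selection and transformation steps that precede clustering can be sketched as follows. This is an illustrative Python sketch, not the R code used in the study; the function names are hypothetical, the use of the natural logarithm and the small clipping constant `eps` (which avoids taking the logit of exactly 0 or 1) are assumptions, and the clustering itself (RPMM, ConsensusClusterPlus) is not reproduced.

```python
import math
import statistics


def top_variable_probes(beta_by_probe, fraction=0.10):
    """Return the probe IDs with the highest standard deviation across tumor samples."""
    n_keep = round(len(beta_by_probe) * fraction)
    ranked = sorted(
        beta_by_probe,
        key=lambda probe: statistics.stdev(beta_by_probe[probe]),
        reverse=True,
    )
    return ranked[:n_keep]


def logit_median_center(betas, eps=1e-6):
    """Logit-transform one probe's beta values and center them on their median."""
    clipped = [min(max(b, eps), 1.0 - eps) for b in betas]
    logits = [math.log(b / (1.0 - b)) for b in clipped]
    median = statistics.median(logits)
    return [x - median for x in logits]


# Ten percent of the 27,578 probes on the array corresponds to 2,758 probes.
print(round(27578 * 0.10))
```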
Statistical Analysis and Visualization.
Statistical analysis and data visualization were carried out using the R/Bioconductor software packages (http://www.bioconductor.org). The Wilcoxon Rank Sum test and the Wilcoxon Signed Rank test were used to evaluate the difference in DNA methylation β-value for each probe between two independent groups and between tumor and matched adjacent normal tissues, respectively. False-discovery rate (FDR) adjusted P values for multiple comparisons were calculated using the Benjamini and Hochberg approach. The Illumina Infinium DNA methylation β-values were represented graphically using a heatmap, generated by the R/Bioconductor packages gplots and Heatplus. Ordering of the samples within a RPMM class in the heatmaps was obtained by using the function “seriate” in the seriation package.
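For readers unfamiliar with the Benjamini and Hochberg adjustment, the calculation can be sketched in a few lines. The following Python function is an illustrative re-implementation of the standard step-up procedure and is not the R code used in the study; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative raw P values for four probes.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum ensures that a probe never receives a larger adjusted value than a probe with a larger raw P value.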
Classification and Selection of Cancer-Specific DNA Methylation Markers.
Gene promoters that exhibited cancer-specific DNA methylation were categorized into three groups. Four hundred fifteen (415) unique gene promoters were selected that showed significant CIMP-H-specific DNA hypermethylation (FDR-adjusted P<0.0001 for CIMP-H vs. non-CIMP tumors and P>0.05 for CIMP-L vs. non-CIMP tumors), and seventy-three (73) gene promoters were selected that showed DNA hypermethylation in both CIMP-H and CIMP-L tumors (FDR-adjusted P<0.0001 for CIMP-H vs. non-CIMP and CIMP-L vs. non-CIMP). For the third category, five hundred forty-seven (547) genes were identified that acquired cancer-specific DNA hypermethylation irrespective of CIMP status (FDR-adjusted P<0.00001 for 29 paired tumor vs. adjacent non-tumor tissue). The genes are listed in Table 4 (Supplemental Table 4).
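The three selection rules above can be summarized as a decision function over the FDR-adjusted P values of each promoter. This is a minimal sketch in Python; the function name, the category labels, and the order in which the rules are tested are assumptions made for illustration, and the sketch presumes that the direction of change (hypermethylation in the tumor group) has already been checked separately.

```python
def classify_promoter(fdr_cimp_h_vs_non, fdr_cimp_l_vs_non, fdr_tumor_vs_normal):
    """Assign a promoter to one of the three cancer-specific DNA methylation categories."""
    if fdr_cimp_h_vs_non < 1e-4 and fdr_cimp_l_vs_non > 0.05:
        return "CIMP-H-specific"
    if fdr_cimp_h_vs_non < 1e-4 and fdr_cimp_l_vs_non < 1e-4:
        return "CIMP-H and CIMP-L"
    if fdr_tumor_vs_normal < 1e-5:
        return "cancer-specific irrespective of CIMP"
    return "not selected"


# Hypothetical adjusted P values for a single promoter (illustration only).
print(classify_promoter(2e-6, 0.31, 0.002))
```

Promoters whose CIMP-L comparison falls between the two cutoffs (0.0001 to 0.05) satisfy neither CIMP rule in this sketch and are evaluated only against the tumor versus normal criterion.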
Identification of Diagnostic CIMP-Associated DNA Methylation Gene Marker Panels.
The top 20 Infinium DNA methylation probes that are significantly hypermethylated in CIMP (CIMP-H and CIMP-L) compared with non-CIMP tumors based on the Wilcoxon rank-sum test were first selected. Using the condition that a DNA methylation β-value ≧0.1 for three or more markers qualifies a sample as CIMP, a five-probe panel was determined that best classifies CIMP (CIMP-H and CIMP-L) by calculating sensitivity, specificity, and overall misclassification rate for each random combination of the top 20 probes. For the CIMP-H-specific marker panel, the top 20 probes were first selected that are significantly hypermethylated in CIMP-H compared with CIMP-L tumors. A five-marker panel was then chosen that showed the best sensitivity, specificity, and overall misclassification rate to classify CIMP-H using the condition that three or more markers show a DNA methylation β-value of ≧0.1.
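The panel search described above can be illustrated with a short sketch. The Python code below is an assumption-laden illustration rather than the procedure actually run: it enumerates every five-probe combination exhaustively (15,504 combinations for 20 candidates) instead of sampling random combinations, it ranks panels by overall misclassification rate alone, and all function names are hypothetical. Both classes must be present in the labels to avoid division by zero.

```python
from itertools import combinations


def calls_cimp(sample_betas, panel, beta_threshold=0.1, min_positive=3):
    """Call a sample CIMP if enough panel probes reach the beta-value threshold."""
    return sum(sample_betas[probe] >= beta_threshold for probe in panel) >= min_positive


def evaluate_panel(samples, labels, panel):
    """Return (sensitivity, specificity, misclassification rate) for one panel."""
    tp = fp = tn = fn = 0
    for betas, is_cimp in zip(samples, labels):
        called = calls_cimp(betas, panel)
        if called and is_cimp:
            tp += 1
        elif called and not is_cimp:
            fp += 1
        elif not called and is_cimp:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp), (fp + fn) / len(labels)


def best_panel(samples, labels, candidate_probes, panel_size=5):
    """Return the first panel with the lowest overall misclassification rate."""
    return min(
        combinations(candidate_probes, panel_size),
        key=lambda panel: evaluate_panel(samples, labels, panel)[2],
    )
```

Here `samples` is a list of dictionaries mapping probe identifiers to β-values and `labels` holds the cluster-derived CIMP status of each sample.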
Integrated Analyses of the Illumina Infinium DNA Methylation and Gene Expression Data.
One probe was selected for each gene that showed the highest absolute mean β-value difference between tumor and normal-adjacent samples. The DNA methylation data set was then merged with the gene expression data set by Entrez Gene ID using the R merge function. Expression data points with a detection P value >0.01, computed by BeadStudio software, were considered as not distinguishable from the negative control measurements, and therefore not expressed. A mean β-value difference (|Δβ|) of 0.20 was used as a threshold for differential DNA methylation. This threshold of |Δβ|=0.20 was determined previously as a stringent estimate of Δβ detection sensitivity across the range of β-values (Bibikova, 2009).
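The probe selection and merge steps above can be sketched as follows. This is an illustrative Python sketch rather than the R code used in the study; the function names, the record layout, and the example identifiers are hypothetical, and treating the threshold as inclusive (|Δβ| ≧ 0.20) is an assumption. As with the default behavior of the R merge function, only genes present in both data sets are retained.

```python
def pick_probe_per_gene(probes):
    """Keep, for each gene, the probe with the largest absolute mean beta difference.

    probes: iterable of (probe_id, entrez_gene_id, mean_beta_tumor, mean_beta_normal).
    Returns {entrez_gene_id: (probe_id, delta_beta)}.
    """
    best = {}
    for probe_id, gene_id, beta_tumor, beta_normal in probes:
        delta = beta_tumor - beta_normal
        if gene_id not in best or abs(delta) > abs(best[gene_id][1]):
            best[gene_id] = (probe_id, delta)
    return best


def merge_with_expression(best_probe_by_gene, expression_by_gene, delta_cutoff=0.20):
    """Join methylation and expression records on Entrez Gene ID (genes in both only)."""
    merged = {}
    for gene_id, (probe_id, delta) in best_probe_by_gene.items():
        if gene_id in expression_by_gene:
            merged[gene_id] = {
                "probe": probe_id,
                "delta_beta": delta,
                "differential": abs(delta) >= delta_cutoff,
                "expression": expression_by_gene[gene_id],
            }
    return merged


# Hypothetical records (identifiers and values are for illustration only).
probes = [("probe_A", 1, 0.80, 0.30), ("probe_B", 1, 0.40, 0.35), ("probe_C", 2, 0.10, 0.15)]
expression = {1: -2.4}
print(merge_with_expression(pick_probe_per_gene(probes), expression))
```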
Comprehensive genome-scale DNA methylation profiling of 125 colorectal tumor samples and 29 histologically normal-adjacent colonic tissue samples was performed using the Illumina Infinium DNA methylation assay, which assesses the DNA methylation status of 27,578 CpG sites located at the promoter regions of 14,495 protein-coding genes (Bibikova, 2009) (see working Example 1 above for more details). The mutation status of the BRAF, KRAS, and TP53 genes was also identified in the tumor samples. CRC subtypes were first determined based on DNA methylation profiles in the collection of 125 tumor samples. Probes that might be unreliable (see the Supplemental Methods section) and probes that are designed for sequences on either the X- or Y-chromosome were excluded. The top ten percent of probes with the highest DNA methylation variability based on standard deviation of the DNA methylation β-value across the entire colorectal tumor panel (2,758 probes) was selected, and then unsupervised clustering was performed using a recursively partitioned mixture model (RPMM). RPMM is a model-based unsupervised clustering method specifically developed for beta-distributed DNA methylation data such as obtained on the Infinium DNA methylation assay platform (Houseman et al., 2008). Four distinct tumor subgroups were identified by this approach, and designated as clusters 1, 2, 3 and 4 (
Genetic and clinical features of each cluster are summarized in Table 1 below.
For comparison, resampling-based unsupervised consensus clustering (Monti et al., 2003) of the DNA methylation data set was also performed, and four DNA methylation based clusters were also identified using this method. The DNA methylation consensus cluster assignments for each sample were compared to their RPMM-based cluster assignments and substantial overlap was found with 80% (100/125) of the tumors showing agreement in cluster membership calls between these two different clustering methods (
Subsequent analyses were based on cluster membership derived from the RPMM-based unsupervised clustering method, which is particularly well-suited for beta-distributed DNA methylation measurements, and has successfully identified DNA methylation profiles that are clinically relevant in normal and tumor samples from diverse tissue types (e.g., Christensen et al., 2009a; Christensen et al., 2009b; Marsit et al., 2009; Christensen et al., 2010; Christensen et al., 2011; Marsit et al., 2011).
The cluster 1 subgroup is enriched for CIMP-positive colorectal tumors, as determined by the CIMP-specific MethyLight five-marker panel developed previously in Applicants' laboratory (CACNA1G, IGF2, NEUROG1, RUNX3, SOCS1) (Weisenberger et al., 2006), as well as MLH1 DNA hypermethylation using MethyLight technology (see
Previous studies with a limited number of DNA methylation markers from several groups indicated the existence of additional DNA methylation-based subtypes in CRC which are associated with KRAS mutations. These subgroups have been variously described as CIMP-low (Ogino et al., 2006), CIMP2 (Shen et al., 2007), and Intermediate-methylation epigenotype (IME) (Yagi et al., 2010). It is not clear whether these classifications represent the same tumor subgroup or different subgroups within CRC. We found that although KRAS mutant tumors are represented across the four classes, they are more common in the cluster 2 subgroup compared to the other clusters (
These genetic and epigenetic characteristics observed in the cluster 2 subgroup are consistent with the CIMP-low subtype described previously (Ogino et al., 2006). Therefore, in this study, we refer to the tumors that belong to the cluster 1 subgroup as CIMP-high (CIMP-H) and the cluster 2 subgroup tumors as CIMP-low (CIMP-L).
Applicants' RPMM-based clustering analysis identified two other CRC subtypes, designated as clusters 3 and 4, in addition to the CIMP-H and CIMP-L subgroups (
Importantly, the tumors included in cluster 3 are distinguished by a significantly higher frequency of TP53 mutations (65%) [P=6.5×10−5 (vs. cluster 4), Fisher's exact test] and their location in the distal colon (65%) [P=0.028 (vs. cluster 4), Fisher's exact test]. In contrast, the tumors that belong to cluster 4 exhibit a lower frequency of both KRAS (16%) and TP53 (16%) mutations, and their occurrence shows significant enrichment in the rectum compared to all the other groups (P=2.1×10−3, Fisher's exact test). Cluster 4 tumors also show borderline statistical significance to be more commonly found in males compared to the cluster 3 tumors (P=0.056, Fisher's exact test), providing additional lines of evidence that cluster 3 and 4 tumors are distinct.
A panel of 119 gene promoters was also identified that are constitutively methylated in normal samples, but show variable levels of DNA methylation in tumors (
DNA methylation markers associated with CIMP-H and CIMP-L subgroups were investigated. To accomplish this, the DNA methylation β-values for each probe were compared between CIMP-H and non-CIMP tumors (clusters 3 and 4 combined), as well as between CIMP-L and non-CIMP tumors, using the Wilcoxon rank-sum test. Applicants identified 1,618 CpG sites that showed significant DNA hypermethylation in CIMP-H versus non-CIMP tumors (FDR-adjusted P<0.0001) (
Specifically,
In order to determine whether there are DNA methylation markers specifically associated with CIMP-L subgroup, 22 CpG sites were examined that showed significant DNA hypermethylation in CIMP-L tumors, but not in CIMP-H tumors, as compared to non-CIMP tumors [FDR-adjusted P<0.001 (CIMP-L vs. non-CIMP) and P>0.05 (CIMP-H vs. non-CIMP)] (
Specifically, we also did not find a significant increase in MGMT DNA hypermethylation in CIMP-L tumors compared with non-CIMP tumors (P>0.05), in contrast to a previous report (Ogino et al., 2007). Clinically, Ogino and colleagues observed a significant association between CIMP-L and male sex (Ogino et al., 2006). Present Applicants also found that CIMP-L tumors are slightly more common in men (59%) than women (41%), although the association did not achieve statistical significance (P>0.05, Fisher's exact test).
Significant enrichment of KRAS mutations in the CIMP-L subgroup may suggest that KRAS mutations either induce DNA hypermethylation of a group of CpG loci or synergize with a specific DNA methylation profile associated with CIMP-L tumors. Interestingly, Shen et al. proposed a CIMP2 subtype of CRC, found to be tightly linked with KRAS mutations (92% of cases), using a limited number of DNA methylation markers (Shen et al., 2007).
In this Example, Applicants investigated whether KRAS mutations themselves are associated with DNA hypermethylation of specific sets of genes in CRC. We stratified tumors into three groups by their BRAF and KRAS mutation status: 1) BRAF mutant (n=17), 2) KRAS mutant (n=34), and 3) wild-type for both BRAF and KRAS (n=74), and then compared DNA methylation profiles between each group. A large number of CpG sites (715, FDR-adjusted P<0.0001) were identified that are significantly hypermethylated in tumors with BRAF mutation, all of which belong to the CIMP-H subgroup, as compared with tumors that are wild-type for both BRAF and KRAS (
To further examine the DNA methylation profiles in KRAS mutant tumors and BRAF/KRAS wild-type tumors, CIMP-L and non-CIMP tumors were subdivided by their KRAS mutation status and the mean DNA methylation β-values were compared among these groups. Mean DNA methylation β-values for KRAS mutant tumors and for BRAF/KRAS wild-type tumors were observed to be well correlated within both the CIMP-L and non-CIMP subgroups (
Specifically,
In this working example, gene promoters that acquired cancer-specific DNA methylation were classified into three categories based on their DNA methylation level profiles across colorectal tumor subtypes (see Methods of Example 1 herein, and Table 5 below): 1) CIMP-associated DNA methylation markers specific for the CIMP-H subgroup only, 2) CIMP-specific DNA methylation shared between both the CIMP-H and CIMP-L subgroups, and 3) non-CIMP cancer-specific DNA methylation. For comparison, 500 gene promoters were included in two additional groups that did not exhibit cancer-specific DNA methylation profiles, and were either constitutively methylated or unmethylated across tumor and adjacent-normal tissue samples (
Applicants explored whether the distinction between these groups of promoters can be attributed to simple structural and sequence characteristics. The majority of genes in all three groups that exhibited cancer-specific DNA methylation as well as the genes that were constitutively unmethylated in normal and tumor tissues are located within CpG islands defined by Takai and Jones (Takai and Jones, 2002) (see
Present Applicants did not observe significant differences in the overall distribution with respect to the CpG observed-to-expected ratio, G:C content, and CpG island length among these four groups of DNA sequences (
Applicants also considered that specific sequence motifs or repeat sequences surrounding CpG islands may have a role in differential DNA hypermethylation specifically in CIMP tumors. There was no enrichment or depletion of any di- or tetranucleotide sequences or known transcription factor binding sites in the CIMP-associated CpG islands (data not shown). Recently, Estecio and colleagues reported that retrotransposons are more frequently associated with CpG islands that are resistant to DNA hypermethylation than with those that are susceptible to DNA hypermethylation (Estecio et al., 2010). Consistent with their observations, Applicants found that the distances of Infinium DNA methylation probes to the nearest ALU repetitive element were significantly different between promoter sequences with cancer-specific DNA methylation (median distance: 4,300 bp) and those that do not exhibit cancer-specific DNA methylation changes (median distance: 1,730 bp) (P<2.2×10⁻¹⁶, Wilcoxon rank-sum test) (
The trimethylation status of histone H3 lysine 4 (H3K4me3) and histone H3 lysine 27 (H3K27me3) in human ES cells was next determined for genes in the five classification groups described above using a previously published dataset (Ku et al., 2008). The genes that are constitutively unmethylated across tumor and adjacent-normal tissue samples were found to be highly enriched for H3K4me3, whereas those that are constitutively methylated are enriched for chromatin states with neither mark in ES cells (
In this working example, Applicants developed diagnostic DNA methylation gene marker panels to identify CIMP (CIMP-H and CIMP-L), as well as to segregate CIMP-H tumors from CIMP-L tumors based on the Infinium DNA methylation data (
In particular aspects, a CIMP-defining marker panel consisting of B3GAT2, FOXL2, KCNK13, RAB31 and SLIT1 was identified. Using the condition that DNA methylation of three or more markers qualifies a sample as CIMP, this panel identifies CIMP-H and CIMP-L tumors with 100% sensitivity and 95.6% specificity (2.4% misclassification) at a β-value threshold of ≧0.1.
In particular aspects, a second marker panel consisting of FAM78A, FSTL1, KCNC1, MYOCD, and SLC6A4 specifically identifies CIMP-H tumors with 100% sensitivity and 100% specificity (0% misclassification) using the condition that three or more markers show a DNA methylation β-value of ≧0.1.
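The relationship among the sensitivity, specificity and misclassification figures reported for the two panels can be illustrated with a minimal Python sketch. The function name `panel_metrics` and the confusion counts below are hypothetical: the counts were chosen only so that the resulting percentages resemble those reported for the CIMP-defining panel, and are not taken from this disclosure.

```python
def panel_metrics(tp, fn, fp, tn):
    """Compute panel performance from confusion counts.

    tp: CIMP tumors called positive     fn: CIMP tumors called negative
    fp: non-CIMP tumors called positive tn: non-CIMP tumors called negative
    """
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),           # fraction of true CIMP detected
        "specificity": tn / (tn + fp),           # fraction of non-CIMP correctly excluded
        "misclassification": (fp + fn) / total,  # fraction of all samples called wrongly
    }


# Hypothetical counts for illustration only (an assumption, not study data).
metrics = panel_metrics(tp=57, fn=0, fp=3, tn=65)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With no false negatives the sensitivity is 100% regardless of the number of false positives, which is why a panel can combine perfect sensitivity with a specificity below 100%.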
In certain aspects, a tumor sample is classified as CIMP-H if both marker panels are positive (three or more markers with DNA methylation for each panel).
In further aspects, a tumor sample is classified as CIMP-L if the CIMP-defining marker panel is positive while the CIMP-H specific panel is negative (0-2 genes methylated).
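The two-panel decision rule described above can be sketched in Python as follows. This is an illustrative sketch only, not Applicants' implementation: the function names, the dictionary input format and the example β-values are assumptions, and the handling of a sample that is positive for the CIMP-H specific panel but negative for the CIMP-defining panel (treated here as non-CIMP) is likewise an assumption, since that case is not addressed in the text.

```python
CIMP_PANEL = ("B3GAT2", "FOXL2", "KCNK13", "RAB31", "SLIT1")
CIMP_H_PANEL = ("FAM78A", "FSTL1", "KCNC1", "MYOCD", "SLC6A4")
BETA_THRESHOLD = 0.1  # a marker counts as methylated at beta >= 0.1
MIN_METHYLATED = 3    # a panel is positive with three or more methylated markers


def count_methylated(betas, panel, threshold=BETA_THRESHOLD):
    """Count the panel markers whose beta-value meets the threshold."""
    return sum(1 for gene in panel if betas[gene] >= threshold)


def classify_cimp(betas):
    """Classify one tumor sample from a dict mapping gene name to beta-value."""
    cimp_positive = count_methylated(betas, CIMP_PANEL) >= MIN_METHYLATED
    cimp_h_positive = count_methylated(betas, CIMP_H_PANEL) >= MIN_METHYLATED
    if cimp_positive and cimp_h_positive:
        return "CIMP-H"
    if cimp_positive:
        return "CIMP-L"
    # Assumption: any sample negative for the CIMP-defining panel is non-CIMP.
    return "non-CIMP"


# Hypothetical beta-values for a single sample (illustration only).
sample = {
    "B3GAT2": 0.45, "FOXL2": 0.30, "KCNK13": 0.12, "RAB31": 0.02, "SLIT1": 0.05,
    "FAM78A": 0.03, "FSTL1": 0.25, "KCNC1": 0.04, "MYOCD": 0.01, "SLC6A4": 0.02,
}
print(classify_cimp(sample))
```

In this hypothetical sample three CIMP-defining markers but only one CIMP-H specific marker reach the threshold, so the sample is called CIMP-L.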
Table 7 lists the gene and CpG island locations and sequences for the 10 marker genes comprising these two marker panels (i.e., B3GAT2, FOXL2, KCNK13, RAB31 and SLIT1; and FAM78A, FSTL1, KCNC1, MYOCD, and SLC6A4).
Table 11 lists the primer, probe and unconverted amplicon sequences for the MethyLight reactions for the 10 marker genes comprising these two marker panels (i.e., B3GAT2, FOXL2, KCNK13, RAB31 and SLIT1; and FAM78A, FSTL1, KCNC1, MYOCD, and SLC6A4), and for the MLH1 gene.
In yet further aspects, identification and/or classification of CIMP-H and CIMP-L subgroups is provided by a panel comprising at least one of the additional markers listed in Table 8. According to particular aspects,
In yet further aspects, identification and/or classification of CIMP-H subgroups is provided by a panel comprising at least one of the additional markers listed in Table 9.
In additional aspects the MethyLight five-marker panel (i.e., CACNA1G, IGF2, NEUROG1, RUNX3, SOCS1), or markers thereof, previously developed in Applicants' laboratory (Weisenberger et al., Nat Genet 38: 787-793, 2006; see also published U.S. patent application Ser. No. 11/913,535, DNA METHYLATION MARKERS ASSOCIATED WITH THE CPG ISLAND METHYLATOR PHENOTYPE (CIMP) IN HUMAN COLORECTAL CANCER, published as US-2009-0053706-A1 to Laird; all incorporated by reference herein in their entirety; and see Table 10) are used in combination with the panels disclosed herein to provide for identification and/or classification of CRC.
Promoter CpG island DNA hypermethylation can lead to transcriptional silencing of the associated gene. However, the majority of cancer-specific CpG island hypermethylation may occur in gene promoters that are not normally expressed, and therefore may not be involved in tumor initiation or progression (Widschwendter et al., 2007; Gal-Yam et al., 2008).
In this working example, Applicants examined the extent to which cancer-specific DNA hypermethylation affects gene expression in colorectal tumors, by performing an integrated analysis of promoter DNA methylation and gene expression data from six pairs of CIMP-H tumors and adjacent-normal tissues and 13 pairs of non-CIMP tumors and adjacent-normal tissues. Applicants found that 7.3% of genes that showed DNA hypermethylation (|Δβ|>0.20) in CIMP-H tumors also showed more than a 2-fold reduction in gene expression (
Applicants found that 112 genes (24%) that are downregulated in CIMP-H are directly associated with promoter DNA hypermethylation (Table 6 below).
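The integration of promoter DNA methylation and gene expression described in this Example can be sketched in Python as a simple two-criterion filter. This is a minimal sketch under stated assumptions, not Applicants' analysis pipeline: the function name, the input layout, the gene names and all values are hypothetical; hypermethylation is taken as a tumor-minus-normal β difference greater than 0.20; and expression values are assumed to be on a linear, non-negative scale.

```python
DELTA_BETA_CUTOFF = 0.20  # minimum tumor-minus-normal beta difference
FOLD_REDUCTION = 2.0      # expression must drop by more than this factor


def integrate(genes):
    """Return (hypermethylated genes, hypermethylated and downregulated genes).

    genes maps a gene name to a dict with the keys beta_tumor, beta_normal,
    expr_tumor and expr_normal (a hypothetical layout used for illustration).
    """
    hypermethylated, silenced = [], []
    for name, g in genes.items():
        if g["beta_tumor"] - g["beta_normal"] <= DELTA_BETA_CUTOFF:
            continue
        hypermethylated.append(name)
        # More than a 2-fold reduction, written without division so that a
        # tumor expression value of zero is handled safely.
        if g["expr_tumor"] * FOLD_REDUCTION < g["expr_normal"]:
            silenced.append(name)
    return hypermethylated, silenced


# Hypothetical paired measurements (illustration only).
toy = {
    "GENE_A": {"beta_tumor": 0.65, "beta_normal": 0.10, "expr_tumor": 20.0, "expr_normal": 100.0},
    "GENE_B": {"beta_tumor": 0.50, "beta_normal": 0.10, "expr_tumor": 90.0, "expr_normal": 100.0},
    "GENE_C": {"beta_tumor": 0.15, "beta_normal": 0.10, "expr_tumor": 10.0, "expr_normal": 100.0},
}
hyper, both = integrate(toy)
print(f"{len(both)} of {len(hyper)} hypermethylated genes are also downregulated")
```

Dividing the number of genes meeting both criteria by the number of hypermethylated genes, or by the number of downregulated genes, gives the two kinds of percentage discussed in this Example.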
Furthermore, 12 genes were identified that are both downregulated and cancer-specifically hypermethylated in both CIMP-H and non-CIMP tumors (
Intriguingly, 48/112 genes were also identified that are downregulated in both CIMP-H and non-CIMP tumors compared with the matched adjacent normal colon. However, substantial increases in promoter DNA methylation for these genes were observed only in CIMP-H tumors. This finding was confirmed for the LMOD1 gene using MethyLight and qRT-PCR technologies (
This application claims the benefit of priority to U.S. Provisional Patent Application Ser. Nos. 61/492,749 filed 2 Jun. 2011, and 61/492,325 filed 1 Jun. 2011, both of which are incorporated by reference herein in their entirety.
This invention was made with government support under Contract No. 5R01CA118699 awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
61492325 | Jun 2011 | US
61492749 | Jun 2011 | US