The present invention is directed to methods of determining the prognosis of a subject having colon cancer. Collections of genes whose expression levels are informative of colon cancer prognosis are also disclosed.
Oncologists are often faced with difficult treatment decisions regarding the use of chemotherapy and adjuvant radiation therapy for various tumors. Patients and oncologists are increasingly looking for prognostic indicators to help them make these difficult decisions. Since these treatments have significant toxicity and inherent dangers, it is critical to have means to help determine prognosis and minimize adverse events as a result of over-treating patients who would have fared well without aggressive treatments.
With the advent of accurate and rapid means to analyze the RNA and DNA found in tumors, diagnostic tests that predict outcome are increasingly utilized in clinical settings to help guide treatment decisions for clinicians. In particular, patients who suffer from breast cancer have recently been able to have their tumors analyzed using molecular genetic techniques to help predict their disease outcome. This initial breast cancer prognostic test consisted of a mutation analysis of a small number of genes including BRCA1, BRCA2, and BRCA3. Analysis of ErbB2 status has also been helpful in guiding patient treatment with targeted therapies such as Herceptin.
Although these initial analyses provided some useful information for a subset of breast cancer patients, they did not provide useful prognostic information for the vast majority of patients. Therefore, more recent attempts to provide prognostic information for breast cancer tumors have been based on gene expression patterns of multiple genes.
Several recent publications report the use of microarray gene expression analysis to characterize tumors such as breast cancers (Golub et al, “Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring,” Science, 286(5439):531-537 (1999); Bhattacharjee et al, “Classification of Human Lung Carcinomas by mRNA Expression Profiling Reveals Distinct Adenocarcinoma Subclasses,” Proceed. Natl. Acad. Sci. U.S.A., 98(24):13790-13795 (2001); Ramaswamy et al, “Multiclass Cancer Diagnosis Using Tumor Gene Expression Signatures,” Proceed. Natl. Acad. Sci. U.S.A., 98(26):15149-15154 (2001); Martin et al, “Linking Gene Expression Patterns to Therapeutic Groups in Breast Cancer,” Cancer Res., 60(8):2232-2238 (2000); West et al, “Predicting the Clinical Status of Human Breast Cancer by Using Gene Expression Profiles,” Proceed. Natl. Acad. Sci. U.S.A., 98(20):11462-11467 (2001)). These studies have shown gene expression patterns specific to breast cancer tumors that may have prognostic value. (Sorlie et al, “Gene Expression Patterns of Breast Carcinomas Distinguish Tumor Subclasses with Clinical Implications,” Proceed. Natl. Acad. Sci. U.S.A., 98(19):10869-10874 (2001); Yan et al, “Dissecting Complex Epigenetic Alterations in Breast Cancer Using CpG Island Microarrays,” Cancer Res., 61(23):8375-8380 (2001); Van De Vijver et al, “A Gene-Expression Signature as a Predictor of Survival in Breast Cancer,” N. Engl. J. Med., 347(25):1999-2009 (2002)). Using similar techniques, commercial products like Oncotype Dx (Genomic Health, Redwood City, Calif.) have been developed, making breast cancer prognosis widely available.
Similar testing for other cancers, such as colon cancer, is currently not available. This year, over 153,000 new cases of colorectal cancer (CRC) will be diagnosed, and 52,180 patients will die from this disease in the United States. There is an urgent need to improve colorectal cancer prognosis by developing accurate molecular techniques that will complement the clinico-pathology, as well as to identify individuals with early disease.
The present invention is directed to overcoming these and other deficiencies in the art.
A first aspect of the present invention relates to a method for determining the prognosis of a subject having colon cancer that involves obtaining a biological sample from the subject and detecting expression levels of at least five genes selected from a group of 176 genes informative of colon cancer prognosis. The group of 176 genes informative of colon cancer prognosis includes the following genes: ACSL4, RQCD1, AA058828*, AIP, AKR1A1, AP3D1, ARL2BP, ARL4A, ARL6IP4, OGFOD2, ASNA1, ATP5B, C12orf52, C19orf36, C1GALT1, C1orf144, C5orf23, C6orf15, C7orf10, C8orf70, CALML4, CASP1, CCNA2, CCT2, CDC42BPA, AK023058*, CDR2L, CFB, CHST12, CLN5, CMPK1, CNOT7, CNPY2, COBL, COMMD4, COX5A, CXCL11, CYB561, CYB5B, DAZAP2, DDX23, DENND2A, DENND2D, DHX15, AL359599*, DND1, DOCK9, EGFR, ELP3, ERP29, ETV1, FAM82C, FDFT1, FKBP14, FLJ10357, FRYL, GALNS, GCHFR, GHITM, GLS, GPR177, GRB10, GREM2, GRHPR, GRP, GSR, GSTA1, H2AFZ, HOXB7, IFT88, IL15RA, ISG20, ITGAE, KIAA0746, SERINC2, KIF13B, KLC1, LAMP3, LANCL1, LAP3, LEPREL1, LL22NC03-5H6.5, LOC100131861, SAMM50, LRRC41, LRRC47, MAP4, MAPKAPK5, MCM5, MCRS1, METRN, METTL3, MFHAS1, MMP3, MOSPD1, MRPL46, MTUS1, MYRIP, N4BP2L2, NAB1, NAT1, NDUFC1, NISCH, NUMB, OGT, OSBPL3, PAM, PBK, PDGFA, PEBP1, PGDS, PIGR, PIGT, PRDM2, PRELP, PSMA5, PSMD9, PSPC1, PTHLH, R3HCC1, RP3-377H14.5, RPLP0, RPLP0-like, RPS27L, RTN2, RYK, SAV1, SCAMP1, SERPINA1, SF3B1, SFPQ, SGCD, SLC25A3, SLC39A8, SMG7, SMURF2, SORD, SOX4, SPATA5L1, SQRDL, SRP72, SSNA1, STK3, SYNGR1, TAPBPL, TEGT, TES, TLN1, TMCC1, TMEM106C, TMEM16A, TMEM33, TMEM87A, TNFRSF10B, TNFSF10, TNIK, TRIM36, U2AF2, UBE2L6, UCP2, UNC84A, UQCRFS1, UQCRH, USP12, USP3, VPS41, WARS, WDR1, WDR68, XPO7, YBX1, ZC3H7B, ZMYM2, ZMYM5, ZNF117, and ZNF430. 
This method further involves comparing the detected expression levels of the at least five genes from the biological sample with the expression levels of the corresponding at least five genes when associated with a good disease prognosis expression profile and when associated with a bad disease prognosis expression profile. Based on that comparison, the prognosis of the subject having colon cancer is determined.
Another aspect of the present invention relates to a method for determining the prognosis of a subject having colon cancer that involves obtaining a biological sample from the subject and detecting the expression level of at least five genes selected from a group of 101 genes informative of colon cancer prognosis. The group of 101 genes informative of colon cancer prognosis includes the following genes: NARS, WDR1, WARS, CCT4, ATP5B, SORD, UBE2L6, PSME2, AIP, RRM2, LRRC41, CCT2, TAF9, HDAC5, SVIL, CCNB2, DBN1, PBX2, RFC5, IDE, MAD2L1, PSMA4, NDUFC1, IVD, PPIH, NEO1, CXCL10, FXN, GABBR1, ARHGAP8, LOC553158, HOXA4, COMMD4, DFFB, KLF12, GLMN, CASP7, PIR, ATP5G3, ACTN1, DDOST, TAPBP, RGL2, CYB561, TUSC3, C3orf63, GRB10, NR2F1, WDR68, CXCL2, CNPY2, CASP1, INDO, PFKM, CXCL11, MCAM, MAP2K5, MRPS11, NOLC1, CD59, CAMSAP1L1, SHANK2, KLC1, EMP1, C1orf95, GMDS, RPLP0, RPLP0-like, PDLIM4, PAM, TM4SF1, BEX4, ADORA1, FAM48A, ITM2B, PREB, CMPK1, LAP3, FAM82C, AACS, RP5-1077B9.4, NUP37, RHBDF1, PBK, TIPIN, TMEM204, ALG6, NPR3, SCD5, FLJ13236, GPATCH4, GREM2, RPL22, KLHL3, C15orf44, USP3, TNS1, ZBTB20, RTN2, FLJ10357, and CALML4. This method further involves comparing the detected expression levels of the at least five genes from the biological sample with the expression levels of the corresponding at least five genes when associated with a good disease prognosis expression profile and when associated with a bad disease prognosis expression profile. Based on that comparison, the prognosis of the subject having colon cancer is determined.
The present invention is also directed to a method of identifying an agent that improves the prognosis of a subject having colon cancer. This method involves administering the agent to the subject having colon cancer and obtaining a first biological sample from the subject before said administering and a second biological sample from the subject after said administering. The method further involves detecting the expression levels of at least five genes selected from the group of 176 genes informative of colon cancer prognosis disclosed supra. Determining increases or decreases in the expression levels of the at least five genes in the second sample compared to the first sample identifies an agent that improves the prognosis of a subject having colon cancer.
Another aspect of the present invention is directed to a collection of 71 genes having expression levels informative for predicting a prognosis of a patient having colon cancer. The collection of 71 genes comprises the following genes: SLC25A3, DAZAP2, TEGT, ERP29, PSMA5, DDX23, LOC100131861, SAMM50, SFPQ, NISCH, CYB5B, TMEM106C, EGFR, MCRS1, SERPINA1, CCNA2, NDUFC1, COX5A, GCHFR, ITGAE, PRDM2, PDGFA, GSR, GRP, COMMD4, XPO7, YBX1, SRP72, UCP2, SLC39A8, NAB1, WDR68, CXCL11, RECQL, CASP1, PTHLH, UNC84A, MTUS1, KIAA0746, SERINC2, DOCK9, FRYL, MAPKAPK5, LRRC47, RQCD1, TNIK, RPLP0, RPLP0-like, CLN5, NAT1, CDC42BPA, GSTA1, ZMYM5, RYK, PIGT, CMPK1, SQRDL, FAM82C, CNOT7, LL22NC03-5H6.5, PSPC1, TAPBPL, METRN, PBK, MRPL46, FKBP14, C1GALT1, GREM2, GPR177, DND1, and PRELP.
Another aspect of the present invention is related to a collection of 101 genes having expression levels informative for predicting a prognosis of a patient having colon cancer. The collection of 101 genes comprises the following genes: AACS, ACTN1, ADORA1, AIP, ALG6, ARHGAP8, LOC553158, ATP5B, ATP5G3, BEX4, C15orf44, C1orf95, C3orf63, CALML4, CAMSAP1L1, CASP1, CASP7, CCNB2, CCT2, CCT4, CD59, CMPK1, CNPY2, COMMD4, CXCL10, CXCL11, CXCL2, CYB561, DBN1, DDOST, DFFB, EMP1, FAM48A, FAM82C, FLJ10357, FLJ13236, FXN, GABBR1, GLMN, GMDS, GPATCH4, GRB10, GREM2, HDAC5, HOXA4, IDE, INDO, ITM2B, IVD, KLC1, KLF12, KLHL3, LAP3, LRRC41, MAD2L1, MAP2K5, MCAM, MRPS11, NARS, NDUFC1, NEO1, NOLC1, NPR3, NR2F1, NUP37, PAM, PBK, PBX2, PDLIM4, PFKM, PIR, PPIH, PREB, PSMA4, PSME2, RFC5, RGL2, RHBDF1, RP5-1077B9.4, RPL22, RPLP0, RPLP0-like, RRM2, RTN2, SCD5, SHANK2, SORD, SVIL, TAF9, TAPBP, TIPIN, TM4SF1, TMEM204, TNS1, TUSC3, UBE2L6, USP3, WARS, WDR1, WDR68, and ZBTB20.
The current standard of care for colorectal cancer provides the average treatment for the average tumor, with less than average results. Current cancer care over-treats many patients to help an unknown few, with toxic, relatively ineffective, expensive therapeutics. There is an urgent need to develop a means to predict which patients will respond to standard therapies, which patients do not require therapy in addition to surgery, and which patients are likely not to respond to current therapeutics. For every 100 stage II and III colon cancer patients on adjuvant therapy, only about 12 of them will respond favorably, about 50 would survive without therapy, and about 38 will experience a recurrence even when given the current treatments. The current invention seeks to help individuals on both sides of this equation by stratifying the risk of a poor outcome. Thus, individuals with low risk tumors, in consultation with their physicians, may opt to avoid unnecessary and debilitating therapy. On the other hand, individuals with high risk tumors may seek to enroll in clinical trials testing the newest therapies to increase their chance of a better outcome.
The present invention relates generally to methods of determining the prognosis of a subject having colon cancer. In a first aspect of the present invention, the method for determining the prognosis of a subject having colon cancer involves obtaining a biological sample from the subject and detecting expression levels of at least five genes selected from the group of 176 genes informative of colon cancer prognosis. The group of 176 genes informative of colon cancer prognosis includes the following genes: ACSL4, RQCD1, AA058828*, AIP, AKR1A1, AP3D1, ARL2BP, ARL4A, ARL6IP4, OGFOD2, ASNA1, ATP5B, C12orf52, C19orf36, C1GALT1, C1orf144, C5orf23, C6orf15, C7orf10, C8orf70, CALML4, CASP1, CCNA2, CCT2, CDC42BPA, AK023058*, CDR2L, CFB, CHST12, CLN5, CMPK1, CNOT7, CNPY2, COBL, COMMD4, COX5A, CXCL11, CYB561, CYB5B, DAZAP2, DDX23, DENND2A, DENND2D, DHX15, AL359599*, DND1, DOCK9, EGFR, ELP3, ERP29, ETV1, FAM82C, FDFT1, FKBP14, FLJ10357, FRYL, GALNS, GCHFR, GHITM, GLS, GPR177, GRB10, GREM2, GRHPR, GRP, GSR, GSTA1, H2AFZ, HOXB7, IFT88, IL15RA, ISG20, ITGAE, KIAA0746, SERINC2, KIF13B, KLC1, LAMP3, LANCL1, LAP3, LEPREL1, LL22NC03-5H6.5, LOC100131861, SAMM50, LRRC41, LRRC47, MAP4, MAPKAPK5, MCM5, MCRS1, METRN, METTL3, MFHAS1, MMP3, MOSPD1, MRPL46, MTUS1, MYRIP, N4BP2L2, NAB1, NAT1, NDUFC1, NISCH, NUMB, OGT, OSBPL3, PAM, PBK, PDGFA, PEBP1, PGDS, PIGR, PIGT, PRDM2, PRELP, PSMA5, PSMD9, PSPC1, PTHLH, R3HCC1, RP3-377H14.5, RPLP0, RPLP0-like, RPS27L, RTN2, RYK, SAV1, SCAMP1, SERPINA1, SF3B1, SFPQ, SGCD, SLC25A3, SLC39A8, SMG7, SMURF2, SORD, SOX4, SPATA5L1, SQRDL, SRP72, SSNA1, STK3, SYNGR1, TAPBPL, TEGT, TES, TLN1, TMCC1, TMEM106C, TMEM16A, TMEM33, TMEM87A, TNFRSF10B, TNFSF10, TNIK, TRIM36, U2AF2, UBE2L6, UCP2, UNC84A, UQCRFS1, UQCRH, USP12, USP3, VPS41, WARS, WDR1, WDR68, XPO7, YBX1, ZC3H7B, ZMYM2, ZMYM5, ZNF117, and ZNF430. 
This method further involves comparing the detected expression levels of the at least five genes from the biological sample with the expression levels of the corresponding at least five genes when associated with a good disease prognosis expression profile and when associated with a bad disease prognosis expression profile. Based on that comparison, the prognosis of the subject having colon cancer is determined.
In a preferred embodiment of this aspect of the present invention, the at least five genes are selected from a group of 71 genes informative of colon cancer prognosis. This group of 71 genes is a subset of the 176 genes informative of colon cancer prognosis and includes the following genes: SLC25A3, DAZAP2, TEGT, ERP29, PSMA5, DDX23, LOC100131861, SAMM50, SFPQ, NISCH, CYB5B, TMEM106C, EGFR, MCRS1, SERPINA1, CCNA2, NDUFC1, COX5A, GCHFR, ITGAE, PRDM2, PDGFA, GSR, GRP, COMMD4, XPO7, YBX1, SRP72, UCP2, SLC39A8, NAB1, WDR68, CXCL11, RECQL, CASP1, PTHLH, UNC84A, MTUS1, KIAA0746, SERINC2, DOCK9, FRYL, MAPKAPK5, LRRC47, RQCD1, TNIK, RPLP0, RPLP0-like, CLN5, NAT1, CDC42BPA, GSTA1, ZMYM5, RYK, PIGT, CMPK1, SQRDL, FAM82C, CNOT7, LL22NC03-5H6.5, PSPC1, TAPBPL, METRN, PBK, MRPL46, FKBP14, C1GALT1, GREM2, GPR177, DND1, and PRELP.
As described in greater detail in the Examples below, the groups of 176 and 71 genes, whose expression levels are informative for predicting colon cancer outcome, were derived from a larger pool of 383 genes. Kaplan-Meier (KM) survival curves were generated for the 383 genes, and genes having p-values of >0.02 were removed from further analysis. The remaining group of 176 genes was further narrowed to 71 genes by removing genes having KM-curve p-values of >0.0125 (See
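The two-stage narrowing described above (383 genes to 176 at a p-value cutoff of 0.02, then to 71 at a cutoff of 0.0125) can be sketched as a simple threshold filter. This is a minimal illustration only: the function name `filter_by_p_value`, the placeholder gene labels, and the p-values are hypothetical and are not data from this disclosure; the computation of the Kaplan-Meier p-values themselves is not shown.

```python
def filter_by_p_value(p_values, cutoff):
    """Keep genes whose p-value is not greater than the cutoff.

    Genes with p-values > cutoff are removed, matching the wording
    "genes having p-values of >0.02 were removed".
    """
    return {gene: p for gene, p in p_values.items() if p <= cutoff}


# Hypothetical p-values for placeholder genes (illustration only).
km_p_values = {
    "GENE_A": 0.001,
    "GENE_B": 0.0125,
    "GENE_C": 0.015,
    "GENE_D": 0.02,
    "GENE_E": 0.3,
}

# First pass: remove genes with p > 0.02 (383 -> 176 in the text).
first_pass = filter_by_p_value(km_p_values, 0.02)

# Second pass: remove genes with p > 0.0125 (176 -> 71 in the text).
second_pass = filter_by_p_value(first_pass, 0.0125)

print(sorted(first_pass))
print(sorted(second_pass))
```

Because the second cutoff is stricter than the first, the smaller group is necessarily a subset of the larger one, consistent with the 71 genes being described as a subset of the 176.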
The term “prognosis” as used in the context of the present invention refers to the prediction of disease outcome for a subject having colon cancer. Disease outcome encompasses disease progression, recurrence, metastasis, and drug resistance. Determining the prognosis of a subject having colon cancer in accordance with the methods of the present invention has particular value for determining an appropriate treatment plan. For example, the prognosis of a subject determined using the methods of the present invention can predict a subject's response to a specific drug or combination of drugs, chemotherapy, radiation therapy, or surgical removal, and whether survival following the administration of a particular treatment plan is likely.
As used herein a “disease prognosis expression profile” refers to gene expression of a collection of genes informative of disease outcome that is associated with a good disease outcome or a bad disease outcome. The gene expression of a collection of genes that is associated with a good disease outcome is a good disease prognosis expression profile. A good disease prognosis expression profile consists of genes having expression levels that are below the average tumor sample expression level and/or genes having expression levels that are above the average tumor sample expression level. In a preferred embodiment of the present invention a good disease prognosis expression profile for the group of 176 genes informative of colon cancer prognosis consists of genes having expression levels that are below that of an average tumor sample expression level that are selected from the group consisting of AK023058*, AIP, ARL2BP, C1GALT1, CDC42BPA, C8orf70, CLN5, COBL, CYB5B, MOSPD1, DOCK9, EGFR, FKBP14, DND1, GREM2, GPR177, GALNS, GRB10, GRP, GSTA1, RP3-377H14.5, HOXB7, ZNF117, TNIK, LANCL1, METRN, LEPREL1, NAB1, NISCH, OGT, OSBPL3, PDGFA, PRDM2, PRELP, PSPC1, RECQL, RYK, SMURF2, TLN1, UNC84A, USP12, ZMYM2, ZMYM5, AL359599*, ARL4A, N4BP2L2, GLS, C19orf36, TMCC1, METTL3, TMEM16A, RTN2, SCAMP1, SF3B1, SOX4, STK3, ZNF430, C6orf15, C7orf10, CHST12, ETV1, ACSL4, FLJ10357, C5orf23, AA058828*, CDR2L, KLC1, MAP4, NUMB, PAM, PGDS, PTHLH, ZC3H7B, SAV1, SGCD, SYNGR1, TES, IFT88, TRIM36 and VPS41. 
The good disease prognosis expression profile for the group of 176 genes further consists of genes having expression levels that are above the average tumor sample expression level that are selected from the group consisting of SERPINA1, RPLP0, RPLP0-like, CYB561, AKR1A1, AP3D1, ARL6IP4, OGFOD2, ASNA1, CFB, ERP29, SMG7, CASP1, CCNA2, LOC100131861, SAMM50, COX5A, CXCL11, DAZAP2, DDX23, FDFT1, COMMD4, GCHFR, GRHPR, GSR, ISG20, ITGAE, KIAA0746, SERINC2, FRYL, LRRC47, LAMP3, R3HCC1, MAPKAPK5, MCM5, MCRS1, TMEM106C, MMP3, MTUS1, LRRC41, NAT1, NDUFC1, YBX1, PEBP1, PIGR, PSMA5, SFPQ, SLC25A3, SLC39A8, SQRDL, SRP72, SSNA1, TAPBPL, TEGT, PBK, UCP2, UQCRH, XPO7, CCT2, CNOT7, DHX15, TMEM87A, ELP3, FAM82C, LL22NC03-5H6.5, DENND2D, WDR68, IL15RA, DENND2A, KIF13B, MFHAS1, SPATA5L1, MYRIP, PIGT, PSMD9, RPS27L, TEGT, TNFRSF10B, UBE2L6, USP3, ATP5B, CALML4, C1orf144, TMEM33, C12orf52, GHITM, H2AFZ, LAP3, MRPL46, SORD, CNPY2, TNFSF10, U2AF2, CMPK1, UQCRFS1, WARS and WDR1.
The gene expression of a collection of genes informative of disease outcome that is associated with a bad disease outcome is a bad disease prognosis expression profile. A bad disease prognosis expression profile consists of genes having expression levels above and/or below the average tumor sample expression level. In a preferred embodiment of the present invention, a bad disease prognosis expression profile for the collection of 176 genes informative of colon cancer prognosis consists of genes having expression levels that are below that of an average tumor sample expression level selected from the group consisting of SERPINA1, RPLP0, RPLP0-like, CYB561, AKR1A1, AP3D1, ARL6IP4, OGFOD2, ASNA1, CFB, ERP29, SMG7, CASP1, CCNA2, LOC100131861, SAMM50, COX5A, CXCL11, DAZAP2, DDX23, FDFT1, COMMD4, GCHFR, GRHPR, GSR, ISG20, ITGAE, KIAA0746, SERINC2, FRYL, LRRC47, LAMP3, R3HCC1, MAPKAPK5, MCM5, MCRS1, TMEM106C, MMP3, MTUS1, LRRC41, NAT1, NDUFC1, YBX1, PEBP1, PIGR, PSMA5, SERPINA1, SFPQ, SLC25A3, SLC39A8, SQRDL, SRP72, SSNA1, TAPBPL, TEGT, PBK, UCP2, UQCRH, XPO7, CCT2, CNOT7, DHX15, TMEM87A, ELP3, FAM82C, LL22NC03-5H6.5, DENND2D, WDR68, IL15RA, DENND2A, KIF13B, MFHAS1, SPATA5L1, MYRIP, PIGT, PSMD9, RPS27L, TNFRSF10B, UBE2L6, USP3, ATP5B, CALML4, C1orf144, TMEM33, C12orf52, GHITM, H2AFZ, LAP3, MRPL46, SORD, CNPY2, TNFSF10, U2AF2, CMPK1, UQCRFS1, WARS and WDR1; and genes having expression levels that are above the average tumor sample expression level selected from the group consisting of AK023058*, AIP, ARL2BP, C1GALT1, CDC42BPA, C8orf70, CLN5, COBL, CYB5B, MOSPD1, DOCK9, EGFR, FKBP14, DND1, GREM2, GPR177, GALNS, GRB10, GRP, GSTA1, RP3-377H14.5, HOXB7, ZNF117, TNIK, LANCL1, METRN, LEPREL1, NAB1, NISCH, OGT, OSBPL3, PDGFA, PRDM2, PRELP, PSPC1, RECQL, RYK, SMURF2, TLN1, UNC84A, USP12, ZMYM2, ZMYM5, AL359599*, ARL4A, N4BP2L2, GLS, C19orf36, TMCC1, METTL3, TMEM16A, RTN2, SCAMP1, SF3B1, SOX4, STK3, ZNF430, C6orf15, C7orf10, CHST12, ETV1, ACSL4, FLJ10357, C5orf23, AA058828*, CDR2L, KLC1, MAP4, NUMB, PAM, PGDS, PTHLH, ZC3H7B, SAV1, SGCD, SYNGR1, TES, IFT88, TRIM36 and VPS41.
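Because the bad profile is the mirror image of the good profile, each gene can be called "good" or "bad" from two facts: whether its expression is above or below the average tumor sample expression level, and which direction is associated with the good profile for that gene. The sketch below illustrates this under stated assumptions: the function name `call_gene`, the use of a per-gene average, the "neither" label for a level equal to the average, and all numeric values are hypothetical. Only the direction assigned to the six example genes is taken from the lists above.

```python
# Direction of expression associated with the GOOD profile, for a few
# genes taken from the lists above (the full lists are much longer).
LOW_IN_GOOD = {"EGFR", "GREM2", "PTHLH"}     # below average => good
HIGH_IN_GOOD = {"CASP1", "CXCL11", "WARS"}   # above average => good


def call_gene(gene, level, tumor_average):
    """Classify one gene's expression as 'good', 'bad', or 'neither'.

    'neither' (an assumption, not a term from the text) is returned when
    the gene is not in either list or its level equals the average.
    """
    if level == tumor_average:
        return "neither"
    below = level < tumor_average
    if gene in LOW_IN_GOOD:
        return "good" if below else "bad"
    if gene in HIGH_IN_GOOD:
        return "bad" if below else "good"
    return "neither"


# Hypothetical measurements: (expression level, average tumor level).
sample = {
    "EGFR": (2.0, 5.0),    # below average, low-in-good  -> good
    "CASP1": (2.0, 5.0),   # below average, high-in-good -> bad
    "WARS": (9.0, 5.0),    # above average, high-in-good -> good
    "PTHLH": (7.5, 5.0),   # above average, low-in-good  -> bad
}

calls = {gene: call_gene(gene, lvl, avg) for gene, (lvl, avg) in sample.items()}
print(calls)
```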
Another aspect of the present invention relates to a method for determining the prognosis of a subject having colon cancer that involves obtaining a biological sample from the subject and detecting the expression levels of at least five genes selected from the group of 101 genes informative of colon cancer prognosis. The group of 101 genes informative of colon cancer prognosis is provided in Table 2 below. This method further involves comparing the detected expression levels of the at least five genes from the biological sample with the expression levels of the corresponding at least five genes when associated with a good disease prognosis expression profile and when associated with a bad disease prognosis expression profile. Based on that comparison, the prognosis of the subject having colon cancer is determined.
In accordance with this aspect of the present invention, a good disease prognosis expression profile consists of genes, from the collection of 101 genes informative of colon cancer disease outcome, having expression levels that are below that of an average tumor sample expression level that are selected from the group consisting of ACTN1, ADORA1, ARHGAP8, LOC553158, BEX4, C1orf95, C3orf63, CAMSAP1L1, CD59, CNPY2, DBN1, FAM48A, FLJ10357, GPATCH4, GRB10, GREM2, HDAC5, HOXA4, ITM2B, KLC1, KLF12, KLHL3, NPR3, PAM, PBX2, PDLIM4, PIR, RGL2, RHBDF1, RP5-1077B9.4, RTN2, SCD5, SHANK2, SVIL, TAPBP, TIPIN, TM4SF1, TMEM204, TNS1, TUSC3 and ZBTB20. A good disease expression profile further consists of genes having expression levels that are above the average tumor sample expression level that are selected from the group consisting of NARS, WDR1, WARS, CCT4, ATP5B, SORD, UBE2L6, PSME2, AIP, RRM2, LRRC41, CCT2, TAF9, CCNB2, RFC5, IDE, MAD2L1, PSMA4, NDUFC1, IVD, PPIH, NEO1, CXCL10, FXN, GABBR1, COMMD4, DFFB, GLMN, CASP7, ATP5G3, DDOST, CYB561, NR2F1, WDR68, CXCL2, CASP1, INDO, PFKM, CXCL11, MCAM, MAP2K5, MRPS11, NOLC1, EMP1, GMDS, RPLP0, RPLP0-like, PREB, CMPK1, LAP3, FAM82C, AACS, NUP37, PBK, ALG6, FLJ13236, RPL22, C15orf44, USP3 and CALML4.
Also in accordance with this aspect of the present invention, a bad disease prognosis expression profile consists of genes from the collection of 101 genes informative of colon cancer disease outcome, having expression levels below that of an average tumor sample expression level that are selected from the group consisting of NARS, WDR1, WARS, CCT4, ATP5B, SORD, UBE2L6, PSME2, AIP, RRM2, LRRC41, CCT2, TAF9, CCNB2, RFC5, IDE, MAD2L1, PSMA4, NDUFC1, IVD, PPIH, NEO1, CXCL10, FXN, GABBR1, COMMD4, DFFB, GLMN, CASP7, ATP5G3, DDOST, CYB561, NR2F1, WDR68, CXCL2, CASP1, INDO, PFKM, CXCL11, MCAM, MAP2K5, MRPS11, NOLC1, EMP1, GMDS, RPLP0, RPLP0-like, PREB, CMPK1, LAP3, FAM82C, AACS, NUP37, PBK, ALG6, FLJ13236, RPL22, C15orf44, USP3 and CALML4. A bad disease expression profile further consists of genes having expression levels that are above the average tumor sample expression level that are selected from the group consisting of ACTN1, ADORA1, ARHGAP8, LOC553158, BEX4, C1orf95, C3orf63, CAMSAP1L1, CD59, CNPY2, DBN1, FAM48A, FLJ10357, GPATCH4, GRB10, GREM2, HDAC5, HOXA4, ITM2B, KLC1, KLF12, KLHL3, NPR3, PAM, PBX2, PDLIM4, PIR, RGL2, RHBDF1, RP5-1077B9.4, RTN2, SCD5, SHANK2, SVIL, TAPBP, TIPIN, TM4SF1, TMEM204, TNS1, TUSC3 and ZBTB20.
Determining the prognosis of a subject having colon cancer using the gene expression data of the present invention involves calculating the percentage of genes analyzed having expression levels associated with a good disease prognosis expression profile and the percentage of genes analyzed having expression levels associated with a bad disease prognosis expression profile in the sample from the subject. A favorable prognosis for the subject exists when greater than 20%, more preferably, greater than 25%, and most preferably, greater than 30% of the genes analyzed have expression levels associated with a good disease prognosis expression profile and less than 30%, more preferably, less than 25%, and most preferably, less than 20% of the genes analyzed have expression levels associated with a bad disease prognosis expression profile. An unfavorable prognosis for the subject exists when greater than 20%, more preferably, greater than 25%, and most preferably, greater than 30% of the genes analyzed have expression levels associated with a bad disease prognosis expression profile and less than 30%, more preferably, less than 25%, and most preferably, less than 20% of the genes analyzed have expression levels associated with a good disease prognosis expression profile.
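The percentage rule above can be sketched as follows. This is an illustrative reading, not a definitive implementation: the function name `determine_prognosis`, its parameter names, and the "indeterminate" result are assumptions introduced here. The text does not say what happens when neither rule, or both rules, are satisfied (with the broadest thresholds, 25% good and 25% bad meets both), so this sketch reports such cases as "indeterminate". The default thresholds are the broadest pair stated (greater than 20%, less than 30%).

```python
def determine_prognosis(calls, majority_min=20.0, minority_max=30.0):
    """Classify a sample from per-gene calls ('good', 'bad', or other).

    favorable:   % good > majority_min and % bad  < minority_max
    unfavorable: % bad  > majority_min and % good < minority_max
    Anything else, including both rules holding, is 'indeterminate'
    (an assumption; the source text does not define this case).
    """
    if not calls:
        raise ValueError("at least one gene call is required")
    total = len(calls)
    pct_good = 100.0 * sum(1 for c in calls if c == "good") / total
    pct_bad = 100.0 * sum(1 for c in calls if c == "bad") / total

    favorable = pct_good > majority_min and pct_bad < minority_max
    unfavorable = pct_bad > majority_min and pct_good < minority_max

    if favorable and not unfavorable:
        return "favorable"
    if unfavorable and not favorable:
        return "unfavorable"
    return "indeterminate"


# Hypothetical call lists for ten analyzed genes.
mostly_good = ["good"] * 6 + ["bad"] * 1 + ["neither"] * 3   # 60% / 10%
mostly_bad = ["bad"] * 5 + ["good"] * 1 + ["neither"] * 4    # 10% / 50%
split = ["good"] * 5 + ["bad"] * 5                           # 50% / 50%

print(determine_prognosis(mostly_good))
print(determine_prognosis(mostly_bad))
print(determine_prognosis(split))
```

The more preferred thresholds can be applied by passing, for example, `majority_min=30.0, minority_max=20.0`.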
A biological sample obtained from the subject having colon cancer in accordance with the methods of the present invention can be any biological tissue, fluid, or cell sample. Typical biological samples include, but are not limited to, sputum, blood, blood cells (e.g., white cells), tissue or fine needle biopsy samples, urine, stool, peritoneal fluid, and pleural fluid, or cells therefrom. Biological samples may also include sections of tissues such as frozen sections taken for histological purposes. In a preferred embodiment of the present invention, the biological sample obtained from the subject having colon cancer is a population of primary colon cancer cells. The colon cancer cells can be derived from a stage I, II, III, or IV colon cancer tumor.
Methods of isolating RNA and protein from biological samples for use in the methods of the present invention are readily known in the art. Protein preparation can be carried out using any method that produces analyzable protein. For example, the sample cells or tissue can be lysed in a protein lysis buffer (e.g., 50 mM Tris-HCl (pH 6.8), 100 mM DTT, 100 μg/ml PMSF, 2% SDS, 10% glycerol, 1 μg/ml each of pepstatin A, leupeptin, and aprotinin, and 1 mM sodium orthovanadate) and sheared with a 22-gauge needle. Other methods of protein isolation that are suitable for use in carrying out the methods of the present invention are fully described in D
Methods of isolation and purification of nucleic acids suitable for use in carrying out the methods of the present invention are described in detail in L
It may be desirable to amplify the nucleic acid sample prior to detecting gene expression. One of skill in the art will appreciate that a method which maintains or controls for the relative frequencies of the amplified nucleic acids to achieve quantitative amplification should be used.
Typically, methods for amplifying nucleic acids employ a polymerase chain reaction (PCR) (See e.g., P
Other suitable amplification methods include the ligase chain reaction (LCR) (e.g., Wu et al., “The Ligation Amplification Reaction (LAR)—Amplification of Specific DNA Sequences Using Sequential Rounds of Template-Dependent Ligation,” Genomics 4:560-9 (1989), Landegren et al., “A Ligase-Mediated Gene Detection Technique,” Science 241:1077-80 (1988) and Barringer et al., “Blunt-End and Single-Strand Ligations by Escherichia coli Ligase: Influence on an In Vitro Amplification Scheme,” Gene 89:117-22 (1990), which are hereby incorporated by reference in their entirety); transcription amplification (Kwoh et al., “Transcription-Based Amplification System and Detection of Amplified Human Immunodeficiency Virus Type 1 with a Bead-Based Sandwich Hybridization Format,” Proc. Natl. Acad. Sci. USA 86:1173-7 (1989) and WO88/10315 to Gingeras, which are hereby incorporated by reference in their entirety); self-sustained sequence replication (Guatelli et al., “Isothermal, In Vitro Amplification of Nucleic Acids by a Multienzyme Reaction Modeled After Retroviral Replication,” Proc. Natl. Acad. Sci. USA 87:1874-8 (1990) and WO90/06995 to Gingeras, which are hereby incorporated by reference in their entirety); selective amplification of target polynucleotide sequences (U.S. Pat. No. 6,410,276 to Burg et al, which is hereby incorporated by reference in its entirety); consensus sequence primed polymerase chain reaction (CP-PCR) (U.S. Pat. No. 5,437,975 to McClelland, which is hereby incorporated by reference in its entirety); arbitrarily primed polymerase chain reaction (AP-PCR) (U.S. Pat. Nos. 5,413,909 to Bassam, and 5,861,245 to McClelland which are hereby incorporated by reference in their entirety); and nucleic acid based sequence amplification (NABSA) (See U.S. Pat. Nos. 5,409,818, 5,554,517, and 6,063,603 all to Davey, which are hereby incorporated by reference in their entirety). Other amplification methods that may be used are described in U.S. Pat. Nos. 5,242,794 to Whiteley; 5,494,810 to Barany; and 4,988,617 to Landegren, which are hereby incorporated by reference in their entirety.
As described herein, detecting the “expression level” of a gene can be achieved by measuring any suitable value that is representative of the gene expression level. The measurement of gene expression levels can be direct or indirect. A direct measurement involves measuring the level or quantity of RNA or protein. An indirect measurement may involve measuring the level or quantity of cDNA, amplified RNA, DNA, or protein; the activity level of RNA or protein; or the level or activity of other molecules (e.g., a metabolite) that are indicative of the foregoing. The measurement of expression can be a measurement of the absolute quantity of a gene product. The measurement can also be a value representative of the absolute quantity, a normalized value (e.g., a quantity of gene product normalized against the quantity of a reference gene product), an averaged value (e.g., average quantity obtained at different time points or from different tumor cell samples from a subject, or average quantity obtained using different probes, etc.), or a combination thereof.
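Two of the derived values mentioned above, a normalized value and an averaged value, can be illustrated with a short sketch. The function names `normalize` and `averaged_normalized` and all quantities are hypothetical; the choice of reference gene product and the units of measurement are left open by the text and are not assumed here.

```python
from statistics import mean


def normalize(target_quantity, reference_quantity):
    """Quantity of a gene product relative to a reference gene product."""
    if reference_quantity <= 0:
        raise ValueError("reference quantity must be positive")
    return target_quantity / reference_quantity


def averaged_normalized(target_quantities, reference_quantities):
    """Average of normalized values over replicate measurements.

    The replicates could be different time points, tumor cell samples,
    or probes; each target value is paired with its own reference value.
    """
    if len(target_quantities) != len(reference_quantities):
        raise ValueError("each target needs a matching reference value")
    return mean(
        normalize(t, r) for t, r in zip(target_quantities, reference_quantities)
    )


# Hypothetical replicate measurements (arbitrary units).
targets = [40.0, 60.0]
references = [20.0, 20.0]

print(normalize(targets[0], references[0]))
print(averaged_normalized(targets, references))
```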
When it is desirable to measure the expression level of a gene by measuring the level of protein expression, any protein hybridization or immunodetection based assay known in the art can be used. In a protein hybridization based assay, an antibody or other agent that selectively binds to a protein is used to detect the amount of that protein expressed in a sample. For example, the level of expression of a protein can be measured using methods that include, but are not limited to, western blot, immunoprecipitation, enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), fluorescence-activated cell sorting (FACS), immunohistochemistry, immunocytochemistry, or any combination thereof. Also, antibodies, aptamers, or other ligands that specifically bind to a protein can be affixed to so-called “protein chips” (protein microarrays) and used to measure the level of expression of a protein in a sample. Alternatively, assessing the level of protein expression can involve analyzing one or more proteins by two-dimensional gel electrophoresis, mass spectrometry (MS), matrix-assisted laser desorption/ionization-time of flight-MS (MALDI-TOF), surface-enhanced laser desorption ionization-time of flight (SELDI-TOF), high performance liquid chromatography (HPLC), fast protein liquid chromatography (FPLC), multidimensional liquid chromatography (LC) followed by tandem mass spectrometry (MS/MS), protein chip expression analysis, gene chip expression analysis, and laser densitometry, or any combinations of these techniques.
Measuring gene expression by quantifying mRNA expression can be achieved using any commonly used method known in the art including northern blotting and in situ hybridization (Parker et al., “mRNA: Detection by in Situ and Northern Hybridization,” Methods in Molecular Biology 106:247-283 (1999), which is hereby incorporated by reference in its entirety); RNAse protection assay (Hod et al., “A Simplified Ribonuclease Protection Assay,” Biotechniques 13:852-854 (1992), which is hereby incorporated by reference in its entirety); reverse transcription polymerase chain reaction (RT-PCR) (Weis et al., “Detection of Rare mRNAs via Quantitative RT-PCR,” Trends in Genetics 8:263-264 (1992), which is hereby incorporated by reference in its entirety); and serial analysis of gene expression (SAGE) (Velculescu et al., “Serial Analysis of Gene Expression,” Science 270:484-487 (1995); and Velculescu et al., “Characterization of the Yeast Transcriptome,” Cell 88:243-51 (1997), which is hereby incorporated by reference in its entirety). Alternatively, antibodies may be employed that recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes.
In a preferred embodiment of the present invention, mRNA expression is measured using a nucleic acid amplification assay that is a semi-quantitative or quantitative real-time polymerase chain reaction (RT-PCR) assay. Because RNA cannot serve as a template for PCR, the first step in gene expression profiling by RT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction. The two most commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT), although others are also known and suitable for this purpose. The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. For example, extracted RNA can be reverse-transcribed using a GeneAmp RNA PCR kit (Perkin Elmer, Calif., USA), following the manufacturer's instructions. The derived cDNA can then be used as a template in the subsequent PCR reaction.
Although the PCR step can use a variety of thermostable DNA-dependent DNA polymerases, it typically employs the Taq DNA polymerase, which has a 5′-3′ nuclease activity but lacks a 3′-5′ proofreading exonuclease activity. An exemplary PCR amplification system using Taq polymerase is TaqMan® PCR (Applied Biosystems, Foster City, Calif.). TaqMan® PCR typically utilizes the 5′-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5′ nuclease activity can be used. Two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction. A third oligonucleotide, or probe, is designed to detect the nucleotide sequence located between the two PCR primers. The probe is non-extendible by Taq DNA polymerase enzyme, and is labeled with a reporter fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together, as they are on the probe. During the amplification reaction, the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner. The resultant probe fragments dissociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore. One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.
TaqMan® RT-PCR can be performed using commercially available equipment, such as the ABI PRISM 7700® Sequence Detection System® (Perkin-Elmer-Applied Biosystems, Foster City, Calif., USA), or the Lightcycler (Roche Molecular Biochemicals, Mannheim, Germany).
In addition to the TaqMan primer/probe system, other quantitative methods and reagents for real-time PCR detection that are known in the art (e.g. SYBR green, Molecular Beacons, Scorpion Probes, etc.) are suitable for use in the methods of the present invention.
To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues, and is unaffected by colon cancer. RNAs most frequently used to normalize patterns of gene expression are mRNAs for the housekeeping genes glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) and β-actin.
Real-time PCR is compatible both with quantitative competitive PCR, where an internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR, which uses a normalization gene contained within the sample or a housekeeping gene for RT-PCR. For further details see, e.g., Heid et al., “Real Time Quantitative PCR,” Genome Research 6:986-994 (1996), which is incorporated by reference in its entirety.
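By way of illustration only, normalization of a target gene against a housekeeping gene can be sketched in Python using the widely used comparative Ct (2^-ΔΔCt) calculation. This calculation is not recited in the present disclosure; the function name, the Ct values, and the assumption of approximately 100% amplification efficiency for both amplicons are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the comparative Ct (2^-ddCt) method.

    Assumption: both amplicons roughly double each cycle (about 100% efficiency).
    """
    # Normalize the target Ct to the housekeeping gene (e.g., GAPDH) in each sample.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the test sample with the control (calibrator) sample.
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values, for illustration only.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
print(f"fold change vs. control: {fold:.1f}")
```

In this sketch a lower normalized Ct in the test sample than in the control yields a fold change greater than one.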
In a preferred embodiment of the present invention, the expression levels of genes informative of colon cancer prognosis are detected using an array-based technique. These arrays, also commonly referred to as “microarrays” or “chips,” have been generally described in the art, see e.g., U.S. Pat. Nos. 5,143,854 to Pirrung et al.; 5,445,934 to Fodor et al.; 5,744,305 to Fodor et al.; 5,677,195 to Winkler et al.; 6,040,193 to Winkler et al.; 5,424,186 to Fodor et al., which are all hereby incorporated by reference in their entirety. A microarray comprises an assembly of distinct polynucleotide or oligonucleotide probes immobilized at defined positions on a substrate. Arrays are formed on substrates fabricated with materials such as paper, glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, silicon, optical fiber or any other suitable solid or semi-solid support, and configured in a planar (e.g., glass plates, silicon chips) or three-dimensional (e.g., pins, fibers, beads, particles, microtiter wells, capillaries) configuration. Probes forming the arrays may be attached to the substrate by any number of ways including (i) in situ synthesis (e.g., high-density oligonucleotide arrays) using photolithographic techniques (see Fodor et al., “Light-Directed, Spatially Addressable Parallel Chemical Synthesis,” Science 251:767-773 (1991); Pease et al., “Light-Generated Oligonucleotide Arrays for Rapid DNA Sequence Analysis,” Proc. Natl. Acad. Sci. U.S.A. 91:5022-5026 (1994); Lockhart et al., “Expression Monitoring by Hybridization to High-Density Oligonucleotide Arrays,” Nature Biotechnology 14:1675 (1996); and U.S. Pat. Nos. 5,578,832 to Trulson; 5,556,752 to Lockhart; and 5,510,270 to Fodor, which are hereby incorporated by reference in their entirety); (ii) spotting/printing at medium to low-density (e.g., cDNA probes) on glass, nylon or nitrocellulose (Schena et al., “Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray,” Science 270:467-470 (1995); DeRisi et al., “Use of a cDNA Microarray to Analyse Gene Expression Patterns in Human Cancer,” Nature Genetics 14:457-460 (1996); Shalon et al., “A DNA Microarray System for Analyzing Complex DNA Samples Using Two-Color Fluorescent Probe Hybridization,” Genome Res. 6:639-645 (1996); and Schena et al., Proc. Natl. Acad. Sci. U.S.A. 93:10539-11286 (1995), which are hereby incorporated by reference in their entirety); (iii) masking (Maskos et al., “Oligonucleotide Hybridizations on Glass Supports: A Novel Linker for Oligonucleotide Synthesis and Hybridization Properties of Oligonucleotides Synthesised In Situ,” Nuc. Acids. Res. 20:1679-1684 (1992), which is hereby incorporated by reference in its entirety); and (iv) dot-blotting on a nylon or nitrocellulose hybridization membrane (see e.g., S
Fluorescently labeled cDNA for hybridization to the array may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from colon cancer tumor tissue of interest. Labeled cDNA applied to the array hybridizes with specificity to each nucleic acid probe spotted on the array. After stringent washing to remove non-specifically bound cDNA, the array is scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance. With dual color fluorescence, separately labeled cDNA samples generated from two sources of RNA are hybridized pairwise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene is thus determined simultaneously. The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels (Schena et al., “Parallel Human Genome Analysis: Microarray-Based Expression Monitoring of 1000 Genes,” Proc. Natl. Acad. Sci. USA 93(20):10614-9 (1996), which is hereby incorporated by reference in its entirety).
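By way of illustration only, the comparison of two separately labeled samples hybridized to the same array can be sketched in Python as a per-gene log2 ratio, with genes flagged when the signals differ by at least two-fold. The gene labels, intensity values, intensity floor, and function names below are hypothetical and are not taken from the present disclosure.

```python
import math


def log2_ratios(sample_signal, reference_signal, floor=1.0):
    """Return log2(sample / reference) per gene for a two-color hybridization."""
    ratios = {}
    for gene, s in sample_signal.items():
        r = reference_signal[gene]
        # Floor very low intensities so the ratio and its logarithm stay defined.
        ratios[gene] = math.log2(max(s, floor) / max(r, floor))
    return ratios


def at_least_two_fold(ratios):
    """Genes whose signals differ by a factor of two or more in either direction."""
    return sorted(g for g, lr in ratios.items() if abs(lr) >= 1.0)


# Hypothetical background-corrected intensities for the two labeled samples.
tumor = {"GENE_A": 800.0, "GENE_B": 210.0, "GENE_C": 50.0}
normal = {"GENE_A": 200.0, "GENE_B": 200.0, "GENE_C": 400.0}

ratios = log2_ratios(tumor, normal)
print(at_least_two_fold(ratios))
```

Working on the log2 scale treats a two-fold increase and a two-fold decrease symmetrically (+1 and -1, respectively).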
When the use of microarray technology is desired, the expression levels of genes informative of colon cancer prognosis can be detected using commercially available arrays comprising nucleic acid probes, where at least five of the nucleic acid probes are complementary to at least a portion of a nucleotide sequence (i.e., an RNA transcript or DNA nucleotide sequence) of a gene in the group of 176, 71, or 101 genes informative of colon cancer prognosis disclosed supra. As described herein, the expression levels of genes informative of colon cancer progression can be detected using the Affymetrix U133 gene expression arrays following the manufacturer's protocols. In a preferred embodiment of the present invention, however, the microarray comprises a plurality of nucleic acid probes, each nucleic acid probe having a nucleotide sequence that is complementary to at least a portion of a nucleotide sequence (RNA or DNA) of a gene selected from the group of 176 genes informative of colon cancer outcome disclosed supra. In another embodiment, the microarray comprises a plurality of nucleic acid probes, each nucleic acid probe having a nucleotide sequence that is complementary to at least a portion of a nucleotide sequence (RNA or DNA) of a gene selected from the group of 71 genes informative of colon cancer outcome described supra. In accordance with this aspect of the present invention, the nucleic acid probes of the present invention have a nucleotide sequence that is complementary to at least a portion of an RNA transcript or DNA nucleotide sequence encoded by a gene informative of colon cancer outcome. Exemplary nucleic acid probes having nucleotide sequences complementary to the RNA transcripts encoded by the 176 genes and the 71 genes informative of colon cancer outcome are provided in Table 1 by their Affymetrix identifier.
In another embodiment of the present invention, the microarray comprises a plurality of nucleic acid probes, each nucleic acid probe having a nucleotide sequence that is complementary to at least a portion of a nucleotide sequence (i.e., RNA transcript or DNA nucleotide sequence) of a gene selected from the group of 101 genes informative of colon cancer outcome disclosed supra. Exemplary nucleic acid probes having nucleotide sequences complementary to the RNA transcripts encoded by the 101 genes informative of colon cancer outcome are provided in Table 2 by their Affymetrix identifier.
In another embodiment of the present invention, one or more supplementary analyses are performed to supplement or confirm the prognosis prediction achieved with the gene expression level analysis. In accordance with this embodiment of the present invention, the one or more additional analyses include detecting microsatellite instability, measuring DNA promoter methylation, screening for one or more mutations in one or more colon cancer oncogenes or tumor suppressor genes in the sample, or any combination of these analyses. The prognosis of a subject having colon cancer is then based on the detected expression levels of genes known to be informative of colon cancer in combination with one or more of these independent, additional analyses.
A deficient DNA mismatch repair (MMR) system is observed in about 10-15% of all colorectal carcinomas and in up to 90% of hereditary non-polyposis colorectal cancer (HNPCC) patients. Tumors with MMR defects acquire mutations in short repetitive DNA stretches, a phenomenon termed microsatellite instability. Therefore, the determination of microsatellite status is an ideal independent confirmatory prognostic analysis to perform in accordance with the methods of the present invention. Additionally, because the efficacy of adjuvant chemotherapy can be dependent on the microsatellite status of the tumor, determining the microsatellite status can be particularly relevant to determining an effective individualized treatment plan for a subject having colorectal cancer.
In accordance with this aspect of the present invention, a favorable prognosis exists when a microsatellite instability-low status is detected, whereas an unfavorable prognosis exists when a microsatellite instability-high status is detected.
Methods and techniques for detecting microsatellite instability in a sample are well known in the art and are suitable for use in accordance with this aspect of the invention. In a preferred embodiment, microsatellite instability detection is performed using a PCR-based method to amplify tumor DNA and detect the five microsatellite markers established by the National Cancer Institute (Boland et al., “A National Cancer Institute Workshop of Microsatellite Instability for Cancer Detection and Familial Predisposition: Development of International Criteria for the Determination of Microsatellite Instability in Colorectal Cancer,” Cancer Res. 58(22):5248-57 (1998), which is hereby incorporated by reference in its entirety). These five microsatellite markers include two mononucleotide repeats (BAT26 and BAT25) and three dinucleotide repeats (D2S123, D5S346, and D17S250). The multiplex assay for rapid and accurate detection of the NCI 5-marker panel described by Nash et al., “Automated, Multiplex Assay for High-Frequency Microsatellite Instability in Colorectal Cancer,” J. Clin. Oncol. 21:3105-12 (2003), which is hereby incorporated by reference in its entirety, is particularly well suited for use in accordance with this aspect of the present invention. Alternatively, a PCR-based method for assessing the microsatellite instability status of a sample can be employed (e.g. detection of the 3′ UTR mononucleotide repeat, T25 (CAT25), of the CASP2 gene as described in U.S. Patent Application Publication No. 20080096197 to Findeisen et al., which is hereby incorporated by reference in its entirety).
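By way of illustration only, the scoring of the five-marker panel can be sketched in Python. The thresholds used below (instability at two or more of the five markers scored as MSI-H, at one marker as MSI-L, and at none as microsatellite stable) are the commonly reported workshop criteria and are not recited in the present disclosure; the function name and input format are hypothetical.

```python
NCI_PANEL = ("BAT25", "BAT26", "D2S123", "D5S346", "D17S250")


def msi_status(unstable_markers):
    """Classify a tumor from the set of panel markers that show instability.

    Assumed thresholds: >= 2 of 5 unstable -> MSI-H, 1 -> MSI-L, 0 -> MSS.
    """
    unstable = set(unstable_markers)
    unknown = unstable - set(NCI_PANEL)
    if unknown:
        raise ValueError(f"markers not in the five-marker panel: {sorted(unknown)}")
    if len(unstable) >= 2:
        return "MSI-H"
    if len(unstable) == 1:
        return "MSI-L"
    return "MSS"


# Hypothetical result: both mononucleotide repeats show instability.
print(msi_status({"BAT25", "BAT26"}))
```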
Immunohistochemical approaches for detecting microsatellite instability are also suitable for use in accordance with this aspect of the present invention. Monoclonal antibodies specific for DNA mismatch repair proteins, for example MLH1, MSH2, MSH6, and PMS2, have been described by Marcus et al., “Immunohistochemistry for hMLH1 and hMSH2: A Practical Test for DNA Mismatch Repair-Deficient Tumors,” Am J Surg Pathol. 23(10):1248-55 (1999); Lindor et al., “Immunohistochemistry Versus Microsatellite Instability Testing in Phenotyping Colorectal Tumors,” J Clin Oncol. 20(4):897-9 (2002); and Umar et al., “Revised Bethesda Guidelines for Hereditary Nonpolyposis Colorectal Cancer (Lynch syndrome) and Microsatellite Instability,” J Natl Cancer Inst. 96(4):261-8 (2004), which are hereby incorporated by reference in their entirety.
A second analysis that is suitable to complement the detection of gene expression levels involves measuring the level of DNA promoter methylation. In higher order eukaryotic organisms, DNA methylation occurs at cytosines located 5′ to guanosine in a CpG dinucleotide. This modification has important regulatory effects on gene expression, predominantly when it involves CpG rich areas known as CpG islands that are located in the promoter region of a gene sequence. Extensive methylation of CpG islands in tumor-suppressor genes has been associated with reduced expression of the tumor suppressor gene, resulting in unchecked cellular growth, tissue invasion, angiogenesis, and metastases. For example, the aberrant methylation of the Mut L homologue 1 gene (hMLH1) resulting in defective DNA mismatch repair has been associated with colorectal cancer. In accordance with this aspect of the invention, hMLH1 promoter methylation can be measured to complement or confirm the gene expression detection analysis. Other genes known to be hypermethylated in colon cancer which are also suitable for promoter methylation analysis in accordance with this aspect of the invention include HPP1 (Sato et al., “Aberrant Methylation of the HPP1 Gene in Ulcerative Colitis-Associated Colorectal Carcinoma,” Cancer Research 62:6820-22 (2002), which is hereby incorporated by reference in its entirety); Reprimo (Takahashi et al., “Aberrant Methylation of Reprimo in Human Malignancies,” Int J Cancer 115(4):503-10 (2005), which is hereby incorporated by reference in its entirety); NEURL and FOXL2 (Schuebel et al., “Comparing the DNA Hypermethylome with Gene Mutations in Human Colorectal Cancer,” PLOS Genet. 3(9):e157 (2007), which is hereby incorporated by reference in its entirety); and ADAMTS1, CRABP1, and NR3C1 (Lind et al., “ADAMTS1, CRABP1, and NR3C1 identified as Epigenetically Deregulated Genes in Colorectal Tumorigenesis,” Cell Oncology 28(5-6):259-72 (2006), which is hereby incorporated by reference in its entirety).
In a preferred embodiment of the present invention the methylation level of the lecithin:retinol acyl transferase (LRAT) gene promoter nucleotide sequence, or region upstream thereof, is measured (See U.S. Patent Application Publication No. US20050227265 to Barany et al. and WO2008/077095 to Barany et al., which are hereby incorporated by reference in their entirety). In accordance with this aspect of the invention, a favorable prognosis exists when an increase in the methylation level of the lecithin:retinol acyl transferase gene promoter nucleotide sequence, or region upstream thereof, is measured.
DNA promoter methylation can be measured at a genome-wide or gene-specific level. For global methylation analysis, chromatographic methods, such as reverse-phase high pressure liquid chromatography, and methyl accepting capacity assays are generally used. Alternatively, the restriction landmark genomic scanning for methylation (RLGS-M) assay as described by Hayashizaki et al., “Restriction Landmark Genomic Scanning Method and its Various Applications,” Electrophoresis 14(4):251-8 (1993) and CpG island microarrays can also be used to measure genome-wide methylation. Various techniques are available to measure gene-specific methylation, including DNA digestion with a methylation-sensitive restriction enzyme followed by Southern blot detection or PCR amplification; methylation specific PCR; bisulfite genomic sequencing PCR; or in situ immunodetection using a 5-methylcytosine specific antibody as described by Castilho et al., “5-Methylcytosine Distribution and Genome Organization in Triticale Before and After Treatment with 5-Azacytidine,” J Cell Sci 112:4397-404 (1999), which is hereby incorporated by reference in its entirety. Additional methods and techniques for measuring DNA methylation, including nearest neighbor analysis, chemical DNA sequencing, methylation sensitive restriction fingerprinting, combined bisulfite restriction analysis, and methyl-CpG binding column isolation, are described in DNA Methylation Protocols (Mills and Ramsahoye, eds., Humana Press 2002), which is hereby incorporated by reference in its entirety. In a preferred embodiment, DNA promoter methylation analysis is carried out using the quantitative bisulfite-PCR/LDR/Universal Array platform described in U.S. Patent Application Publication No. US20050227265 to Barany et al.; WO2008/077095 to Barany et al.; and Chen et al., “Multiplexed Profiling of Candidate Genes for CpG Island Methylation Status using a Flexible PCR/LDR/Universal Array Assay,” Genome Research 16:282-9 (2006), which are incorporated by reference in their entirety.
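By way of illustration only, the principle underlying bisulfite-based methylation assays can be sketched in Python: bisulfite treatment converts unmethylated cytosines so that they are read as thymine after PCR, whereas 5-methylcytosines are protected and remain cytosine. The sequence, the methylated position, and the function names below are hypothetical, and the sketch models a single strand only; it is not an implementation of the PCR/LDR/Universal Array platform.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite treatment followed by PCR on one DNA strand.

    Unmethylated C is read as T after amplification; methylated C stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def methylated_cpg_sites(original, converted):
    """Indices of CpG cytosines that survived conversion (i.e., were methylated)."""
    return [i for i in range(len(original) - 1)
            if original[i:i + 2].upper() == "CG" and converted[i] == "C"]


# Hypothetical fragment with CpG sites at indices 1 and 5; only index 1 is methylated.
seq = "ACGTCCGA"
converted = bisulfite_convert(seq, methylated_positions={1})
print(converted)
print(methylated_cpg_sites(seq, converted))
```

Comparing the converted read with the reference sequence identifies which CpG cytosines were methylated in the original template.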
In another embodiment of the present invention, the mutational status of one or more colon cancer oncogenes or tumor-suppressor genes is screened. The presence or absence of such mutations can contribute to the determination of a subject's prognosis. Mutations in several such genes, especially DNA mismatch repair genes, are well known in the art and can be screened in accordance with this aspect of the invention. In a preferred embodiment, the mutational status of K-ras, B-raf, APC, p53, or PIK3CA is screened. An unfavorable prognosis exists when mutations in one or more of these colon cancer oncogenes or tumor suppressor genes are identified.
Any art-accepted method for detecting the mutational status of a gene can be used in accordance with this aspect of the invention. Preferred methods include the endonuclease/ligase based mutation scanning method (Huang et al., “An Endonuclease/Ligase Based Mutation Scanning Method Especially Suited for Analysis of Neoplastic Tissue,” Oncogene 21:1909-21 (2002) and U.S. Pat. No. 7,198,894 to Barany et al., which are hereby incorporated by reference in their entirety); ligase detection reaction (LDR) (U.S. Pat. No. 6,312,892 to Barany et al., which is hereby incorporated by reference in its entirety); coupled LDR/PCR (U.S. Pat. Nos. 7,097,980, 6,797,470, 6,268,148, and 6,027,889 all to Barany et al., which are hereby incorporated by reference in their entirety); coupled PCR/restriction endonuclease digestion/LDR reaction (U.S. Pat. No. 7,014,994 to Barany et al., which is hereby incorporated by reference in its entirety); ligase detection reactions using addressable arrays (U.S. Pat. No. 7,083,917 to Barany and U.S. Patent Application Publication Nos. 20020150921, 20030022182, 20040259141, and 20040253625 all to Barany et al., which are hereby incorporated by reference in their entirety); and DNA microarray multiplex detection methods (Gerry et al., “Universal DNA Microarray Method for Multiplex Detection of Low Abundant DNA Mutations,” J Mol Biol 292:251-62 (1999), which is hereby incorporated by reference in its entirety). Other suitable methods for determining the mutational status of a gene include direct DNA sequencing techniques (e.g., Sanger dideoxy or Maxam-Gilbert sequencing reactions) and massively parallel sequencing technology.
In a preferred embodiment of the present invention, the data generated from the detection of gene expression levels of the at least five genes selected from the group of 176, 71, or 101 genes informative of colon cancer prognosis is used to prepare a personalized genomic profile for a colon cancer patient. Information regarding microsatellite instability, DNA promoter methylation, and the mutational status of one or more oncogenes or tumor-suppressor genes can also be incorporated into an individual's personalized genomic profile. The genomic profile can be used to establish a personalized treatment plan for the colon cancer patient. Such treatment plan may consist of surgery, individual therapy, chemotherapy, radiation therapy or any combination thereof. In accordance with this aspect of the invention, the colon cancer patient is administered a cancer treatment based on the treatment plan.
The negative and positive scores are converted to percentages based on the total number of genes analyzed. In the hypothetical example, sample 1 had 3 out of 6 genes, or 50%, with favorable or positive expression levels, and 1 out of 6 genes, or 17% with unfavorable or negative expression levels (
As indicated in
As discussed supra, the predicted outcome for a patient, determined by gene expression levels as outlined above, can be used to guide treatment. For example, patients who bin to Group 1 have a favorable prognosis and may benefit from surgery only, whereas patients who bin to Group 4 have an unfavorable prognosis and may need to supplement surgery with chemotherapy or other more aggressive therapies. Treatment decisions should further take into consideration the stage of the tumor. For example, individuals with stage 2 tumors in Group 1 or 2A will most likely benefit from surgery without additional treatment. Individuals with stage 3 tumors in these groups are probably responsive to standard care. Individuals with stage 3 tumors in Groups 4 and 5 will most likely not be responsive to standard care, and thus would be candidates for enrolling into clinical trials of novel therapies.
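By way of illustration only, the conversion of positive and negative scores to percentages in the hypothetical six-gene example described above can be sketched in Python. The encoding of each gene as +1 (favorable), -1 (unfavorable), or 0 (neither) and the function name are assumptions made for this sketch; the assignment of a sample to a prognostic group is not modeled.

```python
def score_percentages(gene_calls):
    """Convert per-gene calls (+1 favorable, -1 unfavorable, 0 neither)
    to whole-number percentages of the total number of genes analyzed."""
    total = len(gene_calls)
    if total == 0:
        raise ValueError("no genes were analyzed")
    positive = sum(1 for c in gene_calls if c > 0)
    negative = sum(1 for c in gene_calls if c < 0)
    return round(100 * positive / total), round(100 * negative / total)


# Sample 1 of the hypothetical example: 3 of 6 genes favorable, 1 of 6 unfavorable.
sample_1 = [+1, +1, +1, -1, 0, 0]
print(score_percentages(sample_1))
```

For sample 1 the sketch reproduces the 50% positive and 17% negative values given in the example.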
The present invention is also directed to a method of identifying an agent that improves the prognosis of a subject having colon cancer. This method involves administering an agent (i.e., a candidate agent) to the subject having colon cancer and obtaining a first biological sample from the subject before said administering and a second biological sample from the subject after said administering. The method further involves detecting the expression level of at least five genes selected from the group of 176 genes informative of colon cancer prognosis disclosed supra. Determining increases or decreases in the expression levels of the at least five genes in the second sample compared to the first sample identifies an agent that improves the prognosis of a subject having colon cancer. In a preferred embodiment of this aspect of the present invention, the at least five genes are selected from the group of 71 genes informative of colon cancer prognosis disclosed supra.
In accordance with this aspect of the present invention, an agent that increases the expression levels of any one of the following genes: SERPINA1, RPLP0, RPLP0-like, CYB561, AKR1A1, AP3D1, ARL6IP4, OGFOD2, ASNA1, CFB, ERP29, SMG7, CASP1, CCNA2, LOC100131861, SAMM50, COX5A, CXCL11, DAZAP2, DDX23, FDFT1, COMMD4, GCHFR, GRHPR, GSR, ISG20, ITGAE, KIAA0746, SERINC2, FRYL, LRRC47, LAMP3, R3HCC1, MAPKAPK5, MCM5, MCRS1, TMEM106C, MMP3, MTUS1, LRRC41, NAT1, NDUFC1, YBX1, PEBP1, PIGR, PSMA5, SFPQ, SLC25A3, SLC39A8, SQRDL, SRP72, SSNA1, TAPBPL, TEGT, PBK, UCP2, UQCRH, XPO7, CCT2, CNOT7, DHX15, TMEM87A, ELP3, FAM82C, LL22NC03-5H6.5, DENND2D, WDR68, IL15RA, DENND2A, KIF13B, MFHAS1, SPATA5L1, MYRIP, PIGT, PSMD9, RPS27L, TNFRSF10B, UBE2L6, USP3, ATP5B, CALML4, C1orf144, TMEM33, C12orf52, GHITM, H2AFZ, LAP3, MRPL46, SORD, CNPY2, TNFSF10, U2AF2, CMPK1, UQCRFS1, WARS and WDR1 is an agent that improves the prognosis of a subject having colon cancer. An agent that causes a decrease in the expression levels of any one of the following genes: AK023058*, AIP, ARL2BP, C1GALT1, CDC42BPA, C8orf70, CLN5, COBL, CYB5B, MOSPD1, DOCK9, EGFR, FKBP14, DND1, DND1, GREM2, GPR177, GALNS, GRB10, GRP, GSTA1, RP3-377H14.5, HOXB7, ZNF117, TNIK, LANCL1, METRN, LEPREL1, NAB1, NISCH, OGT, OSBPL3, PDGFA, PRDM2, PRELP, PSPC1, RECQL, RYK, SMURF2, TLN1, UNC84A, USP12, ZMYM2, ZMYM5, AL359599*, ARL4A, N4BP2L2, GLS, C19orf36, TMCC1, METTL3, TMEM16A; RTN2, SCAMP1, SF3B1, SOX4, STK3, ZNF430, C6orf15, C7orf10, CHST12, ETV1, ACSL4, FLJ10357, C5orf23, AA058828*, CDR2L, KLC1, MAP4, NUMB, PAM, PGDS, PTHLH, ZC3H7B, SAV1, SGCD, SYNGR1, TES, IFT88, TRIM36 and VPS41 is an agent that improves the prognosis of a subject having colon cancer.
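By way of illustration only, the comparison of expression levels before and after administration of a candidate agent can be sketched in Python. Only small subsets of the two gene lists recited above are used; the expression values, the 1.5-fold cutoff for calling a change, and the function name are hypothetical and are not taken from the present disclosure.

```python
# Small illustrative subsets of the two gene lists recited above.
FAVORABLE_IF_INCREASED = {"SERPINA1", "CASP1", "CXCL11"}
FAVORABLE_IF_DECREASED = {"EGFR", "PDGFA", "SOX4"}


def favorable_changes(before, after, min_fold=1.5):
    """Genes whose expression moved in the prognosis-improving direction.

    `before` and `after` map gene symbol -> expression level (linear scale, > 0).
    `min_fold` is a hypothetical cutoff for calling an increase or decrease.
    """
    hits = []
    for gene in sorted(before.keys() & after.keys()):
        fold = after[gene] / before[gene]
        if gene in FAVORABLE_IF_INCREASED and fold >= min_fold:
            hits.append(gene)
        elif gene in FAVORABLE_IF_DECREASED and fold <= 1.0 / min_fold:
            hits.append(gene)
    return hits


# Hypothetical expression levels in the first (before) and second (after) samples.
before = {"SERPINA1": 100.0, "CASP1": 80.0, "EGFR": 300.0, "SOX4": 120.0}
after = {"SERPINA1": 220.0, "CASP1": 85.0, "EGFR": 120.0, "SOX4": 130.0}
print(favorable_changes(before, after))
```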
Another aspect of the present invention is directed to a collection of 71 genes having expression levels informative for predicting a prognosis of a patient having colon cancer. This collection of 71 genes includes the following genes of Table 1: SLC25A3, DAZAP2, TEGT, ERP29, PSMA5, DDX23, LOC100131861, SAMM50, SFPQ, NISCH, CYB5B, TMEM106C, EGFR, MCRS1, SERPINA1, CCNA2, NDUFC1, COX5A, GCHFR, ITGAE, PRDM2, PDGFA, GSR, GRP, COMMD4, XPO7, YBX1, SRP72, UCP2, SLC39A8, NAB1, WDR68, CXCL11, RECQL, CASP1, PTHLH, UNC84A, MTUS1, KIAA0746, SERINC2, DOCK9, FRYL, MAPKAPK5, LRRC47, RQCD1, TNIK, RPLP0, RPLP0-like, CLN5, NAT1, CDC42BPA, GSTA1, ZMYM5, RYK, PIGT, CMPK1, SQRDL, FAM82C, CNOT7, LL22NC03-5H6.5, PSPC1, TAPBPL, METRN, PBK, MRPL46, FKBP14, C1GALT1, GREM2, GPR177, DND1, and PRELP. The collection of 71 genes informative of predicting the prognosis of a patient having colon cancer can further include the following genes of Table 1: AA058828*, ACSL4, AIP, AK023058*, AKR1A1, AL359599*, AP3D1, ARL2BP, ARL4A, ARL6IP4, OGFOD2, ASNA1, ATP5B, C12orf52, C19orf36, C1orf144, C5orf23, C6orf15, C7orf10, C8orf70, CALML4, CCT2, CDR2L, CFB, CHST12, CNPY2, COBL, CYB561, DENND2A, DENND2D, DHX15, DND1, ELP3, ETV1, FDFT1, FLJ10357, GALNS, GHITM, GLS, GRB10, GRHPR, H2AFZ, HOXB7, IFT88, IL15RA, ISG20, KIAA0746, SERINC2, KIF13B, KLC1, LAMP3, LANCL1, LAP3, LEPREL1, LRRC41, MAP4, MCM5, METTL3, MFHAS1, MMP3, MOSPD1, MYRIP, N4BP2L2, NUMB, OGT, OSBPL3, PAM, PEBP1, PGDS, PIGR, PSMD9, R3HCC1, RP3-377H14.5, RPS27L, RTN2, SAV1, SCAMP1, SF3B1, SGCD, SLC39A8, SMG7, SMURF2, SORD, SOX4, SPATA5L1, SSNA1, STK3, SYNGR1, TEGT, TES, TLN1, TMCC1, TMEM16A, TMEM33, TMEM87A, TNFRSF10B, TNFSF10, TRIM36, U2AF2, UBE2L6, UCP2, UQCRFS1, UQCRH, USP12, USP3, VPS41, WARS, WDR1, ZC3H7B, ZMYM2, ZNF117, and ZNF430.
Another aspect of the present invention is related to a collection of 101 genes having expression levels informative for predicting a prognosis of a patient having colon cancer. The collection of 101 genes is provided in Table 2 above.
Also included in the present invention are arrays that are useful for practicing one or more of the above described methods. Such arrays consist of nucleic acid or peptide-based probes that are useful for detecting the expression of one or more genes, preferably at least five genes, from the collection of 71, 101, or 176 genes that are informative for predicting the prognosis of a subject having colon cancer, using any of the methods described supra for detecting gene expression. A variety of different array formats are known in the art with a wide variety of probe structures, substrate compositions, and attachment technologies (See e.g. U.S. Pat. Nos. 5,143,854 to Pirrung et al.; 5,288,644 to Beavis et al.; 5,324,633 to Fodor et al.; 5,432,049 to Fischer et al.; 5,470,710 to Weiss et al.; 5,492,806 to Drmanac et al.; 5,445,934 to Fodor et al.; 5,744,305 to Fodor et al.; 5,677,195 to Winkler et al.; 6,040,193 to Winkler et al.; and 5,424,186 to Fodor et al., which are all hereby incorporated by reference in their entirety). In a preferred embodiment, array(s) of the present invention consist of a plurality of nucleic acid probes, each nucleic acid probe having a nucleotide sequence that is complementary to at least a portion of a nucleotide sequence (e.g., RNA or DNA) of a gene selected from the collection of 71 genes, 101 genes, 176 genes, or any combination thereof. Exemplary nucleic acid probes having nucleotide sequences complementary to at least a portion of the nucleotide sequences (i.e., RNA transcript) encoded by the genes of the 71, 101, and 176 gene collections are provided in Tables 1 and 2, although variations of those probes, or other probes may also be suitable for use.
In a preferred embodiment of the present invention, the arrays of the present invention are available together with suitable reagents as a kit. The kit can be used to determine gene expression levels in biological sample(s) from a subject having colon cancer and determine his or her prognosis. Additional reagents suitable for inclusion in such kits include, but are not limited to, gene specific primers for the collections of the 71, 101, and/or 176 genes, universal primers, dNTPs and/or rNTPs, fluorescent, biotinylated, or other post-synthesis labeling reagents, enzymes such as reverse transcriptase, DNA and/or RNA polymerases, and various wash and buffer media.
Another aspect of the present invention relates to a method for determining a subject's predisposition to having colon cancer. This method involves obtaining a biological sample from the subject and detecting the expression levels of at least five genes selected from the collection of 176 genes informative of colon cancer predisposition disclosed supra. The method further involves comparing the detected expression levels of the at least five genes from said sample with the expression levels of the corresponding five genes associated with having a predisposition to colon cancer and determining the subject's predisposition to having colon cancer based on said comparing.
Expression array data was generated from 183 primary colon cancer (PCC) tumors, 46 large adenomas, 39 liver metastases, 19 lung metastases, 53 normal mucosa, 7 normal lung, and 12 normal liver tissues. In addition, SNP array data was collected from 89 colorectal cancer (CRC) tissue samples (65 primary colon cancers, 9 liver metastases, 10 lung metastases, and 5 unclassified colon cancers), as well as 56 normal tissues (i.e., normal mucosa, liver, or kidney), 51 of which were matched to the CRC tissues. Tissue samples were obtained from CRC patients at Memorial Sloan Kettering Cancer Center (MSKCC), whose initial operations occurred between 1992 and 2004. Cancer samples included in SNP array analysis were characterized by pathologists (MSKCC) to have ≥70% pure tumor cells. Acquisition of tissues followed the strict protocols of the Institutional Review Boards of MSKCC and Cornell University Weill Medical College.
Total RNA from microdissected tissue samples (both tumor and normal tissue samples) was prepared following the protocol recommended by Affymetrix (Santa Clara, Calif.). RNA was extracted from homogenized tissues using the Trizol protocol (Guanidinium thiocyanate-phenol-chloroform extraction) (Invitrogen Corp.) and purified using RNeasy columns (Qiagen).
Microdissected tissue samples (50-100 mg) were homogenized in liquid nitrogen and suspended in 400 μl proteinase K solution (50 μl of 20 mg/ml proteinase K in proteinase K buffer). Phenol/chloroform (500 μl) was added and the mixture was shaken thoroughly in a phase lock gel tube. The upper aqueous layer containing genomic DNA was transferred to a separate tube and washed with isopropanol and 70% ethanol. The resulting pellet was resuspended in molecular biology-grade water.
To generate the expression array data, the protocol recommended by Affymetrix, Inc. was strictly followed. Briefly, first strand cDNA was synthesized from 10 mg total RNA, using the One-Cycle cDNA Synthesis kit (which includes T7 (dT) primer and SuperScript II Reverse Transcriptase). Additional reagents from the same kit (i.e., 2nd strand reaction mix, E. coli DNA ligase, and E. coli Polymerase I) were used to synthesize the 2nd strand cDNA. The cDNA product was transcribed in vitro to produce biotin-labeled cRNA, using the MEGAscript T7 Kit (Ambion, Inc.). The labeled cRNA was fragmented and hybridized to the GeneChip Human Genome U133A Array at 45° C. for 16 h. Afterwards, the arrays were washed and stained using SAPE (streptavidin-phycoerythrin) and biotinylated anti-streptavidin antibody. All of the washing and staining procedures were conducted using the Affymetrix Fluidics Station 450 (FS450). Following hybridization, the arrays were scanned using the GeneChip Scanner 3000. The Affymetrix GCOS software was used to generate image (DAT), cell intensity (CEL), and analysis (CHP) files for every sample. Standard thresholding, filtering operations, and normalizations were applied such that the average intensity value across all probesets for every sample was around 69.
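The normalization target mentioned above can be illustrated with a simple global scaling step. This is only a minimal sketch: the function name `scale_to_target` and the sample values are hypothetical, and the thresholding, filtering, and normalization actually performed by the GCOS software may differ from a plain mean rescaling.

```python
def scale_to_target(intensities, target=69.0):
    """Rescale one sample's probeset intensities so that their mean equals target.

    Assumption: plain arithmetic-mean scaling; the vendor software may use a
    different (e.g. trimmed) average.
    """
    if not intensities:
        raise ValueError("no intensities supplied")
    mean = sum(intensities) / len(intensities)
    if mean <= 0:
        raise ValueError("mean intensity must be positive")
    factor = target / mean
    return [value * factor for value in intensities]


# Hypothetical probeset intensities for a single array.
sample = [12.0, 40.0, 150.0, 310.0]
scaled = scale_to_target(sample)
print(sum(scaled) / len(scaled))
```

Scaling every array to the same average makes expression values comparable across samples before they are ranked against one another.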
The primary colorectal cancer samples were classified into two groups according to the level of gene expression as determined by the Affymetrix U133A expression array. Kaplan-Meier survival analysis was used to determine the disease-specific survival patterns for selected genes in areas of chromosomal aberrations. Follow-up (0-175 months; median 74 months) was censored at death from other causes for the Kaplan-Meier analysis. Statistical analysis and curves were generated using the JMP statistical software (version 5.1.2, SAS Institute, Cary, N.C., USA).
Primary colon tumor samples from 166 patients were used in the analysis to identify genes that are predictive of disease outcome. Of these samples, 56 were derived from patients who had died of disease (DOD), and 110 samples were derived from patients who either had no evidence of disease (NED) in long-term follow-up, were alive with disease (AWD), or died of other or unknown causes (DOC/DUC). Samples from the 110 patients who did not die of disease are collectively referred to as “non-DOD”.
A computer analysis was performed to identify genes that had expression levels in the top third in samples from patients who died of disease (DOD) but in the bottom third in samples taken from patients who did not die of disease (non-DOD), and identify genes that had expression levels in the bottom third in samples from DOD patients, but in the top third in samples from non-DOD patients. This analysis identified genes that had different expression patterns in DOD and non-DOD samples and were candidates for further analysis.
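The tertile comparison described above can be sketched as follows. This is an illustrative reading rather than the analysis program that was actually used: the function names, the rank-based tie handling, and the example values are assumptions. Each tumor is labeled +1, 0, or -1 according to whether its expression of a gene falls in the top, middle, or bottom third across all tumors, and the labels are then tallied separately for DOD and non-DOD samples.

```python
def tertile_labels(values):
    """Label each sample +1 (top third), 0 (middle third) or -1 (bottom third).

    Thirds are assigned by rank across all samples; ties are broken by
    input order (an assumption made for this sketch).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    third = n // 3
    labels = [0] * n
    for rank, idx in enumerate(order):
        if rank < third:
            labels[idx] = -1
        elif rank >= n - third:
            labels[idx] = 1
    return labels


def tertile_counts(values, is_dod):
    """Count samples per tertile, separately for DOD and non-DOD patients."""
    counts = {"DOD": {1: 0, 0: 0, -1: 0}, "non-DOD": {1: 0, 0: 0, -1: 0}}
    for label, dod in zip(tertile_labels(values), is_dod):
        counts["DOD" if dod else "non-DOD"][label] += 1
    return counts


# Hypothetical expression values of one gene in six tumors.
expression = [9.0, 8.0, 7.5, 3.0, 2.0, 1.0]
died_of_disease = [True, True, False, False, False, False]
print(tertile_counts(expression, died_of_disease))
```

A gene whose DOD samples cluster in one extreme tertile while its non-DOD samples cluster in the other would be a candidate under this reading.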
A difference score for each of these candidate genes was then calculated by subtracting the total number of DOD tumor samples where gene expression was in the bottom third of tumor expression from the total number of DOD tumor samples where gene expression was in the top third of tumor expression. Genes having a difference score outside the ranges of 12 to 19 or −23 to −12 were eliminated from analysis while the remaining genes, 383 in total, were further analyzed using Kaplan-Meier survival curves (
Kaplan-Meier curves were manually generated for all of the 383 genes using the JMP statistical analysis program (SAS Institute, Cary, N.C.). The chi-square values and p-values for all of these curves were then used to sort the genes by the greatest difference in survival based on expression. The 383 gene set that was identified based on difference scores was narrowed to 176 genes, where the 176 genes had KM curves with a p-value ≤0.02. The 176 gene set was further narrowed to 71 genes based on those genes having KM curves with a p-value of ≤0.0125 as shown in
Table 3 below summarizes additional parameters calculated for each gene in the 176 gene set, which includes the 71 gene set. These parameters include (1) the average expression value for a particular gene across all tumor samples (“Ave Tumor”) and the standard deviation of expression for each gene probe used to detect expression (“Stdev Tumor”); (2) the difference score (“Diff”), which is the total number of DOD samples where the gene expression level was in the top third of tumor expression level minus the total number of DOD samples where the gene expression level was in the bottom third of tumor expression level; (3) the percentage of DOD samples having gene expression values in the top third of tumor expression (“D+1%”); (4) the percentage of DOD samples having gene expression values equal to the average, or the middle third of tumor expression (“D0%”); (5) the percentage of DOD samples having gene expression values in the bottom third of tumor expression (“D−1%”); (6) the percentage of difference between the two curves in the Kaplan-Meier analysis (“KM %”), calculated by dividing the number of DOD samples where the gene was expressed in the top third by the number of DOD and non-DOD samples where the gene was expressed in the top third; and (7) the chi-square and p-values of the KM survival curve analysis. The last two columns of Table 3 indicate whether increased (“up”) or decreased (“down”) expression of the particular gene predicts an unfavorable prognosis (“Bad Outcome Score”) or a favorable prognosis (“Good Outcome Score”).
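The per-gene parameters defined above for Table 3 can be computed from tertile labels as in the following sketch. It assumes each tumor has already been labeled +1, 0, or -1 for top, middle, or bottom third of tumor expression (a coding chosen here to mirror the column names); the function name and the example data are hypothetical.

```python
def summarize_gene(labels, is_dod):
    """Compute Diff, D+1%, D0%, D-1% and KM% for one gene.

    labels: +1 / 0 / -1 tertile label per tumor sample (assumed coding).
    is_dod: True where the patient died of disease.
    """
    dod_labels = [lab for lab, dod in zip(labels, is_dod) if dod]
    n_dod = len(dod_labels)
    if n_dod == 0:
        raise ValueError("no DOD samples")
    top_dod = dod_labels.count(1)
    mid_dod = dod_labels.count(0)
    bottom_dod = dod_labels.count(-1)
    top_all = labels.count(1)  # DOD and non-DOD samples in the top third
    return {
        "Diff": top_dod - bottom_dod,
        "D+1%": 100.0 * top_dod / n_dod,
        "D0%": 100.0 * mid_dod / n_dod,
        "D-1%": 100.0 * bottom_dod / n_dod,
        "KM%": 100.0 * top_dod / top_all if top_all else float("nan"),
    }


# Hypothetical labels for nine tumors, four of them DOD.
labels = [1, 1, 1, 0, 0, 0, -1, -1, -1]
is_dod = [True, True, False, True, False, False, False, False, True]
print(summarize_gene(labels, is_dod))
```

With these example inputs, two of the four DOD samples fall in the top third and one in the bottom third, so the difference score is 1.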
Using the above-described methods, genes having expression levels above the average tumor expression level and genes having expression levels below the average tumor expression level in samples derived from patients who generally had poor outcome were discovered. The final list of validated genes was sorted by chromosomal location to identify consistent patterns of over or under expression that were chromosome location specific.
Typically, Kaplan-Meier curves revealed expression patterns with normal distribution (
An additional 22 samples (
SNP analysis was performed using the Affymetrix GeneChip Human Mapping 50K Xba 240 array (the SNP array) following the protocol provided by Affymetrix (“GeneChip Mapping 100K Assay Manual”). Briefly, 0.25 μg of genomic DNA was digested with XbaI. The digests were ligated, PCR-amplified (such that the products were in the range of 250 to 2,000 bp), fragmented, biotin-labeled, and hybridized to the array. As in the expression array protocol, the SNP arrays also underwent staining and washing in the Fluidics Station 450 (FS450) with the use of SAPE (streptavidin-phycoerythrin) and biotinylated anti-streptavidin antibody. The arrays were scanned in the GeneChip Scanner 3000 to generate the image (DAT) and cell intensity (CEL) files. The CEL files were imported into GeneChip Genotyping Analysis Software (GTYPE) version 4.1 to generate the SNP calls.
The functionalities of the Chromosomal Copy Number Analysis Tool (CNAT) software are embedded in GTYPE, and the concepts and algorithms were initially described by Huang et al., “Whole Genome DNA Copy Number Changes Identified by High Density Oligonucleotide Arrays,” Hum. Genomics 1(4):287-99 (2004), which is hereby incorporated by reference in its entirety. CNAT uses the probe intensity data, as well as the GDAS-produced SNP calls, to generate both the Single Point Analysis (SPA) and Genomic Smoothed Analysis (GSA) copy number (CN) estimates and the corresponding p-values. In addition, CNAT also generates the measures of loss of heterozygosity (LOH) based on the SNP calls. Once the SNP genotype calls and copy number estimates were obtained using GTYPE and CNAT, the data was further processed to refine the copy number data and to provide LOH calls that accommodate tissue and/or DNA aberration heterogeneity resulting in partially changed DNA (e.g., DNA with single gains at a given location in some of the strands and copy-neutral in other strands of the same chromosomal location). Regions of variation in copy number data are identified by applying segmentation and spatial filtering algorithms. The results are not constrained to integers. Sample-specific copy neutral, gain, and loss levels are obtained. For the LOH analysis, the SNPs that undergo an actual loss of heterozygosity from a normal control sample to the case sample are taken as input together with the SNPs that remain heterozygous. The majority of SNPs, which are homozygous in the normal sample, are ignored, as they are uninformative for regions of LOH. These two kinds of SNPs are spatially averaged to allow for the effects of tissue heterogeneity. For those samples that lack a matched normal sample, the LOH values are inferred from the homozygosity data based on the relationship between these two quantities obtained from the matched tumor and normal samples.
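The matched-normal LOH step described above can be illustrated with a small sketch. It is not the algorithm actually used: the genotype coding ("AA", "AB", "BB"), the moving-average window, and the function name `loh_profile` are assumptions for illustration only. SNPs that are heterozygous in the normal sample are kept, scored 1 if they become homozygous in the tumor and 0 if they remain heterozygous, and the scores are then averaged over neighboring informative SNPs.

```python
def loh_profile(normal_calls, tumor_calls, window=5):
    """Return (snp_index, smoothed LOH value) for each informative SNP.

    Informative SNPs are those called heterozygous ("AB") in the matched
    normal sample. Genotype strings and window size are assumptions.
    """
    informative = []
    for i, (normal, tumor) in enumerate(zip(normal_calls, tumor_calls)):
        if normal != "AB":
            continue  # homozygous in normal: uninformative for LOH
        if tumor not in ("AA", "AB", "BB"):
            continue  # skip no-calls in the tumor
        informative.append((i, 0.0 if tumor == "AB" else 1.0))

    half = window // 2
    smoothed = []
    for k, (index, _) in enumerate(informative):
        lo = max(0, k - half)
        hi = min(len(informative), k + half + 1)
        values = [v for _, v in informative[lo:hi]]
        smoothed.append((index, sum(values) / len(values)))
    return smoothed


# Hypothetical calls for six consecutive SNPs on one chromosome arm.
normal = ["AB", "AA", "AB", "AB", "BB", "AB"]
tumor = ["AB", "AA", "AA", "BB", "BB", "AB"]
print(loh_profile(normal, tumor, window=3))
```

Because the averaged values are fractions rather than 0/1 calls, a region where only part of the tissue has lost heterozygosity shows up as an intermediate value.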
Shown in
The simultaneous use of SNP and expression arrays allows one to analyze the patterns of gene expression in chromosomal regions usually characterized by aberrations (copy gains/losses involving either whole chromosomal arms, or regions of smaller size). Chromosomal arms 7p, 7q, 8q, 13q, 20p, and 20q, which usually gain additional copies in colorectal cancer, also have a high percentage of upregulated genes (see
The concordance between dysregulation and prognostic effect is highly evident in the 8p arm (
Sodium bisulfite has been widely used to distinguish 5-methylcytosine from cytosine. Bisulfite converts cytosine into uracil via a deamination reaction while leaving 5-methylcytosine unchanged. Genomic DNAs extracted from colon tumor samples were used in this study. Typically, 0.5 to 1 μg genomic DNA in a volume of 40 μl was incubated with 0.2N NaOH at 37° C. for 10 minutes. Next, 30 μl of 10 mM hydroquinone and 520 μl of 3M sodium bisulfite were added to the reaction. Sodium bisulfite (3M) was made with 1.88 g sodium bisulfite (Sigma Chemicals, ACS grade) dissolved in a final total of 5 ml deionized water at pH 5.0. The bisulfite/DNA mixture was incubated for 16 hours in a DNA thermal cycler (Perkin Elmer Cetus), cycling between 50° C. for 20 minutes and 85° C. for 15 seconds. The bisulfite treated DNA was desalted using MICROCON centrifugal filter devices (Millipore, Bedford, Mass.) or, alternatively, was cleaned with Wizard DNA clean-up kit (Promega, Madison, Wis.). The eluted DNA was incubated with one-tenth volume of 3N NaOH at room temperature for 5 minutes before ethanol precipitation. The DNA pellet was then resuspended in 20 μl deionized H2O and stored at 4° C. until PCR amplification.
Two promoter regions of the LRAT gene were simultaneously amplified in a multiplex fashion. The multiplex PCR has two stages, namely a gene-specific amplification (stage one) and a universal amplification (stage two). The PCR primers are shown in Table 5.
The gene-specific PCR primers were designed such that the 3′ sequence contains a gene-specific region and the 5′ region contains a universal sequence. The gene-specific primer design allows hybridization to promoter regions containing as few CpG sites as possible. For primers that inevitably include one or more CpG dinucleotides, the nucleotide analogs K and P, which can hybridize to either C or T nucleotides or G or A nucleotides, respectively, can be included in the primer design. To reduce the cost of primer synthesis, PCR primers were designed without nucleotide analogs, using nucleotide G to replace K (purine derivative) and T to replace P (pyrimidine derivative). This type of primer design favors pairing to DNA that was initially methylated, although it also allows the mismatch pairing of G/T when the original DNA was unmethylated at that site. The ethidium bromide staining intensity of PCR amplicons separated by agarose gel electrophoresis demonstrated that this primer design was as robust as using analog-containing primers.
In the first stage, the multiplex PCR reaction mixture (12.4 μl) consisted of 0.5 μl bisulfite modified DNA, 400 μM of each dNTP, 1× AmpliTaq Gold PCR buffer, 4 mM MgCl2, and 1.25 U AmpliTaq Gold polymerase. The gene-specific PCR primer concentrations are listed in Table 5. Mineral oil was added to each reaction before thermal cycling. The PCR procedure included a pre-denaturation step at 95° C. for 10 minutes, followed by 15 cycles of three-step amplification with each cycle consisting of denaturation at 94° C. for 30 seconds, annealing at 60° C. for 1 minute, and extension at 72° C. for 1 minute. A final extension step was at 72° C. for 5 minutes.
The second stage of multiplex PCR amplification was primed from the universal sequences (UniB) located at the extreme 5′ end of the gene-specific primers. The second stage PCR reaction mixture (12.5 μl) consisted of 400 μM of each dNTP, 1× AmpliTaq Gold PCR buffer, 4 mM MgCl2, 12.5 μmol universal primer B (UniB) and 1.25 U AmpliTaq Gold polymerase. The UniB PCR primer sequence is listed in Table 5. The 12.5 μl reaction mixtures were added through the mineral oil to the finished first stage PCR reactions. The PCR procedure included a pre-denaturation step at 95° C. for 10 minutes, followed by 30 cycles of three-step amplification with each cycle consisting of denaturation at 94° C. for 30 seconds, annealing at 55° C. for 1 minute, and extension at 72° C. for 1 minute. A final extension step was at 72° C. for 5 minutes.
After the two-stage PCR reaction, 1.25 μl Qiagen Proteinase K (approximately 20 mg/ml) was added to the total 25 μl reaction. The Proteinase K digestion condition consisted of 70° C. for 10 minutes and 90° C. for 15 minutes.
Ligation detection reactions were carried out in a 20 μl volume containing 20 mM Tris-HCl pH 7.6, 10 mM MgCl2, 100 mM KCl, 20 mM DTT, 1 mM NAD, 50 fmol wild-type Tth ligase, 500 fmol of each LDR probe, and 5-10 ng of each PCR amplicon. The Tth ligase can be diluted in a buffer containing 15 mM Tris-HCl pH 7.6, 7.5 mM MgCl2, 0.15 mg/ml BSA. To ensure the scoring accuracy of LRAT promoter methylation status, 30 LDR probes were designed to interrogate the methylation levels of ten CpG dinucleotide sites within the PCR amplified regions. Two discriminating LDR probes and one common LDR probe were designed for each of the CpG sites. The LDR probe mix contains 60 discriminating probes (30 probes for each channel) and 10 common probes (Table 6). The reaction mixtures were pre-heated for 3 minutes at 95° C., and then cycled for 20 rounds of 95° C. for 30 seconds and 60° C. for four minutes.
The ligation detection reaction (20 μl) was diluted with an equal volume of 2× hybridization buffer (8×SSC and 0.2% SDS), denatured at 95° C. for 3 minutes, and then plunged on ice. The Universal Arrays (Amersham Biosciences, Piscataway, N.J.) were assembled with ProPlate slide modules (Grace Bio-Labs, Bend, Oreg.) and filled with the 40 μl denatured LDR mixes. The assembled arrays were incubated in a rotating hybridization oven for 60 minutes at 65° C. After hybridization, the arrays were rinsed briefly in 4×SSC and washed in 2×SSC, 0.1% SDS for 5-10 minutes at 63.5° C. The fluorescent signals were measured using a ProScanArray scanner (Perkin Elmer, Boston, Mass.).
LDR is a single tube multiplex reaction with three probes interrogating each of the selected CpG sites. LDR products are captured on a Universal microarray using the ProPlate system (Grace Bio-Labs) where 64 hybridizations (four slides with 16 sub-arrays each) are carried out simultaneously. Each slide is scanned using a Perkin Elmer ProScanArray (Perkin Elmer, Boston, Mass.) under the same laser power and PMT within the linear dynamic range. The Cy3 and Cy5 dye bias was determined by measuring the fluorescence intensity of an equal quantity of Cy3 and Cy5 labeled LDR probes manually deposited on a slide surface. The fluorescence intensity ratio (W=ICy3/ICy5) was used to normalize the label bias when calculating the methylation ratio Cy3/(Cy3+Cy5). The methylation standard curves for each interrogated CpG dinucleotide were established using various combinations of in vitro methylated and unmethylated normal human lymphocyte genomic DNAs. The methylation levels of six CpG dinucleotides in the 5′-UTR regions were averaged and used to determine the overall promoter methylation status of the LRAT gene.
Because the PCR primer and LDR probe designs do not bias amplification or detection toward a particular methylation status, regardless of the methylation status of neighboring CpG dinucleotides (i.e., by using nucleotide analogues or degenerate bases within the primer designs), it is possible to quantify the methylation status of given CpG sites in the genome.
To demonstrate that the assay is quantitative, genomic DNA in vitro methylated with SssI methylase was mixed with normal human lymphocyte DNA (carrying unmethylated alleles), such that the test samples contained 0%, 20%, 40%, 60%, 80%, and 100% of methylated alleles, and these mixtures were subjected to Bisulfite-PCR/LDR/Universal Array analysis. The fluorescence intensity is presented by Cy3 (methylated alleles) or Cy5 (unmethylated alleles) at each double-spotted zipcode address. The average fluorescence intensity of the two duplicate spots was used to calculate the methylation ratio of each analyzed cytosine using the formula Cy3 average/(Cy3 average + Cy5 average).
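The methylation ratio formula above can be written out as a short function. This is a sketch under stated assumptions: the function name and the intensity values are hypothetical, and the way the dye-bias factor W = ICy3/ICy5 is applied here (dividing the Cy3 signal by W before forming the ratio) is one plausible reading, since the text does not spell out the correction.

```python
def methylation_ratio(cy3_spots, cy5_spots, dye_bias=1.0):
    """Methylation ratio Cy3/(Cy3+Cy5) from duplicate-spot intensities.

    cy3_spots, cy5_spots: intensities of the duplicate spots at one address
    (Cy3 = methylated allele, Cy5 = unmethylated allele).
    dye_bias: W = ICy3/ICy5 for equal dye quantities; dividing Cy3 by W is
    an assumed way of applying the correction.
    """
    if not cy3_spots or not cy5_spots:
        raise ValueError("need at least one spot per channel")
    if dye_bias <= 0:
        raise ValueError("dye_bias must be positive")
    cy3 = sum(cy3_spots) / len(cy3_spots) / dye_bias
    cy5 = sum(cy5_spots) / len(cy5_spots)
    total = cy3 + cy5
    if total <= 0:
        raise ValueError("no signal at this address")
    return cy3 / total


# Hypothetical duplicate-spot intensities for one CpG site.
print(methylation_ratio([3000.0, 3200.0], [1000.0, 1100.0]))
print(methylation_ratio([2000.0, 2000.0], [1000.0, 1000.0], dye_bias=2.0))
```

With no bias correction the first call reduces to 3100/(3100+1050); in the second call a W of 2 halves the Cy3 signal before the ratio is taken.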
The measured methylation ratio of each interrogated cytosine was plotted against the methylation levels of the mixed genomic DNAs. The R² values (coefficient of determination) of these experiments are between 0.89 and 0.97, which demonstrates the linearity of the described assay. Such standard curves can be used as reference points for further measurements done in clinical samples. Similar standard curves were also established for genes such as p16INK4a, p14ARF, TIMP3, APC, RASSF1, ECAD, MGMT, DAPK, GSTP1 and RARβ (Cheng et al., “Multiplexed Profiling of Candidate Genes for CpG Island Methylation Status Using a Flexible PCR/LDR/Universal Array Assay,” Genome Res. 16(2):282-289 (2006), which is hereby incorporated by reference in its entirety). In the “100%” in vitro methylated DNA sample, the Cy3 average/(Cy3 average + Cy5 average) ratios of the investigated CpG sites were between 0.6 and 0.9. This observation suggested that in vitro methylation is not fully efficient due to sequence context variation of each CpG site. This analysis also confirmed the different percentage of methylation at each CpG dinucleotide and suggested that methylation level is not 100% at each CpG site in cell line DNA (Cheng et al., “Multiplexed Profiling of Candidate Genes for CpG Island Methylation Status Using a Flexible PCR/LDR/Universal Array Assay,” Genome Res. 16(2):282-289 (2006), which is hereby incorporated by reference in its entirety). By comparing the ratio of (methylated):(methylated+unmethylated) DNA in different cell lines, one can extrapolate the CpG methylation level at a given position. Overall, the data demonstrate that the bisulfite-PCR/LDR/Universal Array approach is a quantitative method for the measurement of DNA methylation.
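A standard curve of this kind can be fitted and used as a reference with ordinary least squares, as in the sketch below. The data points are invented for illustration and are not measurements from the study; the function names are hypothetical, and a straight-line fit is an assumption suggested by the reported linearity.

```python
def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, with R squared."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def percent_methylated(ratio, slope, intercept):
    """Read a measured ratio back through the standard curve."""
    return (ratio - intercept) / slope


# Hypothetical standard curve: % methylated alleles vs. measured ratio.
percent = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
ratio = [0.05, 0.20, 0.35, 0.50, 0.62, 0.80]
slope, intercept, r2 = linear_fit(percent, ratio)
print(slope, intercept, r2)
print(percent_methylated(0.40, slope, intercept))
```

Inverting the fitted line is what allows a ratio measured in a clinical sample to be expressed as an estimated fraction of methylated alleles.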
Since aberrant DNA methylation may also result from aging, it is necessary to identify a promoter region whose methylation is disease specific. To demonstrate that LRAT promoter region methylation is tumor specific, CRC tumor samples (n=133) and the adjacent normal tissues (n=69) were analyzed using the bisulfite/PCR-PCR/LDR/Universal Array approach. For each clinical sample, the methylation levels of ten CpG dinucleotide sites residing in the 5′-UTR (CpG sites 1-6) and exon-1 (CpG sites 7-10) regions of the LRAT promoter were interrogated. Since the tumor (disease) specific aberrant methylation was identified in the 5′-UTR, the methylation levels of CpG sites 1-6 were averaged (the mean value) to determine the overall promoter methylation status. A promoter with a mean value of methylation signal intensity greater than 0.2 was scored as hypermethylated (methylation score 1), while a mean value equal to or less than 0.2 was scored as unmethylated (methylation score 0). This approach allowed a simple scoring system to use quantitative methylation data from multiple representative CpG sites across a larger DNA sequence region. Such quantitative reports give unambiguous and repeatable results in studies of DNA methylation.
A series of 133 CRC patient samples from the Memorial Sloan Kettering Cancer Center tumor bank were subjected to bisulfite/PCR-PCR/LDR/Universal Array analysis. The methylation levels of ten CpG dinucleotide sites in the LRAT promoter region were determined for each CRC sample. The average methylation level of CpG sites 1-6 was used to score the overall LRAT promoter methylation status. A hypermethylated promoter was defined as having an average methylation level greater than 0.2.
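The scoring rule described above can be expressed in a few lines. The function name and the example methylation levels are hypothetical; the 0.2 cutoff and the use of CpG sites 1-6 come from the text.

```python
def score_promoter(site_levels, threshold=0.2, n_utr_sites=6):
    """Return (mean level of CpG sites 1-6, methylation score).

    site_levels: methylation ratios for the ten interrogated CpG sites,
    ordered so that the first six are the 5'-UTR sites.
    Score is 1 (hypermethylated) if the mean exceeds threshold, else 0.
    """
    if len(site_levels) < n_utr_sites:
        raise ValueError("need ratios for all 5'-UTR sites")
    utr = site_levels[:n_utr_sites]
    mean_level = sum(utr) / len(utr)
    return mean_level, 1 if mean_level > threshold else 0


# Hypothetical ratios for CpG sites 1-10 in one tumor and one normal sample.
tumor = [0.35, 0.40, 0.25, 0.30, 0.20, 0.30, 0.05, 0.10, 0.00, 0.05]
normal = [0.05, 0.10, 0.05, 0.00, 0.10, 0.05, 0.05, 0.00, 0.05, 0.10]
print(score_promoter(tumor))
print(score_promoter(normal))
```

Only the first six values influence the score, so exon-1 sites 7-10 are carried along for completeness but ignored by the rule.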
LRAT promoter hypermethylation in CRCs was initially studied in microsatellite instability (MSI) tumors that often show multiple hypermethylated genes. LRAT hypermethylation was found in 36 of 40 MSI samples (90%) and was confirmed using methylation specific PCR (
LRAT promoter methylation is significantly associated with increased survival for all sporadic, non-MSI CRC patients. When all four CRC stages were considered, patients with LRAT promoter hypermethylation had a better disease-specific survival rate than patients with unmethylated promoter (
In a validation study, Kaplan-Meier survival analysis was carried out on an additional 44 non-MSI colorectal samples (total n=125) (
Since MSI patients typically have better survival and clinical outcomes, Kaplan-Meier survival analysis was performed on patients with the non-MSI genotype. Survival was measured from the date of resection of colorectal cancer to the date of death, the completion of 5 years of follow-up, or the last clinical review before April 2006. Only cancer-related deaths were analyzed as events. A p-value of less than 0.05 was considered statistically significant.
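The survival definition above maps onto a standard Kaplan-Meier product-limit estimate with censoring, sketched below. This is not the JMP analysis itself: the function names and the follow-up data are hypothetical, and follow-up is assumed to be expressed in months so that the 5-year limit is 60.

```python
def censor_at(times, events, horizon=60.0):
    """Truncate follow-up at the horizon; later events become censored."""
    out_times, out_events = [], []
    for t, e in zip(times, events):
        if t > horizon:
            out_times.append(horizon)
            out_events.append(False)
        else:
            out_times.append(t)
            out_events.append(e)
    return out_times, out_events


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient; events: True for a cancer-related
    death, False for censoring (other death, end of follow-up, last review).
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up (months) for five patients.
months = [10.0, 20.0, 20.0, 35.0, 70.0]
cancer_death = [True, False, True, True, True]
t, e = censor_at(months, cancer_death)
print(kaplan_meier(t, e))
```

Comparing two such curves (for example, hypermethylated versus unmethylated promoters) would additionally require a test such as the log-rank test, which is not shown here.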
Although preferred embodiments have been depicted and described in detail herein, it will be apparent to those skilled in the relevant art that various modifications, additions, substitutions, and the like can be made without departing from the spirit of the invention and these are therefore considered to be within the scope of the invention as defined in the claims which follow.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/104,574 filed Oct. 10, 2008, which is hereby incorporated by reference in its entirety.
This invention was made with government support under grant numbers P01-CA65930 and HHSN261200700388P, both awarded by the National Cancer Institute. The government has certain rights in this invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2009/005573 | 10/13/2009 | WO | 00 | 6/29/2011 |
Number | Date | Country | |
---|---|---|---|
61104574 | Oct 2008 | US |