The invention relates to diagnosis, prognosis and treatment of gastric cancer.
Gastric adenocarcinoma (gastric cancer, GC) is the second leading cause of global cancer mortality and 4th most common cancer worldwide. Most GC patients present with late stage disease with an overall 5-year survival of about 20%. A wealth of clinical, molecular, and pathological data suggests that GC is a heterogeneous disease. Objective response rates to conventional chemotherapeutic regimens range from 20-40%, indicating that individual GCs can exhibit a range of responses when treated identically. Canonical oncogenic pathways such as E2F, K-RAS, p53, and Wnt/β-catenin signalling are also known to be deregulated with varying frequencies in GC, suggesting a high degree of molecular heterogeneity. However, despite evidence that GCs can exhibit striking inter-individual differences in disease aggressiveness, histopathologic features, and responses to therapy, most GC patients today are managed alike with a “one size fits all” approach resulting in markedly diverse clinical outcomes. Approaches capable of classifying heterogeneous populations of GC patients into biologically and clinically homogenous subgroups are thus urgently required, such that GC patient prognoses can be accurately predicted, and clinical decisions made based on the underlying biology of each subgroup.
Reflecting this urgency, several classification systems for GC have been reported over the decades. In 1965, Lauren described two main subtypes of GC, intestinal (G-INT) and diffuse (G-DIF), on the basis of microscopic features observed in gastric tumors (Lauren P., Acta Pathol Microbiol Scand, 1965, 64:31-49). Note, however, that while Lauren's intestinal and diffuse subtypes are correlated with G-INT and G-DIF, about 30% of cases are discordant. Thus, Lauren's classification and G-INT/G-DIF should not be regarded as the same. Since Lauren's report, several other GC histopathological classifications have been developed, such as the systems of the WHO (Jass J. R. et al., Cancer, 1990, 66:2162-7); Ming S. C., Cancer, 1977, 39:2475-85; Mulligan R. M., Pathol Annu, 1972, 7:349-415; and Goseki N. et al., Gut, 1992, 33:606-12, and more recently, molecular classifications based on immunohistochemistry, gene expression profiles (Kim B. et al., Cancer Res, 2003, 63:8248-5518-20; Vecchi M. et al., Oncogene, 2007, 26:4284-94; and Boussioutas A. et al., Cancer Res, 2003, 63:2569-77), proteomics (Lee H. S. et al., Clin Cancer Res, 2007, 13:4154-63), and integrative systems biology approaches (Aggarwal A. et al., Cancer Res, 2006, 66:232-41; Tay S. T. et al., Cancer Res, 2003, 63:3309-16; Myllykangas S. et al., Int J Cancer, 2008, 123:817-25). However, to date, none of these GC classification systems has been shown to provide reliable independent prognostic information, nor have they been able to suggest specific treatment options for patients.
One common feature shared by most previously-described GC classification systems is that they have principally focused on the characterization of primary tumors, which are known to contain many distinct cell types including tumor cells, fibroblastic/desmoplastic stroma, blood vessels, and immune cells.
There remains a need for a clinically meaningful GC taxonomy to classify GC and to provide prognostic and predictive value.
The invention relates to methods for diagnosis and prognosis of gastric cancer. The approach described herein aims to distinguish intestinal-type gastric cancer (G-INT) from diffuse-type gastric cancer (G-DIF). The genomic expression signatures as disclosed herein define two major sets of genes. It is submitted that a diagnosis of gastric cancer G-INT and G-DIF can be made on the basis of the expression levels of these genes. This can lead to a better prognosis and treatment of gastric cancer.
In one aspect, the invention relates to a method of diagnosing intestinal-type gastric cancer (G-INT). The method comprises the step of determining the expression levels of the following Group A1 genes in gastric tissue in a biological sample from a subject having gastric cancer: TSPAN8, GPX2, LYZ, PLS1, LGALS4, FUT2, C5orf32, ATAD4, DEGS2, NOSTRIN, MUC13, ALDH3A1, MYO1A, ABCC3, AGR3, VILL, SH3RF1, TRAK1, EGLN3, CDH17, BCL2L14, CEACAM1, LIPH, RSPH1, KALRN, CAPN8, CLCN3, PLEK2 and TMC5. In addition, the expression level of at least one of the following Group A2 genes in the biological sample may also be determined for greater accuracy and precision: CYP3A5, EPS8L3, FA2H, TOX3, BAIAP2L2, PIP5K1B, AGPAT2, BCL2L15, TNFRSF11A, PLCH1, GPR35, ATP10B, TC2N, MMP28, CYP3A5, LLGL2, CAPN10, TRNP1, SDCBP2, MYB, ACSM3, REG4, CYP2C18, PRR15, SGK493, HNF4G, TMEM45B, KLF5, UGT8, RNF128, KCNE3, LOC100133019, DNAJC22, ST6GALNAC1, CLRN3, GDF15, RNF43, KIAA0746, USH1C, CLDN2, EHF, FOXA3, POF1B, LOC286208, C9orf152, GMDS, SLC22A18AS, C11orf9, LOC100131701, TMPRSS4, SLC37A1, PTK6, CEACAM5, SULT2B1, LOC120376, MST1R, ELF3, SLC26A9, SLC40A1, PTPRB, AGR2, GALNT12 and HEPH. An increase in the expression levels of the Group A1 and optional Group A2 genes in the subject, in comparison with expression levels of the genes in non-cancerous gastric tissue, would indicate that the subject has G-INT.
A further aspect of the invention relates to a method of diagnosing diffuse-type gastric cancer (G-DIF). The method comprises determining the expression levels of the following Group B1 genes in gastric tissue in a biological sample from a subject having gastric cancer: RDX, TBCEL, FERMT2, MYO5A, SOAT1, FADS1, MYH10, FNBP1, ELOVL5, ABL2, PGBD1, SELM, LOXL2, cN-PAC, FZD2, KIAA1586 and RASSF8. In addition, the expression level of at least one of the following Group B2 genes in the biological sample may also be determined for greater accuracy and precision: NUAK1, TMEFF1, SCHIP1, TMEM136, ZCCHC11, FAM101B, FAM127A, SIX4, DENND5A, TTC7B, ZNF512B, KIRREL, GNB4, FN1, GJC1, GLIPR2, FJX1, DSE, ENAH, DNAH14, CALD1, GPRASP2, HEG-int, DLX1, TIMP3, GLT8D4, LPHN2, PTPRS, FRMD6, SNAP47, WHAMML1, WHAMML2, GATA2, APH1B, MLLT11, PPM1F, SNX21, ANXA6, PKIG, ANTXR1, ATP8B2, CSRP2, DEGS1, KLHDC8B, DEPDC1, CSE1L, WDR35, SAMD4A, TRIM23, FAM92A1, S1PR3, TUBA1A, LOC644450, PTPN1, HOMER3, IGFBP7, TSR1, AURKB, MSX1, CTSL1, TEAD1, LOC283658 and MAP1B. An increase in the expression levels of the Group B1 and optional Group B2 genes in the subject, in comparison with expression levels of the genes in non-cancerous gastric tissue, would indicate that the subject has G-DIF.
In accordance with another aspect of the invention, there is provided a method of diagnosing G-INT by RNA analysis. The method comprises steps such as: obtaining a gastric tissue sample from a subject with gastric cancer; isolating RNAs from the sample for a gene expression analysis; analyzing the RNAs by a hybridization analysis or a sequencing analysis to determine the expression levels of the following Group A1 genes in the sample: TSPAN8, GPX2, LYZ, PLS1, LGALS4, FUT2, C5orf32, ATAD4, DEGS2, NOSTRIN, MUC13, ALDH3A1, MYO1A, ABCC3, AGR3, VILL, SH3RF1, TRAK1, EGLN3, CDH17, BCL2L14, CEACAM1, LIPH, RSPH1, KALRN, CAPN8, CLCN3, PLEK2 and TMC5, and optionally determining the expression level of at least one of the following Group A2 genes in the sample: CYP3A5, EPS8L3, FA2H, TOX3, BAIAP2L2, PIP5K1B, AGPAT2, BCL2L15, TNFRSF11A, PLCH1, GPR35, ATP10B, TC2N, MMP28, CYP3A5, LLGL2, CAPN10, TRNP1, SDCBP2, MYB, ACSM3, REG4, CYP2C18, PRR15, SGK493, HNF4G, TMEM45B, KLF5, UGT8, RNF128, KCNE3, LOC100133019, DNAJC22, ST6GALNAC1, CLRN3, GDF15, RNF43, KIAA0746, USH1C, CLDN2, EHF, FOXA3, POF1B, LOC286208, C9orf152, GMDS, SLC22A18AS, C11orf9, LOC100131701, TMPRSS4, SLC37A1, PTK6, CEACAM5, SULT2B1, LOC120376, MST1R, ELF3, SLC26A9, SLC40A1, PTPRB, AGR2, GALNT12 and HEPH, wherein higher expression levels of the Group A1 and optional Group A2 genes in the subject, compared to expression levels of the genes in non-cancerous gastric tissue, indicates that the subject has G-INT.
In accordance with another aspect of the invention, there is provided a method of diagnosing G-DIF by RNA analysis. The method comprises steps such as: obtaining a gastric tissue sample from a subject with gastric cancer; isolating RNAs from the sample for a gene expression analysis; analyzing the RNAs by a hybridization analysis or a sequencing analysis to determine the expression levels of the following Group B1 genes in the sample: RDX, TBCEL, FERMT2, MYO5A, SOAT1, FADS1, MYH10, FNBP1, ELOVL5, ABL2, PGBD1, SELM, LOXL2, cN-PAC, FZD2, KIAA1586 and RASSF8, and optionally determining the expression level of at least one of the following Group B2 genes in the sample: NUAK1, TMEFF1, SCHIP1, TMEM136, ZCCHC11, FAM101B, FAM127A, SIX4, DENND5A, TTC7B, ZNF512B, KIRREL, GNB4, FN1, GJC1, GLIPR2, FJX1, DSE, ENAH, DNAH14, CALD1, GPRASP2, HEG-int, DLX1, TIMP3, GLT8D4, LPHN2, PTPRS, FRMD6, SNAP47, WHAMML1, WHAMML2, GATA2, APH1B, MLLT11, PPM1F, SNX21, ANXA6, PKIG, ANTXR1, ATP8B2, CSRP2, DEGS1, KLHDC8B, DEPDC1, CSE1L, WDR35, SAMD4A, TRIM23, FAM92A1, S1PR3, TUBA1A, LOC644450, PTPN1, HOMER3, IGFBP7, TSR1, AURKB, MSX1, CTSL1, TEAD1, LOC283658 and MAP1B, wherein higher expression levels of the Group B1 and optional Group B2 genes in the subject, compared to expression levels of the genes in non-cancerous gastric tissue, indicates that the subject has G-DIF.
In certain aspects of the invention, the hybridization analysis comprises a microarray analysis. In certain aspects, the microarray analysis uses commercially available microarrays such as an Affymetrix Human Genome U133 Plus 2.0 array or an Affymetrix U133AB array. In other aspects, the hybridization analysis comprises a microarray analysis using an Illumina Human-6 v2 Expression BeadChip. In other aspects, the hybridization analysis comprises a customized array comprising probes for detection of the genes of the methods described herein.
In other aspects of the invention, the hybridization analysis comprises a real-time polymerase chain reaction with detection of amplification of genes by fluorescent probes.
In certain aspects of the invention, the sequencing analysis comprises a high-throughput sequencing analysis. In certain aspects, the high-throughput sequencing methods include, but are not limited to, SOLiD sequencing, 454 sequencing and Solexa sequencing. In certain aspects, the high-throughput sequencing methods are used in conjunction with SAGE or superSAGE for the gene expression analysis.
In certain aspects of the invention, the gene expression analysis comprises a comparative genomic hybridization assay. In some embodiments, this assay includes detection by epifluorescence microscopy.
In accordance with another aspect of the invention, there is provided a method of diagnosing G-INT by protein analysis. The method comprises steps such as: obtaining a gastric tissue sample from a subject with gastric cancer; isolating proteins from the sample for a gene expression analysis; analyzing the proteins by a protein affinity-based method or by a mass spectrometry-based proteomics method to determine the levels of proteins encoded by the following Group A1 genes in the sample: TSPAN8, GPX2, LYZ, PLS1, LGALS4, FUT2, C5orf32, ATAD4, DEGS2, NOSTRIN, MUC13, ALDH3A1, MYO1A, ABCC3, AGR3, VILL, SH3RF1, TRAK1, EGLN3, CDH17, BCL2L14, CEACAM1, LIPH, RSPH1, KALRN, CAPN8, CLCN3, PLEK2 and TMC5, and optionally determining the expression level of at least one of the following Group A2 genes in the sample: CYP3A5, EPS8L3, FA2H, TOX3, BAIAP2L2, PIP5K1B, AGPAT2, BCL2L15, TNFRSF11A, PLCH1, GPR35, ATP10B, TC2N, MMP28, CYP3A5, LLGL2, CAPN10, TRNP1, SDCBP2, MYB, ACSM3, REG4, CYP2C18, PRR15, SGK493, HNF4G, TMEM45B, KLF5, UGT8, RNF128, KCNE3, LOC100133019, DNAJC22, ST6GALNAC1, CLRN3, GDF15, RNF43, KIAA0746, USH1C, CLDN2, EHF, FOXA3, POF1B, LOC286208, C9orf152, GMDS, SLC22A18AS, C11orf9, LOC100131701, TMPRSS4, SLC37A1, PTK6, CEACAM5, SULT2B1, LOC120376, MST1R, ELF3, SLC26A9, SLC40A1, PTPRB, AGR2, GALNT12 and HEPH, wherein higher expression levels of the Group A1 and optional Group A2 genes in the subject, compared to expression levels of the genes in non-cancerous gastric tissue, indicates that the subject has G-INT.
In accordance with another aspect of the invention, there is provided a method of diagnosing G-DIF by protein analysis. The method comprises steps such as: obtaining a gastric tissue sample from a subject with gastric cancer; isolating proteins from the sample for a gene expression analysis; analyzing the proteins by a protein affinity-based method or by a mass spectrometry-based proteomics method to determine the expression levels of the following Group B1 genes in the sample: RDX, TBCEL, FERMT2, MYO5A, SOAT1, FADS1, MYH10, FNBP1, ELOVL5, ABL2, PGBD1, SELM, LOXL2, cN-PAC, FZD2, KIAA1586 and RASSF8, and optionally determining the expression level of at least one of the following Group B2 genes in the sample: NUAK1, TMEFF1, SCHIP1, TMEM136, ZCCHC11, FAM101B, FAM127A, SIX4, DENND5A, TTC7B, ZNF512B, KIRREL, GNB4, FN1, GJC1, GLIPR2, FJX1, DSE, ENAH, DNAH14, CALD1, GPRASP2, HEG-int, DLX1, TIMP3, GLT8D4, LPHN2, PTPRS, FRMD6, SNAP47, WHAMML1, WHAMML2, GATA2, APH1B, MLLT11, PPM1F, SNX21, ANXA6, PKIG, ANTXR1, ATP8B2, CSRP2, DEGS1, KLHDC8B, DEPDC1, CSE1L, WDR35, SAMD4A, TRIM23, FAM92A1, S1PR3, TUBA1A, LOC644450, PTPN1, HOMER3, IGFBP7, TSR1, AURKB, MSX1, CTSL1, TEAD1, LOC283658 and MAP1B, wherein higher expression levels of the Group B1 and optional Group B2 genes in the subject, compared to expression levels of the genes in non-cancerous gastric tissue, indicates that the subject has G-DIF.
In certain aspects of the invention, the protein affinity method comprises detection of specific proteins using interactions with antibodies or antibody fragments. The antibodies or antibody fragments may be deposited on an antibody microarray.
In other aspects of the invention, the mass-spectrometry-based proteomics method uses Fourier Transform electrospray ionization mass spectrometry or matrix-assisted laser desorption/ionization mass spectrometry.
In one aspect of the invention, the mass-spectrometry-based proteomics analysis method is APEX.
A further aspect of the invention relates to a method for prognosis of gastric cancer in a subject. The method comprises the steps of determining the expression levels of the Group A1 genes and Group B1 genes as defined above, in gastric tissue in a biological sample from a subject having gastric cancer, and optionally determining the expression level of at least one of the Group A2 genes and Group B2 genes as defined above. Compared to expression levels of the genes in non-cancerous gastric tissue, an increase in the expression levels of the Group A1 and optional Group A2 genes would indicate that the subject has G-INT. Similarly, an increase in the expression levels of the Group B1 and optional Group B2 genes would indicate that the subject has G-DIF. Information about whether the subject has G-INT or G-DIF would be of prognostic value.
A further aspect of the invention relates to a method of treating gastric cancer in a subject. The method comprises determining whether the subject has intestinal-type gastric cancer (G-INT) or diffuse-type gastric cancer (G-DIF) by determining the expression levels of the Group A1 genes in gastric tissue in a biological sample from a subject having gastric cancer, and optionally determining the expression level of at least one of the Group A2 genes; and determining the expression levels of the Group B1 genes from the same subject, and optionally determining the expression level of at least one of the Group B2 genes. Then, guided by the results, chemotherapeutic treatment may be designed for the subject, taking into account the likelihood that the subject has G-INT or G-DIF. If the subject has G-INT, administering 5-fluorouracil or a fluoropyrimidine, and/or oxaliplatin to the subject may be appropriate. If the subject has G-DIF, administering cisplatin as an example may be appropriate.
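The subtype-guided selection described in this aspect can be sketched as a simple lookup. The sketch below is illustrative only: the table and function names are hypothetical, the agents are those named in the preceding paragraph, and an unrecognized subtype label simply returns no suggestion.

```python
# Illustrative lookup from an intrinsic subtype call to the example agents
# named in the text. The names CANDIDATE_AGENTS and candidate_agents are
# hypothetical and are not part of the described method.
CANDIDATE_AGENTS = {
    "G-INT": ("5-fluorouracil or another fluoropyrimidine", "oxaliplatin"),
    "G-DIF": ("cisplatin",),
}


def candidate_agents(subtype):
    """Return the example agents for a subtype call, or an empty tuple."""
    return CANDIDATE_AGENTS.get(subtype, ())


if __name__ == "__main__":
    for call in ("G-INT", "G-DIF", "unclassified"):
        print(call, "->", candidate_agents(call))
```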
A further aspect of the invention relates to an array comprising a set of polynucleotide probes. The set of polynucleotide probes are specific for the expression products of the Group A1 genes as defined above, and optionally at least one of the Group A2 genes as defined above. Alternatively, the set of polynucleotide probes are specific for the expression products of the Group B1 genes defined above, and optionally at least one of the Group B2 genes as defined above. It is contemplated that the set of polynucleotide probes are specific to the genes associated with gastric cancer, i.e. the Groups A1, A2, B1 and B2 genes, and does not include irrelevant genes. The array can comprise the set of polynucleotides specific for the expression products of the Group A1 genes and the Group B1 genes.
In drawings illustrating embodiments of the invention:
Due to the high level of tissue complexity, subtle variations in diverse cell types, both across and within tumors, can cause differences in interpretation between observers, and ultimately pose difficulties for standardization across different centres. The present invention provides an alternative strategy that initially focused not on primary GCs, but on a diverse panel of GC cell lines. Since cancer cell lines are devoid of other cell types such as fibroblasts, endothelial cells, and immune cells, any genomic differences detected in cell lines should be by nature tumor-centric and thereby “intrinsic” to the underlying biology of the GC cell.
Investigation of a large panel of GC cell lines permitted us to identify a genomic expression signature clearly defining two major intrinsic subgroups of GC. These intrinsic subgroups were validated in primary tumors and, when applied to 4 independent GC cohorts, the intrinsic subtypes proved capable of providing independent prognostic information (see Example 5). In vitro and in vivo evidence also demonstrated that GCs belonging to different intrinsic subtypes may respond differently to various standard-of-care chemotherapies.
Unlike previous approaches for comparative molecular examination of GC (Jinawath N. et al., Oncogene, 2004, 23:6830-44; Wang L. et al., World J Gastroenterol, 2006, 12:6949-54; Meireles S. I. et al., Cancer Res, 2004, 64:1255-65), the method described herein used unsupervised approaches for subclass discovery. The present invention aims to address several deficiencies in approaches known in the art, namely a) the major distinctions in the molecular heterogeneity of GC might be unrelated to presently known classification systems or phenotypes, and b) using current classification systems, reproducibility among pathologists is only about 70% (Arslan C. et al., Histopathology 1982, 6:391-8; Dixon M. F. et al., Histopathology, 1994, 25:309-16; Palli D. et al., Br J Cancer, 1991, 63:765-8; Shibata A. et al., Cancer Epidemiol Biomarkers Prev, 2001, 10:75-8) and this lack of inter-observer concordance might compromise supervised analysis. Testing of several different prediction algorithms confirmed that the intrinsic subtypes exhibited stable and reproducible classification performance in cell lines and primary tumors, thus demonstrating that the subtypes are statistically robust.
Using a strict filtering criterion (FDR<0.002), a genomic classifier of 171 genes exhibiting differential regulation between the subtypes was identified. Biological curation of the classifier confirmed that the intrinsic subtypes are associated with very different gene expression features, cellular processes and biological pathways. These results demonstrate that the intrinsic subtypes are very distinct and may represent distinct lineages.
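As an illustration of filtering at a false discovery rate cutoff, the sketch below applies the Benjamini-Hochberg adjustment to per-gene p-values and keeps genes below the 0.002 threshold. This passage does not state how the FDR was estimated, so the use of Benjamini-Hochberg is an assumption, and the function names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are monotone non-decreasing in rank.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_genes(gene_pvalues, fdr_cutoff=0.002):
    """Keep genes whose adjusted p-value is below the cutoff (assumed rule)."""
    genes = list(gene_pvalues)
    adjusted = benjamini_hochberg([gene_pvalues[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q < fdr_cutoff]


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    example = {"LGALS4": 1e-6, "CDH17": 1e-5, "GENE_X": 0.5}
    print(select_genes(example))
```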
The clinical relevance of the intrinsic subclasses is supported by the finding that they can act as an independent predictor of clinical survival in multiple patient cohorts, even after controlling for tumor stage. Intestinal cancers are classically characterized by glandular differentiation on a background of gastric atrophy or intestinal metaplasia, while diffuse cancers typically appear as rows of single mononuclear “signet ring” cells with little cell adhesion. These apparently distinct features, however, are not always discernible in clinical samples, where inter-observer variation and unclassifiable or “mixed” subtypes are commonly reported. As described herein, patients stratified by Lauren's histopathology did not exhibit significantly different survival outcomes, while patients discordant between the intrinsic subclasses and Lauren's classification exhibited survival patterns that support the intrinsic genomic taxonomy. The present results show that the intrinsic subclasses provide information about the predominant lineage in GC samples that may not be precisely distinguished by morphology, and that this information is clinically relevant.
Besides gene expression, two genes in the classifier (LGALS4 and L1-Cadherin (CDH17)) were employed as immunohistochemical markers for the G-INT intrinsic subtype. LGALS4 and CDH17 have been previously reported to be differentially regulated across subsets of gastric tumors (Chen X. et al., Mol Biol Cell, 2003, 14:3208-15) and cell lines (Ji J. et al., Oncogene, 2002, 21:6549-56), and expressed in intestinal metaplasia (Dong W. et al., Dig Dis Sci, 2007, 52:536-42; Lee H. J., Gastroenterology, 2010, 139:213-25 e3). CDH17 was recently reported as a prognostic factor in early-stage GC (Lee H. J., Gastroenterology, 2010, 139:213-25 e3), a marker of poor prognosis in another study (Ito R. et al., Virchows Arch, 2005, 447:717-22), and a potential therapeutic target in experimental models (Liu Q. S. et al., Cancer Sci, 2010, 101:1807-12). The 2-marker positive group was specifically compared to the 2-marker negative group to confidently distinguish between the G-INT and G-DIF cancers. Our results showed that the one-third of 1-marker positive patients also appeared to exhibit an improved survival trend compared to the 2-marker negative group (CDH17, p=0.08 adjusted for stage; LGALS4, p=0.07 adjusted for stage). These results show that some of the 1-marker positive cancers may also be G-INT cancers (
In vitro, G-INT lines were more sensitive to 5-FU and oxaliplatin than G-DIF cell lines, but were also more resistant to cisplatin. The absolute magnitude of these in vitro differential sensitivities is about 3-5 fold. A significant interaction between the intrinsic subtypes and differential benefit from adjuvant 5-FU therapy was observed in retrospective patient cohorts (Table 3 and Table 8). These results show that in addition to patient prognosis, the intrinsic subtypes can be used to guide treatment selection.
In INT-0116 (Macdonald J. S., J Clin Oncol, 2009, 27:abst 4515), a ten-year update subgroup analysis revealed that all GC subsets benefited from 5-FU therapy except for cases with diffuse histology. Moreover, in JCOG 9912 (Boku N. et al., Lancet Oncol, 2009, 10:1063-9) which established S-1 monotherapy as a first-line palliative chemotherapy option in Japan, benefit of irinotecan/cisplatin over 5-FU based monotherapy was observed in diffuse but not intestinal GCs. The results described herein are consistent with subgroup analysis of these two large GC clinical trials. Therefore, the intrinsic subtypes described herein provide a clinically relevant genomic taxonomy of GC with prognostic and predictive value.
The genomic expression signatures identified herein define two major intrinsic subgroups of GC, which allow for differentiation between G-INT and G-DIF:
Intestinal-type gastric cancer (G-INT) involves the 92 gene(s) listed in Table 5 (referred to henceforth as “Group A”): TSPAN8, GPX2, LYZ, PLS1, LGALS4, FUT2, C5orf32, ATAD4, DEGS2, NOSTRIN, MUC13, ALDH3A1, MYO1A, ABCC3, AGR3, VILL, SH3RF1, TRAK1, EGLN3, CDH17, BCL2L14, CEACAM1, LIPH, RSPH1, KALRN, CAPN8, CLCN3, PLEK2, TMC5, CYP3A5, EPS8L3, FA2H, TOX3, BAIAP2L2, PIP5K1B, AGPAT2, BCL2L15, TNFRSF11A, PLCH1, GPR35, ATP10B, TC2N, MMP28, CYP3A5, LLGL2, CAPN10, TRNP1, SDCBP2, MYB, ACSM3, REG4, CYP2C18, PRR15, SGK493, HNF4G, TMEM45B, KLF5, UGT8, RNF128, KCNE3, LOC100133019, DNAJC22, ST6GALNAC1, CLRN3, GDF15, RNF43, KIAA0746, USH1C, CLDN2, EHF, FOXA3, POF1B, LOC286208, C9orf152, GMDS, SLC22A18AS, C11orf9, LOC100131701, TMPRSS4, SLC37A1, PTK6, CEACAM5, SULT2B1, LOC120376, MST1R, ELF3, SLC26A9, SLC40A1, PTPRB, AGR2, GALNT12 or HEPH. Diffuse-type gastric cancer (G-DIF) involves the 79 gene(s) (referred to henceforth as “Group B”): RDX, TBCEL, FERMT2, MYO5A, SOAT1, FADS1, MYH10, FNBP1, ELOVL5, ABL2, PGBD1, SELM, LOXL2, cN-PAC, FZD2, KIAA1586, RASSF8, NUAK1, TMEFF1, SCHIP1, TMEM136, ZCCHC11, FAM101B, FAM127A, SIX4, DENND5A, TTC7B, ZNF512B, KIRREL, GNB4, FN1, GJC1, GLIPR2, FJX1, DSE, ENAH, DNAH14, CALD1, GPRASP2, HEG-int, DLX1, TIMP3, GLT8D4, LPHN2, PTPRS, FRMD6, SNAP47, WHAMML1, WHAMML2, GATA2, APH1B, MLLT11, PPM1F, SNX21, ANXA6, PKIG, ANTXR1, ATP8B2, CSRP2, DEGS1, KLHDC8B, DEPDC1, CSE1L, WDR35, SAMD4A, TRIM23, FAM92A1, S1PR3, TUBA1A, LOC644450, PTPN1, HOMER3, IGFBP7, TSR1, AURKB, MSX1, CTSL1, TEAD1, LOC283658 or MAP1B.
An increase in the expression level of the above gene(s) in the subject, compared to expression level of the corresponding gene(s) in non-cancerous gastric tissue, indicates that the subject probably has G-INT or G-DIF. Treatment of the subject for GC can be guided accordingly. It should be noted that although 92 genes are indicated for G-INT and 79 genes for G-DIF, not all these genes need to be assayed for expression in order to obtain a diagnostic or prognostic value for G-INT and G-DIF. The aim is to provide a minimum set of polynucleotides that would be useful in diagnosing G-INT or G-DIF. Any number of gene(s) from the above sets that permits diagnosis within acceptable diagnostic parameters is contemplated.
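One way to turn these two gene sets into a subtype call can be sketched as follows. The decision rule (compare the mean log2 ratio of tumor to non-cancerous expression for each group and call the group with the larger positive mean) is an assumption made for illustration, not a rule stated in this description. The three-gene subsets are drawn from Groups A and B only to keep the example short, and the function names are hypothetical.

```python
import math

# Small illustrative subsets of the Group A and Group B genes listed above.
GROUP_A_SUBSET = ("TSPAN8", "LGALS4", "CDH17")
GROUP_B_SUBSET = ("RDX", "FERMT2", "LOXL2")


def mean_log2_ratio(tumor, normal, genes):
    """Mean log2(tumor / non-cancerous) expression over the given genes."""
    ratios = [math.log2(tumor[g] / normal[g]) for g in genes]
    return sum(ratios) / len(ratios)


def call_subtype(tumor, normal):
    """Assumed rule: the group with the larger positive mean increase wins."""
    score_a = mean_log2_ratio(tumor, normal, GROUP_A_SUBSET)
    score_b = mean_log2_ratio(tumor, normal, GROUP_B_SUBSET)
    if max(score_a, score_b) <= 0 or score_a == score_b:
        return "indeterminate"
    return "G-INT" if score_a > score_b else "G-DIF"


if __name__ == "__main__":
    # Hypothetical expression values in arbitrary linear units.
    normal = {g: 1.0 for g in GROUP_A_SUBSET + GROUP_B_SUBSET}
    tumor = dict(normal, TSPAN8=4.0, LGALS4=4.0, CDH17=4.0)
    print(call_subtype(tumor, normal))
```

In practice the comparison would use whichever subset of Group A and Group B genes is actually assayed, with expression values normalized in the same way for the tumor and non-cancerous samples.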
It is contemplated that the number of genes whose expression is to be assayed may be a few from the relevant set, or any number up to all of the genes identified in the relevant set. Specifically, it is contemplated, based on the analysis set forth in the Examples, that the group of 29 genes (referred to henceforth as “Group A1”): TSPAN8, GPX2, LYZ, PLS1, LGALS4, FUT2, C5orf32, ATAD4, DEGS2, NOSTRIN, MUC13, ALDH3A1, MYO1A, ABCC3, AGR3, VILL, SH3RF1, TRAK1, EGLN3, CDH17, BCL2L14, CEACAM1, LIPH, RSPH1, KALRN, CAPN8, CLCN3, PLEK2 and TMC5, would be sufficient for the diagnosis or prognosis of G-INT. Determination of the expression level of at least one additional gene from the remainder of Group A should improve accuracy. It is contemplated that the expression levels of at least 1, 5, 10, or at least 20, at least 30, at least 40, at least 50, or all 63 remaining genes of Group A may be assayed.
For example, the additional genes from Group A can comprise at least one of or any combination of:
EHF, FOXA3, POF1B, LOC286208 and C9orf152;
GMDS, SLC22A18AS, C11orf9, LOC100131701 and TMPRSS4;
SLC37A1, PTK6, CEACAM5, SULT2B1 and LOC120376; and/or
It is also contemplated, based on the analysis set forth in the Examples, that the group of 17 genes (referred to henceforth as “Group B1”): RDX, TBCEL, FERMT2, MYO5A, SOAT1, FADS1, MYH10, FNBP1, ELOVL5, ABL2, PGBD1, SELM, LOXL2, cN-PAC, FZD2, KIAA1586 and RASSF8, would be sufficient for the diagnosis or prognosis of G-DIF. Determination of the expression level of at least one additional gene from the remainder of Group B should improve accuracy for G-DIF diagnosis and prognosis. It is contemplated that the expression levels of at least 1, 5, 10, or at least 20, at least 30, at least 40, at least 50, or all 62 remaining genes of Group B may be assayed.
For example, the additional genes from Group B can comprise at least one of or any combination of:
S1PR3, TUBA1A, LOC644450, PTPN1 and HOMER3; and/or
For further accuracy and precision of gastric cancer prognosis, it is contemplated that the subsets of genes above, which are sufficient indicators of G-INT and G-DIF, are both assayed for the same subject. For example, based on the results of the analysis in the Examples, about 44 to 46 of the 171 genes (Group A1+Group B1) can be assayed.
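The size of the combined minimal panel follows directly from the two lists: 29 Group A1 genes plus 17 Group B1 genes gives 46 genes. The sketch below assembles that panel and reports which panel genes are absent from a set of measured genes; the helper names are hypothetical.

```python
GROUP_A1 = [
    "TSPAN8", "GPX2", "LYZ", "PLS1", "LGALS4", "FUT2", "C5orf32", "ATAD4",
    "DEGS2", "NOSTRIN", "MUC13", "ALDH3A1", "MYO1A", "ABCC3", "AGR3", "VILL",
    "SH3RF1", "TRAK1", "EGLN3", "CDH17", "BCL2L14", "CEACAM1", "LIPH",
    "RSPH1", "KALRN", "CAPN8", "CLCN3", "PLEK2", "TMC5",
]
GROUP_B1 = [
    "RDX", "TBCEL", "FERMT2", "MYO5A", "SOAT1", "FADS1", "MYH10", "FNBP1",
    "ELOVL5", "ABL2", "PGBD1", "SELM", "LOXL2", "cN-PAC", "FZD2", "KIAA1586",
    "RASSF8",
]


def combined_panel():
    """Group A1 followed by Group B1, preserving the listed order."""
    return GROUP_A1 + GROUP_B1


def missing_from(measured_genes):
    """Panel genes that are absent from the measured gene symbols."""
    measured = set(measured_genes)
    return [g for g in combined_panel() if g not in measured]


if __name__ == "__main__":
    print(len(GROUP_A1), len(GROUP_B1), len(combined_panel()))
    print(missing_from(GROUP_A1))  # lists the Group B1 genes
```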
Assays of non-relevant genes, i.e. other than the genes of Groups A and B, such as those provided in the Affymetrix DNA array or such arrays known in the art as research tools, are not intended to be included in the present invention. Thus it is contemplated that the expression levels of no other genes than the 171 genes of Groups A1, A2, B1 and B2 are determined.
As used herein, “gastric cancer” is intended to encompass, without limitation, abnormal or uncontrollable cell growth, metastasis, interference with the normal functioning of neighboring cells, release of cytokines or other secretory products at abnormal levels, suppression or aggravation of inflammatory or immunological response, neoplasia, premalignancy, malignancy, invasion of surrounding or distant tissues or organs, such as lymph nodes, etc. “Metastatic disease” refers to cancer cells that have left the original tumor site and migrate to other parts of the body, for example via the bloodstream or lymph system. The two main subtypes of gastric cancer are described by Lauren, that is intestinal-type (G-INT) and diffuse-type (G-DIF) (Lauren P., Acta Pathol Microbiol Scand, 1965, 64:31-49, hereby incorporated by reference).
As used herein, “tissue” is intended to encompass a plurality of functionally related cells. A tissue can be a suspension, a semi-solid, or solid. Tissue includes cells collected from a subject, as well as cell lines grown ex vivo or in vitro.
As used herein, “diagnosing” or “diagnosis” is intended to encompass the process of identifying gastric cancer by its signs, symptoms and results of various tests. Diagnosing gastric cancer includes the methods described herein. In one embodiment, diagnosing gastric cancer includes determining whether a subject likely has intestinal-type gastric cancer (G-INT) or diffuse-type gastric cancer (G-DIF). This determination may help in choosing an appropriate course of treatment with a greater chance of success.
As used herein, “expression” of a gene is intended to encompass the process by which the coded information of a gene is converted into an operational, non-operational, or structural part of a cell, such as the synthesis of a protein. When used in reference to the expression of a nucleic acid molecule, such as a gene, an increase in the expression level of a gene refers to any process which results in an increase in production of a gene product. A gene product can be RNA (such as mRNA, rRNA, tRNA, and structural RNA) or protein. Therefore, an increase in the expression level of a gene includes processes that increase transcription of a gene or translation of mRNA. The “expression level” of a nucleic acid molecule in a cancerous cell or tissue can be altered relative to a non-cancerous or normal (wild type) cell or tissue. Alterations in the expression of a nucleic acid molecule are associated with a change in expression of the corresponding RNA or protein. The change can result in an increase or decrease of the expression product. In certain embodiments, an increase in expression of the relevant set of genes indicates that the gastric cancer is likely to be G-INT or G-DIF. Controls or standards for comparison to a sample, for the determination of differential expression, include samples believed to be normal, for example, a sample such as gastric tissue from a subject that does not have gastric cancer.
An increase in the expression level of a gene includes any detectable increase in the production of a gene product. In certain examples, production of a gene product (such as those listed in Table 5) increases by at least 1.1-fold, at least 1.2-fold, at least 1.3-fold, at least 1.4-fold, at least 1.5-fold, at least 1.6-fold, at least 1.7-fold, at least 1.8-fold, at least 1.9-fold, at least 2-fold, at least 3-fold, or at least 4-fold, as compared to expression level of the gene in non-cancerous tissue which may be gastric tissue.
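The fold-increase thresholds above can be illustrated with a short calculation. In this sketch the expression levels are assumed to be positive values on a linear scale, the default threshold of 1.5-fold is just one of the listed examples, and the function names are hypothetical.

```python
def fold_change(tumor_level, reference_level):
    """Ratio of tumor expression to non-cancerous reference expression."""
    if reference_level <= 0:
        raise ValueError("reference expression level must be positive")
    return tumor_level / reference_level


def is_increased(tumor_level, reference_level, min_fold=1.5):
    """True when expression is increased by at least min_fold."""
    return fold_change(tumor_level, reference_level) >= min_fold


if __name__ == "__main__":
    # Hypothetical linear-scale expression values.
    print(fold_change(6.0, 2.0))               # 3.0
    print(is_increased(6.0, 2.0, min_fold=4))  # False
    print(is_increased(3.0, 2.0))              # True
```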
As is clear from the description above, an expression level of a gene can be “determined” using any method available in the art. A variety of methods may be used which involve analysis of nucleic acids and proteins. Traditional methods for analysis of nucleic acids and proteins include Northern blots for analyzing RNA and Western blots for analyzing proteins. The newer techniques described hereinbelow are better suited for high throughput analyses of gene expression levels in most cases.
Nucleic acid-based methods may be based on detection and/or characterization of an mRNA product of the genes of interest. Such nucleic acid-based analysis methods include nucleic acid hybridization-based methods and nucleic acid sequencing methods. These methods require isolation of RNA. A number of commercially available kits, such as the RNeasy purification kits (www.qiagen.com), NucleoSpin RNA columns (www.clontech.com), and GeneJet RNA purification kits, are available for this purpose. RNA isolated by such kits can then be used in the methods described herein. In some cases, platform manufacturers will have one or more recommended kits selected for platform compatibility.
Protein-based analyses appropriate for use in the methods described herein include protein affinity detection methods and mass-spectrometry proteomics analysis methods. Processes for purifying proteins for protein-based analyses tend to be more complicated than the processes used to purify RNA and may include a number of chromatographic separation methods, such as size exclusion chromatography, ion exchange chromatography, reversed phase chromatography and affinity chromatography, as well as electrophoretic methods. The uses of these techniques will depend upon the platform used for the subsequent analyses. Furthermore, evaluation of the purified proteins may be needed prior to initiating gene expression analyses. Exemplary methods and techniques for preparing proteins for proteomics analyses can be found, for example, in Purifying Proteins for Proteomics—A Laboratory Manual, 2004, Cold Spring Harbor Press, Richard J. Simpson ed., which is incorporated herein by reference.
In terms of nucleic acid hybridization methods, gene expression analysis is generally performed using a nucleic acid probe for measuring the level of mRNA (or a cDNA corresponding to the mRNA), to which the probe has been engineered to bind, where the probe binds the intended species and provides a distinguishable signal. Exemplary methods for selecting PCR primers and/or hybridization probes are included in Innis et al., eds., 1990, PCR Protocols: A Guide to Methods and Applications, Academic Press Inc., San Diego, Calif.; Froehler et al., 1986, Nucleic Acid Res. 14:5399-5407; McBride et al., 1983, Tetrahedron Lett. 24:246-248; and U.S. Pat. No. 7,013,221, each of which is incorporated by reference. Probes usually have lengths of at least 20 nucleotides to provide requisite specificity for detecting expression, although they may be shorter depending upon other species expected to be found in the sample.
In some embodiments, a set of nucleic acid probes capable of hybridizing to RNA or cDNA allows quantification of the expression level and prediction of the clinical outcome based on this quantification. In some embodiments, the probes are affixed to a solid support, such as a microarray. Microarrays are described in more detail hereinbelow.
In other embodiments, the real time polymerase chain reaction (also known as quantitative PCR (qPCR)) may be used as a hybridization-based method which allows amplified DNA corresponding to the genes of interest to be detected in real time as the amplification reaction progresses. This method requires that the RNA of interest, such as mRNA, first be reverse transcribed to cDNA using reverse transcriptase before amplification begins. Two common methods for detection of products in real-time PCR are: (1) non-specific fluorescent dyes that intercalate with any double-stranded DNA, and (2) sequence-specific DNA probes consisting of oligonucleotides that are labeled with a fluorescent reporter which permits detection only after hybridization of the probe with its complementary DNA target. The physical properties of such dyes and reporters provide the physical characteristics required for quantitation of gene expression in the methods described herein.
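As one hedged illustration of how real-time PCR fluorescence data may be turned into a relative expression value, the widely used comparative threshold-cycle calculation (2 to the power of minus delta-delta-Ct) is sketched below. This particular calculation is not recited in the present description and is offered only as an assumption; it presumes approximately 100% amplification efficiency and the availability of a reference (housekeeping) gene, and all names are hypothetical.

```python
def relative_expression(ct_target_sample: float,
                        ct_reference_sample: float,
                        ct_target_control: float,
                        ct_reference_control: float) -> float:
    """Comparative Ct estimate of target-gene expression in a sample
    relative to a control tissue, normalized to a reference gene.

    Assumes the amount of product doubles every cycle (100% efficiency).
    """
    # Normalize the target gene to the reference gene within each tissue.
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    # A lower Ct means more starting template, hence the negative exponent.
    return 2.0 ** -(delta_sample - delta_control)
```

With hypothetical Ct values of 22 and 18 (target and reference in the sample) and 24 and 18 (target and reference in control tissue), the estimate is a 4-fold higher expression in the sample.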
Another technique which may be used in the methods described herein is comparative genomic hybridization (CGH). In this technique, DNA samples from subject tissue and from normal control tissue are labeled with different tags for later analysis by fluorescence. After mixing subject and reference DNA along with unlabeled human cot-1 DNA (placental DNA that is enriched for repetitive DNA sequences such as the Alu and Kpn family) to suppress repetitive DNA sequences, the mixture is hybridized to normal metaphase chromosomes or, in the case of array- or matrix-based CGH, to a slide containing hundreds or thousands of defined DNA probes. Using epifluorescence microscopy and quantitative image analysis, regional differences in the fluorescence ratio of gains/losses vs. control DNA can be detected and used for identifying abnormal regions in the genome. CGH is described in detail in U.S. Pat. No. 6,335,167, which is incorporated herein by reference in entirety.
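The fluorescence-ratio comparison used in CGH can be sketched, purely for illustration, as a log2 ratio of subject signal to control signal per region. The cutoff values and region names below are hypothetical assumptions and are not taken from the cited patent or from the present description.

```python
import math


def classify_regions(subject_signal: dict, control_signal: dict,
                     gain_cutoff: float = 0.3,
                     loss_cutoff: float = -0.3) -> dict:
    """Label each genomic region as 'gain', 'loss' or 'normal' from the
    log2 ratio of subject fluorescence to control fluorescence.

    The cutoffs are illustrative placeholders, not validated thresholds.
    """
    calls = {}
    for region, subject in subject_signal.items():
        log_ratio = math.log2(subject / control_signal[region])
        if log_ratio >= gain_cutoff:
            calls[region] = "gain"
        elif log_ratio <= loss_cutoff:
            calls[region] = "loss"
        else:
            calls[region] = "normal"
    return calls
```

In practice the cutoffs would be chosen for the particular platform and labeling chemistry; the sketch only shows the shape of the computation.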
High-throughput nucleic acid sequencing, which is also known to those skilled in the art as “next-generation sequencing”, may be used in certain embodiments of the methods described herein. Examples of high throughput sequencing include massively parallel signature sequencing (MPSS) developed by Lynx Therapeutics (Zhou et al., Methods Mol. Biol. 2006; 331: 285-311, incorporated herein by reference in entirety); the SOLiD platform of Applied Biosystems Inc. (www.appliedbiosystems.com); the pyrosequencing platform developed by 454 Life Sciences (now Roche Diagnostics Inc., www.roche.com/diagnostics/); and Solexa sequencing (Illumina Inc., www.illumina.com), among others.
Next-generation sequencing is particularly powerful in context of the methods described herein when combined with a technique known as superSAGE, a variation of SAGE (serial analysis of gene expression) (see for example, Matsumura et al., Proc. Natl. Acad. Sci. USA 100, 26: 15718, incorporated herein by reference in entirety). In the original SAGE method, mRNA is isolated and a portion of the sequence is extracted from a defined position from each mRNA molecule. The portions are then linked into a long chain or concatemer and cloned into a vector for transfection of bacteria to obtain high copy numbers. The concatemers are then sequenced using modern high throughput methods and the data are processed to count the sequence portions.
SuperSAGE uses the type III-endonuclease EcoP15I of phage P1, to cut 26 bp long sequence tags from cDNA corresponding to each mRNA transcript, expanding the tag-size by at least 6 bp relative to the predecessor techniques SAGE and LongSAGE. The longer tag size allows for a more precise allocation of the tag to the corresponding transcript, because each additional base increases the precision of the annotation considerably. By direct sequencing with modern next-generation sequencing techniques, hundreds of thousands or millions of tags can be analyzed simultaneously, producing very precise and quantitative gene expression profiles. Therefore, this method can provide accurate transcription profiles.
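The tag-counting step common to SAGE and SuperSAGE can be illustrated with the following minimal sketch. The tag sequences, the tag-to-gene lookup table and the gene labels are hypothetical placeholders; a real analysis would derive the lookup table from an annotated transcript database.

```python
from collections import Counter

TAG_LENGTH = 26  # SuperSAGE tag length stated in the text


def count_tags(tags, tag_to_gene):
    """Count sequenced tags per gene; tags that are not the expected
    length or are absent from the lookup table are skipped."""
    counts = Counter()
    for tag in tags:
        if len(tag) != TAG_LENGTH:
            continue
        gene = tag_to_gene.get(tag)
        if gene is not None:
            counts[gene] += 1
    return counts


def tags_per_million(counts):
    """Scale raw tag counts so that profiles from libraries of
    different sequencing depth can be compared."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical 26-nucleotide tags mapped to hypothetical gene labels.
tag_x = "ACGT" * 6 + "AC"
tag_y = "TTGCA" * 5 + "T"
lookup = {tag_x: "GENE_X", tag_y: "GENE_Y"}
profile = count_tags([tag_x, tag_x, tag_y], lookup)
```

Because each tag is read individually, the resulting counts are digital, which is what makes the profile quantitative.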
Measurements of proteins for determining protein expression levels can be accomplished by using a specific binding reagent, such as an antibody. One of ordinary skill in the art would recognize that different affinity reagents could be used with the present invention, such as one or more antibodies (e.g., monoclonal or polyclonal antibodies), and the invention can include using techniques such as ELISA for the analysis.
Specific antibodies (e.g., specific to the proteins encoded by the genes of interest) can be used in methods described herein for gene expression analysis. Antibodies and related affinity reagents such as, e.g., antibody fragments, and engineered sequences such as single chain Fvs (scFvs) must specifically bind their intended target, i.e., a protein encoded by a gene included in the molecular signature of interest. Specific binding includes binding primarily or exclusively to an intended target.
Antibodies can be identified and obtained from a variety of sources, such as the MSRS catalog of antibodies (Aerie Corporation, Birmingham, Mich.), or can be prepared via conventional antibody-generation methods. Methods for preparation of polyclonal antisera are taught in, for example, Ausubel, F. M. et al., Current Protocols in Molecular Biology, Volume 2, 1997, pp. 11.12.1-11.12.9 (incorporated by reference). Preparation of monoclonal antibodies is taught for example, in Ausubel, F. M. et al., Current Protocols in Molecular Biology, Volume 2, 1997, pp. 11.4.1-11.11.5 (incorporated by reference in entirety). Preparation of scFvs is taught in, e.g., U.S. Pat. Nos. 5,516,637 and 5,872,215, both of which are incorporated by reference in their entirety.
Antibody arrays can be used in conjunction with the methods described herein. As described by Walter et al., Curr. Opin. Microbiol. 2000, 3: 298-302 (and references contained therein, each of which is incorporated herein by reference in entirety), an attractive method for fabricating antibody arrays involves the use of a micromolded hydrogel stamper and an aminosilylated receiving surface. The stamper deposits protein (e.g. antibody) as a submonolayer, as shown by I125 labelling and atomic force microscopy. This allows antibody activity to be retained. Other approaches described by Walter et al. for preparation of protein microarrays involve using either photolithography of silane monolayers or gold, combining microwells with microsphere sensors, or inkjetting onto polystyrene film. These advances focus on the fabrication of miniaturized immunoassay formats by arraying of single proteins such as monoclonal antibodies.
Also in terms of protein analyses, mass spectrometry-based proteomics methods may be used in the methods described herein. Such methods use matrix-assisted laser desorption/ionization (MALDI) or electrospray ionization (ESI) mass spectrometric characterization of proteins. Adaptations of mass spectrometry-based proteomics methods for gene expression analysis are reviewed, for example, in Pasa-Tolic et al., J. Mass Spectrom. 2002, 37: 1185-1198, which is incorporated herein by reference in entirety.
In one exemplary technique for gene expression profiling, known as APEX (Lu et al., Nature Biotech. 2007, 25: 117), proteins are analyzed using standard shotgun proteomics methods, beginning with tryptic digest of a protein mixture, liquid chromatographic separation of the mixture (2D HPLC), analysis of peptide masses by electrospray ionization mass spectrometry (MS), fragmentation of peptides and subsequent analysis of the fragmentation spectra (MS/MS). The method enables the number of peptides observed per protein to provide an estimate of the abundance of the proteins of interest, thereby quantitating the expression products. Mass spectrometry-based proteomics analysis methods such as APEX can be adapted for gene expression profiling tasks according to the methods described herein without undue experimentation.
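The idea that the number of peptides observed per protein estimates abundance can be sketched as follows. This is a simplified illustration only and is not the published APEX computation; the expected-peptide values are treated as hypothetical inputs, and all names are assumptions made for the example.

```python
def relative_abundance(observed_peptides: dict,
                       expected_peptides: dict,
                       total: float = 1.0) -> dict:
    """Estimate relative protein abundance from peptide counts.

    Each protein's observed peptide count is divided by the number of
    peptides one would expect to observe for that protein, and the
    resulting ratios are normalized so that they sum to `total`.
    """
    ratios = {
        protein: observed_peptides[protein] / expected_peptides[protein]
        for protein in observed_peptides
    }
    norm = sum(ratios.values())
    if norm == 0:
        raise ValueError("no peptides were observed")
    return {protein: total * r / norm for protein, r in ratios.items()}
```

Dividing by an expected count corrects for the fact that larger or more readily ionized proteins yield more detectable peptides at equal abundance.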
As used herein, “biological sample” is intended to encompass a biological specimen containing genomic DNA, RNA (including mRNA), protein, or combinations thereof, obtained from a subject. Examples include, but are not limited to, tissue biopsy, surgical specimen, and autopsy material, or any material from the body which shows the same gene expression profile as gastric tissue. In one example, a sample includes a gastric cancer tissue biopsy.
In a particular embodiment, the gastric tissue biopsy is obtained endoscopically. The gastric tissue biopsy can be processed by a variety of acceptable methods known in the art. For example, the gastric tissue biopsy is placed immediately in RNAlater solution upon obtaining it from a subject. Total RNA is then extracted using any known methods and kits such as the Qiagen RNeasy Mini-kit (Qiagen) according to the instructions of the manufacturer. For the profiling, mRNAs may be hybridized to the probes specific for the sets of relevant genes described herein, preferably on a DNA array, according to techniques described herein as well as those known in the art.
The ability to differentiate between G-INT and G-DIF using the methods of the invention allows for cancer treatment that is directed specifically for treating G-INT or G-DIF by administering a chemotherapeutic agent to the subject in a manner most effective for the treatment of G-INT or G-DIF. In one aspect, once the subject is diagnosed as having intestinal-type gastric cancer, 5-fluorouracil or a fluoropyrimidine, and/or oxaliplatin, or any treatment that is effective for treating G-INT can be administered to the subject. In a further aspect, once the subject is diagnosed as having diffuse-type gastric cancer (G-DIF), cisplatin or any treatment that is effective for treating G-DIF can be administered to the subject.
As used herein, “treating” or “treatment” of gastric cancer is intended to encompass a therapeutic intervention that ameliorates a sign or symptom of a gastric cancer including, but not limited to, indigestion, loss of appetite, abdominal discomfort, abdominal irritation, abdominal pain, weakness, fatigue, bloating of the stomach, usually after meals, nausea, vomiting, diarrhea, constipation, weight loss, bleeding, anemia and dysphagia. Treatment can also induce remission or cure of gastric cancer. In particular examples, treatment includes prevention of gastric cancer, for example by inhibiting the full development or metastasis of a tumor. Prevention of gastric cancer does not require a total absence of disease. For example, a decrease of at least about 10%, at least about 20%, at least about 30%, at least about 40% or at least 50% can be sufficient. As contemplated herein, the treatment of gastric cancer encompasses treatments known in the art.
As used herein, “administration” or “administering” is intended to encompass providing or giving a subject an agent, such as a chemotherapeutic agent, by any effective route, including, but not limited to, injection (such as subcutaneous, intramuscular, intradermal, intraperitoneal, and intravenous), oral, sublingual, rectal, transdermal, intranasal, vaginal and inhalation routes.
As used herein, “chemotherapeutic agent” is intended to encompass any chemical agent with therapeutic usefulness in the treatment of gastric cancer. Examples of chemotherapeutic agents are known in the art (see for example, Slapak and Kufe, Principles of Cancer Therapy, Chapter 86 in Harrison's Principles of Internal Medicine, 14th edition; Perry et al., Chemotherapy, Ch. 17 in Abeloff, Clinical Oncology 2nd ed., 2000 Churchill Livingstone, Inc; Baltzer and Berkery (eds): Oncology Pocket Guide to Chemotherapy, 2nd ed. St. Louis, Mosby-Year Book, 1995; Fischer, Knobf, and Durivage (eds): The Cancer Chemotherapy Handbook, 4th ed. St. Louis, Mosby-Year Book, 1993). Exemplary chemotherapeutic agents used for treating gastric cancer include carboplatin, cisplatin, paclitaxel, docetaxel, doxorubicin, epirubicin, topotecan, irinotecan, gemcitabine, tiazofurine, etoposide, vinorelbine, tamoxifen, valspodar, cyclophosphamide, methotrexate, 5-fluorouracil or an oral fluoropyrimidine, oxaliplatin, and mitoxantrone. Combination chemotherapy is the administration of more than one chemotherapeutic agent to treat cancer. In one embodiment, the chemotherapeutic agent is 5-fluorouracil or a fluoropyrimidine, and/or oxaliplatin.
As used herein, “fluoropyrimidine” is intended to encompass oral fluoropyrimidines including capecitabine, tegafur/ftorafur, S-1, UFT (uracil/ftorafur, an oral agent which combines uracil, a competitive inhibitor of DPD, with the 5-FU prodrug tegafur) or UFT plus oral leucovorin or with folinic acid. S-1 is an orally active combination of tegafur, which is a prodrug that is converted by cells to fluorouracil; gimeracil, which is an inhibitor of dihydropyrimidine dehydrogenase (DPD), the enzyme that degrades fluorouracil; and oteracil, which inhibits the phosphorylation of fluorouracil in the gastrointestinal tract, thereby reducing the gastrointestinal toxic effects of fluorouracil. An alternative S-1 combination is S-1 (BMS 247616) which is composed of tegafur plus two modulators: a DPD inhibitor (5-chloro-2,4-dihydroxypyridine [CDHP]), and oxonic acid, an inhibitor of phosphoribosyl pyrophosphate transferase (an enzyme located in the gastrointestinal tract; its inhibition decreases 5-FU incorporation into cellular RNA).
The chemotherapeutic agents 5-fluorouracil, oral fluoropyrimidines and/or oxaliplatin are preferred for treating intestinal-type gastric cancer. In another embodiment, the chemotherapeutic agent is cisplatin. The chemotherapeutic agent cisplatin is preferred for treating diffuse-type gastric cancer.
Methods for diagnosis of gastric cancer may involve the use of arrays. Both DNA arrays and protein arrays are contemplated.
In one aspect, the array comprises polynucleotides that hybridize to a subset of the genes listed in Table 5. G-INT involves the subset of 92 gene(s) listed in Table 5 (Group A, defined above). G-DIF involves the 79 gene(s) (Group B, defined above).
It is contemplated that the number of genes being probed on the array may be a few from the relevant set, or any number up to all of the genes identified in the relevant set. Specifically, it is contemplated, based on the analysis set forth in the Examples, that the group of 29 genes of Group A1 as defined above, would be sufficient in an array for the diagnosis or prognosis of G-INT. Inclusion of at least one additional gene on the array from the remainder of Group A should improve accuracy. It is contemplated that the array can include probes specific for at least 10, at least 20, at least 30, at least 40, at least 50, or all 63 remaining genes of Group A.
For example, the array may additionally include probes for at least one of or any combination of the following genes from Group A:
EHF, FOXA3, POF1B, LOC286208 and C9orf152;
GMDS, SLC22A18AS, C11orf9, LOC100131701 and TMPRSS4;
SLC37A1, PTK6, CEACAM5, SULT2B1 and LOC120376; and/or
With respect to G-DIF, it is contemplated, based on the analysis set forth in the Examples, that the group of 17 genes of Group B1 as defined above would be sufficient in an array. Inclusion of at least one additional gene on the array from the remainder of Group B should improve accuracy. It is contemplated that the array can include probes specific for at least 1, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, or all 62 remaining genes of Group B.
For example, the array may additionally include probes for at least one of or any combination of the following genes from Group B:
S1PR3, TUBA1A, LOC644450, PTPN1 and HOMER3; and/or
For further accuracy and precision of gastric cancer prognosis, it is contemplated that the array would include both subsets of genes above which are sufficient indicators of G-INT and G-DIF. For example, the array can include oligonucleotides for about 44 genes of the 171 genes, based on the results of the analysis in the Examples, to all 46 genes of Group A1 and Group B1.
The specific arrays of the invention relate to the sets of genes associated with gastric cancer and are not intended to encompass commercially available microarrays such as an Affymetrix Human Genome U133 plus 2.0 Genechip or an Illumina Human-6 v2 Expression Beadchip, although the general construction of the array may be similar. Accordingly, one aspect of the invention involves determining the level of expression of no more than the sets of genes associated with G-INT or G-DIF, as disclosed herein; that is, it is contemplated that the arrays of the invention include probes for no other genes than the Groups A1, A2, B1 and B2 genes.
DNA microarray technology is known in the art and generally involves an arrayed series of DNA oligonucleotides (probes or reporters) used to hybridize a cDNA or cRNA sample (target) under high-stringency conditions. In a standard microarray, the probes are attached via surface engineering to a solid surface by a covalent bond to a chemical matrix (via epoxy-silane, amino-silane, lysine, polyacrylamide or others). The solid surface can be glass or a silicon chip.
As used herein, “array” is intended to encompass an arrangement of molecules, such as biological macromolecules (such as peptides or nucleic acid molecules) or biological samples (such as tissue sections), in addressable locations on or in a substrate. Arrays are also known as DNA chips or biochips. A “microarray” is an array that is miniaturized so as to require or be aided by microscopic examination for evaluation or analysis.
The array of molecules makes it possible to carry out a very large number of analyses on a sample at one time. In certain exemplary arrays, one or more molecules (such as an oligonucleotide probe) will occur on the array a plurality of times (such as twice), for instance to provide internal controls. In particular examples, an array includes nucleic acid molecules, such as oligonucleotide sequences that are at least 15 nucleotides in length, such as about 15-40 nucleotides in length. In particular examples, an array includes oligonucleotide probes or primers which can be used to detect expression of gastric-cancer-associated molecule sequences, such as at least one of those of the sequences listed in Table 5, such as at least 17, at least 29, at least 46, at least 50, at least 60, at least 75, at least 80, at least 90, at least 100, at least 150, or at least 171 sequences listed in Table 5 (for example, oligonucleotides for the 17 genes of Group B1, or for the 29 genes of Group A1, and optionally 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 44, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 120, 130, 140, 150 or 171 of the remaining genes listed in Groups A and B). These are referred to collectively as oligonucleotide probes that are specific for the gastric cancer-associated genes.
Within an array, each arrayed sample is addressable, in that its location can be reliably and consistently determined within at least two dimensions of the array. The feature application location on an array can assume different shapes. For example, the array can be regular (such as arranged in uniform rows and columns) or irregular. Thus, in ordered arrays the location of each sample is assigned to the sample at the time when it is applied to the array, and a key may be provided in order to correlate each location with the appropriate target or feature position. Often, ordered arrays are arranged in a symmetrical grid pattern, but samples could be arranged in other patterns (such as in radially distributed lines, spiral lines, or ordered clusters). Addressable arrays usually are computer readable, in that a computer can be programmed to correlate a particular address on the array with information about the sample at that position (such as hybridization or binding data, including for instance signal intensity). In some examples of computer readable formats, the individual features in the array are arranged regularly, for instance in a Cartesian grid pattern, which can be correlated to address information by a computer.
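A computer-readable, addressable array of the kind described above can be sketched as a key that maps each (row, column) address to a feature, which is then joined with the measured signal at that address. The class and function names are hypothetical, and the probe names are used only as labels borrowed from the gene lists elsewhere in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    probe: str      # identity of the probe deposited at the address
    signal: float   # measured hybridization or binding intensity


def build_grid_key(probes, n_columns):
    """Assign each probe a (row, column) address in row-major order,
    i.e. a regular Cartesian grid."""
    return {divmod(i, n_columns): name for i, name in enumerate(probes)}


def read_array(key, intensities):
    """Correlate every address in the key with the signal measured there."""
    return {addr: Feature(key[addr], intensities[addr]) for addr in key}


key = build_grid_key(["EHF", "FOXA3", "POF1B", "S1PR3"], n_columns=2)
```

The same key structure would serve an irregular layout equally well, since only the mapping from address to feature matters, not the geometry.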
Protein-based arrays include probe molecules that are or include proteins, or where the target molecules are or include proteins, and arrays including nucleic acids to which proteins are bound, or vice versa. In some examples, an array contains antibodies to gastric-cancer-associated proteins, such as any combination of proteins encoded by the sequences listed in Table 5, such as at least 17, at least 29, at least 46, at least 50, at least 60, at least 75, at least 80, at least 90, at least 100, at least 150, or at least 171 sequences listed in Table 5 (for example, protein probes for the 17 genes of Group B1, or for the 29 genes of Group A1, and optionally 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 44, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 120, 130, 140, 150 or 171 of the proteins encoded by the remaining genes listed in Groups A and B).
As used herein, “polynucleotide” and “oligonucleotide” refer to nucleic acid molecules representing genes, for example DNA (intron or exon or both), cDNA, or RNA (such as mRNA), of any length suitable for use in detection, as a probe or other indicator molecule, and that is informative about the corresponding gene, such as those listed in Table 5. Nucleic acid molecule means a deoxyribonucleotide or ribonucleotide polymer including, without limitation, cDNA, mRNA, genomic DNA, and synthetic (such as chemically synthesized) DNA. The nucleic acid molecule can be double-stranded or single-stranded. Where single-stranded, the nucleic acid molecule can be the sense strand or the antisense strand. In addition, a nucleic acid molecule can be circular or linear. Polynucleotide includes nucleic acid molecule analogs that function similarly to polynucleotides but which have non-naturally occurring portions. For example, polynucleotide analogs can contain non-naturally occurring portions, such as altered sugar moieties or inter-sugar linkages, such as a phosphorothioate oligodeoxynucleotide.
Particular polynucleotides can include linear sequences up to about 200 nucleotides in length, for example a sequence (such as DNA or RNA) that is at least 6 nucleotides, for example at least 8, at least 10, at least 15, at least 20, at least 21, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 100 or even at least 200 nucleotides long, or from about 6 to about 50 nucleotides, for example about 10-25 nucleotides, such as 12, 15 or 20 nucleotides. In one example, a polynucleotide is a short sequence of nucleotides of at least one of the disclosed gastric-cancer-associated molecules listed in Table 5.
As used herein, “hybridizes to” or “hybridization” is intended to encompass formation of base pairs between complementary regions of two strands of DNA, RNA, or between DNA and RNA, thereby forming a duplex molecule. Hybridization conditions resulting in particular degrees of stringency will vary depending upon the nature of the hybridization method and the composition and length of the hybridizing nucleic acid sequences. Generally, the temperature of hybridization and the ionic strength (such as the Na+ concentration) of the hybridization buffer will determine the stringency of hybridization. Calculations regarding hybridization conditions for attaining particular degrees of stringency are discussed in Sambrook et al., (1989) Molecular Cloning, second edition, Cold Spring Harbor Laboratory, Plainview, N.Y. (chapters 9 and 11). It is intended that oligonucleotide probes hybridize under sufficiently stringent conditions such that the probes are specific for the expression products of the gastric cancer-associated genes.
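As a rough, non-limiting illustration of how probe composition bears on hybridization temperature, the well-known Wallace rule of thumb for short oligonucleotides may be sketched as below. This rule is an assumption introduced for the example and is not the calculation set out in Sambrook et al.; it ignores ionic strength and is only a coarse estimate for probes of roughly 14 to 20 nucleotides.

```python
def wallace_tm(probe: str) -> int:
    """Approximate melting temperature in degrees Celsius of a short
    DNA oligonucleotide: 2 degrees per A/T plus 4 degrees per G/C."""
    seq = probe.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected bases: {sorted(unexpected)}")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count
```

Hybridizing closer to the estimated melting temperature corresponds to higher stringency, so that only well-matched duplexes remain stable.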
The sequences of the genes listed in Table 5 are available in the art and may be obtained from publicly-accessible databases, such as the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/, National Center for Biotechnology Information, National Library of Medicine, Building 38A, Bethesda, Md. 20894), and the European Molecular Biology Laboratory (EMBL) (www.ebi.ac.uk/embl/, EMBL Nucleotide Sequence Submissions, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK).
The invention is further illustrated by the following non-limiting examples.
GC cell lines were obtained either from commercial sources or collaborators and cultured as recommended. AGS, KatoIII, SNU1, SNU5, SNU16, SNU719, NCI-N87, and Hs746T were obtained from the American Type Culture Collection (http://www.atcc.org/) and cultured as recommended by the supplier. AZ521, Fu97, IM95, Ist1, MKN1, MKN45, MKN7, NUGC3, NUGC4, OCUM1, RerfGC1B, Takigawa, and TMK1 cells were obtained from the Japanese Collection of Research Bioresources/Japan Health Science Research Resource Bank (http://cellbank.nibio.go.jp/) and cultured as recommended. SCH cells were a gift from Yoshiaki Ito (Institute of Molecular and Cell Biology, Singapore) and grown in RPMI media. YCC1, YCC2, YCC3, YCC6, YCC7, YCC9, YCC10, YCC11, YCC16, YCC17, YCC18, YCC19, and YCC20 cells were a gift from Sun-Young Rha (Yonsei Cancer Center, South Korea) and were grown in MEM supplemented with 10% fetal bovine serum (FBS), 100 units/mL penicillin, 100 units/mL streptomycin, and 2 mmol/L L-glutamine (Invitrogen). CLS145 and HGC27 were obtained from the RIKEN Gene Bank (http://www.brc.riken.go.jp/) and cultured as recommended by the supplier.
Four independent patient cohorts were analyzed (n=521). Cohort 1 (SG): 200 patients, National Cancer Centre Singapore, Singapore; Cohort 2 (AU): 70 patients, Peter MacCallum Cancer Centre, Australia; Cohort 3 (YG): 65 patients, Yonsei University, South Korea; and Cohort 4 (TMA): 186 patients, National Healthcare Group, Singapore. Cohorts 1-3 (SG/AU/YG) comprise gene expression profiles of primary GCs, while cohort 4 (TMA) comprises tumor sections on a tissue microarray. From the participating centres' tissue repositories or pathology archives, all available primary gastric tumors were collected with approvals from the respective institutional Research Ethics Review Committees and with signed patient informed consent. There was no pre-specified sample size calculation since this is a hypothesis generating discovery study. Clinical information was collected with Institutional Review Board approval and in accordance with REMARK guidelines (McShane L. M. et al., J Natl Cancer Inst, 2005, 97:1180-4). The clinical characteristics of the four cohorts are presented in Table 1. Clinical information was available for all patients except 3 patients in the SG cohort.
For gastric cancer cell lines and patient cohorts 1 and 2, gene expression profiling was performed with Affymetrix Human Genome U133 Plus 2.0 Genechips (HG-U133 Plus 2.0, Affymetrix): total RNA was extracted using Qiagen RNA extraction reagents (Qiagen) and hybridized to the Genechips. For patient cohort 3, Illumina Human-6 v2 Expression Beadchips were employed: total RNA was extracted from the fresh frozen tissues using a mirVana™ RNA Isolation labeling kit (Ambion, Inc.) and hybridized to the Beadchips. Raw Affymetrix datasets are available from the Gene Expression Omnibus (GEO) database (GSE15460). Primary microarray data are available in the GEO database (GSE15460 and GSE13861).
Cell proliferation assays were performed using a tetrazolium compound-based colorimetric method (MTS kit, Promega, Madison, Wis., USA) according to the manufacturer's instructions and measured using an EnVision 2104 multilabel plate reader (Perkin Elmer, Finland) at 490 nm. Adherent or semi-adherent cell lines with doubling times less than 48 hours were used in this analysis. The cell lines for which cell proliferation assays were performed are: YCC19, YCC18, TMK1, YCC2, CLS145, YCC9, YCC6, NUGC3, HGC-27, Fu97, Ist1, YCC7, YCC16, Hs746T, MKN45, KatoIII, AGS, SNU719, AZ521, YCC1, MKN1, YCC11, IM95, MKN7, YCC3, YCC10, SCH and N87. Inhibition of cell growth by drugs was also visually confirmed under microscopy. Drugs used include cisplatin (Sigma, 479306-1G), oxaliplatin (Sigma, O9512), and 5-fluorouracil (Sigma, F6627-1G).
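One conventional way of reducing 490 nm absorbance readings from a tetrazolium-based assay to a growth measure is sketched below for illustration. The blank-subtraction scheme, the function names and the example readings are assumptions and are not taken from the manufacturer's instructions or from the experiments reported herein.

```python
def percent_viability(a490_treated: float,
                      a490_untreated: float,
                      a490_blank: float) -> float:
    """Viability of drug-treated wells as a percentage of untreated
    wells, after subtracting the medium-only blank from both."""
    untreated_signal = a490_untreated - a490_blank
    if untreated_signal <= 0:
        raise ValueError("untreated signal must exceed the blank")
    return 100.0 * (a490_treated - a490_blank) / untreated_signal


def percent_inhibition(a490_treated: float,
                       a490_untreated: float,
                       a490_blank: float) -> float:
    """Growth inhibition is the complement of percent viability."""
    return 100.0 - percent_viability(a490_treated, a490_untreated, a490_blank)
```

For example, hypothetical readings of 0.6 (treated), 1.1 (untreated) and 0.1 (blank) correspond to about 50% viability, that is, about 50% growth inhibition.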
Samples from cohort 1 were subjected to central pathologic review by two independent pathologists (LKH, WWK) blinded to the genomic classification. Immunohistochemical studies using LGALS4 and CDH17 antibodies were performed on a tissue microarray of 186 GC patients (cohort 4), and staining intensities determined by a pathologist blinded to the clinical data (MST). Photomicrographs, details of staining patterns and grading scales are provided below.
Bioinformatic analyses were performed using R. Raw Affymetrix datasets were preprocessed with quantile normalization using RMA (package Affy). Gastric cancer cell line data were filtered using the nsFilter function from the Genefilter package on Bioconductor (Irizarry R. A. et al., Stat Appl Genet Mol Biol, 2003, 2:Article1, hereby incorporated by reference). The R package LIMMA was used for feature selection. Enrichment analysis of functional annotations in the gene expression data was performed using EASE software (http://apps1.niaid.nih.gov/david/; Hosack D. A. et al., Genome Biol, 2003, 4:R70, hereby incorporated by reference). Statistical significance was determined using the Fisher exact score and EASE score. For patient cohorts, preprocessing of cohorts 1 and 2 (Affymetrix) was performed with Refplus, while preprocessing of cohort 3 (Illumina) was performed with quantile normalization, with the average signal intensity used for summarization. Nearest Template Prediction (Hoshida Y. et al., N Engl J Med, 2008, 359:1995-2004; Reiner A. et al., Bioinformatics, 2003, 19:368-75; Hoshida Y., PLoS One, 2010, 5:e15543, all of which are hereby incorporated by reference) was performed using Genepattern (Reich M. et al., Nat Genet, 2006, 38:500-1, hereby incorporated by reference). The R package e1071 was used for support vector machine (SVM) learning and classification. Correlation with clinico-pathologic parameters and survival analysis were performed using SPSS software (version 16, Chicago). Survival curves were estimated using the Kaplan-Meier method, and the duration of survival was measured from the date of surgery to the date of death or last follow-up visit. Cancer-specific survival (CSS) was used as the outcome metric, with death due to cancer regarded as an event. Patients who were still alive, had died from other causes or were lost to follow-up at the time of analysis were censored at their last date of follow-up.
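The Kaplan-Meier estimate with censoring described above can be illustrated with a minimal Python sketch. The survival analysis in this study was carried out in SPSS; the function name `kaplan_meier` and the follow-up data below are hypothetical and serve only to show how censored patients leave the risk set without counting as events.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up duration measured from surgery (any consistent unit)
    events : 1 if the patient died of cancer, 0 if censored
             (alive, died of other causes, or lost to follow-up)
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this time
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up times in months; 0 marks a censored patient.
times = [5, 8, 8, 12, 20, 24]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
```

Censored patients contribute to the number at risk up to their last follow-up date, so they lower later step sizes without producing a step themselves.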
Univariable and multivariable survival analyses were performed using the Cox proportional hazards regression model (Cox D. R., J Royal Stat Soc B, 1972, 34:182-220; Simon R., Br J Clin Pharmacol, 1982, 14:473-82, each of which is hereby incorporated by reference). The test of interaction between the genomic subtypes and therapy was performed with the null hypothesis of treatment equivalence within the subtypes and the alternative hypothesis of differential treatment efficacy between the subtypes. Two-sided p-values less than 0.05 were considered statistically significant. Further details of bioinformatics and statistical analysis are provided below.
The Silhouette technique (Rousseeuw P. J., J Comput Appl Math, 1987, 20:53-65, hereby incorporated by reference) was used to evaluate the validity of clustering. To construct the silhouettes S(i) the following formula was used: S(i)=(b(i)−a(i))/max{a(i),b(i)}, where a(i) is the average dissimilarity of object i to all other objects in the same cluster, and b(i) is the minimum, taken over all other clusters, of the average dissimilarity of object i to all objects in that cluster (that is, the average dissimilarity to the closest other cluster). Silhouette values above 0 indicate that the sample is assigned to the appropriate cluster.
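The silhouette formula above can be written out as a short Python sketch. The function names, the Euclidean dissimilarity and the toy points are assumptions made for illustration; they are not the implementation used in the study.

```python
import math


def euclidean(x, y):
    """Euclidean dissimilarity between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def silhouette_values(points, labels, dist=euclidean):
    """Return S(i) = (b(i) - a(i)) / max{a(i), b(i)} for every object.

    Requires at least two clusters. Objects in singleton clusters get 0,
    a common convention.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            values.append(0.0)
            continue
        # a(i): average dissimilarity to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b(i): smallest average dissimilarity to any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Hypothetical two-dimensional "expression profiles" in two clusters.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
labels = ["A", "A", "B", "B"]
s = silhouette_values(points, labels)
```

For well-separated clusters such as these, every S(i) is positive, which corresponds to the "appropriate cluster" criterion stated above.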
Unsupervised clustering techniques revealed naturally emergent patterns of at least 2 major subtypes within the 37 gastric cancer cell lines (GCCLs). nsFilter was employed as an initial filter. Briefly, nsFilter removes control probe sets and probe sets without an Entrez Gene ID annotation. A duplicate filter was also used to select the probe set with the largest variance, under conditions where multiple probe sets map to the same Gene ID. Genes were then filtered on variance alone, removing genes with an interquartile range less than the median interquartile range. 10135 genes passed this filter. Hierarchical clustering was performed using Euclidean distance and a complete linkage metric. Using the 2 major subtypes as class labels, LIMMA analysis was performed to identify genes exhibiting differential regulation between the phenotypes. All signatures were corrected for multiple comparisons by the Benjamini and Hochberg method at a q-value threshold of 0.002. The 171 genes passing this threshold constitute the gastric cell line derived signature associated with the biological subtype distinction.
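Two of the steps above, the interquartile-range filter and the Benjamini and Hochberg correction, are simple enough to sketch in plain Python. This is an illustration only: the study used nsFilter and LIMMA in R, the quartile convention chosen here (`method="inclusive"`) is an assumption, and the gene names, expression values and p-values are hypothetical.

```python
import statistics


def iqr(values):
    """Interquartile range of one gene's expression values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_by_iqr(expr):
    """Drop genes whose IQR is below the median IQR across all genes."""
    spreads = {gene: iqr(vals) for gene, vals in expr.items()}
    cutoff = statistics.median(spreads.values())
    return {g: v for g, v in expr.items() if spreads[g] >= cutoff}


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank among ascending p-values
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical expression values for three genes across four cell lines.
expr = {"g1": [1, 1, 1, 1], "g2": [1, 2, 3, 4], "g3": [0, 5, 10, 15]}
kept = filter_by_iqr(expr)

# Hypothetical per-gene p-values from a differential expression test.
q = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.60])
selected = [i for i, qi in enumerate(q) if qi < 0.002]
```

With these toy p-values no gene passes the q<0.002 threshold, which illustrates how stringent that cutoff is relative to the more common 0.05.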
Prediction analysis was performed by evaluating the expression status of the signature using the nearest template prediction (NTP) method as implemented in the NearestTemplatePrediction module of the GenePattern analysis toolkit. Briefly, a hypothetical sample serving as the template of G-INT outcome was defined as a vector having the same length as the G-INT signature. In this template, a value of 1 was assigned to G-INT-correlated genes and a value of −1 was assigned to G-DIF-correlated genes, and then each gene was weighted by the absolute value of the corresponding t score from the LIMMA algorithm. The template of G-DIF outcome was similarly defined. For each sample, a prediction was made based on the proximity measured by the cosine distance to either of the two templates. Significance for the proximity was estimated by comparison to a null distribution generated by randomly picking (1,000 times) the same number of marker genes from the microarray data for each sample, and correcting for multiple hypothesis testing.
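The template construction and cosine-distance assignment described above can be sketched as follows. This is not the GenePattern NearestTemplatePrediction module; it is a minimal illustration that assumes the G-DIF template is the mirror image of the G-INT template, uses a simple resampling p-value, and omits the correction for multiple hypothesis testing across samples. CDH17 and LGALS4 are named in this document as G-INT markers; the other gene names, all t scores and all expression values are hypothetical.

```python
import math
import random


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def build_templates(signature):
    """signature maps gene -> (subtype, t_score). Returns gene order and templates."""
    genes = sorted(signature)
    # +1 for G-INT-correlated genes, -1 for G-DIF-correlated genes, weighted by |t|
    g_int = [
        (1.0 if signature[g][0] == "G-INT" else -1.0) * abs(signature[g][1])
        for g in genes
    ]
    g_dif = [-w for w in g_int]  # assumption: mirror-image template
    return genes, {"G-INT": g_int, "G-DIF": g_dif}


def predict(sample, signature, n_perm=1000, seed=0):
    """sample maps gene -> standardised expression for one tumor."""
    genes, templates = build_templates(signature)
    observed = [sample[g] for g in genes]
    distances = {k: cosine_distance(observed, t) for k, t in templates.items()}
    label = min(distances, key=distances.get)

    # Null distribution: random gene sets of the same size from this sample
    rng = random.Random(seed)
    pool = list(sample.values())
    null = [
        cosine_distance(rng.sample(pool, len(genes)), templates[label])
        for _ in range(n_perm)
    ]
    p_value = (1 + sum(d <= distances[label] for d in null)) / (1 + n_perm)
    return label, distances[label], p_value


signature = {
    "CDH17": ("G-INT", 6.2),
    "LGALS4": ("G-INT", 5.1),
    "DIF_GENE_1": ("G-DIF", 4.8),  # hypothetical placeholder gene
    "DIF_GENE_2": ("G-DIF", 4.0),  # hypothetical placeholder gene
}
sample = {
    "CDH17": 1.8, "LGALS4": 1.2, "DIF_GENE_1": -1.1, "DIF_GENE_2": -0.9,
    "OTHER_1": 0.3, "OTHER_2": -0.4, "OTHER_3": 0.1, "OTHER_4": -0.2,
}
label, distance, p_value = predict(sample, signature)
```

Because the prediction depends only on the sample's own expression vector and the fixed templates, the method can be applied to a single tumor without a training cohort.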
A classifier was developed in the training gastric cancer cell line dataset based upon class labels generated by unsupervised hierarchical clustering of gastric cancer cell lines. A Support-Vector Machine (SVM) classification algorithm with a Radial-Basis Function (RBF) Kernel and eps-regression option was used, as provided by the R package e1071. After cross-validation, the trained classifier was then applied to the target primary tumor datasets. Each tumor profile was then ascribed a predicted class label based on its classification score (scaled SVM score), reflecting the similarity of that sample to either the G-INT or the G-DIF subclass.
Concordance between the 2 classification systems was 91-94% for the training dataset (GC cell lines) as well as in primary tumors (SG and AU cohorts). 86% of samples were identified by NTP at an FDR of <0.05. These results show that the 171 gene set can robustly classify primary tumors into G-INT and G-DIF sub-classes.
A total of 186 gastric cancer cases that were surgically resected at the National University of Singapore between year 2000 and 2008 were included in the construction of the tissue microarray (TMA). The TMA blocks were constructed as described previously (Zhang D. et al., Mod Pathol, 2003, 16:79-84; Ong C. W. et al., Mod Pathol, 2010, 23:450-7, each of which is hereby incorporated by reference). Briefly, a needle with 0.6 mm diameter was used to punch a donor core from morphologically representative areas of a donor tissue block. The core was subsequently inserted into a recipient paraffin block using an ATA-100 tissue arrayer (Chemicon, USA). For each case, one core was taken from the centre of the tumor growth and a separate core from the matched histologically normal gastric epithelium. Consecutive TMA sections of 4 μm thickness were cut and placed on slides for immunohistochemical analyses.
All protein markers were assessed immunohistochemically using commercially available antibodies (see table below). Antigen retrieval was carried out with 10 mM citrate buffer (pH 6.0) in a MicroMED TT Microwave Processor (Milestone, Sorisole, Italy) for 5 minutes at 120° C. Slides were then incubated with the primary antibody for 12 hours at the dilutions indicated in the table below. Immunostaining was performed with the streptavidin-biotin kit (LSAB2, Dako, Norway) in accordance with the manufacturer's specifications and the slides were then counterstained with hematoxylin. Various human tissues or cell lines embedded in paraffin with known expression for the markers were used as positive controls. Paraffin-embedded colorectal cancer tissue specimens were used as positive control for CDH17 (Su M. C. et al., Mod Pathol, 2008, 21:1379-86, hereby incorporated by reference). For LGALS4, normal colonic epithelial tissues were used as positive controls (Huflejt M. E. et al., Glycoconj J, 2004, 20:247-55, hereby incorporated by reference). Negative controls consisted of the omission of primary antibody without any other changes to subsequent procedures.
Dark brown membranous staining was defined as positive for CDH17. Positivity of LGALS4 was defined as staining in the cytoplasmic compartment. The staining was scored as follows: 0 (no detectable staining); 1+ (<25% positive cells), 2+ (25-49%) and 3+ (>50%). The primary evaluation of the staining was independently performed by a trained scientist (CWO) and confirmed by a gastrointestinal pathologist (MST).
The test of interaction between the intrinsic genomic subtypes and therapy was performed with the null hypothesis of treatment equivalence within the subtypes, and the alternative hypothesis of differential treatment efficacy between the subtypes (Cox D. R., J Royal Stat Soc B, 1972, 34:182-220; Simon R., Br J Clin Pharmacol, 1982, 14:473-82, each of which is hereby incorporated by reference). For the test of interaction (null hypothesis=NO interaction between therapy and genomic subtypes; alternative hypothesis=interaction between therapy and genomic subtypes), the model takes the form:
$\lambda_{gt}(\tau) = f(\tau)\exp(a_g + b_t + c_{gt})$;
with the hypotheses defined as:
$H_0$: $c_{g=1,t=1} = c_{g=1,t=2} = c_{g=2,t=1} = c_{g=2,t=2} = 0$ and
$H_A$: At least 1 interaction term is not zero ($c_{g=i,t=j} \neq 0$)
If the null hypothesis is rejected, subset effects will be investigated and the model above will be abandoned. The subset HR will be calculated based on 4 different models. Taking g=1 to define Subtype 1, g=2 to define Subtype 2, t=1 to define Adjuvant 5-FU based treatment and t=2 to define Surgery alone, the 4 models are as follows:
1. $\lambda_{gt}(\tau) = f(\tau)\exp(a_g)$; Analysis done only on subset: patients on Adjuvant 5-FU based treatment
2. $\lambda_{gt}(\tau) = f(\tau)\exp(a_g)$; Analysis done only on subset: patients on Surgery alone
3. $\lambda_{gt}(\tau) = f(\tau)\exp(b_t)$; Analysis done only on subset: patients with Genomic Subtype 1
4. $\lambda_{gt}(\tau) = f(\tau)\exp(b_t)$; Analysis done only on subset: patients with Genomic Subtype 2
Models 1 and 2 have the same form and differ only in the patients analyzed, which form two mutually exclusive groups. The same applies to Models 3 and 4. An example is provided in Table 4.
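As a worked illustration of how a subset hazard ratio follows from these models, the baseline hazard cancels when two levels of the single remaining factor are compared. The notation below reuses the symbols of the models above; the subscripts 1 and 2 on a and b denote the two subtype levels and the two treatment levels respectively, and the choice of which level is the reference is an assumption made for illustration.

```latex
% Models 1 and 2: analysis within one treatment subset, subtype effect a_g
\[
\mathrm{HR}_{g=2 \text{ vs } g=1}
  = \frac{f(\tau)\exp(a_2)}{f(\tau)\exp(a_1)}
  = \exp(a_2 - a_1)
\]

% Models 3 and 4: analysis within one genomic subtype, treatment effect b_t
\[
\mathrm{HR}_{t=2 \text{ vs } t=1}
  = \frac{f(\tau)\exp(b_2)}{f(\tau)\exp(b_1)}
  = \exp(b_2 - b_1)
\]
```

Under this reading, Models 3 and 4 yield one hazard ratio for surgery alone versus adjuvant 5-FU based treatment within each genomic subtype, and a difference between those two hazard ratios is what a non-zero interaction term in the full model reflects.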
Gene expression profiling was performed for a panel of 37 GC cell lines. Analysis of the expression data using four different unsupervised and unbiased clustering techniques (hierarchical clustering (Eisen M. B. et al., Proc Natl Acad Sci USA, 1998, 95:14863-8, hereby incorporated by reference), silhouette plot (SP) analysis (Rousseeuw P. J., J Comput Appl Math, 1987, 20:53-65, hereby incorporated by reference), nonnegative matrix factorization (NMF) (Lee D. D. et al., Nature, 1999, 401:788-91, hereby incorporated by reference), and principal components analysis (PCA)) was performed to identify pervasive and thereby “intrinsic” gene expression differences across the cell lines. Two major intrinsic subtypes were identified by hierarchical clustering.
LIMMA (Linear models for microarray data) (Smyth G. K., Stat Applications Gen Mol Biol, 2004, 3:Article 3, hereby incorporated by reference), a modified t-test incorporating the Benjamini Hochberg multiple correction technique (Benjamini Y. et al., Behav Brain Res, 2001, 125:279-84, hereby incorporated by reference), was used to analyze gene expression differences between the intrinsic subtypes. A genomic signature of 171 genes was identified, distinguishing the G-INT and G-DIF intrinsic subtypes (FDR<0.002).
The intrinsic 171-gene genomic signature was mapped onto primary tumors in two independent cohorts of GC patients (SG and AU), collectively totaling 270 patients. Two classification algorithms were used (Nearest Template Prediction and a support vector machine classifier). Concordance between the 2 classification systems (SVM and NTP) was 94-96% in the SG and AU cohorts with 88% of samples identified by NTP at an FDR of <0.05. These results show the 171 gene set can robustly classify primary tumors into G-INT and G-DIF sub-classes. Due to its methodological simplicity and applicability to single samples without requiring a corresponding training dataset [30], the NTP classifications were used for subsequent analyses. Specifically, 114 samples in the SG cohort and 38 samples in the AU cohort were classified as G-INT.
The associations of the intrinsic subtypes with clinical-pathologic parameters were investigated. The intrinsic subtypes were found to be significantly associated with Lauren's intestinal and diffuse subtypes respectively in the SG (p=0.002) and AU cohorts (p=0.003), hence their names (G-INT and G-DIF). Besides Lauren's classification, the intrinsic subtypes were also related to tumor grade (Table 7).
Although the intrinsic subtypes are named G-INT and G-DIF due to their associations with Lauren's histopathology, the overall concordance between the intrinsic genomic subtypes and Lauren's histopathology was only 64%. Thus, the two classifications should more appropriately be regarded as related but distinct. Specifically, 91 of 134 Lauren's intestinal cases were classified as G-INT, and 64 of 106 Lauren's diffuse cases were classified as G-DIF.
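The overall concordance can be checked directly from the two counts quoted above. The helper name `overall_concordance` is hypothetical; the numbers are those stated in the text.

```python
def overall_concordance(agreements):
    """agreements: list of (concordant_cases, total_cases), one per histological class."""
    concordant = sum(c for c, _ in agreements)
    total = sum(n for _, n in agreements)
    return concordant / total


# 91 of 134 Lauren's intestinal cases were G-INT; 64 of 106 diffuse cases were G-DIF.
rate = overall_concordance([(91, 134), (64, 106)])
print(f"{rate:.1%}")  # prints 64.6%
```

That is, 155 of 240 cases agree, in line with the roughly 64% concordance stated above and leaving about a third of cases discordant between the two classifications.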
Using cancer-specific survival as the outcome metric, patients with G-DIF cancers had worse survival outcomes compared to patients with G-INT tumors in the SG and AU cohorts (cohort 1: HR 1.78, 95% CI: 1.19-2.64, p=0.004; cohort 2: HR 1.73, 95% CI: 0.92-3.26, p=0.09) and also in a combined analysis (HR: 1.79, 95% CI: 1.28-2.51, p=0.001).
In a multivariate analysis (Table 2), the intrinsic subtypes remained prognostic (p<0.001) even after accounting for other interacting factors such as Lauren's classes and grade. The intrinsic subtypes were also prognostic after accounting for other variables that were also prognostic in univariate analysis (stage, margin status and gender; p=0.005).
To further determine the general applicability of the intrinsic subclasses, the intrinsic genomic signature was applied to a third GC patient cohort (YG) profiled on a different microarray platform (Illumina Human-6 v2 Expression Beadchip). Of the 65 patients, 35 were classified as G-INT by NTP. Similar to the SG and AU cohorts, patients with G-INT tumors had superior overall survival compared to patients with G-DIF tumors in the YG cohort (HR 3.3, 95% CI: 1.03-10.53, p=0.04), while Lauren's classes were not prognostic (p=0.23).
To assess if a panel of immunohistochemical markers might also be used to identify the intrinsic subtypes and their relation to survival outcomes, an independent tissue microarray (TMA) cohort (cohort 4) of 186 GC patients was analyzed. Two G-INT markers were selected (LGALS4 and CDH17) meeting the criteria of high gene expression in G-INT cell lines and tumors, and for which commercial immunohistochemical markers were available. The TMA tumors were classified based on their intensity of LGALS4 and CDH17 staining (CDH17 (>1+) and LGALS4 (>2+)), using intensity cutoffs determined by a pathologist blinded to the clinical data. To confidently distinguish between G-INT and G-DIF cancers, the 2-marker positive group (G-INT) was compared to the 2-marker negative group (G-DIF). Among the 186 tumors, 75 were classified as G-INT (both markers positive), 44 as G-DIF (neither marker positive) and 67 were equivocal (one marker positive). Patients with G-DIF tumors classified by IHC exhibited worse outcomes than patients with G-INT tumors classified by IHC (hazard ratio, adjusted for stage: 1.95, 95% CI: 1.13-3.38, p=0.02).
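The two-marker decision rule can be expressed as a small function. This is a sketch under a stated assumption: the cutoffs ">1+" and ">2+" are read literally as "score strictly greater than 1" and "score strictly greater than 2" on the 0 to 3+ scale, which may not match how the pathologist applied them. The function name and default parameters are hypothetical.

```python
def ihc_subtype(cdh17_score, lgals4_score, cdh17_cutoff=1, lgals4_cutoff=2):
    """Classify a tumor from two staining grades (0, 1, 2 or 3).

    A marker counts as positive when its score is strictly above the cutoff
    (assumed literal reading of ">1+" and ">2+").
    """
    cdh17_positive = cdh17_score > cdh17_cutoff
    lgals4_positive = lgals4_score > lgals4_cutoff
    if cdh17_positive and lgals4_positive:
        return "G-INT"      # both markers positive
    if not cdh17_positive and not lgals4_positive:
        return "G-DIF"      # neither marker positive
    return "equivocal"      # exactly one marker positive


calls = [ihc_subtype(3, 3), ihc_subtype(0, 1), ihc_subtype(3, 0)]
```

Keeping the one-marker-positive cases in a separate "equivocal" group mirrors the comparison described above, which contrasts only the double-positive and double-negative tumors.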
Of the 37 cell lines, 28 cell lines (11 G-INT and 17 G-DIF) had growth characteristics suitable for in vitro drug sensitivity testing. 5-FU, oxaliplatin and cisplatin are drugs presently employed in the adjuvant and 1st line palliative treatment of GC. The 28 cell lines were treated with increasing concentrations of these drugs. G-INT cell lines were significantly more sensitive to 5-FU (p=0.04) and oxaliplatin (p=0.02) in vitro, while G-DIF cell lines were more sensitive to cisplatin (p=0.03).
Information regarding use of adjuvant 5-fluorouracil chemoradiation was available from 2 gene expression cohorts (1 & 2) and the TMA cohort (cohort 4). Decisions regarding adjuvant therapy in these cohorts were based upon existing knowledge at the point of diagnosis, the patient's general health status, risk factors for relapse (especially disease stage), treatment-related toxicities and patient preference.
Patients with advanced stage disease were more likely to receive adjuvant treatment (p=0.03); however, no significant differences were observed in prescribing 5-FU therapy between the intrinsic subtypes either across all stages (p=0.27) or within each stage (p≈0.4-0.8) (Table 7). To evaluate if the intrinsic subtypes might exhibit differential benefit with 5-FU chemoradiation in the patient cohorts, a statistical test for interaction that was specifically adjusted for stage was performed.
A significant interaction between the intrinsic subtypes and benefit with 5-FU based chemoradiation (Table 3) was observed, which shows that patients with G-INT tumors may derive differential benefit from adjuvant 5-FU based therapy. Specifically, the p-value for the test for interaction by Cox proportional hazards regression was 0.002 in the combined analysis, 0.03 in the gene expression cohorts and 0.02 in the TMA cohort. The stage-adjusted hazard ratio of death due to cancer for surgery alone compared to adjuvant 5-FU therapy was 1.68 (p=0.06) for G-INT tumors and 0.90 (p=0.67) for G-DIF tumors. Table 3 presents the interactions for the combined analysis, while the gene expression and TMA cohorts are separately presented in Table 8.
1. Naturally emergent patterns of at least 2 major subtypes were observed within gene expression profiles from 37 Gastric Cancer Cell Lines (GCCLs) using unsupervised clustering techniques (hierarchical clustering, NMF clustering, Kmeans clustering, silhouette plot analysis).
2. Feature selection. Bioinformatic analysis was performed with R.
a. To select features, nsFilter was employed as an initial filter.
i. Briefly, nsFilter removes control probe sets and probe sets without an Entrez Gene ID annotation. A duplicate filter was also used to select the probe set with the largest variance, under conditions where multiple probe sets map to the same Gene ID. Genes were then filtered on variance alone, removing genes with an interquartile range less than the median interquartile range. 10135 genes passed this filter.
ii. Hierarchical clustering was performed using Euclidean distance and a complete linkage metric.
iii. Using the 2 major subtypes as class labels, LIMMA analysis (package limma from Bioconductor) was performed to identify genes exhibiting differential regulation between the phenotypes.
iv. All analyses were corrected for multiple comparisons by the Benjamini and Hochberg method at a q-value threshold of 0.002.
v. These 171 genes constitute the Gastric cell line derived signature associated with the biological subtype distinction.
3. Classification. Nearest Template Prediction was performed with GenePattern (publicly available at www.broadinstitute.org/cancer/software/genepattern/)
i. Prediction analysis was performed by evaluating the expression status of the signature using the nearest template prediction (NTP) method as implemented in the NearestTemplatePrediction module of the GenePattern analysis toolkit.
ii. Briefly, a hypothetical sample serving as the template of G-INT outcome was defined as a vector having the same length as the G-INT signature. In this template, a value of 1 was assigned to G-INT-correlated genes and a value of −1 was assigned to G-DIF-correlated genes, and then each gene was weighted by the absolute value of the corresponding t score from the LIMMA algorithm. The template of G-DIF outcome was similarly defined.
iii. For each sample, a prediction was made based on the proximity measured by the cosine distance to either of the two templates. Significance for the proximity was estimated by comparison to a null distribution generated by randomly picking (1,000 times) the same number of marker genes from the microarray data for each sample, and correcting for multiple hypothesis testing.
iv. An FDR<0.05 defines a robustly classified sample.
4. How many genes to robustly classify. The table in subsequent pages of this document lists all 171 genes ranked from most “discriminative” to least “discriminative”. The subsequent table lists the effects of dropping genes from the bottom of the list, leaving behind the top 170, top 169 genes and so on. It appears that dropping below 60 genes slightly compromises the precision of the classification, and dropping below 44 genes compromises it substantially.
Background:
Several gene expression signatures derived from supervised approaches based on histology, peritoneal or lymph node metastases and survival have been proposed in order to classify gastric cancers such as adenocarcinomas and provide prognostic information. These studies had relatively small sample sizes. There are two major disadvantages of these approaches. One disadvantage is that gastric adenocarcinomas are characterized by substantial tissue heterogeneity. Different cell populations (tumor cells, fibroblastic/desmoplastic stroma and immune cells) may confound signature development and use thereof. Macro and micro-dissection can be challenging. Another disadvantage is that supervised approaches rely on precise histopathology. Discordance among pathologists compromises signature development. The strategy described in this example involves an initial focus on a diverse panel of gastric cancer cell lines. The hypothesis is that any genomic differences detected in cell lines should be, by nature, tumor-centric and thereby “intrinsic” to the underlying biology of the GC cancer cell.
Methods:
Seven datasets of gene expression profiles across different microarray platforms were generated in-house or obtained from collaborators. The study included a panel of 37 gastric cancer cell lines (GCCLs), which were analyzed using the Affymetrix U133-2plus microarray, and samples from 549 patients in 6 independent patient cohorts as follows: 197 patients in Singapore whose samples were analyzed using the Affymetrix U133-2plus microarray; 70 patients in Australia whose samples were analyzed using the Affymetrix U133-2plus microarray; 31 patients in the United Kingdom whose samples were analyzed using the Affymetrix U133AB microarray; 90 patients from Hong Kong whose samples were analyzed using a custom array; a first set of 96 patients from Korea whose samples were analyzed using a custom array; and a second set of 65 patients in Korea whose samples were analyzed using the Illumina Human-6 v2 microarray. Unsupervised techniques were used to distinguish major intrinsic subtypes from GCCLs and distinguishing features were identified using linear models for microarray data (LIMMA). Patient tumors were classified using the nearest template prediction algorithm and the classification precision and correlation with patient survival were evaluated.
Results:
Beginning with unsupervised techniques, 2 major intrinsic subtypes were identified from the training set (GCCL). A 171-gene signature was identified that could distinguish the two subtypes of tumors. At a false discovery rate of 0.05, the signature precisely classified 432 (78.6%; see Table 11) of the primary tumors, with 61.1% to 88.6% of tumors precisely classified in each dataset and 55% of the classified tumors belonging to the larger of the 2 intrinsic subgroups. With 5 other published signatures, classification precision was <30%. The 2 genomic subtypes were differentially enriched among Lauren's intestinal and diffuse histological subtypes (p<0.001, chi square test). The subclasses were therefore referred to as genomic intestinal and genomic diffuse.
This classification of intrinsic subtypes provided prognostic information, with the more aggressive subgroup having inferior overall survival: median survival of 30 months vs. 71 months (HR 1.48; 95% CI: 1.14-1.92, p<0.01, univariate analysis and HR 1.39; 95% CI: 1.05-1.78, p=0.02 after adjusting for stage; see Table 12). All of the other previously published gene signatures were found to be not prognostic.
The genomic intrinsic gastric cancer classification scheme described herein which was discovered by an unsupervised approach in investigating gastric cancer cell lines precisely classifies patient samples. Although the intrinsic subtypes classification is related to Lauren's histology, it represents a significant improvement by providing independent prognostic value in 6 independent datasets across different microarray platforms.
This example indicates that the intrinsic signature provided by the method described herein was successful in precisely classifying gastric cancers in 6 large patient cohorts from different countries and using different microarray platforms. This indicates that the methods described herein provide better prognostic information than the methods that use the previously existing signatures.
The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this invention belongs. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural reference unless the context clearly dictates otherwise.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.
1.95 (1.36-2.78), p < 0.001 | 1.92 (1.32-2.78), p < 0.001
HR: 1.79 (1.28-2.51), p = 0.001 | 1.63 (1.16-2.29), p = 0.005
1.45 (1.01-2.08), p = 0.05
1.83 (1.16-2.90), p = 0.01
4.40 (1.49-12.99), p = 0.01 | 4.39 (1.48-12.97), p = 0.01
11.99 (4.35-33.04), p < 0.001 | 12.29 (4.45-33.98), p < 0.001
30.13 (10.78-84.22), p < 0.001 | 28.56 (10.14-80.43), p < 0.001
This application claims benefit of, and priority from, U.S. provisional patent application No. 61/476,698, filed on Apr. 18, 2011, the contents of which are fully incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
61476698 | Apr 2011 | US |