Correctly characterizing the type or types of cancer a patient or subject has and, potentially, selecting one or more effective therapies for the patient can be crucial for the survival and overall wellbeing of that patient. Advances in characterizing cancers, predicting prognoses, identifying effective therapies, and otherwise aiding in personalized care of patients with cancer are needed.
Aspects of the disclosure relate to techniques for characterizing subjects having certain renal (kidney) cancers, such as clear cell renal carcinoma (ccRCC). The disclosure is based, in part, on methods for identifying the tumor microenvironment (TME) of a subject having renal cancer (e.g., ccRCC) by using gene expression data obtained from the subject to produce a renal cancer (RC) tumor microenvironment (TME) signature (referred to as an RC TME signature) that, when processed by methods disclosed herein, allows for assignment of an RC TME type to the subject. In some embodiments, the RC TME type of a subject is indicative of one or more characteristics of the subject (or the subject's cancer), for example the likelihood a subject will have a good prognosis or respond to a therapeutic agent such as an immunotherapy (also referred to as an IO agent) or a tyrosine kinase inhibitor (TKI). Aspects of the disclosure also relate to machine learning techniques for determining whether (e.g., the likelihood that) a subject will have a good prognosis or respond to an IO agent or TKI.
Accordingly, in some aspects, the disclosure provides a method for determining a renal cancer (RC) tumor microenvironment (TME) type for a subject having, suspected of having, or at risk of having renal cancer, the method comprising using at least one computer hardware processor to perform: obtaining RNA expression data for the subject, the RNA expression data indicating RNA expression levels for at least some genes in each group of at least some of a plurality of gene groups listed in Table 1; generating an RC TME signature for the subject using the RNA expression data, the RC TME signature comprising gene group scores for respective gene groups in the at least some of the plurality of gene groups, the generating comprising determining the gene group scores using the RNA expression levels; and identifying, using the RC TME signature and from among a plurality of RC TME types, an RC TME type for the subject.
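The three steps of this method (obtaining expression levels, determining gene group scores, and identifying a type) can be illustrated with a minimal Python sketch. It assumes that a gene group score is the mean log2(TPM + 1) of the group's measured genes and that each RC TME type is represented by a centroid signature; both rules are illustrative assumptions rather than the scoring or classification procedure of the disclosure. The gene groups are small subsets of groups named in this section, and the type labels, centroid values, and expression values are hypothetical placeholders.

```python
import math

# Small subsets of gene groups named in this document (illustration only).
GENE_GROUPS = {
    "Effector cells": ["PRF1", "GZMB", "CD8A"],
    "Angiogenesis": ["VEGFA", "KDR", "TEK"],
}

# Hypothetical centroid signatures with made-up labels and values;
# they do not describe any actual RC TME type.
TYPE_CENTROIDS = {
    "type_1": {"Effector cells": 5.0, "Angiogenesis": 1.0},
    "type_2": {"Effector cells": 1.0, "Angiogenesis": 5.0},
}


def gene_group_scores(tpm, gene_groups=GENE_GROUPS):
    """Assumed scoring rule: mean log2(TPM + 1) over each group's measured genes."""
    scores = {}
    for group, genes in gene_groups.items():
        values = [math.log2(tpm[g] + 1.0) for g in genes if g in tpm]
        if values:
            scores[group] = sum(values) / len(values)
    return scores


def identify_tme_type(signature, centroids=TYPE_CENTROIDS):
    """Assumed rule: choose the type whose centroid is nearest (Euclidean distance)."""
    def distance(centroid):
        return math.sqrt(sum((signature[g] - centroid[g]) ** 2 for g in centroid))
    return min(centroids, key=lambda name: distance(centroids[name]))


# Hypothetical TPM values for one subject.
tpm = {"PRF1": 31.0, "GZMB": 63.0, "CD8A": 15.0,
       "VEGFA": 1.0, "KDR": 3.0, "TEK": 0.0}
signature = gene_group_scores(tpm)        # the signature is the set of group scores
tme_type = identify_tme_type(signature)
```

In this sketch the signature is simply a mapping from gene group name to score, so any scoring rule or classifier that accepts such a mapping could be substituted for the two assumed rules.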
In some aspects, the disclosure provides a method for determining a renal cancer (RC) myogenesis signature for a subject having, suspected of having, or at risk of having renal cancer, the method comprising using at least one computer hardware processor to perform obtaining RNA expression data for the subject, the RNA expression data indicating RNA expression levels for at least some of the genes in the gene group listed in Table 2; and generating a myogenesis signature for the subject using the RNA expression data, the myogenesis signature consisting of a gene group score for the gene group listed in Table 2, the gene group score determined using the RNA expression levels.
In some aspects, the disclosure provides a method for predicting the likelihood of a subject responding to an immuno-oncology (IO) agent, the subject having, suspected of having, or at risk of having renal cancer, the method comprising using at least one computer hardware processor to perform: generating, using RNA expression data that has been obtained from the subject, a set of input features, the set of input features comprising at least two of the following features: an RC TME type for the subject; RNA expression levels for one or more of the following genes: PD1, PD-L1, and PD-L2; an ECM associated signature for the subject; an Angiogenesis signature for the subject; a Proliferation rate signature for the subject; and a similarity score indicative of a similarity of an RC TME signature for the subject to RC TME signatures associated with RC TME type B and/or RC TME type C samples; providing the set of input features as input to a machine learning model to obtain a corresponding output indicating a responder score, the responder score indicative of a likelihood that the subject responds to the IO agent; and identifying the subject as having an increased likelihood of responding to the IO agent when the responder score is greater than a specified threshold.
In some aspects, the disclosure provides a method for predicting the likelihood of a subject responding to a tyrosine kinase inhibitor (TKI), the subject having, suspected of having, or at risk of having renal cancer, the method comprising using at least one computer hardware processor to perform: generating, using RNA expression data that has been obtained from the subject, a set of input features, the set of input features comprising at least two of the following features: a Macrophage signature for the subject; an Angiogenesis signature for the subject; a Proliferation rate signature for the subject; and a similarity score indicative of a similarity of an RC TME signature for the subject to RC TME signatures associated with RC TME type B samples; providing the set of input features as input to a machine learning model to obtain a corresponding output indicating a responder score, the responder score indicative of a likelihood that the subject responds to the TKI; and identifying the subject as having an increased likelihood of responding to the TKI when the responder score is greater than a specified threshold.
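The scoring-and-threshold step shared by the IO and TKI prediction methods can be sketched as follows. This is a minimal illustration that assumes a logistic model as the machine learning model; the weights, bias, threshold, and feature values are hypothetical placeholders (not fitted values and not a statement about the direction of any biological effect), and only the feature names are taken from the TKI method above.

```python
import math

# Hypothetical, unfitted weights for two of the features named in the TKI method.
WEIGHTS = {
    "angiogenesis_signature": 1.0,
    "proliferation_rate_signature": -1.0,
}
BIAS = 0.0
THRESHOLD = 0.5  # placeholder for the "specified threshold"


def responder_score(features, weights=WEIGHTS, bias=BIAS):
    """Logistic stand-in for the machine learning model; returns a score in (0, 1)."""
    if len(features) < 2:
        raise ValueError("the method calls for at least two input features")
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def likely_responder(features, threshold=THRESHOLD):
    """Flag the subject when the responder score is greater than the threshold."""
    return responder_score(features) > threshold


# Hypothetical signature values for one subject.
features = {"angiogenesis_signature": 2.0, "proliferation_rate_signature": 0.5}
score = responder_score(features)
flag = likely_responder(features)
```

The comparison is strict, matching the wording "greater than a specified threshold", so a score exactly equal to the threshold does not flag the subject.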
In some aspects, the disclosure provides a method for identifying one or more therapeutic agents for administration to a subject having renal cancer, the method comprising: generating an International Metastatic RCC Database Consortium (IMDC) Risk Score for the subject; when the subject is identified as having a Poor IMDC Risk Score, identifying a combination of an immuno-oncology (IO) agent and a TKI as the one or more therapeutic agents for administration to the subject; and when the subject is identified as having a Favorable or Intermediate IMDC Risk Score, generating an IO responder score according to a method as described herein and a TKI responder score according to a method as described herein, and identifying the one or more therapeutic agents for the subject using the IO responder score and the TKI responder score.
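The branching in this method can be sketched as a small decision function. The Poor branch follows the text directly; for the Favorable and Intermediate branch the text only says that both responder scores are used, so the rule below (keep each agent class whose score exceeds a threshold) is purely an assumption made for illustration, as are the function name and the threshold value.

```python
def identify_agents(imdc_risk, io_score=None, tki_score=None, threshold=0.5):
    """Return the agent classes identified for a subject (illustrative sketch)."""
    risk = imdc_risk.strip().lower()
    if risk == "poor":
        # Stated in the method: Poor risk -> combination of IO agent and TKI.
        return ["IO agent", "TKI"]
    if risk in ("favorable", "intermediate"):
        if io_score is None or tki_score is None:
            raise ValueError("both responder scores are required for this risk group")
        # Assumed rule (not from the source): keep each agent class whose
        # responder score is greater than the hypothetical threshold.
        agents = []
        if io_score > threshold:
            agents.append("IO agent")
        if tki_score > threshold:
            agents.append("TKI")
        return agents
    raise ValueError(f"unrecognized IMDC risk category: {imdc_risk!r}")


poor_choice = identify_agents("Poor")
intermediate_choice = identify_agents("Intermediate", io_score=0.8, tki_score=0.3)
```

An empty list from the second branch would mean that neither score exceeded the assumed threshold; how such a case is handled is not specified in this section.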
In some aspects, the disclosure provides a system, comprising at least one computer hardware processor; and at least one non-transitory computer readable medium storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for determining a renal cancer (RC) tumor microenvironment (TME) type for a subject, as described herein.
In some aspects, the disclosure provides at least one non-transitory computer readable medium storing processor-executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for determining a renal cancer (RC) tumor microenvironment (TME) type for a subject, as described herein.
In some aspects, the disclosure provides a system, comprising at least one computer hardware processor; and at least one non-transitory computer readable medium storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for determining a renal cancer (RC) myogenesis signature for a subject, as described herein.
In some aspects, the disclosure provides at least one non-transitory computer readable medium storing processor-executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for determining a renal cancer (RC) myogenesis signature for a subject, as described herein.
In some aspects, the disclosure provides a system, comprising at least one computer hardware processor; and at least one non-transitory computer readable medium storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for predicting the likelihood of a subject responding to an immuno-oncology (IO) agent, as described herein.
In some aspects, the disclosure provides at least one non-transitory computer readable medium storing processor-executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for predicting the likelihood of a subject responding to an immuno-oncology (IO) agent, as described herein.
In some aspects, the disclosure provides a system, comprising at least one computer hardware processor; and at least one non-transitory computer readable medium storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for predicting the likelihood of a subject responding to a tyrosine kinase inhibitor (TKI), as described herein.
In some aspects, the disclosure provides at least one non-transitory computer readable medium storing processor-executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for predicting the likelihood of a subject responding to a tyrosine kinase inhibitor (TKI), as described herein.
In some aspects, the disclosure provides a system, comprising at least one computer hardware processor; and at least one non-transitory computer readable medium storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for identifying one or more therapeutic agents for administration to a subject having renal cancer, as described herein.
In some aspects, the disclosure provides at least one non-transitory computer readable medium storing processor-executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for identifying one or more therapeutic agents for administration to a subject having renal cancer, as described herein.
In some embodiments, obtaining the RNA expression data for the subject comprises obtaining sequencing data previously obtained by sequencing a biological sample obtained from the subject.
In some embodiments, the sequencing data comprises at least 1 million reads, at least 5 million reads, at least 10 million reads, at least 20 million reads, at least 50 million reads, or at least 100 million reads. In some embodiments, the sequencing data comprises whole exome sequencing (WES) data, bulk RNA sequencing (RNA-seq) data, single cell RNA sequencing (scRNA-seq) data, or next generation sequencing (NGS) data. In some embodiments, the sequencing data comprises microarray data.
In some embodiments, the method further comprises normalizing the RNA expression data to transcripts per million (TPM) units prior to generating the RC TME signature.
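As a minimal sketch of the TPM normalization mentioned above, the following Python applies the standard definition: each gene's read count is divided by its length in kilobases, and the resulting rates are rescaled to sum to one million. The gene names, counts, and lengths are hypothetical, and practical pipelines often use effective transcript lengths rather than the fixed lengths assumed here.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM, given gene lengths in base pairs."""
    # Reads per kilobase: length-normalize each gene's count.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    # Rescale so that the values across all genes sum to one million.
    return {gene: value / total * 1_000_000 for gene, value in rpk.items()}


# Hypothetical counts and lengths for two genes.
counts = {"GENE_A": 100, "GENE_B": 300}
lengths = {"GENE_A": 1000, "GENE_B": 3000}
tpm = counts_to_tpm(counts, lengths)
```

Because GENE_B has three times the reads but also three times the length, both genes receive the same TPM value in this example, which is the length correction that TPM is meant to provide.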
In some embodiments, obtaining the RNA expression data for the subject comprises sequencing a biological sample obtained from the subject. In some embodiments, the biological sample comprises kidney tissue of the subject. In some embodiments, the biological sample comprises tumor tissue of the subject.
In some embodiments, the RNA expression levels comprise RNA expression levels for at least three genes from each of at least two of the following gene groups: Effector cells group: PRF1, GZMB, TBX21, CD8B, ZAP70, IFNG, GZMK, EOMES, FASLG, CD8A, GZMA, GNLY; NK cells group: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226; T cells group: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5; B cells group: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5; Antitumor cytokines group: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1; Checkpoint inhibition group: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1; Treg group: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8; Neutrophil signature group: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1; Granulocyte traffic group: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1; MDSC group: ARG1, IL4I1, IL10, CYBB, IL6, PTGS2, IDO1; Macrophages group: MRC1, CD163, MSR1, SIGLEC1, IL4I1, CD68, IL10, CSF1R; Cancer-associated fibroblasts (CAF) group: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2; Matrix group: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2; Angiogenesis group: PGF, CXCL8, FLT1, ANGPT1, ANGPT2, VEGFC, VEGFB, CXCR2, VEGFA, VWF, CDH5, CXCL5, PDGFC, KDR, TEK; Endothelium group: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR; Proliferation rate group: AURKA, MCM2, CCNB1, MYBL2, MCM6, CDK2, E2F1, CCNE1, ESCO2, CCND1, AURKB, BUB1, MKI67, PLK1, CETN3; EMT signature group: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, TWIST2, CDH2; Citric Acid Cycle group: ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA,
ALDH5A1, CLYBL, SDHD, IDH3A, SLC25A1, ACSS2, SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, ALDH4A1, PCK2, DLD, ACO2, PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT; Glycolysis and Gluconeogenesis group: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2; and Fatty Acid Metabolism group: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
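The coverage condition in this embodiment (at least three genes from each of at least two gene groups) can be checked programmatically. The sketch below is illustrative only: it uses three of the shorter gene groups listed above, and the function names and the example list of measured genes are hypothetical.

```python
# Three of the gene groups listed above, copied as sets for membership tests.
GENE_GROUPS = {
    "Antitumor cytokines": {"IFNA2", "CCL3", "TNF", "TNFSF10", "IL21", "IFNB1"},
    "Treg": {"TNFRSF18", "IKZF2", "IL10", "IKZF4", "CTLA4", "FOXP3", "CCR8"},
    "EMT signature": {"SNAI2", "TWIST1", "ZEB2", "SNAI1", "ZEB1", "TWIST2", "CDH2"},
}


def covered_groups(measured_genes, gene_groups=GENE_GROUPS, min_genes=3):
    """Return the groups with at least `min_genes` of their genes measured."""
    measured = set(measured_genes)
    return [name for name, genes in gene_groups.items()
            if len(genes & measured) >= min_genes]


def meets_coverage(measured_genes, min_groups=2, min_genes=3):
    """True when at least `min_groups` groups each have `min_genes` genes measured."""
    return len(covered_groups(measured_genes, min_genes=min_genes)) >= min_groups


# Hypothetical list of genes with available expression levels.
measured = ["IFNA2", "TNF", "IL21", "FOXP3", "CTLA4", "CCR8", "SNAI2"]
groups = covered_groups(measured)
ok = meets_coverage(measured)
```

Here two groups reach the three-gene minimum and the EMT signature group does not, so the condition is met with exactly two groups.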
In some embodiments, the RNA expression levels comprise RNA expression levels for at least three genes from each of at least two of the following gene groups: MHC I group: HLA-C, B2M, HLA-B, HLA-A, TAP1, TAP2, NLRC5, TAPBP; MHC II group: HLA-DQA1, HLA-DMA, HLA-DRB1, HLA-DMB, CIITA, HLA-DPA1, HLA-DPB1, HLA-DRA, HLA-DQB1; Coactivation molecules group: CD80, TNFRSF4, CD27, CD83, TNFSF9, CD40LG, CD70, ICOS, CD86, CD40, TNFSF4, ICOSLG, TNFRSF9, CD28; Effector cells group: PRF1, GZMB, TBX21, CD8B, ZAP70, IFNG, GZMK, EOMES, FASLG, CD8A, GZMA, GNLY; T cell traffic group: CXCL9, CCL3, CXCR3, CXCL10, CXCL11, CCL5, CCL4, CX3CL1, CX3CR1; NK cells group: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226; T cells group: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5; B cells group: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5; M1 signatures group: IL1B, IL12B, NOS2, SOCS3, IRF5, IL23A, TNF, IL12A, CMKLR1; Th1 signature group: IL12RB2, IL2, TBX21, IFNG, STAT4, IL21, CD40LG; Antitumor cytokines group: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1; Checkpoint inhibition group: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1; Treg group: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8; T reg traffic group: CCL28, CCR10, CCR4, CCR8, CCL17, CCL22, CCL1; Neutrophil signature group: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1; Granulocyte traffic group: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1; MDSC group: ARG1, IL4I1, IL10, CYBB, IL6, PTGS2, IDO1; MDSC traffic group: CCL15, IL6R, CSF2RA, CSF2, CXCL8, CXCL12, IL6, CSF3, CCL26, CXCR4, CXCR2, CSF3R, CSF1, CXCL5, CSF1R; Macrophages group: MRC1, CD163, MSR1, SIGLEC1, IL4I1, CD68, IL10, CSF1R; Macrophage DC traffic group: CCL7, CCL2, XCR1, XCL1, CSF1, CCR2, CCL8, CSF1R; Th2 signature group: IL13, CCR4, IL10, IL5, IL4; Protumor cytokines group: MIF, TGFB1, 
IL10, TGFB3, IL6, TGFB2, IL22; CAF group: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2; Matrix group: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2; Matrix remodeling group: MMP1, PLOD2, MMP2, MMP12, ADAMTS5, ADAMTS4, LOX, MMP9, MMP11, MMP3, MMP1, CA9; Angiogenesis group: PGF, CXCL8, FLT1, ANGPT1, ANGPT2, VEGFC, VEGFB, CXCR2, VEGFA, VWF, CDH5, CXCL5, PDGFC, KDR, TEK; Endothelium group: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR; Proliferation rate group: AURKA, MCM2, CCNB1, MYBL2, MCM6, CDK2, E2F1, CCNE1, ESCO2, CCND1, AURKB, BUB1, MKI67, PLK1, CETN3; EMT signature group: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, TWIST2, CDH2; Cyclic Nucleotides Metabolism group: ADCY4, PDE11A, PDE6A, PDE9A, PDE6C, ADCY7, PDE4A, PDE8A, PDE1B, PDE1A, GUCY2C, GUCY1A3, ADCY9, ADCY2, PDE6B, ADCY8, PDE8B, GUCY2F, PDE4C, PDE3A, GUCY1A2, PDE6G, PDE1C, GUCY2D, ADCY10, GUCY1B3, GUCY1B2, PDE7B, PDE5A, PDE6D, NPR2, ADCY5, NPR1, ADCY6, PDE7A, PDE2A, PDE4B, PDE10A, PDE6H, PDE4D, ADCY1, PDE3B, ADCY3; Glycolysis and Gluconeogenesis group: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2; Citric Acid Cycle group: ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA, ALDH5A1, CLYBL, SDHD, IDH3A, SLC25A1, ACSS2, SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, ALDH4A1, PCK2, DLD, ACO2, 
PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT; and, Fatty Acid Metabolism group: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
In some embodiments, the RNA expression levels further comprise RNA expression levels for at least three genes from each of at least two of the following gene groups: ECM associated group: ADAM8, ADAMTS4, C1QL3, CST7, CTSW, CXCL8, FASLG, LTB, MUC1, OSM, P4HA2, SCUBE1, SEMA4B, SEMA7A, SERPINE1, TCHH, TGFA, TGM2, TNFSF11, TNFSF9, WNT10B; TLS kidney group: ZNF683, POU2AF1, LAX1, CD79A, CXCL9, XCL2, JCHAIN, SLAMF7, CD38, SLAMF1, TNFRSF17, IRF4, HSH2D, PLA2G2D, MZB1; NRF2 signature group: TRIM16L, UGDH, KIAA1549, PANX2, FECH, LRP8, AKR1C2, FTH1, AKR1C3, CBR1, PFN2, CBX2, TXN, CYP4F11, CYP4F3, AKR1C1, AKR1B15, G6PD, PRDX1, TALDO1, EPT1, SRXN1, JAKMIP3, FTHL3, UCHL1, TXNRD1, C1orf131, CASKIN1, PGD, GPX2, OSGIN1, KIAA0319, CABYR, AIFM2, TRIM16, AKR1B10, GCLC, ABCC2, ETFB, IDH1, MAFG, NECAB2, ME1, PTGR1, PIR, GSR, RIT1, GCLM, ALDH3A1, NQO1, PKD1L2, NRG4, ABHD4, HRG, SLC7A11; and tRCC signature group: FST, TRIM63, SLC10A2, ANTXRL, ERVV-2, SNX22, INHBE, SV2B, FAM124A, EPHA5, LUZP2, CPEB1, HOXB13, ALLC, KCNF1, NDRG4, GREB1, ASTN1, JSRP1, UBE2U, KCNQ4, MYO7B, BRINP2, C1QL2, CCDC136, SLC51B, CATSPERG, PMEL, BIRC7, PLK5, ADARB2, CFAP61, TUBB4A, PLIN4, ABCB5, SYT3, HCN4, CTSK, SPACA1, TRIM67, NMRK2, LGI3, ARHGEF4, NTSR2, KEL, SNCB, PLD5, ADGRB1, CYP17A1, IGFBPL1, TRIM71, SLC45A2, TP73, IP6K3, HABP2, RGS20, IGFN1, CDH17.
In some embodiments, the RNA expression levels comprise RNA expression levels for each of the genes from each of the following gene groups: Effector cells group: PRF1, GZMB, TBX21, CD8B, ZAP70, IFNG, GZMK, EOMES, FASLG, CD8A, GZMA, GNLY; NK cells group: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226; T cells group: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5; B cells group: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5; Antitumor cytokines group: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1; Checkpoint inhibition group: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1; Treg group: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8; Neutrophil signature group: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1; Granulocyte traffic group: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1; MDSC group: ARG1, IL4I1, IL10, CYBB, IL6, PTGS2, IDO1; Macrophages group: MRC1, CD163, MSR1, SIGLEC1, IL4I1, CD68, IL10, CSF1R; Cancer-associated fibroblasts (CAF) group: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2; Matrix group: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2; Angiogenesis group: PGF, CXCL8, FLT1, ANGPT1, ANGPT2, VEGFC, VEGFB, CXCR2, VEGFA, VWF, CDH5, CXCL5, PDGFC, KDR, TEK; Endothelium group: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR; Proliferation rate group: AURKA, MCM2, CCNB1, MYBL2, MCM6, CDK2, E2F1, CCNE1, ESCO2, CCND1, AURKB, BUB1, MKI67, PLK1, CETN3; EMT signature group: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, TWIST2, CDH2; Citric Acid Cycle group: ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA, ALDH5A1, CLYBL, SDHD, 
IDH3A, SLC25A1, ACSS2, SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, ALDH4A1, PCK2, DLD, ACO2, PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT; Glycolysis and Gluconeogenesis group: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2; and Fatty Acid Metabolism group: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
In some embodiments, the RNA expression levels for genes in the plurality of gene groups comprise RNA expression levels for each of the genes from each of the following gene groups: MHC I group: HLA-C, B2M, HLA-B, HLA-A, TAP1, TAP2, NLRC5, TAPBP; MHC II group: HLA-DQA1, HLA-DMA, HLA-DRB1, HLA-DMB, CIITA, HLA-DPA1, HLA-DPB1, HLA-DRA, HLA-DQB1; Coactivation molecules group: CD80, TNFRSF4, CD27, CD83, TNFSF9, CD40LG, CD70, ICOS, CD86, CD40, TNFSF4, ICOSLG, TNFRSF9, CD28; Effector cells group: PRF1, GZMB, TBX21, CD8B, ZAP70, IFNG, GZMK, EOMES, FASLG, CD8A, GZMA, GNLY; T cell traffic group: CXCL9, CCL3, CXCR3, CXCL10, CXCL11, CCL5, CCL4, CX3CL1, CX3CR1; NK cells group: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226; T cells group: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5; B cells group: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5; M1 signatures group: IL1B, IL12B, NOS2, SOCS3, IRF5, IL23A, TNF, IL12A, CMKLR1; Th1 signature group: IL12RB2, IL2, TBX21, IFNG, STAT4, IL21, CD40LG; Antitumor cytokines group: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1; Checkpoint inhibition group: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1; Treg group: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8; T reg traffic group: CCL28, CCR10, CCR4, CCR8, CCL17, CCL22, CCL1; Neutrophil signature group: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1; Granulocyte traffic group: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1; MDSC group: ARG1, IL4I1, IL10, CYBB, IL6, PTGS2, IDO1; MDSC traffic group: CCL15, IL6R, CSF2RA, CSF2, CXCL8, CXCL12, IL6, CSF3, CCL26, CXCR4, CXCR2, CSF3R, CSF1, CXCL5, CSF1R; Macrophages group: MRC1, CD163, MSR1, SIGLEC1, IL4I1, CD68, IL10, CSF1R; Macrophage DC traffic group: CCL7, CCL2, XCR1, XCL1, CSF1, CCR2, CCL8, CSF1R; Th2 signature group: IL13, CCR4, IL10, IL5, IL4; Protumor 
cytokines group: MIF, TGFB1, IL10, TGFB3, IL6, TGFB2, IL22; CAF group: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2; Matrix group: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2; Matrix remodeling group: MMP1, PLOD2, MMP2, MMP12, ADAMTS5, ADAMTS4, LOX, MMP9, MMP11, MMP3, MMP1, CA9; Angiogenesis group: PGF, CXCL8, FLT1, ANGPT1, ANGPT2, VEGFC, VEGFB, CXCR2, VEGFA, VWF, CDH5, CXCL5, PDGFC, KDR, TEK; Endothelium group: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR; Proliferation rate group: AURKA, MCM2, CCNB1, MYBL2, MCM6, CDK2, E2F1, CCNE1, ESCO2, CCND1, AURKB, BUB1, MKI67, PLK1, CETN3; EMT signature group: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, TWIST2, CDH2; Cyclic Nucleotides Metabolism group: ADCY4, PDE11A, PDE6A, PDE9A, PDE6C, ADCY7, PDE4A, PDE8A, PDE1B, PDE1A, GUCY2C, GUCY1A3, ADCY9, ADCY2, PDE6B, ADCY8, PDE8B, GUCY2F, PDE4C, PDE3A, GUCY1A2, PDE6G, PDE1C, GUCY2D, ADCY10, GUCY1B3, GUCY1B2, PDE7B, PDE5A, PDE6D, NPR2, ADCY5, NPR1, ADCY6, PDE7A, PDE2A, PDE4B, PDE10A, PDE6H, PDE4D, ADCY1, PDE3B, ADCY3; Glycolysis and Gluconeogenesis group: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2; Citric Acid Cycle group: ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA, ALDH5A1, CLYBL, SDHD, IDH3A, SLC25A1, ACSS2, SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, 
ALDH4A1, PCK2, DLD, ACO2, PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT; and, Fatty Acid Metabolism group: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
In some embodiments, the RNA expression levels for genes in the plurality of gene groups further comprise RNA expression levels for each of the genes from each of the following gene groups: ECM associated group: ADAM8, ADAMTS4, C1QL3, CST7, CTSW, CXCL8, FASLG, LTB, MUC1, OSM, P4HA2, SCUBE1, SEMA4B, SEMA7A, SERPINE1, TCHH, TGFA, TGM2, TNFSF11, TNFSF9, WNT10B; TLS kidney group: ZNF683, POU2AF1, LAX1, CD79A, CXCL9, XCL2, JCHAIN, SLAMF7, CD38, SLAMF1, TNFRSF17, IRF4, HSH2D, PLA2G2D, MZB1; NRF2 signature group: TRIM16L, UGDH, KIAA1549, PANX2, FECH, LRP8, AKR1C2, FTH1, AKR1C3, CBR1, PFN2, CBX2, TXN, CYP4F11, CYP4F3, AKR1C1, AKR1B15, G6PD, PRDX1, TALDO1, EPT1, SRXN1, JAKMIP3, FTHL3, UCHL1, TXNRD1, C1orf131, CASKIN1, PGD, GPX2, OSGIN1, KIAA0319, CABYR, AIFM2, TRIM16, AKR1B10, GCLC, ABCC2, ETFB, IDH1, MAFG, NECAB2, ME1, PTGR1, PIR, GSR, RIT1, GCLM, ALDH3A1, NQO1, PKD1L2, NRG4, ABHD4, HRG, SLC7A11; and tRCC signature group: FST, TRIM63, SLC10A2, ANTXRL, ERVV-2, SNX22, INHBE, SV2B, FAM124A, EPHA5, LUZP2, CPEB1, HOXB13, ALLC, KCNF1, NDRG4, GREB1, ASTN1, JSRP1, UBE2U, KCNQ4, MYO7B, BRINP2, C1QL2, CCDC136, SLC51B, CATSPERG, PMEL, BIRC7, PLK5, ADARB2, CFAP61, TUBB4A, PLIN4, ABCB5, SYT3, HCN4, CTSK, SPACA1, TRIM67, NMRK2, LGI3, ARHGEF4, NTSR2, KEL, SNCB, PLD5, ADGRB1, CYP17A1, IGFBPL1, TRIM71, SLC45A2, TP73, IP6K3, HABP2, RGS20, IGFN1, CDH17.
In some embodiments, determining the gene group scores comprises determining a respective gene group score for each of at least two of the following gene groups, using, for a particular gene group, RNA expression levels for at least three genes in the particular gene group to determine the gene group score for the particular group, the gene groups including: Effector cells group: PRF1, GZMB, TBX21, CD8B, ZAP70, IFNG, GZMK, EOMES, FASLG, CD8A, GZMA, GNLY; NK cells group: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226; T cells group: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5; B cells group: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5; Antitumor cytokines group: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1; Checkpoint inhibition group: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1; Treg group: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8; Neutrophil signature group: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1; Granulocyte traffic group: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1; MDSC group: ARG1, IL4I1, IL10, CYBB, IL6, PTGS2, IDO1; Macrophages group: MRC1, CD163, MSR1, SIGLEC1, IL4I1, CD68, IL10, CSF1R; Cancer-associated fibroblasts (CAF) group: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2; Matrix group: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2; Angiogenesis group: PGF, CXCL8, FLT1, ANGPT1, ANGPT2, VEGFC, VEGFB, CXCR2, VEGFA, VWF, CDH5, CXCL5, PDGFC, KDR, TEK; Endothelium group: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR; Proliferation rate group: AURKA, MCM2, CCNB1, MYBL2, MCM6, CDK2, E2F1, CCNE1, ESCO2, CCND1, AURKB, BUB1, MKI67, PLK1, CETN3; EMT signature group: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, 
TWIST2, CDH2; Citric Acid Cycle group: ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA, ALDH5A1, CLYBL, SDHD, IDH3A, SLC25A1, ACSS2, SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, ALDH4A1, PCK2, DLD, ACO2, PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT; Glycolysis and Gluconeogenesis group: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2; and Fatty Acid Metabolism group: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
In some embodiments, determining the gene group scores comprises determining a respective gene group score for each of at least two of the following gene groups, using, for a particular gene group, RNA expression levels for at least three genes in the particular gene group to determine the gene group score for the particular group, the gene groups including: MHC I group: HLA-C, B2M, HLA-B, HLA-A, TAP1, TAP2, NLRC5, TAPBP; MHC II group: HLA-DQA1, HLA-DMA, HLA-DRB1, HLA-DMB, CIITA, HLA-DPA1, HLA-DPB1, HLA-DRA, HLA-DQB1; Coactivation molecules group: CD80, TNFRSF4, CD27, CD83, TNFSF9, CD40LG, CD70, ICOS, CD86, CD40, TNFSF4, ICOSLG, TNFRSF9, CD28; Effector cells group: PRF1, GZMB, TBX21, CD8B, ZAP70, IFNG, GZMK, EOMES, FASLG, CD8A, GZMA, GNLY; T cell traffic group: CXCL9, CCL3, CXCR3, CXCL10, CXCL11, CCL5, CCL4, CX3CL1, CX3CR1; NK cells group: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226; T cells group: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5; B cells group: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5; M1 signatures group: IL1B, IL12B, NOS2, SOCS3, IRF5, IL23A, TNF, IL12A, CMKLR1; Th1 signature group: IL12RB2, IL2, TBX21, IFNG, STAT4, IL21, CD40LG; Antitumor cytokines group: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1; Checkpoint inhibition group: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1; Treg group: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8; T reg traffic group: CCL28, CCR10, CCR4, CCR8, CCL17, CCL22, CCL1; Neutrophil signature group: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1; Granulocyte traffic group: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1; MDSC group: ARG1, IL4I1, IL10, CYBB, IL6, PTGS2, IDO1; MDSC traffic group: CCL15, IL6R, CSF2RA, CSF2, CXCL8, CXCL12, IL6, CSF3, CCL26, CXCR4, CXCR2, CSF3R, CSF1, CXCL5, CSF1R; Macrophages group: MRC1, CD163, MSR1, 
SIGLEC1, IL4I1, CD68, IL10, CSF1R; Macrophage DC traffic group: CCL7, CCL2, XCR1, XCL1, CSF1, CCR2, CCL8, CSF1R; Th2 signature group: IL13, CCR4, IL10, IL5, IL4; Protumor cytokines group: MIF, TGFB1, IL10, TGFB3, IL6, TGFB2, IL22; CAF group: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2; Matrix group: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2; Matrix remodeling group: MMP1, PLOD2, MMP2, MMP12, ADAMTS5, ADAMTS4, LOX, MMP9, MMP11, MMP3, MMP1, CA9; Angiogenesis group: PGF, CXCL8, FLT1, ANGPT1, ANGPT2, VEGFC, VEGFB, CXCR2, VEGFA, VWF, CDH5, CXCL5, PDGFC, KDR, TEK; Endothelium group: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR; Proliferation rate group: AURKA, MCM2, CCNB1, MYBL2, MCM6, CDK2, E2F1, CCNE1, ESCO2, CCND1, AURKB, BUB1, MKI67, PLK1, CETN3; EMT signature group: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, TWIST2, CDH2; Cyclic Nucleotides Metabolism group: ADCY4, PDE11A, PDE6A, PDE9A, PDE6C, ADCY7, PDE4A, PDE8A, PDE1B, PDE1A, GUCY2C, GUCY1A3, ADCY9, ADCY2, PDE6B, ADCY8, PDE8B, GUCY2F, PDE4C, PDE3A, GUCY1A2, PDE6G, PDE1C, GUCY2D, ADCY10, GUCY1B3, GUCY1B2, PDE7B, PDE5A, PDE6D, NPR2, ADCY5, NPR1, ADCY6, PDE7A, PDE2A, PDE4B, PDE10A, PDE6H, PDE4D, ADCY1, PDE3B, ADCY3; Glycolysis and Gluconeogenesis group: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2; Citric Acid Cycle group: ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, 
DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA, ALDH5A1, CLYBL, SDHD, IDH3A, SLC25A1, ACSS2, SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, ALDH4A1, PCK2, DLD, ACO2, PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT; and, Fatty Acid Metabolism group: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
In some embodiments, determining the gene group scores further comprises determining a respective gene group score for each of at least two of the following gene groups, using, for a particular gene group, RNA expression levels for at least three genes in the particular gene group to determine the gene group score for the particular group, the gene groups including: ECM associated group: ADAM8, ADAMTS4, C1QL3, CST7, CTSW, CXCL8, FASLG, LTB, MUC1, OSM, P4HA2, SCUBE1, SEMA4B, SEMA7A, SERPINE1, TCHH, TGFA, TGM2, TNFSF11, TNFSF9, WNT10B; TLS kidney group: ZNF683, POU2AF1, LAX1, CD79A, CXCL9, XCL2, JCHAIN, SLAMF7, CD38, SLAMF1, TNFRSF17, IRF4, HSH2D, PLA2G2D, MZB1; NRF2 signature group: TRIM16L, UGDH, KIAA1549, PANX2, FECH, LRP8, AKR1C2, FTH1, AKR1C3, CBR1, PFN2, CBX2, TXN, CYP4F11, CYP4F3, AKR1C1, AKR1B15, G6PD, PRDX1, TALDO1, EPT1, SRXN1, JAKMIP3, FTHL3, UCHL1, TXNRD1, C1orf131, CASKIN1, PGD, GPX2, OSGIN1, KIAA0319, CABYR, AIFM2, TRIM16, AKR1B10, GCLC, ABCC2, ETFB, IDH1, MAFG, NECAB2, ME1, PTGR1, PIR, GSR, RIT1, GCLM, ALDH3A1, NQO1, PKD1L2, NRG4, ABHD4, HRG, SLC7A11; and, tRCC signature group: FST, TRIM63, SLC10A2, ANTXRL, ERVV-2, SNX22, INHBE, SV2B, FAM124A, EPHA5, LUZP2, CPEB1, HOXB13, ALLC, KCNF1, NDRG4, GREB1, ASTN1, JSRP1, UBE2U, KCNQ4, MYO7B, BRINP2, C1QL2, CCDC136, SLC51B, CATSPERG, PMEL, BIRC7, PLK5, ADARB2, CFAP61, TUBB4A, PLIN4, ABCB5, SYT3, HCN4, CTSK, SPACA1, TRIM67, NMRK2, LGI3, ARHGEF4, NTSR2, KEL, SNCB, PLD5, ADGRB1, CYP17A1, IGFBPL1, TRIM71, SLC45A2, TP73, IP6K3, HABP2, RGS20, IGFN1, CDH17.
In some embodiments, determining the gene group scores comprises determining a respective gene group score for each of the following gene groups, using, for each gene group, RNA expression levels for each of the genes in each gene group to determine the gene group score for each particular group, the gene groups including: Effector cells group: PRF1, GZMB, TBX21, CD8B, ZAP70, IFNG, GZMK, EOMES, FASLG, CD8A, GZMA, GNLY; NK cells group: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226; T cells group: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5; B cells group: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5; Antitumor cytokines group: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1; Checkpoint inhibition group: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1; Treg group: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8; Neutrophil signature group: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1; Granulocyte traffic group: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1; MDSC group: ARG1, IL4I1, IL10, CYBB, IL6, PTGS2, IDO1; Macrophages group: MRC1, CD163, MSR1, SIGLEC1, IL4I1, CD68, IL10, CSF1R; Cancer-associated fibroblasts (CAF) group: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2; Matrix group: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2; Angiogenesis group: PGF, CXCL8, FLT1, ANGPT1, ANGPT2, VEGFC, VEGFB, CXCR2, VEGFA, VWF, CDH5, CXCL5, PDGFC, KDR, TEK; Endothelium group: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR; Proliferation rate group: AURKA, MCM2, CCNB1, MYBL2, MCM6, CDK2, E2F1, CCNE1, ESCO2, CCND1, AURKB, BUB1, MKI67, PLK1, CETN3; EMT signature group: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, TWIST2, CDH2; Citric Acid Cycle group: 
ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA, ALDH5A1, CLYBL, SDHD, IDH3A, SLC25A1, ACSS2, SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, ALDH4A1, PCK2, DLD, ACO2, PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT; Glycolysis and Gluconeogenesis group: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2; and Fatty Acid Metabolism group: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
In some embodiments, determining the gene group scores comprises determining a respective gene group score for each of the following gene groups, using, for each gene group, RNA expression levels for each of the genes in each gene group to determine the gene group score for each particular group, the gene groups including: MHC I group: HLA-C, B2M, HLA-B, HLA-A, TAP1, TAP2, NLRC5, TAPBP; MHC II group: HLA-DQA1, HLA-DMA, HLA-DRB1, HLA-DMB, CIITA, HLA-DPA1, HLA-DPB1, HLA-DRA, HLA-DQB1; Coactivation molecules group: CD80, TNFRSF4, CD27, CD83, TNFSF9, CD40LG, CD70, ICOS, CD86, CD40, TNFSF4, ICOSLG, TNFRSF9, CD28; Effector cells group: PRF1, GZMB, TBX21, CD8B, ZAP70, IFNG, GZMK, EOMES, FASLG, CD8A, GZMA, GNLY; T cell traffic group: CXCL9, CCL3, CXCR3, CXCL10, CXCL11, CCL5, CCL4, CX3CL1, CX3CR1; NK cells group: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226; T cells group: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5; B cells group: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5; M1 signatures group: IL1B, IL12B, NOS2, SOCS3, IRF5, IL23A, TNF, IL12A, CMKLR1; Th1 signature group: IL12RB2, IL2, TBX21, IFNG, STAT4, IL21, CD40LG; Antitumor cytokines group: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1; Checkpoint inhibition group: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1; Treg group: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8; T reg traffic group: CCL28, CCR10, CCR4, CCR8, CCL17, CCL22, CCL1; Neutrophil signature group: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1; Granulocyte traffic group: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1; MDSC group: ARG1, IL4I1, IL10, CYBB, IL6, PTGS2, IDO1; MDSC traffic group: CCL15, IL6R, CSF2RA, CSF2, CXCL8, CXCL12, IL6, CSF3, CCL26, CXCR4, CXCR2, CSF3R, CSF1, CXCL5, CSF1R; Macrophages group: MRC1, CD163, MSR1, SIGLEC1, IL4I1, CD68, IL10, CSF1R; 
Macrophage DC traffic group: CCL7, CCL2, XCR1, XCL1, CSF1, CCR2, CCL8, CSF1R; Th2 signature group: IL13, CCR4, IL10, IL5, IL4; Protumor cytokines group: MIF, TGFB1, IL10, TGFB3, IL6, TGFB2, IL22; CAF group: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2; Matrix group: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2; Matrix remodeling group: MMP1, PLOD2, MMP2, MMP12, ADAMTS5, ADAMTS4, LOX, MMP9, MMP11, MMP3, MMP1, CA9; Angiogenesis group: PGF, CXCL8, FLT1, ANGPT1, ANGPT2, VEGFC, VEGFB, CXCR2, VEGFA, VWF, CDH5, CXCL5, PDGFC, KDR, TEK; Endothelium group: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR; Proliferation rate group: AURKA, MCM2, CCNB1, MYBL2, MCM6, CDK2, E2F1, CCNE1, ESCO2, CCND1, AURKB, BUB1, MKI67, PLK1, CETN3; EMT signature group: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, TWIST2, CDH2; Cyclic Nucleotides Metabolism group: ADCY4, PDE11A, PDE6A, PDE9A, PDE6C, ADCY7, PDE4A, PDE8A, PDE1B, PDE1A, GUCY2C, GUCY1A3, ADCY9, ADCY2, PDE6B, ADCY8, PDE8B, GUCY2F, PDE4C, PDE3A, GUCY1A2, PDE6G, PDE1C, GUCY2D, ADCY10, GUCY1B3, GUCY1B2, PDE7B, PDE5A, PDE6D, NPR2, ADCY5, NPR1, ADCY6, PDE7A, PDE2A, PDE4B, PDE10A, PDE6H, PDE4D, ADCY1, PDE3B, ADCY3; Glycolysis and Gluconeogenesis group: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2; Citric Acid Cycle group: ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, 
HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA, ALDH5A1, CLYBL, SDHD, IDH3A, SLC25A1, ACSS2, SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, ALDH4A1, PCK2, DLD, ACO2, PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT; and, Fatty Acid Metabolism group: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
In some embodiments, determining the gene group scores further comprises: determining a respective gene group score for each of the following gene groups, using, for each gene group, RNA expression levels for each of the genes in each gene group to determine the gene group score for each particular group, the gene groups including: ECM associated group: ADAM8, ADAMTS4, C1QL3, CST7, CTSW, CXCL8, FASLG, LTB, MUC1, OSM, P4HA2, SCUBE1, SEMA4B, SEMA7A, SERPINE1, TCHH, TGFA, TGM2, TNFSF11, TNFSF9, WNT10B; TLS kidney group: ZNF683, POU2AF1, LAX1, CD79A, CXCL9, XCL2, JCHAIN, SLAMF7, CD38, SLAMF1, TNFRSF17, IRF4, HSH2D, PLA2G2D, MZB1; NRF2 signature group: TRIM16L, UGDH, KIAA1549, PANX2, FECH, LRP8, AKR1C2, FTH1, AKR1C3, CBR1, PFN2, CBX2, TXN, CYP4F11, CYP4F3, AKR1C1, AKR1B15, G6PD, PRDX1, TALDO1, EPT1, SRXN1, JAKMIP3, FTHL3, UCHL1, TXNRD1, C1orf131, CASKIN1, PGD, GPX2, OSGIN1, KIAA0319, CABYR, AIFM2, TRIM16, AKR1B10, GCLC, ABCC2, ETFB, IDH1, MAFG, NECAB2, ME1, PTGR1, PIR, GSR, RIT1, GCLM, ALDH3A1, NQO1, PKD1L2, NRG4, ABHD4, HRG, SLC7A11; and, tRCC signature group: FST, TRIM63, SLC10A2, ANTXRL, ERVV-2, SNX22, INHBE, SV2B, FAM124A, EPHA5, LUZP2, CPEB1, HOXB13, ALLC, KCNF1, NDRG4, GREB1, ASTN1, JSRP1, UBE2U, KCNQ4, MYO7B, BRINP2, C1QL2, CCDC136, SLC51B, CATSPERG, PMEL, BIRC7, PLK5, ADARB2, CFAP61, TUBB4A, PLIN4, ABCB5, SYT3, HCN4, CTSK, SPACA1, TRIM67, NMRK2, LGI3, ARHGEF4, NTSR2, KEL, SNCB, PLD5, ADGRB1, CYP17A1, IGFBPL1, TRIM71, SLC45A2, TP73, IP6K3, HABP2, RGS20, IGFN1, CDH17.
In some embodiments, determining the gene group scores comprises determining a first score of a first gene group using a single-sample GSEA (ssGSEA) technique from RNA expression levels for at least some of the genes in one of the following gene groups: MHC I group: HLA-C, B2M, HLA-B, HLA-A, TAP1, TAP2, NLRC5, TAPBP; MHC II group: HLA-DQA1, HLA-DMA, HLA-DRB1, HLA-DMB, CIITA, HLA-DPA1, HLA-DPB1, HLA-DRA, HLA-DQB1; Coactivation molecules group: CD80, TNFRSF4, CD27, CD83, TNFSF9, CD40LG, CD70, ICOS, CD86, CD40, TNFSF4, ICOSLG, TNFRSF9, CD28; Effector cells group: PRF1, GZMB, TBX21, CD8B, ZAP70, IFNG, GZMK, EOMES, FASLG, CD8A, GZMA, GNLY; T cell traffic group: CXCL9, CCL3, CXCR3, CXCL10, CXCL11, CCL5, CCL4, CX3CL1, CX3CR1; NK cells group: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226; T cells group: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5; B cells group: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5; M1 signatures group: IL1B, IL12B, NOS2, SOCS3, IRF5, IL23A, TNF, IL12A, CMKLR1; Th1 signature group: IL12RB2, IL2, TBX21, IFNG, STAT4, IL21, CD40LG; Antitumor cytokines group: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1; Checkpoint inhibition group: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1; Treg group: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8; T reg traffic group: CCL28, CCR10, CCR4, CCR8, CCL17, CCL22, CCL1; Neutrophil signature group: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1; Granulocyte traffic group: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1; MDSC group: ARG1, IL4I1, IL10, CYBB, IL6, PTGS2, IDO1; MDSC traffic group: CCL15, IL6R, CSF2RA, CSF2, CXCL8, CXCL12, IL6, CSF3, CCL26, CXCR4, CXCR2, CSF3R, CSF1, CXCL5, CSF1R; Macrophages group: MRC1, CD163, MSR1, SIGLEC1, IL4I1, CD68, IL10, CSF1R; Macrophage DC traffic group: CCL7, CCL2, XCR1, XCL1, CSF1, CCR2, CCL8, 
CSF1R; Th2 signature group: IL13, CCR4, IL10, IL5, IL4; Protumor cytokines group: MIF, TGFB1, IL10, TGFB3, IL6, TGFB2, IL22; CAF group: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2; Matrix group: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2; Matrix remodeling group: MMP1, PLOD2, MMP2, MMP12, ADAMTS5, ADAMTS4, LOX, MMP9, MMP11, MMP3, MMP1, CA9; Angiogenesis group: PGF, CXCL8, FLT1, ANGPT1, ANGPT2, VEGFC, VEGFB, CXCR2, VEGFA, VWF, CDH5, CXCL5, PDGFC, KDR, TEK; Endothelium group: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR; Proliferation rate group: AURKA, MCM2, CCNB1, MYBL2, MCM6, CDK2, E2F1, CCNE1, ESCO2, CCND1, AURKB, BUB1, MKI67, PLK1, CETN3; EMT signature group: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, TWIST2, CDH2; Cyclic Nucleotides Metabolism group: ADCY4, PDE11A, PDE6A, PDE9A, PDE6C, ADCY7, PDE4A, PDE8A, PDE1B, PDE1A, GUCY2C, GUCY1A3, ADCY9, ADCY2, PDE6B, ADCY8, PDE8B, GUCY2F, PDE4C, PDE3A, GUCY1A2, PDE6G, PDE1C, GUCY2D, ADCY10, GUCY1B3, GUCY1B2, PDE7B, PDE5A, PDE6D, NPR2, ADCY5, NPR1, ADCY6, PDE7A, PDE2A, PDE4B, PDE10A, PDE6H, PDE4D, ADCY1, PDE3B, ADCY3; Glycolysis and Gluconeogenesis group: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2; Citric Acid Cycle group: ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA, ALDH5A1, CLYBL, SDHD, 
IDH3A, SLC25A1, ACSS2, SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, ALDH4A1, PCK2, DLD, ACO2, PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT; and, Fatty Acid Metabolism group: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
In some embodiments, determining the gene group scores comprises using a single-sample GSEA (ssGSEA) technique to determine the gene group scores from RNA expression levels for each of the genes in each of the following gene groups: MHC I group: HLA-C, B2M, HLA-B, HLA-A, TAP1, TAP2, NLRC5, TAPBP; MHC II group: HLA-DQA1, HLA-DMA, HLA-DRB1, HLA-DMB, CIITA, HLA-DPA1, HLA-DPB1, HLA-DRA, HLA-DQB1; Coactivation molecules group: CD80, TNFRSF4, CD27, CD83, TNFSF9, CD40LG, CD70, ICOS, CD86, CD40, TNFSF4, ICOSLG, TNFRSF9, CD28; Effector cells group: PRF1, GZMB, TBX21, CD8B, ZAP70, IFNG, GZMK, EOMES, FASLG, CD8A, GZMA, GNLY; T cell traffic group: CXCL9, CCL3, CXCR3, CXCL10, CXCL11, CCL5, CCL4, CX3CL1, CX3CR1; NK cells group: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226; T cells group: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5; B cells group: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5; M1 signatures group: IL1B, IL12B, NOS2, SOCS3, IRF5, IL23A, TNF, IL12A, CMKLR1; Th1 signature group: IL12RB2, IL2, TBX21, IFNG, STAT4, IL21, CD40LG; Antitumor cytokines group: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1; Checkpoint inhibition group: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1; Treg group: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8; T reg traffic group: CCL28, CCR10, CCR4, CCR8, CCL17, CCL22, CCL1; Neutrophil signature group: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1; Granulocyte traffic group: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1; MDSC group: ARG1, IL4I1, IL10, CYBB, IL6, PTGS2, IDO1; MDSC traffic group: CCL15, IL6R, CSF2RA, CSF2, CXCL8, CXCL12, IL6, CSF3, CCL26, CXCR4, CXCR2, CSF3R, CSF1, CXCL5, CSF1R; Macrophages group: MRC1, CD163, MSR1, SIGLEC1, IL4I1, CD68, IL10, CSF1R; Macrophage DC traffic group: CCL7, CCL2, XCR1, XCL1, CSF1, CCR2, CCL8, CSF1R; Th2 signature 
group: IL13, CCR4, IL10, IL5, IL4; Protumor cytokines group: MIF, TGFB1, IL10, TGFB3, IL6, TGFB2, IL22; CAF group: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2; Matrix group: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2; Matrix remodeling group: MMP1, PLOD2, MMP2, MMP12, ADAMTS5, ADAMTS4, LOX, MMP9, MMP11, MMP3, MMP1, CA9; Angiogenesis group: PGF, CXCL8, FLT1, ANGPT1, ANGPT2, VEGFC, VEGFB, CXCR2, VEGFA, VWF, CDH5, CXCL5, PDGFC, KDR, TEK; Endothelium group: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR; Proliferation rate group: AURKA, MCM2, CCNB1, MYBL2, MCM6, CDK2, E2F1, CCNE1, ESCO2, CCND1, AURKB, BUB1, MKI67, PLK1, CETN3; EMT signature group: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, TWIST2, CDH2; Cyclic Nucleotides Metabolism group: ADCY4, PDE11A, PDE6A, PDE9A, PDE6C, ADCY7, PDE4A, PDE8A, PDE1B, PDE1A, GUCY2C, GUCY1A3, ADCY9, ADCY2, PDE6B, ADCY8, PDE8B, GUCY2F, PDE4C, PDE3A, GUCY1A2, PDE6G, PDE1C, GUCY2D, ADCY10, GUCY1B3, GUCY1B2, PDE7B, PDE5A, PDE6D, NPR2, ADCY5, NPR1, ADCY6, PDE7A, PDE2A, PDE4B, PDE10A, PDE6H, PDE4D, ADCY1, PDE3B, ADCY3; Glycolysis and Gluconeogenesis group: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2; Citric Acid Cycle group: ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA, ALDH5A1, CLYBL, SDHD, IDH3A, SLC25A1, ACSS2, 
SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, ALDH4A1, PCK2, DLD, ACO2, PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT; and, Fatty Acid Metabolism group: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
In some embodiments, determining the gene group scores is performed using a single-sample GSEA (ssGSEA) technique and using RNA expression levels for each of the genes in each of the following gene groups: ECM associated group: ADAM8, ADAMTS4, C1QL3, CST7, CTSW, CXCL8, FASLG, LTB, MUC1, OSM, P4HA2, SCUBE1, SEMA4B, SEMA7A, SERPINE1, TCHH, TGFA, TGM2, TNFSF11, TNFSF9, WNT10B; TLS kidney group: ZNF683, POU2AF1, LAX1, CD79A, CXCL9, XCL2, JCHAIN, SLAMF7, CD38, SLAMF1, TNFRSF17, IRF4, HSH2D, PLA2G2D, MZB1; NRF2 signature group: TRIM16L, UGDH, KIAA1549, PANX2, FECH, LRP8, AKR1C2, FTH1, AKR1C3, CBR1, PFN2, CBX2, TXN, CYP4F11, CYP4F3, AKR1C1, AKR1B15, G6PD, PRDX1, TALDO1, EPT1, SRXN1, JAKMIP3, FTHL3, UCHL1, TXNRD1, C1orf131, CASKIN1, PGD, GPX2, OSGIN1, KIAA0319, CABYR, AIFM2, TRIM16, AKR1B10, GCLC, ABCC2, ETFB, IDH1, MAFG, NECAB2, ME1, PTGR1, PIR, GSR, RIT1, GCLM, ALDH3A1, NQO1, PKD1L2, NRG4, ABHD4, HRG, SLC7A11; and, tRCC signature group: FST, TRIM63, SLC10A2, ANTXRL, ERVV-2, SNX22, INHBE, SV2B, FAM124A, EPHA5, LUZP2, CPEB1, HOXB13, ALLC, KCNF1, NDRG4, GREB1, ASTN1, JSRP1, UBE2U, KCNQ4, MYO7B, BRINP2, C1QL2, CCDC136, SLC51B, CATSPERG, PMEL, BIRC7, PLK5, ADARB2, CFAP61, TUBB4A, PLIN4, ABCB5, SYT3, HCN4, CTSK, SPACA1, TRIM67, NMRK2, LGI3, ARHGEF4, NTSR2, KEL, SNCB, PLD5, ADGRB1, CYP17A1, IGFBPL1, TRIM71, SLC45A2, TP73, IP6K3, HABP2, RGS20, IGFN1, CDH17.
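As an illustration of the ssGSEA-based scoring recited in the preceding embodiments, the following is a minimal sketch of a single-sample enrichment score for one gene group. It assumes a rank-weighted running-sum statistic with an exponent of 0.25, which is one common formulation of ssGSEA; the function name `ssgsea_score`, the exponent, and the input layout (a mapping from gene symbol to expression level) are illustrative assumptions and are not taken from the disclosure.

```python
def ssgsea_score(expression, gene_group, alpha=0.25):
    """Simplified single-sample enrichment score for one gene group.

    expression: dict mapping gene symbol -> expression level for ONE sample.
    gene_group: set of gene symbols (for example, one group from Table 1).
    alpha:      rank-weight exponent (assumed value, not from the disclosure).
    """
    # Order genes from highest to lowest expression within this sample.
    ordered = sorted(expression, key=lambda g: expression[g], reverse=True)
    n = len(ordered)
    members = [g for g in ordered if g in gene_group]
    if not members or len(members) == n:
        raise ValueError("gene group must cover some, but not all, measured genes")

    # Weight each member gene by its rank (n for the top gene, 1 for the last).
    weights = {g: (n - i) ** alpha for i, g in enumerate(ordered)}
    total_hit_weight = sum(weights[g] for g in members)
    n_miss = n - len(members)

    hit_cdf = 0.0   # weighted fraction of group genes seen so far
    miss_cdf = 0.0  # fraction of non-group genes seen so far
    score = 0.0
    for gene in ordered:
        if gene in gene_group:
            hit_cdf += weights[gene] / total_hit_weight
        else:
            miss_cdf += 1.0 / n_miss
        # Accumulate the difference between the two running distributions.
        score += hit_cdf - miss_cdf
    return score
```

Under these assumptions, a gene group whose members sit near the top of the sample's expression ranking receives a positive score, and a group whose members sit near the bottom receives a negative score.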
In some embodiments, generating the RC TME signature further comprises normalizing the gene group scores. In some embodiments, the normalizing comprises applying median scaling to the gene group scores.
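The normalization step can be sketched as follows. The disclosure does not define "median scaling" in this passage, so the sketch assumes one plausible reading: each gene group score is centered on the cohort median and divided by the median absolute deviation (MAD). The function name `median_scale` and this choice of scale factor are assumptions.

```python
import statistics


def median_scale(scores):
    """Median-scale one gene group's scores across a cohort of samples.

    Assumed definition: (score - median) / median absolute deviation.
    """
    center = statistics.median(scores)
    mad = statistics.median([abs(s - center) for s in scores])
    if mad == 0:
        raise ValueError("scores have zero median absolute deviation")
    return [(s - center) / mad for s in scores]
```

For example, `median_scale([1, 2, 3, 4, 5])` has a median of 3 and a MAD of 1, so the scaled values are -2, -1, 0, 1, and 2.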
In some embodiments, the plurality of RC TME types is associated with a respective plurality of RC TME signature clusters, wherein identifying, using the RC TME signature and from among a plurality of RC TME types, the RC TME type for the subject comprises associating the RC TME signature of the subject with a particular one of the plurality of RC TME signature clusters; and identifying the RC TME type for the subject as the RC TME type corresponding to the particular one of the plurality of RC TME signature clusters to which the RC TME signature of the subject is associated.
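One way to associate a subject's signature with a cluster is sketched below. It assumes that each RC TME signature cluster is summarized by a centroid vector of gene group scores and that the association uses Euclidean distance; neither assumption is stated in the disclosure, and the name `assign_tme_type` is hypothetical.

```python
import math


def assign_tme_type(signature, centroids):
    """Return the RC TME type whose cluster centroid is nearest.

    signature: list of gene group scores for the subject.
    centroids: dict mapping RC TME type label -> centroid vector
               (same length and gene group order as `signature`).
    """
    return min(centroids, key=lambda label: math.dist(signature, centroids[label]))
```

The gene group order must be identical in the signature and in every centroid for the distances to be meaningful.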
In some embodiments, methods described herein further comprise generating the plurality of RC TME signature clusters, the generating comprising: obtaining multiple sets of RNA expression data by sequencing biological samples from multiple respective subjects, each of the multiple sets of RNA expression data indicating RNA expression levels for at least some genes in each of the at least some of the plurality of gene groups listed in Table 1; generating multiple RC TME signatures from the multiple sets of RNA expression data, each of the multiple RC TME signatures comprising gene group expression scores for respective gene groups in the plurality of gene groups, the generating comprising, for each particular one of the multiple RC TME signatures: determining the RC TME signature by determining the gene group expression scores using the RNA expression levels in the particular set of RNA expression data for which the particular RC TME signature is being generated; and clustering the multiple RC TME signatures to obtain the plurality of RC TME signature clusters.
In some embodiments, the clustering comprises dense clustering, spectral clustering, k-means clustering, hierarchical clustering, and/or agglomerative clustering.
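Of the clustering options listed, k-means is the simplest to illustrate. The following is a minimal sketch using only the Python standard library; the function name `kmeans`, the deterministic initialization from the first `k` signatures, and the iteration cap are assumptions made for illustration, and a production analysis would more likely rely on an established library.

```python
import math


def kmeans(signatures, k, max_iterations=100):
    """Cluster signature vectors with Lloyd's k-means algorithm.

    Returns (labels, centroids). Initial centroids are the first k
    signatures (an illustrative, deterministic choice).
    """
    centroids = [list(s) for s in signatures[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each signature joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centroids[c]))
            for s in signatures
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(signatures, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids
```

With five RC TME types, `k` would be 5 and each input vector would hold the normalized gene group scores of one subject.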
In some embodiments, methods further comprise updating the plurality of RC TME signature clusters using the RC TME signature of the subject, wherein the RC TME signature of the subject is one of a threshold number of RC TME signatures for a threshold number of subjects, wherein the RC TME signature clusters are updated when the threshold number of RC TME signatures has been generated, and wherein the threshold number of RC TME signatures is at least 50, at least 75, at least 100, at least 200, at least 500, at least 1000, or at least 5000 RC TME signatures.
In some embodiments, the updating is performed using a clustering algorithm selected from the group consisting of a dense clustering algorithm, spectral clustering algorithm, k-means clustering algorithm, hierarchical clustering algorithm, and an agglomerative clustering algorithm.
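The threshold-triggered update can be sketched as a small accumulator that re-runs clustering once enough new signatures have arrived. The class name `ClusterUpdater` and the `recluster` callback are hypothetical; the callback stands in for whichever clustering algorithm is selected.

```python
class ClusterUpdater:
    """Collect new RC TME signatures and re-cluster at a threshold count."""

    def __init__(self, threshold, recluster, initial_signatures=()):
        self.threshold = threshold        # e.g. 50, 100, 500 new signatures
        self.recluster = recluster        # hypothetical clustering callback
        self.signatures = list(initial_signatures)
        self.pending = []                 # signatures received since last update
        self.clusters = None

    def add(self, signature):
        """Store a signature; return True if this call triggered an update."""
        self.pending.append(signature)
        if len(self.pending) < self.threshold:
            return False
        # Threshold reached: fold pending signatures in and re-cluster.
        self.signatures.extend(self.pending)
        self.pending = []
        self.clusters = self.recluster(self.signatures)
        return True
```

A second subject's signature would then be associated with the clusters held in `clusters` after the most recent update.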
In some embodiments, the methods further comprise determining an RC TME type of a second subject, wherein the RC TME type of the second subject is identified using the updated RC TME signature clusters, wherein the identifying comprises determining an RC TME signature of the second subject from RNA expression data obtained by sequencing a biological sample obtained from the second subject; associating the RC TME signature of the second subject with a particular one of the plurality of the updated RC TME signature clusters; and identifying the RC TME type for the second subject as the RC TME type corresponding to the particular one of the plurality of updated RC TME signature clusters to which the RC TME signature of the second subject is associated.
In some embodiments, the plurality of RC TME types comprises: RC TME type A, RC TME type B, RC TME type C, RC TME type D, and RC TME type E.
In some embodiments, the methods further comprise identifying at least one therapeutic agent for administration to the subject using the RC TME type of the subject. In some embodiments, the at least one therapeutic agent comprises an immuno-oncology (IO) agent. In some embodiments, the at least one therapeutic agent comprises a tyrosine kinase inhibitor (TKI).
In some embodiments, identifying the at least one therapeutic agent based upon the RC TME type of the subject comprises identifying a TKI as the at least one therapeutic agent when the subject is identified as having RC TME type E.
In some embodiments, identifying the at least one therapeutic agent based upon the RC TME type of the subject comprises identifying a combination of a TKI and an IO agent as the at least one therapeutic agent when the subject is identified as having RC TME type A or RC TME type B.
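The two selection rules above can be written as a simple lookup. The function name `identify_agents` and the string labels are illustrative; RC TME types C and D return an empty list only because this passage states no rule for them, not because no therapy applies.

```python
def identify_agents(tme_type):
    """Map an RC TME type to the agent classes named in this passage."""
    if tme_type == "E":
        return ["TKI"]
    if tme_type in ("A", "B"):
        return ["TKI", "IO agent"]  # combination of a TKI and an IO agent
    if tme_type in ("C", "D"):
        return []  # no rule is stated in this passage for these types
    raise ValueError(f"unknown RC TME type: {tme_type!r}")
```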
In some embodiments, the methods further comprise administering the at least one identified therapeutic agent to the subject.
In some embodiments, the methods comprise normalizing the RNA expression data to transcripts per million (TPM) units prior to generating an RC myogenesis signature.
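For reference, TPM normalization divides each gene's read count by its length and rescales so that the values for a sample sum to one million. The sketch below assumes the inputs are raw read counts and gene lengths in kilobases; the function name `counts_to_tpm` and the input layout are illustrative.

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM).

    counts:     dict mapping gene symbol -> read count for one sample.
    lengths_kb: dict mapping gene symbol -> gene length in kilobases.
    """
    # Length-normalized rate (reads per kilobase) for each gene.
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("sample has no reads")
    # Rescale so that all values in the sample sum to one million.
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}
```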
In some embodiments, RNA expression levels comprise RNA expression levels for at least three of the following genes: CASQ1, TNNI1, MB, MYLPF, MYH7, CKM, MYL2, MYL1, CSRP3, ACTA1, MYOZ1, TNNT3, TNNC2, and TNNC1.
In some embodiments, RNA expression levels comprise RNA expression levels for each of the following genes: CASQ1, TNNI1, MB, MYLPF, MYH7, CKM, MYL2, MYL1, CSRP3, ACTA1, MYOZ1, TNNT3, TNNC2, and TNNC1.
In some embodiments, an RC myogenesis signature is determined, using a single-sample GSEA (ssGSEA) technique, from RNA expression levels for each of the following genes: CASQ1, TNNI1, MB, MYLPF, MYH7, CKM, MYL2, MYL1, CSRP3, ACTA1, MYOZ1, TNNT3, TNNC2, and TNNC1.
In some embodiments, methods described by the disclosure further comprise determining whether the value of an RC myogenesis signature is greater than a specified threshold. In some embodiments, the specified threshold is 4.
In some embodiments, when the value of an RC myogenesis signature is greater than the specified threshold, the method further comprises identifying the subject as a non-responder to an immuno-oncology (IO) agent.
In some embodiments, the methods further comprise identifying one or more non-IO agents for the subject. In some embodiments, the one or more non-IO agents comprise a TKI. In some embodiments, the methods further comprise administering the identified one or more non-IO agents to the subject.
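The threshold test on the RC myogenesis signature reduces to a single comparison. In the sketch below, the default threshold of 4 comes from the text above, while the function name is hypothetical.

```python
def is_predicted_io_non_responder(myogenesis_score, threshold=4.0):
    """True when the RC myogenesis signature exceeds the specified threshold."""
    # Strictly greater than, matching "greater than a specified threshold".
    return myogenesis_score > threshold
```

A subject flagged by this check would, per the embodiments above, be a candidate for one or more non-IO agents such as a TKI.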
In some embodiments, generating a set of input features comprises determining the RC TME type for the subject, using RNA expression data as described herein.
In some embodiments, generating the set of input features comprises determining the RNA expression levels for one or more of the following genes: PD1, PD-L1, and PD-L2.
In some embodiments, generating the set of input features comprises determining the ECM associated signature for the subject using the RNA expression data by performing ssGSEA on the RNA expression data for at least three of the “ECM associated signature” genes listed in Table 1. In some embodiments, determining the ECM associated signature further comprises performing ssGSEA on the RNA expression data for at least 4, 5, 6, 7, 8, 9, or 10 of the “ECM associated signature” genes listed in Table 1. In some embodiments, determining the ECM associated signature further comprises performing ssGSEA on the RNA expression data for each of the “ECM associated signature” genes listed in Table 1.
In some embodiments, generating the set of input features comprises determining the Angiogenesis signature for the subject using the RNA expression data by performing ssGSEA on the RNA expression data for at least three of the “Angiogenesis” genes listed in Table 1. In some embodiments, determining the Angiogenesis signature further comprises performing ssGSEA on the RNA expression data for at least 4, 5, 6, 7, 8, 9, or 10 of the “Angiogenesis” genes listed in Table 1. In some embodiments, determining the Angiogenesis signature further comprises performing ssGSEA on the RNA expression data for each of the “Angiogenesis” genes listed in Table 1.
In some embodiments, generating the set of input features comprises determining the Proliferation rate signature for the subject using the RNA expression data by performing ssGSEA on the RNA expression data for at least three of the “Proliferation rate” genes listed in Table 1. In some embodiments, determining the Proliferation rate signature further comprises performing ssGSEA on the RNA expression data for at least 4, 5, 6, 7, 8, 9, or 10 of the “Proliferation rate” genes listed in Table 1. In some embodiments, determining the Proliferation rate signature further comprises performing ssGSEA on the RNA expression data for each of the “Proliferation rate” genes listed in Table 1.
In some embodiments, generating the set of input features comprises determining the similarity score by comparing the gene group scores of an RC TME signature of the subject to an average of gene group scores of a plurality of RC TME signatures from RC TME type B samples and/or an average of gene group scores of a plurality of RC TME signatures from RC TME type C samples.
In some embodiments, determining the similarity score comprises calculating a Spearman correlation coefficient between: the gene group scores for the respective plurality of gene groups of an RC TME signature of the subject; and averaged gene group scores for a plurality of gene groups of other RC type B and/or RC type C samples.
In some embodiments, the methods further comprise identifying the subject as being (i) “IO-low” when the responder score is ≤0.05; (ii) “IO-medium” when the responder score is ≥0.05 and <0.5; or (iii) “IO-high” when the responder score is ≥0.5.
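As a non-limiting illustration, the similarity score described above can be sketched in plain Python as a Spearman correlation between the subject's gene group scores and the per-group average of a set of reference signatures. The function names and all numeric scores below are hypothetical; only the use of a Spearman correlation against averaged gene group scores comes from the text.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def similarity_score(subject_scores, reference_signatures):
    """Correlate a subject's gene group scores with averaged reference scores."""
    groups = sorted(subject_scores)
    averaged = [
        sum(sig[g] for sig in reference_signatures) / len(reference_signatures)
        for g in groups
    ]
    return spearman([subject_scores[g] for g in groups], averaged)


# Hypothetical gene group scores for one subject and two type B samples.
subject = {"Angiogenesis": 0.9, "T cells": 0.1, "Macrophages": 0.4, "Matrix": 0.6}
type_b_samples = [
    {"Angiogenesis": 0.8, "T cells": 0.2, "Macrophages": 0.5, "Matrix": 0.7},
    {"Angiogenesis": 1.0, "T cells": 0.0, "Macrophages": 0.3, "Matrix": 0.5},
]
score = similarity_score(subject, type_b_samples)
```

Because the correlation is rank-based, the score depends only on the ordering of gene group scores, not on their absolute scale.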
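As a non-limiting illustration, the responder score categories above can be expressed as a small function. The function name is hypothetical. Because the stated ranges both include a score of exactly 0.05, this sketch assumes that such a score is treated as “IO-low”; that tie-breaking choice is an assumption and is not specified in the text.

```python
def io_category(responder_score):
    """Map an IO responder score to the categories described above."""
    # Assumption: a score of exactly 0.05 falls in "IO-low", because the
    # stated "IO-low" and "IO-medium" ranges both include that value.
    if responder_score <= 0.05:
        return "IO-low"
    if responder_score < 0.5:
        return "IO-medium"
    return "IO-high"


categories = [io_category(s) for s in (0.01, 0.05, 0.2, 0.5, 0.9)]
```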
In some embodiments, a specified threshold is 0.5.
In some embodiments, methods described herein further comprise identifying an IO agent for administration to the subject when the responder score of the subject is above the specified threshold or wherein the subject is identified as being “IO-high”.
In some embodiments, the methods further comprise administering an IO agent to the subject when the responder score of the subject is above the specified threshold or wherein the subject is identified as being “IO-high”. In some embodiments, the IO agent comprises a PD1 inhibitor, a PD-L1 inhibitor, a PD-L2 inhibitor, or a CTLA-4 inhibitor.
In some embodiments, RNA expression data comprises the mean of scaled expression levels of PD1 and PD-L1.
In some embodiments, methods described by the disclosure further comprise determining whether the subject comprises one or more of the following biomarkers: ploidy >4; a value of an RC myogenesis signature for the subject that is greater than 4; one or more mTOR activating mutations; and/or one or more mutations in a gene or genes associated with antigen presentation. In some embodiments, the determining takes place prior to the input features being input into a machine learning model.
In some embodiments, methods described by the disclosure further comprise identifying the subject as having a responder score of 0 when the subject comprises one or more of the biomarkers.
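As a non-limiting illustration, the biomarker check described above can be sketched as a gate that runs before any machine learning model is evaluated. The class, field, and function names are hypothetical, and the `model` callable is a stand-in for whatever trained model produces a responder score.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BiomarkerStatus:
    ploidy: float
    myogenesis_signature: float
    mtor_activating_mutation: bool
    antigen_presentation_mutation: bool


def has_exclusion_biomarker(status: BiomarkerStatus) -> bool:
    """True when at least one of the listed biomarkers is present."""
    return (
        status.ploidy > 4
        or status.myogenesis_signature > 4
        or status.mtor_activating_mutation
        or status.antigen_presentation_mutation
    )


def io_responder_score(status: BiomarkerStatus, model: Callable[[], float]) -> float:
    """Assign a score of 0 when a biomarker is present; otherwise run the model."""
    if has_exclusion_biomarker(status):
        return 0.0  # the model is never evaluated for this subject
    return model()


# Hypothetical subjects; the lambda stands in for a trained model's output.
flagged = BiomarkerStatus(2.1, 5.3, False, False)
unflagged = BiomarkerStatus(2.0, 1.2, False, False)
flagged_score = io_responder_score(flagged, lambda: 0.62)
unflagged_score = io_responder_score(unflagged, lambda: 0.62)
```

Passing the model as a callable keeps the biomarker determination ahead of model evaluation, matching the ordering described in some embodiments.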
In some embodiments, generating a set of input features comprises determining an RC TME type for the subject, using the RNA expression data as described herein.
In some embodiments, generating a set of input features comprises determining the Macrophage signature for the subject using the RNA expression data by performing ssGSEA on the RNA expression data for at least three of the “Macrophages” genes listed in Table 1. In some embodiments, determining the Macrophage signature further comprises performing ssGSEA on the RNA expression data for at least 4, 5, 6, 7, 8, 9, or 10 of the “Macrophages” genes listed in Table 1. In some embodiments, determining the Macrophage signature further comprises performing ssGSEA on the RNA expression data for each of the “Macrophages” genes listed in Table 1.
In some embodiments, generating a set of input features comprises determining the Angiogenesis signature for the subject using the RNA expression data by performing ssGSEA on the RNA expression data for at least three of the “Angiogenesis” genes listed in Table 1. In some embodiments, determining the Angiogenesis signature further comprises performing ssGSEA on the RNA expression data for at least 4, 5, 6, 7, 8, 9, or 10 of the “Angiogenesis” genes listed in Table 1. In some embodiments, determining the Angiogenesis signature further comprises performing ssGSEA on the RNA expression data for each of the “Angiogenesis” genes listed in Table 1.
In some embodiments, generating a set of input features comprises determining the Proliferation rate signature for the subject using the RNA expression data by performing ssGSEA on the RNA expression data for at least three of the “Proliferation rate” genes listed in Table 1. In some embodiments, determining the Proliferation rate signature further comprises performing ssGSEA on the RNA expression data for at least 4, 5, 6, 7, 8, 9, or 10 of the “Proliferation rate” genes listed in Table 1. In some embodiments, determining the Proliferation rate signature further comprises performing ssGSEA on the RNA expression data for each of the “Proliferation rate” genes listed in Table 1.
In some embodiments, generating a set of input features comprises determining the similarity score by comparing the gene group scores of an RC TME signature of the subject to an average of gene group scores of a plurality of RC TME signatures from RC TME type B samples. In some embodiments, determining the similarity score comprises calculating a Spearman correlation coefficient between: the gene group scores for the respective plurality of gene groups of an RC TME signature of the subject; and averaged gene group scores for a plurality of gene groups of other RC type B and/or RC type C samples.
In some embodiments, methods described by the disclosure further comprise identifying the subject as being “TKI-low” when the responder score is ≤0.75; “TKI-medium” when the responder score is ≥0.75 and <0.95; or “TKI-high” when the responder score is ≥0.95.
In some embodiments, a specified threshold is 0.95.
In some embodiments, methods described by the disclosure further comprise identifying a TKI for administration to the subject when the responder score of the subject is above the specified threshold or wherein the subject is identified as being “TKI-medium” or “TKI-high”.
In some embodiments, methods described by the disclosure further comprise administering a TKI to the subject when the responder score of the subject is above the specified threshold or wherein the subject is identified as being “TKI-medium” or “TKI-high”.
In some embodiments, a TKI comprises a small molecule or antibody. In some embodiments, an antibody is a monoclonal antibody.
In some embodiments, renal cancer is clear cell renal carcinoma (ccRCC).
In some embodiments, when a subject is identified as “TKI-low” using a TKI responder score, methods described herein further comprise identifying the one or more therapeutic agents as: a TKI when the subject is identified, using the IO responder score, as “IO-low”; a combination of a TKI and an IO agent when the subject is identified, using the IO responder score, as “IO-low”; or, a combination of a TKI and an IO agent when the subject is identified, using the IO responder score, as “IO-medium” or “IO-high”.
In some embodiments, when a subject is identified as “TKI-medium” using the TKI responder score, methods described by the disclosure further comprise identifying the one or more therapeutic agents as a combination of a TKI and an IO agent when the subject is identified, using the IO responder score, as “IO-high”.
In some embodiments, when a subject is identified as “TKI-high” using the TKI responder score, methods described by the disclosure further comprise identifying the one or more therapeutic agents as: a TKI when the subject is identified, using the IO responder score, as “IO-low” or “IO-medium”; or, a combination of a TKI and an IO agent when the subject is identified, using the IO responder score, as “IO-high”.
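As a non-limiting illustration, the combinations of TKI and IO categories described in the preceding paragraphs can be collected into a lookup table. The names below are hypothetical. The table lists every option stated in the text for a given pair of categories, so a pair for which the text gives more than one option maps to more than one entry, and a pair the text does not address maps to an empty list.

```python
TKI = "TKI"
TKI_PLUS_IO = "TKI + IO agent"

# Keys are (TKI category, IO category); values are the options stated above.
THERAPY_OPTIONS = {
    ("TKI-low", "IO-low"): [TKI, TKI_PLUS_IO],
    ("TKI-low", "IO-medium"): [TKI_PLUS_IO],
    ("TKI-low", "IO-high"): [TKI_PLUS_IO],
    ("TKI-medium", "IO-high"): [TKI_PLUS_IO],
    ("TKI-high", "IO-low"): [TKI],
    ("TKI-high", "IO-medium"): [TKI],
    ("TKI-high", "IO-high"): [TKI_PLUS_IO],
}


def candidate_therapies(tki_category, io_category):
    """Return the stated options, or an empty list for unaddressed pairs."""
    return THERAPY_OPTIONS.get((tki_category, io_category), [])


options = candidate_therapies("TKI-high", "IO-high")
```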
In some embodiments, methods described by the disclosure further comprise administering an identified one or more therapeutic agents to the subject.
In some embodiments, methods described by the disclosure further comprise providing a recommendation that the identified one or more therapeutic agents be administered to the subject.
Aspects of the disclosure relate to methods for characterizing subjects having certain renal (kidney) cancers (RC), such as clear cell renal carcinoma (ccRCC). Clear cell renal cell carcinoma (ccRCC) is one of the most common renal cancers, and is known to have marked genetic intratumor heterogeneity (ITH). It has been found that ITH underlies tumor evolution, metastasis, and clinical responses to various therapies. Studies have focused on the utilization of whole exome sequencing to uncover evolution patterns, the mutational profile underlying ITH, clonal architecture, and inter-patient tumor differences. Multi-region DNA sequencing has revealed that del(3p) and ampl(5q), two common aberrations observed in 43% of ccRCC, are the first genetic events in kidney cell malignization. Additional DNA sequencing analyses suggest that clonal architecture can divide RC tumors into three prognostic subtypes that predict clinical outcome. Currently, immunotherapy has dramatically improved the clinical outcomes of RC patients, and the combination of ipilimumab and nivolumab is FDA-approved for frontline metastatic ccRCC. However, notable differences have been observed in patient responses to immunotherapy, which cannot be explained through genetic heterogeneity alone. Accordingly, the inventors have recognized that there is a need to develop methods for molecular characterization of RC types specifically based upon the underlying biology of both the tumor microenvironment and malignant cells, rather than more broadly defined cancer biomarkers and/or ITH.
Aspects of the disclosure relate to statistical techniques for analyzing expression data (e.g., RNA expression data) obtained from a biological sample from a subject that has renal cancer (RC), is suspected of having RC, or is at risk of developing RC, in order to generate a gene expression signature for the subject (termed an “RC TME signature” herein) and use this signature to identify a particular RC TME type that the subject may have. In some embodiments, the RC is ccRCC.
The inventors have recognized that certain gene group scores (e.g., gene group scores for at least some of the gene groups listed in Table 1) may be combined to form an RC TME signature that characterizes patients having RC more accurately than previously developed methods. An RC TME signature comprising a combination of gene group scores associated with the tumor microenvironment and gene group scores associated with malignant cells, in turn, may be used to identify the subject as having a particular renal cancer (RC) tumor microenvironment (TME) type. In some embodiments, such RC TME types are useful for identifying the prognosis and/or likelihood that a subject will respond to particular therapeutic interventions (e.g., immuno-oncology (IO) agents, tyrosine kinase inhibitors (TKI), combinations of IO and TKI, etc.).
The inventors have also recognized that data relating to certain gene expression signatures (e.g., RC TME signature, myogenesis signature, etc.) of subjects having RC may be used to train machine learning-based models to produce responder scores that are reflective of a subject's likelihood of responding to treatment with certain therapeutic agents (e.g., IO agents, TKIs, combinations of IO agents and TKIs, etc.). Such responder scores are useful, in some embodiments, for guiding selection of therapeutic agents for treating RC patients by aiding in selection of therapeutic agents to which a patient has an increased likelihood of responding. The methods also aid in steering selection away from therapeutic agents to which a patient is unlikely to respond, as in the case of “clear IO non-responders” further described in the Examples.
The use of RC TME signatures comprising the combinations of gene group scores described by the disclosure represents an improvement over previously described RC molecular biomarkers or tumor microenvironment analyses. The specific groups of genes used to produce the RC TME signatures described herein better reflect the molecular tumor microenvironments of RC because these gene groups are associated with 1) immune and stromal tumor biology, and 2) renal cancer metabolic pathway activity. These focused combinations of gene groups (e.g., gene groups consisting of some or all of the genes listed in Table 1, or some or all genes of a myogenesis signature listed in Table 2) are unconventional, and differ from previously described molecular signatures, which either attempt to incorporate expression data from very large numbers of genes, or account only for certain subsets of genes involved in cancer (e.g., analysis limited only to immune cells).
The RC TME typing methods described herein have several utilities. For example, identifying a subject's RC TME type using methods described herein may allow for the subject to be diagnosed as having (or being at a high risk of developing) an aggressive form of RC at a timepoint that is not possible with previously described RC characterization methods. Earlier detection of aggressive RC types, enabled by the RC TME signatures described herein, improves patient diagnostic technology by enabling earlier chemotherapeutic intervention than is currently possible for patients tested for RC using other methods.
As described herein, the inventors have also determined that subjects identified by methods described herein as having RC TME type A or RC TME type B are characterized as having a good prognosis and/or an increased likelihood of responding to certain therapeutic treatments, such as a combination of IO agents and TKIs.
Conversely, the inventors have determined that subjects identified as having RC TME type E using methods described herein are less likely to respond to IO agents but will likely respond to TKIs. Additionally, the inventors have determined that subjects having a high myogenesis signature and/or certain other biomarkers (e.g., high ploidy, mutations in genes associated with mTOR activation or antigen presentation, etc.) are likely to be “clear IO non-responders”, and therefore should not be administered immunotherapeutic agents as a first line therapy for RC (e.g., ccRCC). Thus, the techniques developed by the inventors and described herein improve patient treatment and associated outcomes by increasing patient comfort and avoiding toxic side effects of therapy that is not expected to be effective for the subject.
Clear Cell Renal Carcinomas (ccRCCs)
Aspects of the disclosure relate to methods of determining the renal cancer (RC) tumor microenvironment (TME) type of a subject having, suspected of having, or at risk of having RC. As used herein, a “subject” may be a mammal, for example a human, non-human primate, rodent (e.g., rat, mouse, guinea pig, etc.), dog, cat, horse, etc. In some embodiments, the subject is a human. The terms “individual” or “subject” may be used interchangeably with “patient.” As used herein, “renal cancer” refers to any renal or kidney adenocarcinoma, or any other types of malignancies caused by one or more various genetic mutations in the body that affect cells (originally present in or metastasized to) the kidneys of a subject. As used herein, “cancer” refers to any malignant and/or invasive growth or tumor caused by abnormal cell growth in a subject, including solid tumors, blood cancer, bone marrow or lymphoid cancer, etc. Examples of renal cancers include but are not limited to adenocarcinoma of the kidney(s) that are derived from proximal nephron and/or tubular epithelium of the kidney(s), for example clear cell renal cell carcinoma (ccRCC), and malignant epithelial cells with clear cytoplasm and a compact-alveolar (nested) or acinar growth pattern interspersed with intricate, arborizing vasculature.
A subject having RC may exhibit one or more signs or symptoms of RC, for example the presence of cancerous cells (e.g., tumor cells), fever, swelling, bleeding (e.g., bloody urine), nausea and vomiting, persistent lower back pain, and weight loss. In some embodiments, a subject having RC does not exhibit one or more signs or symptoms of RC. In some embodiments, a subject having RC has been diagnosed by a medical professional (e.g., licensed physician) as having RC based upon one or more assays (e.g., clinical assays, molecular diagnostics, etc.) that indicate that the subject has ccRCC, even in the absence of one or more signs or symptoms.
A subject suspected of having RC typically exhibits one or more signs or symptoms of RC, for example ccRCC. In some embodiments, a subject suspected of having RC exhibits one or more signs or symptoms of RC but has not been diagnosed by a medical professional (e.g., a licensed physician) and/or has not received a test result (e.g., a clinical assay, molecular diagnostic, etc.) indicating that the subject has RC.
A subject at risk of having RC may or may not exhibit one or more signs or symptoms of RC. In some embodiments, a subject at risk of having RC comprises one or more risk factors that increase the likelihood that the subject will develop RC. Examples of risk factors include the presence of pre-cancerous cells in a clinical sample, having one or more genetic mutations that predispose the subject to developing cancer (e.g., RC, such as ccRCC), taking one or more medications that increase the likelihood that the subject will develop cancer (e.g., RC, such as ccRCC), family history of RC, and the like.
Various acts of process 100 may be implemented using any suitable computing device(s). For example, in some embodiments, one or more acts of the illustrative process 100 may be implemented in a clinical or laboratory setting. For example, one or more acts of the process 100 may be implemented on a computing device that is located within the clinical or laboratory setting. In some embodiments, the computing device may directly obtain RNA expression data from a sequencing apparatus located within the clinical or laboratory setting. For example, a computing device included in the sequencing apparatus may directly obtain the RNA expression data from the sequencing apparatus. In some embodiments, the computing device may indirectly obtain RNA expression data from a sequencing apparatus that is located within or external to the clinical or laboratory setting. For example, a computing device that is located within the clinical or laboratory setting may obtain expression data via a communication network, such as the Internet or any other suitable network, as aspects of the technology described herein are not limited to any particular communication network.
Additionally or alternatively, one or more acts of the illustrative process 100 may be implemented in a setting that is remote from a clinical or laboratory setting. For example, the one or more acts of process 100 may be implemented on a computing device that is located externally from a clinical or laboratory setting. In this case, the computing device may indirectly obtain RNA expression data that is generated using a sequencing apparatus located within or external to a clinical or laboratory setting. For example, the expression data may be provided to the computing device via a communication network, such as the Internet or any other suitable network.
It should be appreciated that, in some embodiments, not all acts of process 100, as illustrated in
As one illustrative example, in some embodiments, the sequencing data may comprise bulk sequencing data. The bulk sequencing data may comprise at least 1 million reads, at least 5 million reads, at least 10 million reads, at least 20 million reads, at least 50 million reads, or at least 100 million reads. In some embodiments, the sequencing data comprises bulk RNA sequencing (RNA-seq) data, single cell RNA sequencing (scRNA-seq) data, or next generation sequencing (NGS) data. In some embodiments, the sequencing data comprises microarray data. Next, process 100 proceeds to act 104, where the sequencing data obtained at act 102 is processed to obtain gene expression data. This may be done in any suitable way and may involve normalizing bulk sequencing data to transcripts-per-million (TPM) units (or other units) and/or log transforming the RNA expression levels in TPM units. Converting the data to TPM units and normalization are described herein including with reference to
Next, process 100 proceeds to act 106, where a renal cancer (RC) tumor microenvironment (TME) signature is generated for the subject using the RNA expression data generated at act 104 (e.g., from bulk-sequencing data, converted to TPM units and subsequently log-normalized, as described herein including with reference to
As described herein, in some embodiments, an RC TME signature comprises two or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, etc.) gene group scores. In some embodiments, the two or more gene group scores comprise gene group scores (which may also be referred to as gene group enrichment scores or gene group expression scores) for some or all of the gene groups shown in Table 1.
Accordingly, act 106 comprises: act 108 where the gene group scores are determined, act 110 where the RC TME signature is determined, and act 112 where the RC TME type is determined by using the RC TME signature. In some embodiments, determining the gene group scores comprises determining, for each of multiple (e.g., some or all of the) gene groups listed in Table 1, a respective gene group score. In some embodiments, determining the gene group scores comprises determining respective gene group scores for 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, or more gene groups (e.g., gene groups listed in Table 1). The gene group score for a particular gene group may be determined using RNA expression levels for at least some of the genes in the gene group (e.g., the expression levels obtained at act 104). The RNA expression levels may be processed using a gene set enrichment analysis (GSEA) technique to determine the score for the particular gene group.
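As a non-limiting illustration, one GSEA technique referred to elsewhere in this disclosure is single-sample GSEA (ssGSEA). The sketch below is a simplified, unnormalized version of the commonly described ssGSEA running-sum statistic and is not presented as the implementation used by the inventors. The function name, the rank weighting exponent of 0.25, the three background genes, and all expression values are assumptions made for the example.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one gene group.

    expression: dict mapping gene symbol -> expression level in one sample
    gene_set:   iterable of gene symbols forming the gene group
    alpha:      exponent applied to rank weights (0.25 is an assumed default)
    """
    hits = set(gene_set) & set(expression)
    n, n_hit = len(expression), len(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must cover some, but not all, measured genes")

    # Order genes from highest to lowest expression; the top gene gets rank n.
    ordered = sorted(expression, key=expression.get, reverse=True)
    weights = {gene: (n - pos) ** alpha for pos, gene in enumerate(ordered)}
    hit_total = sum(weights[g] for g in hits)
    miss_step = 1.0 / (n - n_hit)

    # Walk down the ranking, accumulating the gap between the weighted
    # cumulative fraction of hits and the cumulative fraction of misses.
    p_hit = p_miss = score = 0.0
    for gene in ordered:
        if gene in hits:
            p_hit += weights[gene] / hit_total
        else:
            p_miss += miss_step
        score += p_hit - p_miss
    return score


# Toy expression values: the gene group is highly expressed in one sample
# and weakly expressed in the other.
gene_group = {"CKM", "MB", "ACTA1"}
high_sample = {"CKM": 9.0, "MB": 8.0, "ACTA1": 7.0, "GAPDH": 3.0, "ACTB": 2.0, "TUBB": 1.0}
low_sample = {"CKM": 1.0, "MB": 2.0, "ACTA1": 3.0, "GAPDH": 7.0, "ACTB": 8.0, "TUBB": 9.0}
high_score = ssgsea_score(high_sample, gene_group)
low_score = ssgsea_score(low_sample, gene_group)
```

In this sketch, a gene group whose genes sit near the top of a sample's expression ranking receives a positive score, and one whose genes sit near the bottom receives a negative score.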
In some embodiments, determining the RC TME gene signature comprises: determining gene group scores using the RNA expression levels for at least three genes from each of at least two of the gene groups, the gene groups including: (a) MHC I group: HLA-A, HLA-B, HLA-C, B2M, TAP1, TAP2, NLRC5, TAPBP; (b) MHC II group: HLA-DQA1, HLA-DMA, HLA-DRB1, HLA-DMB, CIITA, HLA-DPA1, HLA-DPB1, HLA-DRA, HLA-DQB1; (c) Coactivation molecules group: CD28, CD40, TNFRSF4, ICOS, TNFRSF9, CD27, CD80, CD86, CD40LG, CD83, TNFSF4, ICOSLG, TNFSF9, CD70; (d) Effector cells group: IFNG, GZMA, GZMB, PRF1, GZMK, ZAP70, GNLY, FASLG, TBX21, EOMES, CD8A, CD8B; (e) T cell traffic group: CXCL9, CCL3, CXCR3, CXCL10, CXCL11, CCL5, CCL4, CX3CL1, CX3CR1; (f) NK cells group: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226; (g) T cells group: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5; (h) B cells group: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5; (i) M1 signature group: IL1B, IL12B, NOS2, SOCS3, IRF5, IL23A, TNF, IL12A, CMKLR1; (j) Th1 signature group: IL12RB2, IL2, TBX21, IFNG, STAT4, IL21, CD40LG; (k) Antitumor cytokines group: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1; (l) Checkpoint inhibition group: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1; (m) Treg group: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8; (n) Treg traffic group: CCL28, CCR10, CCR4, CCR8, CCL17, CCL22, CCL1; (o) Neutrophil signature group: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1; (p) Granulocyte traffic group: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1; (q) MDSC group: IDO1, ARG1, IL10, CYBB, PTGS2, IL4I1, IL6; (r) MDSC traffic group: CCL15, IL6R, CSF2RA, CSF2, CXCL8, CXCL12, IL6, CSF3, CCL26, CXCR4, CXCR2, CSF3R, CSF1, CXCL5, CSF1R; (s) Macrophage group: MRC1, CD163, MSR1, SIGLEC1, IL4I1, CD68, IL10, CSF1R; (t) Macrophage DC traffic group: CCL7, CCL2, XCR1, XCL1, CSF1, CCR2, CCL8, CSF1R; (u) Th2 signature group: IL13, CCR4, IL10, IL5, IL4; (v) Protumor cytokines group: MIF, TGFB1, IL10, TGFB3, IL6, TGFB2, IL22; (w) Cancer associated fibroblast (CAF) group: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2; (x) Matrix group: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2; (y) Matrix-remodeling group: MMP1, PLOD2, MMP2, MMP12, ADAMTS5, ADAMTS4, LOX, MMP9, MMP11, MMP3, MMP1, CA9; (z) Angiogenesis group: VEGFA, VEGFB, VEGFC, PDGFC, CXCL8, CXCR2, FLT1, PGF, KDR, ANGPT1, ANGPT2, TEK, VWF, CDH5; (aa) Endothelium group: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR; (bb) Proliferation rate group: MKI67, ESCO2, CETN3, CDK2, CCND1, CCNE1, AURKA, AURKB, E2F1, MYBL2, BUB1, CCNB1, MCM2, MCM6; (cc) EMT signature group: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, TWIST2, CDH2; (dd) Cyclic Nucleotides Metabolism group: ADCY4, PDE11A, PDE6A, PDE9A, PDE6C, ADCY7, PDE4A, PDE8A, PDE1B, PDE1A, GUCY2C, GUCY1A3, ADCY9, ADCY2, PDE6B, ADCY8, PDE8B, GUCY2F, PDE4C, PDE3A, GUCY1A2, PDE6G, PDE1C, GUCY2D, ADCY10, GUCY1B3, GUCY1B2, PDE7B, PDE5A, PDE6D, NPR2, ADCY5, NPR1, ADCY6, PDE7A, PDE2A, PDE4B, PDE10A, PDE6H, PDE4D, ADCY1, PDE3B, ADCY3; (ee) Glycolysis and Gluconeogenesis group: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2; (ff) Citric Acid Cycle group: ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA, ALDH5A1, CLYBL, SDHD, IDH3A, SLC25A1, ACSS2, SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, ALDH4A1, PCK2, DLD, ACO2, PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT; and (gg) Fatty Acid Metabolism group: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
Aspects of determining the gene group enrichment scores are described herein, including with reference to
As described above, at act 110, the RC TME signature is produced. In some embodiments, the RC TME signature consists of only gene group scores for one or more (e.g., all) gene groups listed in Table 1. In some embodiments, the RC TME signature comprises gene group scores for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, or 33 gene groups listed in Table 1. In some embodiments, each gene group score is determined using RNA expression levels of some or all (e.g., at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, etc.) of the genes of each gene group listed in Table 1.
Next, process 100 proceeds to act 112, where an RC TME type is identified for the subject using the RC TME signature generated at act 110. This may be done in any suitable way. For example, in some embodiments, each of the possible RC TME types is associated with a respective plurality of RC TME signature clusters. In such embodiments, an RC TME type for the subject may be identified by associating the RC TME signature of the subject with a particular one of the plurality of RC TME signature clusters; and identifying the RC TME type for the subject as the RC TME type corresponding to the particular one of the plurality of RC TME signature clusters to which the RC TME signature of the subject is associated. Examples of RC TME types are described herein. Aspects of identifying an RC TME type for a subject are described herein including in the section below titled “Generating RC TME Signature and Identifying TME Type”. In some embodiments, process 100 completes after act 112 completes. In some such embodiments the determined RC TME signature and/or identified RC TME type may be stored for subsequent use, provided to one or more recipients (e.g., a clinician, a researcher, etc.), and/or used to update the RC TME signature clusters (as described hereinbelow).
However, in some embodiments, one or more other acts are performed after act 112. For example, in the illustrated embodiment, a subject's prognosis may be identified based on the RC TME type determined for the subject. For example, in some embodiments, the subject is identified (at act 114) as having a good prognosis when the subject is identified as having RC TME type D or RC TME type E. Subsequently, or as an alternative to act 114, process 100 may proceed to act 116, where the subject's RC TME type identified in act 112 is used to identify (or recommend) a therapeutic agent for administration to the subject. For example, a subject may be identified as having an increased likelihood of responding to a TKI when the subject is identified as having RC TME type E.
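As a non-limiting illustration, one way to associate a subject's RC TME signature with a signature cluster is to pick the cluster whose centroid is closest to the signature. The nearest-centroid rule, the Euclidean distance, the function name, and all numeric values below are assumptions made for this sketch; the passage above states only that the association may be done in any suitable way.

```python
def assign_tme_type(signature, centroids):
    """Return the RC TME type whose cluster centroid is nearest the signature.

    signature: dict mapping gene group name -> gene group score
    centroids: dict mapping RC TME type -> dict of centroid gene group scores
    """
    def squared_distance(a, b):
        # Squared Euclidean distance over the subject's gene groups.
        return sum((a[group] - b[group]) ** 2 for group in a)

    return min(centroids, key=lambda t: squared_distance(signature, centroids[t]))


# Hypothetical centroids for two of the RC TME types, over three gene groups.
centroids = {
    "A": {"T cells": 0.8, "Angiogenesis": 0.2, "Matrix": 0.7},
    "E": {"T cells": 0.1, "Angiogenesis": 0.9, "Matrix": 0.3},
}
subject_signature = {"T cells": 0.2, "Angiogenesis": 0.8, "Matrix": 0.4}
tme_type = assign_tme_type(subject_signature, centroids)
```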
In some embodiments, process 100 completes after act 114 or 116 completes. In some such embodiments the determined RC TME signature and/or identified RC TME Type may be stored for subsequent use, provided to one or more recipients (e.g., a clinician, a researcher, etc.), and/or used to update the RC TME signature clusters (as described hereinbelow).
Aspects of the disclosure relate to methods for determining an RC TME type of a subject by obtaining sequencing data from a biological sample that has been obtained from the subject.
The biological sample may be from any source in the subject's body including, but not limited to, any fluid such as blood (e.g., whole blood, blood serum, or blood plasma), lymph node, and kidney(s). Other sources in the subject's body include fluids such as saliva, tears, synovial fluid, cerebrospinal fluid, pleural fluid, pericardial fluid, ascitic fluid, and/or urine, as well as hair, skin (including portions of the epidermis, dermis, and/or hypodermis), oropharynx, laryngopharynx, esophagus, bronchus, salivary gland, tongue, oral cavity, nasal cavity, vaginal cavity, anal cavity, stomach, intestine, bone, bone marrow, brain, thymus, spleen, appendix, colon, rectum, anus, liver, biliary tract, pancreas, ureter, bladder, urethra, uterus, vagina, vulva, ovary, cervix, scrotum, penis, prostate, testicle, seminal vesicles, and/or any type of tissue (e.g., muscle tissue, epithelial tissue, connective tissue, or nervous tissue).
The biological sample may be any type of sample including, for example, a sample of a bodily fluid, one or more cells, or one or more pieces of tissue(s) or organ(s). In some embodiments, the biological sample comprises a kidney tissue sample of the subject. Examples of kidney tissue samples include but are not limited to glomerulus parietal cells, glomerulus podocytes, proximal tubule brush border cells, loop of Henle thin segment cells, thick ascending limb cells, distal tubule cells, collecting duct principal cells, collecting duct intercalated cells, interstitial kidney cells, and kidney tumor cells.
In some embodiments, a kidney tissue sample may be obtained from a subject using a surgical procedure (e.g., laparoscopic surgery, microscopically controlled surgery, or endoscopy), punch biopsy, endoscopic biopsy, or needle biopsy (e.g., a fine-needle aspiration, core needle biopsy, vacuum-assisted biopsy, or image-guided biopsy).
A sample of lymph node or blood, in some embodiments, refers to a sample comprising cells, e.g., cells from a blood sample or lymph node sample. In some embodiments, the sample comprises non-cancerous cells. In some embodiments, the sample comprises pre-cancerous cells. In some embodiments, the sample comprises cancerous cells. In some embodiments, the sample comprises blood cells. In some embodiments, the sample comprises lymph node cells. In some embodiments, the sample comprises lymph node cells and blood cells.
A sample of blood may be a sample of whole blood or a sample of fractionated blood. In some embodiments, the sample of blood comprises whole blood. In some embodiments, the sample of blood comprises fractionated blood. In some embodiments, the sample of blood comprises buffy coat. In some embodiments, the sample of blood comprises serum. In some embodiments, the sample of blood comprises plasma. In some embodiments, the sample of blood comprises a blood clot.
In some embodiments, a sample of blood is collected to obtain the cell-free nucleic acid (e.g., cell-free DNA) in the blood.
In some embodiments, the sample may be from a cancerous tissue or an organ or a tissue or organ suspected of having one or more cancerous cells. In some embodiments, the sample may be from a healthy (e.g., non-cancerous) tissue or organ. In some embodiments, a sample from a subject (e.g., a biopsy from a subject) may include both healthy and cancerous cells and/or tissue. In certain embodiments, one sample will be taken from a subject for analysis. In some embodiments, more than one (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more) samples may be taken from a subject for analysis. In some embodiments, one sample from a subject will be analyzed. In certain embodiments, more than one (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more) samples may be analyzed. If more than one sample from a subject is analyzed, the samples may be procured at the same time (e.g., more than one sample may be taken in the same procedure), or the samples may be taken at different times (e.g., during a different procedure including a procedure 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 days; 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 weeks; 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 months, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 years, or 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 decades after a first procedure). A second or subsequent sample may be taken or obtained from the same region (e.g., from the same tumor or area of tissue) or a different region (including, e.g., a different tumor). A second or subsequent sample may be taken or obtained from the subject after one or more treatments, and may be taken from the same region or a different region. 
As a non-limiting example, the second or subsequent sample may be useful in determining whether the cancer in each sample has different characteristics (e.g., in the case of samples taken from two physically separate tumors in a patient) or whether the cancer has responded to one or more treatments (e.g., in the case of two or more samples from the same tumor prior to and subsequent to a treatment).
Any of the biological samples described herein may be obtained from the subject using any known technique. See, for example, the following publications on collecting, processing, and storing biological samples, each of which is incorporated by reference herein in its entirety: Biospecimens and biorepositories: from afterthought to science by Vaught et al. (Cancer Epidemiol Biomarkers Prev. 2012 February; 21(2):253-5), and Biological sample collection, processing, storage and information management by Vaught and Henderson (IARC Sci Publ. 2011; (163):23-42).
Any of the biological samples from a subject described herein may be stored using any method that preserves stability of the biological sample. In some embodiments, preserving the stability of the biological sample means inhibiting components (e.g., DNA, RNA, protein, or tissue structure or morphology) of the biological sample from degrading until they are measured so that when measured, the measurements represent the state of the sample at the time of obtaining it from the subject. In some embodiments, a biological sample is stored in a composition that is able to penetrate the same and protect components (e.g., DNA, RNA, protein, or tissue structure or morphology) of the biological sample from degrading. As used herein, degradation is the transformation of a component from one form to another form such that the first form is no longer detected at the same level as before degradation.
In some embodiments, the biological sample is stored using cryopreservation. Non-limiting examples of cryopreservation include step-down freezing, blast freezing, direct plunge freezing, snap freezing, slow freezing using a programmable freezer, and vitrification. In some embodiments, the biological sample is stored using lyophilization. In some embodiments, after the collection of the biological sample from the subject, the biological sample is placed into a container that already contains a preservant (e.g., RNALater to preserve RNA) and then frozen (e.g., by snap-freezing). In some embodiments, such storage in a frozen state is done immediately after collection of the biological sample. In some embodiments, a biological sample may be kept at either room temperature or 4° C. for some time (e.g., up to an hour, up to 8 h, or up to 1 day, or a few days) in a preservant or in a buffer without a preservant, before being frozen.
Non-limiting examples of preservants include formalin solutions, formaldehyde solutions, RNALater or other equivalent solutions, TriZol or other equivalent solutions, DNA/RNA Shield or equivalent solutions, EDTA (e.g., Buffer AE (10 mM TrisCl; 0.5 mM EDTA, pH 9.0)) and other anticoagulants, and Acid Citrate Dextrose (e.g., for blood specimens).
In some embodiments, special containers may be used for collecting and/or storing a biological sample. For example, a vacutainer may be used to store blood. In some embodiments, a vacutainer may comprise a preservant (e.g., a coagulant, or an anticoagulant). In some embodiments, a container in which a biological sample is preserved may be contained in a secondary container, for the purpose of better preservation, or for the purpose of avoiding contamination.
Any of the biological samples from a subject described herein may be stored under any condition that preserves stability of the biological sample. In some embodiments, the biological sample is stored at a temperature that preserves stability of the biological sample. In some embodiments, the sample is stored at room temperature (e.g., 25° C.). In some embodiments, the sample is stored under refrigeration (e.g., 4° C.). In some embodiments, the sample is stored under freezing conditions (e.g., −20° C.). In some embodiments, the sample is stored under ultralow temperature conditions (e.g., −50° C. to −80° C.). In some embodiments, the sample is stored under liquid nitrogen (e.g., −170° C.). In some embodiments, a biological sample is stored at −60° C. to −80° C. (e.g., −70° C.) for up to 5 years (e.g., up to 1 month, up to 2 months, up to 3 months, up to 4 months, up to 5 months, up to 6 months, up to 7 months, up to 8 months, up to 9 months, up to 10 months, up to 11 months, up to 1 year, up to 2 years, up to 3 years, up to 4 years, or up to 5 years). In some embodiments, a biological sample is stored as described by any of the methods described herein for up to 20 years (e.g., up to 5 years, up to 10 years, up to 15 years, or up to 20 years).
Aspects of the disclosure relate to methods of determining an RC TME type of a subject using sequencing data or RNA expression data obtained from a biological sample from the subject.
The sequencing data may be obtained from the biological sample using any suitable sequencing technique and/or apparatus. In some embodiments, the sequencing apparatus used to sequence the biological sample may be selected from any suitable sequencing apparatus known in the art including, but not limited to, Illumina™, SOLiD™, Ion Torrent™, PacBio™, a nanopore-based sequencing apparatus, a Sanger sequencing apparatus, or a 454™ sequencing apparatus. In some embodiments, the sequencing apparatus used to sequence the biological sample is an Illumina sequencing (e.g., NovaSeq™, NextSeq™, HiSeq™, MiSeq™, or MiniSeq™) apparatus.
After the sequencing data is obtained, it is processed in order to obtain the RNA expression data. RNA expression data may be acquired using any method known in the art including, but not limited to: whole transcriptome sequencing, whole exome sequencing, total RNA sequencing, mRNA sequencing, targeted RNA sequencing, RNA exome capture sequencing, next generation sequencing, and/or deep RNA sequencing. In some embodiments, RNA expression data may be obtained using a microarray assay.
In some embodiments, the sequencing data is processed to produce RNA expression data. In some embodiments, RNA sequence data is processed by one or more bioinformatics methods or software tools, for example RNA sequence quantification tools (e.g., Kallisto) and genome annotation tools (e.g., Gencode v23), in order to produce expression data. The Kallisto software is described in Nicolas L Bray, Harold Pimentel, Pall Melsted and Lior Pachter, Near-optimal probabilistic RNA-seq quantification, Nature Biotechnology 34, 525-527 (2016), doi:10.1038/nbt.3519, which is incorporated by reference in its entirety herein.
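As an illustration of this processing step, the sketch below sums transcript-level abundances, as written by Kallisto to its `abundance.tsv` output (columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`), into gene-level values. The function name `gene_level_tpm`, the toy transcript identifiers, and the transcript-to-gene mapping are assumptions made for the example; in practice the mapping would be derived from an annotation such as Gencode.

```python
import csv
import io
from collections import defaultdict


def gene_level_tpm(abundance_tsv, transcript_to_gene):
    """Sum transcript-level TPM values into gene-level TPM values."""
    totals = defaultdict(float)
    reader = csv.DictReader(abundance_tsv, delimiter="\t")
    for row in reader:
        gene = transcript_to_gene.get(row["target_id"])
        if gene is None:
            continue  # transcript absent from the annotation mapping
        totals[gene] += float(row["tpm"])
    return dict(totals)


# Toy input in the abundance.tsv layout (values are made up).
example = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1000\t800\t80\t10.0\n"
    "tx2\t2000\t1800\t90\t5.0\n"
    "tx3\t1500\t1300\t130\t12.5\n"
)
mapping = {"tx1": "HLA-A", "tx2": "HLA-A", "tx3": "B2M"}
gene_tpm = gene_level_tpm(example, mapping)
```

Summing TPM across the transcripts of a gene is one common aggregation choice; other choices (for example, keeping only selected transcript types) are equally compatible with the description above.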
In some embodiments, microarray expression data is processed using a bioinformatics R package, such as “affy” or “limma”, in order to produce expression data. The “affy” software is described in Gautier L, Cope L, Bolstad B M, Irizarry R A, “affy—analysis of Affymetrix GeneChip data at the probe level,” Bioinformatics. 2004 Feb. 12; 20(3):307-15, doi: 10.1093/bioinformatics/btg405, PMID: 14960456, which is incorporated by reference herein in its entirety. The “limma” software is described in Ritchie M E, Phipson B, Wu D, Hu Y, Law C W, Shi W, Smyth G K, “limma powers differential expression analyses for RNA-sequencing and microarray studies,” Nucleic Acids Res. 2015 Apr. 20; 43(7):e47, https://doi.org/10.1093/nar/gkv007, PMID: 25605792, PMCID: PMC4402510, which is incorporated by reference herein in its entirety.
In some embodiments, sequencing data and/or expression data comprises more than 5 kilobases (kb). In some embodiments, the size of the obtained RNA data is at least 10 kb. In some embodiments, the size of the obtained RNA sequencing data is at least 100 kb. In some embodiments, the size of the obtained RNA sequencing data is at least 500 kb. In some embodiments, the size of the obtained RNA sequencing data is at least 1 megabase (Mb). In some embodiments, the size of the obtained RNA sequencing data is at least 10 Mb. In some embodiments, the size of the obtained RNA sequencing data is at least 100 Mb. In some embodiments, the size of the obtained RNA sequencing data is at least 500 Mb. In some embodiments, the size of the obtained RNA sequencing data is at least 1 gigabase (Gb). In some embodiments, the size of the obtained RNA sequencing data is at least 10 Gb. In some embodiments, the size of the obtained RNA sequencing data is at least 100 Gb. In some embodiments, the size of the obtained RNA sequencing data is at least 500 Gb.
In some embodiments, the expression data is acquired through bulk RNA sequencing. Bulk RNA sequencing may include obtaining expression levels for each gene across RNA extracted from a large population of input cells (e.g., a mixture of different cell types). In some embodiments, the expression data is acquired through single cell sequencing (e.g., scRNA-seq). Single cell sequencing may include sequencing individual cells.
In some embodiments, bulk sequencing data comprises at least 1 million reads, at least 5 million reads, at least 10 million reads, at least 20 million reads, at least 50 million reads, or at least 100 million reads. In some embodiments, bulk sequencing data comprises between 1 million reads and 5 million reads, 3 million reads and 10 million reads, 5 million reads and 20 million reads, 10 million reads and 50 million reads, 30 million reads and 100 million reads, or 1 million reads and 100 million reads (or any number of reads including, and between).
In some embodiments, the expression data comprises next-generation sequencing (NGS) data. In some embodiments, the expression data comprises microarray data.
Expression data (e.g., indicating expression levels) for a plurality of genes may be used for any of the methods or compositions described herein. The number of genes which may be examined may be up to and inclusive of all the genes of the subject. In some embodiments, expression levels may be determined for all of the genes of a subject. As a non-limiting example, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, eleven or more, twelve or more, 13 or more, 14 or more, 15 or more, 16 or more, 17 or more, 18 or more, 19 or more, 20 or more, 21 or more, 22 or more, 23 or more, 24 or more, 25 or more, 26 or more, 27 or more, 28 or more, 29 or more, 30 or more, 35 or more, 40 or more, 50 or more, 60 or more, 70 or more, 80 or more, 90 or more, 100 or more, 125 or more, 150 or more, 175 or more, 200 or more, 225 or more, 250 or more, 275 or more, or 300 or more genes may be used for any evaluation described herein. As another set of non-limiting examples, the expression data may include, for each gene group listed in Table 1, expression data for at least 5, at least 10, at least 15, at least 20, at least 25, at least 35, at least 50, at least 75, or at least 100 genes selected from each gene group.
In some embodiments, RNA expression data is obtained by accessing the RNA expression data from at least one computer storage medium on which the RNA expression data is stored. Additionally or alternatively, in some embodiments, RNA expression data may be received from one or more sources via a communication network of any suitable type. For example, in some embodiments, the RNA expression data may be received from a server (e.g., an SFTP server, or Illumina BaseSpace).
The RNA expression data obtained may be in any suitable format, as aspects of the technology described herein are not limited in this respect. For example, in some embodiments, the RNA expression data may be obtained in a text-based file (e.g., in a FASTQ, FASTA, BAM, or SAM format). In some embodiments, a file in which sequencing data is stored may contain quality scores of the sequencing data. In some embodiments, a file in which sequencing data is stored may contain sequence identifier information.
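To make the reference to quality scores and sequence identifiers concrete, the following sketch reads records from a FASTQ file, in which each record occupies four lines (identifier, sequence, separator, quality string). The function name `read_fastq` is hypothetical, and the Phred+33 quality encoding is an assumption; other encodings would need a different offset.

```python
import io


def read_fastq(handle, phred_offset=33):
    """Yield (identifier, sequence, quality_scores) for each four-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Convert each quality character to its numeric Phred score.
        scores = [ord(ch) - phred_offset for ch in quality]
        yield header[1:], sequence, scores


# Toy single-record input (made up for illustration).
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
```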
Expression data, in some embodiments, includes gene expression levels. Gene expression levels may be detected by detecting a product of gene expression such as mRNA and/or protein. In some embodiments, gene expression levels are determined by detecting a level of a mRNA in a sample. As used herein, the terms “determining” or “detecting” may include assessing the presence, absence, quantity and/or amount (which can be an effective amount) of a substance within a sample, including the derivation of qualitative or quantitative concentration levels of such substances, or otherwise evaluating the values and/or categorization of such substances in a sample from a subject.
Process 104 begins at act 200, where sequencing data is obtained from a biological sample obtained from a subject. The sequencing data is obtained by any suitable method, for example, using any of the methods described herein including in the Section titled “Biological Samples”.
In some embodiments, the bulk sequencing data obtained at act 104 comprises RNA-seq data. In some embodiments, the biological sample comprises blood or tissue. In some embodiments, the biological sample comprises one or more tumor cells, for example, one or more RC tumor cells.
Next, process 104 proceeds to act 202 where the sequencing data obtained at act 200 is normalized to transcripts per million (TPM) units. The normalization may be performed using any suitable software and in any suitable way. For example, in some embodiments, TPM normalization may be performed according to the techniques described in Wagner et al. (Theory Biosci. (2012) 131:281-285), which is incorporated by reference herein in its entirety. In some embodiments, the TPM normalization may be performed using a software package, such as, for example, the gcrma package. Aspects of the gcrma package are described in Wu J, Gentry RIwcfJMJ (2021). “gcrma: Background Adjustment Using Sequence Information. R package version 2.66.0.”, which is incorporated by reference in its entirety herein. In some embodiments, RNA expression level in TPM units for a particular gene may be calculated according to the following formula:
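The formula itself is not reproduced in this text. As a hedged reconstruction, the standard definition of TPM, which is algebraically equivalent to the form given by Wagner et al., may be written as follows. The symbols are assumptions introduced here: r_i is the number of reads mapped to gene i, l_i is the length (or effective length) of gene i, and the sum runs over all genes j in the sample.

```latex
\[
\mathrm{TPM}_i \;=\; \frac{r_i / l_i}{\sum_{j} \left( r_j / l_j \right)} \times 10^{6}
\]
```

By construction, the TPM values of all genes in a sample sum to one million, which is what makes them comparable across samples of different sequencing depth.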
Next, process 104 proceeds to act 204, where the RNA expression levels in TPM units (as determined at act 202) may be log transformed. Process 104 is illustrative and there are variations. For example, in some embodiments, one or both of acts 202 and 204 may be omitted. Thus, in some embodiments, the RNA expression levels may not be normalized to transcripts per million units and may, instead, be converted to another type of unit (e.g., reads per kilobase million (RPKM) or fragments per kilobase million (FPKM) or any other suitable unit). Additionally or alternatively, in some embodiments, the log transformation may be omitted. Instead, no transformation may be applied in some embodiments, or one or more other transformations may be applied in lieu of the log transformation.
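Acts 202 and 204 can be sketched together as below. This is an illustration under stated assumptions rather than the disclosed implementation: it uses the standard TPM definition (length-normalized read counts rescaled to sum to one million), and the base-2 logarithm and pseudocount of 1 are choices made for the example, since the text does not fix either. The function names and toy values are hypothetical.

```python
import math


def counts_to_tpm(counts, lengths_kb):
    """Convert per-gene read counts to TPM using gene lengths in kilobases."""
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads were assigned to any gene")
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


def log_transform(tpm, pseudocount=1.0):
    """Apply log2(x + pseudocount); both the base and the pseudocount are assumptions."""
    return {gene: math.log2(value + pseudocount) for gene, value in tpm.items()}


# Toy counts and lengths (made up for illustration).
counts = {"VEGFA": 300, "KDR": 100, "FLT1": 100}
lengths_kb = {"VEGFA": 3.0, "KDR": 2.0, "FLT1": 1.0}
tpm = counts_to_tpm(counts, lengths_kb)
log_tpm = log_transform(tpm)
```

The pseudocount keeps genes with zero expression finite after the transformation; omitting the transformation, as some embodiments do, corresponds to using `tpm` directly.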
Expression data obtained by process 104 can include the sequence data generated by a sequencing protocol (e.g., the series of nucleotides in a nucleic acid molecule identified by next-generation sequencing, Sanger sequencing, etc.) as well as information contained therein (e.g., information indicative of source, tissue type, etc.) which may also be considered information that can be inferred or determined from the sequence data. In some embodiments, expression data obtained by process 104 can include information included in a FASTA file, a description and/or quality scores included in a FASTQ file, an aligned position included in a BAM file, and/or any other suitable information obtained from any suitable file.
Aspects of the disclosure relate to processing of expression data to determine one or more gene expression signatures (e.g., an RC TME signature). In some embodiments, expression data (e.g., RNA expression data) is processed using a computing device to determine the one or more gene expression signatures. In some embodiments, the computing device may be operated by a user such as a doctor, clinician, researcher, patient, or other individual. For example, the user may provide the expression data as input to the computing device (e.g., by uploading a file), and/or may provide user input specifying processing or other methods to be performed using the expression data.
In some embodiments, expression data may be processed by one or more software programs running on a computing device.
In some embodiments, methods described herein comprise an act of determining the RC TME signature comprising gene group scores for respective gene groups in the plurality of gene groups. In some embodiments, the RC TME signature comprises gene group scores for at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, or 33) of the gene groups listed in Table 1.
The number of genes in a gene group used to determine a gene group score may vary. In some embodiments, all RNA expression levels for all genes in a particular gene group may be used to determine a gene group score for the particular gene group. In other embodiments, RNA expression data for fewer than all genes may be used (e.g., RNA expression levels for at least two genes, at least three genes, at least five genes, between 2 and 10 genes, between 5 and 15 genes, between 3 and 30 genes, or any other suitable range within these ranges).
In some embodiments, an RC TME signature comprises a gene group score for the MHC I gene group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, or at least seven genes) in the MHC I gene group, which is defined by its constituent genes: HLA-A, HLA-B, HLA-C, B2M, TAP1, TAP2, NLRC5, TAPBP.
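The requirement that a gene group score draw on a minimum number of measured genes can be sketched as follows, using the MHC I gene list given above. The scoring step here is a placeholder: the arithmetic mean of the measured expression levels is an assumption made purely for illustration and is not presented as the scoring method of the disclosure. The function name `gene_group_score` and the toy expression values are likewise hypothetical.

```python
from statistics import mean

# Constituent genes of the MHC I gene group, as listed in the text.
MHC_I_GENES = ["HLA-A", "HLA-B", "HLA-C", "B2M", "TAP1", "TAP2", "NLRC5", "TAPBP"]


def gene_group_score(expression, group_genes, min_genes=3):
    """Score a gene group from the genes of that group present in `expression`.

    Placeholder scoring: arithmetic mean (an assumption, not the disclosed method).
    """
    measured = [expression[gene] for gene in group_genes if gene in expression]
    if len(measured) < min_genes:
        raise ValueError(
            f"only {len(measured)} of the required {min_genes} genes were measured"
        )
    return mean(measured)


# Toy log-transformed expression levels (made up); VEGFA is outside the group.
expression = {"HLA-A": 8.0, "HLA-B": 7.0, "B2M": 9.0, "VEGFA": 5.0}
score = gene_group_score(expression, MHC_I_GENES)
```

Any other per-group scoring function could be substituted for `mean` without changing the gene-selection logic.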
In some embodiments, an RC TME signature comprises a gene group score for the MHC II gene group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, or at least eight genes) in the MHC II gene group, which is defined by its constituent genes: HLA-DQA1, HLA-DMA, HLA-DRB1, HLA-DMB, CIITA, HLA-DPA1, HLA-DPB1, HLA-DRA, HLA-DQB1.
In some embodiments, an RC TME signature comprises a gene group score for the Coactivation molecules group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, or at least ten genes) in the Coactivation molecules group, which is defined by its constituent genes: CD28, CD40, TNFRSF4, ICOS, TNFRSF9, CD27, CD80, CD86, CD40LG, CD83, TNFSF4, ICOSLG, TNFSF9, CD70.
In some embodiments, an RC TME signature comprises a gene group score for the Effector cells group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, at least ten genes, or more than ten genes) in the Effector cells group, which is defined by its constituent genes: IFNG, GZMA, GZMB, PRF1, GZMK, ZAP70, GNLY, FASLG, TBX21, EOMES, CD8A, CD8B.
In some embodiments, an RC TME signature comprises a gene group score for the T cell traffic group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, or at least eight genes) in the T cell traffic group, which is defined by its constituent genes: CXCL9, CCL3, CXCR3, CXCL10, CXCL11, CCL5, CCL4, CX3CL1, CX3CR1.
In some embodiments, an RC TME signature comprises a gene group score for the NK cells group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, at least ten genes, or more than ten genes) in the NK cells group, which is defined by its constituent genes: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226.
In some embodiments, an RC TME signature comprises a gene group score for the T cells group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, or at least ten genes) in the T cells group, which is defined by its constituent genes: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5.
In some embodiments, an RC TME signature comprises a gene group score for the B cells group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, at least ten genes, or more than ten genes) in the B cells group, which is defined by its constituent genes: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5.
In some embodiments, an RC TME signature comprises a gene group score for the M1 signature group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, or at least eight genes) in the M1 signature group, which is defined by its constituent genes: IL1B, IL12B, NOS2, SOCS3, IRF5, IL23A, TNF, IL12A, CMKLR1.
In some embodiments, an RC TME signature comprises a gene group score for the Th1 signature group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, or at least five genes) in the Th1 signature group, which is defined by its constituent genes: IL12RB2, IL2, TBX21, IFNG, STAT4, IL21, CD40LG.
In some embodiments, an RC TME signature comprises a gene group score for the Antitumor cytokines group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, or at least five genes) in the Antitumor cytokines group, which is defined by its constituent genes: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1.
In some embodiments, an RC TME signature comprises a gene group score for the Checkpoint inhibition group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, or at least nine genes) in the Checkpoint inhibition group, which is defined by its constituent genes: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1.
In some embodiments, an RC TME signature comprises a gene group score for the Treg group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, or at least six genes) in the Treg group, which is defined by its constituent genes: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8.
In some embodiments, an RC TME signature comprises a gene group score for the T reg traffic group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, or at least six genes) in the T reg traffic group, which is defined by its constituent genes: CCL28, CCR10, CCR4, CCR8, CCL17, CCL22, CCL1.
In some embodiments, an RC TME signature comprises a gene group score for the Neutrophil signature group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, or at least nine genes) in the Neutrophil signature group, which is defined by its constituent genes: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1.
In some embodiments, an RC TME signature comprises a gene group score for the Granulocyte traffic group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, or at least six genes) in the Granulocyte traffic group, which is defined by its constituent genes: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1.
In some embodiments, an RC TME signature comprises a gene group score for the MDSC group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, or at least six genes) in the MDSC group, which is defined by its constituent genes: IDO1, ARG1, IL10, CYBB, PTGS2, IL4I1, IL6.
In some embodiments, an RC TME signature comprises a gene group score for the MDSC traffic signature group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, or at least seven genes) in the MDSC traffic signature group, which is defined by its constituent genes: CCL15, IL6R, CSF2RA, CSF2, CXCL8, CXCL12, IL6, CSF3, CCL26, CXCR4, CXCR2, CSF3R, CSF1, CXCL5, CSF1R.
In some embodiments, an RC TME signature comprises a gene group score for the Macrophage group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, or at least seven genes) in the Macrophage group, which is defined by its constituent genes: MRC1, CD163, MSR1, SIGLEC1, IL4I1, CD68, IL10, CSF1R.
In some embodiments, an RC TME signature comprises a gene group score for the Macrophage DC traffic group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, or at least seven genes) in the Macrophage DC traffic group, which is defined by its constituent genes: CCL7, CCL2, XCR1, XCL1, CSF1, CCR2, CCL8, CSF1R.
In some embodiments, an RC TME signature comprises a gene group score for the Th2 signature group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, or at least five genes) in the Th2 signature group, which is defined by its constituent genes: IL13, CCR4, IL10, IL5, IL4.
In some embodiments, an RC TME signature comprises a gene group score for the Protumor cytokines group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, or at least seven genes) in the Protumor cytokines group, which is defined by its constituent genes: MIF, TGFB1, IL10, TGFB3, IL6, TGFB2, IL22.
In some embodiments, an RC TME signature comprises a gene group score for the Cancer associated fibroblast (CAF) group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, at least ten genes, or more than 10 genes) in the Cancer associated fibroblast (CAF) group, which is defined by its constituent genes: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2.
In some embodiments, an RC TME signature comprises a gene group score for the Matrix group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, at least ten genes, or more than 10 genes) in the Matrix group, which is defined by its constituent genes: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2.
In some embodiments, an RC TME signature comprises a gene group score for the Matrix-remodeling group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, at least ten genes, or more than 10 genes) in the Matrix-remodeling group, which is defined by its constituent genes: MMP1, PLOD2, MMP2, MMP12, ADAMTS5, ADAMTS4, LOX, MMP9, MMP11, MMP3, MMP1, CA9.
In some embodiments, an RC TME signature comprises a gene group score for the angiogenesis group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, at least ten genes, or more than ten genes) in the angiogenesis group, which is defined by its constituent genes: VEGFA, VEGFB, VEGFC, PDGFC, CXCL8, CXCR2, FLT1, PGF, KDR, ANGPT1, ANGPT2, TEK, VWF, CDH5.
In some embodiments, an RC TME signature comprises a gene group score for the endothelium group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, at least ten genes, or more than ten genes) in the endothelium group, which is defined by its constituent genes: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR.
In some embodiments, an RC TME signature comprises a gene group score for the Proliferation rate group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, or at least six genes) in the Proliferation rate group, which is defined by its constituent genes: MKI67, ESCO2, CETN3, CDK2, CCND1, CCNE1, AURKA, AURKB, E2F1, MYBL2, BUB1, CCNB1, MCM2, MCM6.
In some embodiments, an RC TME signature comprises a gene group score for the EMT signature group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, or all seven genes) in the EMT signature group, which is defined by its constituent genes: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, TWIST2, CDH2.
In some embodiments, an RC TME signature comprises a gene group score for the Cyclic Nucleotides Metabolism group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, at least ten genes, or more than 10 genes) in the Cyclic Nucleotides Metabolism group, which is defined by its constituent genes: ADCY4, PDE11A, PDE6A, PDE9A, PDE6C, ADCY7, PDE4A, PDE8A, PDE1B, PDE1A, GUCY2C, GUCY1A3, ADCY9, ADCY2, PDE6B, ADCY8, PDE8B, GUCY2F, PDE4C, PDE3A, GUCY1A2, PDE6G, PDE1C, GUCY2D, ADCY10, GUCY1B3, GUCY1B2, PDE7B, PDE5A, PDE6D, NPR2, ADCY5, NPR1, ADCY6, PDE7A, PDE2A, PDE4B, PDE10A, PDE6H, PDE4D, ADCY1, PDE3B, ADCY3.
In some embodiments, an RC TME signature comprises a gene group score for the Glycolysis and Gluconeogenesis group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, at least ten genes, or more than 10 genes) in the Glycolysis and Gluconeogenesis group, which is defined by its constituent genes: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2.
In some embodiments, an RC TME signature comprises a gene group score for the Citric Acid Cycle group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes in the Citric Acid Cycle group, which is defined by its constituent genes: ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA, ALDH5A1, CLYBL, SDHD, IDH3A, SLC25A1, ACSS2, SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, ALDH4A1, PCK2, DLD, ACO2, PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT.
In some embodiments, an RC TME signature comprises a gene group score for the Fatty Acid Metabolism group. In some embodiments, this gene group score may be calculated using RNA expression levels of at least three genes (e.g., at least three genes, at least four genes, at least five genes, at least six genes, at least seven genes, at least eight genes, at least nine genes, at least ten genes, or more than 10 genes) in the Fatty Acid Metabolism group, which is defined by its constituent genes: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
In some embodiments, determining an RC TME signature comprises determining a respective gene group score for each of at least two of the following gene groups, using, for a particular gene group, RNA expression levels for at least three genes (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, all genes in the gene group or any number therebetween) in the particular gene group to determine the gene group score for the particular group, the gene groups including: MHC I group: HLA-A, HLA-B, HLA-C, B2M, TAP1, TAP2, NLRC5, TAPBP; MHC II group: HLA-DQA1, HLA-DMA, HLA-DRB1, HLA-DMB, CIITA, HLA-DPA1, HLA-DPB1, HLA-DRA, HLA-DQB1; Coactivation molecules group: CD28, CD40, TNFRSF4, ICOS, TNFRSF9, CD27, CD80, CD86, CD40LG, CD83, TNFSF4, ICOSLG, TNFSF9, CD70; Effector cells group: IFNG, GZMA, GZMB, PRF1, GZMK, ZAP70, GNLY, FASLG, TBX21, EOMES, CD8A, CD8B; T cell traffic group: CXCL9, CCL3, CXCR3, CXCL10, CXCL11, CCL5, CCL4, CX3CL1, CX3CR1; NK cells group: GZMB, NKG7, CD160, GZMH, CD244, EOMES, KLRK1, NCR1, GNLY, KLRF1, FGFBP2, SH2D1B, KIR2DL4, IFNG, NCR3, KLRC2, CD226; T cells group: TRAC, TRBC2, TBX21, CD3E, CD3D, ITK, TRBC1, CD3G, CD28, TRAT1, CD5; B cells group: CR2, MS4A1, CD79A, FCRL5, STAP1, TNFRSF17, TNFRSF13B, CD19, BLK, CD79B, TNFRSF13C, CD22, PAX5; M1 signature group: IL1B, IL12B, NOS2, SOCS3, IRF5, IL23A, TNF, IL12A, CMKLR1; Th1 signature group: IL12RB2, IL2, TBX21, IFNG, STAT4, IL21, CD40LG; Antitumor cytokines group: IFNA2, CCL3, TNF, TNFSF10, IL21, IFNB1; Checkpoint inhibition group: CTLA4, HAVCR2, CD274, LAG3, BTLA, VSIR, PDCD1LG2, TIGIT, PDCD1; Treg group: TNFRSF18, IKZF2, IL10, IKZF4, CTLA4, FOXP3, CCR8; T reg traffic group: CCL28, CCR10, CCR4, CCR8, CCL17, CCL22, CCL1; Neutrophil signature group: FCGR3B, CD177, CTSG, PGLYRP1, FFAR2, CXCR2, PRTN3, ELANE, MPO, CXCR1; Granulocyte traffic group: CXCL8, CCR3, CXCR2, CXCL2, CCL11, KITLG, CXCL1, CXCL5, CXCR1; MDSC group: IDO1, ARG1, IL10, CYBB, PTGS2, IL4I1, IL6; MDSC traffic group: CCL15, IL6R,
CSF2RA, CSF2, CXCL8, CXCL12, IL6, CSF3, CCL26, CXCR4, CXCR2, CSF3R, CSF1, CXCL5, CSF1R; Macrophage group: MRC1, CD163, MSR1, SIGLEC1, IL4I1, CD68, IL10, CSF1R; Macrophage DC traffic group: CCL7, CCL2, XCR1, XCL1, CSF1, CCR2, CCL8, CSF1R; Th2 signature group: IL13, CCR4, IL10, IL5, IL4; Protumor cytokines group: MIF, TGFB1, IL10, TGFB3, IL6, TGFB2, IL22; Cancer associated fibroblast (CAF) group: PDGFRB, COL6A3, FBLN1, CXCL12, COL6A2, COL6A1, LUM, CD248, COL5A1, MMP2, COL1A1, MFAP5, PDGFRA, LRP1, FGF2, MMP3, FAP, COL1A2, ACTA2; Matrix group: COL11A1, LAMB3, FN1, COL1A1, COL4A1, ELN, LGALS9, LGALS7, LAMC2, TNC, LAMA3, COL3A1, COL5A1, VTN, COL1A2; Matrix-remodeling group: MMP1, PLOD2, MMP2, MMP12, ADAMTS5, ADAMTS4, LOX, MMP9, MMP11, MMP3, MMP1, CA9; Angiogenesis group: VEGFA, VEGFB, VEGFC, PDGFC, CXCL8, CXCR2, FLT1, PGF, KDR, ANGPT1, ANGPT2, TEK, VWF, CDH5; endothelium group: NOS3, MMRN1, FLT1, CLEC14A, MMRN2, VCAM1, ENG, VWF, CDH5, KDR; Proliferation rate group: MKI67, ESCO2, CETN3, CDK2, CCND1, CCNE1, AURKA, AURKB, E2F1, MYBL2, BUB1, CCNB1, MCM2, MCM6; EMT signature group: SNAI2, TWIST1, ZEB2, SNAI1, ZEB1, TWIST2, CDH2; Cyclic Nucleotides Metabolism group: ADCY4, PDE11A, PDE6A, PDE9A, PDE6C, ADCY7, PDE4A, PDE8A, PDE1B, PDE1A, GUCY2C, GUCY1A3, ADCY9, ADCY2, PDE6B, ADCY8, PDE8B, GUCY2F, PDE4C, PDE3A, GUCY1A2, PDE6G, PDE1C, GUCY2D, ADCY10, GUCY1B3, GUCY1B2, PDE7B, PDE5A, PDE6D, NPR2, ADCY5, NPR1, ADCY6, PDE7A, PDE2A, PDE4B, PDE10A, PDE6H, PDE4D, ADCY1, PDE3B, ADCY3; Glycolysis and Gluconeogenesis group: SLC2A9, PFKL, GCK, PFKFB4, SLC16A7, PCK1, PGAM2, GAPDH, BPGM, G6PC2, FBP2, LDHD, SLC2A3, GPI, ENO1, SLC25A11, PFKFB3, PFKM, LDHAL6B, SLC2A2, G6PC3, SLC2A6, GAPDHS, SLC2A11, PCK2, PFKP, PGK1, ALDOC, SLC2A10, ACYP2, SLC2A4, PKLR, HKDC1, PGK2, SLC2A8, PGAM1, SLC5A1, SLC5A12, SLC16A1, ALDOB, HK3, HK1, SLC5A9, GPD2, PFKFB1, SLC2A7, SLC5A11, SLC5A3, ACYP1, SLC16A8, PFKFB2, ALDOA, SLC5A2, HK2, ENO3, SLC2A12, FBP1, LDHA, LDHB, LDHC, G6PC, SLC2A14, SLC5A8, TPI1, SLC16A3, PKM2, 
ENO2, PGM1, UEVLD, LDHAL6A, SLC2A1, PGM2; Citric Acid Cycle group: ACLY, FAH, PC, MDH1B, SLC16A7, IREB2, PCK1, MDH1, SLC33A1, ALDH1B1, IDH3B, DLST, PDHB, MDH2, ACO1, IDH1, SLC5A6, HICDH, SLC16A8, GOT1, ME3, ME1, CS, OGDH, SDHA, ALDH5A1, CLYBL, SDHD, IDH3A, SLC25A1, ACSS2, SDHC, ACSS1, SUCLA2, SLC13A5, PDHX, SDHB, ALDH4A1, PCK2, DLD, ACO2, PDHA1, SLC13A2, FAHD1, IDH2, GOT2, ME2, ADSL, SUCLG2, SLC13A3, SUCLG1, SLC25A10, FH, IDH3G, SLC16A1, SLC25A11, PDHA2, DLAT; and Fatty Acid Metabolism group: MLYCD, ALDH3A2, SLC27A5, SLC27A3, LIPC, SLC27A2, ACSL4, ACSL1, PCCB, SLC25A20, AADAC, SLC22A4, SLC22A5, ECH1, PCCA, SLC27A1, SLC27A4, CROT, ACSL5, ACSL3, CYP4F12.
A list of gene groups is provided in Table 1 below:
As described above, aspects of the disclosure relate to determining an RC TME signature for a subject. That signature may include gene group scores (e.g., gene group scores generated using RNA expression data for some or all of the gene groups listed in Table 1). Aspects of determining these signatures are described next.
In some embodiments, an RC TME signature comprises gene group scores generated using a gene set enrichment analysis (GSEA) technique to determine a gene group score for one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, or 37) gene groups listed in Table 1. In some embodiments, each gene group score is generated using RNA expression levels for at least some of the genes in each gene group. In some embodiments, using a GSEA technique comprises using single-sample GSEA. Aspects of single-sample GSEA (ssGSEA) are described in Barbie et al. Nature. 2009 Nov. 5; 462(7269): 108-112, the entire contents of which are incorporated by reference herein. In some embodiments, ssGSEA is performed according to the following formula:
where r_i represents the rank of the i-th gene in the expression matrix, where N represents the number of genes in the gene set (e.g., the number of genes in the first gene group when ssGSEA is being used to determine a score for the first gene group using expression levels of the genes in the first gene group), and where M represents the total number of genes in the expression matrix. Additional suitable techniques of performing GSEA are known in the art and are contemplated for use in the methods described herein without limitation. In some embodiments, an RC TME signature is calculated by performing ssGSEA on expression data from a plurality of subjects, for example expression data from one or more cohorts of subjects, such as: KIRC, JAVELIN101, Immotion150, Immotion151, ICGC RECA EU, PUB_KIRC_CPTAC3, PUB_RCC_VanAllen_phs001493, WU_ccRCC_RCCTC, PUB_ccRCC_Chinese_2020, ccRCC_CheckMates_2020, PUB_ccRCC_Sato_2013, COMPARZ, E-MTAB-3267, GSE2109, GSE53757, GSE73731, and GSE40435, in order to produce a plurality of gene group scores.
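By way of illustration only, a rank-based single-sample score built from the three quantities defined above (the ranks r_i, the gene set size N, and the total gene count M) might be sketched as follows. The specific expression used here, including the 0.25 exponent and the centering term, is an assumption chosen for illustration and is not asserted to be the formula of this disclosure; the function name `ssgsea_score` and the dictionary-based input are likewise hypothetical.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Illustrative rank-based enrichment score for one sample and one gene group.

    expression: dict mapping gene symbol -> expression level for one sample.
    gene_set:   iterable of gene symbols forming the gene group.
    alpha:      rank-weighting exponent (assumed value, not taken from the text).
    """
    # Rank all M genes from lowest (rank 1) to highest (rank M) expression;
    # ties are broken by gene symbol so the result is deterministic.
    ordered = sorted(expression, key=lambda g: (expression[g], g))
    rank = {gene: i + 1 for i, gene in enumerate(ordered)}
    m_total = len(ordered)

    # r_i for the N gene-set members that are present in the expression data.
    ranks = [rank[g] for g in sorted(set(gene_set)) if g in rank]
    n_set = len(ranks)
    if n_set == 0:
        raise ValueError("no gene-set members found in the expression data")

    # Assumed form: rank-weighted mean rank of the gene set, centered by
    # (M - N + 1) / 2 so that highly ranked gene sets score higher.
    weighted = sum(r ** (1 + alpha) for r in ranks) / sum(r ** alpha for r in ranks)
    return weighted - (m_total - n_set + 1) / 2
```

Under this sketch, a gene group whose members sit near the top of the sample's expression ranking receives a higher score than a group whose members sit near the bottom, and genes absent from the expression data are ignored.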
The number of genes in a gene group used to determine a gene group expression score may vary. In some embodiments, all RNA expression levels for all genes in a particular gene group may be used to determine a gene group score for the particular gene group. In other embodiments, RNA expression data for fewer than all genes may be used (e.g., RNA expression levels for at least two genes, at least three genes, at least five genes, between 2 and 10 genes, between 5 and 15 genes, or any other suitable range within these ranges).
In some embodiments, RNA expression levels for a particular gene group may be embodied in at least one data structure having fields storing the expression levels. The data structure or data structures may be provided as input to software comprising code that implements a GSEA technique (e.g., the ssGSEA technique) and processes the expression levels in the at least one data structure to compute a score for the particular gene group. In some embodiments, ssGSEA is performed on expression data comprising three or more (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, or 37) gene groups set forth in Table 1. In some embodiments, each of the gene groups separately comprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75 or more) genes listed in Table 1. In some embodiments, an RC TME signature is produced by performing ssGSEA on 33 of the gene groups in Table 1, each gene group including all listed genes in Table 1. In some embodiments, an RC TME signature is produced by performing ssGSEA on 37 of the gene groups in Table 1, each gene group including all listed genes in Table 1.
In some embodiments, one or more (e.g., a plurality) of enrichment scores are normalized in order to produce an RC TME signature for the expression data (e.g., expression data of the subject or of a cohort of subjects). In some embodiments, the enrichment scores are normalized by median scaling. In some embodiments, median scaling produces an RC TME signature of the subject. In some embodiments, median scaling comprises clipping the range of enrichment scores, for example clipping to about −1.0 to about +1.0, about −2.0 to about +3.0, about −3.0 to about +3.0, about −4.0 to about +4.0, or about −5.0 to about +5.0.
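As a non-limiting sketch of the normalization described above, the following example centers a set of enrichment scores for one gene group on their median, scales them by the median absolute deviation (MAD), and clips the result. Dividing by the MAD is an assumption made for illustration, since the text does not define the scaling factor; the function name `median_scale` and the default clipping range of −3.0 to +3.0 (one of the ranges listed above) are likewise illustrative.

```python
from statistics import median


def median_scale(scores, lower=-3.0, upper=3.0):
    """Median-center, MAD-scale, and clip enrichment scores for one gene group.

    scores: list of enrichment scores, one per sample in a cohort.
    lower, upper: clipping bounds applied after scaling.
    """
    center = median(scores)
    # Median absolute deviation; assumed here as the scaling factor.
    mad = median(abs(s - center) for s in scores)
    if mad == 0:
        raise ValueError("median absolute deviation is zero; cannot scale")
    return [min(max((s - center) / mad, lower), upper) for s in scores]
```

For example, the scores 1, 2, 3, 4, and 100 have a median of 3 and a MAD of 1, so the outlying score of 100 is scaled to 97 and then clipped to the upper bound.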
In some embodiments, an RC TME signature of a subject is processed using a clustering algorithm to identify an RC tumor microenvironment type (e.g., an RC TME type). In some embodiments, the clustering comprises unsupervised clustering. In some embodiments, the unsupervised clustering comprises a dense clustering approach. In some embodiments, an RC TME signature of a subject is compared to pre-existing clusters of RC TME types and assigned an RC TME type based on that comparison. In some embodiments, clustering comprises generating a graph with samples at nodes and correlation of the ssGSEA scores at edges. In some embodiments, each node has 75 neighbors. In some embodiments, clustering further comprises applying the Leiden algorithm to the resulting graph.
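The graph construction described above may be sketched, under stated assumptions, as a nearest-neighbor graph in which each sample is linked to the samples whose signatures correlate most strongly with its own. The helper names `pearson` and `knn_correlation_graph` are hypothetical, Pearson correlation is assumed as the correlation measure, and the Leiden step itself is not shown because it would typically be delegated to a graph library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("constant signature; correlation is undefined")
    return cov / (sd_x * sd_y)


def knn_correlation_graph(signatures, k=75):
    """Build an undirected graph linking each sample to its k most correlated samples.

    signatures: dict mapping sample id -> list of gene group scores.
    Returns a dict mapping frozenset({a, b}) -> correlation used as edge weight.
    """
    ids = sorted(signatures)
    k = min(k, len(ids) - 1)  # a node cannot have more neighbors than other samples
    edges = {}
    for a in ids:
        sims = [(pearson(signatures[a], signatures[b]), b) for b in ids if b != a]
        sims.sort(key=lambda item: (-item[0], item[1]))  # strongest correlation first
        for weight, b in sims[:k]:
            edges[frozenset((a, b))] = weight
    return edges
```

The returned weighted edge list is the kind of input that a community detection routine such as a Leiden implementation would then partition into clusters.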
Some aspects of determining gene group scores for gene groups are also described in U.S. Patent Publication No. 2020-0273543, entitled “SYSTEMS AND METHODS FOR GENERATING, VISUALIZING AND CLASSIFYING MOLECULAR FUNCTIONAL PROFILES”, the entire contents of which are incorporated by reference herein.
Techniques for generating RC TME clusters are described herein. It should be appreciated that the RC TME clusters may be updated as additional RC TME signatures are computed for patients. In some embodiments, the RC TME signature of the subject is one of a threshold number of RC TME signatures for a threshold number of subjects. In some embodiments, when the threshold number of RC TME signatures is generated, the RC TME signature clusters are updated. For example, once a threshold number of new RC TME signatures are obtained (e.g., 1 new signature, 10 new signatures, 100 new signatures, 500 new signatures, or any suitable threshold number of signatures in the range of 10-1,000 signatures), the new signatures may be combined with the RC TME signatures previously used to generate the RC TME clusters and the combined set of old and new RC TME signatures may be clustered again (e.g., using any of the clustering algorithms described herein or any other suitable clustering algorithm) to obtain an updated set of RC TME signature clusters.
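The threshold-triggered update described above may be sketched as follows. The class name `ClusterUpdater` and its interface are hypothetical, and the clustering routine is passed in as a function so that any of the clustering algorithms described herein could be substituted; this is an illustrative arrangement rather than a required implementation.

```python
class ClusterUpdater:
    """Re-cluster stored RC TME signatures once enough new ones accumulate."""

    def __init__(self, cluster_fn, initial_signatures, threshold=100):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.cluster_fn = cluster_fn          # callable: list of signatures -> clusters
        self.signatures = list(initial_signatures)
        self.pending = []                     # new signatures not yet clustered
        self.threshold = threshold
        self.clusters = cluster_fn(self.signatures)

    def add(self, signature):
        """Queue a new signature; return True if the clusters were updated."""
        self.pending.append(signature)
        if len(self.pending) < self.threshold:
            return False
        # Combine old and new signatures and cluster the combined set again.
        self.signatures.extend(self.pending)
        self.pending.clear()
        self.clusters = self.cluster_fn(self.signatures)
        return True
```

With a threshold of 1, the clusters are regenerated for every new signature; larger thresholds trade update frequency for computation.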
In this way, data obtained from a future patient may be analyzed in a way that takes advantage of information learned from patients whose RC TME signature was computed prior to that of the future patient. In this sense, the machine learning techniques described herein (e.g., the unsupervised clustering machine learning techniques) are adaptive and learn with the accumulation of new patient data. This facilitates improved characterization of the RC TME type that future patients may have and may improve the selection of treatment for those patients.
Aspects of the disclosure relate to methods for generating a myogenesis signature for a subject. The disclosure is based, in part, on the recognition that a myogenesis signature calculated as described herein can be used, in some embodiments, to identify subjects that have an increased likelihood of being a non-responder to treatment with immuno-oncology (IO) agents. In some embodiments, a myogenesis signature is generated using RNA expression data for at least some (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14) of the genes listed in Table 2.
In some embodiments, a myogenesis signature comprises a myogenesis gene group score. In some embodiments, a myogenesis gene group score is generated using RNA expression levels of at least some (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14) of the genes listed in Table 2. In some embodiments, generating the myogenesis gene group score comprises performing GSEA (e.g., ssGSEA) using RNA expression data for two or more genes listed in Table 2. In some embodiments, median scaling is performed on the gene expression (e.g., gene enrichment) scores resulting from the GSEA. A myogenesis signature may be expressed as a score ranging from −3 to 20 (e.g., −3, −2, −1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20). In some embodiments, a subject having a myogenesis signature (score) higher than 4 is considered as having a “high myogenesis score”. In some embodiments, a subject having a myogenesis signature (score) higher than 8 is considered as having a “high myogenesis score”. In some embodiments, a subject having a myogenesis signature (score) higher than 10 is considered as having a “high myogenesis score”. In some embodiments, a subject having a myogenesis signature (score) higher than 15 is considered as having a “high myogenesis score”. In some embodiments, a subject having a “high” myogenesis signature (score) is considered to be a “non-responder to IO therapy”. A “non-responder to IO therapy” is a subject having RC (e.g., ccRCC) that is significantly less likely to respond to treatment with an immuno-therapeutic (IO) agent relative to a subject (e.g., an RC subject) not having a “high” myogenesis signature (score).
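The cutoff logic described above can be illustrated with a short sketch. The function name `is_high_myogenesis` is hypothetical; the default cutoff of 4 is one of the cutoffs recited above (4, 8, 10, or 15), and which cutoff applies in a given embodiment is left as a parameter.

```python
def is_high_myogenesis(score, cutoff=4):
    """Return True when a myogenesis signature score exceeds the chosen cutoff.

    score:  myogenesis signature (score) computed for the subject.
    cutoff: value above which the score is considered "high" (e.g., 4, 8, 10, or 15).
    """
    # "Higher than" the cutoff is a strict comparison.
    return score > cutoff
```

A subject whose score equals the cutoff is therefore not treated as having a high myogenesis score under this sketch.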
The number of genes in the gene group used to determine the myogenesis signature may vary. In some embodiments, all RNA expression levels for all genes in a myogenesis signature gene group may be used to determine the myogenesis gene group score. In other embodiments, RNA expression data for fewer than all genes may be used (e.g., RNA expression levels for 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 genes). A list of genes in the Myogenesis Signature gene group is provided in Table 2 below:
As described herein, in some embodiments, one of a plurality of different RC TME types may be identified for the subject using the RC TME signature determined for the subject using the techniques described herein. In some embodiments, the RC TME type is RC TME type A, RC TME type B, RC TME type C, RC TME type D, or RC TME type E.
In some embodiments, each of the plurality of RC TME types is associated with a respective RC TME signature cluster in a plurality of RC TME signature clusters. The RC TME type for a subject may be determined by: (1) associating the RC TME signature of the subject with a particular one of the plurality of RC TME signature clusters; and (2) identifying the RC TME type for the subject as the RC TME type corresponding to the particular one of the plurality of RC TME signature clusters to which the RC TME signature of the subject is associated.
In some embodiments, the RC TME signature clusters may be generated by: (1) obtaining RC TME signatures (using the techniques described herein) for a plurality of subjects; and (2) clustering the RC TME signatures so obtained into the plurality of clusters. Any suitable clustering technique may be used for this purpose including, but not limited to, a dense clustering algorithm, spectral clustering algorithm, k-means clustering algorithm, hierarchical clustering algorithm, and/or an agglomerative clustering algorithm.
For example, inter-sample similarity may be calculated using a Pearson correlation. A distance matrix may be converted into a graph where each sample forms a node and two nodes form an edge with a weight equal to their Pearson correlation coefficient. Edges with weight lower than a specified threshold may be removed. A Louvain community detection algorithm may be applied to calculate graph partitioning into clusters. To mathematically determine the optimum weight threshold for observed clusters, minimum Davies-Bouldin, maximum Calinski-Harabasz, and Silhouette techniques may be employed. Separations with low-populated clusters (<5% of samples) may be excluded.
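The thresholding and low-population filtering steps described above may be sketched as follows. For brevity, this sketch partitions the thresholded graph into connected components, which is a simplified stand-in for Louvain community detection and not an implementation of it; the function names and the example threshold are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("constant signature; correlation is undefined")
    return cov / (sd_x * sd_y)


def threshold_components(signatures, min_weight):
    """Drop edges below min_weight, then split the graph into connected components.

    signatures: dict mapping sample id -> list of gene group scores.
    Returns a list of clusters, each a sorted list of sample ids.
    """
    ids = sorted(signatures)
    neighbors = {s: set() for s in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if pearson(signatures[a], signatures[b]) >= min_weight:
                neighbors[a].add(b)
                neighbors[b].add(a)
    clusters, seen = [], set()
    for start in ids:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


def has_small_cluster(clusters, min_fraction=0.05):
    """Return True if any cluster holds fewer than min_fraction of all samples."""
    total = sum(len(c) for c in clusters)
    return any(len(c) < min_fraction * total for c in clusters)
```

In such a sketch, candidate thresholds whose partitions contain a low-populated cluster would be discarded before the remaining partitions are compared using cluster-quality measures such as those named above.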
Accordingly, in some embodiments, generating the RC TME signature clusters involves: (A) obtaining multiple sets of RNA expression data obtained by sequencing biological samples from multiple respective subjects, each of the multiple sets of RNA expression data indicating RNA expression levels for genes in a first plurality of gene groups (e.g., one or more of the gene groups in Table 1); (B) generating multiple RC TME signatures from the multiple sets of RNA expression data, each of the multiple RC TME signatures comprising gene group scores for respective gene groups, the generating comprising, for each particular one of the multiple RC TME signatures, determining the RC TME signature by determining the gene group scores using the RNA expression levels in the particular set of RNA expression data for which the particular one RC TME signature is being generated; and (C) clustering the multiple RC TME signatures to obtain the plurality of RC TME signature clusters.
The resulting RC TME signature clusters may each contain any suitable number of RC TME signatures (e.g., at least 10, at least 100, at least 500, at least 1000, at least 5000, between 100 and 10,000, between 500 and 20,000, or any other suitable range within these ranges), as aspects of the technology described herein are not limited in this respect. The number of RC TME signature clusters in this example is five. Although, in some embodiments, the number of clusters may be different, it should be appreciated that an important aspect of the present disclosure is the inventors' discovery that RC may be characterized into five types based upon the generation of RC TME signatures using methods described herein.
In some embodiments, a subject's RC TME signature may be associated with one of five RC TME signature clusters by using a machine learning technique (e.g., k-nearest neighbors (KNN) or any other suitable classifier) to assign the RC TME signature to one of the five RC TME signature clusters. The machine learning technique may be trained to assign RC TME signatures on the meta-cohorts represented by the signatures in the clusters.
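A minimal k-nearest neighbors sketch of the assignment described above is given below. Euclidean distance, the default of five neighbors, and the tie-breaking rule are assumptions made for illustration, and the function name `knn_assign` is hypothetical.

```python
from collections import Counter
from math import dist


def knn_assign(signature, labeled_signatures, k=5):
    """Assign an RC TME type by majority vote among the k nearest labeled signatures.

    signature:          list of gene group scores for the subject.
    labeled_signatures: list of (signature, rc_tme_type) pairs from the reference clusters.
    """
    if not labeled_signatures:
        raise ValueError("at least one labeled signature is required")
    # Sort reference signatures by Euclidean distance to the subject's signature.
    nearest = sorted(labeled_signatures, key=lambda pair: dist(signature, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    # On a tied vote, prefer the tied label whose member is closest to the subject.
    for _, label in nearest:
        if votes[label] == top:
            return label
```

For instance, a signature lying near reference signatures labeled type A would be assigned type A even when a minority of its nearest neighbors carry another label.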
In some embodiments, RC TME types include RC TME type A, RC TME type B, RC TME type C, RC TME type D, and RC TME type E. The RC TME types described herein may be described by qualitative characteristics, for example high signals for certain gene expression signatures or scores or low signals for certain other gene expression signatures or scores. In some embodiments, a “high” signal refers to a gene expression signal or score (e.g., an enrichment score) that is at least 1-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 20-fold, 50-fold, 100-fold, 1000-fold, or more increased relative to the score of the same gene or gene group in a subject having a different type of RC. In some embodiments, a “low” signal refers to a gene expression signal or score (e.g., an enrichment score) that is at least 1-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 20-fold, 50-fold, 100-fold, 1000-fold, or more decreased relative to the score of the same gene or gene group in a subject having a different type of RC TME.
In some embodiments, a subject is identified as having “Immune-enriched, fibrotic (IE/F)”, also referred to as “RC TME type A” RC. In some embodiments, RC TME type A is characterized by a high prevalence of immune cells and a high percentage of cancer-associated fibroblasts (CAF) relative to other RC TME types. In some embodiments, RC TME type A comprises abundant pro-tumor immune-suppressive infiltrate, including a significant number of regulatory T cells. In some embodiments, the percentage of malignant cells in RC TME type A is low relative to other RC TME types. In some embodiments, mutations in tumor suppressor BAP1 are frequent in RC TME type A. In some embodiments, subjects having RC TME type A are responsive to immune checkpoint inhibitors, alone or in combination with tyrosine kinase inhibitors. In some embodiments, RC TME type A is characterized by a high tumor proliferation rate relative to other RC TME types. In some embodiments, subjects having RC TME type A have a poor prognosis relative to subjects having other RC TME types.
In some embodiments, a subject is characterized as having “Immune-enriched, non-fibrotic (IE)”, also referred to as “RC TME type B”. In some embodiments, RC TME type B is characterized by abundant immune-active infiltrate including cytotoxic effector cells, and low prevalence of stromal and fibrotic elements relative to other RC TME types. In some embodiments, RC TME type B is characterized by immune-induced inflammation. In some embodiments, subjects having RC TME type B comprise mutations in tumor suppressor BAP1. In some embodiments, subjects having RC TME type B are responsive to immune checkpoint inhibitors, alone or in combination with tyrosine kinase inhibitors.
In some embodiments, a subject is characterized as having “Fibrotic (F)”, also referred to as “RC TME type C”. In some embodiments, RC TME type C is highly fibrotic (relative to other RC TME types), with dense collagen formation. In some embodiments, RC TME type C is characterized as having less inflammation than certain other RC TME types. In some embodiments, RC TME type C is characterized by minimal leukocyte/lymphocyte infiltration relative to other RC TME types. In some embodiments, cancer-associated fibroblasts (CAF) are abundant in RC TME type C. In some embodiments, signs of epithelial-mesenchymal transition (EMT) are present in subjects having RC TME type C. In some embodiments, RC TME type C is associated with poor prognosis relative to other RC TME types.
In some embodiments, a subject is characterized as having “Immune desert with metabolic content (D)”, also referred to as “RC TME type D”. In some embodiments, RC TME type D contains the highest malignant cell percentage relative to other RC TME types, and is characterized by minimal or complete absence of leukocyte/lymphocyte infiltration. In some embodiments, immune-mediated inflammation is not present. In some embodiments, signs of metabolic activation are present in subjects having RC TME type D. In some embodiments, RC TME type D is associated with a good prognosis relative to other RC TME types.
In some embodiments, a subject is characterized as having “Angiogenic, non-inflamed”, also referred to as “RC TME type E”. In some embodiments, RC TME type E is characterized by intense angiogenesis and low levels of immune infiltrate relative to other RC TME types. In some embodiments, signs of epithelial-mesenchymal transition (EMT) are present in subjects having RC TME type E. In some embodiments, RC TME type E is associated with low cancer stages and usually does not need to be treated. In some embodiments, subjects having RC TME type E are often responsive to tyrosine kinase inhibitors (TKIs). In some embodiments, RC TME type E is associated with good prognosis relative to other RC TME types.
In some embodiments, the present disclosure provides methods for identifying an RC subject's prognosis using an RC TME signature generated using methods described herein.
In some embodiments, the methods comprise identifying the subject as having a decreased risk of RC progression relative to other RC TME types when the subject is assigned RC TME type E or RC TME type D. In some embodiments, “decreased risk of RC progression” may indicate better prognosis of RC or decreased likelihood of having advanced disease in a subject. In some embodiments, “decreased risk of RC progression” may indicate that the subject who has RC is expected to be more responsive to certain treatments. For instance, “decreased risk of RC progression” indicates that a subject is at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% less likely to experience a progression-free survival event (e.g., relapse, retreatment, or death) than another RC patient or population of RC patients (e.g., patients having RC, but not the same RC TME type as the subject).
In some embodiments, the methods further comprise identifying the subject as having an increased risk of RC progression relative to other RC TME types when the subject is assigned an RC TME type other than RC TME type E, for example RC TME type A. In some embodiments, “increased risk of RC progression” may indicate less positive prognosis of RC or increased likelihood of having advanced disease in a subject. In some embodiments, “increased risk of RC progression” may indicate that the subject who has RC is expected to be less responsive or unresponsive to certain treatments and show less or no improvement of disease symptoms. For instance, “increased risk of RC progression” indicates that a subject is at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% more likely to experience a progression-free survival event (e.g., relapse, retreatment, or death) than another RC patient or population of RC patients (e.g., patients having RC, but not the same RC TME type as the subject).
Aspects of the disclosure relate to methods for determining whether or not a subject having RC (e.g., ccRCC) is likely to respond to certain therapeutic agents, such as immune-therapeutic (IO) agents or TKIs.
In some embodiments, the therapeutic agents are immuno-oncology (IO) agents. An IO agent may be a small molecule, peptide, protein (e.g., antibody, such as monoclonal antibody), interfering nucleic acid, or a combination of any of the foregoing. In some embodiments, the IO agents comprise a PD1 inhibitor, PD-L1 inhibitor, or PD-L2 inhibitor. Examples of IO agents include but are not limited to cemiplimab, nivolumab, pembrolizumab, avelumab, durvalumab, atezolizumab, BMS1166, BMS202, ipilimumab, etc.
In some embodiments, the therapeutic agents are tyrosine kinase inhibitors (TKIs). A TKI may be a small molecule, peptide, protein (e.g., antibody, such as monoclonal antibody), interfering nucleic acid, or a combination of any of the foregoing. Examples of TKIs include but are not limited to Imatinib mesylate (Gleevec®), Dasatinib (Sprycel®), Nilotinib (Tasigna®), Bosutinib (Bosulif®), Sunitinib (Sutent®), etc.
Aspects of the disclosure relate to methods for determining the likelihood of a subject having RC (e.g., ccRCC) responding to an IO agent. The disclosure is based, in part, on the identification of certain subgroups of RC patients that comprise biomarkers indicative of their response to IO agents. In some embodiments, when it is determined (e.g., a subject is identified as having, using methods described herein) that the subject comprises one or more of the following biomarkers, that subject is unlikely to respond to IO therapy: high Ploidy (e.g., as calculated by RTumor bioinformatics analysis), a high Myogenic Signature (e.g., as described throughout the specification for example in the section entitled Myogenesis Signature), RC TME type E (as determined by methods described throughout the specification), presence of mTOR activating mutations, or presence of mutations in antigen presentation machinery. Examples of mTOR activating mutations include but are not limited to mutations in MTOR, mutations in TSC1/2, mutations in PTEN, and mutations in MET, and those described in Cancer Discov. 2014 May; 4(5):554-63. doi: 10.1158/2159-8290.CD-13-0929. Epub 2014 Mar. 14. Examples of mutations in antigen presentation machinery include but are not limited to mutations in PSMB5, PSMB6, PSMB7, PSMB8, PSMB9, PSMB10, TAP1, TAP2, ERAP1, ERAP2, CANX, CALR, PDIA3, TAPBP, B2M, HLA-A, HLA-B, and HLA-C. In some embodiments, a subject having one or more of the aforementioned biomarkers is referred to as an “IO non-responder”.
In some aspects, the disclosure provides a method for predicting the likelihood of a subject responding to an immuno-oncology (IO) agent by identifying an RC TME type for the subject using gene expression data for the subject, and then using a machine learning model to obtain a responder score which is indicative of the subject's likelihood of responding to IO. In some embodiments, the machine learning model comprises a gradient boosting model. In some embodiments, a machine learning model comprises a CatBoost classifier. In some embodiments, the machine learning model is trained using the following inputs from a plurality of samples (e.g., samples derived from a cohort of patients): RC TME type; expression level of PD1, PD-L1, and/or PD-L2 obtained from the gene expression data; an ECM associated signature (e.g., a gene group score generated using two or more, such as 2, 3, 4, 5, 6, or more genes from the ECM associated gene group of Table 1); an Angiogenesis signature (e.g., a gene group score generated using two or more, such as 2, 3, 4, 5, 6, or more genes from the Angiogenesis gene group of Table 1); a Proliferation rate signature (e.g., a gene group score generated using two or more, such as 2, 3, 4, 5, 6, or more genes from the Proliferation rate gene group of Table 1); and a similarity score produced by comparing the RC TME type identified for the subject to gene group scores of RC TME type B and/or RC TME type C gene group scores from other subjects. In some embodiments, the similarity score is produced by comparing the gene group scores of the RC TME signature of the subject to an average of gene group scores of a plurality of RC TME signatures from RC TME type B samples and/or an average of gene group scores of a plurality of RC TME signatures from RC TME type C samples. In some embodiments, subjects identified as “IO non-responders” are excluded as inputs to the machine learning algorithms.
In some embodiments, an ECM associated signature comprises gene enrichment scores for 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 genes listed in Table 1. In some embodiments, an Angiogenesis signature comprises gene enrichment scores for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 genes listed in Table 1. In some embodiments, a Proliferation rate signature comprises gene enrichment scores for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 genes listed in Table 1.
A responder score (e.g., an IO responder score) produced for a subject is then compared to a specified threshold in order to determine whether or not a subject is likely to respond to an immuno-therapeutic agent. In some embodiments, the specified threshold is used to determine (e.g., classify) a subject as being “IO-low”, “IO-medium”, or “IO-high”. The value of the specified threshold may range from about 0.2 to about 0.8 units (e.g., 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, or any unit value therebetween). In some embodiments, a specified threshold ranges from about 0 to about 1 units. In some embodiments, a specified threshold is 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1.0. In some embodiments, a responder score is used to identify the subject as “IO-low” when the responder score is ≤0.05; “IO-medium” when the responder score is ≥0.05 and <0.5; or “IO-high” when the responder score is ≥0.5. In some embodiments, a subject identified as having a responder score above the specified threshold is identified as being likely to respond to treatment with an IO agent. In some embodiments, a subject having a responder score that is below a specified threshold (e.g., less than 0.5) is unlikely to respond positively to an IO therapy (e.g., the IO agent is unlikely to be therapeutically effective in the subject). In some embodiments, a responder score equal to or greater than 0.5 indicates that a subject is likely to respond positively to an IO therapy (e.g., the IO agent is likely to be therapeutically effective in the subject).
Turning to the Figures,
After the RC TME type of the subject has been identified, a machine learning model is used to obtain an output indicating a responder score (the responder score indicative of a likelihood that the subject responds to an IO agent) in process 706. In some embodiments, the obtaining comprises generating, using RNA expression data that has been obtained from the subject, a set of input features, the set of input features comprising at least two (e.g., 2, 3, 4, 5, or 6) of the following features: an RC TME type for the subject; RNA expression levels for one or more of the following genes: PD1, PD-L1, and PD-L2; an ECM associated signature for the subject; an Angiogenesis signature for the subject; a Proliferation rate signature for the subject; and a similarity score indicative of a similarity of an RC TME signature for the subject to RC TME signatures associated with RC TME type B and/or RC TME Type C samples in act 708.
In some embodiments, generating the set of input features comprises determining the RNA expression levels for one or more of the following genes: PD1, PD-L1, and PD-L2. In some embodiments, generating the set of input features comprises determining the ECM associated signature for the subject using the RNA expression data by performing ssGSEA on the RNA expression data for at least three (e.g., 3, 4, 5, 6, or all) of the “ECM associated signature” genes listed in Table 1 to produce an ECM associated gene group score. In some embodiments, generating the set of input features comprises determining the Angiogenesis signature for the subject using the RNA expression data by performing ssGSEA on the RNA expression data for at least three (e.g., 3, 4, 5, 6, or all) of the “Angiogenesis” genes listed in Table 1 to produce an Angiogenesis gene group score. In some embodiments, generating the set of input features comprises determining the Proliferation rate signature for the subject using the RNA expression data by performing ssGSEA on the RNA expression data for at least three (e.g., 3, 4, 5, 6, or all) of the “Proliferation rate” genes listed in Table 1 to produce a Proliferation rate gene group score. In some embodiments, generating the set of input features comprises determining the similarity score by comparing the gene group scores of the RC TME signature of the subject to an average of gene group scores of a plurality of RC TME signatures from RC TME type B samples and/or an average of gene group scores of a plurality of RC TME signatures from RC TME type C samples.
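The similarity-score feature described above can be illustrated with a short sketch. This passage does not fix a particular similarity metric, so the Pearson correlation used here is an assumption made only for illustration, as are the function names and the toy gene group scores.

```python
from math import sqrt
from statistics import mean


def centroid(signatures):
    """Average each gene group score across a list of RC TME signatures."""
    groups = signatures[0].keys()
    return {g: mean(sig[g] for sig in signatures) for g in groups}


def similarity_score(subject_signature, reference_signatures):
    """Pearson correlation between a subject's gene group scores and the
    averaged scores of reference samples (assumed metric, illustration only)."""
    ref = centroid(reference_signatures)
    groups = sorted(ref)
    x = [subject_signature[g] for g in groups]
    y = [ref[g] for g in groups]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("similarity is undefined for constant score vectors")
    return cov / sqrt(var_x * var_y)


# Toy values only; real signatures would hold scores for the Table 1 gene groups.
type_b_samples = [
    {"Angiogenesis": 0.8, "Proliferation rate": -0.4, "Macrophages": 0.1},
    {"Angiogenesis": 0.6, "Proliferation rate": -0.2, "Macrophages": 0.3},
]
subject = {"Angiogenesis": 0.7, "Proliferation rate": -0.3, "Macrophages": 0.2}
score = similarity_score(subject, type_b_samples)
```

A score near 1 means the subject's gene group scores track the type B average closely; any other distance or correlation measure could be substituted without changing the surrounding workflow.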
Next, the set of input features generated in act 708 is used as input for a machine learning model comprising a gradient boosting model, which is used to obtain a corresponding output indicating a responder score in act 710. In some embodiments, a gradient boosting model comprises a CatBoost classifier.
The machine learning model used to generate an immuno-oncology (IO) responder score may be of any suitable type. For example, in some embodiments, the machine learning model may be a gradient boosted machine learning model. Non-limiting examples of a gradient boosted machine learning model include an XGBoost model, a LightGBM model, and a CatBoost model; related ensemble models, such as an AdaBoost model or a random forest model, may also be used. However, the machine learning model may be of any other suitable type and, for example, may be a regression model (e.g., a logistic regression model), a neural network model, a support vector machine, a Gaussian mixture model, a decision tree model, or any other suitable type of machine learning model, as aspects of the technology described herein are not limited in this respect.
In some embodiments, the machine learning model may comprise between 10 and 100 parameters, between 100 and 1000 parameters, between 1000 and 10,000 parameters, between 10,000 and 100,000 parameters or more than 100K parameters. Processing input data with a machine learning model comprises performing calculations using values of the machine learning model parameters and the values of the input to the machine learning model to obtain the corresponding output. Such calculations may involve hundreds, thousands, tens of thousands, hundreds of thousands or more calculations, in some embodiments.
In some embodiments, the machine learning model may include multiple parameters whose values may be estimated using training data. The process of estimating parameter values using training data is termed “training”. In some embodiments, a machine learning model may include one or more hyperparameters in addition to the multiple parameters. Values of the hyperparameters may be estimated during training as well.
After a responder score has been output from the machine learning model, the subject may optionally be identified as “IO-low”, “IO-medium”, or “IO-high” based upon the responder score in act 712. The responder score of the subject is then compared to a specified threshold in order to determine whether or not a subject is likely to respond to an immuno-therapeutic agent in act 714. The value of the specified threshold may vary. In some embodiments, the value of the specified threshold ranges from about 0.2 to about 0.8 units. In some embodiments, the specified threshold is 0.5 units. If a subject is identified as having a responder score above the specified threshold, then the subject is identified as having an increased likelihood of responding to an IO agent. In some embodiments, the subject is identified as being “IO-low” when the subject has a responder score that is ≤0.05. In some embodiments, the subject is identified as being “IO-medium” when the responder score is ≥0.05 and <0.5. In some embodiments, the subject is identified as being “IO-high” when the responder score is ≥0.5. However, it should be appreciated that depending on the value of the specified threshold, a subject having a responder score of <0.5 may be identified as being “IO-high” (e.g., if the threshold value is 0.4, then a subject identified as having a responder score >0.4 will be identified as being “IO-high”). In some embodiments, the method further comprises administering an IO therapy to the subject in act 716.
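The tiering and threshold comparison of acts 712 and 714 can be sketched as follows. The function and parameter names are hypothetical. Because the stated ranges share the boundary value 0.05, this sketch assumes a score of exactly 0.05 falls in “IO-low”, and it treats a score equal to the threshold as a likely responder; both are illustrative choices, not requirements of the disclosure.

```python
def classify_io(responder_score, low_cutoff=0.05, high_cutoff=0.5):
    """Map an IO responder score to a tier using the example cutoffs.

    Assumption: a score exactly equal to low_cutoff is treated as IO-low.
    """
    if responder_score >= high_cutoff:
        return "IO-high"
    if responder_score <= low_cutoff:
        return "IO-low"
    return "IO-medium"


def likely_io_responder(responder_score, threshold=0.5):
    """Compare the score to a specified threshold (0.5 in some embodiments)."""
    return responder_score >= threshold


tier = classify_io(0.2)
responds = likely_io_responder(0.45, threshold=0.4)
```

Lowering `high_cutoff` together with `threshold` reproduces the case in which a subject with a score below 0.5 is still identified as “IO-high”.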
In some aspects, the disclosure provides a method for predicting the likelihood of a subject responding to a tyrosine kinase inhibitor (TKI). In some embodiments, the method comprises generating, using RNA expression data that has been obtained from a subject, a set of input features, the set of input features comprising at least two (e.g., 2, 3, or 4) of the following features: a Macrophage signature for the subject; an Angiogenesis signature for the subject; a Proliferation rate signature for the subject; and a similarity score indicative of a similarity of an RC TME signature for the subject to RC TME signatures associated with RC TME type B samples. In some embodiments, the set of input features is used to train a machine learning model to obtain a corresponding output indicating a responder score, which is indicative of a likelihood that the subject responds to the TKI. In some embodiments, the machine learning model comprises a logistic regression model.
In some embodiments, a Macrophage signature comprises a Macrophages gene group score generated using RNA levels for 1, 2, 3, 4, 5, 6, 7, or 8 of the Macrophages group genes listed in Table 1. In some embodiments, an Angiogenesis signature comprises an Angiogenesis gene group score for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 Angiogenesis genes listed in Table 1. In some embodiments, a Proliferation rate signature comprises gene enrichment scores for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 genes listed in Table 1.
A responder score (e.g., a TKI responder score) produced for a subject is then compared to a specified threshold in order to determine whether or not a subject is likely to respond to a TKI. In some embodiments, the specified threshold is used to determine (e.g., classify) a subject as being “TKI-low”, “TKI-medium”, or “TKI-high”. The value of the specified threshold may range from about 0.1 to about 1.5 units (e.g., 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, or 1.5, or any unit value therebetween). In some embodiments, a specified threshold ranges from about 0 to about 1 units. In some embodiments, a specified threshold is 0.25, 0.5, 0.75, 0.85, 0.95, or 1.0. In some embodiments, a subject identified as having a responder score above the specified threshold is identified as being likely to respond to treatment with a TKI agent. In some embodiments, a subject having a responder score that is less than 0.6 is unlikely to respond positively to a TKI (e.g., the TKI is unlikely to be therapeutically effective in the subject). In some embodiments, a responder score equal to or greater than 0.6 indicates that a subject is likely to respond positively to a TKI (e.g., the TKI is likely to be therapeutically effective in the subject). In some embodiments, a responder score is used to identify the subject as “TKI-low” when the responder score is ≤0.75; “TKI-medium” when the responder score is ≥0.75 and <0.95; or “TKI-high” when the responder score is ≥0.95.
After the RC TME type of the subject has been identified, a machine learning model is used to obtain an output indicating a responder score (the responder score indicative of a likelihood that the subject responds to a TKI) from a set of input features in process 906. In some embodiments, the obtaining comprises generating, using RNA expression data that has been obtained from the subject, a set of input features, the set of input features comprising at least two (e.g., 2, 3, or 4) of the following features: a Macrophage associated signature for the subject; an Angiogenesis signature for the subject; a Proliferation rate signature for the subject; and a similarity score indicative of a similarity of an RC TME signature for the subject to RC TME signatures associated with RC TME type B samples, in act 908.
In some embodiments, generating the set of input features comprises determining the Macrophage signature for the subject using the RNA expression data by performing ssGSEA on the RNA expression data for at least three (e.g., 3, 4, 5, 6, or all) of the “Macrophages” genes listed in Table 1 to produce a Macrophage gene group score. In some embodiments, generating the set of input features comprises determining the Angiogenesis signature for the subject using the RNA expression data by performing ssGSEA on the RNA expression data for at least three (e.g., 3, 4, 5, 6, or all) of the “Angiogenesis” genes listed in Table 1 to produce an Angiogenesis gene group score. In some embodiments, generating the set of input features comprises determining the Proliferation rate signature for the subject using the RNA expression data by performing ssGSEA on the RNA expression data for at least three (e.g., 3, 4, 5, 6, or all) of the “Proliferation rate” genes listed in Table 1 to produce a Proliferation rate gene group score. In some embodiments, generating the set of input features comprises determining the similarity score by comparing the gene group scores of the RC TME signature of the subject to an average of gene group scores of a plurality of RC TME signatures from RC TME type B samples.
Next, the set of input features generated in act 908 is used as input for a machine learning model comprising a logistic regression model, which is used to obtain a corresponding output indicating a responder score in act 910.
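As a minimal sketch of act 910, a logistic regression model turns the four input features into a responder score by applying a sigmoid to a weighted sum. The coefficient values, intercept, and feature names below are hypothetical placeholders chosen only so the example runs; the disclosure does not list fitted values, and the signs shown carry no biological meaning.

```python
from math import exp

# Hypothetical coefficients for illustration only (not fitted values).
EXAMPLE_WEIGHTS = {
    "macrophage_signature": -0.8,
    "angiogenesis_signature": 1.2,
    "proliferation_rate_signature": -0.5,
    "type_b_similarity": 0.9,
}
EXAMPLE_INTERCEPT = 0.1


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def tki_responder_score(features, weights=EXAMPLE_WEIGHTS,
                        intercept=EXAMPLE_INTERCEPT):
    """Return a score in (0, 1) from the weighted sum of input features."""
    missing = set(weights) - set(features)
    if missing:
        raise KeyError(f"missing input features: {sorted(missing)}")
    z = intercept + sum(weights[name] * features[name] for name in weights)
    return sigmoid(z)


example_features = {
    "macrophage_signature": 0.2,
    "angiogenesis_signature": 0.7,
    "proliferation_rate_signature": -0.3,
    "type_b_similarity": 0.9,
}
example_score = tki_responder_score(example_features)
```

In practice the weights would be estimated from a training cohort rather than written by hand.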
The machine learning model used to generate a TKI responder score may be of any suitable type. For example, in some embodiments, the machine learning model may be a regression model (e.g., a logistic regression model). However, the machine learning model may be of any other suitable type and, for example, may be a gradient boosting model (e.g., XGBoost, CatBoost, LightGBM, etc.), a neural network model, a support vector machine, a Gaussian mixture model, a random forest model, a decision tree model, or any other suitable type of machine learning model, as aspects of the technology described herein are not limited in this respect.
In some embodiments, the machine learning model may comprise between 10 and 100 parameters, between 100 and 1000 parameters, between 1000 and 10,000 parameters, between 10,000 and 100,000 parameters or more than 100K parameters. Processing input data with a machine learning model comprises performing calculations using values of the machine learning model parameters and the values of the input to the machine learning model to obtain the corresponding output. Such calculations may involve hundreds, thousands, tens of thousands, hundreds of thousands or more calculations, in some embodiments.
In some embodiments, the machine learning model may include multiple parameters whose values may be estimated using training data. The process of estimating parameter values using training data is termed “training”. In some embodiments, a machine learning model may include one or more hyperparameters in addition to the multiple parameters. Values of the hyperparameters may be estimated during training as well.
After a responder score has been output from the machine learning model, the subject may optionally be identified as “TKI-low”, “TKI-medium”, or “TKI-high” based upon the responder score in act 912.
The responder score of the subject is then compared to a specified threshold in order to determine whether or not a subject is likely to respond to a TKI in act 914. The value of the specified threshold may vary. In some embodiments, the value of the specified threshold ranges from about 0.1 to about 1.5 units. In some embodiments, the specified threshold is 0.6 units. If a subject is identified as having a responder score above the specified threshold, then the subject is identified as having an increased likelihood of responding to a TKI. In some embodiments, the subject is identified as being “TKI-low” when the subject has a responder score that is ≤0.75. In some embodiments, the subject is identified as being “TKI-medium” when the responder score is ≥0.75 and <0.95. In some embodiments, the subject is identified as being “TKI-high” when the responder score is ≥0.95. However, it should be appreciated that depending on the value of the specified threshold, a subject having a responder score of <0.95 may be identified as being “TKI-high” (e.g., if the threshold value is 0.6, then a subject identified as having a responder score >0.6 will be identified as being “TKI-high”). In some embodiments, the method further comprises administering a TKI to the subject in act 916.
Aspects of the disclosure relate to methods for selecting one or more therapeutic agents for a subject having a renal cancer (e.g., ccRCC). The disclosure is based, in part, on methods that identify the likelihood of a patient's response to an immuno-oncology (IO) agent and/or a tyrosine kinase inhibitor (TKI) using RNA sequencing data obtained from the subject to produce one or more responder scores (e.g., a responder score for IO, a responder score for TKI, etc.) for the subject. Without wishing to be bound by any particular theory, methods of selecting a therapeutic agent described herein provide physicians increased confidence in identifying classes of therapeutic agents, or combinations of therapeutic agents, to which their patients have an increased likelihood of responding (and conversely, allow physicians to avoid prescribing therapeutic agents to which their patients are unlikely to respond), thereby improving patient care technology. A schematic depicting an example of methods described in this section is provided in
In some aspects, the disclosure provides a method for identifying one or more therapeutic agents for administration to a subject having renal cancer, the method comprising: generating an International Metastatic RCC Database Consortium (IMDC) Risk Score for the subject; when the subject is identified as having a Poor IMDC Risk Score, identifying a combination of immuno-oncology (IO) agent and TKI as the one or more therapeutic agents for administration to the subject; when the subject is identified as having a Favorable or Intermediate IMDC Risk Score, generating: an IO responder score according to a method as described herein; a TKI responder score according to a method as described herein; and identifying the one or more therapeutic agents for the subject using the IO responder score and the TKI responder score.
An IMDC Risk Score may be calculated using any suitable method, for example as described by Guida et al. Oncotarget. 2020; 11:4582-4592. Typically, an IMDC Risk Score classifies patients into one of the following categories: “Good” (also referred to as “Favorable”), “Intermediate”, or “Poor”, based upon six negative clinical prognostic factors: performance status (e.g., a score of <80 for Karnofsky Performance Status [KPS]); hemoglobin level (<low normal level [LNL]); time from diagnosis to start of systemic treatment [DTT] (<1 year); corrected serum calcium level (>upper normal level [UNL]); neutrophil count (>UNL); and platelet count (>UNL). Patients lacking these negative factors are identified as having a “good” (or “favorable”) prognosis; patients presenting 1 or 2 of the factors have an “intermediate” risk of death; and patients with 3 or more factors have an expected “poor” risk outcome. In some embodiments of methods described by the disclosure, when a subject is identified as having a “poor” IMDC Risk Score, a combination of TKI and IO agents is selected for the subject without further analysis of an IO responder score or a TKI responder score.
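The factor-counting rule described above can be written as a short sketch. The class and field names are hypothetical, and each flag is assumed to have been determined already against the applicable laboratory reference limits.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class ImdcFactors:
    """One flag per negative prognostic factor (True means factor present)."""
    kps_below_80: bool = False
    hemoglobin_below_lnl: bool = False
    diagnosis_to_treatment_under_1_year: bool = False
    corrected_calcium_above_unl: bool = False
    neutrophils_above_unl: bool = False
    platelets_above_unl: bool = False


def imdc_risk_group(factors):
    """Return the risk category from the number of factors present."""
    count = sum(bool(flag) for flag in astuple(factors))
    if count == 0:
        return "Favorable"
    if count <= 2:
        return "Intermediate"
    return "Poor"


group = imdc_risk_group(
    ImdcFactors(hemoglobin_below_lnl=True, platelets_above_unl=True)
)
```

Here two factors are present, so the subject falls in the “Intermediate” category and would proceed to responder-score analysis rather than directly to combination therapy.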
In some embodiments, the methods comprise a step of generating a TKI responder score for a subject having an “Intermediate” or “Favourable” IMDC Risk Score. Methods of generating TKI responder scores are described elsewhere in the specification, for example in
In some embodiments, the methods comprise a step of generating an IO responder score for a subject having an “Intermediate” or “Favourable” IMDC Risk Score. In some embodiments, prior to generating the IO responder score, it is determined whether the subject is an “IO non-responder”. Methods of identifying “IO non-responders” are described elsewhere in the disclosure, for example in the section entitled “Responder Scores”. In some embodiments, identifying the subject as an “IO non-responder” comprises identifying that the subject (e.g., a biological sample obtained from the subject) has one or more of the following biomarkers: high Ploidy (e.g., as calculated by RTumor bioinformatics analysis), a high Myogenic Signature (e.g., as described throughout the specification for example in the section entitled Myogenesis Signature), RC TME type E (as determined by methods described throughout the specification), presence of mTOR activating mutations, or presence of mutations in antigen presentation machinery. In some embodiments, the method comprises selecting one or more TKIs for a subject identified as an “IO non-responder”.
If the subject is not identified as an “IO non-responder”, an IO responder score is generated for the subject. Methods of generating IO responder scores are described elsewhere in the specification, for example in
In some embodiments, the method comprises selecting (or providing a recommendation to select) one or more therapeutic agents for the subject (e.g., producing a report recommending selection of one or more therapeutic agents for the subject) using the TKI and IO responder scores. In some embodiments, when a subject is identified as “TKI-low” and “IO-low”, a TKI agent is selected for the subject. In some embodiments, when a subject is identified as “TKI-low” and “IO-low”, a combination of a TKI agent and an IO agent is selected for the subject. In some embodiments, when a subject is identified as “TKI-medium” and “IO-low” or “IO-medium”, a combination of a TKI agent and an IO agent is selected for the subject. In some embodiments, when a subject is identified as “TKI-medium” and “IO-low” or “IO-medium”, a TKI agent is selected for the subject. In some embodiments, when a subject is identified as “TKI-medium” and “IO-high”, a combination of a TKI agent and an IO agent is selected for the subject. In some embodiments, when a subject is identified as “TKI-high” and “IO-low” or “IO-medium”, a TKI agent is selected for the subject. In some embodiments, when a subject is identified as “TKI-high” and “IO-high”, a combination of a TKI agent and an IO agent is selected for the subject. In some aspects, the methods further comprise a step of administering the identified one or more therapeutic agents (e.g., TKI agent, or combination of TKI agent and IO agent) to the subject. Methods of administering a TKI agent or a combination of TKI agent and IO agent to a subject are described further herein, for example in the section entitled “Therapeutic Indications”.
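The selection embodiments above can be collected into a lookup table, as in the following sketch. Because different embodiments select different agents for the same pair of tiers, the table lists every option described rather than choosing one. Tier pairs that this section does not address return an empty tuple. All names are hypothetical, and the sketch is an illustration of the described logic, not clinical guidance.

```python
TKI_ONLY = "TKI agent"
COMBINATION = "TKI agent + IO agent"

# Options described for each (TKI tier, IO tier) pair across embodiments.
SELECTION_EMBODIMENTS = {
    ("TKI-low", "IO-low"): (TKI_ONLY, COMBINATION),
    ("TKI-medium", "IO-low"): (COMBINATION, TKI_ONLY),
    ("TKI-medium", "IO-medium"): (COMBINATION, TKI_ONLY),
    ("TKI-medium", "IO-high"): (COMBINATION,),
    ("TKI-high", "IO-low"): (TKI_ONLY,),
    ("TKI-high", "IO-medium"): (TKI_ONLY,),
    ("TKI-high", "IO-high"): (COMBINATION,),
}


def candidate_therapies(imdc_group, tki_tier=None, io_tier=None,
                        io_non_responder=False):
    """Return the therapy options described for a subject."""
    if imdc_group == "Poor":
        # Poor IMDC risk: combination selected without responder scores.
        return (COMBINATION,)
    if io_non_responder:
        # IO non-responders are assigned one or more TKIs.
        return (TKI_ONLY,)
    # Empty tuple: tier pair not addressed by the listed embodiments.
    return SELECTION_EMBODIMENTS.get((tki_tier, io_tier), ())


options = candidate_therapies("Intermediate", "TKI-high", "IO-high")
```

A reporting step could present the returned options to a physician, who makes the final selection.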
Aspects of the disclosure relate to methods of identifying or selecting a therapeutic agent for a subject based upon determination of the subject's RC TME type and/or responder score (e.g., IO responder score or TKI responder score). The disclosure is based, in part, on the recognition that subjects having RC TME type E have a decreased likelihood of responding to certain therapies (e.g., an IO agent) relative to subjects having other RC TME types but may still respond to other therapies, for example TKIs. The disclosure is based, in part, on the recognition that subjects having RC TME type A or RC TME type B have an increased likelihood of responding to certain therapies (e.g., an IO agent) relative to subjects having other RC TME types.
In some embodiments, the therapeutic agents are immuno-oncology (IO) agents. An IO agent may be a small molecule, peptide, protein (e.g., antibody, such as monoclonal antibody), interfering nucleic acid, or a combination of any of the foregoing. In some embodiments, the IO agents comprise a PD1 inhibitor, PD-L1 inhibitor, or PD-L2 inhibitor. Examples of IO agents include but are not limited to cemiplimab, nivolumab, pembrolizumab, avelumab, durvalumab, atezolizumab, BMS1166, BMS202, etc.
In some embodiments, the therapeutic agents are tyrosine kinase inhibitors (TKIs). A TKI may be a small molecule, peptide, protein (e.g., antibody, such as monoclonal antibody), interfering nucleic acid, or a combination of any of the foregoing. Examples of TKIs include but are not limited to Axitinib (Inlyta®), Cabozantinib (Cabometyx®), Imatinib mesylate (Gleevec®), Dasatinib (Sprycel®), Nilotinib (Tasigna®), Bosutinib (Bosulif®), Sunitinib (Sutent®), etc.
In some embodiments, methods described by the disclosure further comprise a step of administering one or more therapeutic agents to the subject based upon the determination of the subject's RC TME type and/or responder score. In some embodiments, a subject is administered one or more (e.g., 1, 2, 3, 4, 5, or more) IO agents. In some embodiments, a subject is administered one or more (e.g., 1, 2, 3, 4, 5, or more) TKIs. In some embodiments, a subject is administered a combination of one or more IO agents and one or more TKIs.
Aspects of the disclosure relate to methods of treating a subject having (or suspected or at risk of having) RC based upon a determination of the RC TME type of the subject. In some embodiments, the methods comprise administering one or more (e.g., 1, 2, 3, 4, 5, or more) therapeutic agents to the subject. In some embodiments, the therapeutic agent (or agents) administered to the subject are selected from small molecules, peptides, nucleic acids, radioisotopes, cells (e.g., CAR T-cells, etc.), and combinations thereof. Examples of therapeutic agents include chemotherapies (e.g., cytotoxic agents, etc.), immunotherapies (e.g., immune checkpoint inhibitors, such as PD-1 inhibitors, PD-L1 inhibitors, etc.), antibodies (e.g., anti-HER2 antibodies), cellular therapies (e.g. CAR T-cell therapies), gene silencing therapies (e.g., interfering RNAs, CRISPR, etc.), antibody-drug conjugates (ADCs), and combinations thereof.
In some embodiments, a subject is administered an effective amount of a therapeutic agent. “An effective amount” as used herein refers to the amount of each active agent required to confer therapeutic effect on the subject, either alone or in combination with one or more other active agents. Effective amounts vary, as recognized by those skilled in the art, depending on the particular condition being treated, the severity of the condition, the individual patient parameters including age, physical condition, size, gender and weight, the duration of the treatment, the nature of concurrent therapy (if any), the specific route of administration and like factors within the knowledge and expertise of the health practitioner. These factors are well known to those of ordinary skill in the art and can be addressed with no more than routine experimentation. It is generally preferred that a maximum dose of the individual components or combinations thereof be used, that is, the highest safe dose according to sound medical judgment. It will be understood by those of ordinary skill in the art, however, that a patient may insist upon a lower dose or tolerable dose for medical reasons, psychological reasons, or for virtually any other reasons.
Empirical considerations, such as the half-life of a therapeutic compound, generally contribute to the determination of the dosage. For example, antibodies that are compatible with the human immune system, such as humanized antibodies or fully human antibodies, may be used to prolong half-life of the antibody and to prevent the antibody being attacked by the host's immune system. Frequency of administration may be determined and adjusted over the course of therapy, and is generally (but not necessarily) based on treatment, and/or suppression, and/or amelioration, and/or delay of a cancer. Alternatively, sustained continuous release formulations of an anti-cancer therapeutic agent may be appropriate. Various formulations and devices for achieving sustained release are known in the art.
In some embodiments, dosages for an anti-cancer therapeutic agent as described herein may be determined empirically in individuals who have been administered one or more doses of the anti-cancer therapeutic agent. Individuals may be administered incremental dosages of the anti-cancer therapeutic agent. To assess efficacy of an administered anti-cancer therapeutic agent, one or more aspects of a cancer (e.g., tumor microenvironment, tumor formation, tumor growth, or RC TME types, etc.) may be analyzed.
Generally, for administration of any of the anti-cancer antibodies described herein, an initial candidate dosage may be about 2 mg/kg. For the purpose of the present disclosure, a typical daily dosage might range from about any of 0.1 μg/kg to 3 μg/kg to 30 μg/kg to 300 μg/kg to 3 mg/kg, to 30 mg/kg to 100 mg/kg or more, depending on the factors mentioned above. For repeated administrations over several days or longer, depending on the condition, the treatment is sustained until a desired suppression or amelioration of symptoms occurs or until sufficient therapeutic levels are achieved to alleviate a cancer, or one or more symptoms thereof. An exemplary dosing regimen comprises administering an initial dose of about 2 mg/kg, followed by a weekly maintenance dose of about 1 mg/kg of the antibody, or followed by a maintenance dose of about 1 mg/kg every other week. However, other dosage regimens may be useful, depending on the pattern of pharmacokinetic decay that the practitioner (e.g., a medical doctor) wishes to achieve. For example, dosing from one to four times a week is contemplated. In some embodiments, dosing ranging from about 3 μg/kg to about 2 mg/kg (such as about 3 μg/kg, about 10 μg/kg, about 30 μg/kg, about 100 μg/kg, about 300 μg/kg, about 1 mg/kg, and about 2 mg/kg) may be used. In some embodiments, dosing frequency is once every week, every 2 weeks, every 4 weeks, every 5 weeks, every 6 weeks, every 7 weeks, every 8 weeks, every 9 weeks, or every 10 weeks; or once every month, every 2 months, or every 3 months, or longer. The progress of this therapy may be monitored by conventional techniques and assays and/or by monitoring RC TME types as described herein. The dosing regimen (including the therapeutic used) may vary over time.
When the anti-cancer therapeutic agent is not an antibody, it may be administered at the rate of about 0.1 to 300 mg/kg of the weight of the patient divided into one to three doses, or as disclosed herein. In some embodiments, for an adult patient of normal weight, doses ranging from about 0.3 to 5.00 mg/kg may be administered. The particular dosage regimen, e.g., dose, timing, and/or repetition, will depend on the particular subject and that individual's medical history, as well as the properties of the individual agents (such as the half-life of the agent, and other considerations well known in the art).
For the purpose of the present disclosure, the appropriate dosage of an anti-cancer therapeutic agent will depend on the specific anti-cancer therapeutic agent(s) (or compositions thereof) employed, the type and severity of cancer, whether the anti-cancer therapeutic agent is administered for preventive or therapeutic purposes, previous therapy, the patient's clinical history and response to the anti-cancer therapeutic agent, and the discretion of the attending physician. Typically, the clinician will administer an anti-cancer therapeutic agent, such as an antibody, until a dosage is reached that achieves the desired result.
Administration of an anti-cancer therapeutic agent can be continuous or intermittent, depending, for example, upon the recipient's physiological condition, whether the purpose of the administration is therapeutic or prophylactic, and other factors known to skilled practitioners. The administration of an anti-cancer therapeutic agent (e.g., an anti-cancer antibody) may be essentially continuous over a preselected period of time or may be in a series of spaced doses, e.g., either before, during, or after developing cancer.
As used herein, the term “treating” refers to the application or administration of a composition including one or more active agents to a subject, who has a cancer, a symptom of a cancer, or a predisposition toward a cancer, with the purpose to cure, heal, alleviate, relieve, alter, remedy, ameliorate, improve, or affect the cancer or one or more symptoms of RC, or the predisposition toward RC.
Alleviating RC includes delaying the development or progression of the disease, or reducing disease severity. Alleviating the disease does not necessarily require curative results. As used herein, “delaying” the development of a disease (e.g., a cancer) means to defer, hinder, slow, retard, stabilize, and/or postpone progression of the disease. This delay can be of varying lengths of time, depending on the history of the disease and/or individuals being treated. A method that “delays” or alleviates the development of a disease, or delays the onset of the disease, is a method that reduces the probability of developing one or more symptoms of the disease in a given time frame and/or reduces the extent of the symptoms in a given time frame, when compared to not using the method. Such comparisons are typically based on clinical studies, using a number of subjects sufficient to give a statistically significant result.
“Development” or “progression” of a disease means initial manifestations and/or ensuing progression of the disease. Development of the disease can be detected and assessed using clinical techniques known in the art. Alternatively, or in addition to the clinical techniques known in the art, development of the disease may be detectable and assessed based on other criteria. However, development also refers to progression that may be undetectable. For purpose of this disclosure, development or progression refers to the biological course of the symptoms. “Development” includes occurrence, recurrence, and onset. As used herein “onset” or “occurrence” of a cancer includes initial onset and/or recurrence.
Examples of the antibody anti-cancer agents include, but are not limited to, alemtuzumab (Campath), trastuzumab (Herceptin), Ibritumomab tiuxetan (Zevalin), Brentuximab vedotin (Adcetris), Ado-trastuzumab emtansine (Kadcyla), blinatumomab (Blincyto), Bevacizumab (Avastin), Cetuximab (Erbitux), ipilimumab (Yervoy), nivolumab (Opdivo), pembrolizumab (Keytruda), atezolizumab (Tecentriq), avelumab (Bavencio), durvalumab (Imfinzi), and panitumumab (Vectibix).
Examples of an immunotherapy include, but are not limited to, a PD-1 inhibitor or a PD-L1 inhibitor, a CTLA-4 inhibitor, adoptive cell transfer, therapeutic cancer vaccines, oncolytic virus therapy, T-cell therapy, and immune checkpoint inhibitors.
Examples of radiation therapy include, but are not limited to, ionizing radiation, gamma-radiation, neutron beam radiotherapy, electron beam radiotherapy, proton therapy, brachytherapy, systemic radioactive isotopes, and radiosensitizers.
Examples of a surgical therapy include, but are not limited to, a curative surgery (e.g., tumor removal surgery), a preventive surgery, a laparoscopic surgery, and a laser surgery.
Examples of the chemotherapeutic agents include, but are not limited to, R-CHOP, Carboplatin or Cisplatin, Docetaxel, Gemcitabine, Nab-Paclitaxel, Paclitaxel, Pemetrexed, and Vinorelbine. Additional examples of chemotherapy include, but are not limited to, Platinating agents, such as Carboplatin, Oxaliplatin, Cisplatin, Nedaplatin, Satraplatin, Lobaplatin, Triplatin tetranitrate, Picoplatin, Prolindac, Aroplatin and other derivatives; Topoisomerase I inhibitors, such as Camptothecin, Topotecan, irinotecan/SN38, rubitecan, Belotecan, and other derivatives; Topoisomerase II inhibitors, such as Etoposide (VP-16), Daunorubicin, a doxorubicin agent (e.g., doxorubicin, doxorubicin hydrochloride, doxorubicin analogs, or doxorubicin and salts or analogs thereof in liposomes), Mitoxantrone, Aclarubicin, Epirubicin, Idarubicin, Amrubicin, Amsacrine, Pirarubicin, Valrubicin, Zorubicin, Teniposide and other derivatives; Antimetabolites, such as Folic family (Methotrexate, Pemetrexed, Raltitrexed, Aminopterin, and relatives or derivatives thereof); Purine antagonists (Thioguanine, Fludarabine, Cladribine, 6-Mercaptopurine, Pentostatin, clofarabine, and relatives or derivatives thereof) and Pyrimidine antagonists (Cytarabine, Floxuridine, Azacitidine, Tegafur, Carmofur, Capecitabine, Gemcitabine, hydroxyurea, 5-Fluorouracil (5FU), and relatives or derivatives thereof); Alkylating agents, such as Nitrogen mustards (e.g., Cyclophosphamide, Melphalan, Chlorambucil, mechlorethamine, Ifosfamide, Trofosfamide, Prednimustine, Bendamustine, Uramustine, Estramustine, and relatives or derivatives thereof); nitrosoureas (e.g., Carmustine, Lomustine, Semustine, Fotemustine, Nimustine, Ranimustine, Streptozocin, and relatives or derivatives thereof); Triazenes (e.g., Dacarbazine, Altretamine, Temozolomide, and relatives or derivatives thereof); Alkyl sulphonates (e.g., Busulfan, Mannosulfan, Treosulfan, and relatives or derivatives thereof); Procarbazine; Mitobronitol, and Aziridines (e.g., Carboquone, Triaziquone, ThioTEPA, triethylenemelamine, and relatives or derivatives thereof); Antibiotics, such as Hydroxyurea, Anthracyclines (e.g., doxorubicin agent, daunorubicin, epirubicin and relatives or derivatives thereof); Anthracenediones (e.g., Mitoxantrone and relatives or derivatives thereof); Streptomyces family antibiotics (e.g., Bleomycin, Mitomycin C, Actinomycin, and Plicamycin); and ultraviolet light.
In some aspects, the disclosure provides a method for treating renal cancer (RC), the method comprising administering one or more therapeutic agents (e.g., one or more anti-cancer agents, such as one or more chemotherapeutic agents) to a subject identified as having a particular RC TME type, wherein the RC TME type of the subject has been identified by a method as described by the disclosure.
In some embodiments, a subject has been identified as having RC TME type A, RC TME type B, RC TME type C, RC TME type D, or RC TME type E. In some embodiments, a subject has been identified as having RC TME type B or RC TME type A.
The disclosure is based, in part, on the inventors' recognition that subjects having certain RC TME types are likely to respond well to certain immunotherapies (e.g., immune checkpoint inhibitors, such as pembrolizumab or nivolumab) or TKIs. Dosing of immuno-oncology agents is well-known, for example as described by Louedec et al. Vaccines (Basel). 2020 December; 8(4): 632. For example, dosages of pembrolizumab include administration of 200 mg every 3 weeks or 400 mg every 6 weeks, by infusion over 30 minutes. Dosing of TKIs is also well-known, for example as described by Gerritse et al. Cancer Treat Rev. 2021 June; 97:102171. doi: 10.1016/j.ctrv.2021.102171. Combination dosing of TKIs and IO agents is also known, for example as described by Rassy et al. Ther Adv Med Oncol. 2020; 12: 1758835920907504.
Aspects of the disclosure are based on the inventors' recognition that subjects having certain RC TME types are unlikely to respond well to certain therapeutic agents, such as immunotherapeutic agents or TKIs. Thus, in some embodiments, the therapeutic agent comprises a therapeutic agent other than an immunotherapy when the subject has been identified as having an RC TME type E, or when the subject has been identified as a “clear IO non-responder”. In some embodiments, the other therapeutic agent is a TKI.
In some aspects, methods disclosed herein comprise generating a report for assisting with the preparation of a recommendation for prognosis and/or treatment. The generated report can provide a summary of information, so that the clinician can identify the RC TME type or a suitable therapy. The report as described herein may be a paper report, an electronic record, or a report in any format that is deemed suitable in the art. The report may be shown and/or stored on a computing device known in the art (e.g., handheld device, desktop computer, smart device, website, etc.). The report may be shown and/or stored on any device that is suitable as understood by a skilled person in the art.
In some embodiments, methods disclosed herein can be used for commercial diagnostic purposes. For example, the generated report may include, but is not limited to, information concerning expression levels of one or more genes from any of the gene groups described herein, clinical and pathologic factors, the patient's prognostic analysis, predicted response to the treatment, classification of the RC TME (e.g., as belonging to one of the types described herein), the alternative treatment recommendation, and/or other information. In some embodiments, the methods and reports may include database management for the keeping of the generated reports. For instance, the methods as disclosed herein can create a record in a database for the subject (e.g., subject 1, subject 2, etc.) and populate the specific record with data for the subject. In some embodiments, the generated report can be provided to the subject and/or to the clinicians. In some embodiments, a network connection can be established to a server computer that includes the data and report for receiving or outputting. In some embodiments, the receiving and outputting of the data or report can be requested from the server computer.
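By way of non-limiting illustration, the creation and population of a per-subject database record described above might be sketched as follows. The table layout, the `TmeReport` fields, and the function names are hypothetical and are not taken from the disclosure; an in-memory SQLite database stands in for whatever storage an actual implementation would use.

```python
import json
import sqlite3
from dataclasses import dataclass, asdict


@dataclass
class TmeReport:
    """Hypothetical report record; field names are illustrative only."""
    subject_id: str
    rc_tme_type: str          # one of "A" through "E"
    predicted_response: str   # free-text summary for the clinician
    gene_group_scores: dict   # gene group name -> score


def create_report_table(conn: sqlite3.Connection) -> None:
    # One row per subject; the full report is stored as a JSON document.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports ("
        "subject_id TEXT PRIMARY KEY, rc_tme_type TEXT, payload TEXT)"
    )


def save_report(conn: sqlite3.Connection, report: TmeReport) -> None:
    # Create (or overwrite) the subject's record and populate it.
    conn.execute(
        "INSERT OR REPLACE INTO reports VALUES (?, ?, ?)",
        (report.subject_id, report.rc_tme_type, json.dumps(asdict(report))),
    )
    conn.commit()


def load_report(conn: sqlite3.Connection, subject_id: str) -> TmeReport:
    # Retrieve the stored report, e.g., in response to a request.
    row = conn.execute(
        "SELECT payload FROM reports WHERE subject_id = ?", (subject_id,)
    ).fetchone()
    if row is None:
        raise KeyError(subject_id)
    return TmeReport(**json.loads(row[0]))
```

Storing the report body as a single JSON payload is one possible design choice; it lets the report contents change without altering the table schema.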
An illustrative implementation of a computer system 1500 that may be used in connection with any of the embodiments of the technology described herein (e.g., such as the method of
Computing device 1500 may also include a network input/output (I/O) interface 1540 via which the computing device may communicate with other computing devices (e.g., over a network), and may also include one or more user I/O interfaces 1550, via which the computing device may provide output to and receive input from a user. The user I/O interfaces may include devices such as a keyboard, a mouse, a microphone, a display device (e.g., a monitor or touch screen), speakers, a camera, and/or various other types of I/O devices.
The above-described embodiments can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software, or a combination thereof. When implemented in software, the software code can be executed on any suitable processor (e.g., a microprocessor) or collection of processors, whether provided in a single computing device or distributed among multiple computing devices. It should be appreciated that any component or collection of components that perform the functions described above can be generically considered as one or more controllers that control the above-discussed functions. The one or more controllers can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware (e.g., one or more processors) that is programmed using microcode or software to perform the functions recited above.
In this respect, it should be appreciated that one implementation of the embodiments described herein comprises at least one computer-readable storage medium (e.g., RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other tangible, non-transitory computer-readable storage medium) encoded with a computer program (i.e., a plurality of executable instructions) that, when executed on one or more processors, performs the above-discussed functions of one or more embodiments. The computer-readable medium may be transportable such that the program stored thereon can be loaded onto any computing device to implement aspects of the techniques discussed herein. In addition, it should be appreciated that the reference to a computer program which, when executed, performs any of the above-discussed functions, is not limited to an application program running on a host computer. Rather, the terms computer program and software are used herein in a generic sense to reference any type of computer code (e.g., application software, firmware, microcode, or any other form of computer instruction) that can be employed to program one or more processors to implement aspects of the techniques discussed herein.
The foregoing description of implementations provides illustration and description but is not intended to be exhaustive or to limit the implementations to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the implementations. In other implementations the methods depicted in these figures may include fewer operations, different operations, differently ordered operations, and/or additional operations. Further, non-dependent blocks may be performed in parallel. It will be apparent that example aspects, as described above, may be implemented in many different forms of software, firmware, and hardware in the implementations illustrated in the figures. Further, certain portions of the implementations may be implemented as a “module” that performs one or more functions. This module may include hardware, such as a processor, an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA), or a combination of hardware and software.
Kidney cancer is among the top 10 most frequently diagnosed cancers worldwide, with clear cell renal cell carcinoma (ccRCC) comprising ~75% of all cases. While the emergence of combination strategies of immuno-oncology (IO) agents with anti-angiogenic tyrosine kinase inhibitors (TKI) has significantly improved the clinical outcomes in this patient population, currently no reliable biomarkers exist to guide treatment decisions.
This example describes a novel approach that integrated whole exome and transcriptome sequencing data from ccRCC samples (n=1,527) to classify the RCC tumor microenvironment (TME) into five major types (referred to as RCC types or RCTs) with distinct immunological composition: “immune-enriched, fibrotic (IE/F)”, “immune-enriched (IE)”, “fibrotic (F)”, “desert with metabolic content (D)”, and “desert with high endothelial cell content (D/E)”. In addition, multiple genomic and transcriptomic-based biomarkers that correlated with lack of IO response were identified. Furthermore, machine learning-based algorithms were employed to generate multifaceted IO and TKI responder scores that combined factors including the RCTs, angiogenesis, proliferation, macrophage signature and the expression of PD-1, PD-L1 and PD-L2 genes.
Data indicate association of the RCT “IE/F” with superior clinical response (41% response rate) in the IO+TKI cohort. Conversely, the RCT “D/E”, characterized by elevated angiogenesis and the absence of immune cell infiltration, responded significantly better to single agent TKI (50% response rate). Among the genomic and transcriptomic biomarkers, activating mutations of genes within the mTOR signaling pathway, mutations in antigen presentation machinery, high ploidy (>4), and a high myogenesis signature were meaningfully enriched in the IO-resistant patients.
Multi-Omics Responder Scores were then retrospectively constructed and validated in multiple public cohorts, including patients treated with sunitinib (Beuselinck, n=53), pazopanib or sunitinib (COMPARZ trial, n=341), avelumab plus axitinib or sunitinib (JAVELIN Renal 101, n=726), atezolizumab or atezolizumab plus bevacizumab (IMmotion 150, n=160), nivolumab (CheckMate 009/010/025, n=172) and a mixed cohort of patients from Washington University (WUSMRCC, n=75). In all cohorts, implementation of the scoring system described herein led to improved progression free survival and overall survival and appeared superior to all currently available approaches.
In conclusion, a machine learning-based multifaceted approach based on genomic and transcriptomic analyses plus TME composition of ccRCC tumors was used to predict response to IO and TKIs. Retrospective analyses of multiple different cohorts supported the clinical utility of these novel biomarkers.
This example describes a novel approach that integrates whole exome and transcriptome sequencing data from ccRCC samples (n~1,500) to classify the tumor microenvironment (TME) into five major RC types (also referred to as RC TME types) with distinct immunological composition: immune-enriched, fibrotic (IE/F or “type A”), immune-enriched (IE or “type B”), fibrotic (F or “type C”), desert with metabolic content (type “D”), and desert with high endothelial cell content (“type E”). RC TME types were identified using the gene signatures shown in Table 1, which reflect the immune and stromal components of the tumor and the activity of metabolic pathways. ccRCC is considered a cancer caused by metabolic changes due to a high frequency of mutations in genes that control aspects of metabolism, such as a VHL mutation in the hypoxia pathway and mutations in the PI3K-AKT-mTOR pathway (MTOR, TSC1/2, PTEN, and MET) that dysregulate the control of growth in response to nutrient levels. Metabolic shifts in glycolysis, oxidative phosphorylation, TCA cycle, fatty acid metabolism and other processes have been observed in ccRCC cells and subjects.
As input for the analysis, gene expression data was obtained using standard bioinformatics analysis packages. For example, RNAseq gene expression data was provided in transcripts per million (TPM). In some cases, FPKM/RPKM values were utilized. For all cohorts, only ccRCC samples were selected (other histological types and normal samples were excluded). There were nine datasets (n~1,500) collected from various platforms. In the analysis, 33 gene signatures were used to identify the five RC TME types. The activity of each signature in each sample was measured using an ssGSEA algorithm.
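For illustration only, a simplified single-sample enrichment score in the spirit of ssGSEA might be computed as sketched below. This is not the implementation used in the example: the function name, the rank-weighting exponent `alpha` (0.25 is an assumed default), and the omission of any final normalization are assumptions made for the sketch.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one gene set.

    expression: dict mapping gene symbol -> expression value (e.g., TPM)
    gene_set:   iterable of gene symbols in the signature
    alpha:      rank-weighting exponent (assumed default, not from the text)
    """
    members = set(gene_set) & set(expression)
    n_genes = len(expression)
    if not members or len(members) == n_genes:
        raise ValueError("gene set must overlap the sample without covering it")

    # Order genes from highest to lowest expression; the top gene gets rank N.
    ordered = sorted(expression, key=expression.get, reverse=True)
    weights = [
        (n_genes - i) ** alpha if gene in members else 0.0
        for i, gene in enumerate(ordered)
    ]
    total_in = sum(weights)
    n_out = n_genes - len(members)

    score = 0.0
    cum_in = 0.0   # running weighted fraction of signature genes seen
    cum_out = 0.0  # running fraction of non-signature genes seen
    for gene, weight in zip(ordered, weights):
        if gene in members:
            cum_in += weight / total_in
        else:
            cum_out += 1.0 / n_out
        # Accumulate the gap between the two running distributions.
        score += cum_in - cum_out
    return score
```

Under this sketch, a signature whose genes sit near the top of the sample's expression ranking receives a positive score, and one whose genes sit near the bottom receives a negative score.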
The ssGSEA scores for each signature were median-scaled inside each cohort. After that, a graph-based clustering algorithm was applied to produce a graph with samples at nodes and correlation of the ssGSEA scores at edges. Each node had 120 neighbors. Then, the Leiden algorithm was applied to the resulting graph and the five RC TME types were identified. A representative heatmap showing clear cell renal carcinoma cancer samples classified into five distinct RC TME types (A, B, C, D, E) based on unsupervised dense clustering of 33 gene expression signatures is shown in
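As a non-limiting sketch of the graph construction step described in this paragraph, the following builds a k-nearest-neighbor graph whose edge weights are correlations between samples' scaled signature scores. Pearson correlation is an assumption (the text does not name the correlation type), the function names are hypothetical, and community detection with the Leiden algorithm is not reproduced here.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length score vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def knn_correlation_graph(scores, k):
    """Build a weighted k-nearest-neighbor graph over samples.

    scores: dict mapping sample id -> list of scaled signature scores
    k:      neighbors kept per node (the text uses 120)
    Returns a dict mapping an undirected edge (u, v), u < v, to its weight.
    """
    ids = sorted(scores)
    edges = {}
    for u in ids:
        # Rank all other samples by correlation with u, most similar first.
        sims = [(pearson(scores[u], scores[v]), v) for v in ids if v != u]
        sims.sort(reverse=True)
        for weight, v in sims[:k]:
            edges[tuple(sorted((u, v)))] = weight
    return edges
```

The resulting edge dictionary could then be handed to a community detection routine to obtain clusters corresponding to candidate RC TME types.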
Data, shown in
To prescribe the correct treatment for a tumor, a patient typically undergoes a biopsy or a part of the tumor is taken after surgical removal. Then it is sequenced (Targeted Exome-normal, Targeted Exome-tumor, WES-normal, WES-tumor, RNAseq-tumor) and all the necessary molecular functional features are annotated. According to these features, the obtained models predict the probability of a response to therapeutic agents. Then, a physician, depending on the patient's previous treatment, current condition, and other clinical factors, decides which drug to use, based upon the prediction of the models (if there are several alternative therapeutic options).
There are more than a dozen different approved drugs for the treatment of metastatic and advanced clear cell renal carcinoma (ccRCC). When choosing a treatment, a doctor is usually guided by the patient's condition and mono-biomarkers, such as PDL1 IHC, TMB, MSI, which cannot fully account for the complex composition of the tumor as a whole. This example describes a model which is able to assess the likelihood of response to the two most common treatment options for ccRCC: immuno-oncology (IO) agents and tyrosine kinase inhibitors (TKI).
Machine learning-based algorithms that generate multifaceted IO and TKI responder scores were produced. The scores combine factors including the RC TME type, angiogenesis signature, proliferation signature, macrophage signature and the expression of PD-1, PD-L1 and PD-L2 genes. Among the genomic and transcriptomic biomarkers that were meaningfully enriched in the IO-resistant patients were 1) activating mutations of genes within the mTOR signaling pathway, 2) mutations in antigen presentation machinery, 3) high ploidy (>4) and 4) a high Myogenesis signature (described throughout the specification, including in Example 4).
To make the models, all currently available public datasets (genomic+transcriptomic) obtained on different platforms of patients with the same diagnosis were collected. The patients in these datasets were treated with the same drug and have known responses or survival rates. All previously published biomarkers of treatment response and other important traits were also collected and used as features for models. Ten public datasets of ccRCC diagnosis and responses to IO and TKI treatments and transcriptomic and genomic data were collected together. For each type of treatment, out of all known features that can be obtained from tumor DNA and RNA using NGS and microarrays, the most important biomarker features were selected using machine learning methods. These features provide the best way to determine the likelihood of a drug response. Thus, for IO, the most important features were determined to be the expression of genes PD1, PDL1, PDL2, and the signatures characterizing ECM associated genes, Angiogenesis and Proliferation rate. For TKI, the most important parameters were determined to be the signatures of Angiogenesis, Macrophages and Proliferation rate.
All features are unified so that they are comparable between datasets. Among all the features in the final model, only those are selected that have the maximum predictive ability and do not introduce noise. Some of the datasets are used for training, and some for validation. The output is a model that provides a single “responder score” predicting the likelihood of a response to treatment (e.g., IO or TKI).
In all cohorts, implementation of scores from the models led to improved progression free survival (PFS) and overall survival (OS) and appeared superior to all currently available approaches. With NGS-based tests of a cancer biopsy, and by applying these models to the calculated set of biomarkers, the physician can assess the likelihood of response to a particular treatment and make a more informed decision.
Examples of training and validation of machine learning models for assessing likelihood of response to IO are shown in
If a sample was classified as an ‘IO non-responder’, it was automatically assigned a responder score of 0 and the sample was not used for model training and validation.
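The rule stated above can be sketched, purely for illustration, as a gate applied around the model output. The function names and the record layout (a boolean `io_non_responder` key) are hypothetical, and the criteria used to classify a sample as an ‘IO non-responder’ are not modeled here.

```python
def final_responder_score(model_score, is_io_non_responder):
    """Return the reported responder score for one sample.

    model_score:         classifier output in [0, 1]
    is_io_non_responder: True if the sample was pre-classified as an
                         'IO non-responder' (criteria not modeled here)
    """
    if is_io_non_responder:
        # Pre-classified non-responders bypass the model entirely.
        return 0.0
    if not 0.0 <= model_score <= 1.0:
        raise ValueError("model score is expected to lie in [0, 1]")
    return model_score


def modeling_subset(samples):
    """Keep only samples eligible for model training and validation.

    samples: list of dicts with a boolean 'io_non_responder' key
             (a hypothetical record layout).
    """
    return [s for s in samples if not s["io_non_responder"]]
```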
A CatBoost classifier model with parameter auto_class_weights set to ‘Balanced’ was used. The model was trained on eight transcriptomic features: expression of PD1, PDL1, PDL2; Endothelium signature, Angiogenesis signature, and Proliferation rate signature; and similarity of a sample to RC TME type “B” and to RC TME type “C”. Similarity of a sample to a particular RC TME type was calculated as the Spearman correlation coefficient between the sample's ssGSEA scores of 33 gene signatures (e.g., based on the gene groups shown in Table 1) and the particular RC TME type's averaged ssGSEA scores of the same 33 gene signatures. The resulting model score (e.g., responder score) varied from 0 to 1.
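The similarity feature described above can be illustrated with a minimal sketch that computes a Spearman correlation between a sample's signature scores and a type's averaged profile. The function names and the dictionary-based inputs are hypothetical, and the sketch assumes both inputs cover the same signatures.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with >= 2 entries")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / (sx * sy)


def type_similarity(sample_scores, type_mean_scores):
    """Similarity of a sample to one RC TME type.

    Both arguments map signature name -> ssGSEA score; the type profile
    holds scores averaged over samples of that type.
    """
    names = sorted(sample_scores)
    return spearman(
        [sample_scores[n] for n in names],
        [type_mean_scores[n] for n in names],
    )
```

In such a sketch, the similarity values for type “B” and type “C” would be appended to the six other transcriptomic features before classifier training.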
The IO model was trained on samples treated with IO-containing regimens from three cohorts: WUSMRCC (31 patients IPI+NIVO and 10 patients CABO+NIVO/AXI+PEM), IMmotion 150 (77 patients ATEZO, 83 ATEZO+BEV), CheckMate25 (172 patients NIVO). For model training, patients with a RECIST (Response Evaluation Criteria in Solid Tumors) outcome of CR (25 patients) or PD (93 patients) were selected from these three cohorts. These patients were used as two groups for a model to predict. The model was validated on the JAVELIN cohort (354 patients AVE+AXI).
Examples of training and validation of machine learning models for assessing likelihood of response to TKI are shown in
The TKI model was trained on samples treated with TKI regimens from two cohorts: WUSMRCC (37 patients PAZ/SUN/CABO/AXI), Beuselinck (53 patients SUN). For model training, patients with a RECIST outcome of CR+PR (45 patients) or PD (13 patients) were selected from these two cohorts. These patients were used as two groups for a model to predict. The model was validated on the JAVELIN cohort (372 patients SUN) and COMPARZ cohort (341 patients PAZ/SUN).
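The selection of the two training groups for the IO and TKI models can be sketched as follows. This is a minimal illustration: the function name, the tuple-based patient records, and the 1/0 label encoding are assumptions, while the group definitions (CR versus PD for IO; CR+PR versus PD for TKI) follow the text.

```python
def build_training_set(patients, positive, negative):
    """Select patients whose RECIST outcome falls in either target group.

    patients: list of (patient_id, recist) tuples, recist in
              {"CR", "PR", "SD", "PD"}
    positive: RECIST categories labeled 1 (responders)
    negative: RECIST categories labeled 0 (non-responders)
    Patients in neither group are left out of training.
    """
    labeled = []
    for patient_id, recist in patients:
        if recist in positive:
            labeled.append((patient_id, 1))
        elif recist in negative:
            labeled.append((patient_id, 0))
    return labeled


# Group definitions as stated in the text.
IO_GROUPS = {"positive": {"CR"}, "negative": {"PD"}}
TKI_GROUPS = {"positive": {"CR", "PR"}, "negative": {"PD"}}
```

For example, `build_training_set(cohort, **IO_GROUPS)` would drop PR and SD patients, whereas `build_training_set(cohort, **TKI_GROUPS)` would keep PR patients as responders.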
It was observed that a small group of “IO non-responder” patients exhibit an extremely high activity of a Myogenesis signature, which consists of 14 genes (listed in Table 2) that are expressed mainly in myoblasts and skeletal muscle cells. A Myogenesis signature was calculated using the gene group signature pipeline described above: ssGSEA scores were calculated, and the scores inside the cohorts were median scaled.
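The within-cohort scaling step mentioned above might look like the following sketch. It assumes that “median scaled” means centering on the cohort median and dividing by the median absolute deviation; the text does not spell out the formula, so this interpretation and the function names are assumptions.

```python
from statistics import median


def median_scale(values):
    """Center on the median and divide by the median absolute deviation."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # Degenerate cohort: fall back to centering only.
        return [v - center for v in values]
    return [(v - center) / mad for v in values]


def scale_within_cohorts(scores, cohort_of):
    """Scale one signature's scores separately inside each cohort.

    scores:    dict mapping sample id -> raw ssGSEA score
    cohort_of: dict mapping sample id -> cohort name
    """
    by_cohort = {}
    for sample_id in scores:
        by_cohort.setdefault(cohort_of[sample_id], []).append(sample_id)
    scaled = {}
    for sample_ids in by_cohort.values():
        values = median_scale([scores[s] for s in sample_ids])
        scaled.update(zip(sample_ids, values))
    return scaled
```

Scaling each cohort separately is intended to make scores from different platforms comparable before they are pooled.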
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 63/158,825, filed Mar. 9, 2021, titled “PREDICTING RESPONSE TO TREATMENTS IN PATIENTS WITH CLEAR CELL RENAL CELL CARCINOMA”, the entire contents of which are incorporated by reference herein.
Number | Date | Country
---|---|---
63158825 | Mar 2021 | US