Significant effort has been expended towards developing state-of-the-art models that are trained and deployed on datasets for predicting disease activity in patients. For example, models are developed using a training dataset including data related to a disease and the models are subsequently deployed on a test dataset to generate predictions for the disease. These state-of-the-art models require the development of disease-specific signatures that are only applicable for making predictions for that particular disease. Put another way, these trained models are only useful for generating predictions for the same disease for which the models were trained.
There are significant limitations to this strategy. First, obtaining a training dataset that is sufficient for training a model can be difficult for certain diseases, such as a disease for which there are not enough real-life data points. This can be the case for rare diseases or for novel diseases. Second, even if a sufficient training dataset is obtained, the process of training a separate model for each of multiple diseases is computationally expensive and often risks overfitting each model to its training dataset. As a result, the models suffer a significant loss in performance when applied to a test dataset or when generalized to new sources of data (e.g., new sources of data with differences in geography and patient populations).
Disclosed herein are universal signatures that represent generalizable features that are informative for making predictions for different disease indications. In various embodiments, a machine learning approach is implemented to identify common elements in data sets and then these common elements are tested empirically to determine whether they are informative about a second data set from a disease or process distinct from the original data set. Sets of genes, hereafter referred to as universal signatures, are predictive across diverse datasets and/or species (e.g. rhesus to humans). These universal signatures are useful in different use cases, examples of which include the cases of progression of latent to active tuberculosis, and severity of COVID-19 and influenza A H1N1 infection. Therefore, universal signatures can be deployed in settings that lack disease-specific biomarkers. Thus, a small set of archetypal human immunophenotypes, captured by universal signatures, can explain a larger set of responses to diverse diseases.
Embodiments described herein are methods for developing one or more universal signatures according to data associated with a first disease indication. The one or more universal signatures are used to generate predictions for disease activity in a second (e.g., different) disease indication. Furthermore, described herein are embodiments directed to non-transitory computer-readable media comprising instructions that, when executed by a processor, cause the processor to develop one or more universal signatures according to data associated with a first disease. Furthermore, such instructions can cause the processor to use the one or more universal signatures to generate predictions for disease activity in a second (e.g., different) disease.
Altogether, the development and implementation of the one or more universal signatures represents a form of transfer learning, where the one or more universal signatures learned from data relating to a first disease indication can be applied to solve a new problem, which in this case involves generating predictions for a second disease indication (e.g., a different disease or a disease in a different species). Thus, universal signatures can be informative across unrelated datasets pertaining to different diseases. The use of transfer learned universal signatures is useful for generating predictions for diseases where sufficient examples in training datasets are limited or difficult to obtain. For example, the learned universal signature of a first disease indication can be applied to generate predictions for disease activity of a rare or novel disease. Additionally, the use of transfer learned universal signatures avoids the problem of overfitted models. Universal signatures may sacrifice a level of sensitivity and/or specificity for any particular individual disease to ensure that the universal signatures are generally predictive for disease activities across multiple diseases. More generally, the work provides support to the concept of human immunophenotypes based on universal signatures.
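The transfer step described above can be pictured with a minimal sketch, written here in Python. It assumes a signature has already been fixed from a first disease indication and is then scored, without retraining, on a cohort for a second disease indication. The marker names (`GENE_A`, `GENE_B`, `GENE_C`), the toy expression values, the mean z-score used as the signature score, and the rank-based AUC are all illustrative assumptions and are not taken from this disclosure.

```python
from statistics import mean, pstdev

def signature_score(sample, signature, reference):
    """Mean z-score of the signature markers for one sample.

    reference maps each marker to its values across the cohort and is
    used only to standardize the sample's expression values.
    """
    z_scores = []
    for marker in signature:
        mu = mean(reference[marker])
        sd = pstdev(reference[marker]) or 1.0  # guard against zero spread
        z_scores.append((sample[marker] - mu) / sd)
    return mean(z_scores)

def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical signature learned from a FIRST disease indication.
signature = ["GENE_A", "GENE_B"]

# Toy cohort for a SECOND disease indication (made-up values).
second_cohort = [
    {"GENE_A": 1.0, "GENE_B": 1.2, "GENE_C": 5.0},
    {"GENE_A": 1.1, "GENE_B": 0.9, "GENE_C": 4.1},
    {"GENE_A": 0.8, "GENE_B": 1.0, "GENE_C": 6.3},
    {"GENE_A": 2.0, "GENE_B": 2.2, "GENE_C": 5.5},
    {"GENE_A": 2.4, "GENE_B": 1.9, "GENE_C": 4.8},
    {"GENE_A": 1.9, "GENE_B": 2.5, "GENE_C": 5.2},
]
labels = [0, 0, 0, 1, 1, 1]  # 1 = disease activity present (toy labels)

reference = {m: [s[m] for s in second_cohort] for m in signature}
scores = [signature_score(s, signature, reference) for s in second_cohort]
auc_second = auc(scores, labels)
```

Because the signature is frozen before the second cohort is seen, a high AUC on that cohort would indicate that the signature transfers. A low AUC would indicate that it does not.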
Disclosed herein is a method for identifying one or more universal signatures useful for evaluating disease activity of two or more diseases, the method comprising: obtaining or having obtained expressions of a plurality of markers across individuals for a first disease indication; analyzing the expressions of the plurality of markers using a machine-learned analysis to identify one or more universal signatures from the first disease indication, wherein the one or more universal signatures are features that are predictive for a second disease indication, wherein each of the first disease indication and the second disease indication is characterized by a common condition.
Additionally disclosed herein is a method for generating a prediction of a second disease indication for a patient, the method comprising: obtaining or having obtained expressions of one or more universal signatures from the subject, the one or more universal signatures derived from a machine-learned analysis of a plurality of markers across individuals associated with a first disease indication, wherein each of the first disease indication and the second disease indication is characterized by a common condition; and based on the expressions for the one or more universal signatures, generating the prediction of the second disease indication.
In various embodiments, the one or more universal signatures comprise one or more of genes, nucleic acids, metabolites, or protein biomarkers. In various embodiments, the common condition is any one of a precursor to a disease, a sub phenotype of a disease, progression from latent to acute infection, progression from acute to chronic infection, response to an intervention, susceptibility to disease or infection, presence of acute inflammation, presence of chronic inflammation, a dysregulated pathway expression, a cellular phenotype, or a clinical phenotype. In various embodiments, the clinical phenotype is any one of high blood pressure, fever, loss of blood, loss of consciousness, increased heart rate, or need for mechanical ventilation. In various embodiments, the first disease indication describes a disease activity of a first disease, and wherein the second disease indication describes a disease activity of a second disease, and wherein the first disease indication differs from the second disease indication by any of a different disease activity of a disease, a disease activity of different diseases, or different disease activity of different diseases.
In various embodiments, each of the first disease indication or second disease indication is any one of activity of an inflammatory disease, activity of a disease observed in an animal model, activity of a bacterial infectious disease, a progression from latent to acute infection, and wherein the disease activity of the second disease is any one of disease of a cancer, activity of a human disease that represents an equivalent phenotype of a disease in an animal, activity of an infectious disease from a non-bacterial infectious agent, protection after vaccination, estimated time to death due to disease, or a diseased condition. In various embodiments, the first disease is an inflammatory disease and the second disease is a cancer. In various embodiments, the first disease is observed in an animal model and wherein the second disease is an equivalent disease phenotype in humans. In various embodiments, the first disease is a bacterial infectious disease and wherein the second disease is a disease from a non-bacterial infectious agent. In various embodiments, the disease activity of the first disease is a progression from latent to acute infection and wherein the disease activity of the second disease is protection after vaccination.
In various embodiments, the machine-learned analysis is random forest or gradient boosting for identifying the one or more universal signatures. In various embodiments, the intervention is any one of a small molecule therapeutic, a biologic, a vaccine, or a gene therapy. In various embodiments, individuals with the second disease have encountered or are likely to encounter the common condition.
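As a hedged illustration of how a random forest could be used to identify candidate universal signatures, the sketch below ranks markers by feature importance on a first dataset and then checks empirically whether the top-ranked markers remain informative on a second dataset. It uses scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute. The simulated data, the choice of markers 3 and 7 as the informative ones, the signature size of five, and the accuracy check are assumptions made for illustration only and do not describe the actual analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_markers = 30
informative = [3, 7]  # hypothetical markers tied to the common condition

def simulate(n_samples):
    """Simulated expression matrix and labels driven by the informative markers."""
    X = rng.normal(size=(n_samples, n_markers))
    y = (X[:, informative].sum(axis=1) > 0).astype(int)
    return X, y

X_first, y_first = simulate(300)    # stands in for the first disease indication
X_second, y_second = simulate(200)  # stands in for the second disease indication

# Step 1: rank markers by importance using only the first dataset.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_first, y_first)
ranking = np.argsort(forest.feature_importances_)[::-1]
signature = sorted(ranking[:5].tolist())  # top five markers (assumed size)

# Step 2: test empirically whether the signature is informative on the
# second dataset, using a model that never saw second-dataset labels.
probe = RandomForestClassifier(n_estimators=200, random_state=0)
probe.fit(X_first[:, signature], y_first)
accuracy_second = probe.score(X_second[:, signature], y_second)
```

Gradient boosting could be substituted in the same pattern, for example with scikit-learn's `GradientBoostingClassifier`, which also exposes `feature_importances_`.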
In various embodiments, generating a prediction of the second disease indication for the patient comprises performing an unsupervised clustering of the expressions of the one or more universal signatures to classify the patient. In various embodiments, generating the prediction of the second disease indication for a patient comprises performing a dimensionality reduction analysis of the expressions of the one or more universal signatures.
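The dimensionality reduction and unsupervised clustering mentioned above can be sketched as follows. This is a minimal example that assumes principal component analysis (computed through a singular value decomposition) for the reduction and Lloyd's k-means with a farthest-point initialization for the clustering. Both algorithm choices, the function names, and the simulated expression values are assumptions for illustration, since the text does not name specific algorithms.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project samples onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each marker
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def init_centers(X, k):
    """Deterministic farthest-point initialization, starting from sample 0."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k=2, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per sample."""
    centers = init_centers(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Simulated expressions of six signature markers for 40 patients:
# two hypothetical groups with clearly different expression levels.
rng = np.random.default_rng(0)
group_low = rng.normal(loc=0.0, scale=1.0, size=(20, 6))
group_high = rng.normal(loc=5.0, scale=1.0, size=(20, 6))
expressions = np.vstack([group_low, group_high])

embedding = pca_project(expressions, n_components=2)
labels = kmeans(embedding, k=2)  # cluster membership classifies each patient
```

Under these assumptions, the cluster a patient falls into serves as the classification. Mapping a cluster to a clinical meaning would still require reference samples with known outcomes.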
In various embodiments, the method further comprises: determining whether to include the subject in a clinical trial study according to the predicted disease activity of the disease in the subject.
In various embodiments, the one or more universal signatures comprise one or more genes selected from NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, POLA2, PRSS23, SHMT1, RIPK1, AKR1A1, PRPF3, ETS1, MANSC1, PDHA1, ACLY, CHI3L2, MCMI, DNAJC18, LCT, YRDC, AIFM1, SFN, FBN1, EIF4H, CLEC4A, BCAP31, ATG4B, CSRP1, RDH11, GCLM, CDC7, GLOD5, IDH2, FMR1, PPARA, CCNE1, DDB1, BMP1, EHD4, VAV3, MPG, SPAG4, PSMD3, BCKDHA, GRAMD1B, and SEC61A1. In various embodiments, the one or more universal signatures comprise one or more genes selected from CRB3, BCAP31, GMPPB, CD4, STARD3, CALR, CSRP1, CPT1A, LDLRAP1, RRAS, HMGCR, RASGRP2, PTS, SORD, SLC26A6, VAT1, GPAA1, CXCR3, NAMPT, EPHX1, SEPT9, GMPPA, B4GALT7, AAAS, TP53INP1, GYS1, FASN, NOC4L, RRP9, MXI1, TP53, SLC7A11, FOXP3, DNASE1L1, MGAT1, SEC61A1, FYCO1, S100A10, LSS, IFRD1, DCP2, EDC4, ANKZF1, IDUA, IGFBP2, DDX39A, UCHL1, NR4A1, PDIA5, and ENGASE. In various embodiments, the one or more universal signatures comprise one or more genes selected from NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, PDCD1LG2, SNX10, SEC24D, UBE2L6, LDHC, FAS, CXCL10, STAT2, IRF7, CD274, PSME2, LPCAT2, PSMB8, FBXO6, DUSP10, PLA2G4C, BANF1, EPOR, KCNMA1, CTSK, ITGA2, MPZL2, FEZ1, JAK2, BAZ1A, ICAM4, DAPP1, RIPK1, RNF144B, LAP3, C1QA, TYMP, GCH1, C1QB, CREM, ETV7, FOSB, MRPL15, PSEN1, MXI1, and TRAFD1.
In various embodiments, the one or more universal signatures comprise one or more genes selected from DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, DNAJC12, RET, IL20RB, TNFSF10, DLG4, CKAP4, NDST1, GAPDH, ARL3, PLG, MDH2, GSTP1, S100A9, B4GALT7, H2AFJ, LTB4R, TAGLN2, IRF7, NDUFV1, CD300LB, RTP4, CTSD, HIST1H2BG, IL27, TNFRSF1B, SORBS1, NOP2, TNFSF13B, HLA-DRB5, RHOG, PSMB9, HSPA6, CD63, SLC2A8, IFITM1, CKB, ALDOA, MSRB1, OSMR, DRAP1, and PLA2G4A. In various embodiments, the one or more universal signatures comprise one or more genes selected from LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, GYS2, CD151, RAD51C, ARHGEF2, PFN1, AP4B1, IGFBP4, OASL, PDGFC, MIEN1, BEST3, SH3RF1, RACGAP1, FMO3, HNRNPA2B1, F2RL1, CAMKK2, ITGB5, FLVCR2, ZNF462, KIAA1324, CENPN, IKBKE, SERPINF2, FAM162A, SNX2, SERPING1, CLCA2, DPEP3, TNFAIP2, FSTL4, CTSD, BCAR1, MKX, RGS2, SAMD9, GCLM, BST1, IRS2, RNASE6, and ELOVL3. In various embodiments, the one or more universal signatures comprise one or more genes selected from GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, BAAT, MRPL11, OAS1, RFX5, PSMD7, ALDH2, STAP1, GYS2, GMFB, CCL3, PSMA4, CTHRC1, CMTM2, CD36, B4GALT2, EDF1, CDK5R1, TREML3P, PML, HEPHL1, TNFRSF21, PSMB9, GNAI1, TSPAN13, ATP6V0B, SLC4A4, ILF2, AKAP12, HLA-DRB5, PGR, AGTRAP, P3H1, CDADC1, TRIM5, PTGER3, ADCY6, ERBB2, NFYA, STATE, MMD, and RPL10A.
In various embodiments, the one or more universal signatures comprise one or more genes selected from MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, LTF, S100A12, MS4A6A, GSTK1, RNF31, NOTCH4, COL17A1, S100A8, CTSG, STX11, PTX3, MYOF, LTA4H, TRIM26, CYP1B1, ARG1, IFNGR2, B3GNT5, KYNU, LPGAT1, SLC9A3R1, HP, PADI4, PSME1, MGST2, NR4A1, SPP1, DEFA3, ME1, RBP7, DUSP6, and MCRS1. In various embodiments, the one or more universal signatures comprise one or more genes selected from POLH, PTGER3, RUNX1, CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, IRF4, TNNC2, RIT1, ALG1, PDCD4, CYP2E1, GABARAPL2, B4GALT7, IFNAR1, MEF2C, TLR8, TSPYL2, M6PR, IKZF1, CNDP2, SLCO2A1, RBM4, FH, MRTO4, DTX4, RFC2, CAMK1G, CBX8, HM13, PSMB10, GCLM, SLC25A3, MYD88, IL33, ITGAM, PPIA, SEC22B, CXCR3, SCRN1, RXRA, SDHA, GLDC, FGF6, PRKG2, TFPI, and IMMT. In various embodiments, the one or more universal signatures comprise one or more genes selected from CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, ARNTL2, KRT82, PRIM2, MOCS2, IL21R, MAPK8, NMNAT1, ZNF107, CTSG, IL7, ANKRD34B, TMF1, HPS3, CIT, TRAP1, MSH2, PDGFC, TMLHE, MVP, TBX21, PICALM, KRT6A, FMR1, PCSK9, DNASE1L3, ENDOG, TPD52L1, PEX6, MPO, CHRNA7, SLFN5, TNFRSF1A, CD24, CASC1, LLGL2, DLG5, MYO5C, PGR, PFKFB2, AK2, and COL19A1.
In various embodiments, the one or more universal signatures comprise one or more genes selected from HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, PPFIA4, RPH3A, CXCL11, ERMAP, GBP2, CASP1, TLR7, EPX, ANKH, ARFGAP3, BAZ1A, COL5A1, COP1, BIRC2, SLC7A5, TRO, CXCL6, TNFSF10, GYPE, COL17A1, ROCK1, CD83, AK7, MSR1, LCN2, SPN, ASS1, HDGF, CXCL16, POLR3D, GK, OLFM4, STK3, RCBTB1, FOLR3, FBXO32, TMEM98, PRDX2, CKB, UHRF1BP1L and CTSG. In various embodiments, the one or more universal signatures comprise one or more genes selected from AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, PFKP, MB, MYC, PEX14, TAF13, BMX, PRKAA2, PTGER3, C3, SPTAN1, PROCR, AARS2, RHOT2, PHEX, THOP1, TIMM10, TBL1X, HNF4A, SLC6A9, FECH, CLCN3, CEACAM4, MMP1, HSD11B2, SLC25A25, RAB32, CXCL9, KCNE2, FCAR, CFP, IGF1, PEX16, RNF214, PIM1, JUNB, MDM2, PFKFB4, SIAH2, EGR2, KCNK10, EHMT2, FPR1, CD27, CETN2, and TGM1.
In various embodiments, the one or more universal signatures comprise one or more genes selected from SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, TM7SF2, FUS, DDAH2, SPAG4, FBXL14, LGALS8, GNE, HAS2, IGSF6, B4GALT1, POLK, PLK4, NDUFB4, GNG8, MUC1, AGGF1, PPIB, SLC1A4, HLA-DQB1, SEMA4G, MT2A, COL4A2, PLCB4, GYS1, PRKCG, RXFP2, PLA2G4C, ALDH1A2, IL1A, IBTK, SPARC, OAS3, EPHA4, HLA-B, MICB, CCL18, SLC39A6, GLCE, TUBB2B, FBXO8, and SNX6. In various embodiments, the one or more universal signatures comprise one or more genes selected from NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, CLCA2, BAALC, PTPN14, IRF9, SAA2, HR, IRGQ, AKT3, SYNGR1, NKX2-2, MT1H, SERPINA6, CAMK2N1, CCT6B, WDHD1, NKX3-1, LDHC, MALT1, CD9, CLGN, SLC25A19, MAP7, XCL1, ACSL6, TFRC, CAT, NKD1, CNBP, ALDH1L1, CCL7, SLC20A1, KRAS, CSF1, CASP2, HDAC11, KIR2DS4, CEACAM19, CFH, CAB39L, DEPDC1, and PSMA1. In various embodiments, the one or more universal signatures comprise one or more genes selected from CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, SPSB1, DNAH17, TNNC1, CPN1, SYNGR2, CPA4, MYL1, DUOX2, ZNF621, GAPDHS, BCAP31, DLG1, IL17RB, SLC6A6, BCL2L2, HSPA1B, SLC1A4, TSTD1, HSPB8, MSC, CENPJ, ARL8A, CTLA4, GFRA1, WASF1, RIPK1, ENO3, KRT19, PLVAP, RAD18, ACHE, FBLN5, MGST2, ANAPC5, RFX5, CASP7, STC1, NCK2, IFI27, APOA4, and MSRB2.
Additionally disclosed herein is a non-transitory computer-readable medium for identifying one or more universal signatures useful for evaluating two or more disease indications, the computer-readable medium comprising instructions that, when executed by a processor, cause the processor to perform the steps comprising: obtaining or having obtained expressions of a plurality of markers across individuals for a first disease indication; analyzing the expressions of the plurality of markers using a machine-learned analysis to identify one or more universal signatures from the first disease indication, wherein the one or more universal signatures are features that are predictive for a second disease indication, wherein each of the first disease indication and the second disease indication is characterized by a common condition.
Additionally disclosed herein is a non-transitory computer-readable medium for generating a prediction of a second disease indication for a patient, the computer-readable medium comprising instructions that, when executed by a processor, cause the processor to perform the steps comprising: obtaining or having obtained expressions of one or more universal signatures from the subject, the one or more universal signatures derived from a machine-learned analysis of a plurality of markers across individuals associated with a first disease indication, wherein each of the first disease indication and the second disease indication is characterized by a common condition; and based on the expressions for the one or more universal signatures, generating the prediction of the second disease indication.
In various embodiments, the one or more universal signatures comprise one or more of genes, nucleic acids, metabolites, or protein biomarkers. In various embodiments, the common condition is any one of a precursor to a disease, a sub phenotype of a disease, progression from latent to acute infection, progression from acute to chronic infection, response to an intervention, susceptibility to disease or infection, presence of acute inflammation, presence of chronic inflammation, a dysregulated pathway expression, a cellular phenotype, or a clinical phenotype (e.g., high blood pressure, fever, loss of blood, loss of consciousness, or increased heart rate). In various embodiments, the clinical phenotype is any one of high blood pressure, fever, loss of blood, loss of consciousness, increased heart rate, or need for mechanical ventilation.
In various embodiments, the first disease indication describes a disease activity of a first disease, and wherein the second disease indication describes a disease activity of a second disease, and wherein the first disease indication differs from the second disease indication by any of a different disease activity of a disease, a disease activity of different diseases, or different disease activity of different diseases. In various embodiments, each of the first disease indication or second disease indication is any one of activity of an inflammatory disease, activity of a disease observed in an animal model, activity of a bacterial infectious disease, a progression from latent to acute infection, a dysregulated blood cell population makeup, or a dysregulated pathway expression, and wherein the disease activity of the second disease is any one of disease of a cancer, activity of a human disease that represents an equivalent phenotype of a disease in an animal, activity of an infectious disease from a non-bacterial infectious agent, protection after vaccination, estimated time to death due to disease, or a diseased condition. In various embodiments, the first disease is an inflammatory disease and the second disease is a cancer. In various embodiments, the first disease is observed in an animal model and wherein the second disease is an equivalent disease phenotype in humans. In various embodiments, the first disease is a bacterial infectious disease and wherein the second disease is a disease from a non-bacterial infectious agent. In various embodiments, the disease activity of the first disease is a progression from latent to acute infection and wherein the disease activity of the second disease is protection after vaccination.
In various embodiments, the machine-learned analysis is random forest or gradient boosting for identifying the one or more universal signatures. In various embodiments, the intervention is any one of a small molecule therapeutic, a biologic, a vaccine, or a gene therapy. In various embodiments, individuals with the second disease have encountered or are likely to encounter the common condition.
In various embodiments, generating the prediction of the second disease indication for the patient comprises performing an unsupervised clustering of the expressions of the one or more universal signatures to classify the subject. In various embodiments, generating the prediction of the second disease indication for the patient comprises performing a dimensionality reduction analysis of the expressions of the one or more universal signatures. In various embodiments, the non-transitory computer-readable medium further comprises instructions that, when executed by the processor, cause the processor to perform the steps comprising: determining whether to include the subject in a clinical trial study according to the prediction of the disease indication for the patient.
In various embodiments, the one or more universal signatures comprise one or more genes selected from NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, POLA2, PRSS23, SHMT1, RIPK1, AKR1A1, PRPF3, ETS1, MANSC1, PDHA1, ACLY, CHI3L2, MCMI, DNAJC18, LCT, YRDC, AIFM1, SFN, FBN1, EIF4H, CLEC4A, BCAP31, ATG4B, CSRP1, RDH11, GCLM, CDC7, GLOD5, IDH2, FMR1, PPARA, CCNE1, DDB1, BMP1, EHD4, VAV3, MPG, SPAG4, PSMD3, BCKDHA, GRAMD1B, and SEC61A1. In various embodiments, the one or more universal signatures comprise one or more genes selected from CRB3, BCAP31, GMPPB, CD4, STARD3, CALR, CSRP1, CPT1A, LDLRAP1, RRAS, HMGCR, RASGRP2, PTS, SORD, SLC26A6, VAT1, GPAA1, CXCR3, NAMPT, EPHX1, SEPT9, GMPPA, B4GALT7, AAAS, TP53INP1, GYS1, FASN, NOC4L, RRP9, MXI1, TP53, SLC7A11, FOXP3, DNASE1L1, MGAT1, SEC61A1, FYCO1, S100A10, LSS, IFRD1, DCP2, EDC4, ANKZF1, IDUA, IGFBP2, DDX39A, UCHL1, NR4A1, PDIA5, and ENGASE. In various embodiments, the one or more universal signatures comprise one or more genes selected from NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, PDCD1LG2, SNX10, SEC24D, UBE2L6, LDHC, FAS, CXCL10, STAT2, IRF7, CD274, PSME2, LPCAT2, PSMB8, FBXO6, DUSP10, PLA2G4C, BANF1, EPOR, KCNMA1, CTSK, ITGA2, MPZL2, FEZ1, JAK2, BAZ1A, ICAM4, DAPP1, RIPK1, RNF144B, LAP3, C1QA, TYMP, GCH1, C1QB, CREM, ETV7, FOSB, MRPL15, PSEN1, MXI1, and TRAFD1.
In various embodiments, the one or more universal signatures comprise one or more genes selected from DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, DNAJC12, RET, IL20RB, TNFSF10, DLG4, CKAP4, NDST1, GAPDH, ARL3, PLG, MDH2, GSTP1, S100A9, B4GALT7, H2AFJ, LTB4R, TAGLN2, IRF7, NDUFV1, CD300LB, RTP4, CTSD, HIST1H2BG, IL27, TNFRSF1B, SORBS1, NOP2, TNFSF13B, HLA-DRB5, RHOG, PSMB9, HSPA6, CD63, SLC2A8, IFITM1, CKB, ALDOA, MSRB1, OSMR, DRAP1, and PLA2G4A. In various embodiments, the one or more universal signatures comprise one or more genes selected from LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, GYS2, CD151, RAD51C, ARHGEF2, PFN1, AP4B1, IGFBP4, OASL, PDGFC, MIEN1, BEST3, SH3RF1, RACGAP1, FMO3, HNRNPA2B1, F2RL1, CAMKK2, ITGB5, FLVCR2, ZNF462, KIAA1324, CENPN, IKBKE, SERPINF2, FAM162A, SNX2, SERPING1, CLCA2, DPEP3, TNFAIP2, FSTL4, CTSD, BCAR1, MKX, RGS2, SAMD9, GCLM, BST1, IRS2, RNASE6, and ELOVL3. In various embodiments, the one or more universal signatures comprise one or more genes selected from GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, BAAT, MRPL11, OAS1, RFX5, PSMD7, ALDH2, STAP1, GYS2, GMFB, CCL3, PSMA4, CTHRC1, CMTM2, CD36, B4GALT2, EDF1, CDK5R1, TREML3P, PML, HEPHL1, TNFRSF21, PSMB9, GNAI1, TSPAN13, ATP6V0B, SLC4A4, ILF2, AKAP12, HLA-DRB5, PGR, AGTRAP, P3H1, CDADC1, TRIM5, PTGER3, ADCY6, ERBB2, NFYA, STATE, MMD, and RPL10A.
In various embodiments, the one or more universal signatures comprise one or more genes selected from MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, LTF, S100A12, MS4A6A, GSTK1, RNF31, NOTCH4, COL17A1, S100A8, CTSG, STX11, PTX3, MYOF, LTA4H, TRIM26, CYP1B1, ARG1, IFNGR2, B3GNT5, KYNU, LPGAT1, SLC9A3R1, HP, PADI4, PSME1, MGST2, NR4A1, SPP1, DEFA3, ME1, RBP7, DUSP6, and MCRS1. In various embodiments, the one or more universal signatures comprise one or more genes selected from POLH, PTGER3, RUNX1, CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, IRF4, TNNC2, RIT1, ALG1, PDCD4, CYP2E1, GABARAPL2, B4GALT7, IFNAR1, MEF2C, TLR8, TSPYL2, M6PR, IKZF1, CNDP2, SLCO2A1, RBM4, FH, MRTO4, DTX4, RFC2, CAMK1G, CBX8, HM13, PSMB10, GCLM, SLC25A3, MYD88, IL33, ITGAM, PPIA, SEC22B, CXCR3, SCRN1, RXRA, SDHA, GLDC, FGF6, PRKG2, TFPI, and IMMT. In various embodiments, the one or more universal signatures comprise one or more genes selected from CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, ARNTL2, KRT82, PRIM2, MOCS2, IL21R, MAPK8, NMNAT1, ZNF107, CTSG, IL7, ANKRD34B, TMF1, HPS3, CIT, TRAP1, MSH2, PDGFC, TMLHE, MVP, TBX21, PICALM, KRT6A, FMR1, PCSK9, DNASE1L3, ENDOG, TPD52L1, PEX6, MPO, CHRNA7, SLFN5, TNFRSF1A, CD24, CASC1, LLGL2, DLG5, MYO5C, PGR, PFKFB2, AK2, and COL19A1. In various embodiments, the one or more universal signatures comprise one or more genes selected from HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, PPFIA4, RPH3A, CXCL11, ERMAP, GBP2, CASP1, TLR7, EPX, ANKH, ARFGAP3, BAZ1A, COL5A1, COP1, BIRC2, SLC7A5, TRO, CXCL6, TNFSF10, GYPE, COL17A1, ROCK1, CD83, AK7, MSR1, LCN2, SPN, ASS1, HDGF, CXCL16, POLR3D, GK, OLFM4, STK3, RCBTB1, FOLR3, FBXO32, TMEM98, PRDX2, CKB, UHRF1BP1L and CTSG. 
In various embodiments, the one or more universal signatures comprise one or more genes selected from AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, PFKP, MB, MYC, PEX14, TAF13, BMX, PRKAA2, PTGER3, C3, SPTAN1, PROCR, AARS2, RHOT2, PHEX, THOP1, TIMM10, TBL1X, HNF4A, SLC6A9, FECH, CLCN3, CEACAM4, MMP1, HSD11B2, SLC25A25, RAB32, CXCL9, KCNE2, FCAR, CFP, IGF1, PEX16, RNF214, PIM1, JUNB, MDM2, PFKFB4, SIAH2, EGR2, KCNK10, EHMT2, FPR1, CD27, CETN2, and TGM1.
In various embodiments, the one or more universal signatures comprise one or more genes selected from SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, TM7SF2, FUS, DDAH2, SPAG4, FBXL14, LGALS8, GNE, HAS2, IGSF6, B4GALT1, POLK, PLK4, NDUFB4, GNG8, MUC1, AGGF1, PPIB, SLC1A4, HLA-DQB1, SEMA4G, MT2A, COL4A2, PLCB4, GYS1, PRKCG, RXFP2, PLA2G4C, ALDH1A2, IL1A, IBTK, SPARC, OAS3, EPHA4, HLA-B, MICB, CCL18, SLC39A6, GLCE, TUBB2B, FBXO8, and SNX6. In various embodiments, the one or more universal signatures comprise one or more genes selected from NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, CLCA2, BAALC, PTPN14, IRF9, SAA2, HR, IRGQ, AKT3, SYNGR1, NKX2-2, MT1H, SERPINA6, CAMK2N1, CCT6B, WDHD1, NKX3-1, LDHC, MALT1, CD9, CLGN, SLC25A19, MAP7, XCL1, ACSL6, TFRC, CAT, NKD1, CNBP, ALDH1L1, CCL7, SLC20A1, KRAS, CSF1, CASP2, HDAC11, KIR2DS4, CEACAM19, CFH, CAB39L, DEPDC1, and PSMA1. In various embodiments, the one or more universal signatures comprise one or more genes selected from CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, SPSB1, DNAH17, TNNC1, CPN1, SYNGR2, CPA4, MYL1, DUOX2, ZNF621, GAPDHS, BCAP31, DLG1, IL17RB, SLC6A6, BCL2L2, HSPA1B, SLC1A4, TSTD1, HSPB8, MSC, CENPJ, ARL8A, CTLA4, GFRA1, WASF1, RIPK1, ENO3, KRT19, PLVAP, RAD18, ACHE, FBLN5, MGST2, ANAPC5, RFX5, CASP7, STC1, NCK2, IFI27, APOA4, and MSRB2.
These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description and accompanying drawings. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. For example, a letter after a reference numeral, such as “third party entity 330A,” indicates that the text refers specifically to the element having that particular reference numeral. A reference numeral in the text without a following letter, such as “third party entity 330,” refers to any or all of the elements in the figures bearing that reference numeral (e.g. “third party entity 330” in the text refers to reference numerals “third party entity 330A” and/or “third party entity 330B” in the figures).
Terms used in the claims and specification are defined as set forth below unless otherwise specified.
The terms “subject,” “individual,” and “patient” are used interchangeably and encompass a cell, tissue, organism, human or non-human, mammal or non-mammal, male or female, whether in vivo, ex vivo, or in vitro. In various embodiments, different subjects can be human or non-human, and as such, universal signatures, as described herein, can be generated and/or deployed for both human and non-human subjects.
The terms “marker,” “markers,” “biomarker,” and “biomarkers” are used interchangeably and encompass, without limitation, lipids, lipoproteins, proteins, cytokines, chemokines, growth factors, peptides, nucleic acids, genes, oligonucleotides, metabolites, mutations, variants, polymorphisms, modifications, fragments, subunits, degradation products, elements, and other analytes or sample-derived measures. A marker can also include mutated proteins, mutated nucleic acids, structural variants including copy number variations, inversions, and/or transcript variants.
The term “expression of markers” refers to a quantity or state of a marker. For example, expression of a peptide can refer to a quantitative amount of the peptide (e.g., the quantity of the peptide in a sample). As another example, expression of a nucleic acid can refer to a quantitative amount of the nucleic acid (e.g., the quantity of the nucleic acid in a sample). As another example, expression of a gene can refer to the quantitative amount of gene product (e.g., a transcript such as RNA nucleic acid transcribed from the gene, or a protein translated from the mRNA of the gene). As another example, expression of a gene can refer to a state of the gene, such as an active state or a silenced state. As another example, expression of a marker can refer to quantities of metabolites or metabolic patterns from metabolomics.
The terms “universal signature,” “transfer signature,” and “shared signature” are used interchangeably and refer to one or more markers that are predictive for two or more disease indications. In various embodiments, a universal signature includes one marker, such as a gene marker. In various embodiments, a universal signature includes two or more markers, such as two or more gene markers. Generally, a universal signature, as disclosed herein, is identified by analyzing data related to a first disease indication. Such a universal signature can then be applied for generating predictions for additional disease indications. In various embodiments, a universal signature is associated with a common condition of the first disease indication and the second disease indication. For example, the universal signature can play a role in the underlying biology of the common condition of the first disease indication and the second disease indication. This enables the universal signature to be predictive of the first disease indication and the second disease indication.
The term “disease indication” refers to disease activity or state of a disease. The term “different disease indication” refers to any of 1) different disease activity of a disease, 2) a disease activity of different diseases, or 3) different disease activity of different diseases. Generally, a first disease indication and a second disease indication differ either by the disease activity, the disease, or both. For example, a first disease indication can be vaccine protection in tuberculosis, where the disease activity refers to vaccine protection and the disease is tuberculosis. A second disease indication can be progression of tuberculosis, where disease activity refers to progression and the disease is tuberculosis. As another example, a first disease indication can be chronic infection in infectious diseases, where the disease activity refers to chronic infection and the diseases are infectious diseases. A second disease indication can refer to the same disease activity (e.g., chronic infection) in a different disease (e.g., glioma). The phrase “different disease” also encompasses a disease in different species. For example, tuberculosis in a human and tuberculosis in a non-human (e.g., Rhesus Macaque) are considered different diseases.
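The three ways in which two disease indications can differ may be easier to follow as a small data structure. The sketch below is illustrative only: the class name `DiseaseIndication`, its fields, and the function `indication_difference` are hypothetical names introduced here. They encode the rule stated above that the same disease in a different species is treated as a different disease.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DiseaseIndication:
    disease: str            # e.g., "tuberculosis"
    activity: str           # e.g., "progression"
    species: str = "human"  # same disease in another species counts as different

def indication_difference(first: DiseaseIndication,
                          second: DiseaseIndication) -> Optional[str]:
    """Return which of the three definitional cases applies, or None if identical."""
    same_disease = (first.disease, first.species) == (second.disease, second.species)
    same_activity = first.activity == second.activity
    if same_disease and same_activity:
        return None
    if same_disease:
        return "different disease activity of a disease"
    if same_activity:
        return "a disease activity of different diseases"
    return "different disease activity of different diseases"

# Examples taken from the definition above.
tb_vaccine = DiseaseIndication("tuberculosis", "vaccine protection")
tb_progression = DiseaseIndication("tuberculosis", "progression")
tb_progression_macaque = DiseaseIndication("tuberculosis", "progression",
                                           species="rhesus macaque")
infection_chronic = DiseaseIndication("infectious disease", "chronic infection")
glioma_chronic = DiseaseIndication("glioma", "chronic infection")

case_1 = indication_difference(tb_vaccine, tb_progression)
case_2 = indication_difference(infection_chronic, glioma_chronic)
case_species = indication_difference(tb_progression, tb_progression_macaque)
case_3 = indication_difference(tb_vaccine, glioma_chronic)
```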
The phrase “disease activity of a disease” refers to any one of activity of an inflammatory disease, activity of a cancer, activity of a disease observed in an animal model, activity of a bacterial infectious disease, activity of a viral infectious disease, a progression from latent to acute infection, disease of a cancer, activity of a human disease that represents an equivalent phenotype of a disease in an animal, activity of an infectious disease from a non-bacterial infectious agent, protection after vaccination, antibody response to vaccination, estimated time to death due to disease, or a diseased condition.
The term “sample” or “test sample” can include a single cell or multiple cells or fragments of cells or an aliquot of body fluid, such as a blood sample, taken from a subject, by means including venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage sample, scraping, surgical incision, or intervention or other means known in the art.
The term “obtaining data” or “obtaining a dataset” encompasses obtaining a set of data determined from at least one sample. Obtaining a dataset encompasses obtaining a sample and processing the sample to experimentally determine the data. The phrase also encompasses creating a dataset. The phrase also encompasses receiving a set of data, e.g., from a third party that has processed the sample to experimentally determine the dataset. Additionally, the phrase encompasses mining data from at least one database or at least one publication or a combination of databases and publications. A dataset can be obtained by one of skill in the art in a variety of known ways, including by retrieving a dataset that is stored on a storage memory.
The phrase “common condition” refers to any one of a precursor to a disease, a sub phenotype of a disease, progression from latent to acute infection, progression from acute to chronic infection, response to an intervention, susceptibility to disease or infection, presence of acute inflammation, presence of chronic inflammation, a dysregulated pathway expression, a cellular phenotype, or a clinical phenotype (e.g., high blood pressure, fever, loss of blood, loss of consciousness, or increased heart rate). In various embodiments, a first disease and a second disease share a common condition (e.g., share a common precursor or common sub phenotype).
Therefore, one or more universal signatures developed from a first disease indication can be predictive for disease activity for a second disease indication due to the sharing of the common condition between the first and second diseases.
It must be noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise.
Overview
Data associated with a first disease indication 110 is obtained. In various embodiments, data associated with a first disease indication 110 comprises data that are derived from individuals. Such individuals can be known to have the first disease indication (e.g., disease activity of a first disease). For example, the individuals may have been clinically diagnosed with the first disease. Data associated with a first disease indication 110 can include expressions of markers of the individuals who are known to exhibit disease activity of the first disease.
As shown in
Referring now to the deployment process 160, the one or more universal signatures 120 identified during the development process 150 are used to generate a prediction for a second disease indication. In various embodiments, a common condition 125 guides the selection of the one or more universal signatures that are to be used for generating a prediction for a second disease indication. For example, the first disease indication and second disease indication may share a common condition 125 that characterizes, at least in part, each of the first and second disease indications. Examples of a common condition 125 include a precursor to a disease, a sub phenotype of a disease, progression from latent to acute infection, progression from acute to chronic infection, response to an intervention, susceptibility to disease or infection, presence of acute inflammation, presence of chronic inflammation, a dysregulated pathway expression, a cellular phenotype, or a clinical phenotype (e.g., high blood pressure, fever, loss of blood, loss of consciousness, or increased heart rate). The common condition 125 indicates likely commonality in the underlying biology of the first and second disease indications such that the one or more universal signatures developed for the first disease indication can be predictive for the second disease indication.
As shown in
Although
The deployment process 160 involves analyzing 135 the expressions of markers (e.g., genes) of the one or more universal signatures from the patients 130. The analysis of the expressions of markers of the one or more universal signatures yields a prediction for the second disease indication 140. In one embodiment, the analysis of the expressions of the markers of the one or more universal signatures involves the application of a machine learning model that is trained to predict disease activity of the second disease using the one or more universal signatures. In other words, the machine learning model can be previously trained using a training dataset with expressions of markers of the universal signatures and the corresponding disease activity of the second disease. In one embodiment, the analysis of the expressions of markers of the universal signatures involves an unsupervised clustering process for classifying the patients 130 into a category. The prediction for the second disease indication 140 can be used for various purposes, such as determining whether patients 130 are eligible or ineligible for enrollment in a clinical trial. In various embodiments, the prediction for the second disease indication 140 can be used to guide the care that is provided to a patient 130 (e.g., selection of an intervention that is provided to a patient 130).
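As a non-limiting illustration of the unsupervised clustering embodiment, the following Python sketch scores each patient by the mean expression of the markers of a universal signature and then separates the patients into two categories with a one-dimensional two-means procedure. The marker names (G1, G2), patient identifiers, expression values, the choice of a mean score, and the choice of two clusters are all hypothetical assumptions made for the example and are not taken from the disclosure.

```python
def signature_score(expression, signature):
    """Mean expression of the signature's markers for one patient."""
    return sum(expression[m] for m in signature) / len(signature)


def two_means(scores, max_iter=50):
    """One-dimensional k-means with k=2; returns a 0/1 category per score."""
    low, high = min(scores), max(scores)  # initial cluster centers
    labels = [0] * len(scores)
    for _ in range(max_iter):
        # Assign each score to the nearer of the two centers.
        labels = [0 if abs(s - low) <= abs(s - high) else 1 for s in scores]
        group0 = [s for s, c in zip(scores, labels) if c == 0]
        group1 = [s for s, c in zip(scores, labels) if c == 1]
        if not group1:  # all scores identical, nothing to separate
            break
        new_low = sum(group0) / len(group0)
        new_high = sum(group1) / len(group1)
        if (new_low, new_high) == (low, high):  # centers stopped moving
            break
        low, high = new_low, new_high
    return labels


# Hypothetical signature and hypothetical expression values.
signature = ["G1", "G2"]
patients = {
    "P1": {"G1": 1.0, "G2": 1.2},
    "P2": {"G1": 0.8, "G2": 1.0},
    "P3": {"G1": 5.0, "G2": 5.5},
    "P4": {"G1": 4.8, "G2": 5.1},
}

scores = [signature_score(p, signature) for p in patients.values()]
clusters = two_means(scores)
print(dict(zip(patients, clusters)))
```

In this sketch the category labels carry no clinical meaning on their own; associating a category with a disease activity would require additional information, such as known outcomes for some of the patients.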
Although
Additionally, in various embodiments, a universal signature identified from a development process 150 can be applied more than once across different deployment processes 160 for different disease indications. For example, a universal signature determined from data associated with a first disease indication can be applied to generate predictions for additional disease indications that share a common condition 125 with the first disease indication. In various embodiments, the multiple disease indications can be two disease indications, three disease indications, four disease indications, five disease indications, six disease indications, seven disease indications, eight disease indications, nine disease indications, or ten disease indications. In various embodiments, the multiple disease indications can be eleven or more disease indications.
Methods for Developing Universal Signatures
Reference is now made to
Step 210 involves obtaining data associated with a first disease indication, such as expressions of markers for individuals associated with the first disease indication. In various embodiments, the individuals have been clinically diagnosed and exhibit disease activity of the first disease. In some embodiments, the individuals have not been clinically diagnosed with the first disease and do not exhibit disease activity of the first disease. For example, such individuals may be healthy individuals. In various embodiments, these individuals have encountered a condition (e.g., a common condition as is described in further detail below) of the first disease. In some embodiments, the individuals need not have encountered the condition but may be likely to encounter the condition of the first disease in the future.
In various embodiments, the expressions of markers for individuals associated with the first disease indication are in response to a perturbation or stimulus. Put another way, the expressions of markers for individuals may have been determined from the individuals at a timepoint relative to a perturbation or stimulus. Examples of a perturbation or stimulus include an infection (e.g., bacterial infection or viral infection) or a treatment (e.g., drug treatment, medication, or a vaccination). As a specific example, the perturbation is a vaccine, and therefore the expressions of markers for individuals can be determined from individuals at any of the different timepoints of 1) pre-vaccination, 2) pre-challenge, or 3) post-challenge.
Therefore, in some embodiments, the expressions of markers obtained at step 210 represent the response to the perturbation or stimulus.
In various embodiments, data associated with a first disease indication can include data from different studies. Thus, the data from the different studies can be aggregated to generate an aggregated dataset. As an example, a first study can include data from a human clinical trial. A second study can include data from a non-human study. Such a non-human study can be a pre-clinical trial study that involves a non-human subject (e.g., a study involving mammalian subjects, such as Rhesus Macaques). Thus, the aggregated dataset includes data from two or more studies and in such embodiments, the identification of one or more universal signatures, as described in further detail below, involves analyzing data from different sources (e.g., from human and non-human subjects). In various embodiments, when identifying one or more universal signatures from multiple sources, the top performing N markers from each source are included as a universal signature. In various embodiments, the top performing N markers across all sources are selected as a universal signature.
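The two selection options described above (top N markers from each source versus top N markers across all sources) can be sketched in Python as follows. The source names, marker names, and performance scores are hypothetical, and averaging a marker's score over the sources is only one assumed way of ranking markers across all sources.

```python
def top_n_per_source(scores_by_source, n):
    """Union of the n best-scoring markers of each source, in first-seen order."""
    selected = []
    for scores in scores_by_source.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        for marker in ranked[:n]:
            if marker not in selected:
                selected.append(marker)
    return selected


def top_n_across_sources(scores_by_source, n):
    """The n markers with the highest mean score over the sources measuring them."""
    collected = {}
    for scores in scores_by_source.values():
        for marker, score in scores.items():
            collected.setdefault(marker, []).append(score)
    mean_score = {m: sum(v) / len(v) for m, v in collected.items()}
    return sorted(mean_score, key=mean_score.get, reverse=True)[:n]


# Hypothetical per-source performance scores (higher is better).
scores_by_source = {
    "human_study": {"G1": 0.9, "G2": 0.7, "G3": 0.2, "G4": 0.1},
    "macaque_study": {"G1": 0.3, "G2": 0.8, "G3": 0.1, "G4": 0.6},
}

per_source = top_n_per_source(scores_by_source, 2)
across = top_n_across_sources(scores_by_source, 2)
print(per_source)  # ['G1', 'G2', 'G4']
print(across)      # ['G2', 'G1']
```

Selecting per source can yield more than N markers in total because each source contributes its own top N, whereas selecting across all sources always yields at most N markers.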
In one embodiment, obtaining the expressions of markers encompasses obtaining samples from the individuals and performing one or more assays on the samples to obtain the expressions of markers. Example assays for obtaining expressions of biomarkers include quantitating biomarkers using antibodies or performing gene expression profiling with microarrays or RNAseq. These examples are described herein in further detail. In various embodiments, obtaining the expressions of markers of universal signatures encompasses receiving, from a third party, a dataset including the expressions of markers of universal signatures of the individuals. In such embodiments, the third party may have performed the assay on samples obtained from the individuals to generate the dataset including expressions of markers. In various embodiments, data associated with the first disease indication 110 is curated from datasets. For example, such datasets can be curated from publicly available databases that include expressions of markers in patients who were previously known to have disease activity of the first disease. Examples of publicly available databases include the NCBI Gene Expression Omnibus (GEO) database (e.g., Accession numbers GSE79362, GSE102440, GSE110480, GSE17924, GSE21802, GSE111368, GSE145926, GSE48023, GSE48018) and the NIH Genomic Data Commons Data Portal. In such embodiments, datasets from different databases are aggregated to generate a single dataset for which subsequent analysis can be performed.
Generally, the dataset includes expressions of a plurality of markers for a plurality of individuals. In various embodiments, the dataset includes expressions of tens, hundreds, thousands, tens of thousands, or hundreds of thousands of markers. In some embodiments, the dataset includes expressions of at least 10, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, or at least 1000 markers. In some embodiments, the dataset includes expressions of at least 2000, at least 3000, at least 4000, at least 5000, at least 6000, at least 7000, at least 8000, at least 9000, or at least 10,000 markers. In various embodiments, the dataset includes expressions of a plurality of markers for tens, hundreds, thousands, tens of thousands, or hundreds of thousands of individuals. In some embodiments, the dataset includes expressions of a plurality of markers for at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, or at least 1000 individuals. In some embodiments, the dataset includes expressions of a plurality of markers for at least 2000, at least 3000, at least 4000, at least 5000, at least 6000, at least 7000, at least 8000, at least 9000, or at least 10,000 individuals.
In various embodiments, the dataset includes additional information pertaining to each individual. As an example, the additional information can include a reference ground truth that is useful for implementing machine-learning methods for extracting a universal signature. A reference ground truth can indicate the presence or absence of disease activity in the individual. For example, if the individual is a healthy individual who has not exhibited disease activity, a reference ground truth value can be assigned to the training example involving the healthy individual. A different individual who is exhibiting disease activity can be assigned a different reference ground truth value. For example, assuming that the disease activity is a progression from latent to acute infection, the reference ground truth for the individual identifies whether or not the individual progressed from a latent infection to an acute infection. As another example, assuming that the disease activity is protection after receiving vaccination, the reference ground truth for the individual indicates whether or not the individual exhibits immunity to the first disease due to the vaccination. In various embodiments, a reference ground truth value of “1” can be assigned to indicate that the individual exhibits disease activity of the disease whereas a reference ground truth value of “0” can be assigned to indicate that the individual does not exhibit disease activity (e.g., the individual is healthy).
At step 220, one or more universal signatures are identified by analyzing the expressions of markers in the dataset. The identified universal signatures include markers that represent a subset of the biomarkers in the dataset. Generally, a universal signature can contain markers that represent features that are informative for predicting disease activity in the first disease, given that the universal signature is identified from a training dataset associated with the first disease indication. However, as described further below, the universal signature can additionally be informative for predicting disease activity in one or more additional diseases.
In one embodiment, a universal signature is identified through univariate feature selection methods. For example, the expression of each marker in the dataset can be analyzed to determine the correlation between the expression of the marker and the reference ground truth (e.g., a reference ground truth indicating presence or absence of disease activity in an individual). The correlation between the biomarker and the reference ground truth can be represented as a coefficient, an example of which is the Pearson correlation coefficient. Depending on the coefficient, the univariate analysis can reveal whether a biomarker is positively correlated (e.g., Pearson correlation coefficient equal to or close to 1), negatively correlated (e.g., Pearson correlation coefficient equal to or close to −1), or limitedly correlated (e.g., Pearson correlation coefficient equal to or close to 0) to the reference ground truth. In various embodiments, positively or negatively correlated biomarkers can be useful when included in the universal signature. For example, the top N biomarkers that are most positively or negatively correlated with reference ground truth values can be selected for the universal signature. Other univariate feature selection methods involve performing a statistical significance test (e.g., a t-test p-value ranking) to identify biomarkers that most correlate with the disease activity of the first disease.
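As a minimal sketch of the correlation-based univariate selection described above, the following Python example computes the Pearson correlation coefficient between each marker's expression and a binary reference ground truth, and keeps the N markers with the largest absolute coefficient. The marker names, expression values, and labels are hypothetical and serve only to make the example runnable.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:  # constant input: treat as uncorrelated
        return 0.0
    return cov / (sd_x * sd_y)


def top_n_by_correlation(expression, labels, n):
    """Return (marker, r) pairs for the n markers with the largest |r|."""
    r = {marker: pearson(values, labels) for marker, values in expression.items()}
    ranked = sorted(r, key=lambda marker: abs(r[marker]), reverse=True)
    return [(marker, r[marker]) for marker in ranked[:n]]


# Hypothetical data: six individuals, reference ground truth 0 or 1.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "G1": [1, 2, 1, 5, 6, 5],  # higher in individuals labeled 1
    "G2": [4, 5, 6, 3, 2, 4],  # lower in individuals labeled 1
    "G3": [3, 1, 2, 2, 3, 1],  # unrelated to the label
}

selected = top_n_by_correlation(expression, labels, 2)
for marker, coefficient in selected:
    print(marker, round(coefficient, 3))
```

Ranking by absolute value keeps both positively and negatively correlated markers, consistent with the statement that either kind can be useful when included in the universal signature.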
In one embodiment, identifying one or more universal signatures involves, at step 225, implementing machine-learning methods, including deep learning, to extract one or more universal signatures from the biomarkers of the dataset. Example machine-learning methods include random forest, gradient boosting (XGBoost), neural networks, and support vector machines (SVMs).
In one embodiment, a universal signature includes a set of markers that had the highest weights in the random forest models, the highest weights indicating that the set of markers best discriminates between the control (e.g., non-diseased) state and the disease state of the first disease indication. In other words, the markers that have the highest predictive power on the training dataset are combined and used as the universal signature. As one example, for random forest feature selection, a method of mean decrease impurity can be implemented to identify the set of markers that are the most influential for the disease activity of the first disease. Each node in a decision tree is associated with a measure referred to as an impurity. Therefore, as the model is trained, the impact of each feature can be determined according to how much the feature changes the impurity in the tree. Heavily influential features are selected and combined as a universal signature. In various embodiments, to account for the differences of the markers (e.g., different gene numbers), the feature importances are first standardized before being combined. The markers with the highest standardized feature importance are selected as the universal signature.
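The idea of scoring a feature by how much it changes the impurity, followed by standardization of the resulting importances, can be illustrated with the simplified Python sketch below. It is not a random forest: it evaluates a single best split per marker using Gini impurity, whereas a random forest averages such decreases over many nodes and trees. The choice of Gini impurity, the z-score standardization, and all marker names and values are assumptions made for the example.

```python
from statistics import mean, pstdev


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def impurity_decrease(values, labels, threshold):
    """Drop in Gini impurity when samples are split at `threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def best_decrease(values, labels):
    """Largest impurity decrease over all candidate thresholds of one marker."""
    thresholds = sorted(set(values))[:-1]  # splitting at the maximum is a no-op
    return max((impurity_decrease(values, labels, t) for t in thresholds),
               default=0.0)


def standardize(importances):
    """Z-score the importances so that different datasets become comparable."""
    vals = list(importances.values())
    mu, sd = mean(vals), pstdev(vals)
    return {m: (v - mu) / sd if sd else 0.0 for m, v in importances.items()}


# Hypothetical data: six individuals, reference ground truth 0 or 1.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "G1": [1, 2, 1, 5, 6, 5],
    "G2": [4, 5, 6, 3, 2, 4],
    "G3": [3, 1, 2, 2, 3, 1],
}

raw = {m: best_decrease(v, labels) for m, v in expression.items()}
standardized = standardize(raw)
ranking = sorted(standardized, key=standardized.get, reverse=True)
print(ranking)
```

Standardized importances computed separately for several datasets could then be combined (for example, averaged per marker) before the highest-ranked markers are selected.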
As another example, for random forest feature selection, a method of mean decrease accuracy can be implemented. The goal of this method is to determine the impact of each feature on the performance of the model by shuffling the values of that feature and measuring how much the performance of the model is reduced. Shuffling the values of features that are predictive for the disease activity will likely negatively impact the performance of the model, whereas shuffling the values of less important features will have only a limited impact on the performance of the model.
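The shuffling procedure can be sketched in Python as shown below. To keep the example self-contained, a nearest-centroid classifier stands in for the random forest, and accuracy is measured on the same hypothetical samples used for fitting; a random forest implementation would typically use held-out or out-of-bag samples. The marker order, data values, number of repeats, and random seed are assumptions made for the example.

```python
import random


def fit_centroids(rows, labels):
    """Per-class mean of every marker (a deliberately simple stand-in model)."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, l in zip(rows, labels) if l == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Class whose centroid is nearest to the row (squared Euclidean distance)."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda cls: dist(centroids[cls]))


def accuracy(centroids, rows, labels):
    hits = sum(predict(centroids, r) == l for r, l in zip(rows, labels))
    return hits / len(labels)


def mean_decrease_accuracy(centroids, rows, labels, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling each marker's values."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between marker j and the label
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(centroids, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: columns are markers G1 (informative) and G3 (uninformative).
rows = [[1, 3], [2, 1], [1, 2], [5, 2], [6, 3], [5, 1]]
labels = [0, 0, 0, 1, 1, 1]

model = fit_centroids(rows, labels)
importances = mean_decrease_accuracy(model, rows, labels)
print(dict(zip(["G1", "G3"], importances)))
```

In this example the uninformative marker has the same mean in both classes, so shuffling it leaves every prediction unchanged, while shuffling the informative marker reduces accuracy.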
In various embodiments, step 220 involves identifying at least one universal signature, at least two universal signatures, at least three universal signatures, at least four universal signatures, at least five universal signatures, at least six universal signatures, at least seven universal signatures, at least eight universal signatures, at least nine universal signatures, at least ten universal signatures, at least eleven universal signatures, at least twelve universal signatures, at least thirteen universal signatures, at least fourteen universal signatures, at least fifteen universal signatures, at least sixteen universal signatures, at least seventeen universal signatures, at least eighteen universal signatures, at least nineteen universal signatures, at least twenty universal signatures, at least twenty one universal signatures, at least twenty two universal signatures, at least twenty three universal signatures, at least twenty four universal signatures, at least twenty five universal signatures, at least twenty six universal signatures, at least twenty seven universal signatures, at least twenty eight universal signatures, at least twenty nine universal signatures, at least thirty universal signatures, at least thirty one universal signatures, at least thirty two universal signatures, at least thirty three universal signatures, at least thirty four universal signatures, at least thirty five universal signatures, at least thirty six universal signatures, at least thirty seven universal signatures, at least thirty eight universal signatures, at least thirty nine universal signatures, at least forty universal signatures, at least forty one universal signatures, at least forty two universal signatures, at least forty three universal signatures, at least forty four universal signatures, at least forty five universal signatures, at least forty six universal signatures, at least forty seven universal signatures, at least forty eight universal signatures, at least 
forty nine universal signatures, or at least fifty universal signatures. In various embodiments, step 220 involves identifying at least sixty, at least seventy, at least eighty, at least ninety, or at least one hundred universal signatures.
Example Universal Signature
In various embodiments, a universal signature includes one marker, such as a gene marker. In various embodiments, a universal signature includes at least two markers, at least three markers, at least four markers, at least five markers, at least six markers, at least seven markers, at least eight markers, at least nine markers, at least ten markers, at least eleven markers, at least twelve markers, at least thirteen markers, at least fourteen markers, at least fifteen markers, at least sixteen markers, at least seventeen markers, at least eighteen markers, at least nineteen markers, at least twenty markers, at least twenty one markers, at least twenty two markers, at least twenty three markers, at least twenty four markers, at least twenty five markers, at least twenty six markers, at least twenty seven markers, at least twenty eight markers, at least twenty nine markers, at least thirty markers, at least thirty one markers, at least thirty two markers, at least thirty three markers, at least thirty four markers, at least thirty five markers, at least thirty six markers, at least thirty seven markers, at least thirty eight markers, at least thirty nine markers, at least forty markers, at least forty one markers, at least forty two markers, at least forty three markers, at least forty four markers, at least forty five markers, at least forty six markers, at least forty seven markers, at least forty eight markers, at least forty nine markers, or at least fifty markers. In various embodiments, a universal signature includes at least sixty markers, at least seventy markers, at least eighty markers, at least ninety markers, or at least one hundred markers.
Table 5 documents example sets of universal signatures generated from different datasets. In the examples shown in Table 5, each set of universal signatures includes 50 markers. In some embodiments, fewer or additional universal signatures may be included in a set of universal signatures. For example, as shown in Table 5, the markers in a set of universal signatures are ranked from 1-50. In some embodiments, the markers are ranked based on standardized feature importance.
A universal signature can comprise the top 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 markers from the ranked set of markers shown in Table 5. In various embodiments, the universal signature comprises five markers selected from: (a) NUP93, PPM1G, C6orf62, PJA1, and MEST; (b) CRB3, BCAP31, GMPPB, CD4, and STARD3; (c) NUB1, CASP1, WARS, TRIM21, and STAT1; (d) DNAAF1, UQCRC2, XPNPEP1, ACSM1, and DDX60; (e) LRRC28, E2F4, MRPL15, CCL22, and OTUD1; (f) GSTM3, GYG1, CCL22, MOCS2, and LY6E; (g) MAFB, LGALS3, VCAN, PDK4, and CD81; (h) POLH, PTGER3, RUNX1, CASP6, and CHPT1; (i) CPEB4, CDKN3, TRIM14, ANXA9, and CRYAB; (j) HUWE1, KCNK5, STX11, MORC3, and NETO2; (k) AKR1A1, NDST1, RNF144B, HDAC9, and PSMB3; (l) SPOCK3, PVR, CHTF8, SLC20A1, and PARP8; (m) NLRC5, CACNB2, CELSR1, PARP8, and ECT2; or (n) CCK, SESN2, NACAD, PCSK9, and C1R.
In various embodiments, the universal signature comprises ten markers selected from: (a) NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, and POLA2; (b) CRB3, BCAP31, GMPPB, CD4, STARD3, CALR, CSRP1, CPT1A, LDLRAP1, and RRAS; (c) NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, and PDCD1LG2; (d) DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, and DNAJC12; (e) LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, and GYS2; (f) GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, and BAAT; (g) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, and CSTA; (h) POLH, PTGER3, RUNX1, CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, and IRF4; (i) CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, and ARNTL2; (j) HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, and PPFIA4; (k) AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, PFKP, MB, MYC, PEX14, and TAF13; (l) SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, and TM7SF2; (m) NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, and CLCA2; or (n) CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, and SPSB1.
In various embodiments, the universal signature comprises fifteen markers selected from: (a) NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, POLA2, PRSS23, SHMT1, RIPK1, AKR1A1, and PRPF3; (b) CRB3, BCAP31, GMPPB, CD4, STARD3, CALR, CSRP1, CPT1A, LDLRAP1, RRAS, HMGCR, RASGRP2, PTS, SORD, and SLC26A6; (c) NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, PDCD1LG2, SNX10, SEC24D, UBE2L6, LDHC, and FAS; (d) DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, DNAJC12, RET, IL20RB, TNFSF10, DLG4, and CKAP4; (e) LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, GYS2, CD151, RAD51C, ARHGEF2, PFN1, and AP4B1; (f) GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, BAAT, MRPL11, OAS1, RFX5, PSMD7, and ALDH2; (g) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, and FRMD5; (h) POLH, PTGER3, RUNX1, CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, IRF4, TNNC2, RIT1, ALG1, PDCD4, and CYP2E1; (i) CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, ARNTL2, KRT82, PRIM2, MOCS2, IL21R, and MAPK8; (j) HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, PPFIA4, RPH3A, CXCL11, ERMAP, GBP2, and CASP1; (k) AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, PFKP, MB, MYC, PEX14, TAF13, BMX, PRKAA2, PTGER3, C3, and SPTAN1; (l) SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, TM7SF2, FUS, DDAH2, SPAG4, FBXL14, and LGALS8; (m) NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, CLCA2, BAALC, PTPN14, IRF9, SAA2, and HR; (n) CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, SPSB1, DNAH17, TNNC1, CPN1, SYNGR2, and CPA4.
In various embodiments, the universal signature comprises twenty markers selected from: (a) NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, POLA2, PRSS23, SHMT1, RIPK1, AKR1A1, PRPF3, ETS1, MANSC1, PDHA1, ACLY, and CHI3L2; (b) CRB3, BCAP31, GMPPB, CD4, STARD3, CALR, CSRP1, CPT1A, LDLRAP1, RRAS, HMGCR, RASGRP2, PTS, SORD, SLC26A6, VAT1, GPAA1, CXCR3, NAMPT, and EPHX1; (c) NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, PDCD1LG2, SNX10, SEC24D, UBE2L6, LDHC, FAS, CXCL10, STAT2, IRF7, CD274, and PSME2; (d) DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, DNAJC12, RET, IL20RB, TNFSF10, DLG4, CKAP4, NDST1, GAPDH, ARL3, PLG, and MDH2; (e) LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, GYS2, CD151, RAD51C, ARHGEF2, PFN1, AP4B1, IGFBP4, OASL, PDGFC, MIEN1, and BEST3; (f) GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, BAAT, MRPL11, OAS1, RFX5, PSMD7, ALDH2, STAP1, GYS2, GMFB, CCL3, and PSMA4; (g) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, LTF, and S100A12; (h) POLH, PTGER3, RUNX1, CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, IRF4, TNNC2, RIT1, ALG1, PDCD4, CYP2E1, GABARAPL2, B4GALT7, IFNAR1, MEF2C, and TLR8; (i) CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, ARNTL2, KRT82, PRIM2, MOCS2, IL21R, MAPK8, NMNAT1, ZNF107, CTSG, IL7, and ANKRD34B; (j) HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, PPFIA4, RPH3A, CXCL11, ERMAP, GBP2, CASP1, TLR7, EPX, ANKH, ARFGAP3, and BAZ1A; (k) AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, PFKP, MB, MYC, PEX14, TAF13, BMX, PRKAA2, PTGER3, C3, SPTAN1, PROCR, AARS2, RHOT2, PHEX, and THOP1; (l) SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, TM7SF2, FUS, DDAH2, SPAG4, FBXL14, LGALS8, GNE, HAS2, IGSF6, B4GALT1, and POLK; (m) NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, CLCA2, BAALC, PTPN14, IRF9, SAA2, HR, IRGQ, AKT3, 
SYNGR1, NKX2-2, and MT1H; (n) CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, SPSB1, DNAH17, TNNC1, CPN1, SYNGR2, CPA4, MYL1, DUOX2, ZNF621, GAPDHS, and BCAP31.
In various embodiments, the universal signature comprises twenty five markers selected from: (a) NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, POLA2, PRSS23, SHMT1, RIPK1, AKR1A1, PRPF3, ETS1, MANSC1, PDHA1, ACLY, CHI3L2, MCMI, DNAJC18, LCT, YRDC, and AIFM1; (b) CRB3, BCAP31, GMPPB, CD4, STARD3, CALR, CSRP1, CPT1A, LDLRAP1, RRAS, HMGCR, RASGRP2, PTS, SORD, SLC26A6, VAT1, GPAA1, CXCR3, NAMPT, EPHX1, SEPT9, GMPPA, B4GALT7, AAAS, and TP53INP1; (c) NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, PDCD1LG2, SNX10, SEC24D, UBE2L6, LDHC, FAS, CXCL10, STAT2, IRF7, CD274, PSME2, LPCAT2, PSMB8, FBXO6, DUSP10, and PLA2G4C; (d) DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, DNAJC12, RET, IL20RB, TNFSF10, DLG4, CKAP4, NDST1, GAPDH, ARL3, PLG, MDH2, GSTP1, S100A9, B4GALT7, H2AFJ, and LTB4R; (e) LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, GYS2, CD151, RAD51C, ARHGEF2, PFN1, AP4B1, IGFBP4, OASL, PDGFC, MIEN1, BEST3, SH3RF1, RACGAP1, FMO3, HNRNPA2B1, and F2RL1; (f) GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, BAAT, MRPL11, OAS1, RFX5, PSMD7, ALDH2, STAP1, GYS2, GMFB, CCL3, PSMA4, CTHRC1, CMTM2, CD36, B4GALT2, and EDF1; (g) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, LTF, S100A12, MS4A6A, GSTK1, RNF31, NOTCH4, and COL17A1; (h) POLH, PTGER3, RUNX1, CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, IRF4, TNNC2, RIT1, ALG1, PDCD4, CYP2E1, GABARAPL2, B4GALT7, IFNAR1, MEF2C, TLR8, TSPYL2, M6PR, IKZF1, CNDP2, and SLCO2A1; (i) CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, ARNTL2, KRT82, PRIM2, MOCS2, IL21R, MAPK8, NMNAT1, ZNF107, CTSG, IL7, ANKRD34B, TMF1, HPS3, CIT, TRAP1, and MSH2; (j) HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, PPFIA4, RPH3A, CXCL11, ERMAP, GBP2, CASP1, TLR7, EPX, ANKH, ARFGAP3, BAZ1A, COL5A1, COP1, BIRC2, SLC7A5, and TRO; (k) AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, 
PFKP, MB, MYC, PEX14, TAF13, BMX, PRKAA2, PTGER3, C3, SPTAN1, PROCR, AARS2, RHOT2, PHEX, THOP1, TIMM10, TBL1X, HNF4A, SLC6A9, and FECH; (l) SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, TM7SF2, FUS, DDAH2, SPAG4, FBXL14, LGALS8, GNE, HAS2, IGSF6, B4GALT1, POLK, PLK4, NDUFB4, GNG8, MUC1, and AGGF1; (m) NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, CLCA2, BAALC, PTPN14, IRF9, SAA2, HR, IRGQ, AKT3, SYNGR1, NKX2-2, MT1H, SERPINA6, CAMK2N1, CCT6B, WDHD1, and NKX3-1; (n) CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, SPSB1, DNAH17, TNNC1, CPN1, SYNGR2, CPA4, MYL1, DUOX2, ZNF621, GAPDHS, BCAP31, DLG1, IL17RB, SLC6A6, BCL2L2, and HSPA1B.
In various embodiments, the universal signature comprises thirty markers selected from: (a) NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, POLA2, PRSS23, SHMT1, RIPK1, AKR1A1, PRPF3, ETS1, MANSC1, PDHA1, ACLY, CHI3L2, MCMI, DNAJC18, LCT, YRDC, AIFM1, SFN, FBN1, EIF4H, CLEC4A, and BCAP31; (b) CRB3, BCAP31, GMPPB, CD4, STARD3, CALR, CSRP1, CPT1A, LDLRAP1, RRAS, HMGCR, RASGRP2, PTS, SORD, SLC26A6, VAT1, GPAA1, CXCR3, NAMPT, EPHX1, SEPT9, GMPPA, B4GALT7, AAAS, TP53INP1, GYS1, FASN, NOC4L, RRP9, and MXI1; (c) NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, PDCD1LG2, SNX10, SEC24D, UBE2L6, LDHC, FAS, CXCL10, STAT2, IRF7, CD274, PSME2, LPCAT2, PSMB8, FBXO6, DUSP10, PLA2G4C, BANF1, EPOR, KCNMA1, CTSK, and ITGA2; (d) DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, DNAJC12, RET, IL20RB, TNFSF10, DLG4, CKAP4, NDST1, GAPDH, ARL3, PLG, MDH2, GSTP1, S100A9, B4GALT7, H2AFJ, LTB4R, TAGLN2, IRF7, NDUFV1, CD300LB, and RTP4; (e) LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, GYS2, CD151, RAD51C, ARHGEF2, PFN1, AP4B1, IGFBP4, OASL, PDGFC, MIEN1, BEST3, SH3RF1, RACGAP1, FMO3, HNRNPA2B1, F2RL1, CAMKK2, ITGB5, FLVCR2, ZNF462, and KIAA1324; (f) GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, BAAT, MRPL11, OAS1, RFX5, PSMD7, ALDH2, STAP1, GYS2, GMFB, CCL3, PSMA4, CTHRC1, CMTM2, CD36, B4GALT2, EDF1, CDK5R1, TREML3P, PML, HEPHL1, and TNFRSF21; (g) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, LTF, S100A12, MS4A6A, GSTK1, RNF31, NOTCH4, COL17A1, S100A8, CTSG, STX11, PTX3, and MYOF; (h) POLH, PTGER3, RUNX1, CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, IRF4, TNNC2, RIT1, ALG1, PDCD4, CYP2E1, GABARAPL2, B4GALT7, IFNAR1, MEF2C, TLR8, TSPYL2, M6PR, IKZF1, CNDP2, SLCO2A1, RBM4, FH, MRTO4, DTX4, and RFC2; (i) CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, ARNTL2, KRT82, PRIM2, MOCS2, IL21R, MAPK8, NMNAT1, ZNF107, CTSG, 
IL7, ANKRD34B, TMF1, HPS3, CIT, TRAP1, MSH2, PDGFC, TMLHE, MVP, TBX21, and PICALM; (j) HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, PPFIA4, RPH3A, CXCL11, ERMAP, GBP2, CASP1, TLR7, EPX, ANKH, ARFGAP3, BAZ1A, COL5A1, COP1, BIRC2, SLC7A5, TRO, CXCL6, TNFSF10, GYPE, COL17A1, and ROCK1; (k) AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, PFKP, MB, MYC, PEX14, TAF13, BMX, PRKAA2, PTGER3, C3, SPTAN1, PROCR, AARS2, RHOT2, PHEX, THOP1, TIMM10, TBL1X, HNF4A, SLC6A9, FECH, CLCN3, CEACAM4, MMP7, HSD11B2, and SLC25A25; (l) SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, TM7SF2, FUS, DDAH2, SPAG4, FBXL14, LGALS8, GNE, HAS2, IGSF6, B4GALT1, POLK, PLK4, NDUFB4, GNG8, MUC1, AGGF1, PPIB, SLC1A4, HLA-DQB1, SEMA4G, and MT2A; (m) NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, CLCA2, BAALC, PTPN14, IRF9, SAA2, HR, IRGQ, AKT3, SYNGR1, NKX2-2, MT1H, SERPINA6, CAMK2N1, CCT6B, WDHD1, NKX3-1, LDHC, MALT1, CD9, CLGN, and SLC25A19; (n) CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, SPSB1, DNAH17, TNNC1, CPN1, SYNGR2, CPA4, MYL1, DUOX2, ZNF621, GAPDHS, BCAP31, DLG1, IL17RB, SLC6A6, BCL2L2, HSPA1B, SLC1A4, TSTD1, HSPB8, MSC, and CENPJ.
In various embodiments, the universal signature comprises thirty-five markers selected from: (a) NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, POLA2, PRSS23, SHMT1, RIPK1, AKR1A1, PRPF3, ETS1, MANSC1, PDHA1, ACLY, CHI3L2, MCMI, DNAJC18, LCT, YRDC, AIFM1, SFN, FBN1, EIF4H, CLEC4A, BCAP31, ATG4B, CSRP1, RDH11, GCLM, and CDC7; (b) CRB3, BCAP31, GMPPB, CD4, STARD3, CALR, CSRP1, CPT1A, LDLRAP1, RRAS, HMGCR, RASGRP2, PTS, SORD, SLC26A6, VAT1, GPAA1, CXCR3, NAMPT, EPHX1, SEPT9, GMPPA, B4GALT7, AAAS, TP53INP1, GYS1, FASN, NOC4L, RRP9, MXI1, TP53, SLC7A11, FOXP3, DNASE1L1, and MGAT1; (c) NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, PDCD1LG2, SNX10, SEC24D, UBE2L6, LDHC, FAS, CXCL10, STAT2, IRF7, CD274, PSME2, LPCAT2, PSMB8, FBXO6, DUSP10, PLA2G4C, BANF1, EPOR, KCNMA1, CTSK, ITGA2, MPZL2, FEZ1, JAK2, BAZ1A, and ICAM4; (d) DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, DNAJC12, RET, IL20RB, TNFSF10, DLG4, CKAP4, NDST1, GAPDH, ARL3, PLG, MDH2, GSTP1, S100A9, B4GALT7, H2AFJ, LTB4R, TAGLN2, IRF7, NDUFV1, CD300LB, RTP4, CTSD, HIST1H2BG, IL27, TNFRSF1B, and SORBS1; (e) LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, GYS2, CD151, RAD51C, ARHGEF2, PFN1, AP4B1, IGFBP4, OASL, PDGFC, MIEN1, BEST3, SH3RF1, RACGAP1, FMO3, HNRNPA2B1, F2RL1, CAMKK2, ITGB5, FLVCR2, ZNF462, KIAA1324, CENPN, IKBKE, SERPINF2, FAM162A, and SNX2; (f) GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, BAAT, MRPL11, OAS1, RFX5, PSMD7, ALDH2, STAP1, GYS2, GMFB, CCL3, PSMA4, CTHRC1, CMTM2, CD36, B4GALT2, EDF1, CDK5R1, TREML3P, PML, HEPHL1, TNFRSF21, PSMB9, GNAI1, TSPAN13, ATP6V0B, and SLC4A4; (g) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, LTF, S100A12, MS4A6A, GSTK1, RNF31, NOTCH4, COL17A1, S100A8, CTSG, STX11, PTX3, MYOF, LTA4H, TRIM26, CYP1B1, ARG1, and IFNGR2; (h) POLH, PTGER3, RUNX1, CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, IRF4, TNNC2, RIT1, 
ALG1, PDCD4, CYP2E1, GABARAPL2, B4GALT7, IFNAR1, MEF2C, TLR8, TSPYL2, M6PR, IKZF1, CNDP2, SLCO2A1, RBM4, FH, MRTO4, DTX4, RFC2, CAMK1G, CBX8, HM13, PSMB10, and GCLM; (i) CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, ARNTL2, KRT82, PRIM2, MOCS2, IL21R, MAPK8, NMNAT1, ZNF107, CTSG, IL7, ANKRD34B, TMF1, HPS3, CIT, TRAP1, MSH2, PDGFC, TMLHE, MVP, TBX21, PICALM, KRT6A, FMR1, PCSK9, DNASE1L3, and ENDOG; (j) HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, PPFIA4, RPH3A, CXCL11, ERMAP, GBP2, CASP1, TLR7, EPX, ANKH, ARFGAP3, BAZ1A, COL5A1, COP1, BIRC2, SLC7A5, TRO, CXCL6, TNFSF10, GYPE, COL17A1, ROCK1, CD83, AK7, MSR1, LCN2, and SPN; (k) AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, PFKP, MB, MYC, PEX14, TAF13, BMX, PRKAA2, PTGER3, C3, SPTAN1, PROCR, AARS2, RHOT2, PHEX, THOP1, TIMM10, TBL1X, HNF4A, SLC6A9, FECH, CLCN3, CEACAM4, MMPI, HSD11B2, SLC25A25, RAB32, CXCL9, KCNE2, FCAR, and CFP; (l) SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, TM7SF2, FUS, DDAH2, SPAG4, FBXL14, LGALS8, GNE, HAS2, IGSF6, B4GALT1, POLK, PLK4, NDUFB4, GNG8, MUC1, AGGF1, PPIB, SLC1A4, HLA-DQB1, SEMA4G, MT2A, COL4A2, PLCB4, GYS1, PRKCG, and RXFP2; (m) NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, CLCA2, BAALC, PTPN14, IRF9, SAA2, HR, IRGQ, AKT3, SYNGR1, NKX2-2, MT1H, SERPINA6, CAMK2N1, CCT6B, WDHD1, NKX3-1, LDHC, MALT1, CD9, CLGN, SLC25A19, MAP7, XCL1, ACSL6, TFRC, and CAT; (n) CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, SPSB1, DNAH17, TNNC1, CPN1, SYNGR2, CPA4, MYL1, DUOX2, ZNF621, GAPDHS, BCAP31, DLG1, IL17RB, SLC6A6, BCL2L2, HSPA1B, SLC1A4, TSTD1, HSPB8, MSC, CENPJ, ARL8A, CTLA4, GFRA1, WASF1, and RIPK1.
In various embodiments, the universal signature comprises forty markers selected from: (a) NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, POLA2, PRSS23, SHMT1, RIPK1, AKR1A1, PRPF3, ETS1, MANSC1, PDHA1, ACLY, CHI3L2, MCMI, DNAJC18, LCT, YRDC, AIFM1, SFN, FBN1, EIF4H, CLEC4A, BCAP31, ATG4B, CSRP1, RDH11, GCLM, CDC7, GLOD5, IDH2, FMR1, PPARA, and CCNE1; (b) CRB3, BCAP31, GMPPB, CD4, STARD3, CALK, CSRP1, CPT1A, LDLRAP1, RRAS, HMGCR, RASGRP2, PTS, SORD, SLC26A6, VAT1, GPAA1, CXCR3, NAMPT, EPHX1, SEPT9, GMPPA, B4GALT7, AAAS, TP53INP1, GYS1, FASN, NOC4L, RRP9, MXI1, TP53, SLC7A11, FOXP3, DNASE1L1, MGAT1, SEC61A1, FYCO1, S100A10, LSS, and IFRD1; (c) NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, PDCD1LG2, SNX10, SEC24D, UBE2L6, LDHC, FAS, CXCL10, STAT2, IRF7, CD274, PSME2, LPCAT2, PSMB8, FBXO6, DUSP10, PLA2G4C, BANF1, EPOR, KCNMA1, CTSK, ITGA2, MPZL2, FEZ1, JAK2, BAZ1A, ICAM4, DAPP1, RIPK1, RNF144B, LAP3, and C1QA; (d) DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, DNAJC12, RET, IL20RB, TNFSF10, DLG4, CKAP4, NDST1, GAPDH, ARL3, PLG, MDH2, GSTP1, S100A9, B4GALT7, H2AFJ, LTB4R, TAGLN2, IRF7, NDUFV1, CD300LB, RTP4, CTSD, HIST1H2BG, IL27, TNFRSF1B, SORBS1, NOP2, TNFSF13B, HLA-DRB5, RHOG, and PSMB9; (e) LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, GYS2, CD151, RAD51C, ARHGEF2, PFN1, AP4B1, IGFBP4, OASL, PDGFC, MIEN1, BEST3, SH3RF1, RACGAP1, FMO3, HNRNPA2B1, F2RL1, CAMKK2, ITGB5, FLVCR2, ZNF462, KIAA1324, CENPN, IKBKE, SERPINF2, FAM162A, SNX2, SERPING1, CLCA2, DPEP3, TNFAIP2, and FSTL4; (f) GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, BAAT, MRPL11, OAS1, RFX5, PSMD7, ALDH2, STAP1, GYS2, GMFB, CCL3, PSMA4, CTHRC1, CMTM2, CD36, B4GALT2, EDF1, CDK5R1, TREML3P, PML, HEPHL1, TNFRSF21, PSMB9, GNAI1, TSPAN13, ATP6V0B, SLC4A4, ILF2, AKAP12, HLA-DRB5, PGR, and AGTRAP; (g) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, 
LTF, S100A12, MS4A6A, GSTK1, RNF31, NOTCH4, COL17A1, S100A8, CTSG, STX11, PTX3, MYOF, LTA4H, TRIM26, CYP1B1, ARG1, IFNGR2, B3GNT5, KYNU, LPGAT1, SLC9A3R1, and HP; (h) POLH, PTGER3, RUNX1, CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, IRF4, TNNC2, RIT1, ALG1, PDCD4, CYP2E1, GABARAPL2, B4GALT7, IFNAR1, MEF2C, TLR8, TSPYL2, M6PR, IKZF1, CNDP2, SLCO2A1, RBM4, FH, MRTO4, DTX4, RFC2, CAMK1G, CBX8, HM13, PSMB10, GCLM, SLC25A3, MYD88, IL33, ITGAM, and PPIA; (i) CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, ARNTL2, KRT82, PRIM2, MOCS2, IL21R, MAPK8, NMNAT1, ZNF107, CTSG, IL7, ANKRD34B, TMF1, HPS3, CIT, TRAP1, MSH2, PDGFC, TMLHE, MVP, TBX21, PICALM, KRT6A, FMR1, PCSK9, DNASE1L3, ENDOG, TPD52L1, PEX6, MPO, CHRNA7, and SLFN5; (j) HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, PPFIA4, RPH3A, CXCL11, ERMAP, GBP2, CASP1, TLR7, EPX, ANKH, ARFGAP3, BAZ1A, COL5A1, COP1, BIRC2, SLC7A5, TRO, CXCL6, TNFSF10, GYPE, COL17A1, ROCK1, CD83, AK7, MSR1, LCN2, SPN, ASS1, HDGF, CXCL16, POLR3D, and GK; (k) AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, PFKP, MB, MYC, PEX14, TAF13, BMX, PRKAA2, PTGER3, C3, SPTAN1, PROCR, AARS2, RHOT2, PHEX, THOP1, TIMM10, TBL1X, HNF4A, SLC6A9, FECH, CLCN3, CEACAM4, MMPI, HSD11B2, SLC25A25, RAB32, CXCL9, KCNE2, FCAR, CFP, IGF1, PEX16, RNF214, PIM1, and JUNB; (l) SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, TM7SF2, FUS, DDAH2, SPAG4, FBXL14, LGALS8, GNE, HAS2, IGSF6, B4GALT1, POLK, PLK4, NDUFB4, GNG8, MUC1, AGGF1, PPIB, SLC1A4, HLA-DQB1, SEMA4G, MT2A, COL4A2, PLCB4, GYS1, PRKCG, RXFP2, PLA2G4C, ALDH1A2, IL1A, IBTK, and SPARC; (m) NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, CLCA2, BAALC, PTPN14, IRF9, SAA2, HR, IRGQ, AKT3, SYNGR1, NKX2-2, MT1H, SERPINA6, CAMK2N1, CCT6B, WDHD1, NKX3-1, LDHC, MALT1, CD9, CLGN, SLC25A19, MAP7, XCL1, ACSL6, TFRC, CAT, NKD1, CNBP, ALDH1L1, CCL7, and SLC20A1; (n) CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, SPSB1, DNAH17, TNNC1, CPN1, SYNGR2, CPA4, MYL1, 
DUOX2, ZNF621, GAPDHS, BCAP31, DLG1, IL17RB, SLC6A6, BCL2L2, HSPA1B, SLC1A4, TSTD1, HSPB8, MSC, CENPJ, ARL8A, CTLA4, GFRA1, WASF1, RIPK1, ENO3, KRT19, PLVAP, RAD18, and ACHE.
In various embodiments, the universal signature comprises forty five markers selected from: (a) NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, POLA2, PRSS23, SHMT1, RIPK1, AKR1A1, PRPF3, ETS1, MANSC1, PDHA1, ACLY, CHI3L2, MCMI, DNAJC18, LCT, YRDC, AIFM1, SFN, FBN1, EIF4H, CLEC4A, BCAP31, ATG4B, CSRP1, RDH11, GCLM, CDC7, GLOD5, IDH2, FMR1, PPARA, CCNE1, DDB1, BMP1, EHD4, VAV3, and MPG; (b) CRB3, BCAP31, GMPPB, CD4, STARD3, CALK, CSRP1, CPT1A, LDLRAP1, RRAS, HMGCR, RASGRP2, PTS, SORD, SLC26A6, VAT1, GPAA1, CXCR3, NAMPT, EPHX1, SEPT9, GMPPA, B4GALT7, AAAS, TP53INP1, GYS1, FASN, NOC4L, RRP9, MXI1, TP53, SLC7A11, FOXP3, DNASE1L1, MGAT1, SEC61A1, FYCO1, S100A10, LSS, IFRD1, DCP2, EDC4, ANKZF1, IDUA, and IGFBP2; (c) NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, PDCD1LG2, SNX10, SEC24D, UBE2L6, LDHC, FAS, CXCL10, STAT2, IRF7, CD274, PSME2, LPCAT2, PSMB8, FBXO6, DUSP10, PLA2G4C, BANF1, EPOR, KCNMA1, CTSK, ITGA2, MPZL2, FEZ1, JAK2, BAZ1A, ICAM4, DAPP1, RIPK1, RNF144B, LAP3, C1QA, TYMP, GCH1, C1QB, CREM, and ETV7; (d) DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, DNAJC12, RET, IL20RB, TNFSF10, DLG4, CKAP4, NDST1, GAPDH, ARL3, PLG, MDH2, GSTP1, S100A9, B4GALT7, H2AFJ, LTB4R, TAGLN2, IRF7, NDUFV1, CD300LB, RTP4, CTSD, HIST1H2BG, IL27, TNFRSF1B, SORBS1, NOP2, TNFSF13B, HLA-DRB5, RHOG, PSMB9, HSPA6, CD63, SLC2A8, IFITM1, and CKB; (e) LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, GYS2, CD151, RAD51C, ARHGEF2, PFN1, AP4B1, IGFBP4, OASL, PDGFC, MIEN1, BEST3, SH3RF1, RACGAP1, FMO3, HNRNPA2B1, F2RL1, CAMKK2, ITGB5, FLVCR2, ZNF462, KIAA1324, CENPN, IKBKE, SERPINF2, FAM162A, SNX2, SERPING1, CLCA2, DPEP3, TNFAIP2, FSTL4, CTSD, BCAR1, MKX, RGS2, and SAMD9; (f) GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, BAAT, MRPL11, OAS1, RFX5, PSMD7, ALDH2, STAP1, GYS2, GMFB, CCL3, PSMA4, CTHRC1, CMTM2, CD36, B4GALT2, EDF1, CDK5R1, TREML3P, PML, HEPHL1, TNFRSF21, PSMB9, GNAI1, TSPAN13, ATP6V0B, SLC4A4, ILF2, 
AKAP12, HLA-DRB5, PGR, AGTRAP, P3H1, CDADC1, TRIM5, PTGER3, and ADCY6; (g) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, LTF, S100A12, MS4A6A, GSTK1, RNF31, NOTCH4, COL17A1, S100A8, CTSG, STX11, PTX3, MYOF, LTA4H, TRIM26, CYP1B1, ARG1, IFNGR2, B3GNT5, KYNU, LPGAT1, SLC9A3R1, HP, PADI4, PSME1, MGST2, NR4A1, and SPP1; (h) POLH, PTGER3, RUNX1, CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, IRF4, TNNC2, RIT1, ALG1, PDCD4, CYP2E1, GABARAPL2, B4GALT7, IFNAR1, MEF2C, TLR8, TSPYL2, M6PR, IKZF1, CNDP2, SLCO2A1, RBM4, FH, MRTO4, DTX4, RFC2, CAMK1G, CBX8, HM13, PSMB10, GCLM, SLC25A3, MYD88, IL33, ITGAM, PPIA, SEC22B, CXCR3, SCRN1, RXRA, and SDHA; (i) CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, ARNTL2, KRT82, PRIM2, MOCS2, IL21R, MAPK8, NMNAT1, ZNF107, CTSG, IL7, ANKRD34B, TMF1, HPS3, CIT, TRAP1, MSH2, PDGFC, TMLHE, MVP, TBX21, PICALM, KRT6A, FMR1, PCSK9, DNASE1L3, ENDOG, TPD52L1, PEX6, MPO, CHRNA7, SLFN5, TNFRSF1A, CD24, CASC1, LLGL2, and DLG5; (j) HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, PPFIA4, RPH3A, CXCL11, ERMAP, GBP2, CASP1, TLR7, EPX, ANKH, ARFGAP3, BAZ1A, COL5A1, COP1, BIRC2, SLC7A5, TRO, CXCL6, TNFSF10, GYPE, COL17A1, ROCK1, CD83, AK7, MSR1, LCN2, SPN, ASS1, HDGF, CXCL16, POLR3D, GK, OLFM4, STK3, RCBTB1, FOLR3, and FBXO32; (k) AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, PFKP, MB, MYC, PEX14, TAF13, BMX, PRKAA2, PTGER3, C3, SPTAN1, PROCR, AARS2, RHOT2, PHEX, THOP1, TIMM10, TBL1X, HNF4A, SLC6A9, FECH, CLCN3, CEACAM4, MMPI, HSD11B2, SLC25A25, RAB32, CXCL9, KCNE2, FCAR, CFP, IGF1, PEX16, RNF214, PIM1, JUNB, MDM2, PFKFB4, SIAH2, EGR2, and KCNK10; (l) SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, TM7SF2, FUS, DDAH2, SPAG4, FBXL14, LGALS8, GNE, HAS2, IGSF6, B4GALT1, POLK, PLK4, NDUFB4, GNG8, MUC1, AGGF1, PPIB, SLC1A4, HLA-DQB1, SEMA4G, MT2A, COL4A2, PLCB4, GYS1, PRKCG, RXFP2, PLA2G4C, ALDH1A2, ILIA, IBTK, SPARC, OAS3, EPHA4, HLA-B, MICB, and 
CCL18; (m) NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, CLCA2, BAALC, PTPN14, IRF9, SAA2, HR, IRGQ, AKT3, SYNGR1, NKX2-2, MT1H, SERPINA6, CAMK2N1, CCT6B, WDHD1, NKX3-1, LDHC, MALT1, CD9, CLGN, SLC25A19, MAP7, XCL1, ACSL6, TFRC, CAT, NKD1, CNBP, ALDH1L1, CCL7, SLC20A1, KRAS, CSF1, CASP2, HDAC11, and KIR2DS4; (n) CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, SPSB1, DNAH17, TNNC1, CPN1, SYNGR2, CPA4, MYL1, DUOX2, ZNF621, GAPDHS, BCAP31, DLG1, IL17RB, SLC6A6, BCL2L2, HSPA1B, SLC1A4, TSTD1, HSPB8, MSC, CENPJ, ARL8A, CTLA4, GFRA1, WASF1, RIPK1, ENO3, KRT19, PLVAP, RAD18, ACHE, FBLN5, MGST2, ANAPC5, RFX5, and CASP7.
In various embodiments, the universal signature comprises fifty markers selected from: (a) NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, POLA2, PRSS23, SHMT1, RIPK1, AKR1A1, PRPF3, ETS1, MANSC1, PDHA1, ACLY, CHI3L2, MCMI, DNAJC18, LCT, YRDC, AIFM1, SFN, FBN1, EIF4H, CLEC4A, BCAP31, ATG4B, CSRP1, RDH11, GCLM, CDC7, GLOD5, IDH2, FMR1, PPARA, CCNE1, DDB1, BMP1, EHD4, VAV3, MPG, SPAG4, PSMD3, BCKDHA, GRAMD1B, and SEC61A1; (b) CRB3, BCAP31, GMPPB, CD4, STARD3, CALR, CSRP1, CPT1A, LDLRAP1, RRAS, HMGCR, RASGRP2, PTS, SORD, SLC26A6, VAT1, GPAA1, CXCR3, NAMPT, EPHX1, SEPT9, GMPPA, B4GALT7, AAAS, TP53INP1, GYS1, FASN, NOC4L, RRP9, MXI1, TP53, SLC7A11, FOXP3, DNASE1L1, MGAT1, SEC61A1, FYCO1, S100A10, LSS, IFRD1, DCP2, EDC4, ANKZF1, IDUA, IGFBP2, DDX39A, UCHL1, NR4A1, PDIA5, and ENGASE; (c) NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, PDCD1LG2, SNX10, SEC24D, UBE2L6, LDHC, FAS, CXCL10, STAT2, IRF7, CD274, PSME2, LPCAT2, PSMB8, FBXO6, DUSP10, PLA2G4C, BANF1, EPOR, KCNMA1, CTSK, ITGA2, MPZL2, FEZ1, JAK2, BAZ1A, ICAM4, DAPP1, RIPK1, RNF144B, LAP3, C1QA, TYMP, GCH1, C1QB, CREM, ETV7, FOSB, MRPL15, PSEN1, MXI1, and TRAFD1; (d) DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, DNAJC12, RET, IL20RB, TNFSF10, DLG4, CKAP4, NDST1, GAPDH, ARL3, PLG, MDH2, GSTP1, S100A9, B4GALT7, H2AFJ, LTB4R, TAGLN2, IRF7, NDUFV1, CD300LB, RTP4, CTSD, HIST1H2BG, IL27, TNFRSF1B, SORBS1, NOP2, TNFSF13B, HLA-DRB5, RHOG, PSMB9, HSPA6, CD63, SLC2A8, IFITM1, CKB, ALDOA, MSRB1, OSMR, DRAP1, and PLA2G4A; (e) LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, GYS2, CD151, RAD51C, ARHGEF2, PFN1, AP4B1, IGFBP4, OASL, PDGFC, MIEN1, BEST3, SH3RF1, RACGAP1, FMO3, HNRNPA2B1, F2RL1, CAMKK2, ITGB5, FLVCR2, ZNF462, KIAA1324, CENPN, IKBKE, SERPINF2, FAM162A, SNX2, SERPING1, CLCA2, DPEP3, TNFAIP2, FSTL4, CTSD, BCAR1, MKX, RGS2, SAMD9, GCLM, BST1, IRS2, RNASE6, and ELOVL3; (f) GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, BAAT, MRPL11, 
OAS1, RFX5, PSMD7, ALDH2, STAP1, GYS2, GMFB, CCL3, PSMA4, CTHRC1, CMTM2, CD36, B4GALT2, EDF1, CDK5R1, TREML3P, PML, HEPHL1, TNFRSF21, PSMB9, GNAI1, TSPAN13, ATP6V0B, SLC4A4, ILF2, AKAP12, HLA-DRB5, PGR, AGTRAP, P3H1, CDADC1, TRIM5, PTGER3, ADCY6, ERBB2, NFYA, STATE, MMD, and RPL10A; (g) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, LTF, S100A12, MS4A6A, GSTK1, RNF31, NOTCH4, COL17A1, S100A8, CTSG, STX11, PTX3, MYOF, LTA4H, TRIM26, CYP1B1, ARG1, IFNGR2, B3GNT5, KYNU, LPGAT1, SLC9A3R1, HP, PADI4, PSME1, MGST2, NR4A1, SPP1, DEFA3, ME1, RBP7, DUSP6, and MCRS1; (h) POLH, PTGER3, RUNX1, CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, IRF4, TNNC2, RIT1, ALG1, PDCD4, CYP2E1, GABARAPL2, B4GALT7, IFNAR1, MEF2C, TLR8, TSPYL2, M6PR, IKZF1, CNDP2, SLCO2A1, RBM4, FH, MRTO4, DTX4, RFC2, CAMK1G, CBX8, HM13, PSMB10, GCLM, SLC25A3, MYD88, IL33, ITGAM, PPIA, SEC22B, CXCR3, SCRN1, RXRA, SDHA, GLDC, FGF6, PRKG2, TFPI, and IMMT; (i) CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, ARNTL2, KRT82, PRIM2, MOCS2, IL21R, MAPK8, NMNAT1, ZNF107, CTSG, IL7, ANKRD34B, TMF1, HPS3, CIT, TRAP1, MSH2, PDGFC, TMLHE, MVP, TBX21, PICALM, KRT6A, FMR1, PCSK9, DNASE1L3, ENDOG, TPD52L1, PEX6, MPO, CHRNA7, SLFN5, TNFRSF1A, CD24, CASC1, LLGL2, DLG5, MYO5C, PGR, PFKFB2, AK2, and COL19A1; (j) HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, PPFIA4, RPH3A, CXCL11, ERMAP, GBP2, CASP1, TLR7, EPX, ANKH, ARFGAP3, BAZ1A, COL5A1, COP1, BIRC2, SLC7A5, TRO, CXCL6, TNFSF10, GYPE, COL17A1, ROCK1, CD83, AK7, MSR1, LCN2, SPN, ASS1, HDGF, CXCL16, POLR3D, GK, OLFM4, STK3, RCBTB1, FOLR3, FBXO32, TMEM98, PRDX2, CKB, UHRF1BP1L, and CTSG; (k) AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, PFKP, MB, MYC, PEX14, TAF13, BMX, PRKAA2, PTGER3, C3, SPTAN1, PROCR, AARS2, RHOT2, PHEX, THOP1, TIMM10, TBL1X, HNF4A, SLC6A9, FECH, CLCN3, CEACAM4, MMPI, HSD11B2, SLC25A25, RAB32, CXCL9, KCNE2, FCAR, CFP, IGF1, PEX16, RNF214, PIM1, JUNB, MDM2, 
PFKFB4, SIAH2, EGR2, KCNK10, EHMT2, FPR1, CD27, CETN2, and TGM1; (l) SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, TM7SF2, FUS, DDAH2, SPAG4, FBXL14, LGALS8, GNE, HAS2, IGSF6, B4GALT1, POLK, PLK4, NDUFB4, GNG8, MUC1, AGGF1, PPIB, SLC1A4, HLA-DQB1, SEMA4G, MT2A, COL4A2, PLCB4, GYS1, PRKCG, RXFP2, PLA2G4C, ALDH1A2, IL1A, IBTK, SPARC, OAS3, EPHA4, HLA-B, MICB, CCL18, SLC39A6, GLCE, TUBB2B, FBXO8, and SNX6; (m) NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, CLCA2, BAALC, PTPN14, IRF9, SAA2, HR, IRGQ, AKT3, SYNGR1, NKX2-2, MT1H, SERPINA6, CAMK2N1, CCT6B, WDHD1, NKX3-1, LDHC, MALT1, CD9, CLGN, SLC25A19, MAP7, XCL1, ACSL6, TFRC, CAT, NKD1, CNBP, ALDH1L1, CCL7, SLC20A1, KRAS, CSF1, CASP2, HDAC11, KIR2DS4, CEACAM19, CFH, CAB39L, DEPDC1, and PSMA1; (n) CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, SPSB1, DNAH17, TNNC1, CPN1, SYNGR2, CPA4, MYL1, DUOX2, ZNF621, GAPDHS, BCAP31, DLG1, IL17RB, SLC6A6, BCL2L2, HSPA1B, SLC1A4, TSTD1, HSPB8, MSC, CENPJ, ARL8A, CTLA4, GFRA1, WASF1, RIPK1, ENO3, KRT19, PLVAP, RAD18, ACHE, FBLN5, MGST2, ANAPC5, RFX5, CASP7, STC1, NCK2, IFI27, APOA4, and MSRB2.
In various embodiments, a universal signature can be used to predict progression of tuberculosis in an individual. In various embodiments, the progression of tuberculosis can be the progression of latent tuberculosis to active tuberculosis. In various embodiments, the progression of tuberculosis occurs within one year. In various embodiments, a universal signature can be used to predict progression of a glioma in an individual. In various embodiments, the progression of a glioma can be a severe progression of glioma such that the patient is likely to expire within a year. In various embodiments, a universal signature can be used to predict either the progression of tuberculosis or the progression of glioma in an individual. In such embodiments, the universal signature comprises markers selected from: (a) NUP93, PPM1G, C6orf62, PJA1, and MEST; (b) CRB3, BCAP31, GMPPB, CD4, and STARD3; (c) NUB1, CASP1, WARS, TRIM21, and STAT1; (d) NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, POLA2, PRSS23, SHMT1, RIPK1, AKR1A1, PRPF3, ETS1, MANSC1, PDHA1, ACLY, CHI3L2, MCMI, DNAJC18, LCT, YRDC, and AIFM1; (e) CRB3, BCAP31, GMPPB, CD4, STARD3, CALR, CSRP1, CPT1A, LDLRAP1, RRAS, HMGCR, RASGRP2, PTS, SORD, SLC26A6, VAT1, GPAA1, CXCR3, NAMPT, EPHX1, SEPT9, GMPPA, B4GALT7, AAAS, and TP53INP1; (f) NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, PDCD1LG2, SNX10, SEC24D, UBE2L6, LDHC, FAS, CXCL10, STAT2, IRF7, CD274, PSME2, LPCAT2, PSMB8, FBXO6, DUSP10, and PLA2G4C; (g) NUP93, PPM1G, C6orf62, PJA1, MEST, NDUFS2, DDOST, DHRS7B, NOLC1, POLA2, PRSS23, SHMT1, RIPK1, AKR1A1, PRPF3, ETS1, MANSC1, PDHA1, ACLY, CHI3L2, MCMI, DNAJC18, LCT, YRDC, AIFM1, SFN, FBN1, EIF4H, CLEC4A, BCAP31, ATG4B, CSRP1, RDH11, GCLM, CDC7, GLOD5, IDH2, FMR1, PPARA, CCNE1, DDB1, BMP1, EHD4, VAV3, MPG, SPAG4, PSMD3, BCKDHA, GRAMD1B, and SEC61A1; (h) CRB3, BCAP31, GMPPB, CD4, STARD3, CALR, CSRP1, CPT1A, LDLRAP1, RRAS, HMGCR, RASGRP2, PTS, SORD, SLC26A6, VAT1, GPAA1, CXCR3, NAMPT, EPHX1, 
SEPT9, GMPPA, B4GALT7, AAAS, TP53INP1, GYS1, FASN, NOC4L, RRP9, MXI1, TP53, SLC7A11, FOXP3, DNASE1L1, MGAT1, SEC61A1, FYCO1, S100A10, LSS, IFRD1, DCP2, EDC4, ANKZF1, IDUA, IGFBP2, DDX39A, UCHL1, NR4A1, PDIA5, and ENGASE; or (i) NUB1, CASP1, WARS, TRIM21, STAT1, MOCOS, BCL2L14, ATF3, KIF2A, PDCD1LG2, SNX10, SEC24D, UBE2L6, LDHC, FAS, CXCL10, STAT2, IRF7, CD274, PSME2, LPCAT2, PSMB8, FBXO6, DUSP10, PLA2G4C, BANF1, EPOR, KCNMA1, CTSK, ITGA2, MPZL2, FEZ1, JAK2, BAZ1A, ICAM4, DAPP1, RIPK1, RNF144B, LAP3, C1QA, TYMP, GCH1, C1QB, CREM, ETV7, FOSB, MRPL15, PSEN1, MXI1, and TRAFD1.
In various embodiments, a universal signature can be used to predict presence of an infection, severity of an infection, progression of an infection, or a patient response to a vaccine against an infection. In various embodiments, the infection is a viral infection. In various embodiments, the infection can be any one of a SARS-CoV-2 infection, an HBV infection, an H1N1 infection, or an influenza infection. In various embodiments, the severity of an infection can be classified as one of severe or not severe. In various embodiments, the severity of the symptoms of an individual with a viral infection can be the severity of the symptoms after one year. In some embodiments, the universal signature useful for predicting presence of an infection, severity of an infection, progression of an infection, or patient response to a vaccine against an infection comprises markers selected from: (a) DNAAF1, UQCRC2, XPNPEP1, ACSM1, and DDX60; (b) LRRC28, E2F4, MRPL15, CCL22, and OTUD1; (c) GSTM3, GYG1, CCL22, MOCS2, and LY6E; (d) MAFB, LGALS3, VCAN, PDK4, and CD81; (e) POLH, PTGER3, RUNX1, CASP6, and CHPT1; (f) CPEB4, CDKN3, TRIM14, ANXA9, and CRYAB; (g) HUWE1, KCNK5, STX11, MORC3, and NETO2; (h) AKR1A1, NDST1, RNF144B, HDAC9, and PSMB3; (i) SPOCK3, PVR, CHTF8, SLC20A1, and PARP8; (j) NLRC5, CACNB2, CELSR1, PARP8, and ECT2; or (k) CCK, SESN2, NACAD, PCSK9, and C1R.
In some embodiments, the universal signature useful for predicting presence of an infection, severity of an infection, progression of an infection, or patient response to a vaccine against an infection comprises markers selected from: (a) DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, DNAJC12, RET, IL20RB, TNFSF10, DLG4, CKAP4, NDST1, GAPDH, ARL3, PLG, MDH2, GSTP1, S100A9, B4GALT7, H2AFJ, and LTB4R; (b) LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, GYS2, CD151, RAD51C, ARHGEF2, PFN1, AP4B1, IGFBP4, OASL, PDGFC, MIEN1, BEST3, SH3RF1, RACGAP1, FMO3, HNRNPA2B1, and F2RL1; (c) GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, BAAT, MRPL11, OAS1, RFX5, PSMD7, ALDH2, STAP1, GYS2, GMFB, CCL3, PSMA4, CTHRC1, CMTM2, CD36, B4GALT2, and EDF1; (d) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, LTF, S100A12, MS4A6A, GSTK1, RNF31, NOTCH4, and COL17A1; (e) POLH, PTGER3, RUNX1, CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, IRF4, TNNC2, RIT1, ALG1, PDCD4, CYP2E1, GABARAPL2, B4GALT7, IFNAR1, MEF2C, TLR8, TSPYL2, M6PR, IKZF1, CNDP2, and SLCO2A1; (f) CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, ARNTL2, KRT82, PRIM2, MOCS2, IL21R, MAPK8, NMNAT1, ZNF107, CTSG, IL7, ANKRD34B, TMF1, HPS3, CIT, TRAP1, and MSH2; (g) HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, PPFIA4, RPH3A, CXCL11, ERMAP, GBP2, CASP1, TLR7, EPX, ANKH, ARFGAP3, BAZ1A, COL5A1, COP1, BIRC2, SLC7A5, and TRO; (h) AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, PFKP, MB, MYC, PEX14, TAF13, BMX, PRKAA2, PTGER3, C3, SPTAN1, PROCR, AARS2, RHOT2, PHEX, THOP1, TIMM10, TBL1X, HNF4A, SLC6A9, and FECH; (i) SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, TM7SF2, FUS, DDAH2, SPAG4, FBXL14, LGALS8, GNE, HAS2, IGSF6, B4GALT1, POLK, PLK4, NDUFB4, GNG8, MUC1, and AGGF1; (j) NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, CLCA2, BAALC, PTPN14, IRF9, SAA2, 
HR, IRGQ, AKT3, SYNGR1, NKX2-2, MT1H, SERPINA6, CAMK2N1, CCT6B, WDHD1, and NKX3-1; (k) CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, SPSB1, DNAH17, TNNC1, CPN1, SYNGR2, CPA4, MYL1, DUOX2, ZNF621, GAPDHS, BCAP31, DLG1, IL17RB, SLC6A6, BCL2L2, and HSPA1B. In some embodiments, the universal signature useful for predicting presence of an infection, severity of an infection, progression of an infection, or patient response to a vaccine against an infection comprises markers selected from: (a) DNAAF1, UQCRC2, XPNPEP1, ACSM1, DDX60, TPI1, EFNA3, ZDHHC19, DDIT3, DNAJC12, RET, IL20RB, TNFSF10, DLG4, CKAP4, NDST1, GAPDH, ARL3, PLG, MDH2, GSTP1, S100A9, B4GALT7, H2AFJ, LTB4R, TAGLN2, IRF7, NDUFV1, CD300LB, RTP4, CTSD, HIST1H2BG, IL27, TNFRSF1B, SORBS1, NOP2, TNFSF13B, HLA-DRB5, RHOG, PSMB9, HSPA6, CD63, SLC2A8, IFITM1, CKB, ALDOA, MSRB1, OSMR, DRAP1, and PLA2G4A; (b) LRRC28, E2F4, MRPL15, CCL22, OTUD1, NSUN7, CHEK1, ADGRA2, ZFPM2, GYS2, CD151, RAD51C, ARHGEF2, PFN1, AP4B1, IGFBP4, OASL, PDGFC, MIEN1, BEST3, SH3RF1, RACGAP1, FMO3, HNRNPA2B1, F2RL1, CAMKK2, ITGB5, FLVCR2, ZNF462, KIAA1324, CENPN, IKBKE, SERPINF2, FAM162A, SNX2, SERPING1, CLCA2, DPEP3, TNFAIP2, FSTL4, CTSD, BCAR1, MKX, RGS2, SAMD9, GCLM, BST1, IRS2, RNASE6, and ELOVL3; (c) GSTM3, GYG1, CCL22, MOCS2, LY6E, CD151, S100A12, HEBP2, EIF3B, BAAT, MRPL11, OAS1, RFX5, PSMD7, ALDH2, STAP1, GYS2, GMFB, CCL3, PSMA4, CTHRC1, CMTM2, CD36, B4GALT2, EDF1, CDK5R1, TREML3P, PML, HEPHL1, TNFRSF21, PSMB9, GNAI1, TSPAN13, ATP6V0B, SLC4A4, ILF2, AKAP12, HLA-DRB5, PGR, AGTRAP, P3H1, CDADC1, TRIM5, PTGER3, ADCY6, ERBB2, NFYA, STATE, MMD, and RPL10A; (d) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, LTF, S100A12, MS4A6A, GSTK1, RNF31, NOTCH4, COL17A1, S100A8, CTSG, STX11, PTX3, MYOF, LTA4H, TRIM26, CYP1B1, ARG1, IFNGR2, B3GNT5, KYNU, LPGAT1, SLC9A3R1, HP, PADI4, PSME1, MGST2, NR4A1, SPP1, DEFA3, ME1, RBP7, DUSP6, and MCRS1; (e) POLH, PTGER3, RUNX1, 
CASP6, CHPT1, APOBEC3F, USP14, PEX16, HLA-DQA1, IRF4, TNNC2, RIT1, ALG1, PDCD4, CYP2E1, GABARAPL2, B4GALT7, IFNAR1, MEF2C, TLR8, TSPYL2, M6PR, IKZF1, CNDP2, SLCO2A1, RBM4, FH, MRTO4, DTX4, RFC2, CAMK1G, CBX8, HM13, PSMB10, GCLM, SLC25A3, MYD88, IL33, ITGAM, PPIA, SEC22B, CXCR3, SCRN1, RXRA, SDHA, GLDC, FGF6, PRKG2, TFPI, and IMMT; (f) CPEB4, CDKN3, TRIM14, ANXA9, CRYAB, CHST11, ANAPC11, RNASE3, FN1, ARNTL2, KRT82, PRIM2, MOCS2, IL21R, MAPK8, NMNAT1, ZNF107, CTSG, IL7, ANKRD34B, TMF1, HPS3, CIT, TRAP1, MSH2, PDGFC, TMLHE, MVP, TBX21, PICALM, KRT6A, FMR1, PCSK9, DNASE1L3, ENDOG, TPD52L1, PEX6, MPO, CHRNA7, SLFN5, TNFRSF1A, CD24, CASC1, LLGL2, DLG5, MYO5C, PGR, PFKFB2, AK2, and COL19A1; (g) HUWE1, KCNK5, STX11, MORC3, NETO2, BATF2, CCL3L1, SAMD9, CCL2, PPFIA4, RPH3A, CXCL11, ERMAP, GBP2, CASP1, TLR7, EPX, ANKH, ARFGAP3, BAZ1A, COL5A1, COP1, BIRC2, SLC7A5, TRO, CXCL6, TNFSF10, GYPE, COL17A1, ROCK1, CD83, AK7, MSR1, LCN2, SPN, ASS1, HDGF, CXCL16, POLR3D, GK, OLFM4, STK3, RCBTB1, FOLR3, FBXO32, TMEM98, PRDX2, CKB, UHRF1BP1L, and CTSG; (h) AKR1A1, NDST1, RNF144B, HDAC9, PSMB3, PFKP, MB, MYC, PEX14, TAF13, BMX, PRKAA2, PTGER3, C3, SPTAN1, PROCR, AARS2, RHOT2, PHEX, THOP1, TIMM10, TBL1X, HNF4A, SLC6A9, FECH, CLCN3, CEACAM4, MMPI, HSD11B2, SLC25A25, RAB32, CXCL9, KCNE2, FCAR, CFP, IGF1, PEX16, RNF214, PIM1, JUNB, MDM2, PFKFB4, SIAH2, EGR2, KCNK10, EHMT2, FPR1, CD27, CETN2, and TGM1; (i) SPOCK3, PVR, CHTF8, SLC20A1, PARP8, FGG, ZFAND2A, CCL25, CALR, TM7SF2, FUS, DDAH2, SPAG4, FBXL14, LGALS8, GNE, HAS2, IGSF6, B4GALT1, POLK, PLK4, NDUFB4, GNG8, MUC1, AGGF1, PPIB, SLC1A4, HLA-DQB1, SEMA4G, MT2A, COL4A2, PLCB4, GYS1, PRKCG, RXFP2, PLA2G4C, ALDH1A2, ILIA, IBTK, SPARC, OAS3, EPHA4, HLA-B, MICB, CCL18, SLC39A6, GLCE, TUBB2B, FBXO8, and SNX6; (j) NLRC5, CACNB2, CELSR1, PARP8, ECT2, HTATIP2, NRP1, NCK2, TMEM100, CLCA2, BAALC, PTPN14, IRF9, SAA2, HR, IRGQ, AKT3, SYNGR1, NKX2-2, MT1H, SERPINA6, CAMK2N1, CCT6B, WDHD1, NKX3-1, LDHC, MALT1, CD9, CLGN, SLC25A19, MAP7, XCL1, ACSL6, TFRC, 
CAT, NKD1, CNBP, ALDH1L1, CCL7, SLC20A1, KRAS, CSF1, CASP2, HDAC11, KIR2DS4, CEACAM19, CFH, CAB39L, DEPDC1, and PSMA1; (k) CCK, SESN2, NACAD, PCSK9, C1R, SLC7A1, ECM1, XCL1, ARG2, SPSB1, DNAH17, TNNC1, CPN1, SYNGR2, CPA4, MYL1, DUOX2, ZNF621, GAPDHS, BCAP31, DLG1, IL17RB, SLC6A6, BCL2L2, HSPA1B, SLC1A4, TSTD1, HSPB8, MSC, CENPJ, ARL8A, CTLA4, GFRA1, WASF1, RIPK1, ENO3, KRT19, PLVAP, RAD18, ACHE, FBLN5, MGST2, ANAPC5, RFX5, CASP7, STC1, NCK2, IFI27, APOA4, and MSRB2.
In particular embodiments, the universal signature useful for predicting presence of an infection, severity of an infection, progression of an infection, or patient response to a vaccine against an infection comprises markers selected from: (a) MAFB, LGALS3, VCAN, PDK4, and CD81; (b) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, LTF, S100A12, MS4A6A, GSTK1, RNF31, NOTCH4, and COL17A1; or (c) MAFB, LGALS3, VCAN, PDK4, CD81, OLFM4, MMP8, CD1D, KLF4, CSTA, IDH1, ITPRIPL2, HMOX1, VSIG4, FRMD5, INHBA, ALDH2, PAPSS2, LTF, S100A12, MS4A6A, GSTK1, RNF31, NOTCH4, COL17A1, S100A8, CTSG, STX11, PTX3, MYOF, LTA4H, TRIM26, CYP1B1, ARG1, IFNGR2, B3GNT5, KYNU, LPGAT1, SLC9A3R1, HP, PADI4, PSME1, MGST2, NR4A1, SPP1, DEFA3, ME1, RBP7, DUSP6, and MCRS1. In particular embodiments, the infection is a viral infection selected from SARS-CoV-2 or H1N1.
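In the lists above, each shorter marker list appears as the leading portion of the corresponding longer list. The following Python sketch illustrates one way such nested signatures could be handled, under the assumption that the markers are kept in the order listed. The names leading_markers and signature_vector are hypothetical, and the extraction of expression values into a vector is an illustrative pre-processing step rather than a step described herein.

```python
from typing import Dict, List, Sequence

# The twenty-five markers of list (b) above, kept in the order listed.
MARKERS_25: List[str] = [
    "MAFB", "LGALS3", "VCAN", "PDK4", "CD81",
    "OLFM4", "MMP8", "CD1D", "KLF4", "CSTA",
    "IDH1", "ITPRIPL2", "HMOX1", "VSIG4", "FRMD5",
    "INHBA", "ALDH2", "PAPSS2", "LTF", "S100A12",
    "MS4A6A", "GSTK1", "RNF31", "NOTCH4", "COL17A1",
]


def leading_markers(ordered: Sequence[str], n: int) -> List[str]:
    """Return the first n markers of an ordered marker list (hypothetical helper)."""
    if not 0 < n <= len(ordered):
        raise ValueError(f"n must be between 1 and {len(ordered)}")
    return list(ordered[:n])


def signature_vector(sample: Dict[str, float], markers: Sequence[str]) -> List[float]:
    """Collect a sample's expression values for the given markers, in marker order.

    Raises KeyError if the sample lacks any marker, so that a signature is
    never silently evaluated on incomplete data (an assumed design choice).
    """
    missing = [m for m in markers if m not in sample]
    if missing:
        raise KeyError(f"sample is missing markers: {missing}")
    return [sample[m] for m in markers]
```

For instance, leading_markers(MARKERS_25, 5) yields the five markers of list (a) above.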
Applying Universal Signatures to a Second Disease Indication
Step 230 involves identifying a suitable second disease indication that is different from the first disease indication used to identify the universal signature. A suitable second disease indication is a disease indication in which the universal signature can be applied for predicting disease activity of the suitable second disease indication.
In various embodiments, the process of identifying a second disease indication involves comparing a condition that characterizes the second disease indication with a condition that characterizes the first disease indication. A condition of the first or second disease indication refers to any one of a precursor to a disease, a phenotype or sub-phenotype of a disease, progression from latent to acute infection, progression from acute to chronic infection, response to an intervention, susceptibility to disease or infection, presence of acute inflammation, presence of chronic inflammation, a clinical phenotype, or a clinical condition (e.g., high blood pressure, fever, loss of blood, loss of consciousness, or increased heart rate). In one embodiment, if the condition of the first disease indication and the condition of the second disease indication are the same, the condition is a common condition of the first and second disease indications. Given the common condition that characterizes both the first and second disease indications, the second disease indication can be selected for applying the universal signature which was previously developed from data of the first disease indication.
As an example, a first disease indication may refer to progression in infectious diseases. A second disease indication may refer to patient survival time after diagnosis with a brain tumor (e.g., glioma). Here, both infectious diseases and brain tumors are characterized by at least a common condition of chronic infection. Therefore, in comparing the conditions of infectious diseases and brain tumors, the common condition of chronic infection is identified. The second disease indication involving the disease of brain tumors is a suitable disease indication for applying the universal signature determined from data describing progression in infectious diseases.
As another example, a first disease indication and a second disease indication may share a common condition of a clinical phenotype. As a specific example, a first disease indication can involve H1N1 and a clinical phenotype of the disease is the need for mechanical ventilation. Therefore, a second disease indication can be identified that similarly shares the clinical phenotype of a need for mechanical ventilation. An example of an identified second disease indication involves SARS-CoV-2, as patients with SARS-CoV-2 often encounter the need for mechanical ventilation. Thus, the universal signature determined from data of H1N1 can be applied to generate predictions for SARS-CoV-2 patients. As another specific example, a first disease indication may involve H1N1 and a clinical phenotype of the disease is a response to a vaccination, as measured by antibody titers. A second disease indication, such as HBV, can be identified that shares the clinical phenotype of a response to a vaccination as measured by antibody titers. Thus, the universal signature determined from data of vaccine-administered H1N1 patients can be used to generate predictions for vaccine-administered HBV patients.
As another example, a first disease indication and a second disease indication may share a common condition of a cellular phenotype. A first disease indication can involve a cellular phenotype including a dysregulated cell population. A dysregulated cell population can be a cell population with aberrant behavior (e.g., dysregulated gene expression, biomarker expression, or protein synthesis). A second disease indication can be identified that shares the cellular phenotype of a dysregulated cell population (e.g., dysregulated gene expression, biomarker expression, or protein synthesis). Therefore, the universal signature determined from data of the first disease indication can be used to generate predictions for the second disease indication.
As another example, a first disease indication and a second disease indication may share a common condition of a dysregulated pathway expression. A dysregulated pathway expression refers to one or more aberrant pathways where markers of the pathway are differentially expressed in comparison to their expressions in a healthy state. As such, an aberrant pathway may be associated with and/or be the cause of multiple diseases (e.g., diseases of the first disease indication and second disease indication). In various embodiments, a dysregulated pathway expression refers to aberrant expression of one, two, three, four, five, six, seven, eight, nine, or ten markers of the pathway. In various embodiments, a dysregulated pathway expression refers to aberrant expression of at least ten markers of the pathway.
In various embodiments, each of the first disease indication and the second disease indication may be characterized by multiple conditions. Here, the process of identifying a second disease indication as suitable for applying the universal signature can involve determining whether there are a threshold number of common conditions between the first disease indication and the second disease indication. If the first disease indication and the second disease indication share at least a threshold number of common conditions, then the second disease indication is suitable for applying the universal signature developed using data for the first disease indication. In various embodiments, the threshold number of common conditions is one common condition, two common conditions, three common conditions, four common conditions, five common conditions, six common conditions, seven common conditions, eight common conditions, nine common conditions, or ten common conditions.
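The threshold test on common conditions can be illustrated with a minimal sketch. It assumes each disease indication is annotated with a set of condition names; the function names and the example annotations are hypothetical and chosen only for illustration.

```python
def shared_conditions(first_conditions, second_conditions):
    """Return the conditions that characterize both disease indications."""
    return set(first_conditions) & set(second_conditions)


def is_suitable_second_indication(first_conditions, second_conditions, threshold=1):
    """True if the indications share at least `threshold` common conditions."""
    common = shared_conditions(first_conditions, second_conditions)
    return len(common) >= threshold


# Hypothetical annotations, loosely following the H1N1 / SARS-CoV-2 example.
h1n1 = {"need for mechanical ventilation", "fever", "acute inflammation"}
sars_cov_2 = {"need for mechanical ventilation", "fever", "increased heart rate"}

print(sorted(shared_conditions(h1n1, sars_cov_2)))
print(is_suitable_second_indication(h1n1, sars_cov_2, threshold=2))
```

With a threshold of one, a single shared condition is enough to select the second disease indication; raising the threshold makes the selection more conservative.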
Step 240 involves obtaining expressions of markers of the universal signature expressed by patients, such as patients 130 described above in
In one embodiment, obtaining the expressions of markers of the universal signature encompasses obtaining samples from the patients associated with or having the second disease of the second disease indication and performing one or more assays on the samples to obtain the expressions of the markers of the universal signature. Example assays for obtaining expressions of the markers of the universal signature include quantitating biomarkers using antibodies or performing gene expression profiling with microarrays or RNAseq. In various embodiments, obtaining the expressions of the markers of the universal signature encompasses receiving, from a third party, a dataset including the expressions of the markers of the universal signature. In such embodiments, the third party may have performed the assay on samples obtained from patients associated with or having the second disease of the second disease indication to generate the expressions of markers of the universal signature.
Step 250 involves generating a prediction of the second disease indication for the patients by analyzing the expressions of markers of the universal signature of the patients. Step 250 describes, in further detail, step 135 in
In one embodiment, analyzing the expressions of the markers of the universal signature involves applying a machine learning model that generates predictions for a second disease indication (e.g., disease activity of a second disease). In this scenario, the markers of the universal signature serve as features for the machine learning model, which outputs the prediction of disease activity of the second disease indication 140. The machine learning model can be trained using a dataset including training examples that include expression of at least markers of the universal signature. In various embodiments, the training examples can further include a reference ground truth, which is an indication of the disease activity of the second disease. Here, the machine learning model can be trained using supervised learning such that the machine learning model can more accurately predict disease activity of the second disease based on the universal signature.
In various embodiments, the machine learning model can be trained using a machine learning implemented method such as any one of a linear regression algorithm, logistic regression algorithm, decision tree algorithm, support vector machine classification, Naïve Bayes classification, K-Nearest Neighbor classification, random forest algorithm, deep learning algorithm, or gradient boosting algorithm. In various embodiments, the machine learning model is trained using supervised learning algorithms, unsupervised learning algorithms, or semi-supervised learning algorithms (e.g., partial supervision).
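As one illustration of supervised prediction from the markers of a universal signature, the following minimal sketch implements K-Nearest Neighbor classification, one of the listed algorithms, using only the Python standard library. The marker expression values, the labels, and the function name `knn_predict` are hypothetical; the sketch is not the specific model of any embodiment.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training examples."""
    distances = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training examples: each row holds the expression of three
# markers of a universal signature; each label is the reference ground truth.
train_X = [
    [5.1, 0.9, 3.2], [4.8, 1.1, 3.0], [5.3, 1.0, 3.4],
    [1.2, 4.0, 0.8], [0.9, 4.2, 1.1], [1.1, 3.8, 0.7],
]
train_y = ["active", "active", "active", "inactive", "inactive", "inactive"]

new_patient = [5.0, 1.2, 3.1]
print(knn_predict(train_X, train_y, new_patient))
```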
In various embodiments, the process of training the machine learning model occurs subsequent to the development process (e.g., development process 150 described in
In various embodiments, a non-machine learning method is implemented to analyze the expression of the universal signature. For example, analyzing the expression of the markers of the universal signature involves performing an unsupervised cluster analysis of the patients 130 according to their expressions of the markers of the universal signature. The individual clusters are labeled, and the patients in a cluster are classified according to the label. Therefore, the predicted disease activity of the second disease for a patient is based upon the cluster into which the patient is grouped.
In various embodiments, the individual clusters are labeled by using patient data from the first disease indication. In various embodiments, patients of the first disease indication, whose disease activity is known, are overlaid on the reduced-dimensionality space. Therefore, the known disease activity of the patients of the first disease indication can be used to label the individual clusters. For example, patients of the first disease indication can be known as either responding to or not responding to a vaccination. Therefore, when overlaid on the reduced-dimensionality space, the clusters can be labeled as likely responders or non-responders according to the allocation of patients of the first disease indication. For example, if a majority of patients (e.g., greater than 50% of patients) of the first disease indication, who are identified as responders to a vaccine, are located closer to, or overlap with, a first cluster in comparison to a second cluster, then the first cluster can be labeled as responders to the vaccine. As another example, if a majority of patients (e.g., greater than 50% of patients) of the first disease indication, who are identified as non-responders to a vaccine, are located closer to, or overlap with, a first cluster in comparison to a second cluster, then the first cluster can be labeled as non-responders to the vaccine.
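The majority-based labeling can be sketched as follows. The sketch assumes that proximity is measured as Euclidean distance to each cluster centroid in the reduced-dimensionality space, which is one possible reading of "more proximal"; the cluster names, coordinates, and function names are hypothetical.

```python
import math
from collections import Counter


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [sum(col) / len(points) for col in zip(*points)]


def nearest_cluster(point, centroids):
    """Name of the cluster whose centroid is closest to the point."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))


def label_clusters(clusters, reference_points, reference_labels, majority=0.5):
    """Label a cluster with a known phenotype when more than `majority` of the
    reference patients with that phenotype fall nearest to that cluster."""
    centroids = {name: centroid(pts) for name, pts in clusters.items()}
    labels = {}
    for phenotype in sorted(set(reference_labels)):
        members = [
            p for p, lab in zip(reference_points, reference_labels) if lab == phenotype
        ]
        counts = Counter(nearest_cluster(p, centroids) for p in members)
        name, count = counts.most_common(1)[0]
        if count / len(members) > majority:
            labels[name] = phenotype
    return labels


# Hypothetical 2-D embedding: clusters of second-indication patients, and
# first-indication patients with known vaccine response overlaid on it.
clusters = {
    "cluster_1": [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0]],
    "cluster_2": [[3.0, 3.1], [2.9, 3.2], [3.1, 2.8]],
}
reference_points = [[0.1, 0.2], [0.0, -0.2], [2.8, 3.0], [3.2, 3.1], [3.0, 2.9], [0.3, 0.1]]
reference_labels = ["responder", "responder", "non-responder",
                    "non-responder", "non-responder", "responder"]

print(label_clusters(clusters, reference_points, reference_labels))
```

If two phenotypes were to map to the same cluster, this sketch keeps the one processed last; a practical implementation would need an explicit tie-breaking rule.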
In various embodiments, the individual clusters are labeled by using patient data from the first disease indication. In various embodiments, gene expression of patients of the first disease indication, whose disease activity is known, is used. Specifically, the expression data of the training and test sets were not directly compared, as the range of expression most likely differs more across datasets than across phenotypes within a dataset. Thus, the direction of the signal is used rather than the amplitude: for each marker present in the universal signature, the median expression in each cluster was compared and the direction of the signal was recorded in each cluster (high, low, or intermediate when more than two clusters are present). The same analysis was performed in the training dataset from which the universal signature was obtained, using the true labels (case/control) instead of clusters to group the samples. The cluster in the test dataset with the highest proportion of genes matching the label of interest in the training dataset (in terms of signal direction) was defined as the “case cluster”, while the other cluster(s) were defined as control cluster(s).
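The direction-of-signal comparison can be sketched as follows. The gene names and expression values are hypothetical, and treating a marker with identical medians in every group as "intermediate" is an assumption made here for completeness.

```python
from statistics import median


def signal_directions(groups):
    """Record, per group and marker, whether the group's median expression is
    'high', 'low', or 'intermediate' relative to the other groups.

    groups: {group_name: {marker: [expression values]}}
    """
    markers = list(next(iter(groups.values())).keys())
    directions = {name: {} for name in groups}
    for marker in markers:
        medians = {name: median(values[marker]) for name, values in groups.items()}
        top, bottom = max(medians.values()), min(medians.values())
        for name, m in medians.items():
            if top == bottom:
                directions[name][marker] = "intermediate"  # assumption: no separation
            elif m == top:
                directions[name][marker] = "high"
            elif m == bottom:
                directions[name][marker] = "low"
            else:
                directions[name][marker] = "intermediate"
    return directions


def find_case_cluster(test_clusters, training_groups, case_label="case"):
    """Return the test cluster whose marker directions most often match those
    of the case group in the training dataset."""
    case_dirs = signal_directions(training_groups)[case_label]
    cluster_dirs = signal_directions(test_clusters)

    def match_fraction(name):
        hits = sum(cluster_dirs[name][m] == d for m, d in case_dirs.items())
        return hits / len(case_dirs)

    return max(cluster_dirs, key=match_fraction)


# Hypothetical data; note the training and test sets are on different scales.
training = {
    "case": {"GENE_A": [8.1, 7.9, 8.4], "GENE_B": [6.0, 6.3, 5.8], "GENE_C": [2.1, 1.9, 2.4]},
    "control": {"GENE_A": [5.0, 5.2, 4.9], "GENE_B": [4.1, 3.9, 4.3], "GENE_C": [3.5, 3.8, 3.6]},
}
test = {
    "cluster_1": {"GENE_A": [120, 135, 128], "GENE_B": [90, 95, 88], "GENE_C": [30, 28, 35]},
    "cluster_2": {"GENE_A": [80, 85, 78], "GENE_B": [60, 66, 59], "GENE_C": [45, 50, 47]},
}

print(find_case_cluster(test, training))
```

Because only the ordering of medians within each dataset is used, the difference in scale between the two datasets does not affect the result.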
Examples of unsupervised cluster analysis include hierarchical clustering, k-means clustering, clustering using mixture models, density based spatial clustering of applications with noise (DBSCAN), ordering points to identify the clustering structure (OPTICS), or combinations thereof. In preferred embodiments, unsupervised cluster analysis includes hierarchical density based spatial clustering of applications with noise (HDBSCAN).
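For illustration, the following minimal sketch performs agglomerative hierarchical clustering with average linkage, one of the listed examples, using only the Python standard library. It is a simple stand-in and is not HDBSCAN; the expression values and function names are hypothetical.

```python
import math


def average_linkage(cluster_a, cluster_b, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    total = sum(math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative_clusters(points, n_clusters):
    """Start with one cluster per patient and repeatedly merge the closest
    pair of clusters until n_clusters remain. Returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical expression of two signature markers for six patients.
patients = [
    [5.1, 0.9], [4.8, 1.1], [5.3, 1.0],
    [1.2, 4.0], [0.9, 4.2], [1.1, 3.8],
]
print(agglomerative_clusters(patients, n_clusters=2))
```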
In various embodiments, analyzing the expressions of markers of the universal signature involves performing dimensionality reduction analysis. For example, in scenarios in which multiple genes of a universal signature are used for generating a prediction for a second disease indication, dimensionality reduction analysis is useful for mapping the expressions of the markers of the universal signature into a lower dimensional space. Thus, predictions of the second disease indication can be made for patients according to expressions of the markers of the universal signature that have been mapped onto a lower dimensional space. Examples of dimensionality reduction analysis include principal component analysis (PCA), kernel PCA, graph-based kernel PCA, linear discriminant analysis, generalized discriminant analysis, autoencoder, non-negative matrix factorization, T-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), or dens-UMAP. Additional details of performing UMAP are described in Narayan, A. et al, “Density-Preserving Data Visualization Unveils Dynamic Patterns Of Single-Cell Transcriptomic Variability.” bioRxiv 2020.05.12.077776, which is hereby incorporated by reference in its entirety.
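As a minimal sketch of dimensionality reduction, the following code projects marker expressions onto the first principal component (PCA), computed by power iteration with only the Python standard library. The expression values and function names are hypothetical, and a practical implementation would more likely rely on an established numerical library.

```python
import math


def center(rows):
    """Subtract each marker's mean so that every column has mean zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered_rows):
    """Sample covariance matrix (markers x markers) of centered rows."""
    n, d = len(centered_rows), len(centered_rows[0])
    return [
        [sum(r[i] * r[j] for r in centered_rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered_rows, component):
    """One-dimensional coordinates of each patient along the component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered_rows]


# Hypothetical expression of three signature markers for six patients.
expression = [
    [5.1, 0.9, 3.2], [4.8, 1.1, 3.0], [5.3, 1.0, 3.4],
    [1.2, 4.0, 0.8], [0.9, 4.2, 1.1], [1.1, 3.8, 0.7],
]
centered = center(expression)
pc1 = first_principal_component(covariance(centered))
print(project(centered, pc1))
```

The one-dimensional coordinates can then be passed to a cluster analysis or a classifier in place of the original marker expressions.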
In various embodiments, combinations of the aforementioned methods (e.g., application of machine learning model, unsupervised clustering, and dimensionality reduction analysis) can be performed to generate a prediction of the second disease indication. As one example, in the embodiment shown in
In various embodiments, the prediction of the second disease indication for the patients can be useful for guiding the care that is provided to a patient. For example, given the prediction of the second disease indication that indicates that the patient is likely to undergo a progression of disease, the patient can be provided an intervention to slow or combat the progression of the disease.
In various embodiments, the prediction of the second disease indication for the patients can be useful for evaluating whether patients are eligible or ineligible for enrollment in clinical trials. For example, the prediction of the second disease indication can be evaluated against an eligibility criterion such that patients that meet the eligibility criterion can be enrolled in the clinical trial whereas patients that fail to meet the eligibility criterion are not enrolled. This is useful for particular clinical trials that enroll large numbers of patients in hopes of obtaining a sufficient number of patients that satisfy a particular criterion. Here, at the time of enrollment, it is not known whether the patients are likely to satisfy the criterion or not. For example, classic trials typically enroll a large number of patients with the hopes that a sufficient number of those enrolled patients meet the criterion after the fact. A large number of enrolled patients in a classic trial are subsequently eliminated for not meeting the criterion at a later timepoint.
For example, a control group for a clinical trial involving tuberculosis patients may require a sufficient number of patients to progress to active tuberculosis within a certain time frame (e.g., 6 months or 1 year). Thus, enrolled patients that do not progress within the time frame are eliminated from the trial.
Using the universal signature, the prediction of the second disease indication enables the prospective identification of patients with tuberculosis that would likely meet this criterion and therefore, can be enrolled in the clinical trial. Altogether, the use of the universal signature for generating predictions for a second disease indication for purposes of enrolling patients in clinical trials represents an enrichment strategy such that fewer patients need to be enrolled. This can be highly beneficial for clinical trials in which a limited number of patients is available, e.g., in rare or novel diseases. In addition, enrolling fewer patients in a clinical trial can result in substantial economic benefits.
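The arithmetic behind the enrichment strategy can be sketched as follows. All numbers, patient identifiers, and function names are hypothetical and serve only to illustrate how a higher proportion of patients meeting the criterion reduces the required enrollment.

```python
import math


def eligible_patients(predicted_probability, threshold):
    """Patients whose predicted probability of meeting the criterion
    (e.g., progression within the time frame) is at least the threshold."""
    return [pid for pid, p in predicted_probability.items() if p >= threshold]


def required_enrollment(target_events, event_rate):
    """Number of patients to enroll so that the expected number who meet the
    criterion reaches target_events, given the fraction who meet it."""
    return math.ceil(target_events / event_rate)


# Hypothetical predictions for screened patients.
predictions = {"P1": 0.8, "P2": 0.3, "P3": 0.6}
print(eligible_patients(predictions, threshold=0.5))

# Hypothetical rates: 12.5% of unselected patients meet the criterion,
# versus 50% of patients selected using the prediction.
print(required_enrollment(target_events=40, event_rate=0.125))
print(required_enrollment(target_events=40, event_rate=0.5))
```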
System Environment
In various embodiments, the universal signature system 310 performs the methods described above in reference to
In various embodiments, the universal signature system 310 performs a subset of the methods described in
Third Party Entity
In various embodiments, the third party entity 330 represents a partner entity of the universal signature system 310. The third party entity 330 can operate either upstream or downstream of the universal signature system 310. As one example, the third party entity 330 operates upstream of the universal signature system 310 and provides information to the universal signature system 310 that enables the universal signature system 310 to perform the methods for identifying universal signatures. Here, the universal signature system 310 receives data, such as expressions of markers, of patients associated with a first disease indication from the third party entity 330. Thus, the universal signature system 310 analyzes the received data to identify one or more universal signatures.
As another example, the third party entity 330 operates downstream of the universal signature system 310. In this scenario, the universal signature system 310 uses the one or more universal signatures to generate a prediction for a second disease indication and provides the prediction to the third party entity 330. The third party entity 330 can subsequently use the prediction for its own purposes. For example, the third party entity 330 may be a healthcare provider. Therefore, the third party entity 330 can provide appropriate medical attention (e.g., medical advice, a treatment, an intervention, or the like) to a patient based on the prediction.
Network
This disclosure contemplates any suitable network 320 that enables connection between the universal signature system 310 and other third party entities 330A and 330B. The network 320 may comprise any combination of local area and/or wide area networks, using both wired and/or wireless communication systems. In one embodiment, the network 320 uses standard communications technologies and/or protocols. For example, the network 320 includes communication links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 3G, 4G, code division multiple access (CDMA), digital subscriber line (DSL), etc. Examples of networking protocols used for communicating via the network 320 include multiprotocol label switching (MPLS), transmission control protocol/Internet protocol (TCP/IP), hypertext transfer protocol (HTTP), simple mail transfer protocol (SMTP), and file transfer protocol (FTP). Data exchanged over the network 320 may be represented using any suitable format, such as hypertext markup language (HTML) or extensible markup language (XML). In some embodiments, all or some of the communication links of the network 320 may be encrypted using any suitable technique or techniques.
Non-Transitory Computer Readable Medium
Also provided herein is a computer readable medium comprising computer executable instructions configured to implement any of the methods described herein. In various embodiments, the computer readable medium is a non-transitory computer readable medium. In some embodiments, the computer readable medium is a part of a computer system (e.g., a memory of a computer system). The computer readable medium can comprise computer executable instructions for implementing a machine learning model for the purposes of predicting a clinical phenotype.
Computing Device
The methods described above, including the methods of developing and applying one or more universal signatures, are, in some embodiments, performed on a computing device. Examples of a computing device can include a personal computer, desktop computer, laptop, server computer, a computing node within a cluster, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like.
In various embodiments, the different methods described above in relation to
The methods for developing and applying one or more universal signatures can be implemented in hardware or software, or a combination of both. In one embodiment, a non-transitory machine-readable storage medium, such as one described above, is provided, the medium comprising a data storage material encoded with machine readable data which, when using a machine programmed with instructions for using said data, is capable of displaying any of the datasets and execution and results e.g., a prediction of disease activity of a second disease. Such data can be used for a variety of purposes, such as patient monitoring, treatment considerations, and the like. Embodiments of the methods described above can be implemented in computer programs executing on programmable computers, comprising a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), a graphics adapter, an input interface, a network adapter, at least one input device, and at least one output device. A display is coupled to the graphics adapter. Program code is applied to input data to perform the functions described above and generate output information. The output information is applied to one or more output devices, in known fashion. The computer can be, for example, a personal computer, microcomputer, or workstation of conventional design.
Each program can be implemented in a high-level procedural or object oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language can be a compiled or interpreted language. Each such computer program is preferably stored on a storage medium or device (e.g., ROM or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described herein. The system can also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
The signature patterns and databases thereof can be provided in a variety of media to facilitate their use. “Media” refers to a manufacture that contains the signature pattern information of the present invention. The databases of the present invention can be recorded on computer readable media, e.g. any medium that can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. One of skill in the art can readily appreciate how any of the presently known computer readable mediums can be used to create a manufacture comprising a recording of the present database information. “Recorded” refers to a process for storing information on computer readable medium, using any such methods as known in the art. Any convenient data storage structure can be chosen, based on the means used to access the stored information. A variety of data processor programs and formats can be used for storage, e.g. word processing text file, database format, etc.
The storage device 408 is a non-transitory computer-readable storage medium such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory 406 holds instructions and data used by the processor 402. The input interface 414 is a touch-screen interface, a mouse, track ball, or other type of input interface, a keyboard, or some combination thereof, and is used to input data into the computing device 400. In some embodiments, the computing device 400 may be configured to receive input (e.g., commands) from the input interface 414 via gestures from the user. The graphics adapter 412 displays images and other information on the display 418. For example, the display 418 can show a prediction of disease activity, such as a prediction of disease activity of a second disease 140 described above in
The computing device 400 is adapted to execute computer program modules for providing functionality described herein. As used herein, the term “module” refers to computer program logic used to provide the specified functionality. Thus, a module can be implemented in hardware, firmware, and/or software. In one embodiment, program modules are stored on the storage device 408, loaded into the memory 406, and executed by the processor 402.
The types of computing devices 400 can vary from the embodiments described herein. For example, the computing device 400 can lack some of the components described above, such as graphics adapters 412, input interface 414, and displays 418. In some embodiments, a computing device 400 can include a processor 402 for executing instructions stored on a memory 406.
Example Assays for Obtaining Expressions of Markers
In one embodiment, obtaining the expressions of markers encompasses obtaining samples from the individuals and performing one or more assays on the samples to obtain the quantities (e.g., expression values) of markers.
One approach for measuring expression levels is to perform identification with the use of antibodies. As used herein, the term “antibody” is intended to refer broadly to any immunologic binding agent such as IgG, IgM, IgA, IgD and IgE. Generally, IgG and/or IgM are the most common antibodies in the physiological situation and are most easily made in a laboratory setting. The term “antibody” also refers to any antibody-like molecule that has an antigen binding region, and includes antibody fragments such as Fab′, Fab, F(ab′)2, single domain antibodies (DABs), Fv, scFv (single chain Fv), and the like. In various embodiments, immunodetection methods can be employed to detect levels of expression. Some immunodetection methods include enzyme linked immunosorbent assay (ELISA), radioimmunoassay (RIA), immunoradiometric assay, fluoroimmunoassay, chemiluminescent assay, bioluminescent assay, and Western blot to mention a few. The steps of various useful immunodetection methods have been described in the scientific literature, such as, e.g., Doolittle and Ben-Zeev O, 1999; Gulbis and Galand, 1993; De Jager et al., 1993; and Nakamura et al., 1987, each incorporated herein by reference.
Another approach for measuring expression levels is to perform gene expression profiling with microarrays. Microarrays comprise a plurality of polymeric molecules spatially distributed over, and stably associated with, the surface of a substantially planar substrate, e.g., biochips. In gene expression analysis with microarrays, an array of “probe” oligonucleotides is contacted with a nucleic acid sample of interest, i.e., target, such as polyA mRNA from a particular tissue type. Contact is carried out under hybridization conditions and unbound nucleic acid is then removed. The resultant pattern of hybridized nucleic acid provides information regarding the genetic profile of the sample tested. Methodologies of gene expression analysis on microarrays are capable of providing both qualitative and quantitative information. One example of a microarray is a single nucleotide polymorphism (SNP)—Chip array, which is a DNA microarray that enables detection of polymorphisms in DNA.
Another approach for measuring expression levels is to perform gene expression profiling with high throughput sequencing (RNA-seq). RNA-seq (RNA Sequencing), one example of which is Whole Transcriptome Shotgun Sequencing (WTSS), is a technology that utilizes the capabilities of next-generation sequencing to reveal a snapshot of RNA presence and quantity from a genome at a given moment in time. An example of an RNA-seq technique is Perturb-seq. The transcriptome of a cell is dynamic; it continually changes as opposed to a static genome. The recent developments of Next-Generation Sequencing (NGS) allow for increased base coverage of a DNA sequence, as well as higher sample throughput. This facilitates sequencing of the RNA transcripts in a cell, providing the ability to look at alternative gene spliced transcripts, post-transcriptional changes, gene fusion, mutations/SNPs and changes in gene expression. In addition to mRNA transcripts, RNA-seq can look at different populations of RNA to include total RNA, nascent RNA, small RNA, such as miRNA, tRNA, and ribosomal profiling. RNA-seq can also be used to determine exon/intron boundaries and verify or amend previously annotated 5′ and 3′ gene boundaries. Ongoing RNA-seq research includes observing cellular pathway alterations that arise (e.g., for a particular disease indication), and gene expression level changes (e.g., for particular disease indications).
Further disclosed herein are particular combinations of 1) a first disease indication, 2) second disease indication, and 3) common condition shared between the first disease indication and second disease indication. Example combinations of first disease indication, second disease indication, and common condition are shown below.
Generally, in the first step, the performance of 153 signatures on each training dataset was characterized. Training datasets were from six studies covering responses to dengue infection, influenza H1N1 infection, and vaccination against influenza and hepatitis B virus, as well as one study on tuberculosis in rhesus macaques. Machine learning models were trained and evaluated with the feature set restricted to the genes contained in the signature. Effectively, for any training dataset, for example on dengue infection, 153 models were obtained, from which ROC values and the individual importance of the genes in the original signature were extracted. The ROC AUCs were computed using the label prediction of each sample left out with the leave-one-out cross-validation strategy. As the different datasets do not contain the same fraction of cases and controls, it is not possible to directly compare ROC AUCs; for this reason, the results are expressed in percentiles rather than raw ROC AUC values.
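The leave-one-out evaluation restricted to the genes of a signature can be sketched as follows. The sketch uses a nearest-centroid score as a simple stand-in for the trained machine learning model, so it does not reproduce the models used in the study; the data, column indices, and function names are hypothetical.

```python
import math


def roc_auc(labels, scores):
    """Probability that a random case scores higher than a random control,
    counting ties as one half. Labels are 1 (case) or 0 (control)."""
    cases = [s for lab, s in zip(labels, scores) if lab == 1]
    controls = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def centroid(rows):
    """Coordinate-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def centroid_score(train_X, train_y, x):
    """Stand-in model: higher when x is closer to the case centroid."""
    case_c = centroid([r for r, y in zip(train_X, train_y) if y == 1])
    ctrl_c = centroid([r for r, y in zip(train_X, train_y) if y == 0])
    return math.dist(x, ctrl_c) - math.dist(x, case_c)


def loo_auc(X, y, signature_columns):
    """Leave-one-out ROC AUC with features restricted to the signature genes."""
    X_sig = [[row[j] for j in signature_columns] for row in X]
    scores = []
    for i in range(len(X_sig)):
        train_X = X_sig[:i] + X_sig[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(centroid_score(train_X, train_y, X_sig[i]))
    return roc_auc(y, scores)


# Hypothetical dataset: columns 0 and 1 belong to the signature under test.
X = [
    [5.1, 0.9, 2.0, 7.1], [4.8, 1.1, 3.5, 6.8], [5.3, 1.0, 2.7, 7.4],
    [1.2, 4.0, 3.1, 7.0], [0.9, 4.2, 2.2, 6.9], [1.1, 3.8, 3.4, 7.3],
]
y = [1, 1, 1, 0, 0, 0]
print(loo_auc(X, y, signature_columns=[0, 1]))
```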
ROC AUC percentiles were obtained by comparing the literature signature to random lists of genes of the same size. A large proportion of signatures performed well across training datasets, supporting the notion that published signatures contain valuable information that can be used to train predictive models and classifiers.
To establish a universal signature for each training dataset, signatures were selected that had a ROC AUC higher than the 70th percentile compared to random lists of genes of the same size. For the purpose of defining a universal signature, the cognate signature was excluded for this step in order to focus on genes that were also relevant in at least one external study.
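The percentile comparison against random gene lists can be sketched as follows. The exact percentile definition is not stated here, so the sketch assumes the percentile is the percentage of random-list ROC AUCs that fall strictly below the signature's ROC AUC; the gene names, AUC values, and function names are hypothetical.

```python
import random


def random_gene_lists(all_genes, size, n_lists=100, seed=0):
    """Draw n_lists random gene lists, each the same size as the signature."""
    rng = random.Random(seed)
    return [rng.sample(all_genes, size) for _ in range(n_lists)]


def auc_percentile(signature_auc, random_aucs):
    """Percentage of random-list AUCs strictly below the signature's AUC."""
    below = sum(a < signature_auc for a in random_aucs)
    return 100.0 * below / len(random_aucs)


def passes_threshold(signature_auc, random_aucs, threshold=70):
    """True if the signature's AUC is above the given percentile."""
    return auc_percentile(signature_auc, random_aucs) > threshold


# Hypothetical gene universe and a signature of ten genes.
all_genes = [f"GENE_{i}" for i in range(200)]
lists = random_gene_lists(all_genes, size=10)
print(len(lists), len(lists[0]))

# Hypothetical AUCs; in practice each random list would be evaluated with
# the same cross-validation procedure as the literature signature.
random_aucs = [0.5, 0.6, 0.7, 0.9]
print(auc_percentile(0.8, random_aucs))
print(passes_threshold(0.8, random_aucs))
```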
Signatures that had a ROC AUC percentile above a given threshold were used at this step. Percentiles were determined as follows: for each pair of signature and training dataset, the performance of the literature signature was compared against 100 random gene signatures of the same size. Percentiles were used in order to compare the numbers across datasets that did not have the same case/control distributions. The thresholds of 70, 80 and 90 were empirically tested and the 70th percentile was chosen, as the latter two were too stringent (in terms of the number of signatures that passed the threshold) when the signatures were split by group. In order to compare the gene importance feature across signatures for a given training dataset, the gene importance features of each signature were standardized to obtain a mean of 0 and a standard deviation of 1 (z-scores). The z-scores were then aggregated, and the top unique genes were selected as representing the universal signature.
The first 50 genes with the highest standardized importance feature score were selected. As expected, universal signatures performed well on their target datasets (datasets they were trained on).
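The standardization and gene selection can be sketched as follows. The sketch assumes that aggregation means pooling the z-scores of all passing signatures and keeping each gene's highest score, and that the population standard deviation is used; neither detail is specified above. Gene names, importance values, and function names are hypothetical.

```python
from statistics import mean, pstdev


def zscores(importances):
    """Standardize one signature's gene importances to mean 0 and standard
    deviation 1. Assumes the importances are not all identical."""
    values = list(importances.values())
    mu, sd = mean(values), pstdev(values)
    return {gene: (v - mu) / sd for gene, v in importances.items()}


def universal_signature(per_signature_importances, top_n=50):
    """Pool z-scores across signatures and return the top unique genes,
    ranked by each gene's highest standardized importance."""
    best = {}
    for importances in per_signature_importances:
        for gene, z in zscores(importances).items():
            best[gene] = max(z, best.get(gene, float("-inf")))
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:top_n]


# Hypothetical gene importances from two signatures that passed the threshold.
signature_1 = {"GENE_A": 0.5, "GENE_B": 0.3, "GENE_C": 0.2}
signature_2 = {"GENE_B": 0.6, "GENE_D": 0.3, "GENE_E": 0.1}

print(universal_signature([signature_1, signature_2], top_n=3))
```

The returned list carries no weights, consistent with a universal signature that is defined as a list of genes only.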
Because universal signatures include genes specifically selected because they had the highest weight in the random forest models, the approach leads to optimized signatures for a given training study dataset. Fitting an overly expressive model will limit the generalizability of signatures to new datasets. Therefore, moving forward, the universal signatures will include a list of genes and there are no weights attached to the genes. Thus, the next step of dimensionality reduction involved the use of the universal signatures without any weights, followed by unsupervised clustering and a hyperparameter-less decision boundary to explore the generalization ability of gene signature-based prediction on a new test dataset.
Literature signatures: Five categories of signatures from publications were derived, hereafter referred to as “literature signatures”: (i) curated sets of gene lists, referred to as hallmark signatures (N=50, https://www.gsea-msigdb.org/gsea/msigdb/collections.jsp) (1), (ii) gene signatures associated with cell composition in PBMC, referred to as cell type signatures (N=22) (2), (iii) vaccine protection and response signatures, referred to as vaccine signatures (N=13), (iv) progression from latent to active TB infection signatures, referred to as TB signatures (N=20) and (v) viral and bacterial infection signatures, referred to as infection signatures (N=43). Of note, due to gene nomenclature conversion issues, some signatures may be missing some genes identified in the parent paper.
Training datasets: 14 different training datasets were used from six studies: one study on dengue infection (4) (Table 2, study 1); one study on influenza H1N1 infection (5) (Table 2, study 2); one study on trivalent influenza vaccination comprising two cohorts, one with males (Table 2, study 3) and one with females (6) (Table 2, study 4), each comprising 3 datasets obtained at different timepoints (pre-vaccination, day 1 and day 14 post-vaccination); one study on hepatitis B virus (HBV) vaccination (7) (Table 2, study 5), comprising 3 datasets obtained at different timepoints (pre-vaccination, day 3 and day 7 post-vaccination); and one study on tuberculosis (TB) vaccination in rhesus macaques (8) (Table 2, study 6), comprising 3 datasets obtained at different timepoints (pre-vaccination, pre-challenge with TB and 28 days post-challenge with TB). Of note, several studies contained multiple non-independent datasets (or timepoints). This design is expected to help in understanding the biology of shared transcriptome signatures and makes it possible to identify the earliest time points with predictive power.
Test datasets: 3 test datasets from three studies were used: one study on bronchoalveolar lavage in SARS-CoV-2 infection (9) (Table 3—study 7), one study on influenza infection (10) (Table 3—study 8) and one longitudinal study on TB progression in latently infected individuals (11) (Table 3—study 9). Of note, all test datasets were independent from each other and from any training datasets.
Phenotypes used: Multiple phenotypes in the training and test datasets were explored; the phenotypes can be categorized into four groups, namely (i) severity of symptoms during viral infection (for the dengue, influenza and SARS-CoV-2 infection studies), (ii) vaccine response (for both the HBV and influenza vaccination studies), (iii) disease state (for the TB vaccination study in rhesus macaques), and (iv) time to disease (for the longitudinal study of TB progression). Further description and the number of individuals in each phenotype category per study are provided in Tables 2 and 3. Of note, the phenotype extracted from the publicly available datasets is not necessarily the one used in the original study. As an example, categorical/binary phenotypes were used even when the original study used a numerical phenotype, in order to be consistent across datasets and to better mimic future potential practical use cases.
The successful implementation of universal signatures described above leaves open the question of how to choose the universal signature to be applied in a new dataset. Specifically, training and test data sets were selected for diseases that were likely related due to underlying disease pathogenesis. For example, TB vaccination efficacy may relate to prevention of progression of TB, and the severity of viral disease caused by Dengue, SARS-CoV-2 and influenza may be considered to be related. To challenge this biological-understanding-biased decision, the performance of transfer signatures and test data sets from biological processes that were less clearly related were also evaluated. To this end the transfer signatures described above and additional transfer signatures from influenza and hepatitis B vaccination were used to predict the severity of inflammatory and autoimmune diseases (rheumatoid arthritis and asthma) and to predict survival from malignancy as measured in datasets from cancer.
“Related pairs” were defined as training-test pairs from diseases with apparent biological relationships. “Unrelated pairs” were defined as training-test pairs from unrelated diseases. All possible pairs of training (n=14) and test datasets (n=3 “related pairs”, n=34 “unrelated pairs”) were evaluated. Tables 7A (“related pairs”) and 7B (“unrelated pairs”) provide the F1 score obtained when comparing the inferred case cluster versus the inferred control cluster. The highest score is also provided for each test dataset.
As hypothesized, the original training-test pairs from diseases with more apparent biological relationships (dengue, SARS-CoV-2 and influenza; tuberculosis in an animal model and in humans) were appropriate choices (“related pairs”, Tables 7A and 7C showing F1 scores and log2 enrichment scores, respectively). Additionally, good performance was observed for severe respiratory viral infection transfer signatures in rheumatoid arthritis, which reinforces the concept of shared immunophenotypes and suggests that diseases with less apparent clinical relationships nevertheless have underlying similarities in biology that are identified by the machine learning-based approach described herein. In addition, some transfer signatures were occasionally predictors of outcome for certain cancer types (“unrelated pairs”, Tables 7B and 7D showing F1 scores and log2 enrichment scores, respectively). These observations extend the interest of exploring transfer signatures from infectious diseases to unrelated fields such as autoimmunity and cancer.
As more specific examples, universal signatures for disease were generated by analyzing Rhesus Macaque or human datasets that included expressions of markers. These universal signatures were then applied to Rhesus Macaque (RM) or human data pertaining to a second disease indication. This experiment demonstrates the ability to develop universal signatures from data pertaining to a first disease indication that are then predictive for a second disease indication. In one scenario, the first disease indication and second disease indication differ according to the animal species in which the disease manifests (e.g., first disease in a RM and second disease in a human). Thus, the universal signatures are applicable across different disease indications, which in this scenario refers to diseases in different organisms.
Rhesus Macaque and human datasets were obtained from the following NCBI Gene Expression Omnibus databases: Accession number 79362, 102440, 110480, 17924, 21802, 111368, 145926, 48023, and 48018. To generate universal signatures, a feature selection process is performed on a dataset pertaining to a first disease indication. As used in the subsequent examples below, a feature selection process is performed on any of: a RM dataset including data pertaining to TB vaccine protection, a human dataset including data pertaining to progression of TB (e.g., progression of latent TB to active TB), an infectious disease database including human data pertaining to infectious diseases, or a human dataset including data pertaining to presence of TB, or an aggregation of two datasets (e.g., a RM dataset including data pertaining to TB vaccine protection and a human dataset including data pertaining to progression of TB). These datasets include expression data for genes and/or gene products such as gene transcripts (e.g., mRNA) and biomarkers/proteins.
Generally, a supervised feature selection process using random forest was performed on the dataset to identify signatures that are informative for the first disease indication. For example, a supervised feature selection process using random forest was performed on the RM dataset to identify RM signatures that are informative for distinguishing between RMs that exhibit TB vaccine protection and RMs that do not exhibit TB vaccine protection. A random forest (RF) model is run on each “gene signature-training dataset” pair. In the model, normalized gene expression of the subset of genes is used to classify the phenotype of interest. The models are trained using leave-one-out cross validation (LOOCV). The LOOCV strategy results in one RF model trained per sample per “gene signature-training dataset” pair. To obtain the combined gene importance feature, the feature importance scores are averaged across all models from a given “gene signature-training dataset” pair, resulting in one “importance” score per gene per “gene signature-training dataset” pair, where the importance measure reflects the mean decrease in node impurity. The receiver operating characteristic (ROC) area under the curve (AUC) is computed using the predictions for the single left-out sample of each trained model. To allow the gene importance feature to be compared across signatures for a given training dataset, the importance scores of each gene signature are standardized to a mean of 0 and a standard deviation of 1. The standardized scores are then aggregated, and the top unique genes are selected to be included in the universal signature.
Given the universal signature obtained from analysis of the first disease indication, the universal signature is applied to generate a prediction for a second disease indication. For example, a second dataset includes expressions of markers, a subset of which are included in the universal signature learned from data of a first disease indication. Thus, analyzing the expression of markers of the universal signature from the second dataset generates predictions for any of: vaccine protection in RM data, progression of TB in human data, or outlook (e.g., survival time) of human patients with brain cancer (e.g., glioma).
In this example, generating a prediction for the second disease indication involves performing a dimensionality reduction analysis on the quantities of the second dataset according to the signatures learned from the first dataset. Here, a uniform manifold approximation and projection (UMAP) analysis was conducted to map the expressions of the universal signature in the second dataset to a lower dimensional space. The dimension reduction was performed using dens-UMAP (http://cb.csail.mit.edu/cb/densvis/), which preserves the local density of data points in the initial data space (Narayan, A. et al., “Density-Preserving Data Visualization Unveils Dynamic Patterns Of Single-Cell Transcriptomic Variability.” bioRxiv 2020.05.12.077776). Next, an unsupervised clustering analysis, specifically hierarchical density-based spatial clustering of applications with noise (HDBSCAN), was performed on the expressions in the lower dimensional space to cluster and classify the patients. HDBSCAN can cluster data of varying shape and density, where the only parameter required to be provided is the minimal number of samples per cluster. The minimal number of samples was tested empirically for each unsupervised clustering, by identifying the number of samples per cluster that resulted in the lowest number of outliers and samples with low probability (<0.05) of cluster assignment. Thus, patients that fall within a particular cluster are predicted to have a particular disease activity (e.g., active or latent TB progression, better patient outlook or worse patient outlook, etc.).
More specifically, once clusters were identified, the cluster attribution (case or control) was inferred from the expression of the genes in the signature. Specifically, the direction of the signal rather than the amplitude was used for cluster attribution: for each gene present in the universal signature, the median expression in each cluster was compared and the direction of the signal in each cluster was recorded (high, low or, in the presence of more than 2 clusters, intermediate). The same analysis was conducted in the training dataset from which the universal signature was obtained, using the true labels (case/control) instead of clusters to group the samples. Next, clusters in the test dataset were assessed according to the proportion of genes that matched the label of interest in the training dataset (in terms of signal direction), thereby defining each cluster as either a “case cluster” or a “control cluster”. In the rare case where two clusters had the same proportion of matches, the sum of the absolute differences (in median expression) of the genes that matched the direction of the signal in the training dataset was compared. Of note, biological understanding can be used to decide which phenotype label in the training dataset would resemble the phenotype of interest (“case”) in the test dataset. For example, in the tuberculosis use case where the universal signature was obtained with the post-challenge timepoint, it was expected that the rhesus macaques that were not protected by the vaccine at the end of the study were the most likely to resemble the individuals that were going to develop acute TB within a year, as the rhesus macaques were already in a disease state at that time point and the unprotected animals were expected to have a much higher level of immune gene expression in the disease state.
In contrast, when the universal signatures obtained from the pre-vaccine or pre-challenge datasets were used, the “case” phenotype was expected to be the rhesus macaques that were protected by the vaccine at the end of the study, as the animals with a higher basal level of immune gene expression (such as interferon stimulated genes) are expected to have a higher likelihood of vaccine protection.
Gene signature evaluation in training datasets: A random forest model was run on each “literature signature-training dataset” pair (hereafter referred to as an S-D pair). In order to prevent overfitting the model to a specific pair, and given the downstream goal of identifying genes that were common biomarkers across experiments and conditions rather than specific to a single study or pair, hyperparameters were not tuned and were set as follows: number of trees (N=1,000); all other hyperparameters were the defaults of the randomForest function from the R package “randomForest”. In the model, normalized gene expression of the subset of genes present in the signature was used to classify the phenotype of interest. For RNAseq input datasets, the normalization consisted of log10(reads per million mapped reads + 1e-7), and genes with initially fewer than 20 reads in every sample in the dataset were removed. For microarray input datasets, the normalized data from the GEO repository was retrieved, the normalized signals of all probes were averaged per gene, and log10(average normalized signal per gene + 1e-7) was used as input for the model. The code used for running the random forest modeling was adapted from https://github.com/jasonzhao0307/R_lib_jason/blob/master/RF_output.R
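As an illustration of the RNAseq normalization described above, the following Python sketch filters low-count genes and applies the log10 transform. It is not the code used in the study. It assumes that the per-sample total of the count table stands in for the number of mapped reads; the function name and input layout are hypothetical.

```python
import math

def normalize_rnaseq(counts, min_reads=20, pseudocount=1e-7):
    """Return log10(reads per million + pseudocount) for the retained genes.

    counts maps each gene to a list of raw read counts, one per sample.
    A gene is removed when it has fewer than min_reads reads in every sample.
    Library size is taken as the column total of the table (assumption).
    """
    n_samples = len(next(iter(counts.values())))
    library_sizes = [
        sum(row[i] for row in counts.values()) for i in range(n_samples)
    ]
    normalized = {}
    for gene, row in counts.items():
        if all(count < min_reads for count in row):
            continue  # fewer than min_reads reads in every sample
        normalized[gene] = [
            math.log10(count / size * 1e6 + pseudocount)
            for count, size in zip(row, library_sizes)
        ]
    return normalized

# Hypothetical count table with two samples.
raw_counts = {
    "G1": [600000, 0],
    "G2": [399990, 999995],
    "G3": [10, 5],  # below 20 reads in every sample, so it is dropped
}
log_expression = normalize_rnaseq(raw_counts)
```

The small pseudocount keeps zero counts finite: a gene with no reads in a sample is mapped to log10(1e-7), i.e. approximately -7.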
Given the small sample size of most datasets and the limited availability of datasets, the models were trained using leave-one-out cross validation (LOOCV), where for each sample of a dataset, all other samples from the same dataset are used to train the RF model, and the resulting model is used to predict the label or phenotype of the remaining sample. The LOOCV strategy results in one RF model trained per sample per S-D pair. To obtain the combined gene importance feature for a specific S-D pair, the gene importance scores were averaged across all models from a given S-D pair, resulting in one “importance” score per gene per S-D pair, where the importance measure reflects the mean decrease in node impurity. The receiver operating characteristic (ROC) and precision recall (PR) areas under the curve (AUC) are computed using the scores of the single left-out sample of each trained model.
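The LOOCV bookkeeping described above (one model per left-out sample, importance averaged over models, ROC AUC computed from the left-out scores) can be sketched as follows. This is an illustrative Python sketch, not the study's R code. To stay self-contained it uses a simple difference-of-class-means scorer as a stand-in for the random forest, and the absolute mean difference as a stand-in for the mean decrease in node impurity; all function names are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a case outscores a control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def fit_mean_difference(train_x, train_y):
    """Stand-in model (not a random forest): per-gene case-minus-control mean."""
    n_genes = len(train_x[0])
    case_rows = [x for x, y in zip(train_x, train_y) if y == 1]
    control_rows = [x for x, y in zip(train_x, train_y) if y == 0]
    diff = [
        sum(r[g] for r in case_rows) / len(case_rows)
        - sum(r[g] for r in control_rows) / len(control_rows)
        for g in range(n_genes)
    ]
    importance = [abs(d) for d in diff]  # stand-in importance measure
    return diff, importance

def score_sample(model, x):
    """Higher score means the sample looks more like a case."""
    return sum(w * v for w, v in zip(model, x))

def loocv(samples, labels, fit=fit_mean_difference, score=score_sample):
    """Train one model per left-out sample; return (ROC AUC, mean importance)."""
    held_out_scores, fold_importances = [], []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model, importance = fit(train_x, train_y)
        held_out_scores.append(score(model, samples[i]))
        fold_importances.append(importance)
    n_genes = len(fold_importances[0])
    mean_importance = [
        sum(fold[g] for fold in fold_importances) / len(fold_importances)
        for g in range(n_genes)
    ]
    return roc_auc(labels, held_out_scores), mean_importance
```

Passing a different `fit` and `score` pair, for example a wrapper around a random forest library, leaves the cross-validation and averaging logic unchanged.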
Extraction of universal signatures: Only literature signatures that had a ROC AUC percentile above a given threshold were used at this step. Percentiles were determined as follows: for each S-D pair, the performance of the literature signature was compared against that of 100 random gene lists of the same size. Percentiles were used so that results could be compared across datasets that did not have the same case/control distributions. Thresholds of 70, 80 and 90 were empirically tested and the 70th percentile was chosen, as the latter two were too stringent (in terms of the number of literature signatures that passed the threshold) when the signatures were split by group. To allow the gene importance feature to be compared across literature signatures for a given training dataset, the importance scores of each literature signature were standardized to a mean of 0 and a standard deviation of 1 (z-scores). The z-scores were then aggregated, and the top unique genes were selected as representing the universal signature.
Signature sizes of N=10, 20 and 50 genes were empirically tested. The size of 50 genes was chosen for further analyses, with the rationale that (i) 50 genes appeared to provide the best performance in the datasets for which the signature length appeared to have the largest impact and (ii) the larger the signature length, the more likely the signature will generalize to other datasets under different conditions. The gene lists of universal signatures derived from all contributing literature signatures are provided in Table 5.
Gene set overrepresentation analysis was performed on the Biological Process GO ontology. Significance was judged by a Benjamini-Hochberg corrected p-value cutoff of 0.01. The top 10 significant GO sets are laid out in a plane by placing sets of higher overlap closer to each other. Specifically, the ‘enrichplot’ and ‘clusterProfiler’ R packages were used. Gene enrichment for Tuberculosis (e.g., TB, TB pre-vaccine, TB pre-challenge, and TB post-challenge) and Dengue universal signatures are provided in Tables 8-13.
Additionally, the performance of literature signatures is shown in Table 6. The classifying performance of the predicted phenotypes obtained from the random forest models (with leave-one-out cross validation) using the literature signatures was assessed for each training dataset. The columns in Table 6 represent the training datasets and the rows the literature signatures. In order to be able to compare the performance across datasets (which do not have the same case/control distribution), the ROC AUCs were evaluated in terms of percentiles. The percentiles are obtained by comparing the literature signature performance to 100 random gene lists of the same size. The higher the percentile the better the performance of the signature. Missing data—due to gene conversion issues or no expression in the training datasets—are entered as “NA”.
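The percentile comparison against random gene lists can be sketched as follows. This Python sketch is illustrative only. The source does not define the percentile formula, so the sketch assumes it is the percentage of random gene lists with a strictly lower ROC AUC than the literature signature; the function names and the fixed seed are hypothetical.

```python
import random

def random_gene_lists(all_genes, size, n_lists=100, seed=0):
    """Draw n_lists random gene lists, each of the same size as the signature."""
    rng = random.Random(seed)  # fixed seed for reproducibility (assumption)
    return [rng.sample(all_genes, size) for _ in range(n_lists)]

def auc_percentile(signature_auc, random_aucs):
    """Percentage of random gene lists that perform strictly worse (assumption)."""
    worse = sum(auc < signature_auc for auc in random_aucs)
    return 100.0 * worse / len(random_aucs)

def passes_threshold(signature_auc, random_aucs, threshold=70.0):
    """Keep a literature signature only if its percentile exceeds the threshold."""
    return auc_percentile(signature_auc, random_aucs) > threshold
```

In use, each list returned by `random_gene_lists` would be evaluated with the same cross-validated model as the literature signature, and the resulting AUCs would be passed to `auc_percentile`.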
Universal signatures were used in an unsupervised analysis to cluster samples from new test datasets that originated from independent studies (notably a new condition, new organism or new infectious agent). The dimension reduction was performed using Uniform Manifold Approximation and Projection (UMAP), followed by Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), which can cluster data of varying shape and density. In this approach, the only parameter required is the minimal number of samples per cluster. This minimal number was selected empirically by identifying the number of samples per cluster that resulted in the lowest value of the number of outliers multiplied by a penalty score equal to the square of the number of clusters. This approach limits the creation of excessive numbers of clusters, which could make interpretation difficult. The minimum number of samples per cluster was constrained to be at least 7% of the total population. HDBSCAN was run using the hdbscan command from the R package “dbscan” (https://github.com/mhahsler/dbscan). The samples considered as outliers by HDBSCAN were assigned the label of the closest cluster using the 3 nearest neighbors with the knn command from the R package “dbscan” (https://github.com/mhahsler/dbscan). The code used for running the dimensionality reduction and unsupervised clustering was adapted from https://github.com/NikolayOskolkov/ClusteringHighDimensions/blob/master/easy_scrnaseq_tsne_cluster.R
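The selection of the minimal cluster size can be sketched as below. This Python sketch does not run HDBSCAN itself; it assumes the clustering has already been run for several candidate values and that the number of outliers and clusters was recorded for each. The function name, the input layout and the tie-breaking rule (smallest candidate wins) are assumptions.

```python
import math

def choose_min_cluster_size(trials, n_samples, min_fraction=0.07):
    """Pick the candidate that minimizes n_outliers * n_clusters ** 2.

    trials maps each candidate minimal cluster size to a tuple
    (n_outliers, n_clusters) recorded from a clustering run.
    Candidates below min_fraction of the population are excluded.
    """
    floor = math.ceil(min_fraction * n_samples)
    eligible = {size: result for size, result in trials.items() if size >= floor}
    if not eligible:
        raise ValueError("no candidate reaches the minimum cluster fraction")

    def penalized_outliers(size):
        n_outliers, n_clusters = eligible[size]
        return n_outliers * n_clusters ** 2

    # Iterate in ascending order so ties resolve to the smallest candidate.
    return min(sorted(eligible), key=penalized_outliers)

# Hypothetical trial results for a dataset of 139 samples.
trial_results = {5: (0, 6), 10: (12, 2), 15: (4, 3), 20: (10, 2)}
selected = choose_min_cluster_size(trial_results, n_samples=139)
```

With these hypothetical numbers the candidate of 5 is excluded by the 7% floor, and the remaining penalized scores are 48, 36 and 40, so the squared cluster count trades off against the raw outlier count as described above.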
Once the clusters were identified, the cluster attribution (case or control) was inferred from the expression of the genes in the signature. Specifically, the direction of the signal rather than the absolute value was used. For each gene present in the universal signature, the median expression in each cluster was compared and the direction of the signal in each cluster (high, low or, in the presence of more than 2 clusters, intermediate) was recorded. The same analysis was conducted in the training dataset from which the universal signature was obtained, using the true labels (case/control) instead of clusters to group the samples. Next, the cluster in the test dataset that had the highest proportion of genes that matched the label of interest in the training dataset (in terms of signal direction) was identified and defined as the “case cluster”, while the other cluster(s) were defined as control cluster(s). In the rare case where two clusters had the same proportion of matches, the sum of the absolute differences (in median expression) of the genes that matched the direction of the signal in the training dataset was compared. Of note, biological understanding was used to decide which phenotype label in the training dataset would most resemble the phenotype of interest (“case”) in the test dataset; otherwise, the cluster labels would be inverted. For example, in the tuberculosis use case, when the universal signature obtained with the post-challenge timepoint was used, it was expected that the rhesus macaques that were not protected by the vaccine at the end of the study were the most likely to resemble the individuals that were going to develop acute TB within a year, as the rhesus macaques were already in a disease state at that time point and the unprotected animals were expected to have a much higher level of immune gene expression in the disease state.
Conversely, when the universal signatures obtained from the pre-vaccine or pre-challenge datasets were used, it was reasoned that the “case” phenotype would be the rhesus macaques that were protected by the vaccine at the end of the study, as the animals with a higher basal level of immune gene expression (such as interferon stimulated genes) are expected to have a higher likelihood of vaccine protection.
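The direction-based cluster attribution can be sketched as follows. This Python sketch is a minimal illustration under stated assumptions, not the study's implementation: a gene "matches" when the cluster has the highest (or lowest) median among all clusters and the gene is higher (or lower) in training cases than in controls, and the tie-break amplitude is taken as the absolute difference between the cluster median and the mean of the other clusters' medians. The function names and input layout are hypothetical.

```python
from statistics import mean, median

def training_direction(case_expr, control_expr):
    """+1 if a gene's median is higher in cases than in controls, else -1.

    case_expr and control_expr map each signature gene to a list of values.
    """
    return {
        gene: 1 if median(case_expr[gene]) > median(control_expr[gene]) else -1
        for gene in case_expr
    }

def infer_case_cluster(cluster_expr, direction):
    """Return (case_cluster, scores) using signal direction, not amplitude.

    cluster_expr maps cluster label -> {gene: [values]} in the test dataset.
    scores maps cluster label -> (proportion of matching genes, amplitude sum);
    the amplitude sum is only used to break ties (definition is an assumption).
    """
    if len(cluster_expr) < 2:
        raise ValueError("at least two clusters are required")
    medians = {
        label: {gene: median(values[gene]) for gene in direction}
        for label, values in cluster_expr.items()
    }
    scores = {}
    for label in medians:
        matches, amplitude = 0, 0.0
        for gene, sign in direction.items():
            here = medians[label][gene]
            others = [medians[o][gene] for o in medians if o != label]
            is_high = all(here > other for other in others)
            is_low = all(here < other for other in others)
            if (sign == 1 and is_high) or (sign == -1 and is_low):
                matches += 1
                amplitude += abs(here - mean(others))
        scores[label] = (matches / len(direction), amplitude)
    # Tuples compare by proportion first, then by amplitude for ties.
    case_cluster = max(scores, key=scores.get)
    return case_cluster, scores
```

A gene whose cluster median is neither the highest nor the lowest is treated as intermediate and never counts as a match, which mirrors the high, low or intermediate recording described above.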
Universal signatures were evaluated to address the challenge of enriching a clinical trial with individuals that are likely to reach a given endpoint. The scenario is the use of a pharmacological or vaccine intervention to prevent progression from latent tuberculosis to active disease. Progression to active tuberculosis is a rare event (estimated as 0.084 cases per 100 person-years); therefore, it would be important to be able to recruit individuals that are the most likely to develop active infection within one year. Indeed, when only a limited number of individuals reach a study endpoint, the study may lack power to detect differences between the placebo and the vaccine or treatment group.
Here, universal signatures obtained with the datasets from the Hansen et al. study were evaluated (Hansen, S. G., et al. Prevention of tuberculosis in rhesus macaques by a cytomegalovirus-based vaccine. Nat Med 24, 130-143 (2018)). This study assessed the efficacy of a TB vaccine on Rhesus macaques, with longitudinal samples from 27 Rhesus macaques collected pre-vaccine, after vaccination but before TB challenge and four weeks post challenge. The phenotype used for training the random forest models was protection from TB (vaccine efficacy), defined as a computed tomography score of <10 (protected, N=13) at any time point post challenge versus not (not protected, N=14). Here, the target dataset was the data from Zak, D. E., et al. A blood RNA signature for tuberculosis disease risk: a prospective cohort study. Lancet 387, 2312-2322 (2016)., a longitudinal study assessing progression from latent to active TB. Cases were defined as individuals that developed TB within a year (N=30) and controls as individuals that did not develop TB within a year after entry in the study (N=109). The results of the unsupervised clustering are shown in
Here, a universal signature was extracted (e.g., using the feature extraction process described above) from RM datasets that include data describing tuberculosis vaccine protection in RMs. Three different timepoints of data were analyzed to extract universal signatures: 1) pre-vaccine, 2) pre-challenge, and 3) post-challenge.
The universal signature was applied to human data to predict TB progression (latent TB to Active TB). This application of the universal signature to human data represents a cross-disease and cross-species analysis where the universal signature learned from one disease indication (e.g., TB vaccine protection in RMs) is useful for a prediction of a second disease indication (e.g., TB progression in humans).
The human data was analyzed by performing a dimensional reduction analysis on the universal signature, specifically a uniform manifold approximation and projection (UMAP) analysis. As shown in
With the universal signature defined on the pre-vaccine rhesus macaque samples, 32.8% of the predicted cases were correct, i.e., developed active TB within a year, while the samples outside of this cluster contained only 11.1% of true cases. Here, the unsupervised clustering led to a 3.0-fold enrichment and a 73.3% recall. In a similar setting, but with the universal signature derived from pre-challenge samples, a 2.0-fold enrichment (34.7% versus 14.4%) and a 56.7% recall were obtained, while with the signature derived from post-challenge samples, a 5.5-fold enrichment (60.0% versus 11.0%) and a 60.0% recall were obtained.
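The relationship between the reported percentages, fold enrichment and recall can be illustrated with the short Python sketch below. The source reports only percentages; the counts used in the example (22 true cases among 67 samples in the inferred case cluster, out of 30 cases and 139 samples in total) are back-calculated from the pre-vaccine figures and the cohort size, and are therefore an assumption. The function name is hypothetical.

```python
def enrichment_summary(true_positives, n_in_case_cluster, n_true_cases, n_total):
    """Summarize how strongly the inferred case cluster is enriched in true cases."""
    rate_inside = true_positives / n_in_case_cluster
    n_outside = n_total - n_in_case_cluster
    rate_outside = (n_true_cases - true_positives) / n_outside
    # Enrichment is undefined (infinite) when no true case falls outside.
    fold = rate_inside / rate_outside if rate_outside > 0 else float("inf")
    return {
        "rate_inside": rate_inside,      # fraction of true cases in the case cluster
        "rate_outside": rate_outside,    # fraction of true cases elsewhere
        "fold_enrichment": fold,
        "recall": true_positives / n_true_cases,
    }

# Back-calculated counts for the pre-vaccine signature (assumption).
pre_vaccine = enrichment_summary(
    true_positives=22, n_in_case_cluster=67, n_true_cases=30, n_total=139
)
```

Under these assumed counts the summary reproduces the reported pre-vaccine values after rounding: 32.8% inside versus 11.1% outside, a 3.0-fold enrichment and a 73.3% recall.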
Altogether, this example demonstrates that universal signatures learned from one disease indication (e.g., TB vaccine protection in RM) can be transfer learned and applied for predicting progressors or non-progressors of TB in a human dataset. Additionally, the use of the universal signatures would allow the prospective recruitment of individuals into clinical trials with a greater likelihood of reaching adequate power.
Here, the universal signature was extracted (e.g., using the feature extraction process described in Example 1) from human datasets that include data describing the presence of tuberculosis in human individuals. The universal signature was applied to human data, specifically a human glioma dataset obtained from The Cancer Genome Atlas (TCGA), to classify the outlook of patients with glioma. Patient outlook refers to the patient survival time.
Again, these results establish that universal signatures learned from one disease indication (e.g., TB infection) can be transfer learned and applied for a second disease (e.g., patient outlook for glioma patients).
Universal signatures were assessed for their use in the setting of viral infection to predict or classify the severity of the symptoms of individuals that are hospitalized. Here, universal signatures were extracted from the dataset from the Devignot et al. study, consisting of children with acute dengue infection, with blood samples collected within 3 to 7 days after onset of fever (Devignot, S., et al. Genome-wide expression profiling deciphers host responses altered during dengue shock syndrome and reveals the role of innate immunity in severe dengue. PLoS One 5, e11671 (2010)). For the purpose of this analysis, children with severe manifestations of disease (shock syndrome and hemorrhagic fever; N=32) were considered as cases, while children that had uncomplicated dengue fever were considered controls (N=16). Data from Liao, M., et al. Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19. Nat Med 26, 842-844 (2020) and Dunning, J., et al. Progression of whole-blood transcriptional signatures from interferon-induced to neutrophil-associated patterns in severe influenza. Nat Immunol 19, 625-635 (2018) were used as two different target datasets.
The study of Liao et al. characterized bronchoalveolar lavage fluid immune cells from patients infected with SARS-CoV-2. For the purpose of this analysis, cases were the individuals that were described as having severe disease (N=6), while individuals with moderate disease (N=3) or not infected (N=3) were considered as controls (total N=6). The RNA samples were obtained 4-10 days after the phenotypes were established. All true cases of severe disease in the SARS-CoV-2 study were correctly classified by the unsupervised clustering.
The study of Dunning et al characterized blood samples from individuals hospitalized with influenza. For the purpose of this analysis, cases were considered as the individuals that required mechanical ventilation (N=20), while individuals that did not require respiratory support were considered as controls (N=63). Given that the phenotypes were established at the same time or before the RNA samples were obtained in both studies, the unsupervised clustering results therefore reflect the performance of universal signatures as classifiers rather than predictors. The inferred case cluster included 57.1% true cases (individuals that required mechanical ventilation), while none of the samples in the inferred control cluster were true cases. Both the SARS-CoV-2 and the influenza study achieved a 100% recall, thus supporting the transportability of signatures across different viral infections as represented by the capacity to classify and predict disease severity. Analysis of the content of the Dengue universal signature confirmed the enrichment of genes of the immune response (Table 8 and
Using the universal signature, classification of infection severity for SARS-CoV-2 subjects was successful in differentiating between a case cluster (e.g., severe infection) and a control cluster (e.g., not severe infection). Additionally, using the universal signature, classification of infection severity for H1N1 subjects was successful in differentiating between a case cluster (e.g., severe infection) and a control cluster (e.g., not severe infection).
Again, these results establish that universal signatures learned from one disease indication (e.g., Dengue virus infection) can be transfer learned and applied for multiple second diseases (e.g., SARS CoV-2 infection and H1N1 infection).
Of note, the results described above for the various use cases used a 50-gene-long transfer signature; however, similar results were obtained when selecting only the top 20 genes, while the performance dropped with some of the 10-gene transfer signatures (
Tables
This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/062,665 filed Aug. 7, 2020, U.S. Provisional Patent Application No. 63/129,931 filed Dec. 23, 2020, and U.S. Provisional Patent Application No. 63/192,461 filed May 24, 2021, the entire disclosure of each of which is hereby incorporated by reference in its entirety for all purposes.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/044903 | 8/6/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63062665 | Aug 2020 | US | |
63129931 | Dec 2020 | US | |
63192461 | May 2021 | US |