This application contains fifteen (15) tables and one (1) computer program listing as an appendix. They have been submitted electronically via EFS-Web as ASCII text files. They have the following file attributes: (1) Supplementary Table 1, Annotation of Gene Expression Signatures, entitled Supp_Table1.txt, which has a size of 2,042,486 bytes and was created on Mar. 4, 2019; (2) Supplementary Table 2, Annotation of copy number segments, entitled Supp_Table2.txt, which has a size of 728,822 bytes and was created on Oct. 9, 2019; (3) Supplementary Table 3, Summary of Elastic Net models for gene, entitled Supp_Table3.txt, which has a size of 1,236,182 bytes and was created on Oct. 9, 2019; (4) Supplementary Table 4, Summary of Elastic Net models for molecular subtypes and histology in breast cancers, entitled Supp_Table4.txt, which has a size of 37,355 bytes and was created on Oct. 9, 2019; (5) Supplementary Table 5, Summary of Elastic Net models for protein expressions and clinical receptor statuses in breast cancers, entitled Supp_Table5.txt, which has a size of 405,443 bytes and was created on Oct. 9, 2019; (6) Supplementary Table 6, Summary of Elastic Net models for somatic mutations in breast cancers, entitled Supp_Table6.txt, which has a size of 120,280 bytes and was created on Oct. 9, 2019; (7) Supplementary Table 7, Summary of subtype-specific signature predictions in breast cancers, entitled Supp_Table7.txt, which has a size of 1,001,837 bytes and was created on Oct. 9, 2019; (8) Supplementary Table 8, Summary of Elastic Net models for gene expression signatures in lung cancers, entitled Supp_Table8.txt, which has a size of 1,199,707 bytes and was created on Oct. 9, 2019; (9) Supplementary Table 9, Summary of Elastic Net models for gene expression signatures using FOUNDATIONONE® genomic test genes, entitled Supp_Table9.txt, which has a size of 56,199 bytes and was created on Oct. 9, 2019; (10) Supplementary Table 10, Gene expression signature scores for 1038 TCGA breast tumors, entitled Supp_Table10.txt, which has a size of 6,875,948 bytes and was created on Oct. 9, 2019; (11) Supplementary Table 11, Gene expression signature scores for 512 TCGA LUAD tumors and 498 TCGA LUSC tumors, entitled Supp_Table11.txt, which has a size of 6,627,663 bytes and was created on Oct. 9, 2019; (12) Supplementary Table 12, Gene expression signature scores for 1689 METABRIC breast tumors, entitled Supp_Table12.txt, which has a size of 6,235,101 bytes and was created on Oct. 9, 2019; (13) Supplementary Table 13, Binary mutation matrix for 972 breast tumors, entitled Supp_Table13.txt, which has a size of 162,934 bytes and was created on Oct. 9, 2019; (14) Supplementary Table 14, Summary of Pan Cancer signature predictions, entitled Supp_Table14.txt, which has a size of 577,688 bytes and was created on Oct. 9, 2019; (15) Supplementary Table 15, List of amplicon signatures, entitled Supp_Table15.txt, which has a size of 2,739 bytes and was created on Oct. 9, 2019; and (16) Computer program listing entitled helper.R, which has a size of 15,697 bytes and was created on Sep. 25, 2019. All sixteen (16) files are hereby incorporated by reference in their entireties.
The present disclosure provides a method for generating a calculated cancer signature for a cancer-related phenotype based on copy number alterations (CNAs) in a patient sample. The calculated cancer signature may correspond to a somatic mutation, an mRNA expression signature, or a protein expression signature. The disclosure also provides a method for treating a patient using the calculated cancer phenotype. In addition, the disclosure provides a method for generating a calculated signature based on CNAs to replicate a cancer phenotype.
2.1. Introduction
The “background” description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Tumorigenesis is often driven by multiple types of aberrations in DNA, leading to diseases of enormous complexity and heterogeneity. The ability to dissect this heterogeneity is crucial to understanding cancer mechanisms and to identifying patient subgroups for personalized treatments. One limitation in capturing this heterogeneity lies in the characterization of disease phenotypes. With the effort of many consortiums including The Cancer Genome Atlas (TCGA), large-scale multi-platform genomic data are now available, providing an opportunity to study cancer phenotypes on a molecular level and by using multiple technology types1-4. In particular, many gene expression signatures have been developed to define specific cancer phenotypes varying from proliferation rates to features of the tumor microenvironment5-7. These mRNA expression features, along with protein expression, somatic mutations, and clinical features, provide a comprehensive molecular portrait of tumors. Integrating multi-platform genomic data to elucidate the relationship between genotype and phenotype is critical to understanding the genetic causes underlying tumor behaviour8. Building predictive models for key tumor-driving phenotypes would be valuable for stratifying patients for personalized treatments.
The present disclosure provides a method of generating a calculated cancer signature for a sample from a patient which comprises: (a) obtaining, or having obtained, a sample from the patient; (b) measuring, or having measured, a plurality of copy number alterations (CNAs) over a plurality of locations on a plurality of chromosomes; and (c) analyzing the measured CNAs using a mathematical model based on mRNA expression data and molecular subtypes, wherein the mathematical model has been validated by at least two different statistical methods so as to generate the calculated cancer signature for the sample. In one embodiment, greater than 50 CNAs are measured, alternatively greater than 100 CNAs are measured, alternatively between about 250 and about 400 CNAs are measured. In another embodiment, greater than 400 CNAs are measured. The plurality of copy number alterations (CNAs) are obtained from whole genome sequencing (WGS), whole exome sequencing (WES), or a combination thereof.
For the methods disclosed herein, the calculated cancer signature may correspond to a somatic mutation signature. The mathematical model to prepare the somatic mutation signature may be based on 10 or more beta-coefficient values in Supplemental Table 6. Alternatively, the mathematical model may be based on 20 or more beta-coefficient values, 40 or more beta-coefficient values, 60 or more beta-coefficient values, or 100 or more beta-coefficient values. In another embodiment, the mathematical model may be based on the top 5%, top 10%, top 25%, or top 50% of the beta-coefficient values.
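By way of non-limiting illustration, the selection of a top percentage of beta-coefficient values may be sketched as follows. The sketch assumes that "top" refers to the largest absolute magnitude among non-zero coefficients, which is an assumption made for illustration only; the segment names, the coefficient values, and the function name `top_fraction` are hypothetical and are not taken from the Supplementary Tables or from helper.R.

```python
import math


def top_fraction(betas, fraction):
    """Keep the top `fraction` of non-zero coefficients, ranked by |beta|.

    `betas` maps a (hypothetical) CNA segment name to its coefficient.
    Ranking by absolute value is an illustrative assumption.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    nonzero = {seg: b for seg, b in betas.items() if b != 0.0}
    # Always keep at least one coefficient when any non-zero value exists.
    keep = max(1, math.ceil(len(nonzero) * fraction))
    ranked = sorted(nonzero.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:keep])


# Hypothetical coefficients for five segments (one has been shrunk to zero).
example_betas = {
    "seg_a": 0.8,
    "seg_b": -1.2,
    "seg_c": 0.05,
    "seg_d": 0.0,
    "seg_e": -0.3,
}
top_half = top_fraction(example_betas, 0.50)
```

In this sketch, coefficients equal to zero are excluded before ranking, so the retained fraction is computed over the features the model actually uses.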
In another embodiment of the methods disclosed herein, the calculated cancer signature may correspond to an mRNA expression signature, which may be a signature of a breast cancer subtype. The mathematical model to prepare the breast cancer subtype signature may be based on 10 or more beta-coefficient values in Supplemental Table 4. Alternatively, the mathematical model may be based on 20 or more beta-coefficient values, 40 or more beta-coefficient values, 60 or more beta-coefficient values, or 100 or more beta-coefficient values. In another embodiment, the mathematical model may be based on the top 5%, top 10%, top 25%, or top 50% of the beta-coefficient values.
In yet another embodiment of the methods disclosed herein, the calculated cancer signature may correspond to a protein expression signature. The mathematical model to prepare the protein expression signature may be based on 10 or more beta-coefficient values in Supplemental Table 5. Alternatively, the mathematical model may be based on 20 or more beta-coefficient values, 40 or more beta-coefficient values, 60 or more beta-coefficient values, or 100 or more beta-coefficient values. In another embodiment, the mathematical model may be based on the top 5%, top 10%, top 25%, or top 50% of the beta-coefficient values.
For the methods disclosed herein, the protein expression signature may be an immunohistochemistry (IHC) signature. The IHC signature may be an estrogen receptor (ER), an epidermal growth factor receptor (EGFR), a human epidermal growth factor receptor 2 (HER2), a progesterone receptor (PR), or a retinoblastoma (RB) signature.
The calculated cancer signature may correspond to a result from a commercial vendor for cancer diagnostics. For example, the calculated cancer signature may correspond to a FoundationOne® CDX result, a MAMMAPRINT® 70-GENE recurrence score, an OncotypeDX™ recurrence score, or a Prosigna® risk of recurrence score. The calculated cancer signature may be a FoundationOne® result and the mathematical model to prepare the FoundationOne® result may be based on 10 or more beta-coefficient values in Supplemental Table 9. Alternatively, the mathematical model may be based on 20 or more beta-coefficient values, 40 or more beta-coefficient values, 60 or more beta-coefficient values, or 100 or more beta-coefficient values. In another embodiment, the mathematical model may be based on the top 5%, top 10%, top 25%, or top 50% of the beta-coefficient values.
The calculated cancer signature may be associated with mutations, substitutions, or insertions or deletions (indels) in any of the following genes: (C17orf39), (MLL), (MLL2), ABL1, ACVR1B, AKT1, AKT2, AKT3, ALK, ALOX12B, AMER1, APC, AR, ARAF, ARFRP1, ARID1A, ASXL1, ATM, ATR, ATRX, AURKA, AURKB, AXIN1, AXL, BAP1, BARD1, BCL2, BCL2L1, BCL2L2, BCL6, BCOR, BCORL1, BRAF, BRCA1, BRCA2, BRD4, BRIP1, BTG1, BTG2, BTK, C11orf30, CALR, CARD11, CASP8, CBFB, CBL, CCND1, CCND2, CCND3, CCNE1, CD22, CD274, CD70, CD79A, CD79B, CDC73, CDH1, CDK12, CDK4, CDK6, CDK8, CDKN1A, CDKN1B, CDKN2A, CDKN2B, CDKN2C, CEBPA, CHEK1, CHEK2, CIC, CREBBP, CRKL, CSF1R, CSF3R, CTCF, CTNNA1, CTNNB1, CUL3, CUL4A, CXCR4, CYP17A1, DAXX, DDR1, DDR2, DIS3, DNMT3A, DOT1L, EED, EGFR, EP300, EPHA3, EPHB1, EPHB4, ERBB2, ERBB3, ERBB4, ERCC4, ERG, ERRFI1, ESR1, EZH2, FAM46C, FANCA, FANCC, FANCG, FANCL, FAS, FBXW7, FGF10, FGF12, FGF14, FGF19, FGF23, FGF3, FGF4, FGF6, FGFR1, FGFR2, FGFR3, FGFR4, FH, FLCN, FLT1, FLT3, FOXL2, FUBP1, GABRA6, GATA3, GATA4, GATA6, GID4, GNA11, GNA13, GNAQ, GNAS, GRM3, GSK3B, H3F3A, HDAC1, HGF, HNF1A, HRAS, HSD3B1, ID3, IDH1, IDH2, IGF1R, IKBKE, IKZF1, INPP4B, IRF2, IRF4, IRS2, JAK1, JAK2, JAK3, JUN, KDM5A, KDM5C, KDM6A, KDR, KEAP1, KEL, KIT, KLHL6, KMT2A, KMT2D, KRAS, LTK, LYN, MAF, MAP2K1, MAP2K2, MAP2K4, MAP3K1, MAP3K13, MAPK1, MCL1, MDM2, MDM4, MED12, MEF2B, MEN1, MERTK, MET, MITF, MKNK1, MLH1, MPL, MRE11A, MSH2, MSH3, MSH6, MST1R, MTAP, MTOR, MUTYH, MYC, MYCL, MYCN, MYD88, NBN, NF1, NF2, NFE2L2, NFKBIA, NKX2-1, NOTCH1, NOTCH2, NOTCH3, NPM1, NRAS, NT5C2, NTRK1, NTRK2, NTRK3, P2RY8, PALB2, PARK2, PARP1, PARP2, PARP3, PAX5, PBRM1, PDCD1, PDCD1LG2, PDGFRA, PDGFRB, PDK1, PIK3C2B, PIK3C2G, PIK3CA, PIK3CB, PIK3R1, PIM1, PMS2, POLD1, POLE, PPARG, PPP2R1A, PPP2R2A, PRDM1, PRKAR1A, PRKCI, PTCH1, PTEN, PTPN11, PTPRO, QKI, RAC1, RAD21, RAD51, RAD51B, RAD51C, RAD51D, RAD52, RAD54L, RAF1, RARA, RB1, RBM10, REL, RET, RICTOR, RNF43, ROS1, RPTOR, SDHA, SDHB, SDHC, SDHD, SETD2, SF3B1, SGK1, SMAD2, SMAD4, SMARCA4, SMARCB1, SMO, SNCAIP, SOCS1, SOX2, SOX9, SPEN, SPOP, SRC, STAG2, STAT3, STK11, SUFU, SYK, TBX3, TEK, TET2, TGFBR2, TIPARP, TNFAIP3, TNFRSF14, TP53, TSC1, TSC2, TYRO3, U2AF1, VEGFA, VHL, WHSC1, WHSC1L1, WT1, XPO1, XRCC2, ZNF217, or ZNF703.
The calculated cancer signature may be associated with a rearrangement of ALK, introns 18, 19; BCL2, 3′UTR; BCR, introns 8, 13, 14; BRAF, introns 7-10; BRCA1, introns 2, 7, 8, 12, 16, 19, 20; BRCA2, intron 2; CD74, introns 6-8; EGFR, introns 7, 15, 24-27; ETV4, introns 5, 6; ETV5, introns 6, 7; ETV6, introns 5, 6; EWSR1, introns 7-13; EZR, introns 9-11; FGFR1, introns 1, 5, 17; FGFR2, introns 1, 17; FGFR3, intron 17; KIT, intron 16; KMT2A (MLL), introns 6-11; MSH2, intron 5; MYB, intron 14; MYC, intron 1; NOTCH2, intron 26; NTRK1, introns 8-10; NTRK2, intron 12; NUTM1, intron 1; PDGFRA, introns 7, 9, 11; RAF1, introns 4-8; RARA, intron 2; RET, introns 7-11; ROS1, introns 31-35; RSPO2, intron 1; SDC4, intron 2; SLC34A2, intron 4; TERC, ncRNA; TERT, promoter; or TMPRSS2, introns 1-3.
The calculated cancer signature may be a bladder urothelial carcinoma (BLCA), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), colon adenocarcinoma (COAD), esophageal carcinoma (ESCA), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), acute myeloid leukemia (LAML), brain lower grade glioma (LGG), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), pancreatic adenocarcinoma (PAAD), pheochromocytoma and paraganglioma (PCPG), prostate adenocarcinoma (PRAD), rectum adenocarcinoma (READ), sarcoma (SARC), skin cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), testicular germ cell tumors (TGCT), thyroid carcinoma (THCA), thymoma (THYM), or uterine corpus endometrial carcinoma (UCEC) signature. The mathematical model to prepare the calculated signature is based on 10 or more beta-coefficient values in Supplemental Table 14. Alternatively, the mathematical model may be based on 20 or more beta-coefficient values, 40 or more beta-coefficient values, 60 or more beta-coefficient values, or 100 or more beta-coefficient values. In another embodiment, the mathematical model may be based on the top 5%, top 10%, top 25%, or top 50% of the beta-coefficient values.
This disclosure also provides a method for treating a cancer patient with chemotherapy comprising the steps of: determining whether the patient has a specific cancer subtype by: (a) obtaining or having obtained a biological sample from the patient; (b) performing or having performed a gene level copy number alteration (CNA) assay on the biological sample wherein copy numbers are measured over a plurality of locations on a plurality of chromosomes; (c) comparing the results of the CNA assay to a set of standards to determine if the patient has a specific cancer subtype; and (d) if the patient has a specific cancer subtype, then administering a suitable chemotherapy regimen to the cancer patient based on the determined cancer subtype. The chemotherapy regimen may be an ongoing therapeutic intervention. The ongoing therapeutic intervention may comprise discontinuing a specific treatment.
In another embodiment, the disclosure provides a method for generating a calculated cancer signature for a cancer phenotype, the method comprising: (a) receiving a plurality of gene expression signatures and subtype information for the cancer phenotype; (b) receiving a plurality of copy number alteration (CNA) data sets for the cancer phenotype; (c) analyzing the plurality of CNA data sets with an artificial intelligence algorithm to obtain a preliminary set of CNA segment level signatures for the cancer phenotype; (d) using a gene expression training set to revise the preliminary set of CNA segment level signatures and obtain a final set of CNA segment level signatures; and (e) using the final set of CNA segment level signatures to prepare the calculated cancer signature for the cancer phenotype. The cancer phenotype may be associated with a somatic mutation, a level of mRNA expression, a level of protein expression, or an immunohistochemistry (IHC) signature.
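By way of non-limiting illustration, and because the Supplementary Tables summarize Elastic Net models, the fitting of segment level coefficients to a training set may be sketched as an elastic net regression solved by cyclic coordinate descent. This is a minimal sketch under stated assumptions and is not the listing in helper.R: it assumes centered inputs with no intercept, a squared-error loss, and hypothetical values for the penalty strength `lam` and the mixing parameter `alpha`; the function names and the toy data are likewise hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Minimize (1/2n)*||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2).

    X is a list of rows (samples) of CNA segment values; y is the signature
    score to be predicted. Inputs are assumed centered (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X*beta while beta is all zeros
    columns = [[row[j] for row in X] for j in range(p)]
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # a constant-zero segment carries no information
            # Correlation of segment j with the partial residual.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_beta
    return beta


# Toy training set: the score depends on the first segment only.
toy_X = [[-2, 1], [-1, -1], [0, 1], [1, -1], [2, 1], [0, -1]]
toy_y = [2.0 * row[0] for row in toy_X]
toy_beta = fit_elastic_net(toy_X, toy_y)
```

In this toy example the coefficient of the uninformative second segment is shrunk to exactly zero, while the coefficient of the first segment is shrunk slightly below its unpenalized value of 2. This sparsity is the property that makes a "final set" of segment level signatures smaller than the preliminary set.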
In addition, the disclosure provides a method for generating a calculated cancer signature for a patient, the method comprising: (a) receiving copy number alteration (CNA) data for the patient; (b) receiving one or more CNA(s) signature(s) associated with a cancer phenotype, wherein the CNA signature is based on cancer expression analysis, cancer subtype information, and CNA gain/loss information; (c) processing the CNA data for the patient with an algorithm utilizing the one or more CNA(s) signature(s) associated with the cancer phenotype so as to characterize the properties of the CNA data for the patient relative to the one or more CNA(s) signature(s); and (d) preparing a calculated cancer signature for the patient. The cancer phenotype may be associated with a somatic mutation, a level of mRNA expression, a level of protein expression, or an immunohistochemistry (IHC) signature.
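By way of non-limiting illustration, steps (c) and (d) may be sketched as applying a set of signature coefficients to a patient's CNA values. The sketch assumes a linear predictor, treats segments absent from the patient data as contributing zero, and optionally maps the score to a probability with a logistic function for a binary phenotype; each of these choices, and every name and value shown, is an assumption made for illustration.

```python
import math


def signature_score(patient_cna, betas, intercept=0.0):
    """Linear predictor: intercept + sum(beta * patient segment value).

    Segments missing from `patient_cna` contribute zero (an assumption).
    """
    return intercept + sum(b * patient_cna.get(seg, 0.0) for seg, b in betas.items())


def to_probability(score):
    """Logistic link for a binary phenotype (illustrative assumption)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical patient CNA values and hypothetical signature coefficients.
patient = {"seg_17q12": 1.5, "seg_8p": -0.5}
signature = {"seg_17q12": 2.0, "seg_8p": -1.0, "seg_1q": 0.7}

score = signature_score(patient, signature, intercept=-1.0)
probability = to_probability(score)
```

Here the score is -1.0 + (2.0)(1.5) + (-1.0)(-0.5) = 2.5, and the segment absent from the patient data does not change it.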
For the methods disclosed herein, the cancer phenotype may be associated with an adrenal gland, a bladder, a bone, a breast, a cervix, a colon, a liver, a lung, a lymph, an ovarian, a pancreas, a penis, a prostate, a rectal, a salivary gland, a skin, a spleen, a testicular, a thymus gland, a thyroid, a trachea, or a uterine cancer. In a preferred embodiment, the cancer phenotype is associated with a breast cancer.
In another embodiment, the disclosure provides a method for treating a patient with cancer, comprising: (a) preparing a calculated cancer signature for the patient by: (i) receiving copy number alteration (CNA) data for the patient; (ii) receiving one or more CNA(s) signature(s) associated with a cancer phenotype, wherein the CNA signature is based on cancer expression analysis, cancer subtype information, and CNA gain/loss information; (iii) processing the CNA data for the patient with an algorithm utilizing the one or more CNA(s) signature(s) associated with the cancer phenotype so as to characterize the properties of the CNA data for the patient relative to the one or more CNA(s) signature(s); and (iv) preparing the calculated cancer signature for the patient based on the characterized properties; and (b) treating the patient according to a treatment plan based on the calculated cancer signature. The treatment may be an ongoing therapeutic intervention. The ongoing therapeutic intervention may comprise discontinuing a specific treatment.
The disclosure also provides a device comprising a processor configured to process the patient CNA data and the one or more CNA(s) signature(s) associated with the cancer phenotype with the algorithm to generate the calculated cancer signature described above. A system comprising the device of claim 36 is also provided. In addition, the disclosure provides a device of claim 36, comprising software that comprises an algorithm to compare the patient CNA data with the one or more CNA(s) signature(s) associated with the cancer phenotype.
Those skilled in the art would recognize that there are a number of methods to obtain copy number alteration (CNA) data for use in the methods described herein. Sources of CNA data may be traditional methods including, but not limited to, fluorescent in situ hybridization (FISH), comparative genomic hybridization (CGH), array comparative genomic hybridization (aCGH), or single nucleotide polymorphism (SNP) arrays. More recently, methods to obtain CNA data for a sample have been described using whole genome sequencing (WGS), whole exome sequencing (WES), or a combination of WGS and WES. See Hehir-Kwa, et al. (2018) The clinical implementation of copy number detection in the age of next-generation sequencing, Expert Review of Molecular Diagnostics, 18:10, 907-915 for a review. There are a number of tools available to obtain CNA data from WGS or WES. Examples include ADTEx, ControlFREEC, VarScan2, and SynthEx. See, e.g., Silva et al. (2017) SynthEx: a synthetic-normal-based DNA sequencing tool for copy number alteration detection and tumor heterogeneity profiling Genome Biology 18:66; or Zare et al. (2017) An evaluation of copy number variation detection tools for cancer using whole exome sequencing data, BMC Bioinformatics (2017) 18:286. 
Software programs to obtain CNA data from next generation sequencers (copy number variant (CNV) callers) and their year of release include ADTEx (2014), BIC-seq (2011), BreakDancer (2009), CANOES (2014), Canvas (2011), CLAMMS (2015), cn.MOPS (2012), CNVem (2013), CNVer (2010), CnvHiTSeq (2012), CNVkit (2016), CNVnator (2011), CNVrd2 (2014), CNV-seq (2009), CODEX (2015), CONIFER (2012), CONTRA (2012), Control-FREEC (2011), CoNVaDING (2015), Copy-Seq (2010), Cortex (2012), DECoN (2016), Delly (2012), Excavator (2013), ExomeCNV (2011), ExomeCopy (2011), ExomeDepth (2012), FermiKit (2015), GASV (2009), GATK (2010), GenomeSTRiPv2 (2015), GROM-RD (2015), iCopyDAV (2018), JointSLM (2011), LUMPY (2014), Magnolya (2009), m-HMM (2013), mrCaNaVAR (2009), PEMer (2009), Pindel (2009), RDXplorer (2009), ReadDepth (2011), RSICNV (2017), Samblaster (2013), SegSeq (2009), SeqCNV (2017), SOAPsv (2011), Ulysses (2015), VariationHunter (2009), VarScan (2012), and XHMM (2012). See McCormick (Aug. 8, 2019) CNV Analysis Shifts Focus to NGS Sequences https://www.biocompare.com/Editorial-Articles/363086-CNV-Analysis-Shifts-Focus-to-NGS-Sequences/.
There are many examples of approved drugs that would benefit from the methods described herein. The drugs include small molecule kinase inhibitors such as imatinib (Gleevec®), an inhibitor of breakpoint cluster region-abelson (BCR-ABL) approved initially for chronic myelogenous leukemia (CML). Examples of monoclonal antibody kinase inhibitors are trastuzumab (Herceptin®), an inhibitor of ERB-B2 approved for breast cancer, and bevacizumab (Avastin®), an inhibitor of vascular endothelial growth factor (VEGF) approved for colorectal cancer. Other examples of drugs approved with a companion diagnostic include drugs approved for BRCA1/2 mutations, KRAS mutations, and cKIT expression. Table 1 lists a number of approved drugs including a number of kinase inhibitors. See Janne et al., 2009 Nat. Rev. Drug Disc. 8 709-723; Levitzki and Klein, 2010 Mol. Aspects Med. 31, 287-329; and Mellor et al. 2011 Tox. Sci. 120(1) 14-32; and the package inserts for the specific drugs.
Abbreviations: For gene targets see Gene Cards (http://www.genecards.org/). For indications: ALL, acute lymphoblastic leukemia; AML, acute myeloid leukemia; BrCA, breast cancer; CML, chronic myeloid leukemia; CMML, chronic myelomonocytic leukemia; CRC, colorectal cancer; GIST, gastrointestinal stromal tumor; HCC, hepatocellular carcinoma; HNSCC, head and neck squamous cell carcinoma; MDS/MPD, myelodysplastic syndrome/myeloproliferative disease; NSCLC, non-small cell lung cancer; OvCA, ovarian cancer; RCC, renal cell carcinoma; STS, soft tissue sarcoma; and TNBC, triple negative breast cancer.
Many of these drugs are approved for use with a companion diagnostic. For example, trastuzumab (Herceptin®) is approved for breast cancer overexpressing ERB-B2 and cetuximab (Erbitux®) for patients with wild-type KRAS. Amado et al., 2008, J Clin Oncol 26 (10): 1626-1634; Allegra et al., 2009 J Clin Oncol 27 2091-2096. Another kinase inhibitor approved for use with a diagnostic is crizotinib (Xalkori®), approved with a fluorescent in situ hybridization (FISH) test for ALK rearrangements (Vysis LSI ALK Dual Color, Break Apart Rearrangement Probe; Abbott Molecular, Abbott Park, Ill.). Shah et al., 2011 Lancet Oncol 12 1004-1012; Shaw et al., 2009 J Clin Oncol 27 4247-4253. Vemurafenib (Zelboraf®) is approved for use in patients with BRAF V600E mutation (Cobas 4800 BRAF V600 Mutation Test, Roche Molecular Diagnostics, Pleasanton, CA). Chapman et al., 2011 NEJM 364 2507-2516. Additional details may be found at the US FDA website for companion diagnostics (https://www.fda.gov/MedicalDevices/ProductsandMedicalProcedure/InVitroDiagnostics/ucm301431.htm). See the Biomarker column in Table 1 for additional companion diagnostics.
The methods disclosed herein may be used as an aid in the diagnostics and treatment of a number of cancers. Once a particular cancer is diagnosed there are a variety of targeted therapies that a clinician may use to treat the patient. Non-limiting examples for bladder cancer include erdafitinib (BALVERSA™) or pembrolizumab (KEYTRUDA®) in Table 1, additional therapies include avelumab (BAVENCIO®), durvalumab (IMFINZI™), or nivolumab (OPDIVO®). Non-limiting examples for BrCA include abemaciclib (VERZENIO®), ado-trastuzumab emtansine (KADCYLA®), alpelisib (PIQRAY®), atezolizumab (TECENTRIQ®), Everolimus (AFINITOR®), lapatinib (TYKERB®), olaparib (LYNPARZA®), palbociclib (IBRANCE®), pertuzumab (PERJETA®), ribociclib (KISQALI®) or trastuzumab (HERCEPTIN®), or trastuzumab (HERCEPTIN HYLECTA™) in Table 1, additional therapies include anastrozole (ARIMIDEX®), exemestane (AROMASIN®), fulvestrant (FASLODEX®), letrozole (FEMARA®), neratinib (NERLYNX™), tamoxifen (SOLTAMOX®), or toremifene (FARESTON®). Non-limiting examples for CRC include bevacizumab (AVASTIN®), Cetuximab (ERBITUX®), panitumumab (VECTIBIX®), ramucirumab (CYRAMZA®), or regorafenib (STIVARGA®) in Table 1, additional therapies include ipilimumab (YERVOY®), nivolumab (OPDIVO®), or ziv-aflibercept (ZALTRAP®). Non-limiting examples for HCC include pembrolizumab (KEYTRUDA®), ramucirumab (CYRAMZA®), regorafenib (STIVARGA®), or sorafenib (NEXAVAR®) in Table 1, additional therapies include cabozantinib (CABOMETYX™), lenvatinib (LENVIMA®), or nivolumab (OPDIVO®). Non-limiting examples for kidney cancer include axitinib (INLYTA®), bevacizumab (AVASTIN®), cabozantinib (CABOMETYX®), Everolimus (AFINITOR®), pazopanib (VOTRIENT®), pembrolizumab (KEYTRUDA®), sorafenib (NEXAVAR®), sunitinib (SUTENT®), temsirolimus (TORISEL®) in Table 1, additional therapies include avelumab (BAVENCIO®), ipilimumab (YERVOY®), lenvatinib mesylate (LENVIMA®), or nivolumab (OPDIVO®). 
Non-limiting examples for leukemia include dasatinib (SPRYCEL®), enasidenib (IDHIFA®), gilteritinib (XOSPATA®), imatinib (GLEEVEC®), ivosidenib (TIBSOVO®), midostaurin (RYDAPT®), nilotinib (TASIGNA®), or venetoclax (VENCLEXTA®) in Table 1, additional therapies include alemtuzumab (CAMPATH®), blinatumomab (BLINCYTO®), bosutinib (BOSULIF®), duvelisib (COPIKTRA™), gemtuzumab ozogamicin (MYLOTARG™), glasdegib (DAURISMO™), ibrutinib (IMBRUVICA®), idelalisib (ZYDELIG®), inotuzumab ozogamicin (BESPONSA®), moxetumomab pasudotox-tdfk (LUMOXITI™), obinutuzumab (GAZYVA®), ofatumumab (ARZERRA®), ponatinib (ICLUSIG®), rituximab (RITUXAN®), rituximab and hyaluronidase human (RITUXAN HYCELA™), tagraxofusp-erzs (ELZONRIS™), tisagenlecleucel (KYMRIAH®), or tretinoin (VESANOID®). Non-limiting examples for lung cancers, e.g., NSCLC, include afatinib (GILOTRIF®), alectinib (ALECENSA®), atezolizumab (TECENTRIQ®), bevacizumab (AVASTIN®), ceritinib (LDK378/ZYKADIA®), crizotinib (XALKORI®), dabrafenib (TAFINLAR®), dacomitinib (VIZIMPRO®), erlotinib (TARCEVA®), gefitinib (IRESSA®), osimertinib (TAGRISSO®), pembrolizumab (KEYTRUDA®), pemetrexed (ALIMTA®), ramucirumab (CYRAMZA®), or trametinib (MEKINIST®) in Table 1, additional therapies include brigatinib (ALUNBRIG™), durvalumab (IMFINZI™), lorlatinib (LORBRENA®), necitumumab (PORTRAZZA™), or nivolumab (OPDIVO®).
Non-limiting examples for lymphoma include acalabrutinib (CALQUENCE®), pembrolizumab (KEYTRUDA®), or venetoclax (VENCLEXTA®) in Table 1, additional therapies include axicabtagene ciloleucel (YESCARTA™), belinostat (BELEODAQ®), bexarotene (TARGRETIN®), bortezomib (VELCADE®), brentuximab vedotin (ADCETRIS®), copanlisib (ALIQOPA™), denileukin diftitox (ONTAK®), duvelisib (COPIKTRA™), ibritumomab tiuxetan (ZEVALIN®), ibrutinib (IMBRUVICA®), idelalisib (ZYDELIG®), mogamulizumab-kpkc (POTELIGEO®), nivolumab (OPDIVO®), obinutuzumab (GAZYVA®), polatuzumab vedotin-piiq (POLIVY™), pralatrexate (FOLOTYN®), rituximab (RITUXAN®), rituximab and hyaluronidase human (RITUXAN HYCELA™), romidepsin (ISTODAX®), siltuximab (SYLVANT®), tisagenlecleucel (KYMRIAH®), or vorinostat (ZOLINZA®). Non-limiting examples for melanoma include alitretinoin (PANRETIN®), binimetinib (MEKTOVI®), cobimetinib (COTELLIC®), dabrafenib (TAFINLAR®), encorafenib (BRAFTOVI™), pembrolizumab (KEYTRUDA®), trametinib (MEKINIST®), or vemurafenib (ZELBORAF®) in Table 1, additional therapies include avelumab (BAVENCIO®), cemiplimab-rwlc (LIBTAYO®), ipilimumab (YERVOY®), nivolumab (OPDIVO®), sonidegib (ODOMZO®), or vismodegib (ERIVEDGE®). Non-limiting examples for multiple myeloma (MM) include bortezomib (VELCADE®), carfilzomib (KYPROLIS®), daratumumab (DARZALEX™), elotuzumab (EMPLICITI™), ixazomib (NINLARO®), panobinostat (FARYDAK®), or selinexor (XPOVIO™). Non-limiting examples for prostate cancer include abiraterone acetate (ZYTIGA®) in Table 1, additional therapies include apalutamide (ERLEADA™), cabazitaxel (JEVTANA®), darolutamide (NUBEQA®), enzalutamide (XTANDI®), or radium 223 dichloride (XOFIGO®).
Additional drugs that may be used for cancer treatment include Denosumab (XGEVA®), Dinutuximab (UNITUXIN™), iobenguane I 131 (AZEDRA®), Lanreotide acetate (SOMATULINE® Depot), lutetium Lu 177-dotatate (LUTATHERA®), niraparib (ZEJULA™), rucaparib camsylate (RUBRACA™), ruxolitinib phosphate (JAKAFI®), Sirolimus (RAPAMUNE®), or Talazoparib (TALZENNA®).
While the following terms are believed to be well understood by one of ordinary skill in the art, the following definitions are set forth to facilitate explanation of the presently disclosed subject matter.
As used herein, a calculated signature is “predictable” if its “area under curve” or “AUC” is greater than 0.60, or greater than 0.65. A calculated signature is “highly predictable” if its AUC is 0.75 or greater. The AUC for a calculated signature may be 0.80, 0.85, 0.90, 0.95, 0.97, or greater.
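By way of non-limiting illustration, the AUC of a calculated signature against a binary observed phenotype may be sketched as the fraction of positive/negative sample pairs that the calculated score ranks correctly, with ties counted as one half. The function names, the example scores, and the use of 0.60 (rather than 0.65) as the lower cutoff are assumptions made for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    `labels` uses 1 for samples with the phenotype and 0 for samples without.
    """
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def classify_auc(value, predictable_cutoff=0.60, highly_cutoff=0.75):
    """Label an AUC using the thresholds stated in the definition above."""
    if value >= highly_cutoff:
        return "highly predictable"
    if value > predictable_cutoff:
        return "predictable"
    return "not predictable"


# Hypothetical calculated scores and observed phenotype labels.
example_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
example_labels = [1, 1, 0, 1, 0]
example_auc = auc(example_scores, example_labels)
example_class = classify_auc(example_auc)
```

In this example five of the six positive/negative pairs are ranked correctly, giving an AUC of 5/6, which falls in the "highly predictable" range.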
As used herein, “clinical signs of cancer” means and includes any sign or indication of the existence of cancer in a subject, which sign or indication would be well known to the skilled artisan (e.g., oncologist, nurse practitioner). The clinical signs of cancer may be any symptom known to be associated with the cancer. Clinical signs of some cancers include, for example, chronic pain, nausea, vomiting, abnormal taste sensation, constipation, urinary symptoms (e.g., bladder spasm), respiratory symptoms, skin problems (e.g., pruritus, hair loss), or fever, among others.
As used herein, “remission” means and includes a period during which the symptoms of a cancer have been reduced or eliminated, as remission is ordinarily defined in the oncology art.
As used herein “serially monitoring” levels of a biomarker in a sample, refers to measuring levels of a biomarker in a sample more than once, e.g., quarterly, bimonthly, monthly, biweekly, weekly, every three days, daily, or several times per day. Serial monitoring of a level includes periodically measuring levels of biomarkers at regular intervals as deemed necessary by the skilled artisan.
The term “standard level” as used herein refers to a baseline level of a biomarker as determined in one or more normal subjects. For example, a baseline may be obtained from at least one subject and preferably is obtained from an average of subjects (e.g., n=2 to 100 or more), wherein the subject or subjects have no prior history of cancer. In the present invention, the measurement of biomarker levels may be carried out using the multiplexed copy number as described.
As used herein, “elevation” of a measured level of a biomarker relative to a standard level means that the amount or concentration of a biomarker in a sample is sufficiently greater in a subject relative to the standard to be detected by the methods described herein. For example, elevation of the measured level relative to a standard level may be any statistically significant elevation which is detectable. Such an elevation may include, but is not limited to, about a 1%, about a 10%, about a 20%, about a 40%, about an 80%, about a 2-fold, about a 4-fold, about an 8-fold, about a 20-fold, or about a 100-fold elevation, or more, relative to the standard. The term “about” as used herein, refers to a numerical value plus or minus 10% of the numerical value.
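By way of non-limiting illustration, an elevation relative to a standard level may be sketched as a ratio of the measured level to a baseline averaged over normal subjects. The sketch assumes the baseline is an arithmetic mean and reports both the fold and the percent elevation; it does not test statistical significance, and the function names and values are hypothetical.

```python
from statistics import mean


def fold_elevation(measured_level, normal_levels):
    """Ratio of a measured biomarker level to the mean level in normal subjects."""
    standard_level = mean(normal_levels)
    if standard_level <= 0:
        raise ValueError("standard level must be positive")
    return measured_level / standard_level


def percent_elevation(measured_level, normal_levels):
    """Elevation expressed as a percentage above the standard level."""
    return (fold_elevation(measured_level, normal_levels) - 1.0) * 100.0


# Hypothetical biomarker levels from three normal subjects and one patient.
normal_subject_levels = [1.8, 2.0, 2.2]
patient_level = 8.0
fold = fold_elevation(patient_level, normal_subject_levels)
percent = percent_elevation(patient_level, normal_subject_levels)
```

With a standard level of 2.0, a measured level of 8.0 corresponds to about a 4-fold elevation, or about 300% above the standard.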
Non-limiting examples of signaling pathway modulators or chemotherapeutic agents known in the art are 5-fluorouracil; asparaginase; bevacizumab (AVASTIN®); bleomycin; camptothecins; cetuximab (ERBITUX®); crizotinib (XALKORI®); cyclophosphamide; cytarabine; dacarbazine; dactinomycin; dasatinib (SPRYCEL®); daunorubicin; DNA methyltransferase inhibitors (DNMTs) such as azacitidine (VIDAZA®) and decitabine; doxorubicin; epirubicin; erbstatin; erlotinib (TARCEVA®); estramustine; etoposide; gefitinib (IRESSA®); gemcitabine; genistein; histone acetyl transferase inhibitors (HATs); histone deacetyl transferase inhibitors (HDACs) such as belinostat, entinostat (MS-275), panobinostat, PCI-24781, romidepsin (depsipeptide, FK-228), valproic acid, vorinostat (ZOLINZA®, SAHA); heat shock protein inhibitors, including HSP90 inhibitors such as alvespimycin (IPI-493), AT13387, AUY922 (resorcinolic isoxazole amide), CNF2024 (BIIB021), HSP990, MPC-3100, retaspimycin (IPI-504), SNX-2112, SNX-5422, STA-9090, tanespimycin (17-AAG; KOS-953), or XL888; herbimycin A; hexamethylmelamine; hedgehog pathway inhibitors such as saridegib (IPI-926), vismodegib (ERIVEDGE™); hydroxyurea; idarubicin; ifosfamide; imatinib (GLEEVEC®); irinotecan; lapatinib (TYKERB®); lavendustin A; leucovorin; levamisole; mercaptopurine; methotrexate; mitomycin; mitoxantrone; mTOR inhibitors such as everolimus (AFINITOR®), sirolimus (RAPAMUNE®), temsirolimus (TORISEL®); nilotinib (TASIGNA®); nitrosoureas such as carmustine and lomustine; paclitaxel; panitumumab (VECTIBIX®); pazopanib (VOTRIENT®); pegaptanib (MACUGEN®); platinum compounds such as carboplatin, cisplatin, oxaliplatin; plicamycin; procarbazine; proteasome inhibitors such as bortezomib (VELCADE®); ranibizumab (LUCENTIS®); sorafenib (NEXAVAR®); sunitinib (SUTENT®); taxanes such as docetaxel, paclitaxel, taxol; thioguanine; topotecan; trastuzumab (HERCEPTIN®); tyrosine kinase inhibitors; tyrphostins; vandetanib (CAPRELSA®); vemurafenib (ZELBORAF®); vinblastine; vinca alkaloids; vincristine; or vinorelbine. In a preferred embodiment, the chemotherapeutic agent is bevacizumab (AVASTIN®), cetuximab (ERBITUX®), crizotinib (XALKORI®), dasatinib (SPRYCEL®), erlotinib (TARCEVA®), everolimus (AFINITOR®), gefitinib (IRESSA®), imatinib (GLEEVEC®), lapatinib (TYKERB®), nilotinib (TASIGNA®), panitumumab (VECTIBIX®), pazopanib (VOTRIENT®), sirolimus (RAPAMUNE®), sorafenib (NEXAVAR®), sunitinib (SUTENT®), temsirolimus (TORISEL®), trastuzumab (HERCEPTIN®), vandetanib (CAPRELSA®), or vemurafenib (ZELBORAF®). Further examples of chemotherapeutic agents may be found in Table 1 above and in standard publications and texts. See, e.g., National Comprehensive Cancer Network (NCCN Guideline™) or Manual of Clinical Oncology, Dennis A. Casciato and Barry B. Lowitz, ed., 4th edition, Jul. 15, 2000, Little, Brown and Company, U.S.
Non-limiting examples of proteins whose expression signatures may be calculated using the methods disclosed herein are: 14-3-3_zeta, 4E-BP1, 4E-BP1_pS65, 4E-BP1_pT37_T46, 4E-BP1_pT70, 53BP1, ACC_pS79, ACC1, ADAR1, Akt_pS473, Akt_pT308, AMPK_alpha, Annexin_VII, AR, A-Raf_pS299, ASNS, Bap1-c-4, Bcl-2, Bim, B-Raf, B-Raf_pS445, BRD4, Caspase-8, CDK1_pY15, Chk2, Chk2_pT68, cIAP, COG3, Cyclin_B1, Cyclin_E1, DJ-1, DUSP4, Dvl3, eEF2K, EGFR, eIF4E, eIF4G, ER, ER-alpha, ER-alpha_pS118, ERK2, FASN, FoxM1, GAPDH, GATA3, HER2, HER2_pY1248, INPP4B, IRS1, JNK2, MSH2, MSH6, NF2, p16_INK4a, p53, p62-LCK-ligand, p70S6K, p90RSK, P-Cadherin, PCNA, PDK1, PDK1_pS241, PI3K-p110-alpha, PI3K-p85, PR, PREX1, Rab25, Rad50, Raptor, S6, Smac, Smad1, Src, VEGFR2, XRCC1, or YAP_pS127.
Throughout the present specification, the terms “about” and/or “approximately” may be used in conjunction with numerical values and/or ranges. The term “about” is understood to mean those values near to a recited value. For example, “about 40 [units]” may mean within ±25% of 40 (e.g., from 30 to 50), within ±20%, ±15%, ±10%, ±9%, ±8%, ±7%, ±6%, ±5%, ±4%, ±3%, ±2%, ±1%, less than ±1%, or any other value or range of values therein or there below. Alternatively, depending on the context, the term “about” may mean ± one half a standard deviation, ± one standard deviation, or ± two standard deviations. Furthermore, the phrases “less than about [a value]” or “greater than about [a value]” should be understood in view of the definition of the term “about” provided herein. The terms “about” and “approximately” may be used interchangeably.
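By way of a non-limiting illustration of the tolerance arithmetic in the preceding paragraph, the following Python sketch computes the interval implied by “about” for a chosen percentage tolerance. The function names `about_range` and `is_about`, and the default tolerance of 10%, are assumptions made for this example only; as stated above, other tolerances (or standard-deviation-based meanings) may apply depending on context.

```python
def about_range(recited, tolerance=0.10):
    """Return the (low, high) interval for 'about recited'.

    tolerance is a fraction of the recited value (0.25 means +/-25%).
    The 10% default is an illustrative assumption, not a fixed rule.
    """
    delta = abs(recited) * tolerance
    return (recited - delta, recited + delta)


def is_about(measured, recited, tolerance=0.10):
    """True if measured lies within the 'about' interval of recited."""
    low, high = about_range(recited, tolerance)
    return low <= measured <= high


# "about 40" at +/-25% spans 30 to 50, matching the example in the text.
print(about_range(40, 0.25))   # (30.0, 50.0)
print(is_about(43, 40))        # True at the default +/-10%
print(is_about(45, 40))        # False at the default +/-10%
```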
Throughout the present specification, numerical ranges are provided for certain quantities. It is to be understood that these ranges comprise all subranges therein. Thus, the range “from 50 to 80” includes all possible ranges therein (e.g., 51-79, 52-78, 53-77, 54-76, 55-75, etc.). Furthermore, all values within a given range may be an endpoint for the range encompassed thereby (e.g., the range 50-80 includes the ranges with endpoints such as 55-80, etc.).
As used herein, the verb “comprise” and its conjugations, as used in this description and in the claims, are used in their non-limiting sense to mean that items following the word are included, but items not specifically mentioned are not excluded.
Throughout the specification the word “comprise,” or variations such as “comprises” or “comprising,” will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The present disclosure may suitably “comprise”, “consist of”, or “consist essentially of”, the steps, elements, and/or reagents described in the claims.
A computing device may be implemented in programmable hardware devices such as processors, digital signal processors, central processing units, field programmable gate arrays, programmable array logic, programmable logic devices, cloud processing systems, or the like. The computing devices may also be implemented in software for execution by various types of processors. An identified device may include executable code and may, for instance, comprise one or more physical or logical blocks of computer instructions, which may, for instance, be organized as an object, procedure, function, or other construct. Nevertheless, the executable of an identified device need not be physically located together but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the computing device and achieve the stated purpose of the computing device. In another example, a computing device may be a server or other computer located within a hospital or out-patient environment and communicatively connected to other computing devices (e.g., POS equipment or computers) for managing accounting, purchase transactions, and other processes within the hospital or out-patient environment. In another example, a computing device may be a mobile computing device such as, for example, but not limited to, a smart phone, a cell phone, a pager, a personal digital assistant (PDA), a mobile computer with a smart phone client, or the like. In another example, a computing device may be any type of wearable computer, such as a computer with a head-mounted display (HMD), or a smart watch or some other wearable smart device. Some of the computer sensing may be part of the fabric of the clothes the user is wearing. A computing device can also include any type of conventional computer, for example, a laptop computer or a tablet computer. 
A typical mobile computing device is a wireless data access-enabled device (e.g., an iPHONE® smart phone, a BLACKBERRY® smart phone, a NEXUS ONE™ smart phone, an iPAD® device, a smart watch, or the like) that is capable of sending and receiving data in a wireless manner using protocols like the Internet Protocol, or IP, and the wireless application protocol, or WAP. This allows users to access information via wireless devices, such as smart watches, smart phones, mobile phones, pagers, two-way radios, communicators, and the like. Wireless data access is supported by many wireless networks, including, but not limited to, Bluetooth, Near Field Communication, CDPD, CDMA, GSM, PDC, PHS, TDMA, FLEX, ReFLEX, iDEN, TETRA, DECT, DataTAC, Mobitex, EDGE and other 2G, 3G, 4G, 5G, and LTE technologies, and it operates with many handheld device operating systems, such as PalmOS, EPOC, Windows CE, FLEXOS, OS/9, JavaOS, iOS and Android. Typically, these devices use graphical displays and can access the Internet (or other communications network) on so-called mini- or micro-browsers, which are web browsers with small file sizes that can accommodate the reduced memory constraints of wireless networks. In a representative embodiment, the mobile device is a cellular telephone, smart phone, or smart watch that operates over GPRS (General Packet Radio Service), which is a data technology for GSM networks, or that operates over a short-range wireless link, e.g., Near Field Communication or Bluetooth. In addition to conventional voice communication, a given mobile device can communicate with another such device via many different types of message transfer techniques, including Bluetooth, Near Field Communication, SMS (short message service), enhanced SMS (EMS), multi-media message (MMS), email, WAP, paging, or other known or later-developed wireless data formats.
Although many of the examples provided herein are implemented on smart phones, the examples may similarly be implemented on any suitable computing device, such as a computer.
An executable code of a computing device may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different applications, and across several memory devices. Similarly, operational data may be identified and illustrated herein within the computing device, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, as electronic signals on a system or network.
The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, to provide a thorough understanding of embodiments of the disclosed subject matter. One skilled in the relevant art will recognize, however, that the disclosed subject matter can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the disclosed subject matter.
As used herein, the term “memory” is generally a storage device of a computing device. Examples include, but are not limited to, read-only memory (ROM) and random access memory (RAM).
The device or system for performing one or more operations on a memory of a computing device may be a software, hardware, firmware, or combination of these. The device or the system is further intended to include or otherwise cover all software or computer programs capable of performing the various heretofore-disclosed determinations, calculations, or the like for the disclosed purposes. For example, exemplary embodiments are intended to cover all software or computer programs capable of enabling processors to implement the disclosed processes. Exemplary embodiments are also intended to cover any and all currently known, related art or later developed non-transitory recording or storage mediums (such as a CD-ROM, DVD-ROM, hard drive, RAM, ROM, floppy disc, magnetic tape cassette, etc.) that record or store such software or computer programs. Exemplary embodiments are further intended to cover such software, computer programs, systems and/or processes provided through any other currently known, related art, or later developed medium (such as transitory mediums, carrier waves, etc.), usable for implementing the exemplary operations disclosed below.
In accordance with the exemplary embodiments, the disclosed computer programs can be executed in many exemplary ways, such as an application that is resident in the memory of a device or as a hosted application that is being executed on a server and communicating with the device application or browser via a number of standard protocols, such as TCP/IP, HTTP, XML, SOAP, REST, JSON and other sufficient protocols. The disclosed computer programs can be written in exemplary programming languages that execute from memory on the device or from a hosted server, such as BASIC, COBOL, C, C++, Java, Pascal, or scripting languages such as JavaScript, Python, Ruby, PHP, Perl, or other suitable programming languages.
As referred to herein, the terms “computing device” and “entities” should be broadly construed and should be understood to be interchangeable. They may include any type of computing device, for example, a server, a desktop computer, a laptop computer, a smart phone, a cell phone, a pager, a personal digital assistant (PDA, e.g., with GPRS NIC), a mobile computer with a smartphone client, or the like.
As referred to herein, a user interface is generally a system by which users interact with a computing device. A user interface can include an input for allowing users to manipulate a computing device, and can include an output for allowing the computing device to present information and/or data, indicate the effects of the user's manipulation, etc. An example of a user interface on a computing device (e.g., a mobile device) includes a graphical user interface (GUI) that allows users to interact with programs or applications in more ways than typing. A GUI typically can offer display objects and visual indicators, as opposed to text-based interfaces, typed command labels, or text navigation, to represent information and actions available to a user. For example, a user interface can be a display window or display object, which is selectable by a user of a computing device for interaction. The display object can be displayed on a display screen of a computing device and can be selected by and interacted with by a user using the user interface. In an example, the display of the computing device can be a touch screen, which can display the display icon. The user can depress the area of the display screen where the display icon is displayed for selecting the display icon. In another example, the user can use any other suitable user interface of a computing device, such as a keypad, to select the display icon or display object. For example, the user can use a track ball or arrow keys for moving a cursor to highlight and select the display object.
As referred to herein, a computer network may be any group of computing systems, devices, or equipment that are linked together. Examples include, but are not limited to, local area networks (LANs) and wide area networks (WANs). A network may be categorized based on its design model, topology, or architecture. In an example, a network may be characterized as having a hierarchical internetworking model, which divides the network into three layers: access layer, distribution layer, and core layer. The access layer focuses on connecting client nodes, such as workstations, to the network. The distribution layer manages routing, filtering, and quality-of-service (QoS) policies. The core layer can provide high-speed, highly-redundant forwarding services to move packets between distribution layer devices in different regions of the network. The core layer typically includes multiple routers and switches.
The present subject matter may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present subject matter.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network, or Near Field Communication. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present subject matter may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, Javascript or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present subject matter.
Aspects of the present subject matter are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the subject matter. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present subject matter. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely”, “only” and the like in connection with the recitation of claim elements, or the use of a “negative” limitation.
Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Preferred methods, devices, and materials are described, although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure. All references cited herein are incorporated by reference in their entirety.
The following Examples further illustrate the disclosure and are not intended to limit the scope. In particular, it is to be understood that this disclosure is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present disclosure will be limited only by the appended claims.
6.1. Summary
The ability to accurately characterize and predict tumor phenotypes is crucial to patients for predicting prognosis. In addition, building predictive models for key phenotypes like signaling pathway activity would be valuable to guide treatment selection. In this study, tumor DNA information and a comprehensive archive of gene expression signatures were utilized as a framework to fully characterize multiple aspects of tumor biology. An integrative computational approach using a genome-wide association analysis and an Elastic Net prediction method is presented to analyze the relationship between DNA copy number alterations and gene expression signatures. Based upon DNA copy number features alone, the approach was able to quantitatively predict many expression signature levels within individual tumors across breast cancers with high accuracy, including proliferation status and EGFR pathway activity. Elastic Net models were also able to predict many other key phenotypes, including intrinsic molecular subtypes, some protein expression features including estrogen receptor status, and somatic mutation status including TP53 and CDH1. This approach was successfully applied to multiple other tumor types (Pan-Cancer), which identified a number of repeatedly predictable signatures including immune cell features in squamous/basal-like cancers. These Elastic Net DNA predictors could also be called from commonly used DNA-based gene panels, and thus could also inform on non-genetic tumor features that often guide therapeutic decision making. See Xia et al. (published Dec. 11, 2019) Genetic Determinants of the Molecular Portraits of Epithelial Cancers, Nat. Comm. 10:5666, the contents of which are incorporated by reference in their entirety.
Characterization of Multiple Gene Signature-Specific DNA Copy Number Alterations
The possible associations between DNA Copy Number Alterations (CNAs) and multiple gene expression signatures were investigated first. The initial focus was on breast tumors, where multiple gene expression signatures are already in common clinical use10-12. A panel of 543 published gene expression signatures13 measuring diverse phenotypes, including multiple signaling pathways, known prognostic/predictive models, tumor microenvironment features, and features of DNA amplicons and deletions, was applied to 1038 breast cancers using the RNA-seq data from the TCGA breast cancer project2 (Supplementary Table 1). DNA copy number data were used to identify possible associations linking DNA CNAs to each signature-based phenotype. Previously, Gatza et al.8 developed an association analysis method on a much smaller cohort of patients to examine the possible associations between CNAs and a limited panel of 52 gene signatures. This association analysis was modified to include another 491 signatures and to take into account molecular intrinsic subtype information.
The reproducibility of the association landscapes was analyzed by comparing these results to those from Gatza et al. for the same signatures8. All 52 signatures of Gatza et al. were included here, with a particular focus on the RB-LOH signature15. The current analysis used data on a much larger cohort of TCGA breast tumors (n=1038 vs. n=476); another systematic difference between the two studies is that Gatza et al. used gene expression microarrays, while mRNA-seq was used here. More importantly, molecular subtype was accounted for here to identify universal associations irrespective of subtype. Despite these methodological differences, there was a high concordance between the association landscape for the RB-LOH signature and that published by Gatza et al.
Both new and old possible associations were examined using all 543 gene signatures. Associations to previously determined DNA amplicon gene expression signatures were found, and all encompassed regions of the corresponding amplicons.
CNA-Based Gene Signature Predictions by Elastic Net Models
Given the strengths of these associations, the feasibility of building computational predictors of gene expression signature levels based upon DNA CNA features only was assessed. Because associations between CNAs and gene signatures could be found, it was possible that at least some of the expression signatures would be predictable using DNA information alone. To build predictive models, a statistical modeling approach called Elastic Net was used, which is a regularized regression model that is capable of handling large numbers of potentially collinear variables and of selecting the most relevant features to build the final model9. Instead of using gene-level CNA scores as predictors, 536 segment-level CNA scores were calculated using predefined chromosome regions that have been shown to be important in cancers18-22 (Supplementary Table 2). These DNA segments included pan-cancer significant somatic CNAs as well as breast cancer subtype-specific CNA regions. The 1038-sample TCGA breast cancer data set was split into a training set (70%) and a testing set (30%). Models were built solely on the TCGA training set and validated on both the TCGA testing set and a large independent breast tumor data set (Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), n=1,689)3. Models were trained to classify samples into those with high signature scores (top third) versus low signature scores (bottom two-thirds). Area under ROC curve (AUC) values were used to evaluate model performance.
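The evaluation set-up described in the preceding paragraph (a 70%/30% split, a top-third versus bottom-two-thirds label, and AUC scoring) may be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not the helper.R listing: the function names `label_top_third`, `split_train_test`, and `roc_auc` are hypothetical, the split is assumed to be a simple seeded random split (the text does not say whether it was stratified), and the Elastic Net fit itself is omitted. The scores and predictions are toy values.

```python
import random


def label_top_third(scores):
    """Label samples 1 if in the top third of signature scores, else 0."""
    n_high = len(scores) // 3
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    high = set(ranked[:n_high])
    return [1 if i in high else 0 for i in range(len(scores))]


def split_train_test(n_samples, train_fraction=0.7, seed=0):
    """Randomly split sample indices into training and testing index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * n_samples)
    return indices[:cut], indices[cut:]


def roc_auc(labels, predictions):
    """AUC as the fraction of (high, low) pairs ranked correctly.

    Ties between a high and a low sample count as half a correct pair.
    """
    pos = [p for l, p in zip(labels, predictions) if l == 1]
    neg = [p for l, p in zip(labels, predictions) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute an AUC")
    correct = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy signature scores for six samples and toy model outputs for the same samples.
scores = [0.9, 0.1, 0.5, 0.7, 0.2, 0.3]
labels = label_top_third(scores)            # [1, 0, 0, 1, 0, 0]
predictions = [0.8, 0.4, 0.6, 0.5, 0.1, 0.3]
print(roc_auc(labels, predictions))         # 0.875

train_idx, test_idx = split_train_test(1038)
print(len(train_idx), len(test_idx))        # 727 311
```

In practice the labels would be derived within each data set and the AUC computed on held-out samples only, so that the reported value reflects samples the model did not see during training.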
AUC distributions for all gene signatures demonstrated high predictability for some, but not all, of the signatures.
tests showed that a test set AUC of 0.75 indicates significant predictive power (
To better understand these predictive models, the CNA regions selected by the Elastic Net models (Supplementary Table 3) were investigated. To allow direct comparison with the association landscapes, model feature landscapes for the three signatures are shown. Remarkably, for the RB-LOH signature and the basal signaling signature, which had many associations with CNAs, there was a significant amount of overlap between the association landscape and the Elastic Net model feature landscape.
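To illustrate what it means for a CNA region to be "selected" by an Elastic Net model, the following Python sketch fits an Elastic Net-penalized logistic regression on toy data and reports the features with non-zero coefficients. It is a sketch under stated assumptions rather than the disclosed implementation: the solver shown is plain proximal gradient descent (chosen for brevity and not necessarily the solver used in helper.R), the penalty is written in one common parametrization, lam * (alpha * |w|_1 + (1 - alpha)/2 * |w|_2^2), and the function names, hyperparameter values, and data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, step=0.1, n_iter=2000):
    """Minimize mean logistic loss + lam*(alpha*|w|_1 + (1-alpha)/2*|w|_2^2).

    X is a list of feature rows, y a list of 0/1 labels.
    Returns (intercept, weights); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = sigmoid(z) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= step * grad_b
        for j in range(p):
            # Gradient step on the smooth part (logistic loss + ridge term).
            smooth = w[j] - step * (grad_w[j] + lam * (1.0 - alpha) * w[j])
            # Proximal step for the lasso term; sets small weights exactly to zero.
            w[j] = soft_threshold(smooth, step * lam * alpha)
    return b, w


# Toy data (hypothetical): segment 0 tracks the label, segment 1 is an exact
# copy of segment 0 (perfectly collinear), segment 2 is unrelated to the label.
seg0 = [-1, -1, -1, -1, 1, 1, 1, 1]
seg2 = [1, -1, 1, -1, 1, -1, 1, -1]
X = [[a, a, c] for a, c in zip(seg0, seg2)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, weights = fit_elastic_net_logistic(X, y)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(selected)  # [0, 1]
```

In this toy case the two collinear segments share the weight equally (an effect of the ridge term), while the uninformative segment receives a coefficient of exactly zero (an effect of the lasso term) and is therefore not among the selected features.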
CNA-Based Predictions for Intrinsic Molecular Subtypes
Next, the Elastic Net DNA feature modeling strategy was applied to the prediction of other complex tumor phenotypes, including prediction of individual sample molecular subtypes of breast cancer11. Prediction models for all intrinsic subtypes demonstrated high AUC values (i.e., >0.75), indicating that these RNA-based phenotypes can be well explained by DNA-based information.
CNA-Based Predictions for Individual Protein Expression
In addition, the Elastic Net DNA-based modeling strategy was applied to build prediction models for individual proteins. The reverse phase protein array (RPPA) data measuring 216 proteins and phospho-proteins from TCGA breast cancer samples2 were utilized. Many studies have addressed the relationship between protein levels and mRNA abundance and concluded that mRNA transcript levels predict protein levels about 50% of the time27. A few studies have also investigated the influence of DNA copy number on protein expression and found some proteins with significant correlations, typically those that are the target of amplification or deletion28,29. However, these studies assessed correlations on individual genes. Here, the Elastic Net model was built to take into account the whole genome to predict protein expression. Using the aforementioned definition of “high precision” (AUC greater than or equal to 0.75), the model was able to accurately predict 16 out of the 216 protein expression levels present in the RPPA arrays (testing set AUC>0.75).
In breast cancer, the most critical therapeutic biomarkers are ER, PR, and HER2, scored by immunohistochemistry (IHC). For HER2 prediction, HER2 and 17q were selected with the largest coefficients by both the model for HER2 RPPA protein expression and the model guided by HER2 clinical IHC status.
Finally, an increasingly common clinical assay today for cancer patients is a gene panel assay, in which typically hundreds of genes are DNA sequenced using massively parallel sequencing, thus giving somatic mutation status and DNA copy number values for each gene33. One of the most widely utilized gene panels is FoundationOne®, which at the time of the writing of this manuscript contained 313 genes. Using only the DNA copy number information for these 313 genes, all gene signature and protein expression Elastic Net prediction models achieved essentially identical results and AUC values.
CNA-Based Predictions for Somatic Mutations
Next, the ability to predict individual somatic mutations was examined. Mutation data from TCGA breast tumors that have highly confident mutation calls 34 were used, and the analysis was limited to the significantly mutated gene list identified by previous work as well as frequently mutated genes (frequency >5%), excluding HLA and IGH genes. Only a few mutations passed the test set AUC threshold of 0.75, namely TP53, CDH1, MAP3K1 (
Subtype-Specific Predictions for Gene Signatures
To investigate whether molecular subtype affects the predictability of gene signatures, Elastic Net analyses identical to those described above were performed, but applied only to Basal-like subtype tumors, Luminal A subtype tumors, or Luminal tumors (HER2-Enriched, Luminal A and Luminal B combined). Results showed that prediction accuracies differed across subtypes (Supplementary Table 7) and that, in some cases, the set of predictable signatures also varied. One striking phenomenon was that some immune signatures that had low AUC values in models built using all breast cancer samples had higher AUC values when using only Basal-like tumors (
Predictions for Gene Signatures in Lung Cancer
To evaluate the generalizability of the Elastic Net modeling strategy, prediction models using TCGA lung cancer data were evaluated, including both lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC)40,41, again using gene expression signatures. First, the DNA-based prediction models derived from breast cancers were applied to lung cancers. Results identified 37 signatures that passed the AUC threshold of 0.75 across the lung training set, lung testing set and breast cancer testing set (
Pan-Cancer Predictions for Gene Signatures
Finally, the Elastic Net modeling approach was applied to 23 other tumor types from TCGA that have multi-platform data and at least 100 samples4. A total of 504 median expression-based gene signatures were applied to each tumor type, and the predictability of each signature in each tumor type was examined. Results showed that successful models with high accuracy (AUC>0.75) were built for multiple tumor types besides breast cancer and lung cancer. Not surprisingly, there were more gene signatures that were highly predictable in tumor types that have more CNA events (
Noting that a Claudin-low signature, representing the Claudin-low subtype and an epithelial-mesenchymal transition (EMT)-like state43, was highly predictable among 11 tumor types, a model was built on combined data from these tumor types to identify CNA regions that are universally important in predicting this signature. The resulting model had a training set AUC of 0.8 and a testing set AUC of 0.74, indicating the multi-tumor model was able to predict the signature across 11 tumor types. In addition, CNA regions selected by this model highlighted many RAS/MAPK pathway components including a less-known gene ERAS (
6.2. Discussion
The ability to predict key tumor phenotypes, such as mutation status, biomarker levels, or complex expression phenotypes, is critical to understanding the biological complexity of solid epithelial cancers. Currently, for breast cancer, protein expression analysis is required for ER, PR and HER2, and gene expression tests are common. For lung cancer, gene panel testing is included within the standard of care, and expression analyses (both mRNA and protein) are growing in prominence, in large part due to immunotherapy. Many solid epithelial cancers, particularly breast and lung, are thought to be at least partially DNA copy number driven because a large number of copy number events occur, and many are known genetic drivers45,46. It was reasoned that many key tumor phenotypes might be predictable from the diversity of DNA copy number changes in a proposed copy number driven tumor type. To address this hypothesis, an extensive archive of manually curated gene expression signatures taken from multiple publications was used to study tumor phenotypes and estimate their predictability. The relationship between DNA copy number alterations and each gene expression signature was analyzed through two means: the first was a genome-wide association method, and the second was to build Elastic Net prediction models and assess their accuracy. The association study identified DNA features that were positively or negatively correlated with expression signatures by evaluating genes one by one. Together, these two methods produced a broad picture of the linkages between CNAs and gene signatures. Known associations between CNAs and gene signatures were consistently found, including gene signatures of DNA amplifications and losses, gene signatures of more complex phenotypes including signaling pathway activities (i.e. TP53 and EGFR), and gene signatures of cellular proliferation status; in fact, the methods were able to predict many of these signatures with very high accuracy (AUC>0.9) on a true test set that even used different gene expression and DNA copy number technologies (i.e. METABRIC). Taken together, each gene signature's association landscape and Elastic Net feature landscape provide CNA regions for further investigation as potential genetic drivers. In addition, further application of the Elastic Net modeling strategy to a variety of other molecular phenotypes including molecular intrinsic subtypes, protein expression levels and somatic mutation status (including tumor mutation burden) revealed the ability to accurately predict many key phenotypes in breast cancer. These models may be clinically useful and could at least provide an orthogonal approach for calling key features like ER, PR, and HER2 status in breast cancer, and might eventually even be used on their own to call key phenotypes, given the growing use of DNA exomes and gene panels in the cancer clinic.
For the analyses presented here, the expression signatures were divided into the highest tertile versus the bottom two tertiles; however, Elastic Net models where the expression signatures were treated as continuous variables were also analyzed, and these were also successful for those models that showed high AUCs when tested as dichotomous variables (
Many commercial gene panel tests have been developed with the goal of improving precision medicine. Using DNA CNA information for only the 313 genes that can be derived by FoundationOne® genomic testing, gene expression signature prediction and protein expression prediction accuracies remained the same as those obtained using whole-exome CNA values. The 313 genes have been selected as highly cancer relevant and reported to be important in tumorigenesis. This result suggests that a small part of the genome accounts for a large part of the predictive power for cancer phenotypes seen in some solid epithelial cancer types. This also sheds light on the application of Elastic Net models in the clinic. For example, various proliferation signatures, including the RB-LOH signature evaluated here, might serve as potential biomarkers for CDK4/6 inhibitors, which target the RB/E2F pathway47. The Elastic Net model for the RB-LOH signature could be used to stratify patients into those with high proliferation rates, which typically identifies those with RB loss, and for whom a CDK4/6 inhibitor would then not be recommended. Further validation is needed to confirm this specific hypothetical application; however, if validated, a whole new set of prognostic and predictive biomarkers could be read out from existing DNA-based gene panels, thus providing more guidance for precision medicine at no additional cost.
Lastly, the generalizability of this approach was shown through a Pan-Cancer analysis of 23 different tumor types. Consistent with a working hypothesis, a variety of gene signatures besides amplicon signatures were predictable in tumor types that have many copy number changes. Tumor types that have been shown to share similar features had similar patterns of signature predictability. More importantly, those shared key features were often highly predictable such as immune features in squamous tumors and proliferation rate in adenocarcinomas.
Collectively these results demonstrate the ability to build CNA-based predictors for multiple key cancer phenotypes for breast and non-small cell lung cancer patients. While most research focuses on finding genetic drivers of tumorigenesis, the work carries important implications that critical complex tumor phenotypes can be predicted using DNA information, which could be potentially used in the clinic.
6.3. Methods
Gene expression data. Illumina HiSeq 2000 RNA sequencing data for human breast cancer and lung cancer (both Lung Adenocarcinoma and Lung Squamous Cell Carcinoma) were acquired from The Broad Institute TCGA GDAC Firehose4. Illumina HT-12 v3 expression data for the METABRIC project (n=1,992 samples) were acquired from the European Genome-phenome Archive at the European Bioinformatics Institute3. For TCGA breast cancer and lung cancer gene expression data, gene-level RNA-Seq reads were upper-quartile normalized and log2 transformed, filtered to genes that were expressed in over 70% of samples, median centered and sample-wise standardized within each data set. For METABRIC microarray gene expression data, acquired data were filtered to genes that were expressed in over 70% of samples, median centered for each gene and standardized for each sample. For both TCGA and METABRIC breast cancer data, PAM50 subtyping was applied as previously described2,3,11. Gene expression data for all other tumor types were downloaded from the GDC PanCanAtlas publication site. For each tumor type, gene expression data were filtered to genes that were expressed in over 70% of samples, median centered and sample-wise standardized within each tumor type.
DNA copy number data. GISTIC2 gene-level copy number data for human breast cancer and lung cancer were acquired from The Broad Institute TCGA GDAC Firehose with no further processing. For the METABRIC project, copy number segmentation data generated using the circular binary segmentation (CBS) algorithm were acquired from the European Genome-phenome Archive3. Using the Ensembl 54 (hg18) genome build, gene-level copy number scores were derived through the extreme method as used in GISTIC2 48: genes that fell completely within a CBS-identified copy number segment were assigned the corresponding segment value; genes that overlapped with multiple segments were assigned the greatest amplification or the least deletion value among the overlapped segments; and genes with no overlapping segments were excluded from further analyses. GISTIC2 gene-level copy number data for all other tumor types were downloaded from the GDC PanCanAtlas publication site with no further processing.
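The segment-to-gene assignment described above can be illustrated with a short Python sketch. It is not the code used in the study. It assumes that "extreme" means the overlapping segment value farthest from zero, which is only one reading of the rule stated above, and the coordinates and values in `example_segments` are invented for illustration.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end, segment value); all numbers are made up.
Segment = Tuple[str, int, int, float]

def gene_level_score(chrom: str, start: int, end: int,
                     segments: List[Segment]) -> Optional[float]:
    """Return a gene-level copy number score, or None if no segment overlaps."""
    overlapping = [value for (c, s, e, value) in segments
                   if c == chrom and s <= end and e >= start]
    if not overlapping:
        return None  # genes with no overlapping segment are excluded
    # Assumption: the "extreme" value is the one with the largest magnitude.
    # A gene inside a single segment simply receives that segment's value.
    return max(overlapping, key=abs)

example_segments = [
    ("17", 100, 500, 0.1),
    ("17", 501, 900, 1.2),
    ("17", 901, 1500, -0.8),
]
inside_one = gene_level_score("17", 200, 300, example_segments)
spans_two = gene_level_score("17", 450, 600, example_segments)
no_overlap = gene_level_score("17", 2000, 2100, example_segments)
```

Returning `None` rather than zero keeps genes without coverage distinguishable from genes with a neutral copy number.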
Protein expression data. Normalized protein expression data for human breast cancer were acquired from The Broad Institute TCGA GDAC Firehose with no further processing.
Mutation data. Mutation Annotation Format (MAF) data from the 2015 TCGA Lobular Breast Cancer dataset were used34. The MAF file was first filtered to only include the following variant classifications: Frame_Shift_Del, Frame_Shift_Ins, In_Frame_Del, In_Frame_Ins, Missense_Mutation, Nonsense_Mutation, Nonstop_Mutation, RNA, Splice_Site, Translation_Start_Site. A binary gene by sample matrix, with 1 indicating any mutation and 0 indicating no mutation, was then constructed based on the filtered MAF (Supplementary Table 13). Mutation load for each sample was then determined by the total number of mutated genes in that sample.
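A minimal Python sketch of the filtering and matrix construction just described is given below. The input format, a list of (gene, sample, variant classification) tuples, and the function name are assumptions made for illustration. Samples whose variants are all filtered out are kept with a mutation load of 0, which is also an assumption.

```python
from collections import defaultdict

# Variant classifications retained by the filter described in the text.
KEPT_CLASSES = {
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation", "RNA",
    "Splice_Site", "Translation_Start_Site",
}

def binary_mutation_matrix(maf_rows):
    """Build {gene: {sample: 0/1}} and {sample: mutation load} from MAF-like rows."""
    mutated = defaultdict(set)  # sample -> set of mutated genes
    samples = set()
    for gene, sample, variant_class in maf_rows:
        samples.add(sample)
        if variant_class in KEPT_CLASSES:
            mutated[sample].add(gene)
    ordered_samples = sorted(samples)
    genes = sorted({g for s in ordered_samples for g in mutated[s]})
    matrix = {g: {s: int(g in mutated[s]) for s in ordered_samples} for g in genes}
    # Mutation load counts mutated genes, not individual variants.
    load = {s: len(mutated[s]) for s in ordered_samples}
    return matrix, load

example_rows = [
    ("TP53", "S1", "Missense_Mutation"),
    ("TP53", "S1", "Nonsense_Mutation"),  # second hit in the same gene
    ("CDH1", "S1", "Frame_Shift_Del"),
    ("TP53", "S2", "Silent"),             # removed by the filter
    ("MAP3K1", "S2", "Splice_Site"),
]
example_matrix, example_load = binary_mutation_matrix(example_rows)
```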
Gene expression signatures. A panel of 543 previously published gene expression signatures was used to fully characterize cancer phenotypes. These 543 signatures were obtained from multiple publications or GSEA49 and were partially summarized by Tanioka et al13. The complete list of genes in each signature and their references is shown in Supplementary Table 1. Signature scores were calculated in a manner consistent with their derivation. For 504 signatures with homogeneous expression across the genes, the median expression value was used as the signature score. The rest of the signatures were based on correlation to predetermined gene centroids or on published algorithms. For correlation-based signatures, all predetermined training sets are available to download through the GitHub repository (https://github.com/xyouli/DNA-based-predictors-of-non-genetic-cancer-phenotypes). For each such signature, DWD53 was used to first merge the gene expression matrix with the corresponding training set, and then Pearson correlation, Spearman correlation, or Euclidean distance was computed for each sample in the merged data. For several algorithm-based signatures, corresponding R code is provided to calculate each signature (see the COMPUTER PROGRAM LISTING Appendix). All 543 signatures were applied to TCGA breast cancer and lung cancer data as well as METABRIC data (see Supplementary Tables 10-12). The 504 median expression-based signatures were applied to Pan Cancer data.
Identification of gene signature-specific CNAs. To identify associations between CNAs and gene expression signatures, two independent statistical tests were used8 on the TCGA breast cancer cohort with matched gene expression and copy number data, excluding all Normal-like samples (n=1038). First, for each signature, each gene was tested for significant association in a model with molecular subtype taken into account as a confounding variable: signature ~ CNA + (1|Basal) + (1|HER2) + (1|LumA) + (1|LumB). Positive or negative correlation was determined by the coefficient of CNA and its p-value in the model. Second, a Fisher's exact test was used to compare the frequency of either CNA gain or loss between samples with a high signature score (top quartile) and those with a low signature score (bottom three quartiles). For each analysis, Benjamini-Hochberg multiple testing correction was used to adjust p values for each signature across all genes. The significance threshold was set to 0.01 to identify genes that were significant in both analyses.
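The multiple testing step above can be sketched in Python as follows. The function applies the standard Benjamini-Hochberg adjustment to one signature's per-gene p-values. The helper `significant_in_both` and the strict `<` comparison against 0.01 are assumptions, since the text does not state whether the threshold is inclusive. The example p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_in_both(q_model, q_fisher, threshold=0.01):
    """Flag genes whose adjusted p-values pass the threshold in both tests."""
    return [a < threshold and b < threshold for a, b in zip(q_model, q_fisher)]

# Hypothetical per-gene p-values for one signature (four genes).
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
```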
Building Elastic Net prediction models. An Elastic Net modeling approach, which is a regularized regression method that linearly combines the L1 penalty of the Least Absolute Shrinkage and Selection Operator (LASSO) and the L2 penalty of Ridge Regression, was used to build DNA CNA-based predictors of cancer phenotypes 9. Alternatively, other machine learning methods may be used to build the models. Examples of such other methods include, but are not limited to, LASSO: Tibshirani, Robert, 1996, “Regression shrinkage and selection via the lasso.” Journal of the Royal Statistical Society: Series B (Methodological) 58.1: 267-288; RIDGE: Hoerl, Arthur E., and Robert W. Kennard, 1970, “Ridge regression: Biased estimation for nonorthogonal problems.” Technometrics 12.1: 55-67; support vector machine: Cortes, Corinna, and Vladimir Vapnik, 1995, “Support-vector networks.” Machine Learning 20.3: 273-297; or deep learning: LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton, 2015, “Deep learning.” Nature 521.7553: 436. Generally, gene-level CNA scores were first collapsed to segment-level CNA scores. The complete list of genes in each segment is shown in Supplementary Table 2 18-22. Each segment score was calculated as the mean CNA score across genes within the segment. For each cancer phenotype, the total sample was split into a 70% training set and a 30% testing set (R package sampling), stratified by clinical variables: overall survival, gender, race, ER status, PR status, HER2 status, histological subtype, pathologic stage and molecular subtype when available. Models were built on the training set only; 200 rounds of Monte-Carlo cross validation (R package caret) were used to select the tuning parameters. Lambda values were selected over a range of alphas from 0.1 to 1 in steps of 0.1. The optimal parameter combination was the one with the best classification accuracy. The model with the optimal parameters was then applied to the testing set and other validation sets if available.
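For reference, one common way to write the Elastic Net objective for a binary outcome is shown below. It is a sketch that assumes a logistic loss and the glmnet-style parametrization, in which alpha mixes the two penalties and lambda sets the overall penalty strength. The text does not state which loss or parametrization was used. Here x_i is the vector of segment-level CNA scores for sample i and y_i is its 0/1 class label.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}
    \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
          - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  \;+\; \lambda \Bigl[\, \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2
          + \alpha\,\lVert \beta \rVert_1 \Bigr]
```

Under this parametrization, alpha = 1 reduces to the LASSO penalty and values near 0 approach Ridge Regression, so a grid of alphas from 0.1 to 1 spans mostly-Ridge to pure-LASSO models.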
Receiver operating characteristic (ROC) curves were constructed and area under the ROC curve (AUC) values were used to evaluate model performance. Phenotypes with AUC values above 0.75 are considered highly predictable.
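The AUC used as the performance measure can be computed directly from its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The Python sketch below illustrates that definition on invented labels and scores. It is not the routine used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical test-set labels (1 = high) and predicted scores.
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.4]
example_auc = roc_auc(example_labels, example_scores)
highly_predictable = example_auc > 0.75  # threshold stated in the text
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of the size described here. A rank-based formula would be preferable for much larger data.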
For predicting gene expression signatures, protein expression, and mutation load that had continuous scores, models were built to classify samples with high scores (top third) versus low scores (bottom two thirds). For molecular subtype, clinical receptor status, cancer histology and mutations that had binary outcomes, models were built to classify each outcome. For breast cancer gene expression signatures, Normal-like samples were excluded (n=1038) as in association tests described above. For somatic mutations, all IGH and HLA genes were removed and only genes that have mutation frequency greater than 5% and/or significantly mutated genes identified in 2015 TCGA Lobular Breast Cancer analysis34 were included (supplementary Table 15).
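The dichotomization of continuous scores into top third versus bottom two thirds can be sketched in Python as below. How the cut was placed when the sample count is not divisible by three, and how tied scores were handled, are not stated in the text. The sketch assumes the floor of n/3 highest-scoring samples form the high class and breaks ties by input order.

```python
def top_third_labels(scores):
    """Label the top third of samples 1 and the remaining samples 0."""
    n = len(scores)
    n_high = n // 3  # assumption: floor(n / 3) samples are called "high"
    # Indices sorted from highest to lowest score (stable for ties).
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    labels = [0] * n
    for i in ranked[:n_high]:
        labels[i] = 1
    return labels

# Hypothetical signature scores for six samples.
example_signature = [0.1, 0.9, 0.5, 0.7, 0.2, 0.3]
example_classes = top_third_labels(example_signature)
```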
For subtype-specific gene signature predictions, the same Elastic Net model approach was repeated within samples of a particular subtype, split into 70% training and 30% testing, and models were applied to METABRIC samples with the same subtype.
For gene signature and histology prediction using the non-small cell lung cancer data, the whole TCGA lung data set was used, combining both LUAD (n=512) and LUSC (n=498), which were split into training and testing sets balanced for clinical variables: overall survival, gender, pathological stage and histology (LUAD or LUSC). Models were built on the training set and applied to the testing set. Models built on the TCGA breast cancer training set were also applied to the whole lung data set. Models were also built within LUAD and LUSC separately.
For Pan Cancer gene expression signature predictions, analysis was limited to tumor types with at least 100 samples that had RNA, DNA and clinical data. 504 median expression-based gene signatures were applied to each tumor type. For each signature prediction in each tumor type, total sample was split into 70% training set and 30% testing set, balanced for gender, race and overall survival. Models were then built on training set and applied to testing set to get training and testing AUC values.
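A balanced 70/30 split of the kind described above can be approximated by splitting within each combination of clinical variables. The Python sketch below is a stand-in for illustration only; the study used the R package sampling. The function name, the seed handling, and the example strata are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(sample_ids, strata, train_fraction=0.7, seed=0):
    """Split samples into train/test, preserving proportions within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for sample_id, stratum in zip(sample_ids, strata):
        by_stratum[stratum].append(sample_id)
    train, test = [], []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Hypothetical cohort: strata are (gender, survival status) tuples.
example_ids = [f"sample_{i}" for i in range(20)]
example_strata = [("F", "alive")] * 10 + [("M", "dead")] * 10
example_train, example_test = stratified_split(example_ids, example_strata)
```

With many stratification variables some strata become very small, and rounding within each stratum can then shift the overall split away from 70/30.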
To examine the features selected by each prediction model, coefficients of CNA segments were re-mapped to genes within each segment and plotted. A summary of all Elastic Net models, including coefficients of CNA segments and AUC values, is reported in the Supplementary Tables.
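The re-mapping of segment coefficients to genes can be illustrated with the Python sketch below. Segment names, gene lists, and coefficient values are hypothetical. Segments with a zero coefficient are skipped on the assumption that only selected features are of interest, and no aggregation is attempted for genes that belong to more than one segment.

```python
def remap_to_genes(segment_coefficients, segment_genes):
    """List (gene, segment, coefficient) for each segment with a nonzero coefficient."""
    rows = []
    for segment, coefficient in sorted(segment_coefficients.items()):
        if coefficient == 0.0:
            continue  # segment was not selected by the model
        for gene in segment_genes.get(segment, []):
            rows.append((gene, segment, coefficient))
    return rows

# Hypothetical model output and segment annotation.
example_coefficients = {"seg_A": 0.42, "seg_B": 0.0, "seg_C": -0.17}
example_segment_genes = {
    "seg_A": ["ERBB2", "GRB7"],
    "seg_B": ["MYC"],
    "seg_C": ["TP53"],
}
gene_rows = remap_to_genes(example_coefficients, example_segment_genes)
```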
Data availability. The data sets generated and/or analyzed during the current study are available within the disclosure and its supplementary information tables. All raw and primary data come from TCGA and METABRIC public data repositories.
Applications to Differing Sources of Copy Number Alteration (CNA) Data.
Methods
Copy number alteration profiles determined by DNA exome sequencing were called by CNVkit for 1067 breast cancer samples from the TCGA breast cancer project50. A total of 536 segment values were calculated as described above. Elastic Net predictions for key gene expression signatures, clinical assays, clinical receptor statuses and molecular subtypes were made by applying the segment values from exome sequencing to existing models built from SNP array copy number data as described herein. Area under the receiver operating characteristic curve (AUC) values were calculated to reflect model performance.
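Applying an existing model to copy number data from a new platform involves two steps: collapsing gene-level values to segment means, then combining the segment scores with the stored coefficients. The Python sketch below shows both steps. It assumes a logistic model with an intercept, and all segment names, gene lists, and coefficient values are hypothetical.

```python
import math

def segment_scores(gene_cna, segment_genes):
    """Mean gene-level CNA per segment, using only genes with a measurement."""
    scores = {}
    for segment, genes in segment_genes.items():
        values = [gene_cna[g] for g in genes if g in gene_cna]
        if values:
            scores[segment] = sum(values) / len(values)
    return scores

def predict_probability(scores, intercept, coefficients):
    """Logistic prediction; segments without a score contribute zero (assumption)."""
    linear = intercept + sum(beta * scores.get(segment, 0.0)
                             for segment, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical exome-derived gene-level values and a stored model.
example_gene_cna = {"ERBB2": 2.0, "GRB7": 1.0, "TP53": -1.0}
example_annotation = {
    "seg_17q12": ["ERBB2", "GRB7"],
    "seg_17p13": ["TP53"],
    "seg_unmeasured": ["GENE_X"],
}
example_scores = segment_scores(example_gene_cna, example_annotation)
example_probability = predict_probability(
    example_scores, intercept=-2.0,
    coefficients={"seg_17q12": 2.0, "seg_17p13": 1.0},
)
```

Because the model only sees segment scores, any platform that yields gene-level copy number values on a comparable scale can feed the same coefficients. Whether scales are comparable across platforms has to be checked separately.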
Results
To investigate if the Elastic Net models can be called from DNA exome sequencing data, DNA exome sequencing determined copy number data was applied to models built from DNA SNP array data for representative highly predictable phenotypes. The results shown in
The following numbered statements provide a general description of the disclosure and are not intended to limit the appended claims.
Statement 1: A method of generating a calculated cancer signature for a sample from a patient which comprises: (a) obtaining, or having obtained, a sample from the patient; (b) measuring, or having measured, a plurality of copy number alterations (CNAs) over a plurality of locations on a plurality of chromosomes; and (c) analyzing the measured CNAs using a mathematical model based on mRNA expression data and molecular subtypes, wherein the mathematical model has been validated by at least two different statistical methods so as to generate the calculated cancer signature for the sample.
Statement 2: The method of Statement 1, wherein greater than 50 CNAs are measured.
Statement 3: The method of Statement 1, wherein greater than 100 CNAs are measured.
Statement 4: The method of Statement 1, wherein between about 250 and about 400 CNAs are measured.
Statement 5: The method of any of Statements 1-4, wherein the calculated cancer signature corresponds to a somatic mutation signature.
Statement 6: The method of Statement 5, wherein the mathematical model to prepare the somatic mutation signature is based on 10 or more beta-coefficient values in Supplemental Table 6.
Statement 7: The method of any of Statements 1-4, wherein the calculated cancer signature corresponds to an mRNA expression signature.
Statement 8: The method of Statement 7, wherein the calculated cancer signature is a signature of a breast cancer subtype.
Statement 9: The method of Statement 7, wherein the mathematical model to prepare the breast cancer subtype signature is based on 10 or more beta-coefficient values in Supplemental Table 4.
Statement 10: The method of any of Statements 1-4, wherein the calculated cancer signature corresponds to a protein expression signature.
Statement 11: The method of Statement 10, wherein the mathematical model to prepare the protein expression signature is based on 10 or more beta-coefficient values in Supplemental Table 5.
Statement 12: The method of Statement 10, wherein the protein expression signature is an immunohistochemistry (IHC) signature.
Statement 13: The method of Statement 12, wherein the IHC signature is an estrogen receptor (ER), an epidermal growth factor receptor (EGFR), a human epidermal growth factor receptor 2 (HER2), a progesterone receptor (PR), or a retinoblastoma (RB) signature.
Statement 14: The method of any of Statements 1-4, wherein the calculated cancer signature corresponds to a FoundationOne® CDX result, a MAMMAPRINT® 70-GENE recurrence score, an OncotypeDX™ recurrence score, or a Prosigna® risk of recurrence score.
Statement 15: The method of Statement 14, wherein the calculated cancer signature is a FoundationOne® result and the mathematical model to prepare the FoundationOne® result is based on 10 or more beta-coefficient values in Supplemental Table 9.
Statement 16: The method of any of Statements 1-4, wherein the calculated cancer signature is a bladder urothelial carcinoma (BLCA), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), colon adenocarcinoma (COAD), esophageal carcinoma (ESCA), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP), acute myeloid leukemia (LAML), brain lower grade glioma (LGG), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), pancreatic adenocarcinoma (PAAD), pheochromocytoma and paraganglioma (PCPG), prostate adenocarcinoma (PRAD), rectum adenocarcinoma (READ), sarcoma (SARC), skin cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), testicular germ cell tumors (TGCT), thyroid carcinoma (THCA), thymoma (THYM), or uterine corpus endometrial carcinoma (UCEC) signature.
Statement 17: The method of Statement 16, wherein the mathematical model to prepare the calculated signature is based on beta-coefficient values in Supplemental Table 14.
Statement 18: The method of any of Statements 1-17, wherein the plurality of copy number alterations (CNAs) are obtained from whole genome sequencing (WGS), whole exome sequencing (WES), or a combination thereof.
Statement 19: A method for treating a cancer patient with chemotherapy comprising the steps of: determining whether the patient has a specific cancer subtype by: (a) obtaining or having obtained a biological sample from the patient; (b) performing or having performed a gene level copy number alteration (CNA) assay on the biological sample wherein copy numbers are measured over a plurality of locations on a plurality of chromosomes; (c) comparing the results of the CNA assay to a set of standards to determine if the patient has a specific cancer subtype; and (d) if the patient has a specific cancer subtype, then administering a suitable chemotherapy regimen to the cancer patient based on the determined cancer subtype.
Statement 20: The method of Statement 19, wherein the chemotherapy regimen is an ongoing therapeutic intervention.
Statement 21: The method of Statement 20, wherein the ongoing therapeutic intervention comprises discontinuing a specific treatment.
Statement 22: The method of any of Statements 19-21, wherein the copy number alteration (CNA) assay is obtained from whole genome sequencing (WGS), whole exome sequencing (WES), or a combination thereof.
Statement 23: A method for generating a calculated cancer signature for a cancer phenotype, the method comprising: (a) receiving a plurality of gene expression signatures and subtype information for the cancer phenotype; (b) receiving a plurality of copy number alteration (CNA) data sets for the cancer phenotype; (c) analyzing the plurality of CNA data sets with an artificial intelligence algorithm to obtain a preliminary set of CNA segment level signatures for the cancer phenotype; (d) using a gene expression training set to revise the preliminary set of CNA segment level signatures and obtain a final set of CNA segment level signatures; and (e) using the final set of CNA segment level signatures to prepare the calculated cancer signature for the cancer phenotype.
Statement 24: The method of Statement 23, wherein the cancer phenotype is associated with a somatic mutation.
Statement 25: The method of Statement 23, wherein the cancer phenotype corresponds to a level of mRNA expression.
Statement 26: The method of Statement 23, wherein the cancer phenotype corresponds to a level of protein expression.
Statement 27: The method of Statement 26, wherein the level of protein expression corresponds to an immunohistochemistry (IHC) signature.
Statement 28: The method of any of Statements 23-26, wherein the plurality of copy number alteration (CNA) data sets are obtained from whole genome sequencing (WGS), whole exome sequencing (WES), or a combination thereof.
Statement 29: A method for generating a calculated cancer signature for a patient, the method comprising: (a) receiving copy number alteration (CNA) data for the patient; (b) receiving one or more CNA(s) signature(s) associated with a cancer phenotype, wherein the CNA signature is based on cancer expression analysis, cancer subtype information, and CNA gain/loss information; (c) processing the CNA data for the patient with an algorithm utilizing the one or more CNA(s) signature(s) associated with the cancer phenotype so as to characterize the properties of the CNA data for the patient relative to the one or more CNA(s) signature(s); and (d) preparing a calculated cancer signature for the patient.
Statement 30: The method of Statement 29, wherein the cancer phenotype is associated with a somatic mutation.
Statement 31: The method of Statement 29, wherein the cancer phenotype corresponds to a level of mRNA expression.
Statement 32: The method of Statement 29, wherein the cancer phenotype corresponds to a level of protein expression.
Statement 33: The method of Statement 32, wherein the level of protein expression corresponds to an immunohistochemistry (IHC) signature.
Statement 34: The method of Statement 29, wherein the cancer phenotype is associated with an adrenal gland, a bladder, a bone, a breast, a cervix, a colon, a liver, a lung, a lymph, an ovarian, a pancreas, a penis, a prostate, a rectal, a salivary gland, a skin, a spleen, a testicular, a thymus gland, a thyroid, a trachea, or a uterine cancer.
Statement 35: The method of Statement 34, wherein the cancer phenotype is associated with a breast cancer.
Statement 36: The method of any of Statements 29-35, wherein the copy number alteration (CNA) data are obtained from whole genome sequencing (WGS), whole exome sequencing (WES), or a combination thereof.
Statement 37: A method for treating a subject with cancer, comprising: (a) generating a calculated cancer signature for a patient comprising: (i) receiving copy number alteration (CNA) data for the patient; (ii) receiving one or more CNA(s) signature(s) associated with a cancer phenotype, wherein the CNA signature is based on cancer expression analysis, cancer subtype information, and CNA gain/loss information; (iii) processing the CNA data for the patient with an algorithm utilizing the one or more CNA(s) signature(s) associated with the cancer phenotype so as to characterize the properties of the CNA data for the patient relative to the one or more CNA(s) signature(s); (iv) preparing the calculated cancer signature for the patient based on the characterized properties; and (b) treating the patient based on a treatment plan based on the calculated cancer signature.
Statement 38: The method of Statement 37, wherein the treatment is an ongoing therapeutic intervention.
Statement 39: The method of Statement 38, wherein the ongoing therapeutic intervention comprises discontinuing a specific treatment.
Statement 40: The method of any of Statements 37-39, wherein the copy number alteration (CNA) data are obtained from whole genome sequencing (WGS), whole exome sequencing (WES), or a combination thereof.
Statement 41: A device comprising a processor configured to process the patient CNA data and the one or more CNA(s) signature(s) associated with the cancer phenotype with the algorithm to generate the calculated cancer signature for the patient of Statement 29.
Statement 42: A system comprising the device of Statement 41.
Statement 43: The device of Statement 41, comprising software that comprises an algorithm to compare the patient CNA data with the one or more CNA(s) signature(s) associated with the cancer phenotype.
Statement 44: The device of any of Statements 41-43, wherein the copy number alteration (CNA) data are obtained from whole genome sequencing (WGS), whole exome sequencing (WES), or a combination thereof.
It should be understood that the above description is only representative of illustrative embodiments and examples. For the convenience of the reader, the above description has focused on a limited number of representative examples of all possible embodiments, examples that teach the principles of the disclosure. The description has not attempted to exhaustively enumerate all possible variations or even combinations of those variations described. That alternate embodiments may not have been presented for a specific portion of the disclosure, or that further undescribed alternate embodiments may be available for a portion, is not to be considered a disclaimer of those alternate embodiments. One of ordinary skill will appreciate that many of those undescribed embodiments involve differences in technology and materials rather than differences in the application of the principles of the disclosure. Accordingly, the disclosure is not intended to be limited to less than the scope set forth in the following claims and equivalents.
All references, articles, publications, patents, patent publications, and patent applications cited herein are incorporated by reference in their entireties for all purposes. However, mention of any reference, article, publication, patent, patent publication, and patent application cited herein is not, and should not be taken as, an acknowledgment or any form of suggestion that they constitute valid prior art or form part of the common general knowledge in any country in the world. It is to be understood that, while the disclosure has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope. Other aspects, advantages, and modifications are within the scope of the claims set forth below. All publications, patents, and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.
9. COMPUTER PROGRAM
DNA-based-predictors-of-non-genetic-cancer-phenotypes
Analysis scripts and relative data used in the paper “Genetic determinants of the molecular portraits of epithelial cancers” by Xia et al.
Data
contains many .rda files ready to use in the analysis. May be found at http://github.com/xyouli/DNA-based-predictors-of-non-genetic-cancer-phenotypes
Rscripts
This application claims the benefit of U.S. Appn. No. 62/912,727 filed Oct. 9, 2019, Perou et al., Atty. Dkt. No. 150-32-PROV, which is hereby incorporated by reference in its entirety.
This invention was made with government support under Grant Numbers CA58223, CA148761 and CA195740 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US20/55093 | 10/9/2020 | WO |
Number | Date | Country | |
---|---|---|---|
62912727 | Oct 2019 | US |