Many cancer cells undergo continuous mutation that can help tumors evade therapeutic interventions and significantly impact disease progression. New and improved methods are needed to conclusively identify the various pathways that drive cancer mutagenesis, and to specifically quantify the extent to which different mutagenic pathways are active in tumor cells.
Tumor cell mutagenesis presents a complex web of DNA mutations with multiple pathways and targets implicated in mutagenesis. The present disclosure provides a strategy and approach for systematically identifying and quantifying the mutagenic contribution of a given mutagenic pathway (e.g., the APOBEC family of cytidine deaminases). Using this approach, one can define the mutagenic features and trends in a specific tumor cell type or cell line and track the time course of tumor mutagenesis as defined by which specific pathways are active at a given point in time. The observed data can then be cross-referenced with broader trends in clinical tumor progression and used to guide treatment and inform cancer diagnosis and prognosis. As disclosed herein, APOBEC3A, APOBEC3B, and REV1 have been identified as drivers of specific mutagenic signatures in tumor cells.
The increase in whole-genome sequencing throughput over the last decade has enabled systematic investigations into the patterns of somatic mutations in genomes (e.g., genomes of healthy cells and genomes of cancer cells) at high resolution. Such efforts revealed both unclustered and clustered mutations at cytosine bases commonly present in TCN (where N is any base) trinucleotide sequence contexts in human cancers (Nik-Zainal et al. 2012; Roberts et al. 2012). Previously recognized sequence preferences of the APOBEC3 family of cytidine deaminases, which target DNA and RNA of viruses and retroelements at TCN sequence contexts as part of the innate immune defense, led to the proposal that certain observed mutations may arise due to APOBEC3 off-target activity (Nik-Zainal et al. 2012; Roberts et al. 2012). The APOBEC3 family, which evolved to edit viral sequences, thus emerged as a putative double-edged sword that lingers in human cells and might sometimes erroneously unleash its mutagenic effects on the human genome.
Subsequent mathematical deconvolution of somatic mutational patterns across thousands of human cancer genomes led to the identification of APOBEC3-associated mutational signatures in more than 75% of cancer types and more than 50% of all cancer genomes analyzed to date, with a particular prominence in many cancers of unknown etiological origins, such as breast cancer, bladder cancer, lung adenocarcinoma, esophageal adenocarcinoma, and other cancers (Alexandrov et al. 2020; Alexandrov et al. 2013). Mutagenesis by APOBEC3 deaminases thus emerged as a putative source of one of the most prominent mutational processes in cancer.
Two mutational signatures, termed ‘SBS2’ and ‘SBS13,’ have been proposed to be caused by off-target (e.g., activity not associated with protection from viral infections) APOBEC3 activity (Alexandrov et al. 2020). SBS2 is characterized specifically by C>T base substitutions at TCN trinucleotides, while SBS13 is characterized by C>T, C>G, and C>A mutations at TCN motifs.
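The substitution classes described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the mapping simply restates the definitions given in the preceding paragraph; a single mutation cannot be assigned to a signature on its own, so the labels indicate only which signatures a substitution is compatible with. Substitutions reported at a purine reference base are first converted to the pyrimidine strand.

```python
# Illustrative sketch only: names are hypothetical, not part of the disclosure.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def compatible_signatures(trinucleotide, alt):
    """Label a substitution by the APOBEC3-associated signatures it fits.

    trinucleotide: 5' base, reference base, 3' base (e.g., "TCA").
    alt: the mutant base replacing the middle (reference) base.
    """
    context, alt = trinucleotide.upper(), alt.upper()
    if len(context) != 3 or alt not in COMPLEMENT:
        raise ValueError("expected a trinucleotide and a single alternate base")
    # Report every substitution relative to the pyrimidine (C or T) strand.
    if context[1] in "AG":
        context, alt = reverse_complement(context), COMPLEMENT[alt]
    if context[:2] != "TC" or alt == "C":
        return frozenset()  # not a cytosine substitution at a TCN context
    if alt == "T":
        return frozenset({"SBS2", "SBS13"})  # C>T at TCN
    return frozenset({"SBS13"})  # C>G or C>A at TCN
```

For example, a G>A change reported in a TGA reference context corresponds to a C>T change at TCA on the opposite strand and is therefore labeled as compatible with both signatures.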
The identity of the APOBEC3 deaminase(s) underlying somatic mutations in cancer is a major ongoing question in the field. Several studies revealed that APOBEC3B mRNA expression is upregulated in various cancers compared to the matched healthy tissues, and that it positively correlates with genome-wide APOBEC3 mutational burden (Burns et al. 2013; Roberts et al. 2013; Burns et al. 2013; de Bruin et al. 2014; Middlebrooks et al. 2016). Based on these insights, these studies concluded that APOBEC3B was the major contributor of mutations in breast and other cancers. The extent to which APOBEC3B generates mutations in cancer was put into question upon finding that a germline APOBEC3B deletion polymorphism associates with an elevated risk of breast cancer in East Asian populations and a higher overall burden of APOBEC3-associated mutations in breast cancers (Nik-Zainal et al. 2014; Long et al. 2013; Komatsu et al. 2008). The deletion polymorphism that effectively deletes the APOBEC3B sequence and fuses the APOBEC3A coding sequence to the 3′UTR of APOBEC3B was subsequently found to stabilize APOBEC3A mRNA (Caval et al. 2014). More recent refinements of expression-based analyses suggested that APOBEC3A is the only APOBEC3 whose expression correlates with APOBEC-induced mutational burden thus nominating it as a more prominent mutator relative to APOBEC3B (Cortez et al. 2019). Consistent with the possibility that APOBEC3A may be a more prominent mutator than APOBEC3B, examination of the extended sequence contexts at which mutations at TCN contexts occur in human cancer genomes revealed that pyrimidine(Y)-preceded TCN (YTCN) contexts preferred by APOBEC3A are more commonly mutated than purine(R)-preceded TCN (RTCN) motifs preferred by APOBEC3B in most cancers (Chan et al. 2015). 
Similarly, a minor portion of mutations at TCN contexts is often present at hairpin loops, which are hotspot sequences preferentially attacked by APOBEC3A, but not by APOBEC3B, in an in vitro biochemical assay (Buisson et al. 2019). Thus, whereas APOBEC3B was initially proposed as a major cause of mutational signatures in cancer, APOBEC3A has recently been highlighted as possibly a more relevant mutator. A role for APOBEC3H in cancer mutagenesis has also been proposed (Starrett et al. 2016). The identity of the APOBEC3 deaminase(s) responsible for clustered mutagenesis, kataegis, also remains cryptic. In support of a potential role for APOBEC3B, endogenous APOBEC3B was recently identified as the source of kataegis in a non-tumorigenic cell model of telomere crisis (Maciejowski et al. 2015; Maciejowski et al. 2020).
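The extended-context comparison discussed above (pyrimidine-preceded YTCN versus purine-preceded RTCN) can be sketched as a simple tally. The function names are hypothetical, and the input is assumed to be a list of four-base contexts already oriented to the pyrimidine strand, with the mutated cytosine at the third position.

```python
# Illustrative sketch only: names and input format are assumptions.
from collections import Counter


def extended_context_class(tetranucleotide):
    """Classify a 4-base context as 'YTCN', 'RTCN', or None.

    Position 0 is the base preceding the TCN motif; position 2 is the
    mutated cytosine.
    """
    context = tetranucleotide.upper()
    if len(context) != 4 or context[1:3] != "TC" or context[3] not in "ACGT":
        return None
    if context[0] in "CT":
        return "YTCN"  # pyrimidine-preceded
    if context[0] in "AG":
        return "RTCN"  # purine-preceded
    return None


def tally_extended_contexts(contexts):
    """Count YTCN and RTCN contexts, ignoring anything that is not TCN."""
    classes = (extended_context_class(c) for c in contexts)
    return Counter(c for c in classes if c is not None)
```

A higher YTCN count relative to RTCN in such a tally would be consistent with, but not proof of, the APOBEC3A preference reported in the cited studies.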
Speculations regarding the contributions of individual APOBEC3 enzymes and subsequent DNA repair and replication mechanisms that contribute different clustered and unclustered APOBEC3-associated mutations in cancer are supported by association-based rather than causal links (Petljak et al. 2019; Petljak and Maciejowski 2020; Granadillo Rodriguez et al. 2020; Green and Weitzman 2019). Experimental confirmation that APOBEC3 deaminases are indeed the mutators in cancer and identification of the relevant mutator is critical to pursuing the proposed therapeutic interventions based on modulating activities of the cryptic mutator APOBEC3 deaminase (Olson et al. 2018; Venkatesan et al. 2018; Swanton et al. 2015; Driscoll et al. 2020; Law et al. 2016; Green et al. 2017; Nikkilä et al. 2017; Buisson et al. 2017). Furthermore, the ability to investigate the mechanisms of APOBEC3 misregulation in cancer, which remain entirely unknown and critical to understanding the source of a large proportion of mutations in many cancers of unknown origins, depends on defining the relevant mutator enzymes.
Progress has been hindered by differences between the human and murine APOBEC3 loci and the lack of genetically amenable human cancer cell models with naturally occurring APOBEC3-associated mutagenesis (Petljak and Maciejowski 2020). APOBEC3-associated mutational signatures were recently found to continue to be generated in many human cell lines from cancer cell lineages exposed to APOBEC3-associated mutagenesis in the past (Petljak et al. 2019). Unlike other mutational signatures that were acquired continuously over time, APOBEC3-associated mutations were generated in episodic bursts, suggesting that the underlying misregulation occurs intermittently rather than continuously in cancer cells. The episodic nature of APOBEC3-associated mutations was subsequently observed in primary human tissues (Lawson et al. 2020; Yoshida et al. 2020). Cell lines with active episodic APOBEC3-associated mutagenesis are thus expected to retain patterns of regulation operative in primary cancers. The individual roles of APOBEC3A and APOBEC3B enzymes in generating both APOBEC3-associated clustered and unclustered mutations are disclosed herein, as well as the roles of base excision repair (BER) components in generating the APOBEC3-associated mutations in human breast and lymphoma cancer cells.
Accordingly, the present disclosure provides methods for treating cancer in a subject, methods of diagnosing cancer in a subject, methods of determining prognosis of cancer in a subject, methods of tracking mutagenesis induced by a gene of interest, and methods of screening for inhibitors and synthetic lethalities. The present disclosure also provides cell lines and antibodies.
In one aspect, the present disclosure provides methods for treating cancer in a subject in need thereof with an agent. The methods may comprise using the agent to inhibit an APOBEC protein in the subject. In some embodiments, the method comprises inhibiting APOBEC3A. In some embodiments, the method comprises inhibiting APOBEC3B. The methods may also comprise using the agent to inhibit REV1 in a subject (for example, as disclosed in Chatterjee et al., Proc. Nat. Acad. Sci. U.S.A. 2020, 117(46), 28918-28921), since REV1 has been shown to play a role in the generation of APOBEC3-induced non-clustered signatures SBS2 and SBS13, as well as clustered kataegis and omikli events in cancer cell genomes, as described herein. The agent used in the methods described herein may be a small molecule, a protein (e.g., an antibody or fragment thereof), a peptide, or a nucleic acid. In some embodiments, the agent is an mRNA, an antisense RNA, an miRNA, an siRNA, an RNA aptamer, a double stranded RNA (dsRNA), a short hairpin RNA (shRNA), or an antisense oligonucleotide (ASO). In certain embodiments, the agent is an siRNA.
In another aspect, the present disclosure provides methods of identifying a subject in need of a treatment for cancer who is likely to respond to an APOBEC3A inhibitor. The methods may comprise (i) taking a biological sample (e.g., tumor biopsy) from the subject; and (ii) determining whether a mutational signature induced by APOBEC3A is present in the sample. The subject is likely to respond to an APOBEC3A inhibitor if a mutation induced by APOBEC3A is present in the sample. In some embodiments, determining whether a mutation induced by APOBEC3A is present in the sample is accomplished by performing whole-genome sequencing. In some embodiments, determining whether the mutational signature is present is accomplished by whole exome sequencing. Determining whether a mutation induced by APOBEC3A is present in the sample may also be accomplished by (i) providing a set of primers comprising a first primer and a second primer to the sample, wherein the first primer binds to a region of the genome upstream of a mutation induced by APOBEC3A and the second primer binds to a region of the genome downstream of the mutation induced by APOBEC3A; (ii) amplifying the region of the genome between the first primer and the second primer; and (iii) sequencing the amplified region of the genome.
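The primer-based approach described above can be sketched in silico as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the primers are assumed to match the reference exactly, and the sequences used with it would be placeholders rather than sequences from the disclosure. The check confirms that the first primer binds upstream of the mutation site, the second primer binds downstream of it, and returns the region that would be amplified and sequenced.

```python
# Illustrative sketch only: exact-match primer placement on a reference string.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def amplicon_spanning(reference, first_primer, second_primer, mutation_pos):
    """Return the amplified region if the primers flank mutation_pos.

    reference: top-strand sequence (5' to 3').
    first_primer: matches the top strand upstream of the mutation.
    second_primer: binds the top strand downstream of the mutation, so its
        reverse complement appears in the reference.
    mutation_pos: 0-based index of the mutated base in the reference.
    Returns None when either primer is absent or the site is not flanked.
    """
    reference = reference.upper()
    first_start = reference.find(first_primer.upper())
    second_site = reverse_complement(second_primer.upper())
    second_start = reference.find(second_site)
    if first_start == -1 or second_start == -1:
        return None
    first_end = first_start + len(first_primer)
    if not (first_end <= mutation_pos < second_start):
        return None
    return reference[first_start:second_start + len(second_site)]
```

In practice, primer design would also account for melting temperature, specificity across the genome, and amplicon length, none of which this sketch attempts.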
In some embodiments, the mutations induced by APOBEC3A are single base substitutions (SBS). Single base substitutions induced by APOBEC3A include, but are not limited to, SBS1, SBS2, SBS5, SBS8_18_36, and SBS13 as defined, for example, in Petljak, M. et al. Cell 2019, 176(6), 1282-1294 and Alexandrov et al., Nature 2020, 578(7793), 94-101.
In another aspect, the present disclosure provides methods of tracking mutagenesis induced by a gene of interest (e.g., APOBEC3A, APOBEC3B, REV1) in a population of cells over time. Such methods may comprise the following steps:
Knocking out a gene of interest may be accomplished by various genetic methods. In some embodiments, knocking out a gene of interest is accomplished by transfecting a cell from a population of cells with a vector encoding a nuclease. In certain embodiments, the nuclease is a CRISPR-associated nuclease (e.g., Cas9). The vector may also encode a guide RNA (gRNA). In some embodiments, the vector encodes a gRNA, wherein the sequence of a portion of the gRNA is complementary to a portion of the gene of interest.
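Selection of a gRNA target within a gene of interest can be sketched as a scan for candidate protospacers. The sketch below assumes, as is not stated in the disclosure, a 20-nucleotide spacer and the NGG protospacer adjacent motif (PAM) recognized by Streptococcus pyogenes Cas9; the function name is hypothetical, and only the given strand is scanned.

```python
# Illustrative sketch only: the 20-nt spacer length and NGG PAM are assumptions
# corresponding to S. pyogenes Cas9, not requirements of the disclosure.
def find_protospacers(gene_sequence, spacer_length=20):
    """List (start, protospacer, pam) tuples found on the given strand.

    A candidate is any window of spacer_length bases immediately followed
    by a three-base PAM whose last two bases are 'GG'.
    """
    seq = gene_sequence.upper()
    candidates = []
    for start in range(len(seq) - spacer_length - 2):
        pam = seq[start + spacer_length:start + spacer_length + 3]
        if pam[1:] == "GG":
            candidates.append((start, seq[start:start + spacer_length], pam))
    return candidates
```

A practical design workflow would additionally scan the reverse strand and screen candidates for off-target matches elsewhere in the genome.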
The present disclosure contemplates the use of the methods disclosed herein for any gene of interest. In some embodiments, the gene of interest is an APOBEC deaminase (e.g., APOBEC3A or APOBEC3B). In certain embodiments, the gene of interest is APOBEC3A. In some embodiments, the gene of interest is REV1.
In another aspect, the present disclosure provides cancer cell lines comprising a population of knockout (KO) cells. The cells may comprise an APOBEC protein KO. In some embodiments, the cancer cell line comprises a population of APOBEC3A KO cells. In some embodiments, the cancer cell line comprises a population of APOBEC3B KO cells. In some embodiments, the cancer cell line comprises a population of REV1 KO cells.
The cancer cell lines of the present disclosure comprise various cell types including, but not limited to, bladder cancer cells, cervical cancer cells, lung cancer cells, head and neck cancer cells, breast cancer cells, esophageal cancer cells, lymphoma cells, oral squamous cell carcinoma cells, uterine cancer cells, ovarian adenocarcinoma cells, pancreatic adenocarcinoma cells, stomach adenocarcinoma cells, or biliary adenocarcinoma cells. In some embodiments, the cells are breast cancer cells (e.g., derived from the human breast cancer cell line BT-474 or MDA-MB-453). In some embodiments, the cells are lymphoma cells (e.g., derived from the human B cell lymphoma cancer cell line BC-1 or JSC-1). In some embodiments, the cells are derived from a sample taken from a patient (e.g., a cancer patient's tumor).
The methods described herein may also be used to track mutagenesis induced by a gene of interest over time.
Another aspect of the present disclosure provides isolated monoclonal antibodies generated from APOBEC protein peptide sequences, e.g., the N-terminal amino acids of APOBEC3A, such as the peptide sequence: MEASPASGPRHLMDPHIFTSNFNNGIGRH (SEQ ID NO: 1). In some embodiments, the antibody is a mouse monoclonal antibody. In some embodiments, the antibody is a humanized antibody. In certain embodiments, the antibody is an anti-APOBEC3A/B/G antibody. In certain embodiments, the antibody is an anti-APOBEC3A antibody.
In another aspect, the present disclosure provides methods for screening for inhibitors of an APOBEC protein (e.g., APOBEC3A or APOBEC3B). The methods may comprise (i) propagating a population of cells in the presence and absence of a candidate APOBEC3A inhibitor; and (ii) determining whether the frequency of a mutational signature induced by APOBEC3A is reduced in the presence of the candidate APOBEC3A inhibitor. The mutational signature induced by APOBEC3A may be single base substitutions (SBS). For example, single base substitutions include, but are not limited to, SBS1, SBS2, SBS5, SBS8_18_36, and SBS13.
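Step (ii) above, determining whether the frequency of a mutational signature is reduced in the presence of a candidate inhibitor, can be illustrated with a small exact permutation test. This is a sketch under stated assumptions: the inputs are hypothetical per-clone counts of signature-attributed mutations, the function name is illustrative, and the choice of a one-sided permutation test is an example rather than the analysis used in the disclosure.

```python
# Illustrative sketch only: hypothetical per-clone counts, exact permutation test.
from itertools import combinations
from statistics import mean


def reduction_p_value(untreated_counts, treated_counts):
    """One-sided exact permutation p-value for a reduction under treatment.

    Enumerates every way of relabeling the pooled clones and reports the
    fraction of relabelings whose difference in means (untreated minus
    treated) is at least as large as the observed difference.
    """
    untreated = list(untreated_counts)
    treated = list(treated_counts)
    observed = mean(untreated) - mean(treated)
    pooled = untreated + treated
    indices = range(len(pooled))
    total = 0
    as_extreme = 0
    for chosen in combinations(indices, len(untreated)):
        chosen_set = set(chosen)
        group_u = [pooled[i] for i in chosen]
        group_t = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if mean(group_u) - mean(group_t) >= observed:
            as_extreme += 1
    return as_extreme / total
```

With three clones per condition there are only 20 possible relabelings, so the smallest attainable p-value is 0.05; larger numbers of clones would be needed for a more sensitive screen.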
Another aspect of the present disclosure provides methods of screening for inhibitors of DNA repair protein REV1 (referred to hereinafter as “REV1”). The methods may comprise (i) propagating a population of cells in the presence and absence of a candidate REV1 inhibitor; and (ii) determining whether the frequency of a mutational signature induced by REV1 is reduced in the presence of the candidate REV1 inhibitor. In some embodiments, the mutations induced by REV1 are single base substitutions (e.g., any of the single base substitutions disclosed herein).
In another aspect, the present disclosure provides methods for screening for a synthetic lethality associated with active APOBEC3A comprising propagating a population of WT cells and a population of APOBEC3A KO cells in the presence of an agent capable of inhibiting the activity of a gene of interest. A synthetic lethality is identified when the population of WT cells is able to propagate in the presence of the agent, and the population of APOBEC3A KO cells is not able to propagate in the presence of the agent. The agent may be an inhibitor of a gene of interest. In some embodiments, the inhibitor is a small molecule inhibitor. In some embodiments, the inhibitor is an siRNA inhibitor. In certain embodiments, the agent is a Cas9 nuclease associated with a gRNA, wherein the sequence of a portion of the gRNA is complementary to a portion of the gene of interest.
In other aspects, the present disclosure also provides reagents for performing any of the methods described herein. In some embodiments, the reagents for performing any one of the methods disclosed herein are provided as part of a kit. In some embodiments, the kit further comprises instructions for performing one of the methods disclosed herein. Primers and vectors for performing the methods disclosed herein are also provided by the present disclosure.
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present disclosure, which can be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
A “subject” to which administration is contemplated refers to a human (i.e., male or female of any age group, e.g., pediatric subject (e.g., infant, child, or adolescent) or adult subject (e.g., young adult, middle-aged adult, or senior adult)) or non-human animal. In certain embodiments, the non-human animal is a mammal (e.g., primate (e.g., cynomolgus monkey or rhesus monkey), commercially relevant mammal (e.g., cattle, pig, horse, sheep, goat, cat, or dog), or bird (e.g., commercially relevant bird, such as chicken, duck, goose, or turkey)). In certain embodiments, the non-human animal is a fish, reptile, or amphibian. The non-human animal may be a male or female at any stage of development. The non-human animal may be a transgenic animal or genetically engineered animal. The term “patient” refers to a human subject in need of treatment of a disease.
The term “administer,” “administering,” or “administration” refers to implanting, absorbing, ingesting, injecting, inhaling, or otherwise introducing an agent or inhibitor as described herein, in or on a subject.
The terms “treatment,” “treat,” and “treating” refer to reversing, alleviating, delaying the onset of, or inhibiting the progress of a disease described herein. In some embodiments, treatment may be administered after one or more signs or symptoms of the disease have developed or have been observed. In other embodiments, treatment may be administered in the absence of signs or symptoms of the disease. For example, treatment may be administered to a susceptible subject prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of exposure to a pathogen). Treatment may also be continued after symptoms have resolved, for example, to delay or prevent recurrence.
As used herein the term “inhibit” or “inhibition” in the context of enzymes, for example, in the context of APOBEC3A, APOBEC3B, or REV1, refers to a reduction in the activity of the enzyme. In some embodiments, the term refers to a reduction of the level of enzyme activity, e.g., APOBEC3A, APOBEC3B, or REV1 activity, to a level that is statistically significantly lower than an initial level, which may, for example, be a baseline level of enzyme activity. In some embodiments, the term refers to a reduction of the level of enzyme activity, e.g., APOBEC3A, APOBEC3B, or REV1 activity, to a level that is less than 75%, less than 50%, less than 40%, less than 30%, less than 25%, less than 20%, less than 10%, less than 9%, less than 8%, less than 7%, less than 6%, less than 5%, less than 4%, less than 3%, less than 2%, less than 1%, less than 0.5%, less than 0.1%, less than 0.01%, less than 0.001%, or less than 0.0001% of an initial level, which may, for example, be a baseline level of enzyme activity.
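The percentage-based definition above amounts to a simple ratio of measured activity to an initial (e.g., baseline) level. A minimal sketch follows; the function names and the default threshold are illustrative assumptions, and the activity units are arbitrary so long as both measurements use the same assay.

```python
# Illustrative sketch only: names and the default threshold are assumptions.
def residual_activity_percent(baseline_activity, measured_activity):
    """Express measured enzyme activity as a percentage of the baseline."""
    if baseline_activity <= 0:
        raise ValueError("baseline activity must be positive")
    return 100.0 * measured_activity / baseline_activity


def meets_inhibition_threshold(baseline_activity, measured_activity,
                               threshold_percent=50.0):
    """True when residual activity is below threshold_percent of baseline."""
    residual = residual_activity_percent(baseline_activity, measured_activity)
    return residual < threshold_percent
```

For instance, a measured activity of 30 units against a baseline of 200 units corresponds to a residual activity of 15%, which satisfies the "less than 25%" level recited above but not the "less than 10%" level.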
The term “sample” or “biological sample” refers to any sample including tissue samples (such as tissue sections and needle biopsies of a tissue); cell samples (e.g., cytological smears (such as Pap or blood smears) or samples of cells obtained by microdissection); samples of whole organisms (such as samples of yeasts or bacteria); or cell fractions, fragments or organelles (such as obtained by lysing cells and separating the components thereof by centrifugation or otherwise). Other examples of biological samples include blood, serum, urine, semen, fecal matter, cerebrospinal fluid, interstitial fluid, mucous, tears, sweat, pus, biopsied tissue (e.g., obtained by a surgical biopsy or needle biopsy), nipple aspirates, milk, vaginal fluid, saliva, swabs (such as buccal swabs), or any material containing biomolecules that is derived from a first biological sample. A sample may be taken from a subject, e.g., for diagnostic purposes.
The term “cancer” refers to a class of diseases characterized by the development of abnormal cells that proliferate uncontrollably and have the ability to infiltrate and destroy normal body tissues. See, e.g., Stedman's Medical Dictionary, 25th ed.; Hensyl ed.; Williams & Wilkins: Philadelphia, 1990. Exemplary cancers include, but are not limited to, acoustic neuroma; adenocarcinoma; adrenal gland cancer; anal cancer; angiosarcoma (e.g., lymphangiosarcoma, lymphangioendotheliosarcoma, hemangiosarcoma); appendix cancer; benign monoclonal gammopathy; biliary cancer (e.g., cholangiocarcinoma); bladder cancer; breast cancer (e.g., adenocarcinoma of the breast, papillary carcinoma of the breast, mammary cancer, medullary carcinoma of the breast); brain cancer (e.g., meningioma, glioblastomas, glioma (e.g., astrocytoma, oligodendroglioma), medulloblastoma); bronchus cancer; carcinoid tumor; cervical cancer (e.g., cervical adenocarcinoma); choriocarcinoma; chordoma; craniopharyngioma; colorectal cancer (e.g., colon cancer, rectal cancer, colorectal adenocarcinoma); connective tissue cancer; epithelial carcinoma; ependymoma; endotheliosarcoma (e.g., Kaposi's sarcoma, multiple idiopathic hemorrhagic sarcoma); endometrial cancer (e.g., uterine cancer, uterine sarcoma); esophageal cancer (e.g., adenocarcinoma of the esophagus, Barrett's adenocarcinoma); Ewing's sarcoma; ocular cancer (e.g., intraocular melanoma, retinoblastoma); familial hypereosinophilia; gall bladder cancer; gastric cancer (e.g., stomach adenocarcinoma); gastrointestinal stromal tumor (GIST); germ cell cancer; head and neck cancer (e.g., head and neck squamous cell carcinoma, oral cancer (e.g., oral squamous cell carcinoma), throat cancer (e.g., laryngeal cancer, pharyngeal cancer, nasopharyngeal cancer, oropharyngeal cancer)); hematopoietic cancers (e.g., leukemia such as acute lymphocytic leukemia (ALL) (e.g., B-cell ALL, T-cell ALL), acute myelocytic leukemia (AML) (e.g., B-cell AML, T-cell AML), chronic myelocytic leukemia (CML) (e.g., B-cell CML, T-cell CML), and chronic lymphocytic leukemia (CLL) (e.g., B-cell CLL, T-cell CLL)); lymphoma such as Hodgkin lymphoma (HL) (e.g., B-cell HL, T-cell HL) and non-Hodgkin lymphoma (NHL) (e.g., B-cell NHL such as diffuse large cell lymphoma (DLCL) (e.g., diffuse large B-cell lymphoma), follicular lymphoma, chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL), mantle cell lymphoma (MCL), marginal zone B-cell lymphomas (e.g., mucosa-associated lymphoid tissue (MALT) lymphomas, nodal marginal zone B-cell lymphoma, splenic marginal zone B-cell lymphoma), primary mediastinal B-cell lymphoma, Burkitt lymphoma, lymphoplasmacytic lymphoma (i.e., Waldenström's macroglobulinemia), hairy cell leukemia (HCL), immunoblastic large cell lymphoma, precursor B-lymphoblastic lymphoma and primary central nervous system (CNS) lymphoma; and T-cell NHL such as precursor T-lymphoblastic lymphoma/leukemia, peripheral T-cell lymphoma (PTCL) (e.g., cutaneous T-cell lymphoma (CTCL) (e.g., mycosis fungoides, Sezary syndrome), angioimmunoblastic T-cell lymphoma, extranodal natural killer T-cell lymphoma, enteropathy type T-cell lymphoma, subcutaneous panniculitis-like T-cell lymphoma, and anaplastic large cell lymphoma); a mixture of one or more leukemia/lymphoma as described above; and multiple myeloma (MM)), heavy chain disease (e.g., alpha chain disease, gamma chain disease, mu chain disease); hemangioblastoma; hypopharynx cancer; inflammatory myofibroblastic tumors; immunocytic amyloidosis; kidney cancer (e.g., nephroblastoma a.k.a. Wilms' tumor, renal cell carcinoma); liver cancer (e.g., hepatocellular cancer (HCC), malignant hepatoma); lung cancer (e.g., bronchogenic carcinoma, small cell lung cancer (SCLC), non-small cell lung cancer (NSCLC), adenocarcinoma of the lung); leiomyosarcoma (LMS); mastocytosis (e.g., systemic mastocytosis); muscle cancer; myelodysplastic syndrome (MDS); mesothelioma; myeloproliferative disorder (MPD) (e.g., polycythemia vera (PV), essential thrombocytosis (ET), agnogenic myeloid metaplasia (AMM) a.k.a. myelofibrosis (MF), chronic idiopathic myelofibrosis, chronic myelocytic leukemia (CML), chronic neutrophilic leukemia (CNL), hypereosinophilic syndrome (HES)); neuroblastoma; neurofibroma (e.g., neurofibromatosis (NF) type 1 or type 2, schwannomatosis); neuroendocrine cancer (e.g., gastroenteropancreatic neuroendocrine tumor (GEP-NET), carcinoid tumor); osteosarcoma (e.g., bone cancer); ovarian cancer (e.g., cystadenocarcinoma, ovarian embryonal carcinoma, ovarian adenocarcinoma); papillary adenocarcinoma; pancreatic cancer (e.g., pancreatic adenocarcinoma, intraductal papillary mucinous neoplasm (IPMN), islet cell tumors); penile cancer (e.g., Paget's disease of the penis and scrotum); pinealoma; primitive neuroectodermal tumor (PNT); plasma cell neoplasia; paraneoplastic syndromes; intraepithelial neoplasms; prostate cancer (e.g., prostate adenocarcinoma); rectal cancer; rhabdomyosarcoma; salivary gland cancer; skin cancer (e.g., squamous cell carcinoma (SCC), keratoacanthoma (KA), melanoma, basal cell carcinoma (BCC)); small bowel cancer (e.g., appendix cancer); soft tissue sarcoma (e.g., malignant fibrous histiocytoma (MFH), liposarcoma, malignant peripheral nerve sheath tumor (MPNST), chondrosarcoma, fibrosarcoma, myxosarcoma); sebaceous gland carcinoma; small intestine cancer; sweat gland carcinoma; synovioma; testicular cancer (e.g., seminoma, testicular embryonal carcinoma); thyroid cancer (e.g., papillary carcinoma of the thyroid, papillary thyroid carcinoma (PTC), medullary thyroid cancer); urethral cancer; vaginal cancer; and vulvar cancer (e.g., Paget's disease of the vulva). In some embodiments, the cancer is bladder cancer, cervical cancer, lung cancer, head and neck cancer, breast cancer, esophageal cancer, lymphoma, oral squamous cell carcinoma, uterine cancer, ovarian adenocarcinoma, pancreatic adenocarcinoma, stomach adenocarcinoma, or biliary adenocarcinoma. In certain embodiments, the cancer is lung cancer. In certain embodiments, lung cancer is lung adenocarcinoma or squamous cell carcinoma. In certain embodiments, the cancer is breast cancer. In certain embodiments, the cancer is B cell lymphoma.
The term “gene” or “gene of interest” refers to a nucleic acid fragment that expresses a protein, including regulatory sequences preceding (5′ non-coding sequences) and following (3′ non-coding sequences) the coding sequence. “Native gene” refers to a gene as found in nature with its own regulatory sequences. “Chimeric gene” or “chimeric construct” refers to any gene or a construct, not a native gene, comprising regulatory and coding sequences that are not found together in nature. Accordingly, a chimeric gene or chimeric construct may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. “Endogenous gene” refers to a native gene in its natural location in the genome of an organism. A “foreign” gene refers to a gene not normally found in the host organism, but which is introduced into the host organism by gene transfer. Foreign genes can comprise native genes inserted into a non-native organism, or chimeric genes. A “transgene” is a gene that has been introduced into the genome by a transformation procedure.
An “antibody” refers to a glycoprotein belonging to the immunoglobulin superfamily. The terms antibody and immunoglobulin are used interchangeably. With some exceptions, mammalian antibodies are typically made of basic structural units each with two large heavy chains and two small light chains. There are several different types of antibody heavy chains, and several different kinds of antibodies, which are grouped together into different isotypes based on which heavy chain they possess. Five different antibody isotypes are known in mammals (IgG, IgA, IgE, IgD, and IgM), which perform different roles and help direct the appropriate immune response for each different type of foreign object they encounter. The term “antibody” as used herein also encompasses antibody fragments and nanobodies, as well as variants of antibodies and variants of antibody fragments and nanobodies.
“Small molecules” include molecules, whether naturally occurring or artificially created (e.g., via chemical synthesis) that have a relatively low molecular weight. Typically, a small molecule is an organic compound (e.g., it contains carbon). The small molecule may contain multiple carbon-carbon bonds, stereocenters, and other functional groups (e.g., amines, hydroxyl, carbonyls, and heterocyclic rings, etc.). In certain embodiments, the molecular weight of a small molecule is not more than about 1,000 g/mol, not more than about 900 g/mol, not more than about 800 g/mol, not more than about 700 g/mol, not more than about 600 g/mol, not more than about 500 g/mol, not more than about 400 g/mol, not more than about 300 g/mol, not more than about 200 g/mol, or not more than about 100 g/mol. In certain embodiments, the molecular weight of a small molecule is at least about 100 g/mol, at least about 200 g/mol, at least about 300 g/mol, at least about 400 g/mol, at least about 500 g/mol, at least about 600 g/mol, at least about 700 g/mol, at least about 800 g/mol, or at least about 900 g/mol, or at least about 1,000 g/mol. Combinations of the above ranges (e.g., at least about 200 g/mol and not more than about 500 g/mol) are also possible. In certain embodiments, the small molecule is a therapeutically active agent such as a drug (e.g., a molecule approved by the U.S. Food and Drug Administration as provided in the Code of Federal Regulations (C.F.R.)). The small molecule may also be complexed with one or more metal atoms and/or metal ions. In this instance, the small molecule is also referred to as a “small organometallic molecule.” Preferred small molecules are biologically active in that they produce a biological effect in animals, preferably mammals, more preferably humans. Small molecules include, but are not limited to, radionuclides and imaging agents. In certain embodiments, the small molecule is a drug. 
Preferably, though not necessarily, the drug is one that has already been deemed safe and effective for use in humans or animals by the appropriate governmental agency or regulatory body. For example, drugs approved for human use are listed by the FDA under 21 C.F.R. §§ 330.5, 331 through 361, and 440 through 460, incorporated herein by reference; drugs for veterinary use are listed by the FDA under 21 C.F.R. §§ 500 through 589, incorporated herein by reference. All listed drugs are considered acceptable for use in accordance with the present invention.
A “protein,” “peptide,” or “polypeptide” comprises a polymer of amino acid residues linked together by peptide bonds. The term refers to proteins, polypeptides, and peptides of any size, structure, or function. Typically, a protein will be at least three amino acids long. A protein may refer to an individual protein or a collection of proteins. Inventive proteins preferably contain only natural amino acids, although non-natural amino acids (i.e., compounds that do not occur in nature but that can be incorporated into a polypeptide chain) and/or amino acid analogs as are known in the art may alternatively be employed. Also, one or more of the amino acids in a protein may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation or functionalization, or other modification. A protein may also be a single molecule or may be a multi-molecular complex. A protein may be a fragment of a naturally occurring protein or peptide. A protein may be naturally occurring, recombinant, synthetic, or any combination of these.
Cytidine deaminases are enzymes involved in pyrimidine salvaging. Cytidine deaminases catalyze the irreversible hydrolytic deamination of cytidine and deoxycytidine to uridine and deoxyuridine, respectively. The majority of cytidine deaminases act on RNA, while a few act on DNA (e.g., single stranded DNA). APOBEC proteins are a family of cytidine deaminases. APOBEC proteins include, but are not limited to, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and activation-induced deaminase (AID).
The human protein APOBEC3A consists of the amino acid sequence:
The human protein APOBEC3B consists of the amino acid sequence:
The human protein REV1 is a DNA repair protein. REV1 is a Y family DNA polymerase and is sometimes referred to as a deoxycytidyl transferase because it inserts deoxycytidine across from lesions. REV1 uses an arginine as a template, which complements well with cytidine, and thus always adds a cytidine, no matter the nucleotide present at the abasic site. REV1 is thought to play a role in recruiting other TLS proteins. As described herein, REV1 has also been shown to play a role in the generation of APOBEC3-induced non-clustered signatures SBS2 and SBS13, as well as clustered kataegis and omikli events in cancer cell genomes.
Human REV1 consists of the amino acid sequence:
The APOBEC3 family of cytidine deaminases has emerged as a major putative source of somatic mutations in cancer. However, a lack of appropriate experimental models has hindered establishment of causal links between the activities of individual APOBEC3 enzymes and mutations in cancer cells, leaving the major mutator debatable and the mechanisms underlying different APOBEC3-attributed mutational signatures unknown. To test the long-postulated hypothesis pertaining to APOBEC3 mutagenesis in cancer, candidate APOBEC3 genes were deleted from cancer cell lines that naturally generate APOBEC3-associated mutations in episodic bursts. Deletion of the APOBEC3A paralog severely diminished the acquisition of mutations of speculative APOBEC3 origins in two breast cancer and two lymphoma cell lines, while increased APOBEC3 mutational burdens were observed in APOBEC3B knockout cell lines. APOBEC3A deletion also diminished the appearance of clusters of APOBEC3-associated mutation types, termed kataegis and omikli, which are frequently found in cancer genomes. The uracil glycosylase UNG and the translesion polymerase Rev1 were also found to play critical roles in the generation of mutations induced by APOBEC3A. These data represent the first experimental confirmation that APOBEC3 deaminases generate prevalent clustered and non-clustered mutational signatures in human cancer cells, identify the APOBEC3A and APOBEC3B paralogs as drivers and potential modulators of the episodic mutational bursts, and dissect the roles of the relevant enzymes in generating the associated mutations in breast cancer and B cell lymphoma cell lines. Accordingly, the present disclosure provides methods for treating cancer in a subject, methods of diagnosing cancer in a subject, methods of tracking mutagenesis induced by a gene of interest, and methods of screening for inhibitors and synthetic lethalities. The present disclosure also provides cell lines and antibodies. 
Finally, the present disclosure additionally provides reagents, kits, primers, and vectors for performing the methods disclosed herein.
One aspect of the present disclosure provides methods for treating cancer in a subject in need thereof with an agent. The methods may comprise using the agent to inhibit an APOBEC protein in the subject. APOBEC proteins are a family of cytidine deaminases. APOBEC proteins include, but are not limited to, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and activation-induced deaminase (AID). In some embodiments, the methods comprise inhibiting APOBEC3A in a subject in need thereof. In some embodiments, the methods comprise inhibiting APOBEC3B in a subject in need thereof. In some embodiments, the methods comprise inhibiting AID in a subject in need thereof. The methods may also comprise using the agent to inhibit the translesion polymerase REV1 in a subject. REV1 is a DNA repair protein in humans. In some embodiments, the methods comprise inhibiting UNG. In certain embodiments, multiple proteins may be inhibited simultaneously, or sequentially (e.g., one or more of APOBEC3A, APOBEC3B, another APOBEC protein, REV1, and UNG are inhibited simultaneously, or sequentially).
The agent used in the methods for treating cancer described herein may be a small molecule. For example, APOBEC inhibitors are described in Olson et al., Cell Chemical Biology 2018, 25(1), 36-49 and Kvach et al., Biochemistry 2019, 58, 391-400. Small molecule inhibitors of APOBEC3A include, for example, those described in King, J. J. et al. ACS Pharmacol. Transl. Sci. 2021, 4(4), 1390-1407, including small molecules with the following structures, and derivatives thereof:
The agent used in the methods for treating cancer disclosed herein may also be a protein (including an antibody as described herein). In some embodiments, the inhibitor is an anti-APOBEC3A antibody. In some embodiments, the inhibitor is an anti-APOBEC3B antibody. In some embodiments, the antibody is an anti-REV1 antibody.
The agent used in the methods of treating cancer disclosed herein may also be a nucleic acid. In some embodiments, the agent is an mRNA, an antisense RNA, an miRNA, an siRNA, an RNA aptamer, a double stranded RNA (dsRNA), a short hairpin RNA (shRNA), or an antisense oligonucleotide (ASO). In some embodiments, the agent is an siRNA. siRNAs are small inhibitory RNA duplexes that induce the RNA interference (RNAi) pathway, where the siRNA interferes with the expression of specific genes with a complementary nucleotide sequence. siRNA molecules can vary in length (e.g., between 18-30 or 20-25 base pairs) and contain varying degrees of complementarity to their target mRNA in the antisense strand. Some siRNAs have unpaired overhanging bases on the 5′ or 3′ end of the sense strand and/or the antisense strand. The term siRNA includes duplexes of two separate strands, as well as single strands that can form hairpin structures comprising a duplex region. For example, an siRNA directed to knocking out APOBEC3A could be used in the treatment of cancer. In some embodiments, the siRNA is directed to knocking out APOBEC3B. In some embodiments, the siRNA is directed to knocking out REV1. Suitable siRNAs for use in the methods described here include, for example, those disclosed in Cortez, L. M. et al. PLOS Genetics 2019, 15(12):e1008545.
Treatment of any cancer disclosed herein is contemplated by the methods of the present application. In particular, treatment of bladder cancer, cervical cancer, lung cancer, head and neck cancer, breast cancer, esophageal cancer, lymphoma, oral squamous cell carcinoma, uterine cancer, ovarian adenocarcinoma, pancreatic adenocarcinoma, stomach adenocarcinoma, or biliary adenocarcinoma is contemplated by the present disclosure.
Treatment of a subject with any cancer using the methods disclosed herein is contemplated by this disclosure. In some embodiments, subjects with a tumor with one or more mutational signatures associated with an APOBEC protein are treated using the methods disclosed herein. In some embodiments, the tumor in the subject being treated has one or more mutational signatures associated with APOBEC3A. In some embodiments, the one or more mutational signatures are associated with APOBEC3B. In certain embodiments, the one or more mutational signatures are associated with REV1. In certain embodiments, the one or more mutational signatures are associated with UNG.
In another aspect of the present disclosure, methods for treating cancer comprising enhancing the activity of an APOBEC protein are provided. In some embodiments, the activity of APOBEC3B is induced.
Methods for Diagnosing Cancer and/or Identifying a Subject in Need of Cancer Treatment
In another aspect, the present disclosure provides methods of identifying a subject in need of treatment for cancer. The present disclosure contemplates identifying subjects who are likely to respond to or benefit from being treated with an APOBEC3A inhibitor. The methods disclosed herein may comprise (i) taking a sample from the subject (e.g., a tissue biopsy from a tumor); and (ii) determining whether a mutational signature induced by APOBEC3A is present in the sample. The subject is likely to respond to or benefit from treatment with an APOBEC3A inhibitor if a mutational signature induced by APOBEC3A is present in the sample.
Determining whether a mutational signature induced by APOBEC3A is present in the sample may be accomplished through various methods known in the art. For example, the presence of such a mutational signature may be determined through any type of sequencing method known in the art, such as whole-genome sequencing. In some embodiments, determining whether the mutational signature is present is accomplished by whole exome sequencing. Determining whether a mutation induced by APOBEC3A is present in the sample may also be accomplished by targeted-gene sequencing (e.g., through a method such as (i) providing a set of primers comprising a first primer and a second primer to the sample, wherein the first primer binds to a region of the genome upstream of a mutational signature induced by APOBEC3A and the second primer binds to a region of the genome downstream of the mutational signature induced by APOBEC3A; (ii) amplifying the region of the genome between the first primer and the second primer; and (iii) sequencing the amplified region of the genome).
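By way of non-limiting illustration, the primer-flanked amplification of step (ii) can be sketched in silico. The following is a minimal sketch that assumes exact primer matches on a single template strand; the function names and the toy sequences are hypothetical and are not taken from the present disclosure.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def extract_amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the template region spanned by the primer pair, or None.

    The forward primer is matched on the given strand. The reverse primer
    anneals to the opposite strand, so its reverse complement is searched
    for downstream of the forward primer site.
    """
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical toy sequences, for illustration only.
template = "ACGTGGATCCAAGTCATTTTCAGGCTTAACCGGTTACGT"
amplicon = extract_amplicon(template, "GGATCC", "AACCGG")
```

The returned amplicon is the region that would then be sequenced in step (iii). Practical primer design would additionally account for mismatches, melting temperature, and off-target binding sites, none of which are modeled here.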
Various mutational signatures may be induced by APOBEC3A. For example, a mutational signature induced by APOBEC3A may be a single base substitution (SBS). An SBS is a genetic mutation in which a single nucleotide base is changed in a DNA or RNA sequence of an organism's genome. Single base substitutions induced by APOBEC proteins, such as APOBEC3A, include, but are not limited to, SBS1, SBS2, SBS5, SBS8_18_36, and SBS13. These mutational signatures are known in the art, for example, in Petljak, M. et al. Cell 2019, 176, 1282-1294.
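As a non-limiting illustration of how individual single base substitutions may be tallied, the sketch below expresses each substitution relative to the pyrimidine strand and labels cytosine substitutions at TCN contexts. The labels rest on an assumption drawn from general descriptions of these signatures rather than from this section: SBS2 is treated as C>T at TCN, and SBS13 as C>G or C>A at TCN. The function names are hypothetical, and assignment of mutations to signatures in practice relies on fitting across all trinucleotide contexts rather than on a per-mutation rule.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def normalize_sbs(ref: str, alt: str, context: str):
    """Express a substitution relative to the pyrimidine (C or T) strand.

    `context` is the reference trinucleotide centered on the mutated base.
    Returns (substitution, trinucleotide), for example ("C>T", "TCA").
    """
    ref, alt, context = ref.upper(), alt.upper(), context.upper()
    if len(context) != 3 or context[1] != ref or ref == alt:
        raise ValueError("context must be a trinucleotide centered on ref; alt must differ")
    if ref in "AG":  # purine reference: report the opposite strand instead
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        context = "".join(COMPLEMENT[base] for base in reversed(context))
    return f"{ref}>{alt}", context


def apobec_like_class(ref: str, alt: str, context: str) -> str:
    """Label TCN cytosine substitutions (assumed rule); everything else is "other"."""
    substitution, trinucleotide = normalize_sbs(ref, alt, context)
    if trinucleotide.startswith("TC"):
        if substitution == "C>T":
            return "SBS2-like"
        if substitution in ("C>G", "C>A"):
            return "SBS13-like"
    return "other"
```

For instance, a G>A change observed in a TGA context is reported as C>T at TCA after strand normalization and is therefore labeled "SBS2-like" under the assumed rule.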
Another aspect of the present disclosure provides methods of tracking mutagenesis induced by a gene of interest (e.g., APOBEC3A cytidine deaminase) in a population of cells over time. Such methods may comprise the following steps:
Knocking out a gene of interest may be accomplished by any genetic method known in the art. For example, knocking out a gene of interest can be accomplished by transfecting a cell from a population of cells with a vector encoding a nuclease, such as a CRISPR-associated nuclease (e.g., Cas9 nuclease). The vector may also encode a guide RNA (gRNA), where the sequence of a portion of the gRNA is complementary to a portion of the gene of interest (e.g., the sequence has sufficient complementarity to be able to hybridize with the gene of interest, forming a stable duplex). Cells may also be treated with a nuclease and a gRNA directly. Such a treatment may also include a transfection reagent, or a fusion to the nuclease, to help the nuclease and gRNA enter the cell to edit the genome.
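As a non-limiting illustration, candidate gRNA target sites within a gene of interest can be enumerated computationally. The sketch below assumes a Cas9 nuclease that recognizes an NGG protospacer-adjacent motif and a 20-nucleotide protospacer; neither parameter is specified in the present disclosure, and the function name and the toy sequence are hypothetical.

```python
import re


def find_protospacers(gene_seq: str, length: int = 20):
    """Yield (position, protospacer, pam) for each site followed by an NGG PAM.

    Only the given strand is scanned; a fuller search would also scan the
    reverse complement. The 20-nt length and NGG PAM are assumptions.
    """
    seq = gene_seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})([ACGT]GG))")
    for match in pattern.finditer(seq):
        yield match.start(), match.group(1), match.group(2)


# Hypothetical toy sequence, for illustration only.
sites = list(find_protospacers("ACGTACGTACGTACGTACGTAGGTT"))
```

Each reported protospacer is a candidate for the portion of the gRNA that is complementary to the gene of interest. Selection among candidates would ordinarily also consider predicted off-target sites, which this sketch does not evaluate.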
In the methods disclosed herein, the first and second propagating steps may each be performed for a variable number of days. For example, the first propagating step may be performed for at least 10 days. The first propagating step may also be performed for at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, or at least 50 days. The first propagating step may also be performed for up to 100, up to 150, up to 200, up to 250, or up to 300 days or more. In certain embodiments, the first propagating step is performed for anywhere between 50 and 150 days.
The sequencing step may be performed using various methods known in the art. For example, in some embodiments, the sequencing step comprises whole genome sequencing. In some embodiments, the sequencing step comprises whole exome sequencing. In certain embodiments, the sequencing step comprises targeted gene sequencing.
The present disclosure contemplates the use of the methods disclosed herein for tracking mutagenesis induced by any gene of interest. For example, in some embodiments, the gene of interest is an APOBEC deaminase (e.g., APOBEC3A, APOBEC3B, or any APOBEC deaminase disclosed herein). In certain embodiments, the gene of interest is REV1. In certain embodiments, the gene of interest is UNG.
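By way of non-limiting illustration, mutations acquired during a propagating step can be distinguished from pre-existing mutations by comparing the calls from a daughter clone with those from its parent clone. The sketch below assumes that each mutation call is represented as a (chromosome, position, reference base, alternate base) tuple; the function names and the example calls are hypothetical.

```python
def de_novo_mutations(parent_calls, daughter_calls):
    """Return calls present in a daughter clone but absent from its parent clone."""
    return sorted(set(daughter_calls) - set(parent_calls))


def mutations_per_day(n_mutations: int, days_propagated: float) -> float:
    """Average number of newly acquired mutations per day of propagation."""
    if days_propagated <= 0:
        raise ValueError("days_propagated must be positive")
    return n_mutations / days_propagated


# Hypothetical calls, for illustration only.
parent = {("chr1", 1000, "C", "T"), ("chr2", 500, "G", "A")}
daughter = parent | {("chr3", 42, "C", "G"), ("chr1", 2000, "C", "T")}
new_calls = de_novo_mutations(parent, daughter)
rate = mutations_per_day(len(new_calls), 50)
```

Comparing such rates between wild-type clones and clones in which the gene of interest has been knocked out gives a simple readout of the contribution of that gene to ongoing mutagenesis.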
The present disclosure also provides for distinguishing which APOBEC3A-associated mutational signatures (e.g., SBS's) are diagnostic for APOBEC3A activity, and which mutational signatures are not associated with APOBEC3A activity. Mutational signatures that are not associated with APOBEC3A activity can be used as standards to cross-compare or normalize SBS activity between different cell samples.
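As a non-limiting illustration of such normalization, the burden of APOBEC3A-associated signatures in a sample can be divided by the burden of a reference signature from the same sample. The choice of SBS1 as the reference signature in the sketch below is an assumption made for illustration only, and the function name and the example counts are hypothetical.

```python
def normalized_burden(signature_counts, targets=("SBS2", "SBS13"), reference="SBS1"):
    """Ratio of target-signature mutations to reference-signature mutations.

    `signature_counts` maps signature names to mutation counts in one sample.
    The default reference signature is an illustrative assumption.
    """
    reference_count = signature_counts.get(reference, 0)
    if reference_count <= 0:
        raise ValueError("reference signature has no mutations in this sample")
    target_count = sum(signature_counts.get(name, 0) for name in targets)
    return target_count / reference_count


# Hypothetical counts, for illustration only.
wt_clone = {"SBS1": 200, "SBS2": 300, "SBS13": 100, "SBS5": 400}
ko_clone = {"SBS1": 100, "SBS2": 10, "SBS13": 10, "SBS5": 210}
wt_ratio = normalized_burden(wt_clone)
ko_ratio = normalized_burden(ko_clone)
```

Expressing burdens as ratios in this way allows samples that were propagated for different lengths of time, or sequenced to different depths, to be compared on a common scale.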
Another aspect of the present disclosure provides cancer cell lines comprising a population of knockout (KO) cells. The cells may comprise a knockout of an APOBEC protein. The cancer cell lines contemplated by the present disclosure include populations of cells comprising an APOBEC3A KO. In some embodiments, the cancer cell line comprises a population of APOBEC3B KO cells.
The cancer cell lines of the present disclosure comprise various cell types including, but not limited to, bladder cancer cells, cervical cancer cells, lung cancer cells, head and neck cancer cells, breast cancer cells, esophageal cancer cells, lymphoma cells, oral squamous cell carcinoma cells, uterine cancer cells, ovarian adenocarcinoma cells, pancreatic adenocarcinoma cells, stomach adenocarcinoma cells, or biliary adenocarcinoma cells. In some embodiments, the cells are breast cancer cells (e.g., derived from the human breast cancer cell line BT-474 or MDA-MB-453). In some embodiments, the cells are lymphoma cells (e.g., derived from the human B cell lymphoma cancer cell line BC-1 or JSC-1). In some embodiments, the cells are derived from a sample taken from a subject (e.g., a tumor sample).
Another aspect of the present disclosure provides isolated monoclonal antibodies generated from APOBEC peptides, e.g., the N-terminal amino acids of APOBEC3A, such as the peptide sequence: MEASPASGPRHLMDPHIFTSNFNNGIGRH (SEQ ID NO: 1). The antibodies contemplated by the present disclosure may include any of the several different types of antibody heavy chains, and the several different kinds of antibodies, which are grouped together into different isotypes as disclosed herein. The antibodies disclosed herein may include, for example, any of the five mammalian antibody isotypes (IgG, IgA, IgE, IgD, and IgM).
The present disclosure provides isolated monoclonal antibodies generated from the peptide: MEASPASGPRHLMDPHIFTSNFNNGIGRH (SEQ ID NO: 1). In some embodiments, the monoclonal antibody is a mouse antibody. In some embodiments, the monoclonal antibody is a human antibody. In certain embodiments, the monoclonal antibody is an anti-APOBEC3A/B/G antibody. In certain embodiments, the monoclonal antibody is an anti-APOBEC3A antibody.
In another aspect, the present disclosure provides methods for screening for inhibitors. For example, methods of screening for inhibitors of an APOBEC protein (e.g., APOBEC3A or APOBEC3B) are provided. Such methods may comprise (i) propagating a population of cells in the presence and absence of a candidate APOBEC3A inhibitor; and (ii) determining whether the frequency of a mutational signature induced by APOBEC3A is reduced in the presence of the candidate APOBEC3A inhibitor. In another example, methods of screening for inhibitors of REV1 are provided. Such methods may comprise (i) propagating a population of cells in the presence and absence of a candidate REV1 inhibitor; and (ii) determining whether the frequency of a mutational signature induced by REV1 is reduced in the presence of the candidate REV1 inhibitor. Inhibitors of other APOBEC family proteins (e.g., APOBEC3B), or inhibitors of UNG, could also be screened for using similar methods.
The mutational signature induced by APOBEC3A may comprise single base substitutions (SBS). For example, single base substitutions include, but are not limited to, SBS1, SBS2, SBS5, SBS8_18_36, and SBS13, as well as any SBS disclosed herein.
In another aspect, the present disclosure provides methods for screening for a synthetic lethality associated with active APOBEC3A. A synthetic lethality arises when a combination of deficiencies (e.g., through genetic knockout, or through enzyme inhibition) of at least two genes leads to the death of a cell, while a deficiency of only one of the genes does not result in cell death. Such methods may comprise propagating a population of WT cells and a population of APOBEC3A KO cells in the presence of an agent capable of inhibiting the activity of a gene of interest. A synthetic lethality is identified when the population of WT cells is able to propagate in the presence of the agent and the population of APOBEC3A KO cells is not able to propagate in the presence of the agent. The agent may be an inhibitor of a gene of interest. In some embodiments, the inhibitor is a small molecule inhibitor, as described herein. In some embodiments, the inhibitor is an siRNA inhibitor, as described herein. In certain embodiments, the agent is a Cas9 nuclease associated with a gRNA, wherein the sequence of a portion of the gRNA is complementary to a portion of the gene of interest, as described herein. The present disclosure also provides similar methods for screening for a synthetic lethality associated with active APOBEC3B, REV1, or UNG.
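By way of non-limiting illustration, the identification criterion described above can be expressed as a simple decision rule applied to cell counts recorded over time in the presence of the agent. The sketch below assumes that a population "is able to propagate" when its final count is at least a chosen multiple of its initial count; that threshold, the function names, and the example counts are hypothetical.

```python
def propagates(cell_counts, min_fold_change: float = 2.0) -> bool:
    """True if the final count is at least min_fold_change times the initial count.

    The fold-change threshold is an illustrative assumption.
    """
    first, last = cell_counts[0], cell_counts[-1]
    return first > 0 and last / first >= min_fold_change


def synthetic_lethality_identified(wt_counts, ko_counts, min_fold_change: float = 2.0) -> bool:
    """WT cells propagate in the presence of the agent while KO cells do not."""
    return propagates(wt_counts, min_fold_change) and not propagates(ko_counts, min_fold_change)


# Hypothetical cell counts over time in the presence of the agent.
wt_counts = [100_000, 400_000, 1_600_000]
ko_counts = [100_000, 90_000, 50_000]
hit = synthetic_lethality_identified(wt_counts, ko_counts)
```

An agent that halts both populations, or neither, is not scored as a synthetic lethality under this rule.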
In other aspects, the present disclosure also provides reagents for performing any of the methods described herein. In some embodiments, the reagents for performing any one of the methods disclosed herein are provided as part of a kit. In some embodiments, the kit further comprises instructions for performing one of the methods disclosed herein. Primers and vectors for performing the methods disclosed herein are also provided by the present disclosure. Additional cell lines and antibodies used in the methods described herein are also provided by this disclosure.
Mutational signatures identified in a DNA sequence reflect traces of historic mutational processes. It was recently found that cell lines with evidential historic exposure to APOBEC3-associated mutagenesis often continue to generate the unclustered and clustered kataegis mutations associated with APOBEC3 deaminases in episodic bursts over time (Petljak et al. Cell 2019, 176(6), 1282-1294).
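As a non-limiting illustration of how clustered mutations may be distinguished from unclustered mutations, the sketch below groups mutation positions on a single chromosome into runs of closely spaced mutations. The thresholds (adjacent mutations no more than 1,000 base pairs apart and at least six mutations per cluster) follow one commonly used convention for kataegis and are assumptions; they are not values stated in the present disclosure. Smaller clusters, such as omikli events, could be explored by lowering the minimum number of mutations.

```python
def find_clusters(positions, max_gap: int = 1000, min_mutations: int = 6):
    """Group positions on one chromosome into runs whose adjacent gaps are <= max_gap.

    Only runs containing at least min_mutations positions are returned.
    Both thresholds are illustrative assumptions.
    """
    clusters, current = [], []
    for position in sorted(positions):
        if current and position - current[-1] > max_gap:
            if len(current) >= min_mutations:
                clusters.append(current)
            current = []
        current.append(position)
    if len(current) >= min_mutations:
        clusters.append(current)
    return clusters


# Hypothetical mutation positions on one chromosome.
positions = [100, 350, 600, 900, 1200, 1500, 50000, 120000, 120400]
kataegis_like = find_clusters(positions)
small_clusters = find_clusters(positions, min_mutations=2)
```

This grouping uses the gap between adjacent mutations rather than an average distance across the run, which is a simplification of the conventions used in published analyses.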
Comparison of the APOBEC3-associated mutational signatures across DNA sequences of 780 widely used human cancer cell lines and 1843 primary human cancers (
To determine the relative contributions of individual genes to generation of APOBEC3-associated signatures, a selection of candidate genes were deleted from two commonly used human breast cancer cell lines (BT-474 and MDA-MB-453), as well as two B cell lymphoma (BC-1 and JSC-1) cancer cell lines previously shown to naturally acquire clustered kataegis and unclustered APOBEC3-associated mutations over time without experimental perturbations (
Examination of SBS profiles of the bulk cell lines revealed that BT-474, MDA-MB-453 and JSC-1 cell lines carried patterns of both SBS2 and SBS13, while BC-1 displayed only SBS2 signature (
Next, the burdens of individual mutational signatures were quantified across the pre-existing mutations identified in parent clones and de novo acquired mutations identified in daughter clones to investigate the contributions of candidate genes to acquisition of APOBEC-associated mutations. First, the signatures identified here were deconvoluted (
To test whether endogenous APOBEC3 activity represents an enzymatic source of cancer mutagenesis and delineate potential roles of individual APOBEC3 paralogs, APOBEC3A and APOBEC3B were deleted by CRISPR-Cas9 gene targeting from cancer cell lines with evidence of APOBEC3A and APOBEC3B expression and active APOBEC3-associated mutagenesis (
Continued generation of SBS2 and SBS13 was detectable in WT clones of breast cancer and both B cell lymphoma cell lines (
Consistent with widely reported observations of upregulation of APOBEC3B in breast and other cancer types (Burns et al. Nature 2013; Burns et al. Nat. Genet. 2013; Leonard et al. 2013), all cell lines exhibited substantially elevated mRNA and protein levels of APOBEC3B relative to APOBEC3A (
Despite low expression of APOBEC3A compared to APOBEC3B in all breast and lymphoma cell lines and measurable activities from both enzymes upon DNA substrates in vitro, deletion of APOBEC3A, but not APOBEC3B, severely diminished SBS2 and SBS13 mutations in daughter clones isolated from KO parent clones (
Analysis of cytosine mutations acquired at APOBEC3A-preferred YTCN and APOBEC3B-preferred RTCN sequence contexts revealed that mutational catalogues of most WT clones were enriched in APOBEC3A-preferred YTCN contexts, in line with APOBEC3A being the major mutator (
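By way of non-limiting illustration, the enrichment analysis described above can be sketched by classifying the four-base context of each mutated cytosine, where Y denotes a pyrimidine (C or T) and R denotes a purine (A or G). The sketch assumes that each context is given on the strand carrying the mutated cytosine, with the cytosine at the third position; the function names and the example contexts are hypothetical.

```python
from collections import Counter


def tetranucleotide_class(context: str) -> str:
    """Classify a 4-base context as "YTCN", "RTCN", or "other".

    The context is read 5' to 3' with the mutated cytosine at index 2.
    """
    context = context.upper()
    if len(context) != 4 or context[1:3] != "TC":
        return "other"
    if context[0] in "CT":
        return "YTCN"
    if context[0] in "AG":
        return "RTCN"
    return "other"


def ytcn_fraction(contexts):
    """Fraction of TCN-context mutations that fall in YTCN, or None if there are none."""
    counts = Counter(tetranucleotide_class(context) for context in contexts)
    total = counts["YTCN"] + counts["RTCN"]
    return counts["YTCN"] / total if total else None


# Hypothetical contexts, for illustration only.
fraction = ytcn_fraction(["TTCA", "CTCT", "TTCG", "ATCA", "AACT"])
```

A fraction well above one half would be in line with the APOBEC3A-preferred YTCN context, although a formal enrichment analysis would also account for the background frequency of each context in the genome.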
While deletion of APOBEC3B did not diminish overall mutational burdens, daughter clones isolated from the APOBEC3B KO breast cancer cell lines BT-474 and MDA-MB-453 exhibited more SBS2 and SBS13 mutations on average than their WT counterparts (
Most cancers and cell lines with mutational signatures of APOBEC3 deaminases exhibit both SBS2 and SBS13 signatures, albeit at different relative proportions (
In sharp contrast to WT clones from MDA-MB-453 and BT-474 cell lines, which exhibited both SBS2 and SBS13, daughters isolated from the UNG KO clones exhibited exclusively SBS2 mutations (
Following uracil excision, replication across abasic sites by translesion synthesis (TLS) polymerases has been speculated to give rise to C>A and C>G transversions, as well as a portion of C>T mutations, based on models of the activity of the APOBEC family member activation-induced cytidine deaminase (AID) during immunoglobulin gene somatic hypermutation (Masuda et al. 2009; Sale et al. 2012). Specifically, REV1 was proposed to form a scaffold for components of TLS during AID-mediated somatic hypermutation and thus to play a critical role in the generation of a broad range of TLS-associated mutations (Simpson 2003). To assess the contribution of TLS to generation of SBS2 and SBS13, REV1 was targeted by CRISPR/Cas9 editing in breast cancer cell lines, and loss of expression was verified by immunoblotting (
Furthermore, the data suggest that in the absence of REV1, alternative, less mutagenic pathways may be used to navigate the lesion. One such possibility is recombination-mediated bypass, which has previously been proposed to act downstream of AID (Simpson 2003). While such pathways were proposed to come at the cost of increased genomic instability, an increase in rearrangements was not observed (
Unlike SBS1, mutational burdens attributed to SBS5 were significantly depleted in REV1 knockout cells of the MDA-MB-453 cell line (p=4.0×10⁻³, Mann-Whitney test). SBS5 has been attributed to an unknown process that is continuously operative across all tissues (Alexandrov et al. 2015; Kim et al. 2016), and its increased burdens in bladder cancers have been associated with mutations in the ERCC2 gene encoding a DNA helicase that plays a central role in the NER pathway (Kim et al. 2016). The data suggest that REV1 may play a critical part in the underlying mutational process.
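As a non-limiting illustration of the kind of comparison reported above, the sketch below computes a Mann-Whitney U statistic and an exact two-sided p-value by enumerating every reassignment of the pooled observations to the two groups. Whether the reported test was exact or asymptotic, and one-sided or two-sided, is not stated herein, so those choices are assumptions; the function names and the example burdens are hypothetical, and exhaustive enumeration is practical only for small samples.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for sample x: pairs with x > y count 1, ties count one half."""
    return sum((xi > yi) + 0.5 * (xi == yi) for xi in x for yi in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value from all splits of the pooled data into groups."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    mean_u = n * m / 2
    observed = abs(mann_whitney_u(x, y) - mean_u)
    hits = total = 0
    for indices in combinations(range(n + m), n):
        chosen = set(indices)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        # Count splits at least as extreme as the observed one.
        if abs(mann_whitney_u(group_x, group_y) - mean_u) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical per-clone signature burdens, for illustration only.
knockout_burdens = [12, 15, 11, 14]
wild_type_burdens = [30, 28, 35, 31]
u_statistic = mann_whitney_u(knockout_burdens, wild_type_burdens)
p_value = exact_two_sided_p(knockout_burdens, wild_type_burdens)
```

With four clones per group and complete separation, only 2 of the 70 possible splits are as extreme as the observed one, which sets the smallest p-value attainable at this sample size.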
Most APOBEC3-associated mutations in examined cell lines were dispersed throughout the genome (
In line with an increased burden of genome-wide SBS2 and SBS13 observed in APOBEC3B-deleted clones, the highest number of kataegis foci were observed in APOBEC3B KO clones from all cell lines (
Unexpectedly, loss of APOBEC3A also caused a reduction in clustered mutations occurring outside of APOBEC3-like sequence contexts in BC-1 and MDA-MB-453 cells, while deletion of APOBEC3B led to their modest increase in breast cancer cell lines (
Kataegis foci often co-localize with rearrangements in primary cancers, a phenomenon in part attributed to APOBEC3 attacks on ssDNA exposed during the resection phase of homologous recombination-mediated double-strand break DNA repair (Taylor et al. 2013; Nik-Zainal et al. 2012). A separate explanation proposes that APOBEC3-induced deamination may precede the dsDNA breaks (Taylor et al. 2013), if ssDNA breaks generated upon UNG-mediated uracil excision represent the initiating lesions for formation of subsequent dsDNA breaks. However, kataegis did not depend on UNG in MDA-MB-453 and BT-474 cell lines (
The present disclosure provides the first direct evidence that cytidine deaminases represent potent mutators in human cancer cells. The data establish APOBEC3A as the main driver of highly prevalent genome-wide and clustered kataegis APOBEC3-associated mutational signatures in breast and B cell lymphoma cancer cells. APOBEC3-associated mutational signatures are enriched at YTCN sequence contexts in the majority of individual human cancers and cancer types (Chan et al. 2015; Burns et al. Nature 2013; Burns et al. Nat. Genet. 2013). The finding described herein that APOBEC3A accounts for most APOBEC-associated mutations at YTCN sequence contexts in human cancer cells strongly indicates that APOBEC3A drives acquisition of the large majority of all APOBEC-associated mutations observed in cancer genomes. All the cancer cell lines analyzed in this study, where APOBEC3A is the predominant driver of the relevant mutations, possess high levels of APOBEC3B expression relative to APOBEC3A, an observation that was previously used to nominate APOBEC3B as the major mutator in cancer (Burns et al. Nature 2013; Burns et al. Nat. Genet. 2013; Leonard et al. 2013). Furthermore, despite APOBEC3A being the predominant mutator, activities of APOBEC3A and APOBEC3B were similar in in vitro deamination assays that have commonly been used as substitute readouts of mutagenesis by individual enzymes (Burns et al. Nature 2013; Burns et al. Nat. Genet. 2013). Thus, the data show that increased expression and deamination activities of individual APOBEC members may not always translate into active mutagenesis. These findings caution against the widespread use of such readouts as sole substitute measures of active mutagenesis by APOBEC3 deaminases, which resulted in distinct predictions regarding APOBEC members as predominant mutators in cancer (Burns et al. Nature 2013; Burns et al. Nat. Genet. 2013; Cortez et al. 2019; Jalili et al. 2020). 
The direct measurements of mutagenic activities of APOBEC3A and APOBEC3B enzymes in human cancer cell line genomes used here represent the strongest available support that mutagenesis by APOBEC3A, and not APOBEC3B, represents the major source of some of the most prevalent mutational signatures in human cancer. Recent work, largely based on correlations between individual APOBEC3 expression levels and deamination activities, has implicated distinct APOBEC3 members as drivers of targeted therapy resistance in lung cancers (Mayekar et al. 2020; Isozaki et al. 2021). The results described herein call for the use of more direct measures of APOBEC3 activity to delineate the role of individual APOBEC3 enzymes in cancer genome evolution.
Finally, these data implicate UNG and REV1, and thus BER, in the generation of APOBEC3-induced non-clustered signatures SBS2 and SBS13, as well as clustered kataegis and omikli events in cancer cell genomes. Experimental confirmation of APOBEC3 deaminases as mutators in human cancer cells and identification of APOBEC3A as the main generator of widespread mutations in cancer marks a critical advance in pursuing therapeutic interventions based on modulating the generation of the associated SBS signatures and in investigating the origins of APOBEC3-associated mutations in cancer. These data show that modulation of mutagenic activities by APOBEC3A offers avenues for therapeutic interventions.
Cell Culture: MDA-MB-453, BT-474, JSC-1, and BC-1 cancer cell lines were acquired from the cryopreserved aliquots of 1,001 cell lines, extensively characterized as part of the Genomics of Drug Sensitivity in Cancer (GDSC) (Iorio et al. 2016; Garnett et al. 2012) and COSMIC Cell Line projects (Petljak et al. 2019; Forbes et al. 2017). Cell lines were genotyped previously by SNP and STR profiling, as part of the COSMIC Cell Line Project (cancer.sanger.ac.uk/cell_lines), and individual clones obtained here were Fluidigm genotyped to confirm their identities. MCF10A cells were from Maria Jasin (MSKCC).
All cell lines were mycoplasma negative and fingerprinted by single nucleotide polymorphism (SNP) and short tandem repeat (STR) profiling at the MSKCC Antibody and Bioresource Core. MDA-MB-453 cells were grown in DMEM:F12 medium supplemented with 10% fetal bovine serum (FBS) and 100 U/mL penicillin-streptomycin. BC-1, BT-474, and JSC-1 cells were grown in RPMI medium supplemented with 10% FBS, 1% penicillin-streptomycin, 1% sodium pyruvate, and 1% glucose. MCF10A cells were cultured in 1:1 mixture of F12:DMEM media supplemented with 5% horse serum (Thermo Fisher Scientific), 20 ng/ml human EGF (Sigma), 0.5 mg/ml hydrocortisone (Sigma), 100 ng/ml cholera toxin (Sigma) and 10 μg/ml recombinant human insulin (Sigma).
Generation of Knockout Cell Lines: 10⁶ cells were electroporated using the Lonza 4D-Nucleofector X Unit (MDA-MB-453) or Lonza Nucleofector 2b Device (BT-474, BC-1, JSC-1) using programs DK-100 (MDA-MB-453), X-001 (BT-474), or T-001 (BC-1, JSC-1) in buffer SF+18% supplement (MDA-MB-453) or 80% Solution 1 (125 mM Na2HPO4·7H2O, 12.5 mM KCl, acetic acid to pH=7.75) and 20% Solution 2 (55 mM MgCl2) (BT-474, BC-1, JSC-1) and 9 μg (UNG, SMUG1, REV1) or 10 μg (A3A, A3B) of pU6-sgRNA_CBh-Cas9-T2A-mCherry plasmid DNA. Electroporated cells were plated into 10 cm dishes and the media was changed after 24 h. mCherry positive cells were single-cell sorted into 96-well plates by FACS using FACSAria (BD Biosciences). To generate the APOBEC3B KO in JSC-1 cells, 10⁶ cells were transfected with 10 μg pU6-sgA3B_CBh-Cas9-T2A-mCherry DNA using Lipofectamine 3000 reagent (ThermoFisher Scientific cat. #L3000015). Cells were plated for 48 h, after which mCherry positive cells were bulk sorted, grown, and subcloned by limiting dilution.
Knockout Screening and Validation by PCR: CRISPR KO Clone Screening. Cells were pelleted and their genomic DNA isolated using the Cell Monolayer protocol of the Zymo Research Genomic DNA Isolation Kit (cat. #ZD3025). Purified genomic DNA for CRISPR/Cas9 knockout screens was amplified using Touchdown PCR. Each PCR reaction consisted of: 7.4 μL ddH2O, 1.25 μL 10×PCR buffer (166 mM (NH4)2SO4, 670 mM Tris base (pH 8.8), 67 mM MgCl2, 100 mM β-mercaptoethanol), 1.5 μL 10 mM dNTPs, 0.75 μL DMSO, 0.25 μL forward and reverse primers (10 μM each), 0.1 μL Platinum Taq DNA Polymerase (Invitrogen, cat. #10966083), and 1 μL genomic DNA.
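As a non-limiting worked example, the per-reaction volumes listed above can be totaled and scaled into a master mix. The sketch assumes that 0.25 μL of each primer is added, so that the reaction totals 12.5 μL; the 10% overage and the function names are likewise assumptions made for illustration.

```python
# Per-reaction volumes in microliters, as listed in the protocol.
components = {
    "ddH2O": 7.4,
    "10x PCR buffer": 1.25,
    "10 mM dNTPs": 1.5,
    "DMSO": 0.75,
    "forward primer (10 uM)": 0.25,  # assumption: 0.25 uL of each primer
    "reverse primer (10 uM)": 0.25,
    "Platinum Taq": 0.1,
    "genomic DNA": 1.0,
}


def reaction_volume(parts) -> float:
    """Total volume of one reaction in microliters."""
    return sum(parts.values())


def master_mix(parts, n_reactions: int, overage: float = 0.1, exclude=("genomic DNA",)):
    """Scale shared components for n reactions plus a pipetting overage.

    Template DNA is excluded because it is added to each tube separately.
    """
    scale = n_reactions * (1 + overage)
    return {name: round(volume * scale, 2)
            for name, volume in parts.items() if name not in exclude}


total = reaction_volume(components)
buffer_fold = 10 * components["10x PCR buffer"] / total  # final buffer strength
mix = master_mix(components, n_reactions=10)
```

Under this reading, the 1.25 μL of 10× buffer is diluted to 1× in a 12.5 μL reaction, which is consistent with the stated volumes.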
PCR for Sanger Sequencing: PCR reactions for Sanger Sequencing were performed using the Invitrogen Platinum Taq DNA Polymerase (Invitrogen, cat. #10966083) protocol. 25 ng of genomic DNA was used for each reaction. DNA from PCR reactions was purified from agarose gels using the Invitrogen PureLink Quick Gel Extraction Kit (Invitrogen, cat. #K210012). Gel-purified DNA was cloned using the TOPO TA Cloning Kit for Sequencing (Invitrogen, cat. #450030) and grown on LB-Amp plates and colonies were selected for sequencing (Genewiz).
RNA Isolation and Quantitative PCR: Cells were pelleted, and their RNA isolated using the Zymo Research Quick-RNA Miniprep Kit (cat. #R1054). RNA was quantified and converted to cDNA using the Invitrogen SuperScript IV First-Strand Synthesis System (cat. #18091050). cDNA synthesis reactions were performed using 2 μL of 50 ng/μL random hexamers, 2 μL of 10 mM dNTPs, 4 μg RNA, and DEPC-treated water to a volume of 26 μL. The mixture was heated at 65° C. for 5 minutes, then cooled on ice for 5 minutes. Primers, probes, and cycling conditions were adapted from published methods (Refsland et al. 2010).
Immunoblotting: Cells were lysed in 2× sample buffer (100 mM TrisHCl, pH 6.8, 4% SDS, 10% β-mercaptoethanol). The whole-cell lysate was subjected to SDS-polyacrylamide gel electrophoresis on NuPAGE 4-12% Bis-Tris gradient gels (Novex Life Technologies) and proteins were transferred onto a nitrocellulose membrane (Millipore).
Cells were lysed in RIPA buffer (150 mM NaCl, 50 mM Tris-HCl pH=8, 1% NP-40, 0.5% sodium deoxycholate, 0.1% SDS, Pierce Protease Inhibitor Tablet, EDTA free) or sample buffer (125 mM Tris-HCl pH 6.8, 1 M β-mercaptoethanol, 4% SDS, 20% glycerol, 0.02% bromophenol blue). Quantification of RIPA extracts was performed using the Thermo Scientific Pierce BCA Protein Assay kit. Protein transfer was performed via wet transfer using 1×Towbin buffer (25 mM Tris, 192 mM glycine, 0.01% SDS, 20% methanol) and nitrocellulose membrane. Blocking was performed in 5% milk in 1×TBST (19 mM Tris, 137 mM NaCl, 2.7 mM KCl, and 0.1% Tween-20) for 1 h at room temperature (RT). The following antibodies were used: anti-APOBEC3A (see below; WB 1:500), anti-APOBEC3B (Abcam; ab184990; WB 1:500), anti-SMUG1 (Abcam; ab192240; WB 1:1,000), anti-UNG (Abcam; ab109214; WB 1:1,000), anti-GFP (Santa Cruz; sc-9996; WB 1:1,000), anti-β-actin (Abcam; ab8224; WB 1:3,000), anti-β-actin (Abcam; ab8227; WB 1:3,000), anti-Mouse IgG HRP (Thermo Fisher Scientific; 31432; 1:10,000), and anti-Rabbit IgG HRP (SouthernBiotech; 6441-05; 1:10,000).
APOBEC3 monoclonal antibody generation: Residues 1-29 (N1-term) or 13-43 (N2-term) from APOBEC3A and residues 354-382 (C-term) from APOBEC3B were used to create three peptide immunogens (EZBiolab). Five mice were given three injections using Keyhole-Limpet-Hemocyanin (KLH)-conjugated peptides over the course of 12 weeks (MSKCC Antibody and Bioresource Core). Test bleeds from the mice were screened for anti-APOBEC3A titers by ELISA against APOBEC3A peptides conjugated to BSA. Mice showing positive anti-APOBEC3A immune responses were selected for a final immunization boost before their spleens were harvested for B-cell isolation and hybridoma production. Hybridoma fusions of myeloma (SP2/IL6) cells and viable splenocytes from the selected mice were performed by MSKCC Antibody and Bioresource Core. Cell supernatants were screened by APOBEC3A ELISA. The strongest positive hybridoma pools were subcloned by limiting dilution to generate monoclonal hybridoma cell lines. Hybridomas 04A04 and 01D05 were expanded then grown in 1% FBS medium for XX. This medium was clarified by centrifugation to remove cells and then passed over a Protein G column (04A04) or Protein A column (01D05) to bind mAb. The resulting mAb was eluted in PBS (04A04) or 100 mM NaCitrate pH 6, 150 mM NaCl buffer (01D05).
In vitro DNA deaminase activity assay: Deamination activity assays were performed as described (Stenglein et al. 2010). Briefly, 1 million cells were pelleted and lysed in buffer (25 mM HEPES, 150 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% Triton-X, 1× protease inhibitor), sheared through a 28½-gauge syringe, then cleared by centrifugation at 13,000×g for 10 minutes at 4° C. Deaminase reactions (16.5 μl cell extracts with 2 μl UDG buffer (NEB), 0.5 μl RNase A (20 mg/ml), 1 μl 1 μM probe (linear=5′IRD800/ATTATTATTATTATTATTATTTCATTTATTTATTTATTTA (SEQ ID NO: 5) or hairpin=5′IRD800/ATTATTATTATTGCAAGCTGTTCAGCTTGCTGAATTTATT (SEQ ID NO: 6)), and 0.3 μl UDG (NEB)) were incubated at 37° C. for 2 hours followed by addition of 2 μl 1M NaOH and 15 minutes at 95° C. to cleave abasic sites. Reactions were then neutralized with 2 μl 1 M HCl, terminated by adding 20 μl urea sample buffer (90% formamide+EDTA), and separated on a pre-warmed 15% acrylamide/urea gel in 1×TBE buffer at 60° C. for 70 minutes at 100V to monitor DNA cleavage. Gels were imaged by Odyssey Infrared Imaging System (Li-COR) and quantified via ImageJ.
Comparison of APOBEC3-associated mutational signatures in cell line and primary cancer data: Annotations of mutational signatures across 1,001 human cancer cell lines and 2,710 primary cancers from multiple cancer types were published previously (Petljak et al. 2019). Where possible, cancer and cell line cancer classes were matched. Eventually, 780 cell lines and 1843 primary cancers from matching cancer types were used in analyses presented in
Whole-genome sequencing: Genomic DNA was extracted from a total of 136 individual clones using the DNeasy Blood and Tissue Kit (QIAGEN) and quantified with Biotium Accuclear Ultra high sensitivity dsDNA Quantitative kit using Mosquito LV liquid platform, Bravo WS and BMG FLUOstar Omega plate reader. Samples were diluted to 200 ng/120 μl using Tecan liquid handling platform, sheared to 450 bp using a Covaris LE220 instrument and purified using Agencourt AMPure XP SPRI beads on Agilent Bravo WS. Library construction (ER, A-tailing and ligation) was performed using ‘NEB Ultra II custom kit’ on an Agilent Bravo WS automation system. PCR was set up using Agilent Bravo WS automation system, KapaHiFi Hot start mix and IDT 96 iPCR tag barcodes or unique dual indexes (UDI, Illumina). PCR included 6 standard cycles: 1) incubation at 95° C. 5 mins; 2) incubation at 98° C. 30 s; 3) incubation at 65° C. 30 s; 4) incubation at 72° C. 1 min; 5) cycle from 2, 5 more times; 6) incubation at 72° C. 10 mins. Post-PCR plates were purified with Agencourt AMPure XP SPRI beads on Beckman BioMek NX96 liquid handling platform. Libraries were quantified with Biotium Accuclear Ultra high sensitivity dsDNA Quantitative kit using Mosquito LV liquid handling platform, Bravo WS and BMG FLUOstar Omega plate reader, pooled in equimolar amounts on a Beckman BioMek NX-8 liquid handling platform and normalized to 2.8 nM ready for cluster generation on a c-BOT. Pooled samples were loaded on the Illumina HiSeq X platform using 150 PE run lengths and sequenced to approximately 30× coverage. Sequencing reads were aligned to the reference human genome (GRCh37) using Burrows-Wheeler Alignment (BWA)-MEM (https://github.com/cancerit/PCAP-core). Unmapped, non-uniquely mapped reads and duplicate reads were excluded from further analyses.
Mutation calling: Somatic single base substitutions (SBS) were discovered using CaVEMan (https://github.com/cancerit/cgpCaVEManWrapper) (Jones et al. 2016), with major and minor copy number options set to, respectively, 5 and 2, to maximize discovery sensitivity. Rearrangements were identified with the BRASS algorithm (https://github.com/cancerit/BRASS). Sequences of the corresponding parent clones were used as reference genomes to discover mutations in individual daughter clones, whereas an unrelated normal human genome (Petljak et al. 2019) was used as a reference to discover mutations in parent clones. Mutations shared between parent clones (see below) were used to derive proxies for the mutational catalogues of bulk cell lines in
First, only SBS flagged as ‘PASS’ by CaVEMan when analyzed across the panel of 98 unmatched normal samples (github.com/cancerit/cgpCaVEManWrapper) (Jones et al. 2016) were considered, removing large proportions of mapping and sequencing artefacts, as well as the common germline variation present across the 98 healthy samples (Jones et al. 2016). Four post-hoc filters were applied to ‘PASS’ variants to further remove sequencing and mapping artefacts that occur with XTEN and BWA-MEM-aligned data and to ensure that the mutation loci were well covered in the reference sequences. ‘PASS’ mutations were removed if (Filter 1) the median alignment score (ASMD) of mutation-reporting reads was less than or equal to 140; (Filter 2) the mutation locus had a clipping index (CLPM) greater than 0; (Filter 3) the mutation locus was covered by 20 or fewer reads in the reference samples used in comparisons; or (Filter 4) fewer than two sequencing reads of opposite directions reported the mutation.
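The four post-hoc filters can be read as a single keep-or-discard predicate per ‘PASS’ call. The following is a minimal sketch of that reading, not the pipeline actually used. The `Call` record and its field names are hypothetical, and Filter 4 is interpreted here as requiring at least one mutation-reporting read in each direction, which is an assumption about the intended meaning.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical per-mutation record; field names are illustrative only."""
    asmd: float     # median alignment score of mutation-reporting reads (ASMD)
    clpm: float     # clipping index at the mutation locus (CLPM)
    ref_depth: int  # reads covering the locus in the reference sample
    fwd_alt: int    # mutation-reporting reads in the forward direction
    rev_alt: int    # mutation-reporting reads in the reverse direction


def passes_posthoc_filters(call: Call) -> bool:
    """Return True if a 'PASS' call survives Filters 1-4 as described above."""
    if call.asmd <= 140:        # Filter 1: ASMD less than or equal to 140
        return False
    if call.clpm > 0:           # Filter 2: clipping index greater than 0
        return False
    if call.ref_depth <= 20:    # Filter 3: 20 or fewer reads in the reference
        return False
    # Filter 4 (assumed reading): need support from both read directions.
    if call.fwd_alt < 1 or call.rev_alt < 1:
        return False
    return True
```

A call with ASMD 150, CLPM 0, reference depth 30 and support on both strands is retained under this sketch; changing any single field to its threshold value (for example, a reference depth of exactly 20) removes it.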
Second, all mutation loci that passed the filters above across all available clones obtained from the matching cell lines were genotyped. cgpVAF was used to count the number of mutant and wild type reads across individual clones (github.com/cancerit/vafCorrect) and mutations from each parent or daughter clone that were found at cumulative VAF of >5% across >10% of clones from other parental lineages were removed (Filter 5). Mutations presenting at other clones below these cut-offs were determined false-positive calls upon manual inspection of individual reads and were thus retained. In mutational catalogues from parent clones, this step removed the majority of the germline mutations and a smaller proportion of somatic mutations shared between parent clones, thus retaining predominantly somatic mutations acquired in individual parent cell lineages prior to the examined in vitro periods spanning the two cloning events. Such likely pre-existent germline and somatic mutations identified were accumulated across the related parent clones into mutational catalogues of bulk cell lines (
Validation of parent-daughter allocations: Genotyping of mutation loci across all clones revealed that, occasionally, a large proportion of mutations absent from the parent clones, and thus postulated to be acquired de novo in culture, was shared between some or all of their daughters (e.g.,
In the absence of sample swaps and putative contaminations, high proportions of clonal mutations that are shared between some of the related daughters and absent from their corresponding parents indicate that these mutations were indeed acquired de novo. Such daughters were overall rare and most likely established from the common subclone that arose at some point during the cultivation of the parent clone, after its DNA was already extracted.
Validation of clonal sample origins: To ensure that samples were clonal and single-cell-derived, proportions of the variant-reporting reads were examined (equivalent to variant allele fraction, VAF) at the mutation loci. Consistent with the polyploid background of most cell lines under investigation (Petljak et al. 2019), VAF distributions often deviated from the average of ˜50% expected for clonal heterozygous somatic mutations occurring in a diploid genome. The largely unimodal VAF distributions validated the clonal origin of the majority of the samples. Bimodal VAF distributions were observed in several clones. However, in all cases, at least one of the peaks followed the VAF distribution of other clonal samples from the same cell line, indicating that the other peak presenting in some clones likely originates from the sub-clonal evolution taking place in culture after the relevant single cells were isolated. Such instances were overall rare, but most common in the clones from the BC-1 cell line.
Sequence context-based classification of single base substitutions: SigProfilerMatrixGenerator (python v.1.1; github.com/AlexandrovLab/SigProfilerMatrixGenerator) (Bergstrom et al. 2019) was used to categorize SBSs into three separate sequence-context based classifications, which were used in analyses of mutation enrichments at APOBEC3-associated target motifs, and mutational signatures analyses. The algorithm allocates each SBS to (1) one of the 6-class categories (C>A, C>G, C>T, T>A, T>C and T>G) in which the mutated base is represented by the pyrimidine of the base pair; (2) one of the 96-class categories (in which each of the 6-class mutation types is further split into 16 subcategories based on the flanking 5′ and 3′ bases); and (3) one of the 1,536-class categories (in which each of the 6-class mutation types is further split into 256 subcategories based on two flanking bases 5′ and 3′ to the mutated base).
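To make the three classification levels concrete, the sketch below derives the 6-, 96- and 1,536-class labels for one substitution from its five-base reference context. It is an illustration of the pyrimidine-strand convention only and does not call SigProfilerMatrixGenerator; the function names and the bracketed label format (e.g., T[C>T]A) are assumptions chosen for readability.

```python
# Complement table used to flip a context onto the pyrimidine strand.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_sbs(context5: str, alt: str):
    """Classify one substitution.

    context5: five reference bases centred on the mutated base.
    alt:      the mutant base, on the same strand as context5.
    Returns (6-class, 96-class, 1,536-class) labels.
    """
    if context5[2] in "AG":
        # Mutated base is a purine: report the event on the opposite strand.
        context5 = reverse_complement(context5)
        alt = reverse_complement(alt)
    ref = context5[2]
    sub = f"{ref}>{alt}"                                   # 6 classes
    sbs96 = f"{context5[1]}[{sub}]{context5[3]}"           # 6 x 4 x 4
    sbs1536 = f"{context5[:2]}[{sub}]{context5[3:]}"       # 6 x 16 x 16
    return sub, sbs96, sbs1536
```

Under this convention, a C>T change in the context ATCAG and a G>A change in the context CTGAT describe the same event on opposite strands, so both receive the labels C>T, T[C>T]A and AT[C>T]AG.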
Enrichment of APOBEC3-associated mutations at target motifs: Once SBSs were allocated to their sequence context classes as described, whereby the mutated base is represented by the pyrimidine base of the base pair, C>T and C>G base substitutions at TCN (N is any base) contexts, which characterize the APOBEC3-associated SBS2 and SBS13 signatures, were classified as ‘APOBEC’, whereas C>T and C>G substitutions at other contexts were classified as ‘OTHER’. C>A substitutions were excluded for simplicity, because some C>A mutations have been attributed both to APOBEC mutagenesis and to other mutational processes that commonly arise during in vitro cell cultivation (Petljak et al. 2019). Enrichment of ‘APOBEC’ mutations was then investigated across all clones in the target sequence motifs previously associated with APOBEC mutagenesis, including specific pentanucleotide motifs (Chan et al. 2015).
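The ‘APOBEC’ versus ‘OTHER’ split can be expressed as a small lookup on a 96-class label. The sketch below assumes labels of the form T[C>T]A, already reported on the pyrimidine strand; that label format and the function name are illustrative assumptions. Substitutions outside the two groups (C>A and all T>N changes) return None here, since the text assigns them to neither class.

```python
from typing import Optional


def apobec_class(sbs96: str) -> Optional[str]:
    """Assign a 96-class label such as 'T[C>T]A' to 'APOBEC', 'OTHER' or None."""
    five_prime = sbs96[0]      # base 5' of the mutated cytosine
    substitution = sbs96[2:5]  # e.g. 'C>T'
    if substitution not in ("C>T", "C>G"):
        return None            # C>A and T>N are not placed in either group
    # TCN context: a thymine immediately 5' of the mutated cytosine.
    return "APOBEC" if five_prime == "T" else "OTHER"
```

For example, T[C>G]T falls in the ‘APOBEC’ group, A[C>T]G in ‘OTHER’, and T[C>A]A in neither.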
Enrichment of APOBEC3-associated mutations at trinucleotide and pentanucleotide motifs: Enrichment of APOBEC3-associated mutations was compared across the pentanucleotide motifs that were previously associated with APOBEC3A (YTCN and YTCA, where Y is a pyrimidine base) and APOBEC3B activities (RTCN and RTCA, where R is a purine base) in yeast overexpression systems (Chan et al. 2015). Relevant APOBEC3-associated trinucleotide and pentanucleotide sequence motifs were quantified with sequence_utils (v.1.1.0, github.com/cancerit/sequence_utils/releases/tag/1.1.0; github.com/cancerit/sequence_utils/wiki#sequence-context-of-regions-processed-by-caveman) across human autosomal chromosomes (GRCh37), excluding the regions not considered by the CaVEMan algorithm in detecting SBS. The middle base pair of each reference pentanucleotide sequence was considered a putative mutation target and the sequence context surrounding it was quantified using the DNA strand belonging to the pyrimidine base of the target base pair, giving rise to a total of 96 trinucleotide and 512 possible pentanucleotide contexts that were quantified across both DNA strands (e.g., AGT trinucleotide is reported as ACT; AAGCA pentanucleotide is reported as TGCTT; the middle base of each is the ‘target’). Enrichment of ‘APOBEC’ mutations at the pentanucleotide motifs of interest was calculated as described previously (Petljak et al. 2019; Chan et al. 2015). For example, to calculate enrichment (E) of ‘APOBEC’ mutations at RTCN sites the following was used: E_RTCN = (Mut_APOBEC(RTCN) / Con_RTCN) / (Mut_APOBEC(TCN) / Con_TCN).
Mut_APOBEC(TCN) is the total number of ‘APOBEC’ mutations (C>G and C>T mutations at TCN contexts) in autosomal chromosomes; Mut_APOBEC(RTCN) is the sum of ‘APOBEC’ mutations at RTCN contexts in autosomal chromosomes; whereas Con_TCN and Con_RTCN represent the total number of TCN and RTCN contexts available among the regions considered by CaVEMan when calling mutations across the autosomal chromosomes. As described, both DNA strands are considered, but the mutation types and target motifs are reported based on the strand of the pyrimidine base of the target base pair.
Mutational signatures analysis: Mutational signatures analyses were performed using the SigProfilerExtractor tool (v. 1.0.17; github.com/AlexandrovLab/SigProfilerExtractor) (Islam et al. 2021), which is a method based on nonnegative matrix factorization (NMF) for de novo extraction of mutational signatures from a given matrix of SBS types. SBS were classified into 96 classes based on their trinucleotide sequence contexts (see ‘Sequence context-based classification of single base substitutions’). The tool was used over 500 iterations to identify profiles of mutational signatures operative across a total of 815,923 genome-wide mutations identified across 4 bulk cell lines and their corresponding 136 daughter and parent clones.
Mutational signatures were extracted de novo and subsequently mapped to the known COSMIC Mutational Signatures of cleaner patterns (v3, cancer.sanger.ac.uk/cosmic/signatures). Activities of identified COSMIC mutational signatures were quantified in each clone as part of the factorization of the input 96-SBS channel matrices, whereby numbers of SBS mutations belonging to each identified signature were quantified in the genome of each sample. All the relevant outputs from SigProfilerExtractor include profiles of de novo extracted signatures, metrics related to mapping of de novo signatures to COSMIC signature profiles and per-sample activity estimations.
Kataegis identification: Kataegis, or foci of localized hypermutation (Nik-Zainal et al. 2012), were quantified in 136 whole-genome sequenced parent and daughter clones. A kataegis focus was defined as a cluster of 5 or more consecutive APOBEC3-associated mutations (C>A, C>T, and C>G substitutions at TCN trinucleotides) that exhibit strand-coordination and have an average inter-mutation distance of <7,500 bases. While this approach may miss some foci, sensitivity of detection was sacrificed to obtain a higher predictive value for kataegis foci.
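One way to read the kataegis definition is as a scan over position-sorted APOBEC3-associated mutations on a chromosome. The sketch below is a simplified, greedy interpretation and is not the procedure actually used. It assumes the input already contains only APOBEC3-associated mutations, that strand-coordination means every mutation in a focus has its cytosine on the same strand, and that the longest qualifying window from each start position is taken; the function name and tuple layout are hypothetical.

```python
def find_kataegis(muts, min_size=5, max_mean_imd=7500):
    """Find kataegis-like foci on one chromosome.

    muts: list of (position, strand) tuples sorted by position, where strand
          is '+' if the mutated cytosine is on the reference strand, else '-'.
    Returns a list of (start, end, n_mutations) tuples.
    """
    foci = []
    i, n = 0, len(muts)
    while i < n:
        # Extend a run of consecutive mutations that share the same strand.
        j = i
        while j + 1 < n and muts[j + 1][1] == muts[i][1]:
            j += 1
        # Take the longest window starting at i with a small enough mean IMD.
        best = None
        for k in range(j, i + min_size - 2, -1):
            mean_imd = (muts[k][0] - muts[i][0]) / (k - i)
            if mean_imd < max_mean_imd:
                best = k
                break
        if best is None:
            i += 1
        else:
            foci.append((muts[i][0], muts[best][0], best - i + 1))
            i = best + 1
    return foci
```

Five same-strand mutations spaced 1,000 bases apart form one focus under this sketch, whereas five equally spaced mutations that alternate between strands form none, because no run of five strand-coordinated mutations exists.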
Identification of clustered mutations: To detect clustered single base substitutions, a sample-dependent inter-mutational distance (IMD) cutoff was derived, which is unlikely to occur by chance given the mutational pattern and mutational burden of each clone. To derive a background model reflecting the distribution of mutations that one would expect to observe by chance, SigProfilerSimulator (v1.1.2) was used to randomly simulate the mutations in each clone across the genome (Bergstrom et al. 2020). Specifically, the model was generated to maintain the +/−1 bp sequence context for each substitution, the strand coordination including the transcribed or untranscribed strand within genic regions (Bergstrom et al. 2020) and the total number of mutations across each chromosome for a given sample. All single base substitutions were randomly simulated 100 times and used to calculate the sample-dependent IMD cutoff so that 90% of mutations below this threshold were clustered with respect to the simulated model (i.e., not occurring by chance with a q-value <0.01). Further, the heterogeneity in mutation rates across the genome and the variances in clonality or copy-number were considered by correcting for mutation rich regions present in 10 Mb-sized windows and by using a threshold for the difference in variant allele frequencies between subsequent substitutions in a clustered event (variant allele frequency difference <0.10). Subsequently, the clustered mutations were subclassified into specific categories of events: (i) doublet substitutions; two adjacent mutations with consistent variant allele frequencies; (ii) extended multi-base substitutions; previously termed omikli events (Mas-Ponte et al. 2020) that reflect any two mutational events greater than 1 bp and less than the sample-dependent IMD cutoff with consistent variant allele frequencies; (iii) large mutational events; previously termed kataegis (Nik-Zainal et al. 2012) with three or more mutational events greater than 1 bp and less than the sample-dependent IMD cutoff with consistent variant allele frequencies. Lastly, statistical comparisons across clones were performed using a Mann-Whitney U test.
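Once a sample-dependent IMD cutoff is available, the subclassification into doublets, omikli and kataegis can be sketched as a chaining step followed by a size check. The code below is a minimal illustration that takes the cutoff as given and omits the simulation-based derivation and the 10 Mb regional correction. The function name, the tuple layout and the ‘other’ label for chains of three or more that contain adjacent bases are assumptions, since the text does not say how such chains are handled.

```python
def classify_clusters(muts, imd_cutoff, max_vaf_diff=0.10):
    """Group and label clustered substitutions on one chromosome.

    muts: list of (position, vaf) tuples sorted by position.
    Returns a list of (label, [positions]) for chains of two or more mutations.
    """
    chains, current = [], []
    for mut in muts:
        if current:
            imd = mut[0] - current[-1][0]
            vaf_ok = abs(mut[1] - current[-1][1]) < max_vaf_diff
            near = imd == 1 or imd < imd_cutoff
            if not (near and vaf_ok):
                chains.append(current)  # close the chain and start a new one
                current = []
        current.append(mut)
    if current:
        chains.append(current)

    events = []
    for chain in chains:
        if len(chain) < 2:
            continue  # isolated substitution, not clustered
        positions = [p for p, _ in chain]
        imds = [b - a for a, b in zip(positions, positions[1:])]
        if len(chain) == 2:
            label = "doublet" if imds[0] == 1 else "omikli"
        else:
            # Assumption: 3+ events all separated by more than 1 bp are kataegis.
            label = "kataegis" if all(d > 1 for d in imds) else "other"
        events.append((label, positions))
    return events
```

Per-clone counts of each event type obtained this way are the kind of quantity that could then be compared between groups of clones with a Mann-Whitney U test.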
The present application refers to various issued patents, published patent applications, scientific journal articles, and other publications, all of which are incorporated herein by reference. The details of one or more embodiments of the invention are set forth herein. Other features, objects, and advantages of the invention will be apparent from the Detailed Description, the Figures, the Examples, and the Claims.
Articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Embodiments or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.
Furthermore, the disclosure encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims is introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements and/or features, certain embodiments of the disclosure or aspects of the disclosure consist, or consist essentially of, such elements and/or features. For purposes of simplicity, those embodiments have not been specifically set forth in haec verba herein. It is also noted that the terms “comprising” and “containing” are intended to be open and permit the inclusion of additional elements or steps. Where ranges are given, endpoints are included. Furthermore, unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or sub-range within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.
This application refers to various issued patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. If there is a conflict between any of the incorporated references and the instant specification, the specification shall control. In addition, any particular embodiment of the present invention that falls within the prior art may be explicitly excluded from any one or more of the embodiments. Because such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the invention can be excluded from any embodiment, for any reason, whether or not related to the existence of prior art.
Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation many equivalents to the specific embodiments described herein. The scope of the present embodiments described herein is not intended to be limited to the above Description, but rather is as set forth in the appended embodiments. Those of ordinary skill in the art will appreciate that various changes and modifications to this description may be made without departing from the spirit or scope of the present invention, as defined in the following embodiments.
This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application, U.S. Ser. No. 63/140,706, filed Jan. 22, 2021, which is incorporated herein by reference.
This invention was made with government support under Grant Nos. R01ES030993-01A1, R01ES032547-01, R00CA212290, and P30CA008748 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/013328 | 1/21/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63140706 | Jan 2021 | US |