The disclosure is generally directed to systems and methods involving diagnostics and treatments of neoplasms and cancers based upon mutational burden.
Cancer develops from an accumulation of somatic mutations over time. While a small subset of these mutations drives tumor progression, the vast majority of remaining mutations, known as passengers, are believed to not contribute to tumor pathogenicity or progression, despite their abundance and diversity. The number of passengers in a tumor can vary by over four orders of magnitude, even within the same cancer type, from just a few to tens of thousands of point mutations.
Various embodiments are directed to diagnostics and treatments of neoplasms and cancers based on mutational load. In several embodiments, genetic material of neoplastic tissue or cancer is assessed for mutational load. In many embodiments, when the genetic material of neoplastic tissue or cancer has a high mutational load, a treatment regimen can be determined. In some embodiments, a treatment is administered based on the mutational load. In some embodiments, a treatment is administered when the mutational load is above a threshold. Treatments include (but are not limited to) transcription inhibitors, cytoskeleton organization inhibitors, protein degradation inhibitors, agonists of apoptosis, chaperone inhibitors, protein inhibitors, DNA replication inhibitors, energy metabolism inhibitors, oxidative stress inhibitors, G-coupled receptor modulators, HSP90 inhibitors, proteasome inhibitors, and ubiquitin-specific proteasome inhibitors.
In some implementations, a method is for determining a treatment regimen for a neoplasm or cancer. The method comprises assessing genetic material of a neoplasm or cancer of an individual to determine a mutational burden. The method comprises, based on an amount of mutational burden, determining a treatment regimen.
In some implementations, assessing genetic material comprises quantifying the amount of somatic mutations within the genetic material.
In some implementations, the somatic mutations comprise single nucleotide variations (SNVs), copy number variations (CNVs), insertions, and deletions.
In some implementations, the method further comprises performing a high-throughput sequencing reaction on the genetic material of the neoplasm or the cancer to yield a sequencing result. The method further comprises aligning the sequencing result of the neoplasm or the cancer against a reference genome to identify genetic variations within the genetic material of the neoplasm or the cancer. The genetic variations within the genetic material of the neoplasm or the cancer comprise somatic mutations.
In some implementations, the method further comprises performing a high-throughput sequencing reaction on genetic material of a control sample to yield a sequencing result. The method further comprises aligning the sequencing result of the control sample against a reference genome. The method further comprises aligning the sequencing result of the neoplasm or the cancer with the sequencing result of the control sample to identify somatic variations within the neoplasm or the cancer.
In some implementations, the high-throughput sequencing is whole genome sequencing, whole exome sequencing, or targeted sequencing.
In some implementations, the method further comprises obtaining a biopsy of the individual. The biopsy is a tumor excision, a liquid biopsy, or a biological waste biopsy. The method further comprises extracting the genetic material of the neoplasm or the cancer from the biopsy.
In some implementations, the method further comprises, when it is determined that the amount of mutational burden is greater than a threshold, administering the treatment regimen to the individual.
In some implementations, the threshold is a mutational burden in the top 25% of a particular cancer type or is a mutational burden in the top 25% of all cancer types.
In some implementations, the threshold is a mutational burden in the top 10% of a particular cancer type or is a mutational burden in the top 10% of all cancer types.
In some implementations, the threshold is a mutational burden in the top 5% of a particular cancer type or is a mutational burden in the top 5% of all cancer types.
In some implementations, the treatment regimen comprises administration of a HSP90 inhibitor, wherein the HSP90 inhibitor is alvespimycin, BIIB021, CCT018159, ganetespib, gedunin, NVP-AUY922, PU-H71, or VER-49009.
In some implementations, the treatment regimen comprises administration of a proteasome inhibitor, wherein the proteasome inhibitor is bortezomib, carfilzomib, delanzomib, ixazomib, ixazomib-citrate, MG-132, ONX-0914, or oprozomib.
In some implementations, the treatment regimen comprises administration of a ubiquitin-specific proteasome inhibitor, wherein the ubiquitin-specific proteasome inhibitor is NSC-632839, P22077, or P5091.
In some implementations, the neoplasm or the cancer is bile duct cancer, bladder cancer, bone cancer, brain cancer, breast cancer, colon/colorectal cancer, endometrial/uterine cancer, esophageal cancer, gall bladder cancer, gastric cancer, head and neck cancer, kidney cancer, liver cancer, lung cancer, neuroblastoma, ovarian cancer, pancreatic cancer, prostate cancer, rhabdoid tumor, sarcoma, skin cancer, or thyroid cancer.
In some implementations, a method is for assessing cytotoxicity of a compound on a neoplastic cell, a cancer cell, or a tumor cell with high mutational burden. The method comprises performing a high-throughput sequencing reaction on genetic material of a specimen to yield a sequencing result. The specimen is a growth of neoplastic cells, a growth of cancer cells, or a tumor. The method comprises quantifying the amount of somatic mutations within the genetic material. The method comprises determining that the specimen has a mutational burden that is greater than a threshold. The method comprises contacting a neoplastic cell of the growth of neoplastic cells, a cancer cell of the growth of cancer cells, or a tumor cell of the tumor with a compound to assess the cytotoxicity of the compound on the neoplastic cell, the cancer cell, or the tumor cell.
In some implementations, the neoplastic cell, the cancer cell, or the tumor cell is in vitro.
In some implementations, the neoplastic cell, the cancer cell, or the tumor cell is in vivo.
In some implementations, the compound is classified as: a transcription inhibitor, a cytoskeleton organization inhibitor, a protein degradation inhibitor, an agonist of apoptosis, a chaperone inhibitor, a protein inhibitor, a DNA replication inhibitor, an energy metabolism inhibitor, an oxidative stress inhibitor, a G-coupled receptor modulator, a HSP90 inhibitor, a proteasome inhibitor, or a ubiquitin-specific proteasome inhibitor.
In some implementations, the somatic mutations comprise at least one of: single nucleotide variations (SNVs), copy number variations (CNVs), insertions, or deletions.
The description and claims will be more fully understood with reference to the following figures and data graphs, which are presented as exemplary embodiments of the invention and should not be construed as a complete recitation of the scope of the invention.
Turning now to the drawings and data, systems and methods for assessing neoplasms and cancers for determining a treatment regimen are provided. It is now understood that high mutational burden within neoplastic tissue or cancer is a vulnerability that can be exploited. Mutational burden is an amount of somatic mutations within the genetic material of a neoplasm or cancer. In several embodiments, the genetic material of a neoplasm or cancer is assessed for mutational burden. In many embodiments, a treatment regimen is determined based on the mutational burden. For instance, a neoplasm or cancer with high mutational burden can be treated with particular drugs that target the cellular processes that protect the neoplasm or cancer in response to mutational burden. In some embodiments, a neoplasm or a cancer is treated based on the determined treatment regimen.
Whether passenger mutations are damaging to tumors has long been a matter of debate. Generally, many in the art suggest that passengers are functionally unimportant to tumors given that most non-synonymous mutations are not removed by negative selection in somatic tissues. This is in direct contrast to the human germ-line, where non-synonymous mutations do appear to be functionally damaging in most genes and the signals of negative selection are pervasive. The common explanation for why protein-coding mutations are removed in the human germ-line but maintained in somatic tissues is that most genes are only important for multi-cellular function at the organismal level (e.g., during development), but not during somatic growth.
Here, an alternative explanation is provided that suggests non-synonymous mutations are indeed damaging in somatic evolution, but negative selection is too inefficient at removing them due to the linkage effects driven by the lack of recombination in somatic cells. Without recombination to break apart combinations of mutations, the beneficial drivers and deleterious passengers that happen to be present in the same genome are acted upon by selection together. This makes it less efficient for selection to favor the beneficial drivers and to remove the deleterious passengers. As a result, a substantial number of weakly damaging passengers can accrue in neoplasms and cancer due to inefficient negative selection over time.
If individual passengers are in fact substantially damaging in cancer, successful tumors with thousands of linked mutations must find ways to maintain their viability by mitigating the deleterious effects resulting from mutational burden. While paths to mitigation are difficult to predict for non-coding mutations, tumors with mutations in protein-coding genes are expected to minimize the damaging phenotypic effects of protein mis-folding stress. To investigate this hypothesis, gene expression was used to assess how the physiological state of cancer cells changes as they accumulate protein-coding mutations. Using a generalized linear mixed effects regression model (GLMM) and leveraging variation across 10,295 tumors from 33 cancer types, it was found that complexes that re-fold proteins (chaperones), degrade proteins (proteasome) and splice mRNA (spliceosome) are more up-regulated in neoplasms and cancers with higher mutational burdens. The results were validated by showing that similar physiological responses occur in high mutational burden cancer cell lines as well. These results establish a new diagnostic and treatment regimen for neoplasms and cancers with higher mutational burdens.
A number of embodiments are directed to assessment of neoplasms and cancers based on their mutational burden and determining an appropriate treatment regimen. In several embodiments, genetic material of a neoplasm or cancer is assessed for mutational burden. Neoplasms and cancers with higher mutational burden are susceptible to treatments that target the biological processes that allow for the high mutational burden.
Several embodiments are directed to methods for assessing a neoplasm or cancer. Generally, genetic material of the neoplasm or cancer is examined for mutational burden. A neoplasm or cancer with high mutational burden can be treated with drugs that target the protein complexes and/or pathways that help mitigate the negative effects of mutational burden. Provided in
Method 200 begins with assessing (201) mutational burden of a neoplastic cell, a cancer cell, a tumor, a neoplasm, or a cancer. In several embodiments, genetic material of the neoplastic cell, the cancer cell, the tumor, the neoplasm, or the cancer is examined for mutational burden. In some embodiments, a biopsy of an individual is utilized to obtain genetic material for examination. Various biopsies can be utilized to extract genetic material, including (but not limited to) tumor excision, liquid biopsy (e.g., blood draw), or biological waste biopsy. In some embodiments, the genetic material is obtained from a neoplastic cell, a cancer cell, or a tumor grown in vitro or in vivo. The genetic material is any nucleic acid that would be able to provide analysis of mutational burden, including (but not limited to) DNA, RNA, cell-free nucleic acids, or any other nucleic acids that would have derived from the neoplasm or cancer.
Mutational burden is assessed by quantifying somatic mutations from the genetic material. Mutations that can be assessed include (but are not limited to) single nucleotide variations (SNVs), copy number variations (CNVs), insertions, and deletions. In some embodiments, a total number of mutations is quantified. In some embodiments, a relative number of mutations is quantified, such as (for example) the number of mutations per kilobase of genetic material.
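By way of illustration only, the total and relative quantifications described above can be sketched as follows. The `Mutation` record, the function names, and the example values are hypothetical and are not part of the disclosure; the sketch assumes somatic mutations have already been called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record for one called somatic mutation."""
    chrom: str
    pos: int
    kind: str  # e.g., "SNV", "CNV", "insertion", or "deletion"


def total_burden(mutations):
    """Total number of somatic mutations."""
    return len(mutations)


def burden_per_kilobase(mutations, assessed_bases):
    """Relative burden: mutations per 1,000 bases of assessed genetic material."""
    if assessed_bases <= 0:
        raise ValueError("assessed_bases must be positive")
    return len(mutations) / (assessed_bases / 1000)


# Illustrative values only.
calls = [
    Mutation("chr1", 12345, "SNV"),
    Mutation("chr2", 67890, "deletion"),
    Mutation("chr7", 55555, "SNV"),
]
print(total_burden(calls))               # 3
print(burden_per_kilobase(calls, 6000))  # 0.5
```

Either quantity could serve as the mutational burden value compared against a threshold, depending on the embodiment.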
Various methodologies can be utilized to quantify mutations. Generally, the genetic material is sequenced (e.g., high-throughput sequencing) and the sequencing result is assessed against a control sequence to identify somatic mutations across a genome, an exome, or a targeted set of sequence fragments. In some embodiments, whole genome sequencing is performed. In some embodiments, whole exome sequencing is performed. In some embodiments, targeted sequencing of a high number of sequence fragments is performed (e.g., over 1000 sequence fragments, over 10,000 sequence fragments, over 100,000 sequence fragments, or over 1,000,000 sequence fragments). In some embodiments, the control sequence is genetic material of healthy tissue (i.e., non-neoplasm or non-cancer) of the individual. In some embodiments, the control sequence is an established reference sequence. Reference sequences include (but are not limited to) hg19, hg38, and population genomes or superpopulation genomes such as those from the 1000 Genomes project (Clarke, et al. Nat Methods 9, 459-462 (2012), the disclosure of which is incorporated herein by reference).
Various methodologies can be utilized to identify mutations. Generally, the sequencing result of genetic material is aligned to yield a genome, an exome, or a targeted portion of the genome. Alignment can be done using an established reference sequence, such as (for example) hg19, hg38, and population genomes or superpopulation genomes such as those from the 1000 Genomes project. Mutations can be called based on variation from the control sequence. Protein coding mutations can further be assessed based on the effect of the variation on the protein sequence.
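As a minimal sketch of calling somatic mutations against a control sequence, the following assumes that variants for the neoplasm and for a matched healthy-tissue control have each already been identified relative to the same reference and are represented as hypothetical `(chromosome, position, reference allele, alternate allele)` tuples. Real variant callers also weigh read depth and quality, which this sketch omits.

```python
def somatic_variants(tumor_variants, control_variants):
    """Variants present in the tumor but absent from the matched control.

    Variants shared with the control are treated as inherited (germline)
    and are excluded from the somatic set.
    """
    return set(tumor_variants) - set(control_variants)


# Illustrative values only.
tumor = {
    ("chr1", 100, "A", "T"),
    ("chr2", 200, "G", "C"),
    ("chr3", 300, "C", "CT"),
}
control = {("chr1", 100, "A", "T")}  # shared with the tumor, so not somatic

somatic = somatic_variants(tumor, control)
print(len(somatic))  # 2
```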
Method 200 further determines (203) a treatment regimen based on the mutational burden. It has been determined that neoplasms and cancers with higher mutational burden are susceptible to drugs that counteract the cellular processes that help mitigate the effects of somatic mutation.
In several embodiments, cancers and neoplasms with high mutational burden are determined to benefit from drugs that modulate the protein structures and cellular processes that are more upregulated with greater mutational burden. Accordingly, in various embodiments, drug treatments include (but are not limited to) transcription inhibitors, cytoskeleton organization inhibitors, protein degradation inhibitors, agonists of apoptosis, chaperone inhibitors, protein inhibitors, DNA replication inhibitors, energy metabolism inhibitors, oxidative stress inhibitors, G-coupled receptor modulators, HSP90 inhibitors, proteasome inhibitors, and ubiquitin-specific proteasome inhibitors.
In some embodiments, the drug for treatment is a HSP90 inhibitor. HSP90 inhibitors include (but are not limited to) alvespimycin, BIIB021, CCT018159, ganetespib, gedunin, NVP-AUY922, PU-H71, and VER-49009. For more on each drug's effectiveness on cancer type, see
In some embodiments, the drug for treatment is a proteasome inhibitor. Proteasome inhibitors include (but are not limited to) bortezomib, carfilzomib, delanzomib, ixazomib, ixazomib-citrate, MG-132, ONX-0914, and oprozomib. For more on each drug's effectiveness on cancer type, see
In some embodiments, the drug for treatment is a ubiquitin-specific proteasome inhibitor. Ubiquitin-specific proteasome inhibitors include (but are not limited to) NSC-632839, P22077, and P5091. For more on each drug's effectiveness on cancer type, see
In some embodiments, the drug for treatment is a growth factor inhibitor, an apoptosis agonist, an energy metabolism inhibitor, an inflammatory/immune modulator, a protein synthesis inhibitor, a DNA replication inhibitor, a cytoskeleton inhibitor, a transcription inhibitor, an ion channel regulator, a protein degradation inhibitor, a chaperone inhibitor, an oxidative stress activator, a lipid metabolism inhibitor, a growth hormone inhibitor, an angiogenesis inhibitor, a neurotransmitter inhibitor, a neurotransmitter enhancer, an oxidative stress inhibitor, a mucolytic agent, a melanin inhibitor, a histone/methylation inhibitor, a sugar metabolism inhibitor, a G-coupled protein receptor regulator, a protein metabolism inhibitor, a nitrogen metabolism inhibitor, or a viral replication inhibitor. Non-limiting examples of drugs for treatment are provided in Table 1.
It is to be understood that drug combinations can also be provided. Accordingly, two or more drugs are provided for treatment. Any two drugs described herein can be combined together.
In several embodiments, the treatment regimen is based on the amount of mutational burden. In some embodiments, the drugs described herein for treatment are to be utilized within a regimen when the mutational burden is above a threshold. Any appropriate threshold can be utilized. In some embodiments, neoplasms or cancers having mutational burden in the top 5% of the particular cancer or in the top 5% of all cancers are provided a treatment regimen described herein. In some embodiments, neoplasms or cancers having mutational burden in the top 10% of the particular cancer or in the top 10% of all cancers are provided a treatment regimen described herein. In some embodiments, neoplasms or cancers having mutational burden in the top 25% of the particular cancer or in the top 25% of all cancers are provided a treatment regimen described herein. Table 2 provides mutational burden counts for the top 5%, top 10%, and top 25% in various particular cancers or of all cancer types.
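One way a percentile-based threshold of this kind could be derived from a cohort of mutational burden values is sketched below. The function names, the nearest-rank convention, and the cohort values are assumptions made for illustration; they are not taken from Table 2.

```python
def top_percent_cutoff(cohort_burdens, top_percent):
    """Smallest burden that still falls in the top `top_percent` of the cohort.

    Uses a simple nearest-rank convention (an assumption of this sketch).
    """
    if not cohort_burdens:
        raise ValueError("cohort_burdens must not be empty")
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    ranked = sorted(cohort_burdens, reverse=True)
    k = max(1, (len(ranked) * top_percent) // 100)
    return ranked[k - 1]


def meets_threshold(burden, cohort_burdens, top_percent):
    """True if `burden` is at or above the cohort's top-percent cutoff."""
    return burden >= top_percent_cutoff(cohort_burdens, top_percent)


# Illustrative cohort of 20 mutation counts (not real data).
cohort = [5, 12, 20, 33, 41, 58, 64, 79, 90, 110,
          150, 210, 340, 480, 620, 900, 1300, 2100, 4800, 9500]

print(top_percent_cutoff(cohort, 5))     # 9500
print(top_percent_cutoff(cohort, 10))    # 4800
print(top_percent_cutoff(cohort, 25))    # 900
print(meets_threshold(1300, cohort, 25)) # True
```

The cohort passed in can be restricted to a particular cancer type or can span all cancer types, corresponding to the two alternatives described above.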
Based upon the mutational burden and determined treatment regimen, a neoplastic cell, a cancer cell, a tumor, a neoplasm, or a cancer is optionally treated (205). Accordingly, the determined treatment regimen can be administered, such as the treatment regimens described herein.
In some embodiments, when a neoplastic cell, a cancer cell, a tumor, a neoplasm, or a cancer has a mutational burden over a threshold, the neoplastic cell, the cancer cell, the tumor, the neoplasm, or the cancer is contacted with a transcription inhibitor, a cytoskeleton organization inhibitor, a protein degradation inhibitor, an agonist of apoptosis, a chaperone inhibitor, a protein inhibitor, a DNA replication inhibitor, an energy metabolism inhibitor, an oxidative stress inhibitor, a G-coupled receptor modulator, a HSP90 inhibitor, a proteasome inhibitor, or a ubiquitin-specific proteasome inhibitor. In some embodiments, when a neoplastic cell, a cancer cell, a tumor, a neoplasm, or a cancer has a mutational burden that is not over a threshold, the neoplastic cell, the cancer cell, the tumor, the neoplasm, or the cancer is not contacted with a transcription inhibitor, a cytoskeleton organization inhibitor, a protein degradation inhibitor, an agonist of apoptosis, a chaperone inhibitor, a protein inhibitor, a DNA replication inhibitor, an energy metabolism inhibitor, an oxidative stress inhibitor, a G-coupled receptor modulator, a HSP90 inhibitor, a proteasome inhibitor, or a ubiquitin-specific proteasome inhibitor, but is instead contacted with a medicament of a standardized treatment protocol for the cancer type.
In some embodiments, when a neoplastic cell, a cancer cell, a tumor, a neoplasm, or a cancer has a mutational burden over a threshold, an individual from which the neoplastic cell, the cancer cell, the tumor, the neoplasm, or the cancer was derived is administered a treatment comprising a transcription inhibitor, a cytoskeleton organization inhibitor, a protein degradation inhibitor, an agonist of apoptosis, a chaperone inhibitor, a protein inhibitor, a DNA replication inhibitor, an energy metabolism inhibitor, an oxidative stress inhibitor, a G-coupled receptor modulator, a HSP90 inhibitor, a proteasome inhibitor, or a ubiquitin-specific proteasome inhibitor. In some embodiments, when a neoplastic cell, a cancer cell, a tumor, a neoplasm, or a cancer has a mutational burden that is not over a threshold, an individual from which the neoplastic cell, the cancer cell, the tumor, the neoplasm, or the cancer was derived is not administered a treatment comprising a transcription inhibitor, a cytoskeleton organization inhibitor, a protein degradation inhibitor, an agonist of apoptosis, a chaperone inhibitor, a protein inhibitor, a DNA replication inhibitor, an energy metabolism inhibitor, an oxidative stress inhibitor, a G-coupled receptor modulator, a HSP90 inhibitor, a proteasome inhibitor, or a ubiquitin-specific proteasome inhibitor, but is instead administered a medicament of a standardized treatment protocol for the cancer type.
Neoplasm and cancer types that can be treated include (but are not limited to) all cancers generally, bile duct cancer, bladder cancer, bone cancer, brain cancer, breast cancer, colon/colorectal cancer, endometrial/uterine cancer, esophageal cancer, gall bladder cancer, gastric cancer, head and neck cancer, kidney cancer, liver cancer, lung cancer, neuroblastoma, ovarian cancer, pancreatic cancer, prostate cancer, rhabdoid tumor, sarcoma, skin cancer, and thyroid cancer.
Dosing and therapeutic regimens can be administered appropriate to the cancer to be treated, and can be determined by preclinical and clinical studies.
In some embodiments, drugs are administered in a therapeutically effective amount as part of a course of treatment. As used in this context, to “treat” means to ameliorate at least one symptom of the disorder to be treated or to provide a beneficial physiological effect. For example, one such amelioration of a symptom could be reduction of tumor size.
A therapeutically effective amount can be an amount sufficient to prevent, reduce, ameliorate or eliminate the symptoms of breast cancer. In some embodiments, a therapeutically effective amount is an amount sufficient to reduce the growth and/or metastasis of a cancer.
While specific examples of methods for assessing mutational burden and determining a treatment regimen are described above, one of ordinary skill in the art can appreciate that various steps of the method can be performed in different orders and that certain steps may be optional according to some embodiments of the disclosure. As such, it should be clear that the various steps of the process could be used as appropriate to the requirements of specific applications.
The embodiments of the disclosure will be better understood with the several examples provided. Validation results are also provided.
Cancers Adapt to their Mutational Load by Buffering Protein Misfolding Stress
In this example, the ability of tumors to maintain their viability by mitigating the detrimental effects of mutational load was examined by analyzing tumor tissues with paired mutational and gene expression profiles to assess how the physiological state of cancer cells changes as they accumulate protein-coding mutations. Using a generalized linear mixed effects regression model (GLMM), variation across 10,295 tumors from 33 cancer types was leveraged, and it was found that complexes that re-fold proteins (chaperones), degrade proteins (proteasome) and splice mRNA (spliceosome) are up-regulated in high mutational load tumors. These results were validated by showing that similar physiological responses occur in high mutational load cancer cell lines as well. Finally, a causal connection was established by showing that high mutational load cell lines are particularly sensitive when proteasome and chaperone function is disrupted through downregulation of expression via short-hairpin RNA (shRNA) knock-down or targeted therapies. Collectively, these data indicate that the viability of high mutational load tumors is strongly dependent on the up-regulation of complexes that degrade and refold proteins, revealing a generic vulnerability of cancer that can be therapeutically exploited.
A genome-wide screen was performed to systematically identify which genes are transcriptionally upregulated in response to mutational load in human tumors. To do so, publicly available whole-exome and gene expression data from 10,295 human tumors across 33 cancer types from The Cancer Genome Atlas (TCGA) were assessed. Multiple classes of mutations were considered to define mutational load, and their degree of collinearity was investigated, focusing on protein-coding regions since the use of whole-exome data limits the ability to accurately assess mutations in non-coding regions. It was found that there is a high degree of collinearity among synonymous, non-synonymous and nonsense point mutations in protein-coding genes (R>0.9) but weak collinearity between point mutations and copy number alterations (R<0.05) (
Since gene expression can vary across tumors due to many factors, such as cancer type, tumor purity and other unknown factors, a generalized linear mixed model (GLMM) was utilized to measure the association of mutational load and gene expression while accounting for these potential confounders (
Y ~ β0 + β1X1 + β2X2 + v + e
where Y is a vector of normalized expression values across all tumors, β0 is the fixed intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each tumor, β2 is the fixed slope for the predictor variable X2 which is a vector of the purity of each tumor, v is the random intercept for each cancer type, and e is a Gaussian error term (for more details, see Methods section below).
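For readers who prefer an indexed form, the same model can be written per tumor as sketched below, where i indexes tumors and j indexes cancer types. The indices, the variance symbols, and the Gaussian distribution of the random intercept are assumptions of this sketch (a conventional choice for random-intercept models) rather than details stated above.

```latex
y_{ij} = \beta_0 + \beta_1 x_{1,ij} + \beta_2 x_{2,ij} + v_j + e_{ij},
\qquad v_j \sim \mathcal{N}(0, \sigma_v^2),
\qquad e_{ij} \sim \mathcal{N}(0, \sigma_e^2)
```

Here y_{ij} is the normalized expression of a gene in tumor i of cancer type j, x_{1,ij} is the mutational load of that tumor, x_{2,ij} is its purity, v_j is the random intercept shared by all tumors of cancer type j, and e_{ij} is the Gaussian error term.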
Using this approach, the GLMM was applied to all tumors in TCGA, and 5,330 genes were identified that are significantly up-regulated in response to mutational load (β1>0, FDR<0.05). Next, these genes were linked to cellular function by performing gene set enrichment to known protein complexes (CORUM database) and pathways (KEGG database) using gprofiler2 (
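The selection criterion (β1>0, FDR<0.05) can be illustrated with the sketch below. The text does not state which false discovery rate procedure was used; the Benjamini-Hochberg adjustment shown here is an assumption, and the gene names, slopes, and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene results: (gene, fitted slope beta1, raw p-value).
results = [
    ("GENE_A", 0.4, 0.001),
    ("GENE_B", -0.3, 0.008),
    ("GENE_C", 0.2, 0.039),
    ("GENE_D", 0.1, 0.041),
    ("GENE_E", 0.5, 0.600),
]

q_values = benjamini_hochberg([p for _, _, p in results])

# Keep genes with a positive slope and an adjusted p-value below 0.05.
up_regulated = [
    gene
    for (gene, beta1, _), q in zip(results, q_values)
    if beta1 > 0 and q < 0.05
]
print(up_regulated)  # ['GENE_A']
```

In this illustration GENE_B passes the FDR cutoff but has a negative slope, and GENE_C and GENE_D have positive slopes but adjusted p-values just above 0.05, so only GENE_A is retained.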
It was next investigated in detail how these protein complexes could mitigate the damaging effects of protein misfolding in high mutational load tumors by examining the role of the spliceosome in gene silencing. It was hypothesized that the up-regulation of the spliceosome in high mutational load tumors prevents further protein misfolding by regulating pre-mRNA transcripts to be degraded rather than translated. The down-regulation of gene expression via alternative splicing events, such as intron retention, is one mechanism to silence genes by funneling transcripts to mRNA decay pathways.
To test whether gene expression is down-regulated in high mutational load tumors through intron retention, previously called alternative splicing events in TCGA were utilized. Alternative splicing events within this dataset were quantified through a metric called percent spliced in, or PSI. PSI is calculated as the number of reads that overlap the alternative splicing event (e.g., for intron retention, either at intronic regions or those at the boundary of exon to intron junctions) divided by the total number of reads that support and don't support the alternative splicing event. Thus, PSI estimates the probability of alternative splicing events only at specific exonic boundaries in the entire transcript population without requiring information on the complete underlying composition of each full-length transcript.
Using these alternative splicing calls, it was reasoned that if a transcript contains an intron retention event and is downregulated in expression, the transcript is more likely to have been degraded by mRNA decay pathways. For all genes, it was first quantified whether intron retention events were present based on a threshold value >80% PSI. For each gene with an intron retention event, it was quantified whether the same gene was under-expressed. Each gene was counted as under-expressed if its expression was one standard deviation below the mean expression within the same cancer type. To control for mutations that might affect patterns of expression (i.e., expression quantitative trait loci or eQTL effects), alternative splicing events that contained a point mutation within the same gene were removed from the analysis (these represent only ~1% of intron retention events across all tumors; for more details, see Methods section below). It was found that relative to all transcripts with intron retention events, the number of transcripts that are under-expressed increases with tumor mutational load (
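The PSI calculation described above reduces to a simple ratio, sketched here with a hypothetical function name and illustrative read counts. PSI is expressed as a percentage in this sketch.

```python
def percent_spliced_in(supporting_reads, non_supporting_reads):
    """PSI (as a percentage) for one alternative splicing event.

    supporting_reads: reads that overlap (support) the event.
    non_supporting_reads: reads that do not support the event.
    Returns None when the event has no read coverage.
    """
    total = supporting_reads + non_supporting_reads
    if total == 0:
        return None
    return 100.0 * supporting_reads / total


# Illustrative counts: 45 reads support an intron retention event, 5 do not.
print(percent_spliced_in(45, 5))  # 90.0
```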
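The counting procedure in this paragraph can be sketched as follows. The data layout and function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify sample versus population standard deviation.

```python
from statistics import mean, stdev


def is_under_expressed(value, cohort_values):
    """True if `value` is more than one standard deviation below the mean
    expression of the same gene within the same cancer type."""
    return value < mean(cohort_values) - stdev(cohort_values)


def fraction_under_expressed(events, cohort_expression, psi_cutoff=80.0):
    """Fraction of intron-retained transcripts in one tumor that are under-expressed.

    events: list of (gene, psi_percent, expression, has_point_mutation).
    cohort_expression: gene -> expression values within the same cancer type.
    Events in genes carrying a point mutation are excluded (eQTL control).
    """
    retained = [
        (gene, expr)
        for gene, psi, expr, mutated in events
        if psi > psi_cutoff and not mutated
    ]
    if not retained:
        return None
    flagged = sum(
        is_under_expressed(expr, cohort_expression[gene]) for gene, expr in retained
    )
    return flagged / len(retained)


# Illustrative values only.
cohort_expression = {
    "GENE_X": [10.0, 11.0, 9.0, 10.0, 10.0, 4.0],
    "GENE_Y": [8.0, 7.5, 8.5, 8.0, 7.9],
}
events = [
    ("GENE_X", 92.0, 4.0, False),  # retained and under-expressed
    ("GENE_Y", 85.0, 7.9, False),  # retained, not under-expressed
    ("GENE_Z", 95.0, 1.0, True),   # excluded: point mutation in the gene
    ("GENE_W", 40.0, 2.0, False),  # excluded: PSI not above 80%
]
print(fraction_under_expressed(events, cohort_expression))  # 0.5
```

Computing this fraction per tumor and comparing it with each tumor's mutational load gives the kind of relationship described above.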
It was next investigated which genes are more likely to be silenced through mRNA decay between low and high mutational load tumors. For each intron retention event, it was calculated whether PSI values were significantly different in low mutational load tumors (<10 total protein-coding mutations) compared to high mutational load tumors (>1000 total protein-coding mutations) using a t-test. This approach identified 606 and 201 genes that have more and less intron retention events in high mutational load tumors, respectively. Using gene set enrichment analysis, it was found that cytoplasmic ribosomes contain more intron retention events in high mutational load tumors, potentially leading to their down-regulation through mRNA decay to prevent further protein mis-folding (
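A minimal sketch of the low-versus-high comparison for a single intron retention event is given below. The text specifies only "a t-test"; the Welch (unequal variance) statistic used here is an assumption, the function names and values are hypothetical, and the sketch computes only the test statistic, not a p-value.

```python
from math import sqrt
from statistics import mean, variance


def split_by_load(tumors, low_max=10, high_min=1000):
    """Split (mutational_load, psi) pairs into low (<10) and high (>1000) groups."""
    low = [psi for load, psi in tumors if load < low_max]
    high = [psi for load, psi in tumors if load > high_min]
    return low, high


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in means (a minus b)."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se


# Illustrative (mutational load, PSI %) pairs for one intron retention event.
tumors = [
    (3, 20.0), (7, 30.0), (5, 25.0),           # low mutational load
    (300, 50.0),                                # intermediate: not compared
    (1500, 70.0), (2200, 80.0), (4000, 75.0),   # high mutational load
]
low_psi, high_psi = split_by_load(tumors)
t_stat = welch_t(high_psi, low_psi)
print(round(t_stat, 2))  # 12.25
```

A positive statistic indicates more intron retention in the high mutational load group for that event.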
It was next investigated in detail how the remaining proteostasis complexes that were significant in the genome-wide screen, which regulate protein synthesis, degradation and folding, could mitigate protein misfolding in high mutational load tumors. To do so, the gene sets were expanded to include other chaperone families, all ribosomal complexes and proteasomal subunits (
Differences in expression among structural components of the proteasome, a large protein complex responsible for degradation of intracellular proteins, were further examined. Consistent with the over-expression of chaperone families that mitigate protein mis-folding, both the 19S regulatory particle (which recognizes and imports proteins for degradation) and the 20S core (which cleaves peptides) of the proteasome are up-regulated in response to mutational load in TCGA (
Finally, a jackknife re-sampling procedure was performed to confirm that specific cancer types were not driving patterns of association within the GLMM. This was achieved by removing each cancer type from the regression model one at a time, and re-calculating regression coefficients on the remaining set of samples. Overall, regression coefficients were stable across cancer types and trends were unchanged (
It was next sought to validate these results by first examining whether the expression patterns observed in human tumors replicate within cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE). Unlike TCGA, samples within each cancer type in CCLE can be small and are unbalanced (i.e., some cancer types have <10 samples and others have >100 samples). Since GLMMs may not be able to estimate among-population variance accurately in these cases, a simple generalized linear model (GLM) was utilized instead to measure the effect of mutational load on patterns of expression without over-constraining the model. Indeed, it was found that expression patterns seen in human tumors broadly replicate in cancer cell lines (
Overall, this indicated that the expression patterns observed are cell autonomous (i.e., independent of organismal effects such as the immune system, age or microenvironment) and consistent across high mutational load cancer cells. Importantly, it also demonstrates that cancer cell lines are a reasonable model to causally interrogate these effects further through functional and pharmacological perturbation experiments.
To establish a causal relationship between the over-expression of proteostasis machinery and maintenance of cell viability under high mutational load, expression knock-down (shRNA) estimates from project Achilles were utilized for the same cancer cell lines as in CCLE. It was sought to measure how mutational load impacts cell viability when protein complexes and gene families undergo a loss of function through expression knock-down. Since the shRNA screen was performed on an individual gene basis, a GLM framework was utilized that aggregates expression knock-down estimates of all genes within a given proteostasis gene family to jointly measure how mutational load impacts cell viability after loss of function. Specifically, an additional categorical variable of the gene name was included within each gene family to allow for a change in the intercept within each gene in the GLM when measuring the association of mutational load and cell viability after expression knock-down. In addition, it was similarly evaluated whether specific cancer types were driving patterns of association within the GLM through jackknife re-sampling by cancer type (
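The aggregation across genes described above, in which each gene receives its own intercept while a single slope for mutational load is shared, can be sketched as follows. The sketch assumes a Gaussian model, for which centering load and viability within each gene yields the same shared slope as fitting explicit per-gene intercept terms. The function name, gene labels, and toy values are hypothetical.

```python
from collections import defaultdict


def shared_slope_with_gene_intercepts(rows):
    """rows: list of (gene, mutational_load, viability).

    Fits viability = intercept[gene] + slope * load by centering both
    variables within each gene and pooling the centered values.
    """
    by_gene = defaultdict(list)
    for gene, load, viability in rows:
        by_gene[gene].append((load, viability))

    sxx = 0.0
    sxy = 0.0
    for pairs in by_gene.values():
        mx = sum(p[0] for p in pairs) / len(pairs)
        my = sum(p[1] for p in pairs) / len(pairs)
        for x, y in pairs:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
    return sxy / sxx


# Hypothetical toy data: two genes with different baselines, same slope (-0.5).
rows = [
    ("geneA", 1.0, 0.5), ("geneA", 2.0, 0.0), ("geneA", 3.0, -0.5),
    ("geneB", 1.0, -2.5), ("geneB", 2.0, -3.0), ("geneB", 3.0, -3.5),
]
slope = shared_slope_with_gene_intercepts(rows)
```

A negative shared slope corresponds to lower viability after knock-down as mutational load increases.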
Overall, it was found that elevated mutational load is associated with decreased cell viability when the function of most chaperone gene families is disrupted through expression knock-down (
Since functional redundancy in the human genome can make expression knock-down estimates within individual genes noisy, it was also examined how drugs targeting the function of whole complexes impact viability with mutational load across all cancer types and when removing individual cancer types through jackknife re-sampling. To do so, drug sensitivity screening data from project PRISM within CCLE was utilized and a simple GLM was used to measure the association of mutational load and cell viability after drug inhibition. It was found that treatment with the majority of proteasome inhibitors (6/8) and ubiquitin-specific proteasome inhibitors (2/3), which target protein degradation complexes, is significantly associated with a decrease in cell viability in high mutational load cell lines (
Lastly, it was found that most drugs in the PRISM database do not significantly decrease cell viability with mutational load (
Data availability and resources. Whole-exome somatic mutation calls of 10,486 cancer patients across 33 cancer types in The Cancer Genome Atlas (TCGA) were downloaded from the Multi-Center Mutation Calling in Multiple Cancers (MC3) project (Ellrott, K. et al. Cell Syst. 6, 271-281.e7 (2018), the disclosure of which is incorporated herein by reference; gdc.cancer.gov/about-data/publications/mc3-2017). For the same patients in TCGA, RNA-seq data of log2 transformed RSEM normalized counts were downloaded from the UCSC Xena Browser (Goldman, M. J. et al. Nature Biotechnology (2020), the disclosure of which is incorporated herein by reference; xenabrowser.net/datapages/) and copy number alterations (CNAs), including amplifications and deletions, called via ABSOLUTE were downloaded from COSMIC (v91) (Bamford, S. et al. Br. J. Cancer (2004), the disclosure of which is incorporated herein by reference; cancer.sanger.ac.uk/cosmic/download). Tumor purity estimates for TCGA were downloaded from the Genomic Data Commons (GDC) (Grossman, R. L. et al. N. Engl. J. Med. 375, 1109-1112 (2016), the disclosure of which is incorporated herein by reference; gdc.cancer.gov/about-data/publications/pancanatlas). Data for all cancer cell lines in the Cancer Cell Line Encyclopedia (CCLE) were downloaded from DepMap (Barretina, J. et al. Nature (2012), the disclosure of which is incorporated herein by reference; depmap.org/portal/download/all/). Specifically, mutation calls (Version 21Q3) from whole-exome sequencing data, copy number alterations quantified by ABSOLUTE (Version CCLE 2019), log2 transformed TPM normalized counts (Version 21Q3) from RNA-seq data, proteomics data quantified by mass spectrometry, shRNA data from project Achilles (Tsherniak, A. et al. Cell (2017), the disclosure of which is incorporated herein by reference) normalized using DEMETER (DEMETER2 Data v6), and primary drug sensitivity screens of replicate collapsed log fold changes relative to DMSO from project PRISM (Version 19Q4; Corsello, S. M. et al. Discovering the anticancer potential of non-oncology drugs by systematic viability profiling. Nat. Cancer (2020), the disclosure of which is incorporated herein by reference) were used.
Statistical analysis. The lmerTest package and its lmer function in R were used to apply a separate generalized linear mixed model (GLMM) for each gene in the genome to identify groups of genes whose expression is up-regulated in response to mutational load in TCGA. For each gene, expression values across all patients were z-score normalized in all analyses to ensure fair comparisons across genes. Known co-variates of tumor purity and cancer type were included in the GLMM. Tumor purity and mutational load were modeled as fixed effects, whereas cancer type was modeled as a random effect (i.e., random intercept) since it varies across groups of patients and can be interpreted as repeated measurements across groups. For all analyses, mutational load was defined as log10 of the number of synonymous, nonsynonymous and nonsense mutations per tumor. For each gene, the parameters used in the GLMM were as follows,
Y ~ β0 + β1X1 + β2X2 + v + e
where Y is a vector of expression values of each tumor, β0 is the fixed intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each tumor, β2 is the fixed slope for the predictor variable X2 which is a vector of the purity of each tumor, v is the random intercept for each cancer type, and e is a Gaussian error term. To examine expression responses to mutational load within a given protein complex and cancer type, the same normalization procedures were applied as above within cancer types and a separate GLM for each cancer type was run as follows,
Y ~ β0 + β1X1 + β2X2 + β3X3 + e
where Y is a vector of expression values of each tumor in a given cancer type, β0 is the fixed intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each tumor, β2 is the fixed slope for the predictor variable X2 which is a vector of the purity of each tumor, β3 is a change in the intercept for X3 which is a categorical variable of individual genes within each proteostasis complex and e is a Gaussian error term.
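The two preprocessing steps that feed these models, namely defining mutational load as log10 of the mutation count and z-score normalizing expression values per gene, can be sketched as follows. The function names are hypothetical. The use of the sample standard deviation is an assumption, and the handling of tumors with zero mutations is not stated in the text, so the sketch simply lets log10 raise an error in that case.

```python
import math
from statistics import mean, stdev


def mutational_load(n_mutations):
    """log10 of the number of synonymous, nonsynonymous and nonsense mutations.

    Raises ValueError for n_mutations == 0; how such tumors are treated
    is not specified in the text.
    """
    return math.log10(n_mutations)


def zscore(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample (n - 1) standard deviation
    return [(v - m) / s for v in values]


# Hypothetical toy values for one gene across three tumors.
loads = [mutational_load(n) for n in (10, 100, 1000)]
normalized_expression = zscore([1.0, 2.0, 3.0])
```

Normalizing each gene separately places all genes on a common scale, so that slopes for mutational load can be compared across genes.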
Unlike in TCGA, sample sizes within each cancer type in CCLE can be small and are unbalanced (i.e., some cancer types have <10 samples and others have >100 samples). In these cases, mixed effects models may not be able to estimate among-population variance accurately. Thus, for all regression-based analyses in CCLE, a simple generalized linear model (GLM) was used instead. Cell viability values across all cell lines were z-score normalized by gene in all analyses to ensure fair comparisons across genes. To assess whether the same sets of genes are up-regulated in response to mutational load in CCLE using the GLM, a similar procedure to the GLMM was performed. A separate GLM was applied for each gene with the following parameters,
Y ~ β0 + β1X1 + e
where Y is a vector of normalized expression values of each cell line, β0 is the fixed intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each cell line, and e is a Gaussian error term. To assess whether protein abundances are similarly up-regulated in response to mutational load in CCLE in proteostasis complexes, a separate GLM was applied to each gene with the following parameters,
Y ~ β0 + β1X1 + β2X2 + e
where Y is a vector of normalized cell viability estimates after expression knock-down of an individual gene across all cell lines, β0 is the fixed reference intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each cell line, β2 is a change in the intercept for X2 which is a categorical variable of individual genes within each proteostasis complex, and e is a Gaussian error term. To estimate the association of mutational load and cell viability after pharmacologic inhibition of proteostasis machinery, the following GLM was applied to each relevant drug in PRISM:
Y ~ β0 + β1X1 + e
where Y is a vector of normalized cell viability estimates after drug inhibition across all cell lines, β0 is the fixed intercept, β1 is the fixed slope for the predictor variable X1 which is a vector of mutational load values for each cell line, and e is a Gaussian error term.
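For a Gaussian error term, the single-predictor model Y ~ β0 + β1X1 + e reduces to ordinary least squares. The following minimal sketch estimates β0 and β1 together with the standard error and t-statistic of the slope. The function name and the toy values are hypothetical, and converting the t-statistic to a p-value would require a t-distribution with n - 2 degrees of freedom, which the sketch omits.

```python
import math


def fit_simple_glm(x, y):
    """Fit y = b0 + b1 * x + e by least squares.

    Returns (b0, b1, standard error of b1, t-statistic of b1).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # Residual sum of squares with n - 2 degrees of freedom.
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    t_b1 = b1 / se_b1 if se_b1 > 0 else math.copysign(math.inf, b1)
    return b0, b1, se_b1, t_b1


# Hypothetical toy data: viability after drug treatment falls as load rises.
load = [1.0, 2.0, 3.0, 4.0]
viability = [0.9, 0.1, -0.2, -0.8]
b0, b1, se_b1, t_b1 = fit_simple_glm(load, viability)
```

In this framing, a significantly negative β1 for a given drug indicates that cell lines with higher mutational load are more sensitive to that drug.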
Model validation. For both the GLM and GLMM, model assumptions of homogeneity of variance were verified by plotting residuals versus fitted values in the model and residuals versus each covariate in the model. Multi-collinearity with other mutational classes (e.g., copy number alterations, CNAs) was considered, but these classes were not found to correlate with point mutations (
Proteostasis gene sets. Genes for chaperone complexes were identified from 76 and genes that are co-chaperones were not considered. Proteasome and ribosomal complexes were identified from CORUM (Giurgiu, M. et al. Nucleic Acids Res. (2019), the disclosure of which is incorporated herein by reference).
Gene set enrichment analysis. All gene set enrichment analysis was performed using gprofiler2 with default parameters (Peterson, H., et al. F1000Research (2020), the disclosure of which is incorporated herein by reference). For all sets of genes, significance was determined after correcting for multiple hypothesis testing (FDR<0.05). For gene set enrichment analysis used to identify genes up-regulated in TCGA in response to mutational load, all terms in the CORUM database were reported and enrichment terms in the KEGG database of diseases not related to cancer (e.g., ‘Influenza A’) were omitted from the main figures for clarity and space (Tanabe, M. & Kanehisa, M. Curr. Protoc. Bioinforma. (2012), the disclosure of which is incorporated herein by reference). For gene sets used to identify terms differentially spliced between high and low mutational load tumors, all terms in the CORUM and the REACTOME databases were reported in the main figures.
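The text states an FDR threshold of 0.05 but does not name the correction procedure. As an illustration only, the following sketch assumes the Benjamini-Hochberg procedure, a common choice for FDR control; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four gene sets.
pvals = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(pvals)
significant = [q < 0.05 for q in adjusted]
```

Under this assumption, a gene set is reported as significant when its adjusted value falls below 0.05.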
Alternative splicing analysis. Alternative splicing events were quantified through a previously established metric called percent spliced in (PSI). PSI is calculated as the number of reads that overlap the alternative splicing event (e.g., for intron retention, either at intronic regions or those at the boundary of exon to intron junctions) divided by the total number of reads that support and do not support the alternative splicing event. PSI summarizes alternative splicing events at specific exonic boundaries in the entire transcript population without needing to know the complete underlying composition of each full-length transcript.
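The PSI ratio defined above can be written as a short function. The argument names are hypothetical labels for the reads that support and do not support the event, and returning None when no reads cover the event is an assumption about how such events would be treated as missing.

```python
def percent_spliced_in(supporting_reads, non_supporting_reads):
    """PSI = supporting reads / (supporting + non-supporting reads).

    Returns None when no reads cover the event (treated here as missing).
    """
    total = supporting_reads + non_supporting_reads
    if total == 0:
        return None
    return supporting_reads / total


# Hypothetical counts: 30 reads support intron retention, 10 do not.
psi = percent_spliced_in(30, 10)
```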
Alternative splicing calls for all tumors in TCGA were downloaded from TCGA SpliceSeq (Ryan, M. et al. Nucleic Acids Res. (2016), the disclosure of which is incorporated herein by reference). Default splice event filters (percentage of samples with PSI values >75%) from the database were applied. To test whether gene expression is down-regulated in high mutational load tumors through alternative splicing, it was calculated whether alternative splicing events were present based on different threshold values of percent spliced in (PSI) from 90% to 50%. (
To investigate which genes are differentially spliced between low and high mutational load tumors for specific alternative splicing events (e.g., intron retention), a t-test was used to calculate whether PSI values were significantly different in tumors with <10 protein-coding mutations compared to tumors with >1000 protein-coding mutations. Each alternative splicing event within a gene was required to have less than 25% of missing PSI values and a mean difference between the two groups of >0.01 to be considered. This approach identified 606 and 201 significant genes that have more and fewer intron retention events in high mutational load tumors, respectively, after correcting for multiple hypothesis testing (FDR<0.05).
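The filtering and comparison steps above can be sketched for a single splicing event as follows. The text specifies only "a t-test", so the unequal-variance (Welch) form used here is an assumption, as is interpreting the mean difference threshold as an absolute difference. The function name and PSI values are hypothetical, and the p-value step, which requires a t-distribution, is omitted.

```python
import math
from statistics import mean, variance


def compare_psi(low_load, high_load, max_missing=0.25, min_diff=0.01):
    """Compare PSI values between low and high mutational load tumors.

    Missing PSI values are given as None. Returns (mean difference,
    Welch t-statistic), or None when the event fails a filter.
    """
    combined = low_load + high_load
    missing_fraction = sum(v is None for v in combined) / len(combined)
    if missing_fraction >= max_missing:  # require < 25% missing values
        return None
    low = [v for v in low_load if v is not None]
    high = [v for v in high_load if v is not None]
    if len(low) < 2 or len(high) < 2:
        return None
    diff = mean(high) - mean(low)
    if abs(diff) <= min_diff:  # require a mean difference > 0.01
        return None
    se_squared = variance(low) / len(low) + variance(high) / len(high)
    if se_squared == 0:
        return None
    return diff, diff / math.sqrt(se_squared)


# Hypothetical PSI values for one intron retention event.
result = compare_psi([0.10, 0.12, 0.11, 0.09], [0.30, 0.28, 0.32, None])
filtered = compare_psi([None, None, 0.1, 0.2], [0.5, 0.6, 0.7, 0.8])
```

A positive difference corresponds to more intron retention in high mutational load tumors. The resulting p-values across events would then be corrected for multiple hypothesis testing.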
Drug category annotation and enrichment analysis. A separate GLM was run for each drug in the PRISM database to evaluate whether it is associated with mutational load and cell viability. All drugs that were negatively associated with mutational load and viability were queried on PubMed based on their reported mechanism of action in PRISM and grouped into broad categories (Table 1). Categories of drug mechanism of action were first chosen based on their role in metabolism and known hallmarks of cancer. Additional categories not directly related to known cancer associated functional groups were made for drugs that could not otherwise be grouped (e.g., ‘Ion Channel Regulation’, ‘Viral Replication Inhibitor’). Drugs with an ambiguous mechanism of action (e.g., ‘cosmetic’, ‘coloring agent’) were grouped into ‘Other’. The abstracts of up to 10 associated papers were examined for evidence connecting each drug mechanism of action to one of 33 broad categories. In total, 700 drug mechanisms of action were grouped and annotated into 33 broad categories. These broad categories were used to assess whether high mutational load cancer cell lines are generically vulnerable to drugs or whether certain categories are more likely to contain drugs effective against high mutational load cell lines. To control for differences in the number of drugs within each category, 50 drugs were randomly sampled, and the fraction of drugs significantly associated with mutational load in each category was calculated 100 times to generate confidence intervals.
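The repeated sub-sampling used to generate confidence intervals can be sketched for one drug category as follows. The text does not state whether drugs were sampled with replacement or how the interval bounds were derived, so the sketch assumes sampling with replacement and a 95% percentile interval. The function name, seed, and flags are hypothetical.

```python
import random


def subsampled_fraction(is_significant, sample_size=50, n_repeats=100, seed=0):
    """is_significant: one boolean per drug in a category.

    Draws sample_size drugs n_repeats times and returns
    (mean fraction significant, lower bound, upper bound).
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_repeats):
        # Assumption: sampling with replacement, so small categories work too.
        draw = rng.choices(is_significant, k=sample_size)
        fractions.append(sum(draw) / sample_size)
    fractions.sort()
    # Assumption: 95% percentile interval over the repeated draws.
    lower = fractions[int(0.025 * (n_repeats - 1))]
    upper = fractions[int(0.975 * (n_repeats - 1))]
    return sum(fractions) / n_repeats, lower, upper


# Hypothetical category in which 20 of 100 drugs are significantly associated.
flags = [True] * 20 + [False] * 80
mean_fraction, lower, upper = subsampled_fraction(flags)
```

Drawing the same number of drugs from every category keeps the width of the interval comparable between large and small categories.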
This application claims priority to U.S. Provisional Application Ser. No. 63/364,936, entitled “Systems and Methods for Assessing Mutational Burden of Neoplasms and Associated Treatments Thereof” filed May 18, 2022, which is incorporated herein by reference in its entirety.
This invention was made with Government support under contracts CA238296 and GM118165 awarded by the National Institutes of Health. The Government has certain rights in the invention.
Number | Date | Country
---|---|---
63364936 | May 2022 | US