Cancer immunotherapy heightens the immune system's ability to recognize a cancer and destroy the cancer cells, as opposed to more traditional compounds that directly inhibit the cancer's ability to proliferate. Cancer immunotherapy can provide good responses, even in advanced stages of cancer. Some current immunotherapies include cancer vaccines, antibodies, T cell infusions, and checkpoint blockade therapy. Malignant tumors often co-opt immune suppressive and tolerance mechanisms to avoid immune destruction. Immune checkpoint blockade removes inhibitory signals of T-cell activation, which enables tumor-reactive T cells to overcome regulatory mechanisms and mount an effective antitumor response. Accordingly, immune checkpoint blockade inhibits negative costimulation of T cells in order to unleash antitumor T-cell responses that recognize tumor antigens.
However, only a subset of patients respond to current cancer immunotherapies, and it is difficult to predict which patients will respond. To increase the number of patients who benefit, combination therapies are being used. Cancer vaccines in combination with checkpoint blockade therapy are a promising approach to increasing the antitumor immune response. However, the mutations in a cancer are typically specific to the individual (private mutations), and cancer vaccines based on private mutations may be prohibitively expensive, which can inhibit widespread adoption of this approach.
Embodiments of the present disclosure provide a strategy for personalized cancer vaccines that use public antigens that are shared across individuals. Genomewide dysregulation of transcription and translation leads to overexpression of non-canonical protein coding genes, including transposable elements (TEs). TEs are strongly repressed in healthy cells to prevent genomic instability but can become dysregulated in cancer. Disclosed herein is a computational framework for identifying potential cancer antigens within transposable elements, e.g., using RNA-seq or mass spectrometry data. Some embodiments use autonomous transposable elements in the human genome, e.g., L1HS.
Embodiments of the present disclosure may include a method for identifying cancer antigens that may be used as cancer vaccines. The method may include identifying a group of candidate cancer antigens that are generated from transposable elements. Embodiments may also include determining a baseline expression level for each of the candidate cancer antigens using measurements of healthy tissue from a first cohort of healthy subjects. Embodiments may also include determining a tumor expression level for each of the candidate cancer antigens using measurements of tumor tissue from a second cohort of cancer subjects. Embodiments may also include determining a differential expression level for each of the candidate cancer antigens using the baseline expression levels and the tumor expression levels. Embodiments may also include selecting one or more of the candidate cancer antigens having a differential expression level greater than a threshold.
Embodiments of the present disclosure may include a method of identifying a cancer vaccine for a patient, the method may include identifying a group of candidate cancer antigens that are generated from transposable elements. Embodiments may also include determining a baseline expression level for each of the candidate cancer antigens, where the baseline expression levels are determined using measurements of healthy tissue from healthy subjects. Embodiments may also include determining a tumor expression level for each of the candidate cancer antigens using measurements of tumor tissue from the patient. Embodiments may also include determining a differential expression level for each of the candidate cancer antigens using the baseline expression levels and the tumor expression levels. Embodiments may also include selecting one or more of the candidate cancer antigens having a differential expression level greater than a threshold. Embodiments may also include selecting a cancer vaccine corresponding to the one or more of the candidate cancer antigens.
Embodiments of the present disclosure may include a microarray including a first array of nucleic acid probes that hybridize to expressed transposable element mRNA from tumor samples or to cDNA derived from such mRNA. Embodiments may also include a second array of nucleic acid probes that hybridize to mRNA or cDNA corresponding to different MHC haplotypes. Embodiments may also include a third array of nucleic acid probes that hybridize to mRNA or cDNA corresponding to different APOBEC-mutated genotypes.
These and other embodiments of the disclosure are described in detail below. For example, other embodiments are directed to systems, devices, and computer readable media associated with methods described herein.
A better understanding of the nature and advantages of embodiments of the present disclosure may be gained with reference to the following detailed description and the accompanying drawings.
An Appendix includes: table 2 showing example nucleic acid probes that hybridize to cDNA from transposable elements in a human genome, table 3 showing example nucleic acid probes that hybridize to cDNA corresponding to different antigen presentation pathway genes, and table 4 showing nucleic acid probes that hybridize to cDNA corresponding to APOBEC mutated RNA transcripts.
The term “transposable element” may refer to a DNA sequence that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Transposable elements are shared across individuals and related species.
This disclosure provides novel strategies for personalized cancer vaccines by identifying antigens in cancer cells that are shared across at least some individuals. Mutations are typically not shared among a large segment of cancer patients. Thus, proteins associated with such private mutations are not good antigens for a widespread approach. Instead, embodiments recognize that certain proteins (corresponding to transposable elements) are commonly expressed in tumors as a result of dysregulation (e.g., epigenetic dysregulation, as may occur from widespread DNA hypomethylation), where such dysregulation is not caused by sequence variations in the corresponding coding regions. Thus, these antigens will be common among a cohort of the population that shares relevant parts of the genetic code, e.g., a same major histocompatibility complex (MHC) haplotype and APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like) mutational signature. Different tissues may have different regulation of such antigen proteins, but tumors in a same tissue type tend to have dysregulation of similar antigens, thereby enabling a given vaccine to have relatively widespread applicability.
Genome-wide dysregulation of transcription and translation leads to overexpression of non-canonical protein coding genes, including transposable elements (TEs). TEs are strongly repressed in healthy cells to prevent genomic instability but can become dysregulated in cancer. Disclosed herein is a computational framework for identifying potential cancer vaccine antigens within transposable elements. Such antigens can be used to stimulate a subpopulation of the patient's T-cells that are capable of identifying cancer cells. By immunizing a patient with the vaccine or by using the peptide to stimulate and expand T cells ex vivo, embodiments can expand and activate the T-cells that are in lymph nodes and circulating throughout the body to attack cancer cells that present TE peptides in the context of a major histocompatibility complex (MHC) protein.
Since TEs are highly conserved across a population, cancer vaccines derived from TEs can have wide applicability. Further, since TE antigens are not normally expressed in healthy cells, such a vaccine potentially has limited toxicity. TE antigens can be selected using further criteria, e.g., solubility of the peptide or ability to be presented by the HLA molecules of an individual patient.
To identify TE proteins that are overexpressed in tumors, various embodiments can analyze RNA sequencing data (from which protein expression can be inferred) or direct protein measurements, such as mass spectrometry. From this, a set of candidate TE proteins (candidate cancer antigens) can be identified. Such TE proteins can be defined/identified by kmers in TE loci in the genome or directly as described above. In particular, a TE type of long interspersed nuclear elements (LINEs) may be used, more specifically L1HS. The L1HS subclass of LINEs is human-specific and its protein coding sequences are strongly conserved. As described herein, a kmer is a subsequence of a biological sequence (such as a polynucleotide or polypeptide) of length k. The term kmer can also refer to all of a biological sequence's subsequences of length k.
To detect overexpression of a TE protein, a baseline expression can be established for the candidate set of kmers/proteins. The baseline expression may be specific to a particular demographic, e.g., age, tissue type of the tumor, etc. The kmers/proteins can be ranked by level of overexpression, with the most highly overexpressed identified as candidate cancer antigens, and peptides corresponding to those candidate cancer antigens can be synthesized. For example, clinical grade peptides corresponding to all or a portion of a particular kmer/protein in a ranked set of kmers/proteins can be synthesized using a solid-phase peptide synthesizer according to the 9-fluorenylmethoxycarbonyl (Fmoc) protocol and validated using reverse-phase high-performance liquid chromatography followed by mass spectrometry, or by other methods known to those of ordinary skill in the art.
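The baseline, differential expression, threshold, and ranking steps can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the choice of a log2 fold change between cohort means with a pseudocount as the differential expression level, and the function name `rank_candidates`, are assumptions made for the example.

```python
import math
from statistics import mean


def rank_candidates(baseline, tumor, threshold=1.0, pseudocount=1.0):
    """Rank candidate antigens by overexpression in tumor versus healthy tissue.

    Hypothetical helper. baseline and tumor map a candidate ID to a list of
    expression values (one per subject in the healthy and tumor cohorts).
    The differential expression level is assumed to be a log2 fold change
    of cohort means with a pseudocount; the source does not fix the metric.
    """
    results = []
    for cand, tumor_vals in tumor.items():
        base = mean(baseline.get(cand, [0.0]))  # healthy-tissue baseline
        tum = mean(tumor_vals)                  # tumor expression level
        lfc = math.log2((tum + pseudocount) / (base + pseudocount))
        if lfc > threshold:                     # keep only overexpressed candidates
            results.append((cand, lfc))
    # Most highly overexpressed candidates first
    results.sort(key=lambda item: item[1], reverse=True)
    return results


baseline = {"pepA": [0, 1, 0], "pepB": [5, 5, 5], "pepC": [0, 0, 0]}
tumor = {"pepA": [15, 15, 15], "pepB": [5, 6, 4], "pepC": [3, 3, 3]}
print(rank_candidates(baseline, tumor))
```

A per-demographic baseline (e.g., by tissue type) would simply mean calling the helper with a baseline dictionary built from the matching healthy cohort.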
When measuring expression levels and/or identifying genomic locations corresponding to directly measured proteins, the occurrence of RNA having a particular kmer (e.g., 24-mer) sequence can be identified. A particular kmer can correspond to multiple loci, and more than one kmer can correspond to a particular locus. Such knowledge of kmers and loci in the transposable elements (e.g., L1HS) can be used to create a mapping between certain proteins and certain kmers, potentially with different weights for the mapping between a kmer and a protein. The weights can be used to estimate a total expression of a particular protein by determining a weighted sum of the expression levels for each of the kmers mapped to the particular protein. The frame of each locus and the MHC haplotype of the patient can be used, along with the corresponding kmers, to determine the resulting proteins that are highly overexpressed.
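The weighted-sum estimate of protein expression from kmer counts can be sketched as below. The function name and the example weights (splitting a kmer shared by two loci equally) are hypothetical, and the short kmers stand in for the longer (e.g., 24-mer) kmers described above.

```python
from collections import defaultdict


def protein_expression(kmer_counts, kmer_to_protein):
    """Estimate protein expression as a weighted sum of kmer counts.

    Hypothetical helper. kmer_counts maps a kmer to its observed count;
    kmer_to_protein maps a kmer to a list of (protein_id, weight) pairs.
    Kmers without a mapping are ignored.
    """
    totals = defaultdict(float)
    for kmer, count in kmer_counts.items():
        for protein_id, weight in kmer_to_protein.get(kmer, []):
            totals[protein_id] += weight * count  # weight times kmer level
    return dict(totals)


counts = {"AAA": 10, "CCC": 4, "GGG": 7}
mapping = {
    "AAA": [("ORF1_locus1", 1.0)],                         # unique to one locus
    "CCC": [("ORF1_locus1", 0.5), ("ORF1_locus2", 0.5)],   # shared by two loci
}
print(protein_expression(counts, mapping))
```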
Thus, a set of peptides can be generated for a set of protein antigens that are likely to be generally applicable for use as vaccines for administration to cancer patients. Then, for a second patient, RNA or protein measurements can be used to determine TE proteins that are overexpressed in the second patient. Peptides corresponding to the TE proteins from the second patient can be synthesized, or if a peptide is common to the first patient and the second patient, the common peptide can be selected for use as a vaccine. In this manner, a vaccine can be personalized to a patient (e.g., a particular vaccine can be newly synthesized or selected from a library).
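Choosing between re-using a library peptide and synthesizing a new one can be sketched as a set comparison. This assumes peptides are matched by exact sequence; the helper name and the placeholder sequences are hypothetical.

```python
def personalize_vaccine(patient_overexpressed, peptide_library):
    """Split a patient's overexpressed TE peptides into reusable and new ones.

    Hypothetical helper. patient_overexpressed is an iterable of peptide
    sequences for this patient; peptide_library is a set of peptides already
    synthesized for earlier patients. Exact-sequence matching is assumed.
    """
    patient = set(patient_overexpressed)
    reuse = sorted(patient & peptide_library)       # select from the library
    synthesize = sorted(patient - peptide_library)  # newly synthesize
    return reuse, synthesize


library = {"PEP1", "PEP2"}
print(personalize_vaccine(["PEP1", "PEP3"], library))
```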
Further, an APOBEC mutation signature can be used to determine whether the patient is likely to respond to a TE cancer vaccine. The APOBEC mutation signature can be inferred from RNA sequencing data.
The disclosed methods were applied to triple negative breast cancer (TNBC) and melanoma, and it was determined that L1HS epitope kmers correlate with better survival in TNBC and with complete response to checkpoint blockade therapy in melanoma. These associations suggest that expression of these elements improves outcomes, presumably through activation of the host immune system. Further activation through vaccination can lead to even stronger antitumor immune responses, which can work synergistically with checkpoint blockade therapy.
Cancer is the second leading cause of death in the United States [1], and while there have been significant medical advances in treating this disease, the standard of care has not changed significantly over the past few decades. Chemotherapy, radiation, and surgery have been the frontline defense against cancer progression, but new therapeutic strategies are being developed that personalize the therapy to individuals. For example, targeted therapies are small-molecule drugs designed to inhibit specific molecular alterations, such as an activating kinase mutation. These therapies have generated complete responses in late-stage disease, but resistance often emerges and the cancer relapses. Targeted therapies are routinely used against recurrent activating mutations, including BRAF V600E in melanoma, but most patients do not have an actionable variant and do not benefit from these approaches. Furthermore, targeted therapies do not yield durable responses, since the cancer eventually relapses, and incur significant cost to the healthcare system [2].
Another approach for treating cancer is to amplify the antitumor immune response. This approach has achieved remarkable responses while inducing minimal toxic side-effects. The discovery that the immune system can recognize and destroy cancer cells has opened the door to an entirely new therapeutic approach. Genome-wide dysregulation of transcription and translation leads to the presentation of tumor-specific antigens by major histocompatibility complex molecules on the cell surface. Cytotoxic T cells recognize tumor-specific antigens and induce immune-mediated cell death of those tumors.
Unfortunately, this process can select for cancer cells that evade immune recognition, which leads to an immunosuppressive tumor microenvironment that is able to coexist with the host's immune system [3]. Cancer cells can evade immune recognition via inhibitory signals. Inhibitory signals can be created by (1) a reduction in the expression of proteins that would otherwise be detected by the immune system, or by (2) an increase in the expression of proteins that stop the immune system from attacking cancer cells or that drown out other antigenic proteins that the immune system could otherwise identify and attack. As an example, some cancer cells adopt immunosuppressive cell-surface markers to curb the antitumor immune response. These include the immune checkpoint molecules CTLA4 and PDL1. Identification of immune checkpoint expression in cancer has led to the development of antibody therapies that block the immunosuppressive signal, allowing cytotoxic T-cells to continue the antitumor attack. Checkpoint blockade therapy can reduce the effect of immune checkpoint proteins, resulting in durable responses with relatively minor toxic side-effects [4-7].
The anti-CTLA4 antibody, ipilimumab, was the first checkpoint blockade therapy to achieve FDA approval [6,8]. CTLA4 has a stronger binding affinity to CD80 and CD86 than the costimulatory CD28 molecules, leading to inhibition of T-cell activation [3]. CTLA4 normally becomes expressed after T-cell activation in order to prevent off-target autoimmunity; cancer cells may express CTLA4 to prevent cytotoxic T-cell activation [4-6]. The anti-PD1 antibody pembrolizumab came later and was found to be more efficacious and have fewer side-effects [9]. PD1 is a cell-surface receptor expressed after T-cell activation. Activation of the PD1 receptor by its ligand PDL1 leads to interference of downstream signaling from the T-cell receptor which suppresses the T-cell response [7,8].
The extraordinary responses to checkpoint blockade therapy have led to this therapy becoming widely used and at increasingly earlier stages in cancer treatment [7]. Using checkpoint blockade as a monotherapy achieves a response rate between 20 and 40% for melanoma [4,9]. Current biomarkers for response include PDL1 expression, T-cell infiltration, tumor bulk, mutation burden, crippled DNA repair machinery, and microsatellite instability. One of the markers of response to checkpoint blockade therapy is a high mutation burden. This is a problem for the many patients who do not have a high mutation burden. In pediatric cancers, the mutation burden is extremely low, and in some cases patients do not have a single mutation. And, even if there are mutations, these coding mutations only represent a small fraction (e.g., 5%) of the genome. Thus, there is a desperate need for therapeutic strategies that can induce responses similar to checkpoint blockade therapy, but in tumors that do not have the traditional biomarkers for response to checkpoint blockade therapy.
In addition to identifying predictive biomarkers of response, combination immune checkpoint therapies are being investigated. Administering anti-CTLA4 and anti-PD1 therapies increases the response rate (>40%), but at the cost of increasing the number of adverse events, including fatal pulmonary toxicity [9]. The increased response rate with combination immunotherapy shows that further activation of the immune system correlates with increased antitumor effects. The additional toxic side-effects limit this approach's utility, so new approaches are needed to similarly activate the antitumor immune response while avoiding toxic side-effects. Checkpoint blockade therapy allows infiltrating T-cells to continue their cytotoxic functions, but does not influence the T-cell clones that travel to the tumor. Therapies that expand T-cell clones that are able to recognize cancer cells may work synergistically with checkpoint blockade therapy to tip the balance in favor of immune-mediated destruction of tumors [10].
During a normal infection, antigen-presenting cells enter peripheral lymph nodes and stimulate T-cells that recognize the antigen to rapidly expand and circulate throughout the body in search of the antigen. Another strategy for improving response to checkpoint blockade therapy may be to increase the number of circulating T-cells able to recognize cancer cells using a cancer vaccine approach. Cancer vaccines expand the T-cells able to recognize cancer cells and increase the number of T-cells infiltrating the tumor [11].
Despite extensive research into cancer vaccines, the clinical response to cancer vaccine monotherapy has been modest [12,13]. Sipuleucel-T is the only FDA-approved cancer vaccine that stimulates the immune response against a tumor-specific antigen [14]. This suggests that expanding the number of antitumor T-cells is not sufficient, so checkpoint blockade therapy may be required to overcome the inhibitory mechanisms within the tumor microenvironment. Recent studies have shown that vaccines work synergistically with checkpoint blockade therapy to increase response rates [10,11].
Sipuleucel-T does not target a mutated protein, but instead targets a shared antigen that is overexpressed in prostate cancer cells but not in healthy somatic cells. Being shared across patients has facilitated the development of Sipuleucel-T. The alternative cancer vaccine strategy being investigated is to identify private mutations within each tumor and synthesize a unique set of peptide vaccines based on that individual's cancer mutations. The private mutation approach does not scale well since it requires DNA sequencing, alignment, variant calling, MHC binding prediction, peptide synthesis, quality control, and safety validation for each individual patient. It would be ideal to identify a set of protein-coding genes within the genome that are uniquely expressed in cancer cells but are also shared across individuals. However, this approach may also need to be personalized to the individual, since the immunopeptidome reflects that patient's particular HLA genotype.
Accordingly, embodiments of the present disclosure can identify neoantigens that are uniquely (or at least predominantly) expressed in cancer cells, where new vaccines can be engineered to train the immune system to recognize and react to these neoantigens. Such vaccines can be used in combination with (e.g., before) checkpoint blockade therapy, e.g., to boost the number of T cells that can recognize these neoantigens (like viral peptides) in the patient's body. In this manner, when checkpoint blockade therapy is administered, the immunosuppression of the cancer cells is removed, and the number of T-cells that are able to recognize the cancer cells has been increased. The checkpoint blockade therapy can unleash the immune system, and the vaccine can help the immune system recognize and react to the cancer cells. The disclosed vaccines can also be used in the absence of checkpoint blockade therapy.
There is one FDA-approved cancer vaccine that helps the immune system recognize and react to a non-mutated gene that is overexpressed in cancer cells and not normal cells. This is an attractive model because cancer cells typically overexpress a large number of genes not usually expressed in healthy cells. Dysregulation of transcription and translation is a hallmark of cancer and causes many non-canonical genes to be expressed in tumor cells.
Epigenetic dysregulation is a hallmark of cancer. Cancer cells take on a stem-cell-like state, with the genome taking on a more euchromatic structure. This, in combination with widespread DNA hypomethylation, allows genes that are normally silenced to become expressed. Notably, 40% of the genome is composed of self-propagating DNA elements known as transposable elements (TEs), which the genome silences early in development via repressive epigenetic marks. TEs encode virus-like genes that facilitate reintegration of their sequences throughout the genome. These elements are normally repressed to prevent genomic instability, but their expression has been identified in specific tissues and developmental stages. For example, transposable elements are under selective pressure to retrotranspose in germline cells in order to propagate across generations. There have also been reports of higher expression in brain tissue and stem cells [16-24]. But, in cancer, these repressive mechanisms break down, resulting in wanton expression of these TE genes.
Transposable elements can be subdivided into DNA transposons and retrotransposons. DNA transposons replicate with a DNA intermediate, and retrotransposons replicate with an RNA intermediate coupled with reverse transcription. There are two major classes of retrotransposon: long terminal repeat (LTR) and non-LTR elements [16]. LTR elements are related to retroviruses. The non-LTR elements contain two subclasses, the short interspersed nuclear elements (SINEs) and the long interspersed nuclear elements (LINEs). LINEs are the only class of TE that contain the necessary protein machinery to retrotranspose. Moreover, autonomous LINEs are required for other TEs, including Alu SINEs, to retrotranspose. See Rodriguez-Martin, B., Alvarez, E. G., Baez-Ortega, A. et al., Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition, Nat Genet 52, 306-319 (2020), incorporated by reference herein. For this reason, LINEs are strongly repressed in somatic tissues to prevent genomic instability caused by widespread retrotransposition.
LINE-1 (L1) element L1 Homo sapiens (L1HS) is the youngest transposable element in the human genome and is one of the few classes of TEs that is autonomous. It was hypothesized that L1HS would be strongly repressed in somatic tissue, but likely expressed in tumors and thus would be an ideal candidate antigen for developing antitumor vaccine therapies. As the youngest class of TE, L1HS is the most potent at becoming activated in cancer cells since these elements have conserved regulatory sequences and coding regions. Despite the strong conservation, there is sufficient variation for L1HS elements to show differential expression across individuals due to differences in transcriptional regulation at different loci. To account for such differential expression, some embodiments of the disclosed methods can personalize vaccines to each tumor, and allow the re-use of peptides as vaccines for the peptides that are shared across individuals.
Accordingly, some embodiments of the disclosed methods make use of L1HS. L1HS vaccines have been developed to treat HIV patients because, like cancer cells, HIV infected cells also over-express transposable elements. The L1HS HIV vaccines were tested in pre-clinical models, including primates, and found to be immunogenic and safe [25]. However, immunization against these elements did not have an effect in protecting macaques from SIV infection, potentially because these vaccines were based on a consensus sequence of transposable elements and endogenous retroelements. Therefore they may not have been sufficiently variable to generate a response [26].
Methods for quantifying TE expression are currently being developed, but these methods are not designed for precision immuno-oncology applications. TE expression methods quantify expression at the class level using a consensus sequence or an average across all loci [15,27]. This approach does not capture candidate cancer antigen sequences, particularly those that are present at multiple loci or those that are unique to a specific locus.
Disclosed herein is a novel TE epitope expression quantification method that identifies unique TE sequences for precision cancer vaccine development by DNA and RNA analysis of TE expression. Also disclosed is a mass spectrometry method that identifies MHC bound TE peptides. This approach confirms that TE peptides are presented on MHCs and can be recognized by T cells.
Embodiments include novel approaches based on expression of unique L1HS epitope kmers and peptides in RNA-seq and mass spectrometry data. The disclosed method prioritizes L1HS epitopes that can be identified to facilitate the identification of cancer antigens. Also disclosed herein is a novel process for identifying tumor-specific epitopes that are shared among individuals, allowing a panel of candidate cancer antigen peptide vaccines to be synthesized, validated, and matched to patient tumors. Normal expression of potential TE epitopes was quantified in several human tissue samples and across developmental stages. L1HS peptides were shown to be processed and presented on triple negative breast cancer (TNBC) tumors but not on matched normal tissue. Finally, L1HS epitope expression correlates with better survival in TNBC and with a complete response to checkpoint blockade therapy in melanoma.
A software toolkit (also referred to as vaccinaTE) was developed to facilitate the identification of candidate cancer antigens. Three functionalities within the toolkit are as follows. A first function generates reference files for building a database of unique transposable element (TE) kmers and peptides. A second function quantifies unique kmers (corresponding to TEs) in RNA-seq data, which can provide RNA kmer frequencies for identifying candidate proteins that are overexpressed in tumor cells.
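The second function, quantifying database kmers in RNA-seq reads, can be sketched as a sliding-window count. This is a minimal Python illustration, not the C++ vaccinaTE code; the function name is hypothetical, and for simplicity it counts only the forward strand of each read (a real pipeline may also need to count reverse complements).

```python
from collections import Counter


def count_kmers(reads, kmer_db, k):
    """Count occurrences of database kmers in sequencing reads.

    Hypothetical helper. reads is an iterable of read sequences, kmer_db a
    set of unique TE kmers of length k. Only the forward strand is scanned.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in kmer_db:  # only TE kmers from the reference database
                counts[kmer] += 1
    return counts


print(count_kmers(["ACGTAC", "GTACGT"], {"GTA", "CGT"}, k=3))
```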
A third function generates in silico mutated kmers to detect APOBEC activity related to activation of an antiviral response within cancer cells. APOBEC randomly mutates mRNA when it senses expression of active transposable elements. The third function creates a database of all possible mutated mRNA sequences that could result from APOBEC activation, and then quantifies this signal in the patient's RNA-seq data. A high rate of APOBEC-associated mutations correlates with more TE expression and response to vaccine therapy.
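Generating in silico APOBEC-mutated kmers and measuring their share of the signal can be sketched as follows. The mutation model is an assumption not specified in the source: single C-to-T substitutions (C-to-U in RNA, written here in cDNA letters) at TpC dinucleotides, a context commonly associated with APOBEC3 deaminases. The function names are hypothetical.

```python
def apobec_variants(kmer):
    """Single-substitution variants of a kmer at assumed APOBEC motifs.

    Assumption: C->T at TpC ('TC' -> 'TT'); the source does not fix the model.
    """
    variants = set()
    for i in range(1, len(kmer)):
        if kmer[i] == "C" and kmer[i - 1] == "T":
            variants.add(kmer[:i] + "T" + kmer[i + 1:])
    return sorted(variants)


def apobec_signal(kmer_counts, reference_kmers):
    """Fraction of counts carrying an in silico APOBEC mutation.

    kmer_counts maps kmer -> count from RNA-seq; reference_kmers are the
    unmutated TE kmers. Variants identical to a reference kmer are excluded.
    """
    refs = set(reference_kmers)
    mutated = {v for k in refs for v in apobec_variants(k)} - refs
    ref_total = sum(kmer_counts.get(k, 0) for k in refs)
    mut_total = sum(kmer_counts.get(v, 0) for v in mutated)
    total = ref_total + mut_total
    return mut_total / total if total else 0.0


print(apobec_variants("ATCGTCA"))
print(apobec_signal({"ATCGTCA": 6, "ATTGTCA": 2}, ["ATCGTCA"]))
```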
The vaccinaTE toolkit facilitates the analysis of transposable elements and their expression for large cancer gene expression datasets. The vaccinaTE toolkit includes routines for identifying open reading frames, predicting MHC binding, ranking peptides by their druggability, quantifying expression of peptides, and assembling full-length transposable elements from RNA-seq data. The vaccinaTE software is written in the C++ programming language to scale to genome-wide analysis of transposable element candidate cancer antigens, but other languages may be used. As further examples, some embodiments also provide several Python routines for preprocessing and analyzing the output of vaccinaTE.
At block 110, transposable element sequences are located and extracted. The TE sequences can be identified using a reference human genome (e.g., hg38). The transposable element sequences can be used to generate kmer sequences (i.e., subsequences of the TE sequence), potentially of various lengths. Each instance of a kmer in the TE regions can be identified and used in the approach. The location of each kmer can also be determined, and the location can be used to assign a unique identifier to each kmer. A given kmer may appear at multiple locations, potentially with two instances of the kmer overlapping the same genomic position. In some embodiments, the TE sequences are specific to L1HS.
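Extracting kmers from TE regions while recording every location can be sketched as a dictionary from kmer to location identifiers. The input tuple layout (chromosome, 0-based start, sequence) and the "chrom:pos" identifier format are hypothetical choices for illustration.

```python
from collections import defaultdict


def index_te_kmers(te_regions, k):
    """Map each kmer to the location IDs of all its instances in TE regions.

    Hypothetical helper. te_regions is a list of (chrom, start, sequence)
    with 0-based start. Each instance gets a unique 'chrom:pos' identifier,
    so a kmer seen at several loci (or overlapping itself) keeps all of them.
    """
    index = defaultdict(list)
    for chrom, start, seq in te_regions:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append(f"{chrom}:{start + i}")
    return dict(index)


regions = [("chr1", 100, "ACGTACG"), ("chrX", 50, "TACGT")]
idx = index_te_kmers(regions, k=4)
print(idx["ACGT"])  # same kmer found at two loci
```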
Of the thousands of L1HS loci, the majority have become degraded and may not generate sufficient protein for vaccine development. The L1base2 database was used to prioritize full-length L1HS elements and L1HS loci with intact ORF2 sequences [37].
At block 120, the open reading frames are located. An open reading frame defines how the protein is encoded. An open reading frame is defined by a start codon (a 3-base sequence, usually AUG in RNA) and a stop codon (usually UAA, UAG, or UGA). The open reading frames can be identified in the transposable elements, for which kmer locations are known. As the open reading frames provide a complete protein sequence, they can be used to map a kmer to a protein sequence, which is needed when measuring expression levels for a particular protein using RNA measurements (e.g., RNA sequencing data).
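Locating start-to-stop open reading frames can be sketched on the cDNA (DNA-alphabet) sequence as below. This is a simplified illustration, not the generateORFs tool: it scans only the three forward frames, takes the first ATG in a frame up to the next in-frame stop, and uses a hypothetical minimum-length cutoff.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Find ATG-to-stop ORFs in the three forward frames of a DNA sequence.

    Returns sorted (start, end) pairs, 0-based, end exclusive and including
    the stop codon. min_codons is a hypothetical length cutoff.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it at the stop codon
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))
```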
The hg38 genome annotation was used to generate L1HS ORFs. The generateORFs tool was used to identify protein-coding regions within L1HS elements. Protein domains within ORFs were investigated using the Pfam tool [38].
At block 130, the open reading frames are translated into protein sequences. The standard genetic code can be used to translate each open reading frame into a corresponding protein sequence (Osawa S. et al., Microbiol. Rev. 56, 229-264 (1992), incorporated by reference herein). The open reading frames that map to known transposable element domains are used for downstream identification of candidate cancer antigens.
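Translation with the standard genetic code can be sketched with a 64-entry codon table built in the conventional TCAG order. Mapping unknown codons (e.g., containing N) to "X" and stopping at the first stop codon are choices made for this example.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + n]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for n, c in enumerate(BASES)
}


def translate(orf):
    """Translate a DNA-alphabet ORF into a protein, stopping at a stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3].upper(), "X")  # 'X' for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAAATTTTAG"))
```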
At block 140, it is predicted which of the protein sequences (peptides) from candidate cancer antigens are able to bind to MHC, and thus would be presented on the surface of the cell. MHC is the complex that holds the epitope on the cell surface. MHC is the general term, and human leukocyte antigen (HLA) is the human-specific term for human Class I MHC. Different MHCs can be tested, as different MHC haplotypes exist in the population. Peptides that do not bind to at least one version of MHC can be removed (discarded). In some embodiments, it is determined whether the peptides will bind to the MHC (including HLA) haplotypes present in an individual patient.
In some implementations, the netMHCpan-4.0 software was applied to the translated L1HS ORFs for 2427 HLA genotypes. Peptides of length 8, 9, 10, and 11 were investigated (although other lengths can be investigated). Certain peptides found in the open reading frames of proteins can be selected, for example, peptides that were predicted to bind to at least one HLA allele with a minimum percentile rank, e.g., 2%.
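Enumerating 8- to 11-mer peptides and keeping those predicted to bind at least one allele can be sketched as below. The predictions are assumed to be already parsed into (peptide, allele, percentile rank) tuples from a binding predictor such as netMHCpan; parsing its output is not shown, and the function names and placeholder peptides are hypothetical.

```python
def candidate_peptides(protein, lengths=(8, 9, 10, 11)):
    """All distinct subsequences of the given lengths from a protein sequence."""
    return {protein[i:i + n] for n in lengths for i in range(len(protein) - n + 1)}


def select_binders(predictions, max_rank=2.0):
    """Keep peptides predicted to bind at least one allele within max_rank.

    predictions: iterable of (peptide, allele, percentile_rank) tuples.
    Returns a dict mapping each retained peptide to its binding alleles.
    """
    selected = {}
    for peptide, allele, rank in predictions:
        if rank <= max_rank:
            selected.setdefault(peptide, []).append(allele)
    return selected


preds = [("P1", "HLA-A*02:01", 0.5), ("P1", "HLA-B*07:02", 5.0), ("P2", "HLA-A*02:01", 1.9)]
print(select_binders(preds))
```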
At block 150, the peptides meeting specified criteria (e.g., the minimum percentile rank) can be assembled into a database. The peptides from block 140 can be mapped back to the transcript kmers to create a database of corresponding probes, which may be used in downstream analyses. For example, these probes can be used to detect expression levels. Such probes can be certain sequences to be identified in sequencing data or physical probes that can provide a signal when a specific sequence is detected, e.g., via hybridization. The measured levels of such probes can be aggregated (potentially with weights) to determine an expression level of a corresponding protein that may be a candidate cancer antigen. The aggregation can be a weighted sum, where each weight multiplies a measured amount of a particular kmer that contributes to the protein. The aggregated amount can be normalized, e.g., based on a total number of molecules analyzed.
The database can be created in such a way as to facilitate going from DNA space to protein space and vice versa. A peptide can be stored in connection with one or more kmers, and a peptide entry can have fields for each unique kmer location that contributes to generating that peptide. Alternatively or in addition, a kmer entry can be stored with field(s) for each peptide encoded by an open reading frame that includes the kmer.
The database can be used to identify where a TE could have been generated in the human genome as well as identifying what proteins could have been generated by an over-expressed transcript. Without this database, one would need to realign the many kmer and peptide sequences. The database can be queried based on peptide and/or DNA kmer sequences. In some implementations, any sequences that could have been generated by a non-TE region of the genome are removed.
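One way to hold this two-way index in memory is sketched below, assuming the database is built from (peptide, kmer, locus) triples and that a set of peptides known to be producible from non-TE regions is available for filtering. The class, field names, and locus representation are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import AbstractSet, Dict, Iterable, Set, Tuple

# A kmer location as (sequence name, 0-based start); a hypothetical representation.
Locus = Tuple[str, int]


@dataclass
class EpitopeDB:
    peptide_to_kmers: Dict[str, Set[Tuple[str, Locus]]]  # protein space -> DNA space
    kmer_to_peptides: Dict[str, Set[str]]                 # DNA space -> protein space


def build_db(links: Iterable[Tuple[str, str, Locus]],
             non_te_peptides: AbstractSet[str] = frozenset()) -> EpitopeDB:
    """Index (peptide, kmer, locus) links in both directions.

    Peptides that a non-TE region could also produce are skipped, mirroring
    the optional removal step described above.
    """
    p2k: Dict[str, Set[Tuple[str, Locus]]] = defaultdict(set)
    k2p: Dict[str, Set[str]] = defaultdict(set)
    for peptide, kmer, locus in links:
        if peptide in non_te_peptides:
            continue
        p2k[peptide].add((kmer, locus))
        k2p[kmer].add(peptide)
    return EpitopeDB(dict(p2k), dict(k2p))


# Toy links; sequences and locus names are made up.
links = [("PEPA", "ATGAAA", ("L1HS_1", 100)),
         ("PEPA", "ATGAAG", ("L1HS_2", 40)),
         ("PEPB", "ATGAAA", ("L1HS_1", 100)),
         ("PEPN", "GGGTTT", ("chr2_nonTE", 7))]
db = build_db(links, non_te_peptides={"PEPN"})
```

Querying either dictionary then answers "which loci could produce this peptide" or "which peptides could this kmer contribute to" without realigning sequences.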
Embodiments can perform the identification of transposable element immunotherapy candidate cancer antigens using the vaccinaTE toolkit.
At block 210, annotations of a reference sequence can be used to identify TE regions, and particular types of TE regions, e.g., L1HS. Accordingly, the underlying database of TE candidate cancer antigens can be based on TE annotations from a human reference genome sequence. The open reading frames (ORFs) can be automatically detected and the resulting ORFs can be extracted. Thus, this routine can start in a DNA space of the TE regions and identify ORFs corresponding to an RNA space.
Accordingly, a step of the pipeline can identify unique open reading frames (ORFs) across all TEs. The generateORFs command takes a genome sequence file and a transposable element annotation file and generates the transcripts and predicted protein sequences for downstream analysis. There are several TE databases of interest to the cancer research community on the UCSC Xenahub [35].
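The internals of generateORFs are not described here. As a hedged illustration of the general idea only, the sketch below scans the three forward reading frames of a TE sequence for ATG-to-stop ORFs using the standard genetic code. It assumes forward-strand ORFs, starts each ORF at the first ATG in a stretch, and uses an arbitrary minimum length; none of these choices is taken from the tool itself.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g., containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq: str, min_aa: int = 100) -> List[Tuple[int, int, str]]:
    """Return (start, end, protein) for ATG..stop ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon; the stop is not translated.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                protein = translate(seq[start:i])
                if len(protein) >= min_aa:
                    orfs.append((start, i + 3, protein))
                start = None
    return orfs
```

For example, `find_orfs("CCATGGCTTGGTAACC", min_aa=1)` reports one ORF in the third frame, starting at the ATG at position 2.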
At block 220, a routine determines whether peptides corresponding to the ORFs bind to MHCs. These ORFs can be defined as kmers of RNA, e.g., by each ORF including a collection of kmers at different locations in the ORF. This routine can translate the ORFs to peptides (e.g., as in block 130), and then determine whether those peptides bind to one or more MHC alleles.
As shown in
Accordingly, the peptides within the protein sequences that bind to the HLA genotypes in an individual patient or patient population can be identified. The findBinders script can run netMHCpan-4.0 or MHCflurry (or other tool) to generate a database of potential TE candidate cancer antigens. This database can be used to quantify HLA-peptide kmer expression in RNA-seq data.
At block 230, the peptides identified to bind to MHC (e.g., ones in the database at block 150) are used to predict corresponding RNA sequences that encode each peptide. This routine can in turn map the resulting RNA sequences back to particular locations in the genome that can be transcribed into the corresponding RNA. Duplicates can be resolved by identifying each possible RNA kmer sequence and using each such sequence to measure an expression level of the protein. Peptides predicted to bind to MHC can be mapped to transposable element ORFs using the TE sequence database.
Unique and multimapping DNA kmers can be used for quantifying expression of TEs from RNA-seq data. The vacKmer tool can be used to predict what mRNAs can encode the peptides and match the resulting kmer sequences to the transposable element loci that could have generated the particular peptide. This creates the genomic sequence database that can be used for quantifying transposable element expression in RNA-seq data.
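vacKmer's implementation is not described here; the sketch below shows the underlying reverse-translation idea under stated assumptions. `reverse_translate` lists every coding sequence for a short peptide (the number grows multiplicatively with peptide length), while `match_peptide_to_loci` finds the actual hits more cheaply by translating each window of a locus sequence in every frame. Locus names and sequences are toy values, and both function names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Inverse table: amino acid -> list of synonymous codons.
SYNONYMOUS: Dict[str, List[str]] = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMOUS[aa].append(codon)


def reverse_translate(peptide: str) -> Set[str]:
    """Every DNA sequence that encodes `peptide` (size grows multiplicatively)."""
    return {"".join(codons) for codons in product(*(SYNONYMOUS[aa] for aa in peptide))}


def match_peptide_to_loci(peptide: str, loci: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Find (locus, offset, kmer) where a TE locus sequence encodes the peptide.

    Each window of 3 * len(peptide) bases, in every frame, is translated codon
    by codon and compared with the peptide.
    """
    k = 3 * len(peptide)
    hits = []
    for locus_id, seq in loci.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if all(CODON_TABLE.get(window[j:j + 3]) == peptide[j // 3] for j in range(0, k, 3)):
                hits.append((locus_id, i, window))
    return hits
```

Translating windows rather than enumerating codon combinations avoids the blow-up for leucine- or serine-rich peptides, where each residue has six codons.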
At block 240, sequencing information from a sample can be analyzed to count the presence of RNA kmers, in order to determine an expression level for a corresponding protein. At block 240, the expression level of a particular protein can also be compared to a baseline expression level for a healthy cell, and thereby used to detect a protein that is overexpressed in a tumor cell. The RNA kmers can be ranked by level of overexpression. The highest ranked RNA kmers (e.g., the top N (e.g., 10, 20, 30, etc.) or the top X% (e.g., 5%, 10%, etc.)) can then be used to identify the cancer antigens, e.g., by in silico translation. The unique kmers can be mapped to identify the correct reading frame for translation, so that the kmer generates the protein sequence that would be generated by the DNA sequence of the TE. The prediction and mapping steps described herein can be performed using in silico techniques, which can model biochemical processes such as transcription and translation. Thus, such terms can refer to in silico techniques in the present context.
As an example, a list of kmers ranked by RNA overexpression relative to a normal control can be produced (e.g., the top 100, 200, or 300 kmers). The most highly ranked kmers might still correspond to ones that are never expressed as proteins. For the ranking, a p-value can be generated using a distribution (e.g., a negative binomial distribution) describing how overexpressed the kmer is relative to the normal control or a cohort thereof. Thus, the element described in block 240 can filter out kmers that are likely to also be expressed as mRNA transcripts in normal cells. Other criteria can also be used (e.g., water solubility of the peptide corresponding to the overexpressed transcript) to determine the ranking of a particular candidate peptide for experimental validation. Furthermore, the MHC haplotype of a human subject can be determined for each sample, so the rank of the peptides can then be based on how likely they are to be presented by a patient with that MHC haplotype. Additionally, a distance (e.g., the Hamming distance) between the candidate cancer antigen and the closest normal protein antigen can be used as another criterion for prioritizing peptides that are likely to be strongly immunogenic.
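A sketch of two of these ranking criteria follows: an upper-tail negative binomial p-value for a kmer count, given the normal-control mean and a dispersion (size) parameter, and the Hamming distance to the closest equal-length normal peptide. How the mean and dispersion are estimated from the normal cohort is left open and is an assumption here. The pure-Python tail sum avoids a library dependency but loses precision for very small p-values, where a routine such as `scipy.stats.nbinom.sf` would be preferable.

```python
import math
from typing import Iterable, Optional


def nb_upper_pvalue(count: int, mu: float, size: float) -> float:
    """P(X >= count) for X ~ NegativeBinomial(mean=mu, size=size).

    mu is the expected kmer count under the normal control (must be > 0, e.g.,
    after adding a pseudocount); size is the dispersion parameter, with
    variance = mu + mu**2 / size.
    """
    p = size / (size + mu)  # "success" probability in this parameterization
    cdf = 0.0
    for k in range(count):  # accumulate P(X = k) for k = 0 .. count - 1
        log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
                   + size * math.log(p) + k * math.log1p(-p))
        cdf += math.exp(log_pmf)
    return max(0.0, 1.0 - cdf)


def min_hamming(candidate: str, normal_peptides: Iterable[str]) -> Optional[int]:
    """Hamming distance to the closest equal-length normal peptide (None if none)."""
    distances = [sum(a != b for a, b in zip(candidate, n))
                 for n in normal_peptides if len(n) == len(candidate)]
    return min(distances) if distances else None
```

With size = 1 the distribution reduces to a geometric one, which gives an easy sanity check: for mean 1, P(X >= 3) is 0.5 cubed, or 0.125.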
When this analysis is performed for a particular individual or a particular cohort, an additional analysis can be used to confirm whether the subject is likely to respond to a vaccine. This analysis involves APOBEC genes.
APOBEC is a class of proteins/genes that protects the genome from transposable elements. Embodiments of the disclosed methods can use APOBEC mutation signatures as a secondary confirmation of overexpression of transposable elements. APOBEC can also be used to predict responsiveness to checkpoint blockade therapy. The usefulness of this approach is supported by the observation that overexpression of transposable elements can be correlated with response to immunotherapies.
Activation of the APOBEC antiviral response within cells is a hallmark of cancer [28,32]. The APOBEC family of proteins is also involved in repressing transposable elements through several mechanisms, including random mutagenesis of single-stranded RNA and DNA. To provide additional support to transposable element signal, a random mutagenesis database was generated using published APOBEC mutagenesis motifs [29,30,36]. The APOBEC mutation database along with the MHC bound TE peptides can be used for a complete analysis of expression signatures using the probeAnalysis tool. The probeAnalysis tool generates a ranked list of MHC bound peptides and APOBEC kmers for each sample. Analysis routines can annotate these lists for precision medicine applications.
APOBEC is active when transposable elements are active, but is otherwise inactive. One can then predict that when transposable element expression is high, higher APOBEC activity should result. APOBEC activity can be seen through very specific mutations in DNA and RNA, e.g., mutating a C to a T as an attempt to break the transposable element before reintegration into the genome. Thus, the RNA can be analyzed to detect mutations (e.g., more than a threshold) caused by the APOBEC pathway. For a given subject, if APOBEC is active, then there is a higher likelihood of identifying TE candidate cancer antigens specific for the subject. Whereas if APOBEC is off, then the likelihood is lower that the subject is a candidate for this type of therapy.
At block 205, a cancer patient's RNA-seq sequencing read file is downloaded, e.g., in FASTA format. The FASTA file provides the sequences of the RNA molecules obtained from the RNA sequencing of a biological sample from a subject, e.g., cells or a fluid.
At block 215, APOBEC binding motifs are identified in the RNA sequences. Activation of the APOBEC antiviral response within cells is a hallmark of cancer [28,32]. The APOBEC family of proteins is also involved in repressing transposable elements through several mechanisms, including random mutagenesis of single-stranded RNA and DNA. APOBEC3A is the most active APOBEC in cancer and is involved in repressing viral and retroelement reintegration events in the human genome. APOBEC3A causes C>T substitutions across the genome at the DNA level, but Sharma et al. (2016) infra identified a secondary structure preference and a [CT][CT][ATC][TC]C[GA] binding motif preference. Similarly, APOBEC3G was recently found to preferentially bind to an N[CGT]N[CT]C motif.
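The two motifs quoted above can be scanned with regular expressions. In this sketch, RNA is written in DNA letters (U read as T), N is expanded to any base, the APOBEC3G motif is taken as N[CGT]N[CT]C, and the target C is assumed to be the fifth base of each motif. The motif table and function name are illustrative, not part of any published tool.

```python
import re
from typing import List, Tuple

# Motifs from the text in DNA letters; N = any base. Zero-width lookaheads
# let finditer report overlapping sites. The second value is the assumed
# 0-based offset of the target C within the motif.
MOTIFS = {
    "APOBEC3A": (re.compile(r"(?=([CT][CT][ATC][TC]C[GA]))"), 4),
    "APOBEC3G": (re.compile(r"(?=([ACGT][CGT][ACGT][CT]C))"), 4),
}


def find_motif_sites(seq: str) -> List[Tuple[str, int, str]]:
    """Return (enzyme, position of target C, matched motif) for each motif hit."""
    seq = seq.upper().replace("U", "T")
    sites = []
    for enzyme, (pattern, target_offset) in MOTIFS.items():
        for m in pattern.finditer(seq):
            sites.append((enzyme, m.start() + target_offset, m.group(1)))
    return sites
```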
At block 225, an inverted repeat structure is identified. Sharma et al. (2016) found an inverted repeat in 98% of confirmed APOBEC3G mRNA edits, reflecting a hairpin structure that facilitates APOBEC3G binding to RNA. The hairpin structure can be detected from the sequences in the FASTA file, and each potential mutation site can be required to have this hairpin structure. The hairpin forms when the RNA folds back on itself, producing the shape that APOBEC can then bind to and mutate.
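A simple way to test whether a candidate site sits in a hairpin loop is to look for a stem and its reverse complement flanking a short loop that contains the site. The stem length (4) and loop range (3 to 8 bases) are arbitrary assumptions, exact complementarity is required, and G-U wobble pairs are ignored; a production pipeline would more likely use an RNA folding tool. Function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_hairpin_loop(seq: str, site: int, stem: int = 4,
                    loop_min: int = 3, loop_max: int = 8) -> bool:
    """True if `site` lies in a loop flanked by an exact inverted repeat.

    The left stem is seq[loop_start - stem:loop_start]; the right stem must be
    its reverse complement, starting immediately after the loop.
    """
    seq = seq.upper().replace("U", "T")
    for loop_len in range(loop_min, loop_max + 1):
        # Every placement of a loop of this length that still contains the site.
        for loop_start in range(site - loop_len + 1, site + 1):
            left = loop_start - stem
            right = loop_start + loop_len
            if left < 0 or right + stem > len(seq):
                continue
            if seq[right:right + stem] == reverse_complement(seq[left:loop_start]):
                return True
    return False
```

For example, in `GACG|UUCAU|CGUC` the C at position 6 lies in a five-base loop closed by the GACG/CGUC stem.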
At block 235, an APOBEC3G kmer database was generated. The database can be used for comparison to the RNA sequencing data of a particular subject. The database can be generated synthetically on a computer, e.g., using the Gencode V32 transcriptome reference [42]. Synthetically mutated kmers containing these motifs were generated, filtering out kmers that match kmers in the normal transcriptome database as well as kmers related to common polymorphisms in the human population using the dbSNP resource [43]. The filtering is done since the detection of sequences that match regularly occurring sequences in the healthy population would not be associated with APOBEC activity.
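The synthetic database can be sketched as follows, assuming motif sites have already been located (e.g., as in block 215) and that kmers derived from dbSNP variants are supplied as a precomputed set. Each target C is mutated to T in silico, every kmer spanning the edit is kept, and kmers that also occur in the unmutated transcriptome or in the variant set are removed. Function names and inputs are hypothetical.

```python
from typing import AbstractSet, Dict, List, Set


def all_kmers(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def apobec_kmer_db(transcripts: Dict[str, str], sites: Dict[str, List[int]], k: int,
                   variant_kmers: AbstractSet[str] = frozenset()) -> Set[str]:
    """Build kmers carrying an in silico C>T edit at each motif site."""
    normal: Set[str] = set()
    for seq in transcripts.values():
        normal |= all_kmers(seq, k)
    db: Set[str] = set()
    for tid, seq in transcripts.items():
        for pos in sites.get(tid, []):
            if seq[pos] != "C":
                continue  # sites are expected to point at the target C
            mutated = seq[:pos] + "T" + seq[pos + 1:]
            # Every kmer window that overlaps the edited base.
            for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
                kmer = mutated[start:start + k]
                if kmer not in normal and kmer not in variant_kmers:
                    db.add(kmer)
    return db
```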
Block 240 can use the sequencing results to count the occurrence of kmers (identified in block 230) corresponding to the identified peptides from block 220. Block 240 can also count the APOBEC kmers to estimate an APOBEC signature that has been found in tumor samples. The APOBEC signature can correspond to the number of kmers in the patient RNA-seq data that match a predicted APOBEC mutation generated in our database, which was generated in block 235. A reference distribution using healthy controls is used to estimate the threshold for activation of APOBEC. The threshold for an active APOBEC signature is identified using a reference cohort of healthy control RNA-seq fastq files, e.g., a specified number of standard deviations from the average of the reference cohort can be used.
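A minimal sketch of the signature and threshold, assuming the signature is expressed as database matches per million kmers scanned (so that samples sequenced to different depths are comparable) and that the threshold is the reference-cohort mean plus two sample standard deviations. Both the normalization and the default of two standard deviations are assumptions, as are the function names.

```python
from statistics import mean, stdev
from typing import AbstractSet, Iterable, Sequence


def apobec_score(reads: Iterable[str], apobec_db: AbstractSet[str], k: int) -> float:
    """Matches to the in silico APOBEC kmer database per million kmers scanned."""
    total = hits = 0
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            total += 1
            hits += read[i:i + k] in apobec_db  # True counts as 1
    return hits / total * 1e6 if total else 0.0


def activation_threshold(control_scores: Sequence[float], n_sd: float = 2.0) -> float:
    """Healthy reference cohort mean plus n_sd sample standard deviations."""
    return mean(control_scores) + n_sd * stdev(control_scores)


def apobec_active(score: float, control_scores: Sequence[float], n_sd: float = 2.0) -> bool:
    return score > activation_threshold(control_scores, n_sd)
```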
Most bioinformatic tools actively ignore transposable elements because they are repetitive sequences that require special attention. As described above, the present disclosure includes a software suite that implements bioinformatic tools specifically designed for the unique challenges associated with transposable element analysis. Embodiments can include generation of a transposable element epitope database, locus-specific quantification of transposable elements, differential expression analysis, and identification of MHC-bound peptides in mass spectrometry data.
As described above, some embodiments can identify candidate cancer antigens that may be used in a cancer vaccine. Such proteins may be highly expressed in cancer cells (e.g., on the surface of cancer cells), but not expressed or minimally expressed in healthy cells. Further, such proteins may be expressed in at least a subpopulation as opposed to being related to a specific mutation. Examples of such proteins are generated from transposable elements in the genome, e.g., short interspersed nuclear elements (SINEs) and long interspersed nuclear elements (LINEs), such as LINE-1 (L1) element L1 Homo sapiens (L1HS).
At block 310, a group of candidate cancer antigens that are generated from transposable elements is identified. As an example, the initial identification of the candidate cancer antigens can be performed as described for
The kmers can be identified first in ORFs in TE regions, with the protein corresponding to a given ORF being mapped to the unique kmers (e.g., identified by sequence and location) in the ORF. As another example, sequences of peptides that bind to MHC can be used to predict (e.g., via in silico reverse translation) corresponding kmers that can be analyzed to determine an expression level of the candidate cancer antigen.
In some implementations, additional candidate cancer antigens can be identified, starting from an initial set. For a given peptide sequence, one can identify similar peptides that are known to be bound to MHCs, using machine learning approaches like NetMHC (Nielsen et al., Protein Sci 12, 1007-1017 (2003); incorporated by reference herein). Once those similar peptides are known, the peptide sequences can be used to identify corresponding RNA sequences that can in turn identify which of the transposable elements uniquely express those peptides.
At block 320, a baseline expression level is determined for each of the candidate cancer antigens using measurements of tissue from a first cohort of healthy subjects. The cohort can be of one or more subjects. In some implementations, a baseline expression level can be determined for kmers, and the expression analysis can occur in RNA space. Later, the expression levels for the kmers can optionally be used to determine a baseline expression level for corresponding proteins. As another example, the identified kmers can be translated to proteins. The expression level for the protein can be measured directly, e.g., using mass spectrometry. The baseline expression level can be determined for a particular tissue type, e.g., by analyzing a biopsy from the particular tissue type. In embodiments, the baseline expression level can be determined using measurements of noncancerous tissue from the same subject.
In some embodiments, the baseline expression level can vary based on a subject's age, as the normal expression level for certain proteins can vary with age. The baseline expression level can also be determined for a particular tissue type, e.g., as method 300 may be implemented to identify candidate cancer antigens for a particular tissue type. Thus, the first cohort can have a particular age range and/or have tissue samples all from the same tissue type (e.g., breast, lung, colon, liver, prostate, etc.). The first cohort can also have the same or similar MHC haplotypes. A cohort can also share certain demographic information.
In other implementations, the expression levels of the proteins can be analyzed directly, e.g., using mass spectrometry. Whichever techniques are used, a tissue biopsy can be analyzed to perform the measurements. Alternatively, the analysis could use measurements performed by a different entity (e.g., published data), but which is still determined from healthy samples.
At block 330, a tumor expression level is determined for each of the candidate cancer antigens using measurements of tumor tissue from a second cohort of cancer subjects. The second cohort can be selected using criteria similar to those for the first cohort, e.g., same age range and/or tissue type. In one implementation, the tumor cohort comes from The Cancer Genome Atlas project, which includes publicly available data with identifying characteristics that can be used to form various cohorts of samples. In another implementation, tumor samples from a subject can be analyzed, e.g., via RNA sequencing or mass spectrometry of proteins.
In some embodiments, the tumor expression level may be determined from measurements of the occurrence of various kmers. For instance, an expression level for a particular protein can be determined using measured amounts of various RNA kmers that can be translated to the protein. The amount of occurrence for each particular kmer (e.g., as measured via an intensity signal or by counting individual RNA molecules with the particular kmer), which can be translated to the protein, can be aggregated (e.g., a weighted sum) to determine the overall expression level for the protein.
The expression levels for kmers can be determined in various ways, e.g., using sequencing results or using sequence-specific probes, which can provide an intensity signal.
At block 340, a differential expression level is determined for each of the candidate cancer antigens using the baseline expression level and the tumor expression level. The differential expression level can be determined by comparing the tumor expression level to the baseline expression level. As examples, the comparison can include a ratio or a subtraction.
At block 350, one or more of the candidate cancer antigens having a differential expression level greater than a threshold can be selected. The proteins can be ranked based on a score that is dependent on the differential expression levels. As examples, the threshold can correspond to the N (e.g., 10) proteins having the highest differential expression levels or to a top range (e.g., by percentage) of differential expression levels. Constraints on peptide synthesis (e.g., peptide size) may also be used in selecting candidate cancer antigens for the final library.
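Blocks 340 and 350 can be sketched together. The example computes either a log2 ratio (with a pseudocount, an added assumption so that antigens absent from healthy tissue do not cause division by zero) or a plain difference, then ranks the antigens and applies a threshold and/or a top-N cut. Function names and toy values are illustrative.

```python
import math
from typing import Dict, List, Optional


def differential_expression(baseline: Dict[str, float], tumor: Dict[str, float],
                            method: str = "log2ratio",
                            pseudocount: float = 1.0) -> Dict[str, float]:
    """Compare tumor to baseline expression for each candidate antigen."""
    diff: Dict[str, float] = {}
    for antigen, t in tumor.items():
        b = baseline.get(antigen, 0.0)
        if method == "log2ratio":
            diff[antigen] = math.log2((t + pseudocount) / (b + pseudocount))
        elif method == "difference":
            diff[antigen] = t - b
        else:
            raise ValueError(f"unknown method: {method}")
    return diff


def select_antigens(diff: Dict[str, float], threshold: Optional[float] = None,
                    top_n: Optional[int] = None) -> List[str]:
    """Rank by differential expression, then apply a cutoff and/or a top-N limit."""
    ranked = sorted(diff, key=diff.get, reverse=True)
    if threshold is not None:
        ranked = [a for a in ranked if diff[a] > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


baseline = {"a": 1.0, "b": 3.0, "c": 0.0}
tumor = {"a": 7.0, "b": 3.0, "c": 15.0}
d = differential_expression(baseline, tumor)
```

A top-percentage cut, as also mentioned above, could be added by converting the percentage into `top_n` from the number of ranked antigens.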
In some embodiments, the score can be further based on other criteria, such as chemical data like the solubility of the protein. For example, a hydrophobic candidate cancer antigen would be insoluble in water and would be unlikely to result in an effective cancer vaccine.
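Solubility can be approximated in several ways; one common, simple proxy is the grand average of hydropathy (GRAVY) on the Kyte-Doolittle scale, where positive values indicate hydrophobic sequences. The sketch below computes GRAVY and, as a purely hypothetical scoring rule, subtracts a penalty proportional to positive GRAVY from the differential expression score.

```python
from typing import Dict

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide.upper()) / len(peptide)


def adjusted_score(diff_score: float, peptide: str, penalty: float = 1.0) -> float:
    """Hypothetical rule: penalize hydrophobic (positive GRAVY) peptides only."""
    return diff_score - penalty * max(0.0, gravy(peptide))
```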
The comparison of the differential expression level to a threshold can be performed in RNA space. If a particular set of one or more kmers has expression levels above a threshold, the set of kmers can be mapped to the one or more of the candidate cancer antigens. The mapping can include finding the reading frame of a kmer within a transposable element. The mapping can also include identifying multiple kmers corresponding to a protein, and/or a single kmer coding for multiple proteins. Thus, there can be multiple mappings for a protein. Proteins can be grouped together, with pointers back to the two or more locations in the genome that could have generated each protein. A mapping between a kmer and a protein can have a weight, and weights can differ between mappings. The weights can be used to estimate the total expression of a particular protein by determining a weighted sum of the expression levels for each of the kmers mapped to the particular protein.
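Finding the reading frame of a kmer within a TE can be sketched as follows, assuming the ORF boundaries in the TE sequence are already known (e.g., from ORF detection). For each occurrence of the kmer inside the ORF, the offset to the next codon boundary is computed and only complete in-frame codons are translated. Names and the toy sequence are hypothetical.

```python
from itertools import product
from typing import List, Tuple

_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(seq: str) -> str:
    """Translate complete codons only; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def kmer_frames(te_seq: str, orf_start: int, orf_end: int,
                kmer: str) -> List[Tuple[int, int, str]]:
    """Locate `kmer` inside the ORF [orf_start, orf_end) and translate it in frame.

    Returns (position, offset, peptide); offset is the number of leading kmer
    bases to skip so translation starts on one of the ORF's codon boundaries.
    """
    te_seq, kmer = te_seq.upper(), kmer.upper()
    hits = []
    pos = te_seq.find(kmer, orf_start)
    while pos != -1 and pos + len(kmer) <= orf_end:
        offset = (orf_start - pos) % 3
        hits.append((pos, offset, translate(kmer[offset:])))
        pos = te_seq.find(kmer, pos + 1)
    return hits
```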
Depending on how the cohorts are defined, common targets can be identified for a broad range of subjects, e.g., as defined in a cohort. For example, candidate cancer antigens can be defined for a given tissue type for a subject within a particular age range. In this manner, the most common candidate cancer antigens can be identified, and vaccines based on these candidate cancer antigens can be administered. In other embodiments, a more personalized approach can be performed, using a specific measurement from a subject. For example, the measurements from a particular subject can be used to identify the highest ranked candidate cancer antigens for that subject, and vaccines based on those candidate cancer antigens can be administered. In another example, a determination of the subject's MHC haplotype can be used to identify higher ranked candidate cancer antigens for that subject.
Once the candidate cancer antigens are identified and ranked, vaccines can be designed and synthesized. For a personalized approach, the vaccines corresponding to the most highly overexpressed proteins for a particular subject can be selected for administration. Given that some candidate cancer antigens are shared across cohorts (particularly cohorts sharing one or more MHC alleles), vaccines can be predesigned and used for a matching patient.
Embodiments (e.g., as described in
A. Generation of LINE-1 Epitope Database
The identification of the candidate cancer antigens can be performed as described for
As described for block 120, open reading frames were identified within each locus. A total of 11,129 unique open reading frames were found. Open reading frames were translated to peptides (e.g., as in block 130), and the peptides were then screened for binding to the 81 most common HLA haplotypes using the netMHC-4.0 software [1]. This generated 60,842 unique 8, 9, and 10mer peptides predicted to bind to at least one HLA haplotype, e.g., as described in block 140. These peptides can be reverse translated in silico to determine RNA kmers that may be analyzed for expression levels.
B. Relation Between Loci, Kmers, and Peptides
In the process of creating the database of candidate cancer antigens, some embodiments can identify regions (e.g., around particular loci) corresponding to TEs and identify kmers corresponding to those loci (where the kmers are DNA or RNA), and the kmers can be translated into peptides. Additionally, DNA or RNA kmers can be predicted from peptides that bind a particular MHC protein. For example, the peptides can be mapped to RNA kmers, which can then be used to measure expression levels. The determination of which kmers correspond to which peptides, and vice versa, is described herein as mapping.
Regarding mapping, a given kmer can map to two or more proteins. A given RNA sequence (open reading frame) generates one peptide, but a given kmer sequence can occur in different open reading frames, and thus a kmer can map to more than one protein. For example, if the kmer is located at two or more loci and each locus maps to a different protein, the kmer can map to two or more proteins. In such a case, the expression of the kmer can contribute to (e.g., be split among) each of the proteins, e.g., using a weight determined for a given protein. The weight can be stored in the database and can be determined by the number of TE loci to which the kmer maps. Besides expression levels, by which proteins can be ranked, other criteria can be used to select which protein is the better candidate cancer antigen, e.g., whether a protein is hydrophobic, or other biochemical criteria.
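As one possible weighting rule (an assumption; other weightings are possible), each kmer's count can be split equally among the proteins it maps to, so that the total count is conserved across proteins. The sketch below builds those weights and then sums weighted counts per protein; function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def split_weights(kmer_to_proteins: Dict[str, Iterable[str]]) -> Dict[str, Dict[str, float]]:
    """Weight each kmer equally across the distinct proteins it can encode."""
    weights: Dict[str, Dict[str, float]] = defaultdict(dict)
    for kmer, proteins in kmer_to_proteins.items():
        proteins = set(proteins)
        for protein in proteins:
            weights[protein][kmer] = 1.0 / len(proteins)
    return dict(weights)


def protein_totals(kmer_counts: Dict[str, float],
                   kmer_to_proteins: Dict[str, Iterable[str]]) -> Dict[str, float]:
    """Weighted sum of kmer counts for every protein."""
    return {protein: sum(w * kmer_counts.get(kmer, 0) for kmer, w in kw.items())
            for protein, kw in split_weights(kmer_to_proteins).items()}
```

For instance, a kmer counted 10 times that maps to proteins P1 and P2 contributes 5 to each.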
Conversely, a given protein can map back to multiple locations in the genome. Such mapping can be done at block 150, e.g., to identify additional kmers corresponding to TEs. In such a case, each expression level of a kmer can contribute (e.g., as defined by a weight) to an overall expression for the protein to which the kmers can be translated.
Further, each transposable element can include multiple unique kmers. Thus, when doing the gene expression analysis, there can be multiple mappings to that unique locus (each mapping via a different kmer). The relative counts for each of those kmers (e.g., via a microarray or via RNA sequencing) can be used to estimate the overall expression of that unique locus, e.g., that translated to a same protein. Then, the expression levels of each locus mapping to a protein can be aggregated.
Accordingly, in some embodiments, selecting one or more of the candidate cancer antigens can include mapping a set of kmers to the one or more of the candidate cancer antigens.
C. MHC
The database of candidate cancer antigens can be sorted by major histocompatibility complex (MHC) haplotype. The cell loads the peptide onto the MHC molecule and moves the complex to the cell surface. This complex on the cell surface is what is recognized by the T cell receptor, resulting in T cell-dependent immune responses.
Thus, a database that takes account of MHC haplotypes can be used to select candidate cancer antigens, by, for example, focusing on the MHC haplotype of a subject person. The MHC haplotype of a subject can be measured in various ways, e.g., by genotyping the DNA using a microarray or by DNA sequencing.
When using mass spectrometry, peptides can be purified after binding to MHCs. More particularly, a peptide library can be contacted with recombinantly produced peptide-receptive MHC molecules bound to a solid surface, such as a column. Peptides that do not bind the peptide-receptive MHC molecules flow through the column. Peptides that bind the MHC molecules are eluted and then can be identified using mass spectrometry. Then, the sequences of the eluted peptides can be matched to transposable elements, e.g., using a database of predicted transposable element mass spectra, as may be determined using steps described in
The expression of a transposable element can be measured in various ways, e.g., in RNA space or in protein space. In RNA space, certain sequences (referred to as kmers) can be quantified in cells of one or more tissue types, for both healthy and tumor tissues. The expression of a set of one or more kmers can be mapped to the expression of a particular protein, e.g., as a weighted sum. As noted above, certain kmers can contribute to more than one protein. In protein space, the expression measurements can be performed directly on the proteins. In some implementations, such measurements can be performed using mass spectrometry.
A. RNA
Accurate identification and quantification of transposable elements can use locus-specific sequences. The repetitive nature of L1HS and other transposable elements leads to multimapping of sequence reads, where a read can map to several locations in the genome. In some embodiments, to address the multimapping, the expression of locus-specific sequences is quantified. The locus-specific sequences can be unique. To be unique, the sequences (kmers) are generally relatively long (e.g., 20-30mers). In other embodiments, multimapping is addressed by having various kmers contribute to the protein generated from each locus.
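The search for locus-specific kmers can be sketched as counting every kmer across all TE loci and keeping those that occur exactly once. This sketch considers only the forward strand of the supplied loci and exact matches; reverse complements and the rest of the genome are not handled here. Names are hypothetical.

```python
from collections import Counter
from typing import Dict


def unique_kmers(loci: Dict[str, str], k: int) -> Dict[str, str]:
    """Map each kmer that occurs exactly once across all loci to its locus."""
    counts: Counter = Counter()
    owner: Dict[str, str] = {}
    for locus_id, seq in loci.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            counts[kmer] += 1
            owner[kmer] = locus_id
    return {kmer: owner[kmer] for kmer, n in counts.items() if n == 1}
```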
Although not required, uniqueness may be used as a feature for identifying loci that have a particular relationship to cancer. For example, gene fusion events can occur in some cancers (e.g., Ph+ leukemia), where two chromosomes break and merge together to form a new chromosome. This changes the regulation of these chromosomes and may result in the generation of TE peptides that are unique to a particular locus. By focusing on unique kmers, these loci can be identified. If uniqueness were not enforced, TE sequences that are expressed at several loci in the human genome could be identified instead.
Embodiments can then determine if the expression of a kmer is statistically different compared to a reference dataset of control gene expression data across human tissues and developmental stages. The comparison of expression can occur on a per tissue and/or per developmental stage basis. The resulting differential expression levels for the candidate cancer antigens can be used to select vaccines. When the baseline expression level is determined for a particular tissue, a personalized expression threshold for differentially expressed transposable elements can be determined for the subject's specific tumor, e.g., in block 320 of
The quantification analysis can use reads from part or all of an RNA fragment. In a clinical setting, one may want to confirm the presence of the entire transcript sequence, which can be done by assembling the whole sequence from the fragments. This can be done by aligning the RNA-seq reads to the reference using bwa [Li H. and Durbin R., “Fast and accurate short read alignment with Burrows-Wheeler transform,” Bioinformatics 2009 Jul. 15; 25(14):1754-60] and assembling the full-length transcript using the Trinity software [Grabherr M. et al., “Full-length transcriptome assembly from RNA-Seq data without a reference genome,” Nat Biotechnol. 2011 May 15; 29(7):644-52].
One challenge of quantifying the expression of TEs is that they are very repetitive. Software techniques described herein can find unique kmers, which can be used as a barcode to identify specific transposable element sequences that are candidate cancer antigens. By querying across the genome and analyzing expression data (e.g., overexpression of TE kmers in the RNA transcriptome), the database of reference normal samples can be used to isolate the tumor-specific overexpression of transposable elements. Embodiments can rank those peptides by any of a number of factors, including a score based on overexpression of the kmer relative to normal tissue, water solubility, ability to be presented by a subject's MHC alleles, and other factors mentioned herein. The result of the analysis is a list of potential vaccine peptides for use in cancer therapy. These can be used alone or in combination with another therapy such as checkpoint blockade therapy. The unique kmers can correspond to the length of the encoded peptides (three bases per amino acid) and be on the order of 24, 27, or 30 bases long, i.e., encoding 8-, 9-, or 10-mer peptides.
Referring back to the gene expression approach 400 of
At block 410, the occurrence of each of the unique epitope kmers can be counted. Each of the sequence reads can be compared to the library of kmers (e.g., as determined according to
At block 415, differentially expressed kmers are identified by comparing to reference levels determined from normal tissue. As described herein, the reference levels can be determined on a per tissue basis and/or on a developmental age basis, as well as other factors. Kmers that have a sufficiently high differential expression can be identified and used for later blocks.
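Counting at block 410 can be sketched as a sliding window over each read that increments a counter whenever the window matches a library kmer. It assumes exact matching, a single kmer length, and reads already in the same orientation as the library; the function name is hypothetical.

```python
from collections import Counter
from typing import AbstractSet, Iterable


def count_library_kmers(reads: Iterable[str], library: AbstractSet[str], k: int) -> Counter:
    """Count how often each library kmer appears across all reads."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in library:
                counts[kmer] += 1
    return counts
```

The resulting counts are what the comparison against normal-tissue reference levels at block 415 would consume.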
At block 420, the RNA sequencing data is aligned to noncanonical protein-coding genes inferred from transposable element sequences. The kmers that are highly overexpressed can be aligned to the noncanonical protein genes (e.g., in TE elements), e.g., as part of filtering out kmers that do not align to noncanonical protein genes. Overlapping reads aligning to inferred TE reference sequences can be assembled to recover full-length transcript sequences.
At block 425, RNA transcripts containing a candidate cancer antigen (termed a “protein epitope” in
At block 430, the RNA transcript isoforms are catalogued in the patient population. Blocks 420-430 can quantify the most abundant hits across patients to create a short list of the most widely shared cancer antigens for vaccine production. This step can build a growing database of the most common hits.
B. Proteins—Mass Spectrometry Approach for Identifying Candidate Cancer Antigens from TE Peptides
Certain mass spectrometric approaches rely on protein databases for identifying peptides. One of the limitations of such approaches is that peptides that are not present in the search database are not identified. Since the focus in the field has been on the identification of canonical proteins, there has been limited attention paid to potential cancer antigens from non-canonical protein coding genes, including genes within transposable elements. Disclosed herein is a novel approach for identifying potential cancer antigens by first precomputing a database of transposable element epitopes using the vaccinaTE software, e.g., as described in
The mass spectrometry database of peptides from TE elements can be used to detect the expression levels by matching spectra patterns for the peptides in the database. The intensity of the peaks can provide the expression level for the protein. Certain TE peptides are not only overexpressed in cancer cells but are actually presented on the cell surface of real triple negative breast cancer patient tumors.
Referring back to the gene expression approach 400 of
At block 460, a target-decoy search is performed using an epitope database as described in, for example, Elias J E and Gygi S P, Methods Mol Biol 604, 55-71 (2010); incorporated by reference herein. In this search, the measured spectra are matched against both real (target) peptide sequences and artificial (decoy) peptide sequences, and the number of decoy matches is used to estimate how often a spectrum matches a peptide by chance.
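The target-decoy idea can be sketched in a few lines: decoys are made here by reversing target sequences (one common choice, assumed for this example), peptide-spectrum matches are ranked by score, and the deepest cutoff whose decoy-to-target ratio stays within the allowed false discovery rate is kept. Scoring the spectra themselves is out of scope, and the input format and function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def make_decoys(peptides: Iterable[str]) -> Set[str]:
    """Decoy sequences made by reversing each target peptide."""
    return {p[::-1] for p in peptides}


def accept_at_fdr(psms: List[Tuple[str, float, bool]], max_fdr: float = 0.01) -> List[str]:
    """Return target peptides above the deepest cutoff with decoys/targets <= max_fdr.

    `psms` holds (peptide, score, is_decoy) for the best match of each spectrum.
    """
    ranked = sorted(psms, key=lambda m: m[1], reverse=True)
    targets = decoys = 0
    cutoff = -1
    for i, (_, _, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [pep for pep, _, is_decoy in ranked[:cutoff + 1] if not is_decoy]


psms = [("AAA", 10.0, False), ("BBB", 9.0, False), ("CCC", 8.0, False),
        ("DDD", 7.0, True), ("EEE", 6.0, False)]
```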
At block 465, a catalog of HLA bound peptides is identified in the patient population. As a result, the most prevalent peptide sequences can be catalogued. Embodiments can then move forward with synthesizing those most widely seen peptides.
C. Vaccine Catalog Process
After the expression levels are measured, the candidate cancer antigens can be identified.
At block 475, the peptides can be ranked by the prevalence in the disease population. The prevalence is based on the RNA expression data or data derived from direct peptide quantification (e.g., mass spectrometry). The ranking of the expression provides the peptides that occur more frequently in cancer cells, but not in healthy cells. Techniques for ranking are described herein.
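Ranking by prevalence can be illustrated as counting, for each peptide, the fraction of patients in whom it is detected, after removing peptides also seen in healthy tissue. This is a minimal sketch; the input layout, the `healthy_peptides` exclusion set, and the optional `min_prevalence` cutoff (e.g., 0.05, 0.10, or 0.20) are assumptions for illustration.

```python
"""Rank candidate peptides by prevalence across a disease population (sketch)."""
from collections import Counter


def rank_by_prevalence(detected_by_patient, healthy_peptides=frozenset(), min_prevalence=0.0):
    """detected_by_patient: mapping patient ID -> set of peptides detected in that tumor
    (from RNA-seq kmers or mass spectrometry). Returns (peptide, fraction_of_patients)
    pairs sorted by decreasing prevalence, keeping those at or above min_prevalence."""
    n_patients = len(detected_by_patient)
    counts = Counter()
    for peptides in detected_by_patient.values():
        # Count each peptide at most once per patient; drop peptides seen in healthy tissue.
        counts.update(set(peptides) - set(healthy_peptides))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(pep, n / n_patients) for pep, n in ranked if n / n_patients >= min_prevalence]
```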
At block 480, a panel of nucleic acid probes can be generated for use in companion diagnostics. Such an approach can accelerate the identification of candidates for the vaccine therapy. Once there is a ranked set of peptides, nucleic acid probes that detect the presence of these candidate cancer antigens in tumors can be generated. The probes can be used to screen patients who are likely to benefit from treatment with the peptide vaccine.
D. Generation of APOBEC Kmer Database
Besides quantifying kmer expression, embodiments can analyze APOBEC mutations. The ability to quantify APOBEC associated RNA editing/DNA mutations was investigated using RNA-seq data as input. This is a novel approach that uses in silico mutated transcriptome kmers to detect heightened APOBEC activity, which is a sign of viral infection and TE expression, and is an independent predictor of response to checkpoint blockade therapy [39,40]. The heightened activity was measured by comparing APOBEC RNA sequences in tumor tissue and in healthy tissue (“The Genotype-Tissue Expression (GTEx) project,” Nat Genet. 2013 June; 45(6):580-5).
APOBEC3A is believed to be the main enzyme responsible for the cancer APOBEC signature [28,31,36,41]. These enzymes are typically studied for their DNA mutagenesis signature, but APOBEC3A and 3G were recently found to have an RNA signature that is more specific than the C>T DNA mutagenesis signature. These APOBEC enzymes bind to a specific RNA secondary structure (used as a probe) that can be computationally modeled to detect APOBEC activity from RNA-seq data. The binding motif for APOBEC proteins can be used to make probes to detect APOBEC mutations in RNA, where the probes detect RNA expression of sequences with APOBEC mutations. This biological signature can be used to identify patients who may benefit from checkpoint blockade therapy.
APOBEC3A is the most active APOBEC in cancer and is involved in repressing viral and retroelement reintegration events in the human genome. APOBEC3A causes C>T substitutions across the genome at the DNA level, but Sharma et al. (2016) identified a secondary structure preference and a [CT][CT][ATC][TC]C[GA] binding motif preference, i.e., an RNA sequence motif bound by APOBEC3A in tumor RNA. Similarly, APOBEC3G was recently found to preferentially bind to an N[CGT]N[CT]C motif. Sharma et al. (2016) found an inverted repeat in 98% of confirmed APOBEC3G mRNA edits, due to a hairpin structure that facilitates APOBEC3G binding to RNA. Using the Gencode V32 transcriptome reference [42], kmers were synthetically mutated to contain this motif, filtering out kmers that match kmers in the normal transcriptome database as well as kmers related to common polymorphisms in the human population using the dbSNP resource [43]. For example, one can start with the reference transcriptome, remove the variants that are present in the human population, and then computationally mutate the sequences using the RNA sequence that APOBEC proteins bind to. These mutated sequences can then be used to measure APOBEC activity indirectly, using the mutation patterns that APOBEC makes when active.
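The in silico mutation and filtering steps can be sketched as below. This is illustrative only and rests on assumptions: only the APOBEC3A motif from the text is used, the invariant C at the fifth motif position is assumed to be the edited base (introduced as C>T, as it would appear in cDNA reads), hairpin secondary structure is not modeled, and filtering uses exact kmer matches against hypothetical `normal_kmers` and `snp_kmers` sets.

```python
"""Generate synthetically APOBEC-mutated kmers from a reference transcript (sketch)."""
import re

# APOBEC3A RNA binding motif from the text; the lookahead lets overlapping hits be found.
A3A_MOTIF = re.compile(r"(?=([CT][CT][ATC][TC]C[GA]))")
EDIT_OFFSET = 4  # assumption: 0-based position of the edited cytosine within the motif


def all_kmers(seq, k):
    """Every length-k substring of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mutated_kmers(transcript, k, normal_kmers, snp_kmers=frozenset()):
    """Return kmers spanning an in silico C>T edit at each motif hit that occur neither in
    the normal transcriptome nor in a known-polymorphism (e.g., dbSNP-derived) kmer set."""
    out = set()
    for m in A3A_MOTIF.finditer(transcript):
        pos = m.start() + EDIT_OFFSET
        mutated = transcript[:pos] + "T" + transcript[pos + 1:]
        # Every kmer window that covers the edited position.
        for start in range(max(0, pos - k + 1), min(pos, len(mutated) - k) + 1):
            kmer = mutated[start:start + k]
            if kmer not in normal_kmers and kmer not in snp_kmers:
                out.add(kmer)
    return out
```

In practice `normal_kmers` would be built with `all_kmers` over the whole reference transcriptome, not a single transcript.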
Some implementations can then use the kmerCounter script to quantify the number of mutated and normal kmers in RNA-seq samples. The number of normal kmers can be used as a normalizing factor to account for biases in library depth. For example, deeper sequencing yields more reads, and with them more sequencing errors, so raw counts of mutated kmers grow with depth. Normal background expression can be used to subtract out noise.
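The normalization idea can be sketched as a ratio of mutated-kmer counts to normal-kmer counts. This does not reproduce the kmerCounter script, whose interface is not described here; the exact-match, forward-strand counting and the `pseudocount` parameter are assumptions for illustration.

```python
"""Depth-normalized APOBEC activity score from RNA-seq reads (sketch; not the kmerCounter script)."""


def count_kmers(reads, kmers, k):
    """Count occurrences of each query kmer across reads (exact matches, forward strand only)."""
    counts = dict.fromkeys(kmers, 0)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
    return counts


def apobec_score(reads, mutated_kmers, normal_kmers, k, pseudocount=1.0):
    """Mutated-kmer counts divided by normal-kmer counts, so that libraries sequenced
    more deeply are not scored higher simply because they contain more reads."""
    mutated = sum(count_kmers(reads, mutated_kmers, k).values())
    normal = sum(count_kmers(reads, normal_kmers, k).values())
    return mutated / (normal + pseudocount)
```

A background score computed the same way on healthy samples could then be subtracted from the tumor score.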
Embodiments can create a repository of presynthesized, validated vaccines that would be applicable to a significant number of individuals, because they target TE sequences that are not mutated but are differentially expressed. For a given individual, measurements can be made to determine which of the preselected panel of proteins/kmers are overexpressed, and the corresponding vaccines can then be used. One or more vaccines can be used in combination.
A. Method of Identifying Personalized Vaccine
At block 510, a group of candidate cancer antigens (referred to as candidate target proteins in
At block 520, a baseline expression level is determined for each of the candidate cancer antigens. The baseline expression levels can be determined using measurements of healthy tissue from one or more healthy subjects. A baseline level can include a distribution of levels from the healthy tissue, which can provide information about the likelihood that a measured expression level comes from healthy tissue. As an example, a certain number of standard deviations above the baseline mean can be used as a cutoff to discriminate between normal expression and overexpression.
At block 530, a tumor expression level is determined for each of the candidate cancer antigens using measurements of tumor tissue from the patient. The tumor expression level can be determined in various ways, e.g., as described herein. Non-tumor tissue can be collected along with the tumor tissue (e.g. tumor adjacent tissue) and expression levels in that non-tumor tissue can be measured to provide the baseline expression level.
At block 540, a differential expression level is determined for each of the candidate cancer antigens using the baseline expression levels and the tumor expression levels. The differential expression level can be determined in various ways, e.g., as described herein.
At block 550, one or more of the candidate cancer antigens having a differential expression level greater than a threshold are selected. These candidate cancer antigens would be ones that are overly expressed in the patient.
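Blocks 520 through 550 can be illustrated with a z-score, which is one of the "various ways" a differential expression level might be computed. This is a minimal sketch: the 3-standard-deviation cutoff, the small variance floor, and the antigen names are hypothetical.

```python
"""Blocks 520-550: baseline, tumor level, differential expression, threshold (sketch)."""
from statistics import mean, stdev


def select_overexpressed(healthy_levels, tumor_levels, z_cutoff=3.0):
    """healthy_levels: antigen -> list of expression values measured in healthy tissue.
    tumor_levels: antigen -> expression value measured in the patient's tumor.
    Returns (antigen, z) pairs whose tumor level exceeds the healthy baseline by more
    than z_cutoff standard deviations, most overexpressed first."""
    selected = []
    for antigen, tumor in tumor_levels.items():
        baseline = healthy_levels[antigen]
        mu = mean(baseline)
        sd = stdev(baseline) if len(baseline) > 1 else 0.0
        sd = max(sd, 1e-9)  # floor for zero variance (e.g., never expressed in healthy tissue)
        z = (tumor - mu) / sd  # differential expression level
        if z > z_cutoff:
            selected.append((antigen, z))
    return sorted(selected, key=lambda pair: -pair[1])
```

With this choice, an antigen never detected in healthy tissue but present in the tumor receives a very large z and is ranked first, which matches the goal of favoring tumor-restricted antigens.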
At block 560, a cancer vaccine corresponding to the one or more of the candidate cancer antigens is selected. In this manner, embodiments can determine which vaccine to use alone or in combination. For example, there may be 5-10 highly ranked candidate cancer antigens identified, and their corresponding vaccines can be used in combination.
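Block 560 can be sketched as a lookup of the patient's overexpressed antigens in a repository of validated vaccines. The repository mapping, the vaccine identifiers, and the cap of 10 vaccines (the upper end of the 5-10 example above) are hypothetical.

```python
"""Block 560: choose vaccines from a presynthesized repository for one patient (sketch)."""


def choose_vaccines(overexpressed, repository, max_vaccines=10):
    """overexpressed: antigens selected for this patient, most overexpressed first.
    repository: antigen -> identifier of a validated vaccine (hypothetical catalog).
    Returns up to max_vaccines distinct vaccine identifiers for combination use,
    skipping antigens that have no validated vaccine yet."""
    chosen = []
    for antigen in overexpressed:
        vaccine = repository.get(antigen)
        if vaccine is not None and vaccine not in chosen:
            chosen.append(vaccine)
        if len(chosen) == max_vaccines:
            break
    return chosen
```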
In some implementations, an expected efficacy can be measured. For example, an expected efficacy of the cancer vaccine can be determined based on APOBEC activity in the tumor tissue. APOBEC activity can be measured by determining an amount of RNA molecules having an APOBEC mutation signature, e.g., as disclosed herein.
B. Microarray
In some embodiments, microarray technology can be used to detect the tumor expression levels for a subject for determining which vaccine(s) to use. The microarray would include probes (e.g., nucleic acids) that bind to the cancer antigens/RNA in the candidate library. A biopsy from the patient can be used to prepare the sample for use with the microarray. The expression levels for the proteins can be compared to the reference levels to determine which proteins are most highly overexpressed, e.g., as described for
At block 610, samples are obtained from the disease population. The disease population can be a subpopulation having particular characteristics, e.g., cancer of a same tissue type, of a same age and other demographic information, similar HLA type, and other characteristics described herein.
At block 620, RNA sequencing data is generated from the samples. One will appreciate the various RNA sequencing techniques that can be used. As an alternative, direct protein measurements can be performed, e.g., mass spectrometry.
At block 630, a computational framework as described herein is applied to detect TE proteins (e.g., L1HS) that are overexpressed. Such a computational framework can determine an expression of TE proteins on a surface of the cancer cells and compare the measured expression to a baseline expression expected in healthy cells.
At block 640, patients that coexpress these L1HS candidate cancer antigens are identified. In this manner, the candidate cancer antigens that occur often in the population can be identified. Since these candidate cancer antigens occur in a significant portion of the selected population (e.g., as determined by a threshold, such as 5%, 10%, or 20%), a new patient from that population has a meaningful chance of overexpressing the same cancer antigen. Patients of a given cancer subtype can share the same overexpressed cancer antigen.
At block 650, candidate cancer antigens are validated clinically. This step can take the vaccines that are shared within a group of patients and develop them into therapies. Testing can be performed for safety and efficacy in model organisms and human subjects.
The APOBEC signature probes 660 can be used to determine whether the subject would be responsive to certain vaccines, e.g., as a high level of APOBEC activity can be used to confirm that TE overexpression is present. Probes 660 can be used as an orthogonal signal to help guide the identification of the appropriate treatment for the patient.
MHC presentation pathway probes 670 can quantify the expression of MHC molecules so as to determine which MHC haplotype is present. Further, downregulation of MHC-associated genes can be correlated with progressive disease. Patients who have downregulation of MHC tend not to respond to checkpoint blockade. If a patient has downregulation of MHC, an additional immune therapy that increases MHC expression can be used. One option is to increase the expression using a cytokine such as interferon gamma (Garrido, F. et al., The urgent need to recover MHC class I in cancers for effective immunotherapy, Curr Opin Immunol. 2016 April; 39:44-51).
In some embodiments, a microarray can comprise a first array of nucleic acid probes (e.g., 680) that hybridize to cDNA from transposable elements in a human genome. The first array of nucleic acid probes can include one or more sequences from table 2 of the Appendix, which provides RNA sequence probes from various L1HS loci. Each row of table 2 provides a sequence, along with Class, Chromosome, Start Index, Stop Index, Strand, ORF, and Peptide Start Index. The class is L1HS for each of the sequences in table 2, but other classes of transposable elements can be used. The start and stop indices refer to where the TE starts and stops in the genome. The strand refers to which strand is the sense strand for the TE, i.e., +/−. The ORF is the open reading frame within this locus. The peptide start index refers to where in this ORF the peptide in question starts. The first array can include at least any 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 sequences from the list. In some implementations, the first array includes probes that include at least the five sequences:
The microarray can further comprise a second array of nucleic acid probes (e.g., 670) that hybridize to cDNA corresponding to genes involved in processing antigens for presentation on MHC molecules. Defects in this pathway are a common mechanism by which cancer cells evade immune recognition, and these probes can test for such defects. The second array of nucleic acid probes can include one or more sequences from table 3 of the Appendix, which provides RNA sequence probes for detecting different MHC alleles. Each row of table 3 provides the sequence and the name of a gene in the MHC presentation pathway. These genes were found to be differentially expressed between responders and non-responders to checkpoint blockade therapy. The genes are as follows: ERAP1 (Endoplasmic Reticulum Aminopeptidase 1); ERAP2 (Endoplasmic Reticulum Aminopeptidase 2); TAP1 (Transporter 1, ATP Binding Cassette Subfamily B Member); TAP2 (Transporter 2, ATP Binding Cassette Subfamily B Member); B2M (Beta-2-Microglobulin); HLA-A (Major Histocompatibility Complex, Class I, A); HLA-B (Major Histocompatibility Complex, Class I, B); HLA-C (Major Histocompatibility Complex, Class I, C); HLA-E (Major Histocompatibility Complex, Class I, E); and HLA-F (Major Histocompatibility Complex, Class I, F). The second array can include at least any 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 sequences from the list. In some implementations, the second array includes at least one probe for each of the four genes: TAP1, ERAP1, B2M, and HLA-A.
The microarray can further comprise a third array of nucleic acid probes (e.g., 660) that hybridize to cDNA corresponding to RNA transcripts that have been mutated by the APOBEC proteins. APOBEC activity is a marker of transposable element activation and correlates with response to immunotherapy. The third array of nucleic acid probes can include one or more sequences from table 4 of the Appendix, which provides RNA sequence probes for detecting APOBEC mutations. These probes were determined using a synthetic mutation technique, e.g., as described herein. The third array can include at least any 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 sequences from the list. In some implementations, the third array includes probes that include at least the five sequences:
A. Creation of LINE-1 Peptide Kmer Database and APOBEC Signature
In order to quantify the expression of L1HS and APOBEC antigen signatures in healthy and cancer tissue samples, a database of kmers was generated using the Gencode V32 genome and transcriptome reference files and the L1base 2.0 annotation for full length L1HS elements and L1HS elements with intact ORF2 sequences [37,42]. This resulted in the generation of 38 unique ORF1 sequences and 56 unique ORF2 sequences. These ORF sequences were then analyzed for conserved protein domains using the Pfam software [38]. We found that ORF1 contained conserved LINE-1 domains, including the L1 RNA Binding Domain (RBD)-Like domain, the double stranded RBD-like domain, and the L1 trimerization domain (
The red line corresponds to the sequence similarity across an L1HS ORF multiple sequence alignment. The sequence similarity is across other L1HS regions of the genome. In the first open reading frame in
The data in
Filtering for predicted MHC binders generated a preference for 9mer epitopes (
Hotspots within the L1HS ORFs for generating MHCI binding peptides were analyzed as shown in
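The kmer database step, in which L1HS ORF protein sequences are broken into candidate epitopes before MHC binding prediction, can be illustrated by tiling each ORF into overlapping peptides. This sketch assumes peptide lengths of 8 to 11 amino acids (the usual MHC class I range) and stores each shared peptide once with all of its source positions; the ORF identifiers and sequences in any example are hypothetical, and binding prediction itself is a separate step not shown.

```python
"""Tile L1HS ORF protein sequences into candidate MHC class I epitope kmers (sketch)."""


def peptide_kmers(orfs, lengths=range(8, 12)):
    """orfs: mapping ORF identifier -> amino-acid sequence.
    Returns peptide -> sorted list of (ORF identifier, 0-based start) positions, so that
    a peptide shared by several L1HS loci is stored once with every locus it came from."""
    index = {}
    for orf_id, protein in orfs.items():
        for k in lengths:
            for start in range(len(protein) - k + 1):
                index.setdefault(protein[start:start + k], []).append((orf_id, start))
    return {pep: sorted(pos) for pep, pos in index.items()}
```

Collapsing identical peptides across loci matters here because L1HS copies are highly similar, so many ORFs yield the same epitope.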
B. MHC Kmers Expressed Across Developmental Stages
A strength of this approach is the ability to identify L1HS peptides that are almost never expressed in healthy tissue. Such peptides are challenging to identify because access to healthy tissue is limited, but a database of healthy human tissue was recently published (N=310) [48]. This mammalian expression database is particularly useful because it includes 7 human tissue types sampled across 23 developmental timepoints.
Transposable element expression is expected to be higher during embryonic human developmental stages because regions of the genome that are not usually expressed become activated to support early human development [21]. We identified 1,649 L1HS epitope kmers with a count of at least 2 reads. There were 667 L1HS epitopes that were never detected across all 311 RNA-seq samples. We found 11 L1HS epitopes with decreasing expression across developmental stages and 36 kmers with increasing expression (Kruskal test: adjusted p-value <0.05).
Overall, consistently low expression of L1HS epitope kmers was observed across developmental stages and tissue types. As expected, L1HS epitope sequences were expressed at a constant level in brain tissues across developmental stages [24]. Similarly, constant expression across developmental stages was observed in germline testis tissue, as well as in liver tissue (Kruskal test: p-value >0.05). Extracranial tissues, including heart and kidney, had high levels of expression in the embryo but significantly lower expression in postnatal samples (Kruskal test: p-value <0.05).
Differential expression of several APOBEC genes was observed, with the highest expression at embryonic stages. Interestingly, a spike in L1HS expression and APOBEC expression was observed in samples from school-age children. A similar expression pattern was seen in synthetically mutated APOBEC3C kmers, where embryonic tissue had the highest number of mutated kmers and later stages had lower expression.
C. L1HS Peptides are Presented on Triple Negative Breast Cancer Cells but not Matched Normal Cells
Triple negative breast cancer (TNBC) is an aggressive disease that is resistant to multimodal therapy. Immunotherapy has recently been approved as a first-line treatment for TNBC, but response rates remain low and additional strategies are needed to improve durable response rates [50]. The disclosed analysis of RNA-seq identifies TE T cell epitopes that are likely to be presented on MHC, but there are additional regulatory mechanisms that may prevent some of these peptides from being efficiently processed and presented on the MHC. Recent improvements in the resolution of mass spectrometry equipment have allowed for the identification of short peptides, including MHC-bound peptides [51,52]. Isolation of MHC peptides followed by high-resolution mass spectrometry identifies potential cancer antigens for TNBC.
While it is known that TEs are overexpressed in cancer cells, limited data have been presented to show that TE peptides are presented by cancer cell MHCs. The L1HS epitope database, the Immune Epitope Database (IEDB), and a publicly available immunopeptidome dataset for a cohort of TNBC tumor and matched normal samples were used to investigate whether shared candidate cancer antigens were presented on cancer samples but not on matched normal samples (Table 1). Using the MaxQuant search algorithm for mass spectrum matching, we identified three L1HS peptides presented on 5 different patient tumor samples (Table 1). Two of the peptides were shared across different TNBC samples, suggesting that public antigens are similarly processed and presented across individuals with likely different HLA genotypes. This evidence shows that L1HS peptides are identifiable in patient tumor samples using mass spectrometry analysis and further supports these molecules as viable cancer antigens for combination immunotherapy. Furthermore, no L1HS peptides were detected on matched normal tissue samples that were similarly analyzed by MHC peptidome profiling.
Table 1 shows peptides that map to L1HS open reading frames and that were presented on triple negative breast cancer tumors. Although the distribution of predicted binders showed no preference for any protein domain, all of the peptides in this small set of samples that were presented on the tumor cell surface mapped to a functional domain within the L1HS gene. Thus, the disclosed algorithmic approach itself did not introduce a preference toward a particular protein domain.
L1HS epitope expression was then investigated using the TCGA TNBC cohort (N=190). A total of 1,428 L1HS epitope kmers were found with a count of at least 2 reads. There were 162 L1HS epitope sequences that were never detected in the healthy tissue compendium. The average number of expressed kmers per sample was 72, and the average number of expressed kmers predicted to bind to one of the patient's HLA alleles was 22. The average overlap in kmers across unrelated TNBC tumor samples with nonzero L1HS kmer expression was 6%. The number of expressed HLA-matched L1HS epitope binders was correlated with the TNBC patients' overall survival data, and a 58% decrease in the Cox proportional hazard ratio (95% CI: 0.19-0.97, p<0.05) was observed. Further amplification of the anti-L1HS immune response through the use of TE peptides may promote the antitumor immune response.
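The reported survival effect can be unpacked with the Cox model's definition of the hazard ratio. This worked reading assumes, since the text does not say so explicitly, that the 58% decrease is measured relative to a hazard ratio of 1 (no effect) and that $x$ is the tested covariate, i.e., the expressed HLA-matched L1HS binders.

```latex
\begin{aligned}
h(t \mid x) &= h_0(t)\, e^{\beta x}, \qquad \mathrm{HR} = e^{\beta},\\
\mathrm{HR} &= 1 - 0.58 = 0.42, \qquad \hat{\beta} = \ln(0.42) \approx -0.87,\\
\mathrm{CI}_{95\%} &= [0.19,\ 0.97] \ni 0.42, \qquad 0.97 < 1.
\end{aligned}
```

Because the upper bound of the confidence interval lies below 1, the interval excludes a null effect, which is consistent with the reported p<0.05.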
D. Shared L1HS Epitope Expression Occurs Across TCGA Cancer Types but not Normal Samples
It was then investigated whether the expressed L1HS epitopes were specific to cancer types or whether there were shared epitopes across diseases (
E. L1HS Kmers that Correlate with Checkpoint Blockade Response
Some embodiments can use TE vaccine therapies in combination with checkpoint blockade therapy. To investigate the clinical efficacy of this approach, the number of predicted L1HS epitopes was correlated to the response to checkpoint blockade therapy in a set of 129 melanoma tumor samples. It was found that patients with a complete response to checkpoint blockade therapy had more predicted MHC-bound LINE-1 peptides compared to samples with progressive disease or stable disease (Mann-Whitney U-test p-value <0.05,
Progressive disease (PD) means that checkpoint blockade therapy was given and the tumor kept progressing. Stable disease (SD) means that the tumor stayed the same size, and partial response (PR) means that the tumor decreased in size but did not meet the criterion for complete response.
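The responder-group comparison reported above relies on a Mann-Whitney U test. A minimal standard-library sketch follows, assuming per-sample counts of predicted MHC-bound L1HS peptides as input and a normal approximation without continuity or tie-variance correction; the exact test settings used in the study are not stated, and in practice a library routine such as `scipy.stats.mannwhitneyu` would typically be used instead.

```python
"""Mann-Whitney U test comparing predicted L1HS epitope counts between response groups (sketch)."""
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using average ranks for ties and a normal
    approximation (no continuity or tie-variance correction). Returns (U_x, p_value)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for t in range(i, j + 1):
            ranks[t] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of group x
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p
```

Here `x` could hold the counts for complete responders and `y` the counts for progressive or stable disease samples.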
Checkpoint blockade therapy has generated remarkable responses in a subset of cancer patients, but further research into combination therapies is needed to increase the number of patients who benefit [4,10,53]. Disclosed herein is a computational framework for prioritizing transposable element (TE) epitopes for personalized cancer vaccine therapies. It is hypothesized that combination TE vaccine immunization and checkpoint blockade therapy may tip the balance in favor of immune-mediated destruction of the tumor. A combination of cancer vaccine and checkpoint blockade therapy was recently used to treat glioblastoma, and this study found that these therapies work synergistically [10]. The power of the immune system to destroy cancer at a cellular level, throughout the body, and to maintain a memory against recurrence allows for this therapeutic approach to achieve durable response and potentially cure patients of their cancer.
We identified peptides that are expressed in cancer cells but not healthy cells. We applied our approach to a large cohort of 311 healthy RNA-seq datasets across 23 developmental stages and 7 tissue types. While we detected L1HS expression in these samples, we found that cancer cells express additional L1HS peptides that were never detected in the healthy control cohort. This suggests that it is possible to identify a subset of L1HS peptides that are only expressed in cancer cells, so amplification of an immune response against these peptides may not generate off-target effects that may be toxic to the patient.
Much of the data on TE expression in the literature is based on RNA-seq data, but whether these elements generate peptides that are presented on human cancer cell MHCs has not been sufficiently investigated. Disclosed herein is evidence that L1HS peptides are indeed presented by cancer cells in triple negative breast cancer tumors but not in matched normal tissue samples. This shows that not only are these elements aberrantly expressed in cancer cells, but these TE transcripts are also translated into proteins that are properly processed and presented by MHC molecules. Moreover, we found that expression of predicted MHC-bound TE peptides was associated with a 58% reduction in the Cox proportional hazard ratio for the TCGA TNBC cohort. Thus, patients whose cancers overexpress these peptides generally do better, having lower risk than patients with low expression. This underscores the benefit of these molecules for treating cancer: their expression correlates with better patient outcomes, presumably because they may induce immune responses that limit tumor growth.
Lastly, L1HS epitope expression was correlated with response to checkpoint blockade therapy in melanoma [54,55]. Surprisingly, the expression of L1HS epitopes correlated with the complete response group of melanoma patients. Introduction of checkpoint blockade therapy may have augmented the immune response. Notably, the expression of these peptides was low, but detectable in non-responders or partial responders.
These results provide hope that further expansion of T cells that recognize cancer cells, guided by tumor-specific TE expression analysis, may increase the number of patients who experience durable responses. One of the many strengths of this approach is that these peptides are shared across individuals. We propose a novel therapeutic paradigm for matching tumors to a repository of validated cancer vaccines for efficient distribution and administration of therapy. This includes screening large cancer RNA-seq data sets for the most commonly overexpressed epitopes and prioritizing epitopes that correlate with patient benefit. We then propose synthesizing, quality-controlling, and validating these peptides before mass production and distribution to treat cancer at scale.
Transposable elements make up ~40% of the human genome, encode virus-like proteins, and are strongly repressed in somatic cells. This makes them attractive targets for cancer vaccine development, but the sequence similarity and complexity of the genome make it difficult to identify which peptides to prioritize. Disclosed herein is an exciting new computational framework based on unique expression of MHC-bound peptide kmers. This approach was able to identify expression of L1HS epitopes that correlated with better survival outcomes and complete response to checkpoint blockade therapy.
Logic system 1430 may be, or may include, a computer system, ASIC, microprocessor, etc. It may also include or be coupled with a display (e.g., monitor, LED display, etc.) and a user input device (e.g., mouse, keyboard, buttons, etc.). Logic system 1430 and the other components may be part of a stand-alone or network connected computer system, or they may be directly attached to or incorporated in a device (e.g., a sequencing device) that includes detector 1420 and/or assay device 1410. Logic system 1430 may also include software that executes in a processor 1450. Logic system 1430 may include a computer readable medium storing instructions for controlling measurement system 1400 to perform any of the methods described herein. For example, logic system 1430 can provide commands to a system that includes assay device 1410 such that sequencing or other physical operations are performed. Such physical operations can be performed in a particular order, e.g., with reagents being added and removed in a particular order. Such physical operations may be performed by a robotics system, e.g., including a robotic arm, as may be used to obtain a sample and perform an assay.
System 1400 may also include a treatment device 1460, which can provide a treatment to the subject. Treatment device 1460 can determine a treatment and/or be used to perform a treatment. Examples of such treatment can include surgery, radiation therapy, chemotherapy, immunotherapy, targeted therapy, hormone therapy, and stem cell transplant. Logic system 1430 may be connected to treatment device 1460, e.g., to provide results of a method described herein. The treatment device may receive inputs from other devices, such as an imaging device and user inputs (e.g., to control the treatment, such as controls over a robotic system).
Any of the computer systems mentioned herein may utilize any suitable number of subsystems. Examples of such subsystems are shown in
The subsystems shown in
A computer system can include a plurality of the same components or subsystems, e.g., connected together by external interface 81, by an internal interface, or via removable storage devices that can be connected and removed from one component to another component. In some embodiments, computer systems, subsystems, or apparatuses can communicate over a network. In such instances, one computer can be considered a client and another computer a server, where each can be part of a same computer system. A client and a server can each include multiple systems, subsystems, or components.
Aspects of embodiments can be implemented in the form of control logic using hardware circuitry (e.g. an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner. As used herein, a processor can include a single-core processor, multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked, as well as dedicated hardware. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present disclosure using hardware and a combination of hardware and software.
Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C, C++, C#, Objective-C, Swift, or scripting language such as Perl or Python using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable medium for storage and/or transmission. A suitable non-transitory computer readable medium can include random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a compact disk (CD) or DVD (digital versatile disk) or Blu-ray disk, flash memory, and the like. The computer readable medium may be any combination of such storage or transmission devices. In addition, the order of operations may be re-arranged. A process can be terminated when its operations are completed, but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination may correspond to a return of the function to the calling function or the main function.
Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
Any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps. Thus, embodiments can be directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps. Although presented as numbered steps, steps of methods herein can be performed at a same time or at different times or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, any of the steps of any of the methods can be performed with modules, units, circuits, or other means of a system for performing these steps.
The specific details of particular embodiments may be combined in any suitable manner without departing from the spirit and scope of embodiments of the disclosure. However, other embodiments of the disclosure may be directed to specific embodiments relating to each individual aspect, or specific combinations of these individual aspects.
The above description of example embodiments of the present disclosure has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form described, and many modifications and variations are possible in light of the teaching above.
A recitation of “a”, “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary. The use of “or” is intended to mean an “inclusive or,” and not an “exclusive or” unless specifically indicated to the contrary. Reference to a “first” component does not necessarily require that a second component be provided. Moreover, reference to a “first” or a “second” component does not limit the referenced component to a particular location unless expressly stated. The term “based on” is intended to mean “based at least in part on.”
All patents, patent applications, publications, and descriptions mentioned herein are incorporated by reference in their entirety for all purposes. None is admitted to be prior art.
The present application is the National Stage of International Application No. PCT/US2020/056344, filed Oct. 19, 2020, which claims priority from and is a nonprovisional application of U.S. Provisional Application No. 62/916,816, entitled “System And Method For Discovering, Validating, And Personalizing Transposable Element Cancer Vaccines,” filed Oct. 18, 2019, the entire contents of which are herein incorporated by reference for all purposes. The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Nov. 25, 2020, is named 102913-002210WO1-1192415_SL.txt and is 68,922 bytes in size.
This invention was made with government support under grant no. U54HG007990 awarded by the National Human Genome Research Institute of the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2020/056344 | 10/19/2020 | WO |

Provisional Application No. | Date | Country
---|---|---
62916816 | Oct 2019 | US