The present disclosure is related to a prognostic biomarker for stratifying individuals suffering from cancer to determine which individuals are more likely to have severe forms of the disease. In particular, embodiments relate to a test that analyzes multiple markers that are predictive of the severity of tumors in individuals.
There is a need for new diagnostic and prognostic tests to help medical professionals understand the severity of cancer in individuals. In some cases, there are no effective ways to determine how severe the effects of a particular tumor may be on an individual. In addition, there is a need to develop robust tests for determining how successful a particular treatment has been for an individual.
In some embodiments, a method of predicting the outcome of treating an individual suffering from cancer with an anti-cancer treatment is provided.
In some embodiments, the method of predicting comprises obtaining RNA sequence expression data from a biopsy taken from an individual having cancer, analyzing the RNA sequence expression data to determine if expression of cell-type specific markers in the biopsy is above a threshold value, analyzing the RNA sequence expression data to determine if hERV/retro-transposon gene expression is found within the biopsy, determining the cancer prognosis of the individual based on the threshold value and presence of the hERV/retro-transposon gene expression in the biopsy, and combining the expression profile of cell-type specific markers and expression profile of hERV/retro-transposon transactivation antigens to predict an outcome of an anti-cancer treatment.
In some embodiments of the method of predicting, the cancer comprises a tumor and the biopsy is a tumor biopsy.
In some embodiments of the method of predicting, analyzing the RNA sequence expression data comprises performing a transcriptome sequence analysis of global human endogenous retrovirus (hERV)/retro-transposon transactivation.
In some embodiments of the method of predicting, obtaining RNA sequence expression data comprises isolating total RNA from the cells, and performing next generation sequencing on the RNA sample to obtain the RNA sequence expression data.
In some embodiments of the method of predicting, analyzing the RNA sequence expression data to determine if hERV/retro-transposon gene expression is found comprises measuring expression of the hERV 2650 gene located on chromosome 7.
In some embodiments of the method of predicting, the cancer is selected from the group consisting of colorectal cancer (CRC), breast adenocarcinoma, pancreatic adenocarcinoma, lung carcinoma, prostate cancer, glioblastoma multiforme, hormone refractory prostate cancer, solid tumor malignancies such as colon carcinoma, non-small cell lung cancer (NSCLC), anaplastic astrocytoma, bladder carcinoma, sarcoma, ovarian carcinoma, rectal hemangiopericytoma, pancreatic carcinoma, advanced cancer, cancer of large bowel, stomach, pancreas, ovaries, melanoma, pancreatic cancer, colon cancer, bladder cancer, hematological malignancies, squamous cell carcinomas, breast cancer, glioblastoma, brain neoplasms, pilocytic astrocytoma, diffuse astrocytoma, anaplastic astrocytoma, brain stem gliomas, glioblastomas multiforme, meningioma, ependymomas, oligodendrogliomas, mixed gliomas, pituitary tumors, craniopharyngiomas, germ cell tumors, pineal region tumors, medulloblastomas, and primary CNS lymphomas.
In some embodiments of the method of predicting, the anti-cancer treatment is selected from the group consisting of surgery, radiation therapy, chemotherapy, immunotherapy, targeted therapy, hormone therapy, stem cell transplant, cytokine therapy, gene therapy, cell therapy, phototherapy, thermotherapy, and sound therapy.
In some embodiments of the method of predicting, the anti-cancer treatment comprises an anti-cancer chemotherapeutic selected from the group consisting of Cyclophosphamide, methotrexate, 5-fluorouracil, vinorelbine, Doxorubicin, cyclophosphamide, Docetaxel, doxorubicin, cyclophosphamide, Doxorubicin, bleomycin, vinblastine, dacarbazine, Mustine, vincristine, procarbazine, prednisolone, Cyclophosphamide, doxorubicin, vincristine, prednisolone, Bleomycin, etoposide, cisplatin, Epirubicin, cisplatin, 5-fluorouracil, Epirubicin, cisplatin, capecitabine, Methotrexate, vincristine, doxorubicin, cisplatin, Cyclophosphamide, doxorubicin, vincristine, vinorelbine, 5-fluorouracil, folinic acid, and oxaliplatin.
In some embodiments of the method of predicting, the cell-type specific markers are selected from the group consisting of: human endogenous retroviral (HERV) gene expression markers, tumor infiltrating lymphocyte (TIL) markers, microsatellite instability (MSI) status markers, and tumor mutational burden (TMB) markers.
In some embodiments of the method of predicting, the cell-type specific markers comprise markers associated with one or more of CD8+ T, CD4+ T, and CD19+ B cells.
In some embodiments of the method of predicting, the hERV/retro-transposon gene expression level is calculated using a univariate analysis of hERV gene expression.
In some embodiments, a method of obtaining a cellular signature of cells infiltrating a tumor is provided.
In some embodiments, the method of obtaining a cellular signature comprises obtaining a tumor, isolating cells of the tumor, isolating total RNA from the cells, performing RNAseq to obtain RNA sequence expression data, analyzing the RNA sequence expression data using a deconvolution algorithm to obtain an expression profile of cell-type specific markers, and determining a fraction of a cell-type based on the expression profile of cell-type specific markers in the RNA sequence expression data.
In some embodiments, the method of obtaining a cellular signature further comprises comparing the expression profile of cell-type specific markers and/or the expression profile of hERV/retro-transposon transactivation antigens and/or the fraction of one or more immune cell types in the tumor to a predetermined threshold, and administering an immune checkpoint inhibitor therapy to a patient if the tumor obtained from said patient exhibits a fraction above the predetermined threshold.
In some embodiments of the method of obtaining a cellular signature, the cell-type specific markers comprise markers associated with one or more of CD8+ T, CD4+ T, and CD19+ B cells.
In some embodiments of the method of obtaining a cellular signature, the immune checkpoint inhibitor therapy comprises a checkpoint inhibitor selected from the group consisting of Pembrolizumab (Keytruda), Nivolumab (Opdivo), Cemiplimab (Libtayo), Atezolizumab (Tecentriq), Avelumab (Bavencio), Durvalumab (Imfinzi), and Ipilimumab (Yervoy).
In some embodiments, a method of obtaining a composite score of global human endogenous retrovirus (hERV)/retro-transposon transactivation is provided.
In some embodiments, the method of obtaining a composite score comprises obtaining a tumor, isolating cells of the tumor, isolating total RNA from the cells, performing RNAseq to obtain RNA sequence expression data, and analyzing the RNA sequence expression data to obtain an expression profile of hERV/retro-transposon transactivation antigens.
In some embodiments, the method of obtaining a composite score further comprises comparing the expression profile of cell-type specific markers and/or the expression profile of hERV/retro-transposon transactivation antigens and/or the fraction of one or more immune cell types in the tumor to a predetermined threshold, and administering an immune checkpoint inhibitor therapy to a patient if the tumor obtained from said patient exhibits a fraction above the predetermined threshold.
In some embodiments of the method of obtaining a composite score, the immune checkpoint inhibitor therapy comprises a checkpoint inhibitor selected from the group consisting of Pembrolizumab (Keytruda), Nivolumab (Opdivo), Cemiplimab (Libtayo), Atezolizumab (Tecentriq), Avelumab (Bavencio), Durvalumab (Imfinzi), and Ipilimumab (Yervoy).
Embodiments of the present disclosure relate to prognostic systems and methods for predicting the future health of an individual. Embodiments relate to the discovery that the composite score generated from a transcriptome sequence analysis of global human endogenous retrovirus (hERV)/retro-transposon transactivation combined with a cell signature generated using deconvolution of immune cells within a tumor sample could be prognostic for predicting the health of an individual. In addition, the composite score may be useful for predicting the efficacy of chemotherapeutic agents and immune checkpoint inhibitors used on the individual population. In some embodiments, provided herein are survival analyses in individuals receiving chemotherapeutic agents and immune checkpoint inhibitors based on the composite score that is based on the level of hERV viral DNA and immune cell infiltration found in an individual's tumor sample.
There is a need for new diagnostic and prognostic biomarkers for cancers. For example, colorectal cancer (CRC) individuals have poor prognosis and there is a need for new diagnostic and prognostic biomarkers to avoid CRC-related deaths and avoid overtreatment. While the clinicopathological features such as tumor-node-metastasis (TNM) staging status at diagnosis, lymph node (LN) involvement (pN0-pN2), age, sidedness, etc. are well-established biomarkers of poor prognosis, the significance of molecular and cellular markers is well demonstrated in a clinical setting.
Some embodiments herein are related to methods of stratifying individuals to better predict the outcome of a treatment. In some embodiments, the treatment is an anti-cancer treatment. In some embodiments, the anti-cancer treatment is based on anti-cancer chemotherapeutics and/or checkpoint inhibitors. In some embodiments, the anti-cancer treatment is based on checkpoint inhibitors. In some embodiments, the anti-cancer treatment is based on anti-cancer chemotherapeutics or checkpoint inhibitors.
Non-limiting examples of anti-cancer chemotherapeutics include Cyclophosphamide, methotrexate, 5-fluorouracil, vinorelbine, Doxorubicin, cyclophosphamide, Docetaxel, doxorubicin, cyclophosphamide, Doxorubicin, bleomycin, vinblastine, dacarbazine, Mustine, vincristine, procarbazine, prednisolone, Cyclophosphamide, doxorubicin, vincristine, prednisolone, Bleomycin, etoposide, cisplatin, Epirubicin, cisplatin, 5-fluorouracil, Epirubicin, cisplatin, capecitabine, Methotrexate, vincristine, doxorubicin, cisplatin, Cyclophosphamide, doxorubicin, vincristine, vinorelbine, 5-fluorouracil, folinic acid, and oxaliplatin.
Non-limiting examples of checkpoint inhibitors include Pembrolizumab (Keytruda), Nivolumab (Opdivo), Cemiplimab (Libtayo), Atezolizumab (Tecentriq), Avelumab (Bavencio), Durvalumab (Imfinzi), and Ipilimumab (Yervoy).
In some embodiments, the systems and methods provided herein can be used to stratify individuals undergoing other forms of anti-cancer therapies. Non-limiting examples include surgery, radiation therapy, chemotherapy, immunotherapy, targeted therapy, hormone therapy, stem cell transplant, cytokine therapy, gene therapy, cell therapy, phototherapy, thermotherapy, and sound therapy.
In some embodiments, developing the composite score includes a method of obtaining a cellular signature of immune cells infiltrating a tumor. In some embodiments, the method comprises obtaining a tumor from an individual biopsy, isolating cells of the tumor, isolating total RNA from the cells, and performing next generation sequencing on the RNA sample (RNAseq) to obtain RNA sequence expression data for the transcriptome of the tumor cells. This tumor transcriptome is then analyzed using a deconvolution algorithm to obtain an expression profile of immune cell-type specific markers, and then determining a fraction of cells in the tumor sample based on the expression profile of cell-type specific markers in the RNA sequence expression data. A non-limiting example of a deconvolution analysis is provided in Example 1.
There is value in understanding the tumor microenvironment for its impact on tumor progression and immunotherapy efficacy. Computational tools based on gene expression data have shown promise for their ability to deconvolve the tumor microenvironment and report the types of immune cells present in heterogeneous tumor samples. In some embodiments, the method is related to obtaining a signature of cells infiltrating a tumor. In some embodiments, this information is used to determine the level of immune cell infiltration within a tumor. In some embodiments, this information is used to determine the type of cells that have infiltrated a tumor. In some embodiments, this information is used to determine the type of cells and the amount of each type of cell that have infiltrated a tumor.
Non-limiting examples of tumor/cancer include breast adenocarcinoma, pancreatic adenocarcinoma, lung carcinoma, prostate cancer, glioblastoma multiforme, hormone refractory prostate cancer, solid tumor malignancies such as colon carcinoma, non-small cell lung cancer (NSCLC), anaplastic astrocytoma, bladder carcinoma, sarcoma, ovarian carcinoma, rectal hemangiopericytoma, pancreatic carcinoma, advanced cancer, cancer of large bowel, stomach, pancreas, ovaries, melanoma, pancreatic cancer, colon cancer, bladder cancer, hematological malignancies, squamous cell carcinomas, breast cancer, glioblastoma, or any neoplasm associated with brain including, but not limited to, astrocytomas (e.g., pilocytic astrocytoma, diffuse astrocytoma, anaplastic astrocytoma, and brain stem gliomas), glioblastomas (e.g., glioblastomas multiforme), meningioma, other gliomas (e.g., ependymomas, oligodendrogliomas, and mixed gliomas), and other brain tumors (e.g., pituitary tumors, craniopharyngiomas, germ cell tumors, pineal region tumors, medulloblastomas, and primary CNS lymphomas). In some embodiments, the tumor/cancer is related to one or more types of tumor/cancer provided herein.
The rise of immunotherapy in cancer treatment has resulted in increased interest in the immune microenvironment of the tumor. Better understanding the immune microenvironment could elucidate how and when immunotherapy will be effective.
A common approach to recapitulating the immune microenvironment is through cell type deconvolution, which models the complex mixture of cell types in a bulk tumor sample as a linear combination of a set of (characterized) prototypical cell signatures.
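The linear-combination model described above can be illustrated with a minimal sketch. It is not the FRICTION implementation: it swaps in plain non-negative least squares solved by projected gradient descent, ignores tissue background, and uses invented signature values. The function name `deconvolve` and all numbers are assumptions for illustration only.

```python
def deconvolve(signatures, mixture, iters=2000):
    """Estimate non-negative fractions f so that sum_k f[k] * signatures[k] ~ mixture.

    signatures: one list of per-gene expression values per cell type (hypothetical).
    mixture: per-gene expression values of the bulk sample.
    Solver: projected gradient descent on the squared error (an assumed stand-in,
    not the support vector regression used by FRICTION).
    """
    n_types = len(signatures)
    n_genes = len(mixture)
    # The squared Frobenius norm bounds the largest eigenvalue of B^T B,
    # so its reciprocal is a safe step size.
    step = 1.0 / sum(v * v for sig in signatures for v in sig)
    fractions = [0.0] * n_types
    for _ in range(iters):
        residual = [
            sum(fractions[k] * signatures[k][g] for k in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        gradient = [
            sum(residual[g] * signatures[k][g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        fractions = [max(0.0, f - step * d) for f, d in zip(fractions, gradient)]
    return fractions


# Invented signatures over four genes for three cell types.
signatures = [
    [10.0, 0.0, 0.0, 1.0],   # stands in for CD8+ T cells
    [0.0, 8.0, 0.0, 1.0],    # stands in for CD4+ T cells
    [0.0, 0.0, 12.0, 1.0],   # stands in for CD19+ B cells
]
true_fractions = [0.10, 0.05, 0.02]
mixture = [
    sum(f * sig[g] for f, sig in zip(true_fractions, signatures)) for g in range(4)
]
estimated = deconvolve(signatures, mixture)
```

Because the synthetic mixture is an exact linear combination of the signatures, the estimate should recover the fractions used to build it; real bulk tumor data would also carry background and noise that this sketch does not model.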
Without being limited by any particular theory, the high association between CD8+ T cells, Treg and PD1 indicates exhaustion of CD8+ T cells in most individuals. In some embodiments, tumor purity and immune infiltration are anti-correlated.
In some embodiments, a process termed “FRICTION”, for cell type deconvolution is provided (Example 1;
In some embodiments, FRICTION was trained to detect three cell types: CD8+ T, CD4+ T and CD19+ B cells. In some embodiments, the technique was validated using spike-in cell titrations, immunohistochemistry (IHC) staining of formalin-fixed, paraffin-embedded (FFPE) tumor samples and flow cytometry. The titration experiments (e.g.,
In some embodiments, the FRICTION process provides a novel approach to cell type deconvolution, focusing on robust normalization and background correction to produce estimates of the absolute concentration of immune cell types. In some embodiments, FRICTION has been developed to be robust to many different tissue backgrounds, and produces an estimate of the fraction of each of its signature immune cell types. Tumors with immune cell infiltration can be candidates for anti-cancer chemotherapy, for example, using checkpoint inhibitors. Without being limited by any particular theory, it is believed that checkpoint inhibitors stimulate the immune system to generate an immune response against tumor antigens. In some embodiments, increased levels of immune cell infiltration are associated with increased patient response to checkpoint inhibitor therapy, and thus can be used as a biomarker to identify patients that are candidates for checkpoint inhibitor therapy. In some embodiments, immune cell infiltration above a threshold level is associated with increased responsiveness to checkpoint inhibitor therapy.
Though the current version of FRICTION is trained using signatures from three cell types (CD8+ T cells, CD4+ T cells and CD19+ B cells) the procedure is general such that, with gene expression data from additional cell types, further signatures could be added. Thus, in some embodiments, cell type deconvolution from RNA-seq data is possible. Non-limiting examples of other cellular signatures include B Cells, Dendritic Cells, Granulocytes, Innate Lymphoid Cells, Megakaryocytes, Monocytes/Macrophages, Myeloid-derived Suppressor Cells, Natural Killer Cells, Platelets, Red Blood Cells, T Cells, and Thymocytes. Other ongoing work involves further validation of the algorithm with additional flow cytometry experiments. Further algorithmic improvements to address correlated cell types and data normalization will continue to increase performance.
In some embodiments, the deconvolution analysis can be applied to other types of sequence data, for example, ATAC-seq data, which is generated by cutting accessible DNA so that reads cluster around open chromatin. In some embodiments, ATAC-seq is quick and easy and may even be a more direct measure of cell type than RNA.
Human Endogenous Retrovirus (hERV) Analysis
In some embodiments, the composite score is generated by also including a method of obtaining a score of global human endogenous retrovirus (hERV)/retro-transposon transactivation. In some embodiments, the method is related to obtaining a score of transactivation of all hERV sequences in the genome. In some embodiments, the method comprises obtaining a tumor, isolating cells of the tumor, isolating total RNA from the cells, performing RNAseq to obtain RNA sequence expression data, and analyzing the RNA sequence expression data to obtain an expression profile of hERV/retro-transposon transactivation antigens.
Viral sequences such as endogenous retrovirus (hERV) and/or retro-transposons are embedded in a genome. Normally, these viral sequences are silenced by methylation. However, in some tumors these silenced viral sequences are reactivated. Tumors with activated viral sequences can be candidates for anti-cancer chemotherapy, for example, using checkpoint inhibitors. Without being limited by any particular theory, it is believed that checkpoint inhibitors stimulate the immune system to generate an immune response against these viral sequences.
In some embodiments, a method of predicting how well an anti-cancer treatment may work on an individual is provided. In some embodiments, the method comprises obtaining RNA sequence expression data from a tumor biopsy taken from an individual having cancer, analyzing the RNA sequence expression data to determine if expression of immune cell markers in the tumor biopsy is above a threshold value, analyzing the RNA sequence expression data to determine if hERV/retro-transposon genes are found within the tumor biopsy, and determining the cancer prognosis of the individual based on the threshold value and presence of the hERV/retro-transposon in the tumor biopsy.
As used herein, a “threshold” value is based on a percentile score. The threshold can vary based on an embodiment of the method or the parameter that is analyzed (e.g. hERV versus an immune cell). The threshold can also vary depending on the number of additional parameters included in an embodiment of a method.
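A percentile-based threshold of this kind can be sketched as follows. The nearest-rank percentile definition, the function names `percentile_threshold` and `stratify`, and the sample scores are assumptions for illustration; the disclosure does not specify how the percentile is computed.

```python
import math


def percentile_threshold(values, percentile):
    """Return the nearest-rank percentile of values (assumed definition)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(percentile / 100 * len(ordered))
    return ordered[rank - 1]


def stratify(scores, percentile):
    """Map each sample id to True when its score is above the cohort threshold."""
    threshold = percentile_threshold(list(scores.values()), percentile)
    return {sample_id: score > threshold for sample_id, score in scores.items()}


# Hypothetical per-sample scores (e.g. a hERV score or an immune cell fraction).
scores = {"s1": 1.0, "s2": 4.0, "s3": 2.0, "s4": 8.0}
above_median = stratify(scores, 50)
```

Using a different percentile per parameter, as the text allows, amounts to calling `stratify` once per parameter with its own `percentile` argument.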
The prognosis can be quantified using “hazard ratio” (HR), which is a probability of a “hazard” to a population (e.g., disease, debilitation, death, unresponsiveness to a treatment, etc.) determined as a statistics-based correlation between two or more parameters (e.g., the type of hERV and one or more additional parameters as provided herein). For example, in the context of prognosis of responsiveness to an anti-cancer/tumor treatment, a lower hazard ratio would indicate a positive prognosis of response to treatment, and a higher hazard ratio would indicate a negative prognosis of response to treatment.
In some embodiments, the stratification of individuals based on a univariate analysis of hERVs only does not depend on the cancer type. In some embodiments, HR based on median hERV was universally applicable.
Without being limited by any particular theory, not all hERV have the same prognostic power.
Composite Score Based on Immune Cell Deconvolution and hERV Analyses
Some embodiments herein relate to human endogenous retroviral gene expression and immune cell infiltration as prognosis biomarkers in stage II/III colorectal cancer.
In some embodiments, tumor infiltrating lymphocytes (TILs) are closely related to hERV expression demonstrating immunogenicity of hERVs. Correlation with CD8 T cells is indicative of HERVs being immunogenic and very potent antigens.
Some embodiments are related to combining the expression profile of cell-type specific markers and expression profile of hERV/retro-transposon transactivation antigens to develop a composite score that may predict an outcome of anti-cancer treatment and overall survival or relapse free survival.
In some embodiments, the combined analysis serves as a prognostic indicator and enables segregation of the population based on overall survival (
As shown in
In some embodiments, one or more additional favorable and unfavorable traits/parameters including age of the individual, gender, stage of tumor, type of cancer, infection history of individual, cancer treatment regimens, sidedness, etc. can be included in the analysis to obtain a clinicopathological status and determine its correlation with overall survival and relapse free survival (
In some embodiments, combining clinicopathological negative status with the CD8−/hERV+ status (WTS− status) can deconvolve the poor clinicopathological group into two significantly distinct subgroups with different prognosis, i.e., clinicopathological negative/WTS− group and clinicopathological negative/WTS+ group (
Some embodiments relate to a method for accurate deconvolution of immune cells, measurements of HERVs as well as other biomarkers through WES/WTS sequencing and novel bioinformatics algorithms. Combining next-generation sequencing (NGS) based biomarkers with clinicopathological factors provides a better prediction of individual survival compared to clinicopathological biomarkers alone in CRC. Among several predictive biomarkers, CD8−/HERV+ strongly stratified individuals by OS and RFS and revealed a previously unknown subset of CRC individuals with high risk of relapse, metastasis and death.
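The CD8−/HERV+ stratification can be written as a small classification rule. This is a sketch under stated assumptions: the function name `wts_status`, the threshold arguments, and the choice to label every sample that is not CD8−/HERV+ as WTS+ are illustrative and are not taken from the disclosure.

```python
def wts_status(cd8_fraction, herv_score, cd8_threshold, herv_threshold):
    """Label a sample 'WTS-' when it is CD8-low and HERV-high, otherwise 'WTS+'.

    Thresholds are assumed to be derived elsewhere (e.g. cohort percentiles).
    """
    cd8_low = cd8_fraction <= cd8_threshold
    herv_high = herv_score > herv_threshold
    return "WTS-" if (cd8_low and herv_high) else "WTS+"


def combined_group(clinicopathological_negative, status):
    """Combine the clinicopathological call with the WTS status into one label."""
    prefix = "clinicopathological negative" if clinicopathological_negative else "other"
    return f"{prefix}/{status}"


# Hypothetical values for two samples.
status_a = wts_status(cd8_fraction=0.01, herv_score=9.0, cd8_threshold=0.05, herv_threshold=5.0)
status_b = wts_status(cd8_fraction=0.12, herv_score=9.0, cd8_threshold=0.05, herv_threshold=5.0)
group_a = combined_group(True, status_a)
```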
In some embodiments, the prognosis is better for some cancers versus other cancers. For example, as shown by the data in Table 1 below, the prognosis for right side CRC is better than the prognosis for left side CRC based on association between WES and WTS correlates.
In some embodiments, CRC is right sided. In some embodiments, CRC is left sided. Right sided CRC includes proximal colorectal cancers of the proximal two-thirds of the transverse colon, ascending colon, and cecum. Left sided CRC includes distal colorectal cancers of the distal third of the transverse colon, splenic flexure, descending colon, sigmoid colon, and rectum.
The following examples are non-limiting and other variants within the scope of the art are also contemplated.
Fractional Recovery of Immune Cell Types In Oncology NGS (FRICTION) is a validated, quantitative immune cell type deconvolution tool for performing cell type deconvolution analysis. It uses a basis of labeled cell type signatures to predict the fraction of these signatures within an unknown mixture sample. To do so, it essentially models the mixture sample as a penalized linear combination of the basis signatures. This is done using a simple SVM-based model. The calculation is performed on a subset of the presumption of linear combination.
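For orientation, a standard linear support vector regression objective of the kind such a model could use is written out below. This is the textbook ε-insensitive formulation, offered as an assumption; the disclosure does not state the exact loss, penalty, or constraints used by FRICTION. Here $m_g$ is the expression of gene $g$ in the mixture, $s_g$ is the vector of basis signature values for gene $g$, $f$ is the vector of per-cell-type weights, $b$ is an intercept, and $C$ and $\varepsilon$ are tuning parameters.

```latex
\[
\min_{f,\,b}\;\; \frac{1}{2}\,\lVert f \rVert_2^2
\;+\; C \sum_{g=1}^{G} \max\Bigl(0,\; \bigl\lvert m_g - \bigl(s_g^{\top} f + b\bigr) \bigr\rvert - \varepsilon \Bigr)
\]
```

The first term is the penalty on the weights and the second charges only for residuals larger than $\varepsilon$, which makes the fit less sensitive to small per-gene deviations from the linear-combination assumption.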
A ZIPPY pipeline was used for prepping the RNA-seq data. It performs the following processes: bcl2fastq, then STAR, then RSEM and also generates some statistics. Bcl2fastq is performed to demultiplex next generation sequencing output in a bcl format into appropriate FASTQ files. Then alignment of reads and gene expression quantification are performed using STAR (Dobin et al., Bioinformatics. 2013 Jan; 29(1): 15-21.) and RSEM (Li, B., Dewey, C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011)) third party software packages. FRICTION can take in .genes.results files from RSEM as input, or two-column (feature, value) files.
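As an illustration of the input step, the sketch below reads an RSEM `.genes.results` table into a gene-to-value mapping. The column names follow the standard RSEM gene-level output; the choice of the `TPM` column, the function name `read_genes_results`, and the example rows are assumptions, since the disclosure does not say which column FRICTION consumes.

```python
import csv
import io


def read_genes_results(handle, value_column="TPM"):
    """Parse a tab-separated RSEM genes.results stream into {gene_id: value}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row[value_column]) for row in reader}


# Invented two-gene example in the RSEM gene-level layout.
example = io.StringIO(
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
    "CD8A\tCD8A-201\t2000.00\t1850.00\t120.00\t35.50\t28.10\n"
    "CD19\tCD19-201\t1900.00\t1750.00\t0.00\t0.00\t0.00\n"
)
expression = read_genes_results(example)
```

For a file on disk, the same function can be passed an open file object, for example `open(path, newline="")`.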
Once the pipeline has been run, samples are added to the sample manifest: mixture_files.tsv contains a record of every deconvolution sample that has been run, and is also used by run_deconvolution.py to ingest data. For each sample, the filename provided should be the “.genes.results” file generated from RSEM. The mixture_files.tsv format is mostly straightforward, but it is worth examining the metadata column, which contains information about the samples. Metadata can be Boolean or categorical. One example of metadata may be: “tissue:melanoma;id:ff5;total”. This describes a sample that was from melanoma, was in the 5th batch of fresh frozen samples run, and is total (as opposed to purified cell type) RNA. The run_deconvolution.py interface provides useful tools, for example allowing the testing of all melanoma, all total RNA, all samples from experiment ff5, etc.
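The metadata convention described above (semicolon-separated fields, `key:value` for categorical entries, bare tokens for Boolean flags) can be parsed with a few lines. The helper names `parse_metadata` and `matches` are hypothetical and are not part of run_deconvolution.py.

```python
def parse_metadata(field):
    """Turn 'tissue:melanoma;id:ff5;total' into a dict of categorical and Boolean entries."""
    meta = {}
    for token in field.split(";"):
        token = token.strip()
        if not token:
            continue
        key, sep, value = token.partition(":")
        # A token without ':' is treated as a Boolean flag set to True.
        meta[key] = value if sep else True
    return meta


def matches(meta, **wanted):
    """Return True when every requested key has the requested value."""
    return all(meta.get(key) == value for key, value in wanted.items())


meta = parse_metadata("tissue:melanoma;id:ff5;total")
is_melanoma_total = matches(meta, tissue="melanoma", total=True)
is_batch_ff6 = matches(meta, id="ff6")
```

Filtering a manifest for "all melanoma" or "all samples from experiment ff5" then reduces to keeping the rows whose parsed metadata satisfies `matches`.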
Deconvolution is performed using the script run_deconvolution.py. There are many helper functions within this script to ease the process. The main function within the script is run_id, which will run all the samples associated with that id in the mixture_files.tsv spreadsheet.
Below is an example of running all samples for one experiment using magnetic bead-bound bases:
Of particular note: run_id puts its results in a csv file with name equal to the fname argument.
run_deconvolution.py can also be run from the command line to deconvolve a single sample: python run_deconvolution.py (gene list file) (sample to deconvolve)
One alternative to run_id is to build a separate running function. That function may follow this pattern:
The second argument to run_id is a gene list file. Currently, there are two gene lists that are used. The racle gene list may be found online.
The xgb_mad_v2 gene list was developed by using various heuristics together.
Deconvolution from ATAC-seq Data
Deconvolution of the ATAC-seq data was performed as follows:
Process the ATAC-seq data. Example ATAC ZIPPY json: /home/awise/sngs/workflow/old_run_j_sons/atac9.json
For basis files, we merged them using another ZIPPY script: /home/awise/sngs/workflow/merge_and_macs cd4.json
The files output from MACS contain peaks and must be reformatted for deconvolution. This is done with /home/awise/atac/deconv/feature_select.py
Now we are ready to run deconvolution. However, there are many MACS peaks, and the best of them must be chosen. This is performed with atac_feature_select.py
FRICTION is a new technique for performing immune cell type deconvolution from RNA-Seq data. FRICTION takes RNA-Seq measured from tumor samples, and uses a pre-built set of cell type signatures to predict the immune content of specific immune cell components (see
FRICTION focuses on the selection and normalization of genes in ways that promote the detection of absolute cell fraction (i.e., the percentage of total cells) in contrast to other methods that focus on relative cell fraction (i.e., the percentage of immune cells) or statistical enrichment.
FRICTION works through a two-step process. First, a set of gene signatures are developed. In this experiment gene signatures were created using a set of purified cells, as well as an explicit set of background samples, from a variety of tissue types. Genes were then selected for deconvolution using three criteria:
Genes that scored well on the combination of these three measures were selected for performing deconvolution.
After the signatures are generated, FRICTION can be run from any human RNA-Seq sample. The deconvolution process may be run using a support vector regression based system similar to that described by Newman et al., 2015. In contrast to Newman et al., but similar to Racle et al., 2017, we focused on the deconvolution of absolute cell fraction. This is enabled by our gene selection procedure, as well as our feature normalization that places each of our cell type signatures on the same scale.
FRICTION has been extensively evaluated using titration studies and sequenced tumors with orthogonal validation (IHC and flow cytometry).
Gene selection was performed using a set of magnetic-bead purified immune cell samples from three cell types (CD8+ T, CD4+ T and CD19+ B cells), with 6 CD8+ samples, 5 CD4+ samples and 4 CD19+ samples. Background tissue samples from ten tissue types (including lung, liver, colon, prostate and more) were used as controls. The resultant gene signature demonstrated both good separation between cell types as well as tight clustering of the background tissue types in a low-dimensional PCA representation (
FRICTION was evaluated using a series of titration experiments. In these experiments, known concentrations of CD8+, CD4+ and CD19+ primary cells were titrated into a variety of complex tissue backgrounds (primary individual tumors). Compared to simply looking at the correlation of marker genes and cell fraction, or using a simple hand-curated list of genes, FRICTION performs substantially better in terms of linear correlation, with median R² value >0.97 (Table 2). Further comparison to IHC stained lung and colon samples has demonstrated FRICTION's ability to distinguish high vs. low CD4+ T cell content in primary tumors (data not shown).
Table 2 values as extracted (row and column labels not recoverable): 0.14, 0.72, 0.72, 0.56, 0.00, 0.66, 0.68, 0.26, 0.65, 0.72, 0.45, 0.05, 0.60
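The linear correlation reported for the titration experiments can be computed as the squared Pearson correlation between the known titrated fraction and the estimated fraction. The sketch below shows that calculation; the function name `r_squared` is an assumption, the expected levels mirror the melanocyte titration levels listed in this example, and the observed values are invented to lie exactly on a line, so they are not measured data.

```python
def r_squared(expected, observed):
    """Squared Pearson correlation between expected and observed fractions."""
    if len(expected) != len(observed) or len(expected) < 2:
        raise ValueError("need two equally sized series with at least two points")
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    return (sxy * sxy) / (sxx * syy)


# Titrated percentages and invented estimates following observed = 0.9 * expected + 0.1.
expected_pct = [0.0, 0.6, 3.3, 11.8]
observed_pct = [0.1, 0.64, 3.07, 10.72]
fit_quality = r_squared(expected_pct, observed_pct)
```

A high R² shows only that the estimates scale linearly with the titrated amount; agreement in absolute terms additionally requires a slope near one and an intercept near zero.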
FRICTION has also been evaluated in comparison to 5 primary melanoma tumors quantified using flow cytometry (
Primary immune cells were titrated into melanocyte cell background.
Training Set
1: CD8+ T cells (6 individuals)
2: CD4+ T cells (6 individuals)
3: CD19+ B cells (3 individuals)
Melanocyte Background
Sample #1: Melanocyte only
Sample #2: +0.6% CD4+, CD8+, and CD19+ cells
Sample #3: +3.3% CD4+, CD8+, and CD19+ cells
Sample #4: +11.8% CD4+, CD8+, and CD19+ cells
Titrations: Purified Immune Cells from Blood into Melanocytes
Mixture of immune cells (all same level)
Library Preparation: RNA Access (40 ng input)
Data are shown in
RNA from purified immune cells was titrated into RNA from total tissue (uterine tissue).
Training Set
1: CD8+ T cells (6 individuals)
2: CD4+ T cells (6 individuals)
3: CD19+ B cells (3 individuals)
Titrations: RNA of Purified Immune Cells from Blood Into RNA
RNA from purified immune cells from blood
Library Preparation: RNA Access (40 ng Input)
Data are shown in
Table values as extracted (row and column labels not recoverable): 0.01, 0.73, 0.14, 0.72, 0.72, 0.45, 0.32, 0.56, 0.00, 0.66, 0.63, 0.68, 0.26, 0.65, 0.72, 0.70, 0.45, 0.05, 0.66, 0.65, 0.53, 0.60
DNA and RNA were extracted from fresh-frozen tumor and matched normal tissues of 114 individuals with stage II/III CRC with a 1:1 MSI-H/MSS ratio (measured by MSI-PCR), together with the clinical data including overall survival (OS), relapse free survival (RFS), sex, age, stage, sidedness, adjuvant treatment, and metastatic status. Whole Exome Sequencing (WES) and Whole Transcriptome Sequencing (WTS) libraries were generated using the Illumina Nextera™ Flex for Enrichment and TruSeq™ Stranded Total RNA library prep methods, respectively, and sequenced on a NovaSeq™ 6000 system.
Using an internally developed bioinformatics pipeline, various biomarkers, such as human endogenous retroviral (HERV) gene expression, tumor infiltrating lymphocytes (TILs), microsatellite instability (MSI) status, tumor mutational burden (TMB), and immune-related gene expression, were analyzed, and the clinical significance of these signatures was evaluated.
Among clinicopathological factors, age, treatment, stage, and metastasis status were strong predictors of outcome. Among the WES- and WTS-derived biomarkers, MSI status together with HERV expression and CD8+ and CD19+ infiltration (as determined by a novel immune cell deconvolution-based method) were strong predictors. Interestingly, HERV expression and CD8+ cells had a synergistic impact on survival: the median OS of the CD8−/HERV+ subgroup was 29.8 compared to 37.5 for the other subgroups (HR=4.4, log-rank P<0.001). Moreover, the CD8−/HERV+ biomarker identified a more aggressive type of CRC that clinicopathological factors alone failed to uncover. Finally, a high correlation between the majority of detected HERV transcripts and TILs was observed, demonstrating the immunogenicity of these novel targets and suggesting HERV expression as a potential biomarker of response to immune-checkpoint inhibitors in CRC as well as other tumor types.
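The combined CD8/HERV stratification can be illustrated with a short sketch. The source does not state how the CD8 and HERV cutoffs were chosen, so the cohort-median defaults below are an assumption, and the function name `assign_subgroups` and the input values are hypothetical.

```python
from statistics import median

def assign_subgroups(cd8_fraction, herv_score, cd8_cutoff=None, herv_cutoff=None):
    """Label each individual as 'CD8-/HERV+' or 'other'.

    Cutoffs default to the cohort median of each marker. This default is an
    assumption for illustration, not a threshold taken from the disclosure.
    """
    if len(cd8_fraction) != len(herv_score):
        raise ValueError("inputs must have one value per individual")
    if cd8_cutoff is None:
        cd8_cutoff = median(cd8_fraction)
    if herv_cutoff is None:
        herv_cutoff = median(herv_score)
    labels = []
    for cd8, herv in zip(cd8_fraction, herv_score):
        # Low CD8+ infiltration combined with high HERV expression.
        is_risk = cd8 < cd8_cutoff and herv > herv_cutoff
        labels.append("CD8-/HERV+" if is_risk else "other")
    return labels

# Hypothetical per-individual values (illustrative only).
cd8 = [0.01, 0.10, 0.02, 0.12]
herv = [5.0, 1.0, 0.5, 4.0]
print(assign_subgroups(cd8, herv))
```

The resulting labels could then be passed to a survival analysis, such as a log-rank test, to compare OS between the two subgroups.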
Provided herein is a HERV quantification process. A list of approximately 3000 genomic sequences belonging to human endogenous and exogenous retroviral genes was compiled. WTS-obtained reads were aligned using a custom index file based on this list appended to the hg19 human genome reference build. The third-party software STAR and SALMON (Patro, et al., Nat Methods. 2017 Apr; 14(4): 417-419) was used for alignment and transcript quantification with an optimized set of options. After quantification of these genes, library normalization was performed, and a median HERV value was calculated for each sample as the median normalized expression of all viral-related genes in that sample.
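The final step of this process, a per-sample median over normalized viral gene expression, can be sketched as follows. The source says only that library normalization was performed, so the counts-per-million scaling used here is an assumption; the function name `median_herv_expression` and the toy counts are hypothetical.

```python
import numpy as np

def median_herv_expression(counts, is_herv, scale=1e6):
    """Return one median HERV value per sample.

    counts:  (genes x samples) array of quantified read counts.
    is_herv: boolean mask over genes marking the viral-related genes.
    scale:   counts-per-million scaling, assumed here as the normalization.
    """
    counts = np.asarray(counts, dtype=float)
    is_herv = np.asarray(is_herv, dtype=bool)
    if counts.ndim != 2 or is_herv.shape != (counts.shape[0],):
        raise ValueError("counts must be genes x samples with one mask entry per gene")
    library_size = counts.sum(axis=0)
    if np.any(library_size <= 0):
        raise ValueError("every sample needs a positive total count")
    # Library normalization: scale each sample (column) by its total count.
    normalized = counts / library_size * scale
    # Median over the viral-related genes only, computed per sample.
    return np.median(normalized[is_herv, :], axis=0)

# Toy matrix: 4 genes x 2 samples; the last gene is not viral-related.
toy_counts = [[100, 200], [300, 200], [500, 400], [100, 200]]
toy_mask = [True, True, True, False]
print(median_herv_expression(toy_counts, toy_mask))
```

Using the median rather than the mean keeps the summary robust to a few very highly expressed viral transcripts.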
In some embodiments, the disclosed methods for determining a composite score are implemented in application-specific hardware designed or programmed to compute the disclosed methods with higher efficiency than a general-purpose computer processor. For example, the process may be run using a general-purpose computer, or alternatively run using a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
In some embodiments, one or more Application-Specific Integrated Circuits (ASICs) can be programmed to perform the functions of one or more of the respective methods described herein. ASICs include integrated circuits that include one or more programmable logic circuits that are similar to the FPGAs described herein in that the digital logic gates of the ASIC are programmable using a hardware description language such as VHDL. However, ASICs differ from FPGAs in that ASICs are programmable only once and cannot be dynamically reconfigured once programmed. Furthermore, aspects of the present disclosure are not limited to determining a composite score using FPGAs or ASICs. Instead, the main processing unit of any system performing the method may be implemented using one or more central processing units (CPUs), graphics processing units (GPUs), or any combination thereof.
In some implementations, the use of integrated circuits such as an FPGA, ASIC, CPU, GPU, or combination thereof, can include a single FPGA, a single ASIC, a single CPU, a single GPU, or any combination thereof. Alternatively, or in addition, the use of integrated circuits such as an FPGA, ASIC, CPU, GPU, or combination thereof, can include multiple FPGAs, multiple ASICs, multiple CPUs, or multiple GPUs, or any combination thereof. The use of additional integrated circuits such as multiple FPGAs can reduce the amount of time it takes to perform additional analysis operations.
With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations). 
Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”
In addition, where features or aspects of the disclosure are described in terms of Markush groups, those skilled in the art will recognize that the disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.
As will be understood by one skilled in the art, for any and all purposes, such as in terms of providing a written description, all ranges disclosed herein also encompass any and all possible sub-ranges and combinations of sub-ranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art, all language such as “up to,” “at least,” “greater than,” “less than,” and the like includes the number recited and refers to ranges which can be subsequently broken down into sub-ranges as discussed above. Finally, as will be understood by one skilled in the art, a range includes each individual member. Thus, for example, a group having 1-3 articles refers to groups having 1, 2, or 3 articles. Similarly, a group having 1-5 articles refers to groups having 1, 2, 3, 4, or 5 articles, and so forth.
While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
This application claims the benefit of U.S. Provisional Application 62/977010 filed on Feb. 14, 2020, which is hereby incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
62977010 | Feb 2020 | US |