This disclosure generally relates to immunology.
The Major Histocompatibility Complex (MHC) exposes protein content on the cell surface to allow detection of antigens by the immune system. This applies to non-self-antigens such as viral proteins as well as self-antigens such as tumor proteins.
Tumor cells harbor oncogenic alterations that can be presented to the immune system by the MHC, which normally causes immune recognition and elimination (sometimes referred to as “immune surveillance”). However, in order to grow, invade, and spread, tumors must evade immune surveillance. Common mechanisms of immune evasion include a) loss of the MHC molecules or b) the upregulation of immune checkpoint molecules on cell surfaces that normally regulate the amplitude and duration of a T cell response. Antibodies that block immune checkpoint molecules, known as immune checkpoint inhibitors (ICPi), can invigorate inactive and/or exhausted T cells, producing anti-tumor effects that confer long-term survival benefits in certain types of cancer. However, ICPi are effective in only 10-40% of patients for reasons that remain unclear. Meta-analyses of clinical trials in melanoma patients treated with ICPi suggest that young and female patients are characterized by low response rates. The reason(s) for the poor response of these two populations remains elusive, and developing a predictive assay would be beneficial.
Individual MHC genotype constrains the mutational landscape during tumorigenesis. Immune checkpoint inhibition reactivates immunity against tumors that escaped immune surveillance in approximately 30% of cases. Recent studies, however, demonstrated poorer response rates in female and younger melanoma patients. Although immune responses differ with sex and age, the role of MHC-based immune selection in this context is unknown. As described herein, female tumors accumulated more poorly presented driver mutations despite no sex-based differences in MHC genotype. Younger patients showed stronger effects of MHC-based driver mutation selection, with younger females showing compounded effects and nearly twice as much MHC-II based selection. This disclosure presents the first evidence that strength of immune selection during tumor development varies with sex and age, and may influence responsiveness to immune checkpoint inhibition therapy.
In one aspect, a computer-implemented method for determining whether a subject is at risk of having or developing a cancer is provided. Such a method typically includes a) genotyping the subject's major histocompatibility complex class II (MHC-II); and b) scoring the ability of the subject's MHC-II to present a mutant cancer-associated peptide based upon a library of known cancer-associated peptide sequences derived from subjects, wherein the produced score is the MHC-II presentation score. Generally, i) if the subject is a poor MHC-II presenter of specific mutant cancer-associated peptides, the subject has an increased likelihood of having or developing the cancer with which the specific mutant cancer-associated peptides are associated; or ii) if the subject is a good MHC-II presenter of specific mutant cancer-associated peptides, the subject has a decreased likelihood of having or developing the cancer with which the specific mutant cancer-associated peptides are associated.
Such a method can further include c) determining whether a biopsy sample obtained from the subject comprises DNA encoding a mutant cancer-associated peptide based upon a library of cancer-associated mutations obtained from subjects.
In some embodiments, the biopsy sample is a liquid biopsy sample. In some embodiments, the biopsy sample is a solid biopsy sample. Representative liquid biopsy samples include, without limitation, blood, saliva, urine, or other body fluid.
In some embodiments, the library of cancer-associated mutations is obtained by whole genome sequencing of subjects.
In some embodiments, the step of scoring the ability of the subject's MHC-II to present a mutant cancer-associated peptide comprises using a predicted MHC-II affinity for a given mutation xij, where x is the MHC-II affinity of subject i for mutation j to fit a mixed-effects logistic regression model that follows a model equation obtained from a large dataset of subjects from which MHC-II genotypes and presence of peptides of interest can be obtained:
logit(P(yij=1|xij))=ηj+γ log(xij)
wherein: yij is a binary mutation matrix yij ∈{0,1} indicating whether a subject i has a mutation j; xij is a matrix of predicted MHC-II binding affinities of subject i for mutation j; γ measures the effect of the log-affinities on the mutation probability; and ηj˜N(0, ϕη) are random effects capturing residue-specific effects, wherein the model tests the null hypothesis that γ=0 and calculates odds ratios for MHC-II affinity of a mutation and presence of a cancer.
In some embodiments, the predicted MHC-II affinity for a given mutation xij is a Subject Harmonic-mean Best Rank (PHBR) score. In some embodiments, the PHBR score is obtained by aggregating MHC-II binding affinities of a set of mutant cancer-associated peptides by referring to a pre-determined dataset of peptides binding to MHC-II molecules encoded by at least 12 different HLA alleles.
In some embodiments, the mutant cancer-associated peptide contains an amino acid substitution, and wherein the set of peptides consists of at least 15 of all possible 15-amino acid long peptides incorporating the substitution at every position along the peptide. In some embodiments, the mutant cancer-associated peptide contains an amino acid insertion or deletion, and wherein the set of peptides consists of at least 15 of all possible 15-amino acid long peptides incorporating the insertion or deletion at every position along the peptide. In some embodiments, the set of mutant cancer-associated peptides comprises any one or more of the mutations shown in Appendix A, wherein the presence of any one of these mutations indicates the presence of or increased risk of developing cancer.
Representative cancers include, without limitation, bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), brain lower grade glioma (LGG), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), ovarian serous cystadenocarcinoma (OV), pancreatic adenocarcinoma (PAAD), prostate adenocarcinoma (PRAD), rectum adenocarcinoma (READ), skin cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), thyroid carcinoma (THCA), uterine corpus endometrial carcinoma (UCEC), or uterine carcinosarcoma (UCS).
In another aspect, a computing system for determining whether a subject is at risk of having or developing a cancer is provided. Such a system typically includes a) a communication system for using a library of cancer-associated peptides derived from subjects; and b) a processor for scoring the ability of the subject's major histocompatibility complex class II (MHC-II) to present a mutant cancer-associated peptide based upon a library of cancer-associated peptides derived from subjects, wherein the produced score is the MHC-II presentation score.
In some embodiments, the step of scoring the ability of the subject's MHC-II to present a mutant cancer-associated peptide comprises using a predicted MHC-II affinity for a given mutation xij, where x is the MHC-II affinity of subject i for mutation j to fit a mixed-effects logistic regression model that follows a model equation obtained from a large dataset of subjects from which MHC-II genotypes and presence of peptides of interest can be obtained:
logit(P(yij=1|xij))=ηj+γ log(xij)
wherein: yij is a binary mutation matrix yij ∈{0,1} indicating whether a subject i has a mutation j; xij is a matrix of predicted MHC-II binding affinities of subject i for mutation j; γ measures the effect of the log-affinities on the mutation probability; and ηj˜N(0, ϕη) are random effects capturing residue-specific effects, wherein the model tests the null hypothesis that γ=0 and calculates odds ratios for MHC-II affinity of a mutation and presence of a cancer.
In some embodiments, the predicted MHC-II affinity for a given mutation xij is a Subject Harmonic-mean Best Rank (PHBR)-II score. In some embodiments, the PHBR-II score is obtained by aggregating MHC-II binding affinities of a set of mutant cancer-associated peptides by referring to a pre-determined dataset of peptides binding to MHC-II molecules encoded by at least 12 different HLA alleles.
In some embodiments, the mutant cancer-associated peptide contains an amino acid substitution, and wherein the set of peptides consists of at least 15 of all possible 15-amino acid long peptides incorporating the substitution at every position along the peptide. In some embodiments, the mutant cancer-associated peptide contains an amino acid insertion or deletion, and wherein the set of peptides consists of at least 15 of all possible 15-amino acid long peptides incorporating the insertion or deletion at every position along the peptide.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the methods and compositions of matter belong. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the methods and compositions of matter, suitable methods and materials are described below. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety.
Part B—Strength of Immune Selection in Tumors Varies with Sex and Age
MHC-II molecules typically present 12-16 amino acid peptides to CD4+ T cells. CD4+ T cells play a more complex role than CD8+ T cells. While possessing cytotoxic effector properties similar to CD8+ T cells, CD4+ T cells also exert a wide range of regulatory functions that distinguish them from CD8+ T cells. Classically, CD4+ T cells provide functional help to B cells, CD8+ T cells, and CD4+ T cells in the form of cooperation involving cognate interaction with an antigen presenting cell (B cell or dendritic cell). The role of CD4+ T cells in tumor immunity and protection has been demonstrated in the mouse, and patients responding to immunotherapy show a strong proliferative CD4+ T cell response to tumor-associated antigens. In addition, adoptive CD4+ T cell therapy has been associated with durable clinical responses in melanoma and cholangiocarcinoma patients.
Early detection, diagnosis, and treatment of tumors is a major determinant of patient morbidity and mortality. Accurate predictions of when, where, and how tumors are likely to arise would have enormous implications for cancer screening and could improve survival rates. While the main contributor to the development of most adulthood tumors is sporadic somatic mutation, germline variants have been implicated as a determinant of tumor characteristics. Here, we propose that the MHC-II genotype is an additional such germline influence.
This disclosure describes the essential role of MHC-II molecules in antigen presentation and in immune detection of mature tumors through neoantigen recognition. MHC-II, like MHC-I, is highly variable among humans, with 4,802 documented alleles. However, the antigen affinity of each MHC-II molecule is influenced by two genes, producing a combinatorial effect that leads to higher variation than MHC-I. In addition, the average MHC binding affinity for MHC-II-restricted peptides required to activate CD4+ T cells is less stringent than that for MHC-I restricted peptides, the MHC-II peptide binding groove structure allows more promiscuous binding of peptides, and CD4+ T cell responses can extend to encompass additional antigens after initial activation (epitope spreading). As described herein, however, we surprisingly found that MHC-II genotype has an even stronger influence over mutation probability than does the MHC-I genotype.
MHC-II appears to exert a stronger selective pressure than MHC-I, leading to a stronger effect by MHC-II on somatic mutation probability. This role aligns with the understanding of CD4+ T cells as a necessary component of the activation and regulation of CD8+ T cells. While the diversity of an individual's MHC-I may play a role in tumor susceptibility, MHC-I appears to have weaker effects on mutation selection.
Notably, as described herein, MHC-II had stronger effects than MHC-I in shaping the driver mutations of a tumor. Interestingly, these effects appear to be less patient-specific than MHC-I, perhaps due to the promiscuous nature of MHC-II peptide binding. Furthermore, these effects could be driven by a faster evasion of MHC-I presentation than MHC-II presentation due to mechanisms like HLA mutation or HLA loss of heterozygosity that would occur within the tumor but are unlikely to affect the MHC-II on professional APCs. Another possibility is that MHC-II presentation and CD4+ T cell recognition may be a necessary prerequisite to CD8+ T cell cytotoxicity and tumor elimination, in agreement with the regulatory role of CD4+ T cells. We reason that the stronger effect of MHC-II on the odds of acquiring a mutation is consistent with a dual regulatory and effector CD4+ role. If the role of CD4+ T cells was purely regulatory, MHC-I specificity would be expected to drive mutation probability. Therefore, the role of the MHC-II genotype and MHC-II presentation needs to be properly weighted to understand the role of the interplay between mutational burden and tumor evolution. This understanding will be essential in the development of immunotherapies, likely being a critical component of their future success.
This disclosure indicates that the response rate to immune checkpoint inhibitors (ICPi) may depend on the strength of immune selection occurring early in tumorigenesis. Methods to accurately predict the impact of immunoediting on a patient-specific basis may lead to better predictive algorithms for response to therapy. As a corollary, we posit that ICPi treatment is likely to have a reduced effect in younger female patients, since this treatment will attempt to reactivate T cells for immunologically invisible neoantigens. Rather, adoptive T cell therapy against patient-validated neoantigens or therapeutic vaccination against conserved antigens will likely be more beneficial in these patients. Finally, these findings shed new light on the role of immune surveillance in cancer progression.
As described herein, we found that predicted MHC-II presentation of cancer-related somatic mutations shapes tumor development through variation in antigen presentation in a fashion complementary to MHC-I, highlighting the need to consider the independent, yet complementary, roles of CD4+ and CD8+ T cells in the selection and elimination of tumors.
In accordance with the present invention, there may be employed conventional molecular biology, microbiology, biochemical, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. The invention will be further described in the following examples, which do not limit the scope of the methods and compositions of matter described in the claims.
Data were obtained from publicly available sources including The Cancer Genome Atlas (TCGA) Research Network (cancergenome.nih.gov/ on the World Wide Web), The Allele Frequency Net Database (Gonzalez-Galarza et al., 2018, Methods Mol. Biol., 1802:49-62), Ensembl, Exome Variant Server, UniProt (UniProt Consortium, 2015), or cited literature (Ciudad et al., 2017, J. Leukoc. Biol., 101:15-27). TCGA normal exome sequences and TCGA clinical data were also downloaded from the GDC. Furthermore, TCGA somatic mutations were accessed from the NCI Genomic Data Commons (portal.gdc.cancer.gov/ on the World Wide Web). Population level HLA frequencies were obtained from the Allele Frequency Net Database. Common germline variants were downloaded from the Exome Variant Server NHLBI GO Exome Sequencing Project (ESP), Seattle, Wash. Finally, viral and bacterial peptides were obtained from UniProt.
To create a residue-centric presentation score, we evaluated allele-based ranks for peptides containing the residue of interest. Each allele-based rank was predicted using the NetMHCIIPan-3.1 tool, downloaded from the Center for Biological Sequence Analysis (Karosiene et al., 2013, Immunogenetics, 65:711-724). NetMHCIIPan-3.1 takes a peptide and an MHC-II protein (HLA-DRB1, HLA-DPA1/DPB1 or HLA-DQA1/DQB1) and returns binding affinity IC50 scores and corresponding allele-based ranks. Peptides with rank <10 and <2 are considered to be weak and strong binders, respectively. Allele-based ranks were used to represent peptide binding affinity. We previously established the best rank of possible peptides containing the residue as an effective estimator of extracellular presentation (Marty et al., 2017, Cell, 171:1272-83). Here, we evaluated two approaches to selecting the set of peptides containing the residue to consider:
Insertion and deletion mutations were modeled by the resulting peptides that differed from the native sequence and tested with the same peptide-set parameters. These two peptide selection models were compared based on performance in a multi-allelic setting and the all 15-mers model was selected (see below).
We defined a patient presentation score to represent a particular patient's ability to present a residue given their distinct set of 12 HLA-encoded MHC-II molecules (4 combinations of HLA-DPA1/DPB1 and HLA-DQA1/DQB1; 2 alleles of HLA-DRB1 considered twice each (since HLA-DRA1 is invariant) for consistency between resulting molecules). The Patient Harmonic-mean Best Rank (PHBR) score was assigned as the harmonic mean of the best residue presentation scores for each of the 12 MHC-II molecules. A lower patient presentation score indicates that the patient's MHC-II molecules are more likely to present a residue on the cell surface.
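The aggregation described above can be sketched as follows. This is a minimal illustration in Python 3, not the code used for the analyses; the function names `expand_molecules` and `phbr_ii` and all rank values are hypothetical, and the per-molecule best ranks are assumed to have been computed already.

```python
from statistics import harmonic_mean


def expand_molecules(dr_ranks, dp_ranks, dq_ranks):
    """Assemble 12 best-rank values: the 2 HLA-DR molecules counted twice
    each, followed by the 4 HLA-DP and 4 HLA-DQ alpha/beta combinations."""
    if len(dr_ranks) != 2 or len(dp_ranks) != 4 or len(dq_ranks) != 4:
        raise ValueError("expected 2 DR, 4 DP and 4 DQ best ranks")
    return list(dr_ranks) * 2 + list(dp_ranks) + list(dq_ranks)


def phbr_ii(best_ranks):
    """Harmonic mean of the 12 per-molecule best ranks.
    A lower value means the residue is more likely to be presented."""
    if len(best_ranks) != 12:
        raise ValueError("expected 12 best-rank values")
    return harmonic_mean(best_ranks)


# Made-up best ranks for one residue in one patient (illustrative only).
dr = [4.0, 20.0]
dp = [10.0, 10.0, 40.0, 40.0]
dq = [5.0, 8.0, 50.0, 80.0]
score = phbr_ii(expand_molecules(dr, dp, dq))
print(round(score, 2))
```

Because the harmonic mean is dominated by its smallest terms, a single molecule with a very good (low) rank pulls the patient score down strongly, which matches the idea that one well-binding molecule may be enough for presentation.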
In order to test the performance of the different peptide sets that could compose the multi-allelic PHBR score to predict presentation, we used published MS data for 7 cell lines expressing 2-3 HLA-DRB1 alleles typed to the fourth digit (Ciudad et al., 2017, J. Leukoc. Biol., 101:15-27). Ciudad et al. (2017, J. Leukoc. Biol., 101:15-27) catalogs peptides observed in complex with MHC-II (HLA-DR) on the cell surface for 7 different combinations of 2-3 HLA-DRB1 alleles, with 70 to 240 mappable peptides each. These data were combined with a set of random peptides to construct a benchmark for evaluating the performance of scoring schemes for identifying residues presented on the cell surface as follows:
HLA genotyping was performed for genes HLA-DRB1, HLA-DPA1, HLA-DPB1, HLA-DQA1 and HLA-DQB1, which encode three protein determinants of MHC-II peptide binding specificity, HLA-DR, HLA-DP, and HLA-DQ. TCGA samples (see Table S1 in doi.org/10.1016/j.cell.2018.08.048 on the World Wide Web) were typed with HLA-HD (Kawaguchi et al., 2017, Hum. Mutat. 38:788-97), using default parameters. HLA-HD requires germline (whole blood or tissue matched) whole exome sequenced samples. The tool reports 100% 4-digit validation accuracy across 90 low-coverage exomes. Samples with very low coverage on specific genes are left untyped by HLA-HD. Patients were assigned an HLA-DR type if they were successfully typed for HLA-DRB1. Patients were assigned HLA-DP and -DQ types if they had successful typing for HLA-DPA1/HLA-DPB1 and HLA-DQA1/HLA-DQB1, respectively. Samples were validated by xHLA (Xie et al., 2017, PNAS USA, 114:8059-64), run with default parameters, and only patients where all alleles agreed were included in the analysis (
Somatic mutations were considered to be recurrent and oncogenic if they occurred in one of the 100 most highly ranked oncogenes or tumor suppressors described by Davoli et al. (2013, Cell, 155:948-62) and were observed in at least 3 TCGA samples. Among these, we retained only mutations that would result in predictable protein sequence changes that could generate neoantigens, including missense mutations and inframe indels. A total of 1,018 mutations (512 missense mutations from oncogenes, 488 missense mutations from tumor suppressors, 11 indels from oncogenes and 7 indels from tumor suppressors) were obtained (Marty et al., 2017, Cell, 171:1272-83). All mutations observed in TCGA patients that did not fall into the 200 most highly ranked cancer genes were designated passenger-like mutations. Furthermore, we created an additional set of established non-cancer mutations. To do so, we selected a set of genes that were known non-cancer genes and selected mutations in these genes regardless of their recurrence in TCGA (Table 1) (Lawrence et al., 2013, Nature, 499(7457):214-8).
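The recurrence filter described above can be sketched as follows. This is a hedged illustration in Python 3: the tuple layout of `calls`, the variant-class labels, and the function name are assumptions made for the example, and the toy records are not real data.

```python
from collections import defaultdict

# Hypothetical labels for the variant classes retained in the text
# (missense mutations and in-frame indels).
NEOANTIGEN_CLASSES = {"missense", "inframe_indel"}


def recurrent_oncogenic_mutations(calls, cancer_genes, min_samples=3):
    """Return mutations in `cancer_genes` seen in at least `min_samples`
    distinct samples. `calls` holds (sample_id, gene, change, variant_class)."""
    samples_by_mutation = defaultdict(set)
    for sample_id, gene, change, variant_class in calls:
        if gene in cancer_genes and variant_class in NEOANTIGEN_CLASSES:
            # A set ensures each sample is counted once per mutation.
            samples_by_mutation[(gene, change)].add(sample_id)
    return {
        mutation
        for mutation, samples in samples_by_mutation.items()
        if len(samples) >= min_samples
    }


# Toy input for illustration only.
calls = [
    ("S1", "KRAS", "G12D", "missense"),
    ("S2", "KRAS", "G12D", "missense"),
    ("S3", "KRAS", "G12D", "missense"),
    ("S3", "KRAS", "G12D", "missense"),   # duplicate call, same sample
    ("S1", "TP53", "R175H", "missense"),  # only two samples: not recurrent
    ("S2", "TP53", "R175H", "missense"),
    ("S1", "GENE_X", "A10V", "missense"),  # gene not in the cancer-gene list
    ("S2", "GENE_X", "A10V", "missense"),
    ("S3", "GENE_X", "A10V", "missense"),
]
drivers = recurrent_oncogenic_mutations(calls, cancer_genes={"KRAS", "TP53"})
print(drivers)
```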
Peptides from pathogens, common germline human variants and randomly mutated human peptides were assembled for comparison with recurrent oncogenic mutations (Marty et al., 2017, Cell, 171:1272-83). The proteomes of 10 virus species and 10 bacterial species were downloaded from UniProt (UniProt Consortium, 2015). One thousand residues were selected at random from both the viral and the bacterial set. A random set of mutations was generated by sampling 3,000 possible amino acid substitutions across human proteins from Ensembl (release 90; GRCh38) (Aken et al., 2017, Nuc. Acids Res., 45(D1):D635-42). A set of 1,000 common germline variants was sampled from the Exome Variant Server.
To allow determination of peptide sequences incorporating missense mutations, protein sequences were obtained from Ensembl (release 90; GRCh38) (Aken et al., 2017, Nuc. Acids Res., 45(D1):D635-42) and updated with the new amino acid. For indels, we modified the corresponding mature messenger RNA transcript sequences (CDS) by inserting or deleting nucleotides, then translated the modified mRNA to protein sequence.
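For a missense mutation, the set of candidate peptides can be enumerated as sketched below. This is a minimal Python 3 illustration assuming the all-15-mers peptide set; the function name `mutant_kmers`, the 0-based position convention, and the toy sequence are assumptions of the example. Each resulting peptide would then be scored with a binding predictor, which is not shown here.

```python
def mutant_kmers(protein, pos, new_aa, k=15):
    """Return every k-mer of the mutated protein that contains the
    substituted residue at 0-based position `pos`."""
    if not 0 <= pos < len(protein):
        raise ValueError("position outside the protein sequence")
    mutated = protein[:pos] + new_aa + protein[pos + 1:]
    first_start = max(0, pos - k + 1)       # window ends on the residue
    last_start = min(pos, len(mutated) - k)  # window starts on the residue
    return [mutated[s:s + k] for s in range(first_start, last_start + 1)]


# Toy 40-residue sequence (not a real protein); substitute position 20.
toy_protein = "ACDEFGHIKLMNPQRSTVWY" * 2
peptides = mutant_kmers(toy_protein, pos=20, new_aa="W")
print(len(peptides))
```

Away from the protein termini this yields 15 peptides, with the substituted residue occupying each of the 15 positions once; near a terminus fewer windows fit, so fewer peptides are returned.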
A matrix of PHBR scores was constructed with 5,942 TCGA samples as rows, 1,018 recurrent oncogenic mutations as columns, and PHBR score in each cell. The matrix was clustered using hierarchical agglomerative clustering on rows and columns. For convenience of visualization, a partial matrix is displayed in
PHBR presentation scores were calculated for 5,942 TCGA patients across different classes of residue including 71 highly-recurrent (>10) oncogenic missense mutations, 1,000 random amino acid substitutions, 1,000 germline variants, 1,000 viral residues and 1,000 bacterial residues (see Selection of Other Classes of Residues). Across categories, this resulted in 24,189,882 PHBR scores (oncogenes: 231,738; tumor suppressor genes: 190,144; random: 5,942,000; common: 5,942,000; viral: 5,942,000; bacterial: 5,942,000). The distributions of PHBR scores in each category were compared with Mann-Whitney U tests and visualized with violin plots (
As a control population, we used dbGaP samples (dbGaP: Phs000398, Phs000254, Phs000632, Phs000209, Phs000290, Phs000179, Phs000422, Phs000291, Phs000631 and Phs000518) typed at MHC-II using HLA-HD (Kawaguchi et al., 2017, Hum. Mutat. 38:788-97), with default parameters and typed at MHC-I using Optitype (Szolek et al., 2014, Bioinformatics, 30(23):3310-6), with default parameters. Both tools require germline (whole blood or tissue matched) whole exome sequenced samples. We successfully typed the HLA-I genes for 1,386 patients and the HLA-II genes for 1,219 patients who had alleles in the netMHCpan-3.0 and the netMHCIIpan-3.1 databases. This control population was used to examine the MHC-II presentation of different classes of peptides by a non-cancer specific population (
The PHBR scores of 5,942 patients in TCGA were calculated for 1000 passenger mutations (observed 1 or 2 times in the 5,942 patients; not occurring in 200 cancer-implicated genes). PHBR scores were calculated for 1,018 recurrent driver mutations (from 200 cancer implicated genes) in the 7137 patients. The distribution of passenger PHBR scores was compared to 841 low frequency (≤5 times), 149 medium frequency (>5, ≤20 times) and 28 high frequency oncogenic mutations (>20 times). The distributions of PHBR scores in each category were compared with Mann-Whitney U tests and visualized with violin plots (
To assess the role of MHC-II with regard to mutation probability, we further restricted the recurrent oncogenic mutations to those occurring at least two times in the set of patients, resulting in 787 mutations and 5,942 patients. To first visualize the difference in PHBR-II distributions for mutations observed versus absent from tumors, PHBR-II scores from the 1,018 mutations×5,942 patient matrix were grouped according to mutation status and plotted in side-by-side violin plots. Next, we built a 5,942×787 binary mutation matrix yij ∈{0, 1} indicating whether patient i has a specific mutation j. We evaluated the relationship between this binary matrix and the matched 5,942×787 matrix with PHBR-II scores xij of patient i and for mutation j. We fitted a generalized additive model for the PHBR-II score and mutation probability with the gam function in the mgcv R package (Wood, 2001, R. News, 1:20-5). To estimate the effect of xij on yij, we considered the following random effects model:
logit(P(yij=1|xij))=ηi+γ log(xij)
where ηi˜N(0, θη) are random effects capturing different mutation propensities among patients.
In these models, γ measures the effect of the log-PHBR-II. We fitted this model using the glmer function from the lme4 R package (Bates et al., 2015, J. Stat. Softw. 67:1-48) and tested the null hypothesis that γ=0. To analyze the PHBR-mutation relationship in different tumor types, we fit separate models for each tumor type in which the cohort had at least 50 driver mutations in total. Furthermore, we used this same method to evaluate the difference in selection between mutations with high allelic fraction and mutations with low allelic fraction (see ‘Clonality of mutations’ section).
To assess the interaction between MHC-I and MHC-II in regards to mutation probability, we reduced the set of patients to those successfully typed for both MHC-I and MHC-II (Marty et al., 2017, Cell, 171:1272-83). We further restricted the recurrent oncogenic mutations to those occurring at least twice in the set of patients, resulting in 787 mutations and 5,942 patients. Then, we checked the correlation between MHC-I and MHC-II presentation using a Spearman Rank Test between MHC-I and MHC-II scores for each patient across all 1,018 mutations. These correlations were displayed as a histogram (
We built a 5,942×787 binary mutation matrix yij ∈{0, 1} indicating whether patient i has a specific mutation j. We evaluated the relationship between this binary matrix and two matched 5,942×787 matrices with MHC-I PHBR scores wij of patient i and for mutation j and MHC-II PHBR scores xij of patient i and for mutation j. To visualize the relationship between wij and xij with yij, we fit a generalized additive model for the PHBR scores of both classes using the gam function in the mgcv R package (Wood, 2001, R. News, 1:20-5). Finally, to estimate the effect of xij and wij on yij, we considered the following random effects model:
A within-patient model relating xij and wij to yij for a given patient
logit(P(yij=1|xij,wij))=α+ηi+γ log(xij)+β log(wij)
where α is the intercept term and ηi˜N(0, θη) are random effects capturing different mutation propensities among patients.
In these models, γ measures the effect of the log-PHBR-II (xij) and β measures the effect of the log-PHBR-I (wij) on the probability of a mutation being observed. We fitted this model using the glmer function from the lme4 R package (Bates et al., 2015, J. Stat. Softw. 67:1-48) and tested the null hypothesis that γ=0 and β=0. To analyze the PHBR-mutation relationship in different tumor types, we fit separate models for each tumor type in which the cohort had at least 50 driver mutations in total. Given the distinct PHBR score ranges for MHC-I and MHC-II, we constructed an OR analysis to compare the relative effects in the population. Instead of reporting the OR for a single unit increase, we reported the odds of observing a mutation in the 25th PHBR percentile relative to the 75th PHBR percentile.
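The percentile-based odds ratio can be written out from the model equation as follows. This is a sketch of the algebra only; the symbols x25, x75, w25 and w75 (the 25th and 75th percentiles of the MHC-II and MHC-I PHBR scores, respectively) are notation introduced here for illustration, and all other terms are assumed to be held fixed.

```latex
\begin{align*}
\log \mathrm{OR}_{x}
  &= \bigl[\alpha + \eta_i + \gamma \log x_{25} + \beta \log w_{ij}\bigr]
   - \bigl[\alpha + \eta_i + \gamma \log x_{75} + \beta \log w_{ij}\bigr] \\
  &= \gamma \,\bigl(\log x_{25} - \log x_{75}\bigr), \\[4pt]
\mathrm{OR}_{x} &= \left(\frac{x_{25}}{x_{75}}\right)^{\gamma},
\qquad
\mathrm{OR}_{w} = \left(\frac{w_{25}}{w_{75}}\right)^{\beta}.
\end{align*}
```

Because x25 < x75, the ratio is below 1 whenever γ > 0, that is, whenever a higher (poorer) PHBR score is associated with a higher probability of observing the mutation. Expressing both effects over the same percentile span makes the MHC-I and MHC-II odds ratios comparable despite their different score ranges.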
For each mutation in our set of 1,018 driver mutations, we calculated the fraction of patients that could present the mutation based on their MHC-I and MHC-II genotype, respectively. We used the standard weak binding cutoffs of 2 for MHC-I and 10 for MHC-II. These results were visualized with a density plot (
The occurrences of mutations within the set of 1,018 driver mutations were designated as likely clonal or likely subclonal based on the allelic fraction annotation provided by TCGA. Mutations that were among the lowest 30th percentile were designated likely subclonal and all the remaining were considered likely clonal. We modeled the independent effect of PHBR-II and PHBR-I on mutation probability separately for subclonal and clonal occurrences as described above in the section ‘Modeling the effect of PHBR-II on mutation probability’.
Immune infiltration levels were quantified from expression using CIBERSORT (Newman et al., 2015, Nat. Methods, 12(5):453-7) and patient-specific cytotoxicity scores were derived (Rooney et al., 2015, Cell, 160:48-61). Tumors were divided into “high” and “low” groups for each of the following categories using the tumor-type specific 30th and 70th percentile: APC infiltration (B cells, dendritic cells and macrophages), cytolytic activity, CD8+ T cell infiltration and CD4+ T cell infiltration. We modeled the independent effect of PHBR-II and PHBR-I on mutation probability in the high and low groups as described above in the section ‘Modeling the effect of PHBR-II on mutation probability’.
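The high/low grouping can be sketched as follows for a single tumor type. This Python 3 illustration makes several assumptions that the text does not specify: percentiles are computed by linear interpolation, tumors at or below the 30th percentile are "low", tumors at or above the 70th percentile are "high", and the remainder are left unassigned. The function names and the toy scores are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between order statistics."""
    ordered = sorted(values)
    idx = (len(ordered) - 1) * q / 100.0
    lo = int(idx)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (idx - lo)


def split_high_low(scores, low_q=30, high_q=70):
    """Label each tumor 'low', 'high', or None (middle band) within one
    tumor type. `scores` maps tumor id -> infiltration or cytolytic score."""
    values = list(scores.values())
    low_cut = percentile(values, low_q)
    high_cut = percentile(values, high_q)
    groups = {}
    for tumor, value in scores.items():
        if value <= low_cut:
            groups[tumor] = "low"
        elif value >= high_cut:
            groups[tumor] = "high"
        else:
            groups[tumor] = None
    return groups


# Toy scores for 11 tumors of one tumor type (illustrative only).
toy_scores = {"T%d" % i: float(i) for i in range(1, 12)}
groups = split_high_low(toy_scores)
print(groups)
```

To make the cutoffs tumor-type specific, the same function would be applied separately to the tumors of each type.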
MHC-I and MHC-II coverage of driver mutations was determined by calculating the fraction of the 1,018 driver mutation PHBR scores for each patient that fell below the binding thresholds, 2 and 10 for MHC-I and MHC-II respectively. This analysis resulted in each patient being assigned two MHC coverage values (MHC-I and MHC-II). Furthermore, two more values were calculated for each patient using 1,000 passenger mutations. The number of homozygous genes was determined for each patient by adding the number of identical alleles for MHC-I (-A, -B, -C) and MHC-II (-DRB, -DPA, -DPB, -DQA, -DQB) separately. The MHC coverage values were calculated for these patients as well and compared to the TCGA MHC coverage values with a Mann Whitney U test.
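The two per-patient quantities described above can be sketched as follows. This is a minimal Python 3 illustration; the function names, the genotype layout (gene name mapped to a pair of alleles), and the example values are assumptions of the sketch, and a strict "less than" comparison against the threshold is assumed for "fell below".

```python
def mhc_coverage(phbr_scores, threshold):
    """Fraction of a patient's PHBR scores that fall below the binding
    threshold (2 for MHC-I, 10 for MHC-II in the text)."""
    if not phbr_scores:
        raise ValueError("no PHBR scores supplied")
    return sum(score < threshold for score in phbr_scores) / len(phbr_scores)


def homozygous_gene_count(genotype):
    """Number of genes whose two alleles are identical.
    `genotype` maps gene name -> (allele_1, allele_2)."""
    return sum(1 for first, second in genotype.values() if first == second)


# Made-up MHC-II PHBR scores for four mutations in one patient.
coverage_ii = mhc_coverage([1.0, 5.0, 12.0, 30.0], threshold=10)

# Illustrative MHC-I genotype; allele names serve only as example strings.
genotype_i = {
    "HLA-A": ("A*02:01", "A*02:01"),
    "HLA-B": ("B*07:02", "B*08:01"),
    "HLA-C": ("C*07:01", "C*07:02"),
}
print(coverage_ii, homozygous_gene_count(genotype_i))
```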
To visualize the association between MHC coverage and age at diagnosis, the patients with MHC coverage values in the lowest quartile and the patients with MHC coverage values in the highest quartile were compared. To determine statistical significance, a linear model in R was applied with age as the dependent variable and MHC coverage, ancestry and tumor type as the independent variables. Statistical significance was also determined for MHC-I and MHC-II coverage of passenger mutations and MHC homozygosity count as a replacement for MHC coverage. To assess the practical effect size of the extreme cases of MHC coverage, we compared the ages at diagnosis of the 5% of patients with the lowest MHC-I coverage with the ages at diagnosis for the 5% of patients with the highest MHC-I coverage with a two sample t test. We also performed the same analysis for the patients with the highest and lowest 10% of MHC-I coverage. A Pearson correlation test was used to determine the correlation between MHC coverage of driver mutations and MHC coverage of passenger mutations for both MHC-I and MHC-II.
For all individual tests, a p value of less than 0.05 was considered significant. When multiple comparisons were made, p values were adjusted using the Benjamini-Hochberg method unless otherwise specified. For all box plots, whiskers indicate the 1.5 IQR range.
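For reference, the Benjamini-Hochberg adjustment mentioned above can be written in a few lines. The sketch below is a dependency-free illustration with an assumed function name and toy p values. The text does not state which implementation the original analysis used.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, so that each adjusted value is
    # no larger than the adjusted value of the next-larger p value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p values (hypothetical).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)
significant = [p_adj < 0.05 for p_adj in toy_adjusted]
```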
The Python (2.7) and R code used to perform the analyses described in this manuscript and generate all main and supplemental figures is available in Data S1 and at github.com/Rachelmarty20/MHC_II on the World Wide Web.
To study the role of MHC-II during tumorigenesis, we needed a score linking MHC-II genotype to presentation of specific mutations. We first constructed a score representing the ability of a single MHC-II molecule to present a residue. We previously established that using the best rank among peptides provided the best performance for predicting MHC-I presentation. We therefore adapted this scoring scheme to reflect the structure and composition of MHC-II. Three molecules (HLA-DR, HLA-DP, and HLA-DQ) make up the MHC-II, all of which are heterodimers formed by an alpha and beta chain. Both the alpha and the beta chain influence the binding affinity of a peptide. In contrast to MHC-I, the MHC-II binding groove is open at both ends, allowing longer peptides to bind. To predict binding affinity to each alpha- and beta-paired MHC-II molecule, we used netMHCIIpan-3.1 that returns a single rank for the pair with each peptide (Karosiene et al., 2013, Immunogenetics, 65:711-24). Unlike netMHCpan-3.0, netMHCIIpan-3.1 has only been optimized for 15-mers and not for varying lengths. As with MHC-I, we assigned the single MHC-II molecule presentation score as the best rank of all k-mers containing the desired residue (
Next, single molecule residue-centric presentation scores were combined into an MHC-II genotype score. Previously, MHC-I single allele best rank scores were combined using the harmonic mean resulting in the patient best-rank harmonic mean (PHBR-I) score, as this outperformed all other tested formulations. To create an analogous score for MHC-II, we modified the PHBR-I score to account for the different composition of MHC-II molecules. The MHC-II genotype comprises two copies each of HLA-DR alpha and beta, HLA-DP alpha and beta, and HLA-DQ alpha and beta. HLA-DRA is the only non-variable gene in the population, resulting in only two possible HLA-DR heterodimers. Each individual can form four possible alpha-beta heterodimers from each of HLA-DP and HLA-DQ. This results in a total of ten possible unique heterodimeric MHC-II molecules (
To assess the performance of the PHBR-II score at predicting extracellular presentation, we compared the scores for peptides derived from several multi-allelic HLA-DR expressing cell lines against matched scores for randomly derived peptides (Ciudad et al., 2017, J. Leukoc. Biol., 101:15-27) (
Finally, we applied the HLA-HD tool (Kawaguchi et al., 2017, Hum. Mutat. 38:788-97) to predict HLA-II alleles for patients in TCGA with exome sequencing data (see Table S1 in doi.org/10.1016/j.cell.2018.08.048 on the World Wide Web). To the best of our knowledge, HLA-HD is currently the only tool that can call alpha and beta alleles for HLA-DR, HLA-DP, and HLA-DQ with high accuracy. Thus, from a total of 8,333 patients with exome sequencing, we successfully typed 7,929 patients at all three genes. To validate these HLA types, we also applied xHLA (Xie et al., 2017, PNAS USA, 114: 8059-64), which calls the beta alleles for HLA-DR, HLA-DP, and HLA-DQ. We restricted our patient set to samples where both HLA-HD and xHLA completely agreed, leaving 5,942 patients (
Mutations that drive the early development of tumors should be observed more frequently across tumors. We therefore used recurrence of mutations in established oncogenes and tumor suppressors as criteria to assemble a list of 1,018 cancer-driving mutations likely to have occurred prior to immune evasion and that could therefore reflect the effects of selection by immunosurveillance. We calculated PHBR-II scores for every mutation-patient combination, resulting in a matrix of 5,942 patients (
Next, we compared the ability of the 5,942 cancer patients to present different classes of residues by MHC-II. We calculated the PHBR-II scores of every patient for 1,000 viral residues, 1,000 bacterial residues, 1,000 common polymorphisms, and 1,000 random mutations (Marty et al., 2017, Cell, 171:1272-83). To compare the behaviors of PHBR-II scores, we visualized raw distribution and the cumulative distribution function (CDF) for each class of residues. Viral and bacterial residues were presented the most effectively out of these classes by the patients in the population (
We next evaluated whether the recurrence of a mutation was related to its presentation by MHC-II by comparing the PHBR-II score distributions of passenger mutations and varying frequencies of cancer-driving mutations (
Given observed bias for cancer mutations to be poorly presented by human MHC-II (
Next, we used a logistic regression with non-linear effects to model the relationship between MHC-II genotype and the probability of observing a recurrent somatic mutation in a pan-cancer setting. We found a substantial increase in odds of acquiring a mutation as PHBR-II scores increased (OR=1.23, p<9.9e-58, Table 3). Importantly, passenger mutations, established non-driver mutations (Table 1), and germline polymorphisms did not exhibit the same increase (OR=1.00, OR=0.99, and OR=0.99, respectively, Table 3). In addition, the OR decreased when less stringent HLA type calls were used (OR=1.20), suggesting the importance of accurate HLA typing.
Because the immune environment can vary considerably across tissue sites, we revisited our analysis for each tumor type separately (
We previously established the influence of germline MHC-I genotype on the probability of observing specific mutations in tumors (Marty et al., 2017, Cell, 171:1272-83). To assess the combined influence of MHC-I and MHC-II on mutation probability, we evaluated the correlation between PHBR-I and -II scores across recurrent cancer mutations. The range and distribution of PHBR-I and -II scores differ substantially (
To quantify the influence of MHC-I and MHC-II on probability of mutation, we used an additive logistic regression model with non-linear effects that incorporated both PHBR-I and -II scores in the pan-cancer setting. Because the distributions of PHBR-I and -II are very different, we calculated the ORs between the 25th and 75th percentile PHBR, such that the OR represents the increase in odds of observing a mutation among individuals with a high PHBR score relative to a low PHBR score for each MHC class. Notably, we found the impact of MHC-II on the probability of a mutation to be larger than the impact of MHC-I (single model incorporating both classes: OR=1.74 with CI [1.67, 1.80] and OR=1.60 with CI [1.54, 1.64], respectively). To better understand the relative effects of presentation by MHC II versus MHC I in a tissue-specific setting, we also estimated their individual effects on mutation probability in a joint model. Consistent with our pan-cancer analysis, we found MHC-II to have more extreme effect sizes in most tissues (
The same driver mutations can occur early or late during tumor development; however, in a model where immune selection is impaired later in tumorigenesis by mechanisms of immune evasion, selection should be stronger on early clonal occurrences. Therefore, we further annotated mutations according to whether they were more likely clonal or subclonal based on the relative allelic fraction of the mutations (STAR Methods). Consistent with our assumption, likely subclonal mutations had decreased ORs relative to PHBR-II and PHBR-I scores (single class model, reference Table 3: PHBR-II OR=1.13 as compared to 1.21 for all mutations, PHBR-I OR=1.16 as compared to 1.20 for all mutations,
Next, we explored whether practical differences exist in the presentation of particular driver mutations by MHC-II versus MHC-I. We compared the fraction of patients wherein a mutation was presented by MHC-II with the same fraction for MHC-I (
Based on this analysis, the relative abundance of class I peptides appears to be higher than that for class II, suggesting better potential for engineering class I anti-tumor responses; however, recent reports suggest a bias for responses to be CD4+-driven in practice. This could indicate that TCR availability is a major bottleneck for effective CD8+ immune responses.
Differences in the dynamics of peptide presentation and immune response for MHC-I versus MHC-II may have important implications for tumor-immune interactions. Whereas MHC-I binds peptides with high specificity, MHC-II binds a broader array of peptides with a high degree of promiscuity. CD4+ T cells activated by MHC-II-peptide complexes can play either a regulatory or an effector role, whereas CD8+ T cells are strictly (cytotoxic) effectors. The different properties of class I- and class II-based immunity are essential for an effective defense against pathogens, but the implications for anti-tumor responses are less clear. We therefore sought to further quantify the potential for these distinct roles to introduce measurable differences between class I- and class II-mediated immunosurveillance during tumor development. Because of its established regulatory role in cancer, we reasoned that MHC-II-driven immunosurveillance could have a larger effect on the immune microenvironment than MHC-I-driven immunosurveillance. Using CIBERSORT (Newman et al., 2015, Nat. Methods, 12(5):453-7) to evaluate infiltration by different immune cell types into tumors, we sought to identify a relationship between immune infiltrates, cytotoxicity score (Rooney et al., 2015, Cell, 160:48-61), and strength of immune selection. We divided patients into groups based on their immune infiltrates and cytotoxicity scores and tested for differences in immune selection (
Population level variation in effectiveness of cancer-relevant immunosurveillance could also relate directly to cancer susceptibility. We reasoned that patients whose MHC genotype could present a larger fraction of driver mutations to the immune system would be more resistant to developing cancer. As homozygous genotype at MHC alleles could reduce the diversity of presented peptides, we compared presentation across patients with different levels of homozygosity. We quantified coverage of cancer causing mutations as the fraction of the 1,018 driver mutations that could be presented by the MHC-II genotype of each patient (STAR Methods) and henceforth refer to this fraction as MHC-II coverage. As expected, patients with more homozygous MHC-II alleles were able to present a smaller fraction of the space due to their decreased MHC diversity (
Next, we asked whether higher MHC coverage could delay the development of cancer. We reasoned that if two patients acquired a cancer-driving mutation at the same time, the patient with higher MHC coverage would be more likely to expose their mutation to the immune system and stop expansion of the cancer. Thus, high MHC coverage should lead to diagnosis with cancer later in life and vice-versa (
Part B—Strength of Immune Selection in Tumors Varies with Sex and Age
Data were obtained from publicly available sources including The Cancer Genome Atlas (TCGA) Research Network (cancergenome.nih.gov on the World Wide Web). TCGA normal exome sequences and TCGA clinical data were downloaded from the GDC. Furthermore, TCGA somatic mutations were accessed from the NCI Genomic Data Commons (portal.gdc.cancer.gov/ on the World Wide Web).
dbGaP studies (accession numbers: phs001493.v1.p1.c2, phs001041.v1.p1.c1, phs001425.v1.p1.c1, phs001493.v1.p1.c1, phs000980.v1.p1.c1, phs001469.v1.p1.c1, phs000452.v2.p1.c1, phs001451.v1.p1.c1, phs001519.v1.p1.c1, phs001565.v1.p1.c1) were obtained from the dbGaP database, and WXS/WGS data were obtained from the Sequence Read Archive (SRA) (Leinonen et al., 2010, Nuc. Acids Res., 39:E19-21). Somatic mutation files were obtained from the respective papers associated with each study. WXS/WGS data for additional non-TCGA patients were obtained from the ICGC, and somatic mutation data from the ICGC DCC Data Release (PCAWG and THCA-SA) (Appendix B). The validation cohort's MHC-I and -II genotypes were typed using HLA-HD (Kawaguchi et al., 2017, Hum. Mutat., 38:788-97), and PHBR scores were calculated using the method described in “Presentation score assignment”.
HLA genotyping was performed for class I genes HLA-A, HLA-B, HLA-C and class II genes HLA-DRB1, HLA-DPA1, HLA-DPB1, HLA-DQA1 and HLA-DQB1, which encode three protein determinants of MHC-II peptide binding specificity, HLA-DR, HLA-DP, and HLA-DQ. TCGA samples were typed with Polysolver (Shukla et al., 2015, Nat. Biotechnol., 33:1152-1158), with default parameters, for class I and typed with HLA-HD (Kawaguchi et al., 2017, Hum. Mutat., 38:788-97), using default parameters, for class II. Both tools require germline (whole blood or tissue matched) whole exome sequenced samples. Samples with very low coverage on specific genes are left untyped by HLA-HD. Patients were assigned an HLA-DR type if they were successfully typed for HLA-DRB1. Patients were assigned HLA-DP and -DQ types if they had successful typing for HLA-DPA1/HLA-DPB1 and HLA-DQA1/HLA-DQB1, respectively. Class I and class II types were validated by xHLA (Xie et al., 2017, PNAS USA, 114:8059-64), run with default parameters, and only patients where all alleles agreed in both classes were included in the analysis.
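The consensus filter described above (keep a patient only when two callers agree) could look like the sketch below. It is illustrative only: the data layout, the function names, the choice to compare on the genes that both callers report, and the toy allele calls are all assumptions made here, not details given in the text.

```python
from typing import Dict, List, Tuple

# Per-patient calls from one tool: gene name -> unordered pair of alleles.
Calls = Dict[str, Tuple[str, str]]


def callers_agree(calls_a: Calls, calls_b: Calls) -> bool:
    """True when both callers report the same allele pair for every shared gene."""
    shared_genes = calls_a.keys() & calls_b.keys()
    if not shared_genes:
        return False  # nothing to compare, so no evidence of agreement
    return all(sorted(calls_a[g]) == sorted(calls_b[g]) for g in shared_genes)


def consensus_patients(tool_a: Dict[str, Calls], tool_b: Dict[str, Calls]) -> List[str]:
    """Patient identifiers typed by both tools with fully matching calls."""
    return sorted(
        pid for pid in tool_a.keys() & tool_b.keys()
        if callers_agree(tool_a[pid], tool_b[pid])
    )


# Toy calls (hypothetical patients and alleles).
first_tool = {
    "P1": {"HLA-DRB1": ("DRB1*01:01", "DRB1*15:01")},
    "P2": {"HLA-DRB1": ("DRB1*03:01", "DRB1*04:01")},
}
second_tool = {
    "P1": {"HLA-DRB1": ("DRB1*15:01", "DRB1*01:01")},  # same pair, other order
    "P2": {"HLA-DRB1": ("DRB1*03:01", "DRB1*07:01")},  # one allele differs
}
kept = consensus_patients(first_tool, second_tool)
```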
Patient presentation scores, as defined in (Marty et al., 2017, Cell, 171:1272-83), were used to represent a particular patient's ability to present a residue given their distinct set of HLA types. For class I, 6 HLA alleles were considered (two each of HLA-A, HLA-B and HLA-C). For class II, 12 HLA-encoded MHC-II molecules were considered (4 combinations each of HLA-DPA1/DPB1 and HLA-DQA1/DQB1, plus the 2 alleles of HLA-DRB1 counted twice each, since HLA-DRA1 is invariant, for consistency between resulting molecules). The Patient Harmonic-mean Best Rank (PHBR) score was assigned as the harmonic mean of the best residue presentation scores for each group of MHC-I and MHC-II molecules. A lower patient presentation score indicates that the patient's MHC molecules are more likely to present a residue on the cell surface.
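A minimal sketch of the PHBR calculation, assuming the per-peptide percentile ranks have already been obtained from a binding predictor. The function names, the molecule labels, and the toy ranks are assumptions introduced for illustration. The construction of the 12 class II molecules follows the counting described above.

```python
from itertools import product
from statistics import harmonic_mean
from typing import Dict, List, Sequence, Tuple


def best_rank(kmer_ranks: Sequence[float]) -> float:
    """Single-molecule score: the best (lowest) rank among peptides containing the residue."""
    return min(kmer_ranks)


def phbr(ranks_by_molecule: Dict[str, Sequence[float]]) -> float:
    """Harmonic mean of the per-molecule best ranks for one patient and one residue."""
    return harmonic_mean([best_rank(ranks) for ranks in ranks_by_molecule.values()])


def class_ii_molecules(
    dpa1: Tuple[str, str], dpb1: Tuple[str, str],
    dqa1: Tuple[str, str], dqb1: Tuple[str, str],
    drb1: Tuple[str, str],
) -> List[Tuple[str, str]]:
    """Enumerate the 12 alpha/beta pairs: 4 DP, 4 DQ, and each DRB1 allele twice."""
    dp = list(product(dpa1, dpb1))
    dq = list(product(dqa1, dqb1))
    # The DR alpha chain is treated as invariant, so each DRB1 allele is
    # entered twice to keep the weighting consistent with DP and DQ.
    dr = [("DRA", beta) for beta in drb1 for _ in range(2)]
    return dp + dq + dr


# Toy example (hypothetical ranks for three molecules).
toy_ranks = {"mol_1": [3.0, 1.0], "mol_2": [2.0, 5.0], "mol_3": [4.0]}
toy_phbr = phbr(toy_ranks)  # harmonic mean of 1, 2 and 4

toy_molecules = class_ii_molecules(
    ("DPA1_a", "DPA1_b"), ("DPB1_a", "DPB1_b"),
    ("DQA1_a", "DQA1_b"), ("DQB1_a", "DQB1_b"),
    ("DRB1_a", "DRB1_b"),
)
```

Because the harmonic mean is dominated by small values, a single molecule with a very good (low) rank pulls the patient score down, which matches the interpretation that a lower score indicates more likely presentation.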
We would like to thank the TCGA research network for providing data used in the analyses, the ICGC database, as well as the following studies used in the validation cohort.
phs001493.v1.p1.c2 and phs001451.v1.p1.c1 We would also like to thank the Blavatnik Family Foundation, grants from the Broad Institute SPARC program, the National Institutes of Health (NCI-5R01CA155010-02, NHLBI-5R01HL103532-03, NCI-SPORE-2P50CA101942-11A1, NCI-R50-RCA211482A), the Francis and Adele Kittredge Family Immuno-Oncology and Melanoma Research Fund, the Faircloth Family Research Fund, and the DFCI Center for Cancer Immunotherapy Research fellowship and Leukemia and Lymphoma Society.
phs001041.v1.p1.c1 We thank Martin Miller at Memorial Sloan Kettering Cancer Center (MSKCC) for his assistance with the NetMHC server, Agnes Viale and Kety Huberman at the MSKCC Genomics Core, Annamalai Selvakumar and Alice Yeh at the MSKCC HLA typing laboratory for their technical assistance, and John Khoury for assistance in chart review.
phs001425.v1.p1.c1 Christine N. Spencer, Pei-Ling Chen, Michael T. Tetzlaff, Michael A. Davies, Jeffrey E. Gershenwald, Sapna P. Patel, Adi Diab, Isabella C. Glitza, Hussein Tawbi, Alexander J. Lazar, Patrick Hwu, Wen-Jen Hwu, Scott E. Woodman, Rodabe N. Amaria, Victor G. Prieto, and Jennifer A. Wargo enrolled subjects and contributed samples.
phs001493.v1.p1.c1 This study was supported by an AACR KureIt grant.
phs000980.v1.p1.c1 We thank the members of the Thoracic Oncology Service and the Chan and Wolchok labs at MSKCC for helpful discussions, as well as the Immune Monitoring Core at MSKCC, including L. Caro, R. Ramsawak, and Z. Mu, for exceptional support with processing and banking peripheral blood lymphocytes. We thank P. Worrell and E. Brzostowski for help in identifying tumor specimens for analysis. We thank A. Viale for superb technical assistance. We thank D. Philips, M. van Buuren, and M. Toebes for help performing the combinatorial coding screens. This work was supported by the Geoffrey Beene Cancer Research Center (MDH, NAR, TAC, JDW, AS), the Society for Memorial Sloan Kettering Cancer Center (MDH), Lung Cancer Research Foundation (WL), Frederick Adler Chair Fund (TAC), The One Ball Matt Memorial Golf Tournament (EBG), Queen Wilhelmina Cancer Research Award (TNS), The STARR Foundation (TAC, JDW), the Ludwig Trust (JDW), and a Stand Up To Cancer-Cancer Research Institute Cancer Immunology Translational Cancer Research Grant (JDW, TNS, TAC). Stand Up To Cancer is a program of the Entertainment Industry Foundation administered by the American Association for Cancer Research.
phs001469.v1.p1.c1 This work was supported by NIH grants R35CA197633, P01CA168585, 5P50CA168536 and GM08042. A comprehensive description of the data set can be found at PMID:29320474.
phs001519.v1.p1.c1 We thank the Ben and Catherine Ivy Foundation, the Blavatnik Family Foundation, the Broad Institute SPARC program, and NIH (NCI-1R01CA155010-02 (to C.J.W.)), NHLBI-5R01HL103532-03 (to C.J.W.), Francis and Adele Kittredge Family Immuno-Oncology and Melanoma Research Fund (to P.A.O.), Faircloth Family Research Fund (to P.A.O.), NIH/NCI R21 CA216772-01A1 (to D.B.K.), NCI-SPORE-2P50CA101942-11A1 (to D.B.K.); NHLBI-T32HL007627 (to J.B.I.); NCI (R50CA211482) (to S.A. S.), Zuckerman STEM Leadership Program (to I.T.); Benoziyo Endowment Fund for the Advancement of Science (to I.T.); P50 CA165962 (SPORE) and P01 CA163205 (to K.L.L.); DFCI Center for Cancer Immunotherapy Research fellowship (to Z.H.); Howard Hughes Medical Institute Medical Research Fellows Program (to A.J.A.); and American Cancer Society PF-17-042-01-LIB (to N.D.M.). C.J.W. is a scholar of the Leukemia and Lymphoma Society. We thank the Center for Neuro-Oncology, J. Russell and Dana-Farber Cancer Institute (DFCI) Center for Immuno-Oncology (CIO) staff; B. Meyers, C. Harvey and S. Bartel (Clinical Pharmacy); M. Severgnini, K. Kleinsteuber and E. McWilliams, (CIO laboratory); M. Copersino (Regulatory Affairs); T. Bowman (DFHCC Specialized Histopathology Core Laboratory); A. Lako (CIO); M. Seaman and D. H. Barouch (BIDMC); the Broad Institute's Biological Samples, Genetic Analysis and Genome Sequencing Platforms; J. Petricciani and M. Krane for regulatory advice; B. McDonough (CSBio), I. Javeri and K. Nellaiappan (CuriRx) for peptide development.
phs001565.v1.p1.c1 The research reported in this article was supported by BroadIgnite, BroadNext10, NIH K08CA188615, the Howard Hughes Medical Institute, and Stand Up To Cancer—American Cancer Society Lung Cancer Dream Team Translational Research Grant (grant number: SU2C-AACR-DT17-15). Stand Up To Cancer is a program of the Entertainment Industry Foundation. Research grants are administered by the American Association for Cancer Research, the scientific partner of SU2C.
Somatic mutations were considered to be recurrent and oncogenic if they occurred in one of the 100 most highly ranked oncogenes or tumor suppressors described by Davoli et al. (2013, Cell, 155:948-62) and were observed in at least 3 TCGA samples. Among these, only mutations that would result in predictable protein sequence changes that could generate neoantigens, including missense mutations and inframe indels, were retained. A total of 1,018 mutations (512 missense mutations from oncogenes, 488 missense mutations from tumor suppressors, 11 indels from oncogenes and 7 indels from tumor suppressors) were obtained (Marty et al., 2017, Cell, 171:1272-83).
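The selection criteria above amount to a filter followed by a recurrence count. The sketch below illustrates that logic on toy records. The record layout, the mutation-type labels, and the function name are assumptions, and the gene set passed in stands in for the ranked oncogene and tumor suppressor list.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Mutation classes retained because they give predictable protein changes
# (label strings are hypothetical).
RETAINED_TYPES = {"missense", "inframe_indel"}

Record = Tuple[str, str, str, str]  # (sample_id, gene, protein_change, mutation_type)


def recurrent_driver_mutations(
    records: Iterable[Record], driver_genes: Set[str], min_samples: int = 3
) -> List[Tuple[str, str]]:
    """Return (gene, protein_change) pairs seen in at least min_samples distinct samples."""
    samples_per_mutation: Dict[Tuple[str, str], Set[str]] = {}
    for sample_id, gene, change, mutation_type in records:
        if gene in driver_genes and mutation_type in RETAINED_TYPES:
            samples_per_mutation.setdefault((gene, change), set()).add(sample_id)
    return sorted(
        key for key, samples in samples_per_mutation.items()
        if len(samples) >= min_samples
    )


# Toy records (hypothetical samples).
toy_records = [
    ("s1", "KRAS", "G12D", "missense"),
    ("s2", "KRAS", "G12D", "missense"),
    ("s3", "KRAS", "G12D", "missense"),
    ("s1", "TP53", "R175H", "missense"),   # only two samples: not recurrent
    ("s2", "TP53", "R175H", "missense"),
    ("s1", "TP53", "R213*", "nonsense"),   # excluded mutation type
    ("s2", "TP53", "R213*", "nonsense"),
    ("s3", "TP53", "R213*", "nonsense"),
    ("s1", "TTN", "A1V", "missense"),      # gene not in the driver list
    ("s2", "TTN", "A1V", "missense"),
    ("s3", "TTN", "A1V", "missense"),
]
toy_drivers = recurrent_driver_mutations(toy_records, {"KRAS", "TP53"})
```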
Two matrices, for PHBR-I scores and PHBR-II scores, were built from the 1,018 mutations and the 1,912 patients with both PHBR-I and -II calls. Next, a binary mutation matrix $y_{ij} \in \{0,1\}$ indicating whether patient i has a specific mutation j was built. The relationship between this binary matrix, the matched 1,912×1,018 matrices with log PHBR-I and -II scores, $x_{1ij}$ and $x_{2ij}$, respectively, and the variable of interest (sex or age) for patient i and mutation j was evaluated. A generalized additive model was fit for the centered log PHBR-I, centered log PHBR-II scores, centered sex (coded 0/1 for males/females) or centered age, and mutation probability with the gam function in the mgcv R package (Wood et al., 2001, R. news, 1:20-5). To estimate the effects of PHBR and sex or age on probability of mutation, the following random effects models were considered:
$$\operatorname{logit}(P(y_{ij}=1)) = \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 \mathrm{Sex}_i + \beta_4 x_{1ij}\,\mathrm{Sex}_i + \beta_5 x_{2ij}\,\mathrm{Sex}_i + \eta_i$$
$$\operatorname{logit}(P(y_{ij}=1)) = \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 \mathrm{Age}_i + \beta_4 x_{1ij}\,\mathrm{Age}_i + \beta_5 x_{2ij}\,\mathrm{Age}_i + \eta_i$$
And a PHBR-II specific model (results in Table 4):
$$\operatorname{logit}(P(y_{ij}=1)) = \beta_1 x_{2ij} + \beta_2 \mathrm{Age}_i + \beta_3 \mathrm{Sex}_i + \beta_4 x_{2ij}\,\mathrm{Sex}_i + \beta_5 x_{2ij}\,\mathrm{Age}_i + \eta_i$$
where $\eta_i \sim N(0, \theta_\eta)$ are random effects capturing different mutation propensities among patients. In these models, the coefficients $\beta_n$ measure the effects of log-PHBR-I, log-PHBR-II, sex or age, and their interactions. This analysis was repeated for the validation cohort.
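As a worked illustration of how a fitted coefficient translates into an odds ratio between two PHBR percentiles, consider the sex model with only its linear terms. The symbols b (PHBR-II main-effect coefficient), c (PHBR-II by sex interaction coefficient), s (the centered sex value of a patient), and q25 and q75 (the 25th and 75th percentiles of centered log PHBR-II) are labels introduced here for the example, and the numbers in the last line are hypothetical.

```latex
% Holding log PHBR-I and the patient random effect fixed, the log-odds
% difference between the two percentiles of log PHBR-II is
\[
\operatorname{logit} P(y = 1 \mid x_2 = q_{75}) - \operatorname{logit} P(y = 1 \mid x_2 = q_{25})
  = (b + c\,s)\,(q_{75} - q_{25}),
\]
% so the corresponding odds ratio is
\[
\mathrm{OR} = \exp\bigl( (b + c\,s)\,(q_{75} - q_{25}) \bigr).
\]
% Hypothetical numbers: b = 0.2, c = 0, q_{75} - q_{25} = 1.5 gives
\[
\mathrm{OR} = \exp(0.2 \times 1.5) = \exp(0.3) \approx 1.35 .
\]
```

A positive interaction coefficient c would make the odds ratio larger for patients with a higher centered sex value, which is how a sex-dependent strength of selection would appear in this parameterization.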
Mutational signature analysis was performed using a previously developed computational framework, SigProfiler (Alexandrov et al., 2013, Cell Rep., 3:246-59). A detailed description of the workflow of the framework can be found in (Alexandrov et al., 2013, Cell Rep., 3:246-59; biorxiv.org/content/early/2018/05/15/322859 on the World Wide Web), while the code can be downloaded freely from mathworks.com/matlabcentral/fileexchange/38724-sigprofiler on the World Wide Web.
All boxplots were evaluated using the default one-tailed Mann Whitney U statistical test, via the scipy.stats Python package. Mutational signature sex-specific distributions were also compared using the one-tailed Mann Whitney U test, and p-values were adjusted using the Benjamini-Hochberg procedure.
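To make the one-tailed Mann Whitney U comparison concrete, the sketch below computes the U statistic and a normal-approximation p value without external dependencies. It is not the scipy.stats routine used in the analysis: the function name, the continuity correction, and the omission of a tie correction in the variance are simplifying assumptions, and the toy scores are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u_greater(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """One-sided test that values in x tend to be larger than values in y.

    Returns (U, p), where p comes from a normal approximation with a
    continuity correction and no tie correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # U counts the (x, y) pairs in which x is larger; ties count one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u - 0.5) / sd_u
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return u, p


# Toy PHBR-like scores for two groups (hypothetical values).
group_a = [5.0, 6.0, 7.0, 8.0]
group_b = [1.0, 2.0, 3.0, 4.0]
u_ab, p_ab = mann_whitney_u_greater(group_a, group_b)
u_ba, p_ba = mann_whitney_u_greater(group_b, group_a)
```

For small groups such as these toy samples an exact p value would normally be preferred, so the approximation here is only meant to show the direction of the one-sided test.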
Code to reproduce findings and figures can be freely accessed at github.com/CarterLab/HLA-immunoediting on the World Wide Web.
A set of 1,018 driver mutations, defined in (Marty et al., 2017, Cell, 171:1272-83), was examined, since driver mutations are more persistent in the clonal architecture of an individual's cancer and confer a selective growth advantage. MHC-I and MHC-II types were assigned based on the consensus of two exome-based calling methods (Shukla et al., 2015, Nat. Biotechnol., 33:1152-8; Xie et al., 2017, PNAS USA, 114:8059-64; and Kawaguchi et al., 2017, Hum. Mutat., 38:788-97) and only microsatellite-stable (MSS) TCGA patients that had identically matched typing were considered. Ultimately, 2,554 patients with confident MHC-I calls and 2,681 patients with confident MHC-II calls who were diverse in sex, with more males than females (
It was reasoned that the discrepancy might be due to differences in the strength of immune selection, e.g., tumors with stronger immunoediting should retain fewer driver mutations that are presentable to T cells by the patient's own MHC molecules. For sex- and age-specific groups in each cohort, the PHBR-I and PHBR-II score distributions for expressed driver mutations observed in patient tumors were compared. Across pan-cancer cohorts, females were at a significant disadvantage in presenting their driver mutations by both their MHC-I and MHC-II molecules (
Next, the immune system's ability to eliminate effectively presented mutations was explored. Sex- and age-specific generalized additive models with random effects were used to account for variation in mutation rate across individuals, and the coefficients corresponding to independent and interaction effects for PHBR-I, PHBR-II, and sex or age were examined to assess their contribution to immune selection. In both models, it was found that PHBR-I and PHBR-II scores alone had significant effects on the probability of a mutation to be a target of immune selection (Table 5). Positive coefficients for both PHBR scores indicate that the higher the PHBR score (i.e., poorer presentation), the higher the probability of mutation. Furthermore, when the influence of both scores on probability of mutation was quantified using odds ratios between respective 25th and 75th percentiles, it was found that PHBR-II (OR: 2.11, CI [2.01, 2.20]) has a much larger impact on probability of mutation than PHBR-I (OR: 1.25, CI [1.23, 1.27]), echoing the larger effect sizes seen in
As females and younger patients both demonstrated stronger immunoediting compared to males and older patients, the cohorts were further segregated simultaneously by sex and age, and the distributions of PHBR-I and -II scores were investigated for these groups. It was found that sex and age effects are cumulative, with tumors in younger females exhibiting significantly higher selective pressure by MHC than those in the other three groups (
It was next explored whether sex- and age-specific effects could be driven by differences in environmental exposure rather than the strength of immunoediting. Mutational signatures assign specific mutations to different mutagenic processes, allowing the exploration of differences in environmental exposure across sex and age. The sex-specific occurrence of mutational signatures was compared in each tumor type, and only a minority of instances were found where signature strength was weakly but significantly associated with sex (
We sought validation of these findings in a cohort of 465 MHC-I typed patients and 426 MHC-II typed patients, compiled from published dbGaP studies and non-TCGA samples in the International Cancer Genome Consortium (ICGC) database (Zhang et al., 2011, Database, bar026) and filtered to exclude tumor types not represented in TCGA. While fewer tumor types were represented relative to the discovery cohort, these patients were diverse with respect to sex and age at diagnosis, with slightly more males than females, and similar average numbers of driver mutations and PHBR score distributions for all patient groups (
It was found, as in the discovery cohort, that driver mutations had significantly poorer MHC-II presentation in younger females compared to older females and older males (p<2.16e-05, p<0.001), and trended toward significance relative to younger males (p<0.29) (
It is to be understood that, while the methods and compositions of matter have been described herein in conjunction with a number of different aspects, the foregoing description of the various aspects is intended to illustrate and not limit the scope of the methods and compositions of matter. Other aspects, advantages, and modifications are within the scope of the following claims.
Disclosed are methods and compositions that can be used for, can be used in conjunction with, can be used in preparation for, or are products of the disclosed methods and compositions. These and other materials are disclosed herein, and it is understood that combinations, subsets, interactions, groups, etc. of these methods and compositions are disclosed. That is, while specific reference to each various individual and collective combinations and permutations of these compositions and methods may not be explicitly disclosed, each is specifically contemplated and described herein. For example, if a particular composition of matter or a particular method is disclosed and discussed and a number of compositions or methods are discussed, each and every combination and permutation of the compositions and the methods are specifically contemplated unless specifically indicated to the contrary. Likewise, any subset or combination of these is also specifically contemplated and disclosed.
This application claims the benefit of priority under 35 U.S.C. 119(e) to U.S. Application No. 62/722,607 filed Aug. 24, 2018.
This invention was made with government support under CA220009, OD017937, T15LM011271, DP5-OD017937, P41-GM103504, and 2015205295 awarded by the National Institutes of Health, the National Resource for Network Biology (NRNB), and the National Science Foundation. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2019/047981 | 8/23/2019 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62722607 | Aug 2018 | US |