EPIGENOMIC PROFILING REVEALS THE SOMATIC PROMOTER LANDSCAPE OF PRIMARY GASTRIC ADENOCARCINOMA

Information

  • Patent Application
  • Publication Number
    20210301348
  • Date Filed
    February 16, 2017
  • Date Published
    September 30, 2021
Abstract
The present invention relates to a method for determining the presence or absence of at least one promoter in a cancerous biological sample relative to a non-cancerous biological sample. The present invention also relates to a method for determining the prognosis of cancer in a subject, a method for modulating the activity of at least one cancer-associated promoter in a cell, a method for modulating the immune response of a subject to cancer, a method for determining the presence of at least one cancer-associated promoter in a cancerous biological sample relative to a non-cancerous biological sample and a biomarker for detecting cancer in a subject.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority of Singapore application No. 10201601142V, filed 16 Feb. 2016, the contents of which are hereby incorporated by reference in their entirety for all purposes.


FIELD OF THE INVENTION

The invention relates to a method for determining the presence or absence of at least one promoter in a cancerous biological sample relative to a non-cancerous biological sample.


BACKGROUND OF THE INVENTION

Gastric cancer (GC) is the third leading cause of global cancer mortality with high prevalence in many East Asian countries. GC patients often present with late-stage disease, and clinical management remains challenging as exemplified by several recent negative Phase II and Phase III clinical trials. At the molecular level, studies have identified characteristic gene mutations, copy number alterations, gene fusions, and transcriptional patterns in GC. However, few of these have been clinically translated into targeted therapies, with the exception of trastuzumab for HER2-positive GC. There is thus a strong need for additional and more comprehensive explorations of GC, as these may highlight new biomarkers for disease detection, predicting patient prognosis or responses to therapy, as well as new therapeutic modalities.


Promoter elements are cis-regulatory elements which function to link gene transcription initiation to upstream regulatory stimuli, integrating inputs from diverse signaling pathways. Promoters represent an important reservoir of biological, functional, and regulatory diversity, as current estimates suggest that 30-50% of genes in the human genome are associated with multiple promoters, which can be selectively activated as a function of developmental lineage and cellular state. Differential usage of alternative promoters causes the generation of distinct 5′ untranslated regions (5′ UTRs) and first exons in transcripts, which in turn can influence mRNA expression levels, translational efficiencies, and generation of different protein isoforms through gain and loss of 5′ coding domains. To date, promoter alterations in cancer have been largely studied on a gene-by-gene basis, and very little is known about the global extent of promoter-level diversity in GC and other solid malignancies.


Accordingly, there is a need for a method of profiling promoter elements in cancer.


SUMMARY

In one aspect there is provided a method for determining the presence or absence of at least one promoter in a cancerous biological sample relative to a non-cancerous biological sample, comprising: contacting the cancerous biological sample with at least one antibody specific for histone modifications H3K4me3 and H3K4me1; isolating nucleic acid from the cancerous biological sample having a signal ratio of H3K4me3 relative to H3K4me1 greater than 1, wherein the isolated nucleic acid comprises at least one region specific to said histone modifications; detecting a signal intensity of H3K4me3 in the isolated nucleic acid; and determining the presence or absence of at least one promoter in the cancerous biological sample based on the change in the signal intensity of H3K4me3 relative to the signal intensity of H3K4me3 in a non-cancerous biological sample.
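
The selection and comparison steps of this aspect can be sketched in Python. This is a minimal illustration under stated assumptions, not the method as practised in the examples: it assumes that normalized H3K4me3 and H3K4me1 signals have already been computed for each candidate region, and the pseudocount, the log2 fold-change cut-off (`min_log2fc`) and all function and field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    k4me3_tumor: float   # normalized H3K4me3 signal in the cancerous sample
    k4me1_tumor: float   # normalized H3K4me1 signal in the cancerous sample
    k4me3_normal: float  # normalized H3K4me3 signal in the non-cancerous sample


def is_promoter_like(r: Region, pseudocount: float = 1.0) -> bool:
    """Keep H3K4me3-hi/H3K4me1-lo regions, i.e. an H3K4me3/H3K4me1 ratio > 1."""
    return (r.k4me3_tumor + pseudocount) / (r.k4me1_tumor + pseudocount) > 1.0


def call_somatic(r: Region, min_log2fc: float = 1.0, pseudocount: float = 1.0) -> str:
    """Label a region from the tumor vs normal change in H3K4me3 signal (hypothetical cut-off)."""
    log2fc = math.log2((r.k4me3_tumor + pseudocount) / (r.k4me3_normal + pseudocount))
    if log2fc >= min_log2fc:
        return "gained"
    if log2fc <= -min_log2fc:
        return "lost"
    return "unaltered"


# Toy signals; the values are invented for illustration only.
regions = [
    Region("A", k4me3_tumor=120, k4me1_tumor=10, k4me3_normal=20),
    Region("B", k4me3_tumor=5, k4me1_tumor=40, k4me3_normal=5),   # ratio < 1, filtered out
    Region("C", k4me3_tumor=15, k4me1_tumor=3, k4me3_normal=90),
    Region("D", k4me3_tumor=50, k4me1_tumor=20, k4me3_normal=48),
]
calls = {r.name: call_somatic(r) for r in regions if is_promoter_like(r)}
print(calls)
```

In the examples, differential (somatic) regions were called with DESeq2 and edgeR rather than with a fixed cut-off (FIG. 9A); the threshold here only makes the tumor-versus-normal comparison concrete.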


In another aspect there is provided a method for determining the prognosis of cancer in a subject, comprising: contacting a cancerous biological sample obtained from the subject with at least one antibody specific for histone modifications H3K4me3 and H3K4me1; isolating nucleic acid from the cancerous biological sample having a signal ratio of H3K4me3 relative to H3K4me1 greater than 1, wherein the isolated nucleic acid comprises at least one region specific to said histone modifications; detecting a signal intensity of H3K4me3 in the isolated nucleic acid; and determining the presence or absence of at least one cancer-associated promoter in the cancerous biological sample based on the change in the signal intensity of H3K4me3 relative to the signal intensity of H3K4me3 in a reference nucleic acid sequence, wherein the presence or absence of the at least one cancer-associated promoter in the cancerous biological sample is indicative of the prognosis of the cancer in the subject.


In another aspect there is provided a biomarker for detecting cancer in a subject, the biomarker comprising at least one promoter having a change in signal intensity of H3K4me3 in a cancerous biological sample relative to a non-cancerous biological sample.


In another aspect there is provided a method for modulating the activity of at least one cancer-associated promoter in a cell, comprising administering an inhibitor of EZH2 to the cell.


In another aspect there is provided a method for modulating the immune response of a subject to cancer, comprising administering to the subject an inhibitor of EZH2, wherein the EZH2 is associated with at least one cancer-associated promoter in the subject.


In another aspect there is provided a method for determining the presence or absence of at least one cancer-associated promoter in a cancerous biological sample relative to a non-cancerous biological sample, comprising: contacting the cancerous biological sample with at least one antibody specific for histone modifications H3K4me3 and H3K4me1; isolating nucleic acid from the cancerous biological sample having a signal ratio of H3K4me3 relative to H3K4me1 greater than 1, wherein the isolated nucleic acid comprises at least one region specific to said histone modifications; detecting a signal intensity of H3K4me3 in the isolated nucleic acid at a read depth of 20M; and determining the presence or absence of at least one cancer-associated promoter in the cancerous biological sample based on the change in the signal intensity of H3K4me3 relative to the signal intensity of H3K4me3 in a non-cancerous biological sample.


In one aspect, there is provided a biomarker comprising at least one promoter having a change in signal intensity of H3K4me3 in a cancerous biological sample relative to a non-cancerous biological sample for use in detecting cancer in a subject.


In one aspect, there is provided a use of a biomarker comprising at least one promoter having a change in signal intensity of H3K4me3 in a cancerous biological sample relative to a non-cancerous biological sample in the manufacture of a medicament for detecting cancer in a subject.


In one aspect, there is provided an inhibitor of EZH2 for use in modulating the activity of at least one cancer-associated promoter in a cell.


In one aspect, there is provided a use of an inhibitor of EZH2 in the manufacture of a medicament for modulating the activity of at least one cancer-associated promoter in a cell.


In one aspect, there is provided an inhibitor of EZH2 for use in modulating the immune response of a subject to cancer, wherein the EZH2 is associated with at least one cancer-associated promoter in the subject.


In one aspect, there is provided a use of an inhibitor of EZH2 in the manufacture of a medicament for modulating the immune response of a subject to cancer, wherein the EZH2 is associated with at least one cancer-associated promoter in the subject.


Definitions

The following are some definitions that may be helpful in understanding the description of the present invention. These are intended as general definitions and should in no way limit the scope of the present invention to those terms alone, but are put forth for a better understanding of the following description.


As used herein, the term “promoter” is intended to refer to a region of DNA that initiates transcription of a particular gene.


As used herein, the term “cancerous” relates to being affected by or showing abnormalities characteristic of cancer.


As used herein, the term “biological sample” refers to a sample of tissue or cells that has been obtained, removed or isolated from a patient. The term “obtained or derived from” as used herein is meant to be used inclusively. That is, it is intended to encompass any nucleotide sequence directly isolated from a biological sample or any nucleotide sequence derived from the sample.


As used herein, the term “antibody” or “antibodies” refers to molecules with an immunoglobulin-like domain and includes antigen binding fragments, monoclonal, recombinant, polyclonal, chimeric, fully human, humanised, bispecific and heteroconjugate antibodies; a single variable domain, single chain Fv, a domain antibody, immunologically effective fragments and diabodies.


The term “specifically binds” as used throughout the present specification in relation to antigen binding proteins means that the antigen binding protein binds to a target epitope on an antigen with a greater affinity than that which results when bound to a non-target epitope. In certain embodiments, specific binding refers to binding to a target with an affinity that is at least 10, 50, 100, 250, 500, or 1000 times greater than the affinity for a non-target epitope. For example, binding affinity may be measured by routine methods, e.g., by competition ELISA or by measurement of Kd with BIACORE™, KINEXA™ or PROTEON™.


As used herein, the term “isolated” relates to a biological component (such as a nucleic acid molecule, protein or organelle) that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, i.e., other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles. Nucleic acids and proteins that have been “isolated” include nucleic acids and proteins purified by standard purification methods. The term also embraces nucleic acids and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acids.


As used herein, the term “nucleic acid” refers to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form and, unless otherwise limited, encompasses known analogues of natural nucleotides that hybridize to nucleic acids in a manner similar to naturally occurring nucleotides. “Nucleotide” includes, but is not limited to, a monomer that includes a base linked to a sugar, such as a pyrimidine, purine or synthetic analogs thereof, or a base linked to an amino acid, as in a peptide nucleic acid (PNA). A nucleotide is one monomer in a polynucleotide. A nucleotide sequence refers to the sequence of bases in a polynucleotide.


As used herein, the term “prognosis” or grammatical variants thereof refers to a prediction of the probable course and outcome of a clinical condition or disease. A prognosis of a patient is usually made by evaluating factors or symptoms of a disease that are indicative of a favorable or unfavorable course or outcome of the disease. The term “prognosis” does not refer to the ability to predict the course or outcome of a condition with 100% accuracy. Instead, the term “prognosis” refers to an increased probability that a certain course or outcome will occur; that is, that a course or outcome is more likely to occur in a patient exhibiting a given condition, when compared to those individuals not exhibiting the condition.


As used herein, the term “modulating” is intended to refer to an adjustment of the immune response to a desired level.


As used herein, the term “annotated promoter” refers to a promoter mapping close (<500 bp) to a known Gencode transcription start site (TSS).


The term “unannotated promoter” refers to a promoter mapping to genomic regions devoid of known Gencode TSSs.
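
A minimal sketch of how promoters might be labelled annotated or unannotated under these definitions, assuming each promoter is summarised by a single coordinate (for example a peak summit) and that Gencode TSS coordinates have already been loaded into sorted per-chromosome lists. Strand is ignored, and the function names and toy coordinates are hypothetical.

```python
import bisect


def nearest_tss_distance(pos, tss_sorted):
    """Distance in bp from a promoter coordinate to the closest TSS in a sorted list."""
    i = bisect.bisect_left(tss_sorted, pos)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - pos)      # closest TSS at or after pos
    if i > 0:
        candidates.append(pos - tss_sorted[i - 1])  # closest TSS before pos
    return min(candidates) if candidates else float("inf")


def classify_promoters(promoters, tss_by_chrom, max_dist=500):
    """Label each (name, chrom, pos) as 'annotated' (< max_dist bp from a TSS) or 'unannotated'."""
    labels = {}
    for name, chrom, pos in promoters:
        d = nearest_tss_distance(pos, tss_by_chrom.get(chrom, []))
        labels[name] = "annotated" if d < max_dist else "unannotated"
    return labels


tss_by_chrom = {"chr1": sorted([1000, 5000, 20000])}  # toy Gencode-style TSS coordinates
promoters = [("p1", "chr1", 1200), ("p2", "chr1", 12000), ("p3", "chr2", 300)]
labels = classify_promoters(promoters, tss_by_chrom)
print(labels)
```

A chromosome with no TSS entries yields an infinite distance, so any promoter on it is treated as unannotated.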


As used herein, the term “canonical” in the context of a promoter refers to a promoter region exhibiting unaltered H3K4me3 peaks.


As used herein, the term “detectable label” or “reporter” refers to detectable marker or reporter molecules, which can be attached to nucleic acids. Typical labels include fluorophores, radioactive isotopes, ligands, chemiluminescent agents, metal sols and colloids, and enzymes. Methods for labeling and guidance in the choice of labels useful for various purposes are discussed, e.g., in Sambrook et al., in Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1989) and Ausubel et al., in Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley-Intersciences (1987).


As used herein, the term “hypomethylated” refers to a decrease in the normal methylation level of DNA.


As used herein, the term “hypermethylated” refers to an increase in the normal methylation level of DNA.


As used herein, the term “about”, in the context of concentrations of components of the formulations, typically means +/−5% of the stated value, more typically +/−4% of the stated value, more typically +/−3% of the stated value, more typically, +/−2% of the stated value, even more typically +/−1% of the stated value, and even more typically +/−0.5% of the stated value.


Throughout this disclosure, certain embodiments may be disclosed in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosed ranges. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.


Certain embodiments may also be described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also forms part of the disclosure. This includes the generic description of the embodiments with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.


Unless the context requires otherwise or specifically stated to the contrary, integers, steps, or elements of the invention recited herein as singular integers, steps or elements clearly encompass both singular and plural forms of the recited integers, steps or elements.


The word “substantially” does not exclude “completely” e.g. a composition which is “substantially free” from Y may be completely free from Y. Where necessary, the word “substantially” may be omitted from the definition of the invention.


The invention illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms “comprising”, “including”, “containing”, etc. shall be read expansively and without limitation. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the inventions embodied therein herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention.


The invention has been described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also forms part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.


Other embodiments are within the following claims and non-limiting examples. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.





BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:



FIG. 1: Somatic Promoter Alterations in Primary Gastric Adenocarcinoma.


A) Example of an unaltered GC promoter. The UCSC genome track of the RHOA TSS (shaded box) highlights similar H3K4me3 signals in GC and matched normal samples. Similar signals are seen in GC lines. The bottom two tracks display similar levels of RNA expression in the same GC and matched normal sample (RNAseq).


B) Example of a gained somatic promoter. The UCSC genome track of the CEACAM6 TSS (shaded box) highlights gain of H3K4me3 signals in GC samples and GC lines, compared to matched normal samples. In contrast, no changes are observed at the TSS of CEACAM5, an adjacent gene. Concordant tumor-specific gain of RNA expression is shown in the bottom 2 tracks displaying RNA-seq profiles of the same GC and matched normal samples.


C) Example of a lost somatic promoter. The UCSC genome track of the ATP4A TSS (shaded box) highlights loss of H3K4me3 signals in GC samples and GC lines compared to matched normal samples. Concordant tumor-specific loss of RNA expression is shown in the bottom 2 tracks displaying RNA-seq profiles of the same GC and gastric normal samples.


D) Heatmap of H3K4me3 read densities (row scaled) of somatic promoters (rows) in primary GCs and matched normal samples.


E) Correlation between H3K4me3 promoter signals and H3K27ac activity signals in primary gastric samples (r=0.91, P<0.001). Each data point corresponds to a single H3K4me3 hi/H3K4me1 lo region. Analysis was performed using data from 16 N/T pairs (Table 4).


F) Top 5 gene sets associated with canonical gained and lost somatic promoters. Gene sets associated with genes up- and downregulated in GC are rediscovered. Gene sets related to H3K27me3 and to SUZ12, a PRC2 component, are also enriched.



FIG. 2: Association of Somatic Promoter Alterations with Gene Expression in GC and Other Tumor Types


A) Example of a GC somatic promoter. Example is for illustrative purposes only.


B) Changes in RNA-seq expression (top) and DNA methylation (bottom) in discovery samples between somatic promoters and all promoters. Top—Boxplot depicting changes in RNA-seq expression between 9 paired primary GC and gastric normal samples at genomic regions exhibiting somatic promoters (gained and lost) (***P<0.001, Wilcoxon Test). Bottom—Boxplot depicting changes in DNA methylation (β-values) at regions exhibiting somatic promoters between 20 paired GC and gastric normal samples, compared to all promoters. (***P<0.001, Wilcoxon test)
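
The Wilcoxon rank-sum comparisons used in these panels can be sketched with the standard library. This is a simplified version, assuming a two-sided test with the normal approximation and no tie correction of the variance; the exact test variant and software used for the figures are not stated, and in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used. Function names and toy values are hypothetical.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to a 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test via the normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = rank_with_ties(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


# Invented per-gene expression changes: somatic promoters (a) vs all promoters (b).
a = [2.1, 1.8, 2.5, 3.0, 1.9]
b = [0.1, 0.3, 0.2, 0.5, 0.4, 0.0]
u1, z, p = rank_sum_test(a, b)
print(u1, round(z, 3), round(p, 4))
```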


C) Independent Validation Cohorts. Boxplot depicting changes in RNA-seq expression at genomic regions exhibiting somatic promoters across 354 (321 GC, 33 normal) TCGA Stomach adenocarcinoma (STAD) samples, compared to all promoters (***P<0.001, Wilcoxon test)


D) Somatic Promoters in Other Cancer Types. Boxplot depicting changes in RNA-seq expression at genomic regions exhibiting GC somatic promoters compared against all promoters, across 326 TCGA Colon adenocarcinoma (COAD) samples (286 COAD, 40 normal; ***P<0.001, Wilcoxon test), 170 TCGA kidney renal clear cell carcinoma (ccRCC) samples (98 ccRCC and 72 normal; ***P<0.001, Wilcoxon test), and 115 TCGA lung adenocarcinoma (LUAD) samples (58 LUAD, 57 normal; ***P<0.001 somatic gain vs all promoters and somatic gain vs. somatic loss, Wilcoxon test).



FIG. 3: Alternative Promoters in GC


A) UCSC browser track of the HNF4α gene. GC and matched gastric normal samples have equal H3K4me3 signals at the canonical HNF4α promoter. However, an alternative promoter, seen by H3K4me3 gain, can be observed at a downstream TSS in GCs compared to matched normals. At the RNA level, both in-house and TCGA STAD samples also show gain of gene expression at the alternate promoter TSS compared to normal samples.


B) UCSC browser track of the EPCAM gene. Another example of alternative promoter usage at a downstream TSS. Gain of H3K4me3 is observed at a TSS downstream of the canonical promoter, while the canonical promoter exhibits equal H3K4me3 signals in GC and gastric normal. Gain of RNA-seq expression can also be observed in GC at the alternative promoter driven transcript in both in-house and TCGA STAD samples.


C) UCSC browser track of the RASA3 gene, demonstrating H3K4me3 and RNA-seq signals that highlight gain of promoter activity at an unannotated TSS (dark grey box) corresponding to a novel N-terminally truncated RASA3 transcript. Expression of this variant transcript was validated by 5′ RACE in GC lines (bottom).


D) Functional domains of the translated RASA3 canonical and alternate isoforms. The alternate transcript is predicted to encode a RASA3 protein missing the RASGAP domain.


E) Effect of overexpression of RASA3 canonical (CanT) and alternate (SomT) isoforms on the migration capability of SNU1967 (top) and GES1 (bottom) cells. Representative images of RASA3-Ctl (empty vector), RASA3-CanT and RASA3-SomT in migration assays (n=3). Barplots show the % area of migrated cells relative to the area of the transwell membrane. Data are shown as mean±SD; n=3. (*P<0.05, **P<0.01, ***P<0.001, one-sided Student's t-test)



FIG. 4: Somatic Promoter Alterations Exhibit Immunoediting Signatures


A) Schematic outlining alternative promoter usage leading to alternative transcript usage (Transcript box) and N terminally truncated protein isoforms (protein box).


B) Barplot showing the average % of peptides with predicted high-affinity binding to MHC Class I (HLA-A, B, and C, IC50 ≤ 50 nM). N-terminal peptides associated with recurrent somatic promoters (alternative promoters) show significantly enriched predicted MHC I binding compared to canonical GC peptides (P<0.01, Fisher's test), random peptides from the human proteome (P<0.001) and C-terminal peptides (P<0.01) derived from the same genes exhibiting the N-terminal alterations. Canonical peptides refer to peptides derived from protein coding genes overexpressed in GC through non-alternative promoters.


C) Percentage (%) of high affinity peptides predicted to bind different HLA-alleles categorized by somatic gain or loss. Most alleles have a greater number of N-terminal lost peptides predicted to have high binding affinity.


D) Quantification of somatic promoter expression using Nanostring profiling. Top—Distinct Nanostring probes were designed to measure expression of alternate and canonical promoter driven transcripts. 2 probes were designed for each gene—a canonical probe at the 5′ transcript marked by unaltered H3K4me3, and an alternate probe at the 5′ transcript of the somatic promoter. Bottom—Heatmap of alternative promoter expression from 95 GCs and matched normal samples. GC samples have been ordered left to right by their levels of somatic promoter usage.
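
The application does not specify how a per-sample level of somatic promoter usage is computed from the paired Nanostring probes. One hypothetical way, shown only to make the idea concrete, is to average the log2 ratio of alternate-probe to canonical-probe counts across genes and then take the top and bottom 25% of samples, as in the survival comparison of panel E. The scoring formula, pseudocount, names and counts below are all assumptions.

```python
import math
import statistics


def somatic_promoter_usage(alt_counts, can_counts, pseudocount=1.0):
    """Hypothetical score: mean log2(alternate / canonical) probe count ratio over genes."""
    if len(alt_counts) != len(can_counts):
        raise ValueError("each gene needs both an alternate and a canonical probe count")
    return statistics.fmean(
        math.log2((a + pseudocount) / (c + pseudocount))
        for a, c in zip(alt_counts, can_counts)
    )


def split_extreme_quartiles(scores):
    """Return (high, low): sample ids in the top 25% and bottom 25% of scores."""
    ranked = sorted(scores, key=scores.get)  # ascending by score
    k = len(ranked) // 4
    return ranked[len(ranked) - k:], ranked[:k]


# Toy counts per sample: (alternate probes, canonical probes), genes in the same order.
probes = {
    "GC1": ([31, 15], [1, 1]),
    "GC2": ([1, 3], [7, 3]),
    "GC3": ([3, 7], [3, 7]),
    "GC4": ([7, 1], [1, 1]),
}
scores = {s: somatic_promoter_usage(alt, can) for s, (alt, can) in probes.items()}
high, low = split_extreme_quartiles(scores)
print(scores, high, low)
```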


E) Association between Somatic Promoters and T-cell immune correlates (Singapore (SG) cohort). Top left: Expression of the T-cell marker CD8A (P=0.1443) and the T-cell cytolytic markers GZMA (P=0.0001) and PRF1 (P=0.00806) in GC samples with either high or low somatic promoter usage (SG). Samples with high alternative promoter usage show lower expression of immune markers. All P values are from one-sided Wilcoxon tests. Right: Kaplan-Meier analysis comparing overall survival curves between validation samples with high somatic promoter usage (top 25%) and low somatic promoter usage (bottom 25%) (HR=2.56, P=0.02).
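
The survival curves in panel E are Kaplan-Meier estimates. The product-limit estimator can be sketched in a few lines; this illustration computes only the curve for one group (for example the top 25% of samples by somatic promoter usage) and does not reproduce the hazard ratio or P value, which would require a Cox model or a log-rank test. Function and variable names and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time per patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, S(t)) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group patients sharing this time
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored  # everyone at time t leaves the risk set
    return curve


curve = kaplan_meier([2, 3, 3, 5, 8, 10], [1, 1, 0, 1, 0, 1])
print(curve)
```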


F) Association of Somatic Promoters with T-cell Correlates in TCGA and ACRG Cohorts. (Left) Expression of T-cell markers CD8A (P=0.02), GZMA (P=0.01) and PRF1 (P=0.03) in TCGA STAD samples with either high or low somatic promoter usage. T-cell markers were evaluated by RNA-seq (transcripts per million). (Right) Expression of T-cell markers CD8A (P=0.035), GZMA (P=0.001) and PRF1 (P=0.025) in ACRG GC samples with either high or low somatic promoter usage. All P values are from one-sided Wilcoxon tests.


G) EpiMAX Heatmap of total cytokine responses (Fold change relative to Actin) for 15 peptide pools against 9 donors.


H) Individual cytokine responses against 15 peptides for two individual donors (Donor 2 and Donor 3) showing complex cytokine responses (FC2).



FIG. 5: Somatic Promoters are Associated with EZH2 Occupancy


A) Binding enrichment of ReMap-defined TFBSs at genomic regions exhibiting somatic promoters. TFs were sorted according to their binding frequency at all H3K4me3-defined promoter regions. EZH2 and SUZ12 binding sites significantly overlap regions exhibiting somatic promoters (gained and lost) (P<0.01, Empirical distribution test).
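
The application does not describe the null model behind the empirical distribution test. A minimal sketch, assuming the null is obtained by repeatedly drawing promoter sets of the same size as the somatic set from all H3K4me3-defined promoters and counting how many overlap a given TF's binding sites (precomputed here as a set of promoter ids, `bound`). All names, the permutation count and the toy ids are hypothetical.

```python
import random


def empirical_p_value(observed, null_draws):
    """One-sided empirical P: share of null draws at least as large as observed (+1 smoothing)."""
    exceed = sum(1 for x in null_draws if x >= observed)
    return (exceed + 1) / (len(null_draws) + 1)


def binding_enrichment(bound, somatic, all_promoters, n_perm=1000, seed=0):
    """Observed overlap of somatic promoters with TF-bound promoters, and its empirical P."""
    rng = random.Random(seed)
    pool = sorted(all_promoters)  # fixed order so the seed gives reproducible draws
    observed = len(bound & somatic)
    null = [len(bound.intersection(rng.sample(pool, len(somatic)))) for _ in range(n_perm)]
    return observed, empirical_p_value(observed, null)


all_promoters = set(range(1000))     # toy promoter ids
bound = set(range(0, 1000, 10))      # 10% of promoters overlap the TF's sites
somatic = set(range(0, 200, 2))      # 100 "somatic" promoters, 20 of them bound
observed, p = binding_enrichment(bound, somatic, all_promoters)
print(observed, p)
```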


B) Proportion of RNA transcripts associated with somatic promoters changing upon GSK126 treatment in IM95 cells, compared to RNA transcripts associated with unaltered promoters. The top somatic promoter figure is for illustrative purposes only. Unaltered promoters were defined as all gene promoters except the somatic promoters. The proportion of genes changing upon treatment, as a proportion of all genes, is also shown. Somatic promoters are more likely to change expression after GSK126 treatment relative to unaltered promoters (OR 1.46, P<0.001) or all GSK126 regulated genes (OR 9.21, P<0.001, Fisher Test)
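
The odds ratios and Fisher test in panel B come from 2x2 tables of promoter class (somatic versus unaltered) against transcript response to GSK126 (changed versus unchanged). The underlying counts are not given here, so the sketch below only shows the calculation, using the standard conditional hypergeometric form of the two-sided Fisher exact test; `scipy.stats.fisher_exact` provides an equivalent library routine. The function name is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Sample odds ratio and two-sided Fisher exact P for the table [[a, b], [c, d]].

    Rows: somatic vs unaltered promoters; columns: changed vs unchanged after treatment.
    The odds ratio is reported as infinite when b * c is zero.
    """
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Hypergeometric probability of each possible top-left count with all margins fixed.
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Two-sided P: sum over tables no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p = sum(pr for pr in probs.values() if pr <= p_obs * (1 + 1e-7))
    return odds_ratio, min(p, 1.0)


# Fisher's classic "lady tasting tea" table, used only to check the arithmetic.
odds_ratio, p = fisher_exact_2x2(3, 1, 1, 3)
print(odds_ratio, round(p, 4))
```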


C) UCSC browser track of the SLC9A9 TSS, a gene with loss of promoter activity. Gain of expression is seen after inhibition of EZH2 using GSK126 in IM95 cells at both day 6 (D6) and day 9 (D9) of treatment.


D) UCSC browser track of the PSCA TSS, another gene with loss of promoter activity. Gain of expression is seen after inhibition of EZH2 using GSK126 in IM95 cells at both day 6 (D6) and day 9 (D9) of treatment.



FIG. 6: Somatic promoters reveal novel cancer-associated transcripts


A) Distribution of distances to the nearest annotated TSS for different promoter categories. (left) The first barplot shows distance distributions for promoters present in gastric normal tissues, the second for promoters present in GC samples, and the third for promoters exhibiting somatic alterations (i.e. different in tumor vs normal). (right) The barplots present distance distributions associated with either lost or gained somatic promoters. A substantial proportion of gained somatic promoters occupy locations distant from previously annotated TSSs.


B) Median functional scores of unannotated promoters as predicted by GenoSkyline across 7 different tissues. Unannotated promoters exhibited high functional scores for GI, fetal and ESC tissues.


C) Boxplot depicting average RNA-seq reads at CAGE-validated promoters, comparing all promoters with somatic promoters, both supported by CAGE data (**P<0.001, one-sided Wilcoxon test). Somatic promoters are observed to have lower levels of RNA-seq expression.


D) Cartoon depicting the proposed effects of dynamic range on the sensitivity of NanoChIP-seq and RNA-seq for detecting lowly expressed transcripts. Because ChIP-seq signals span a more restricted dynamic range, epigenomic profiling may detect active promoters that are missed by RNA-seq, in which random sampling of reads is dominated by abundantly expressed genes.


E) Down- and up-sampling analysis. The y-axis depicts the number of transcripts detected that overlap either all promoters or somatic promoters at varying RNA-sequencing depths. Original primary sample RNA-seq data was sequenced at ~106M reads, which was down-sampled to 20M, 40M and 60M reads. Deep RNA-seq data was additionally generated at ~139M read depth.
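
Down-sampling of this kind can be sketched by drawing reads without replacement from the original library until the target depth is reached, then recording which transcripts still receive at least one read. This toy version expands counts into individual reads, so it only suits small examples; at ~106M reads one would subsample the alignment files directly (for example with `samtools view -s`). The function names, the one-read detection threshold and the toy counts are assumptions.

```python
import random
from collections import Counter


def downsample_counts(counts, target_depth, seed=0):
    """Subsample reads without replacement to a fixed total depth.

    counts -- mapping of transcript id -> read count at the original depth
    """
    reads = [tx for tx, n in counts.items() for _ in range(n)]
    if target_depth > len(reads):
        raise ValueError("target depth exceeds the original depth")
    rng = random.Random(seed)
    return Counter(rng.sample(reads, target_depth))


def detected(counts, min_reads=1):
    """Transcripts supported by at least min_reads reads."""
    return {tx for tx, n in counts.items() if n >= min_reads}


original = {"high": 900, "mid": 90, "low": 10}  # invented counts, 1000 reads in total
for depth in (1000, 100, 10):
    sub = downsample_counts(original, depth)
    print(depth, sorted(detected(sub)))
```

Lowly expressed transcripts such as `low` are the first to drop below the detection threshold as depth falls, which is the effect the panel examines.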


F) Cancer-associated transcripts detected at deep but not regular RNA-seq depth. The UCSC genome browser track for ABCA13 shows an example of a novel transcript detected by NanoChIP-seq at a read depth of 20M but detected by RNA-sequencing only at a read depth of ~139M (Deep sequencing GC). This transcript is not detected by regular-depth RNA-seq (GC).



FIG. 7: Chromatin Profiles of Primary GC


A) Chromatin profiles of primary GCs, matched normal gastric mucosae, and GC cell lines for 3 marks (H3K4me3, H3K27ac and H3K4me1). Shown are UCSC genome browser tracks of the GC driver gene MYC highlighting strong H3K4me3 and H3K27ac signals and low H3K4me1 at promoter locations.


B) H3K4me3, H3K27ac and H3K4me1 signal distributions at transcription start sites (TSS). Line plots show the distribution of chromatin signals for H3K4me3 hi/H3K4me1 lo regions at TSS regions (+/−3 kb). Heatmaps were plotted using ngs.plot(6) for the top 10,000 H3K4me3 hi/H3K4me1 lo regions.
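
The profiles in panel B were drawn with ngs.plot. As a rough, hypothetical stand-in, the core computation (averaging signal in fixed-width bins across a TSS-centred window, and flipping minus-strand windows so that downstream always lies to the right) can be sketched as follows, assuming per-base coverage is held in memory as plain lists. The function name, window sizes and toy track are illustrative assumptions.

```python
def tss_profile(coverage, tss_list, flank=3000, bin_size=100):
    """Average per-bin signal across TSS +/- flank, strand-aware.

    coverage -- dict of chrom -> list of per-base signal values (0-based positions)
    tss_list -- iterable of (chrom, position, strand) with strand '+' or '-'
    """
    n_bins = 2 * flank // bin_size
    totals = [0.0] * n_bins
    used = 0
    for chrom, pos, strand in tss_list:
        track = coverage[chrom]
        start, end = pos - flank, pos + flank
        if start < 0 or end > len(track):
            continue  # skip TSSs whose window runs off the chromosome
        window = track[start:end]
        if strand == "-":
            window = window[::-1]  # orient so that downstream is always on the right
        for b in range(n_bins):
            chunk = window[b * bin_size:(b + 1) * bin_size]
            totals[b] += sum(chunk) / bin_size
        used += 1
    return [t / used for t in totals] if used else totals


# Toy track: signal of 1.0 over bases 5000-5099, downstream of a '+' TSS at 5000
# and of a '-' TSS at 5100.
track = [0.0] * 10000
for i in range(5000, 5100):
    track[i] = 1.0
profile = tss_profile({"chr1": track}, [("chr1", 5000, "+"), ("chr1", 5100, "-")],
                      flank=300, bin_size=100)
print(profile)
```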


C) Density distributions of H3K4me3:H3K4me1 ratios at identified H3K4me3 regions. All regions with H3K4me3/H3K4me1 ratios >1 (73%) were selected for further analysis.


D) Distribution of H3K4me3 hi/H3K4me1 lo regions against representative gene body features (top). The arrow represents the TSS.


E) Enrichment of H3K4me3 hi/H3K4me1 lo regions against 15 chromatin states (columns) defined in different gastrointestinal tissues from the Epigenome Roadmap database (rows). Each column is scaled from 0 to 1.


F) Overlap of H3K4me3 hi/H3K4me1 lo regions with FANTOM5 CAGE data.



FIG. 8: Epithelial features of GC promoters


A) Spearman correlation heatmap between H3K4me3 signals of primary GC, gastric normal samples (red type, highlighted by red arrow) and various tissue types from the Epigenome Roadmap database across all H3K4me3 hi/H3K4me1 lo regions.


B) Overlap of H3K4me3 hi/H3K4me1 lo regions with H3K4me3 regions identified in GC cell lines (87%), gastrointestinal fibroblast cells (61%) and colon carcinoma lines (74%)



FIG. 9: GC Somatic Promoter Features


A) Differential (somatic) H3K4me3 regions identified from 2 independent algorithms DESeq2 and edgeR. 96% of regions identified from DESeq2 overlapped those identified using edgeR. Both sets were pooled for subsequent analysis.
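
The overlap statistic and the pooling step can be sketched with half-open (chrom, start, end) intervals. How the two sets were pooled is not specified, so this sketch assumes a simple union in which overlapping or abutting intervals are merged; the quadratic overlap search is for clarity only (interval trees or tools such as bedtools would be used at genome scale). Function names and toy regions are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(query, reference):
    """Fraction of query regions overlapping at least one reference region (O(n*m))."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)


def pool_regions(*region_sets):
    """Union of region sets, merging overlapping or abutting intervals per chromosome."""
    merged = []
    for chrom, start, end in sorted(r for s in region_sets for r in s):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


deseq2 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
edger = [("chr1", 150, 250), ("chr2", 60, 90), ("chr3", 10, 20)]
print(fraction_overlapping(deseq2, edger), pool_regions(deseq2, edger))
```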


B) Principal component analysis of 16 GC and gastric normal samples based on somatic promoters


C) Heatmap of H3K27ac read densities across 16 GC and gastric normal samples across 1959 somatic promoters.


D) Correlation between H3K4me3 promoter signals and H3K27ac activity signals in primary gastric samples for gained somatic (Left, r=0.78, p<0.001) and lost somatic (Right, r=0.82, p<0.001) promoters. Each data point corresponds to a single H3K4me3 hi/H3K4me1 lo region. Analysis was performed using data from 16 N/T pairs (Table 4).


E) Volcano plot of somatic promoters (top) highlighting the dynamic range of fold-change differences (x-axis) and the false discovery rate (FDR)-adjusted significance (−log10 scale, y-axis). The majority of the somatic promoters lie between FC 1 and 2.82, which likely reflects the dynamic range of ChIP-seq. The table (bottom) lists the number of somatic promoters identified at differing levels of stringency. Despite varying FDR thresholds, the majority of differential peaks are still preserved (e.g. 59% at q<0.01).
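
The FDR thresholds in the table refer to adjusted q values. DESeq2 (`results()`) and edgeR (`topTags()`) report Benjamini-Hochberg adjusted values by default, and it is assumed here that those defaults were used. The sketch below shows the Benjamini-Hochberg adjustment and a count of regions passing a given fold-change and q-value threshold; names and toy values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR-adjusted q values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # largest p first
    q = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending p order
        running_min = min(running_min, pvalues[i] * m / rank)  # enforce monotonicity
        q[i] = running_min
    return q


def count_passing(log2fc, qvalues, min_abs_log2fc, max_q):
    """Number of regions with |log2 FC| >= min_abs_log2fc and q < max_q."""
    return sum(1 for fc, q in zip(log2fc, qvalues) if abs(fc) >= min_abs_log2fc and q < max_q)


pvals = [0.01, 0.04, 0.03, 0.005]  # invented raw P values
log2fc = [1.5, -2.0, 0.5, 3.0]     # invented fold changes
q = bh_adjust(pvals)
for max_q in (0.05, 0.03):
    print(max_q, count_passing(log2fc, q, 1.0, max_q))
```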


F) Enrichment analysis of somatic promoters at varying fold change and FDR (q value) thresholds for the top 5 gene sets (FIG. 1F) associated with gained (red) and lost (blue) somatic promoters. The x-axis shows the −log10 p value for gene sets enriched in each subset of somatic promoters. Even at stricter fold change (FC 2) and q-value thresholds (0.05, 0.01 and 0.001), similar GC-specific and PRC2-associated signatures are observed.



FIG. 10: Association of Somatic Promoters with Gene Expression in GC and Other Tumor Types


A) Example of a GC somatic promoter. Example is for illustrative purposes only.


B) Changes in RNA-seq expression (top) and DNA methylation (bottom) in discovery samples at somatic promoters compared with unaltered promoters. Top: Boxplot depicting changes in RNA-seq expression between 9 paired primary GC and gastric normal samples at genomic regions exhibiting somatic promoters (gained and lost) (***P<0.001, Wilcoxon test). Bottom: Boxplot depicting changes in DNA methylation (β-values) at regions exhibiting somatic promoters between 20 paired GC and gastric normal samples, compared to unaltered promoters (***P<0.001, Wilcoxon test).


C) Independent Validation Cohorts. Boxplot depicting changes in RNA-seq expression at genomic regions exhibiting somatic promoters across 354 (321 GC, 33 normal) TCGA Stomach adenocarcinoma (STAD) samples, compared to unaltered promoters (***P<0.001, Wilcoxon test)


D) Somatic Promoters in Other Cancer Types. Boxplot depicting changes in RNA-seq expression at genomic regions exhibiting GC somatic promoters compared to unaltered promoters, across 328 TCGA Colon adenocarcinoma (COAD) samples (286 COAD, 40 normal; ***P<0.001, Wilcoxon test), 170 TCGA kidney renal clear cell carcinoma (ccRCC) samples (98 ccRCC and 72 normal; ***P<0.001, Wilcoxon test), and 115 TCGA lung adenocarcinoma (LUAD) samples (58 LUAD, 57 normal; ***P<0.001 Somatic gain vs unaltered and somatic gain vs somatic loss, *P<0.05 Somatic loss vs unaltered, Wilcoxon test).



FIG. 11: Changes in DNA methylation at CpG island containing promoters


A) Boxplot depicting changes in DNA methylation (β-values) at CpG island bearing somatic promoters between 20 paired GC and gastric normal samples, compared to all promoters bearing CpG islands (**P<0.001, Wilcoxon test)



FIG. 12: Expression distribution of alternative and canonical isoforms


A) Barplot showing the distribution of T/N ratios of canonical and alternative transcript isoforms for all alternative transcripts (Global, top), HNF4α (middle) and EPCAM (bottom) using four independent quantification techniques: Cufflinks, MISO, Kallisto and NanoString. The NanoString platform is introduced in FIG. 4 of the Main Text. ++ NanoString analysis is confined to queried probes. (*P<0.05, **P<0.01, ***P<0.001, one-sided Wilcoxon test).


B) Boxplot showing the T/N ratio of N-terminal reads mapping to canonical promoters, compared to N-terminal reads mapping to alternative promoters. Alternative promoter driven transcripts exhibit significantly higher T/N ratios (p=0.04, Wilcoxon one sided test).



FIG. 13: Characterization of RASA3 Isoform


A) UCSC browser track of the RASA3 gene showing H3K4me3 and RNA-seq signals at the Somatic and Canonical TSSs. The Canonical TSS shows similar signals in GC and normal samples, whereas the Somatic TSS shows a gain of promoter activity at an unannotated TSS corresponding to a novel N-terminally truncated RASA3 transcript.


B) UCSC browser track of the RASA3 gene demonstrating RNA-seq signals for the NCC24 GC cell line at Somatic and Canonical TSSs. NCC24 only expresses RASA3 SomT (also see C).


C) Left: Identification of RASA3 SomT and CanT transcripts in NCC24 and NCC59 GC cells by 5′RACE. A third line (MKN1) was negative for RASA3 SomT, as shown in the gel picture. A no-RNA template was run as a negative control. Right: Western blot showing expression of RASA3 SomT protein in NCC24 cells.


D) RAS GTP assays. (left) The Western blot shows levels of RAS in GES1 cells transfected with either empty vector (EV), RASA3 CanT or RASA3 SomT (n=3). GES1 cells were serum-starved overnight followed by serum stimulation for 30 minutes prior to harvest and a RAS-GTP pull down assay. Total RAS was measured in corresponding whole cell protein lysates. β-actin was used as a loading control. Positive (GTP) and negative (GDP) controls from the pull down assay are also shown. (right) The barplot quantifies active RAS intensity from three independent pull-down assays, performed in GES1 cells transfected with either empty vector (EV), RASA3 CanT or RASA3 SomT under FBS exposed conditions. Data is shown as mean±SD; n=3. (*P<0.05, Student's two sided t-test).


E) Cell proliferation assays of SNU1967, GES1 and AGS cells after transfection with RASA3 CanT and SomT normalized to Day 0. (Data is shown as mean±SD performed in triplicate, representative of 3 independent experiments).


F) Effect of overexpression of RASA3 CanT and SomT isoforms on the invasive capability of GES1 and SNU1967 cells. Representative images of EV, RASA3-WT and RASA3-Var in invasion assay (n=3). Barplot showing % area of invaded cells vs the area of transwell membrane. Data is shown as mean±SD; n=3. (*P<0.05, **P<0.01, ***P<0.001, Student's one sided t-test).


G) Effect of overexpression of RASA3 CanT and SomT protein isoforms on the migration capability of highly migratory KRAS mutated AGS cells. Barplot showing % area of migrated cells vs the area of transwell membrane. Data is shown as mean±SD; n=3. (*P<0.05, **P<0.01, ***P<0.001, Student's one sided t-test). RASA3 WT induces more potent migration suppression than RASA3 Var, suggesting that RASA3 WT is a migration inhibitor.


H) siRNA-mediated knockdown of RASA3 SomT in NCC24 cells. Cells were treated with sc-siRNA (control) or one of two RASA3 siRNAs (siRNA-1: hs.Ri.RASA3.13 TriFECTa® Kit DsiRNA; siRNA-3: Silencer® Select Pre-Designed siRNA s355). (Left) Barplots showing fold change differences in RASA3 SomT mRNA expression after treatment with siRNA-1 and siRNA-3. Data is shown as mean±SD; n=3. (Right) Western blotting confirming reduction of RASA3 SomT protein. Cells were harvested and lysed 48 hrs after transfection. (***P<0.001, Student's one sided t-test).


I) Effect of siRNA knockdown of RASA3 SomT isoform on the migration (left) and invasive (right) capability of NCC24 cells from two independent siRNAs. Representative images of sc-siRNA (control), siRNA-1, and siRNA-3 in migration and invasion assays (n=3). Barplot showing % area of migrated/invaded cells vs the area of transwell membrane. Data is shown as mean±SD; n=3. (*P<0.05, **P<0.01, ***P<0.001, Student's one sided t-test).



FIG. 14: Characterization of MET Isoforms


A) UCSC browser track of the MET gene, demonstrating H3K4me3 and RNA-seq signals highlighting gain of promoter activity at an alternative downstream locus (dark grey box).


B) Functional domains of the canonical (WT) and alternative (Var) MET isoforms. The alternative isoform is predicted to encode a MET protein with an N-terminally truncated SEMA domain.


C) Expression of MET (Var) transcripts in GC lines, as detected by 5′RACE.


D) Western blot of HEK293 cells transfected with empty vector (EV), full-length canonical MET (MET-WT) or truncated variant MET (MET-Var) at 0, 15 and 30 minutes of HGF treatment (100 ng/ml) (n=3). GAB1, STAT3 and ERK1/2 are known downstream effectors of MET signaling. Numbers below each band are band intensities quantified using Image Lab. In both untreated and HGF-treated conditions, MET-Var transfected cells exhibited higher levels of p-GAB1 (Y627), a key mediator of MET signaling (2.48-3.95 fold; p=0.003 untreated, p<0.05 at T15 and T30). In untreated samples, cells transfected with MET-Var also exhibited higher pERK1/2 (2.74 fold) and p-STAT3 (Y705) levels (1.80 fold) compared to MET-WT (p=0.023 and p=0.026 for pERK1/2 and p-STAT3 (Y705), respectively).


E) Bar graphs showing increase in pERK1/2 for EV, MET-WT and MET-Var at T0, T15 and T30, reflecting effects of HGF treatment. Data is shown as mean±SD; n=3. (*P<0.05, **P<0.01, ***P<0.001, Student's one sided t-test)


F) Bar graphs showing increase in p-GAB1 (Y627), p-STAT3 (Y705), and pERK1/2 in cells transfected with MET-Var compared to EV and MET-WT. Graphs for all 3 time points are shown. Data is shown as mean±SD; n=3. (*P<0.05, **P<0.01, ***P<0.001, Student's one sided t-test)



FIG. 15: Immunogenicity of N-terminal peptides


A) Barplot showing the average percentage of N-terminal peptides with predicted high-affinity binding to MHC Class I HLA-A (IC50 ≤ 50 nM). For comparison, the figure in the Main Text shows average percentages based on all three HLA class I loci (HLA-A, HLA-B, HLA-C). N-terminal peptides associated with recurrent somatic alternative promoters show significantly enriched predicted MHC I binding compared to canonical GC peptides (p<0.01), and compared to random peptides from the human proteome and C-terminal peptides derived from the same genes exhibiting the N-terminal alterations (p<0.001, Fisher's test).
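A minimal sketch of the binder classification and the Fisher's test comparison follows. It makes two assumptions. First, predicted IC50 values (in nM) are already available from an MHC-binding predictor, which is not named here. Second, two peptide groups are compared as counts of strong versus weaker binders in a 2x2 table. The IC50 values in the example are hypothetical.

```python
from math import comb
from typing import Sequence


def fraction_strong_binders(ic50_nm: Sequence[float], cutoff_nm: float = 50.0) -> float:
    """Fraction of peptides with predicted IC50 at or below the cutoff (in nM)."""
    return sum(1 for v in ic50_nm if v <= cutoff_nm) / len(ic50_nm)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the same margins that
    are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= observed * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


# Hypothetical predicted IC50 values (nM) for two peptide groups.
n_terminal = [12.0, 35.0, 48.0, 300.0, 20.0, 900.0]
c_terminal = [700.0, 45.0, 1500.0, 2200.0, 800.0, 5000.0]
hits_n = sum(v <= 50 for v in n_terminal)
hits_c = sum(v <= 50 for v in c_terminal)
p = fisher_exact_two_sided(hits_n, len(n_terminal) - hits_n,
                           hits_c, len(c_terminal) - hits_c)
print(fraction_strong_binders(n_terminal), fraction_strong_binders(c_terminal), p)
```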


B) MHC Binding Predictions using N-terminal peptides inferred by RNA-seq analysis alone. Annotated transcripts exhibiting different N-terminal exons in GC vs normals were identified using two different RNA-seq algorithms (DEXSeq(7) and Voom-diffsplice(8)) (FC>=2, FDR 0.05). This analysis identified 96 genes with potential alternative N-terminal transcripts, of which 46 (48%) were predicted to result in differing N terminal peptides (Purple bar).



FIG. 16: Immunogenicity Assay and Nanostring Profiling


A) Scatter plot of fold change (T vs N) of expression of alternate and canonical probes from NanoString and RNA-seq data of the same samples. An improved correlation is observed using the alternate probes


B) Left—Expression of T-cell markers CD8A, GZMA and PRF1 in SG series (top), TCGA STAD (middle) and ACRG cohort (bottom) with high or low somatic promoter usage after adjustment of tumor purities as estimated by ASCAT. P values (Wilcoxon one sided test) are: CD8A—p=0.09 (SG), 0.004 (TCGA), 0.3 (ACRG); GZMA—0.0001 (SG), 0.002 (TCGA), 0.166 (ACRG), PRF1—0.013 (SG), 0.006 (TCGA), 0.3 (ACRG). Right—Expression of T-cell markers CD8A, GZMA and PRF1 in SG series (top), TCGA STAD (middle) and ACRG cohort (bottom) with high or low somatic promoter usage after adjustment of tumor content as estimated by ESTIMATE. p values (Wilcoxon one sided test) are: CD8A—p=0.28 (SG), 0.17 (TCGA), 0.37 (ACRG), GZMA—0.0005 (SG), 0.03 (TCGA), 0.09 (ACRG), PRF1—0.02 (SG), 0.22 (TCGA), 0.17 (ACRG). Samples with high alternative promoter usage are in red, while those with low usage are in blue.


C) Kaplan-Meier analysis comparing overall survival curves between validation samples with high somatic promoter usage and low somatic promoter usage (split by median) (HR=1.81, P=0.04)
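The survival comparison in panel C can be approximated by splitting samples at the median somatic promoter usage score and estimating a Kaplan-Meier curve for each group. The sketch below assumes a per-sample usage score is already available. It also assumes that ties at the median are assigned to the low group. It does not reproduce the hazard ratio or P value, which would require a Cox model and a log-rank test.

```python
from statistics import median
from typing import List, Sequence, Tuple


def median_split(scores: Sequence[float]) -> List[bool]:
    """True for 'high' usage (above the median); ties at the median count as low."""
    cut = median(scores)
    return [s > cut for s in scores]


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival) at each time with >= 1 death.

    events[i] is 1 if death was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk, survival, i = len(data), 1.0, 0
    curve: List[Tuple[float, float]] = []
    while i < len(data):
        t, deaths, n_at_t = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            n_at_t += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= n_at_t  # deaths and censored subjects both leave the risk set
    return curve


# Hypothetical usage scores, follow-up times (months) and death indicators.
usage = [0.9, 0.1, 0.7, 0.3, 0.8, 0.2]
months = [10, 40, 12, 35, 8, 50]
died = [1, 0, 1, 1, 1, 0]
high = median_split(usage)
for label, flag in (("high", True), ("low", False)):
    idx = [k for k, h in enumerate(high) if h == flag]
    print(label, kaplan_meier([months[k] for k in idx], [died[k] for k in idx]))
```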


D) Left—Expression of T-cell markers CD8A, GZMA and PRF1 in TCGA STAD with high or low somatic promoter usage after adjustment of mutation burden. P values (Wilcoxon one sided test) are: P=0.02 (CD8A), 0.01 (GZMA) and 0.03 (PRF1). Right—Expression of T-cell markers CD8A, GZMA and PRF1 in ACRG cohort with high or low somatic promoter usage after adjustment of mutation burden. P values (Wilcoxon one sided test) are: P=0.167 (CD8A), 0.009 (GZMA) and 0.03 (PRF1).


E) Heatmap of alternative promoter expression from 264 ACRG GCs for all gained alternative promoters. GC samples have been ordered left to right by their levels of somatic promoter usage.



FIG. 17: Functional Assessment of Peptide Immunogenicity


A) Individual cytokine responses to 15 peptides by PBMCs from additional normal donors, tested against different peptide pools.


B) Experimental Immunogenicity Assay. Experimental design of the in-vitro assay: i) Immature dendritic cells (DCs) cultured from CD14+ monocytes of HLA-A*02:06 donors were differentiated into mature DCs (see Methods). Mature DCs were exposed to lysates of isogenic GC cells (AGS cells) expressing Canonical (CanT) or Somatic (SomT) RASA3 isoforms. ii) Antigen presentation and T-cell activation: DCs presenting CanT or SomT RASA3 isoforms were co-cultured with HLA-matched T cells, yielding T cells primed against CanT or SomT RASA3. Primed T cells were then independently co-cultured with RASA3 CanT or RASA3 SomT expressing GC cells for two days, and markers of T-cell activation were assessed.


C) Concentration of interferon-gamma (IFN-γ) secreted by T cells primed with RASA3 CanT or SomT isoforms after antigen challenge in co-culture. RASA3 CanT-primed T cells released significantly more IFN-γ when co-cultured with RASA3 CanT-expressing cells than T cells primed with RASA3 SomT and co-cultured with RASA3 SomT-expressing cells (P=0.02, representative of n=3 experiments). IFN-γ levels were determined by ELISA.



FIG. 18: EZH2 Inhibition


A) Barplot showing increased enrichment of EZH2 binding sites in HFE-145 cells at somatic promoters compared to all promoters (P<0.01).


B) Growth curves of IM95 GC cells after GSK126 administration. Cell proliferation was monitored from 24 to 216 hours and is shown relative to DMSO control-treated cells (mean ± s.e.m. from three experiments, each performed in duplicate).


C) Top 5 enriched curated gene sets (C2) for the set of genes identified from differential analysis of GSK126 treated vs DMSO control IM95 RNA-seq data at promoter loci.


D) UCSC browser track of ESRRG, showing an alternative promoter with loss of promoter activity in GC (H3K4me3 in GC (red) and normal gastric tissue (blue)). ESRRG expression increases after EZH2 inhibition with GSK126 in IM95 cells at both day 6 (D6) and day 9 (D9) of treatment.



FIG. 19: Unannotated somatic promoters


A) Barplot showing fold enrichment of L1 (FC=8.02, P<0.001) and ERV1 (FC=2.78, P<0.001) repeat elements at unannotated promoter regions compared to all promoters


B) Boxplot comparing H3K27ac signals (rpm) at unannotated somatic promoters with annotated somatic promoters. Unannotated somatic promoters have lower H3K27ac signals.





DETAILED DESCRIPTION OF THE PRESENT INVENTION

In a first aspect, the present invention refers to a method for determining the presence or absence of at least one promoter in a cancerous biological sample relative to a non-cancerous biological sample. The method comprises contacting the cancerous biological sample with at least one antibody or antibodies specific for histone modifications H3K4me3 and H3K4me1; isolating nucleic acid from the cancerous biological sample having a signal ratio of H3K4me3 relative to H3K4me1 greater than 1, wherein the isolated nucleic acid comprises at least one region or regions specific to said histone modifications; detecting a signal intensity of H3K4me3 in the isolated nucleic acid; and determining the presence or absence of at least one promoter in the cancerous biological sample based on the change in the signal intensity of H3K4me3 relative to the signal intensity of H3K4me3 in a non-cancerous biological sample.
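The H3K4me3 hi/H3K4me1 lo selection step (signal ratio greater than 1) can be illustrated with a minimal sketch. It assumes per-region signals have already been computed and normalized. The normalization and the pseudocount are assumptions, not details given in this description, and the function name is hypothetical.

```python
from typing import List, Sequence


def select_promoter_regions(regions: Sequence[str], k4me3: Sequence[float],
                            k4me1: Sequence[float], min_ratio: float = 1.0,
                            pseudocount: float = 1.0) -> List[str]:
    """Keep regions whose H3K4me3/H3K4me1 signal ratio is greater than min_ratio.

    Signals are assumed to be input-corrected, library-size-normalized read
    densities; the pseudocount guards against zero H3K4me1 signal.
    """
    return [r for r, me3, me1 in zip(regions, k4me3, k4me1)
            if (me3 + pseudocount) / (me1 + pseudocount) > min_ratio]


# Hypothetical normalized signals for three candidate regions.
print(select_promoter_regions(["r1", "r2", "r3"], [10.0, 2.0, 5.0], [1.0, 8.0, 5.0]))
# ['r1']  (r3 has a ratio of exactly 1 and is excluded)
```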


In one embodiment, the cancerous and non-cancerous biological samples may each comprise a single cell, multiple cells, fragments of cells, body fluid or tissue. In one embodiment, the cancerous and non-cancerous biological samples may be obtained from the same subject.


In one embodiment, the cancerous and non-cancerous biological sample are each obtained from different subjects.


The contacting step in accordance with the method as described herein may comprise the immunoprecipitation of chromatin with the antibodies specific for the histone modifications. Examples of histone modifications include, but are not limited to, H3K27ac, H3K4me3 and H3K4me1. In a preferred embodiment, the histone modification is H3K4me3 and/or H3K4me1. In yet another embodiment, the histone modification is H3K27ac.


The method may further comprise mapping at least one promoter from the cancerous biological sample against at least one reference nucleic acid sequence to identify a gene transcript associated with the at least one promoter.


In some embodiments, the at least one reference nucleic acid sequence may comprise a nucleic acid sequence derived from: i) an annotated genome sequence; ii) a de novo transcriptome assembly; and/or iii) a non-cancerous nucleic acid sequence library or database.


In one embodiment, the change of signal intensity of H3K4me3 may be greater than a 0.5 fold, greater than a 1 fold, greater than a 1.5 fold, greater than a 2 fold, greater than a 2.5 fold or greater than a 3 fold increase or decrease relative to the signal intensity of H3K4me3 in the non-cancerous biological sample. In a preferred embodiment, the change of signal intensity of H3K4me3 may be greater than a 1.5 fold increase or decrease relative to the signal intensity of H3K4me3 in the non-cancerous biological sample. In another embodiment, the change of signal intensity of H3K4me3 greater than a 0.5 fold, greater than a 1 fold, greater than a 1.5 fold, greater than a 2 fold, greater than a 2.5 fold or greater than a 3 fold increase relative to the signal intensity of H3K4me3 in a non-cancerous biological sample, may correlate to the presence of at least one cancer-associated promoter in the cancerous biological sample.


In a preferred embodiment, a change in H3K4me3 signal intensity greater than a 1.5-fold increase relative to the signal intensity of H3K4me3 in a non-cancerous biological sample may correlate with the presence of at least one cancer-associated promoter in the cancerous biological sample.
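Applied to a single region, the preferred 1.5-fold criterion might be sketched as follows. This is a hypothetical simplification. It compares one tumor and one normal signal value using a pseudocount, whereas the figure legends describe differential calling with DESeq2 and edgeR across paired samples. It also assumes that a 1.5-fold decrease means a T/N ratio below 1/1.5.

```python
def classify_promoter(tumor_signal: float, normal_signal: float,
                      fc_cut: float = 1.5, pseudocount: float = 1.0) -> str:
    """Label a region 'gained', 'lost' or 'unaltered' from its T/N H3K4me3 ratio.

    A fc_cut-fold decrease is interpreted here as T/N < 1 / fc_cut.
    """
    ratio = (tumor_signal + pseudocount) / (normal_signal + pseudocount)
    if ratio > fc_cut:
        return "gained"
    if ratio < 1 / fc_cut:
        return "lost"
    return "unaltered"


# Hypothetical normalized H3K4me3 signals (tumor, normal) for three regions.
for t, n in [(30.0, 10.0), (5.0, 20.0), (12.0, 10.0)]:
    print(t, n, classify_promoter(t, n))
```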


In one embodiment, the activity of the at least one cancer-associated promoter may correlate with an increase of SUZ12 or EZH2 binding sites relative to the total promoter population.


In one embodiment, an increase of SUZ12 or EZH2 binding sites correlates with an upregulation of activity of the at least one cancer-associated promoter. In another embodiment, the increase of SUZ12 or EZH2 binding sites correlates with a downregulation of activity of the at least one cancer-associated promoter.


In one embodiment, the at least one promoter may be a canonical promoter that is positioned within 100 bp, 200 bp, 300 bp, 400 bp, 500 bp, 600 bp, 700 bp, 800 bp, 900 bp or 1000 bp of a known gene transcript start site. In a preferred embodiment, the at least one promoter may be a canonical promoter that is positioned within 500 bp of a known gene transcript start site. The gene transcript start site may be associated with one or more of a cell-type specification gene, a cell adhesion gene, a cell mediated immunity gene, a gastric cancer-associated or deregulated gene, a PRC2 target gene or a transcription factor. In one embodiment, the gene transcript start site may be associated with an oncogene. The gene transcript start site may be associated with a gene selected from the group consisting of MYC, MET, CEACAM6, CLDN7, CLDN3, HOTAIR, PVT1, HNF4α, RASA3, GRIN2D, EpCAM and a combination thereof.


In one embodiment, the cancer is gastrointestinal cancer, gastric cancer or colon cancer.


In another embodiment, the at least one promoter may be an alternative promoter that may be associated with a canonical promoter, wherein the canonical promoter may be present in both the cancerous biological sample and the non-cancerous biological sample, and i) wherein the alternative promoter may be only present in the cancerous biological sample, or ii) wherein the alternative promoter may be only absent in the cancerous biological sample.


In some embodiments, the at least one promoter is an unannotated promoter that is positioned more than 100 bp, more than 200 bp, more than 300 bp, more than 400 bp, more than 500 bp, more than 600 bp, more than 700 bp, more than 800 bp, more than 900 bp or more than 1000 bp away from a gene transcript start site. In a preferred embodiment, the at least one promoter is an unannotated promoter that is positioned more than 500 bp away from a gene transcript start site.
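The distance rule that separates promoters near known TSSs from unannotated promoters (preferred cut-off 500 bp) can be sketched as below. The sketch assumes each promoter is represented by a single coordinate, such as the region midpoint, and that TSS coordinates come from a gene annotation. It does not distinguish canonical from alternative TSSs of the same gene. The function names are hypothetical.

```python
import bisect
from typing import Dict, List, Optional


def nearest_tss_distance(chrom: str, pos: int,
                         tss_by_chrom: Dict[str, List[int]]) -> Optional[int]:
    """Distance (bp) from pos to the nearest TSS on the same chromosome.

    tss_by_chrom maps each chromosome to a sorted list of TSS coordinates.
    Returns None if the chromosome has no annotated TSS.
    """
    sites = tss_by_chrom.get(chrom)
    if not sites:
        return None
    i = bisect.bisect_left(sites, pos)
    # The nearest TSS is either just before or just after the insertion point.
    neighbours = [sites[j] for j in (i - 1, i) if 0 <= j < len(sites)]
    return min(abs(s - pos) for s in neighbours)


def annotate_promoter(chrom: str, pos: int, tss_by_chrom: Dict[str, List[int]],
                      window: int = 500) -> str:
    """'annotated' if within `window` bp of a known TSS, else 'unannotated'."""
    d = nearest_tss_distance(chrom, pos, tss_by_chrom)
    return "annotated" if d is not None and d <= window else "unannotated"


tss = {"chr1": [1000, 5000]}
print(annotate_promoter("chr1", 1400, tss))  # 400 bp from a TSS
print(annotate_promoter("chr1", 3000, tss))  # 2,000 bp from the nearest TSS
```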


In one embodiment, the method as described herein further comprises measuring the expression level of the at least one alternative promoter in the cancerous biological sample and non-cancerous biological sample, wherein the measuring comprises digital profiling of reporter probes; and determining the differential expression level of the at least one alternative promoter relative to the non-cancerous biological sample, based on the digital profiling of the reporter probes, to validate the presence or absence of at least one alternative promoter in the cancerous biological sample relative to a non-cancerous biological sample.


The step of measuring may be conducted using a NanoString™ platform.


In another aspect, the present invention provides a method for determining the prognosis of cancer in a subject. The method comprises contacting a cancerous biological sample obtained from the subject with at least one antibody or antibodies specific for histone modification H3K4me3 and H3K4me1; isolating nucleic acid from the cancerous biological sample having a signal ratio of H3K4me3 relative to H3K4me1 greater than 1, wherein the isolated nucleic acid comprises at least one region or regions specific to said histone modifications; detecting a signal intensity of H3K4me3 in the isolated nucleic acid; and determining the presence or absence of at least one cancer-associated promoter in the cancerous biological sample based on the change in the signal intensity of H3K4me3 relative to the signal intensity of H3K4me3 in a reference nucleic acid sequence, wherein the presence or absence of the at least one cancer-associated promoter in the cancerous biological sample is indicative of the prognosis of the cancer in the subject.


In one embodiment, the at least one cancer-associated promoter may be an alternative promoter that is associated with a canonical promoter, wherein the canonical promoter may be present in both the cancerous biological sample and the reference nucleic acid sequence, and i) wherein the alternative promoter may be only present in the cancerous biological sample, or ii) wherein the alternative promoter may be only absent in the cancerous biological sample.


The presence or absence of the at least one alternative promoter in the cancerous sample may be indicative of a poor prognosis of cancer survival in the subject.


In one embodiment the method as described herein further comprises measuring the expression level of the at least one alternative promoter in the cancerous biological sample and the reference nucleic acid sequence, wherein the measuring comprises digital profiling of reporter probes; and determining the differential expression level of the at least one alternative promoter relative to the non-cancerous biological sample, based on the digital profiling of the reporter probes, to validate the presence or absence of at least one alternative promoter in the cancerous biological sample relative to the reference nucleic acid sequence.


The step of measuring may be conducted using a NanoString™ platform.


In another aspect the present invention provides a biomarker for detecting cancer in a subject, the biomarker comprising at least one promoter having a change in signal intensity of H3K4me3 in a cancerous biological sample relative to a non-cancerous biological sample.


In one embodiment, the at least one promoter comprises an increase of EZH2 binding sites relative to the total promoter population. In one embodiment, the at least one promoter may be hypomethylated. In another embodiment, the at least one promoter may be hypermethylated.


The at least one promoter may be a canonical promoter that is positioned less than 500 bp away from a gene transcript start site. In one embodiment, the gene transcript start site may be associated with one or more of a cell-type specification gene, a cell adhesion gene, a cell mediated immunity gene, a gastric cancer-associated or deregulated gene, a PRC2 target gene or a transcription factor. In one embodiment, the gene transcript start site may be associated with an oncogene.


In one embodiment, the gene transcript start site may be associated with a gene selected from the group consisting of MYC, MET, CEACAM6, CLDN7, CLDN3, HOTAIR, PVT1, HNF4α, RASA3, GRIN2D, EpCAM or a combination thereof.


In one embodiment, the at least one promoter may be an alternative promoter that may be associated with a canonical promoter, wherein the canonical promoter may be present in both a cancerous sample and a non-cancerous sample, and i) wherein the alternative promoter may be only present in a cancerous sample, or ii) wherein the alternative promoter may be only absent in a cancerous sample.


In one embodiment, the at least one promoter may be an unannotated promoter that may be positioned more than 100 bp, more than 200 bp, more than 300 bp, more than 400 bp, more than 500 bp, more than 600 bp, more than 700 bp, more than 800 bp, more than 900 bp or more than 1000 bp away from a gene transcript start site. In a preferred embodiment, the at least one promoter may be an unannotated promoter that may be positioned more than 500 bp away from a gene transcript start site.


In another aspect, there is provided a method for modulating the activity of at least one cancer-associated promoter in a cell, comprising administering an inhibitor of EZH2 to the cell. In another aspect there is provided a method for modulating the immune response of a subject to cancer, comprising administering to the subject an inhibitor of EZH2, wherein the EZH2 is associated with at least one cancer-associated promoter in the subject.


In one embodiment, the inhibitor of EZH2 may modulate the expression of immunogenic N-terminal peptides.


In one embodiment, the at least one cancer-associated promoter may be an alternative promoter that may be associated with a canonical promoter, wherein the canonical promoter may be present in both a cancerous sample and a non-cancerous sample, and i) wherein the alternative promoter may only be present in a cancerous sample, or ii) wherein the alternative promoter may only be absent in a cancerous sample.


In one embodiment, the alternative promoter is associated with a transcript variant, wherein the transcript variant encodes an N-terminal protein variant.


In one embodiment, the N-terminal protein variant may be an N-terminal truncated protein or an N-terminal elongated protein. In one embodiment, the inhibitor of EZH2 may be a siRNA or a small molecule.


In one embodiment, the inhibitor of EZH2 may be GSK126.


In another aspect, there is provided use of an inhibitor of EZH2 in the manufacture of a medicament for modulating the activity of at least one cancer-associated promoter in a cell.


In another aspect there is provided use of an inhibitor of EZH2, wherein the EZH2 is associated with at least one cancer-associated promoter in the subject, in the manufacture of a medicament for modulating the immune response of a subject to cancer.


In another aspect, there is provided an inhibitor of EZH2 for use in modulating the activity of at least one cancer-associated promoter in a cell. In yet another aspect, there is provided an inhibitor of EZH2 for use in modulating the immune response of a subject to cancer, wherein the EZH2 is associated with at least one cancer-associated promoter in the subject.


In another aspect there is provided a method for determining the presence or absence of at least one cancer-associated promoter in a cancerous biological sample relative to a non-cancerous biological sample. The method comprises: contacting the cancerous biological sample with antibodies specific for histone modifications H3K4me3 and H3K4me1; isolating nucleic acid from the cancerous biological sample having a signal ratio of H3K4me3 relative to H3K4me1 greater than 1, wherein the isolated nucleic acid comprises regions specific to said histone modifications; detecting a signal intensity of H3K4me3 in the isolated nucleic acid at a read depth of 20M; and determining the presence or absence of at least one cancer-associated promoter in the cancerous biological sample based on the change in the signal intensity of H3K4me3 relative to the signal intensity of H3K4me3 in a non-cancerous biological sample.


EXPERIMENTAL SECTION

Methods and Materials


Primary Tissue Samples and Cell Lines


Primary patient samples were obtained from the SingHealth tissue repository with approval from institutional research ethics review committees and with signed patient informed consent. ‘Normal’ (non-malignant) samples used in this study refer to samples harvested from the stomach at sites distant from the tumour, exhibiting no visible evidence of tumour or intestinal metaplasia/dysplasia upon surgical assessment. Tumor samples were confirmed by cryosectioning to contain >60% tumor cells. FU97, IM95, MKN7, OCUM1 and RERF-GC-1B cell lines were obtained from the Japan Health Science Research Resource Bank. AGS, KATOIII and SNU16 cells and the Hs 1.Int and Hs 738.St/Int gastrointestinal fibroblast lines were obtained from the American Type Culture Collection. NCC-59, NCC-24, SNU-1967 and SNU-1750 were obtained from the Korean Cell Line Bank. YCC3, YCC7, YCC21 and YCC22 were gifts from Yonsei Cancer Centre, South Korea. HFE145 cells were a gift from Dr. Hassan Ashktorab, Howard University. GES-1 cells were a gift from Dr. Alfred Cheng, Chinese University of Hong Kong. Cell line identities were confirmed by STR DNA profiling following ANSI/ATCC ASN-0002-2011 guidelines. MKN7 cells, which are listed as a commonly misidentified cell line by ICLAC (http://iclac.org/databases/cross-contaminations/), exhibited a perfect match (100%) with MKN7 reference profiles in the Japanese Collection of Research Bioresources Cell Bank. All cell lines were negative for mycoplasma contamination as assessed by the MycoAlert™ Mycoplasma Detection Kit (Lonza) and the MycoSensor qPCR Assay Kit (Agilent Technologies). PBMCs from healthy donors were collected under protocol CIRB Ref No. 2010/720/E.


Nano-ChIPseq


Nano-ChIP-Seq was performed as described below.


Primary Tissue and Cell Line Fixation


Fresh-frozen cancer and normal tissues were dissected using a razor blade in liquid nitrogen to obtain ~5 mg pieces for each ChIP. Tissue pieces were fixed in 1% formaldehyde/PBS buffer for 10 min at room temperature. Fixation was stopped by addition of glycine to a final concentration of 125 mM. Tissue pieces were washed 3 times with TBSE buffer. For cell lines, 1 million freshly harvested cells were fixed in 1% formaldehyde/medium buffer for 10 min at room temperature. Fixation was stopped by addition of glycine to a final concentration of 125 mM. Fixed cells were washed 3 times with TBSE buffer and centrifuged (5,000 r.p.m., 5 min).


ChIP


Pelleted cells and pulverized tissues were lysed in 100 μl 1% SDS lysis buffer and sonicated to 300-500 bp using a Bioruptor (Diagenode). ChIP was performed using the following antibodies: H3K4me3 (07-473, Millipore); H3K4me1 (ab8895, Abcam); H3K27ac (ab4729, Abcam).


WGA


After recovery of ChIP and input DNA, whole-genome-amplification was performed using the WGA4 kit (Sigma-Aldrich) and BpmI-WGA primers. Amplified DNAs were purified using PCR purification columns (QIAGEN) and digested with BpmI (New England Biolabs) to remove WGA adapters.


Library Preparation and Sequencing


For each sequencing library preparation, 30 ng of amplified DNA was used (New England Biolabs). Eight libraries were multiplexed (New England Biolabs) and sequenced on 2 lanes of a HiSeq 2500 sequencer (Illumina) to an average depth of 20-30 million reads per library.
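As a rough check of the sequencing design, the stated per-library depth implies the following per-lane yield. This assumes reads are shared evenly between the two lanes:

```latex
\begin{aligned}
N_{\text{total}} &= 8 \times (20\text{--}30)\times 10^{6} = (160\text{--}240)\times 10^{6}\ \text{reads},\\
N_{\text{lane}} &= N_{\text{total}} / 2 = (80\text{--}120)\times 10^{6}\ \text{reads per lane}.
\end{aligned}
```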


Sequencing reads were trimmed by 10 bp at each end (front and back) and mapped against the human genome reference hg19 using the Burrows-Wheeler Aligner (BWA) (version 0.6.2) ‘aln’ algorithm. Read statistics were generated using mapstat from samtools. We filtered reads based on mapping quality (MAPQ ≥ 10) and used uniquely mapped reads for peak calling with CCAT v3.0. We chose a MAPQ threshold of 10 because i) MAPQ ≥ 10 has been previously reported as a reliable value for confident read mapping, ii) MAPQ ≥ 10 has been recommended by the developers of the BWA algorithm as a suitable threshold for confident mapping, and iii) independent studies comparing read alignment algorithms have shown that mapping accuracy plateaus at a MAPQ threshold of 10-12.
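The read trimming and MAPQ filtering described above could look like the following sketch. It operates on plain SAM text for illustration; a real pipeline would more likely use samtools directly, for example `samtools view -q 10`. The definition of "uniquely mapped" is an assumption here. The sketch treats the XT:A:U tag that BWA 'aln' reports as the uniqueness marker, but this description does not state how uniqueness was determined.

```python
from typing import Tuple


def trim_read(seq: str, qual: str, n: int = 10) -> Tuple[str, str]:
    """Remove n bases (and their quality scores) from each end of a read."""
    if len(seq) <= 2 * n:
        return "", ""
    return seq[n:-n], qual[n:-n]


def keep_alignment(sam_line: str, min_mapq: int = 10) -> bool:
    """Decide whether a SAM alignment line passes the mapping filters.

    Drops header lines, unmapped (flag 0x4) and secondary (flag 0x100)
    alignments, and records with MAPQ below min_mapq. If a BWA-style XT tag
    is present, only XT:A:U (unique) records are kept.
    """
    if sam_line.startswith("@"):
        return False
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & 0x4 or flag & 0x100 or mapq < min_mapq:
        return False
    # Optional tags start at column 12 (index 11) of a SAM record.
    xt_tags = [t for t in fields[11:] if t.startswith("XT:A:")]
    return all(t == "XT:A:U" for t in xt_tags)


print(trim_read("A" * 10 + "CGT" + "A" * 10, "I" * 23))
record = "r1\t0\tchr1\t100\t37\t3M\t*\t0\t0\tCGT\tIII\tXT:A:U"
print(keep_alignment(record))
```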


EZH2 ChIP-seq


Cells were cross-linked with 1% formaldehyde for 10 minutes at room temperature, and cross-linking was stopped by adding glycine to a final concentration of 0.2 M. Chromatin was extracted and sonicated to ~500 bp fragments. An EZH2 antibody (Catalog #5246, Cell Signaling) was used for chromatin immunoprecipitation (ChIP). For each sequencing library, 30 ng of ChIP DNA was used (New England Biolabs), and the library was sequenced on a HiSeq 2500 (Illumina). Input DNA from cells prior to immunoprecipitation was used as the control for ChIP-seq peak calling. Prior to sequencing, qPCR was used to verify that positive and negative control ChIP regions were amplified in the linear range. Sequencing reads were mapped to the human reference genome hg19 using the BWA (version 0.7) 'aln' algorithm. Read statistics were generated using mapstat from samtools. Reads were filtered by mapping quality (MAPQ ≥ 10), and uniquely mapped reads were used for peak calling with MACS2.


Quality Control Assessments of Nano-ChIP-seq Data


ChIP Enrichment Assessment


We assessed the quality of the ChIP libraries (H3K27ac, H3K4me3 and H3K4me1) using two different methods. First, we estimated ChIP quality, particularly for H3K27ac and H3K4me3, by interrogating enrichment levels at annotated promoters of protein-coding genes. Specifically, we computed median read densities of input and input-corrected ChIP signals around the transcription start sites (TSSs, ±500 bp) of highly expressed protein-coding genes. For each sample, we then used the ratio of ChIP to input read density as a surrogate of data quality, retaining only samples with a ChIP/input ratio greater than 2-fold. Using this criterion, all H3K4me3 and H3K27ac samples (GC lines and primary samples) exhibited greater than 2-fold enrichment, indicating successful enrichment. Second, we used CHANCE (ChIP-seq ANalytics and Confidence Estimation), a software tool for ChIP-seq quality control and protocol optimization that indicates whether a ChIP library shows successful or weak enrichment. CHANCE assessment confirmed that the large majority (81%) of samples in our study exhibited successful enrichment. The quality status of each library, as assessed by both methods, is reported in Table 1.
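The first QC criterion can be sketched as follows: count reads within ±500 bp of each TSS, scale by library size (reads per million), take the median across TSSs for ChIP and for input, and require a ChIP/input ratio above 2. This is a simplified, single-chromosome illustration with hypothetical function names. It uses RPM-scaled ChIP rather than the input-corrected ChIP signal described above, and it treats each read as a single position.

```python
from bisect import bisect_left, bisect_right
from statistics import median

WINDOW_BP = 500  # TSS +/- 500 bp
MIN_FOLD = 2.0   # ChIP/input ratio must exceed 2


def tss_densities(read_positions, tss_list, total_reads, window=WINDOW_BP):
    """Reads per million (RPM) falling within +/- window of each TSS."""
    pos = sorted(read_positions)
    scale = 1e6 / total_reads
    densities = []
    for tss in tss_list:
        # Count reads with tss - window <= position <= tss + window.
        n = bisect_right(pos, tss + window) - bisect_left(pos, tss - window)
        densities.append(n * scale)
    return densities


def chip_input_ratio(chip_pos, input_pos, tss_list):
    """Median TSS density of ChIP divided by median TSS density of input."""
    chip = median(tss_densities(chip_pos, tss_list, len(chip_pos)))
    inp = median(tss_densities(input_pos, tss_list, len(input_pos)))
    return chip / inp if inp > 0 else float("inf")


def passes_tss_qc(ratio, min_fold=MIN_FOLD):
    """True if the ChIP/input enrichment exceeds the fold threshold."""
    return ratio > min_fold
```

For example, with two TSSs, 10 of 100 ChIP reads at each TSS and 2 of 100 input reads at each TSS, the ratio is 5 and the library passes.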









TABLE 1

Read mapping statistics of Nano-ChIP-seq libraries

S. No | Patient No | Group | Sample ID | Library ID | Histone Modification | Total Reads | Total Mapped Reads | # of Peaks (FDR <5%, CCAT) | CHANCE Enrichment | ChIP Enrichment around TSS (>2 Fold)
1 | 1 | N | 2000639 | CHG023 | H3K4Me1 | 116,179,997 | 56,009,114 | 11,438 | successful | yes
2 | 1 | N | 2000639 | CHG079 | H3K4Me3 | 144,760,092 | 45,662,594 | 13,301 | successful | yes
3 | 1 | N | 2000639 | CHG022 | H3K27Ac | 107,005,238 | 47,688,264 | 30,155 | successful | yes
4 | 1 | N | 2000639 | CHG021 | Input | 108,432,681 | 53,434,667 | - | - | -
5 | 1 | T | 2000639 | CHG019 | H3K4Me1 | 139,751,844 | 62,529,719 | 9,133 | successful | yes
6 | 1 | T | 2000639 | CHG078 | H3K4Me3 | 176,761,815 | 52,219,714 | 15,417 | successful | yes
7 | 1 | T | 2000639 | CHG018 | H3K27Ac | 125,811,014 | 56,636,793 | 22,220 | successful | yes
8 | 1 | T | 2000639 | CHG017 | Input | 133,549,980 | 62,465,142 | - | - | -
9 | 2 | N | 2000721 | CHG081 | H3K4Me3 | 123,984,264 | 41,723,243 | 13,046 | successful | yes
10 | 2 | N | 2000721 | CHG031 | H3K4Me1 | 142,898,092 | 61,716,210 | 17,896 | successful | yes
11 | 2 | N | 2000721 | CHG030 | H3K27Ac | 142,881,448 | 56,328,103 | 24,624 | successful | yes
12 | 2 | N | 2000721 | CHG029 | Input | 144,582,591 | 67,254,098 | - | - | -
13 | 2 | T | 2000721 | CHG080 | H3K4Me3 | 128,094,707 | 52,416,345 | 12,751 | successful | yes
14 | 2 | T | 2000721 | CHG026 | H3K27Ac | 132,143,844 | 52,416,345 | 45,274 | successful | yes
15 | 2 | T | 2000721 | CHG027 | H3K4Me1 | 120,824,194 | 54,688,706 | 48,701 | successful | yes
16 | 2 | T | 2000721 | CHG025 | Input | 150,621,523 | 65,242,401 | - | - | -
17 | 3 | N | 2000986 | CHG083 | H3K4Me3 | 145,813,278 | 44,476,466 | 13,305 | successful | yes
18 | 3 | N | 2000986 | CHG039 | H3K4Me1 | 112,190,461 | 52,061,916 | 14,977 | successful | yes
19 | 3 | N | 2000986 | CHG038 | H3K27Ac | 136,195,033 | 47,671,991 | 26,993 | successful | yes
20 | 3 | N | 2000986 | CHG037 | Input | 125,858,642 | 58,503,831 | - | - | -
21 | 3 | T | 2000986 | CHG082 | H3K4Me3 | 199,735,230 | 48,070,517 | 13,296 | successful | yes
22 | 3 | T | 2000986 | CHG035 | H3K4Me1 | 99,757,592 | 48,602,649 | 25,882 | successful | yes
23 | 3 | T | 2000986 | CHG034 | H3K27Ac | 127,564,120 | 45,231,776 | 29,278 | successful | yes
24 | 3 | T | 2000986 | CHG033 | Input | 127,392,001 | 57,846,771 | - | - | -
25 | 4 | N | 980437 | CHG087 | H3K4Me3 | 252,269,976 | 16,106,111 | 6,925 | weak | yes
26 | 4 | N | 980437 | CHG089 | H3K27Ac | 248,399,140 | 21,095,856 | 20,018 | weak | yes
27 | 4 | N | 980437 | CHG086 | Input | 223,083,607 | 13,951,728 | - | - | -
28 | 4 | T | 980437 | CHG091 | H3K4Me3 | 254,777,628 | 12,340,257 | 7,007 | weak | yes
29 | 4 | T | 980437 | CHG093 | H3K27Ac | 215,915,787 | 19,054,278 | 48,614 | weak | yes
30 | 4 | T | 980437 | CHG090 | Input | 214,007,053 | 18,743,433 | - | - | -
31 | 5 | N | 980097 | CHG097 | H3K27Ac | 254,991,965 | 17,871,717 | 10,566 | weak | yes
32 | 5 | N | 980097 | CHG094 | Input | 248,345,017 | 15,056,998 | - | - | -
33 | 5 | T | 980097 | CHG101 | H3K27Ac | 254,857,885 | 16,050,861 | 81,607 | successful | yes
34 | 5 | T | 980097 | CHG098 | Input | 235,148,448 | 16,412,565 | - | - | -
35 | 6 | N | 990068 | CHG441 | H3K4Me3 | 25,942,766 | 18,661,944 | 9,040 | successful | yes
36 | 6 | N | 990068 | CHG443 | H3K27Ac | 28,993,775 | 20,404,671 | 30,306 | successful | yes
37 | 6 | N | 990068 | CHG444 | Input | 16,583,307 | 14,164,125 | - | - | -
38 | 6 | T | 990068 | CHG437 | H3K4Me3 | 19,295,687 | 15,981,638 | 23,546 | successful | yes
39 | 6 | T | 990068 | CHG439 | H3K27Ac | 30,394,067 | 26,279,884 | 84,958 | successful | yes
40 | 6 | T | 990068 | CHG440 | Input | 54,957,058 | 46,535,339 | - | - | -
41 | 7 | N | 2000085 | CHG449 | H3K4Me3 | 22,207,074 | 17,120,624 | 13,421 | weak | yes
42 | 7 | N | 2000085 | CHG451 | H3K27Ac | 31,752,518 | 26,505,029 | 93,432 | successful | yes
43 | 7 | N | 2000085 | CHG452 | Input | 23,861,825 | 20,188,881 | - | - | -
44 | 7 | T | 2000085 | CHG445 | H3K4Me3 | 27,386,842 | 17,898,292 | 16,274 | successful | yes
45 | 7 | T | 2000085 | CHG447 | H3K27Ac | 37,833,126 | 29,893,873 | 67,464 | successful | yes
46 | 7 | T | 2000085 | CHG448 | Input | 25,476,868 | 21,590,215 | - | - | -
47 | 8 | N | 980401 | GCC005 | H3K4Me3 | 47,143,397 | 32,011,124 | 9,739 | weak | yes
48 | 8 | N | 980401 | GCC006 | H3K4Me1 | 49,813,057 | 38,517,830 | 29,304 | successful | yes
49 | 8 | N | 980401 | GCC007 | H3K27Ac | 49,333,955 | 34,378,734 | 104,483 | successful | yes
50 | 8 | N | 980401 | GCC008 | Input | 48,654,609 | 39,027,473 | - | - | -
51 | 8 | T | 980401 | GCC002 | H3K4Me1 | 46,014,858 | 35,781,553 | 5,374 | weak | yes
52 | 8 | T | 980401 | GCC001 | H3K4Me3 | 40,037,248 | 16,724,980 | 11,773 | successful | yes
53 | 8 | T | 980401 | GCC003 | H3K27Ac | 70,844,500 | 51,841,868 | 108,169 | successful | yes
54 | 8 | T | 980401 | GCC004 | Input | 55,650,648 | 46,769,330 | - | - | -
55 | 9 | N | 980447 | GCC013 | H3K4Me3 | 49,510,760 | 43,302,748 | 10,442 | successful | yes
56 | 9 | N | 980447 | GCC014 | H3K4Me1 | 51,911,778 | 46,524,450 | 18,916 | weak | yes
57 | 9 | N | 980447 | GCC015 | H3K27Ac | 43,725,655 | 38,581,698 | 147,189 | successful | yes
58 | 9 | N | 980447 | GCC016 | Input | 43,722,729 | 36,570,838 | - | - | -
59 | 9 | T | 980447 | GCC010 | H3K4Me1 | 51,224,701 | 40,643,956 | 7,959 | successful | yes
60 | 9 | T | 980447 | GCC009 | H3K4Me3 | 41,895,137 | 28,002,598 | 9,325 | weak | yes
61 | 9 | T | 980447 | GCC011 | H3K27Ac | 75,243,898 | 63,172,397 | 98,169 | successful | yes
62 | 9 | T | 980447 | GCC012 | Input | 40,502,678 | 33,280,117 | - | - | -
63 | 10 | N | 2001206 | GCC021 | H3K4Me3 | 42,094,067 | 35,485,202 | 12,682 | successful | yes
64 | 10 | N | 2001206 | GCC022 | H3K4Me1 | 44,213,793 | 38,760,554 | 50,615 | weak | yes
65 | 10 | N | 2001206 | GCC023 | H3K27Ac | 47,356,714 | 34,355,781 | 112,565 | successful | yes
66 | 10 | N | 2001206 | GCC024 | Input | 58,885,884 | 49,927,340 | - | - | -
67 | 10 | T | 2001206 | GCC017 | H3K4Me3 | 48,193,228 | 36,729,294 | 13,835 | successful | yes
68 | 10 | T | 2001206 | GCC018 | H3K4Me1 | 43,730,845 | 35,480,758 | 44,504 | weak | yes
69 | 10 | T | 2001206 | GCC019 | H3K27Ac | 52,518,766 | 42,398,517 | 111,758 | successful | yes
70 | 10 | T | 2001206 | GCC020 | Input | 81,949,870 | 70,380,385 | - | - | -
71 | 11 | N | 980436 | GCC029 | H3K4Me3 | 27,612,232 | 20,121,957 | 12,398 | weak | yes
72 | 11 | N | 980436 | GCC030 | H3K4Me1 | 22,983,565 | 20,452,059 | 53,077 | weak | yes
73 | 11 | N | 980436 | GCC031 | H3K27Ac | 23,061,305 | 15,315,483 | 104,880 | successful | yes
74 | 11 | N | 980436 | GCC032 | Input | 24,411,542 | 21,182,579 | - | - | -
75 | 11 | T | 980436 | GCC025 | H3K4Me3 | 31,564,679 | 24,866,375 | 8,625 | weak | yes
76 | 11 | T | 980436 | GCC026 | H3K4Me1 | 51,645,661 | 38,028,800 | 58,456 | successful | yes
77 | 11 | T | 980436 | GCC027 | H3K27Ac | 51,093,256 | 35,496,776 | 102,351 | successful | yes
78 | 11 | T | 980436 | GCC028 | Input | 25,606,490 | 20,820,223 | - | - | -
79 | 12 | N | 980417 | GCC037 | H3K4Me3 | 18,976,505 | 15,277,228 | 10,387 | successful | yes
80 | 12 | N | 980417 | GCC039 | H3K27Ac | 30,443,642 | 25,447,390 | 70,910 | successful | yes
81 | 12 | N | 980417 | GCC038 | H3K4Me1 | 22,127,416 | 18,537,610 | 109,119 | successful | yes
82 | 12 | N | 980417 | GCC040 | Input | 33,758,416 | 28,242,473 | - | - | -
83 | 12 | T | 980417 | GCC033 | H3K4Me3 | 42,615,610 | 27,972,601 | 10,260 | successful | yes
84 | 12 | T | 980417 | GCC035 | H3K27Ac | 33,438,272 | 29,141,996 | 76,369 | successful | yes
85 | 12 | T | 980417 | GCC034 | H3K4Me1 | 31,115,402 | 26,172,044 | 142,635 | weak | yes
86 | 12 | T | 980417 | GCC036 | Input | 26,806,807 | 22,277,771 | - | - | -
87 | 13 | N | 980319 | GCC075 | H3K4Me3 | 34,503,108 | 26,201,666 | 9,466 | successful | yes
88 | 13 | N | 980319 | GCC076 | H3K4Me1 | 32,308,832 | 28,194,660 | 56,964 | weak | yes
89 | 13 | N | 980319 | GCC077 | H3K27Ac | 28,534,828 | 24,595,902 | 73,073 | successful | yes
90 | 13 | N | 980319 | GCC078 | Input | 31,533,287 | 26,147,884 | - | - | -
91 | 13 | T | 980319 | GCC071 | H3K4Me3 | 31,707,599 | 22,793,555 | 14,049 | successful | yes
92 | 13 | T | 980319 | GCC073 | H3K27Ac | 42,548,744 | 35,755,479 | 102,971 | successful | yes
93 | 13 | T | 980319 | GCC072 | H3K4Me1 | 28,112,304 | 24,361,418 | 196,347 | weak | yes
94 | 13 | T | 980319 | GCC074 | Input | 28,895,896 | 24,529,014 | - | - | -
95 | 14 | N | 990275 | GCC088 | H3K4Me3 | 39,968,810 | 31,536,231 | 7,964 | successful | yes
96 | 14 | N | 990275 | GCC089 | H3K27Ac | 52,738,627 | 22,089,449 | 70,246 | successful | yes
97 | 14 | N | 990275 | GCC090 | Input | 33,342,252 | 21,049,309 | - | - | -
98 | 14 | T | 990275 | GCC085 | H3K4Me3 | 26,399,904 | 14,795,436 | 25,423 | weak | yes
99 | 14 | T | 990275 | GCC086 | H3K27Ac | 45,712,891 | 25,668,453 | 183,458 | successful | yes
100 | 14 | T | 990275 | GCC087 | Input | 40,285,061 | 32,790,063 | - | - | -
101 | 15 | N | 2000877 | GCC082 | H3K4Me3 | 52,151,546 | 22,229,998 | 11,368 | successful | yes
102 | 15 | N | 2000877 | GCC083 | H3K27Ac | 45,775,899 | 41,027,897 | 61,175 | weak | yes
103 | 15 | N | 2000877 | GCC084 | Input | 38,226,148 | 30,117,584 | - | - | -
104 | 15 | T | 2000877 | GCC079 | H3K4Me3 | 49,368,282 | 24,022,463 | 9,837 | successful | yes
105 | 15 | T | 2000877 | GCC080 | H3K27Ac | 38,621,705 | 33,990,267 | 41,048 | successful | yes
106 | 15 | T | 2000877 | GCC081 | Input | 38,824,621 | 32,814,299 | - | - | -
107 | 16 | N | 20020720 | GCC100 | H3K4Me3 | 58,679,413 | 34,278,884 | 9,901 | successful | yes
108 | 16 | N | 20020720 | GCC101 | H3K27Ac | 43,532,496 | 37,750,917 | 65,167 | successful | yes
109 | 16 | N | 20020720 | GCC102 | Input | 39,544,734 | 31,454,551 | - | - | -
110 | 16 | T | 20020720 | GCC097 | H3K4Me3 | 57,599,648 | 16,022,427 | 12,922 | successful | yes
111 | 16 | T | 20020720 | GCC098 | H3K27Ac | 35,400,105 | 29,507,542 | 74,115 | successful | yes
112 | 16 | T | 20020720 | GCC099 | Input | 37,092,424 | 29,452,932 | - | - | -
113 | 17 | N | 20021007 | GCC094 | H3K4Me3 | 56,788,147 | 18,217,449 | 16,073 | successful | yes
114 | 17 | N | 20021007 | GCC095 | H3K27Ac | 40,488,514 | 33,372,754 | 122,851 | successful | yes
115 | 17 | N | 20021007 | GCC096 | Input | 40,712,616 | 34,440,613 | - | - | -
116 | 17 | T | 20021007 | GCC091 | H3K4Me3 | 33,903,211 | 27,230,052 | 7,843 | weak | yes
117 | 17 | T | 20021007 | GCC092 | H3K27Ac | 50,268,912 | 19,156,361 | 98,104 | successful | yes
118 | 17 | T | 20021007 | GCC093 | Input | 34,936,961 | 29,417,989 | - | - | -
119 | CL1 | FU97 | FU97 | GCC043 | H3K27Ac | 30,087,131 | 22,566,178 | 21,867 | successful | yes
120 | CL1 | FU97 | FU97 | GCC041 | H3K4Me3 | 26,986,288 | 23,243,556 | 26,562 | successful | yes
121 | CL1 | FU97 | FU97 | GCC045 | Input | 33,566,067 | 23,430,741 | - | - | -
122 | CL10 | RERF-GC-1B | RERF-GC-1B | CHG374 | H3K27Ac | 39,882,820 | 19,500,590 | 11,201 | successful | yes
123 | CL10 | RERF-GC-1B | RERF-GC-1B | CHG371 | H3K4Me3 | 42,450,431 | 25,988,948 | 16,625 | successful | yes
124 | CL10 | RERF-GC-1B | RERF-GC-1B | CHG376 | Input | 21,437,700 | 16,948,709 | - | - | -
125 | CL11 | SNU16 | SNU16 | CHG236 | H3K27Ac | 21,726,635 | 16,967,938 | 13,619 | successful | yes
126 | CL11 | SNU16 | SNU16 | CHG233 | H3K4Me3 | 20,136,058 | 18,151,002 | 19,445 | successful | yes
127 | CL11 | SNU16 | SNU16 | CHG232 | Input | 19,522,181 | 14,558,761 | - | - | -
128 | CL12 | SNU1750 | SNU1750 | CHG230 | H3K27Ac | 18,716,777 | 15,805,037 | 15,074 | successful | yes
129 | CL12 | SNU1750 | SNU1750 | CHG227 | H3K4Me3 | 16,655,044 | 14,883,880 | 18,130 | successful | yes
130 | CL12 | SNU1750 | SNU1750 | CHG226 | Input | 19,602,424 | 13,575,272 | - | - | -
131 | CL13 | YCC21 | YCC21 | CHG429 | H3K27Ac | 22,884,268 | 13,861,557 | 21,415 | successful | yes
132 | CL13 | YCC21 | YCC21 | CHG427 | H3K4Me3 | 22,788,225 | 15,669,142 | 20,120 | successful | yes
133 | CL13 | YCC21 | YCC21 | CHG431 | Input | 40,378,916 | 34,747,778 | - | - | -
134 | CL13 | YCC22 | YCC22 | GCC063 | H3K27Ac | 33,314,935 | 23,877,905 | 11,774 | successful | yes
135 | CL13 | YCC22 | YCC22 | GCC061 | H3K4Me3 | 27,410,298 | 24,163,717 | 25,417 | successful | yes
136 | CL13 | YCC22 | YCC22 | GCC065 | Input | 26,685,596 | 18,976,555 | - | - | -
137 | CL14 | YCC3 | YCC3 | GCC053 | H3K27Ac | 27,581,400 | 21,579,098 | 14,118 | successful | yes
138 | CL14 | YCC3 | YCC3 | GCC051 | H3K4Me3 | 22,106,259 | 18,914,296 | 17,276 | successful | yes
139 | CL14 | YCC3 | YCC3 | GCC055 | Input | 27,745,993 | 18,854,658 | - | - | -
140 | CL15 | YCC7 | YCC7 | CHG424 | H3K27Ac | 38,599,550 | 22,445,268 | 32,770 | successful | yes
141 | CL15 | YCC7 | YCC7 | CHG422 | H3K4Me3 | 19,594,480 | 14,546,474 | 22,521 | successful | yes
142 | CL15 | YCC7 | YCC7 | CHG426 | Input | 24,527,190 | 21,748,808 | - | - | -
143 | CL2 | HFE145 | HFE145 | CHG245 | H3K4Me3 | 24,122,708 | 19,760,850 | 18,492 | successful | yes
144 | CL2 | HFE145 | HFE145 | CHG244 | Input | 22,447,791 | 17,960,470 | - | - | -
145 | CL2 | HFE145 | HFE145 | HFE145-EZH2-MJ-5246 | H3K4Me3 | 50,701,700 | 45,821,209 | 17,299 | weak | -
146 | CL2 | HFE145 | HFE145 | HFE145-input-MJ | Input | 36,885,332 | 36,157,452 | - | - | -
147 | CL3 | Hs1.Int | Hs1.Int | HsInt-K4me3.merged | H3K4Me3 | 37,088,221 | 32,789,363 | 22,518 | successful | -
148 | CL3 | Hs1.Int | Hs1.Int (replicate) | HsInt-G-K4me3.merged | H3K4Me3 | 30,617,105 | 27,713,302 | 20,298 | successful | -
149 | CL3 | Hs1.Int | Hs1.Int | HsInt-input.merged | Input | 32,275,816 | 28,576,200 | - | - | -
150 | CL4 | Hs738.St/Int | Hs738.St/Int | Hs738-K4me3.merged | H3K4Me3 | 37,945,394 | 33,334,651 | 150,552 | successful | -
151 | CL4 | Hs738.St/Int | Hs738.St/Int | Hs738-K4me3.merged | Input | 32,275,816 | 24,581,922 | - | - | -
152 | CL5 | IM95 | IM95 | CHG434 | H3K27Ac | 23,309,435 | 9,168,213 | 27,692 | successful | yes
153 | CL5 | IM95 | IM95 | CHG432 | H3K4Me3 | 25,179,506 | 14,069,213 | 19,956 | successful | yes
154 | CL5 | IM95 | IM95 | CHG436 | Input | 37,968,519 | 33,292,944 | - | - | -
155 | CL6 | KATO3 | KATO3 | CHG242 | H3K27Ac | 24,559,532 | 17,356,721 | 28,730 | successful | yes
156 | CL6 | KATO3 | KATO3 | CHG238 | Input | 20,527,352 | 14,593,025 | - | - | -
157 | CL7 | MKN7 | MKN7 | CHG419 | H3K27Ac | 35,301,333 | 30,804,178 | 24,268 | successful | yes
158 | CL7 | MKN7 | MKN7 | CHG417 | H3K4Me3 | 28,119,400 | 24,793,006 | 23,766 | successful | yes
159 | CL7 | MKN7 | MKN7 | CHG421 | Input | 35,839,896 | 31,791,610 | - | - | -
160 | CL8 | NCC59 | NCC59 | CHG218 | H3K27Ac | 22,973,156 | 19,828,610 | 14,937 | successful | yes
161 | CL8 | NCC59 | NCC59 | CHG215 | H3K4Me3 | 15,642,441 | 13,907,147 | 12,410 | successful | yes
162 | CL8 | NCC59 | NCC59 | CHG214 | Input | 17,926,188 | 13,139,789 | - | - | -
163 | CL9 | OCUM1 | OCUM1 | CHG212 | H3K27Ac | 24,573,737 | 20,570,185 | 17,284 | successful | yes
164 | CL9 | OCUM1 | OCUM1 | CHG209 | H3K4Me3 | 19,557,872 | 17,178,274 | 15,445 | successful | yes
165 | CL9 | OCUM1 | OCUM1 | CHG208 | Input | 20,585,679 | 16,680,529 | - | - | -







Promoter Analysis


Promoter (H3K4me3-hi/H3K4me1-lo) regions were identified by calculating the H3K4me3:H3K4me1 ratio for all H3K4me3 regions merged across normal and GC samples. We estimated the sample size required to achieve 80% power at a 10% type I error rate (http://powerandsamplesize.com/) based on the average signals of the top 100 differential promoters between tumor and normal samples. This yielded a recommended sample size of 11 (average), which is met in our study (16 N/T). Regions with H3K4me3:H3K4me1 ratios < 1 in both normal and GC samples were excluded from further analysis. For all analyses performed in this study, promoter regions were defined as genomic locations exhibiting H3K4me3-hi/H3K4me1-lo signals, and in all subsequent analyses H3K4me3 signals were compared only within this predefined H3K4me3-hi/H3K4me1-lo subset. H3K27ac data were used for correlative analysis. H3K4me3 data (FASTQ files) for colon carcinoma lines were downloaded from public databases: HCT116 and Caco2 from ENCODE, and V503 and V400 from GSE36204. To compare promoter signals between GC and normal samples, we applied the DESeq2 and edgeR Bioconductor packages to a read count matrix of ChIP-seq signals, adjusting for replicate information. Regions with fold changes greater than 1.5 at FDR < 0.1 were selected as significantly different; these thresholds were based on previous studies comparing ChIP-seq profiles with DESeq2 and edgeR. Significantly altered promoters identified by DESeq2 overlapped almost completely with those identified by edgeR. A regularized log (rlog) transformation of the DESeq2 read counts was used to plot PCAs and heatmaps.
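The two selection rules above can be sketched in Python. The first rule excludes regions whose H3K4me3:H3K4me1 ratio is below 1 in both normal and GC samples. The second calls a promoter differential when its fold change exceeds 1.5 at FDR < 0.1. This is an illustration only. DESeq2 and edgeR estimate fold changes and adjusted p-values themselves, so here the log2 fold changes and raw p-values are assumed inputs. A Benjamini-Hochberg adjustment stands in for the packages' FDR control. The pseudocount in the ratio is a hypothetical choice to avoid division by zero, and all function names are hypothetical.

```python
import math

PSEUDOCOUNT = 1.0  # hypothetical; avoids division by zero in the ratio


def me3_me1_ratio(k4me3, k4me1, pseudo=PSEUDOCOUNT):
    """H3K4me3:H3K4me1 signal ratio for one region."""
    return (k4me3 + pseudo) / (k4me1 + pseudo)


def keep_region(normal_me3, normal_me1, tumor_me3, tumor_me1):
    """Exclude a region only if the ratio is < 1 in BOTH normal and GC samples."""
    low_normal = me3_me1_ratio(normal_me3, normal_me1) < 1
    low_tumor = me3_me1_ratio(tumor_me3, tumor_me1) < 1
    return not (low_normal and low_tumor)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest to the smallest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_promoters(log2fc, pvals, min_fc=1.5, max_fdr=0.1):
    """Indices of regions with |fold change| > min_fc and q < max_fdr."""
    qvals = bh_adjust(pvals)
    threshold = math.log2(min_fc)
    return [i for i, (lfc, q) in enumerate(zip(log2fc, qvals))
            if abs(lfc) > threshold and q < max_fdr]
```

Working on the log2 scale makes the fold-change rule symmetric, so gained and lost promoters pass the same |log2 FC| > log2(1.5) test.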


Transcriptome Analysis


RNA-seq data were obtained from the European Genome-phenome Archive under Accession No. EGAS00001001128. Reads were first aligned to GENCODE v19 transcript annotations using TopHat v2.0.12, and Cufflinks 2.2.0 was used to generate FPKM abundance measures. To identify novel transcripts, Cufflinks was run without a reference transcript annotation. Transcripts were then merged across all GC and normal samples using Cuffmerge 2.2.0 and compared against GENCODE annotations. Deep-depth strand-specific RNA sequencing was also performed on 10 additional primary samples. Total RNA was extracted using the Qiagen RNeasy Mini kit. RNA-seq libraries were constructed from 1 μg total RNA according to the manufacturer's instructions, using the Illumina Stranded Total RNA Sample Prep Kit v2 (Illumina, San Diego, Calif., USA) with the Ribo-Zero Gold option (Epicentre, Madison, Wis., USA). Sequencing was performed using the paired-end 101 bp read option. TCGA datasets were downloaded from the TCGA Data Portal (https://tcga-data.nci.nih.gov/tcga) as FASTQ files, which were then aligned to GENCODE v19 transcript annotations using TopHat v2.0.12. To analyze promoter-associated RNA expression, RNA-seq reads from TCGA samples (tumors and normals) were mapped against the genomic locations of promoter regions originally defined by epigenomic profiling in the discovery samples, including all promoters, gained somatic promoters, and lost somatic promoters (see FIG. 1 in Main Text). RNA-seq reads mapping to these epigenome-defined promoter regions were quantified and normalized by promoter length (in kilobases) and by total library size. Fold changes in expression were then computed between the tumor and normal TCGA sample groups. The length of each promoter locus was defined as the number of base pairs (bp) between the start and stop genomic coordinates of the H3K4me3 region identified by the peak caller CCAT v3.0.
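The promoter-level normalization described above scales the reads per promoter by promoter length in kilobases and by library size in millions, which gives an RPKM-style value. It can be sketched together with the tumor-versus-normal fold change as below. Averaging RPKM within each group and taking the ratio of group means is an assumption made for illustration, because the text does not specify how group-level fold changes were summarized. The function names are hypothetical.

```python
from statistics import mean


def promoter_length_kb(start: int, stop: int) -> float:
    """Promoter length = stop - start (bp), expressed in kilobases."""
    return (stop - start) / 1000.0


def promoter_rpkm(read_count: int, start: int, stop: int, library_size: int) -> float:
    """Reads per kilobase of promoter per million library reads."""
    return read_count / promoter_length_kb(start, stop) / (library_size / 1e6)


def group_fold_change(tumor_rpkm, normal_rpkm) -> float:
    """Ratio of mean tumor RPKM to mean normal RPKM (illustrative summary)."""
    return mean(tumor_rpkm) / mean(normal_rpkm)
```

For example, 200 reads on a 2 kb promoter in a 10-million-read library give an RPKM of 10.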
Isoform-level quantification of transcripts driven by alternative promoters was performed using Cufflinks (FPKM), Kallisto (TPM) and MISO (isoform-centric analysis). Assigned counts for each isoform were normalized using DESeq2.


DNA Methylation Analysis


Genomic DNA from gastric tumors and matched normal gastric tissues was extracted (QIAGEN) and processed for DNA methylation profiling using Illumina HumanMethylation450 BeadChips (HM450). Methylation β-values were calculated and background-corrected using the methylumi R/Bioconductor package, and normalization was performed using the BMIQ method (wateRmelon R package). CpG island locations were downloaded from the UCSC Genome Browser, and overlaps of at least 1 bp between promoter loci and CpG islands were identified using BEDTools intersect. For each promoter group (all promoters, gained somatic promoters and lost somatic promoters), we identified probes overlapping the predicted promoter regions and calculated average β-value differences between tumor and normal samples. Statistical significance was assessed using a two-sample Wilcoxon test.
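The ≥1 bp overlap rule used with BEDTools intersect, and the per-group average β-value difference, can be illustrated with the sketch below. It assumes BED-style half-open coordinates (0-based start, exclusive end) and a paired tumor/normal layout of β-values for the overlapping probes. It also uses a naive all-pairs overlap search, whereas BEDTools itself uses more efficient algorithms. The function names are hypothetical.

```python
from statistics import mean


def overlaps(a, b) -> bool:
    """True if two (chrom, start, end) half-open intervals share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def promoters_in_cpg_islands(promoters, cpg_islands):
    """Promoters overlapping at least one CpG island (naive O(n*m) scan)."""
    return [p for p in promoters if any(overlaps(p, c) for c in cpg_islands)]


def mean_beta_difference(tumor_betas, normal_betas) -> float:
    """Average tumor-minus-normal beta value across paired probes."""
    return mean([t - n for t, n in zip(tumor_betas, normal_betas)])
```

Under half-open coordinates, intervals [100, 200) and [200, 300) touch but share no base, so they are not counted as overlapping.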


Survival Analysis


Kaplan-Meier survival analysis was performed with overall survival as the outcome measure, and log-rank tests were used to assess the significance of differences between survival curves.
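For readers reimplementing the survival comparison, the sketch below gives a compact version of the Kaplan-Meier product-limit estimator and the two-group log-rank test. It is a pure-Python illustration with hypothetical function names; in practice a dedicated survival package would be used. Events are coded 1 for death and 0 for censoring. The p-value uses the chi-square distribution with one degree of freedom, P = erfc(sqrt(χ²/2)).

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate; returns [(t, S(t))] at each distinct death time.

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    death_times = sorted({t for t, e in zip(times_a, events_a) if e}
                         | {t for t, e in zip(times_b, events_b) if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```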


Gene Set Enrichment Analysis


Gene set enrichment analysis was performed using MSigDB by computing the overlap between genes associated with somatic promoters and the C2 collection of curated gene sets.
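Overlap-based enrichment of this kind is commonly scored with a one-sided hypergeometric test, and MSigDB's overlap tool reports a hypergeometric p-value. The sketch below computes that upper-tail probability with exact integer arithmetic. The parameter names and the choice of background gene universe are assumptions for illustration, not values from the study.

```python
from math import comb


def hypergeom_upper_tail(k: int, universe: int, set_size: int, query_size: int) -> float:
    """P(X >= k), where X counts query genes that fall in a gene set.

    universe:   number of genes in the background (M)
    set_size:   genes in the curated gene set (n)
    query_size: somatic-promoter-associated genes tested (N)
    """
    total = comb(universe, query_size)
    upper = min(set_size, query_size)
    hits = sum(comb(set_size, i) * comb(universe - set_size, query_size - i)
               for i in range(k, upper + 1))
    return hits / total
```

For a 10-gene universe, a 5-gene set and 5 query genes, observing all 5 in the set has probability 1/252.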


Mass Spectrometry and Data Analysis


Peptide-level mass spectrometry data for 90 colon and rectal cancer (CRC) samples and 60 normal colon epithelium samples, generated by the Clinical Proteomic Tumor Analysis Consortium (NCI/NIH), were downloaded from the CPTAC portal (https://cptac-data-portal.georgetown.edu/cptac). Spectral counts were extracted using IDPicker's idQuery tool. Differentially expressed peptides were identified by fitting a linear model (limma, R) to quantile-normalized, log2-transformed spectral counts. For GC cell line mass spectrometry, AGS, GES-1, SNU1750 and MKN1 cells were lysed in RIPA buffer supplemented with protease inhibitor. For each biological quadruplicate (i.e., 4 replicates per cell line), 150 μg of protein extract was separated on a 12% NuPAGE Novex Bis-Tris precast gel (Thermo Scientific). For in-gel digestion, samples were divided into two fractions and reduced in 10 mM DTT for 1 h at 56° C. They were then alkylated with 55 mM iodoacetamide (Sigma) for 45 min in the dark. Tryptic digestion was performed in 50 mM ammonium bicarbonate buffer with 2 μg trypsin (Promega) at 37° C. overnight. Peptides were desalted on StageTips and analyzed by nanoflow liquid chromatography on an EASY-nLC 1200 system coupled to a Q Exactive HF mass spectrometer (Thermo Fisher Scientific). Peptides were separated on a C18 reversed-phase column (25 cm long, 75 μm inner diameter) packed in-house with ReproSil-Pur C18-AQ 1.9 μm resin (Dr. Maisch). The column was mounted on an Easy-Flex Nano Source and maintained at 40° C. by a column oven (Sonation). A 225-min gradient from 2 to 40% acetonitrile in 0.5% formic acid was applied at a flow rate of 225 nl/min, and the spray voltage was set to 2.4 kV. The Q Exactive HF was operated with a TOP20 method, acquiring up to 20 MS/MS spectra per full MS scan. Full MS scans were acquired at a resolution of 60,000 and MS/MS scans at a resolution of 15,000. For data analysis, raw files were processed with MaxQuant version 1.5.2.8 against the UniProt-annotated human protein database.
Carbamidomethylation was set as a fixed modification, and methionine oxidation and protein N-terminal acetylation were set as variable modifications. Search results were filtered in MaxQuant at a false discovery rate of 0.01. The 'match between runs' option and LFQ quantification were enabled. LFQ intensities were filtered to remove potential contaminants and reverse hits, and were then loge-transformed. Missing values were imputed using the open-source software Perseus (width 0.5, downshift 1.8), and the data were fitted using linear models (limma, R).
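Two preprocessing steps named above can be sketched in plain Python. The first is quantile normalization followed by log2 transformation of spectral counts (CPTAC data). The second is Perseus-style imputation, in which missing LFQ values in a sample are drawn from a normal distribution shifted down by 1.8 standard deviations, with a width of 0.5 standard deviations. The pseudocount, the naive tie handling, the use of None for missing values and the fixed random seed are all assumptions made for illustration. The function names are hypothetical, and this is not the Perseus implementation.

```python
import math
import random
import statistics


def quantile_normalize(columns):
    """Give every sample (column) the same distribution: the mean of the sorted columns.

    Ties are broken by position, a simplification.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def log2_transform(columns, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount guards against zero counts."""
    return [[math.log2(v + pseudocount) for v in col] for col in columns]


def impute_downshifted(values, width=0.5, downshift=1.8, seed=0):
    """Replace None entries of one sample with draws from N(mu - downshift*sd, width*sd)."""
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    rng = random.Random(seed)
    return [v if v is not None else rng.gauss(mu - downshift * sd, width * sd)
            for v in values]
```

Drawing imputed values from the low end of each sample's distribution reflects the usual assumption that missing LFQ intensities correspond to proteins below the detection limit.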


5′ RACE and Gene Cloning


5′ rapid amplification of cDNA ends (5′ RACE) was performed using the 5′ RACE System for Rapid Amplification of cDNA Ends, Version 2 (Invitrogen, 18374-058). Briefly, 2 μg of total RNA was used for each reverse transcription reaction, with SuperScript™ II reverse transcriptase and gene-specific primer 1 for each gene. After cDNA synthesis, an RNase mix (RNase H and RNase T1) was used to degrade the RNA. First-strand cDNAs were then purified with S.N.A.P. columns and tailed with dCTP using TdT. dC-tailed cDNAs were amplified with the abridged anchor primer and nested gene-specific primer 2 using GoTaq® Hot Start Polymerase (Promega, M5001). Primary PCR products were then reamplified with the abridged universal amplification primer (AUAP) and gene-specific primer 3, and analyzed by gel electrophoresis. PCR bands of interest were excised and purified for cloning with the TA Cloning Kit (Invitrogen, K2020). A minimum of 12 independent colonies were isolated, and purified plasmid DNA was sequenced bidirectionally on an ABI 3730 DNA analyzer (Applied Biosystems). Primer sequences are listed in Table 2. Constructs for MET transcripts were generated by PCR amplification of full-length cDNAs encoding wild-type and variant MET from KATOIII cells. Wild-type and variant RASA3 full-length transcripts were PCR-amplified from NCC59 cells. cDNA fragments were cloned into the pCI-Puro-HA vector (modified from Promega's pCI-Neo vector; a gift from Wanjin Hong, Institute of Molecular and Cell Biology, Singapore). Plasmids were transiently transfected into cell lines using Lipofectamine 3000 (Thermo Scientific).









TABLE 2

RACE Primers

Gene | Gene-specific primer 1 | Gene-specific primer 2 | Gene-specific primer 3
RASA3 | 5′ GGAGTAGATACGCTCCGT 3′ (SEQ ID NO: 1837) | 5′ CACAGCCAGTGGCCGCTCAGGTA 3′ (SEQ ID NO: 1838) | 5′ CTTCTCCACTGCCAGGATGTT 3′ (SEQ ID NO: 1839)
MET | 5′ TAGGAGAATGTACTGTAT 3′ (SEQ ID NO: 1840) | 5′ GGAGACACTGGATGGGAGTC 3′ (SEQ ID NO: 1841) | 5′ CGAGAAACCACAACCTGCAT 3′ (SEQ ID NO: 1842)









Western Blotting


3×10⁵ HEK293 cells were seeded and transfected using Lipofectamine 3000 (Thermo Scientific). Cells were serum-starved for 16 hours before addition of human HGF (R&D Systems, 100 ng/ml) for 0, 15 or 30 minutes. They were then immediately harvested on ice in cold Triton X-100 lysis buffer (50 mM Tris pH 8.0, 150 mM NaCl, 1% Triton X-100) containing protease and phosphatase inhibitors (Roche). Protein concentration was measured using the Pierce BCA protein assay (Thermo Scientific). Cell lysates were heated at 95° C. for 10 min in SDS sample buffer, and 20 μg of each lysate was loaded per well. Proteins were transferred to nitrocellulose membranes. For western blotting, membranes were incubated for 4 h at room temperature with the following primary antibodies: MET and β-actin (Santa Cruz); p-MET (Y1234/1235 and Y1349), pSTAT3 (S727 and Y705), STAT3, ERK, p-ERK, Gab1 and pGab1 (Y627) (Cell Signaling). Membranes were then incubated with secondary antibodies at 1:3,000 for 1 h at room temperature and developed with SuperSignal West Femto Maximum Sensitivity Substrate (Thermo Scientific) using a ChemiDoc™ MP Imaging System (Bio-Rad). Western blot bands were quantified using Image Lab software (Bio-Rad). Experiments were performed in triplicate.


Cell Proliferation Assays


3×10³ GES1, SNU1967 and AGS cells were plated into 96-well plates in medium containing 10% fetal bovine serum and left overnight to attach. The next day (Day 0), cells were transiently transfected with wild-type or variant RASA3 constructs using Lipofectamine 3000 (Thermo Scientific), at 40 ng/well for AGS cells and 100 ng/well for GES1 and SNU1967 cells. Cell proliferation was measured by the WST-8 assay (Cell Counting Kit-8, Dojindo) from 24 to 120 hours post-transfection. At each time point, 10 μL of WST-8 solution was added per well, and absorbance at 450 nm was measured after 2 hours of incubation in a humidified incubator.


Transfection with RASA3 siRNAs


Two siRNAs were used to silence the RASA3 SomT transcript in NCC24 cells: hs.Ri.RASA3.13.1 TriFECTa® Kit DsiRNA Duplex (Integrated DNA Technologies) and Silencer® Select Pre-Designed siRNA s355 (Life Technologies). NCC24 cells were transfected with either of these two siRNAs or with a non-targeting control (ON-TARGETplus Non-targeting Pool, Dharmacon) at a final concentration of 100 nM for 48 hours. Transfection was followed by qPCR and western blot validation and by migration/invasion assays.


Migration and Invasion Assays


To determine migratory capacity, AGS, GES1 and SNU1967 cells transfected with wild-type or variant RASA3, as well as siRNA-treated NCC24 cells, were tested using Corning Costar 6.5 mm Transwell inserts with 8.0 μm pore polycarbonate membranes (3422, Corning, N.Y., USA). 2.5×10⁴ AGS cells, 2×10⁴ GES1 cells, 3×10⁴ SNU1967 cells or 5×10⁴ NCC24 cells were suspended in 0.1 ml serum-free RPMI medium and added to the top of the Transwell insert. 0.6 ml RPMI containing 10% FBS was added to the bottom well as a chemoattractant. After incubation for 24 h at 37° C. in a 5% CO2 incubator, cells were fixed with 3.7% formaldehyde and permeabilized with 100% methanol. Non-migrated cells were removed from the upper surface of the membrane with cotton swabs, and migrated cells were stained with 0.5% crystal violet. Migration was quantified as the total area covered by migrated cells relative to the area of the Transwell membrane, calculated using ImageJ software. For invasion assays, the Transwell inserts were coated with 0.1 ml Corning Matrigel matrix (300 μg/mL; 354234, Corning, N.Y., USA) for 2 to 4 h at 37° C. before use. All subsequent steps were identical to the migration assay protocol.


Measurement of RASA3 mRNA Levels


Total RNA was extracted from three independent experiments using the Qiagen RNeasy Mini kit according to the manufacturer's instructions. RNA was reverse transcribed using ImProm-II™ Reverse Transcriptase (Promega). Real-time PCR was performed in triplicate using the QuantiFast SYBR Green PCR kit (Qiagen) on an Applied Biosystems 7900HT Real-Time PCR System. Fold change was calculated using the delta Ct method, normalized to β-actin. Primer sequences were as follows. β-actin: F-5′ TCCCTGGAGAAGAGCTACG 3′ (SEQ ID NO: 1843), R-5′ GTAGTTTCGTGGATGCCACA 3′ (SEQ ID NO: 1844); RASA3 SomT: F-5′ TTGTGAGTGGTTCAGCGGTA 3′ (SEQ ID NO: 1845), R-5′ TCAAGCGAAACCATCTCTTCT 3′ (SEQ ID NO: 1846).
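The relative quantification step can be illustrated with the comparative Ct calculation. Here ΔCt = Ct(target) − Ct(β-actin), relative expression = 2^−ΔCt, and fold change versus a control condition = 2^−ΔΔCt. This sketch assumes near-100% amplification efficiency for both primer pairs and averages technical-replicate Ct values first; both are assumptions. The function names are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts) -> float:
    """Mean target Ct minus mean reference (beta-actin) Ct across technical replicates."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(target_cts, reference_cts) -> float:
    """2^-dCt: target level relative to the reference gene."""
    return 2 ** -delta_ct(target_cts, reference_cts)


def fold_change(sample_target, sample_ref, control_target, control_ref) -> float:
    """2^-ddCt: sample relative to control, each normalized to the reference gene."""
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(control_target, control_ref)
    return 2 ** -ddct
```

For example, a ΔCt of 6 in siRNA-treated cells against a ΔCt of 4 in control cells gives ΔΔCt = 2, which corresponds to a fold change of 0.25 (75% knockdown).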


RAS-GTP Assay


GES1 cells were transfected with RASA3 CanT, RASA3 SomT or empty vector for 48 hours. Cells were then either harvested for protein in FBS-containing medium, or serum-starved overnight and stimulated with serum for 30 minutes before harvest. Proteins were extracted using ice-cold lysis buffer (Active RAS Pull-down and Detection Kit) containing protease inhibitor cocktail (Nacalai Tesque). The active RAS fraction was obtained using the Active RAS Pull-down and Detection Kit (Thermo Fisher Scientific) according to the manufacturer's instructions, and total RAS was measured in the corresponding whole-cell protein lysates. β-actin was used as a loading control. Protein concentrations were determined using the Pierce BCA protein assay (Thermo Scientific). SDS sample buffer was added to the lysates, which were then boiled at 100° C. for 5 minutes. Samples were loaded onto a 4-15% Mini-PROTEAN TGX gel (Bio-Rad) and transferred to a PVDF membrane using a semi-dry blotting system (Bio-Rad). Membranes were probed overnight at 4° C. in 5% milk-PBST with anti-RAS (1:200 dilution, supplied in the Active RAS Pull-down and Detection Kit) or anti-β-actin (1:5000 dilution, Sigma A5316). Secondary anti-mouse antibody (LNA931, Amersham) was used at a 1:2000 dilution for 1 hour at room temperature. Membranes were developed using Amersham ECL Prime Western Blotting Detection Reagent and imaged using a ChemiDoc Imaging System (Bio-Rad).


Altered Peptide and Antigen Prediction


Altered peptides were defined as variant N-terminal protein sequences arising from somatic alterations in alternative promoter usage. The following filters were applied to select the pool of altered peptides: (i) a fold change of at least 1.5 in alternative vs. canonical RNA-seq expression; (ii) only one canonical and one alternative isoform per gene locus; and (iii) annotated transcripts confirmed as protein-coding by GENCODE. Canonical promoters were defined as regions exhibiting unaltered H3K4me3 peaks. Random peptides from the human proteome were generated from the amino acid sequences of GENCODE coding transcripts. N-terminal peptide gains were identified as cases where the alternative transcript was associated with a different 5′ region, predicted to result in a translated protein sequence that differs from that of the canonical transcript. For each N-terminally altered protein, we evaluated the binding of 9-mer peptides using NetMHCpan 2.8 with a strict threshold of IC50 ≤ 50 nM to identify strong MHC binders. N-terminal gained peptides were mapped against protein assembly data of the same gene to evaluate protein expression. Antigen predictions were performed against the HLA types of 13 GC samples, predicted using OptiType. OptiType was run with default parameters, except that BWA-MEM was used as the aligner for pre-filtering reads aligning to the OptiType-provided reference sequences. Three samples with poor coverage and unpaired reads with mismatches were omitted from the analysis. Eleven HLA-A, HLA-B and HLA-C allelic variants of increased prevalence in the Southeast Asian population (HLA-A*02:07, HLA-A*11:01, HLA-A*24:02, HLA-A*33:03, HLA-A*24:07; HLA-B*13:01, HLA-B*40:01, HLA-B*46:01; HLA-C*03:04, HLA-C*07:02, HLA-C*08:01) were obtained from the Allele Frequency Net Database (http://www.allelefrequencies.net).
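To make the peptide-selection logic concrete, the sketch below enumerates 9-mer peptides from an alternative (N-terminally altered) protein that do not occur anywhere in the canonical protein. It then keeps candidates whose predicted IC50 is at most 50 nM. It assumes the IC50 predictions are supplied as (peptide, IC50 in nM) pairs, for example parsed from NetMHCpan output. No NetMHCpan interface is reproduced here, and the sequences and function names are hypothetical.

```python
STRONG_BINDER_NM = 50.0  # IC50 <= 50 nM treated as a strong binder


def kmers(seq: str, k: int = 9) -> set:
    """All overlapping k-mers of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def novel_kmers(alt_protein: str, canonical_protein: str, k: int = 9) -> list:
    """k-mers present in the alternative isoform but absent from the canonical one."""
    return sorted(kmers(alt_protein, k) - kmers(canonical_protein, k))


def strong_binders(predictions, threshold_nm: float = STRONG_BINDER_NM) -> list:
    """Keep peptides whose predicted IC50 (nM) is at or below the threshold."""
    return [pep for pep, ic50 in predictions if ic50 <= threshold_nm]
```

With a hypothetical canonical protein ACDEFGHIKLMNPQ and an alternative isoform WWDEFGHIKLMNPQ, only the two 9-mers that span the altered N-terminus, WDEFGHIKL and WWDEFGHIK, are novel.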


Association of Cytolytic Markers with Alternative Promoter Usage


Local immune cytolytic activity was evaluated using the expression of granzyme A (GZMA) and perforin (PRF1). Tumor content was estimated using two algorithms: ASCAT (79) (aberrant cell fraction) and ESTIMATE (tumor purity). Expression data for the SG series were downloaded (GSE15460), normalized using the robust multi-array average algorithm in the ‘affy’ R package and natural-log transformed. Affymetrix SNP Array 6.0 data for the SG series were downloaded from GSE31168 and GSE85466. Mutation frequencies for TCGA STAD samples were obtained from the TCGA STAD publication data (https://tcga-data.nci.nih.gov/docs/publications/stad_20140) using level 2 curated MAF files (QCv5_blacklist_Pass.aggregated.capture.tcga.uuid.curated.somatic.maf) filtered for the “Missense” variant classification. Expression data (TPM) for TCGA STAD samples were computed using the kallisto algorithm. Raw SNP Array 6.0 .CEL files for TCGA gastric cancers (STAD) were downloaded from the GDC data portal (https://gdc-portal.nci.nih.gov/); access to this dataset was obtained using dbGaP credentials and an eRA Commons ID. Precomputed ESTIMATE scores for TCGA STAD were downloaded from http://bioinformatics.mdanderson.org/estimate/ and converted to tumor purity using the formula purity = cos(0.6049872018 + 0.0001467884 × ESTIMATE score). Preprocessed expression data for the ACRG series were downloaded from GSE62254, and pre-computed ASCAT scores were obtained from collaborators (JL). Expression of cytolytic markers was adjusted for missense mutation frequency and tumor purity using a spline regression model.
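The ESTIMATE-to-purity conversion quoted above is a one-line formula; the sketch below implements it directly. It also includes a cytolytic score, but this part is an assumption: the text does not say how GZMA and PRF1 were combined, so the function uses the commonly cited geometric-mean definition with a small offset (0.01) to avoid zero products. The function names are hypothetical, and the spline adjustment step is not reproduced.

```python
import math


def estimate_to_purity(estimate_score: float) -> float:
    """Convert an ESTIMATE score to tumor purity using the formula in the text."""
    return math.cos(0.6049872018 + 0.0001467884 * estimate_score)


def cytolytic_score(gzma_tpm: float, prf1_tpm: float, offset: float = 0.01) -> float:
    """Geometric mean of GZMA and PRF1 expression (TPM).

    Assumption: geometric-mean combination with a small offset; the source
    only states that GZMA and PRF1 expression were used.
    """
    return math.sqrt((gzma_tpm + offset) * (prf1_tpm + offset))


if __name__ == "__main__":
    # Purity falls as the ESTIMATE score (stromal + immune signal) rises.
    for score in (-1000, 0, 1000, 3000):
        print(score, round(estimate_to_purity(score), 3))
    print(cytolytic_score(3.99, 15.99))
```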


Peptides and Cells for Cytokine Assays


A set of peptides for 15 representative alternative promoters was purchased from GenScript. Peptide sequences and the composition of the peptide pools for each alternative promoter are described in Table 3. Control peptide pools for human actin were purchased from JPT (PM-ACTS, PepMix™ Human (Actin)). Peripheral blood mononuclear cells (PBMCs) were obtained from 9 healthy volunteers, 8 of whom were HLA-typed (Table 3).









TABLE 3
HLA types of healthy PBMC donors

Sample | HLA-A | HLA-B | HLA-C
Donor 1 | A*11:01, A*24:02 | B*15:01, B*51:01 | C*04:01, C*14:02
Donor 2 | A*11:01, A*33:03 | B*40:01, B*58:01 | C*03:02, C*07:02
Donor 3 | A*03:01, A*33:03 | B*35:03, B*38:01 | C*12:03, C*12:03
Donor 4 | A*02:07, A*24:07 | B*15:02, B*46:01 | C*01:02, C*08:01
Donor 5 | A*02:03, A*11:01 | B*15:02, B*51:01 | C*08:01, C*14:02
Donor 6 | A*02:01, A*68:01 | B*15:13, B*40:06 | C*08:01, C*15:02
Donor 7 | A*02:07, A*33:03 | B*27:04, B*58:01 | C*03:02, C*12:02
Donor 8 | A*02:03, A*11:01 | B*38:02, B*46:01 | C*01:02, C*07:02
Donor 9 | Not determined









EpiMAX Assay


PBMCs were labelled with 1 μM CFSE (Life Technologies, Thermo Fisher Scientific) and cultured at 200,000 cells per well for 5 days in complete culture medium (cRPMI: RPMI 1640 medium (Gibco, Thermo Fisher Scientific), 15 mM HEPES (Gibco), 1% non-essential amino acids (Gibco), 1 mM sodium pyruvate (Gibco), 1% penicillin/streptomycin (Gibco), 2 mM L-glutamine (Gibco), 50 μM β2-mercaptoethanol (Sigma, Merck) and 10% heat-inactivated FCS (Hyclone)). Individual peptide pools for each alternative promoter were added at the start of culture at 1 μg/ml per peptide. At the end of day 5, cells were stained with the LIVE/DEAD® Fixable Near-IR Dead Cell Stain Kit (Life Technologies) and labelled with CD4-BUV737 (BD), CD8-Pacific Blue (BD), CD3-PE (BioLegend), CD19-PE/Texas Red (Beckman) and CD56-APC (BD). T cell proliferation was analyzed by CFSE dilution using an LSR II flow cytometer (BD). In addition, magnetic bead-based cytokine multiplex analysis (human cytokine panel 1, Millipore, Merck) was performed on cell culture supernatants to measure secreted cytokine levels.


IFN-γ Assay


To test the immunogenicity of the wild-type (WT) and variant RASA3 protein sequences, CD14+ monocytes were isolated from an HLA-A*02:06 donor by positive selection using magnetic beads (Miltenyi, Germany). Dendritic cells (DCs) were generated with GM-CSF (1000 IU/ml) and IL-4 (400 IU/ml) and further matured for 24 hours with TNF (10 ng/ml), IL-1β (10 ng/ml), IL-6 (10 ng/ml) (Miltenyi, Germany) and PGE2 (1 μg/ml) (Stemcell Technologies, Canada). The DCs were then primed for 24 hours with lysates of AGS cells expressing WT or variant RASA3, before being co-cultured with T cells from the same donor at a DC:T cell ratio of 1:5. After 5 days of co-culture with DCs, T cells were isolated by positive selection using CD3 magnetic beads (Miltenyi, Germany) and co-cultured for two days with AGS cells expressing either WT or variant RASA3 at a T cell:AGS cell ratio of 20:1. Supernatants were harvested and IFN-γ release was measured by ELISA (R&D, USA).


NanoString Analysis


NanoString nCounter Reporter CodeSets were designed for 95 genes (83 upregulated in GC and 11 downregulated) and 5 housekeeping genes covering a broad expression range (AGPAT1, CLTC, B2M, POLR2L and TBP), and applied to the SG series samples. For each gene, we designed 3 probes targeting a) the 5′ end of the alternate promoter location, b) the 5′ end of the canonical promoter (defined as promoter regions of equal enrichment in both GC and normal samples OR the longest protein-coding transcript) and c) a common downstream region. Vendor-provided nCounter software (nSolver) was used for data analysis. Raw counts were normalized using the geometric mean of the internal positive control probes included in each CodeSet.
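Positive-control normalization of this kind can be sketched with NumPy. The sketch assumes the commonly described nSolver convention (each sample's scaling factor equals the average of the per-sample positive-control geometric means divided by that sample's own geometric mean); it is not the vendor's code, and the function name is hypothetical.

```python
import numpy as np


def positive_control_normalize(counts: np.ndarray, pos_ctrl: np.ndarray) -> np.ndarray:
    """Scale raw NanoString counts by positive-control geometric means.

    counts:   genes x samples array of raw counts
    pos_ctrl: positive-control probes x samples array of raw counts (all > 0)
    """
    # Per-sample geometric mean of the positive-control probes.
    geo_means = np.exp(np.log(pos_ctrl).mean(axis=0))
    # Per-sample scaling factor (assumed convention): mean of the geometric
    # means across samples divided by this sample's geometric mean.
    factors = geo_means.mean() / geo_means
    return counts * factors  # broadcasts one factor per column (sample)


if __name__ == "__main__":
    pos = np.array([[1.0, 4.0], [4.0, 16.0]])     # toy positive controls
    raw = np.array([[10.0, 10.0], [20.0, 40.0]])  # toy gene counts
    print(positive_control_normalize(raw, pos))
```

After scaling, every sample has the same positive-control geometric mean, which removes lane-to-lane differences in hybridization and detection efficiency.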


A separate NanoString assay was designed for 88 genes on the ACRG cohort. For each gene, we designed 3 probes, targeting a) the 5′ end of the alternate promoter location, b) the 5′ end of the canonical promoter (defined by promoter regions of equal enrichment in both GC and normal samples OR the longest protein coding transcript).


Repeat Enrichment Analysis


Repetitive element families over-represented at regions exhibiting somatic promoter alterations were identified using RepeatMasker annotations from the UCSC Table Browser (GRCh37/hg19). “Unknown”, “Simple_Repeat” and “Satellite” annotations were removed from the repeat set. Repetitive elements were included only if they overlapped a promoter by at least 50%. Enrichment of repetitive element families was assessed using a binomial test with Benjamini-Hochberg FDR correction, using all promoter regions as the background.
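The overlap filter, binomial test and Benjamini-Hochberg correction above can be sketched as follows. Two points are assumptions: the 50% criterion is measured relative to the repeat element's length, and the test is one-sided (over-representation). Family names and counts in the example are hypothetical.

```python
import numpy as np
from scipy.stats import binomtest


def overlaps_enough(rep, prom, min_frac=0.5):
    """True if >= min_frac of the repeat (start, end) lies inside the promoter.

    Assumption: fraction measured relative to repeat length; half-open coords.
    """
    ov = min(rep[1], prom[1]) - max(rep[0], prom[0])
    return ov > 0 and ov / (rep[1] - rep[0]) >= min_frac


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q


def family_enrichment(somatic_hits, n_somatic, background_hits, n_background):
    """One-sided binomial test per repeat family, BH-corrected.

    somatic_hits[f]:    number of somatic promoters overlapping family f
    background_hits[f]: number of all promoters overlapping family f
    """
    families = sorted(somatic_hits)
    pvals = [
        binomtest(somatic_hits[f], n_somatic,
                  background_hits[f] / n_background,
                  alternative="greater").pvalue
        for f in families
    ]
    return dict(zip(families, bh_adjust(pvals)))


if __name__ == "__main__":
    q = family_enrichment({"LTR12": 50, "AluY": 10}, 100,
                          {"LTR12": 2300, "AluY": 2300}, 23000)
    print(q)
```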


Functional Prediction Analysis


Genome-wide and tissue-specific functional scores were downloaded from GenoCanyon (http://genocanyon.med.yale.edu/GenoCanyon_Downloads.html, Version 1.0.3) and GenoSkyline (http://genocanyon.med.yale.edu/GenoSkyline), respectively. Overlaps were calculated using bedtools intersect (intersectBed), and functional scores were computed over each unannotated somatic promoter.


Transcription Factor Enrichment


Transcription factor binding sites (TFBSs) for 237 TFs were obtained from the ReMap database, a public collection of ENCODE and other public ChIP-seq TFBS data sets. Overlaps with the somatic promoter set were calculated and counted. Relative enrichment scores were calculated as the ratio of (#bases in state and overlapping feature)/(#bases in genome) to [(#bases overlapping feature)/(#bases in genome) × (#bases in state)/(#bases in genome)].
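The enrichment ratio above is an observed/expected overlap: the observed fraction of the genome covered by both the state (here, somatic promoters) and the feature (TFBSs), divided by the fraction expected if the two were independent. A minimal sketch, assuming sorted, non-overlapping half-open intervals on a single chromosome; the function names are hypothetical.

```python
def overlap_bases(a, b):
    """Bases shared by two sorted, non-overlapping lists of half-open
    intervals on the same chromosome, e.g. [(start, end), ...]."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def relative_enrichment(state_and_feature, feature, state, genome):
    """Observed / expected overlap; all arguments are base counts."""
    observed = state_and_feature / genome
    expected = (feature / genome) * (state / genome)
    return observed / expected


if __name__ == "__main__":
    promoters = [(0, 100)]   # toy "state" intervals
    tfbs = [(50, 150)]       # toy "feature" intervals
    shared = overlap_bases(promoters, tfbs)
    print(relative_enrichment(shared, 100, 100, 1000))
```

A score above 1 indicates that the TF binds somatic promoters more often than expected by chance given the genomic coverage of each.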


EZH2 Inhibition


IM95 cells were treated with the selective EZH2 inhibitor GSK126 (Selleck, USA; dissolved in DMSO) at a concentration of 5 μM; control cells were treated with DMSO at the same final concentration (0.1%). Cell proliferation was monitored in 96-well plates after GSK126 treatment using the CellTiter-Glo® Luminescent Cell Viability Assay (Promega) in three independent experiments. For RNA-seq analysis, total RNA was extracted using the Qiagen RNeasy Mini Kit according to the manufacturer's instructions. RNA-seq differential analysis of promoter loci was carried out using edgeR on read counts in H3K4me3 regions, quantified using featureCounts. RNA-seq gene-level differential analysis was performed using Cuffdiff (v2.2.1).
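Counting reads in H3K4me3 regions with featureCounts requires an annotation of those regions; one way to provide it is featureCounts' SAF format (GeneID, Chr, Start, End, Strand; 1-based, inclusive coordinates). The sketch below converts BED regions (0-based, half-open) to SAF text. It assumes regions come from a BED file and that counting is unstranded, so the strand column is set to "+" arbitrarily; the function name is hypothetical.

```python
def bed_to_saf(bed_lines):
    """Convert BED regions (0-based, half-open) to featureCounts SAF text
    (1-based, inclusive)."""
    rows = ["GeneID\tChr\tStart\tEnd\tStrand"]
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Use the BED name if present, else a coordinate-based ID; unique IDs
        # keep featureCounts from merging regions into one meta-feature.
        region_id = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        # BED start is 0-based, SAF start is 1-based; the end is unchanged.
        # Strand "+" is arbitrary here (assumes unstranded counting).
        rows.append(f"{region_id}\t{chrom}\t{start + 1}\t{end}\t+")
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    print(bed_to_saf(["chr1\t999\t2000\tpeak1\n", "chr2\t0\t500\n"]), end="")
```

The resulting file could then be passed to featureCounts with, for example, `featureCounts -F SAF -a h3k4me3_regions.saf -o promoter_counts.txt sample1.bam sample2.bam` (file names hypothetical), and the count matrix imported into edgeR.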


Additional Information


Accession codes: Genomic data for this study have been deposited in the National Center for Biotechnology Information (NCBI) GEO database under accession numbers GSE51776 and GSE75898 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=kfoxqeamzftpal&acc=GSE75898).


Results


Identifying Epigenomic Promoter Alterations in GC


Using Nano-ChIPseq, we profiled three histone modification marks (H3K4me3, H3K27ac and H3K4me1) across 17 GCs with matched normal gastric mucosae (34 samples) and 13 GC cell lines, generating 110 epigenomic profiles (Tables 1 and 4 provide clinical and sequencing metrics) (FIG. 1a). Quality control of the Nano-ChIPseq data was performed using two independent methods: ChIP enrichment at known promoters, and the ChIP-seq quality control and validation tool CHANCE (CHip-seq ANalytics and Confidence Estimation). Comparisons of Nano-ChIPseq read densities at 1,000 promoters associated with highly expressed protein-coding genes confirmed successful enrichment in all H3K27ac and H3K4me3 libraries. CHANCE analysis also revealed that the large majority (81%) of samples exhibited successful enrichment (Table 1). We have also previously shown that Nano-ChIP signals exhibit good concordance with orthogonal ChIP-qPCR results.









TABLE 4
Clinicopathological Parameters of samples used

Sample ID | Platform | Age | Gender | Site of Tumor | Stage (T) | Stage (N) | Stage (M) | Stage (AJCC7) | Grade | Lauren's Classification | EBV status | TCGA Subtype
20021007 | ChIPseq + Infinium450K | 53.8 | male | GE junction | T2b | N0 | m0 | 2A | poorly differentiated | intestinal type adenocarcinoma | unknown | GS
20020720 | ChIPseq + Infinium450K | 75.2 | male | antrum | T2a | N1 | m0 | 2A | moderately differentiated | intestinal type adenocarcinoma | unknown | CIN
2001206 | ChIPseq + Infinium450K | 64.8 | male | antrum | T4a | N3b | m1 | 4 | poorly differentiated | diffuse type adenocarcinoma | unknown | CIN
2000877 | ChIPseq + Infinium450K | 44.6 | male | cardia | T2a | N1 | m0 | 2A | poorly differentiated | intestinal type adenocarcinoma | unknown | CIN
2000085 | ChIPseq + Infinium450K | 52.6 | male | lesser curve | T2 | N0 | m0 | 1B | moderately differentiated | intestinal type adenocarcinoma | yes | GS
990275 | ChIPseq + Infinium450K | 71.6 | male | lesser curve | T4a | N0 | m0 | 2B | moderately differentiated | intestinal type adenocarcinoma | no | CIN
990068 | ChIPseq + Infinium450K | 73.3 | male | body | T4a | N2 | m0 | 3B | poorly differentiated | intestinal type adenocarcinoma | no | GS
980447 | ChIPseq + Infinium450K | 68.8 | male | lesser curve | T4a | N3b | m1 | 4 | poorly differentiated | intestinal type adenocarcinoma | unknown | CIN
980436 | ChIPseq + Infinium450K | 65.0 | female | lesser curve | T4a | N1 | m0 | 3A | moderately differentiated | intestinal type adenocarcinoma | unknown | GS
980401 | ChIPseq + Infinium450K | 82.9 | female | unknown | T4a | N1 | m0 | 3A | poorly differentiated | diffuse type adenocarcinoma | unknown | GS
980319 | ChIPseq + Infinium450K | 67.8 | male | unknown | T4a | N1 | m0 | 3A | poorly differentiated | mixed/OTHERS | yes | GS
2000986 | ChIPseq + Infinium450K + RNA-seq | 39.0 | female | pylorus | T4a | N3b | m1 | 4 | poorly differentiated | diffuse type adenocarcinoma | unknown | GS
2000721 | ChIPseq + Infinium450K + RNA-seq | 70.9 | male | lesser curve | T4a | N3b | m1 | 4 | poorly differentiated | diffuse type adenocarcinoma | yes | GS
2000639 | ChIPseq + Infinium450K + RNA-seq | 69.5 | male | lesser curve | T4a | N3a | m1 | 4 | moderately differentiated | intestinal type adenocarcinoma | yes | GS
980437 | ChIPseq + Infinium450K + RNA-seq | 67.8 | female | incisura | T4a | N3b | m0 | 3C | poorly differentiated | intestinal type adenocarcinoma | unknown | CIN
980417 | ChIPseq + Infinium450K + RNA-seq | 67.0 | male | lesser curve | T4a | N3b | m0 | 3C | poorly differentiated | diffuse type adenocarcinoma | yes | GS
980097 | ChIPseq + Infinium450K + RNA-seq | 65.4 | male | unknown | T2 | N1 | m0 | 2A | undifferentiated | mixed/OTHERS | unknown | EBV
980418 | Infinium450K | 88.0 | male | greater curve | T4a | N2 | m0 | 3B | moderately differentiated | intestinal type adenocarcinoma | unknown |
57689477 | RNA-seq | 84.5 | female | greater curve | T1b | N0 | m0 | 1A | moderately differentiated | intestinal type adenocarcinoma | no |
43658255 | RNA-seq | 66.6 | male | antrum | T4a | N3a | m1 | 4 | moderately differentiated | intestinal type adenocarcinoma | unknown |
2000892 | RNA-seq | 71.3 | female | lesser curve | T2 | N1 | m0 | 2A | moderately differentiated | intestinal type adenocarcinoma | no |









To enable accurate promoter identification, we integrated data from multiple histone modifications, selecting H3K4me3 regions simultaneously depleted for H3K4me1 (42) (“H3K4me3 hi/H3K4me1 lo regions”; FIG. 7, Methods). Comparisons against data from external sources, including GENCODE reference transcripts, ENCODE chromatin-state models and CAGE (Cap Analysis of Gene Expression) databases, validated the vast majority of H3K4me3 hi/H3K4me1 lo regions as true promoter elements (see section titled “Validation of H3K4me3 hi/H3K4me1 lo regions as true promoters” and FIG. 7). Because primary gastric tissues comprise several different cell types, including epithelial cells, immune cells and stroma, we further confirmed that our promoter profiles reflected bona fide gastric epithelia by comparison against Epigenome Roadmap data for gastric and non-gastric tissues. Gastric tumor and matched normal promoter profiles exhibited the highest correlations with Roadmap gastric mucosae and were distinct from other gastrointestinal tissues (small intestine, colon mucosa, sigmoid colon), stomach-associated muscle, skin and blood (CD14) (FIG. 8). Primary tissue promoter profiles also showed a significant overlap with promoter profiles of GC cell lines (87%), which are purely epithelial in origin, compared with gastrointestinal fibroblast lines (58-69%) and colon carcinoma lines (59-74%) (FIG. 8).
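The "H3K4me3 hi/H3K4me1 lo" selection can be illustrated with a simple two-condition filter. The exact criteria are given in the Methods and are not reproduced here: the minimum H3K4me3 signal (`me3_min`), the log2 ratio cut-off (`min_log2_ratio`) and the pseudocount below are hypothetical values chosen only to show the shape of the rule.

```python
import numpy as np


def me3_hi_me1_lo(me3, me1, me3_min=10.0, min_log2_ratio=1.0, pseudocount=1.0):
    """Flag regions with high H3K4me3 and comparatively low H3K4me1.

    me3, me1: normalized signal per candidate region (same order).
    me3_min and min_log2_ratio are hypothetical cut-offs, not the study's values.
    """
    me3 = np.asarray(me3, dtype=float)
    me1 = np.asarray(me1, dtype=float)
    # Pseudocount keeps the ratio finite where H3K4me1 signal is zero.
    log2_ratio = np.log2((me3 + pseudocount) / (me1 + pseudocount))
    return (me3 >= me3_min) & (log2_ratio >= min_log2_ratio)


if __name__ == "__main__":
    # Region 1: promoter-like; region 2: H3K4me1-rich (enhancer-like); region 3: weak.
    print(me3_hi_me1_lo([50, 50, 5], [5, 60, 0]))
```

The design reflects the rationale in the text: H3K4me3 marks both promoters and some strong enhancers, whereas high H3K4me1 is characteristic of enhancers, so requiring low H3K4me1 enriches for promoter elements.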


In total, we mapped ~23,000 promoter elements in the Nano-ChIPseq cohort. Visual exploration of these promoter elements identified three main promoter categories: unaltered promoters, promoters gained in tumors (gained somatic or tumor-specific promoters), and promoters present in normal gastric tissues but lost or decreased in GC (lost somatic or normal-specific promoters) (FIG. 1a-c). Representative unaltered promoters included RhoA (FIG. 1a), while CEACAM6, an intercellular adhesion gene, exhibited somatic promoter gain at the CEACAM6 transcription start site (TSS) in tumor samples and cell lines (FIG. 1b). Conversely, ATP4A, a parietal cell-associated H+/K+ ATPase with decreased expression in GC (43), exhibited somatic promoter loss (FIG. 1c). The CEACAM6 promoter gain and the ATP4A promoter loss were accompanied by increased CEACAM6 and decreased ATP4A gene expression, respectively, in the same samples (FIGS. 1b and 1c).


Previous studies have established distinct molecular subtypes of GC. Because of limited sample sizes, however, we elected in the current study to identify promoter alterations (“somatic promoters”) present in multiple GC tissues relative to control tissues, irrespective of subtype. Focusing on recurrent alterations also has the benefit of reducing potential artefacts due to “private” epigenomic variation or sample-specific technical errors. Using two complementary read-count-based algorithms commonly used for ChIP-seq analysis, we identified ~2,000 highly recurrent somatic promoters, of which 75% were gained in GCs (FC ≥ 1.5, q < 0.1). Two-dimensional heat-map clustering and principal component analysis (PCA) based on somatic promoters confirmed a separation of GCs from normal samples (FIG. 1d and FIG. 9). Somatic promoter H3K4me3 levels were also highly correlated with H3K27ac signals (r=0.91, P<0.001, FIG. 1e), a mark commonly regarded as indicating active regulatory activity. This correlation was observed across all somatic promoters (r=0.84, P<0.001, FIG. 1E), and also when gained and lost somatic promoters were analyzed separately (r=0.78, P<0.001 for gained somatic; r=0.82, P<0.001 for lost somatic; FIG. 9). Pathway analysis revealed that gained and lost somatic promoters were significantly associated with gene sets previously reported to be upregulated and downregulated in GC, respectively (FIG. 10). These included upregulated oncogenes (MET, ABL2), cell adhesion genes (CEACAM6) and claudin family members (CLDN7, CLDN3). Between 15% and 18% of somatic promoters mapped to non-coding RNAs (ncRNAs), including HOTAIR and PVT1, which have previously been associated with GC (Table 5). Additional analyses at increasing stringency thresholds (FC from 1.5 to 2 and FDR from 0.1 to 0.001) yielded similar results, supporting the robustness of this analysis (FIG. 9).
These results demonstrate that normal gastric epithelia and GCs can be distinguished on the basis of epigenomic promoter profiles.
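The thresholding step behind the gained/lost calls, and the stringency sweep used as a robustness check, can be sketched as follows. This covers only the final labelling given per-promoter tumor/normal fold changes and q-values; the two read-count algorithms and the recurrence requirement are not modelled. Treating "lost" as FC ≤ 1/1.5 is an assumption, as are the toy inputs.

```python
import numpy as np


def classify_promoters(fold_change, qvals, fc_cut=1.5, q_cut=0.1):
    """Label promoters as gained, lost or unaltered from tumor/normal fold change.

    Assumption: "lost" uses the reciprocal fold-change cut-off (FC <= 1/fc_cut).
    """
    fc = np.asarray(fold_change, dtype=float)
    q = np.asarray(qvals, dtype=float)
    significant = q < q_cut
    labels = np.full(fc.shape, "unaltered", dtype=object)
    labels[significant & (fc >= fc_cut)] = "gained"
    labels[significant & (fc <= 1.0 / fc_cut)] = "lost"
    return labels


if __name__ == "__main__":
    fc = [2.0, 0.5, 1.2, 3.0]
    q = [0.01, 0.05, 0.001, 0.2]
    # Stringency sweep over the FC and FDR ranges mentioned in the text.
    for fc_cut in (1.5, 2.0):
        for q_cut in (0.1, 0.01, 0.001):
            print(fc_cut, q_cut, list(classify_promoters(fc, q, fc_cut, q_cut)))
```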









TABLE 5
Non-coding RNAs associated with altered promoters

Gene | H3K4Me3 (T/N)
AC004158.2 | Gain
AC004870.4 | Gain
AC005281.1 | Gain
AC005550.4 | Gain
AC007040.5 | Gain
AC007392.3 | Gain
AC009229.6 | Gain
AC012531.23 | Gain
AC016683.6 | Gain
AC016995.3 | Gain
AC019201.1 | Loss
AC068134.6 | Gain
AC069277.2 | Gain
AC073479.1 | Loss
AC079779.4 | Loss
AC090051.1 | Loss
AC092296.1 | Gain
AC092594.1 | Gain
AC092635.1 | Loss
AC096579.1 | Loss
AC096579.13 | Loss
AC096579.7 | Loss
AC116351.2 | Gain
AC128653.1 | Loss
AC131951.1 | Loss
AC133680.1 | Loss
AC140912.1 | Gain
AC144521.1 | Gain
AF127936.5 | Loss
AJ003147.8 | Gain
AL031721.1 | Gain
AL109618.1 | Gain
AL122015.1 | Gain
AL122127.1 | Loss
AL122127.2 | Loss
AL122127.3 | Loss
AL122127.4 | Loss
AL122127.5 | Loss
AL139319.1 | Gain
AP000525.9 | Gain
AP001065.15 | Gain
C11orf95 | Gain
C1orf132 | Loss
CASC9 | Gain
CCAT1 | Gain
CECR7 | Loss
CT49 | Gain
CTB-175P5.4 | Gain
CTC-228N24.1 | Gain
CTC-276P9.1 | Loss
CTC-480C2.1 | Gain
CTD-2008P7.9 | Loss
CTD-2147F2.1 | Gain
CTD-2201E18.5 | Gain
CTD-2314B22.1 | Gain
CTD-2314B22.3 | Gain
CTD-2532K18.1 | Gain
CTD-2591A6.2 | Gain
FENDRR | Loss
FZD10-AS1 | Gain
GS1-179L18.1 | Gain
GS1-259H13.2 | Gain
H19 | Gain
hsa-mir-4537 | Loss
hsa-mir-4538 | Loss
hsa-mir-4539 | Loss
JRK | Loss
LINC00237 | Gain
LINC00278 | Loss
LINC00355 | Gain
LINC00365 | Loss
LINC00393 | Gain
LINC00665 | Gain
LINC00668 | Gain
LINC00669 | Gain
LINC00675 | Loss
LINC00858 | Gain
LINC00898 | Gain
LINC00939 | Gain
LINC00960 | Gain
MIR1184-1 | Gain
MIR135B | Gain
MIR144 | Loss
MIR196B | Gain
MIR3147 | Gain
MIR3185 | Gain
MIR31HG | Loss
MIR4488 | Gain
MIR4634 | Gain
MIR663A | Gain
MIR663B | Loss
MIR935 | Gain
MLLT4-AS1 | Gain
PVT1 | Gain
RN7SKP258 | Gain
RN7SL773P | Gain
RNA5S17 | Gain
RNA5SP18 | Gain
RNA5SP19 | Gain
RNA5SP75 | Loss
RNU1-92P | Gain
RNVU1-10 | Gain
RP11-108K3.1 | Gain
RP11-138J23.1 | Gain
RP11-13A1.1 | Gain
RP11-161I10.1 | Gain
RP11-163N6.2 | Gain
RP11-168L22.2 | Gain
RP11-16E12.2 | Loss
RP11-177F15.1 | Gain
RP11-191L9.4 | Gain
RP11-211C9.1 | Gain
RP11-229C3.2 | Loss
RP11-246A10.1 | Gain
RP11-25H12.1 | Gain
RP11-276H19.2 | Gain
RP11-288G11.3 | Loss
RP11-299P2.1 | Loss
RP11-2E17.1 | Loss
RP11-308B16.2 | Gain
RP11-326A19.4 | Gain
RP11-346D19.1 | Gain
RP11-347D21.4 | Gain
RP11-348J24.2 | Gain
RP11-351J23.2 | Gain
RP11-356J5.12 | Gain
RP11-357H14.17 | Gain
RP11-371I1.2 | Gain
RP11-137D17.1 | Gain
RP11-395B7.2 | Gain
RP11-3J1.1 | Gain
RP11-400N13.2 | Gain
RP11-403I13.5 | Gain
RP11-408B11.2 | Gain
RP11-426L16.8 | Gain
RP11-431M3.1 | Loss
RP11-434D9.2 | Gain
RP11-43F13.4 | Gain
RP11-44H4.1 | Gain
RP11-44N12.5 | Gain
RP11-451B8.1 | Gain
RP11-453F18_B.1 | Gain
RP11-460N16.1 | Gain
RP11-469L4.1 | Loss
RP11-472N13.2 | Gain
RP11-48O20.4 | Loss
RP11-499F3.2 | Gain
RP11-514D23.1 | Loss
RP11-547I7.2 | Gain
RP11-575F12.1 | Gain
RP11-576D8.4 | Gain
RP11-599B13.3 | Loss
RP11-608O21.1 | Gain
RP11-60A8.1 | Gain
RP11-61G19.1 | Gain
RP11-626G11.4 | Gain
RP11-626H12.1 | Gain
RP11-627G23.1 | Loss
RP11-632K5.3 | Gain
RP11-66B24.2 | Gain
RP11-66B24.7 | Gain
RP11-689K5.3 | Gain
RP1-170O19.14 | Gain
RP1-170O19.17 | Gain
RP11-776H12.1 | Gain
RP11-79P5.7 | Gain
RP11-809C18.5 | Gain
RP11-81H14.2 | Loss
RP11-831A10.2 | Loss
RP11-834C11.14 | Gain
RP11-834C11.6 | Loss
RP11-867G2.6 | Gain
RP11-89F3.2 | Gain
RP11-933H2.4 | Gain
RP11-963H4.3 | Loss
RP1-274L7.1 | Gain
RP13-137A17.4 | Loss
RP13-137A17.6 | Loss
RP13-379O24.3 | Loss
RP1-63G5.5 | Gain
RP1-79C4.4 | Gain
RP3-522D1.1 | Gain
RP4-562J12.2 | Gain
RP4-594A5.1 | Gain
RP5-1077H22.2 | Loss
RP5-1121A15.3 | Gain
RP5-884M6.1 | Gain
RP5-916L7.2 | Gain
RP6-114E22.1 | Gain
SNORA31 | Gain
SNORA48 | Gain
SNORD56B | Loss
snoU13 | Gain
SOX21-AS1 | Loss
TPTEP1 | Loss
TTTY15 | Loss
U3 | Loss
U8 | Loss










Validation of H3K4Me3 Hi/H3K4Me1 Lo Regions as True Promoters


Four lines of evidence support the vast majority of H3K4me3 hi/H3K4me1 lo regions as true promoters. First, H3K4me3 hi/H3K4me1 lo regions were strongly enriched within 1 kb upstream of known GENCODE transcription start sites (TSSs) (FIG. 7). Second, at TSS regions, H3K4me3 signals exhibited a classical skewed bimodal intensity pattern previously reported to be associated with promoters (FIG. 7). Third, when overlapped with regions defined by the Epigenomic Roadmap (EpiRd) 15-state model, H3K4me3 hi/H3K4me1 lo regions were significantly enriched at proximal promoter states (active TSSs and TSS-flanking regions) in gastrointestinal tissues relative to other tissues (FIG. 7). Fourth, CAGE (Cap Analysis of Gene Expression) is a specialized transcriptome sequencing method that maps gene promoters using 5′ mRNA data; integration with CAGE data from the FANTOM5 consortium revealed an 81% overlap of H3K4me3 hi/H3K4me1 lo regions with robust CAGE tag clusters (FIG. 7).
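An overlap percentage such as the 81% CAGE figure can be computed with a sorted-interval lookup. The sketch assumes that a region counts as overlapping if it shares at least 1 bp with any cluster (the study's exact criterion and the definition of "robust" CAGE clusters are not reproduced), and it uses half-open (BED-style) coordinates. The function name and example intervals are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def fraction_overlapping(regions, clusters):
    """Fraction of regions overlapping >= 1 bp of any cluster.

    regions, clusters: iterables of (chrom, start, end), half-open coordinates.
    """
    grouped = defaultdict(list)
    for chrom, start, end in clusters:
        grouped[chrom].append((start, end))
    index = {}
    for chrom, ivs in grouped.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        # prefix_max[i] = largest end among the first i + 1 clusters by start,
        # so overlapping or nested clusters are handled correctly.
        prefix_max, running = [], float("-inf")
        for _, end in ivs:
            running = max(running, end)
            prefix_max.append(running)
        index[chrom] = (starts, prefix_max)

    regions = list(regions)
    hits = 0
    for chrom, start, end in regions:
        if chrom not in index:
            continue
        starts, prefix_max = index[chrom]
        k = bisect_left(starts, end)  # clusters [0, k) start before region end
        if k > 0 and prefix_max[k - 1] > start:
            hits += 1
    return hits / len(regions) if regions else 0.0


if __name__ == "__main__":
    cage = [("chr1", 100, 150), ("chr1", 400, 420), ("chr2", 10, 20)]
    promoters = [("chr1", 140, 300), ("chr1", 200, 390),
                 ("chr2", 20, 30), ("chr3", 0, 10)]
    print(fraction_overlapping(promoters, cage))
```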


Somatic Promoters in GC Exhibit Deregulation in Diverse Cancer Types


To explore relationships between epigenomic promoter alterations and gene expression, we analyzed RNA-seq data from the same discovery cohort (~106 million reads/sample), quantifying RNA-seq reads mapping to the epigenome-guided promoter regions or directly downstream. At somatic promoter regions (FIG. 2A provides an illustrative example of a gained somatic promoter), we observed significantly increased expression at gained somatic promoters in GCs and significantly decreased expression at lost somatic promoters, compared to either all promoters (P<0.001, FIG. 2B) or unaltered promoters (P<0.001, FIG. 10). Among other types of epigenetic modifications, previous studies have also reported a reciprocal relationship between active regulatory regions and DNA methylation. Using Infinium 450K DNA methylation arrays, we identified 7,505 CpG sites overlapping somatic promoter regions (5,213 sites for gained somatic promoters and 2,292 sites for lost somatic promoters). Promoters gained in GC were significantly hypomethylated compared to all promoters (P<0.001, Wilcoxon test), while promoters lost in GC were hypermethylated (P<0.001, Wilcoxon test) (FIG. 2b, bottom). As DNA methylation typically occurs in CpG-rich regions (56), we then repeated the analysis focusing only on CpG island-bearing promoters (Methods and Materials). Consistent with the original results, CpG island-bearing promoters gained in GC were significantly hypomethylated compared to all CpG island-bearing promoters (P<0.001, Wilcoxon test), while CpG island-bearing promoters lost in GC were hypermethylated (P<0.001, Wilcoxon test) (FIG. 11).
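The one-sided methylation comparisons above can be sketched with SciPy's Mann-Whitney U test, which is equivalent to the Wilcoxon rank-sum test. The sketch assumes each value is a per-CpG beta value from the 450K array and that "hypomethylated" corresponds to the alternative that promoter CpGs are stochastically lower than the reference set; the beta values shown are toy numbers, not study data, and the function name is hypothetical.

```python
from scipy.stats import mannwhitneyu


def methylation_shift_pvalue(promoter_betas, reference_betas, direction):
    """One-sided Wilcoxon rank-sum (Mann-Whitney U) test on CpG beta values.

    direction="hypo":  promoter CpGs less methylated than the reference set
    direction="hyper": promoter CpGs more methylated than the reference set
    """
    alternative = {"hypo": "less", "hyper": "greater"}[direction]
    return mannwhitneyu(promoter_betas, reference_betas,
                        alternative=alternative).pvalue


if __name__ == "__main__":
    gained = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09]            # toy betas
    reference = [0.50, 0.60, 0.55, 0.45, 0.70, 0.65, 0.52]   # toy betas
    print(methylation_shift_pvalue(gained, reference, "hypo"))
```

A rank-based test is a reasonable choice here because beta values are bounded between 0 and 1 and are often bimodal, so a normality assumption would be hard to justify.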


To validate the somatic promoter alterations in a larger independent GC cohort and to examine their behavior in other cancer types, we queried RNA-seq data of 354 gastric samples from the TCGA consortium (n=321 GC, n=33 matched normals). For this analysis, RNA-seq reads from TCGA samples were mapped against the epigenome-guided somatic promoter regions defined in the discovery samples and normalized to calculate fold-change differences in expression between GCs and normals (see Methods and Materials). As in the discovery series, TCGA GCs exhibited significantly increased expression at gained somatic promoters and decreased expression at lost somatic promoters, relative to either all promoters (P<0.001, FIG. 2C) or unaltered promoters (P<0.001, FIG. 10). We further tested the tissue specificity of the GC somatic promoters by querying RNA-seq data from other tumor types, including colon cancer, kidney renal clear cell carcinoma (ccRCC) and lung adenocarcinoma (LUAD) (FIG. 2d). Almost two-thirds (n=1231, 63%, FC=1.5) of GC somatic promoters were also differentially regulated in TCGA colon cancer samples, and a substantial proportion of GC somatic promoters were also associated with differential RNA-seq expression in TCGA ccRCC (n=939, 48%, FC=1.5) and LUAD samples (n=1059, 54%, FC=1.5) (FIG. 2D). This result suggests that many GC somatic promoters are also likely to be associated with deregulated promoter activity in other solid epithelial malignancies.
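The cross-cancer percentages above amount to counting GC somatic promoters whose tumor/normal expression in another cancer type changes by at least the fold-change cut-off. A minimal sketch, assuming mean TPM per group with a pseudocount of 1 and counting changes in either direction; the study's actual normalization is described in its Methods, and the function name and toy matrices are hypothetical.

```python
import numpy as np


def fraction_deregulated(tumor_tpm, normal_tpm, fc=1.5, pseudocount=1.0):
    """Count and fraction of promoters with |log2 FC| >= log2(fc).

    tumor_tpm, normal_tpm: promoters x samples arrays of expression values.
    Assumption: FC is computed from group means plus a pseudocount, and
    up- and down-regulation both count as "differentially regulated".
    """
    t = np.asarray(tumor_tpm, dtype=float).mean(axis=1) + pseudocount
    n = np.asarray(normal_tpm, dtype=float).mean(axis=1) + pseudocount
    log2fc = np.log2(t / n)
    hits = np.abs(log2fc) >= np.log2(fc)
    return int(hits.sum()), float(hits.mean())


if __name__ == "__main__":
    tumor = [[9, 9], [1, 1], [3, 3]]    # toy promoter x sample TPMs
    normal = [[1, 1], [9, 9], [3, 3]]
    print(fraction_deregulated(tumor, normal))
```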


Role of Alternative Promoters


By comparing the somatic promoters against the reference GENCODE database (V19), we discovered extensive use of alternative promoters (18%) in GCs, defined as situations where a common unaltered promoter is present in both normal tissues and tumors (canonical promoter) but a secondary tumor-specific promoter is engaged in the latter (alternative promoter). The remaining 82% of somatic promoters corresponded to single major isoforms or unannotated transcripts (see later). Of the alternative promoters, 57% occurred downstream of the canonical promoter. Using multiple RNA-seq analysis methods, we confirmed that transcript isoforms driven by alternative promoters are overexpressed in GCs to a significantly greater degree than isoforms driven by the canonical promoter of the same gene (Methods and Materials, FIG. 12). For example, HNF4α, a transcription factor overexpressed in GC, is driven by two promoters (P1 and P2). At the HNF4α canonical promoter (“P2”), we observed equal promoter signals in GCs and normal tissues; however, we also observed gain of an additional promoter in GCs at a transcription start site 45 kb downstream (“P1”). Similar HNF4α P1 promoter gains were also observed in GC cell lines (FIG. 3a), and RNA-seq analysis supported HNF4α P1 isoform expression in GCs. Alternative promoter usage was also observed at the EpCAM gene, which is frequently used to identify circulating tumor cells, causing expression of EpCAM transcript ENST00000263735.4 (FIG. 3b). Notably, both the HNF4α and EpCAM alternative isoforms exhibited significantly greater cancer overexpression than their canonical isoforms (FIG. 12). Other genes with tumor-specific alternative promoters, many reported here for the first time, include NKX6-3 (FC 1.83, q<0.05) and GRIN2D (FC 1.9, q<0.001). A complete list of GC alternative promoters is provided in Table 6.
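Whether an alternative promoter lies upstream or downstream of the canonical one depends on the gene's strand: on the minus strand, "downstream" means lower genomic coordinates. The sketch below shows this strand-aware comparison and the resulting downstream fraction (the quantity reported as 57% above). The TSS coordinates in the example are hypothetical, and the function names are illustrative.

```python
def alt_promoter_position(canonical_tss, alternate_tss, strand):
    """Position of an alternative TSS relative to the canonical TSS,
    measured in the direction of transcription."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    offset = alternate_tss - canonical_tss
    if strand == "-":
        offset = -offset  # minus strand: downstream means lower coordinates
    if offset > 0:
        return "downstream", offset
    if offset < 0:
        return "upstream", -offset
    return "same", 0


def fraction_downstream(pairs):
    """pairs: iterable of (canonical_tss, alternate_tss, strand)."""
    pairs = list(pairs)
    hits = sum(alt_promoter_position(*p)[0] == "downstream" for p in pairs)
    return hits / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    # A plus-strand gene whose alternative TSS lies 45 kb downstream,
    # analogous to the HNF4A P1/P2 arrangement (coordinates hypothetical).
    print(alt_promoter_position(1_000, 46_000, "+"))
    print(fraction_downstream([(1_000, 46_000, "+"), (5_000, 8_000, "-")]))
```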









TABLE 6







Alternative Promoters














Change




H3K4Me3

in



Loci
(T/N)
Type
protein
Gene





chr2: 69900550-69901900
Loss
Alternate
1
AAK1


chr2: 44058400-44060450
Gain
Alternate
1
ABCG5


chr1: 179108750-
Gain
Alternate
1
ABL2


179113100






chr1: 6451200-6453300
Gain
Alternate
1
ACOT7


chr7: 991700-995250
Gain
Alternate
1
ADAP1


chr11: 69811750-
Gain
Alternate
1
ANO1


69814800






chr19: 50308050-
Gain
Alternate
1
AP2A1


50309350






chr17: 36620950-
Gain
Alternate
1
ARHGAP23


36622550






chr2: 10902450-10904150
Gain
Alternate
1
ATP6V1C2


chr7: 70060000-70066050
Gain
Alternate
1
AUTS2


chr18: 60804550-
Loss
Alternate
1
BCL2


60807050






chr11: 1463100-1464700
Gain
Alternate
1
BRSK2


chr4: 2038150-2039400
Gain
Alternate
1
C4orf48


chr21: 44482600-
Gain
Alternate
1
CBS


44484300






chr3: 46988600-46990000
Gain
Alternate
1
CCDC12


chr16: 28946800-
Gain
Alternate
1
CD19


28948350






chr6: 4836100-4837550
Gain
Alternate
1
CDYL


chr6: 118985250-
Loss
Alternate
1
CEP85L


118986450






chr9: 124497650-
Gain
Alternate
1
DAB2IP


124504300






chr19: 6474700-6477300
Gain
Alternate
1
DENND1C


chr4: 955250-957700
Gain
Alternate
1
DGKQ


chr16: 21059250-
Gain
Alternate
1
DNAH3


21060650






chr7: 35074250-35076850
Gain
Alternate
1
DPY19L1


chr6: 56553350-56559100
Gain
Alternate
1
DST


chr2: 47595450-47602500
Gain
Alternate
1
EPCAM


chrX: 137860100-
Gain
Alternate
1
FGF13


137861300






chr3: 69283500-69286950
Gain
Alternate
1
FRMD4B


chr7: 99774000-99776200
Gain
Alternate
1
GPC2


chr10: 25754300-
Gain
Alternate
1
GPR158


25755900






chr11: 123458150-
Gain
Alternate
1
GRAMD1B


123465950






chr20: 43029650-
Gain
Alternate
1
HNF4A


43032200






chr17: 46639600-
Gain
Alternate
1
HOXB3


46642950






chr7: 23506000-23515500
Gain
Alternate
1
IGF2BP3


chr1: 38410700-38414500
Loss
Alternate
1
INPP5B


chr19: 17952000-
Gain
Alternate
1
JAK3


17953950






chr14: 24891600-
Loss
Alternate
1
KHNYN


24897600






chr18: 21452050-
Gain
Alternate
1
LAMA3


21455250






chr5: 154091500-
Loss
Alternate
1
LARP1


154095100






chr5: 38605950-38609550
Loss
Alternate
1
LIFR


chr16: 1013250-1015550
Gain
Alternate
1
LMF1


chr19: 49003900-
Gain
Alternate
1
LMTK3


49005550






chr1: 156896950-
Gain
Alternate
1
LRRC71


156898350






chr1: 156893100-
Gain
Alternate
1
LRRC71


156894550






chr1: 236045300-
Loss
Alternate
1
LYST


236047550






chr20: 33134200-
Gain
Alternate
1
MAP1LC3A


33135900






chr7: 130125100-
Gain
Alternate
1
MEST


130127800






chr7: 116363550-
Gain
Alternate
1
MET


116365500






chr3: 158448250-
Gain
Alternate
1
MFSD1


158451400






chr1: 1562700-1565700
Gain
Alternate
1
MIB2


chr14: 102700300-
Gain
Alternate
1
MOK


102702150






chr17: 60756900-
Gain
Alternate
1
MRC2


60758850






chr8: 144652950-
Gain
Alternate
1
MROH6


144655550






chr7: 100607850-
Gain
Alternate
1
MUC12


100613600






chr11: 76902300-
Gain
Alternate
1
MYO7A


76903800






chr1: 24434350-24435800
Gain
Alternate
1
MYOM3


chr6: 126136250-
Loss
Alternate
1
NCOA7


126140700






chr2: 233755200-
Gain
Alternate
1
NGEF


233756650






chr2: 233791350-
Gain
Alternate
1
NGEF


233792700






chr17: 26119900-
Gain
Alternate
1
NOS2


26121850






chr1: 200007500-
Gain
Alternate
1
NR5A2


200010950






chr18: 55099800-
Gain
Alternate
1
ONECUT2


55108900






chr8: 107629450-
Loss
Alternate
1
OXR1


107632850






chr4: 169575100-
Loss
Alternate
1
PALLD


169577200






chr19: 18364400-
Loss
Alternate
1
PDE4C


18366800






chr4: 111557000-
Gain
Alternate
1
PITX2


111559350






chr8: 145009000-
Gain
Alternate
1
PLEC


145018500






chr19: 49370000-
Gain
Alternate
1
PLEKHA4


49372300






chr11: 16944700-
Gain
Alternate
1
PLEKHA7


16947800






chr1: 6530450-6535000
Gain
Alternate
1
PLEKHG5


chr5: 74990850-74992350
Gain
Alternate
1
POC5


chr6: 35359200-35364100
Loss
Alternate
1
PPARD


chr19: 49631500-
Gain
Alternate
1
PPFIA3


49632100






chr22: 22900650-
Gain
Alternate
1
PRAME


22902550






chr9: 132458700-
Gain
Alternate
1
PRRX2


132461300






chr9: 139873000-
Gain
Alternate
1
PTGDS


139874300






chr1: 29562850-29565950
Gain
Alternate
1
PTPRU


chr17: 2878500-2880550
Gain
Alternate
1
RAP1GAP2


chr9: 134548500-
Loss
Alternate
1
RAPGEF1


134553400






chr3: 24851300-24854350
Loss
Alternate
1
RARB


chr13: 114769100-
Gain
Alternate
1
RASA3


114771100






chr20: 399750-402500
Gain
Alternate
1
RBCK1


chr19: 14088450-
Gain
Alternate
1
RFX1


14090950






chr4: 3310150-3312100
Gain
Alternate
1
RGS12


chr8: 74035400-74036300
Loss
Alternate
1
SBSPON


chr21: 38063750-38066650
Loss
Alternate
1
SIM2






chr19: 19215350-19217300
Gain
Alternate
1
SLC25A42






chr7: 103021250-103022850
Loss
Alternate
1
SLC26A5






chr12: 40425950-40427700
Loss
Alternate
1
SLC2A13






chr12: 20975550-20976900
Gain
Alternate
1
SLCO1B3






chr16: 68418000-68421750
Loss
Alternate
1
SMPD3






chr4: 186729400-186734150
Loss
Alternate
1
SORBS2






chr2: 231206350-231208750
Gain
Alternate
1
SP140L






chr7: 87854350-87856200
Gain
Alternate
1
SRI


chr3: 17734300-17735900
Gain
Alternate
1
TBC1D5


chr8: 67866500-67867950
Gain
Alternate
1
TCF24


chr6: 10409250-10419650
Gain
Alternate
1
TFAP2A


chr3: 129512300-129514550
Gain
Alternate
1
TMCC1






chr18: 20910450-20912050
Gain
Alternate
1
TMEM241






chr2: 218874000-218875450
Gain
Alternate
1
TNS1






chr8: 141017700-141019200
Gain
Alternate
1
TRAPPC9






chr4: 8435700-8439650
Loss
Alternate
1
TRMT44


chr21: 45844650-45846700
Gain
Alternate
1
TRPM2






chrX: 107016000-107021000
Loss
Alternate
1
TSC22D3






chr2: 3371900-3374350
Gain
Alternate
1
TSSC1


chr17: 40784750-40786950
Loss
Alternate
1
TUBG2






chr16: 1428050-1430700
Gain
Alternate
1
UNKL


chr12: 109507100-109508350
Gain
Alternate
1
USP30






chr20: 50719850-50723350
Gain
Alternate
1
ZFP64






chr4: 8128400-8130450
Gain
Alternate
0
ABLIM2


chr16: 72660100-72662050
Gain
Alternate
0
AC004158.2






chr2: 66801200-66811950
Gain
Alternate
0
AC007392.3


chr2: 114081700-114084050
Gain
Alternate
0
AC016745.3






chr19: 52104750-52106000
Loss
Alternate
0
AC018755.16






chr2: 19504600-19506400
Gain
Alternate
0
AC092594.1


chr2: 118899750-118901550
Gain
Alternate
0
AC093901.1






chr17: 263900-267650
Loss
Alternate
0
AC108004.3


chr3: 18734950-18736300
Gain
Alternate
0
AC144521.1


chr12: 109568950-109570000
Loss
Alternate
0
ACACB






chrX: 23783150-23786000
Gain
Alternate
0
ACOT9






chr7: 5601050-5603800
Gain
Alternate
0
ACTB


chr7: 15600650-15602200
Gain
Alternate
0
AGMO






chr21: 45336050-45337600
Loss
Alternate
0
AGPAT3






chr15: 86232000-86236800
Loss
Alternate
0
AKAP13






chr9: 112909300-112915400
Loss
Alternate
0
AKAP2






chr2: 241496150-241498200
Gain
Alternate
0
ANKMY1






chr2: 242127000-242129850
Loss
Alternate
0
ANO7






chr5: 139972550-139973900
Gain
Alternate
0
APBB3






chr18: 24443050-24445900
Loss
Alternate
0
AQP4-AS1






chr4: 86395150-86399900
Loss
Alternate
0
ARHGAP24


chr19: 47362700-47367650
Gain
Alternate
0
ARHGAP35






chr9: 35672750-35677150
Loss
Alternate
0
ARHGEF39


chrX: 100739600-100741600
Gain
Alternate
0
ARMCX4






chr9: 120175650-120177900
Loss
Alternate
0
ASTN2






chr3: 193270000-193274550
Loss
Alternate
0
ATP13A4






chr18: 77102950-77104300
Loss
Alternate
0
ATP9B






chr1: 179486050-179487950
Loss
Alternate
0
AXDND1






chr4: 102332100-102333250
Gain
Alternate
0
BANK1






chr1: 94046300-94051100
Loss
Alternate
0
BCAR3


chr11: 27686500-27687900
Gain
Alternate
0
BDNF-AS






chr20: 11897750-11902000
Loss
Alternate
0
BTBD3






chr11: 63531650-63533550
Gain
Alternate
0
C11orf95






chr19: 30199050-30200500
Gain
Alternate
0
C19orf12






chr1: 207991400-208001200
Loss
Alternate
0
C1orf132






chr6: 109571700-109573350
Gain
Alternate
0
C6orf183






chr8: 128305850-128307550
Gain
Alternate
0
CASC8






chr5: 43409150-43412850
Loss
Alternate
0
CCL28


chr8: 95245700-95247400
Gain
Alternate
0
CDH17


chr7: 105603300-105604700
Loss
Alternate
0
CDHR3






chr7: 90338500-90340500
Loss
Alternate
0
CDK14


chr7: 29184550-29187650
Gain
Alternate
0
CHN2


chr15: 79011600-79013200
Gain
Alternate
0
CHRNB4






chr7: 139226300-139228850
Gain
Alternate
0
CLEC2L






chr6: 25164900-25167200
Loss
Alternate
0
CMAHP


chr16: 81684900-81687600
Loss
Alternate
0
CMIP






chr6: 37391200-37392800
Gain
Alternate
0
CMTR1


chr3: 74662150-74664400
Loss
Alternate
0
CNTN3


chr11: 111172600-111176650
Loss
Alternate
0
COLCA1






chr6: 36722500-36725900
Loss
Alternate
0
CPNE5


chr11: 85392850-85394650
Loss
Alternate
0
CREBZF






chr16: 21288600-21290700
Gain
Alternate
0
CRYM






chr5: 60597450-60601050
Loss
Alternate
0
CTC-436P18.3


chr15: 45544050-45548600
Loss
Alternate
0
CTD-2651B20.3


chr20: 110300-111350
Gain
Alternate
0
DEFB126


chr2: 234326350-234331500
Loss
Alternate
0
DGKD






chr1: 223101350-223104800
Loss
Alternate
0
DISP1






chr11: 111852050-111855050
Loss
Alternate
0
DIXDC1






chr13: 50759600-50762100
Gain
Alternate
0
DLEU1






chr1: 46954600-46956800
Gain
Alternate
0
DMBX1


chr16: 30021900-30023950
Gain
Alternate
0
DOC2A






chr6: 56715250-56717500
Gain
Alternate
0
DST


chr18: 46894350-46895900
Loss
Alternate
0
DYM






chr5: 106838450-106842400
Loss
Alternate
0
EFNA5






chr4: 111331750-111333350
Gain
Alternate
0
ENPEP






chr14: 74461400-74463450
Loss
Alternate
0
ENTPD5






chr19: 55590850-55593800
Gain
Alternate
0
EPS8L1






chr5: 172332450-172333000
Loss
Alternate
0
ERGIC1






chr1: 17024500-17028900
Gain
Alternate
0
ESPNP


chr1: 216892850-216898200
Loss
Alternate
0
ESRRG






chr1: 217249050-217252200
Loss
Alternate
0
ESRRG






chr6: 36326200-36331550
Gain
Alternate
0
ETV7


chr12: 124778800-124786100
Loss
Alternate
0
FAM101A






chr17: 47822200-47825200
Loss
Alternate
0
FAM117A






chr4: 187025100-187028650
Loss
Alternate
0
FAM149A






chr1: 178986050-178987900
Loss
Alternate
0
FAM20B






chr7: 102574000-102576900
Loss
Alternate
0
FBXL13






chr16: 86529000-86534050
Loss
Alternate
0
FENDRR






chr20: 34192700-34196000
Loss
Alternate
0
FER1L4






chr8: 124926550-124929550
Gain
Alternate
0
FER1L6






chr7: 121942750-121947900
Gain
Alternate
0
FEZF1






chr12: 32654200-32659150
Loss
Alternate
0
FGD4






chr16: 86608950-86611800
Gain
Alternate
0
FOXL1






chr8: 75230900-75235150
Gain
Alternate
0
GDAP1


chr7: 100288750-100293000
Gain
Alternate
0
GIGYF1






chr11: 58694450-58696550
Loss
Alternate
0
GLYATL1






chr5: 89854500-89855350
Loss
Alternate
0
GPR98


chr2: 165476750-165479250
Gain
Alternate
0
GRB14






chr9: 140056700-140058300
Gain
Alternate
0
GRIN1






chr19: 48900250-48904400
Gain
Alternate
0
GRIN2D






chr9: 104466750-104468450
Gain
Alternate
0
GRIN3A






chr3: 14642850-14644150
Loss
Alternate
0
GRIP2


chr11: 2016000-2021350
Gain
Alternate
0
H19


chrX: 152760450-152761150
Gain
Alternate
0
HAUS7






chr7: 18534500-18539050
Loss
Alternate
0
HDAC9


chr15: 83619150-83622750
Loss
Alternate
0
HOMER2






chr7: 27159450-27164850
Gain
Alternate
0
HOXA3


chr7: 27208400-27220700
Gain
Alternate
0
HOXA9


chr17: 46678350-46683450
Gain
Alternate
0
HOXB6






chr17: 46694850-46697150
Gain
Alternate
0
HOXB8






chr3: 11178050-11179900
Gain
Alternate
0
HRH1


chr3: 11195250-11198600
Gain
Alternate
0
HRH1


chr3: 11265900-11269000
Gain
Alternate
0
HRH1


chr1: 23543800-23544900
Gain
Alternate
0
HTR1D


chrX: 130711450-130713600
Gain
Alternate
0
IGSF1






chr17: 38016450-38022250
Loss
Alternate
0
IKZF3






chr2: 113619100-113622250
Loss
Alternate
0
IL1B






chr4: 143394250-143396200
Gain
Alternate
0
INPP4B






chr19: 2255550-2257400
Loss
Alternate
0
JSRP1


chr17: 68071050-68073700
Loss
Alternate
0
KCNJ16






chr14: 88788450-88791000
Gain
Alternate
0
KCNK10






chr4: 56914350-56916700
Gain
Alternate
0
KIAA1211


chr10: 24725650-24728200
Loss
Alternate
0
KIAA1217






chr11: 33398050-33400750
Gain
Alternate
0
KIAA1549L






chr15: 31637200-31640250
Loss
Alternate
0
KLF13






chr19: 55019200-55020400
Gain
Alternate
0
LAIR2






chr1: 65991250-65992850
Loss
Alternate
0
LEPR


chr5: 78014050-78017100
Loss
Alternate
0
LHFPL2


chr12: 113904650-113906650
Gain
Alternate
0
LHX5






chr22: 30651400-30654850
Gain
Alternate
0
LIF






chr20: 21085550-21087550
Gain
Alternate
0
LINC00237






chr13: 74234250-74236800
Gain
Alternate
0
LINC00393






chr3: 8652200-8654000
Gain
Alternate
0
LMCD1-AS1


chr20: 6031700-6033850
Gain
Alternate
0
LRRN4


chr3: 116161150-116164900
Gain
Alternate
0
LSAMP






chr11: 1889150-1894600
Loss
Alternate
0
LSP1


chrX: 149588950-149590100
Gain
Alternate
0
MAMLD1






chr1: 27683050-27684600
Loss
Alternate
0
MAP3K6


chrX: 20115700-20118300
Loss
Alternate
0
MAP7D2






chr3: 150959500-150960300
Gain
Alternate
0
MED12L






chr22: 42148300-42150300
Loss
Alternate
0
MEI1






chr1: 205537050-205540700
Loss
Alternate
0
MFSD4






chr1: 22489600-22491100
Gain
Alternate
0
MIR4418


chr19: 748150-750100
Gain
Alternate
0
MISP


chr3: 69914350-69917750
Loss
Alternate
0
MITF


chr6: 168215700-168217350
Gain
Alternate
0
MLLT4-AS1


chr19: 1286150-1288700
Gain
Alternate
0
MUM1


chr19: 50690700-50695700
Gain
Alternate
0
MYH14






chr17: 73606350-73609450
Gain
Alternate
0
MYO15B






chr17: 31010250-31012000
Gain
Alternate
0
MYO1D






chr18: 55888350-55892150
Loss
Alternate
0
NEDD4L






chr2: 131965200-131968600
Gain
Alternate
0
NF1P8






chr14: 27147750-27148900
Gain
Alternate
0
NOVA1-AS1


chr11: 108040050-108041550
Loss
Alternate
0
NPAT






chr7: 98248450-98250250
Gain
Alternate
0
NPTX2


chr15: 76302650-76305350
Loss
Alternate
0
NRG4






chr9: 132370500-132373750
Gain
Alternate
0
NTMT1






chr3: 32118200-32120100
Gain
Alternate
0
OSBPL10


chr19: 14171500-14173250
Loss
Alternate
0
PALM3






chr7: 32107350-32111900
Loss
Alternate
0
PDE1C


chr3: 111450850-111453300
Loss
Alternate
0
PHLDB2






chr12: 18395250-18399450
Loss
Alternate
0
PIK3C2G






chr8: 110534900-110536100
Loss
Alternate
0
PKHD1L1






chr20: 8094750-8096650
Gain
Alternate
0
PLCB1


chr1: 6544500-6545600
Gain
Alternate
0
PLEKHG5


chr22: 41990400-41991450
Gain
Alternate
0
PMM1






chr6: 31150550-31154950
Loss
Alternate
0
POU5F1


chr11: 7626600-7631400
Loss
Alternate
0
PPFIBP2


chr2: 182895050-182896750
Gain
Alternate
0
PPP1R1C






chr8: 143759850-143765700
Loss
Alternate
0
PSCA






chr8: 27237450-27239750
Loss
Alternate
0
PTK2B


chr8: 142384050-142385550
Gain
Alternate
0
PTP4A3






chr9: 96767600-96770450
Loss
Alternate
0
PTPDC1


chr12: 120661250-120664850
Loss
Alternate
0
PXN






chr18: 52384600-52386250
Loss
Alternate
0
RAB27B






chr11: 82706750-82709350
Loss
Alternate
0
RAB30






chr8: 95485350-95488300
Gain
Alternate
0
RAD54B


chr4: 82964050-82966400
Gain
Alternate
0
RASGEF1B


chr4: 40512300-40518850
Loss
Alternate
0
RBM47


chr9: 116225550-116228700
Gain
Alternate
0
RGS3






chr10: 62758000-62762450
Loss
Alternate
0
RHOBTB1






chr8: 104510350-104514700
Gain
Alternate
0
RIMS2






chr21: 38379100-38379750
Gain
Alternate
0
RIPPLY3






chr8: 61324800-61327100
Gain
Alternate
0
RP11-163N6.2


chr20: 6301750-6304300
Gain
Alternate
0
RP11-199O14.1


chr3: 187606800-187608950
Gain
Alternate
0
RP11-30O15.1


chr1: 39191950-39194400
Loss
Alternate
0
RP11-334L9.1


chr11: 112140350-112142500
Gain
Alternate
0
RP11-356J5.12


chr6: 82809950-82812100
Gain
Alternate
0
RP11-379B8.1


chr14: 39702300-39706400
Loss
Alternate
0
RP11-407N17.3


chr1: 203394800-203398950
Gain
Alternate
0
RP11-435P24.3


chr9: 72091300-72092650
Gain
Alternate
0
RP11-470P21.2


chr15: 82161650-82163400
Gain
Alternate
0
RP11-499F3.2


chr4: 88631250-88631950
Gain
Alternate
0
RP11-742B18.1


chr11: 94372300-94374550
Gain
Alternate
0
RP11-867G2.5


chr3: 131049650-131051500
Gain
Alternate
0
RP11-933H2.4


chr17: 10746250-10749200
Loss
Alternate
0
RP11-963H4.3


chr6: 85334900-85337050
Gain
Alternate
0
RP1-90L14.1


chr7: 156735150-156736500
Gain
Alternate
0
RP5-1121A15.3


chr2: 55236200-55238400
Loss
Alternate
0
RTN4


chr16: 51186150-51187850
Loss
Alternate
0
SALL1






chr2: 200326950-200329550
Gain
Alternate
0
SATB2






chr3: 53031650-53034600
Gain
Alternate
0
SFMBT1


chr14: 71849000-71850350
Loss
Alternate
0
SIPA1L1






chr1: 232760700-232767700
Gain
Alternate
0
SIPA1L2






chr7: 100448750-100451750
Gain
Alternate
0
SLC12A9






chr12: 105344050-105348050
Loss
Alternate
0
SLC41A2






chr6: 31843950-31847850
Loss
Alternate
0
SLC44A4


chr1: 75840850-75842350
Gain
Alternate
0
SLC44A5


chr1: 205637750-205639250
Gain
Alternate
0
SLC45A3






chr11: 26985950-26987450
Gain
Alternate
0
SLC5A12






chr14: 23622000-23623950
Loss
Alternate
0
SLC7A8






chr22: 31459200-31461650
Gain
Alternate
0
SMTN






chr20: 10197250-10201300
Gain
Alternate
0
SNAP25-AS1


chr16: 1842850-1844950
Loss
Alternate
0
SPSB3


chr11: 4010850-4011700
Loss
Alternate
0
STIM1


chr8: 99951150-99961750
Gain
Alternate
0
STK3


chr7: 23761400-23764000
Gain
Alternate
0
STK31


chr1: 110573450-110574700
Loss
Alternate
0
STRIP1






chr7: 73131100-73134700
Gain
Alternate
0
STX1A


chr20: 46411750-46414250
Gain
Alternate
0
SULF2






chr12: 79438650-79440250
Gain
Alternate
0
SYT1






chr15: 57509850-57515600
Loss
Alternate
0
TCF12






chr12: 110411050-110419200
Gain
Alternate
0
TCHP






chr21: 32640100-32641350
Loss
Alternate
0
TIAM1






chr19: 3707600-3711250
Loss
Alternate
0
TJP3


chr10: 102830000-102833650
Loss
Alternate
0
TLX1NB






chr2: 228241600-228244450
Gain
Alternate
0
TM4SF20






chr16: 19427700-19435900
Gain
Alternate
0
TMC5






chr7: 47490900-47493500
Loss
Alternate
0
TNS3


chr8: 144436800-144438000
Gain
Alternate
0
TOP1MT






chr13: 45955000-45957700
Gain
Alternate
0
TPT1-AS1






chr17: 3459750-3462900
Loss
Alternate
0
TRPV3


chr3: 12522200-12524700
Gain
Alternate
0
TSEN2


chr22: 46683150-46685350
Loss
Alternate
0
TTC38






chr6: 133003800-133008900
Gain
Alternate
0
VNN1






chr15: 53831700-53833550
Gain
Alternate
0
WDR72






chr11: 102617350-102619450
Gain
Alternate
0
WTAPP1






chr11: 68436350-68438200
Gain
Alternate
0
Novel Gene






chr12: 125226400-125228400
Loss
Alternate
0
Novel Gene






chr12: 89240400-89241750
Gain
Alternate
0
Novel Gene






chr14: 99752650-99754000
Loss
Alternate
0
Novel Gene






chr18: 76805850-76809250
Gain
Alternate
0
Novel Gene






chr19: 53560600-53562700
Gain
Alternate
0
Novel Gene






chr2: 45227500-45229600
Gain
Alternate
0
Novel Gene


chr2: 134784950-134786450
Gain
Alternate
0
Novel Gene






chr2: 176458500-176460750
Gain
Alternate
0
Novel Gene






chr20: 46600150-46603250
Gain
Alternate
0
Novel Gene






chr4: 10830100-10832350
Gain
Alternate
0
Novel Gene


chr5: 35404300-35405800
Gain
Alternate
0
Novel Gene


chr5: 42999400-43001150
Gain
Alternate
0
Novel Gene


chr5: 72496650-72498300
Gain
Alternate
0
Novel Gene


chr1: 204682350-204684550
Loss
Alternate
0
Novel Gene






chr6: 868400-871100
Loss
Alternate
0
Novel Gene


chr1: 220635500-220637400
Gain
Alternate
0
Novel Gene






chr6: 47146850-47150550
Loss
Alternate
0
Novel Gene


chr6: 160720200-160722150
Gain
Alternate
0
Novel Gene






chr6: 170474550-170475800
Gain
Alternate
0
Novel Gene






chr1: 242107250-242109450
Gain
Alternate
0
Novel Gene






chr7: 27274550-27276500
Gain
Alternate
0
Novel Gene


chr9: 17905350-17908250
Loss
Alternate
0
Novel Gene


chr9: 31848250-31849950
Gain
Alternate
0
Novel Gene


chrX: 56133300-56134800
Gain
Alternate
0
Novel Gene






chrX: 3466450-3468750
Gain
Alternate
0
Novel Gene


chrX: 6849150-6851300
Gain
Alternate
0
Novel Gene


chr11: 60941900-60945700
Loss
Alternate
0
Novel Gene






chr11: 71350450-71351500
Gain
Alternate
0
Novel Gene






chr11: 119775600-119779600
Loss
Alternate
0
Novel Gene






chr5: 82391600-82392950
Gain
Alternate
0
XRCC4


chr3: 141107100-141108400
Loss
Alternate
0
ZBTB38






chr18: 45660800-45664950
Loss
Alternate
0
ZBTB7C






chr13: 100619800-100623100
Gain
Alternate
0
ZIC5






chr2: 180425300-180426950
Loss
Alternate
0
ZNF385B






chr19: 53539900-53541600
Gain
Alternate
0
ZNF702P









To explore the influence of alternative promoters on protein diversity, we identified 714 tumor-specific promoter alterations that were predicted to change N-terminal protein composition and were supported by both H3K4me3 and RNA-seq data. The vast majority of these alterations (>95%) were in frame with the canonical protein. Of the 714 alterations, 47% (n=338) were predicted to cause gains of new N-terminal peptides in tumors (see Methods). To confirm protein-level expression of these N-terminal peptides in gastrointestinal cancer, we queried publicly available peptide spectral data from 90 TCGA colorectal cancer (CRC) samples and 60 normal colon samples. CRC data were used for this analysis because large-scale proteomic data for primary GCs are not currently available, and because many GC somatic promoters are also observed in CRC (FIG. 2d). Among the N-terminal peptides predicted to be gained in tumors, we confirmed protein expression of 33% (112/338) in the CRC data (Table 7), of which 51.8% were overexpressed in CRC samples relative to normal colon samples (FDR 10%). In a separate experiment, we further investigated whether these N-terminal peptides also exhibit tumor overexpression in proteomic data from 3 GC cell lines and 1 normal gastric epithelial line (GES1) (Methods and Materials). Similar to the CRC data, 48% of the N-terminal peptides were overexpressed in the GC lines relative to normal GES1 gastric cells. Taken together, these analyses suggest that alternative promoters may contribute significantly to proteomic diversity in gastrointestinal cancer.
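The paragraph above reports peptides as overexpressed "at FDR 10%" but does not name the multiple-testing correction that was applied. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure, and assuming that a per-peptide p-value from some tumor-versus-normal comparison of spectral counts is already available, the thresholding step could look like the Python below. The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the original analysis.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], fdr: float = 0.10) -> List[bool]:
    """Flag which hypotheses are called significant at the given FDR.

    Assumption: Benjamini-Hochberg step-up procedure; the source document
    only states "FDR 10%" without naming the method.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Step-up rule: every hypothesis ranked at or below the cutoff is rejected,
    # even if its own p-value sat above its individual threshold.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-peptide p-values (tumor vs. normal), for illustration only.
    example = [0.001, 0.02, 0.04, 0.30, 0.07]
    print(benjamini_hochberg(example))  # [True, True, True, False, True]
```

If this procedure were applied to the 112 peptides with confirmed expression, the reported 51.8% would correspond to 58 peptides flagged `True`; whether Benjamini-Hochberg was actually the method used is not stated in the source.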









TABLE 7







Spectral counts from CRC samples of N-terminal peptides predicted to be gained in GC

SEQ ID NO
Peptide
Gene ID
Spectral Count













SEQ ID NO: 1
IDNSQVESGSLEDDWDFLPPKK
ENSG00000179218.9
2602





SEQ ID NO: 2
FYALSASFEPFSNK
ENSG00000179218.9
2047





SEQ ID NO: 3
EQFLDGDGWTSR
ENSG00000179218.9
1370





SEQ ID NO: 4
IKDPDASKPEDWDER
ENSG00000179218.9
805





SEQ ID NO: 5
GDVTAQIALQPALK
ENSG00000112096.12
601





SEQ ID NO: 6
GISLNPEQWSQLK
ENSG00000113387.7
536





SEQ ID NO: 7
AYHSFLVEPISCHAWNK
ENSG00000130429.8
497





SEQ ID NO: 8
IAVQPGTVGPQGR
ENSG00000134871.13
468





SEQ ID NO: 9
VLAQNSGFDLQETLVK
ENSG00000146731.6
435





SEQ ID NO: 10
CKDDEFTHLYTLIVRPDNTYEVK
ENSG00000179218.9
424





SEQ ID NO: 11
AKIDDPTDSKPEDWDKPEHIPDPDAK
ENSG00000179218.9
414







SEQ ID NO: 12
VHVIFNYK
ENSG00000179218.9
396





SEQ ID NO: 13
HEQNIDCGGGYVK
ENSG00000179218.9
361





SEQ ID NO: 14
LIDFGLAR
ENSG00000065534.14
359





SEQ ID NO: 15
TWKPTLVILR
ENSG00000130429.8
358





SEQ ID NO: 16
AIWNVINWENVTER
ENSG00000112096.12
353





SEQ ID NO: 17
IDDPTDSKPEDWDKPEHIPDPDAK
ENSG00000179218.9
323







SEQ ID NO: 18
NVRPDYLK
ENSG00000112096.12
320





SEQ ID NO: 19
NSVSQISVLSGGK
ENSG00000130429.8
317





SEQ ID NO: 20
DGNVLLHEMQIQHPTASLIAK
ENSG00000146731.6
314





SEQ ID NO: 21
AGATHVER
ENSG00000145016.9
311





SEQ ID NO: 22
LVALLNTLDR
ENSG00000119383.15
298





SEQ ID NO: 23
HHAAYVNNLNVTEEK
ENSG00000112096.12
296





SEQ ID NO: 24
FYGDEEKDKGLQTSQDAR
ENSG00000179218.9
290





SEQ ID NO: 25
KVHVIFNYK
ENSG00000179218.9
283





SEQ ID NO: 26
GPLPAAPPVAPER
ENSG00000115310.13
282





SEQ ID NO: 27
VLLSALER
ENSG00000100714.11
277





SEQ ID NO: 28
SVSIGYLLVK
ENSG00000134871.13
276





SEQ ID NO: 29
IQQEIAVQNPLVSER
ENSG00000167770.7
271





SEQ ID NO: 30
GELLEAIKR
ENSG00000112096.12
268





SEQ ID NO: 31
AHNQDLGLAGSCLAR
ENSG00000134871.13
265





SEQ ID NO: 32
YVVVTGITPTPLGEGK
ENSG00000100714.11
256





SEQ ID NO: 33
MEDLDQSPLVSSSDSPPRPQPAFK
ENSG00000115310.13
254







SEQ ID NO: 34
AAQAPSSFQLLYDLK
ENSG00000100714.11
253





SEQ ID NO: 35
LQAQLNELQAQLSQK
ENSG00000137497.13
250





SEQ ID NO: 36
ALQFLEEVK
ENSG00000146731.6
244





SEQ ID NO: 37
LLTSGYLQR
ENSG00000167770.7
242





SEQ ID NO: 38
GDLNDCFIPCTPK
ENSG00000100714.11
241





SEQ ID NO: 39
ASSEGGTAAGAGLDSLHK
ENSG00000130429.8
240





SEQ ID NO: 40
EAVTEILGIEPDREK
ENSG00000211460.7
236





SEQ ID NO: 41
EVEERPAPTPWGSK
ENSG00000130429.8
235





SEQ ID NO: 42
IITEGFEAAK
ENSG00000146731.6
235





SEQ ID NO: 43
YLNIFGESQPNPK
ENSG00000004864.9
234





SEQ ID NO: 44
LTAASVGVQGSGWGWLGFNK
ENSG00000112096.12
229





SEQ ID NO: 45
IAPLEEGTLPFNLAEAQR
ENSG00000004864.9
221





SEQ ID NO: 46
GQTLVVQFTVK
ENSG00000179218.9
220





SEQ ID NO: 47
AQLGVQAFADALLIIPK
ENSG00000146731.6
217





SEQ ID NO: 48
QVAPEKPVK
ENSG00000113387.7
217





SEQ ID NO: 49
VATAQDDITGDGTTSNVLIIGELLK
ENSG00000146731.6
215







SEQ ID NO: 50
GLLPQLLGVAPEK
ENSG00000004864.9
214





SEQ ID NO: 51
NAYVWTLK
ENSG00000130429.8
214





SEQ ID NO: 52
IYGADDIELLPEAQHK
ENSG00000100714.11
211





SEQ ID NO: 53
CHAIIDEQPLIFK
ENSG00000169756.12
210





SEQ ID NO: 54
KGISLNPEQWSQLK
ENSG00000113387.7
209





SEQ ID NO: 55
GIDPFSLDALSK
ENSG00000146731.6
207





SEQ ID NO: 56
LLQCYPPPEDAAVK
ENSG00000196961.8
207





SEQ ID NO: 57
GVPTGFILPIR
ENSG00000100714.11
204





SEQ ID NO: 58
IVTCGTDR
ENSG00000130429.8
204





SEQ ID NO: 59
TPVPSDIDISR
ENSG00000100714.11
203





SEQ ID NO: 60
YQEALAK
ENSG00000112096.12
198





SEQ ID NO: 61
VAWVSHDSTVCLADADKK
ENSG00000130429.8
197





SEQ ID NO: 62
LDIDPETITWQR
ENSG00000100714.11
194





SEQ ID NO: 63
IDNSQVESGSLEDDWDFLPPK
ENSG00000179218.9
192





SEQ ID NO: 64
LAILQVGNR
ENSG00000100714.11
192





SEQ ID NO: 65
AQAALAVNISAAR
ENSG00000146731.6
191





SEQ ID NO: 66
GALALAQAVQR
ENSG00000100714.11
189





SEQ ID NO: 67
TDPTTLTDEEINR
ENSG00000100714.11
189





SEQ ID NO: 68
LELSVLYK
ENSG00000167770.7
188





SEQ ID NO: 69
GLDGYQGPDGPR
ENSG00000134871.13
187





SEQ ID NO: 70
LSGLEQPQGALQTR
ENSG00000133316.11
184





SEQ ID NO: 71
SCQTALVEILDVIVR
ENSG00000067704.8
182





SEQ ID NO: 72
DDNMFQIGK
ENSG00000113387.7
181





SEQ ID NO: 73
EHNGQVTGIDWAPESNR
ENSG00000130429.8
179





SEQ ID NO: 74
KIKDPDASKPEDWDER
ENSG00000179218.9
178





SEQ ID NO: 75
MFGIPVVVAVNAFK
ENSG00000100714.11
178





SEQ ID NO: 76
FFEHFIEGGR
ENSG00000167770.7
177





SEQ ID NO: 77
IFHELTQTDK
ENSG00000100714.11
174





SEQ ID NO: 78
FINLFPETK
ENSG00000196961.8
172





SEQ ID NO: 79
FYGDEEKDK
ENSG00000179218.9
172





SEQ ID NO: 80
FNGGGHINHSIFWTNLSPNGGGEPK
ENSG00000112096.12
169







SEQ ID NO: 81
DPDASKPEDWDER
ENSG00000179218.9
168





SEQ ID NO: 82
LGSPDYGNSALLSLPGYRPTTR
ENSG00000137497.13
168





SEQ ID NO: 83
ASGDSARPVLLQVAESAYR
ENSG00000004864.9
167





SEQ ID NO: 84
TDTESELDLISR
ENSG00000100714.11
166





SEQ ID NO: 85
LDFVCSFLQK
ENSG00000137497.13
165





SEQ ID NO: 86
WIDETPPVDQPSR
ENSG00000119383.15
165





SEQ ID NO: 87
GLLGALTSTPYSPTQHLER
ENSG00000153310.14
164





SEQ ID NO: 88
KPEDWDEEMDGEWEPPVIQNPEYK
ENSG00000179218.9
162







SEQ ID NO: 89
FSDIQIR
ENSG00000100714.11
160





SEQ ID NO: 90
STSFNVQDLLPDHEYK
ENSG00000065534.14
160





SEQ ID NO: 91
GEQGFMGNTGPTGAVGDR
ENSG00000134871.13
159





SEQ ID NO: 92
QPSQGPTFGIK
ENSG00000100714.11
157





SEQ ID NO: 93
THLSLSHNPEQK
ENSG00000100714.11
157





SEQ ID NO: 94
APVPSTCSSTFPEELSPPSHQAK
ENSG00000137497.13
155





SEQ ID NO: 95
GEGGTTNPHIFPEGSEPK
ENSG00000167770.7
155





SEQ ID NO: 96
TALAEAELEYNPEHVSR
ENSG00000067704.8
155





SEQ ID NO: 97
FPLLKPSPK
ENSG00000067704.8
154





SEQ ID NO: 98
DQAANLMANR
ENSG00000198947.10
153





SEQ ID NO: 99
HLTAQVR
ENSG00000137497.13
153





SEQ ID NO: 100
FVLSSGK
ENSG00000179218.9
149








SEQ ID NO: 101
SSLPPVLGTESDATVK
ENSG00000065534.14
148








SEQ ID NO: 102
AWGAVVPLVGK
ENSG00000153310.14
146








SEQ ID NO: 103
IEGYPDPEVVWFK
ENSG00000065534.14
145








SEQ ID NO: 104
GKNVLINK
ENSG00000179218.9
144








SEQ ID NO: 105
GLQTSQDAR
ENSG00000179218.9
144








SEQ ID NO: 106
HTLTQIK
ENSG00000146731.6
144








SEQ ID NO: 107
VHAELADVLTEAVVDSILAIK
ENSG00000146731.6
144








SEQ ID NO: 108
YVIHTVGPIAYGEPSASQAAELR
ENSG00000133315.6
142








SEQ ID NO: 109
IQSSHNFQLESVNK
ENSG00000135052.12
141








SEQ ID NO: 110
QIDNPDYK
ENSG00000179218.9
140








SEQ ID NO: 111
DAEGILEDLQSYR
ENSG00000153310.14
139








SEQ ID NO: 112
YTAESSDTLCPR
ENSG00000067704.8
139








SEQ ID NO: 113
EESREPAPASPAPAGVEIR
ENSG00000113657.8
138








SEQ ID NO: 114
EMDRETLIDVAR
ENSG00000146731.6
138








SEQ ID NO: 115
NEVSFVIHNLPVLAK
ENSG00000086475.10
138








SEQ ID NO: 116
QVAPEKPVKK
ENSG00000113387.7
137








SEQ ID NO: 117
FLINLEGGDIR
ENSG00000067704.8
136








SEQ ID NO: 118
LSVNSVTAGDYSR
ENSG00000211460.7
135








SEQ ID NO: 119
QAQVNLTVVDKPDPPAGTPCASDIR
ENSG00000065534.14
135







SEQ ID NO: 120
IFDDVSSGVSQLASK
ENSG00000101199.8
134








SEQ ID NO: 121
PDASKPEDWDER
ENSG00000179218.9
134








SEQ ID NO: 122
YGGAPQALTLK
ENSG00000196961.8
132








SEQ ID NO: 123
LVTPGETPSWTGSGFVR
ENSG00000172037.9
131








SEQ ID NO: 124
EQISDIDDAVR
ENSG00000113387.7
129








SEQ ID NO: 125
KPAAGLSAAPVPTAPAAGAPLMDFGNDFVPPAPR
ENSG00000115310.13
129







SEQ ID NO: 126
ATSSTQSLAR
ENSG00000137497.13
128








SEQ ID NO: 127
LLVPTQFVGAIIGK
ENSG00000136231.9
128








SEQ ID NO: 128
GELLEAIK
ENSG00000112096.12
126








SEQ ID NO: 129
FFQPTEMAAQDFFQR
ENSG00000196961.8
124








SEQ ID NO: 130
GSGSRPGIEGDTPR
ENSG00000113657.8
121








SEQ ID NO: 131
NAIDDGCVVPGAGAVEVAMAEALIK
ENSG00000146731.6
121







SEQ ID NO: 132
AAAAAAVGPGAGGAGSAVPGGAGPCATVSVFPGAR
ENSG00000142453.7
120







SEQ ID NO: 133
DFLTPPLLSVR
ENSG00000196961.8
120








SEQ ID NO: 134
LFVVPADEAQAR
ENSG00000105223.14
120








SEQ ID NO: 135
WMIQYNNLNLK
ENSG00000100714.11
120








SEQ ID NO: 136
SLPISLVFLVPVR
ENSG00000169896.12
119








SEQ ID NO: 137
ALQVGCLLR
ENSG00000196961.8
118








SEQ ID NO: 138
ESFNPESYELDK
ENSG00000086475.10
118








SEQ ID NO: 139
TGWISTSSIWK
ENSG00000067704.8
118








SEQ ID NO: 140
EYAEDDNIYQQK
ENSG00000167770.7
117








SEQ ID NO: 141
TQIAICPNNHEVHIYEK
ENSG00000130429.8
117








SEQ ID NO: 142
SLEAQVAHADQQLR
ENSG00000137497.13
116








SEQ ID NO: 143
SVTLLIK
ENSG00000146731.6
116








SEQ ID NO: 144
IHFVPGWDCHGLPIEIK
ENSG00000067704.8
115








SEQ ID NO: 145
QQPDTELEIQQK
ENSG00000067704.8
115








SEQ ID NO: 146
KGEPVSAEDLGVSGALTVLMK
ENSG00000100714.11
114








SEQ ID NO: 147
LGIGMDTCVIPLR
ENSG00000086475.10
113








SEQ ID NO: 148
QPSWDPSPVSSTVPAPSPLSAAAVSPSK
ENSG00000115310.13
113







SEQ ID NO: 149
QISEGVEYIHK
ENSG00000065534.14
109








SEQ ID NO: 150
SEGGTAAGAGLDSLHK
ENSG00000130429.8
108








SEQ ID NO: 151
PTGFILPIR
ENSG00000100714.11
107








SEQ ID NO: 152
SQAGVSSGAPPGR
ENSG00000137497.13
107








SEQ ID NO: 153
VCGDSDKGFVVINQK
ENSG00000146731.6
107








SEQ ID NO: 154
LGIVQGIVGAR
ENSG00000172037.9
104








SEQ ID NO: 155
FLSLPEVR
ENSG00000106066.9
103








SEQ ID NO: 156
GLVLDHGAR
ENSG00000146731.6
102








SEQ ID NO: 157
LKNQVTQLK
ENSG00000100714.11
102








SEQ ID NO: 158
TSVQFQNFSPTVVHPGDLQTQLAVQTK
ENSG00000196961.8
102







SEQ ID NO: 159
EPPYGADVLR
ENSG00000067704.8
101








SEQ ID NO: 160
AAGPLLTDECR
ENSG00000133315.6
100








SEQ ID NO: 161
IIEVAPQVATQNVNPTPGATS
ENSG00000086475.10
100








SEQ ID NO: 162
LFSQGQDVSNK
ENSG00000130396.16
100








SEQ ID NO: 163
VSGPWEEADAEAVAR
ENSG00000090006.13
100








SEQ ID NO: 164
VTGTQPITCTWMK
ENSG00000065534.14
100








SEQ ID NO: 165
VLIDIR
ENSG00000113387.7
99








SEQ ID NO: 166
AVLEEGTDVVIK
ENSG00000067704.8
98








SEQ ID NO: 167
QFAEILHFTLR
ENSG00000153310.14
97








SEQ ID NO: 168
IVGAPMHDLLLWNNATVTTCHSK
ENSG00000100714.11
96







SEQ ID NO: 169
AYIQENLELVEK
ENSG00000100714.11
95








SEQ ID NO: 170
EIGLLSEEVELYGETK
ENSG00000100714.11
95








SEQ ID NO: 171
DSFLGSIPGK
ENSG00000067704.8
94








SEQ ID NO: 172
QLDALLEALK
ENSG00000172037.9
94








SEQ ID NO: 173
IIDEDFELTER
ENSG00000065534.14
93








SEQ ID NO: 174
DTINLLDQR
ENSG00000135052.12
92








SEQ ID NO: 175
VVQSLEQTAR
ENSG00000211460.7
92








SEQ ID NO: 176
DDSNLYINVK
ENSG00000100714.11
90








SEQ ID NO: 177
VSGQPQSVTASSDK
ENSG00000101199.8
90








SEQ ID NO: 178
EFCQQEVEPMCK
ENSG00000167770.7
89








SEQ ID NO: 179
AGNSLAASTAEETAGSAQGR
ENSG00000172037.9
88








SEQ ID NO: 180
EYWMDPEGEMKPGR
ENSG00000113387.7
88








SEQ ID NO: 181
LQSQLLSIEK
ENSG00000106976.14
88








SEQ ID NO: 182
AGESVELFGK
ENSG00000065534.14
86








SEQ ID NO: 183
NGEFFMSPNDFVTR
ENSG00000004864.9
86








SEQ ID NO: 184
VVVGAPQEIVAANQR
ENSG00000169896.12
86








SEQ ID NO: 185
SQAPLESSLDSLGDVFLDSGRK
ENSG00000137497.13
85








SEQ ID NO: 186
GCLELIK
ENSG00000100714.11
84








SEQ ID NO: 187
HSQTDQEPMCPVGMNK
ENSG00000134871.13
84








SEQ ID NO: 188
NPQVCGPGR
ENSG00000090006.13
83








SEQ ID NO: 189
SRGPGAPCQDVDECAR
ENSG00000090006.13
83








SEQ ID NO: 190
TKDEYLINSQTTEHIVK
ENSG00000067704.8
83








SEQ ID NO: 191
IATTTASAATAAAIGATPR
ENSG00000137497.13
82








SEQ ID NO: 192
LGHELQQAGLK
ENSG00000137497.13
82








SEQ ID NO: 193
TEVPPLLLILDR
ENSG00000136631.8
82








SEQ ID NO: 194
YGDEEKDK
ENSG00000179218.9
82








SEQ ID NO: 195
SESQGTAPAFK
ENSG00000065534.14
81








SEQ ID NO: 196
LPQEPGREQVVEDRPVGGR
ENSG00000135052.12
80








SEQ ID NO: 197
LPYGGQCRPCPCPEGPGSQR
ENSG00000172037.9
79








SEQ ID NO: 198
VYLLYRPGHYDILYK
ENSG00000167770.7
79








SEQ ID NO: 199
FQVATDALK
ENSG00000137497.13
78








SEQ ID NO: 200
LQEGQTLEFLVASVPK
ENSG00000172037.9
78








SEQ ID NO: 201
LQGAVCGVSSGPPPPR
ENSG00000011028.9
78








SEQ ID NO: 202
IQNVVTSFAPQR
ENSG00000172037.9
77








SEQ ID NO: 203
VSTLQNQR
ENSG00000169896.12
77








SEQ ID NO: 204
LSQLEEHLSQLQDNPPQEK
ENSG00000137497.13
76








SEQ ID NO: 205
SQAPLESSLDSLGDVFLDSGR
ENSG00000137497.13
76








SEQ ID NO: 206
AGPDLASCLDVDECR
ENSG00000090006.13
75








SEQ ID NO: 207
GTCHYYANK
ENSG00000134871.13
74








SEQ ID NO: 208
HKSETDTSLIR
ENSG00000146731.6
74








SEQ ID NO: 209
KQQNQELQEQLR
ENSG00000137497.13
74








SEQ ID NO: 210
SGDLYVLAADK
ENSG00000067704.8
74








SEQ ID NO: 211
AFGFSHLEALLDDSK
ENSG00000167770.7
73








SEQ ID NO: 212
EILTLLQGVHQGAGFQDIPK
ENSG00000211460.7
73








SEQ ID NO: 213
IQQCPGTETAEYQSLCPHGR
ENSG00000090006.13
73








SEQ ID NO: 214
KDPDASKPEDWDER
ENSG00000179218.9
73








SEQ ID NO: 215
SYWLSTTAPLPMMPVAEDEIKPYISR
ENSG00000134871.13
73







SEQ ID NO: 216
VPQDVLQK
ENSG00000086475.10
73








SEQ ID NO: 217
DFGSFDKFK
ENSG00000112096.12
72








SEQ ID NO: 218
FIILSQEGSLCSVSIEK
ENSG00000065534.14
72








SEQ ID NO: 219
LAVATFAGIENK
ENSG00000004864.9
72








SEQ ID NO: 220
RLENAGSLK
ENSG00000065534.14
72








SEQ ID NO: 221
AAMPPQIIQFPEDQK
ENSG00000065534.14
71








SEQ ID NO: 222
EAQNLSAMEIR
ENSG00000067704.8
71








SEQ ID NO: 223
ILVAGDSMDSVK
ENSG00000196961.8
71








SEQ ID NO: 224
LVHSYPYDWR
ENSG00000067704.8
71








SEQ ID NO: 225
AEAGDAALSVAEWLR
ENSG00000186635.10
70








SEQ ID NO: 226
ELSNFYFSIIK
ENSG00000067704.8
70








SEQ ID NO: 227
AEAAAPYTVLAQSAPR
ENSG00000090006.13
69








SEQ ID NO: 228
GPGAPCQDVDECAR
ENSG00000090006.13
69








SEQ ID NO: 229
VSDFYDIEER
ENSG00000065534.14
69








SEQ ID NO: 230
NNDFYVTGESYAGK
ENSG00000106066.9
68








SEQ ID NO: 231
QPVVDTFDIR
ENSG00000142453.7
68








SEQ ID NO: 232
QQLQALSEPQPR
ENSG00000135052.12
68








SEQ ID NO: 233
APAEILNGKEISAQIR
ENSG00000100714.11
67








SEQ ID NO: 234
KLDVEEPDSANSSFYSTR
ENSG00000137497.13
67








SEQ ID NO: 235
QPPPDSSEEAPPATQNFIIPK
ENSG00000119383.15
67








SEQ ID NO: 236
SLADVDAILAR
ENSG00000172037.9
67








SEQ ID NO: 237
TGGSAQPETPYSGPGLLIDSLVLLPR
ENSG00000172037.9
67







SEQ ID NO:
CDLCQEVLADIGFVK
ENSG00000169756.12
66


238








SEQ ID NO:
FIAGTGCLVR
ENSG00000184207.8
66


239








SEQ ID NO:
HHAAYVNNLNVTEEKYQEALAK
ENSG00000112096.12
66


240








SEQ ID NO:
QGIVHLDLKPENIMCVNK
ENSG00000065534.14
66


241








SEQ ID NO:
TLGDQLSLLLGAR
ENSG00000011028.9
66


242








SEQ ID NO:
CTHWAEGGK
ENSG00000100714.11
65


243








SEQ ID NO:
FGLYLPLFKPSVSTSK
ENSG00000004864.9
65


244








SEQ ID NO:
GSCYPATGDLLVGR
ENSG00000172037.9
65


245








SEQ ID NO:
VMPLIIQGFK
ENSG00000086475.10
65


246








SEQ ID NO:
TPLWIGLAGEEGSR
ENSG00000011028.9
64


247








SEQ ID NO:
TQPDGTSVPGEPASPISQR
ENSG00000137497.13
64


248








SEQ ID NO:
VWGVPIPVFHHK
ENSG00000067704.8
64


249








SEQ ID NO:
ALLNVVDNAR
ENSG00000105223.14
63


250








SEQ ID NO:
GGTTNPHIFPEGSEPK
ENSG00000167770.7
63


251








SEQ ID NO:
YTVNFLEAK
ENSG00000142453.7
63


252








SEQ ID NO:
ATIQGVLR
ENSG00000196961.8
62


253








SEQ ID NO:
GPLGDQYQTVK
ENSG00000172037.9
62


254








SEQ ID NO:
VAAQVDGGAQVQQVLNIECLR
ENSG00000196961.8
62


255








SEQ ID NO:
FTPVVCGLR
ENSG00000090006.13
61


256








SEQ ID NO:
LFPNSLDQTDMHGDSEYNIMFG
ENSG00000179218.9
61


257
PDICGPGTK







SEQ ID NO:
TILLSTTDPADFAVAEALEK
ENSG00000130396.16
61


258








SEQ ID NO:
LTYLGCASVNAPR
ENSG00000011454.12
60


259








SEQ ID NO:
SCYLSSLDLLLEHR
ENSG00000133315.6
60


260








SEQ ID NO:
VVATTQMQAADAR
ENSG00000166825.9
60


261








SEQ ID NO:
GVGGSQPPDIDKTELVEPTEYLV
ENSG00000166825.9
59


262
VHLK







SEQ ID NO:
KEIHTVPDMGK
ENSG00000119383.15
59


263








SEQ ID NO:
LFTALFPFEK
ENSG00000169896.12
59


264








SEQ ID NO:
SLESALK
ENSG00000130429.8
59


265








SEQ ID NO:
VDDQIAIVFK
ENSG00000119383.15
59


266








SEQ ID NO:
VLDPAIPIPDPYSSR
ENSG00000172037.9
59


267








SEQ ID NO:
ATPFIECNGGR
ENSG00000134871.13
58


268








SEQ ID NO:
CSVCEAPAIAIAVHSQDVSIPHCP
ENSG00000134871.13
58


269
AGWR







SEQ ID NO:
EAQVAHADQQLR
ENSG00000137497.13
58


270








SEQ ID NO:
EIILDDDECPLQIFR
ENSG00000130396.16
58


271








SEQ ID NO:
TPAAIPATPVAVSQPIR
ENSG00000130396.16
58


272








SEQ ID NO:
DLGFFGIYK
ENSG00000004864.9
57


273








SEQ ID NO:
EERPAPTPWGSK
ENSG00000130429.8
57


274








SEQ ID NO:
YVGFGNTPPPQK
ENSG00000101199.8
57


275








SEQ ID NO:
CLFQSPLFAK
ENSG00000142453.7
56


276








SEQ ID NO:
SETDTSLIR
ENSG00000146731.6
56


277








SEQ ID NO:
ILETWGELLSK
ENSG00000011454.12
54


278








SEQ ID NO:
YSGLCPHVVVLVATVR
ENSG00000100714.11
54


279








SEQ ID NO:
ENSLLFDPLSSSSSNK
ENSG00000166825.9
53


280








SEQ ID NO:
IKNEAEPEFASR
ENSG00000198947.10
53


281








SEQ ID NO:
VSAPDGPCPTGFER
ENSG00000090006.13
53


282








SEQ ID NO:
AQGIAQGAIR
ENSG00000172037.9
52


283








SEQ ID NO:
KVCGDSDKGFVVINQK
ENSG00000146731.6
52


284








SEQ ID NO:
LWSGYSLLYFEGQEK
ENSG00000134871.13
52


285








SEQ ID NO:
VPIWDQDIQFLPGSQK
ENSG00000133316.11
52


286








SEQ ID NO:
YLSYTLNPDLIR
ENSG00000166825.9
52


287








SEQ ID NO:
YVIGVGDAFR
ENSG00000169896.12
52


288








SEQ ID NO:
DLEVVEGSAAR
ENSG00000065534.14
51


289








SEQ ID NO:
FAVGSGSR
ENSG00000130429.8
50


290








SEQ ID NO:
GFGQSVVQLQGSR
ENSG00000169896.12
50


291








SEQ ID NO:
GLPGEVLGAQPGPR
ENSG00000134871.13
50


292








SEQ ID NO:
LAETLGR
ENSG00000169756.12
50


293








SEQ ID NO:
LPPKVESLESLYFTPIPAR
ENSG00000137497.13
50


294








SEQ ID NO:
PTDSKPEDWDKPEHIPDPDAK
ENSG00000179218.9
50


295








SEQ ID NO:
QLSLPQQEAQK
ENSG00000196961.8
50


296








SEQ ID NO:
DVTTFFSGK
ENSG00000101199.8
49


297








SEQ ID NO:
GQVEQANQELQELIQSVK
ENSG00000172037.9
49


298








SEQ ID NO:
IDDVLHTLTGAMSLLR
ENSG00000130396.16
49


299








SEQ ID NO: 300 | LQLPNCIEDPVSPIVLR | ENSG00000169896.12 | 49
SEQ ID NO: 301 | VESLESLYFTPIPAR | ENSG00000137497.13 | 49
SEQ ID NO: 302 | FGDPLGYEDVIPEADREGVIR | ENSG00000169896.12 | 48
SEQ ID NO: 303 | LEPNAQAQMYR | ENSG00000196961.8 | 48
SEQ ID NO: 304 | DSLEDCVTIWGPEGR | ENSG00000011028.9 | 47
SEQ ID NO: 305 | EAVTEILGIEPDR | ENSG00000211460.7 | 47
SEQ ID NO: 306 | FQNLDKK | ENSG00000130429.8 | 47
SEQ ID NO: 307 | GGECASPLPGLR | ENSG00000090006.13 | 47
SEQ ID NO: 308 | IAVSKPSGPQPQADLQALLQSGAQVR | ENSG00000105223.14 | 47
SEQ ID NO: 309 | VLELSIPASAEQIQHLAGAIAER | ENSG00000172037.9 | 47
SEQ ID NO: 310 | AAPVPTAPAAGAPLMDFGNDFVPPAPR | ENSG00000115310.13 | 46
SEQ ID NO: 311 | GGYTCVCPDGFLLDSSR | ENSG00000090006.13 | 46
SEQ ID NO: 312 | VLLTRPGEGGTGLPGPPLITR | ENSG00000152894.10 | 46
SEQ ID NO: 313 | ELQPQQQPR | ENSG00000130396.16 | 45
SEQ ID NO: 314 | FCQLHSSGARPPAPAVPGLTR | ENSG00000090006.13 | 45
SEQ ID NO: 315 | LAAGDQLLSVDGR | ENSG00000130396.16 | 45
SEQ ID NO: 316 | SLTLDTWEPELLK | ENSG00000114331.8 | 45
SEQ ID NO: 317 | EQVPGFTPR | ENSG00000100714.11 | 44
SEQ ID NO: 318 | ETGVPIAGR | ENSG00000100714.11 | 44
SEQ ID NO: 319 | KITIGQAPTEK | ENSG00000100714.11 | 44
SEQ ID NO: 320 | FSTMPFLYCNPGDVCYYASR | ENSG00000134871.13 | 43
SEQ ID NO: 321 | LLTIGDANGEIQR | ENSG00000142453.7 | 43
SEQ ID NO: 322 | LQSQVISELDACK | ENSG00000132205.6 | 43
SEQ ID NO: 323 | LTILAAR | ENSG00000065534.14 | 43
SEQ ID NO: 324 | LVECLETVLNK | ENSG00000196961.8 | 43
SEQ ID NO: 325 | SSPQFGVTLLTYELLQR | ENSG00000004864.9 | 43
SEQ ID NO: 326 | YQCHEEGLVPSK | ENSG00000172037.9 | 43
SEQ ID NO: 327 | GCQLCPPFGSEGFR | ENSG00000090006.13 | 42
SEQ ID NO: 328 | KPGLEEAVESACAMR | ENSG00000067704.8 | 42
SEQ ID NO: 329 | LVQCVDAFEEK | ENSG00000065534.14 | 42
SEQ ID NO: 330 | QWFINITDIK | ENSG00000067704.8 | 42
SEQ ID NO: 331 | SQLEAIFLR | ENSG00000105223.14 | 42
SEQ ID NO: 332 | VLEGSELELAK | ENSG00000137497.13 | 42
SEQ ID NO: 333 | VVQDLAAR | ENSG00000172037.9 | 42
SEQ ID NO: 334 | AIMEFNPR | ENSG00000169896.12 | 41
SEQ ID NO: 335 | ALAEGGSILSR | ENSG00000172037.9 | 41
SEQ ID NO: 336 | EICPAGPGYHYSASDLR | ENSG00000090006.13 | 41
SEQ ID NO: 337 | EQVVEDRPVGGR | ENSG00000135052.12 | 41
SEQ ID NO: 338 | LYCNPGDVCYYASR | ENSG00000134871.13 | 41
SEQ ID NO: 339 | TQDASGPELILPASIEFR | ENSG00000130396.16 | 41
SEQ ID NO: 340 | YSEIEPSTEGEVIYR | ENSG00000172037.9 | 41
SEQ ID NO: 341 | AWCVNCFACSTCNTK | ENSG00000169756.12 | 40
SEQ ID NO: 342 | DDPTDSKPEDWDKPEHIPDPDAK | ENSG00000179218.9 | 40
SEQ ID NO: 343 | IVQATTLLTMDK | ENSG00000130396.16 | 40
SEQ ID NO: 344 | VDLSTSTDWK | ENSG00000133315.6 | 40
SEQ ID NO: 345 | AQLLQQTR | ENSG00000213380.9 | 39
SEQ ID NO: 346 | DVDECQLFR | ENSG00000090006.13 | 39
SEQ ID NO: 347 | IEGYPDPEVVWFKDDQSIR | ENSG00000065534.14 | 39
SEQ ID NO: 348 | LSSMAMISGLSGR | ENSG00000065534.14 | 39
SEQ ID NO: 349 | NNGVLFENQLLQIGVK | ENSG00000196961.8 | 39
SEQ ID NO: 350 | RADPAELR | ENSG00000004864.9 | 39
SEQ ID NO: 351 | SAPASQASLR | ENSG00000137497.13 | 39
SEQ ID NO: 352 | DWEQFEYK | ENSG00000137497.13 | 38
SEQ ID NO: 353 | IQAELAVILK | ENSG00000137497.13 | 38
SEQ ID NO: 354 | SNRDELELELAENRK | ENSG00000137497.13 | 38
SEQ ID NO: 355 | TPVPEKVPPPKPATPDFR | ENSG00000065534.14 | 38
SEQ ID NO: 356 | VSLEPHQGPGTPESK | ENSG00000137497.13 | 38
SEQ ID NO: 357 | CTEPEDQLYYVK | ENSG00000106066.9 | 37
SEQ ID NO: 358 | ECYFDTAAPDACDNILAR | ENSG00000090006.13 | 37
SEQ ID NO: 359 | FGLGSVAGAVGATAVYPIDLVK | ENSG00000004864.9 | 37
SEQ ID NO: 360 | GQEDAILSYEPVTR | ENSG00000082458.7 | 37
SEQ ID NO: 361 | IMELEGR | ENSG00000135052.12 | 37
SEQ ID NO: 362 | TCVSLAVSR | ENSG00000196961.8 | 37
SEQ ID NO: 363 | TILTLTGVSTLGDVK | ENSG00000184207.8 | 37
SEQ ID NO: 364 | VLQIVTNRDDVQGYAAK | ENSG00000196961.8 | 37
SEQ ID NO: 365 | AFGFSHLEALLDDSKELQR | ENSG00000167770.7 | 36
SEQ ID NO: 366 | AGPDSAGIALYSHEDVCVFK | ENSG00000142453.7 | 36
SEQ ID NO: 367 | AQGVLAAQAR | ENSG00000172037.9 | 36
SEQ ID NO: 368 | LPSFQQSCR | ENSG00000213380.9 | 36
SEQ ID NO: 369 | MLSSFLSEDVFK | ENSG00000166825.9 | 36
SEQ ID NO: 370 | DTEQTLYQVQER | ENSG00000172037.9 | 35
SEQ ID NO: 371 | DVEVTKEEFVLAAQK | ENSG00000004864.9 | 35
SEQ ID NO: 372 | INQLSEENGDLSFK | ENSG00000137497.13 | 35
SEQ ID NO: 373 | LNIPATNVFANR | ENSG00000146733.9 | 35
SEQ ID NO: 374 | SLVKPITQLLGR | ENSG00000169896.12 | 35
SEQ ID NO: 375 | YLCEGTESPYQTGQLHPAIR | ENSG00000152894.10 | 35
SEQ ID NO: 376 | ASMQPIQIAEGTGITTR | ENSG00000137497.13 | 34
SEQ ID NO: 377 | IAGALGGLLTPLFLR | ENSG00000064545.10 | 34
SEQ ID NO: 378 | LGASALDSIQEFR | ENSG00000032444.11 | 34
SEQ ID NO: 379 | SGTIFDNFLITNDEAYAEEFGNETWGVTK | ENSG00000179218.9 | 34
SEQ ID NO: 380 | TVLDLQSSLAGVSENLK | ENSG00000132205.6 | 34
SEQ ID NO: 381 | AGPDLASCLDVDECRER | ENSG00000090006.13 | 33
SEQ ID NO: 382 | EGGTAAGAGLDSLHK | ENSG00000130429.8 | 33
SEQ ID NO: 383 | FYEFSQR | ENSG00000153310.14 | 33
SEQ ID NO: 384 | GEWIKPGAIVIDCGINYVPDDK | ENSG00000100714.11 | 33
SEQ ID NO: 385 | NDPYHPDHFNCANCGK | ENSG00000169756.12 | 33
SEQ ID NO: 386 | SLEPHQGPGTPESK | ENSG00000137497.13 | 33
SEQ ID NO: 387 | SLGEENFEVVK | ENSG00000132561.9 | 33
SEQ ID NO: 388 | THIDTVINALK | ENSG00000196961.8 | 33
SEQ ID NO: 389 | VHAELADVLTEAVVDSILAIKK | ENSG00000146731.6 | 33
SEQ ID NO: 390 | VMQHQYQVSNLGQR | ENSG00000169896.12 | 33
SEQ ID NO: 391 | ASFITPVPGGVGPMTVAMLMQSTVESAK | ENSG00000100714.11 | 32
SEQ ID NO: 392 | FEHFIEGGR | ENSG00000167770.7 | 32
SEQ ID NO: 393 | LQQAQLYPIAIFIKPK | ENSG00000082458.7 | 32
SEQ ID NO: 394 | MTLADIER | ENSG00000004864.9 | 32
SEQ ID NO: 395 | TVELLSGVVDQTK | ENSG00000004864.9 | 32
SEQ ID NO: 396 | AMDYDLLLR | ENSG00000172037.9 | 31
SEQ ID NO: 397 | DFGSFDK | ENSG00000112096.12 | 31
SEQ ID NO: 398 | EPAVYFKEQFLDGDGWTSR | ENSG00000179218.9 | 31
SEQ ID NO: 399 | FLINLEGGDIREESSYK | ENSG00000067704.8 | 31

SEQ ID NO: 400 | GEWIKPGAIVIDCGINYVPDDKKPNGR | ENSG00000100714.11 | 31
SEQ ID NO: 401 | HAVVVGR | ENSG00000100714.11 | 31
SEQ ID NO: 402 | LEGDTFLLLIQSLK | ENSG00000104450.8 | 31
SEQ ID NO: 403 | NTSVVDSEPVR | ENSG00000162614.14 | 31
SEQ ID NO: 404 | PGTTDQVPR | ENSG00000113657.8 | 31
SEQ ID NO: 405 | QLDQHLDLLK | ENSG00000172037.9 | 31
SEQ ID NO: 406 | TVIVHGFTLGEK | ENSG00000067704.8 | 31
SEQ ID NO: 407 | YAPDDIPNINSTCFK | ENSG00000130396.16 | 31
SEQ ID NO: 408 | AADLLYAMCDR | ENSG00000196961.8 | 30
SEQ ID NO: 409 | EMGEAFAADIPR | ENSG00000196961.8 | 30
SEQ ID NO: 410 | IQGTLQPHAR | ENSG00000172037.9 | 30
SEQ ID NO: 411 | LPIAVNGSLIYGVCAGK | ENSG00000059691.7 | 30
SEQ ID NO: 412 | VNDDLISEFPHK | ENSG00000082458.7 | 30
SEQ ID NO: 413 | DGGCSLPILR | ENSG00000090006.13 | 29
SEQ ID NO: 414 | ENVDYIIQELR | ENSG00000136631.8 | 29
SEQ ID NO: 415 | GAAVDEYFR | ENSG00000142453.7 | 29
SEQ ID NO: 416 | GETAVPGAPEALR | ENSG00000184207.8 | 29
SEQ ID NO: 417 | ILYSFATAFR | ENSG00000011454.12 | 29
SEQ ID NO: 418 | NVFECNDQVVK | ENSG00000169896.12 | 29
SEQ ID NO: 419 | STGSFVGELMYK | ENSG00000004864.9 | 29
SEQ ID NO: 420 | TIRDLEVVEGSAAR | ENSG00000065534.14 | 29
SEQ ID NO: 421 | TVFEALQAPACHENMVK | ENSG00000196961.8 | 29
SEQ ID NO: 422 | VGLLQYGSTVK | ENSG00000132561.9 | 29
SEQ ID NO: 423 | YVLSNQYRPDISPTER | ENSG00000130396.16 | 29
SEQ ID NO: 424 | AEAELEYNPEHVSR | ENSG00000067704.8 | 28
SEQ ID NO: 425 | ASPDLVPMGEWTAR | ENSG00000196961.8 | 28
SEQ ID NO: 426 | CEACAPGHFGDPSRPGGR | ENSG00000172037.9 | 28
SEQ ID NO: 427 | EDGYSDASGFGYCFR | ENSG00000090006.13 | 28
SEQ ID NO: 428 | GDLIGVVEALTR | ENSG00000032444.11 | 28
SEQ ID NO: 429 | LAILQVGNRDDSNLYINVK | ENSG00000100714.11 | 28
SEQ ID NO: 430 | NDAGQAECSCQVTVDDAPASENTK | ENSG00000065534.14 | 28
SEQ ID NO: 431 | QNWFEAFEILDK | ENSG00000106066.9 | 28
SEQ ID NO: 432 | SSEGLLATATVPLDLFK | ENSG00000157617.12 | 28
SEQ ID NO: 433 | STTTIGLVQALGAHLYQNVFACVR | ENSG00000100714.11 | 28
SEQ ID NO: 434 | VLVLEMFSGGDAAALER | ENSG00000172037.9 | 28
SEQ ID NO: 435 | KQVAPEKPVK | ENSG00000113387.7 | 27
SEQ ID NO: 436 | LQELEGTYEENER | ENSG00000172037.9 | 27
SEQ ID NO: 437 | LVEQHGSDIWWTLPPEQLLPK | ENSG00000067704.8 | 27
SEQ ID NO: 438 | NPTFMCLALHCIANVGSR | ENSG00000196961.8 | 27
SEQ ID NO: 439 | SSDGRPDSGGTLR | ENSG00000130396.16 | 27
SEQ ID NO: 440 | AAPQPLNLVSSVTLSK | ENSG00000114861.14 | 26
SEQ ID NO: 441 | AVQAQGGESQQEAQR | ENSG00000137497.13 | 26
SEQ ID NO: 442 | DFLNQEGADPDSIEMVATR | ENSG00000172037.9 | 26
SEQ ID NO: 443 | GQVLDVVER | ENSG00000172037.9 | 26
SEQ ID NO: 444 | LALIQPSR | ENSG00000146733.9 | 26
SEQ ID NO: 445 | LQQDVLQFQK | ENSG00000135052.12 | 26
SEQ ID NO: 446 | LTFEELER | ENSG00000162614.14 | 26
SEQ ID NO: 447 | QVTPLFIHFR | ENSG00000166825.9 | 26
SEQ ID NO: 448 | SFNVQDLLPDHEYK | ENSG00000065534.14 | 26
SEQ ID NO: 449 | SSCISQHVISEAK | ENSG00000090006.13 | 26
SEQ ID NO: 450 | VLQIVTNR | ENSG00000196961.8 | 26
SEQ ID NO: 451 | VVGDVAYDEAK | ENSG00000100714.11 | 26
SEQ ID NO: 452 | ALQSGPPQSR | ENSG00000136231.9 | 25
SEQ ID NO: 453 | ITIGQAPTEK | ENSG00000100714.11 | 25
SEQ ID NO: 454 | KAQGVLAAQAR | ENSG00000172037.9 | 25
SEQ ID NO: 455 | LKENLYPYLGPSTLR | ENSG00000136631.8 | 25
SEQ ID NO: 456 | LPVTINK | ENSG00000196961.8 | 25
SEQ ID NO: 457 | SILTAIPNDDPYFHITK | ENSG00000213380.9 | 25
SEQ ID NO: 458 | SLGNVIHPDVVVNGGQDQSK | ENSG00000067704.8 | 25
SEQ ID NO: 459 | AVQTSIATAYR | ENSG00000114331.8 | 24
SEQ ID NO: 460 | DASKPEDWDER | ENSG00000179218.9 | 24
SEQ ID NO: 461 | IPVSGPFLVK | ENSG00000136231.9 | 24
SEQ ID NO: 462 | LLGPAGLTWER | ENSG00000138162.13 | 24
SEQ ID NO: 463 | LPVEAFSAVFTK | ENSG00000032444.11 | 24
SEQ ID NO: 464 | SEESTTVHSSPGATGTALFPTR | ENSG00000205277.5 | 24
SEQ ID NO: 465 | SEESTTVHSSPGATGTALFPTR | ENSG00000205277.5 | 24
SEQ ID NO: 466 | SEESTTVHSSPGATGTALFPTR | ENSG00000205277.5 | 24
SEQ ID NO: 467 | TKVHAELADVLTEAVVDSILAIK | ENSG00000146731.6 | 24
SEQ ID NO: 468 | YGEGHQAWIIGIVEK | ENSG00000086475.10 | 24
SEQ ID NO: 469 | ADLYLEGK | ENSG00000067704.8 | 23
SEQ ID NO: 470 | CLEEKNEILQGK | ENSG00000137497.13 | 23
SEQ ID NO: 471 | FIFDCVSQEYGINPER | ENSG00000184207.8 | 23
SEQ ID NO: 472 | IHGTEEGQQILK | ENSG00000137497.13 | 23
SEQ ID NO: 473 | KIQTQLQR | ENSG00000166825.9 | 23
SEQ ID NO: 474 | KVVGDVAYDEAK | ENSG00000100714.11 | 23
SEQ ID NO: 475 | LDSISGNLQR | ENSG00000132205.6 | 23
SEQ ID NO: 476 | LFEDLEFQQLER | ENSG00000019144.12 | 23
SEQ ID NO: 477 | SLGNVIHPDVVVNGGQDQSKEPPYGADVLR | ENSG00000067704.8 | 23
SEQ ID NO: 478 | TEVNSGFFYK | ENSG00000146731.6 | 23
SEQ ID NO: 479 | TSAGTFPGSQPQAPASPVLPARPPPPPLPR | ENSG00000090006.13 | 23
SEQ ID NO: 480 | VHSPQQVDFR | ENSG00000065534.14 | 23
SEQ ID NO: 481 | VLTGNTIALVLGGGGAR | ENSG00000032444.11 | 23
SEQ ID NO: 482 | VSALSVVR | ENSG00000004864.9 | 23
SEQ ID NO: 483 | ASLENGVLLCDLINK | ENSG00000136153.15 | 22
SEQ ID NO: 484 | ETLIDVAR | ENSG00000146731.6 | 22
SEQ ID NO: 485 | FESKPQSQEVK | ENSG00000065534.14 | 22
SEQ ID NO: 486 | GHLQIAACPNQDPLQGTTGLIPLLGIDVWEHAYYLQYK | ENSG00000112096.12 | 22
SEQ ID NO: 487 | GICEALEDSDGRQDSPAGELPK | ENSG00000132561.9 | 22
SEQ ID NO: 488 | GYLAPSGDLSLR | ENSG00000090006.13 | 22
SEQ ID NO: 489 | LQSQLLSIEKEVEEYK | ENSG00000106976.14 | 22
SEQ ID NO: 490 | SGQGSDRGSGSRPGIEGDTPR | ENSG00000113657.8 | 22
SEQ ID NO: 491 | VAISTFQK | ENSG00000213380.9 | 22
SEQ ID NO: 492 | GQDIFIIQTIPR | ENSG00000161542.12 | 21
SEQ ID NO: 493 | ITLDAQDVLAHLVQMAFK | ENSG00000130396.16 | 21
SEQ ID NO: 494 | RTEVPPLLLILDR | ENSG00000136631.8 | 21
SEQ ID NO: 495 | SSPPVQFSLLHSK | ENSG00000196961.8 | 21
SEQ ID NO: 496 | SSTGSPTSPLNAEK | ENSG00000065534.14 | 21
SEQ ID NO: 497 | TKFPAEQYYR | ENSG00000211460.7 | 21
SEQ ID NO: 498 | ANFWYQPSFHGVDLSALR | ENSG00000142453.7 | 20
SEQ ID NO: 499 | DAQIAMMQQR | ENSG00000137497.13 | 20

SEQ ID NO: 500 | EHGAFDAVK | ENSG00000100714.11 | 20
SEQ ID NO: 501 | GLAQADGTLITCVDSGILR | ENSG00000133316.11 | 20
SEQ ID NO: 502 | GLNCEQCQDFYR | ENSG00000172037.9 | 20
SEQ ID NO: 503 | KVVATTQMQAADAR | ENSG00000166825.9 | 20
SEQ ID NO: 504 | MKLTHSLQEELEK | ENSG00000151914.13 | 20
SEQ ID NO: 505 | NIDVFNVEDQKR | ENSG00000135052.12 | 20
SEQ ID NO: 506 | QASDKDDRPFQGEDVENSR | ENSG00000130396.16 | 20
SEQ ID NO: 507 | SLDQTDMHGDSEYNIMFGPDICGPGTK | ENSG00000179218.9 | 20
SEQ ID NO: 508 | STIFHSSPDASGTTPSSAHSTTSGR | ENSG00000205277.5 | 20
SEQ ID NO: 509 | STIFHSSPDASGTTPSSAHSTTSGR | ENSG00000205277.5 | 20
SEQ ID NO: 510 | STIFHSSPDASGTTPSSAHSTTSGR | ENSG00000205277.5 | 20
SEQ ID NO: 511 | STIFHSSPDASGTTPSSAHSTTSGR | ENSG00000205277.5 | 20
SEQ ID NO: 512 | VCLHVQK | ENSG00000169896.12 | 20
SEQ ID NO: 513 | VSQFLQVLETDLYR | ENSG00000213380.9 | 20
SEQ ID NO: 514 | VSSTATTQDVIETLAEK | ENSG00000130396.16 | 20
SEQ ID NO: 515 | YNTRPLGQEPPR | ENSG00000090006.13 | 20
SEQ ID NO: 516 | ANHPMDAEVTK | ENSG00000196961.8 | 19
SEQ ID NO: 517 | ASELGHSLNENVLKPAQEK | ENSG00000101199.8 | 19
SEQ ID NO: 518 | AWVSHDSTVCLADADKK | ENSG00000130429.8 | 19
SEQ ID NO: 519 | FSYDLSQCINQMK | ENSG00000135052.12 | 19
SEQ ID NO: 520 | IYQFTAASPK | ENSG00000005020.8 | 19
SEQ ID NO: 521 | KQDEPIDLFMIEIMEMK | ENSG00000146731.6 | 19
SEQ ID NO: 522 | NIMAGLQQTNSEK | ENSG00000198947.10 | 19
SEQ ID NO: 523 | RPDYLK | ENSG00000112096.12 | 19
SEQ ID NO: 524 | SEESTTVHSSPVATATTPSPAR | ENSG00000205277.5 | 19
SEQ ID NO: 525 | SEESTTVHSSPVATATTPSPAR | ENSG00000205277.5 | 19
SEQ ID NO: 526 | SEESTTVHSSPVATATTPSPAR | ENSG00000205277.5 | 19
SEQ ID NO: 527 | SEESTTVHSSPVATATTPSPAR | ENSG00000205277.5 | 19
SEQ ID NO: 528 | THLTSLK | ENSG00000211460.7 | 19
SEQ ID NO: 529 | AQEAEQLLR | ENSG00000172037.9 | 18
SEQ ID NO: 530 | AQIINDAFNLASAHK | ENSG00000166825.9 | 18
SEQ ID NO: 531 | DQLGGWFQSSLLTSVAAR | ENSG00000067704.8 | 18
SEQ ID NO: 532 | GADDIELLPEAQHK | ENSG00000100714.11 | 18
SEQ ID NO: 533 | GFSHLEALLDDSK | ENSG00000167770.7 | 18
SEQ ID NO: 534 | GLLTDSPAATVLAEAR | ENSG00000019144.12 | 18
SEQ ID NO: 535 | HSNFLGAYDSIR | ENSG00000172037.9 | 18
SEQ ID NO: 536 | KNEFQGELEK | ENSG00000135052.12 | 18
SEQ ID NO: 537 | SFLEEVLASGLHSR | ENSG00000136631.8 | 18
SEQ ID NO: 538 | TEILGIEPDREK | ENSG00000211460.7 | 18
SEQ ID NO: 539 | VILLDPSIIEAK | ENSG00000104450.8 | 18
SEQ ID NO: 540 | AETVQAALEEAQR | ENSG00000172037.9 | 17
SEQ ID NO: 541 | AFVENYPQFK | ENSG00000136631.8 | 17
SEQ ID NO: 542 | DFISNLLK | ENSG00000065534.14 | 17
SEQ ID NO: 543 | DGFFGLSISDR | ENSG00000172037.9 | 17
SEQ ID NO: 544 | DHVFQVNNFEALK | ENSG00000169896.12 | 17
SEQ ID NO: 545 | DPTDSKPEDWDKPEHIPDPDAK | ENSG00000179218.9 | 17
SEQ ID NO: 546 | KIIELK | ENSG00000146731.6 | 17
SEQ ID NO: 547 | LCCPVALAQDVTGALEDALAK | ENSG00000213380.9 | 17
SEQ ID NO: 548 | PAIAHLIHSLNPVR | ENSG00000106066.9 | 17
SEQ ID NO: 549 | PSSTPTTHFSASSTTLGR | ENSG00000205277.5 | 17
SEQ ID NO: 550 | PSSTPTTHFSASSTTLGR | ENSG00000205277.5 | 17
SEQ ID NO: 551 | PSSTPTTHFSASSTTLGR | ENSG00000205277.5 | 17
SEQ ID NO: 552 | PSSTPTTHFSASSTTLGR | ENSG00000205277.5 | 17
SEQ ID NO: 553 | PSSTPTTHFSASSTTLGR | ENSG00000205277.5 | 17
SEQ ID NO: 554 | PSSTPTTHFSASSTTLGR | ENSG00000205277.5 | 17
SEQ ID NO: 555 | PSSTPTTHFSASSTTLGR | ENSG00000205277.5 | 17
SEQ ID NO: 556 | QFVTGIIDSLTISPK | ENSG00000132561.9 | 17
SEQ ID NO: 557 | SEAVLQSPEFAIFR | ENSG00000198947.10 | 17
SEQ ID NO: 558 | TTQGLTALLLSLK | ENSG00000136631.8 | 17
SEQ ID NO: 559 | VPLSVQLKPEVSPTQDIR | ENSG00000125826.15 | 17
SEQ ID NO: 560 | VTAIDFR | ENSG00000004864.9 | 17
SEQ ID NO: 561 | YLIFPNPVCLEPGISYK | ENSG00000172037.9 | 17
SEQ ID NO: 562 | YRLPNTLKPDSYR | ENSG00000166825.9 | 17
SEQ ID NO: 563 | AFLLSLAALR | ENSG00000105223.14 | 16
SEQ ID NO: 564 | DLAQYSSNDAVVETSLTK | ENSG00000114331.8 | 16
SEQ ID NO: 565 | DRLPQEPGREQVVEDRPVGGR | ENSG00000135052.12 | 16
SEQ ID NO: 566 | EAIQHPADEKLQEK | ENSG00000153310.14 | 16
SEQ ID NO: 567 | EFQNNPNPR | ENSG00000169896.12 | 16
SEQ ID NO: 568 | ELSAALQDKK | ENSG00000137497.13 | 16
SEQ ID NO: 569 | ELSGSGLER | ENSG00000213380.9 | 16
SEQ ID NO: 570 | ELWILNR | ENSG00000166825.9 | 16
SEQ ID NO: 571 | FSTEYELQQLEQFKK | ENSG00000166825.9 | 16
SEQ ID NO: 572 | GPALCGSQR | ENSG00000090006.13 | 16
SEQ ID NO: 573 | GPLEPGPPKPGVPQEPGR | ENSG00000125826.15 | 16
SEQ ID NO: 574 | GSLYQCDYSTGSCEPIR | ENSG00000169896.12 | 16
SEQ ID NO: 575 | IQTQLQR | ENSG00000166825.9 | 16
SEQ ID NO: 576 | KNSSIIGDYKQICSQLSER | ENSG00000011454.12 | 16
SEQ ID NO: 577 | LEINFEELLK | ENSG00000162614.14 | 16
SEQ ID NO: 578 | LIVPEPDVDFDAK | ENSG00000132205.6 | 16
SEQ ID NO: 579 | LVGPEGFVVTEAGFGADIGMEK | ENSG00000100714.11 | 16
SEQ ID NO: 580 | QEHCGCYTLLVENK | ENSG00000065534.14 | 16
SEQ ID NO: 581 | RSQAGVSSGAPPGR | ENSG00000137497.13 | 16
SEQ ID NO: 582 | SPGSTPTTHFPASSTTSGHSEK | ENSG00000205277.5 | 16
SEQ ID NO: 583 | SPGSTPTTHFPASSTTSGHSEK | ENSG00000205277.5 | 16
SEQ ID NO: 584 | SPGSTPTTHFPASSTTSGHSEK | ENSG00000205277.5 | 16
SEQ ID NO: 585 | SPGSTPTTHFPASSTTSGHSEK | ENSG00000205277.5 | 16
SEQ ID NO: 586 | VLSQIDVAQK | ENSG00000198947.10 | 16
SEQ ID NO: 587 | YGGMFCNVEGAFESK | ENSG00000113657.8 | 16
SEQ ID NO: 588 | ATVVVEATEPEPSGSIANPAASTSPSLSHR | ENSG00000131711.10 | 15
SEQ ID NO: 589 | EMTADVIELK | ENSG00000067704.8 | 15
SEQ ID NO: 590 | GEQGFMGNTGPTGAVGDRGPK | ENSG00000134871.13 | 15
SEQ ID NO: 591 | LAEAELEYNPEHVSR | ENSG00000067704.8 | 15
SEQ ID NO: 592 | LESEEDVSQAFLEAVAEEKPHVKPYFSK | ENSG00000065534.14 | 15
SEQ ID NO: 593 | LMCELGNDVINR | ENSG00000114331.8 | 15
SEQ ID NO: 594 | QQQDYWLIDVR | ENSG00000166825.9 | 15
SEQ ID NO: 595 | SSEGGTAAGAGLDSLHK | ENSG00000130429.8 | 15
SEQ ID NO: 596 | SYKPVFWSPSSR | ENSG00000067704.8 | 15
SEQ ID NO: 597 | TAHLDEEVNKGDILVVATGQPEMVK | ENSG00000100714.11 | 15
SEQ ID NO: 598 | TRPDGNCFYR | ENSG00000167770.7 | 15
SEQ ID NO: 599 | TSGQCLCR | ENSG00000172037.9 | 15

SEQ ID NO: 600 | AFCVANK | ENSG00000114331.8 | 14
SEQ ID NO: 601 | AMISGLSGR | ENSG00000065534.14 | 14
SEQ ID NO: 602 | AVESSKPLSNAQPSGPLKPVGN | ENSG00000065534.14 | 14
SEQ ID NO: 603 | AYHSFLVEPISCHAWNKDR | ENSG00000130429.8 | 14
SEQ ID NO: 604 | EGVVDIYNCVK | ENSG00000152894.10 | 14
SEQ ID NO: 605 | GTWIHPEIDNPEYSPDPSIYAYDNFGVLGLDLWQVK | ENSG00000179218.9 | 14
SEQ ID NO: 606 | HLTQAVCTVK | ENSG00000141447.12 | 14
SEQ ID NO: 607 | ITISPLQELTLYNPER | ENSG00000136231.9 | 14
SEQ ID NO: 608 | LACESASSTEVSGALK | ENSG00000169896.12 | 14
SEQ ID NO: 609 | LDCTQCLQHPWLMK | ENSG00000065534.14 | 14
SEQ ID NO: 610 | LDEEAENLVATVVPTHLAAAVPEVAVYLK | ENSG00000119383.15 | 14
SEQ ID NO: 611 | LPEDDEPPARPPPPPPASVSPQAEPVWTPPAPAPAAPPSTPAAPK | ENSG00000115310.13 | 14
SEQ ID NO: 612 | LPNTLKPDSYR | ENSG00000166825.9 | 14
SEQ ID NO: 613 | LSSTQQSLAEK | ENSG00000082805.15 | 14
SEQ ID NO: 614 | LVALETGIQK | ENSG00000019144.12 | 14
SEQ ID NO: 615 | MHGGGPTVTAGLPLPK | ENSG00000100714.11 | 14
SEQ ID NO: 616 | QQALELVVQEVSSVLR | ENSG00000157617.12 | 14
SEQ ID NO: 617 | QSMAFSILNTPK | ENSG00000137497.13 | 14
SEQ ID NO: 618 | SSNLLDLK | ENSG00000142453.7 | 14
SEQ ID NO: 619 | VLQDQLK | ENSG00000135052.12 | 14
SEQ ID NO: 620 | WVSHDSTVCLADADKK | ENSG00000130429.8 | 14
SEQ ID NO: 621 | AAQLDGLEAR | ENSG00000172037.9 | 13
SEQ ID NO: 622 | ANALASATCER | ENSG00000169756.12 | 13
SEQ ID NO: 623 | ATDNEPSQFSEPR | ENSG00000132205.6 | 13
SEQ ID NO: 624 | CGFSELYSWQR | ENSG00000067704.8 | 13
SEQ ID NO: 625 | DLLQAAQDK | ENSG00000172037.9 | 13
SEQ ID NO: 626 | EPAPASPAPAGVEIR | ENSG00000113657.8 | 13
SEQ ID NO: 627 | EYELFEFR | ENSG00000136631.8 | 13
SEQ ID NO: 628 | HKPGIVQETTFDLGGDIHSGTALPTSK | ENSG00000130396.16 | 13
SEQ ID NO: 629 | IWDLQGSEEPVFR | ENSG00000133316.11 | 13
SEQ ID NO: 630 | LFGDVEASLGR | ENSG00000213380.9 | 13
SEQ ID NO: 631 | LHTLGDNLLDPR | ENSG00000172037.9 | 13
SEQ ID NO: 632 | RFSDIQIR | ENSG00000100714.11 | 13
SEQ ID NO: 633 | SEVYGPMK | ENSG00000166825.9 | 13
SEQ ID NO: 634 | SLSESAATR | ENSG00000159788.14 | 13
SEQ ID NO: 635 | VTCVEMEPLAEYVVR | ENSG00000152894.10 | 13
SEQ ID NO: 636 | YLFEEDNLLR | ENSG00000132561.9 | 13
SEQ ID NO: 637 | AAECLDVDECHR | ENSG00000090006.13 | 12
SEQ ID NO: 638 | AGMSSLKG | ENSG00000146731.6 | 12
SEQ ID NO: 639 | ALASATCER | ENSG00000169756.12 | 12
SEQ ID NO: 640 | CDSHDDPALGLVSGQCR | ENSG00000172037.9 | 12
SEQ ID NO: 641 | DCSIALPYVCK | ENSG00000011028.9 | 12
SEQ ID NO: 642 | DISLQGPGLAPEHCYIENLR | ENSG00000019144.12 | 12
SEQ ID NO: 643 | FVLDHEDGLNLNEDLENFLQK | ENSG00000137497.13 | 12
SEQ ID NO: 644 | GANQHATDEEGKDPLSIAVEAANADIVTLLR | ENSG00000114331.8 | 12
SEQ ID NO: 645 | GFSHLEALLDDSKELQR | ENSG00000167770.7 | 12
SEQ ID NO: 646 | GSGVSNFAQLIVR | ENSG00000152894.10 | 12
SEQ ID NO: 647 | IINDAFNLASAHK | ENSG00000166825.9 | 12
SEQ ID NO: 648 | KVVQSLEQTAR | ENSG00000211460.7 | 12
SEQ ID NO: 649 | QPAVEEPAEVTATVLASR | ENSG00000076662.5 | 12
SEQ ID NO: 650 | QTQVLGLTQTCETLK | ENSG00000169896.12 | 12
SEQ ID NO: 651 | RVEDAYILTCNVSLEYEK | ENSG00000146731.6 | 12
SEQ ID NO: 652 | TLDFDALSVGQR | ENSG00000113657.8 | 12
SEQ ID NO: 653 | VVNAMGK | ENSG00000169756.12 | 12
SEQ ID NO: 654 | AKIDDPTDSKPEDWDKPEHIPD | ENSG00000179218.9 | 11
SEQ ID NO: 655 | ALEQLLTELDDFLK | ENSG00000169129.10 | 11
SEQ ID NO: 656 | ASKPEDWDER | ENSG00000179218.9 | 11
SEQ ID NO: 657 | DLNQLFQQDSSSR | ENSG00000082805.15 | 11
SEQ ID NO: 658 | ETPGRPPDPTGAPLPGPTGDPVKPTSLETPSAPLLSR | ENSG00000032444.11 | 11
SEQ ID NO: 659 | GSACEEDVDECAQEPPPCGPGR | ENSG00000090006.13 | 11
SEQ ID NO: 660 | KASSEGGTAAGAGLDSLHK | ENSG00000130429.8 | 11
SEQ ID NO: 661 | LGFITNNSSK | ENSG00000184207.8 | 11
SEQ ID NO: 662 | LPSHSDFLAELR | ENSG00000169896.12 | 11
SEQ ID NO: 663 | LQDVHVAEGK | ENSG00000065534.14 | 11
SEQ ID NO: 664 | LVTCTGYHQVR | ENSG00000133316.11 | 11
SEQ ID NO: 665 | SIQLPTTVR | ENSG00000166825.9 | 11
SEQ ID NO: 666 | VLSELGR | ENSG00000067704.8 | 11
SEQ ID NO: 667 | WAPNENKFAVGSGSR | ENSG00000130429.8 | 11
SEQ ID NO: 668 | AQELQQTGVLGAFESSFWHMQEK | ENSG00000172037.9 | 10
SEQ ID NO: 669 | ASAAAAAGGGATGHPGGGQGAENPAGLK | ENSG00000104450.8 | 10
SEQ ID NO: 670 | EAENFHEEDDVDVRPAR | ENSG00000162614.14 | 10
SEQ ID NO: 671 | ERLPSHSDFLAELR | ENSG00000169896.12 | 10
SEQ ID NO: 672 | EWSLESSPAQNWTPPQPR | ENSG00000101199.8 | 10
SEQ ID NO: 673 | FYALSASFEPFSNKG | ENSG00000179218.9 | 10
SEQ ID NO: 674 | GISLNPEQWSQLKEQISDIDDAVR | ENSG00000113387.7 | 10
SEQ ID NO: 675 | HPLLVGHMPVMVAK | ENSG00000104728.11 | 10
SEQ ID NO: 676 | IAHGNSSIIADR | ENSG00000100714.11 | 10
SEQ ID NO: 677 | IYADSLKPNIPYK | ENSG00000130396.16 | 10
SEQ ID NO: 678 | LAILDSQAGQIR | ENSG00000019144.12 | 10
SEQ ID NO: 679 | NMVVDDDSPEMYK | ENSG00000162614.14 | 10
SEQ ID NO: 680 | NRLDCTQCLQHPWLMK | ENSG00000065534.14 | 10
SEQ ID NO: 681 | PVLLQVAESAYR | ENSG00000004864.9 | 10
SEQ ID NO: 682 | QEPLGSDSEGVNCLAYDEAIMAQQDR | ENSG00000167770.7 | 10
SEQ ID NO: 683 | QEVEELWIGLNDLK | ENSG00000011028.9 | 10
SEQ ID NO: 684 | SFVIHNLPVLAK | ENSG00000086475.10 | 10
SEQ ID NO: 685 | STTFHSSPR | ENSG00000205277.5 | 10
SEQ ID NO: 686 | STTFHSSPR | ENSG00000205277.5 | 10
SEQ ID NO: 687 | STTFHSSPR | ENSG00000205277.5 | 10
SEQ ID NO: 688 | TAAGLMHTFNAHAATDITGFGILGHAQNLAK | ENSG00000086475.10 | 10
SEQ ID NO: 689 | TGAFGLR | ENSG00000172037.9 | 10
SEQ ID NO: 690 | TSLTVVLLR | ENSG00000076662.5 | 10
SEQ ID NO: 691 | VPPLLIYGPFGTGK | ENSG00000130589.12 | 10
SEQ ID NO: 692 | VPSFAAGR | ENSG00000136231.9 | 10
SEQ ID NO: 693 | VPVGDQPPDIEFQIR | ENSG00000106976.14 | 10
SEQ ID NO: 694 | VYDPASPQR | ENSG00000133316.11 | 10
SEQ ID NO: 695 | WFYIDFGGVKPMGSEPVPK | ENSG00000004864.9 | 10
SEQ ID NO: 696 | WTPPAPAPAAPPSTPAAPK | ENSG00000115310.13 | 10
SEQ ID NO: 697 | YDNQWFHGCTSTGR | ENSG00000011028.9 | 10
SEQ ID NO: 698 | YFSYDCGADFPGVPLAPPR | ENSG00000172037.9 | 10
SEQ ID NO: 699 | YGDEEKDKGLQTSQDAR | ENSG00000179218.9 | 10

SEQ ID NO:
YLETADYAIR
ENSG00000196961.8
10


700








SEQ ID NO:
AKQPDLAPGLTTIGASPTQTVTL
ENSG00000198947.10
9


701
VTQPVVTK







SEQ ID NO:
ASPLLPANHVTMAK
ENSG00000067704.8
9


702








SEQ ID NO:
AVLELLQRPGNAR
ENSG00000105963.9
9


703








SEQ ID NO:
CFQVQGQEPQSR
ENSG00000011028.9
9


704








SEQ ID NO:
DKGLQTSQDAR
ENSG00000179218.9
9


705








SEQ ID NO:
DLTALSNMLPK
ENSG00000166825.9
9


706








SEQ ID NO:
DPFSLDALSK
ENSG00000146731.6
9


707








SEQ ID NO:
FGDPLGYEDVIPEADR
ENSG00000169896.12
9


708








SEQ ID NO:
FGLYLPLFK
ENSG00000004864.9
9


709








SEQ ID NO:
FSTEYELQQLEQFK
ENSG00000166825.9
9


710








SEQ ID NO:
GAVYLFHGTSGSGISPSHSQR
ENSG00000169896.12
9


711








SEQ ID NO:
HLCELLAQQF
ENSG00000196961.8
9


712








SEQ ID NO:
ILDQENLSSTALVK
ENSG00000169129.10
9


713








SEQ ID NO:
ISETTMLQSGMK
ENSG00000130396.16
9


714








SEQ ID NO:
ISYHGSCPQGLADSAWIPFR
ENSG00000011028.9
9


715








SEQ ID NO:
KQNWFEAFEILDK
ENSG00000106066.9
9


716








SEQ ID NO:
PISLVFLVPVR
ENSG00000169896.12
9


717








SEQ ID NO:
SKESSQVTSR
ENSG00000136631.8
9


718








SEQ ID NO:
SPPPCTYGR
ENSG00000090006.13
9


719








SEQ ID NO:
SQLNCLLLSGR
ENSG00000133316.11
9


720








SEQ ID NO:
TPLSAAAHTHPVYCVNVVGTQN
ENSG00000158560.10
9


721
AHNLITVSTDGK







SEQ ID NO:
VNYDEENWR
ENSG00000166825.9
9


722








SEQ ID NO:
VSFVIHNLPVLAK
ENSG00000086475.10
9


723








SEQ ID NO:
VTLRPYLTPNDR
ENSG00000166825.9
9


724








SEQ ID NO:
WNVINWENVTER
ENSG00000112096.12
9


725








SEQ ID NO:
ADTDGGLIFR
ENSG00000163975.7
8


726








SEQ ID NO:
AGYTGLR
ENSG00000172037.9
8


727








SEQ ID NO:
AVESSKPLSNAQPSGPLKPVGNA
ENSG00000065534.14
8


728
K







SEQ ID NO:
CSEGFVLAEDGRR
ENSG00000132561.9
8


729








SEQ ID NO:
DLMVLNDVYR
ENSG00000166825.9
8


730








SEQ ID NO:
FPAEQYYR
ENSG00000211460.7
8


731








SEQ ID NO:
FTGHCSCRPGVSGVR
ENSG00000172037.9
8


732








SEQ ID NO:
GDPGDTGAPGPVGMK
ENSG00000134871.13
8


733








SEQ ID NO:
GGPSLSSVLNELPSAATLR
ENSG00000167608.7
8


734








SEQ ID NO:
IKDPDASKPEDWDERAK
ENSG00000179218.9
8


735








SEQ ID NO:
ILCIGAVPGLQPR
ENSG00000110237.3
8


736








SEQ ID NO:
IQSDLTSHEISLEEMKK
ENSG00000198947.10
8


737








SEQ ID NO:
ITGHFYACQVAQR
ENSG00000136231.9
8


738








SEQ ID NO:
KVVGDVAYDEAKER
ENSG00000100714.11
8


739








SEQ ID NO:
LDTDILLGATCGLK
ENSG00000184207.8
8


740








SEQ ID NO:
LVSAVVEYGGK
ENSG00000136631.8
8


741








SEQ ID NO:
MLGVAAGMTHSNMANALASAT
ENSG00000169756.12
8


742
CER







SEQ ID NO:
NIPNGLQEFLDPLCQR
ENSG00000130396.16
8


743








SEQ ID NO:
QADIIGKPSR
ENSG00000184207.8
8


744








SEQ ID NO:
QEISIMNCLHHPK
ENSG00000065534.14
8


745








SEQ ID NO:
QIVSEMLR
ENSG00000196961.8
8


746








SEQ ID NO:
RAEQLLQDAR
ENSG00000172037.9
8


747








SEQ ID NO:
RFENAPDSAK
ENSG00000082805.15
8


748








SEQ ID NO:
SGAPWFK
ENSG00000162614.14
8


749








SEQ ID NO:
SIVEHVASK
ENSG00000146733.9
8


750








SEQ ID NO: 751 | SLVGLSQER | ENSG00000130396.16 | 8
SEQ ID NO: 752 | TVNELQNLSSAEVVVPR | ENSG00000136231.9 | 8
SEQ ID NO: 753 | VIAVVNK | ENSG00000130396.16 | 8
SEQ ID NO: 754 | VSHSELR | ENSG00000146733.9 | 8
SEQ ID NO: 755 | WSDGVGFSYHNFDR | ENSG00000011028.9 | 8
SEQ ID NO: 756 | YGADDIELLPEAQHK | ENSG00000100714.11 | 8
SEQ ID NO: 757 | AKPEASFQVWNK | ENSG00000073849.10 | 7
SEQ ID NO: 758 | ALQLSNSPGASSAFLK | ENSG00000170776.15 | 7
SEQ ID NO: 759 | ASSEGGTAAGAGLDSLHKNSVSQISVLSGGK | ENSG00000130429.8 | 7
SEQ ID NO: 760 | AVEMAAQR | ENSG00000184207.8 | 7
SEQ ID NO: 761 | AVLELLQR | ENSG00000105963.9 | 7
SEQ ID NO: 762 | AYAQQLADWAR | ENSG00000165912.11 | 7
SEQ ID NO: 763 | DHSAIPVINR | ENSG00000166825.9 | 7
SEQ ID NO: 764 | DLRDPAVCR | ENSG00000172037.9 | 7
SEQ ID NO: 765 | FGSCVPHTTRPR | ENSG00000082458.7 | 7
SEQ ID NO: 766 | GPQYGTLEK | ENSG00000165912.11 | 7
SEQ ID NO: 767 | HWDDVVCESR | ENSG00000172037.9 | 7
SEQ ID NO: 768 | IVLYQTDASLTPWTVR | ENSG00000032444.11 | 7
SEQ ID NO: 769 | KVHSPQQVDFR | ENSG00000065534.14 | 7
SEQ ID NO: 770 | LCTDHGSQLVTITNR | ENSG00000011028.9 | 7
SEQ ID NO: 771 | LDFLPDMMVEGR | ENSG00000048740.13 | 7
SEQ ID NO: 772 | LEAVAEEKPHVKPYFSK | ENSG00000065534.14 | 7
SEQ ID NO: 773 | LEVDAIVNAANSSLLGGGGVDGCIHR | ENSG00000133315.6 | 7
SEQ ID NO: 774 | LLHEMQIQHPTASLIAK | ENSG00000146731.6 | 7
SEQ ID NO: 775 | LLVEELPLR | ENSG00000198947.10 | 7
SEQ ID NO: 776 | LMNSQLVTTEK | ENSG00000073849.10 | 7
SEQ ID NO: 777 | LSNPPSAGPIVVHCSAGAGR | ENSG00000152894.10 | 7
SEQ ID NO: 778 | LSPSSTETTTLPGSPTTPSLSEK | ENSG00000205277.5 | 7
SEQ ID NO: 779 | LSPSSTETTTLPGSPTTPSLSEK | ENSG00000205277.5 | 7
SEQ ID NO: 780 | LSPSSTETTTLPGSPTTPSLSEK | ENSG00000205277.5 | 7
SEQ ID NO: 781 | LSPSSTETTTLPGSPTTPSLSEK | ENSG00000205277.5 | 7
SEQ ID NO: 782 | MYLFYGNK | ENSG00000196961.8 | 7
SEQ ID NO: 783 | PPLLLILDR | ENSG00000136631.8 | 7
SEQ ID NO: 784 | PSLSLGTITDEEMK | ENSG00000137497.13 | 7
SEQ ID NO: 785 | QCHECIEHIR | ENSG00000106066.9 | 7
SEQ ID NO: 786 | QQNQELQEQLR | ENSG00000137497.13 | 7
SEQ ID NO: 787 | SFAPILPHLAEEVFQHIPY | ENSG00000067704.8 | 7
SEQ ID NO: 788 | SGLCPHVVVLVATVR | ENSG00000100714.11 | 7
SEQ ID NO: 789 | SITILSTPEGTSAACK | ENSG00000136231.9 | 7
SEQ ID NO: 790 | SLEGSDDAVLLQR | ENSG00000198947.10 | 7
SEQ ID NO: 791 | SMDAETYVEGQR | ENSG00000130396.16 | 7
SEQ ID NO: 792 | STTSGLVGESTPSR | ENSG00000205277.5 | 7
SEQ ID NO: 793 | STTSGLVGESTPSR | ENSG00000205277.5 | 7
SEQ ID NO: 794 | STTSGLVGESTPSR | ENSG00000205277.5 | 7
SEQ ID NO: 795 | STTSGLVGESTPSR | ENSG00000205277.5 | 7
SEQ ID NO: 796 | TQGSSTSWFGSNQSKPEFTVDLK | ENSG00000165322.13 | 7
SEQ ID NO: 797 | VIMIVTDGRPQDSVAEVAAK | ENSG00000132561.9 | 7
SEQ ID NO: 798 | VPPPKPATPDFR | ENSG00000065534.14 | 7
SEQ ID NO: 799 | WGFCPIK | ENSG00000011028.9 | 7
SEQ ID NO: 800 | YAVQVAEGMGYLESKR | ENSG00000061938.12 | 7
SEQ ID NO: 801 | AAEEIGIKATHIKLPR | ENSG00000100714.11 | 6
SEQ ID NO: 802 | AGDAVNVVVTGGK | ENSG00000132205.6 | 6
SEQ ID NO: 803 | AGDTLSGTCLLIANK | ENSG00000142453.7 | 6
SEQ ID NO: 804 | AGDTLSGTCLLIANKR | ENSG00000142453.7 | 6
SEQ ID NO: 805 | AIDYEIQR | ENSG00000059691.7 | 6
SEQ ID NO: 806 | ALEQALEK | ENSG00000166825.9 | 6
SEQ ID NO: 807 | ALSSAGER | ENSG00000172037.9 | 6
SEQ ID NO: 808 | CFLCDSR | ENSG00000172037.9 | 6
SEQ ID NO: 809 | DAEEWVQQLK | ENSG00000005020.8 | 6
SEQ ID NO: 810 | DDEFTHLYTLIVRPDNTYEVK | ENSG00000179218.9 | 6
SEQ ID NO: 811 | DFGSFDKFKEK | ENSG00000112096.12 | 6
SEQ ID NO: 812 | DGDVQAGANLSFNR | ENSG00000158560.10 | 6
SEQ ID NO: 813 | EFASHLQQLQDALNELTEEHSK | ENSG00000137497.13 | 6
SEQ ID NO: 814 | ETLPELPSVTR | ENSG00000059691.7 | 6
SEQ ID NO: 815 | GAPMHDLLLWNNATVTTCHSK | ENSG00000100714.11 | 6
SEQ ID NO: 816 | HKSDFGK | ENSG00000179218.9 | 6
SEQ ID NO: 817 | IALETSLSK | ENSG00000076662.5 | 6
SEQ ID NO: 818 | IGDFGLMR | ENSG00000061938.12 | 6
SEQ ID NO: 819 | ILREEGPK | ENSG00000004864.9 | 6
SEQ ID NO: 820 | KSEAPFTHK | ENSG00000162614.14 | 6
SEQ ID NO: 821 | LCGDLVSCFQER | ENSG00000165912.11 | 6
SEQ ID NO: 822 | LLDLLEGLTGQK | ENSG00000198947.10 | 6
SEQ ID NO: 823 | LLEQSIQSAQETEK | ENSG00000198947.10 | 6
SEQ ID NO: 824 | LQAEDCSIACLPR | ENSG00000152894.10 | 6
SEQ ID NO: 825 | MNVVFAVK | ENSG00000136631.8 | 6
SEQ ID NO: 826 | NPPAAYIQK | ENSG00000184922.9 | 6
SEQ ID NO: 827 | NTSLNPQELQR | ENSG00000125826.15 | 6
SEQ ID NO: 828 | NVLINKDIR | ENSG00000179218.9 | 6
SEQ ID NO: 829 | PAETLKPMGN | ENSG00000065534.14 | 6
SEQ ID NO: 830 | PAETLKPMGN | ENSG00000065534.14 | 6
SEQ ID NO: 831 | PFSLDALSK | ENSG00000146731.6 | 6
SEQ ID NO: 832 | PLLPANHVTMAK | ENSG00000067704.8 | 6
SEQ ID NO: 833 | PSGYTCACDSGFR | ENSG00000090006.13 | 6
SEQ ID NO: 834 | PSVVLSAAHTVAAR | ENSG00000032444.11 | 6
SEQ ID NO: 835 | QASNGVLIR | ENSG00000166825.9 | 6
SEQ ID NO: 836 | QGLELAADCHLSR | ENSG00000130396.16 | 6
SEQ ID NO: 837 | QVEELLMAMEK | ENSG00000082805.15 | 6
SEQ ID NO: 838 | QVEKEETNEIQVVNEEPQR | ENSG00000135052.12 | 6
SEQ ID NO: 839 | RLEAEFPPHHSQSTFR | ENSG00000061938.12 | 6
SEQ ID NO: 840 | SWDTNLIECNLDQELK | ENSG00000131711.10 | 6
SEQ ID NO: 841 | TGEPCVAELTEENFQR | ENSG00000082805.15 | 6
SEQ ID NO: 842 | VECEPSWQPFQGHCYR | ENSG00000011028.9 | 6
SEQ ID NO: 843 | VRFTPVVCGLR | ENSG00000090006.13 | 6
SEQ ID NO: 844 | VSLSQPR | ENSG00000090006.13 | 6
SEQ ID NO: 845 | AAEGYTQFYYVDVLDGK | ENSG00000205277.5 | 5
SEQ ID NO: 846 | AALEEVEGDVAELELK | ENSG00000114331.8 | 5
SEQ ID NO: 847 | AEEFGNETWGVTK | ENSG00000179218.9 | 5
SEQ ID NO: 848 | AFEDWLNDDLGSYQGAQGNR | ENSG00000101199.8 | 5
SEQ ID NO: 849 | ATQEWLEK | ENSG00000137497.13 | 5
SEQ ID NO: 850 | CSQFCTTGMDGGMSIWDVK | ENSG00000130429.8 | 5
SEQ ID NO: 851 | DQLVIPDGQEEEQEAAGEGR | ENSG00000135052.12 | 5
SEQ ID NO: 852 | EAQEAEAFALYHK | ENSG00000099991.12 | 5
SEQ ID NO: 853 | EGNCSGCIQDCNR | ENSG00000104450.8 | 5
SEQ ID NO: 854 | EGQIQSVVTYDLALDSGRPHSR | ENSG00000169896.12 | 5
SEQ ID NO: 855 | EIDAALQKK | ENSG00000162614.14 | 5
SEQ ID NO: 856 | ERFQNLDKK | ENSG00000130429.8 | 5
SEQ ID NO: 857 | ETQPPDLPTTALGGCPSDWIQFLNK | ENSG00000011028.9 | 5
SEQ ID NO: 858 | FREFLESQEDYDPCWSLQEK | ENSG00000101199.8 | 5
SEQ ID NO: 859 | GGTAAGAGLDSLHK | ENSG00000130429.8 | 5
SEQ ID NO: 860 | GLNPGTLNILVR | ENSG00000152894.10 | 5
SEQ ID NO: 861 | GQLAPVFQR | ENSG00000213380.9 | 5
SEQ ID NO: 862 | GSAASTCILTIESK | ENSG00000162614.14 | 5
SEQ ID NO: 863 | ICGVEDAVSEMTR | ENSG00000146733.9 | 5
SEQ ID NO: 864 | IITEGFEAAKEK | ENSG00000146731.6 | 5
SEQ ID NO: 865 | ILKDIANR | ENSG00000067704.8 | 5
SEQ ID NO: 866 | IQDLEHHLGLALNEVQAAK | ENSG00000011454.12 | 5
SEQ ID NO: 867 | IVDAVIEQVK | ENSG00000170776.15 | 5
SEQ ID NO: 868 | KVNVLQK | ENSG00000082805.15 | 5
SEQ ID NO: 869 | LLLQCQVSSDPPATIIWTLNGK | ENSG00000065534.14 | 5
SEQ ID NO: 870 | LSFEEMER | ENSG00000162614.14 | 5
SEQ ID NO: 871 | LSPIPAVPASVPLQAWHPAK | ENSG00000104450.8 | 5
SEQ ID NO: 872 | NQDNEDEWPLAEILSVK | ENSG00000172977.8 | 5
SEQ ID NO: 873 | PTTLTDEEINR | ENSG00000100714.11 | 5
SEQ ID NO: 874 | QIIEDQSGHYIWVPSPEKL | ENSG00000082458.7 | 5
SEQ ID NO: 875 | QIQESEHMK | ENSG00000065534.14 | 5
SEQ ID NO: 876 | RDFGSFDK | ENSG00000112096.12 | 5
SEQ ID NO: 877 | RPQLEELITAAQNLK | ENSG00000198947.10 | 5
SEQ ID NO: 878 | RPYWCISR | ENSG00000067704.8 | 5
SEQ ID NO: 879 | SEESTASHSSQDATGTIVLPAR | ENSG00000205277.5 | 5
SEQ ID NO: 880 | SEESTASHSSQDATGTIVLPAR | ENSG00000205277.5 | 5
SEQ ID NO: 881 | SEESTASHSSQDATGTIVLPAR | ENSG00000205277.5 | 5
SEQ ID NO: 882 | SEESTASHSSQDATGTIVLPAR | ENSG00000205277.5 | 5
SEQ ID NO: 883 | SGTIFDNFLITNDEAY | ENSG00000179218.9 | 5
SEQ ID NO: 884 | SQDADSPGSSGAPENLTFK | ENSG00000130396.16 | 5
SEQ ID NO: 885 | TCYPLESRPSLSLGTITDEEMK | ENSG00000137497.13 | 5
SEQ ID NO: 886 | TGLFTPDMAFETIVK | ENSG00000106976.14 | 5
SEQ ID NO: 887 | VATEAEFSPEDSPSVR | ENSG00000155629.10 | 5
SEQ ID NO: 888 | VPPPCDLGR | ENSG00000090006.13 | 5
SEQ ID NO: 889 | VVSNFILQALQGEPLTVYGSGSQTR | ENSG00000115652.10 | 5
SEQ ID NO: 890 | AAIVFTDGR | ENSG00000132561.9 | 4
SEQ ID NO: 891 | AGKGEVTFEDVK | ENSG00000004864.9 | 4
SEQ ID NO: 892 | AIDLEIK | ENSG00000162614.14 | 4
SEQ ID NO: 893 | AIEEELQEIASEPTNK | ENSG00000132561.9 | 4
SEQ ID NO: 894 | ASFITPVPGGVGPMTVAMLMQSTVESAKR | ENSG00000100714.11 | 4
SEQ ID NO: 895 | CAVVSSAGSLK | ENSG00000073849.10 | 4
SEQ ID NO: 896 | CHYYANK | ENSG00000134871.13 | 4
SEQ ID NO: 897 | CLTALPYICK | ENSG00000011028.9 | 4
SEQ ID NO: 898 | DEELPTLLHFAAK | ENSG00000155629.10 | 4
SEQ ID NO: 899 | DKVMPLIIQGFK | ENSG00000086475.10 | 4
SEQ ID NO: 900 | DKVVALAEGR | ENSG00000101199.8 | 4
SEQ ID NO: 901 | DQVFGSNLANLCQR | ENSG00000165322.13 | 4
SEQ ID NO: 902 | DVFNVEDQKR | ENSG00000135052.12 | 4
SEQ ID NO: 903 | EAELEYNPEHVSR | ENSG00000067704.8 | 4
SEQ ID NO: 904 | EATDVIIIHSK | ENSG00000166825.9 | 4
SEQ ID NO: 905 | EQYDVPQEWR | ENSG00000205277.5 | 4
SEQ ID NO: 906 | ESPQDSAITR | ENSG00000011454.12 | 4
SEQ ID NO: 907 | EVVLQWFTENSK | ENSG00000166825.9 | 4
SEQ ID NO: 908 | EYFTFPASK | ENSG00000130396.16 | 4
SEQ ID NO: 909 | FFDSACTMGAYHPLLYEK | ENSG00000073849.10 | 4
SEQ ID NO: 910 | FGSFDKFK | ENSG00000112096.12 | 4
SEQ ID NO: 911 | FIEAGQFNDNLYGTSIQSVR | ENSG00000082458.7 | 4
SEQ ID NO: 912 | FIPGSALNGMVEMMDR | ENSG00000067704.8 | 4
SEQ ID NO: 913 | GHLQIAACPNQD | ENSG00000112096.12 | 4
SEQ ID NO: 914 | GSWQPVGDLLIDSLQDHLEK | ENSG00000198947.10 | 4
SEQ ID NO: 915 | HVVPGVER | ENSG00000130589.12 | 4
SEQ ID NO: 916 | IDYGTGHEAAFAAFLCCLCK | ENSG00000119383.15 | 4
SEQ ID NO: 917 | IVGNGSEQQLQK | ENSG00000011454.12 | 4
SEQ ID NO: 918 | KESEETIIQTDEDVPGPVPVK | ENSG00000152894.10 | 4
SEQ ID NO: 919 | LEPAGPACPEGGR | ENSG00000213380.9 | 4
SEQ ID NO: 920 | LETLTNQFSDSK | ENSG00000082805.15 | 4
SEQ ID NO: 921 | LFSGSQVR | ENSG00000059691.7 | 4
SEQ ID NO: 922 | LLEILK | ENSG00000082805.15 | 4
SEQ ID NO: 923 | LLQQFPLDLEK | ENSG00000198947.10 | 4
SEQ ID NO: 924 | LLTESVNSVIAQAPPVAQEALKK | ENSG00000198947.10 | 4
SEQ ID NO: 925 | LPVEDKIR | ENSG00000100714.11 | 4
SEQ ID NO: 926 | LPYGGQCR | ENSG00000172037.9 | 4
SEQ ID NO: 927 | LSTAITLLPLEEGR | ENSG00000019144.12 | 4
SEQ ID NO: 928 | LTASSTCGLNGPQPYCIVSHLQDEKK | ENSG00000172037.9 | 4
SEQ ID NO: 929 | LVTPHGESEQIGVIPSK | ENSG00000082458.7 | 4
SEQ ID NO: 930 | NAEVRPPFTYASLIR | ENSG00000114861.14 | 4
SEQ ID NO: 931 | PAETLKPMGNAKPDENLK | ENSG00000065534.14 | 4
SEQ ID NO: 932 | PGGAGPCATVSVFPGAR | ENSG00000142453.7 | 4
SEQ ID NO: 933 | QELNTIASKPPR | ENSG00000169896.12 | 4
SEQ ID NO: 934 | RFSTEYELQQLEQFKK | ENSG00000166825.9 | 4
SEQ ID NO: 935 | RVPPPCAPGR | ENSG00000090006.13 | 4
SEQ ID NO: 936 | SCHAGFGSPAGWDVPVGALIQR | ENSG00000163975.7 | 4
SEQ ID NO: 937 | SFGHFPGPEFLDVEK | ENSG00000165322.13 | 4
SEQ ID NO: 938 | SITEVGEALK | ENSG00000198947.10 | 4
SEQ ID NO: 939 | SLQADTTNTDTALTTLEEALAEKER | ENSG00000082805.15 | 4
SEQ ID NO: 940 | SSNLLDLKNPFFR | ENSG00000142453.7 | 4
SEQ ID NO: 941 | TGYAFVDCPDESWALK | ENSG00000136231.9 | 4
SEQ ID NO: 942 | TQVTFFFPLDLSYR | ENSG00000169896.12 | 4
SEQ ID NO: 943 | TSKDDLLLTDFEGALK | ENSG00000011454.12 | 4
SEQ ID NO: 944 | TVTINTEQK | ENSG00000065534.14 | 4
SEQ ID NO: 945 | VADLLQHINLMK | ENSG00000152894.10 | 4
SEQ ID NO: 946 | VDANISVHHPGEPLGVR | ENSG00000059691.7 | 4
SEQ ID NO: 947 | VMVGDLEDINEMIIK | ENSG00000198947.10 | 4
SEQ ID NO: 948 | VVGDVAYDEAKER | ENSG00000100714.11 | 4
SEQ ID NO: 949 | VYLLYR | ENSG00000167770.7 | 4
SEQ ID NO: 950 | WANGLSEEKPLSVPR | ENSG00000064545.10 | 4
SEQ ID NO: 951 | WAPNENK | ENSG00000130429.8 | 4
SEQ ID NO: 952 | WCVLSTPEIQK | ENSG00000163975.7 | 4
SEQ ID NO: 953 | WMDPEGEMKPGR | ENSG00000113387.7 | 4
SEQ ID NO: 954 | WVLLQDILLK | ENSG00000198947.10 | 4
SEQ ID NO: 955 | YEEQRPSLK | ENSG00000162614.14 | 4
SEQ ID NO: 956 | YGLLNVTK | ENSG00000165322.13 | 4
SEQ ID NO: 957 | YQHIGLVAMFR | ENSG00000169896.12 | 4
SEQ ID NO: 958 | YVPAIAHLIHSLNPVR | ENSG00000106066.9 | 4
SEQ ID NO: 959 | AAILQTEVDALR | ENSG00000082805.15 | 3
SEQ ID NO: 960 | ADGGPEAGELPSIGEATAALALAGR | ENSG00000019144.12 | 3
SEQ ID NO: 961 | AENYWWR | ENSG00000061938.12 | 3
SEQ ID NO: 962 | AEQPPHLTPGIR | ENSG00000146733.9 | 3
SEQ ID NO: 963 | AIEALSGK | ENSG00000136231.9 | 3
SEQ ID NO: 964 | AIGNIELGIR | ENSG00000131711.10 | 3
SEQ ID NO: 965 | AMNNSWHPECFR | ENSG00000169756.12 | 3
SEQ ID NO: 966 | APNLSSGNVSLK | ENSG00000155629.10 | 3
SEQ ID NO: 967 | AQVAHADQQLR | ENSG00000137497.13 | 3
SEQ ID NO: 968 | AREHFGTVK | ENSG00000211460.7 | 3
SEQ ID NO: 969 | ARFEQMAKAREE | ENSG00000162614.14 | 3
SEQ ID NO: 970 | ASFANEDGQVSPGSLLLAGAIAGMPAASLVTPADVIK | ENSG00000004864.9 | 3
SEQ ID NO: 971 | AVVVGFDPHFSYMK | ENSG00000184207.8 | 3
SEQ ID NO: 972 | DDLLLTDFEGALK | ENSG00000011454.12 | 3
SEQ ID NO: 973 | DNEETGFGSGTR | ENSG00000166825.9 | 3
SEQ ID NO: 974 | DVDGLTSINAGK | ENSG00000100714.11 | 3
SEQ ID NO: 975 | EAGIQPSLLCVR | ENSG00000163975.7 | 3
SEQ ID NO: 976 | EDFNSKHMANQRALGK | ENSG00000172037.9 | 3
SEQ ID NO: 977 | EEGDLGPVYGFQWR | ENSG00000176890.11 | 3
SEQ ID NO: 978 | EELSSGDSLSPDPWK | ENSG00000130396.16 | 3
SEQ ID NO: 979 | ELQKAVEEMK | ENSG00000198947.10 | 3
SEQ ID NO: 980 | ENSMLREEMHRRFENAPDSAKTK | ENSG00000082805.15 | 3
SEQ ID NO: 981 | EQISDIDDAVRK | ENSG00000113387.7 | 3
SEQ ID NO: 982 | EVVDAGLVGLER | ENSG00000138162.13 | 3
SEQ ID NO: 983 | FEALQAPACHENMVK | ENSG00000196961.8 | 3
SEQ ID NO: 984 | FHLCSVATR | ENSG00000196961.8 | 3
SEQ ID NO: 985 | FNLDTENAMTFQENAR | ENSG00000169896.12 | 3
SEQ ID NO: 986 | FTEEIPLK | ENSG00000136231.9 | 3
SEQ ID NO: 987 | GALTSTPYSPTQHLER | ENSG00000153310.14 | 3
SEQ ID NO: 988 | GDEGPIGHQGPIGQEGAPGR | ENSG00000134871.13 | 3
SEQ ID NO: 989 | GDSGQPLFLTPYIEAGK | ENSG00000106066.9 | 3
SEQ ID NO: 990 | GEPVSAEDLGVSGALTVLMK | ENSG00000100714.11 | 3
SEQ ID NO: 991 | GFSGIFPACHPCHACFGDWDR | ENSG00000172037.9 | 3
SEQ ID NO: 992 | GIDTPQCHR | ENSG00000172037.9 | 3
SEQ ID NO: 993 | GWDSSHEDDLPVYLAR | ENSG00000113657.8 | 3
SEQ ID NO: 994 | HEQNIDCGGGYV | ENSG00000179218.9 | 3
SEQ ID NO: 995 | HLNQGTDEDIYLLGK | ENSG00000073849.10 | 3
SEQ ID NO: 996 | IAELQQR | ENSG00000137497.13 | 3
SEQ ID NO: 997 | ILVVITDGEK | ENSG00000169896.12 | 3
SEQ ID NO: 998 | INDAFNLASAHK | ENSG00000166825.9 | 3
SEQ ID NO: 999 | INLPAPNPDHVGGYK | ENSG00000004864.9 | 3
SEQ ID NO: 1000 | IQEILTQVK | ENSG00000136231.9 | 3
SEQ ID NO: 1001 | IQPTTPSEPTAIK | ENSG00000198947.10 | 3
SEQ ID NO: 1002 | ISPGSTEITTLPGSTTTPGLSEASTTFYSSPR | ENSG00000205277.5 | 3
SEQ ID NO: 1003 | ISPGSTEITTLPGSTTTPGLSEASTTFYSSPR | ENSG00000205277.5 | 3
SEQ ID NO: 1004 | ISPGSTEITTLPGSTTTPGLSEASTTFYSSPR | ENSG00000205277.5 | 3
SEQ ID NO: 1005 | ISPGSTEITTLPGSTTTPGLSEASTTFYSSPR | ENSG00000205277.5 | 3
SEQ ID NO: 1006 | ISSMERGLR | ENSG00000082805.15 | 3
SEQ ID NO: 1007 | IVLDVGCGSGILSFFAAQAGAR | ENSG00000142453.7 | 3
SEQ ID NO: 1008 | IYGADDIELLPEAQHKAEVYTK | ENSG00000100714.11 | 3
SEQ ID NO: 1009 | KDVKLDK | ENSG00000170776.15 | 3
SEQ ID NO: 1010 | KFQETEQTIQK | ENSG00000132205.6 | 3
SEQ ID NO: 1011 | KFSYDLSQCINQMK | ENSG00000135052.12 | 3
SEQ ID NO: 1012 | KLPAENGSSSAETLNAK | ENSG00000065534.14 | 3
SEQ ID NO: 1013 | KLTELENELNTK | ENSG00000130396.16 | 3
SEQ ID NO: 1014 | KQTENPK | ENSG00000198947.10 | 3
SEQ ID NO: 1015 | KQVTPLFIHFR | ENSG00000166825.9 | 3
SEQ ID NO: 1016 | KRVEDAYILTCNVSLEYEK | ENSG00000146731.6 | 3
SEQ ID NO: 1017 | KVPFAWCAPESLK | ENSG00000061938.12 | 3
SEQ ID NO: 1018 | LAGAPAPK | ENSG00000184207.8 | 3
SEQ ID NO: 1019 | LHELYEKVFSRRADR | ENSG00000032444.11 | 3
SEQ ID NO: 1020 | LLDPEDVDTTYPDKK | ENSG00000198947.10 | 3
SEQ ID NO: 1021 | LLESLQENHFQEDEQFLGAVMPR | ENSG00000086475.10 | 3
SEQ ID NO: 1022 | LLQVAVEDR | ENSG00000198947.10 | 3
SEQ ID NO: 1023 | LLVSDIQTIQPSLNSVNEGGQK | ENSG00000198947.10 | 3
SEQ ID NO: 1024 | LNLHSADWQR | ENSG00000198947.10 | 3
SEQ ID NO: 1025 | LPAENGSSSAETLNAK | ENSG00000065534.14 | 3
SEQ ID NO: 1026 | LPLEDADIIK | ENSG00000110237.3 | 3
SEQ ID NO: 1027 | LPLQMALTELETLAEK | ENSG00000104728.11 | 3
SEQ ID NO: 1028 | LPTEWNVLGTDQSLHDAGPR | ENSG00000170776.15 | 3
SEQ ID NO: 1029 | LQEALSQLDFQWEK | ENSG00000198947.10 | 3
SEQ ID NO: 1030 | LQEPSAQANCCDSEKNGDIGQQIK | ENSG00000132205.6 | 3
SEQ ID NO: 1031 | LQSQVISELDACKECTQGVQR | ENSG00000132205.6 | 3
SEQ ID NO: 1032 | LYIGNLSENAAPSDLESIFK | ENSG00000136231.9 | 3
SEQ ID NO: 1033 | MLESYLHAK | ENSG00000142453.7 | 3
SEQ ID NO: 1034 | NLLLATR | ENSG00000061938.12 | 3
SEQ ID NO: 1035 | NVLLHEMQIQHPTASLIAK | ENSG00000146731.6 | 3
SEQ ID NO: 1036 | QKPCDLPLR | ENSG00000136231.9 | 3
SEQ ID NO: 1037 | QPAAFIVTQYPLPNTVK | ENSG00000152894.10 | 3
SEQ ID NO: 1038 | QQLGHIEAWAEK | ENSG00000130396.16 | 3
SEQ ID NO: 1039 | QREEHYFCK | ENSG00000133315.6 | 3
SEQ ID NO: 1040 | QVFHALEDELQK | ENSG00000151914.13 | 3
SEQ ID NO: 1041 | QWMENPNNNPIHPNLR | ENSG00000166825.9 | 3
SEQ ID NO: 1042 | SAQALVEQMVNEGVNADSIK | ENSG00000198947.10 | 3
SEQ ID NO: 1043 | SATSVLVGEPTTSPISSGSTETTALPGSTTTAGLSEK | ENSG00000205277.5 | 3
SEQ ID NO: 1044 | SATSVLVGEPTTSPISSGSTETTALPGSTTTAGLSEK | ENSG00000205277.5 | 3
SEQ ID NO: 1045 | SATSVLVGEPTTSPISSGSTETTALPGSTTTAGLSEK | ENSG00000205277.5 | 3
SEQ ID NO: 1046 | SAVEGMPSNLDSEVAWGK | ENSG00000198947.10 | 3
SEQ ID NO: 1047 | SEDSTIYDLLKDPVSLR | ENSG00000104728.11 | 3
SEQ ID NO: 1048 | SLESALKDLK | ENSG00000130429.8 | 3
SEQ ID NO: 1049 | SPNPALTFCVK | ENSG00000019144.12 | 3
SEQ ID NO: 1050 | STTFYTSPR | ENSG00000205277.5 | 3
SEQ ID NO: 1051 | STTFYTSPR | ENSG00000205277.5 | 3
SEQ ID NO: 1052 | STTFYTSPR | ENSG00000205277.5 | 3
SEQ ID NO: 1053 | STTFYTSPR | ENSG00000205277.5 | 3
SEQ ID NO: 1054 | TCHYYANK | ENSG00000134871.13 | 3
SEQ ID NO: 1055 | TCSECQELHWGDPGLQCHACDCDSR | ENSG00000172037.9 | 3
SEQ ID NO: 1056 | TCYPLESR | ENSG00000137497.13 | 3
SEQ ID NO: 1057 | TEFQLELPVK | ENSG00000169896.12 | 3
SEQ ID NO: 1058 | TKEPVIMSTLETVR | ENSG00000198947.10 | 3
SEQ ID NO: 1059 | TPLWIGLAGEEGSRR | ENSG00000011028.9 | 3
SEQ ID NO: 1060 | TQSLNPAPFSPLTAQQMKPEKPSTLQRPQETVIR | ENSG00000130396.16 | 3
SEQ ID NO: 1061 | TVGWNVPVGYLVESGR | ENSG00000163975.7 | 3
SEQ ID NO: 1062 | VASSSSGNNFLSGSPASPMGDILQTPQFQMR | ENSG00000137497.13 | 3
SEQ ID NO: 1063 | VAWVSHDSTVCLADADK | ENSG00000130429.8 | 3
SEQ ID NO: 1064 | VEQQPDYR | ENSG00000130396.16 | 3
SEQ ID NO: 1065 | VIQEVSGLPSEGASEGNQYTPDAQR | ENSG00000169129.10 | 3
SEQ ID NO: 1066 | VLDLLDPASGDLVIR | ENSG00000079616.8 | 3
SEQ ID NO: 1067 | VLLHEMQIQHPTASLIAK | ENSG00000146731.6 | 3
SEQ ID NO: 1068 | VMDKVTSDETR | ENSG00000138162.13 | 3
SEQ ID NO: 1069 | VPRYELLLK | ENSG00000127084.13 | 3
SEQ ID NO: 1070 | VQFGASHVFK | ENSG00000130396.16 | 3
SEQ ID NO: 1071 | VSCIVSAAK | ENSG00000169129.10 | 3
SEQ ID NO: 1072 | VTEILGIEPDREK | ENSG00000211460.7 | 3
SEQ ID NO: 1073 | VVDALNQGLPR | ENSG00000079616.8 | 3
SEQ ID NO: 1074 | WKTPAAIPATPVAVSQPIR | ENSG00000130396.16 | 3
SEQ ID NO: 1075 | YLETADYAIREEIVLK | ENSG00000196961.8 | 3
SEQ ID NO: 1076 | YLNWESDQPDNPSEENCGVIR | ENSG00000011028.9 | 3
SEQ ID NO: 1077 | YVGFGNTPPPQKK | ENSG00000101199.8 | 3
SEQ ID NO: 1078 | AAGNFATK | ENSG00000130396.16 | 2
SEQ ID NO: 1079 | AEGERQPPPDSSEEAPPATQNFIIPK | ENSG00000119383.15 | 2
SEQ ID NO: 1080 | AGLVVEDALFETLPSDVR | ENSG00000171488.10 | 2
SEQ ID NO: 1081 | AHCGDPVSLAAAGDGSPDIGPTGELSGSLK | ENSG00000127084.13 | 2
SEQ ID NO: 1082 | AILQNHTDFKDK | ENSG00000142453.7 | 2
SEQ ID NO: 1083 | AINVYGTSEPSQESELTTVGEKPEEPK | ENSG00000065534.14 | 2
SEQ ID NO: 1084 | ALGEDQVAETSAMSDVLKDILK | ENSG00000157617.12 | 2
SEQ ID NO: 1085 | ANIVMVLEIVSGGELFER | ENSG00000065534.14 | 2
SEQ ID NO: 1086 | APEEQGLLPNGEPSQHSSAPQK | ENSG00000169129.10 | 2
SEQ ID NO: 1087 | APGLGVLSPSGEER | ENSG00000065534.14 | 2
SEQ ID NO: 1088 | AQDDVSEWASK | ENSG00000132561.9 | 2
SEQ ID NO: 1089 | ASSISEEVAVGSIAATLK | ENSG00000170776.15 | 2
SEQ ID NO: 1090 | ATLALDSVLTEEGK | ENSG00000170776.15 | 2
SEQ ID NO: 1091 | AVGGDRQEAIQPGCIGGPKGLPGLPGPPGPTGAKGLRGIPGFAGADGGP | ENSG00000134871.13 | 2
SEQ ID NO: 1092 | AVGLVSTWTQR | ENSG00000127084.13 | 2
SEQ ID NO: 1093 | AVSSADPR | ENSG00000138162.13 | 2
SEQ ID NO: 1094 | AWHAFFTAAER | ENSG00000165912.11 | 2
SEQ ID NO: 1095 | DCTQCLQHPWLMK | ENSG00000065534.14 | 2
SEQ ID NO: 1096 | DEISDDAKDFISNLLK | ENSG00000065534.14 | 2
SEQ ID NO: 1097 | DFGPASQHFLSTSVQGPWER | ENSG00000198947.10 | 2
SEQ ID NO: 1098 | DFLDSLGFSTR | ENSG00000176890.11 | 2
SEQ ID NO: 1099 | DGEWEPPVIQNPEYK | ENSG00000179218.9 | 2
SEQ ID NO: 1100 | DTSPAPSGTTSAFVK | ENSG00000205277.5 | 2
SEQ ID NO: 1101 | EAEDRARQEEERR | ENSG00000130396.16 | 2
SEQ ID NO: 1102 | EAPYGAPR | ENSG00000090006.13 | 2
SEQ ID NO: 1103 | ECAIYTNR | ENSG00000104450.8 | 2
SEQ ID NO: 1104 | EGIVALRR | ENSG00000146731.6 | 2
SEQ ID NO: 1105 | EGPYTVDAIQK | ENSG00000198947.10 | 2
SEQ ID NO: 1106 | EKELQTIFDTLPPMR | ENSG00000198947.10 | 2
SEQ ID NO: 1107 | ELEQQLQESAR | ENSG00000019144.12 | 2
SEQ ID NO: 1108 | EQLDKIQSSHNFQLESVNK | ENSG00000135052.12 | 2
SEQ ID NO: 1109 | EVTKEEFVLAAQK | ENSG00000004864.9 | 2
SEQ ID NO: 1110 | EVVPGDSVNSLLSILDVITGHQHPQR | ENSG00000032444.11 | 2
SEQ ID NO: 1111 | EYWMDPEGEMKPGRK | ENSG00000113387.7 | 2
SEQ ID NO: 1112 | FGFSHLEALLDDSK | ENSG00000167770.7 | 2
SEQ ID NO: 1113 | FGSQASQK | ENSG00000101199.8 | 2
SEQ ID NO: 1114 | FHELTQTDK | ENSG00000100714.11 | 2
SEQ ID NO: 1115 | FLDLGISIAENR | ENSG00000125826.15 | 2
SEQ ID NO: 1116 | FLLDCGIR | ENSG00000065534.14 | 2
SEQ ID NO: 1117 | FVDPSQDHALAK | ENSG00000130396.16 | 2
SEQ ID NO: 1118 | FYGDEEK | ENSG00000179218.9 | 2
SEQ ID NO: 1119 | GAWLGMNFNPK | ENSG00000011028.9 | 2
SEQ ID NO: 1120 | GILVFQLK | ENSG00000130396.16 | 2
SEQ ID NO: 1121 | GISLNPEQWSQL | ENSG00000113387.7 | 2
SEQ ID NO: 1122 | GLYLPLFKPSVSTSK | ENSG00000004864.9 | 2
SEQ ID NO: 1123 | GMEDLIPLVNR | ENSG00000106976.14 | 2
SEQ ID NO: 1124 | GPIGHQGPIGQEGAPGR | ENSG00000134871.13 | 2
SEQ ID NO: 1125 | GPNKHTLTQIK | ENSG00000146731.6 | 2
SEQ ID NO: 1126 | GPTCNEFTGQCHCR | ENSG00000172037.9 | 2
SEQ ID NO: 1127 | GSEGEPGIR | ENSG00000134871.13 | 2
SEQ ID NO: 1128 | GTDVREPDDSPQGR | ENSG00000011028.9 | 2
SEQ ID NO: 1129 | GWAGDSGPQGR | ENSG00000134871.13 | 2
SEQ ID NO: 1130 | HAQEELPPPPPQKK | ENSG00000198947.10 | 2
SEQ ID NO: 1131 | HSTVLENTDGK | ENSG00000163975.7 | 2
SEQ ID NO: 1132 | IEELEEALR | ENSG00000082805.15 | 2
SEQ ID NO: 1133 | IEGSGDQIDTYELSGGAR | ENSG00000106976.14 | 2
SEQ ID NO: 1134 | IELHGKPIEVEHSVPK | ENSG00000136231.9 | 2
SEQ ID NO: 1135 | IIDEDFELTERECIK | ENSG00000065534.14 | 2
SEQ ID NO: 1136 | IKLIDFGLAR | ENSG00000065534.14 | 2
SEQ ID NO: 1137 | ILDLLNEGSAR | ENSG00000079616.8 | 2
SEQ ID NO: 1138 | ILMELDGPNWR | ENSG00000104450.8 | 2
SEQ ID NO: 1139 | IPQAVVDVSSHLQK | ENSG00000171488.10 | 2
SEQ ID NO: 1140 | IQAEQVDAVTLSGEDIYTAGK | ENSG00000163975.7 | 2
SEQ ID NO: 1141 | IVIYVQQTTNK | ENSG00000011454.12 | 2
SEQ ID NO: 1142 | IVSEFDYVEK | ENSG00000166825.9 | 2
SEQ ID NO: 1143 | KADTLPR | ENSG00000049323.11 | 2
SEQ ID NO: 1144 | KINQLSEENGDLSFK | ENSG00000137497.13 | 2
SEQ ID NO: 1145 | KIQEILTQVK | ENSG00000136231.9 | 2
SEQ ID NO: 1146 | KKLPAENGSSSAETLNAK | ENSG00000065534.14 | 2
SEQ ID NO: 1147 | KLLLQCQVSSDPPATIIWTLNGK | ENSG00000065534.14 | 2
SEQ ID NO: 1148 | KPAAGLSAAPVPTAPAAGAPL | ENSG00000115310.13 | 2
SEQ ID NO: 1149 | KSPSSDSWTCADTSTER | ENSG00000101199.8 | 2
SEQ ID NO: 1150 | KSSTGSPTSPLNAEK | ENSG00000065534.14 | 2
SEQ ID NO: 1151 | LALLNEK | ENSG00000137497.13 | 2
SEQ ID NO: 1152 | LDIDEK | ENSG00000130396.16 | 2
SEQ ID NO: 1153 | LIAPLEGYTR | ENSG00000167608.7 | 2
SEQ ID NO: 1154 | LKEEEEDKK | ENSG00000179218.9 | 2
SEQ ID NO: 1155 | LKNQVTQLKEQVPGFTPR | ENSG00000100714.11 | 2
SEQ ID NO: 1156 | LLDPQTNTEIANYPIYK | ENSG00000011454.12 | 2
SEQ ID NO: 1157 | LLDRLPSFQQSCR | ENSG00000213380.9 | 2
SEQ ID NO: 1158 | LLEAIKR | ENSG00000112096.12 | 2
SEQ ID NO: 1159 | LLGFGSALLDNVDPNPENFVGAGIIQTK | ENSG00000196961.8 | 2
SEQ ID NO: 1160 | LQAQLNELQAQLSQKEQAAEHYK | ENSG00000137497.13 | 2
SEQ ID NO: 1161 | LQDVHVAEGKK | ENSG00000065534.14 | 2
SEQ ID NO: 1162 | LQGEVLALEEER | ENSG00000019144.12 | 2
SEQ ID NO: 1163 | LSALHLEVR | ENSG00000165912.11 | 2
SEQ ID NO: 1164 | LSSQLVEHCQK | ENSG00000198947.10 | 2
SEQ ID NO: 1165 | LSVMGCDVLK | ENSG00000163975.7 | 2
SEQ ID NO: 1166 | LTAASVGVQGSGWGWLGFNKER | ENSG00000112096.12 | 2
SEQ ID NO: 1167 | LTDVAIGAPGEEDNR | ENSG00000169896.12 | 2
SEQ ID NO: 1168 | LTHGVLHTK | ENSG00000105223.14 | 2
SEQ ID NO: 1169 | LVTDPDSGLCSHYWGAIIR | ENSG00000130396.16 | 2
SEQ ID NO: 1170 | MDPEGEMKPGR | ENSG00000113387.7 | 2
SEQ ID NO: 1171 | MELLVK | ENSG00000145362.12 | 2
SEQ ID NO: 1172 | MVSMMEGVIQK | ENSG00000130396.16 | 2
SEQ ID NO: 1173 | MVVASSK | ENSG00000100714.11 | 2
SEQ ID NO: 1174 | NDAGQAECSCQVTVDDAPASENTKAPEMK | ENSG00000065534.14 | 2
SEQ ID NO: 1175 | NILSEFQR | ENSG00000198947.10 | 2
SEQ ID NO: 1176 | NLLEVSEVEQELACQNDHSSALQNIK | ENSG00000136631.8 | 2
SEQ ID NO: 1177 | NLVDSYMAIVNK | ENSG00000106976.14 | 2
SEQ ID NO: 1178 | NVNVFFPHFK | ENSG00000151116.12 | 2
SEQ ID NO: 1179 | PASAEQIQHLAGAIAER | ENSG00000172037.9 | 2
SEQ ID NO: 1180 | PAVPASVPLQAWHPAK | ENSG00000104450.8 | 2
SEQ ID NO: 1181 | PFSAIYFPCYAHVK | ENSG00000004864.9 | 2
SEQ ID NO: 1182 | PGPVPAHSLCGHLVPK | ENSG00000172037.9 | 2
SEQ ID NO: 1183 | PLQGTTGLIPLLGIDVWEHAYYLQYK | ENSG00000112096.12 | 2
SEQ ID NO: 1184 | PNENKFAVGSGSR | ENSG00000130429.8 | 2
SEQ ID NO: 1185 | PPVQFSLLHSK | ENSG00000196961.8 | 2
SEQ ID NO: 1186 | QAPIGGDFPAVQK | ENSG00000198947.10 | 2
SEQ ID NO: 1187 | QKLQDVHVAEGK | ENSG00000065534.14 | 2
SEQ ID NO: 1188 | QLAAYIADKVDAAQMPQEAQK | ENSG00000198947.10 | 2
SEQ ID NO: 1189 | QLSESSKLK | ENSG00000157617.12 | 2
SEQ ID NO: 1190 | QQTANKVEIEK | ENSG00000011454.12 | 2
SEQ ID NO: 1191 | QSSSSRDDNMFQIGK | ENSG00000113387.7 | 2
SEQ ID NO: 1192 | QYTYGLVSCGLDR | ENSG00000004139.9 | 2
SEQ ID NO: 1193 | RAGNSLAASTAEETAGSAQGR | ENSG00000172037.9 | 2
SEQ ID NO: 1194 | REAPYGAPR | ENSG00000090006.13 | 2
SEQ ID NO: 1195 | REPAPNAPGDIAAAFPAER | ENSG00000138162.13 | 2
SEQ ID NO: 1196 | RGWDSSHEDDLPVYLAR | ENSG00000113657.8 | 2
SEQ ID NO: 1197 | RLEEESAQLK | ENSG00000011454.12 | 2
SEQ ID NO: 1198 | RQVEKEETNEIQVVNEEPQR | ENSG00000135052.12 | 2
SEQ ID NO: 1199 | RSESQGTAPAFK | ENSG00000065534.14 | 2
SEQ ID NO: 1200 | SCTEETHGFICQK | ENSG00000011028.9 | 2
SEQ ID NO: 1201 | SDFGKFVLSSGK | ENSG00000179218.9 | 2
SEQ ID NO: 1202 | SEYMEGNVR | ENSG00000166825.9 | 2
SEQ ID NO: 1203 | SFAPILPHLAEEVFQHIPYIK | ENSG00000067704.8 | 2
SEQ ID NO: 1204 | SKVPQETQSGGGSR | ENSG00000049323.11 | 2
SEQ ID NO: 1205 | SPATTLSPASTTSSGVSEESTTSHSR | ENSG00000205277.5 | 2
SEQ ID NO: 1206 | SPATTLSPASTTSSGVSEESTTSHSR | ENSG00000205277.5 | 2
SEQ ID NO: 1207 | SPATTLSPASTTSSGVSEESTTSHSRPGSTHTTAFPDSTTTPGLSR | ENSG00000205277.5 | 2
SEQ ID NO: 1208 | SPATTLSPASTTSSGVSEESTTSHSRPGSTHTTAFPDSTTTPGLSR | ENSG00000205277.5 | 2
SEQ ID NO: 1209 | SQDLQVIDLLTVGESR | ENSG00000169231.9 | 2
SEQ ID NO: 1210 | SREPQAKPQLDLSIDSLDLSCEEGTPLSITSK | ENSG00000137497.13 | 2
SEQ ID NO: 1211 | SRQELASGLPSPAATQELPVER | ENSG00000138162.13 | 2
SEQ ID NO: 1212 | SSAAAGAPSR | ENSG00000049323.11 | 2
SEQ ID NO: 1213 | SSPNVANQPPSPGGK | ENSG00000130396.16 | 2
SEQ ID NO: 1214 | SSSEVLVLAETLDGVR | ENSG00000130589.12 | 2
SEQ ID NO: 1215 | SVQEIAEQLLLENHPAR | ENSG00000151914.13 | 2
SEQ ID NO: 1216 | TCTGYHQVR | ENSG00000133316.11 | 2
SEQ ID NO: 1217 | TGETSR | ENSG00000113387.7 | 2
SEQ ID NO: 1218 | TIQNQLR | ENSG00000169896.12 | 2
SEQ ID NO: 1219 | TLFSLMQYSEEFR | ENSG00000169896.12 | 2
SEQ ID NO: 1220 | TPAPDGPR | ENSG00000032444.11 | 2
SEQ ID NO: 1221 | TPGQIVSEK | ENSG00000059691.7 | 2
SEQ ID NO: 1222 | TPVPEK | ENSG00000065534.14 | 2
SEQ ID NO: 1223 | TTLLDPDSCR | ENSG00000205277.5 | 2
SEQ ID NO: 1224 | TTTESEVMK | ENSG00000100714.11 | 2
SEQ ID NO: 1225 | TVLQIDCGLQLANDSVNR | ENSG00000104450.8 | 2
SEQ ID NO: 1226 | VAQQPLSLVGCEVVPDPSPDHLYSFR | ENSG00000169129.10 | 2
SEQ ID NO: 1227 | VHALNNVNK | ENSG00000198947.10 | 2
SEQ ID NO: 1228 | VIVMPTTK | ENSG00000067704.8 | 2
SEQ ID NO: 1229 | VLQEDLEQEQVR | ENSG00000198947.10 | 2
SEQ ID NO: 1230 | VPAHAVVVR | ENSG00000163975.7 | 2
SEQ ID NO: 1231 | WLNEVEFK | ENSG00000198947.10 | 2
SEQ ID NO: 1232 | WTDGSIINFISWAPGK | ENSG00000011028.9 | 2
SEQ ID NO: 1233 | WTDGSIINFISWAPGKPR | ENSG00000011028.9 | 2
SEQ ID NO: 1234 | WVNAQFSK | ENSG00000198947.10 | 2
SEQ ID NO: 1235 | YDNFGVLGLDLWQVK | ENSG00000179218.9 | 2
SEQ ID NO: 1236 | YLLYRPGHYDILYK | ENSG00000167770.7 | 2
SEQ ID NO: 1237 | YLSSLDLLLEHR | ENSG00000133315.6 | 2
SEQ ID NO: 1238 | YLVHCLQSELNNYMPAFLDDPEENSLQRPK | ENSG00000130396.16 | 2
SEQ ID NO: 1239 | YRDPGVLPWGALEEEEEDGGR | ENSG00000167608.7 | 2
SEQ ID NO: 1240 | AAAAAVGPGAGGAGSAVPGGAGPCATVSVFPGAR | ENSG00000142453.7 | 1
SEQ ID NO: 1241 | AAAKVALTKRADPAELR | ENSG00000004864.9 | 1
SEQ ID NO: 1242 | AAATEEPEVIPDPAK | ENSG00000152894.10 | 1
SEQ ID NO: 1243 | AAEEPQQQK | ENSG00000167770.7 | 1
SEQ ID NO: 1244 | AAGDGSPDIGPTGELSGSLKIPNR | ENSG00000127084.13 | 1
SEQ ID NO: 1245 | AAGLQAEIGQVK | ENSG00000082805.15 | 1
SEQ ID NO: 1246 | AASGVPR | ENSG00000155629.10 | 1
SEQ ID NO: 1247 | ACGNMFGLMHGTCPETSGGLLICLPR | ENSG00000086475.10 | 1
SEQ ID NO: 1248 | ADSAVSQEQLR | ENSG00000165912.11 | 1
SEQ ID NO: 1249 | AEEKPHVKPYFSK | ENSG00000065534.14 | 1
SEQ ID NO: 1250 | AELEYNPEHVSR | ENSG00000067704.8 | 1
SEQ ID NO: 1251 | AEQLLQDAR | ENSG00000172037.9 | 1
SEQ ID NO: 1252 | AEYMRIQAQQQATKPSKEMS | ENSG00000017373.11 | 1
SEQ ID NO: 1253 | AFCGLGTTGMWR | ENSG00000110237.3 | 1
SEQ ID NO: 1254 | AFLEAVAEEKPHVKPYFSK | ENSG00000065534.14 | 1
SEQ ID NO: 1255 | AHKQCALKLLR | ENSG00000141447.12 | 1
SEQ ID NO: 1256 | ALMDLLQLTR | ENSG00000079616.8 | 1
SEQ ID NO: 1257 | ALQDFEEPDK | ENSG00000061938.12 | 1
SEQ ID NO: 1258 | ALQFLEEVKVSR | ENSG00000146731.6 | 1
SEQ ID NO: 1259 | ALQHMAAMSSAQIVSATAIHNKLGLPGIPRPT | ENSG00000187079.10 | 1
SEQ ID NO: 1260 | AMAYETLEQYGK | ENSG00000104450.8 | 1
SEQ ID NO: 1261 | AMLAAVLEQELPALAENLHQEQK | ENSG00000142733.10 | 1
SEQ ID NO: 1262 | AMLAAVLEQELPALAENLHQEQK | ENSG00000142733.10 | 1
SEQ ID NO: 1263 | ANGITMYAVGVGKAIEEELQEIASEPTNK | ENSG00000132561.9 | 1
SEQ ID NO: 1264 | APAPDVPGCSR | ENSG00000172037.9 | 1
SEQ ID NO: 1265 | APILPHLAEEVFQHIPYIK | ENSG00000067704.8 | 1
SEQ ID NO: 1266 | AQALLADVDTLLFDCDGVLWR | ENSG00000184207.8 | 1
SEQ ID NO: 1267 | AQNSGFDLQETLVK | ENSG00000146731.6 | 1
SEQ ID NO: 1268 | ARFEQMAK | ENSG00000162614.14 | 1
SEQ ID NO: 1269 | ARPEAYQVPASYQPDEEER | ENSG00000125826.15 | 1
SEQ ID NO: 1270 | ARTSAGVGAWGAAAVGRTAGVR | ENSG00000133315.6 | 1
SEQ ID NO: 1271 | ASIPLKELEQFNSDIQK | ENSG00000198947.10 | 1
SEQ ID NO: 1272 | ATSCFPRPMTPRDR | ENSG00000137497.13 | 1
SEQ ID NO: 1273 | AVTSVSGPGEHLR | ENSG00000169231.9 | 1
SEQ ID NO: 1274 | CAEVVSGK | ENSG00000067704.8 | 1
SEQ ID NO: 1275 | CFGLLLSPGK | ENSG00000011454.12 | 1
SEQ ID NO: 1276 | CGDSDKGFVVINQK | ENSG00000146731.6 | 1
SEQ ID NO: 1277 | CGGLSCNGAAATADLALGR | ENSG00000172037.9 | 1
SEQ ID NO: 1278 | CLCPPDFAGK | ENSG00000090006.13 | 1
SEQ ID NO: 1279 | CLQHPWLMK | ENSG00000065534.14 | 1
SEQ ID NO: 1280 | CLVENAGDVAFVR | ENSG00000163975.7 | 1
SEQ ID NO:
CSGNIDPMDPDACDPHTGQCLR
ENSG00000172037.9
1


1281








SEQ ID NO:
CTEGPIDLVFVIDGSK
ENSG00000132561.9
1


1282








SEQ ID NO:
CTQCLQHPWLMK
ENSG00000065534.14
1


1283








SEQ ID NO:
CVRWAPNENK
ENSG00000130429.8
1


1284








SEQ ID NO:
DALLEALK
ENSG00000172037.9
1


1285








SEQ ID NO:
DCCFEISAPDKR
ENSG00000005020.8
1


1286








SEQ ID NO:
DDRTGTGTLSVFGMQARYSLR
ENSG00000176890.11
1


1287








SEQ ID NO:
DEDFELTERECIK
ENSG00000065534.14
1


1288








SEQ ID NO:
DISLQGPGLAPE
ENSG00000019144.12
1


1289








SEQ ID NO:
DITAALAAER
ENSG00000106976.14
1


1290








SEQ ID NO:
DLNVISSLLK
ENSG00000225485.3
1


1291








SEQ ID NO:
DQREPLPPAPAENEMK
ENSG00000104728.11
1


1292








SEQ ID NO:
DQSPLVSSSDSPPRPQPAFK
ENSG00000115310.13
1


1293








SEQ ID NO:
DRRGSGKPR
ENSG00000130396.16
1


1294








SEQ ID NO:
DSSHAFTLDELR
ENSG00000163975.7
1


1295








SEQ ID NO:
DWDSPYSHDLDTSADSVGNACR
ENSG00000105223.14
1


1296








EAEQLLRGPLGDQYQTVK  ENSG00000172037.9  1  SEQ ID NO: 1297
EAEVQTWLQQIGFSK  ENSG00000004139.9  1  SEQ ID NO: 1298
EDTVQSVK  ENSG00000106066.9  1  SEQ ID NO: 1299
EEAEQVLGQAR  ENSG00000198947.10  1  SEQ ID NO: 1300
EGIVALR  ENSG00000146731.6  1  SEQ ID NO: 1301
EGTEAEPLPLR  ENSG00000142733.10  1  SEQ ID NO: 1302
EGTEAEPLPLR  ENSG00000142733.10  1  SEQ ID NO: 1303
EGTPGIFQK  ENSG00000205277.5  1  SEQ ID NO: 1304
EGVIQNFK  ENSG00000130396.16  1  SEQ ID NO: 1305
EIDAALQK  ENSG00000162614.14  1  SEQ ID NO: 1306
EIHTVPDMGKWKR  ENSG00000119383.15  1  SEQ ID NO: 1307
EKLTAASVGVQGSGWGWLGFNK  ENSG00000112096.12  1  SEQ ID NO: 1308
ELEAKMLAQKAEEKENHCPTMLR  ENSG00000079616.8  1  SEQ ID NO: 1309
ELEEKDGDVQAGANLSFNR  ENSG00000158560.10  1  SEQ ID NO: 1310
ELETLTTNYQWLCTR  ENSG00000198947.10  1  SEQ ID NO: 1311
ELLLSGPPEVAAPDTPYLHVDSAAQR  ENSG00000138162.13  1  SEQ ID NO: 1312
ELQDGIGQR  ENSG00000198947.10  1  SEQ ID NO: 1313
EMSKKAPSEISRK  ENSG00000198947.10  1  SEQ ID NO: 1314
ENIRQEISIMNCLHHPK  ENSG00000065534.14  1  SEQ ID NO: 1315
EPMKAPLCGEGDQPGGFESQEK  ENSG00000138162.13  1  SEQ ID NO: 1316
EPYAREMLAISFISAVNR  ENSG00000225485.3  1  SEQ ID NO: 1317
ERARKFSGSGLAMGLGSASASAWRR  ENSG00000082458.7  1  SEQ ID NO: 1318
ERARKFSGSGLAMGLGSASASAWRR  ENSG00000082458.7  1  SEQ ID NO: 1319
ERVLSLSQALATEASQWHR  ENSG00000105559.7  1  SEQ ID NO: 1320
ESGRGSSTPPGPIAALGMPDTGPGSSSLGK  ENSG00000127084.13  1  SEQ ID NO: 1321
ESGSLEDDWDFLPPKK  ENSG00000179218.9  1  SEQ ID NO: 1322
EVARNVFECNDQVVK  ENSG00000169896.12  1  SEQ ID NO: 1323
EVPEEGPGAPAR  ENSG00000186635.10  1  SEQ ID NO: 1324
EYQEDLALR  ENSG00000125826.15  1  SEQ ID NO: 1325
FAGDSLK  ENSG00000151914.13  1  SEQ ID NO: 1326
FGPGDQVR  ENSG00000114331.8  1  SEQ ID NO: 1327
FGVLGLDLWQVK  ENSG00000179218.9  1  SEQ ID NO: 1328
FKDNPTVVVEDLR  ENSG00000114331.8  1  SEQ ID NO: 1329
FNGAPTANFQQDVGTK  ENSG00000073849.10  1  SEQ ID NO: 1330
FNHPAEAKWMK  ENSG00000019144.12  1  SEQ ID NO: 1331
FNRALNCMNLPPDK  ENSG00000184922.9  1  SEQ ID NO: 1332
FRLAEDGKR  ENSG00000132561.9  1  SEQ ID NO: 1333
FSAEALR  ENSG00000073849.10  1  SEQ ID NO: 1334
FSPEVPGQK  ENSG00000131711.10  1  SEQ ID NO: 1335
FTDFEEVR  ENSG00000106976.14  1  SEQ ID NO: 1336
FVPIIGIAMPLSSR  ENSG00000151835.9  1  SEQ ID NO: 1337
FWPAIDDGLRR  ENSG00000105223.14  1  SEQ ID NO: 1338
FWVVDQTHFYLGSANMDWR  ENSG00000105223.14  1  SEQ ID NO: 1339
GAAVDEYFRQPVVDTFDIR  ENSG00000142453.7  1  SEQ ID NO: 1340
GAFHRPVLGGFR  ENSG00000165912.11  1  SEQ ID NO: 1341
GAGLAWGVHDCQLCSER  ENSG00000090006.13  1  SEQ ID NO: 1342
GAPISAYQIVVEELHPHRT  ENSG00000152894.10  1  SEQ ID NO: 1343
GATGHPGGGQGAENPAGLKSQGNELFR  ENSG00000104450.8  1  SEQ ID NO: 1344
GCLELIKETGVPIAGR  ENSG00000100714.11  1  SEQ ID NO: 1345
GCPQEDSDIAFLIDGSGSIIPHDFR  ENSG00000169896.12  1  SEQ ID NO: 1346
GDEGPIGHQGPIGQEGAPGRPGSPGLPGMPGR  ENSG00000134871.13  1  SEQ ID NO: 1347
GDKGERGAPGVTGPK  ENSG00000134871.13  1  SEQ ID NO: 1348
GDNVLINTFSGLLK  ENSG00000142733.10  1  SEQ ID NO: 1349
GDNVLINTFSGLLK  ENSG00000142733.10  1  SEQ ID NO: 1350
GDTGNPGAPGTPGTKGWAGDSGPQGRP  ENSG00000134871.13  1  SEQ ID NO: 1351
GEFAIDGYSVR  ENSG00000005020.8  1  SEQ ID NO: 1352
GEGLYADPYGLLHEGR  ENSG00000017373.11  1  SEQ ID NO: 1353
GEIAPLKENVSHVNDLAR  ENSG00000198947.10  1  SEQ ID NO: 1354
GEWKPRQIDNPDYK  ENSG00000179218.9  1  SEQ ID NO: 1355
GGCVALATGSAMGLWEVK  ENSG00000011028.9  1  SEQ ID NO: 1356
GGHDIILAAFDNFK  ENSG00000184922.9  1  SEQ ID NO: 1357
GGSQPPDIDKTELVEPTEYLVVHLK  ENSG00000166825.9  1  SEQ ID NO: 1358
GGVSAVPGFR  ENSG00000134871.13  1  SEQ ID NO: 1359
GHLQIAACPNQDPLQGTTGLIPLLGIDVWEHAY  ENSG00000112096.12  1  SEQ ID NO: 1360
GHPDRLPLQMALTELETLAEK  ENSG00000104728.11  1  SEQ ID NO: 1361
GKEAGEVR  ENSG00000169896.12  1  SEQ ID NO: 1362
GKNVLINKDIR  ENSG00000179218.9  1  SEQ ID NO: 1363
GLCFLFGSNLR  ENSG00000169896.12  1  SEQ ID NO: 1364
GLEEAVESACAMR  ENSG00000067704.8  1  SEQ ID NO: 1365
GLGKYICQKCHAIIDEQPL  ENSG00000169756.12  1  SEQ ID NO: 1366
GNCFCYGHASECAPAPGAPAHAEGMVHGACICK  ENSG00000172037.9  1  SEQ ID NO: 1367
GPAPARPKMLVISGGDGYEDFRLSSGGGSSS  ENSG00000110237.3  1  SEQ ID NO: 1368
GPGAGSALDDGRR  ENSG00000196961.8  1  SEQ ID NO: 1369
GPPSSVPK  ENSG00000184922.9  1  SEQ ID NO: 1370
GQLQDELEKGER  ENSG00000082805.15  1  SEQ ID NO: 1371
GQTPEAGADKRSPRRASAAAAAGGGATGHPGG  ENSG00000104450.8  1  SEQ ID NO: 1372
GREPASCEDLCGGGVGADGGGSDR  ENSG00000065534.14  1  SEQ ID NO: 1373
GRISVSLQEEASGGSLAAPAR  ENSG00000032444.11  1  SEQ ID NO: 1374
GSDGMDAVRSAPTLIR  ENSG00000150672.12  1  SEQ ID NO: 1375
GSRPGIEGDTPR  ENSG00000113657.8  1  SEQ ID NO: 1376
GTISFFEIDGR  ENSG00000172977.8  1  SEQ ID NO: 1377
GTWIHPEIDNPEYSPD  ENSG00000179218.9  1  SEQ ID NO: 1378
GVTDTLAQIR  ENSG00000017373.11  1  SEQ ID NO: 1379
GWDCHGLPIEIK  ENSG00000067704.8  1  SEQ ID NO: 1380
HCELCRPFFYR  ENSG00000172037.9  1  SEQ ID NO: 1381
HFQIDYDEDGNCSLIISDVCGDDDAK  ENSG00000065534.14  1  SEQ ID NO: 1382
HGGLSLVQTTDYIYPIVDDPYMMGR  ENSG00000086475.10  1  SEQ ID NO: 1383
HLDTLHNFVSR  ENSG00000151914.13  1  SEQ ID NO: 1384
HLNPGLQLYR  ENSG00000114331.8  1  SEQ ID NO: 1385
HTEILEILEIPQLMDTCVR  ENSG00000213380.9  1  SEQ ID NO: 1386
HTLTQIKDAVR  ENSG00000146731.6  1  SEQ ID NO: 1387
IAALNASSTIEDDHEGSFK  ENSG00000099991.12  1  SEQ ID NO: 1388
IAEIQAR  ENSG00000152894.10  1  SEQ ID NO: 1389
IDALREELMEGMDR  ENSG00000132205.6  1  SEQ ID NO: 1390
IFEEQPCLRK  ENSG00000099991.12  1  SEQ ID NO: 1391
IFLTEQPLEGLEK  ENSG00000198947.10  1  SEQ ID NO: 1392
IFSAYIK  ENSG00000130429.8  1  SEQ ID NO: 1393
IIDRIHGTEEGQQILK  ENSG00000137497.13  1  SEQ ID NO: 1394
ILHKGEELAK  ENSG00000169129.10  1  SEQ ID NO: 1395
INELENGGEILNETRSFHHK  ENSG00000059691.7  1  SEQ ID NO: 1396
IPASAEQIQHLAGAIAER  ENSG00000172037.9  1  SEQ ID NO: 1397
IQGTLQPH  ENSG00000172037.9  1  SEQ ID NO: 1398
IQNQWDEVQEHLQNR  ENSG00000198947.10  1  SEQ ID NO: 1399
IQNVVTSFAPQRRAAWWQSENGIPA  ENSG00000172037.9  1  SEQ ID NO: 1400
IRQKVDDCERCR  ENSG00000011454.12  1  SEQ ID NO: 1401
ITEQEKLK  ENSG00000151914.13  1  SEQ ID NO: 1402
ITSVSTGNLCTEEQTPPPRPEAYPIPTQTYTR  ENSG00000130396.16  1  SEQ ID NO: 1403
IVLGGTTVHNTK  ENSG00000136631.8  1  SEQ ID NO: 1404
IVTTHIR  ENSG00000106976.14  1  SEQ ID NO: 1405
KDAEGILEDLQSYR  ENSG00000153310.14  1  SEQ ID NO: 1406
KDVEVTKEEFVLAAQK  ENSG00000004864.9  1  SEQ ID NO: 1407
KEADMQQK  ENSG00000158560.10  1  SEQ ID NO: 1408
KHPSSPECLVSAQK  ENSG00000137497.13  1  SEQ ID NO: 1409
KIQNHIQTLK  ENSG00000198947.10  1  SEQ ID NO: 1410
KISEESGETAKRR  ENSG00000099991.12  1  SEQ ID NO: 1411
KIYAVEASTMAQHAEVLVK  ENSG00000142453.7  1  SEQ ID NO: 1412
KKEELNAVR  ENSG00000198947.10  1  SEQ ID NO: 1413
KKGPGAGSALDDGR  ENSG00000196961.8  1  SEQ ID NO: 1414
KLMQIR  ENSG00000151914.13  1  SEQ ID NO: 1415
KLSSQLVEHCQK  ENSG00000198947.10  1  SEQ ID NO: 1416
KLTFEYR  ENSG00000119383.15  1  SEQ ID NO: 1417
KMEEEPLGPDLEDLKR  ENSG00000198947.10  1  SEQ ID NO: 1418
KMSGTVSK  ENSG00000136631.8  1  SEQ ID NO: 1419
KQVAPEKPVKK  ENSG00000113387.7  1  SEQ ID NO: 1420
KSSTGSPTSPLNAEKLESEEDVSQAF  ENSG00000065534.14  1  SEQ ID NO: 1421
KTRPDGNCFYR  ENSG00000167770.7  1  SEQ ID NO: 1422
KVSTLQNQR  ENSG00000169896.12  1  SEQ ID NO: 1423
LAGEEEALR  ENSG00000125826.15  1  SEQ ID NO: 1424
LCDNIVSESESTTAR  ENSG00000170776.15  1  SEQ ID NO: 1425
LCIEHVEEHGLDIDGIYR  ENSG00000165322.13  1  SEQ ID NO: 1426
LCQFEEAKQDCDQALQLADGNVK  ENSG00000104450.8  1  SEQ ID NO: 1427
LDAWEEAQVEFMASHGNDAAR  ENSG00000105963.9  1  SEQ ID NO: 1428
LDEDLTTLGQMSK  ENSG00000110237.3  1  SEQ ID NO: 1429
LDLFEISQPTEDLEFHGVMR  ENSG00000130396.16  1  SEQ ID NO: 1430
LEAIKR  ENSG00000112096.12  1  SEQ ID NO: 1431
LEMLQQIANR  ENSG00000151914.13  1  SEQ ID NO: 1432
LESEEDVSQAFLEAVAEEKPHVK  ENSG00000065534.14  1  SEQ ID NO: 1433
LESEEDVSQAFLEAVAEEKPHVKPY  ENSG00000065534.14  1  SEQ ID NO: 1434
LETMARNEVIADINCK  ENSG00000141447.12  1  SEQ ID NO: 1435
LEYNVDAANGIVMEGYLFK  ENSG00000114331.8  1  SEQ ID NO: 1436
LFPNSLDQTDMHGDSEYNIMFGPDICGPGTKK  ENSG00000179218.9  1  SEQ ID NO: 1437
LGCTMSMR  ENSG00000059691.7  1  SEQ ID NO: 1438
LGIEKTDPTTLTDEEINR  ENSG00000100714.11  1  SEQ ID NO: 1439
LGIVNVDEAVLHFK  ENSG00000155629.10  1  SEQ ID NO: 1440
LGYTPLIVACHYGNVK  ENSG00000145362.12  1  SEQ ID NO: 1441
LHEMQIQHPTASLIAK  ENSG00000146731.6  1  SEQ ID NO: 1442
LHYNELGAK  ENSG00000198947.10  1  SEQ ID NO: 1443
LKAVQAQGGESQQEAQR  ENSG00000137497.13  1  SEQ ID NO: 1444
LKEDMKKIVAVPLNEQK  ENSG00000138640.10  1  SEQ ID NO: 1445
LKEEEEDKKR  ENSG00000179218.9  1  SEQ ID NO: 1446
LKELNDWLTK  ENSG00000198947.10  1  SEQ ID NO: 1447
LKLSFEEMER  ENSG00000162614.14  1  SEQ ID NO: 1448
LKLTFEELER  ENSG00000162614.14  1  SEQ ID NO: 1449
LKPEIQCVSAK  ENSG00000163975.7  1  SEQ ID NO: 1450
LLEATPTDSCGYFR  ENSG00000142733.10  1  SEQ ID NO: 1451
LLEATPTDSCGYFR  ENSG00000142733.10  1  SEQ ID NO: 1452
LLKGESALQR  ENSG00000114331.8  1  SEQ ID NO: 1453
LLNEGQR  ENSG00000163975.7  1  SEQ ID NO: 1454
LNGFQLENFTLK  ENSG00000136231.9  1  SEQ ID NO: 1455
LNKILK  ENSG00000067704.8  1  SEQ ID NO: 1456
LNREVAESPRPR  ENSG00000019144.12  1  SEQ ID NO: 1457
LPPSSPQKLADVAAPPGGPPPPHSPYSGPPSR  ENSG00000017373.11  1  SEQ ID NO: 1458
LQDAFSAIGQNADLDLPQIAVVGGQSAGK  ENSG00000106976.14  1  SEQ ID NO: 1459
LQELEGTYEENERALESK  ENSG00000172037.9  1  SEQ ID NO: 1460
LQQQCDDYGSSYLGVIELIGEK  ENSG00000132205.6  1  SEQ ID NO: 1461
LSAHTHTLSLTDINELVCGAPGDAPCATSPCGGAGCR  ENSG00000172037.9  1  SEQ ID NO: 1462
LSFEEMERQRR  ENSG00000162614.14  1  SEQ ID NO: 1463
LSGWLAQQEDAHR  ENSG00000032444.11  1  SEQ ID NO: 1464
LSHFEYVKNEDLEK  ENSG00000061938.12  1  SEQ ID NO: 1465
LSIPQLSVTDYEIM  ENSG00000198947.10  1  SEQ ID NO: 1466
LSIPQLSVTDYEIMEQR  ENSG00000198947.10  1  SEQ ID NO: 1467
LSPAYSLGSLTGASPCQSPCVQR  ENSG00000019144.12  1  SEQ ID NO: 1468
LSSGGGSSSETVGR  ENSG00000110237.3  1  SEQ ID NO: 1469
LTEEQCLFSAWLSEKEDAVNK  ENSG00000198947.10  1  SEQ ID NO: 1470
LVAAGGLDAVLYWCR  ENSG00000004139.9  1  SEQ ID NO: 1471
LVEFSAFLEQQR  ENSG00000187079.10  1  SEQ ID NO: 1472
LVPSVNGVR  ENSG00000100714.11  1  SEQ ID NO: 1473
LVTPHGESEQIGVIPSKK  ENSG00000082458.7  1  SEQ ID NO: 1474
LVVTQEDVELAYQEAMMNMARLNRTAAGLMH  ENSG00000086475.10  1  SEQ ID NO: 1475
MAAAEAGGDDAR  ENSG00000184207.8  1  SEQ ID NO: 1476
MAVWEAEQLGGLQR  ENSG00000130589.12  1  SEQ ID NO: 1477
MEALENR  ENSG00000132561.9  1  SEQ ID NO: 1478
MEFDEKELRR  ENSG00000106976.14  1  SEQ ID NO: 1479
MESGRGSSTPPGPIAALGMPDTGPG  ENSG00000127084.13  1  SEQ ID NO: 1480
MESGRGSSTPPGPIAALGMPDTGPGSSSLGK  ENSG00000127084.13  1  SEQ ID NO: 1481
MESQLK  ENSG00000082805.15  1  SEQ ID NO: 1482
MGMSFGLESGK  ENSG00000114126.13  1  SEQ ID NO: 1483
MGNAAGSAEQPAGPAAPPPK  ENSG00000184922.9  1  SEQ ID NO: 1484
MIISTPQRLTSSGSVLIGSPYTPAPAMVTQTHIA  ENSG00000114126.13  1  SEQ ID NO: 1485
MILTNPEGR  ENSG00000152894.10  1  SEQ ID NO: 1486
MKAAKSGTKDGLEK  ENSG00000074964.12  1  SEQ ID NO: 1487
MLEDLGFKDLTLQPR  ENSG00000125826.15  1  SEQ ID NO: 1488
MNSLTLNR  ENSG00000213380.9  1  SEQ ID NO: 1489
MSDKSDLKAELER  ENSG00000158560.10  1  SEQ ID NO: 1490
MSGSSGGAAAPAASSGPAAAASAAGSGCGGGA  ENSG00000038382.13  1  SEQ ID NO: 1491
MSKSLGNVIHP  ENSG00000067704.8  1  SEQ ID NO: 1492
MVSTSATDEPR  ENSG00000032444.11  1  SEQ ID NO: 1493
NANSSPVASTTPSASATTNPASATTLDQSKA  ENSG00000166825.9  1  SEQ ID NO: 1494
NATLVNEADKLR  ENSG00000166825.9  1  SEQ ID NO: 1495
NAVLEHMEELQEQVALLTER  ENSG00000184922.9  1  SEQ ID NO: 1496
NDKSYWLSTTAPLPMMPVAEDEIKPYISR  ENSG00000134871.13  1  SEQ ID NO: 1497
NFVKEAEEISSNRR  ENSG00000213380.9  1  SEQ ID NO: 1498
NILVSDMEMNEQQE  ENSG00000011028.9  1  SEQ ID NO: 1499
NLAATLQDIETK  ENSG00000019144.12  1  SEQ ID NO: 1500
NLEELYLVGSLSHDISR  ENSG00000171488.10  1  SEQ ID NO: 1501
NLLEVSEVEQELACQNDHSSALQNIKR  ENSG00000136631.8  1  SEQ ID NO: 1502
NLVGSGSEIQFLSEAQDDPQKR  ENSG00000115652.10  1  SEQ ID NO: 1503
NRTEAEVKR  ENSG00000169129.10  1  SEQ ID NO: 1504
NSLSVLSPK  ENSG00000171488.10  1  SEQ ID NO: 1505
NTSAASTAQLVEATEELRR  ENSG00000172037.9  1  SEQ ID NO: 1506
NVQVFLISGGFR  ENSG00000146733.9  1  SEQ ID NO: 1507
NYPSSLCALCVGDEQGR  ENSG00000163975.7  1  SEQ ID NO: 1508
PCPCPEGPGSQR  ENSG00000172037.9  1  SEQ ID NO: 1509
PCQDVDECAR  ENSG00000090006.13  1  SEQ ID NO: 1510
PDENLKSASKEELKK  ENSG00000065534.14  1  SEQ ID NO: 1511
PEAYQVPASYQPDEEERAR  ENSG00000125826.15  1  SEQ ID NO: 1512
PEGEMKPGR  ENSG00000113387.7  1  SEQ ID NO: 1513
PETPYSGPGLLIDSLVLLPR  ENSG00000172037.9  1  SEQ ID NO: 1514
PEVVWFK  ENSG00000065534.14  1  SEQ ID NO: 1515
PGAGAVEVAMAEALIK  ENSG00000146731.6  1  SEQ ID NO: 1516
PGEMGPQGPPGEPGFRGAPGK  ENSG00000134871.13  1  SEQ ID NO: 1517
PGETPSWTGSGFVR  ENSG00000172037.9  1  SEQ ID NO: 1518
PGFHGQAAR  ENSG00000172037.9  1  SEQ ID NO: 1519
PGHVGQMGPVGAPGRPGPPGPPGPK  ENSG00000134871.13  1  SEQ ID NO: 1520
PILPHLAEEVFQHIPYIK  ENSG00000067704.8  1  SEQ ID NO: 1521
PKIDDVLHTLTGAMSLLRR  ENSG00000130396.16  1  SEQ ID NO: 1522
PKMLVISGGDGYEDFR  ENSG00000110237.3  1  SEQ ID NO: 1523
PPDIDKTELVEPTEYLVVHLK  ENSG00000166825.9  1  SEQ ID NO: 1524
PPKPATPDFR  ENSG00000065534.14  1  SEQ ID NO: 1525
PPVIQNPEYK  ENSG00000179218.9  1  SEQ ID NO: 1526
PPVLGTESDATVK  ENSG00000065534.14  1  SEQ ID NO: 1527
PQLLGVAPEK  ENSG00000004864.9  1  SEQ ID NO: 1528
PRMSAQEQLERMR  ENSG00000105559.7  1  SEQ ID NO: 1529
PSGPATAEDPGRRPVLPQR  ENSG00000132205.6  1  SEQ ID NO: 1530
PTPRPVPMKRHIFR  ENSG00000186635.10  1  SEQ ID NO: 1531
PVAGSELPR  ENSG00000176890.11  1  SEQ ID NO: 1532
PYWCISR  ENSG00000067704.8  1  SEQ ID NO: 1533
QAASPLEPK  ENSG00000137497.13  1  SEQ ID NO: 1534
QAEEVNTEWEK  ENSG00000198947.10  1  SEQ ID NO: 1535
QAEGLSEDGAAMAVEPTQIQLSK  ENSG00000198947.10  1  SEQ ID NO: 1536
QAPSSFQLLYDLK  ENSG00000100714.11  1  SEQ ID NO: 1537
QAQLEKELSAALQDKK  ENSG00000137497.13  1  SEQ ID NO: 1538
QAQVNLTVVDKPD  ENSG00000065534.14  1  SEQ ID NO: 1539
QDCDQALQLADGNVK  ENSG00000104450.8  1  SEQ ID NO: 1540
QEMVIEVKAIGGKK  ENSG00000110237.3  1  SEQ ID NO: 1541
QETPPPRSPPVANSGSTGFSRRGSGRGGGPTP  ENSG00000105559.7  1  SEQ ID NO: 1542
QGPMTQAINR  ENSG00000170776.15  1  SEQ ID NO: 1543
QHEVEEATNILTATR  ENSG00000114331.8  1  SEQ ID NO: 1544
QIASLTGLVQSALLR  ENSG00000017373.11  1  SEQ ID NO: 1545
QICSQLSER  ENSG00000011454.12  1  SEQ ID NO: 1546
QKASGDSAR  ENSG00000004864.9  1  SEQ ID NO: 1547
QKMEEEKRRTEEER  ENSG00000162614.14  1  SEQ ID NO: 1548
QLELACETQEEVDSWK  ENSG00000106976.14  1  SEQ ID NO: 1549
QLNETGGPVLVSAPISPEEQDKLENK  ENSG00000198947.10  1  SEQ ID NO: 1550
QLPKPNQDTMQILFR  ENSG00000165322.13  1  SEQ ID NO: 1551
QLQTLAPK  ENSG00000105223.14  1  SEQ ID NO: 1552
QNGDSAYLYLLSAR  ENSG00000125826.15  1  SEQ ID NO: 1553
QPDVEEILSK  ENSG00000198947.10  1  SEQ ID NO: 1554
QQNLAVSESPVTPSALAELLDLLDSR  ENSG00000059691.7  1  SEQ ID NO: 1555
QQQMHIVDMLSK  ENSG00000130396.16  1  SEQ ID NO: 1556
QSSHNFQLESVNK  ENSG00000135052.12  1  SEQ ID NO: 1557
QTLLAESEALTSYSHR  ENSG00000167608.7  1  SEQ ID NO: 1558
QTSVADLLASFNDQSTSDYLVVYLR  ENSG00000167770.7  1  SEQ ID NO: 1559
QVFGQTTIHQHIPFNWDSEFVQLHFGK  ENSG00000004864.9  1  SEQ ID NO: 1560
QVVQDLLK  ENSG00000141447.12  1  SEQ ID NO: 1561
RASAAAAAGGGATGHPGGGQGAENPAGLK  ENSG00000104450.8  1  SEQ ID NO: 1562
RCDLCAPGYYGFGPTGCQACQCSHEGALSSLCEK  ENSG00000172037.9  1  SEQ ID NO: 1563
RCEQVQPGYFR  ENSG00000172037.9  1  SEQ ID NO: 1564
RDNEVDGQDYHFVVSR  ENSG00000082458.7  1  SEQ ID NO: 1565
RDPSSNDINGGMEPTPSTVSTPSPSADLLGLR  ENSG00000196961.8  1  SEQ ID NO: 1566
REMAAASAAAISGAGR  ENSG00000079616.8  1  SEQ ID NO: 1567
RETLFTLDDQALGPELTAPAPEPPAEEPR  ENSG00000213380.9  1  SEQ ID NO: 1568
RFSTEYELQQLEQFK  ENSG00000166825.9  1  SEQ ID NO: 1569
RGSDELTVPRYR  ENSG00000017373.11  1  SEQ ID NO: 1570
RIEGSGDQIDTYELSGGAR  ENSG00000106976.14  1  SEQ ID NO: 1571
RKEEEEAEDK  ENSG00000179218.9  1  SEQ ID NO: 1572
RLDIDEKPLVVQLNWNKDDR  ENSG00000130396.16  1  SEQ ID NO: 1573
RPPEPEKAPPAAPTRPSALELK  ENSG00000184922.9  1  SEQ ID NO: 1574
RPRPQGRSVSEPR  ENSG00000125744.7  1  SEQ ID NO: 1575
RQAEGLSEDGAAMAVEPTQIQLSK  ENSG00000198947.10  1  SEQ ID NO: 1576
RRKVPPSGSGGSELSNGEAGEAYR  ENSG00000110237.3  1  SEQ ID NO: 1577
RSLELQTRTEEEKK  ENSG00000127084.13  1  SEQ ID NO: 1578
RSSYLLAITTERSK  ENSG00000225485.3  1  SEQ ID NO: 1579
RVAAQVDGGAQVQQVLNIECLR  ENSG00000196961.8  1  SEQ ID NO: 1580
SAEESDRLR  ENSG00000130396.16  1  SEQ ID NO: 1581
SCDCDPMGSQDGGR  ENSG00000172037.9  1  SEQ ID NO: 1582
SDVLETVVLINPSDEAVSTEVR  ENSG00000131711.10  1  SEQ ID NO: 1583
SEDYELLCPNGAR  ENSG00000163975.7  1  SEQ ID NO: 1584
SFGSSLMESEVNLDR  ENSG00000198947.10  1  SEQ ID NO: 1585
SGHDQVVELLLERGAPLLAR  ENSG00000145362.12  1  SEQ ID NO: 1586
SGLTSLHLAAQEDKVNVADILTK  ENSG00000145362.12  1  SEQ ID NO: 1587
SGRPSCLYSAARPSGSYR  ENSG00000124831.14  1  SEQ ID NO: 1588
SGTIFDNFLITNDEA  ENSG00000179218.9  1  SEQ ID NO: 1589
SGTLALVEPLVASLDPGR  ENSG00000004139.9  1  SEQ ID NO: 1590
SKIVGAPMHDLLLWNNATVTTCHSK  ENSG00000100714.11  1  SEQ ID NO: 1591
SKPEDWDER  ENSG00000179218.9  1  SEQ ID NO: 1592
SLEGSDDAVLLQRRLDNMNFKWSELR  ENSG00000198947.10  1  SEQ ID NO: 1593
SLNPEQWSQLK  ENSG00000113387.7  1  SEQ ID NO: 1594
SLSDPSRRGELAGPGFEGPGGEPIREV  ENSG00000110237.3  1  SEQ ID NO: 1595
SNRDELELELAENR  ENSG00000137497.13  1  SEQ ID NO: 1596
SPARPQPGEGPGGPGGPPEVSR  ENSG00000105559.7  1  SEQ ID NO: 1597
SPARPQPGEGPGGPGGPPEVSR  ENSG00000105559.7  1  SEQ ID NO: 1598
SPDTTLSPASTTSSGVSEESTTSHSR  ENSG00000205277.5  1  SEQ ID NO: 1599
SPDTTLSPASTTSSGVSEESTTSHSR  ENSG00000205277.5  1  SEQ ID NO: 1600
SPDTTLSPASTTSSGVSEESTTSHSR  ENSG00000205277.5  1  SEQ ID NO: 1601
SPFPSQHLEAPEDK  ENSG00000198947.10  1  SEQ ID NO: 1602
SPGPPQVDGTPTMSLERPPR  ENSG00000155629.10  1  SEQ ID NO: 1603
SPTTTLSPASMTSLGVGEESTTSR  ENSG00000205277.5  1  SEQ ID NO: 1604
SPTTTLSPASMTSLGVGEESTTSR  ENSG00000205277.5  1  SEQ ID NO: 1605
SPTTTLSPASMTSLGVGEESTTSR  ENSG00000205277.5  1  SEQ ID NO: 1606
SPTTTLSPASMTSLGVGEESTTSR  ENSG00000205277.5  1  SEQ ID NO: 1607
SQAYADYIGFILTLNEGVK  ENSG00000119383.15  1  SEQ ID NO: 1608
SQMNCNLGTCQLQR  ENSG00000205277.5  1  SEQ ID NO: 1609
SRQELNTIASKPPR  ENSG00000169896.12  1  SEQ ID NO: 1610
SSHVTIDTLK  ENSG00000163975.7  1  SEQ ID NO: 1611
SSQNDSPGDASEGPEYLAIGNLDPRGR  ENSG00000145016.9  1  SEQ ID NO: 1612
STEYELQQLEQFKK  ENSG00000166825.9  1  SEQ ID NO: 1613
STSFNVQDLLPDHEYKFR  ENSG00000065534.14  1  SEQ ID NO: 1614
SVEQEVVQSQLNHCVNLYK  ENSG00000198947.10  1  SEQ ID NO: 1615
SVYTMPLANHR  ENSG00000090006.13  1  SEQ ID NO: 1616
SWAEDEKQKAETVQAALEEAQR  ENSG00000172037.9  1  SEQ ID NO: 1617
SWCSGHLHLRCPR  ENSG00000032444.11  1  SEQ ID NO: 1618
SYVDTGGVSR  ENSG00000184922.9  1  SEQ ID NO: 1619
SYVITGSWNPK  ENSG00000011454.12  1  SEQ ID NO: 1620
TAIWEDQNLR  ENSG00000205277.5  1  SEQ ID NO: 1621
TALLTAGDIYLLSTFR  ENSG00000169231.9  1  SEQ ID NO: 1622
TEALMDAQKEDFNSK  ENSG00000172037.9  1  SEQ ID NO: 1623
TEFCLHDGPPYANGDPHVGHALNK  ENSG00000067704.8  1  SEQ ID NO: 1624
TESSGGWQNR  ENSG00000011028.9  1  SEQ ID NO: 1625
THIESSGHGVDTCLHVVLSSKVCR  ENSG00000019144.12  1  SEQ ID NO: 1626
TKVHAELADVLTEAVVDSILAIKK  ENSG00000146731.6  1  SEQ ID NO: 1627
TLEIALEQKKEECLK  ENSG00000082805.15  1  SEQ ID NO: 1628
TLNATGEEIIQQSSK  ENSG00000198947.10  1  SEQ ID NO: 1629
TLPSMVHR  ENSG00000101199.8  1  SEQ ID NO: 1630
TMNGDMR  ENSG00000120549.11  1  SEQ ID NO: 1631
TNHIGWVQEFLNEENR  ENSG00000184922.9  1  SEQ ID NO: 1632
TNIQLPACLR  ENSG00000213380.9  1  SEQ ID NO: 1633
TPDELQK  ENSG00000198947.10  1  SEQ ID NO: 1634
TPLERDDLHESVFR  ENSG00000151914.13  1  SEQ ID NO: 1635
TSGNQDEILVIR  ENSG00000106976.14  1  SEQ ID NO: 1636
TTLSPASSTSPGLQGESTAFQTHPASTHTTPSPPSTATAPVEESTTYHR  ENSG00000205277.5  1  SEQ ID NO: 1637
TTLSPASSTSPGLQGESTAFQTHPASTHTTPSPPSTATAPVEESTTYHR  ENSG00000205277.5  1  SEQ ID NO: 1638
TTLSPASSTSPGLQGESTAFQTHPASTHTTPSPPSTATAPVEESTTYHR  ENSG00000205277.5  1  SEQ ID NO: 1639
TTQGLTALLLSLKK  ENSG00000136631.8  1  SEQ ID NO: 1640
TTQIINITMTK  ENSG00000137497.13  1  SEQ ID NO: 1641
TWVQQSETK  ENSG00000198947.10  1  SEQ ID NO: 1642
VAIGPSVLNAAR  ENSG00000067704.8  1  SEQ ID NO: 1643
VAYIPDEMAAQQNPLQQPR  ENSG00000136231.9  1  SEQ ID NO: 1644
VDSDMNDAYLGYAAAIILR  ENSG00000169896.12  1  SEQ ID NO: 1645
VEDAYILTCNVSLEYEK  ENSG00000146731.6  1  SEQ ID NO: 1646
VGAPMHDLLLWNNATVTTCHSK  ENSG00000100714.11  1  SEQ ID NO: 1647
VHLFDIITQYR  ENSG00000213380.9  1  SEQ ID NO: 1648
VIECFNVESR  ENSG00000104728.11  1  SEQ ID NO: 1649
VLGHFEKPLFLELCR  ENSG00000032444.11  1  SEQ ID NO: 1650
VLMDLQNQK  ENSG00000198947.10  1  SEQ ID NO: 1651
VLTTSPSR  ENSG00000019144.12  1  SEQ ID NO: 1652
VMLPPGAQHSDEK  ENSG00000130396.16  1  SEQ ID NO: 1653
VNFRPRYVTRYKTVTQLEWRCCPGFRGGDCQEGPK  ENSG00000132205.6  1  SEQ ID NO: 1654
VPDMAEIQSR  ENSG00000032444.11  1  SEQ ID NO: 1655
VQLLSQYDNEK  ENSG00000184922.9  1  SEQ ID NO: 1656
VSRASSPEGRHLPSPQLGTK  ENSG00000105559.7  1  SEQ ID NO: 1657
VTCTGYHQVR  ENSG00000133316.11  1  SEQ ID NO: 1658
VTEFDAAR  ENSG00000136631.8  1  SEQ ID NO: 1659
VVQEENQHMQMTIQALQDELR  ENSG00000082805.15  1  SEQ ID NO: 1660
VYLDLTPVK  ENSG00000169129.10  1  SEQ ID NO: 1661
WCATSDPEQHK  ENSG00000163975.7  1  SEQ ID NO: 1662
WFSIQNNQLVYQK  ENSG00000114331.8  1  SEQ ID NO: 1663
WIEFCQLLSER  ENSG00000198947.10  1  SEQ ID NO: 1664
WYQNPDYNFFNNYK  ENSG00000073849.10  1  SEQ ID NO: 1665
YADSLKPNIPYK  ENSG00000130396.16  1  SEQ ID NO: 1666
YENHSATAESSR  ENSG00000152894.10  1  SEQ ID NO: 1667
YLITATLTPER  ENSG00000132205.6  1  SEQ ID NO: 1668
YLQQPGCLLVGTNMDNR  ENSG00000184207.8  1  SEQ ID NO: 1669
YLRELSGSGLER  ENSG00000213380.9  1  SEQ ID NO: 1670
YLSASEYGSSVDGHPEVPETK  ENSG00000169129.10  1  SEQ ID NO: 1671
YNASSQQQR  ENSG00000165322.13  1  SEQ ID NO: 1672
YQETMSAIR  ENSG00000198947.10  1  SEQ ID NO: 1673
YSFWLTTIPEQSFQGSPSADTLK  ENSG00000134871.13  1  SEQ ID NO: 1674
YTKQGFGNLPICMAK  ENSG00000100714.11  1  SEQ ID NO: 1675
YVPAIAHLIHSLN  ENSG00000106066.9  1  SEQ ID NO: 1676
AAECLDVDECHRVPPPCDLGR  ENSG00000090006.13  0  SEQ ID NO: 1677
AEGGKRPAR  ENSG00000104450.8  0  SEQ ID NO: 1678
AEPVWTPPAPAPAAPPSTPAAPK  ENSG00000115310.13  0  SEQ ID NO: 1679
AFLCPLICHNGGVCVKPDR  ENSG00000090006.13  0  SEQ ID NO: 1680
AHLIHSLNPVR  ENSG00000106066.9  0  SEQ ID NO: 1681
AIAHLIHSLNPVR  ENSG00000106066.9  0  SEQ ID NO: 1682
AIWNVINW  ENSG00000112096.12  0  SEQ ID NO: 1683
AIWNVINWENV  ENSG00000112096.12  0  SEQ ID NO: 1684
ANGITMYAVGVGK  ENSG00000132561.9  0  SEQ ID NO: 1685
AQPVPFVPQVLGVMIGAGVAVVVTAVLILLVVRR  ENSG00000032444.11  0  SEQ ID NO: 1686
ARILTAAR  ENSG00000004139.9  0  SEQ ID NO: 1687
AVGPGAGGAGSAVPGGAGPCATVSVFPGAR  ENSG00000142453.7  0  SEQ ID NO: 1688
AYDNFGVLGLDLWQVK  ENSG00000179218.9  0  SEQ ID NO: 1689
CVCPAGFR  ENSG00000090006.13  0  SEQ ID NO: 1690
CVHGPTGSR  ENSG00000090006.13  0  SEQ ID NO: 1691
CVPPRTSAGTFPGSQPQAPASPVLPAR  ENSG00000090006.13  0  SEQ ID NO: 1692
DHPSSHSAQPPR  ENSG00000138162.13  0  SEQ ID NO: 1693
DKERLQAMMTHLHVKSTEPK  ENSG00000114861.14  0  SEQ ID NO: 1694
DLDNAEEKADALNK  ENSG00000011454.12  0  SEQ ID NO: 1695
DLYSALIQFFQIFPEYK  ENSG00000106066.9  0  SEQ ID NO: 1696
DPASDKLLGPAGLTWERNLPGAGVGKEMAGVPPTLR  ENSG00000138162.13  0  SEQ ID NO: 1697
DSAVMDDSVVIPSHQVSTLAK  ENSG00000145362.12  0  SEQ ID NO: 1698
DSSTPYQEIAAVPSAGR  ENSG00000138162.13  0  SEQ ID NO: 1699
DWDSPYSHDLDT  ENSG00000105223.14  0  SEQ ID NO: 1700
DWDSPYSHDLDTS  ENSG00000105223.14  0  SEQ ID NO: 1701
EDLDQSPLVSSSDSPPRPQPAFK  ENSG00000115310.13  0  SEQ ID NO: 1702
EESREPAPASPAPA  ENSG00000113657.8  0  SEQ ID NO: 1703
ELSSKGVK  ENSG00000176890.11  0  SEQ ID NO: 1704
EMELRRQALEEERR  ENSG00000019144.12  0  SEQ ID NO: 1705
ENGTVPK  ENSG00000165322.13  0  SEQ ID NO: 1706
ENKEVVLQWFTENSK  ENSG00000166825.9  0  SEQ ID NO: 1707
EVAESPRPR  ENSG00000019144.12  0  SEQ ID NO: 1708
FILDNLK  ENSG00000151835.9  0  SEQ ID NO: 1709
FLEAVAEEKPHVKPYFSK  ENSG00000065534.14  0  SEQ ID NO: 1710
FPIEGGQKDPK  ENSG00000107957.12  0  SEQ ID NO: 1711
FSTEYELQQLEQFKKDNEETGFGSGTR  ENSG00000166825.9  0  SEQ ID NO: 1712
FWPAIDDGLR  ENSG00000105223.14  0  SEQ ID NO: 1713
FYIDFGGVKPMGSEPVPKSR  ENSG00000004864.9  0  SEQ ID NO: 1714
GADLIEEAASRIVDAVIEQVKAAGALLTEGE  ENSG00000170776.15  0  SEQ ID NO: 1715
GADYAEPTWNLK  ENSG00000166825.9  0  SEQ ID NO: 1716
GDEEKDKGLQTSQDAR  ENSG00000179218.9  0  SEQ ID NO: 1717
GDILQTPQFQMR  ENSG00000137497.13  0  SEQ ID NO: 1718
GDNLPQYR  ENSG00000205277.5  0  SEQ ID NO: 1719
GNEAVASR  ENSG00000135052.12  0  SEQ ID NO: 1720
GPNKHTLTQIKDAVR  ENSG00000146731.6  0  SEQ ID NO: 1721
GQGPMFLDADFVAFTNHFK  ENSG00000198947.10  0  SEQ ID NO: 1722
GTATPELHTATDYR  ENSG00000170776.15  0  SEQ ID NO: 1723
GWAGDSGPQGRPGVFGLPGEK  ENSG00000134871.13  0  SEQ ID NO: 1724
GYLAPSGDLSLRR  ENSG00000090006.13  0  SEQ ID NO: 1725
HAEQQALR  ENSG00000142453.7  0  SEQ ID NO: 1726
IEDPSLLNSR  ENSG00000032444.11  0  SEQ ID NO: 1727
IFMEEVPGGSLSSLLRS  ENSG00000142733.10  0  SEQ ID NO: 1728
IFMEEVPGGSLSSLLRS  ENSG00000142733.10  0  SEQ ID NO: 1729
IIEVAPQVATQNVNPTPGAT  ENSG00000086475.10  0  SEQ ID NO: 1730
ILNSDQTTCR  ENSG00000132561.9  0  SEQ ID NO: 1731
ISCWGHSEPSMR  ENSG00000105223.14  0  SEQ ID NO: 1732
IVVHSVENMNFR  ENSG00000184922.9  0  SEQ ID NO: 1733
KAVAHMK  ENSG00000132561.9  0  SEQ ID NO: 1734
KDITAALAAER  ENSG00000106976.14  0  SEQ ID NO: 1735
KDNEETGFGSGTR  ENSG00000166825.9  0  SEQ ID NO: 1736
KHQGHFLLGTLSR  ENSG00000061938.12  0  SEQ ID NO: 1737
KIAEIQARR  ENSG00000152894.10  0  SEQ ID NO: 1738
KKEADMQQK  ENSG00000158560.10  0  SEQ ID NO: 1739
KLFGGPGSRR  ENSG00000110237.3  0  SEQ ID NO: 1740
KPAAGLSAAPVPTAPAAGAP  ENSG00000115310.13  0  SEQ ID NO: 1741
KSSTGSPTSPLNAEKLESEEDVSQA  ENSG00000065534.14  0  SEQ ID NO: 1742
KVVATTQMQAADARK  ENSG00000166825.9  0  SEQ ID NO: 1743
LADSDQASKVQQQK  ENSG00000137497.13  0  SEQ ID NO: 1744
LAYVSCVR  ENSG00000032444.11  0  SEQ ID NO: 1745
LGIVQGIVGARNTSAASTAQLVEATEELRREIG  ENSG00000172037.9  0  SEQ ID NO: 1746
LHYNELGAKVTERKQQ  ENSG00000198947.10  0  SEQ ID NO: 1747
LIEVGPSGAQFLGK  ENSG00000145362.12  0  SEQ ID NO: 1748
LKQTNLQWIK  ENSG00000198947.10  0  SEQ ID NO: 1749
LKTVFYR  ENSG00000104728.11  0  SEQ ID NO: 1750
LLISCWGHSEPSMR  ENSG00000105223.14  0  SEQ ID NO: 1751
LMFDRSEVYGPMK  ENSG00000166825.9  0  SEQ ID NO: 1752
LMLEWQFQK  ENSG00000130396.16  0  SEQ ID NO: 1753
LPAAPPVAPER  ENSG00000115310.13  0  SEQ ID NO: 1754
LPPVLGTESDATVK  ENSG00000065534.14  0  SEQ ID NO: 1755
LPQEPGR  ENSG00000135052.12  0  SEQ ID NO: 1756
LQGQDSERVRAWQR  ENSG00000165912.11  0  SEQ ID NO: 1757
LSRKGGHER  ENSG00000019144.12  0  SEQ ID NO: 1758
LTELENELNTK  ENSG00000130396.16  0  SEQ ID NO: 1759
LTGKAEGGK  ENSG00000104450.8  0  SEQ ID NO: 1760
LWEAVKRR  ENSG00000061938.12  0  SEQ ID NO: 1761
LWHLDPDTEYEIR  ENSG00000152894.10  0  SEQ ID NO: 1762
LYGVVLTPPMK  ENSG00000061938.12  0  SEQ ID NO: 1763
MELEEVTRLLNLKDK  ENSG00000104450.8  0  SEQ ID NO: 1764
MIEDSGPGMKVLL  ENSG00000136631.8  0  SEQ ID NO: 1765
MPVAGSELPR  ENSG00000176890.11  0  SEQ ID NO: 1766
NFVLVLSPGALDK  ENSG00000004139.9  0  SEQ ID NO: 1767
NIMFGPDICGPGTK  ENSG00000179218.9  0  SEQ ID NO: 1768
NITIIVEDPIAESCNDKAKLRGPL  ENSG00000145016.9  0  SEQ ID NO: 1769
NPKAEVARAQAALAVNISAARGLQDVLRTNLGPK  ENSG00000146731.6  0  SEQ ID NO: 1770
NQVTQLK  ENSG00000100714.11  0  SEQ ID NO: 1771
NVINWENVTER  ENSG00000112096.12  0  SEQ ID NO: 1772
PGHYDILYK  ENSG00000167770.7  0  SEQ ID NO: 1773
PGSPGLPGMPGR  ENSG00000134871.13  0  SEQ ID NO: 1774
PLEEGLNKAIHYFR  ENSG00000115652.10  0  SEQ ID NO: 1775
PLSTRVPR  ENSG00000132561.9  0  SEQ ID NO: 1776
PSAGFLPTHR  ENSG00000090006.13  0  SEQ ID NO: 1777
PSGPQPQADLQALLQSGAQVR  ENSG00000105223.14  0  SEQ ID NO: 1778
PSSSGSTGTKLSPARSTTSGLVGESTPSR  ENSG00000205277.5  0  SEQ ID NO: 1779
PSSSGSTGTKLSPARSTTSGLVGESTPSR  ENSG00000205277.5  0  SEQ ID NO: 1780
QGYILNSDQTTCR  ENSG00000132561.9  0  SEQ ID NO: 1781
QVFEELWK  ENSG00000059691.7  0  SEQ ID NO: 1782
QVKPKTVSEEERKV  ENSG00000065534.14  0  SEQ ID NO: 1783
QYISKMIEDSGPGMK  ENSG00000136631.8  0  SEQ ID NO: 1784
QYMPWEAALSSLSYFK  ENSG00000166825.9  0  SEQ ID NO: 1785
RADVLAFPSSGFTDLAEIVSR  ENSG00000032444.11  0  SEQ ID NO: 1786
RAVAAQPGRKR  ENSG00000172977.8  0  SEQ ID NO: 1787
RDEGSQDQTGSLSRARPSSR  ENSG00000110237.3  0  SEQ ID NO: 1788
RDPEVGKDELSKPSSDAESR  ENSG00000138162.13  0  SEQ ID NO: 1789
RMQSSADLIIQEFMDLRTR  ENSG00000151914.13  0  SEQ ID NO: 1790
SEQ ID NO:
SASFEPFSNK
ENSG00000179218.9
0


1791








SEQ ID NO:
SDQIGLPDFNAGAMENWGLVT
ENSG00000166825.9
0


1792
YR







SEQ ID NO:
SFACQCPEGHVLR
ENSG00000132561.9
0


1793








SEQ ID NO:
SFLKLILQVEKWQEECEEGEGRTI
ENSG00000152894.10
0


1794
IHCLNGGGR







SEQ ID NO:
SFPAAQIPIAVEEPGSSSRESVSK
ENSG00000138162.13
0


1795
AGMPVSADAAK







SEQ ID NO:
SFTQGEGAR
ENSG00000132561.9
0


1796








SEQ ID NO:
SFTQGEGARPLSTR
ENSG00000132561.9
0


1797








SEQ ID NO:
SHTLSHASYLR
ENSG00000145362.12
0


1798








SEQ ID NO:
SLEQLQK
ENSG00000137497.13
0


1799








SEQ ID NO:
SPHTTLSPAGSTTR
ENSG00000205277.5
0


1800








SEQ ID NO:
SPHTTLSPAGSTTR
ENSG00000205277.5
0


1801








SEQ ID NO:
SPHTTLSPAGSTTR
ENSG00000205277.5
0


1802








SEQ ID NO:
SPHTTLSPAGSTTR
ENSG00000205277.5
0


1803








SEQ ID NO:
SQTLIDLNR
ENSG00000059691.7
0


1804








SEQ ID NO:
SSHNFQLESVNK
ENSG00000135052.12
0


1805








SEQ ID NO:
STCAPSPQR
ENSG00000138162.13
0


1806








SEQ ID NO:
STTFYSSPR
ENSG00000205277.5
0


1807








SEQ ID NO:
STTFYSSPR
ENSG00000205277.5
0


1808








SEQ ID NO:
STTFYSSPR
ENSG00000205277.5
0


1809








SEQ ID NO:
STTFYSSPR
ENSG00000205277.5
0


1810








SEQ ID NO:
STTFYSSPR
ENSG00000205277.5
0


1811








SEQ ID NO:
STTFYSSPR
ENSG00000205277.5
0


1812








SEQ ID NO:
STTFYSSPR
ENSG00000205277.5
0


1813








SEQ ID NO:
STTFYSSPR
ENSG00000205277.5
0


1814








SEQ ID NO:
STTFYSSPR
ENSG00000205277.5
0


1815








SEQ ID NO:
TATAGAISELTESRLR
ENSG00000128487.12
0


1816








SEQ ID NO:
TEVAIGPSVLNAAR
ENSG00000067704.8
0


1817








SEQ ID NO:
TGDPQETLRR
ENSG00000137497.13
0


1818








SEQ ID NO:
THLSLSHNPEQKGVPTGFILPIRDI
ENSG00000100714.11
0


1819
R







SEQ ID NO:
THTATGIR
ENSG00000169896.12
0


1820








SEQ ID NO:
TLATQLNQQK
ENSG00000151914.13
0


1821








SEQ ID NO:
TPVPEKVPPPKPATPDF
ENSG00000065534.14
0


1822








SEQ ID NO:
TVQQPTVQHR
ENSG00000132561.9
0


1823








SEQ ID NO:
TYQGFWNPPLAPR
ENSG00000152894.10
0


1824








SEQ ID NO:
VLCGDAGLLRGLADGLVQAGVG
ENSG00000142733.10
0


1825
TEALLTPLVGRLARL







SEQ ID NO:
VLCGDAGLLRGLADGLVQAGVG
ENSG00000142733.10
0


1826
TEALLTPLVGRLARL







SEQ ID NO:
VNYDEENWRK
ENSG00000166825.9
0


1827








SEQ ID NO:
VPEGFTCR
ENSG00000090006.13
0


1828








SEQ ID NO:
WSELRKKSLNIR
ENSG00000198947.10
0


1829








SEQ ID NO:
WSSRGSGGWGVYRSPSFGAGE
ENSG00000110237.3
0


1830
GLLR







SEQ ID NO:
WYQPSFHGVDLSALR
ENSG00000142453.7
0


1831








SEQ ID NO:
YCNPGDVCYYASR
ENSG00000134871.13
0


1832








SEQ ID NO:
YGNLGHVNIGAIQEPLAFILPK
ENSG00000213380.9
0


1833








SEQ ID NO:
YITISGNR
ENSG00000151914.13
0


1834








SEQ ID NO:
YLSYTLNPDLIRK
ENSG00000166825.9
0


1835








SEQ ID NO:
YMVTER
ENSG00000105223.14
0


1836









To examine possible functions of somatic promoters in cancer development, we focused on RASA3, a RAS GTPase-activating protein required for Gαi-induced inhibition of mitogen-activated protein kinases. In 50% of GCs, as well as in GC lines, we observed a gain of promoter activity at an intronic region 127 kb downstream of the canonical RASA3 TSS (FIG. 3c, top; FIG. 10). RNA-seq and 5′ RACE analysis confirmed expression of this shorter RASA3 isoform (FIG. 3c, bottom). The same isoform was also detected in TCGA RNA-seq data (FIG. 3c). Unlike the canonical full-length RASA3 protein (CanT), the shorter 31 kDa RASA3 somatic isoform (SomT) is predicted to lack the N-terminal RasGAP domain (FIG. 3d). Consistent with this prediction, GES1 normal gastric epithelial cells transfected with RASA3 CanT had lower levels of active GTP-bound RAS than cells transfected with either empty vector or RASA3 SomT, indicating that RASA3 CanT has higher RasGAP activity (FIG. 13).
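
The 31 kDa figure for RASA3 SomT is a size predicted from the isoform's amino acid sequence. A common way to make such a prediction is to sum average residue masses and add one water for the termini. The sketch below assumes the standard average residue masses (for example, as tabulated by ExPASy ProtParam) and ignores post-translational modifications. The function names are illustrative and are not the authors' tool.

```python
# Average residue masses in daltons (free amino acid mass minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def average_mass_da(sequence: str) -> float:
    """Predicted average mass (Da) of an unmodified linear polypeptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def average_mass_kda(sequence: str) -> float:
    """Same as average_mass_da, expressed in kilodaltons."""
    return average_mass_da(sequence) / 1000.0


if __name__ == "__main__":
    # Arbitrary example sequence, not the RASA3 SomT protein.
    print(f"{average_mass_kda('MACPWKFLFKTKFHQYA'):.2f} kDa")
```

As a rough cross-check, proteins average about 110 Da per residue, so a 31 kDa isoform corresponds to roughly 280 amino acids.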


To address the functions of RASA3 SomT, we transfected the RASA3 CanT and SomT isoforms into SNU1967 GC cells. Compared with untransfected cells, RASA3 SomT transfection significantly stimulated SNU1967 migration (P<0.01) and invasion (P<0.01), whereas RASA3 CanT significantly suppressed invasion (P<0.001) (FIG. 3e, FIG. 13). Similarly, RASA3 SomT transfection into GES1 cells significantly stimulated migration (P<0.01, FIG. 3e) and invasion (P<0.01, FIG. 13), while RASA3 CanT did not. In KRAS-mutant AGS GC cells, which are innately highly migratory, RASA3 CanT expression potently suppressed migration, whereas RASA3 SomT produced significantly less attenuation (P<0.01, FIG. 13). These results suggest that tumor-specific use of RASA3 SomT is likely to increase GC cell migration and invasion. Notably, neither RASA3 CanT nor SomT transfection altered the proliferation rates of SNU1967, GES1 or AGS cells (FIG. 13). To confirm that these observations were not caused by non-physiological expression levels in vitro, we then examined NCC24 GC cells. These cells normally express high endogenous levels of RASA3 SomT and minimal RASA3 CanT (FIG. 13). Silencing endogenous RASA3 SomT with two independent siRNA constructs significantly inhibited NCC24 migration and invasion (P<0.01 to P<0.001) (FIG. 13), consistent with a role for RASA3 SomT in promoting cancer cell migration and invasion.


In an earlier study, we reported a transcript isoform of the MET receptor tyrosine kinase driven by an internal alternative promoter. This isoform has since been independently confirmed in other cancer types. However, the functional implications of this MET variant remain unclear. RNA-seq and 5′ RACE analysis confirmed expression of this shorter transcript isoform, which is predicted to encode a truncated SEMA domain (FIG. 14). To assess functional differences between wild-type (WT) and variant (Var) MET, we transiently transfected MET-WT and MET-Var into HEK293 cells. In both untreated and HGF-treated conditions, MET-Var transfected cells exhibited significantly higher levels of p-Gab1 (Y627), a key mediator of MET signaling (66) (2.48- to 3.95-fold, MET-Var vs MET-WT; P=0.003 untreated, P<0.05 at T15 and T30). In addition, in HGF-untreated samples, cells transfected with MET-Var also exhibited higher p-ERK1/2 levels (2.74-fold) and higher p-STAT3 (Y705) levels (67-70) (1.80-fold) than MET-WT cells (P=0.023 and P=0.026 for p-ERK1/2 and p-STAT3 (Y705), respectively). These results suggest that expression of the MET-Var isoform may enhance signaling downstream of MET in a manner important for GC tumorigenesis.
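
The fold changes above compare phospho-protein levels between MET-Var and MET-WT cells. The text does not state how the band intensities were quantified. The sketch below assumes a common approach: each phospho signal is normalized to its matching total-protein signal, and the mean normalized signal of MET-Var lanes is divided by that of MET-WT lanes. The helper names and intensity values are hypothetical placeholders, not data from this study.

```python
from statistics import mean


def normalized_signal(phospho: float, total: float) -> float:
    """Phospho band intensity divided by its total-protein band intensity."""
    if total <= 0:
        raise ValueError("total-protein intensity must be positive")
    return phospho / total


def fold_change(var_pairs, wt_pairs) -> float:
    """Mean normalized signal in MET-Var lanes over mean normalized signal in MET-WT lanes.

    Each argument is a list of (phospho, total) intensity pairs, one per replicate lane.
    """
    var = mean(normalized_signal(p, t) for p, t in var_pairs)
    wt = mean(normalized_signal(p, t) for p, t in wt_pairs)
    return var / wt


if __name__ == "__main__":
    # Placeholder densitometry values (arbitrary units), three replicates per construct.
    met_var = [(5200.0, 2000.0), (4800.0, 1900.0), (5100.0, 2100.0)]
    met_wt = [(1900.0, 2000.0), (2100.0, 2050.0), (1800.0, 1950.0)]
    print(f"p-Gab1 fold change (Var/WT): {fold_change(met_var, met_wt):.2f}")
```
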


Somatic Promoters Correlate with Tumor Immunity


Cancer immunoediting is a process in which developing tumors sculpt their immunogenic and antigenic profiles to evade host immune surveillance. Mechanisms of cancer immunoediting are diverse and include upregulation of immune checkpoint ligands such as PD-L1. To explore potential contributions of somatic promoters to tumor immunity, we identified somatic promoter-associated N-terminal peptides with high predicted binding affinity (IC50≤50 nM, FIG. 4a) for the MHC class I HLA alleles predicted for the GC samples (Tables 8 and 9). MHC class I molecules are required for antigen presentation to CD8+ cytotoxic T cells. Using the NetMHCpan-2.8 algorithm, we found that recurrent somatic promoter-associated peptides were significantly enriched for high-affinity MHC I binding. This enrichment held against multiple control peptide populations: canonical GC peptides (average 36% vs 24%; P<0.01), randomly selected peptides (P<0.001), and C-terminal peptides (P<0.01). FIG. 4b shows HLA-A, B and C combined; FIG. 15a shows HLA-A only. Most high-affinity somatic promoter-associated peptides corresponded to cases in which the somatic transcript lacking the N-terminal peptide is overexpressed in tumors relative to normal tissues (78% lost; 76/97 high-affinity peptides, FIG. 4c). Notably, transcripts driven by these N-terminal-lacking somatic TSSs are overexpressed in tumors to a significantly greater degree than transcripts driven by the canonical TSS (P<0.05, one-sided Wilcoxon test) (FIG. 12). This scenario would therefore be predicted to cause relative depletion of these immunogenic N-terminal peptides in tumors.
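
The logic of this N-terminal analysis can be illustrated with a short sketch. It rests on several assumptions that are not the authors' exact pipeline:

- the somatic isoform's protein is an N-terminally truncated form of the canonical protein, so the canonical sequence ends with the somatic one;
- candidate class I peptides are the 8- to 11-mer windows that include at least one residue of the lost N-terminal segment;
- the IC50 ≤ 50 nM cutoff is applied to affinities that would come from an external predictor such as NetMHCpan, which is not called here.

The helper names, toy sequences and IC50 values are all hypothetical.

```python
from typing import Dict, List, Sequence

HIGH_AFFINITY_IC50_NM = 50.0  # cutoff stated in the text: IC50 <= 50 nM


def lost_n_terminal_segment(canonical: str, somatic: str) -> str:
    """Residues of the canonical protein that are missing from the N-terminally truncated isoform."""
    if not canonical.endswith(somatic):
        raise ValueError("somatic isoform is not a C-terminal suffix of the canonical protein")
    return canonical[: len(canonical) - len(somatic)]


def n_terminal_peptides(canonical: str, somatic: str,
                        lengths: Sequence[int] = (8, 9, 10, 11)) -> List[str]:
    """Candidate class I peptides (k-mers) that include at least one lost N-terminal residue."""
    lost_len = len(lost_n_terminal_segment(canonical, somatic))
    peptides = []
    for k in lengths:
        # Any window starting inside the lost segment contains at least one lost residue.
        for start in range(min(lost_len, len(canonical) - k + 1)):
            peptides.append(canonical[start:start + k])
    return peptides


def high_affinity(ic50_nm_by_peptide: Dict[str, float]) -> List[str]:
    """Peptides whose predicted IC50 (from an external predictor) meets the cutoff."""
    return [p for p, ic50 in ic50_nm_by_peptide.items() if ic50 <= HIGH_AFFINITY_IC50_NM]


if __name__ == "__main__":
    canonical = "MACPWKFLFKTKFHQYAMNGEKD"  # toy canonical protein
    somatic = canonical[10:]              # toy isoform lacking the first 10 residues
    candidates = n_terminal_peptides(canonical, somatic)
    print(len(candidates), candidates[:3])
    # Hypothetical IC50 values (nM); real values would come from a predictor such as NetMHCpan.
    print(high_affinity({candidates[0]: 35.0, candidates[1]: 420.0}))
    # The reported proportion: 76 of 97 high-affinity peptides.
    print(f"{76 / 97:.0%}")
```
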
Interestingly, we repeated the N-terminal analysis using RNA-seq data alone, without epigenomic data. Epigenome-guided N-terminal peptides had significantly higher predicted immunogenicity scores than peptides identified from RNA-seq alone (36.10% vs 27% for MHC presentation, P=0.02, Fisher's exact test). This suggests that epigenome-guided promoter identification adds complementary value to RNA-seq-only analyses (FIG. 15).
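
The comparison of epigenome-guided and RNA-seq-only peptides is a Fisher's exact test on a 2×2 table: peptide source by whether the peptide is a predicted high-affinity binder. The per-group counts are not given in this section, so the sketch below uses placeholder counts. It implements the common two-sided definition, in which the p-value sums the probabilities of all tables with the same margins that are no more likely than the observed one. It uses exact fractions to avoid floating-point ties. If SciPy is available, `scipy.stats.fisher_exact` serves the same purpose.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: peptide source (e.g. epigenome-guided, RNA-seq-only).
    Columns: predicted high-affinity MHC I binder (yes, no).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x: int) -> Fraction:
        # Hypergeometric probability that the top-left cell equals x, given fixed margins.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), total)

    observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    p_value = sum((q for q in map(prob, support) if q <= observed), Fraction(0))
    return float(p_value)


if __name__ == "__main__":
    # Placeholder counts (high-affinity, not high-affinity) per group; the study's
    # actual denominators are not given in this section.
    print(fisher_exact_two_sided(36, 64, 27, 73))
```
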

TABLE 8

HLA prediction of GC samples

Sample    A1       A2       B1       B2       C1       C2
2000639   A*33:03  A*24:02  B*58:01  B*40:01  C*03:02  C*03:67
2000721   A*11:01  A*11:01  B*46:01  B*15:01  C*01:02  C*04:01
2000986   A*24:02  A*11:01  B*40:01  B*38:02  C*07:02  C*15:02
980437    A*33:03  A*02:07  B*40:01  B*39:01  C*07:02  C*04:01
990068    A*02:03  A*11:01  B*51:01  B*55:02  C*08:01  C*14:02
2000085   A*24:07  A*34:01  B*15:21  B*15:21  C*04:03  C*04:03
980401    A*33:03  A*11:01  B*58:01  B*40:01  C*03:02  C*07:02
980447    A*11:01  A*11:01  B*38:02  B*27:04  C*12:02  C*07:02
2001206   A*02:07  A*24:02  B*46:01  B*40:06  C*01:02  C*08:01
980436    A*02:03  A*02:07  B*46:01  B*46:01  C*01:02  C*01:02
980417    A*33:03  A*11:01  B*58:01  B*46:01  C*03:02  C*01:02
980319    A*33:03  A*11:02  B*58:01  B*27:04  C*03:02  C*12:02
20021007  A*24:10  A*24:02  B*15:27  B*40:01  C*03:04  C*04:01

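Table 8 lists the predicted HLA class I genotype of each GC sample. The alleles considered in the binding predictions appear to be drawn from this panel. As a small illustration, not the authors' pipeline, the genotypes can be held in a dictionary to list the distinct alleles per locus and to count how many samples carry each allele. The code only transcribes Table 8; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# HLA class I genotypes predicted for each GC sample, transcribed from Table 8.
GENOTYPES: Dict[str, List[str]] = {
    "2000639": ["A*33:03", "A*24:02", "B*58:01", "B*40:01", "C*03:02", "C*03:67"],
    "2000721": ["A*11:01", "A*11:01", "B*46:01", "B*15:01", "C*01:02", "C*04:01"],
    "2000986": ["A*24:02", "A*11:01", "B*40:01", "B*38:02", "C*07:02", "C*15:02"],
    "980437": ["A*33:03", "A*02:07", "B*40:01", "B*39:01", "C*07:02", "C*04:01"],
    "990068": ["A*02:03", "A*11:01", "B*51:01", "B*55:02", "C*08:01", "C*14:02"],
    "2000085": ["A*24:07", "A*34:01", "B*15:21", "B*15:21", "C*04:03", "C*04:03"],
    "980401": ["A*33:03", "A*11:01", "B*58:01", "B*40:01", "C*03:02", "C*07:02"],
    "980447": ["A*11:01", "A*11:01", "B*38:02", "B*27:04", "C*12:02", "C*07:02"],
    "2001206": ["A*02:07", "A*24:02", "B*46:01", "B*40:06", "C*01:02", "C*08:01"],
    "980436": ["A*02:03", "A*02:07", "B*46:01", "B*46:01", "C*01:02", "C*01:02"],
    "980417": ["A*33:03", "A*11:01", "B*58:01", "B*46:01", "C*03:02", "C*01:02"],
    "980319": ["A*33:03", "A*11:02", "B*58:01", "B*27:04", "C*03:02", "C*12:02"],
    "20021007": ["A*24:10", "A*24:02", "B*15:27", "B*40:01", "C*03:04", "C*04:01"],
}


def allele_panel(genotypes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Distinct alleles per locus (A, B, C) across all samples."""
    panel: Dict[str, set] = {}
    for alleles in genotypes.values():
        for allele in alleles:
            locus = allele.split("*", 1)[0]
            panel.setdefault(locus, set()).add(allele)
    return {locus: sorted(found) for locus, found in sorted(panel.items())}


def carrier_counts(genotypes: Dict[str, List[str]]) -> Counter:
    """Number of samples carrying each allele; homozygous samples are counted once."""
    return Counter(allele for alleles in genotypes.values() for allele in set(alleles))


if __name__ == "__main__":
    for locus, alleles in allele_panel(GENOTYPES).items():
        print(locus, len(alleles), alleles)
    print(carrier_counts(GENOTYPES).most_common(3))
```
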
TABLE 9

Recurrent N-terminal sequences with high affinity to MHC Class I

SEQ ID NO.    Gene    N-terminal sequence    High Affinity HLA

SEQ ID NO: 1847  ENSG00000007171.12
N-terminal sequence: MACPWKFLFKTKFHQYAMNGEKDINNNVEKAPCATSSPVTQDDLQYHNLSKQQNESPQPLVETGKKSPESLVKLDATPLSSPRHVRIKNWGSGMTFQDTLHHKAKGILTCRSKSCLGSIMTPKSLTRGPRDKPTPPDELLPQAIEFVNQYYGSFKEAKIEEHLARVEAVTKEIETTGTYQLTGDELIFATKQAWRNAPRCIGRIQWSNLQVFDARSCSTARE
High Affinity HLA: A*02:03, A*02:07, A*11:01, A*11:02, A*24:10, A*34:01, B*15:01, B*15:21, B*15:27, B*27:04, B*39:01, B*40:01, B*46:01, B*58:01, C*03:02, C*12:02

SEQ ID NO: 1848  ENSG00000011028.9
N-terminal sequence: MGPGRPAPAPWPRHLLRCVLLLGCLHLGRPGAPGDAALPEPNVFLIFSHGLQGCLEAQGGQVRVTPACNTSLPAQRWKWVSRNRLFNLGTMQCLGTGWPGTNTTASLGMYECDREALNLRWHCRTLGDQLSLLLGARTSNISKPGTLERGDQTRSGQWRIYGSEEDLCALPYHEVYTIQGNSHGKPCTIPFKYDNQWFHGCTSTGREDGHLWCATTQDYGKDERWGFCPIKSNDCETFWDKDQLTDSCYQFNFQSTLSWREAWASCEQQGADLLSITEIHEQTYINGLLTGYSSTLWIGLNDLDTSGGWQWSDNSPLKYLNWESDQPDNPSEENCGVIRTESSGGWQNRDCSIALPYVCKKKPNATAEPTPPDRWANVKVECEPSWQPFQGHCYRLQAEKRSWQESKKACLRGGGDLVSIHSMAELEFITKQIKQEVEELWIGLNDLKLQMNFEWSDGSLVSFTHWHPFEPNNFRDSLEDCVTIWGPEGRWNDSPCNQSLPSICKKAGQLSQGAAEEDHGCRKGWTWHSPSCYWLGEDQVTYSEARRLCTDHGSQLVTITNREEQAFVSSLIYNWEGEYFWTALQDLNSTGSFFWLSGDEVMYTHWNRDQPGYSRGGCVALATGSAMGLWEVKNCTSFRARYICRQSLGTPVTPELPGPDPTPSLTGSCPQGWASDTKLRYCYKVFSSERLQDKKSWVQAQGACQELGAQLLSLASYEEEHFVANMLNKIFGESEPEIHEQHWFWIGLNRRDPRGGQSWRWSDGVGFSYHNFDRSRHDDDDIRGCAVLDLASLQWVAMQCDTQLDWICKIPRGTDVREPDDSPQGRREWLRFQEAEYKFFEHHSTWAQAQRICTWFQAELTSVHSQAELDFLSHNLQKFSRAQEQHWWIGLHTSESDGRFRWTDGSIINFISWAPGKPRPVGKDKKCVYMTASREDWGDQRCLTALPYICKRSNVTKETQPPDLPTTALGGCPSDWIQFLNKCFQVQGQEPQSRVKWSEAQFSCEQQEAQLVTITNPLEQAFITASLPNVTFDLWIGLHASQRDFQWVEQEPLMYANWAPGEPSGPSPAPSGNKPTSCAVVLHSPSAHFTGRWDDRSCTEETHGFICQKGTDPSLSPSPAALPPAPGTELSYLNGTFRLLQKPLRWHDALLLCESRNASLAYVPDPYTQAFLTQAARGLRTPLWIGLAGEEGSRRYSWVSEEPLNYVGWQDGEPQQPGGCTYVDVDGAWRTTSCDTKLQGAVCGVSSGPPPPRRISYHGSCPQGLADSAWIPEREHCYSFHMELLLGHKEARQRCQRAGGAVLSILDEMENVFVWEHLQSYEGQSRGAWLGMNFNPKGGTLVWQDNTAVNYSNWGPPGLGPSMLSHNSCYWIQSNSGLWRPGACTNITMGVVCKLPRAEQSSFSPSALPENPAALVVVLMAVLLLLALLTAALILYRRRQSIERGAFEGARYSRSSSSPTEATEKNILVSDMEMNEQQE
High Affinity HLA: A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, B*15:01, B*15:27, B*38:02, B*39:01, B*40:01, B*40:06, B*51:01, B*58:01, C*03:02, C*03:04, C*12:02, C*14:02

SEQ ID NO: 1849  ENSG00000020256.15
N-terminal sequence: MNASSEGESFAGSVQIPGGTTVLVELTPDIHICGICKQQFNNLDAFVAHKQSGCQLTGTSAAAPSTVQFVSEETVPATQTQTTTRTITSETQTITVSAPEFVFEHGYQTY
High Affinity HLA: A*02:03, B*15:01, C*03:02, C*03:04

SEQ ID NO: 1850  ENSG00000032389.8
N-terminal sequence: MEDDAPVIYGLEFQARALTPQTAETDAIRFLVGTQSLKYDNQIHIIDFDDENNIINKNVLLHQAGEIWHISASPADRGVLTTCYNRRDIIESFGILPVAQSPTIVFVNTLHQVFFRGQVAASDSKVLTCAAVWR
High Affinity HLA: A*02:03, A*24:07, A*24:10, A*33:03, B*15:01, B*15:21, B*15:27, B*38:02, B*39:01, B*40:01, B*40:06, B*46:01, B*51:01, B*55:02, B*58:01, C*01:02, C*03:02, C*03:04, C*03:67, C*04:01, C*08:01, C*12:02, C*14:02, C*15:02

SEQ ID NO: 1851  ENSG00000037042.8
N-terminal sequence: MLEAILGGGGLPVEGRGSTEFEAFRLILFGSEDSVLPSPLLYKMAHMGSDGGVLPVHYATILFSL
High Affinity HLA: A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, B*40:01, B*40:06, B*51:01, C*01:02, C*04:03, C*08:01, C*14:02

SEQ ID NO: 1852  ENSG00000053747.11
N-terminal sequence: MAAAARPRGRALGPVLPPTPLLLLVLRVLPACGATARDPGAAAGLSLHPTYFNLAEAARIWATATCGERGPGEGRPQPELYCKLVGGPTAPGSGHTIQGQFCDYCNSEDPRKAHPVTNAIDGSERWWQSPPLSSGTQYNRVNLTLDLGQLFHVAYILIKFANSPRPDLWVLERSVDFGSTYSPWQYFAHSKVDCLKEFGREANMAVTRDDDVLCVTEYSRIVPLENGEVVVSLINGRPGAKNFTFSHTLREFTKATNIRLRFLRTNTLLGHLISKAQRDPTVTRRYYYSIKDISIGGQCVCNGHAEVCNINNPEKLFRCECQHHTCGETCDRCCTGYNQRRWRPAAWEQSHECEACNCHGHASNCYYDPDVERQQASLNTQGIYAGGGVCINCQHNTAGVNCEQCAKGYYRPYGVPVDAPDGCIPCSCDPEHADGCEQGSGRCHCKPNFHGDNCEKCAIGYYNFPFCLRIPIFPVSTPSSEDPVAGDIKGCDCNLEGVLPEICDAHGRCLCRPGVEGPRCDTCRSGFYSFPICQACWCSALGSYQMPCSSVTGQCECRPGVTGQRCDRCLSGAYDFPHCQGSSSACDPAGTINSNLGYCQCKLHVEGPTCSRCKLLYWNLDKENPSGCSECKCHKAGTVSGTGECRQGDGDCHCKSHVGGDSCDTCEDGYFALEKSNYFGCQGCQCDIGGALSSMCSGPSGVCQCREHVVGKVCQRPENNYYFPDLHHMKYEIEDGSTPNGRDLRFGFDPLAFPEFSWRGYAQMTSVQNDVRITLNVGKSSGSLFRVILRYVNPGTEAVSGHITIYPSWGAAQSKEIIFLPSKEPAFVTVPGNGFADPFSITPGIWVACIKAEGVLLDYLVLLPRDYYEASVLQLPVTEPCAYAGPPQENCLLYQHLPVTRFPCTLACEARHFLLDGEPRPVAVRQPTPAHPVMVDLSGREVELHLRLRIPQVGHYVVVVEYSTEAAQLFVVDVNVKSSGSVLAGQVNIYSCNYSVLCRSAVIDHMSRIAMYELLADADIQLKGHMARFLLHQVCIIPIEEFSAEYVRPQVHCIASYGRFVNQSATCVSLAHETPPTALILDVLSGRPFPHLPQQSSPSVDVLPGVTLKAPQNQVTLRGRVPHLGRYVFVIHFYQAAHPTFPAQVSVDGGWPRAGSFHASFCPHVLGCRDQVIAEGQIEFDISEPEVAATVKVPEGKSLVLVRVLVVPAENYDYQILHKKSMDKSLEFITNCGKNSFYLDPQTASRFCKNSARSLVAFYHKGALPCECHPTGATGPHCSPEGGQCPCQPNVIGRQCTRCATGHYGFPRCKPCSCGRRLCEEMTGQCRCPPRTVRPQCEVCETHSFSFHPMAGCEGCNCSRRGTIEAAMPECDRDSGQCRCKPRITGRQCDRCASGFYRFPECVPCNCNRDGTEPGVCDPGTGACLCKENVEGTECNVCREGSFHLDPANLKGCTSCFCFGVNNQCHSSHKRRTKFVDMLGWHLETADRVDIPVSFNPGSNSMVADLQELPATIHSASWVAPTSYLGDKVSSYGGYLTYQAKSFGLPGDMVLLEKKPDVQLTGQHMSIIYEETNTPRPDRLHHGRVHVVEGNFRHASSRAPVSREELMTVLSRLADVRIQGLYFTETQRLTLSEVGLEEASDTGSGRIALAVEICACPPAYAGDSC
High Affinity HLA: A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, B*15:01, B*39:01, B*40:01, B*55:02, B*58:01, C*03:02, C*03:04, C*03:67, C*07:02, C*12:02, C*14:02, C*15:02

SEQ ID NO: 1853  ENSG00000059145.14
N-terminal sequence: MPSVSKAAAAALSGSPPQTEKPTHYRYLKEFRTEQCPLFSQHKCAQHRPFTCFHWHFLNQRRRRPLRRRDGTFNYSPDVYCSKYNEATGVCPDGDECPYLHRTTGDTERKYHLRYYKTGTCIHETDARGHCVKNGLHCAFAHGPLDLRPPVCDVRELQAQEALQNGQLGGGEGVPDLQPGVLASQAMIEKILSEDPRWQDANFVLGSYKTEQCPKPPRLCRQGYACPHYHNSRDRRRNPRRFQYRSTPCPSVKHGDEWGEPSRCDGGDGCQYCHSRTEQQFHPESTKCNDMRQTGYCPRGPFCAFAHVEKSLGMVNEWGCHDLHLTSPSSTGSGQPGNAKRRDSPAEGGPRGSEQDSKQNHLAVFAAVHPPAPSVSSSVASSLASSAGSGSSSPTALPAPPARALPLGPASSTVEAVLGSALDLHLSNVNIASLEKDLEEQDGHDLGAAGPRSLAGSAPVAIPGSLPRAPSLHSPSSASTSPLGSLSQPLPGPVGSSA
High Affinity HLA: A*02:03, A*24:10, A*33:03, B*15:01, B*39:01, B*40:01, B*58:01, C*03:02, C*03:04, C*15:02

SEQ ID NO: 1854  ENSG00000060656.15
N-terminal sequence: MARAQALVLALTFQLCAPETETPAAGCTFEEASDPAVPCEYSQAQYDDFQWEQVRIHPGTRAPADLPHGSYLMVNTSQHAPGQRAHVIFQSLSENDTHCVQFSYFLYSRDGHSPGTLGVYVRVNGGPLGSAVWNMTGSHGRQWHQAELAVSTFWPNEYQVLFEALISPDRRGYMGLDDILLLSYPCAKAPHFSRLGDVEVNAGQNASFQCMAAGRAAEAERFLLQRQSGALVPAAGVRHISHRRFLATEPLAAVSRAEQDLYRCVSQAPRGAGVSNFAELIVKEPPTPIAPPQLLRAGPTYLIIQLNTNSIIGDGPIVRKEIEYRMARGPWAEVHAVSLQTYKLWHLDPDTEYEISVLLTRPGDGGTGRPGPPLISRTKCAEPMRAPKGLAFAEIQARQLTLQWEPLGYNVTRCHTYTVSLCYHYTLGSSHNQTIRECVKTEQGVSRYTIKNLLPYRNVHVRLVLTNPEGRKEGKEVTFQTDEDVPSGIAAESLTFTPLEDMIFLKWEEPQEPNGLITQYEISYQSIESSDPAVNVPGPRRTISKLRNETYHVFSNLHPGTTYLFSVRARTGKGFGQAALTEITTNISAPSEDYADMPSPLGESENTITVLLRPAQGRGAPISVYQVIVEEERARRLRREPGGQDCFPVPLTFEAALARGLVHYFGAELAASSLPEAMPFTVGDNQTYRGFWNPPLEPRKAYLIYFQAASHLKGETRLNCIRIARKAACKESKRPLEVSQRSEEMGLILGICAGGLAVLILLLGAIIVIIRKGKPVNMTKATVNYRQEKTHMMSAVDRSFTDQSTLQEDERLGLSFMDTHGYSTRGDQRSGGVTEASSLLGGSPRRPCGRKGSPYHTGQLHPAVRVADLLQHINQMKTAEGYGFKQEYESFFEGWDATKKKDKVKGSRQEPMPAYDRHRVKLHPMLGDPNADYINANYIDGYHRSNHFIATQGPKPEMVYDFWRMVWQEHCSSIVMITKLVEVGRVKCSRYWPEDSDTYGDIKIMLVKTETLAEYVVRTFALERRGYSARHEVRQFHFTAWPEHGVPYHATGLLAFIRRVKASTPPDAGPIVIHCSAGTGRTGCYIVLDVMLDMAECEGVVDIYNCVKTLCSRRVNMIQTEEQYIFIHDAILEACLCGETTIPVSEFKATYKEMIRIDPQSNSSQLREEFQTLNSVTPPLDVEECSIALLPRNRDKNRSMDVLPPDRCLPFLISTDGDSNNYINAALTDSYTRSAAFIVTLHPLQSTTPDFWRLVYDYGCTSIVMLNQLNQSNSAWPCLQYWPEPGRQQYGLMEVEFMSGTADEDLVARVFRVQNISRLQEGHLLVRHFQFLRWSAYRDTPDSKKAFLHLLAEVDKWQAESGDGRTIVHCLNGGGRSGTFCACATVLEMIRCHNLVDVFFAAKTLRNYKPNMVETMDQYHFCYDVALEYLEGLESR
High Affinity HLA: A*02:03, A*11:01, A*11:02, A*24:02, A*24:10, A*33:03, A*34:01, B*15:01, B*15:27, B*38:02, B*39:01, B*40:01, B*55:02, B*58:01, C*03:02, C*03:04, C*07:02, C*12:02, C*14:02, C*15:02

SEQ ID NO: 1855  ENSG00000066248.10
N-terminal sequence: METRESEDLEKTRRKSASDQWNTDNEPAKVKPELLPEKEETSQADQDIQDKEPHCHIPIKRNSIFNRSIRRKSKAKARDNPERNASCLADSQDNGKSVNEPLTLNIPWSRMPPCRT
High Affinity HLA: A*02:03, A*11:01, A*11:01, A*11:02, A*11:02, A*24:02, A*24:10, A*33:03, A*33:03, A*34:01, B*15:01, B*15:21, B*15:27, B*39:01, B*40:01, B*46:01, B*58:01, C*03:02, C*03:04, C*03:67, C*12:02, C*14:02

SEQ ID NO: 1856  ENSG00000077092.14
N-terminal sequence: MTTSGHACPVPAVNGHMTHYPATPYPLLFPPVIGGLSLPPLHGLHGHPPPSGCSTPSPATIETQS
High Affinity HLA: A*24:02, A*24:07, A*24:10, A*34:01, B*15:01, B*15:21, B*15:27, B*46:01, B*51:01, B*55:02, C*01:02, C*03:02, C*04:01, C*07:02, C*12:02, C*14:02

SEQ ID NO: 1857  ENSG00000079308.12
N-terminal sequence: MTRLSWCFSCVIRWGKYLFSCLLPLRFCLRSQPEDLEAPKTHRFKVKTFKKVKPCGICRQVITQEGCTCKVCSFSCHRKCQAKVAAPCVPPSNHELVPITTENAPKNVVDKGEGASRGGNTRKSLEDNGSTRVTPSVQPHLQPIRN
High Affinity HLA: A*02:03, A*02:07, B*27:04, B*39:01, B*46:01, C*01:02, C*03:02, C*03:04, C*03:67, C*08:01, C*14:02

SEQ ID NO: 1858  ENSG00000080823.17
N-terminal sequence: MKNYKAIGKIGEGTFSEVMKMQSLRDGNYYACKQMKQRFESIEQVNNLREIQALRRLNPHPNILMLHEVVFDRKSGSLALICELMDMNIYELIRGRRYPLSEKKIMHYMYQLCKSLDHIHRNGIFHRDVKPENILIKQDVLKLGD
High Affinity HLA: A*02:03, A*33:03, B*40:01, C*03:02, C*14:02

SEQ ID NO: 1859  ENSG00000097021.15
N-terminal sequence: MARPGLIHSAPGLPDTCALLQPPAASAAAAPS
High Affinity HLA: A*02:03

SEQ ID NO: 1860  ENSG00000100441.5
N-terminal sequence: MPTWGARPASPDRFAVSAEAENKVREQQPHVERIFSVGVSVLPKDCPDNPHIWLQLEGPKENASRAKEYLKGLCSPELQDEIHYPPKLHCIFLGAQGFFLDCLAWSTSAHLVPRAPGSLMISGLTEAFVMAQSRVEELAERLSWDFTPGPSSGASQCTGVLRDFSALLQSPGDAHREALLQLPLAVQEELLSLVQEASSGQGPGALASWEGRSSALLGAQCQGVRAPPSDGRESLDTGSMGPGDCRGARGDTYAVEKEGGKQGGPREMDWGWKELPGEEAWEREVALRPQSVGGGARESAPLKGKALGKEEIALGGGGFCVHREPPGAHGSCHRAAQSRGASLLQRLHNGNASPPRVPSPPPAPEPPWHCGDRGDCGDRGDVGDRGDKQQGMARGRGPQWKRGARGGNLVTGTQRFKEALQDPFTLCLANVPGQPDLRHIVIDGSNVAMVHGLQHYFSSRGIAIAVQYFWDRGHRDITVFVPQWRFSKDAKVRESHFLQKLYSLSLLSLTPSRVMDGKRISSYDDRFMVKLAEETDGIIVSNDQFRDLAEESEKW
High Affinity HLA: A*02:03, A*02:07, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, B*15:01, B*15:21, B*15:27, B*40:01, B*40:06, B*55:02, B*58:01, C*03:02, C*03:04, C*03:67, C*04:01, C*04:03, C*07:02, C*08:01, C*14:02, C*15:02

SEQ ID NO: 1861  ENSG00000103056.7
N-terminal sequence: MVLYTTPFPNSCLSALHCVSWALIFPCYWLVDRLAASFIPTTYEKRQRADDPCCLQLLCTALFTPIYLALLVASLPFAFLGFLFWSPLQSARRPYIYSRLEDKGLAGGAALLSEWKGTGPGKSFCFATANVCLLPDSLARVNNLFNTQARAKEIGQRIRNGAARPQIKIYIDSPTNTSISAASFSSLVSPQGGDGVARAVPGSIKRTASVEYKGDGGRHPGDEAANGPASGDPVDSSSPEDACIVRIGGEEGGRPPEADDPVPGGQARNGAGGGPRGQTPNHNQQDGDSGSLGSPSASRESLVKGRAGPDTSASGEPGANSKLLYKASVVKKAAARRRRHPDEAFDHEVSAFFPANLDFLCLQEVFDKRAATKLKEQLHGYFEYILYDVGVYGCQGCCSFKCLNSGLLFASRYPI
High Affinity HLA: A*02:03, A*02:07, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, B*15:01, B*15:21, B*15:27, B*27:04, B*38:02, B*39:01, B*40:01, B*40:06, B*46:01, B*51:01, B*55:02, B*58:01, C*01:02, C*03:02, C*03:04, C*03:67, C*04:01, C*04:03, C*07:02, C*08:01, C*12:02, C*15:02

SEQ ID NO: 1862  ENSG00000103227.14
N-terminal sequence: MLGAGLIKIRGDRCWRDLTCMDFHYETQPMPNPVAYYLHHSPWWFHRFETLSNHFIELLVPFFLFLGRRACIIHGVLQILFQAVLIVSGNLSFLNWLTMVPSLACFDDATLGFLFPSGPGSLKDRVLQMQRDIRGARPEPRFGSVVRRAANVSLGVLLAWLSVPVVLNLLSSRQVMNTHFNSLHIVNTYGAFGSITKERAEVILQGTASSNASAPDAMWEDYEFKCKPGDPSRRPCLISPYHYRLDWLMWFAAFQTYEHNDWIIHLAGKLLASDAEALSLLAHNPFAGRPPPRWVRGEHYRYKFSRPGGRHAAEGKWWVRKRIGAYFPPLS
High Affinity HLA: A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, B*15:01, B*38:02, B*40:01, B*58:01, C*03:02, C*03:04, C*07:02, C*14:02, C*15:02

SEQ ID NO: 1863  ENSG00000105559.7
N-terminal sequence: MEGSRPRSSLSLASSASTISSLSSLSPKKPTRAVNKIHAFGKRGNALRRDPNLPVHIRGWLHKQDSSGLRLWKRRWFVLSGHCLFYYKDSREESVLGSVLLPSYNIRPDGPGAPRGRRFTFTAEHPGMRTYVLAADTLEDLRGWLRALGRASRAEGDDYGQPRSPARPQPGEGPGGPGGPPEVSRGEEGRISESPEVTRLSRGRGRPRLLTPSPTTDLHSGLQMRRARSPDLFTPLSRPPSPLSLPRPRSAPARRPPAPSGDTAPPARPHTPLSRIDVRPPLDWGPQRQTLSRPPTPRRGPPSEAGGGKPPRSPQHWSQEPRTQAHSGSPTYLQLPPRPPGTRASMVLLPGPPLESTFHQSLETDTLLTKLCGQDRLLRRLQEEIDQKQEEKEQLEAALELTRQQLGQATREAGAPGRAWGRQRLLQDRLVSVRATLCHLTQERERVWDTYSGLEQELGTLRETLEYLLHLGSPQDRVSAQQQLWMVEDTLAGLGGPQKPPPHTEPDSPSPVLQGEESSERESLPESLELSSPRSPETDWGRPPGGDKDLASPHLGLGSPRVSRASSPEGRHLPSPQLGTKAPVARPRMSAQEQLERMRRNQECGRPFPRPTSPRLLTLGRTLSPARRQPDVEQRPVVGHSGAQKWLRSSGSWSSPRNTTPYLPTSEGHRERVLSLSQALATEASQWHRMMTGGNLDSQGDPLPGVPLPPSDPTRQETPPPRSPPVANSGSTGFSRRGSGRGGGPTPWGPAWDAGIAPPVLPQDEGAWPLRVTLLQSSF
High Affinity HLA: A*02:03, A*11:01, A*11:02, A*24:10, A*33:03, B*39:01, B*40:01, B*58:01, C*03:02, C*03:04, C*14:02

SEQ ID NO: 1864  ENSG00000105639.14
N-terminal sequence: MAPPSEETPLIPQRSCSLLSTEAGALHVLLPARGPGPPQRLSFSFGDHLAEDLCVQAAKASGILPVYHSLFALATEDLSCWFPPSHIFSVEDASTQVLLYRIRFYFPNWFGLEKCHRFGLRKDLASAILDLPVLEHLFAQHRSDLVSGRLPVGLSLKEQGECLSLAVLDLARMAREQAQRPGELLKTVSYKACLPPSLRDLIQGLSFVTRRRIRRTVRRALRRVAACQADRHSLMAKYIMDLERLDPAGAAETFHVGLPGALGGHDGLGLLRVAGDGGIAWTQGEQEVLQPFCDFPEIVDISIKQAPRVGPAGEHRLVTVTRTDNQILEAEFPGLPEALSFVALVDGYFRLTTDSQHFFCKEVAPPRLLEEVAEQCHGPITLDFAINKLKTGGSRPGSYVLRRSPQDFDSFLLTVCVQNPLGPDYKGCLIRRSPTGTFLLVGLSRPHSSLRELLATCWDGGLHVDGVAVTLTSCCIPRPKEKSNLIVVQRGHSPPTSSLVQPQSQYQLSQMTFHKIPADSLEWHENLGHGSFTKIYRGCRHEVVDGEARKTEVLLKVMDAKHKNCMESFLEAASLMSQVSYRHLVLLHGVCMAGDSTMVQEFVHLGAIDMYLRKRGHLVPASWKLQVVKQLAYALNYLEDKGLPHGNVSARKVLLAREGADGSPPFIKLSDPGVSPAVLSLEMLTDRIPWVAPECLREAQTLSLEADKWGFGATVWEVFSGVTMPISALDPAKKLQFYEDRQQLPAPKWTELALLIQQCMAYEPVQRPSFRAVIRDLNSLISSDYELLSDPTPGALAPRDGLWNGAQLYACQDPTIFEERHLKYISQLGKGNFGSVELCRYDPLGDNTGALVAVKQLQHSGPDQQRDFQREIQILKALHSDFIVKYRGVSYGPGRQSLRLVMEYLPSGCLRDFLQRHRARLDASRLLLYSSQICKGMEYLGSRRCVHRDLAARNILVESEAHVKIADFGLAKLLPLDKDYYVVREPGQSPIFWYAPESLSDNIFSRQSDVWSFGVVLYELFTYCDKSCSPSAEFLRMMGCERDVPALCRLLELLEEGQRLPAPPACPAEVHELMKLCWAPSPQDRPSFSALGPQLDMLWSGSRGCETHAFTAHPEGKHHSLSFS
High Affinity HLA: A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, B*15:01, B*39:01, B*40:01, B*55:02, B*58:01, C*03:02, C*03:04, C*07:02, C*14:02

SEQ ID NO: 1865  ENSG00000105650.17
N-terminal sequence: MQAPVPHSQRRESFLYRSDSDYELSPKAMSRNSSVASDLHGEDMIVTPFAQVLASLRTVRSNVAALARQQCLGAAKQGPVGN
High Affinity HLA: A*02:03, B*15:01, B*39:01, B*40:01, C*03:02, C*03:04, C*15:02

SEQ ID NO: 1866  ENSG00000105963.9
N-terminal sequence: MAKERRRAVLELLQRPGNARCADCGAPDPDWASYTLGVFICLSCSGIHRNIPQVSKVKSVRLDAWEEAQVEFMASHGNDAARARFESKVPSFYYRPTP
High Affinity HLA: A*02:03, A*24:10, B*15:01, C*03:02, C*03:04

SEQ ID NO: 1867  ENSG00000105976.10
N-terminal sequence: MKAPAVLAPGILVLLFTLVQRSNGECKEALAKSEMNVNMKYQLPNFTAETPIQNVILHEHHIFLGATNYIYVLNEEDLQKVAEYKTGPVLEHPDCFPCQDCSSKANLSGGVWKDNINMALVVDTYYDDQLISCGSVNRGTCQRHVFPHNHTADIQSEVHCIFSPQIEEPSQCPDCVVSALGAKVLSSVKDRFINFFVGNTINSSYFPDHPLHSISVRRLKETKDGFMFLTDQSYIDVLPEFRDSYPIKYVHAFESNNFIYFLTVQRETLDAQTFHTRIIRFCSINSGLHSYMEMPLECILTEKRKKRSTKKEVFNILQAAYVSKPGAQLARQIGASLNDDILFGVFAQSKPDSAEPMDRSAMCAFPIKYVNDFFNKIVNKNNVRCLQHFYGPNHEHCFNRTLLRNSSGCEARRDEYRTEFTTALQRVDLFMGQFSEVLLTSISTFIKGDLTIANLGTSEGRFMQVVVSRSGPSTPHVNFLLDSHPVSPEVIVEHTLNQNGYTLVITGKKITKIPLNGLGCRHFQSCSQCLSAPPFVQCGWCHDKCVRSEECLSGTWTQQICLPAIYKVFPNSAPLEGGTRLTICGWDFGFRRNNKFDLKKTRVLLGNESCTLTLSESTMNTLKCTVGPAMNKHFNMSIIISNGHGTTQYSTFSYVDPVITSISPKYGPMAGGTLLTLTGNYLNSGNSRHISIGGKTCTLKSVSNSILECYTPAQTISTEFAVKLKIDLANRETSIFSYREDPIVYEIHPTKSFISGGSTITGVGKNLNSVSVPRMVINVHEAGRNFTVACQHRSNSEIICCTTPSLQQLNLQLPLKTKAFFMLDGILSKYFDLIYVHNPVFKPFEKPVMISMGNENVLEIKGNDIDPEAVKGEVLKVGNKSCENIHLHSEAVLCTVPNDLLKLNSELNIEWKQAISSTVLGKVIVQPDQNFTGLIAGVVSISTALLLLLGFFLWLKKRKQIKDLGSELVRYDARVHTPHLDRLVSARSVSPTTEMVSNESVDYRATFPEDQFPNSSQNGSCRQVQYPLTDMSPILTSGDSDISSPLLQNTVHIDLSALNPELVQAVQHVVIGPSSLIVHFNEVIGRGHFGCVYHGTLLDNDGKKIHCAVKSLNRITDIGEVSQFLTEGIIMKDFSHPNVLSLLGICLRSEGSPLVVLPYMKHGDLRNFIRNETHNPTVKDLIGFGLQVAKGMKYLASKKFVHRDLAARNCMLDEKFTVKVADFGLARDMYDKEYYSVHNKTGAKLPVKWMALESLQTQKFTTKSDVWSFGVLLWELMTRGAPPYPDVNTFDITVYLLQGRRLLQPEYCPDPLYEVMLKCWHPKAEMRPSFSELVSRISAIFSTFIGEHYVHVNATYVNVKCVAPYPSLLSSEDNADDEVDTRPASFWETS
High Affinity HLA: A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, A*34:01, B*15:01, B*15:27, B*39:01, B*40:01, B*58:01, C*03:02, C*03:04, C*03:67, C*07:02, C*12:02, C*14:02, C*15:02

SEQ ID NO: 1868  ENSG00000107317.7
N-terminal sequence: MATHHTLWMGLALLGVLGDLQAAPEAQVSVQPNFQQD
High Affinity HLA: A*02:03, B*15:01, C*03:02, C*03:04, C*12:02

SEQ ID NO: 1869  ENSG00000111700.8
N-terminal sequence: MDQHQHLNKTAESASSEKKKTRRCNGFK
High Affinity HLA: A*11:01, A*11:02

SEQ ID NO: 1870  ENSG00000111860.9
N-terminal sequence: MWGRFLAPEASGRDSPGGARSFPAGPDYSSAWLPANESLWQATTVPSNHRNNHIRRHSIASDSGDTGIGTSCSDSVEDHSTSSGTLSFKPSQSLITLPTAHVMPSNSSASISKLRESLTPDGSKWSTSLMQTLGNHSRGEQDSSLDMKDFRPLRKWSSLSKLTAPDNCGQGGTVCREESRNGLEKIGKAKALTSQLRTIGPSCLHDSMEMLRLEDKEINKKRSSTLDCKYKFESCSKEDFRASSSTLRRQPVDMTYSALPESKPIMTSSEAFEPPKYLMLGQQAVGGVPIQPSVRTQMWLTEQLRTNPLEGRNTEDSYSLAPWQQQQIEDFRQGSETPMQVLTGSSRQSYSPGYQDFSKWESMLKIKEGLLRQKEIVIDRQKQQITHLHERIRDNELRAQHAMLGHYVNCEDSYVASLQPQYENTSLQTPFSEESVSHSQQGEFEQKLASTEKEVLQLNEFLKQRLSLFSEEKKKLEEKLKTRDRYISSLKKKCQKESEQNKEKQRRIETLEKYLADLPTLDDVQSQSLQLQILEEKNKNLQEALIDTEKKLEEIKKQCQDKETQLICQKKKEKELVTTVQSLQQKVERCLEDGIRLPMLDAKQLQNENDNLRQQNETASKIIDSQQDEIDRMILEIQSMQGKLSKEKLTTQKMMEELEKKERNVQRLTKALLENQRQTDETCSLLDQGQEPDQSRQQTVLSKRPLFDLTVIDQLFKEMSCCLFDLKALCSILNQRAQGKEPNLSLLLGIRSMNCSAEETENDHSTETLTKKLSDVCQLRRDIDELRTTISDRYAQDMGDNCITQ
High Affinity HLA: A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, B*15:01, B*15:27, B*39:01, B*40:01, C*03:02, C*03:04, C*14:02

SEQ ID NO: 1871  ENSG00000111912.14
N-terminal sequence: XEKTCSSLEREPHFSLLTMRGQRLPLDIQIFYCARPDEEPFVKIITVEEAKRRKSTCSYYEDEDEEVLPVLRPHSALLENMHIEQLARRLPARVQGYPWRLAYSTLEHGTSLKTLYRKSASLDSPVLLVIKDMDNQIFGAYATHPFKFSDHYYGTGETFLYTFSPHFKVFKWSGENSYFINGDISSLELGGGGGRFGLWLDADLYHGRSNSCSTFNNDILSKKEDFIVQDLEVWAFD
High Affinity HLA: A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, B*15:01, B*15:27, B*40:01, B*55:02, C*03:02, C*03:04, C*03:67, C*12:02, C*14:02, C*15:02

SEQ ID NO: 1872  ENSG00000112033.9
N-terminal sequence: MEQPQEEAPEVREEEEKEEVAEAEGAPELNGGPQHALPSSSYTDLSRSSSPPSLLDQLQMGCDGASCGSLNMECRVCGDKASGFHYGVHACEGCKGFFRRTIRMKLEYEKCERSCKIQKKNRNKCQYCRFQKCLALGMSHNAIRFGRMPEAEKRKLVAGLTANEGSQYNPQVADLKAFSKHIYNAYLKNFNMTKKKARSILTGKASHTAPFVIHDIETLWQAEKGLVWKQLVNGLPPYKEISVHVFYRCQCTTVETVRELTEFAKSIPSFSSLFLNDQVTLLKYGVHEAIFAMLASIVNKDGLLVANGSGFVTREFLRSLRKPFSDIIEPKFEFAVKFNALELDDSDLALFIAAIILCGDRPGLMNVPRVEAIQDTILRALEFHLQANHPDAQYLFP
High Affinity HLA: A*02:03, A*02:07, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, A*34:01, B*15:01, B*15:21, B*15:27, B*27:04, B*38:02, B*39:01, B*40:01, B*40:06, B*46:01, B*51:01, B*55:02, B*58:01, C*01:02, C*03:02, C*03:04, C*04:01, C*04:03, C*07:02, C*08:01, C*12:02, C*15:02

SEQ ID NO: 1873  ENSG00000113594.5
N-terminal sequence: MMDIYVCLKRPSWMVDNKRMRTASNFQWLLSTFILLYLMNQVNSQKKGAPHDLKCVTNNLQVWNCSWKAPSGTGRGTDYEVCIENRSRSCYQLEKTSIKIPALSHGDYEITINSLHDFGSSTSKFTLNEQNVSLIPDTPEILNLSADFSTSTLYLKWNDRGSVFPHRSNVIWEIKVLRKESMELVKLVTHNTTLNGKDTLHHWSWASDMPLECAIHFVEIRCYIDNLHFSGLEEWSDWSPVKNISWIPDSQTKVFPQDKVILVGSDITFCCVSQEKVLSALIGHTNCPLIHLDGENVAIKIRNISVSASSGTNVVFTTEDNIFGTVIFAGYPPDTPQQLNCETHDLKEIICSWNPGRVTALVGPRATSYTLVESFSGKYVRLKRAEAPTNESYQLLFQMLPNQEIYNFTLNAHNPLGRSQSTILVNITEKVYPHTPTSFKVKDINSTAVKLSWHLPGNFAKINFLCEIEIKKSNSVQEQRNVTIKGVENSSYLVALDKLNPYTLYTFRIRCSTETFWKWSKWSNKKQHLTTEASPSKGPDTWREWSSDGKNLIIYWKPLPINEANGKILSYNVSCSSDEETQSLSEIPDPQHKAEIRLDKNDYIISVVAKNSVGSSPPSKIASMEIPNDDLKIEQVVGMGKGILLTWHYDPNMTCDYVIKWCNSSRSEPCLMDWRKVPSNSTETVIESDEFRPGIRYNFFLYGCRNQGYQLLRSMIGYIEELAPIVAPNFTVEDTSADSILVKWEDIPVEELRGFLRGYLFYFGKGERDTSKMRVLESGRSDIKVKNITDISQKTLRIADLQGKTSYHLVLRAYTDGGVGPEKSMYVVTKENSVGLIIAILIPVAVAVIVGVVTSILCYRKREWIKETFYPDIPNPENCKALQFQKSVCEGSSALKTLEMNPCTPNNVEVLETRSAFPKIEDTEIISPVAERPEDRSDAEPENHVVVSYCPPIIEEEIPNPAADEAGGTAQVIYIDVQSMYQPQAKPEEEQENDPVGGAGYKPQMHLPINSTVEDIAAEEDLDKTAGYRPQANVNTWNLVSPDSPRSIDSNSEIVSFGSPCSINSRQFLIPPKDEDSPKSNGGGWSFTNFFQNKPND
High Affinity HLA: A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, A*34:01, B*15:01, B*39:01, B*40:01, B*58:01, C*03:02, C*03:04, C*03:67, C*12:02, C*14:02, C*15:02

SEQ ID NO: 1874
ENSG00000114541.10
A*02:03, A*11:01, A*11:02, A*24:10, A*33:03, A*34:01, B*40:01, B*58:01, C*07:02, C*12:02, C*14:02
MASVFMCGVEDLLFSGSR
FVWNLTVSTLRRWYTERLR
ACHQVLRTWCGLQDVYQ
MTEGRHCQVHLLDDRRLE
LLVQPKLLARELLDLVASHF
NLKEKEYFGITFIDDTGQQ
NWLQLDHRVLDHDLPKKP
GPTILHFAVRFYIESISFLKD
KTTVELFFLNAKACVHKGQ
IEVESETIFKLAAFILQEAKG
DYTSDENARKDLKTLPAFP
TKTLQEHPSLAYCEDRVIEH
YLKIKGLTRGQAVVQY

SEQ ID NO: 1875
ENSG00000115977.14
A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, B*15:01, B*39:01, B*40:01, C*03:02, C*12:02, C*14:02
MKKFFDSRREQGGSGLGS
GSSGGGGSTSGLGSGYIGR
VFGIGRQQVTVDEVLAEG
GFAIVFLVRTSNGMKCALK
RMFVNNEHDLQVCKREIQI
MRDLSGHKNIVGYIDSSIN
NVSSGDVWEVLILMDFCR
GGQVVNLMNQRLQTGFT
ENEVLQIFCDTCEAVARLH
QCKTPIIHRDLKVENILLHD
RGHYVLCDFGSATNKFQN
PQTEGVNAVEDEIKKYTTL
SYRAPEMVNLYSGKIITTKA
DIWALGCLLYKLCYFTLPFG
ESQVAICDGNFTIPDNSRYS
QDMHCLIRYMLEPDPDKR
PDIYQVSYFSFKLLKKECPIP
NVQNSPIPAKLPEPVKASE
AAAKKTQPKARLTDPIPTTE
TSIAPRQRPKAGQTQPNP
GILPIQPALTPRKRATVQPP
PQAAGSSNQPGLLASVPQ
PKPQAPPSQPLPQTQAKQ
PQAPPTPQQTPSTQAQGL
PAQAQATPQHQQQLFLK
QQQQQQQPPPAQQQPA
GTFYQQQQAQTQQFQAV
HPATQKPAIAQFPVVSQG
GSQQQLMQNFYQQQQQ
QQQQQQQQQLATALHQ
QQLMTQQAALQQKPTMA
AGQQPQPQPAAAPQPAP
AQEPAIQAPVRQQPKVQT
TPPPAVQGQKVGSLTPPSS
PKTQRAGHRRILSDVTHSA
VFGVPASKSTQLLQAAAAE
AELLDPGRQTLQ

SEQ ID NO: 1876
ENSG00000116833.9
A*02:03
MSSNSDTGDLQESLKHGLT
PIGAGLPDRHGSPIPARGR
LV

SEQ ID NO: 1877
ENSG00000118855.14
C*03:02, C*03:04, C*14:02
MDAGKLARHPTDTGSERA
VPALAEIRPWWAPPLRPQ

SEQ ID NO: 1878
ENSG00000119547.5
A*02:03, A*11:01, A*11:02, A*24:10, A*33:03, B*15:01, B*15:27, B*39:01, B*58:01, C*03:02, C*03:04, C*07:02, C*14:02
MKAAYTAYRCLTKDLEGCA
MNPELTMESLGTLHGPAG
GGSGGGGGGGGGGGGG
GPGHEQELLASPSPHHAG
RGAAGSLRGPPPPPTAHQ
ELGTAAAAAAAASRSAMV
TSMASILDGGDYRPELSIPL
HHAMSMSCDSSPPGMG
MSNTYTTLTPLQPLPPISTV
SDKFHHPHPHHHPHHHH
HHHHQRLSGNVSGSFTLM
RDERGLPAMNNLYSPYKE
MPGMSQSLSPLAATPLGN
GLGGLHNAQQSLPNYGPP
GHDKMLSPNFDAHHTAM
LTRGEQHLSRGLGTPPAA
MMSHLNGLHHPGHTQSH
GPVLAPSRERPPSSSSGSQ
VATSGQLEEINTKEVAQRIT
AELKRYSIPQAIFAQRVLCR
SQGTLSDLLRNPKPWSKLK
SGRETFRRMWKWLQEPEF
QRMSALRLAA

SEQ ID NO: 1879
ENSG00000125826.15
A*02:03, A*02:07, A*11:01, A*11:02, A*24:10, A*33:03, B*40:01, C*03:02, C*03:04
MDEKTKKAEEMALSLTRA
VAGGDEQVAMKCAIWLA
EQRVPLSVQLKPEVSPTQD
IRLWVSVEDAQMHTVTIW
LTVRPDMTVASLKDMVFL
DYGFPPVLQQWVIGQRLA
RDQETLHSHGVRQNGDSA
YLYLLSARNTSLNPQELQRE
RQLRMLEDLGFKDLTLQPR
GPLEPGPPKPGVPQEPGR
GQPDAVPEPPPVGWQCP
GCTFINKPTRPGCEMCCRA
RPEAYQVPASYQPDEEERA
RLAGEEEALRQYQQRKQQ
QQEGNYLQHVQLDQRSLV
LNTEPAECPVCYSVLAPGE
AVVLRECLHTFCRECLQGTI
RNSQEAEVSCPFIDNTYSCS
GKLLEREIKALLTPEDYQRF
LDLGISIAENRSAFSYHCKT
PDCKGWCFFEDDVNEFTC
PVCFHVNCLLCKAIHEQM
NCKEYQEDLALRAQNDVA
ARQTTEMLKVMLQQGEA
MRCPQCQIVVQKKDGCD
WIRCTVCHTEICWVTKGPR
WGPGGPGDTSGGCRCRV
NGIPCHPSCQNCH

SEQ ID NO: 1880
ENSG00000129116.13
A*02:03, A*11:01, A*11:02, A*24:02, A*24:10, A*33:03, B*15:01, B*39:01, B*40:01, B*58:01, C*03:02, C*03:04
MSALASRSAPAMQSSGSF
NYARPKQFIAAQNLGPAS
GHGTPASSPSSSSLPSPMS
PTPRQFGRAPVPPFAQPF
GAEPEAPWGSSSPSPPPPP
PPVFSPTAAFPVPDVFPLPP
PPPPLPSPGQASHCSSPAT
RFGHSQTPAAFLSALLPSQ
PPPAAVNALGLPKGVTPA
GFPKKASRTARIASDEEIQG
TKDAVIQDLERKLRFKEDLL
NNGQPRLTYEERMARRLL
GADSATVFNIQEPEEETAN
QEYKVSSCEQRLISEIEYRLE
RSPVDESGDEVQYGDVPV
ENGMAPFFEMKLKHYKIFE
GMPVTFTCRVAGNPKPKIY
WFKDGKQISPKSDHYTIQR
DLDGTCSLHTTASTLDDDG
NYTIMAANPQGRISCTGRL
MVQAVNQRGRSPRSPSG
HPHVRRPRSRSRDSGDEN
EPIQERFFRPHFLQAPGDLT
VQEGKLCRMDCKVSGLPT
PDLSWQLDGKPVRPDSAH
KMLVRENGVHSLIIEPVTSR
DAGIYTCIATNRAGQNSFS
LELVVAAKE

SEQ ID NO: 1881
ENSG00000129682.9
A*02:03, A*02:07, A*24:10, A*34:01, B*27:04, B*38:02, B*39:01, B*46:01, B*55:02, C*03:02, C*07:02, C*08:01, C*15:02
MSGKVTKPKEEKDASKVLD
DAPPGTQEYIMLRQDSIQS
AELKKKESPFRAKCHEIFCC
PLKQVHHKENTEPEEPQLK
GIVTKLYSRQGYHLQLQAD
GTIDGTKDEDSTYTLFNLIP
VGLRVVAIQGVQTKLYLA

SEQ ID NO: 1882
ENSG00000131374.10
A*02:03, A*24:02, A*24:07, A*24:10, A*33:03, B*27:04, B*51:01, C*07:02, C*15:02
MYHSLSETRHPLQPEEQEV
GIDPLSSYSNKSGGDSNKN
GRRTSSTLDSEGTFNSYRKE
WEELFVNNNYLATIRQKGI
NGQLRSSRFRSICWKLFLC
VLPQDKSQWISRIEELRAW
YSNIKEIHITNPRKVVGQQ
DL

SEQ ID NO: 1883
ENSG00000131620.13
A*02:03, A*24:10, A*33:03, B*38:02, B*40:01, C*01:02
MWEASGMEERALEELAM
EETALDPLLAEAAGAVDGE
GAPPGGPSAQAATMRVN
EKYSTLPAEDRSVHIINICAI
EDIGYLPSEGTLLNSLSVDP
DAECKYGLYFRDGRRKVDY
ILVYHHKRPSGNRTLVRRV
QHSDTPSGARSVKQDHPL
PGKGASLDAGSGEPP

SEQ ID NO: 1884
ENSG00000132005.4
B*15:01, B*58:01, C*03:02, C*03:04, C*03:67, C*12:02, C*14:02
MATQAYTELQAAPPPSQP
PQAPPQAQPQPPPPPPPA
APQPPQPPTAAATPQPQY
VTELQSPQPQAQPPGGQK
QYVTELPAVPAPSQPTGAP
TPSPAPQQYIVVTVSEGAM
RASETVSEASPGSTASQTG
VPTQVVQQVQGTQQRLL
VQTSVQAKPGHVSPLQLT
NIQVPQQALPTQRLVVQS
AAPGSKGGQVSLTVHGTQ
QVHSPPEQSPVQANSSSSK
TAGAPTGTVPQQLQVHGV
QQSVPVTQERSVVQATPQ
APKPGPVQPLTVQGLQPV
HVAQEVQQLQQVPVPHV
YSSQVQYVEGGDASYTASA
IRSSTYSYPETPLYTQTASTS
YYEAAGTATQVSTPATSQA
VASSGS

SEQ ID NO: 1885
ENSG00000132359.9
A*02:03, A*11:01, A*11:02, A*34:01, B*40:01, C*03:02, C*03:04, C*14:02, C*15:02
MFGRKRSVSFGGFGWIDK
TMLASLKVKKQELANSSDA
TLPDRPLSPPLTAPPTMKSS
EFFEMLEKMQGIKLEEQKP
GPQKNKDDYIPYPSIDEVV
EKGGPYPQVILPQFGGYWI
EDPENVGTPTSLGSSICEEE
EEDNLSPNTFGYKLECKGE
ARAYRRHFLGKDHLNFYCT
GSSLGNLILSVKCEEAEGIEY
LRVILRSKLKTVHERIPLAGL
SKLPSVPQIAKAFCDDAVG
LRFNPVLYPKASQ

SEQ ID NO: 1886
ENSG00000134490.9
A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, B*15:01, B*15:27, B*58:01, C*03:02, C*03:04, C*12:02
MCVRRSLVGLTFCTCYLAS
YLTNKYVLSVLKFTYPTLFQ
GWQTLIGGLLLHVSWKLG
WVEINSSSRSHVLVWLPAS
VLFVGIIYAGSRALSRLAIPV
FLTLHNVAEVIICGYQKCFQ
KEKTSPAKICSALLLLAAAG
CLPFNDSQFNPDGYFWAII
HLLCVGAYKILQKSQKPSAL
SDIDQQYLNYIFSVVLLAFA
SHPTGDLFSVLDFPFLYFYR
FHGSCCASGFLGFFLMFST
VKLKNLLAPGQCAAWIFFA
KIITAGLSILLFDAILTSATTG
CLLLGALGEALLVFSERKSS

SEQ ID NO: 1887
ENSG00000135093.8
A*02:03, A*02:07, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, B*15:21, B*27:04, B*38:02, B*39:01, B*40:01, B*51:01, B*58:01, C*03:02, C*07:02, C*14:02, C*15:02
MLSSRAEAAMTAADRAIQ
RFLRTGAAVRYKVMKNW
GVIGGIAAALAAGIYVIWG
PITERKKRRKGLVPGLVNL
GNTCFMNSLLQGLSACPA
FIRWLEEFTSQYSRDQKEP
PSHQYLSLTLLHLLKALSCQ
EVTDDEVLDASCLLDVLRM
YRWQISSFEEQDAHELFHV
ITSSLEDERDRQPRVTHLFD
VHSLEQQSEITPKQITCRTR
GSPHPTSNHWKSQHPFHG
RLTSN

SEQ ID NO: 1888
ENSG00000136231.9
A*02:03, A*11:01, A*11:02, A*24:10, A*33:03, A*34:01, B*15:01, B*15:27, C*03:02, C*03:04, C*14:02
MNKLYIGNLSENAAPSDLE
SIFKDAKIPVSGPFLVKTGY
AFVDCPDESWALKAIEALS
GKIELHGKPIEVEHSVPKRQ
RIRKLQIRNIPPHLQWEVLD
SLLVQYGVVESCEQVNTDS
ETAVVNVTYSSKDQARQA
LDKLNGFQLENFTLKVAYIP
DEMAAQQNPLQQPRGRR
GLGQRGSSRQGSPGSVSK
QKPCDLPLRLLVPTQFVGAI
IGKEGATIRNITKQTQSKID
VHRKENAGAAEKSITILSTP
EGTSAACKSILEIMHKEAQ
DIKFTEEIPLKILAHNNFVG
RLIGKEGRNLKKIEQDTDTK
ITISPLQELTLYNPERTITVK
GNVETCAKAEEEIMKKIRE
SYENDIASMNLQAHLIPGL
NLNALGLFPPTSGMPPPTS
GPPSAMTPPYPQFEQSETE
TVHLFIPALSVGAIIGKQGQ
HIKQLSRFAGASIKIAPAEA
PDAKVRMVIITGPPEAQFK
AQGRIYGKIKEENFVSPKEE
VKLEAHIRVPSFAAGRVIGK
GGKTVNELQNLSSAEVVVP
RDQTPDENDQVVVKITGH
FYACQVAQRKIQEILTQVK
QHQQQKALQSGPPQSRRK

SEQ ID NO: 1889
ENSG00000136848.12
A*02:03
MEPDSLLDQDDSYESPQE
RPGSRRSLPGSLSEKSPSM
EPSAATPFRVTGFLSRRLKG
SIKRTKSQPKLDRNHSFRHI

SEQ ID NO: 1890
ENSG00000137203.6
A*02:03, A*11:01, A*11:02, A*24:02, A*24:10, A*33:03, B*39:01, C*14:02
MLWKLTDNIKYEDCEDRH
DGTSNGTARLPQLGTVGQ
SPYTSAPPLSHTPNADFQP
PYFPPPYQPIYPQSQDPYS
HVNDPYSLNPLHAQPQPQ
HPGWPGQRQSQESGLLHT
HRGLPHQLSGLDPRRDYRR
HEDLLHGPHALSSGLGDLSI
HSLPHAIEEVPHVEDPGINI
PDQTVIKKGPVSLSKSNSN
AVSAIPINKDNLFGGVVNP
NEVFCSVPGRLSLLSSTSK

SEQ ID NO: 1891
ENSG00000137474.15
A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, B*15:01, B*39:01, B*40:01, B*55:02, B*58:01, C*03:02, C*03:04, C*03:67, C*07:02, C*12:02, C*14:02, C*15:02
MVILQQGDHVWMDLRLG
QEFDVPIGAVVKLCDSGQV
QVVDDEDNEHWISPQNA
THIKPMHPTSVHGVEDMI
RLGDLNEAGILRNLLIRYRD
HLIYTYTGSILVAVNPYQLLS
IYSPEHIRQYTNKKIGEMPP
HIFAIADNCYFNMKRNSRD
QCCIISGESGAGKTESTKLIL
QFLAAISGQHSWIEQQVLE
ATPILEAFGNAKTIRNDNSS
RFGKYIDIHFNKRGAIEGAK
IEQYLLEKSRVCRQALDERN
YHVFYCMLEGMSEDQKKK
LGLGQASDYNYLAMGNCI
TCEGRVDSQEYANIRSAM
KVLMFTDTENWEISKLLAA
ILHLGNLQYEARTFENLDA
CEVLFSPSLATAASLLEVNP
PDLMSCLTSRTLITRGETVS
TPLSREQALDVRDAFVKGI
YGRLFVWIVDKINAAIYKPP
SQDVKNSRRSIGLLDIFGFE
NFAVNSFEQLCINFANEHL
QQFFVRHVFKLEQEEYDLE
SIDWLHIEFTDNQDALDMI
ANKPMNIISLIDEESKFPKG
TDTTMLHKLNSQHKLNAN
YIPPKNNHETQFGINHFAG
IVYYETQGFLEKNRDTLHG
DIIQLVHSSRNKFIKQIFQA
DVAMGAETRKRSPTLSSQF
KRSLELLMRTLGACQPFFV
RCIKPNEFKKPMLFDRHLC
VRQLRYSGMMETIRIRRAG
YPIRYSFVEFVERYRVLLPG
VKPAYKQGDLRGTCQRMA
EAVLGTHDDWQIGKTKIFL
KDHHDMLLEVERDKAITD
RVILLQKVIRGFKDRSNFLK
LKNAATLIQRHWRGHNCR
KNYGLMRLGFLRLQALHRS
RKLHQQYRLARQRIIQFQA
RCRAYLVRKAFRHRLWAVL
TVQAYARGMIARRLHQRL
RAEYLWRLEAEKMRLAEEE
KLRKEMSAKKAKEEAERKH
QERLAQLAREDAERELKEK
EAARRKKELLEQMERARH
EPVNHSDMVDKMFGFLG
TSGGLPGQEGQAPSGFED
LERGRREMVEEDLDAALPL
PDEDEEDLSEYKFAKFAATY
FQGTTTHSYTRRPLKQPLLY
HDDEGDQLAALAVWITILR
FMGDLPEPKYHTAMSDGS
EKIPVMTKIYETLGKKTYKR
ELQALQGEGEAQLPEGQK
KSSVRHKLVHLTLKKKSKLT
EEVTKRLHDGESTVQGNS
MLEDRPTSNLEKLHFIIGNG
ILRPALRDEIYCQISKQLTH
NPSKSSYARGWILVSLCVG
CFAPSEKFVKYLRNFIHGGP
PGYAPYCEERLRRTFVNGT
RTQPPSWLELQATKSKKPI
MLPVTFMDGTTKTLLTDSA
TTAKELCNALADKISLKDRF
GFSLYIALFD

SEQ ID NO: 1892
ENSG00000138075.7
A*02:03, A*02:07, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, A*34:01, B*15:01, B*15:21, B*15:27, B*27:04, B*38:02, B*39:01, B*40:01, B*40:06, B*46:01, B*55:02, B*58:01, C*03:02, C*03:04, C*03:67, C*04:01, C*04:03, C*07:02, C*08:01, C*12:02, C*14:02, C*15:02
MGDLSSLTPGGSMGLQV
NRGSQSSLEGAPATAPEPH
SLGILHASYSVSHRVRPW
WDITSCRQQWTRQILKDV
SLYVESGQIMCILGSSGSGK
TTLLDAMSGRLGRAGTFLG
EVYVNGRALRREQFQDCFS
YVLQSDTLLSSLTVRETLHY
TALLAIRRGNPGSFQKKVE
AVMAELSLSHVADRLIGNY
SLGGISTGERRRVSIAAQLL
QDPKVMLFDEPTTGLDCM
TANQIVVLLVELARRNRIVV
LTIHQPRSELFQLFDKIAILS
FGELIFCGTPAEMLDFFND
CGYPCPEHSNPFDFY

SEQ ID NO: 1893
ENSG00000142185.12
A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, A*34:01, B*15:01, B*15:27, B*39:01, B*40:01, B*58:01, C*03:02, C*03:04, C*12:02, C*14:02, C*15:02
MEPSALRKAGSEQEEGFE
GLPRRVTDLGMVSNLRRS
NSSLFKSWRLQCPFGNND
KQESLSSWIPENIKKKECVY
FVESSKLSDAGKVVCQCGY
THEQHLEEATKPHTFQGT
QWDPKKHVQEMPTDAFG
DIVFTGLSQKVKKYVRVSQ
DTPSSVIYHLMTQHWGLD
VPNLLISVTGGAKNFNMKP
RLKSIFRRGLVKVAQTTGA
WIITGGSHTGVMKQVGEA
VRDFSLSSSYKEGELITIGVA
TWGTVHRREGLIHPTGSFP
AEYILDEDGQGNLTCLDSN
HSHFILVDDGTHGQYGVEI
PLRTRLEKFISEQTKERGGV
AIKIPIVCVVLEGGPGTLHTI
DNATTNGTPCVVVEGSGR
VADVIAQVANLPVSDITISLI
QQKLSVFFQEMFETFTESRI
VEWTKKIQDIVRRRQLLTV
FREGKDGQQDVDVAILQA
LLKASRSQDHFGHENWDH
QLKLAVAWNRVDIARSEIF
MDEWQWKPSDLHPTMT
AALISNKPEFVKLFLENGVQ
LKEFVTWDTLLYLYENLDPS
CLFHSKLQMHHVAQVLRE
LLGDFTQPLYPRPRHNDRL
RLLLPVPHVKLNVQGVSLR
SLYKRSSGHVTFTMDPIRD
LLIWAIVQNRRELAGIIWA
QSQDCIAAALACSKILKELS
KEEEDTDSSEEMLALAEEY
EHRAIGVFTECYRKDEERA
QKLLTRVSEAWGKTTCLQL
ALEAKDMKFVSHGGIQAFL
TKVWWGQLSVDNGLWR
VTLCMLAFPLLLTGLISFREK
RLQDVGTPAARARAFFTAP
VVVFHLNILSYFAFLCLFAY
VLMVDFQPVPSWCECAIY
LWLFSLVCEEMRQLFYDPD
ECGLMKKAALYFSDFWNK
LDVGAILLFVAGLTCRLIPA
TLYPGRVILSLDFILFCLRLM
HIFTISKTLGPKIIIVKRMMK
DVFFFLFLLAVWVVSFGVA
KQAILIHNERRVDWLFRGA
VYHSYLTIFGQIPGYIDGVN
FNPEHCSPNGTDPYKPKCP
ESDATQQRPAFPEWLTVLL
LCLYLLFTNILLLNLLIAMFN
YTFQQVQEHTDQIWKFQR
HDLIEEYHGRPAAPPPFILL
SHLQLFIKRVVLKTPAKRHK
QLKNKLEKNEEAALLSWEI
YLKENYLQNRQFQQKQRP
EQKIEDISNKVDAMVDLLD
LDPLKRSGSMEQRLASLEE
QVAQTAQALHWIVRTLRA
SGFSSEADVPTLASQKAAE
EPDAEPGGRKKTEEPGDSY
HVNARHLLYPNCPVTRFPV
PNEKVPWETEFLIYDPPFYT
AERKDAAAMDPMGENP
MGRTGLRGRGSLSCFGPN
HTLYPMVTRWRRNEDGAI
CRKSIKKMLEVLVVKLPLSE
HWALPGGSREPGEMLPRK
LKRILRQEHWPSFENLLKC
GMEVYKGYMDDPRNTDN
AWIETVAVSVHFQDQNDV
ELNRLNSNLHACDSGASIR
WQVVDRRIPLYANHKTLL
QKAAAEFGAHY

SEQ ID NO: 1894
ENSG00000142235.4
A*02:03, A*33:03, B*15:01, B*39:01, B*40:01, C*03:02, C*03:04
MRQVLWLCNVCVTARETR
HHLHLPAILDKMPAPGALI
LLAAVSASGCLASPAHPDG
FALGRAPLAPPYAVVLISCS
GLLAFIFLLLTCLCCKRGDV
GFKEFENPEGEDCSGEYTP
PAEETSSSQSLPDVYILPLAE
VSLPMPAPQPSHSDMTTP
LGLSRQHLSYLQEIGSGWF
GKVILGEIFSDYTPAQVVVK
ELRASAGPLEQRKFISEAQP
YRSLQHPNVLQCLGLCVET
LPFLLIMEFCQLGDLKRYLR
AQRPPEGLSPELPPRDLRTL
QRMGLEIARGLAHLHSHN
YV

SEQ ID NO: 1895
ENSG00000142661.14
A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, B*15:01, B*15:27, B*39:01, B*40:01, B*58:01, C*03:02, C*03:04, C*03:67, C*07:02, C*08:01, C*12:02, C*14:02
MTLPHSLGGAGDPRPPQA
MEVHRLEHRQEEEQKEER
QHSLRMGSSVRRRTFRSSE
EEHEFSAADYALAAALALT
ASSELSWEAQLRRQTSAVE
LEERGQKRVGFGNDWERT
EIAFLQTHRLLRQRRDWKT
LRRRTEEKVQEAKELRELCY
GRGPWFWIPLRSHAVWE
HTTVLLTCTVQASPPPQVT
WYKNDTRIDPRLFRAGKYR
ITNNYGLLSLEIRRCAIEDSA
TYTVRVKNAHGQASSFAK
VLVRTYLGKDAGFDSEIFKR
STFGPSVEFTSVLKPVFARE
KEPFSLSCLFSEDVLDAESIQ
WFRDGSLLRSSRRRKILYTD
RQASLKVSCTYKEDEGLYM
VRVPSPFGPREQSTYVLVR
DAEAENPGAPGSPLNVRCL
DVNRDCLILTWAPPSDTRG
NPITAYTIERCQGESGEWIA
CHEAPGGTCRCPIQGLVEG
QSYRFRVRAISRVGSSVPSK
ASELVVMGDHDAARRKTE
IPFDLGNKITISTDAFEDTVT
IPSPPTNVHASEIREAYVVL
AWEEPSPRDRAPLTYSLEK
SVIGSGTWEAISSESPVRSP
RFAVLDLEKKKSYVFRVRA
MNQYGLSDPSEPSEPIALR
GPPATLPPPAQVQAFRDT
QTSVSLTWDPVKDPELLGY
YIYSRKVGTSEWQTVNNKP
IQGTRFTVPGLRTGKEYEFC
VRSVSEAGVGESSAATEPIR
VKQALATPSAPYGFALLNC
GKNEMVIGWKPPKRRGG
GKILGYFLDQHDSEELDWH
AVNQQPIPTRVCKVSDLHE
GHFYEFRARAANWAGVG
ELSAPSSLFECKEWTMPQP
GPPYDVRASEVRATSLVLQ
WEPPLYMGAGPVTGYHVS
FQEEGSEQWKPVTPGPISG
THLRVSDLQPGKSYVFQVQ
AMNSAGLGQPSMPTDPV
LLEDKPGAHEIEVGVDEEG
FIYLAFEAPEAPDSSEFQWS
KDYKGPLDPQRVKIEDKVN
KSKVILKEPGLEDLGTYSVIV
TDADEDISASHTLTEEELEK
LKKLSHEIRNPVIKLISGWNI
DILERGEVRLWLEVEKLSPA
AELHLIFNNKEIFSSPNRKIN
FDREKGLVEVIIQNLSEEDK
GSYTAQLQDGKAKNQITLT
LVDDDFDKLLRKADAKRRD
WKRKQGPYFERPLQWKVT
EDCQVQLTCKVTNTKKETR
FQWFFQRAEMPDGQYDP
ETGTGLLCIEELSKKDKGIYR
AMVSDDRGEDDTILDLTG
DALDAIFTELGRIGALSATP
LKIQGTEEGIRIFSKVKYYNV
EYMKTTWFHKDKRLESGD
RIRTGTTLDEIWLHILDPKD
SDKGKYTLEIAAGKEVRQLS
TDLSGQAFEDAMAEHQRL
KTLAIIEKNRAKVVRGLPDV
ATIMEDKTLCLTCIVSGDPT
PEISWLKNDQPVTFLDRYR
MEVRGTEVTITIEKVNSEDS
GRYGVFVKNKYGSETGQV
TISVFKHGDEPKELKSM

SEQ ID NO: 1896
ENSG00000143669.9
A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, A*34:01, B*15:01, B*15:27, B*39:01, B*40:01, B*55:02, B*58:01, C*03:02, C*03:04, C*03:67, C*07:02, C*12:02, C*14:02, C*15:02
MSTDSNSLAREFLTDVNRL
CNAVVQRVEAREEEEEETH
MATLGQYLVHGRGFLLLTK
LNSIIDQALTCREELLTLLLSL
LPLVWKIPVQEEKATDFNL
PLSADIILTKEKNSSSQRST
QEKLHLEGSALSSQVSAKV
NVFRKSRRQRKITHRYSVR
DARKTQLSTSDSEANSDEK
GIAMNKHRRPHLLHHFLTS
FPKQDHPKAKLDRLATKEQ
TPPDAMALENSREIIPRQG
SNTDILSEPAALSVISNMN
NSPFDLCHVLLSLLEKVCKF
DVTLNHNSPLAASVVPTLT
EFLAGFGDCCSLSDNLESR
VVSAGWTEEPVALIQRML
FRTVLHLLSVDVSTAEMM
PENLRKNLTELLRAALKIRIC
LEKQPDPFAPRQKKTLQEV
QEDFVFSKYRHRALLLPELL
EGVLQILICCLQSAASNPFY
FSQAMDLVQEFIQHHGFN
LFETAVLQMEWLVLRDGV
PPEASEHLKALINSVMKIM
STVKKVKSEQLHHSMCTRK
RHRRCEYSHFMHHHRDLS
GLLVSAFKNQVSKNPFEET
ADGDVYYPERCCCIAVCAH
QCLRLLQQASLSSTCVQILS
GVHNIGICCCMDPKSVIIPL
LHAFKLPALKNFQQHILNIL
NKLILDQLGGAEISPKIKKA
ACNICTVDSDQLAQLEETL
QGNLCDAELSSSLSSPSYRF
QGILPSSGSEDLLWKWDAL
KAYQNFVFEEDRLHSIQIA
NHICNLIQKGNIVVQWKLY
NYIFNPVLQRGVELAHHCQ
HLSVTSAQSHVCSHHNQC
LPQDVLQIYVKTLPILLKSRV
IRDLFLSCNGVSQIIELNCLN
GIRSHSLKAFETLIISLGEQQ
KDASVPDIDGIDIEQKELSS
VHVGTSFHHQQAYSDSPQ
SLSKFYAGLKEAYPKRRKTV
NQDVHINTINLFLCVAFLCV
SKEAESDRESANDSEDTSG
YDSTASEPLSHMLPCISLES
LVLPSPEHMHQAADIWS
MCRWIYMLSSVFQKQFYR
LGGFRVCHKLIFMIIQKLFR
SHKEEQGKKEGDTSVNEN
QDLNRISQPKRTMKEDLLS
LAIKSDPIPSELGSLKKSADS
LGKLELQHISSINVEEVSAT
EAAPEEAKLFTSQESETSLQ
SIRLLEALLAICLHGARTSQ
QKMELELPNQNLSVESILFE
MRDHLSQSKVIETQLAKPL
FDALLRVALGNYSADFEHN
DAMTEKSHQSAEELSSQP
GDFSEEAEDSQCCSFKLLVE
EEGYEADSESNPEDGETQD
DGVDLKSETEGFSASSSPN
DLLENLTQGEIIYPEICMLEL
NLLSASKAKLDVLAHVFESF
LKIIRQKEKNVFLLMQQGT
VKNLLGGFLSILTQDDSDF
QACQRVLVDLLVSLMSSRT
CSEELTLLLRIFLEKSPCTKIL
LLGILKIIESDTTMSPSQYLT
FPLLHAPNLSNGVSSQKYP
GILNSKAMGLLRRARVSRS
KKEADRESFPHRLLSSWHI
APVHLPLLGQNCWPHLSE
GFSVSLWFNVECIHEAEST
TEKGKKIKKRNKSLILPDSSF
DGTESDRPEGAEYINPGER
LIEEGCIHIISLGSKALMIQV
WADPHNATLIFRVCMDSN
DDMKAVLLAQVESQENIFL
PSKWQHLVLTYLQQPQGK
RRIHGKISIWVSGQRKPDV
TLDFMLPRKTSLSSDSNKTF
CMIGHCLSSQEEFLQLAGK
WDLGNLLLFNGAKVGSQE
AFYLYACGPNHTSVMPCK
YGKPVNDYSKYINKEILRCE
QIRELFMTKKDVDIGLLIESL
SVVYTTYCPAQYTIYEPVIRL
KGQMKTQLSQRPFSSKEV
QSILLEPHHLKNLQPTEYKT
IQGILHEIGGTGIFVFLFARV
VELSSCEETQALALRVILSLI
KYNQQRVHELENCNGLSM
IHQVLIKQKCIVGFYILKTLL
EGCCGEDIIYMNENGEFKL
DVDSNAIIQDVKLLEELLLD
WKIWSKAEQGVWETLLAA
LEVLIRADHHQQMFNIKQL
LKAQVVHHFLLTCQVLQEY
KEGQLTPMPREVCRSFVKII
AEVLGSPPDLELLTIIFNFLL
AVHPPTNTYVCHNPTNFYF
SLHIDGKIFQEKVRSIMYLR
HSSSGGRSLMSPGFMVISP
SGFTASPYEGENSSNIIPQQ
MAAHMLRSRSLPAFPTSSL
LTQSQKLTGSLGCSIDRLQ
NIADTYVATQSKKQNSLGS
SDTLKKGKEDAFISSCESAK
TVCEMEAVLSAQVSVSDV
PKGVLGFPVVKADHKQLG
AEPRSEDDSPGDESCPRRP
DYLKGLASFQRSHSTIASLG
LAFPSQNGSAAVGRWPSL
VDRNTDDWENFAYSLGYE
PNYNRTASAHSVTEDCLVP
ICCGLYELLSGVLLILPDVLL
EDVMDKLIQADTLLVLVNH
PSPAIQQGVIKLLDAYFARA
SKEQKDKFLKNRGFSLLAN
QLYLHRGTQELLECFIEMFF
GRHIGLDEEFDLEDVRNM
GLFQKWSVIPILGLIETSLYD
NILLHNALLLLLQILNSCSKV
ADMLLDNGLLYVLCNTVA
ALNGLEKNIPMSEYKLLAC
DIQQLFIAVTIHACSSSGSQ
YFRVIEDLIVMLGYLQNSK
NKRTQNMAVALQLRVLQ
AAMEFIRTTANHDSENLTD
SLQSPSAPHHAVVQKRKSI
AGPRKFPLAQTESLLMKM
RSVANDELHVMMQRRMS
QENPSQATETELAQRLQRL
TVLAVNRIIYQEFNSDIIDIL
RTPENVTQSKTSVFQTEISE
ENIHHEQSSVFNPFQKEIFT
YLVEGFKVSIGSSKASGSKQ
QWTKILWSCKETFRMQLG
RLLVHILSPAHAAQERKQIF
EIVHEPNHQEILRDCLSPSL
QHGAKLVLYLSELIHNHQG
ELTEEELGTAELLMNALKLC
GHKCIPPSASTKADLIKMIK
EEQKKYETEEGVNKAAWQ
KTVNNNQQSLFQRLDSKS
KDISKIAADITQAVSLSQGN
ERKKVIQHIRGMYKVDLSA
SRHWQELIQQLTHDRAV
WYDPIYYPTSWQLDPTEG
PNRERRRLQRCYLTIPNKYL
LRDRQKSEDVVKPPLSYLFE
DKTHSSFSSTVKDKAASESI
RVNRRCISVAPSRETAGELL
LGKCGMYFVEDNASDTVE
SSSLQGELEPASFSWTYEEI
KEVHKRWWQLRDNAVEIF
LTNGRTLLLAFDNTKVRDD
VYHNILTNNLPNLLEYGNIT
ALTNLWYTGQITNFEYLTH
LNKHAGRSFNDLMQYPVF
PFILADYVSETLDLNDLLIYR
NLSKPIAVQYKEKEDRYVD
TYKYLEEEYRKGAREDDPM
PPVQPYHYGSHYSNSGTVL
HFLVRMPPFTKMFLAYQD
QSFDIPDRTFHSTNTTWRL
SSFESMTDVKELIPEFFYLPE
FLVNREGFDFGVRQNGER
VNHVNLPPWARNDPRLFI
LIHRQALESDYVSQNICQW
IDLVFGYKQKGKASVQAIN
VFHPATYFGMDVSAVEDP
VQRRALETMIKTYGQTPR
QLFHMAHVSRPGAKLNIE
GELPAAVGLLVQFAFRETR
EQVKEITYPSPLSWIKGLK
WGEYVGSPSAPVPVVCFS
QPHGERFGSLQALPTRAIC
GLSRNFCLLMTYSKEQGVR
SMNSTDIQWSAILSWGYA
DNILRLKSKQSEPPVNFIQS
SQQYQVTSCAWVPDSCQL
FTGSKCGVITAYTNRFTSST
PSEIEMETQIHLYGHTEEIT
SLFVCKPYSILISVSRDGTCII
WDLNRLCYVQSLAGHKSP
VTAVSASETSGDIATVCDS
AGGGSDLRLWTVNGDLV
GHVHCREIICSVAFSNQPE
GVSINVIAGGLENGIVRLW
STWDLKPVREITFPKSNKPI
ISLTFSCDGHHLYTANSDGT
VIAWCRKDQQRLKQPMFY
SFLSSYAAG

SEQ ID NO: 1897
ENSG00000143882.5
A*02:03, A*11:01, A*11:02, A*33:03, B*58:01, C*03:02, C*03:04
MSEFWLISAPGDKENLQAL
ERMNTVTSKSNLSYNTKFA
IPDFKVGTLDSLVGLSDELG
KLDTFAESLIRRMAQSVVE
VMEDSKGKVQEHLLANGV
DLTSFVTHFEWD

SEQ ID NO: 1898
ENSG00000145214.9
A*02:03, A*11:01, A*11:02, A*33:03, B*15:01, B*39:01, B*40:01, C*03:02, C*03:04
MAAAAEPGARAWLGGGS
PRPGSPACSPVLGSGGRAR
PGPGPGPGPERAGVRAPG
PAAAPGHSFRKVTLTKPTF
CHLCSDFIWGLAGFLCDVC
NFMSHEKCLKHVRIPCTSV
APSLVRVPVAHCFGPRGLH
KRKFCAVCRKVLEAPALHC
EVCELHLHPDCVPFACSDC
RQCHQDGHQDHDTHHH
HWREGNLPSGARCEVCRK
TCGSSDVLAGVRCEWCGV
QAHSLCSAALAPECGFGRL
RSLVLPPACVRLLPGGFSKT
QSFRIVEAAEPGEGGDGA
DGSAAVGPGRETQATPES
GKQTLKIFDGDDAVRRSQF
RLVTVSRLAGAEEVLEAALR
AHHIPEDPGHLELCRLPPSS
QACDAWAGGKAGSAVISE
EGRSPGSGEATPEAWVIRA
LPRAQEVLKIYPGWLKVGV
AYVSVRVTPKSTARSVVLE
VLPLLGRQAESPESFQLVEV
AMGCRHVQRTMLMDEQ
PLLDRLQDIRQMSVRQVS
QTRFYVAESRDVAPHVSLF
VGGLPPGLSPEEYSSLLHEA
GATKATVVSVSHIYSSQGA
VVLDVACFAEAERLYMLLK
DMAVRGRLLTALVLPDLLH
AKLPPDSCPLLVFVNPKSG
GLKGRDLLCSFRKLLNPHQ
VFDLTNGGPLPGLHLFSQV
PCFRVLVCGGDGTVGWVL
GALEETRYRLACPEPSVAIL
PLGTGNDLGRVLRWGAGY
SGEDPFSVLLSVDEADAVL
MDRWTILLDAHEAGSAEN
DTADAEP

SEQ ID NO: 1899
ENSG00000151025.9
A*02:03, A*02:07, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, B*15:01, B*39:01, B*40:01, B*55:02, B*58:01, C*03:02, C*03:04, C*03:67, C*07:02, C*12:02, C*14:02
MGAMAYPLLLCLLLAQLGL
GAVGASRDPQGRPDSPRE
RTPKGKPHAQQPGRASAS
DSSAPWSRSTDGTILAQKL
AEEVPMDVASYLYTGDSH
QLKRANCSGRYELAGLPGK
WPALASAHPSLHRALDTLT
HATNFLNVMLQSNKSREQ
NLQDDLDWYQALVWSLLE
GEPSISRAAITFSTDSLSAPA
PQVFLQATREESRILLQDLS
SSAPHLANATLETEWFHGL
RRKWRPHLHRRGPNQGP
RGLGHSWRRKDGLGGDKS
HFKWSPPYLECENGSYKPG
WLVTLSSAIYGLQPNLVPEF
RGVMKVDINLQKVDIDQC
SSDGWFSGTHKCHLNNSE
CMPIKGLGFVLGAYECICK
AGFYHPGVLPVNNFRRRG
PDQHISGSTKDVSEEAYVC
LPCREGCPFCADDSPCFVQ
EDKYLRLAIISFQALCMLLD
FVSMLVVYHFRKAKSIRAS
GLILLETILFGSLLLYFPVVILY
FEPSTFRCILLRWARLLGFA
TVYGTVTLKLHRVLKVFLSR
TAQRIPYMTGGRVMRML
AVILLVVFWFLIGWTSSVC
QNLEKQISLIGQGKTSDHLI
FNMCLIDRWDYMTAVAEF
LFLLWGVYLCYAVRTVPSA
FHEPRYMAVAVHNELIISAI
FHTIRFVLASRLQSDWML
MLYFAHTHLTVTVTIGLLLI
PKFSHSSNNPRDDIATEAY
EDELDMGRSGSYLNSSINS
AWSEHSLDPEDIRDELKKL
YAQLEIYKRKKMITNNPHL
QKKRCSKKGLGRSIMRRIT
EIPETVSRQCSKEDKEGAD
HGTAKGTALIRKNPPESSG
NTGKSKEETLKNRVFSLKKS
HSTYDHVRDQTEESSSLPT
ESQEEETTENSTLESLSGKK
LTQKLKEDSEAESTESVPLV
CKSASAHNLSSEKKTGHPR
TSMLQKSLSVIASAKEKTLG
LAGKTQTAGVEERTKSQKP
LPKDKETNRNHSNSDNTET
KDPAPQNSNPAEEPRKPQ
KSGIMKQQRVNPTTANSD
LNPGTTQMKDNFDIGEVC
PWEVYDLTPGPVPSESKV
QKHVSIVASEMEKNPTFSL
KEKSHHKPKAAEVCQQSN
QKRIDKAEVCLWESQGQSI
LEDEKLLISKTPVLPERAKEE
NGGQPRAANVCAGQSEEL
PPKAVASKTENENLNQIGH
QEKKTSSSEENVRGSYNSS
NNFQQPLTSRAEVCPWEF
ETPAQPNAGRSVALPASSA
LSANKIAGPRKEEIWDSFK
V

SEQ ID NO: 1900
ENSG00000151229.8
A*02:03, A*02:07, A*11:01, A*11:02, A*24:10, A*34:01, B*15:01, B*15:21, B*15:27, B*27:04, B*40:01, B*40:06, B*46:01, B*55:02, B*58:01, C*01:02, C*03:02, C*03:04, C*03:67, C*04:01, C*04:03, C*08:01, C*12:02, C*15:02
MSRKASENVEYTLRSLSSL
MGERRRKQPEPDAASAAG
ECSLLAAAESSTSLQSAGA
GGGGVGDLERAARRQFQ
QDETPAFVYVVAVFSALGG
FLFGYDTGVVSGAMLLLKR
QLSLDALWQELLVSSTVGA
AAVSALAGGALNGVFGRR
AAILLASALFTAGSAVLAAA
NNKETLLAGRLVVGLGIGIA
SMTVPVYIAEVSPPNLRGR
LVTINTLFITGGQFFASVVD
GAFSYLQKDGW

SEQ ID NO: 1901
ENSG00000151914.13
A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, A*33:03, A*34:01, B*15:01, B*15:27, B*39:01, B*40:01, B*55:02, B*58:01, C*03:02, C*03:04, C*07:02, C*12:02, C*14:02, C*15:02
MAGYLSPAAYLYVEEQEYL
QAYEDVLERYKDERDKVQ
KKTFTKWINQHLMKVRKH
VNDLYEDLRDGHNLISLLEV
LSGDTLPREKGRMRFHRL
QNVQIALDYLKRRQVKLVN
IRNDDITDGNPKLTLGLIWT
IILHFQISDIHVTGESEDMS
AKERLLLWTQQATEGYAGI
RCENFTTCWRDGKLFNAII
HKYRPDLIDMNTVAVQSN
LANLEHAFYVAEKIGVIRLL
DPEDVDVSSPDEKSVITYVS
SLYDAFPKVPEGGEGIGAN
DVEVKWIEYQNMVNYLIQ
WIRHHVTTMSERTFPNNP
VELKALYNQYLQFKETEIPP
KETEKSKIKRLYKLLEIWIEF
GRIKLLQGYHPNDIEKEWG
KLIIAMLEREKALRPEVERL
EMLQQIANRVQRDSVICE
DKLILAGNALQSDSKRLESG
VQFQNEAEIAGYILECENLL
RQHVIDVQILIDGKYYQAD
QLVQRVAKLRDEIMALRN
ECSSVYSKGRILTTEQTKLM
ISGITQSLNSGFAQTLHPSL
TSGLTQSLTPSLTSSSMTSG
LSSGMTSRLTPSVTPAYTP
GFPSGLVPNFSSGVEPNSL
QTLKLMQIRKPLLKSSLLDQ
NLTEEEINMKFVQDLLNW
VDEMQVQLDRTEWGSDL
PSVESHLENHKNVHRAIEE
FESSLKEAKISEIQMTAPLKL
TYAEKLHRLESQYAKLLNTS
RNQERHLDTLHNFVSRAT
NELIWLNEKEEEEVAYDWS
ERNTNIARKKDYHAELMRE
LDQKEENIKSVQEIAEQLLL
ENHPARLTIEAYRAAMQT
QWSWILQLCQCVEQHIKE
NTAYFEFFNDAKEATDYLR
NLKDAIQRKYSCDRSSSIHK
LEDLVQESMEEKEELLQYK
STIANLMGKAKTIIQLKPRN
SDCPLKTSIPIKAICDYRQIEI
TIYKDDECVLANNSHRAK
WKVISPTGNEAMVPSVCF
TVPPPNKEAVDLANRIEQQ
YQNVLTLWHESHINMKSV
VSWHYLINEIDRIRASNVAS
IKTMLPGEHQQVLSNLQSR
FEDFLEDSQESQVFSGSDIT
QLEKEVNVCKQYYQELLKS
AEREEQEESVYNLYISEVRN
IRLRLENCEDRLIRQIRTPLE
RDDLHESVFRITEQEKLKKE
LERLKDDLGTITNKCEEFFS
QAAASSSVPTLRSELNVVL
QNMNQVYSMSSTYIDKLK
TVNLVLKNTQAAEALVKLY
ETKLCEEEAVIADKNNIENLI
STLKQWRSEVDEKRQVFH
ALEDELQKAKAISDEMFKT
YKERDLDFDWHKEKADQL
VERWQNVHVQIDNRLRDL
EGIGKSLKYYRDTYHPLDD
WIQQVETTQRKIQENQPE
NSKTLATQLNQQKMLVSEI
EMKQSKMDECQKYAEQYS
ATVKDYELQTMTYRAMVD
SQQKSPVKRRRMQSSADLI
IQEFMDLRTRYTALVTLMT
QYIKFAGDSLKRLEEEEKSL
EEEKKEHVEKAKELQKWVS
NISKTLKDAEKAGKPPFSK
QKISSEEISTKKEQLSEALQT
IQLFLAKHGDKMTDEERNE
LEKQVKTLQESYNLLFSESL
KQLQESQTSGDVKVEEKLD
KVIAGTIDQTTGEVLSVFQ
AVLRGLIDYDTGIRLLETQL
MISGLISPELRKCFDLKDAK
SHGLIDEQILCQLKELSKAK
EIISAASPTTIPVLDALAQS
MITESMAIKVLEILLSTGSLV
IPATGEQLTLQKAFQQNLV
SSALFSKVLERQNMCKDLI
DPCTSEKVSLIDMVQRSTL
QENTGMWLLPVRPQEGG
RITLKCGRNISILRAAHEGLI
DRETMFRLLSAQLLSGGLI
NSNSGQRMTVEEAVREGV
IDRDTASSILTYQVQTGGII
QSNPAKRLTVDEAVQCDLI
TSSSALLVLEAQRGYVGLI
WPHSGEIFPTSSSLQQELIT
NELAYKILNGRQKIAALYIP
ESSQVIGLDAAKQLGIIDNN
TASILKNITLPDKMPDLGDL
EACKNARRWLSFCKFQPST
VHDYRQEEDVFDGEEPVT
TQTSEETKKLFLSYLMINSY
MDANTGQRLLLYDGDLDE
AVGMLLEGCHAEFDGNTA
IKECLDVLSSSGVFLNNASG
REKDECTATPSSFNKCHCG
EPEHEETPENRKCAIDEEFN
EMRNTVINSEFSQSGKLAS
TISIDPKVNSSPSVCVPSLIS
YLTQTELADISMLRSDSENI
LTNYENQSRVETNERANEC
SHSKNIQNFPSDLIENPIMK
SKMSKFCGVNETENEDNT
NRDSPIFDYSPRLSALLSHD
KLMHSQGSFNDTHTPESN
GNKCEAPALSFSDKTMLSG
QRIGEKFQDQFLGIAAINIS
LPGEQYGQKSLNMISSNP
QVQYHNDKYISNTSGEDEK
THPGFQQMPEDKEDESEIE
EYSCAVTPGGDTDNAIVSL
TCATPLLDETISASDYETSLL
NDQQNNTGTDTDSDDDF
YDTPLFEDDDHDSLLLDGD
DRDCLHPEDYDTLQEEND
ETASPADVFYDVSKENENS
MVPQGAPVGSLSVKNKAH
CLQDFLMDVEKDELDSGE
KIHLNPVGSDKVNGQSLET
GSERECTNILEGDESDSLTD
YDIVGGKESFTASLKFDDSG
SWRGRKEEYVTGQEFHSD
TDHLDSMQSEESYGDYIYD
SNDQDDDDDDGIDEEGG
GIRDENGKPRCQNVAEDM
DIQLCASILNENSDENENIN
TMILLDKMHSCSSLEKQQR
VNVVQLASPSENNLVTEKS
NLPEYTTEIAGKSKENLLNH
EMVLKDVLPPIIKDTESEKT
FGPASISHDNNNISSTSELG
TDLANTKVKLIQGSELPELT
DSVKGKDEYFKNMTPKVD
SSLDHIICTEPDLIGKPAEES
HLSLIASVTDKDPQGNGSD
LIKGRDGKSDILIEDETSIQK
MYLGEGEVLVEGLVEEENR
HLKLLPGKNTRDSFKLINSQ
FPFPQITNNEELNQKGSLK
KATVTLKDEPNNLQIIVSKS
PVQFENLEEIFDTSVSKEIS
DDITSDITSWEGNTHFEESF
TDGPEKELDLFTYLKHCAK
NIKAKDVAKPNEDVPSHVL
ITAPPMKEHLQLGVNNTKE
KSTSTQKDSPLNDMIQSN
DLCSKESISGGGTEISQFTP
ESIEATLSILSRKHVEDVGK
NDFLQSERCANGLGNDNS
SNTLNTDYSFLEINNKKERI
EQQLPKEQALSPRSQEKEV
QIPELSQVFVEDVKDILKSR
LKEGHMNPQEVEEPSACA
DTKILIQNLIKRITTSQLVNE
ASTVPSDSQMSDSSGVSP
MTNSSELKPESRDDPFCIG
NLKSELLLNILKQDQHSQKI
TGVFELMRELTHMEYDLEK
RGITSKVLPLQLENIFYKLLA
DGYSEKIEHVGDFNQKACS
TSEMMEEKPHILGDIKSKE
GNYYSPNLETVKEIGLESST
VWASTLPRDEKLKDLCNDF
PSHLECTSGSKEMASGDSS
TEQFSSELQQCLQHTEKM
HEYLTLLQDMKPPLDNQES
LDNNLEALKNQLRQLETFE
LGLAPIAVILRKDMKLAEEF
LKSLPSDFPRGHVEELSISH
QSLKTAFSSLSNVSSERTKQ
IMLAIDSEMSKLAVSHEEFL
HKLKSFSDWVSEKSKSVKD
IEIVNVQDSEYVKKRLEFLK
NVLKDLGHTKMQLETTAF
DVQFFISEYAQDLSPNQSK
QLLRLLNTTQKCFLDVQES
VTTQVERLETQLHLEQDLD
DQKIVAERQQEYKEKLQGI
CDLLTQTENRLIGHQEAFM
IGDGTVELKKYQSKQEELQ
KDMQGSAQALAEVVKNTE
NFLKENGEKLSQEDKALIE
QKLNEAKIKCEQLNLKAEQ
SKKELDKVVTTAIKEETEKV
AAVKQLEESKTKIENLLDW
LSNVDKDSERAGTKHKQVI
EQNGTHFQEGDGKSAIGE
EDEVNGNLLETDVDGQVG
TTQENLNQQYQKVKAQHE
KIISQHQAVIIATQSAQVLL
EKQGQYLSPEEKEKLQKN
MKELKVHYETALAESEKKM
KLTHSLQEELEKFDADYTEF
EHWLQQSEQELENLEAGA
DDINGLMTKLKRQKSFSED
VISHKGDLRYITISGNRVLE
AAKSCSKRDGGKVDTSAT
HREVQRKLDHATDRFRSLY
SKCNVLGNNLKDLVDKYQ
HYEDASCGLLAGLQACEAT
ASKHLSEPIAVDPKNLQRQ
LEETKALQGQISSQQVAVE
KLKKTAEVLLDARGSLLPAK
NDIQKTLDDIVGRYEDLSKS
VNERNEKLQITLTRSLSVQD
GLDEMLDWMGNVESSLK
EQDVGTGYCRSSEQYKCH
E

SEQ ID NO: 1902
ENSG00000152359.10
A*02:03, A*11:01, A*11:02, A*24:02, A*24:10, A*33:03, A*34:01, B*39:01, B*40:01, B*55:02, C*03:02, C*03:04, C*12:02
MSSDEEKYSLPVVQNDSSR
GSSVSSNLQEEYEELLHYAI
VTPNIEPCASQSSHPKGEL
VPDVRISTIHDILHSQGNNS
EVRETAIEVGKGCDFHISSH
SKTDESSPVLSPRKPSHPV
MDFFSSHLLADSSSPATNS
SHTDAHEILVSDFLVSDENL
QKMENVLDLWSSGLKTNII
SELSKWRLNFIDWHRME
MRKEKEKHAAHLKQLCNQ
INELKELQKTFEISIGRKDEV
ISSLSHAIGKQKEKIELMRTF
FHWRIGHVRARQDVYEGK
LADQYYQRTLLKKVWKVW
RSVVQKQWKDVVERACQ
ARAEEVCIQISNDYEAKVA
MLSGALENAKAEIQRMQH
EKEHFEDSMKKAFMRGVC
ALNLEAMTIFQNRNDAGI
DSTNNKKEEYGPGVQGKE
HSAHLDPSAPPMPLPVTSP
LLPSPPAAVGGASATAVPS
AASMTSTRAASASSVHVP
VSALGAGSAATAASEEMY
VPRVVTSAQQKAGRTITAR
ITGRCDFASKNRISSSLAIM
GVSPPMSSVVVEKHHPVT
VQTIPQATAAKYPRTIHPES
STSASRSLGTRSAHTQSLTS
VHSIKVVD

SEQ ID NO: 1903
ENSG00000153046.13
A*02:03, A*11:01, A*11:02, A*33:03, B*15:01, C*03:02, C*07:02, C*15:02
MASEELYEVERIVDKRKNK
KGKTEYLVRWKGYDSEDD
TWEPEQHLVNCEEYIHDF
NRRHTEKQKESTLTRTNRT
SPNNARKQISRSTNSNFSK
TSPKALVIGKDHESKNSQLF
AASQKFRKNTAPSLSSRKN

SEQ ID NO: 1904
ENSG00000154556.13
MSYYQRPFSPSAYSLPASL
A*02:03, A*11:01, A*11:02, 




NSSIVMQHGTSLDSTDTYP
A*24:10, A*33:03, B*15:01, 




QHAQSLDGTTSSSIPLYRSS
B*15:27, B*39:01, B*58:01, 




EEEKRVTVIKAPHYPGIGPV
C*03:02, C*03:04, C*07:02, 




DESGIPTAIRTTVDRPKDW
C*12:02, C*14:02, C*15:02




YKTMFKQIHMVHKPDDDT





DMYNTPYTYNAGLYNPPY





SAQSHPAAKTQTYRPLSKS





HSDNSPNAFKDASSPVPPP





HVPPPVPPLRPRDRSSTEK





HDWDPPDRKVDTRKFRSE





PRSIFEYEPGKSSILQHERPA





SLYQSSIDRSLERPMSSAS





MASDFRKRRKSEPAVGPP





RGLGDQSASRTSPGRVDLP





GSSTTLTKSFTSSSPSSPSRA





KGGDDSKICPSLCSYSGLN





GNPSSELDYCSTYRQHLDV





PRDSPRAISFKNGWQMAR





QNAEIWSSTEETVSPKIKSR





SCDDLLNDDCDSFPDPKVK





SESMGSLLCEEDSKESCPM





AWGSPYVPEVRSNGRSRIR





HRSARNAPGFLKMYKKM





HRINRKDLMNSEVICSVKS





RILQYESEQQHKDLLRAWS





QCSTEEVPRDMVPTRISEF





EKLIQKSKSMPNLGDDMLS





PVTLEPPQNGLCPKRRFSIE





YLLEEENQSGPPARGRRGC





QSNALVPIHIEVTSDEQPR





AHVEFSDSDQDGVVSDHS





DYIHLEGSSFCSESDFDHFS





FTSSESFYGSSHHHHHHHH





HHHRHLISSCKGRCPASYT





RFTTMLKHERARHENTEEP





RRQEMDPGLSKLAFLVSPV





PFRRKKNSAPKKQTEKAKC





KASVFEALDSALKDICDQIK





AEKKRGSLPDNSILHRLISEL





LPDVPERNSSLRALRRSPLH





QPLHPLPPDGAIHCPPYQN





DCGRMPRSASFQDVDTAN





SSCHHQDRGGAL






SEQ ID NO: 1905
ENSG00000155275.14
MAEVGRTGISYPGALLPQG
A*02:03, A*11:01, A*11:02, 




FWAAVEVWLERPQVANK
A*24:02, A*24:10, A*33:03, 




RLCGARLEARWSAALPCAE
B*15:01, B*15:27, B*39:01, 




ARGPGTSAGSEQKERGPG
B*40:01, B*55:02, B*58:01, 




PGQGSPGGGPGPRSLSGP
C*03:02, C*14:02, C*15:02




EQGTACCELEEAQGQCQQ





EEAQREAASVPLRDSGHP





GHAEGREGDFPAADLDSL





WEDFSQSLARGNSELLAFL





TSSGAGSQPEAQRELDVVL





RTVIPKTSPHCPLTTPRREIV





VQDVLNGTITFLPLEEDDE





GNLKVKMSNVYQIQLSHS





KEEWFISVLIFCPERWHSD





GIVYPKPTWLGEELLAKLAK





WSVENKKSDFKSTLSLISIM





KYSKAYQELKEKYKEMVKV





WPEVTDPEKFVYEDVAIAA





YLLILWEEERAERRLTARQS





FVDLGCGNGLLVHILSSEG





HPGRGIDVRRRKIWDMYG





PQTQLEEDAITPNDKTLFP





DVDWLIGNHSDELTPWIP





VIAARSSYNCRFFVLPCCFF





DFIGRYSRRQSKKTQYREYL





DFIKEVGFTCGFHVDEDCL





RIPSTKRVCLVGKSRTYPSS





REASVDEKRTQYIKSRRGC





PVSPPGWELSPSPRWVAA





GSAGHCDGQQALDARVG





CVTRAWAAEHGAGPQAE





GPWLPGFHPREKAERVRN





CAALPRDFIDQVVLQVANL





LLGGKQLNTRSSRNGSLKT





WNGGESLSLAEVANELDT





ETLRRLKRECGGLQTLLRNS





HQVFQVVNGRVHIRDWR





EETLWKTKQPEAKQRLLSE





ACKTRLCWFFMHHPDGC





ALSTDCCPFAHGPAELRPP





RTTPRKKIS






SEQ ID NO: 1906
ENSG00000155506.12
MATQVEPLLPGGATLLQA
A*02:03




EEHGGLVRKKPPPAPEGKG





EPGPNDVRGGEPDGSARR





PRPPCAKPHKEGTGQQER





ESPRPLQLPGAEGPAISDG





EEGGGEPGAGGGAAGAA





GAGRRDFVEAPPPKVNPW





TKNALPPVLTTVNGQ






SEQ ID NO: 1907
ENSG00000157514.12
MNTEMYQTPMEVAVYQL
A*02:03, A*24:02, A*24:07, 




HNFSISFFSSLLGGDVVSVK
A*24:10, B*15:01, C*03:02, 




LD
C*03:04, C*03:67, C*12:02, 





C*15:02





SEQ ID NO: 1908
ENSG00000158321.11
MDGPTRGHGLRKKRRSRS
A*02:03, A*24:10, B*15:01, 




QRDRERRSRGGLGAGAAG
B*15:27, B*39:01, B*58:01, 




GGGAGRTRALSLASSSGSD
C*03:02, C*03:04, C*03:67, 




KEDNGKPPSSAPSRPRPPR
C*12:02, C*14:02, C*15:02




RKRRESTSAEEDIIDGFAMT





SFVTFEALEKDVALKPQER





VEKRQTPLTKKKREALTNG





LSFHSKKSRLSHPHHYSSDR





ENDRNLCQHLGKRKKMPK





ALRQLKPGQNSCRDSDSES





ASGESKGFHRSSSRERLSDS





SAPSSLGTGYFCDSDSDQE





EKASDASSEKLFNTVIVNKD





PELGVGTLPEHDSQDAGPI





VPKISGLERSQEKSQDCCKE





PIFEPVVLKDPCPQVAQPIP





QPQTEPQLRAPSPDPDLV





QRTEAPPQPPPLSTQPPQ





GPPEAQLQPAPQPQVQRP





PRPQSPTQLLHQNLPPVQ





AHPSAQSLSQPLSAYNSSSL





SLNSLSSSRSSTPAKTQPAP





PHISHHPSASPFPLSLPNHS





PLHSFTPTLQPPAHSHHPN





MFAPPTALPPPPPLT






SEQ ID NO: 1909
ENSG00000158486.9
MGATGRLELTLAAPPHPG
A*02:03, A*02:07, A*11:01, 




PAFQRSKARETQGEEEGSE
A*11:02, A*24:02, A*24:07, 




MQIAKSDSIHHMSHSQGQ
A*24:10, A*33:03, A*34:01, 




PELPPLPASANEEPSGLYQT
B*15:01, B*15:21, B*15:27, 




VMSHSFYPPLMQRTSWTL
B*27:04, B*38:02, B*39:01, 




AAPFKEQHHHRGPSDSIA
B*40:01, B*40:06, B*46:01, 




NNYSLMAQDLKLKDLLKVY
B*51:01, B*55:02, B*58:01, 




QPATISVPRDRTGQGLPSS
C*01:02, C*03:02, C*03:04, 




GNRSSSEPMRKKTKFSSRN
C*03:67, C*04:01, C*04:03, 




KEDSTRIKLAFKTSIFSPMK
C*07:02, C*08:01, C*12:02, 




KEVKTSLTFPGSRPMSPEQ
C*14:02, C*15:02




QLDVMLQQEMEMESKEK





KPSESDLERYYYYLTNGIRK





DMIAPEEGEVMVRISKLIS





NTLLTSPFLEPLMVVLVQE





KENDYYCSLMKSIVDYILM





DPMERKRLFIESIPRLFPQR





VIRAPVPWHSVYRSAKKW





NEEHLHTVNPMMLRLKEL





WFAEFRDLRFVRTAEILAG





KLPLQPQEFWDVIQKHCLE





AHQTLLNKWIPTCAQLFTS





RKEHWIHFAPKSNYDSSRN





IEEYFASVASFMSLQLRELV





IKSLEDLVSLFMIHKDGNDF





KEPYQEMKFFIPQLIMIKLE





VSEPIIVFNPSFDGCWELIR





DSFLEIIKNSNGIPKLKYIPLK





FSFTAAAADRQCVKAAEP





GEPSMHAAATAMAELKGY





NLLLGTVNAEEKLVSDFLIQ





TFKVFQKNQVGPCKYLNV





YKKYVDLLDNTAEQNIAAF





LKENHDIDDFVTKINAIKKR





RNEIASMNITVPLAMFCLD





ATALNHDLCERAQNLKDH





LIQFQVDVNRDTNTSICNQ





YSHIADKVSEVPANTKELVS





LIEFLKKSSAVTVFKLRRQLR





DASERLEFLMDYADLPYQI





EDIFDNSRNLLLHKRDQAE





MDLIKRCSEFELRLEGYHRE





LESFRKREVMTTEEMKHN





VEKLNELSKNLNRAFAEFEL





INKEEELLEKEKSTYPLLQA





MLKNKVPYEQLWSTAYEF





SIKSEEWMNGPLFLLNAEQ





IAEEIGNMWRTTYKLIKTLS





DVPAPRRLAENVKIKIDKFK





QYIPILSISCNPGMKDRHW





QQISEIVGYEIKPTETTCLSN





MLEFGFGKFVEKLEPIGAA





ASKEYSLEKNLDRMKLDW





VNVTFSFVKYRDTDTNILC





AIDDIQMLLDDHVIKTQTM





CGSPFIKPIEAECRKWEEKLI





RIQDNLDAWLKCQATWLY





LEPIFSSEDIIAQMPEEGRK





FGIVDSYWKSLMSQAVKD





NRILVAADQPRMAEKLQE





ANFLLEDIQKGLNDYLEKKR





LFFPRFFFLSNDELLEILSETK





DPLRVQPHLKKCFEGIAKLE





FTDNLEIVGMISSEKETVPFI





QKIYPANAKGMVEKWLQ





QVEQMMLASMREVIGLGI





EAYVKVPRNHWVLQWPG





QVVICVSSIFWTQEVSQAL





AENTLLDFLKKSNDQIAQIV





QLVRGKLSSGARLTLGALT





VIDVHARDVVAKLSEDRVS





DLNDFQWISQLRYYWVAK





DVQVQIITTEALYGYEYLGN





SPRLVITPLTDRCYRTLMGA





LKLNLGGAPEGPAGTGKTE





TTKDLAKALAKQCVVFNCS





DGLDYKAMGKFFKGLAQA





GAWACFDEFNRIEVEVLSV





VAQQILSIQQAIIRKLKTFIF





EGTELSLNPTCAVFIT






SEQ ID NO: 1910
ENSG00000159263.11
MKEKSKNAAKTRREKENG
A*02:03, A*24:02, A*24:07, 




EFYELAKLLPLPSAITSQLDK
A*24:10, A*34:01, B*15:01, 




ASIIRLTTSYLKMRAVFPEG
B*15:21, B*15:27, B*38:02, 




LGDA
B*39:01, B*40:01, B*40:06, 





B*51:01, B*55:02, C*14:02, 





C*15:02





SEQ ID NO: 1911
ENSG00000159788.14
MFRAGEASKRPLPGPSPPR
A*02:03, A*11:01, A*11:02, 




VRSVEVARGRAGYGFTLSG
A*24:10, A*33:03, A*34:01, 




QAPCVLSCVMRGSPADFV
B*15:01, B*40:01, B*55:02, 




GLRAGDQILAVNEINVKKA
C*15:02




SHEDVVKLIGKCSGVLHMV





IAEGVGRFESCSSDEEGGLY





EGKGWLKPKLDSKALGINR





AERVVEEMQSGGIFNMIF





ENPSLCASNSEPLKLKQRSL





SESAATRFDVGHESINNPN





PNMLSKEEISKVIHDDSVFS





IGLESHDDFALDASILNVA





MIVGYLGSIELPSTSSNLES





DSLQAIRGCMRRLRAEQKI





HSLVTMKIMHDCVQLSTD





KAGVVAEYPAEKLAFSAVC





PDDRRFFGLVTMQTNDD





GSLAQEEEGALRTSCHVF





MVDPDLFNHKIHQGIARR





FGFECTADPDTNGCLEFPA





SSLPVLQFISVLYRDMGELI





EGMRARAFLDGDADAHQ





NNSTSSNSDSGIGNFHQEE





KSNRVLVVD






SEQ ID NO: 1912
ENSG00000160200.13
MPSETPQAEVGPTGCPHR
A*02:03, A*11:01, A*11:02, 




SGPHSAKGSLEKGSPEDKE
A*24:10, A*33:03, B*15:01, 




AKEPLWIRPDAPSRCTWQ
B*38:02, B*39:01, B*40:01, 




LGRPASESPHHHTAPAKSP
B*58:01, C*03:02, C*03:04, 




KILPDILKKIGDTPMVRINKI
C*07:02, C*14:02




GKKFGLKCELLAKCEFFNA





GGSVKDRISLRMIEDAERD





GTLKPGDTIIEPTSGNTGIG





LALAAAVRGYRCIIVMPEK





MSSEKVDVLRALGAEIVRT





PTNARFDSPESHVGVAWR





LKNEIPNSHILDQYRNASN





PLAHYDTTADEILQQCDGK





LDMLVASVGTGGTITGIAR





KLKEKCPGCRIIGVDPEGSIL





AEPEELNQTEQTTYEVEGI





GYDFIPTVLDRTVVDKWFK





SNDEEAFTFARMLIAQEGL





LCGGSAGSTVAVAVKAAQ





ELQEGQRCVVILPDSVRNY





MTKFLSDRWMLQKGFLKE





EDLTEKKPWWWHLRVQE





LGLSAPLTVLPTITCGHTIEIL





REKGFDQAPVVDEAGVILG





MVTLGNMLSSLLAGKVQP





SDQVGKVIYKQFKQIRLTD





TLGRLSHILEMDHFALVVH





EQIQYHSTGKSSQRQMVF





GVVTAIDLLNFVAAQERDQ





K






SEQ ID NO: 1913
ENSG00000160799.7
MQDGRKGGAYAGKMEAT
A*02:03




TAGVGRLEEEALRRKERLK





ALREKTG






SEQ ID NO: 1914
ENSG00000160838.9
MSSEQSAPGASPRAPRPG
A*02:03, A*11:01, A*11:02, 




TQKSSGAVTKKGERAAKEK
A*24:02, A*24:07, A*24:10, 




PATVLPPVGEEEPKSPEEY
B*40:01, B*55:02, C*01:02, 




QCSGVLETDFAELCTRWG
C*03:02, C*04:01, C*04:03, 




YTDFPKVVNRPRPHPPFVP
C*07:02, C*15:02




SASLSEKATLDDPRLSGSCS





LNSLESKYVFFRPTIQVELE





QEDSKSVKEIYIRGWKVEE





RILGVFSKCLPPLTQLQAIN





LWKVGLTDKTLTTFIELLPL





CSSTLRKVSLEGNPLPEQSY





HKL






SEQ ID NO: 1915
ENSG00000164093.11
METNCRKLVSACVQLGVQ
A*11:01, A*11:02, A*33:03




PAAVECLFSKDSEIKKVEFT





DSPESRKEAASSKFFPRQH






SEQ ID NO: 1916
ENSG00000164764.10
MRTLWMALCALSRLWPG
A*11:01, A*11:02, A*24:10, 




AQAGCAEAGRCCPGRDPA
A*33:03, B*55:02, C*03:02, 




CFARGWRLDRVYGTCFCD
C*03:04




QACRFTGDCCFDYDRACP





ARPCFVGEWSPWSGCAD





QCKPTTRVRRRSVQQEPQ





NGGAPCPPLEERAGCLEYS





TPQGQDCGHTYVPAFITTS





AFNKERTRQATSPHWSTH





TEDAGYCMEFKTESLTPHC





ALENWPLTRWMQYLREG





YTVCVDCQPPAMNSVSLR





CSGDGLDSDGNQTLHWQ





AIGNPRCQGTWKKVRRVD





QCSCPAVHSFIFI






SEQ ID NO: 1917
ENSG00000164830.13
MDYLTTFTEKSGRLLRGTA
A*33:03




NRLLGFGGGGEARQVRFE





DYLREPAQGDLGCGSPPH





RPPAPSSPEGP






SEQ ID NO: 1918
ENSG00000166689.10
MAAATVGRDTLPEHWSY
A*33:03




GVCRDGRVFFINDQLRCTT





WLHPRTGEPVNSGHMIRS





DLPRGWEE






SEQ ID NO: 1919
ENSG00000167157.9
MDSAAAAFALDKPALGPG
A*11:01, A*11:02, C*03:02, 




PPPPPPALGPGDCAQARK
C*03:04, C*03:67




NFSVSHLLDLEEVAAAGRL





AARPGARAEAREGAAREP





SGGSSGSEAAPQ






SEQ ID NO: 1920
ENSG00000167632.10
MSVPDYMQCAEDHQTLL
A*02:03, A*02:07, A*11:01, 




VVVQPVGIVSEENFFRIYKR
A*11:02, A*24:02, A*24:07, 




ICSVSQISVRDSQRVLYIRYR
A*24:10, A*33:03, B*15:01, 




HHYPPENNEWGDFQTHR
B*15:27, B*39:01, B*40:01, 




KVVGLITITDCFSAKDWPQ
B*55:02, B*58:01, C*03:02, 




TFEKFHVQKEIYGSTLYDSR
C*03:04, C*03:67, C*07:02, 




LFVFGLQGEIVEQPRTDVA
C*12:02, C*14:02, C*15:02




FYPNYEDCQTVEKRIEDFIE





SLFIVLESKRLDRATDKSGD





KIPLLCVPFEKKDFVGLDTD





SRHYKKRCQGRMRKHVG





DLCLQAGMLQDSLVHYH





MSVELLRSVNDFLWLGAA





LEGLCSASVIYHYPGGTGG





KSGARRFQGSTLPAEAANR





HRPGALTTNGINPDTSTEI





GRAKNCLSPEDIIDKYKEAIS





YYSKYKNAGVIELEACIKAV





RVLAIQKRSMEASEFLQNA





VYINLRQLSEEEKIQRYSILS





ELYELIGFHRKSAFFKRVAA





MQCVAPSIAEPGWRACYK





LLLETLPGYSLSLDPKDFSR





GTHRGWAAVQMRLLHEL





VYASRRMGNPALSVRHLSF





LLQTMLDFLSDQEKKDVA





QSLENYTSKCPGTMEPIAL





PGGLTLPPVPFTKLPIVRHV





KLLNLPASLRPHKMKSLLG





QNVSTKSPFIYSPIIAHNRG





EERNKKIDFQWVQGDVCE





VQLMVYNPMPFELRVEN





MGLLTSGVEFESLPAALSLP





AESGLYPVTLVGVPQTTGTI





TVNGYHTTVFGVFSDCLLD





NLPGIKTSGSTVEVIPALPR





LQISTSLPRSAHSLQPSSGD





EISTNVSVQLYNGESQQLII





KLENIGMEPLEKLEVTSKVL





TTKEKLYGDFLSWKLEETLA





QFPLQPGKVATFTINIKVKL





DFSCQENLLQDLSDDGISV





SGFPLSSPFRQVVRPRVEG





KPVNPPESNKAGDYSHVKT





LEAVLNFKYSGGPGHTEGY





YRNLSLGLHVEVEPSVFFTR





VSTLPATSTRQCHLLLDVF





NSTEHELTVSTRSSEALILH





AGECQRMAIQVDKFNFES





FPESPGEKGQFANPKQLEE





ERREARGLEIHSKLGICWRI





PSLKRSGEASVEGLLNQLVL





EHLQLAPLQWDVLVDGQP





CDREAVAACQVGDPVRLE





VRLTNRSPRSVGPFALTVV





PFQDHQNGVHNYDLHDT





VSFVGSSTFYLDAVQPSGQ





SACLGALLFLYTGDFFLHIRF





HEDSTSKELPPSWFCLPSV





HVCALEAQA






SEQ ID NO: 1921
ENSG00000170615.10
MDHAEENEILAATQRYYVE
A*02:03, A*02:07, A*11:01, 




RPIFSHPVLQERLHTKDKVP
A*11:02, A*24:02, A*24:07, 




DSIADKLKQAFTCTPKKIRN
A*24:10, A*33:03, A*34:01, 




IIYMFLPITKWLPAYKFKEY
B*15:01, B*15:21, B*15:27, 




VLGDLVSGISTGVLQLPQG
B*27:04, B*38:02, B*39:01, 




LAFAMLAAVPPIFGLYSSFY
B*40:01, B*40:06, B*46:01, 




PVIMYCFLGTSRHISIGPFA
B*51:01, B*55:02, B*58:01, 




VISLMIGGVAVRLVPDDIVI
C*01:02, C*03:02, C*03:04, 




PGGVNATNGTEARDALRV
C*03:67, C*04:01, C*04:03, 




KVAMSVTLLSGIIQFCLGVC
C*08:01, C*12:02, C*14:02, 




RFGFVAIYLTEPLVRGFTTA
C*15:02




AAVHVFTSMLKYLFGVKTK





RYSGIFSVVYSTVAVLQNV





KNLNVCSLGVGLMVFGLLL





GGKEFNERFKEKLPAPIPLE





FFAVVMGTGISAGFNLKES





YNVDVVGTLPLGLLPPANP





DTSLFHLVYVDAIAIAIVGFS





VTISMAKTLANKHGYQVD





GNQELIALGLCNSIGSLFQT





FSISCSLSRSLVQEGTGGKT





QLAGCLASLMILLVILATGF





LFESLPQAVLSAIVIVNLKG





MFMQFSDLPFFWRTSKIEL





TIWLTTFVSSLFLGLDYGLIT





AVIIALLTVIYRTQS






SEQ ID NO: 1922
ENSG00000171680.16
MHYDGHVRFDLPPQGSVL
A*02:03, A*02:07, A*11:01, 




ARNVSTRSCPPRTSPAVDL
A*11:02, A*24:10, A*33:03, 




EEEEEESSVDGKGDRKSTG
B*15:01, B*39:01, B*40:01, 




LKLSKKKARRRHTDDPSKE
B*58:01, C*03:02, C*03:04, 




CFTLKFDLNVDIETEIVPAM
C*07:02, C*12:02, C*14:02, 




KKKSLGEVLLPVFERKGIAL
C*15:02




GKVDIYLDQSNTPLSLTFEA





YRFGGHYLRVKAPAKPGDE





GKVEQGMKDSKSLSLPILR





PAGTGPPALERVDAQSRRE





SLDILAPGRRRKNMSEFLG





EASIPGQEPPTPSSCSLPSG





SSGSTNTGDSWKNRAASR





FSGFFSSGPSTSAFGREVDK





MEQLEGKLHTYSLFGLPRL





PRGLRFDHDSWEEEYDED





EDEDNACLRLEDSWRELID





GHEKLTRRQCHQQEAVW





ELLHTEASYIRKLRVIINLFLC





CLLNLQESGLLCEVEAERLF





SNIPEIAQLHRRLWASVMA





PVLEKARRTRALLQPGDFL





KGFKMFGSLFKPYIRYCME





EEGCMEYMRGLLRDNDLF





RAYITWAEKHPQCQRLKLS





DMLAKPHQRLTKYPLLLKS





VLRKTEEPRAKEAVVAMIG





SVERFIHHVNACMRQRQE





RQRLAAVVSRIDAYEVVES





SSDEVDKLLKEFLHLDLTAPI





PGASPEETRQLLLEGSLRM





KEGKDSKMDVYCFLFTDLL





LVTKAVKKAERTRVIRPPLL





VDKIVCRELRDPGSFLLIYLN





EFHSAVGAYTFQASGQALC





RGWVDTIYNAQNQLQQL





RAQEPPGSQQPLQSLEEEE





DEQEEEEEEEEEEEEGEDS





GTSAASSPTIMRKSSGSPD





SQHCASDGSTETLAMVVV





EPGDTLSSPEEDSGPFSSQS





DETSLSTTASSATPTSELLPL





GPVDGRSCSMDSAYGTLS





PTSLQDFVAPGPMAELVP





RAPESPRVPSPPPSPRLRRR





TPVQLLSCPPHLLKSKSEAS





LLQLLAGAGTHGTPSAPSR





SLSELCLAVPAPGIRTQGSP





QEAGPSWDCRGAPSPGSG





PGLVGCLAGEPAGSHRKRC





GDLPSGASPRVQPEPPPGV





SAQHRKLTLAQLYRIRTTLL





LNSTLTASEV






SEQ ID NO: 1923
ENSG00000171791.10
MAHAGRTGYDNREIVMK
A*02:03, A*11:01, A*11:02, 




YIHYKLSQRGYEWDAGDV
A*24:02, A*24:07, A*24:10, 




GAAPPGAAPAPGIFSSQPG
A*33:03, A*34:01, B*15:21, 




HTPHPAASRDPVARTSPLQ
B*27:04, B*40:01, B*40:06, 




TPAAPGAAAGPALSPVPPV
B*46:01, B*55:02, B*58:01, 




VHLTLRQAGDDFSRRYRRD
C*01:02, C*03:02, C*04:01, 




FAEMSSQLHLTPFTARGRF
C*04:03, C*14:02




ATVVEELFRDGVNWGRIV





AFFEFGGVMCVESVNREM





SPLVDNIALWMTEYLNRHL





HTWIQDNGGWDAFVELY





GPS






SEQ ID NO: 1924
ENSG00000172765.12
MKRGTSLHSRRGKPEAPK
A*02:03, A*33:03, C*03:02, 




GSPQINRKSGQEMTAVM
C*03:04




QSGRPRSSSTTDAPTSSAM





MEIACAAAAAAAACLPGE





EGTAE






SEQ ID NO: 1925
ENSG00000174672.11
MTSTGKDGGAQHAQYVG
A*02:03, A*11:01, A*11:02, 




PYRLEKTLGKGQTGLVKLG
A*24:02, A*24:10, A*33:03, 




VHCVTCQKVAIKIVNREKLS
B*40:01, C*03:02, C*03:04, 




ESVLMKVEREIAILKLIEHPH
C*14:02




VLKLHDVYENKKYLYLVLEH





VSGGELFDYLVKKGRLTPK





EARKFFRQIISALDFCHSHSI





CHRDLKPENLLLDEKNNIRI





ADFGMASLQVGDSLLETSC





GSPHYACPEVIRGEKYDGR





KADVWSCGVILFALLVGAL





PFDDDNLRQLLEKVKRGVF





HMPHFIPPDCQSLLRGMIE





VDAARRLTLEHIQKHIWYI





GGKNEPEPEQPIPRKVQIR





SLPSLEDIDPDVLDSMHSL





GCFRDRNKLLQDLLSEEEN





QEKMIYFLLLDRKERYPSQE





DEDLPPRNEIDPPRKRVDS





PMLNRHGKRRPERKSMEV





LSVTDGGSPVPARRAIEMA





QHGQSKAMFSKSLDIAEA





HPQFSKEDRSRSISGASSGL





STSPLSSPRVTPHPSPRGSP





LPTPKGTPVHTPKESPAGT





PNPTPPSSPSVGGVPWRA





RLNSIKNSFLGSPRFHRRKL





QVPTPEEMSNLTPESSPEL





AKKSWFGNFISLEKEEQIFV





VIKDKPLSSIKADIVHAFLSI





PSLSHSVISQTSFRAEYKAT





GGPAVFQKPVKFQVDITYT





EGGEAQKENGIYSVTFTLLS





GPSRRFKRVVETIQAQLLST





HDPPAAQHLSEPPPPAPGL





SWGAGLKGQKVATSYESSL






SEQ ID NO: 1926
ENSG00000177380.9
MMCEVMPTISEDGRRGSA
A*02:03, A*11:01, A*11:02, 




LGPDEAGGELERLMVTML
A*24:10, A*33:03, B*15:01, 




TERERLLETLREAQDGLAT
B*39:01, B*40:01, B*58:01, 




AQLRLRELGHEKDSLQRQL
C*03:02, C*03:04, C*03:67, 




SIALPQEFAALTKELNLCRE
C*12:02




QLLEREEEIAELKAERNNTR





LLLEHLECLVSRHERSLRMT





VVKRQAQSPGGVSSEVEV





LKALKSLFEHHKALDEKVRE





RLRMALERVAVLEEELELS





NQETLNLREQLSRRRSGLE





EPGKDGDGQTLANGLGPG





GDSNRRTAELEEALERQRA





EVCQLRERLAVLCRQMSQ





LEEELGTAHRELGKAEEAN





SKLQRDLKEALAQREDME





ERITTLEKRYLSAQREATSL





HDANDKLENELASKESLYR





QSEEKSRQLAEWLDDAKQ





KLQQTLQKAETLPEIEAQLA





QRVAALNKAEERHGNFEE





RLRQLEAQLEEKNQELQRA





RQREKMNDDHNKRLSETV





DKLLSESNERLQLHLKERM





GALEEKNSLSEEIANMKKL





QDELLLNKEQLLAEMERM





QMEIDQLRGRPPSSYSRSL





PGSALELRYSQAPTLPSGA





HLDPYVAGSGRAGKRGR





WSGVKEEPSKDWERSAPA





GSIPPPFPGELDGSDEEEAE





GMFGAELLSPSGQADVQT





LAIMLQEQLEAINKEIKLIQE





EKETTEQRAEELESRVSSSG





LDSLGRYRSSCSLPPSLTTST





LASPSPPSSGHSTPRLAPPS





PAREGTDKANHVPKEEAG





APRGEGPAIPGDTPPPTPR





SARLERMTQALALQAGSLE





DGGPPRGSEGTPDSLHKA





PKKKSIKSSIGRLFGKKEKG





RMGPPGRDSSSLAGTPSD





ETLATDPLGLAKLTGPGDK





DRRNKRKHELLEEACRQGL





PFAAWDGPTVVSWLELW





VGMPAWYVAACRANVKS





GAIMANLSDTEIQREIGISN





PLHRLKLRLAIQEMVSLTSP





SAPASSRTSTGNVWMTHE





EMESLTATTKPILAYGDMN





HEWVGNDWLPSLGLPQY





RSYFMESLVDARMLDHLN





KKELRGQLKMVDSFHRVSL





HYGIMCLKRLNYDRKDLER





RREESQTQIRDVMVWSNE





RVMGWVSGLGLKEFATNL





TESGVHGALLALDETFDYS





DLALLLQIPTQNAQARQLL





EKEFSNLISLGTDRRLDEDS





AKSFSRSPSWRKMFREKDL





RGVTPDSAEMLPPNFRSA





AAGALGSPGLPLRKLQPEG





QTSGSSRADGVSVRTYSC






SEQ ID NO: 1927
ENSG00000177455.7
MPPPRLLFFLLFLTPMEVR
A*02:03, A*11:01, A*11:02, 




PEEPLVVKVEEGDNAVLQC
A*24:10, B*39:01, B*40:01, 




LKGTSDGPTQQLTWSRES
B*58:01, C*03:02, C*03:04, 




PLKPFLKLSLGLPGLGIHMR
C*12:02, C*14:02, C*15:02




PLAIWLFIFNVSQQMGGFY





LCQPGPPSEKAWQPGWT





VNVEGSGELFRWNVSDLG





GLGCGLKNRSSEGPSSPSG





KLMSPKLYVWAKDRPEIW





EGEPPCLPPRDSLNQSLSQ





DLTMAPGSTLWLSCGVPP





DSVSRGPLSWTHVHPKGP





KSLLSLELKDDRPARDMW





VMETGLLLPRATAQDAGK





YYCHRGNLTMSFHLEITAR





PVLWHWLLRTGGWKVSA





VTLAYLIFCLCSLVGILHLQR





ALVLRRKRKRMTDPTRRFF





KVTPPPGSGPQNQYGNVL





SLPTPTSGLGRAQRWAAG





LGGTAPSYGNPSSDVQAD





GALGSRSPPGVGPEEEEGE





GYEEPDSEEDSEFYENDSN





LGQDQLSQDGSGYENPED





EPLGPEDEDSFSNAESYEN





EDEELTQPVARTMDFLSPH





GSAWDPSREATSLGSQSYE





DMRGILYAAPQLRSIRGQP





GPNHEEDADSYENMDNP





DGPDPAWGGGGRMGTW





STR






SEQ ID NO: 1928
ENSG00000178209.10
MVAGMLMPRDQLRAIYE
A*02:03, A*11:01, A*11:02, 




VLFREGVMVAKKDRRPRSL
A*24:02, A*24:10, A*33:03, 




HPHVPGVTNLQVMRAMA
A*34:01, B*55:02, C*03:02, 




SLRARGLVRETFAWCHFY
C*03:04




WYLTNEGIAHLRQYLHLPP





EIVPASLQRVRRPVAMVM





PARRTPHVQAVQGPLGSP





PKRGPLPTEEQRVYRRKEL





EEVSPETPVVPATTQRTLA





RPGPEPAPAT






SEQ ID NO: 1929
ENSG00000181035.9
MGNGVKEGPVRLHEDAE
A*02:03, A*11:01, A*11:02, 




AVLSSSVSSKRDHRQVLSSL
A*24:02, A*24:07, A*24:10, 




LSGALAGALAKTAVAPLDR
A*33:03, B*15:01, B*39:01, 




TKIIFQVSSKRFSAKEAFRVL
B*40:01, C*03:02, C*03:04, 




YYTYLNEGFLSLWRGNSAT
C*03:67, C*12:02, C*14:02




MVRVVPYAAIQFSAHEEYK





RILGSYYGFRGEALPPWPR





LFAGALAGTTAASLTYPLDL





VRARMAVTPKEMYSNIFH





VFIRISREEGLKTLYHGFMP





TVLGVIPYAGLSFFTYETLKS





LHREYSGRRQPYPFERMIF





GACAGLIGQSASYPLDVVR





RRMQTAGVTGYPRASIAR





TLRTIVREEGAVRGLYKGLS





MNWVKGPIAVGISFTTFDL





MQILLRHLQS






SEQ ID NO: 1930
ENSG00000185404.12
MAGGGSDLSTRGLNGGVS
A*02:03, A*24:10, A*33:03, 




QVANEMNHLPAHSQSLQ
C*03:02




RLFTEDQDVDEGLVYDTVF





KHFKRHKLEISNAIKKTFPFL





EGLRDRELITNK






SEQ ID NO: 1931
ENSG00000185686.13
MERRRLWGSIQSRYISMS
A*02:03, A*11:01, A*11:02, 




VWTSPRRLVELAGQSLLKD
A*24:10, A*33:03, B*15:01, 




EALAIAALELLPRELFPPLF
B*39:01, B*40:01, B*58:01, 




MAAFDGRHSQTLKAMVQ
C*03:02, C*03:04, C*14:02




AWPFTCLPLGVLMKGQHL





HLETFKAVLDGLDVLLAQE





VRPRRWKLQVLDLRKNSH





QDFWTVWSGNRASLYSFP





EPEAAQPMTKKRKVDGLS





TEAEQPFIPVEVLVDLFLKE





GACDELFSYLIEKVKRKKNV





LRLCCKKLKIFAMPMQDIK





MILKMVQLDSIEDLEVTCT





WKLPTLAKFSPYLGQMINL





RRLLLSHIHASSYISPEKEEQ





YIAQFTSQFLSLQCLQALYV





DSLFFLRGRLDQLLRHVMN





PLETLSITNCRLSEGDVMHL





SQSPSVSQLSVLSLSGVML





TDVSPEPLQALLERASATL





QDLVFDECGITDDQLLALL





PSLSHCSQLTTLSFYGNSISI





SALQSLLQHLIGLSNLTHVL





YPVPLESYEDIHGTLHLERL





AYLHARLRELLCELGRPSM





VWLSANPCPHCGDRTFYD





PEPILCPCFMPN






SEQ ID NO: 1932
ENSG00000185989.9
MAVEDEGLRVFQSVKIKIG
A*02:03, A*11:01, A*11:02, 




EAKNLPSYPGPSKMRDCYC
A*24:02, A*24:07, A*24:10, 




TVNLDQEEVFRTKIVEKSLC
A*33:03, B*15:01, B*15:27, 




PFYGEDFYCEIPRSFRHLSF
B*39:01, B*40:01, B*58:01, 




YIFDRDVFRRDSIIGKVAIQ
C*03:02, C*03:04, C*07:02, 




KEDLQKYHNRDTWFQLQH
C*12:02, C*14:02




VDADSEVQGKVHLELRLSE





VITDTGVVCHKLATRIVEC





QGLPIVNGQCDPYATVTLA





GPFRSEAKKTKVKRKTNNP





QFDEVFYFEVTRPCSYSKKS





HFDFEEEDVDKLEIRVDLW





NASNLKFGDEFLGELRIPLK





VLRQSSSYEAWYFLQPRD





NGSKSLKPDDLGSLRLNVV





YTEDHVFSSDYYSPLRDLLL





KSADVEPVSASAAHILGEV





CREKQEAAVPLVRLFLHYG





RVVPFISAIASAEVKRTQDP





NTIFRGNSLASKCIDETMKL





AGMHYLHVTLKPAIEEICQ





SHKPCEIDPVKLKDGENLE





NNMENLRQYVDRVFHAIT





ESGVSCPTVMCDIFFSLREA





AAKRFQDDPDVRYTAVSSF





IFLRFFAPAILSPNLFQLTPH





HTDPQTSRTLTLISKTVQTL





GSLSKSKSASFKESYMATFY





EFFNEQKYADAVKNFLDLIS





SSGRRDPKSVEQPIVLKEG






SEQ ID NO: 1933
ENSG00000196961.8
MPAVSKGDGMRGLAVFIS
A*02:03, A*11:01, A*11:02, 




DIRNCKSKEAEIKRINKELA
A*24:02, A*24:07, A*24:10, 




NIRSKFKGDKALDGYSKKK
A*33:03, A*34:01, B*15:01, 




YVCKLLFIFLLGHDIDFGHM
B*15:27, B*39:01, B*40:01, 




EAVNLLSSNKYTEKQIGYLFI
B*40:06, B*58:01, C*03:02, 




SVLVNSNSELIRLINNAIKN
C*03:04, C*03:67, C*08:01, 




DLASRNPTFMCLALHCIAN
C*12:02, C*14:02, C*15:02




VGSREMGEAFAADIPRILV





AGDSMDSVKQSAALCLLRL





YKASPDLVPMGEWTARVV





HLLNDQHMGVVTAAVSLI





TCLCKKNPDDFKTCVSLAV





SRLSRIVSSASTDLQDYTYY





FVPAPWLSVKLLRLLQCYP





PPEDAAVKGRLVECLETVL





NKAQEPPKSKKVQHSNAK





NAILFETISLIIHYDSEPNLLV





RACNQLGQFLQHRETNLR





YLALESMCTLASSEFSHEAV





KTHIDTVINALKTERDVSVR





QRAADLLYAMCDRSNAKQ





IVSEMLRYLETADYAIREEIV





LKVAILAEKYAVDYSWYVD





TILNLIRIAGDYVSEEVWYR





VLQIVTNRDDVQGYAAKT





VFEALQAPACHENMVKVG





GYILGEFGNLIAGDPRSSPP





VQFSLLHSKFHLCSVATRAL





LLSTYIKFINLFPETKATIQG





VLRAGSQLRNADVELQQR





AVEYLTLSSVASTDVLATVL





EEMPPFPERESSILAKLKRK





KGPGAGSALDDGRRDPSS





NDINGGMEPTPSTVSTPSP





SADLLGLRAAPPPAAPPAS





AGAGNLLVDVFDGPAAQP





SLGPTPEEAFLSPGPEDIGP





PIPEADELLNKFVCKNNGV





LFENQLLQIGVKSEFRQNL





GRMYLFYGNKTSVQFQNF





SPTVVHPGDLQTQLAVQT





KRVAAQVDGGAQVQQVL





NIECLRDFLTPPLLSVRFRY





GGAPQALTLKLPVTINKFF





QPTEMAAQDFFQRWKQL





SLPQQEAQKIFKANHPMD





AEVTKAKLLGFGSALLDNV





DPNPENFVGAGIIQTKALQ





VGCLLRLEPNAQAQMYRL





TLRTSKEPVSRHLCELLAQQ





F






SEQ ID NO: 1934
ENSG00000197530.8
MAGALRRGRALGSRPSGP
A*02:03, A*11:01, A*11:02, 




TVSSRRSPQCPVAQEGLGA
A*24:02, A*24:07, A*24:10, 




RSRPRVAPRSLARCGPSSRL
A*33:03, B*15:01, B*39:01, 




MGWKPSEARGQSQSFQA
B*40:01, B*58:01, C*03:02, 




SGLQPRSLKAARRATGRPD
C*03:04, C*07:02, C*12:02, 




RSRAAPPNMDPDPQAGV
C*14:02




QVGMRVVRGVDWKWGQ





QDGGEGGVGTVVELGRH





GSPSTPDRTVVVQWDQG





TRTNYRAGYQGAHDLLLYD





NAQIGVRHPNIICDCCKKH





GLRGMRWKCRVCLDYDLC





TQCYMHNKHELAHAFDRY





ETAHSRPVTLSPRQGLPRIP





LRGIFQGAKVVRGPDWE





WGSQDGGEGKPGRVVDI





RGWDVETGRSVASVTWA





DGTTNVYRVGHKGKVDLK





CVGEAAGGFYYKDHLPRLG





KPAELQRRVSADSQPFQH





GDKVKCLLDTDVLREMQE





GHGGWNPRMAEFIGQTG





TVHRITDRGDVRVQFNHE





TRWTFHPGALTKHHSFWV





GDVVRVIGDLDTVKRLQA





GHGEWTDDMAPALGRVG





KVVKVFGDGNLRVAVAGQ





RWTFSPSCLVAYRPEEDAN





LDVAERARENKSSLSVALD





KLRAQKSDPEHPGRLVVEV





ALGNAARALDLLRRRPEQV





DTKNQGRTALQVAAYLGQ





VELIRLLLQARAGVDLPDDE





GNTALHYAALGNQPEATR





VLLSAGCRADAINSTQSTA





LHVAVQRGFLEVVRALCER





GCDVNLPDAHSDTPLHSAI





SAGTGASGIVEVLTEVPNID





VTATNSQGFTLLHHASLKG





HALAVRKILARARQLVDAK





KEDGFTALHLAALNNHREV





AQILIREGRCDVNVRNRKL





QSPLHLAVQQAHVGLVPLL





VDAGCSVNAEDEEGDTAL





HVALQRHQLLPLVADGAG





GDPGPLQLLSRLQASGLPG





SAELTVGAAVACFLALEGA





DVSYTNHRGRSPLDLAAEG





RVLKALQGCAQRFRERQA





GGGAAPGPRQTLGTPNTV





TNLHVGAAPGPEAAECLV





CSELALLVLFSPCQHRTVCE





ECARRMKKCIRCQVVVSKK





LRPDGSEVASAAPAPGPPR





QLVEELQSRYRQMEERITC





PICIDSHIRLVFQCGHGACA





PCGSALSACPICRQPIRDRI





QIFV






SEQ ID NO: 1935
ENSG00000204839.4
MAGGVWGRSRAREAPVG
A*02:03, A*11:01, A*11:02, 




ALTLTALTEGIRARQGQPQ
A*24:02, A*24:07, A*24:10, 




GPPSAGPQPKSWEVKPEA
A*33:03, B*39:01, B*40:01, 




EPQTQALTAPSEAEPGRGA
B*58:01, C*03:02, C*03:04, 




TVPEAGSEPCSLNSALEPAP
C*14:02




EGPHQVPQSSWEEGVLAD





LALYTAACLEEAGFAGTQA





TVLTLSSALEARGERLEDQV





HALVRGLLAQVPSLAEGRP





WRAALRVLSALALEHARD





VVCALLPRSLPADRVAAEL





WRSLSRNQRVNGQVLVQL





LWALKGASGPEPQALAAT





RALGEMLAVSGCVGATRG





FYPHLLLALVTQLHKLARSP





CSPDMPKIWVLSHRGPPH





SHASCAVEALKALLTGDGG





RMVVTCMEQAGGWRRLV





GAHTHLEGVLLLASAMVA





HADHHLRGLFADLLPRLRS





ADDPQRLTAMAFFTGLLQ





SRPTARLLREEVILERLLTW





QGDPEPTVRWLGLLGLGH





LALNRRKVRHVSTLLPALLG





ALGEGDARLVGAALGALR





RLLLRPRAPVRLLSAELGPR





LPPLLDDTRDSIRASAVGLL





GTLVRRGRGGLRLGLRGPL





RKLVLQSLVPLLLRLHDPSR





DAAESSEWTLARCDHAFC





WGLLEELVTVAHYDSPEAL





SHLCCRLVQRYPGHVPNFL





SQTQGYLRSPQDPLRRAA





AVLIGFLVHHASPGCVNQD





LLDSLFQDLGRLQSDPKPA





VAAAAHVSAQQVA






SEQ ID NO: 1936
ENSG00000205277.5
MLVIWILTLALRLCASVTTV
A*02:03, A*11:01, A*11:02, 




TPGSTVNTSIGGNTTSASTP
A*24:02, A*24:10, A*33:03, 




SSSDPFTTFSDYGVSVTFIT
B*15:01, B*39:01, B*40:01, 




GSTATKHFLDSSTNSGHSE
B*55:02, B*58:01, C*03:02, 




ESTVSHSGPGATGTTLFPS
C*03:04, C*03:67, C*07:02, 




HSATSVFVGEPKTSPITSAS
C*12:02, C*14:02, C*15:02




METTALPGSTTTAGLSEKS





TTFYSSPRSPDRTLSPARTT





SSGVSEKSTTSHSRPGPTHT





IAFPDSTTMPGVSQESTAS





HSIPGSTDTTLSPGTTTPSSL





GPESTTFHSSPGYTKTTRLP





DNTTTSGLLEASTPVHSST





GSPHTTLSPSSSTTHEGEPT





TFQSWPSSKDTSPAPSGTT





SAFVKLSTTYHSSPSSTPTT





HFSASSTTLGHSEESTPVHS





SPVATATTPPPARSATSGH





VEESTAYHRSPGSTQTMHF





PESSTTSGHSEESATFHGST





THTKSSTPSTTAALAHTSYH





SSLGSTETTHFRDSSTISGRS





EESKASHSSPDAMATTVLP





AGSTPSVLVGDSTPSPISSG





SMETTALPGSTTKPGLSEKS





TTFYSSPRSPDTTHLPASM





TSSGVSEESTTSHSRPGSTH





TTAFPGSTTMPGLSQESTA





SHSSPGPTDTTLSPGSTTAS





SLGPEYTTFHSRPGSTETTL





LPDNTTASGLLEASMPVHS





STRSPHTTLSPAGSTTRQG





ESTTFHSWPSSKDTRPAPP





TTTSAFVEPSTTSHGSPSSIP





TTHISARSTTSGLVEESTTY





HSSPGSTQTMHFPESDTTS





GRGEESTTSHSSTTHTISSA





PSTTSALVEEPTSYHSSPGS





TATTHFPDSSTTSGRSEEST





ASHSSQDATGTIVLPARSTT





SVLLGESTTSPISSGSMETT





ALPGSTTTPGLSERSTTFHS





SPRSPATTLSPASTTSSGVS





EESTTSRSRPGSTHTTAFPD





STTTPGLSRHSTTSHSSPGS





TDTTLLPASTTTSGPSQEST





TSHSSSGSTDTALSPGSTTA





LSFGQESTTFHSNPGSTHT





TLFPDSTTSSGIVEASTRVH





SSTGSPRTTLSPASSTSPGL





QGESTAFQTHPASTHTTPS





PPSTATAPVEESTTYHRSP





GSTPTTHFPASSTTSGHSEK





STIFHSSPDASGTTPSSAHS





TTSGRGESTTSRISPGSTEIT





TLPGSTTTPGLSEASTTFYSS





PRSPTTTLSPASMTSLGVG





EESITSRSQPGSTHSTVSPA





STTTPGLSEESTTVYSSSRG





STETTVFPHSTTTSVHGEEP





TTFHSRPASTHTTLFTEDST





TSGLTEESTAFPGSPASTQT





GLPATLTTADLGEESTTFPS





SSGSTGTKLSPARSTTSGLV





GESTPSRLSPSSTETTTLPGS





PTTPSLSEKSTTFYTSPRSPD





ATLSPATTTSSGVSEESSTS





HSQPGSTHTTAFPDSTTTS





DLSQEPTTSHSSQGSTEATL





SPGSTTASSLGQQSTTFHSS





PGDTETTLLPDDTITSGLVE





ASTPTHSSTGSLHTTLTPAS





STSAGLQEESTTFQSWPSS





SDTTPSPPGTTAAPVEVST





TYHSRPSSTPTTHFSASSTT





LGRSEESTTVHSSPGATGT





ALFPTRSATSVLVGEPTTSP





ISSGSTETTALPGSTTTAGLS





EKSTTFYSSPRSPDTTLSPAS





TTSSGVSEESTTSHSRPGST





HTTAFPGSTTMPGVSQEST





ASHSSPGSTDTTLSPGSTTA





SSLGPESITFHSSPGSTETT





LLPDNTTASGLLEASTPVHS





STGSPHTTLSPAGSTTRQG





ESTTFQSWPSSKDTMPAP





PTTTSAFVELSTTSHGSPSS





TPTTHFSASSTTLGRSEEST





TVHSSPVATATTPSPARSTT





SGLVEESTAYHSSPGSTQT





MHFPESSTASGRSEESRTS





HSSTTHTISSPPSTTSALVEE





PTSYHSSPGSTATTHFPDSS





TTSGRSEESTASHSSQDAT





GTIVLPARSTTSVLLGESTTS





PISSGSMETTALPGSTTTPG





LSEKSTTFHSSPRSPATTLSP





ASTTSSGVSEESTTSHSRPG





STHTTAFPDSTTTPGLSRHS





TTSHSSPGSTDTTLLPASTT





TSGPSQESTTSHSSPGSTDT





ALSPGSTTALSFGQESTTFH





SSPGSTHTTLFPDSTTSSGI





VEASTRVHSSTGSPRTTLSP





ASSTSPGLQGESTAFQTHP





ASTHTTPSPPSTATAPVEES





TTYHRSPGSTPTTHFPASST





TSGHSEKSTIFHSSPDASGT





TPSSAHSTTSGRGESTTSRI





SPGSTEITTLPGSTTTPGLSE





ASTTFYSSPRSPTTTLSPAS





MTSLGVGEESTTSRSQPGS





THSTVSPASTTTPGLSEEST





TVYSSSPGSTETTVFPRTPT





TSVRGEEPTTFHSRPASTH





TTLFTEDSTTSGLTEESTAFP





GSPASTQTGLPATLTTADL





GEESTTFPSSSGSTGTTLSP





ARSTTSGLVGESTPSRLSPS





STETTTLPGSPTTPSLSEKST





TFYTSPRSPDATLSPATTTS





SGVSEESSTSHSQPGSTHT





TAFPDSTTTPGLSRHSTTSH





SSPGSTDTTLLPASTTTSGP





SQESTTSHSSPGSTDTALSP





GSTTALSFGQESTTFHSSPG





STHTTLFPDSTTSSGIVEAST





RVHSSTGSPRTTLSPASSTS





PGLQGESTTFQTHPASTHT





TPSPPSTATAPVEESTTYHR





SPGSTPTTHFPASSTTSGHS





EKSTIFHSSPDASGTTPSSA





HSTTSGRGESTTSRISPGST





EITTLPGSTTTPGLSEASTTF





YSSPRSPTTTLSPASMTSLG





VGEESTTSRSQPGSTHSTV





SPASTTTPGLSEESTTVYSSS





PGSTETTVFPRSTTTSVRGE





EPTTFHSRPASTHTTLFTED





STTSGLTEESTAFPGSPAST





QTGLPATLTTADLGEESTTE





PSSSGSTGTTLSPARSTTSG





LVGESTPSRLSPSSTETTTLP





GSPTTPSLSEKSTTFYTSPRS





PDATLSPATTTSSGVSEESS





TSHSQPGSTHTTAFPDSTT





TSGLSQEPTASHSSQGSTE





ATLSPGSTTASSLGQQSTTF





HSSPGDTETTLLPDDTITSG





LVEASTPTHSSTGSLHTTLT





PASSTSAGLQEESTTFQSW





PSSSDTTPSPPGTTAAPVE





VSTTYHSRPSSTPTTHFSAS





STTLGRSEESTTVHSSPGAT





GTALFPTRSATSVLVGEPTT





SPISSGSTETTALPGSTTTA





GLSEKSTTFYSSPRSPDTTLS





PASTTSSGVSEESTTSHSRP





GSTHTTAFPGSTTMPGVS





QESTASHSSPGSTDTTLSP





GSTTASSLGPESTTFHSGPG





STETTLLPDNTTASGLLEAS





TPVHSSTGSPHTTLSPAGST





TRQGESTTFQSWPNSKDT





TPAPPTTTSAFVELSTTSHG





SPSSTPTTHFSASSTTLGRS





EESTTVHSSPVATATTPSPA





RSTTSGLVEESTTYHSSPGS





TQTMHFPESDTTSGRGEES





TTSHSSTTHTISSAPSTTSAL





VEEPTSYHSSPGSTATTHFP





DSSTTSGRSEESTASHSSQ





DATGTIVLPARSTTSVLLGE





STTSPISSGSMETTALPGST





TTPGLSEKSTTFHSSPRSPA





TTLSPASTTSSGVSEESTTS





HSRPGSTHTTAFPDSTTTP





GLSRHSTTSHSSPGSTDTTL





LPASTTTSGSSQESTTSHSS





SGSTDTALSPGSTTALSFG





QESTTFHSSPGSTHTTLFPD
STTSSGIVEASTRVHSSTGS
PRTTLSPASSTSPGLQGEST
AFQTHPASTHTTPSPPSTA
TAPVEESTTYHRSPGSTPTT
HFPASSTTSGHSEKSTIFHS
SPDASGTTPSSAHSTTSGR
GESTTSRISPGSTEITTLPGS
TTTPGLSEASTTFYSSPRSP
TTTLSPASMTSLGVGEESTT
SRSQPGSTHSTVSPASTTTP
GLSEESTTVYSSSPGSTETT
VFPRSTTTSVRREEPTTFHS
RPASTHTTLFTEDSTTSGLT
EESTAFPGSPASTQTGLPA
TLTTADLGEESTTFPSSSGS
TGTKLSPARSTTSGLVGEST
PSRLSPSSTETTTLPGSPQP
SLSEKSTTFYTSPRSPDATLS
PATTTSSGVSEESSTSHSQP
GSTHTTAFPDSTTTSGLSQ
EPTTSHSSQGSTEATLSPGS
TTASSLGQQSTTFHSSPGD
TETTLLPDDTITSGLVEASTP
THSSTGSLHTTLTPASSTST
GLQEESTTFQSWPSSSDTT
PSPPSTTAVPVEVSTTYHSR
PSSTPTTHFSASSTTLGRSE
ESTTVHSSPGATGTALFPTR
SATSVLVGEPTTSPISSGSTE
TTALPGSTTTAGLSEKSTTF
YSSPRSPDTTLSPASTTSSG
VSEESTTSHSRPGSMHTTA
FPSSTTMPGVSQESTASHS
SPGSTDTTLSPGSTTASSLG
PESTTEHSSPGSTETTLLPD
NTTASGLLEASTPVHSSTGS
PHTTLSPAGSTTRQGESTT
FQSWPNSKDTTPAPPTTTS
AFVELSTTSHGSPSSTPTTH
FSASSTTLGRSEESTTVHSS
PVATATTPSPARSTTSGLVE
ESTTYHSSPGSTQTMHFPE
SNTTSGRGEESTTSHSSTTH
TISSAPSTTSALVEEPTSYHS
SPGSTATTHFPDSSTTSGRS
EESTASHSSQDATGTIVLPA
RSTTSVLLGESTTSPISSGS
METTALPGSTTTPGLSEKST
TFHSSPSSTPTTHFSASSTTL
GRSEESTTVHSSPVATATTP
SPARSTTSGLVEESTAYHSS
PGSTQTMHFPESSTASGRS
EESRTSHSSTTHTISSPPSTT
SALVEEPTSYHSSPGSIATT
HFPESSTTSGRSEESTASHS
SPDTNGITPLPAHFTTSGRI
AESTTFYISPGSMETTLAST
ATTPGLSAKSTILYSSSRSPD
QTLSPASMTSSSISGEPTSL
YSQAESTHTTAFPASTTTSG
LSQESTTFHSKPGSTETTLS
PGSITTSSFAQEFTTPHSQP
GSALSTVSPASTTVPGLSEE
STTFYSSPGSTETTAFSHSN
TMSIHSQQSTPFPDSPGFT
HTVLPATLTTTDIGQESTAF
HSSSDATGTTPLPARSTAS
DLVGEPTTFYISPSPTYTTLF
PASSSTSGLTEESTTFHTSPS
FTSTIVSTESLETLAPGLCQE
GQIWNGKQCVCPQGYVG
YQCLSPLESFPVETPEKLNA
TLGMTVKVTYRNFTEKMN
DASSQEYQNFSTLFKNRM
DVVLKGDNLPQYRGVNIR
RLLNGSIVVKNDVILEADYT
LEVEELFENLAEIVKAKIMN
ETRTTLLDPDSCRKAILCYSE
EDTFVDSSVTPGFDFQEQC
TQKAAEGYTQFYYVDVLD
GKLACVNKCTKGTKSQMN
CNLGTCQLQRSGPRCLCPN
TNTHWYWGETCEFNIAKS
LVYGIVGAVMAVLLLALIILI
ILFSLSQRKRHREQYDVPQ
EWRKEGTPGIFQKTAIWE
DQNLRESRFGLENAYNNF
RPTLETVDSGTELHIQRPE
MVASTV






SEQ ID NO: 1937
ENSG00000205744.5
A*02:03, A*11:01, A*11:02, A*24:10, A*33:03, B*15:01, B*39:01, B*40:01, B*55:02, B*58:01, C*03:02, C*03:04, C*14:02
MESRAEGGSPAVFDWFFE
AACPASLQEDPPILRQFPP
DFRDQEAMQMVPKFCFP
FDVEREPPSPAVQHFTFAL
TDLAGNRRFGFCRLRAGT
QSCLCILSHLPWFEVFYKLL
NTVGDLLAQDQVTEAEELL
QNLFQQSLSGPQASVGLEL
GSGVTVSSGQGIPPPTRGN
SKPLSCFVAPDSGRLPSIPE
NRNLTELVVAVTDENIVGL
FAALLAERRVLLTASKLSTLT
SCVHASCALLYPMRWEHV
LIPTLPPHLLDYCCAPMPYL
IGVHASLAERVREKALEDV
VVLNVDANTLETTFNDVQ
ALPPDVVSLLRLRLRKVALA
PGEGVSRLFLKAQALLFGG
YRDALVCSPGQPVTFSEEV
FLAQKPGAPLQAFHRRAV
HLQLFKQFIEARLEKLNKGE
GFSDQFEQEITGCGASSGA
LRSYQLWADNLKKGGGAL
LHSVKAKTQPAVKNMYRS
AKSGLKGVQSLLMYKDGD
SVLQRGGSLRAPALPSRSD
RLQQRLPITQHFGKNRPLR
PSRRRQLEEGTSEPPGAGT
PPLSPEDEGCPWAEEALDS
SFLGSGEELDLLSEILDSLSM
GAKSAGSLRPSQSLDCCHR
GDLDSCFSLPNIPRWQPD
DKKLPEPEPQPLSLPSLQN
ASSLDATSSSKDSRSQLIPS
ESDQEVTSPSQSSTASADP
SIWGDPKPSPLTEPLILHLT
PSHKAAEDSTAQENPTPW
LSTAPTEPSPPESPQILAPTK
PNFDIAWTSQPLDPSSDPS
SLEDPRARPPKALLAERAHL
QPREEPGALNSPATPTSNC
QKSQPSSRPRVADLKKCFE
G






SEQ ID NO: 1938
ENSG00000213420.3
A*02:03, A*11:01, A*11:02, A*24:02, A*24:10, A*33:03, B*15:01, B*15:27, B*38:02, B*39:01, B*40:01, B*58:01, C*03:02, C*03:04, C*12:02, C*14:02, C*15:02
MSALRPLLLLLLPLCPGPGP
GPGSEAKVTRSCAETRQVL
GARGYSLNLIPPALISGEHL
RVCPQEYTCCSSETEQRLIR
ETEATFRGLVEDSGSFLVHT
LAARHRKFDEFFLEMLSVA
QHSLTQLFSHSYGRLYAQH
ALIFNGLFSRLRDFYGESGE
GLDDTLADFWAQLLERVF
PLLHPQYSFPPDYLLCLSRL
ASSTDGSLQPFGDSPRRLR
LQITRTLVAARAFVQGLET
GRNVVSEALKVPVSEGCSQ
ALMRLIGCPLCRGVPSLMP
CQGFCLNVVRGCLSSRGLE
PDWGNYLDGLLILADKLQ
GPFSFELTAESIGVKISEGL
MYLQENSAKVSAQVFQEC
GPPDPVPARNRRAPPPRE
EAGRLWSMVTEEERPTTA
AGTNLHRLVWELRERLAR
MRGFWARLSLTVCGDSR
MAADASLEAAPCWTGAG
RGRYLPPVVGGSPAEQVN
NPELKVDASGPDVPTRRRR
LQLRAATARMKTAALGHD
LDGQDADEDASGSGGGQ
QYADDWMAGAVAPPARP
PRPPYPPRRDGSGGKGGG
GSARYNQGRSRSGGASIGF
HTQTILILSLSALALLGPR






SEQ ID NO: 1939
ENSG00000225485.3
A*02:03, A*11:01, A*11:02, A*24:02, A*24:07, A*24:10, B*15:01, B*39:01, B*40:01, B*55:02, B*58:01, C*03:02, C*03:04, C*03:67, C*12:02, C*14:02, C*15:02
MNGVAFCLVGIPPRPEPRP
PQLPLGPRDGCSPRRPFP
WQGPRTLLLYKSPQDGFG
FTLRHFIVYPPESAVHCSLK
EEENGGRGGGPSPRYRLEP
MDTIFVKNVKEDGPAHRA
GLRTGDRLVKVNGESVIGK
TYSQVIALIQNSDDTLELSI
MPKDEDILQLAYSQDAYLK
GNEPYSGEARSIPEPPPICY
PRKTYAPPARASTRATMVP
EPTSALPSDPRSPAAWSDP
GLRVPPAARAHLDNSSLG
MSQPRPSPGAFPHLSSEPR
TPRAFPEPGSRVPPSRLEC
QQALSHWLSNQVPRRAG
ERRCPAMAPRARSASQDR
LEEVAAPRPWPCSTSQDAL
SQLGQEGWHRARSDDYLS
RATRSAEALGPGALVSPRF
ERCGWASQRSSARTPACP
TRDLPGPQAPPPSGLQGL
DDLGYIGYRSYSPSFQRRT
GLLHALSFRDSPFGGLPTF
NLAQSPASFPPEASEPPRV
VRPEPSTRALEPPAEDRGD
EVVLRQKPPTGRKVQLTPA
RQMNLGFGDESPEPEASG
RGERLGRKVAPLATTEDSL
ASIPFIDEPTSPSIDLQAKHV
PASAVVSSAMNSAPVLGT
SPSSPTFTFTLGRHYSQDCS
SIKAGRRSSYLLAITTERSKS
CDDGLNTFRDEGRVLRRLP
NRIPSLRMLRSFFTDGSLDS
WGTSEDADAPSKRHSTSD
LSDATFSDIRREGWLYYKQI
LTKKGKKAGSGLRQWKRV
YAALRARSLSLSKERREPGP
AAAGAAAAGAGEDEAAPV
CIG






SEQ ID NO: 1940
ENSG00000243449.2
A*02:03, A*24:10, A*33:03, B*27:04, B*38:02, B*39:01, B*40:01, C*01:02, C*03:02, C*03:04, C*03:67, C*04:01, C*07:02, C*14:02, C*15:02
MFRAALEDSVEKKSSLKET
ETTSKGTSKYDRERETEMK
TVMGMKMHFWVRTPAS
GRGRGGSDHARSRAAPLP
LLA





SEQ ID NO: 1941
ENSG00000261787.1
A*02:03, A*24:02, A*24:10, A*33:03, B*40:01, C*03:02, C*03:04, C*12:02, C*14:02
MDRGRPAGSPLSASAEPA
PLAAAIRDSRPGRTGPGPA
GPGGGSRSGSGRPAAANA
ARERSRVQTLRHAFLELQR
TLPSVPPDTKLSKLDVLLLA
TTYIAHLTRSLQDDAEAPA
DAGLGALRGDGYLHPVKK
WPMRSRLYIGATGQFLKH
SVSGEKTNHDNTPTDSQP
















TABLE 10
Peptide pools for alternative promoters
SEQ ID NO: | Peptide Pool | Alternative Promoter | Peptide Sequence | Corresponding HLA variant
1942 | 1 | DNAH3 | MAEKLQEANFLLEDI | A*02:01
1943 | 1 | DNAH3 | QYSHIADKVSEVPAN | A*02:03
1944 | 1 | DNAH3 | FLKKSSAVTVKLRR | A*03:01
1945 | 1 | DNAH3 | PKLKYIPLKFSFTAA | A*24:02
1946 | 1 | DNAH3 | EHLHTVNPMMLRLKE | A*33:03
1947 | 1 | DNAH3 | VSDFLIQTFKVFQKN | B*15:01
1948 | 1 | DNAH3 | DNTAEQNIAAFLKEN | B*40:01
1949 | 1 | DNAH3 | VNPMMLRLKELWFAE | B*58:01
1950 | 1 | DNAH3 | KTSLTFPGSRPMSPE | C*03:02
1951 | 1 | DNAH3 | IEEYFASVASFMSLQ | C*14:02
1952 | 1 | DNAH3 | NEIASMNITVPLAMF | C*15:02
1953 | 2 | DST | NPKLTLGLIWTIILH | A*02:01
1954 | 2 | DST | FTKWINQHLMKVRKH | A*02:03
1955 | 2 | DST | ERDKVQKKTFTKWIN | A*03:01
1956 | 2 | DST | ISLLEVLSGDTLPRE | B*40:01
1957 | 2 | DST | MAGYLSPAAYLYVEE | C*03:02
1958 | 2 | DST | MAGYLSPAAYLYVE | C*14:02
1959 | 3 | EPS8L1 | ADVSQYPVNHLVTFC | A*02:01
1960 | 3 | EPS8L1 | EVDILNHVFDDVESF | A*02:03
1961 | 3 | EPS8L1 | MSTATGPEAAPKPSA | A*11:01
1962 | 3 | EPS8L1 | AQPDVHFFQGLRLGA | A*33:03
1963 | 3 | EPS8L1 | ILNHVFDDVESFVSR | B*15:02
1964 | 3 | EPS8L1 | VSQYPVNHLVTFCLG | B*35:03
1965 | 3 | EPS8L1 | PASKEELESYPLGAI | B*40:01
1966 | 3 | EPS8L1 | EPERAQPDVHFFQGL | B*58:01
1967 | 4 | FRMD4B | VEDLLFSGSRFVWNL | A*02:01
1968 | 4 | FRMD4B | LLDLVASHFNLKEKE | A*11:01
1969 | 4 | FRMD4B | TVSTLRRWYTERLRA | A*33:03
1970 | 4 | FRMD4B | QIEVESETIFKLAAF | B*40:01
1971 | 4 | FRMD4B | VWNLTVSTLRRWYTE | B*58:01
1972 | 4 | FRMD4B | AVRFYIESISFLKDK | C*07:02
1973 | 5 | LAMA3 | AEGVLLDYLVLLPRD | A*02:01
1974 | 5 | LAMA3 | SRIAMYELLADADIQ | A*02:03
1975 | 5 | LAMA3 | RTNTLLGHLISKAQR | A*03:01
1976 | 5 | LAMA3 | VIHFYQAAHPTFPAQ | A*24:02
1977 | 5 | LAMA3 | TKATNIRLRFLRTNT | A*33:03
1978 | 5 | LAMA3 | YAQMTSVQNDVRITL | A*68:01
1979 | 5 | LAMA3 | CLLYQHLPVTRFPCT | B*15:01
1980 | 5 | LAMA3 | DKVSSYGGYLTYQAK | B*15:02
1981 | 5 | LAMA3 | LSGREVELHLRLRIP | B*40:01
1982 | 5 | LAMA3 | LHKKSMDKSLEFITN | B*58:01
1983 | 5 | LAMA3 | DGYFALEKSNYFGCQ | C*03:02
1984 | 5 | LAMA3 | ENNYYFPDLHHMKYE | C*07:02
1985 | 5 | LAMA3 | ILRYVNPGTEAVSGH | C*12:02
1986 | 5 | LAMA3 | ADPFSITPGIWVACI | C*15:02
1987 | 6 | MET | QNVILHEHHIFLGAT | A*02:01
1988 | 6 | MET | CKEALAKSEMNVNMK | A*02:03
1989 | 6 | MET | MDRSAMCAFPIKYVN | A*11:01
1990 | 6 | MET | TDQVIDVLPEFRDS | A*24:02
1991 | 6 | MET | LDAQTFHTRIIRFCS | A*33:03
1992 | 6 | MET | SNNFIYFLTVQRETL | A*68:01
1993 | 6 | MET | KDGFMFLTDQAYIDV | B*15:01
1994 | 6 | MET | RDSYPIKYVHAFESN | B*35:03
1995 | 6 | MET | QKVAEYKTGPVLEHP | B*40:01
1996 | 6 | MET | CSSKANLSGGVWKDN | B*58:01
1997 | 6 | MET | RDEYRTEFTTALQRV | C*07:02
1998 | 6 | MET | TINSSYFPDHPLHSI | C*12:03
1999 | 6 | MET | PMDRSAMCAFPIKYV | C*15:02
2000 | 7 | MIB2 | GASGIVEVLTEVPNI | A*02:01
2001 | 7 | MIB2 | QGFTLLHHASLKGHA | A*03:01
2002 | 7 | MIB2 | ENKSSLSVALDKLRA | A*11:01
2003 | 7 | MIB2 | QVAAYLGQVELIRLL | A*24:02
2004 | 7 | MIB2 | TALHLAALNNHREVA | A*33:03
2005 | 7 | MIB2 | CVGEAAGGFYYKDHL | A*68:01
2006 | 7 | MIB2 | LQRRVSADSQFFQHG | B*15:01
2007 | 7 | MIB2 | GNLRVAVAGQRWTFS | B*58:01
2008 | 7 | MIB2 | EDGFTALHLAALNNH | C*03:02
2009 | 7 | MIB2 | GGFYYKDHLPRLGKP | C*07:02
2010 | 8 | MRC2 | DSCYQFNFQSTLSWR | A*02:01
2011 | 8 | MRC2 | TDGSIINFISWAPGK | A*02:03
2012 | 8 | MRC2 | RDCSIALPYVCKKKP | A*11:01
2013 | 8 | MRC2 | EWLRFQEAEYKFFEH | A*24:02
2014 | 8 | MRC2 | SGDEVMYTHWNRDQP | A*33:03
2015 | 8 | MRC2 | RFEQAFVSSLIYNWE | B*15:02
2016 | 8 | MRC2 | GWTWHSPSCYWLGED | B*38:02
2017 | 8 | MRC2 | TNRFEQAFVSSLIYN | B*40:01
2018 | 8 | MRC2 | QGRREWLRFQEAEYK | B*40:06
2019 | 8 | MRC2 | LCALPYHEVYTIQGN | B*51:01
2020 | 8 | MRC2 | CPIKSNDCETFWDKD | B*58:01
2021 | 8 | MRC2 | GGCVALATGSAMGLW | C*03:02
2022 | 8 | MRC2 | EGEYFWTALQDLNST | C*14:02
2023 | 9 | NOS2 | PDELLPQAIEFVNQY | A*02:01
2024 | 9 | NOS2 | SKSCLGSIMTPKSLT | A*11:01
2025 | 9 | NOS2 | VKLDATPLSSPRHVR | A*68:01
2026 | 9 | NOS2 | IGRIQWSNLQVFDAR | B*15:01
2027 | 9 | NOS2 | AIEFVNQYYGSFKEA | B*15:02
2028 | 9 | NOS2 | TKEIETTGTYQLTGD | B*40:01
2029 | 9 | NOS2 | MACPWKFLFKTK | B*58:01
2030 | 10 | PLEC | RPRSLHPHVPGVTNL | A*02:01
2031 | 10 | PLEC | MVAGMLMPRDQL | A*11:01
2032 | 10 | PLEC | HLRQYLHLPPEIVPA | A*24:02
2033 | 10 | PLEC | RETFAWCHFYWYLTN | C*03:02
2034 | 11 | PLEKHG5 | KKKSLGEVLLPVFER | A*02:01
2035 | 11 | PLEKHG5 | LWASVMAPVLEKARR | A*03:01
2036 | 11 | PLEKHG5 | LHTEASYIRKLRVII | A*33:03
2037 | 11 | PLEKHG5 | SLGEVLLPVFERKGI | A*68:01
2038 | 11 | PLEKHG5 | WKNRAASRFSGFFSS | B*15:01
2039 | 11 | PLEKHG5 | KNMSEFLGEASIPGQ | B*40:01
2040 | 11 | PLEKHG5 | GSSGSTNTGDSWKNR | B*58:01
2041 | 11 | PLEKHG5 | TFEAYRFGGHYLRVK | C*14:02
2042 | 12 | PTGDS | THHTLWMGLALLGVL | A*02:01
2043 | 12 | PTGDS | HTLWMGLALLGVLGD | A*02:03
2044 | 12 | PTGDS | APEAQVSVQPNFQQD | B*15:01
2045 | 12 | PTGDS | MATHHTLWMGLA | C*03:02
2046 | 13 | RASA3 | GPSKMRDCYCTVNLD | A*02:03
2047 | 13 | RASA3 | EIPRSFRHLSFYIFD | A*03:01
2048 | 13 | RASA3 | RYTAVSSFIFLRFFA | A*11:01
2049 | 13 | RASA3 | FKESYMATFYEFFNE | A*24:02
2050 | 13 | RASA3 | LSFYIFDRDVFRRDS | A*33:03
2051 | 13 | RASA3 | KESYMATFYEFFNEQ | B*15:01
2052 | 13 | RASA3 | DADSEVQGKVHLELR | B*40:01
2053 | 13 | RASA3 | DVRYTAVSSFIFLRF | B*58:01
2054 | 13 | RASA3 | DHVFSSDYYSPLRDL | C*03:02
2055 | 13 | RASA3 | GEDFYCEIPRSFRHL | C*07:02
2056 | 13 | RASA3 | SSDYYSPLRDLLLKS | C*14:02
2057 | 14 | TRPM2 | HSKLQMHHVAQVLRE | A*02:03
2058 | 14 | TRPM2 | RLKSIFRRGLVKVAQ | A*03:01
2059 | 14 | TRPM2 | HPTMTAALISNKPEF | A*11:01
2060 | 14 | TRPM2 | LLGDFTQPLYPRPRH | A*33:03
2061 | 14 | TRPM2 | ECGLMKKAALYFSDF | B*15:01
2062 | 14 | TRPM2 | VQLKEFYTWDTLLYL | B*40:01
2063 | 14 | TRPM2 | MKKAALYFSDFWNKL | B*58:01
2064 | 14 | TRPM2 | HVTFTMDPIRDLLIW | C*12:02
2065 | 14 | TRPM2 | AALYFSDFWNKLDVG | C*14:02
2066 | 15 | IKZF3 | SAAVLNDYSLTKSHE | A*03:01
2067 | 15 | IKZF3 | LERHVVSFDSSRPTS | A*33:03
2068 | 15 | IKZF3 | LNDYSLTKSHEMENV | C*03:02









To explore whether somatic promoters might contribute to reducing tumor antigen burden and immunoreactivity in vivo, we examined correlations between promoter alterations and intra-tumor T-cell activity in several primary GC cohorts. First, to detect promoter alterations in a cohort of 95 GC-normal pairs (SG cohort), we generated a customized NanoString panel targeting the top 95 recurrent GC somatic promoters, measuring transcripts associated with either the canonical promoter or the alternative promoter. NanoString measurements correlated significantly with RNA-seq (FIG. 16, r=0.65, P<0.001), and ~35% of transcripts driven by alternative promoters were upregulated in more than half of the GCs (FIG. 4D). Second, to examine markers of T-cell activity in the same GC samples, we analyzed previously published microarray data to measure CD8A (a measure of CD8+ tumor-infiltrating lymphocytes), together with granzyme A (GZMA) and perforin (PRF1), two T-cell effectors that are validated markers of T-cell cytolytic activity. We confirmed that these three genes (CD8A, GZMA, and PRF1) were not themselves associated with somatic promoters. Comparing the top and bottom quartiles, GCs with high somatic promoter usage exhibited significantly lower GZMA and PRF1 levels (P<0.001 and P=0.01, respectively; Wilcoxon test), indicating lower T-cell cytolytic activity (FIG. 4E, top left), as well as a trend towards lower CD8A levels (P=0.14, one-sided Wilcoxon test). Using two different algorithms (ASCAT and ESTIMATE), we further confirmed that the decreased GZMA and PRF1 levels are independent of differences in tumor purity between GCs (FIG. 16). Similar results were obtained when splitting the GC samples at the median promoter usage score (GZMA, P<0.001; PRF1, P=0.03). Patients whose GCs exhibited high somatic promoter usage (top 25%) also showed poorer survival than patients whose GCs exhibited low somatic promoter usage (bottom 25%) (FIG. 4E, top right; HR=2.55, P=0.02).
Dividing patients at the median somatic promoter usage score produced similar survival differences (FIG. 11, HR=1.81, P=0.04).
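The top-versus-bottom quartile comparison described above can be sketched as follows. This is a minimal illustration only, assuming that each sample already has a somatic promoter usage score and a marker value (e.g., GZMA); the scoring method is not reproduced, the helper names are hypothetical, and the one-sided Wilcoxon rank-sum (Mann-Whitney) test uses a normal approximation with tie correction rather than an exact test.

```python
import math
from typing import Dict, List, Tuple


def rank_with_ties(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def ranksum_less(x: List[float], y: List[float]) -> float:
    """One-sided Wilcoxon rank-sum P value for H1: values in x tend to be lower than in y.

    Normal approximation with tie correction, no continuity correction.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for x
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # every value tied: no evidence for H1
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z


def quartile_split(scores: List[float]) -> Tuple[List[int], List[int]]:
    """Indices of the top and bottom 25% of samples by promoter usage score."""
    q = len(scores) // 4
    if q == 0:
        raise ValueError("need at least 4 samples")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[-q:], order[:q]


# Hypothetical toy data: GZMA falls as somatic promoter usage rises.
scores = [0.1 * i for i in range(20)]
gzma = [10.0 - 0.4 * i for i in range(20)]
top, bottom = quartile_split(scores)
p = ranksum_less([gzma[i] for i in top], [gzma[i] for i in bottom])
print(f"one-sided P = {p:.4f}")
```

The median split mentioned above would follow the same pattern, with the cut placed at half the samples instead of a quarter.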


To validate these findings, we then analyzed two other prominent GC cohorts: one from TCGA and another from the Asian Cancer Research Group (ACRG). In the TCGA cohort, the availability of RNA-seq data allowed us to infer somatic promoter usage directly from next-generation sequencing (NGS) data (FIG. 2C). As in the Singapore cohort, TCGA GCs with high somatic promoter usage (top 25%) exhibited decreased CD8A (P=0.002), GZMA (P=0.001) and PRF1 levels (P=0.005; all one-sided Wilcoxon tests; FIG. 4E, bottom left) compared to GCs with low somatic promoter usage (bottom 25%), in a manner independent of tumor purity (FIG. 16). Notably, as previous studies have suggested that somatic mutation burden may also correlate with intra-tumor T-cell cytolytic response, we repeated the analysis after adjusting for the total number of missense mutations in each sample using a regression-based approach. Even after correcting for somatic mutation burden, we still observed decreased CD8A (P=0.02), GZMA (P=0.01) and PRF1 expression (P=0.03; all one-sided Wilcoxon tests) in samples with high somatic promoter usage (top 25% versus bottom 25%) (FIG. 11).
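The text does not specify the regression model used to adjust for mutation burden. One plausible reading, sketched here as an assumption, is to regress each marker on the missense mutation count with ordinary least squares and carry the residuals into the same top-versus-bottom quartile comparison. The function name and toy values below are hypothetical.

```python
from typing import List


def residualize(y: List[float], covariate: List[float]) -> List[float]:
    """Residuals of y after an ordinary least-squares fit y = a + b * covariate."""
    n = len(y)
    if n != len(covariate) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(covariate) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(covariate, y))
    slope = sxy / sxx if sxx else 0.0  # constant covariate: only the mean is removed
    intercept = mean_y - slope * mean_x
    return [v - (intercept + slope * x) for x, v in zip(covariate, y)]


# Hypothetical toy data: one marker value and one missense count per tumor.
missense = [12, 40, 7, 95, 30, 55, 18, 70]
gzma = [3.1, 4.0, 2.8, 6.2, 3.9, 4.6, 3.2, 5.3]
gzma_adjusted = residualize(gzma, missense)
```

By construction the residuals are uncorrelated with mutation count, so any remaining difference between high and low promoter usage groups cannot be explained by a linear dependence on mutation burden alone.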


We leveraged a third independent cohort of GC samples from ACRG. Using NanoString to target 89 canonical and alternative promoters along with various immune markers, we profiled 264 primary GC samples from the ACRG cohort. Forty percent of alternative promoter transcripts showed tumor-specific expression in more than half of the samples (FIG. 11). Once again, samples with high somatic promoter usage (top 25%) showed significantly lower expression of markers of T-cell infiltration and cytolytic activity, including CD8A (P=0.035), CD4A (P=0.005), GZMA (P=0.001) and PRF1 (P=0.025; all one-sided Wilcoxon tests) (FIG. 4E, bottom right; FIG. 16). Similar results were obtained upon splitting the GC samples at the median promoter usage score (Table 11). Also, after adjusting for mutational burden (for cases where this information was available), samples with high somatic promoter usage still showed decreased GZMA (P=0.009) and PRF1 (P=0.03) expression, with a non-significant trend for CD8A (P=0.167; all one-sided Wilcoxon tests) (FIG. 11). Taken collectively, these results, observed across multiple GC cohorts and assessed using diverse technologies (microarray, RNA-seq, NanoString), all support a significant association between somatic promoter usage and reduced tumor immunity. Importantly, the decreased levels of T-cell cytolytic activity associated with somatic promoter usage are likely independent of tumor purity and mutational load.
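Both the ~35% figure for the SG cohort and the 40% figure for ACRG count alternative promoter transcripts that pass a per-sample criterion in more than half of the samples. The sketch below assumes paired tumor and normal values for each sample and a hypothetical criterion (tumor at least twice normal); the actual thresholds and normalization are not stated in this section, and the transcript names are placeholders.

```python
from typing import Dict, List, Tuple

# Hypothetical per-sample criterion: tumor expression at least MIN_FOLD times normal.
MIN_FOLD = 2.0


def recurrent_fraction(pairs: Dict[str, List[Tuple[float, float]]]) -> float:
    """Fraction of transcripts passing the criterion in more than half of their samples."""
    recurrent = 0
    for values in pairs.values():
        n_up = sum(1 for tumor, normal in values if tumor >= MIN_FOLD * normal)
        if n_up > len(values) / 2:
            recurrent += 1
    return recurrent / len(pairs)


# Hypothetical (tumor, normal) values for four transcripts across three samples.
toy = {
    "transcript_a": [(8.0, 2.0), (6.0, 1.0), (1.0, 1.0)],  # passes in 2 of 3
    "transcript_b": [(2.0, 2.0), (5.0, 1.0), (1.0, 3.0)],  # passes in 1 of 3
    "transcript_c": [(9.0, 0.5), (4.0, 1.0), (7.0, 2.0)],  # passes in 3 of 3
    "transcript_d": [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)],  # never passes
}
frac = recurrent_fraction(toy)
```

Here two of the four toy transcripts are recurrent, so the fraction is one half; a real analysis would substitute the cohort's own per-sample criterion.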









TABLE 11
P values of Wilcoxon test between ACRG samples with high and low somatic promoter usage.
Immune Marker | Top and Bottom 25 pctl | Divided by median (50 pctl)
CD4A | 0.01151 | 0.06053
CD8A | 0.07829 | 0.02482
CTLA4 | 0.2048 | 0.2952
FOXP3 | 0.1054 | 0.1673
GZMA | 0.002593 | 0.005957
IFNg | 0.2376 | 0.8045
IL-10 | 0.8391 | 0.9311
LAG3 | 0.1672 | 0.2627
PD1 | 0.1192 | 0.1506
PDL1 | 0.5668 | 0.5869
PRF1 | 0.01272 | 0.05873
TIM3 | 0.578 | 0.9424
TNFA | 0.1394 | 0.7184
* All P values are from two-sided Wilcoxon tests.






Somatic Promoter-Associated Peptides are Immunogenic In Vitro


To functionally test the ability of N-terminal peptides depleted in GC to elicit immune responses, we conducted in vitro assays using the high-throughput EPIMAX (EPItope MAXimum) platform, which allows multi-epitope testing of both T-cell proliferation and cytokine production. First, we identified N-terminal peptides predicted to exhibit high HLA-binding affinities across a pool of healthy PBMC (peripheral blood mononuclear cell) donors. Second, selecting peptides associated with 15 alternative promoters for testing, we generated one peptide pool per alternative promoter (Tables 9 and 10; Methods) and used these pools to stimulate PBMCs from 9 healthy donors. T-cell proliferation and cytokine production levels were measured and benchmarked against control peptides (Table 12). Across all 135 exposures (15 peptide pools across 9 donors), 79 (58.5%) produced strong cytokine responses, defined as a fold change of at least 2 in total cytokine response relative to the Actin control peptides (FIG. 4G), inducing complex Th1, Th2 and Th17 polarizations in a donor-dependent fashion (FIG. 17).
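The 79-of-135 count can be recomputed from the "Total analytes" column of Table 12: each pool's total is divided by the same donor's Actin control total, and exposures with a fold change of at least 2 are counted. The sketch below hard-codes those totals (pg/ml). Reading the threshold as fold change ≥ 2 is an interpretation, although it does reproduce the reported count. The function and variable names are illustrative only.

```python
from typing import Dict, List, Tuple

POOLS = ["DNAH3", "DST", "EPS8L1", "FRMD4B", "LAMA3", "MET", "MIB2", "MRC2",
         "NOS2", "PLEC", "PLEKHG5", "PTGDS", "RASA3", "TRPM2", "IKZF3"]

# donor -> (Actin control total, pool totals in POOLS order), pg/ml, from Table 12
TOTALS: Dict[str, Tuple[float, List[float]]] = {
    "Donor 1": (331.40, [958.03, 979.67, 1553.16, 708.13, 1135.99, 1071.99, 1513.99, 694.14,
                         516.99, 1469.58, 814.82, 1027.44, 549.85, 1420.32, 263.93]),
    "Donor 2": (69.29, [1942.98, 8214.90, 2151.07, 2172.72, 1557.24, 1045.70, 565.37, 8626.11,
                        1329.99, 1194.38, 1382.51, 942.87, 4159.35, 2468.49, 2812.58]),
    "Donor 3": (170.35, [777.51, 549.28, 959.66, 570.26, 759.99, 604.10, 347.99, 666.50,
                         783.77, 567.91, 945.66, 668.92, 411.00, 598.23, 271.53]),
    "Donor 4": (6.90, [8.54, 31.40, 15.26, 8.85, 10.89, 13.63, 13.26, 12.14,
                       16.63, 28.63, 9.29, 24.20, 12.07, 13.36, 14.00]),
    "Donor 5": (16.03, [12.45, 13.20, 15.08, 8.95, 9.87, 8.72, 10.75, 8.41,
                        12.76, 8.73, 11.13, 12.44, 10.30, 5.10, 19.35]),
    "Donor 6": (222.13, [813.46, 636.19, 623.57, 219.34, 419.85, 503.46, 367.55, 345.82,
                         1004.69, 507.26, 396.48, 477.15, 734.30, 295.93, 524.12]),
    "Donor 7": (71.31, [117.00, 216.01, 19.14, 72.33, 22.85, 17.67, 114.70, 33.44,
                        183.36, 587.05, 158.57, 123.03, 155.72, 329.74, 134.48]),
    "Donor 8": (13.04, [623.20, 292.67, 430.52, 229.71, 1250.18, 825.15, 314.47, 110.70,
                        362.67, 1029.94, 179.94, 100.93, 773.20, 129.90, 476.51]),
    "Donor 9": (69.42, [155.30, 114.07, 36.35, 28.10, 21.13, 28.95, 106.43, 48.23,
                        299.61, 33.53, 25.50, 71.04, 33.05, 29.29, 57.94]),
}

THRESHOLD = 2.0  # fold change over the donor's Actin control


def responders(totals: Dict[str, Tuple[float, List[float]]],
               threshold: float = THRESHOLD) -> List[Tuple[str, str]]:
    """(donor, pool) pairs whose total cytokine response is >= threshold x Actin."""
    hits = []
    for donor, (actin, values) in totals.items():
        for pool, total in zip(POOLS, values):
            if total / actin >= threshold:
                hits.append((donor, pool))
    return hits


hits = responders(TOTALS)
n_exposures = sum(len(values) for _, values in TOTALS.values())
print(f"{len(hits)} of {n_exposures} exposures ({100 * len(hits) / n_exposures:.1f}%)")
```

Normalizing within each donor matters here because baseline cytokine output differs by orders of magnitude between donors (compare the Actin totals for Donors 1 and 4).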









TABLE 12







Cytokine Responses of N terminal Peptides















Fold






change






of total






cytokine






response






(normal-






ized




Analyte concentration (pg/ml)
Total
against
























Treat-
GM-
IFN-
IL-
IL-
IL-
IL-
IL-
IL-
IL-
IL-
IL-


analytes
Actin


Sample
ment
CSF
g
2
3
4
7
9
10
13
15
17A
sCD40L
TNFa
(pg/ml)
control)


























Donor 1
DNAH3
99.39
228.45
89
6.35
2.12
0.085
7.32
24.91
228.24
0.925
1.88
4.47
264.89
958.03
2.89


Donor 1
DST
114.18
149.87
58.02
11.41
0.03
0.085
14.11
57.29
311.22
0.925
1.58
8.97
251.98
979.67
2.96


Donor 1
EPS8L1
153.07
351.34
100.97
11.8
0.03
0.085
28.88
33.71
431.94
0.925
0.02
6.17
434.22
1553.16
4.69


Donor 1
FRMD4B
55.53
121.17
76.42
10.54
0.03
1.43
16.77
36.13
198.37
0.925
0.93
3.76
186.12
708.13
2.14


Donor 1
LAMA3
67.29
152.66
99.6
4.83
1.72
0.085
9.11
25.85
264.85
0.925
0.02
2.8
506.25
1135.99
3.43


Donor 1
MET
54.4
93.08
96.36
6.27
0.03
0.085
5.52
25.85
179.02
0.925
0.02
3.76
606.67
1071.99
3.23


Donor 1
MIB2
97.14
201.48
94.37
5.92
0.03
0.085
18.62
27
381.6
0.925
0.67
1.81
684.34
1513.99
4.57


Donor 1
MRC2
52.57
63.61
53.15
5.58
0.03
0.085
3.32
37.5
184.11
0.925
0.76
1.81
290.69
694.14
2.09


Donor 1
NOS2
31.72
130.64
26.25
3.51
0.03
0.085
5.04
28.47
133.76
0.925
0.02
1.62
154.92
516.99
1.56


Donor 1
PLEC
107.71
393.6
96.29
14.5
10.68
0.085
27.93
59.1
413.41
0.925
0.02
7.78
337.55
1469.58
4.43


Donor 1
PLEKHG5
74.89
128.23
96.23
9.37
3.33
0.085
9.16
40.97
207.45
0.925
4.22
3.64
236.32
814.82
2.46


Donor 1
PTGDS
29.12
223.36
63.06
2.73
0.03
0.085
10.02
48.05
254.29
0.925
0.02
0.01
395.74
1027.44
3.10


Donor 1
RASA3
33.95
50.06
58.28
3.84
0.03
0.085
8.6
39.39
196.78
0.925
0.02
0.01
157.88
549.85
1.66


Donor 1
TRPM2
121.32
323.62
90.23
6.24
2.53
0.085
18.26
51.65
368.92
0.925
0.02
7.61
428.91
1420.32
4.29


Donor 1
IKZF3
9.53
59.94
23.36
0.94
0.03
0.085
1.22
42.98
76.06
0.925
0.02
0.01
48.83
263.93
0.80


Donor 1
Actin
19.75
147.18
34.21
1.46
0.03
0.085
1.22
10.1
14.2
0.925
0.02
0.78
101.44
331.40
1.00


Donor 2
DNAH3
279.27
1324.9
24
0.5
0.03
0.085
1.22
18.44
156.05
0.925
2.26
4.59
130.71
1942.98
28.04


Donor 2
DST
773.57
6732.16
46.6
2
0.03
0.085
1.22
23.76
370.78
0.925
2.56
3.88
257.33
8214.90
118.57


Donor 2
EPS8L1
427.99
1030.19
85.97
3.33
4.33
0.085
18.4
21.15
386.22
0.925
0.76
4.3
167.42
2151.07
31.05


Donor 2
FRMD4B
390.31
1070.19
94.99
3.93
10.28
1.27
1.22
19.9
415.04
0.925
0.02
5.24
159.4
2172.72
31.36


Donor 2
LAMA3
358.14
643.22
67.18
2.34
0.03
0.085
1.22
11.66
362.67
0.925
0.02
0.17
109.58
1557.24
22.48


Donor 2
MET
302.2
256.37
64.56
1.53
0.91
0.085
1.22
14.16
312.32
0.925
2.39
4.24
84.79
1045.70
15.09


Donor 2
MIB2
173.84
141.37
17.97
0.73
0.03
0.085
1.22
13.23
153.31
0.925
0.02
0.65
61.99
565.37
8.16


Donor 2
MRC2
1401.1
5545.58
205.47
5.98
6.32
0.085
13.83
14.06
889.87
0.925
6.68
4.59
531.62
8626.11
124.50


Donor 2
NOS2
342.89
462.07
83.01
2.88
10.88
2.29
15.36
21.57
288.7
0.925
5.91
3.82
89.68
1329.99
19.20


Donor 2
PLEC
280.02
357.65
74.41
2.44
0.03
0.085
19.79
24.07
343.1
0.925
5.46
2.49
83.91
1194.38
17.24


Donor 2
PLEKHG5
236.12
757.03
103.14
2.69
4.13
0.085
1.22
24.39
155.22
0.925
1.54
6.63
89.39
1382.51
19.95


Donor 2
PTGDS
142.7
621.5
33.17
1.39
0.03
0.17
1.22
13.75
63.73
0.925
2.39
4.83
57.06
942.87
13.61


Donor 2
RASA3
630.2
2755.29
67.63
0.98
4.53
0.085
15.24
36.44
363.46
0.925
0.02
3.28
281.27
4159.35
60.03


Donor 2
TRPM2
495.45
1211.48
60.61
2.96
0.03
0.085
2.44
5.29
542.44
0.925
0.02
3.28
143.48
2468.49
35.63


Donor 2
IKZF3
427.38
1705.57
71.33
1.36
0.03
0.085
21.04
43.4
419.93
0.925
0.02
4.77
116.74
2812.58
40.59


Donor 2
Actin
15.58
7.71
11.28
0.76
0.03
1.73
1.22
5.29
13.75
0.925
0.02
1.81
9.18
69.29
1.00


Donor 3
DNAH3
42.21
664.34
19.01
0.005
0.03
0.085
1.22
5.08
15.32
0.925
0.02
0.01
29.25
777.51
4.56


Donor 3
DST
100.36
273.74
14.76
0.005
0.03
0.085
1.22
27
58.89
0.925
7.41
1.17
63.68
549.28
3.22


Donor 3
EPS8L1
208.07
530.49
41.94
1.07
3.73
0.085
1.22
13.12
107.94
0.925
0.85
0.01
50.21
959.66
5.63


Donor 3
FRMD4B
143.55
211.78
47.51
0.73
0.03
0.085
1.22
17.71
91.8
0.925
0.02
1.11
53.79
570.26
3.35


Donor 3
LAMA3
100.19
509.46
23.21
1.08
0.03
0.085
1.22
36.97
34.67
0.925
1.19
0.01
50.95
759.99
4.46


Donor 3
MET
143.98
322.33
34.04
1.99
0.03
0.085
1.22
12.39
29.84
0.925
2.64
0.01
54.62
604.10
3.55


Donor 3
MIB2
113.31
127.71
16.28
0.05
0.03
0.085
1.22
9.27
39.67
0.925
0.02
0.01
39.41
347.99
2.04


Donor 3
MRC2
150.52
323.25
48.19
0.96
0.03
0.085
1.22
11.66
54.63
0.925
0.58
0.09
74.36
666.50
3.91


Donor 3
NOS2
186.72
328.5
75.34
4.54
0.03
0.085
1.22
18.02
95.19
0.925
1.96
2.06
69.18
783.77
4.60


Donor 3
PLEC
132.57
235.34
52.69
0.76
0.03
0.085
1.22
27.21
69.82
0.925
2.93
1.05
43.28
567.91
3.33


Donor 3
PLEKHG5
275.71
343.92
56.78
0.69
0.03
0.085
1.22
14.06
132.99
0.925
0.49
0.01
118.75
945.66
5.55


Donor 3
PTGDS
185.73
186.82
57.3
0.005
0.28
0.085
1.22
18.44
127.35
0.925
0.02
0.01
90.73
668.92
3.93


Donor 3
RASA3
133.59
93.84
40.44
0.01
0.06
0.085
1.22
9.68
73.67
0.925
2.3
1.49
53.69
411.00
2.41


Donor 3
TRPM2
176.42
154.05
46.74
1.05
0.03
1.43
1.22
10.93
133.4
0.925
0.02
0.01
72
598.23
3.51


Donor 3
IKZF3
32.69
169.24
18.82
0.005
0.03
0.085
1.22
10.52
16.55
0.925
0.02
0.01
21.41
271.53
1.59


Donor 3
Actin
56.66
60.86
13.4
0.56
4.53
0.085
1.22
2.56
5.96
0.925
2.89
0.01
20.69
170.35
1.00


Donor 4
DNAH3
0.66
0.005
2.21
0.005
0.03
0.085
1.22
0.41
0.58
0.925
0.02
0.01
2.38
8.54
1.24


Donor 4
DST
1.83
1.05
1.06
0.005
0.03
0.085
1.22
3.61
2.32
0.925
0.02
0.01
19.23
31.40
4.55


Donor 4
EPS8L1
0.66
1.35
0.98
0.005
0.03
2.01
1.22
4.24
1.95
0.925
0.02
0.01
1.86
15.26
2.21


Donor 4
FRMD4B
0.66
0.005
2.01
0.07
0.03
0.085
1.22
2.02
1.19
0.925
0.02
0.01
0.6
8.85
1.28


Donor 4
LAMA3
0.66
2.26
1.99
0.005
0.03
0.085
1.22
0.09
1.25
0.925
0.02
0.01
2.34
10.89
1.58


Donor 4
MET
0.66
0.3
1.19
0.005
0.03
0.085
1.22
4.77
2.69
0.925
0.13
0.01
1.61
13.63
1.98


Donor 4
MIB2
0.66
0.005
1.6
0.005
0.03
0.085
1.22
6.55
0.03
0.925
0.02
0.01
2.12
13.26
1.92


Donor 4
MRC2
0.66
1.05
0.98
0.005
0.03
0.085
1.22
4.77
0.3
0.925
0.02
0.01
2.08
12.14
1.76


Donor 4
NOS2
0.66
2.49
1.02
0.005
0.03
0.085
1.22
6.55
2.14
0.925
0.02
0.01
1.47
16.63
2.41


Donor 4
PLEC
1.42
0.005
1.66
0.005
0.03
0.085
1.22
5.29
0.79
0.925
0.31
0.02
16.87
28.63
4.15


Donor 4
PLEKHG5
0.66
0.005
1.15
0.005
0.03
0.085
1.22
3.19
1.19
0.925
0.02
0.01
0.8
9.29
1.35


Donor 4
PTGDS
0.66
3.65
2.26
0.005
0.03
0.085
1.22
3.19
2.08
0.925
0.02
0.01
10.06
24.20
3.51


Donor 4
RASA3
0.66
0.01
2.55
0.005
0.03
0.085
1.22
3.3
1.44
0.925
0.02
0.01
1.81
12.07
1.75


Donor 4
TRPM2
0.66
1.35
1.32
0.005
0.03
0.085
1.22
4.98
1.05
0.925
0.02
0.01
1.7
13.36
1.94


Donor 4
IKZF3
0.66
0.9
1.21
0.005
0.03
0.085
1.22
2.56
3.12
0.925
0.02
0.01
3.25
14.00
2.03


Donor 4
Actin
0.66
0.01
1.27
0.005
0.03
0.085
1.22
0.18
0.99
0.925
0.02
0.01
1.49
6.90
1.00


Donor 5
DNAH3
0.66
0.005
1.66
0.84
0.03
0.085
1.22
2.87
1.05
0.925
0.27
0.01
2.82
12.45
0.78


Donor 5
DST
0.66
0.6
0.79
0.005
0.03
0.085
1.22
3.61
3.18
0.925
0.02
0.01
2.06
13.20
0.82


Donor 5
EPS8L1
0.66
0.16
1.93
0.005
0.03
1.43
1.22
3.4
1.19
0.925
0.58
0.01
3.54
15.08
0.94


Donor 5
FRMD4B
0.66
2.03
1.71
0.005
0.03
0.085
1.22
0.09
0.3
0.925
0.02
0.01
1.86
8.95
0.56


Donor 5
LAMA3
0.66
0.01
1.93
0.005
0.03
2.29
1.22
0.41
0.3
0.925
0.02
0.01
1.86
9.87
0.62


Donor 5
MET
0.66
0.005
1.69
0.005
0.03
0.085
1.22
0.09
1.44
0.925
0.02
0.01
2.54
8.72
0.54


Donor 5
MIB2
0.66
0.005
2.44
0.005
0.03
0.95
1.22
1.71
0.06
0.925
0.02
0.01
2.71
10.75
0.67


Donor 5
MRC2
0.66
0.005
3.06
0.005
0.03
0.085
1.22
0.09
0.92
0.925
0.02
0.01
1.38
8.41
0.52


Donor 5
NOS2
0.66
1.2
1.9
0.005
0.03
0.085
1.22
0.09
1.89
0.925
1.11
0.01
3.63
12.76
0.80


Donor 5
PLEC
0.66
0.01
1.56
0.005
0.03
0.085
1.22
1.28
0.03
0.925
0.85
0.01
2.06
8.73
0.54


Donor 5
PLEKHG5
0.66
0.005
1.77
0.54
0.49
0.085
1.22
0.09
1.19
0.925
0.93
0.01
3.21
11.13
0.69


Donor 5
PTGDS
0.66
0.005
0.48
0.005
0.03
0.085
1.22
2.66
2.57
0.925
1.71
0.01
2.08
12.44
0.78


Donor 5
RASA3
0.66
0.3
2.21
0.005
0.03
0.085
1.22
1.49
1.44
0.925
0.02
0.01
1.9
10.30
0.64


Donor 5
TRPM2
0.66
0.005
1.1
0.005
0.03
0.085
1.22
0.09
0.03
0.925
0.02
0.01
0.92
5.10
0.32


Donor 5
IKZF3
0.66
4.81
2.52
0.005
0.03
2.94
1.22
4.66
0.03
0.925
0.02
0.01
1.52
19.35
1.21


Donor 5
Actin
0.66
1.65
1.4
0.005
0.03
0.085
1.22
5.5
1.44
0.925
0.02
0.01
3.08
16.03
1.00


Donor 6
DNAH3
59.45
150.57
19.71
0.58
0.91
1.73
1.22
26.38
150.33
0.925
28.58
5.59
367.48
813.46
3.66


Donor 6
DST
44.3
186.38
22.05
1.56
0.03
0.085
28.27
21.57
149.86
0.925
6.68
4.12
170.63
636.19
2.86


Donor 6
EPS8L1
47.7
132.54
24.08
2.42
0.03
0.085
1.22
23.24
53.62
0.925
10.24
4.59
322.88
623.57
2.81


Donor 6
FRMD4B
12.51
94.1
18.98
0.5
4.13
0.78
1.22
27
33.89
0.925
0.8
0.24
24.26
219.34
0.99


Donor 6
LAMA3
47.4
31
11.77
0.54
0.03
0.085
1.22
15
48.92
0.925
8.14
0.01
254.81
419.85
1.89


Donor 6
MET
36.59
255.47
19.03
1.92
0.03
0.4
1.22
59.85
64.07
0.925
3.14
4.24
56.57
503.46
2.27


Donor 6
MIB2
28.73
46.26
15.32
1.69
7.7
0.085
1.22
16.35
44.57
0.925
1.58
0.58
202.54
367.55
1.65


Donor 6
MRC2
30.56
173.28
11.42
0.3
0.03
0.085
1.22
15.31
25.45
0.925
13.84
2.86
70.54
345.82
1.56


Donor 6
NOS2
70.25
513.42
21.89
2.25
0.03
1.11
1.22
72.8
117.93
1.85
2.77
2.06
197.11
1004.69
4.52


Donor 6
PLEC
52.82
69.38
21.92
1.42
0.03
0.085
1.22
20.11
58.11
0.925
16.23
2.43
262.58
507.26
2.28


Donor 6
PLEKHG5
23.2
140.24
15.8
0.19
0.03
0.085
1.22
20.73
55.53
0.925
1.96
0.17
136.4
396.48
1.78


Donor 6
PTGDS
44.5
194.94
14.38
1.12
0.03
0.085
1.22
30.35
54.69
0.925
6.64
2.43
125.84
477.15
2.15


Donor 6
RASA3
67.6
91.21
19.34
1.53
0.03
0.085
7.62
43.82
212.13
0.925
14.56
2.18
273.27
734.30
3.31


Donor 6
TRPM2
24.72
145.01
12.57
0.005
0.03
0.085
1.22
22.4
16.66
0.925
1.5
3.28
67.52
295.93
1.33


Donor 6
IKZF3
63.92
108.75
23.63
1.97
0.03
0.085
5.1
46.57
131.23
0.925
22.4
2.86
116.65
524.12
2.36


Donor 6
Actin
18.81
135.48
11.03
0.5
0.03
0.085
1.22
4.66
8.77
0.925
2.22
0.01
38.39
222.13
1.00


Donor 7
DNAH3
25.1
28.72
2.1
0.005
0.03
0.085
1.22
7.49
2.45
0.925
0.02
0.09
48.76
117.00
1.64


Donor 7
DST
20.84
93.16
3.11
0.005
0.03
0.085
1.22
10.1
4.73
0.925
1.02
0.01
80.77
216.01
3.03


Donor 7
EPS8L1
1.32
0.9
2.84
0.005
0.03
0.085
1.22
3.4
0.03
0.925
0.63
0.01
7.74
19.14
0.27


Donor 7
FRMD4B
12.7
21.99
3.25
0.005
0.03
0.085
1.22
2.66
1.7
0.925
0.02
0.01
27.73
72.33
1.01


Donor 7
LAMA3
2.88
3.49
3.13
0.005
0.03
0.085
1.22
1.06
2.32
0.925
0.02
0.38
7.3
22.85
0.32


Donor 7
MET
0.66
1.05
1.82
0.005
0.03
0.085
1.22
3.09
0.22
0.925
0.02
0.01
8.53
17.67
0.25


Donor 7
MIB2
44.9
19.98
7.32
0.005
0.03
0.085
1.22
0.63
8.89
0.925
0.02
0.01
30.68
114.70
1.61


Donor 7
MR2C2
4.99
6.61
2.17
0.005
0.03
0.085
1.22
0.09
2.2
0.925
0.02
0.01
15.08
33.44
0.47


Donor 7
NOS2
64.4
61.11
9.55
0.38
0.03
2.29
1.22
3.93
10.2
0.925
0.18
0.01
29.13
183.36
2.57


Donor 7
PLEC
68.55
449.86
8.19
0.005
0.03
0.085
1.22
6.34
13.64
0.925
0.02
1.43
36.75
587.05
8.23


Donor 7
PLEKHG5
39.34
37.86
7.75
0.005
0.03
0.085
1.22
7.6
5.31
0.925
0.02
2.92
55.5
158.57
2.22


Donor 7
PTGDS
32.88
24.01
4.51
0.005
2.73
0.085
1.22
7.6
3.9
0.925
0.02
0.01
45.13
123.03
1.73


Donor 7
RASA3
42.8
44.03
7.54
0.005
0.03
0.085
1.22
7.8
14.2
0.925
0.02
0.31
36.75
155.72
2.18


Donor 7
TRPM2
29.69
140.85
2.97
0.005
0.03
0.085
1.22
25.75
3.72
0.925
0.02
0.01
124.46
329.74
4.62


Donor 7
IKZF3
43.4
29.69
8.26
0.005
0.03
0.085
1.22
5.71
6.88
0.925
0.02
0.45
37.8
134.48
1.89


Donor 7
Actin
3.31
6.53
0.77
0.01
0.03
2.29
1.22
7.7
0.14
0.925
0.02
0.01
48.35
71.31
1.00


Donor 8
DNAH3
110.13
191.67
72.91
1.32
0.03
4.85
3.47
9.27
105.51
0.925
0.4
0.78
121.93
623.20
47.79


Donor 8
DST
58.57
75.26
15.34
0.38
0.49
0.085
1.22
12.81
45.35
0.925
0.02
2.43
79.79
292.67
22.44


Donor 8
EPS8L1
88.89
63.7
41.38
1.19
0.03
0.085
6.26
10.1
121.32
0.925
0.02
4.24
92.38
430.52
33.02


Donor 8
FRMD4B
29.4
65.37
9.26
0.42
0.03
0.085
6.48
8.43
53.96
0.925
0.02
1.68
53.45
229.71
17.62


Donor 8
LAMA3
197.84
534.58
80.04
6.66
5.92
0.085
11.96
16.25
222.4
0.925
0.49
0.01
173.02
1250.18
95.87


Donor 8
MET
166.16
260.07
34.37
1.29
0.03
0.95
6.15
19.79
180.96
0.925
3.81
0.01
150.63
825.15
63.28


Donor 8
MIB2
55.58
97.75
8.09
3.34
0.03
0.4
10.38
14.37
48.48
0.925
4.22
0.01
70.89
314.47
24.12


Donor 8 | MRC2 | 18.72 | 20.86 | 7.27 | 0.005 | 0.03 | 0.085 | 1.22 | 5.92 | 27.67 | 0.925 | 0.02 | 0.01 | 27.96 | 110.70 | 8.49
Donor 8 | NOS2 | 79.04 | 62.03 | 23.6 | 1.36 | 0.03 | 0.085 | 8.21 | 11.98 | 120.62 | 0.925 | 1.28 | 0.01 | 53.5 | 362.67 | 27.81
Donor 8 | PLEC | 190.8 | 360.99 | 57.12 | 8.89 | 0.03 | 0.085 | 33.62 | 22.19 | 218.93 | 0.925 | 0.67 | 0.58 | 135.11 | 1029.94 | 78.98
Donor 8 | PLEKHG5 | 30.37 | 80.65 | 6.89 | 0.005 | 0.03 | 0.085 | 1.22 | 12.39 | 12.62 | 0.925 | 0.08 | 0.01 | 34.21 | 179.94 | 13.76
Donor 8 | PTGDS | 17.08 | 7.78 | 5.28 | 0.005 | 1.92 | 0.085 | 1.22 | 13.44 | 25.12 | 0.925 | 0.67 | 2.31 | 25.09 | 100.93 | 7.74
Donor 8 | RASA3 | 125.64 | 123.92 | 31.79 | 2.26 | 0.03 | 0.085 | 51.42 | 14.69 | 295.64 | 0.925 | 3.02 | 1.3 | 122.48 | 773.20 | 59.29
Donor 8 | TRPM2 | 24.34 | 6.76 | 9.28 | 0.54 | 0.03 | 0.085 | 1.22 | 10.62 | 36.72 | 0.925 | 0.76 | 0.38 | 38.24 | 129.90 | 9.96
Donor 8 | IKZF3 | 91.55 | 147.61 | 33.66 | 1.15 | 0.03 | 0.085 | 3.39 | 9.16 | 104.46 | 0.925 | 1.02 | 2.8 | 80.67 | 476.51 | 36.54
Donor 8 | Actin | 0.66 | 1.12 | 1.9 | 0.22 | 0.03 | 0.085 | 1.22 | 3.61 | 0.03 | 0.925 | 0.02 | 0.58 | 2.64 | 13.04 | 1.00
Donor 9 | DNAH3 | 18.58 | 8.02 | 1.45 | 0.005 | 0.91 | 0.085 | 1.22 | 12.71 | 4.02 | 0.925 | 0.18 | 0.78 | 106.41 | 155.30 | 2.24
Donor 9 | DST | 18.02 | 15.32 | 3.89 | 0.17 | 0.03 | 0.085 | 1.22 | 8.22 | 1.19 | 0.925 | 0.02 | 0.01 | 64.97 | 114.07 | 1.64
Donor 9 | EPS8L1 | 0.66 | 3.49 | 16.23 | 0.005 | 0.03 | 0.085 | 1.22 | 2.77 | 3.18 | 0.925 | 0.58 | 0.01 | 7.16 | 36.35 | 0.52
Donor 9 | FRMD4B | 5.93 | 3.18 | 2.93 | 0.005 | 0.03 | 0.085 | 1.22 | 0.09 | 0.92 | 0.925 | 0.04 | 0.01 | 12.73 | 28.10 | 0.40
Donor 9 | LAMA3 | 0.66 | 4.03 | 2.75 | 0.005 | 0.03 | 2.01 | 1.22 | 1.28 | 1.51 | 0.925 | 0.02 | 0.01 | 6.68 | 21.13 | 0.30
Donor 9 | MET | 2.43 | 0.005 | 2.88 | 0.005 | 0.03 | 0.085 | 1.22 | 4.66 | 0.92 | 0.925 | 0.02 | 0.01 | 15.76 | 28.95 | 0.42
Donor 9 | MIB2 | 13.91 | 10.55 | 5.42 | 0.005 | 0.03 | 0.085 | 1.22 | 6.55 | 4.25 | 0.925 | 0.02 | 0.01 | 63.45 | 106.43 | 1.53
Donor 9 | MRC2 | 0.66 | 15.32 | 5.84 | 0.005 | 0.03 | 0.085 | 1.22 | 9.06 | 3.42 | 0.925 | 0.02 | 0.01 | 11.63 | 48.23 | 0.69
Donor 9 | NOS2 | 27.96 | 18.69 | 4.86 | 0.005 | 0.03 | 0.085 | 1.22 | 22.19 | 2.01 | 0.925 | 1.19 | 0.01 | 220.43 | 299.61 | 4.32
Donor 9 | PLEC | 3.36 | 4.73 | 2.7 | 0.005 | 0.03 | 2.01 | 1.22 | 1.92 | 0.65 | 0.925 | 0.02 | 0.01 | 15.95 | 33.53 | 0.48
Donor 9 | PLEKHG5 | 1.42 | 1.35 | 2.97 | 0.56 | 4.13 | 0.085 | 1.22 | 4.03 | 0.51 | 0.925 | 0.02 | 0.01 | 8.07 | 25.50 | 0.37
Donor 9 | PTGDS | 9.72 | 1.5 | 2.15 | 0.005 | 0.03 | 0.085 | 1.22 | 5.71 | 1.95 | 0.925 | 0.02 | 0.01 | 47.71 | 71.04 | 1.02
Donor 9 | RASA3 | 2.48 | 6.14 | 2.12 | 0.005 | 0.03 | 0.085 | 1.22 | 4.03 | 0.03 | 0.925 | 1.19 | 0.01 | 14.78 | 33.05 | 0.48
Donor 9 | TRPM2 | 5.56 | 0.9 | 4.77 | 0.38 | 0.03 | 0.085 | 1.22 | 4.03 | 1.32 | 0.925 | 0.02 | 0.01 | 10.04 | 29.29 | 0.42
Donor 9 | IKZF3 | 9.67 | 0.005 | 6.18 | 0.005 | 0.03 | 1.43 | 1.22 | 5.08 | 1.32 | 0.925 | 0.08 | 0.01 | 31.98 | 57.94 | 0.83
Donor 9 | Actin | 0.66 | 3.49 | 0.77 | 0.36 | 0.03 | 2.01 | 1.22 | 2.13 | 1.05 | 0.925 | 0.58 | 0.01 | 56.18 | 69.42 | 1.00
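The column headers for these donor rows fall outside this excerpt, but the values themselves suggest a simple structure: in each row, the second-to-last value appears to equal the sum of the thirteen preceding values, and the last value appears to be that sum divided by the Actin sum for the same donor (so Actin itself is 1.00). The sketch below checks that inferred relationship on two Donor 9 rows. It is an assumption drawn from the numbers, not a documented calculation, and the function names are hypothetical.

```python
# Minimal sketch of the arithmetic the donor rows appear to follow.
# ASSUMPTION (inferred from the values, not from column headers, which are
# not part of this excerpt): value 14 = sum of values 1-13, and
# value 15 = value 14 divided by the same donor's Actin value 14.

def row_total(values):
    """Sum the 13 per-row measurements (assumed second-to-last column)."""
    return sum(values)


def actin_normalized(total, actin_total):
    """Scale a row total by the Actin total of the same donor (assumed last column)."""
    return total / actin_total


# Values transcribed from the Donor 9 rows above.
donor9_actin = [0.66, 3.49, 0.77, 0.36, 0.03, 2.01, 1.22, 2.13, 1.05,
                0.925, 0.58, 0.01, 56.18]
donor9_dnah3 = [18.58, 8.02, 1.45, 0.005, 0.91, 0.085, 1.22, 12.71, 4.02,
                0.925, 0.18, 0.78, 106.41]

actin_total = row_total(donor9_actin)   # listed in the table as 69.42
dnah3_total = row_total(donor9_dnah3)   # listed in the table as 155.30
ratio = actin_normalized(dnah3_total, actin_total)  # listed as 2.24
print(f"{actin_total:.3f} {dnah3_total:.3f} {ratio:.3f}")
```

The same check reproduces the Donor 8 rows as well (for example, 110.70 / 13.04 gives the listed 8.49 for MRC2), which is why the Actin row is read here as the per-donor normalizer.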









To test the immunogenic capacity of specific N-terminal peptides in a more cellular setting, we then assessed responses of T cells previously primed to recognize either altered or wild-type peptides when co-cultured with HLA-matched isogenic GC cells expressing the altered or wild-type peptides, respectively (FIG. 12). By MHC-I affinity screening, a VMCDIFFSL nonamer in the WT RASA3 N-terminus was predicted to bind with high affinity to both the HLA-A*02:01 (IC50 = 6.93 nM) and HLA-A*02:06 (IC50 = 9.74 nM) alleles. Using HLA-A*02:06 T cells that are cross-reactive to HLA-A*02:01-positive AGS cells, we tested the release of interferon gamma (IFNγ) from primed T cells after exposure to AGS lysates expressing either the RASA3 CanT or the RASA3 SomT isoform. ELISA assays demonstrated that T cells primed to recognize RASA3 CanT released significantly more IFNγ when co-cultured with RASA3 CanT-expressing AGS cells than when co-cultured with RASA3 SomT-expressing AGS cells. In contrast, T cells primed with RASA3 SomT did not exhibit appreciable IFNγ release when co-cultured with RASA3 SomT-expressing AGS cells, indicating that RASA3 SomT is less immunogenic (FIG. 12). Taken collectively, these in vitro results demonstrate that peptides predicted to be depleted in GCs through somatic promoter alterations can produce immunogenic responses, with the magnitude of the immune response depending on both peptide sequence and host immune background.
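The MHC-I screening step above amounts to scanning a protein region for 9-residue windows and ranking them by predicted binding affinity, where a lower IC50 means tighter predicted binding. The sketch below shows only the bookkeeping around such a predictor: it enumerates nonamers and bins IC50 values. The 50 nM and 500 nM cut-offs are a commonly used convention for strong and weak binders, assumed here rather than stated in the text, and the flanking sequence `MAVMCDIFFSLK` is hypothetical apart from the reported VMCDIFFSL nonamer.

```python
# Sketch: enumerate candidate 9-mer peptides and bin predicted MHC-I IC50s.
# ASSUMPTIONS: 50 nM / 500 nM thresholds are a common convention, not values
# from this document; the example sequence is hypothetical except for the
# reported nonamer VMCDIFFSL. No affinity predictor is implemented here.

def nonamers(seq, k=9):
    """Return every contiguous k-residue window of a protein sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def binding_class(ic50_nm, strong=50.0, weak=500.0):
    """Bin a predicted IC50 in nM; lower IC50 means tighter predicted binding."""
    if ic50_nm < strong:
        return "strong"
    if ic50_nm < weak:
        return "weak"
    return "non-binder"


# Predicted affinities reported in the text for VMCDIFFSL.
reported = {"HLA-A*02:01": 6.93, "HLA-A*02:06": 9.74}
for allele, ic50 in reported.items():
    print(allele, ic50, binding_class(ic50))

# Hypothetical flanking residues around the reported nonamer.
example = "MAVMCDIFFSLK"
print(nonamers(example))
```

Under this convention both reported values fall well inside the strong-binder range, which matches the text's description of high-affinity binding to both alleles.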


Somatic Promoters are Associated with EZH2 Occupancy


To identify potential oncogenic mechanisms driving somatic promoter alterations, we intersected the genomic locations of the somatic promoters with the transcription factor binding sites (TFBS) of 237 transcription factors from 83 different tissues. Regions exhibiting somatic promoters were significantly enriched in regions associated with EZH2 (P<0.01) and SUZ12 (P<0.01) binding (FIG. 6a, Table 13), confirming earlier findings in a smaller cohort. Both EZH2 and SUZ12 are components of the PRC2 epigenetic regulator complex, which is upregulated in many cancer types including GC. To validate these findings, we then performed EZH2 ChIP-sequencing on HFE-145 normal gastric epithelial cells (Methods and Materials). Concordant with the previous findings, we observed significant enrichment of EZH2 binding sites at somatic promoters compared to all promoters (enrichment score 27 vs. 13 for all promoters, P<0.01), and this EZH2 enrichment remained significant when the gained somatic promoters (enrichment score 28, P<0.01) and the lost somatic promoters (enrichment score 24, P<0.01) were analyzed separately (FIG. 18).
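The intersection step described above asks, for each somatic promoter region, whether it overlaps at least one binding-site interval on the same chromosome. The sketch below is a minimal version of that step only; it does not reproduce the study's enrichment scores or significance testing. It parses loci written as in Table 13 (`chrom: start-end`), treats intervals as half-open (an assumption, since the coordinate convention is not stated here), and uses a sorted index with a running maximum of end coordinates so each query is a binary search. The binding-site intervals are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def parse_locus(text):
    """Parse a 'chrom: start-end' string, as loci are written in Table 13."""
    chrom, span = text.split(":")
    start, end = span.strip().split("-")
    return chrom.strip(), int(start), int(end)


def build_index(sites):
    """Per chromosome: sorted starts plus a running maximum of end positions."""
    by_chrom = defaultdict(list)
    for chrom, start, end in sites:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts, prefix_max_end, running = [], [], float("-inf")
        for s, e in ivs:
            starts.append(s)
            running = max(running, e)
            prefix_max_end.append(running)
        index[chrom] = (starts, prefix_max_end)
    return index


def overlaps_any(index, chrom, start, end):
    """True if half-open [start, end) overlaps any indexed site on chrom."""
    if chrom not in index:
        return False
    starts, prefix_max_end = index[chrom]
    i = bisect_left(starts, end)  # sites[:i] all begin before `end`
    return i > 0 and prefix_max_end[i - 1] > start


def overlap_fraction(promoters, index):
    """Fraction of promoter intervals hitting at least one site."""
    hits = sum(overlaps_any(index, c, s, e) for c, s, e in promoters)
    return hits / len(promoters)


promoters = [parse_locus(s) for s in (
    "chrX: 136647100-136648150",   # ZIC3 (Table 13)
    "chr13: 100634350-100638150",  # ZIC2 (Table 13)
    "chr3: 13920600-13921250",     # WNT7A (Table 13)
)]
# Hypothetical binding-site peaks, for illustration only.
tfbs = build_index([
    ("chrX", 136647500, 136647800),
    ("chr13", 100638150, 100638400),  # abuts the ZIC2 region; no overlap
    ("chr3", 13000000, 13100000),
])
print(overlap_fraction(promoters, tfbs))
```

An enrichment estimate would then compare this fraction with the fraction obtained for a background set (for example, all promoters), which is the comparison the text reports.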









TABLE 13
Somatic Promoters Overlapping EZH2/SUZ12 Binding Sites

Loci | Annotation Status | Associated Gene
chrX: 136647100-136648150 | Known | ZIC3
chr13: 100634350-100638150 | Known | ZIC2
chr13: 100630200-100634000 | Known | ZIC2
chr20: 50719850-50723350 | Known | ZFP64
chr18: 45660800-45664950 | Known | ZBTB7C
chr1: 185226150-185227950 | Known | Y_RNA
chr3: 13920600-13921250 | Known | WNT7A
chr2: 71126100-71129800 | Known | VAX2
chr5: 6448050-6451150 | Known | UBE2QL1
chr8: 72986650-72987850 | Known | TRPA1
chr22: 17082250-17084550 | Known | TPTEP1
chr19: 55657350-55658650 | Known | TNNT1
chr19: 55666950-55668450 | Known | TNNI3
chr22: 42320400-42323750 | Known | TNFRSF13C
chr8: 119962100-119965650 | Known | TNFRSF11B
chr21: 42873650-42881750 | Known | TMPRSS2
chr20: 1164650-1168700 | Known | TMEM74B
chr17: 53797250-53803100 | Known | TMEM100
chr11: 119291200-119294700 | Known | THY1
chr20: 55203450-55206500 | Known | TFAP2C
chr6: 10409250-10419650 | Known | TFAP2A; TFAP2A-AS1
chr6: 85471550-85475350 | Known | TBX18
chr20: 46411750-46414250 | Known | SULF2
chr8: 70403800-70408450 | Known | SULF1
chr5: 172753250-172757450 | Known | STC2
chr14: 38675750-38681750 | Known | SSTR1
chr7: 20824950-20827850 | Known | SP8
chr13: 95362100-95368650 | Known | SOX21; SOX21-AS1
chr3: 181428150-181434750 | Known | SOX2
chr8: 101660950-101662650 | Known | SNX31
chr20: 10197250-10201300 | Known | SNAP25; SNAP25-AS1
chr20: 48598400-48604100 | Known | SNAI1
chr14: 70346050-70347700 | Known | SMOC1
chr12: 85303950-85307700 | Known | SLC6A15
chr19: 17981100-17986400 | Known | SLC5A5
chr2: 228580350-228583450 | Known | SLC19A3
chr3: 121656650-121658300 | Known | SLC15A2
chr6: 100910100-100913300 | Known | SIM1
chr21: 44842150-44848700 | Known | SIK1
chr7: 37953600-37956950 | Known | SFRP4
chr4: 154708850-154714150 | Known | SFRP2
chr16: 23193600-23197800 | Known | SCNN1G
chr16: 23312800-23315350 | Known | SCNN1B
chr2: 200326950-200329550 | Known | SATB2
chr20: 50415800-50419950 | Known | SALL4
chr20: 981750-984100 | Known | RSPO4
chr1: 148247000-148248800 | Known | RP11-89F3.2
chr12: 54472600-54477950 | Known | RP11-834C11.6; RP11-834C11.7
chr5: 72746300-72748200 | Known | RP11-79P5.7
chr1: 61103800-61106600 | Known | RP11-776H12.1
chr11: 134335600-134339750 | Known | RP11-627G23.1
chr11: 69830350-69834850 | Known | RP11-626H12.1
chr16: 89987550-89991500 | Known | RP11-566K11.4; TUBB3
chr16: 86319900-86321550 | Known | RP11-514D23.1
chr3: 50191700-50195800 | Known | RP11-493K19.3; SEMA3F
chr3: 132756350-132758550 | Known | RP11-469L4.1; TMEM108
chr6: 26613750-26615600 | Known | RP11-457M11.6
chr3: 87841650-87842700 | Known | RP11-451B8.1
chr1: 113391350-113395900 | Known | RP11-426L16.8; RP3-522D1.1
chr12: 85711250-85713200 | Known | RP11-408B11.2
chr6: 106807450-106809950 | Known | RP11-404H14.1
chr1: 149230550-149232000 | Known | RP11-403I13.5
chr1: 222138950-222144050 | Known | RP11-400N13.2
chr3: 178577000-178578500 | Known | RP11-385J1.2
chr17: 46721450-46725800 | Known | RP11-357H14.17
chr5: 522450-524750 | Known | RP11-310P5.2; SLC9A3
chr15: 80542500-80545200 | Known | RP11-2E17.1
chr5: 74343750-74351250 | Known | RP11-229C3.2
chr5: 63460450-63463050 | Known | RNF180
chr1: 228742450-228743450 | Known | RNA5SP19
chr1: 228781900-228785450 | Known | RNA5S17; RNA5SP18
chr21: 38379100-38379750 | Known | RIPPLY3
chr21: 43180350-43189850 | Known | RIPK4
chr8: 104510350-104514700 | Known | RIMS2; RP11-1C8.4
chr10: 62758000-62762450 | Known | RHOBTB1
chr15: 90039550-90040150 | Known | RHCG
chr2: 86564650-86566000 | Known | REEP1
chr4: 82964050-82966400 | Known | RASGEF1B; RP11-689K5.3
chr3: 75707050-75708850 | Known | RARRES2P1
chr8: 85093500-85097700 | Known | RALYL
chr8: 128805200-128810000 | Known | PVT1
chr1: 29562850-29565950 | Known | PTPRU
chr7: 158378250-158380350 | Known | PTPRN2
chr1: 170630400-170636550 | Known | PRRX1; RP1-79C4.4
chr6: 150463250-150464400 | Known | PPP1R14C
chr12: 133264050-133266950 | Known | POLE; PXMP2; RP13-672B3.2
chr5: 74990850-74992350 | Known | POC5
chr20: 56280450-56287350 | Known | PMEPA1
chr16: 57315850-57319550 | Known | PLLP
chr1: 6544500-6545600 | Known | PLEKHG5
chr14: 69950300-69951550 | Known | PLEKHD1
chr1: 201251800-201254650 | Known | PKP1
chr2: 42275400-42282950 | Known | PKDCC
chr12: 130823500-130825600 | Known | PIWIL1
chr4: 111557000-111559350 | Known | PITX2
chr7: 32107350-32111900 | Known | PDE1C
chr1: 55504650-55507550 | Known | PCSK9
chr15: 102029650-102031300 | Known | PCSK6
chr3: 142606500-142609050 | Known | PCOLCE2
chr14: 37129750-37133800 | Known | PAX9
chr1: 17443850-17446850 | Known | PADI2
chr8: 99951150-99961750 | Known | OSR2; RP11-44N12.5; STK3
chr1: 161991300-161994850 | Known | OLFML2B
chr7: 8473050-8474100 | Known | NXPH1
chr9: 87282200-87286150 | Known | NTRK2
chr19: 15309800-15311950 | Known | NOTCH3
chr4: 56500900-56504300 | Known | NMU
chr1: 183385400-183388500 | Known | NMNAT2
chr8: 41502400-41510150 | Known | NKX6-3
chr10: 134596450-134599400 | Known | NKX6-2; RP11-288G11.3
chr4: 85417400-85421400 | Known | NKX6-1
chr2: 233791350-233792700 | Known | NGEF
chrX: 107016000-107021000 | Known | NCBP2L; TSC22D3
chr11: 1150000-1157350 | Known | MUC5AC
chr7: 100607850-100613600 | Known | MUC12; MUC3A; RP11-395B7.2
chr16: 56699800-56705700 | Known | MT1G; MT1H
chr12: 132313150-132317650 | Known | MMP17
chr7: 73036850-73039200 | Known | MLXIPL
chr19: 54482850-54485950 | Known | MIR935
chr9: 21554500-21561150 | Known | MIR31HG
chr17: 46800050-46802400 | Known | MIR3185; PRAC1; PRAC2
chr1: 1562700-1565700 | Known | MIB2
chr1: 205537050-205540700 | Known | MFSD4
chr13: 31480150-31483050 | Known | MEDAG
chr2: 132152200-132153000 | Known | MED15P3
chr3: 150959500-150960300 | Known | MED12L
chr2: 149894250-149897500 | Known | LYPD6B
chr11: 1889150-1894600 | Known | LSP1
chr1: 156896950-156898350 | Known | LRRC71
chr11: 61275250-61276400 | Known | LRRC10B; MIR4488
chr9: 103789900-103792650 | Known | LPPR1
chr16: 1013250-1015550 | Known | LMF1
chr1: 2980250-2991900 | Known | LINC00982; PRDM16
chr3: 75719150-75723200 | Known | LINC00960
chr20: 21085550-21087550 | Known | LINC00237
chr19: 55127750-55130550 | Known | LILRB1
chr7: 103968400-103969950 | Known | LHFPL3
chr1: 202182400-202184350 | Known | LGR6
chr1: 202161700-202163400 | Known | LGR6
chr1: 65991250-65992850 | Known | LEPR
chr1: 205424550-205426850 | Known | LEMD1; RP11-576D8.4
chr20: 9494050-9498000 | Known | LAMP5; RP5-1119D9.4
chr6: 129203450-129207800 | Known | LAMA2
chr19: 51485750-51487700 | Known | KLK7
chr3: 126073900-126077300 | Known | KLF15
chr1: 245315950-245321950 | Known | KIF26B
chr1: 180880350-180883200 | Known | KIAA1614
chr15: 81070500-81075050 | Known | KIAA1199
chr20: 43728950-43730250 | Known | KCNS1
chr14: 88788450-88791000 | Known | KCNK10
chr7: 119911950-119914550 | Known | KCND2
chr1: 111210100-111218300 | Known | KCNA3
chr16: 31366400-31369100 | Known | ITGAX
chr20: 13200350-13202100 | Known | ISM1
chr16: 54316250-54322800 | Known | IRX3
chr5: 2748900-2751450 | Known | IRX2
chr17: 38016450-38022250 | Known | IKZF3
chr22: 23229500-23237350 | Known | IGLC1; IGLJ1; IGLL5
chr19: 46579500-46581300 | Known | IGFL4
chr7: 45927300-45929150 | Known | IGFBP1
chr7: 23506000-23515500 | Known | IGF2BP3
chr6: 87646350-87648250 | Known | HTR1E
chr5: 175084150-175086850 | Known | HRH2
chr3: 11195250-11198600 | Known | HRH1
chr4: 175439400-175445700 | Known | HPGD
chr12: 54386800-54395700 | Known | HOXC6; HOXC9; HOXC-AS1; HOXC-AS2
chr12: 54421700-54423400 | Known | HOXC6
chr12: 54410150-54413050 | Known | HOXC4; HOXC6; RP11-834C11.14
chr12: 54446200-54449350 | Known | HOXC4
chr12: 54331500-54334550 | Known | HOXC13; HOXC-AS5
chr12: 54375250-54381900 | Known | HOXC10; HOXC-AS3; RP11-834C11.12
chr17: 46701450-46705000 | Known | HOXB9
chr17: 46804450-46808100 | Known | HOXB13
chr7: 27159450-27164850 | Known | HOXA3; HOXA-AS2
chr7: 27208400-27220700 | Known | HOXA10; HOXA9; HOXA-AS4; MIR196B; RP1-170O19.20
chr7: 27221300-27251300 | Known | HOTTIP; HOXA11; HOXA11-AS; HOXA13; RP1-170O19.14
chr12: 54365950-54373250 | Known | HOTAIR; HOXC11
chr1: 6478800-6480950 | Known | HES2
chr11: 2016000-2021350 | Known | H19
chr11: 45942850-45946400 | Known | GYLTL1B
chr9: 140056700-140058300 | Known | GRIN1
chr15: 72488700-72491050 | Known | GRAMD2
chr17: 72425800-72433550 | Known | GPRC5C
chr5: 89854500-89855350 | Known | GPR98
chrX: 133117900-133120700 | Known | GPC3
chr19: 2700850-2702900 | Known | GNG7
chr7: 99526050-99527900 | Known | GJC3; RP4-604G5.1
chr8: 75230900-75235150 | Known | GDAP1; JPH1
chr7: 74379400-74380400 | Known | GATSL1
chr20: 61046800-61052500 | Known | GATA5; RP13-379O24.3
chr8: 11533800-11540650 | Known | GATA4
chr8: 11557150-11568950 | Known | GATA4
chr11: 11640700-11644650 | Known | GALNT18
chr12: 130645350-130646800 | Known | FZD10; FZD10-AS1
chr6: 96460900-96466650 | Known | FUT9
chr13: 39259850-39263000 | Known | FREM2
chr16: 86600550-86601800 | Known | FOXC2; RP11-463O9.5
chr6: 1608550-1611700 | Known | FOXC1
chr14: 38051900-38070050 | Known | FOXA1; TTC6
chr17: 39965500-39970950 | Known | FKBP10; LEPREL4
chr9: 133813800-133816150 | Known | FIBCD1
chr11: 69630950-69635350 | Known | FGF3
chr3: 13973700-13975200 | Known | FGD5P1
chr10: 95325600-95329150 | Known | FFAR4
chr7: 121942750-121947900 | Known | FEZF1; FEZF1-AS1
chr16: 86529000-86534050 | Known | FENDRR
chr21: 42687850-42691150 | Known | FAM3B
chr17: 66593700-66598900 | Known | FAM20A
chr1: 179711850-179712600 | Known | FAM163A
chr8: 53476650-53479500 | Known | FAM150A
chr4: 187025100-187028650 | Known | FAM149A
chr12: 124778800-124786100 | Known | FAM101A
chr7: 27281600-27284150 | Known | EVX1; EVX1-AS
chrX: 103498450-103500200 | Known | ESX1
chr1: 216892850-216898200 | Known | ESRRG
chr19: 55590850-55593800 | Known | EPS8L1
chr8: 144950100-144953650 | Known | EPPK1
chr17: 48608600-48615100 | Known | EPN3
chr1: 23037600-23041300 | Known | EPHB2
chr9: 112080500-112082950 | Known | EPB41L4B
chr7: 155250600-155253200 | Known | EN2
chr19: 14885900-14888350 | Known | EMR2
chr22: 37821950-37823900 | Known | ELFN2; RP1-63G5.5
chr19: 1286150-1288700 | Known | EFNA2; MUM1
chr20: 57874800-57877300 | Known | EDN3
chr15: 45399500-45410700 | Known | DUOX2; DUOXA2
chr16: 30021900-30023950 | Known | DOC2A
chr7: 96633500-96636700 | Known | DLX6; DLX6-AS1; DLX6-AS2
chr7: 96652750-96654900 | Known | DLX5
chr19: 6474700-6477300 | Known | DENND1C
chr10: 94831200-94834300 | Known | CYP26A1
chr4: 48987500-48989500 | Known | CWH43
chr8: 104382100-104385900 | Known | CTHRC1
chr5: 174177950-174179050 | Known | CTD-2532K18.1; MIR4634
chr14: 19924450-19925600 | Known | CTD-2314B22.3
chr14: 19640850-19641750 | Known | CTD-2314B22.1
chr15: 97838750-97841300 | Known | CTD-2147F2.1
chr5: 134912900-134915350 | Known | CTC-321K16.1; CXCL14
chr5: 134371700-134375750 | Known | CTC-276P9.1
chr16: 21288600-21290700 | Known | CRYM
chr2: 102002650-102005250 | Known | CREG2
chr15: 78632500-78634200 | Known | CRABP1
chr3: 9745600-9747050 | Known | CPNE9
chr16: 89640950-89643950 | Known | CPNE7
chr3: 99355450-99359900 | Known | COL8A1
chr6: 33160200-33161450 | Known | COL11A2
chr6: 35754500-35755750 | Known | CLPSL1
chr21: 36041150-36045150 | Known | CLIC6
chr17: 7161850-7167950 | Known | CLDN7; RP1-4G17.5
chr7: 73181100-73185850 | Known | CLDN3
chr3: 190034900-190041800 | Known | CLDN1; CLDN16
chr7: 29184550-29187650 | Known | CHN2; CPVL
chr2: 27340450-27342750 | Known | CGREF1
chr13: 28538700-28543950 | Known | CDX2
chr5: 149545100-149550500 | Known | CDX1
chr16: 68677900-68681200 | Known | CDH3; RP11-615I2.2
chr16: 68770300-68774200 | Known | CDH1
chr11: 6279800-6283200 | Known | CCKBR
chr18: 57363700-57365350 | Known | CCBE1; RP11-2N1.2
chr8: 76189900-76191050 | Known | CASC9
chr6: 17392850-17396100 | Known | CAP2
chr1: 20808950-20814450 | Known | CAMK2N1
chr7: 44265350-44266400 | Known | CAMK2B
chr8: 86350000-86351450 | Known | CA3
chr5: 2751850-2754050 | Known | C5orf38; IRX2
chr3: 138664900-138667100 | Known | C3orf72; FOXL2
chr17: 77019250-77024000 | Known | C1QTNF1; C1QTNF1-AS1
chr1: 223565950-223567600 | Known | C1orf65
chr1: 190440800-190450200 | Known | BRINP3; RP11-161I10.1; RP11-547I7.2
chr2: 198650550-198651850 | Known | BOLL
chr15: 83952250-83953300 | Known | BNC1
chr4: 42152300-42155900 | Known | BEND4
chr17: 47209750-47211400 | Known | B4GALNT2
chr11: 134279600-134282050 | Known | B3GAT1
chr4: 94748600-94754050 | Known | ATOH1
chr9: 120175650-120177900 | Known | ASTN2
chr9: 133319400-133324650 | Known | ASS1
chr11: 2285750-2292550 | Known | ASCL2
chr16: 329250-332250 | Known | ARHGDIG
chr8: 145908800-145912600 | Known | ARHGAP39
chr4: 86395150-86399900 | Known | ARHGAP24
chr18: 24443050-24445900 | Known | AQP4; AQP4-AS1
chr11: 71318250-71320050 | Known | AP000867.1
chr5: 79864800-79866650 | Known | ANKRD34B
chr2: 133014850-133015750 | Known | ANKRD30BL; MIR663B
chr12: 85672750-85675650 | Known | ALX1
chr6: 168195400-168198750 | Known | AL009178.1; C6orf123
chr10: 4867450-4870200 | Known | AKR1E2
chr16: 3232300-3234150 | Known | AJ003147.8
chr8: 11203650-11206800 | Known | AF131216.5; TDH
chr17: 15847250-15850800 | Known | ADORA2B
chr7: 5601050-5603800 | Known | ACTB
chr7: 100490350-100495550 | Known | ACHE
chr3: 18734950-18736300 | Known | AC144521.1
chr2: 131593950-131595800 | Known | AC133785.1; ARHGEF4
chr4: 44447900-44452050 | Known | AC131951.1; KCTD8
chr17: 7982650-7984350 | Known | AC129492.6; ALOX12B
chr5: 1003400-1005850 | Known | AC116351.2; RP11-43F13.4
chr2: 100721300-100722600 | Known | AC092667.2; AFF3
chr2: 286750-288600 | Known | AC079779.4; FAM150B
chr2: 132121200-132122150 | Known | AC073869.1
chr2: 233282700-233286450 | Known | AC068134.5; AC068134.6
chr16: 31495650-31500700 | Known | AC026471.6; SLC5A2
chr12: 54348250-54351050 | Known | AC012531.23; HOXC12
chr2: 118561200-118562150 | Known | AC009312.1
chr16: 51182700-51185700 | Known | AC009166.5; SALL1
chr2: 171671550-171676200 | Known | AC007405.8; GAD1
chr2: 66801200-66811950 | Known | AC007392.3
chr2: 71113350-71116800 | Known | AC007040.5
chr7: 15720950-15728900 | Known | AC005550.4; MEOX2
chr6: 1611750-1616000 | Unknown
chr15: 96958950-96961350 | Unknown
chr2: 66652100-66655200 | Unknown
chr2: 8833050-8834200 | Unknown
chr9: 17905350-17908250 | Unknown
chr5: 2746900-2748550 | Unknown
chr7: 45001800-45003250 | Unknown
chr12: 52257150-52258000 | Unknown
chr2: 218874000-218875450 | Unknown
chr19: 30214300-30216100 | Unknown
chr8: 140717350-140719650 | Unknown
chr7: 27264550-27266100 | Unknown
chr19: 48900250-48904400 | Unknown
chr16: 51186150-51187850 | Unknown
chr9: 132458700-132461300 | Unknown
chr11: 44337850-44339250 | Unknown
chr17: 46694850-46697150 | Unknown
chr10: 124898400-124900700 | Unknown
chr6: 10382900-10384750 | Unknown
chr8: 144489000-144490750 | Unknown
chr20: 49837550-49839250 | Unknown
chr3: 193921100-193922050 | Unknown
chr13: 100619800-100623100 | Unknown
chr1: 165320950-165322700 | Unknown
chr1: 180203650-180205650 | Unknown
chr1: 23543800-23544900 | Unknown
chr8: 144842350-144844000 | Unknown
chr5: 174162150-174163450 | Unknown
chr1: 184632450-184634700 | Unknown
chr13: 21295150-21296450 | Unknown
chr1: 156893100-156894550 | Unknown
chr20: 46434400-46435400 | Unknown
chr11: 33398050-33400750 | Unknown
chr6: 134216650-134218050 | Unknown
chr2: 45176050-45177700 | Unknown
chr13: 36044350-36045800 | Unknown
chr2: 45227500-45229600 | Unknown
chr10: 43427950-43429950 | Unknown
chr1: 152079200-152081300 | Unknown
chr7: 54731350-54733200 | Unknown
chr20: 4201500-4202700 | Unknown
chr8: 145555300-145556800 | Unknown
chr7: 64733800-64735500 | Unknown
chrX: 119124000-119127100 | Unknown
chr3: 14642850-14644150 | Unknown
chr10: 102488400-102492200 | Unknown
chr5: 42999400-43001150 | Unknown
chr21: 38063750-38066650 | Unknown
chr2: 131010400-131011600 | Unknown
chr19: 30018700-30020150 | Unknown
chr5: 72731550-72734700 | Unknown
chr8: 102092150-102094400 | Unknown
chr4: 4867350-4869600 | Unknown
chr4: 4854350-4855850 | Unknown
chr7: 156735150-156736500 | Unknown
chr1: 161442450-161443650 | Unknown
chr12: 54356450-54358100 | Unknown
chr1: 48174300-48176650 | Unknown
chr7: 25900700-25903050 | Unknown
chr10: 102830000-102833650 | Unknown
chr6: 137310350-137312150 | Unknown
chr1: 152081400-152084100 | Unknown
chr7: 27274550-27276500 | Unknown
chr12: 113904650-113906650 | Unknown
chr1: 17024500-17028900 | Unknown
chr5: 72528750-72529950 | Unknown
chr9: 99481850-99483650 | Unknown
chr1: 46954600-46956800 | Unknown
chr17: 26119900-26121850 | Unknown
chr1: 2253650-2254650 | Unknown
chr7: 73060250-73063150 | Unknown
chr19: 1754200-1758750 | Unknown
chr9: 29211200-29215700 | Unknown
chr7: 31375200-31377000 | Unknown
chr1: 165344500-165346650 | Unknown
chr10: 57389650-57391700 | Unknown
chr1: 163441550-163443100 | Unknown
chr1: 200842700-200844850 | Unknown
chr20: 44639000-44640950 | Unknown
chr2: 176952400-176953750 | Unknown
chr20: 6031700-6033850 | Unknown
chr5: 2738550-2740800 | Unknown
chr3: 74662150-74664400 | Unknown
chr10: 134600350-134602350 | Unknown
chr1: 152084900-152085650 | Unknown
chr8: 52520450-52521550 | Unknown
chr1: 121279850-121280850 | Unknown
chr13: 37729350-37731000 | Unknown
chr7: 8390700-8392150 | Unknown
chr12: 32818500-32820350 | Unknown
chr16: 15350450-15351950 | Unknown
chr2: 58342200-58346950 | Unknown
chr3: 112383300-112384750 | Unknown
chr19: 1682300-1683350 | Unknown
chr4: 27077050-27078000 | Unknown
chr8: 23507850-23509050 | Unknown
chr4: 10782250-10783600 | Unknown
chr17: 12927950-12928650 | Unknown
chr2: 11989300-11990550 | Unknown
chr7: 23074700-23076100 | Unknown
chr22: 28479200-28480250 | Unknown
chr9: 36763800-36766950 | Unknown
chr6: 28757250-28758600 | Unknown
chr1: 50032150-50033200 | Unknown
chr6: 4334150-4335300 | Unknown
chr1: 195732150-195733300 | Unknown
chr6: 170483200-170484200 | Unknown
chr12: 38447100-38448600 | Unknown
chr7: 86667750-86669950 | Unknown
chr16: 9683650-9684650 | Unknown
chr1: 171342100-171343300 | Unknown
chr20: 47203350-47204450 | Unknown
chr20: 62030950-62034000 | Unknown
chr1: 168323150-168325650 | Unknown
chr6: 10133900-10134950 | Unknown
chr4: 71924850-71926200 | Unknown
chrX: 130711450-130713600 | Unknown
chr12: 38549550-38551600 | Unknown
chr2: 131094200-131095000 | Unknown
chr1: 183626800-183628050 | Unknown
chr6: 28918100-28918850 | Unknown
chr2: 198504700-198507250 | Unknown
chr11: 71350450-71351500 | Unknown
chr20: 47001000-47003900 | Unknown
chr21: 10600500-10603150 | Unknown
chr3: 34131250-34132150 | Unknown
chr5: 7170200-7171750 | Unknown
chr17: 50486700-50487400 | Unknown
chr2: 122809550-122810150 | Unknown
chr8: 57178000-57179050 | Unknown
chr4: 142803450-142805000 | Unknown
chr10: 118367950-118370350 | Unknown
chrX: 115004100-115005700 | Unknown
chr3: 53961050-53963000 | Unknown
chr6: 28920750-28922800 | Unknown
chr17: 11769750-11770850 | Unknown
chr6: 1594950-1595600 | Unknown
chr15: 79783300-79784500 | Unknown
chr7: 83684250-83685650 | Unknown
chr18: 2246500-2247900 | Unknown
chr10: 36147250-36148500 | Unknown
chr7: 91023500-91025650 | Unknown
chr2: 79337900-79339650 | Unknown
chrX: 115002950-115003900 | Unknown
chr1: 34557900-34558600 | Unknown
chr19: 523250-524300 | Unknown
chr13: 91315500-91317200 | Unknown
chr6: 26330700-26333000 | Unknown
chr9: 115565950-115567400 | Unknown
chr14: 42380150-42381450 | Unknown
chr7: 76356350-76358750 | Unknown
chr13: 108578200-108579350 | Unknown
chr8: 90569800-90570900 | Unknown
chr3: 185842600-185844550 | Unknown
chr1: 207903150-207904800 | Unknown
chr2: 14988000-14988950 | Unknown
chr12: 47819700-47821500 | Unknown
chr1: 83728350-83730000 | Unknown
chr11: 105384700-105387850 | Unknown
chr3: 88557900-88558600 | Unknown
chr6: 142290050-142291600 | Unknown
chr3: 83265600-83268250 | Unknown









To experimentally test whether inhibiting EZH2/PRC2 activity might modulate somatic promoter usage in GC, we treated IM95 GC cells with GSK126, a highly selective small-molecule inhibitor of EZH2 methyltransferase activity. This line was selected because it has previously been shown to be sensitive to EZH2 depletion (FIG. 14). RNA-seq analysis of GSK126-treated IM95 cells at two treatment time points (Day 6 and Day 9) confirmed that genes upregulated upon EZH2 inhibition are enriched in previously identified PRC2 target gene sets (FIG. 18). GSK126 treatment caused deregulation of 2134 promoters in total. Of the 1959 promoters exhibiting somatic alterations in primary GCs (FIG. 1D), 251 (12.8%) were deregulated by GSK126 treatment in IM95 cells. This proportion was significantly greater than the proportion of unaltered promoters exhibiting deregulation after GSK126 challenge (8.8%, OR 1.46, P<0.001, Fisher's exact test, FIG. 5B), suggesting heightened sensitivity of somatic promoters to EZH2 inhibition. The proportion of somatic promoters deregulated after EZH2 inhibition was also greater than the total proportion of genes (as defined by Gencode) regulated by GSK126 (1.5%, OR 9.21, P<0.001, FIG. 5B). Of the promoters that were both deregulated by GSK126 and mapped to somatic promoters lost in primary GC, 89.6% were reactivated following GSK126 administration (78/87, FC>=2, q-value <0.1, Methods and Materials), consistent with EZH2 functioning to repress these promoters. For example, FIGS. 5C and 5D highlight two lost somatic promoters (SLC9A9 and PSCA) that exhibit expression gains after GSK126 treatment (FIG. 5). These results thus suggest a general role for EZH2 in regulating epigenomic promoter alterations in GC.
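The comparison above (the fraction of somatic promoters deregulated versus the fraction of unaltered promoters deregulated) is a 2x2 contingency table summarized by an odds ratio and a Fisher's exact test. The sketch below computes both from scratch, using log-binomial coefficients so that tables with thousands of entries do not overflow. Two points are assumptions: the text does not state whether its test was one- or two-sided (a one-sided "greater" test is shown), and the unaltered-promoter counts (880 of 10,000) are hypothetical because the real denominator is not given in this passage, so the resulting odds ratio will not equal the reported 1.46. In practice, `scipy.stats.fisher_exact(table, alternative="greater")` returns the same kind of one-sided result.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact P-value for an odds ratio greater than 1.

    With both margins fixed, the top-left cell is hypergeometric;
    sum its probability mass from the observed `a` upward.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    log_denominator = log_comb(n, row1)
    return sum(
        exp(log_comb(col1, x) + log_comb(n - col1, row1 - x) - log_denominator)
        for x in range(a, min(row1, col1) + 1)
    )


# Row 1: somatic promoters, 251 deregulated of 1959 (from the text).
# Row 2: HYPOTHETICAL unaltered promoters, 880 of 10,000 (8.8%).
table = (251, 1959 - 251, 880, 10_000 - 880)
print(odds_ratio(*table), fisher_greater(*table))
```

The log-space formulation is the design choice that matters here: computing C(11959, 1959) directly as a float would overflow, whereas differences of `lgamma` values stay well within range.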


Somatic Promoters Reveal Novel Cancer-Associated Transcripts


Finally, when analyzing the altered somatic promoters with respect to their proximity to known genes, we found that somatic promoters could be classified into annotated and unannotated categories. Annotated promoters were defined as promoters mapping close (<500 bp) to a known Gencode transcription start site (TSS), while unannotated promoters were those mapping to genomic regions devoid of known Gencode TSSs. The majority of promoters present in non-malignant tissues, as well as promoters unchanged between tumors and normal tissues, mapped close to previously annotated TSSs (72%-92%). In contrast, only 41% of somatic promoters mapped to annotated promoter locations, while the remaining 59% mapped to “unannotated” locations distant from Gencode TSSs, in many cases 2-10 kb away (FIG. 6a).
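The annotated/unannotated split described above reduces to a nearest-neighbour lookup: find the closest Gencode TSS to each promoter and compare that distance with the 500 bp cut-off. The sketch below does this with binary search over sorted TSS positions. It assumes each promoter is represented by a single coordinate (for example, the centre of the promoter region); the study may measure distance differently, and the TSS coordinates are hypothetical.

```python
from bisect import bisect_left


def nearest_tss_distance(tss_sorted, pos):
    """Distance in bp from pos to the closest TSS in a sorted list, or None if empty."""
    i = bisect_left(tss_sorted, pos)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - pos)      # nearest TSS at or after pos
    if i > 0:
        candidates.append(pos - tss_sorted[i - 1])  # nearest TSS before pos
    return min(candidates) if candidates else None


def annotate(chrom, pos, tss_by_chrom, cutoff_bp=500):
    """'annotated' if a TSS lies strictly closer than cutoff_bp, else 'unannotated'."""
    dist = nearest_tss_distance(tss_by_chrom.get(chrom, []), pos)
    return "annotated" if dist is not None and dist < cutoff_bp else "unannotated"


# Hypothetical TSS coordinates, sorted per chromosome, for illustration only.
tss = {"chr1": [1_000, 5_000, 20_000]}
for pos in (1_300, 4_600, 12_000):
    print(pos, annotate("chr1", pos, tss))
```

Here positions 1,300 and 4,600 lie 300 bp and 400 bp from a TSS and are called annotated, while 12,000 lies 7 kb from the nearest TSS, within the 2-10 kb range the text reports for many unannotated promoters.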


To test the functional relevance of these unannotated promoters, we used GenoCanyon, a nucleotide-level quantification of genomic functional potential that integrates multiple levels of conservation and epigenomic information. We observed that 81% of the unannotated promoter regions exhibited a maximum genome-wide functional score greater than 0.9 (range 0-1), indicating high functional potential. To ascertain tissue-type specificity, we then applied tissue-specific annotations using GenoSkyline, an extension of the GenoCanyon framework that integrates Roadmap Epigenomics data. We observed that GI tissues had the third-highest median score after ESC and fetal tissues, consistent with our tumors being gastric in lineage and also de-differentiated (FIG. 5b). Separately, recent studies have suggested that endogenous repeat elements in the human genome may contribute significantly to regulatory element variation, and that hypomethylation of repeat elements can induce cancer-associated transcription. We found that unannotated promoters were also significantly enriched for the repeat elements ERV1 (P<0.0001, unannotated vs. all) and L1 (P<0.0001, unannotated vs. all; FIG. 13).
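The two summaries in the paragraph above, the fraction of regions whose maximum per-base score exceeds 0.9 and the ranking of tissues by median score, can be sketched as follows. This is an illustration of the summary statistics only: it does not call GenoCanyon or GenoSkyline, and all scores below are toy values chosen for the example, not outputs of either tool.

```python
from statistics import median


def fraction_high_potential(region_scores, threshold=0.9):
    """Fraction of regions whose maximum per-base score exceeds threshold."""
    high = sum(1 for scores in region_scores if max(scores) > threshold)
    return high / len(region_scores)


def rank_tissues(scores_by_tissue):
    """Tissue names ordered from highest to lowest median score."""
    return sorted(scores_by_tissue,
                  key=lambda tissue: median(scores_by_tissue[tissue]),
                  reverse=True)


# Toy per-base scores for three hypothetical regions.
regions = [[0.20, 0.95, 0.40], [0.10, 0.30], [0.91, 0.99]]
print(fraction_high_potential(regions))  # 2 of 3 regions exceed 0.9

# Toy per-region tissue scores (hypothetical, not GenoSkyline output).
tissue_scores = {
    "ESC": [0.8, 0.7, 0.9],
    "Fetal": [0.6, 0.7, 0.65],
    "GI": [0.5, 0.6, 0.55],
    "Blood": [0.2, 0.3, 0.25],
}
print(rank_tissues(tissue_scores))
```

Using the per-region maximum, rather than the mean, means a region counts as high-potential if any part of it scores highly, which matches the text's phrasing of a "maximum genome-wide functional score".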


Compared to annotated promoters, unannotated promoters exhibited weaker H3K27ac signals, suggesting that the latter might have lower activity and decreased gene expression levels (FIG. 13). Supporting this, somatic promoters, even those supported by CAGE tags (indicating true promoters), exhibited significantly lower RNA-seq expression levels than all CAGE tag-supported promoters (FIG. 5c). We thus hypothesized that unannotated promoters might be associated with low transcript levels, making them more challenging to detect by conventional-depth transcriptome sequencing given the very wide dynamic range of cellular transcriptomes (10-10,000 transcripts per cell for different genes) (FIG. 5d). To test this possibility, we employed both down-sampling and up-sampling analyses. Not surprisingly, decreasing RNA-seq depth caused a concomitant decrease in detected somatic promoter transcripts. For example, downsampling to ~40M reads rendered ~250 transcripts (FPKM>0, FIG. 5e) undetectable at somatic promoters. More convincingly, in the reciprocal experiment, we generated deep RNA-seq data for 5 matched GC/normal pairs (average read depth 140M compared to the standard 100M) and confirmed the detection of 435 additional somatic promoter-associated transcripts (FPKM>0) (FIG. 5e). We estimate that deep RNA-sequencing data allowed us to discover additional transcripts for 22% of the unannotated promoters that were not detectable by regular-depth RNA-seq (FIG. 5f). These results demonstrate that, despite being associated with bona fide cancer-associated transcripts, many somatic promoters defined by epigenomic profiling may have been missed by conventional-depth RNA-seq.
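The down-sampling experiment above can be illustrated with binomial thinning: each read is kept independently with probability equal to the target depth divided by the original depth (for example, 40M/100M = 0.4), and a transcript counts as detected while it retains at least one read (FPKM > 0). The sketch below uses the standard FPKM formula, count x 10^9 / (transcript length x total mapped reads). The read counts are hypothetical, and this is not the study's actual pipeline, which is not described in this passage.

```python
import random


def fpkm(count, length_bp, total_mapped_reads):
    """Fragments (reads) per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)


def downsample(counts, fraction, seed=0):
    """Binomially thin read counts: keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]


def n_detected(counts):
    """Number of transcripts with at least one read (equivalent to FPKM > 0)."""
    return sum(1 for c in counts if c > 0)


# Hypothetical per-transcript read counts at full depth.
counts = [0, 1, 1, 2, 3, 5, 40, 800]
thinned = downsample(counts, fraction=0.4)  # e.g. 100M -> 40M reads
print(n_detected(counts), n_detected(thinned))
print(fpkm(1000, 2000, 10_000_000))
```

Low-count transcripts (one or two reads) are the ones most likely to drop to zero under thinning, which is the mechanism the text invokes for lowly expressed, unannotated promoters going undetected at conventional depth.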


Discussion


Identifying somatically altered cis-regulatory elements and understanding how these elements direct cancer-associated gene expression is a critical scientific goal. Here, we defined close to 2000 promoters exhibiting altered activity in GC, indicating that somatic promoters in GC are pervasive. Promoters are canonically defined as proximal cis-regulatory elements that recruit general transcription factors to initiate transcription. However, selection and activation of TSSs by RNA polymerase at core promoters depends on multiple factors. Core promoters are differentially distributed between genes of different functions, and the chromatin distributions and epigenetic landscapes of core promoter regions can also differ in a tissue-specific manner. The presence of multiple transcription initiation sites within the same gene can generate distinct transcript isoforms with different 5′UTRs that can act as switches to regulate gene expression, and usage of alternative 5′UTRs can also affect both the translation and the protein stability of cancer-associated genes such as BRCA1, TGF-β and ERG. Such findings demonstrate that the activity of specific promoter elements is complex and cell context-dependent, with impacts on downstream transcriptional, translational, and functional processes.


A significant proportion (~18%) of somatic promoters corresponded to alternative promoters. In cancer, alternative promoter utilization is of major relevance, as increasing numbers of genes (e.g. LEF1, TP53, TGFB3) are now being shown to exhibit distinct alternative promoter-associated isoforms that differentially affect malignant growth. In the current study, we identified alternative promoters in genes both known and novel to GC biology, with significant clinical and translational implications. For example, we discovered an alternative promoter at the EpCAM gene locus that is specifically activated in gastric tumors. EpCAM encodes a transmembrane glycoprotein that has been proposed as a marker for circulating tumor cells, and in GC, EpCAM expression levels have been correlated with patient prognosis. However, little is known about the specific cellular mechanisms driving high EpCAM expression in GC. Our finding that EpCAM is regulated in GC not through its canonical promoter but through a cancer-specific alternative promoter may lend credence to recent reports suggesting that, in addition to acting as an experimentally convenient surface marker, EpCAM may play a more direct pro-oncogenic role in stimulating cellular proliferation.


Another novel alternative promoter-associated gene, identified for the first time in our study, was RASA3. While a functional role for RASA3 in cancer remains to be definitively established, studies from other biological fields have shown that RASA3 can inhibit RAP1, which in turn has been implicated in invasion and metastasis in various cancers. RASA3 depletion can enhance signaling by integrins and mitogen-activated protein kinases, and independent cross-species cancer studies have recently suggested that RASA3 may act as a tumor suppressor. A tumor-suppressive role for RASA3 is consistent with our own results, in which expression of wild-type RASA3 potently inhibited cell migration and invasion in GC cell lines, while the N-terminal RASA3 variant enhanced migration and invasion in normal gastric epithelial cells. A third example of an alternative-promoter-driven gene was MET, which has been extensively investigated as a target for cancer therapy. While we and others have previously reported expression of an N-terminally truncated MET variant in cancer, the functional implications of this variant have remained unclear. In the present study, experimental assessment of wild-type and variant MET signaling revealed that truncated MET variants may have downstream signaling effects different from those of full-length MET isoforms. Under the experimental conditions used, we observed significant differences in the phosphorylation patterns of ERK, STAT3 and GAB1, in a manner consistent with MET-Var being more pro-oncogenic than wild-type MET, as ERK, STAT3, and GAB1 have all been shown to facilitate MET-induced signaling.
The MET signaling pathway is known to be particularly complex, with multiple feedback loops, and understanding how expression of the short, N-terminally truncated MET isoform might modulate downstream survival signaling will be an important subject of future research, particularly in light of recent unsuccessful clinical trials of MET-targeting antibodies in lung cancer.


Our study also revealed an unexpected relationship between somatic promoters and tumor immunity. Specifically, we discovered that alternative promoter isoforms overexpressed in GC were significantly depleted of N-terminal peptides predicted to be potentially immunogenic, based on computational predictions of high-affinity MHC Class I binding and on other immunological assays. We believe this finding is relevant to cancer immunity, as it builds on previous findings establishing the existence of self-reactive T cells, the potential immunogenicity of overexpressed tumor antigens, and the process of tumor immunoediting. First, while the majority of self-reactive T cells are clonally deleted during early development, numerous groups have demonstrated the frequent persistence of self-reactive T cells in the periphery. For example, analysis of transgenic mice has shown that 25-40% of autoreactive T cells are likely to escape clonal deletion even in the presence of the deleting ligand, and in humans, Yu et al. have demonstrated that clonal deletion prunes the T-cell repertoire but does not fully eliminate self-reactive T-cell clones. Importantly, while such self-reactive T cells are typically of low avidity and unable to recognize self-antigens under normal physiological conditions, they retain the ability to become activated and to produce effector and memory cells under appropriate stimulation, such as during infection or the mounting of anti-tumor responses.
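The depletion analysis described above compares N-terminal peptides between isoforms. One step that can be shown without any external tool is enumerating the candidate peptides that an N-terminally altered isoform loses or gains relative to the canonical protein. The sketch below assumes 9-mers (the most common MHC Class I ligand length) and plain amino-acid strings; the binding predictions themselves would come from a dedicated MHC Class I predictor (for example NetMHCpan), which is not modelled here, and the function names are hypothetical.

```python
def kmers(seq, k=9):
    """Return the set of all length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def n_terminal_peptide_changes(canonical, variant, k=9):
    """Return (lost, gained) k-mer peptides of a variant isoform vs the canonical one.

    'lost' peptides occur only in the canonical protein (e.g. removed by an
    N-terminal truncation); 'gained' peptides occur only in the variant
    (e.g. from an N-terminal extension or a new first exon).
    """
    canon, var = kmers(canonical, k), kmers(variant, k)
    return canon - var, var - canon


# Toy sequence for illustration only.
canonical = "MKTAYIAKQRQISFVKSHFSRQ"
truncated = canonical[5:]  # toy N-terminally truncated isoform
lost, gained = n_terminal_peptide_changes(canonical, truncated)
print(sorted(lost), gained)
```

Each peptide in `lost` would then be scored for predicted MHC Class I affinity; counting predicted high-affinity binders across many isoform pairs is one way a depletion of immunogenic N-terminal peptides could be assessed.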


Second, in cancer, several studies have shown that self-reactive T cells can exhibit immunologic activity towards overexpressed tumor antigens, even when these antigens are also expressed at lower levels in normal tissues. One well-known example is the melanocyte differentiation antigen Melan-A/MART-1, which is expressed by normal melanocytes and overexpressed in malignant melanoma cells. T-cell recognition of Melan-A/MART-1 has been detected in 50% of melanoma patients, and even healthy individuals have been shown to exhibit a disproportionately high frequency of Melan-A/MART-1-specific T cells in the peripheral blood. Besides Melan-A/MART-1, other examples of tumor-associated self-antigens inducing immunological recognition in both healthy individuals and cancer patients include tyrosinase-related proteins (TRP-1 and TRP-2) and glycoprotein (gp) 100 in melanoma, and HA in mastocytoma cells. Such examples demonstrate that, in certain cases, normally expressed proteins can become immunogenic when overexpressed in cancer. Third, tumor immunoediting, the acquired capacity of developing tumors to escape immune control, is a recognized hallmark of cancer. Tumor immune escape can occur via different mechanisms, such as upregulation of immune checkpoint molecules (e.g. PD-L1) and altered transcription of antigen-presentation genes or tumor-specific antigens. For example, decreased expression of melanoma antigens (e.g. gp100, MART-1, and HA) has been associated with melanoma progression to later disease stages. Beyond overt downregulation of the entire gene, it is therefore plausible that transcriptional changes affecting splice forms and promoter variants also contribute to tumor immunoediting.
For example, recent work in B-cell acute lymphoblastic leukemia (B-ALL) has described the production of N-terminally truncated CD19 transcript variants in response to CD19 CART (chimeric antigen receptor-armed T cell) therapy, showing that N-terminal transcript variants can arise as a consequence of immunologic pressure. Taken together, we believe that these previously established findings point to a plausible role for alternative promoters in reducing the immunogenic potential of tumors. In this regard, our observation that regions exhibiting somatic promoter alterations overlapped significantly with binding targets of Polycomb repressive complex 2 (PRC2) and were particularly sensitive to EZH2 inhibition suggests that pharmacologic approaches for reawakening somatic promoter-associated epitopes might represent an attractive strategy for increasing anti-tumor T-cell immunoreactivity and anti-tumor activity.
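The text reports a significant overlap between somatic promoters and PRC2 binding targets, and claim 10 phrases this as an increase of SUZ12 or EZH2 binding sites relative to the total promoter population. The source does not state which statistic was used; a one-sided hypergeometric test is one standard choice for this kind of overlap, and the sketch below computes it exactly with integer binomial coefficients. The argument names are hypothetical, and the counts are assumed to come from intersecting promoter calls with SUZ12/EZH2 ChIP-seq peaks.

```python
from math import comb


def prc2_enrichment(n_total, n_prc2_total, n_somatic, n_somatic_prc2):
    """Fold enrichment and one-sided hypergeometric p-value for PRC2 overlap.

    n_total        -- all promoters considered (the background population)
    n_prc2_total   -- background promoters overlapping SUZ12/EZH2 sites
    n_somatic      -- somatic (tumor-altered) promoters, a subset of the background
    n_somatic_prc2 -- somatic promoters overlapping SUZ12/EZH2 sites
    """
    fold = (n_somatic_prc2 / n_somatic) / (n_prc2_total / n_total)
    # P(X >= n_somatic_prc2) when drawing n_somatic promoters without replacement.
    upper = min(n_somatic, n_prc2_total)
    tail = sum(
        comb(n_prc2_total, i) * comb(n_total - n_prc2_total, n_somatic - i)
        for i in range(n_somatic_prc2, upper + 1)
    )
    return fold, tail / comb(n_total, n_somatic)


# Toy counts for illustration only; these are not results from the study.
fold, p = prc2_enrichment(n_total=20, n_prc2_total=5, n_somatic=4, n_somatic_prc2=3)
print(fold, p)
```

Python integers are arbitrary-precision, so the tail sum stays exact before the final division; for genome-scale counts a library routine such as SciPy's hypergeometric survival function would be faster.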


In conclusion, our study indicates an important role for somatic promoters in GC. We also note that a substantial portion (52%) of the somatic promoters localized to unannotated TSSs, consistent with recent studies indicating that hundreds of transcript loci remain to be annotated. Interestingly, a large portion of the human transcriptome has been shown to originate from repetitive elements that can exhibit promoter activity and/or express noncoding RNAs. Unannotated promoters activated in our GC study were enriched in ERV-1 and L1 repeat elements, which have been associated with stage-specific transcription in early human embryonic cells, suggesting a yet-unknown functional role for these promoters. Analysis of these unannotated promoters is likely to provide fertile ground for new and hitherto unanticipated insights into the mechanisms of GC development and progression.
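The 52% figure depends on how promoters are assigned to annotated versus unannotated TSSs, and claims 13, 19, 31 and 36 use a 500 bp distance from a known transcript start site as the cut-off. A minimal sketch of that rule, assuming per-chromosome sorted lists of annotated TSS coordinates and single-base promoter positions (for example, peak summits). It treats a distance of exactly 500 bp as canonical, following the "within 500 bp" wording of claim 13 (claim 31 says "less than"), and it does not attempt the separate canonical-versus-alternative distinction. The helper names are hypothetical.

```python
from bisect import bisect_left

CANONICAL_MAX_DIST_BP = 500  # cut-off used in claims 13 and 19


def nearest_tss_distance(pos, sorted_tss):
    """Distance from pos to the closest coordinate in sorted_tss, or None if empty."""
    i = bisect_left(sorted_tss, pos)
    # The closest TSS is either just before or at/after the insertion point.
    candidates = [abs(sorted_tss[j] - pos) for j in (i - 1, i) if 0 <= j < len(sorted_tss)]
    return min(candidates) if candidates else None


def classify_promoter(chrom, pos, tss_by_chrom):
    """Label a promoter 'canonical' (<= 500 bp from a TSS) or 'unannotated'."""
    d = nearest_tss_distance(pos, tss_by_chrom.get(chrom, []))
    if d is not None and d <= CANONICAL_MAX_DIST_BP:
        return "canonical"
    return "unannotated"


def unannotated_fraction(promoters, tss_by_chrom):
    """Fraction of (chrom, pos) promoters lying > 500 bp from any annotated TSS."""
    labels = [classify_promoter(c, p, tss_by_chrom) for c, p in promoters]
    return labels.count("unannotated") / len(labels)


# Toy annotation and promoter calls for illustration only.
tss = {"chr1": [1000, 5000, 20000]}
calls = [("chr1", 1300), ("chr1", 3000), ("chr2", 100), ("chr1", 5500)]
print(unannotated_fraction(calls, tss))
```

Binary search keeps each lookup logarithmic in the number of annotated TSSs, which matters when tens of thousands of promoter calls are compared against a genome-wide annotation.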

Claims
  • 1. A method for determining the presence or absence of at least one promoter in a cancerous biological sample relative to a non-cancerous biological sample, comprising: contacting the cancerous biological sample with at least one antibody specific for histone modifications H3K4me3 and H3K4me1; isolating nucleic acid from the cancerous biological sample having a signal ratio of H3K4me3 relative to H3K4me1 greater than 1, wherein the isolated nucleic acid comprises at least one region specific to said histone modifications; detecting a signal intensity of H3K4me3 in the isolated nucleic acid; and determining the presence or absence of at least one promoter in the cancerous biological sample based on the change in the signal intensity of H3K4me3 relative to the signal intensity of H3K4me3 in a non-cancerous biological sample.
  • 2. The method of claim 1, wherein the cancerous and non-cancerous biological samples each comprise a single cell, multiple cells, fragments of cells, body fluid or tissue.
  • 3. The method of any one of claims 1-2, wherein the cancerous and non-cancerous biological samples are obtained from the same subject.
  • 4. The method of any one of claims 1-3, wherein the cancerous and non-cancerous biological samples are each obtained from different subjects.
  • 5. The method of any one of claims 1-4, wherein the contacting step comprises the immunoprecipitation of chromatin with the antibodies specific for the histone modifications.
  • 6. The method of any one of claims 1-5, further comprising mapping at least one promoter from the cancerous biological sample against at least one reference nucleic acid sequence to identify a gene transcript associated with the at least one promoter.
  • 7. The method of claim 6, wherein the at least one reference nucleic acid sequence comprises a nucleic acid sequence derived from: i) an annotated genome sequence; ii) a de novo transcriptome assembly; and/or iii) a non-cancerous nucleic acid sequence library or database.
  • 8. The method of claim 1, wherein the change of signal intensity of H3K4me3 is greater than a 1.5 fold increase or decrease relative to the signal intensity of H3K4me3 in the non-cancerous biological sample.
  • 9. The method of claim 8, wherein a change of signal intensity of H3K4me3 greater than a 1.5 fold increase relative to the signal intensity of H3K4me3 in a non-cancerous biological sample, correlates to the presence of at least one cancer-associated promoter in the cancerous biological sample.
  • 10. The method of claim 9, wherein the activity of the at least one cancer-associated promoter correlates with an increase of SUZ12 or EZH2 binding sites relative to the total promoter population.
  • 11. The method of claim 10, wherein the increase of SUZ12 or EZH2 binding sites correlates with an upregulation of activity of the at least one cancer-associated promoter.
  • 12. The method of claim 10, wherein the increase of SUZ12 or EZH2 binding sites correlates with a downregulation of activity of the at least one cancer-associated promoter.
  • 13. The method of any one of claims 1-12, wherein the at least one promoter is a canonical promoter that is positioned within 500 bp from a known gene transcript start site.
  • 14. The method of claim 13, wherein the gene transcript start site is associated with one or more of a cell-type specification gene, a cell adhesion gene, a cell mediated immunity gene, a gastric cancer-associated or deregulated gene, a PRC2 target gene or a transcription factor.
  • 15. The method of claim 14, wherein the gene transcript start site is associated with an oncogene.
  • 16. The method of claim 13, wherein the gene transcript start site is associated with a gene selected from the group consisting of MYC, MET, CEACAM6, CLDN7, CLDN3, HOTAIR, PVT1, HNF4α, RASA3, GRIN2D, EpCAM and a combination thereof.
  • 17. The method of any of claims 1-16, wherein the cancer is gastric cancer or colon cancer.
  • 18. The method of any of claims 1-17, wherein the at least one promoter is an alternative promoter that is associated with a canonical promoter, wherein the canonical promoter is present in both the cancerous biological sample and the non-cancerous biological sample, and wherein the alternative promoter is only present in the cancerous biological sample, or wherein the alternative promoter is only absent in the cancerous biological sample.
  • 19. The method of any of claims 1-12, wherein the at least one promoter is an unannotated promoter that is positioned more than 500 bp away from a gene transcript start site.
  • 20. The method of claim 18, further comprising: measuring the expression level of the at least one alternative promoter in the cancerous biological sample and non-cancerous biological sample, wherein the measuring comprises digital profiling of reporter probes; and determining the differential expression level of the at least one alternative promoter relative to the non-cancerous biological sample, based on the digital profiling of the reporter probes, to validate the presence or absence of at least one alternative promoter in the cancerous biological sample relative to a non-cancerous biological sample.
  • 21. The method of claim 20, wherein said step of measuring is conducted using a NanoString™ platform.
  • 22. A method for determining the prognosis of cancer in a subject, comprising, contacting a cancerous biological sample obtained from the subject with at least one antibody specific for histone modification H3K4me3 and H3K4me1;
  • 23. The method of claim 22, wherein the at least one cancer-associated promoter is an alternative promoter that is associated with a canonical promoter, wherein the canonical promoter is present in both the cancerous biological sample and the reference nucleic acid sequence, and wherein the alternative promoter is only present in the cancerous biological sample or wherein the alternative promoter is only absent in the cancerous biological sample.
  • 24. The method of claim 23, wherein the presence or absence of the at least one alternative promoter in the cancerous sample is indicative of a poor prognosis of cancer survival in the subject.
  • 25. The method of claim 23, further comprising: measuring the expression level of the at least one alternative promoter in the cancerous biological sample and the reference nucleic acid sequence, wherein the measuring comprises digital profiling of reporter probes; and determining the differential expression level of the at least one alternative promoter relative to the non-cancerous biological sample, based on the digital profiling of the reporter probes, to validate the presence or absence of at least one alternative promoter in the cancerous biological sample relative to the reference nucleic acid sequence.
  • 26. The method of claim 25, wherein said step of measuring is conducted using a NanoString™ platform.
  • 27. A biomarker for detecting cancer in a subject, the biomarker comprising at least one promoter having a change in signal intensity of H3K4me3 in a cancerous biological sample relative to a non-cancerous biological sample.
  • 28. The biomarker of claim 27, wherein the at least one promoter comprises an increase of EZH2 binding sites relative to the total promoter population.
  • 29. The biomarker of claim 27, wherein the at least one promoter is hypomethylated.
  • 30. The biomarker of claim 27, wherein the at least one promoter is hypermethylated.
  • 31. The biomarker of claim 27, wherein the at least one promoter is a canonical promoter that is positioned less than 500 bp away from a gene transcript start site.
  • 32. The biomarker of claim 31, wherein the gene transcript start site is associated with one or more of a cell-type specification gene, a cell adhesion gene, a cell mediated immunity gene, a gastric cancer-associated or deregulated gene, a PRC2 target gene or a transcription factor.
  • 33. The biomarker of claim 31, wherein the gene transcript start site is associated with an oncogene.
  • 34. The biomarker of claim 31, wherein the gene transcript start site is associated with a gene selected from the group consisting of MYC, MET, CEACAM6, CLDN7, CLDN3, HOTAIR, PVT1, HNF4α, RASA3, GRIN2D, EpCAM and a combination thereof.
  • 35. The biomarker of claim 27, wherein the at least one promoter is an alternative promoter that is associated with a canonical promoter, wherein the canonical promoter is present in both a cancerous sample and a non-cancerous sample, and wherein the alternative promoter is only present in a cancerous sample, or wherein the alternative promoter is only absent in a cancerous sample.
  • 36. The biomarker of claim 27, wherein the at least one promoter is an unannotated promoter that is positioned more than 500 bp away from a gene transcript start site.
  • 37. A method for modulating the activity of at least one cancer-associated promoter in a cell, comprising administering an inhibitor of EZH2 to the cell.
  • 38. A method for modulating the immune response of a subject to cancer, comprising administering to the subject an inhibitor of EZH2, wherein the EZH2 is associated with at least one cancer-associated promoter in the subject.
  • 39. The method of claim 38, wherein the inhibitor of EZH2 modulates the expression of immunogenic N-terminal peptides.
  • 40. The method of claim 38 or 39, wherein the at least one cancer-associated promoter is an alternative promoter that is associated with a canonical promoter, wherein the canonical promoter is present in both a cancerous sample and a non-cancerous sample, and wherein the alternative promoter is only present in a cancerous sample, or wherein the alternative promoter is only absent in a cancerous sample.
  • 41. The method of claim 40, wherein the alternative promoter is associated with a transcript variant, and wherein the transcript variant encodes a N-terminal protein variant.
  • 42. The method of claim 41, wherein the N-terminal protein variant is an N-terminal truncated protein or an N-terminal elongated protein.
  • 43. The method of any one of claims 38 to 42, wherein the inhibitor of EZH2 is a siRNA or a small molecule.
  • 44. The method of any one of claims 38 to 43, wherein the inhibitor of EZH2 is GSK126.
  • 45. A method for determining the presence or absence of at least one cancer-associated promoter in a cancerous biological sample relative to a non-cancerous biological sample, comprising: contacting the cancerous biological sample with at least one antibody specific for histone modifications H3K4me3 and H3K4me1; isolating nucleic acid from the cancerous biological sample having a signal ratio of H3K4me3 relative to H3K4me1 greater than 1, wherein the isolated nucleic acid comprises at least one region specific to said histone modifications; detecting a signal intensity of H3K4me3 in the isolated nucleic acid at a read depth of 20M; and determining the presence or absence of at least one cancer-associated promoter in the cancerous biological sample based on the change in the signal intensity of H3K4me3 relative to the signal intensity of H3K4me3 in a non-cancerous biological sample.
  • 46. An inhibitor of EZH2 for use in modulating the activity of at least one cancer-associated promoter in a cell.
  • 47. Use of an inhibitor of EZH2 in the manufacture of a medicament for modulating the activity of at least one cancer-associated promoter in a cell.
  • 48. An inhibitor of EZH2 for use in modulating the immune response of a subject to cancer, wherein the EZH2 is associated with at least one cancer-associated promoter in the subject.
  • 49. Use of an inhibitor of EZH2 in the manufacture of a medicament for modulating the immune response of a subject to cancer, wherein the EZH2 is associated with at least one cancer-associated promoter in the subject.
Priority Claims (1)
  • Number: 10201601142V; Date: Feb 2016; Country: SG; Kind: national
PCT Information
  • Filing Document: PCT/SG2017/050072; Filing Date: 2/16/2017; Country: WO; Kind: 00