Described herein are methods for identification of peptides that bind MHC-I molecules from within a starting pool of candidate epitope peptides, using a cell-based genetic immunopeptidomic screen, and for generating cells that display only one or a selected set of peptide:MHC complexes on the cell surface.
The immune system samples the internal protein environment of all cells via the human leukocyte antigen (HLA) Class I (HLA-I) presentation system (HLA is the major histocompatibility complex (MHC) in humans). Non-self or altered peptides displayed by HLA-I can elicit an immune response against those peptides through the activation of CD8+ cytotoxic T cells. In some cases, nonmutant self-peptides displayed by HLA-I can elicit a response leading to autoimmunity. In a normal cell, proteins are digested by the proteasome in the cytosol into peptides of varying length. Those peptides, typically ranging from about 7 to 20 amino acids, are imported into the endoplasmic reticulum (ER) by a complex of two proteins, Transporter 1, ATP Binding Cassette Subfamily B Member 1 (TAP1) and TAP2. In the ER, two N-terminal peptidases, endoplasmic reticulum aminopeptidase 1 (ERAP1) and ERAP2, trim the peptides down, including to around 7-13 or 8-9 amino acids. Finally, the HLA-I proteins, HLA-A, -B and -C, sample peptides, generally in the range of 7-13 or 8-12 amino acids, and once bound sufficiently tightly, traffic to and present the peptides on the cell surface.
The presentation of intracellular peptides on the cell surface allows surveilling cytotoxic CD8+ T cells to identify pathogen-infected or malignant cells1. A better understanding of the rules governing peptide binding by MHC-I molecules would facilitate the development of more effective vaccines and other immune-based therapies, but this task is complicated by the diverse array of MHC-I molecules (HLA-A, -B, -C, -E and -G) expressed in human cells and their highly polymorphic nature across the human population3. Mass spectrometry (MS) is currently the leading method for identifying MHC-I ligands, with large-scale experiments capable of identifying roughly a thousand peptides eluted from any given HLA allele4. One key limitation, however, is that MS-based approaches must inevitably sample peptides derived from the entire cellular proteome, and cannot be readily adapted to permit the targeted evaluation of T cell epitopes generated from a particular pathogen or neo-antigens presented by a particular tumour.
Described herein are methods and compositions for rapid empirical determination of MHC-I binding for large pools of peptides, leveraging inexpensive DNA oligonucleotide synthesis to generate pre-defined libraries for targeted immunopeptidomics. The system can be used for querying individual peptides for MHC-I binding, and has a number of applications.
Provided herein are isolated cells, wherein the cell has been engineered or modified to lack expression of two, three, four, or more, preferably all, of: human leukocyte antigen A (HLA-A); HLA-B; HLA-C; Transporter 1, ATP Binding Cassette Subfamily B Member 1 (TAP1); TAP2; endoplasmic reticulum aminopeptidase 1 (ERAP1); ERAP2; and histocompatibility minor 13 (HM13), and wherein the cell expresses a single HLA allele.
In some embodiments, the cell lacks expression of TAP1; TAP2; ERAP1; ERAP2; and HM13; and lacks expression of at least two of HLA-A; HLA-B; and HLA-C.
In some embodiments, the cell lacks expression of TAP1; TAP2; ERAP1; ERAP2; HM13; HLA-A; HLA-B; HLA-C, and expresses an exogenous HLA-I allele.
In some embodiments, the cell is a mammalian cell, preferably a human cell. Non-mammalian cells can also be used, e.g., insect or avian cells; any cell type that can be engineered to place MHC:B2M:peptide complexes on the cell surface can be used.
In some embodiments, the cell further comprises (i) a nucleic acid comprising one or more sequences encoding candidate epitope peptides, e.g., 8-12mer, 9-mer, or longer, candidate epitope peptides, linked to a signal peptide that is preferably at least 16, 17, or 18 amino acids long and directs the peptide to the endoplasmic reticulum (ER), and a promoter that drives expression of the candidate epitope peptide linked to a signal peptide; or (ii) candidate epitope peptides, e.g., 8-12mer, 9-mer, or longer candidate epitope peptides linked to a signal peptide that is preferably at least 16, 17, or 18 amino acids long and directs the peptide to the ER. In some embodiments, the signal peptide comprises a MMTV gp70 signal peptide.
In some embodiments, the cell expresses the candidate epitope peptides linked to a signal peptide, and the candidate epitope peptides are trafficked to the ER.
Also provided herein are methods for identifying an MHC-I binding peptide. In some embodiments, the methods include providing a sample comprising the cells described herein that express a selected MHC-I allele; expressing in the cells a plurality of different candidate epitope peptides, such that each cell expresses a single selected candidate epitope peptide or plurality of candidate epitope peptides; isolating cells that have cell surface expression of the MHC-I allele; and identifying candidate epitope peptides in the cells that have cell surface expression of the MHC-I allele, thereby identifying peptides that bind to the MHC-I allele.
In some embodiments, expressing in the cells a plurality of different candidate epitope peptides comprises contacting the cells with a plurality of nucleic acids each comprising one or more sequences encoding 8-12mer, preferably 9-mer, candidate epitope peptides linked to a signal peptide that is at least 16, 17, or 18 amino acids long and directs the peptide to the endoplasmic reticulum (ER), and a promoter that drives expression of the candidate epitope peptide linked to the signal peptide, under conditions sufficient for the cells to express the peptides, preferably wherein the signal peptide comprises a MMTV gp70 signal peptide.
In some embodiments, the nucleic acids comprise expression vectors. In some embodiments, the expression vectors are viral expression vectors or plasmids. In some embodiments, the viral expression vectors are retroviral, preferably lentiviral, vectors.
In some embodiments, each cell expresses one to 100, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 20, 24, 30, 36 or more, e.g., up to 50 or 100, different candidate epitope peptides, but does not express any other peptides in the ER.
In some embodiments, the plurality of different candidate epitope peptides comprise random sequences.
In some embodiments, the plurality of different candidate epitope peptides comprise sequences derived from a pathogen, preferably a viral, bacterial, parasitic, or fungal pathogen, or from a cancer antigen.
In some embodiments, the plurality of different candidate epitope peptides comprise sequences from an autoantigen or potential autoantigen.
In some embodiments, the plurality of different candidate epitope peptides comprise an entire peptidome (peptides representing some or all of the genome of an organism).
In some embodiments, the methods include expressing at least 100; 1,000; 10,000; 100,000; 200,000; 250,000; 300,000; or more different candidate epitope peptides.
In some embodiments, isolating cells that have cell surface expression of an MHC allele comprises using fluorescence-activated cell sorting (FACS) or magnetic-activated cell sorting (MACS).
In some embodiments, identifying candidate epitope peptides comprises determining sequences encoding the peptides expressed in the cells that have cell surface expression of an MHC allele.
In some embodiments, the sequences encoding the peptides are determined by sequencing.
Additionally, provided herein are methods for isolating a cell for use in generating an immune response to an epitope in a subject. The methods can include providing a sample comprising the cells of claims 1 to 4 that express a selected MHC-I allele; expressing in the cells a plurality of different candidate epitope peptides linked to a signal peptide that is preferably at least 16, 17, or 18 amino acids long and directs the peptide to the endoplasmic reticulum (ER), such that each cell expresses a single selected candidate epitope peptide or plurality of candidate epitope peptides; and isolating cells that have cell surface expression of the MHC-I allele. In some embodiments, the plurality of different candidate epitope peptides comprise sequences derived from a pathogen, preferably a viral, bacterial, parasitic, or fungal pathogen, or from a cancer antigen.
Also provided herein are methods for stimulating T cells, or providing populations of stimulated/activated T cells. The methods can include providing a sample comprising the cells of claims 1 to 4 that express a selected MHC-I allele; expressing in the cells one or more specific epitope peptides linked to a signal peptide that is preferably at least 16, 17, or 18 amino acids long and directs the peptide to the endoplasmic reticulum (ER), such that each cell expresses a single specific epitope peptide or plurality of specific epitope peptides; incubating the cells in the presence of T cells in culture under conditions that allow activation of the T cells; and isolating activated T cells from the culture. These methods can be used to stimulate T cells in vitro to evolve T cells with selected specificities. In some embodiments, the specific epitope peptides comprise sequences derived from a pathogen, preferably a viral, bacterial, parasitic, or fungal pathogen, or from a cancer antigen.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Methods and materials are described herein for use in the present invention; other, suitable methods and materials known in the art can also be used. The materials, methods, and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, sequences, database entries, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control.
Other features and advantages of the invention will be apparent from the following detailed description and figures, and from the claims.
Described herein are cell-based genetic methods, one example of which is referred to herein as ‘EpiScan,’ that allow for rapid empirical determination of MHC-I binding for large pools of peptides, leveraging inexpensive DNA oligonucleotide synthesis to generate pre-defined libraries for targeted immunopeptidomics. The system can be used for querying individual peptides for MHC-I binding.
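By way of illustration only, the step of turning a pre-defined list of candidate peptides into sequences for oligonucleotide synthesis can be sketched as follows. The codon table (one commonly used human codon per amino acid), the trailing stop codon, and the function names `reverse_translate` and `build_library` are assumptions made for this sketch and are not taken from the vector or library design described herein; flanking cloning sites are omitted.

```python
# Hypothetical codon choice: one commonly used human codon per amino acid.
# This table is an illustrative assumption, not the codon usage of the source.
PREFERRED_CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "CGG",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def reverse_translate(peptide: str, stop_codon: str = "TGA") -> str:
    """Return a DNA sequence encoding `peptide`, followed by a stop codon."""
    peptide = peptide.upper()
    unknown = sorted(set(peptide) - set(PREFERRED_CODON))
    if unknown:
        raise ValueError(f"unsupported residues: {unknown}")
    return "".join(PREFERRED_CODON[aa] for aa in peptide) + stop_codon


def build_library(peptides):
    """Map each unique peptide (input order kept) to its coding insert."""
    return {p: reverse_translate(p) for p in dict.fromkeys(peptides)}


if __name__ == "__main__":
    # "ACDEFGHIK" is an arbitrary placeholder 9-mer, not a known epitope.
    for peptide, dna in build_library(["ACDEFGHIK"]).items():
        print(peptide, dna)
```

In such a sketch each peptide maps to exactly one insert, so the peptide identity can later be recovered by sequencing the insert.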
The present methods rely on the fact that HLA-I proteins are only stable on the cell surface when bound to a peptide. Thus, if a cell expressing only one HLA-I gene and one candidate peptide has HLA-I on its surface, as identified by flow cytometry, then that HLA-I protein must have bound to that peptide. However, a typical mammalian cell expresses several HLA-I genes/alleles and each HLA-I allele is exposed to tens of thousands of potential peptides. Thus, provided herein are cells engineered to remove expression of one, two, three, four, or more, e.g., all, relevant immune presentation related genes (e.g., HLA-A, -B and -C; TAP1 and -2; ERAP1 and -2, and signal peptide peptidase HM13). In some embodiments, one or more or all of HLA-E, -F and -G are also deleted. TAP1/2 deletion prevents cytosolic peptides from being transported into the ER. ERAP1/2 deletion prevents ER-resident peptides, such as signal peptides, from being further processed to a length more suitable for HLA-I binding. HM13 deletion prevents membrane-resident signal peptides from being cleaved and released into the ER. The cells are also engineered to express only one HLA-I gene/allele, e.g., to retain a single endogenous HLA-I allele (e.g., one of HLA-A, -B, -C, -E, -F, or -G), or a single HLA-I allele can be introduced, e.g., via viral, preferably lentiviral, transduction. Cells lacking one or more of these genes facilitate the detection of HLA driven to the surface by peptides engineered to go directly to the ER for loading onto HLA. A number of methods are known in the art for knocking out genes, including the use of CRISPR-Cas or other RNA-guided nucleases, TALEs, or zinc fingers, to introduce mutations that abrogate expression of a target gene, e.g., by introduction of a mutation that inserts a stop codon resulting in expression of a non-functional fragment of the target gene, or by homologous recombination to delete all or a part of the target gene.
Alternatively, other methods can be used to reduce or eliminate expression of the genes. For example, for TAP knockout, viral TAP inhibition can be used instead of CRISPR-KO; for example a viral TAP inhibitor, UL49.5 (refs. 38-40), can be used. In addition, viral gene induced degradation of HLA can be used. These methods include introduction of a viral gene such as human cytomegalovirus (HCMV) US2 or US11 (see Van den Boomen and Lehner, Mol. Immunol. 68, 106-111 (2015)), which use mammalian ER-associated degradation (ERAD) to induce rapid degradation of major histocompatibility class I (MHC-I) molecules, thereby degrading endogenous HLA alleles. Next, an HLA allele of choice is introduced that lacks the lysine residue(s) on which US2 or US11 induce ubiquitination and subsequent degradation. Thus, the introduced allele is the only one that is not degraded. In some embodiments, expression of TAP genes is reduced using viral TAP inhibitor UL49.5, and expression of HLA is reduced using HCMV US2 or US11, thereby obviating the need for genomic engineering methods such as CRISPR to create a cell line.
In addition, a number of methods are known in the art for introducing a sequence into a cell, e.g., by use of a vector containing nucleic acid, e.g., a cDNA. The vectors can be viral vectors, including recombinant retroviruses (e.g., lentivirus), adenovirus, adeno-associated virus, and herpes simplex virus-1, or recombinant bacterial or eukaryotic plasmids. In some embodiments, transposons such as Sleeping Beauty or piggyBac are used, or plasmids that integrate site-specifically by Cre- or FLP-mediated integration or by homologous recombination into a particular locus. All of these could allow a screen to be performed at high complexity. Viral vectors transfect cells directly; plasmid DNA can be delivered naked or with the help of, for example, cationic liposomes (lipofectamine) or derivatized (e.g., antibody conjugated), polylysine conjugates, gramicidin, artificial viral envelopes or other such intracellular carriers, as well as direct injection of the gene construct or CaPO4 precipitation carried out in vivo. See, e.g., Hall et al., Curr Protoc Cell Biol. 2009 September; CHAPTER: Unit19.1217; Doyle et al., Transgenic Res. 2012 April; 21(2): 327-349; Jin et al., PLoS One. 2020; 15(2): e0228910. The methods can include performing sequencing assays to confirm the presence of the intended mutation; RNA assays to confirm a lack of functional transcript; or protein detection methods to confirm a lack of protein.
Exemplary human genomic sequences encoding the target proteins that can be knocked out are provided in the following table.
These cells thus engineered lack short peptides in the ER, and presentation on MHC-I is impaired or lost, in the absence of expression of an exogenous sequence linked to a signal peptide that directs a peptide or other sequence to the ER, as described below.
Exemplary sequences for human HLA-I proteins and cDNAs encoding the proteins are provided in the following table.
Although the sequences provided above are human, other species can also be used; so long as a beta-2-microglobulin domain that binds the MHC of interest is also introduced, any species' MHC can be studied. For example, a humanized version of the murine H2-Kb can be used, wherein the beta-2-microglobulin (β2M) interacting domain is replaced with the human equivalent. The sequence is as follows (dotted underline and bold represent the “humanized sequence” taken from HLA-A*02:01; the rest is from mouse H2-Kb):
Further, although human cells are exemplified herein, other mammalian species' cells can also be used, e.g., non-human primates, cats, dogs, horses, cows, goats, sheep, stoats, and so on.
In some embodiments, the cells are also engineered to express selected candidate epitope peptides, e.g., one or more selected candidate epitope peptides, in the ER where HLA-I samples potential peptides for binding. When the peptide of interest is fused to a signal peptide, the signal peptide is cleaved as the fusion is translated into the ER, releasing the peptide without the need for any further processing. Preferred signal peptides include codon-optimized MMTV gp70 signal peptide (MPNHQSGSPTGSSDLLLDGKKQRAHLALRRKRRREMRKINRKVRRMNLAPIKEKTAWQHLQALIFEAEEVLKTSQTPQTSLTLFLALLAVLAPPPVSG (SEQ ID NO:172)). Additionally, in preferred embodiments the signal peptide used is longer than 16 amino acids, thus preventing its binding to HLA-I. The sequence encoding the peptide-signal peptide can be introduced into the cell, e.g., via viral, preferably lentiviral, transduction. In some embodiments, the peptide is ultimately exported from the ER; see, e.g., Byun, et al., J. Virol. 86, 214-25 (2012). Alternatively, synthesized peptides can be used with the EpiScan cells to determine MHC-I binding. The peptides can include the signal peptides. Synthetic peptides, e.g., produced using solid phase peptide synthesis (SPPS), can be added to the media; see, e.g., the “T2 assay,” Stuber et al., Eur J Immunol. 1992; 22(10):2697-2703.
MHC class I molecules are expressed in all nucleated cells and in platelets. The parental or host cells used for these methods can include any mammalian cells, preferably human cells, that can be maintained in culture. Examples of cells that can be used for the present methods and compositions include cells from cell lines, e.g., HEK-293T cells. In some embodiments, the cells are of tumor origin, or are not of tumor origin. Examples of commercially available human cell lines from non-tumor sources include CCD-1064Sk (ATCC® CRL-2076); HCC1599 BL (ATCC® CRL-2332); BJ (ATCC® CRL-2522); HCC1395 BL (ATCC® CRL-2325); HCC2157 BL (ATCC® CRL-2341) (+); COLO 829BL (ATCC® CRL-1980); HGF-1 (ATCC® CRL-2014); HCC1143 BL (ATCC® CRL-2362); Hs27 (ATCC® CRL-1634); FHC (ATCC® CRL-1831); HCC1007 BL (ATCC® CRL-2319); MRC-5 (ATCC® CCL-171); HUV-EC-C [HUVEC] (ATCC® CRL-1730); CCD-8Lu (ATCC® CCL-201); HEL 299 (ATCC® CCL-137); MCF-12F (ATCC® CRL-10783); CCD-33Lu (ATCC® CRL-1490); CCD-112CoN (ATCC® CRL-1541); Malme-3 (ATCC® HTB-102) (+); RWPE-2 (ATCC® CRL-11610); NCI-BL2126 [BL2126] (ATCC® CCL-256.1); HCC1937 BL (ATCC® CRL-2337); CCD-19Lu (ATCC® CCL-210); THLE-3 (ATCC® CRL-11233); 184B5 (ATCC® CRL-8799); CCD-986Sk (ATCC® CRL-1947) (+); HFL1 (ATCC® CCL-153); IMR-90 (ATCC® CCL-186); WPMY-1 (ATCC® CRL-2854); CCD-18Co (ATCC® CRL-1459) (+); RWPE-1 (ATCC® CRL-11609) (+); OAT1 HEK 293T/17 (ATCC® CRL-11268G-1); Detroit 548 (ATCC® CCL-116); MRC-9 (ATCC® CCL-212); NCI-BL1184 [BL1184](ATCC® CRL-5949); CCD 841 CoN (ATCC® CRL-1790); HS-5 (ATCC® CRL-11882); LL 24 (ATCC® CCL-151); HCC38 BL (ATCC® CRL-2346); NCI-BL1437 [BL1437] (ATCC® CRL-5958); Hs 895.Sk (ATCC® CRL-7636); WI-38 (ATCC® CCL-75); ARPE-19 (ATCC® CRL-2302); Detroit 551 (ATCC® CCL-110); Hs 578Bst (ATCC® HTB-125); FHs 74 Int (ATCC® CCL-241); NCI-BL1770 [BL1770] (ATCC® CRL-5960); WS1 (ATCC® CRL-1502) (+); CCD-1070Sk (ATCC® CRL-2091); CCD-16Lu (ATCC® CCL-204); NCI-BL2009 [BL2009] (ATCC® CRL-5961); HCC1954 BL (ATCC® CRL-2339); CCD-1079Sk (ATCC® 
CRL-2097); CCD-33Co (ATCC® CRL-1539); HCC2218 BL (ATCC® CRL-2363); NCI-BL1395 [BL1395] (ATCC® CRL-5957); Het-1A (ATCC® CRL-2692); TE 353.Sk (ATCC® CRL-7761); WPE1-NB26 (ATCC® CRL-2852); NCI-BL2052 [BL2052] (ATCC® CRL-5963); CCD-1059Sk (ATCC® CRL-2072); NCI-BL209 [BL209] (ATCC® CRL-5948); Hs 605.Sk (ATCC® CRL-7364); CCD-1090Sk (ATCC® CRL-2106); WPE1-NA22 (ATCC® CRL-2849); Hs 925.Sk (ATCC® CRL-7676); HBE4-E6/E7 [NBE4-E6/E7] (ATCC® CRL-2078); NCI-BL2195 [BL2195] (ATCC® CRL-5956); NCI-BL2087 [BL2087] (ATCC® CRL-5965); NCI-BL128 [BL128] (ATCC® CRL-5947); Hs 742.Sk (ATCC® CRL-7481); NCI-BL1672 [BL1672] (ATCC® CRL-5959); CCD-27Sk (ATCC® CRL-1475); Hs 789.Sk (ATCC® CRL-7518); WPE1-NB14 (ATCC® CRL-2850); and WPE1-NB11 (ATCC® CRL-2851). Other cell lines that can be used include lymphoid derived cells, e.g., K-562 (ATCC® CCL-243) or SKW 6.4 (ATCC® TIB-215). In some embodiments, the cell is a B cell or B-lymphoid cell, or is derived from an immortalized B cell (see, e.g., Nilsson et al., Hum Cell. 1992 March;5(1):25-41). In some embodiments, the cell is a K-562 cell that expresses GM-CSF (e.g., Smith et al., Clin Cancer Res. 2010 Jan. 1; 16(1): 338-347). In some embodiments, the cells are T2 or RMA-S (mouse), which have no TAP1/2, or B721.221, which is MHC-I deficient.
Assays
The cells described herein can be used to identify MHC-I binding epitopes. Generally speaking, in these assays a pool of oligonucleotides encoding potential MHC-I binding peptides, e.g., 8-12mer peptides, e.g., 9-mer peptides, is expressed in the cells, such that each cell expresses only one peptide (fused to a signal peptide as described above) designed to directly load onto MHC after minimal processing upon ER entry. In some embodiments, wherein ER proteases such as ERAP1/2 have been ablated, the peptide is only processed by the signal peptide peptidase to release it from the signal peptide. Alternatively, endogenous proteases can still be active and the peptide is further processed in the ER prior to binding to MHC. Additionally, exogenous proteases may be introduced that can process the peptide. One could also modify genes in the peptide loading complex, such as TAPBP, CALR or PDIA3. In some embodiments, the oligonucleotides are random. In some embodiments, at least 1; 5; 10; 100; 1,000; 10,000; 100,000; 200,000; 250,000; 300,000; or more different peptides are sampled. The cells can be assayed as a pool in a unified sample, wherein the sample includes a plurality of different clones, each clone expressing different peptides. In some embodiments, the methods are used to identify MHC-I binding epitopes, and the pool of oligonucleotides encodes every possible 8-12mer peptide from a selected protein. Alternatively, the oligonucleotides can represent a curated selection of 8-12mers, e.g., from candidate portions of the selected protein that are better candidates for MHC-I binding.
Such candidate portions can be identified using methods known in the art, e.g., bioinformatics methods such as NetMHC 4.0, NetMHC 3.4, NetMHCpan 4.0, NetMHCpan 3.0, NetMHCpan 2.8, NetMHCcons 1.1, PickPocket 1.1, IEDB recommended, IEDB consensus, IEDB SMMPMBEC, IEDB SMM, MHCflurry 1.1, and SYFPEITHI; see Bonsack et al., Cancer Immunol Res May 1 2019 (7) (5) 719-736.
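The exhaustive option, in which the pool covers every possible 8-12mer of a selected protein, can be sketched as a simple sliding-window enumeration. The function name `tile_peptides` and the placeholder input sequence are assumptions for this illustration; a curated library would instead filter this list, e.g., with one of the prediction tools named above.

```python
def tile_peptides(protein: str, min_len: int = 8, max_len: int = 12) -> list[str]:
    """Enumerate every unique peptide of length min_len..max_len in `protein`.

    Peptides are returned in order of first occurrence, shortest length first.
    """
    protein = protein.upper()
    seen: dict[str, None] = {}  # dict preserves insertion order and removes duplicates
    for k in range(min_len, max_len + 1):
        # range() is empty when the protein is shorter than k
        for start in range(len(protein) - k + 1):
            seen.setdefault(protein[start:start + k], None)
    return list(seen)


if __name__ == "__main__":
    # Arbitrary placeholder sequence of 12 residues, not a real protein.
    peptides = tile_peptides("ACDEFGHIKLMN")
    print(len(peptides), peptides[:3])
```

For a protein of length N with no repeated windows, this yields (N - k + 1) peptides for each length k, so library size grows roughly linearly with protein length.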
Sequences encoding the peptides are cloned into an expression vector comprising a promoter for expression of the peptides, e.g., the exemplary EpiScan lentiviral vector described herein, and expressed in cells expressing a single HLA allele with modifications to HLA-I presentation machinery described herein. Cells expressing exogenous peptides that bind MHC-I exhibit elevated cell surface MHC-I levels, and can be isolated, e.g., using fluorescence-activated cell sorting (FACS) or magnetic-activated cell sorting (MACS). Then, the identity of the peptides can be determined, e.g., by sequencing, e.g., next-generation sequencing, using primers that bind to the vector sequence on either side of the sequence encoding the peptide.
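As a non-limiting sketch of the read-out step, peptide identities can be recovered from sequencing reads by locating the vector sequence on either side of the insert and translating what lies between. The flank sequences below are hypothetical placeholders (the actual values depend on the vector and primers used), and the function names `extract_peptide` and `count_peptides` are assumptions for this illustration; the sketch only tallies peptides and does not perform any statistical analysis of sorted versus unsorted populations.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical placeholder flanks; real values depend on the vector used.
LEFT_FLANK = "ACGTACGT"
RIGHT_FLANK = "TTGGCCAA"


def extract_peptide(read, left=LEFT_FLANK, right=RIGHT_FLANK):
    """Translate the insert between the two flanks; return None if unusable."""
    i = read.find(left)
    if i < 0:
        return None
    start = i + len(left)
    end = read.find(right, start)
    if end < 0:
        return None
    insert = read[start:end]
    if len(insert) % 3 != 0:
        return None  # out-of-frame insert
    residues = []
    for pos in range(0, len(insert), 3):
        aa = CODON_TABLE.get(insert[pos:pos + 3])
        if aa is None:
            return None  # ambiguous base such as N
        if aa == "*":
            break  # stop codon ends the peptide
        residues.append(aa)
    return "".join(residues) or None


def count_peptides(reads):
    """Tally peptides recovered from an iterable of read strings."""
    return Counter(p for p in map(extract_peptide, reads) if p is not None)
```

Applied separately to reads from sorted (MHC-I high) cells and from the unsorted input pool, such counts provide the raw material for deciding which peptides are associated with surface MHC-I.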
In some embodiments, variants of (i.e., at least 60, 70, 80, 85, 90, 95, 97, 99% identical to) the proteins and nucleic acids described herein can be used. To determine the percent identity of two amino acid sequences, or of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). In a preferred embodiment, the length of a reference sequence aligned for comparison purposes is at least 80% of the length of the reference sequence, and in some embodiments is at least 90% or 100%. The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position (as used herein amino acid or nucleic acid “identity” is equivalent to amino acid or nucleic acid “homology”). The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
The comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. For example, the percent identity between two amino acid sequences can be determined using the Needleman and Wunsch ((1970) J. Mol. Biol. 48:444-453) algorithm which has been incorporated into the GAP program in the GCG software package (available on the world wide web at gcg.com), using the default parameters, e.g., a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
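For illustration, once two sequences have been aligned (e.g., by a Needleman-Wunsch implementation), percent identity can be computed from the aligned strings as sketched below. The sketch assumes the alignment is supplied as two equal-length strings with "-" marking gaps, and it takes the number of alignment columns (gaps included) as the denominator; other conventions exist, and this choice is an assumption of the sketch rather than the definition used by the GAP program. The function name `percent_identity` is likewise hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns for two pre-aligned sequences.

    Columns where both sequences have a gap are ignored; a column with a gap
    in only one sequence counts toward the denominator but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Arbitrary placeholder sequences: 7 identical positions in 9 columns.
    print(round(percent_identity("ACDE-GHIK", "ACDEFGHIR"), 1))
```

A variant would then be compared against a threshold such as the 60% to 99% values recited above.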
Applications
The present methods and compositions have many applications in both basic and translational research, as well as clinical practice.
As demonstrated with SARS-CoV-2, the present methods can be used for uncovering the entire MHC-I immunopeptidome for a single protein or pathogen, e.g., to identify MHC-I binding epitopes in one or more proteins from a pathogen, e.g., a bacterium, virus, parasite, or fungus. Once the epitopes have been identified, cells can be engineered to express one or more of the epitopes for use in a live cell vaccine, and administered to a subject to elicit an immune response to the pathogen from which the epitope was derived.
The present methods can also be used to generate cells that display only, or a majority of (e.g., at least 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 95%) a single peptide. In this way, dendritic cells can be generated that display only, or a large majority of, a single peptide, and these can be used to focus a vaccine on a subset of epitopes. Cells that display only, or predominantly, a single peptide can be used to isolate rare T cells specific to that peptide:MHC complex.
These methods can be used to find potential epitopes in any given protein. Once identified, the present methods can include using known molecular biology methods to ‘deimmunize’ the protein,30 e.g., by mutation of identified epitopes until the epitope is no longer presented. The mutated proteins (or mutated peptides therefrom) can then be subjected to further rounds of epitope scanning to confirm reduction or loss of MHC binding epitope. These methods can be used to develop nonimmunogenic gene therapies, e.g., for humans.
Classical vaccination methods utilize immunization with full-length proteins, but the immune response that follows typically focuses on only a subset of potential antigenic epitopes through the poorly understood process of T cell immunodominance31. Knowledge of the assortment of potential T cell epitopes given the MHC-I haplotype of any given individual could guide the development of personalized vaccines, which should provide a broader and potentially more durable response32. In particular, the present methods can be used for assessment of potential neo-antigen peptide:MHC-I complexes necessary for personalized cancer vaccines33. The methods can be used to test recurrent cancer mutations for HLA display to match neoantigens and HLAs. In addition, the methods can be used to profile patient specific cancer mutations for HLA display.
The methods can also be used to identify tissue- or pathology-specific peptides presented on MHC to later use as vaccine targets.
In addition, the methods can be used to screen for interventions such as viruses, proteins, genes, or small molecules that enhance binding of a particular peptide on a given HLA, block HLA binding or that change the specificity of an HLA. These methods include conducting the assays described herein in the presence and absence of the intervention.
The methods can also be used to elicit T cell responses in order to precisely identify the epitope of a specific T-cell receptor (TCR). Co-incubation of EpiScan cells that express a single peptide:MHC-I complex on the surface, or pools of EpiScan cells that express different single peptide:MHC-I complexes on the surface, with T cells will activate T cells with TCRs that recognize the presented peptide:MHC-I complex. Methods known in the art, such as, but not limited to, IL-2 ELISpot (Ranieri et al., Methods Mol Biol. 2014; 1186:75-86), T-Scan (Kula et al., Cell. 2019 Aug. 8; 178(4):1016-1028.e13), CD69 FACS (Simms and Ellis, Clin Diagn Lab Immunol. 1996 May; 3(3):301-4) can be used to detect and isolate activated T cells, and the epitope can then be identified as above. See, e.g., Example 10 and
In addition, EpiScan data can be used to generate predictions about MHC-I peptide binding preferences, for the development of computational models that can accurately predict MHC-I ligands starting from the primary sequence of a protein4,20,21. An effective prediction algorithm analogous to the MSi algorithm recently developed by Sarkizova and colleagues4 was developed. Machine learning models were trained to classify 9-mer peptide sequences as binders or non-binders for HLA-A2, HLA-A3, HLA-B8 and HLA-B57. In addition to not suffering from detection bias inherent to MS, these methods render predictions solely based on allele-specific affinity, and thus can identify MHC-I ligands that aren't subject to proteasome processing or TAP import. See, e.g., Example 3 and
The invention is further described in the following examples, which do not limit the scope of the invention described in the claims.
Materials and Methods
The following materials and methods were used in the Examples below.
Cell Culture
HEK-293T (CRL-3216), T2 (CRL-1992) and C1R (CRL-2369) cells were obtained from ATCC. T2 and C1R cells were cultured in IMDM (Gibco, 12440053) with 10% FBS (HyClone) and 1% penicillin-streptomycin (15140-122, Invitrogen); HEK-293T cells were cultured in DMEM (Gibco, 11995065) with 10% FBS (HyClone) and 1% penicillin-streptomycin (15140-122, Invitrogen). All cell lines were regularly tested for mycoplasma and all tested negative.
Generation of EpiScan Cells
HEK-293T cells were transfected with sgRNAs targeting TAP1 and TAP2; cells exhibiting diminished cell surface MHC-I were then single cell cloned by sorting into 96-well plates. An MHC-I-low clone was then transfected with two sgRNAs targeting all endogenous MHC-I alleles. Cells lacking any detectable cell surface MHC-I were then single cell cloned. Then, a TAP1/2 deficient, MHC-I null clone was transfected with sgRNAs targeting ERAP1 and ERAP2 and single cell clones again generated from the resulting population. Successful disruption of ERAP1 and ERAP2 was confirmed by immunoblot and TOPO cloning and Sanger sequencing, respectively. Finally, cells without MHC-I, TAP1/2 or ERAP1/2 were transfected with sgRNA targeting HM13. Knockout of HM13 was confirmed via TOPO cloning and Sanger sequencing.
All sgRNAs were cloned into either lentiCRISPR v2-FE or PX458 (Addgene #48138); sequences used were:
Alternatively, for TAP knockout, viral TAP inhibition was used instead of CRISPR-KO. 293T cells were infected with lentivirus encoding a viral TAP inhibitor38-40, UL49.5.
Generation of EpiScan Vector
A lentiviral pHAGE vector with a CMV promoter plus an EF1α promoter driving EGFP-P2A-PuroR was used as the backbone. The vector was digested with PstI and AgeI to excise the EF1α promoter, and the Gibson assembly method was used to insert a gBlock (IDT) encoding (1) a codon-optimized MMTV gp70 signal peptide (MPNHQSGSPTGSSDLLLDGKKQRAHLALRRKRRREMRKINRKVRRMNLAPIKEKTAWQHLQALIFEAEEVLKTSQTPQTSLTLFLALLAVLAPPPVSG (SEQ ID NO:15)), (2) a filler region flanked by BsmBI sites and (3) an IRES element. The resulting vector was then converted into a Gateway-like destination vector by inserting the CmR and ccdB cassettes into the SphI site located in the filler region.
Peptide Pulsing
Cells were washed with PBS three times to remove FBS, resuspended in IMDM with 1% penicillin-streptomycin (15140-122, Invitrogen) without FBS, and 100,000 cells seeded per well of a 96-well plate. Peptides were added 24 h before analysis by flow cytometry.
Flow Cytometry
Cells were stained for at least 30 min in PBS, washed in PBS and then analyzed with a BD LSR2. All antibodies were from BioLegend and used at 1:100:
FACS
For EpiScan screens, 30 μl of antibody (APC-conjugated anti-human HLA-A2 antibody, BioLegend, 343308 or APC-conjugated anti-human β2-microglobulin antibody, BioLegend, 316312) in a total volume of 1.5 ml was used per 10 million cells. Staining was conducted for 30 min at 4° C.; cells were then washed in PBS prior to sorting. Sorting was performed on a Sony MA900 instrument.
Immunoblotting
Cells were pelleted, washed in PBS, and then lysed in RIPA buffer. Lysates were mixed with Novex Tris-Glycine SDS Sample Buffer containing β-mercaptoethanol and resolved on a 4-20% Tris-Glycine SDS-PAGE gel. Antibodies used were anti-GAPDH (sc-47724, Santa Cruz, 1:200) and anti-ERAP1 (MABF851, Millipore, 1:1000).
Transfection and Single Cell Cloning
HEK-293T cells were transfected using PolyJet (SignaGen, SL100688) as recommended by the manufacturer. Single cell cloning was carried out after 7 d by FACS using a Sony MA900 instrument.
Lentiviral Transduction
293T cells were transfected with PolyJet (SignaGen, SL100688) according to manufacturer's directions using a 1:1 ratio of lentiviral plasmids to packaging vectors (encoding VSV-G, Tat, Rev and Gag-Pol). Viral supernatants were harvested at 48 h and 72 h post-transfection, passaged through a 0.45 μm filter, and applied to target cells for 48 h in the presence of 8 μg/ml polybrene. Transduced cells were selected with 2 μg/ml puromycin for at least four days.
EpiScan Library Generation
Random 9-mer library. An oligo of the following sequence was ordered from Integrated DNA Technologies: ccacctgtgagcgggNNBNNBNNBNNBNNBNNBNNBNNBNNBtaaGCacgttactgg (SEQ ID NO: 16), wherein B is Guanine/Thymine/Cytosine. It was amplified by PCR using the primers tggccgtattggccccgccacctgtgagcggg (SEQ ID NO: 17) and attccaagcggcttcggccagtaacgtGCtta (SEQ ID NO: 18), and then cloned into the EpiScan vector digested with BsmBI using the Gibson assembly method. The resulting plasmids were then electroporated into Electromax DH10B competent cells (ThermoFisher Scientific).
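The effect of using NNB rather than fully random NNN codons can be checked by enumeration. The following is a minimal sketch, not part of the original methods, that lists the codons permitted by NNB under the standard genetic code and reports which of them are stop codons; the function and variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def nnb_codons():
    """Return every codon matching NNB (N = any base, B = C, G or T)."""
    return ["".join(c) for c in product("ACGT", "ACGT", "CGT")]

codons = nnb_codons()
stops = [c for c in codons if CODON_TABLE[c] == "*"]
residues = {CODON_TABLE[c] for c in codons} - {"*"}

print(len(codons))    # number of distinct NNB codons
print(stops)          # stop codons that remain possible
print(len(residues))  # number of distinct amino acids still encoded
```

Excluding adenine from the third position removes TAA and TGA, so fewer random inserts are truncated by a premature stop while all 20 amino acids remain encodable.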
SARS-CoV-2 library. Protein sequences of SARS-CoV-2 available as of 2/06/20 were downloaded from the NCBI Severe acute respiratory syndrome coronavirus 2 data hub. This represented a total of 11 strains of SARS-CoV-2. All protein sequences were broken into 9-, 10- and 11-mer fragments and duplicates were removed. The remaining sequences were then reverse translated using a custom script written in MATLAB R2019b to avoid restriction sites for EcoRI/XhoI/BsmBI/BbsI and to ensure GC content between 30% and 70%. Sequences were amplified from a SurePrint Oligonucleotide Library (Agilent) and digested with BbsI to liberate sticky ended peptide-encoding fragments. The EpiScan vector was digested with BsmBI to generate compatible sticky ends and the fragments were cloned in via T4 ligation. The ligation products were then electroporated into Electromax DH10B competent cells (ThermoFisher Scientific).
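The tiling and sequence-filtering steps described above can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the custom MATLAB script used in the source: the function names are hypothetical, the reverse-translation step itself is not shown, and the filter simply rejects a candidate DNA sequence that contains a recognition site for one of the named enzymes on either strand or whose GC content falls outside 30-70%.

```python
# Recognition sites (both strands) for the enzymes named in the text.
FORBIDDEN_SITES = {
    "EcoRI": ("GAATTC",),
    "XhoI": ("CTCGAG",),
    "BsmBI": ("CGTCTC", "GAGACG"),
    "BbsI": ("GAAGAC", "GTCTTC"),
}

def tile_peptides(proteins, lengths=(9, 10, 11)):
    """Return the unique k-mer peptides across all proteins, in first-seen order."""
    seen = {}
    for protein in proteins:
        for k in lengths:
            for i in range(len(protein) - k + 1):
                seen.setdefault(protein[i:i + k], None)
    return list(seen)

def gc_fraction(dna):
    """Fraction of G and C bases in a DNA string."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)

def passes_filters(dna, gc_min=0.30, gc_max=0.70):
    """True if the sequence has no forbidden site and an acceptable GC content."""
    dna = dna.upper()
    for sites in FORBIDDEN_SITES.values():
        if any(site in dna for site in sites):
            return False
    return gc_min <= gc_fraction(dna) <= gc_max

# Two identical strain sequences contribute each fragment only once.
fragments = tile_peptides(["MFVFLVLLPLV", "MFVFLVLLPLV"])
print(len(fragments))
```

In a reverse-translation loop, a candidate encoding that fails `passes_filters` would be replaced by one using synonymous codons until the check succeeds.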
NGS Library Preparation
Genomic DNA was isolated via phenol/chloroform extraction. EpiScan vector sequences were amplified (F: tccctacacgacgctcttccgatctTACAGCTcgccacctgtgagcggg (SEQ ID NO: 19) and R: ggcttcggccagtaacgtgc (SEQ ID NO:20); the bold uppercase sequence represents a 0-7 nt variable stagger region) in a 125 μl reaction with 5 μg gDNA. PCR reactions for each sample were pooled, purified using the Macherey-Nagel PCR clean-up kit (Takara, 740609), and 400 ng used for a second round of PCR to add Illumina P5 and P7 sequences and indices for multiplexing (F: aatgatacggcgaccaccgagatctacactcttTCCCTACACGACGCTCTTCCG (SEQ ID NO:21) and R: caagcagaagacggcatacgagat[xxxxxxx]GTGACTGGAGTTCAGACGTGT (SEQ ID NO:22); where [xxxxxx] represents the sample index). Finally, samples were pooled, gel purified and then sequenced using an Illumina NextSeq or NovaSeq instrument.
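The source does not describe how sequencing reads were converted back into peptide identities. One plausible approach, sketched here purely as an assumption, is to locate the constant vector sequence that precedes the insert (the 5' end of SEQ ID NO: 16) and translate in frame until the stop codon; searching for the flank rather than using a fixed offset accommodates the variable stagger region. The function name and the forward-orientation assumption are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Constant vector sequence immediately upstream of the peptide-coding insert.
UPSTREAM_FLANK = "CCACCTGTGAGCGGG"

def peptide_from_read(read):
    """Translate the insert following the upstream flank, up to the first stop codon.

    Returns None if the flank is absent, a codon is ambiguous, or the read
    ends before a stop codon is reached.
    """
    read = read.upper()
    start = read.find(UPSTREAM_FLANK)
    if start < 0:
        return None
    start += len(UPSTREAM_FLANK)
    peptide = []
    for i in range(start, len(read) - 2, 3):
        aa = CODON_TABLE.get(read[i:i + 3])
        if aa is None:  # e.g. a codon containing N
            return None
        if aa == "*":
            return "".join(peptide)
        peptide.append(aa)
    return None

# Hypothetical read: stagger, flank, nine codons, stop codon, downstream vector.
example_read = ("TACAGCTCG" + "CCACCTGTGAGCGGG"
                + "AACCTGGTGCCGATGGTGGCGACCGTG" + "TAAGCACGTTACTGG")
print(peptide_from_read(example_read))
```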
Expression Vectors
All cDNAs were cloned into expression vectors via Gateway Cloning (ThermoFisher). ERAP1 (IOH80668) was obtained from the Harvard ORFeome v8 collection. ERAP2 and MHC-I alleles were codon optimized and synthesized as gBlocks with flanking attB sites by Integrated DNA Technologies. Destination vectors all used the EF1α promoter to drive cDNA expression and contained a selectable marker (BFP, mAmetrine, tdTomato or HygroR) driven by the PGK promoter.
Computational Prediction of MHC-I Ligands
The Keras Python library was used to train machine learning models to predict the likelihood of any given 9-mer binding MHC-I. A neural network architecture analogous to that developed by Sarkizova and colleagues4 was employed, with only minor modifications. Four different models were trained, each with a different encoding of the peptide sequence: (1) sparse matrix encoding, (2) similarity encoding using the BLOSUM62 matrix, (3) similarity encoding based on the PMBEC matrix34, and (4) an encoding in which each amino acid was represented by the first three principal components derived from dimensionality reduction based on physicochemical properties35. For each model a single hidden layer of 100 neurons with sigmoid activation was used; the outputs of these models were combined in a single output layer to generate the final binding prediction.
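As an illustration of the first of the four encodings, the following dependency-free sketch converts a 9-mer into a flat sparse (one-hot) vector of 9 × 20 = 180 features. The residue ordering and the flat layout are assumptions made for the example; the source does not specify them, and the other three encodings and the network itself are not shown.

```python
# 20 standard residues, alphabetical by one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode_sparse(peptide):
    """One-hot encode a peptide as a flat list of len(peptide) * 20 zeros and ones."""
    vector = [0] * (len(peptide) * len(AMINO_ACIDS))
    for position, aa in enumerate(peptide):
        vector[position * len(AMINO_ACIDS) + INDEX[aa]] = 1
    return vector

features = encode_sparse("NLVPMVATV")
print(len(features), sum(features))
```

A vector of this form could serve as the input to a hidden layer of 100 sigmoid units as described above.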
For each allele, the positive hits were the MHC-I ligands identified by EpiScan, while the set of negative decoys comprised all other peptides which were identified in the input 9-mer random library but which were not found in any of the EpiScan sorting bins. Training was performed as described4, except that a 10-fold excess of decoys was used. Predictive power was assessed as recommended4, whereby the ability of the model to predict true binders amongst the top 0.1% of the dataset was evaluated in the presence of a 999-fold excess of decoy peptides (PPV metric). The data depicted in
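The PPV metric described in this section can be sketched as follows: with n true binders mixed with a 999-fold excess of decoys, the top 0.1% of the ranked list contains n peptides, and the PPV is the fraction of those that are true binders. This is a minimal illustration with a hypothetical function name; placing decoys first among tied scores is a conservative assumption that is not stated in the source.

```python
def ppv_top_fraction(hit_scores, decoy_scores, fraction=0.001):
    """Fraction of true hits among the top-scoring `fraction` of all peptides."""
    labelled = [(s, 1) for s in hit_scores] + [(s, 0) for s in decoy_scores]
    # Sort by descending score; among ties, decoys come first (pessimistic).
    labelled.sort(key=lambda pair: (-pair[0], pair[1]))
    n_top = max(1, round(len(labelled) * fraction))
    return sum(label for _, label in labelled[:n_top]) / n_top

# Toy data: 2 hits and 1998 decoys, so the top 0.1% holds 2 peptides.
hits = [0.9, 0.8]
decoys = [0.85] + [0.1] * 1997
print(ppv_top_fraction(hits, decoys))
```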
Conservation Scoring
SARS-CoV-2 protein sequences were obtained from UniProt and entered into the ConSurf Server26,27,36. For S, 3a and 7a, RCSB PDB structures (6VXX, 6XDC and 6W37, respectively) were used. HMMER was used as the homolog search algorithm with UniProt as the protein database. Automatic homologue selection was used, requiring 35-95% sequence identity. The alignment method was MAFFT-L-INS-i, with the Bayesian calculation method and the default evolutionary substitution model. ORF10 was excluded due to the lack of a sufficient number of homologues to perform conservation scoring. To locate epitopes in conserved regions, the conservation score was averaged over the length of the epitope.
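The final averaging step can be sketched as follows, assuming the per-residue conservation scores for a protein are available as a list indexed from zero; the function name and the toy scores are illustrative only.

```python
def epitope_conservation(scores, start, length):
    """Mean per-residue conservation score over residues start .. start + length - 1."""
    window = scores[start:start + length]
    if len(window) != length:
        raise ValueError("epitope extends beyond the scored sequence")
    return sum(window) / length

# Toy per-residue scores for an 11-residue protein.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9]
print(epitope_conservation(toy_scores, start=0, length=9))
```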
T Cell Isolation and Expansion
Peripheral blood from PCR-confirmed COVID-19 cases was provided by collaborators from the Ragon Institute of MGH. All study participants provided verbal and/or written informed consent. Participation in these studies was voluntary and the study protocols have been approved by the Partners Institutional Review Board. Memory CD8+ T cells were isolated using the Miltenyi CD8+ Memory T cell isolation kit according to manufacturer's instructions. T cells were expanded using irradiated peripheral blood mononuclear cells (PBMCs). Briefly, apheresis collars were obtained from the Brigham and Women's Hospital Specimen Bank under protocol T0276 and PBMCs were purified on a Ficoll gradient. The cells at the interface were extracted, washed twice, and irradiated (60 Gy IR). For expansion, isolated memory CD8+ patient T cells were added to 2 million irradiated PBMCs in a final volume of 20 ml RPMI, 10% FBS, 100 units/ml penicillin, 0.1 mg/ml streptomycin, 50 U/ml IL-2 (Sigma), and 0.1 μg/ml anti-CD3 antibody (OKT3, eBioscience).
Tetramer Staining of Patient Samples
The following peptides were synthesized by New England Peptide:
Peptides were loaded at 10 mg/ml onto the QuickSwitch Quant HLA-A*02:01 Tetramers (PE or APC labeled) (MBL International) and exchange was quantified according to manufacturer's instructions. Tetramers were used for staining at a final concentration of 10 μg/ml. Where specified, cells were additionally stained with a Brilliant Violet 421-conjugated anti-CD3 antibody (BioLegend) and an Alexa Fluor 647-conjugated anti-CD8 antibody (BioLegend).
Statistical Tests
Unless otherwise noted, significance for all dot plots was measured by one-way ANOVA with Dunnett's multiple-comparison test with *p<0.05 **p<0.01 ***p<0.001 or ****p<0.0001 for each group relative to the negative control conditions. This was performed using GraphPad Prism 8. Fisher's Exact Test was performed with fishertest using MATLAB R2019b.
Graph Generation
Unless otherwise noted, all dot plots or bar graphs were created using either GraphPad Prism 8 or the Python Seaborn library. Data are represented as mean±SEM of the fold change in mean fluorescence intensity (MFI) relative to the average of the negative controls for that experiment. Each dot represents a different biological replicate. Scatter plots were created using Spotfire 10 (TIBCO).
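The plotted quantity can be written out explicitly. The sketch below computes the fold change in MFI of each replicate relative to the mean of the negative controls and then the mean and SEM of those fold changes; the use of the sample standard deviation (n - 1 denominator) for the SEM is an assumption, as are the function name and the toy values.

```python
from math import sqrt
from statistics import mean, stdev

def fold_change_summary(sample_mfi, negative_control_mfi):
    """Return (mean, SEM) of the fold change in MFI relative to the mean negative control."""
    baseline = mean(negative_control_mfi)
    folds = [value / baseline for value in sample_mfi]
    # SEM = sample standard deviation / sqrt(n); undefined for a single replicate.
    sem = stdev(folds) / sqrt(len(folds)) if len(folds) > 1 else float("nan")
    return mean(folds), sem

# Toy values: three biological replicates and two negative controls.
avg, sem = fold_change_summary([200, 400, 600], [100, 100])
print(avg, sem)
```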
Logoplot Generation
Logoplots were generated with Seq2Logo37. Logoplots were of type Shannon (-I 1), with Hobohm clustering (-C 2) and no weight on prior (-b 0). To account for the difference in amino acid frequencies between the 9-mer randomer library and the human proteome, for plots describing EpiScan data a custom (--bg argument) position-specific scoring matrix (PSSM) was employed.
Allele Specificity Correlation
For each allele for each methodology, the frequency of every amino acid at each of nine positions was calculated to create a 9×20 matrix. The matrix was flattened into a 1D array and then pairwise Pearson calculations were computed using numpy.corrcoef.
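The profile construction and correlation step can be sketched as follows. The source used numpy.corrcoef for the correlation; this illustration uses a plain-Python Pearson calculation so that it has no dependencies, and the function names and toy peptide sets are hypothetical.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def frequency_profile(peptides, length=9):
    """Flattened length x 20 matrix of per-position amino acid frequencies."""
    profile = []
    for position in range(length):
        column = [p[position] for p in peptides]
        profile.extend(column.count(aa) / len(column) for aa in AMINO_ACIDS)
    return profile

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Toy ligand sets standing in for one allele measured by two methodologies.
profile_a = frequency_profile(["NLVPMVATV", "KLVPMVATV", "NLVPMVATL"])
profile_b = frequency_profile(["NLVPMVATV", "GLVPMVATV"])
print(pearson(profile_a, profile_b))
```

Repeating the calculation for every pair of allele and methodology yields the pairwise correlation matrix.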
MHC Class I IP Procedure:
1. Cell pellets were thawed on ice, then lysed at 50 million cells/mL of lysis buffer and incubated for 30 min on ice.
2. Insoluble material was pelleted at 800×g for 5 min.
3. Supernatant was centrifuged at 20,000×g for 30 min at 4° C.
4. Resin was washed and combined with clarified lysates
5. Resin was mixed with lysates (normalized by BCA to lowest protein yield) by gentle rotation at 4° C. overnight.
6. The next day, samples were centrifuged at 800×g for 5 min at 4° C.
7. Three washes (Buffers 1-3) of the resin were performed, which consisted of the following:
8. At wash #4, 0.75 mL of Buffer 4 was added, and the total volume was transferred to loBind tubes
9. 1 mL of Elution buffer was added to each tube and incubated at 37° C. for 5 min.
10. Samples were centrifuged at 800×g for 5 min at 4° C. to elute.
11. Eluates (supernatant) were collected into new loBind Eppendorf tubes and stored at −80° C. until transfer to MSB.
12. Eluates were submitted for LC-MS/MS analysis and PRE and POST samples were tested by ELISA.
Peptides were desalted and concentrated using a Waters HLB solid phase extraction plate.
Mass Spectrometry
Half of each enriched sample was analyzed by nano LC-MS/MS using a Waters M-Class HPLC system interfaced to a ThermoFisher Fusion Lumos mass spectrometer.
Peptides were loaded on a trapping column and eluted over a 75 μm analytical column at 350 nL/min; both columns were packed with Luna C18 resin (Phenomenex). A 2 hr gradient was employed. The mass spectrometer was operated using a custom data-dependent method, with MS performed in the Orbitrap at 60,000 FWHM resolution and sequential MS/MS performed using high resolution CID and EThcD in the Orbitrap at 15,000 FWHM resolution. All MS data were acquired from m/z 300-800. A 3 s cycle time was employed for all steps.
Data Processing
Data were searched using a local copy of PEAKS (Bioinformatics Solutions) with the following parameters:
Enzyme: None
Database: SwissProt Human appended with #1 Bruno_sample 1 or #2 Bruno_sample 2
Fixed modification: None
Variable modifications: Oxidation (M), Deamidation (N,Q), Acetyl (Protein N-term)
Mass values: Monoisotopic
Peptide Mass Tolerance: 10 ppm
Fragment Mass Tolerance: 0.02 Da
PSM FDR: 1%
PEAKS output was further processed using Microsoft Excel.
EpiScan is a genetic platform that allows for the high-throughput and cost-efficient identification of peptides that bind MHC-I molecules from within a defined starting pool. EpiScan relies on the principle that MHC-I molecules are only trafficked to, and maintained on, the cell surface after stably binding a high-affinity peptide in the endoplasmic reticulum (ER) (
We validated the EpiScan platform using the model ovalbumin antigen, SIINFEKL (SEQ ID NO:33). Using a viral TAP inhibitor gene, UL49.5 (5A) or CRISPR/Cas9-mediated gene disruption, we isolated a HEK 293T clone (henceforth ‘EpiScan cells’) lacking MHC-I (HLA-A, -B, -C), TAP, and the ER-resident metallopeptidases ERAP1 and ERAP26,7 (
Peptidase activity in the ER could adversely affect the performance of EpiScan: destruction of the exogenous peptide would reduce the sensitivity of the assay, while partial proteolysis could generate false positives as a processed form of the peptide—and not the genetically-encoded peptide itself—might bind to MHC-I. Thus we also chose to mutate the peptidases ERAP1 and ERAP2, which trim antigenic peptides from their N-termini to generate fragments of the optimal size for MHC-I binding (8-12-mers)6,7. To verify the loss of the activity of these enzymes in EpiScan cells we expressed N-terminally extended versions of our positive control peptides, reasoning that this should not result in increased surface MHC-I levels in the absence of N-terminal peptidase activity. Indeed, N-terminally extended versions of SIINFEKL (SEQ ID NO:33) or NLVPMVATV (SEQ ID NO:34), a peptide derived from the pp65 gene of human cytomegalovirus, did not lead to increased MHC-I surface staining in either humanized-H2-Kb- or HLA-A2-expressing EpiScan cells (
Having optimized the EpiScan platform using individual peptides, we sought to implement the approach for high-throughput screening to identify MHC-I peptide ligands at scale (
To validate the utility of the EpiScan screening approach, we asked if the sequences of the peptide ligands recapitulated the known preferences of four common, well studied, MHC-I alleles: HLA-A2, HLA-A3, HLA-B8 and HLA-B57. In each case, the sequences of the high-confidence peptides identified by EpiScan closely mirrored those of the corresponding sequences identified by mass spectrometry4 (
We further validated our EpiScan screening approach by investigating the underlying causes of abacavir hypersensitivity syndrome. Abacavir is an HIV reverse transcriptase inhibitor that causes hypersensitivity in around 5% of patients13; predisposition to abacavir hypersensitivity reactions is strongly associated with HLA-B*57:01, and crystal structures14,15 show abacavir binding in the peptide binding groove of HLA-B*57:01. Screening a library of random 9-mer peptides in HLA-B57-expressing EpiScan cells in the presence and absence of abacavir yielded both overlapping and distinct sets of binding peptides. Consistent with previous mass spectrometry-based studies14,15, the primary difference between the two conditions occurs at the C-terminal anchor position: whereas the two most common anchor residues, tryptophan and phenylalanine, were present at equal frequency in both conditions, the frequency of tyrosine decreased upon abacavir treatment while the frequency of valine and isoleucine increased, as shown in the following table.
This difference would create a significant number of novel peptides displayed by HLA-B*57:01 and would explain the widespread T cell activation elicited in the hypersensitivity reaction. Thus, EpiScan is capable of detecting subtle changes in MHC-I binding specificity and can be further exploited to investigate autoimmunity and the interactions of drugs with the immune system.
Mass spectrometry (MS) represents the current best-in-class method for high-throughput MHC-I immunopeptidomics, and thus we wanted to scrutinize the differences between EpiScan and MS in an unbiased manner. First, we used unsupervised clustering to examine the similarities between the MHC-I ligands identified by MS and EpiScan. The clustering indicated that the differences between alleles were greater than the differences between the two methodologies (
For all four MHC-I alleles we noticed modest differences between the peptide binding preferences as determined by EpiScan and MS (
An important goal in the field of immunopeptidomics is the development of computational models that can accurately predict MHC-I ligands starting from the primary sequence of a protein4,20,21. Given the differences between the MHC-I ligands identified by EpiScan and MS, we wanted to provide proof-of-principle that an effective prediction algorithm could be developed from EpiScan data. Using a neural network architecture analogous to the MSi algorithm recently developed by Sarkizova and colleagues4 (
The key advantage of EpiScan over MS-based approaches is that it permits the targeted identification of MHC-I ligands from a defined pool of potential epitopes. The novel coronavirus, SARS-CoV-2, has spread rapidly across the globe; as of early July 2020, SARS-CoV-2 had caused over 12 million confirmed infections and was responsible for over 500,000 deaths. Outcomes resulting from SARS-CoV-2 infection vary greatly for individuals22, and recent work has shown that a robust T cell response is correlated with favourable outcomes22-24. Therefore, we set out to exploit the programmability of EpiScan to perform a comprehensive screen of the SARS-CoV-2 genome for MHC-I ligands.
We synthesized an oligonucleotide library encoding all possible 9-, 10- and 11-mer peptides covering 11 different strains of SARS-CoV-2 (a total of ~30,000 sequences), and performed a series of EpiScan screens using a panel of cell lines expressing 11 of the most common HLA-I alleles (
Additionally, we used this independent dataset to evaluate the performance of our computational models that were trained on the random 9-mer data; we found that the models had comparable predictive power when applied to the SARS-CoV-2 EpiScan screens (
Lastly, we evaluated whether COVID-19 patients mount T cell responses against these epitopes. For 10 of the validated HLA-A2 ligands, we generated peptide-MHC tetramers (Table 4) and used them to assess the prevalence of reactive CD8+ T cells in the blood of convalescent COVID-19 patients. Each of the three patients tested had CD8+ T cells that reacted with at least one of the 10 tetramers (
We evaluated the effect of knocking out the signal peptide peptidase HM13. As shown in
In addition, when the sequences of the HLA-A*02:01 ligands identified by WT EpiScan, HM13 KO EpiScan, and mass spectrometry, were compared, the results (
We compared the affinity of L- to V-ended 9mers via EpiScan. As shown in
To confirm signal peptidase cleavage fidelity, we sought to challenge the system with peptides that would be most likely to be cleaved at the improper location. Thus, we chose three peptides known to bind HLA-A*02:01 that start with a glycine, which is also the last residue of the signal peptide, and included variants of each peptide with the initial glycine removed, or an additional glycine added. If the signal peptidase cleaves “too early”, leaving the last glycine of the signal peptide, then the removed glycine variant will cause an increase in surface MHC-I. Alternatively, if the signal peptidase cleaves “too late”, removing an additional glycine, then the added glycine variant will cause an increase in surface MHC-I. If signal peptidase cleavage happens consistently, and precisely, at the end of the signal peptide then only the WT version of the peptides will lead to surface MHC-I signal. The results, shown in
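The logic of this experimental design can be expressed as a small model. The sketch below is illustrative only: it assumes that surface MHC-I increases only when the cleavage product is exactly the known binder, and it uses GILGFVFTL, a well-characterized HLA-A*02:01 ligand from influenza A matrix protein, as a stand-in because the source does not name the three peptides in this passage. All function names are hypothetical.

```python
def cleavage_product(insert, offset=0):
    """Peptide remaining after signal peptidase cleavage.

    offset =  0: cleavage exactly at the end of the signal peptide.
    offset = -1: cleavage one residue early, leaving the signal peptide's final glycine.
    offset = +1: cleavage one residue late, removing the first residue of the insert.
    """
    if offset == -1:
        return "G" + insert
    if offset == 0:
        return insert
    if offset == 1:
        return insert[1:]
    raise ValueError("offset must be -1, 0 or +1")

def variants(binder):
    """WT, glycine-removed and glycine-added versions of a binder that starts with G."""
    if not binder.startswith("G"):
        raise ValueError("binder must start with glycine")
    return {"WT": binder, "minus_G": binder[1:], "plus_G": "G" + binder}

def presenting_variants(binder, offset):
    """Names of the variants whose cleavage product equals the known binder."""
    return [name for name, insert in variants(binder).items()
            if cleavage_product(insert, offset) == binder]

for offset in (-1, 0, 1):
    print(offset, presenting_variants("GILGFVFTL", offset))
```

Each cleavage scenario predicts a different single variant giving signal, which is what makes the three-variant design diagnostic.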
A diverse set of 200,000 distinct peptides was introduced into HLA-A*02:01 HM13 KO EpiScan cells. After selection, MACS was performed using a biotin-conjugated β2m antibody on 100 million cells for each condition, and the column flow through and the cells captured by the column were plated after sorting. For capture, both streptavidin (
MACS allows more cells to be sorted in a shorter period of time than FACS. Thus, the success of MACS at isolating EpiScan cells that express higher affinity peptides permits larger scale screening of EpiScan peptide libraries.
We wanted to determine whether mass spectrometry (MS) could be used in tandem with EpiScan for more efficient MS-based determination of MHC-I ligands from a particular pathogen or other set of potential antigens. For comparison, we also tested a more conventional “targeted” MS approach wherein ORFs from the pathogen of interest are transfected into a cell line containing just one HLA-I allele. Thus, we transfected 293T cells engineered to only express HLA-A*02:01 with SARS-CoV-2 ORFs corresponding to ORF1a/b, M, N, and S, then harvested the cells for MS two days later. In parallel, we performed an EpiScan screen with HLA-A*02:01 and a SARS-CoV-2 library with all possible 9-, 10-, and 11-mers. For this purpose, the EpiScan cells bearing the SARS-CoV-2 library were sorted in one bin based on surface MHC-I. After recovering from sorting, the cells were expanded and then harvested for MS.
We found that conducting MS on the EpiScan sorted cells was much more efficient than ORF transfection at identifying potential SARS-CoV-2 epitopes. MS of eluted MHC-I ligands discovered 214 high-confidence SARS-CoV-2 peptides out of a total of 457 peptides for the EpiScan cells. However, for the ORF transfected cells, MS of eluted MHC-I ligands discovered 1 high-confidence SARS-CoV-2 peptide out of a total of 3130 peptides. Thus, MS, in combination with EpiScan, can be used to identify MHC-I ligands in a high-throughput fashion.
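The difference in efficiency can be made concrete from the counts reported above. The short calculation below, offered as a worked illustration rather than an analysis from the source, expresses each result as the fraction of identified peptides that were SARS-CoV-2 derived and takes the ratio of the two fractions.

```python
# Counts reported in the text.
episcan_hits, episcan_total = 214, 457
orf_hits, orf_total = 1, 3130

episcan_fraction = episcan_hits / episcan_total
orf_fraction = orf_hits / orf_total
enrichment = episcan_fraction / orf_fraction

print(f"EpiScan-sorted cells: {episcan_fraction:.1%} of peptides were SARS-CoV-2 derived")
print(f"ORF-transfected cells: {orf_fraction:.3%}")
print(f"Fold difference: {enrichment:.0f}")
```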
An assay for discovery of CD8 T cell epitopes known as T-Scan has been described (Kula et al., Cell. 2019 Aug. 8; 178(4):1016-1028.e13). When a T cell recognizes its cognate antigen on MHC-I, it releases granzyme to lyse the target cell. T-Scan relies on a Granzyme B (GzB) reporter in the target cell that is activated after a CD8 T cell recognizes its cognate antigen on that cell. Here, T-Scan reporter cells have been engineered via TAP1/2 KO and HM13 KO to also be EpiScan cells. These EpiScan cells with the T-Scan reporter are referred to as EpiTScan cells. With EpiTScan we can precisely identify the specific peptide epitope responsible for T cell activation. Previously, T-Scan cells expressed short ORFs that were subject to endogenous processing and presentation, and the short peptides responsible for T cell responses were inferred via prediction algorithms.
For this experiment, primary T cells were infected with a virus comprising a sequence for a human T cell receptor (TCR), NLV3, that is specific to the peptide NLVPMVATV (SEQ ID NO:34), then those T cells were incubated together for 16 h at a 1:1 ratio with EpiTScan cells that express NLVPMVATV (SEQ ID NO:34) (
These results show that the EpiScan cells are capable of eliciting an immune response, as demonstrated by previously published metrics (T-Scan GzB reporter, Trogocytosis, and CD69).
It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
This application claims the benefit of U.S. Provisional Application Ser. No. 62/942,428, filed on Dec. 2, 2019. The entire contents of the foregoing are incorporated herein by reference.
This invention was made with Government support under Grant No. BC171184 awarded by the Department of Defense. The Government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2020/062912 | 12/2/2020 | WO |
Number | Date | Country | |
---|---|---|---|
62942428 | Dec 2019 | US |