This invention pertains to the identification of antibody mediated epitope mimics and applications of the identification of said mimic peptides in the design of biotherapeutics and vaccines.
Autoimmune disease affects up to 50 million Americans, according to the American Autoimmune Related Diseases Association (AARDA). An autoimmune disease develops when the immune system, which defends the body against disease, decides that healthy self cells are foreign. As a result, the immune system attacks healthy cells. Depending on the type, an autoimmune disease can affect one or many different types of body tissue. It can also cause abnormal organ growth and changes in organ function.
There are as many as 80 types of autoimmune diseases documented. Many of them have similar symptoms, which makes them very difficult to diagnose. It is also possible to have more than one at the same time. Autoimmune diseases usually fluctuate between periods of remission (little or no symptoms) and flare-ups (worsening symptoms). Currently, treatment for autoimmune diseases focuses on relieving symptoms because there is no curative therapy. In some instances, onset of an autoimmune disease may be triggered by exposure of a subject to an infectious microorganism, an allergen, or other exogenous protein.
Autoimmune diseases often run in families, and 75 percent of those affected are women, according to AARDA. African Americans, Hispanics, and Native Americans also have an increased risk of developing an autoimmune disease.
It is also increasingly apparent that autoimmune mechanisms play a significant contributing role in the pathogenesis of many acute diseases, and in particular, infectious diseases, which are not generally thought of or characterized as autoimmune diseases. Indeed, the vast majority of clinical diseases may contain some autoimmune components to their pathogenesis.
As the human proteome differs in sequence from many species which are routinely used as experimental animal models, the occurrence of autoimmune phenomena varies between host species. This may result in disease observed in animal models diverging from that in the human host.
What is needed in the art are improved methods for determining which epitopes may give rise to autoimmune diseases and whether biotherapeutics and vaccines contain epitopes which can trigger autoimmune diseases. Furthermore, the art needs to better understand the autoimmune pathogenesis arising from infectious agents in order to facilitate the design of safe interventions, and in order to select appropriate animal models.
This invention pertains to the identification of antibody mediated epitope mimics and applications of the identification of said mimic peptides in the design of biotherapeutics and vaccines.
In some embodiments, the present invention provides methods for identifying epitope mimic peptides which elicit antibodies that bind to a host protein, comprising: assembling a database of all proteins in the host proteome; assigning a curation to each protein based on its reported function; computing the probable B cell epitopes in each protein of the host proteome database wherein the proteins are curated by function; identifying the core peptide of the probable B cell epitopes in each protein of the host proteome; assembling a database of the core peptides of the probable B cell epitopes from each protein of the host proteome in a computer readable medium; entering a sequence of a protein of interest into a computer with access to the database; computing probable B cell epitopes in the protein of interest; identifying the core peptide of the probable B cell epitopes in the protein of interest; comparing the core peptide of the probable B cell epitope in a protein of interest to the core peptides contained in the database of peptides from the host proteome; identifying core peptides in predicted B cell epitopes in the protein of interest which are identical to core peptides in predicted B cell epitopes in one or more proteins of the host proteome; and identifying the function of the host proteome proteins which comprise the identical core peptides matching the core peptides of the protein of interest.
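The database assembly and comparison steps recited above may be illustrated by the following minimal Python sketch. It is not the implementation of the invention. It assumes that B cell epitope prediction has already been carried out by a separate program and that the resulting core peptides are supplied as (position, core peptide) pairs. The function names, the data layout, and the protein identifiers, sequences and function labels shown are hypothetical and serve only as an example.

```python
from collections import defaultdict


def build_core_peptide_index(host_epitopes, host_functions):
    """Index host core peptides for exact-match lookup.

    host_epitopes:  dict protein_id -> list of (position, core_peptide),
                    assumed to come from a separate epitope predictor.
    host_functions: dict protein_id -> curated function label.
    """
    index = defaultdict(list)
    for protein_id, cores in host_epitopes.items():
        function = host_functions.get(protein_id, "uncurated")
        for position, core in cores:
            index[core].append((protein_id, position, function))
    return index


def find_epitope_mimics(query_cores, index):
    """Return query core peptides identical to a host core peptide,
    together with the function of each matching host protein."""
    matches = []
    for position, core in query_cores:
        for protein_id, host_position, function in index.get(core, []):
            matches.append({
                "core": core,
                "query_position": position,
                "host_protein": protein_id,
                "host_position": host_position,
                "host_function": function,
            })
    return matches


# Made-up example data (hypothetical identifiers and peptides).
host_epitopes = {
    "HOST_A": [(12, "KDEGS"), (40, "PTRNE")],
    "HOST_B": [(7, "PTRNE")],
}
host_functions = {"HOST_A": "neurophysiological", "HOST_B": "blood clotting"}
index = build_core_peptide_index(host_epitopes, host_functions)
mimics = find_epitope_mimics([(3, "PTRNE"), (20, "QQQQQ")], index)
```

In this sketch a dictionary keyed by core peptide is used so that each lookup for a protein of interest is an exact-match query, consistent with the requirement that matching core peptides be identical.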
In some embodiments, the host proteome is a human proteome. In other embodiments, the host proteome is a murine proteome. In yet other embodiments, the host proteome is that of another species, including but not limited to a non-human primate proteome.
In some embodiments, the probable B cell epitope in the protein of interest is in the top 25% most probable B cell epitopes in the protein of interest. In some embodiments, the probable B cell epitope in the protein of interest is in the top 10% most probable B cell epitopes in the protein of interest. In some embodiments, the probable B cell epitope in the host proteome protein is in the top 40% most probable B cell epitopes in the host proteome protein. In some embodiments, the probable B cell epitope in the host proteome protein is in the top 25% most probable B cell epitopes in the host proteome protein. In some embodiments, the core peptide in the probable B cell epitope in the protein of interest comprises a sequence of five contiguous amino acids. In some embodiments, the core peptide in the probable B cell epitope in the host proteome protein comprises a sequence of five contiguous amino acids. In some embodiments, the database of core peptides from the host proteome proteins is searched by application of a list of keywords to select a subset of peptides with functions of interest. In some embodiments, the keywords define a group of proteins with neurophysiological function. In some embodiments, the keywords define a group of proteins with enzymatic function. In some embodiments, the keywords define a group of proteins which function in blood clotting and vascular permeability. In some embodiments, the keywords define a group of proteins which function in inflammation. In some embodiments, the keywords define a group of proteins which function in arthritis. In some embodiments, the core peptide of the probable B cell epitope is matched to the probable B cell epitopes in a dataset of proteins selected based on their known association with a particular disease syndrome. In one particular embodiment, the disease syndrome is Parkinson's disease and related alpha synucleinopathies.
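The percentile thresholds and keyword selection described above may be sketched as follows. This is an illustrative Python example under stated assumptions, not a definitive implementation: it assumes each candidate core peptide carries a numeric score in which a higher value means a more probable B cell epitope (some predictors use the opposite convention), and that each database record carries a free-text function annotation. The function names, record fields and example values are hypothetical.

```python
import math


def top_fraction(scored_cores, fraction):
    """Keep the highest-scoring fraction of candidate core peptides.

    scored_cores: list of (core_peptide, score); higher score is assumed
                  to mean a more probable B cell epitope.
    fraction:     e.g. 0.25 for the top 25%, 0.10 for the top 10%.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scored_cores, key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


def filter_by_keywords(records, keywords):
    """Select records whose function annotation mentions any keyword
    (case-insensitive substring match)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        record for record in records
        if any(keyword in record["function"].lower() for keyword in lowered)
    ]


# Made-up example values.
scored = [("AAAAA", 0.9), ("CCCCC", 0.1), ("DDDDD", 0.5), ("EEEEE", 0.7)]
best = top_fraction(scored, 0.25)

records = [
    {"protein": "HOST_A", "function": "Neurophysiological signaling"},
    {"protein": "HOST_B", "function": "blood clotting"},
]
neuro = filter_by_keywords(records, ["neuro"])
```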
In some embodiments, the methods further comprise identifying those probable B cell epitopes in the protein of interest which are located within 10 to 20 amino acids of a peptide with predicted high binding affinity for one or more MHC II molecules. In some embodiments, the methods further comprise identifying a subpopulation of subjects that is most at risk of adverse effects arising from antibody mediated autoimmunity. In some embodiments, the protein of interest is a microbial protein. In some embodiments, the microbial protein is selected from the group consisting of a virus, a bacterium, a parasite, a fungus, and a microbial toxin. In some embodiments, the protein of interest is an antigen binding protein. In some embodiments, the protein of interest is a biopharmaceutical protein. In some embodiments, the protein of interest is a vaccine. In some embodiments, the protein of interest is a pharmaceutical preparation. In some embodiments, the protein of interest is a food protein. In some embodiments, the protein of interest is an environmental protein. In some embodiments, the methods further comprise the step of synthesizing a mutant version of the protein of interest, wherein the core peptide in the protein of interest is mutated to abrogate the match to a core peptide in the human proteome.
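The proximity criterion described above may be sketched as follows. This Python example is illustrative only. It assumes that the start positions of the core peptides and of the predicted MHC II binding peptides are already known, that distance is measured as the number of residues separating the two segments, and that the threshold (here a hypothetical parameter max_gap) is chosen from the 10 to 20 amino acid range. The function names and default lengths are assumptions made for the example.

```python
def gap_between(start_a, len_a, start_b, len_b):
    """Number of residues separating two segments of a sequence.

    Positions are 0-based; returns 0 if the segments overlap or abut.
    """
    end_a = start_a + len_a  # exclusive end of segment A
    end_b = start_b + len_b  # exclusive end of segment B
    if end_a <= start_b:
        return start_b - end_a
    if end_b <= start_a:
        return start_a - end_b
    return 0


def epitopes_near_mhc_binders(core_starts, binder_starts, max_gap=20,
                              core_len=5, binder_len=15):
    """Return start positions of core peptides lying within max_gap
    residues of at least one predicted MHC II binding peptide."""
    near = []
    for core_start in core_starts:
        if any(
            gap_between(core_start, core_len, binder_start, binder_len) <= max_gap
            for binder_start in binder_starts
        ):
            near.append(core_start)
    return near


# Made-up example: one core peptide close to a binder, one far away.
nearby = epitopes_near_mhc_binders([0, 100], [10], max_gap=20)
```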
In some embodiments, the present invention provides methods of selecting an animal model to study a disease or to test a vaccine or pharmaceutical product comprising: analyzing a protein of interest by the methods described above both for a human proteome and for a proposed animal model proteome. In some embodiments, said animal model is a mouse. In yet other embodiments the proposed model is a non-human primate. The occurrence of probable epitope mimics in the proposed animal model species is then compared with that of the human, to determine if the model would predict potential autoimmunity in the human subject.
In yet other embodiments, the probable mimics in the human proteome are analyzed by the methods described and then the core peptides of the mimics are compared to determine which other species have identical core peptides in their proteome proteins which are homologous in function to those in the human proteome that carry the core peptides matching the core peptides in the protein of interest.
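The comparison between a human proteome and a proposed animal model proteome may be sketched as follows. This Python example is a simplified illustration and not the method of the invention: it assumes that a set of core peptides of probable B cell epitopes has already been prepared for each species, and it compares only peptide identity, without checking that the matched proteins are homologous in function. The species names, peptides and function names are hypothetical.

```python
def mimics_by_species(query_cores, species_core_sets):
    """For each species, return the query core peptides that are identical
    to a core peptide in that species' proteome database."""
    query = set(query_cores)
    return {
        species: query & cores
        for species, cores in species_core_sets.items()
    }


def human_mimics_missing_in_model(hits, human="human"):
    """For each candidate model species, list the human mimics that the
    model does not share and therefore could not reveal."""
    human_hits = hits[human]
    return {
        species: human_hits - species_hits
        for species, species_hits in hits.items()
        if species != human
    }


# Made-up example data.
species_core_sets = {
    "human": {"PTRNE", "KDEGS"},
    "mouse": {"PTRNE"},
    "macaque": {"KDEGS", "PTRNE"},
}
hits = mimics_by_species(["PTRNE", "KDEGS", "QQQQQ"], species_core_sets)
missing = human_mimics_missing_in_model(hits)
```

In this made-up example the mouse lacks one of the two human mimics, whereas the macaque shares both, which would favor the macaque as the model for this protein of interest.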
In some embodiments, the present invention provides methods of producing a vaccine comprising: obtaining one or more gene or amino acid sequences encoding one or more components of vaccine that have been mutated to remove one or more epitope mimics or alter one or more epitope mimics to non-mimics as compared to the corresponding wild type sequences, the epitope mimics identified by a process comprising: assembling a database of all proteins in the human proteome; assigning a curation to each protein based on its reported function; computing the probable B cell epitopes in each protein of the human proteome database wherein the proteins are curated by function; identifying the core peptide of the probable B cell epitopes in each protein of the human proteome; assembling a database of the core peptides of the probable B cell epitopes from each protein of the human proteome in a computer readable medium; entering sequences encoding one or more components of vaccine into a computer with access to the database; computing probable B cell epitopes in the sequences encoding one or more components of vaccine; identifying the core peptide of the probable B cell epitopes in the sequences encoding one or more components of vaccine; comparing the core peptides of the probable B cell epitopes in the sequences encoding one or more components of vaccine to the core peptides contained in the database of peptides from the human proteome; identifying core peptides in predicted B cell epitopes in the sequences encoding one or more components of vaccine which are identical to core peptides in predicted B cell epitopes in one or more proteins of the human proteome; identifying the function of the human proteome proteins which comprise the identical core peptides matching the core peptides of sequences encoding one or more components of vaccine; and synthesizing components for a vaccine by a method selected from the group consisting of a) expressing the one or more sequences encoding one or more components of vaccine that have been mutated to remove one or more epitope mimics or alter one or more epitope mimics to non-mimics as compared to the corresponding wild type sequences in a host cell to produce mutated proteins, and b) synthesizing nucleic acid segments encoding the one or more recombinant sequences encoding one or more components of vaccine that have been mutated to remove one or more epitope mimics or alter one or more epitope mimics to non-mimics as compared to the corresponding wild type sequences. In some embodiments, the methods further comprise formulating the mutated proteins or nucleic acid segments with a pharmaceutically acceptable carrier.
In some embodiments, the present invention provides methods of producing a biopharmaceutical protein comprising: obtaining one or more gene or amino acid sequences encoding a biopharmaceutical protein that has been mutated to remove one or more epitope mimics or alter one or more epitope mimics to non-mimics as compared to the corresponding target biopharmaceutical protein sequence, the epitope mimics identified by a process comprising: assembling a database of all proteins in the human proteome; assigning a curation to each protein based on its reported function; computing the probable B cell epitopes in each protein of the human proteome database wherein the proteins are curated by function; identifying the core peptide of the probable B cell epitopes in each protein of the human proteome; assembling a database of the core peptides of the probable B cell epitopes from each protein of the human proteome in a computer readable medium; entering sequences encoding the target biopharmaceutical protein into a computer with access to the database; computing probable B cell epitopes in the sequences encoding the target biopharmaceutical protein; identifying the core peptide of the probable B cell epitopes in the sequences encoding the target biopharmaceutical protein; comparing the core peptides of the probable B cell epitopes in the sequences encoding the target biopharmaceutical protein to the core peptides contained in the database of peptides from the human proteome; identifying core peptides in predicted B cell epitopes in the target biopharmaceutical protein which are identical to core peptides in predicted B cell epitopes in one or more proteins of the human proteome; identifying the function of the human proteome proteins which comprise the identical core peptides matching the core peptides of the target biopharmaceutical protein; and synthesizing the mutated biopharmaceutical protein by expressing the biopharmaceutical protein that has been mutated to remove one or more epitope mimics or alter one or more epitope mimics to non-mimics as compared to the corresponding target biopharmaceutical protein sequence. In some embodiments, the methods further comprise formulating the mutated biopharmaceutical protein with a pharmaceutically acceptable carrier.
In some embodiments, the probable B cell epitope in the protein of interest is in the top 25% most probable B cell epitopes in the protein of interest (i.e., the vaccine component or biopharmaceutical protein). In some embodiments, the probable B cell epitope in the protein of interest is in the top 10% most probable B cell epitopes in the protein of interest. In some embodiments, the probable B cell epitope in the human proteome protein is in the top 40% most probable B cell epitopes in the human proteome protein. In some embodiments, the probable B cell epitope in the human proteome protein is in the top 25% most probable B cell epitopes in the human proteome protein. In some embodiments, the core peptide in the probable B cell epitope in the protein of interest comprises a sequence of five contiguous amino acids. In some embodiments, the core peptide in the probable B cell epitope in the human proteome protein comprises a sequence of five contiguous amino acids. In some embodiments, the database of core peptides from the human proteome proteins is searched by application of a list of keywords to select a subset of peptides with functions of interest. In some embodiments, the keywords define a group of proteins with neurophysiological function. In some embodiments, the keywords define a group of proteins with enzymatic or endocrine function. In some embodiments, the keywords define a group of proteins which function in blood clotting and vascular permeability. In some embodiments, the keywords define a group of proteins which function in inflammation. In some embodiments, the methods further comprise identifying those probable B cell epitopes in the protein of interest which are located within 10 to 20 amino acids of a peptide with predicted high binding affinity for one or more MHC II molecules. In some embodiments, the sequences encoding one or more components of vaccine are microbial protein sequences. In some embodiments, the microbial protein sequences are selected from the group consisting of virus, bacteria, parasite, fungus, and microbial toxin sequences. In some embodiments, the target biopharmaceutical protein is selected from the group consisting of an antigen binding protein, a receptor protein, and a signaling protein. In some embodiments, the methods further comprise administering the one or more components of vaccine that have been mutated to remove one or more epitope mimics or alter one or more epitope mimics to non-mimics as compared to the corresponding wild type sequences to a subject in need thereof. In some embodiments, the methods further comprise administering the biopharmaceutical protein that has been mutated to remove one or more epitope mimics or alter one or more epitope mimics to non-mimics as compared to the corresponding target biopharmaceutical protein sequence to a subject in need thereof.
In some embodiments, the present invention provides methods of evaluating a biopharmaceutical protein comprising: identifying the presence in the biopharmaceutical protein of probable B cell epitopes and core peptides contained therein; determining which of the core peptides of the probable B cell epitopes match core peptides of probable B cell epitopes in a human proteome; and identifying the function of the proteins thus matched in the human proteome. In some embodiments, the methods further comprise the step of synthesizing a mutant version of the biopharmaceutical protein, wherein the core peptide in the biopharmaceutical protein is mutated to abrogate the match to a core peptide in the human proteome. In some embodiments, the methods further comprise identifying the spectrum of possible side effects arising from the binding of antibody elicited by the vaccine or biopharmaceutical protein to the B cell epitope in a human proteome protein.
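The step of mutating a core peptide to abrogate a match may be illustrated by the following Python sketch. It is a simplified example under stated assumptions and not the method of the invention: it enumerates single-residue substitutions within a matching pentamer and retains those after which no pentamer overlapping the substituted position is identical to a core peptide in the host database. It does not assess whether a substitution preserves the folding, function or immunogenicity of the protein, which would require separate evaluation. The function names and the example sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def windows_overlapping(seq, pos, k=5):
    """All k-mers of seq that include the residue at 0-based index pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(first, last + 1)]


def candidate_substitutions(seq, core_start, host_cores, k=5):
    """Single-residue substitutions inside the core peptide that leave no
    overlapping k-mer identical to a host core peptide.

    Returns a list of (position, original_residue, new_residue).
    """
    candidates = []
    for pos in range(core_start, core_start + k):
        for residue in AMINO_ACIDS:
            if residue == seq[pos]:
                continue
            mutant = seq[:pos] + residue + seq[pos + 1:]
            if not any(w in host_cores for w in windows_overlapping(mutant, pos, k)):
                candidates.append((pos, seq[pos], residue))
    return candidates


# Made-up example: the pentamer PTRNE at index 2 matches a host core peptide.
sequence = "GGPTRNEGG"
options = candidate_substitutions(sequence, 2, {"PTRNE"})
```

Checking every pentamer that overlaps the substituted position, rather than only the original core peptide, guards against a substitution that removes one match while creating another.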
In some embodiments, the present invention provides a non-transitory computer readable medium comprising a database of pentamer peptides which are found in human proteins of a defined set of functions and that are the core peptides of a predicted B cell epitope. In some embodiments, the defined set of functions is selected from the group consisting of neurophysiologic, endocrine, cardiovascular, respiratory, hormonal, skin and mucosal health, and musculoskeletal functions.
In some embodiments, the present invention provides methods of evaluating potential side effects of a pharmaceutical protein comprising: determining the core peptides located in the probable B cell epitopes of the pharmaceutical proteins; interrogating the database as described above to determine if the core peptides of the pharmaceutical protein are present; and preparing a report identifying a spectrum of possible pathophysiologic interactions of the biopharmaceutical proteins.
In some embodiments, the present invention provides methods of attenuating the pathology of a microorganism comprising: identifying core peptides within probable B cell epitopes of the microorganism which elicit antibodies that bind to a matching core peptide in a B cell epitope of a host protein; and mutating or removing the matching core peptide in the microorganism.
In some embodiments, the present invention provides methods of treating a subject affected by an autoimmune disease comprising: applying the methods described above to identify an epitope mimic peptide; providing the peptide as an antibody binding substrate; and incorporating the antibody binding substrate into an apheresis system.
In some embodiments, the present invention provides methods of diagnosing an autoimmune disease comprising: identifying epitope mimic peptides which elicit antibodies that bind to a human protein by the methods described above; providing a synthetic protein derived from the human protein which comprises the epitope mimic peptides; contacting the synthetic protein with serum harvested from a subject at risk of being affected by an autoimmune disease; and identifying the presence of antibodies with specific binding to mimic epitopes in the synthetic protein.
In some embodiments, the present invention provides methods of diagnosing an autoimmune disease wherein antibody mediated mimicry is suspected, comprising: harvesting a serum sample from a subject suspected of being affected by an autoimmune disease; contacting the serum sample to a microarray of peptides and identifying peptides which bind to antibodies in the serum; and analyzing the peptides thus identified by the methods described above to identify which of the peptides function as epitope mimic peptides.
As used herein, the term “genome” refers to the genetic material (e.g., chromosomes) of an organism or a host cell.
As used herein, the term “proteome” refers to the entire set of proteins expressed by a genome, cell, tissue or organism. A “partial proteome” refers to a subset of the entire set of proteins expressed by a genome, cell, tissue or organism. Examples of “partial proteomes” include, but are not limited to, transmembrane proteins, secreted proteins, and proteins with a membrane motif. “Human proteome” refers to all the proteins comprised in a human being. This includes multiple isoforms of many proteins. Multiple such sets of proteins have been sequenced and are accessible at the InterPro international repository (www.ebi.ac.uk/interpro). Another such repository is UniProt (www.uniprot.org). The human proteome is also understood to include those proteins and antigens thereof which may be over-expressed in certain pathologies, or expressed in different isoforms in certain pathologies. Hence, as used herein, tumor associated antigens are considered part of the human proteome. “Murine proteome” refers to the proteome of the mouse as catalogued in UniProt, where a reference proteome is recorded for C57BL/6J mice (www.uniprot.org/proteomes/UP000000589).
As used herein the term “host proteome” refers to the proteome of any species of interest in the study of a disease that afflicts said host. Thus for example, the human proteome is a host proteome for a human disease and a mouse proteome is a host proteome for a virus that infects it; and a macaque proteome is a host proteome for a parasite that affects it.
As used herein, the terms “protein,” “polypeptide,” and “peptide” refer to a molecule comprising amino acids joined via peptide bonds. In general “peptide” is used to refer to a sequence of 20 or fewer amino acids and “polypeptide” is used to refer to a sequence of greater than 20 amino acids.
As used herein, the terms “synthetic polypeptide,” “synthetic peptide” and “synthetic protein” refer to peptides, polypeptides, and proteins that are produced by a recombinant process (i.e., expression of exogenous nucleic acid encoding the peptide, polypeptide or protein in an organism, host cell, or cell-free system) or by chemical synthesis.
As used herein, the term “protein of interest” refers to a protein encoded by a nucleic acid of interest. It may be applied to any protein to which further analysis is applied or the properties of which are tested or examined. Similarly, as used herein, “target protein” may be used to describe a protein of interest that is subject to further analysis.
As used herein “peptidase” refers to an enzyme which cleaves a protein or peptide. The term peptidase may be used interchangeably with protease, proteinases, oligopeptidases, and proteolytic enzymes. Peptidases may be endopeptidases (endoproteases), or exopeptidases (exoproteases). Similarly, the term peptidase inhibitor may be used interchangeably with protease inhibitor or inhibitor of any of the other alternate terms for peptidase.
As used herein, the term “exopeptidase” refers to a peptidase that requires a free N-terminal amino group, C-terminal carboxyl group or both, and hydrolyses a bond not more than three residues from the terminus. The exopeptidases are further divided into aminopeptidases, carboxypeptidases, dipeptidyl-peptidases, peptidyl-dipeptidases, tripeptidyl-peptidases and dipeptidases.
As used herein, the term “endopeptidase” refers to a peptidase that hydrolyses internal, alpha-peptide bonds in a polypeptide chain, tending to act away from the N-terminus or C-terminus. Examples of endopeptidases are chymotrypsin, pepsin, papain and cathepsins. A very few endopeptidases act a fixed distance from one terminus of the substrate, an example being mitochondrial intermediate peptidase. Some endopeptidases act only on substrates smaller than proteins, and these are termed oligopeptidases. An example of an oligopeptidase is thimet oligopeptidase. Endopeptidases initiate the digestion of food proteins, generating new N- and C-termini that are substrates for the exopeptidases that complete the process. Endopeptidases also process proteins by limited proteolysis. Examples are the removal of signal peptides from secreted proteins (e.g. signal peptidase I), and the maturation of precursor proteins (e.g. enteropeptidase, furin). In the nomenclature of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) endopeptidases are allocated to sub-subclasses EC 3.4.21, EC 3.4.22, EC 3.4.23, EC 3.4.24 and EC 3.4.25 for serine-, cysteine-, aspartic-, metallo- and threonine-type endopeptidases, respectively. Endopeptidases of particular interest are the cathepsins, and especially cathepsin B, L and S known to be active in antigen presenting cells.
As used herein, the term “immunogen” refers to a molecule which stimulates a response from the adaptive immune system, which may include responses drawn from the group comprising an antibody response, binding to a B cell epitope, a cytotoxic T cell response, a T helper response, and a T cell memory. An immunogen may stimulate an upregulation of the immune response with a resultant inflammatory response, or may result in down regulation or immunosuppression. Thus the T-cell response may be a T regulatory response. An immunogen also may stimulate a B-cell response and lead to an increase in antibody titer. “Antigen” is a term used to describe one or more immunogens.
As used herein, the term “native” (or “wild type”) when used in reference to a protein refers to proteins encoded by the genome of a cell, tissue, or organism, other than one manipulated to produce synthetic proteins.
As used herein, the term “epitope” refers to a peptide sequence which elicits an immune response from T cells, B cells, or antibody.
As used herein, the term “B-cell epitope” refers to a polypeptide sequence that is recognized and bound by a B-cell receptor. A B-cell epitope may be a linear peptide or may comprise several discontinuous sequences which together are folded to form a structural epitope. Such component sequences which together make up a B-cell epitope are referred to herein as B-cell epitope sequences. Hence, a B-cell epitope may comprise one or more B-cell epitope sequences. A linear B-cell epitope may comprise as few as 2-4 amino acids, or more. In some particular instances the B-cell epitope is a pentamer of five contiguous amino acids.
As used herein, the term “predicted B-cell epitope” refers to a polypeptide sequence that is predicted to bind to a B-cell receptor by a computer program, for example, as described in PCT US2011/029192, PCT US2012/055038, and US2014/014523, each of which is incorporated herein by reference, and in addition by Bepipred (Larsen, et al., Immunome Research 2:2, 2006.) and others as referenced by Larsen et al (ibid) (Hopp T et al PNAS 78:3824-3828, 1981; Parker J et al, Biochem. 25:5425-5432, 1986). A predicted B-cell epitope may refer to the identification of B-cell epitope sequences forming part of a structural B-cell epitope or to a complete B-cell epitope. In some usages herein B cell epitope is abbreviated to BEPI.
As used herein, the term “T-cell epitope” refers to a polypeptide sequence which when bound to a major histocompatibility protein molecule provides a configuration recognized by a T-cell receptor. Typically, T-cell epitopes are presented bound to a MHC molecule on the surface of an antigen-presenting cell.
As used herein, the term “predicted T-cell epitope” refers to a polypeptide sequence that is predicted to bind to a major histocompatibility protein molecule by the neural network algorithms described herein, by other computerized methods, or as determined experimentally.
As used herein, the term “major histocompatibility complex (MHC)” refers to the MHC Class I and MHC Class II genes and the proteins encoded thereby. Molecules of the MHC bind small peptides and present them on the surface of cells for recognition by T-cell receptor-bearing T-cells. The MHC is both polygenic (there are several MHC class I and MHC class II genes) and polyallelic or polymorphic (there are multiple alleles of each gene). The terms MHC-I, MHC-II, MHC-1 and MHC-2 are variously used herein to indicate these classes of molecules. Included are both classical and nonclassical MHC molecules. An MHC molecule is made up of multiple chains (alpha and beta chains) which associate to form a molecule. The MHC molecule contains a cleft or groove which forms a binding site for peptides. Peptides bound in the cleft or groove may then be presented to T-cell receptors. The term “MHC binding region” refers to the groove region of the MHC molecule where peptide binding occurs.
As used herein, a “MHC II binding groove” refers to the structure of an MHC molecule that binds to a peptide. The peptide that binds to the MHC II binding groove may be from about 11 amino acids to about 23 amino acids in length, but typically comprises a 15-mer. The amino acid positions in the peptide that binds to the groove are numbered based on a central core of 9 amino acids numbered 1-9, and positions outside the 9 amino acid core numbered as negative (N terminal) or positive (C terminal). Hence, in a 15mer the amino acid binding positions are numbered from −3 to +3 or as follows: −3, −2, −1, 1, 2, 3, 4, 5, 6, 7, 8, 9, +1, +2, +3.
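The position numbering just described may be reproduced by the following short Python sketch, provided only as an illustration. It assumes a 9 amino acid core and takes the location of the core within the peptide as a parameter. The function name and parameter names are hypothetical, and the labels use an ASCII hyphen for the minus sign.

```python
def groove_position_labels(peptide_length=15, core_start=3, core_length=9):
    """Label each residue of an MHC II binding peptide relative to its core.

    core_start is the 0-based index of core position 1 within the peptide
    (3 for a core centred in a 15-mer). Residues N-terminal to the core are
    labelled -n, core residues 1..core_length, and C-terminal residues +n.
    """
    labels = []
    for i in range(peptide_length):
        if i < core_start:
            labels.append(str(i - core_start))  # e.g. -3, -2, -1
        elif i < core_start + core_length:
            labels.append(str(i - core_start + 1))  # 1 .. 9
        else:
            labels.append("+" + str(i - core_start - core_length + 1))  # +1, +2, ...
    return labels


labels_15mer = groove_position_labels()
```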
As used herein, the term “haplotype” refers to the HLA alleles found on one chromosome and the proteins encoded thereby. Haplotype may also refer to the allele present at any one locus within the MHC. Each class of MHC is represented by several loci: e.g., HLA-A (Human Leukocyte Antigen-A), HLA-B, HLA-C, HLA-E, HLA-F, HLA-G, HLA-H, HLA-J, HLA-K, HLA-L, HLA-P and HLA-V for class I and HLA-DRA, HLA-DRB1-9, HLA-, HLA-DQA1, HLA-DQB1, HLA-DPA1, HLA-DPB1, HLA-DMA, HLA-DMB, HLA-DOA, and HLA-DOB for class II. The terms “HLA allele” and “MHC allele” are used interchangeably herein. HLA alleles are listed at hla.alleles.org/nomenclature/naming.html, which is incorporated herein by reference.
The MHCs exhibit extreme polymorphism: within the human population there are, at each genetic locus, a great number of haplotypes comprising distinct alleles—the IMGT/HLA database release (February 2010) lists 948 class I and 633 class II molecules, many of which are represented at high frequency (>1%). MHC alleles may differ by as many as 30-aa substitutions. Different polymorphic MHC alleles, of both class I and class II, have different peptide specificities: each allele encodes proteins that bind peptides exhibiting particular sequence patterns.
The naming of new HLA genes and allele sequences and their quality control is the responsibility of the WHO Nomenclature Committee for Factors of the HLA System, which first met in 1968, and laid down the criteria for successive meetings. This committee meets regularly to discuss issues of nomenclature and has published 19 major reports documenting firstly the HLA antigens and more recently the genes and alleles. The standardization of HLA antigenic specifications has been controlled by the exchange of typing reagents and cells in the International Histocompatibility Workshops. The IMGT/HLA Database collects both new and confirmatory sequences, which are then expertly analyzed and curated before being named by the Nomenclature Committee. The resulting sequences are then included in the tools and files made available from both the IMGT/HLA Database and at hla.alleles.org.
Each HLA allele name has a unique number corresponding to up to four sets of digits separated by colons. See e.g., hla.alleles.org/nomenclature/naming.html which provides a description of standard HLA nomenclature and Marsh et al., Nomenclature for Factors of the HLA System, 2010 Tissue Antigens 2010 75:291-455. HLA-DRB1*13:01 and HLA-DRB1*13:01:01:02 are examples of standard HLA nomenclature. The length of the allele designation is dependent on the sequence of the allele and that of its nearest relative. All alleles receive at least a four digit name, which corresponds to the first two sets of digits; longer names are only assigned when necessary.
The digits before the first colon describe the type, which often corresponds to the serological antigen carried by an allotype. The next set of digits is used to list the subtypes, numbers being assigned in the order in which DNA sequences have been determined. Alleles whose numbers differ in the two sets of digits must differ in one or more nucleotide substitutions that change the amino acid sequence of the encoded protein. Alleles that differ only by synonymous nucleotide substitutions (also called silent or non-coding substitutions) within the coding sequence are distinguished by the use of the third set of digits. Alleles that only differ by sequence polymorphisms in the introns or in the 5′ or 3′ untranslated regions that flank the exons and introns are distinguished by the use of the fourth set of digits. In addition to the unique allele number there are additional optional suffixes that may be added to an allele to indicate its expression status. Alleles that have been shown not to be expressed, ‘Null’ alleles, have been given the suffix ‘N’. Those alleles which have been shown to be alternatively expressed may have the suffix ‘L’, ‘S’, ‘C’, ‘A’ or ‘Q’. The suffix ‘L’ is used to indicate an allele which has been shown to have ‘Low’ cell surface expression when compared to normal levels. The ‘S’ suffix is used to denote an allele specifying a protein which is expressed as a soluble ‘Secreted’ molecule but is not present on the cell surface. A ‘C’ suffix is used to indicate an allele product which is present in the ‘Cytoplasm’ but not on the cell surface. An ‘A’ suffix is used to indicate ‘Aberrant’ expression where there is some doubt as to whether a protein is expressed. A ‘Q’ suffix is used when the expression of an allele is ‘Questionable’ given that the mutation seen in the allele has previously been shown to affect normal expression levels.
In some instances, the HLA designations used herein may differ from the standard HLA nomenclature just described due to limitations in entering characters in the databases described herein. As an example, DRB1_0104, DRB1*0104, and DRB1-0104 are equivalent to the standard nomenclature of DRB1*01:04. In most instances, the asterisk is replaced with an underscore or dash and the colon between the two digit sets is omitted.
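By way of non-limiting illustration, the equivalence between the database-style designations and the standard nomenclature can be sketched as a small normalization routine. The function name normalize_hla and the regular expression are hypothetical and are not part of any database described herein; the sketch assumes only two-field names (type and subtype) and does not handle the third and fourth digit sets or expression suffixes.

```python
import re

# Hypothetical pattern: optional "HLA-" prefix, a locus such as DRB1 or A,
# one separator (*, _ or -), then the first two digit sets with or without a colon.
_HLA_PATTERN = re.compile(r"^(?:HLA-)?([A-Z]+\d*)[*_-](\d{2}):?(\d{2,3})$")


def normalize_hla(name: str) -> str:
    """Return the two-field standard form, e.g. DRB1_0104 -> DRB1*01:04."""
    match = _HLA_PATTERN.match(name.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized HLA designation: {name!r}")
    locus, type_digits, subtype_digits = match.groups()
    return f"{locus}*{type_digits}:{subtype_digits}"


if __name__ == "__main__":
    for raw in ("DRB1_0104", "DRB1*0104", "DRB1-0104"):
        print(raw, "->", normalize_hla(raw))
```

All three example designations given above map to the same standard name, which allows records from differently formatted sources to be compared directly.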
As used herein, the term “polypeptide sequence that binds to at least one major histocompatibility complex (MHC) binding region” refers to a polypeptide sequence that is recognized and bound by one or more particular MHC binding regions as predicted by the neural network algorithms described herein or as determined experimentally.
As used herein the terms “canonical” and “non-canonical” are used to refer to the orientation of an amino acid sequence. Canonical refers to an amino acid sequence presented or read in the N terminal to C terminal order; non-canonical is used to describe an amino acid sequence presented in the inverted or C terminal to N terminal order.
As used herein, the term “affinity” refers to a measure of the strength of binding between two members of a binding pair, for example, an antibody and an epitope, or an epitope and an MHC-I or MHC-II haplotype. Kd is the dissociation constant and has units of molarity. The affinity constant is the inverse of the dissociation constant. An affinity constant is sometimes used as a generic term to describe this chemical entity. It is a direct measure of the energy of binding. The natural logarithm of K is linearly related to the Gibbs free energy of binding through the equation ΔG° = −RT ln(K), where R is the gas constant and T is the temperature in kelvin. Affinity may be determined experimentally, for example by surface plasmon resonance (SPR) using commercially available Biacore SPR units (GE Healthcare), or in silico by methods such as those described herein in detail. Affinity may also be expressed as the ic50 or inhibitory concentration 50, that concentration at which 50% of the peptide is displaced. Likewise ln(ic50) refers to the natural log of the ic50.
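As a worked illustration of the relationship between the dissociation constant, the affinity constant and the Gibbs free energy of binding, the following minimal sketch may be considered. The function names and the default temperature of 298.15 K are assumptions made for the example only, and K is taken to be the affinity constant expressed relative to a 1 M standard state.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)


def affinity_constant(kd_molar: float) -> float:
    """Affinity constant Ka (M^-1) as the inverse of the dissociation constant Kd (M)."""
    return 1.0 / kd_molar


def delta_g_kj_per_mol(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard free energy of binding from dG = -R*T*ln(Ka), returned in kJ/mol."""
    ka = affinity_constant(kd_molar)
    return -GAS_CONSTANT * temperature_k * math.log(ka) / 1000.0


if __name__ == "__main__":
    for kd in (50e-9, 500e-9):
        print(f"Kd = {kd:.1e} M  Ka = {affinity_constant(kd):.1e} M^-1  "
              f"dG = {delta_g_kj_per_mol(kd):.1f} kJ/mol")
```

A tenfold change in Kd shifts the free energy by RT ln(10), roughly 5.7 kJ/mol at this temperature, which is why affinities are conveniently compared on a logarithmic scale.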
The term “Koff”, as used herein, is intended to refer to the off rate constant, for example, for dissociation of an antibody from the antibody/antigen complex, or for dissociation of an epitope from an MHC haplotype.
The term “Kd”, as used herein, is intended to refer to the dissociation constant (the reciprocal of the affinity constant “Ka”), for example, for a particular antibody-antigen interaction or interaction between an epitope and an MHC haplotype.
As used herein, the terms “strong binder” and “strong binding” and “High binder” and “high binding” or “high affinity” refer to a binding pair or describe a binding pair that have an affinity of greater than 2×10⁷ M⁻¹ (equivalent to a dissociation constant, Kd, of 50 nM).
As used herein, the term “moderate binder” and “moderate binding” and “moderate affinity” refer to a binding pair or describe a binding pair that have an affinity of from 2×10⁷ M⁻¹ to 2×10⁶ M⁻¹.
As used herein, the terms “weak binder” and “weak binding” and “low affinity” refer to a binding pair or describe a binding pair that have an affinity of less than 2×10⁶ M⁻¹ (equivalent to a dissociation constant, Kd, of 500 nM).
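The three binding categories just defined can be expressed, as a non-limiting sketch, by comparing a dissociation constant against the 50 nM and 500 nM boundaries. The function name binding_class is hypothetical; the boundary values themselves are assigned here to the moderate category, consistent with the wording “from ... to” above.

```python
STRONG_KD_NM = 50.0   # Kd of 50 nM corresponds to an affinity of 2e7 M^-1
WEAK_KD_NM = 500.0    # Kd of 500 nM corresponds to an affinity of 2e6 M^-1


def binding_class(kd_nanomolar: float) -> str:
    """Classify a binding pair as strong, moderate or weak from its Kd in nM."""
    if kd_nanomolar <= 0:
        raise ValueError("Kd must be positive")
    if kd_nanomolar < STRONG_KD_NM:    # affinity greater than 2e7 M^-1
        return "strong"
    if kd_nanomolar > WEAK_KD_NM:      # affinity less than 2e6 M^-1
        return "weak"
    return "moderate"


if __name__ == "__main__":
    for kd in (10.0, 200.0, 5000.0):
        print(f"{kd:g} nM -> {binding_class(kd)}")
```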
Binding affinity may also be expressed by the standard deviation from the mean binding found in the peptides making up a protein. Hence a binding affinity may be expressed as “−1σ” or “<−1σ”, where this refers to a binding affinity of 1 or more standard deviations below the mean. A common mathematical transformation used in statistical analysis is a process called standardization, wherein the distribution is transformed from its original units to standard deviation units, so that the distribution has a mean of zero and a variance (and standard deviation) of 1. Because each protein comprises unique distributions for the different MHC alleles, standardization of the affinity data to zero mean and unit variance provides a numerical scale on which different alleles and different proteins can be compared. Analysis of a wide range of experimental results suggests that a criterion of standard deviation units can be used to discriminate between potential immunological responses and non-responses. An affinity of 1 standard deviation below the mean was found to be a useful threshold in this regard and thus approximately 15% (16.2% to be exact) of the peptides found in any protein will fall into this category.
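The standardization described above may be sketched as follows. The example assumes that the input is a list of predicted ln(ic50) values for the peptides of one protein against one MHC allele; the function names and the sample values are illustrative only. For a normally distributed score the expected fraction at or below −1σ is about 15.9%, of the same order as the proportion stated above; real per-protein distributions need not be exactly normal.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Transform values to zero mean and unit variance (standard deviation units)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; cannot standardize")
    return [(v - mu) / sigma for v in values]


def indices_below(values, threshold=-1.0):
    """Indices of peptides whose standardized score is at or below the threshold."""
    return [i for i, z in enumerate(standardize(values)) if z <= threshold]


def normal_tail_fraction(threshold=-1.0):
    """Fraction of a standard normal distribution lying at or below the threshold."""
    return 0.5 * (1.0 + math.erf(threshold / math.sqrt(2.0)))


if __name__ == "__main__":
    ln_ic50 = [4.1, 6.3, 7.0, 7.2, 8.5, 9.1, 6.8, 7.7]  # made-up values
    print(indices_below(ln_ic50))
    print(round(normal_tail_fraction(), 4))
```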
The terms “specific binding” or “specifically binding” when used in reference to the interaction of an antibody and a protein or peptide or an epitope and an MHC haplotype means that the interaction is dependent upon the presence of a particular structure (i.e., the antigenic determinant or epitope) on the protein; in other words the antibody is recognizing and binding to a specific protein structure rather than to proteins in general. For example, if an antibody is specific for epitope “A,” the presence of a protein containing epitope A (or free, unlabeled A) in a reaction containing labeled “A” and the antibody will reduce the amount of labeled A bound to the antibody.
As used herein, the term “antigen binding protein” refers to proteins that bind to a specific antigen. “Antigen binding proteins” include, but are not limited to, immunoglobulins, including polyclonal, monoclonal, chimeric, single chain, and humanized antibodies, Fab fragments, F(ab′)2 fragments, and Fab expression libraries. Various procedures known in the art are used for the production of polyclonal antibodies. For the production of antibody, various host animals can be immunized by injection with the peptide corresponding to the desired epitope including but not limited to rabbits, mice, rats, sheep, goats, etc. Various adjuvants are used to increase the immunological response, depending on the host species, including but not limited to Freund's (complete and incomplete), mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanins, dinitrophenol, and potentially useful human adjuvants such as BCG (Bacille Calmette-Guerin) and Corynebacterium parvum.
For preparation of monoclonal antibodies, any technique that provides for the production of antibody molecules by continuous cell lines in culture may be used (See e.g., Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.). These include, but are not limited to, the hybridoma technique originally developed by Kohler and Milstein (Kohler and Milstein, Nature, 256:495-497 [1975]), as well as the trioma technique, the human B-cell hybridoma technique (See e.g., Kozbor et al., Immunol. Today, 4:72 [1983]), and the EBV-hybridoma technique to produce human monoclonal antibodies (Cole et al., in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96 [1985]). In other embodiments, suitable monoclonal antibodies, including recombinant chimeric monoclonal antibodies and chimeric monoclonal antibody fusion proteins are prepared as described herein.
According to the invention, techniques described for the production of single chain antibodies (U.S. Pat. No. 4,946,778; herein incorporated by reference) can be adapted to produce specific single chain antibodies as desired. An additional embodiment of the invention utilizes the techniques known in the art for the construction of Fab expression libraries (Huse et al., Science, 246:1275-1281 [1989]) to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity.
Antibody fragments that contain the idiotype (antigen binding region) of the antibody molecule can be generated by known techniques. For example, such fragments include but are not limited to: the F(ab′)2 fragment that can be produced by pepsin digestion of an antibody molecule; the Fab′ fragments that can be generated by reducing the disulfide bridges of an F(ab′)2 fragment, and the Fab fragments that can be generated by treating an antibody molecule with papain and a reducing agent.
Genes encoding antigen-binding proteins can be isolated by methods known in the art. In the production of antibodies, screening for the desired antibody can be accomplished by techniques known in the art (e.g., radioimmunoassay, ELISA (enzyme-linked immunosorbent assay), “sandwich” immunoassays, immunoradiometric assays, gel diffusion precipitin reactions, immunodiffusion assays, in situ immunoassays (using colloidal gold, enzyme or radioisotope labels, for example), Western Blots, precipitation reactions, agglutination assays (e.g., gel agglutination assays, hemagglutination assays, etc.), complement fixation assays, immunofluorescence assays, protein A assays, and immunoelectrophoresis assays, etc.).
As used herein “immunoglobulin” means the distinct antibody molecule secreted by a clonal line of B cells; hence when the term “100 immunoglobulins” is used it conveys the distinct products of 100 different B-cell clones and their lineages.
As used herein, the terms “computer memory” and “computer memory device” refer to any storage media readable by a computer processor. Examples of computer memory include, but are not limited to, RAM, ROM, computer chips, digital video disc (DVDs), compact discs (CDs), hard disk drives (HDD), and magnetic tape.
As used herein, the term “computer readable medium” refers to any device or system for storing and providing information (e.g., data and instructions) to a computer processor. Examples of computer readable media include, but are not limited to, DVDs, CDs, hard disk drives, magnetic tape and servers for streaming media over networks.
As used herein, the terms “processor” and “central processing unit” or “CPU” are used interchangeably and refer to a device that is able to read a program from a computer memory (e.g., ROM or other computer memory) and perform a set of steps according to the program.
As used herein, the term “support vector machine” refers to a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.
As used herein, the term “classifier” when used in relation to statistical processes refers to processes such as neural nets and support vector machines.
As used herein “neural net”, which is used interchangeably with “neural network” and sometimes abbreviated as NN, refers to various configurations of classifiers used in machine learning, including multilayered perceptrons with one or more hidden layers, support vector machines and dynamic Bayesian networks. These methods share in common the ability to be trained, the quality of their training evaluated, and their ability to make either categorical classifications of non-numeric data or to generate equations for predictions of continuous numbers in a regression mode. Perceptron as used herein is a classifier which maps its input x to an output value which is a function of x, or a graphical representation thereof.
As used herein, the term “principal component analysis”, or as abbreviated PCA, refers to a mathematical process which reduces the dimensionality of a set of data (Wold, S., Sjorstrom, M., and Eriksson, L., Chemometrics and Intelligent Laboratory Systems 2001. 58: 109-130; Multivariate and Megavariate Data Analysis Basic Principles and Applications (Parts I&II) by L. Eriksson, E. Johansson, N. Kettaneh-Wold, and J. Trygg, 2006 2nd Edit. Umetrics Academy). Derivation of principal components is a linear transformation that locates directions of maximum variance in the original input data, and rotates the data along these axes. For n original variables, n principal components are formed as follows: The first principal component is the linear combination of the standardized original variables that has the greatest possible variance. Each subsequent principal component is the linear combination of the standardized original variables that has the greatest possible variance and is uncorrelated with all previously defined components. Further, the principal components are scale-independent in that they can be developed from different types of measurements. The application of PCA generates numerical coefficients (descriptors). The coefficients are effectively proxy variables whose numerical values are seen to be related to underlying physical properties of the molecules. A description of the application of PCA to generate descriptors of amino acids, and by combination thereof of peptides, is provided in PCT US2011/029192, incorporated herein by reference. Unlike neural nets, PCA does not have any predictive capability. PCA is deductive, not inductive.
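The derivation of the first principal component described above can be illustrated with a minimal, dependency-free sketch. It is not the procedure of the cited references: power iteration is substituted here as a simple way of finding the direction of greatest variance, the function names are hypothetical, and the small data table (rows standing for items such as amino acids, columns for measured properties) is made up for the example.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Scale every variable (column) to zero mean and unit variance."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # constant columns (sd = 0) are not handled
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def correlation_matrix(z_rows):
    """Covariance of the standardized variables, i.e. their correlation matrix."""
    n, p = len(z_rows), len(z_rows[0])
    return [[sum(r[i] * r[j] for r in z_rows) / n for j in range(p)]
            for i in range(p)]


def mat_vec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def first_principal_component(rows, iterations=500):
    """Return (loadings, variance) of the first principal component.

    Power iteration converges to the dominant eigenvector provided the
    (arbitrary) starting vector is not orthogonal to it.
    """
    matrix = correlation_matrix(standardize_columns(rows))
    vec = [float(i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    variance = sum(a * b for a, b in zip(vec, mat_vec(matrix, vec)))
    return vec, variance


def scores(rows, loadings):
    """Project each standardized row onto the component (the proxy variable)."""
    return [sum(a * b for a, b in zip(row, loadings))
            for row in standardize_columns(rows)]


if __name__ == "__main__":
    data = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 6.2, 0.9], [4.0, 7.9, 0.1]]
    loadings, variance = first_principal_component(data)
    print(loadings, variance, scores(data, loadings))
```

The scores returned for each row play the role of the numerical descriptors mentioned above; subsequent components could be obtained by removing the first component from the matrix and repeating the iteration.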
As used herein, the term “vector” when used in relation to a computer algorithm or the present invention, refers to the mathematical properties of the amino acid sequence.
As used herein, the term “vector,” when used in relation to recombinant DNA technology, refers to any genetic element, such as a plasmid, phage, transposon, cosmid, chromosome, retrovirus, virion, etc., which is capable of replication when associated with the proper control elements and which can transfer gene sequences between cells. Thus, the term includes cloning and expression vehicles, as well as viral vectors.
As used herein, the term “vector” when used in relation to transmission of an arbovirus refers to the intermediate host of a virus, such as a mosquito or tick or other arthropod.
As used herein, the term “host cell” refers to any eukaryotic cell (e.g., mammalian cells, avian cells, amphibian cells, plant cells, fish cells, insect cells, yeast cells), and bacteria cells, and the like, whether located in vitro or in vivo (e.g., in a transgenic organism).
As used herein, the term “cell culture” refers to any in vitro culture of cells. Included within this term are continuous cell lines (e.g., with an immortal phenotype), primary cell cultures, finite cell lines (e.g., non-transformed cells), and any other cell population maintained in vitro, including oocytes and embryos.
The term “isolated” when used in relation to a nucleic acid, as in “an isolated oligonucleotide” refers to a nucleic acid sequence that is identified and separated from at least one contaminant nucleic acid with which it is ordinarily associated in its natural source. Isolated nucleic acids are nucleic acids present in a form or setting that is different from that in which they are found in nature. In contrast, non-isolated nucleic acids are nucleic acids such as DNA and RNA that are found in the state in which they exist in nature.
The terms “in operable combination,” “in operable order,” and “operably linked” as used herein refer to the linkage of nucleic acid sequences in such a manner that a nucleic acid molecule capable of directing the transcription of a given gene and/or the synthesis of a desired protein molecule is produced. The term also refers to the linkage of amino acid sequences in such a manner so that a functional protein is produced.
A “subject” is an animal such as a vertebrate, preferably a mammal such as a human, or a bird, or a fish. Mammals are understood to include, but are not limited to, murines, simians, humans, bovines, ovines, cervids, equines, porcines, canines, felines, etc.
An “effective amount” is an amount sufficient to effect beneficial or desired results. An effective amount can be administered in one or more administrations.
As used herein, the term “purified” or “to purify” refers to the removal of undesired components from a sample. As used herein, the term “substantially purified” refers to molecules, either nucleic or amino acid sequences, that are removed from their natural environment, isolated or separated, and are at least 60% free, preferably 75% free, and most preferably 90% free from other components with which they are naturally associated. An “isolated polynucleotide” is therefore a substantially purified polynucleotide.
“Strain” as used herein in reference to a microorganism describes an isolate of a microorganism (e.g., bacteria, virus, fungus, parasite) considered to be of the same species but with a unique genome and, if nucleotide changes are non-synonymous, a unique proteome differing from other strains of the same organism. Typically strains may be the result of isolation from a different host or at a different location and time but multiple strains of the same organism may be isolated from the same host.
As used herein “Complementarity Determining Regions” (CDRs) are those parts of the immunoglobulin variable chains which determine how these molecules bind to their specific antigen. Each immunoglobulin variable region typically comprises three CDRs and these are the most highly variable regions of the molecule.
As used herein, the term “motif” refers to a characteristic sequence of amino acids forming a distinctive pattern.
The term “Groove Exposed Motif” (GEM) as used herein refers to a subset of amino acids within a peptide that binds to an MHC molecule; the GEM comprises those amino acids which are turned inward towards the groove formed by the MHC molecule and which play a significant role in determining the binding affinity. In the case of human MHC-I the GEM amino acids are typically (1, 2, 3, 9). In the case of MHC-II molecules two formats of GEM are most common, comprising amino acids (−3, −2, −1, 1, 4, 6, 9, +1, +2, +3) and (−3, −2, 1, 2, 4, 6, 9, +1, +2, +3), based on a 15-mer peptide with a central core of 9 amino acids numbered 1-9 and positions outside the core numbered as negative (N terminal) or positive (C terminal).
“Immunoglobulin germline” is used herein to refer to the variable region sequences encoded in the inherited germline genes and which have not yet undergone any somatic hypermutation. Each individual carries and expresses multiple copies of germline genes for the variable regions of heavy and light chains. These undergo somatic hypermutation during affinity maturation. Information on the germline sequences of immunoglobulins is collated and referenced by www.imgt.org (1). “Germline family” as used herein refers to the 7 main gene groups, catalogued at IMGT, which share similarity in their sequences and which are further subdivided into subfamilies.
“Affinity maturation” is the molecular evolution that occurs during somatic hypermutation, during which the unique variable region sequences generated that are the best at targeting and neutralizing an antigen become clonally expanded and dominate the responding cell populations.
“Germline motif” as used herein describes the amino acid subsets that are found in germline immunoglobulins. Germline motifs comprise both GEM and TCEM motifs found in the variable regions of immunoglobulins which have not yet undergone somatic hypermutation.
“Immunopathology” when used herein describes an abnormality of the immune system. An immunopathology may affect B-cells and their lineage causing qualitative or quantitative changes in the production of immunoglobulins. Immunopathologies may alternatively affect T-cells and result in abnormal T-cell responses. Immunopathologies may also affect the antigen presenting cells. Immunopathologies may be the result of neoplasias of the cells of the immune system. Immunopathology is also used to describe diseases mediated by the immune system such as autoimmune diseases. Illustrative examples of immunopathologies include, but are not limited to, B-cell lymphoma, T-cell lymphomas, Systemic Lupus Erythematosus (SLE), allergies, hypersensitivities, immunodeficiency syndromes, radiation exposure or chronic fatigue syndrome.
An “autoimmune disease” or “autoimmunity” as used herein refers to any disease or pathology which arises as the result of an immune response directed to a self-antigen. An autoimmune disease may be chronic, lasting over years with periodic flare ups and remissions, or may be acute and transitory, such as when an acute infection generates antibodies directed to a self-protein and the effects of said antibodies wane rapidly in days or weeks.
“Obverse” as used herein describes the outward directed face or the side facing outwards. Hence, in the context of a pMHC complex, the obverse side is that face presented to the T-cell receptor and comprises the space-shape made up of the TCEM and the contiguous and surrounding outward facing components of the MHC molecule that will be different for each different MHC allele.
“pMHC” is used to describe a complex of a peptide bound to an MHC molecule. In many instances a peptide bound to an MHC-I will be a 9-mer or 10-mer; however, other sizes of 7-11 amino acids may be thus bound. Similarly, MHC-II molecules may form pMHC complexes with peptides of 15 amino acids or with peptides of other sizes from 11-23 amino acids. The term pMHC is thus understood to include any short peptide bound to a corresponding MHC.
“Somatic hypermutation” (SHM), as used herein refers to the process by which variability in the immunoglobulin variable region is generated during the proliferation of individual B-cells responding to an immune stimulus. SHM occurs in the complementarity determining regions.
“T-cell exposed motif” (TCEM), as used herein, refers to the subset of amino acids in a peptide bound in an MHC molecule which are directed outwards and exposed to a T-cell binding to the pMHC complex. A T-cell binds to a complex molecular space-shape made up of the outer surface of the MHC of the particular HLA allele and the exposed amino acids of the peptide bound within the MHC. Hence any T-cell recognizes a space shape or receptor which is specific to the combination of HLA and peptide. The amino acids which comprise the TCEM in an MHC-I binding peptide typically comprise positions 4, 5, 6, 7, 8 of a 9-mer. The amino acids which comprise the TCEM in an MHC-II binding peptide typically comprise 2, 3, 5, 7, 8 or −1, 3, 5, 7, 8 based on a 15-mer peptide with a central core of 9 amino acids numbered 1-9 and positions outside the core numbered as negative (N terminal) or positive (C terminal). As indicated under pMHC, the peptide bound to an MHC may be of other lengths and thus the numbering system here is considered a non-exclusive example of the instances of 9-mer and 15-mer peptides.
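By way of non-limiting example, the position numbering just described can be applied to a peptide as sketched below. The function names are hypothetical, the peptide strings are arbitrary, and the groove exposed positions are taken here simply as the positions remaining after the T-cell exposed positions are removed, which is an assumption of this sketch.

```python
TCEM_I = (4, 5, 6, 7, 8)        # positions in a 9-mer bound to MHC-I
TCEM_IIA = (2, 3, 5, 7, 8)      # core positions of a 15-mer bound to MHC-II
TCEM_IIB = (-1, 3, 5, 7, 8)     # -1 is the residue immediately N terminal of the core


def index_in_15mer(position: int) -> int:
    """Map flank (-3..-1) or core (1..9) numbering to a 0-based index in a 15-mer."""
    if position == 0 or not -3 <= position <= 9:
        raise ValueError(f"position out of range: {position}")
    return position + 3 if position < 0 else position + 2


def _split(peptide: str, exposed_indices):
    """Return (T-cell exposed residues, remaining groove-facing residues)."""
    tcem = "".join(peptide[i] for i in exposed_indices)
    gem = "".join(aa for i, aa in enumerate(peptide) if i not in exposed_indices)
    return tcem, gem


def split_9mer(peptide: str):
    if len(peptide) != 9:
        raise ValueError("expected a 9-mer")
    return _split(peptide, [p - 1 for p in TCEM_I])


def split_15mer(peptide: str, pattern=TCEM_IIA):
    if len(peptide) != 15:
        raise ValueError("expected a 15-mer")
    return _split(peptide, [index_in_15mer(p) for p in pattern])


if __name__ == "__main__":
    print(split_9mer("ACDEFGHIK"))
    print(split_15mer("ACDEFGHIKLMNPQR"))
    print(split_15mer("ACDEFGHIKLMNPQR", TCEM_IIB))
```

Because the exposed positions are discontinuous, two peptides with different groove-facing residues can present the same five-residue exposed motif.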
“Regulatory T-cell” or “Treg” as used herein, refers to a T-cell which has an immunosuppressive or down-regulatory function. Regulatory T-cells were formerly known as suppressor T-cells. Regulatory T-cells come in many forms but typically are characterized by expression of CD4, CD25, and Foxp3. Tregs are involved in shutting down immune responses after they have successfully eliminated invading organisms, and also in preventing immune responses to self-antigens or autoimmunity.
“Tregitope” as used herein describes an epitope to which a Treg or regulatory T-cell binds.
“uTOPE™ analysis” as used herein refers to the computer assisted processes for predicting binding of peptides to MHC and predicting cathepsin cleavage, described in PCT US2011/029192, PCT US2012/055038, and US2014/01452, each of which is incorporated herein by reference.
“Framework region” as used herein refers to the amino acid sequences within an immunoglobulin variable region which do not undergo somatic hypermutation.
“Isotype” as used herein refers to the related proteins of a particular gene family. Immunoglobulin isotype refers to the distinct forms of heavy and light chains in the immunoglobulins. There are five heavy chain isotypes (alpha, delta, gamma, epsilon, and mu, leading to the formation of IgA, IgD, IgG, IgE and IgM respectively) and two light chain isotypes (kappa and lambda). Isotype when applied to immunoglobulins herein is used interchangeably with immunoglobulin “class”.
“Isoform” as used herein refers to different forms of a protein which differ in a small number of amino acids. The isoform may be a full length protein (i.e., by reference to a reference wild-type protein or isoform) or a modified form of a partial protein, i.e., be shorter in length than a reference wild-type protein or isoform.
“Class switch recombination” (CSR) as used herein refers to the change from one isotype of immunoglobulin to another in an activated B cell, wherein the constant region associated with a specific variable region is changed, typically from IgM to IgG or other isotypes.
“Immunostimulation” as used herein refers to the signaling that leads to activation of an immune response, whether said immune response is characterized by a recruitment of cells or the release of cytokines which lead to suppression of the immune response. Thus immunostimulation refers to both upregulation and down regulation.
“Up-regulation” as used herein refers to an immunostimulation which leads to cytokine release and cell recruitment tending to eliminate a non self or exogenous epitope. Such responses include recruitment of T cells, including effectors such as cytotoxic T cells, and inflammation. In an adverse reaction upregulation may be directed to a self-epitope.
“Down regulation” as used herein refers to an immunostimulation which leads to cytokine release that tends to dampen or eliminate a cell response. In some instances such elimination may include apoptosis of the responding T cells.
“Frequency class” or “frequency classification” as used herein is used to describe the counts of TCEM motifs found in a given dataset of peptides. A logarithmic (log base 2) frequency categorization scheme was developed to describe the distribution of motifs in a dataset. As the cellular interactions between T-cells and antigen presenting cells displaying the motifs in MHC molecules on their surfaces are the ultimate result of the molecular interactions, using a log base 2 system implies that each adjacent frequency class would double or halve the cellular interactions with that motif. Thus using such a frequency categorization scheme makes it possible to characterize subtle differences in motif usage as well as providing a comprehensible way of visualizing the cellular interaction dynamics with the different motifs. Hence a Frequency Class 2, or FC 2, means 1 in 4; a Frequency Class 10, or FC 10, means 1 in 2¹⁰ or 1 in 1024.
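The log base 2 categorization can be sketched as below. The rounding of the logarithm to the nearest whole number and the function names are assumptions of this example; the definition above fixes only that FC n corresponds to an occurrence of 1 in 2ⁿ.

```python
import math
from collections import Counter


def frequency_class(count: int, total: int) -> int:
    """Frequency class n such that the motif occurs about once in 2**n items."""
    if count <= 0 or total <= 0 or count > total:
        raise ValueError("require 0 < count <= total")
    return round(-math.log2(count / total))


def classify_motifs(motifs, total: int):
    """Map each distinct motif in a list of observed motifs to its frequency class."""
    return {motif: frequency_class(n, total) for motif, n in Counter(motifs).items()}


if __name__ == "__main__":
    print(frequency_class(1, 4))        # 1 in 4
    print(frequency_class(1, 1024))     # 1 in 1024
    print(classify_motifs(["AAAAA", "AAAAA", "CCCCC", "DDDDD"], total=4))
```

Moving from one class to the next doubles or halves the expected number of encounters with the motif, which is the property the scheme above is intended to capture.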
“40K set” as used herein refers to the database of 40,000 IGHV assembled from Genbank as described in Example 1.
“IGHV” as used herein is an abbreviation for immunoglobulin heavy chain variable regions.
“IGLV” as used herein is an abbreviation for immunoglobulin light chain variable regions.
“Adverse immune response” as used herein may refer to (a) the induction of immunosuppression when the appropriate response is an active immune response to eliminate a pathogen or tumor or (b) the induction of an upregulated active immune response to a self-antigen or (c) an excessive up-regulation unbalanced by any suppression, as may occur for instance in an allergic response.
As used herein “epitope mimic” describes a peptide that is present and elicits an immune response in one protein (e.g., source protein) and the humoral and cellular effectors of that immune response then recognize and act upon the same peptide motif where it occurs in a different protein (e.g., target protein). For example, an antibody which is elicited by a B cell epitope in a microorganism and which binds to a B cell epitope peptide derived from a human protein would be said to have found an epitope mimic. In some embodiments, epitope mimics are an important mechanism in autoimmunity.
As used herein “TCEM mimic” is used to describe a peptide which has an identical or overlapping TCEM, but may have a different GEM. Such a mimic occurring in one protein may induce an immune response directed towards another protein which carries the same TCEM motif. This may give rise to autoimmunity or inappropriate responses to the second protein.
“Anchor peptide”, as used herein, refers to peptides or polypeptides which allow binding to a substrate to facilitate purification or which facilitate attachment to a solid medium such as a bead or plastic dish or are capable of insertion into a membrane of a cell or liposome or virus like particle or other nanoparticle. Among the examples of anchor peptides are the following, which are considered non-limiting, his tags, immunoglobulins, Fc region of immunoglobulin, G coupled protein, receptor ligand, biotin, and FLAG tags. In some instances an anchor peptide is designed to be cleavable following exposure to an endopeptidase in vitro or in vivo.
“Cytotoxin” or “cytocide” as used herein refers to a peptide or polypeptide which is toxic to cells and which causes cell death. Among the non-limiting examples of such polypeptides are RNAses, phospholipase, membrane active peptides such as cecropin, and diphtheria toxin. Cytotoxin also includes radionuclides which are cytotoxic.
“Cytokine” as used herein refers to a protein which is active in cell signaling and may include, among other examples, chemokines, interferons, interleukins, lymphokines, granulocyte colony-stimulating factor, tumor necrosis factor and programmed death proteins.
As used herein the term “Alpha emitter” refers to a radioisotope which emits alpha radiation. Examples of alpha emitters which may be suitable for clinical use include Astatine-211, Bismuth-212, Bismuth-213, Actinium-225, Radium-223, Terbium-149, and Fermium-255.
As used herein “Auger particles” refers to the low energy electrons emitted by radionuclides such as, but not limited to, Gallium-67, Technetium-99m, Indium-111, Iodine-123, Iodine-125, and Thallium-201. Auger electrons are advantageous as they have a short path of transit through tissue.
As used herein “oncoprotein” means a protein encoded by an oncogene which can cause the transformation of a cell into a tumor cell if introduced into it. Examples of oncoproteins include but are not limited to the early proteins of papillomaviruses, polyomaviruses, adenoviruses and herpes viruses, however oncoproteins are not necessarily of viral origin.
“Label peptide” as used herein refers to a peptide or polypeptide which provides, either directly or by a ligated residue, a colorimetric, fluorescent, radiation emitting, light emitting, metallic or radiopaque signal which can be used to identify the location of said peptide. Among the non-limiting examples of such label peptides are streptavidin, fluorescein, luciferase, gold, ferritin, and tritium.
“MHC subunit chain” as used herein refers to the alpha and beta subunits of MHC molecules. An MHC II molecule is made up of an alpha chain which is constant among each of the DR, DP, and DQ variants and a beta chain which varies by allele. The MHC I molecule is made up of a constant beta-2 microglobulin and a variable MHC A, B or C chain.
As used herein “high frequency T cell exposed motifs” refers to T cell exposed motifs which occur at high frequency in a reference database of >50000 immunoglobulin variable regions. A motif that occurs more than once in 1024 variable regions is considered to be a high frequency motif, which will have a large cognate T cell population and be likely to elicit a T regulatory response when it is also highly bound by an MHC molecule.
The term “nanoparticle” as used herein refers to a small particle used to array immunogens which may be comprised of protein, lipid, carbohydrate or combination thereof or may be a “virus like particle” which mimics a virus in structure but lacks replicative capability.
As used herein an “immunostimulant” may refer to an adjuvant, including but not limited to Freund's adjuvant, inorganic compounds (e.g., alum, aluminum hydroxide, aluminum phosphate, calcium phosphate hydroxide), mineral oil (e.g., paraffin oil), bacterial products (e.g., killed bacteria, Bordetella pertussis, Mycobacterium bovis, toxoids), nonbacterial organics (e.g., squalene, thimerosal), detergents (e.g., Quil A), plant saponins from quillaja, soybean, polygala senega, cytokines (e.g., IL-1, IL-2, IL-12), and food-based oil (e.g., adjuvant 65).
As used herein, the term “domain”, when used to describe the domains of flavivirus envelopes, refers to structural domains as characterized in crystal structures (e.g., crystal structures for tick borne encephalitis and Japanese encephalitis viruses (2, 3)).
“Neural and neurologic proteins,” as used herein, refers to proteins within the human proteome which have been identified as having a function in the development or function of the nervous system. Included among such proteins, but not limited to these examples, are those which have the term neural, neuron, neuronal, neurologic, neurotropic, neurotropin, neuropeptide, neurogenic, glial, synaptic, or neurite in their curation at UniProt (www.uniprot.org). Proteins are described by their UniProt identifiers in the tables included herein. Glycoprotein M6A and Glial fibrillary acidic protein are also included herein. While described by use of the identifiers for human proteins, the defined term is intended to also include close homologues from other species.
“Microencephaly,” as used herein describes a condition of fetuses and neonates in which part or all of the brain is absent and the cranium is reduced in size at birth.
“Guillain Barré syndrome,” abbreviated as GBS, as used herein refers to a complex of symptoms which include peripheral neuropathy affecting motor, sensory and autonomic nerves and spinal roots, causing acute or subacute progressive motor weakness, sometimes advancing to respiratory paralysis. GBS is an autoimmune disease and has been noted following various infections, including influenza, Campylobacter, dengue and Zika virus. Although symptomatology is shared, GBS may have various pathogeneses, with different immune responses directed to different self proteins.
“Flaviviruses” as used herein refers to the taxonomic group of viruses of that name (4). Abbreviations are used for several flaviviruses as follows: Japanese encephalitis (JEV), West Nile virus (WNV), tick borne encephalitis (TBEV), yellow fever (YF), and dengue (DEN).
“Microbiocide” as used herein refers to a composition which may be a peptide, polypeptide or enzyme or small molecule which acts on a microorganism to inhibit its replication or cause lethal structural damage. Microbiocides include but are not limited to bactericides, virucides, and fungicides.
“Core peptides” or “core pentamer” when used herein refers to the central 5 amino acid peptide in a predicted B cell epitope sequence. Said B cell epitope may be evaluated by predicting binding across a series of 9-mer windows; the core pentamer is then the central pentamer of the 9-mer window.
“Target biopharmaceutical” as used herein refers to an original biopharmaceutical or a first iteration of a biopharmaceutical product which may be improved to reduce risk and increase safety by removal or mutation of a mimic epitope.
As used herein the term “arthritis” refers to any pathologic process resulting in inflammation, degeneration, pain or stiffness of the joints.
As used herein, the term “alpha synucleinopathy”, or synucleinopathy, refers to a disease characterized by abnormal processing or accumulation of alpha-synuclein protein in neurons.
Alpha synucleinopathy includes Parkinson's disease, dementia with Lewy bodies, and multiple system atrophy.
As used herein the term “parasite” refers to both endoparasites and ectoparasites. Endoparasites include protozoa, and multicellular parasites such as helminths; ectoparasites include arthropods such as ticks and lice. Antigens derived from said parasites which elicit antibodies may include both structural and physiologic proteins, and those proteins secreted by the parasites. In one particular instance, this includes the salivary proteins of ectoparasites.
There is increasing awareness that autoimmune reactions are a major contributor to morbidity and mortality. This includes both autoimmunity mediated by the cellular immune response and autoimmunity mediated by antibody responses.
The present invention provides a method for prediction and identification of antibody mediated epitope mimicry, in which antibodies elicited by an exogenous antigen react with an epitope on a self-protein, i.e., one that is a normal constituent of the human proteome or other host proteome. As the outcome of such interactions may be adverse and may contribute to clinical disease, anticipating such reactions permits avoidance, design away in development of biotherapeutics and vaccines, and interventions to remediate antibody mediated mimic reactions.
In one embodiment therefore the present invention provides a process to identify epitopes on an exogenous antigenic protein which are B cell epitopes and to identify predicted B cell epitopes within proteins of the human proteome which carry the same pentamer amino acid motif. In some particular embodiments, said exogenous protein is present in a microorganism, including but not limited to, a virus, bacteria, fungus, parasite, or a toxin thereof, and said autoimmunity is a sequel to an infection or infestation. In one particular embodiment involving parasites the protein which generates an antibody response is the saliva of an ectoparasite. In yet other embodiments the exogenous antigen is found in the environment as a component of a food product or an allergen, or any other environmental protein to which a subject is exposed. In further embodiments, the exogenous protein is a component of a pharmaceutical product, including but not limited to a vaccine, prophylactic or therapeutic drug, either as the active biopharmaceutical constituent thereof or as an excipient. These examples of antigenic proteins are not considered limiting.
The protein in the human proteome bearing the B cell epitope to which said antibody binds, recognizing it as a mimic of the epitope which elicited the antibody, may have one of many different functions. In some instances, the target protein may have a neurophysiologic function, in other instances it may function in cardiovascular systems, including but not limited to endothelial permeability and clotting. In yet further embodiments, the target protein may have urophysiologic, dermatologic, endocrine, or gastrointestinal functions, may involve a particular group of enzymes, or any one of several other physiologic functions the impairment of which results in disease. In order to classify the potential mimics, a series of filters may be applied which comprise groups of key words used in curation of the proteins pertinent to the organ system or physiologic function of interest.
In yet other embodiments, the proteins known to be associated or affected in a given disease may be examined to identify their B cell epitopes and thus provide a panel against which specific pathogens or exogenous antigens may be filtered. For instance, as non-limiting examples, human proteins known to be associated with arthritis or Parkinson's disease may be selected and a panel established against which matches in a protein from an infectious agent of interest may be cross checked. The stringency of selection and identification of the antibody targeted mimicry is determined by the percentage of the ranked probability of B cell binding, first in the protein which gives rise to the antibody, i.e., the exogenous protein, and secondly in the host self protein. In a preliminary screening such levels of stringency may be set to select the top 25% of B cell epitopes in the exogenous protein and the top 40% of B cell epitopes in the target protein. Such selection filters may be increased in stringency to select only the top 10% of the B cell epitopes in the exogenous protein and the top 25% of the target protein's B cell epitopes, or increased or decreased in stringency to whatever level the operator deems appropriate. In particular embodiments, an additional selection criterion is to identify B cell epitopes in the exogenous protein which have closely juxtaposed peptides with high affinity MHC binding providing good T cell help. This in turn is conducive to generation of high antibody titers, immunoglobulin class switching and a higher chance of epitope mimicry occurring. In some instances, the B cell epitope in the exogenous protein is accompanied by peptides binding to one or more MHC alleles; however, in yet other instances the adjacent peptides provide binding to most or all MHC alleles and at high affinity.
This relationship will determine whether antibody mimicry affects all subjects, or occurs only sporadically in those subjects carrying a particular MHC allele. The MHC binding may determine the familial associations of an autoimmune disease.
In some embodiments, the process described herein for identifying antibody mediated epitope mimicry may be applied in the design of a vaccine, or a biopharmaceutical, where targeting antibodies to self-proteins is undesirable. Following identification of epitope mimics which may cause such adverse effects, a vaccine may be designed to mutate or delete said mimics and focus the response only on the desirable antibody eliciting epitopes. The approach described in this invention may also be employed to evaluate a novel biopharmaceutical to identify whether it may have epitopes which will elicit self reacting antibodies. Such an application of the methods can reduce risk, and hence cost and time, and increase safety in the design of a biopharmaceutical because multiple iterations can be evaluated in silico before a clinical trial.
In some particular embodiments, once a target protein of autoimmunity is identified in silico, the information can be used to determine if a particular animal species will form a good preclinical disease model. This is done by comparing the target protein with its counterpart in a proposed animal species to determine their degree of identity, and hence whether the animal protein is representative of the protein in humans. This will aid in the selection of an animal model which can best represent the human species. In one particular embodiment, therefore, the proteome of the mouse, based on the C57BL/6 inbred strain, is used as a comparator to determine which exogenous antigens share B cell epitope mimics with the mouse proteome. In this embodiment, the B cell epitopes of the murine proteome are pre-computed and a set of key word based filters established for the mouse proteome to enable filtering of epitope mimic matches of infectious organisms or environmental or other exogenous antigens with murine proteins that have neurologic, cardiovascular, and other sets of functional groupings. As those skilled in the art will appreciate, as the complete proteomes of other important domestic and laboratory animals are sequenced and annotated, it will become increasingly possible to match epitope mimics in other animal models of interest, such as non-human primates, and thus the example of the murine model is not considered limiting.
In some particular embodiments, the comparison of predicted epitope mimics can shed light on the differences in clinical manifestations arising from infections by different strains or isolates of a given infectious organism, whether viral or bacterial or of other taxonomies. In one particular embodiment, identifying the peptide in the exogenous protein which leads to the immune response and antibodies which ultimately are self-reactive, enables the use of said mimic peptide as a component of an apheresis device in which the peptide binds the antibodies which would otherwise bind to the self-protein.
The methods described herein provide a tool for understanding and responding to antibody mediated autoimmune diseases. It will be apparent to those skilled in the art that the applications are not limited to one autoimmune disease and can be applied to a wide variety of autoimmune diseases and thus none of the examples are considered limiting.
Historically, it was generally assumed that the immune system does not recognize self proteins. It is increasingly recognized that there is an active interaction and overlap between the immune recognition of self and exogenous antigens. There are many instances where the cellular immune system fails to differentiate between recognition motifs, comprising a small group of amino acids occurring in a pathogen, and the same small group of amino acids where they occur in a self-protein (see, e.g., PCT/US2015/039969, the entire contents of which are incorporated herein by reference; see also Bremel et al (5)). However, another sphere of interactions occurs between exogenous proteins, including but not limited to pathogens, and the self-proteins of the human proteome; this is antibody mediated epitope mimicry. Antibody mediated epitope mimicry occurs when an antigenic exogenous protein elicits antibodies that also recognize and bind to an epitope on a self-protein. The binding of an antibody to a self-protein may then inhibit or compromise the functionality or processing of the self-protein. In some instances, the spectrum of clinical signs following microbial infection may be as much, or even more, dependent on the effect of the antibodies elicited by the infectious agent binding to the host proteins as it is on the primary microbial replication. Antibody mediated autoimmune diseases, in which antibodies generated in response to an epitope on a microorganism or other exogenous protein then bind to a self-protein, are notoriously difficult to diagnose, and it can be very difficult to pin down the exact mechanism of pathogenesis leading to the clinical signs. The processes described in the present invention apply bioinformatics tools to greatly facilitate understanding of such antibody mediated autoimmune responses and to permit them to be identified and recognized rapidly.
When applied to a biotherapeutic or vaccine synthetic protein, the in silico screening tools provided herein enable evaluation of potential mimics, thereby reducing the time, costs, and most importantly risks, of waiting for clinical trials. When applied to antibody mediated mimicry arising from natural infection or exposure to an antigenic exogenous protein, the tools described herein enable diagnosis of the pathways of disease and hence provide information critical to designing interventions.
In a related mechanism, the presence of linear B cell epitopes may also reflect the propensity for a protruding and polarized peptide to bind other ligands. In other words, the presence of matching B cell epitopes is simply an indicator of potential interference or blocking between other ligands. The basic components of antibody mediated autoimmune disease are as follows.
An exogenous protein, which may be from any one of a wide range of sources, as noted below, has a group of amino acids which form a B cell epitope. The epitope binds to a B cell and causes that cell to generate antibodies. The antibodies thus generated recognize a B cell epitope on a self-protein and preferentially bind to it, impeding the function or processing of the protein.
The exogenous protein may be from a microorganism, including but not limited to a virus, a bacterium, a parasite, or a fungus, or may be a toxin generated by a microorganism. These taxonomic descriptions are intended to be descriptive examples, and are not considered limiting. It may be a synthetic or attenuated microbial protein intended to be introduced into the host as a vaccine. In other embodiments the exogenous protein may be a biopharmaceutical protein, such as a monoclonal antibody or a monoclonal antibody-based product, comprising part or all of an immunoglobulin. In some particular instances an excipient incorporated in a pharmaceutical formulation may be the source of the exogenous protein which elicits antibodies. In yet others it may be an allergen or another environmental protein. Such examples provide orientation but are not intended to limit the definition of exogenous protein.
The titer of antibodies elicited by the exogenous protein will in part determine how much of the host protein is bound by antibodies, and to what degree its function is compromised, and hence the degree of clinical effect. If a B cell epitope is immediately flanked by a peptide of high MHC affinity, the chance of a strong T helper effect is increased (6). T cell help is also essential to bring about immunoglobulin class switch. The occurrence of IgG and not just IgM may be a deciding factor in antibody mimicry. For instance, IgG will cross the human placenta and may bind to proteins in the fetus, whereas IgM will not. MHC binding peptides, taken up at the B cell synapse at the time of B cell epitope binding, will be those most likely to be presented by the B cell to T cells and elicit T cell help (7, 8). Hence those peptides close to the B cell epitope will be those most likely to provide specific help. Therefore, a further consideration in identifying B cell epitopes which may elicit antibodies that bind to epitope mimics is to also determine if there is an adjacent MHC binding peptide. In some cases, such MHC binding may be of high affinity for many alleles of MHC II. In other instances only a few alleles provide such T cell help. Therefore, a further aspect of the process described herein is to identify which alleles may lead to most risk of developing an antibody mediated autoimmunity. In this way a sub population of individual subjects who are most at risk can be identified. Importantly, this relationship is between the host MHC and the exogenous protein. In the host protein that is the target of the antibody binding, MHC binding is unlikely to play any role in determining whether the antibody will bind.
At some minimal level, such antibody mediated “off target binding” to mimics on self proteins occurs very frequently, is the norm, and occurs across the diversity of antibodies that a subject generates. This is inevitable given the relatively narrow number of different options in specificity. If a pentamer is considered as the core of the B cell epitope then only 20^5, or 3.2 million, possible configurations exist. If the recipient epitope on the host protein is also a pentamer, likewise with 3.2 million possibilities, then the chance of a match is 1 in 20^5×20^5, or approximately 1 in 10^13. Whether such binding has any clinical relevance is dependent on the titer of antibody, and thus how much of the host protein gets bound, the isotype of the immunoglobulin, the affinity with which binding occurs, and in particular, the function of the host protein. Most of the time such binding has no clinical impact whatsoever; it is diverse, it is at low levels and transient, and it impacts proteins which are not on a critical metabolic path. Where high titer antibody and essential host protein function both occur, the clinical signs may become evident. This may be the case following a burst of antibody production after an acute infection or exposure.
There are many examples in which antibody mediated mimicry has been described and is well known to the art. There is rapidly increasing awareness of the role of antibodies in autoimmunity. Among the most recently reported antibody mediated autoimmune interactions are a relationship between seropositivity to West Nile virus and myasthenia gravis (9), interaction between certain antibodies to herpes simplex virus and alpha-synuclein, a critical component of the Lewy bodies of Parkinson's disease (10), and the demonstration that antibodies to dengue cross react with von Willebrand factor (11). Further, enteroviruses have been shown to exert neuropathologic effects through antibody mediated binding (12).
Guillain Barré syndrome (GBS) is a clinical syndrome of multiple autoimmune etiologies, which involve idiopathic peripheral neuropathy leading to acute flaccid paralysis. The clinical course of GBS varies; 25% of patients require artificial ventilation (days to months), 20% of patients remain non ambulatory at 6 months and 3-10% of patients die despite standard of care treatment. In medical care environments where ventilatory support is not readily available, GBS mortality is often much higher. Globally, annual GBS incidence is estimated at 1.1 to 1.8/100,000/year, of which approximately 70% of cases appear to be associated with antecedent infectious disease and to be the product of antibody mimicry. Other cases of GBS arise from cell mediated autoimmunity. Infections leading to GBS are typically gastrointestinal or respiratory. Campylobacter jejuni infections are among the most common infections which lead to GBS. This is seen as a sequel especially after severe C. jejuni diarrhea (13, 14). As we show in the examples cited below, epitope mimicry may play a wider and under recognized role in pathogenesis.
A particular embodiment in which antibody mediated autoimmunity may cause additional problems is during pregnancy when the fetus is also exposed to the antibodies. The human placenta, unlike that of many species, is very efficient in transfer of IgG to the fetus. Placental transfer of immunoglobulins to a fetus prior to blood brain barrier formation can be detrimental to the fetus. The human placenta facilitates the transfer of IgG, but not IgM; this transfer is mediated by FcRn and increases during the second trimester (15). IgG1 and IgG4 are most efficiently transferred. Approximately 10% of maternal IgG is thought to pass into the fetal circulation, starting as early as week 13 (16). The fetal blood brain barrier (BBB) is not fully developed until the third trimester and indeed may preferentially transfer proteins to the fetal brain (17, 18). Thus, the literature suggests that the developing CNS is exposed to maternal antibodies in the first two trimesters. There is clearly precedent for autoimmune diseases caused by the transplacental passage of antibody, including pemphigus, myasthenia gravis, and lupus (16, 17, 19). Transplacental antibody has also been implicated in autism spectrum disorders (20). In dengue infection maternal antibodies transfer to the fetus, achieving a level determined by maternal antibody titer (21). Fetal titer may actually exceed maternal titer, suggesting an active transfer process, without direct adverse effects on the fetus being reported until ADE following post-natal dengue infection (22). In one embodiment, therefore, this invention addresses the understanding of autoimmunity in the fetus arising from maternal antibodies and the detection of immunogens that can result in antibodies in the mother that cross the placenta. Antibody binding to proteins critical to fetal development at key time windows in development may result in teratogenic defects.
Understanding this antibody transfer pathway is essential to development of products, including vaccines and biotherapeutics, intended to be administered to pregnant women.
Cytomegalovirus and rubella are both viral infections which cause congenital abnormalities, in some cases evident at birth and in other cases developing during childhood. While in both cases virus may be isolated from the fetus and there is no question that direct pathology arises from such viral replication, there is still a lack of understanding of the pathogenesis of much of the teratologic effect seen (23, 24). In one embodiment of the present invention, the role of antibody mediated epitope mimicry is shown, in which the membrane proteins of cytomegalovirus are predicted to generate antibodies which are reactive with, among others, the NAV2 neural navigator protein needed for neurite elongation in early fetal development (25, 26). Notably, secondary infections with cytomegalovirus are associated with a rise in antibodies to membrane protein glycoprotein B. In another embodiment we show that similar antibodies are generated in response to rubella envelope protein 2. Remarkably, it has been noted that babies born with more severe sequelae of rubella in utero infection have higher titers of antibody to rubella (27-29).
This is similar to the predicted antibody mimicry following Zika virus infection (see, e.g., copending applications 62/292,964; 62/290,616 and 62/286,779, each of which is incorporated by reference herein in its entirety). Zika virus has a pentamer epitope in its envelope protein Domain III that is predicted to generate antibodies which also bind to proNeuropeptide Y and, in Asian Pacific strains also has a Domain I envelope protein epitope, antibodies to which are also predicted to bind NAV2 and affect fetal growth and also impact retinal development, leading to the combination of clinical signs now recognized as Zika fetal syndrome. It will be apparent to those skilled in the art that grossly evident fetal malformation may be the “tip of the iceberg” and that lower titers of antibody transferred transplacentally may compromise fetal development to a lesser degree, leading to signs, such as the deafness, that may appear years after birth of a child exposed to rubella infection in utero, or which may manifest themselves as behavioral changes.
It is evident therefore that there is great need to be able to identify with greater precision and efficiency the exact pathways leading to autoimmunity in order to determine methods of intervention and to avoid off-target adverse responses in the development of biotherapeutics.
In one embodiment therefore, the present invention addresses researching the pathogenesis of autoimmune diseases to identify the epitope mimics leading to antibody mediated autoimmune responses in order to design interventions and avoid safety risks. This information can then be used in the design of vaccines and therapeutics in which key mimic epitopes are mutated out. In a parallel embodiment it follows that, once a new epitope amino acid motif has been created by mutation of a known epitope mimic, the process must be repeated and the replacement pentamer motif checked against the proteome to make sure that a new cross reactive epitope mimic motif has not been created in the process.
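The re-check described above can be illustrated with a minimal sketch. It assumes a single residue substitution and a pre-computed set of proteome pentamers; the function names, the example sequence and the example pentamer set are hypothetical and are not taken from the methods of this specification. The sketch relies only on the observation that a single substitution alters every pentamer overlapping the mutated position (up to five), so each of those must be looked up again.

```python
from typing import List, Set, Tuple

def pentamers_touching(sequence: str, position: int, k: int = 5) -> List[str]:
    """Return every k-mer of `sequence` that overlaps 0-based `position`."""
    start = max(0, position - k + 1)
    end = min(position, len(sequence) - k)
    return [sequence[i:i + k] for i in range(start, end + 1)]

def recheck_after_mutation(sequence: str, position: int, new_residue: str,
                           proteome_pentamers: Set[str]) -> Tuple[str, List[str]]:
    """Apply a substitution and list newly created pentamers found in the proteome set."""
    mutated = sequence[:position] + new_residue + sequence[position + 1:]
    created = pentamers_touching(mutated, position)
    return mutated, [p for p in created if p in proteome_pentamers]

# Hypothetical inputs for illustration only.
proteome_pentamers = {"DEWGH", "LLLLL"}
mutated, hits = recheck_after_mutation("ACDEFGHIK", 4, "W", proteome_pentamers)
print(mutated, hits)
```

In this sketch every pentamer overlapping the substitution is checked; in practice the mutated sequence would presumably also be re-scored for predicted B cell epitope probability before any match is treated as a candidate mimic.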
In a particular embodiment, the present invention addresses screening of a new biotherapeutic to identify potential epitope mimics. The invention provides a rapid way in which many biotherapeutics in early development can be screened in silico to anticipate adverse reactions which can arise from antibody mediated autoimmunity, and to identify epitope mimics. A particular reason why this is a major savings in cost and time is that the invention enables screening against the whole proteome of the human, and all isoforms of any protein therein. As not all isoforms occur in any single individual it is possible that early clinical trials would not detect all possible adverse effects from epitope mimics. Further in silico analysis by the methods described herein allows evaluation for all MHC alleles, identifying those individuals most likely to generate a high titer of antibody due to the T cell help. A further motive to apply the invention described herein, is that animal models may not detect epitope mimic effects. This is because, in addition to the MHC differences between hosts, where the host protein to which antibodies bind differs by as little as a single amino acid in the animal model species, there may be no antibody mediated mimic effect detected in the animal model. Thus a potential adverse effect could go unnoticed until the biotherapeutic or vaccine enters clinical trials in humans.
Another embodiment of the present invention is to assist in designing therapies for antibody mediated autoimmune diseases. If the peptide that forms the target of the antibody binding the host protein is identified, then this peptide can be deployed to bind the problem antibody. This could be done by administration of the peptide to the subject in a pharmaceutical preparation, or ex vivo by inclusion of the peptide in a plasmapheresis system, or similar exchange system, to bind and remove the antibodies of concern.
Given the differences between the proteomes of human and other species, the occurrence of epitopes in the host proteome matching that of a given exogenous antigen will be species dependent. There is ongoing concern about the inability of animal models to accurately predict the pathogenesis of diseases in humans. This is a particular concern when animal models are used to assess the safety of therapeutics or vaccines, only to find that they do not fully replicate what is seen in human clinical trials. In another embodiment therefore the present invention examines the differences in epitope mimics between human and murine models. As other species may be used as animal models, and as their proteomes are fully annotated, the example of the murine model can be extended to other species of interest. Furthermore, having used the invention described herein to identify potential epitope matches in the human, and using this peptide sequence as guidance, the presence or absence of the same epitope mimics in other species of interest such as non-human primates can be assessed by interrogating for the identical peptide in the proteome of that species.
The processes we describe herein utilize the ability to predict probable B cell epitopes and to predict MHC binding affinity, which we have described in copending application PCT US2011/029192, incorporated herein by reference in its entirety. The present invention then provides an appropriate set of selection filters to establish a stringent selection system, and a system for interrogating the large human proteome database for matches. The stringency filters are applied at two levels. First, it is necessary to determine which of the linear epitopes in an exogenous protein are most likely to generate a strong B cell response and to elicit antibodies at high titer. The algorithms developed permit an initial screen, for instance using the top 25% of linear epitopes in the exogenous protein most likely to elicit antibodies. This filter can be made less stringent, or more stringent, to select only the top 10% or only the top 5% of the probable B cell epitopes. Second, in a preferred embodiment, the initial screen of potential antibody binding sites in the proteome protein would typically define the top 40% most probable antibody binding sites in each protein of the human proteome, but likewise can be set to be more or less stringent. This selection criterion can be changed to the top 30% or 20% as desired. The appropriate cutoff will depend on the circumstances; very low levels of mimic binding antibody may be problematic in the fetus whereas much more stringent cutoffs may be adequate for adults.
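As an illustration of the two-level stringency filter, the following sketch ranks candidate epitopes by a score and keeps a chosen top fraction. It assumes that each candidate pentamer already carries a numeric score in which a higher value means a more probable B cell epitope; the scoring itself, the function name `top_fraction` and the example values are hypothetical and stand in for the prediction methods referenced above.

```python
import math
from typing import List, Tuple

def top_fraction(scored: List[Tuple[str, float]], fraction: float) -> List[Tuple[str, float]]:
    """Keep the highest-scoring `fraction` of (pentamer, score) pairs.

    Assumes a higher score means a more probable B cell epitope.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    keep = math.ceil(len(ranked) * fraction)
    return ranked[:keep]

# Hypothetical scores for illustration only.
exogenous = [("AAAAA", 0.9), ("CCCCC", 0.1), ("DDDDD", 0.5), ("EEEEE", 0.7),
             ("FFFFF", 0.3), ("GGGGG", 0.8), ("HHHHH", 0.2), ("IIIII", 0.6)]
host = [("AAAAA", 0.4), ("KKKKK", 0.9), ("LLLLL", 0.1), ("MMMMM", 0.3), ("NNNNN", 0.2)]

exogenous_top = {p for p, _ in top_fraction(exogenous, 0.25)}  # preliminary 25% cutoff
host_top = {p for p, _ in top_fraction(host, 0.40)}            # preliminary 40% cutoff
print(sorted(exogenous_top & host_top))
```

Tightening the screen amounts to lowering either fraction, for example 0.10 for the exogenous protein and 0.25 for the host protein.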
The following examples provide illustrations of the above embodiments.
Building on the methods described in PCT US2011/029192, incorporated herein by reference, which enable the prediction of a B cell epitope in a protein of interest, we established a work flow for identifying core pentamer peptides in a source protein of interest, for instance a viral protein, and then detecting matches of these peptides in human proteins in which B cell epitope core pentamers have been previously computed. Proteins in the human proteome are curated as to their functions based on information in UniProt (30). This allows a set of search terms to be applied to extract sets of proteins from the overall proteome database based on key words.
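The matching step of this work flow can be sketched as a lookup against a precomputed index. The following is a minimal illustration only, assuming the core pentamers of both the source protein and the proteome proteins have already been computed; the function names and the protein identifiers `HUMAN_X` and `HUMAN_Y` are hypothetical placeholders, not UniProt accessions.

```python
from collections import defaultdict

def build_pentamer_index(proteome_pentamers):
    """Map each core pentamer to the set of proteins in which it occurs."""
    index = defaultdict(set)
    for protein_id, pentamers in proteome_pentamers.items():
        for pentamer in pentamers:
            index[pentamer].add(protein_id)
    return index

def find_matches(source_pentamers, index):
    """Return the source pentamers that also occur in the proteome index."""
    return {p: sorted(index[p]) for p in source_pentamers if p in index}

# Placeholder proteome entries (hypothetical identifiers and assignments).
proteome_pentamers = {
    "HUMAN_X": ["ESTEN", "GEDAP"],
    "HUMAN_Y": ["GEDAP"],
}
index = build_pentamer_index(proteome_pentamers)
print(find_matches(["ITEST", "TESTE", "ESTEN"], index))
```

Building the index once allows any number of source proteins to be screened against the same proteome without recomputation.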
In computing the predicted probable B cell epitopes, a sliding 9-mer window is used. For comparative purposes the pentamer central core of the 9-mer is used. A pentamer is chosen because it not only provides a very stringent filter but also corresponds to the area needed to engage the paratope of an antibody (31). While an antibody may engage a smaller number of amino acids (as few as 3 may be sufficient), it was determined by experimentation that using a pentamer as the core peptide provided a filter with sufficient stringency to identify matches to a meaningful number of human proteins. While B cell epitopes may be conformational, comprising amino acids in different strands of a sequence that are juxtaposed by folding, the simplest form of B cell epitope is a linear sequence. Therefore pentamer motifs analyzed in identification of mimic matches may be linear or may comprise conformationally juxtaposed amino acids brought together by folding.
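The relationship between the sliding 9-mer window and its central pentamer core can be shown with a short sketch. It is illustrative only: the function name `core_pentamers` is hypothetical, positions are 0-based, and no epitope scoring is performed. The input is the 11-residue Zika envelope loop sequence PVITESTENSK referred to elsewhere in this description.

```python
def core_pentamers(sequence, window=9, core=5):
    """Yield (core_start, pentamer) for every `window`-length slice.

    The core is the central `core` residues of each window; positions
    are 0-based offsets into `sequence`.
    """
    offset = (window - core) // 2
    for start in range(len(sequence) - window + 1):
        nine_mer = sequence[start:start + window]
        yield start + offset, nine_mer[offset:offset + core]

for position, pentamer in core_pentamers("PVITESTENSK"):
    print(position, pentamer)
# 2 ITEST
# 3 TESTE
# 4 ESTEN
```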
To implement the search for matches between a protein of interest and the human proteome we implemented the following workflow, described here as for a viral protein but identically applicable to any protein of interest.
This process provides a highly selective set of filters. Any given pentamer has a 1 in 20⁵ chance of occurrence (5 positions, each occupied by one of 20 amino acids; a 1 in 3.2 million chance). When this probability is applied independently to both the Zika viral proteins (a polyprotein of 3423 amino acids) and the human proteome sets, there is a 3423/(20⁵×20⁵) chance of a match, or about 3.3×10⁻¹⁰. This probability is further reduced by application of the BEPI and keyword filters, but is increased because the proteome comprises multiple similar isoforms of some proteins and because some repetitive pentamers may occur in the virus. Progressively greater stringency may be applied to identify B cell epitopes most likely to elicit antibodies and most likely to become host targets of such antibodies.
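The arithmetic can be checked directly. The sketch below simply evaluates the expression given in the text, read here as 3423 divided by the product of 20⁵ and 20⁵; the variable names are illustrative, and the calculation does not model the isoform or repeat effects mentioned above.

```python
AMINO_ACIDS = 20
PENTAMER_LENGTH = 5
ZIKA_POLYPROTEIN_LENGTH = 3423  # residues, as stated in the text

# Number of distinct pentamers: 20 choices at each of 5 positions.
pentamer_space = AMINO_ACIDS ** PENTAMER_LENGTH

# Expression from the text: 3423 / (20^5 x 20^5).
match_chance = ZIKA_POLYPROTEIN_LENGTH / (pentamer_space * pentamer_space)

print(pentamer_space)          # 3200000
print(f"{match_chance:.2e}")   # 3.34e-10
```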
In a further independent evaluation step of the viral proteins, the adjacency to probable BEPIs of predicted high affinity MHC binding of 15mers which may stimulate T cell help is determined. T cell help will not change antibody binding but may stimulate a higher titer. This selection process is discussed in further detail in the methods.
In the particular work flow described above we were interested in proteins of neurologic function. Therefore a key word list was assembled to identify proteins with these functions, as shown in Table 1.
Similar lists may be developed to capture matches in proteome proteins with other functions, for instance the blood clotting cascade or pancreatic function. The key word list can be customized according to the circumstances and the protein of interest to focus the search for potential epitope mimics. In some cases the key word list may be selected based on the clinical signs of a particular disease, thus in jaundice a key word list would include the interactome of liver function.
Alternatively, the list of core pentamers located in BEPIs in the human proteome may be screened in its entirety to identify any protein in which a problematic mimic relationship may exist. This “all matches” approach allows the identification of B cell epitope mimics in proteins not identified by key word annotations in UniProt. This is a particularly appropriate approach for any new biologic in development. It is also a desirable approach in comparing two exogenous proteins which differ only by one or two mutations, to determine what new mimics may have been created by mutation.
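The choice between a key word subset and the “all matches” approach can be sketched as a single selection function. This is a minimal illustration, assuming the proteome annotations are available as free text per protein; the function name, the identifiers `P_A`, `P_B` and `P_C`, and the annotation strings are hypothetical and do not reproduce Table 1.

```python
def select_by_keywords(proteome, keywords=None):
    """Return identifiers whose annotation mentions any key word.

    Passing keywords=None selects every protein ("all matches").
    Matching is a case-insensitive substring test.
    """
    if keywords is None:
        return set(proteome)
    lowered = [word.lower() for word in keywords]
    return {
        protein_id
        for protein_id, annotation in proteome.items()
        if any(word in annotation.lower() for word in lowered)
    }

# Hypothetical annotations, for illustration only.
toy_proteome = {
    "P_A": "Neuropeptide; synapse",
    "P_B": "Blood coagulation",
    "P_C": "Liver enzyme",
}
print(select_by_keywords(toy_proteome, ["neuro", "synap"]))  # {'P_A'}
print(len(select_by_keywords(toy_proteome)))                 # 3
```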
Ebola is an infection characterized by hemorrhagic lesions in all major organs. We were interested to determine whether antibody mimicry may be contributing to the pathogenesis of the clinical disease. Following the procedure laid out in Example 1 we computed the B cell epitope probabilities in the proteins of the West Africa 2014, Mayinga, Bundibugyo and Musoke strains of Ebola and Marburg viruses. However, instead of searching for pentamer BEPI matches in the human proteome based on neurologic key words as illustrated in Example 1, we used a key word search comprising the terms shown in Table 2 below.
This identified an array of pentamers in each of the key proteins that elicit the primary immune response which are indicative of antibody mediated mimicry which could contribute to the vascular and hemorrhagic signs. In Tables 3-6 we summarize those results for the 2014 West African isolates of Ebola virus and for the spike protein, small soluble glycoprotein, VP24 and VP40.
This provides an initial screening to identify the human proteome proteins of interest as potential targets of antibody mediated mimicry in Ebola virus.
It has been known for decades, since the beginning of development of cell culture attenuated mumps virus vaccines, that certain strains of mumps virus retain their neurovirulence and that testing in animal models is not always a reliable detector of neuroattenuation (32). Neuroattenuation has been attributed to various mumps virus proteins and to specific single amino acid changes therein (33, 34; Cui et al., PLOS One, 2013; Malik et al., J Gen Virol, 2009; Lemon et al., J Virol, 2007; Shah et al., J Med Virol, 2009). We therefore selected several strains of mumps virus for which the characteristics of neurovirulence have been experimentally evaluated. These included the strains shown in Table 7.
In this case the analysis as described in Example 1 failed to find any pentamer matches peculiar to the known neurovirulent strains as compared to the avirulent strains in Table 7. Jeryl Lynn did have a number of pentamer matches to the proteome that differed from the other strains; this may reflect its extensive in vitro passage history.
In order to evaluate the screening process on monoclonal antibody products we consulted a database of commercially developed monoclonal antibodies and downloaded sequences for brodalumab. Brodalumab, an anti-interleukin 17 receptor antibody, was developed for treatment of psoriasis. It was effective in control of psoriasis but withdrawn from clinical trials because of an association with suicide and suicidal thoughts (Danesh MJ, Kimball AB, J Am Acad Dermatol, 2016; see also Wikipedia.org/wiki/brodalumab). We addressed two questions: what makes brodalumab different from other monoclonal antibody products, and does it have any neurologic mimics which offer any indicators of behavioral changes? In parallel, we evaluated rituximab as an example of a monoclonal antibody which is well tolerated.
In order to produce a clinical result differing from other monoclonal antibodies Brodalumab would have to contain a different set of pentamer motifs from other antibodies, or at least a rare set in a different context relative to B cell epitope characteristics and associated MHC II binding peptides. Necessarily such a motif would lie in the variable region or in any part of the constant region which has been engineered.
To examine this we looked at the entire sequences of heavy and light chain, and noted especially the variable region of both heavy and light chains of the product, comprising the N terminal 150 amino acids, to identify rare pentamer motifs. We set the threshold from a previously computed database of antibodies (see, e.g., PCT US2011/029192). Briefly this database comprises 45,000 heavy chain variable regions retrieved from NCBI Protein resource with a search argument “(immunoglobulin heavy chain variable region) AND (Homo sapiens)”. Various search arguments were used to extract non-redundant subsets (by Genbank accession number) that were either immunoglobulin class-defined, or to eliminate sequences for which the metadata attached to the accession indicated association with an immunopathology (lymphoma, leukemia, lupus, rheumatoid arthritis, multiple sclerosis). Manual curation was used to remove sequences that were obviously not immunoglobulins. The final dataset thus included 39,957 non-class-defined immunoglobulins, not associated with immunopathology. The resulting dataset comprises many different accession groups from studies carried out over a considerable period of time so can be considered a representative sample of “natural” human immunoglobulins. Accessions with signal peptides were identified and signal peptides removed using the combined signal peptide and transmembrane predictor Phobius (phobius.sbc.su.se). IGHV were included in the final set if they contained at least 80 amino acids, a value approximating the shortest germline equivalent sequence. All sequences longer than 130 amino acids were truncated at that point. The approximate positions of the three complementarity determining regions (CDR) have been indicated in
Secondly we computed the B cell epitope pentamers of brodalumab and rituximab and compared these to our precomputed database of human proteome pentamers (as described above). A key word search was conducted to identify proteins with neurologic function, using the key words in Table A above. For brodalumab this identified 496 pentamer matches, inclusive of all isoforms; for rituximab, 560 pentamer matches were identified. When this was filtered to identify those in which the predicted probability of B cell epitopes was in the top 25% for brodalumab and in the top 40% of the proteome neurologic subset, 77 heavy chain and 69 light chain matches were identified for brodalumab, inclusive of multiple isoforms. For rituximab we identified 67 heavy chain and 69 light chain matches, inclusive of multiple isoforms.
This focused our attention on five motifs which are unique to brodalumab, all of which are in the heavy chain. Table 9 shows the affinity of these motifs in both brodalumab and the proteome as well as their position in the monoclonal antibody.
Only two motifs RSTSE and overlapping STSES show high BEPI probability (<−1.4) and are located in the variable regions. Positions 134 and 135 are near the C terminus of the variable region and the motifs of interest may have been created as a function of the engineering of the variable region on to the constant region. As shown in
In the case of rituximab, as shown in Table 10A, the BEPI probabilities are lower and the motifs are in the constant regions, except for one motif located at position 43 of the light chain.
The two human proteins identified as unique matches in brodalumab, Myoneurin and Myelin protein zero-like protein 1, are probable mimics and, depending on the function of these two proteins, would be candidates for investigation to determine their possible contribution to the neurologic changes seen in subjects.
When a search of all possible human proteome epitope mimics is conducted for the pentameric motifs that are high probability B cell epitopes in brodalumab but absent from rituximab, a further 344 possible proteins are identified which contain epitope mimics. Some have a function in neurologic pathways. These provide a second tier of proteins which should be examined for possible contributions to pathways leading to suicidal tendencies.
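The comparison of one product against a comparator can be sketched as a filtered set difference. This is a hedged illustration only: it assumes a score per motif in which values below −1.4 indicate high BEPI probability, and the function name, motifs and scores shown are hypothetical placeholders rather than the brodalumab or rituximab results.

```python
def unique_high_probability(product, comparator, threshold=-1.4):
    """Motifs scoring below `threshold` in `product` and absent from `comparator`.

    `product` maps motif -> score; `comparator` is any collection of motifs.
    """
    return sorted(
        motif
        for motif, score in product.items()
        if score < threshold and motif not in comparator
    )

# Hypothetical motifs and scores, for illustration only.
product_scores = {"AAAAA": -1.9, "CCCCC": -1.6, "DDDDD": -0.2}
comparator_motifs = {"CCCCC", "EEEEE"}
print(unique_high_probability(product_scores, comparator_motifs))  # ['AAAAA']
```

The resulting motifs could then be looked up in the full proteome pentamer set rather than a key word subset, which corresponds to the second tier of proteins described above.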
The surface proteins E1 and E2 and the capsid protein of ten strains of rubella virus were analyzed following the steps laid out in Example 1. The same key word search pattern was used as described in Example 1 to detect neurologic function proteins. Table 10B shows the results for one exemplary isolate (Br1). Where more than one isoform of the human protein exhibited a match, only one example is included in the table in the interests of space.
Cytomegalovirus is a large virus comprising over 200 proteins, of which over 130 are structural proteins. However, a large proportion of the virus by weight is made up of the surface membrane glycoproteins, which are exposed to the host immune system and engender the majority of the antibody response. In secondary infections with cytomegalovirus, a rise in antibody to glycoprotein B is particularly noted. While all proteins were analyzed, we report here on the results from the principal membrane glycoproteins. Further, in the interests of space, only results for glycoprotein B are shown in Table 11.
The procedure described in Example 1 was followed in the case of Zika virus. Predicted antibody mimics were defined in each of the viral proteins. Table N shows the predicted mimics identified in the structural proteins of Zika virus as well as whether the motif is present in both African and American strains. The occurrence of mimics in the proNPY and NAV2 proteins is consistent with the appearance of Guillain-Barré syndrome and other neurologic deficits experienced by infected individuals. In addition, the interaction with NPY and with NAV2 at a critical point in fetal development may be the basis for the developmental failures, the most obvious of which is microcephaly.
In the case of the Zika envelope protein, a conserved feature which is not seen in other flaviviruses is a band of high affinity MHC II binding immediately adjacent to the sequence which forms the domain II loop DE. This loop is the location of the sequence PVITESTENSK, which encompasses several of the mimic peptides listed in the above table. The juxtaposition of high MHC II binding, and hence T cell help, favors the development of higher titers of antibody and class switching of the immunoglobulins, which may accentuate the autoimmune consequences.
As discussed in Example 5 above, anti-Zika antibody mediated mimics target proNeuropeptide Y through the motif ESTEN. We were interested to know which species in addition to humans would be affected by this mimicry. We therefore searched UniProt to determine the sequence composition of proNPY for multiple species. Table 13 summarizes the findings for a subset of species.
Species examined in Table 13: Sus scrofa, Oryctolagus cuniculus, Equus caballus, Felis catus, Macaca mulatta, Canis familiaris, Bos taurus, Ovis aries, Rattus norvegicus, Mus musculus.
Among the species examined, only non-human primates and rats and mice carry the ESTEN motif which is predicted to be targeted by the anti-Zika envelope antibodies. Thus other animal species infected by Zika would not experience neurologic impacts due to binding of CPON. On the other hand the motif GEDAP found in dengue 3 is conserved across all the species evaluated.
The implication of this finding is that testing of a mimic in a species other than humans, non-human primates and certain rodents would yield experimental results which would not provide useful information relative to the impact of antibody mediated mimics in man. This underscores the importance of applying computational screening to select appropriate animal models for diseases or to test novel protein biopharmaceuticals and vaccines. The above example applies specifically to Zika, but other species distributions of critical motifs would be expected for other proteome proteins which constitute the antibody mimic targets of antibodies elicited by other antigens.
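Such a computational screen of candidate animal models can be sketched as a simple search for the identical peptide in the ortholog of each species. The example below is illustrative only: the function name is hypothetical, and the sequences are short invented placeholders labelled `species_1` to `species_3`, not proNPY sequences from UniProt.

```python
def species_with_motif(orthologs, motif):
    """Return the species whose ortholog sequence contains `motif` exactly."""
    return sorted(
        species for species, sequence in orthologs.items() if motif in sequence
    )

# Invented placeholder sequences, for illustration only.
toy_orthologs = {
    "species_1": "MKLESTENAA",
    "species_2": "MKLESAENAA",   # one substitution abolishes the exact match
    "species_3": "GGESTENKK",
}
print(species_with_motif(toy_orthologs, "ESTEN"))  # ['species_1', 'species_3']
```

Only species returned by such a check would be expected to reproduce the mimic relationship of interest, under the assumption that an exact pentamer match is required.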
Dengue is well known as a hemorrhagic disease, with dengue hemorrhagic fever occurring most typically following a second infection with a different serotype from the first infection. While for many years the role of antibody dependent enhancement (ADE) has been cited as a cause for this (35), there is increasing evidence that dengue does evoke an autoimmune response (36), that von Willebrand factor may be depleted (37), and that other clotting factors may be affected (38, 39). Most recently the NS1 protein has been implicated as leading to vascular permeability in dengue (40, 41) and activating Toll-like receptor 4, and several possible direct viral pathogenic mechanisms have been described. However, the most serious vascular leakage in dengue hemorrhagic fever occurs after the peak of NS1 has declined, suggesting that a direct role of NS1 may not be the only factor (42). In particular embodiments of the present invention, a subset of the human proteome was selected to include those proteins which have a function in the cardiovascular system, including structural proteins found in endothelium, platelets and erythrocytes, enzymes expressed by these cells, and coagulation cascade proteins. In the present invention, we describe the role of NS1 in dengue in eliciting autoantibodies to various proteins with cardiovascular function, including but not limited to coagulation factors V and VIII, prothrombin, von Willebrand factor, ADAMTS13 (A disintegrin and metalloproteinase with thrombospondin motifs 13), platelet glycoprotein Ib beta, vascular endothelial growth factor, vascular endothelial growth factor receptor and platelet endothelial aggregation receptor. Notably, no such epitope matches in cardiovascular function proteins clearly linked to hemorrhage and thrombocytopenia occur in the corresponding proteins of West Nile virus.
In particular embodiments we describe the precise B cell epitopes which are mimics, thereby enabling the mutation or removal of such epitopes to reduce adverse effects in a vaccine.
Infection with Zika virus has led to the development of deadly thrombocytopenia (43, 44). In even mild cases of ZIKV, USUV, or dengue infection, an erythematous rash is a typical clinical sign. Epitope analysis of NS1 was conducted for an array of flaviviruses including four serotypes of dengue, yellow fever, Zika virus and Usutu virus, as well as St Louis encephalitis, West Nile, Japanese encephalitis, and tick-borne encephalitis. Particular attention was focused on the C terminal loop of NS1 lying between amino acids 280 and 329, bounded by cysteine residues, and more particularly between 290 and 311, likewise bounded by cysteine residues. This region in every flavivirus examined contains not only strong predicted B cell epitopes, but also a region of high MHC II binding for multiple alleles, as shown in Table 14 below.
Analysis was then conducted on the NS1 proteins as described in Example 1 to compare predicted B cell linear epitopes to the predicted B cell linear epitopes in the proteins of the human proteome which have a function related to cardiovascular function. Human proteins were selected for inclusion in this comparison if they were annotated in UniProt with one of the key words shown in Table 15 indicative of a function in cardiovascular physiology or vascular endothelial integrity.
Peptide pentamer motifs were identified in flaviviruses which matched pentamer motifs in the cardiovascular protein set, where in both cases the pentamer occurred in a predicted linear B cell epitope. The resulting list was manually curated to exclude proteins which contained terms such as “domain containing” and to identify the proteins actually verified as related to or expressed in blood coagulation, platelets, endothelial cells and erythrocytes.
Accession numbers of viruses used in identifying these were as shown in Table 16. Additional strains/isolates of all were used to evaluate conservation. Table 17 shows peptides found in dengue, Zika, and Usutu virus NS1 which have mimics in the human cardiovascular set proteins and which fulfill the B cell epitope criteria.
Some of these mimics may vary depending on the strain of dengue virus, and it will be clear to those skilled in the art that adjustments may be needed on a geographic basis or over time to adapt to changes in mimics which may affect clinical outcome. However, in particular it was noted that all dengue viruses contained a conserved motif SLRTT located in the stable C terminal loop of NS1 between two cysteine bonds (45) at positions 290-311 of the NS1 protein, which corresponds to a motif in the C terminal region of ADAMTS13. ADAMTS13 is expressed in endothelial cells and is essential to cleavage of von Willebrand factor. A deficiency of ADAMTS13 is associated with accumulation of multimers of von Willebrand factor, intravascular platelet aggregation, and thrombocytopenia, both congenital and acquired (46, 47). Other motifs were found in coagulation factors V and VIII, von Willebrand factor and platelet glycoprotein Ib beta, which is also associated with acquired autoimmune thrombocytopenia (48) and is expressed in both platelets and endothelial cells. Notably, these epitope mimic motifs for cardiovascular function proteins are not present in West Nile virus.
Development of transient autoimmunity to these motifs may arise on initial dengue infection but be exacerbated on re-exposure to a further dengue serotype, potentially further boosted by antibody dependent enhancement, thereby contributing to hemorrhagic signs characteristic of dengue hemorrhagic fever. It would be beneficial to remove such epitopes in a vaccine containing NS1 to preclude sensitization to an anamnestic autoimmune response on exposure to wildtype virus of any of the dengue serotypes.
Diagnosing the basis of mimicry in an antibody mediated autoimmune disease where the initial exogenous driver of immunity and antibody development is not known is a complex task. As indicated in some of the preceding examples, the challenge is to identify the commonality between B cell epitopes in an exogenous protein, which may be unknown at the time of patient presentation, and a B cell epitope in a human protein, dysfunction of which is leading to the clinical signs, directly or indirectly. In one approach to this challenge, a microarray is prepared which displays peptides to which antibodies from the subject will bind. As the total number of possible pentamers comprising core peptides of B cell linear epitopes is 3.2 million, in an ideal situation all 3.2 million would be arrayed. This has practical limitations, and therefore a subset may be selected based on the presenting clinical signs, or an array of longer peptides, for instance 15mers or 20mers, can be used, each of which comprises multiple pentamers which can be further dissected. Identification of binding to one or many peptides creates a more limited set of motifs which can then be searched in both the human proteome B cell epitope database (Example 1) and a microbiome or virome of interest, and further analyzed.
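The dissection of longer arrayed peptides into candidate pentamers can be sketched as follows. This is a minimal illustration under stated assumptions: the two 15mers are invented examples built around the PVITESTENSK sequence referred to elsewhere in this description, the function names are hypothetical, and "shared by all bound peptides" is only one possible criterion for narrowing the motif set.

```python
from collections import Counter

def pentamers_in(peptide, k=5):
    """Return every overlapping k-mer contained in `peptide`."""
    return [peptide[i:i + k] for i in range(len(peptide) - k + 1)]

def shared_motifs(bound_peptides, k=5):
    """Return the k-mers present in every antibody-bound peptide."""
    counts = Counter()
    for peptide in bound_peptides:
        counts.update(set(pentamers_in(peptide, k)))  # count once per peptide
    return sorted(m for m, n in counts.items() if n == len(bound_peptides))

# Invented 15mers, for illustration only.
bound = ["AAPVITESTENSKAA", "GGGGESTENSKGGGG"]

print(len(pentamers_in(bound[0])))  # 11 pentamers per 15mer
print(shared_motifs(bound))         # ['ESTEN', 'STENS', 'TENSK']
```

The reduced motif list could then be searched against the precomputed proteome pentamers and against a microbiome or virome of interest.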
The B cell epitope peptides in the murine proteome were computed using the process described in Example 1. The analysis was based on the reference mouse proteome documented in UniProt (uniprot.org/proteomes/UP000000589), which is for the C57BL/6J mouse. This proteome, with isoforms, comprises 58,430 proteins. 75% of mouse genes are in 1:1 orthologous relationships with human genes and have most likely maintained their ancestral function in both species; however, this does not imply that the protein sequences, and thus the B cell epitopes, are the same.
As an example of the differences in mimic matches in murine and human proteome we compared matches with B cell epitopes in the envelope protein of Zika virus. Table 18 shows the similarities and differences of epitope mimics between human and murine proteomes across just 9 amino acids of the Zika envelope (strain SPH2015), comprising 5 possible pentamer motifs. For clarity records for duplicate entries (as isoforms) are not shown in Table 18. Even allowing for differences in annotations of proteins there is clearly a wide difference between the two proteomes. This provides an illustration of how over a whole protein or microbial proteome the potential for divergence in mimic matches among species is vast and may have a significant impact on the clinical disease syndrome seen in each species.
Murine (Mus musculus) entries of Table 18, listed by UniProt gene name (GN), protein existence level (PE) and sequence version (SV):
Gene (GN) | PE | SV |
---|---|---|
Stag2 | 1 | 3 |
Adamtsl2 | 2 | |
Loxhd1 | 4 | 1 |
Tulp2 | 1 | 3 |
Cacna1b | 1 | 1 |
Ikbip | 1 | 2 |
Npy | 1 | 2 |
5330417C22Rik | | |
Cbfa2t2 | 1 | 3 |
Prrc2c | 1 | 3 |
Znf106 | 1 | 3 |
Zfp292 | 1 | 2 |
Cacnb4 | 1 | 1 |
Cacnb4 | 1 | 2 |
Zbtb9 | 2 | 1 |
Bcas1 | 1 | 3 |
Mink1 | 1 | 3 |
Nampt | 1 | 1 |
Slc26a8 | 2 | 2 |
Vwa3a | 2 | 1 |
Parkinson's disease is a chronic neurodegenerative disease characterized by the accumulation of aggregates of alpha synuclein as Lewy bodies, located in motor neurons of the midbrain. The mechanism leading to the alpha synuclein accumulation is not understood. A large number of other proteins have been examined for their association with the etiology of Parkinson's disease. In order to examine whether commonly occurring viruses may have any role in autoimmune mechanisms contributing to Parkinson's and related alpha synucleinopathies, we assembled a panel of the associated proteins in which the probable B cell epitope peptides were identified. The proteins included are shown in Table 19. These proteins were selected based on review of the literature and the Uniprot annotations indicating associations with Parkinson's disease. The epitopes in these human proteins were then compared to a set of potential candidate viromes, comprising common, non-arbovirus, causes of viral encephalitis, including herpes simplex 1 and 2, cytomegalovirus, and measles.
As an example of the output of such analysis, Table 20 shows the epitope mimics found in measles virus that match those found in the Parkinson's disease associated proteins. The analysis was based on a recent US wildtype isolate (MiV Arizona.USA/11.08/2). This information, used alongside HLA data from a patient, which would determine which virus epitopes would be likely to generate high titers, is indicative of how the present invention can enable further inquiry to focus on a few proteins in seeking causal associations. A further example is provided in Table 21, which shows the epitope mimics in the envelope proteins of an HSV1 isolate (Kos). This result would be used as for measles above.
The examples of measles and HSV1 envelope proteins were selected in this Example simply in the interests of space (i.e. by using small virus examples). This does not imply that measles or HSV1 are primary suspects in the etiology of Parkinson's disease, but rather demonstrates an analytical approach that should in no way be considered limiting. While this example shows the application to a virus of interest, it is also indicative of how the invention can be applied to other microbial proteins or environmental antigens.
It will be evident to those skilled in the art that a list of proteins associated with other disease syndromes, particularly those of unknown or complex etiology, could be compiled and a similar analytical approach used to identify potential epitope mimics and autoimmune associations. Thus, the example of Parkinson's disease is not considered limiting.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US17/21781 | 3/10/2017 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62306262 | Mar 2016 | US |