In the investigation of the causes of human disease, there are still many diseases of unknown etiology, or whose etiology is still not well understood. Identifying the cause of a disease is of obvious importance, both in developing treatments and in developing better diagnostic tests. In at least some of these diseases, the patient's immune system will mount an immune response that is associated with the disease process. An antibody may be produced in an attempt to eliminate the disease-causing agent. For example, if a microorganism causes a disease, the host will usually mount an immune response (comprising antibodies and/or T lymphocytes) that is specific for the microorganism. Alternatively, the antibody may be autoimmune in nature. In other instances, antibodies may be produced against tumor-associated proteins. Regardless, the immune response might reveal valuable information to help us understand the cause of the disease. Unfortunately, there is currently no technology for identifying the target of an immune response if it is otherwise unknown, without ancillary clinical clues to facilitate an educated guess. Serologic immunoassays (measuring antibody responses) all require that the antigen be already known.
Previous investigators have used an expression screening approach in trying to identify antigens that bind to antibodies of unknown target specificity. One such approach was termed “SEREX” (serological analysis of recombinant cDNA expression) and involved screening cDNA expression libraries of human tumors with autologous serum. SEREX provided for the identification of antigens from a pool of candidate proteins. However, as an expression screening technology, it requires prior knowledge about the cellular source of the antigen. Therefore, the range of possible protein antigens to be identified is limited to those expressed by the cell type used as a source for constructing the cDNA expression screening library. There are many diseases, however, in which the nature of the antigen is completely unknown. In these diseases, the immune response may potentially point to an etiologic agent. Without at least some initial clues from a clinical context, it has not previously been possible to identify an antibody's target protein.
A new platform discovery technology harnesses the ability of the immune response to identify disease-associated proteins recognized by the immune system. This new technology is unique in that it does not necessarily require prior assumptions about the source of the antigen, providing an entirely new capability with which to explore disease pathophysiology. We call it “Epitope-Mediated Antigen Prediction (E-MAP)”. E-MAP is a protein identification technology. With E-MAP, we search broadly through the protein database using an antibody's predicted epitope as an in silico search probe.
E-MAP comprises at least two new aspects that make it possible to successfully identify antigens from antibodies. First, we have developed a method to identify a peptide sequence that reasonably accurately represents the epitope in the native protein sequence. We accomplished this by discovering that native protein sequences usually have higher affinities for the antibody as compared to homologous peptides that also bind to the antibody. Therefore, we developed methods of screening peptide combinatorial phage libraries that stringently select the most avidly binding phage. We also determined the effect of mismatches between the predicted and actual linear sequence and identified the thresholds of accuracy that are necessary in order to obtain an accurate match from the protein database.
In a second aspect, a bioinformatics search method is described. A significant hurdle in protein database searching with predicted epitopes is that single epitopes usually do not have enough information to accurately narrow down the list of candidate proteins if the entire protein database, which includes proteins from all organisms, is searched. With 4-6 amino acids, there are too many protein database hits. We have discovered that this problem can be solved by searching with two epitope motifs simultaneously, from two different antibodies. We demonstrate for the first time that a concurrent search with two short epitope motifs, derived from the epitopes of two different antibodies to the same protein, contains sufficient information to converge on the true target. Such a pairwise search imposes the constraint that both antibodies must bind to the same protein.
It is usually not possible to know, a priori, if the two antibodies (of unknown specificity) bind to the same antigen. It is a trial and error process. Therefore, we assessed the consequences of searching with two motifs belonging to two different proteins. We find that such mismatched searches do not generate long lists of irrelevant database hits. The few hits that do result can usually be distinguished from true matches. The E-MAP method can be useful in a clinical context where more than one antibody to an etiologic agent is present.
As yet an additional aspect, the use of various immunoassays for human herpesvirus 5 (cytomegalovirus) in determining the antigen binding specificity of a paraprotein in multiple myeloma is described. The same can be true for the immunoglobulin synthesized by malignant lymphocytes in other gammopathies and lymphoproliferative disorders, such as amyloidosis AL, lymphoma, and leukemia. These immunoassays can take various forms, and examples are described herein that include both solid phase immunoassays and electrophoretic blots.
A technology to identify the antigenic target of disease-associated antibodies, not encumbered by the need to know the target's cellular source in advance, would be a valuable tool in life sciences research. Such a technology could take advantage of the fact that the antigen combining site is a unique structural aspect of every antibody. A portion of the antigen (the “epitope”) fits into the three-dimensional pocket of the antibody's antigen combining site (the “paratope”). By using an antibody to identify the amino acid sequence that comprises the epitope, such a technology would ideally link disease-associated antibodies with the protein antigens to which they bind. Thus, the unique linear sequence of an epitope might be considered analogous to a fingerprint. Just as it is possible to identify a person from a mere fingerprint, a technology to identify an antigen from just an antibody's epitope might create new opportunities in life sciences research.
It is technically possible to identify peptides that bind to the antigen-binding site of antibodies. These peptides are identified from peptide combinatorial libraries, usually expressed in M13 bacteriophage. This approach has been useful where the protein antigen is known, and the investigator is trying to identify the specific epitope on the protein to which the antibody binds. There are many examples in the published literature of epitope mapping using phage displayed peptide combinatorial libraries. In those examples, investigators deduce the epitope by analyzing the peptide inserts from phage that bind to the antibody. The epitope in the native protein is identified by searching for areas of similarity between the peptide inserts and the protein's amino acid sequence.
It has not previously been possible, however, to use these peptide inserts to identify unknown target proteins. A short peptide motif (4-6 amino acids) does not possess enough information content to uniquely identify a candidate antigen in broad bioinformatic searches of proteins from all species (i.e., the non-redundant protein database). The retrieved hit list from a protein database search is usually large, with hundreds or thousands of database hits effectively burying the true matching protein in the noise of extraneous results. In this patent application, we describe a technology to solve this problem.
There are three general obstacles to identifying a protein from a database using experimentally characterized epitope motifs. First, there is always some degree of uncertainty in reconstructing an epitope by phage display of peptide combinatorial libraries. A peptide combinatorial library, also known as a “random peptide library”, is comprised of a large collection of peptides, typically expressed in a vector, such as M13 bacteriophage. Each phage particle typically expresses a peptide on its surface that is usually different from the next phage particle, due to chance random combination from when the library was constructed. For us to reconstruct a peptide epitope by screening and analyzing a phage display library, it is necessary to identify a peptide that accurately represents the epitope of the native protein. However, antibody binding is somewhat promiscuous, in that antibodies will bind to many homologous peptides with varying affinities. It is important to develop a method to identify the peptide that, as accurately as possible, represents the native protein epitope.
In addition, even with a peptide that accurately represents the epitope in the native protein, the peptide must have sufficient information content (length) so as to distinguish the true match from the many other proteins in the database that are similar. Most epitopes do not have a sufficient number of amino acids to do that. With a typical 4-6 amino acid peptide that is identified from phage display, hundreds or even thousands of plausible protein matches will result from a protein database search, especially if allowance is provided in the search parameters for one or two errors or conserved substitutions. A method to further narrow the search is needed before this approach will be practical.
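The information-content problem described above can be illustrated with a rough back-of-the-envelope calculation. The following is a minimal sketch only; it assumes independent, uniformly distributed residues over 20 amino acids and exact matching, and the database size is a hypothetical figure chosen for illustration rather than a value taken from this application.

```python
# Rough estimate of chance matches for one short motif in a large protein database.
# Assumptions (not from the source): residues are independent and uniformly
# distributed over 20 amino acids, matches must be exact, and the database
# size used below is hypothetical.

def expected_chance_hits(motif_length: int, database_residues: float) -> float:
    """Expected number of exact chance occurrences of one specific motif."""
    return database_residues * (1.0 / 20.0) ** motif_length


if __name__ == "__main__":
    db_residues = 1e9  # hypothetical total residue count of the database
    for k in range(4, 9):
        hits = expected_chance_hits(k, db_residues)
        print(f"{k} residues: about {hits:,.2f} expected chance matches")
```

Under these simplifying assumptions, each additional residue in the motif reduces the expected number of chance matches twentyfold. Allowing errors or conserved substitutions in the search, as discussed above, would raise the counts.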
Lastly, proteins are catalogued in protein databases by their linear amino acid sequences. Therefore, a technique using protein database searching, such as E-MAP, only works if the predicted epitopes represent linear determinants. Since we cannot know a priori which predicted epitopes are linear versus conformational, this uncertainty might potentially lead to false matches. We investigated the potential impact of these parameters on bioinformatics searches.
In this study, we also apply the E-MAP technology to an exemplary disease context: multiple myeloma. It is generally believed that malignant transformation in multiple myeloma is due to the accumulation of mutations in the cell cycle and apoptosis regulatory control genes, leading to uncontrolled cellular proliferation. There has been little consideration given to the role of antigen, such as infectious agents, as a growth stimulus for the malignant cells of multiple myeloma. One way to determine the antigenic specificity of myeloma cells would be to analyze the antigenic specificity of the secreted paraprotein. The secreted paraprotein has the same target specificity as the B cell receptor, and therefore is a convenient protein for analysis, as it is abundantly present in serum.
The literature on paraproteins includes descriptions of paraprotein targets that were identified by chance clinical associations. They include individual case reports of paraproteins binding to the p24 gag protein of HIV [Jin, D., et al. Amer. J. Hematol. (2000) 64:210-213.], cytomegalovirus [Kohler, M., et al. Blut. (1987) 54:25-32.], or streptolysin-O [Waldenstrom, J., et al. Acta Medica Scandinavica. (1964) 176:619-631; Seligmann, M., et al. Nature. (1968) 220.], all of which were identified after serological assays on the patients came back with unexpectedly strong positive results. In other cases, a handful of paraproteins immunoreactive with carbohydrate specificities were identified after testing dozens or hundreds of paraproteins for immunoreactivity to various bacteria.[Kabat, E., et al. J. Exp. Med. (1980) 152:979-995; Emmrich, F., et al. Scand J Immunol. (1985) 21:119-126.] These cases likely represented cross-reactive epitopes and not the actual microbial antigen that stimulated immunoglobulin synthesis prior to malignant transformation. Therefore, there is little already known about the antigens to which paraproteins bind.
In the example of multiple myeloma, E-MAP analysis directed us to the human herpesvirus 5, also known as human cytomegalovirus (CMV or HCMV, used interchangeably). CMV is known to be a powerful immune stimulus, often resulting in such a profound clonal expansion as to produce paraproteins in otherwise healthy individuals [Buhler, S., et al. Clin Infect Dis. (2002) 35:1430-3.] as well as immunosuppressed patients. [Vodopick, H., et al. Blood. (1974) 44:189-195.]
In normal, healthy HCMV seropositive individuals, HCMV-specific CD8+ T lymphocytes comprise approximately 0.1% of the peripheral blood population, as measured by limiting dilution analysis. [Wills, M., et al. J Virol. (1996) 70:7569-7579.] The proportion of HCMV-reactive lymphocytes increases with age, exacting an increasingly heavy burden in elderly individuals. MHC tetramer analysis of elderly HCMV-seropositive individuals indicates that, on average, approximately 5% [Komatsu, H., et al. Clin. Exp. Immunol. (2003) 134:9-12; Khan, N., et al. J Immunol. (2002) 169:1984-1992.] of the CD8+ T lymphocytes may be specific for the HCMV pp65 immunodominant peptide. This figure may underestimate the percentage of T lymphocytes reactive with HCMV proteins since, contrary to previous belief, the T cell repertoire is not as focused solely on pp65 as was originally thought. [Khan, N., et al. J Immunol. (2002) 169:1984-1992; Elkington, R., et al. J Virol. (2003) 77:5226-5240.] Such a long-lasting, strong immune response to a single agent, years after initial exposure, may be due to chronic repetitive viral reactivation. [Sissons, J., et al. J Infec. (2002) 44:73-77; Soderberg-Naucler, C. J Intern Med. (2006) 259:219-46.] As a consequence, HCMV induces significant alterations in the immune parameters of elderly individuals. [Wikby, A., et al. Exp. Gerontol. (2002) 37:445-453; Looney, R., et al. Clin. Immunol. (1999) 90:213-219.]
E-MAP Protocol Overview
The E-MAP method incorporates two components, illustrated schematically in
The second step in the E-MAP process (
Requirements for Generating a Consensus Peptide Sequence Motif
In order to identify meaningful protein matches from predicted epitopes, it is important to maximize the certainty about the identity of each amino acid in the sequence. Uncertainty in the predicted epitope can inappropriately skew the content of the retrieved hit list. It also makes the assessment of potential database search results more difficult, lowering the likelihood of successfully identifying the antigen in question. Using peptide phage display [Kehoe, J., et al. Chem. Rev. (2005) 105:4056-4072.] we are essentially carrying out a casting process on a molecular scale. We are filling the antibody's binding site (the “paratope”) with random oligopeptides, and identifying which peptide sequences are the highest affinity binders. We then reconstruct a virtual best fitting consensus motif by analyzing the commonalities of those peptide sequences. We usually find certain positions in a motif to be invariant while others may exhibit conserved substitutions. These substitutions generate uncertainty in knowing the amino acid sequence of the native protein, affecting the size of the database search hit list and potentially skewing its contents. In our experience, a consensus motif usually emerges from the data if high stringency screening techniques are used during the phage display component.
Screening the peptides for strong binders. The selected peptides that bind most strongly to the antibody are identified by high stringency screening. High stringency screening is achieved by repeated rounds of positive and negative selection followed by a selection for the peptides most immunoreactive with the selecting antibody, using an immunoassay, such as an immunoblot. Positive selection refers to selecting phage that bind to the antibody of interest. Typically, the antibody is attached to a solid phase, such as paramagnetic beads. Negative selection refers to depleting from the library those phage that bind to one or more irrelevant antibodies. This process removes phage that may bind to invariant regions of antibody, outside the paratope (antigen-binding region of the antibody).
Our preferred method of screening the peptide library is to perform two or three rounds of selection. Each round of selection represents a positive-negative-positive series of selections before amplifying the phage by transfection into E. coli. According to this protocol, the peptide library expressed in phage is mixed with paramagnetic beads coated with the desired antibody. After allowing a suitable amount of time for binding, the paramagnetic beads are collected in one end of a test tube. Irrelevant phage particles contained in the supernatant are removed. Tightly-bound phage particles expressing peptides that are immunoreactive to the antibody are then eluted (pH 2.5) and the eluate is neutralized. The eluted phage are then allowed to bind to irrelevant antibodies (negative depletion). After collecting the paramagnetic beads in one end of the test tube, the unbound phage found in the supernatant are then used for another round of positive selection. The eluate of this second round of positive selection is then used to transfect E. coli. Transfection into E. coli amplifies the number of phage present, as the phage replicate within E. coli. After amplification, the process is repeated.
Computer Modeling of E-MAP Requirements
In order to better understand the requirements for accurately identifying the correct protein from an epitope, we first tested two variables: the length of the epitope and the fidelity with which the predicted epitope matches the actual sequence in the protein database. We expected that longer epitope lengths (more information) and higher epitope sequence fidelity to the native protein (average motif conservation) will both result in a greater likelihood of obtaining a correct database match.
To study the relationship of epitope length and average motif conservation on the success rate in protein database searching, we performed an in silico experiment.
The pseudoclones were then run through the MEME and MAST bioinformatic algorithms, searching the non-redundant protein database, and scored for the predicted epitope's ability to identify the target protein. The “success rate” (y axis in
It is our experience that with high stringency screening of phage libraries followed by selection of the most immunoreactive peptides, we generally obtain 60-80% average motif conservation. Some antibodies select a narrowly-defined range of phage clones with an average motif conservation towards the higher end of that range, such as described in
In our experience, epitope reconstruction by phage display of peptide combinatorial libraries typically yields a consensus motif four to six amino acids long. With higher stringency screening techniques, and by analyzing more phage clones, we can sometimes extend that consensus motif further. Allowing for a small degree of error in the sequence, such as conserved substitutions, the likelihood of a successful match to the protein database depends on epitope length. These in silico data (
In order to maximize the accuracy and length of a consensus sequence, we have found that screening the selected phage particles for peptides that are most immunoreactive for the selecting antibody is important. The peptides expressed on phage that bind best to the selecting antibody most closely resemble the epitope in the native protein. Occasionally, consensus sequences can be generated using this method that have at least seven amino acids (e.g., antibody 3,
One surprising finding from our simulation model, illustrated in
The details of how we generated the aforementioned model data are described in the following three sections, entitled “sequence generation”, “single motif searches”, and “multiple motif searches”:
Sequence Generation. To generate sets of sequences for computer analysis, short sequences of predefined length N were selected randomly from the NCBI nr (non-redundant) protein sequence database. These sequences were then used to construct a position specific scoring matrix (PSSM), with the degree of residue conservation at each position perturbed by a Gaussian function around the average conservation, C. These matrices were used to generate 20 “pseudo-epitopes” (mock phage clone peptide inserts), also termed “pseudoclones.” The pseudoclones contained the epitope motif at random positions within a 20-mer, flanked by randomly generated residues. Therefore these pseudoclones contained combinatorially-scrambled motifs, each with varying degrees of sequence conservation relative to the chosen native protein epitope sequence, but on the whole approaching the defined average conservation when looked at as a group.
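The pseudoclone generation step can be sketched as follows. This is an illustrative simplification, not the code used for the model data: it samples each motif position directly instead of building an explicit PSSM, and the function name, the standard deviation of the Gaussian perturbation (`sd`), and the uniform choice of substitute residues are all assumptions introduced here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_pseudoclones(epitope, avg_conservation, n_clones=20, insert_len=20,
                      sd=0.1, seed=None):
    """Return mock phage inserts that embed a noisy copy of `epitope`.

    `sd` (spread of the Gaussian perturbation) is an assumed value.
    """
    rng = random.Random(seed)
    # Per-position conservation: Gaussian perturbation around the average,
    # clipped to the valid probability range [0, 1].
    conservation = [min(1.0, max(0.0, rng.gauss(avg_conservation, sd)))
                    for _ in epitope]
    clones = []
    for _ in range(n_clones):
        motif = []
        for residue, keep_prob in zip(epitope, conservation):
            if rng.random() < keep_prob:
                motif.append(residue)  # native residue retained
            else:
                # Substitute one of the other 19 residues, chosen uniformly.
                motif.append(rng.choice(AMINO_ACIDS.replace(residue, "")))
        # Place the motif at a random offset and pad with random residues.
        start = rng.randint(0, insert_len - len(epitope))
        tail = insert_len - start - len(epitope)
        left = "".join(rng.choice(AMINO_ACIDS) for _ in range(start))
        right = "".join(rng.choice(AMINO_ACIDS) for _ in range(tail))
        clones.append(left + "".join(motif) + right)
    return clones


if __name__ == "__main__":
    for clone in make_pseudoclones("LHQIQ", avg_conservation=0.7, seed=1)[:5]:
        print(clone)
```

Each call yields 20 inserts of 20 residues whose embedded motifs approach the requested average conservation as a group, which is the property the model relies on.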
Single Motif Searches. For each target epitope, sequences were generated as described above. These pseudoclone sequences were used as an input to the motif searching tool MEME [Bailey, T. L., et al. J Steroid Biochem Mol Biol. (1997) 62:29-44.]. The MEME output motif was then given to MAST [Bailey, T., et al. Bioinformatics. (1998) 14:48-54.], which was used to search the non-redundant (nr) database. Success was defined as recovering the original protein sequence within the top 10 MAST database hits. The above-described test was performed 40 times for each value of N and C. These success rates were averaged over 40 runs to obtain an average and standard deviation.
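The success criterion above can be expressed compactly. The sketch below assumes that each run has already produced a ranked list of database hit identifiers (for example, parsed from MAST output by means not shown here); the function names are hypothetical, and computing the standard deviation over the individual pass/fail outcomes is one possible reading of how the runs were summarized.

```python
from statistics import mean, stdev


def is_success(ranked_hits, target_id, top_n=10):
    """True if the original protein appears within the top `top_n` hits."""
    return target_id in ranked_hits[:top_n]


def summarize_runs(runs, top_n=10):
    """Return (mean, standard deviation) of success over repeated runs.

    `runs` is a list of (ranked_hits, target_id) pairs, one pair per run;
    at least two runs are needed to compute a standard deviation.
    """
    outcomes = [1.0 if is_success(hits, target, top_n) else 0.0
                for hits, target in runs]
    return mean(outcomes), stdev(outcomes)
```

For one combination of N and C, `runs` would hold the 40 repetitions described above.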
Multiple Motif Searches. To generate the success rates for two motifs in a pairwise search, proteins were randomly selected from the non-redundant (nr) database and random spans were chosen as target epitope sequences. For each protein, two non-overlapping epitopes of lengths 5-8 amino acids were randomly chosen from the nr database. Each epitope was used to generate pseudoclones (as described above) which were then processed with MEME. Both MEME motifs were then given to MAST. The average success rate and standard deviation were calculated as for the single motif searches.
From this analysis, we learned that in searching the non-redundant (nr) database, there is an inflection point at seven amino acids. Consensus motifs of seven or more amino acids have a much higher probability of success in finding the true protein target than motifs shorter than seven amino acids. One way to reach this threshold is to find better ways to generate the consensus motif, such as using high stringency screening and selecting only those phage clones expressing peptides that are most immunoreactive. Shorter consensus sequences, comprising five or six amino acids, may suffice if smaller protein databases are searched. Another method to surmount the threshold of seven amino acids is to use a pairwise bioinformatic search strategy.
Pairwise Epitope Submission: Conceptual Framework
Pairwise epitope submissions to the protein database dramatically increase the statistical power of a search, beyond what is possible with a single epitope. Querying two motifs simultaneously asks which proteins contain both predicted epitopes. From a clinical standpoint, this approach may require that a particular disease be caused by a single antigen, or a limited repertoire of antigens, in at least a group of patients. As a consequence, there are two or more antibodies to a target protein antigen in a patient sample, each of which will provide information about the protein's identity. In practice, one often cannot be certain that pairs of antibodies from patient sera are, in fact, directed to the same target. This problem can be surmounted as described later.
The conceptual underpinning for pairwise submission and how it is distinguished from single epitope searches is illustrated in
A pairwise search provides the needed discrimination to correctly prioritize a database search result.
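The constraint imposed by a pairwise search, namely that a candidate protein must contain both motifs, can be illustrated with a toy example. This sketch substitutes a simple mismatch-tolerant string scan for the MAST profile search, and the three database entries are made-up sequences for illustration, not real proteins.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def contains_motif(sequence: str, motif: str, max_mismatches: int = 1) -> bool:
    """True if any window of `sequence` matches `motif` within the tolerance."""
    k = len(motif)
    return any(count_mismatches(sequence[i:i + k], motif) <= max_mismatches
               for i in range(len(sequence) - k + 1))


def pairwise_hits(database: dict, motif_a: str, motif_b: str,
                  max_mismatches: int = 1) -> list:
    """Identifiers of entries that contain BOTH motifs."""
    return [pid for pid, seq in database.items()
            if contains_motif(seq, motif_a, max_mismatches)
            and contains_motif(seq, motif_b, max_mismatches)]


if __name__ == "__main__":
    # Hypothetical toy database; the sequences are invented.
    toy_db = {
        "protein_1": "AAAQAPYYGGGLHQIQAA",  # carries both motifs
        "protein_2": "QAPYYAAAA",           # carries only the first motif
        "protein_3": "AALHQCQAA",           # near-match to the second motif only
    }
    print(pairwise_hits(toy_db, "QAPYY", "LHQIQ"))
```

In this toy database each motif on its own retrieves two entries, whereas the pairwise query retrieves only the entry that carries both.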
We tested this hypothesis in silico, measuring the success rate for a pairwise submission strategy. The average motif conservation was held constant at 0.7, a typical figure in our experience for high stringency phage display screening. The results are listed in Table 1. Unlike single motif submission, the combination of two motifs with lengths 5-6 amino acids now becomes highly predictive (67-87% success rate). This success rate is in contrast to the expected result if each motif is searched individually (≤15% success rate).
To test the E-MAP methodology, we used a model system relating to two proteins—the human estrogen and progesterone receptors. We investigated whether we could identify these proteins by running through the epitope prediction protocol and bioinformatic algorithm (as summarized in
Predicted Epitope Identification
We tested these theoretical predictions using monoclonal antibodies to two steroid hormone receptors, the human estrogen and progesterone receptors. The antibodies were attached to paramagnetic beads and used for biopanning experiments. Monoclonal antibodies 1 and 3 bind to human estrogen receptor whereas antibodies 2 and 4 bind to progesterone receptor. These antibody specificities were chosen arbitrarily, since the antibodies were already in the lab and well characterized. We have no reason to believe that the results would be materially different had we chosen alternative antibody protein targets.
Several different phage libraries were employed, all encoding for random peptide inserts near the amino terminus of the cpIII M13 protein. The libraries contained six, eight, ten, eleven and twelve amino acid variable inserts in a constrained ring formation created by disulfide-bonded flanking cysteines. More recently, we are using linear libraries so as to avoid the additional uncertainty created by the invariant cysteines required for cyclic peptides. Details of the phage libraries and selection of phage (biopanning), DNA sequencing, and protein translation are known in the art of phage display, and summarized in the following three sections, entitled “phage-display libraries and biopanning”, “DNA insert sequencing”, and “protein translation”:
Phage-Display Libraries and Biopanning
Phage libraries contained rationally designed random combinatorial libraries of peptide sequences inserted into the N terminus of the pIII minor coat protein of the M13 bacteriophage. The cyclic 6-mer and 10-mer libraries contained two conserved cysteine residues separated by four or eight amino acids, respectively. The cysteines formed a disulfide bridge, creating a conformationally constrained ring. [McLafferty, M., et al. Gene. (1993) 128:29-36.] Trinucleotide-mutagenesis technology, involving controlled polymerization of preformed trinucleotides, was used to diversify the amino acids within the ring and three amino acids on either side of the ring, allowing all amino acid types (except cysteine) with equal frequency. [Virnekas, B., et al. Nucl. Acid. Res. (1994) 22:5600-5607.]
Phage selection by biopanning. The libraries were enriched for binding to antibodies by biopanning using standard methods [Smith, G., et al. Chem. Rev. (1997) 97:391-410.] with a few modifications. Briefly, paramagnetic beads coated with anti-mouse IgG (Dynabeads; Dynal Corp., New York, N.Y.) were prepared by mixing either the ER- or PR-specific mouse mAbs (for positive enrichment) or the polyclonal mouse IgG (for negative depletion) and incubating overnight at 4° C. on a rotator. Antibody-adsorbed Dynabeads were washed five times with phosphate-buffered saline containing 0.05% Tween-20 (PBS-T) and twice with PBS before use in biopanning of phage libraries. A cyclic 6-mer or cyclic 10-mer phage library containing 10^11-10^12 plaque-forming units was negatively depleted by incubation with Dynabeads (100 μL) coated with polyclonal mouse IgG for 1 h at room temperature on a rotator. This negative depletion step removes phage that may bind to constant regions of mouse IgG. The unbound phage (supernatant) were then positively selected on the (ER or PR-specific) target mAb-adsorbed Dynabeads. The phage library was incubated with the mAb-coated beads for 2-3 hours on a rotator.
The beads were washed 10 times with PBS-T and three times with PBS to remove nonspecifically bound phage. Phage particles that bound to the mAb-coated beads were eluted with 0.1 mol/L glycine-HCl (pH 2.2) containing 1 g/L bovine serum albumin (BSA). The recovered eluate was neutralized with 1 mol/L Tris-HCl (pH 9.0). To ensure that the bound phage were completely eluted, the beads were treated a second time with elution buffer and the eluate was neutralized. The two eluates were pooled. The eluted phage were amplified and used in a second round of biopanning. After two rounds of positive selection, Escherichia coli were infected with the cultured phage and grown on agar plates.
DNA Insert Sequencing
Phage clones that had high specific immunoreactivity for the selecting antibody were submitted for further analysis, by sequencing the nucleotide inserts coding for the combinatorial peptides. The sequencing template was prepared by PCR amplification from an overnight phage culture. The primers used for PCR were 5′-CGGCGCAACTATCGGTATCAAGCTG-3′ and 5′-CATGTACCGTAACACTGAGTTTCGTC-3′. Thirty rounds of PCR were performed on an MJ Research Tetrad thermocycler (MJ Research, Inc.). The PCR product was diluted 1:20 with distilled H2O. Sequencing was performed in both the forward and reverse directions with the following primers: 5′-GATAAACCGATACAATTAAAGGCTCC-3′ and 5′-GTTTTGTCGTCTTTCCAGACGTTAG-3′. ABI Big Dye™ (Ver. 1.0) was used to perform a 5-μL sequencing reaction [2 μL of Big Dye, 1 μL of distilled H2O, 0.5 μL of primer (at 3 pmol/μL), and 1.5 μL of diluted PCR product]. The samples were then cycled for 45 rounds on an MJ Research Tetrad thermocycler. After cycling, 2.5 volumes of absolute ethanol were added, and the mixture was centrifuged at 1850×g for 30 min. The plates were inverted over paper towels, and then centrifuged at 100×g for 30 min. The samples were resuspended in 5 μL of distilled H2O and detected on an ABI 3700 DNA Analyzer.
Protein Translation
The determined nucleotide sequences of the inserts were translated in silico using the Translate tool from ExPASy Proteomics Server of the Swiss Institute of Bioinformatics (SIB) web utility available at (http://ca.expasy.org). The translated protein sequences could be verified to be in frame by identification of invariant elements of the cpIII protein and the hallmark presence of the invariant cysteines (in the cyclic peptides).
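For readers who prefer a scripted alternative to the web utility, the translation and in-frame check can be sketched as follows. This is an illustrative sketch only: it uses the standard genetic code, the helper names are hypothetical, the example DNA string is invented, and the check looks only for a pair of cysteines flanking a ring of the expected size (it does not examine the invariant cpIII elements).

```python
import re

# Standard genetic code, laid out in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA in the given frame; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def find_ring(protein: str, ring_size: int):
    """Return the residues between two cysteines spaced `ring_size` apart, or None."""
    match = re.search(r"C([^C*]{%d})C" % ring_size, protein)
    return match.group(1) if match else None


if __name__ == "__main__":
    insert_dna = "GCTTGTCTGCATCAGATTTGTGGT"  # invented example insert
    for frame in range(3):
        protein = translate(insert_dna, frame)
        print(frame, protein, find_ring(protein, ring_size=4))
```

A frame in which the flanking cysteines appear with the expected spacing (four residues for the cyclic 6-mer library, eight for the cyclic 10-mer library) is a candidate for the correct reading frame.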
E-MAP Validation Results with ER and PR Monoclonal Antibodies
After two rounds of biopanning, we found moderate sequence variability in the peptide inserts when sequenced phage clones were selected at random (data not shown). We found that when the second round phage clones were then screened on the basis of high affinity binding to the selecting antibody, the sequence variability decreased. The peptide insert amino acid sequence from each phage clone is shown in
Post-biopanning screening of phage clones to identify strongly binding peptides. Replicate plaque lifts were created by laying nitrocellulose membranes onto the aforementioned agar plates, at 4° C. for 1 hour. The membranes were marked for orientation, carefully lifted from the agar, and placed at 65° C. to dry for 5 minutes. The membranes were then blocked with 5% non-fat dry milk in TBST (Tris-buffered Saline with 0.5% Tween-20) and then rinsed twice with TBST alone, without milk. The selecting (ER- or PR-specific) mAb was prepared in TBST (2.5 mg/L) and placed on the membrane for 2 hours at room temperature or at 4° C. overnight. The membranes were then washed eight times with TBST and incubated with anti-mouse-IgG-Horseradish peroxidase (HRP) conjugate (Sigma Chemical Co., St Louis, Mo., 1:5000 dilution) for 1½ hours. A chemiluminescence protocol was used to visualize patterns of immunoreactivity (ECL Western Blotting Detection Reagents, Amersham Biosciences). Developed films were oriented to the corresponding agar plates by the markings we had made. The most immunoreactive spots (representing distinct plaque colonies) were picked and grown for further analysis. A second replicate lift was usually obtained and worked up in like manner as a control, testing non-specific immunoreactivity of the phage clones to mouse polyclonal IgG (representing the negative control).
Data Analysis from ER and PR Antibody Test
Analysis of Strongly-Binding Peptides so as to Identify the Consensus Peptide Sequence.
We used the MEME (Multiple Expectation-maximization for Motif Elicitation) software utility to identify motifs in the sequenced peptide inserts. [Bailey, T. L., et al. J Steroid Biochem Mol Biol. (1997) 62:29-44.] The program was instrumental for generating standardized and systematic motif determinations. MEME considers the relative presence of amino acids at each position of the emerging dominant motif. This leads to the creation of a consensus motif profile, capturing each phage clone's sequence information in a position-specific scoring matrix (PSSM), a two dimensional numeric array. The profile is, in essence, a virtual mimotopic array of the peptides that bind to the antigen-binding site of the antibody (the “paratope”). Using such a profile in a bioinformatic search offers distinct advantages. Instead of searching with a single “best-guess” query representing the dominant motif, the queried profile considers a larger number of combinatorially weighted sequences, averaging around the dominant motif.
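By way of illustration only, the notion of a position-specific profile can be sketched in a few lines of Python. The sketch below is not the MEME algorithm (which uses expectation maximization and reports log-odds scores); it simply tabulates per-position residue frequencies for peptides that are assumed to be already aligned. The function names and the example peptides are hypothetical and are not taken from our data.

```python
from collections import Counter


def position_frequency_matrix(peptides):
    """Return one {amino_acid: frequency} dict per aligned position.

    Assumes the peptides are already aligned and of equal length; this is a
    simple frequency table, not the log-odds PSSM that MEME reports.
    """
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be pre-aligned to equal length")
    matrix = []
    for column in zip(*peptides):
        counts = Counter(column)
        matrix.append({aa: n / len(peptides) for aa, n in counts.items()})
    return matrix


def positional_conservation(matrix):
    """Frequency of the most common residue at each position."""
    return [max(col.values()) for col in matrix]


# Hypothetical aligned inserts, invented for illustration only.
peptides = ["QAPYY", "QAPYY", "QAPFY", "QSPYY"]
pfm = position_frequency_matrix(peptides)
conservation = positional_conservation(pfm)
consensus = "".join(max(col, key=col.get) for col in pfm)
print(consensus, conservation)
```

A profile of this kind retains the minority residues observed at each position, which is the property that distinguishes a profile query from a single consensus string.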
Due to the stringent phage panning selection process, the individual phage peptide inserts had a high degree of consensus. The average positional conservation of each motif ranged from 73.25% to 95.2%. Even though there was a high degree of homology amongst the individual peptides, the derived consensus peptide sequence is not always an exact match to the native epitope. For example, the consensus motif for Antibody 1 is SR(S/
The consensus motif of the second antibody epitope (Antibody 2) was predicted to be QAPYY (
Analysis of the third antibody determined the consensus motif to be GDF(P/
The fourth antibody's predicted sequence, LHQCQ, was close to the native sequence LHQIQ. Again, the difference is due to the invariant cysteine (C) being substituted for isoleucine (I) in the native sequence. With these predicted epitopes, we identified the likely corresponding sequences in the native protein. We then tested our predictions by determining if the monoclonal antibodies bind to peptides from the native sequence. In each case, the monoclonal antibodies were immunoreactive with their corresponding peptide fragment. [Sompuram, S., et al. Amer. J. Clin. Pathol. (2006) 82-89.] With these predicted epitopes in hand, we then asked if we could have deduced the correct protein from a protein database search using single or pairwise searches.
Identification of Antigens from the Non-Redundant Protein Database.
We used the MAST (Motif Alignment and Search Tool) utility [Bailey, T. L., et al. J Steroid Biochem Mol Biol. (1997) 62:29-44.] to perform single and pairwise motif searches against the non-redundant (nr) protein database. The pairwise submission finds proteins containing both predicted epitopes. The retrieved hits are ranked according to their combined p-value, which evaluates the two epitopes' degree of maximal homologous alignment to the database entry. In this way the algorithm creates a ranking system with stringent matching criteria. [Bailey, T. L., et al. J Comput Biol. (1998) 5:211-21.] The methods for bioinformatic searching are described in the following section, entitled “bioinformatic searching method”.
Bioinformatic Searching Method
The variable regions of the inserts were transcribed into FASTA format and submitted to MEME (Multiple Expectation-maximization for Motif Elicitation, available at http://meme.sdsc.edu/meme/intro.html). The MEME output contains the submitted peptides rank-ordered for the presence of the dominant motif determinants.
Single motif searching. To carry out bioinformatic searches using a single consensus motif, the PSSM was submitted to the MAST (Motif-Alignment and Search Tool) utility, available at http://meme.sdsc.edu/meme/intro.html, to be searched against the nr (non-redundant) protein database while allowing a maximal E-value (expectation value). The first 500 hits were then screened for the presence of the known target. Alternatively, a single consensus sequence (instead of a PSSM) can also be used for database searching using the MAST or BLAST protein database search programs. Other protein databases can be searched (other than the non-redundant protein database), if there is information that allows the search to be narrowed. Alternatively, it is possible to limit the search results based on other criteria, such as the type of organism. Such limits may dramatically change the threshold requirements for successful identification of protein database matches. For example, whereas a seven amino acid homologous sequence may be required when searching the non-redundant protein database, fewer amino acids will be required if other search constraints (such as type of organism) co-exist. The specific threshold of amino acid number will depend on the circumstance, such as the size of the proteome being searched.
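The dependence of the required motif length on database size can be illustrated with a back-of-the-envelope estimate. The following Python sketch assumes, as a simplification, that all twenty amino acids are equally frequent and independent, and that the number of possible start positions is approximately the number of residues in the database. The database sizes used are hypothetical round numbers chosen for illustration, not measured values for any actual database.

```python
def expected_chance_matches(motif_length, database_residues, alphabet_size=20):
    """Expected number of exact chance occurrences of one fixed peptide.

    Simplifying assumptions: uniform, independent residue frequencies, and
    roughly one candidate start position per residue in the database.
    """
    return database_residues * (1.0 / alphabet_size) ** motif_length


# Hypothetical database sizes (total residues), for illustration only.
examples = [("large multi-species database", 1e9), ("single proteome", 1e7)]
for label, residues in examples:
    for k in (5, 7):
        n = expected_chance_matches(k, residues)
        print(f"{label}: {k}-mer, about {n:.3g} chance matches expected")
```

Under these assumptions, a pentamer is expected to occur by chance many times in a large database, whereas a heptamer is expected to occur about once or less, which is consistent with the observation that fewer amino acids suffice when the search space is restricted.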
Pairwise motif searching. For pairwise motif searches, the PSSMs from two motifs were combined and submitted to MAST. The MAST database search program will return many hits, which can be ranked by their position p value, sequence p value, and combined p value of alignment. These terms are defined, and the program more thoroughly described, at http://meme.sdsc.edu/meme/mast-output.html. Briefly, when tentative matches are found, each is given a score, reflecting how well the motif's PSSM fits the particular span from the identified sequence. The position p value of an alignment is defined as the probability of a random span in a randomly generated sequence having a match score at least as large as that of the given motif. The sequence itself is assigned a p value which is defined as the probability of a random sequence of the same length having a match score at least as large as the highest scoring match in the sequence. MAST also assigns a combined p value, defined as the probability of a randomly generated same length sequence having sequence p values whose product is at least as small as that of the matches of the motifs to the given sequence. Based on the latter determination, an expectation value (E-value) is generated by multiplying the combined p value of a sequence by the number of database entries. The E-value can then be thought to represent the expected number of sequences in a random database of equal size that would match the motif(s) at least as well.
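The relationship between the combined p value and the E-value can be sketched as follows. This is a simplified illustration rather than the MAST implementation: it assumes the individual p values are independent and uniformly distributed under the null model, in which case the probability that the product of n such values is at most the observed product p is p × Σ (−ln p)^i / i!, summed for i from 0 to n−1. The function names and the database size in the example are hypothetical.

```python
import math


def product_p_value(p_values):
    """P(product of n independent uniform(0, 1) values <= observed product).

    Simplified sketch; the actual MAST computation includes further
    corrections (for example, for sequence length).
    """
    product = math.prod(p_values)
    if product <= 0.0:
        return 0.0
    n = len(p_values)
    log_term = -math.log(product)
    total = sum(log_term ** i / math.factorial(i) for i in range(n))
    return min(1.0, product * total)


def e_value(combined_p, database_entries):
    """Expected number of random database entries matching at least as well."""
    return combined_p * database_entries


# Hypothetical numbers: two motifs each matching with p = 0.001,
# searched against a database of two million entries.
combined = product_p_value([1e-3, 1e-3])
print(combined, e_value(combined, 2_000_000))
```

Note that the combined value for two motifs is larger than the raw product of the two p values, because the formula accounts for all the ways in which two random p values could yield a product that small.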
For most of our pairwise analyses, we set the E-value to <10 and the threshold value for motif display to p≦0.0001. Any proteins found with a qualifying E-value of <10 solely on the basis of a single motif were disqualified. Instead, we wanted to see homologous portions of both (not just one) peptides in the protein candidate identified by MAST. For the ER and PR test model, all possible pairwise combinations of the four determined motifs' PSSMs were analyzed in this manner.
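The two qualifying conditions described above (an E-value below the cutoff and a displayed match for both motifs) can be expressed as a simple filter. The sketch below is illustrative only; the `Hit` record, its field names, and the example hits are hypothetical and do not correspond to the MAST output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one database hit (not the MAST output format)."""
    protein: str
    e_value: float
    motif_p_values: dict  # motif name -> best position p value in this protein


def qualifies(hit, required_motifs, max_e=10.0, max_motif_p=1e-4):
    """Keep a hit only if E < max_e and every required motif has p <= max_motif_p."""
    if hit.e_value >= max_e:
        return False
    return all(
        hit.motif_p_values.get(motif, 1.0) <= max_motif_p
        for motif in required_motifs
    )


# Invented hits, for illustration only.
hits = [
    Hit("candidate with both motifs", 2.0, {"motif_1": 1e-5, "motif_2": 5e-5}),
    Hit("candidate with one motif only", 0.5, {"motif_1": 1e-7}),
    Hit("candidate with weak E-value", 40.0, {"motif_1": 1e-5, "motif_2": 1e-5}),
]
kept = [h.protein for h in hits if qualifies(h, ["motif_1", "motif_2"])]
print(kept)
```

In this sketch, a hit that reaches a low E-value on the strength of a single motif is rejected, mirroring the disqualification rule stated above.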
Single motif search results for ER and PR antibody epitopes. Single motif searches are not generally successful, unless the epitope length is unusually long. In the single motif submission analysis against the non-redundant (nr) database, the heptamer SR(S/
The only apparent exception to the pattern of single motif searches was the octamer GDF(P/
Pairwise motif search results for ER and PR antibody epitopes. For pairwise searching, we set the expectation value (E-value) to ≦10 and the threshold value for motif display to p≦0.0001. This effectively returns hits that have high scoring alignments for both motifs.
The outcomes of the database searches for single versus pairwise submissions were markedly different. Concurrent alignment of two motifs results in a more stringent database search, effectively re-ordering the hits that each motif may potentially have retrieved individually. For instance, pairwise analysis reveals SRSCXSY (Antibody 1, PR) to partially align with its true cognate target ARSPRSY, a fact not evident in the first 500 hits of the single search for this motif. In this case, SRSCXSY also serves to corroborate the tentative PR identification based on GDFPDC (Antibody 3, PR). The case was similar for QAPYY (Antibody 2, ER), whose target was not in the first 500 hits when queried singly due to a single amino acid mismatch to the native sequence QVPYY. This instance is also rather remarkable in demonstrating how two short motifs (Antibody 2×Antibody 4, both pentamers with a single mismatch), which would not be expected to fare well in single searches (according to the model shown in
The pairwise motif searches (
Distinguishing True from False Database Search Results.
When working with antibodies to unknown protein antigens, it is generally not possible to know, a priori, if the epitopes are actually on the same target protein. In any disease, patients may be producing many antibodies and we do not know if those antibodies are to one or many antigens. Even a single inciting microorganism may elicit antibodies to many different proteins. Some of the immunodominant epitopes may also be conformational determinants, which would not be useful in this type of protein database search. An important concern in performing pairwise analysis is what might happen with pairwise submission of predicted epitopes that do not correspond to the same antigen. For E-MAP to be practical, such mismatched pairs should not yield database search results that will mislead a research investigation. This criterion is important, since pairwise searching might otherwise create an inordinately long list of false candidate target antigens. The technique must therefore be adaptable to real-life situations where we do not know, a priori, whether the targets are correctly matched or not.
We found that inappropriate pairwise epitope searches can usually be distinguished.
Inappropriately paired predicted epitopes result when the two antibodies are directed to different antigens, in this case between epitopes for the human estrogen and progesterone receptors. The same situation would exist if one of the antibodies binds to a conformational determinant
So far, we have two threshold criteria: the presence of both motifs in the candidate protein and a low E value (e.g., ≦10 in the examples shown.). The low E value reflects a close matching of amino acids, between the predicted epitopes and the candidate protein. In analyzing the database search results in
In our data set, true matches can be distinguished from false ones by the degree of identity and homology for each entry. In this context, homology is a broad term referring to the degree of similarity between two amino acid sequences, which includes both identity (the same exact amino acid) and conserved amino acid substitutions. Identity represents a closer match than a conserved substitution, which in turn represents a closer match than a non-conserved substitution. A conserved amino acid substitution is one in which two amino acids, although different, still belong to the same class. A common classification method includes aliphatic amino acids (glycine G, alanine A, valine V, leucine L, isoleucine I, referring to their single letter abbreviations), non-aromatic amino acids with hydroxyl groups (serine S and threonine T), amino acids with sulfur groups (cysteine C and methionine M), acidic amino acids and their amides (aspartic acid D, asparagine N, glutamic acid E, and glutamine Q), basic amino acids (arginine R, lysine K, histidine H), aromatic amino acids (phenylalanine F, tyrosine Y and tryptophan W), and imino acids (proline P). For example, tyrosine and phenylalanine are both aromatic amino acids.
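The classification just described can be applied mechanically to count identities and conserved substitutions between a predicted epitope and a candidate native sequence. The following Python sketch is one possible way to do so; it assumes the two sequences are already aligned and of equal length, and the function and variable names are hypothetical. The example pairs are the pentamers for Antibody 2 and Antibody 4 and their native counterparts discussed above.

```python
# Amino acid classes as listed in the text, by single letter abbreviation.
CLASSES = {
    "aliphatic": "GAVLI",
    "hydroxyl": "ST",
    "sulfur": "CM",
    "acidic_or_amide": "DNEQ",
    "basic": "RKH",
    "aromatic": "FYW",
    "imino": "P",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def compare_position(a, b):
    """Classify one aligned position as identity, conserved, or non-conserved."""
    if a == b:
        return "identity"
    class_a = CLASS_OF.get(a)
    if class_a is not None and class_a == CLASS_OF.get(b):
        return "conserved"
    return "non-conserved"


def score_alignment(predicted, native):
    """Return (identity count, homology count) for two pre-aligned sequences.

    Homology here means identity or a conserved substitution.
    """
    if len(predicted) != len(native):
        raise ValueError("sequences must be aligned to equal length")
    kinds = [compare_position(a, b) for a, b in zip(predicted, native)]
    identity = kinds.count("identity")
    homology = identity + kinds.count("conserved")
    return identity, homology


print(score_alignment("QAPYY", "QVPYY"))  # A and V are both aliphatic
print(score_alignment("LHQCQ", "LHQIQ"))  # C (sulfur) versus I (aliphatic)
```

In the first pair the single mismatch is a conserved substitution, whereas in the second pair it is not, so the two pentamers have the same identity count but different homology counts.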
True matches can be distinguished from false ones by applying the following qualifying criteria: (a) For a five amino acid predicted epitope, an identical match in four positions out of five positions (80% identity) will distinguish true from false matches; (b) For a seven amino acid predicted epitope, identity in 4 positions (approximately 57% identity) and homology (either identity or conserved substitution) in at least 2 more (approximately 86% overall alignment match) will distinguish true from false matches; (c) For an eight amino acid epitope, identity in 6 positions (75% identity) and homology in at least 1 more (87.5% overall) makes the distinction. Applying this third criterion to the data set in
Summary of E-MAP Technology
Our newly described E-MAP technology is a valuable new investigative tool for uncovering the target of immune responses in various diseases. The new investigative capabilities of E-MAP may be useful for elucidating the etiology of various diseases, including B and T lymphoproliferative disorders, inflammatory diseases of unknown etiology, allergy, and autoimmunity. The only requirements for using the technique are the availability of antibodies, preferably monoclonal, and that at least some of them recognize linear epitopes. In addition, E-MAP requires that the true protein antigen, or a homologue, be present in the protein database. Pairwise searching may be equally useful in analyzing T lymphocyte targets in inflammatory diseases of unknown etiology. Unlike antibodies, the T lymphocyte receptor always recognizes linear epitopes, eliminating the drawback of unproductive searches due to antibody recognition of conformational epitopes.
An important new feature of this technology is the use of a screening step, selecting only the most immunoreactive phage binders to the selecting antibody. By including this step prior to phage clone selection, we select for phage particles expressing peptides that bind most strongly to the selecting antibody. We discovered that these peptides most closely resemble the epitope to which the antibody binds in the native protein. The screening step can be an immunoblot or other immunoassay that tests immunoreactivity of the phage particles to the selecting antibody. If the entire (non-redundant) protein database is being searched with the resulting sequence, then our predictions show that the consensus sequence must have at least seven amino acids that are homologous to the native protein. If smaller protein databases are searched, then fewer amino acids will suffice.
An important new feature of the E-MAP technology is the pairwise search analysis. This feature overcomes the statistical limitation that previously precluded finding accurate matches with most predicted epitopes. Searching the protein databases simultaneously with two, even short, predicted epitopes provides sufficient statistical power to accurately retrieve the correct protein target from the protein database. Such a pairwise motif analysis essentially “co-immunoprecipitates” the true antigen target in silico.
This pairwise analysis can yield strikingly different results compared to single search protocols currently in use. With a single epitope search, even one amino acid substitution can dramatically skew the search results. Because of this potential for error, top ranking search results from single epitope database searches may exhibit complete sequence identity in their alignment with the predicted epitope probes and still be incorrect matches. In fact, dozens or even hundreds of database hits may be exact matches or have only one amino acid substitution, depending upon the length of the predicted epitope. The longer the predicted epitope, the more unique that sequence will be, yielding fewer closely matching database search results. It is therefore difficult to critically evaluate such a large number of potential antigens and select candidates for experimental verification.
Pairwise motif analysis, on the other hand, combines the predictive power of two motifs, thereby establishing an even higher level of search stringency. The net result is the reorganization of candidate hit lists compared to single epitope searches, revealing a new set of search results with the requisite presence of both motifs appearing in declining order of relative combined alignment. Thus, E-MAP results do not independently prove that a particular protein is an antibody's target. Rather, E-MAP identifies a short list of potential protein candidates for further testing and evaluation.
In most instances, the predicted epitope is closely homologous to the eliciting epitope in the native protein. This is a testament to the power of the phage display technique that, by using a random peptide combinatorial library, provides an antibody with a staggering array of oligopeptides from which to select. By imposing high stringency selection conditions, proper phage to antibody ratios, and a post-panning immunoblot selection of individual clones, the selected phage clones' peptide inserts generally observe a tight convergence to the native protein epitope. There is always some degree of uncertainty in predicting epitopes using phage-displayed combinatorial peptide libraries. We have shown, however, that a small amount of uncertainty can be tolerated in the bioinformatics algorithms.
It is possible to narrow the search if there is information about the protein target from prior clinical investigation. The non-redundant protein database comprises the largest set of entries, spanning all species. If, for example, one has reason to believe that the protein is microbial in origin, then a more restricted database search, limited to microbial proteins, can be used to narrow the search parameters. The various protein databases have been described elsewhere [Apweiler, R., et al. Curr Opin Chem Biol. (2004) 8:76-80.] and specific subsets can be downloaded from various sources to be searched separately. With more limited searches, fewer amino acids than seven will suffice in the consensus sequence, for single epitope protein database searching. Pairwise searching will also likely yield a shorter list, with fewer irrelevant potential protein database matches, if a smaller protein database can be searched because of the availability of information limiting the protein to a particular species or group of species.
A limitation of E-MAP is that conformational epitopes will not yield matches in the protein database. Although some textbooks suggest that conformational epitopes may predominate in immune responses, we believe that this conclusion may somewhat overestimate their prevalence. Many antigens also produce humoral immune responses to linear epitopes. [Atassi, M. Z. Eur J Biochem. (1984) 145:1-20.] In fact, we previously described that the monoclonal antibodies used for clinical immunohistochemistry testing are all directed to linear epitopes. [Sompuram, S., et al. Amer. J. Clin. Pathol. (2006) 82-89.] The search tools that are currently available for epitope mapping of conformational epitopes require knowledge of the crystal structure of the protein antigen. [Schreiber, A., et al. J Comput Chem. (2005) 26:879-87.] Although antibodies to conformational epitopes do not help identify the protein target, our findings shown in
In practical terms, the E-MAP analysis process involves submitting a collection of clinically relevant monoclonal antibodies for analysis, not knowing which, if any, are correctly matched to the same protein. Since we have no way to know which antibody pairs will be correctly matched, we submit all combinations in separate pairwise searches. The number of independent pairwise combinations to be performed is, in fact, manageable and is calculated from combination theory as n!/[2×(n−2)!], where n equals the number of independent antibodies being analyzed. For example, nine different antibodies result in 36 different pairwise searches.
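The enumeration of pairwise searches is straightforward to automate. The short Python sketch below evaluates the formula n!/[2×(n−2)!] and lists the pairs themselves using the standard library; the antibody labels are hypothetical placeholders.

```python
from itertools import combinations
from math import comb, factorial


def pairwise_search_count(n):
    """Number of distinct antibody pairs, n!/[2 x (n-2)!]."""
    if n < 2:
        return 0
    return factorial(n) // (2 * factorial(n - 2))


# Hypothetical labels for nine antibodies.
antibodies = [f"Ab{i}" for i in range(1, 10)]
pairs = list(combinations(antibodies, 2))

print(len(pairs), pairwise_search_count(9), comb(9, 2))
```

Each tuple in `pairs` would correspond to one pairwise submission; the same function gives six searches for four motifs, as in the ER and PR test model.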
Exemplary Application of E-MAP to Multiple Myeloma
Although there are many applications for an immunomic search technology, it was of immediate interest to us for investigating the etiology of B lymphoproliferative disorders. There is growing evidence that these malignancies are triggered by antigenic stimuli. [Jack, H.-M., et al. Proc. Natl. Acad. Sci. (USA). (1992) 89:8482-8486; Friedman, D., et al. J. Exp. Med. (1991) 174:525-537; Lecuit, M., et al. N Engl J Med. (2004) 350:239-48; Sahota, S., et al. Blood. (1997) 89:219-226.] The accumulating evidence for stimulation through the B-cell receptor in clonal B-cell lymphoproliferative disorders highlights the importance of characterizing the antigenic stimuli. Identification of these antigens may illuminate the etiology of B-cell lymphoproliferative diseases and open new avenues of therapeutic intervention. However, without a clinical basis to suspect a particular antigen, as in the paradigm case of gastric MALT lymphoma and H. pylori [Parsonnet, J., et al. N Engl J Med. (2004) 350:213-5.], there is currently no method to identify putative antigenic stimuli. Consequently, we applied the E-MAP technology to this clinical question, by performing an immunomic analysis of the paraproteins found in multiple myeloma.
Multiple myeloma is a malignancy of cells in the B lymphocytic lineage that produce a monoclonal immunoglobulin, or “paraprotein”. There is no known etiologic agent for multiple myeloma, but there is growing evidence that microorganisms are important etiologic causes of other B lymphocytic malignancies. The most striking example is gastric MALT lymphoma, which has been linked to chronic H. pylori infection. [Isaacson, P. Annals of Oncology. (1999) 10:637-645; Eck, M., et al. Recent Results in Cancer Research. (2000) 156:9-18; Boot, H., et al. Scand. J. Gastroenterol.—Suppl. (2002) 236:27-36.] In that example, identification of the etiologic agent led to the use of antibiotics as a curative treatment, especially for patients with low grade lymphomas. Similarly, immunoproliferative small intestinal disease (IPSID), an uncommon form of B cell lymphoma arising in the small intestinal mucosa-associated lymphoid tissue, has been linked to C. jejuni. [Lecuit, M., et al. N Engl J Med. (2004) 350:239-48.] In other instances, B lymphomas have been described as autoreactive to an endogenous retrovirus in one case [Jack, H.-M., et al. Proc. Natl. Acad. Sci. (USA). (1992) 89:8482-8486.] or to unknown autoantigens in another. [Friedman, D., et al. J. Exp. Med. (1991) 174:525-537.] Other microbial antigenic drivers of B lymphoproliferative disorders include B. burgdorferi with MALT lymphoma of the skin, C. psittaci with MALT lymphoma of the ocular adnexa, and hepatitis C virus with splenic marginal zone lymphoma. [Fisher, S., et al. Curr Opin Oncol. (2006) 18:417-424.]
Despite these findings, there are no known microbial associations for the most prevalent B lymphoproliferative disorders. Previously established associations were initially suspected on the strength of clinical clues. For example, H. pylori had already been demonstrated to cause gastric ulcers, before it was investigated as a cause of gastric MALT lymphoma. Without clinical clues, there is no method for identifying the antigenic specificity of malignant T or B lymphocytes. In the past decade, there have been several attempts to identify antigens for multiple myeloma by probing paraproteins' antigen-binding regions (paratopes) with combinatorial peptide libraries. [Dybwad, A., et al. Scand J Immunol. (2003) 57:583-90; Szecsi, P. B., et al. Br J Haematol. (1999) 107:357-64; Thurnheer, M., et al. Eur. J. Immunol. (1999) 29:2676-83; Zonder, J., et al. American Society of Clinical Oncology Annual Meeting. (2005) Abstract 6626.] By identifying peptides that bind to a paratope, it was hoped that it might be possible to link the sequence to an entry from the protein databases. The peptide sequences that were identified were insufficiently precise or accurate to yield particularly meaningful database hits.
With E-MAP, it is possible to identify the corresponding protein antigens for antibodies, without ancillary clinical clues. E-MAP differs from previous methodologic approaches [Dybwad, A., et al. Scand J Immunol. (2003) 57:583-90; Szecsi, P. B., et al. Br J Haematol. (1999) 107:357-64; Thurnheer, M., et al. Eur. J. Immunol. (1999) 29:2676-83; Zonder, J., et al. American Society of Clinical Oncology Annual Meeting. (2005) Abstract 6626.] in at least two important ways. First, higher stringency levels are used during phage panning, resulting in a more accurate and predictive consensus peptide sequence. Second, E-MAP uses a different type of bioinformatic analysis, looking for clustering of protein database targets amongst two or more patients. We performed an E-MAP analysis on the paraproteins from nine randomly chosen patients with multiple myeloma (MM).
E-MAP Analysis of Multiple Myeloma
A phage library with approximately 20-mer random linear peptide inserts was enriched by three rounds of panning against myeloma patients' paraproteins. Each round of selection comprised a positive selection against the paraprotein, a negative selection against normal human immunoglobulins, and a subsequent positive selection round against the same paraprotein. The eluted phage from round one were then amplified by transfection in E. coli and the process repeated. The enriched third round phage were then plated on an agar/E. coli lawn. Replicate lifts were created on nitrocellulose membranes, which were then tested against the myeloma patients' serum for immunoreactivity.
We analyzed nine patients by E-MAP, and show the sequence data from two of them—patients 12 and 20 (
For patient 20 motif 2, the fact that so many peptide stretches corresponding to the consensus sequence are immediately adjacent to the carboxy terminus (right-hand side) indicates that the next (invariant) amino acid is likely identical to the native sequence. Otherwise, the peptide stretches corresponding to the consensus sequences should have been randomly positioned within the peptide insert. For that reason, we included the next amino acid on the carboxy side (glycine, G) as part of the consensus peptide sequence. The dominant amino acid sequence for each of the two patients was derived from MEME, and is listed at the bottom of
A serum protein electrophoresis gel image from patients 12 and 20 is shown in
The consensus peptide sequences for patient 12 and motif 2 of patient 20 share the amino acid sequence E-Y-T L-Y G (dashed spaces representing positions of some uncertainty). Because of the similarity, we speculated that the two paraproteins may actually recognize the same exact epitope. To evaluate this possibility, we tested phage preparations enriched to bind to one paraprotein for immunoreactivity to the other patient's serum antibodies. Namely, phage that were enriched for patient 12's paraprotein were tested for immunoreactivity against the paraprotein of patient 20, and vice versa. Several other patient sera were included as controls.
ELISA of phage. Immulon-4HBX flat-bottom microtiter plates (Thermo Electron Corp; Milford, Mass.) were coated with 100 μl/well of 4 μg/mL of anti-human-IgG or anti-human-IgA (Vector Laboratories; Burlingame, Calif.) in 0.05 M carbonate-bicarbonate buffer, pH 9.6 (capsules by Sigma-Aldrich), overnight at 4° C. Unbound antibody was rinsed off and the wells were blocked with 200 μl/well of 5% non-fat dry milk in PBS, for 1 hour at room temperature. The wells were rinsed once and patient sera (as well as pooled normal control sera) were added, appropriately diluted so that the final concentration of immunoglobulins was 10 μg/mL in PBST (0.05% Tween), 0.1% milk, and incubated 2 hours at room temperature. Wells were washed 8× with PBST (0.05%). First, second and third round phage preps from each analyzed patient, as well as L-20 starting library and a phage preparation of wildtype M13 phage, were diluted 1:100 in PBST (0.1%), 0.1% milk, and 100 μl/well were added and incubated overnight at 4° C. The wells were washed 8× with PBST (0.05%). Rabbit anti-fd (anti-phage) was prepared as 1:750 in PBST (0.05%), 0.1% milk, and 100 μl/well were added for 2 hours at room temperature. The wells were washed 8× with PBST (0.05%). Goat anti-rabbit-Alkaline Phosphatase was prepared as 1:750 in PBST (0.05%), 0.1% milk, and 100 μl/well were added for 2 hours at room temperature. One of ordinary skill will understand that any antibody-enzyme conjugate, where the antibody is directed to the M13 phage, will suffice in this assay. The wells were washed 8× with PBST (0.05%) and then 100 μl/well of alkaline phosphatase substrate (1 mg/mL, tablets, Sigma Chemicals; St. Louis, Mo.) was added. The absorbance at the appropriate wavelength (depending upon the enzyme and substrate used) was read on a Bio-Rad Model 2550 EIA Reader instrument.
Since the sequence data for patient 12 and motif 2 of patient 20 (
MAST is capable of accepting the MEME analysis motif output in the form of a two-dimensional numeric display, the Position-Specific Scoring Matrix (PSSM). The latter is not simply a dominant motif string, but contains all of the phage clones' peptide insert information, preserving the experimentally-observed positional variation within the span of the determined motif. This results in a profile of a virtual mimotopic array of peptides. Matches are rated on exactness of fit and then scored for probabilities of occurrence based on accepted bioinformatics models. The better the fit, the higher the rank order of the retrieved hit.
We submitted the combined PSSM of patients 12 and 20 to MAST, searching against the non-redundant (nr) protein database, having set a threshold expectation (E) value of 50. We retrieved 61 hits, 41 of which were entries for the glycoprotein B of human cytomegalovirus (HCMV), beginning at position 11. Discounting multiple entries for the same protein, we retrieved 15 distinct proteins. Aside from glycoprotein B, the remaining 14 were all entries for conceptual translations afforded by various sequencing projects. We scrutinized all of the hits for the number of amino acids demonstrating identity with our 9 well characterized positions. Only 4 hits exhibited identity in 7 out of the 9 positions, and out of those only glycoprotein B had maximal coverage for all 9 when conserved substitutions were considered.
We also submitted the dominant motif string (EXVYDTTLXYG) to the National Center for Biotechnology Information (NCBI)'s “search for short, nearly exact matches” protein-protein BLAST utility (http://www.ncbi.nlm.nih.gov/BLAST/). This allowed us to better lock in the amino acid identity for the predicted epitope's positions. Even though conserved substitutions would be considered, there was no PSSM introducing further laxity in defining the positions. We searched against the nr database, using default settings (PAM 30 matrix, word size 2 and expectation value 2000), requesting the top 100 hits. Glycoprotein B populated positions 2-66 of the search. The top ranked hit was a protein predicted to be similar to the zinc finger protein 539 from Pan troglodytes. However, this top ranked hit failed to exhibit the maximal alignment achieved with HCMV Glycoprotein B. All in all, the predicted epitope achieved a 63% (7/11) identity and 81% (9/11) overall homology with glycoprotein B.
We also performed a similar analysis for motif 1 of patient 20. This search identified the UL-48 gene product of human cytomegalovirus as a leading candidate. The degree of homology is shown in
HCMV Immunoreactivity Assays
HCMV Glycoprotein B ELISA. Since glycoprotein B of human cytomegalovirus so closely aligned with the combined consensus peptide sequence from patients 12 and 20, we tested whether it is, in fact, the antigen. Sera from forty different myeloma patients were tested for immunoreactivity to the AD2 domain of glycoprotein B in a commercial ELISA kit (Biotest, Dreieich, Germany). In this kit, the antigen is a fusion protein derived from the UL55 reading frame of HCMV glycoprotein B, strains AD169 and Towne.
HCMV Lysate Immunoassay. These findings were also confirmed in a different commercial assay (“VIDAS”), marketed by bioMérieux, Inc., Durham, N.C. Rather than testing for immunoreactivity to a purified HCMV recombinant glycoprotein B, the VIDAS assay tests for immunoreactivity to a HCMV lysate, which is immobilized onto a solid phase. Thus, the lysate is able to test for a greater array of different antibodies to various HCMV proteins. This particular assay detects IgG antibodies to HCMV with a monoclonal anti-human IgG-alkaline phosphatase conjugate.
UL-48 Gene Product ELISA. The same forty MM patients as tested for immunoreactivity to glycoprotein B were also tested for immunoreactivity to the N-terminus (amino acids 1-20) of the UL-48 gene product. Patient 20's serum sample yielded the strongest signal, confirming the immunoreactivity that was predicted by E-MAP analysis (
Patient 20 has Two Serum Paraproteins
We were surprised that patient 20's E-MAP analysis produced two different motifs, since serum protein electrophoresis (SPEP) from patient 20 revealed a single paraprotein (
To sort out the source of the two motifs associated with patient 20, we performed a phage immunoblot experiment (
Immunoblots for IgG and phage. Patient sera were diluted in PBS and 10 μl aliquots were loaded and run on a precast protein PUN, agarose gel, in a Hydrasys instrument (SEBIA-USA, Norcross, Ga.) according to the manufacturer's instructions. The automated program was stopped after phoresis (40 Vh, ~5 minutes) and not allowed to proceed to the gel drying step. The gel was removed from the instrument and contact blotted onto a nitrocellulose membrane (Protran BA83 0.2 μm nitrocellulose membrane; Whatman, Florham Park, N.J. or NitroBind Cast pure nitrocellulose 0.45 μm; General Electric Water & Process Technologies, Minnetonka, Minn.), under 100 g of weight, for 30 minutes at room temperature. Placement of the gel relative to the membrane was noted with ink, demarking sample lanes and other features of interest. The gel was then removed and the membrane blocked with 2% milk PBST for 1 hour at room temperature. The membrane was rinsed twice with PBST and specific phage, prepared in 1% milk PBST, was added for an overnight incubation at 4° C. with rocking. The membrane was washed three times, 10 minutes each, with PBST, and mouse anti-M13-HRP conjugate was added, prepared as 1:5000 in 1% milk PBST, for 1½ hours at room temperature. The membrane was washed twice with PBST and once with PBS, and any retained phage were visualized using a standard chemiluminescence protocol. Also, SPEP-blots were undertaken with patient sera diluted 1:1000 in PBS and these blots were developed with goat anti-human-IgG-HRP to reveal the location of the paraprotein, as an internal control for each run.
Agarose gel immunoblot with HCMV lysate and virions. In order to correlate specific paraproteins on the electrophoretic gel with their binding capability, we performed an immunoblot. We tested whether HCMV immunoreactivity co-migrates with the paraprotein on agarose gel electrophoresis. The serum protein electrophoretic patterns of patients 12 and 20, as stained with the protein dye amido black, are shown in lane 1 of
In order to assess HCMV immunoreactivity, an agarose gel immunoblot method was used, [Nooija, F., et al. J. Immunol. Methods. (1990) 134:273-281; Knisley, K., et al. J Immunol Methods. (1986) 95:79-87.] (lanes 3-6). Since the immunoblot is several log orders more sensitive than the SPEP, sera were diluted in order to find a linear range of detection. For patient 12, the restricted band that binds to both intact HCMV virions (lane 3) and an HCMV lysate (lane 5) exactly aligns with the paraprotein (lane 1, arrow). Since glycoprotein B is a viral membrane protein, we expected patient 12's paraprotein to bind both the HCMV lysate and intact virion preparation. With intact virions, viral membrane proteins such as glycoprotein B are accessible for antibody binding.
The analysis for patient 20 (
Agarose gel immunoblot. For this assay, proteins are electrophoretically separated in an agarose gel. The proteins are then contact blotted onto an antigen-coated nitrocellulose membrane. Protein transfer requires that serum antibodies in the gel bind to antigen on the nitrocellulose membrane. Only immunoglobulins capable of binding to the antigen adhere. The nitrocellulose membrane is otherwise saturated with irrelevant proteins, largely preventing non-specific protein transfer. Immunoglobulins that are bound to the nitrocellulose sheet are then visualized with a human IgG-specific antibody-enzyme conjugate.
Nitrocellulose membranes were incubated with specific phage prepared in 0.5 M bicarbonate buffer (pH 8.0), overnight at 4° C. with rocking. The membranes were then rinsed with PBST and blocked for 1 hour with 2% milk PBST. In this variation of the immunoblot, the gels were allowed to contact the antigen-coated nitrocellulose membranes for 30 minutes at room temperature, sandwiched between two glass plates. The relative position of the gels to the membranes was marked in ink, and the gels were removed. The membranes were thoroughly washed three times in PBST for a total of 30 minutes. Membranes were then incubated with goat anti-human IgG-HRP conjugate prepared as 1:5,000 in 1% milk PBST for 1½ hours at RT or overnight at 4° C., with rocking. Membranes were washed twice with PBST and once with PBS before development by chemiluminescence.
CMV-Reactive Paraproteins in Other MM Patients (
Besides patients 12 and 20, we tested 24 other MM patients for immunoreactivity to HCMV lysates, using a commercial ELISA. Patient sera were diluted ten-fold more than recommended by the manufacturer, since the paraproteins are present in high concentrations. Ten sera were not reactive and therefore not further tested (data not shown). Fourteen of the 24 patients were seropositive (data not shown). We then performed agarose gel immunoblots on each of them, to determine if the paraprotein is the source of the HCMV immunoreactivity. Of the 14 seropositive MM patients, eight had bands on the HCMV lysates lane that co-migrate with the paraprotein seen on SPEP (
The HCMV immunoblots in
Identification of the Human Endogenous Retroviral K Envelope Glycoprotein (HERV-K Env) as a Paraprotein Target.
We identified the target antigen for the paraproteins of two other multiple myeloma patients who were not seropositive to CMV. Patient #14 is a 70 year-old man with a diagnosis of multiple myeloma, with an IgG-lambda monoclonal component representing >99% of the serum immunoglobulins. Patient #21 is a 75 year-old man with a diagnosis of multiple myeloma with an IgG-kappa monoclonal component representing >99.9% of the serum immunoglobulins. The motif for patient 14 is LNTPLVVP. The motif for patient 21 is KSIPTEP.
Both of these motifs' PSSMs were submitted to MAST in a simultaneous search of the nr database. The best possible match for both motifs was afforded by the human endogenous retrovirus K envelope protein (HERV-K Env), which appeared at positions 1 and 56 of the database search results. The match can be seen in
Human endogenous retroviruses (HERVs) comprise 9% of the human genome. They are relics of unexpressed proviruses that integrated into the germline genome of primate/human predecessors 40 million years ago. Most of the HERV sequences are defective due to accumulation of deletions or mutations. The HERV-K family consists of 30 to 50 proviruses and is the only human endogenous provirus family to retain open reading frames for the Gag, Prt, Pol and Env viral proteins. Our finding of two paraproteins directed to HERV-K Env protein suggests that the retrovirus is expressed in some myeloma patients. Involvement of HERVs in multiple myeloma or, for that matter, any other type of clonal B lymphoproliferative disease, has not been previously described.
Implications of the E-MAP Findings in Multiple Myeloma Pathogenesis
In this first clinical application of the E-MAP methodology, we find that a surprisingly high proportion of paraproteins in MM are directed to HCMV. Including patients 12 and 20, we found that at least 10 out of 26 MM patients had HCMV-reactive paraproteins. The fact that patient 20 had two separate paraproteins, both directed to different HCMV proteins, further suggests that HCMV is not a randomly chosen antigenic target. These findings raise potentially important implications for the pathogenesis, diagnosis, and treatment of MM.
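The 10-out-of-26 figure follows from the counts reported in the preceding sections. As a quick arithmetic check, the tally can be written out as below. The variable names are ours, and the numbers are taken directly from the text.

```python
# Arithmetic check of the patient tally, using counts stated in the text.
other_patients_tested = 24            # MM patients besides patients 12 and 20
seronegative = 10                     # not reactive, not tested further
seropositive = other_patients_tested - seronegative
comigrating_among_seropositive = 8    # bands co-migrating with the paraprotein
index_patients = 2                    # patients 12 and 20

reactive = comigrating_among_seropositive + index_patients
total = other_patients_tested + index_patients
fraction = reactive / total

print(f"{reactive} of {total} patients ({fraction:.1%})")
```

The fraction works out to roughly 38% of the patients examined.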
Our findings suggest that HCMV may represent a viral stimulus that leads to MM in a subset of infected individuals. Following an initial infection, HCMV normally remains in a persistent, latent state within the host, controlled by the host's immune system. Nonetheless, the virus is capable of reactivation and shedding, even in seropositive immune-competent individuals. Thus, it likely represents a chronic immune stimulus, fostering the ongoing stimulation and growth of HCMV-specific B and T lymphocytes.
Our findings raise the possibility that persistent or repetitive chronic immune stimulation by HCMV may act as a tumor promoter, by causing clonal expansion of HCMV-reactive B lymphocytes. As the proliferating lymphocytes accumulate mutations, the evolving pre-malignant MM cell may require the presence of antigen, HCMV. This hypothesis is consistent with the clinically observed entity known as monoclonal gammopathy of undetermined significance (MGUS), a precursor of MM. Such persistent proliferative stimulation may predispose the pre-malignant MM cell, over time, to additional transforming events associated with dysregulation of cell cycle checkpoints and apoptotic pathways. By the time a clinical diagnosis of MM is made, the virus may no longer need to be productively expressed [Hermouet, S., et al. Leukemia. (2003) 17:185-195.], and the MM cells may no longer be antigen-dependent. If this hypothesis is true, then it raises the possibility that early intervention with anti-viral agents may prevent progression to frank malignancy. Moreover, if infection could be prevented with an effective vaccine [Khanna, R., et al. Trends Mol. Med. (2006) 12:26-33.], then many cases of multiple myeloma might potentially be prevented. These findings also have potential implications for other B lymphoproliferative disorders, apart from multiple myeloma. If antigen acts as a tumor promoter, then B lymphoproliferative disorders provide us with a unique fingerprint—the antibody itself—for identifying the relevant antigens promoting tumor growth. The E-MAP technology now allows us to match the fingerprints to disease targets.
Our findings of paraproteins directed to both CMV and HERV-K raise the possibility that the two are linked pathogenetically. In this regard, it is relevant that Herpesviridae have been shown to transactivate HERV-K elements. It is of special interest that the latent proteins from Epstein-Barr virus (a known oncogenic and lymphotropic virus that infects B cells) are sufficient to transactivate HERV-K Env, and presumably other proviral transcripts.
Implications of E-MAP For Diagnostic Test Development & Biomarker Discovery
The E-MAP technology may be highly valuable in biomarker discovery for the development of medical diagnostic tests. In this context, the antigen itself can serve as a clinically relevant biomarker. Our findings with regard to CMV and HERV-K raise the possibility that immunoassays, including electrophoretic immunoassays, may be valuable in the diagnosis, classification for treatment, or prognosis of lymphoproliferative disorders and gammopathies, such as multiple myeloma. These assays can take many forms, including both solid phase immunoassays, such as ELISA, as well as electrophoretic immunoassays, such as immunofixation-in-gel or immunoblots.
For example, one type of assay might use a column containing a solid phase substrate, such as Sepharose, to which CMV or HERV-K (or their proteins or peptides) are immobilized. The patient's serum sample would be passed into the column, and any CMV- or HERV-K-specific antibodies would contact and bind to their respective binding partners. After a suitable incubation time, typically 15-60 minutes, the serum (or plasma) is rinsed out, leaving only the column-adherent antibody. The antibody can then be eluted, such as with acid (e.g., 10 mM glycine pH 2.5) or base. The eluate can then be neutralized and analyzed by electrophoresis, to determine if the eluted antibody co-migrates with the serum paraprotein identified on serum protein electrophoresis or immunofixation.
Another exemplary immunoassay for determining if the immunoglobulin secreted by the malignant cell (a.k.a. the paraprotein) is directed to CMV or HERV-K is a solid phase immunoassay, such as an ELISA or microarray. In the latter alternative, various proteins or peptides derived from CMV or HERV-K can be coupled to the array substrate using techniques that are well known in the art. A suitable method for covalent conjugation of peptides or viral proteins to glass, for example, is described in U.S. Pat. No. 6,855,490, also assigned to Medical Discovery Partners LLC, the same assignee on this patent application. In such an embodiment, the patient's serum or plasma sample is pipetted onto the array surface, allowing any antibodies to the array components to contact and bind to their respective protein or peptide targets. After a suitable incubation time, such as 15-60 minutes, the serum or plasma sample is removed. The surface is typically rinsed with a physiologic buffer, to wash away any weakly-binding antibodies or other serum proteins. Tightly-bound serum antibodies are then detected with a reagent that binds to human immunoglobulins, such as an anti-human immunoglobulin antibody conjugate. The reagent can be conjugated to one of many suitable labels, including fluorochromes (e.g., fluorescein) or enzymes (e.g., horseradish peroxidase). Depending upon the label, the presence of bound antibodies from the patient's serum sample is detected visually, such as with a fluorescence microscope or brightfield microscope.
Another possible format for an immunoassay to test paraprotein target specificity is a Western blot. In a Western blot, the proteins from CMV or HERV-K (in this case) would be separated out electrophoretically, such as by SDS-polyacrylamide gel electrophoresis. The proteins are then transferred onto a membrane, such as nitrocellulose or PVDF. The membrane with the separated proteins bound to the surface then serves as a kind of solid phase in an immunoassay, albeit on a membrane. The serum or plasma sample, for example, is then added to the membrane, usually contained in a vessel, so that the serum/plasma sample thoroughly contacts the membrane. After a suitable incubation time, non-adherent serum or plasma components are removed by rinsing the membrane surface with a physiologic buffer. The presence of tightly bound serum antibodies, such as a paraprotein, is then detected with a reagent that binds to human immunoglobulins, such as an anti-human immunoglobulin antibody conjugate, as described in the preceding paragraph. Tightly-bound serum antibodies, such as paraproteins, will bind in the same general shape as the viral protein on the membrane, as it ran in the electrophoretic gel. Identifying the specific location of the bands on the membrane will facilitate a determination of the identity of each protein in the gel, since various viral proteins can be correlated with their known electrophoretic mobility. Electrophoretic mobility of specific viral proteins can be established by identifying them through a variety of means, including blotting with monoclonal antibodies to each of the major viral proteins in parallel to the patient sample.
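The band-to-protein correlation described above can be sketched as a nearest-match lookup against a reference table of mobilities. This is only an illustration under stated assumptions: the function name, the tolerance parameter, and the idea of expressing mobility as a single number per protein are ours, and any reference values would have to come from parallel monoclonal antibody blots as described.

```python
# Illustrative sketch: assign each observed band to the reference protein
# with the closest mobility, if one lies within a chosen tolerance.
# All names and the matching rule are hypothetical, not from the source.

def assign_bands(observed, reference, tolerance):
    """Map each observed mobility to the closest reference protein name.

    observed  -- iterable of measured band mobilities
    reference -- dict of protein name -> known mobility
    tolerance -- largest accepted absolute difference
    Bands with no reference protein within tolerance map to None.
    """
    assignments = {}
    for band in observed:
        best_name = None
        best_diff = None
        for name, mobility in reference.items():
            diff = abs(band - mobility)
            if diff <= tolerance and (best_diff is None or diff < best_diff):
                best_name, best_diff = name, diff
        assignments[band] = best_name
    return assignments
```

A band that matches no reference entry is reported as `None` rather than forced onto the nearest protein, so unassigned bands remain visible for follow-up.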
Immunoassays such as ELISA, microarrays or Western blot will detect antibodies to immobilized components, but those antibodies may not necessarily be paraproteins (derived from a malignant cell). However, since serum paraproteins in patients with gammopathies (such as multiple myeloma) are usually present in high concentrations, it is reasonable to make the inference that the antibody is the serum paraprotein if the antibody titer is beyond that which would be expected from the normal background of polyclonal antibodies. For example, a threshold value is established beyond which only a small fraction of normal individuals are reactive. In testing patients with gammopathies, any positive results will have a statistical likelihood of being derived from the serum paraprotein, depending upon the established threshold value.
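One way to establish such a threshold is to take a high percentile of the signals measured in normal individuals and flag only patient results above it. The sketch below assumes this percentile approach. The function names, the default percentile, and the use of a single numeric signal per serum are illustrative assumptions, not part of the described assays.

```python
# Illustrative sketch: derive a reactivity threshold from normal control
# signals, then compare a patient signal against it.
# The percentile rule and all names are assumptions for illustration.

def reactivity_threshold(normal_signals, percentile=95):
    """Return the normal-control value at the given percentile (0-100).

    Uses the nearest-rank method: the smallest observed value such that at
    least `percentile` percent of normal signals are at or below it.
    """
    ordered = sorted(normal_signals)
    if not ordered:
        raise ValueError("at least one normal control signal is required")
    rank = (percentile * len(ordered) + 99) // 100   # integer ceiling
    index = max(0, min(rank - 1, len(ordered) - 1))
    return ordered[index]


def exceeds_threshold(patient_signal, threshold):
    """True if the patient signal is above the normal-derived threshold."""
    return patient_signal > threshold
```

Raising the percentile lowers the fraction of normal individuals who would be called reactive, at the cost of missing weakly reactive paraproteins. The choice depends on the intended use of the assay.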
We envision at least three different applications for E-MAP as a discovery tool leading to new diagnostic assays. In a first application, biomarker identification might be useful for diagnostics that are linked to therapy. For example, if anti-viral therapy is useful in treating multiple myeloma, then it is of obvious importance to know which myeloma patients have tumors associated with a particular virus. If the patient's paraprotein and malignant myeloma cells express surface receptors specific to a particular protein or peptide, then treatment might be possible where the antigen receptors on the cells are blocked, depriving the cells of an essential growth stimulus. Patients whose myeloma cells are directed to other targets might not benefit from this particular therapy. Similarly, it is potentially possible that the antigen itself or a peptide, conjugated to a cytotoxic agent, might serve as a means to target the receptor as a tumor-specific antigen. Exemplary cytotoxic agents are well known in the field, and can include radionuclides and toxins/toxin subunits. With such types of antigen conjugates, identifying the antigen is important if the patient is to receive the proper drug.
In a second application, E-MAP analysis can be useful in identifying markers for assessing disease prognosis. A precursor of multiple myeloma is a clinical entity called monoclonal gammopathy of undetermined significance (MGUS). Approximately 3% of the population over 55 years of age may have paraproteins, but the vast majority have no symptoms whatsoever. Only a small proportion of patients with MGUS progress to multiple myeloma, which is a life-threatening disease. Distinguishing those who will progress from those who will not could allow for early intervention. Identifying the antigen to which the paraprotein is directed might be informative in predicting which patients with gammopathies will develop multiple myeloma and which will remain as MGUS. Certain antigens may be expected to be associated with progression. If the clonal B lymphocytes responsible for MGUS are stimulated by different antigens, then the nature of the antigen could have a profound effect on the disease course. Certain antigens may be naturally present in higher concentrations, which might further support proliferation of a partially transformed malignant B lymphocyte clone. Alternatively, certain microorganisms may cause transformation by other ancillary means, such as by inserting viral promoters or dysregulating cell cycle or apoptosis machinery, and thereby be more predisposed to generating a malignant response. Regardless of the exact mechanism, any type of immunoassay that identifies the antigen to which the paraproteins are directed might be useful for determining patient prognosis.
In a third application, the E-MAP technology could be useful in biomarker discovery in tests for disease detection and disease monitoring. For example, knowing the precise antigen or even peptide epitope to which malignant cells bind allows one of ordinary skill to create more specific diagnostic reagents for the malignant B lymphocyte clone. Thus, instead of performing immunostains for kappa or lambda light chain, the peptides or protein antigens can be used as probes for identifying or quantifying the malignant cells. The peptide or protein antigens can be conjugated to moieties such as fluorochromes or enzymes, in order to detect their presence in an immunoassay. This type of antigen conjugate could then be used in flow cytometry, immunofluorescence, immunohistochemistry, or any other cellular assay. For example, such a conjugate could be useful in detecting minimal residual disease and quantifying the residual malignant cell fraction. In addition, the antigen conjugate can be used for detecting and quantifying a secreted paraprotein. Since the paraprotein will bind to the antigen, there are various methods by which an immunoassay might be designed to quantify a paraprotein. For example, the antigen might be immobilized onto a solid phase substrate, such as for an ELISA. Alternatively, the antigen might be used in a precipitation assay, such as for immunofixation analysis. Currently, clinical laboratories use antibodies to various immunoglobulin subtypes (IgG, IgA, IgM, kappa or lambda light chains) to precipitate paraproteins in an agarose electrophoretic gel. The E-MAP method allows us to now identify antigens that can cross-link the paraproteins in place, in the gel. This method may provide higher resolution of the paraproteins, since the antigens would be more specific than the broad categories of immunoglobulin subtypes.
Although we describe an application of E-MAP to multiple myeloma, many of the same conclusions and clinical opportunities exist for many other gammopathies. In fact, some gammopathies, such as MGUS or amyloidosis AL, may be more dependent on the presence of antigen for cellular growth than multiple myeloma. Thus, therapies aimed at suppressing the concentration of antigen may be more effective in some of these other clinical entities. Besides gammopathies, E-MAP should also be expected to be useful in a similar manner to other B lymphoproliferative disorders such as non-Hodgkin's lymphoma and chronic lymphocytic leukemia. Like multiple myeloma, E-MAP analysis of their B cell receptor immunoglobulin will be expected to identify the antigen to which these clonal B lymphocyte proliferations are directed. Thus, the same therapeutic and diagnostic opportunities exist for these clinical entities. In fact, since there is a much lower concentration of secreted immunoglobulin, some therapeutic options (such as antigen-toxin conjugates) may even be more useful in these other B lymphoproliferative disorders.
E-MAP analysis may also be useful in studying immune responses in other clinical entities, such as autoimmunity. E-MAP analysis can facilitate the identification of protein antigens linked to an autoimmune process. Identifying relevant antigens in autoimmune diseases may be diagnostically or therapeutically useful, in therapeutic target identification or in one or more of the diagnostic biomarker contexts previously described.
E-MAP analysis may also be useful in studying diseases of unknown etiology. To the extent that the immune response targets a pathogenetically-relevant protein antigen in a disease of unknown cause, E-MAP can identify these antigens as useful therapeutic or diagnostic targets. Exemplary diseases to which E-MAP can be applied include granulomatous diseases of unknown cause, including sarcoidosis, Crohn's disease, and giant cell arteritis. In each disease, the cause is not known and there is debate as to whether any or all of them might be caused by an infectious agent. By identifying proteins targeted by the immune system in affected patients, E-MAP analysis can narrow the universe of potential etiologies to a short list, for further evaluation.
The pairwise approach of bioinformatic analysis described for E-MAP analysis is also applicable to T lymphocytes. Pairwise analysis of T lymphocyte targets can help narrow down the list of candidate target proteins in a similar manner as described for antibody epitopes. In fact, since T lymphocytes only recognize linear epitopes, the analysis may be even simpler. Of course, epitope analysis of T lymphocytes requires a different methodology using T lymphocyte clones or purified T cell receptor. However, once the epitopes are experimentally reconstructed, the bioinformatic analysis that we describe is directly applicable.
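The narrowing step in a pairwise analysis can be illustrated as an intersection of two ranked candidate lists, one per epitope. This is a minimal sketch, assuming each database search yields a ranked list of protein identifiers with the best hit first. The function name and the combined-rank ordering are our assumptions and are not a description of how MAST or BLAST rank results.

```python
# Illustrative sketch: keep only proteins that appear in the hit lists of
# both epitope searches, ordered by the sum of their ranks.
# The ranking rule is an assumption made for this example.

def shared_candidates(hits_a, hits_b):
    """Return proteins found in both ranked hit lists, best combined rank first."""
    rank_a = {}
    for position, protein in enumerate(hits_a, start=1):
        rank_a.setdefault(protein, position)   # keep the best rank if repeated
    rank_b = {}
    for position, protein in enumerate(hits_b, start=1):
        rank_b.setdefault(protein, position)
    shared = rank_a.keys() & rank_b.keys()
    # Ties on combined rank are broken alphabetically for a stable result.
    return sorted(shared, key=lambda p: (rank_a[p] + rank_b[p], p))
```

For example, with hypothetical identifiers, `shared_candidates(["P1", "P2", "P3", "P4"], ["P9", "P3", "P1"])` returns `["P1", "P3"]`: only proteins matched by both epitopes survive, which is what shortens the candidate list.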
A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.
This application claims priority to U.S. Provisional Application Ser. No. 60/887,916, filed on Feb. 2, 2007, which is hereby incorporated by reference in its entirety.
The invention was supported, in whole or in part, by grants R44CA81950 and R44CA094557 from The National Institutes of Health. The Government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US08/52606 | 1/31/2008 | WO | 00 | 6/25/2010 |
Number | Date | Country |
---|---|---|
60887916 | Feb 2007 | US |