Recombinant antibodies represent the fastest growing class of therapeutic medicines, and the generation of antibodies that meet specific criteria is increasingly important for therapeutic applications. Currently, there are two predominant methodologies for therapeutic antibody generation: immunization-based and surface display-based approaches. These methodologies are responsible for the majority of the currently marketed therapeutic antibodies and for the biopharma industry pipeline, which is concentrated on only a small number of targets. A key challenge for the broader application of biotherapeutic approaches is the difficulty of raising functional antibodies against novel targets. Since many new targets are membrane-spanning and multimeric proteins, there is a need to develop more effective methods to generate antibodies against these difficult targets. In addition, the pharmaceutical properties of therapeutic antibodies are an active area of study, concentrating on biophysical characteristics such as thermal stability and aggregation.
Of the currently approved antibody therapeutics, many are humanized rodent antibodies. Although obtaining fully human antibodies from phage-displayed human antibody libraries or from transgenic rodents carrying human antibody genes is more popular today, rodents with wild-type immunoglobulin genes remain an important source for therapeutic antibody discovery. Some industrial laboratories have been able to obtain antibodies with low picomolar affinity, thus providing good candidates for therapeutic antibody engineering, such as humanization. A major challenge to immunization-based antibody discovery is related to the nature of new targets themselves, many of which are membrane-spanning proteins. Conventional biochemical preparation of soluble protein as immunogen therefore does not work well for this target class. Also, immune tolerance can lead to difficulties generating neutralizing antibodies when antigens are well conserved or are toxic upon administration to animals. Specific, immunodominant epitopes may be preferentially selected, making it difficult to identify functional antibodies.
Display technologies such as phage, yeast and ribosome display are based on the in vitro selection of antibody fragments from libraries and overcome some of the limitations of immune tolerance or epitope dominance in vivo. However, selection from such libraries may not always generate high-affinity antibodies without subsequent affinity maturation. Moreover, these methods typically select the tightest binders, regardless of the epitope they bind, which often results in isolating non-functional antibodies. Furthermore, antibody fragments isolated from microbial display systems are not always easily reformatted to produce well-expressed IgGs, soluble enough to be formulated for subcutaneous delivery.
Mammalian cell expression systems offer a number of potential advantages for therapeutic antibody generation, including the ability to co-select for key manufacturing-related properties such as high-level expression and stability, while displaying functional glycosylated IgGs on the cell surface. However, mammalian cell display has been hampered by the smaller library sizes that can be screened, making direct isolation of high-affinity binders from naïve libraries improbable. Although small libraries biased toward a particular antigen have been used successfully, a more generalized approach to generate high-affinity human antibodies from immunologically naïve libraries has not been reported. Importantly, mammalian display systems are also designed to select fragments with higher affinity that are not necessarily functional.
Stability and aggregation level are two critical factors that affect the pharmaceutical properties of biologic drugs, including protein production, formulation, shelf-life, dosing route, in vivo half-life and immunogenicity. Both sequence-based and structure-based approaches have been applied in attempts to improve biotherapeutic stability. Sequence-based analyses such as germline analysis, sequence conservation analysis, and sequence covariance analysis have all revealed potential amino acid changes to improve protein stability. Structure-based engineering attempts to stabilize fragile regions have involved inserting extra stabilizing interactions or eliminating incompatible interactions.
Antibodies (Abs) have two distinct functions: one is to bind specifically to their target antigen (Ag); the other is to elicit an immune response against the bound Ag by recruiting other cells and molecules. The association between an Ab and an Ag involves a myriad of non-covalent interactions between the epitope (the binding site on the Ag) and the paratope (the binding site on the Ab). The ability of Abs to bind virtually any non-self surface with exquisite specificity and high affinity is not only the key to immunity but has also made Abs an enormously valuable tool in experimental biology, biomedical research, diagnostics and therapy. The diversity of their binding capabilities is particularly striking given the high structural similarity between all Abs. The availability of increasing amounts of structural data in recent years now allows for a much better understanding of the structural basis of Ab function in general, and of Ag recognition in particular.
Antibody-Antigen (Ab-Ag) interactions are based on non-covalent binding between the antibody (Ab) and the antigen (Ag). Correct identification of the residues that mediate Ag recognition and binding improves our understanding of antigenic interactions and permits the modification and manipulation of Abs. For example, introducing mutations into the V-genes has been suggested as a way to improve Ab affinity. (Crameri A et al., Nat Med 2: 100-102 (1996); Figini M. et al. J Mol Biol 239: 68-78 (1994); Hawkins, R. E. et al., J Mol Biol 226: 889-896 (1992)). However, mutations in the framework regions (FRs) rather than in the Ag binding residues themselves are more likely to evoke an undesired immune response. (Lou, J. et al. “Affinity Maturation by Chain Shuffling and Site Directed Mutagenesis” in ANTIBODY ENGINEERING (New York: Springer) 377-396 (2010)). Knowing which residues are more likely to bind the Ag can help direct such mutations and be beneficial to Ab engineering. (Almagro, J. C., J Mol Recognit 17: 132-143 (2004); Gonzales, N. R. et al., Mol Immunol 41: 863-872 (2004); Padlan, E. A. et al. Faseb J9: 133-139 (1995)).
It has been shown that Ag binding residues are primarily located in the complementarity determining regions (CDRs). (Padlan, E. A. et al., Faseb J9: 133-139 (1995); MacCallum, R. M. et al., J Mol Biol 262: 732-745 (1996); Wu, T. T. et al., J Exp Med 132: 211-250 (1970)). Thus, the attempt to identify CDRs, and particularly the attempt to define their boundaries, has become the focus of extensive research over the last few decades. (Padlan, E. A. et al. Faseb J9: 133-139 (1995); MacCallum, R. M. et al., J Mol Biol 262: 732-745 (1996); Zhao, S. et al. Mol Immunol 47: 694-700 (2010)). Kabat and co-workers attempted to systematically identify CDRs in newly sequenced Abs. (Wu, T. T., and Kabat, E. A., J Exp Med 132: 211-250 (1970); Kabat, E. A. et al., “Sequence of proteins of immunological interest”, Bethesda: National Institute of Health 323 (1983)). Their approach was based on the assumption that CDRs include the most variable positions in Abs and therefore could be identified by aligning the fairly limited number of Abs available then. Based on this alignment, they introduced a numbering scheme for the residues in the hypervariable regions and determined which positions mark the beginning and the end of each CDR. The Kabat numbering scheme was developed when no structural information was available. Chothia et al. analyzed a small number of Ab structures and determined the relationship between the sequences of the Abs and the structures of their CDRs. (Chothia, C. et al., J Mol Biol 196: 901-917 (1987); Chothia, C. et al., Nature 342: 877-883 (1989)). The boundaries of the FRs and the CDRs were determined and the latter have been shown to adopt a restricted set of conformations based on the presence of certain residues at key positions in the CDRs and the flanking FRs. This analysis suggested that the sites of insertions and deletions in CDRs L1 and H1 are different from those suggested by Kabat.
Thus, the Chothia numbering scheme is almost identical to the Kabat scheme, but based on structural considerations, places the insertions in CDRs L1 and H1 at different positions. As more experimental data became available, the analysis was performed anew, re-defining the boundaries of the CDRs. These definitions of CDRs are mostly based on manual analysis and may require adjustments as the structures of more Abs become available. Abhinandan et al. aligned Ab sequences in the context of structure and found that approximately 10% of the sequences in the manually annotated Kabat database have erroneous numbering. (Abhinandan, K. R. et al., Mol Immunol 45: 3832-3839A (2008)). A more recent attempt to define CDRs is that of the IMGT database, which curates nucleotide sequence information for immunoglobulins (IG), T-cell receptors (TcR) and Major Histocompatibility Complex (MHC) molecules. (Lefranc, M. P. et al., Dev Comp Immunol 27: 55-77 (2003)). It proposes a uniform numbering system for IG and TcR sequences, based on aligning more than 5000 IG and TcR variable region sequences, taking into account and combining the Kabat definition of FRs and CDRs, structural data, and Chothia's characterization of the hypervariable loops. Its numbering scheme does not differentiate between the various immunoglobulins (i.e., IG or TcR), the chain type (i.e., heavy or light) or the species.
A drawback of these numbering schemes is that CDR length variability is accommodated either by annotating insertions (Kabat and Chothia) or by providing excess numbers (IMGT). Abs with unusually long insertions may be hard to annotate this way, and therefore their CDRs may not be identified correctly. Honegger and Plückthun suggested a structurally improved version of the IMGT scheme. (Honegger, A. et al., J Mol Biol 309: 657-670 (2001)). Instead of introducing unidirectional insertions and deletions as in the IMGT and Chothia schemes, they placed them symmetrically around a key position. MacCallum et al. have proposed focusing on the specific notion of Ag binding residues rather than the more vague concept of CDRs. (MacCallum, R. M. et al., J Mol Biol 262: 732-745 (1996)). They suggested that these residues could be identified based on structural analysis of the binding patterns of canonical loops. Other studies have dubbed those Ag binding residues Specificity Determining Regions (SDRs). (Almagro, J. C. et al., J Mol Recognit 17: 132-143 (2004); Padlan, E. A. et al., Faseb J9: 133-139 (1995)).
The specificity of the Ab molecule to its cognate Ag has been exploited for the development of a variety of immunoassays, vaccinations, and therapeutics. Ab engineering may expand the application of Abs by permitting improvements in affinity (Marks, J. D. et al. Biotechnology 10:779-8310 (1992); Soderlind, E. et al., Immunotechnology 4:279-85 (1999)) and specificity (Hemminki, A. et al., Immunotechnology 4:59-69 (1998); Ohlin, M. et al., Mol Immunol 33:47-56 (1996)). Understanding the role that each structural element in the Ab plays in Ag recognition is essential for successful engineering of better binders. The engineering of Abs is also important for the clinical use of Abs from non-human sources. Early studies on the use of rodent Abs in humans determined that they can be immunogenic (Mirick, G. R. et al., Q J Nucl Med Mol Imaging 48:251-7 (2004)). Humanization by grafting of the CDRs from a mouse Ab to a human FR is a commonly used engineering strategy for reducing immunogenicity (Jones, P. T. et al., Nature 321:522-510 (1986); Queen, C. et al., Proc Natl Acad Sci USA 86:10029 (1989)). In most cases, the successful design of high-affinity, CDR-grafted Abs requires that key residues in the human acceptor FRs that are crucial for preserving the functional conformation of the CDRs be back-mutated to the amino acids of the original murine Ab (Queen, C. et al., Proc Natl Acad Sci USA 86:10029 (1989); Co, M. S. et al., Nature 351:501 (1991)). Several groups (Padlan, E. A. et al., FASEB J9:133-9 (1995); Ofran Y. et al., J Immunol 181:6230-5 (2008); Kunik, V. et al., PLoS Comput Biol 8 (2012)) used the experimentally determined 3-D structures of Ab-Ag complexes in the Protein Data Bank (PDB) (Berman, H. M. et al., “The Protein Data Bank” Nucleic Acids Res 28:235 (2000)) (hereby incorporated by reference in its entirety) to determine which residues participate in Ag recognition and binding.
Such knowledge can be exploited to identify residues that are important for the function of the Ab in general and for Ag recognition in particular, and may guide Ab engineering (Haidar, J. N. et al., Proteins 80:896-912 (2012); Hanf, K. J. et al., Methods 10 (2013) (hereby incorporated by reference in their entirety)). Residues that help maintain the functional conformation of the CDRs, for example, can be used to improve Ab humanization efforts by CDR-grafting.
More recent studies have shown that virtually all Ag binding residues fall within regions of structural consensus. (Kunik, V. et al., PLoS Computational Biology 8(2):e1002388 (February 2012)) (hereby incorporated by reference in its entirety). These regions are referred to as Ag Binding Regions (ABRs). It was shown that these regions can be identified from the Ab sequence as well. “Paratome”, an implementation of a structural approach for the identification of structural consensus in Abs, was used for this purpose. (Ofran, Y. et al., J. Immunol. 181:6230-6235 (2008)) (hereby incorporated by reference in its entirety). While residues identified by Paratome cover virtually all the Ag binding sites, the CDRs (as identified by the commonly used CDR identification tools) miss significant portions of them. Ag binding residues which were identified by Paratome but were not identified by any of the common CDR identification methods are referred to as Paratome-unique residues. Similarly, Ag binding residues that are identified by any of the common CDR identification methods but are not identified by Paratome are referred to as CDR-unique residues. Paratome-unique residues make a crucial energetic contribution to Ab-Ag interactions, while CDR-unique residues make a rather minor contribution. These results allow for better identification of Ag binding sites and thus for better identification of B-cell epitopes. They may also help improve vaccine and Ab design.
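The distinction between Paratome-unique and CDR-unique residues can be read as a pair of set differences over the observed Ag binding residues. The following is a minimal Python sketch under that reading. The function name, the dictionary keys and the position labels are hypothetical illustrations and are not taken from Paratome or from the cited studies.

```python
def classify_binding_residues(ag_binding, paratome_hits, cdr_hits):
    """Split observed Ag binding residues by which method identifies them.

    ag_binding:    positions observed to contact the Ag
    paratome_hits: positions flagged by a structural-consensus method
    cdr_hits:      union of positions flagged by the CDR methods
    All arguments are sets of position labels (hypothetical labels here).
    """
    ag_binding = set(ag_binding)
    in_paratome = ag_binding & set(paratome_hits)
    in_cdr = ag_binding & set(cdr_hits)
    return {
        "paratome_unique": in_paratome - in_cdr,
        "cdr_unique": in_cdr - in_paratome,
        "both": in_paratome & in_cdr,
        "missed": ag_binding - in_paratome - in_cdr,
    }


# Made-up position labels, for illustration only.
result = classify_binding_residues(
    ag_binding={"H31", "H33", "H50", "H52", "L91"},
    paratome_hits={"H31", "H33", "H50", "H52"},
    cdr_hits={"H31", "H33", "L91"},
)
```

Only residues that actually bind the Ag are classified, which mirrors the definitions above: a position flagged by a method but not involved in binding falls into none of the four groups.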
B cells are activated during exposure to pathogens, and produce antibodies (Abs) that bind specific antigens (Ags). The initial repertoire of germline Abs is generated by rearrangement of the V(D)J gene segment. (Maizels, N., Annu Rev Genet 39, 23-46 (2005)). These Abs are the first responders to the Ag, and are believed to bind Ag with low affinity. (Di Noia, J. M. & Neuberger, M. S., Annu Rev Biochem 76, 1-22 (2007)). Improvement of affinity occurs in the days after the initial exposure through introduction of high-rate base changes in the Ab sequence, known as somatic hypermutations (SHMs), and selection of B-cell clones that have better affinity toward the Ag. (Rajewsky, K., Nature 381: 751-758 (1996)). The SHM process enables development of an efficient secondary response and immunological memory, which is key to development of B-cell immunity. Investigating SHMs is therefore essential for understanding the immune system and can guide Ab engineering, thus improving development of Abs as research, diagnostic and therapeutic agents.
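As an illustration of how SHMs might be tallied in practice, the sketch below compares a mature Ab sequence with its germline precursor position by position. It assumes the two amino acid sequences are already aligned, gap-free and of equal length; the function name and both example sequences are made up for illustration.

```python
def find_somatic_mutations(germline, mature):
    """Return (position, germline_residue, mature_residue) for each mismatch.

    Assumes both sequences are already aligned and of equal length;
    positions are 1-based indices into the alignment.
    """
    if len(germline) != len(mature):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, g, m)
        for i, (g, m) in enumerate(zip(germline, mature), start=1)
        if g != m
    ]


# Made-up example sequences, for illustration only.
germline = "QVQLVQSGAEVKKPGAS"
mature = "QVQLVESGAEVRKPGAS"
mutations = find_somatic_mutations(germline, mature)
```

A real analysis would first assign the germline V(D)J segments and align at the nucleotide level; those steps are outside the scope of this sketch.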
In one embodiment, the claimed invention is directed to a method for generating a library of antigen binding molecules for screening for binding to an epitope of interest, the method comprising:
a. selecting a template antigen-binding molecule from a set of possible template antigen binding molecules wherein said selected template does not specifically bind the epitope of interest but is known to specifically bind another epitope;
b. selecting at least one residue position in said template antigen-binding molecule for mutation; and
c. selecting at least one variant residue to substitute at the at least one residue position selected in b;
such that a library containing a plurality of variants of said template is generated. In another embodiment, the method further comprises synthesizing the template variants to form the library. In some embodiments, the set of possible template antigen-binding molecules comprises a plurality of known antibodies that do not bind the epitope of interest.
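Steps a through c can be illustrated by a combinatorial enumeration over the selected positions. The Python sketch below assumes the template is given as an amino acid string and that the positions and variant residues have already been chosen. The function name, the example sequence and the example substitutions are hypothetical, and keeping the template residue as an option at each position is a design choice of this sketch rather than a requirement of the method.

```python
from itertools import product


def generate_variant_library(template, substitutions):
    """Enumerate variants of `template` from per-position substitutions.

    template:      amino acid sequence of the selected template (step a)
    substitutions: dict mapping a 0-based position (step b) to an
                   iterable of variant residues (step c)
    The template residue is kept as an option at every position, so the
    unmodified template is the first entry of the enumeration.
    """
    positions = sorted(substitutions)
    choices = []
    for pos in positions:
        options = [template[pos]]
        for residue in substitutions[pos]:
            if residue not in options:
                options.append(residue)
        choices.append(options)

    library = []
    for combo in product(*choices):
        seq = list(template)
        for pos, residue in zip(positions, combo):
            seq[pos] = residue
        library.append("".join(seq))
    return library


# Made-up template fragment and substitutions, for illustration only.
library = generate_variant_library("GFTFSSYA", {2: "SA", 5: "NT"})
```

The library size grows as the product of the number of options per position, so in practice the number of positions and substitutions would be limited to what can be synthesized and screened.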
In some embodiments, the step of selecting a template antigen-binding molecule comprises screening the three-dimensional structures of the set of possible antigen-binding molecules based on one or more of the following criteria: shape complementarity to the epitope of interest, physico-chemical complementarity to the epitope of interest, and the predicted free energy of the interaction with the epitope of interest. In some embodiments, the step of selecting a template antigen-binding molecule further comprises screening the three-dimensional structures of the set of possible antigen-binding molecules based on physico-chemical complementarity to the epitope of interest. In another embodiment, the step of selecting a template antigen-binding molecule further comprises screening the three-dimensional structures of the set of possible antigen-binding molecules based on the predicted free energy of the interaction with the epitope of interest.
In some embodiments, the step of selecting at least one residue position comprises screening the three-dimensional structure of the template antigen-binding molecule to identify residues likely to contribute to binding to the epitope of interest. In another embodiment, the step of selecting at least one residue position comprises conducting multiple sequence alignment of the nucleic acid sequence of the template antigen-binding molecule to identify substitutable positions.
In certain embodiments, the step of selecting at least one variant residue comprises, for each residue identified in step b above, identifying substitutions that are preferred, allowed and/or neutral at that residue position. The preferred, allowed and/or neutral substitutions can be determined by analyzing the sequences of a plurality of known antibodies derived from the same germline sequences as the template antigen-binding molecule. In one embodiment, the step of selecting at least one variant residue further comprises synthesizing variants of the template antigen-binding molecule to form a library.
The claimed invention is also directed to a library of antigen-binding molecules made by one or more of the above method(s).
In another embodiment, the invention is directed to screening the library with the antigen of interest to select for antigen-binding molecules that have desired properties (e.g., binding affinity, stability, etc.).
In another embodiment, the invention is directed to an antigen-binding molecule isolated from said library after said screening.
In another embodiment, the claimed invention is directed to a method for screening a library of antigen-binding molecules, comprising:
a. screening said library with an epitope of interest to identify antigen-binding molecules that bind said epitope of interest;
b. sequencing the binders identified in step a. to determine which residues are enriched and which are depleted;
c. using the information from step b. to synthesize an optimized library of variants of the binders; and
d. repeating steps a-c using the optimized library.
In one embodiment, the at least one residue selected for mutation is in a CDR region of the template. In a preferred embodiment, the antigen-binding molecules are antibodies and the residues selected for mutation are in less than all of the CDRs, or in regions outside of the CDR that are likely to affect antigen binding.
In certain embodiments, the methods of the invention are computer implemented. Thus, the invention is also directed to a database on a computer readable medium comprising the three-dimensional structure of a plurality of known antigen-binding molecules. In one embodiment, the invention is directed to a method for generating a library of antigen binding molecules for screening for binding to an epitope of interest, said method comprising:
a. executing a computer program to select a template antigen-binding molecule from a set of possible template antigen-binding molecules, wherein said selected template does not specifically bind the epitope of interest but may be known to specifically bind another epitope;
b. selecting at least one residue position in said template antigen-binding molecule for mutation;
c. selecting at least one variant residue to substitute at the at least one residue position selected in b;
such that a library containing a plurality of variants of said template is generated.
[Figure legend, beginning missing] … is the frequency of a specific amino acid in the germ-line sequences of the group; “mutations in group” is the number of mutations in the group. Standard errors are shown by the error bars.
As used herein, the term “antigen binding molecule” refers in its broadest sense to a molecule that specifically binds an antigenic determinant. An antigen binding molecule can be, for example, an antibody or a fragment thereof that specifically binds to an antigenic determinant. By “specifically binds” is meant that the binding is selective for the antigen of interest and can be discriminated from unwanted or nonspecific interactions.
As used herein, the term “antibody” is intended to include whole antibody molecules, including monoclonal, polyclonal and multispecific (e.g., bispecific) antibodies. Also encompassed are antibody fragments that retain binding specificity including, but not limited to, VH fragments, VL fragments, Fab fragments, F(ab′)2 fragments, scFv fragments, Fv fragments, minibodies, diabodies, triabodies, and tetrabodies (see, e.g., Hudson and Souriau, Nature Med. 9: 129-134 (2003) (hereby incorporated by reference in its entirety)). Also encompassed are humanized, primatized and chimeric antibodies.
As used herein, the term “variant” refers to a polypeptide differing from a specifically recited polypeptide of the invention by amino acid insertions, deletions, and/or substitutions, created using, e.g., recombinant DNA techniques. Variants of the antigen binding molecules of the present invention include antigen binding molecules wherein one or several of the amino acid residues are modified by substitution, addition and/or deletion in such a manner that antigen binding affinity is not substantially affected (that is, the affinity remains within one order of magnitude of the affinity of another variant). Guidance in determining which amino acid residues may be replaced, added or deleted without abolishing activities of interest may be found by comparing the sequence of the particular polypeptide with that of homologous peptides and minimizing the number of amino acid sequence changes made in regions of high homology (conserved regions) or by replacing amino acids with consensus sequence amino acids.
As used herein, “shape complementarity” means that the 3D shapes of the interacting surfaces, whether determined experimentally or obtained through homology modeling or de novo modeling, fit each other without clashes or steric hindrance.
As used herein, “physico-chemical complementarity” means alignments of complementary charges, pi-pi interactions, donors and/or acceptors of H-bonds and any other molecular interactions that stabilize the complex.
As used herein, “substitutable positions” means positions in the antibody that, according to sequence and structure analysis, may be substituted without compromising the structure, expression stability or other characteristics of the antibody other than what it can bind.
As used herein, “preferred substitution” means that variability in a given position occurs more than expected by chance when comparing similar sequences.
As used herein, “neutral substitution” means that variability in a given position occurs as expected by chance when comparing similar sequences.
As used herein, “allowed substitution” means that variability in a given position occurs less than expected by chance when comparing similar sequences.
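The three definitions above compare the variability observed at a position with what chance alone would produce. The source does not fix the statistic, so the Python sketch below makes two loud assumptions: the chance expectation is taken to be the mean per-position mutation rate across the aligned sequences, and a tolerance band around that mean stands in for a proper significance test. The function name, the `tolerance` parameter and the example sequences are hypothetical.

```python
def classify_positions(germline, sequences, tolerance=0.5):
    """Label each position 'preferred', 'neutral' or 'allowed'.

    germline:  germline sequence shared by all `sequences`
    sequences: aligned, equal-length sequences derived from that germline
    tolerance: assumed band around the expected rate treated as chance
    The expected rate is the mean per-position mutation rate, i.e. what
    each position would show if mutations were spread uniformly.
    """
    n = len(sequences)
    observed = [
        sum(seq[i] != g for seq in sequences) / n
        for i, g in enumerate(germline)
    ]
    expected = sum(observed) / len(observed)
    labels = []
    for rate in observed:
        if rate > expected * (1 + tolerance):
            labels.append("preferred")   # more variable than chance
        elif rate < expected * (1 - tolerance):
            labels.append("allowed")     # less variable than chance
        else:
            labels.append("neutral")     # about as variable as chance
    return labels


# Made-up sequences, for illustration only.
labels = classify_positions("ACDE", ["ACDE", "GCDE", "GCDF", "SCDE"])
```

The same comparison could be made per substituted amino acid rather than per position, and a binomial or similar test could replace the tolerance band when enough sequences are available.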
As used herein, “enriched residues” and “depleted residues” are determined as follows: The propensity of each amino acid in a given position in the original library determines the expected distribution of amino acids in this position, assuming that the position does not affect binding. After one or more rounds of selection, the observed propensities of amino acids in that position are recorded. If, by a predefined statistic, e.g. measuring the observed frequency compared to expected frequency using a measure such as log-odds, a certain amino acid is observed significantly more frequently than expected under the null hypothesis, then the amino acid is said to be enriched in that position. If it appears significantly less, it is said to be depleted.
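The enrichment calculation described above can be sketched as a per-position log-odds score. The Python example below assumes aligned, equal-length amino acid sequences, adds a pseudocount so that unobserved residues do not produce division by zero, and uses a fixed log-odds cutoff as a stand-in for the "predefined statistic"; a real analysis would use a proper significance test. The function names, the pseudocount and the threshold are assumptions of this sketch.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_at_position(library_seqs, selected_seqs, position, pseudocount=1.0):
    """Log2 ratio of observed to expected residue frequency at one position.

    library_seqs:  sequences of the original library (expected distribution)
    selected_seqs: sequences recovered after selection (observed distribution)
    """
    expected = Counter(seq[position] for seq in library_seqs)
    observed = Counter(seq[position] for seq in selected_seqs)
    exp_total = len(library_seqs) + pseudocount * len(AMINO_ACIDS)
    obs_total = len(selected_seqs) + pseudocount * len(AMINO_ACIDS)
    scores = {}
    for aa in AMINO_ACIDS:
        p_exp = (expected[aa] + pseudocount) / exp_total
        p_obs = (observed[aa] + pseudocount) / obs_total
        scores[aa] = math.log2(p_obs / p_exp)
    return scores


def call_enrichment(scores, threshold=0.5):
    """Label residues whose log-odds exceed +/- threshold (assumed cutoff)."""
    calls = {}
    for aa, score in scores.items():
        if score >= threshold:
            calls[aa] = "enriched"
        elif score <= -threshold:
            calls[aa] = "depleted"
    return calls


# Made-up two-residue sequences, for illustration only.
scores = log_odds_at_position(
    library_seqs=["AK", "AR", "AK", "AR"],
    selected_seqs=["AK", "AK", "AK", "AK"],
    position=1,
)
calls = call_enrichment(scores)
```

Residues that are neither enriched nor depleted are simply absent from `calls`, which corresponds to the null hypothesis that the position does not affect binding for those residues.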
Protein-protein docking is a computational method used to predict the structure of macromolecular complexes by orienting the three-dimensional structures of two binding partners relative to each other, a goal of which is to accurately model the binding interface. A variety of algorithms can be utilized to sample the rotational and translational search space, including Fast Fourier Transform (Comeau, S. R., et al., ClusPro: a fully automated algorithm for protein-protein docking: Nucleic Acids Res, v. 32, p. W96-9 (2004); Ohue, M., et al., MEGADOCK: an all-to-all protein-protein interaction prediction system using tertiary structure data: Protein Pept Lett, v. 21, p. 766-78 (2014); Tovchigrechko, A., and I. A. Vakser, GRAMM-X public web server for protein-protein docking: Nucleic Acids Res, v. 34, p. W310-4 (2006)) (each of which is hereby incorporated by reference in its entirety), geometric hashing (Schneidman-Duhovny, D., et al., PatchDock and SymmDock: servers for rigid and symmetric docking: Nucleic Acids Res, v. 33, p. W363-7 (2005)) (hereby incorporated by reference in its entirety), spherical polar Fourier (Ritchie, D. W., and V. Venkatraman, 2010, Ultra-fast FFT protein docking on graphics processors: Bioinformatics, v. 26, p. 2398-405) (hereby incorporated by reference in its entirety), and Monte Carlo search (Gray, J. J., et al. Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations: J Mol Biol, v. 331, p. 281-99 (2003); Huang, S. Y., Search strategies and evaluation in protein-protein docking: principles, advances and challenges: Drug Discov Today, v. 19, p. 1081-1096 (2014)) (each of which is hereby incorporated by reference in its entirety). The key to successful protein-protein docking is the ability to select native or near-native structures from the thousands of docking poses the search algorithm generates, which is not a trivial challenge (Huang, S. Y., Search strategies and evaluation in protein-protein docking: principles, advances and challenges: Drug Discov Today, v. 19, p. 1081-1096 (2014); Moal, I. H., et al., Scoring functions for protein-protein interactions: Curr Opin Struct Biol, v. 23, p. 862-7 (2013)) (each of which is hereby incorporated by reference in its entirety). To select docking poses, different scoring functions can be implemented to rank the set of docking poses, for example, optimizing shape complementarity, energy functions (van der Waals, electrostatics, desolvation), binding free energies, and statistical potentials (Chen, R., et al., ZDOCK: an initial-stage protein-docking algorithm: Proteins, v. 52, p. 80-7 (2003); Gray, J. J., et al., Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations: J Mol Biol, v. 331, p. 281-99 (2003); Huang, S. Y., Search strategies and evaluation in protein-protein docking: principles, advances and challenges: Drug Discov Today, v. 19, p. 1081-1096 (2014); Moal, I. H., et al., Scoring functions for protein-protein interactions: Curr Opin Struct Biol, v. 23, p. 862-7 (2013); Norel, R., et al., Electrostatic contributions to protein-protein interactions: fast energetic filters for docking and their physical basis: Protein Sci, v. 10, p. 2147-61 (2001); Ohue, M., et al., MEGADOCK: an all-to-all protein-protein interaction prediction system using tertiary structure data: Protein Pept Lett, v. 21, p. 766-78 (2014); Schneidman-Duhovny, D., et al., PatchDock and SymmDock: servers for rigid and symmetric docking: Nucleic Acids Res, v. 33, p. W363-7 (2005)) (each of which is hereby incorporated by reference in its entirety).
In addition to these physical and statistical scoring functions, biological data can be incorporated either at the search stage or the scoring stage, for example by defining residues that contribute to the binding interface or by restricting the docked interface to the CDRs of an Ab in Ab-Ag docking (Dominguez, C., et al., HADDOCK: a protein-protein docking approach based on biochemical or biophysical information: J Am Chem Soc, v. 125, p. 1731-7 (2003); Gray, J. J., et al., Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations: J Mol Biol, v. 331, p. 281-99 (2003)) (each of which is hereby incorporated by reference in its entirety).
Several challenges to the problem of protein-protein docking exist. Docking methods generally perform well when re-docking the individual binding partners from the structure of a bound complex, yet performance degrades when the structures of two proteins in their unbound state are used (Janin, J., 2010, Protein-protein docking tested in blind predictions: the CAPRI experiment: Mol Biosyst, v. 6, p. 2351-62) (hereby incorporated by reference in its entirety). Moreover, rigid docking is often performed, which does not take into account the potentially large conformational changes in secondary structure that may occur in some cases of protein-protein binding. Advances in docking include attempting to incorporate flexibility into the structures being docked, whether on the level of backbone or side chain (Zacharias, M., 2010, Accounting for conformational changes during protein-protein docking: Curr Opin Struct Biol, v. 20, p. 180-6) (hereby incorporated by reference in its entirety).
A reasonably accurate model of the interface of a protein-protein complex is important for protein design experiments that aim to introduce novel function to a protein scaffold (Fleishman, S. J., et al., Computational design of proteins targeting the conserved stem region of influenza hemagglutinin: Science, v. 332, p. 816-21 (2011)) (hereby incorporated by reference in its entirety). In some cases, there has even been success using models of the proteins of interest for docking and subsequent protein design (Tharakaraman, K., et al. Redesign of a cross-reactive antibody to dengue virus with broad-spectrum activity and increased in vivo potency: Proc Natl Acad Sci USA, v. 110, p. E1555-64 (2013)) (hereby incorporated by reference in its entirety).
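Pose selection as described above, ranking by a scoring function after restricting the interface to the CDRs, can be sketched as a filter followed by a sort. The Python example below is a toy illustration and does not reproduce the scoring function of any cited docking program. The `Pose` fields, the linear weighting, the sign conventions and the `min_cdr_fraction` cutoff are all assumptions, and the residue labels and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    name: str
    shape: float              # shape complementarity term (higher is better)
    electrostatics: float     # energy-like term (lower is better)
    desolvation: float        # energy-like term (lower is better)
    ab_interface: frozenset   # Ab residues at the docked interface


def rank_poses(poses, cdr_residues, weights=(1.0, 1.0, 1.0), min_cdr_fraction=0.5):
    """Drop poses whose Ab interface lies mostly outside the CDRs,
    then rank the remainder by a weighted sum of score terms."""
    w_shape, w_elec, w_desolv = weights
    kept = []
    for pose in poses:
        if not pose.ab_interface:
            continue
        fraction = len(pose.ab_interface & cdr_residues) / len(pose.ab_interface)
        if fraction >= min_cdr_fraction:
            score = (w_shape * pose.shape
                     - w_elec * pose.electrostatics
                     - w_desolv * pose.desolvation)
            kept.append((score, pose))
    kept.sort(key=lambda item: item[0], reverse=True)
    return [pose for _, pose in kept]


# Made-up poses and CDR residue labels, for illustration only.
cdrs = frozenset({"H31", "H52", "H100", "L50", "L91"})
poses = [
    Pose("p1", 10.0, -2.0, 1.0, frozenset({"H31", "H52", "L91"})),
    Pose("p2", 12.0, -1.0, 0.5, frozenset({"H1", "H2", "H31"})),
    Pose("p3", 9.0, -4.0, 0.0, frozenset({"H100", "L50", "L1", "L2"})),
]
ranked = rank_poses(poses, cdrs)
```

Here the restriction is applied at the scoring stage; applying it at the search stage would instead limit which orientations are sampled in the first place.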
In order to predict the structure of a macromolecular complex, using docking or other methods, a three-dimensional structure of the individual proteins is required. In the absence of experimentally determined structures (e.g., X-ray or NMR), a model of the protein must be generated. In general, models can be built using three methods: homology modeling, ab initio modeling and fold-recognition/threading methods (Petrey, D., and B. Honig, 2005, Protein structure prediction: inroads to biology: Mol Cell, v. 20, p. 811-9) (hereby incorporated by reference in its entirety). Reliable models can be generated by homology modeling if the protein of interest has a homolog with an experimentally determined structure, where the homology is at least ~30% sequence identity (over a significant alignment length) (Rost, B., 1999, Twilight zone of protein sequence alignments: Protein Eng, v. 12, p. 85-94) (hereby incorporated by reference in its entirety). The homolog structure is used as a ‘template’ on which to build the model (Sali, A., and T. L. Blundell, Comparative protein modelling by satisfaction of spatial restraints: J Mol Biol, v. 234, p. 779-815 (1993); Sali, A., et al., Evaluation of comparative protein modeling by MODELLER: Proteins, v. 23, p. 318-26 (1995); Webb, B., and A. Sali, Comparative Protein Structure Modeling Using MODELLER: Curr Protoc Bioinformatics, v. 47, p. 5.6.1-5.6.32 (2014)) (each of which is hereby incorporated by reference in its entirety). This 30% identity ‘rule of thumb’ may be sufficient for reliably modeling the correct protein fold; however, insertions or deletions, or sequence variability within loop regions, complicate the modeling, and additional modeling approaches may be required. For proteins that do not have known 3D structures of homologs, or for regions of a protein with a high degree of variability relative to the template, methods such as ab initio modeling or fold-recognition can be implemented (Petrey, D., and B. Honig, Protein structure prediction: inroads to biology: Mol Cell, v. 20, p. 811-9 (2005)) (hereby incorporated by reference in its entirety).
Structural relationships between evolutionarily distant sequences, as identified by structure alignments and/or other computational tools, can be used as a method to predict function for proteins that lack functional annotation but have known structures (Goldsmith-Fischman, S., and B. Honig, Structural genomics: computational methods for structure analysis: Protein Sci, v. 12, p. 1813-21 (2003); Goldsmith-Fischman, S., et al., The SufE sulfur-acceptor protein contains a conserved core structure that mediates interdomain interactions in a variety of redox protein complexes: J Mol Biol, v. 344, p. 549-65 (2004)) (each of which is hereby incorporated by reference in its entirety). As an extension of this idea, the structure of the interface in a protein-protein complex (experimental or modeled by docking) may be used to identify and/or predict additional potential binders, by aligning regions of the protein comprising one side of the interface with a database of protein 3D structures, either by structural alignment of atoms or alignment of protein surfaces (Dey, F., et al., Toward a “structural BLAST”: using structural relationships to infer function: Protein Sci, v. 22, p. 359-66 (2013); Gao, M., and J. Skolnick, iAlign: a method for the structural comparison of protein-protein interfaces: Bioinformatics, v. 26, p. 2259-65 (2010); Pandit, S. B., and J. Skolnick, Fr-TM-align: a new protein structural alignment method based on fragment alignments and the TM-score: BMC Bioinformatics, v. 9, p. 531 (2008); Shulman-Peleg, A. et al., SiteEngines: recognition and comparison of binding sites and protein-protein interfaces: Nucleic Acids Res, v. 33, p. W337-41 (2005); Zhang, Q. C., et al., Structure-based prediction of protein-protein interactions on a genome-wide scale: Nature, v. 490, p. 556-60 (2012)) (each of which is hereby incorporated by reference in its entirety).
Molecular Dynamics (MD) is a method that computationally simulates the movement of atoms and the subsequent behavior of macromolecules in a biological system (Karplus, M., and J. A. McCammon, Molecular dynamics simulations of biomolecules: Nat Struct Biol, v. 9, p. 646-52 (2002)) (hereby incorporated by reference in its entirety). The physical properties of the interaction potentials between atoms are described by a force-field, a set of functions approximating different properties of the atoms. The solvent properties of the biological system can be modelled explicitly (i.e. using 3D models of water molecules) or implicitly, using various solvent models (Feig, M. et al., Journal of Computational Chemistry 25 (2): 265-84. (2004) (hereby incorporated by reference in its entirety)). MD can be utilized to assess and evaluate models of proteins, protein-ligand complexes, and protein-protein interfaces.
In addition to physics-based approaches, machine learning methods can be implemented to analyze and predict components of protein-protein interfaces. Machine learning methods like Support Vector Machines (SVMs) and Random Forests are general algorithms developed to ‘learn’ from example data represented as vectors (Breiman, L., Random forests: Machine Learning, v. 45, p. 5-32 (2001); Cortes, C., and V. Vapnik, Support-vector networks, Machine Learning, September 1995, Volume 20, Issue 3, pp 273-297,) (each of which is hereby incorporated by reference in its entirety). Machine learning approaches as well as statistics-based methods have been used to predict Ag-Ab interfaces (Sela-Culang, I., et al., Using a combined computational-experimental approach to predict antibody-specific B cell epitopes: Structure, v. 22, p. 646-57 (2014)) (hereby incorporated by reference in its entirety) and suggest positions that may participate in Ag binding (Burkovitz, A., I. et al., Large-scale analysis of somatic hypermutations in antibodies reveals which structural regions, positions and amino acids are modified to improve affinity: FEBS J, v. 281, p. 306-19 (2014)) (hereby incorporated by reference in its entirety).
The molecular mechanisms that underlie somatic hypermutations have been the focus of extensive research. The introduced mutations are predominantly point mutations and rarely base insertions or deletions (Zhao, S. et al. Mol Immunol 47:694-700 (2010); Li, Z. et al., Genes Dev 18,1-11 (2004) (each of which is hereby incorporated by reference in its entirety)) and are mediated by the activation-induced deaminase (AID) enzyme (Maul, R. W. et al., Adv Immunol 105, 159-191 (2010); Muramatsu, M. et al., J Biol Chem 274,18470-18476 (1999) (each of which is hereby incorporated by reference in its entirety). AID introduces diversity by converting cytosine to uracil, which activates error-prone DNA repair mechanisms (Maul, R. W. et al., Adv Immunol 105,159-191 (2010); Pham, P. et al., Nature 424,103-107 (2003); Peled, J. U. et al., Annu Rev Immunol 26: 481-511 (2008) (each of which is hereby incorporated by reference in its entirety). Cytosines located within DNA motifs that are preferred binding targets of the AID enzyme are commonly referred to as hotspots (Dorner, T. et al., Eur J Immunol 28, 3384-3396 (1998) (hereby incorporated by reference in its entirety). However, not all of the hotspots are targeted (Kinoshita, K. et al., Nat Rev Mol Cell Biol 2,493-503 (2001) (hereby incorporated by reference in its entirety)), and many SHMs occur near hotspots but not within them (Clark, L. A. et al., J Immunol 177, 333-340 (2006) (hereby incorporated by reference in its entirety)). The assumption that AID plays an important role in the SHM process inspired attempts to utilize it in vitro, e.g. by coupling mammalian cell-surface display with AID-directed SHM (Bowers, P. M. et al., Proc Natl Acad Sci USA 108, 20455-20460 (2011) (hereby incorporated by reference in its entirety)), or by designing phage display libraries based on DNA hotspots (Chowdhury, P. S. et al., Nat Biotechnol 17, 568-572 (1999) (hereby incorporated by reference in its entirety)).
Studies that have attempted to characterize SHMs structurally mostly involved analyses of the crystal structures of one or a few pairs of germline and mature variants of a specific Ab in order to determine how structural factors affect affinity enhancement. In one such study, examination of the X-ray crystal structures of four anti-lysozyme Ab variants at various maturation stages revealed that binding is enhanced by burial of increasing amounts of an apolar surface area and by improving shape complementarity. (Li, Y. et al., Nat Struct Biol 10, 482-488 (2003) (hereby incorporated by reference in its entirety). However, analysis of another set of Abs found that the mature Ab does not have better shape complementarity to the Ag than its germline variant, but exhibits a small improvement in shape complementarity between the variable light (VL) chain and the variable heavy (VH) chain, and has a higher electrostatic contribution to Ag binding than that of the germline Ab. (Midelfort, K. S. et al., J Mol Biol 343, 685-701 (2004) (hereby incorporated by reference in its entirety). The X-ray structure of an anti-hapten Ab and its corresponding germline Ab suggested that, in this case, the increased affinity is achieved mainly by electrostatic optimization. (Chong, L. T. et al., Proc Natl Acad Sci USA 96, 14330-14335 (1999) (hereby incorporated by reference in its entirety). Several studies used molecular dynamics simulations of a handful of mature Abs (Wong, S. E. et al., Proteins 79, 821-829 (2011) (herein incorporated by reference in its entirety), or a specific Ab lineage (Schmidt, A. G. et al., Proc Natl Acad Sci USA 110, 264-269 (2013); Thorpe, I. F. et al., Proc Natl Acad Sci USA 104, 8821-8826 (2007) (each of which is herein incorporated by reference in its entirety), and reported that rigidification of the paratope leads to a reduction in the entropic cost of the interaction.
The studies that have examined whether SHMs are focused on residues involved in Ag binding reached contradictory conclusions. Clark et al. identified SHMs in over 11,000 Ab sequences (Clark, L. A. et al., J Immunol 177, 333-340 (2006)) (herein incorporated by reference in its entirety). They reported that Ag-contacting positions are mutated three times more often than core residues. However, in this analysis, interface positions in the Ab sequence were defined as Ab positions that are within 12 Å of an Ag atom in any PDB structure, a definition that covers mostly residues that do not physically interact with the Ag. SHMs and hotspots were reported to be over-represented in the complementarity-determining regions (CDRs) (Clark, L. A. et al., J Immunol 177, 333-340 (2006); Dorner, T. et al., J Immunol 158, 2779-2789 (1997)). However, while CDRs cover ~80% of the Ag-binding residues, 50-60% of the residues in the CDRs do not contact the Ag (Kunik, V. et al., PLoS Comput Biol 8, e1002388 (2012)) (herein incorporated by reference in its entirety). Several studies indicated that SHMs mostly occur in the periphery of the germline Ag-binding site and not in its center (Tomlinson, I. M. et al., J Mol Biol 256, 813-817 (1996); Thom, G. et al., Proc Natl Acad Sci USA 103, 7619-7624 (2006)) (hereby incorporated by reference in their entirety), and that SHMs do not show a clear preference toward residues that are in contact with the Ag (Ramirez-Benitez, M. C. et al., Proteins 45, 199-206 (2001); Raghunathan, T. et al., J Mol Recog 25, 103-113 (2012)) (hereby incorporated by reference in their entirety). It has even been suggested that mutations in the interface may be disfavored as they disrupt Ab-Ag interaction (Ramirez-Benitez, M. C. et al., Proteins 45, 199-206 (2001); Persson, J. et al., Tumour Biol 30, 221-231 (2009)) (hereby incorporated by reference in their entirety).
In one embodiment, the steps of the process of the present invention correspond to the iterative process described in
In one embodiment of the invention, a model of the antigen of interest in the receptor-bound conformation is generated (e.g. using tools for homology structural modeling such as MODELLER (Fiser, A., et al. Modeling of loops in protein structures: Protein Sci, v. 9, p. 1753-73 (2000); Marti-Renom, M. A., et al., Comparative protein structure modeling of genes and genomes: Annu Rev Biophys Biomol Struct, v. 29, p. 291-325 (2000); Sali, A., and T. L. Blundell, Comparative protein modelling by satisfaction of spatial restraints: J Mol Biol, v. 234, p. 779-815 (1993)) (each of which is hereby incorporated by reference in its entirety) as implemented in the Discovery Studio suite, or any other structure prediction tool) (Accelrys et al., 2013 (hereby incorporated by reference in its entirety)). When an experimentally determined structure is available (e.g. in the PDB (Berman, H. M., et al., The Protein Data Bank: Acta Crystallogr D Biol Crystallogr, v. 58, p. 899-907 (2002)) (hereby incorporated by reference in its entirety)), it can be used as well. The model may be further refined by energy minimization (e.g. using CHARMM as implemented in the Discovery Studio suite (Brooks, B. R., et al. CHARMM: the biomolecular simulation program: J Comput Chem, v. 30, p. 1545-614 (2009)) (hereby incorporated by reference in its entirety), or any software for minimization), and in some cases by molecular dynamics (MD) simulations (e.g. using GROMACS (Hess, B., et al. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation: Journal of Chemical Theory and Computation, v. 4, p. 435-447 (2008)) (hereby incorporated by reference in its entirety) or other MD software tools).
When it is impossible to reliably model the entire protein, a structural model of the desired epitope alone may be used. This model can be generated using, for example, homology modeling (as described above) or de-novo prediction of the structural determinant.
In one embodiment of the present invention, the model (or experimental structure) is then docked against a database of antibody three-dimensional structures, using, for example, ZDOCK (Chen, R., et al. ZDOCK: an initial-stage protein-docking algorithm: Proteins, v. 52, p. 80-7 (2003); Pierce, B., and Z. Weng, ZRANK: reranking protein docking predictions with an optimized energy function: Proteins, v. 67, p. 1078-86 (2007); Vreven, T., et al., Performance of ZDOCK in CAPRI rounds 20-26: Proteins (2013)) (each of which is herein incorporated by reference in its entirety) as implemented in Discovery Studio, (Accelrys, Software, and Inc., 2013, Discovery Studio Modeling Environment, Release 4.0, San Diego, Accelrys Software Inc. (hereby incorporated by reference in its entirety); Marcatili, P., et al. The association of heavy and light chain variable domains in antibodies: implications for antigen specificity: Febs Journal, v. 278, p. 2858-2866 (2011)) (hereby incorporated by reference in its entirety) and/or additional docking algorithms (e.g. Hex Ritchie, D. W., and V. Venkatraman, Ultra-fast FFT protein docking on graphics processors: Bioinformatics, v. 26, p. 2398-405 (2010)), (hereby incorporated by reference in its entirety) Megadock (Ohue, M., et al. , MEGADOCK: an all-to-all protein-protein interaction prediction system using tertiary structure data: Protein Pept Lett, v. 21, p. 766-78 (2014)) (each of which is herein incorporated by reference in its entirety). Biological and structural data for the antigen and antibody may be used to focus the docking or to eliminate unlikely poses (e.g. poses in which the contacts with the antigen are made by residues in the constant region) so that the epitope of interest and the CDRs are in the docked interface. This screening of poses may rely on the following considerations:
1. Determining whether the contacting residues in the pose involve CDR positions that are likely to be in contact with the antigen. This can be based on biophysical assessment and on statistical assessment of the propensities of contacts in each position in all known antibodies, as described in (Kunik, V., and Y. Ofran, The indistinguishability of epitopes from protein surface is explained by the distinct binding preferences of each of the six antigen-binding loops: Protein Eng Des Sel. (2013); Kunik, V., et al., Structural consensus among antibodies defines the antigen binding site.: PLoS Comput Biol, v. 8, p. e1002388 (2012b)) (each of which is hereby incorporated by reference in its entirety). Identification of the antigen binding residues can be based on the process described in (Kunik, V., et al. Paratome: an online tool for systematic identification of antigen-binding regions in antibodies based on sequence or structure, Nucleic Acids Res, v. 40: England, p. W521-4 (2012a)) (hereby incorporated by reference in its entirety), or on other methods for identifying CDRs (e.g. Chothia, C., and A. M. Lesk, Canonical structures for the hypervariable regions of immunoglobulins: J Mol Biol, v. 196, p. 901-17 (1987); Giudicelli, V., et al., IMGT/GENE-DB: a comprehensive database for human and mouse immunoglobulin and T cell receptor genes: Nucleic Acids Res, v. 33, p. D256-61 (2005); Kabat, E. A., et al., Sequences of proteins of immunological interest, National Institutes of Health, Bethesda (1983); Lefranc, M. P., et al., IMGT/3Dstructure-DB and IMGT/DomainGapAlign: a database and a tool for immunoglobulins or antibodies, T cell receptors, MHC, IgSF and MhcSF: Nucleic Acids Research, v. 38, p. D301-D307 (2010); Lefranc, M. P., et al. IMGT/3Dstructure-DB and IMGT/StructuralQuery, a database and a tool for immunoglobulin, T cell receptor and MHC structural data: Nucleic Acids Research, v. 32, p. D208-D210 (2004); Lefranc, M. P., et al. IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains: Dev Comp Immunol, v. 27, p. 55-77 (2003); Morea, V., et al. Antibody modeling: implications for engineering and design: Methods, v. 20, p. 267-79 (2000)) (hereby incorporated by reference in their entirety) or antigen binding residues (Krawczyk, K., et al., Antibody i-Patch prediction of the antibody binding site improves rigid local antibody-antigen docking: Protein Eng Des Sel, v. 26, p. 621-9 (2013); Krawczyk, K., et al., Improving B-cell epitope prediction and its application to global antibody-antigen docking: Bioinformatics, v. 30, p. 2288-94 (2014); Olimpieri, P. P., et al. Prediction of site-specific interactions in antibody-antigen complexes: the proABC method and server: Bioinformatics, v. 29, p. 2285-91 (2013); Tramontano, A., et al. Framework residue 71 is a major determinant of the position and conformation of the second hypervariable region in the VH domains of immunoglobulins: Journal of Molecular Biology, v. 215, p. 175-182 (1990)) (each of which is hereby incorporated by reference in its entirety).
2. Removing poses in which the epitope does not overlap with the preselected epitope.
3. Selecting poses that, based on structure-function analysis, are likely to result in desired biological activity.
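By way of non-limiting illustration, the first two screening considerations above may be sketched as follows. This is a minimal sketch and not the inventors' implementation: the `Pose` record, the function `screen_poses`, the residue identifiers and both threshold values are hypothetical names and numbers introduced only for this example. Consideration 3 (structure-function analysis) depends on the target and is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """A docked pose reduced to its contacting residues (hypothetical record)."""
    name: str
    antibody_contacts: frozenset  # antibody residue identifiers in the interface
    antigen_contacts: frozenset   # antigen residue identifiers in the interface


def screen_poses(poses, cdr_residues, target_epitope,
                 min_cdr_fraction=0.8, min_epitope_overlap=0.5):
    """Keep poses contacted mainly through CDR residues and overlapping the epitope.

    Both thresholds are illustrative assumptions, not values from the text.
    """
    if not target_epitope:
        raise ValueError("target_epitope must not be empty")
    kept = []
    for pose in poses:
        if not pose.antibody_contacts or not pose.antigen_contacts:
            continue  # no interface at all
        # Consideration 1: fraction of antibody contacts that are CDR positions.
        cdr_fraction = (len(pose.antibody_contacts & cdr_residues)
                        / len(pose.antibody_contacts))
        # Consideration 2: fraction of the preselected epitope covered by the pose.
        overlap = (len(pose.antigen_contacts & target_epitope)
                   / len(target_epitope))
        if cdr_fraction >= min_cdr_fraction and overlap >= min_epitope_overlap:
            kept.append(pose)
    return kept
```

In practice the CDR set would come from a tool such as Paratome and the contacts from the docked coordinates; here both are supplied as plain sets to keep the sketch self-contained.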
In one embodiment, the resulting docking poses are then filtered in order to identify poses that have “native-like” properties, such as shape and/or biophysical feature complementarity. Additional scores are learned from known antibody-antigen complexes. The following filters may be implemented:
A. Docking ranking: Top X ranking by various docking scoring functions.
B. Docking consensus: For each docked antibody-antigen complex, poses that pass filter A are compared between at least two different docking algorithms, and those that are generated by more than one algorithm (based on agreement in RMSD of the antibody CDRs) are selected for further analysis.
C. Knowledge-based features of known antibody-antigen complexes: Use machine learning to evaluate the complexes that have passed filter B. For example, the present inventors developed two different types of machine-learning classifiers, based on an approach similar to the one described in (Sela-Culang, I. et al., Structure 22:646-657 (2014)) (herein incorporated by reference in its entirety).
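By way of non-limiting illustration, the docking consensus of filter B may be sketched as follows. The sketch assumes, as a simplification, that the poses from both algorithms are expressed in the same antigen reference frame and list the CDR atoms in the same order, so that RMSD can be computed without superposition. The function names and the 5 Å cutoff are hypothetical; the text does not specify a cutoff.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((a - b) ** 2
                for p, q in zip(coords_a, coords_b)
                for a, b in zip(p, q))
    return math.sqrt(total / len(coords_a))


def consensus_poses(poses_algo1, poses_algo2, cutoff=5.0):
    """Names of algorithm-1 poses reproduced by algorithm 2 within `cutoff` Å.

    Each argument maps a pose name to the CDR atom coordinates of that pose.
    The cutoff is an illustrative assumption.
    """
    return [name1 for name1, cdr1 in poses_algo1.items()
            if any(rmsd(cdr1, cdr2) <= cutoff for cdr2 in poses_algo2.values())]
```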
The present inventors assembled a training set of antibody-antigen complexes of known structure. In each complex the present inventors identified the ABR/CDR residues on the antibody that contact the antigen, and the residues on the antigen that contact the antibody. Each antigen residue was described in terms of its secondary structure (predicted or experimentally determined), evolutionary conservation, solvent accessibility, the identity, secondary structure and conservation of each of its neighbors (the inventors used a sequence window of 3 to 7 residues on each side but other window sizes may be used as well). The antibody residues were described in varied windows in terms of residue type, solvent accessibility, the position of residue within the CDR, the type of the CDR, and whether it is a germ-line residue or mutated (SHM). In addition, we built a knowledge-based potential for contacts between antibody residues and antigen residues. These potentials quantify the propensity (e.g. in terms of log likelihood) for a contact between a certain type of residue on the antibody and a certain type of residue on the antigen. That is, it assesses whether a certain type of residue-residue contact between antibody and antigen occurs more or less than expected by chance. This allowed the inventors to determine whether this particular contact is favored or disfavored in antibody-antigen interfaces. The inventors also built a more detailed set of such potentials for each CDR separately. This allows us to give a positive or negative score for each contact on each CDR. When possible (e.g. when the amount of experimental data permits), the inventors also built additional sets of potentials for specific structural positions on each CDR. This was done by aligning multiple CDRs that are similar to each other and then assessing the propensities of each of the 20×20 possible contacts between residue on the antibody position and residues on the antigen.
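By way of non-limiting illustration, a knowledge-based contact potential of the kind described above may be computed as a log-odds ratio of observed to expected contact counts. The sketch below makes assumptions that the text does not state: the chance expectation is taken from the marginal frequencies of each residue type among the observed contacts, and a pseudocount is added to avoid taking the logarithm of zero. The function name `contact_potentials` is hypothetical.

```python
import math
from collections import Counter


def contact_potentials(contacts, pseudocount=1.0):
    """Log-odds potential for each (antibody residue type, antigen residue type).

    `contacts` is a list of (antibody_type, antigen_type) pairs observed in
    known interfaces. A positive value means the contact occurs more often
    than expected by chance; a negative value means less often.
    """
    pair_counts = Counter(contacts)
    ab_counts = Counter(ab for ab, _ in contacts)
    ag_counts = Counter(ag for _, ag in contacts)
    n = len(contacts)
    potentials = {}
    for ab in ab_counts:
        for ag in ag_counts:
            # Expected count if antibody and antigen residue types paired independently.
            expected = ab_counts[ab] * ag_counts[ag] / n
            observed = pair_counts[(ab, ag)]
            potentials[(ab, ag)] = math.log(
                (observed + pseudocount) / (expected + pseudocount))
    return potentials
```

Restricting `contacts` to those made by a single CDR, or by a single aligned structural position, would give the per-CDR and per-position potentials mentioned above.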
The input for the supervised machine-learning algorithm (Random Forest and SVMs were used, but other machine learning algorithms can be used as well) combined a vector that describes a residue position on the antibody, a vector that describes a residue position on the antigen, and the contact potential for this pair. The positive training set was the observed contacts, and the negative set was random pairing of ABR antibody and antigen surface residues. A 3-fold cross-validation was used. The classifier distinguished well between real and decoy antigenic contacts.
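By way of non-limiting illustration, assembling the positive and negative residue-pair sets and splitting them for 3-fold cross-validation may be sketched as follows. The sketch uses only the standard library and does not train a classifier. Two details are assumptions of the example rather than statements of the text: the random pairings exclude observed contacts, and the negative set is sized to match the positive set. The function names are hypothetical.

```python
import random


def build_training_pairs(observed_contacts, abr_residues, surface_residues, seed=0):
    """Label observed contacts 1 and random non-contact pairings 0."""
    rng = random.Random(seed)
    observed = set(observed_contacts)
    # Candidate decoys: ABR residue paired with any antigen surface residue
    # that it was not observed to contact (assumption of this sketch).
    candidates = [(ab, ag) for ab in abr_residues for ag in surface_residues
                  if (ab, ag) not in observed]
    negatives = rng.sample(candidates, min(len(observed), len(candidates)))
    return ([(pair, 1) for pair in sorted(observed)]
            + [(pair, 0) for pair in negatives])


def k_fold_indices(n_items, k=3):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each labeled pair would then be expanded into the feature vector described above before being passed to the chosen learning algorithm.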
Antibody-antigen complexes can be examined by analyzing the classifiers' predictions on the interface residue pairs, for example by taking the geometric or arithmetic mean of the prediction scores over all, or a subset, of the residue pairs in the interface in question.
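By way of non-limiting illustration, aggregating per-pair classifier scores into a single interface score may be sketched as follows. The function name `interface_score` and the `floor` parameter, which keeps the geometric mean defined when a score is zero, are hypothetical and introduced only for this example.

```python
from statistics import fmean, geometric_mean


def interface_score(pair_scores, method="geometric", floor=1e-6):
    """Combine per-residue-pair prediction scores into one interface score."""
    if not pair_scores:
        raise ValueError("pair_scores must not be empty")
    if method == "arithmetic":
        return fmean(pair_scores)
    if method == "geometric":
        # Scores are clipped at `floor` because the geometric mean needs positive values.
        return geometric_mean([max(s, floor) for s in pair_scores])
    raise ValueError("method must be 'arithmetic' or 'geometric'")
```

The geometric mean penalizes an interface that contains a few very poorly scored pairs more strongly than the arithmetic mean does, which is one reason to consider both.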
The present inventors assembled a positive training set of antibody-antigen interfaces collected from experimentally determined 3D structures. A negative set was assembled from docking structures of antibodies to proteins, under the assumption that in the vast majority of cases a random antibody will not bind a random antigen and thus these interfaces represent false interfaces. The inventors filtered these negative interfaces, as described above, to retain only native-like complexes. Then, each interface was described using the following features: the number of contacts; the fraction of contacts that are germ-line and the fraction that are SHMs; the number of specific contacts, H-bonds, aromatic interactions, etc.; a score for the curvature of the surface; an assessment of shape complementarity; an assessment of charge complementarity; the area of the interface; the relative area of the interface on the antigen; and the reduction in solvent accessible area for the antibody and for the theoretical paratope (as calculated by canonic CDRs or by Paratome (Kunik, V., S. Ashkenazi, and Y. Ofran, Paratome: an online tool for systematic identification of antigen-binding regions in antibodies based on sequence or structure, Nucleic Acids Res, v. 40: England, p. W521-4 (2012a)) (herein incorporated by reference in its entirety)). Other biophysical and structural descriptions of the interface may be used as well (e.g. conservation). The inventors also recorded the potentials for all contacts, as described above. In addition, docking was run for the positive set, and the docking scores of all docked poses were recorded. The inventors added to the vector that represented each interface features that describe the distribution of docking scores.
This is motivated by the observation that the distribution of docking scores across the different poses of a given antibody-antigen pair differs dramatically between pairs that are known to bind each other and pairs that are not known to bind each other (and that are assumed not to). These features include the distance (in terms of standard deviations) of the extreme values from the mean and/or the median, the standard deviation itself, the distance between the mean and the median, and quintile characteristics. The inventors then used a Random Forest and an SVM to distinguish between real interfaces and decoys. A 10-fold cross-validation showed that this classifier distinguishes well between real and false interfaces.
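By way of non-limiting illustration, the docking-score distribution features listed above may be computed as follows. The feature names, the use of the population standard deviation, and the choice to measure the extremes from the mean rather than the median are assumptions of this sketch.

```python
import statistics


def docking_score_features(scores):
    """Summarize the distribution of docking scores over all poses of one pair."""
    if len(scores) < 2:
        raise ValueError("at least two docking scores are required")
    mean = statistics.fmean(scores)
    median = statistics.median(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("all docking scores are identical")
    return {
        "std": sd,
        "mean_median_gap": mean - median,
        # Distance of each extreme from the mean, in standard deviations.
        "max_z": (max(scores) - mean) / sd,
        "min_z": (mean - min(scores)) / sd,
        # Four cut points that divide the scores into five equal-sized groups.
        "quintiles": statistics.quantiles(scores, n=5),
    }
```

A pair whose best pose stands far above the rest of its score distribution yields a large `max_z` (for a scoring function where higher is better), which is the kind of signal the paragraph above describes.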
In addition to identifying “native-like” complexes based on results of protein-protein docking methods, the antibody-antigen complex may be modeled based on information obtained from structural analyses of protein-protein interfaces. Structures of either the antibodies or the antigen, or even only the epitope, may be screened against a database of 3D structures of protein complexes, in the form of local structure alignments, to identify protein-protein interfaces in which one partner shares structural features with the query protein. Superposition of the query (antibody or antigen/epitope) on the structurally similar protein-protein complex may suggest a model of the antibody-antigen complex, which can subsequently be analyzed using binding free energy calculations (e.g. using the energy calculation tools in Discovery Studio (Accelrys, Software, and Inc., 2013, Discovery Studio Modeling Environment, Release 4.0, San Diego, Accelrys Software Inc.) (hereby incorporated by reference in its entirety), or similar tools such as FoldX (Schymkowitz, J., et al. The FoldX web server: an online force field: Nucleic Acids Research, v. 33, p. W382-8 (2005)) (hereby incorporated by reference in its entirety), Rosetta (Kuhlman, B., et al. Design of a novel globular protein fold with atomic-level accuracy: Science, v. 302, p. 1364-8 (2003); Kunik, V., et al., Paratome: an online tool for systematic identification of antigen-binding regions in antibodies based on sequence or structure, Nucleic Acids Res, v. 40: England, p. W521-4 (2012a); Liu, Y., and B. Kuhlman, RosettaDesign server for protein design: Nucleic Acids Res, v. 34, p. W235-8 (2006)) (hereby incorporated by reference in their entirety), or other computational tools). It is also possible to use the machine-learning analysis described above. This methodology can also be implemented as a filter to analyze the models resulting from protein-protein docking.
In addition, antibody-antigen interfaces arising from protein-protein docking can be structurally compared, using these methods, with known protein-protein interfaces to identify interactions that may introduce specificity.
Docking models that pass the filters and represent potential complexes with the template antibody may be subjected to energetic refinement (for example, minimization and side chain refinement implemented in Discovery Studio or similar methods) prior to further analyses, and MD simulations may be used to assess their stability.
The process of pose selection described above enables the selection of a docked model with the antibody structure to be used as a template for library design.
In one embodiment of the present invention, positions within the CDRs of the template antibody or antibodies are selected for the introduction of variability for library design. For each antibody template, the CDRs are identified using, for example, Paratome (Kunik, V., et al. Paratome: an online tool for systematic identification of antigen-binding regions in antibodies based on sequence or structure, Nucleic Acids Res, v. 40: England, p. W521-4 (2012a)) (hereby incorporated by reference in its entirety) or other tools for CDR identification. Based on the docked model of the antibody-antigen complex, residues within the CDRs that are in the interface with the antigen in the model are selected as potential candidates for mutational variability. Sequence analysis (using Blast or similar program) and, in some cases, structure based sequence alignments (North, B. et al., J. Mol. Biol. 406:228-256 (2011) (herein incorporated by reference in its entirety) are used to analyze these positions to determine whether they are likely to tolerate variability (based on how often variability is observed in related sequences). In addition, bioinformatic analyses of SHM data such as the data available in the analysis in (Burkovitz, A., I. Sela-Culang, and Y. Ofran, 2014, Large-scale analysis of somatic hypermutations in antibodies reveals which structural regions, positions and amino acids are modified to improve affinity: FEBS J, v. 281, p. 306-19) (hereby incorporated by reference in its entirety), may be used to evaluate the variability of these positions as well as their potential structural and functional relevance. Thus, the SHM data can be used to select both the positions and the variations. As seen in
Variation at each selected CDR position can be determined using physical-chemical considerations, knowledge-based approaches, and the SHM data described above. In one embodiment, the residue positions are mutated in silico to other amino acids, either in the context of the docked model or the structure of the free antibody, in order to calculate the effect of the mutation on both the binding free energy and the folding energy (stability), respectively, using, for example, the Mutation Energy protocols implemented in Discovery Studio (Accelrys, Software, and Inc., 2013, Discovery Studio Modeling Environment, Release 4.0, San Diego, Accelrys Software Inc.) (hereby incorporated by reference in its entirety), or similar such algorithms (e.g. FoldX (Schymkowitz, J., J. Borg, F. Stricher, R. Nys, F. Rousseau, and L. Serrano, 2005, The FoldX web server: an online force field: Nucleic Acids Research, v. 33, p. W382-8) (hereby incorporated by reference in its entirety), Rosetta, or algorithms available in the Schrodinger suite (Kuhlman, B., G. Dantas, G. C. Ireton, G. Varani, B. L. Stoddard, and D. Baker, 2003, Design of a novel globular protein fold with atomic-level accuracy: Science, v. 302, p. 1364-8; Liu, Y., and B. Kuhlman, 2006, RosettaDesign server for protein design: Nucleic Acids Res, v. 34, p. W235-8; Schrodinger, Release, and 2014-3, 2014, MacroModel, version 10.5, Schrodinger, LLC, New York, N.Y.; Schymkowitz, J., J. Borg, F. Stricher, R. Nys, F. Rousseau, and L. Serrano, 2005, The FoldX web server: an online force field: Nucleic Acids Research, v. 33, p. W382-8) (each of which is herein incorporated by reference in its entirety)). Sequence analysis and structure based sequence alignments are used to analyze the CDR positions when considering the resulting in silico mutations, in order to determine their likelihood.
3D models of the mutated antibodies in complex with the antigen may be analyzed by machine learning to identify favorable mutations and may be subjected to molecular dynamics simulations to assess the stability of the mutant antibody-antigen docked pose. Interfaces of known binders of the antigen can also be used as a guide for the variability. A genetic algorithm or another search/optimization algorithm may be applied to the classifiers to suggest positions and mutations for the library.
This experiment sought to determine the principles that guide in vivo Ab affinity maturation. In particular, we attempted to identify factors that determine which residues are removed and which new ones are introduced during the SHM process. Given the controversies regarding the tendency of the paratope to undergo SHM, we sought to determine whether different structural parts of the Abs have different tendencies for substitutions. To this end, we analyzed 3495 SHMs in 196 structurally characterized Ab-Ag complexes, and examined (a) the role of AID hotspots in directing mutations, (b) the selective pressure for substitutions in different structural regions of the Ab, and (c) the predicted energetic effect of each substitution. It was found that AID motifs have no effect on selection of mutated residues, but the energetic contribution to Ag binding appears to have a major effect. Finally, a map was generated of the preferred substitutions in each region of the Ab. These results contribute to understanding of the principles that govern the SHM process, and may guide the design and engineering of high-affinity Abs.
Using the data regarding preferred substitutions, we identified residues in the template sequence to be modified. Template variants were created by substituting these residues with variant residues indicated by the SHM analysis. In this manner, a library of template variants was formed for subsequent screening.
3D structure files of 752 Ab-Ag complexes were downloaded from IMGT/3Dstructure-DB (version 4.5.0) (Ehrenmann, F. et al., Nucleic Acids Res 38, D301-D307 (2010); Kaas, Q. et al., Nucleic Acids Res 32, D208-D210 (2004)) (each of which is herein incorporated by reference in its entirety). Complexes with Abs from human (154 structures) or mouse and chimeric Abs (492 structures) were retained. Abs from mouse and chimeric Abs were grouped as mouse Abs. To identify the light and heavy chains in each complex, we clustered human sequences into two clusters and murine sequences into two clusters, each corresponding to either heavy or light, using BlastClust (Dondoshansky, I. et al., BLASTclust (NCBI Software Development Toolkit). National Center for Biotechnology Information, Bethesda, Md. (2002)) (herein incorporated by reference in its entirety). Complexes that included only one chain and light chain dimers were removed. For redundancy removal, VH and VL sequences of each Ab were concatenated, and BlastClust was used with sequence identity of 97% and coverage of 95%. The Ab-Ag complex that was not engineered or mutated was the selected representative sequence in each cluster. In cases where there was more than one non-engineered complex, the longest Ag with the best resolution was used. We identified Ags that are proteins or peptides. All other Ags were removed. One complex (PDB ID 1IGC) in which the sole non-Ab chain was protein G was also excluded from the analysis. In cases where the closest gene in IMGT did not agree with the annotated species, we reviewed the relevant literature, which led to exclusion of 12 complexes from the analysis: six of these cases were humanized Abs, five of them came from non-naive synthetic libraries and one came from rabbit. Overall, the dataset contained 196 non-redundant Ab-Ag complexes.
Sequence alignment was used to identify the related germline gene precursors and identify SHMs. Only variable regions were analyzed. Human and mouse sequences were submitted separately. Default parameters were used. The CDRH3 and CDRL3 alignments were manually reviewed and corrected accordingly. Similar results were obtained when the analysis was repeated after removing junction positions (positions 106-116 for the VH domain and positions 115 and 116 for the VL domain).
For each complex structure in the protein-protein dataset (fully described previously in Kunik, V. et al., Protein Eng Des Sel 26:599-609 (2013)) (herein incorporated by reference in its entirety), the interface of a given chain included all residues in that chain for which at least one of their heavy atoms is within a distance of 6 Å from any of the other chains (Ofran, Y., “Prediction of protein interaction sites” In C
We performed a computational alanine scan for all contacting residues in the Ab, and assessed the effect of this mutation on Ag binding. To assess SHMs, we mutated each introduced residue back to its germline residue. ΔΔG values were calculated using FoldX. (Schymkowitz, J. et al., Nucleic Acids Res 33, W382-W388 (2005); Guerois, R. et al., J Mol Biol 320: 369-387 (2002)) (each of which is herein incorporated by reference in its entirety). The following steps were performed in both cases, as they differ from each other only in the mutation target (alanine or the corresponding germline residue). First, PDB structures were optimized using the FoldX RepairPDB function. Then each mutation was performed separately using the BuildModel function. This resulted in generation of mutants and their corresponding wild-type structure models. The heavy chain and the light chain of the Ab were grouped together to calculate the energy values of the assembled Ab, and the AnalyzeComplex function was used to calculate the binding ΔG of each model. The ΔΔG value for each mutant was then calculated by subtracting the wild-type ΔG value from the mutant ΔG value.
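The final subtraction step can be illustrated with a minimal Python sketch. The function names, mutation labels and numeric ΔG values below are hypothetical and serve only as an illustration; the binding ΔG values are assumed to have been computed beforehand, and no FoldX call is made here.

```python
from typing import Dict, Tuple


def ddg(dg_mutant: float, dg_wild_type: float) -> float:
    """Return ddG = dG(mutant) - dG(wild type).

    With this sign convention a positive value means that the mutant
    binds the Ag less favorably than the wild type.
    """
    return dg_mutant - dg_wild_type


def summarize(energies: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each mutation label to its ddG.

    `energies` maps a label to (dG of mutant model, dG of wild-type model).
    """
    return {label: ddg(mut, wt) for label, (mut, wt) in energies.items()}


# Hypothetical values, for illustration only.
example = {
    "H:Y57A": (-9.1, -10.4),    # mutant binds less favorably
    "L:S110A": (-10.4, -10.4),  # no predicted effect
}

if __name__ == "__main__":
    for label, value in summarize(example).items():
        print(f"{label}: ddG = {value:+.2f}")
```

Each mutant is paired with its own wild-type model in this sketch, mirroring the description above in which every mutation yields a mutant model and a corresponding wild-type model.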
Contact between two residues was defined as at least two heavy atoms (one from each residue) within a distance of 6 Å. The region “Ag interface” comprises all residues that contact the Ag but do not contact residues from the other Ab chain. The region “VH-VL interface” comprises all residues that contact the other Ab chains but not the Ag. The region “both interfaces” comprises Ab residues that contact both the Ag and the other Ab chain. The ABRs were identified using Paratome. (Kunik, V. et al., Nucleic Acids Res 40, W521-W524 (2012)) (herein incorporated by reference in its entirety). Residues in the ABR regions that do not contact the Ag or the other Ab chain were grouped as “ABRs not in interfaces”.
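The contact criterion and the region assignment described above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the 6 Å cutoff is treated as inclusive (an assumption), and the label "other" for residues that fall into none of the four named regions is our own placeholder.

```python
import math
from typing import Sequence

Coord = Sequence[float]  # (x, y, z) of one heavy atom, in angstroms


def in_contact(res_a: Sequence[Coord], res_b: Sequence[Coord],
               cutoff: float = 6.0) -> bool:
    """True if any heavy atom of res_a lies within `cutoff` of any heavy atom of res_b."""
    return any(math.dist(a, b) <= cutoff for a in res_a for b in res_b)


def classify_region(contacts_ag: bool, contacts_other_chain: bool,
                    in_abr: bool) -> str:
    """Assign an Ab residue to exactly one structural region."""
    if contacts_ag and contacts_other_chain:
        return "both interfaces"
    if contacts_ag:
        return "Ag interface"
    if contacts_other_chain:
        return "VH-VL interface"
    if in_abr:
        return "ABRs not in interfaces"
    return "other"  # placeholder label, not taken from the text
```

Because the interface tests are evaluated before the ABR test, the regions are non-overlapping by construction.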
The DNA hotspot motifs were RGYW or WRCY (Dorner, T. et al., Eur J Immunol 28, 3384-3396 (1998)) (herein incorporated by reference in its entirety), where R indicates a purine base, Y indicates a pyrimidine base, and W indicates an A or T base. For each amino acid, the proportion within hotspot motifs is the number of times the amino acid appeared within a hotspot motif divided by the total number of appearances of the same amino acid in the germline sequences (V and J segments only) for all Abs in the dataset.
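Locating the two motifs in a germline DNA sequence can be sketched with regular expressions. The code below is a minimal illustration, not the procedure used in the analysis; the function names are hypothetical, and how a codon is judged to be "within" a motif is deliberately left to the caller.

```python
import re
from typing import List, Set, Tuple

# R = A or G (purine), Y = C or T (pyrimidine), W = A or T.
# WRCY is the reverse complement of RGYW, so both strands are covered.
# Lookahead groups are used so that overlapping motifs are all reported.
HOTSPOT_PATTERNS = {
    "RGYW": re.compile(r"(?=([AG]G[CT][AT]))"),
    "WRCY": re.compile(r"(?=([AT][AG]C[CT]))"),
}


def find_hotspots(dna: str) -> List[Tuple[int, str]]:
    """Return sorted (0-based start, motif name) pairs for every hotspot."""
    seq = dna.upper()
    hits = [(m.start(), name)
            for name, pattern in HOTSPOT_PATTERNS.items()
            for m in pattern.finditer(seq)]
    return sorted(hits)


def hotspot_bases(dna: str) -> Set[int]:
    """Return the 0-based indices of all bases covered by any hotspot."""
    return {start + k for start, _ in find_hotspots(dna) for k in range(4)}
```

For instance, the four-base sequence AGCT satisfies both patterns at once, so `find_hotspots("AGCT")` reports two hits at position 0.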
G. Distance from the Nearest Hotspot Motif
For each amino acid or mutation up to position 105 (according to IMGT numbering) in the V region, the distance from the nearest hotspot motif (RGYW or WRCY) was calculated as described previously (Clark, L. A. et al., J. Immunol. 177: 333-340 (2006)) (herein incorporated by reference in its entirety). Briefly, the distance was defined as the number of bases between the middle base of the codon and the nearest base of a hotspot motif. A distance of zero indicates that the middle base of the codon is inside a hotspot motif. Since the motifs have four positions, the center nucleotide of a codon is four times more likely to fall somewhere within the motif than to fall at any other specific distance from it. Therefore, the observed number of cases with a distance of zero was divided by four before calculation of distributions. Amino acids or mutations that had two hotspots at exactly the same distance were counted twice for that distance (with opposite signs).
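The distance calculation can be sketched as follows. This is a hedged illustration under stated assumptions: the set of base indices covered by hotspot motifs is supplied by the caller, a base adjacent to a motif is assigned a distance of 1, and the sign convention (negative for an upstream hotspot, positive for a downstream one) is ours. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def signed_distances(codon_index: int, hotspot_bases: Set[int]) -> List[int]:
    """Signed distance(s) from the middle base of a codon to the nearest hotspot base.

    Returns [0] if the middle base lies inside a hotspot, two values with
    opposite signs if hotspots are equally near on both sides, and an
    empty list if the sequence has no hotspot at all.
    """
    middle = 3 * codon_index + 1  # 0-based index of the middle base
    if not hotspot_bases:
        return []
    if middle in hotspot_bases:
        return [0]
    nearest = min(abs(b - middle) for b in hotspot_bases)
    result = []
    if middle - nearest in hotspot_bases:
        result.append(-nearest)  # hotspot upstream of the codon
    if middle + nearest in hotspot_bases:
        result.append(nearest)   # hotspot downstream of the codon
    return result


def distance_weights(codon_indices: Iterable[int],
                     hotspot_bases: Set[int]) -> Dict[int, float]:
    """Count cases per signed distance, dividing the zero-distance count by four."""
    counts: Counter = Counter()
    for codon in codon_indices:
        counts.update(signed_distances(codon, hotspot_bases))
    weights = {distance: float(n) for distance, n in counts.items()}
    if 0 in weights:
        weights[0] /= 4.0
    return weights
```

Returning a list rather than a single number makes the "counted twice with opposite signs" rule fall out of the data structure instead of requiring a special case in the histogram step.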
The 196 Ab-Ag complexes were divided into three random subsets. The propensity of each amino acid to be mutated in each subset was calculated as:
where AA1 gl region→X mature region is the number of changes from amino acid AA1 in the germ-line to any amino acid in the structural region,
is the frequency of amino acid AA1 in the germ-line sequences of the structural region, and “mutations in the region” is the number of mutations in the structural region. Priors of 1 were added. Propensity values from each of the random subsets were averaged and then used for standard error calculation.
Ab positions and CDR definitions are numbered according to IMGT numbering (Lefranc, M. P. et al., Dev Comp Immunol 27: 55-77 (2003)) (herein incorporated by reference in its entirety). The mutation probability was calculated as the number of mutations in a specific position divided by the number of appearances of an amino acid in this specific position. If the number of appearances of an amino acid in a specific position was ≤5, it was excluded from
Standard errors for
A non-redundant dataset of 196 Ab-Ag complexes was generated (Table S1). Overall, 3495 SHMs were identified in the variable regions. Of those, 2172 occurred in mouse sequences (with a mean of 14.87 mutations per Ab) and 1323 occurred in human sequences (with a mean of 26.46 mutations per Ab). This difference may be ascribed, at least in part, to the way Abs are collected from mice and humans. The former are typically killed, and Abs collected, shortly after exposure to the Ag when they are a few months old. Human Abs, on the other hand, are typically collected from the blood of infected adults after repeated exposures to Ags.
As only the amino acid sequences of the mature Abs are available in the Protein Data Bank, it is impossible in most cases to retrieve the DNA sequences of the mature Ab from public databases. However, it is possible to retrieve the DNA sequences of the germline genes. These sequences allow us to evaluate the relationships between SHMs and AID hotspot motifs in the germline genes (RGYW or WRCY; R indicates a purine base, Y indicates a pyrimidine base, W indicates an A or a T base) (Dorner T, et al., Eur. J. Immunol. 28:3384-3396 (1998)) (hereby incorporated by reference in its entirety).
We assessed the energetic effect on the binding of the Ag for every mutated residue in the Ab by mutating it back to its germline amino acid (in silico) and predicting the effect of this mutation on the ΔΔG of binding. The calculations were performed using FoldX (Schymkowitz, J. et al., Nucleic Acids Res 33: W382-W388 (2005)) (hereby incorporated by reference in its entirety), which uses parameters and weights derived from experimental data from a large number of mutations. Large-scale assessments of the energetic predictions by FoldX for 1030 mutants (Guerois, R. et al., J Mol Biol 320: 369-387 (2002)) (hereby incorporated by reference in its entirety) have shown them to be strongly correlated (R=0.83) with experimentally measured effects. Thus, while FoldX may not always provide accurate individual predictions, it may be trusted to reveal trends in large sets of mutations. Half (51%) of the SHMs had predicted ΔΔG values of 0, suggesting that they have no effect on binding, while 32% of the SHMs had positive ΔΔG values and only 17% had negative ΔΔG values, indicating that, as expected, mutating mature residues back to their germline amino acids hampers Ag binding more often than it improves it. The distribution of ΔΔG values for SHMs in the VH domain is almost identical to that of SHMs in the VL domain (
We divided the Ab into five non-overlapping structural regions (
where ‘r’ represents one of the five structural regions. If there is no preference for mutations in one region, the value of Pr for that region is 0. This propensity may be used to assess the selective pressure on each of the structural regions defined. Consistent with previous reports (Ramirez-Benitez, M. C. et al., Proteins 45:199-206 (2001); Raghunathan, G. et al., J. Mol. Recog. 25:103-113 (2012)), we found that most of the mutations (71.63%) occur outside the Ag-binding site (
To determine which contacts contribute more to Ag binding, i.e. those that are formed by the residues mutated during SHM (“SHM contacting residues”) or those that are formed by residues retained from the primary germline sequence (“germline contacting residues”), we compared their predicted energetic contribution by mutating each contacting residue to alanine and calculating the effect of this mutation on binding energy (see “Experimental procedures”). The results are shown in
F. SHMs Make the Ab-Ag Interface more Similar to Other Protein-Protein Interfaces
We compared the amino acid composition of SHM contacts and germline contacts with those of general protein-protein interfaces. All aliphatic hydrophobic amino acids (alanine, isoleucine, leucine, methionine, proline and valine) are under-represented in the Ab-Ag interface compared with general interfaces (
To understand the role of different amino acids in SHM and the differences between the structural regions, we further analyzed the propensities for mutation in germline amino acids during SHM. As shown in
All polar amino acids show a very distinct preference across these four structural regions. Tyrosine, which is highly important in Ag binding due to its over-representation in Ab ABRs (Kunik, V. et al., Prot. Eng. Des. Sel. 26:599-609 (2013)), is actually a preferred target for substitution in ABR residues that are not in interfaces and in the VH-VL interface. The only exception is the Ag interface, in which tyrosine is slightly protected from substitutions. Threonine, which has also been suggested to be over-represented in Ag interfaces (Ofran, Y. et al., J. Immunol. 181:6230-6235 (2008)), is mostly neutral to mutation, but is mutated less than expected in the VH-VL interface. Tryptophan is a slightly preferred target for mutation among the residues that are part of both interfaces, and is highly under-mutated in all other regions. Asparagine and glutamine show opposite patterns. While asparagine is over-represented, glutamine is under-represented in both the VH-VL interface and ABRs that are not in any interface. Asparagine also has high mutability in both interfaces, and glutamine is mutated less than expected in the Ag interface. As for the charged amino acids, arginine shows a negative propensity for mutation in the VH-VL interface and in both interfaces. Lysine shows a positive propensity for mutations in ABRs that are not in interfaces. Glutamic acid, aspartic acid and histidine are all less mutated than expected in the Ag interface and in both interfaces.
The propensities for substitutions in
Rational design of high-affinity Abs requires targeting of Ab positions for mutations. Our analysis identifies such positions based on in vivo SHM data.
The regions in the Ab that have high average ΔΔG values for mutating their residues back to the germline amino acids overlap to some extent with regions that have a high mutation probability. However, not all CDR positions undergo substitutions that contribute to binding. For example, CDRH2 (VH positions 56-65) has high mutation probabilities for most of its residues. However, positions 63 and 65 have, on average, no energetic effect on binding despite their high probability for mutations. Positions that are frequently mutated and also show a substantial effect of SHMs on Ag-binding energy, such as 38, 55, 57, 59, 112, 113 and 114 on the VH domain and 110 and 116 on the VL domain, may be promising targets for in vitro affinity enhancement.
Many of the insights into the structural basis of in vivo affinity maturation were obtained from analyses of SHMs in a single pair, or in several pairs, of germline and mature Abs (Li Y, Li H, Yang F, Smith-Gill S J & Mariuzza R A (2003) X-ray snapshots of the maturation of an antibody response to a protein antigen. Nat Struct Biol 10, 482-488; Midelfort K S, Hernandez H H, Lippow S M, Tidor B, Drennan C L & Wittrup K D (2004) Substantial energetic improvement with minimal structural perturbation in a high affinity mutant antibody. J Mol Biol 343, 685-701; Chong L T, Duan Y, Wang L, Massova I & Kollman P A (1999) Molecular dynamics and free-energy calculations applied to affinity maturation in antibody 48G7. Proc Natl Acad Sci USA 96, 14330-14335; Wong S E, Sellers B D & Jacobson M P (2011) Effects of somatic mutations on CDR loop flexibility during affinity maturation. Proteins 79, 821-829; Schmidt A G, Xu H, Khan A R, O'Donnell T, Khurana S, King L R, Manischewitz J, Golding H, Suphaphiphat P, Carfi A, et al. (2013) Preconfiguration of the antigen-binding site during affinity maturation of a broadly neutralizing influenza virus antibody. Proc Natl Acad Sci USA 110, 264-269; Thorpe I F & Brooks C L 3rd (2007) Molecular evolution of affinity and flexibility in the immune system. Proc Natl Acad Sci USA 104, 8821-8826; Acierno J P, Braden B C, Klinke S, Goldbaum F A & Cauerhff A (2007) Affinity maturation increases the stability and plasticity of the Fv domain of anti-protein antibodies. J Mol Biol 374, 130-146).
Large-scale studies that attempted to elucidate the principles that guide SHM reached contradictory conclusions regarding preference for SHMs in the Ab-Ag interface (Clark L A, Ganesan S, Papp S & van Vlijmen H W (2006) Trends in antibody sequence changes during the somatic hypermutation process. J Immunol 177, 333-340; Dorner T, Brezinschek H P, Brezinschek R I, Foster S J, Domiati-Saad R & Lipsky P E (1997) Analysis of the frequency and pattern of somatic mutations within nonproductively rearranged human variable heavy chain genes. J Immunol 158, 2779-2789; Ramirez-Benitez M C & Almagro J C (2001) Analysis of antibodies of known structure suggests a lack of correspondence between the residues in contact with the antigen and those modified by somatic hypermutation. Proteins 45, 199-206; Raghunathan G, Smart J, Williams J & Almagro J C (2012) Antigen-binding site anatomy and somatic mutations in antibodies that recognize different types of antigens. J Mol Recog 25, 103-113). Our division of the Ab into various structural regions, and the calculation of mutation probability and the energy effects of SHMs in each region, reveal that the highest propensity for SHMs is in Ag-binding regions (Ag interface and both interfaces). These regions also provide the greatest energetic contribution to Ag binding. These results are consistent with the selection of B cells based on Ag binding and with previous studies that showed fine-tuning of the Ag-binding site through SHMs (Li Y, Li H, Yang F, Smith-Gill S J & Mariuzza R A (2003) X-ray snapshots of the maturation of an antibody response to a protein antigen. Nat Struct Biol 10, 482-488; Chong L T, Duan Y, Wang L, Massova I & Kollman P A (1999) Molecular dynamics and free-energy calculations applied to affinity maturation in antibody 48G7. Proc Natl Acad Sci USA 96, 14330-14335).
Although to a lesser extent than the regions involved in Ag binding, ABR residues that are not in the interfaces and residues in the VH-VL interface are both favored targets for mutations and make a substantial energetic contribution to Ag binding. This is consistent with previous studies that showed how internal interface stabilization (Acierno J P, Braden B C, Klinke S, Goldbaum F A & Cauerhff A (2007) Affinity maturation increases the stability and plasticity of the Fv domain of anti-protein antibodies. J Mol Biol 374, 130-146) and increased VH-VL interface shape complementarity (Midelfort K S, Hernandez H H, Lippow S M, Tidor B, Drennan C L & Wittrup K D (2004) Substantial energetic improvement with minimal structural perturbation in a high affinity mutant antibody. J Mol Biol 343, 685-701) result in enhanced Ag binding.
DNA motifs that enhance targeting of the AID enzyme have been the focus of many studies that attempted to identify SHM sites. Such DNA hotspot motifs were previously suggested to play an important role in the formation of SHMs (Dorner T, Foster S J, Farner N L & Lipsky P E (1998) Somatic hypermutation of human immunoglobulin heavy chain genes: targeting of RGYW motifs on both DNA strands. Eur J Immunol 28, 3384-3396). However, our results indicate that the mature Ab sequence is determined by the affinity and possibly the stability of the Ab. The lack of correlation between the extent to which an amino acid is located within hotspots and its frequency among mutated positions suggests that structural and functional considerations play a much more important role than the presence of AID hotspots.
Our analysis of SHM, germline and general protein-protein interfaces suggested some evolutionary insights. Tyrosine and tryptophan, which are large, flexible, amphipathic amino acids, were previously suggested to be highly represented in the Ag interfaces, and have been proposed to allow binding of several structurally similar Ags (Mian I S, Bradwell A R & Olson A J (1991) Structure, function and properties of antibody binding sites. J Mol Biol 217, 133-151). However, the affinity maturation process decreases their representation and increases the representation of aliphatic hydrophobic amino acids. Both SHM contacts and protein-protein contacts are the result of specific evolution and optimization of contacts, while germline-Ag contacts occur between partners that have never met before. This may explain the abundance of germline interface residues that may form several different kinds of contacts, and also the higher similarity between protein-protein interfaces and SHM contacting residues. This observation is consistent with a previous study that suggested that Ab affinity maturation and protein-protein interface evolution are guided by similar principles (J Biol Chem 285: 3865-3871).
The ΔΔG values in this study were predicted by FoldX (Guerois R, et al., Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations, J Mol Biol 320: 369-387 (2002)) (hereby incorporated by reference in its entirety). While there may be other tools that allow energetic assessment of individual mutations, FoldX enables rapid assessment of a large number of SHMs. An independent assessment has shown that FoldX is particularly good at assessing the energetic effect of mutations to amino acids other than alanine and mutations of residues located in loops (Potapov V, et al., Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details, 553-560 (2009)). Previous studies have shown that FoldX may be used to identify trends in the evolution of protein function (Tokuriki N, et al., How protein stability and new functions trade off, PLoS Comput Biol 4, e1000002 (2008); Tokuriki N, et al., The stability effects of protein mutations appear to be universally distributed, 1318-1332 (2007)). Furthermore, it has recently been used for the study of Ab-Ag interactions (Kunik V, et al., Structural consensus among antibodies defines the antigen binding site, PLoS Comput Biol 8, e1002388 (2012); Kunik V & Ofran Y, The indistinguishability of epitopes from protein surface is explained by the distinct binding preferences of each of the six antigen-binding loops, Protein Eng Des Sel 26: 599-609 (2013)). The FoldX energy function also includes scoring parameters for the entropic cost of mutation. However, these parameters are calculated based on theoretical data and have been acknowledged to be a crude estimation of the entropy (Schymkowitz J, et al., The FoldX web server: an online force field, Nucleic Acids Res 33: W382-W388 (2005)). It has been shown that loss of flexibility in the Ab paratope, and thus a lower entropic cost of the interaction, is an important aspect of Ab affinity maturation (Wong S E, et al., Effects of somatic mutations on CDR loop flexibility during affinity maturation, Proteins 79: 821-829 (2011); Schmidt A G, et al., Preconfiguration of the antigen-binding site during affinity maturation of a broadly neutralizing influenza virus antibody, Proc Natl Acad Sci USA 110: 264-269 (2013); Thorpe I F & Brooks C L 3rd, Molecular evolution of affinity and flexibility in the immune system, Proc Natl Acad Sci USA 104: 8821-8826 (2007)). Quantification of such effects requires long molecular dynamics simulations or experimental procedures. Such methods are not applicable to a large number of Ab-Ag complexes; thus, the estimation of paratope rigidification is beyond the scope of this study.
The Ab-Ag dataset we used consists of 196 non-redundant Ab-Ag complexes. As more Ab-Ag complexes become available, it will be possible to also apply this approach to Ab-hapten interactions, which is currently not practical, and even to the interfaces with specific Ags such as gp120 or hemagglutinin, to identify SHM patterns that are unique to that Ag. For example, it has recently been shown that Abs that broadly neutralize HIV are characterized by a remarkably high number of SHMs (Scheid J F, et al., Broad diversity of neutralizing antibodies isolated from memory B cells in HIV-infected individuals, Nature 458: 636-640 (2009); Kwong P D & Mascola J R, Human antibodies that neutralize HIV-1: identification, structures, and B cell ontogenies, Immunity 37: 412-425 (2012); Wu X, et al., Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1, Science 329: 856-861 (2010)) and may also require SHMs in their framework regions (Klein F, et al., Somatic mutations of the immunoglobulin framework are generally required for broad and potent HIV-1 neutralization, Cell 153: 126-138 (2013)).
Over recent decades, Abs have become one of the most effective and popular tools in biotechnology and biomedicine (Maynard J & Georgiou G, Antibody engineering, Annu Rev Biomed Eng 2: 339-376 (2000)), and more than 30 Abs and Ab derivatives have been approved for therapeutic use by the US Food and Drug Administration (Beck A, et al., Strategies and challenges for the next generation of therapeutic antibodies, Nat Rev Immunol 10: 345-352 (2010)). Therapeutic and diagnostic Abs frequently require engineering to enhance the affinity of Abs raised in immunized animals or selected by library screens. This step is important to expand detection limits, extend dissociation half-life, decrease drug dosage and increase drug efficiency (Lippow S M, et al., Computational design of antibody-affinity improvement beyond in vivo maturation, Nat Biotechnol 25: 1171-1176 (2007)). The structural and biophysical principles identified here may allow more focused in vitro design of Abs with enhanced affinities for use in building the libraries of the invention.
A model of the antigen of interest, in this case IL-17A in the receptor bound conformation, was generated using Modeler as implemented in the Discovery Studio suite.
The model was then docked against a large database of antibody three-dimensional structures using ZDOCK as implemented in Discovery Studio. Various poses were screened in order to identify poses that have “native-like” properties. For the IL-17A antibody, poses providing optimal blocking of the IL-17RA binding site were sought. A docking pose of antibody 2ZJS (PDB ID) and the model of IL-17A was selected as a template for library design.
Positions within the CDRs of the antibody were selected for the introduction of variability for library design according to the methods described infra. For the initial library based on the 2ZJS antibody from the PDB, docked to IL-17A as described above, five positions were selected for variation (1 on chain H, 4 on chain L), yielding a library with diversity (at the amino acid level) of ~500,000. In addition to the 2ZJS-based library, other libraries were designed based on the docking models with the following PDB structures: 2ADG, 1GPO, 3A6C, 3C09 and 1DFB. Libraries based on 2ADG and 2ZJS yielded IL-17A binders.
The initial selection of libraries (2ZJS and 2ADG) against IL-17A yielded several clones that bound the antigen specifically with sub-micromolar affinity, based on titrations performed on the yeast.
After each round of selection the surviving clones were deep-sequenced to analyze which variants are subject to selective pressures and which substitutions are favored or disfavored in the various positions. The results of this analysis were used to design an improved library. Briefly, positions that are under selective pressure (i.e. mutations in these positions improve or hamper binding) are positions that have an effect on the interface. This information can be used to refine the original model of antibody-antigen complex, and, in turn will allow another iteration of the process described above, yielding new libraries with more focused variations.
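One simple way to look for positions under selective pressure in deep-sequencing data is to compare per-position amino acid frequencies before and after a round of selection. The sketch below is an illustrative assumption rather than the analysis actually performed: the log2 enrichment score, the pseudocount of 1, the function names and the toy sequences are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(seqs: Sequence[str], position: int,
                         pseudocount: float = 1.0) -> Dict[str, float]:
    """Pseudocount-smoothed amino acid frequencies at one aligned position."""
    counts = Counter(s[position] for s in seqs)
    total = len(seqs) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log2_enrichment(before: Sequence[str], after: Sequence[str],
                    position: int) -> Dict[str, float]:
    """log2(frequency after selection / frequency before selection) per amino acid."""
    f_before = position_frequencies(before, position)
    f_after = position_frequencies(after, position)
    return {aa: math.log2(f_after[aa] / f_before[aa]) for aa in AMINO_ACIDS}


# Toy two-residue "clones", for illustration only.
pre_selection = ["AY", "AF", "AG", "AS"]
post_selection = ["AY", "AY", "AF", "AW"]

if __name__ == "__main__":
    scores = log2_enrichment(pre_selection, post_selection, 1)
    for aa in sorted(scores, key=scores.get, reverse=True)[:3]:
        print(aa, round(scores[aa], 2))
```

Positions where many residues have scores far from zero would be candidates for positions under selection, while positions with scores near zero for all residues would appear neutral. The sketch assumes that all sequences are aligned, of equal length, and contain only the 20 standard amino acids.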
Clones selected from this library as IL-17A binders were utilized as the basis for the introduction of additional variation to improve affinity and utility. Specific positions within the antibody were selected based on sequence analysis (for example, Blast), positions suggested in the literature, and/or the analysis of deep sequencing data from the initial library. Based on these analyses, a next-generation library was designed.
In this particular case, we were able to identify several positions in two of the libraries that were under selection. For example, in the library that was based on 2ZJS, we observed a clear overrepresentation of aromatic residues at two neighboring positions. This round of selection culminated in an scFv that shows full cross-blocking of soluble IL-17RA.
Additional analysis of the soluble scFv has shown that it not only binds IL-17A but is also highly thermostable, as shown in
A critical question, therefore, in designing synthetic libraries is to what extent the resulting Abs are similar to natural Abs in the way they recognize and bind the Ag. Indeed, good therapeutic biomolecules do not have to mimic natural Abs. However, it is often assumed that libraries that better mimic natural Abs and natural diversity are more likely to yield better binders with better profiles. Some novel approaches for library design attempt to introduce diversity that will better imitate natural diversity while also yielding Abs with improved biophysical properties. For example, the human combinatorial antibody library (HuCAL) was created to represent the most frequently used germline families and was optimized to obtain high expression and low aggregation in E. coli. The CDR cassettes were designed to mimic the length and amino acid composition of naturally occurring Abs (Knappik A, et al., J Mol Biol 296:57-86 (2000); Rothe C, et al., J Mol Biol 376:1182-200 (2008)) (each of which is herein incorporated by reference in its entirety). Sidhu et al. (Sidhu S S, et al., J Mol Biol 338:299-310 (2004)) (herein incorporated by reference in its entirety) used a single stable framework scaffold to introduce diversity to the heavy chain, based on the observed propensities of amino acids in CDRs of natural Abs. Another strategy was to amplify only the CDR sequences from naïve B cells and randomly combine these CDRs into a selected Ab framework that can be highly expressed in a bacterial system (Soderlind E, et al., Nat Biotechnol 18:852-6 (2000)) (herein incorporated by reference in its entirety). Further understanding of key properties of naturally existing Abs will help Ab engineering technologies to obtain more promising therapeutic Ab candidates.
Here, we compare synthetic Abs to natural Abs to assess to what extent synthetic Abs indeed mimic natural ones. This comparison allowed us to review and revise common assumptions about Ab-Ag interaction. We employ a novel computational tool we developed, “CDRs analyzer” to explore biophysical characteristics of Abs. In this analysis, natural Abs are Abs that originated from hybridoma or from immunized or naïve libraries, and synthetic Abs are Abs that were selected from a synthetic library (i.e., a library that is not naïve or immunized). We found that synthetic Abs rely on CDRH3 significantly more than natural Abs. The binding contribution of CDRH1 and CDRH2 of synthetic Abs is smaller than their contribution in natural Abs. When analyzing the binding mode, we found that epitopes of natural Abs contain many epitope residues that contact multiple CDRs, while epitopes of synthetic Abs have more residues that contact only one CDR. These results show that the current way in which synthetic libraries are designed often yields Abs that do not mimic the way in which natural Abs bind their Ags. Our analysis suggests a set of considerations for library design that will take better advantage of the binding possibilities offered by the structure of the Ab. We discuss how this can yield libraries with more effective binders and with greater diversity of paratopes.
B.1 Construction of Natural Ab-Ag Complexes Datasets
To build a large non-redundant set of natural Abs, a previously published non-redundant dataset of 196 Ab-Ag complexes (Burkovitz, A. et al., FEBS J 281:306-19 (2014)) (herein incorporated by reference in its entirety) was further filtered to create the current study dataset of natural Ab-Ag complexes. The “CDRs Analyzer” cannot analyze scFv Abs, Abs that contain disordered residues in the CDRs or non-standard amino acids, complexes that were solved by NMR and complexes composed of more than 25000 atoms. Complexes that met any of these conditions were deleted from the original dataset. In addition, complexes that included synthetic Abs were moved to the synthetic Ab-Ag dataset. Finally, complexes that contain an Ag with a length of ≤30 amino acids were also removed. The resulting dataset contained 101 natural Ab-Ag complexes (Table S1).
B.2 Construction of Synthetic Ab-Ag Complexes Datasets
A synthetic Ab-Ag complexes dataset was constructed using both the PDB and sAbDab (Dunbar, J. et al., Nucleic Acids Res 42:D1140-6 (2014)) (herein incorporated by reference in its entirety). The PDB query search was used to manually curate synthetic Ab-Ag complexes. The PDB query type was set to “PubMed abstract” and the search words were “phage display antibody” and “library antibody”. In addition, the sequences of the light chain, the heavy chain or the full variable domain of a representative synthetic Ab (PDB ID: 2H9G) were used to search the sAbDab database using the framework region only option. A retrieved PDB entry was considered a synthetic Ab-Ag complex if the library from which the Ab was isolated included variable domain sequences that were not obtained from a natural repertoire. Two Ab-Ag complexes were considered redundant if the two Abs bound the same Ag at a similar epitope. Redundancy was removed according to this criterion. We removed from the dataset complexes that contain scFvs, Ags with a length of ≤30 amino acids, Abs that contain disordered residues in the CDRs or non-standard amino acids, complexes with resolution ≤3.6 Å and complexes that are composed of more than 25000 atoms. The final synthetic Ab-Ag complexes dataset contained 36 non-redundant PDB entries.
B.3 Analyzing Ab-Ag Complexes Using “CDRs Analyzer”
CDRs analyzer takes as an input an X-ray structure of Ab-Ag complex in a PDB file format. It automatically identifies the CDRs residues and calculates a set of parameters for all six CDRs. The output is an HTML page presenting the calculated parameters (described below) for each of the CDRs, a list of contacting residues and list of specific interactions. “CDRs Analyzer” was implemented in Perl and Python. The front end of the server is designed in HTML and XML.
B.3.1 CDRs Identification
The CDRs are identified using Paratome (Kunik, V. et al., Nucleic Acids Res 40:W521-4 (2012); Kunik, V. et al., PLoS Comput Biol 8:e1002388 (2012)) (herein incorporated by reference in their entirety). An Ag-contacting residue within ±15 residues from the Ag binding region boundaries as defined by Paratome is added to the nearest CDR. An Ag-contacting residue is a residue on the Ab that has at least one non-hydrogen atom within 5 Å of a non-hydrogen atom in the Ag.
B.3.2 Number of Contacting Residues
The number of “contacting residues” is the number of residues in a CDR that are in contact with the Ag and the number of residues in the Ag that are in contact with the CDR.
B.3.3 Number of Specific Interactions
The number of “specific interactions” is the sum of the number of salt-bridges, pi-pi, cation-pi and possible H-bonds (McDonald, I. K. et al., J Mol Biol 238:777-93 (1994) (herein incorporated by reference in its entirety)) between the CDR and the Ag. A salt bridge is defined as one Asp or Glu side-chain carboxyl oxygen atom and one side-chain nitrogen atom of Arg, Lys or His that are within 4.0 Å of each other. H-bonds were identified by first adding polar hydrogen atoms to the complex using Discovery Studio Visualizer and then submitting the output file to the HBPLUS program with default parameters (McDonald, I. K. et al., J Mol Biol 238:777-93 (1994) (herein incorporated by reference in its entirety)). Pi-pi interactions are identified according to McGaughey et al. (McGaughey, G. B. et al., J Biol Chem 273:15458-63 (1998) (herein incorporated by reference in its entirety)).
Briefly, the distance between the centroids of each pair of pi rings should be 8 Å or less, and at least one atom from each ring should be within 4.5 Å of each other. In addition, the angle theta between the normal of one or both rings and the centroid-centroid vector must fall between 0 and ±60 degrees. The angle lambda between the normals of the two rings must fall between 0 and ±30 degrees. A cation-pi interaction is defined if the cation of a Lys or Arg side chain is within 7 Å of the centroid of a pi ring, the perpendicular distance between the cation and the plane of the ring is within 6 Å, and the angle between the cation-centroid vector and the ring plane is more than 45 degrees.
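The salt-bridge and cation-pi criteria above can be expressed as simple geometric tests. The following Python sketch is illustrative only: the helper names, the coordinate-tuple inputs and the way the ring normal is derived (from the first two ring atoms, assuming a planar ring listed in order) are assumptions, not details taken from the tool.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def ring_centroid_and_normal(ring_atoms):
    """Centroid and unit normal of a planar ring given as ordered (x, y, z) tuples."""
    n = len(ring_atoms)
    centroid = tuple(sum(atom[i] for atom in ring_atoms) / n for i in range(3))
    normal = _cross(_sub(ring_atoms[0], centroid), _sub(ring_atoms[1], centroid))
    length = _norm(normal)
    return centroid, tuple(x / length for x in normal)


def is_salt_bridge(acid_oxygens, base_nitrogens, cutoff=4.0):
    """True if any Asp/Glu carboxyl O is within `cutoff` Å of any Arg/Lys/His side-chain N."""
    return any(math.dist(o, n) <= cutoff for o in acid_oxygens for n in base_nitrogens)


def is_cation_pi(cation_xyz, ring_atoms):
    """Apply the three cation-pi criteria: centroid distance <= 7 Å,
    perpendicular distance to the ring plane <= 6 Å, and angle between the
    cation-centroid vector and the ring plane > 45 degrees."""
    centroid, normal = ring_centroid_and_normal(ring_atoms)
    vector = _sub(cation_xyz, centroid)
    distance = _norm(vector)
    if distance == 0.0 or distance > 7.0:
        return False  # a cation exactly at the centroid has no defined angle
    perpendicular = abs(_dot(vector, normal))
    angle_to_plane = math.degrees(math.asin(min(1.0, perpendicular / distance)))
    return perpendicular <= 6.0 and angle_to_plane > 45.0
```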
B.3.4 Energy Calculations(ΔΔG)
The effect of in-silico mutation of each CDR residue to Ala is calculated using FoldX (Guerois, R. et al., Journal of Molecular Biology 320:369-87 (2002) (herein incorporated by reference in its entirety)). FoldX's calculations have previously been shown to correlate with experimentally measured results for 1030 mutants (R=0.83) (Guerois, R. et al., Journal of Molecular Biology 320:369-87 (2002)). A recently published study curated 1100 mutations in Ab-Ag complexes and examined the performance of different energy scoring methods (Sirin, S. et al., Protein Sci 2015 (herein incorporated by reference in its entirety)).
FoldX was one of the top performers in that study, on both destabilizing (ΔΔG>1.0 kcal/mol) and stabilizing (ΔΔG<-1.0 kcal/mol) mutations.
Each PDB structure is first optimized using the FoldX RepairPDB function. Then, residues in the CDR are mutated to Ala using the BuildModel function, which generates the mutants and their corresponding wild-type structure models. The heavy chain and the light chain of the Ab are grouped together to calculate the energy values of the assembled Ab, and the AnalyzeComplex function is used to calculate the binding ΔG of each model. The ΔΔG for each mutant is then computed by subtracting the wild-type calculated ΔG value from the mutant calculated ΔG value. The “ΔΔG” of a CDR is taken as the sum over its residues. The “CDRs Analyzer” outputs the ranking of the six CDRs according to their ΔΔG values.
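The arithmetic that follows the FoldX runs can be sketched in a few lines of Python. The sketch assumes the binding ΔG values have already been extracted from the FoldX output; the function names and input layout are hypothetical, and ranking in descending order (largest ΔΔG first) is an assumption, since the text does not state the direction of the ranking.

```python
def ddg_per_residue(dg_wild_type, dg_mutant):
    """ΔΔG = ΔG(mutant) - ΔG(wild type), in kcal/mol."""
    return dg_mutant - dg_wild_type


def cdr_ddg(residue_energies):
    """Sum ΔΔG over the residues of one CDR.

    `residue_energies` is a list of (wild-type ΔG, Ala-mutant ΔG) pairs,
    one pair per CDR residue.
    """
    return sum(ddg_per_residue(wt, mut) for wt, mut in residue_energies)


def rank_cdrs(energies_by_cdr):
    """Return (CDR name, summed ΔΔG) pairs, largest ΔΔG first (assumed order)."""
    totals = {cdr: cdr_ddg(pairs) for cdr, pairs in energies_by_cdr.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```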
B.3.5 Delta Relative Surface Accessibility (ΔRSA)
RSA is given by dividing the solvent accessibility value by the surface area of the given amino acid (Chothia, C., J Mol Biol 105:1-12 (1976) (herein incorporated by reference in its entirety)). The solvent accessibility of the Ab residues is calculated using the DSSP program (Kabsch, W. et al., Biopolymers 22:2577-637 (1983) (herein incorporated by reference in its entirety)). RSA is computed for each of the residues in the CDR, once in the presence of the Ag (RSAbound) and once in its absence (RSAunbound). The ΔRSA is given by subtracting the RSAunbound from the RSAbound. The ΔRSA of a CDR is taken as the sum over its residues.
B.3.6 Binding Contribution Score
To evaluate the involvement of each CDR in Ag recognition we used an estimate that combines the values of the four parameters into a single “binding contribution score”. For each of the four binding parameters above, values are normalized and scored according to their quartiles: 4 points for values within the top 25% of the scores, 1 for the values within the lowest 25%. The “binding contribution score” of a given CDR is the sum of its scores over the four criteria and varies from 4 (no contribution to Ag binding) to 16 (highest contribution to Ag binding). The binding contribution calculation gives an equal weight to the four binding parameters. When more structural data become available, these weights should be assessed and optimized. To verify that the score is not sensitive to arbitrary cutoffs, we checked different binding contribution scores by dividing the parameter values into bins of thirds and fifths (instead of quarters). This did not change the results.
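A minimal Python sketch of the quartile scoring is given below. Several details are assumptions rather than statements from the text: the two middle quartiles are assigned 2 and 3 points (implied by the 4 to 16 range), quartile boundaries are taken from a reference set of values supplied by the caller, values lying exactly on a boundary fall into the lower bin, and the unspecified normalization step is omitted. The parameter keys are hypothetical names.

```python
from statistics import quantiles

PARAMETERS = ("contacting_residues", "specific_interactions", "ddg", "delta_rsa")


def quartile_score(value, reference_values):
    """Score 1 (lowest quartile) to 4 (top quartile) against a reference set."""
    q1, q2, q3 = quantiles(reference_values, n=4)
    if value > q3:
        return 4
    if value > q2:
        return 3
    if value > q1:
        return 2
    return 1


def binding_contribution_score(cdr_values, reference_by_parameter):
    """Equal-weight sum of the four quartile scores; ranges from 4 to 16."""
    return sum(
        quartile_score(cdr_values[name], reference_by_parameter[name])
        for name in PARAMETERS
    )
```

Replacing `n=4` with `n=3` or `n=5` (and adjusting the score ladder) would reproduce the thirds and fifths sensitivity check described above.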
B.3.7 Independent and Integrated Ag Residues
An “independent residue” is an Ag residue that is in contact with residues that belong to only one CDR. An “integrated residue” is an Ag residue that is in contact with at least three CDRs. These parameters are used by the “CDRs Analyzer” to calculate the “Independent binding score”, which measures the potential of a given CDR to bind the Ag as a peptide (Burkovitz, A. et al., J Immunol 190:2327-34 (2013) (herein incorporated by reference in its entirety)). For that purpose, the percentage of independent or integrated residues for a given CDR was calculated out of the Ag residues contacting that CDR. Here, we aimed to evaluate the complexity of the Ab-Ag interaction. Thus, the percentages of independent and integrated residues were calculated out of the total number of epitope residues.
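Both ways of computing these percentages (per CDR, and over the whole epitope) can be sketched as follows. The input, a mapping from each Ag residue to the set of CDRs it contacts, and the function name are assumptions for illustration.

```python
def epitope_percentages(contacts, cdr=None):
    """Percentage of independent and integrated Ag residues.

    `contacts` maps each Ag residue ID to the set of CDR names it contacts.
    With `cdr=None` the denominator is the whole epitope; with a CDR name,
    only Ag residues contacting that CDR are counted.
    Returns (percent_independent, percent_integrated).
    """
    considered = {
        residue: cdrs
        for residue, cdrs in contacts.items()
        if cdrs and (cdr is None or cdr in cdrs)
    }
    if not considered:
        return 0.0, 0.0
    independent = sum(1 for cdrs in considered.values() if len(cdrs) == 1)
    integrated = sum(1 for cdrs in considered.values() if len(cdrs) >= 3)
    total = len(considered)
    return 100.0 * independent / total, 100.0 * integrated / total
```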
B.3.8 Independent Binding Score
The six parameters above (contacting residues, specific interactions, ΔΔG, ΔRSA, and the percentages of independent and integrated Ag residues) are used to evaluate the potential of a CDR to bind the Ag as a peptide (Burkovitz, A. et al., J Immunol 190:2327-34 (2013) (herein incorporated by reference in its entirety)). The values of each of the parameters are normalized and scored according to their quartiles: 4 points for values within the top 25% of the scores, 1 for the values within the lowest 25%. The “Independent binding score” of a given CDR is the sum of the scores over its six criteria.
C.1 Data Sets of Natural and Synthetic Abs
Analyzing the Protein Data Bank (PDB) (Berman H M, et al., Nucleic Acids Research 28:235-42 (2000) (herein incorporated by reference in its entirety)) in search of a non-redundant set of natural or synthetic Abs (Methods) yielded a total of 137 Ab-Ag complexes. Of these, 101 are natural (Table S1) and 36 are synthetic (Table S2).
C.2 “CDRs Analyzer”—A Computational Framework for Exploring Ab-Ag Interactions.
The analysis utilized “CDRs Analyzer”, a computational tool we introduce for analyzing Ab-Ag interfaces. It is designed to assist Ab engineering by providing quantitative assessment of the biophysical properties of each residue and each CDR in the paratope. “CDRs Analyzer” takes as input a 3D structure of an Ab-Ag complex in PDB format and the chain IDs of the Ab and Ag chains to be analyzed. The server provides output at both the residue and the CDR levels. The output includes a list of H-bonds (calculated by HBPLUS (McDonald I K, and Thornton J M, J Mol Biol 238:777-93 (1994) (herein incorporated by reference in its entirety))), salt-bridges, pi-pi and cation-pi interactions, and a list of contacting residues (see Methods). Additionally, “CDRs Analyzer” calculates, for each CDR, four parameters to evaluate its contribution to Ag binding: (1) “Contacting residues” is the sum of the number of residues in the CDR that are in contact with the Ag and the number of residues in the Ag that are in contact with the CDR; (2) “Specific interactions” is the number of salt-bridges, pi-pi and cation-pi interactions and the number of possible H-bonds between the CDR residues and the Ag; (3) “Calculated ΔΔG” is the predicted effect on binding of mutating each CDR residue to Ala, calculated using FoldX (Guerois, R. et al., Journal of Molecular Biology 320:369-87 (2002) (herein incorporated by reference in its entirety)); and (4) “delta relative accessible surface area (ΔRSA)” is the sum of the changes in the relative solvent accessibility of each CDR residue upon dissociation of the Ab-Ag complex, calculated using DSSP (Kabsch, W. and Sander, C., Biopolymers 22:2577-637 (1983) (herein incorporated by reference in its entirety)). These four binding parameters were combined to give a score that assesses the contribution to Ag binding of a given CDR. This score varies from 4 (no contribution to Ag binding) to 16 (highest contribution to Ag binding; see Methods).
It is a unified score that gives an equal weight for the four binding parameters. Ideally, as more structural data become available, the weight that each parameter should have in the final score can be explored and optimized. The binding contribution score is a combined measurement of the Ag binding portion of a given CDR relative to other CDRs of the Ab.
Additionally, “CDRs Analyzer” provides the potential of a CDR to bind the Ag as a peptide, based on a computational approach that was described previously (Burkovitz A, et al., J Immunol 190:2327-34 (2013) (herein incorporated by reference in its entirety)). “CDRs Analyzer” is available online at http://www.ofranlab.org/CDRs_Analyzer.
C.3 Synthetic Abs Rely Heavily on CDRH3 at the expense of CDRH2 and CDRH1.
CDRH3, which encompasses the V-D-J recombination site, is the most diverse component of natural Abs. As shown in Table A1, in natural Abs CDRH3 has, on average, higher values than any other CDR, for all of the four parameters that were assessed.
C.4 Unlike Synthetic Abs, CDRs in Natural Abs Specialize in Specific Types of Contacts
“CDRs Analyzer” also provides a list of specific contacts (H-bonds, salt bridges, cation-pi or pi-pi). The distribution of each type of interaction across the six CDRs is shown in
In natural Abs, each CDR on the heavy chain specializes in different types of interactions (Kunik, V. and Ofran, Y., Protein Eng Des Sel 2013 (herein incorporated by reference in its entirety)). As shown above, CDRH2 is responsible for the largest share of salt-bridges (39.66%). CDRH3 is the main source of H-bonds (30.14%), and all heavy chain CDRs take similar shares of the cation-pi interactions (20.57%, 22.7% and 26.24% of cation-pi interactions from CDRH3, CDRH1 and CDRH2, respectively). This differentiation and specialization is lost in synthetic Abs. For the Abs that emerge from synthetic libraries, CDRH3 takes the central role in all analyzed interactions. CDRH2 has a share equal to that of CDRH3 only in cation-pi contacts.
C.5 The Focus of Synthetic Abs on CDRH3 Creates Interfaces that are Less Complex and More Modular.
We evaluate the complexity of the Ab-Ag interaction using two parameters: independent epitope residues and integrated epitope residues. These parameters reflect the extent to which the six CDRs create an integral interface. An epitope residue on the Ag is considered an “independent residue” if it contacts only one CDR. An epitope residue that contacts three or more different CDRs is considered an “integrated residue”. To assess the complexity of the Ab-Ag interaction, the percentages of integrated and independent residues out of all residues that contact the paratope are calculated (note, however, that the raw output of the “CDRs Analyzer” provides these calculations as a percentage of the residues that contact a given CDR and not as a percentage of the residues that contact the entire paratope; see Methods). On average, 57.49% of the epitope residues of natural Abs are independent (that is, they contact only one CDR), whereas the epitopes of synthetic Abs are composed of 63.09% independent residues (
C.6 Demonstrating the Differences Between Synthetic and Natural Abs
Synthetic libraries are clearly successful in yielding specific binders that often become successful drug leads. Here, we ask to what extent the products of these libraries mimic natural Abs. One may argue that, as long as the leads are successful, there is no need for the libraries to mimic natural Abs. However, our analysis can be important in two ways: first, as a basic science endeavor, it helps reveal the principles that guide natural Ab-Ag interaction. Second, revealing these principles suggests new avenues that may make synthetic libraries even more potent. While the dataset of synthetic Abs is smaller than that of the natural Abs, it represents a diverse collection of synthetic Abs isolated from a variety of generic (e.g., HuCAL (Knappik, A. et al., J Mol Biol 296:57-86 (2000)) or Lee et al. (Lee, C. V. et al., J Mol Biol 340:1073-93 (2004)) (herein incorporated by reference in their entirety)) or custom-made libraries. The synthetic Abs in the dataset bind 30 different Ags, which vary in size from 51 to 915 residues. Where two Abs bind the same Ag, we verified that they recognize different epitopes. Thus the synthetic Abs dataset represents the current strategies for library design. Obviously, as more synthetic Abs become available this analysis should be repeated to refine the insights and establish their significance further.
Large-scale analysis of Ab-Ag complexes can help reveal the principles that allow Igs to accommodate an exquisitely matching paratope for virtually any surface, while strictly maintaining their overall fold (Novotny, J. et al., Proc Natl Acad Sci USA 83:226-30 (1986); Sela-Culang, I. et al., Front Immunol 4:302 (2013); Sela-Culang, I. et al., Curr Opin Virol 11:98-102 (herein incorporated by reference in their entirety)). The great challenge of Ab design is to make synthetic libraries that will yield Abs against a wide range of targets and epitopes. Indeed, in vivo Ab development relies on a more complex process, and hence may yield Abs with improved properties. This complex process includes gene rearrangement, somatic hypermutation and clonal selection, both through positive selection for Ag recognition and negative selection for self-binding. We aimed to identify the differences between the Ag binding mechanisms of synthetic Abs and natural Abs, which may help improve library design to yield more natural-like Abs. It also allowed us to revisit common assumptions about the role of CDRH3 in Ag recognition.
Obviously, some individual natural Abs and some individual synthetic Abs may be exceptions to the rule. Yet, our results reveal consistent differences between natural and synthetic Abs. The focus of synthetic libraries on engineering CDRH3 creates CDRH3 loops that participate in Ag recognition above the average of CDRH3 in natural Abs. As a result, CDRs H1 and H2 of synthetic Abs contribute less to Ag binding. CDRH3 loops encompass the V-D-J junction, hence this region displays the largest diversity among the six CDRs of the Abs in terms of length, sequence, and structure (Chothia, C. et al., Nature 342:877-83 (1989); Kuroda, D. et al., Proteins 73:608-20 (2008); Morea, V. et al., J Mol Biol 275:269-94 (1998) (herein incorporated by reference in their entirety)). CDRH3 is also located at the center of the binding site and is the CDR loop that undergoes the most significant conformational changes upon binding (Sela-Culang, I. et al., J Immunol 189:4890-9 (2012) (herein incorporated by reference in its entirety)). Thus, it is commonly assumed that CDRH3 accounts for the ability of Abs to recognize and bind specific epitopes. Understandably, Ab engineering methods often focus on CDRH3. For example, Fellouse et al. designed phage display libraries with diversity of 104 to 1022 in CDRH3 and diversity of 32 to 896 in other CDRs (Fellouse, F. A. et al., J Mol Biol 373:924-40 (2007) (herein incorporated by reference in its entirety)). In the initial HuCAL libraries (Knappik, A. et al., J Mol Biol 296:57-86 (2000) (herein incorporated by reference in its entirety)), diversity beyond the 49 chosen frameworks was introduced only to CDRH3 and CDRL3. In other studies, specific Abs were obtained from libraries with diversity introduced only to CDRH3 (Mahon, C. M. et al., J Mol Biol 425:1712-30 (2013); Braunagel, M. and Little, M., Nucleic Acids Res 25:4690-1 (1997); der Maur, A. A. et al., J Biol Chem 277:45075-85 (2002) (herein incorporated by reference in their entirety)).
However, the relative importance of CDRH3 compared to other CDRs has been recently revisited in numerous studies. Large scale analyses (Kunik, V. and Ofran, Y., Protein Eng Des Sel 2013; Robin, G. et al, J Mol Biol 426:3729-43(2014) (herein incorporated by reference in their entirety)) of Abs have assessed the role of CDRH3. It has been demonstrated that CDRH2 may be as important as CDRH3(Robin, G. et al. J Mol Biol 426:3729-43(2014(herein incorporated by reference in its entirety))) in its contribution to the binding free energy of the Ab-Ag complex. In addition, in 93% of the Ab-Ag complexes, CDRH2 contained at least one residue with high energetic contribution (ΔΔG>0.8 kcal/mol) in comparison to 90% of the complexes with such residues from CDRH3. In another study, CDRH3 was found to be responsible for 30.6% of the energetically important Ag-binding residues.(Kunik, V. and Ofran, Y., Protein Eng Des Sel 2013. (herein incorporated by reference in its entirety)) That is, most of the energetically important Ag-binding residues come from other CDRs. This has been shown also for specific examples like the interaction between HyHEL-10 and lysozyme, in which CDRH2 and CDRL1 display a dominant role, while CDRH3 shows very low binding contribution.(Burkovitz, A. et al., J Immunol 190:2327-34 (2013) (herein incorporated by reference in its entirety)) The fact that CDRH3 is not necessary for the versatility of Abs was ultimately demonstrated by a study that has shown that synthetic libraries can yield specific Abs against different Ags with diverse CDRL3 and fixed CDRH3. (Persson, H. et al., J Mol Biol 425:803-11(2013) (herein incorporated by reference in its entirety)) In another study, the introduction of diversity into the sequence of anti ErbB2 Ab only at CDRH3 did not result in affinity enhanced variant, while beneficial mutants could be obtained by engineering one of the other contacting CDRs (CDRH1,H2,L1 or L3). (Hu, D. 
et al., PLoS One 10:e0129125 (2015) (herein incorporated by reference in its entirety)) This emphasizes that the importance of CDRH3 differs between Abs.
The reliance of synthetic Abs on CDRH3 may take a toll on the diversity of the epitopes that the library can bind, which may be referred to as the effective diversity of the library (as opposed to its actual diversity, represented by the number of unique sequences). Existing synthetic libraries tend to yield Abs with CDRH3 dominance. The typically fixed length and sequence of the other loops does not allow for paratopes with other binding topologies. It is therefore possible that, while the number of variants in the library may be higher than the number of variants in natural repertoires, these synthetic Abs represent only a small subset of the possible Abs that would be represented in a much smaller natural set of Ab sequences.
The effective diversity of a library is not the number of unique Ab sequences it has, but the number of different epitopes they can bind. This is defined by how many of the variants are expressed and fold into Abs with paratopes that are very different from each other. Our results suggest that tampering only with CDRH3 may not be a good way to obtain diverse paratopes. Based on the results presented here, one can propose approaches for improving Ab engineering. Building libraries that allow for higher diversity in all CDRs may result in Abs that have binding modes that are more similar to those of natural Abs, which might increase the effective diversity of the library and culminate in higher success rates. Of note is the degeneration of CDRH2 and CDRH1 in synthetic Abs, most remarkably in the percentage of salt-bridges coming from these CDRs and of H-bonds and cation-pi interactions from CDRH1. To correct for this and create better libraries, the amino acid composition in these CDRs should be adjusted to favor these types of interactions. This could be achieved by elevating the propensity of charged amino acids in CDRH2 and CDRH1 to produce more salt bridges, or by elevating the propensity of aromatic, positively charged or polar amino acids in CDRH1 to produce more cation-pi interactions and H-bonds. It is also possible that the frameworks that are commonly used in synthetic libraries are suitable for producing interactions that rely on CDRH3. Considering additional frameworks may, therefore, be beneficial.
A novel approach for the design of synthetic libraries is based on the diversity of the natural Ig repertoire (naïve, memory and plasma B-cells), which can be characterized using next generation sequencing (NGS) (Glanville, J. et al., Proc Natl Acad Sci USA 106:20216-21 (2009); Zhai, W. et al., J Mol Biol 412:55-71 (2011) (herein incorporated by reference in their entirety)).
Glanville et al. (Zhai, W. et al., J Mol Biol 412:55-71 (2011) (herein incorporated by reference in their entirety)) analyzed ~105 sequences of Ab variable fragments from 654 healthy human donors and, consistent with our finding, reported a substantial contribution to total diversity from somatically mutated residues in CDRs 1 and 2. Based on these results, a synthetic Ab library was constructed by introducing diversity at positions across the six CDRs, while the amino acid usage at each position was designed to mimic the usage in natural repertoires (Zhai, W. et al., J Mol Biol 412:55-71 (2011) (herein incorporated by reference in its entirety)). The 3D structures of the Ab-Ag complexes that were selected from these modern libraries are still not available. We expect that the relative binding contribution of the different CDRs in these synthetic Abs will better mimic the natural Ab binding mechanism than the synthetic Abs analyzed in the current study.
Although there are many available tools for the automated analysis of Abs sequences, (Kaas, Q. et al., Nucleic Acids Res 32:D208-10 (2004); Ehrenmann, F. et al., Nucleic Acids Res 38:D301-7 (2010); Kunik, V. et al., Nucleic Acids Res 40:W521-4 (2012); Abhinandan, K. R. et al., J Mol Biol 369:852-62 (2007); Ye, J. et al. Nucleic Acids Res 41:W34-40 (2013); Retter, I. et al Nucleic Acids Res 33:D671-4 (2005) (herein incorporated by reference in their entirety)) the development of tools for the structural analysis of Ab-Ag complexes is still in its infancy. Two existing tools that provide comprehensive structural analysis of Abs are ABangle, for calculating the orientation between the VH and the VL, (Dunbar, J. et al., Protein Eng Des Sel 26:611-20 (2013) (herein incorporated by reference in its entirety)) and the “AbAgDb dataset”, which contains interaction profiles of ˜500 Ab-Ag complexes in the PDB. (Kulkarni-Kale, U. et al., Methods Mol Biol 1184:149-64 (2014) (herein incorporated by reference in its entirety)). In the “AbAgDb”, the data is available only for the curated PDBs and most of the output is at the atoms or residues level and not at CDRs level, similarly to tools analyzing general protein-protein interactions. (Tina, K. G. et al., Nucleic Acids Res 35:W473-6 (2007); Laskowski, R. A. et al. Trends Biochem Sci 22:488-90 (1997)) (herein incorporated by reference in their entirety).
“CDRs Analyzer” is designed to assist Ab engineering protocols by providing quantitative assessment of the biophysical properties both at the loop level—by assessing the contribution of each CDR—and at the residue level by identifying specific positions of interest within interface. Here, we used “CDRs Analyzer” to explore the differences between natural and synthetic interactions. This tool can be used to analyze Abs against pathogenic Ags or human-self Ags, to explore the theory that V-genes are evolutionarily pre-configured to recognize common motifs in Ags from pathogenic source. “CDRs Analyzer” can also be applied to characterize other sets of immunological interactions. For example, it allows evaluation of the differences in binding properties of peptide-binding Abs and protein-binding Abs, or the differences between different families of Abs or even differences between Abs against different Ags. However, the most straightforward way to use “CDRs Analyzer” is for the analysis of individual Abs. It is applicable for experimentally solved Ab-Ag complexes as well as to computational models of such complexes. The output of “CDRs Analyzer” can assist different Ab engineering protocols. The contacting residues list and the specific interactions list can guide choosing specific positions for Ab affinity enhancement, decreasing aggregation or for deimmunization. The CDRs binding contribution may be an important consideration for CDR grafting, Ab humanization, design of two-in-one Abs and for identifying CDR-derived peptides. (Burkovitz, A. et al. J Immunol 190:2327-34 (2013) (herein incorporated by reference in its entirety)).
1 Fellouse F A, Li B, Compaan D M, Peden A A, Hymowitz S G, Sidhu S S. Molecular recognition by a binary code. J Mol Biol 2005; 348: 1153-62.
2 Lee C V, Liang W C, Dennis M S, Eigenbrot C, Sidhu S S, Fuh G. High-affinity human antibodies from phage-displayed synthetic Fab libraries with a single framework scaffold. J Mol Biol 2004; 340: 1073-93.
3 Liang W C, Dennis M S, Stawicki S, Chanthery Y, Pan Q, Chen Y, et al. Function blocking antibodies to neuropilin-1 generated from a designed human synthetic antibody phage library. J Mol Biol 2007; 366: 815-29.
4 Rothe C, Urlinger S, Löhning C, Prassler J, Stark Y, Jäger U, et al. The human combinatorial antibody library HuCAL GOLD combines diversification of all six CDRs according to the natural immune system with a novel display method for efficient selection of high-affinity antibodies. J Mol Biol 2008; 376: 1182-200.
5 Knappik A, Ge L, Honegger A, Pack P, Fischer M, Wellnhofer G, et al. Fully synthetic human combinatorial antibody libraries (HuCAL) based on modular consensus frameworks and CDRs randomized with trinucleotides. J Mol Biol 2000; 296: 57-86.
6 Fellouse F A, Esaki K, Birtalan S, Raptis D, Cancasci V J, Koide A, et al. High-throughput generation of synthetic antibodies from highly functional minimalist phage-displayed libraries. J Mol Biol 2007; 373: 924-40.
7 Hoet R M, Cohen E H, Kent R B, Rookey K, Schoonbroodt S, Hogan S, et al. Generation of high-affinity human antibodies by combining donor-derived and synthetic complementarity-determining-region diversity. Nat Biotechnol 2005; 23: 344-8.
8 Birtalan S, Fisher R D, Sidhu S S. The functional capacity of the natural amino acids for molecular recognition. Mol Biosyst 2010; 6: 1186-94.
9 Bostrom J, Yu S F, Kan D, Appleton B A, Lee C V, Billeci K, et al. Variants of the antibody herceptin that interact with HER2 and VEGF at the antigen binding site. Science 2009; 323: 1610-4.
10 Persson H, Ye W, Wernimont A, Adams J J, Koide A, Koide S, et al. CDR-H3 diversity is not required for antigen recognition by synthetic antibodies. J Mol Biol 2013; 425: 803-11.
11 Lee C V, Sidhu S S, Fuh G. Bivalent antibody phage display mimics natural immunoglobulin. J Immunol Methods 2004; 284: 119-32.
The antibody in this example binds to the human P2X4. Methods to re-epitope the antibody to introduce improved binding were developed. Strategies based on sequence, structural, and biological data were implemented to generate libraries that yielded improved Abs.
The first strategy for library design was based on sequence analyses of the antibody of this example in order to identify positions that play a key role in the native paratope as well as positions and specific variants that may contribute to a re-epitoped interface. Positions were selected for variation if they were in the CDRs, as defined by Paratome and/or Kabat, and if they were not conserved based on sequence alignments of homologs obtained by a BLAST search of the PDB database. A total of 50 positions spanning CDRs in both the H and L chains were selected. Each selected position was varied independently, using an NNK codon (where N denotes any of the four standard nucleotides and K denotes guanine or thymine), such that the library was made up of clones with single mutations. In addition, a library of clones with double mutations, one in the H chain and one in the L chain, was constructed and cloned into a phage display plasmid.
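The coverage of a degenerate codon such as NNK can be checked by enumerating it against the standard genetic code. The Python sketch below is illustrative; the function names are hypothetical, and the IUPAC entry for S (guanine or cytosine) is included only so that other degenerate codons can be expanded in the same way.

```python
from itertools import product

# IUPAC nucleotide codes needed for common degenerate codons
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "S": "CG"}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand(degenerate_codon):
    """List every concrete codon represented by a degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[base] for base in degenerate_codon))]


def summarize(degenerate_codon):
    """Return (number of codons, sorted encoded amino acids, stop codons)."""
    codons = expand(degenerate_codon)
    amino_acids = sorted({CODON_TABLE[c] for c in codons} - {"*"})
    stops = [c for c in codons if CODON_TABLE[c] == "*"]
    return len(codons), amino_acids, stops
```

For NNK this gives 32 codons that encode all 20 amino acids and include a single stop codon (TAG), which is why NNK is a common choice for saturating a position.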
Following three rounds of selection against P2X4 lipoparticles, as well as ‘null’ lipoparticles (i.e., lipoparticles that do not present the receptor), the libraries underwent deep sequencing to identify positions and variants that were variable or conserved under the different selection conditions.
Standard sequencing identified a variant with increased affinity towards P2X4, which contained two mutations (one in each chain). This variant was expressed as soluble scFv and as IgG and the binding affinity was measured using standard techniques.
A second strategy for library design was to select positions for variation based on a combination of sequence, structure, and biological data, which are predicted to form surface patches on the Ab. Variation at each of these patches, or clusters of residues, may yield insight into the native paratope, as well as specific variants that contribute to binding and/or are relevant for re-epitoping. As this strategy includes prediction of surface patches, a three-dimensional model of the antibody is required.
Alternatively, one of the P2X4 library designs (based on the P2X4 binder) is based on somatic hypermutation (SHM) data (Burkovitz, A. et al., FEBS J, v. 281, p. 306-19 (2014); Kunik, V. et al., Nucleic Acids Res, v. 40, p. W521-4 (2012a) (hereby incorporated by reference in their entirety)). SHM data are used to choose positions to vary, and the data describing the frequencies of the observed Ag-binding amino acids per CDR are used to choose the variation at each position. This design does not depend on a 3D model of the antibody, and can be useful for designing a general library that can be used for different targets. Any germline sequence or an antibody with known favorable experimental properties can be used.
Several models of the antibody of this example were generated. Modeling was performed with the Antibody Modeling Protocols in Discovery Studio and in MOE. One of the models underwent further refinement by energy minimization.
Positions for variation were selected if they met the following criteria: 1) a high probability of mutation from germline based on the data in Burkovitz et al. (frequency greater than 0.2); 2) defined as part of a CDR by Paratome; 3) >10% solvent accessible in the antibody model. As H3 is not fully represented in the data from Burkovitz et al., all positions in H3 were included. Residues that were predicted to be structurally important, for example those forming a salt-bridge within the antibody in the model or contributing to hydrophobic core packing, were excluded even if they have >10% solvent accessibility.
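The selection criteria above can be written as a simple filter. The Python sketch below is one possible reading, not the procedure actually used: the `Position` record and its field names are hypothetical, and it assumes that H3 positions are exempt only from the mutation-frequency criterion while remaining subject to the accessibility and structural-importance checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Position:
    name: str                      # e.g. "H52" (hypothetical label)
    cdr: Optional[str]             # CDR assigned by Paratome, or None if outside the CDRs
    shm_frequency: float           # probability of mutation from germline
    solvent_accessibility: float   # percent accessibility in the antibody model
    structurally_important: bool = False


def select_positions(positions, min_frequency=0.2, min_accessibility=10.0):
    """Return the names of positions that pass the three criteria."""
    chosen = []
    for p in positions:
        if p.cdr is None or p.structurally_important:
            continue  # criterion 2, plus the structural exclusion
        if p.solvent_accessibility <= min_accessibility:
            continue  # criterion 3
        # criterion 1; assumed here to be waived for H3 positions
        if p.cdr == "H3" or p.shm_frequency > min_frequency:
            chosen.append(p.name)
    return chosen
```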
Positions that met the above requirements were visually inspected in the models. Groups of 5 of these positions that had spatial proximity were selected for variation with an NNS codon at each position (S denotes Guanine or Cytosine). Five such libraries were constructed, each spanning a distinct cluster of residues, although with some overlap in positions between some of the libraries. The libraries were cloned into phage display system and underwent selection against P2X4 by employing an iterative process of depletion on HEK cells and panning on P2X4 overexpressing HEK cells.
Enriched clones were sequenced and individually tested for binding. Purified scFv-phage fusions of enriched clones were mixed with a negative control scFv-phage particle at a ratio of 1:1000 and underwent one round of panning on P2X4-expressing HEK cells or on negative control HEK cells. Phages were eluted from the cells and the ratio of the tested clone scFv-phage to the negative control scFv-phage was determined. The enrichment of the tested scFv-phage in the course of panning is proportional to binding. In this way a re-epitoped clone displaying improved binding was identified. The next steps will be to purify a soluble scFv and then IgG, determine affinity and test for biological activity.
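The enrichment readout described above reduces to comparing the clone-to-control ratio after panning with the 1:1000 input ratio. A minimal Python sketch follows; how the output ratio is measured is not stated in the text, so the eluted counts used here are hypothetical inputs.

```python
from fractions import Fraction

INPUT_RATIO = Fraction(1, 1000)  # tested clone : negative control, as mixed before panning


def enrichment_factor(clone_count, control_count, input_ratio=INPUT_RATIO):
    """Fold change of the clone:control ratio after one round of panning.

    `clone_count` and `control_count` are hypothetical counts of eluted
    tested-clone and negative-control phage (e.g. from titering or sequencing).
    """
    output_ratio = Fraction(clone_count, control_count)
    return output_ratio / input_ratio
```

Comparing the factor obtained on P2X4-expressing cells with the factor obtained on negative control cells indicates whether the enrichment is target dependent.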
It is to be appreciated that the Detailed Description section, and not the Summary and Abstract sections, is intended to be used to interpret the claims. The Summary and Abstract sections may set forth one or more but not all exemplary embodiments of the present invention as contemplated by the inventor(s), and thus, are not intended to limit the present invention and the appended claims in any way.
The present invention has been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.
The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present invention. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.
The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
The claims in the instant application are different than those of the parent application or other related applications. The Applicant therefore rescinds any disclaimer of claim scope made in the parent application or any predecessor application in relation to the instant application. The Examiner is therefore advised that any such previous disclaimer and the cited references that it was made to avoid, may need to be revisited. Further, the Examiner is also reminded that any disclaimer made in the instant application should not be read into or against the parent application.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2015/062768 | 11/25/2015 | WO | 00

Number | Date | Country
---|---|---
62085205 | Nov 2014 | US
62085210 | Nov 2014 | US