Much of basic research in the therapeutic space is directed to identifying and developing novel molecules with desirable properties, such as new peptide therapeutics or new peptide immunogens from which to develop new therapeutic antibodies. However, the standard molecular discovery paradigm relies on random sampling using stochastic processes to identify promising functional molecules. These molecule candidates are then taken through multiple rounds of evaluation and testing with the hope that they will have the desired activity, function, pharmacokinetics, and/or other needed characteristics for a certain use. This system, beginning with screening of a random group, often results in failure, with one or more needed characteristics not being met. Thus, what is needed are methods of developing engineered peptides that incorporate elements of computational, chemical, and biological design.
In some aspects, provided herein is an engineered peptide, wherein the engineered peptide has a molecular mass of between 1 kDa and 10 kDa, comprises up to 50 amino acids, and comprises: a combination of spatially-associated topological constraints, wherein one or more of the constraints is a reference target-derived constraint; and wherein between 10% and 98% of the amino acids of the engineered peptide meet the one or more reference target-derived constraints, wherein the amino acids that meet the one or more reference target-derived constraints have less than 8.0 Å backbone root-mean-square deviation (RMSD) structural homology with the reference target.
In some embodiments, the amino acids that meet the one or more reference target-derived constraints have between 10% and 90% sequence homology with the reference target. In some embodiments, they have a van der Waals surface area overlap with the reference target of between 30 Å² and 3000 Å². In certain embodiments, the combination comprises at least two, or at least five reference target-derived constraints. In some embodiments, the combination of constraints comprises one or more constraints not derived from a reference target. In some embodiments, the one or more non-reference target-derived constraints describes a desired structural, dynamical, chemical, or functional characteristic, or any combinations thereof. In still further embodiments, one or more constraints is independently associated with a biological response or biological function. In some embodiments, at least a portion of the atoms in the engineered peptide associated with a biological response or biological function are topologically constrained to a secondary structural element in the reference target, such as a beta-sheet, or an alpha helix.
In other aspects, provided herein is a method of selecting an engineered peptide, comprising:
identifying one or more topological characteristics of a reference target;
designing spatially-associated constraints for each topological characteristic to produce a combination of spatially-associated topological constraints derived from the reference target;
comparing spatially-associated topological characteristics of candidate peptides with the combination of spatially-associated topological constraints derived from the reference target; and selecting a candidate peptide with spatially-associated topological characteristics that overlap with the combination of spatially-associated topological constraints derived from the reference target to produce the engineered peptide.
In some embodiments, the overlap between each characteristic is independently less than or equal to 75% Mean Percentage Error (MPE) as determined by one or more of Total Topological Constraint Distance (TCD), topological clustering coefficient (TCC), Euclidean distance, power distance, Soergel distance, Canberra distance, Sorensen distance, Jaccard distance, Mahalanobis distance, Hamming distance, Quantitative Estimate of Likeness (QEL), or Chain Topology Parameter (CTP). In certain embodiments, one or more constraints is derived from per-residue energy, per-residue interaction, per-residue fluctuation, per-residue atomic distance, per-residue chemical descriptor, per-residue solvent exposure, per-residue amino acid sequence similarity, per-residue bioinformatic descriptor, per-residue non-covalent bonding propensity, per-residue phi/psi angles, per-residue van der Waals radii, per-residue secondary structure propensity, per-residue amino acid adjacency, or per-residue amino acid contact. In some embodiments, the characteristics of one or more candidate peptides are determined by computer simulation. In still further embodiments, one or more constraints is independently associated with a biological response or biological function. In some embodiments, at least a portion of the atoms in the engineered peptide associated with a biological response or biological function are topologically constrained to a secondary structural element in the reference target, such as a beta-sheet, or an alpha helix.
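By way of non-limiting illustration, several of the distance measures named above can be sketched on per-residue descriptor vectors. The sketch assumes each peptide has already been reduced to an equal-length list of per-residue values; the function names are hypothetical, and the Mean Percentage Error shown uses the common signed definition (mean of the percentage difference relative to the reference), which is an assumption since the formula is not spelled out in this description.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canberra(a, b):
    """Canberra distance; positions where both entries are zero contribute nothing."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total

def soergel(a, b):
    """Soergel distance, defined here for non-negative descriptors."""
    denom = sum(max(x, y) for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b)) / denom if denom else 0.0

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def mean_percentage_error(reference, candidate):
    """Signed MPE in percent, relative to the reference values (assumed definition)."""
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero for MPE")
    terms = [(r - c) / r for r, c in zip(reference, candidate)]
    return 100.0 * sum(terms) / len(terms)

# Hypothetical per-residue descriptor values for a reference target and a candidate.
reference = [1.0, 2.0, 4.0]
candidate = [1.0, 1.0, 2.0]
mpe = mean_percentage_error(reference, candidate)
meets_threshold = abs(mpe) <= 75.0  # the 75% threshold mentioned above
```

Which measure is appropriate depends on the characteristic being compared; for example, Hamming distance suits categorical per-residue labels such as secondary structure codes, while the others suit numeric descriptors.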
In still further aspects, provided herein is a composition comprising two or more selection steering polypeptides, wherein each polypeptide is independently a positive selection molecule comprising one or more positive steering characteristics, or a negative selection molecule comprising one or more negative steering characteristics, wherein each characteristic type is independently selected from the group consisting of: amino acid sequence, polypeptide secondary structure, molecular dynamics, chemical features, biological function, immunogenicity, reference target(s) multi-specificity, cross-species reference target reactivity, selectivity of desired reference target(s) over undesired reference target(s), selectivity of reference target(s) within a sequence and/or structurally homologous family, selectivity of reference target(s) with similar protein function, selectivity of distinct desired reference target(s) from a larger family of undesired targets with high sequence and/or structural homology, selectivity for distinct reference target alleles or mutations, selectivity for distinct reference target residue level chemical modifications, selectivity for cell type, selectivity for tissue type, selectivity for tissue environment, tolerance to reference target(s) structural diversity, tolerance to reference target(s) sequence diversity, and tolerance to reference target(s) dynamics diversity; and wherein at least one of the two or more polypeptides is an engineered peptide as described herein.
In some embodiments, at least one of the two or more polypeptides is a positive selection molecule, and at least one of the two or more polypeptides is a negative selection molecule. In some embodiments, at least one of the two or more polypeptides is a native protein. In certain embodiments, the composition comprises at least one pair of counterpart positive and negative selection molecules that share at least one characteristic type, wherein the positive selection molecule comprises the positive characteristic and the negative selection molecule comprises the negative characteristic.
In yet additional aspects, provided herein is a method of screening a library of binding molecules with a composition comprising two or more selection steering molecules as described herein, the method comprising subjecting a pool of candidate binding molecules to at least one round of selection, wherein each round of selection comprises:
In some embodiments, the library of binding molecules is a phage library, or a cell library, such as a B-cell library or a T-cell library. In some embodiments, the method comprises two or more rounds of selection, or three or more rounds of selection. In certain embodiments, each round comprises a different set of selection molecules. In some embodiments, at least two rounds comprise the same negative selection molecule, or the same positive selection molecule, or both. In some embodiments, the method comprises analyzing the subset of the pool obtained from a round of selection prior to proceeding to the next round of selection.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. The present application can be understood by reference to the following description taken in conjunction with the accompanying figures.
Provided herein are methods of selecting meso-scale engineered peptides, and compositions comprising and methods of using said engineered peptides. For example, provided herein are methods of using engineered peptides in in vitro selection of antibodies.
The engineered peptides of the present disclosure are between 1 kDa and 10 kDa, referred to herein as “meso-scale”. Engineered peptides of this size may, in some embodiments, have certain advantages, such as protein-like functionality, a large theoretical space from which to select candidates, cell permeability, and/or structural and dynamical variability.
The methods provided herein comprise identifying a plurality of spatially-associated topological constraints, some of which may be derived from a reference target, constructing a combination of said constraints, comparing candidate peptides with said combination, and selecting a candidate that has constraints which overlap with the combination. By using spatially-associated topological constraints, different aspects of an engineered peptide can be included in the combination depending on the intended use, or desired function, or another desired characteristic. Further, not all constraints must, in some embodiments, be derived from a reference target. Through such methods, in some embodiments the selected engineered peptides are not simply variations of a reference target (such as might be obtained through peptide mutagenesis or progressive modification of a single reference), but rather may have a different overall structure than the reference peptide, while still retaining desired functional characteristics and/or key substructures.
Further provided herein are methods of using said engineered peptides, which include methods of programmable in vitro selection using one or more engineered peptides. Such selection may be used, for example, in the identification of antibodies.
These methods and engineered peptides are described in greater detail below.
I. Methods of Selecting Engineered Peptides
In some aspects, provided herein are methods of selecting an engineered peptide, comprising:
identifying one or more topological characteristics of a reference target;
designing spatially-associated constraints for each topological characteristic to produce a combination of reference target-derived constraints;
comparing spatially-associated topological characteristics of candidate peptides with the combination derived from the reference target; and
selecting a candidate peptide with spatially-associated topological characteristics that overlap with the combination of constraints derived from the reference target.
In some embodiments, one or more additional spatially-associated topological constraints that are not derived from the reference target are included in the combination.
a. Spatially-Associated Topological Constraints
The engineered peptides described herein are selected based on how closely they match a combination of spatially-associated topological constraints. This combination may also be described using the mathematical concept of a “tensor”. In such a combination (or tensor), each constraint is independently described in three-dimensional space (i.e., is spatially associated), and the combination of these constraints in three-dimensional space provides, for example, a representational “map” of different desired characteristics and their desired level (if applicable) relative to location. This map is not, in some embodiments, based on a linear or otherwise pre-determined amino acid backbone, and therefore can allow for flexibility in the structures that could fulfill the desired combination, as described. For example, in some embodiments, the “map” includes a spatial area wherein the prescribed constraint limitations could be adequately met by two adjacent amino acids. In some embodiments, these amino acids could be directly bonded (e.g., two contiguous amino acids), while in other embodiments, the amino acids are not directly bonded to each other but could be brought together in space by the folding of the peptide (e.g., are not contiguous amino acids). The separate constraints themselves are also not necessarily based on structure, but could include, for example, chemical descriptors and/or functional descriptors. In some embodiments, constraints include structural descriptors, such as a desired secondary structure or amino acid residue. In certain embodiments, each constraint is independently selected.
For example,
In some embodiments, the combination of constraints comprises at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, between 3 to 12, between 3 to 10, between 3 to 8, between 3 to 6, or 3, or 4, or 5, or 6 independently selected spatially-associated topological constraints. One or more of the constraints is derived from a reference target. In some embodiments, each of the constraints is derived from a reference target. In other embodiments, at least one constraint is derived from a reference target, and the remaining constraints are not derived from the reference target. For example, in some embodiments, between 1 and 9 constraints, between 1 and 7 constraints, between 1 and 5 constraints, or between 1 and 3 constraints are derived from a reference target, and between 1 and 9 constraints, between 1 and 7 constraints, between 1 and 5 constraints, or between 1 and 3 constraints are not derived from the reference target.
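By way of non-limiting illustration, a combination of spatially-associated constraints can be represented as a list of records, each carrying a location in three-dimensional space, a constraint type, a target value, and a flag indicating whether the constraint is derived from the reference target. The class name, field names, and the spherical region (center plus radius) are assumptions made for this sketch and are not a representation prescribed by this description.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class SpatialConstraint:
    """One constraint located in 3D space (hypothetical representation)."""
    kind: str                            # e.g. "secondary_structure", "charge"
    center: Tuple[float, float, float]   # location in angstroms, in an assumed common frame
    radius: float                        # size of the region the constraint applies to
    target: object                       # desired value or label for this constraint
    from_reference: bool                 # True if derived from the reference target

def summarize(combination):
    """Count reference-derived and non-reference-derived constraints."""
    n_ref = sum(c.from_reference for c in combination)
    return {
        "total": len(combination),
        "reference_derived": n_ref,
        "non_reference": len(combination) - n_ref,
    }

def is_valid_combination(combination, minimum=3):
    """At least `minimum` constraints, of which at least one is reference-derived."""
    counts = summarize(combination)
    return counts["total"] >= minimum and counts["reference_derived"] >= 1

# Hypothetical three-constraint combination: two reference-derived, one not.
combination = [
    SpatialConstraint("secondary_structure", (0.0, 0.0, 0.0), 6.0, "alpha_helix", True),
    SpatialConstraint("charge", (4.0, 0.0, 0.0), 3.0, 1, True),
    SpatialConstraint("solubility", (10.0, 0.0, 0.0), 4.0, "polar", False),
]
```

Because each record carries its own location rather than a residue index, the same combination can in principle be satisfied by contiguous or non-contiguous residues.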
Once the combination of constraints has been constructed, a series of candidate peptides is compared to said combination to identify one or more new engineered peptides which meet the desired criteria. In some embodiments, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 125, at least 150, at least 175, at least 200, or at least 250 candidate peptides are compared to the combination to identify one or more new engineered peptides which meet the desired criteria. In some embodiments, more than 250 candidate peptides, more than 300 candidate peptides, more than 400 candidate peptides, more than 500 candidate peptides, more than 600 candidate peptides, or more than 750 candidate peptides are compared, for example. In some embodiments, topological characteristic simulations are used to evaluate the topological characteristic overlap, if any, of a candidate peptide compared to the combination of constraints. In some embodiments, one or more candidate peptides are also compared to the reference target, and overlap, if any, of candidate peptide topological characteristics with reference target topological characteristics is evaluated. In some embodiments, the engineered peptide is identified from a computational sample of more than 5, more than 10, more than 20, more than 30, more than 40, more than 50, more than 60, more than 70, more than 80, more than 90, or more than 100 distinct peptide and topological characteristic simulations and an engineered peptide is selected, wherein the selected engineered peptide has the highest topological characteristic overlap compared to the reference target, out of the total sampled population.
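By way of non-limiting illustration, selecting the candidate with the highest overlap from a sampled population can be sketched as follows. The overlap score used here (the fraction of constraints satisfied by at least one residue lying inside the constraint region) is a deliberately simple stand-in chosen for this sketch, and the dictionary keys and function names are hypothetical.

```python
import math

def constraint_met(constraint, residues):
    """True if some residue inside the constraint region has a value in range."""
    for res in residues:
        if math.dist(res["xyz"], constraint["center"]) <= constraint["radius"]:
            value = res.get(constraint["prop"])
            if value is not None and constraint["low"] <= value <= constraint["high"]:
                return True
    return False

def overlap_fraction(constraints, residues):
    """Fraction of constraints satisfied by a candidate (toy overlap score)."""
    if not constraints:
        return 0.0
    return sum(constraint_met(c, residues) for c in constraints) / len(constraints)

def select_best(constraints, candidates):
    """Score every candidate and return the name of the highest-scoring one."""
    scores = {name: overlap_fraction(constraints, res) for name, res in candidates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical combination of two constraints and two candidate peptides.
constraints = [
    {"center": (0.0, 0.0, 0.0), "radius": 2.0, "prop": "charge", "low": 1, "high": 1},
    {"center": (5.0, 0.0, 0.0), "radius": 2.0, "prop": "hydropathy", "low": 2.0, "high": 5.0},
]
candidates = {
    "A": [{"xyz": (0.5, 0.0, 0.0), "charge": 1}, {"xyz": (5.0, 1.0, 0.0), "hydropathy": 4.2}],
    "B": [{"xyz": (0.5, 0.0, 0.0), "charge": 1}, {"xyz": (9.0, 0.0, 0.0), "hydropathy": 4.2}],
}
best, scores = select_best(constraints, candidates)
```

In practice the residue coordinates and per-residue values would come from the simulations described herein rather than from hand-written dictionaries.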
The spatially-associated topological constraints used to construct the desired combination (e.g., the desired tensor) may each be independently selected from a wide group of possible characteristics. These may include, for example, constraints describing structural, dynamical, chemical, or functional characteristics, or any combinations thereof.
Structural constraints may include, for example, atomic distance, amino acid sequence similarity, solvent exposure, phi angle, psi angle, secondary structure, or amino acid contact, or any combinations thereof.
Dynamical constraints may include, for example, atomic fluctuation, atomic energy, van der Waals radii, amino acid adjacency, or non-covalent bonding propensity. Atomic energy may include, for example, pairwise attractive energy between two atoms, pairwise repulsive energy between two atoms, atom-level solvation energy, pairwise charged attraction energy between two atoms, pairwise hydrogen bonding attraction energy between two atoms, or non-covalent bonding energy, or any combinations thereof.
Chemical characteristics may include, for example, chemical descriptors. Such chemical descriptors may include, for example, hydrophobicity, polarity, atomic volume, atomic radius, net charge, log P, HPLC retention, van der Waals radii, charge patterns, or H-bonding patterns, or any combinations thereof.
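By way of non-limiting illustration, per-residue chemical descriptors such as hydrophobicity and charge pattern can be computed directly from a sequence. The sketch uses the Kyte-Doolittle hydropathy scale and a crude formal-charge assignment (Lys and Arg as +1, Asp and Glu as -1, all other residues as 0, termini ignored); both choices and all function names are assumptions made for illustration only.

```python
# Kyte-Doolittle hydropathy values for the 20 proteinogenic amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Crude formal side-chain charges near neutral pH (assumption; His treated as neutral).
FORMAL_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def hydropathy_profile(sequence):
    """Per-residue hydropathy values along the sequence."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence]

def mean_hydropathy(sequence):
    """Average hydropathy over the whole sequence."""
    profile = hydropathy_profile(sequence)
    return sum(profile) / len(profile)

def charge_pattern(sequence):
    """Per-residue formal charges, e.g. [0, 1, -1, -1] for 'AKDE'."""
    return [FORMAL_CHARGE.get(aa, 0) for aa in sequence]

def net_charge(sequence):
    """Sum of formal side-chain charges."""
    return sum(charge_pattern(sequence))
```

Descriptors of this kind become spatially associated once each per-residue value is paired with the position of that residue in a structure.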
Functional characteristics may include, for example, bioinformatic descriptors, biological responses, or biological functions. Bioinformatic descriptors may include, for example, BLOSUM similarity, pKa, zScale, Cruciani Properties, Kidera Factors, VHSE-scale, ProtFP, MS-WHIM scores, T-scale, ST-scale, Transmembrane tendency, protein buried area, helix propensity, sheet propensity, coil propensity, turn propensity, immunogenic propensity, antibody epitope occurrence, and/or protein interface occurrence, or any combinations thereof.
In some embodiments, designing the constraints incorporates information about per-residue energy, per-residue interaction, per-residue fluctuation, per-residue atomic distance, per-residue chemical descriptor, per-residue solvent exposure, per-residue amino acid sequence similarity, per-residue bioinformatic descriptor, per-residue non-covalent bonding propensity, per-residue phi/psi angles, per-residue van der Waals radii, per-residue secondary structure propensity, per-residue amino acid adjacency, or per-residue amino acid contact. In some embodiments, these characteristics are used for a subset of the total residues in the reference target, or a subset of the total residues of the total combination of constraints, or a combination thereof. In some embodiments, one or more different characteristics are used for one or more different residues. That is, in some embodiments, one or more characteristics are used for a subset of residues, and at least one different characteristic is used for a different subset of residues. In some embodiments, one or more of said characteristics used to design one or more constraints is determined by computer simulation. Suitable computer simulation methods may include, for example, molecular dynamics simulations, Monte Carlo simulations, coarse-grained simulations, Gaussian network models, machine learning, or any combinations thereof.
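By way of non-limiting illustration, a per-residue fluctuation can be estimated from a set of simulated structures as the root-mean-square fluctuation (RMSF) of one representative atom per residue about its mean position. The sketch assumes the frames have already been superposed onto a common frame and that each frame lists one (x, y, z) coordinate per residue; the function name is hypothetical.

```python
import math

def per_residue_rmsf(frames):
    """RMSF per residue from superposed frames.

    frames: list of frames; each frame is a list of (x, y, z) tuples,
    one per residue (e.g. the alpha-carbon position).
    """
    n_frames = len(frames)
    n_residues = len(frames[0])
    rmsf = []
    for i in range(n_residues):
        # Mean position of residue i over all frames.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that mean position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        rmsf.append(math.sqrt(msd))
    return rmsf

# Two hypothetical frames: residue 0 is rigid, residue 1 oscillates along x.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
fluctuations = per_residue_rmsf(frames)
```

The same loop structure applies to other per-residue quantities averaged over a simulation, such as per-residue energy or solvent exposure.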
In some embodiments multiple constraints are selected from one category. For example, in some embodiments, the combination comprises two or more constraints that are independently a type of biological response. In some embodiments, two or more constraints are independently a type of secondary structure. In certain embodiments, two or more constraints are independently a type of chemical descriptor. In other embodiments, the combination comprises no overlapping categories of constraints.
In some embodiments, one or more constraints is independently associated with a biological response or biological function. In some embodiments, said constraint is a spatially defined atom(s)-level constraint, or spatially defined shape/area/volume-level constraint (such as a characteristic shape/area/volume that can be satisfied by several different atomic compositions), or a spatially defined dynamic-level constraint (such as a characteristic dynamic or set of dynamics that can be satisfied by several different atomic compositions).
In some embodiments, one or more constraints is derived from a protein structure or peptide structure associated with a biological function or biological response. For example, in some embodiments, one or more constraints is derived from an extracellular domain, such as a G protein-coupled receptor (GPCR) extracellular domain, or an ion channel extracellular domain. In some embodiments, one or more constraints is derived from a protein-protein interface junction. In some embodiments, one or more constraints is derived from a protein-peptide interface junction, such as MHC-peptide or GPCR-peptide interfaces. In certain embodiments, the atoms or amino acids constrained to such a protein or peptide structure are atoms or amino acids associated with a biological function or biological response. In some embodiments, the atoms or amino acids in the engineered peptide constrained to such a protein or peptide structure are atoms or amino acids derived from a reference target. In some embodiments, one or more constraints is derived from a polymorphic region of a reference target (e.g., a region subject to allelic variation between individuals).
In some embodiments, the biological response or biological function is selected from the group consisting of gene expression, metabolic activity, protein expression, cell proliferation, cell death, cytokine secretion, kinase activity, epigenetic modification, cell killing activity, inflammatory signals, chemotaxis, tissue infiltration, immune cell lineage commitment, tissue microenvironment modification, immune synapse formation, IL-2 secretion, IL-10 secretion, growth factor secretion, interferon gamma secretion, transforming growth factor beta secretion, immunoreceptor tyrosine-based activation motif activity, immunoreceptor tyrosine-based inhibition motif activity, antibody directed cell cytotoxicity, complement directed cytotoxicity, biological pathway agonism, biological pathway antagonism, biological pathway redirection, kinase cascade modification, proteolytic pathway modification, proteostasis pathway modification, protein folding/pathways, post-translational modification pathways, metabolic pathways, gene transcription/translation, mRNA degradation pathways, gene methylation/acetylation pathways, histone modification pathways, epigenetic pathways, immune directed clearance, opsonization, hormone signaling, integrin pathways, membrane protein signal transduction, ion channel flux, and G protein-coupled receptor response.
In some embodiments, the one or more atoms associated with a biological function or biological response are selected from the group consisting of carbon, oxygen, nitrogen, hydrogen, sulfur, phosphorus, sodium, potassium, zinc, manganese, magnesium, copper, iron, molybdenum, and nickel. In certain embodiments, the atoms are selected from the group consisting of oxygen, nitrogen, sulfur, and hydrogen.
In some embodiments, wherein one of the constraints is one or more amino acids associated with a biological function or biological response, and/or the engineered peptide comprises one or more amino acids associated with a biological function or biological response, the one or more amino acids are independently selected from the group consisting of the 20 proteinogenic naturally occurring amino acids, non-proteinogenic naturally occurring amino acids, and non-natural amino acids. In some embodiments, the non-natural amino acids are chemically synthesized. In certain embodiments, the one or more amino acids are selected from the 20 proteinogenic naturally occurring amino acids. In other embodiments, the one or more amino acids are selected from the non-proteinogenic naturally occurring amino acids. In still further embodiments, the one or more amino acids are selected from non-natural amino acids. In still further embodiments, the one or more amino acids are selected from a combination of 20 proteinogenic naturally occurring amino acids, non-proteinogenic naturally occurring amino acids, and non-natural amino acids.
While the combination of constraints used to select an engineered peptide as described herein comprises at least one constraint derived from a reference target, in some embodiments one or more constraints of the combination are not derived from a reference target. Thus, in certain embodiments, the selected engineered peptide comprises one or more characteristics that are not shared with the reference target.
In some embodiments, one or more constraints derived from the reference target and used in the combination describes the inverse of the characteristic as observed in the reference target. Thus, for example, a reference target may have a certain pattern of positive charge, a constraint related to charge is derived from said reference target, and the derived constraint describes a similar pattern but of neutral charge, or of negative charge. Thus, in some embodiments one or more inverse constraints are derived from the reference target and included in the combination. Such inverse constraints may be useful, for example, in selecting engineered peptides as control molecules for certain assays or panning methods, or as negative selection molecules in the programmable in vitro selection methods described herein.
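By way of non-limiting illustration, deriving an inverse constraint from a reference charge pattern can be sketched as follows. The pattern is assumed to be a list of per-position formal charges (+1, 0, or -1); the function name and the two modes are hypothetical labels for the negative-charge and neutral-charge variants described above.

```python
def invert_charge_pattern(pattern, mode="negate"):
    """Derive an inverse charge constraint from a reference pattern.

    mode="negate":  same spatial pattern with opposite sign at each position.
    mode="neutral": same positions, all set to neutral.
    """
    if mode == "negate":
        return [-q for q in pattern]
    if mode == "neutral":
        return [0 for _ in pattern]
    raise ValueError(f"unknown mode: {mode!r}")

# Hypothetical reference pattern with a cluster of positive charge.
reference_pattern = [1, 0, 1, 1, 0]
negative_control = invert_charge_pattern(reference_pattern)             # [-1, 0, -1, -1, 0]
neutral_control = invert_charge_pattern(reference_pattern, "neutral")   # [0, 0, 0, 0, 0]
```

A peptide selected against the inverted pattern could then serve as the kind of control or negative selection molecule described above.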
In some embodiments, the combination of spatially-defined topological constraints comprises one or more non-reference derived topological constraints. In some embodiments, the one or more non-reference derived topological constraints enforces or stabilizes one or more secondary structural elements, enforces atomic fluctuations, alters peptide total hydrophobicity, alters peptide solubility, alters peptide total charge, enables detection in a labeled or label-free assay, enables detection in an in vitro assay, enables detection in an in vivo assay, enables capture from a complex mixture, enables enzymatic processing, enables cell membrane permeability, enables binding to a secondary target, or alters immunogenicity. In certain embodiments, the one or more non-reference derived topological constraints constrains one or more atoms or amino acids in the combination of constraints (or subsequently selected peptide) that were derived from the reference target. For example, in some embodiments, the combination of constraints includes a secondary structure that was derived from the reference target, and the combination of constraints also comprises a constraint that stabilizes the secondary structural element (e.g., through additional hydrogen bonding, or hydrophobic interactions, or side chain stacking, or a salt bridge, or a disulfide bond), wherein the stabilizing constraint is not present in the reference target. In another example, in some embodiments the combination of constraints (or subsequently selected peptide) comprises one or more atoms or amino acids that were derived from the reference target, and the combination of constraints also includes a constraint that enforces atomic fluctuations in at least a portion of the atoms or amino acids derived from the reference target, wherein the constraint is not present in the reference target. In some embodiments, one or more non-reference derived constraints is an inverse constraint.
For example, in some embodiments, two combinations of constraints are constructed to select engineered peptides with inverse characteristics. In some such embodiments, a first combination of constraints will comprise one or more constraints derived from the reference target, and one or more constraints not derived from the reference target; and a second combination of constraints will comprise the same one or more constraints derived from the reference target, and the inverse of one or more of non-reference target constraints of the first combination.
b. Reference Target
Any suitable reference target may be used to derive one or more spatially-associated topological constraints for use in the methods provided herein. In some embodiments, the reference target is a full-length native protein. In other embodiments, the reference target is a portion of a full-length native protein. In still further embodiments, the reference target is a non-native protein, or portion thereof.
For example, in some embodiments the reference target is a cell-surface receptor, or a transmembrane protein, or a signaling protein, or a multiprotein complex, or a protein-peptide complex, or a portion thereof. In some embodiments, the reference target is a portion of a protein of interest, wherein the protein of interest is involved in a disease process in an organism, such as a human. In some embodiments, the protein of interest is involved in the growth or metastasis of cancer, or in an inflammatory disorder, and the reference target is a portion of said protein of interest that is a putative epitope. Thus, in some embodiments, the methods provided herein may be used to select one or more engineered peptides that may serve as an immunogen, and may be used to raise antibodies against a protein of interest. Examples of proteins that may be of interest include, for example, PD-1, PD-L1, CD25, IL2, MIF, CXCR4, or VEGF. Thus, in some embodiments, the reference target is PD-1, PD-L1, CD25, IL2, MIF, CXCR4, or VEGF, or a portion thereof, such as an epitope. In some embodiments, the methods provided herein may be used to select one or more engineered peptides that are immunogens, and which may be used to raise one or more antibodies that specifically bind to the protein from which the reference target is derived. In still further embodiments, the methods provided herein may be used to select one or more engineered peptides which in turn may be used to select one or more binding partners of a protein of interest, such as an antibody or a Fab-displaying phage.
c. Comparison of Constraints
In some embodiments, the one or more constraints (e.g., reference-derived or non-reference derived) are determined by molecular simulation (e.g., molecular dynamics), or laboratory measurement (e.g., NMR), or a combination thereof. Once the constraints have been derived and combined, engineered peptide candidates are, in some embodiments, generated using computational protein design (e.g., Rosetta). In some embodiments, other methods of sampling peptide space are used. Dynamics simulations may then be carried out on the candidate engineered peptides to obtain the parameters of constraints that have been selected. A covariance matrix of atomic fluctuations is generated for the reference target, covariance matrices are generated for the residues in each of the candidate engineered peptides, and these covariance matrices are compared to determine overlap. Principal component analysis is performed to compute the eigenvectors and eigenvalues for each covariance matrix (one covariance matrix for the reference target and one covariance matrix for each of the candidate engineered peptides), and those eigenvectors with the largest eigenvalues are retained.
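By way of non-limiting illustration, the covariance and principal component steps described above can be sketched with NumPy. The sketch assumes a trajectory array of shape (frames, atoms, 3) whose frames have already been superposed onto a common frame; the function names and the default of ten retained modes are choices made for this example.

```python
import numpy as np

def fluctuation_covariance(trajectory):
    """Covariance matrix of atomic fluctuations.

    trajectory: array of shape (n_frames, n_atoms, 3), assumed superposed.
    Returns a (3 * n_atoms, 3 * n_atoms) covariance matrix.
    """
    coords = trajectory.reshape(trajectory.shape[0], -1)
    centered = coords - coords.mean(axis=0)      # displacement from the mean structure
    return centered.T @ centered / (coords.shape[0] - 1)

def principal_modes(covariance, n_modes=10):
    """Eigenvalues and eigenvectors with the largest eigenvalues, in descending order."""
    values, vectors = np.linalg.eigh(covariance)  # eigh returns ascending eigenvalues
    order = np.argsort(values)[::-1][:n_modes]
    return values[order], vectors[:, order]       # eigenvectors are the columns

# Hypothetical 5-frame, 2-atom trajectory in which only atom 0 moves, along x.
trajectory = np.zeros((5, 2, 3))
trajectory[:, 0, 0] = [-2.0, -1.0, 0.0, 1.0, 2.0]
cov = fluctuation_covariance(trajectory)
eigenvalues, eigenvectors = principal_modes(cov, n_modes=3)
```

Each retained eigenvector has three components per atom, so it can be read as one 3D displacement vector per atom; comparing candidates with a reference target in this way presupposes that the compared atoms have been matched and both trajectories expressed in the same frame.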
The eigenvectors describe the most, second-most, third-most, N-most dominant motion observed in a set of simulated molecular structures. Without wishing to be bound by any theory, if a candidate engineered peptide moves like the reference target, its eigenvectors will be similar to the eigenvectors of the reference target. The similarity of eigenvectors corresponds to their components (a 3D vector centered on each CA atom) being aligned, pointing in the same direction. Exemplary eigenvector comparisons between a reference target and a candidate engineered peptide are shown in
In some embodiments, this similarity between candidate engineered peptide and reference target eigenvectors is computed using the inner product of two eigenvectors. The inner product value is 0 if two eigenvectors are 90 degrees to each other or 1 if the two eigenvectors point precisely in the same direction. Without wishing to be bound by theory, since the ordering of eigenvectors is based on their eigenvalues, and eigenvalues may not necessarily be the same between two different molecules due to the stochastic nature by which molecular dynamics (MD) simulations sample the underlying energy landscape of those different molecules, the inner product between multiple, differentially ranked eigenvectors is, in some embodiments, needed (e.g. eigenvector 1 of the engineered peptide by eigenvector 2, 3, 4, etc. of the reference target). In addition, molecular motions are complex and may involve more than one (or more than a few) dominant/principal modes of motion. Thus, in some embodiments, the inner product between all pairs of eigenvectors in a candidate engineered peptide and the reference target are computed. This results in a matrix of inner products the dimensions of which are determined by the number of eigenvectors analyzed. For example, for 10 eigenvectors, the matrix of inner products is 10 by 10. This matrix of inner products can be distilled into a single value by computing the root mean-square value of the 100 (if 10 by 10) inner products. This is the root mean square inner product (RMSIP). The equation for RMSIP is shown in
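By way of non-limiting illustration, the inner product matrix and its reduction to a single RMSIP value can be sketched as follows. Each eigenvector is assumed to be a unit-length list of numbers of equal dimension for both molecules; the function names are hypothetical. By default the sketch takes the root mean square over all entries of the matrix, as described above; the `per_mode` flag instead divides the summed squares by the number of eigenvectors, a widely used convention under which two identical sets of eigenvectors score exactly 1.

```python
import math

def inner_product_matrix(modes_a, modes_b):
    """Matrix of inner products between every pair of eigenvectors."""
    return [
        [sum(x * y for x, y in zip(u, v)) for v in modes_b]
        for u in modes_a
    ]

def rmsip(modes_a, modes_b, per_mode=False):
    """Root mean square inner product between two sets of eigenvectors.

    per_mode=False: RMS over all len(modes_a) * len(modes_b) inner products.
    per_mode=True:  sum of squares divided by len(modes_a) (common convention).
    """
    matrix = inner_product_matrix(modes_a, modes_b)
    total = sum(value * value for row in matrix for value in row)
    denominator = len(modes_a) if per_mode else len(modes_a) * len(modes_b)
    return math.sqrt(total / denominator)

# Hypothetical eigenvector sets: identical two-mode subspaces in 3 dimensions.
reference_modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
candidate_modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
score_all_entries = rmsip(reference_modes, candidate_modes)
score_per_mode = rmsip(reference_modes, candidate_modes, per_mode=True)
```

Because every pair of eigenvectors contributes, the score is insensitive to differences in eigenvalue ordering between the two molecules, which is the motivation given above for computing the full matrix.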
e. Additional Steps
In some embodiments, selection of one or more engineered peptides comprises one or more additional steps. For example, in some embodiments an engineered peptide candidate is selected based on similarity to the defined combination of spatially-associated topological constraints, as described herein, and then undergoes one or more analyses to determine one or more additional characteristics, and one or more structural adjustments to impart or enforce said desired characteristics. For example, in some embodiments, the selected candidate is analyzed, such as through molecular dynamics simulations, to determine overall stability of the molecule and/or propensity for a particular folded structure. In some embodiments, one or more modifications are made to the engineered peptide to impart or reinforce a desired level of stability, or a desired propensity for a desired folded structure. Such modifications may include, for example, the installation of one or more cross-links (such as a disulfide bond), salt bridges, hydrogen bonding interactions, or hydrophobic interactions, or any combinations thereof.
The methods provided herein may further comprise assaying one or more selected engineered peptides for one or more desired characteristics, such as desired binding interactions or activity. Any suitable assay may be used, as appropriate to measure the desired characteristic.
II. Selected Engineered Peptides
In other aspects, provided herein are engineered peptides, such as engineered peptides selected through the methods described herein. In some embodiments, the engineered peptide has a molecular mass between 1 kDa and 10 kDa, and comprises up to 50 amino acids. In certain embodiments, the engineered peptide has a molecular mass between 2 kDa and 10 kDa, between 3 kDa and 10 kDa, between 4 kDa and 10 kDa, between 5 kDa and 10 kDa, between 6 kDa and 10 kDa, between 7 kDa and 10 kDa, between 8 kDa and 10 kDa, between 9 kDa and 10 kDa, between 1 kDa and 9 kDa, between 1 kDa and 8 kDa, between 1 kDa and 7 kDa, between 1 kDa and 6 kDa, between 1 kDa and 5 kDa, between 1 kDa and 4 kDa, between 1 kDa and 3 kDa, or between 1 kDa and 2 kDa. In certain embodiments, the engineered peptide comprises up to 45 amino acids, up to 40 amino acids, up to 35 amino acids, up to 30 amino acids, up to 25 amino acids, up to 20 amino acids, at least 5 amino acids, at least 10 amino acids, at least 15 amino acids, at least 20 amino acids, at least 25 amino acids, at least 30 amino acids, at least 35 amino acids, or at least 40 amino acids.
In certain embodiments, the engineered peptide comprises a combination of spatially-associated topological constraints, wherein one or more of the constraints is a reference target-derived constraint. Any constraints described herein may be used in the combination, in some embodiments. In still further embodiments, between 10% to 98% of the amino acids of the engineered peptide meet the one or more reference target-derived constraints (e.g., if the engineered peptide comprises 50 amino acids, between 5 to 49 amino acids meet the one or more reference target-derived constraints). In some embodiments, between 20% to 98%, between 30% to 98%, between 40% to 98%, between 50% to 98%, between 60% to 98%, between 70% to 98%, between 80% to 98%, between 90% to 98%, between 10% to 90%, between 10% to 80%, between 10% to 70%, between 10% to 60%, between 10% to 50%, between 10% to 40%, between 10% to 30%, or between 10% to 20% of the amino acids of the engineered peptide meet the one or more reference target-derived constraints. In still further embodiments, the one or more amino acids that meet the one or more reference target-derived constraints have less than 8.0 Å, less than 7.5 Å, less than 7.0 Å, less than 6.5 Å, less than 6.0 Å, less than 5.5 Å, or less than 5.0 Å backbone root-mean-square deviation (RMSD) structural homology with the reference target. In some embodiments, the engineered peptide has a molecular mass of between 1 kDa and 10 kDa; comprises up to 50 amino acids; comprises a combination of spatially-associated topological constraints, wherein one or more of the constraints is a reference target-derived constraint; between 10% to 98% of the amino acids of the engineered peptide meet the one or more reference target-derived constraints; and the amino acids that meet the one or more reference target-derived constraints have less than 8.0 Å backbone root-mean-square deviation (RMSD) structural homology with the reference target.
In some embodiments, the amino acids of the engineered peptide that meet the one or more reference target-derived constraints have between 10% and 90% sequence homology, between 20% and 90% sequence homology, between 30% and 90% sequence homology, between 40% and 90% sequence homology, between 50% and 90% sequence homology, between 60% and 90% sequence homology, between 70% and 90% sequence homology, or between 80% and 90% sequence homology with the reference target. In some embodiments, the amino acids that meet the one or more reference target-derived constraints have a van der Waals surface area overlap with the reference of between 30 Å² to 3000 Å², or between 100 Å² to 3000 Å², or between 250 Å² to 3000 Å², or between 500 Å² to 3000 Å², or between 750 Å² to 3000 Å², or between 1000 Å² to 3000 Å², or between 1250 Å² to 3000 Å², or between 1500 Å² to 3000 Å², or between 1750 Å² to 3000 Å², or between 2000 Å² to 3000 Å², or between 2250 Å² to 3000 Å², or between 2500 Å² to 3000 Å², or between 2750 Å² to 3000 Å².
The combination of constraints that the engineered peptide meets may comprise two or more, three or more, four or more, five or more, six or more, or seven or more reference target-derived constraints. The combination may comprise one or more constraints not derived from the reference target, as described elsewhere in the present disclosure. These reference-derived constraints, and non-reference derived constraints if present, may independently be any of the constraints described herein, such as any of the structural, dynamical, chemical, or functional characteristics described herein, or any combinations thereof.
In some embodiments, the engineered peptide comprises at least one structural difference when compared to the reference target. Such structural differences may include, for example, a difference in the sequence, number of amino acid residues, total number of atoms, total hydrophilicity, total hydrophobicity, total positive charge, total negative charge, one or more secondary structures, shape factor, Zernike descriptors, van der Waals surface, structure graph nodes and edges, volumetric surface, electrostatic potential surface, hydrophobic potential surface, local diameter, local surface features, skeleton model, charge density, hydrophilic density, surface to volume ratio, amphiphilicity density, or surface roughness, or any combinations thereof. In some embodiments, the difference in one or more characteristics (such as one or more characteristics described herein) is at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 100%, or greater than 100% when compared to the characteristic in the reference target, as applicable to the type of characteristic. For example, in some embodiments the difference is the total number of atoms, and the engineered peptide has at least 10%, at least 20%, or at least 30% more atoms than the reference target, or at least 10%, at least 20%, or at least 30% fewer atoms than the reference target. In some embodiments, the difference is in total positive charge, and the total positive charge of the engineered peptide is at least 10%, at least 20%, at least 30%, at least 40%, or at least 50% larger (e.g., more positive) than the reference target, while in other embodiments the total positive charge of the engineered peptide is at least 10%, at least 20%, at least 30%, at least 40%, or at least 50% smaller (e.g., less positive) than the reference target.
In some embodiments, the combination of spatially-defined topological constraints includes one or more secondary structural elements not present in the reference target. Thus, in some embodiments, the engineered peptide comprises one or more secondary structural elements that are not present in the reference target. In some embodiments, the combination and/or engineered peptide comprises one secondary structural element, two secondary structural elements, three secondary structural elements, four secondary structural elements, or more than four secondary structural elements not found in the reference target. In some embodiments, each secondary structural element is independently selected from the group consisting of helices, sheets, loops, turns, and coils. In some embodiments, each secondary structural element not present in the reference target is independently an α-helix, β-bridge, β-strand, 3₁₀ helix, π-helix, turn, loop, or coil.
In some embodiments, the engineered peptide comprises one or more atoms, or one or more amino acids, or a combination thereof, that is associated with a biological response or a biological function. In some embodiments, the biological response or biological function is selected from the group consisting of gene expression, metabolic activity, protein expression, cell proliferation, cell death, cytokine secretion, kinase activity, epigenetic modification, cell killing activity, inflammatory signals, chemotaxis, tissue infiltration, immune cell lineage commitment, tissue microenvironment modification, immune synapse formation, IL-2 secretion, IL-10 secretion, growth factor secretion, interferon gamma secretion, transforming growth factor beta secretion, immunoreceptor tyrosine-based activation motif activity, immunoreceptor tyrosine-based inhibition motif activity, antibody directed cell cytotoxicity, complement directed cytotoxicity, biological pathway agonism, biological pathway antagonism, biological pathway redirection, kinase cascade modification, proteolytic pathway modification, proteostasis pathway modification, protein folding/pathways, post-translational modification pathways, metabolic pathways, gene transcription/translation, mRNA degradation pathways, gene methylation/acetylation pathways, histone modification pathways, epigenetic pathways, immune directed clearance, opsonization, hormone signaling, integrin pathways, membrane protein signal transduction, ion channel flux, and g-protein coupled receptor response.
In certain embodiments, the reference target comprises one or more atoms associated with a biological response or a biological function (such as one described herein); the engineered peptide comprises one or more atoms associated with a biological response or a biological function (such as one described herein); and the atomic fluctuations of said atoms in the engineered peptide overlap with the atomic fluctuations of said atoms in the reference target. Thus, for example, in some embodiments the atoms themselves are different atoms, but their atomic fluctuations overlap. In other embodiments, the atoms are the same atoms, and their atomic fluctuations overlap. In still further embodiments, the atoms are independently the same or different. In some embodiments, the overlap is a root mean square inner product (RMSIP) greater than 0.25. In some embodiments, the overlap is a RMSIP greater than 0.3, greater than 0.35, greater than 0.4, greater than 0.45, greater than 0.5, greater than 0.55, greater than 0.6, greater than 0.65, greater than 0.7, greater than 0.75, greater than 0.8, greater than 0.85, greater than 0.9, or greater than 0.95. In certain embodiments, the RMSIP is calculated by:
where n is the eigenvector of the engineered peptide topological constraints, and v is the eigenvector of the reference target topological constraints.
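By way of non-limiting illustration, the RMSIP computation described herein may be sketched in Python as follows. The function name `rmsip`, the `per_pair` flag, and the toy eigenvectors are hypothetical and are not part of the present disclosure. The sketch assumes unit-length eigenvectors supplied as flat lists of components. Because the equation itself is not reproduced here, both common normalizations are offered: dividing the summed squared inner products by the number of eigenvectors, or by the number of eigenvector pairs.

```python
import math


def rmsip(peptide_modes, reference_modes, per_pair=False):
    """Root mean square inner product between two sets of eigenvectors.

    Each mode is a flat sequence of components (e.g., 3 per CA atom) and is
    assumed to be unit length. With per_pair=False the summed squared inner
    products are divided by the number of peptide modes; with per_pair=True
    they are divided by the number of pairs (a literal root mean square over
    the full inner product matrix). Both conventions are assumptions.
    """
    total = 0.0
    for n in peptide_modes:
        for v in reference_modes:
            dot = sum(a * b for a, b in zip(n, v))
            total += dot * dot
    divisor = len(peptide_modes)
    if per_pair:
        divisor *= len(reference_modes)
    return math.sqrt(total / divisor)


# Toy example: two unit modes in a 3-component space.
same = rmsip([[1, 0, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 0]])
orthogonal = rmsip([[1, 0, 0]], [[0, 1, 0]])
```

Under the first normalization, two identical sets of orthonormal eigenvectors give a value of 1, and mutually orthogonal sets give 0, so that a threshold such as "greater than 0.25" can be applied directly to the returned value.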
In some embodiments, the engineered peptide comprises atoms or amino acids (or combination thereof) associated with a biological response or biological function, and at least a portion of said atoms or amino acids or combination is derived from a reference target, and certain constraints of the set of atoms or amino acids in the engineered peptide and the set in the reference target can be described by a matrix. In some embodiments, the matrix is an L×L matrix. In other embodiments, the matrix is an S×S×M matrix. In still further embodiments, the matrix is an L×2 phi/psi angle matrix.
For example in some embodiments, the atomic fluctuations of the atoms or amino acids in the engineered peptide that are associated with a biological response or biological function are described by an L×L matrix; a portion of said atoms or amino acids are derived from the reference target; and the atomic fluctuations in the reference target of said portion are described by an L×L matrix. In some embodiments, the adjacency of each set (related to amino acid location) is described by corresponding L×L matrices. In certain embodiments, the mean percentage error (MPE) across all matrix elements (i, j) of the engineered peptide L×L atomic fluctuation or adjacency matrix is less than or equal to 75% relative to the corresponding (i, j) elements in the reference target atomic fluctuation or adjacency matrix, for the fraction of the engineered peptide derived from the reference target. In some embodiments, the MPE is less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, less than 45%, or less than 40% relative to the corresponding elements in the reference target matrix, for the fraction of the engineered peptide derived from the reference target. In some embodiments, wherein the matrices represent atomic fluctuations, L is the number of amino acid positions and the (i, j) value in the atomic fluctuation matrix element is the sum of intra-molecular atomic fluctuations for the ith and jth amino acid respectively if the (i, j) atomic distance is less than or equal to 7 Å, or zero if the (i, j) atomic distance is greater than 7 Å or if (i, j) is on the diagonal. Alternatively, in some embodiments the atomic distance can serve as a weighting factor for the atomic fluctuation matrix element (i, j) instead of a 0 or 1 multiplier. In certain embodiments, the ith and jth atomic fluctuations and distances can be determined by molecular simulation (e.g. molecular dynamics) and/or laboratory measurement (e.g. NMR). 
In other embodiments, wherein the matrices represent adjacency, L is the number of amino acid positions and the value in adjacency matrix element (i, j) is the intra-molecular atomic distance between the ith and jth amino acid respectively if the atomic distance is less than or equal to 7 Å, or zero if the atomic distance is greater than 7 Å or if (i, j) is on the diagonal. Alternatively, in some embodiments the atomic distance can serve as a weighting factor for the adjacency matrix element (i, j) instead of a 0 or 1 multiplier. In certain embodiments, the ith and jth atomic distances could be determined by molecular simulation (e.g. molecular dynamics) and/or laboratory measurement (e.g. NMR).
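By way of non-limiting illustration, the L×L adjacency and atomic fluctuation matrices described above may be constructed as in the following Python sketch. The function names, the use of a single representative coordinate per amino acid position, and the toy values are assumptions made for illustration only. The 7 Å threshold and the zero diagonal follow the description above.

```python
import math

CUTOFF = 7.0  # interaction threshold in angstroms, as described above


def adjacency_matrix(coords, cutoff=CUTOFF):
    """Element (i, j) is the i-j distance if <= cutoff, else 0; diagonal is 0."""
    size = len(coords)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                continue
            distance = math.dist(coords[i], coords[j])
            if distance <= cutoff:
                matrix[i][j] = distance
    return matrix


def fluctuation_matrix(coords, fluctuations, cutoff=CUTOFF):
    """Element (i, j) is fluctuation[i] + fluctuation[j] if i and j are within
    the cutoff, else 0; diagonal is 0."""
    size = len(coords)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i != j and math.dist(coords[i], coords[j]) <= cutoff:
                matrix[i][j] = fluctuations[i] + fluctuations[j]
    return matrix


# Toy example: one representative coordinate per position (an assumption).
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
adjacency = adjacency_matrix(coords)
fluctuation = fluctuation_matrix(coords, [0.5, 1.0, 2.0])
```

In this toy example only positions 0 and 1 lie within 7 Å of each other, so only the (0, 1) and (1, 0) elements of each matrix are nonzero.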
In certain embodiments, the atoms or amino acids associated with a response or function in the engineered peptide have a topological constraint chemical descriptor vector and a mean percentage error (MPE) less than 75% relative to the reference described by the same chemical descriptor, for the fraction of the engineered peptide derived from the reference target, wherein each ith element in the chemical descriptor vector corresponds to an amino acid position index. In some embodiments, the MPE is less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, less than 45%, or less than 40% relative to the reference described by the same chemical descriptor, for the fraction of the engineered peptide derived from the reference target. An exemplary vector is presented in
In still further embodiments, the matrix is an L×2 phi/psi angle matrix, and the atoms or amino acids associated with a response or function in the engineered peptide have an MPE less than 75% with respect to the reference phi/psi angles matrix in the fraction of the engineered peptide derived from the reference target, wherein L is the number of amino acid positions and phi, psi values are in dimensions (L,1) and (L,2) respectively. In some embodiments, the MPE is less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, less than 45%, or less than 40% with respect to the reference phi/psi angles matrix in the fraction of the engineered peptide derived from the reference target. In some embodiments, the phi/psi values are determined by molecular simulation (e.g. molecular dynamics), knowledge-based structure prediction, or laboratory measurement (e.g. NMR). An exemplary L×2 phi/psi matrix is shown in
In some embodiments, the matrix is an S×S×M secondary structural element interaction matrix, and the atoms or amino acids associated with a response or function in the engineered peptide have less than 75% mean percentage error (MPE) relative to the reference secondary structural element relationship matrix, in the fraction of the engineered peptide derived from the reference target, where S is the number of secondary structural elements and M is the number of interaction descriptors. In some embodiments, the MPE is less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, less than 45%, or less than 40% relative to the reference secondary structural element relationship matrix, in the fraction of the engineered peptide derived from the reference target. Interaction descriptors may include, for example, hydrogen bonding, hydrophobic packing, van der Waals interaction, ionic interaction, covalent bridge, chirality, orientation, or distance, or any combinations thereof. In the secondary structural element interaction matrix index, (i, j, m)=mth interaction descriptor value between the ith and jth secondary structural elements. An exemplary S×S×M matrix is presented in
Mean Percentage Error (MPE) for different matrices as described herein may be calculated by:
where n is the topological constraint vector or matrix position index for the engineered peptide (eng_n) and the corresponding reference (ref_n), summed up to vector or matrix position n. An example of a topological matrix is provided in
In some embodiments, the engineered peptide has an MPE of less than 75% compared to the reference target. In certain embodiments, the engineered peptide has an MPE of less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, less than 45%, or less than 40% compared to the reference target. In some embodiments, the MPE is determined by Total Topological Constraint Distance (TCD), topological clustering coefficient (TCC), Euclidean distance, power distance, Soergel distance, Canberra distance, Sorensen distance, Jaccard distance, Mahalanobis distance, Hamming distance, Quantitative Estimate of Likeness (QEL), or Chain Topology Parameter (CTP).
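By way of non-limiting illustration, a mean percentage error between an engineered peptide constraint vector or matrix and the corresponding reference may be computed as in the following Python sketch. The sketch assumes the conventional definition of MPE, namely the mean of (eng − ref)/ref over all positions, expressed as a percentage. The function names, the `absolute` flag, and the choice to skip positions where the reference value is zero (as occurs in thresholded matrices) are assumptions made for illustration and are not taken from the present disclosure.

```python
def flatten(matrix):
    """Flatten a nested list (matrix) into a single list of elements."""
    return [value for row in matrix for value in row]


def mean_percentage_error(engineered, reference, absolute=False):
    """Mean of (eng - ref) / ref over all positions, as a percentage.

    Positions with a zero reference value are skipped (an assumption) to
    avoid division by zero. With absolute=True the magnitude of each term
    is averaged instead of the signed value.
    """
    if len(engineered) != len(reference):
        raise ValueError("inputs must have the same number of elements")
    terms = []
    for eng, ref in zip(engineered, reference):
        if ref == 0:
            continue
        error = (eng - ref) / ref
        terms.append(abs(error) if absolute else error)
    if not terms:
        raise ValueError("no nonzero reference elements to compare")
    return 100.0 * sum(terms) / len(terms)


# Toy 2x2 adjacency-style matrices (hypothetical values).
eng_matrix = [[0.0, 5.5], [5.5, 0.0]]
ref_matrix = [[0.0, 5.0], [5.0, 0.0]]
mpe = mean_percentage_error(flatten(eng_matrix), flatten(ref_matrix))
```

A signed mean allows positive and negative deviations to cancel, whereas the absolute form does not; which form is appropriate for a given threshold is a design choice left open here.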
a. Secondary Structural Element
In some embodiments, at least a portion of the engineered peptide is topologically constrained to one or more secondary structural elements. In some embodiments, the atoms or amino acids associated with a biological response or biological function in the engineered peptide are topologically constrained to one or more secondary structural elements. In some embodiments, the secondary structural element is independently a sheet, helix, turn, loop, or coil. In some embodiments, the secondary structural element is independently an α-helix, β-bridge, β-strand, 3₁₀ helix, π-helix, turn, loop, or coil. In certain embodiments, one or more of the secondary structural elements to which at least a portion of the engineered peptide is topologically constrained is present in the reference target. In some embodiments, at least a portion of the engineered peptide is topologically constrained to a combination of secondary structural elements, wherein each element is independently selected from the group consisting of sheet, helix, turn, loop, and coil. In still further embodiments, each element is independently selected from the group consisting of an α-helix, β-bridge, β-strand, 3₁₀ helix, π-helix, turn, loop, and coil.
In some embodiments, the secondary structural element is a parallel or anti-parallel sheet. In some embodiments, a sheet secondary structure comprises greater than or equal to 2 residues. In some embodiments, a sheet secondary structure comprises less than or equal to 50 residues. In still further embodiments, a sheet secondary structure comprises between 2 and 50 residues. In some embodiments, an anti-parallel sheet secondary structure may be described as having two strands i, j in an anti-parallel arrangement (N-termini of the i and j strands in opposing orientation), and a pattern of hydrogen bonding of residues i:j. In some embodiments, a parallel sheet secondary structure may be described as having two strands i, j in a parallel arrangement (N-termini of the i and j strands in the same orientation), and a pattern of hydrogen bonding of residues i:j−1, i:j+1. In certain embodiments, the orientation and hydrogen bonding of strands can be determined by knowledge-based or molecular dynamics simulation and/or laboratory measurement.
In some embodiments, the secondary structural element is a helix. Helices may be right or left handed. In some embodiments, the helix has a residue per turn (residues/turn) value of between 2.5 and 6.0, and a pitch between 3.0 Å and 9.0 Å. In some embodiments, the residues/turn and pitch are determined by knowledge-based or molecular dynamics simulation and/or laboratory measurement.
In some embodiments, the secondary structural element is a turn. In some embodiments, a turn comprises between 2 to 7 residues, and 1 or more inter-residue hydrogen bonds. In some embodiments, the turn comprises 2, 3, or 4 inter-residue hydrogen bonds. In certain embodiments, the turn is determined by knowledge-based or molecular dynamics simulation and/or laboratory measurement.
In still further embodiments, the secondary structural element is a coil. In certain embodiments, the coil comprises between 2 to 20 residues and zero predicted inter-residue hydrogen bonds. In some embodiments, these coil parameters are determined by knowledge-based or molecular dynamics simulation and/or laboratory measurement.
In still further embodiments, the engineered peptide comprises one or more atoms or amino acids derived from the reference target, wherein said atoms or amino acids have a secondary structure. In some embodiments, these atoms or amino acids are associated with a biological response or biological function. In some embodiments, the secondary structure motif vector of the atoms or amino acids in the engineered peptide has a cosine similarity greater than 0.25 relative to the reference target secondary structure motif vector for the fraction of the engineered peptide derived from the reference target, wherein the length of the vector is the number of secondary structure motifs and the value at the ith vector position defines the identity of the secondary structure motif (e.g. helix, sheet) derived from a lookup table. In some embodiments, each motif comprises two or more amino acids. In certain embodiments, motifs include, for example, α-helix, β-bridge, β-strand, 3₁₀ helix, π-helix, turn, and loop. In some embodiments, the cosine similarity is greater than 0.3, greater than 0.35, greater than 0.4, greater than 0.45, or greater than 0.5 relative to the reference target secondary structure motif vector for the fraction of the engineered peptide derived from the reference target. An exemplary secondary structure index and lookup table is provided in
wherein A is the peptide vector of secondary structure motif identifiers, B is the reference vector of secondary structure motif identifiers, n is the length of the secondary structure motif vector, and i is the ith secondary structure motif.
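By way of non-limiting illustration, the cosine similarity between a peptide motif vector A and a reference motif vector B, in its standard form (the sum of A_i·B_i divided by the product of the vector magnitudes), may be computed as in the following Python sketch. The lookup table `MOTIF_IDS` and its integer identifiers are hypothetical placeholders and do not reproduce the lookup table of the present disclosure.

```python
import math

# Hypothetical lookup table mapping motif names to integer identifiers.
MOTIF_IDS = {
    "alpha-helix": 1,
    "beta-bridge": 2,
    "beta-strand": 3,
    "3-10 helix": 4,
    "pi-helix": 5,
    "turn": 6,
    "loop": 7,
}


def cosine_similarity(a, b):
    """Standard cosine similarity: sum(A_i * B_i) / (|A| * |B|)."""
    if len(a) != len(b):
        raise ValueError("motif vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("motif vectors must be nonzero")
    return dot / (norm_a * norm_b)


# Toy motif vectors of length n = 3 (hypothetical).
peptide = [MOTIF_IDS[m] for m in ("alpha-helix", "turn", "beta-strand")]
reference = [MOTIF_IDS[m] for m in ("alpha-helix", "loop", "beta-strand")]
score = cosine_similarity(peptide, reference)
```

Because the result depends on the numerical identifiers assigned to each motif, the choice of lookup table affects the similarity value obtained.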
In some embodiments, one or more atoms or amino acids of the engineered peptide which are derived from the reference target can be compared to the corresponding reference target atoms or amino acids using a total topological constraint distance (TCD). In some embodiments, the total TCD of said engineered peptide atoms or amino acids derived from the reference target is +/−75% relative to the TCD distance of the corresponding atoms in the reference target, wherein two intra-molecule topological constraints are interacting if their pairwise distance is less than or equal to 7 Å. In some embodiments, the atoms or amino acids in the engineered peptide being compared are associated with a biological function or biological response. The ith, jth pairwise distance of two atoms or amino acids can, in some embodiments, be determined by molecular simulation (e.g. molecular dynamics) and/or laboratory measurement (e.g. NMR). An exemplary equation for calculating total topological constraint distance (TCD) is:
where i, j are the intra-molecular position indices for amino acids (i, j), Sij is the difference between constraints S(i) and S(j), Δ(i,j)=1 if amino acids (i, j) are within the 7 Å interaction threshold, and L is the number of amino acid positions in the peptide or the corresponding reference target. Alternatively, in some embodiments, Δ(i,j) can serve as a weighting factor for the Sij difference instead of a 0 or 1 multiplier.
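By way of non-limiting illustration, one possible reading of the total topological constraint distance is sketched below in Python. Because the equation itself is not reproduced here, the sketch assumes that TCD is the sum, over unordered pairs (i, j) of positions within the 7 Å threshold, of the absolute difference between S(i) and S(j). The function names, the use of one representative coordinate per position, and the toy values are hypothetical.

```python
import math


def total_constraint_distance(constraints, coords, cutoff=7.0):
    """Sum of |S(i) - S(j)| over pairs i < j with distance <= cutoff.

    This form (unordered pairs, absolute difference, 0/1 indicator) is an
    assumed reading of the definitions above.
    """
    size = len(constraints)
    total = 0.0
    for i in range(size):
        for j in range(i + 1, size):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total += abs(constraints[i] - constraints[j])
    return total


def within_tolerance(value, reference_value, tolerance=0.75):
    """True if value lies within +/- tolerance (fractional) of the reference."""
    return abs(value - reference_value) <= tolerance * abs(reference_value)


# Toy example: only positions 0 and 1 are within 7 angstroms.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
tcd = total_constraint_distance([1.0, 4.0, 9.0], coords)
```

The helper `within_tolerance` shows how a "+/−75%" comparison against a reference TCD might be applied.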
In some embodiments, one or more atoms or amino acids of the engineered peptide which are derived from the reference target can be compared to the corresponding reference target atoms or amino acids using a chain topology parameter (CTP). In some embodiments, the CTP of said engineered peptide atoms or amino acids is +/−50% relative to the CTP of the corresponding atoms or amino acids in the reference target, wherein intra-chain topological interaction is a pairwise distance less than or equal to 7 Å. In some embodiments, the atoms or amino acids in the engineered peptide being compared are associated with a biological function or biological response. In some embodiments, ith, jth pairwise distance can be determined by molecular simulation (e.g. molecular dynamics) and/or laboratory measurement (e.g. NMR). An exemplary equation for evaluating CTP is:
where i, j are the position indices for amino acids (i, j), Sij is the difference between topological constraints S(i) and S(j), Δ(i,j)=1 if amino acids (i, j) are within the 7 Å chain topological interaction threshold, L is the number of amino acid positions in the peptide or the corresponding reference target, and N is the total number of intra-chain contacts that meet the 7 Å topological interaction threshold in the engineered peptide or reference target. Alternatively, in some embodiments Δ(i,j) can serve as a weighting factor for the Sij difference instead of a 0 or 1 multiplier.
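By way of non-limiting illustration, one possible reading of the chain topology parameter is sketched below in Python. Because the equation itself is not reproduced here, the sketch assumes a form analogous to relative contact order: the sum of |S(i) − S(j)| over contacting pairs, divided by the product L·N. This normalization, the function name, and the toy values are assumptions made for illustration only.

```python
import math


def chain_topology_parameter(constraints, coords, cutoff=7.0):
    """(1 / (L * N)) * sum of |S(i) - S(j)| over pairs i < j in contact.

    L is the number of positions and N the number of contacting pairs.
    The normalization by L * N is an assumed reading of the definitions.
    """
    size = len(constraints)
    total = 0.0
    contacts = 0
    for i in range(size):
        for j in range(i + 1, size):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total += abs(constraints[i] - constraints[j])
                contacts += 1
    if contacts == 0:
        return 0.0  # no contacts: defined as zero here (an assumption)
    return total / (size * contacts)


# Toy example: pairs (0, 1) and (1, 2) are in contact; pair (0, 2) is not.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 8.0, 0.0)]
ctp = chain_topology_parameter([1.0, 4.0, 9.0], coords)
```

In this toy example the summed difference is 8, L is 3, and N is 2, so the returned value is 8/6.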
In some embodiments, one or more atoms or amino acids of the engineered peptide which are derived from the reference target can be compared to the corresponding reference target atoms or amino acids using a quantitative estimate of likeness (QEL). In some embodiments, the QEL of said engineered peptide atoms or amino acids is +/−50% relative to the QEL of the corresponding atoms or amino acids in the reference target. In some embodiments, the atoms or amino acids in the engineered peptide being compared are associated with a biological function or biological response. An exemplary equation for determining QEL is:
wherein d_i is a topological constraint for the ith amino acid or atom position, or a composition function (e.g. linear regression function) that combines multiple topological constraints for the ith amino acid or atom position, and n is the number of amino acid or atom positions in the peptide or the reference target.
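By way of non-limiting illustration, one possible reading of the quantitative estimate of likeness is sketched below in Python. Because the equation itself is not reproduced here, the sketch assumes a geometric-mean form, exp((1/n)·Σ ln d_i), by analogy with the quantitative estimate of drug-likeness used in medicinal chemistry. That analogy, the function name, and the requirement that every d_i be positive are assumptions and are not taken from the present disclosure.

```python
import math


def quantitative_estimate_of_likeness(d_values):
    """Geometric mean of the per-position values d_i: exp(mean(ln d_i)).

    Assumed form; requires every d_i to be strictly positive.
    """
    if not d_values:
        raise ValueError("at least one position is required")
    if any(d <= 0 for d in d_values):
        raise ValueError("all d_i must be positive for the logarithm")
    log_sum = sum(math.log(d) for d in d_values)
    return math.exp(log_sum / len(d_values))


# Toy example with n = 2 positions (hypothetical values).
qel = quantitative_estimate_of_likeness([2.0, 8.0])
```

A geometric mean penalizes a single very small d_i more strongly than an arithmetic mean would, which is one reason such a form might be preferred when every position is expected to contribute.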
In some embodiments, one or more atoms or amino acids of the engineered peptide which are derived from the reference target can be compared to the corresponding reference target atoms or amino acids using a topological clustering coefficient (TCC) vector and a mean percentage error (MPE). In some embodiments, the MPE of the TCC vector is less than 75% relative to the TCC vector of the corresponding atoms or amino acids in the reference target, wherein each element (i) of the vector is a topological clustering coefficient for the ith amino acid position, intra-molecule clusters are defined by an interacting edge distance less than or equal to 7 Å, and two edges: i−j, j−l from the ith amino acid position. In some embodiments, the atoms or amino acids in the engineered peptide being compared are associated with a biological function or biological response. In some embodiments, the ith, jth and lth edge distance can be determined by molecular simulation (e.g. molecular dynamics) and/or laboratory measurement (e.g. NMR). An exemplary equation for evaluating the topological clustering coefficient for the ith position is:
wherein Δ(i,j)=1, Δ(i,l)=1, Δ(j,l)=1 if intra-molecular amino acid positions: (i, j), (i, l), (j, l) are within the 7 Å interacting edge threshold respectively, Sijl is the combination (e.g. sum) of topological constraints for the ith, jth and lth amino acid, L is the number of amino acid positions in the peptide vector or corresponding reference target vector, Nc is the number of intra-molecular interacting amino acid positions for the ith amino acid, meeting the 7 Å edge threshold and two edges: i−j, j−l from the ith amino acid. Alternatively, in some embodiments, Δ(i,j), Δ(i,l) and Δ(j,l) can serve as weighting factors for the clustering coefficient vector element (i) instead of a 0 or 1 multiplier. An exemplary diagram showing clusters and TCC vector for an exemplary engineered peptide is provided in
In still further embodiments, one or more atoms or amino acids of the engineered peptide which are derived from the reference target can be compared to the corresponding reference target atoms or amino acids using an L×M topological constraint matrix and mean percentage error (MPE) of: Euclidean distance, power distance, Soergel distance, Canberra distance, Sorensen distance, Jaccard distance, Mahalanobis distance, or Hamming distance across all M-dimensions. The L×M matrix element (l, m) contains the mth constraint value for the lth amino acid position, wherein L is the number of amino acid positions and M is the number of distinct topological constraints. In some embodiments, the MPE of the engineered peptide L×M matrix is less than 75% relative to the matrix of the corresponding reference target atoms or amino acids. In some embodiments, the MPE is less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, or less than 45%. In some embodiments, the atoms or amino acids in the engineered peptide being compared are associated with a biological function or biological response. An exemplary L×M matrix is provided in
III. Programmable In Vitro Selection
In other aspects, further provided herein are methods of using the engineered peptides described herein in selecting binding partners using a series of programmed selection steps, wherein at least one selection step includes evaluating the interactions of a pool of potential binding partners with an engineered peptide.
In some embodiments, provided herein are methods of steering the selection of a binding molecule using two or more selection molecules. In some embodiments, the methods include subjecting a pool of candidate binding molecules to at least one round of selection, wherein each round comprises at least one negative selection step wherein at least a portion of the pool is screened against a negative selection molecule, and at least one positive selection step wherein at least a portion of the pool is screened against a positive selection molecule. In some embodiments the method comprises at least two rounds, at least three rounds, at least four rounds, at least five rounds, at least six rounds, at least seven rounds, at least eight rounds, at least nine rounds, at least ten rounds, or more, wherein each round independently comprises at least one negative selection step and at least one positive selection step. In some embodiments, each round independently comprises more than one negative selection step, or more than one positive selection step, or a combination thereof.
In some embodiments wherein the method comprises more than one round, each negative and positive selection molecule is independently chosen. In other embodiments, the same negative selection molecule, or the same positive selection molecule, or a combination thereof, may be used in more than one round. For example, in
Such methods of selection use positive (+) and negative (−) steps to steer the library of candidate binding molecules towards and away from certain desired characteristics, such as binding specificity or binding affinity. By using multiple steps with both positive and negative selection molecules, the pool of candidates can be directed in a stepwise manner to select for characteristics that are desirable and against characteristics that are undesirable. Further, in some embodiments the order of each step within each round, and the order of the rounds relative to each other can direct the selection in different directions. Thus, for example, in some embodiments a method comprising one round with (+) selection followed by (−) selection will result in a different final pool of candidates than if (−) selection is first, followed by (+) selection. Extrapolating this out to methods comprising multiple rounds, the order of selection steps may result in a different final pool of selected candidates even if the same positive and negative selection molecules are used overall.
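By way of non-limiting illustration, the dependence of the final pool on step order may be sketched with a toy model. The sketch assumes that each step retains a fixed fraction of the pool it receives (the strongest binders to the positive selection molecule, or the weakest binders to the negative selection molecule). The candidate names, affinity values, and retention fraction are hypothetical.

```python
def positive_step(pool, scores, keep_fraction=0.5):
    """Keep the strongest binders to the positive selection molecule."""
    keep = max(1, int(len(pool) * keep_fraction))
    ranked = sorted(pool, key=lambda name: scores[name][0], reverse=True)
    return ranked[:keep]


def negative_step(pool, scores, keep_fraction=0.5):
    """Keep the weakest binders to the negative selection molecule."""
    keep = max(1, int(len(pool) * keep_fraction))
    ranked = sorted(pool, key=lambda name: scores[name][1])
    return ranked[:keep]


def run_round(pool, scores, steps):
    """Apply the selection steps in the given order; return the survivors."""
    for step in steps:
        pool = step(pool, scores)
    return set(pool)


# Hypothetical candidates: name -> (affinity to (+) molecule, affinity to (-) molecule)
scores = {
    "A": (0.9, 0.8), "B": (0.8, 0.1), "C": (0.7, 0.7), "D": (0.6, 0.6),
    "E": (0.5, 0.2), "F": (0.4, 0.3), "G": (0.3, 0.9), "H": (0.2, 0.4),
}
pool = list(scores)

plus_then_minus = run_round(pool, scores, [positive_step, negative_step])
minus_then_plus = run_round(pool, scores, [negative_step, positive_step])
```

In this toy model the two orders retain different candidates. If each step instead applied a fixed affinity cutoff, the two orders would retain the same candidates, so the order dependence shown here follows from the assumed fraction-based retention.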
In some embodiments a selection molecule is used that has an inverse characteristic of another selection molecule. This may be useful, for example, to ensure that the candidate binding partners identified using the positive selection molecule (or excluded because of a negative selection molecule) were identified (or excluded) because of a desired trait (or undesired trait), not because of a separate, unrelated binding interaction. To remove binding partners that are binding through unrelated interactions, an inverse selection molecule can be used that has similar or the same structure and characteristics as the selection molecule, except for the residues/structures conveying the desired trait (or undesired trait). For example, if interaction with a particular charge pattern in a positive selection molecule is desired, an inverse negative selection molecule may be used that has replaced the residues providing that charge pattern with uncharged residues, and/or residues of the opposite charge. Thus, for certain selection molecules, multiple different corresponding inverse selection molecules may be possible.
In the selection methods provided herein, at least one of the selection molecules is an engineered peptide as described herein. In some embodiments, more than one engineered peptide is used. In some embodiments, each engineered peptide is independently a positive or negative selection molecule. In certain embodiments, each selection molecule used in the one or more rounds of selection is independently an engineered peptide. In other embodiments, at least one molecule that is not an engineered peptide is used as a selection molecule. Such selection molecules that are not engineered peptides may comprise, for example, a naturally-occurring polypeptide, or a portion thereof. In other embodiments, one or more selection molecules that are not engineered peptides may comprise, for example, a non-naturally occurring polypeptide or portion thereof. For example, in some embodiments one or more selection molecules (e.g., positive selection molecule or negative selection molecule) is an immunogen, an antibody, cell-surface receptor, or a transmembrane protein, or a signaling protein, or a multiprotein complex, or a peptide-protein complex, or any portions thereof, or any combinations thereof. In some embodiments, one or more selection molecules is PD-1, PD-L1, CD25, IL2, MIF, CXCR4, or VEGF, or a portion of any of these, or an antibody to any of these (such as Bevacizumab, Avelumab, or Durvalumab).
The positive and negative characteristics being selected for or against in each step may be selected from a variety of traits, and may be tailored depending on the desired features of the final one or more binding molecules obtained. Such desired features may depend, for example, on the intended use of the one or more binding molecules. For example, in some embodiments the methods provided herein are used to screen antibody candidates for one or more positive characteristics such as high specificity, and against one or more negative characteristics such as cross-reactivity. It should be understood that what is considered a positive characteristic in one context might be a negative characteristic in another context, and vice versa. Thus, a positive selection molecule in one series of selection rounds may, in some embodiments, be a negative selection molecule in a different series of selection rounds, or in selecting a different type of binding molecule, or in selecting the same type of binding molecule but for a different purpose.
In some embodiments, each selection characteristic is independently selected from the group consisting of amino acid sequence, polypeptide secondary structure, molecular dynamics, chemical features, biological function, immunogenicity, reference target(s) multi-specificity, cross-species reference target reactivity, selectivity of desired reference target(s) over undesired reference target(s), selectivity of reference target(s) within a sequence and/or structurally homologous family, selectivity of reference target(s) with similar protein function, selectivity of distinct desired reference target(s) from a larger family of undesired targets with high sequence and/or structural homology, selectivity for distinct reference target alleles or mutations, selectivity for distinct reference target residue level chemical modifications, selectivity for cell type, selectivity for tissue type, selectivity for tissue environment, tolerance to reference target(s) structural diversity, tolerance to reference target(s) sequence diversity, and tolerance to reference target(s) dynamics diversity. In some embodiments, each selection characteristic is a different type of selection characteristic. In other embodiments, two or more selection characteristics are different characteristics but of the same type. For example, in some embodiments, two or more selection characteristics are polypeptide secondary structure, wherein one is a positive selection for a desired polypeptide secondary structure and one is a negative selection for an undesired polypeptide secondary structure. In some embodiments, two or more selection characteristics are selectivity for cell type, wherein a positive selection characteristic is selectivity for a specific desired cell type, and a negative selection characteristic is selectivity for a specific undesired cell type. In some embodiments, two or more, three or more, four or more, five or more, or six or more selection characteristics are of the same type.
In yet another aspect, provided herein is a composition comprising two or more selection steering polypeptides, wherein each polypeptide is independently a positive selection molecule comprising one or more positive steering characteristics, or a negative selection molecule comprising one or more negative steering characteristics. Such characteristics may, in some embodiments, be selected from the group consisting of amino acid sequence, polypeptide secondary structure, molecular dynamics, chemical features, biological function, immunogenicity, reference target(s) multi-specificity, cross-species reference target reactivity, selectivity of desired reference target(s) over undesired reference target(s), selectivity of reference target(s) within a sequence and/or structurally homologous family, selectivity of reference target(s) with similar protein function, selectivity of distinct desired reference target(s) from a larger family of undesired targets with high sequence and/or structural homology, selectivity for distinct reference target alleles or mutations, selectivity for distinct reference target residue level chemical modifications, selectivity for cell type, selectivity for tissue type, selectivity for tissue environment, tolerance to reference target(s) structural diversity, tolerance to reference target(s) sequence diversity, and tolerance to reference target(s) dynamics diversity.
Thus, in further aspects, provided herein is a method of screening a library of binding molecules with a selection steering composition as described herein, wherein each round of selection comprises: a negative selection step of screening at least a portion of the pool against a negative selection molecule; and a positive selection step of screening at least a portion of the pool against a positive selection molecule; wherein the order of selection steps within each round, and the order of rounds, result in the selection of a different subset of the pool than an alternative order.
In some embodiments, the binding partners being evaluated using the composition of selection steering polypeptides as described herein, or the methods of screening as described herein, are a phage library, for example a Fab-containing phage library; or a cell library, for example a B-cell library or a T-cell library.
In some embodiments of the methods of screening provided herein, the methods comprise two or more, three or more, four or more, five or more, six or more, or seven or more rounds of selection. In some embodiments, wherein there is more than one round, each round comprises a different set of selection molecules. In other embodiments, wherein there is more than one round, at least two rounds comprise the same negative selection molecule, the same positive selection molecule, or both.
In some embodiments of the screening methods, the method comprises analyzing the subset of the pool prior to proceeding to the next round of selection. In certain embodiments, each subset pool analysis is independently selected from the group consisting of peptide/protein biosensor binding, peptide/protein ELISA, peptide library binding, cell extract binding, cell surface binding, cell activity assay, cell proliferation assay, cell death assay, enzyme activity assay, gene expression profile, protein modification assay, Western blot, and immunohistochemistry. In some embodiments, gene expression profile comprises full sequence repertoire analysis of the subset pool, such as next-generation sequencing. In some embodiments, statistical and/or informatic scoring, or machine learning training is used to evaluate one or more subsets of the pool in one or more selection rounds.
In some embodiments, the identity and/or order of positive and/or negative selection molecules for a subsequent round is determined by analyzing a subset pool from one selection round. In some embodiments, statistical and/or informatic scoring, or machine learning training, is used to evaluate one or more subsets of the pool in one or more selection rounds to determine the identity and/or order of the positive and/or negative selection molecules for a subsequent round (such as the next round, or a round further along in the program).
In still further embodiments, the methods of selection include modifying a subset pool obtained from a selection round before proceeding to the next selection round. Such modifications may include, for example, genetic mutation of the subset pool, genetic depletion of the subset pool (e.g., selecting a subset of the subset pool to move forward in selection), genetic enrichment of the subset pool (e.g., increasing the size of the pool), chemical modification of at least a portion of the subset pool, or enzymatic modification of at least a portion of the subset pool, or any combinations thereof. In some embodiments, statistical and/or informatic scoring, or machine learning training is used to evaluate a subset pool and determine the one or more modifications to make prior to moving the modified subset pool forward in selection. In certain embodiments, such statistical and/or informatic scoring, or machine learning training, is also used to determine the identity and/or order of positive and/or negative selection molecules for a subsequent round of selection.
Any suitable assay may be used to evaluate the binding of a pool of binding partners with the selection molecules in each step. In some embodiments, binding is directly evaluated, for example by directly detecting a label on the binding partner. Such labels may include, for example, fluorescent labels, such as a fluorophore or a fluorescent protein. In other embodiments, binding is indirectly evaluated, for example using a sandwich assay. In a sandwich assay, a binding partner binds to the selection molecule, and then a secondary labeled reagent is added to label the bound binding partner. This secondary labeled reagent is then detected. Examples of sandwich assay components include His-tagged-binding partner detected with an anti-His-tag antibody or His-tag-specific fluorescent probe; a biotin-labeled binding partner detected with labeled streptavidin or labeled avidin; or an unlabeled binding partner detected with an anti-binding-partner antibody.
In some embodiments, the binding partners being selected in each step are identified based on the binding signal, or dose-response, using any number of available detection methods. These detection methods may include, for example, imaging, fluorescence-activated cell sorting (FACS), mass spectrometry, or biosensors. In some embodiments, a hit threshold is defined (for example the median signal), and any binding partner with a signal above that threshold is flagged as a putative hit motif.
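By way of non-limiting illustration, flagging putative hits against a median-signal threshold may be sketched as follows. The binding partner identifiers and signal values are hypothetical, and the helper name flag_hits is introduced only for this sketch.

```python
from statistics import median


def flag_hits(signals, threshold=None):
    """Return the identifiers whose signal exceeds the hit threshold.

    When no threshold is supplied, the median signal of the pool is used.
    """
    if threshold is None:
        threshold = median(signals.values())
    return sorted(name for name, value in signals.items() if value > threshold)


# Hypothetical binding signals (arbitrary units) for five binding partners.
signals = {"bp1": 0.12, "bp2": 0.85, "bp3": 0.40, "bp4": 0.33, "bp5": 0.91}

hits = flag_hits(signals)
```

A different threshold, for example one derived from a negative control, could be passed through the threshold argument.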
IV. Use of Engineered Peptides to Produce Antibodies
The engineered peptides provided herein, and identified by the methods provided herein, may be used, for example, to produce one or more antibodies. In some embodiments, the antibody is a monoclonal or polyclonal antibody. Thus, in some embodiments, provided herein is an antibody produced by immunizing an animal with an immunogen, wherein the immunogen is an engineered peptide as provided herein. In some embodiments, the animal is a human, a rabbit, a mouse, a hamster, a monkey, etc. In certain embodiments, the monkey is a cynomolgus monkey, a macaque monkey, or a rhesus macaque monkey. Immunizing the animal with an engineered peptide can comprise, for example, administering at least one dose of a composition comprising the peptide and optionally an adjuvant to the animal. In some embodiments, generating the antibody from an animal comprises isolating a B cell which expresses the antibody. Some embodiments further comprise fusing the B cell with a myeloma cell to create a hybridoma which expresses the antibody. In some embodiments, the antibody generated using the engineered peptide can cross react with a human and a monkey, for example a cynomolgus monkey.
The description provided herein sets forth numerous exemplary configurations, methods, parameters, and the like. It should be recognized, however, that such description is not intended as a limitation on the scope of the present disclosure, but is instead provided as a description of exemplary embodiments.
Embodiment I-1. An engineered peptide, wherein the engineered peptide has a molecular mass of between 1 kDa and 10 kDa and comprises up to 50 amino acids, and wherein the engineered peptide comprises:
a combination of spatially-associated topological constraints, wherein one or more of the constraints is a reference target-derived constraint; and
wherein between 10% to 98% of the amino acids of the engineered peptide meet the one or more reference target-derived constraints,
wherein the amino acids that meet the one or more reference target-derived constraints have less than 8.0 Å backbone root-mean-square deviation (RMSD) structural homology with the reference target.
Embodiment I-2. The engineered peptide of embodiment I-1, wherein the amino acids that meet the one or more reference target-derived constraints have between 10% and 90% sequence homology with the reference target.
Embodiment I-3. The engineered peptide of embodiment I-1 or I-2, wherein the amino acids that meet the one or more reference target-derived constraints have a van der Waals surface area overlap with the reference of between 30 Å2 to 3000 Å2.
Embodiment I-4. The engineered peptide of any one of embodiments I-1 to I-3, wherein the combination comprises at least two reference target-derived constraints.
Embodiment I-5. The engineered peptide of any one of embodiments I-1 to I-4, wherein the combination comprises at least five reference target-derived constraints.
Embodiment I-6. The engineered peptide of any one of embodiments I-1 to I-5, wherein the combination of constraints comprises one or more constraints not derived from a reference target.
Embodiment I-7. The engineered peptide of embodiment I-6, wherein the one or more non-reference target-derived constraints describes a desired structural, dynamical, chemical, or functional characteristic, or any combinations thereof.
Embodiment I-8. The engineered peptide of any one of embodiments I-1 to I-7, wherein the constraints are independently selected from the group consisting of:
Embodiment I-9. The engineered peptide of any one of embodiments I-1 to I-8, wherein one or more constraints is independently an atomic fluctuation.
Embodiment I-10. The engineered peptide of any one of embodiments I-1 to I-9, wherein one or more constraints is independently a chemical descriptor.
Embodiment I-11. The engineered peptide of any one of embodiments I-1 to I-10, wherein one or more constraints is independently atomic distance.
Embodiment I-12. The engineered peptide of any one of embodiments I-1 to I-11, wherein one or more constraints is independently secondary structure.
Embodiment I-13. The engineered peptide of any one of embodiments I-1 to I-12, wherein one or more constraints is independently van der Waals surface.
Embodiment I-14. The engineered peptide of any one of embodiments I-1 to I-13, wherein one or more constraints is independently associated with a biological response or biological function.
Embodiment I-15. The engineered peptide of any one of embodiments I-1 to I-14, comprising one or more atoms associated with a biological response or biological function.
Embodiment I-16. The engineered peptide of any one of embodiments I-1 to I-15, comprising one or more amino acids associated with a biological response or biological function.
Embodiment I-17. The engineered peptide of any one of embodiments I-14 to I-16, wherein the biological response or biological function is selected from the group consisting of gene expression, metabolic activity, protein expression, cell proliferation, cell death, cytokine secretion, kinase activity, epigenetic modification, cell killing activity, inflammatory signals, chemotaxis, tissue infiltration, immune cell lineage commitment, tissue microenvironment modification, immune synapse formation, IL-2 secretion, IL-10 secretion, growth factor secretion, interferon gamma secretion, transforming growth factor beta secretion, immunoreceptor tyrosine-based activation motif activity, immunoreceptor tyrosine-based inhibition motif activity, antibody directed cell cytotoxicity, complement directed cytotoxicity, biological pathway agonism, biological pathway antagonism, biological pathway redirection, kinase cascade modification, proteolytic pathway modification, proteostasis pathway modification, protein folding/pathways, post-translational modification pathways, metabolic pathways, gene transcription/translation, mRNA degradation pathways, gene methylation/acetylation pathways, histone modification pathways, epigenetic pathways, immune directed clearance, opsonization, hormone signaling, integrin pathways, membrane protein signal transduction, ion channel flux, and g-protein coupled receptor response.
Embodiment I-18. The engineered peptide of embodiment I-15, wherein the reference target comprises one or more atoms associated with a biological response or biological function, and wherein the atomic fluctuations of the one or more atoms in the engineered peptide associated with a biological response or biological function overlap with the atomic fluctuations of the one or more atoms in the reference target associated with a biological response or biological function.
Embodiment I-19. The engineered peptide of embodiment I-18, wherein the overlap is a root mean square inner product (RMSIP) greater than 0.25.
Embodiment I-20. The engineered peptide of embodiment I-19, wherein the overlap has a root mean square inner product (RMSIP) greater than 0.75.
Embodiment I-21. The engineered peptide of any one of embodiments I-18 to I-20, wherein at least a portion of the atoms in the engineered peptide associated with a biological response or biological function are topologically constrained to a secondary structural element in the reference target.
Embodiment I-22. The engineered peptide of embodiment I-21, wherein the secondary structural element is a beta-sheet.
Embodiment I-23. The engineered peptide of embodiment I-21, wherein the secondary structural element is an alpha helix.
Embodiment I-24. The engineered peptide of embodiment I-21, wherein the secondary structural element is a turn, wherein the turn comprises between 2 to 7 residues, and comprises at least one inter-residue hydrogen bond.
Embodiment I-25. The engineered peptide of embodiment I-21, wherein the secondary structural element is a coil, wherein the coil comprises between 2 to 20 residues.
Embodiment I-26. The engineered peptide of embodiment I-25, wherein the coil comprises no inter-residue hydrogen bonds.
Embodiment I-27. The engineered peptide of any one of embodiments I-21 to I-26, wherein at least a portion of the atoms in the engineered peptide associated with a biological response or biological function are topologically constrained to a combination of two or more secondary structural elements independently selected from the group consisting of a beta-sheet, an alpha helix, a turn, and a coil.
Embodiment I-28. The engineered peptide of any one of embodiments I-1 to I-27, wherein one or more spatially-associated topological constraints is atomic distance.
Embodiment I-29. The engineered peptide of any one of embodiments I-1 to I-28, wherein one or more spatially-associated topological constraints is an atomic energy.
Embodiment I-30. The engineered peptide of embodiment I-29, wherein each atomic energy is independently pairwise attractive energy between two atoms, pairwise repulsive energy between two atoms, atom-level solvation energy, pairwise charged attraction energy between two atoms, pairwise hydrogen bonding attraction energy between two atoms, or non-covalent bonding energy.
Embodiment I-31. The engineered peptide of any one of embodiments I-1 to I-30, wherein one or more spatially-associated topological constraints is a chemical descriptor.
Embodiment I-32. The engineered peptide of embodiment I-31, wherein each chemical descriptor is independently hydrophobicity, polarity, volume, net charge, log P, high performance liquid chromatography retention, or van der Waals radii.
Embodiment I-33. The engineered peptide of any one of embodiments I-1 to I-32, wherein one or more spatially-associated topological constraints is a bioinformatic descriptor.
Embodiment I-34. The engineered peptide of embodiment I-33, wherein each bioinformatic descriptor is independently BLOSUM similarity, pKa, zScale, Cruciani Properties, Kidera Factors, VHSE-scale, ProtFP, MS-WHIM scores, T-scale, ST-scale, transmembrane tendency, protein buried area, helix propensity, sheet propensity, coil propensity, turn propensity, immunogenic propensity, antibody epitope occurrence, or protein interface occurrence.
Embodiment I-35. The engineered peptide of any one of embodiments I-1 to I-34, wherein one or more spatially-associated topological constraints is solvent exposure.
Embodiment I-36. The engineered peptide of any one of embodiments I-1 to I-35, wherein at least one of the one or more reference target-derived constraints is a GPCR extracellular domain.
Embodiment I-37. The engineered peptide of any one of embodiments I-1 to I-36, wherein at least one of the one or more reference target-derived constraints is an ion channel extracellular domain.
Embodiment I-38. The engineered peptide of any one of embodiments I-1 to I-37, wherein at least one of the one or more reference target-derived constraints is a protein-protein or peptide-protein interface junction.
Embodiment I-39. The engineered peptide of any one of embodiments I-1 to I-38, wherein at least one of the one or more reference target-derived constraints is derived from a polymorphic region of the target.
Embodiment I-40. The engineered peptide of any one of embodiments I-1 to I-39, comprising one or more atoms associated with a biological response or biological function, wherein each of the one or more atoms is independently selected from the group consisting of carbon, oxygen, nitrogen, hydrogen, sulfur, phosphorus, sodium, potassium, zinc, manganese, magnesium, copper, iron, molybdenum, and nickel.
Embodiment I-41. The engineered peptide of any one of embodiments I-1 to I-40, comprising one or more amino acids associated with a biological function or biological response, wherein each of the one or more amino acids is independently a proteinogenic naturally occurring amino acid, a non-proteinogenic naturally occurring amino acid, or a chemically synthesized non-natural amino acid.
Embodiment I-42. The engineered peptide of any one of embodiments I-1 to I-41, wherein the engineered peptide has at least one structural difference when compared to the reference target.
Embodiment I-43. The engineered peptide of embodiment I-42, wherein the at least one structural difference is independently selected from the group consisting of sequence, number of amino acid residues, total number of atoms, total hydrophilicity, total hydrophobicity, total positive charge, total negative charge, one or more secondary structures, shape factor, Zernike descriptors, van der Waals surface, structure graph nodes and edges, volumetric surface, electrostatic potential surface, hydrophobic potential surface, local diameter, local surface features, skeleton model, charge density, hydrophilic density, surface to volume ratio, amphiphilicity density, and surface roughness.
Embodiment I-44. The engineered peptide of embodiment I-16, wherein the difference in one or more secondary structures is the presence of one or more additional secondary structural elements in the engineered peptide compared to the reference target, wherein each additional secondary structural element is independently selected from the group consisting of alpha helices, beta-sheets, loops, turns, and coils.
Embodiment I-45. The engineered peptide of any one of embodiments I-1 to I-44, wherein between 10% to 90% of the amino acids meet one or more non-reference target-derived topological constraints.
Embodiment I-46. The engineered peptide of embodiment I-45, wherein the one or more non-reference target-derived topological constraints enforce a pre-specified function.
Embodiment I-47. The engineered peptide of embodiment I-46, wherein the
Embodiment I-48. A method of selecting an engineered peptide, comprising:
Embodiment I-49. The method of embodiment I-48, wherein the overlap between each characteristic is independently less than or equal to 75% Mean Percentage Error (MPE) as determined by one or more of Total Topological Constraint Distance (TCD), topological clustering coefficient (TCC), Euclidean distance, power distance, Soergel distance, Canberra distance, Sorensen distance, Jaccard distance, Mahalanobis distance, Hamming distance, Quantitative Estimate of Likeness (QEL), or Chain Topology Parameter (CTP).
Embodiment I-50. The method of embodiment I-48 or I-49, wherein one or more constraints is derived from per-residue energy, per-residue interaction, per-residue fluctuation, per-residue atomic distance, per-residue chemical descriptor, per-residue solvent exposure, per-residue amino acid sequence similarity, per-residue bioinformatic descriptor, per-residue non-covalent bonding propensity, per-residue phi/psi angles, per-residue van der Waals radii, per-residue secondary structure propensity, per-residue amino acid adjacency, or per-residue amino acid contact.
Embodiment I-51. The method of any one of embodiments I-48 to I-50, wherein the characteristics of one or more candidate peptides are determined by computer simulation.
Embodiment I-52. The method of embodiment I-51, wherein the computer simulation comprises molecular dynamics simulations, Monte Carlo simulations, coarse-grained simulations, Gaussian network models, machine learning, or any combinations thereof.
Embodiment I-53. The method of any one of embodiments I-48 to I-52, wherein the characteristics of one or more candidate peptides are determined by experimental characterization.
Embodiment I-54. The method of any one of embodiments I-48 to I-53, wherein the amino acids meeting the one or more reference target-derived constraints have between 10% and 90% sequence homology with the reference target.
Embodiment I-55. The method of any one of embodiments I-48 to I-54, wherein the amino acids meeting the one or more reference target-derived constraints have a van der Waals surface area overlap with the reference of between 30 Å2 to 3000 Å2.
Embodiment I-56. The method of any one of embodiments I-48 to I-55, wherein the combination comprises at least two reference target-derived constraints.
Embodiment I-57. The method of any one of embodiments I-48 to I-56, wherein the combination comprises at least five reference target-derived constraints.
Embodiment I-58. The method of any one of embodiments I-48 to I-57, wherein the combination of constraints comprises one or more constraints not derived from a reference target.
Embodiment I-59. The method of embodiment I-58, wherein the one or more non-reference target-derived constraints describes a desired structural, dynamical, chemical, or functional characteristic, or any combinations thereof.
Embodiment I-60. The method of any one of embodiments I-48 to I-59, wherein the constraints are independently selected from the group consisting of:
Embodiment I-61. The method of any one of embodiments I-48 to I-60, wherein one or more constraints is independently an atomic fluctuation.
Embodiment I-62. The method of any one of embodiments I-48 to I-61, wherein one or more constraints is independently a chemical descriptor.
Embodiment I-63. The method of any one of embodiments I-48 to I-62, wherein one or more constraints is independently atomic distance.
Embodiment I-64. The method of any one of embodiments I-48 to I-63, wherein one or more constraints is independently secondary structure.
Embodiment I-65. The method of any one of embodiments I-48 to I-64, wherein one or more constraints is independently van der Waals surface.
Embodiment I-66. The method of any one of embodiments I-48 to I-65, wherein one or more constraints is independently associated with a biological response or biological function.
Embodiment I-67. The method of any one of embodiments I-48 to I-66, wherein the engineered peptide comprises one or more atoms associated with a biological response or biological function.
Embodiment I-68. The method of any one of embodiments I-48 to I-66, wherein the engineered peptide comprises one or more amino acids associated with a biological response or biological function.
Embodiment I-69. The method of any one of embodiments I-66 to I-68, wherein the biological response or biological function is selected from the group consisting of gene expression, metabolic activity, protein expression, cell proliferation, cell death, cytokine secretion, kinase activity, epigenetic modification, cell killing activity, inflammatory signals, chemotaxis, tissue infiltration, immune cell lineage commitment, tissue microenvironment modification, immune synapse formation, IL-2 secretion, IL-10 secretion, growth factor secretion, interferon gamma secretion, transforming growth factor beta secretion, immunoreceptor tyrosine-based activation motif activity, immunoreceptor tyrosine-based inhibition motif activity, antibody directed cell cytotoxicity, complement directed cytotoxicity, biological pathway agonism, biological pathway antagonism, biological pathway redirection, kinase cascade modification, proteolytic pathway modification, proteostasis pathway modification, protein folding/pathways, post-translational modification pathways, metabolic pathways, gene transcription/translation, mRNA degradation pathways, gene methylation/acetylation pathways, histone modification pathways, epigenetic pathways, immune directed clearance, opsonization, hormone signaling, integrin pathways, membrane protein signal transduction, ion channel flux, and g-protein coupled receptor response.
Embodiment I-70. The method of embodiment I-66, wherein the reference target comprises one or more atoms associated with a biological response or biological function, and wherein the atomic fluctuations of the one or more atoms in the engineered peptide associated with a biological response or biological function overlap with the atomic fluctuations of the one or more atoms in the reference target associated with a biological response or biological function.
Embodiment I-71. The method of embodiment I-70, wherein the overlap is a root mean square inner product (RMSIP) greater than 0.25.
Embodiment I-72. The method of embodiment I-71, wherein the overlap has a root mean square inner product (RMSIP) greater than 0.75.
Embodiment I-73. The method of any one of embodiments I-67 to I-69, wherein at least a portion of the atoms in the engineered peptide associated with a biological response or biological function are topologically constrained to a secondary structural element in the reference target.
Embodiment I-74. The method of embodiment I-73, wherein the secondary structural element is a beta-sheet.
Embodiment I-75. The method of embodiment I-73, wherein the secondary structural element is an alpha helix.
Embodiment I-76. The method of embodiment I-73, wherein the secondary structural element is a turn, wherein the turn comprises between 2 to 7 residues, and comprises at least one inter-residue hydrogen bond.
Embodiment I-77. The method of embodiment I-73, wherein the secondary structural element is a coil, wherein the coil comprises between 2 to 20 residues.
Embodiment I-78. The method of embodiment I-73, wherein the coil comprises no inter-residue hydrogen bonds.
Embodiment I-79. The method of any one of embodiments I-67 to I-69, wherein at least a portion of the atoms in the engineered peptide associated with a biological response or biological function are topologically constrained to a combination of two or more secondary structural elements independently selected from the group consisting of a beta-sheet, an alpha helix, a turn, and a coil.
Embodiment I-80. The method of any one of embodiments I-48 to I-79, wherein one or more spatially-associated topological constraints is atomic distance.
Embodiment I-81. The method of any one of embodiments I-48 to I-80, wherein one or more spatially-associated topological constraints is an atomic energy.
Embodiment I-82. The method of embodiment I-81, wherein each atomic energy is independently pairwise attractive energy between two atoms, pairwise repulsive energy between two atoms, atom-level solvation energy, pairwise charged attraction energy between two atoms, pairwise hydrogen bonding attraction energy between two atoms, or non-covalent bonding energy.
Embodiment I-83. The method of any one of embodiments I-48 to I-82, wherein one or more spatially-associated topological constraints is a chemical descriptor.
Embodiment I-84. The method of embodiment I-83, wherein each chemical descriptor is independently hydrophobicity, polarity, volume, net charge, log P, high performance liquid chromatography retention, or van der Waals radii.
Embodiment I-85. The method of any one of embodiments I-48 to I-84, wherein one or more spatially-associated topological constraints is a bioinformatic descriptor.
Embodiment I-86. The method of embodiment I-85, wherein each bioinformatics descriptor is independently BLOSUM similarity, pKa, zScale, Cruciani Properties, Kidera Factors, VHSE-scale, ProtFP, MS-WHIM scores, T-scale, ST-scale, Transmembrane tendency, protein buried area, helix propensity, sheet propensity, coil propensity, turn propensity, immunogenic propensity, antibody epitope occurrence, or protein interface occurrence.
Embodiment I-87. The method of any one of embodiments I-48 to I-86, wherein one or more spatially-associated topological constraints is solvent exposure.
Embodiment I-88. The method of any one of embodiments I-48 to I-87, wherein at least one of the one or more reference target-derived constraints is a GPCR extracellular domain.
Embodiment I-89. The method of any one of embodiments I-48 to I-88, wherein at least one of the one or more reference target-derived constraints is an ion channel extracellular domain.
Embodiment I-90. The method of any one of embodiments I-48 to I-89, wherein at least one of the one or more reference target-derived constraints is a protein-protein or protein-peptide interface junction.
Embodiment I-91. The method of any one of embodiments I-48 to I-90, wherein at least one of the one or more reference target-derived constraints is derived from a polymorphic region of the target.
Embodiment I-92. The method of any one of embodiments I-48 to I-91, wherein the engineered peptide comprises one or more atoms associated with a biological response or biological function, wherein each of the one or more atoms is independently selected from the group consisting of carbon, oxygen, nitrogen, hydrogen, sulfur, phosphorus, sodium, potassium, zinc, manganese, magnesium, copper, iron, molybdenum, and nickel.
Embodiment I-93. The method of any one of embodiments I-48 to I-92, wherein the engineered peptide comprises one or more amino acids associated with a biological function or biological response, wherein each of the one or more amino acids is independently a proteinogenic naturally occurring amino acid, a non-proteinogenic naturally occurring amino acid, or a chemically synthesized non-natural amino acid.
Embodiment I-94. The method of any one of embodiments I-48 to I-93, wherein the engineered peptide has at least one structural difference when compared to the reference target.
Embodiment I-95. The method of embodiment I-94, wherein the at least one structural difference is independently selected from the group consisting of sequence, number of amino acid residues, total number of atoms, total hydrophilicity, total hydrophobicity, total positive charge, total negative charge, one or more secondary structures, shape factor, Zernike descriptors, van der Waals surface, structure graph nodes and edges, volumetric surface, electrostatic potential surface, hydrophobic potential surface, local diameter, local surface features, skeleton model, charge density, hydrophilic density, surface to volume ratio, amphiphilicity density, and surface roughness.
Embodiment I-96. The method of embodiment I-95, wherein the difference in one or more secondary structures is the presence of one or more additional secondary structural elements in the engineered peptide compared to the reference target, wherein each additional secondary structural element is independently selected from the group consisting of alpha helices, beta-sheets, loops, turns, and coils.
Embodiment I-97. The method of any one of embodiments I-48 to I-96, wherein between 10% to 90% of the amino acids of the engineered peptide meet one or more non-reference target-derived topological constraints.
Embodiment I-98. The method of embodiment I-97, wherein the one or more non-reference target-derived topological constraints enforce a pre-specified function.
Embodiment I-99. The method of embodiment I-98, wherein:
Embodiment I-100. A composition comprising two or more selection steering polypeptides, wherein each polypeptide is independently a positive selection molecule comprising one or more positive steering characteristics, or a negative selection molecule comprising one or more negative steering characteristics, wherein each characteristic type is independently selected from the group consisting of:
Embodiment I-101. The composition of embodiment I-100, wherein at least one of the two or more polypeptides is a positive selection molecule, and at least one of the two or more polypeptides is a negative selection molecule.
Embodiment I-102. The composition of embodiment I-100 or I-101, wherein at least one of the two or more polypeptides is a native protein.
Embodiment I-103. The composition of any one of embodiments I-100 to I-102, comprising at least one pair of counterpart positive and negative selection molecules comprising at least one shared characteristic type, wherein the positive selection molecule comprises the positive characteristic and the negative selection molecule comprises the negative characteristic.
Embodiment I-104. A method of screening a library of binding molecules with the composition of embodiment I-100, comprising subjecting a pool of candidate binding molecules to at least one round of selection, wherein each round of selection comprises:
Embodiment I-105. The method of embodiment I-104, wherein the library of binding molecules is a phage library.
Embodiment I-106. The method of embodiment I-105, wherein the library of binding molecules is a cell library.
Embodiment I-107. The method of embodiment I-106, wherein the library of binding molecules is a B-cell library.
Embodiment I-108. The method of embodiment I-106, wherein the library of binding molecules is a T-cell library.
Embodiment I-109. The method of any one of embodiments I-104 to I-108, comprising two or more rounds of selection.
Embodiment I-110. The method of any one of embodiments I-104 to I-109, comprising three or more rounds of selection.
Embodiment I-111. The method of embodiment I-109 or I-110, wherein each round comprises a different set of selection molecules.
Embodiment I-112. The method of embodiment I-109 or I-110, wherein at least two rounds comprise the same negative selection molecule, or the same positive selection molecule, or both.
Embodiment I-113. The method of any one of embodiments I-109 to I-112, comprising analyzing the subset of the pool obtained from a round of selection prior to proceeding to the next round of selection.
Embodiment I-114. The method of embodiment I-113, wherein the subset pool analysis determines the set of positive and/or negative selection molecules used in one or more subsequent rounds of selection.
Embodiment I-115. The method of embodiment I-113 or I-114, wherein each subset pool analysis is independently selected from the group consisting of peptide/protein biosensor binding, peptide/protein ELISA, peptide library binding, cell extract binding, cell surface binding, cell activity assay, cell proliferation assay, cell death assay, enzyme activity assay, gene expression profile, protein modification assay, Western blot, and immunohistochemistry.
Embodiment I-116. The method of any one of embodiments I-113 to I-115, wherein the positive, negative, or both positive and negative selection molecules used in one or more subsequent rounds of selection are determined by statistical/informatic scoring, or machine learning training, of a subset pool analysis.
Embodiment I-117. The method of any one of embodiments I-109 to I-116, wherein the subset pool obtained from a round of selection is modified before moving to the next selection round.
Embodiment I-118. The method of embodiment I-117, wherein the subset pool analysis determines the positive, negative, or both positive and negative selection molecules used in one or more subsequent rounds of selection; and modification of the subset pool before moving to the next selection round.
Embodiment I-119. The method of embodiment I-117 or I-118, wherein each modification is independently selected from the group consisting of genetic mutation, genetic depletion, genetic enrichment, chemical modification, and enzymatic modification.
The following Examples are merely illustrative and are not meant to limit any aspects of the present disclosure in any way.
As shown in
An additional constraint was added to the combination for evaluation of one candidate engineered peptide—atomic fluctuation (
Using the same reference target identified in Example 1 above, a second set of engineered peptides was developed. Engineered peptide candidates were generated using computational protein design (e.g., Rosetta) or other methods of sampling peptide space, and dynamics simulations were performed on the candidates. A covariance matrix of atomic fluctuations was generated for the reference target epitope, and for the residues in the candidates corresponding to the residues in the epitope of the reference target.
Principal component analysis was performed to compute the eigenvectors and eigenvalues for each covariance matrix (one covariance matrix for the reference target and one covariance matrix for each of the candidates), and only those eigenvectors with the largest eigenvalues were retained (
Since the ordering of eigenvectors is based on their eigenvalues, and eigenvalues may not necessarily be the same between two different molecules due to the stochastic nature by which molecular dynamics simulations sample the underlying energy landscape of those different molecules, the inner product between multiple, differentially ranked eigenvectors was needed (e.g. eigenvector 1 of the candidate by eigenvector 2, 3, 4, etc. of the reference target). In addition, without wishing to be bound by any theory, molecular motions are complex and may involve more than one (or more than a few) dominant/principal modes of motion.
To solve these two challenges, the inner products between all pairs of eigenvectors in the candidates and the reference target were computed. This resulted in a matrix of inner products, the dimensions of which were determined by the number of eigenvectors analyzed: for 10 eigenvectors, the matrix of inner products is 10 by 10. This matrix of inner products was distilled into a single value by computing the root mean-square value of the inner products. This is the root mean square inner product (RMSIP).
Principal component analysis (PCA) reduces the 3L×3L dimensional coordinate covariance matrices (L being the number of atoms) into sets of eigenvectors, Φ (reference target) and Ψ (MEM), and eigenvalues, Λ. The set Φ contains N eigenvectors φi for the reference target and the set Ψ contains N eigenvectors ψj for the MEM, where eigenvectors are ordered in their respective sets by their associated eigenvalues. The eigenvector with the largest eigenvalue accounts for the largest fraction of total coordinate covariation. The inner product of each pair of φi and ψj eigenvectors is computed to compare the similarity of motion between the reference target and the MEM. The root mean square of all inner product combinations of φi and ψj eigenvectors renders the total similarity of motion of the engineered peptide candidate (MEM) to the reference target (RMSIP) (
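By way of non-limiting illustration, the RMSIP calculation described above may be sketched as follows. The sketch assumes the N retained eigenvectors of each molecule are already available as unit-length lists of numbers. It also assumes the conventional RMSIP normalization, in which the summed squared inner products are divided by N (not N squared), so that identical subspaces score 1.0. The source text does not spell out the normalization, and the function names `dot` and `rmsip` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rmsip(ref_vecs, cand_vecs):
    """Root mean square inner product between two sets of N eigenvectors.

    ref_vecs:  N unit eigenvectors (phi_i) of the reference target.
    cand_vecs: N unit eigenvectors (psi_j) of the candidate.
    Every phi_i is paired with every psi_j, so the result does not depend
    on how the eigenvectors happen to be ranked within each set.
    Assumption: normalization by N (conventional RMSIP definition).
    """
    if not ref_vecs or len(ref_vecs) != len(cand_vecs):
        raise ValueError("both sets must contain the same non-zero number of eigenvectors")
    n = len(ref_vecs)
    total = sum(dot(phi, psi) ** 2 for phi in ref_vecs for psi in cand_vecs)
    return math.sqrt(total / n)


# Toy 4-dimensional example with axis-aligned unit vectors.
e1, e2, e3, e4 = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]
same_modes_reordered = rmsip([e1, e2], [e2, e1])   # same subspace, different ranking
disjoint_modes = rmsip([e1, e2], [e3, e4])         # no shared motion
half_shared = rmsip([e1, e2], [e1, e3])            # one of two modes shared
```

In this toy case, reordering the same modes leaves the score unchanged, which is the property that motivates computing the inner product over all eigenvector pairs.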
The RMSIP results from 5 candidate engineered peptides vs. the VEGF reference epitope are shown in Table 1. These data were sampled from a total simulation of 1000 candidates generated using Rosetta design with a candidate vs. reference static structure RMSD cutoff. Of the 1000 candidates, XTR-1000-TO had the lowest Rosetta (static structure) energy (lower is more favorable), but intermediate RMSIP dynamics matching. Candidates XTR-1000-B1 and B2 had the highest dynamics-matching scores (i.e., their motions most closely matched the motions of the reference target, as computed by RMSIP). Candidates XTR-1000-W1 and W2 had the lowest dynamics-matching scores and are shown to demonstrate the RMSIP dynamic range in this 1000-candidate data set (RMSIP range 0.545 to 0.772). Structures of the candidates aligned to the VEGF reference epitope are shown in
The three engineered peptides described in Example 1, and an additional fourth engineered peptide developed following a similar procedure, were used in a series of phage panning procedures. These peptides are shown in
Octet/Biosensor Screening: The affinities of the different engineered peptides were evaluated on an Octet Red 384 instrument, using a single-cycle kinetics assay design. The peptides were evaluated separately, and immobilized via a biotin linker to the streptavidin-coated tip of the biosensor. The remaining open streptavidin sites were blocked with biocytin. An analyte was washed over the sensor tip and the binding of the molecules in the analyte to the peptides was recorded. For this assay, the analyte was a serial dilution of Bevacizumab, from 0.19 µM to 1.5 µM. Each assay was run in duplicate. Controls were also run, using just a buffer (to control for sensor drift) and a separate control of purified IgG from human ND serum (to control for non-specific IgG binding).
Seven different panning programs were devised, each comprising three rounds, with each round comprising a positive selection step and a negative selection step (
The panning protocol began with a human naïve scFv library, and panning was performed in solution, with the selection molecules bound to biotin (but still in solution). For each round, the starting pool was first combined with the negative selection molecule in solution, and then a streptavidin-coated substrate (e.g., magnetic beads) was applied to the mixture to bind the negative selection molecules. Thus, any phage in the pool that was bound to the negative selection molecule was also bound to the streptavidin-coated support. The remaining solution was removed, and this flow through was then taken on to the positive selection step. The flow through was combined with the positive selection molecule and allowed to bind, and then a streptavidin-coated solid substrate was applied to the mixture. In this step, the bound phage were retained while the remaining unbound phage were removed. The bound phage were then eluted. E. coli were transfected with the eluted phage using a 30 minute cultivation, the transfected cells were split for next-generation sequencing and DNA isolation for analysis, and the phage were then amplified for use in the subsequent panning round. For each panning program, in each round negative selection was performed first, and positive selection second.
The candidate pools obtained from each of the seven panning programs plus the conventional panning method were then analyzed using ELISA for response to VEGF and sMEM positive selection molecule (iMEM corrected), to evaluate binding to full-length VEGF and to the putative epitope sMEM. The analyses of these ELISA tests are shown in
The clones that exhibited cross-blocking behavior were sequenced via Sanger sequencing, and 11 distinct clones were confirmed. Those obtained from the programmed in vitro selection using engineered peptides are shown in Table 3A. Those obtained via the conventional selection with VEGF and BSA are listed in Table 3B.
The selection pools were scored using the following equation:
Blocking Propensity=SUM(X-blocking Slope, (sMEM+VEGF)−iMEM), where X-blocking Slope, sMEM and VEGF are Robust Z-Scores.
Scoring rationale: If a blocking response is observed, through a significant (by robust z-score) negative slope, then blocking propensity is a combination of z-scores for VEGF binding and X-blocking slope. The blocking propensity is summarized in
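By way of non-limiting illustration, the scoring equation above may be sketched as follows. The text does not define the robust Z-score, so this sketch assumes the common median and median-absolute-deviation (MAD) form with the 1.4826 consistency constant. The sum is implemented literally as written, without flipping the sign of the slope term, and the iMEM term is taken as supplied by the caller on the same scale as the other terms. The function names are hypothetical.

```python
from statistics import median


def robust_z_scores(values):
    """Robust Z-scores: (x - median) / (1.4826 * MAD).

    Assumption: this median/MAD form is one common definition; the source
    text does not state which definition was used.
    """
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust Z-score is undefined for this sample")
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]


def blocking_propensity(z_slope, z_smem, z_vegf, imem):
    """Literal reading of: SUM(X-blocking Slope, (sMEM + VEGF) - iMEM)."""
    return z_slope + (z_smem + z_vegf) - imem


# Toy data: five clones, one with an outlying signal.
example_z = robust_z_scores([1, 2, 3, 4, 100])
example_score = blocking_propensity(z_slope=1.0, z_smem=2.0, z_vegf=0.5, imem=0.25)
```

A median-based score is less sensitive to a few strongly responding clones than a mean and standard deviation would be, which is presumably why a robust statistic is used for pooled ELISA data.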
The different selection programs were also evaluated for cross-blocking enrichment relative to the control (conventional) program, which used just VEGF and BSA as selection molecules, using a uniform, random sampling of clones from all in vitro selection programs. At least four of the programs using engineered peptides showed enrichment, summarized in
1. Take a random-uniform sample of 96 clones from all panning programs and measure cross-blocking activity.
2. Rank cross-blocking across all 96 clones.
3. Perform a Kruskal-Wallis test to calculate the per-program mean cross-blocking rank vs. control.
4. X-blocking enrichment=100%*(program cross-blocking mean rank−control mean rank)/(control mean rank)
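By way of non-limiting illustration, steps 2 to 4 above may be sketched as follows. The sketch assumes rank 1 is assigned to the lowest cross-blocking value and that tied values share their average rank, as is usual for rank-based tests. Neither convention is stated in the text. The Kruskal-Wallis significance test itself is omitted here (a library routine such as `scipy.stats.kruskal` could supply it), and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (lowest) upward; ties receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mean_rank_by_program(scores, programs):
    """Mean cross-blocking rank per panning program, ranked across all clones."""
    ranks = average_ranks(scores)
    sums, counts = {}, {}
    for rank, program in zip(ranks, programs):
        sums[program] = sums.get(program, 0.0) + rank
        counts[program] = counts.get(program, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}


def enrichment_percent(mean_ranks, program, control):
    """100% * (program mean rank - control mean rank) / control mean rank."""
    return 100.0 * (mean_ranks[program] - mean_ranks[control]) / mean_ranks[control]


# Toy data: four clones, two per program.
scores = [0.1, 0.2, 0.3, 0.4]
programs = ["control", "control", "P1", "P1"]
mean_ranks = mean_rank_by_program(scores, programs)
p1_enrichment = enrichment_percent(mean_ranks, "P1", "control")
```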
The clones were also subjected to next-generation sequencing (NGS) to obtain information about the CDR loops on a genomic level.
The sequences were analyzed to determine if two unique sequences were actually different antibodies, versus sequencing errors, referred to as “clonality”. Normalized Shannon evaluation was also used, as shown in
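By way of non-limiting illustration, a normalized Shannon measure of repertoire diversity may be sketched as follows. The sketch assumes the measure is the Shannon entropy of the clonotype frequency distribution divided by its maximum possible value, the logarithm of the number of observed clonotypes. The text does not state which normalization was used, and the function name is hypothetical.

```python
import math


def normalized_shannon(counts):
    """Shannon entropy of clonotype counts, scaled to the range 0 to 1.

    1.0 means all observed clonotypes are equally abundant (maximally diverse);
    values near 0 mean the pool is dominated by a few clonotypes (focused).
    Assumption: normalization by log(number of observed clonotypes).
    """
    counts = [c for c in counts if c > 0]
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return entropy / math.log(len(counts))


even_pool = normalized_shannon([10, 10, 10, 10])    # no clonotype dominates
focused_pool = normalized_shannon([97, 1, 1, 1])    # one clonotype dominates
single_clone = normalized_shannon([100])
```

Under this reading, a panning program that focuses the repertoire yields a lower value after selection than before it.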
While a classical panning approach using only a full length protein (VEGF) does focus diversity (Program 12), an engineered-peptide-programmed panning approach focuses repertoire diversity at least 2× more efficiently.
The engineered peptide (MEM)-programmed in vitro selection isolates distinct antibody clonotypes with higher diversity germline usage vs. conventional approach at the first round of selection. Using the sMEM-based in vitro selection produces more diverse light chain germline usage at round 1 vs. full length antigen and uMEM. MEM-based in vitro selection programs produce distinct heavy chain germline usage at round 2 vs. full length antigen. The order and identity of the MEM used in the in vitro selection program affect heavy chain germline usage. MEM-based in vitro selection programs produce distinct light chain germline usage at round 2 vs. full length antigen. The order and identity of the MEM used in the in vitro selection program affect light chain germline usage. MEM-based in vitro selection programs produce distinct, AND more diverse heavy chain germline usage at round 3 vs. full length antigen. The order and identity of the MEM used in the in vitro selection program affect heavy chain germline usage and diversity. MEM-based in vitro selection programs produce distinct, AND more diverse light chain germline usage at round 3 vs. full length antigen. The order and identity of the MEM used in the in vitro selection program affect light chain germline usage and diversity.
A summary of how the different phage panning programs focused Fab hits is provided in
The graphs summarizing on-epitope (sMEM) VEGF hit frequency per panning round for each program shown in
Using an identified therapeutic epitope reference target site on PD-L1, a series of engineered peptides (MEMs) were designed generally following a similar protocol as described in Example 2, as summarized in
The ELISA response of the resulting pools selected using each program to PD-L1 and the different engineered peptides are summarized in
These ELISA hits were analyzed with a dose-responsive PD-L1 competition with avelumab or durvalumab, at 0 nM, 67 pM, 670 pM, and 6.7 nM to identify 34 putative cross-blocking clone hits. Blocking propensity was calculated as follows: ELISA Z-Score(sMEM1+sMEM5+PD-L1−iMEM)+MAX(Avelumab Blocking Z-score, Durvalumab Blocking Z-score). A summary of the results is provided in Table 3 below.
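By way of non-limiting illustration, the PD-L1 blocking propensity calculation may be sketched as follows. The expression "ELISA Z-Score(sMEM1+sMEM5+PD-L1−iMEM)" is read here as a signed sum of per-antigen Z-scores that have already been computed for a clone. That reading is an assumption, as are the function and parameter names.

```python
def pdl1_blocking_propensity(z_smem1, z_smem5, z_pdl1, z_imem,
                             z_avelumab_block, z_durvalumab_block):
    """Blocking propensity for one clone from precomputed Z-scores.

    ELISA term: sMEM1 + sMEM5 + PD-L1 - iMEM.
    Blocking term: the larger of the avelumab and durvalumab blocking Z-scores.
    """
    elisa_term = z_smem1 + z_smem5 + z_pdl1 - z_imem
    blocking_term = max(z_avelumab_block, z_durvalumab_block)
    return elisa_term + blocking_term


# Toy values for a single clone.
clone_score = pdl1_blocking_propensity(1.0, 0.5, 2.0, 0.5, -1.0, 1.5)
```

Taking the maximum over the two antibodies means a clone is credited if it blocks either avelumab or durvalumab binding, and is not required to block both.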
The ELISA responses are provided in
These results were analyzed to determine if any of the in vitro selection programs produce a random-selection enrichment of clones that cross-block PD-L1:avelumab/durvalumab. Based on the ELISA and cross-blocking data using clones from a uniform, random sampling of all in vitro selection programs as compared to the conventional program (using just PD-L1 and BSA as selection molecules), at least two of the programs using engineered peptides showed enrichment. The results and summary of clones are shown in
Using a reference target, a topological characteristic of the reference target (sequence) is identified and encoded in a scaffold blueprint (
A machine-learning (ML) model may be trained on training data that includes representations of the scaffold blueprints and the corresponding scores. The representations may be, for example, a one-dimensional vector of numbers, two-dimensional matrices of alphanumerical data, or a three-dimensional tensor of normalized numbers. More specifically, in some instances, the representations are vectors including an ordered list of numbers of intervening scaffold residue positions. Such representations may be used because the order of target-residues can be inferred from target structures; therefore, the representations do not need to identify the amino acid identity of target-residue positions. The scores of the scaffold blueprints can be generated using computational protein modeling (e.g., Rosetta remodeler) that determines an energy term for each scaffold blueprint. The scores can then be calculated based on the energy terms generated by the computational protein modeling.
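By way of non-limiting illustration, one possible vector representation of a scaffold blueprint may be sketched as follows. The sketch assumes a blueprint can be summarized by its total length and the zero-based positions of its target-derived residues. It records the number of scaffold residues before the first target residue, between consecutive target residues, and after the last one. This is one interpretation of "an ordered list of numbers of intervening scaffold residue positions", and the function name is hypothetical.

```python
def encode_blueprint(length, target_positions):
    """Encode a blueprint as counts of intervening scaffold residues.

    length:           total number of residues in the blueprint.
    target_positions: zero-based indices of target-derived residues.
    Returns a list with one more entry than there are target residues.
    Amino acid identities are not encoded, only the spacing.
    """
    positions = sorted(target_positions)
    if len(set(positions)) != len(positions):
        raise ValueError("target positions must be unique")
    if positions and (positions[0] < 0 or positions[-1] >= length):
        raise ValueError("target positions must lie within the blueprint")
    gaps = []
    previous = -1
    for pos in positions:
        gaps.append(pos - previous - 1)   # scaffold residues since the last target residue
        previous = pos
    gaps.append(length - previous - 1)    # scaffold residues after the final target residue
    return gaps


# A 12-residue blueprint with target residues at positions 2, 3 and 7.
example_vector = encode_blueprint(12, [2, 3, 7])
```

The entries of the vector plus the number of target residues always sum to the blueprint length, which gives a simple consistency check on any encoded blueprint.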
The ML model can be, for example, a boosted decision tree algorithm, an ensemble of decision trees, an extreme gradient boosting (XGBoost) model, a random forest, a support vector machine (SVM), and/or the like. Once trained, the ML model is then executed to generate a set of predicted scores from a set of scaffold blueprints. If a predicted score is above a desired score, a scaffold blueprint corresponding to the predicted score can be simulated by computational protein modeling to generate a ground-truth score. The ground-truth score and the predicted score can be compared to determine retraining of the ML model. In some implementations, the training and executing steps may be iterated as shown in
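By way of non-limiting illustration, the predict, simulate, compare, and retrain loop described above may be sketched as follows. To stay self-contained, the sketch swaps in a simple nearest-neighbor regressor as a stand-in for the boosted-tree, random forest, or SVM models named in the text. The `simulate` callable is a hypothetical placeholder for the computational protein modeling step, and the `tolerance` parameter that triggers retraining is likewise an assumption.

```python
def predict_knn(train, x, k=3):
    """Stand-in model: mean score of the k nearest training blueprints."""
    def squared_distance(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], x))
    nearest = sorted(train, key=squared_distance)[:k]
    return sum(score for _, score in nearest) / len(nearest)


def screen_and_refine(train, candidates, simulate, desired_score, tolerance, k=3):
    """Predict scores, simulate promising blueprints, and flag retraining.

    train:      list of (blueprint_vector, score) pairs.
    candidates: blueprint vectors to screen.
    simulate:   hypothetical callable returning a ground-truth score.
    Returns the (possibly extended) training set and a per-candidate log.
    """
    train = list(train)
    log = []
    for x in candidates:
        predicted = predict_knn(train, x, k)
        if predicted <= desired_score:
            continue                      # not promising enough to simulate
        truth = simulate(x)
        retrain = abs(truth - predicted) > tolerance
        if retrain:
            # For a nearest-neighbor stand-in, adding the labeled example
            # is the retraining step.
            train.append((x, truth))
        log.append((x, predicted, truth, retrain))
    return train, log


# Toy run: scores are faked by a simple function of the vector.
toy_train = [([0, 0], 0.0), ([10, 10], 10.0)]
new_train, run_log = screen_and_refine(
    toy_train, [[9, 9], [1, 1]], simulate=lambda v: sum(v) / 2,
    desired_score=5.0, tolerance=0.5, k=1)
```

Only blueprints whose predicted score clears the threshold are sent to the expensive simulation, so the trained model acts as a filter that reduces how many candidates require full protein modeling.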
This application is a continuation of International Patent Application No. PCT/US2020/032715, filed May 13, 2020, which claims the priority benefit of U.S. Provisional Patent Application No. 62/855,767, filed May 31, 2019, the entire contents of which are hereby incorporated by reference in their entirety for all purposes.
Number | Date | Country | |
---|---|---|---|
62855767 | May 2019 | US |
Number | Date | Country | |
---|---|---|---|
Parent | PCT/US2020/032715 | May 2020 | US |
Child | 17537215 | US |