A computer readable form of the Sequence Listing is filed with this application by electronic submission and is incorporated into this application by reference in its entirety. The Sequence Listing is contained in the file created on Sep. 14, 2021 having the file name “20-1325-WO-SeqList_ST25.txt” and is 41 kb in size.
Cyclic symmetry is frequent in protein and peptide homo-oligomers, but extremely rare within a single chain, as it is not compatible with free N and C termini. The ability to design symmetric, well-folded polypeptide macrocycles would open up new avenues for both therapeutic design and for bounded and unbounded nanomaterial design.
In one aspect, the disclosure provides polypeptides comprising or consisting of an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO: 1-91 as shown in Table 1, wherein:
In one embodiment, no proline or AIB residues may be added by amino acid change relative to the reference polypeptide. In another embodiment, the polypeptide has C2 symmetry, and comprises an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:1-8. In a further embodiment, the polypeptide has S2 symmetry, and comprises an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO: 9-14. In one embodiment, the polypeptide has C3 symmetry, and comprises an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:15-72. In one such embodiment, the first residue of the asymmetric unit is L-Proline, the third residue of the asymmetric unit is L-Aspartic acid, and the 2nd residue can be any non-glycine, non-proline, non-AIB L-amino acid.
In another embodiment, the polypeptide has C4 symmetry, and comprises an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:73-76. In one such embodiment, the polypeptide is bound to a metal ion, including but not limited to Zn2+.
In one embodiment, the polypeptide has C5 symmetry, and comprises an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:77-90. In another embodiment, the polypeptide has S4 symmetry, and comprises an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence of SEQ ID NO:91.
In a further embodiment, the polypeptide is conjugated to one or more additional components, including, but not limited to detectable tags, small molecules, radioactive agents, antibodies, polyethylene glycol, therapeutic moieties, and diagnostic moieties.
The disclosure also provides methods for use of the polypeptide of any embodiment, including but not limited to those uses described herein, and assembling them together with metals to form super molecular crystals. The disclosure further provides methods for designing mixed chirality peptide macrocycles with internal symmetry according to any embodiment or combination of embodiments described herein.
All references cited are herein incorporated by reference in their entirety. As used herein, the singular forms “a”, “an” and “the” include plural referents unless the context clearly dictates otherwise.
As used herein, the amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V).
Amino acid residues shown in upper case are L amino acids, and residues in lower case are D amino acids.
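The case convention above can be read mechanically. The following is a minimal sketch, not part of the disclosure, of one way to parse such a sequence string in Python. The function names are hypothetical. It assumes that "X" denotes AIB (as in the S4-1 sequence given later in this document) and that glycine and AIB, being achiral, are reported as "achiral" regardless of case.

```python
# Residues treated as achiral in this sketch (an assumption of the example):
# glycine, and AIB written as "X".
ACHIRAL = {"G", "X"}


def parse_residues(sequence):
    """Return a list of (one_letter_code, chirality) tuples.

    Upper case is read as an L amino acid and lower case as a D amino acid.
    Glycine and AIB are labeled "achiral" whatever their case.
    """
    residues = []
    for symbol in sequence:
        if not symbol.isalpha():
            raise ValueError(f"unexpected symbol: {symbol!r}")
        code = symbol.upper()
        if code in ACHIRAL:
            chirality = "achiral"
        elif symbol.isupper():
            chirality = "L"
        else:
            chirality = "D"
        residues.append((code, chirality))
    return residues


def count_d_residues(sequence):
    """Count the residues written as D amino acids."""
    return sum(1 for _, chirality in parse_residues(sequence) if chirality == "D")
```

For example, `parse_residues("Pd")` reads the first residue as L-proline and the second as D-aspartic acid.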
In all embodiments of polypeptides disclosed herein, any N-terminal methionine residues are optional (i.e.: the N-terminal methionine residue may be present or may be absent).
All embodiments of any aspect of the disclosure can be used in combination, unless the context clearly dictates otherwise.
Unless the context clearly requires otherwise, throughout the description and the claims, the words ‘comprise’, ‘comprising’, and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”. Words using the singular or plural number also include the plural and singular number, respectively. Additionally, the words “herein,” “above,” and “below” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of the application.
The description of embodiments of the disclosure is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. While the specific embodiments of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize.
In one aspect, the disclosure provides polypeptides comprising or consisting of an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:1-91 (see Table 1), wherein:
In one embodiment, the polypeptide contains 0, 1, or 2 amino acid changes relative to the reference peptide.
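As an illustration of how a number of amino acid changes relates to a percent identity threshold, the sketch below compares two equal-length sequences position by position. It is an assumption-laden example rather than the method used in the disclosure: the function names are hypothetical, the comparison is case-sensitive (so an L-to-D change counts as a change), and no alignment or circular permutation of the macrocycle is attempted.

```python
def count_changes(candidate, reference):
    """Count positions at which two equal-length sequences differ."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(1 for a, b in zip(candidate, reference) if a != b)


def percent_identity(candidate, reference):
    """Percent of positions that are identical (case-sensitive)."""
    changes = count_changes(candidate, reference)
    return 100.0 * (len(reference) - changes) / len(reference)
```

With a hypothetical 9-residue reference such as "PADPADPAD", a candidate "PKDPKDPKD" carries three changes and is about 66.7% identical, which is why a single substitution repeated in each of three lobes of a short macrocycle can already approach the lowest identity threshold recited above.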
The inventors have shown that the polypeptides of the disclosure are mixed-chirality peptide macrocycles with rigid structures that feature internal cyclic symmetries or improper rotational symmetries inaccessible to natural proteins, and can be used, for example, in therapeutic and nanomaterial design, as well as in synthetic switching systems and in co-assemblies with metals to form super molecular crystals.
The polypeptides are cyclic peptides in that there is a covalent linkage between the residues shown as N-terminal and C-terminal in SEQ ID NOS:1-91. Each of the polypeptides includes a series of repeats. For example:
In one embodiment wherein the polypeptides are mutated relative to the reference polypeptide, the same mutation (i.e.: the same amino acid residue substitution) is made to each repeat unit. In another embodiment, no proline or AIB residues may be added by amino acid change relative to the reference polypeptide.
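The repeat structure and the rule that the same substitution is made in each repeat unit can be sketched in code. The example below is illustrative only. The function names and the toy sequences are hypothetical, glycine and AIB ("X") are assumed to be achiral and are compared case-insensitively, and, for improper rotational symmetry, the sketch assumes that a substitution is applied with inverted chirality in the alternating (mirrored) repeats.

```python
ACHIRAL = {"G", "X"}  # glycine and AIB; treated as achiral (assumption)


def normalize(lobe):
    """Write achiral residues in upper case so that case carries chirality only."""
    return "".join(r.upper() if r.upper() in ACHIRAL else r for r in lobe)


def invert_chirality(lobe):
    """Swap L and D (upper and lower case); achiral residues stay upper case."""
    return "".join(r.upper() if r.upper() in ACHIRAL else r.swapcase() for r in lobe)


def split_lobes(sequence, n_lobes):
    """Split a sequence into n_lobes repeat units of equal length."""
    if len(sequence) % n_lobes:
        raise ValueError("sequence length is not a multiple of n_lobes")
    size = len(sequence) // n_lobes
    return [sequence[i * size:(i + 1) * size] for i in range(n_lobes)]


def has_internal_symmetry(sequence, n_lobes, improper=False):
    """Check that every lobe matches the first (inverted in odd lobes if improper)."""
    lobes = [normalize(lobe) for lobe in split_lobes(sequence, n_lobes)]
    reference = lobes[0]
    for index, lobe in enumerate(lobes):
        mirrored = improper and index % 2 == 1
        expected = invert_chirality(reference) if mirrored else reference
        if lobe != expected:
            return False
    return True


def mutate_every_lobe(sequence, n_lobes, position, new_residue, improper=False):
    """Apply one substitution at the same 0-based position in every lobe."""
    lobes = split_lobes(sequence, n_lobes)
    if not 0 <= position < len(lobes[0]):
        raise ValueError("position lies outside the repeat unit")
    mutated = []
    for index, lobe in enumerate(lobes):
        mirrored = improper and index % 2 == 1
        residue = invert_chirality(new_residue) if mirrored else new_residue
        mutated.append(lobe[:position] + residue + lobe[position + 1:])
    return "".join(mutated)
```

For a toy three-repeat sequence "PADPADPAD", `mutate_every_lobe("PADPADPAD", 3, 1, "K")` gives "PKDPKDPKD", which still passes `has_internal_symmetry(..., 3)`.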
In one embodiment, the polypeptide has C2 symmetry, and comprises an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:1-8.
In another embodiment, the polypeptide has S2 symmetry, and comprises an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:9-14.
In a further embodiment, the polypeptide has C3 symmetry, and comprises an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO: 15-72. In one such embodiment of the C3 symmetry polypeptides, the first residue of the asymmetric repeat unit is L-Proline and the third residue of the asymmetric unit is L-Aspartic acid, and the 2nd residue can be any non-glycine, non-proline, non-AIB L-amino acid.
In one embodiment, the polypeptide has C4 symmetry, and comprises an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:73-76. In one such embodiment, the polypeptide is bound to a metal ion, including but not limited to Zn2+. In this embodiment, the polypeptide may undergo a conformational change when bound to the metal ion as opposed to the unbound state, and thus may be used, for example, in detection of metal ions, or in synthetic switching systems and in co-assemblies with metals to form super molecular crystals.
In another embodiment, the polypeptide has C5 symmetry, and comprises an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:77-90.
In a further embodiment, the polypeptide has S4 symmetry, and comprises an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence of SEQ ID NO:91.
In all of these embodiments, the polypeptide may be conjugated to one or more additional components. In one embodiment, the one or more additional components may be cross-linked to a side chain of an amino acid residue in the polypeptide using conventional techniques. Any additional component may be conjugated to the polypeptides of the disclosure as appropriate for an intended use, including but not limited to detectable tags, small molecules, radioactive agents, antibodies, polyethylene glycol, therapeutic moieties, and diagnostic moieties. In all of these embodiments, the percent identity requirement does not include any additional functional domain that may be conjugated to the polypeptide.
In another embodiment, the disclosure provides compositions, comprising a plurality of the polypeptides of the disclosure attached to a scaffold. Any suitable scaffold may be used, including but not limited to polypeptide scaffolds, beads, virus-like particles, etc.
The polypeptides of the disclosure may be chemically synthesized using any suitable technique. Those polypeptides of the disclosure that include only L amino acids (SEQ ID NOs: 15, 17, 29, 47, 49, 67, 69, 71, 77, and 81) may be expressed recombinantly.
The disclosure also provides nucleic acids encoding a polypeptide comprising or consisting of an amino acid sequence at least 66%, 70%, 75%, 80%, 82%, 84%, 86%, 88%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO: 15, 17, 29, 47, 49, 67, 69, 71, 77, and 81, wherein:
The nucleic acid sequence may comprise single stranded or double stranded RNA or DNA in genomic or cDNA form, or DNA-RNA hybrids, each of which may include chemically or biochemically modified, non-natural, or derivatized nucleotide bases. Such nucleic acid sequences may comprise additional sequences useful for promoting expression and/or purification of the encoded protein, including but not limited to polyA sequences, modified Kozak sequences, and sequences encoding epitope tags, export signals, and secretory signals, nuclear localization signals, and plasma membrane localization signals. It will be apparent to those of skill in the art, based on the teachings herein, what nucleic acid sequences will encode the proteins of the disclosure.
In a further aspect, the disclosure provides expression vectors comprising nucleic acids of the disclosure operatively linked to a control sequence. “Expression vector” includes vectors that operatively link a nucleic acid coding region or gene to any control sequences capable of effecting expression of the gene product. “Control sequences” operatively linked to the nucleic acid sequences of the disclosure are nucleic acid sequences capable of effecting the expression of the nucleic acid molecules. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences and the promoter sequence can still be considered “operably linked” to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, termination signals, and ribosome binding sites. Such expression vectors can be of any type, including but not limited to plasmid and viral-based expression vectors. The control sequence used to drive expression of the disclosed nucleic acid sequences in a mammalian system may be constitutive (driven by any of a variety of promoters, including but not limited to, CMV, SV40, RSV, actin, EF) or inducible (driven by any of a number of inducible promoters including, but not limited to, tetracycline, ecdysone, steroid-responsive). The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In various embodiments, the expression vector may comprise a plasmid, viral-based vector, or any other suitable expression vector.
In one aspect, the disclosure provides recombinant host cells comprising the proteins, nucleic acids and/or the expression vectors of any embodiment or combination of embodiments of the disclosure. The host cells can be either prokaryotic or eukaryotic. The cells can be transiently or stably engineered to incorporate the expression vector of the invention, using techniques including but not limited to bacterial transformations, calcium phosphate co-precipitation, electroporation, or liposome mediated-, DEAE dextran mediated-, polycationic mediated-, or viral mediated transfection. (See, for example, Molecular Cloning: A Laboratory Manual (Sambrook, et al., 1989, Cold Spring Harbor Laboratory Press); Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed. (R. I. Freshney. 1987. Liss, Inc. New York, N.Y.)). A method of producing a protein according to the invention is an additional part of the invention. The method comprises the steps of (a) culturing a host according to this aspect of the invention under conditions conducive to the expression of the protein, and (b) optionally, recovering the expressed protein. The expressed protein can be recovered from the cell free extract, but preferably it is recovered from the culture medium.
In another aspect, the disclosure provides methods for use of the polypeptides of any embodiment or combination of embodiments herein for any suitable use, including but not limited to therapeutic and nanomaterial design, biosensors, as well as in synthetic switching systems and in co-assemblies with metals to form super molecular crystals.
In a further aspect, the disclosure provides methods for designing mixed chirality peptide macrocycles with internal symmetry according to any embodiment or combination of embodiments described herein. Exemplary such methods are detailed in the examples that follow.
Cyclic symmetry is frequent in protein and peptide homo-oligomers, but extremely rare within a single chain, as it is not compatible with free N and C termini. Here we describe the computational design of mixed-chirality peptide macrocycles with rigid structures that feature internal cyclic symmetries or improper rotational symmetries inaccessible to natural proteins. Crystal structures of three C2- and C3-symmetric macrocycles, and of six diverse S2-symmetric macrocycles, match the computationally-designed models with backbone heavy-atom RMSD values of 1 Å or better. Crystal structures of an S4-symmetric macrocycle (consisting of a sequence and structure segment mirrored at each of three successive repeats) designed to bind zinc reveal a large-scale zinc-driven conformational change from an S4-symmetric apo-state to a nearly inverted S4-symmetric holo-state almost identical to the design model. This work demonstrates the power of computational design for exploring symmetries and structures not found in nature, and for creating synthetic switchable systems.
We set out to develop general methods for computationally designing internally-symmetric peptide macrocycles with conformational rigidity imparted by large energy gaps between a symmetric ground state and all alternative conformations. To this end, we incorporated methods for sampling and designing with internal cyclic or improper rotational symmetries into the Rosetta™ heteropolymer design software. Our sampling methods use kinematic closure methods to provide analytical solutions for dihedral values yielding closed macrocycle conformations. Dihedral angles in all asymmetric units of the macrocycle (henceforth referred to as “lobes”) are required to match those in the first “reference” lobe to within a certain tolerance in the cyclic symmetric case, and after inversion in the case of improper rotational symmetries. Subsequent symmetric sequence design algorithms ensure residue identities and conformations in neighboring lobes match (for cyclic symmetry) or match with chirality inversion and inversion of dihedral values (for improper rotational symmetry).
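The lobe-matching criterion described above can be illustrated with a small, self-contained check. This is a sketch under stated assumptions and is not the Rosetta implementation: the function names are hypothetical, the default tolerance of 10 degrees is an arbitrary placeholder for the unspecified tolerance, dihedral angles are supplied as a flat list in chain order, and mirror inversion is modeled by negating every dihedral in alternating lobes.

```python
def angle_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def is_quasi_symmetric(dihedrals, n_lobes, tolerance=10.0, improper=False):
    """Check that every lobe's dihedrals match the reference (first) lobe.

    dihedrals: flat list of backbone dihedral angles in degrees, chain order.
    improper:  if True, odd-numbered lobes are compared after inversion,
               i.e. against the negated reference dihedrals.
    """
    if len(dihedrals) % n_lobes:
        raise ValueError("number of dihedrals is not a multiple of n_lobes")
    size = len(dihedrals) // n_lobes
    reference = dihedrals[:size]
    for lobe_index in range(1, n_lobes):
        lobe = dihedrals[lobe_index * size:(lobe_index + 1) * size]
        sign = -1.0 if (improper and lobe_index % 2 == 1) else 1.0
        for ref_angle, angle in zip(reference, lobe):
            if angle_difference(sign * ref_angle, angle) > tolerance:
                return False
    return True
```

The wrap-around in `angle_difference` matters because dihedrals near +180 and -180 degrees describe nearly the same conformation; a naive subtraction would reject them.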
Here, we apply the newly-developed computational methods to the creation of peptide macrocycles with cyclic symmetries (C2 or C3). We also explore the structures possible with improper rotational symmetries that are inaccessible to homochiral proteins, but which can be accessed by heterochiral peptides, demonstrating robust ability to design diverse, internally S2-symmetric folds. Finally, we present an S4-symmetric polypeptide macrocycle that functions as a conformational switch, inverting its fold in the presence or absence of zinc. Exemplary polypeptide designs are shown in Table 1. The robust computational design and validation methods demonstrated here are applicable to diverse problems in therapeutic and nanomaterial design.
Structures of Designed Peptide Macrocycles with C2 and C3 Symmetry
We developed a computational pipeline, summarized in the methods and in
We carried out large-scale conformational sampling on each of the designed sequences, and selected peptide macrocycles with large energy gaps between the designed conformation and all alternative states to synthesize chemically. We began by synthesizing one C2-symmetric and five C3-symmetric peptide macrocycles and their mirror-image enantiomorphs to facilitate the crystallization of the synthesized molecules from racemic mixture (Yeates and Kent, 2012). With this approach we succeeded in crystallizing the C2-symmetric peptide and three of the five C3-symmetric peptides (
Two of the three C3-symmetric peptides that we succeeded in crystallizing closely matched the design models (
Structures of Designed Peptide Macrocycles with S2 Symmetry
We next explored symmetries inaccessible to natural proteins. Unlike proteins built only from L-amino acids, peptides built from mixtures of D- and L-amino acids can access symmetries involving mirror operations, such as improper rotational symmetries. Using the same symmetric sampling, clustering, and sequence optimization strategy, we designed and synthesized a panel of 6 peptide macrocycles with S2 symmetry ranging in size from 8 to 12 amino acids (designs S2-1 through S2-6). These peptides have a sequence that repeats twice, with the chirality of residues in the second lobe inverted relative to the first, yielding a second lobe with a conformation mirroring that of the first. Six designs were selected for synthesis representing a diverse range of backbone conformations and hydrogen bonding patterns. We were able to crystallize all six of these peptides, and to determine their structures by direct phasing methods.
In all cases the observed conformation closely matched the design (overlays in
We next explored the possibility of using the new design methods to create conformational switches. We sampled S4-symmetric polypeptide conformations with 4 repeats in which alternating lobes had opposite chirality, and developed a computational strategy to select backbones that could present metal-binding side-chains for tetrahedral coordination of a central metal ion (see supplementary information). We designed sequences with L- and D-histidine residues positioned to chelate the metal ion, AIB and other residues to stabilize the fold, and apolar side-chains on the surface to stabilize an alternative, inside-out conformation in the absence of metal. The sequences of these designs repeat four times, with residues in the second and fourth lobes possessing chirality and conformations inverted relative to equivalent residues in the first and third lobes. We carried out large-scale conformational sampling to select designs with low-energy designed conformations with the histidine side-chains positioned to bind zinc. In several designs, we noted that large-scale conformational sampling predicted a second energy minimum corresponding to an inside-out fold with the apolar side-chains in the core and the histidines exposed. We selected a single design for synthesis, S4-1, which has sequence KLgeXHklQEXhKLqeXHklQEXh (SEQ ID NO:91), in which X represents AIB (
The 24-amino acid polypeptide S4-1 crystallized in space group P
The central metal ion plays an important structural role in holo-S4-1, stabilizing a conformation that presents four apolar D- and L-leucine side-chains to aqueous solvent (
The new peptide and polypeptide macrocycles presented here have diverse, rigid backbone folds closely matching the design models, in nearly all cases with sub-Angstrom accuracy. The structures were designed in four different symmetry classes (C2, C3, S2, and S4), the latter two of which are inaccessible to proteins or other natural macromolecules built from building-blocks of only one handedness. Likely because of their symmetry, the success rate of the designs was quite high, both in terms of crystallization of the designed peptides (11 of 13) and their close match to predicted structures (10 of 11). Since synthetic macrocycles are not limited to the 20 canonical amino acid building-blocks, we take advantage of the non-canonical, conformationally-constrained amino acid residue AIB to rigidify two macrocycles. We also illustrate the use of metal ligands as central structural elements. Moving forward, the methods described here can be used to design with the thousands of possible non-canonical amino acids, as well as with bound metal ions.
The design of rigidly-folded heteropolymers requires efficient means of sampling backbone conformations, both to identify conformational states compatible with a given function that can be stabilized by a suitable choice of sequence, and to validate designed sequences by exploring possible alternative low-energy conformational states. This is particularly challenging when designing with non-canonical amino acid residues, requiring unbiased sampling methods that are not reliant on known structures. Because N-fold symmetric molecules have far fewer (1/N) conformational degrees of freedom than similarly sized asymmetric molecules, the new methods make comprehensive sampling tractable for much larger systems. By focusing on internally-symmetric macrocycles, we were able to achieve exhaustive or near-exhaustive coverage of the conformation spaces for peptides with up to 30 amino acids for the highest-order symmetries, a size range well beyond that which can be sampled exhaustively for asymmetric macrocycles. Since many applications of designed, well-folded heteropolymers require molecules that are able to present large binding interfaces (e.g. for nanomaterial self-assembly or therapeutic target binding), or molecules large enough to possess internal binding pockets (e.g. for small-molecule binding or catalysis), we anticipate that our computational methods for designing larger symmetric structures with non-canonical building blocks will have broad applicability.
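The 1/N reduction in degrees of freedom can be made concrete with a rough count. The sketch below is an illustration only, with hypothetical function names. It assumes two free backbone dihedrals per residue and a uniform grid with an arbitrary number of bins per dihedral, and it ignores the further constraints imposed by macrocycle closure.

```python
def backbone_dof(n_residues, n_lobes=1, dihedrals_per_residue=2):
    """Free backbone dihedrals when only one lobe is sampled independently."""
    if n_residues % n_lobes:
        raise ValueError("n_residues is not a multiple of n_lobes")
    return (n_residues // n_lobes) * dihedrals_per_residue


def grid_points(n_residues, n_lobes=1, bins_per_dihedral=12):
    """Number of grid points for a uniform scan over the free dihedrals."""
    return bins_per_dihedral ** backbone_dof(n_residues, n_lobes)
```

Under these assumptions a 24-residue macrocycle with four symmetry-related lobes has 12 free dihedrals rather than 48, so the size of a uniform grid falls from 12^48 to 12^12 points.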
Our most complex design, the 24-residue S4-1 polypeptide, binds zinc with sub-nanomolar affinity, and undergoes a major conformational change when zinc is removed. Both the apo and holo states are well-structured, with the former burying apolar side-chains and the latter exposing them (
We have presented general methods for computational design and validation of symmetric, well-folded polypeptide macrocycles, including those incorporating metals as structural elements, and have demonstrated robust ability to control structure with sub-Angstrom accuracy, culminating in an engineered, metal-dependent conformational switch. The ability to design symmetric, well-folded polypeptide macrocycles opens up new avenues for both therapeutic design and for bounded and unbounded nanomaterial design, and shows that methods originally developed for protein design can now be used to robustly design molecules quite unlike those that exist in nature.
Extensive modifications to the Rosetta™ software suite enabled the design of internally-symmetric peptide macrocycles, including those able to coordinate a central metal ion. New Rosetta™ modules were implemented to be compatible with both the PyRosetta™ and RosettaScripts™ scripting languages (Chaudhury et al., 2010; Fleishman et al., 2011), allowing their use in the development of future, application-specific design protocols.
The Rosetta™ symmetry code (DiMaio et al., 2011) was refactored to add support for mirror operations and improper rotational symmetries, and to correctly interconvert between mirrored amino acid types. Rosetta™ simple_cycpep_predict and energy_based_clustering applications, both described previously (Bhardwaj et al., 2016; Hosseinzadeh et al., 2017), were enhanced to allow sampling and clustering of quasi-symmetric backbones with a given symmetry (where a quasi-symmetric backbone is one that is nearly symmetric, but in which small deviations from perfect symmetry are allowed). A Rosetta™ module (“mover”) for converting quasi-symmetric structures to fully symmetric structures, called the SymmetricCycpepAlignMover, was added. The interface and internal handling of non-canonical amino acids during design was greatly reworked, with the user-controlled PackerPalette introduced to control the set of chemical building-blocks used for a given design task, permitting deprecation of many problematic idiosyncrasies present in the previous interface to streamline the design process.
To enable the design of metal-binding peptides, Rosetta™ CrosslinkerMover was enhanced with support for a range of metal coordination geometries, with support for asymmetric structures or for the symmetry classes compatible with a given coordination geometry. For example, this mover allows the design of a tetrahedrally-coordinated zinc in an asymmetric structure or in a C2 or S4-symmetric structure, with suitable repetition of conformations and amino acid identities of the liganding residues.
To design symmetric peptides, we first sampled quasi-symmetric mainchain conformations using the simple_cycpep_predict application, and enumerated conformations with the energy_based_clustering application. In the case of the S4-1 polypeptide, this step was modified to sample only those backbone conformations capable of coordinating a central metal ion. Next, with scripts written in the RosettaScripts™ scripting language, we converted quasi-symmetric cluster centers to fully symmetric structures, and carried out sequence design with Rosetta™ symmetric design algorithms. Finally, we validated each designed sequence by large-scale conformational sampling, again using the simple_cycpep_predict application, to identify those designed sequences that uniquely favoured the designed conformation. Computations were carried out on the University of Washington Hyak cluster, the Argonne National Laboratory Mira and Theta supercomputers, and the Simons Foundation Gordon and Iron clusters. Additionally, some validations were carried out using the Rosetta@Home™ distributed computing platform, which uses volunteer computers, cellular telephones, and mobile devices through the Berkeley Open Infrastructure for Network Computing (BOINC) (Anderson, 2004).
Peptides were synthesized using standard Fmoc solid-phase peptide synthesis techniques using a CEM Liberty Blue peptide synthesizer with microwave-heated coupling and deprotection steps. Peptides with twelve amino acids or fewer that contained L-aspartate or L-glutamate were synthesized tethered by the acidic side-chain to preloaded Fmoc-L-Asp(Wang resin LL)-ODmab or Fmoc-L-Glu(Wang resin LL)-ODmab resin, and were cyclized on-bead by a coupling reaction following deprotection of the C-terminus with 2% (v/v) hydrazine monohydrate treatment in dimethylformamide (DMF). Larger peptides were synthesized with the C-terminus coupled to Cl-TCP(Cl) resin from CEM, cleaved from the resin with 1% (v/v) TFA treatment in dichloromethane (DCM), and cyclized by a solution-phase coupling reaction prior to final deprotection. Peptides were purified by reverse-phase HPLC with a water-acetonitrile gradient, lyophilized, and redissolved in buffer suitable for subsequent experiments (typically 100 mM HEPES, pH 7.5). Masses and purities were assessed by electrospray ionization mass spectrometry with a Thermo Scientific TSQ Quantum Access™ mass spectrometer. Full synthetic and purification protocols are described in the supplementary information.
Peptides were crystallized by hanging droplet vapour diffusion, with pH, buffer, ionic strength, and precipitants all optimized for each peptide. Growth conditions for the crystals of each peptide are described in the supplementary information. Diffraction data were collected at the Argonne National Laboratory Advanced Photon Source (APS) beamlines 24-ID-C and 24-ID-E.
To confirm zinc content of the S4-1 and S4-2 polypeptides, and to measure zinc affinity, we used a variant of the 4-(2-pyridylazo)resorcinol (PAR) assay described previously (Crow et al., 1997; Hunt et al., 1985; Mulligan et al., 2008). We carried this assay out in 96-well plates (200 μl total solution volume per well). To confirm metal content, 10 μM polypeptide was denatured in 6 M guanidinium hydrochloride (Sigma-Aldrich, St. Louis, Mo.), 100 mM HEPES, pH 7.5, and 200 μM PAR, and the change in absorbance at 490 nm was monitored using a SpectraMAX™ M5e plate reader (Molecular Devices, San Jose, Calif.). Standard curves were prepared with ZnCl2 to convert absorbance changes to zinc concentrations. The metal affinity of the S4-1 and S4-2 polypeptides was measured by competition with PAR, given the known dissociation constants of the PAR2-Zn complex. Full protocols and mathematical details for both assays are provided in the supplementary information.
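The standard-curve step lends itself to a short worked example. The sketch below is not the protocol from the supplementary information. It assumes that the absorbance change at 490 nm is linear in zinc concentration over the range of the ZnCl2 standards, fits that line by ordinary least squares, and inverts it. All function names are hypothetical, and any numbers passed to them are the user's own data.

```python
def fit_line(x_values, y_values):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x_values)
    if n < 2 or n != len(y_values):
        raise ValueError("need at least two paired points")
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def zinc_concentration(delta_absorbance, slope, intercept):
    """Convert an absorbance change to a zinc concentration via the fit."""
    return (delta_absorbance - intercept) / slope


def zinc_per_polypeptide(zinc_uM, polypeptide_uM=10.0):
    """Zinc ions released per polypeptide (10 uM polypeptide as in the text)."""
    return zinc_uM / polypeptide_uM
```

As a usage illustration with made-up standards of 0, 5, 10 and 20 μM ZnCl2 giving absorbance changes of 0.0, 0.1, 0.2 and 0.4, the fitted slope is 0.02 per μM, and a sample reading of 0.2 would correspond to 10 μM zinc, or one zinc ion per polypeptide at 10 μM polypeptide.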
(1) Only symmetry types that were synthesized are listed here. For full analysis of other symmetry types (e.g. C4, C5), please see the supplementary information.
(2) Results of clustering with a radius of 1.5 Å are reported. For full clustering analysis, please refer to the supplementary information.
Those cyclic peptides that contained an acidic L-amino acid residue, and which had twelve amino acid residues or fewer, were synthesized using standard Fmoc solid-phase peptide synthesis (SPPS) on preloaded and sidechain-linked Fmoc-L-Asp(Wang resin LL)-ODmab or Fmoc-L-Glu(Wang resin LL)-ODmab resin. Linear, protected peptides were built on a CEM Liberty Blue™ Peptide Synthesizer with microwave heating at coupling and deprotection steps. After the final Fmoc deprotection, the resin was treated with 2% (v/v) hydrazine monohydrate in dimethylformamide (DMF) to remove the C-terminal Dmab protecting group; the N- and C-termini were then joined on-resin by a coupling reaction. A cleavage cocktail of TFA:Water:TIPS:DODT (92.5:2.5:2.5:2.5) was used for global deprotection of side-chains and to cleave the peptide from the resin. After the removal of residual TFA by evaporation, peptides were ether precipitated and further purified using RP-HPLC.
We found that cyclizing peptides longer than twelve residues on resin was challenging. In these cases the linear sequences were synthesized on the Liberty Blue™ Peptide Synthesizer (with the C-terminus coupled to the resin) and then cyclized in solution. Cl-TCP(Cl) resin from CEM was used as solid support, and the linear peptide was cleaved from this resin without side-chain deprotection by treatment with 1% (v/v) TFA in dichloromethane (DCM). The protected peptide in DCM was drained into an equal volume of 50:50 acetonitrile and water. We evaporated the DCM using a rotovap apparatus, and the peptide in water and acetonitrile was then lyophilized. The resulting powder was redissolved in DCM to 1 mM (based on the synthesis scale, assuming perfect efficiencies), and 2 eq. of (7-Azabenzotriazol-1-yloxy)tripyrrolidinophosphonium hexafluorophosphate (PyAOP) was added directly to the solution. The solution was left on a magnetic stirrer for 30 minutes, then 10 eq. of N,N-diisopropylethylamine (DIEA) was added drop-wise and the cyclization reaction was left stirring overnight. As much of the solution as possible was evaporated, again using the rotovap apparatus, and the cyclic peptide was then deprotected, precipitated and purified as described herein.
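As an illustration of the quantities involved in the in-solution cyclization, the following minimal Python sketch converts a synthesis scale into a DCM volume (for 1 mM peptide), a PyAOP mass (2 eq.) and a DIEA volume (10 eq.). The function name `cyclization_amounts`, the molecular weights, and the DIEA density are assumptions taken from common reference values rather than from this disclosure, and the calculation assumes perfect synthesis efficiency as stated above.

```python
# Hypothetical helper, not part of the disclosed protocol.
# Reference values below are assumptions; check them against the reagent datasheets.
PYAOP_MW = 521.38      # g/mol (assumed)
DIEA_MW = 129.25       # g/mol (assumed)
DIEA_DENSITY = 0.742   # g/mL, equivalently mg/uL (assumed)


def cyclization_amounts(scale_mmol, peptide_mM=1.0, pyaop_eq=2.0, diea_eq=10.0):
    """Return (DCM volume in mL, PyAOP mass in mg, DIEA volume in uL).

    scale_mmol is the synthesis scale, treated as the amount of linear
    peptide present (i.e. perfect efficiencies are assumed).
    """
    dcm_mL = scale_mmol / peptide_mM * 1000.0     # mmol / (mmol/L) = L, then L -> mL
    pyaop_mg = scale_mmol * pyaop_eq * PYAOP_MW   # mmol * g/mol = mg
    diea_mg = scale_mmol * diea_eq * DIEA_MW      # mmol * g/mol = mg
    diea_uL = diea_mg / DIEA_DENSITY              # mg / (mg/uL) = uL
    return dcm_mL, pyaop_mg, diea_uL


if __name__ == "__main__":
    dcm, pyaop, diea = cyclization_amounts(0.1)   # 0.1 mmol is an illustrative scale
    print(f"DCM: {dcm:.0f} mL, PyAOP: {pyaop:.1f} mg, DIEA: {diea:.0f} uL")
```

The high dilution (1 mM) is what drives the large solvent volume relative to the reagent amounts.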
Crude peptides were purified using an Agilent Infinity™ Preparative high-pressure liquid chromatography (HPLC) system with an Agilent Zorbax™ SB-C18 column (9.4 mm×250 mm). We used a linear gradient of 1%/min for Solvent B (ACN with 0.1% TFA), and a flow rate of 5 mL/min, to purify peptides, with elution peaks detected using 214 nm absorbance. We confirmed mass and purity of peptides by electrospray ionization mass spectrometry (ESI-MS) on a Thermo Scientific TSQ Quantum Access™ mass spectrometer.
All crystals were grown by hanging drop vapor diffusion. Equal volumes of peptide and reservoir solution (100 nL each) were combined robotically and suspended over 100 μL of reservoir solution. Individual crystallization conditions for each peptide are as follows:
C2-1 and its mirror form were lyophilized in a 1:1 molar ratio and dissolved at a total peptide concentration of 25 mg/mL. The peptide crystallized in two different conditions, producing different structures. Although the crystals were grown from a racemic mixture, the first crystal form contained one hand only. This crystal form grew in space group P2₁2₁2₁ from a reservoir containing 0.1 M citric acid pH 5.0 and 2.4 M ammonium sulfate. The crystal had a needle-like morphology with dimensions of 175×5×5 μm. The second crystal form grew in space group P1 from a reservoir containing 0.2 M calcium chloride, 0.1 M HEPES pH 7.5, and 28% (w/v) PEG 400. Electron density shows evidence of epimerization (˜50%) at the DSER (residue 1) position. The crystal morphology was needle shaped, 200 μm long and less than 5 μm thick. Diffraction data from both crystal forms were collected at the Argonne National Laboratory Advanced Photon Source (APS), beamline 24-ID-E.
C3-1 and its mirror form were combined in a 1:1 molar ratio and dissolved in water for a total peptide concentration of 20 mg/mL. The peptide crystallized in two different conditions, yielding two crystal forms. The first crystal form grew in space group C2/c, from a reservoir composed of JCSG Core II A8 (0.1 M Tris pH 8.5, 5% (w/v) PEG 8000, 20% (w/v) PEG 300, 10% (w/v) Glycerol). The second crystal form grew in space group P1 from Morpheus screen condition C8, consisting of 0.09 M sodium nitrate, 0.09 M sodium phosphate dibasic, 0.09 M ammonium sulfate, 0.1 M HEPES, 0.1 M MOPS pH 7.5, 12.5% v/v 2-methyl-2,4-pentanediol, 12.5% PEG 1000, and 12.5% w/v PEG 3350. Diffraction data from both crystal forms were collected at the APS, beamline 24-ID-E.
C3-2 and its mirror form were combined in a 1:1 molar ratio and dissolved in water for a total peptide concentration of 22 mg/mL. The peptide crystallized in space group P3, from a reservoir composed of JCSG Core IV C5 (0.2 M zinc acetate, 0.1 M imidazole pH 8.0, 2.5 M sodium chloride). Diffraction data were collected at the APS, beamline 24-ID-C.
C3-3 and its mirror form were concentrated to 14.4 mg/mL each, for a total peptide concentration of 28.8 mg/mL. The peptide crystallized in the space group P31c, from a reservoir composed of JCSG Core III G10 (0.1 M cadmium chloride, 0.1 M sodium acetate pH 4.6, 30% (w/v) PEG 400). Diffraction data were collected at the APS, beamline 24-ID-C.
S2-1 was concentrated to 18.6 mg/mL. The reservoir contained 3.2 M ammonium sulfate and 0.1 M citrate, pH 5.0. The crystal morphology was rod shaped with dimensions 100×20×5 μm. Diffraction data were collected at the APS, beamline 24-ID-C using a wavelength of 0.8856 Å.
S2-2 was concentrated to 20.8 mg/mL. The reservoir contained 0.1 M potassium thiocyanate and 30% (w/v) polyethylene glycol (PEG) 2000 monomethylether (MME). For cryo-protection, the crystal was briefly immersed in a mixture of 65% reservoir and 35% ethylene glycol. Diffraction data were collected at the APS, beamline 24-ID-E using a wavelength of 0.9792 Å.
S2-3 was concentrated to 23.4 mg/mL. The peptide crystallized under two different conditions, producing different packings of space group P1. The first crystal form grew from a reservoir composed of 1.6 M tri-sodium citrate pH 6.5. The morphology was a trapezoidal plate (isosceles) with edges of 50×20×5 μm. The second crystal form grew from a reservoir composed of 0.2 M lithium sulfate, 0.1 M sodium acetate, and 50% (w/v) PEG 400. The morphology was pyramidal shaped with edges of approximately 30 μm. Diffraction data from both crystal forms were collected at the APS, beamline 24-ID-E using a wavelength of 0.9792 Å.
S2-4 was concentrated to 29.8 mg/mL. The peptide crystallized under two different conditions, producing different packings of space group P1. The first crystal form grew from a reservoir composed of 3.15 M ammonium sulfate and 0.1 M citric acid, pH 5.0. The morphology was a diamond shaped plate with edges of 100×100×20 μm. The second crystal form grew from a reservoir composed of 2.4 M sodium malonate, pH 7.0. The morphology was rod-like with edges of 400×50×50 μm. Diffraction data from both crystal forms were collected at the APS, beamline 24-ID-E.
S2-5 was concentrated to 27.4 mg/mL. The reservoir contained 0.17 M (NH4)2SO4, 25.5% (w/v) PEG 4000, and 15% (v/v) glycerol. The morphology was rod-like with edges of 250×40×30 μm. Diffraction data were collected at the APS, beamline 24-ID-C.
S2-6 was concentrated to 33.9 mg/mL. The reservoir contained 3.15 M ammonium sulfate, and 0.1 M citric acid, pH 5.0. The morphology was prismatic with edges of 100×80×60 μm. Diffraction data were collected at the APS, beamline 24-ID-C.
S4-1 was concentrated to 15 mg/mL in the presence of 100 mM zinc acetate. The peptide crystallized in holo and apo forms. The holo crystal form grew from a reservoir composed of 1.1 M sodium malonate dibasic monohydrate, 0.1 M HEPES pH 7.0, and 0.5% (v/v) Jeffamine® ED-2003. The morphology was a square plate with edges of 100×100×40 μm. Diffraction data were collected at the APS, beamline 24-ID-C. The second crystal form grew from a reservoir composed of 3.15 M ammonium sulfate and 0.1 M citric acid, pH 5.0. The morphology was needle-like with edges of 150×5×5 μm. Diffraction data were collected at the APS, beamline 24-ID-E.
X-ray diffraction data were collected at beamlines 24-ID-C and 24-ID-E at the Advanced Photon Source at Argonne National Laboratory, as noted above for each crystal. Crystals were cooled to a temperature of 100 K. Diffraction data were indexed, integrated, scaled, and merged using the programs XDS/XSCALE or DENZO/SCALEPACK (Kabsch, 2010; Otwinowski and Minor, 1997). Initial phases for all crystal structures were obtained by direct methods using the programs ShelxD or ShelxT (Sheldrick, 2015, 2008). Refinement was performed using the programs Refmac5 or ShelX (Murshudov et al., 2011; Sheldrick, 2008). Model building was performed using the graphics program Coot (Emsley et al., 2010).
We used a variant on the 4-(2-pyridylazo)resorcinol (PAR) assay described previously (Crow et al., 1997; Hunt et al., 1985; Mulligan et al., 2008) to confirm the zinc content of preparations of holo-S4-1 and holo-S4-2 polypeptides (incubated with 1.25 equivalents of ZnCl2) or apo-S4-1 and apo-S4-2 polypeptides (solubilized using metal-free buffer following lyophilization after purification in 1% trifluoroacetic acid expected to prevent metal binding). To prevent trace metal contamination, all glassware was triple-rinsed with distilled water (MilliporeSigma, Burlington, Mass.), and Chelex resin (Bio-Rad, Hercules Calif.) was added to all stock buffers. A stock of 7.1 M guanidinium hydrochloride (Sigma-Aldrich, St. Louis, Mo.) was prepared in 100 mM HEPES, pH 7.5, and exact concentration was measured by refractometry (Nozaki, 1972). In 96-well plates (200 μl per well), 10 μM polypeptide was denatured in 6 M guanidine in the presence of 200 μM PAR. The change in absorbance at 490 nm on addition of polypeptide was monitored over time on a SpectraMAX™ M5e plate reader (Molecular Devices, San Jose, Calif.), and the amplitude of the change measured. To control for unbound metal, measurements were also made with buffer substituted for guanidine. Standard curves were prepared using ZnCl2 to convert absorbance changes into molar concentrations of zinc released.
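The conversion from absorbance change to zinc released can be sketched as a linear standard curve. The Python below is a minimal illustration only: the function names are hypothetical, the curve is assumed to be linear over the range used, and subtracting the buffer-only control from the denatured reading is one assumed way of applying the unbound-metal control described above. The numbers in the usage section are invented placeholders, not measured data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def zinc_released_uM(delta_a490, slope, intercept):
    """Invert the standard curve: absorbance change -> zinc concentration (uM)."""
    return (delta_a490 - intercept) / slope


def zinc_per_peptide(delta_denatured, delta_buffer, slope, intercept, peptide_uM=10.0):
    """Zinc equivalents per peptide, after subtracting the buffer-only control."""
    bound = (zinc_released_uM(delta_denatured, slope, intercept)
             - zinc_released_uM(delta_buffer, slope, intercept))
    return bound / peptide_uM


if __name__ == "__main__":
    # Placeholder ZnCl2 standards (uM) and absorbance changes, for illustration only.
    standards_uM = [0.0, 5.0, 10.0, 20.0]
    delta_a490 = [0.0, 0.1, 0.2, 0.4]
    slope, intercept = fit_line(standards_uM, delta_a490)
    print(zinc_per_peptide(0.21, 0.01, slope, intercept))
```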
We measured the affinity of the S4-1 and S4-2 polypeptides for zinc by competition with PAR. PAR binds zinc as a PAR2Zn complex, which exhibits considerably enhanced absorbance near 490 nm as compared to free PAR. This provides a convenient means of measuring complex formation between the designed peptide and zinc through the resulting decrease in absorption at 490 nm. The dissociation constants of the PAR2Zn complex were previously reported to be 1.8×10⁻⁶ M and 2.5×10⁻⁷ M for the first and second dissociation events, respectively (Hunt et al., 1985), allowing an approximate second-order dissociation constant of 4.5×10⁻¹³ M² to be computed. This in turn allows the dissociation constant for the polypeptide-Zn complex to be determined by competition. First, we write the expressions for the various dissociation constants as functions of concentrations of species:
The above four equations can be combined to yield:
Since we are working at concentrations well above the Kd of the PAR2Zn complex, we can make the approximation that all zinc is bound:
[Zn]total=[Zn]free+[pepZn]+[PAR2Zn]≈[pepZn]+[PAR2Zn] (S9)
Rearranging for [pepZn] and substituting into Eq. S8, we obtain an expression in which all other values are either known, set by the experimenter, or measured from the absorbance, and the sole unknown value is Kd,pep:
Binding data can be fitted to the expression above numerically. Alternatively, analytic solvers can be used to produce an unwieldy but exact expression for [PAR2Zn] as a function of polypeptide concentration (which is what we did, using Maple™ software [Maplesoft, Waterloo, ON, Canada]), to which titration data may be fitted to obtain Kd,pep. Our fits were performed with Origin software (OriginLab, Northampton, Mass., USA).
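The numerical route mentioned above can be sketched in Python as follows. This is not the Maple expression or the Origin fit used here; it is a minimal stand-in under stated assumptions: the overall PAR2Zn constant is taken as the product of the two stepwise constants, Kd,PAR = [PAR]²[Zn]/[PAR2Zn], the peptide binds one zinc with Kd,pep = [pep][Zn]/[pepZn], the 1:1 PAR-Zn intermediate is neglected, and the zinc mass balance retains the free-zinc term rather than dropping it as in Eq. S9. The equilibrium is solved by bisection on free zinc, and Kd,pep is fitted by a simple grid search in log space; all function names are hypothetical.

```python
import math

# Stepwise PAR2Zn dissociation constants quoted in the text (Hunt et al., 1985).
KD1_PAR = 1.8e-6                 # M
KD2_PAR = 2.5e-7                 # M
KD_PAR2ZN = KD1_PAR * KD2_PAR    # overall constant, approximately 4.5e-13 M^2


def par2zn_given_free_zn(zn_free, par_total, kd_par=KD_PAR2ZN):
    """[PAR2Zn] for a given free zinc, from Kd = [PAR]^2 [Zn] / [PAR2Zn]
    and PAR_total = [PAR] + 2 [PAR2Zn] (1:1 PAR-Zn complex neglected)."""
    a = zn_free / kd_par
    # Smaller root of 4a c^2 - (4aP + 1) c + a P^2 = 0, written to avoid cancellation.
    root = math.sqrt(8.0 * a * par_total + 1.0)
    return 2.0 * a * par_total ** 2 / (4.0 * a * par_total + 1.0 + root)


def par2zn_at_equilibrium(pep_total, zn_total, par_total, kd_pep, kd_par=KD_PAR2ZN):
    """Solve the zinc mass balance for free zinc by bisection; return [PAR2Zn] (M)."""
    def excess(zn_free):
        pep_zn = pep_total * zn_free / (kd_pep + zn_free)
        par2zn = par2zn_given_free_zn(zn_free, par_total, kd_par)
        return zn_free + pep_zn + par2zn - zn_total   # increases monotonically

    lo, hi = 0.0, zn_total
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return par2zn_given_free_zn(0.5 * (lo + hi), par_total, kd_par)


def fit_kd_pep(pep_totals, par2zn_measured, zn_total, par_total):
    """Grid search over log10(Kd,pep) from -15 to -6 minimizing squared error."""
    best_kd, best_sse = None, float("inf")
    for i in range(-1500, -599):
        kd = 10.0 ** (i / 100.0)
        sse = sum(
            (par2zn_at_equilibrium(p, zn_total, par_total, kd) - y) ** 2
            for p, y in zip(pep_totals, par2zn_measured)
        )
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


if __name__ == "__main__":
    # Synthetic titration generated from an invented Kd,pep, for illustration only.
    zn_total, par_total = 10e-6, 500e-6
    peptide = [0.0, 5e-6, 10e-6, 20e-6, 30e-6, 50e-6]
    signal = [par2zn_at_equilibrium(p, zn_total, par_total, 1e-9) for p in peptide]
    print(fit_kd_pep(peptide, signal, zn_total, par_total))
```

A grid search is used only to keep the sketch dependency-free; a nonlinear least-squares routine would serve the same purpose and would also provide parameter uncertainties.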
Experimentally, 0 to 50 μM apo-S4-1 or apo-S4-2 polypeptide was incubated with 10 μM ZnCl2 and 500 μM PAR in 100 mM HEPES buffer, pH 7.5, in a 96-well plate. Total solution volume was 100 μl. Again, glassware was triple-rinsed with distilled water, and Chelex resin was added to all stock solutions. Absorbance was monitored at 490 nm using a SpectraMAX M5e plate reader, and signals were averaged for several minutes after reaching plateaux. Standard curves using ZnCl2 in PAR were used to convert absorbance readings to [PAR2Zn].
This application claims priority to U.S. Provisional Application Ser. No. 63/083,444 filed Sep. 25, 2020, incorporated by reference herein in its entirety.
This invention was made with government support under Grant No. HDTRA 1-19-1-0003, awarded by the Defense Threat Reduction Agency. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2021/051633 | 9/23/2021 | WO |

Number | Date | Country
---|---|---
63083444 | Sep 2020 | US