The de novo design of three protein chains that associate to form an obligate ‘ABC’ heterotrimer, but not binary AB, AC and BC heterodimers, is an outstanding challenge for protein design. ABC heterotrimers are a difficult design challenge because of the interaction cooperativity required for the three unique components to assemble only into the desired structure. Although an ABC heterotrimer only has one extra component compared to a heterodimer, the latter has only four alternate species, while an ABC heterotrimer has 15 alternate species.
A computer readable form of the Sequence Listing is filed with this application by electronic submission and is hereby incorporated by reference in its entirety. The Sequence Listing is contained in the XML file created on Apr. 20, 2023 having the name “22-0585-US.xml” and is 155,987 bytes in size.
In one aspect, the disclosure provides polypeptides comprising an amino acid sequence at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of the amino acid sequences listed in Tables 1, 2, and 3, wherein 0-7 residues at the N and/or C-terminus are optional and may be absent and not considered when determining percent identity. In one embodiment, interface residues in the reference sequence as identified in Table 4 are maintained. In another embodiment, residues capable of hydrogen-bonding as identified in Table 4 are maintained. In a further embodiment, W, Y, and F residues in the reference sequence are maintained. In one embodiment, mutations in residues relative to the reference sequence are conservative amino acid substitutions.
In a further embodiment, the disclosure provides fusion proteins comprising (a) a polypeptide of any embodiment or combination of embodiments; (b) a second polypeptide; and (c) an optional amino acid linker linking the polypeptide and second polypeptide. In one embodiment, the second polypeptide comprises a helical repeat protein or a protein with mixed alpha helix/beta sheet secondary structure. In a further embodiment, the fusion protein comprises an amino acid sequence at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of sequences in Tables 5 and 6.
The disclosure also provides nucleic acids encoding a polypeptide or fusion protein of any embodiment herein; expression vectors comprising a nucleic acid of the disclosure operatively linked to a suitable control element; and host cells comprising the polypeptide, fusion protein, nucleic acid, and/or expression vector of any embodiment herein.
The disclosure further provides heterotrimers, heterodimers, or heterotetramers comprising polypeptides that comprise an amino acid sequence at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence combinations listed in Table 1, Table 2, Table 3, Table 5, or Table 6, wherein 0-7 residues at the N and/or C-terminus of the polypeptides are optional and may be absent and not considered when determining percent identity. In one embodiment, the heterotrimer, heterodimer, or heterotetramer comprises an interaction hub building block between chains in larger closed structures.
The disclosure also provides kits comprising one or more polypeptide, fusion protein, nucleic acid, expression vector, host cell, and/or heterotrimer of any embodiment herein, and methods for use of the polypeptide, fusion protein, nucleic acid, expression vector, host cell, heterotrimer, and/or kit of any embodiment herein.
Through the ability to array functional domains in a controlled, periodic manner on one surface of a 2D-lattice, such designs can be utilized for a wide range of applications within protein-based nanotechnology.
All references cited are herein incorporated by reference in their entirety. Within this application, unless otherwise stated, the techniques utilized may be found in any of several well-known references such as: Molecular Cloning: A Laboratory Manual (Sambrook, et al., 1989, Cold Spring Harbor Laboratory Press), Gene Expression Technology (Methods in Enzymology, Vol. 185, edited by D. Goeddel, 1991, Academic Press, San Diego, CA), “Guide to Protein Purification” in Methods in Enzymology (M. P. Deutscher, ed., 1990, Academic Press, Inc.), PCR Protocols: A Guide to Methods and Applications (Innis, et al., 1990, Academic Press, San Diego, CA), Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed. (R. I. Freshney, 1987, Liss, Inc., New York, NY), Gene Transfer and Expression Protocols (E. J. Murray, ed., pp. 109-128, The Humana Press Inc., Clifton, NJ), and the Ambion 1998 Catalog (Ambion, Austin, TX).
As used herein, the singular forms “a”, “an” and “the” include plural referents unless the context clearly dictates otherwise.
As used herein, the amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V).
Any N-terminal methionine residue may be present in the polypeptides of the disclosure, or may be deleted and not considered when determining percent identity.
All embodiments of any aspect of the disclosure can be used in combination, unless the context clearly dictates otherwise.
Unless the context clearly requires otherwise, throughout the description and the claims, the words ‘comprise’, ‘comprising’, and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”. Words using the singular or plural number also include the plural and singular number, respectively. Additionally, the words “herein,” “above,” and “below” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of the application.
In a first aspect, the disclosure provides polypeptides comprising an amino acid sequence at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of the amino acid sequences listed in Tables 1, 2, and 3 (SEQ ID NO:1-43, 47, 55, 59, 65, 88, 89, 95, and 98), wherein 0-7 residues at the N and/or C-terminus are optional and may be absent and not considered when determining percent identity.
As disclosed by the inventors in the examples, the polypeptides are capable of associating with other protein chains to form obligate ‘ABC’ heterotrimers (Table 1), which do not form binary AB, AC, or BC heterodimers when all three chains are present; obligate heterodimers (Table 2); or obligate heterotetramers (Table 3). The polypeptides can be used to create cooperative heteropolymers that can assemble as freestanding helical units or as hubs in larger designed assemblies. Their small base size and high soluble expression make them useful, for example, in biological scaffolding applications involving the recruitment or display of three different proteins. Fusion of polypeptide chains to other proteins for scaffolding applications may be through direct covalent attachment or simply through flexible linker sequences.
The polypeptides have the ability to sustain rigid helical fusions to monomeric repeat proteins (as described for the fusion proteins of the disclosure), enabling the incorporation of arms that can be extended to provide three new and unique elongated connection points. The heteropolymers can be used as interaction hub building blocks between chains in larger closed structures generated through geometry-aware rigid fusion to themselves and other designed proteins. A notable feature is that these new assemblies can continue to be built out recursively, unlike previously designed rings, as they have outward facing chains with two or more free termini. The number and orientational accessibility of the chain termini found in the heteropolymers enable the display of multiple distinct functional domains for signaling, recruitment and engagement of specific cells, the study of protein-protein interactions, and other applications. As illustrated by the larger ring designs, the modularity and orthogonality of the designed protein interfaces make it possible to combine the heterotrimers with other heteromeric building blocks to construct more diverse nanostructures.
As used herein, “wherein 0-7 residues at the N and/or C-terminus are optional and may be absent” means that the N-terminal 1, 2, 3, 4, 5, 6, or 7 amino acid residues may be absent, and/or the C-terminal 7, 6, 5, 4, 3, 2, or 1 amino acid residues may be absent, and are not considered when determining percent identity.
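The percent-identity rule with optional terminal residues can be illustrated with a minimal sketch. It assumes an ungapped, position-by-position comparison in which the query is the reference with 0-7 residues absent from either terminus (plus any substitutions); the function name `percent_identity_with_optional_termini` and the reference sequence are hypothetical, and the disclosure does not specify an alignment algorithm. An optional N-terminal methionine could be handled the same way by also trying the query with its first residue removed; that step is omitted here.

```python
def percent_identity_with_optional_termini(
    query: str, reference: str, max_trim: int = 7
) -> float:
    """Best ungapped percent identity of `query` against `reference`.

    Up to `max_trim` residues may be absent from each terminus of the
    reference; absent residues are left out of the comparison, so they
    count toward neither the matches nor the denominator.
    """
    best = 0.0
    for n_trim in range(max_trim + 1):
        for c_trim in range(max_trim + 1):
            core = reference[n_trim:len(reference) - c_trim]
            # This sketch only compares equal-length, ungapped segments.
            if not core or len(core) != len(query):
                continue
            matches = sum(q == r for q, r in zip(query, core))
            best = max(best, 100.0 * matches / len(core))
    return best


# Hypothetical reference sequence (not taken from the Sequence Listing).
REFERENCE = "MSKEELLKKALELAKRG"

# Query lacking two N-terminal and one C-terminal residue, with one R->K change.
query = "KEELLKKALELAKK"
print(round(percent_identity_with_optional_termini(query, REFERENCE), 2))  # 92.86
```

In practice a standard pairwise aligner with gaps would usually be used and the terminal overhangs then discounted; the exhaustive ungapped loop here only keeps the counting rule easy to see.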
In one embodiment, interface residues in the reference sequence as identified in Table 4 are maintained in the polypeptides of the disclosure. The interface residues are those residues at the interface of a heteropolymer that includes the polypeptide. Table 4 shows the position of interface residues relative to the amino acid numbering of sequences shown in Tables 1-3. By way of non-limiting example, interface residues for SEQ ID NO:1 (DHT01 chain A) are residues 9, 10, 13, 17, 20, 24, 27, 30, 31, 34, 37, 38, and 41. Thus, in one embodiment, residues 9, 10, 13, 17, 20, 24, 27, 30, 31, 34, 37, 38, and 41 in the polypeptide are identical to those in SEQ ID NO:1. Those of skill in the art can readily determine the interface residues for the other sequences listed in Table 4.
As used herein, “maintained” means that the amino acid residue at a given position is identical to the residue at the corresponding position of the reference protein.
In another embodiment, residues capable of hydrogen-bonding as identified in Table 4 are maintained in the polypeptides of the disclosure. Table 4 shows the position of hydrogen-bonding residues (“HBNet residues”) relative to the amino acid numbering of sequences shown in Tables 1-3. By way of non-limiting example, hydrogen-bonding residues for SEQ ID NO:1 (DHT01 chain A) are residues 9, 13, 24, 37, and 41. Thus, in one embodiment, residues 9, 13, 24, 37, and 41 in the polypeptide are identical to those in SEQ ID NO:1. Those of skill in the art can readily determine the hydrogen-bonding residues for the other sequences listed in Table 4.
In a further embodiment, W, Y, and F residues in the reference sequence are maintained in the polypeptides of the disclosure. Table 4 identifies positions of W, Y, and F (large aromatic) residues in the reference sequences. By way of non-limiting example, large aromatic residues for SEQ ID NO:2 (DHT01 chain B) are residues 20, 27, and 48. Thus, in one embodiment, residues 20, 27, and 48 in the polypeptide are identical to those in SEQ ID NO:2. Those of skill in the art can readily determine the large aromatic residues for the other sequences listed in Table 4.
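Checking that interface, HBNet, or large aromatic positions are maintained reduces to comparing residues at a list of 1-based positions. The sketch below uses the positions quoted above for SEQ ID NO:1; it assumes the query is numbered exactly like the reference (no terminal residues removed), and the function `positions_maintained` and any sequences passed to it are hypothetical.

```python
from collections.abc import Sequence

# Positions quoted above for SEQ ID NO:1 (DHT01 chain A), 1-based numbering.
DHT01_A_INTERFACE = [9, 10, 13, 17, 20, 24, 27, 30, 31, 34, 37, 38, 41]
DHT01_A_HBNET = [9, 13, 24, 37, 41]


def positions_maintained(query: str, reference: str, positions: Sequence[int]) -> bool:
    """Return True if `query` carries the reference residue at every listed position.

    Assumes `query` is numbered like `reference`; positions follow the
    1-based numbering used in Table 4.
    """
    for pos in positions:
        idx = pos - 1  # convert 1-based table numbering to a Python index
        if idx >= len(query) or idx >= len(reference):
            return False  # the position is absent, so it cannot be maintained
        if query[idx] != reference[idx]:
            return False
    return True
```

The same function serves all three embodiments; only the position list changes (interface residues, HBNet residues, or W/Y/F positions).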
In one embodiment, mutations in polypeptide residues relative to the reference sequence are conservative amino acid substitutions. As used herein, a “conservative amino acid substitution” means a given amino acid can be replaced by a residue having similar physicochemical characteristics, e.g., substituting one aliphatic residue for another (such as Ile, Val, Leu, or Ala for one another), or substitution of one polar residue for another (such as between Lys and Arg; Glu and Asp; or Gln and Asn). Other such conservative substitutions, e.g., substitutions of entire regions having similar hydrophobicity characteristics, are known. Amino acids can be grouped according to similarities in the properties of their side chains (A. L. Lehninger, Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H). Alternatively, naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe. Particular conservative substitutions include, but are not limited to, Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu.
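A grouping such as the Lehninger classes above can be turned into a simple check of whether every mutation in a position-aligned variant is conservative. This is a minimal sketch with hypothetical function names; it uses only the first (Lehninger) grouping, so it is stricter than the explicit list in places (for example, Phe into Tyr is listed as a particular conservative substitution but crosses two Lehninger classes). The grouping is therefore passed in as a parameter rather than treated as a definition.

```python
# Side-chain classes from the Lehninger grouping quoted above.
LEHNINGER_GROUPS = {
    "non-polar": frozenset("AVLIPFWM"),
    "uncharged polar": frozenset("GSTCYNQ"),
    "acidic": frozenset("DE"),
    "basic": frozenset("KRH"),
}


def is_conservative(original: str, replacement: str, groups=LEHNINGER_GROUPS) -> bool:
    """True if two different residues fall in the same side-chain class."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in g and replacement in g for g in groups.values())


def all_changes_conservative(query: str, reference: str, groups=LEHNINGER_GROUPS) -> bool:
    """Check position-aligned sequences: every mismatch must be conservative."""
    if len(query) != len(reference):
        raise ValueError("sequences must be position-aligned and of equal length")
    return all(
        q == r or is_conservative(r, q, groups) for q, r in zip(query, reference)
    )
```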
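In another embodiment, the disclosure provides fusion proteins comprising (a) a polypeptide of any embodiment or combination of embodiments; (b) a second polypeptide; and (c) an optional amino acid linker linking the polypeptide and second polypeptide.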
As described herein, the polypeptides may be fused to any further polypeptide domain suitable for an intended purpose. As described above, the polypeptides have the ability to sustain rigid helical fusions to monomeric repeat proteins (i.e., exemplary second polypeptides), enabling the incorporation of arms that can be extended to provide three new elongated connection points. In other embodiments, the second polypeptide may be, for example, an antigen or other protein to be displayed on heteropolymers formed from the fusion proteins. In another embodiment, the second polypeptide may comprise an antibody or antigen-binding fragment thereof. In one embodiment, the second polypeptide comprises a helical repeat protein or a protein with mixed alpha helix/beta sheet secondary structure. In various non-limiting embodiments, the fusion protein comprises an amino acid sequence at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of sequences in Tables 5 and 6 (SEQ ID NO:48-50, 52, 54, 58, 60, 61, 63, 64, 66, 67, 70, 72, 73, 74, 80, 83, 86, 87, 92, 94, 97, 99-102, 105, 106, 108, 110, 111, 113, 114, 116, 117, 119, and 120). These embodiments comprise fusions to monomeric repeat proteins, as described in the examples that follow.
In another aspect, the disclosure provides nucleic acids encoding the polypeptide or fusion protein of any embodiment or combination of embodiments of the disclosure. The nucleic acid sequence may comprise single-stranded or double-stranded RNA (such as an mRNA) or DNA in genomic or cDNA form, or DNA-RNA hybrids, each of which may include chemically or biochemically modified, non-natural, or derivatized nucleotide bases. Such nucleic acid sequences may comprise additional sequences useful for promoting expression and/or purification of the encoded polypeptide, including but not limited to polyA sequences, modified Kozak sequences, and sequences encoding epitope tags, export signals, secretory signals, nuclear localization signals, and plasma membrane localization signals. Based on the teachings herein and the genetic code, those of skill in the art can readily determine nucleic acid sequences that encode the polypeptides of the disclosure.
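Because the genetic code is degenerate, many DNA sequences encode any given polypeptide. As an illustration only, the sketch below back-translates a protein using one fixed codon per residue from the standard genetic code and appends a TAA stop codon; the codon choices, the optional start methionine, and the function name `back_translate` are assumptions for this example, not codon-optimized sequences from the disclosure, and no regulatory or tag sequences are added.

```python
# One fixed codon per residue (standard genetic code); illustrative, not codon-optimized.
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP = "TAA"


def back_translate(protein: str, add_start_met: bool = True) -> str:
    """Return one DNA coding sequence for `protein`, ending in a stop codon."""
    seq = protein.upper()
    if add_start_met and not seq.startswith("M"):
        seq = "M" + seq  # an N-terminal Met may be present (see above)
    try:
        codons = [CODON_FOR[aa] for aa in seq]
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None
    return "".join(codons) + STOP


print(back_translate("KEL"))  # ATGAAAGAACTGTAA
```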
In a further aspect, the disclosure provides expression vectors comprising the nucleic acid of any aspect of the disclosure operatively linked to a suitable control sequence. “Expression vector” includes vectors that operatively link a nucleic acid coding region or gene to any control sequences capable of effecting expression of the gene product. “Control sequences” operably linked to the nucleic acid sequences of the disclosure are nucleic acid sequences capable of effecting the expression of the nucleic acid molecules. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences, and the promoter sequence can still be considered “operably linked” to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, termination signals, and ribosome binding sites. Such expression vectors can be of any type, including but not limited to plasmid and viral-based expression vectors. The control sequence used to drive expression of the disclosed nucleic acid sequences in a mammalian system may be constitutive (driven by any of a variety of promoters, including but not limited to, CMV, SV40, RSV, actin, EF) or inducible (driven by any of a number of inducible promoters including, but not limited to, tetracycline, ecdysone, steroid-responsive). The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In various embodiments, the expression vector may comprise a plasmid, viral-based vector, or any other suitable expression vector.
In another aspect, the disclosure provides host cells that comprise the nucleic acids, expression vectors (i.e., episomal or chromosomally integrated), polypeptides, fusion proteins, or compositions disclosed herein, wherein the host cells can be either prokaryotic or eukaryotic. The cells can be transiently or stably engineered to incorporate the nucleic acids or expression vectors of the disclosure, using techniques including but not limited to bacterial transformation, calcium phosphate co-precipitation, electroporation, or liposome-mediated, DEAE dextran-mediated, polycation-mediated, or viral-mediated transfection.
In a further aspect, the disclosure provides heterotrimers, heterodimers, or heterotetramers comprising polypeptides that comprise an amino acid sequence at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to an amino acid sequence combination listed in Table 1, Table 2, Table 3, Table 5, or Table 6, wherein 0-7 residues at the N and/or C-terminus of the polypeptides are optional and may be absent and not considered when determining percent identity.
As described above, the heteropolymers can be used as interaction hub building blocks between chains in larger closed structures generated through geometry-aware rigid fusion to themselves and other designed proteins. A notable feature is that these new assemblies can continue to be built out recursively, unlike previously designed rings, as they have outward facing chains with two or more free termini. The number and orientational accessibility of the chain termini found in the heteropolymers enable the display of multiple distinct functional domains for signaling and other potential applications. As illustrated by the larger ring designs, the modularity and orthogonality of the designed protein interfaces make it possible to combine the heterotrimers with other heteromeric building blocks to construct more diverse nanostructures.
The far right column of each of Tables 1-3 and 5-6 provides the SEQ ID NOs of the polypeptide combinations that form the heterotrimers, heterodimers, or heterotetramers of this aspect. For example, the far right column of the first row of Table 1 shows that polypeptides that comprise an amino acid sequence at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequences of SEQ ID NOS 1, 2, and 3 form an obligate heterotrimer.
By way of further example, an entry in Table 5 shows that polypeptides that comprise an amino acid sequence at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequences of (a) SEQ ID NOS 54, 55, and 9; or (b) SEQ ID NOS 63, 55, and 9; or (c) SEQ ID NOS 66, 55, and 9; or (d) SEQ ID NOS 72, 55, and 9, each form an obligate heterotrimer.
Other such combinations are clearly noted in the Tables, and the relevant combinations of polypeptides to form the heterotrimers, heterodimers, and heterotetramers will be clear to those of skill based on the teachings herein.
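The far right columns of the tables amount to a lookup from a set of chains to an assembly. A minimal sketch of that data structure, populated only with the two examples quoted above (the dictionary and function names are hypothetical; a full version would load every row of Tables 1-3 and 5-6), matches chains as a multiset so that chain order does not matter and repeated chains would still be handled:

```python
from collections import Counter

# Only the two examples spelled out in the text, keyed by where they appear.
CHAIN_COMBINATIONS = {
    "Table 1, first row": [(1, 2, 3)],
    "Table 5 entry": [(54, 55, 9), (63, 55, 9), (66, 55, 9), (72, 55, 9)],
}


def find_combination(seq_ids):
    """Return (table, combination) pairs whose chains match `seq_ids` as a multiset."""
    wanted = Counter(seq_ids)  # order-free comparison that also copes with repeated chains
    return [
        (table, combo)
        for table, combos in CHAIN_COMBINATIONS.items()
        for combo in combos
        if Counter(combo) == wanted
    ]


print(find_combination([9, 55, 63]))  # [('Table 5 entry', (63, 55, 9))]
```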
In one embodiment, the heterotrimers, heterodimers, or heterotetramers serve as interaction hub building blocks between chains in larger closed structures. As shown in the examples, the inventors demonstrate use of the heterotrimers, heterodimers, or heterotetramers to produce four-chain A2B2 heterotetramers, nine-chain A3B3C3 nonamers, and twelve-chain A4B4C4 dodecamers, generated through geometry-aware rigid fusion to themselves and other designed proteins. A notable feature here is that these new assemblies can continue to be built out recursively, as they have outward facing chains with two or more free termini. The number and orientational accessibility of the chain termini found in both the base heterotrimers and the higher order assemblies enable the display of multiple distinct functional domains for signaling and other potential applications. The modularity and orthogonality of the designed protein interfaces make it possible to combine the heterotrimer, heterodimer, or heterotetramer with other building blocks to construct more diverse nanostructures.
The disclosure further provides kits comprising one or more polypeptide, fusion protein, nucleic acid, expression vector, host cell, and/or heterotrimer of any embodiment herein.
The disclosure also provides methods for use of the polypeptide, fusion protein, nucleic acid, expression vector, host cell, heterotrimer, and/or kit of any embodiment herein for any suitable purpose, including but not limited to displaying antigens and administering to a subject to elicit an immune response, acting as interaction hub building blocks between chains in larger closed structures, etc. For example, the heterotrimers disclosed herein permit attachment of multiple, different monomeric proteins at each free terminus (as described in the examples) and allow them to be spatially presented in a desired arrangement.
The disclosure also provides methods for computational design of the polypeptide or fusion protein of any embodiment herein, comprising any method or steps as disclosed in the examples that follow.
The de novo design of three protein chains which associate to form an obligate ‘ABC’ heterotrimer, but not binary AB, AC and BC heterodimers, is an outstanding challenge for protein design. We designed helical heterotrimers with specificity conferred by buried hydrogen bond networks and large aromatic residues to enhance shape complementary packing. We obtained ten designs for which all three chains cooperatively assembled into heterotrimers with few or no other species present. Crystal structures of a helical bundle heterotrimer and extended versions, with helical repeat proteins fused to individual subunits, showed all three chains assembling in the designed orientation. We used these heterotrimers as building blocks to construct larger cyclic oligomers, which were structurally validated by electron microscopy. Our three-way junction designs provide new routes to complex protein nanostructures and enable the scaffolding of three distinct ligands for modulation of cell signaling.
Over half of the proteins found in the Protein Data Bank (PDB) assemble to form homo-oligomers or hetero-oligomers. The most abundant hetero-oligomers in nature are heterodimers. Heterotrimers are less widespread. ABC heterotrimers are a difficult design challenge because of the interaction cooperativity required for the three unique components to assemble only into the desired structure. Although an ABC heterotrimer only has one extra component compared to a heterodimer, the latter has only 4 alternate species (A, B, AA, BB), while an ABC heterotrimer has 15 alternate species (A, B, C, AB, AC, BC, AAA, BBB, CCC, AAB, ABB, AAC, ACC, BBC, BCC). We set out to design cooperatively assembling heterotrimers in which only the ABC species forms. We reasoned that heterotrimers could be designed by burying polar residues capable of making hydrogen bond networks in the core and by incorporating large aromatic residues for implicit negative design22 against non-ABC assemblies—such sidechains can complicate core packing in undesirable alternative states by causing steric clashes or large cavities.
The simplest case of an ABC heterotrimer is a coiled coil, in which each chain is a single helix.
We obtained genes encoding 20 coiled coil designed heterotrimers (DHTs) in a tricistronic E. coli expression vector, with one chain carrying a 6×His-tag and a second chain carrying Strep-tag II; we expressed the proteins and purified them by immobilized metal affinity chromatography (IMAC) and Strep-Tactin pull-down. All designs were soluble, and for 6 designs all three components were observed by liquid chromatography-mass spectrometry (LC-MS) after both pull-down approaches. Of these, 5 eluted as monodisperse peaks by SEC, but only 1 design (DHT01) was an exclusive ABC heterotrimer by native mass spectrometry (nMS) and had good agreement with the design model via small angle X-ray scattering (SAXS).
A helical wheel representation of DHT01 (
Extending an ABC Heterotrimeric Coiled Coil with Repeat Protein Arms
To determine if DHT01 could serve as an organizing hub for larger protein assemblies, monomeric designed helical repeat (DHR) proteins30 were rigidly fused onto available (N and C) termini using the Rosetta™ HelixFuse protocol, with each rigid fusion being referred to as an “arm”.
To explore the ability to design heterotrimers when larger interfaces are available for installing hydrogen bond networks, we extended our computational approach to helical hairpin units. We experimented with two approaches: first, sampling superhelical parameters for all six helices at once (see Methods), and second, making the search more tractable by first sampling parameters for 4 of the helices, filtering, and then adding on the two remaining helices.
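Why the staged search is more tractable can be seen with a back-of-the-envelope count. The sketch assumes, purely for illustration, that each helix has the same number k of discrete superhelical parameter settings sampled on an independent grid and that a fixed number of four-helix candidates survive the filter; none of these numbers come from the Methods, and the function names are hypothetical.

```python
def full_search_size(k: int, n_helices: int = 6) -> int:
    """Bundles evaluated when all helices are sampled jointly on a grid."""
    return k ** n_helices


def staged_search_size(k: int, kept: int, first: int = 4, total: int = 6) -> int:
    """Bundles evaluated when `first` helices are sampled, `kept` survivors pass
    the filter, and the remaining helices are sampled on top of each survivor."""
    return k ** first + kept * k ** (total - first)


# Illustrative numbers only: 20 settings per helix, 1,000 filtered survivors.
print(full_search_size(20))          # 64000000
print(staged_search_size(20, 1000))  # 560000
```

With these made-up numbers the staged search evaluates over 100-fold fewer bundles; the trade-off is that the intermediate filter can discard four-helix arrangements that would have led to good six-helix solutions.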
Genes encoding 85 heterotrimers in a tricistronic expression vector were obtained, and the proteins were expressed in E. coli and purified via IMAC with only one chain having the 6×His-tag. Nine of the designs (
We decided to test rigid fusions sequentially to evaluate the effect of each fusion independently on ABC heterotrimer formation. Using this strategy, we found that of the first four helical bundle heterotrimers shown in
To facilitate downstream higher-order assembly design, we investigated whether the ABC heterotrimer could be reconstituted from separately expressed individual chains. Because of the hydrophobic nature of the core, each chain of the heterotrimer can self-associate when expressed separately. We reasoned that the designed ABC heterotrimer state would likely be lower in free energy than possible off-target homo-oligomeric species, and hence that heat annealing could promote assembly to the design target state. To test this idea, equimolar amounts of individually purified chains A, B, and C of DHT03_2arm_A21/B21/C were mixed together and run through an annealing protocol (see Methods) to allow these interfaces to reassemble in the presence of all components. A monodisperse SEC peak corresponding to the ABC heterotrimer state was observed, as confirmed by overlap with the SEC trace of the co-expressed tricistronic version.
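Setting up an equimolar mix of separately purified chains (here three chains, or five chains at 1:1:1:1:1 in the experiment with DHD131) is a standard C1V1 = C2V2 dilution. A minimal sketch, assuming stock concentrations are known in micromolar and the goal is the same final concentration of every chain in a fixed volume; the function name, chain labels, concentrations, and volumes are hypothetical, and the annealing protocol itself (see Methods) is not modeled.

```python
def equimolar_mix(
    stocks_uM: dict[str, float], final_each_uM: float, final_volume_uL: float
) -> dict[str, float]:
    """Volumes (uL) of each chain stock, plus buffer, for an equimolar mixture.

    Applies C1*V1 = C2*V2 to every chain, so each chain ends up at
    `final_each_uM` in `final_volume_uL`.
    """
    volumes: dict[str, float] = {}
    for chain, stock in stocks_uM.items():
        if stock <= 0:
            raise ValueError(f"stock concentration for {chain} must be positive")
        volumes[chain] = final_each_uM * final_volume_uL / stock
    used = sum(volumes.values())
    if used > final_volume_uL:
        raise ValueError("stocks are too dilute for this final volume")
    volumes["buffer"] = final_volume_uL - used
    return volumes


# Hypothetical stocks of chains A, B, and C mixed to 10 uM each in 100 uL.
print(equimolar_mix({"A": 200.0, "B": 100.0, "C": 50.0}, 10.0, 100.0))
# {'A': 5.0, 'B': 10.0, 'C': 20.0, 'buffer': 65.0}
```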
To test whether the DHT03 components could properly assemble in the presence of other helical hairpin-containing chains, as would be the case for more complex assemblies, we chose DHD131, a designed helical hairpin unit heterodimer containing buried hydrogen bond networks12. We found that our separate expression and reconstitution approach via annealing succeeded with this design: chains A and B of DHD131 could be separately expressed and purified, and when the proteins were mixed together at an equimolar ratio and annealed, a monodisperse heterodimer peak was observed by SEC. We evaluated simultaneous reconstitution of DHD131 and DHT03_3arm_A21/B21/C82, since the size difference between the two constructs (if correctly assembled) should be detectable by SEC. We separately expressed and individually purified the two chains of the heterodimer and the three chains of the heterotrimer, mixed them at a 1:1:1:1:1 ratio, and carried out annealing and SEC as described above. The two major species were found to be the DHD131 heterodimer and the DHT03 heterotrimer by nMS. Some DHD131 BB homodimer was also detected, consistent with previous nMS analyses of the heterodimer alone12. Thus, the heterotrimer chains can come together to form the intended ABC species even in the presence of potentially confounding additional helical hairpins. The ability to simultaneously assemble multiple hetero-oligomers from individual chains without interference opens the door to construction of diverse nanostructures with distinct multichain hubs.
We succeeded in solving three high-resolution structures for DHT03: the original base construct, a 1-arm version (1arm_A21/B/C), and an elongated 2-arm version (2arm_A21/B21/C long). The first crystal structure was solved at 2.35 Å resolution. The design model, shown in darker colors, has an overall Cα RMSD of 2 Å to the crystal structure, shown in lighter colors, with the largest deviations upon individual chain superposition in chain C. Many of the hydrogen bonds in the design model are not present in the crystal structure, but the overall placement of these residues is relatively close in space to the design model, and the ABC heterotrimer still assembles with buried polar groups in the core (water molecules were not detected). The placement of these residues also appears to be effective in specifying orientation, as the chain C inversion in the design model matches that of the crystal structure. There are two possible explanations for the deviations in hydrogen bonding between the crystal structure and the design model. The first is that optimization of non-polar packing in the actual structure distorts the protein slightly so that many of the designed hydrogen bonds do not form. In support of this, the crystal structure has lower Rosetta™-computed energy than the design model due to improvements in sidechain rotamer preferences, Lennard-Jones (LJ)/van der Waals interactions, and solvation energies. On the other hand, Rosetta™ does not accurately capture the high energetic cost of buried unsatisfied hydrogen bonds, so the second possibility is that in solution or in neighboring low-energy states, small backbone adjustments allow more or all of the designed hydrogen bonds to form.
In any event, a lesson from this structure is that more extensive sampling around the designed conformation in the design process would be useful to determine whether nonpolar packing and hydrogen bond networks favor the same state; because of the short range nature of the hydrogen bond and the strong orientational constraints, even small distortions away from the design model can disrupt hydrogen bond networks.
The crystal structures of the one arm (biophysical data in
We were not able to crystallize the remaining nine base heterotrimers, and turned to protein structure prediction to supplement the SEC, nMS, and SAXS data presented above. For DHT03, 4 out of 5 AlphaFold-Multimer models generated heterotrimeric structures similar in overall topology; interestingly, the predicted structures are closer to the Rosetta™ design model than to the crystal structure when aligned across all Cα atoms. For eight of the other nine heterotrimer designs, AlphaFold-Multimer34,35 models were within 2 Å Cα RMSD of the design models (we note that the physically based Rosetta™ and AlphaFold should be largely orthogonal, so this level of agreement can be considered independent validation). While not as definitive as a crystal structure, the combination of structure prediction and biophysical data presented here strongly supports the design models.
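The model-to-structure comparisons above are reported as Cα RMSD after superposition. A common way to compute this is the Kabsch algorithm; a minimal NumPy sketch follows. It assumes the Cα coordinates of the two structures have already been extracted and paired one-to-one in the same residue order (file parsing and residue pairing are left out), the function name `kabsch_rmsd` is hypothetical, and it is not necessarily the tool used for the reported values. Per-chain superposition, as mentioned for chain C, would simply pass that chain's rows.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """C-alpha RMSD between two N x 3 coordinate sets after optimal superposition.

    Rows of P and Q must correspond to the same residues in the same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays with matching residue order")
    P_c = P - P.mean(axis=0)  # remove translation
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation mapping P_c onto Q_c
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# A rigidly rotated and translated copy superposes with essentially zero RMSD.
rng = np.random.default_rng(0)
model = rng.normal(size=(50, 3))
theta = np.deg2rad(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = model @ rot.T + np.array([5.0, -2.0, 1.0])
print(round(kabsch_rmsd(model, moved), 6))  # 0.0
```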
Core Residues with Buried Polar Groups are Essential for Mediating ABC Specificity
To better understand the importance of the buried polar groups for exclusive ABC heterotrimer assembly in DHT03, we systematically replaced the residues intended to form hydrogen bond networks. Starting from the crystal structure, the residues in each network were repacked with nonpolar residues using Rosetta™, while the remaining residues were kept intact. We refer to these variants as “sub_net1,” “sub_net2,” and “sub_net3,” and to the combination of all these substitutions as “sub_netall.” These four constructs were purified with the same IMAC pull-down approach as the parent heterotrimer design. In all four cases, the A, B, and C components were present in the eluate by LC-MS, but the SEC traces had broad, diffuse peaks, suggesting heterogeneity in the assemblies. The broadest SEC trace was observed for “sub_netall,” which has an entirely hydrophobic core. These substitution experiments suggest that the buried polar residues contribute to structural specificity.
To investigate the potential of the ABC heterotrimers to serve as multichain connection hubs in larger designed nanostructures, we employed the WORMS36,31 software, which searches over very large numbers of possible rigid fusions between building blocks to build up user-specified architectures. In a first round of architecture design, the chains of the 4-arm coiled coil ABC heterotrimers from
To expand the range of geometries, and to enable nanostructure assembly from individual components, we sought to design a second round of cyclic structures using the propagated 2-arm ABC heterotrimer crystal structure together with alpha/beta (LHD) heterodimers37, as shown schematically in
As is evident from the images in
We have demonstrated that computational design can be used to create cooperative ABC heterotrimers that assemble either as free-standing helical units or as hubs in larger designed assemblies. Their small base size and high soluble expression make them useful for biological scaffolding applications involving the recruitment or display of three different proteins. Their ability to sustain rigid helical fusions to monomeric repeat proteins enables the incorporation of arms that can be extended to provide three new elongated connection points. We show that the heterotrimers can serve as interaction hub building blocks between chains in larger closed structures (four-chain A2B2 heterotetramers, nine-chain A3B3C3 nonamers, and twelve-chain A4B4C4 dodecamers) generated through geometry-aware rigid fusion to themselves and other designed proteins. Notably, unlike previously designed rings, these new assemblies can be built out recursively, because they have outward-facing chains with two or more free termini. The number and orientational accessibility of the chain termini in both the base heterotrimers and the higher-order assemblies enable the display of multiple distinct functional domains for signaling and other potential applications. As illustrated by the larger ring designs, the modularity and orthogonality of the designed protein interfaces make it possible to combine the heterotrimers with other heteromeric building blocks to construct more diverse nanostructures.
Our crystal structures together with the mutational data pose a fascinating fundamental biophysics puzzle. On the one hand, the crystal structure of DHT03 shows that many of the designed hydrogen bond network residues are not making hydrogen bonds because of small local distortions in the structure; instead, they are buried without hydrogen-bonding partners, which is expected to be extremely destabilizing. On the other hand, while DHT03 assembles exclusively into the designed heterotrimer, mutants in which the hydrogen bond networks have been replaced by nonpolar residues appear to adopt a range of alternative states, suggesting that the hydrogen bond networks play an important role in conferring structural specificity, as in our design concept. These residues may confer specificity even without making hydrogen bonds, because alternative states could have still higher energies when they are present; alternatively, closely related states in which the designed hydrogen bonds do form may be populated in solution and favor the designed assembly.
With modular heterotrimeric building blocks such as those developed in this paper, a much wider range of asymmetric assemblies become accessible, as each additional heterotrimeric interface introduces new centers for asymmetric branching (
For single-helix heterotrimers, three helices were fixed at supercoil phases of 0°, 120°, and 240° to generate chains A, B, and C, respectively. For a parallel orientation, all helix termini pointed in the same direction; for an antiparallel orientation, the third helix (supercoil phase 240°) was inverted. To create a left-handed coiled coil, the supercoil twist and helical twist were fixed at the ideal values of −2.85° and 102.85° per residue, respectively. The helical phase (ΔΦ1) was sampled from −100° to 100° in 20° steps. The distance of each helix from the z-axis (supercoil radius, R) was sampled from 6.5 to 7.5 Å in 0.25 Å steps. The Z-offset was fixed at 0 for the first helix and sampled at −1.5, 0, and 1.5 Å for the second and third helices to account for the rise per residue. All helices were sampled independently across all parameters. Each helix was 77 residues long.
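The sampling grid described above can be written out explicitly to see how many backbones it implies. This is a minimal sketch, assuming that "sampled independently" means every combination of per-helix parameters is generated (a full Cartesian product) and that parallel and antiparallel backbones are counted separately; the names are hypothetical, and the actual backbone generation (building coordinates from these parameters) is not shown.

```python
import itertools


def inclusive_grid(start: float, stop: float, step: float) -> list:
    """Evenly spaced values from start to stop, including both endpoints."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 4) for i in range(n)]


PHASES = inclusive_grid(-100, 100, 20)       # helical phase ΔΦ1, degrees
RADII = inclusive_grid(6.5, 7.5, 0.25)       # supercoil radius R, Å
Z_OFFSETS = [-1.5, 0.0, 1.5]                 # Å; the first helix stays at 0
SUPERCOIL_PHASES = (0.0, 120.0, 240.0)       # fixed: chains A, B, C
ORIENTATIONS = ("parallel", "antiparallel")  # antiparallel inverts the 240° helix


def helix_options(z_values):
    """All (ΔΦ1, R, z-offset) combinations for one helix."""
    return list(itertools.product(PHASES, RADII, z_values))


HELIX_GRIDS = [helix_options([0.0]), helix_options(Z_OFFSETS), helix_options(Z_OFFSETS)]


def count_backbones() -> int:
    """Number of backbones if every helix is sampled independently."""
    n = len(ORIENTATIONS)
    for grid in HELIX_GRIDS:
        n *= len(grid)
    return n


def iter_backbones():
    """Lazily yield (orientation, per-helix parameters) without storing the full set."""
    for orientation in ORIENTATIONS:
        for params in itertools.product(*HELIX_GRIDS):
            yield orientation, tuple(zip(SUPERCOIL_PHASES, params))
```

Under these assumptions there are 11 phases and 5 radii, so each helix has 55 (first helix) or 165 (second and third helices) options, giving 2 × 55 × 165 × 165 = 2,994,750 backbones. Using a generator keeps memory flat while such a set is scored.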
For the 6-helix heterotrimers in the first sampling approach, the supercoil radius (R) and helical phase (ΔΦ1) were sampled independently for parallel backbones, with supercoil phases fixed at 0°, 120°, and 240° for the three inner helices and at 60°, 180°, and 300° for the three outer helices. Applying the parameters from the coiled coil search to all six helices would have required sampling hundreds of billions of backbones or more simultaneously. Instead, the helical phase (ΔΦ1) was sampled from 0° to 100° in 20° steps, and the supercoil radius (R) was sampled in 0.5 Å steps from 6.5 to 7.5 Å for the three inner helices and from 12.5 to 13.5 Å for the three outer helices.
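The gap between the naive and the restricted six-helix searches can be checked with a short count. This is a sketch under stated assumptions: the naive count reuses the coiled-coil grid (11 phases, 5 radii, 3 Z-offsets with the first helix fixed at 0) for all six helices, and the restricted count assumes no Z-offset sampling, since none is mentioned for this approach. Only the number of grid points matters here, not the radius values; the function names are hypothetical.

```python
def n_values(start: float, stop: float, step: float) -> int:
    """Number of grid points from start to stop, inclusive."""
    return int(round((stop - start) / step)) + 1


def naive_six_helix_count() -> int:
    """Coiled-coil grid (11 phases x 5 radii x 3 z-offsets) reused for all six helices."""
    per_helix = n_values(-100, 100, 20) * n_values(6.5, 7.5, 0.25)  # 55
    first = per_helix             # z-offset fixed at 0 for the first helix
    others = per_helix * 3        # three z-offsets for each remaining helix
    return first * others ** 5


def restricted_six_helix_count() -> int:
    """First-approach grid: 6 phases; 3 inner radii; 3 outer radii; no z sampling."""
    phases = n_values(0, 100, 20)                 # 0, 20, ..., 100
    inner = phases * n_values(6.5, 7.5, 0.5)      # options per inner helix
    outer = phases * n_values(12.5, 13.5, 0.5)    # options per outer helix
    return inner ** 3 * outer ** 3
```

With these assumptions the naive grid gives 55 × 165^5, about 6.7 × 10^12 backbones, consistent with "hundreds of billions or more", whereas the restricted grid gives 18^6 = 34,012,224.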
In the second approach, only the three inner helices and one outer helix were sampled in the first round. The helical phase (ΔΦ1) was sampled from −100° to 100° in 20° steps. The three inner helices were sampled at distances (R) of 6.5 to 7.25 Å from the z-axis in 0.375 Å steps, and the fourth (outer) helix at 12.25 to 13.25 Å in 0.5 Å steps. The Z-offset was fixed at 0 for the first helix and sampled at −1.5, 0, and 1.5 Å for the second, third, and fourth helices. The fifth and sixth helices were then each sampled individually across the same three parameters as the fourth helix. For a parallel orientation, all helix termini pointed in the same direction; for an antiparallel orientation, the third helix (supercoil phase 240°) and the sixth helix (supercoil phase 300°) were inverted. Each helix was 35 residues long.
All polar residues and the acidic charged residues (ASP/GLU) were considered during the search. A total of 100,000 Monte Carlo trials were attempted, with extra rotamers included to increase sampling. At least two TRP/TYR residues were required to participate in the networks. For single-helix backbones, hydrogen bond networks were searched in every other heptad, giving a core N-P-N-P-N-P-N-P-N heptad search pattern (N = nonpolar, P = polar). Networks were required to span all three helices and to contain at least three residues, with a total of three networks across the heterotrimer.
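To make the alternating heptad search pattern concrete, the sketch below maps an N/P heptad pattern onto residue ranges of a 77-residue (11-heptad) helix. The text does not say where the 9-heptad pattern sits on the helix; the sketch assumes, hypothetically, one nonpolar flanking heptad at the N-terminus (`first_heptad=1`), which leaves one flanking heptad at each end. The function name is illustrative only.

```python
HEPTAD = 7


def polar_heptad_ranges(pattern: str, helix_length: int, first_heptad: int) -> list:
    """1-based inclusive residue ranges of the heptads marked 'P' in `pattern`.

    `pattern` is a hyphen-separated string of N (nonpolar) / P (polar) heptads;
    `first_heptad` is the 0-based heptad index on the helix where the pattern starts.
    """
    kinds = pattern.split("-")
    if any(k not in ("N", "P") for k in kinds):
        raise ValueError("pattern may contain only N and P heptads")
    if first_heptad < 0 or first_heptad + len(kinds) > helix_length // HEPTAD:
        raise ValueError("pattern does not fit on the helix")
    ranges = []
    for offset, kind in enumerate(kinds):
        if kind == "P":
            h = first_heptad + offset  # 0-based heptad index on the helix
            ranges.append((HEPTAD * h + 1, HEPTAD * h + HEPTAD))
    return ranges


# Hypothetical placement: one nonpolar flanking heptad before the pattern.
print(polar_heptad_ranges("N-P-N-P-N-P-N-P-N", helix_length=77, first_heptad=1))
# → [(15, 21), (29, 35), (43, 49), (57, 63)]
```

The same helper accepts the shorter N-P-P-P-N pattern used for the second six-helix approach, with a correspondingly smaller helix length.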
For 6-helix backbones from the first parametric sampling approach, hydrogen bond networks were searched across the three middle heptads. Networks were required to span 5 or 6 helices and to contain at least six residues, with at least two networks in total. We also required that each network contain at least one tyrosine, tryptophan, aspartate, or glutamate. Overall, this approach proved difficult: the MC HBNet search was slower because of the larger search space across the middle heptads of all helices; few fully satisfied long networks met our requirements; and Rosetta™ design of the packing around 6-10 residue networks (compared with 3-4 residue coiled coil networks) was harder. We therefore focused on the second sampling approach.
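The acceptance criteria for six-helix networks can be expressed as a simple filter. This is only a sketch: the `Network` record and the function names are hypothetical and are not part of Rosetta's HBNet interface, and we assume that "at least two networks" means at least two networks that each pass the per-network criteria.

```python
from dataclasses import dataclass

ANCHOR_RESIDUES = frozenset("YWDE")  # Tyr, Trp, Asp, Glu


@dataclass(frozen=True)
class Network:
    """Hypothetical record: each residue is (helix_id, position, one-letter code)."""
    residues: tuple

    def helices(self) -> set:
        return {helix for helix, _, _ in self.residues}


def network_passes(net: Network, min_helices: int = 5, min_residues: int = 6) -> bool:
    """Span >= 5 of the 6 helices, contain >= 6 residues, and include a Y/W/D/E."""
    return (len(net.helices()) >= min_helices
            and len(net.residues) >= min_residues
            and any(aa in ANCHOR_RESIDUES for _, _, aa in net.residues))


def backbone_passes(networks, min_networks: int = 2) -> bool:
    """Keep a backbone only if at least `min_networks` networks pass individually."""
    return sum(network_passes(n) for n in networks) >= min_networks


# Toy networks (positions and residues are made up for illustration).
wide = Network(tuple((h, 20 + i, aa) for i, (h, aa) in enumerate(zip("ABCDEF", "STQYNS"))))
other = Network(tuple((h, 34 + i, aa) for i, (h, aa) in enumerate(zip("ABCDEA", "SWTNES"))))
narrow = Network((("A", 13, "S"), ("B", 13, "T"), ("C", 13, "Q")))
```

Here `wide` (six helices, with a Tyr) and `other` (five helices, with Trp and Glu) pass, while the three-residue `narrow` network does not, so `[wide, other, narrow]` is accepted and `[wide, narrow]` is rejected.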
For 6-helix backbones from the second sampling approach, three hydrogen bond networks were sought, each spanning the three inner helices and the newly built outer helix. This yields a core N-P-P-P-N heptad search pattern in which every helix contributes at least one residue to a hydrogen bond network. We hypothesized that fully hydrophobic heptads above and below the networks would help keep the hydrogen bonding residues in place.
Chains B and A of the coiled coil backbone were trimmed by two and four heptads, respectively, giving chain lengths of 49 residues (A), 63 residues (B), and 77 residues (C). Two helices of the 6-helix backbone (which would ultimately form one helical hairpin) were optionally trimmed by one heptad. Both sets of heterotrimer bases were packed using RosettaDesign™, with the six-helix backbones receiving an additional SAP mover and filter. Constraints were placed on the hydrogen bond network residues. The backbones were divided into layers (core, boundary, surface), with two packing rounds in total. A scoring term was also used to enforce at least two phenylalanines in the core. A round of FastDesign calling a Monte Carlo mover was applied to improve secondary structure shape complementarity, together with an upweighted short-range hydrogen bond scoring term to maintain proper helix formation. After the constraints on network residues were removed, a final minimization and repack of the sidechain rotamers was allowed.
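The trimmed chain lengths follow directly from removing whole heptads, as the short arithmetic check below shows. The text does not say which terminus was trimmed, so the sketch models only the lengths; the helper name is hypothetical.

```python
HEPTAD = 7


def trimmed_length(full_length: int, heptads_removed: int) -> int:
    """Helix length after removing whole heptads (which end is trimmed is not modeled)."""
    if heptads_removed * HEPTAD >= full_length:
        raise ValueError("cannot remove the whole helix")
    return full_length - HEPTAD * heptads_removed


# Single-helix coiled coil: 77-residue helices; A loses four heptads, B two, C none.
coiled_coil = {chain: trimmed_length(77, n) for chain, n in {"A": 4, "B": 2, "C": 0}.items()}
# Six-helix design: 35-residue helices; the two hairpin helices may lose one heptad.
hairpin_helix = trimmed_length(35, 1)
```

This reproduces the 49/63/77-residue chains stated above and gives 28-residue helices for the optionally trimmed hairpin.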
Six-helix heterotrimers are “closed” into three helical hairpin chains in either a clockwise (A-D; B-E; C-F) or counterclockwise (A-F; B-E; C-D) orientation. Short 2-5 residue loops with favorable ABEGO types were generated in Rosetta™. Loops were built from either available terminus on each chain, with the option of deleting up to 3 residues from, or adding 2 residues to, the existing termini as a starting point. Loops were minimized and filtered for low fragment RMSD and by PSIPRED secondary structure prediction.
Rosetta™ Helixfuse was used to rigidly join a library of DHRs to the heterotrimer bases by joining the termini of both constructs based on secondary structure overlap; up to one heptad could be deleted from the heterotrimer, and up to one full repeat from the DHR. The overlap with the lowest RMSD was accepted. A filter was then applied to check for clashes between the two joined proteins and to identify residues that needed to be redesigned; RosettaDesign™ was used to find optimal residues for the new helix. The best-scoring fusions according to lDDT29 were ordered after manual inspection. For DHT01 fusions, solutions were found at only 5 of the 6 available termini.
The generation of cyclic assemblies using WORMS fusion was performed following protocols presented in previous literature31,36. Input building blocks for components (heterotrimers, heterodimers, and monomeric DHRs) were curated in a WORMS database file, wherein each entry included specification of scaffold class, pdb file path, and the range of helical residues up or downstream from N and C termini that were accessible as splice sites.
Within the WORMS software, cyclic symmetry protocols were performed (C2, C3, C4, and C5), such that closure between cyclically propagated copies of the input building blocks could be found by coordinate-alignment between residues within the specified fusion-accessible helices from each splice partner.
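As a simplified geometric illustration of what "closure" means for a Cn ring: the rigid transform carrying one propagated copy onto the next must be a rotation by exactly 360/n degrees about the symmetry axis. The sketch below checks only the rotation angle, using trace(R) = 1 + 2 cos θ; a full closure check would also require the translation along the rotation axis to vanish. This is not WORMS's implementation, the relation of this quantity to WORMS's `close_err` is not stated in the text, and the function names are hypothetical.

```python
import numpy as np


def rotation_angle_deg(R: np.ndarray) -> float:
    """Rotation angle (0-180°) of a 3x3 rotation matrix, from trace(R) = 1 + 2 cos θ."""
    cos_theta = (np.trace(R) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def cn_angle_error_deg(R: np.ndarray, n: int) -> float:
    """Deviation of the inter-copy rotation from the ideal C_n step of 360/n degrees."""
    if n < 2:
        raise ValueError("cyclic order must be at least 2")
    return abs(rotation_angle_deg(R) - 360.0 / n)


def rot_z(deg: float) -> np.ndarray:
    """Rotation about the z-axis, used here only to build test transforms."""
    t = np.radians(deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t), np.cos(t), 0.0],
                     [0.0, 0.0, 1.0]])
```

For example, a 90° step closes a C4 ring exactly but misses C3 closure by 30°.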
For rings made from DHT01, the rigidly fused DHR arms were joined at N- and C-terminal helices, and an additional DHR repeat motif was used to bridge the two heterotrimer chains. For rings made from DHT03, the rigidly fused DHR arms both possessed N-terminal helices, necessitating their fusion to alpha-beta heterodimers that possessed two C-terminal DHR arms in order to close the cyclic geometry.
The outputs from the WORMS algorithm were filtered by three criteria: sequence length, internal clashing, and ring-closure error; these values are reported for each WORMS output in score files under the fields ‘chain_len’, ‘score0’, and ‘close_err’, respectively. The selected designs were then passed through rigid-backbone sequence design using RosettaDesign™ to optimize the local sequence around the newly formed helical junctions. The positions to be designed were the residues that either gained or lost contacts with neighboring residues upon helical fusion.
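The three-criterion filter on WORMS outputs can be sketched as follows, assuming the score-file rows have already been parsed (the score-file format is not reproduced here). The field names `chain_len`, `score0`, and `close_err` come from the text; the record type, the thresholds, the example rows, and the assumption that lower `score0` and `close_err` values are better are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WormsHit:
    """One WORMS output row; field names follow the score-file columns."""
    name: str
    chain_len: int
    score0: float     # internal clash score (assumed: lower is better)
    close_err: float  # ring-closure error (assumed: lower is better)


def select_hits(hits, max_chain_len: int, max_score0: float, max_close_err: float):
    """Apply the three filters, then rank survivors by closure error, then clashes."""
    kept = [h for h in hits
            if h.chain_len <= max_chain_len
            and h.score0 <= max_score0
            and h.close_err <= max_close_err]
    return sorted(kept, key=lambda h: (h.close_err, h.score0))


# Made-up rows and thresholds, for illustration only.
hits = [
    WormsHit("ring_01", 410, 0.0, 0.4),
    WormsHit("ring_02", 520, 0.0, 0.2),  # too long
    WormsHit("ring_03", 380, 3.0, 0.1),  # clashes
    WormsHit("ring_04", 395, 0.0, 0.1),
]
selected = select_hits(hits, max_chain_len=500, max_score0=1.0, max_close_err=0.5)
```

Only `ring_04` and `ring_01` survive, ordered by closure error; the survivors would then go on to sequence design at the junctions.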
The sequences of the individual chains of these designed complexes were submitted to AlphaFold2™ for monomeric structure prediction. Complexes in which the designed model of each constituent chain had low RMSD to its AlphaFold2™ prediction (prioritizing predicted models with high pLDDT scores) were selected for ordering.
Genes were codon optimized for bacterial expression and ordered in pET29b+ vector between the NdeI and XhoI restriction sites, with a T7 promoter and Kanamycin resistance gene. DHT02, 03, 04, and 05 had an additional de novo EHEE protein39 added to the N-term of the first chain to increase molecular weight for SDS-PAGE differentiation. Constructs for co-expression were ordered using ribosome binding sites (RBS), TAAGAAGGAGATATCATCATG (SEQ ID NO: 121) and/or TAAAGAAGGAGATATCATATG (SEQ ID NO: 122), in between the chains. The last chain in base and arm sequences had a cleavable N-term 6×His-tag, with recognition sequences for tobacco etch virus (TEV) or thrombin cleavage. A stop codon was added after the last chain. The ring designs were ordered in the same manner, except here the C-term His-tag was kept in frame to reduce overall DNA synthesis length. An additional Strep-tag II (WSHPQFEK (SEQ ID NO: 123)) to allow for Strep-Tactin™ pull-down was added to the N-term of chain B for DHT01 arms and DHT01 variant. Chains for individual expression had either an N-term or C-term 6×His-tag. Genes were ordered from Integrated DNA Technologies (IDT) or Genscript.
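For the co-expression constructs, each RBS spacer (SEQ ID NO: 121 and 122) begins with a TAA stop codon that ends the preceding chain and ends with the ATG start codon of the next chain. The sketch below assembles such a polycistronic insert and scans it for internal NdeI (CATATG) and XhoI (CTCGAG) sites, which matters if the insert is later moved by restriction cloning. Assumptions: each chain is supplied as an in-frame ORF beginning with ATG and lacking a stop codon; the final stop codon is taken to be TAA (the text does not specify which); tags are not modeled; the toy ORFs and function names are hypothetical.

```python
RBS_121 = "TAAGAAGGAGATATCATCATG"  # SEQ ID NO: 121
RBS_122 = "TAAAGAAGGAGATATCATATG"  # SEQ ID NO: 122
NDEI = "CATATG"
XHOI = "CTCGAG"


def assemble_insert(orfs: list, spacers: list, stop: str = "TAA") -> str:
    """Join ORFs with RBS spacers; each spacer supplies the next ORF's ATG."""
    if len(spacers) != len(orfs) - 1:
        raise ValueError("need exactly one spacer between consecutive ORFs")
    for orf in orfs:
        if not orf.startswith("ATG") or len(orf) % 3:
            raise ValueError("each ORF must start with ATG and be codon-aligned")
    parts = [orfs[0]]
    for spacer, orf in zip(spacers, orfs[1:]):
        parts.append(spacer)
        parts.append(orf[3:])  # drop the ATG already provided by the spacer
    parts.append(stop)
    return "".join(parts)


def find_sites(seq: str, site: str) -> list:
    """0-based start positions of every (possibly overlapping) occurrence of `site`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


# Toy three-chain insert (real chains would be full codon-optimized ORFs).
insert = assemble_insert(["ATGGCTAAA", "ATGGAAGGT", "ATGTCTGGT"], [RBS_121, RBS_122])
```

Because both spacers are 21 nt long, the reading frame is preserved across junctions. In this toy insert the scan reports a single NdeI hit, which lies at the 3′ end of SEQ ID NO: 122 itself (its final ATG is part of a CATATG), and no XhoI sites.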
Plasmids were transformed into either BL21(DE3) or Lemo21(DE3) E. coli cells using a 30 second heat shock protocol; the cells were added to autoinduction media40 and incubated at 225 rpm for 20-22 hours at 37° C. Cell pellets were obtained by centrifugation at 4000×g for 15 minutes, resuspended in 30 ml of lysis buffer (25 mM Tris-HCl pH 8.0, 300 mM NaCl, 20 mM imidazole, with added protease inhibitor PMSF), lysed by sonication at 85% amplitude with 15 second on/off cycles for a total of 2 minutes and 30 seconds, and then centrifuged at 24,000×g for 30 minutes. Cleared lysate was poured over a Ni-NTA column pre-equilibrated with 3 column volumes (CV) of lysis buffer, washed twice with 10 CV of wash buffer (25 mM Tris-HCl pH 8.0, 300 mM NaCl, 30 mM imidazole), and eluted with 6 CV of elution buffer (25 mM Tris-HCl pH 8.0, 300 mM NaCl, 250 mM imidazole). For Strep-tag purification, Strep-Tactin XT Superflow high capacity resin (IBA) was equilibrated with 2 CV of Buffer W (100 mM Tris-HCl pH 8.0, 150 mM NaCl, 1 mM EDTA); the IMAC eluate was then applied, the column was washed with 5 CV of Buffer W, and protein was eluted with 3 CV of Buffer BXT (100 mM Tris-HCl, pH 8.0; 150 mM NaCl; 1 mM EDTA; 50 mM biotin). For TEV or thrombin cleavage, imidazole was removed by buffer exchange into TBS buffer (25 mM Tris-HCl pH 8, 150 mM NaCl), and the enzyme was added for overnight cleavage. PMSF was used to stop thrombin cleavage, and a second IMAC pull-down was carried out after either cleavage reaction. The flow-through was collected and run through SEC.
Protein samples were mixed with 2× Laemmli Sample Buffer, heated for 10 minutes at 95° C., and loaded onto Tris-Glycine gels along with 5 μL of Bio-Rad's Precision Plus Protein Dual Xtra™ Protein Standard. Gels were run for 30 min at 200 V and then stained with Genscript's eStain™.
Separately expressed and individually purified components of the heterotrimer can be mixed at a 1:1:1 ratio in a PCR tube and incubated in a thermocycler. The mixture is heated at 90° C. for ~30 minutes and then cooled gradually, with the temperature dropping 2° C. every 30 seconds until 12° C. is reached (about 20 minutes of cooling in total). For DHT03_2arm_A21/B21/C and subsequent 3-arm heterotrimers, 100 μM of each chain was mixed for reconstitution, while ~30 μM of each chain was used for the DHT03 cyclic rings.
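The annealing program and the equimolar mix can be checked with a little arithmetic: (90 − 12)/2 = 39 cooling steps of 30 s each gives 19.5 minutes, consistent with the roughly 20 minutes stated above. The sketch below also computes per-chain volumes from C1·V1 = C2·V2; the stock concentrations and the 50 μL reaction volume are hypothetical, and only the 100 μM final concentration comes from the text.

```python
def anneal_ramp(start_c: float = 90.0, end_c: float = 12.0,
                step_c: float = 2.0, hold_s: float = 30.0):
    """Temperatures visited on the cooling ramp and the ramp length in minutes."""
    n_steps = int(round((start_c - end_c) / step_c))
    temps = [start_c - step_c * (i + 1) for i in range(n_steps)]
    return temps, n_steps * hold_s / 60.0


def mixing_volumes(stocks_uM: dict, final_uM: float, total_uL: float):
    """Per-chain volumes (C1·V1 = C2·V2) and buffer volume for an equimolar mix."""
    volumes = {name: final_uM * total_uL / c for name, c in stocks_uM.items()}
    buffer_uL = total_uL - sum(volumes.values())
    if buffer_uL < 0:
        raise ValueError("stocks are too dilute for this final concentration")
    return volumes, buffer_uL


temps, minutes = anneal_ramp()
# Hypothetical stocks (μM) mixed to 100 μM each in 50 μL.
volumes, buffer_uL = mixing_volumes({"A": 400.0, "B": 250.0, "C": 500.0}, 100.0, 50.0)
```

Here the ramp visits 88, 86, ..., 12 °C, and the mix needs 12.5, 20, and 10 μL of chains A, B, and C plus 7.5 μL of buffer.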
An AKTA PURE FPLC system was used. Heterotrimer bases, arm extensions, coiled coil C2 rings, and all constructs mentioned in the SI were run on a Cytiva Superdex 200 Increase 10/300 GL column, while C3/C4 rings made from DHT03 were run on a Cytiva Superose 6 Increase 10/300 GL column. The mobile phase was TBS (25 mM Tris-HCl pH 8.0 with either 100 mM or 150 mM NaCl). Samples were run at a flow rate of 0.75 ml/min, and 0.5 ml fractions were collected.
Mass Spectrometry
The fraction corresponding to the SEC peak was concentrated to 1-2 mg/ml and run on an Agilent 6230 LC/MS TOF through an AdvanceBio RP desalting column. Protein masses were determined by intact mass spectrometry in positive mode.
Samples were analyzed by online buffer exchange native mass spectrometry (nMS) to evaluate sample purity and accurately determine oligomeric states42. Multiple instruments were used because the analyses were carried out over the course of the protein design process. The mass spectrometers used for detection were a Q Exactive™ UHMR modified with a surface-induced dissociation device and an Exactive™ Plus EMR modified with a selection quadrupole and a surface-induced dissociation device (Thermo Fisher Scientific)43. The liquid chromatography systems used for buffer exchange were a Vanquish™ Duo UHPLC and a Dionex™ Ultimate 3000 HPLC (Thermo Fisher Scientific). A heated electrospray ionization source (HESI-II, Thermo Fisher Scientific) with a spray voltage of ~4 kV was used for ionization. Protein samples stored in Tris buffer were injected (0.1-2 μg) onto the LC system and exchanged into 200 mM ammonium acetate (mobile phase) at a flow rate of 100-200 μL·min−1 prior to ionization. Buffer exchange columns included self-packed columns with P6 polyacrylamide gel (Bio-Rad Laboratories) and prototype buffer exchange columns provided by Thermo Fisher Scientific (Sunnyvale, CA). Instrument parameters were optimized to allow ion transmission while minimizing unintentional ion activation. Higher-energy collisional dissociation (HCD) and source fragmentation voltages were used for de-adducting to allow accurate mass determination. Collisional dissociation leading to non-covalent fragmentation was frequently used to further validate oligomeric composition. Mass spectra were deconvolved and oligomeric assignments were made using UniDec (v5)44.
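The basic arithmetic behind assigning charge states and masses in native MS can be shown for two adjacent peaks of the same species. If the peak at charge z appears at m1 = (M + z·p)/z and its neighbor at charge z + 1 appears at m2 = (M + (z + 1)·p)/(z + 1), with p the proton mass, then z = (m2 − p)/(m1 − m2) and M = z·(m1 − p). This is the textbook relation only, not UniDec's deconvolution algorithm; the 21 kDa test mass and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def charge_from_adjacent_peaks(mz_low_charge: float, mz_high_charge: float) -> int:
    """Charge z of the higher-m/z peak, given the neighboring peak at charge z + 1.

    From m1 = (M + z·p)/z and m2 = (M + (z+1)·p)/(z+1): z = (m2 − p) / (m1 − m2).
    """
    if mz_low_charge <= mz_high_charge:
        raise ValueError("the lower-charge peak must have the larger m/z")
    return round((mz_high_charge - PROTON_MASS) / (mz_low_charge - mz_high_charge))


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M = z·(m/z − p) for a protonated [M + zH]^z+ ion."""
    return z * (mz - PROTON_MASS)


# Hypothetical 21,000 Da complex observed at charge states 8+ and 9+.
M = 21_000.0
m1 = (M + 8 * PROTON_MASS) / 8
m2 = (M + 9 * PROTON_MASS) / 9
```

Applying the two functions to `m1` and `m2` recovers z = 8 and the 21,000 Da input mass; a real spectrum contains a whole charge-state series, which deconvolution software fits jointly.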
Samples were run over SEC in PBS (phosphate-buffered saline, pH 7.4), concentrated to 0.25 mg/ml, and placed in a 1 mm pathlength cuvette. A JASCO J-1500 was used for wavelength scans (190-260 nm) at 25° C., 75° C., 95° C., and a final 25° C. Temperature melts from 25 to 95° C. were monitored at 222 nm.
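Raw CD signals are commonly converted to mean residue ellipticity so that helical content at 222 nm can be compared across constructs of different size. The sketch below uses the standard conversion [θ] = θ(mdeg) · MRW / (10 · l · c), with MRW = M/(N − 1), path length l in cm, and concentration c in mg/ml. The 0.1 cm path (1 mm) and 0.25 mg/ml come from the text; the signal, molecular weight, and residue count in the example are hypothetical.

```python
def mean_residue_ellipticity(theta_mdeg: float, mw_da: float, n_residues: int,
                             path_cm: float, conc_mg_ml: float) -> float:
    """Mean residue ellipticity in deg·cm²·dmol⁻¹ from a raw CD signal in mdeg."""
    if n_residues < 2:
        raise ValueError("need at least two residues (one peptide bond)")
    mrw = mw_da / (n_residues - 1)  # mean residue weight, Da per peptide bond
    return theta_mdeg * mrw / (10.0 * path_cm * conc_mg_ml)


# Hypothetical reading: −30 mdeg at 222 nm for a 30 kDa, 270-residue sample.
mre_222 = mean_residue_ellipticity(-30.0, 30_000.0, 270, path_cm=0.1, conc_mg_ml=0.25)
```

For this made-up sample, the result is about −1.34 × 10^4 deg·cm²·dmol⁻¹.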
Purified samples were run over SEC in 25 mM Tris, 150 mM NaCl, 2% glycerol buffer. Samples were concentrated using a 10K molecular weight cutoff (MWCO) benchtop spin concentrator, and the flow-through from the concentrator was used as a buffer blank. Samples in a low (1.5-2.5 mg/ml) and a high (3-6 mg/ml) concentration range were shipped to the SIBYLS High Throughput SAXS beamline at the Advanced Light Source in Berkeley, California. The X-ray wavelength (λ) was 1.27 Å and the sample-to-detector distance was 1.5 m, corresponding to a scattering vector q (q = 4π sin θ/λ, where 2θ is the scattering angle) range of 0.01 to 0.3 Å−1. A series of exposures was taken of each well in equal sub-second time slices: 0.3 second exposures over 10 seconds, giving 32 frames per sample. The collected data were processed using the SIBYLS SAXS FrameSlice server and analyzed using ScÅtter3. The scattering profiles were fit to the theoretical design models using the FoXS server.
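Two calculations underlie this kind of SAXS analysis: the scattering-vector definition given above, and a Guinier fit, ln I(q) = ln I0 − (Rg²/3)·q², which estimates the radius of gyration from the low-q region. The following is a minimal sketch, not ScÅtter's implementation. The synthetic data, the two-pass fitting strategy, and the qRg ≤ 1.3 cutoff (a common guideline for compact particles that may need adjusting for elongated assemblies) are assumptions; the 1.27 Å wavelength comes from the text.

```python
import numpy as np

WAVELENGTH_A = 1.27  # X-ray wavelength λ in Å, from the text


def q_from_two_theta(two_theta_rad, wavelength_a: float = WAVELENGTH_A):
    """Scattering vector magnitude q = 4π sin(θ)/λ, where 2θ is the scattering angle."""
    return 4.0 * np.pi * np.sin(np.asarray(two_theta_rad) / 2.0) / wavelength_a


def guinier_fit(q, intensity, max_qrg: float = 1.3):
    """Fit ln I(q) = ln I0 − (Rg²/3)·q²; returns (Rg in Å, I0).

    A first fit over all points gives a provisional Rg; the fit is then
    repeated on the points with q·Rg <= max_qrg.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    slope, intercept = np.polyfit(q ** 2, np.log(intensity), 1)
    if slope >= 0:
        raise ValueError("no Guinier decay in this q range")
    rg = np.sqrt(-3.0 * slope)
    mask = q * rg <= max_qrg
    if mask.sum() >= 3:
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        rg = np.sqrt(-3.0 * slope)
    return float(rg), float(np.exp(intercept))


# Synthetic, noise-free Guinier data for a hypothetical particle with Rg = 25 Å.
q = np.linspace(0.01, 0.05, 30)
intensity = 100.0 * np.exp(-(q * 25.0) ** 2 / 3.0)
rg, i0 = guinier_fit(q, intensity)
```

On this noise-free input the fit recovers Rg = 25 Å and I0 = 100; measured data would need buffer subtraction and a check for aggregation (upturn at low q) before fitting.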
Purified DHT03 protein at 40 mg/mL was used for sitting-drop, vapor-diffusion crystallization trials with the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of DHT03 grew from drops of 100 nL protein plus 100 nL reservoir solution (0.1 M HEPES pH 7.5, 20% (w/v) PEG 8000) at 4° C. and were cryoprotected by supplementing the reservoir solution with 15% ethylene glycol. Native diffraction data were collected at APS beamline 23-ID-B, indexed in P1, and reduced using XDS48. The structure was phased by molecular replacement using Phaser™, with the core of DHT03_2arm_A21/B21/C long as the search model. The best Phaser™ solution (TFZ score 5.8) was autobuilt with SHELXE, and the solution with the best model-map CC (0.35) was taken forward for adjustment in Coot™ and refinement in Phenix™.
Purified DHT03_1arm_A21/B/C protein at 30 mg/mL was used for sitting-drop, vapor-diffusion crystallization trials with the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of DHT03_1arm_A21/B/C grew from drops of 100 nL protein plus 100 nL reservoir solution (1 M LiCl, 0.1 M sodium citrate pH 5, 20% (w/v) PEG 6000) at 4° C. and were cryoprotected by supplementing the reservoir solution with 15% ethylene glycol. Native diffraction data were collected at APS beamline 23-ID-B, indexed in P1, and reduced using XDS48. The structure was phased by molecular replacement using Phaser™, with the core regions of a set of ~50 lowest-energy Rosetta™-predicted models as search models. The arm region was subsequently rigid-body fitted into the density in Coot™ and refined using Phenix™. Further model building and refinement were done in Coot™ and Phenix™.
Purified DHT03_2arm_A21/B21/C (long) protein at 41 mg/mL was used for sitting-drop, vapor-diffusion crystallization trials with the JCSG Core I-IV screens (NeXtal Biotechnologies). Crystals of DHT03_2arm_A21/B21/C long grew from drops of 100 nL protein plus 100 nL reservoir solution (2 M (NH4)2SO4, 0.1 M sodium acetate pH 4.6) at 18° C. and were cryoprotected by supplementing the reservoir solution with 2.2 M sodium malonate pH 5. Native diffraction data were collected at APS beamline 23-ID-B, indexed in P1, and reduced using XDS48. The structure was phased by molecular replacement using Phaser™, with chain A from a set of ~49 lowest-energy Rosetta™-predicted models as search models; several of these models gave clear solutions. Chains B and C were fitted manually in Coot™ and rigid-body refined in Phenix™. Further model building and refinement were done in Coot™ and Phenix™.
All SEC-purified samples were diluted to 0.008 mg/mL in TBS buffer (pH 8.0). For each sample, copper grids (LaceyCarbon™, with 1 μm hole diameter and 5 μm hole spacing) were glow-discharged; 6 μL of diluted sample was applied to each grid, left for 8 seconds, and then blotted with blotter paper; each grid was then stained three times with uranyl formate (6 μL, 2 mg/mL), leaving the stain for 8 seconds before each blot; finally, the grids were left to dry for 5 minutes.
NS-EM data were acquired on an FEI Talos L120C transmission electron microscope (120 kV accelerating voltage, 2.7 mm spherical aberration) at a magnification of 92,000× and a pixel size of 1.54 Å × 1.54 Å. Data collection for selected samples was performed using Thermo Fisher Scientific EPU software. Micrographs were stored as *.mrc files for subsequent processing.
The collected micrographs were processed and analyzed using the CryoSPARC V3 software suite.
This application claims priority to U.S. Provisional Patent Application Ser. No. 63/338,260 filed May 4, 2022, incorporated by reference herein in its entirety.
Number | Date | Country
---|---|---
63338260 | May 2022 | US