This application contains a Sequence Listing submitted as an electronic text file named “19-599-PCT_Sequence-Listing_ST25.txt”, having a size in bytes of 9 kb, and created on Apr. 6, 2020. The information contained in this electronic file is hereby incorporated by reference in its entirety pursuant to 37 CFR § 1.52(e)(5).
Modular self-assembly of biomolecules in two dimensions (2D) is straightforward with DNA but is difficult to realize with proteins, due to the lack of modular specificity similar to Watson-Crick base pairing. The design of building blocks to enable programmable protein self-assembly is thus of importance.
In a first aspect, the disclosure provides polypeptides comprising the amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical along the full length of the amino acid sequence selected from the group consisting of:
V
LEISQDSGADDKQVKKLLDEIRKLVEKIEK
wherein (i) residues in parentheses are optional, and (ii) at least 1 of the highlighted residues is invariant.
In one embodiment, the polypeptide comprises two copies of SEQ ID NO:1 or SEQ ID NO:2 connected by a linking peptide. In one embodiment, the linking peptide is between 3-6 amino acids in length.
In a further embodiment, the polypeptide comprises the amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical along the full length of the amino acid sequence selected from the group consisting of:
L
QKELVEKLKRQGSGNMYIRALEQSLREQEE
L
LAERLKTLLKVLEISQDSGADDKQVKKLLD
wherein at least 1 of the highlighted residues is invariant.
In one embodiment, the polypeptide comprises the amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical along the full length of SEQ ID NO:3, and wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or all 24 of the highlighted residues are invariant. In another embodiment, the polypeptide comprises the amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical along the full length of SEQ ID NO:4, and wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, or all 68 of the highlighted residues are invariant.
In another embodiment, the polypeptides further comprise one or more functional domains. In a further embodiment, the disclosure provides two-dimensional assemblies, comprising a plurality of assembled polypeptides according to any embodiment or combination of embodiments of the disclosure.
In other aspects, the disclosure provides nucleic acids encoding the polypeptide of any embodiment or combination of embodiments of the disclosure, expression vectors comprising the nucleic acids operatively linked to a control sequence, and recombinant host cells comprising the nucleic acid and/or the expression vector of the disclosure.
Also disclosed are uses of the polypeptides and two-dimensional assemblies of the disclosure, and methods for designing polypeptides that can form two-dimensional arrays.
All references cited are herein incorporated by reference in their entirety. Within this application, unless otherwise stated, the techniques utilized may be found in any of several well-known references such as: Molecular Cloning: A Laboratory Manual (Sambrook, et al., 1989, Cold Spring Harbor Laboratory Press), Gene Expression Technology (Methods in Enzymology, Vol. 185, edited by D. Goeddel, 1991. Academic Press, San Diego, Calif.), “Guide to Protein Purification” in Methods in Enzymology (M. P. Deutscher, ed., (1990) Academic Press, Inc.); PCR Protocols: A Guide to Methods and Applications (Innis, et al. 1990. Academic Press, San Diego, Calif.), Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed. (R. I. Freshney. 1987. Liss, Inc. New York, N.Y.), Gene Transfer and Expression Protocols (pp. 109-128, ed. E. J. Murray, The Humana Press Inc., Clifton, N.J.), and the Ambion 1998 Catalog (Ambion, Austin, Tex.).
As used herein, the singular forms “a”, “an” and “the” include plural referents unless the context clearly dictates otherwise. All embodiments of any aspect of the disclosure can be used in combination, unless the context clearly dictates otherwise.
As used herein, the amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V).
In one aspect, the disclosure provides polypeptides comprising the amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical along the full length of the amino acid sequence selected from the group consisting of:
V
LEISQDSGADDKQVKKLLDEIRKLVEKIEK
wherein (i) residues in parentheses are optional, and (ii) at least 1 of the highlighted residues is invariant.
As disclosed herein, the inventors have designed polypeptide building blocks that can be used, for example, to design 2D protein arrays. In this embodiment, the polypeptides are monomeric building blocks for the polypeptides that can form the 2D arrays, and have been designed as detailed in the attached appendices.
As described in the examples, the polypeptides can tolerate significant substitutions, particularly in the non-highlighted residues. In some embodiments, a given amino acid can be replaced by a residue having similar physicochemical characteristics, e.g., substituting one aliphatic residue for another (such as Ile, Val, Leu, or Ala for one another), or substituting one polar residue for another (such as between Lys and Arg; Glu and Asp; or Gln and Asn). Other such conservative substitutions, e.g., substitutions of entire regions having similar hydrophobicity characteristics, are known. Polypeptides comprising conservative amino acid substitutions can be tested in any one of the assays described herein to confirm that the desired activity is retained. Amino acids can be grouped according to similarities in the properties of their side chains (A. L. Lehninger, in Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H). Alternatively, naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe. Non-conservative substitutions will entail exchanging a member of one of these classes for a member of another class.
Particular conservative substitutions include, for example: Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu.
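As an illustration only, the first (Lehninger) side-chain grouping given above can be encoded as a lookup, with a substitution treated as conservative when both residues fall in the same class. This is a minimal sketch of one reading of the grouping, not a definition used by the disclosure. The function names are hypothetical. Some of the particular substitutions listed above (for example, Ala into Gly) cross these classes and would not be flagged as conservative by this simple rule.

```python
# Side-chain classes exactly as listed in the text (Lehninger grouping),
# using one-letter amino acid codes.
SIDE_CHAIN_CLASSES = {
    "non-polar": set("AVLIPFWM"),
    "uncharged polar": set("GSTCYNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def side_chain_class(residue):
    """Return the class name for a one-letter residue code (hypothetical helper)."""
    for name, members in SIDE_CHAIN_CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"not one of the 20 standard residues: {residue!r}")


def is_conservative(original, replacement):
    """Assumption: a substitution is 'conservative' when both residues share a class."""
    return side_chain_class(original) == side_chain_class(replacement)


if __name__ == "__main__":
    for pair in [("K", "R"), ("D", "E"), ("L", "D")]:
        print(pair, is_conservative(*pair))
```

A stricter check could instead consult the explicit list of particular substitutions; the class-based rule is simply the shortest way to show the idea.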
In all of these embodiments, the percent identity requirement does not include any additional functional domain that may be incorporated in the polypeptide.
In one embodiment, the polypeptide comprises the amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical along the full length of SEQ ID NO:1, and wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, or all 34 of the highlighted residues are invariant. In another embodiment, the polypeptide comprises the amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical along the full length of SEQ ID NO:2, and wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or all 12 of the highlighted residues are invariant.
The highlighted residues include residues present at interfaces that participate in protein-protein interactions and residues designed to provide additional hydrogen bonding, as detailed in the examples that follow.
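The identity and invariant-residue requirements described above can be checked mechanically. The following is a minimal sketch. It assumes the candidate has already been aligned to the reference without gaps and trimmed of any additional terminal residues or functional domains, which the text excludes from the identity calculation. The toy sequences, the "highlighted" positions, and all function names are hypothetical and are not taken from the Sequence Listing.

```python
def percent_identity(reference, candidate):
    """Percent identity along the full length of the reference.

    Assumption: both sequences are pre-aligned, gap-free, and of equal length.
    """
    if len(reference) != len(candidate):
        raise ValueError("sketch assumes pre-aligned, equal-length sequences")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return 100.0 * matches / len(reference)


def conserved_invariant_count(reference, candidate, highlighted_positions):
    """Count highlighted (0-based) positions that are unchanged in the candidate."""
    return sum(reference[i] == candidate[i] for i in highlighted_positions)


def meets_requirement(reference, candidate, highlighted_positions,
                      min_identity, min_invariant):
    """True when both the identity and invariant-residue thresholds are met."""
    return (percent_identity(reference, candidate) >= min_identity
            and conserved_invariant_count(reference, candidate,
                                          highlighted_positions) >= min_invariant)


if __name__ == "__main__":
    ref = "LEISQDSGAD"    # hypothetical toy reference
    cand = "LEISQDAGAE"   # hypothetical variant with two substitutions
    highlighted = [0, 3, 6]  # hypothetical highlighted positions
    print(percent_identity(ref, cand))
    print(conserved_invariant_count(ref, cand, highlighted))
    print(meets_requirement(ref, cand, highlighted, 80.0, 2))
```

A real comparison would first align the sequences, for example with a standard pairwise alignment tool, before applying a calculation of this kind.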
The polypeptides may comprise multiple (2, 3, 4, 5, 6, 7, 8, 9, 10, or more) copies, connected by a linking peptide. Such constructs are particularly useful, for example, in serving as scaffolds for electron microscopy (such as cryo-EM) structure determination. In one embodiment, the polypeptide comprises two copies of SEQ ID NO:1 connected by a linking peptide. In another embodiment, the polypeptide comprises two copies of SEQ ID NO:2 connected by a linking peptide. Any suitable linking peptide may be used as deemed appropriate for a given purpose. The linking peptide may be of any suitable amino acid composition and/or length. In one non-limiting embodiment, the linking peptide is between 3-6 amino acids in length. Exemplary such linking peptides include, but are not limited to, GSGN (SEQ ID NO:5) and GPGN (SEQ ID NO:6).
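As a simple illustration of the multi-copy constructs described above, the sketch below joins copies of a monomer sequence with a linking peptide and checks the non-limiting 3-6 residue linker length. The monomer string is a hypothetical placeholder, and the function names are assumptions made for this example. Only the linkers GSGN and GPGN come from the text.

```python
EXEMPLARY_LINKERS = ("GSGN", "GPGN")  # SEQ ID NO:5 and SEQ ID NO:6 in the text


def linker_length_ok(linker, shortest=3, longest=6):
    """Check the non-limiting 3-6 residue linker length from the text."""
    return shortest <= len(linker) <= longest


def tandem_construct(monomer, linker, copies=2):
    """Join `copies` copies of `monomer` with `linker` between adjacent copies."""
    if copies < 2:
        raise ValueError("a linked construct needs at least two copies")
    return linker.join([monomer] * copies)


if __name__ == "__main__":
    toy_monomer = "LEIS"  # hypothetical placeholder, not a full SEQ ID NO sequence
    for linker in EXEMPLARY_LINKERS:
        print(linker, linker_length_ok(linker), tandem_construct(toy_monomer, linker))
```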
In another embodiment, the polypeptide comprises the amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical along the full length of the amino acid sequence selected from the group consisting of:
L
QKELVEKLKRQGSGNMYIRALEQSLREQEE
L
LAERLKTLLKVLEISQDSGADDKQVKKLLD
wherein at least 1 of the highlighted residues is invariant.
In one embodiment, the polypeptide comprises the amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical along the full length of SEQ ID NO:3, and wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or all 24 of the highlighted residues are invariant.
In another embodiment, the polypeptide comprises the amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical along the full length of SEQ ID NO:4, and wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, or all 68 of the highlighted residues are invariant.
As used throughout the present application, the term “polypeptide” is used in its broadest sense to refer to a sequence of subunit amino acids. The polypeptides of the invention may comprise L-amino acids, D-amino acids (which are resistant to L-amino acid-specific proteases in vivo), or a combination of D- and L-amino acids. The polypeptides described herein may be chemically synthesized or recombinantly expressed. The polypeptides may be linked to other compounds to promote an increased half-life in vivo, such as by PEGylation, HESylation, PASylation, glycosylation, or may be produced as an Fc-fusion or in deimmunized variants. Such linkage can be covalent or non-covalent as is understood by those of skill in the art.
As will be understood by those of skill in the art, the polypeptides of the invention may include additional residues at the N-terminus, C-terminus, or both that are not present in the polypeptides disclosed herein; these additional residues are not included in determining the percent identity of the polypeptides of the invention relative to the reference polypeptide.
In one embodiment, the polypeptides may further comprise one or more functional domains. As used herein, a “functional domain” is any polypeptide of interest that might be fused or covalently bound to the polypeptides of the disclosure. In non-limiting embodiments, such functional domains may comprise one or more polypeptide antigens, polypeptide therapeutics, enzymes, detectable domains, etc. The one or more functional domains may be fused at any appropriate regions within the polypeptides of the disclosure, including but not limited to at the N-terminus or at the C-terminus of the polypeptide.
As described in the examples that follow, the polypeptides of the disclosure are polypeptide building blocks that can be used, for example, to design 2D protein arrays. Thus, in another embodiment, the disclosure provides two-dimensional assemblies, comprising a plurality of assembled polypeptides according to any embodiment or combination of embodiments disclosed herein. In one embodiment, the two-dimensional assemblies may comprise a plurality of functional domains present on the assembly, via covalent or non-covalent attachment. Non-limiting and exemplary such functional domains are described above.
The polypeptides and two-dimensional assemblies can be used for any suitable purpose. In various embodiments, they may be used as scaffolds on which to fuse antigens to increase immune response due to the increase in avidity; as scaffolds for structure determination by cryo-EM or X-ray crystallography when fused to proteins of interest; as a platform for the construction of molecular robots due to the regular spacing of the assemblies; as a surface for construction of regularly spaced enzyme assembly lines; or as protein materials for coating purposes.
In another aspect, the disclosure provides nucleic acids encoding the polypeptide of any embodiment or combination of embodiments of the disclosure. The nucleic acid may comprise single stranded or double stranded RNA or DNA, or DNA-RNA hybrids, each of which may include chemically or biochemically modified, non-natural, or derivatized nucleotide bases. Such nucleic acids may comprise additional sequences useful for promoting expression and/or purification of the encoded polypeptide, including but not limited to polyA sequences, modified Kozak sequences, and sequences encoding epitope tags, export signals, and secretory signals, nuclear localization signals, and plasma membrane localization signals. It will be apparent to those of skill in the art, based on the teachings herein, what nucleic acid sequences will encode the polypeptides of the disclosure.
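To illustrate how an encoding nucleic acid sequence follows from a polypeptide sequence, the sketch below back-translates a peptide using one codon per amino acid from the standard genetic code. The choice of codon for each residue is arbitrary and made only for this example. It is not codon-optimized for any host and is not the sequence of any construct in the disclosure. Because the genetic code is degenerate, many other DNA sequences encode the same polypeptide.

```python
# One valid standard-genetic-code codon per amino acid (arbitrary choice).
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(peptide, add_stop=False):
    """Return one DNA coding sequence for `peptide` (one-letter codes)."""
    try:
        dna = "".join(CODON_FOR[residue] for residue in peptide)
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]!r}") from None
    # TAA is one of the three standard stop codons.
    return dna + ("TAA" if add_stop else "")


if __name__ == "__main__":
    print(back_translate("GSGN"))  # the GSGN linker named in the text
```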
In another aspect, the disclosure provides expression vectors comprising the nucleic acids of the disclosure operatively linked to a control sequence. “Expression vector” includes vectors that operatively link a nucleic acid coding region or gene to any control sequences capable of effecting expression of the gene product. “Control sequences” operably linked to the nucleic acid sequences of the disclosure are nucleic acid sequences capable of effecting the expression of the nucleic acid molecules. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences and the promoter sequence can still be considered “operably linked” to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, termination signals, and ribosome binding sites. Such expression vectors can be of any type, including but not limited to plasmid and viral-based expression vectors. The control sequence used to drive expression of the disclosed nucleic acid sequences in a mammalian system may be constitutive (driven by any of a variety of promoters, including but not limited to, CMV, SV40, RSV, actin, EF) or inducible (driven by any of a number of inducible promoters including, but not limited to, tetracycline, ecdysone, steroid-responsive). The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In various embodiments, the expression vector may comprise a plasmid, viral-based vector, or any other suitable expression vector.
In another aspect, the disclosure provides host cells that comprise the nucleic acids and/or expression vectors (i.e., episomal or chromosomally integrated) disclosed herein, wherein the host cells can be either prokaryotic or eukaryotic. The cells can be transiently or stably engineered to incorporate the expression vector of the disclosure, using techniques including but not limited to bacterial transformations, calcium phosphate co-precipitation, electroporation, or liposome mediated-, DEAE dextran mediated-, polycationic mediated-, or viral mediated transfection.
In a further aspect, the disclosure provides methods for designing polypeptides that can form two-dimensional arrays, comprising any method as described in the attached examples. In one embodiment, the method comprises
(a) modifying a polypeptide that forms a homodimer by adding a loop sequence to link two copies of the monomeric polypeptide to form a building block;
(b) docking the building block into a pseudo-C 1 2 layer group and systematically sampling three parameters that control lattice geometry: two parameters describing the lattice dimensions, and one parameter controlling rotation of the building block around its central axis; and
(c) computationally modifying interface residues and enhancing binding specificity between monomers by designing buried hydrogen bond networks at the interface between subunits, selecting for networks that involve at least 3 side chain residues with all heavy-atom donors and acceptors participating in hydrogen bonds.
The design methods are described in detail in the examples, and all embodiments or combinations of embodiments disclosed therein may be used in the design methods of the disclosure.
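The systematic sampling in step (b) amounts to a grid search over three parameters. The sketch below enumerates such a grid. It is only an illustration of the search space and is not the Rosetta™ docking code. The function names, the lower cell bound, and the default 0-360 degree rotation range are assumptions made for this example, and clash checking and scoring of each lattice are omitted.

```python
from itertools import product


def inclusive_steps(start, stop, step):
    """Evenly spaced values from start to stop inclusive.

    Built from an integer count so floating-point error does not accumulate.
    """
    count = int(round((stop - start) / step))
    return [start + i * step for i in range(count + 1)]


def enumerate_lattices(cell_lower, cell_upper, cell_step, rot_step, rot_max=360.0):
    """Yield (cell_a, cell_b, rotation) for every grid point.

    cell_a and cell_b are the two lattice dimensions and rotation is the
    angle of the building block about its central axis, in degrees.
    Assumption: rotation runs over [0, rot_max) in steps of rot_step.
    """
    cells = inclusive_steps(cell_lower, cell_upper, cell_step)
    rotations = inclusive_steps(0.0, rot_max - rot_step, rot_step)
    for cell_a, cell_b, rotation in product(cells, cells, rotations):
        yield (cell_a, cell_b, rotation)


if __name__ == "__main__":
    # Hypothetical search ranges chosen only to keep the output small.
    grid = list(enumerate_lattices(40.0, 50.0, 5.0, 90.0))
    print(len(grid), grid[0], grid[-1])
```

In practice each grid point would be passed to a docking and scoring step, and only connected, non-clashing lattices would be kept for interface design.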
The description of embodiments of the disclosure is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. While the specific embodiments of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize.
Modular self-assembly of biomolecules in two dimensions (2D) is straightforward with DNA but has been difficult to realize with proteins, due to the lack of modular specificity similar to Watson-Crick base pairing. Here we describe a general approach to design 2D arrays using de novo designed pseudosymmetric protein building blocks. A homodimeric helical bundle was reconnected into a monomeric building block, and the surface was redesigned in Rosetta™ to enable self-assembly into a 2D array in the C 1 2 layer symmetry group. The designed arrays assembled to sub-μm scale under both negative stain electron microscopy and atomic force microscopy, and displayed the designed lattice geometry. The design of 2D arrays with pseudosymmetric building blocks is an important step toward the design of programmable protein self-assembly via pseudosymmetric patterning of orthogonal binding interfaces.
Here we describe a general approach for generating pseudosymmetric 2D assemblies based on a C 1 2 symmetric layer group. Starting from a de novo designed homodimer, we first design a new loop to monomerize the backbone of our building block, then identify configurations of this backbone capable of forming 2D arrays with pseudo-C 1 2 symmetry (the resulting layer group symmetry is pseudo-C 1 2 because each building block has pseudo-C2 symmetry due to the presence of an additional loop), and finally redesign the interface so that the building block is programmatically assembled into 2D arrays with the prescribed unit cell dimensions and subunit configuration. This monomerization of the multimeric protein building block allows unique sequences to be designed on each of the 4 binding interfaces, ultimately enabling the modular assembly of higher order interactions through the design of mutually orthogonal interfaces with the same subunit placement and unit cell dimensions (
We developed a general strategy for the design of pseudosymmetric 2D protein assemblies using de novo designed proteins as building blocks, fully described in Methods.
Using this monomerized building block as a starting point for pseudosymmetric assembly, we subsequently enumerated all possible pseudo-C 1 2 symmetric layer assemblies compatible with this design, exhaustively sampling three degrees of freedom: two parameters describing the lattice dimensions, and one parameter controlling rotation of the building block around its central axis (
Examination by negative-stain electron microscopy (EM) and atomic force microscopy (AFM) revealed regular arrays on the sub-μm scale for one of the designs with exclusively hydrophobic residues at the binding interfaces (2D-HP,
Given the non-specific clustering of 2D-HP assemblies under EM, we sought to further improve the binding specificity among building blocks by designing buried hydrogen bonds at the interface (
To verify that the array was forming a regular 2D grid, we collected a larger negative stain (NanoW™) dataset of the best-behaved arrays (
We showed that by systematically sampling lattice dimensions followed by computational interface design, the same de novo designed helical bundle building block can be modularly self-assembled into two arrays with unique cell dimensions. As more de novo building blocks are designed, particularly with higher-order symmetry, a variety of 2D assemblies with unique layer group symmetries are achievable with the same design protocol, including those using larger de novo building blocks, and designing in non-polar layer groups, which have a rotation about the layer plane (e.g., P 3 2 1 and P 4 21 2), effectively canceling out any “curvature” errors in binding along the z axis, further flattening out the 2D assembly.
The monomerization of the homodimer building block coupled with designed hydrogen bond networks allows orthogonal interfaces to be designed at each intermolecular binding site, paving the way for the programmatic self-assembly of proteins into finite shapes (
Step 1. Rapid Generation of Connecting and Non-Clashing 2D Lattices from Protein Building Blocks
~/Rosetta/main/source/bin/flatland.static.linuxgccrelease
-in:file:s [input pdb model]
-database [path to Rosetta database]
-ignore_unrecognized_res
-mh:path:scores_BB_BB /gscratch/baker/zibochen/utilities/aa_count_ACDEFHIKLMNQRSTVWY_resl1_ang15_msc0.2_smooth1.3_ROSETTA/aa_count_ACDEFHIKLMNQRSTVWY_resl1_ang15_msc0.2_smooth1.3_ROSETTA
-mh:score:use_ss1 true
-mh:score:use_ss2 true
-mh:score:use_aa1 false
-mh:score:use_aa2 false #motif score specific options
-symmetry_definition dummy
-output_virtual
-tag [user defined name tag for the job]
-rot_step [search step size for the self-rotation of the building block, takes a real number]
-Cn [internal cyclic symmetry of the building block, 2]
-wallpaper [layer symmetry of the final 2D lattice, C211]
-dump_silent [dump a silent file containing all the lattices, boolean]
-C21_B [lattice parameter B for the C 1 2 layer group, takes a real number]
-cell_upper [upper limit for the cell dimensions, takes a real number]
-single_chain_version [if the input model is monomerized, the code accommodates this pseudo-symmetry, boolean]
-cell_step [search step size for the lattice cell dimensions, takes a real number]
~/Rosetta/main/source/bin/rosetta_scripts.static.linuxgccrelease
-in:file:s [input pdb model]
-out::file::pdb_comments
-run:preserve_header
-use_input_sc
-out:prefix HBNet_
-beta
-missing_density_to_jump true
-parser:protocol 2D_HBNet.xml
-database [path to Rosetta database]
-chemical:exclude_patches LowerDNA UpperDNA Cterm_amidation VirtualBB ShoveBB VirtualDNAPhosphate VirtualNTerm CTermConnect sc_orbitals pro_hydroxylated_case1 pro_hydroxylated_case2 ser_phosphorylated thr_phosphorylated tyr_phosphorylated tyr_sulfated lys_dimethylated lys_monomethylated lys_trimethylated lys_acetylated glu_carboxylated cys_acetylated tyr_diiodinated N_acetylated C_methylamidated MethylatedProteinCterm
-in:file:fullatom
-multi_cool_annealer 10
-no_optH false
-optH_MCA true
-flip_HNQ
~/Rosetta/main/source/bin/symm_seq_gen_2D.default.linuxgccrelease
-database [path to Rosetta database]
-s [input pdb model]
-cn [symmetry of the building block, 2]
~/Rosetta/main/source/bin/symm_seq_gen_2D.default.linuxgccrelease
-database [path to Rosetta database]
-in:file:silent [input Rosetta silent file containing the 2D lattice]
-parser:script_vars resfile=[input resfile to enforce that newly designed interfaces stay intact]
-out::file::pdb_comments
-run:preserve_header
-multi_cool_annealer 10
-use_input_sc
-symmetry_definition dummy
-out:prefix packed_
-beta
-missing_density_to_jump true
-symmetry:detect_bonds false
-parser:protocol 2D_final_design.xml
1. Connecting the Homodimer into a Monomer
The two monomers from the homodimer 2L4HC2_23 are connected into a single-chain monomer with a 5-residue loop. Briefly, a database of backbone samples composed of fragments spanning two helical regions via a loop of five or fewer residues was generated from high-resolution crystallographic structures. Loops in this database were then structurally aligned to terminal residues of the design backbone, and those that aligned within 0.35 Å RMSD were carried forward with full Rosetta design restricted to the loop and its neighborhood residues within 6 Å. The lowest-scoring candidate was selected as the final loop design.
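The loop selection logic described above (keep loops that align within the RMSD cutoff, then take the lowest-scoring design) can be sketched as follows. This is a minimal illustration and not the protocol's actual code. The class, field, and function names are hypothetical, and the RMSD and score values are assumed to have been computed beforehand by alignment and by Rosetta design, respectively.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LoopCandidate:
    name: str
    rmsd: float   # alignment RMSD to the terminal residues, in angstroms
    score: float  # design score after loop design; lower is better


def select_loop(candidates: Iterable[LoopCandidate],
                rmsd_cutoff: float = 0.35) -> Optional[LoopCandidate]:
    """Keep loops within the RMSD cutoff, then return the lowest-scoring one.

    Returns None when no candidate passes the cutoff.
    """
    passing = [c for c in candidates if c.rmsd <= rmsd_cutoff]
    if not passing:
        return None
    return min(passing, key=lambda c: c.score)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    pool = [
        LoopCandidate("loop_a", rmsd=0.30, score=-210.0),
        LoopCandidate("loop_b", rmsd=0.50, score=-250.0),  # fails the cutoff
        LoopCandidate("loop_c", rmsd=0.20, score=-225.0),
    ]
    print(select_loop(pool))
```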
A custom Rosetta protocol was developed to dock the building block into a pseudo-C 1 2 layer group and systematically sample the three parameters that control lattice geometry: two parameters describing the lattice dimensions, and one parameter controlling rotation of the building block around its central axis (
RosettaDesign™ calculations were carried out on the interfaces between adjacent building blocks, while keeping the rest of the sequences fixed. To enhance the binding specificity among subunits, we optionally used the Rosetta™ HBNet™ algorithm to design buried hydrogen bond networks at the interface between subunits, selecting for networks that involve at least 3 side chain residues with all heavy-atom donors and acceptors participating in hydrogen bonds. Low energy sequences were identified using RosettaDesign™ calculations in which the hydrogen bond networks were held fixed. A final step of minimization and side chain repacking without atom pair constraints was applied to identify the movement of HBNets™, filtering out designs with significantly moved HBNets™. The complete 2D lattice was then regenerated using the adjacent building blocks (now with designed interfaces), with the newly designed sequences applied to all building blocks. A final round of Rosetta™ design was carried out in the context of the C 1 2 layer group symmetry with the newly designed sequences fixed, to resolve potential side chain clashes in the final lattice.
Fully designed models were selected based on the shape complementarity of the designed interface (SC>0.6), the size of the designed interfaces (dSASA>500 Å2), the average binding energy (ddG/dSASA<−0.02 Rosetta™ Energy Unit/Å2), and the absence of buried unsatisfied hydrogen bonds introduced at the new interfaces. Selected designs were then visually inspected for good packing of hydrophobic side chains at the interfaces.
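The numerical selection criteria above can be expressed as a single filter. The sketch below is illustrative only. The class and field names are hypothetical, and the metric values are assumed to have been extracted from the design output beforehand. The final visual inspection step is not modeled.

```python
from dataclasses import dataclass


@dataclass
class DesignMetrics:
    sc: float                 # shape complementarity of the designed interface
    dsasa: float              # buried interface area, in square angstroms
    ddg: float                # binding energy, in Rosetta Energy Units
    buried_unsat_hbonds: int  # buried unsatisfied hydrogen bonds at new interfaces


def passes_filters(m: DesignMetrics) -> bool:
    """Apply the thresholds stated in the text to one design."""
    return (
        m.sc > 0.6
        and m.dsasa > 500.0
        and (m.ddg / m.dsasa) < -0.02
        and m.buried_unsat_hbonds == 0
    )


if __name__ == "__main__":
    # Hypothetical metric values for illustration only.
    designs = {
        "design_1": DesignMetrics(sc=0.65, dsasa=800.0, ddg=-24.0, buried_unsat_hbonds=0),
        "design_2": DesignMetrics(sc=0.65, dsasa=800.0, ddg=-12.0, buried_unsat_hbonds=0),
        "design_3": DesignMetrics(sc=0.55, dsasa=900.0, ddg=-30.0, buried_unsat_hbonds=0),
    }
    for name, metrics in designs.items():
        print(name, passes_filters(metrics))
```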
All structural images for figures were generated using PyMOL (3).
1.2% [wt/vol] tryptone, 2.4% [wt/vol] yeast extract, 0.5% [wt/vol] glycerol, 0.05% [wt/vol] D-glucose, 0.2% [wt/vol] D-lactose, 25 mM Na2HPO4, 25 mM KH2PO4, 50 mM NH4Cl, 5 mM Na2SO4, 2 mM MgSO4, 10 μM FeCl3, 4 μM CaCl2, 2 μM MnCl2, 2 μM ZnSO4, 400 nM CoCl2, 400 nM NiCl2, 400 nM CuCl2, 400 nM Na2MoO4, 400 nM Na2SeO3, 400 nM H3BO3
20 mM Tris pH 8.0, 100 mM NaCl
Synthetic genes were ordered from Genscript Inc. (Piscataway, N.J., USA) and delivered in pET28b(+) E. coli expression vector, inserted between the NdeI and XhoI sites.
Plasmids were transformed into the chemically competent E. coli expression strain BL21(DE3)Star (Invitrogen) for protein expression. Single colonies were picked from agar plates following transformation and overnight growth, and 5 ml starter cultures were grown in Luria-Bertani (LB) medium containing 100 μg/mL kanamycin with shaking at 225 rpm for 18 hours at 37° C. Starter cultures were diluted into 500 ml TBM-5052 containing 100 μg/mL kanamycin, and incubated with shaking at 225 rpm for 24 hours at 37° C.
Cells were harvested by centrifugation for 15 minutes at 5000 rcf at 4° C. and resuspended in 20 ml lysis buffer. Lysozyme, DNAse, and EDTA-free cocktail protease inhibitor (Roche) were added to the resuspended cell pellet before sonication at 70% power for 5 minutes. All 10 designs expressed and precipitated into the cell pellet after clearing the cell lysate at 12,000 g for 1 hour. Pellets were twice resuspended in 10 ml TBS followed by centrifugation at 12,000 g for 20 min. The resulting pellet was resuspended in 1 M GdmHCl followed by centrifugation at 12,000 g for 20 min. The supernatant was dialyzed overnight into TBS buffer.
CD wavelength scans (260 to 195 nm) and temperature melts (25 to 95° C.) were performed using an AVIV model 420 CD spectrometer. Temperature melts were carried out at a heating rate of 4° C./min and monitored by the change in ellipticity at 222 nm; protein samples were diluted to 0.25 mg/mL in PBS pH 7.4 in a 0.1 cm cuvette.
Crystals of SC_2L4HC2_23 were grown by mixing 0.1 μl of protein at 20 mg/ml plus 0.1 μl of crystallization condition Morpheus H9 (Molecular Dimensions, 0.1M Amino acids, 0.1M Buffer System 3 pH 8.5, 50% (v/v) Precipitant Mix 1). As this solution is already a suitable cryoprotectant, crystals were flash-frozen directly in liquid nitrogen prior to data collection. Diffraction data were collected at the Advanced Light Source, Lawrence Berkeley National Laboratory, beamline 8.2.1. Diffraction data were indexed and scaled using HKL2000 (4). Initial models were generated by the molecular-replacement method using the program PHASER™ (5) within the Phenix™ software suite (6), with the computational design serving as the search model. Efforts were made to reduce model bias by using simulated annealing and prime-and-switch phasing within Phenix.autobuild (7). Iterative rounds of manual building in COOT™ (8) and refinement in Phenix were used to produce the final model. Due to the high degree of self-similarity inherent in coiled-coil-like proteins, datasets for the reported structures suffered from a high degree of pseudo-translational non-crystallographic symmetry, as reported by Phenix.Xtriage™, which complicated structure refinement and may explain the higher than expected R values reported. RMSDs of bond lengths, angles and dihedrals from ideal geometries were calculated with Phenix™ (6). The overall quality of all final models was assessed using the program MOLPROBITY™ (9). Summaries of diffraction data and refinement statistics are provided in Supplementary Table 2.
Samples were applied to glow-discharged EM grids and stained with either uranyl acetate (UA), uranyl formate (UF) or NanoW™ (Nanoprobes, Inc, Yaphank, N.Y., USA) for screening or analysis. Data was collected using a Tecnai T12 equipped with a Gatan Orius CCD. CTF estimation was performed using GCTF (10), and all other image processing steps were completed via Relion™ 2.1 (11). For the analysis in
This application claims priority to U.S. Provisional Application Ser. No. 62/833,902 filed Apr. 15, 2019, incorporated by reference herein in its entirety.
This invention was made with government support under Grant No. GM123089, awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2020/028044 | 4/14/2020 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62833902 | Apr 2019 | US |