A computer readable form of the Sequence Listing is filed with this application by electronic submission and is incorporated into this application by reference in its entirety. The Sequence Listing is contained in the file created on Aug. 16, 2022 having the file name “21-1152-US.xml” and is 99 kb in size.
The design of dynamic protein mechanical systems is of great interest given their rich functionality, but while recent advances in protein design permit the generation of somewhat sophisticated static nanostructures and assemblies, the complex folding and diversity of non-covalent interactions in dynamic protein mechanical systems has made their design very challenging.
In one aspect, the disclosure provides polypeptides comprising an amino acid sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NOS: 1-15 and 17-51, not including any functional domains added fused to the polypeptides (whether N-terminal, C-terminal, or internal), and wherein the 1, 2, 3, 4, or 5 N-terminal and/or C-terminal amino acid residues may be present or absent and when absent are not considered in determining the percent identity.
In another embodiment, the disclosure provides kits or machine assemblies comprising an axle and ring pair comprising an amino acid sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence of one or more axle and ring pairs selected from the group consisting of the following pairs (A)-(J), not including any functional domains added fused to the polypeptides (whether N-terminal, C-terminal, or internal), and wherein the 1, 2, 3, 4, or 5 N-terminal and/or C-terminal amino acid residues may be present or absent and when absent are not considered in determining the percent identity.
The disclosure further provides nucleic acids encoding the polypeptides of the disclosure, expression vectors comprising the nucleic acids operatively linked to a suitable control sequence, host cells comprising the polypeptide, kits, machine assemblies, nucleic acids, and/or vectors of the disclosure; and methods for using the polypeptide, kits, machine assemblies, nucleic acids, vectors, and/or host cells of the disclosure.
All references cited are herein incorporated by reference in their entirety. Within this application, unless otherwise stated, the techniques utilized may be found in any of several well-known references such as: Molecular Cloning: A Laboratory Manual (Sambrook, et al., 1989, Cold Spring Harbor Laboratory Press), Gene Expression Technology (Methods in Enzymology, Vol. 185, edited by D. Goeddel, 1991. Academic Press, San Diego, CA), “Guide to Protein Purification” in Methods in Enzymology (M. P. Deutscher, ed., (1990) Academic Press, Inc.); PCR Protocols: A Guide to Methods and Applications (Innis, et al. 1990. Academic Press, San Diego, CA), Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed. (R. I. Freshney. 1987. Liss, Inc. New York, NY), Gene Transfer and Expression Protocols (pp. 109-128, ed. E. J. Murray, The Humana Press Inc., Clifton, N.J.), and the Ambion 1998 Catalog (Ambion, Austin, TX).
As used herein, the singular forms “a”, “an” and “the” include plural referents unless the context clearly dictates otherwise.
As used herein, the amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V).
In all embodiments of polypeptides disclosed herein, any N-terminal methionine residues are optional (i.e.: the N-terminal methionine residue may be present or may be absent).
All embodiments of any aspect of the disclosure can be used in combination, unless the context clearly dictates otherwise.
Unless the context clearly requires otherwise, throughout the description and the claims, the words ‘comprise’, ‘comprising’, and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”. Words using the singular or plural number also include the plural and singular number, respectively. Additionally, the words “herein,” “above,” and “below” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of the application.
In a first aspect, the disclosure provides polypeptides comprising an amino acid sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NOS:1-15 and 17-51, not including any functional domains added fused to the polypeptides (whether N-terminal, C-terminal, or internal), and wherein the 1, 2, 3, 4, or 5 N-terminal and/or C-terminal amino acid residues may be present or absent and when absent are not considered in determining the percent identity.
The polypeptides disclosed herein are de novo proteins designed as single components (axles/rings) and full rotary machine assemblies, and can be used, for example, in protein nanomachines that can be genetically encoded for multicomponent self-assembly within cells or in vitro, facilitating fabrication or in vivo transfer and use in a vast range of nanodevices for medicine, material sciences or industrial bioprocesses.
The sequences provided below are annotated as follows:
AYALELALGALRLEDRARELIKEAEKKGDPEKLREALEALEEAVRLVEEAIKLRPDMDLAVEIAVRLARMLKRV
AELLQELAKKTGDPELLKLALRALEVAVRAVELAIKSNPDNDEAVETAVRLARELKKVAEELQERAKKTGDPEL
LKLALRALEVAVRAVELAIKSNPDNEEAVETAKRLAEELRKVAELLEERAKETGDPELQELAKRAKEVADRARE
LAKKS
NDEKEKLKELLKRAEELAKSPDPEDLKEAVRLAEEVVRERPGSNLAKKALEIILRAAEELAKLPDPEALKEAVK
AAEKVVREQPGSNLAKKALEIILRAAAALANLPDPESRKEADKAADKVRRE
ELAKRADDKDVREIVRDALELASRSTNDEVIRLALEAAVLAARSTDSDVLEIVKDALELAKQSTNEEVIKLALK
AAVLAAKSTDEEVLEEVKEALRRAKESTDEEEIKEELRKAVEEAE
DERQKQREEVRKLAEELASKATDEELIKEIKKCAQLAEELASRSTNDELIKQILEVAKLAFELASKATDEELIK
RILKCCQLAFELASRSTNDELIKQILEVAKLAFELASKATDEELIKLILACCVLAFELASRITNDEEIKQILEE
AKEAFERASKATDEEEIRKILAKCIA
Full Rotary machine assemblies:
AATGNTDQVRRAAELMVEIARLAGTEEAQDLALDALLDVLETALQIATKIIDDANKLLEKLR[RSERKDP]KVVETY
CEAIREAVRAAEELLRENPSTEAEELLRRAIEAAVRCPDCEAIREAVRAAEELLRENPSEEAKELLRRAIESAKKCP
DPEAQREAKRAEEELRKE(GSHHHHHH)
DEEDESYELVEHIAEELEEIAEEIAEAVENLAQAIIEALYVAWESNQQINEQVQEVEQS
MAELAYLLGELAYKLGEY
RIAIRAYRIALKHDPNNAEAWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAWYNLGNAYYKQGDYDEAIEYYQK
ALELDPNNAEAKQNLGNAKQKQG(GSHHHHHH)
KTSELTDED
TIRREILKVDVRMLAISLA[ASAKDEE]LRKEIKKCLQLAEELASRSTNKELQKQAMEVAKLALELA
[RKATDE]ELIKEILKCCQLAFELASRSTNDELIKQILEVAKLAFELA[SKATDEE]LIKEILKCCQLAFELASRSTN
DEEIKQILETAKEAFERAS[KATDEE]EIKEILKKCQEKFEKKS(GSHHHHHH)
(MG){VEELLLLARAAHH}[SGTTVEE]AYKLAK[KLGISV]{KELLLLARAAHN}[SGTTVEE]AYKLA[LKLGIS]
RARELIKEAEKKGDPEKLREALEALEEAVRLVEEAIKLRPDMDLAVEIAVRLARMLKRVAELLQELAKKTGDPELLK
LALRALEVAVRAVELAIKSNPDNDEAVETAVRLARELKKVAEELQERAKKTGDPELLKLALRALEVAVRAVELAIKS
NPDNEEAVETAKRLAEELRKVAELLEERAKETGDPELQELAKRAKEVADRARELAKKS(GSHHHHHH)
LGEYRIAIRAYRIALKHDPNNAEAWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAWYNLGNAYYKQGDYDEAIE
YYQKALELDPNNAEAKQNLGNAKQKQG(GSHHHHHH)
AAKLALKAALEAIELCKQSTDEELCEELVKLAQKLIELAKRYPDSEEAKRALKEAKELIEQCKESTDEDECRELVKR
AEELIREAKE(GSHHHHHH)
VEAIEAAVRALEAAERAGDPELREDAREAVRLAVEAAEEVQRNPSSSTANLLLKAIVALAEALAAAANGDKEKFKKA
AESALEIAKRVVEVASKEGDPEAVLEAAKVALRVAELAAKNGDKEVEKKAAESALEVAKRLVEVASKEGDPELVLEA
AKVALRVAELAAKNGDKEVFQKAAASAVEVALRLTEVASKEGDSELETEAAKVITRVRELASKQGDAAVAILAETAE
VKLEIEESKKRPQSESAKNLILIMQLLINQIRLLVLQIRMLDEQRQNQQREA{RVKSNEMERLAEVLRLSARARRGA
MSGSEEDQERLRKEMEEERKHMEEVEK}ELRKVEEKMKSHEDTSL{RLLVLIARLLINQIRLLILQIRSLSNLERNQ
AREAMVESNEMEREAETLRLSAR}
EQRRAG
Q}EMPGSNLAKAAQEIMRQASRAAEEAARRAKETLEKAEKDGDP{ETALKAVETVVKVARALNQIATA}AGSEEAQE
ELAARYPDSEAAKLALKAALEAIELCKQSTDEELCEELVKLAQKLIELAKRYPDSEEAKRALKEAKELIEQCKESTD
EDECRELVKRAEELIREAKE(GSHHHHHH)
VEAIEAAVRALEAAERAGDPELREDAREAVRLAVEAAEEVQRNPSSSTANLLLKAIVALAEALAAAANGDKEKFKKA
AESALEIAKRVVEVASKEGDPEAVLEAAKVALRVAELAAKNGDKEVFKKAAESALEVAKRLVEVASKEGDPELVLEA
AKVALRVAELAAKNGDKEVFQKAAASAVEVALRLTEVASKEGDSELETEAAKVITRVRELASKQGDAAVAILAETAE
VKLEIEESKKRPQSESAKNLILIMQLLINQIRLLVLQIRMLDEQRQRLEQQM{RMEVRQLEIRSECLRKESAVVSMV
NSVGTHDQMKLKEQMEEEERHTEKVEK}EIRKVEEKMKSHEDTSLRLLVLIA{RLLINQIRLLILQIRSLSNLELRL
QQQMRMEVEQLRIRSQCLQEE}SEVVEEVE
R}EQPGSNLAKAAQEIMRQASRAAEEAARRAKETLEKAEKDGDP{RTALQAVMTVVEVAKALNIIATM}AGSEEAQE
ELAARYPDSEAAKLALKAALEAIELCKQSTDEELCEELVKLAQKLIELAKRYPDSEEAKRALKEAKELIEQCKESTD
EDECRELVKRAEELIREAKE(GSHHHHHH)
AIRAVAEIAKEAQDSEVLEEAVRVIEEIAKESGSEEALRQAKRAIEEIAREARDLRVEALALLAMARLYLLMVKLEQ
LELRKLLLAAQALVQAAAQAERQTR
QAAAQLGE[AGISS]EEILELLRAAHE[LGLDP]DCIAAAADLGQ[AGISS]SEITALLLAAAAIELAKRADDKDVR
EIVRDALELASRSTNDEVIRLALEAAVLAARSTDSDVLEIVKDALELAKQSTNEEVIKLALKAAVLAAKSTDEEVLE
EVKEALRRAKESTDEEEIKEELRKAVEEAE(GSHHHHHH)
KTSELTDEKTIREEIRKVKEKSKEIV
LLLLAQAARN[SGTTVEE]AYKLAL[KLGIS]VEELLLLAKAADF[SGTTVEE]AYKLAL[KLGIS]VEELLLLARA
VRKTSELTDEKTIREEIRKVKEESKRIVEEA
EEEI
LIVD
NNRAIVEILALIVENNRAIIEALEAIGGGTKILEEMKKQLKDLKRALET
AFSGTTVEEAYKLALKLGIS(GSHHHHHH)
QAAAQLGE[AGISS]EEILELLRAAHE[LGLDP]DCIAAAADLGQ[AGISS]SEITALLLAAAAIELAKRADDKDVR
EIVRDALELASRSTNDEVIRLALEAAVLAARSTDSDVLEIVKDALELAKQSTNEEVIKLALKAAVLAAKSTDEEVLE
EVKEALRRAKESTDEEEIKEELRKAVEEAE(GSHHHHHH)
LGEYRIAIRAYRIALKHDPNNAEAWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAWYNLGNAYYKQGDYDEAIE
YYQKALELDPNNAEAKQNLGNAKQKQG(GSHHHHHH)
EAWYNLGNAAYKKGEYDEAIEAYQKALELDPNNAEAWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAKQNLGNA
KQKQG(GSHHHHHH)
SELTDED
TIRREILKVDVRMLAISLAASAKDEELRKEIKKCLQLAEELASRSTNKELQKQAMEVAKLALELARKATD
EELIKEILKCCQLAFELASRSTNDELIKQILEVAKLAFELASKATDEELIKEILKCCQLAFELASRSTNDEEIKQIL
ETAKEAFERASKATDEEEIKEILKKCQEKFEKKS(GSHHHHHH)
REAVRAAEELLRENPSTEAEELLRRAIEAAVRCPDCEAIREAVRAAEELLRENPSEEAKELLRRAIESAKKCPDPEA
QREAKRAEEELRKE(GSHHHHHH)
VEAIEAAVRALEAAERAGDPELREDAREAVRLAVEAAEEVQRNPSSSTANLLLKAIVALAEALAAAANGDKEKFKKA
AESALEIAKRVVEVASKEGDPEAVLEAAKVALRVAELAAKNGDKEVFKKAAESALEVAKRLVEVASKEGDPELVLEA
AKVALRVAELAAKNGDKEVFQKAAASAVEVALRLTEVASKEGDSELETEAAKVITRVRELASKQGDAAVAILAETAE
VKLEIEESKKRPQSESAKNLILIMQLLINQIRLLVLQIRMLDEQRQNQQREARVKSNEMERLAESLRLSARDRRGAM
SGSEEDQERIRKRMEEEEKDAEKVEKELRKVEEKMKSHEDTSLRLLVLIARLLINQIRLLILQIRSLSNLERNQARE
AMVHSNEMERRAEVLRLSAREQRRAG
EQAAREARIKERVKHAAEKMVRAAEAQAEFARLRAQ
AIRAVAEIAKEAQDSEVLEEAVRVIEEIAKESGSEEALRQAKRAIEEIAREARDLRVEALALLAMARLYLLMVKLEQ
LELRKLLLAAQALVQAAAQAERQTR
QRVLEEARKVSEEAREQGDDEVLALALIAIALAVLALALVACSRGNSEEAERASEKAQRVLEEARKVSEEAREQGDD
EVLALALIAIALAVLALAIVASCRGNKEEAERAAEDAIKVAMEALEVLLSAVEQGDLKVALAAVIAILLAIAALLMV
VEAIEAAVRALEAAERAGDPELREDAREAVRLAVEAAEEVQRNPSSSTANLLLKAIVALAEALAAAANGDKEKFKKA
AESALEIAKRVVEVASKEGDPEAVLEAAKVALRVAELAAKNGDKEVFKKAAESALEVAKRLVEVASKEGDPELVLEA
AKVALRVAELAAKNGDKEVFQKAAASAVEVALRLTEVASKEGDSELETEAAKVITRVRELASKQGDAAVAILAETAE
VKLEIEESKKRPQSESAKNLILIMQLLINQIRLLVLQIRMLDEQRQNQQREARVKSNEMERLAEVLRLSAREQRRAG
VEAIEAAVRALEAAERAGDPELREDAREAVRLAVEAAEEVQRNPSSSTANLLLKAIVALAEALAAAANGDKEKFKKA
AESALEIAKRVVEVASKEGDPEAVLEAAKVALRVAELAAKNGDKEVEKKAAESALEVAKRLVEVASKEGDPELVLEA
AKVALRVAELAAKNGDKEVFQKAAASAVEVALRLTEVASKEGDSELETEAAKVITRVRELASKQGDAAVAILAETAE
VKLEIEESKKRPQSESAKNLILIMQLLINQIRLLVLQIRMLDEQRQRLEQQMRMEVRQLEIRSRCLQEESEVVEEVE
AIRAVAEIAKEAQDSEVLEEAVRVIEEIAKESGSEEALRQAKRAIEEIAREARDLRVEALALLAMARLYLLMVKLEQ
AWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAKQNLGNAK
QKQG(GSHHHHHH)
AWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAKQNLGNAK
QKQG(GSHHHHHH)
AWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAKQNLGNAK
QKQG(GSHHHHHH)
AWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAKQNLGNAK
QKQG(GSHHHHHH)
LKRSGTSAVEIAKIVARVISEVIRTLKESGSSYEVICECVARIVAEIVEALKRSGTSAAIIALIVALVISEVIRTLK
AAKELKKSV
SEEEARTIAKEAATAFAKLALLQAEAFATLVKAAARVAYILGAIAYAQGEYDIAITAYQVALDLDPNNAEAWYNLGN
AYYKQGDYDEAIEYYQKALELDPNNAEAWYNLGNAYYKQGDYDEAIEYYQKALELDPNNAEAKQNLGNAKQKQG(GS
LKRSGTSAVEIAKIVARVISEVIRTLKESGSSYEVICECVARIVAEIVEALKRSGTSAAIIALIVALVISEVIRTLK
AAKKSATHI
SEYEIRKALEELKAATAELKRATASLRAITEELKRLAKALAEKMYKAGNAMYRKGQYTIAIIAYTLALLADPNNAEA
KQG
ALVEHNRAIVEHNAIIVEHNRIIAAVLELIVRAIAHTAAELAYLLGELAYKLGEYRIAIRAYRIALKLDPNNAEAWY
G
In one embodiment, any amino acid substitutions at interface residues (single underlined residues) are conservative amino acid substitutions. As used herein, “conservative amino acid substitution” means a given amino acid can be replaced by a residue having similar physiochemical characteristics, e.g., substituting one aliphatic residue for another (such as Ile, Val, Leu, or Ala for one another), or substitution of one polar residue for another (such as between Lys and Arg; Glu and Asp; or Gln and Asn). Other such conservative substitutions, e.g., substitutions of entire regions having similar hydrophobicity characteristics, are known. Polypeptides comprising conservative amino acid substitutions can be tested in any one of the assays described herein to confirm that a desired activity, e.g. antigen-binding activity and specificity of a native or reference polypeptide is retained. Amino acids can be grouped according to similarities in the properties of their side chains (in A. L. Lehninger, in Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H). Alternatively, naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe. Non-conservative substitutions will entail exchanging a member of one of these classes for another class. 
Particular conservative substitutions include, for example: Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu.
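As a non-limiting illustration, the first side-chain grouping recited above (non-polar, uncharged polar, acidic, basic) can be expressed as a simple lookup. The following Python sketch is hypothetical: the function names are illustrative only, and it treats a substitution as conservative only when both residues fall in the same one of those four groups.

```python
# Side-chain groups as recited in the text (first grouping only).
SIDE_CHAIN_GROUPS = {
    "non-polar": set("AVLIPFWM"),
    "uncharged polar": set("GSTCYNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def side_chain_group(residue):
    """Return the name of the group containing a one-letter residue code."""
    for name, members in SIDE_CHAIN_GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unrecognized residue code: {residue!r}")


def is_conservative(original, replacement):
    """True when both residues belong to the same side-chain group.

    Illustrative assumption: group membership is the only criterion.
    """
    return side_chain_group(original) == side_chain_group(replacement)


if __name__ == "__main__":
    print(is_conservative("K", "R"))  # Lys and Arg are both basic
    print(is_conservative("D", "K"))  # acidic versus basic
```

This group-based test does not reproduce the list of particular substitutions given above, some of which cross these groups (for example, Ala into Gly), so a practical check may need to combine a group lookup with an explicit table of permitted pairs.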
In another embodiment, any amino acid substitutions at structural residues (bold font residues) are conservative amino acid substitutions.
In a further embodiment, any amino acid substitutions at residues needed for binding to small molecule (residues within squiggly brackets) are conservative amino acid substitutions.
In one embodiment, one or more loop regions are substituted with, or supplemented by, any peptide domain deemed suitable for an intended use: domains that can be modified by enzymatic activity (e.g., phosphorylation), small molecule or protein binding domains, or catalytic domains. In this embodiment, the loop region may be substituted in its entirety, or 1, 2, 3, 4, 5, or all amino acid residues of the loop region may be retained when inserting the peptide domain.
In other embodiments, interface residues, structural residues, and/or residues needed for binding to small molecule are not substituted and are maintained relative to the reference polypeptide.
In another embodiment, any amino acid substitutions relative to the reference polypeptide are conservative amino acid substitutions. In one embodiment, optional amino acid residues are absent and are not considered when determining percent identity. In another embodiment, 1, 2, 3, 4, 5, 6, or more, or all of the optional amino acid residues are present and are considered when determining percent identity.
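By way of a hedged illustration, the exclusion of absent optional residues from the percent identity determination can be sketched in Python as follows. The sketch assumes, for illustration only, that optional residues are the ones written in parentheses in the annotated sequences, that square and squiggly brackets merely mark residues that remain part of the sequence, and that the two sequences being compared are already aligned and of equal length. The function names are hypothetical.

```python
import re


def strip_annotations(annotated, keep_optional=False):
    """Return a plain one-letter sequence from an annotated sequence string.

    ASSUMPTION (not stated in the text): residues in parentheses are optional.
    Residues inside [] or {} are always kept; only the brackets are removed.
    """
    if not keep_optional:
        annotated = re.sub(r"\([^)]*\)", "", annotated)
    return re.sub(r"[^A-Z]", "", annotated)


def percent_identity(candidate, reference):
    """Position-wise identity for two pre-aligned, equal-length sequences."""
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


if __name__ == "__main__":
    reference = strip_annotations("AEELIREAKE(GSHHHHHH)")  # optional tag dropped
    variant = "AEELLREAKE"                                  # one substitution
    print(reference, percent_identity(variant, reference))
```

In practice, sequences of unequal length would first be aligned with an alignment tool; the position-wise comparison here is a deliberate simplification.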
In another embodiment, the disclosure provides kits or machine assemblies comprising an axle and ring pair comprising an amino acid sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence of any one or more axle and ring pairs selected from the group consisting of the following pairs (A)-(J), not including any functional domains added fused to the polypeptides (whether N-terminal, C-terminal, or internal), and wherein the 1, 2, 3, 4, or 5 N-terminal and/or C-terminal amino acid residues may be present or absent and when absent are not considered in determining the percent identity:
In kit embodiments, the axle and ring may be assembled or may be unassembled. In machine assembly embodiments, the axle and ring are assembled (such as by non-covalent assembly), as disclosed in the examples that follow.
In one embodiment, any amino acid substitutions at interface residues (single underlined residues) are conservative amino acid substitutions. In another embodiment, any amino acid substitutions at structural residues (bold font residues) are conservative amino acid substitutions. In a further embodiment, any amino acid substitutions at residues needed for binding to small molecule (residues within squiggly brackets) are conservative amino acid substitutions. In one embodiment, one or more loop regions are substituted or added to with any peptide domain deemed suitable for an intended use.
In other embodiments, interface residues, structural residues, and/or residues needed for binding to small molecule are not substituted. In some embodiments, optional amino acid residues are absent and are not considered when determining percent identity. In other embodiments, optional amino acid residues are present and are considered when determining percent identity. In another embodiment, any amino acid substitutions relative to the reference polypeptides are conservative amino acid substitutions.
The kit or machine assembly may comprise any other components as deemed appropriate for an intended use. In one non-limiting embodiment, the kits further comprise small molecule fuels to permit rotation of the assembled motor assembly, or small molecule suicide inhibitors that can lock mechanical rotation, as described in examples that follow.
In another aspect the disclosure provides nucleic acids encoding the polypeptides or kit/machine components of any embodiment or combination of embodiments of the disclosure. The nucleic acid sequence may comprise single stranded or double stranded RNA (such as an mRNA) or DNA in genomic or cDNA form, or DNA-RNA hybrids, each of which may include chemically or biochemically modified, non-natural, or derivatized nucleotide bases. Such nucleic acid sequences may comprise additional sequences useful for promoting expression and/or purification of the encoded polypeptide, including but not limited to polyA sequences, modified Kozak sequences, and sequences encoding epitope tags, export signals, and secretory signals, nuclear localization signals, and plasma membrane localization signals. It will be apparent to those of skill in the art, based on the teachings herein, what nucleic acid sequences will encode the polypeptides of the disclosure.
In a further aspect, the disclosure provides expression vectors comprising the nucleic acid of any aspect of the disclosure operatively linked to a suitable control sequence. “Expression vector” includes vectors that operatively link a nucleic acid coding region or gene to any control sequences capable of effecting expression of the gene product. “Control sequences” operably linked to the nucleic acid sequences of the disclosure are nucleic acid sequences capable of effecting the expression of the nucleic acid molecules. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences and the promoter sequence can still be considered “operably linked” to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, termination signals, and ribosome binding sites. Such expression vectors can be of any type, including but not limited to plasmid and viral-based expression vectors. The control sequence used to drive expression of the disclosed nucleic acid sequences in a mammalian system may be constitutive (driven by any of a variety of promoters, including but not limited to, CMV, SV40, RSV, actin, EF) or inducible (driven by any of a number of inducible promoters including, but not limited to, tetracycline, ecdysone, steroid-responsive). The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In various embodiments, the expression vector may comprise a plasmid, viral-based vector, or any other suitable expression vector.
In another aspect, the disclosure provides host cells that comprise the nucleic acids, expression vectors (i.e.: episomal or chromosomally integrated), non-naturally occurring polypeptides, fusion protein, or compositions disclosed herein, wherein the host cells can be either prokaryotic or eukaryotic. The cells can be transiently or stably engineered to incorporate the nucleic acids or expression vector of the disclosure, using techniques including but not limited to bacterial transformations, calcium phosphate co-precipitation, electroporation, or liposome mediated-, DEAE dextran mediated-, polycationic mediated-, or viral mediated transfection.
In another aspect, the present disclosure provides pharmaceutical compositions, comprising one or more polypeptides, kits, motor assemblies, nucleic acids, expression vectors, and/or host cells of the disclosure and a pharmaceutically acceptable carrier. The pharmaceutical compositions of the disclosure can be used, for example, in the methods of the disclosure described below. The pharmaceutical composition may comprise in addition to the polypeptide of the disclosure (a) a lyoprotectant; (b) a surfactant; (c) a bulking agent; (d) a tonicity adjusting agent; (e) a stabilizer; (f) a preservative and/or (g) a buffer.
The disclosure further provides methods for using the polypeptide, kit, machine, nucleic acid, expression vector, host, and/or pharmaceutical composition of any preceding claim for any suitable use as disclosed herein, including but not limited to in protein nanomachines that can be genetically encoded for multicomponent self-assembly within cells or in vitro, facilitating fabrication or in vivo transfer and use in a vast range of nanodevices for medicine, material sciences or industrial bioprocesses.
In another aspect, the disclosure provides methods for designing the polypeptides of the disclosure, comprising any design methods as disclosed in the examples that follow.
Intricate protein nanomachines in nature have evolved to process energy and information by coupling biochemical free energy to mechanical work. The design of dynamic protein mechanical systems is of great interest given their richer functionality, but while recent advances in protein design now enable the generation of increasingly sophisticated static nanostructures and assemblies (9-17), the complex folding and diversity of non-covalent interactions has thus far made this very challenging (18).
We set out to explore the design of protein mechanical systems through a first-principles, bottom-up approach that decouples operational principles from the complex evolutionary trajectory of natural nanomachines. Sampling of the folding landscape for both structural and dynamic features is computationally expensive, and hence we decided on a hierarchical design approach with steps that can be tackled in turn: (i) the de novo design of stable protein building blocks optimized for assembly into constrained mechanical systems, (ii) the directed self-assembly of these components into hetero-oligomeric complexes, (iii) the shaping of the multistate energetic landscape along mechanical degrees of freedom (DOF) and (iv) the coupling of chemical or light energy to rotation or other motion. In this paper, as a proof of concept we aim to assemble a simple machine or kinematic pair (19,20) at the nanoscale, and focus on steps i-iii to design mechanically constrained hetero-oligomeric protein systems that undergo Brownian rotary motion. We start from a rotary machine blueprint (
We set out to design de novo a library of stable protein components with shapes, fold and symmetry specifications suitable for integration into rotationally constrained assemblies. We first sought to design ring-like protein topologies with a range of inner diameter sizes capable of accommodating an axle-like binding partner in the center (
We next sought to design high aspect ratio protein folds, or axles, onto which the ring-like designed protein could be threaded. In a first approach, single helix protein backbones were parametrically generated, and then D2, D3 or D4 dihedral symmetry was imposed to produce self-assembling dihedral homooligomers consisting of interdigitated single helices (
Synthetic genes encoding axle designs generated from the three approaches (12xC3s, 12xC5s, 12xC8s, 6xD2s, 12xD3s, 6xD4s, 6xD5s, 12xD8s) were obtained and the proteins were expressed in E. coli. The designed proteins that were well-expressed, soluble, and readily purified by Ni-NTA affinity chromatography were further purified on SEC. ~40% (37.5% (6/16), 43% (14/32) and 33% (4/12) success rates for the first, second and third approach respectively) had appropriate monodisperse SEC chromatograms that matched the expected theoretical elution profile for the oligomerization state (
The first approach generated D2, D3 and D4 axle-like structures with folds featuring interdigitated helices with extended hydrogen bond networks. We obtained a 4.2 Å 3D reconstruction of a D3 axle (_1na0C3_int2_11) which showed close agreement with the design model topology. While the backbone was nearly identical to the design model, the side-chains could be partially elucidated (
The second approach generated D3, D4, D5 and D8 axle-like structures with folds featuring interdigitated helices with internal cavities for D5 and D8 (in these cases each central helix only forms contacts with two neighboring ones) (
The third approach yielded four C3 axles with folds of smaller aspect ratio and overall size, containing a large wheel-like DHR feature at one end, a narrow central three helix section and a six helix section at the other end. In all cases, the SAXS profiles together with SEC traces suggested that the correct oligomerization state was realized in solution. For design A15.5 we obtained a low resolution cryoEM map that recapitulated the general features of the design model, with prominent C3 symmetric DHR extremities and opposing prism-like extensions (
We next sought to assemble diverse axle-ring assemblies to explore the correspondence between the symmetry and energy landscape of the interface and the mechanical properties. The first challenge was to direct the self-assembly in solution of the ring around the axle by designing energetically favorable interactions, while maintaining some rotational freedom. We first sought to do this by designing assemblies with low residue interaction specificity, loose interface packing, as well as non-obligatory symmetry mismatched interactions between axle and ring restricting only parts of the assembly to form tight contacts (i.e. the full interface is never fully satisfied). To achieve these properties, we initially focused on electrostatic interactions between ring and axle which are longer range and less dependent on shape matching than the hydrophobic interactions generally utilized in protein design. To prevent potential disassembly at low concentrations, we aimed to kinetically trap the ring around the axle by installing disulfide bonds at the ring subunit-subunit interfaces. Further, to gain stepwise control on the in vitro assembly process, we introduced buried histidine mediated hydrogen bond networks at the ring asymmetric unit interfaces to enable pH controlled ring assembly (
We tested this approach by selecting three of the machine components described above—a D3 axle, a C3 ring and a C5 ring—and constructing ring-axle rotary machine assemblies with D3-C3 and D3-C5 symmetries (design A113_C2ams9 and C3D3_AR113 respectively,
We next experimented with the design of shape-complementary axle and ring components, reasoning that this would enable more precise control of the rotational energy landscape by leveraging the ability to design tightly packed interfaces and hydrogen-bond-network-mediated specificity (25). We designed four axle-ring assemblies using this approach: a fully C3 symmetric assembly consisting of a C3 axle and a C3 ring (C3-C3, A15.5R82), a symmetry mismatched assembly consisting of a D8 axle around which two C4 rings are assembled (D8-C4, 119RC4_20), a symmetry mismatched rotor consisting of a C5 axle and a C3 ring (C5-C3_2412 and C5C3_3250), as well as a C8-C4 rotor corresponding to a circular permutation version of the D8-C4 (C8D8_6_49_119RC4_20) (
Designs with each of the four symmetries were screened for assembly by expressing ring and axle pairs bicistronically and carrying out Ni-NTA purifications relying on a single His tag on the ring component (
To map the rotational landscape at the single molecule level, we subjected one design from each symmetry class to single particle cryoEM examination. For D3-C3 and D3-C5, we obtained 2D class averages from the collected data that clearly resembled predicted projection maps, and 3D reconstructions in close agreement with the overall design model topology and designed hetero-oligomeric state (
Single particle cryoEM analysis of a C3-C3 assembly yielded 2D class averages with the axle and ring clearly visible. We were able to generate a 3D reconstruction with a resolution of 6.5 Å, which yielded an electron density map similar to the design model (
The predicted energy landscape of the D8-C4 design is quite rugged, with a total amplitude of 151.7 REU with 8 steep wells spaced 45° stepwise along the rotational axis corresponding to the high symmetry of the interface. We obtained a cryoEM map of ~5.9 Å resolution very close to the design model (
Our proof-of-concept rotary machine assemblies demonstrate that protein nanostructures with internal mechanical constraints can now be designed. The hetero-oligomeric topologies we created do not exist in nature, nor have such synthetic systems been designed previously, and they provide insights towards the design of more complex protein nanomachines. First, systematic and accurate de novo design according to machine component specifications (
The internal periodic but asymmetric rotational energy landscapes of our designed rotary machine assemblies provide one of two needed elements for a directional motor. An energy harvesting process to break detailed balance and transfer the system into an excited state remains to be designed: for example the interface between machine components can be designed for binding and catalysis of small molecule fuels (19). Symmetry mismatch, which plays a crucial role in torque generation in natural motors (31-37), can be leveraged for the design of synthetic protein motors. Modular assembly could lead to compound machines for advanced operation or integration within nanomaterials. In this direction, we recently designed modular rotor complexes with reversible heterodimer extensions binding components of the rotor (
Approach 1: This approach relies first on the design of short (30 to 50 residues) single alpha-helix monomers that self-assemble into high aspect ratio dihedral homooligomers, which are then further fused to cyclic wheel-shaped homooligomers to yield full axle parts.
Parametric design was used to generate short single α-helices and sample backbone configurations by systematically varying helical parameters using the Crick generating equations(24, 38). As described before, ideal values were used for the supercoil twist (ω0) and helical twist (ω1).
In the case of D3 helical bundles, to obtain the interdigitated helical geometry that yields a packed core without holes after design, and therefore assembly of single helices into dihedral symmetry, we sampled two segments per helix with different starting points for the superhelical radius (6 Å and 12 Å), joined by a custom number of linker residues, using a custom Python script. The parameter range (˜5 Å with bins of 0.5 Å) was selected through iterative cycles of parametric helix generation and Rosetta™ design with metric assessment, retaining the range that yielded the highest-scoring backbones. The helical phase (Δϕ1) was sampled from 0° to 90° with a step size of 10°. We sampled the offset along the z-axis (Z-offset) from −1.51 Å to 1.51 Å, with a step size of 0.1 Å. The supercoil phases (Δϕ0) were fixed at 0° and 30° for D3s and D2s, respectively.
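The sampling ranges stated above can be enumerated as a simple parameter grid. The following is a minimal sketch only, not the custom script referred to in the text: the function and variable names are hypothetical, and it assumes that each range starts at its lower bound and advances by the stated step without exceeding the upper bound (so a 0.1 Å step starting at −1.51 Å ends at 1.49 Å).

```python
import math
from itertools import product

def inclusive_range(start, stop, step, ndigits=2):
    """Values start, start+step, ... that do not exceed stop (assumed convention)."""
    count = math.floor((stop - start) / step + 1e-9) + 1
    return [round(start + i * step, ndigits) for i in range(count)]

# Ranges taken from the text; names are illustrative only.
helical_phases = inclusive_range(0.0, 90.0, 10.0)   # helical phase, degrees
z_offsets = inclusive_range(-1.51, 1.51, 0.1)       # Z-offset, Å
SUPERCOIL_PHASE = {"D3": 0.0, "D2": 30.0}            # fixed supercoil phase, degrees

def sampling_grid(symmetry):
    """All (helical phase, Z-offset) combinations for one dihedral symmetry."""
    phi0 = SUPERCOIL_PHASE[symmetry]
    return [{"supercoil_phase": phi0, "helical_phase": phi1, "z_offset": z}
            for phi1, z in product(helical_phases, z_offsets)]

grid = sampling_grid("D3")
```

Each entry of `grid` would then be passed to a backbone generator and scored; the superhelical radius sampling is left out because the text does not fully specify its bounds.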
Once ideal backbone geometries were generated using this parametric approach, we used the Rosetta™ design protocol to design side chain identities and rotamers and to optimize the interface energy to direct assembly into dihedral homooligomers. Importantly, this step relied on the Rosetta™ HBnet protocol described previously(25), which allows for extended hydrogen bond networks across monomer subunits, thereby ensuring specificity of interaction and a symmetric binding mode.
The dihedral building blocks were then rigidly fused to previously designed cyclic homooligomers(39) by designing short rigid helical linkers bridging the two building blocks. The inner helices of the dihedral assemblies obtained (C or N termini depending on design) were then fused by short structured helical fragments using Rosetta™ Remodel(40) while sampling the rotation and distance between the Z-aligned cyclic and dihedral homooligomers. To further stabilize and optimize the generated cyclic-dihedral fusion, a second round of Rosetta™ design of the fusion was performed.
Approach 2: This approach relied on alpha helical extensions of the N or C termini of previously designed cyclic homooligomers, in order to direct the assembly of two elongated cyclic homooligomers into high aspect ratio dihedral symmetric axle parts. Rosetta™ SymDofMover was used to set up the symmetry in which the input monomer subunits were aligned along the z axis. Input subunits were first optionally flipped 180 degrees about the z axis to reverse the inputs if necessary, so that the N or C termini to be elongated would point toward each other. Monomer subunits were then translated along the specified z axis and rotated about the z axis according to random Gaussian sampling in order to finely sample helical extension parameters. Following these initial manipulations of the input structures, a symmetric pose was generated using D3, D4, D5, D6 or D8 symmetry definition files. We then applied the Rosetta™ BluePrintBDR mover, which allowed us to build helical fragment extensions starting at the previously positioned monomers and spanning the distance between symmetric subunits. Once centroid helical backbone geometries were generated and sampled, we used the Rosetta™ design protocol to design side chain identities and rotamers and to optimize the interface energy to direct assembly into dihedral homooligomers. Importantly, this step relied on the Rosetta™ HBnet protocol described previously (25), which allows for extended hydrogen bond networks across monomer subunits, thereby ensuring specificity of interaction and a symmetric binding mode.
Computationally designed ring-shaped structures of various symmetries (C1, C3, C4) were either collected from previously published work (21,41) or designed from heterodimers and DHRs in symmetry mode (C3, C5) using protocols previously described(/2). 9x, 12x and 24x toroids were used in C1 symmetric versions or cut into 3 or 4 to produce C3 or C4 symmetric homooligomers. All designs were then computationally augmented by systematic symmetric fusion of DHR repeat proteins using the HFuse protocol, and the surrounding fusion interface was further redesigned using Rosetta™ design protocols to optimize the assembly energy.
Generation of Two Component Rotary Machine Models from Symmetric Axle and Rotor Parts:
The goal of the computational docking procedure between axle and rotor machine parts was to exhaustively sample the rotational conformational space, within a specified resolution and subject to meaningful interface quality, and thereby enumerate all possible ways to assemble a full rotary machine complex from the two libraries of previously designed axle and rotor parts.
We started by enumerating all possible rotary machine assemblies by inspecting the shapes and dimensions of available parts and identifying assemblies that would not produce any steric clashes. We then proceeded to computational docking of parts using a two-dimensional rigid body docking space to allow contact between the axle and rotor (one rotation about, and one translation along, the Z axis). We sampled 180° of rotation for C2s, 120° for C3s, 90° for C4s and 72° for C5s, and we sampled the whole span of available translation along the axis that would not generate clashes between backbones, with a 1° and 1 Å step, respectively. For each sampled dock, the resulting heteromultimeric interface was designed either using Rosetta™ design and HBnet to obtain tightly packed, specific interfaces with extended hydrogen bond networks, or, in some cases, by constraining the residue identities of the axle (DEHQTNSY) and ring (KRHQTNSY) to obtain complementary charges allowing loose, non-specific interactions. Since some of the resulting assemblies have an intrinsic symmetry mismatch between the axle and rotor (e.g. D8 axle and C4 ring), we used a quasi-symmetric design methodology relying on the Rosetta™ StoreQuasiSymmetricTaskMover, which creates a stored task that links selected interface residues. The linked residues keep the same identity when the interface is designed, but their rotamers are packed differently, which allows identical residues in symmetric subunits to satisfy multiple interfaces at the same time.
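The two-dimensional docking space described above (one rotation and one translation along the Z axis) can be sketched as a grid enumeration. This is an illustrative sketch under stated assumptions, not the docking code used for the designs: `docking_grid` and its arguments are hypothetical names, the clash-free translation span (`z_min`, `z_max`) is assumed to have been determined beforehand, and a rotation equal to the full symmetric span is omitted because it is assumed equivalent to zero rotation.

```python
import math

def docking_grid(ring_fold, z_min, z_max, rot_step=1.0, z_step=1.0):
    """Enumerate (rotation in degrees, z translation in Å) docks for a Cn ring."""
    span = 360.0 / ring_fold          # 180 for C2, 120 for C3, 90 for C4, 72 for C5
    n_rot = round(span / rot_step)    # rotation by the full span repeats rotation 0
    n_z = math.floor((z_max - z_min) / z_step + 1e-9) + 1
    return [(i * rot_step, z_min + k * z_step)
            for i in range(n_rot)
            for k in range(n_z)]

# Hypothetical clash-free translation span of -5 Å to +5 Å for a C3 ring.
c3_docks = docking_grid(3, -5.0, 5.0)
```

Each (rotation, translation) pair would then be handed to interface design and scored; the grid itself is independent of which design protocol follows.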
In order to kinetically trap rings onto the axle, we further generated disulfide-bonded versions of the homooligomers by placing cysteines at the interface between asymmetric units. This was achieved using a PyRosetta™-based stapling method that identifies pairs of residues that can accommodate disulfides given the 3D structure of a protein.
This protocol was developed to identify such residue pairs quickly. 30,000 native disulfide structures were procured from the PDB, and the relative positions of the backbone atoms (N, CA, C) were calculated, hashed, and stored in a database. A candidate protein structure can then be searched for residue pairs whose backbone atoms sit at relative positions that can accommodate disulfides according to native geometries.
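The hashing idea described above can be illustrated with a small, self-contained sketch. It is not the PyRosetta™-based stapling method itself: the local frame construction, the 1 Å bin size, all function names and the example coordinates are assumptions chosen only to show how relative backbone positions can be reduced to a lookup key that is unchanged by rigid motion of the pair.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return (a[0] / norm, a[1] / norm, a[2] / norm)

def local_frame(residue):
    """Right-handed orthonormal frame built from (N, CA, C), origin at CA."""
    n, ca, c = residue
    x = _unit(_sub(c, ca))
    z = _unit(_cross(x, _sub(n, ca)))
    y = _cross(z, x)
    return ca, (x, y, z)

def pair_key(res_a, res_b, bin_size=1.0):
    """Hash key: backbone atoms of res_b expressed in the frame of res_a, binned."""
    origin, axes = local_frame(res_a)
    key = []
    for atom in res_b:
        offset = _sub(atom, origin)
        key.extend(math.floor(_dot(offset, axis) / bin_size) for axis in axes)
    return tuple(key)

def build_database(native_pairs, bin_size=1.0):
    """Set of keys observed in native disulfide-bonded residue pairs."""
    return {pair_key(a, b, bin_size) for a, b in native_pairs}

def find_candidates(residues, database, bin_size=1.0):
    """Ordered residue index pairs whose relative geometry is in the database."""
    return [(i, j)
            for i, a in enumerate(residues)
            for j, b in enumerate(residues)
            if i != j and pair_key(a, b, bin_size) in database]

# Made-up coordinates (Å) standing in for one native disulfide pair.
native_a = ((-1.2, 0.8, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0))
native_b = ((4.3, 3.4, 1.6), (5.4, 4.3, 1.4), (5.6, 5.7, 1.5))
database = build_database([(native_a, native_b)])

def move(p):
    # Rigid motion: 90 degree rotation about z followed by a translation.
    return (-p[1] + 10.0, p[0] - 3.0, p[2] + 2.0)

candidate = [tuple(move(p) for p in native_a), tuple(move(p) for p in native_b)]
hits = find_candidates(candidate, database)
```

Because the key is computed in a frame attached to the first residue, the same pair is recognized after it has been rotated and translated, which is what lets a set lookup replace a geometric comparison against every native example.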
Rosetta™ models of D3-C3 and D3-C5 with truncated ring DHR arms (to minimize the total number of atoms to simulate) were used as the starting coordinates for the simulations. The rotor rings of D3-C3 and D3-C5 were rotated at 10 and 12 degree intervals, respectively. Each model was solvated in an octahedral periodic box of OPC water and 70 mM NaCl using AmberTools18 (42). In total, each system consisted of approximately 590,000 atoms. Simulations were run at constant pressure (1 bar) and temperature (298 K) using the Monte Carlo barostat, the Langevin thermostat and the ff19SB forcefield(43). Using the CUDA-enabled version of Amber18, four parallel simulations for each rotated model were equilibrated using the AmberMDprep protocol(44). Once equilibrated, the simulations were run at a 2 fs timestep for a total of 40 ns each, yielding an aggregate simulation time of 1920 ns for D3-C3 and 960 ns for D3-C5. To allow exploration of the rotors' degrees of freedom from the initial configurations, the first 20 ns of each simulation was discarded and the final 20 ns was used in later analysis. To investigate the movement of the rings around their respective axles, 200 ps snapshots of the simulations were aligned to the initial axle coordinates by RMSD. Number density maps of the backbone atoms were calculated using the VolMap command in AmberTools' cpptraj (45). These maps were contoured to 0.001 as shown in fig. S15. To calculate the axle drift with respect to ring rotation, the backbone center of mass of the rings was calculated for all aligned snapshots. The snapshots were binned according to the ring rotation in 24 degree intervals and then averaged as shown in fig. S15. To calculate ring tilt, the center of mass of each ring subunit was calculated, then a plane was fit through these points using the least squares optimizer in SciPy (46).
The angle between this plane and the long axis of the axle was taken as the tilt, and this was averaged over rotation as described for the axial drift. The mean square displacement (MSD) for the DOFs was computed as $\mathrm{MSD}(t) = \langle (r(t) - r(0))^2 \rangle$.
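As an illustration of the mean square displacement calculation, the sketch below averages the squared displacement from the first frame over a set of trajectories of a single scalar degree of freedom. The function name and the choice to average over independent trajectories (rather than over multiple time origins) are assumptions made for this example; the text does not specify the averaging scheme.

```python
def mean_square_displacement(trajectories):
    """MSD at each frame: squared displacement from frame 0, averaged over trajectories."""
    n_frames = min(len(traj) for traj in trajectories)
    n_traj = len(trajectories)
    return [sum((traj[k] - traj[0]) ** 2 for traj in trajectories) / n_traj
            for k in range(n_frames)]

# Two toy trajectories of a rotational DOF (degrees, unwrapped), for illustration only.
toy = [[0, 1, 2], [0, -1, -2]]
msd = mean_square_displacement(toy)
```

For an angular degree of freedom the input should be unwrapped beforehand so that a crossing from 359° to 0° is not counted as a large jump.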
Buffer and media recipe for protein expression
TBM-5052: 1.5% [wt/vol] tryptone, 2.5% [wt/vol] yeast extract, 0.5% [wt/vol] glycerol, 0.05% [wt/vol] D-glucose, 0.2% [wt/vol] D-lactose, 25 mM Na2HPO4, 25 mM KH2PO4, 50 mM NH4Cl, 5 mM Na2SO4, 2 mM MgSO4, 10 μM FeCl3, 4 μM CaCl2, 2 μM MnCl2, 2 μM ZnSO4, 400 nM CoCl2, 400 nM NiCl2, 400 nM CuCl2, 400 nM Na2MoO4, 400 nM Na2SeO3, 400 nM H3BO3
Lysis buffer: 25 mM Tris, 25 mM NaCl, 20 mM Imidazole, pH 8.0 at room temperature
Wash buffer: 25 mM Tris, 25 mM NaCl, 20 mM Imidazole, pH 8.0 at room temperature
Elution buffer: 25 mM Tris, 25 mM NaCl, 200 mM Imidazole, 50 mM EDTA, pH 8.0 at room temperature
TBS buffer: 25 mM Tris pH 8.0, 25 mM NaCl
Construction of synthetic genes
Prior to transformation and expression in E. coli hosts, synthetic genes were ordered from either Integrated DNA Technologies (Coralville, IA) or Genscript Inc. (Piscataway, N.J., USA) and cloned into the pET29b+ E. coli expression vector between the NdeI and XhoI sites. For bicistronic constructs used for screening the in cellulo assembly of axles and rotors, a synthetic bicistron containing both axle and rotor genes was synthesised and cloned at once into the NdeI/XhoI sites, with a termination sequence and a strong ribosomal binding site sequence between the genes. For most synthetic gene constructs, a C- or N-terminal hexahistidine tag was added in frame after a short GS linker. A stop codon was introduced at the 3′ end of the protein coding sequence to prevent expression of the C-terminal hexahistidine tag in the vector.
Plasmids were transformed into chemically competent E. coli expression strain BL21(DE3*) (New England Biolabs) for protein expression. Following transformation and overnight growth on Luria-Bertani agar plates containing 100 μg/mL kanamycin, single colonies were picked and directly transferred into 2×50 ml TBM-5052 medium containing 150 μg/mL kanamycin and incubated with shaking at 225 rpm for 24 hours at 37° C. following the autoinduction method (47). After 24 hours of incubation, the temperature was dropped for an overnight incubation at 20° C. before harvesting the cells via centrifugation at 4500 G for 20 minutes at 4° C.
The cell pellets were resuspended in 30 ml lysis buffer, followed by cell lysis via sonication at 85% power for 2.5 minutes (10 sec on/10 sec off) while keeping the cell suspension at 4° C. Lysates were clarified by centrifugation at 4° C. and 18000 G for 45 minutes and applied to columns containing Ni-NTA (Qiagen) resin pre-equilibrated with lysis buffer. The columns were washed 3 times with 10 column volumes (CV) of wash buffer, followed by 15 ml of elution buffer for protein elution.
Protein elutions were further concentrated in 15 mL 3K protein concentrators (Millipore Sigma) to a volume of 500 μL and the buffer was exchanged for TBS buffer. The resulting protein solutions were purified by SEC using a Superdex™ 6 10/300 GL increase column (GE Healthcare) or a Superdex™ 200 10/300 GL increase column in TBS buffer. SEC elution fractions corresponding to the designs' theoretical elution volumes were concentrated in TBS prior to further biochemical analysis. The theoretical SEC elution volumes were computed using the following calibrated equations: VS200 = −1.89 log(<mass of design in kDa>) + 21.9; and VS6 = −1.33 log(<mass of design in kDa>) + 21.9.
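The calibrated equations can be applied with a few lines of code. The sketch below is illustrative only: the function and dictionary names are hypothetical, the Superdex™ 6 coefficients are read from the text as a slope of −1.33 and an intercept of 21.9, and, because the text writes "log" without a base, the natural logarithm is assumed here. The `log` argument can be replaced with `math.log10` if the calibration was performed in base 10.

```python
import math

# (slope, intercept) pairs as read from the calibrated equations in the text.
CALIBRATION = {"S200": (-1.89, 21.9), "S6": (-1.33, 21.9)}

def theoretical_elution_volume(mass_kda, column="S200", log=math.log):
    """Predicted SEC elution volume for a design of the given mass in kDa.

    The logarithm base is an assumption (natural log by default).
    """
    slope, intercept = CALIBRATION[column]
    return slope * log(mass_kda) + intercept

# Example: predicted volumes for a hypothetical 100 kDa assembly on each column.
v_s200 = theoretical_elution_volume(100.0, "S200")
v_s6 = theoretical_elution_volume(100.0, "S6")
```

Under either base, a larger mass gives a smaller predicted elution volume, which is the behavior expected of size exclusion chromatography.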
D3 axles and C3 or C5 rings were purified as previously described. Axle and ring were then mixed in TBS solution with 25 mM TCEP at a 1:1 stoichiometry, after which the pH was dropped to 3.0 by dialysis in citrate buffer with TCEP. The protein samples were then heated for an hour at 65° C. and allowed to cool back down to room temperature on the bench. The protein samples were then dialysed overnight in TBS buffer and further purified by SEC.
Protein samples were purified by SEC in 25 mM Tris pH 8.0, 25 mM NaCl and 1% glycerol; elution fractions corresponding to the protein were further concentrated using 3K protein concentrators (Millipore Sigma) and the flow-through was used as blank for buffer subtraction. SAXS measurements were performed at the SIBYLS 12.3.1 beamline at the Advanced Light Source. The sample-to-detector distance was 1.5 m, and the X-ray wavelength (λ) was 1.27 Å, corresponding to a scattering vector q (q=4πsin θ/λ, where 2θ is the scattering angle) range of 0.01 to 0.3 Å⁻¹. A series of exposures were taken of each well, in equal sub-second time slices: 0.3-s exposures for 10 s resulting in 32 frames per sample. For each sample, data were collected for two different concentrations to test for concentration-dependent effects; ‘low’ concentration samples corresponded to 1 mg/ml and ‘high’ concentration samples to 5 mg/ml. Collected data were processed using the SAXS FrameSlice™ online server and analysed using the ScÅtter software package(23). The FoXS™ software (Sali Lab) was used to compare experimental scattering profiles to design models and assess quality of fit(48-50).
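The relation between the scattering vector and the scattering angle can be made concrete with a short calculation. This sketch simply evaluates q = 4π sin θ/λ and its inverse at the stated wavelength of 1.27 Å; the function names are hypothetical and nothing here reflects the beamline software.

```python
import math

WAVELENGTH = 1.27  # X-ray wavelength in Å, as stated in the text

def q_from_two_theta(two_theta_deg, wavelength=WAVELENGTH):
    """Scattering vector magnitude q (1/Å) for a scattering angle 2θ in degrees."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength

def two_theta_from_q(q, wavelength=WAVELENGTH):
    """Scattering angle 2θ in degrees corresponding to a given q (1/Å)."""
    return 2.0 * math.degrees(math.asin(q * wavelength / (4.0 * math.pi)))

# Scattering angles spanned by the reported q range of 0.01 to 0.3 1/Å.
q_range = (0.01, 0.3)
angle_range = tuple(two_theta_from_q(q) for q in q_range)
```

At this wavelength the whole reported q range corresponds to scattering angles of only a few degrees, as expected for a small-angle measurement.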
SEC fractions corresponding to the designs were concentrated in TBS prior to negative stain EM screening. Samples were then immediately diluted 5 to 150 times in TBS buffer (25 mM Tris, 25 mM NaCl) depending on the concentration of the samples. A final volume of 5 μL was applied on negatively glow discharged, carbon-coated 400-mesh copper grids (01844-F, Ted Pella, Inc.), then washed with Milli-Q water and stained using 0.75% uranyl formate as previously described (51). Air-dried grids were then imaged on a FEI Talos L120C TEM (FEI Thermo Scientific, Hillsboro, OR) equipped with a 4K×4K Gatan OneView™ camera at a magnification of 57,000× and a pixel size of 2.51 Å. Micrograph collection was automated using EPU software (FEI Thermo Scientific, Hillsboro, OR), and micrographs were imported into CisTEM software (52) or cryoSPARC software (53). CTF estimation was done with CTFFIND4 and a circular blob picker was used to select particles, which were then subjected to 2D classification. Ab initio reconstruction and homogeneous refinement in Cn symmetry were used to generate 3D electron density maps.
CryoEM grids were prepared by diluting protein samples with TBS 1 to 10 times immediately before applying 3.5 μL to glow-discharged 400 mesh, C-flat, 2 micron holes, 2 micron spacing, CF-2/2-4C (CF-224C-100) (Electron Microscopy Sciences, Hatfield, PA) cryoEM grids. For some samples, multiple blots were applied in order to obtain the best particle density. All grids were blotted using a blot force of 0 and a 5 second blot time at 100% humidity and 4° C. and plunge-frozen in liquid ethane using a Vitrobot™ Mark IV (FEI Thermo Scientific, Hillsboro, OR). All cryoEM grids were screened on a Glacios transmission electron microscope (FEI Thermo Scientific, Hillsboro, OR) operated at 200 kV and equipped with a Gatan K2 Summit direct detector. Automated Glacios data collection was carried out using Leginon (54) at a nominal magnification of 36,000× (1.16 Å/pixel). Movies were acquired in counting mode fractionated in 50 frames of 200 ms at 8.5 e-/pixel/sec for a total dose of ˜65 e-/Å2. High resolution data was collected on a Titan Krios™ (FEI Co.) operating at 300 kV, with a Quantum GIF energy filter (Gatan Inc.) operating in zero-loss mode with a 20 eV slit width, and a K2 Summit Direct Detect™ camera. Movies were acquired using Leginon in super-resolution mode at 130,000× (pixel size 0.525 Å/pixel) with 50 frames at an exposure rate of 2.5 e-/pixel/sec for a total dose of ˜90 e-/Å2. Details of dataset processing for each design are illustrated in Table S1 and Figures S3, S5, S6, S12 and S13. Theoretical 2D projections were generated using the cryoSPARC software's “create template” function from an input volume generated with EMAN2 (55).
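The relationship between exposure rate, frame count, pixel size and total dose for the Glacios acquisition can be checked with a short calculation. This is a sketch with a hypothetical function name; it uses only the values stated above (8.5 e-/pixel/sec, 50 frames of 200 ms, 1.16 Å/pixel) and evaluates to roughly 63 e-/Å2, in line with the reported ˜65 e-/Å2. The Krios acquisition is not reproduced because its frame duration is not given.

```python
def total_dose(rate_e_per_px_s, n_frames, frame_time_s, pixel_size_a):
    """Total electron dose in e-/Å2 from a per-pixel exposure rate."""
    exposure_s = n_frames * frame_time_s         # total exposure time in seconds
    electrons_per_pixel = rate_e_per_px_s * exposure_s
    return electrons_per_pixel / pixel_size_a ** 2

# Glacios screening settings as stated in the text.
glacios_dose = total_dose(8.5, 50, 0.2, 1.16)
```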
Multiple datasets were collected for each design and combined early on during processing. See table 1 and processing flowcharts for details. Briefly, images were manually curated to remove poor quality acquisitions such as bad ice or large regions of carbon. Dose-weighting and image alignment of all 50 frames were carried out using MotionCor2 (56) with 5×5 patches or with the cryoSPARC v2 patch alignment tool with default parameters. Super-resolution Krios data were binned 2× during alignment. Initial CTF parameters were estimated using CTFFIND4 (57). Particle picking was done with a Gaussian blob picker, in some cases followed by a template picker. Particles were extensively classified in 2D to remove junk particles and designs that may not have been intact or were damaged, yielding in some cases relatively few particles. This may also be due to the low mass of the designed proteins, which did not align well. In addition, the expected motion of the rotors may have introduced further heterogeneity, limiting classification efforts. Starting models for all designs were always obtained ab initio, despite clear evidence of the expected design in 2D. In 3D classification and refinement we were able to resolve either axle or ring, and in one case both together (D8-C4), suggesting rotor movement. FSC 0.143 curves were generated by exporting half maps to RELION for post-processing. Local resolution estimates were generated in RELION and displayed onto the locally filtered map outputs using Chimera (58). For density modification in Phenix (59), we used as input the half maps exported from cryoSPARC, with default parameters at 100 bins and local filtering with a factor of 5. FSC curves were plotted using the Phenix density modification Fref 0.5 output along with the RELION FSC estimates. Directional FSC was calculated using the remote 3DFSC processing tool.
3D variability analysis (3DVA) of the D8-C4 design was done in cryoSPARC v2 after symmetry expansion of the particles of the final reconstructions in D4 symmetry, with a mask around both rings and the axle. We used the default settings of simple cluster mode and 10 frame output with a 10 Å low-pass filter for assessing variability. The first and last frames of the second trajectory component were used as input for downstream refinement of distinct structures. The resulting maps were then low-pass filtered to 15 Å for clarity. For D3-C5, 3DVA was carried out after D3 symmetry expansion, and variability was processed and filtered at 5 Å for display.
Biolayer interferometry experiments were performed on an OctetRED96 BLI system (ForteBio, Menlo Park, CA). Enzymatic protein biotinylation was performed on SEC purified Avi-tagged proteins prior to the assay. The BirA500 (Avidity, LLC) biotinylation kit was used to biotinylate protein from the IMAC elution according to the manufacturer's protocol. Reactions were incubated at 4° C. overnight and purified using size exclusion chromatography on a Superdex™ 6 10/300 Increase GL (GE Healthcare) in TBS buffer (25 mM Tris pH 8.0, 25 mM NaCl). Streptavidin coated biosensors were equilibrated for 10 minutes in Octet buffer (10 mM HEPES pH 7.4, 25 mM NaCl, 3 mM EDTA, 0.05% Surfactant P20) supplemented with 1 mg/ml Bovine Serum Albumin (SigmaAldrich). Enzymatically biotinylated axle components were immobilized onto the biosensors by dipping the biosensors into a solution with 10-50 nM protein for 200-500 s. This was followed by dipping in fresh Octet buffer to establish a baseline. Titration experiments were performed at 25° C. while rotating at 1,000 r.p.m. Association of ring rotor components with the axle immobilized on the tips was measured by dipping the biosensors in solutions containing designed protein diluted in Octet buffer, followed by dipping the biosensors into fresh buffer solution in order to monitor the dissociation kinetics.
The oligomeric state of in vivo assembled rotors was analyzed by online buffer exchange MS(60) using a Vanquish UHPLC coupled to a Q Exactive™ Ultra-High Mass Range (UHMR) mass spectrometer (Thermo Fisher Scientific) modified to allow for surface-induced dissociation (SID) similar to that previously described (61). 1 μL of 25 μM protein in TBS buffer was injected and online buffer exchanged into 200 mM ammonium acetate, pH 6.8 by a self-packed buffer exchange column (P6 polyacrylamide gel, Bio-Rad Laboratories) at a flow rate of 100 μL per min. A heated electrospray ionization (HESI) source with a spray voltage of 4 kV was used for ionization. Mass spectra were recorded for 1000-20000 m/z at 3125 resolution as defined at 400 m/z. The injection time was set to 200 ms. Voltages applied to the transfer optics were optimized to allow for ion transmission while minimizing unintentional ion activation, and a higher-energy collisional dissociation of 5 V was applied. Mass spectra were deconvolved using UniDec V4.2.2 (22). Deconvolution settings included mass sampling every 10 Da, smooth charge state distributions, automatic peak width tool, point smooth width of 1 or 10, and beta of 50.
This application claims priority to U.S. Provisional Patent Application Ser. No. 63/246,045 filed Sep. 20, 2021, incorporated by reference herein in its entirety.
This invention was made with government support under Grant No. T32 GM008268, awarded by the National Institutes of Health and Grant No. CHE-1629214, awarded by the National Science Foundation. The government has certain rights in the invention.
Number | Date | Country
---|---|---
63246045 | Sep 2021 | US