The present invention relates to the field of Bacillus endospore appendages (Ena) and new protein multimeric and fibrous assemblies for applications as bionanomaterials. In particular, the invention relates to self-assembling proteins composed of bacterial DUF3992 domain-containing protein subunits, containing a conserved N-terminal cysteine-containing region, and engineered proteins, as well as multimers and fibers thereof. Moreover, recombinant expression of said self-assembling protein subunits provides for production methods of novel protein nanofibers and modified display surfaces, such as Bacillus spores. Finally, the use of said multimers, fibers, and surfaces in biomedical and biotechnological applications is described herein.
Self-assembling molecules provide the challenging opportunity to control chemical functionality and morphology, and thus biological activity. The unique properties of proteins, including their modular nature, biocompatibility, and biodegradability, offer exciting opportunities in designing smart nanomaterials (Herrera Estrada & Champion, 2015; Jain et al., 2018). Inspired by nature, several proteins and peptides have been engineered to self-assemble into a variety of complex structures, ranging from nanoparticles, vesicles, and cages to fibrous assemblies; these can be endowed with novel functionalities offering numerous applications in diverse areas of bioengineering (Matsuura, 2014; Katyal et al., 2019). Varying the amino acid sequences of self-assembling peptides and proteins, and manipulating the environmental parameters, allows their properties to be modulated and their self-assembly to be controlled, so as to obtain diverse on-demand supramolecular nanostructures (Lombardi et al., 2019). The various properties of the amino acid side chains offer possibilities for chemical modification with infinite sequence combinations, and modifying the amine- and/or carboxy-termini of proteins can likewise tune the self-assembly of protein polymers into specific nanoarchitectures (Aluri et al., 2012; Yu et al., 1996). Natural self-assembling proteins or peptides may thus be engineered to exhibit various properties other than self-assembly, including self-healing, shear-thinning, shape memory, and so on (Chen and Zou, 2019).
When faced with adverse growth conditions, bacteria belonging to the phylum Firmicutes can differentiate into the metabolically dormant and non-productive endospore state. These endospores exhibit extreme resilience towards environmental stressors due to their dehydrated state and unique multilayered cellular structure, and can germinate into the metabolically active and replicating vegetative growth state even hundreds of years after their formation (Setlow, 2014). In this way, Firmicutes belonging to the classes Bacilli and Clostridia are able to withstand long periods of drought, starvation, high oxygen or antibiotic stress. Endospores typically consist of an innermost dehydrated core which contains the bacterial DNA. The core is enclosed by an inner membrane surrounded by a thin layer of peptidoglycan that will function as the cell wall of the vegetative cell that emerges during spore germination. Then comes a thick cortex layer of modified peptidoglycan that is essential for dormancy (Atrih and Foster, 1999). The cortex layer is in turn surrounded by several proteinaceous coat layers. In some Clostridium and most Bacillus cereus group species, the spore is enclosed by an outermost loose-fitting paracrystalline exosporium layer consisting of (glyco)proteins and lipids (Stewart, 2015). The surface of Bacillus and Clostridium endospores can also be decorated with multiple micrometers long and a few nanometers wide filamentous appendages, which show a great structural diversity between strains and species (Hachisuka and Kuno, 1976; Rode et al., 1971; Walker et al., 2007). Bacillus cereus sensu lato is a group of Gram-positive endospore-forming bacteria that displays a high ecological diversity notwithstanding their phylogenetic relationship. 
B. cereus endospores are decorated with micrometer-long appendages of unknown identity and function. The number and morphology of endospore appendages (hereafter called Enas) vary between B. cereus group strains and species, and some strains even simultaneously express Enas of different morphologies (Smirnova et al., 2013). Structures resembling the Enas have not been observed on the surface of vegetative cells, suggesting that they represent spore-specific fibers. Enas appear to be a widespread feature among spores of strains belonging to the B. cereus group. Ankolekar and Labbe showed that all of 47 food isolates of B. cereus produced endospores with appendages (Ankolekar & Labbe, 2010). Appendages were also found on spores of ten out of twelve food-borne, enterotoxigenic isolates of Bacillus thuringiensis, which is closely related to B. cereus and best known for its insecticidal activity (Ankolekar & Labbe, 2010). Altogether, this makes those Ena structures an interesting starting point for engineering towards new sustainable biomaterials. Remarkably, the presence of spore appendages in species belonging to the B. cereus group was already reported in the 1960s, but efforts to characterize their composition and genetic identity have failed due to difficulties in solubilizing and enzymatically digesting the fibers (Gerhardt & Ribi, 1964; DesRosier & Lara, 1981). So, there is an interest in and need for the structural characterization of such endospore appendages to allow the design, development, and production of novel types of smart biomaterials with improved properties such as sustainability in harsh environmental conditions.
The present invention is based on the resolution of the genetic and structural basis of isolated endospore appendages (Enas) from the food poisoning outbreak strain B. cereus NVH-0075/95, which revealed proteinaceous fibers of two main morphologies, S-type and L-type fibers. By using cryo-EM and 3D helical reconstruction it was shown that Bacillus endospore appendages (Enas) form a novel class of Gram-positive pili, characterized by subunits with a jellyroll topology forming multimers that are laterally stacked by β-sheet augmentation. Moreover, Ena fibers are longitudinally stabilized by disulphide crosslinking through extension of their N-terminal protein subunit peptides that bridge the multimers resulting in flexible pili (see also
The first aspect of the invention relates to a protein with self-assembling properties, which is characterized in its amino acid sequence as belonging to the PFAM13157 class, i.e. characterized by the presence of a DUF3992 domain in its sequence, and which is further required to match the 3D structural fold of an Ena protein, as presented herein, specifically the fold of Ena1B (with a sequence depicted in SEQ ID NO:8), with a highly significant similarity score, defined as a Dali Z-score of 6 or more, 6.5 or more, or preferably (n/10 − 4) or more, wherein n is the number of amino acids of said protein sequence. In one embodiment, said self-assembling protein subunit is provided by the bacterially originating proteins comprising an amino acid sequence selected from the group of SEQ ID NOs:1-80, SEQ ID NO:145 and SEQ ID NO:146, representing the Ena protein sequences identified in the present application, or any prokaryotic homologue with at least 60%, or at least 70%, or at least 80%, or at least 90% identity to any one of the sequences of SEQ ID NO:1-80, SEQ ID NO:145 or SEQ ID NO:146, wherein the % identity is calculated over the full-length window of the sequence. In fact, the structural requirement described herein to match the Ena1B fold often still holds for bacterial proteins with even less than 60% identity to the structural reference sequence of SEQ ID NO:8, since the bacterial Ena family is further classified into different members, as described below.
So one embodiment relates to the isolated self-assembling protein comprising a DUF3992 domain, as determined by aligning to its Hidden Markov Model as depicted in Table 1, and wherein said protein subunit has a 3D (predicted) fold matching the Ena1B structure with a fold similarity score of 6.5 or more, as defined herein, and wherein Ena1B corresponds to SEQ ID NO:8 and wherein the Ena1B reference structure corresponds to the coordinates as provided herein in Table 2, and as deposited in PDB7A02.
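The fold-similarity criterion above can be illustrated with a minimal Python sketch. It assumes that a Dali Z-score against the Ena1B reference structure has already been obtained from a structural alignment tool, and it reads "n/10-4" as n/10 minus 4; the function names and the `use_length_scaled` switch are hypothetical and serve illustration only.

```python
def length_scaled_cutoff(n_residues: int) -> float:
    """Length-dependent Z-score cutoff, read here as n/10 - 4 (assumption)."""
    return n_residues / 10 - 4


def matches_ena1b_fold(z_score: float, n_residues: int,
                       fixed_cutoff: float = 6.5,
                       use_length_scaled: bool = False) -> bool:
    """Return True if a precomputed Dali Z-score meets the chosen cutoff.

    z_score    : Z-score of the candidate versus the Ena1B reference structure
    n_residues : number of amino acids (n) in the candidate sequence
    """
    if use_length_scaled:
        cutoff = length_scaled_cutoff(n_residues)
    else:
        cutoff = fixed_cutoff
    return z_score >= cutoff


if __name__ == "__main__":
    # Hypothetical candidate of 120 residues with a Z-score of 7.0
    print(matches_ena1b_fold(7.0, 120))                          # fixed cutoff 6.5
    print(matches_ena1b_fold(7.0, 120, use_length_scaled=True))  # cutoff 120/10 - 4
```

For a 120-residue candidate the length-scaled cutoff is stricter than the fixed cutoff of 6.5, which is why the two options are kept separate in this sketch.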
In a specific embodiment, the self-assembling proteins referred to herein relate to said Ena protein family, as defined above, and/or as provided by the amino acid sequences depicted in SEQ ID NOs: 1-80, SEQ ID NO:145, or SEQ ID NO:146, providing representative examples of the Bacillus Ena1A (SEQ ID NO: 1-7), Ena1B (SEQ ID NO: 8-14), Ena1C (SEQ ID NO: 15-20), Bacillus Ena2A (SEQ ID NO: 21-28, SEQ ID NO:145), Ena2B (SEQ ID NO: 29-37), Ena2C (SEQ ID NO: 38-48, SEQ ID NO:146), and different types of other Bacillus Ena3 (SEQ ID NO: 49-80) proteins, respectively, or bacterial orthologues of any one thereof, which have at least 80% identity to any sequence depicted in SEQ ID NO:1-80, SEQ ID NO:145 or SEQ ID NO:146. The regions and level of sequence conservation are shown for the Ena family members by the multiple sequence alignments depicted in
A further embodiment relates to said self-assembling protein as described herein, which is an engineered self-assembling protein, wherein the Ena fold and HMM profile as described herein matches the Ena1B fold and DUF3992 profile, as described herein, but which is ‘engineered’ or ‘modified’ by further comprising for example, but not limited to, at least one of the modifications including a heterologous N- or C-terminal tag, and/or a steric block, a protein sequence variant which may contain one or more mutations as compared to the native or wild type Ena sequence, or which may contain an insertion of a peptide or scaffold, or a deletion of a number of amino acids, or which may be provided as separate parts of the Ena protein, such as ‘split’ parts, that assemble upon co-incubation.
A second aspect of the invention relates to a protein multimer comprising or containing at least seven of said self-assembling protein subunits, and preferably between 7 and maximally twelve subunits, which are non-covalently linked. More specifically, said multimer consists of seven, eight, nine, ten, eleven, twelve, 13, 14, 15, 16, 17, 18, 19, 20, or more self-assembling Ena protein subunits as defined herein, non-covalently stacked via β-sheet augmentation (a protein-protein interaction principle described in Remaut and Waksman, 2006). In a specific embodiment, said multimers as described herein may further comprise covalent connections, provided by for instance Cys connections between different protein subunits of said multimer (in suitable conditions). In one embodiment, said multimers are present ‘as such’, i.e. not as a filament or fiber constellation, and are therefore non-naturally occurring multimeric assemblies. Particularly, said self-assembling protein subunits defined herein as Ena proteins, may further comprise at least two conserved cysteine residues in their N-terminal region or N-terminal connector, as used interchangeably herein, for intermolecular disulphide bridge formation with further multimers. In a specific embodiment the multimeric assembly comprises seven to twelve protein subunits from the Ena protein family, as further defined herein, or as provided by the amino acid sequences depicted in SEQ ID NOs: 1-80, SEQ ID NO:145, or SEQ ID NO:146 providing representative examples of the Bacillus Ena1A (SEQ ID NO: 1-7), Ena1B (SEQ ID NO: 8-14), Ena1C (SEQ ID NO: 15-20), Bacillus Ena2A (SEQ ID NO: 21-28, SEQ ID NO:145), Ena2B (SEQ ID NO: 29-37), Ena2C (SEQ ID NO: 38-48, SEQ ID NO:146), and different types of other Bacillus Ena3 (SEQ ID NO: 49-80) proteins respectively, or bacterial orthologues thereof, which have at least 80% identity of any sequence depicted in SEQ ID NO:1-80, SEQ ID NO:145, or SEQ ID NO:146. 
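The assignment of sequence identifiers to Ena family members listed above can be captured in a small lookup table. The following Python sketch only restates the SEQ ID ranges given in the text; the dictionary and function names are hypothetical.

```python
# SEQ ID NO ranges per Ena family member, as listed in the description.
ENA_FAMILY_SEQ_IDS = {
    "Ena1A": set(range(1, 8)),             # SEQ ID NO: 1-7
    "Ena1B": set(range(8, 15)),            # SEQ ID NO: 8-14
    "Ena1C": set(range(15, 21)),           # SEQ ID NO: 15-20
    "Ena2A": set(range(21, 29)) | {145},   # SEQ ID NO: 21-28 and 145
    "Ena2B": set(range(29, 38)),           # SEQ ID NO: 29-37
    "Ena2C": set(range(38, 49)) | {146},   # SEQ ID NO: 38-48 and 146
    "Ena3": set(range(49, 81)),            # SEQ ID NO: 49-80
}


def family_for_seq_id(seq_id: int):
    """Return the Ena family member for a SEQ ID NO, or None if not listed."""
    for family, ids in ENA_FAMILY_SEQ_IDS.items():
        if seq_id in ids:
            return family
    return None


if __name__ == "__main__":
    for seq_id in (8, 145, 146, 84):
        print(seq_id, family_for_seq_id(seq_id))
```

Identifiers outside these ranges (for example SEQ ID NO:84, used elsewhere for recEna1B) are deliberately not mapped, since the text lists them separately from the representative family sequences.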
A specific embodiment relates to said multimers with 7 to 12 protein subunits composed of identical self-assembling proteins as described herein. Alternatively, the multimers comprise at least 7 protein subunits wherein at least one of said protein subunits is an engineered self-assembling Ena protein, as defined herein, which is a non-naturally occurring Ena protein. In a specific embodiment, said multimers comprise at least 7, preferably maximally 12 Ena protein subunits, wherein at least one subunit is an engineered Ena protein comprising a steric block at the N- and/or C-terminus, thereby preventing the multimer from further assembling into fibers (
A specific embodiment relates to said multimers as described herein which are homomultimers or heteromultimers, and more specifically relate to multimers consisting of 6, or 7 to 12 subunits, and preferably relate to a heptamer, so consisting of 7 subunits, or a nonamer, so consisting of 9 subunits, both thereby possibly forming a disc-like multimer, or a decamer, undecamer or dodecamer, so consisting of 10, 11 or 12 subunits, respectively, thereby forming a helical turn or an arc of a β-propeller structure (
Another embodiment relates to said self-assembling protein subunits, or multimers of self-assembling DUF3992-containing protein subunits or Ena protein subunits or engineered Ena protein subunits, which comprise an N-terminal region or N-terminal connector (Ntc) region wherein the amino acid residue consensus motif ZXnCCXmC is present, wherein X is any amino acid, n is 1 or 2, m is between 10-12, and Z is preferably Leu, Ile, Val or Phe, and preferably wherein the C-terminal region or C-terminal receiver region comprises the consensus motif GX2/3CX4Y, wherein G is Glycine, X is any amino acid (2 or 3 residues), and Y is Tyrosine, so that the Cysteines (C) present in said N- and C-terminal region motifs of the protein subunits may form disulphide bridges for longitudinally connecting one multimer to another multimer (ultimately leading to assemblies into S-fibers as in
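The two consensus motifs described above can be expressed as regular expressions. The following Python sketch is illustrative only: it restricts Z to the preferred residues Leu, Ile, Val or Phe (an assumption, since the text states these as a preference), treats X as any uppercase one-letter amino acid code, and uses an artificial toy sequence rather than any sequence of the SEQ ID listing.

```python
import re

# N-terminal connector motif Z-Xn-C-C-Xm-C with n = 1 or 2 and m = 10 to 12.
# Assumption: Z is limited to the preferred residues L, I, V or F.
NTC_MOTIF = re.compile(r"[LIVF][A-Z]{1,2}CC[A-Z]{10,12}C")

# C-terminal receiver motif G-X(2 or 3)-C-X4-Y.
CTERM_MOTIF = re.compile(r"G[A-Z]{2,3}C[A-Z]{4}Y")


def find_motifs(sequence: str) -> dict:
    """Return the (start, end) span of the first hit of each motif, or None."""
    seq = sequence.upper()
    n_hit = NTC_MOTIF.search(seq)
    c_hit = CTERM_MOTIF.search(seq)
    return {
        "ntc": n_hit.span() if n_hit else None,
        "cterm": c_hit.span() if c_hit else None,
    }


if __name__ == "__main__":
    # Artificial toy sequence built to contain both motifs (not a real Ena protein).
    toy = "M" + "LACC" + "A" * 10 + "C" + "S" * 20 + "GAACAAAAY"
    print(find_motifs(toy))
```

A positional check (the N-terminal motif near the start of the sequence, the receiver motif near the end) could be added on top of the reported spans; it is omitted here because the text does not define exact positional limits.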
Another aspect of the invention relates to protein fibers produced as to comprise at least two of said multimers as described herein, wherein said multimers are not hindered to longitudinally crosslink through disulphide bonds, more specifically through at least one disulphide bond, preferably two or more disulphide bonds. Said disulphide bonds may be formed between side chains of cysteine residues of the N-terminal region or N-terminal connector of one or more subunits of a multimer with one or more cysteine residues present in the N- and/or C-terminal region of one or more subunits of the multimer constituting the preceding layer of the longitudinally formed protein fiber. Said protein fiber may be a recombinantly produced fiber.
In another embodiment, said protein fiber is an engineered protein fiber, comprising at least two multimers of which at least one multimer is an engineered multimer as defined herein, or wherein at least one multimer comprises at least one engineered Ena protein, as defined herein. In a preferred embodiment, the protein fibers comprise multimers wherein the protein subunits are identical self-assembling protein subunits as described herein, and/or are composed of identical Ena proteins.
Another aspect of the invention relates to a chimeric gene construct comprising a promoter or regulatory sequence element that is operably linked to a DNA element comprising a coding sequence for the (engineered) self-assembling protein, preferably an Ena protein, as defined herein. More specifically, said coding sequence may code for a protein comprising an Ena protein as depicted in SEQ ID NOs: 1-80, SEQ ID NO:145, or SEQ ID NO:146, or a functional homologue of any of said Ena family members comprising Ena1/2A, Ena1/2B, Ena1/2C, or Ena3A, with at least 80% amino acid identity to any of SEQ ID NO:1-80, SEQ ID NO:145, or SEQ ID NO:146, or may code for an engineered Ena protein form thereof, as defined herein. In a specific embodiment, said promoter or regulatory element is heterologous to the coding sequence to which it is operably linked, and optionally is an inducible promoter, as known in the art.
A further embodiment relates to a host cell for expression of the chimeric gene as described herein, or for expression of the self-assembling protomers of the multimers or protein assemblies as described herein. Another embodiment relates to a modified spore-forming cell or bacterium, comprising the chimeric gene as described herein, or an engineered Ena gene or a gene encoding an engineered Ena protein. Another embodiment relates to a modified bacterial spore, in particular a modified Bacillus endospore, which comprises and/or displays Ena proteins, or engineered forms thereof, or multimers as described herein, or has protein fibers, in particular engineered or modified protein fibers, recombinantly produced fibers or spores, as described herein.
In a further aspect of the invention a modified surface or solid support is provided, said surface comprising an Ena protein, a multimer assembly, or a protein fiber as described herein, or an engineered form of any thereof. Said modified surface is composed by covalent attachments of said Ena protein, multimer or fiber to said surface, and may be a cellular or artificial surface, in particular a solid surface of any material type. Said modified surface may thus be used as a nucleator for epitaxial growth of a protein fiber, for instance when said modified surface is exposed or contacted with a solution of Ena proteins, wherein said Ena proteins are preferably present in monomeric or oligomeric form.
Further embodiments relate to a protein film comprising the engineered Ena protein fiber and/or the Ena protein fibers as described herein, said film preferably being a thin film, as known in the art. Alternatively a hydrogel is disclosed herein comprising the engineered protein fiber as described herein and/or the Ena protein fiber as described herein. A further embodiment relates to a nanowire comprising the engineered protein fibers that are spun into a thicker, thread-like bundle.
A final aspect of the invention relates to a method to recombinantly produce the protein assemblies as described herein, more particularly the Ena proteins, multimeric and fibrous assemblies, or modified surfaces, in particular spore surfaces or synthetic surfaces as described herein.
One embodiment describes a method to produce a self-assembling DUF3992 domain-containing monomer, or multimer as described herein comprising the steps of:
Another embodiment provides for a method to recombinantly produce the self-assembling DUF3992 domain-containing or Ena proteins which are arrested or at least impeded in fiber assembly or in epitaxial growth, so a method to recombinantly produce engineered Ena proteins blocked in fiber outgrowth, comprising the method as described above, wherein the N- and/or C-terminal tag is at least 1, preferably at least 6, more preferably at least 9, or 15 amino acids in length to sterically block self-assembly of the protein subunits or multimers in longitudinal fiber formation. In a further embodiment, said N- or C-terminal tag is at least 6 amino acids in length to reversibly impede or hamper self-assembly of the protein subunits or multimers in longitudinal rigid fiber formation. In said case the N- or C-terminal tag may be a removable tag, for instance, by including a protease recognition sequence for removal of the tag by a protease, and reversal of the steric blockage of subunit and multimer assembly.
Another embodiment relates to a method to produce a protein fiber as described herein, comprising the steps a) and b) of the above method, wherein the N- and/or C-terminal tag is present as a removable or cleavable tag, said method further comprising the step c) wherein the N- and/or C-terminal tag is removed or cleaved off to allow further self-assembly of the formed multimers into protein fibers. Alternatively, step c) may be performed prior to the purification step b). Furthermore, a method is provided to produce the modified surface as described herein, comprising the steps a), b), and/or c) (or vice versa c) and/or b)), further comprising step d) wherein a surface is modified by displaying or covalently attaching the (engineered) Ena protein, multimer or fiber to said surface.
Finally, the protein assemblies, such as fibers as described herein, may be produced within a cell, as depicted in the method for recombinant production of the Ena protein fibers comprising the steps of:
The drawings described are only schematic and are non-limiting. In the drawings, the size of some of the elements may be exaggerated and not drawn to scale for illustrative purposes.
Approximate likelihood trees generated by FastTree v.2.1.8 (Price et al., 2010), visualized in Microreact (Argimon et al., 2016). Trees are rooted on midpoint. Nodes are colored according to annotated species. See Methods for further details. (
60k magnification TEM image of negatively stained Ena1A fibers that were formed in the cytoplasm of E. coli following recombinant expression of monomeric subunits.
The recEna1B (SEQ ID NO:84) structure is used here to demonstrate the suitable sites for insertion of single amino acids, peptides or full domains into loops connecting strands E-F, B-C, H-I and D-E (
The identifiers correspond to SEQ ID NOs: 1-7 for Ena1A and SEQ ID NOs: 21-28 for Ena2A.
The identifiers correspond to SEQ ID NOs: 8-14 for Ena1B and SEQ ID NOs: 29-37 for Ena2B.
The identifiers correspond to SEQ ID NOs: 15-20 for Ena1C and SEQ ID NOs:38-48 for Ena2C
Multiple sequence alignment of selected, representative Ena3 homologues, corresponding to SEQ ID NOs: 49-80.
3 μl of a 1 mg·mL−1 Ena1B suspension was deposited onto a Cu-mesh formvar grid, washed 3× in Milli-Q water, and then stained with 1% (w/v) uranyl acetate.
NS-TEM image of Ena2A recombinantly expressed in BL21 (DE3) C43 E. coli without any N-terminal blocker; top right: negative-stain 2D class average confirming the identity of S-Ena fibers.
Recombinant Ena1BΔNtc fibers present in the extracellular milieu (
Examples shown for loops DE and HI (as indicated in
All 4 constructs (SEQ ID NO:8 for WT Ena1B, and SEQ ID NOs: 140-142 for Ena1B insertion variants) were expressed in E. coli after which total cell lysates and soluble fractions were loaded onto SDS-PAGE. Anti-Ena1B panel: high molecular weight bands of Ena1B that are retained in the stacking gel correspond to SDS-insoluble fibers (see nsTEM images in
The split of Ena1B in the BC or HI loop at Ala30 or Ala100, respectively.
Scalebar represents 100 nm.
nsTEM analysis micrograph of biotinylated Ena1B S-type fibers on streptavidin-coated gold beads.
Site-directed mutagenesis sites for Ena1B S-type fibers: the surface-exposed residue T31 was selected for mutagenesis into a cysteine residue (
The cryo-EM structure of Ena1B (UniProt: A0A1Y6A695) was compared with the AlphaFold-predicted fold structures (Jumper et al., 2021 Nature; doi.org/10.1038/s41586-021-03819-2) for Ena1B itself, and for the predicted Ena2A (NCBI ID: WP_001277540.1; SEQ ID NO:145), WP_017562367.1 and WP_041638338.1 protein sequences. RMSD: root-mean-square deviation of atomic positions between atom i of each structure and the corresponding atom of the reference structure (cryo-EM model of Ena1B; UniProt: A0A1Y6A695; corresponding to SEQ ID NO:8). The fold similarity score is the Dali Z-score.
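As an illustration of the RMSD measure referred to above, the following minimal Python sketch computes the root-mean-square deviation between two equally long lists of atomic coordinates. It assumes that the two structures have already been superimposed and that atom i in one list corresponds to atom i in the other; the function name and the toy coordinates are hypothetical, and no superposition or alignment step is included.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Assumes both structures are already superimposed and that the
    i-th atom of coords_a corresponds to the i-th atom of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between corresponding atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates in arbitrary units, not taken from any deposited structure
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(reference, model))
```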
The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto but only by the claims. Any reference signs in the claims shall not be construed as limiting the scope. Of course, it is to be understood that not necessarily all aspects or advantages may be achieved in accordance with any particular embodiment of the invention. Thus, for example those skilled in the art will recognize that the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other aspects or advantages as may be taught or suggested herein. The invention, both as to organization and method of operation, together with features and advantages thereof, may best be understood by reference to the following detailed description when read in conjunction with the accompanying drawings. The aspects and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter. Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases ‘in one embodiment’ or ‘in an embodiment’ in various places throughout this specification are not necessarily all referring to the same embodiment but may.
Where an indefinite or definite article is used when referring to a singular noun e.g. “a” or “an”, “the”, this includes a plural of that noun unless something else is specifically stated. Where the term “comprising” is used in the present description and claims, it does not exclude other elements or steps. Furthermore, the terms first, second, third and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein. The following terms or definitions are provided solely to aid in the understanding of the invention. Unless specifically defined herein, all terms used herein have the same meaning as they would to one skilled in the art of the present invention. Practitioners are particularly directed to Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Press, Plainsview, N.Y. (2012); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 114), John Wiley & Sons, New York (2016), for definitions and terms of the art. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art (e.g. in molecular biology, biochemistry, structural biology, and/or computational biology).
The term “nucleic acid sequence”, “DNA sequence” or “nucleic acid molecule(s)” as used herein refers to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, this term includes double- and single-stranded DNA, and RNA. It also includes known types of modifications, for example, methylation, “caps”, and substitution of one or more of the naturally occurring nucleotides with an analog. By “nucleic acid construct” it is meant a nucleic acid molecule that has been constructed to comprise one or more functional units not found together in nature. Examples include circular, linear, double-stranded, extrachromosomal DNA molecules (plasmids), cosmids (plasmids containing COS sequences from lambda phage), viral genomes comprising non-native nucleic acid sequences, and the like. “Coding sequence” is a nucleotide sequence, which is transcribed into mRNA and/or translated into a polypeptide when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a translation start codon at the 5′-terminus and a translation stop codon at the 3′-terminus. A coding sequence can include, but is not limited to, mRNA, cDNA, recombinant nucleotide sequences or genomic DNA, while introns may be present as well under certain circumstances. “Promoter region of a gene” or “regulatory element” as used here refers to a functional DNA sequence unit that, when operably linked to a coding sequence and possibly placed in the appropriate inducing conditions, is sufficient to promote transcription of said coding sequence. “Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner.
A promoter sequence “operably linked” to a nucleic acid molecule that is a coding sequence is ligated in such a way that expression of the coding sequence is achieved under conditions compatible with the promoter sequence. “Gene” as used here includes both the promoter region of the gene as well as the coding sequence. It refers both to the genomic sequence (including possible introns) as well as to the cDNA derived from the spliced messenger, operably linked to a promoter sequence. The term “terminator” or “transcription termination signal” encompasses a control sequence which is a DNA sequence at the end of a transcriptional unit which signals 3′ processing and polyadenylation of a primary transcript and termination of transcription. The terminator can be derived from the natural gene, from a variety of other plant genes, or from T-DNA. The terminator to be added may be derived from, for example, the nopaline synthase or octopine synthase genes, or alternatively from another gene. By a “chimeric gene” or “chimeric construct” or “chimeric gene construct” is meant a recombinant nucleic acid sequence molecule in which a promoter or regulatory nucleic acid sequence is operatively linked to, or associated with, a nucleic acid sequence that codes for an mRNA, such that the promoter or regulatory nucleic acid sequence is able to regulate transcription or expression of the associated nucleic acid coding sequence. The regulatory nucleic acid sequence of the chimeric gene is not operatively linked to the associated nucleic acid sequence as found in nature, and may be heterologous to the encoding nucleic acid sequence molecule, meaning that its sequence is not present in nature in the same constellation as presented in the chimeric construct. More generally, the term “heterologous” is defined herein as a sequence or molecule that is different in its origin.
The terms “protein”, “polypeptide”, and “peptide” are interchangeably used further herein to refer to a polymer of amino acid residues and to variants and synthetic analogues of the same. A monomer or protomer is defined as a single polypeptide chain from amino-terminal to carboxy-terminal ends. A “protein subunit” as used herein refers to a monomer or protomer, which may form part of a multimeric protein complex or assembly.
The terms “chimeric polypeptide”, “chimeric protein”, “chimer”, “fusion polypeptide”, “fusion protein”, are used interchangeably herein and refer to a protein that comprises at least two separate and distinct polypeptide components that may or may not originate from the same protein. The term also refers to a non-naturally occurring molecule which means that it is man-made. The term “fused to”, and other grammatical equivalents, such as “covalently linked”, “connected”, “attached”, “ligated”, “conjugated” when referring to a chimeric polypeptide (as defined herein) refers to any chemical or recombinant mechanism for linking two or more polypeptide components. The fusion of the two or more polypeptide components may be a direct fusion of the sequences or it may be an indirect fusion, e.g. with intervening amino acid sequences or linker sequences, or chemical linkers. The fusion of amino acid residues or (poly)peptides to an Ena protein or to another protein of interest as described herein, may be a covalent peptide bond, or also refer to a fusion obtained by chemical linking. The term “fused to”, as used herein, and interchangeably used herein as “connected to”, “conjugated to”, “ligated to” refers, in particular, to “genetic fusion”, e.g., by recombinant DNA technology, as well as to “chemical and/or enzymatic conjugation” resulting in a stable covalent link.
The term “molecular complex” or “complex” refers to a molecule associated with at least one other molecule, which may be a protein or a chemical entity. The term “associating with” refers to a condition of proximity between a chemical entity or compound, or portions thereof, and a binding pocket or binding site on a protein. As used herein, the term “protein complex” or “protein assembly” or “multimer” refers to a group of two or more associated macromolecules, whereby at least one of the macromolecules is a protein. A protein complex or assembly, as used herein, typically refers to binding or associations of macromolecules that can be formed under physiological conditions. Individual members of a protein complex, such as protein subunits or protomers, are linked by non-covalent or covalent interactions. “Binding” means any interaction, be it direct or indirect. A direct interaction implies a contact between the binding partners. An indirect interaction means any interaction whereby the interaction partners interact in a complex of more than two molecules. The interaction can be completely indirect, with the help of one or more bridging molecules, or partly indirect, where there is still a direct contact between the partners, which is stabilized by the additional interaction of one or more molecules. The binding or association may be non-covalent, wherein the juxtaposition is energetically favoured by, for instance, hydrogen bonding or van der Waals or electrostatic interactions, or it may be covalent, for instance by peptide or disulphide bonds.
It will be understood that a protein complex can be multimeric. Protein complex assembly can result in the formation of homo-multimeric or hetero-multimeric complexes. Moreover, interactions can be stable or transient. The term “multimer(s)”, “multimeric complex”, or “multimeric protein(s) or assemblies” comprises a plurality of identical or heterologous polypeptide monomers. Polypeptides can be capable of self-assembling into multimeric assemblies (e.g., dimers, trimers, pentamers, hexamers, heptamers, octamers, etc.) formed from self-assembly of a plurality of a single polypeptide monomer (i.e., “homo-multimeric assemblies”) or from self-assembly of a plurality of different polypeptide monomers (i.e., “hetero-multimeric assemblies”). As used herein, a “plurality” means 2 or more. The multimeric assembly comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or more polypeptide monomers. The multimeric assemblies can be used for any purpose and provide a way to develop a wide array of protein “nanomaterials.” In addition to the finite, cage-like or shell-like protein assemblies, they may be designed by choosing an appropriate target symmetric architecture. The monomers or protomers and/or multimeric assemblies of the invention can be used in the design of higher order assemblies, such as fibers, with the attendant advantages of hierarchical assembly. The resulting multimeric or fibrous assemblies are highly ordered materials with superior rigidity and monodispersity, and can be functional as a multimer or fiber itself, or form the basis of advanced functional materials, such as modified surfaces containing multimeric assemblies or fibers, and custom-designed molecular machines with wide-ranging applications.
More specifically, a multimer as used herein refers to homo- or heteromultimeric protein complexes which are non-covalently associated with each other to form an arc, turn, ring or disc-like structure; and/or further modified to grow or develop into self-assembling or triggered formation of nanofibers. Said multimeric assemblies may contain Ena proteins as defined herein, or Ena protein variants, mutant and/or engineered Ena proteins, as well as other proteins that may associate to said Ena protein-based multimers, called engineered multimers, thereby expanding said multimer towards further modifications required for certain applications.
A “protein domain” is a distinct functional and/or structural unit in a protein. Usually a protein domain is responsible for a particular function or interaction, contributing to the overall role of a protein. Domains may exist in a variety of biological contexts, where similar domains can be found in proteins with different functions. Protein secondary structure elements (SSEs) typically spontaneously form as an intermediate before the protein folds into its three dimensional tertiary structure. The two most common secondary structural elements of proteins are alpha helices and beta (β) sheets, though β-turns and omega loops occur as well. Beta sheets consist of beta strands (also β-strand) connected laterally by at least two or three back-bone hydrogen bonds, forming a generally twisted, pleated sheet. A β-strand is a stretch of poly-peptide chain typically 3 to 10 amino acids long with backbone in an extended conformation. A β-turn is a type of non-regular secondary structure in proteins that causes a change in direction of the polypeptide chain. Beta turns (β turns, β-turns, β-bends, tight turns, reverse turns) are very common motifs in proteins and polypeptides, which mainly serve to connect β-strands.
By “recombinant polypeptide” is meant a polypeptide made using recombinant techniques, i.e., through the expression of a recombinant or synthetic polynucleotide, which may be obtained in vitro and/or in a cellular context. When the chimeric polypeptide or biologically active portion thereof is recombinantly produced, it is also preferably substantially free of culture medium, i.e., culture medium represents less than about 20%, more preferably less than about 10%, and most preferably less than about 5% of the volume of the protein preparation. By “isolated” or “purified” is meant material that is substantially or essentially free from components that normally accompany it in its native state.
“Homologue”, “Homologues” of a protein encompass peptides, oligopeptides, polypeptides, proteins and enzymes having amino acid substitutions, deletions and/or insertions relative to the unmodified or wild-type protein in question and having similar biological and functional activity as the unmodified protein from which they are derived. The term “amino acid identity” as used herein refers to the extent that sequences are identical on an amino acid-by-amino acid basis over a window of comparison. Thus, a “percentage of sequence identity” is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys and Met, also indicated in one-letter code herein) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity. A “substitution”, or “mutation” as used herein, results from the replacement of one or more amino acids or nucleotides by different amino acids or nucleotides, respectively as compared to an amino acid sequence or nucleotide sequence of a parental protein or a fragment thereof. It is understood that a protein or a fragment thereof may have conservative amino acid substitutions which have substantially no effect on the protein's activity. The percentage of amino acid identity as provided herein is preferably in view of a window of comparison corresponding to the total length of the native or natural wild-type protein, or of the specific amino acid sequence referred to.
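The percentage calculation described above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been optimally aligned (gaps written as "-") and, as a simplification, takes the window of comparison to be the full alignment length unless a window size is passed explicitly. The function name and parameters are hypothetical and serve illustration only.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str,
                     window_size: Optional[int] = None,
                     gap: str = "-") -> float:
    """Percentage of sequence identity over a window of comparison.

    Assumes both inputs are already optimally aligned and equally long.
    `window_size` defaults to the alignment length; it may instead be set
    to, e.g., the total length of the wild-type reference protein.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if window_size is None:
        window_size = len(aligned_a)
    if window_size <= 0:
        raise ValueError("window of comparison must be positive")
    # Count positions where the identical residue occurs in both sequences;
    # a gap aligned to a gap is not counted as a match.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != gap)
    # Matched positions / window size * 100.
    return 100.0 * matches / window_size
```

For two toy six-residue sequences differing at a single position, `percent_identity("ACDEFG", "ACDQFG")` yields five matches over a window of six.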
The term “wild-type” refers to a gene or gene product isolated from a naturally occurring source, or included in a cell, cell line or organism. A wild-type gene or gene product is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene or gene product as observed in nature. In contrast, the term “modified”, “engineered”, “mutant” or “variant” refers to a gene or gene product that displays modifications in sequence, post-translational modifications and/or functional properties (i.e., altered characteristics) when compared to the wild-type or naturally-occurring gene or gene product. A knock-out refers to a modified or mutant or deleted gene as to provide for non-functional gene product and/or function. It is noted that naturally occurring mutants or variants may be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product, and a different sequence as compared to the reference gene or protein.
The present invention relates to novel protein assemblies applicable in several constellations as next-generation biomaterials. The generation of the multimeric assemblies as disclosed herein is based on the unravelling of the structural and genetic basis of Bacillus endospore appendages (Enas), which led to a number of opportunities for engineering and modulating these protein assemblies for the production of rigid but flexible structures with specific properties and with potential in numerous applications. The identification of the Ena protein family as the building blocks of these multimeric and fibrous assemblies directly correlated the self-assembling property of the proteins with the presence of a DUF3992 protein domain, found in a panel of bacterial proteins, which allows multimeric assemblies to form. Furthermore, the presence of the DUF3992 domain, as determined by adherence to the DUF3992 HMM profile (as provided in Table 1), in combination with a conserved N-terminal connector region comprising at least two conserved cysteine residues, as provided by the motif ZXnC(C)XmC, wherein Z is Ile, Phe, Leu or Val, n is 1 or 2 residues, m is 10-12 residues, C is Cys, and X is any amino acid, allows the multimeric assemblies to be covalently connected longitudinally into a rigid fiber. Flexibility of the fibers is nevertheless retained by a 12-15 aa spacer region near the N-terminus, which maintains the gap between stacked multimers (see
A Novel Prokaryotic Self-Assembling Protein Family, the Ena Proteins.
A first aspect of the present invention relates to a self-assembling protein subunit, which comprises a DUF3992 domain, providing for the structural element required to obtain a self-assembling protein multimeric assembly under permissive buffer conditions. In this context, ‘self-assembly’ refers to the spontaneous organization of molecules in ordered supramolecular structures thanks to their mutual non-covalent interactions without external control or template. The chemical and conformational structures of individual molecules carry the instructions of how these are assembled. The same or different molecules may constitute the building blocks of a molecular self-assembling system. Generally, interactions are established in a less ordered state, such as a solution, random coil, or disordered aggregate leading to an ordered final state, which can be a crystal or folded macromolecule, or a further assembly of macromolecules. The association of small molecules or proteins into well-ordered structures is driven by thermodynamic principles, thus, based on energy minimization. The interactions involved in the molecular assembly process are electrostatic, hydrophobic, hydrogen bonding, van der Waals interactions, aromatic stacking, and/or metal coordination. Although non-covalent and individually weak, these forces can generate highly stable assemblies and govern the shape and function of the final assembly (Lombardi et al., 2019). Said self-assembling protein subunits described herein, and called Ena proteins herein, are capable of forming self-assembling multimers and protein fibers envisaged herein to be applied in different settings and biomaterials. The multimeric or fibrous assemblies can be obtained from the pre-existing components termed building blocks, or subunits, more specifically the isolated self-assembling proteins as described herein, the Ena proteins.
Moreover, other embodiments described herein relate to ‘modified’ or ‘engineered’ building blocks or protein subunits, or assemblies, as referred to herein, which are defined as being designed or derived from the existing (native) ones by changing the chemical composition, the length, and the directionality of interactions to create new units, or units with a new functionality, which contain all the necessary information that encodes their self-assembly. By controlling environmental variables, the system reaches a new thermodynamic minimum leading to a different ordered structure. In most cases, because the protein subunit self-assembly occurs by non-covalent interactions, the self-assembly is reversible and sensitive to the environment, and the activity can be tuned by controlling the association and the dissociation of the proteins. The self-assembling property of these proteins is provided by the presence of the DUF3992 domain.
‘Domain of Unknown Function’ or ‘DUF’ protein families are designated as such as a tentative name and tend to be renamed to a more specific name (or merged to an existing domain) after a protein function is identified. So the present invention in fact assigns for the first time a function of self-assembly to the prokaryotic DUF3992 domain-containing proteins that further also match the Ena1B protein fold, as described herein, even though the DUF3992-containing proteins are known in the PFAM database as a family of proteins that is functionally uncharacterised, and found in bacteria, typically between 98 and 122 amino acids in length. The PFAM database (version 33.1) also mentions that there is a single completely conserved residue T that may be functionally important (El-Gebali et al. 2019, The Pfam database; http://pfam.xfam.org/family/PF13157). This ‘Domain of Unknown Function’ 3992 is structurally characterized by the Hidden Markov models (HMM) obtained according to alignment of the 64 bacterial proteins known (Pfam-B_480 release 24.0) to comprise this particular DUF3992 protein domain, as also provided in the PFAM database for the PFAM13157 family (also see Table 1 as provided herein). The HMM profile for DUF3992 domain proteins of PFAM13157 family is also shown on http://pfam.xfam.org/family/PF13157#tabview=tab4 and should be interpreted as in Wheeler et al. (2014): ‘hidden Markov models are shown by drawing a stack of letters for each position, where the height of the stack corresponds to the conservation at that position, and the height of each letter within a stack depends on the frequency of that letter at that position.’
This group of spontaneously assembling proteins comprising the DUF3992 domain, previously indicated in the databases as hypothetical proteins of unknown function, may hence now be part of the annotation constituting the definition of the bacterial Ena protein family. So, the Ena protein family is defined as bacterial DUF3992 classifying proteins based on their HMM profile aligning with the one presented herein in Table 1, with a length of about 100 to 160 amino acids, with the capacity to spontaneously assemble into higher structures such as multimers, said multimers preferably having the capacity to further assemble into fibrous structures, stabilized by the formation of longitudinal covalent disulphide bridges. Furthermore, the structural definition of the Ena proteins relates to these bacterial DUF3992 self-assembling proteins with an Ena fold, wherein said Ena fold comprises: an 8-stranded β-sandwich, with sheets in BIDG and CHEF topology, as described herein, and as derivable from the matching of the (predicted) fold based on the amino acid sequence, as compared to the reference Ena1B cryoEM structure fold provided herein with a Z-score of 6.5 or more, and with an N-terminal ‘Ntc’ element containing a conserved Z-Xn-C(C)-Xm-C motif for covalent connection to preceding subunits in the fiber, wherein X=any amino acid, Z=Leu/Val/Ile/Phe, n=1 to 2 residues, m=10 to 12 residues, and C=Cys.
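As an illustration only, the N-terminal Z-Xn-C(C)-Xm-C motif can be written as a regular expression and searched for in a candidate sequence. The following Python sketch makes several assumptions that are not part of the definition above: sequences are given in one-letter code, the search is limited to a hypothetical N-terminal window of 40 residues, and the toy sequence is invented for the example and is not one of the Ena sequences. A motif hit alone does not establish that a protein is an Ena protein; the HMM profile and fold criteria still apply.

```python
import re

# Z-Xn-C(C)-Xm-C: Z = Leu/Val/Ile/Phe, n = 1-2, optional second Cys,
# m = 10-12, X = any residue (one-letter code assumed).
NTC_MOTIF = re.compile(r"[LVIF][A-Z]{1,2}CC?[A-Z]{10,12}C")


def find_ntc_motif(sequence: str, window: int = 40):
    """Return (start, end, matched_text) for the first motif hit within
    the first `window` residues, or None when no hit is found.

    `window` is a hypothetical cut-off used here to restrict the search
    to the N-terminal part of the sequence.
    """
    hit = NTC_MOTIF.search(sequence.upper()[:window])
    if hit is None:
        return None
    return hit.start(), hit.end(), hit.group()


# Invented toy sequence: L, one X, CC, eleven X, C.
toy_sequence = "MALSCC" + "A" * 11 + "CGG"
print(find_ntc_motif(toy_sequence))
```

Because X may itself be any residue, including Cys, the values of n and m cannot always be read unambiguously from a regular-expression match; the sketch therefore reports only the position of the hit.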
More specifically, the DUF3992 domain-containing protein subunits in the multimers as described herein are non-covalently linked to each other through β-sheet augmentation, a structural feature known in the art and previously described for instance in Remaut and Waksman (2006) as a staggering of protein subunits via electrostatic interactions between a β-strand from one of the proteins binding to the edge of a β sheet in the other protein (also see
The DUF3992 domain-containing self-assembling Ena proteins as disclosed herein are N-terminally characterized by conserved cysteine residues favouring the formation of rigid pili or appendage assemblies, as observed on Bacillus endospores. Based on this observation, the capacity of this self-assembling protein family to form fibers in vitro was investigated herein (see
Said Ena protein is defined herein as the proteins of PFAM 13157, constituted of bacterial DUF3992 domain-containing proteins, as characterized by its specific HMM profile, and as described in the Examples provided herein, further demonstrating to have a conserved Cys residue profile (see
In view of the phylogenetic and functional characterization of this family, an ‘Ena protein’, as used herein, is exemplified, but not limited to the list of Bacillus proteins depicted in SEQ ID NO:1-80, SEQ ID NO:145 or SEQ ID NO:146, disclosing representative proteins for each cluster of each Ena protein family member, exemplified further herein by Bacillus cereus NVH 0075-95 383 Ena1A (SEQ ID NO:1), Ena1B (SEQ ID NO:8), and Ena1C (SEQ ID NO:15) and Bacillus cytotoxicus NVH 391-98 Ena2A (SEQ ID NO: 21), Ena2B (SEQ ID NO: 29), Ena2C (SEQ ID NO: 38), and Bacillus cereus Ena3A (SEQ ID NO:49), and a number of homologues and/or orthologues in other bacterial strains, wherein each orthologous sequence of a family member has at least 80% identity to the sequence used herein as defined over their total length (also see Examples ‘Phylogenetic analysis’; and
Multimer Assemblies.
A second aspect of the invention relates to a protein multimeric assembly, or multimer, which comprises at least 7, preferably between 7 and 12, or more self-assembling protein subunits with a ‘Domain-of-Unknown-Function 3992’ (DUF3992) domain protein and typical N-terminal conserved region, wherein said protein subunits are non-covalently connected to each other.
Said self-assembling DUF3992 domain-containing protein subunits more specifically relate to protein subunits comprising an Ena protein sequence, and/or an engineered Ena protein sequence.
Another embodiment discloses the multimer comprising 7-12 protein subunits wherein said protein subunits comprise Ena proteins, and/or an engineered Ena protein form thereof. In specific embodiments said multimers comprise protein subunits selected from Ena proteins as depicted in SEQ ID NOs:1-80, 145-146, or a homologue with at least 60% identity to any one thereof, or at least 70%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 97% identity to any one thereof, a functional orthologue thereof, and/or an engineered Ena protein form thereof. These multimers as described herein are formed by self-assembly of protein subunits comprising a DUF3992 domain and defined to consist of 6, 7, 8, 9, 10, 11 or 12 protein subunits (
Alternatively, the Ena1/2C protein has been shown to form ring-like or disc-like multimers when recombinantly expressed. A closed circular multimer or disc-like structure is formed in vitro, with or without a sterically frustrated N- and/or C-terminal region. Moreover, in particular cases even a recombinantly expressed truncated Ena1/2C protein, lacking the first N-terminal connector region, is capable of self-assembling into multimers. In one embodiment, these Ena1C constituting multimers may consist of a heptamer or a nonamer, with 7 or 9 subunits, respectively (see also
The recombinantly produced Ena1C multimer or nonameric ring-structure may be further engineered by adding a heterologous N- or C-terminal tag, by mutation or insertions to adapt the Ena1C multimeric assemblies as biofunctional and structural tools.
In a specific embodiment, said multimer as described herein, comprising six to twelve protein subunits comprising a DUF3992 domain-containing protein, or specifically an Ena protein, a homologue thereof or an engineered form thereof, is an isolated multimer. Said isolated multimer is obtained by recombinant expression of a chimeric gene as described herein, to produce the multimer ‘as such’, optionally followed by purification of said multimers from the production host. One embodiment thus relates to said isolated multimer consisting of at least 6, or preferably 7-12 subunits, or an engineered multimer or a multimer comprising at least one protein subunit that is engineered as compared to its natural counterpart or wild type protein form. In specific embodiments, the multimers as described herein may be homomeric multimers or heteromeric multimers; the latter may comprise identical DUF3992 subunits, or consist of wild type Ena protein subunits and engineered Ena protein subunits, such as for instance tagged Ena proteins, or mutant Ena protein subunits. The heteromeric multimers may consist of one type of Ena protein or several types of Ena protein members.
Overall, those multimers as defined herein to comprise at least seven DUF3992 domain-containing protein subunits, which may be at least one Ena protein as defined herein, and wherein said protein subunits are non-covalently linked via β-sheet augmentation, may comprise at least one engineered Ena protein subunit, which is defined herein as a non-naturally occurring Ena protein subunit, with the aim to prevent further oligomerisation and covalent interaction triggered by the N-terminal and/or C-terminal regions forming inter-multimeric disulphide bridges, and/or to acquire additional functionalities or properties for said multimeric assemblies.
An ‘engineered DUF3992-containing protein subunit’ as defined herein, or an ‘engineered Ena protein’ as defined herein, relates to non-naturally occurring forms of DUF3992-containing or Ena proteins, respectively, which are still capable of self-assembling and forming multimeric or fibrous structures. Engineered or modified or modulated protein subunits or protein subunit variants, as interchangeably used herein, may show differences on their primary structural feature level, i.e. on their amino acid sequence as compared to the wild type (Ena) protein, as well as by other modifications, e.g. by chemical linkers or tags. An engineered protein subunit may thus concern a mutant protein, comprising for instance one or more amino acid substitutions, insertions or deletions, or a fusion protein, which may be a tagged or labelled protein, or a protein with an insertion within its sequence or its topology, or a protein formed by assembly of partial or split-Ena proteins, among other modifications. So in one embodiment, an engineered Ena protein is disclosed, wherein said engineered Ena protein is a modified Ena protein as compared to native Ena proteins, and is a non-naturally occurring protein. Non-limiting examples as provided herein relate to N- or C-terminally tagged Ena proteins, more specifically with a heterologous tag of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more amino acid residues long, to acquire sterically frustrated Ena protein subunits for multimer formation without forming any fibrous assemblies; Ena mutant or variant proteins; Ena protein fusions or Ena proteins with a heterologous peptide or protein inserted within one of its exposed loops between β-strands, or Ena proteins formed upon assembly of Ena split-protein parts separately expressed in a host.
A tag is a ‘heterologous tag’ or ‘heterologous label’ resulting in a ‘heterologous fusion’ if it is not naturally occurring in the wild-type protein sequence, and is added for application purposes, such as for facilitating purification of the protein, or for assembling multimers sterically hindered in outgrowth of fiber formation. The term “detectable label”, “labelling”, or “tag”, as used herein, refers to detectable labels or tags allowing the detection, visualization, and/or isolation, purification and/or immobilization of the isolated or purified (poly-)peptides described herein, and is meant to include any labels/tags known in the art for these purposes. Particularly preferred are affinity tags, such as chitin binding protein (CBP), maltose binding protein (MBP), glutathione-S-transferase (GST), poly(His) (e.g., 6×His or His6), Strep-tag®, Strep-tag II® and Twin-Strep-tag®; solubilization tags, such as thioredoxin (TRX), poly(NANP) and SUMO; chromatography tags, such as a FLAG-tag; epitope tags, such as V5-tag, EPEA-tag, myc-tag and HA-tag; fluorescent labels or tags (i.e., fluorochromes/-phores), such as fluorescent proteins (e.g., GFP, YFP, RFP etc.) and fluorescent dyes (e.g., FITC, TRITC, coumarin and cyanine); luminescent labels or tags, such as luciferase; and (other) enzymatic labels (e.g., peroxidase, alkaline phosphatase, beta-galactosidase, urease or glucose oxidase). Also included are combinations of any of the foregoing labels or tags.
Said functional engineered protein subunits or engineered Ena protein subunits or monomers, preferably engineered by addition of a tag, may further be capable of forming an arrested multimer, or an arrested fiber, in itself, as a homomultimeric assembly of engineered Ena protein subunits, or as a heteromultimeric assembly combining engineered and non-engineered (e.g. wild type) Ena protein subunits.
In a particular embodiment, the protein subunits may be engineered Ena proteins comprising at least one Ena mutant or Ena variant protein subunit. As a non-limiting example, such Ena mutants or variants can be derived from the structural information demonstrating where modification or mutation of surface sidechains of the multimer or protein subunit is feasible (see also
Furthermore, an example of insertion sites in Ena1B (SEQ ID NO:8) is depicted in
The N-terminal region and C-terminal region as defined herein for Ena proteins refers to the wild type Ena protein sequence. For said wild type (or substitution/mutant variant) Ena proteins, the ‘N-terminal region’ is defined as the first part of the Ena protein sequence comprising a flexible N-terminal connector followed by a spacer, and the first β-strand B of the typical BIDG CHEF β-sheets composing the jellyroll folding of said Ena protein subunit. The ‘C-terminal region’ of the Ena proteins as defined herein is the end of the protein sequence comprising the last β-strand I of the BIDG CHEF β-sheets and possible residual C-terminal residues thereafter.
One application one may consider is to modify the Ena protein subunit in an engineered Ena protein format whereby another functional moiety or protein, such as for instance an antibody or alike, is fused to said Ena protein or Ena multimer, providing for a functionalized multimer, optionally coupled to a surface or support.
In order to make structurally attractive fusions, the skilled person may consider engineering the Ena protein as a circularly permutated protein. The term “circular permutation of a protein” or “circularly permutated protein” refers to a protein which has a changed order of amino acids in its amino acid sequence, as compared to the wild type protein sequence, with as a result a protein structure with different connectivity, but overall similar three-dimensional (3D) shape. A circular permutation of a protein is analogous to the mathematical notion of a cyclic permutation, in the sense that the sequence of the first portion of the wild type protein (adjacent to the N-terminus) is related to the sequence of the second portion of the resulting circularly permutated protein (near its C-terminus), as described for instance in Bliven and Prlic (2012). A circular permutation of a protein as compared to its wild type protein is obtained through genetic or artificial engineering of the protein sequence, whereby the N- and C-terminus of the wild type protein (as defined above herein for Ena proteins) are ‘connected’, and the protein sequence is interrupted or cleaved at another site, to create a novel N- and C-terminus of said protein. The circularly permutated Ena protein of the invention is thus the result of a connected N- and C-terminus of the wild type Ena protein sequence, and a cleavage or interrupted sequence at an accessible or exposed site (preferentially a β-turn or loop) of said Ena protein subunit, whereby the folding is retained or similar as compared to the folding of the wild type Ena protein. Said connection of the N- and C-terminus in said circularly permutated scaffold protein may be the result of a peptide bond linkage, or of introducing a peptide linker, or of a deletion of a peptide stretch near the original N- and C-terminus of the wild type protein, followed by a peptide bond or the remaining amino acids.
This rearrangement of the N- and C-terminus of the resulting Ena protein is referred to as the secondary N- and C-terminus.
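At the sequence level, the circular permutation described above amounts to a cyclic rotation of the wild type sequence, optionally with a linker joining the original termini. The Python sketch below illustrates only that sequence operation; the function name, the cut position and the linker are hypothetical, and choosing a cut site that preserves the fold (for instance an exposed β-turn or loop) is a structural question the code does not address.

```python
def circular_permutation(sequence: str, cut: int, linker: str = "") -> str:
    """Return a circularly permutated sequence.

    The original C-terminus is joined to the original N-terminus
    (optionally through `linker`), and the chain is opened after `cut`
    residues, so that residue `cut + 1` of the wild type sequence becomes
    the new (secondary) N-terminus.
    """
    if not 0 < cut < len(sequence):
        raise ValueError("cut must lie strictly inside the sequence")
    # New order: [cut+1 .. end] + linker + [1 .. cut]
    return sequence[cut:] + linker + sequence[:cut]


# Toy example with an invented eight-residue sequence.
print(circular_permutation("ABCDEFGH", 3))               # DEFGHABC
print(circular_permutation("ABCDEFGH", 3, linker="GG"))  # DEFGHGGABC
```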
Finally, the multimers as described herein provide for numerous applications in the field of next-generation biomaterials. In one embodiment, said multimers may be coupled to a solid surface, and as such provide for modified surfaces with properties of having an extreme resilient behaviour, thus being very stable and rigid materials.
Fibrous Assemblies.
Another aspect of the invention relates to recombinantly produced fibers comprising at least two multimers, wherein said multimers comprise at least 7 protein subunits, or 7-12 subunits, which comprise a self-assembling DUF3992 domain-containing protein, in particular an Ena protein, wherein said protein subunits are non-covalently connected via β-sheet augmentation, and wherein said multimers are longitudinally stacked and covalently connected via at least one disulphide bridge. The protein fibers may thus be produced in a non-natural host, recombinantly, in cellulo and/or in vitro, and may comprise heteromeric or homomeric multimers. When heteromeric protein fibers are envisaged, the multimers may comprise one or more self-assembling DUF3992-domain-containing Ena proteins, or alternatively the protein subunits are identical except that one or more subunits are an engineered protein form thereof. Homomultimeric protein fibers may be generated by recombinantly expressing a specific Ena protein or Ena protein mutant, variant or engineered Ena protein in a host cell. Any recombinantly produced protein fiber comprising one or more Ena protein subunits will be a non-naturally occurring fiber since the ruffles observed on the in vivo Bacillus fibers (see Examples) have never been seen in the recombinantly produced fibers.
In a specific embodiment, the protein subunits or multimers as described herein comprise an ‘N-terminal region’ or ‘N-terminal connector’ or ‘N-terminal connector region’, as used interchangeably herein, with a conserved amino acid residue sequence motif depicted as ZXnCCXmC, wherein Z is Leu, Ile, Val or Phe, and X is any amino acid, n is 1 or 2 residues, and m is 10-12, and comprising a ‘C-terminal region’ or ‘C-terminal receiver region’, as used interchangeably herein, with a conserved amino acid motif depicted as GX2/3CX4Y, wherein X is any amino acid, to allow S-type fiber formation of said multimers by longitudinally connecting the Cys present in said motifs to form covalent disulphide bonds. In a specific embodiment, said protein fiber formed by these multimers has a helical structure (e.g.
In another embodiment, an ‘engineered multimer’ for modulating the rigidity and/or elasticity of said protein fiber is produced wherein the N-terminal region of one or more protein subunits comprises a N-terminal conserved motif ZXnCCXmC, wherein Z is Leu, Ile, Val or Phe, and X is any amino acid, n is 1 or 2 residues, but with m being 7, 8 or 9 amino acid residues instead of 10-12 residues, resulting in a shorter N-terminal region (as compared to Ena1A of SEQ ID NO:1 or Ena1B of SEQ ID NO:8, for instance), or with m being between 13 and 16 residues, resulting in a longer N-terminal region (as compared to Ena1A of SEQ ID NO:1 or Ena1B of SEQ ID NO:8, for instance). Said engineered multimers may still allow the formation of covalent S-S bridges via said cysteines with the C-terminal receiver motif GX2/3CX4Y in the assembly of an S-type or helical fiber, but may be of lower stability or rigidity as compared to the ones where m is 10-12 residues. The formation of S-type or helical fibers may be possible without disulphide bridge formation, though this will result in much less stable and less resilient fiber structures. Indeed, as supported herein, the fiber structures that comprise the N-terminal cysteine covalent linking provide for a stability that allows for instance the endospore appendages to survive in harsh conditions. The disulphide bonds present in the lumen of the fibers allow for this strength and are therefore preferred in the fibers.
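The ranges given above for the spacer length m can be summarised in a small lookup. The Python sketch below is merely a restatement of those ranges; the function name and the category labels are hypothetical and chosen for illustration, and the code makes no prediction about the stability or rigidity of any actual fiber.

```python
def classify_spacer_length(m: int) -> str:
    """Map the number of residues m between the Cys residues of the
    ZXnCCXmC motif onto the ranges described in the text."""
    if 7 <= m <= 9:
        return "shortened N-terminal region (engineered)"
    if 10 <= m <= 12:
        return "native-like N-terminal region"
    if 13 <= m <= 16:
        return "extended N-terminal region (engineered)"
    return "outside the ranges described"


for m in (8, 11, 14, 20):
    print(m, classify_spacer_length(m))
```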
Furthermore, L-type protein fibers comprising disc-type multimers are also longitudinally cross-linked via covalent linkage between N-terminal conserved Cys residues and multimers of the preceding layer connector. Said fibers may be formed by recombinant expression of Ena3, as depicted in SEQ ID NOs:49-80, or a homologue with at least 80% identity to any one thereof. Said Ena3 proteins being functional in L-type fiber formation are further defined herein to contain an N-terminal connector with a conserved motif that is slightly adapted as compared to the Ena1/2 A&B S-type fiber forming subunits, i.e. a motif wherein the second Cys may be replaced by another amino acid in some Ena3 proteins, thus defined by ZXnC(C)XmC, wherein Z is Leu, Ile, Val or Phe, and X is any amino acid, n is 1 or 2 residues, and m is 10-12, and comprising a ‘C-terminal region’ or ‘C-terminal receiver region’, as used interchangeably herein, with a conserved amino acid motif depicted as S-Z-N-Y-X-B, wherein Z is Leu or Ile, B is Phe or Tyr, and X is any amino acid, to allow L-type fiber formation of said multimers by longitudinally connecting the Cys present in said motifs to form covalent disulphide bonds. In a specific embodiment, said protein fiber formed by these multimers has a disc-like structure (e.g.
For instance, by addition of a heterologous N-terminal tag of at least 1 to 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more amino acids, steric hindrance will prevent or negatively affect disulphide bridge formation, thereby preventing fiber formation, or resulting in partially formed fibers or in fibers that are less strong, less resilient or less rigid (see examples).
In a specific embodiment, said at least 2 multimers of the produced protein fiber are covalently linked through at least one disulphide bond between a side chain of a Cys residue of the N-terminal connector region of at least one protein subunit of one multimer and a Cys residue of the receiver region of a protein subunit of the multimer of the preceding layer in this longitudinal direction. In a preferred embodiment, there are at least two disulphide bonds formed between different multimers of the fiber, and most preferably each disulphide bond contains a sulphur atom from the cysteines in the N-terminal region of one or more protein subunits bonded to the sulphur atom of the Cys present in the protein subunit of the preceding multimer of the fiber. In a specific embodiment said N-terminal region has two consecutive Cys in said conserved amino acid motif, both of which take part in a disulphide bridge with another multimer of the fiber. Other embodiments relate to said protein fibers as nanofibers comprising at least 2 multimers, wherein said multimers are stacked and covalently linked through disulphide bridge(s) formed by the first and second Cys residues of the N-terminal conserved motif of protein subunit (i) and the Cys residue of the β-strand I of subunit (i−9) and of the β-strand B of subunit (i−10), respectively.
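The cross-linking register described above (subunit i bonded to subunits i−9 and i−10) can be sketched as a simple index calculation. The following Python snippet is illustrative only: the function names and the 0-based numbering of subunits along the helix are assumptions, and subunits without a preceding partner are simply treated as unlinked.

```python
def disulphide_partners(i):
    """List the disulphide partners of subunit i (0-based index).

    Following the description: the first Cys of the N-terminal motif of
    subunit i pairs with the Cys in beta-strand I of subunit i-9, and the
    second Cys pairs with the Cys in beta-strand B of subunit i-10.
    """
    partners = []
    if i - 9 >= 0:
        partners.append(("first Cys", i - 9, "strand I"))
    if i - 10 >= 0:
        partners.append(("second Cys", i - 10, "strand B"))
    return partners


def crosslinks(n_subunits):
    """Enumerate all inter-turn disulphide links in a fiber of n subunits."""
    return [
        (i, cys, j, strand)
        for i in range(n_subunits)
        for cys, j, strand in disulphide_partners(i)
    ]


if __name__ == "__main__":
    for link in crosslinks(12):
        print(link)
```

In this toy model every subunit beyond the tenth carries two links to the preceding turn, which is the branching pattern the text attributes to the N-terminal connector.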
The protein fiber as described herein is thus composed from two or more multimers each comprising at least 7 protein subunits comprising a self-assembling DUF3992 domain-containing protein, as described herein, or more particular comprising an Ena protein or engineered Ena protein, wherein said protein subunits are non-covalently linked, and wherein said multimers are longitudinally stacked solely by forming covalent disulphide bonds between said stacked multimers. In said protein fibers, said multimers may be identical or different in composition. And said multimers may be engineered multimers for modulating the rigidity of the fiber, as defined herein. Furthermore, said at least two multimers of said protein fiber may be multimers comprising identical protein subunits, or comprising different protein subunits. Contrary to the L-type fibers, which comprise distinguishable multimeric discs that are only covalently connected via the disulphide bridges, the multimers present in S-type fibers will not be distinguishable as single units that are solely covalently connected, but will be a continuous β-sheet augmentation of protein subunits in a β-propeller helical structure, and additionally crosslinked every helical turn by disulphide bridges. So ‘a protein fiber comprising the multimers’ as used herein may refer to a protein fiber which is consisting of distinguishable separate disc-like multimers (e.g. comprising solely Ena3A-based protein subunits) solely connected via S-S bridges, or to a protein fiber compiled from helical-turn-like multimers (e.g. Ena1/2A and/or Ena1/2B protein-based), which are continuously non-covalently connected into a fibrous helical structure, and further crosslinked via S-S bridges.
Furthermore, alternative embodiments comprise an engineered protein fiber, which is defined as a fiber comprising two or more multimers, as described herein, wherein at least one multimer is an engineered multimer, as defined herein, and/or wherein at least one protein subunit is an engineered protein subunit, as defined herein.
Another embodiment relates to a recombinantly produced or in vitro produced and purified protein fiber, wherein said fiber may be obtainable by recombinant or in vitro expression of the chimeric gene as described further herein. Said in vitro produced fiber may be an S-type fiber as disclosed herein, and may be formed by multimers comprising Ena1A and/or Ena1B protein, and/or an engineered form thereof. Said in vitro produced fibers do not occur in nature, such as on Bacillus endospores, for which it is clear that Ena1A, Ena1B and Ena1C are indispensably required to form S-type fibers in vivo (see Examples). A specific embodiment relates to said in vitro produced protein fiber which is an engineered protein fiber in that the multimers of said protein fiber comprise at least one engineered multimer, as described herein, or at least one multimer comprising an engineered protein subunit, as described herein, in particular at least one engineered Ena protein, as described herein. A further embodiment provides for an engineered protein fiber, wherein the protein fiber as described herein is fused to another protein or is conjugated to another moiety, such as a chemical moiety, or a functional moiety.
Another aspect of the invention provides for a chimeric gene or chimeric construct, which comprises DNA elements comprising at least a heterologous promoter or regulatory element operably linked to a nucleic acid sequence which upon expression controlled by said promoter or regulatory element results in a nucleic acid molecule encoding a protein subunit or protomer containing a self-assembling protein, as defined herein, and wherein said heterologous promoter or heterologous regulatory element sequence is originating from another source as (or is different to the native form of) the nucleic acid sequence encoding the bacterially derived self-assembling protein. In a further embodiment said chimeric gene comprises a heterologous promoter element or regulatory expression element operably linked to a nucleic acid molecule encoding an Ena protein, as described herein, or an engineered Ena protein thereof, which may be an Ena mutant or variant protein, an extended Ena protein (sterically frustrated to prevent fiber formation) or a fusion protein. Moreover, said chimeric construct may be present in an expression cassette, or as part of a cloning or expression vector for production of the protein in vitro.
An “expression cassette” comprises any nucleic acid construct capable of directing the expression of a gene/coding sequence of interest, which is operably linked to a promoter of the expression cassette. Expression cassettes are generally DNA constructs preferably including (5′ to 3′ in the direction of transcription): a promoter region, a polynucleotide sequence, homologue, variant or fragment thereof operably linked with the transcription initiation region, and a termination sequence including a stop signal for RNA polymerase and a polyadenylation signal. It is understood that all of these regions should be capable of operating in biological cells, such as prokaryotic or eukaryotic cells, to be transformed. The promoter region comprising the transcription initiation region, which preferably includes the RNA polymerase binding site, and the polyadenylation signal may be native to the biological cell to be transformed or may be derived from an alternative source, where the region is functional in the biological cell. Such cassettes can be constructed into a “vector”.
The term “vector”, “vector construct,” “expression vector,” or “gene transfer vector,” as used herein, is intended to refer to a nucleic acid molecule capable of transporting another nucleic acid molecule to which it has been linked, and includes any vector known to the skilled person, of any suitable type, including, but not limited to, plasmid vectors, cosmid vectors, phage vectors, such as lambda phage, viral vectors, such as adenoviral, AAV or baculoviral vectors, or artificial chromosome vectors such as bacterial artificial chromosomes (BAC), yeast artificial chromosomes (YAC), or P1 artificial chromosomes (PAC). Expression vectors comprise plasmids as well as viral vectors and generally contain a desired coding sequence and appropriate DNA sequences necessary for the expression of the operably linked coding sequence in a particular host organism (e.g., bacteria, yeast, plant, insect, or mammal) or in in vitro expression systems. Expression vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., vectors having an origin of replication which functions in the host cell). Other vectors can be integrated into the genome of a host cell upon introduction into the host cell, and are thereby replicated along with the host genome. Suitable vectors have regulatory sequences, such as promoters, enhancers, terminator sequences, and the like as desired and according to a particular host organism (e.g. bacterial cell, yeast cell). Cloning vectors are generally used to engineer and amplify a certain desired DNA fragment and may lack functional sequences needed for expression of the desired DNA fragments. The construction of expression vectors for use in transfecting prokaryotic cells is also well known in the art, and thus can be accomplished via standard techniques (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Press, Plainview, N.Y. (2012); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 114), John Wiley & Sons, New York (2016), for definitions and terms of the art).
A further embodiment relates to a host cell expressing the chimeric gene as described herein, thereby possibly resulting in a host cell comprising the protomers or protein subunits of the multimers or forming the fibers as described herein. ‘Host cells’ can be either prokaryotic or eukaryotic. The cells can be transiently or stably transfected. Such transfection of expression vectors into prokaryotic and eukaryotic cells can be accomplished via any technique known in the art, including but not limited to standard bacterial transformations, calcium phosphate co-precipitation, electroporation, or liposome mediated-, DEAE dextran mediated-, polycationic mediated-, or viral mediated transfection. For all standard techniques see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Press, Plainview, N.Y. (2012); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 114), John Wiley & Sons, New York (2016). Recombinant host cells, in the present context, are those which have been genetically modified to contain an isolated DNA molecule, nucleic acid molecule or expression construct or vector of the invention. The DNA can be introduced by any means known to the art which are appropriate for the particular type of cell, including without limitation, transformation, lipofection, electroporation or viral mediated transduction. A DNA construct capable of enabling the expression of the chimeric protein of the invention can be easily prepared by art-known techniques such as cloning, hybridization screening and Polymerase Chain Reaction (PCR). Standard techniques for cloning, DNA isolation, amplification and purification, for enzymatic reactions involving DNA ligase, DNA polymerase, restriction endonucleases and the like, and various separation techniques are those known and commonly employed by those skilled in the art. A number of standard techniques are described in Sambrook et al. (2012), Wu (ed.) (1993) and Ausubel et al. (2016). Representative host cells that may be used with the invention include, but are not limited to, bacterial cells, yeast cells, plant cells and animal cells. Bacterial host cells suitable for use with the invention include Escherichia spp. cells, Bacillus spp. cells, Streptomyces spp. cells, Erwinia spp. cells, Klebsiella spp. cells, Serratia spp. cells, Pseudomonas spp. cells, and Salmonella spp. cells. Animal host cells suitable for use with the invention include insect cells and mammalian cells (most particularly derived from Chinese hamster (e.g. CHO), and human cell lines, such as HeLa). Yeast host cells suitable for use with the invention include species within Saccharomyces, Schizosaccharomyces, Kluyveromyces, Pichia (e.g. Pichia pastoris), Hansenula (e.g. Hansenula polymorpha), Yarrowia, Schwanniomyces, Zygosaccharomyces and the like. Saccharomyces cerevisiae, S. carlsbergensis and K. lactis are the most commonly used yeast hosts, and are convenient fungal hosts. The host cells may be provided in suspension or flask cultures, tissue cultures, organ cultures and the like. Alternatively, the host cells may also be transgenic animals.
A specific embodiment relates to a Bacillus spp. cell comprising a chimeric gene encoding an Ena protein, or engineered Ena protein, as defined herein, so that upon sporulation of said Bacillus spp. the gene is expressed to form modified endospores, with (engineered) Ena protein for self-assembly into engineered Ena multimers and fibers in vivo. So a specific embodiment relates to a Bacillus spore or endospore comprising or displaying recombinant protein fibers comprising Ena protein or engineered Ena protein. Said engineered fibers on said spores may be advantageous for applying the spores in a certain environment or context.
Another embodiment relates to a method to produce such a modified endospore, comprising the steps of recombinantly expressing one or more chimeric genes as described herein in a spore-forming bacterial cell, and incubating said cell under conditions that induce sporulation.
Another aspect of the invention relates to a modified surface or solid support, which contains the (engineered) multimer or protein fiber of the invention. Particularly a modified surface is disclosed wherein a self-assembling Ena protein subunit as defined herein is covalently linked to a solid surface. A particular embodiment relates to said modified surface wherein at least one Ena protein subunit or engineered Ena protein is covalently linked to a solid support. Such a modified surface may be used as a nucleator surface allowing epitaxial growth to further form multimers and fibers as described herein, linked to said protein subunit and surface, when said modified surface comprising at least one Ena protein subunit is exposed to a solution comprising further Ena proteins, which will thus self-assemble with each other into multimers and upon covalent disulphide bridge formation form protein fibers outgrowing from said surface.
Surface immobilization may be envisaged as covalent binding of at least one (engineered) Ena protein subunit on said surface by using means known by the skilled person. Such means include, but are not limited to, click chemistry, cross-linking to free amines (at the N-terminus, via lysine) for example through NHS-chemistry, disulphide cross-linking, thiol-based cross-linking, addition of a tag (a snap- or sortase tag for instance), or fusion at the N- or C-terminal end of the Ena protein to allow covalent attachment of the protein to a surface, as known in the art. In a specific embodiment, the monomeric Ena subunit is coupled to the surface under denaturing buffer conditions.
The protein fibers or engineered protein fibers may as well be fused or attached on the cell or microbial surface of the host, or can be nucleated onto a foreign surface that is exposed to a solution containing the Ena protein to obtain a modified surface comprising the fiber or engineered fiber.
Said surface immobilization may thus be accomplished herein on biological or synthetic surfaces. Biological surface includes the surface of a cell, of a bacterium, an (endo)spore, or other naturally occurring or recombinantly produced surfaces. High density surface expression of recombinant proteins is a prerequisite for successfully using cellular surface display in several areas of biotechnological applications in the fields of pharmaceutical, fine chemical, bioconversion, waste treatment and agrochemical production.
An artificial or synthetic surface may for instance include a bead, a slide, a chip, a plate, or a column. More particularly, the artificial surface may be particulate (e.g. beads or granules) or in sheet form (e.g. membranes or filters, glass or plastic slides, microtitre assay plates, dipstick, capillary devices) which can be flat, pleated, or hollow fibers or tubes. A range of biotechnological applications make use of the coating or activation of synthetic surfaces with protein assemblies, such as multimer compositions or fibers as described herein.
So the invention also provides for a system or in vitro method that couples the production of the Ena proteins or derivatives thereof with a self-assembling property that leads to the formation of multimeric and/or fibrous assemblies onto a synthetic surface and that displays these on said surface in a conformation for further specific capturing or displaying means and molecules to fulfil a certain goal in the biomedical or biotechnological field of biomaterials.
The invention further relates to directly applicable products obtained by generating the protein subunits, multimers or fibers or any engineered forms thereof in a particulate context. The self-assembling protein subunits according to the present invention indeed allow to self-assemble readily into multimeric assemblies as well as long, resilient, flexible nanofibers, which can be tailored for different functions through point mutations, peptide or protein fusions, and conjugates. Said engineered nanofibers with high rigidity and stability, even in harsh conditions, though with very high flexibility will provide for next-generation biomaterials. In one embodiment, such a biomaterial is present in the form of a thin protein film comprising the engineered protein fiber as described herein, and/or the protein fiber as described herein. As provided in the Example section (and e.g.
Another embodiment relates to a hydrogel comprising the engineered protein fiber of the invention, and optionally a protein fiber as described herein. In another embodiment, hydrogels are disclosed comprising an engineered multimer as described herein or a multimer comprising an engineered protein subunit as described herein. Hydrogels are known as water-swollen polymeric materials that maintain a distinct three-dimensional structure. They were the first biomaterials designed for use in the human body. Novel approaches in hydrogel design have revitalized this field of biomaterials research with applications in therapeutics, sensors, microfluidic systems, nanoreactors, and interactive surfaces. Hydrogels may self-assemble by hydrophobic, electrostatic or other types of molecular interactions. Designing hydrogel-forming polymers, using recognition motifs found in nature, enhances the potential for the formation of precisely defined three-dimensional structures. The (engineered) multimers or protein fiber of the invention also provide for well-structured 3D building blocks to form a hydrogel, for which methods are known to the skilled person. The versatility of the revealed structures of the invention especially provides an opportunity to manipulate their stability and specificity by modifying the primary structure, i.e. by using engineered protein subunits, multimers or fibers of the invention for the successful design of a new class of hydrogel biomaterials. Furthermore, hybrid hydrogels are also envisaged herein; these are usually referred to as hydrogel systems that possess components from at least two distinct classes of molecules, for example, synthetic polymers and biological macromolecules, interconnected either covalently or non-covalently. Compared to synthetic polymers, proteins and protein modules have well defined and homogeneous structures, consistent mechanical properties, and cooperative folding/unfolding transitions.
The protein fiber or multimers of the invention used in said hybrid hydrogel may impose a level of control over the structure formation at the nanometer level; the synthetic part may contribute to the biocompatibility of the hybrid material in certain biomedical applications. By optimizing the amino acid sequence, i.e. by applying engineered Ena proteins, responsive hybrid hydrogels tailor-made for a specific application may be designed. Potential applications of different types of hydrogels include tissue engineering, synthetic extracellular matrix, implantable devices, biosensors, separation systems, materials controlling the activity of enzymes, phospholipid bilayer destabilizing agents, materials controlling reversible cell attachment, nanoreactors with precisely placed reactive groups in three-dimensional space, smart microfluidics with responsive hydrogels, and energy-conversion systems.
A final aspect of the invention relates to methods for producing said self-assembling protein subunits, multimers, in vitro or in vivo/in cellulo produced protein fibers, or further to produce ‘arrested’ Ena proteins, engineered forms of Ena proteins, multimers and fiber, and produce modified surfaces of the present invention. The method to produce said protein subunit monomers or self-assembled multimers is a recombinant or in vitro process comprising the steps of:
One embodiment relates to said method wherein the protein subunit of the chimeric gene expressed in said cell may be an engineered protein subunit or engineered Ena protein, or may be more than one chimeric construct providing for the expression of one or more wild type Ena proteins and/or different forms of engineered protein subunits of the invention.
Another embodiment relates to said method wherein the purification in step b) comprises the steps of isolation and solubilization of inclusion bodies, refolding of solubilized protein subunits, and purification of refolded protein multimers. Further purification methods for instance using affinity chromatography, ion exchange chromatography, gel filtration, or further alternatives are known to the skilled person.
In another embodiment, the protein subunit, as described herein, in particular an (engineered) Ena protein subunit, encoded by the chimeric gene used in said method for recombinant expression in a cell comprises a heterologous N- or C-terminal tag. Said N- or C-terminal tag may result in production of protein subunits that are still capable of self-assembling into multimers, but due to the non-natural presence of said N- or C-terminal tag, steric hindrance arrests these protein subunits or multimers in further fiber formation or ‘outgrowth’. Most preferably said heterologous N- or C-terminal tag is at least 1-5, 6, 7, 9 or at least 15 amino acids to result in arrested or hampered fiber formation or blocking or retarding of epitaxial growth. Said heterologous N- or C-terminal tag may be an affinity tag, as described herein.
Another embodiment relates to a method to recombinantly produce the protein fiber in a host cell, comprising the steps of:
wherein the nucleic acid encoding said self-assembling protein subunit or the Ena protein does not provide for a heterologous N- or C-terminal tag. By recombinantly expressing tag-free or non-sterically hindered Ena proteins, the spontaneous self-assembly into fibers in the cytoplasm makes it easy to produce S-type like fibers in vivo.
A further embodiment relates to the in vitro method for producing a protein fiber or engineered protein fiber according to the invention, comprising the steps of:
Alternatively, said protein fiber is produced by said method wherein step b) and c) are reversed. A cleavable tag is for instance a tag with a proteolytic cleavage site, or a cleavable tag as known by the skilled person.
Another embodiment further provides for a method to produce a modified surface as disclosed herein, comprising the steps of the method for producing and purifying the fiber, multimer or engineered forms thereof, followed by a further step of covalently attaching the protein, multimer or fiber to a surface, which may be a biological or an artificial surface.
Finally, there are numerous applications as touched upon already herein for said Ena protein or engineered Ena protein subunit-derived assemblies as next-generation biomaterials in different fields, such as the biomedical and biotechnological areas. So, the use and utility of said nanomaterials is endless.
It is to be understood that although particular embodiments, specific configurations as well as materials and/or molecules, have been discussed herein for methods, and products according to the disclosure, various changes or modifications in form and detail may be made without departing from the scope of this invention. The following examples are provided to better illustrate particular embodiments, and they should not be considered limiting the application. The application is limited only by the claims.
Endospores formed by Bacillus and Clostridium species frequently carry surface-attached feather-, ribbon- or pilus-like appendages (Driks, 2007), the role of which has remained largely enigmatic due to the lack of molecular annotation of the pathways involved in their assembly. Half a century following their first observation (Hachisuka and Kuno, 1976; Hodgkiss, 1971), we herein employ high resolution de novo structure determination by cryoEM to structurally and genetically characterize the appendages found on B. cereus spores.
Negative stain EM imaging of B. cereus strain NVH 0075/95 showed typical endospores with a dense core of ~1 μm diameter, tightly wrapped by an exosporium layer that on TEM images emanates as a flat 2-3 μm long sac-like structure from the endospore body (
We found that B. cereus Enas come in two main morphologies: 1) staggered or S-type Enas that are several micrometers long, emerge from the spore body and traverse the exosporium, and 2) smaller, less abundant ladder- or L-type Enas that appear to emerge directly from the exosporium surface.
To further study the nature of the Enas, fibers purified from B. cereus NVH 0075/95 endospores were imaged by cryogenic electron microscopy (cryo-EM) and analysed using 3D reconstruction. Isolated fibers showed a 9.4:1 ratio of S- and L-type Enas, similar to what was seen on endospores. Boxes with a dimension of 300×300 pixels (246×246 Å²) were extracted along the length of the fibers, with an inter-box overlap of 21 Å, and subjected to 2D classification using RELION 3.0 (Zivanov et al., 2018). Power spectra of the 2D class averages revealed a well-ordered helical symmetry for S-type Enas (
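The box geometry quoted above implies a pixel size that can be checked with simple arithmetic. The Python sketch below is a hedged illustration: it takes the stated values literally (300 pixels spanning 246 Å, 21 Å overlap between consecutive boxes), and the function names, as well as the assumption that boxes are laid end to end along a straight fiber, are hypothetical.

```python
import math

BOX_PX = 300              # box edge in pixels (from the text)
BOX_ANGSTROM = 246.0      # box edge in angstrom (from the text)
OVERLAP_ANGSTROM = 21.0   # overlap between consecutive boxes (from the text)


def pixel_size():
    """Angstrom per pixel implied by the box dimensions."""
    return BOX_ANGSTROM / BOX_PX


def box_step():
    """Centre-to-centre shift between boxes, reading 21 A as the overlap."""
    return BOX_ANGSTROM - OVERLAP_ANGSTROM


def boxes_along(fiber_length_angstrom):
    """Number of whole boxes that fit along a straight fiber segment."""
    if fiber_length_angstrom < BOX_ANGSTROM:
        return 0
    return math.floor((fiber_length_angstrom - BOX_ANGSTROM) / box_step()) + 1


if __name__ == "__main__":
    print(f"pixel size: {pixel_size():.2f} A/px")
    print(f"boxes on a 1 um fiber: {boxes_along(10_000.0)}")
```

Under these assumptions the pixel size is about 0.82 Å per pixel and the boxes advance by 225 Å.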
To confirm the subunit identity of the endospore appendages isolated from B. cereus NVH0075/95, we cloned a synthetic gene fragment corresponding to the coding sequence of Ena1B and an N-terminal TEV protease cleavable 6×His-tag into a vector for recombinant expression in the cytoplasm of E. coli (recEna1B depicted in SEQ ID NO:83). The recombinant protein was found to form inclusion bodies, which were solubilized in 8M urea before affinity purification. Removal of the chaotropic agent by rapid dilution resulted in the formation of abundant soluble crescent-shaped oligomers reminiscent of a partial helical turn seen in the isolated S-type Enas (
The wild-type sequence of Ena1C (WP_000802321) was codon optimized for expression in E. coli and ordered as a synthetic gene from Twist Bioscience and subcloned further in the pET28a vector (NcoI-XhoI). The insert was designed to have an N-terminal 6× histidine tag followed by a TEV cleavage site (SEQ ID NO:89: ENLYFQG). Large scale recombinant expression was carried out in phage resistant T7 Express lysY/Iq E. coli strain from NEB. The obtained plasmids (pET28a_Ena1A; pET28a_Ena1B) were used to transform competent cells of C43(DE3). Single colonies were used to start overnight (ON) LB cultures. 10 ml ON culture was used to inoculate 1 l LB, 25 mg/ml kanamycin at 37 °C. Recombinant expression was induced at OD600 of 0.8 by addition of 1 mM IPTG and cultures were left to incubate ON. Cells were pelleted by 15 min centrifugation at 4000 g. The whole-cell pellet was resuspended in denaturing lysis buffer (20 mM Potassium Phosphate, 500 mM NaCl, 10 mM β-ME, 20 mM imidazole, 8 M urea, pH 7.5) and sonicated on ice. The soluble and insoluble fractions of the lysate were separated by centrifugation at 20,000 rpm for 45 min in a JA-20 rotor from Beckman Coulter. The cleared lysate was loaded onto a 5 ml HisTrap HP column packed with Ni Sepharose and equilibrated with denaturing lysis buffer. The bound protein was eluted with elution buffer (20 mM Potassium Phosphate, pH 7.5, 8 M urea, 250 mM imidazole) in a gradient mode (20-250 mM imidazole) using an AKTA purifier at room temperature. Resulting fractions were analyzed with SDS-PAGE to check for purity. Fractions containing Ena1C were pooled and refolded by means of dialysis (overnight, 100 μl against 1 liter, 3 kDa cutoff) to 20 mM Potassium Phosphate, 10 mM β-ME, pH 7.5. A 5 μl aliquot of the refolded material was deposited on Formvar/Carbon grids (400 Mesh, Cu; Electron Microscopy Sciences) and stained using 2% (w/v) uranyl acetate.
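To give a sense of how far the dialysis step removes the denaturant, the residual urea concentration can be estimated from a mass balance. The Python sketch below is only an idealised estimate: it assumes complete equilibration of a freely diffusing solute between the 100 μl sample and the 1 liter bath, takes the volumes literally from the protocol, and uses a hypothetical function name.

```python
def residual_concentration(c_sample, v_sample_l, v_bath_l, c_bath=0.0):
    """Equilibrium concentration after dialysis of a freely diffusing solute.

    Idealised mass balance: the solute distributes evenly over the combined
    sample and bath volume (volumes in liters, concentrations in mol/l).
    """
    total_volume = v_sample_l + v_bath_l
    return (c_sample * v_sample_l + c_bath * v_bath_l) / total_volume


if __name__ == "__main__":
    # 8 M urea in 100 ul of sample, dialysed against 1 liter of urea-free buffer.
    urea = residual_concentration(8.0, 100e-6, 1.0)
    print(f"residual urea: {urea * 1e3:.2f} mM")
```

On these assumptions a single dialysis step would lower urea from 8 M to a sub-millimolar level, roughly a 10,000-fold dilution.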
As shown in
Upon recognizing that native S-type Enas show a mixed Ena1A and Ena1B composition, we continued with 3D cryoEM reconstruction of recEna1B for model building. The Ena subunit consists of a typical jellyroll fold (Richardson, 1981) comprised of two juxtaposed β-sheets consisting of strands BIDG and CHEF (
Thus, B. cereus endospore appendages represent a novel class of bacterial pili, comprising a left-handed single start helix with non-covalent lateral subunit contacts formed by β-sheet augmentation, and covalent longitudinal contacts between helical turns by disulphide bonded N-terminal connector peptides, resulting in an architecture that combines extreme chemical stability (
Covalent bonding, and the highly compact jellyroll fold result in a high chemical and physical stability of the Ena fibers, withstanding desiccation, high temperature treatment, and exposure to proteases. The formation of linear filaments of multiple hundreds of subunits requires stable, long-lived subunit-subunit interactions with high flexibility to avoid that a dissociation of subunit-subunit complexes results in pilus breakage. This high stability and flexibility are likely to be adaptations to the extreme conditions that can be met by endospores in the environment or during the infectious cycle.
Two molecular pathways are known to form surface fibers or “pili” in Gram-positive bacteria: 1) sortase-mediated pilus assembly, which encompasses the covalent linkage of pilus subunits by means of a transpeptidation reaction catalyzed by sortases (Ton-That and Schneewind, 2004), and 2) Type IV pilus assembly, encompassing the non-covalent assembly of subunits through a coiled-coil interaction of a hydrophobic. N-terminal helix (Melville and Craig, 2013). Sortase-mediated pili and Type IV pili are formed on vegetative cells, however, and to date, no evidence is available to suggest that these pathways are also responsible for the assembly of endospore appendages.
Until the present study, the only species for which the genetic identity and protein composition of spore appendages has been known is the non-toxigenic environmental species Clostridium taeniosporum, which carries large (4.5 μm long, 0.5 μm wide and 30 nm thick) ribbon-like appendages that are structurally distinct from those found in most other Clostridium and Bacillus species. C. taeniosporum lacks the exosporium layer and the appendages seem to be attached to another layer, of unknown composition, outside the coat (Walker et al., 2007). The C. taeniosporum endospore appendages consist of four major components, three of which have no known homologs in other species and an orthologue of the B. subtilis spore membrane protein SpoVM (Walker et al., 2007). The appendages on the surface of C. taeniosporum endospores, therefore, represent a type of fiber distinct from those found on the surface of spores of species belonging to the B. cereus group.
Our structural studies uncover a novel class of pili, where subunits are organized into helically wound fibers, held together by lateral β-sheet augmentation inside the helical turns, and longitudinal disulphide cross-linking across helical turns. Covalent cross-linking in pilus assembly is known for sortase-mediated isopeptide bond formation seen in Gram-positive pili (Ton-That and Schneewind, 2004). In Enas, the cross-linking occurs through disulphide bonding of a conserved Cys-Cys motif in the N-terminal connector of a subunit i, to two single Cys residues in the core domain of the Ena subunits located at position i−9 and i−10 in the helical structure. As such, the N-terminal connectors form a covalent bridge across helical turns, as well as a branching interaction with two adjacent subunits in the preceding helical turn (i.e. i−9 and i−10). The use of N-terminal connectors or extensions is also seen in chaperone-usher pili and Bacteroides Type V pili, but these systems employ a non-covalent fold complementation mechanism to attain long-lived subunit-subunit contacts, and lack a covalent stabilization (Sauer et al., 1999; Xu et al., 2016). Because in Ena the N-terminal connectors are attached to the Ena core domain via a flexible linker, the helical turns in Ena fibers have a large pivoting freedom and ability to undergo longitudinal stretching. These interactions result in highly chemically stable fibers, yet with a large degree of flexibility. Whether the stretchiness and bendiness of Enas are functionally important is as yet unclear. Of note, in several chaperone-usher pili, a reversible spring-like stretching provided by helical unwinding and rewinding of the pili has been found important to withstand shear and pulling stresses exerted on adherent bacteria (Miller et al., 2006; Fallman et al., 2005). Possibly, the longitudinal stretching seen in Ena may serve a similar role.
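The cross-linking topology described above (the N-terminal connector of subunit i bonded to subunits i−9 and i−10) can be illustrated with a minimal sketch. The function names and the 0-based numbering of subunits along the helix are assumptions made for illustration only; they are not part of the disclosed methods.

```python
# Illustrative sketch of the disulphide cross-link topology in an S-type fiber.
# Assumption: subunits are numbered 0, 1, 2, ... along the one-start helix.

def crosslink_partners(i, offsets=(9, 10)):
    """Indices of the subunits bonded to the N-terminal connector of subunit i."""
    return [i - k for k in offsets if i - k >= 0]


def crosslink_edges(n_subunits):
    """All (i, partner) disulphide-linked pairs in a fiber of n_subunits."""
    return [(i, j) for i in range(n_subunits) for j in crosslink_partners(i)]


if __name__ == "__main__":
    print(crosslink_partners(12))        # partners of subunit 12
    print(len(crosslink_edges(20)))      # number of cross-linked pairs in 20 subunits
```

In this picture every subunit beyond the first helical turn carries two longitudinal covalent links, which is the branching interaction referred to in the text.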
In B. cereus NVH 0075/95 Ena1A, Ena1B and Ena1C are encoded in a genomic region flanked upstream by dedA (genbank: KMP91696.1) and a gene encoding a 93-residue protein of unknown function (DUF1232, genbank: KMP91696.1) (
Typical Ena filaments have, to the best of our knowledge, never been observed on the surface of vegetative B. cereus cells, indicating that they are endospore-specific structures. In support of that assumption, qRT-PCR analysis of NVH 0075/95 demonstrated increased ena1A-C transcript during sporulation, compared to vegetative cells. A transcriptional analysis has previously been performed for B. thuringiensis serovar chinensis CT-43 determining transcription at 7 h, 9 h, 13 h (30% of cells undergoing sporulation) and 22 h after inoculation (Wang et al., 2013). It is difficult to directly compare expression levels of ena1A, B and C in B. cereus NVH 0075/95 with the expression level of ena2A-C in B. thuringiensis serovar chinensis CT-43 (CT43_CH0783-785) since the expression of the latter strain was normalized by converting the number of reads per gene into RPKM (Reads Per Kilo bases per Million reads) and analyzed by the DEGseq software package, while the present study determines the expression level of the ena genes relative to the housekeeping gene rpoB. However, both studies indicate that enaA and enaB are only transcribed during sporulation. By searching a separate set of published transcriptomic profiling data we found that ena2A-C are also expressed in B. anthracis during sporulation (Bergman et al., 2006), although Enas have not previously been reported from B. anthracis spores.
CryoEM maps and immuno-gold TEM analysis of ex vivo S-type Enas indicated these contain both Ena1A and Ena1B (
To investigate the occurrence of ena1A-C within the B. cereus s.l. group and other relevant species of the genus Bacillus, pairwise tBLASTn searches for homologues of ena1A-C were performed on a database containing all available closed, curated Bacillus spp. genomes, with the addition of scaffolds for species for which closed genomes were lacking (n=735). Homologues with high coverage (>90%) and amino acid sequence similarity (>80%) of ena1AB of B. cereus NVH 0075/95 were found in 48 strains including 11 of 85 B. cereus strains, 13 of 119 B. wiedmannii strains, 14 of 14 B. cytotoxicus strains, one of one B. luti (100%) strain, 3 of 6 B. mobilis strains, 3 of 33 B. mycoides strains, 1 of 1 B. tropicus strain and both B. paranthracis strains analyzed. Of these strains, only 31 also carried a gene encoding a homolog with high sequence identity and coverage to Ena1C of B. cereus NVH 0075/95 (
Upon searching for Ena1A-C homologs in B. cereus group genomes, a candidate orthologous gene cluster encoding hypothetical EnaA-C proteins was discovered. These three proteins had, respectively, an average of 59.3±0.9%, 43.3±1.6% and 53.9±2.2% amino acid sequence identity with Ena1A, Ena1B and Ena1C of B. cereus NVH0075-95, and shared gene synteny (
The ena2A-C homo- or orthologues were much more common among B. cereus group strains than the ena1A-C genes; all investigated B. toyonensis (n=204), B. albus (n=1), B. bombysepticus (n=1), B. nitratireducens (n=6) and B. thuringiensis (n=50) genomes, and the majority of B. cereus (87%, 74/85), B. wiedmannii (105/119, 89.3%), B. tropicus (71%, 5/7) and B. mycoides (91%, 30/33) genomes, had the Ena2A-C form of the protein (
A few genomes had deviations in the ena-gene clusters compared to other strains of their species. Two of three B. mycoides strains (GCF_007673655 and GCF_007677835.1) lacked the ena1C allele downstream of the ena1A-B operon (data not shown). However, potential ena1C orthologs encoding hypothetical proteins with 50% identity to Ena1C of B. cereus NVH 0075/95 were found elsewhere in their genomes. One genome annotated as B. cereus (strain Rock3-44 Assembly: GCA_000161255.1) grouped with these strains of B. mycoides (
Our phylogenetic analyses of S-type fibers reveal Ena subunits belonging to a conserved family of proteins encompassing the domain of unknown function DUF3992.
Wild-type sequences of Ena1A (WP_000742049.1) and Ena1B (WP_000526007.1) were codon optimized for E. coli, ordered as synthetic genes from Twist Bioscience and subcloned in the pET28a vector (NcoI-XhoI). The obtained plasmids (pET28a_Ena1A; pET28a_Ena1B) were used to transform competent cells of C43(DE3). Single colonies were used to start overnight (ON) LB cultures. 10 ml ON culture was used to inoculate 1 l LB, 25 mg/ml kanamycin at 37° C. Recombinant expression was induced at OD600 of 0.8 by addition of 1 mM IPTG and cultures were left to incubate ON. Cells were pelleted by 15 min centrifugation at 4000 g. Cell pellets were resuspended in 1×PBS, 1 mg/ml lysozyme, 1 mM AEBSF, 50 μM leupeptin, 1 mM EDTA and incubated under active stirring at room temperature for 30 min, after which DNAse and MgCl2 were added to a final concentration of 10 μg/ml and 10 mM, respectively, and incubated for another 30 min. Cell debris was pelleted via centrifugation (15 min, 4000 g). The supernatant was carefully removed and centrifuged for 50 min at 20,000 rpm. Supernatants were decanted and pellets were brought back into suspension (1×PBS). The resulting suspension was diluted five-fold in miliQ, deposited on Formvar/Carbon grids (400 Mesh, Cu; Electron Microscopy Sciences) and stained using 2% (w/v) uranyl acetate. TEM analysis revealed the presence of micrometer long fibers with a diameter of 10-11 nm. 2D classification of boxed fiber segments confirms the S-type nature of the observed fibers as shown in
Without knowledge of the function of Enas, we can only speculate about their biological role. The Enas of B. cereus group species resemble pili, which in Gram-negative and Gram-positive vegetative bacteria play roles in adherence to living surfaces (including other bacteria) and non-living surfaces, twitching motility, biofilm formation, DNA uptake (natural competence) and exchange (conjugation), secretion of exoproteins, electron transfer (Geobacter) and bacteriophage susceptibility (Lukaszczyk et al., 2019; Proft and Baker, 2009). Some bacteria express multiple types of pili that perform different functions. The most common function of pili-fibers is adherence to a diverse range of surfaces, from metal, glass, plastics and rocks to tissues of plants, animals or humans. In pathogenic bacteria, pili often play a pivotal role in colonization of host tissues and function as important virulence determinants. Similarly, it has been shown that appendages, expressed on the surface of C. sporogenes endospores, facilitate their attachment to cultured fibroblast cells (Panessa-Warren et al., 2007). The Enas are, however, not likely to be involved in active motility or uptake/transport of DNA or proteins, as these are energy demanding processes that are not likely to occur in the endospore's metabolically dormant state. Enas appear to be a widespread feature among spores of strains belonging to the B. cereus group (
The cryo-EM images of ex vivo fibers showed 2-3 nm wide fibers (ruffles) at the terminus of S- and L-type Enas. The ruffles resemble the tip fibrillae of P-pili and type 1 pili seen in many Gram-negative bacteria of the family Enterobacteriaceae (Proft and Baker, 2009). In Gram-negative pilus filaments, the tip fibrilla provides adhesion proteins with a flexible location to enhance the interaction with receptors on mucosal surfaces (Mulvey et al., 1998). No filaments similar to the ruffles were observed on the in vitro assembled fibers, suggesting that their formation requires components in addition to the Ena1A or Ena1B subunits.
We present the molecular identification of a novel class of spore-associated appendages or pili widespread in pathogenic Bacilli. Future molecular and infection studies will need to determine if and how Enas play a role in the virulence of spore-borne pathogenic Bacilli. The advances in uncovering the genetic identity and the structural aspects of the Enas presented in this work now enable in vitro and in vivo molecular studies to tease out their biological role(s), and to gain insights into the basis for Ena heterogeneity amongst different Bacillus species.
After isolation of Ena1B recombinantly produced S-fibers in cellulo, a suspension of Ena1B S-type fibers was prepared by diluting the Ena1B stock solution in miliQ to a final concentration of either 100 mg·mL−1 or 25 mg·mL−1. 50 μl of this Ena1B suspension was drop-cast onto a siliconized cover slip with a diameter of 18 mm and incubated at 60° C. for 1 h. Resulting thin films were either used as is (
ENA hydrogel preparation: 50 μl of a 100 mg·ml−1 Ena1B S-type fiber suspension was pipetted onto a siliconized coverslip and air-dried at 22° C. for 1 h (
Reinforced ENA hydrogel preparation: 20 μl droplets of a 100 mg·ml−1 Ena1B S-type fiber suspension were dropped into 4 M MgCl2, 5 M NaCl or 100% (v/v) absolute ethanol and incubated for 1 h at 22° C. The high viscosity of the ENA droplets prevents mixing of the fiber suspension with the chosen solutions, effectively stabilizing the droplet geometry during the incubation period. The low water activity of the salt or ethanol solution leads to a gradual dehydration of the ENA droplet, resulting in the formation of a dense ENA hydrogel. The ENA hydrogel beads were transferred three times to 1 mL of miliQ for removal of salt or ethanol and left to air-dry for 24 h at 22° C. (
A mature spore from a quadruple Ena-knockout strain (Δena1A-1B-1C-ena3A) derived from B. cereus NVH 0075-95 revealed a complete absence of any endospore appendages (
So, based on the identification of Ena3A as a further member of the Ena protein family, essential and sufficient to form L-type Ena fibers on Bacillus endospores, blast searches and a phylogenetic analysis were performed to provide candidate orthologues of Bacillus cereus Ena3A (as presented in SEQ ID NO:49). Multiple sequence alignment of the identified homologues (SEQ ID NO:50-80) is shown in
As a representative family member, the Ena3A protein presented in SEQ ID NO:49 was recombinantly expressed, also called herein ‘recEna3A’, and shown to produce helical, 7-start ladder-like (L-type) fibers with a helical twist of 18.4 degrees, a rise of 44.9 Å, and a diameter of 75 Å. L-type fibers are constructed of vertically stacked Ena3A heptameric rings, that are covalently connected via 7 N-terminal connectors. As shown in
The in vitro recombinant production of short Ena3 L-type fibers was obtained by expressing sterically blocked Ena3A, purification of the Ena3A multimers, followed by assembly of L-fibers after co-incubation with TEV protease (
So, the CryoEM structure of the Ena3A L-type fiber subunit of Bacillus cereus strain ATCC_10987 (WP_017562367.1; SEQ ID NO:49) provides the cryo-EM model as shown in
Thus, Ena3A subunits can be unambiguously identified based on a HMM profile search, resulting in a DUF3992 classification, followed by de novo structure prediction and comparison with the Ena3A cryoEM structures disclosed herein. A self-assembling Ena subunit will contain the eight-stranded Ena beta-sandwich fold with a Dali Z-score to Ena3A (SEQ ID NO: 49) of 6.5 or higher, and will contain an N-terminal connector peptide with a Z-N-C(C)-M-C-X motif for disulphide-mediated cross-linking in the Ena fiber, where Z is Leu, Ile, Val or Phe, N is 1 or 2 residues, C is Cys, M is 10 to 12 amino acids, and X is any amino acid. Self-assembly and fiber formation of candidate Ena subunits is tested by recombinant expression in the cytoplasm of E. coli, and negative stain transmission electron microscopy visualization of isolated fiber material, as described herein in Materials and Methods.
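As an illustration of the sequence criterion above, the Z-N-C(C)-M-C-X motif can be translated into a regular expression. The sketch below is one possible reading of the motif, under the following assumptions: the second Cys in C(C) is treated as optional, the search is restricted to a hypothetical 40-residue N-terminal window, and the example sequences are synthetic placeholders rather than Ena sequences. It does not replace the HMM profile search or the structural comparison.

```python
import re

# One possible regex reading of the Z-N-C(C)-M-C-X motif (assumption):
#   Z = L/I/V/F, N = 1-2 residues, C = Cys, (C) = optional second Cys,
#   M = 10-12 residues, C = Cys, X = any residue.
NTC_MOTIF = re.compile(r"[LIVF][A-Z]{1,2}CC?[A-Z]{10,12}C[A-Z]")


def find_ntc_motif(sequence, window=40):
    """Search the first `window` residues for the connector motif.

    Returns (start_index, matched_text) or None. The window size is a
    hypothetical choice for this sketch, not a value from the disclosure.
    """
    match = NTC_MOTIF.search(sequence.upper()[:window])
    if match is None:
        return None
    return match.start(), match.group()


if __name__ == "__main__":
    # Synthetic test sequence, NOT a real Ena protein.
    synthetic = "MAVSCC" + "A" * 11 + "CG" + "T" * 20
    print(find_ntc_motif(synthetic))
```

A hit from such a pattern search would only flag a candidate; classification as an Ena subunit still relies on the DUF3992 assignment and the fold comparison described in the text.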
To confirm that, besides Ena1B and Ena3A, the in vitro recombinant production method is generically applicable to all Enas for formation of their typical fibers, the in vitro assembly of Ena2A S-type fibers is shown in
Similarly, as a confirmation that the in cellulo or in vivo E. coli production of recombinant Ena fiber is also applicable to further Ena family members as shown for Ena1B and Ena3A, the recombinant expression of an Ena2A without steric block in E. coli resulted in ‘in cellulo’ assembly of S-fibers in the cytoplasm, followed by isolation of the fibers from the cell culture (
As shown in example 4 for Ena1C, multimeric disc-type structures rather than helical multimers are formed in vitro using recombinant EnaC proteins. To further support this for Ena2C, recombinant Ena2C multimers, in the form of nonameric discs, were similarly generated by expressing sterically blocked Ena2C (as presented in SEQ ID NO:146) with N-terminal 6×His-TEV blocker in E. coli BL21 C43.
Isolation of the multimers and removal of the blocker by cleavage using TEV protease (as provided in the methods described herein) further resulted in L-type-like filaments, although these filaments were highly flexible and curved into closed loops (
The atomic model from recEna1B S-type fibers shows that the N-terminal connector (Ntc) of subunit i connects to subunits i−9 and i−10 via disulphide cross-linking. Although lateral, non-covalent contacts do exist between two neighbouring subunits (i−1,i), these interactions are not expected to be sufficient to form robust fibers. To test that hypothesis, a recEna1BΔNtc (deletion of residues 2-15 of WT Ena1B of SEQ ID NO:8) was cloned and expressed in E. coli. Cells were harvested after overnight induction and deposited directly onto a TEM grid and analysed using ns-TEM (
Given the original steric block construct, used for the recombinant expression experiments exemplified herein contained 15 additional amino acids over the native Ena sequence (M-His6-SSG-TEV, MHHHHHHSSGENLYFQ-Ena1B, additional amino acids shown in bold), we made constructs containing smaller steric blocks of only 6 (M-TEV-Ena1B, M-ENLYFQ-Ena1B, wherein Ena1B is SEQ ID NO:8 without N-terminal M) or 9 (M-His6-SSG-Ena1B) additional amino acid residues at the N-terminus (
Constructs were designed to introduce an HA-tag (YPYDVPDYA) in the BC, DE, EF and HI loop regions of Ena1B, flanked by BamHI sites. For the DE loop, a second construct containing a FLAG-tag (DYKDDDDK) was designed as well. The FLAG-tag is also flanked by BamHI sites. Clear examples of peptide tag insertion in target loops are shown in the aligned sequences below and in
Alignment of Ena1B native sequence (SEQ ID NO:8) with engineered Ena1B insertion variants:
TGPVTIVFYSGGVTGTAVETIVVATGSSASFTVRRFDTVTILG...........TAAAETGEFCMTIRYTLS
SGPVTIVFYSGGVTGTAVETIVVATGSSASFTVRRFDTVTILG...........TAAAETGEFCMTIRYTLS
SGPVTIVFYSGGVTGTAVETIVVATGSSASFTVRRFDTVTILG...........TAAAETGEFCMTIRYTLS
TGPVTIVFYSGGVTGTAVETIVVATGSSASFTVRRFDTVTILGGSYPYDVPDYAGSAAETSEFCMTIRYTLS
indicates data missing or illegible when filed
Furthermore, engineering of the Ena proteins into Ena split-variants also allowed in cellulo assembly of S-type Ena fibers, as shown in
Thus, Ena protein subunits can be used as engineered Ena subunits by providing them for recombinant expression as split-proteins, wherein at least a split into two polypeptides is shown here to still allow fold complementation upon co-expression and subsequent self-assembly into Ena S-type fibers.
Isolated recombinantly produced 6×His_TEV_Ena1B multimers were co-incubated with 100 nm Maleimide Super Mag Magnetic Beads (Raybiotech) in 1×PBS for 3 h at RT with continuous shaking and subjected to 3 rounds of washing in 1×PBS to remove any non-bound, sterically blocked Ena1B multimers. Next, the Ena1B functionalized magnetic beads were co-incubated with rec_6×His_TEV_Ena1B solution and TEV-protease, in 1×PBS for 1 h at RT with continuous shaking, and subjected to 3 rounds of washing in 1×PBS to remove any non-bound rec_6×His_TEV_Ena1B and TEV-protease. Next, 3 μl of the functionalized bead suspension was deposited onto a TEM grid and subjected to nsTEM analysis, revealing the presence of short S-type Ena1B fibers tethered to the surface of the magnetic beads (see expanded view in the right figure panel of
Recombinantly produced Ena1B S-type fibers were biotinylated using Biotin-dPEG11-MAL (Sigma-Aldrich) during 1 h at RT in 100 mM Tris pH 7.0, and subjected to 2 rounds of washing with miliQ water to remove any non-bound Biotin-dPEG11-MAL. Next, biotinylated Ena1B S-type fibers were co-incubated with streptavidin-coated gold beads (1.25 μm diameter), deposited onto a TEM grid and subjected to nsTEM analysis. Recorded micrographs demonstrate the successful functionalization of gold beads with S-type fibers, i.e. clear tethering of fibers onto the bead surface (
Solvent exposed threonine residues on the surfaces of Ena1B S-type or Ena3A L-type fibers were substituted with cysteines to serve as covalent, lateral anchoring points through the formation of inter-fiber disulphide bridges. Each of the recombinantly produced proteins Ena1B T31C, Ena3A T40C and Ena3A T69C expressed and self-assembled well in the E. coli cytoplasm. Extraction of the Ena fibers was performed under oxidative conditions to facilitate S-S formation. nsTEM analysis of subsequently obtained fiber fractions revealed the presence of highly entangled Ena fiber networks, both for the Ena1B and the Ena3A point mutants (
Based on the observations and analyses presented herein, the Ena proteins are identified as a novel bacterial family of pili-forming protein subunits, belonging to the bacterial DUF3992 proteins, and containing an N-terminal conserved Cys-containing motif. First, identification of bacterial Ena protein family members is based on the amino acid sequence containing a DUF3992 domain, which can be analysed for adhering to the HMM profile of PFAM13157 as shown in Table 1 (or in the PFAM database: https://pfam.xfam.org/family/PF13157#tabview=tab4), and which contains an N-terminal connector (Ntc) comprising at least one conserved Cys, as presented herein, which corresponds to a conserved motif ZXnCCXmC, wherein Z is Leu, Ile, Val, or Phe, n is 1 or 2, and m is between 10 and 12 for Ena1/2 A & B proteins (see
Second, the structural requirements for a protein to be classified as an Ena protein are unambiguously derivable from its (predicted) fold, which may simply be based on its amino acid sequence supplied to a modelling tool, as known in the art, and compared to the Ena1B cryo-EM reference structure, as presented herein, and as deposited in the Protein Data Bank with entry PDB7A02 (Version 1.0; entry submitted Aug. 6, 2000, released Aug. 24, 2000), wherein the fold similarity score, i.e. the Dali Z score, of the predicted fold is 6.5 or higher, since Z-scores higher than (n/10) minus 4, wherein n is the sequence length as the number of amino acids, are considered to correspond to highly significant fold similarities (Holm et al., 2008; Vol. 24 no. 23 p. 2780-2781; doi:10.1093/bioinformatics/btn507). Alternatively, the Ena3 cryo EM reference structure, as presented herein, can be used for determining the fold similarity, as shown in
Modelling of protein folds can be done by de novo prediction tools, for instance, but not limited to, currently available resources such as Robetta (https://robetta.bakerlab.org/) or AlphaFold v2.0 (Jumper, et al. 2021, Nature; doi.org/10.1038/s41586-021-03819-2), or by homology based protein modelling, for instance, but not limited to, available tools like SWISS-MODEL (https://academic.oup.com/nar/article/46/W1/W296/5000024), Phyre2 (https://www.nature.com/articles/nprot.2015.053), RaptorX (https://www.nature.com/articles/nprot.2012.085) and others.
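The fold-similarity criterion referred to herein (a Dali Z score of 6.5 or higher, with Z-scores above (n/10) minus 4 regarded as highly significant) amounts to simple arithmetic once a Z score is available. The sketch below only encodes that arithmetic; the function names are hypothetical, and the Z score itself is assumed to come from a separate structural comparison of the predicted model against the reference structure, which is not performed here.

```python
# Minimal sketch of the Z-score criteria quoted in the text.
# Assumption: z_score was obtained elsewhere (e.g. from a Dali comparison).

def dali_significance_threshold(n_residues):
    """Length-dependent threshold (n/10) - 4 cited from Holm et al., 2008."""
    return n_residues / 10 - 4


def passes_ena_fold_cutoff(z_score, cutoff=6.5):
    """True if the predicted fold meets the fixed cutoff of 6.5 or higher."""
    return z_score >= cutoff


if __name__ == "__main__":
    # For a 105-residue query the length-dependent threshold equals 6.5.
    print(dali_significance_threshold(105))
    print(passes_ena_fold_cutoff(8.2), passes_ena_fold_cutoff(5.9))
```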
For instance, structural comparison of a number of selected Ena candidate orthologues, characterized by the DUF3992 classification and the presence of an N-terminal connector, was performed for each 20 structure (shown in
Materials and Methods
Culture of B. cereus and Appendages Extraction
For extraction of Enas, the B. cereus strain NVH 0075-95 was plated on blood agar plates and incubated at 37° C. for 3 months. Upon maturation, the spores were resuspended and washed in milli-Q water three times (centrifugation 2400×g at 4° C.). To get rid of various organic and inorganic debris, the pellet was then resuspended in 20% Nycodenz (Axis-Shield) and subjected to Nycodenz density gradient centrifugation where the gradient was composed of a mixture of 45% and 47% (w/v) Nycodenz in 1:1 v/v ratio. The pellet consisting only of the spore cells was then washed with 1 M NaCl and TE buffer (50 mM Tris-HCl; 0.5 mM EDTA) containing 0.1% SDS, respectively. To detach the appendages, the washed spores were sonicated at 20 kHz ± 50 Hz and 50 watts (Vibra Cell VC50T; Sonic & Materials Inc.; U.S.) for 30 s on ice followed by centrifugation at 4500×g, and appendages were collected in the supernatant. To further get rid of the residual components of spore and vegetative mother cells, n-hexane was added and vigorously mixed with the supernatant in 1:2 v/v ratio. The mixture was then left to settle to allow phase separation of water and hexane. The hexane fraction containing the appendages was then collected and kept at 55° C. under pressurized air for 1.5 hrs to evaporate the hexane. The appendages were finally resuspended in milli-Q water for further cryo-EM sample preparation.
Recombinant Expression, Purification and In Vitro Assembly of Ena1B Appendages
Ena1B was codon optimized for expression in E. coli, synthesized and cloned into the pET28a expression vector at Twist Bioscience (SEQ ID NO:83). The insert was designed to have an N-terminal 6× histidine tag on Ena1B along with a TEV protease cleavage site (SEQ ID NO:89: ENLYFQG) in between. Large scale recombinant expression was carried out in phage resistant T7 Express lysY/Iq E. coli strain from NEB. A single colony was inoculated into 20 mL of LB and grown at 37° C. with shaking at 150 rpm overnight for primary culture. The next morning, 6 L of LB was inoculated with 20 mL/L of primary culture and grown at 37° C. with shaking until the OD600 reached 0.8, after which protein expression was induced with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). The culture was incubated for a further 3 hrs at 37° C. and harvested by centrifugation at 5,000 rpm. The whole-cell pellet was resuspended in soluble lysis buffer (20 mM Potassium Phosphate, 500 mM NaCl, 10 mM β-ME, 20 mM imidazole, pH 7.5) and sonicated on ice for lysis. The lysate was centrifuged at 18,000 rpm for 45 min in a JA-20 rotor from Beckman Coulter to separate the soluble and insoluble fractions. The pellet was further dissolved in denaturing lysis buffer consisting of 8 M urea in lysis buffer. The dissolved pellet was then passed through HisTrap HP columns packed with Ni Sepharose and equilibrated with denaturing lysis buffer. The bound protein was then eluted from the column with elution buffer (20 mM Potassium Phosphate, pH 7.5, 8 M Urea, 250 mM imidazole) in a gradient mode (20-250 mM Imidazole) using an AKTA purifier at room temperature. Recombinantly purified Ena1B with intact N-terminal 6×His tag in denaturing conditions was subjected to buffer exchange with soluble lysis buffer using a dialysis button from Hampton. As the N-terminal His tag hindered the formation of the double disulphide bridge between two monomers, Ena1B assembled into spirals (
Isolation of Recombinant In Vivo/in Cellulo Ena Fibers from Escherichia coli: [as Exemplified Herein for S-Type Fibers as in
Inoculate 1 liter of LB, 50 μg/ml kanamycin with 20 mL of an overnight pre-culture of E. coli C43(DE3) pET28a Ena1B or Ena3A, without steric block (i.e. for instance without HIS tag-TEV cleavage site as compared to in vitro assembly method). Incubate in a rotary shaker at 37° C. until mid-exponential phase (OD=0.7-1.0), lower temperature to 25° C. and add 1 mM final isopropyl β-d-1-thiogalactopyranoside. Incubate for 18 h, and harvest cells using a JLA 8.1 rotor at 5,000 rcf and 4° C. Resuspend cell pellets in 1×PBS, 1% (w/v) sodium dodecyl sulfate (SDS) using an overhead stirrer mounted with a propeller style agitator at 2000 rpm. Incubate the cell slurry for 30 min on a magnetic hotplate set to 99° C. while continuously stirring with a magnetic stirrer bar. Transfer homogenized lysate to 50 ml falcon tubes and centrifuge for 30 min at 20,000 rcf in a JLA 14.5 rotor at 20° C. Discard supernatant and resuspend pellets in 1×PBS using a Potter-Elvehjem tissue grinder with radial serrations and centrifuge homogenate for 30 min at 20,000 rcf. Discard supernatant and resuspend pellets in miliQ and centrifuge for 30 min at 20,000 rcf. Redissolve cleared Ena pellets in miliQ to reach desired final concentration.
Ena Treatment Experiments to Test its Robustness
Ex vivo Enas extracted from B. cereus strain NVH 0075-95 (see above) were resuspended in deionized water, autoclaved at 121° C. for 20 minutes to ensure inactivation of residual bacteria or spores, and subjected to treatment with buffer or as indicated below and shown in
Negative-Stain Transmission Electron Microscopy (TEM)
For visualization of spores and recombinantly expressed appendages by NS-TEM, formvar/carbon coated copper grids with 400-hole mesh from Electron Microscopy Sciences were glow discharged in an ELMO glow discharger with a plasma current of 4 mA at vacuum for 45 s. 3 μL of sample was applied on the grids and allowed to bind to the support film for 1 min, after which the excess liquid was blotted off with Whatman grade 1 filter paper. The grid was then washed three times using three 15 μL drops of milli-Q followed by blotting of excess liquid. The washed grid was dipped three times in 15 μL drops of 2% uranyl acetate for 10 s, 2 s and 1 min, with a blotting step in between each dip. Finally, the uranyl acetate coated grids were blotted until dry. The grids were then screened using a 120 kV JEOL 1400 microscope equipped with a LaB6 filament and TVIPS F416 CCD camera. 2D classes of the appendages were generated in RELION 3.0 as described below.
Preparation of Cryo-TEM Grids and Cryo-EM Data Collection
QUANTIFOIL® holey Cu 400 mesh grids with 2 μm holes and 1 μm spacing were first glow discharged in vacuum using a plasma current of 5 mA for 1 min. 3 μL of 0.6 mg/mL Graphene Oxide (GO) solution was applied onto the grid and incubated for 1 min for adsorption at room temperature. Excess GO was then blotted off using Whatman grade 1 filter paper and the grid was left to dry. For cryo-plunging, 3 μL of protein sample was applied on the GO coated grids at 100% humidity and room temperature in a Gatan CP3 cryo-plunger. After 1 min of adsorption the grid was machine-blotted with Whatman grade 2 filter paper for 5 s from both sides and plunge frozen into liquid ethane at −180° C. Grids were then stored in liquid nitrogen until data collection. Two datasets were collected for ex vivo and recEna1B appendages with slight changes in the collection parameters. High resolution cryo-EM 2D micrograph movies were recorded on a JEOL Cryoarm300 microscope automated with SerialEM in counting mode. For the ex vivo grown appendages, the microscope was equipped with a K2 summit detector and had the following settings: 300 keV, 100 μm aperture, 30 frames, 62.5 e−/Å2, 2.315 s exposure, and 0.82 Å/pxl. For the recEna1B dataset a K3 detector was used instead, with a pixel size of 0.782 Å/pxl and an exposure of 64.66 e−/Å2 taken over 61 frames.
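For orientation, the per-frame electron dose implied by the collection settings above follows directly from the total exposure and the number of frames. The short sketch below only performs that division; the function names are hypothetical, and the values are the ones quoted for the two datasets.

```python
# Arithmetic sketch: dose per frame and dose rate from the quoted settings.

def dose_per_frame(total_dose_e_per_A2, n_frames):
    """Electron dose per movie frame in e-/A^2."""
    return total_dose_e_per_A2 / n_frames


def dose_rate(total_dose_e_per_A2, exposure_s):
    """Average dose rate in e-/A^2 per second."""
    return total_dose_e_per_A2 / exposure_s


if __name__ == "__main__":
    # Ex vivo dataset (K2): 62.5 e-/A^2 over 30 frames in 2.315 s.
    print(round(dose_per_frame(62.5, 30), 3), round(dose_rate(62.5, 2.315), 2))
    # recEna1B dataset (K3): 64.66 e-/A^2 over 61 frames.
    print(round(dose_per_frame(64.66, 61), 3))
```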
Image Processing
MOTIONCORR2 (Zheng et al., 2017) implemented in RELION 3.0 (Zivanov et al., 2018) was used to correct for beam-induced image motion and averaged 2D micrographs were generated. The motion-corrected micrographs were used to estimate the CTF parameters using CTFFIND4.2 (Rohou and Grigorieff, 2015) integrated in RELION 3.0. Subsequent processing used RELION 3.0 and SPRING (Desfosses et al., 2014). For both datasets, the coordinates of the appendages were boxed manually using e2helixboxer from the EMAN2 package (Tang et al., 2007). Special care was taken to select micrographs with good ice and straight stretches of Ena filaments. The filaments were segmented into overlapping single-particle boxes of dimension 300×300 pxl with an inter-box distance of 21 Å. For the ex vivo Enas a total of 53,501 helical fragments was extracted from 580 micrographs with an average of 2-3 long filaments per micrograph. For the recEna1B filaments, 100,495 helical fragments were extracted from 3,000 micrographs with an average of 4-5 filaments per micrograph. To filter out bad particles, multiple rounds of 2D classification were run in RELION 3.0. After several rounds of filtering, datasets of 42,822 and 65,466 good particles of the ex vivo and recEna1B appendages were selected, respectively.
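The segmentation geometry described above (300×300 pxl boxes placed every 21 Å along a filament) can be made concrete with a small calculation. The sketch below is an illustration of the arithmetic only and is not the algorithm used inside e2helixboxer or RELION; the function names and the straight-filament simplification are assumptions.

```python
import math

# Illustrative arithmetic for overlapping helical segments along a straight filament.

def box_size_angstrom(box_px, pixel_size_A):
    """Physical box edge length in Angstrom."""
    return box_px * pixel_size_A


def overlap_fraction(box_A, step_A):
    """Fraction of a box shared with its neighbour at the given inter-box distance."""
    return 1.0 - step_A / box_A


def segment_centres(filament_length_A, box_A, step_A=21.0):
    """Centre positions (in Angstrom along the axis) of boxes that fit entirely
    within a straight filament of the given length."""
    if filament_length_A < box_A:
        return []
    n = int(math.floor((filament_length_A - box_A) / step_A)) + 1
    return [box_A / 2 + k * step_A for k in range(n)]


if __name__ == "__main__":
    box = box_size_angstrom(300, 0.82)            # ex vivo dataset pixel size
    print(round(box, 1), round(overlap_fraction(box, 21.0), 3))
    print(len(segment_centres(1000.0, 246.0)))    # boxes along a 1000 A stretch
    # Fraction of segments retained after 2D classification, from the quoted counts.
    print(round(42822 / 53501, 3), round(65466 / 100495, 3))
```

With the ex vivo pixel size, neighbouring boxes overlap by roughly 91%, which is why a few hundred micrographs yield tens of thousands of helical fragments.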
After running ~50 iterations of 2D classification, well-resolved 2D class averages could be obtained. segclassexam of the SPRING package (Desfosses et al., 2014) was used to generate B-factor enhanced power spectra of the 2D class averages. The generated power spectrum had an amplified signal-to-noise ratio with well resolved layer lines (
Model Building
To improve the connectivity of the asymmetrical units, the density modification tool for cryo-EM implemented in PHENIX (Afonine et al., 2018) was used. At first, the primary skeleton for a single asymmetric subunit was generated from the density modified map in Coot (Emsley et al., 2010). The primary sequence of Ena1B was manually threaded into the asymmetric unit and fitted into the map taking into consideration the chemical properties of the residues. The SSM Superpose option in Coot was used to build the helix from a single subunit. The built model was then subjected to multiple rounds of real-space structural refinement in Phenix, and each residue was manually inspected after every round of refinement. Model validation was done in Refmac implemented in Phenix. All the visualizations and images for figures were generated in ChimeraX (Goddard et al., 2018), Chimera (Pettersen et al., 2004) and PyMOL.
Immunostaining of Enas
Aliquots of purified RecEna1A, RecEna1B and RecEna1C were sent to Davids Biotechnologie GmbH (Germany) for rabbit immunization (28-day SuperFast immunization schedule; A055). Sera were received after one month and used without further affinity purification. For immunostaining EM imaging, 3 μl aliquots of purified ex vivo Enas were deposited on Formvar/Carbon grids (400 Mesh, Cu; Electron Microscopy Sciences), washed with 1×PBS, and incubated for 1 h with 0.5% (w/v) BSA in 1×PBS. After additional washing with 1×PBS, separate grids were incubated for 2 h at 37° C. with 1000-fold dilutions in 1×PBS of anti-Ena1A, anti-Ena1B, and anti-Ena1C sera, respectively. Following washing with 1×PBS, grids were incubated for 1 h at 37° C. with a 2000-fold dilution of 10 nm gold-labeled anti-rabbit IgG produced in goat (affinity-isolated antibody; G7277-.4ML; Sigma-Aldrich).
Quantitative RT-PCR
Quantitative RT-PCR experiments were performed on mRNA isolated from B. cereus harvested from three independent Bacto media cultures (37° C., 150 rpm) at 4, 8, 12 and 16 h post-inoculation. RNA extraction, cDNA synthesis and RT-qPCR analysis were performed essentially as described before (Madslien et al., 2014), with the following changes: pre-heated (65° C.) TRIzol Reagent (Invitrogen) was used, and bead beating was performed 4 times for 2 min in a Mini-BeadBeater-8 (BioSpec) with cooling on ice in between. Each RT-qPCR of the RNA samples was performed in triplicate, no template was added in negative controls, and rpoB was used as internal control. Slopes of the standard curves and PCR efficiency (E) for each primer pair were estimated by amplifying serial dilutions of the cDNA template. For quantification of mRNA transcript levels, Ct (threshold cycle) values of the target genes and the internal control gene (rpoB) derived from the same sample in each RT-qPCR reaction were first transformed using the term E^(-Ct). The expression levels of target genes were then normalized by dividing their transformed Ct values by the corresponding values obtained for the internal control gene (Duodu et al., 2010; Madslien et al., 2014; Pfaffl, 2001). The amplification was conducted using StepOne PCR software V.2.0 (Applied Biosystems) with the following conditions: 50° C. for 2 min, 95° C. for 2 min, 40 cycles of 15 s at 95° C., 1 min at 60° C. and 15 s at 95° C. All primers used for RT-qPCR analyses are listed in Table 2. Regular PCR reactions were performed on cDNA to confirm that enaA and enaB were expressed as an operon, using the primers 2180/2177 and 2176/2175 and DreamTaq DNA polymerase (Thermo Fisher), with amplification in an Eppendorf Mastercycler using the following program: 95° C. for 2 min, 30 cycles of 95° C. for 30 s, 54° C. for 30 s, and 72° C. for 1 min.
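The normalization described above can be written out as a short sketch. The function names are hypothetical. The relation E = 10^(-1/slope) is the standard way to derive an amplification factor from a standard-curve slope and is assumed here rather than quoted from the protocol. E is expressed as an amplification factor, so that E = 2 corresponds to perfect doubling per cycle.

```python
import math


def efficiency_from_slope(slope):
    """Amplification factor E from the slope of a standard curve
    (Ct plotted against log10 of template amount). Assumed relation:
    E = 10 ** (-1 / slope)."""
    return 10 ** (-1.0 / slope)


def relative_expression(ct_target, ct_ref, e_target=2.0, e_ref=2.0):
    """Normalize a target gene to the internal control (e.g. rpoB):
    E_target ** (-Ct_target) divided by E_ref ** (-Ct_ref)."""
    return (e_target ** -ct_target) / (e_ref ** -ct_ref)


# Illustrative values only, not measured data: with perfect doubling,
# a target crossing threshold 2 cycles after the control is 4-fold lower.
print(relative_expression(ct_target=20.0, ct_ref=18.0))  # 0.25
```

Using a primer-specific E for both the target and the control, rather than assuming E = 2 throughout, is what distinguishes this transformation from a plain 2^(-ΔCt) calculation.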
Construction of Deletion Mutants
The B. cereus strain NVH 0075/95 was used as background for gene deletion mutants. The ena1B gene was deleted in-frame by replacing the reading frame with ATGTAA (5′-3′) using a markerless gene replacement method (Janes and Stibitz, 2006) with minor modifications. The Δena1B Δena1C double mutant was constructed by deletion of ena1C in the B. cereus strain NVH 0075/95 Δena1B background.
To create the deletion mutants, the regions upstream (primers A and B, Table 2) and downstream (primers C and D, Table 2) of the target ena genes were amplified by PCR. To allow assembly of the PCR fragments, primers B and C contained complementary overlapping sequences. An additional PCR step was then performed, using the upstream and downstream PCR fragments as template and the A and D primer pair (Table 2). All PCR reactions were conducted using an Eppendorf Mastercycler gradient and high-fidelity AccuPrime Taq DNA Polymerase (ThermoFisher Scientific) according to the manufacturer's instructions. The final amplicons were cloned into the thermosensitive shuttle vector pMAD (Arnaud et al., 2004) containing an additional I-SceI site as previously described (Lindback et al., 2012). The pMAD-I-SceI plasmid constructs were passed through One Shot™ INV110 E. coli (ThermoFisher Scientific) to obtain unmethylated DNA and thereby enhance the transformation efficiency in B. cereus. The unmethylated plasmids were introduced into B. cereus NVH 0075/95 by electroporation (Mahillon et al., 1989). After verification of transformants by PCR, the plasmid pBKJ233 (unmethylated), containing the gene for the I-SceI enzyme, was introduced into the transformant strains by electroporation. The I-SceI enzyme makes a double-stranded DNA break in the chromosomally integrated plasmid. Subsequently, homologous recombination events lead to excision of the integrated plasmid, resulting in the desired genetic replacement. The gene deletions were verified by PCR amplification using primers A and D (Table 2) and DNA sequencing (Eurofins Genomics).
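The requirement that primers B and C carry complementary overlapping sequences can be checked computationally. The sketch below assumes that primer B is the reverse primer of the upstream fragment and primer C the forward primer of the downstream fragment, so that the reverse complement of B should end with the same bases that C begins with. The function names and the two example primers are hypothetical and are not taken from Table 2.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primer_overlap(primer_b, primer_c):
    """Longest stretch shared by the 3' end of reverse_complement(primer_b)
    and the 5' end of primer_c; returns '' if the primers do not overlap."""
    top_b = reverse_complement(primer_b)
    primer_c = primer_c.upper()
    # Try the longest candidate first so the first match is the maximum.
    for k in range(min(len(top_b), len(primer_c)), 0, -1):
        if top_b.endswith(primer_c[:k]):
            return primer_c[:k]
    return ""


# Hypothetical primers designed around an ATGTAA junction.
overlap = primer_overlap("CGAATTACATCCTTG", "AGGATGTAATTCGATCC")
print(overlap, len(overlap))
```

An overlap that spans the ATGTAA junction means the fused A-D amplicon carries the in-frame replacement exactly where the reading frame was removed.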
Search for Orthologues and Homologues of Ena1
Publicly available genomes of species belonging to the Bacillus s.l. group were downloaded from the NCBI RefSeq database (n=735; https://www.ncbi.nlm.nih.gov/refseq/). Except for strains of particular interest due to phenotypic characteristics (GCA_000171035.2_ASM17103v2, GCA_002952815.1_ASM295281v1, GCF_000290995.1_Baci_cere_AND1407_G13175) and species for which closed genomes were non-existent or very scarce, all assemblies included were closed and publicly available genomes from the curated database of NCBI RefSeq. Assemblies were quality checked using QUAST (Gurevich et al., 2013), and only genomes of correct size (~4.9-6 Mb) and a GC content of ~35% were included in the downstream analysis. Pairwise tBLASTn searches were performed (e-value 1e-10, max_hsps 1, otherwise default settings) to search for homologs and orthologs of the following query protein sequences from strain NVH 0075-95: Ena1A (SEQ ID NO:1), Ena1B (SEQ ID NO:87), Ena1C (SEQ ID NO:15). The Ena1B protein sequence (SEQ ID NO:87) used as query originated from an in-house sequenced amplicon, while the Ena1A and Ena1C protein sequence queries originated from the assembly for strain NVH 0075-95 (Accession number GCF_001044825.1, proteins KMP91697.1 and KMP91699.1, respectively). We considered proteins orthologs or homologs when a subject protein matched the query protein with high coverage (>70%) and moderate sequence identity (>30%).
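The ortholog/homolog criterion can be expressed as a simple filter over search hits. In this minimal sketch the `Hit` record, its field names and the example values are hypothetical. It assumes that percent identity and query coverage have already been extracted from the tBLASTn output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str       # query protein, e.g. "Ena1A"
    subject: str     # genome or locus identifier of the match
    identity: float  # percent sequence identity
    coverage: float  # percent of the query covered by the alignment


def is_homolog(hit, min_coverage=70.0, min_identity=30.0):
    """Apply the thresholds from the text: coverage >70% and identity >30%."""
    return hit.coverage > min_coverage and hit.identity > min_identity


# Illustrative hits only, not results from the actual search.
hits = [
    Hit("Ena1A", "genome_1", identity=92.0, coverage=100.0),
    Hit("Ena1A", "genome_2", identity=28.5, coverage=95.0),
    Hit("Ena1B", "genome_3", identity=45.0, coverage=60.0),
]
kept = [h.subject for h in hits if is_homolog(h)]
print(kept)  # ['genome_1']
```

Both thresholds are strict inequalities, matching the ">" used in the description, so a hit at exactly 70% coverage or 30% identity is excluded.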
Comparative Genomics of the Ena-Genes and Proteins
Phylogenetic trees of the aligned Ena1A-C proteins were constructed for all hits resulting from the tBLASTn search. The amino acid sequences were aligned using mafft v.7.310 (Katoh et al., 2019), and approximately-maximum-likelihood phylogenetic trees of the protein alignments were made using FastTree with default settings and the JTT+CAT model (Price et al., 2010). All trees were visualized in Microreact (Argimon et al., 2016), and the metadata of species and the presence or absence of Ena1A-C and Ena2A-C were overlaid on the figures.
1 Numbers reflect the density modified cryo-EM map calculated using ResolveCryoEM (Terwilliger et al., 2019)
2 Numbers reflect a S-type Ena model with 23 Ena1B protomers
3 Numbers for a single Ena1B protomer
CCTCTCTACATAGCCTTTCCCCTCTCTCTT
AAGGCTATGTAGAGAGGGGAATTAGTAT
CCATATATTACA
ACTAATTCCCCTCTC
AATTAGTATGTAATATATGGTGATTTAAAGATT
ATTTTTTTGTTATCCTTTTCATAAGACTGTTTAC
TGAAAAGGATAACAAAAAAATTATTGCTTTTG
CCATATATTACATAGCCTTTCCCCTCTC
AAAGGCTATGTAATATATGGTGATTTAAAGAT
To allow assembly of the PCR fragments, primers B and C contain sequences overlapping each other (italic).
Number | Date | Country | Kind |
---|---|---|---|
20189961.4 | Aug 2020 | EP | regional |
This application is a national phase entry under 35 U.S.C. § 371 of International Patent Application PCT/EP2021/072085, filed Aug. 6, 2021, designating the United States of America and published in English as International Patent Publication WO 2022/029325 on Feb. 10, 2022, which claims the benefit under Article 8 of the Patent Cooperation Treaty to European Patent Application Serial No. 20189961.4, filed Aug. 7, 2020, the entireties of which are hereby incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2021/072085 | 8/6/2021 | WO |