The present application claims the benefit of priority to Chinese Patent Application No. 202010436910.X, entitled “METHOD FOR BIOSYNTHESIS OF PROTEIN HETEROGENEOUS CATENANE” and filed on May 21, 2020, the entire contents of which, including the Appendix, are incorporated herein by reference.
The present disclosure relates to a method for biosynthesis of a protein heterocatenane, in particular to a biosynthetic system based on peptide-protein reactive pairs and/or split inteins, and a method for constructing a multi-domain protein heterocatenane by two orthogonal coupling cyclization modes on the basis of said system.
In nature, many biological macromolecules have specific topologies that are closely related to their respective biological functions. Natural topological proteins found so far include cyclic proteins, knotted proteins, lasso proteins, and protein catenanes. Since the construction of cyclic proteins requires only the coupling of polypeptide chains, cyclic proteins are currently the focus of research on artificial topological proteins, and they typically show significantly improved thermal stability. Due to the complexity of the protein folding mechanism, it is relatively difficult to regulate the topology of proteins by controlling the entwining relationship between polypeptide chains. The [2]catenane, the simplest of the catenanes, is composed of two mechanically interlocked cyclic motifs; hence the corresponding protein heterocatenane structure can not only possess the advantages of cyclic proteins but also achieve synergistic functions by regulating the relative positions of the two cyclic motifs. Nevertheless, such a structure has not been found in nature. Developing a preparation method for protein heterocatenanes is thus a very attractive research direction.
Relatively few reports are currently available on the synthesis of artificial protein catenanes. The synthetic strategies can be broadly classified into three categories, all of which rely on the folded structure of the proteins to achieve mechanical interlocking. The first category realizes the synthesis of protein homocatenanes by guiding the intertwining of molecular chains using the tetrameric domain p53tet of the tumor suppressor protein p53 or its mutant dimeric domain p53dim, followed by cyclization through highly efficient and specific native chemical ligation or SpyTag-SpyCatcher reactive pairs. The second category gradually converts the topology of lasso peptides into higher-order catenanes through enzyme digestion and assembly. In the third category, the synthesis of protein heterocatenanes was achieved for the first time by splitting SpyCatcher into BDTag and SpyStapler and rationally recombining the three motifs based on the folded structure of the SpyTag-SpyCatcher reactive pair, combined with split intein-mediated cyclization and the autocatalytic formation of isopeptide bonds; however, the reaction does not go to completion and the overall purification process is tedious. Building on this assembly-reaction synergy, further development of methods for the biosynthesis of protein heterocatenanes will contribute to a more in-depth study of the effects of topology on protein functions and properties and will also lay the foundation for their applications in the field of biomedicine.
An objective of the present disclosure is to provide a strategy for biosynthesis of protein heterocatenanes that allows for efficient construction of multi-domain protein heterocatenanes without implementing any additional extracellular reactions.
By mimicking the multi-step post-translational modification process in the synthesis of natural topological proteins, combined with the in situ assembly, chain cleavage and site-specific cyclization, the present disclosure develops a synthetic system based on two orthogonal coupling approaches through rationally designed gene sequences, which enables modular synthesis of protein heterocatenanes featuring a branched or fully backbone cyclized structure.
The basic structure of the protein precursor sequence designed herein for preparing the protein heterocatenane includes: L1-1-X-L1-2-(in situ enzyme digestion site)-L2-1-X-L2-2, wherein
The two pairs of cyclization motifs are selected mainly from the following three options:
By inserting one or more identical or different proteins of interest into the basic structure of the above-mentioned protein precursor sequence, it is possible to construct a protein heterocatenane comprising the proteins of interest. The insertion sites for the proteins of interest may be within the ring, i.e., before and/or after the X domain. Since the cyclization mediated by the peptide-protein reactive pair is a side-chain coupling and the N terminus and C terminus are still intact after cyclization, the insertion sites for the proteins of interest may also be outside the ring, i.e., at the N terminus and/or the C terminus of the peptide-protein reactive pair, thereby constructing a branched heterocatenane.
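The modular layout described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `ring` and `precursor`, their keyword arguments, and the placeholder label `(site)` for the in situ enzyme digestion site are hypothetical and not part of the disclosure, and motif names are treated as plain strings.

```python
from typing import List, Sequence


def ring(left: str, right: str, x: str = "X",
         before_x: Sequence[str] = (), after_x: Sequence[str] = (),
         outside_n: Sequence[str] = (), outside_c: Sequence[str] = ()) -> List[str]:
    """Return the ordered modules of one ring-forming segment.

    left, right          : the two halves of a cyclization motif
    before_x, after_x    : proteins of interest inserted inside the ring
    outside_n, outside_c : proteins of interest placed outside the ring
                           (only meaningful for side-chain coupling pairs)
    """
    return [*outside_n, left, *before_x, x, *after_x, right, *outside_c]


def precursor(ring1: Sequence[str], ring2: Sequence[str], site: str = "(site)") -> str:
    """Join two ring segments around the in situ enzyme digestion site."""
    return "-".join([*ring1, site, *ring2])


# Basic structure with no protein of interest.
basic = precursor(ring("L1-1", "L1-2"), ring("L2-1", "L2-2"))

# One protein of interest inside each ring, placed after the X domain.
two_poi = precursor(ring("IntC1", "IntN1", after_x=["POI1"]),
                    ring("IntC2", "IntN2", after_x=["POI2"]))

print(basic)
print(two_poi)
```

Keeping each ring as a list of modules makes the two insertion modes (inside or outside the ring) explicit without committing to any particular sequence.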
The gene construction of the target protein is shown in
The strategy adopted in the present disclosure for biosynthesis of protein heterocatenanes focuses on the following aspects: (1) entwining motifs (X) such as the p53dim domain are utilized to realize mechanical interlocking, and the yield of heterocatenanes is improved by converting intermolecular dimerization into intramolecular dimerization; (2) cyclization modes that can occur intracellularly are selected, and peptide-protein reactive pairs and split inteins are most widely used at present; (3) the two cyclization modes should be somewhat orthogonal to avoid excessive side reactions, for example, a SpyTag-SpyCatcher reactive pair is used in combination with a split intein, or two split inteins with certain orthogonality are selected; and (4) a split intein typically includes a large-sized N-terminal part (IntN) and a relatively small-sized C-terminal part (IntC), and when IntC is located in the chain resulting in a blocked reaction, in situ cleavage of the nascent polypeptide chain by co-expressing the protease can trigger the trans-splicing reaction mediated by this split intein.
The split intein involved herein is preferably an NpuDnaE split intein, which is naturally split into IntC1 containing 36 amino acids and IntN1 containing 102 amino acids. IntC2 containing 15 amino acids and the corresponding IntN2 containing 123 amino acids, obtained by systematically truncating the IntC part, also have a good trans-splicing efficiency. Although IntC1 is somewhat reactive with IntN2, IntC2 is unable to react with IntN1, reflecting certain orthogonality.
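The fragment sizes quoted above are mutually consistent, which can be checked with a short sketch. It assumes that the truncated pair arises simply from moving the split point within the same intein, so that both pairs span the same total length; the reactivity labels are informal summaries of the preceding paragraph, not measured values.

```python
# Fragment lengths in amino acids, as stated in the text.
FRAGMENT_LENGTH = {"IntC1": 36, "IntN1": 102, "IntC2": 15, "IntN2": 123}

# Pairwise behaviour as described in the text (informal labels).
REACTIVITY = {
    ("IntC1", "IntN1"): "reactive",
    ("IntC2", "IntN2"): "reactive",
    ("IntC1", "IntN2"): "partially reactive",
    ("IntC2", "IntN1"): "unreactive",
}


def total_length(int_c: str, int_n: str) -> int:
    """Combined length of a C-terminal and an N-terminal fragment."""
    return FRAGMENT_LENGTH[int_c] + FRAGMENT_LENGTH[int_n]


def shifted_residues() -> int:
    """Residues moved from IntC to IntN by the truncation (under the stated assumption)."""
    return FRAGMENT_LENGTH["IntC1"] - FRAGMENT_LENGTH["IntC2"]


print(total_length("IntC1", "IntN1"), total_length("IntC2", "IntN2"))
print(shifted_residues())
print(REACTIVITY[("IntC2", "IntN1")])
```

Both pairs sum to the same length, and the one-sided cross-reactivity is what the text refers to as "certain orthogonality".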
The biosynthetic systems for the protein heterocatenanes described herein all make use of the intramolecular dimerization of entwining motifs such as the p53dim domain to guide the entwining of the polypeptide chains, but achieve orthogonal coupling in different ways. The intracellular cyclization reaction based on peptide-protein reactive pairs is a side-chain coupling reaction with intact N-/C-termini, while the resulting complex exists in the final structure, and thus a branched protein heterocatenane can be prepared by further fusing other proteins of interest. In contrast, the intracellular cyclization reaction based on the split inteins can realize the backbone cyclization by linking the two ends of the peptide chain via a native peptide bond, while the split inteins are released from the precursor proteins by self-splicing.
The method for biosynthesis of a protein heterocatenane provided herein substantially comprises:
In step 1) described above, the peptide-protein reactive pair is preferably a SpyTag-SpyCatcher reactive pair or a SnoopTag-SnoopCatcher reactive pair. The amino acid sequences of typical SpyTag and SpyCatcher are as shown in SEQ ID NO:1 and SEQ ID NO:2 in the sequence listing. A reactive SpyTag/SpyCatcher mutant may also be used. The mutant refers to a peptide chain derived from the above amino acid sequence of SpyTag/SpyCatcher by substitution, deletion or addition of amino acid residue(s), where the substitution, deletion or addition of amino acid residue(s) does not exert any influence on the coupling reaction for generating isopeptide bonds.
In step 1) described above, the entwining motif X is preferably a tumor suppressor-derived p53dim domain. The amino acid sequence of typical p53dim domain is as shown in SEQ ID NO:3 in the sequence listing. A p53dim mutant capable of forming an analogous dimeric structure may be used. The mutant refers to a peptide chain derived from the above amino acid sequence by substitution, deletion or addition of amino acid residue(s), where the substitution, deletion or addition of amino acid residue(s) does not exert any influence on the generation of their entwined dimers.
In step 1) described above, the split intein is preferably an NpuDnaE split intein containing N-terminal part (IntN) and C-terminal part (IntC) to constitute a cyclization motif, and the amino acid sequences of IntC1, IntN1, IntC2 and IntN2 resulting from the two splitting methods are as shown in SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6 and SEQ ID NO:7 in the sequence listing. In addition, other eligible split inteins may also be applied in the present disclosure to enable the biosynthesis of protein heterocatenanes.
In step 1) described above, the in situ enzyme digestion site is preferably a recognition sequence ETVRFQG of the Tobacco vein mottling virus (TVMV) protease.
In step 1) described above, in order to demonstrate the topology of the synthesized protein heterocatenane, a recognition sequence ENLYFQG of the Tobacco etch virus (TEV) protease may be introduced before the first entwining motif X, which may also be used as the in situ enzyme digestion site as described. To facilitate purification, a histidine tag is further introduced before the second entwining motif X. The protein purification is performed by affinity chromatography on a nickel column in step 4).
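As a rough illustration of how the two recognition sequences act on a polypeptide, the sketch below splits a one-letter amino acid string at each site. It assumes that cleavage occurs between the Q and the final G of the recognition sequence, which is the generally reported behaviour of the TEV and TVMV proteases; the toy sequence is hypothetical and is not taken from the sequence listing.

```python
# Recognition sequences quoted in the text.
RECOGNITION_SITES = {"TVMV": "ETVRFQG", "TEV": "ENLYFQG"}


def cleave(sequence: str, protease: str) -> list:
    """Split a one-letter amino acid sequence at every recognition site.

    Assumption: the cut falls between the Q and the final G of the site.
    """
    site = RECOGNITION_SITES[protease]
    cut_offset = len(site) - 1  # index of the final G within the site
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


# Hypothetical toy sequence carrying one site for each protease.
toy = "MKT" + "ETVRFQG" + "SSS" + "ENLYFQG" + "HHHHHH"
print(cleave(toy, "TVMV"))
print(cleave(toy, "TEV"))
```

Because each protease recognizes only its own sequence in this sketch, the same chain can carry one site for in situ cleavage and another for the later topology proof.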
In step 3) described above, in the case of L1-1/L1-2 being a peptide-protein reactive pair, co-expression with a protease to digest the cleavage site in situ is necessary to achieve the biosynthesis of protein heterocatenanes; in the case of L1-1/L1-2 being a split intein, there is no need to co-express the protease.
In step 4) described above, for the protein heterocatenane with a histidine tag introduced therein, the expressed protein is purified by affinity chromatography on a nickel column, and the purity of the protein heterocatenane can be further improved by gradient elution or size exclusion chromatography.
In the examples of the present disclosure, the following protein precursor sequences are designed as shown in
The above coding genes are introduced into the expression vector pMCSG19, which is then transformed into BL21(DE3) competent cells for expression. In the system BXA-IntC1-X-IntN1, where the protease needs to be co-expressed, the BL21(DE3) competent cells also contain a pRK1037 plasmid encoding the TVMV protease. In contrast, IntC1-X-POI1-IntN1-IntC2-X-POI2-IntN2 enables the biosynthesis of protein heterocatenanes whether expressed alone or co-expressed with the TVMV protease, with no significant difference between the two conditions, so transforming the expression vectors into conventional BL21(DE3) competent cells for expression is sufficient. Finally, the obtained fusion proteins are purified to obtain the corresponding protein heterocatenanes.
A protein precursor obtained by expressing a recombinant plasmid of BXA-IntC1-X-IntN1 or IntC1-X-POI1-IntN1-IntC2-X-POI2-IntN2 first forms an intramolecularly entwined structure by dimerization of the p53dim domain and then achieves site-specific cyclization by two orthogonal coupling approaches. In the BXA-IntC1-X-IntN1 system, it is necessary to achieve the in situ enzyme digestion by co-expression with the TVMV protease to trigger the IntC1/IntN1-mediated trans-splicing reaction, followed by the side-chain cyclization reaction mediated by the SpyTag-SpyCatcher reactive pairs, resulting in the preparation of protein heterocatenane cat-BXA-X. In the system of IntC1-X-POI1-IntN1-IntC2-X-POI2-IntN2, two pairs of split inteins can undergo the sequential trans-splicing reaction to mediate the cyclization of the two proteins of interest in turn, ultimately leading to the preparation of protein heterocatenane cat-XPOI1-XPOI2.
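The choice of expression host described above reduces to a simple rule, sketched here in Python. The function name `expression_setup` and the category labels are hypothetical; the antibiotic selection follows the examples of this disclosure, in which the pRK1037-containing strain is grown with ampicillin and kanamycin and the conventional strain with ampicillin only.

```python
def expression_setup(first_motif_kind: str) -> dict:
    """Suggest host and selection from the kind of the first cyclization motif.

    first_motif_kind: "reactive_pair" (e.g. SpyTag-SpyCatcher) or "split_intein".
    """
    if first_motif_kind == "reactive_pair":
        # The downstream IntC is blocked in-chain, so TVMV must cut in situ.
        return {
            "coexpress_tvmv": True,
            "host": "BL21(DE3) carrying pRK1037",
            "antibiotics": ["Amp", "Kan"],
        }
    if first_motif_kind == "split_intein":
        # Two split inteins react sequentially without added protease.
        return {
            "coexpress_tvmv": False,
            "host": "BL21(DE3)",
            "antibiotics": ["Amp"],
        }
    raise ValueError(f"unknown motif kind: {first_motif_kind!r}")


print(expression_setup("reactive_pair"))
print(expression_setup("split_intein"))
```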
By introducing other folded proteins, such as AffiHER2 with high affinity for HER2, at the N terminus of SpyCatcher and the C terminus of SpyTag on the basis of BXA-IntC1-X-IntN1, it is possible to realize biosynthesis of branched protein heterocatenanes based on the same co-expression method. In the system of IntC1-X-POI1-IntN1-IntC2-X-POI2-IntN2, the small ubiquitin-like modifier protein SUMO and a superfolded protein GFP are selected as model proteins to achieve the biosynthesis of protein heterocatenanes cat-XSUMO-X and cat-XSUMO-XGFP, respectively.
Before describing the present disclosure in detail, it should be appreciated that the present disclosure is not subject to the particular methods and experimental conditions described herein because the methods and conditions can be altered. Furthermore, the terms used herein are only for the purpose of explaining particular embodiments and are not intended to be limiting.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as conventionally understood by a person skilled in the art. For the sake of the present disclosure, the following terms are defined.
The term “and/or”, when used to connect two or more options, shall be understood to mean either of or any two or more of the options.
As used herein, the term “comprising” or “including” is intended to include the elements, integers or steps, without the exclusion of any other elements, integers or steps. The term “comprising” or “including”, when used herein, also covers situations of consisting of the recited elements, integers or steps, unless otherwise indicated.
Various exemplary examples, features, and aspects of the present disclosure will be illustrated in detail below. The term “exemplary” as used herein means “serving as an instance, example or illustration”. Any example described herein as “exemplary” is not necessarily to be construed as superior to or better than the other examples.
In addition, numerous details are set forth in the specific embodiments below in order to better illustrate the present disclosure. It shall be appreciated by a person skilled in the art that the present disclosure can still be implemented even without some of the details. In some other examples, methods, means, equipment, and steps familiar to a person skilled in the art are not described in detail so as to highlight the principles of the present disclosure.
Unless otherwise stated, all of the units used in the present specification are international standard units, and all of the numerical values and numerical ranges used herein shall be construed as including the systematic errors unavoidable in industrial production.
The sequences of protein precursors involved in the biosynthesis of protein heterocatenanes are illustrated below by way of some specific examples:
The present disclosure carries out the basic characterization and topological proof of the prepared protein heterocatenanes by conventional characterization means such as sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), ultra-performance liquid chromatography-mass spectrometry (LC-MS) and TEV protease digestion reaction.
Based on a rational design of gene sequences to combine in situ assembly, enzyme digestion and site-specific cyclization, the present disclosure develops an orthogonal coupling-based biosynthetic system applicable to the intracellular synthesis of heterocatenanes containing a variety of functional proteins, which highlights the following major advantages: 1) said biosynthetic system enables modular synthesis of heterocatenanes by genetically encoded approach, and improves the yield of protein heterocatenanes using intramolecular dimerization of entwining dimeric motifs such as the p53dim domain, and there are a variety of options available for the corresponding entwining motifs and coupling means; 2) by mimicking the multi-step post-translational modification process in synthesis of natural topological proteins, said biosynthetic system accomplishes the entwining of polypeptide chains and two orthogonal covalent cyclization reactions intracellularly without the need for additional extracellular reactions, and the corresponding protein heterocatenanes are obtained after expression and purification; and 3) in the construction of a protein precursor containing a peptide-protein reactive pair such as BXA-IntC1-X-IntN1, biosynthesis of branched protein heterocatenanes can be realized by introducing other folded proteins at the N terminus of the SpyCatcher and the C terminus of the SpyTag; while in the construction of a protein precursor containing two orthogonal split inteins such as IntC1-X-POI1-IntN1-IntC2-X-POI2-IntN2, biosynthesis of protein heterocatenanes with backbone cyclization is achieved. Both systems extend the scope of the existing protein heterocatenane structures.
The present disclosure is further described in detail below by way of examples, which are not intended to limit the scope of the present disclosure in any way.
Protein precursors involved in biosynthesis of protein heterocatenanes and their corresponding expression systems are constructed by the following specific steps:
The prepared protein heterocatenanes are subjected to basic characterization and their topologies are proven through sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), ultra-performance liquid chromatography-mass spectrometry (LC-MS), and TEV protease digestion reaction.
Gene fragments of BXA-IntC1-X-IntN1 and AffiHER2-BXA-AffiHER2-IntC1-X-IntN1 were each inserted into the expression vector pMCSG19, with the sequences shown in SEQ ID No:8 and SEQ ID No:9 in the sequence listing, respectively. The resulting constructs were confirmed by sequencing, then transformed into BL21(DE3) competent cells containing the pRK1037 plasmid, and incubated overnight at 37° C. on Amp-Kan plates containing 100 μg/mL of sodium ampicillin (Amp) and 50 μg/mL of kanamycin (Kan). Thereafter, single colonies were picked, inoculated into 5 mL of 2×YT medium with the same antibiotics, and subjected to shake incubation at 37° C. for 10 to 12 hours to prepare a seed broth. The seed broth was inoculated at a ratio of 1:100 into 250 mL of 2×YT medium with the same antibiotics, and the obtained cultures were subjected to shake incubation at 37° C. until the OD600 was between 0.5 and 0.7. Isopropyl-β-D-thiogalactopyranoside (IPTG) was added to a final concentration of 0.25 mM, and then the cultures were shaken at 16° C. for 20 hours for expression.
Gene fragments of IntC1-X-SUMO-IntN1-IntC2-X-IntN2 and IntC1-X-SUMO-IntN1-IntC2-X-GFP-IntN2 were each inserted into the expression vector pMCSG19, with the sequences shown in SEQ ID No:10 and SEQ ID No:11 in the sequence listing, respectively. The resulting constructs were confirmed by sequencing, then transformed into BL21(DE3) competent cells, and incubated overnight at 37° C. on plates containing 100 μg/mL of sodium ampicillin. Thereafter, single colonies were picked, inoculated into 5 mL of 2×YT medium with the same antibiotic, and subjected to shake incubation at 37° C. for 10 to 12 hours to prepare a seed broth. The seed broth was inoculated at a ratio of 1:100 into 250 mL of 2×YT medium with the same antibiotic, and the obtained cultures were subjected to shake incubation at 37° C. until the OD600 was between 0.5 and 0.7. Isopropyl-β-D-thiogalactopyranoside (IPTG) was added to a final concentration of 0.25 mM, and then the cultures were shaken at 16° C. for 20 hours for expression.
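The volumes and masses implied by the culture conditions above follow from simple dilution arithmetic, sketched below. The 1 M IPTG stock concentration is an assumption made only for illustration (the stock concentration is not stated in the text), and the small volume added by the inducer is neglected.

```python
def seed_volume_ml(culture_ml: float, ratio: float = 100.0) -> float:
    """Seed broth volume for a 1:ratio inoculation."""
    return culture_ml / ratio


def stock_volume_ul(final_mm: float, culture_ml: float, stock_mm: float) -> float:
    """Stock volume (in microlitres) giving the final concentration, C1*V1 = C2*V2."""
    return final_mm * culture_ml / stock_mm * 1000.0


def antibiotic_mass_mg(final_ug_per_ml: float, culture_ml: float) -> float:
    """Mass of antibiotic (in mg) for the given working concentration."""
    return final_ug_per_ml * culture_ml / 1000.0


CULTURE_ML = 250.0
ASSUMED_IPTG_STOCK_MM = 1000.0  # assumption: 1 M stock, not given in the text

print(seed_volume_ml(CULTURE_ML))                                # mL of seed broth
print(stock_volume_ul(0.25, CULTURE_ML, ASSUMED_IPTG_STOCK_MM))  # uL of IPTG stock
print(antibiotic_mass_mg(100.0, CULTURE_ML))                     # mg of ampicillin
print(antibiotic_mass_mg(50.0, CULTURE_ML))                      # mg of kanamycin
```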
Upon completion of the protein expression, the bacterial cells were collected by centrifugation (5,500 × g, 15 min) in a high-speed refrigerated centrifuge and the supernatant was discarded. The bacterial cells were re-suspended in lysis buffer A (50 mM sodium dihydrogen phosphate, 300 mM sodium chloride, 10 mM imidazole, pH 8.0). The suspension was sonicated with an ultrasonic homogenizer in an ice-water bath (5 seconds on, 5 seconds off, 30% intensity) and then centrifuged (12,000 × g, 30 min) to collect the supernatant. The supernatant was mixed well with Ni-NTA resin and incubated at 4° C. for 1 hour. The mixture was poured into an empty PD-10 column for purification, and after the lysate had drained, the resin was washed with 5 to 10 resin volumes of wash buffer B (50 mM sodium dihydrogen phosphate, 300 mM sodium chloride, 20 mM imidazole, pH 8.0) to reduce non-specific adsorption. The protein heterocatenanes cat-BXA-X, cat-(AffiHER2-BXA-AffiHER2)-X and cat-XSUMO-X could be eluted directly with elution buffer C (50 mM sodium dihydrogen phosphate, 300 mM sodium chloride, 250 mM imidazole, pH 8.0). To increase the purity, the protein heterocatenane cat-XSUMO-XGFP was subjected to gradient elution: the resin was first eluted with about 10 resin volumes of elution buffer D (50 mM sodium dihydrogen phosphate, 300 mM sodium chloride, 50 mM imidazole, pH 8.0), and the eluate, which mainly contained the heterocatenane, was collected; the cyclic or catenated by-product of GFP was then eluted with elution buffer C.
The protein eluate was further purified using a fast protein liquid chromatography system (ÄKTA pure, GE Healthcare) with a size exclusion chromatography column (Superdex 200 Increase 10/300 GL, GE Healthcare). The mobile phase was phosphate buffered saline (PBS, pH 7.4) filtered through a 0.22 μm filter, at a flow rate of 0.5 mL/min. The elution peak of the protein was monitored by UV absorption at 280 nm, and the sample was collected for characterization.
The protein heterocatenanes purified in Example 3 were first mixed with 5×SDS loading buffer, heated at 98° C. for 10 min, and then characterized by SDS-PAGE. After the protein samples purified by SEC were buffer-exchanged into ddH2O with an ultrafiltration tube, LC-MS was used to characterize their molecular weights. Protein concentrations were determined with an ultra-micro spectrophotometer (NanoPhotometer P330, Implen, Inc.). To prove the heterocatenane topology, the protein solution (10 μM) and the TEV protease solution (10 μM) were mixed at a molar ratio of 20:1 and proteolysis was carried out at 37° C. for 1, 3, or 6 hours; the protease digestion was substantially complete within 3 hours. After the protease digestion, 10 μL of the proteolytic products were mixed with 5×SDS loading buffer and heated at 98° C. for 10 min to quench the reaction. The product composition after digestion was characterized by SDS-PAGE. After the remaining digestion mixture was buffer-exchanged into ddH2O with an ultrafiltration tube, LC-MS was employed to confirm the molecular weights of the proteolytic products. The results of the SEC characterization after affinity purification by a nickel column, SDS-PAGE characterization before and after the enzyme digestion, and the LC-MS characterization of the cat-BXA-X, cat-(AffiHER2-BXA-AffiHER2)-X, cat-XSUMO-X, and cat-XSUMO-XGFP were as shown in
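The four buffers above differ only in their imidazole concentration, so their recipes can be tabulated with a short sketch. The molar masses assume the anhydrous forms of the reagents (a hydrated phosphate salt would need a different value), and the pH adjustment to 8.0 is not modelled; the buffer labels in parentheses are added here for readability.

```python
# Molar masses in g/mol; anhydrous reagents are assumed.
MOLAR_MASS = {"NaH2PO4": 119.98, "NaCl": 58.44, "imidazole": 68.08}

# Concentrations in mM, as given in the text.
BUFFERS_MM = {
    "A (lysis)": {"NaH2PO4": 50, "NaCl": 300, "imidazole": 10},
    "B (wash)": {"NaH2PO4": 50, "NaCl": 300, "imidazole": 20},
    "D (first elution step)": {"NaH2PO4": 50, "NaCl": 300, "imidazole": 50},
    "C (elution)": {"NaH2PO4": 50, "NaCl": 300, "imidazole": 250},
}


def grams_needed(buffer_name: str, volume_l: float = 1.0) -> dict:
    """Mass of each component (in g) for the requested buffer volume."""
    recipe = BUFFERS_MM[buffer_name]
    return {
        component: conc_mm / 1000.0 * MOLAR_MASS[component] * volume_l
        for component, conc_mm in recipe.items()
    }


for name in BUFFERS_MM:
    masses = grams_needed(name)
    print(name, {k: round(v, 3) for k, v in masses.items()})
```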
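For reading the resulting chromatograms, retention times can be converted to elution volumes at the stated flow rate, as in the sketch below. The column volume of about 24 mL is an assumption based on the nominal bed volume of a 10/300 format column and should be checked against the manufacturer's data; the function names are hypothetical.

```python
FLOW_ML_PER_MIN = 0.5            # flow rate stated in the text
ASSUMED_COLUMN_VOLUME_ML = 24.0  # assumption: nominal bed volume, not from the text


def elution_volume_ml(retention_min: float, flow: float = FLOW_ML_PER_MIN) -> float:
    """Volume of mobile phase delivered up to the given retention time."""
    return retention_min * flow


def column_volumes(retention_min: float,
                   flow: float = FLOW_ML_PER_MIN,
                   cv_ml: float = ASSUMED_COLUMN_VOLUME_ML) -> float:
    """Elution position expressed in column volumes."""
    return elution_volume_ml(retention_min, flow) / cv_ml


def minutes_per_column_volume(flow: float = FLOW_ML_PER_MIN,
                              cv_ml: float = ASSUMED_COLUMN_VOLUME_ML) -> float:
    """Run time needed to pass one column volume through the column."""
    return cv_ml / flow


print(elution_volume_ml(30.0))      # mL delivered after 30 min
print(column_volumes(30.0))         # same position in column volumes
print(minutes_per_column_volume())  # min per column volume
```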
Number | Date | Country | Kind |
---|---|---|---|
202010436910.X | May 2020 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2021/094589 | 5/19/2021 | WO |