The invention relates to a synthetic nucleic acid molecule for expressing at least one nucleotide sequence of interest in at least one prokaryotic host cell, comprising at least one promoter sequence and at least one cloning site for inserting the nucleotide sequence of interest, wherein the cloning site is located downstream of the promoter sequence. The invention also relates to a method for producing a shuttle vector, said vector comprising at least one replication module comprising at least one replication cassette for promoting replication of a nucleic acid molecule in Gram-negative organisms and at least one replication cassette for promoting replication of a nucleic acid molecule in Gram-positive organisms, at least one expression module for promoting expression of a nucleotide sequence of interest in a host cell, and at least one resistance module for providing the host cell with antibiotic resistance.
The heterologous expression of genes in prokaryotes is challenging, especially if the genes originate from a distant host or if the source is uncertain, such as a metagenomic expression library. Many vectors have been developed based on broad host range origins of replication, but these all focus on either Gram(+) or Gram(−) prokaryotes.
Escherichia coli has been the workhorse of biotechnology for decades. However, with the focus of biotechnology shifting to functional metagenomics, the expression of environmental DNA (eDNA) in E. coli has become a bottleneck. One may be able to optimize DNA sequences for the needs of E. coli and to generate novel E. coli strains, but this is not feasible when establishing expression libraries with unknown DNA sequences, such as eDNA. This prevents the enormous biotechnological potential of genomes from uncultured microorganisms from being harnessed. Nevertheless, some approaches have already been pursued to circumvent E. coli as an expression host, such as the establishment of metagenomic expression libraries in Cupriavidus metallidurans, Streptomyces spp., and Pseudomonas fluorescens.
Although some of these approaches are based on broad-host range expression vectors, the number of hosts is very limited due to the focus on either Gram(+) or Gram(−) organisms and the fact that most vectors replicate preferentially either in Gram(+) or Gram(−) organisms.
Today's broad host range vectors are mainly based on IncP, RK2 or rolling circle replicating (RCR) plasmids like pCI411. A very frequently used RCR plasmid is pGK12 from Kluyveromyces lactis CBS 2359, which can replicate in E. coli and mainly in Gram(+) organisms like Bacillus subtilis, Borrelia burgdorferi and Lactococcus lactis. Although its RCR origin is used in over 20 shuttle vectors, it has numerous disadvantages, such as its large size and its instability in host cells. Because of this poor performance, alternatives are of great interest.
Accordingly, there is a need in the art for an expression vector for establishing expression libraries with unknown DNA sequences, such as environmental DNA. There is also a need in the art for a broad-host range expression vector which replicates in both Gram(+) and Gram(−) organisms.
The invention is directed at a synthetic nucleic acid molecule comprising at least one replication module comprising at least one replication cassette for promoting replication of the nucleic acid molecule in Gram-negative organisms and at least one replication cassette for promoting replication of the nucleic acid molecule in Gram-positive organisms, at least one expression module for promoting expression of the nucleotide sequence of interest in the host cell, and at least one resistance module for providing the host cell with antibiotic resistance, wherein the at least one replication module, the at least one expression module and the at least one resistance module are each flanked at both ends by at least one unique restriction site. The nucleic acid molecule according to the invention may represent a fully synthetic expression vector based on/comprising different origins of replication so as to allow for using various Gram(+) and Gram(−) hosts at the same time for expression. Thus, it is easily possible to change the cloning systems without the need of additional cloning. This also makes it easy to generate environmental DNA (eDNA) expression libraries for the functional screening in diverse hosts without focusing on either Gram(+) or Gram(−) organisms. Thus, the tool described herein may allow for the identification of novel biocatalysts, which, so far, were not functional in the limited number of vector compatible hosts. Moreover, the modular design of the nucleic acid molecule according to the invention allows for constructing diverse expression vectors which are each optimized for their intended use. To this end, each module of the nucleic acid molecule is flanked at both ends by at least one unique restriction site so that each module may be replaced by another module having a different function and/or effect. This measure makes it easy to combine different modules such that the resulting expression vector is optimally adapted to its intended application.
In an embodiment of the present invention the replication cassette for Gram-negative organisms can, for example, comprise a pBBR1 origin of replication. For example, the replication cassette for Gram-negative organisms can comprise the nucleotide sequence according to SEQ ID NO: 1.
In an embodiment of the present invention the replication cassette for Gram-positive organisms can comprise a pWV01 origin of replication, for example, a modified pWV01 origin of replication. In an embodiment of the present invention, the replication cassette for Gram-positive organisms can, for example, comprise the nucleotide sequence according to SEQ ID NO: 2. This sequence represents a modified pWV01 which is optimized for use in the nucleic acid molecule according to the invention.
The pWV01 origin of replication of Lactococcus lactis subsp. cremoris Wg2 is much smaller than the RCR origin of pGK12 and appears to perform better in terms of copy number and stability. The pBBR1 origin of replication of Bordetella bronchiseptica S87 is widely used and compatible with IncP, IncQ and IncW group plasmids as well as with ColE1 and p15A containing plasmids. While the pBBR1 mode of replication is unclear, pWV01 replicates via the rolling circle mechanism. It is surprising that these origins are compatible and can be placed on the same vector without any interference. It is therefore an advantageous aspect of the invention that pBBR1 and an optimized pWV01 may be combined on the same completely synthesized vector.
In an embodiment of the present invention the unique restriction site can, for example, be selected from the group consisting of BglII, NotI, PmlI, and SapI. However, it is also possible to use other unique restriction sites as long as the modular character of the nucleic acid molecule according to the invention is maintained.
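By way of illustration only, the uniqueness requirement for the flanking restriction sites can be checked computationally. The following minimal Python sketch counts occurrences of each recognition sequence on both strands of a circular sequence. The recognition sequences are the commonly documented ones for the enzymes named above; the function names and any sequence passed in are hypothetical and are not part of the disclosed sequences.

```python
# Commonly documented recognition sequences (5'->3') of the enzymes named above.
SITES = {
    "BglII": "AGATCT",
    "NotI": "GCGGCCGC",
    "PmlI": "CACGTG",
    "SapI": "GCTCTTC",  # non-palindromic, so both strands must be searched
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_sites(plasmid, site):
    """Count occurrences of `site` on both strands of a circular molecule."""
    seq = plasmid.upper()
    # Append the first len(site) - 1 bases so that sites spanning the
    # arbitrary start of the circular sequence are found as well.
    circular = seq + seq[: len(site) - 1]
    # A palindromic site equals its reverse complement, so the set holds
    # a single pattern and the site is not counted twice.
    patterns = {site, reverse_complement(site)}
    total = 0
    for pattern in patterns:
        start = circular.find(pattern)
        while start != -1:
            total += 1
            start = circular.find(pattern, start + 1)
    return total


def unique_sites(plasmid):
    """Return the enzymes from SITES that cut the molecule exactly once."""
    return sorted(
        name for name, site in SITES.items() if count_sites(plasmid, site) == 1
    )
```

A module boundary would qualify under this check only if `count_sites` returns 1 for the chosen enzyme over the whole vector, including any insert cloned into it.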
The nucleic acid molecule according to the present invention can, for example, further comprise at least one transcription termination sequence located downstream of the cloning site, for example, a transcription termination sequence selected from the group consisting of SEQ ID NO: 3 (new_Ter), SEQ ID NO: 4 (T7_Ter), SEQ ID NO: 5 (trpA_Ter), and SEQ ID NO: 6 (t500_Ter).
An initially developed vector was able to replicate solely in Escherichia coli Stbl2, a specialized strain used to house unstable inserts containing repetitive sequences. The expression vector according to the invention was obtained by deleting some of the terminator sequences of that vector and letting the common E. coli strain DH5α select which of the remaining terminators were compatible with stable replication and thus with the maintenance of a full-length vector. Surprisingly, this strain not only preferred a specific set of terminators (SEQ ID NO: 4, SEQ ID NO: 5, and SEQ ID NO: 6) but also generated a completely novel one (SEQ ID NO: 3).
In an embodiment of the present invention the expression module can, for example, comprise a promoter sequence. For example, the promoter sequence can comprise a Ptac promoter sequence (SEQ ID NO: 7).
In an embodiment of the present invention the expression module can, for example, comprise at least one regulatory sequence, for example, a lacI cassette and a lac operator sequence. In an embodiment of the present invention, the expression module can, for example, comprise the cloning site, for example, a multiple cloning site.
In an embodiment of the present invention the resistance module can, for example, comprise a chloramphenicol acetyl transferase (CAT) resistance cassette.
For example, the synthetic nucleic acid molecule according to the invention can comprise at least one nucleotide sequence selected from the group consisting of:
The invention further concerns a prokaryotic cell including the nucleic acid molecule according to the invention as described above.
The invention also relates to a cell culture comprising at least one cell according to the invention.
Moreover, the invention relates to a polypeptide, produced by expression of a nucleotide sequence of interest in a prokaryotic host cell using the nucleic acid molecule according to the invention as described above.
A further aspect of the invention is the use of the nucleic acid molecule according to the invention as described above for heterologous expression of metagenomic expression libraries in at least one prokaryotic host cell, for example, for functional screening of environmental expression libraries. The nucleic acid molecule according to the invention can, for example, also be used for generating cDNA expression libraries of eukaryotic genes or introducing recombination cassettes for promoting specific knockouts or accelerating chromosomal localization.
Another aspect of the invention relates to a method for expressing at least one nucleotide sequence of interest in at least one prokaryotic host cell, said method comprising the following steps:
For example, the nucleic acid molecule may be heterologously expressed in at least one prokaryotic host cell.
In an embodiment of said method, the nucleotide sequence of interest can, for example, be part of a metagenomic library. According to the invention environmental DNA (eDNA) expression libraries can be easily generated and used for the functional screening in diverse hosts without focusing on either Gram(+) or Gram(−) organisms.
The invention is also directed at a method for producing a shuttle vector, said vector comprising at least one replication module comprising at least one replication cassette for promoting replication of a nucleic acid molecule in Gram-negative organisms and at least one replication cassette for promoting replication of a nucleic acid molecule in Gram-positive organisms, at least one expression module for promoting expression of a nucleotide sequence of interest in a host cell, and at least one resistance module for providing the host cell with antibiotic resistance, wherein said shuttle vector is obtained by assembling each of said modules, preferably such that the vector is optimized for its intended use. According to this aspect of the invention, different modules may be combined such that the resulting expression vector is perfectly adapted to its intended application. For example, each module of the vector may be flanked at both ends by at least one unique restriction site so that each module can be easily replaced by another module having a different function and/or effect.
The invention is further exemplarily described in detail with reference to the figures.
The term “synthetic nucleic acid molecule” as used herein refers to a nucleic acid molecule that is constructed by joining nucleic acid molecules using laboratory methods or that is chemically or by other means synthesized or amplified. The term “synthetic nucleic acid molecule” includes but is not limited to molecules that are chemically or otherwise modified but can base pair with naturally occurring nucleic acid molecules or to molecules that result from the replication of those described above. The term “synthetic nucleic acid molecule” further includes but is not limited to recombinant nucleic acid molecules.
The term “recombinant nucleic acid molecules” as used herein refers to nucleic acid molecules constructed by laboratory methods of genetic recombination (such as molecular cloning) to bring together genetic material from different sources.
The term “flanked module” as used herein refers to a consecutive sequence of nucleotides, wherein at least one specific element (such as a restriction site) abuts this sequence at both ends of the sequence, i.e. at both the 3′ and the 5′ end.
The term “heterologous expression” as used herein refers to a process wherein a gene or gene fragment is expressed in a host organism which does not naturally have this gene or gene fragment.
The term “metagenomic library” as used herein refers to a pool of genetic material recovered directly from environmental material comprising largely unbiased samples of all genes from all the members of the material. “Environmental material” may include but is not limited to environmental DNA (eDNA).
The term “nucleic acid” as used herein refers to a polymeric molecule comprising a consecutive sequence of nucleotide monomers (“nucleotides”). A nucleic acid molecule according to the invention may include but is not limited to deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and nucleic acid analogs such as peptide nucleic acids (PNA).
The term “replication module” as used herein refers to a consecutive sequence of nucleotides comprising at least one genetic element that is necessary to propagate a nucleic acid molecule comprising said consecutive sequence of nucleotides by producing at least one identical copy of said nucleic acid molecule in a living cell. Herein, the genetic element may include but is not limited to an “origin of replication” which is a particular sequence that is specifically recognized and bound by a protein complex in order to initiate the replication process.
The term “expression module” as used herein refers to a consecutive sequence of nucleotides comprising at least one genetic element that is suitable for performing a process by which information from a gene is used for the synthesis of a functional gene product in a living cell. Herein, the genetic element may include but is not limited to promoter and regulatory sequences.
The term “resistance module” as used herein refers to a consecutive sequence of nucleotides comprising at least one genetic element that is suitable for providing a living cell with resistance against a specific antibiotic.
The term “restriction site” as used herein refers to a consecutive sequence of nucleotides which is specifically recognized by a specific restriction enzyme that is able to cut a nucleic acid sequence between two nucleotides within said restriction site, or somewhere nearby.
The phrase “under stringent conditions” refers to conditions under which a nucleotide sequence will hybridize to a specific nucleic acid sequence, typically in a complex mixture of nucleic acids, but to essentially no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances.
Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5-10 degrees Celsius lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the nucleotide sequences complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is at least about 30 degrees Celsius for short sequences (e.g., 10 to 50 nucleotides) and at least about 60 degrees Celsius for long sequences (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. Exemplary stringent hybridization conditions are: 50% formamide, 5×SSC, and 1% SDS, incubating at 42 degrees Celsius, or 5×SSC, 1% SDS, incubating at 65 degrees Celsius, with a wash in 0.2×SSC and 0.1% SDS at 65 degrees Celsius. Additional guidelines for determining hybridization parameters are provided in numerous references and are known to the person skilled in the art.
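The numerical bounds given above can be expressed as a short Python sketch, offered purely as an illustration. It assumes that the Tm has already been determined by the user, treats the "about" values as hard thresholds for simplicity, and does not model destabilizing agents such as formamide, which permit lower incubation temperatures. The function names are hypothetical.

```python
def stringent_temperature_window(tm_celsius):
    """Return (low, high): temperatures 10 and 5 degrees Celsius below Tm."""
    return (tm_celsius - 10.0, tm_celsius - 5.0)


def meets_general_stringency(sodium_molar, ph, temperature_celsius, length_nt):
    """Check the general salt, pH and temperature bounds quoted above.

    Formamide-containing buffers are outside the scope of this check.
    """
    salt_ok = 0.01 <= sodium_molar <= 1.0
    ph_ok = 7.0 <= ph <= 8.3
    # Short sequences (up to 50 nt): at least about 30 degrees Celsius;
    # longer sequences: at least about 60 degrees Celsius.
    minimum_temperature = 30.0 if length_nt <= 50 else 60.0
    return salt_ok and ph_ok and temperature_celsius >= minimum_temperature
```

For instance, a probe with a Tm of 72 degrees Celsius would be hybridized between 62 and 67 degrees Celsius under this rule of thumb.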
Methods to determine sequence identities between nucleic acid molecules are well known to a person skilled in the art and have been widely described, e.g., in US Patent Application 20140221623, which is incorporated herein by reference in its entirety.
Restriction enzymes, the Rapid DNA ligation kit, and Phusion DNA polymerase were purchased from Fermentas (Thermo Fisher Scientific, St. Leon-Rot, Germany). Gibson Assembly™ Master Mix was purchased from New England Biolabs (Ipswich, USA).
The initial shuttle vector pPolyREP (GenBank acc. No. KF680544.1) was completely synthesized by GeneArt® (Thermo Fisher Scientific, St. Leon-Rot, Germany). The first modification was the removal of the tandem histidine terminator sequences (his_Ter) by PCR with 5′-phosphorylated primers (pPR-HisT_1 & 2), followed by religation and transformation of E. coli DH5α. To rebuild the full-length vector starting from the isolated, truncated one, we conducted a Gibson Assembly [1] according to the manufacturer's manual with three amplicons. The first fragment was the original vector backbone with the newly generated terminator (new_Ter (SEQ ID NO: 3); primer pair GA_pPR_1 & 2), while the other two fragments were the missing lacI gene (primer pair GA_pPR_3 & 4) and the missing pBBR1 origin of replication (primer pair GA_pPR_5 & 6), respectively. With the last two PCRs we also replaced the tandem rrnB terminator sequence (rrnB_Ter) with a T7 terminator sequence (T7_Ter; SEQ ID NO: 4) in the overlapping region of the fragments. Finally, we modified the pWV01 origin of replication according to Bryksin & Matsumura (2010) by amplification with the primer pair pWV01_opt_1 & 2 and insertion of the resulting 1,547 bp fragment into the shuttle vector via NotI, eventually yielding pPolyREPII. For expression trials, the gfp gene was amplified from pPT7-GFP (MoBiTec GmbH, Göttingen, Germany) with the primer pair GFP_for (5′-phosphorylated) and GFP_His6_rev and ligated into pPolyREPII via EcoRV and XbaI. This cloned gene encodes GFP-His6. All primer sequences mentioned here can be found in Table 1. The correct sequences of all isolated vectors and inserts were confirmed by sequencing at MWG-Biotech AG (Ebersberg, Germany).
Escherichia coli Stbl2 [2] was purchased from Life Technologies Corporation (Thermo Fisher Scientific, St. Leon-Rot, Germany) and used to harbor pPolyREP and derivatives. E. coli DH5α [3] was used as a host for pPolyREPII and derivatives. Pseudomonas putida KT2440 [4] and Bacillus subtilis 168 [5,6] were kindly provided by Prof. Dr. Susanne Fetzner (University of Münster, Germany). B. licheniformis DSM13 [7] and Cupriavidus metallidurans CH34 [8,9] were purchased from DSMZ (German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany). The generation of competent cells, their transformation and their selection with different concentrations of chloramphenicol are summarized in Table 2. All strains harboring pPolyREP and pPolyREPII, including their derivatives, were grown in LB at 30° C. with the corresponding concentration of chloramphenicol. Autoinduction solutions M (50× stock: 1.25 M Na2HPO4, 1.25 M KH2PO4, 2.5 M NH4Cl, 0.25 M Na2SO4) and 5052 (50× stock: 25% (v/v) glycerol, 2.5% (w/v) D-glucose, 10% (w/v) α-lactose monohydrate) [10] were added to induce the cells for expression of gfp. Four hours prior to cell harvest, expression was additionally induced by the addition of 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG).
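As a worked example of the stock arithmetic, the following Python sketch converts the 50× stock compositions listed above into working concentrations. It assumes that both stocks are diluted to 1× in the culture, as the 50× designation implies; the culture volume in the usage note and the function names are hypothetical.

```python
# 50x stock compositions as listed above.
STOCK_M = {  # mol/l
    "Na2HPO4": 1.25,
    "KH2PO4": 1.25,
    "NH4Cl": 2.5,
    "Na2SO4": 0.25,
}
STOCK_5052 = {  # percent (v/v for glycerol, w/v for the sugars)
    "glycerol": 25.0,
    "D-glucose": 2.5,
    "alpha-lactose monohydrate": 10.0,
}


def working_concentrations(stock, fold=50):
    """Divide every stock concentration by the dilution factor."""
    return {name: value / fold for name, value in stock.items()}


def stock_volume_ml(culture_volume_ml, fold=50):
    """Volume of a `fold`-times stock needed for a given final volume."""
    return culture_volume_ml / fold
```

Under this assumption, solution M contributes 25 mM Na2HPO4, 25 mM KH2PO4, 50 mM NH4Cl and 5 mM Na2SO4, and solution 5052 contributes 0.5% glycerol, 0.05% D-glucose and 0.2% α-lactose monohydrate. A hypothetical 100 ml culture would receive 2 ml of each stock.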
Structural and Segregational Stability and Copy Number of pPolyREPII
First, a starting culture of the plasmid-harboring strain was prepared. For this, the strain was grown overnight in chloramphenicol-containing LB medium. Then, the OD600 was measured and fresh LB medium lacking chloramphenicol was inoculated with the starting culture to an OD600 of 0.010. The inoculated culture was grown for one day and then used again to inoculate fresh LB medium. A total of five passages were performed. At the start and after two and five passages, a 100-μl sample of the culture was withdrawn and adequately diluted, and identical volumes of the dilution were plated on LB agar plates with and without chloramphenicol, respectively. After incubation, the colonies obtained were counted and the ratio of cells that retained pPolyREPII was calculated. For each of the strains, two independent experiments were performed. The plating of the cell suspensions was done in triplicate in each case.
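The retention ratio described above can be computed as in the following minimal Python sketch. It assumes that the same dilution and the same volume were plated on both plate types, so that colony counts can be compared directly; the function name and the counts in the usage note are hypothetical and are not measured data.

```python
from statistics import mean


def plasmid_retention(colonies_with_cm, colonies_without_cm):
    """Fraction of cells that retained the plasmid.

    Both arguments are lists of replicate colony counts from plates with
    and without chloramphenicol, plated from the same dilution and volume.
    """
    if not colonies_with_cm or not colonies_without_cm:
        raise ValueError("at least one replicate count is required per plate type")
    total = mean(colonies_without_cm)
    if total == 0:
        raise ValueError("no colonies on the non-selective plates")
    return mean(colonies_with_cm) / total
```

With hypothetical triplicate counts of 96, 102 and 99 colonies on selective plates and 198, 201 and 201 colonies on non-selective plates, the function returns 0.495, i.e. about half of the cells would still carry the plasmid.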
To check the structural stability of pPolyREPII, the above-mentioned starting culture was used to inoculate fresh LB medium supplemented with chloramphenicol to a starting OD600 of 0.010. After growth for one day, the culture was used to inoculate fresh chloramphenicol-containing LB medium. A total of five passages were performed. The culture of passage five was subjected to a plasmid isolation procedure. The kit-isolated plasmid DNA was digested with NotI and BglII and analyzed on an agarose gel. This experiment was performed twice independently for each of the strains.
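The band pattern expected from an intact circular plasmid in such a digest follows directly from the positions of the cut sites. The Python sketch below illustrates the calculation; the plasmid length and cut positions in the usage note are hypothetical and do not correspond to the actual map of pPolyREPII.

```python
def circular_fragment_sizes(plasmid_length, cut_positions):
    """Return fragment sizes (largest first) for a circular molecule.

    `cut_positions` are 0-based coordinates of the cuts; duplicates and
    coordinates beyond the plasmid length are normalized.
    """
    cuts = sorted(set(p % plasmid_length for p in cut_positions))
    if not cuts:
        # An uncut circular plasmid stays in one piece.
        return [plasmid_length]
    # Distances between neighbouring cuts along the sequence ...
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    # ... plus the fragment that spans the start of the sequence.
    sizes.append(plasmid_length - cuts[-1] + cuts[0])
    return sorted(sizes, reverse=True)
```

For a hypothetical 6,000 bp plasmid cut at positions 500, 2,000 and 4,500, the function returns fragments of 2,500, 2,000 and 1,500 bp. Bands deviating from the predicted sizes would point to rearrangements or deletions.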
To estimate the copy number of pPolyREPII, each of the strains was transformed with a reference plasmid of known copy number. These strains and the corresponding pPolyREPII-harboring strains were grown overnight in LB medium with antibiotic, and the pre-cultures were then used to inoculate fresh selective LB medium to a starting OD600 of 0.010, followed by overnight incubation. After OD600 measurement, cells were harvested (5500×g, 20 min, 10° C.) and the plasmid DNA (pDNA) was kit-isolated. DNA quantification was done using a NanoDrop™ 2000 (Thermo Scientific) and the pDNA was analyzed on an agarose gel. The pPolyREPII copy number in the different hosts was estimated by quantifying and comparing band intensities using ImageJ (v1.48; open source; NIH, Bethesda, Md.). For this, the above-mentioned OD600 values were taken into account as a measure of the amount of starting cell material used for pDNA isolation and, hence, for calculating the total pDNA yield.
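The exact calculation is not spelled out above. One plausible way of relating the yields is sketched below in Python, under the following assumptions: both plasmids are isolated from the same host and the same culture volume with equal efficiency, OD600 is proportional to cell number, and the plasmid mass is divided by the plasmid size to obtain a quantity proportional to the number of molecules. The function name, the normalization and all numbers in the usage note are hypothetical.

```python
def estimate_copy_number(ref_copy_number, ref_yield_ng, ref_size_bp, ref_od600,
                         test_yield_ng, test_size_bp, test_od600):
    """Estimate a copy number relative to a reference plasmid.

    Yield divided by plasmid size is proportional to the number of
    molecules; dividing by OD600 normalizes to the amount of cells.
    """
    ref_molecules_per_od = ref_yield_ng / ref_size_bp / ref_od600
    test_molecules_per_od = test_yield_ng / test_size_bp / test_od600
    return ref_copy_number * test_molecules_per_od / ref_molecules_per_od
```

With a hypothetical reference plasmid of 20 copies per cell (4,000 ng, 4,000 bp, OD600 of 2.0) and a test plasmid yielding 6,000 ng at 6,000 bp and an OD600 of 4.0, the estimate is 10 copies per cell.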
Denaturing sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) was performed as described by Laemmli [11], using an overall acrylamide concentration of 12% and a cross-linker concentration of 2.6% in the separating gels. Polyacrylamide gels were stained with Coomassie blue R-250 (0.1% (w/w) Coomassie blue R-250, 50% (w/w) trichloroacetic acid in H2O) and destained in an aqueous solution of 30% (v/v) methanol and 10% (v/v) acetic acid. Transfer of proteins [12] from gels to nitrocellulose membranes (GE Healthcare Europe GmbH, Freiburg, Germany) was performed according to the protocol of QIAGEN (QIAexpress). Immunodetection of His6-tagged proteins on blots was performed using Anti-Penta-His IgG1 HRP conjugate (Qiagen, Germany) and detection by chemiluminescence. GFP-His6 synthesis in the crude extracts was visualized with a transilluminator (Blue LED Illuminator, excitation wavelength of 470 nm; NIPPON Genetics EUROPE GmbH, Germany).
Generation of the Initial Shuttle Vector pPolyREP
The shuttle vector was initially divided into functional subunits to facilitate the generation of defined modules that could be easily exchanged or deleted (
The third module was a chloramphenicol acetyl transferase (CAT) resistance cassette (
The fourth module was the expression module (
The introduction of transcription terminators can promote vector stability in different hosts [15,20]. Therefore, we introduced a series of different transcription terminators (for an overview see: [21]) downstream of each gene and the multiple cloning site. The complete synthesis and assembly of the full-length shuttle vector was carried out by GeneArt® (Thermo Fisher Scientific, St. Leon-Rot, Germany).
Evaluation of pPolyREP
The initial shuttle vector pPolyREP was propagated solely by the specialized E. coli strain Stbl2, which, unlike the common strain DH5α, can maintain DNA sequences containing unstable repeats [2]. A gfp gene with a downstream His6 tag sequence was cloned into this vector, generating pPR::GFP-His6. E. coli Stbl2, B. subtilis 168, B. licheniformis DSM13, C. metallidurans CH34 and P. putida KT2440 were each transformed separately with pPolyREP and pPR::GFP-His6.
Although it was possible to select resistant transformants for all strains, GFP-His6 was only synthesized in E. coli Stbl2 and P. putida KT2440 (data not shown). This suggested that the other strains may have experienced problems similar to those reported in E. coli DH5α, namely, the truncation of the plasmid, reflecting the presence of multiple terminator sequences. It was also unclear whether pWV01 was functional, because only Gram(−) organisms were able to synthesize GFP-His6.
We therefore exploited the modular design of the shuttle vector and removed either of the two origins of replication by cutting the vector with BglII or NotI, generating pPR_pBBR1 and pPR_pWV01, respectively (
Optimization of pPolyREP to pPolyREPII
Our initial hypothesis was that the failure of the shuttle vector to replicate in common E. coli strains was caused by the large number and tandem organization of the terminator sequences. Analysis of the terminator regions in pPolyREP using the mfold program [22] showed that the tandem histidine terminators (his_Ter) and the tandem tryptophan terminators (trpA_Ter; SEQ ID NO: 5) were the strongest terminator regions, based on their calculated Gibbs free energy values [23]. We therefore used PCR to delete the corresponding regions from pPolyREP and introduced the derived vectors into E. coli DH5α, which was unable to maintain the original vector. We recovered transformants containing the vector derivative with the histidine terminator sequences deleted but the tryptophan terminators remaining intact, indicating that the host experienced difficulty with the tandem histidine terminators. However, the sequence of the isolated vector revealed additional truncations in the region between lacI and pBBR1, suggesting that the tandem rrnB terminators were also problematic. Interestingly, the tandem tR2 terminators (tR2_Ter) were also found to be modified by the partial deletion of one copy and the introduction of two point mutations. This generated a completely new terminator (new_Ter; SEQ ID NO: 3) as confirmed with the mfold program (
Evaluation of pPolyREPII
The full-length vector pPolyREPII was digested with BglII or NotI to generate the derivatives pPolyREPII(+) (pPRII(+)) and pPolyREPII(−) (pPRII(−)), which carry only the pWV01 or pBBR1 origins, respectively. Digesting these derivatives with BglII or NotI or BglII and MscI produced fragments of the anticipated sizes (
To further characterize pPolyREPII, we analyzed both its segregational and structural stability (
To estimate the copy number of pPolyREPII in the hosts tested, we transformed them with established vectors whose copy numbers are known and compared the yield of vector DNA isolated from cultures of the same OD600 nm. It was possible to estimate the copy numbers of pPolyREPII in all hosts tested except for B. licheniformis DSM13, from which vector DNA could not be isolated reliably (Table 4).
As a result, a very effective broad host range expression vector is provided, which can be established equally well in Gram(+) and Gram(−) hosts. This is made possible by rational design and by recent advances in synthetic biology. The novel nucleic acid molecule (expression vector) according to the invention can be used to establish eDNA expression libraries and to screen for desired activities in different Gram(+) as well as Gram(−) hosts at the same time.
B. subtilis 168
B. licheniformis
C. metallidurans
E. coli Stbl2
E. coli DH5α
P. putida
E. coli DH5α
P. putida
C. metallidurans
B. subtilis 168
B. licheniformis
E. coli DH5α
P. putida
C. metallidurans
B. subtilis 168
B. licheniformis
Number | Date | Country | Kind |
---|---|---|---|
13 183 909.4 | Sep 2013 | EP | regional |