The present invention relates to a gene expression control system, and in particular to the use of a riboswitch to control the expression of a target gene, wherein expression of the target gene is absent or very low when the riboswitch is not activated, and the gene is expressed when the riboswitch is activated. Preferably the dynamic range of expression is low.
In some circumstances unwanted gene expression can be harmful or even toxic to a cell. Furthermore, the regulation of expression of some genes is inherently leaky: the background level of gene expression, even without activation, can be relatively high, or at least sufficient to harm the cell. The aim of the present invention is to provide a gene expression control system which reduces or even eliminates background levels of gene expression and offers tight regulation of gene expression.
According to a first aspect the invention provides a genetic construct comprising a DNA polynucleotide sequence which encodes a riboswitch operably linked to a coding region, wherein the coding region encodes a target gene and the riboswitch modulates translation or transcription of the coding region.
The target gene may be any gene of interest, but it is preferably a gene where background levels of expression, even at a low level, can cause harm to the cell.
The riboswitch in the genetic construct may reduce the background level of target gene expression by about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or more. It may eliminate detectable background expression.
In this context the term “background expression” refers to the level of protein produced by a target gene in a cell under normal circumstances when expression of the gene is not desired, or expression of the target gene is not activated with an inducer. This level of expression is also sometimes referred to as the “leaky” level of expression, occurring because the gene promoter allows some expression even when not specifically activated.
In the genetic construct of the invention, the dynamic range of expression, that is the ratio between target gene expression when the riboswitch is on (and gene expression is activated) and when it is off, is preferably low. The dynamic range may be between about 10-fold and about 100-fold of the off level, assuming that the off level is detectable. Preferably the dynamic range is between about 10-fold and about 20-fold of the off level.
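The fold-change arithmetic behind these ranges can be sketched as follows, assuming expression is quantified by a single reporter readout measured with and without inducer. The function names and the example readings are hypothetical, and the word "about" in the ranges is approximated here by hard cut-offs.

```python
def dynamic_range(on_level: float, off_level: float) -> float:
    """Fold induction: expression with the riboswitch on divided by expression with it off."""
    if off_level <= 0:
        # The fold change is only defined when the off level is detectable.
        raise ValueError("off_level must be detectable (> 0)")
    return on_level / off_level


def classify_range(fold: float) -> str:
    """Place a fold change against the ranges described above (hard cut-offs are an assumption)."""
    if 10 <= fold <= 20:
        return "preferred (about 10- to 20-fold)"
    if 10 <= fold <= 100:
        return "acceptable (about 10- to 100-fold)"
    return "outside the described range"


# Hypothetical reporter readings in arbitrary units.
fold = dynamic_range(on_level=750.0, off_level=50.0)
print(fold, classify_range(fold))
```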
The riboswitch is preferably 5′ to the coding region. The coding region typically comprises at its 5′ terminus an ATG start codon.
The riboswitch may be an RNA molecule, such as mRNA. The riboswitch may comprise or consist of an aptamer domain, which is capable of specifically binding to an inducer, and an expression platform, which undergoes a conformational change (in response to the binding of the inducer to the aptamer domain) that promotes translation of the coding region.
The riboswitch may modulate translation of a coding region to which it is operably linked in response to contact of the aptamer domain with an inducer. The riboswitch may modulate translation of the coding region, in response to contact with an inducer, by positively regulating translation of the coding region (i.e. promoting translation of the coding region) or negatively regulating translation of the coding region (i.e. inhibiting translation of the coding region). In a preferred embodiment the inducer activates the riboswitch such that it promotes translation of the coding region.
The expression platform of the riboswitch may comprise a nucleotide sequence encoding a regulatory domain that can be used to modulate translation or transcription of the coding region. The regulatory domain may be a ribosome binding site (RBS), which is also referred to as the Shine-Dalgarno (SD) sequence. The SD sequence is complementary to the 3′ end of the 16S rRNA. In Clostridia and Bacillus the sequence of the 3′ end of the 16S rRNA may be:
and the consensus SD sequence may be:
which is followed by an initiation codon, most commonly AUG. In around 8% of cases the start codon is GUG, whereas UUG and AUU are rare initiators present in autogenously regulated genes. The optimal spacing between the SD sequence and the start codon is 8 nt, but translation initiation is only severely affected if this distance is increased above 14 nt or reduced below 4 nt [Shine, J. and Dalgarno, L. (1975) Eur. J. Biochem. 57, 221.]. The skilled person would appreciate that, in the absence of a functional RBS, ribosomes are unable to bind to the mRNA, which therefore cannot be translated into a protein.
In embodiments in which the riboswitch positively regulates translation of the coding region, the regulatory domain may, in the absence of an inducer, be sequestered by the expression platform, thus preventing binding of one or more ribosomes to the regulatory domain. Binding of the inducer to the aptamer domain may cause the expression platform to undergo a conformational change that releases the (formerly sequestered) regulatory domain, such that one or more ribosomes can bind to it and translate the coding region into a protein.
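The spacing rule above lends itself to a simple sequence check. This is a minimal sketch, assuming the commonly cited bacterial SD core AGGAGG as a stand-in motif (the consensus sequence referred to above is not reproduced in this text) and measuring spacing as the number of nucleotides between the 3′ end of the motif and the first base of the start codon. The function names and the example leader sequence are hypothetical; the sketch does not model pairing with the 16S rRNA or mRNA secondary structure.

```python
from typing import Optional

# Assumption: AGGAGG is used as a stand-in SD motif for illustration only.
SD_MOTIF = "AGGAGG"
START_CODONS = {"AUG", "GUG", "UUG"}


def sd_spacing(mrna: str, start: int, sd: str = SD_MOTIF) -> Optional[int]:
    """Nucleotides between the 3' end of the nearest upstream SD motif
    and the first base of the start codon at index `start`."""
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    if seq[start:start + 3] not in START_CODONS:
        raise ValueError(f"no recognised start codon at position {start}")
    idx = seq.rfind(sd, 0, start)  # rightmost motif lying wholly before the start codon
    if idx == -1:
        return None
    return start - (idx + len(sd))


def assess_spacing(spacing: Optional[int]) -> str:
    """Interpret a spacing using the figures quoted above (8 nt optimal, 4-14 nt tolerated)."""
    if spacing is None:
        return "no SD motif found upstream"
    if spacing == 8:
        return "optimal"
    if 4 <= spacing <= 14:
        return "tolerated"
    return "likely to impair initiation severely"


leader = "GAAAGGAGGUAAAUAAAAUGGCU"  # hypothetical leader running into a coding region
start = leader.find("AUG")
print(start, sd_spacing(leader, start), assess_spacing(sd_spacing(leader, start)))
```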
The riboswitch may alternatively act by blocking transcription of a coding region by creating a terminator, which in the presence of an inducer is removed.
The riboswitch may be activated by a non-natural or a natural agent which acts as the inducer.
The riboswitch may be a naturally occurring riboswitch or a synthetic riboswitch. A naturally occurring riboswitch may be a riboswitch responsive to adenosylcobalamin, aquacobalamin, thiamin pyrophosphate, flavin mononucleotide, S-adenosylmethionine, molybdenum cofactor, tungsten cofactor, tetrahydrofolate, S-adenosylhomocysteine, guanine, adenine, prequeuosine-1, 2′-deoxyguanosine, cyclic di-GMP, cyclic di-AMP, cyclic AMP-GMP, ZTP, Mg2+, Mn2+, F−, Ni2+/Co2+, lysine, glycine, glutamine, glucosamine-6-phosphate, azaaromatics or guanidine.
A synthetic riboswitch may be a riboswitch responsive to tetracycline; neomycin; 2,4,6-trinitrotoluene (TNT); ammeline; 5-azacytosine; theophylline; pyrimido[4,5-d]pyrimidine-2,4-diamine (PPDA); 2-aminopyrimido[4,5-d]pyrimidin-4(3H)-one (PPAO); or 2,6-diamino preQ0 (DPQ0).
The aptamer domain of the riboswitch may specifically bind an inducer such as theophylline, in which case the riboswitch is referred to as a theophylline-responsive riboswitch. Theophylline is a methylxanthine (a purine derivative) that has high affinity for the aptamer domain of the theophylline-responsive riboswitch. The aptamer discriminates very strongly against related, structurally similar purines. For example, the aptamer of the theophylline-responsive riboswitch binds theophylline with an affinity about 10,000-fold greater than that for caffeine, which differs from theophylline only by a methyl group at nitrogen atom N-7. The aptamer domain specific for theophylline can be used to create a positive or a negative regulatory riboswitch.
The riboswitch may be activated by an inducer, which may induce gene expression by binding to the aptamer domain of the riboswitch. Thus, in one embodiment, the inducer may be theophylline. However, the skilled person would appreciate that the inducer may be any molecule capable of specifically binding to an aptamer domain, such as adenosylcobalamin; aquacobalamin; thiamin pyrophosphate; flavin mononucleotide; S-adenosylmethionine; molybdenum cofactor; tungsten cofactor; tetrahydrofolate; S-adenosylhomocysteine; guanine; adenine; prequeuosine-1; 2′-deoxyguanosine; cyclic di-GMP; cyclic di-AMP; cyclic AMP-GMP; ZTP; Mg2+; Mn2+; F−; Ni2+/Co2+; lysine; glycine; glutamine; glucosamine-6-phosphate; azaaromatics; guanidine; tetracycline; neomycin; 2,4,6-trinitrotoluene (TNT); ammeline; 5-azacytosine; theophylline; pyrimido[4,5-d]pyrimidine-2,4-diamine (PPDA); 2-aminopyrimido[4,5-d]pyrimidin-4(3H)-one (PPAO); or 2,6-diamino preQ0 (DPQ0). Similarly, the skilled person would appreciate that the aptamer domain may be any domain that specifically binds to an inducer.
In an embodiment, the riboswitch may be a positive regulatory theophylline-responsive riboswitch (i.e. a riboswitch that promotes translation of the coding region). The nucleotide sequence encoding the positive regulatory theophylline-responsive riboswitch may be that referred to herein as SEQ ID NO. 1, 2, 3, 4, 5, 6 or 7, as shown in Table 1.1. Riboswitches of SEQ ID NO. 1, 2 and 3 are known, whereas riboswitches of SEQ ID NO. 4, 5, 6 and 7 are new.
In an embodiment the riboswitch has the sequence of SEQ ID NO. 2.
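The practical meaning of a 10,000-fold affinity difference can be illustrated with a single-site equilibrium binding model. In this sketch only the 10,000-fold ratio is taken from the text; the absolute dissociation constants are hypothetical, each ligand is considered alone (no competition between them), and occupancy of the aptamer is not converted into an expression level.

```python
def fraction_bound(ligand: float, kd: float) -> float:
    """Equilibrium occupancy of a single binding site: [L] / ([L] + Kd).
    `ligand` and `kd` must be in the same (arbitrary) concentration units."""
    if ligand < 0 or kd <= 0:
        raise ValueError("ligand must be >= 0 and kd > 0")
    return ligand / (ligand + kd)


def ligand_for_occupancy(theta: float, kd: float) -> float:
    """Ligand concentration needed to reach occupancy theta (0 < theta < 1)."""
    if not 0 < theta < 1:
        raise ValueError("theta must lie strictly between 0 and 1")
    return kd * theta / (1 - theta)


KD_THEOPHYLLINE = 1.0                   # hypothetical value, arbitrary units
KD_CAFFEINE = 10_000 * KD_THEOPHYLLINE  # 10,000-fold weaker binding, as stated in the text

for name, kd in (("theophylline", KD_THEOPHYLLINE), ("caffeine", KD_CAFFEINE)):
    print(name, round(fraction_bound(1.0, kd), 5))
```

With these assumptions, a theophylline concentration that half-saturates the aptamer leaves it roughly 0.01% occupied by the same concentration of caffeine, and about 10,000 times more caffeine would be needed to reach the same occupancy.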
In an embodiment the target gene may be an endonuclease. There are many different types of endonuclease that can be used in various “genome-editing” strategies where they are engineered to cut specific genomic target sequences. These include Zinc Finger Nucleases (ZFN), Transcription Activator-like Effector Nucleases (TALEN) and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) nucleases, homing meganucleases and standard restriction endonucleases (RE).
ZFNs comprise the FokI endonuclease domain fused to an array of zinc finger binding domains that recognize the target DNA sequence. In TALENs, the zinc finger array is replaced by TAL effector repeats that guide targeting to the DNA. CRISPR/Cas9 genome editing requires a single guide RNA (sgRNA) that directs the Cas9 endonuclease to a specific region of the genomic DNA, resulting in a double-strand break (DSB).
Homing meganucleases are encoded by selfish DNA elements, predominantly within introns (I-homing endonucleases) or in-frame with a precursor protein as an intein (PI-homing endonucleases). They are characterised by their extreme specificity, with target recognition sequences of up to 40 bp in length. In this respect they differ from REs, whose target sequences consist of between 4 and 8 bp. Of the REs, those recognising 8 bp sites are of greatest utility in genome editing, as the frequency of certain 8 bp recognition sequences in a genome can be extremely low; for instance, certain GC-rich 8 bp palindromic sequences can be entirely absent from AT-rich clostridial genomes.
Homing meganucleases are divided into four families, characterised by common sequence motifs: LAGLIDADG, His-Cys box, HNH and GIY-YIG. The LAGLIDADG family is the largest. Its members contain two LAGLIDADG motifs and function either as homodimers with one LAGLIDADG motif per polypeptide chain, e.g. I-CreI and I-MsoI, or as monomers with two motifs per polypeptide chain, e.g. PI-SceI, PI-PfuI and I-DmoI.
In recent years genome editing strategies based on CRISPR/Cas (clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins) have received particular prominence. They enable manipulation of the genome through the introduction of double-strand breaks in the DNA. In particular, the type II CRISPR-Cas9 system derived from Streptococcus pyogenes has been the most extensively employed CRISPR system for genome editing purposes. In this system, the hybrid CRISPR RNA (crRNA):trans-activating crRNA (tracrRNA), or the simplified chimeric synthetic single guide RNA (sgRNA), combines with Cas9 to form a ribonucleoprotein complex. This complex then recognizes the target site, based on the protospacer adjacent motif (PAM) sequence, and induces a double-strand break (DSB). In most bacteria, selection of mutant cells is achieved when the DNA editing template, which lacks the recognition site, is introduced into the genome via homologous recombination, enabling these mutant cells to “escape” the cutting activity of Cas9.
Even though CRISPR/Cas9-based systems have previously been used in several Clostridium species, including C. pasteurianum, C. acetobutylicum, C. beijerinckii and C. difficile, this technology is still hindered by low transformation efficiencies, possibly related to the large size of the plasmid and the strong selection power of Cas9. The use of a Cas9 nickase (nCas9), a Cas9 mutant that cleaves only one strand of the chromosomal DNA, has also been demonstrated in Clostridium spp.; however, because nCas9 has a weaker selection capacity than Cas9, the isolation of mutants is more time-consuming, requiring several passages onto fresh medium.
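The effect of GC content on the frequency of an 8 bp site can be estimated with a random-sequence model in which each base occurs independently at a frequency set by the genome's GC content. The 4 Mb genome length and the 29% GC content used below are hypothetical illustrative values, and real genomes deviate from this model (for example through codon usage or selection against particular sites), so the numbers are estimates only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site: str) -> str:
    return site.upper().translate(COMPLEMENT)[::-1]


def site_probability(site: str, gc: float) -> float:
    """Probability that one position matches `site`, given genome GC fraction `gc`:
    each G or C occurs with probability gc/2, each A or T with (1 - gc)/2."""
    p = 1.0
    for base in site.upper():
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return p


def expected_sites(site: str, genome_length: int, gc: float) -> float:
    """Expected number of sites in a random genome of the given length and GC content.
    A non-palindromic site can occur on either strand, so it is counted twice."""
    positions = genome_length - len(site) + 1
    strands = 1 if site.upper() == reverse_complement(site) else 2
    return strands * positions * site_probability(site, gc)


NOT_I = "GCGGCCGC"  # NotI recognition site (palindromic)
for gc in (0.50, 0.29):
    print(f"GC {gc:.0%}: {expected_sites(NOT_I, 4_000_000, gc):.2f} expected NotI sites")
```

Under these assumptions a NotI site is expected roughly 61 times in a 4 Mb genome of 50% GC, but less than once in a genome of 29% GC.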
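Target-site recognition by SpCas9 can be illustrated by scanning a sequence for protospacers that sit immediately 5′ of an NGG PAM on either strand. This is a minimal sketch, assuming the S. pyogenes Cas9 NGG PAM, a 20-nt spacer and a blunt cut 3 nt upstream of the PAM; it does not score guide efficiency or off-target sites, and the function and field names are hypothetical.

```python
from typing import List, NamedTuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Target(NamedTuple):
    strand: str       # "+" = input sequence, "-" = its reverse complement
    start: int        # protospacer start, in that strand's own coordinates
    protospacer: str  # spacer-length sequence immediately 5' of the PAM
    pam: str          # the NGG motif
    cut: int          # predicted cut position, 3 nt upstream of the PAM (same coordinates)


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_targets(seq: str, spacer_len: int = 20) -> List[Target]:
    """List every protospacer followed by an NGG PAM on either strand."""
    seq = seq.upper()
    targets = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # i is the index of the PAM's first base (the "N").
        for i in range(spacer_len, len(s) - 2):
            pam = s[i:i + 3]
            if pam[0] in "ACGT" and pam[1:] == "GG":
                targets.append(Target(strand, i - spacer_len, s[i - spacer_len:i], pam, i - 3))
    return targets


for hit in find_spcas9_targets("A" * 20 + "TGG"):
    print(hit)
```

The same scan can be run on a proposed editing template: if the chosen protospacer and PAM are absent from it, cells that have incorporated the template by homologous recombination would be expected to escape Cas9 cutting, as described above.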
The present invention provides, for the first time, the use of a synthetic riboswitch to control the expression of a DNA endonuclease, and in particular of Cas9, for CRISPR genome editing. More specifically, the invention provides the use of a synthetic riboswitch to control the expression of an endonuclease, in particular Cas9, in the genus Clostridium.
The present invention provides the advantage that, when not activated, the riboswitch prevents or reduces unwanted background endonuclease activity in a cell. It also has the advantage that, when the riboswitch is activated by an inducer, the endonuclease is expressed at a low level: sufficient to allow genome editing, but not high enough to cause significant off-target effects.
CRISPR applications may also be hampered by low transformation efficiencies. This may be because expression of the nuclease before homologous recombination has occurred leads to cell death, so that very few (sometimes no) colonies escape or survive the activity of the nuclease. An inducible expression control system that minimises expression of the nuclease until it is induced allows transformation efficiency to be improved. Furthermore, the riboswitch is small (around 89 nucleotides) and adds little to the size of the construct. By contrast, incorporating a transcription factor-based inducible system into a vector would generally add at least about 1.5 kb to an already large vector (cas9 alone is a very large gene, about 4.2 kb). Vector size affects transformation efficiency, and a vector with cas9 under the control of a riboswitch will be smaller than a vector in which cas9 is regulated by a transcription factor-based inducible system.
The endonuclease may be associated with CRISPR gene editing. The endonuclease may be Cas9, Cas9 nickase, dCas, Cpf1, C2c1, C2c2, C2c3, a Cas9 derivative, or any endonuclease suitable for use with CRISPR gene editing, or a homolog or functional variant thereof. The endonuclease may be a Cas derivative fused with a deaminase, such as cytidine deaminase or adenine deaminase; the Cas derivative may be any Cas9 effector protein, such as Cas9 nickase or dCas9. Preferably the endonuclease is Cas9.
In an alternative embodiment the target gene may be an endonuclease characterised by a target recognition sequence of at least 8 bp. This includes, but is not restricted to, restriction enzymes such as the following (target sites are in brackets): AbsI (CCTCGAGG), SbfI/SdaI (CCTGCAGG), MreI (CGCCGGCG), MauBI (CGCGCGCG), SgrDI (CGTCGACG), SrfI (GCCCGGGC), AsiSI/SfaAI (GCGATCGC), NotI (GCGGCCGC), FseI (GGCCGGCC) and AscI/SgsI (GGCGCGCC). Also included in this category are the meganucleases, which have much larger recognition sites. These include, but are not restricted to, meganucleases such as the following (target sites are in brackets):
In common with Cas9, the regulated expression of any of these enzymes may be used to introduce DSBs into the genome of the target cell wherever a recognition site is present. For example, various recombination strategies have been devised based on the regulated production of I-SceI in a cell into whose genome its target site has been introduced. In its simplest form, a plasmid carrying a knock-out (KO) cassette, in which a mutant allele is flanked by a left homology arm (LHA) and a right homology arm (RHA), together with an I-SceI restriction site lying outside the KO cassette, is integrated into the genome via homologous recombination. Regulated production of I-SceI, using an inducible expression system such as the riboswitch, will result in cleavage of the genome of all cells that carry the integrated plasmid, and hence the I-SceI recognition site, leading to cell death. Cells in which the plasmid has excised, as a consequence of homologous recombination between the duplicated LHA or RHA, will lack an I-SceI site and survive. Depending on which homology arm mediates recombination, the surviving cells will carry either a wild-type or a mutant allele. Other, more sophisticated variations of this approach exist, in which the use of I-SceI or a similar enzyme is combined with lambda-Red technology and the like, but the principle remains the same: regulated production of I-SceI leads to elimination of the unwanted cells and the enrichment/selection of the desired bioengineered variants. This approach may be taken with any meganuclease or restriction endonuclease, not just I-SceI.
In an alternative embodiment the target gene may be a sigma factor. Sigma (σ) factors control the promoter selectivity of bacterial RNA polymerase (RNAP). On binding to RNAP, σ factors allow efficient promoter recognition and transcription initiation. Aside from promoter recognition, they contribute to DNA strand separation, and then dissociate from the core enzyme following transcription initiation. Prokaryotes produce a number of different σ factors, each of which recognises a specific promoter sequence. In this way, the production of one particular sigma factor can simultaneously regulate the expression of a discrete set of genes under the control of its target promoter sequence. Sigma factors are classified into two structurally unrelated families, the σ70 and the σ54 families. The σ70 family includes primary sigma factors responsible for the expression of housekeeping genes (e.g., σA in Bacillus subtilis) as well as related alternative sigma factors; σ54 forms a distinct subfamily of sigma factors referred to as σN. The number of genes controlled varies with the sigma factor. The expression of most genes in a bacterial cell depends on the ‘housekeeping’ sigma factor σ70, but bacteria can express different sigma factors in response to different environmental conditions.
Alternative sigma factors can be responsible for the expression of a small, sometimes extremely limited, subset of genes. One such class of sigma factor is responsible for the expression of the large clostridial extracellular virulence factors of pathogenic strains of Clostridium botulinum, Clostridium tetani and Clostridium difficile, and of a bacteriocin of Clostridium perfringens. These particular sigma factors have been assigned to σ70 group 5 (Dupuy and Matamouros, 2006, Research Microbiology, 157: 201-205) and recognise highly specific promoter elements which uniquely precede the toxin/bacteriocin genes of these bacteria. No other genes in the genome are known to be under the transcriptional control of these sigma factors. As the large size of these toxins/bacteriocins places an appreciable metabolic burden on the cell, it is important that their production is not constitutive but is limited to specific conditions when they are required, i.e., in the case of C. difficile, when growing in the GI tract. Accordingly, their expression needs to be tightly regulated. This is achieved by making gene transcription absolutely reliant on a specific σ factor, and tightly regulating production of that σ factor.
The group 5 RNA polymerase sigma factor may be TcdR (from Clostridium difficile), BotR (from Clostridium botulinum), TetR (from Clostridium tetani) or UviA (from Clostridium perfringens). In respective preferred embodiments, the group 5 RNA polymerase sigma factor is BotR, TetR, TcdR or UviA. The group 5 RNA polymerase sigma factor may have a sequence identity or sequence homology of at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98% or 99% to, or may be identical to, one or more of BotR, TetR, TcdR or UviA.
Due to the specificity of this group of σ70 factors, expression of native or heterologous genes preceded by the requisite target promoter is absolutely reliant on availability of the σ factor. For example, expression of the various genes encoding the components of the Clostridium botulinum toxin complex is reliant on the presence of BotR, as will be the expression of any heterologous gene placed under the transcriptional control of either of the two target promoters, those that precede the ntnh gene (Pntnh) or the ha70 gene (Pha70). It follows that if expression of the σ factor (BotR) is placed under the control of the riboswitch, then in the absence of inducer there will effectively be no transcription from its target promoters (Pntnh or Pha70), and consequently no production of the products encoded by the downstream genes. This tight level of repression may be advantageous from a safety perspective, as it will limit the production of potentially highly toxic molecules, such as botulinum toxin, to circumstances where such production is required, e.g., the commercial production of toxin for cosmetic or therapeutic purposes.
As a consequence of the low dynamic range of induction of the genetic construct of the invention, if the target gene is BotR, or a similar σ factor, production of BotR or the similar σ factor may be relatively low upon activation of the riboswitch. However, because each σ factor molecule brings about multiple transcriptional events at its target promoters (Pntnh or Pha70, in the case of BotR), the relative level of induction may be amplified. Thus, the level of expression of the gene concerned may be higher than if the genes downstream of those promoters (Pntnh or Pha70, in the case of BotR) were placed directly under the control of the riboswitch.
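The selection logic described above can be reduced to a toy model in which every genome still carrying the I-SceI site is cleaved, and the cell dies, once production of the meganuclease is induced. The assumptions of complete cutting and lethality, the class and function names and the population counts are all hypothetical; in practice, cutting efficiency, DNA repair and any residual uninduced expression will affect the outcome.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    has_site: bool  # True while the integrated plasmid, and so the I-SceI site, is in the genome
    allele: str     # "co-integrate", "wildtype" or "mutant"


def induce_meganuclease(population: List[Cell]) -> List[Cell]:
    """Assumes every genome carrying the recognition site is cut and the cell dies."""
    return [cell for cell in population if not cell.has_site]


# Hypothetical population: mostly co-integrates, plus rare cells that have already
# excised the plasmid through a second recombination event between the homology arms.
population = (
    [Cell(True, "co-integrate")] * 990
    + [Cell(False, "wildtype")] * 6
    + [Cell(False, "mutant")] * 4
)
survivors = induce_meganuclease(population)
print(Counter(cell.allele for cell in survivors))
```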
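The amplification argument can be illustrated with a deliberately simple linear model in which each σ factor molecule is assumed to generate a fixed, hypothetical amount of target product through repeated rounds of transcription at its target promoter. The function names and numbers are illustrative only, and the model ignores, for example, competition between σ factors for core RNAP and saturation of the target promoters.

```python
def riboswitch_output(induced: bool, off_level: float, on_level: float) -> float:
    """Protein produced from a riboswitch-controlled coding region (arbitrary units)."""
    return on_level if induced else off_level


def direct_output(induced: bool, off_level: float, on_level: float) -> float:
    """Target gene placed directly under riboswitch control."""
    return riboswitch_output(induced, off_level, on_level)


def cascade_output(induced: bool, off_level: float, on_level: float, gain_per_sigma: float) -> float:
    """Riboswitch controls the sigma factor; the target gene sits behind the sigma-dependent
    promoter. Each sigma molecule is assumed to yield `gain_per_sigma` units of target
    product, and no sigma factor means no transcription from the target promoter."""
    sigma = riboswitch_output(induced, off_level, on_level)
    return sigma * gain_per_sigma


# Hypothetical values: no detectable leak, modest induced level, 25 units of product per sigma.
for induced in (False, True):
    print(induced, direct_output(induced, 0.0, 10.0), cascade_output(induced, 0.0, 10.0, 25.0))
```

In this sketch an undetectable off level of the σ factor gives no target product, while a modest induced level is multiplied by the per-molecule gain.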
The genetic construct may comprise a) a promoter suitable for use in a prokaryotic host, b) a riboswitch to regulate translation, and c) the coding region for a target gene.
The genetic construct of the invention may be introduced into a host cell by using any suitable means, such as endocytic uptake, microinjection, ballistic bombardment, a particle gun, electroporation, transduction, transfection, infection or cell fusion. Preferably, the genetic construct is introduced into the cell by using a vector.
Thus, according to another aspect of the invention, there is provided a vector comprising the genetic construct of the invention.
The vector may be a recombinant vector. The vector may be a virus, a virus-like particle, a plasmid, a cosmid, a phage, a transposon or a liposome.
According to another aspect of the invention, there is provided a host cell comprising the genetic construct according to the invention or a vector according to the invention.
The host cell may be a bacterium, a plant, an alga, a fungus or a protozoan. Preferably, the cell is a bacterium.
The bacterium may be a Gram-positive or a Gram-negative bacterium. The bacterium may be of the genus Bacillus or Clostridium. The bacterium may be Clostridium sporogenes.
The bacterial cell may be of any bacterial species, but is preferably a member of the bacterial phylum Firmicutes, composed of the class Clostridia (orders Clostridiales, Halanaerobiales, Natranaerobiales and Thermoanaerobacterales), the class Bacilli (orders Bacillales and Lactobacillales) and the class Mollicutes (orders Acholeplasmatales, Anaeroplasmatales, Entomoplasmatales, Haloplasmatales and Mycoplasmatales).
The bacterium may be within the order of Clostridiales, Halanaerobiales, Natranaerobiales, Thermoanaerobacterales, Bacillales, Lactobacillales, Acholeplasmatales, Anaeroplasmatales, Entomoplasmatales, Haloplasmatales or Mycoplasmatales. Preferably, the bacterium is within the order of Clostridiales.
Within the order Clostridiales is the genus Clostridium. Preferred species are C. aceticum, C. acetobutylicum, C. aerotolerans, C. baratii, C. beijerinckii, C. bifermentans, C. botulinum, C. butyricum, C. cadaveris, C. cellulolyticum, C. chauvoei, C. clostridioforme, C. colicanis, C. difficile, C. drakei, C. estertheticum, C. fallax, C. feseri, C. formicaceticum, C. glycolicum, C. histolyticum, C. innocuum, C. kluyveri, C. ljungdahlii, C. lavalense, C. magnum, C. mayombei, C. methoxybenzovorans, C. novyi, C. oedematiens, C. paraputrificum, C. pasteurianum, C. perfringens, C. phytofermentans, C. piliforme, C. ragsdalei, C. ramosum, C. scatologenes, C. septicum, C. sordellii, C. sporogenes, C. sticklandii, C. tertium, C. tetani, C. thermocellum, C. thermosaccharolyticum, C. tyrobutyricum, C. papyrosolvens, C. saccharobutylicum, C. carboxidivorans, C. scindens, C. autoethanogenum, C. diolis, C. aurantibutyricum, C. felsineum, C. puniceum, C. roseum, C. saccharoperbutylacetonicum, C. tetanomorphum, and Clostridioides difficile, as well as other acetogenic anaerobes such as Acetobacterium woodii, Acetonema longum, Alkalibaculum bacchi, Blautia producta, Butyribacterium methylotrophicum, Eubacterium limosum, Oxobacter pfennigii, Moorella thermoacetica, Moorella thermoautotrophica and Thermoanaerobacter kivui.
Within the order Bacillales are the Bacillaceae, which include the genera Bacillus and Geobacillus, and the Staphylococcaceae, which include the genus Staphylococcus. Preferred Bacillus species are: B. alcalophilus, B. aminovorans, B. amyloliquefaciens, B. anthracis, B. caldolyticus, B. circulans, B. coagulans, B. cereus, B. globigii, B. licheniformis, B. natto, B. polymyxa, B. sphaericus, B. stearothermophilus, B. smithii, B. subtilis, B. thermoglucosidasius, B. thuringiensis and B. vulgatus. Preferred Geobacillus species are: G. debilis, G. stearothermophilus, G. thermocatenulatus, G. thermoleovorans, G. kaustophilus, G. thermoglucosidasius, G. thermodenitrificans, G. gargensis, G. jurassicus, G. lituanicus, G. pallidus, G. subterraneus, G. tepidamans, G. toebii, G. uzenensis and G. vulcani. Preferred Staphylococcus species include: S. arlettae, S. aureus, S. auricularis, S. capitis, S. caprae, S. carnosus, S. chromogenes, S. cohnii, S. condimenti, S. delphini, S. devriesei, S. epidermidis, S. equorum, S. felis, S. fleurettii, S. gallinarum, S. haemolyticus, S. hominis, S. hyicus, S. intermedius, S. kloosii, S. leei, S. lentus, S. lugdunensis, S. lutrae, S. lyticans, S. massiliensis, S. microti, S. muscae, S. nepalensis, S. pasteuri, S. pettenkoferi, S. piscifermentans, S. pseudintermedius, S. pulvereri, S. rostri, S. saccharolyticus, S. saprophyticus, S. schleiferi, S. sciuri, S. simiae, S. simulans, S. stepanovicii, S. succinus, S. vitulinus, S. warneri and S. xylosus.
The bacterial cell may be C. acetobutylicum, C. difficile, C. beijerinckii, C. ljungdahlii, C. kluyveri, C. botulinum, C. autoethanogenum, C. pasteurianum, C. saccharobutylicum, C. carboxidivorans, C. cellulovorans, C. sporogenes, C. phytofermentans, C. ragsdalei, C. tyrobutyricum, C. perfringens, C. butyricum, C. cellulolyticum, C. formicaceticum, C. novyi, C. scatologenes, C. septicum, C. sordellii, C. sticklandii, C. tetani, C. thermocellum, C. thermosaccharolyticum, C. papyrosolvens, C. scindens or C. bifermentans.
Preferably, the bacterial cell is a species selected from the group consisting of C. acetobutylicum, C. aerotolerans, C. autoethanogenum, C. baratii, C. beijerinckii, C. bifermentans, C. botulinum, C. butyricum, C. cadaveris, C. cellulolyticum, C. cellulovorans, C. chauvoei, C. clostridioforme, C. colicanis, C. difficile (now renamed Clostridioides difficile), C. estertheticum, C. fallax, C. feseri, C. formicaceticum, C. histolyticum, C. innocuum, C. kluyveri, C. ljungdahlii, C. lavalense, C. novyi, C. oedematiens, C. paraputrificum, C. pasteurianum, C. perfringens, C. phytofermentans, C. piliforme, C. ragsdalei, C. ramosum, C. roseum, C. saccharoperbutylacetonicum, C. scatologenes, C. septicum, C. sordellii, C. sporogenes, C. sticklandii, C. tertium, C. tetani, C. thermocellum, C. thermosaccharolyticum, C. tyrobutyricum, C. papyrosolvens, C. saccharobutylicum, C. carboxidivorans and C. scindens.
The bacterial cell may be C. phytofermentans, C. hylemonae, C. leptum, C. symbiosum, C. nexile, C. ramosum, C. bolteae, C. asparagiforme, C. methylpentosum, C. butyricum, C. sporogenes or C. scindens.
The bacterial cell may be Cupriavidus necator or Cupriavidus metallidurans, or may be a cyanobacterium.
In a preferred embodiment the host cell is a Clostridium cell. One particular challenge with Clostridium is to identify a workable inducible expression system. Clostridium is a large genus of Gram-positive, anaerobic, spore-forming bacteria that includes representatives relevant to both human and animal disease as well as to the industrial production of chemicals and fuels. Whilst the majority of these species are studied for independent purposes, the emerging field of synthetic biology brings them together under the same scope: the engineering of novel strains with new functionalities. Such novel strains are, on the one hand, facilitating the study of fundamental biological processes and, on the other, advancing biotechnological applications. These applications include the production of platform chemicals and biofuels (e.g., Clostridium pasteurianum, Clostridium acetobutylicum); cellulosic and hemicellulosic biomass degradation (e.g., Clostridium cellulolyticum); carbon fixation (e.g., Clostridium ljungdahlii and Clostridium autoethanogenum); and anti-cancer therapeutics (e.g., Clostridium sporogenes). Only a few studies on the use of inducible expression systems in Clostridium have been reported. These include a lactose-inducible system (LAC) from Clostridium perfringens, an arabinose-inducible system (ARA) based on the ARAi regulon from C. acetobutylicum, and the tetracycline-inducible system (TET), originally adapted from E. coli. Despite evidence of dose-dependent induction, both the LAC and ARA systems are hampered by a significant level of gene expression in the absence of the inducer, which compromises their dynamic range and makes them unsuitable for applications where tight control of gene expression is needed. The TET system, on the other hand, exhibits very low basal expression and has the highest induction efficiency of the inducible promoters applied in Clostridium spp. to date.
However, optimal performance of the TET system requires high doses of the inducer, and elevated concentrations of the tetracycline analogue anhydrotetracycline have been shown to have significant inhibitory effects on cell growth.
Another important limitation to the application of Clostridium spp. in synthetic biology projects is the lack of fast and reliable methods for chromosomal manipulation. Traditionally, chromosomal modifications have been achieved primarily via insertional mutagenesis using ClosTron (Heap, J. T. et al. J. Microbiol. Methods 80, 49-55 (2010)) or via a special form of allelic exchange termed allele-coupled exchange (ACE) (Heap, J. T. et al. Nucleic Acids Res. 40, e59 (2012)); unfortunately, both methods are far from ideal. In ClosTron, for example, the end product is not a true deletion of the gene but rather an interruption of the gene's function, which may also cause polar effects on downstream genes. ACE, on the other hand, although it allows more precise modification of the genome, is lengthy and relies on a counter-selection marker (such as pyrE), which may not always be available.
The genetic construct of the invention addresses many of the problems currently faced with respect to Clostridium spp.
According to a further aspect the invention provides a kit for regulating expression of a target gene, wherein the kit comprises the genetic construct of the invention. If the target gene is an endonuclease for use in CRISPR gene editing the kit may further comprise a sequence-specific guide RNA.
According to a further aspect the invention provides a method of controlling expression of a target gene in a cell comprising:
If the target gene is an endonuclease the method may further comprise transforming the cell with gRNAs needed to target the endonuclease activity, or using a cell which already contains the gRNAs needed to target the endonuclease activity.
According to a yet further aspect the invention provides a method of controlling expression of a target gene in a cell comprising:
Where the target gene is an endonuclease the cell may further comprise gRNAs needed to target the endonuclease activity.
All of the features described herein (including the accompanying claims, abstract and drawings) and/or all of the steps of any method or process so disclosed, may be combined with any of the above aspects in any combinations, except combinations where at least some of such features and/or steps are mutually exclusive.
For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made, by way of example, to the accompanying Figures, in which:—
All the E. coli and Clostridium strains used in this study are listed in Table 2. E. coli TOP10 (Invitrogen) was used as a general host for plasmid construction and propagation. E. coli CA434 was used as the donor strain for conjugation. Plasmid DNA for the transformation of C. pasteurianum was methylated in vivo by propagation in the E. coli host CR1, which harbours the plasmid pCR1 encoding the M.BepI methyltransferase, as previously described (Schwarz, K. M. et al. Metab. Eng. 40, 124-137 (2017)). All E. coli strains were transformed by electroporation using a MicroPulser™ system (BioRad). E. coli strains were grown at 30 or 37° C. in Luria-Bertani (LB) medium supplemented with chloramphenicol (25 μg/mL in solid and 12.5 μg/mL in liquid media), erythromycin (500 μg/mL) or kanamycin (50 μg/mL) when necessary. Growth media for the different Clostridium species are specified in Table 3. Media for clostridial strains were supplemented with the following antibiotics, inducer or supplement when appropriate: thiamphenicol (15 μg/mL), erythromycin (10 μg/mL), cefoxitin (16 μg/mL), D-cycloserine (500 μg/mL), theophylline (0.1-10 mM) and glucose (0.05% w/v). Clostridium strains were grown at 37° C. in an anaerobic cabinet (MG1000 anaerobic workstation; Don Whitley Scientific Ltd).
All PCR reactions were performed using KOD-Hot Start Polymerase 2× Master Mix (Merck Millipore) or DreamTaq Green PCR Master Mix (Thermo Fisher Scientific). T4 ligase (Promega) was used for DNA ligation reactions. Restriction enzymes were purchased from New England Biolabs. Theophylline was purchased from Sigma-Aldrich.
Oligonucleotide primers were synthesized by Sigma-Aldrich and are listed in Table 4. Plasmids were constructed by restriction enzyme-based cloning procedures. Constructs were verified by DNA sequencing (Eurofins). All the plasmids used in this study are listed in Table 5. Details of plasmid construction are given in the Supporting Information. Protospacer sequences were designed according to the protocol described at http://benchling.com/pub/ellis-crspr-tools.
For each tested system, three independently conjugated C. sporogenes cultures were grown for 12 hours with selection before being diluted to a starting OD600 of 0.01 in fresh medium. For evaluation of CAT activity at a single time point, cultures were induced at an OD600≈0.5 with 2 mM theophylline and collected at stationary phase, 4 hours after induction. For CAT expression assays over time, cultures were prepared as for the single time point measurements, and samples were collected at the specified time points. For dose-dependency assays, cultures were induced with increasing concentrations of theophylline (0-10 mM). In all cases, after sample collection, pellets were obtained and stored at −20° C. until CAT activity was determined.
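The dilution to a starting OD600 of 0.01 is a C1V1 = C2V2 calculation. The sketch below assumes that OD600 is proportional to cell density over the range used; the function name and the 10 mL culture volume and pre-culture OD600 in the example are hypothetical, not values from the text.

```python
def inoculum_volume_ml(od_preculture: float, od_start: float, final_volume_ml: float) -> float:
    """Volume of pre-culture (mL) that, made up to final_volume_ml with fresh
    medium, gives a starting OD600 of od_start (C1*V1 = C2*V2).

    Assumes OD600 scales linearly with cell density; dense pre-cultures are
    usually read on a diluted aliquot so that the reading stays linear.
    """
    if od_start <= 0 or final_volume_ml <= 0:
        raise ValueError("od_start and final_volume_ml must be positive")
    if od_preculture <= od_start:
        raise ValueError("pre-culture must be denser than the target OD600")
    return od_start * final_volume_ml / od_preculture


# Hypothetical example: a 12 h pre-culture at OD600 = 2.0, 10 mL final culture
inoculum = inoculum_volume_ml(2.0, 0.01, 10.0)
medium = 10.0 - inoculum
print(f"{inoculum * 1000:.0f} uL pre-culture + {medium:.2f} mL fresh medium")
```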
CAT activity was measured in cell lysates according to the method of Shaw (Methods Enzymol. 43, 737-55 (1975)). Cell lysates were obtained using BugBuster Master Mix lysis buffer (Novagen), according to the manufacturer's protocol. 150 μL of a master mix consisting of 94 mM Tris buffer (pH 7.8), 0.19 mM acetyl coenzyme A, 0.0833 mM DTNB (5,5′-dithiobis-2-nitrobenzoic acid) and 0.005% (w/v) chloramphenicol was dispensed into each well of a 96-well clear-bottom plate (Greiner Bio One International) containing 10 μL of cell lysate. Absorbance was measured at 412 nm for 1 min using a CLARIOstar plate reader (BMG Labtech GmbH) set at 25° C. The rate of increase in absorbance (ΔA412/min) was used to calculate CAT activity (U/mL) using the equation CAT activity = (ΔA412/min × 0.2 × df)/(0.0136 × 0.01), where 0.2 is the total volume (in mL) of the assay, df is the dilution factor, 0.0136 is the micromolar extinction coefficient for DTNB at 412 nm and 0.01 is the volume (in mL) of cell lysate used.
CAT activity was further normalized to total protein concentration, determined using a BCA assay (Thermo Scientific).
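The CAT calculation and protein normalization described above can be sketched as follows. The sketch assumes the standard form of the DTNB-based CAT equation, activity (U/mL) = (ΔA412/min × 0.2 × df) / (0.0136 × 0.01), using only the constants the text defines; the least-squares slope, the optional blank correction and the kinetic readings in the example are illustrative assumptions, not part of the described protocol.

```python
from statistics import fmean

# Constants named in the text; the arrangement of terms assumes the standard
# form of the DTNB-based CAT assay equation.
ASSAY_VOLUME_ML = 0.2      # total assay volume (mL)
EPSILON_DTNB = 0.0136      # extinction coefficient for DTNB at 412 nm, as given
LYSATE_VOLUME_ML = 0.01    # volume of cell lysate per well (mL)


def rate_per_min(times_min, a412):
    """Least-squares slope of A412 against time (absorbance units per min)."""
    t_bar, a_bar = fmean(times_min), fmean(a412)
    num = sum((t - t_bar) * (a - a_bar) for t, a in zip(times_min, a412))
    den = sum((t - t_bar) ** 2 for t in times_min)
    return num / den


def cat_activity_u_per_ml(delta_a412_per_min, df=1.0, blank_rate=0.0):
    """CAT activity (U/mL) from the rate of increase in A412.

    blank_rate is a hypothetical no-lysate control rate; it is not mentioned
    in the text and defaults to zero.
    """
    return ((delta_a412_per_min - blank_rate) * ASSAY_VOLUME_ML * df) / (
        EPSILON_DTNB * LYSATE_VOLUME_ML
    )


def specific_activity_u_per_mg(u_per_ml, protein_mg_per_ml):
    """Normalize CAT activity to total protein (e.g. from a BCA assay)."""
    return u_per_ml / protein_mg_per_ml


# Hypothetical 1 min kinetic read, sampled every 15 s
rate = rate_per_min([0.0, 0.25, 0.5, 0.75, 1.0], [0.100, 0.105, 0.110, 0.115, 0.120])
activity = cat_activity_u_per_ml(rate, df=1.0)
print(activity, specific_activity_u_per_mg(activity, protein_mg_per_ml=1.5))
```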
For growth experiments, pre-cultures grown for 12 hours were diluted to a starting OD600 of 0.01 with fresh medium containing different concentrations of the inducer theophylline (0-10 mM). Because theophylline was dissolved in DMSO, appropriate volumes of DMSO were added to all cultures to control for any effect of DMSO on growth.
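Matching the DMSO content across the theophylline series amounts to giving every culture the same total DMSO volume as the highest dose. A minimal sketch, assuming a hypothetical 200 mM theophylline stock in DMSO and a 10 mL culture (the text gives neither value); the helper name is also hypothetical.

```python
def dmso_matched_additions(targets_mm, stock_mm, final_volume_ml):
    """For each target theophylline concentration (mM), return the volumes (mL)
    of DMSO stock, extra DMSO and medium so that every culture receives the
    same total DMSO volume and ends at final_volume_ml."""
    stock_ml = [c * final_volume_ml / stock_mm for c in targets_mm]
    total_dmso_ml = max(stock_ml)  # the highest dose sets the common DMSO volume
    if total_dmso_ml >= final_volume_ml:
        raise ValueError("stock too dilute for the requested concentrations")
    plan = []
    for c, s in zip(targets_mm, stock_ml):
        plan.append({
            "theophylline_mM": c,
            "stock_mL": s,
            "extra_dmso_mL": total_dmso_ml - s,  # pure DMSO to top up the vehicle
            "medium_mL": final_volume_ml - total_dmso_ml,
        })
    return plan


# Hypothetical 200 mM stock, 10 mL cultures, 0-10 mM series
for row in dmso_matched_additions([0, 0.1, 1, 2, 5, 10], stock_mm=200, final_volume_ml=10):
    print(row)
```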
Prior to RNA extraction, 2 mL of C. sporogenes cultures grown in TYG to early stationary phase (OD600≈2.5) were mixed with 4 mL of RNAprotect Bacteria Reagent (Qiagen). Total RNA was extracted using the FastRNA Pro Blue kit (MP Biomedicals), according to the manufacturer's instructions. Purified RNA was DNase-treated using the RQ1 RNase-Free DNase kit (Promega). cDNA synthesis was performed on 1 μg of RNA using the Omniscript RT kit (Qiagen). 5 μL of 1:10 diluted cDNA was used for qRT-PCR analysis with Power SYBR Green Master Mix (Thermo Fisher Scientific) on a LightCycler 480 II (Roche). cDNA synthesis reactions containing no reverse transcriptase were included as a control for genomic DNA contamination. Primer efficiencies were calculated for each primer set prior to use. qRT-PCR was performed on cDNA from three biological replicates, in technical duplicate for each cDNA sample and primer pair. Results were calculated according to the E-method (ref. 58) and normalized to the 16S rRNA (16Srrn) and gyrA reference genes.
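The E-method is Roche's efficiency-corrected relative quantification. As a stand-in, the sketch below uses a Pfaffl-style efficiency-corrected ratio in which the target is normalized to the geometric mean of the two reference genes (16S rRNA and gyrA), and derives primer efficiency from a standard-curve slope. This is an assumption about the calculation, not the Roche implementation; Cq values are assumed to be averaged over technical duplicates first, and all numbers are hypothetical.

```python
from math import prod
from statistics import fmean


def efficiency_from_slope(slope):
    """Amplification factor per cycle from a Cq vs log10(input) standard curve
    (slope of about -3.32 corresponds to 2.0, i.e. 100 % efficiency)."""
    return 10 ** (-1.0 / slope)


def relative_quantity(efficiency, cq_calibrator, cq_sample):
    """Efficiency-corrected quantity of a sample relative to the calibrator."""
    return efficiency ** (cq_calibrator - cq_sample)


def geometric_mean(values):
    return prod(values) ** (1.0 / len(values))


def normalized_ratio(target, references):
    """target and each reference are (efficiency, Cq calibrator, Cq sample)."""
    rq_target = relative_quantity(*target)
    rq_refs = [relative_quantity(*ref) for ref in references]
    return rq_target / geometric_mean(rq_refs)


# Hypothetical Cq values: technical duplicates averaged before use
target = (1.95, fmean([24.8, 24.9]), fmean([22.1, 22.3]))
refs = [(1.98, 9.1, 9.0), (1.92, 18.4, 18.5)]  # 16S rRNA, gyrA
print(f"normalized fold change: {normalized_ratio(target, refs):.2f}")
```

Using the geometric rather than the arithmetic mean keeps the two reference genes on an equal multiplicative footing, so a shift in either one moves the normalizer proportionally.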
To determine whether C. sporogenes consumes theophylline, 12-hour cultures were diluted to a starting OD600 of 0.01. At an OD600 of ≈0.5, cultures were induced with 2 mM theophylline. 1 mL samples were taken immediately and then at 2, 4, 6, 8, 10 and 20 hours after theophylline addition. Samples were centrifuged for 10 min at 10,000×g. Supernatants were used to determine the concentration of theophylline using an Alliance HT 2795 HPLC (Waters Corporation) coupled to a Micromass Quattro LC mass spectrometer (Waters, Wilmslow) equipped with an electrospray source and operated in positive-ion mode, scanning from m/z 70 to 500 at 1 cycle/s for all samples. The general parameters were as follows: capillary voltage 2000 V, sampling cone 30 V, source temperature 130° C., desolvation temperature 350° C., cone gas flow 64 L/h and desolvation gas flow 621 L/h.
Prior to ionization, chromatographic separation was achieved using a Supelco Ascentis Express HPLC column (100 mm×3 mm, 2.7 μm, Sigma-Aldrich). The mobile phase consisted of (A) water with 0.1% v/v formic acid and (B) methanol with 0.1% v/v formic acid. In all HPLC runs the elution gradient started at 95% A, 5% B and increased to 10% A, 90% B, followed by a 5 min re-equilibration period. A sample volume of 10 μL was injected for each HPLC run. The column was operated at 40° C. with a flow rate of 0.4 mL/min. Each HPLC run included blanks and the relevant standard solutions. Samples and standards were filtered through a 0.2 μm filter.
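Theophylline concentrations in the supernatants can be read off an external calibration built from the standard solutions. A minimal sketch, assuming a linear response of peak area to concentration (for example, areas from the extracted-ion chromatogram of protonated theophylline, [M+H]+ at m/z ≈ 181.07); the standard concentrations and all peak areas below are hypothetical. The last step expresses each time point as a fraction of the 0 h value, which is how an unconsumed inducer would appear.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    x_bar, y_bar = fmean(x), fmean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return slope, y_bar - slope * x_bar


def concentration(area, slope, intercept):
    """Invert the calibration line (result in the units of the standards)."""
    return (area - intercept) / slope


# Hypothetical calibration: standards in mM and their peak areas
std_mm = [0.0, 0.5, 1.0, 2.0, 4.0]
std_area = [60.0, 5070.0, 10040.0, 20010.0, 40100.0]
slope, intercept = fit_line(std_mm, std_area)

# Hypothetical supernatant peak areas at the sampling times used in the text
hours = [0, 2, 4, 6, 8, 10, 20]
areas = [20100.0, 19950.0, 20050.0, 19900.0, 20080.0, 20010.0, 20060.0]
conc = [concentration(a, slope, intercept) for a in areas]
for h, c in zip(hours, conc):
    print(f"{h:>2} h: {c:.2f} mM ({c / conc[0]:.0%} of 0 h)")
```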
For conjugation efficiency assays, the donor strain, E. coli CA434 (in triplicate), was grown overnight at 30° C. in LB supplemented with kanamycin and chloramphenicol. For each replicate, two 1 mL cultures were centrifuged at 6,000×g for 1 min and washed once with phosphate-buffered saline (PBS). After a second centrifugation step, one of the two cultures was used to quantify the donor by plating appropriate serial dilutions onto LB plates. The second culture was transferred to the anaerobic cabinet and mixed with 200 μL of a 12-hour C. sporogenes culture. The mixture was spotted onto TYG medium supplemented with glucose. After 24 hours, cells were harvested, resuspended in 500 μL of PBS and plated onto media supplemented with thiamphenicol and D-cycloserine, in the presence and absence of the inducer theophylline. After 24-48 hours of incubation at 37° C., colonies were counted and conjugation efficiency was calculated as the total number of transconjugants per 1 mL of donor culture.
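The conjugation-efficiency arithmetic can be sketched as below. The volume of the 500 μL resuspension spread per plate is not given in the text, so plated_volume_ml is an assumption, as are the colony counts; the per-donor-CFU ratio on the last line is an optional normalization not stated in the text.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Viable count of the undiluted suspension from one countable plate."""
    return colonies * dilution_factor / plated_volume_ml


def total_transconjugants(colonies, plated_volume_ml, resuspension_volume_ml=0.5):
    """Scale colonies on a plate up to the whole 500 uL PBS resuspension.

    plated_volume_ml is an assumption: the text does not state how much of
    the resuspension was spread per plate.
    """
    return colonies * resuspension_volume_ml / plated_volume_ml


# Hypothetical counts for one replicate (1 mL donor culture per mating)
donor = cfu_per_ml(colonies=87, dilution_factor=1e7, plated_volume_ml=0.1)
transconjugants = total_transconjugants(colonies=42, plated_volume_ml=0.1)
print(f"{transconjugants:.0f} transconjugants per 1 mL of donor culture")
print(f"{transconjugants / donor:.2e} transconjugants per donor CFU")
```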
Transformation efficiency in C. pasteurianum was determined as the average of three independent transformations with 4 μg of plasmid DNA.
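Transformation efficiency here is colonies per μg of plasmid DNA, averaged over three transformations. A minimal sketch, assuming the colony totals are already scaled to the whole recovery volume; the colony counts are hypothetical. The relative_reduction helper applies the same arithmetic to the values reported later in this section for pRECas1 (5000 CFU/μg DNA) versus pRECas1C (6500 CFU/μg DNA) in C. pasteurianum.

```python
from statistics import fmean, stdev


def transformation_efficiency(colony_counts, dna_ug=4.0):
    """Mean and SD of CFU per ug DNA over independent transformations.

    Assumes each count already covers the full transformation (i.e. has
    been scaled to the whole recovery volume).
    """
    per_ug = [n / dna_ug for n in colony_counts]
    return fmean(per_ug), stdev(per_ug)


def relative_reduction(test, control):
    """Fractional drop of a test efficiency relative to a control."""
    return 1.0 - test / control


# Hypothetical colony totals from three transformations with 4 ug DNA each
mean, sd = transformation_efficiency([20000, 18000, 22000])
print(f"{mean:.0f} +/- {sd:.0f} CFU/ug DNA")
print(f"{relative_reduction(5000, 6500):.0%}")  # pRECas1 vs pRECas1C
```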
Single colonies were picked randomly and screened for desired mutants using colony PCR and specific flanking primers. In all cases, mutations were further confirmed with Sanger sequencing (data not shown).
5′-Terminal mRNA Analysis
To determine the transcriptional start site (TSS) of Pfdx, we sequenced the 5′ end of the mRNA, including the untranslated region (5′UTR). Total RNA was extracted using the High Pure RNA Isolation Kit (Roche). poly-T-cDNA was obtained using the 5′RACE kit 2nd Generation (Roche) and the Expand High Fidelity PCR System (Roche) as per the manufacturer's instructions. The kit requires three gene-specific primers (IC263-3, IC264-r and IC265-r), all complementary to the catP mRNA transcript. After amplification, the TSS was identified from the 5′UTR sequence, and the putative −10 and −35 boxes were assigned on the basis of typical features of bacterial promoters.
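Once the TSS is known, assigning putative −10 and −35 boxes can be illustrated with a simple consensus scan. The sketch assumes σ70-type consensus hexamers (TTGACA and TATAAT), a 15 to 19 nt spacer and 4 to 9 nt between the −10 box and the +1 site, and scores candidates by total mismatches. These are generic bacterial promoter rules used for illustration, not necessarily the criteria applied in this study, and the test sequence is synthetic.

```python
MINUS35 = "TTGACA"  # assumed sigma70-type -35 consensus
MINUS10 = "TATAAT"  # assumed sigma70-type -10 consensus


def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))


def find_promoter_boxes(upstream, spacers=range(15, 20), tss_gaps=range(4, 10)):
    """Scan the sequence immediately 5' of the TSS for -35/-10 hexamers.

    upstream: DNA or RNA (5'->3') ending at position -1, i.e. the base just
    before the mapped +1 site. Returns (total_mismatches, start_-35,
    start_-10, spacer_length) for the best arrangement (0-based indices into
    `upstream`), or None if the sequence is too short.
    """
    seq = upstream.upper().replace("U", "T")
    n = len(seq)
    best = None
    for gap in tss_gaps:                 # bases between the -10 box and +1
        start10 = n - gap - len(MINUS10)
        if start10 < 0:
            continue
        score10 = mismatches(seq[start10:start10 + 6], MINUS10)
        for spacer in spacers:           # bases between the -35 and -10 boxes
            start35 = start10 - spacer - len(MINUS35)
            if start35 < 0:
                continue
            score = score10 + mismatches(seq[start35:start35 + 6], MINUS35)
            candidate = (score, start35, start10, spacer)
            if best is None or candidate < best:
                best = candidate
    return best


# Synthetic test: perfect boxes with a 17 nt spacer, 7 nt upstream of +1
synthetic = "GCGCGCGCGC" + "TTGACA" + "G" * 17 + "TATAAT" + "C" * 7
print(find_promoter_boxes(synthetic))
```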
GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGUAACAACAAG
GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGAGGUAACAACAAG
GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGAGGCAACAAC
GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGAGGUAACAAC
GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGAGGUAACUUA
GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGAGGUGUGUUA
GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGAGGUCAACAAG
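Because riboswitches F to I were made by altering the region between the SD and the translational start site, one simple way to compare the sequences listed above is to count the nucleotides that follow the SD core. The sketch assumes that the seven sequences correspond to SEQ ID Nos 1 to 7 in the order listed, that AGGAGG is the SD core and that each listed sequence ends immediately before the start codon; none of these is stated explicitly here.

```python
# Sequences as listed above, assumed to be SEQ ID Nos 1 to 7 in order.
RIBOSWITCH_SEQS = {
    1: "GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGUAACAACAAG",
    2: "GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGAGGUAACAACAAG",
    3: "GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGAGGCAACAAC",
    4: "GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGAGGUAACAAC",
    5: "GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGAGGUAACUUA",
    6: "GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGAGGUGUGUUA",
    7: "GGUGAUACCAGCAUCGUCUUGAUGCCCUUGGCAGCACCCUGCUAAGGAGGUCAACAAG",
}

SD_MOTIF = "AGGAGG"  # assumed Shine-Dalgarno core


def sd_to_3prime_end(seq, motif=SD_MOTIF):
    """Nucleotides between the end of the first SD motif and the 3' end of the
    listed sequence, or None if the motif is absent. Treating the 3' end as
    the base just before the start codon is an assumption."""
    i = seq.find(motif)
    return None if i < 0 else len(seq) - (i + len(motif))


for seq_id, seq in RIBOSWITCH_SEQS.items():
    print(seq_id, len(seq), sd_to_3prime_end(seq))
```

Only motif positions and lengths are computed; the sketch makes no claim about which spacer produces the expression phenotypes reported for each riboswitch.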
E. coli TOP10
E. coli CA434
E. coli TOP10-pCR1
Clostridium sporogenes
C. spo-ΔspoIIR
C. spo-ΔspoIIE
C. spo-spoIIE::ARC
C. spo-Δc16380
C. spo-*c03700
C. spo-Δc01750
C. spo-Δc04580::cargo
C. spo-*c14540::cargo
C. spo-Δc05250-
C. spo-Δc14780
C. spo-pyrE::cargo
C. spo-c31080
C. spo-c29800
Clostridium pasteurianum
C. past-ΔspoIIE
Clostridium difficile 630
C. diff-ΔspoIIE
Clostridium botulinum
C. bot-ΔspoIIE
C. sporogenes and C. botulinum grown in solid or
C. pasteurianum grown in
C. pasteurianum grown in
C. difficile grown in solid or
C. sporogenes-spoIIR deletion
C. sporogenes-spoIIE deletion
C. sporogenes-ARC integration
C. pasteurianum-spoIIE deletion
C. difficile-spoIIE deletion
C. botulinum-spoIIE deletion
(a) LHA, left homology arm
(b) RHA, right homology arm
C. pasteurianum
C. difficile
C. botulinum
Efficiency (%)(a)
C. pasteurianum DSM 525
C. difficile 630
C. botulinum ATCC 3502
C. sporogenes NCIMB 10696
(a) (Number of correctly edited transformants/total number of transformants screened) × 100
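The editing efficiency in the footnote above can be computed directly from colony PCR calls. The sketch classifies each band by its size, using as an example the ~2.2 kbp (deletion) and ~2.9 kbp (wild type) amplicons expected for the spoIIR deletion in C. sporogenes; the ±0.15 kbp tolerance and the band sizes are hypothetical. Because mutations were also confirmed by Sanger sequencing, a PCR call alone is only a first screen.

```python
def classify_amplicon(size_kbp, edited_kbp, wild_type_kbp, tolerance_kbp=0.15):
    """Call a colony PCR band as 'edited', 'wild type' or 'unclear'."""
    if abs(size_kbp - edited_kbp) <= tolerance_kbp:
        return "edited"
    if abs(size_kbp - wild_type_kbp) <= tolerance_kbp:
        return "wild type"
    return "unclear"


def editing_efficiency(calls):
    """(Number correctly edited / total screened) x 100."""
    if not calls:
        raise ValueError("no transformants screened")
    return 100.0 * sum(c == "edited" for c in calls) / len(calls)


# Hypothetical band sizes (kbp) from four screened colonies
bands = [2.2, 2.25, 2.9, 2.15]
calls = [classify_amplicon(b, edited_kbp=2.2, wild_type_kbp=2.9) for b in bands]
print(calls, editing_efficiency(calls))
```

Bands that match neither size are counted as not correctly edited, which keeps the efficiency conservative.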
Demonstration that Riboswitches Work in Clostridium
Diverse theophylline-dependent riboswitches work in both Gram-positive and Gram-negative bacteria (Topp, S. et al. Appl. Environ. Microbiol. 76, 7881-4 (2010)); however, the data presented here are the first to show the performance of this inducible translational regulatory system in Clostridium sporogenes NCIMB 10696. Three known riboswitch sequences were selected, riboswitch-D, -E and -E* (SEQ ID Nos: 1, 2 and 3 respectively), which were expected to exhibit higher translation efficiencies in the ‘on’ state. In previous publications, riboswitch E* was shown to be the best theophylline-dependent riboregulator in Gram-positive bacteria, owing to its high dynamic range and very low basal expression. Four new riboswitches (named -F to -I, SEQ ID Nos: 4 to 7) were constructed by rationally modifying the spacer between the Shine-Dalgarno sequence (SD) and the translational start site of the original riboswitch E* (Table 1). A reporter plasmid, pMTL-IC101, derived from the pMTL82251 vector (Heap, J. T. et al. J. Microbiol. Methods 78, 79-85 (2009)), was used as the chassis for all the constructs (
The synthetic theophylline-responsive riboswitch is composed of an aptamer and a synthetic SD (
Given the importance of inducible systems in which gene expression can be controlled within a large regulatory window, the expression patterns of riboswitches E, G and H were compared when placed downstream of the core region of a synthetically weakened Plea, named Ph4. Ph4 was generated by replacing the core −10/−35 elements of Pfdx with the corresponding regulatory elements of the constitutive ptb promoter from C. acetobutylicum ATCC 824 (Pptb, associated with the protein-coding gene Ca_3076) (
To assess the dose dependency of riboswitch regulation, strains harbouring the riboswitch G downstream of Pfdx or Pfdx* were cultivated with increasing concentrations of theophylline, ranging from 0.1 to 10 mM (
Because a good inducer should not be metabolized by the host, it was confirmed that theophylline is not metabolized by Clostridium sporogenes. Clostridial cells were cultivated in TYG medium, and theophylline was added to a final concentration of 2 mM. The concentration of theophylline in cell-free supernatant samples was monitored over time by HPLC/MS (high-performance liquid chromatography-mass spectrometry). The concentration of theophylline remained constant over the course of the experiment, with a slight increase in the culture supernatant 20 hours after induction, possibly due to the release of the inducer from lysed cells (
The data below demonstrate that the tightly inducible gene expression system described above can be used to provide an efficient CRISPR editing tool for Clostridium that circumvents previously encountered obstacles, for several reasons. Firstly, the theophylline-responsive riboswitch can be designed to have the desired regulatory window: basal expression of Cas9 in the absence of inducer should be very low, allowing homologous recombination to occur before Cas9 mediates the site-specific double-strand break (DSB), while after induction an adequate level of expression is achieved with minimal Cas9-associated toxicity. Secondly, riboswitches are smaller than systems that require a transcription factor; in a riboswitch-based system, 84 nucleotides are enough to tightly control the expression of Cas9, reducing the size of the plasmid and enabling higher transformation efficiencies. Thirdly, because repression occurs at the level of translation, the riboswitch system allows the use of high-copy origins of replication, avoids the undesired effects of read-through transcription from the plasmid backbone and facilitates cloning, screening and sequencing. Finally, owing to these characteristics, a genetic construct according to the invention allows all the essential components of a functional genome editing tool to be carried on a single plasmid.
To achieve appropriate control over cas9 expression, both in the uninduced state and after induction, two riboswitches, E and G, were used, each located downstream of the promoter Pfdx (
Transformation of the resulting vectors into C. sporogenes was performed via conjugation, plating the mated cultures onto media supplemented with chloramphenicol in the absence and presence of 5 mM theophylline. As expected, the different constructs yielded different conjugation efficiencies (
Individual colonies harbouring the different CRISPR vectors and the DNA editing template were picked and transferred to selective plates containing theophylline. In all cases, amplicons of ~2.2 kbp rather than ~2.9 kbp were detected on screening, consistent with the expected complete ~0.7 kbp deletion of spoIIR and an editing efficiency of 100% (
To further illustrate the versatility of the RiboCas system/genetic construct of the invention, both deletion and integration of larger fragments in the genome of C. sporogenes were attempted. The spoIIE gene (Clspo_c37040), which is ~2.4 kbp, was selected as the target locus. Two sets of homology arms were designed either to delete the target sequence or to integrate a 3 kbp cassette that confers resistance to erythromycin once inserted into the genome (
To determine the applicability of the theophylline-responsive riboswitches and the RiboCas editing tool in other clostridial species, the same gene, spoIIE, was targeted for deletion in three different species: the solventogenic C. pasteurianum and the pathogens C. difficile and C. botulinum. Previous work had enabled the use of Cas9 for editing in C. pasteurianum; however, the strategies employed suffer from very low transformation efficiencies that might compromise more ambitious applications. The same applies to C. difficile, where, despite some limited success, the performance of CRISPR remains unsatisfactory. In particular, conjugation efficiency of the CRISPR plasmid is often very low, probably because of the high toxicity associated with Cas9 endonuclease activity.
pRECas1 vectors, containing the homology arms and the sgRNA targeting the spoIIE gene in each species, were introduced into the host organisms Clostridium pasteurianum DSM 525-H1, Clostridium difficile 630 and the group I Clostridium botulinum strain ATCC 3502. C. pasteurianum was transformed by electroporation, whereas C. difficile and C. botulinum were transformed by conjugation. In parallel, the control plasmid pRECas1C (a RiboCas vector lacking both homology arms and the sgRNA) was introduced into each host to determine the leakiness of the system and its impact on the transformation/conjugation process. All transformations were plated on media supplemented with thiamphenicol, both in the absence and presence of theophylline. Transformation efficiency of the targeting RiboCas vector in C. pasteurianum was only about 23% lower than that of the control vector (5000 CFU/μg DNA for pRECas1 vs 6500 CFU/μg DNA for pRECas1C) (
All gene editing attempts carried out to validate the RiboCas system in Clostridium, including insertions, deletions and nucleotide substitutions, are summarized in Table 1. Although pRECas1 was the most efficient vector in this study (in terms of conjugation efficiency), vectors including the promoter J23119 may be suitable for organisms in which ParaE is expected to be natively repressed (e.g. C. acetobutylicum).
To exemplify the use of a riboswitch to control the expression of an alternative sigma factor, botR from C. botulinum ATCC 19397 was placed under the control of the Plea promoter from Clostridium sporogenes NCIMB 10696 and a theophylline-responsive riboswitch (rbG; SEQ ID No. 5). In this embodiment, expression of BotR would activate its cognate promoter Pntnh, which would in turn drive catP expression, quantifiable spectrophotometrically via a chloramphenicol acetyltransferase (CAT) enzymatic assay. The Pfdx-rbG-botR-T1-Pntnh-catP construct (
As can be seen in
This serves to demonstrate that a riboswitch as described in the present invention can be successfully employed to tightly control the expression of a sigma factor.
This study shows the first implementation of synthetic riboswitches in the genus Clostridium and demonstrates their application in highly efficient genome editing. New and known theophylline-dependent riboswitches are shown to function in C. sporogenes, with a response that depends on inducer concentration. In particular, the novel riboswitch G outperformed previously published riboswitches, exhibiting a large dynamic range and very low basal expression. By replacing the promoter and modifying the 5′UTR sequence upstream of the riboswitch, a library of inducible switches covering a broad spectrum of dynamic ranges was generated, suitable both for applications where high levels of gene expression are needed and for the expression of toxic proteins. In all cases, the riboswitch responded to the inducer theophylline, indicating that its performance is independent of the genetic context.
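The dynamic range of a riboswitch is commonly summarized as fold induction: reporter activity with inducer divided by basal activity without it. A minimal sketch, assuming reporter readouts (for example, CAT specific activities) keyed by theophylline concentration in mM, with 0 mM as the basal level. Substituting a detection limit for undetectable basal activity, which turns the ratios into lower bounds, is a common convention rather than something stated in the text; the numbers in the example are hypothetical, not results from this study.

```python
def fold_induction(activity_by_mm, detection_limit=None):
    """Induced/uninduced ratio for each inducer concentration.

    activity_by_mm maps theophylline concentration (mM) to reporter activity;
    the 0 mM entry is the basal (uninduced) level. If a detection limit is
    given and basal activity falls below it, the limit is used instead, so
    the ratios become lower bounds.
    """
    basal = activity_by_mm[0.0]
    if detection_limit is not None:
        basal = max(basal, detection_limit)
    if basal <= 0:
        raise ValueError("basal activity must be positive; supply a detection limit")
    return {c: a / basal for c, a in sorted(activity_by_mm.items())}


# Hypothetical dose series (activity in arbitrary units)
print(fold_induction({0.0: 0.8, 0.5: 12.0, 2.0: 40.0, 10.0: 44.0}))
```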
To validate the usefulness of these tight riboregulators, a novel CRISPR-Cas9-based system, RiboCas, was developed and shown to work in several clostridial species, including the non-pathogen C. sporogenes, the solventogenic C. pasteurianum and the pathogens C. difficile and C. botulinum. This novel system, using a genetic construct according to the invention, overcomes the main obstacles associated with Cas9 editing tools, including very low transformation efficiencies and the inability to select only edited cells owing to excessive Cas9 toxicity. Temporally restricted, minimal exposure of the cells to the editing system has three main benefits: it provides time for homologous recombination to occur before Cas9 is expressed, it reduces the opportunity for off-target effects, and it diminishes the likelihood of mutations that inactivate Cas9 or the sgRNA; such mutations give rise to “escaper” colonies that are indistinguishable from non-edited cells. Because riboswitches can be tuned to the desired regulatory window, a riboswitch-based CRISPR system allows very low basal expression of Cas9 in the absence of inducer and an adequate level of expression after induction, delivering the benefits of an inducible Cas9-based system.
As the field of synthetic biology progresses towards more practical applications, the technologies that have been developed and optimized for canonical organisms such as E. coli are likely to be needed in other biological chassis, including Clostridium spp. This study provides a new set of tools for the efficient manipulation of industrially-relevant organisms as well as for the study of Clostridium pathogens and the development of associated therapies.
Number | Date | Country | Kind |
---|---|---|---|
1901165.9 | Jan 2019 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2020/050192 | 1/28/2020 | WO | 00 |