The content of the electronically submitted sequence listing in ASCII text file (Sequence Listing.ST25.txt; Size: 196,608 bytes; and Date of Creation: Aug. 10, 2009) filed with the application is incorporated herein by reference in its entirety.
1. Field of the Invention
The present invention relates to the field of molecular biology and genetic tool development in thermophilic bacteria. In particular, it relates to the use of positive and/or negative selection markers that can be used to efficiently select modified strains of interest. By providing such capabilities, the disclosed invention facilitates the recycling of genetic markers in thermophilic bacterial host cells. The present invention also allows the creation of unmarked strains. The genetic tools disclosed in the present invention are prerequisites for making targeted higher order mutations in a single thermophilic strain background.
2. Background Art
Thermophilic microorganisms, which can grow at temperatures of 45° C. and above, are useful for a variety of industrial processes. For example, thermophilic microorganisms can be used as biocatalysts in reactions at higher operating temperatures than can be achieved with mesophilic microorganisms. Thermophilic organisms are particularly useful in biologically mediated processes for energy conversion, such as the production of ethanol from plant biomass, because higher operating temperatures allow more convenient and efficient removal of ethanol in vaporized form from the fermentation medium. Thermophilic organisms can also be used for the generation of alternative products including lactate or acrylate.
The ability to metabolically engineer thermophilic microorganisms to improve various properties (e.g., ethanol production, breakdown of lignocellulosic materials), would allow the benefit of higher operating temperatures to be combined with the benefits of using industrially important enzymes from a variety of sources in order to improve efficiency and lower the cost of production of various industrial processes, such as energy conversion and alternative fuel production.
Thermophilic organisms such as C. thermocellum and T. saccharolyticum are rapidly becoming organisms of choice for their potential to produce ethanol from cellulosic material. The genetic engineering of such thermophiles is necessary for the development of an efficient consolidated bioprocessing (CBP) system in the production of cellulosic ethanol and alternative products such as lactate or acrylate. A critical step towards genetic engineering of thermophiles is the development of specialized genetic tools.
Positive and negative selection markers greatly facilitate the ability to recycle genetic markers and make unmarked gene deletions, both of which are prerequisites for making targeted higher order mutations in a single strain background. The latter is required to make C. thermocellum a high yielding ethanologen. To date, higher order mutations have not been possible to achieve in C. thermocellum due to the limited number of genetic markers available in this system.
Positive and negative selections are commonly used genetic tools and have been applied to many classes of microbes. Many different types of positive and negative selections exist. However, little of this technology has been transferred to the anaerobic thermophiles. This is especially true of the cellulolytic clostridia, such as C. thermocellum, that fall into this class. In terms of negative selectable markers, not much information is known with respect to their use in anaerobic thermophilic organisms.
The choice of selection markers for use in anaerobic thermophiles is also complicated by the fact that prior selection systems typically are not adaptable to thermophilic systems. In addition, many of the pathways in which selectable markers function have not been clearly elucidated in thermophiles. Furthermore, whether traditional selection schemes are operable under temperature and pH conditions required for the growth of thermophiles is also unpredictable. Thus, it is unclear whether thermophilic organisms harbor homologs of well-known marker genes and whether they would function as expected. Attempts to utilize selection markers commonly used in other systems have resulted in inadequate growth of strains, the inability to efficiently select for the presence or absence of the marker, and a failure of selection due to a lack of information regarding the potential of such a marker to function in the thermophilic host.
Applicants have recognized the potential of certain selection markers to be applied towards the genetic engineering of thermophiles. These markers include the URA3 bacterial homolog, pyrF, as well as thymidine kinase (tdk) and hypoxanthine phosphoribosyl transferase (hpt). The pyrF gene has been successfully utilized in various systems, but has not been extensively applied to thermophilic organisms.
The tdk gene has been used as a negative selectable marker to make targeted gene deletions in other systems, such as the gram-negative bacterium Acinetobacter sp. and the gram-positive bacterium Streptococcus gordonii. See Metzgar et al., NAR 32: 5780-5790 (2004) and Franke et al., Antimicrobial Agents and Chemotherapy 44: 787-789 (200). However, no negative selection tools have been shown to be successful for use in C. thermocellum.
In addition, it is well known that the hypoxanthine phosphoribosyl transferase (hpt) gene is sensitive to the anti-metabolites 8-azahypoxanthine, 6-mercaptopurine, 8-azaguanine, aza-2,6-diaminopurine, and 6-thioguanine. There are multiple reports of using these anti-metabolites to delete the hpt gene and turn it into a negative selectable marker. However, there are no reports utilizing an artificial operon expressing an antibiotic resistance gene and hpt or tdk. In addition, there are no published reports using hpt as a marker in thermophilic or cellulolytic organisms. Excluding mammalian systems, there are very few reports detailing the use of hpt as a positive selectable marker. Such reports include those that describe the use of hpt in non-thermophilic organisms such as Toxoplasma gondii, Methanococcus maripaludis, Methanosarcina acetivorans and vaccinia virus. See Donald and Roos, Mol. Biochem. Parasitol. 91:295-305 (1998); Donald et al., J. Biol. Chem. 271:14010-9 (1996); Moore and Leigh, J. Bacteriol. 187:972-9 (2005); Pritchett et al., Appl. Environ. Microbiol. 70:1425-33 (2004); Isaacs et al., Virology 178:626-30 (1990).
The use of tdk and hpt as potential positive and negative selectable markers in mammalian cell culture was first reported in 1962 with the development of the HAT medium selection technique (http://en.wikipedia.org/wiki/HAT_medium). The HAT selection technique has been refined and modified over the past five decades utilizing both hpt and tdk in various ways. The use of either tdk or hpt as selectable markers in thermophilic or cellulolytic organisms has not, however, been reported.
The present invention provides genetic tools for use in anaerobic thermophiles, including vector constructs for positive and/or negative selection and methods of utilizing such constructs for the recycling of genetic markers and for the creation of unmarked strains. In particular, the present invention provides for vector constructs containing a combination of markers, including pyrF, tdk and/or hpt, and optionally one or more antibiotic resistance markers. The present invention demonstrates the application of these selectable markers, and the use of both positive and negative selection capabilities, for the genetic engineering of thermophilic organisms.
The present invention provides for a vector for use in an anaerobic thermophilic host comprising: (a) one or more selectable marker sequences, wherein each selectable marker sequence comprises a nucleic acid sequence encoding for a positive and/or negative selectable marker; and (b) a thermophilic host sequence; wherein said thermophilic host sequence comprises a nucleic acid sequence that is endogenous to said thermophilic host.
In additional embodiments, the selectable markers are selected from the group consisting of thymidine kinase (tdk), hypoxanthine phosphoribosyltransferase (hpt) and orotidine-5′-phosphate decarboxylase (pyrF), an antibiotic resistance marker or a combination thereof. In certain embodiments, the selectable markers are derived from an anaerobic thermophilic organism, including a heterologous anaerobic thermophilic organism. In a further aspect, the invention provides that the tdk is from Thermoanaerobacterium saccharolyticum. In other aspects, the invention provides that the pyrF or hpt is from Clostridium thermocellum or Thermoanaerobacterium saccharolyticum.
In further embodiments, the vector comprises at least one positive selectable marker sequence, at least one negative selectable marker sequence, at least two selectable marker sequences or at least one positive selectable marker sequence and at least one negative selectable marker sequence. In particular embodiments, the selectable marker sequence encodes for a selectable marker that provides for both positive and negative selection. In other embodiments, the selectable marker sequence encodes for a selectable marker that provides for positive or negative selection.
In additional embodiments, the invention provides that the anaerobic thermophilic host sequence of the vector comprises nucleic acid sequences of regions flanking an endogenous target gene, an endogenous replicon, an endogenous origin of replication, or an endogenous regulatory sequence. In certain embodiments, the anaerobic thermophilic host is a xylanolytic and/or cellulolytic thermophilic organism. In certain other embodiments, the thermophilic host is Clostridium thermocellum or Thermoanaerobacterium saccharolyticum.
The invention further provides for a thermophilic host cell comprising a vector according to the invention. In particular embodiments, the endogenous hpt gene of the thermophilic host cell has been deleted (Δhpt). In certain other embodiments, the thermophilic host is not auxotrophic.
The invention also provides for a method for producing a transformed thermophilic host cell, said method comprising the following steps: (a) transforming said thermophilic host cell with the vector according to the invention; and (b) selecting said host cell for the presence of said vector within the host cell.
The invention also provides for a method of making an unmarked thermophilic host cell, said method comprising the following steps: (a) transforming said thermophilic host cell with the vector according to the invention; (b) selecting said host cell for the presence of said vector within the host cell; (c) culturing said host cell for a length of time and under conditions whereby the vector replicates; and (d) selecting said host cell for the absence of said vector within the host cell.
The invention further provides for a method of making one or more targeted gene deletions in a thermophilic host cell, said method comprising the following steps: (a) transforming said thermophilic host cell with the vector according to the invention, wherein said vector comprises a thermophilic host sequence flanking an endogenous target gene; (b) selecting said host cell for the presence of said vector within the host cell; (c) culturing said host cell for a length of time and under conditions whereby homologous recombination occurs between the vector and the host cell genome; and (d) determining whether said target gene has been deleted; and, optionally, (e) repeating steps (a)-(d) for deletion of a different target gene. In additional embodiments, the target gene encodes for pta or ldh.
The invention also provides for a method for recycling genetic markers in a thermophilic host cell, said method comprising the following steps: (a) transforming said thermophilic host cell with the vector according to the invention; (b) selecting said host cell for the presence of said vector within the host cell; (c) culturing said host cell for a length of time and under conditions whereby the vector replicates; and (d) selecting said host cell for the absence of said vector within the host cell; and, optionally, (e) repeating steps (a)-(d).
The invention additionally provides for a thermophilic host cell produced by a method according to the invention.
The present invention relates to, inter alia, the use of positive and/or negative selectable markers in thermophilic organisms. Applicants have constructed and characterized plasmids containing one or more of the selectable markers pyrF, tdk and hpt. Applicants' invention provides important tools for use in genetically engineering thermophilic microorganisms. In particular, Applicants' invention allows for recycling of genetic markers in thermophilic host cells and the creation of unmarked thermophilic strains.
A “plasmid” or “vector” refers to an extrachromosomal element often carrying one or more genes that are not part of the central metabolism of the cell, and is usually in the form of a circular double-stranded DNA molecule. Such elements may be autonomously replicating sequences, genome integrating sequences, phage or nucleotide sequences, linear, circular, or supercoiled, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a promoter fragment and DNA sequence for a selected gene product along with appropriate 3′ untranslated sequence into a cell. Preferably, the plasmids or vectors of the present invention are stable and self-replicating.
An “expression vector” is a vector that is capable of directing the expression of genes to which it is operably linked.
The term “thermophilic” refers to an organism that grows and thrives at a temperature of about 45° C. or higher.
The term “anaerobic” refers to an organism that grows and thrives under conditions of an absence of oxygen or under conditions of depleted nitrate, sulphate and/or oxygen.
A “selectable marker” is a gene, the expression of which creates a detectable phenotype and which facilitates detection of host cells that contain a plasmid having the selectable marker. Selectable markers include thymidine kinase (tdk), hypoxanthine phosphoribosyltransferase (hpt) and orotidine-5′-phosphate decarboxylase (pyrF). Additional non-limiting examples of selectable markers include drug resistance genes and nutritional markers. For example, the selectable marker can be a gene that confers resistance to an antibiotic selected from the group consisting of: ampicillin, kanamycin, erythromycin, chloramphenicol, gentamycin, kasugamycin, rifampicin, spectinomycin, D-Cycloserine, nalidixic acid, streptomycin, or tetracycline. Other non-limiting examples of selection markers include adenosine deaminase, aminoglycoside phosphotransferase, dihydrofolate reductase, hygromycin-B-phosphotransferase, thymidine kinase, and xanthine-guanine phosphoribosyltransferase. A single plasmid can comprise one or more selectable markers.
The term “FOA” or “5-FOA” refers to 5-fluoroorotic acid, which is typically used in yeast molecular genetics to detect expression of the URA3 gene that encodes orotidine-5′-monophosphate (OMP) decarboxylase. Cells with an active URA3 gene (Ura+) (or the homolog pyrF) convert the 5-FOA to fluorodeoxyuridine, which is toxic to cells. Yeast strains carrying a mutation in the URA3 gene (or pyrF) grow in the presence of 5-FOA, if the medium is supplemented with uracil.
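The selection logic described above can be summarized as a small truth table. The following is a minimal sketch in Python, not part of the disclosed methods; the function name `grows` and its parameters are hypothetical, and the model assumes that uracil auxotrophy and 5-FOA toxicity are the only factors affecting growth.

```python
def grows(has_active_pyrF: bool, foa_present: bool, uracil_present: bool) -> bool:
    """Toy growth predictor for pyrF/URA3-based selection (illustrative only)."""
    if has_active_pyrF and foa_present:
        # An active pyrF/URA3 product converts 5-FOA into a toxic compound.
        return False
    if not has_active_pyrF and not uracil_present:
        # A pyrF/URA3 mutant is a uracil auxotroph and needs supplied uracil.
        return False
    return True


# Positive selection: medium without uracil keeps only cells carrying pyrF.
print(grows(True, False, False), grows(False, False, False))
# Negative selection: 5-FOA plus uracil keeps only cells lacking pyrF.
print(grows(True, True, True), grows(False, True, True))
```

Under these assumptions the same marker supports both positive selection (medium lacking uracil) and negative selection (medium containing 5-FOA and uracil).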
The term “endogenous” as used herein means native to, or originating within, an organism or system, e.g., a component that is normally present, produced or synthesized within an organism or system.
The term “heterologous” as used herein refers to an element of a plasmid or cell that is derived from a source other than the endogenous source. Thus, for example, a heterologous sequence could be a sequence that is derived from a different gene or plasmid from the same host, from a different strain of host cell, or from an organism of a different taxonomic group (e.g., different kingdom, phylum, class, order, family, genus, or species, or any subgroup within one of these classifications). The term “heterologous” is also used synonymously herein with the term “exogenous.”
The term “unmarked” as used herein means not having a particular identifying selectable marker. For example, an “unmarked strain” or “unmarked host cell” refers to a strain or host cell that does not contain a gene for one or more particular selectable markers, where the selectable markers can be present endogenously, present extrachromosomally (e.g., on a plasmid) or integrated into the genome. A “marked strain” or “marked host cell” means having a particular identifying selectable marker.
The term “recombination” refers to the physical exchange of DNA between two identical (homologous), or nearly identical, DNA molecules. Recombination is used for targeted gene deletion to modify the sequence of a gene.
A “targeted gene deletion” or “gene knockout” refers to a technique by which an organism is engineered such that a particular endogenous gene of interest has been made inoperative. A targeted gene deletion can be achieved by utilizing a vector construct that has been engineered to recombine with the endogenous target gene, which is accomplished by incorporating sequences from the target gene itself into the vector construct flanking a foreign sequence. Recombination then occurs between the target gene sequences within the vector and the endogenous target gene sequences, resulting in the insertion of a foreign sequence to disrupt the gene. With its sequence interrupted, the altered gene in most cases will be translated into a nonfunctional protein, if it is translated at all. Because the desired type of DNA recombination is a rare event in the case of most cells and most constructs, the foreign sequence chosen for insertion usually includes a selectable marker. This enables selection of cells or organisms in which the targeted gene was successfully deleted. A rarer second recombination event can subsequently occur, resulting in the extraction of the selectable marker from the site of insertion. After several rounds of cell division, this extracted marker sequence can be lost, resulting in an unmarked strain harboring a targeted gene deletion.
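The two recombination events described above can be illustrated with a toy string model. This is a simplified sketch, assuming that recombination can be represented as replacing whatever lies between two flanking sequences; the helper `replace_between_flanks` and all sequences shown are hypothetical and are not taken from the Sequence Listing.

```python
def replace_between_flanks(genome: str, up_flank: str, down_flank: str, insert: str) -> str:
    """Replace the region between two flanking sequences with `insert`."""
    start = genome.find(up_flank)
    if start == -1:
        raise ValueError("upstream flank not found in genome")
    left_end = start + len(up_flank)
    right_start = genome.find(down_flank, left_end)
    if right_start == -1:
        raise ValueError("downstream flank not found after upstream flank")
    return genome[:left_end] + insert + genome[right_start:]


# Hypothetical toy sequences (illustration only).
UP, TARGET, DOWN = "ACGTAC", "GGGCCCGGG", "TTGACA"
MARKER = "ATATATAT"
genome = "CCAA" + UP + TARGET + DOWN + "GGTT"

# First event: the marker replaces the target gene (marked deletion strain).
marked = replace_between_flanks(genome, UP, DOWN, MARKER)
# Second event: the marker is removed (unmarked deletion strain).
unmarked = replace_between_flanks(marked, UP, DOWN, "")

print(marked)
print(unmarked)
```

In this model the marked strain would be recovered by positive selection for the marker, and the unmarked strain by negative selection against it, after which the same marker is available for another round.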
“Flanking sequences” as used herein refers to short DNA sequences located on either side of a transcription unit or a genetic locus. Flanking sequences often do not code for a protein.
The term “recycling a genetic marker” refers to the use of a selectable marker for making, e.g., a targeted gene deletion, and then removing the marker gene to allow subsequent genetic manipulations with that same marker.
The term “auxotrophy” refers to the inability of an organism to synthesize a particular organic compound required for its growth. An auxotroph is an organism that displays this characteristic. A strain is said to be auxotrophic if it carries a mutation that renders it unable to synthesize an essential compound. For example a bacterial mutant in which a gene of the uracil synthesis pathway is inactivated is a uracil auxotroph. Such a strain is unable to synthesize uracil and will only be able to grow if uracil can be taken up from the environment.
The term “stable plasmid” refers to a plasmid that is capable of autonomous replication and which is maintained throughout at least one and preferably many successive generations of host cell division. A “thermostable plasmid” is a plasmid that is stable at the temperatures of a thermophilic host.
A “reporter gene” is a gene that produces a detectable product that is connected to a promoter of interest so that detection of the reporter gene product can be used to evaluate promoter function. A reporter gene may also be fused to a gene of interest (e.g., 3′ to the endogenous promoter of the gene of interest), such that the fused genes are expressed as a fusion protein that allow one to detect whether the gene of interest is expressed under a given set of conditions. Non-limiting examples of reporter genes include: β-galactosidase, β-glucuronidase, luciferase, chloramphenicol acetyltransferase (CAT), secreted alkaline phosphatase (SEAP), green fluorescent protein (GFP), red fluorescent protein (RFP), and catechol 2,3-oxygenase (xylE).
A “nucleic acid” is a polymeric compound comprised of covalently linked subunits called nucleotides. Nucleic acid includes polyribonucleic acid (RNA) and polydeoxyribonucleic acid (DNA), both of which may be single-stranded or double-stranded. DNA includes cDNA, genomic DNA, synthetic DNA, and semi-synthetic DNA.
An “isolated nucleic acid molecule” or “isolated nucleic acid fragment” refers to the phosphate ester polymeric form of ribonucleosides (adenosine, guanosine, uridine or cytidine; “RNA molecules”) or deoxyribonucleosides (deoxyadenosine, deoxyguanosine, deoxythymidine, or deoxycytidine; “DNA molecules”), or any phosphoester analogs thereof, such as phosphorothioates and thioesters, in either single stranded form, or a double-stranded helix. Double stranded DNA-DNA, DNA-RNA and RNA-RNA helices are possible. The term nucleic acid molecule, and in particular DNA or RNA molecule, refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear or circular DNA molecules (e.g., restriction fragments), plasmids, and chromosomes. In discussing the structure of particular double-stranded DNA molecules, sequences may be described herein according to the normal convention of giving only the sequence in the 5′ to 3′ direction along the non-transcribed strand of DNA (i.e., the strand having a sequence homologous to the mRNA).
A “gene” refers to an assembly of nucleotides that encode a polypeptide, and includes cDNA and genomic DNA nucleic acids. “Gene” also refers to a nucleic acid fragment that expresses a specific protein, including regulatory sequences preceding (5′ non-coding sequences) and following (3′ non-coding sequences) the coding sequence. “Native gene” refers to a gene as found in nature with its own regulatory sequences.
The term “percent identity”, as known in the art, is a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, as determined by comparing the sequences. In the art, “identity” also means the degree of sequence relatedness between polypeptide or polynucleotide sequences, as the case may be, as determined by the match between strings of such sequences. “Identity” and “similarity” can be readily calculated by known methods, including but not limited to those described in: Computational Molecular Biology (Lesk, A. M., ed.) Oxford University Press, NY (1988); Biocomputing: Informatics and Genome Projects (Smith, D. W., ed.) Academic Press, NY (1993); Computer Analysis of Sequence Data, Part I (Griffin, A. M., and Griffin, H. G., eds.) Humana Press, NJ (1994); Sequence Analysis in Molecular Biology (von Heijne, G., ed.) Academic Press (1987); and Sequence Analysis Primer (Gribskov, M. and Devereux, J., eds.) Stockton Press, NY (1991). Preferred methods to determine identity are designed to give the best match between the sequences tested. Methods to determine identity and similarity are codified in publicly available computer programs. Sequence alignments and percent identity calculations may be performed using the Megalign program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, Wis.). Multiple alignment of the sequences was performed using the Clustal method of alignment (Higgins and Sharp (1989) CABIOS. 5:151-153) with the default parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10). Default parameters for pairwise alignments using the Clustal method were KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5.
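As an illustration of the calculation, the following minimal sketch computes percent identity from two sequences that have already been aligned (for example, by a Clustal-type program). It is not the Megalign implementation; the function name `percent_identity` is hypothetical, and the choice of denominator (all alignment columns that are not gaps in both sequences) is one assumed convention among several in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment contains no comparable columns")
    # A column counts as a match only when the residues are identical and not gaps.
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


print(round(percent_identity("ACGT-A", "ACCTGA"), 2))
```

Other conventions divide by the length of the shorter sequence or exclude all gapped columns, so values from different programs are not always directly comparable.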
Suitable nucleic acid sequences or fragments thereof (including any of the isolated polynucleotides of the present invention) encode polypeptides that are at least about 70% to 75% identical to the amino acid sequences reported herein, preferably at least about 80%, 85%, or 90% identical to the amino acid sequences reported herein, and most preferably at least about 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequences reported herein. Suitable nucleic acid fragments are preferably at least about 70%, 75%, or 80% identical to the nucleic acid sequences reported herein, preferably at least about 80%, 85%, or 90% identical to the nucleic acid sequences reported herein, and most preferably at least about 95%, 96%, 97%, 98%, 99%, or 100% identical to the nucleic acid sequences reported herein. Suitable nucleic acid fragments not only have the above identities/similarities but typically encode a polypeptide having at least 50 amino acids, preferably at least 100 amino acids, more preferably at least 150 amino acids, still more preferably at least 200 amino acids, and most preferably at least 250 amino acids.
A DNA “coding sequence” is a double-stranded DNA sequence which is transcribed and translated into a polypeptide in a cell in vitro or in vivo when placed under the control of appropriate regulatory sequences. “Suitable regulatory sequences” refer to nucleotide sequences located upstream (5′ non-coding sequences), within, or downstream (3′ non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include promoters, translation leader sequences, RNA processing site, effector binding site and stem-loop structure. The boundaries of the coding sequence are determined by a start codon at the 5′ (amino) terminus and a translation stop codon at the 3′ (carboxyl) terminus. A coding sequence can include, but is not limited to, prokaryotic sequences, cDNA from mRNA, genomic DNA sequences, and even synthetic DNA sequences. If the coding sequence is intended for expression in a eukaryotic cell, a polyadenylation signal and transcription termination sequence will usually be located 3′ to the coding sequence.
“Open reading frame” is abbreviated ORF and means a length of nucleic acid sequence, either DNA, cDNA or RNA, that comprises a translation start signal or initiation codon, such as an ATG or AUG, and a termination codon and can be potentially translated into a polypeptide sequence.
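The definition above can be sketched as a simple forward-strand scan. This is an illustrative example only; the function name `find_orfs` and the `min_codons` parameter are hypothetical, the standard stop codons TAA, TAG and TGA are assumed, and the reverse strand and alternative start codons are not considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 1) -> list:
    """Return forward-strand ORFs (start codon through stop codon) in all three frames."""
    dna = sequence.upper().replace("U", "T")  # accept RNA input (AUG) as well
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first initiation codon in this frame
            elif start is not None and codon in STOP_CODONS:
                # Count codons before the stop codon, including the start codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append(dna[start:i + 3])
                start = None
    return orfs


print(find_orfs("GGATGAAACCCTAGTT"))
```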
“Promoter” refers to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA. In general, a coding sequence is located 3′ to a promoter sequence. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental or physiological conditions. Promoters which cause a gene to be expressed in most cell types at most times are commonly referred to as “constitutive promoters”. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of different lengths may have identical promoter activity.
A “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3′ direction) coding sequence. For purposes of defining the present invention, the promoter sequence is bounded at its 3′ terminus by the transcription initiation site and extends upstream (5′ direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site (conveniently defined for example, by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase.
A coding sequence is “under the control” of transcriptional and translational control sequences in a cell when RNA polymerase transcribes the coding sequence into mRNA, which is then trans-RNA spliced (if the coding sequence contains introns) and translated into the protein encoded by the coding sequence.
“Transcriptional and translational control sequences” are DNA regulatory sequences, such as promoters, enhancers, terminators, and the like, that provide for the expression of a coding sequence in a host cell. In eukaryotic cells, polyadenylation signals are control sequences.
The term “operably linked” refers to the association of nucleic acid sequences on a single nucleic acid fragment so that the function of one is affected by the other. For example, a promoter is operably linked with a coding sequence when it is capable of affecting the expression of that coding sequence (i.e., that the coding sequence is under the transcriptional control of the promoter). Coding sequences can be operably linked to regulatory sequences in sense or antisense orientation.
The term “expression,” as used herein, refers to the transcription and stable accumulation of sense (mRNA) or antisense RNA derived from the nucleic acid fragment of the invention. Expression may also refer to translation of mRNA into a polypeptide.
The terms “restriction endonuclease” and “restriction enzyme” refer to an enzyme which binds and cuts at a specific nucleotide sequence within double stranded DNA.
The term “probe” refers to a single-stranded nucleic acid molecule that can base pair with a complementary single stranded target nucleic acid to form a double-stranded molecule.
The term “complementary” is used to describe the relationship between nucleotide bases that are capable of hybridizing to one another. For example, with respect to DNA, adenosine is complementary to thymine and cytosine is complementary to guanine. Accordingly, the instant invention also includes isolated nucleic acid fragments that are complementary to the complete sequences as reported in the accompanying Sequence Listing as well as those substantially similar nucleic acid sequences.
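The base-pairing rule above (A with T, C with G) can be expressed as a short sketch for DNA sequences written 5′ to 3′. The function names `reverse_complement` and `are_complementary` are hypothetical, and ambiguous bases are not handled.

```python
# Translation table implementing the A<->T and C<->G pairing rule.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def are_complementary(strand_a: str, strand_b: str) -> bool:
    """True if two 5'-to-3' strands are perfectly complementary over their full length."""
    return reverse_complement(strand_a).upper() == strand_b.upper()


print(reverse_complement("ATGC"))
print(are_complementary("ATGC", "GCAT"))
```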
As used herein, the term “oligonucleotide” refers to a nucleic acid, generally of about 18 nucleotides, that is hybridizable to a genomic DNA molecule, a cDNA molecule, or an mRNA molecule. Oligonucleotides can be labeled, e.g., with 32P-nucleotides or nucleotides to which a label, such as biotin, has been covalently conjugated. An oligonucleotide can be used as a probe to detect the presence of a nucleic acid according to the invention. Similarly, oligonucleotides (one or both of which may be labeled) can be used as PCR primers, either for cloning full length or a fragment of a nucleic acid of the invention, or to detect the presence of nucleic acids according to the invention. Generally, oligonucleotides are prepared synthetically, preferably on a nucleic acid synthesizer. Accordingly, oligonucleotides can be prepared with non-naturally occurring phosphoester analog bonds, such as thioester bonds, etc.
A nucleic acid molecule is “hybridizable” to another nucleic acid molecule, such as a cDNA, genomic DNA, or RNA, when a single stranded form of the nucleic acid molecule can anneal to the other nucleic acid molecule under the appropriate conditions of temperature and solution ionic strength. Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T. MOLECULAR CLONING: A LABORATORY MANUAL, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (1989), particularly Chapter 11 and Table 11.1 therein (hereinafter “Maniatis”, entirely incorporated herein by reference). The conditions of temperature and ionic strength determine the “stringency” of the hybridization. Stringency conditions can be adjusted to screen for moderately similar fragments, such as homologous sequences from distantly related organisms, to highly similar fragments, such as genes that duplicate functional enzymes from closely related organisms. Post-hybridization washes determine stringency conditions. One set of preferred conditions uses a series of washes starting with 6×SSC, 0.5% SDS at room temperature for 15 min, then repeated with 2×SSC, 0.5% SDS at 45° C. for 30 min, and then repeated twice with 0.2×SSC, 0.5% SDS at 50° C. for 30 min. A more preferred set of stringent conditions uses higher temperatures in which the washes are identical to those above except for the temperature of the final two 30 min washes in 0.2×SSC, 0.5% SDS was increased to 60° C. Another preferred set of highly stringent conditions uses two final washes in 0.1×SSC, 0.1% SDS at 65° C. Another set of highly stringent conditions are defined by hybridization at 0.1×SSC, 0.1% SDS, 65° C. and washed with 2×SSC, 0.1% SDS followed by 0.1×SSC, 0.1% SDS.
Hybridization requires that the two nucleic acids contain complementary sequences, although depending on the stringency of the hybridization, mismatches between bases are possible. The appropriate stringency for hybridizing nucleic acids depends on the length of the nucleic acids and the degree of complementation, variables well known in the art. The greater the degree of similarity or homology between two nucleotide sequences, the greater the value of Tm for hybrids of nucleic acids having those sequences. The relative stability (corresponding to higher Tm) of nucleic acid hybridizations decreases in the following order: RNA:RNA, DNA:RNA, DNA:DNA. For hybrids of greater than 100 nucleotides in length, equations for calculating Tm have been derived (see, e.g., Maniatis at 9.50-9.51). For hybridizations with shorter nucleic acids, i.e., oligonucleotides, the position of mismatches becomes more important, and the length of the oligonucleotide determines its specificity (see, e.g., Maniatis, at 11.7-11.8). In one embodiment the length for a hybridizable nucleic acid is at least about 10 nucleotides. Preferably a minimum length for a hybridizable nucleic acid is at least about 15 nucleotides; more preferably at least about 20 nucleotides; and most preferably the length is at least 30 nucleotides. Furthermore, the skilled artisan will recognize that the temperature and wash solution salt concentration may be adjusted as necessary according to factors such as length of the probe.
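By way of illustration only, the dependence of Tm on salt concentration, base composition, and hybrid length for long DNA:DNA hybrids can be sketched with a widely used empirical approximation. The formula below is assumed here as a common textbook form and is not quoted from the cited pages of Maniatis; the function name and parameters are hypothetical, and the estimate should not be applied to oligonucleotides.

```python
import math


def estimate_tm_celsius(na_molar, gc_percent, length_nt, formamide_percent=0.0):
    """Rough Tm estimate (deg C) for a DNA:DNA hybrid longer than 100 nt.

    Assumed empirical form (not taken from this document):
        Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%G+C) - 0.63*(%formamide) - 600/L
    """
    if length_nt <= 100:
        raise ValueError("approximation is intended for hybrids longer than 100 nt")
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    return (
        81.5
        + 16.6 * math.log10(na_molar)      # lower salt lowers Tm
        + 0.41 * gc_percent                # higher G+C content raises Tm
        - 0.63 * formamide_percent         # formamide destabilizes the hybrid
        - 600.0 / length_nt                # shorter hybrids melt at lower temperature
    )


if __name__ == "__main__":
    # Hypothetical 600 nt probe with 50% G+C at two sodium concentrations.
    for na in (1.0, 0.1):
        print(na, round(estimate_tm_celsius(na, 50.0, 600), 1))
```

Under this approximation, lowering the salt concentration lowers the estimated Tm, which is consistent with low-salt, high-temperature washes being the more stringent ones.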
The term “cellulase” refers to an enzyme involved in cellulose degradation. A cellulase can be an endoglucanase which cuts at random in the cellulose polysaccharide chain of amorphous cellulose generating oligosaccharides of varying lengths and consequently new chain ends, an exoglucanase which acts in a processive manner on the reducing or non-reducing ends of cellulose polysaccharide chains liberating either glucose (glucanohydrolases) or cellobiose (cellobiohydrolase) as major products or a β-glucosidase (β-glucoside glucohydrolases; EC 3.2.1.21) which hydrolyzes soluble cellodextrins and cellobiose to glucose units.
As discussed above, a critical step in genetic engineering of thermophiles such as C. thermocellum is the development of specialized genetic tools. The present invention relates to such specialized genetic tools for use in anaerobic thermophiles, including vector constructs for positive and/or negative selection and methods of utilizing such constructs for the recycling of genetic markers and for the creation of unmarked strains.
In one aspect of the invention, a combination of selectable markers is utilized. A vector of the invention can include one or more, two or more, or three or more positive and/or negative selectable markers, including 1, 2, 3, 4, 5 or 6 positive and/or negative selectable markers. In certain embodiments, the selectable markers are selected from the group consisting of pyrF, tdk, hpt, chloramphenicol, thiamphenicol, neomycin, and kanamycin. Particular combinations of markers can include, for example: (1) tdk and hpt; (2) pyrF and hpt; (3) chloramphenicol, hpt, and tdk; (4) neomycin and tdk; (5) kanamycin, hpt and tdk; (6) chloramphenicol, tdk and hpt; (7) neomycin, tdk and hpt; or (8) kanamycin, tdk and hpt.
In certain embodiments, when more than two markers are utilized in a single vector, two of the markers can be present between flanking sequences homologous to an endogenous target gene, and one or more additional markers can be present on the vector outside of the flanking sequence. In addition, one or more of the markers can be present between the flanking sequences homologous to an endogenous target gene, and one or more additional markers can be present on the vector outside of the flanking sequence.
One aspect of the invention relates to the construction of a thermophilic strain harboring a targeted in-frame clean gene deletion. As described above, a targeted gene deletion can be achieved by utilizing a replicating vector construct that has been engineered to recombine with a portion of the target gene and a region downstream of the endogenous target gene, which is accomplished by incorporating 500-1000 base pairs of the target sequence and 500-1000 base pairs of the sequences located 3′ to the target gene itself into the vector construct flanking a positive selectable marker or a positive selectable marker genetically linked to a negative selectable marker. In particular embodiments, 500-1000 base pairs of sequence located upstream of the target gene is cloned in between the selectable marker and the downstream region. In particular embodiments, the target gene is replaced by one or more of the selectable markers as described in the present invention. For example, a vector for a targeted gene deletion would include a pyrF, tdk, hpt or cat gene sequence, or any combination of linked marker gene sequences, flanked by sequence that is homologous to the target gene. Such a vector, when transformed into a thermophilic strain, will recombine with the target gene and result in the replacement of the target gene with the selectable marker(s) and the duplicated upstream region. A variation of this can be achieved in which a portion of the target gene is omitted and either the upstream or downstream region is duplicated on the plasmids.
The targeted gene deletion and plasmid loss can be selected for using the counterselectable marker located outside the engineered upstream and downstream flanks. This selection creates an allelic replacement of a portion of the target gene with the upstream region and a gene containing a positive and negative selection as in pyrF or a positive selectable marker linked to a negative selectable marker, as in cat linked to hpt/tdk. Recombination between the duplicated upstream regions results in a clean deletion and is selected for using the negative selection against the marker(s) that replaced the portion of the target gene.
Alternatively, a vector can be constructed such that the target gene is replaced by a negative or dual (has both positive and negative selection) selectable marker gene or cassette, such as pyrF, tdk, or hpt alone or linked to a positive selectable marker such as an antibiotic resistance gene, such as cat, or an additional dual selectable marker, such as pyrF, tdk, or hpt. Such a vector would comprise, for example, a cat and pyrF gene flanked by sequence that is homologous to the target gene. A selectable marker such as pyrF, tdk or hpt can be included on such a vector located outside of the flanking sequence. In this way, once the cat-pyrF cassette has been integrated into the genome replacing the target gene, the loss of the plasmid can be selected by the use of positive and/or negative selection, depending on which selectable marker is located outside of the flanking region. Selection for the loss of a plasmid once a targeted gene deletion has been made results in the creation of a marked strain that has the gene of interest replaced by a cassette or individual gene with negative selection. A second vector can be made that has the same flanking DNA that was used in the example above and a selectable marker such as pyrF, tdk or hpt can be included on such a vector located outside of the flanking sequence for plasmid loss. In this way, once the plasmid recombines with the chromosome, the cassette with negative selection, such as the cat-pyrF cassette described above, that has been integrated into the genome replacing the target gene, can be selected against. Upon loss of the plasmid using the negative marker located outside of the flanking region, a plasmid-free strain with a clean deletion of the target gene can be generated.
In additional embodiments, a vector can be constructed such that the endogenous target gene is replaced by a nonfunctional version of the target gene, a mutated version of the target gene, or a non-selectable sequence. Such a vector can also include a counterselectable marker located outside the engineered upstream and downstream flanks. After plasmid loss selection, as described above, an unmarked strain harboring a disruption or mutation of the target gene can be generated.
A marked or unmarked strain of the invention that harbors a targeted gene deletion can be further modified to include a second targeted gene deletion by virtue of the ability to transform this unmarked strain with a vector having any selectable marker of the present invention. In this way, selectable markers can be “recycled” or reused to further engineer a thermophilic organism.
Targeted gene deletions of thermophilic hosts can be made to generate a strain capable of increased ethanol production, increased lactate production or increased acrylate production, for example. Target genes include, but are not limited to, lactate dehydrogenase (ldh), hydrogenase, phosphotransacetylase (pta), acetate kinase (ack), nitrogenase, pyruvate formate lyase (pfl), methylglyoxal synthase, and Spo0A, as well as other genes involved in central metabolism, stress response, and carbohydrate utilization.
In one embodiment, the invention provides for the creation of a thermophilic strain containing a deletion of the pyrF gene. Demonstration of the use of pyrF as a positive and negative selectable marker is conducted by reintroduction of the pyrF gene in such a strain, as discussed further below.
The invention also provides for the creation of a thermophilic strain expressing the thymidine kinase (tdk) gene. In additional embodiments, the introduction of the tdk gene and demonstration of its use as negative selectable marker is conducted, as discussed further below.
The invention additionally provides for the creation of a thermophilic strain containing a deletion of the hypoxanthine phosphoribosyltransferase (hpt) gene. In additional embodiments, the reintroduction of the hpt gene and demonstration of its use as a positive and negative selectable marker is conducted, as discussed further below. In further aspects of the invention, a tdk, hpt and/or additional selectable markers, including an antibiotic resistance gene, can be further introduced into an hpt-deleted strain. Thus, the invention provides for the incorporation of hpt, tdk and/or an antibiotic resistance gene into a single tool (or vector) for making markerless clean deletions in thermophiles.
The invention also provides for the creation of strains containing a combination of the selectable markers described above.
The pyrF gene encodes the pyrimidine biosynthetic enzyme orotidine-5′-monophosphate (OMP) decarboxylase. Its homology to the Saccharomyces cerevisiae URA3 gene allows for adaptation of the URA3 selection system, allowing both positive and negative selection of the marker. URA3 encodes an enzyme, orotidine-5′-phosphate decarboxylase (ODCase), that can catalyze the conversion of 5-fluoroorotic acid (5-FOA) into a highly toxic compound. Thus, counterselection works on the basis that the presence of URA3 confers sensitivity to 5-FOA, while URA3-negative cells are 5-FOA resistant. Alternatively, URA3 as a positive selection marker works based on the ability of an exogenous or plasmid-borne URA3 gene complementing uracil auxotrophy of a URA3-negative strain.
The present invention provides for a thermophilic bacterial system that utilizes the selection capabilities of the pyrF gene. One aspect of the invention provides for the creation of a thermophilic strain in which the pyrF gene has been deleted (ΔpyrF). Deletion of pyrF results in uracil auxotrophy, and thus growth of such a strain requires uracil to be supplemented in the growth medium, or alternatively, expression of an exogenous or plasmid-borne pyrF gene. In such a system, positive selection for maintenance of a plasmid containing pyrF can be achieved by subjecting the ΔpyrF thermophilic strain, e.g., ΔpyrF C. thermocellum, to media lacking uracil. Negative selection can be achieved by subjecting the ΔpyrF thermophilic strain containing a plasmid-borne copy of pyrF to media containing uracil and the antimetabolite 5-fluoro-orotic acid (5-FOA).
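The positive and negative selection logic described above for pyrF can be summarized, purely as an illustrative sketch, in a small decision function. The function and parameter names are hypothetical, and the model is deliberately idealized: it ignores spontaneous 5-FOA resistance, partial growth, and concentration effects.

```python
def pyrf_strain_grows(has_functional_pyrf, uracil_in_medium, five_foa_in_medium):
    """Idealized growth prediction for a strain under pyrF-based selection.

    has_functional_pyrf: True if a chromosomal or plasmid-borne pyrF is expressed.
    uracil_in_medium:    True if the medium is supplemented with uracil.
    five_foa_in_medium:  True if the medium contains 5-fluoroorotic acid.
    """
    # Negative selection: the pyrF product converts 5-FOA into a toxic compound.
    if has_functional_pyrf and five_foa_in_medium:
        return False
    # Positive selection: without pyrF the strain is a uracil auxotroph.
    if not has_functional_pyrf and not uracil_in_medium:
        return False
    return True


if __name__ == "__main__":
    # Delta-pyrF strain carrying a pyrF plasmid, on medium lacking uracil.
    print(pyrf_strain_grows(True, False, False))
    # Same strain after plasmid loss, on uracil plus 5-FOA.
    print(pyrf_strain_grows(False, True, True))
```

The two printed cases correspond to maintaining the plasmid (growth without uracil) and selecting for its loss (growth on uracil plus 5-FOA).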
As discussed above, a ΔpyrF thermophilic strain is a uracil auxotroph. While such a strain has the advantage of being utilized for both positive and negative selection, due to its auxotrophy, such a strain can sometimes have diminished growth which may complicate strain development. This can be particularly relevant when a targeted gene deletion also causes diminished growth. A strain that is being engineered to increase ethanol production, for example, by deletion of the genes for ldh or pta, can have diminished growth. Thus, the use of a pyrF selection system can further affect the growth capability of such a strain. Furthermore, supplementation of such a strain with uracil can complicate directed evolution and bioprocess studies. Additionally, it is possible that uracil deficiency interferes with pyrimidine synthesis such that stably maintaining large plasmids is difficult.
The use of at least a single additional or alternative negative and/or positive selectable marker would be advantageous in achieving an unmarked gene deletion, or having the capability of recycling genetic markers. The present invention provides for such additional markers, such as tdk and hpt, as discussed further below.
It is known that the thermophilic organism C. thermocellum lacks the tdk gene. In
The advantages of using a tdk gene as a negative selection marker are numerous. First, the tdk gene is small. The T. saccharolyticum tdk gene, for example, is 579 base pairs in length, making it less burdensome to incorporate into cloning strategies. Secondly, as noted above, tdk is not a native gene of C. thermocellum. Thus, unlike pyrF, an accompanying mutant devoid of the chromosomal copy of the gene does not need to be generated. Finally, unlike pyrF, the selection does not require an auxotrophy, so there is no impairment of growth.
The tdk gene can also be used as a positive selectable marker in thermophiles. This selection uses inhibitors of dihydrofolate reductase such as aminopterin or trimethoprim, together with hypoxanthine and/or thymidine, which are intermediates in DNA synthesis. The inhibition of dihydrofolate reductase by aminopterin or trimethoprim blocks de novo DNA synthesis, which is required for growth of the cells. Thymidine and hypoxanthine are intermediates that allow cells to use the pyrimidine and purine salvage pathways, respectively. C. thermocellum does not have a tdk gene and thus lacks a true pyrimidine salvage pathway, so trimethoprim is lethal to the cell. When transformed into C. thermocellum, the tdk gene product can thus be positively selected for in the presence of trimethoprim and thymidine.
The tdk gene to be introduced into C. thermocellum can be derived from any organism expressing a native tdk. In particular, the tdk can be derived from another thermophile, such as T. saccharolyticum, which can make it more suitable for use in C. thermocellum as compared to a tdk gene from a mesophile, such as E. coli or S. gordonii.
An exemplary gene encoding a Tdk enzyme for use in the invention is shown below. The DNA sequence of this 570 bp ORF is as follows:
The presence of the tdk gene can be detected by negative selection, and in particular, by the addition of fluorodeoxyuridine (FUDR). The mechanism by which FUDR is a toxic antimetabolite in the presence of the enzyme activity encoded by the tdk gene is depicted in
Thus, the advantage of using tdk as a negative selection marker in C. thermocellum is that higher order, targeted gene deletions can be achieved, one goal being a high yielding ethanologen.
Many bacterial cells contain two distinct pathways for creating the purine intermediates necessary for DNA synthesis. The de novo purine synthesis pathway is typically responsible for a majority of this production. This pathway is long and energy intensive, and makes purines “expensive” for the cell to manufacture. Under conditions where a culture is dividing rapidly or nutrients are limiting, cells conserve energy by replenishing the purine pool with the salvage pathway. In laboratory conditions, this pathway is rarely necessary. The hypoxanthine phosphoribosyltransferase gene encodes a protein that operates within the salvage pathway, and thus is a good candidate for use as a genetic tool.
The hpt gene can be used as both a positive and negative selectable marker. The hpt gene can be used as a negative selectable marker by utilizing anti-metabolites such as 8-azahypoxanthine (8-azaH). When the hpt gene is present and the gene product expressed, the anti-metabolite is incorporated into toxic purine intermediates and the cells die (
The hpt gene can be used as a positive selectable marker which is relatively rare in the absence of an auxotrophy. In a Δhpt strain the cells make purines through the de novo biosynthesis pathway. The de novo pathway itself can be inhibited by compounds such as mycophenolic acid (MPA) which inhibit the enzyme inosine 5′-monophosphate dehydrogenase (EC 1.1.1.205). In a wild type strain, MPA has little or no effect because the cell can still utilize the salvage pathway. Since a Δhpt strain is lacking the salvage pathway, mycophenolic acid becomes lethal to the cell since it can no longer synthesize purines. In a Δhpt strain, plasmids expressing a functional Hpt can be positively selected for on MPA. (
Deletion of the hpt gene affects only the purine salvage pathway and a growth defect is not expected. The present invention provides for the use of hpt alone or together with tdk and/or an antibiotic resistance gene to develop a genetic tool applicable in a thermophilic organism, such as T. saccharolyticum or C. thermocellum. For example, because the hpt gene can be used as both a positive and negative selectable marker, this can potentially eliminate the need to link it to an antibiotic resistance gene.
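As a non-limiting sketch, the tdk and hpt selections discussed above can be written as a small rule table. The class, function, and additive labels below are hypothetical names chosen for illustration, and the rules are an idealized reading of the text: real outcomes depend on drug concentration, medium composition, and spontaneous resistance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strain:
    """Genotype flags relevant to the selections (hypothetical model)."""
    tdk: bool  # functional thymidine kinase expressed
    hpt: bool  # functional hypoxanthine phosphoribosyltransferase expressed


def grows(strain, additives):
    """Idealized growth prediction for a strain on medium with the given additives."""
    present = set(additives)
    # Negative selection against tdk: FUDR is toxic when Tdk is expressed.
    if "FUDR" in present and strain.tdk:
        return False
    # Negative selection against hpt: 8-azahypoxanthine is toxic when Hpt is expressed.
    if "8-azaH" in present and strain.hpt:
        return False
    # Positive selection for hpt: MPA blocks de novo purine synthesis,
    # so a strain lacking the salvage enzyme cannot make purines.
    if "MPA" in present and not strain.hpt:
        return False
    # Positive selection for tdk: trimethoprim blocks the de novo route,
    # so growth requires Tdk together with supplied thymidine.
    if "trimethoprim" in present and not (strain.tdk and "thymidine" in present):
        return False
    return True


if __name__ == "__main__":
    wild_type = Strain(tdk=False, hpt=True)      # C. thermocellum lacks tdk
    delta_hpt = Strain(tdk=False, hpt=False)
    print(grows(wild_type, ["8-azaH"]), grows(delta_hpt, ["8-azaH"]))
    print(grows(wild_type, ["MPA"]), grows(delta_hpt, ["MPA"]))
```

Encoding each marker as an independent rule reflects the point made above that hpt supports both directions of selection on its own, whereas tdk-based positive selection needs a second additive.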
The molecular vectors comprising the selectable markers described above can be tailored to meet the distinct requirements of the thermophilic organism to be engineered, but will be very similar in design, function, and technical application.
The gene encoding the T. saccharolyticum Hpt enzyme is designated by Oak Ridge National Laboratory as or 0940 and is located on Contig7.
The genome sequence of C. thermocellum strain 1313 is not yet available, so there is no gene identifier. The Hpt enzyme in C. thermocellum strain 27405 is designated by Oak Ridge National Laboratory as Cthe_2254. Manual sequence verification showed 100% homology between the two genes.
The present invention also relates to vectors which include selectable markers of the present invention, host cells which are genetically engineered with vectors of the invention and the production of polypeptides of the invention by recombinant techniques.
Host cells are genetically engineered (transduced or transformed or transfected) with the vectors of this invention which can be, for example, a cloning vector or an expression vector. The vector can be, for example, in the form of a plasmid, a viral particle, a phage, etc. The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the genes of the present invention. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to the ordinarily skilled artisan.
The appropriate selectable marker sequence may be inserted into the vector by a variety of procedures. In general, the DNA sequence is inserted into an appropriate restriction endonuclease site(s) by procedures known in the art. Such procedures and others are deemed to be within the scope of those skilled in the art.
The DNA sequence in the expression vector is operatively associated with an appropriate expression control sequence(s) (promoter) to direct mRNA synthesis. Any suitable promoter to drive gene expression in the host cells of the invention can be used, including the cbp, gapDH, or pyrF promoter from C. thermocellum. Additionally, the E. coli lac or trp promoters, and other promoters known to control expression of genes in prokaryotic or lower eukaryotic cells, can be used. The expression vector also contains a ribosome binding site for translation initiation and a transcription terminator. The vector can also include appropriate sequences for amplifying expression, or can include additional regulatory regions.
In addition, the expression vectors may contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells such as pyrF, tdk and/or hpt.
The vector containing the appropriate selectable marker sequence as used herein, as well as an appropriate promoter or control sequence, can be employed to transform an appropriate thermophilic host to permit the host to express the protein.
Thus, in certain aspects, the present invention relates to host cells containing the above-described constructs. The host cell can be an anaerobic thermophilic bacterial cell, including an anaerobic xylanolytic and/or cellulolytic host cell. The selection of an appropriate host is deemed to be within the scope of those skilled in the art from the teachings herein.
Major groups of thermophilic bacteria include eubacteria and archaebacteria. Thermophilic eubacteria include: phototrophic bacteria, such as cyanobacteria, purple bacteria, and green bacteria; Gram-positive bacteria, such as Bacillus, Clostridium, Lactic acid bacteria, and Actinomyces; and other eubacteria, such as Thiobacillus, Spirochete, Desulfotomaculum, Gram-negative aerobes, Gram-negative anaerobes, and Thermotoga. Within archaebacteria are considered Methanogens, extreme thermophiles (an art-recognized term), and Thermoplasma. In certain embodiments, the present invention relates to Gram-negative organotrophic thermophiles of the genus Thermus, Gram-positive eubacteria, such as the genus Clostridium, and also which comprise both rods and cocci, genera in group of eubacteria, such as Thermosipho and Thermotoga, genera of Archaebacteria, such as Thermococcus, Thermoproteus (rod-shaped), Thermofilum (rod-shaped), Pyrodictium, Acidianus, Sulfolobus, Pyrobaculum, Pyrococcus, Thermodiscus, Staphylothermus, Desulfurococcus, Archaeoglobus, and Methanopyrus.
Some examples of thermophilic microorganisms (including bacteria, prokaryotic microorganisms, and fungi), which may be suitable for the present invention include, but are not limited to: Clostridium thermosulfurogenes, Clostridium cellulolyticum, Clostridium thermocellum, Clostridium thermohydrosulfuricum, Clostridium thermoaceticum, Clostridium thermosaccharolyticum, Clostridium tartarivorum, Clostridium thermocellulaseum, Thermoanaerobacterium thermosaccharolyticum, Thermoanaerobacterium saccharolyticum, Thermobacteroides acetoethylicus, Thermoanaerobium brockii, Methanobacterium thermoautotrophicum, Pyrodictium occultum, Thermoproteus neutrophiles, Thermofilum librum, Thermothrix thioparus, Desulfovibrio thermophilus, Thermoplasma acidophilum, Hydrogenomonas thermophilus, Thermomicrobium roseum, Thermus flavus, Thermus ruber, Pyrococcus furiosus, Thermus aquaticus, Thermus thermophilus, Chloroflexus aurantiacus, Thermococcus litoralis, Pyrodictium abyssi, Bacillus stearothermophilus, Cyanidium caldarium, Mastigocladus laminosus, Chlamydothrix calidissima, Chlamydothrix penicillata, Thiothrix carnea, Phormidium tenuissimum, Phormidium geysericola, Phormidium subterraneum, Phormidium bijahensi, Oscillatoria filiformis, Synechococcus lividus, Chloroflexus aurantiacus, Pyrodictium brockii, Thiobacillus thiooxidans, Sulfolobus acidocaldarius, Thiobacillus thermophilica, Bacillus stearothermophilus, Cercosulcifer hamathensis, Vahlkampfia reichi, Cyclidium citrullus, Dactylaria gallopava, Synechococcus lividus, Synechococcus elongatus, Synechococcus minervae, Synechocystis aquatilus, Aphanocapsa thermalis, Oscillatoria terebriformis, Oscillatoria amphibia, Oscillatoria germinata, Oscillatoria okenii, Phormidium laminosum, Phormidium parparasiens, Symploca thermalis, Bacillus acidocaldarius, Bacillus coagulans, Bacillus thermocatenalatus, Bacillus licheniformis, Bacillus pumilus, Bacillus macerans, Bacillus circulans, Bacillus laterosporus, Bacillus brevis, Bacillus subtilis, Bacillus sphaericus, Desulfotomaculum nigrificans, Streptococcus thermophilus, Lactobacillus thermophilus, Lactobacillus bulgaricus, Bifidobacterium thermophilum, Streptomyces fragmentosporus, Streptomyces thermonitrificans, Streptomyces thermovulgaris, Pseudonocardia thermophila, Thermoactinomyces vulgaris, Thermoactinomyces sacchari, Thermoactinomyces candidus, Thermomonospora curvata, Thermomonospora viridis, Thermomonospora citrina, Microbispora thermodiastatica, Microbispora aerata, Microbispora bispora, Actinobifida dichotomica, Actinobifida chromogena, Micropolyspora caesia, Micropolyspora faeni, Micropolyspora cectivugida, Micropolyspora cabrobrunea, Micropolyspora thermovirida, Micropolyspora viridinigra, Methanobacterium thermoautotrophicum, variants thereof, and/or progeny thereof.
In certain embodiments, the present invention relates to thermophilic bacteria of the genera Thermoanaerobacterium or Thermoanaerobacter, including, but not limited to, species selected from the group consisting of: Thermoanaerobacterium thermosulfurigenes, Thermoanaerobacterium aotearoense, Thermoanaerobacterium polysaccharolyticum, Thermoanaerobacterium zeae, Thermoanaerobacterium xylanolyticum, Thermoanaerobacterium saccharolyticum, Thermoanaerobium brockii, Thermoanaerobacterium thermosaccharolyticum, Thermoanaerobacter thermohydrosulfuricus, Thermoanaerobacter ethanolicus, Thermoanaerobacter brockii, variants thereof, and progeny thereof.
In particular embodiments, the host cell is Clostridium thermocellum or Thermoanaerobacterium saccharolyticum. In additional embodiments, the host cell is a xylanolytic host of the genus Anaerocellum, Caldicellulosiruptor or Moorella.
The present invention also includes recombinant constructs comprising one or more of the selectable marker sequences as broadly described above. The constructs comprise a vector, such as a plasmid or viral vector, into which a sequence of the invention has been inserted, in a forward or reverse orientation. In one aspect of this embodiment, the construct further comprises regulatory sequences, including, for example, a promoter, operably associated to the sequence. Large numbers of suitable vectors and promoters are known to those of skill in the art, and are commercially available. The following vectors are provided by way of example only.
Introduction of the construct in host cells can be done using methods known in the art. Introduction can also be effected by electroporation methods as described in U.S. Prov. Appl. No. 61/109,642, filed Oct. 30, 2008, the contents of which are herein incorporated by reference.
Homologous recombination was utilized to make an in-frame deletion of the pyrF gene. A depiction of the expected recombination events and the resulting pyrF deletion is shown in
Creation of C. thermocellum pyrF Deletion
Plasmid pMU482 was transformed into the wild type C. thermocellum strain 1313 and selected on rich medium containing thiamphenicol (at a concentration of 6 μg/ml) to select for cells containing the cat marker encoded on the plasmid. A schematic displaying the creation of the knockout vector is depicted in
When the control pMU102 plasmid was transformed into C. thermocellum, only a few colonies appeared, most likely representing cells that gained spontaneous resistance to 5-FOA. When the pMU482 knockout vector was used to transform C. thermocellum, a much higher number of colonies appeared compared to the control. Results of the experiment are shown in
PCR screens were performed to verify that the pyrF gene had been deleted from the chromosome. The recombination events allowing the pyrF gene deletion are depicted in
Colonies were also tested by PCR amplification to confirm that the ΔpyrF strain was plasmid free. Results of the PCR screen depicted in
Positive Selection Using a C. thermocellum pyrF Knockout
The plasmid-free C. thermocellum ΔpyrF strain created above was further tested to confirm that positive selection could be appropriately applied. Cells were grown and plated on uracil minimal media. A C. thermocellum ΔpyrF strain is unable to grow on minimal media (lacking uracil), whereas a wild-type strain can. As shown in
Negative Selection Using a C. thermocellum pyrF Knockout
The plasmid-free C. thermocellum ΔpyrF strain created above was tested to confirm that negative selection could be appropriately applied. Cells were grown and plated using media containing the toxic analog 5-fluoroorotic acid (5-FOA). The protein product of the pyrF gene converts 5-FOA into a toxic compound. Thus, the C. thermocellum ΔpyrF strain should be resistant to 5-FOA, but the wild type or a complemented strain should be susceptible to 5-FOA and unable to grow. As shown in
The ability of pyrF to be utilized as a positive and negative selection marker in C. thermocellum led to the utilization of this marker for the creation of strains with a targeted gene deletion. One such target gene, phosphotransacetylase (pta), is involved in the conversion of acetyl-CoA to acetate. The production of a modified organism harboring a pta deletion would be advantageous since it would prevent acetate production, thereby channeling the carbon flux towards increased ethanol production. Deletion of pta would facilitate the ultimate goal of making a homoethanologen strain (in conjunction with the deletion of other byproduct pathway genes such as ldh).
To knock out the pta gene, a knockout vector, pMU1162, was constructed which contained sequences homologous to upstream and downstream sequences of the pta gene with a chloramphenicol acetyltransferase (cat) gene located between the two flanking sequences. The knockout vector also contained a pyrF gene located outside of the flanking region sequence. A C. thermocellum strain was transformed with this plasmid. The transformation-positive colonies were grown in rich media containing thiamphenicol (Tm6 mg/ml). The next day, the cultures were plated on media containing thiamphenicol and 5-FOA. By homologous recombination, the endogenous pta was replaced with cat. A diagram depicting the vectors and the recombination events is shown in
PCR analysis was performed to confirm that deletion of the pta gene had been successfully achieved. As shown in
The pta gene was also knocked out utilizing a vector that contained pyrF gene sequences located between the two flanking sequences, rather than the chloramphenicol acetyltransferase (cat) gene sequence described above. This vector, referred to as pMU1663, is depicted in
To stably express the T. saccharolyticum tdk gene in C. thermocellum, allelic replacement using pyrF was performed. Plasmid pMU1452, as depicted in
C. thermocellum strains carrying the T. saccharolyticum tdk gene should be sensitive to fluorodeoxyuridine (FUDR) whereas those that do not have the tdk gene should be resistant to FUDR. To this end, the C. thermocellum pyrF::tdk strain and the control C. thermocellum ΔpyrF strain were plated in the presence of 10 μg/ml FUDR. For each culture, a 10 dilution was used for the plating. Results of the plating are shown in
Curing of the C. thermocellum pyrF::tdk Strain to Remove the Plasmid
Wild-type C. thermocellum harboring the pMU1452 vector was assayed for plasmid curing in the presence of FUDR. A cartoon schematic showing the process is shown in
A single set of primers was used that bind to both the plasmid and the chromosome. The diagram in
To knockout the hpt gene a vector was constructed which contained a region homologous to ˜1 kb upstream and downstream of the hpt gene. Additionally, a copy of the hpt gene expressed from the cellobiose phosphorylase promoter was added outside the clean deletion flanks to select for plasmid loss following the deletion event. The created plasmid is referred to as pMU1657, and is depicted in
Plasmid pMU1657 was transformed into the wild type C. thermocellum strain 1313 and transformants were selected on chloramphenicol. Several colonies were selected and the presence of the transformed plasmid was verified by PCR. A colony was diluted into C. thermocellum growth medium and grown overnight. The following morning, the dilution closest to an OD of ~1.0 was serially diluted and plated on defined medium containing 500 μg/ml 8-azahypoxanthine. Two days later, colonies were observed on the plate representing the 10^-3 dilution. Six colonies were selected and PCR screens were performed to verify that the hpt gene had been deleted from the chromosome. After PCR amplification, the expected size of the wild type product having no deletion is 3380 bp and the expected size of the hpt deletion product is 2820 bp. As shown in
The molecular data clearly show that the hpt gene was successfully deleted and that the strain was plasmid free. To verify this, the new strain was grown overnight and plated on thiamphenicol. After 5 days, no thiamphenicol-resistant colonies were observed, further confirming the molecular data indicating plasmid loss. These data strongly suggest that hpt can be used as a negative selectable marker in C. thermocellum.
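The size-based PCR screen described above can be illustrated with a minimal sketch. The expected product sizes (3380 bp and 2820 bp) are taken from the text; the function name `call_genotype` and the 100 bp gel-reading tolerance are hypothetical choices made only for illustration and are not part of the disclosed method.

```python
# Expected PCR product sizes stated in the text (in base pairs).
EXPECTED_BP = {"wild type": 3380, "hpt deletion": 2820}


def call_genotype(band_bp: float, tolerance_bp: float = 100) -> str:
    """Assign an observed band to a genotype by size.

    tolerance_bp is an assumed allowance for gel-reading error;
    it is not a value given in the source.
    """
    matches = [
        name
        for name, size in EXPECTED_BP.items()
        if abs(band_bp - size) <= tolerance_bp
    ]
    # Report a genotype only when exactly one expected size fits.
    return matches[0] if len(matches) == 1 else "ambiguous"


# The deletion shortens the amplicon by the difference of the two sizes.
size_shift_bp = EXPECTED_BP["wild type"] - EXPECTED_BP["hpt deletion"]

if __name__ == "__main__":
    for band in (2800, 3400, 3100):
        print(band, "->", call_genotype(band))
    print("size shift (bp):", size_shift_bp)
```

Because the two expected products differ by 560 bp, any tolerance well below half of that difference keeps the two genotype calls from overlapping.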
To provide stronger support for this conclusion, the knockout plasmid was reintroduced to look for complementation of the phenotype. Plasmid pMU1657 was transformed and the same plating procedure was used to assay for complementation and/or insensitivity to 8-azahypoxanthine. Results were exactly as expected. The Δhpt strain was completely insensitive to the drug and the strain containing the complementing plasmid showed sensitivity comparable to wild type levels (
To create a deletion of the hpt gene in T. saccharolyticum, the deletion vector pMU256 was built and transformed into T. saccharolyticum cells. The pMU256 plasmid is depicted in
To determine if hpt can be used as a positive selectable marker, both C. thermocellum strain 1313 and the Δhpt strain were grown to an OD of ~1.0 and plated on several concentrations of mycophenolic acid. Results of these experiments are shown in
Plasmid pMU1745 is a replicating vector designed and constructed to make in-frame clean deletions of the lactate dehydrogenase (ldh) gene (
One of the selected colonies described above was inoculated into C. thermocellum growth media, and the following morning a dilution series of this culture was plated on minimal media containing 500 μg/ml 8-azahypoxanthine. Eight colonies were PCR screened using primers flanking the ldh locus, and a band of ~2.1 kb representing the clean deletion was observed in seven out of eight colonies (
To further confirm that lactate dehydrogenase was deleted from the genome, two batch fermentations were run containing 5 g/l cellobiose and 5 g/l Avicel, respectively. In both fermentations, no lactate was made by the Δldh strain as shown in
Primer design for amplification of DNA from C. thermocellum 1313 was based on the available C. thermocellum ATCC 27405 genome (http://genome.jgi-psf.org/cloth/cloth.home.html). The oligonucleotides and the plasmids/strains used in this study are listed in Table 1. The 5′ and 3′ flanking regions (~1 kb) of pyrF and pta were amplified and assembled using yeast gap repair cloning to create gene deletion plasmids (Shanks et al., Appl Environ Microbiol 11:5027-5036 (2006)).
E. coli-C. thermocellum cloning vector, Ap
S. cerevisiae-E. coli-C. thermocellum shuttle vector for
S. cerevisiae-E. coli-C. thermocellum shuttle vector, C. therm
E. coli-C. thermocellum shuttle vector, C. the ΔpyrF cassette,
E. coli
S. cerevisiae
C. thermocellum
1 New England Biolabs, Ipswich, MA
2 Bacillus Genetic Stock Center, http://www.bgsc.org/
3 Invitrogen, Carlsbad, California
4 Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Germany
The pyrF and pta deletion vectors (pMU769 and pMU1162, respectively) contained cat (chloramphenicol acetyl transferase) expressed from the C. thermocellum gapDH promoter (gapDHp) positioned between the 5′ and 3′ flanking regions. The pyrF complementing construct (pMU612) contained pyrF expressed from the C. thermocellum cellobiose phosphorylase (cbp) promoter (cbpp). All DNA manipulations and cloning procedures were performed as per Maniatis et al.
For this transformation protocol, a pulse generator was custom built that utilized a solid-state insulated-gate bipolar transistor (IGBT), instead of a power tetrode, as the high voltage switch (Infineon, part no. FZ200R65KF2). The device was charged with a high-voltage power supply from Emco (part no. F101). The charge was stored in an 8 kV 32 capacitor made by General Atomics (part no. 39742). Pulse duration and interval were controlled by an arbitrary function generator (Tektronix, part no. AFG3101). All manipulations were done under anaerobic conditions. Cultures were grown to mid-log phase (OD600=0.4-0.8) in rich medium and harvested by centrifugation (2200×g for 12-14 min). Cells were washed twice in autoclaved, deionized water and the final pellet was resuspended in 200 μl deionized water. For each transformation, 20 μl of cell suspension was added, along with 1-8 μg of plasmid DNA, to a 0.1 cm gap electroporation cuvette (Fisher Scientific).
A series of 60 square pulses was applied to the sample. The period of the pulses was 300 μs and the amplitude was 1.9 kV, resulting in an applied field strength of 19 kV/cm. After pulsing, cells were recovered overnight (15-18 h) at 51° C. in 3-5 ml rich medium. For liquid selection, recovered cultures were inoculated (10% v/v) into either rich medium supplemented with 3-6 μg/ml thiamphenicol (Tm) or uracil-free MJ medium when selecting for uracil prototrophy. For selecting transformants on solid medium, the recovery cultures were plated as agar suspensions in rich medium with 3-6 μg/ml Tm or in MJ medium. To select pyrF mutants, transformants were grown in 3 μg/ml Tm. The cultures were then diluted to approximately 10^8 cells/ml and 100 μl of the diluted culture was plated as agar suspensions in rich medium containing 5-FOA. 5-FOA resistant colonies were screened by PCR using primers that anneal outside of the regions of homology used to delete pyrF. The pyrF::gapDHp-cat mutants were isolated and the same primer set was used to screen pyrF::gapDHp-cat mutants.
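The stated field strength follows from dividing the pulse amplitude by the cuvette gap. The short sketch below reproduces that arithmetic from the values given above (1.9 kV, 0.1 cm gap, 60 pulses, 300 μs period). The function names are hypothetical, and the nominal train length assumes the 300 μs period is measured from the start of one pulse to the start of the next, which the text does not state explicitly.

```python
def field_strength_kv_per_cm(amplitude_kv: float, gap_cm: float) -> float:
    """Applied field strength E = V / d for a parallel-plate cuvette."""
    if gap_cm <= 0:
        raise ValueError("cuvette gap must be positive")
    return amplitude_kv / gap_cm


def nominal_train_length_ms(n_pulses: int, period_us: float) -> float:
    """Nominal pulse-train length, taken as n_pulses * period.

    Assumes the period is measured start-to-start (an assumption).
    """
    return n_pulses * period_us / 1000.0


# Values from the protocol text.
field = field_strength_kv_per_cm(amplitude_kv=1.9, gap_cm=0.1)
train_ms = nominal_train_length_ms(n_pulses=60, period_us=300)

if __name__ == "__main__":
    print(f"field strength: {field:.1f} kV/cm")
    print(f"nominal train length: {train_ms:.1f} ms")
```

The same helper shows how the field would scale with a different cuvette; for example, a 0.2 cm gap at the same amplitude would halve the field strength.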
To select pta::gapDHp-cat mutants, the ΔpyrF strain transformed with pMU1162 was grown in 5 ml of rich medium supplemented with 6-12 μg/ml Tm or in MJ medium for about 14-16 hours. Various volumes of the cultures (ranging from 100 μl to 1 ml) were plated as agar suspensions of rich medium containing 5-FOA and 48 μg/ml Tm. Resistant colonies were screened by PCR using primers that anneal outside of the regions of homology used to delete pta.
In order to create a marked mutation, a positive selection was needed to select for a chromosomal integration event and a negative selection was needed to select for loss of the replicating knockout plasmid. The latter component can be achieved using the ΔpyrF strain and ectopic expression of pyrF from a plasmid. To achieve the former, the cat marker, which provides Tm resistance at thermophilic temperatures from a multi-copy plasmid, was tested for its ability to provide Tm resistance when harbored in single copy on the chromosome at the pyrF locus. An allelic replacement vector, pMU769, was constructed to delete the pyrF gene and replace it with cat controlled by the native glyceraldehyde 3-phosphate dehydrogenase (gapDH) promoter of C. thermocellum. The vector contained gapDHp-cat elements positioned between 5′ and 3′ pyrF flanking DNA. To replace pyrF with gapDHp-cat, C. thermocellum transformants containing pMU769 were subjected to two simultaneous selections in liquid, rich medium. Thiamphenicol was used to select for the plasmid-encoded gapDHp-cat, while 5-FOA was used to select against chromosomal pyrF. Recovered cultures were evaluated by PCR using primers that anneal upstream and downstream of pyrF. Under these conditions, replacing the pyrF gene with gapDHp-cat increased the PCR amplicon size by ~300 bp as compared to the wt. This result demonstrated that cat expressed from the gapDH promoter was functional in a single copy on the C. thermocellum chromosome and could be used as a marker for allele replacement.
Mixed acid fermentation of C. thermocellum involves co-production of lactic acid, acetic acid, formic acid, and ethanol. For C. thermocellum strain DSM 1313, acetic acid is one co-product that needs to be eliminated to create a strain with increased ethanol yield. The production of acetic acid from acetyl-CoA involves two enzymatic activities that are catalyzed by Pta and Ack. The scheme used to replace pta with cat expressed from the gapDH promoter in the C. thermocellum pyrF background is shown in
Growth rate measurements were performed in a 200 μl volume in a 96-well plate at 55° C. The optical density at 600 nm was read by a Powerwave XS plate reader customized by the manufacturer to incubate up to 68° C. (BioTek). The plates were shaken continuously and read at three-minute intervals. Each sample was measured in quadruplicate. The specific growth rate (μ) was determined by measuring the slope of the natural log-transformed OD readings. A two-hour sliding window of OD readings between 0.08 and 1.00 was used for determination of the maximum rate, μmax. In all cases, the R-squared value was greater than 0.99.
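The sliding-window calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the ordinary least-squares fit, and the synthetic test curve are hypothetical and are not taken from the source, which specifies only the two-hour window, the OD range of 0.08 to 1.00, the three-minute read interval, and the use of the slope of ln(OD).

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and R-squared for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, r_squared


def max_specific_growth_rate(times_h, ods, window_h=2.0,
                             od_min=0.08, od_max=1.00):
    """Return (mu_max per hour, R-squared) over all full sliding windows.

    Only readings with od_min <= OD <= od_max are used, and the slope is
    fitted to ln(OD) versus time. Returns None if no full window exists.
    """
    points = [(t, math.log(od)) for t, od in zip(times_h, ods)
              if od_min <= od <= od_max]
    eps = 1e-9  # guards against floating-point error in time stamps
    best = None
    for i, (t0, _) in enumerate(points):
        window = [(t, y) for t, y in points[i:] if t - t0 <= window_h + eps]
        if window[-1][0] - t0 < window_h - eps:
            continue  # skip windows shorter than the full window length
        slope, r_squared = linear_fit(*zip(*window))
        if best is None or slope > best[0]:
            best = (slope, r_squared)
    return best


# Synthetic example (hypothetical data): pure exponential growth at
# 0.5 per hour, read every 3 minutes (0.05 h) for 8 hours.
example_times = [i * 0.05 for i in range(161)]
example_ods = [0.05 * math.exp(0.5 * t) for t in example_times]
example_mu, example_r2 = max_specific_growth_rate(example_times, example_ods)

if __name__ == "__main__":
    print(f"mu_max = {example_mu:.3f} per hour, R^2 = {example_r2:.4f}")
```

Returning the R-squared value alongside the slope allows each fit to be checked against the 0.99 threshold reported in the text. Quadruplicate wells could be averaged either before or after fitting; the source does not say which was done.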
The growth of the pta::gapDHp-cat strain was compared to that of the wt and ΔpyrF strains in rich medium, with and without uracil supplementation. Although initial rates of growth of the ΔpyrF and wt strains were similar, the growth of the ΔpyrF strain slowed abruptly at an OD of ~0.7, while the wt continued to grow until it reached an OD of ~1.6, suggesting that the rich medium was uracil-limited. Supplementing the medium with an additional 40 μg/ml uracil eliminated the growth defect of the ΔpyrF strain and resulted in a growth curve that was indistinguishable from that of the wt strain. Even with additional uracil supplementation to compensate for the ΔpyrF mutation, the maximum specific growth rate (μmax) of the pta::gapDHp-cat strain was about one third lower than that of either the wt or the ΔpyrF strain, and the final OD was also reduced. This indicates that the growth defect of the pta::gapDHp-cat strain is distinct from the growth defect of the ΔpyrF strain and is a result of the pta mutation.
End product analysis was performed on batch fermentations started at pH 7.0 with 5 g/l cellobiose as the primary carbon source under anaerobic conditions with a nitrogen atmosphere and an 80 ml working volume. After 48 hours of fermentation, the wt and ΔpyrF strains produced about 1 g/l of acetic acid, whereas the acetic acid production of the pta::gapDHp-cat strain was indistinguishable from background levels (average 0.03 g/l). All three strains produced comparable amounts of ethanol and lactic acid. Because of the growth defect of the pta::gapDHp-cat strain, a 96 hour sample point was also taken, but acetate levels did not change, measuring 0.031 g/l. The average dry cell masses for the wt, ΔpyrF, and pta::gapDHp-cat strains were 0.54 g, 0.54 g, and 0.35 g, respectively, indicating that the pta::gapDHp-cat strain made about one-third less biomass than the wt and ΔpyrF strains.
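The "about one-third less biomass" comparison can be checked with simple arithmetic on the reported values. The sketch below uses only the numbers given above; the helper name `fractional_reduction` is hypothetical, and the acetic acid figure for the wt strain is the approximate 1 g/l stated in the text, so the resulting percentage is approximate as well.

```python
def fractional_reduction(reference: float, value: float) -> float:
    """Fraction by which value falls below reference."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (reference - value) / reference


# Dry cell mass (g): wt 0.54 versus pta::gapDHp-cat 0.35.
biomass_reduction = fractional_reduction(0.54, 0.35)

# Acetic acid (g/l): about 1 for wt versus 0.03 average for the mutant.
acetate_reduction = fractional_reduction(1.0, 0.03)

if __name__ == "__main__":
    print(f"biomass reduction: {biomass_reduction:.0%}")
    print(f"acetic acid reduction (approximate): {acetate_reduction:.0%}")
```

The biomass reduction of roughly 35% is consistent with the one-third figure stated for both biomass and maximum specific growth rate.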
Number | Date | Country | Kind |
---|---|---|---|
61232648 | Aug 2009 | US | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US10/45019 | 8/10/2010 | WO | 00 | 10/11/2012 |