Provided herein are compositions and methods for production of proteins in microbial systems. In particular, provided herein are compositions and methods for expressing exogenous genes from specific chromosomal locations of E. coli at exceptionally high efficiency.
Escherichia coli has been the pioneering host for recombinant protein production, since the original recombinant DNA procedures were developed using its genetic material and infecting bacteriophages. As a consequence, and because of the accumulated know-how on E. coli genetics and physiology and the increasing number of tools for genetic engineering adapted to this bacterium, E. coli is the preferred host when attempting the production of a new protein. Also, it is still the first choice for protein production at laboratory and industrial scales for an important number of proteins, because of its fast growth and simple culture procedures.
However, when searching for an ideal system for protein production, this bacterial species is clearly far from offering, in generic terms, optimal conditions for protein production and downstream. Plasmid loss and antibiotic-based maintenance, undesired chemical inducers of gene expression, plasmid/protein-mediated metabolic burden and stress responses, lack of post-translational modifications (including the inability to form disulphide bonds), no or poor secretion, protein aggregation and proteolytic digestion, endotoxin contamination and complex downstream are among the main obstacles encountered during protein production in E. coli.
Improved systems and methods for expressing proteins in E. coli are needed.
Provided herein are compositions and methods for production of proteins in microbial systems. In particular, provided herein are compositions and methods for expressing exogenous genes from specific chromosomal locations of E. coli designed to yield very high expression levels by virtue of the chromosomal context of the integration.
The compositions and methods described herein overcome limitations of prior E. coli expression systems of low levels of expression. The systems and methods described herein find use in commercial and clinical applications for which a high level of expression of genes of interest is desired (e.g., production of industrial, research, and pharmaceutical proteins).
For example, in some embodiments, provided herein is an E. coli bacterium comprising a recombination target site and/or heterologous genes at one or more genomic positions selected from, for example, nucleotides 281404 to 288692, 4189573 to 4204803, 4128474 to 4165334, 4003592 to 4060559, 3949128 to 3980248, 4313204 to 4318232, or 3387507 to 3439571 (using numbering from GenBank sequence accession U00096.3) or homologous regions in other genomes and sequences at least 90% homologous (e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% homologous).
Further embodiments provide an E. coli bacterium comprising a recombination target site and/or heterologous gene at one or more genomic positions comprising at least one expression enhancement sequence. In some embodiments, the expression enhancement sequence is at least 50% (e.g., at least 50%, 60%, 65%, 70%, or more) GC nucleotides. In some embodiments, the recombination target site and/or heterologous genes is flanked on both sides by said expression enhancement sequence. In some embodiments, the expression enhancement sequence is at least 500 bp (e.g., at least 500, 600, 750, 1000, 2000, or more bp) per side.
The present disclosure is not limited to particular recombination target sites. In some embodiments, the recombination target site is a site-specific recombination target site (e.g., including but not limited to, FLP recombination target (FRT), LOX, attP/B recognition sites, gamma-delta resolvase site, lambda red mediated recombination, Tn3 resolvase site, Clustered regularly interspaced short palindromic repeats (CRISPR), or φC31 integrase target site). In some embodiments, the target site further comprises one or more elements selected from, for example, a reporter construct, a promoter, a repressor, a selectable marker or a purification tag gene. The present disclosure is not limited to particular promoters. In some embodiments, the promoter is an inducible promoter. Examples include, but are not limited to, a lac promoter, a tac promoter, or a T7 promoter. In some embodiments, the bacterium expresses a gene of interest inserted into the chromosome at the genomic position at a higher level relative to the level of expression of the gene inserted into a different location on the chromosome (e.g., at least 1.5, 2, 5, 10, 20, 30, 50, or 100-fold higher).
Additional embodiments provide a kit or system, comprising: a) a bacterium described herein; and b) a recombination enzyme specific for the recombination target. In some embodiments, the kit or system further comprises one or more components selected from, for example, growth medium, growth container, a plasmid or linear nucleic acid for inserting a gene of interest incorporated between a site-specific recombination target sequence, or components for purification of a protein expressed from said gene of interest.
Yet other embodiments provide a method of expressing a gene of interest, comprising: a) contacting a nucleic acid encoding said gene of interest with a bacterium described herein under conditions such that the nucleic acid integrates into the chromosome of the bacterium at the genomic positions; and b) expressing the gene of interest. In some embodiments, the nucleic acid is on a plasmid in between a site-specific recombination target sequence. In some embodiments, the method further comprises the step of purifying a protein expressed from the gene of interest (e.g., using one or more of affinity purification for the purification tag, ion exchange chromatography, or size exclusion chromatography). In some embodiments, the gene of interest encodes a protein selected from, for example, a research protein, a pharmaceutical protein, or an industrial protein.
Still other embodiments provide a bacterium described herein for use in expressing a gene of interest.
Additional embodiments are described herein.
To facilitate an understanding of the present invention, a number of terms and phrases are defined below:
As used herein, the term “expression enhancement sequence” refers to a nucleic acid that enhances expression of a gene or other nucleic acid when located in the vicinity of the expression construct comprising the gene or other nucleic acid to be expressed. In some embodiments, the expression enhancement sequence flanks one or more sides (e.g., 3′ or 5′ side) of the expression construct. In some embodiments, the expression enhancement sequence is at least 50% (e.g., at least 50%, 60%, 65%, 70%, or more) GC nucleotides. In some embodiments, the expression enhancement sequence is at least a portion of nucleotides 281404 to 288692, 4189573 to 4204803, 4128474 to 4165334, 4003592 to 4060559, 3949128 to 3980248, 4313204 to 4318232, or 3387507 to 3439571 (using numbering from GenBank sequence accession U00096.3 or homologs thereof) of the E. coli genome and sequences at least 90% homologous (e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% homologous).
As used herein, the term “in vitro” refers to an artificial environment and to processes or reactions that occur within an artificial environment. In vitro environments include, but are not limited to, test tubes and cell cultures. The term “in vivo” refers to the natural environment (e.g., an animal or a cell) and to processes or reactions that occur within a natural environment.
As used herein, the term “microbe” refers to a microorganism and is intended to encompass both an individual organism and a preparation comprising any number of the organisms.
As used herein, the term “microorganism” refers to any species or type of microorganism, including but not limited to, bacteria, archaea, fungi, protozoans, mycoplasma, and parasitic organisms.
As used herein, the term “prokaryotes” refers to a group of organisms that usually lack a cell nucleus or any other membrane-bound organelles. In some embodiments, prokaryotes are bacteria. The term “prokaryote” includes both archaea and eubacteria.
As used herein, the term “eukaryote” refers to organisms distinguishable from “prokaryotes.” It is intended that the term encompass all organisms with cells that exhibit the usual characteristics of eukaryotes, such as the presence of a true nucleus bounded by a nuclear membrane, within which lie the chromosomes, the presence of membrane-bound organelles, and other characteristics commonly observed in eukaryotic organisms. Thus, the term includes, but is not limited to, such organisms as fungi, protozoa, and animals (e.g., humans).
As used herein, the term “cell culture” refers to any in vitro culture of cells, including, e.g., prokaryotic cells and eukaryotic cells. Included within this term are continuous cell lines (e.g., with an immortal phenotype), primary cell cultures, transformed cell lines, finite cell lines (e.g., non-transformed cells), bacterial cultures in or on solid or liquid media, and any other cell population maintained in vitro.
As used herein, the term “sample” is used in its broadest sense. In one sense, it is meant to include a specimen or culture obtained from any source, as well as biological and environmental samples. Such examples are not however to be construed as limiting the sample types applicable to the present invention.
Provided herein are compositions and methods for production of proteins in microbial systems. In particular, provided herein are compositions and methods for expressing exogenous genes from specific chromosomal locations of E. coli.
The bacterial nucleoid is a dense structure composed of DNA, RNA and proteins that excludes other abundant cellular machinery, such as ribosomes and RNA polymerase (RNAP), from its interior (Chai et al. 2014; Jin and Cabrera 2006; Bakshi, Choi, and Weisshaar 2015). Several studies have demonstrated that packing of the nucleoid is non-random and condition dependent. For example, chromosome conformation capture studies in multiple bacterial species have revealed segments of DNA that preferentially self-interact, called chromosome interaction domains (Lioy et al. 2018a; Le et al. 2013a; Marbouty et al. 2015; Wang et al. 2015). During exponential growth, RNAPs are also organized into tight foci on the nucleoid surface, actively transcribing ribosomal RNA operons (rrn) (Cabrera and Jin 2006a). Despite the specific localization of DNA and RNAP, previous findings have shown that gene expression from different genomic loci is roughly equivalent, except for the effect of gene dosage, which decreases from the origin of replication to the terminus during exponential growth (Beckwith, Signer, and Epstein 1966; Sousa, de Lorenzo, and Cebolla 1997). Higher gene dosage near the origin is a result of multiple replication initiation events before terminus replication and cell division (Cooper and Helmstetter 1968).
Experiments conducted during the course of development of embodiments of the present disclosure identified a number of regions of the E. coli chromosome that resulted in increased levels of expression of genes of interest relative to other chromosomal locations.
Accordingly, provided herein are compositions and methods for expressing genes of interest in E. coli by insertion in the chromosome at the identified locations (e.g., the locations described in Table 1 or sequence at least 90% homologous (e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% homologous) to the regions described in Table 1) or inserted within expression enhancement sequences.
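By way of illustration only, the coordinate ranges recited in this disclosure (GenBank accession U00096.3 numbering) can be encoded as intervals and a candidate insertion coordinate tested against them. The following Python sketch is not part of the disclosed methods. The names HIGH_EXPRESSION_REGIONS and in_listed_region are hypothetical, and treating the ranges as 1-based and inclusive is an assumption.

```python
# Coordinate ranges as recited in this disclosure (GenBank U00096.3 numbering).
# Treating them as 1-based, inclusive intervals is an assumption of this sketch.
HIGH_EXPRESSION_REGIONS = [
    (281404, 288692),
    (4189573, 4204803),
    (4128474, 4165334),
    (4003592, 4060559),
    (3949128, 3980248),
    (4313204, 4318232),
    (3387507, 3439571),
]


def in_listed_region(position, regions=HIGH_EXPRESSION_REGIONS):
    """Return the (start, end) interval containing position, or None."""
    for start, end in regions:
        if start <= position <= end:
            return (start, end)
    return None
```

A coordinate outside every listed interval returns None, so the function can be used as a simple filter when choosing candidate integration sites.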
In some embodiments, the present disclosure provides expression enhancement sequences for expression of genes of interest. As described in Example 2, experiments conducted during the course of development of embodiments of the present disclosure identified GC content of the flanking sequence as a factor in levels of expression of a reporter gene. In some embodiments, the expression enhancement sequence is at least 50% (e.g., at least 50%, 60%, 65%, 70%, or more) GC nucleotides. In some embodiments, the recombination target site and/or heterologous genes is flanked on both sides by said expression enhancement sequence. In some embodiments, the expression enhancement sequence is at least 500 bp (e.g., at least 500, 600, 750, 1000, 2000, or more bp) per side.
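The GC-content and flank-length criteria above can be checked computationally for a candidate insertion point. The following Python sketch is illustrative only. The function names gc_fraction and flanks_qualify are hypothetical, the sequence is treated as linear (no wrap-around at the ends of a circular chromosome), and the default thresholds simply mirror the 50% and 500 bp values recited above.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in seq (case-insensitive); 0.0 for empty input."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def flanks_qualify(genome, insert_pos, flank_len=500, min_gc=0.50):
    """Check that both flanks of an insertion point meet the GC threshold.

    insert_pos is the 0-based index of the first base to the right of the
    insertion point. The sequence is treated as linear (an assumption), so
    positions closer than flank_len to either end do not qualify.
    """
    left = genome[max(0, insert_pos - flank_len):insert_pos]
    right = genome[insert_pos:insert_pos + flank_len]
    if len(left) < flank_len or len(right) < flank_len:
        return False
    return gc_fraction(left) >= min_gc and gc_fraction(right) >= min_gc
```

Requiring both flanks to pass reflects the embodiments in which the site is flanked on both sides by the expression enhancement sequence; a one-sided check would need only one of the two comparisons.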
In some embodiments, the present disclosure provides an E. coli bacterium engineered for expression of genes of interest at one or more of the locations recited in Table 1 or inserted within expression enhancement sequences. The present disclosure is not limited to particular methods for use in incorporating a nucleic acid encoding a gene of interest into the chromosome of the bacterium. In some embodiments, site-specific recombination is utilized. Site-specific recombination, also known as conservative site-specific recombination, is a type of genetic recombination in which DNA strand exchange takes place between segments possessing at least a certain degree of sequence homology. Site-specific recombinases (SSRs) perform rearrangements of DNA segments by recognizing and binding to short DNA sequences (sites), at which they cleave the DNA backbone, exchange the two DNA helices involved and rejoin the DNA strands. While in some site-specific recombination systems a recombinase enzyme and the recombination sites alone are enough to perform all these reactions, in other systems a number of accessory proteins and/or accessory sites are also utilized.
Recombination sites are typically between 30 and 200 nucleotides in length and consist of two motifs with a partial inverted-repeat symmetry, to which the recombinase binds, and which flank a central crossover sequence at which the recombination takes place. The pairs of sites between which the recombination occurs are usually identical, but there are exceptions (e.g. attP and attB of λ integrase).
The present disclosure is not limited to particular site specific recombination systems. Examples include, but are not limited to, FLP/FLP recombination target (FRT), CRE/LOX, lambda-integrase (using attP/B recognition sites), gamma-delta resolvase (from the Tn1000 transposon), Tn3 resolvase (from the Tn3 transposon), lambda red mediated recombination (from phage lambda), CRISPR/CAS9, transposases (e.g., Tn5) and φC31 integrase (from the φC31 phage).
Flp-FRT recombination involves the recombination of sequences between short flippase recognition target (FRT) sites by the recombinase flippase (Flp) derived from the 2μ plasmid of baker's yeast Saccharomyces cerevisiae. The 34 bp minimal FRT site has the sequence 5′GAAGTTCCTATTCtctagaaaGtATAGGAACTTC3′ (SEQ ID NO:1), in which flippase (Flp) binds to both 13-bp 5′-GAAGTTCCTATTC-3′ (SEQ ID NO:2) arms flanking the 8 bp spacer, i.e., the site-specific recombination (region of crossover) in reverse orientation. FRT-mediated cleavage occurs just ahead of the asymmetric 8 bp core region (5′tctagaaa3′) on the top strand and behind this sequence on the bottom strand (See e.g., Zhu X D, Sadowski P D (1995). “Cleavage-dependent Ligation by the FLP Recombinase”. Journal of Biological Chemistry. 270 (39): 23044-54; herein incorporated by reference in its entirety).
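The arm/spacer/arm architecture of the minimal FRT site can be verified directly from SEQ ID NO:1. The following Python sketch is illustrative only; the helper names reverse_complement, split_site and arm_mismatches are hypothetical. It splits the 34 bp site into its 13 bp arms and 8 bp spacer and counts the positions at which the right arm departs from a perfect inverted repeat of the left arm.

```python
FRT = "GAAGTTCCTATTCtctagaaaGtATAGGAACTTC"  # minimal FRT site as written above

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (input is upper-cased first)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def split_site(site, arm_len=13, spacer_len=8):
    """Split a recombination site into (left arm, spacer, right arm)."""
    site = site.upper()
    if len(site) != 2 * arm_len + spacer_len:
        raise ValueError("site length does not match arm/spacer lengths")
    left = site[:arm_len]
    spacer = site[arm_len:arm_len + spacer_len]
    right = site[arm_len + spacer_len:]
    return left, spacer, right


def arm_mismatches(site):
    """Count positions where the right arm is not the inverted repeat of the left."""
    left, _, right = split_site(site)
    return sum(a != b for a, b in zip(left, reverse_complement(right)))
```

For the sequence given above, the two arms form an inverted repeat with a single mismatch, at the position written in lower case in the right arm, and the 8 bp spacer is not equal to its own reverse complement, consistent with its description as asymmetric.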
Cre-Lox recombination is similar to the FLP/FRT system. The system comprises a single enzyme, Cre recombinase, which recombines a pair of short target sequences called the Lox sequences (ATAACTTCGTATA-NNNTANNN-TATACGAAGTTAT) (SEQ ID NO:3). The Cre enzyme and the original Lox site called the LoxP sequence are derived from bacteriophage P1 (See e.g., Sauer, B. (1987). “Functional expression of the Cre-Lox site-specific recombination system in the yeast Saccharomyces cerevisiae”. Mol Cell Biol. 7 (6): 2087-2096; herein incorporated by reference in its entirety).
Lambda-integrase (using attP/B recognition sites) is described, for example, in Landy, A. (1989). “Dynamic, Structural, and Regulatory Aspects of lambda Site-Specific Recombination”. Annual Review of Biochemistry. 58 (1): 913-41; herein incorporated by reference in its entirety. Additional phage and transposon systems are described, for example, in Stark, W. M.; Boocock, M. R. (1995). “Topological selectivity in site-specific recombination”. Mobile Genetic Elements. Oxford University Press. pp. 101-29; herein incorporated by reference in its entirety.
Lambda-red mediated recombination is described in Datsenko and Wanner 2000. “One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products”. Proc Natl Acad Sci USA. 2000 Jun. 6; 97(12):6640-5; herein incorporated by reference in its entirety. By introducing a PCR product or plasmid containing regions for recombination at a specific genomic location, a genetic element may be permanently introduced at a specific location in a host genome.
Clustered regularly interspaced short palindromic repeats (CRISPR) are segments of prokaryotic DNA containing short, repetitive base sequences. These play a key role in a bacterial defense system, and form the basis of a genome editing technology known as CRISPR/Cas9 that allows permanent modification of genes within organisms (See e.g., Zhang F, Wen Y, Guo X (2014). “CRISPR/Cas9 for genome editing: progress, implications and challenges”. Human Molecular Genetics. 23 (R1): R40-6; herein incorporated by reference in its entirety). By delivering the Cas9 nuclease complexed with a synthetic guide RNA (gRNA) into a cell, the cell's genome can be cut at a desired location, allowing existing genes to be removed and/or new ones added.
In some embodiments, transposases (e.g., Tn5) and transposons are utilized for incorporation of genes of interest (See e.g., Example 1 and Reznikoff W S (March 2003). “Tn5 as a model for understanding DNA transposition”. Molecular Microbiology. 47 (5): 1199-206; herein incorporated by reference in its entirety).
In some embodiments, the recombination enzyme specific for the chosen target site is provided on a plasmid or at a chromosomal location (e.g., under the control of an inducible promoter).
In some embodiments, target sites on the E. coli chromosome further comprise additional elements useful for expression of a gene of interest. Examples include, but are not limited to, promoters (e.g. inducible or constitutive promoters), repressors or operators for control of the promoter, reporter constructs, or a gene encoding a protein purification tag.
In some embodiments, reporter constructs are utilized to measure or identify expression of the gene or nucleic acid of interest. The present disclosure is not limited to particular reporter constructs. Examples include, but are not limited to, fluorescent reporters (e.g., green fluorescent protein or red fluorescent protein), beta-galactosidase, luciferase, chloramphenicol acetyltransferase, etc. (for a review, see e.g., Ghim et al., BMB Rep. 2010 July;43(7):451-60; herein incorporated by reference in its entirety).
The present disclosure is not limited to particular promoters. In some embodiments, the promoter is an inducible promoter. Examples include, but are not limited to, a lac promoter, a tac promoter, or a T7 promoter (See e.g., Dubendorff J W, Studier F W (1991). “Controlling basal expression in an inducible T7 expression system by blocking the target T7 promoter with lac repressor”. Journal of Molecular Biology. 219 (1): 45-59; deBoer H. A., Comstock, L. J., Vasser, M. (1983). “The tac promoter: a functional hybrid derived from trp and lac promoters”. Proceedings of the National Academy of Sciences USA. 80 (1): 21-25; each of which is incorporated by reference in its entirety).
Examples of protein purification tags include, but are not limited to, histidine (His) tag, glutathione S-transferase or maltose-binding protein.
In some embodiments, the bacterium described herein is provided in the form of a kit or system, comprising, in one or more containers, one or more or all of a bacterium described herein, a recombination enzyme specific for the recombination target on the bacterial chromosome (e.g., provided as a plasmid or incorporated into the genome of the bacterium), growth medium, growth container, inducers or enhancers of an inducible promoter, a plasmid for inserting a gene of interest in between a site-specific recombination target sequence, or components for purification of a protein expressed from said gene of interest.
In operation, the present disclosure provides a method of expressing a gene of interest at a specific location (e.g., those described in Table 1 or sequences with a high GC content as described above) on an E. coli chromosome. In order to express a gene of interest, one introduces a nucleic acid encoding the gene of interest (e.g., on a plasmid or other vector) into a bacterium described herein. In some embodiments, the gene is inserted in between target recombination sites on the plasmid or vector. The vector comprising the gene of interest is transferred into the bacterium (e.g., using transfection or other technique). The recombination enzyme is provided (e.g., by inducing its expression) such that the gene of interest incorporates into the chromosome at the desired location using site-specific recombination.
In some embodiments, bacteria are cultured (e.g., using a liquid culture system or on a culture medium) in order to produce protein encoded by the gene of interest. Bacteria are grown for the desired length of time in order to produce adequate amounts of protein. In some embodiments, growth is monitored (e.g., via light microscopy or visible observation). In some embodiments, culture medium is replenished one or more times during culture.
In some embodiments, the bacterium expresses a gene of interest inserted into the chromosome at the genomic position at a higher level relative to the level of expression of the gene inserted into a different location on the chromosome (e.g., at least 1.5, 2, 5, 10, 20, 30, 50, or 100-fold higher).
In some embodiments, following expression, a protein expressed from the gene of interest is purified. Following lysis of cells, proteins of interest are purified using any suitable method (e.g., one or more of affinity purification for a purification tag, ion exchange chromatography, or size exclusion chromatography).
The present disclosure is not limited to particular genes of interest. In some embodiments, the gene of interest encodes a protein for research or screening use, a pharmaceutical protein, or an industrial protein.
The following example is provided in order to demonstrate and further illustrate certain preferred embodiments and aspects of the present invention and is not to be construed as limiting the scope thereof.
The mNG coding sequence was obtained from Allele Biotech. mNG was under the TetR1 promoter and the B0030 ribosome binding site, which is predicted to have a 30-fold lower translation initiation rate than the highest rate of a native gene in E. coli (Kosuri et al. 2013b; Espah Borujeni, Channarasappa, and Salis 2014). Upstream of the mNG cassette, an FRT-flanked kanamycin resistance cassette amplified from the Keio collection was introduced in the divergent orientation relative to mNG (Baba et al. 2006). Directly downstream of the mNG coding sequence, an Illumina i5 adapter primer complement sequence and an AscI recognition site for later barcoding of the integration construct were inserted. The reporter and antibiotic cassettes are flanked by the strong bidirectional terminators L3S2P21 and ECK120026481 (Y.-J. Chen et al. 2013b). Finally, the entire cassette is flanked by mosaic ends (MEs) to allow for binding to Tn5 transposase. The ME-flanked construct was modified to remove two PvuII restriction sites in order to allow for PvuII digestion of the plasmid pSAS31 and release of the integration construct for Tn5 transposase binding in vitro.
MG1655 (CGSC 7740) was obtained from the Coli Genetic Stock Center (CGSC, Yale) (Blattner 1997). P1 vir transduction was used to introduce the Z1 cassette from MG1655 Z1 malE (Addgene plasmid #65915) into MG1655. This MG1655 Z1 strain was then transformed with the lambda red plasmid pSIM5. The primers BT1promCh F and BT1promCh R were used to amplify the mCherry and ampicillin resistance cassette from pBT1-proD-mCherry (Addgene plasmid #65823). The mCherry cassette was then integrated into a site directly downstream from yihG using lambda red recombination to produce ecSAS17 (MG1655 malE::Z1 mCherry+AmpR). The mCherry integration was confirmed by genotyping, and the transduction of the Z1 cassette was confirmed by observing TetR-mediated repression of mNG compared to a blank MG1655 strain. ecSAS17 was then transformed with the pBAD-Flp plasmid (see below) to provide the starting strain for library generation.
pSAS31 was digested with the restriction enzyme AscI. Primers were used to introduce the barcode and amplify the entire plasmid by PCR.
To generate stable transposomes for electroporation into the target strain, barcoded pSAS31 plasmid was digested with PvuII for one hour at 37° C. and fragments were separated on a 0.8% agarose gel. The band corresponding to the integration fragment size was cut out of the gel and purified. 200 ng/μl fragment was then incubated with 2 μl Tn5 transposase and 1 μl glycerol according to the manufacturer's instructions. After 30 minutes of incubation at room temperature, the mixture was stored at −20° C. Electrocompetent cells were prepared using ecSAS17 with chloramphenicol included in the growth medium in order to maintain the pBADpCP20 Flp recombinase plasmid. 1 μl of the Tn5-DNA complex was mixed with 50 μl of fresh electrocompetent cells. Four separate electroporations were carried out at 1800 V and the cells were immediately resuspended in 1 mL of 30° C. SOC medium. The reactions were pooled into SOC medium including chloramphenicol and incubated at 30° C. for 1.5 hours. An aliquot for plating was removed from the recovery medium before adding kanamycin. Liquid selection proceeded for 16 hrs at 30° C. After liquid selection, all cells were pelleted at 4600×g for 7 minutes. Cells were then resuspended in 30 mL 15% glycerol, pipetted into thirty 1 mL aliquots and snap frozen in a dry-ice ethanol bath before storage of the transposon library at −80° C. (Girgis et al. 2007b). According to colony forming unit counts from plating after recovery, 609,000 cells were uniquely transformed and maintained pBAD-Flp, as indicated by resistance to kanamycin and chloramphenicol.
Pairing Integration Site with Barcode Via Transposon Footprinting
Cells from one aliquot of the transposon library were recovered in 5 mL SOC for 30 minutes at 30° C. with shaking. Genomic DNA was isolated from the library using the Qiagen Blood and Tissue kit for Gram negative bacteria. 1 μg of the resulting DNA was digested with either CviAII or CviQI restriction enzymes (each has a different 4 bp cut site but leaves compatible overhangs). An annealed Y-linker that complements the overhangs was ligated to the digested DNA fragments with T4 DNA ligase for 10 minutes. The reaction was quenched with EDTA. The DNA from the ligation mix was purified with Axygen AxyPrep Mag PCR cleanup beads at a 0.9:1 bead to DNA ratio to remove unligated Y-linker. The resulting DNA was amplified by PCR using primers that bind within the transposon and on the Y-linker to amplify transposon-genomic DNA specific fragments. Illumina adapters were added to the resulting fragment by PCR.
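Once the transposon-genomic DNA junction fragments are sequenced, each read in principle carries both a barcode and the genomic position of the adjacent chromosomal sequence. The following Python sketch shows one way such reads could be tabulated into a barcode-to-site map. It is a hypothetical illustration and not the analysis actually used: the function name pair_barcodes, the input format of (barcode, position) tuples, and the min_reads and min_majority thresholds are all assumptions.

```python
from collections import Counter, defaultdict


def pair_barcodes(reads, min_reads=2, min_majority=0.9):
    """Assign each barcode to a single genomic position.

    reads is an iterable of (barcode, position) tuples, one per aligned read.
    A barcode is kept only if it has at least min_reads reads and at least
    min_majority of them agree on one position. Both thresholds are
    illustrative assumptions, not values from this disclosure.
    """
    counts = defaultdict(Counter)
    for barcode, position in reads:
        counts[barcode][position] += 1

    pairs = {}
    for barcode, positions in counts.items():
        total = sum(positions.values())
        position, top = positions.most_common(1)[0]
        if total >= min_reads and top / total >= min_majority:
            pairs[barcode] = position
    return pairs
```

Discarding barcodes whose reads disagree on position guards against a single barcode being shared by two independent integrations, at the cost of losing some usable barcodes.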
The cryopreserved transposon library was scraped into 1 mL of M9-EZrich medium and diluted into 50 mL of M9-EZrich 1% Arabinose+0.4% glycerol+chloramphenicol in a baffled 125 mL flask to achieve an OD600 of 0.0031. The flask was incubated at 30° C. for 8 hours with shaking at 225 rpm to allow Flp recombinase to excise the kanamycin resistance cassette. Cells were then pelleted at 4600×g for 7 minutes and resuspended in 50 mL PBS. In parallel, an aliquot of the culture was diluted and plated on LB-kanamycin and LB plates to determine the fraction of cells that permanently lost kanamycin resistance (<93%). Cells were pelleted again and resuspended in 10 mL M9RDM. Cells were then diluted into 100 mL of M9RDM+100 ng/mL anhydrotetracycline (aTc) to a final OD600 of 0.0031. The culture was incubated at 37° C. until an OD600 of 0.2 was reached (about 6 hours) to allow induction of the transposon-borne reporter construct. The entire flask was then immediately transferred to an ice-slurry bath. Three aliquots of 5 mL were then pelleted at 6600×g for 3 minutes and snap-frozen in a dry-ice ethanol bath to allow harvest of genomic DNA. In parallel, three additional 5 mL aliquots of the culture were rapidly mixed with 25 mL Bacteria RNA protect reagent (Qiagen) and frozen according to the manufacturer's instructions to allow harvest of RNA from matched samples of the growing library. All samples were then stored at −80° C.
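The starting and final optical densities of the induction culture fix the number of population doublings that occur during induction. The following Python sketch works through that arithmetic. It assumes that OD600 is proportional to cell density over this range and that growth is exponential for the whole interval with no lag phase; the function names are hypothetical.

```python
import math


def doublings(od_start, od_end):
    """Number of population doublings between two optical densities."""
    return math.log2(od_end / od_start)


def mean_doubling_time(od_start, od_end, hours):
    """Average doubling time in hours, assuming purely exponential growth."""
    return hours / doublings(od_start, od_end)


# Values taken from the protocol above: OD600 0.0031 to 0.2 in about 6 hours.
n = doublings(0.0031, 0.2)
t = mean_doubling_time(0.0031, 0.2, 6.0)
```

Under these assumptions the culture undergoes roughly six doublings, which corresponds to an average doubling time of about one hour.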
Genomic DNA (gDNA) from harvested samples was extracted following the Qiagen Blood and Tissue kit instructions. 1 μg of gDNA was then digested for 1 hour with CviQI. The resulting DNA was purified with a PCR cleanup kit and eluted into 0.1×TE. The DNA was then lightly amplified with primers flanking the barcode for eight cycles using Q5 polymerase, resulting in a 186 bp fragment. The DNA from the PCR mix was purified with Axygen AxyPrep Mag PCR cleanup beads at a 0.9:1 bead to DNA ratio to remove unincorporated primers. RNA from the exponentially growing cells was extracted following the Qiagen RNeasy Bacterial RNA protect protocol including on-column DNaseI treatment. 1 μg of the resulting RNA and a single reverse primer were used for first strand synthesis with the NEB ProtoscriptII First Strand cDNA kit using the manufacturer's instructions, and the resulting cDNA was stored at −20° C. No-polymerase controls (−RT) were included. 20 μl of the cDNA or 5 μl of cDNA reaction mixture was used for a 50 μl minimal-cycle PCR amplification using NEB Q5 hotstart polymerase, following the manufacturer's instructions with the following modifications: NEB i5xx or i7xx primers were used to add Illumina adapter sequences. EvaGreen dsDNA dye was added to each reaction to a final 1× concentration. 10 μl of each reaction (including −RT controls) was then monitored for qPCR fluorescence signal during PCR amplification. The remaining 40 μl of each reaction was then amplified with the number of PCR cycles corresponding to 25% of the maximum fluorescence observed in the 10 μl qPCR pilot reaction. The cycle threshold for the −RT cDNA controls was verified to be at least 7 cycles greater than that of the standard cDNA samples (indicating background from DNA contamination of less than 1%). Each 40 μl PCR reaction was then purified with 90 μl of Axygen MAG-S1 beads and eluted in 0.1×TE. The purified DNA was submitted to the University of Michigan sequencing core for sequencing on a NextSeq 550.
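Two pieces of arithmetic in the protocol above can be made explicit: the relation between a cycle-threshold difference and the contaminating DNA fraction, and the choice of a cycle number from a pilot qPCR trace. The following Python sketch is illustrative only. It assumes perfect doubling per cycle (amplification efficiency of 2) and a baseline-subtracted fluorescence trace; the function names are hypothetical.

```python
def contamination_fraction(delta_ct, efficiency=2.0):
    """Relative template amount in a -RT control versus the +RT sample.

    delta_ct is Ct(-RT) minus Ct(+RT). An efficiency of 2.0 (perfect doubling
    per cycle) is an assumption of this sketch.
    """
    return efficiency ** (-delta_ct)


def cycles_to_fraction_of_max(fluorescence, fraction=0.25):
    """First cycle (1-based) at which the signal reaches fraction of its maximum.

    fluorescence is a list of baseline-subtracted readings, one per cycle.
    """
    threshold = fraction * max(fluorescence)
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal >= threshold:
            return cycle
    raise ValueError("signal never reaches the requested fraction of maximum")
```

With perfect doubling, a difference of 7 cycles corresponds to a factor of 2^7 = 128, i.e. a −RT signal of about 0.8% of the +RT signal, which is consistent with the stated bound of less than 1%.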
Using a modification of transposon footprinting (Girgis et al. 2007a), 144,000 random integration reporter barcodes were paired to their genomic integration locations in E. coli. Higher densities of reporter integrations were present around the origin of replication (Ori), likely due to higher Ori-Terminus chromosome copy number during exponential phase growth at the time of transformation.
The seven ribosomal RNA operons in the E. coli genome are located within each of the major peaks of transcriptional propensity.
The correlation of transcriptional propensity with several characterized genomic features was examined using rolling-window medians over 500 bp for each data set. Despite the fact that the abundant NAP Fis is not expected to bind the reporter construct, transcriptional propensity is highly positively correlated with Fis binding level at genomic integration sites (Spearman ρ=0.5).
RNA abundance from native genes displays only a weak positive correlation with transcriptional propensity (Spearman ρ=0.23). However, when larger rolling median windows are used for RNA abundance from native genes, correlation with transcriptional propensity is much higher (Spearman ρ=0.51). These results show that while highly expressed genes are more frequently located in high transcriptional propensity regions, the regulatory logic governing expression of individual genes is dominant over the underlying transcriptional propensity of a given region.
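The comparison described above, smoothing each signal with a rolling-window median and then computing a Spearman rank correlation, can be sketched as follows. This is a minimal standard-library illustration that assumes each data set is already a list of per-position values; the function names are hypothetical and the analysis pipeline actually used is not specified here.

```python
from statistics import median


def rolling_median(values, window):
    """Median over a centered window, truncated at the ends of the list."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy signals standing in for transcriptional propensity and RNA abundance.
propensity = [0.1, 0.4, 0.3, 0.9, 1.2, 1.1, 0.2, 0.5]
rna = [5.0, 7.0, 60.0, 9.0, 14.0, 12.0, 4.0, 6.0]
print(spearman(rolling_median(propensity, 3), rolling_median(rna, 3)))
```

Widening the `window` argument suppresses gene-by-gene spikes such as the isolated value of 60.0 in the toy data, which is the same reason larger rolling windows expose the regional trend in the measurements above.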
The correlation of neighboring RNA abundance in all orientations relative to the reporter with transcriptional propensity was examined. Native RNA abundances resulting from tandem orientation (or co-directional) transcription with reporters have weak correlation with transcriptional propensity. These data indicate that even very high neighboring transcription may have only a mild impact on transcriptional propensity. Since the correlations of native RNA abundances with transcriptional propensity in the tandem orientation with respect to the reporter are very similar regardless of which is upstream, insulation by the strong upstream transcriptional terminator of the reporter is also validated. Correlation of transcriptional propensity with RNA abundance from the divergent or convergent orientation is lower compared to tandem orientations. Together, these results indicate that neighboring transcription mildly affects transcriptional propensity. These results are largely consistent with the careful mechanistic studies of orientation-dependent expression interference between genes on plasmids (Yeung et al. 2017b). Transcriptional propensity from reporters on each strand displays the same overall waveform pattern and is highly correlated between strands. No evidence for a strand bias depending on the reporter direction with respect to replication direction was observed.
Binding of other NAPs (HU, LRP and SeqA) was not well correlated with transcriptional propensity, nor was RNAP binding. No correlation of transcriptional propensity with a measure of DNA supercoiling density or with reporter location with respect to genes encoding proteins recognized by the signal recognition particle was observed. Mean adenine and thymine (AT) content in a 500 bp window around insertion locations was strongly negatively correlated with transcriptional propensity. AT content is also highly correlated with H-NS and protein occupancy binding. For this reason, conditional mutual information analysis was used to rank features for predicting transcriptional propensity. Together, these features describe 69.9% of the transcriptional propensity variation.
iPAGE analysis was used to identify pathways (Gene Ontology annotations) that are significantly informative about the transcriptional propensity at each gene location (Goodarzi, Elemento, and Tavazoie 2009). Large ribosomal subunit genes are significantly informative of high transcriptional propensity (GO:0022625). Genes in pathways for enterobacterial common antigen biosynthesis or organic phosphonate catabolism are clustered together in high transcriptional propensity peaks. Cellular amino acid biosynthetic process (GO:0008652), which has 105 genes in E. coli, is also significantly predictive of high transcriptional propensity. Intracellular protein transmembrane transport (GO:0065002), which is composed of twin-arginine transport genes for the export of folded proteins and genes for the SRP-dependent cotranslational protein targeting machinery, which are not clustered together, was also identified. However, there is no difference in local transcriptional propensity between genes that encode products that are recognized by SRP and all other genes.
The correlation between integration density and multiple known genomic features was also tested. A B-spline was used to transform the integration density signal over a 500 bp rolling window to correct for expected differences in integration densities that result from high gene dosage near the origin during transformation and Tn5 integration. H-NS binding had a very strong positive correlation with integration density. In contrast, low gene expression from eukaryotic genomes is well-correlated with heterochromatin that is inaccessible to transposon insertion. The correlation with H-NS was higher than the positive correlation of integration density with AT-content.
Table 1 shows exemplary regions that led to high levels of expression of inserted genes. Each peak range was identified as the shortest span that encompassed a region of high transcriptional propensity above the regional median propensity. These ranges comprise extended regions (>5 kb) which together make up 4.4% of the genome with the highest transcriptional propensity.
Dozens of existing genomic features from the genomic regions described in Example 1, such as protein binding and genome conformation contact maps, were compared to determine features responsible for the substantial expression variation. Of all genomic features, the proportion of guanine (G) and cytosine (C) in the overall nucleobase composition was the most strongly correlated with transcriptional propensity. Additionally, it was observed that long stretches of extreme GC content have particularly extreme transcription differences. Based on these observations, synthetic DNA constructs were built with the exact same reporter construct flanked by 3.5 kilobases of DNA with extremely low (35%) or high (65%) GC content, obtained from the genomes of different organisms (S. cerevisiae and H. sapiens, respectively). Two plasmids that were identical with the exception of the GC content flanking the reporter construct were transformed into the E. coli strain Top10. The fluorescence produced by the 65% GC content plasmid was very strong compared to the 35% plasmid, which does not appear to have a fluorescent signal compared to a strain that expresses a repressor for the reporter (
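The windowed AT content referred to above can be computed as in the following sketch. It assumes the genome is available as a plain string and, for simplicity, treats it as linear, truncating the window at the sequence ends rather than wrapping around the circular chromosome; the function name and arguments are illustrative only.

```python
def at_fraction(sequence: str, center: int, window: int = 500) -> float:
    """Fraction of A and T bases in a window centered on an insertion site.

    sequence -- genome sequence as a string of A/C/G/T characters
    center   -- 0-based coordinate of the insertion site
    window   -- window width in bp (500 bp matches the text above)
    """
    half = window // 2
    start = max(0, center - half)
    end = min(len(sequence), center + half)
    region = sequence[start:end].upper()
    if not region:
        raise ValueError("window does not overlap the sequence")
    at_count = region.count("A") + region.count("T")
    return at_count / len(region)


# Toy example: an AT-rich stretch followed by a GC-rich stretch.
toy_genome = "ATATATATAT" + "GCGCGCGCGC"
print(at_fraction(toy_genome, center=5, window=10))
print(at_fraction(toy_genome, center=15, window=10))
```

The corresponding GC fraction is one minus this value when the sequence contains only unambiguous bases.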
All publications, patents, patent applications and accession numbers mentioned in the above specification are herein incorporated by reference in their entirety. Although the invention has been described in connection with specific embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications and variations of the described compositions and methods of the invention will be apparent to those of ordinary skill in the art and are intended to be within the scope of the following claims.
This application claims priority to U.S. provisional patent application Ser. No. 62/666,198, filed May 3, 2018, which is incorporated herein by reference in its entirety.
This invention was made with government support under Grant No. GM097033 awarded by National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2019/030367 | 5/2/2019 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62666198 | May 2018 | US |