This application contains a Sequence Listing which has been filed electronically in ASCII and is hereby incorporated by reference in its entirety. This ASCII copy, created on Dec. 2, 2019, is named M065670412US01-SUBSEQ-CRP and is 98.199 kB in size.
Disclosed herein are novel CRISPR/dCas9-based fusion proteins that produce significantly less toxicity in comparison to previously described CRISPR/Cas9-based proteins, and complex genetic circuits controlled by the novel CRISPR/dCas9-based fusion proteins.
Synthetic regulatory networks enable the control of when genes are turned on (Khalil A. S. and Collins J. J., Nat. Rev. Genet., 2010 May; 11(5): 367-79). Natural networks can consist of hundreds of regulators, but implementing synthetic versions at this scale has proven elusive (Purnick P. E. and Weiss R., Nat. Rev. Mol. Cell. Biol., 2009 June; 10(6): 410-22). Regulators used to build such networks have to perform reliably, cannot interfere with each other, and must tax cellular resources minimally (Nielsen A. A., et al., Curr. Opin. Chem. Biol., 2013 December; 17(6): 878-92). Sets of protein-based repressors and activators have been used to build regulatory circuits, but expanding the set becomes increasingly difficult as each new protein needs to be tested for cross-reactions with the remainder in the set (Gaber R., et al., Nat. Chem. Biol., 2014 March; 10(3): 203-8; Garg A., et al., Nucleic Acids Res., 2012 August; 40(15): 7584-95; Li Y., et al., Nat. Chem. Biol., 2015 March; 11(3): 207-13; Nielsen A. A., et al., Science, 2016. 352(6281): aac7341; Stanton B. C., et al., Nat. Chem. Biol., 2014. 10(2): p. 99-105). Further, protein expression draws on cellular resources (ATP, ribosomes, amino acids, etc.), and this can result in slow growth, reduced metabolic performance, and evolutionary instability (Ceroni F., et al., Nat. Methods, 2018 May; 15(5): 387-93; Lynch M. and Marinov G. K., Proc. Natl. Acad. Sci. USA, 2015 Dec. 22; 112(51): 15690-5; Pasini M., et al., N. Biotechnol. 2016 Jan. 25; 33(1): 78-90).
Regulators based on CRISPR (clustered regularly interspaced short palindromic repeats) machinery offer a potential solution (Barrangou R., et al., Science, 2007 Mar. 23; 315(5819): 1709-12; Deltcheva E., et al., Nature, 2011 Mar. 31; 471(7340): 602-7; Jinek M., et al., Science, 2012. 337(6096): p. 816-821; Cong L., et al., Science, 2013. 339(6121): p. 819-23; Mali P., et al., Science, 2013. 339(6121): p. 823-26; Gasiunas G., et al., Proc. Natl. Acad. Sci. USA, 2012 Sep. 25; 109(39): 15539-40). Catalytically inactive dCas9 can be used as a repressor by using the small guide RNA (sgRNA) to target a sequence within a promoter to sterically block RNA polymerase (RNAP) (Qi Lei S., et al., Cell, 2013. 152(5): p. 1173-83; Bikard D., et al., Nucleic Acids Res., 2013 August; 41(15): 7429-37). The target sequence in the promoter is based on a 3 nt PAM sequence, which binds to the dCas9 protein, and a 20 nt targeting region that basepairs with the sgRNA. Different DNA sequences can be targeted by changing this region, which has been the basis for building large sets of sgRNA-promoter pairs that exhibit little or no crosstalk. Up to 5 pairs have been shown in E. coli (Nielsen A. A. and Voigt C. A., Mol. Syst. Biol., 2014. 10(763): 1-11) and up to 20 pairs in yeast (Gander M. W., et al., Nat. Commun., 2017 May 25; 8: 15459), but theoretically thousands could be made, essentially solving the need for orthogonal regulators to build large networks. In addition, sgRNA-circuits do not require translation to function, thus simplifying their use in the nucleus of eukaryotic cells. Previously, dCas9 has been used to build simple logic circuits and cascades with up to 3 sgRNAs in bacteria, 7 sgRNAs in yeast, and 4 sgRNAs in mammalian cells (Nielsen A. A. and Voigt C. A., Mol. Syst. Biol., 2014. 10(763): 1-11; Gander M. W., et al., Nat. Commun., 2017 May 25; 8: 15459; Didovyk A., et al., ACS Synth. Biol., 2016 Jan. 15; 5(1): 81-8; Gao Y., et al., Nat. Methods, 2016 December; 13(12): 1043-49; Holowko M. B., et al., ACS Synth. Biol., 2016 Nov. 18; 5(11): 1275-83; Kiani S., et al., Nat. Methods, 2014 July; 11(7): 723-6; Weinberg B. H., et al., Nat. Biotechnol., 2017 May; 35(5): 453-62).
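By way of illustration only, the target-site layout described above (a 20 nt targeting region followed by a 3 nt PAM of the form NGG) can be located in a promoter sequence with a simple scan. The following is a minimal sketch; the function name and the example sequence are hypothetical and are not taken from this disclosure, and only the strand supplied to the function is scanned.

```python
def find_target_sites(seq, target_len=20):
    """Return (start, targeting_region, pam) for each NGG PAM on the given strand.

    Assumes the targeting region lies immediately 5' of the PAM on the
    strand passed in. `start` is a 0-based index into `seq`.
    """
    seq = seq.upper()
    sites = []
    # i is the index of the first PAM base; it needs target_len bases before it
    for i in range(target_len, len(seq) - 2):
        pam = seq[i:i + 3]
        if pam[1:] == "GG":
            sites.append((i - target_len, seq[i - target_len:i], pam))
    return sites


# Hypothetical promoter fragment: flank + 20 nt targeting region + PAM + flank
example = "TTGACA" + "ACGTACGTACGTACGTACGT" + "TGG" + "TATAAT"
for start, region, pam in find_target_sites(example):
    print(start, region, pam)
```

Changing the 20 nt region while keeping the PAM is what yields a new sgRNA-promoter pair in this picture.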
Despite the promise, there are several limitations in the scale-up of dCas9-based circuits. The foremost challenge is that high concentrations of dCas9 are toxic in many bacteria (Rock J. M., et al., Nat. Microbiol., 2017. 2(16274): p. 1-9; Cho S., et al., ACS Synth. Biol., 2018 Apr. 20; 7(4): 1085-94; Lee Y. J., et al., Nucleic Acids Res., 2016 Mar. 18; 44(5): 2462-73). This can be avoided for genome editing and CRISPR interference (CRISPRi) experiments by keeping the concentration low or limiting how long it is expressed (Peters J. M., et al., Curr. Opin. Microbiol., 2015 October; 27: 121-26). However, for a genetic circuit, dCas9 needs to be continuously available, including under the conditions required by the application, for example in a fermenter. This is compounded by the problem that multiple sgRNAs all have to share the same pool of dCas9. The draw-down of a shared resource leads to changes in performance of all the sgRNAs, referred to as “retroactivity,” and this can have a damaging impact on circuit function (Del Vecchio D., et al., Mol. Syst. Biol., 2008. 4(161): 1-16; Jayanthi S., et al., ACS Synth. Biol., 2013 Aug. 16; 2(8): 431-41; Brewster R. C., et al., Cell, 2014 March; 156(6): 1312-23; Qian Y., et al., ACS Synth. Biol., 2017 Jul. 21; 6(7): 1263-72). Further, sgRNA-based gates have remarkably low cooperativity (Hill coefficient n≈1.0) (Nielsen A. A. and Voigt C. A., Mol. Syst. Biol., 2014. 10(763): 1-11). Higher cooperativities (n>1) are required to build regulation that implements multistable switches, feedback control, cascades, and oscillations (Strogatz S. H., Hachette UK, 2014; Hooshangi S., et al., Proc. Natl. Acad. Sci. USA, 2005 Mar. 8; 102(10): 3581-86; Ferrell J. E. Jr and Ha S. H., Trends Biochem. Sci., 2014 December; 39(12): 612-8; Gardner T. S., et al., Nature, 2000 Jan. 20; 403(6767): 339-42).
In yeast, the cooperativity of sgRNA-based regulation was increased by fusing dCas9 to the chromatin remodeling repression domain Mxi1, but there is no equivalent approach for prokaryotes (Gander M. W., et al., Nat. Commun., 2017 May 25; 8: 15459).
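To make the role of the Hill coefficient concrete, the following is a minimal sketch of a repressing Hill function of the kind commonly used to describe NOT gates. The functional form and the parameter names (ymin, ymax, K) are assumptions introduced here for illustration and are not stated in the text above. The second function gives the fold change in input needed to move the output from 90% to 10% of its range, which for this form equals 81 raised to the power 1/n.

```python
def not_gate_response(x, ymin, ymax, K, n):
    """Output of a repressing Hill function (assumed form) at input level x."""
    return ymin + (ymax - ymin) * K**n / (K**n + x**n)


def switching_ratio(n):
    """Fold change in input that takes the output from 90% to 10% of its range.

    Solving K^n / (K^n + x^n) = f for x gives x = K * ((1 - f) / f) ** (1 / n),
    so the ratio between f = 0.1 and f = 0.9 is 81 ** (1 / n).
    """
    return 81.0 ** (1.0 / n)


# Hypothetical parameter values, chosen only to exercise the functions
print(not_gate_response(10.0, ymin=1.0, ymax=101.0, K=10.0, n=1.6))
print(switching_ratio(1.0), switching_ratio(1.6))
```

Under this assumed form, a gate with n=1 needs an 81-fold change in input to switch, and the required change shrinks as n increases, which is why higher cooperativity gives a more digital response.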
The origins of dCas9 toxicity are poorly understood. It has been observed that dCas9 binds non-specifically to NGG PAM sites, particularly when unbound to an sgRNA, and there are many GG sequences in the genome (5.4×10⁵ PAM sites per E. coli genome) (Jones D. L., et al., Science, 2017 Sep. 29; 357(6358): 1420-24). While it primarily binds to this motif, it has been shown that it can also inefficiently recognize other PAM sequences (e.g., NAG or NGA) (Hsu P. D., et al., Nat. Biotechnol., 2013. 31(9): 827-32; Zhang Y., et al., Sci. Rep., 2014. 4(5405): 1-5). Further, dCas9 functions by first actively interrogating the genome to search for the PAM motif, and then checking the complementarity of the sgRNA sequence to the target site (Jinek M., et al., Science, 2012. 337(6096): 816-821; Qi Lei S., et al., Cell, 2013. 152(5): 1173-83). The search for PAM binding involves actively opening the DNA double strands in the chromosome (Sternberg S. H., et al., Nature, 2014. 507(7490): 62-67). Previous studies also demonstrated that off-target genomic loci with up to six nucleotides that differ from the sgRNA sequence could still be recognized by Cas9, albeit with lower efficiency (but still requiring the PAM site) (Kim D., et al., Nat. Methods, 2015. 12(3): 237-43). These observations collectively point to the non-specific binding to NGG sequences by dCas9 as being a significant contributor to toxicity.
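The abundance of candidate PAM sites noted above can be estimated for any sequence by counting NGG occurrences. The sketch below is illustrative only: the function names are hypothetical, counting on both strands is an assumption about how such a figure would be tallied, and the sequence is treated as linear even though the E. coli chromosome is circular.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_ngg(seq):
    """Count overlapping NGG motifs on both strands of a linear sequence."""
    def count_strand(s):
        # GG must be preceded by one base (the N), so start at index 1
        return sum(1 for i in range(1, len(s) - 1) if s[i:i + 2] == "GG")

    s = seq.upper()
    return count_strand(s) + count_strand(reverse_complement(s))


# Hypothetical 6 nt example: one NGG on each strand
print(count_ngg("AGGCCT"))
```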
It was hypothesized that reducing the non-specific binding of dCas9 would alleviate toxicity. The specificity of active Cas9 for genome editing applications has been increased via a variety of strategies, including point mutations to enhance PAM binding (Kleinstiver B. P., et al., Nature, 2015. 523(7561): p. 481-85; Slaymaker I. M., et al., Science, 2016. 351(6268): 84-88), increasing sgRNA length (Fu Y., et al., Nat. Biotechnol., 2014. 32(3): 279-84; Chen B., et al., Cell, 2013. 155(7): 1479-91), splitting Cas9 (Zetsche B., et al., Nat. Biotechnol., 2015. 33(2): 139-42; Nihongaki Y., et al., Nat. Biotechnol., 2015. 33(7): 755-60; Wright A. V., et al., Proc. Natl. Acad. Sci. USA, 2015. 112(10): 2984-89), and the use of a pair of Cas9 nickases or FokI-dCas9 nucleases to increase the length of targeting sequence (Mali P., et al., Nat. Biotechnol., 2013. 31(9): p. 833-38; Guilinger J. P., et al., Nat. Biotechnol., 2014. 32(6): 577-82). It has been shown that Cas9 can be mutated (R1335K) to impair its ability to recognize the PAM, thus completely blocking DNA cleavage (Bolukbasi M. F., et al., Nat. Methods, 2015 December; 12(12): 1150-56). Cleavage could be partially rescued by fusing a DNA binding protein (a ZFP or TALE) to dCas9 and placing the corresponding operator upstream of the region targeted by the sgRNA. The longer effective “operator” increases cleavage specificity.
As described herein, this strategy was applied to dCas9, but it was found that a fusion to the TetR-family PhlF repressor is uniquely able to recover full activity. This essentially eliminated toxicity, thus allowing up to 9600 proteins per cell without impairing cell health. Promoters were constructed that include the 30 bp PhlF operator and the sgRNA targeting sequence. A set of 30 sgRNAs was constructed and characterized as NOT gates with improved cooperativity (average n=1.6). Finally, the loss in dynamic range of a gate as additional sgRNAs are expressed was measured, and a mathematical model was used to quantify the loss in repression due to resource sharing. This disclosure represents the first step towards harnessing dCas9 to scale up circuit design; however, it also exposes limitations in the use of many regulators that require a shared pool of proteins for activity.
Described herein are novel CRISPR/dCas9-based logic gates that facilitate the scaling up of genetic circuits. These logic gates exhibit non-linear response curves and significantly less toxicity in comparison to previously described CRISPR/Cas9-based logic gates. These improvements enable the production of complex genetic circuits when both digital response curves and large amounts of dCas9 protein are needed. Also described herein are methods of regulating expression of a genetic circuit output sequence through the introduction of novel CRISPR/dCas9-based logic gates into a cell.
Compositions of Synthetic Genetic Circuits and Non-Natural Cells
In one aspect, the components of a synthetic genetic circuit are provided, including a single polynucleotide or a combination of polynucleotides that encode: at least one fusion protein comprising a catalytically-inactive CRISPR/Cas protein fused to a transcription factor, wherein the catalytically-inactive CRISPR/Cas protein comprises a mutated protospacer adjacent motif (PAM) domain (or PAM-interacting domain) and a mutated or absent HNH domain, at least one small guide RNA, and at least one output sequence whose expression is operably linked to an output promoter, wherein the output promoter comprises a transcription factor operator and a cognate promoter comprising an sgRNA target site and, optionally, a PAM site.
In some embodiments, the mutation of the CRISPR/Cas HNH domain consists of the deletion of the entire domain and its replacement by an amino acid linker sequence (e.g., GGSGGS, SEQ ID NO: 127). In some embodiments, the catalytically-inactive CRISPR/Cas protein of a fusion protein possesses a functional RuvC domain. In some embodiments, the catalytically-inactive CRISPR/Cas protein of a fusion protein consists of amino acids 1 to 1368 of Cas9, wherein the Cas9 amino acid sequence contains D10A and R1335K mutations and the Cas9 amino acids 768 to 919 are replaced by a GGSGGS (SEQ ID NO: 127) amino acid linker sequence.
In some embodiments, the catalytically-inactive CRISPR/Cas protein is fused to the transcription factor with a C-terminal polypeptide bond. In other embodiments, the catalytically-inactive CRISPR/Cas protein is fused to the transcription factor with an N-terminal polypeptide bond. In some embodiments, the catalytically-inactive CRISPR/Cas protein and the transcription factor are separated by a linker peptide.
In some embodiments, the transcription factor of a fusion protein represses (or decreases) the expression of the output sequence. In some embodiments, the transcription factor of a fusion protein is PhlF or an ortholog or functional variant thereof. In other embodiments, the transcription factor of a fusion protein is BM3RI or an ortholog or functional variant thereof. In other embodiments, the transcription factor of a fusion protein is a ZFP protein or an ortholog or functional variant thereof. In some embodiments, the transcription factor of a fusion protein activates (or increases) the expression of the output sequence.
In some embodiments, the transcription factor operator and the cognate promoter of the output promoter are on the same DNA strand. In other embodiments, the transcription factor operator and the cognate promoter of the output promoter are on complementary DNA strands. In some embodiments, the transcription factor operator and the cognate promoter of the output promoter are separated by 0 to 20 base pairs.
In some embodiments, the catalytically-inactive CRISPR/Cas protein consists of amino acids 1 to 1368 of Cas9, wherein the Cas9 amino acid sequence contains D10A and R1335K mutations and the Cas9 amino acids 768 to 919 are replaced by a GGSGGS (SEQ ID NO: 127) amino acid linker sequence, the transcription factor is PhlF, the catalytically-inactive CRISPR/Cas protein is fused to PhlF with a C-terminal polypeptide bond, the transcription factor operator of the output promoter is a PhlF operator, and the PhlF operator and the cognate promoter sequence of the output promoter are separated by 0 to 20 base pairs.
In some embodiments, the single polynucleotide or the combination of polynucleotides of a genetic circuit encode: (a) at least one fusion protein comprising a catalytically-inactive CRISPR/Cas protein fused to a transcription factor, wherein the catalytically-inactive CRISPR/Cas protein comprises a mutated PAM domain and a mutated or absent HNH domain; (b) between two and thirty unique sgRNAs, wherein the expression of at least one of the unique sgRNAs is under the control of an inducible promoter; and (c) between one and twenty-nine output sequences, each of whose expression is operably linked to an independent output promoter, wherein at least two of the output promoters comprise a transcription factor operator and a cognate promoter comprising a unique sgRNA target site and, optionally, a PAM site, and wherein: (i) the unique sgRNA target site of each output promoter comprising an sgRNA target site comprises an sgRNA target site of one of the sgRNAs in (b); and (ii) the unique sgRNA target site of at least one of the output promoters comprises the sgRNA target site of the at least one sgRNA under the control of an inducible promoter in (b).
In some embodiments, the genetic circuit is encoded on a single polynucleotide. In some embodiments, the single polynucleotide is a plasmid. In some embodiments, the genetic circuit is encoded on more than one polynucleotide. In some embodiments, at least one of the more than one polynucleotides is a plasmid.
In another aspect, a polynucleotide or combination of polynucleotides is provided. In some embodiments, the polynucleotide or combination of polynucleotides comprise(s) the nucleotide sequence of a genetic circuit described above. Also disclosed herein are compositions comprising the polynucleotide or combination of polynucleotides.
In another aspect, the disclosure relates to non-natural cells comprising a genetic circuit as described above or a polynucleotide or combination of polynucleotides as described above.
Compositions of Fusion Proteins
In another aspect, compositions of fusion proteins are provided, including a catalytically-inactive Cas9 protein linked by a C-terminal polypeptide bond to PhlF, wherein the catalytically-inactive Cas9 protein comprises a mutated PAM domain, a mutated HNH domain, and a functional RuvCI domain, and optionally, the catalytically-inactive Cas9 protein and the PhlF protein are separated by a linker peptide.
In some embodiments, the mutation of the Cas9 HNH domain consists of the deletion of the entire domain and its replacement by an amino acid linker sequence. In some embodiments, the catalytically-inactive Cas9 protein amino acid sequence contains D10A and R1335K mutations and the Cas9 amino acids 768 to 919 are replaced by a GGSGGS (SEQ ID NO: 127) amino acid linker sequence.
Composition of Polynucleotides
In another aspect, compositions of polynucleotides encoding for fusion proteins are provided, including compositions of one or more polynucleotides encoding for any fusion protein encompassed above in “Compositions of Fusion Proteins.”
Methods of Regulating Expression of a Genetic Circuit's Output Sequence
In another aspect, methods of regulating expression of a genetic circuit's output sequence are described, including the introduction of a synthetic genetic circuit into a cell. This aspect embodies the cellular introduction of the synthetic genetic circuit compositions encompassed above in “Compositions of Synthetic Genetic Circuits and Non-Natural Cells.”
These and other aspects are described in more detail below.
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present disclosure, which can be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein. It is to be understood that the data illustrated in the drawings in no way limit the scope of the disclosure.
Large synthetic genetic circuits require the simultaneous expression of many regulators. Deactivated Cas9 (dCas9) can serve as a repressor by having a small guide RNA (sgRNA) direct it to bind a promoter. The programmability and specificity of RNA:DNA basepairing simplifies the generation of many orthogonal sgRNAs that, in theory, could serve as a large set of regulators in a circuit. However, dCas9 is toxic in many bacteria, thus limiting how highly it can be expressed, and low concentrations are quickly sequestered by multiple sgRNAs. Here, a non-toxic version of dCas9 was constructed by eliminating PAM (protospacer adjacent motif) binding with an R1335K mutation (dCas9*) and recovering DNA binding by fusing it to the PhlF repressor (dCas9*_PhlF). Both the 30 bp PhlF operator and 20 bp sgRNA binding site are required to repress a promoter. The larger region required for recognition mitigates toxicity in Escherichia coli, allowing up to 9600±800 molecules of dCas9*_PhlF per cell before growth or morphology is impacted, as compared to 530±40 molecules of dCas9. Further, PhlF multimerization leads to an increase in average cooperativity from n=0.9 (dCas9) to 1.6 (dCas9*_PhlF). A set of 30 orthogonal sgRNA-promoter pairs was characterized as NOT gates; however, the simultaneous use of multiple sgRNAs leads to a monotonic decline in repression and after 15 are co-expressed the dynamic range is <10-fold. This disclosure introduces a non-toxic variant of dCas9, critical for its use in applications in metabolic engineering and synthetic biology, and exposes a limitation in the number of regulators that can be used in one cell when they rely on a shared resource.
In this study, ZFPs as well as TetR-family homologs were fused to a dCas9 variant that has an impaired ability to recognize PAM sites. The corresponding operators for these DNA binding proteins were placed in proximity to the cognate promoters to increase targeting specificity. Among all the tested DNA binding proteins, fusion with PhlF showed the best repression fold change. Importantly, the fused dCas9-DNA binding protein complex showed significantly reduced toxicity when compared to dCas9, and the resulting gates generated non-linear response curves. These improvements will enable complex genetic circuits to be built when both digital response curves and large amounts of dCas9 protein are needed. Given the large number of orthogonal transcription factors and ever increasing Cas9 complexes being identified, this approach will enable even more complex circuits to be constructed in the future. Moreover, this approach is sufficiently general to apply to many CRISPR/Cas proteins.
Described herein are novel CRISPR/dCas9-based logic gates and methods of regulating expression of an output sequence through the introduction of novel CRISPR/dCas9-based logic gates into a cell.
Compositions of Synthetic Genetic Circuits and Non-Natural Cells
In one aspect, the components of a synthetic genetic circuit are provided, including a single polynucleotide or a combination of polynucleotides that encode: at least one fusion protein comprising a catalytically-inactive CRISPR/Cas protein fused to a transcription factor, wherein the catalytically-inactive CRISPR/Cas protein comprises a mutated protospacer adjacent motif (PAM) domain (or PAM-interacting domain) and a mutated or absent HNH domain, at least one small guide RNA (i.e., sgRNA), and at least one output sequence whose expression is operably linked to an output promoter, wherein the output promoter comprises a transcription factor operator and a cognate promoter comprising an sgRNA target site and, optionally, a PAM site.
As used herein, the term “genetic circuit” refers to a controllable gene expression system. The term “synthetic genetic circuit” refers to an engineered, non-natural genetic circuit. Genetic circuits function by changing the flow of RNA polymerase on DNA. In some embodiments, a synthetic genetic circuit functions by increasing the flow of RNA polymerase at one or more locations. In other embodiments, a synthetic genetic circuit functions by decreasing the flow of RNA polymerase at one or more locations. In still other embodiments, a synthetic genetic circuit functions by increasing the flow of RNA polymerase at one or more locations and decreasing the flow of RNA polymerase at one or more locations.
In some embodiments, the fusion protein(s), sgRNA(s), and output sequence(s) of a synthetic genetic circuit (i.e., “the core elements of the synthetic genetic circuit”) are encoded on a single polynucleotide (e.g., on the same backbone). In some embodiments, the core elements of the synthetic genetic circuit are encoded in any combination on multiple, independent polynucleotides. In some embodiments, the ratios of the core elements of the synthetic genetic circuit are equivalent (e.g., one fusion protein, one sgRNA, and one output sequence). In some embodiments, the ratios of the core elements of the synthetic genetic circuit are not equivalent (e.g., two fusion proteins, eight sgRNAs, and fifteen output sequences). In some embodiments, the core elements of the synthetic genetic circuit include multiple copies of the same fusion protein, sgRNA, or output sequence. In other embodiments, the core elements of the synthetic genetic circuit are each unique (e.g., each fusion protein, sgRNA, and output sequence has a unique composition). In some embodiments, the polynucleotide or combination of polynucleotides of a synthetic genetic circuit are in the form of a circular double stranded DNA (e.g., a viral vector or plasmid). In some embodiments, the components are encoded on plasmid p15A. In other embodiments, the polynucleotides or combination of polynucleotides of a synthetic genetic circuit are in the form of linear double stranded DNA (e.g., genomic DNA). In yet other embodiments, a combination of polynucleotides of a synthetic genetic circuit includes at least one polynucleotide that is in the form of circular double stranded DNA and at least one polynucleotide that is in the form of linear double stranded DNA.
The terms “fusion” or “fusion protein” refer to the combination of two or more polypeptides/peptides in a single polypeptide chain. Fusion proteins typically are produced genetically through the in-frame fusing of the nucleotide sequences encoding for each of the said polypeptides/peptides. Expression of the fused coding sequence results in the generation of a single protein without any translational terminator between each of the fused polypeptides/peptides. Alternatively, fusion proteins also can be produced by chemical synthesis.
In some embodiments of the fusion proteins described herein, the catalytically-inactive CRISPR/Cas protein of the fusion protein is fused to the transcription factor with a C-terminal polypeptide bond. In such embodiments, the C-terminal amino acid of the catalytically-inactive CRISPR/Cas protein is fused to the N-terminal amino acid of the transcription factor. In other embodiments, the catalytically-inactive CRISPR/Cas protein is fused to the transcription factor with an N-terminal polypeptide bond. In such embodiments, the N-terminal amino acid of the catalytically-inactive CRISPR/Cas protein is fused to the C-terminal amino acid of the transcription factor.
In some embodiments, the fusion of the catalytically-inactive CRISPR/Cas protein and the transcription factor is direct (i.e., without any additional amino acid residues between the fused polypeptides/peptides). In other embodiments, the catalytically-inactive CRISPR/Cas protein and the transcription factor of a fusion protein are separated by a linker peptide. As used herein, the term “linker peptide” refers to a polypeptide that serves to connect the CRISPR/Cas protein with the transcription factor of a fusion protein. The length of a linker peptide can vary; for example, the length may be as few as one amino acid or more than one hundred amino acids. Non-limiting examples of linker peptides contemplated herein include flexible linkers, such as Gly-Ser linkers. Such linkers can have the formula Glyx-Sery in which x=1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 and y=1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In one specific embodiment, x=4 and y=1, such that the linker formula is Gly4-Ser1 (SEQ ID NO: 129). The Gly-Ser linker can be replicated n number of times [(Glyx-Sery)n], for example, wherein n=1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, or 30. Additional flexible linkers include, e.g., (Gly)6 (SEQ ID NO: 130), (Gly)8 (SEQ ID NO: 131), etc. Additional linkers include rigid linkers (e.g., (EAAAK)3 (SEQ ID NO: 132), A(EAAAK)4ALEA(EAAAK)4A (SEQ ID NO: 133), PAPAP (SEQ ID NO: 134), etc.) and cleavable linkers (e.g., disulfide, VSQTSKLTR↓AETVFPDV (SEQ ID NO: 135), RVL↓AEA (SEQ ID NO: 136); EDVVCC↓SMSY (SEQ ID NO: 137); GGIEGR↓GS (SEQ ID NO: 138); GFLG↓ (SEQ ID NO: 139), etc. (cleavage site marked by “↓”)). Any of the linkers can be naturally-occurring or synthetic.
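The Gly-Ser linker formula and the two fusion orientations described above can be written out as one-letter amino acid strings. The following is a minimal sketch for illustration; the function names are hypothetical, the short strings used for the two fused proteins are placeholders rather than real sequences, and the range check on n is simplified to n ≥ 1.

```python
def gly_ser_linker(x=4, y=1, n=1):
    """Build a (Gly_x-Ser_y)_n linker as a one-letter amino acid string."""
    if not (1 <= x <= 10 and 1 <= y <= 10 and n >= 1):
        raise ValueError("expected 1 <= x <= 10, 1 <= y <= 10, and n >= 1")
    return ("G" * x + "S" * y) * n


def fuse(cas_protein, transcription_factor, linker="", bond="C-terminal"):
    """Join two sequences, optionally separated by a linker peptide.

    bond="C-terminal": the C-terminus of the Cas protein is joined to the
    N-terminus of the transcription factor (Cas protein comes first).
    bond="N-terminal": the transcription factor comes first.
    """
    if bond == "C-terminal":
        parts = [cas_protein, linker, transcription_factor]
    elif bond == "N-terminal":
        parts = [transcription_factor, linker, cas_protein]
    else:
        raise ValueError("bond must be 'C-terminal' or 'N-terminal'")
    return "".join(parts)


# Placeholder sequences ("CAS", "TF") stand in for the real proteins
print(gly_ser_linker(4, 1, 3))
print(fuse("CAS", "TF", linker=gly_ser_linker()))
```

An empty linker corresponds to the direct fusion embodiment.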
As used herein, the term “CRISPR/Cas protein” refers to an RNA-guided DNA endonuclease, including, but not limited to, Cas9, Cpf1, C2c1, and C2c3 and each of their orthologs and functional variants. The amino acid sequence of exemplary Streptococcus pyogenes serotype M1 Cas9 is provided below, which serves as a reference for the Cas9 mutation numbering described herein:
As used herein, the term “functional variants” includes polypeptides which are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to a protein's native amino acid sequence (i.e., wild-type amino acid sequence) and which retain functionality. The term “functional variants” also includes polypeptides which are shorter or longer than a protein's native amino acid sequence by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 40, 50, 75, 100 amino acids or more and which retain functionality. In the context of a CRISPR/Cas protein variant, the term “retain functionality” refers to a variant's ability to bind RNA at least about 5%, 10%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 90%, 100%, or more than 100% as efficiently as the respective non-variant (i.e., wild-type) CRISPR/Cas protein. Methods of measuring and comparing the efficiency of RNA binding are known to those skilled in the art.
The term “catalytically-inactive CRISPR/Cas protein” as used herein refers to a CRISPR/Cas protein variant or mutant that lacks endonuclease activity (i.e., the ability to cleave double stranded DNA). For example, catalytically-inactive Cas9 mutants have been generated through incorporation of various mutations (e.g., D10 mutations) (Jinek et al., Science 337, 816-21 (2012)).
The terms “PAM domain” or “PAM-interacting domain” are used interchangeably herein to refer to a domain of a CRISPR/Cas protein that is responsible for recognition of protospacer adjacent motifs (PAMs or PAM sites). The term “mutated PAM domain” refers to any point mutation, insertion, deletion, frameshift, or missense mutation or any combination of these mutations that decreases a CRISPR/Cas protein's ability to recognize a PAM site by at least 20%, 25%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 90% or up to 100% relative to the respective non-variant (i.e., wild-type) CRISPR/Cas protein. For example, Cas9 R1335 point mutations (e.g., R1335K) decrease Cas9's ability to recognize PAM sites. Methods of measuring and comparing PAM recognition are known to those skilled in the art.
The term “HNH domain” refers to a protein endonuclease domain. The term “mutated HNH domain” refers to any point mutation, insertion, deletion, frameshift, or missense mutation or any combination of these mutations to a CRISPR/Cas protein's HNH domain. For example, in some embodiments, the mutation of the CRISPR/Cas HNH domain consists of the deletion of the entire domain and its replacement by an amino acid linker sequence (e.g., GGSGGS, SEQ ID NO: 127). As used herein, the term “amino acid linker sequence” refers to a polypeptide that serves to replace the HNH domain of a CRISPR/Cas protein. The length of an amino acid linker can vary; for example, the length of an amino acid linker may be as few as one amino acid or more than one hundred amino acids. The term “absent,” in the context of an HNH domain, refers to and encompasses CRISPR/Cas proteins that inherently lack an HNH domain (e.g., Cpf1, C2c1, and C2c3).
In some embodiments, the catalytically-inactive CRISPR/Cas protein of a fusion protein possesses a functional RuvC domain. The term “RuvC domain” refers to a protein endonuclease domain. “Possesses a functional RuvC domain” refers to a native or wild-type RuvC domain, or any mutation thereof, that retains the catalytically-inactive CRISPR/Cas protein's ability to regulate the expression of an output promoter. In some embodiments, the catalytically-inactive CRISPR/Cas protein of a fusion protein possesses a native or wild-type RuvC domain. The terms “native RuvC domain” or “wild-type RuvC domain” refer to an RuvC domain composed entirely of an amino acid sequence that is found in nature.
In some embodiments, the catalytically-inactive CRISPR/Cas protein of a fusion protein consists of amino acids 1 to 1368 of Cas9, wherein the Cas9 amino acid sequence contains D10A and R1335K mutations and the Cas9 amino acids 768 to 919 are replaced by a GGSGGS (SEQ ID NO: 127) amino acid linker sequence.
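For illustration, the sequence edits recited in this embodiment (D10A, R1335K, and replacement of amino acids 768 to 919 with GGSGGS) can be expressed as operations on a one-letter amino acid string using the 1-based numbering of the reference Cas9 sequence. The following is a minimal sketch; the function name is hypothetical, and the input used in the example is a synthetic placeholder of the correct length rather than the actual Cas9 sequence.

```python
HNH_START, HNH_END = 768, 919  # 1-based, inclusive, per the embodiment above
LINKER = "GGSGGS"


def make_dcas9_variant(cas9, linker=LINKER):
    """Apply D10A and R1335K, then swap residues 768-919 for the linker."""
    if len(cas9) < 1368:
        raise ValueError("expected at least 1368 residues")
    if cas9[9] != "D" or cas9[1334] != "R":
        raise ValueError("reference residues D10 and R1335 not found")
    residues = list(cas9[:1368])   # amino acids 1 to 1368
    residues[9] = "A"              # D10A (1-based position 10)
    residues[1334] = "K"           # R1335K (1-based position 1335)
    before_hnh = "".join(residues[:HNH_START - 1])
    after_hnh = "".join(residues[HNH_END:])
    return before_hnh + linker + after_hnh


# Synthetic placeholder: D at position 10, R at position 1335, length 1368
placeholder = "M" * 9 + "D" + "A" * 1324 + "R" + "A" * 33
variant = make_dcas9_variant(placeholder)
print(len(variant))
```

Removing 152 residues and inserting a 6-residue linker shortens the protein by 146 residues, so positions downstream of the deletion shift accordingly in the variant's own numbering.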
Cas9 orthologs have been described in various species, including, but not limited to Bacteroides coprophilus (e.g., NCBI Reference Sequence: WP_008144470.1), Campylobacter jejuni subsp. jejuni (e.g., GenBank: AJP35933.1), Campylobacter lari (e.g., GenBank: AJD02827.1), Francisella novicida (e.g., UniProtKB/Swiss-Prot: A0Q5Y3.1), Filifactor alocis (e.g., NCBI Reference Sequence: WP_083799662.1), Flavobacterium columnare (e.g., GenBank: AMA50561.1), Fluviicola taffensis (e.g., NCBI Reference Sequence: WP_013687888.1), Gluconacetobacter diazotrophicus (e.g., NCBI Reference Sequence: WP_041249387.1), Lactobacillus farciminis (e.g., NCBI Reference Sequence: WP_010018949.1), Lactobacillus johnsonii (e.g., GenBank: KXN76786.1), Legionella pneumophila (e.g., NCBI Reference Sequence: WP_062726656.1), Mycoplasma gallisepticum (e.g., NCBI Reference Sequence: WP_011883478.1), Mycoplasma mobile (e.g., NCBI Reference Sequence: WP_041362727.1), Neisseria cinerea (e.g., NCBI Reference Sequence: WP_003676410.1), Neisseria meningitidis (e.g., GenBank: ODP42304.1), Nitratifractor salsuginis (e.g., NCBI Reference Sequence: WP_083799866.1), Parvibaculum lavamentivorans (e.g., NCBI Reference Sequence: WP_011995013.1), Pasteurella multocida (e.g., GenBank: KUM14477.1), Sphaerochaeta globusa (e.g., NCBI Reference Sequence: WP_013607849.1), Streptococcus pasteurianus (e.g., NCBI Reference Sequence: WP_061100419.1), Streptococcus thermophilus (e.g., GenBank: ANJ62426.1), Sutterella wadsworthensis (e.g., NCBI Reference Sequence: WP_005430658.1), and Treponema denticola (e.g., NCBI Reference Sequence: WP_002684945.1).
In some embodiments, “Cas9” refers to any one of the Cas9 orthologs described herein, including functional variants thereof or suitable Cas9 endonucleases and sequences that are apparent to those of ordinary skill in the art.
The term “transcription factor” refers to any polypeptide that is capable of binding DNA and that, when bound, regulates output gene expression. “Regulates output gene expression” refers to a change (increase or decrease) of at least 5%, 10%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 90%, 100%, 500%, 1000%, 10,000% or more than 10,000% in the level of output gene expression relative to the level of expression in the absence of the transcription factor. Methods of measuring and comparing gene expression are known to those skilled in the art. In some embodiments, the transcription factor activates or increases a genetic circuit's output gene expression. In other embodiments, the transcription factor represses or decreases a genetic circuit's output gene expression.
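The percentage thresholds in this definition can be made concrete with a short Python sketch. The function names and the expression values are hypothetical; the 5% default is simply the lowest threshold listed above, and expression levels are assumed to be positive numbers in arbitrary units.

```python
def percent_change(with_tf, without_tf):
    """Magnitude of the change in output expression, as a percentage of the
    level measured in the absence of the transcription factor."""
    if without_tf <= 0:
        raise ValueError("reference expression must be positive")
    return 100.0 * abs(with_tf - without_tf) / without_tf


def regulates(with_tf, without_tf, threshold_percent=5.0):
    """True if the change (increase or decrease) meets the chosen threshold."""
    return percent_change(with_tf, without_tf) >= threshold_percent


# Hypothetical fluorescence values in arbitrary units.
strong_repression = percent_change(100.0, 1000.0)   # a 90% decrease
weak_effect = regulates(980.0, 1000.0)               # a 2% change, below 5%
```

Because the absolute difference is used, an activator and a repressor of equal relative strength give the same percentage; the direction of the change would need to be tracked separately.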
In some embodiments of the fusion proteins described herein, the transcription factor of a fusion protein is PhlF or an ortholog or functional variant thereof. In other embodiments, the transcription factor of a fusion protein is BM3RI or an ortholog or functional variant thereof. In other embodiments, the transcription factor of a fusion protein is a ZFP protein or an ortholog or functional variant thereof. In the context of a PhlF, BM3RI, or ZFP protein variant, the term “retain functionality” refers to a variant's ability to repress (or decrease) gene expression at least about 5%, 10%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 90%, 100%, or more than 100% as efficiently as the respective wild-type protein. Methods of measuring and comparing gene expression are known to those skilled in the art.
As used herein, the terms “small guide RNA” or “sgRNA” refer to a nucleic acid molecule that has a sequence that complements an sgRNA target site, which mediates binding of the CRISPR/Cas-RNA complex to the sgRNA target site, providing the specificity of the CRISPR/Cas-RNA complex. Typically, guide RNAs that exist as single RNA species comprise two domains: (1) a “guide” domain that shares homology to a target nucleic acid (e.g., directs binding of a CRISPR/Cas complex to a target site); and (2) a “direct repeat” domain that binds a CRISPR/Cas protein. In this way, the sequence and length of a small guide RNA may vary depending on the specific sgRNA target site and/or the specific CRISPR/Cas protein (Zetsche et al. Cell 163, 759-71 (2015)).
In some embodiments, a genetic circuit comprises a single sgRNA. In other embodiments, a genetic circuit comprises two unique sgRNAs, wherein both sgRNAs can be fully expressed and independently repress two promoters without incurring significant negative effects on repression due to resource sharing (e.g., insufficient dCas9-fusion protein). In some embodiments, a genetic circuit comprises more than two unique sgRNAs (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more than 30 unique sgRNAs).
The term “output sequence” as used herein refers to an expressible nucleotide sequence that is operably linked to an output promoter of a synthetic genetic circuit. In some embodiments, the expressible nucleotide sequence of an output sequence comprises the nucleotide sequence of a non-coding RNA (e.g., a tRNA, rRNA, miRNA, siRNA, shRNA, sgRNA, piRNA, snoRNA, snRNA, exRNA, scaRNA, tracrRNA, lncRNA, riboswitch, or ribozyme). In some embodiments, the expressible nucleotide sequence of an output sequence comprises the nucleotide sequence of an RNA that encodes for a protein product (i.e., a mRNA). In some embodiments, the protein product is a therapeutic protein. In some embodiments, the protein product is a detectable protein, such as a fluorescent protein. In some embodiments, a genetic circuit comprises more than two unique output sequences (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more than 30 unique output sequences).
The term “operably linked” as used herein refers to a relationship between an output promoter and an output sequence wherein the position of the output promoter relative to the output sequence is such that the output promoter is able to influence the expression of the output sequence. The term “influence the expression” refers to output sequence expression level changes (increases or decreases) of at least 5%, 10%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 90%, 100%, 500%, 1000%, 10,000% or more than 10,000% relative to output sequence expression levels in the absence of the output promoter. Methods of measuring and comparing promoter functionality are known to those skilled in the art.
As used herein, the term “transcription factor operator” refers to the DNA sequence that a transcription factor binds to; for example, the PhlF operator is the DNA sequence that PhlF binds to. In some embodiments, the transcription factor operator is positioned 3′ to the cognate promoter. In other embodiments, the transcription factor operator is positioned 5′ to the cognate promoter. In some embodiments, the transcription factor operator and the cognate promoter are oriented on the same DNA strand. In other embodiments, the transcription factor operator and the cognate promoter are oriented on complementary DNA strands. In some embodiments, the transcription factor operator and the cognate promoter sequence are separated by 0 to 20 base pairs.
As used herein, the term “cognate promoter” refers to a DNA sequence that interacts with a CRISPR/Cas complex. In some embodiments, the cognate promoter consists of an sgRNA target site. The term “sgRNA target” refers to a sequence that is complementary to a CRISPR/Cas protein's complexed sgRNA. In some embodiments, the sgRNA target site of at least one of the output promoters comprises the sgRNA target site of at least one sgRNA whose expression is under the control of an inducible promoter. Examples of inducible promoters are known to those having skill in the art. In some embodiments, an inducible promoter is a chemically inducible promoter (e.g., pTet, pTac, or pVan), a temperature inducible promoter, or a light inducible promoter. In some embodiments, the inducer of an inducible promoter is a small molecule (e.g., aTc, IPTG, or vanillic acid). In other embodiments, the inducer is a large molecule (e.g., a protein or non-coding RNA).
In some embodiments, the cognate promoter comprises an sgRNA target site and a PAM site. The terms “PAM” and “PAM site” are used interchangeably herein to refer to a short nucleotide sequence, generally 2-6 base pairs in length, that is recognized by a CRISPR/Cas protein; for example, Cas9 primarily recognizes NGG elements as PAM sites, though it has been shown that it can also inefficiently recognize other PAM sites (e.g., NAG or NGA) (Zhang et al., Sci. Rep. 4, 1-5 (2014); Hsu et al., Nat. Biotechnol. 31, 827-32 (2013)). PAM sites can vary between CRISPR/Cas proteins and each protein's species of origin. In some embodiments, the cognate promoter lacks a PAM site.
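To illustrate the relationship between an sgRNA target site and an NGG PAM site, the following Python sketch scans one strand of a DNA string for candidate target sites that are immediately followed by NGG. The function name `find_ngg_sites`, the example sequence, and the 20-nucleotide target length are assumptions made for illustration; the text above does not fix a target length, and the non-canonical PAMs (NAG, NGA) are not handled.

```python
def find_ngg_sites(dna, guide_len=20):
    """Return (start, target, pam) for each guide_len-nt site followed by NGG.

    Only the given strand is scanned; start is a 0-based index.
    guide_len=20 is an assumed, commonly used value.
    """
    dna = dna.upper()
    sites = []
    # The last valid start leaves room for the target plus a 3-nt PAM.
    for start in range(len(dna) - guide_len - 2):
        pam = dna[start + guide_len:start + guide_len + 3]
        if pam[1:] == "GG":  # N can be any base
            sites.append((start, dna[start:start + guide_len], pam))
    return sites


# Hypothetical 27-nt sequence containing a single TGG PAM.
example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
sites = find_ngg_sites(example)
```

A fuller search would also scan the reverse complement, since a target site and its PAM can lie on either strand.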
In some embodiments, the transcription factor operator and the cognate promoter of the output promoter are on the same DNA strand. In other embodiments, the transcription factor operator and the cognate promoter of the output promoter are on complementary DNA strands. In some embodiments, the transcription factor operator and the cognate promoter of the output promoter are separated by 0 to 20 base pairs.
In some embodiments, the output promoter also comprises minimal gene promoter elements. In some embodiments, these minimal gene promoter elements provide for basal or constitutive expression of an output sequence which can be activated or repressed by the binding of a fusion protein to the output promoter.
In some embodiments, the catalytically-inactive CRISPR/Cas protein consists of amino acids 1 to 1368 of Cas9, wherein the Cas9 amino acid sequence contains D10A and R1335K mutations and the Cas9 amino acids 768 to 919 are replaced by a GGSGGS (SEQ ID NO: 127) amino acid linker sequence, the transcription factor is PhlF, the catalytically-inactive CRISPR/Cas protein is fused to PhlF with a C-terminal polypeptide bond, the transcription factor operator of the output promoter is a PhlF operator, and the PhlF operator and the cognate promoter sequence of the output promoter are separated by 0 to 20 base pairs.
In some embodiments, the single polynucleotide or the combination of polynucleotides of a genetic circuit encode: (a) at least one fusion protein comprising a catalytically-inactive CRISPR/Cas protein fused to a transcription factor, wherein the catalytically-inactive CRISPR/Cas protein comprises a mutated PAM domain and a mutated or absent HNH domain; (b) between two and thirty unique sgRNAs, wherein the expression of at least one of the unique sgRNAs is under the control of an inducible promoter; and (c) between one and twenty-nine output sequences, each of whose expression is operably linked to an independent output promoter, wherein at least two of the output promoters comprise a transcription factor operator and a cognate promoter comprising a unique sgRNA target site and, optionally, a PAM site, and wherein: (i) the unique sgRNA target site of each output promoter comprising an sgRNA target site comprises an sgRNA target site of one of the sgRNAs in (b); and (ii) the unique sgRNA target site of at least one of the output promoters comprises the sgRNA target site of the at least one sgRNA under the control of an inducible promoter in (b).
In some embodiments, the genetic circuit is encoded on a single polynucleotide. In some embodiments, the single polynucleotide is a plasmid. In some embodiments, the genetic circuit is encoded on more than one polynucleotide. In some embodiments, at least one of the more than one polynucleotide is a plasmid.
In another aspect, a polynucleotide or combination of polynucleotides is provided. In some embodiments, the polynucleotide or combination of polynucleotides comprise(s) the nucleotide sequence of a genetic circuit described above. Also disclosed herein are compositions comprising the polynucleotide or combination of polynucleotides.
In another aspect, the disclosure relates to non-natural cells comprising a genetic circuit as described above or a polynucleotide or combination of polynucleotides as described above. The term “non-natural cell,” as used herein, refers to a cell that has been engineered to be different from its natural counterpart or the cell from which it is derived.
In some embodiments, a non-natural cell comprises a genetic circuit that comprises at least one output promoter comprising an sgRNA target site of at least one sgRNA whose expression is under the control of an inducible promoter. In some embodiments, the source of the inducer of the inducible promoter is outside of the cell (e.g., a small molecule inducer, such as aTc, IPTG, or vanillic acid). In other embodiments, the source of the inducer of the inducible promoter is within the cell. For example, the non-natural cell may respond to an external or internal stimulus via the production of a molecule (e.g., a protein, non-coding RNA, etc.) that is the inducer of the inducible promoter.
Compositions of Fusion Proteins
In another aspect, compositions of fusion proteins are provided, including a catalytically-inactive Cas9 protein linked by a C-terminal polypeptide bond to PhlF, wherein the catalytically-inactive Cas9 protein comprises a mutated PAM domain, a mutated HNH domain, and a functional RuvCI domain, and optionally, the catalytically-inactive Cas9 protein and the PhlF protein are separated by a linker peptide. Relevant definitions and term usages described in “Components of a Synthetic Circuit” above apply to this section, as well.
In some embodiments, the mutation of the Cas9 HNH domain consists of the deletion of the entire domain and its replacement by an amino acid linker sequence. In some embodiments, the catalytically-inactive Cas9 protein amino acid sequence contains D10A and R1335K mutations and the Cas9 amino acids 768 to 919 are replaced by a GGSGGS (SEQ ID NO: 127) amino acid linker sequence.
The compositions of fusion proteins have, in some embodiments, a single type of fusion protein (i.e., all the fusion proteins in the composition have the same amino acid sequence). In other embodiments, however, the fusion protein compositions include two or more types of fusion proteins (i.e., a “cocktail” of fusion proteins). For example, fusion proteins of a composition may include fusion proteins that have: (1) catalytically-inactive Cas9 proteins and/or PhlF transcription factors from different species; (2) catalytically-inactive Cas9 proteins of the same species that have different mutations and/or amino acid linker sequences; (3) PhlF transcription factors of the same species that have different mutations; and/or (4) different linker peptide sequences.
In some embodiments, the fusion proteins in a fusion protein composition may include non-canonical or modified amino acids (e.g., amino acids modified by phosphorylation, methylation, acetylation, amidation, isomerization, hydroxylation, sulfonation, or cysteine oxidation or nitrosylation).
In some embodiments, the compositions also comprise an sgRNA or a combination of sgRNAs that can be bound by the fusion proteins of the composition. In some embodiments, the compositions include diluents of various buffer content (e.g., Tris-HCl, Tris Base, acetate, phosphate), pH, and ionic strength; additives such as detergents and solubilizing agents (e.g., Triton X-100, Tween 80, Polysorbate 80); anti-oxidants (e.g., DTT, ascorbic acid, sodium metabisulfite); preservatives (e.g., thimerosal, benzyl alcohol, sodium azide); and stabilizers (e.g., glycerol, mannitol, trehalose). In some embodiments, the protein compositions are incorporated into particulate preparations of polymeric compounds (e.g., polylactic acid, polyglycolic acid, etc.) or into liposomes.
In some embodiments, the compositions are provided in a dry, solid form (e.g., lyophilized compositions). In other embodiments, the compositions are provided in a liquid form. In some embodiments, the compositions are frozen. In some embodiments, the fusion compositions include packaging material and a container, wherein the packaging material comprises a label that indicates how the composition can be stored over various periods of time and the conditions under which the composition may be used.
Composition of Polynucleotides
In another aspect, compositions of polynucleotides encoding for fusion proteins are provided, including compositions of a polynucleotide encoding for any fusion protein encompassed above in “Compositions of Fusion Proteins.” For example, in some embodiments, a polynucleotide encodes for a catalytically-inactive Cas9 protein linked by a C-terminal polypeptide bond to PhlF, wherein the catalytically-inactive Cas9 protein comprises a mutated PAM domain, a mutated HNH domain, and a functional RuvCI domain, and optionally, the catalytically-inactive Cas9 protein and the PhlF protein are separated by a linker peptide.
The polynucleotide compositions have, in some embodiments, a single type of polynucleotide (i.e., each polynucleotide in the composition consists of the same nucleic acid sequence). In other embodiments, however, the polynucleotide compositions include two or more types of polynucleotides (i.e., a “cocktail” of polynucleotides). For example, polynucleotides of a composition may include polynucleotides that encode for: (1) catalytically-inactive Cas9 proteins and/or PhlF transcription factors from different species; (2) catalytically-inactive Cas9 proteins of the same species that have different mutations and/or amino acid linker sequences; (3) PhlF transcription factors of the same species that have different mutations; and/or (4) different linker peptide sequences. In some embodiments, the polynucleotides that encode for the fusion proteins also encode for one or more sgRNAs and/or one or more output sequences whose expression is operably linked to an output promoter, wherein the output promoter comprises a transcription factor operator and a cognate promoter comprising an sgRNA target site and, optionally, a PAM site. In some embodiments, the composition of polynucleotides includes additional, independent polynucleotides that encode for one or more sgRNAs and/or one or more output sequences whose expression is operably linked to an output promoter, wherein the output promoter comprises a transcription factor operator and a cognate promoter comprising an sgRNA target site and, optionally, a PAM site.
In some embodiments, the polynucleotide composition may include non-canonical nucleotides such as inosine, thiouridine, or pseudouridine. In some embodiments, the polynucleotide composition may include chemically modified nucleotides. Examples of chemically modified oligonucleotides or polynucleotides are well known in the art. For example, the naturally occurring phosphodiester backbone of an oligonucleotide or polynucleotide can be partially or completely modified with phosphorothioate, phosphorodithioate, or methylphosphonate internucleotide linkage modifications, modified nucleoside bases or modified sugars can be used in oligonucleotide or polynucleotide synthesis, and oligonucleotides or polynucleotides can be labelled with a fluorescent moiety (e.g., fluorescein or rhodamine) or other label (e.g., biotin).
In some embodiments, the compositions also comprise an sgRNA or a combination of sgRNAs. In some embodiments, the compositions include diluents of various buffer content (e.g., Tris-HCl, Tris Base, acetate, phosphate), pH and ionic strength. In some embodiments, the polynucleotide compositions are incorporated into particulate preparations of polymeric compounds (e.g., polylactic acid, polyglycolic acid, etc.) or into liposomes.
In some embodiments, the compositions of polynucleotides are in a dry, solid form (e.g., lyophilized compositions). In other embodiments, the compositions of polynucleotides are in liquid form. In some embodiments, the compositions of polynucleotides are frozen. In some embodiments, the compositions of polynucleotides include packaging material and a container, wherein the packaging material comprises a label that indicates how the composition can be stored over various periods of time and the conditions under which the composition may be used.
Methods of Regulating Expression of a Genetic Circuit's Output Sequence
In another aspect, methods of regulating expression of a genetic circuit's output sequence are described, including the introduction of a synthetic genetic circuit into a cell. This aspect embodies the cellular introduction of the synthetic genetic circuit compositions encompassed above in “Components of a Synthetic Circuit.”
As used herein, the term “introducing the genetic circuit” refers to any mechanism whereby a polynucleotide or combination of polynucleotides can be transferred from a cell's exterior to that cell's interior, in which the cell remains viable. Methods of introducing polynucleotides into a cell are known to those of ordinary skill in the art and include, but are not limited to, electroporation, transfection (e.g., heat-shock-mediated transfection, laser transfection, lipofectamine-mediated transfection, liposomal transfection), transformation, microinjection, nuclear injection, biolistics, gene guns, gene therapy, and gene transfer.
“Cell” as used herein may refer to a prokaryotic cell, a eukaryotic cell, or a synthetic cell (i.e., a minimal cell or an artificial cell). “Prokaryotic cells” include bacteria and archaea. In some embodiments the prokaryotic cell is a bacterium of a phylum selected from Actinobacteria, Aquificae, Armatimonadetes, Bacteroidetes, Caldiserica, Chlamydiae, Chloroflexi, Chrysiogenetes, Cyanobacteria, Deferribacteres, Deinococcus-Thermus, Dictyoglomi, Elusimicrobia, Fibrobacteres, Firmicutes, Fusobacteria, Gemmatimonadetes, Nitrospirae, Planctomycetes, Proteobacteria, Spirochaetes, Synergistetes, Tenericutes, Thermodesulfobacteria, and Thermotogae. In some embodiments the prokaryotic cell is an archaeon of a phylum selected from Euryarchaeota, Crenarchaeota, Nanoarchaeota, Thaumarchaeota, Aigarchaeota, Lokiarchaeota, Thermotogae, and Tenericutes. In some embodiments the eukaryotic cell is a member of a kingdom selected from Protista, Fungi, Plantae, or Animalia. In some embodiments the cell is a bacterial cell, such as Escherichia spp., Streptomyces spp., Zymomonas spp., Acetobacter spp., Citrobacter spp., Synechocystis spp., Rhizobium spp., Clostridium spp., Corynebacterium spp., Streptococcus spp., Xanthomonas spp., Lactobacillus spp., Lactococcus spp., Bacillus spp., Alcaligenes spp., Pseudomonas spp., Aeromonas spp., Azotobacter spp., Comamonas spp., Mycobacterium spp., Rhodococcus spp., Gluconobacter spp., Ralstonia spp., Acidithiobacillus spp., Microlunatus spp., Geobacter spp., Geobacillus spp., Arthrobacter spp., Flavobacterium spp., Serratia spp., Saccharopolyspora spp., Thermus spp., Stenotrophomonas spp., Chromobacterium spp., Sinorhizobium spp., Agrobacterium spp., and Pantoea spp. The bacterial cell can be a Gram-negative cell such as an Escherichia coli (E. coli) cell, or a Gram-positive cell such as a species of Bacillus.
In other embodiments the cell is an archaeal cell, such as Methanosphaera spp., Methanothermus spp., Methanomicrobium spp., Methanohalobium spp., Methanimicrococcus spp., Methanocalculus spp., Haloferax spp., Halobacterium spp., Halococcus spp., Halorubrum spp., Haloterrigena spp., Thermoplasma spp., Thermoproteus spp., Chaetomium spp., Thermomyces spp., Brevibacillus spp., and Sulfolobus spp. In other embodiments, the cell is a fungal cell such as a yeast cell, e.g., Saccharomyces spp., Schizosaccharomyces spp., Pichia spp., Phaffia spp., Kluyveromyces spp., Candida spp., Talaromyces spp., Brettanomyces spp., Pachysolen spp., Debaryomyces spp., Yarrowia spp., and industrial polyploid yeast strains. Preferably the yeast strain is an S. cerevisiae strain or a Yarrowia spp. strain. Other examples of fungi include Aspergillus spp., Penicillium spp., Fusarium spp., Rhizopus spp., Acremonium spp., Neurospora spp., Sordaria spp., Magnaporthe spp., Allomyces spp., Ustilago spp., Botrytis spp., and Trichoderma spp. In other embodiments, the cell is a mammalian cell, an algal cell, or a plant cell. As used herein, “synthetic cell” refers to an engineered cell that mimics one or more functions or structures of a biological cell. In some embodiments, the cell exists independent of other cells (i.e., is single cellular). In other embodiments the cell exists as part of a multicellular organism (e.g., part of a tissue or organ). For example, a cell may be located in a transgenic animal or transgenic plant.
A relevant synthetic genetic circuit that can be introduced into a cell may comprise a single layer input gate. For example, in some embodiments, a genetic circuit may comprise a fusion protein (e.g., a catalytically-inactive Cas9 protein linked by a C-terminal polypeptide bond to PhlF) whose expression is controlled by an inducible promoter (e.g., aTc inducible pTet promoter), an sgRNA whose expression is controlled by a different inducible input promoter (e.g., IPTG inducible pTac promoter), an output promoter that is targeted by fusion protein-sgRNA complexes, and a gene controlled by the output promoter. In some embodiments, these parts are integrated on the same backbone (e.g., p15A) to avoid plasmid variation. Expression and production of the fusion protein and the sgRNA can be stimulated via cellular administration of the appropriate inducers. The fusion proteins and sgRNAs that are produced then form complexes that target the output promoter. The interaction between a fusion protein-sgRNA complex and an output promoter (i.e., the interaction between the transcription factor of the fusion protein with its operator and the interaction between the catalytically-inactive CRISPR/Cas protein of the fusion protein with the sgRNA and the cognate promoter) results in the regulation (i.e., an increase or decrease) of the output gene's expression levels.
A synthetic genetic circuit may also comprise multiple layers. For example, in some embodiments, a genetic circuit with two layers may comprise a fusion protein (e.g., a catalytically-inactive Cas9 protein linked by a C-terminal polypeptide bond to PhlF) whose expression is controlled by an inducible promoter (e.g., aTc inducible pTet promoter), an sgRNA(a) whose expression is controlled by a different inducible input promoter (e.g., vanillic acid inducible pVanR promoter), an output promoter(a) that is targeted and repressed by fusion protein-sgRNA(a) complexes, an sgRNA(b) whose expression is controlled by the output promoter(a), an output promoter(b) that is targeted and repressed by fusion protein-sgRNA(b) complexes, and an output gene whose expression is controlled by the output promoter(b). In some embodiments, these parts may be integrated on the same backbone to avoid plasmid variation. Expression and production of the fusion protein and the sgRNA(a) can be stimulated via cellular administration of the appropriate inducer. The fusion proteins and sgRNA(a)s that are produced then form complexes that target and repress the output promoter(a). The interaction between a fusion protein-sgRNA(a) complex and an output promoter(a) results in repression of sgRNA(b) expression levels. Because sgRNA(b) expression is repressed, fewer fusion protein-sgRNA(b) complexes interact with and repress output promoter(b). Thus, the output gene's expression levels increase.
In another example, a synthetic genetic circuit with three layers may comprise, in some embodiments, a fusion protein whose expression is controlled by an inducible promoter, an sgRNA(a) whose expression is controlled by a different inducible input promoter, an output promoter(a) that is targeted and repressed by fusion protein-sgRNA(a) complexes, an sgRNA(b) whose expression is controlled by the output promoter(a), an output promoter(b) that is targeted and repressed by fusion protein-sgRNA(b) complexes, an sgRNA(c) whose expression is controlled by the output promoter(b), an output promoter(c) that is targeted and repressed by fusion protein-sgRNA(c) complexes, and an output gene whose expression is controlled by the output promoter(c). In some embodiments, these parts are integrated on the same backbone to avoid plasmid variation. Expression and production of the fusion protein and the sgRNA(a) can be stimulated via cellular administration of the appropriate inducer. The fusion proteins and the sgRNA(a)s that are produced then form complexes that target and repress the output promoter(a). The interaction between a fusion protein-sgRNA(a) complex and an output promoter(a) results in repression of sgRNA(b) expression levels. Because sgRNA(b) expression is repressed, fewer fusion protein-sgRNA(b) complexes interact with and repress output promoter(b). Thus, expression of sgRNA(c) increases. The interaction between a fusion protein-sgRNA(c) complex and an output promoter(c) results in repression of the output gene's expression levels.
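The layered repression logic described in the preceding examples can be summarized with a simple Boolean sketch in Python. This is an idealized model, not a description of measured behavior: it assumes the fusion protein is always available, that each sgRNA and promoter is either fully ON or fully OFF, and that every layer acts as a repressor. The function name `cascade_output` is hypothetical.

```python
def cascade_output(inducer_present, layers):
    """Boolean sketch of a repression cascade with `layers` output promoters.

    sgRNA(a) is ON when its inducer is present. Each output promoter is
    repressed when the sgRNA that targets it is ON, so every layer inverts
    the signal once.
    """
    if layers < 1:
        raise ValueError("need at least one layer")
    signal = inducer_present      # state of sgRNA(a)
    for _ in range(layers):
        signal = not signal       # a promoter is ON only if its sgRNA is OFF
    return signal                 # state of the final output gene


# With the inducer of sgRNA(a) present:
two_layer = cascade_output(True, 2)    # output gene is expressed
three_layer = cascade_output(True, 3)  # output gene is repressed
```

In this idealization, a cascade with an even number of layers behaves like a buffer of the input and one with an odd number of layers behaves like a NOT gate.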
In other embodiments, a synthetic genetic circuit comprises four or more layers. The complexity and diversity of the synthetic genetic circuits embodied herein can be selected as needed for particular tasks and outcomes. For example, in some embodiments, a multilayer synthetic genetic circuit comprises multiple input gates.
While the cellular concentrations of the components utilized in this method (e.g., the polynucleotides, the fusion proteins generated via translation of RNAs produced from the polynucleotides, and sgRNAs generated via transcription of the polynucleotides) may vary, the methods can utilize any effective amount of the components. “Any effective amount of the components” refers to any amount that, when combined, results in the regulation of output gene expression or the change (increase or decrease) of at least 5%, 10%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 90%, 100%, 500%, 1000%, 10,000% or more than 10,000% in the level of output gene expression relative to the level of expression in the absence of the combination of components. For example, in some embodiments, the cellular concentration of fused dCas9*-PhlF complex is about 5000 molecules per cell.
Methods and Materials
Strains and media.
All cloning was performed in Escherichia coli NEB 10-beta (New England Biolabs, # C3019) and cells were grown in LB Miller broth (Difco, MI, #90003-350). The measurement experiments were performed in E. coli K-12 MG1655* [F- λ- ilvG- rfb-50 rph-1 Δ(araCBAD) Δ(LacI)] (Nielsen A. A. and Voigt C. A., Mol. Syst. Biol., 2014. 10(763): 1-11; Blattner F. R., et al., Science, 1997 Sep. 5; 277(5331): 1453-62), and MOPS EZ Rich Defined Medium (Teknova, # M2105) with 0.2% glucose (Thermo Fisher Scientific, #156129) as the carbon source was used for cell growth. Ampicillin (100 μg/ml, GoldBio, # A-301-5), kanamycin (50 μg/ml, GoldBio, # K-120-5), and spectinomycin sulfate (50 μg/ml, GoldBio, # S-140-5) were used to maintain plasmids when appropriate.
Induction assays.
Individual colonies were inoculated into 150 μl MOPS EZ Rich Defined Medium with appropriate antibiotics and then grown overnight (˜16 hours) in 96-well plates (Nunc, Roskilde, Denmark, #249952) at 1,000 rpm and 37° C. on a plate shaker (ELMI, # DTS-4). Cultures were diluted 1000-fold by adding 2 μl of culture to 198 μl media, and then 15 μl of that dilution to 135 μl media, and grown with the same shaking condition for 3 hours. At this point, cells were diluted 3000-fold by adding 2 μl of culture to 198 μl media, and then 5 μl of that dilution to 145 μl media with inducers and antibiotics as needed, and then were grown under the same conditions for 6 hours.
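The fold-dilutions stated above follow from the transfer volumes, as the short Python check below shows. The function name `dilution_factor` is hypothetical; each step is given as a (sample volume, diluent volume) pair in μl taken from the protocol text.

```python
from fractions import Fraction


def dilution_factor(steps):
    """Overall fold-dilution for a series of (sample_ul, diluent_ul) transfers."""
    factor = Fraction(1)
    for sample_ul, diluent_ul in steps:
        # Each step dilutes by (total volume) / (sample volume).
        factor *= Fraction(sample_ul + diluent_ul, sample_ul)
    return factor


# 2 ul into 198 ul (100-fold), then 15 ul into 135 ul (10-fold).
first = dilution_factor([(2, 198), (15, 135)])
# 2 ul into 198 ul (100-fold), then 5 ul into 145 ul (30-fold).
second = dilution_factor([(2, 198), (5, 145)])
print(first, second)  # 1000 3000
```

Exact fractions are used so that the result is not affected by floating-point rounding.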
Flow cytometry analyses.
Aliquots of 40 μl of media containing cells were collected and added to 160 μl phosphate-buffered saline with 1 mg/ml kanamycin to stop translation and arrest cell growth. The LSRII Fortessa flow cytometer (BD Biosciences, San Jose, Calif.) was used to quantify the fluorescent protein production. The software FlowJo v10 (TreeStar, Inc., Ashland, Oreg.) was used to gate the events by forward and side scatter, and at least 10,000 events were collected for each sample. The geometric mean of each sample was calculated. The autofluorescence of white cells was subtracted, defined as the geometric mean of a strain harboring an empty backbone (pSZ_Backbone,
Growth Assay.
Individual colonies were inoculated into 150 μl MOPS EZ Rich Defined Medium with appropriate antibiotics and then grown overnight (˜16 hours) in 96-well plates (Nunc, Roskilde, Denmark, #249952) at 1,000 rpm and 37° C. on a plate shaker (ELMI, # DTS-4). Cultures were diluted 1000-fold by adding 2 μl of culture to 198 μl media, and then 15 μl of that dilution to 135 μl media, and grown with the same shaking condition for 3 hours. After the 3-hour step, the cultures were diluted 3000-fold by adding 2 μl of culture to 198 μl media, and then 5 μl of that dilution to 145 μl media with appropriate antibiotics and different inducer concentrations. The dilutions were made in 96-well plates (Nunc, Roskilde, Denmark, #165305) and grown at 1,000 rpm and 37° C. for 6 hours. The optical density at 600 nm was measured on a Synergy H1 plate reader (BioTek, Winooski, Vt.) and the background of MOPS EZ Rich Defined Medium was subtracted. The measured values were then normalized to the un-induced samples (0 ng/ml aTc).
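The background subtraction and normalization described above might be computed as in the following Python sketch. The function name `normalized_growth` and all readings are hypothetical and serve only to illustrate the arithmetic; a single medium blank value is assumed for every well.

```python
def normalized_growth(od_by_atc, medium_blank):
    """Blank-subtract OD600 readings and normalize to the un-induced sample.

    od_by_atc maps aTc concentration (ng/ml) to a raw OD600 reading and must
    include the 0 ng/ml (un-induced) condition.
    """
    corrected = {atc: od - medium_blank for atc, od in od_by_atc.items()}
    reference = corrected[0]
    if reference <= 0:
        raise ValueError("un-induced sample must exceed the medium blank")
    return {atc: value / reference for atc, value in corrected.items()}


# Hypothetical raw readings, for illustration only.
readings = {0: 0.54, 1: 0.49, 10: 0.29}
growth = normalized_growth(readings, medium_blank=0.04)
```

With these made-up numbers, the un-induced sample normalizes to 1 and the 10 ng/ml sample to about 0.5, which would correspond to a two-fold reduction in final culture density.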
Microscopy.
After 6 hours of growth in the growth assay experiments, aliquots (2 μl) of cultures were collected. Microscopic images of these cultures were then taken on the Axiovert 200m microscope (Carl Zeiss, Oberkochen, Germany).
Numbers of cells per ml.
Colonies were inoculated into 150 μl MOPS EZ Rich Defined Medium with appropriate antibiotics and then grown overnight (˜16 hours). The next day, these cultures were diluted by adding 1 μl culture into 1 ml fresh media. After 5 hours of growth (1,000 rpm and 37° C.), the culture density was measured and diluted to different OD600 nm. The cultures at different OD600 nm were then diluted 2×10⁷-fold and plated on LB agar. Colony numbers were then counted after overnight growth at 37° C.
Quantification of dCas9.
Colonies were inoculated into 150 μl MOPS EZ Rich Defined Medium with appropriate antibiotics and then grown overnight (˜16 hours). The next day, these cultures were diluted by adding 1 μl culture into 1 ml fresh media containing inducer (2.5 ng/ml or 0.7 ng/ml aTc). After 5 hours of growth (1,000 rpm and 37° C.), the culture density was measured and adjusted to OD600 nm=1 with MOPS EZ Rich Defined Medium. 700 μl of the adjusted culture for each strain was centrifuged at 12,000 rpm for 1 min. The supernatant was discarded and the cell pellet was re-suspended in 40 μl lysis buffer (100 mM NaCl, 25 mM TrisHCl, pH 8.0) containing 0.2% β-mercaptoethanol (Sigma-Aldrich, # M6250). The samples were boiled at 100° C. for 5 min, after which 3 μl of the dCas9 sample and 0.75 μl of the dCas9*_PhlF sample were added to lysis buffer to a final volume of 20 μl.
To prepare the standard curve, 2 μl of purchased Cas9 complex (New England Biolabs, # M0386S) was added to 38 μl lysis buffer. Then, different amounts (0.2 μl, 1 μl, 3 μl, 5 μl) of the diluted Cas9 standard, 3 μl WT lysate, and lysis buffer were added to each sample to a total volume of 20 μl.
The same amount (10 μl) of the resulting standards and cell lysates were loaded on a 4-12% gradient SDS-PAGE gel (Lonza, #59524). After the run, the gels were transferred onto a PVDF membrane (Biorad, #162-0177) and then blocked at room temperature for 1 hr in 5% skim milk (w/v of TBST, 138 mM NaCl, 2.7 mM KCl, 0.1% Triton X-100, 25 mM Tris-HCl, pH 8.0). The anti-Cas9 antibody (abcam, # ab202580) was used as the primary antibody and diluted 1:2000 in 2.5% skim milk (w/v of TBST). The primary antibody solution was then added to the PVDF membrane and allowed to bind for 1 hour at room temperature. The membrane was then washed three times with TBST. The secondary antibody, HRP-conjugated anti-mouse antibody (Sigma, # A8924), was added at 1:4000 and incubated for 1 hour at room temperature. After washing the membrane, chemiluminescence for HRP (Pierce, #32106) was used to develop the signal, which was detected using the Biorad chemidoc MP imaging system (Biorad, #170-8280). ImageJ 1.41 (NIH) was used to analyze the gel densitometry. The relative protein numbers of dCas9 in the strain were calculated from the standard curve and known concentrations of Cas9 standards (
Random sequence generation.
The random sequences were generated using the online Random DNA Sequence Generator (www.faculty.ucr.edu/˜mmaduro/random.htm) with GC content set to 50%.
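For readers who want an offline equivalent, the following is a minimal sketch of drawing a random sequence with a target GC content. It is not the code behind the online tool; it assumes each base is drawn independently, with G or C chosen with probability equal to the GC content. The function name and the `seed` parameter are hypothetical.

```python
import random

def random_dna(length, gc_content=0.5, seed=None):
    """Draw a random DNA sequence; each base is G/C with probability gc_content."""
    rng = random.Random(seed)
    bases = []
    for _ in range(length):
        if rng.random() < gc_content:
            bases.append(rng.choice("GC"))
        else:
            bases.append(rng.choice("AT"))
    return "".join(bases)

print(random_dna(30, gc_content=0.5, seed=1))
```

Because bases are drawn independently, the GC fraction of any single short sequence will fluctuate around the target rather than match it exactly.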
sgRNA array.
Pairs of ssDNA oligonucleotides ≤200 nt long that encode the necessary genetic parts (promoter, sgRNA, terminator) were ordered from Integrated DNA Technologies (IDT). These oligos were annealed by PCR using KAPA HiFi MasterMix (KAPA Biosystems, #07958935001) and the resulting dsDNA modules were then assembled in a one-pot Golden Gate assembly reaction using the type IIS enzymes BsaI (New England Biolabs, # R0535S) or BsmBI (New England Biolabs, # R0580S) to generate plasmids with different numbers of sgRNAs. After transformation, these plasmids were re-purified and digested with the restriction enzyme BspHI (New England Biolabs, # R0517S) to confirm that they had the expected sizes and thus rule out the possibility of unwanted homologous recombination during construction and transformation (
Energy cost of expressing dCas9*_PhlF and TetR.
The tetR gene is 624 bp and the translated TetR protein contains 207 amino acids. Based on a previous study (Kaleta C., et al., Biotechnol. J., 2013 September; 8(9): 1105-14), transcription requires 0.6 ATP per nucleotide triplet, so the ATPs required for transcription of the tetR mRNA would be 0.6×624/3=124.8. In addition, the ATPs required to synthesize each amino acid from glucose were obtained from TABLE 1 of the same study (Kaleta C., et al., Biotechnol. J., 2013 September; 8(9): 1105-14); summed over the amino acids in the TetR protein, this gives −307 (the negative value means net production of ATPs). For translation, 4 ATPs are needed per amino acid, and thus the ATPs required are 4×207=828. Overall, the ATPs required for synthesizing one TetR protein are 124.8−307+828=645.8. The engineered dCas9*_PhlF protein contains 1511 amino acids (4536 bp DNA), and the ATPs required for transcription, amino acid synthesis, and translation are 907.2, −795, and 6044, respectively. The overall ATP consumption for synthesizing one dCas9*_PhlF protein would be 907.2−795+6044=6156.2.
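The arithmetic above can be reproduced with a short script. This is a minimal sketch; the function and constant names are hypothetical, and the amino acid synthesis totals (−307 and −795) are taken directly from the text rather than recomputed from the cited table.

```python
ATP_PER_TRIPLET_TRANSCRIPTION = 0.6  # ATP per nucleotide triplet (from the text)
ATP_PER_AA_TRANSLATION = 4           # ATP per amino acid (from the text)

def atp_per_protein(gene_length_bp, n_amino_acids, aa_synthesis_atp):
    """Sum the transcription, amino acid synthesis, and translation costs."""
    transcription = ATP_PER_TRIPLET_TRANSCRIPTION * gene_length_bp / 3
    translation = ATP_PER_AA_TRANSLATION * n_amino_acids
    return transcription + aa_synthesis_atp + translation

tetr = atp_per_protein(624, 207, -307)
dcas9_phlf = atp_per_protein(4536, 1511, -795)
print(round(tetr, 1), round(dcas9_phlf, 1))  # 645.8 6156.2
print(round(dcas9_phlf / tetr, 1))           # roughly a 9.5-fold difference
```

The roughly tenfold ratio is the basis for the approximate ˜600 versus ˜6000 ATP per repressor comparison made in the Discussion.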
Combining response functions.
In the layered cascade circuit of NOT gates, the output values from the previous layer serve as the input values to the current layer (
Parameters are shown for a fit to Equation 1 in main text. Gate sequences are provided in
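Combining response functions in a cascade can be sketched as function composition. Equation 1 is not reproduced in this text, so the sketch below assumes a Hill-type NOT-gate response, which is an assumption and not a quotation of the original equation. The function names and all parameter values are hypothetical.

```python
def not_gate(x, y_min, y_max, K, n):
    """Assumed Hill-type NOT-gate response: high output at low input."""
    return y_min + (y_max - y_min) * K**n / (K**n + x**n)

def cascade(x, gates):
    """Feed the output of each layer into the next layer as its input."""
    signal = x
    for params in gates:
        signal = not_gate(signal, **params)
    return signal

# Hypothetical gate parameters in arbitrary units, for illustration only.
gate_a = {"y_min": 1.0, "y_max": 100.0, "K": 10.0, "n": 2.0}
gate_b = {"y_min": 1.0, "y_max": 100.0, "K": 10.0, "n": 2.0}

for x in (0.0, 10.0, 1000.0):
    print(x, round(cascade(x, [gate_a, gate_b]), 2))
```

With two layers the cascade acts as a buffer: a low input yields a low output and a high input yields a high output, provided the output range of the first gate spans the threshold of the second.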
Derivation for the impact of dCas9 sharing by multiple sgRNAs.
When multiple competing sgRNAs (i=2 . . . n) were expressed:
It was assumed that all of the co-expressed competing sgRNAs had the same transcription rate αi=αx for i=2 . . . n.
For the formation of each sgRNA::dCas9 complex:
The dynamics of free dCas9 is given by:
At steady-state, Equations 1-6 reduce to:
and N=n−1 is the number of co-expressed competing sgRNAs.
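The equations of this derivation are not reproduced in this text. The following is a sketch of one way the steady state can be obtained, assuming mass-action binding, a constant total dCas9 pool, and no degradation of the sgRNA::dCas9 complexes. The symbols follow the definitions used in the description ($\alpha_i$, $\delta_s$, $k_1$, $k_{-1}$, $C_F$, $C_{TOT}$); the specific form of each equation is an assumption and may differ from the original by additional terms such as dilution.

```latex
\begin{aligned}
\frac{ds_i}{dt} &= \alpha_i - \delta_s s_i - k_1 s_i C_F + k_{-1} C_{s_i},
  \qquad i = 1 \ldots n \\
\frac{dC_{s_i}}{dt} &= k_1 s_i C_F - k_{-1} C_{s_i} \\
C_{TOT} &= C_F + \sum_{i=1}^{n} C_{s_i} \\[4pt]
\text{At steady state:}\quad
s_i &= \frac{\alpha_i}{\delta_s}, \qquad
C_{s_i} = K_1 s_i C_F, \qquad K_1 = \frac{k_1}{k_{-1}} \\
C_F &= \frac{C_{TOT}}{1 + K_1 \sum_{i=1}^{n} s_i} \\
C_{s_1} &= \frac{C_{TOT}\, K_1\, \alpha_1/\delta_s}
              {1 + K_1\left(\alpha_1 + N \alpha_x\right)/\delta_s},
  \qquad \alpha_i = \alpha_x \ \text{for}\ i = 2 \ldots n
\end{aligned}
```

Under these assumptions each additional competing sgRNA adds a term $K_1 \alpha_x / \delta_s$ to the denominator, so the amount of dCas9 available to the first sgRNA falls as $N$ increases.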
Transcription of a target reporter gene is blocked when a dCas9-sgRNA binds to its promoter (
A reporter system was developed to evaluate the impact of these modifications on the ability for dCas9 to repress the targeted promoter (
The ability for a zinc finger protein (ZFP) to recover nuclease activity was first tested. To this end, a variant of dCas9* described previously was built, where Zif268TS3 is fused to the C-terminal end of dCas9* via a 58 amino acid linker (Bolukbasi M. F., et al., Nat. Methods, 2015 December; 12(12): 1150-56). The corresponding 12 bp operator recognized by Zif268TS3 was then placed upstream of the promoter, separated from the −35 position by a spacer (all promoter variants described are provided in TABLE 3). The orientation of the operator (forward and reverse) was initially tested with the forward yielding higher repression as previously observed (Bolukbasi M. F., et al., Nat. Methods, 2015 December; 12(12): 1150-56). Thus, it was selected for all subsequent optimization. The deletion of the nuclease domain (ΔHNH) (Sternberg S. H., et al., Nature, 2015. 527(7576): 110-13) and the increase in linker size to 88 amino acids (L88) both improved repression (
TetR-family repressors were then evaluated in place of the ZFP using the same dCas9* variant (88 amino acid linker, ΔHNH). Four repressors were tested (PhlF, BM3RI, HlyIIR, and SrpR) and their corresponding operators (30 bp, 20 bp, 22 bp, 30 bp, TABLE 3) were inserted in front of the promoter with the 6 bp spacer (Stanton B. C., et al., Nat. Chem. Biol., 2014. 10(2): p. 99-105). Of these, the PhlF fusion (dCas9*_PhlF) recovered the most activity, achieving 95% of the repression of dCas9 with an optimal spacer length of 6 bp (
The growth impact of dCas9 was then compared to dCas9*_PhlF at different levels of expression, controlled by the addition of aTc. The activity of the pTet promoter is used as a surrogate of dCas9 expression, measured in independent experiments using a separate plasmid and red fluorescent protein (
Note that the use of promoter strengths to compare expression levels between dCas9 and dCas9*_PhlF is, at best, inexact as these genes will translate differently. Therefore, immunoblotting was performed to quantify the size of the pools of each protein that the cell can tolerate before a growth impact is observed. Based on the growth experiment, 0.7 ng/ml aTc was chosen for dCas9 and 2.5 ng/ml aTc for dCas9*_PhlF as the inducer levels just prior to the corresponding thresholds (arrows in
A transcriptional NOT gate inverts the response of a promoter (Yokobayashi Y., et al., Proc. Natl. Acad. Sci. USA, 2002 Dec. 24; 99(26): 16587-91). More complex circuits can be constructed by connecting NOT gates to each other (e.g., toggle switch and oscillator) or by converting to NOR gates through the addition of a second upstream input promoter (Nielsen A. A., et al., Science, 2016. 352(6281): aac7341; Gardner T. S., et al., Nature, 2000 Jan. 20; 403(6767): 339-42; Elowitz M. B. and Leibler S., Nature, 2000. 403(6767): 335-38; Tamsir A., et al., Nature, 2011. 469(7329): 212-15). Previously, an architecture was designed for NOT and NOR gates based on sgRNAs using dCas9 (Nielsen A. A. and Voigt C. A., Mol. Syst. Biol., 2014. 10(763): 1-11). Here, this approach was followed to build gates based on dCas9*_PhlF, where the input promoter driving sgRNA is an IPTG-inducible pTac promoter (
The response function is characterized by comparing the activity of the pTac promoter, measured separately, versus the activity of the output promoter (
where y is the output promoter activity (and Ymax/Ymin are the maximum/minimum activities), x is the input promoter activity, K is the threshold and n is the cooperativity. Note that the values of the promoter activities are in arbitrary units of red fluorescence and not standardized units. The response function from dCas9 is linear over the entire range of input with n=0.9, as observed previously (
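Equation 1 itself is not reproduced in this text. Assuming the Hill-type repression form that is consistent with the parameter definitions above (an assumption, not a quotation of the original equation), the response function can be sketched as:

```latex
y = y_{min} + \left(y_{max} - y_{min}\right)\frac{K^{n}}{K^{n} + x^{n}}
```

In this form the output approaches $y_{max}$ when $x \ll K$ and $y_{min}$ when $x \gg K$, and $n$ sets the steepness of the transition around the threshold $K$.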
A library of NOT gates was then built based on a set of 30 orthogonal sgRNAs (Nielsen A. A. and Voigt C. A., Mol. Syst. Biol., 2014. 10(763): 1-11). The target sequence corresponding to each was used to construct a promoter based on the system shown in
Cascades were constructed to demonstrate the layering of gates. First, the vanillic acid inducible system (pVan) was selected to serve as the input because it was observed to generate the largest dynamic range (341-fold) (
Genetic circuits with more than one gate require the simultaneous expression of multiple sgRNAs within the cell, which must compete for the same pool of dCas9 molecules. The sharing impacts the dynamics of each component in the system and this can have unintended consequences for the overall behavior of the circuit (Del Vecchio D., et al., Mol. Syst. Biol., 2008. 4(161): 1-16). Therefore, it is important to quantify the titration that occurs as more sgRNAs are simultaneously expressed.
First, the impact of resource sharing between two sgRNAs was characterized (
It is expected that as more sgRNAs are added to the system, at some point there would be a decline in their ability to function as dCas9*_PhlF is titrated. To quantify this transition, a mathematical model was developed inspired closely by the work of Del Vecchio and co-workers (Chen P. Y., et al., bioRxiv, 2018 Feb. 4: doi.org/10.1101/266015). The equations corresponding to when two sgRNAs are expressed are described below and this is expanded to a system of i sgRNAs in the Methods section. The pool of total dCas9 CTOT is assumed to be constant. It can be described as the algebraic sum of free dCas9 CF and the concentrations of dCas9 bound to the first and second sgRNAs (s1 and s2),
$$C_{TOT} = C_F + C_{s1} + C_{s2} \qquad (2)$$
The dynamics of the unbound sgRNAs s1 and s2 are captured by the differential equations
where α1 and α2 are the transcription rates of the first and second sgRNAs, and δs is the sgRNA degradation rate, which is assumed to be the same for all sgRNAs. Similarly, the on- and off-rates of sgRNAs to dCas9 (k1 and k−1) are assumed to be sequence independent. There are two additional differential equations for the formation of sgRNA::dCas9 complexes:
Finally, the concentration of free dCas9 is given by
At steady-state, Equations 1-6 reduce to
where K1 is the association equilibrium constant of sgRNA to dCas9. This captures how increasing the concentration of the second sgRNA impacts the concentration of complexes with the first. By substituting sgRNA concentration from Equation 9, one can simplify Equation 9 to
Considering a Shea-Ackers model of a repressor binding to a promoter (related in form to Equation 1 of Example 4), the impact on transcription would be:
where G/Gss is the fold-repression, K is the dissociation equilibrium constant for dCas9::sgRNA binding to the promoter, and n is the cooperativity. Combining Equations 10 and 11 shows how the expression of a second sgRNA impacts the repression of the promoter responsive to the first sgRNA.
Similarly, concentration of the first sgRNA::dCas9 complex can be derived when multiple competing sgRNAs are co-expressed and sharing the dCas9 pool (Methods section):
where N is the number of additional co-expressed sgRNAs and αx is the transcription rate of these competing sgRNAs. The concentration for each of these competing sgRNAs is assumed to be equal. The fold-repression is calculated by substituting Cs1 from Equation 12 into Equation 11.
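The equations referenced above are not reproduced in this text. The following is a minimal numerical sketch of the behavior they describe, assuming mass-action binding with free sgRNA at α/δs at steady state, and assuming a fold-repression of the form 1 + (Cs1/K)^n for the Shea-Ackers-type term. Both forms are assumptions, the function names are hypothetical, and all parameter values are arbitrary and chosen only for illustration.

```python
def complex_s1(c_tot, k_assoc, alpha_1, alpha_x, delta_s, n_competing):
    """Steady-state concentration of dCas9 bound to the first sgRNA.

    Assumes free sgRNA i sits at alpha_i / delta_s and that every
    competing sgRNA shares the same transcription rate alpha_x.
    """
    s1 = alpha_1 / delta_s
    sx = alpha_x / delta_s
    return c_tot * k_assoc * s1 / (1 + k_assoc * (s1 + n_competing * sx))

def fold_repression(c_s1, k_dna, n):
    """Assumed Shea-Ackers-type fold-repression for a single repressor site."""
    return 1 + (c_s1 / k_dna) ** n

# Hypothetical parameters in arbitrary units.
for n_competing in (0, 1, 5, 15):
    c1 = complex_s1(c_tot=1e4, k_assoc=1.0, alpha_1=1.0, alpha_x=1.0,
                    delta_s=1.0, n_competing=n_competing)
    print(n_competing, round(fold_repression(c1, k_dna=100.0, n=1.0), 1))
```

With these illustrative values the fold-repression of the first gate falls monotonically as competing sgRNAs are added, which is the qualitative titration effect the model is meant to capture.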
To parameterize the model, the decline in the response of an sgRNA as more competing sgRNAs are added to the system was measured. The response of a vanillic acid-driven NOT gate based on sgRNA9 was measured; alone, it generates 58-fold repression (
One goal of this study was to evaluate the maximum number of sgRNAs that can be used together. Therefore, the expression level of each sgRNA was tuned to be as low as possible while still functioning minimally as a NOT gate. In accordance with this approach, the constitutive promoter (pCon) was selected such that each sgRNA yields ˜10-fold repression when measured in the context of the N16 construct (
The impact on the sgRNA9 gate was measured as a function of the number of additional sgRNAs co-expressed (
Discussion.
The original uses intended for Cas9 and dCas9 have different constraints than those required for genetic circuits. Genome editing and knockdown experiments only require transient and low levels of expression for activity. These applications benefit from the capability of sgRNA to be designed to target essentially any region of the genome and this programmability could be very useful for building out sets of orthogonal regulators for genetic circuits. However, integrating a circuit into an application is more complicated, for example to produce a chemical product in a fermenter or integrate information in the human gut (Lian J., et al., Nat. Commun., 2017 Nov. 22; 8(1): 1688; Cress B. F., et al., Nucleic Acids Res., 2016 May 19; 44(9): 4472-85; Mimee M., et al., Cell Syst., 2015 Jul. 29; 1(1): 62-71; Fernandez-Rodriguez J., et al., Nat. Chem. Biol., 2017 July; 13(7): 706-8; Brophy J. A. N. and Voigt, C. A., Nat. Methods, 2014. 11(5): 508-20). For these purposes, a circuit cannot reduce growth or require significant cellular resources or energy to function. One of these problems has been solved, as described herein, where the growth impact of dCas9 is greatly reduced by increasing the required DNA sequence to which it binds by swapping a 3 bp PAM site for a 30 bp PhlF operator. This allows the expression of dCas9*_PhlF to be increased to ˜10⁴ copies per cell, which is just about as high as one can expect to push the expression of a large protein in E. coli (Milo R. and Phillips R., Garland Science, 2015).
Repetitive sequences shared between gates are another challenge that must be solved before large sgRNA circuits can be built based on dCas9*_PhlF. The shared sequences can lead to genetic instability due to homologous recombination (Lou C., et al., Nat. Biotechnol., 2012 November; 30(11): 1137-42; Sleight S. C. and Sauro H. M., ACS Synth. Biol., 2013 Sep. 20; 2(9): 519-28). All of the sgRNA-based gates share the identical 83 bp tracrRNA sequences, and the output promoters share the identical 30 bp PhlF operator (
However, before undertaking this effort, it is important to consider whether the concept makes sense. The pool of dCas9*_PhlF would need to be maintained at a constant ˜10⁴ molecules irrespective of the number of active gates. Our experimental data and model show that this can support about 15 sgRNA-based gates (Methods section). This is about on par with the number of available protein-based gates and is a harsh limitation to the huge number of potential gates considering sgRNA programmability alone (estimated to be ˜10⁷ sgRNA-promoter pairs) (Nielsen A. A. and Voigt C. A., Mol. Syst. Biol., 2014. 10(763): 1-11). The retroactivity due to having to share the dCas9*_PhlF resource also changes as each additional sgRNA is added to the system. When designing circuits, a mathematical model would have to be used to mitigate this complexity. Thus, the benefit of sgRNA-based gates, even when the dCas9 toxicity is solved, is not a scale-up in size, although there may be other benefits for certain scenarios.
One such scenario may be in eukaryotes, where dCas9-based gates have an advantage (Li Y., et al., Nat. Chem. Biol., 2015 March; 11(3): 207-13; Gander M. W., et al., Nat. Commun., 2017 May 25; 8: 15459; Nissim L., et al., Cell, 2017 Nov. 16; 171(5): 1138-50). The lack of translation at the gate level means that circuit function can be entirely localized to the nucleus (once a dCas9 pool has been imported), thus avoiding the capping and export of the mRNA and importing of each protein-based repressor. Another may be for organisms for which the circuit needs to be carried at low copy and the design of high-expression promoters remains elusive (Mimee M., et al., Cell Syst., 2015 Jul. 29; 1(1): 62-71).
A misconception is that sgRNA gates require fewer cellular resources because they do not require translation to function. While each gate only requires a new sgRNA to be transcribed, for it to be functional it needs a dCas9*_PhlF to form a complex that represses the output promoter. The binding of sgRNA to dCas9 is very tight (Kd=10 pM) (Wright A. V., et al., Proc. Natl. Acad. Sci. USA, 2015. 112(10): p. 2984-89) and dCas9 binds tightly to DNA (Kd=1 nM) (Sternberg S. H., et al., Nature, 2014. 507(7490): 62-67; Richardson C. D., et al., Nat. Biotechnol., 2016 March; 34(3): 339-44; Josephs E. A., et al., Nucleic Acids Res., 2015 Oct. 15; 43(18): 8924-41), requiring DNA replication machinery for removal during division (Jones D. L., et al., Science, 2017 Sep. 29; 357(6358): 1420-24). Therefore, it is likely that recycling of the pool (reuse of dCas9 after dissociating from a previous sgRNA) will be low. This makes the cost of each dCas9*_PhlF:sgRNA “repressor” high when compared to a protein-based repressor (e.g., TetR). In terms of ATP consumption, the former is estimated to require ˜6000 ATP/repressor and the latter ˜600 ATP/repressor (Methods).
The sharing of a resource is a common feature of cells, including natural regulatory networks (Cookson N. A., et al., Mol. Syst. Biol., 2011 Dec. 20; 7:561; Mishra D., et al., Nat. Biotechnol., 2014 December; 32(12): 1268-75). One example is sigma factors, turned on in response to different cellular needs, that all must share core RNA polymerase to initiate transcription from a promoter (Gruber T. M. and Gross C. A., Annu. Rev. Microbiol., 2003; 57: 441-66). If multiple sigma factors were co-expressed, this would draw down the core resource. It has been shown that B. subtilis has an innovative solution: each sigma factor is expressed as an independent pulse and the pulsing time is changed with respect to need, as opposed to the expression level (Park J., et al., Cell Syst. 2018 Feb. 28; 6(2): 216-29). In the natural network, this is achieved with feedback loops of a complexity that remains difficult to achieve in engineered systems. Still, it may be a solution to the circuit limitations of dCas9 as well as other similar problems in the field (Cookson N. A., et al., Mol. Syst. Biol., 2011 Dec. 20; 7:561; Segall-Shapiro T. H., et al., Mol. Syst. Biol., 2014 Jul. 30; 10: 742). Until then, our results point to the difficulty of using a genetic circuit paradigm that requires a shared (and expensive) non-recyclable resource in bacteria. This work highlights the need to develop theoretical and experimental frameworks to quantify the cellular impact of introducing systems into cells, prior to performing experiments, in order to rationally guide design decisions.
All of the features disclosed in this specification may be combined in any combination. Each feature disclosed in this specification may be replaced by an alternative feature serving the same, equivalent, or similar purpose. Thus, unless expressly stated otherwise, each feature disclosed is only an example of a generic series of equivalent or similar features.
From the above description, one skilled in the art can easily ascertain the essential characteristics of the present disclosure, and without departing from the spirit and scope thereof, can make various changes and modifications of the disclosure to adapt it to various usages and conditions. Thus, other embodiments are also within the claims.
While several inventive embodiments have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the inventive embodiments described herein. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the inventive teachings is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific inventive embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described and claimed. Inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the inventive scope of the present disclosure.
All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms.
All references, patents and patent applications disclosed herein are incorporated by reference with respect to the subject matter for which each is cited, which in some cases may encompass the entirety of the document.
The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”
The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B,” when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
As used herein in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e. “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of,” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law.
As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are recited.
In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures, Section 2111.03. It should be appreciated that embodiments described in this document using an open-ended transitional phrase (e.g., “comprising”) are also contemplated, in alternative embodiments, as “consisting of” and “consisting essentially of” the feature described by the open-ended transitional phrase. For example, if the disclosure describes “a composition comprising A and B,” the disclosure also contemplates the alternative embodiments “a composition consisting of A and B” and “a composition consisting essentially of A and B.”
This application claims priority under 35 U.S.C. § 119(e) to U.S. patent application No. 62/735,877, filed Sep. 25, 2018, the entire contents of which are incorporated herein by reference.
This invention was made with Government support under Grant No. N00014-16-1-2388 awarded by the Office of Naval Research. The Government has certain rights in the invention.
Number | Date | Country
---|---|---
62735877 | Sep 2018 | US