The present disclosure relates generally to genetic engineering of cells to perform specific and complex functions. In particular, the present disclosure relates to engineered mammalian cells and methods of engineering mammalian cells, as well as novel multi-functional proteins that integrate both transcriptional and post-translational control, effectively linking genetic circuits with sensors for multi-input evaluations.
The following discussion is merely provided to aid the reader in understanding the disclosure and is not admitted to describe or constitute prior art thereto.
Early demonstrations of genetically engineering customized functions in mammalian cells indicate a vast potential to benefit applications including directed stem cell differentiation (1, 2) and cancer immunotherapy (3). In general, most applications require precise control of gene expression and the capability to sense and respond to external cues (4-8). Despite the growing availability of biological parts (such as libraries of promoters and regulatory proteins) that could be used to control cell states, assembling parts to compose customized genetic programs that function as intended remains a challenge, and it often requires iterative experimental tuning or down-selection to identify functional configurations. This highly empirical process limits both the scope of programs that one can feasibly compose and fine-tune and likely the performance of functional programs identified in this manner. Thus, the need for systematic and precise design processes represents a grand challenge in the field of mammalian synthetic biology.
Model-guided predictive design has been demonstrated in the composition of some cellular functions, including transcriptional logic in bacteria (9) as well as logical (10) and analog behaviors in yeast (11); however, this type of approach is less developed in mammalian systems. To date, transcription factors (TFs) based on zinc fingers (ZFs) (12, 13), transcription activator-like effectors (TALEs) (14-17), dCas9 (18, 19), and other proteins (20) have been used to implement transcriptional logic in mammalian cells. Some of these studies make use of protein splicing (12, 14, 18). Other studies have used RNA-binding proteins (21), proteases (22, 23), and synthetic protein-binding domains (17). Yet, none of these approaches currently enable the customized design of sophisticated mammalian cellular functions and prediction of circuit performance based only upon descriptions of the component parts. Associated challenges include the availability of appropriate parts (24), suitably descriptive models that support predictions using these parts (25), and computational and conceptual tools that facilitate the identification of designs that function robustly despite biological variability and crosstalk (26-28).
Accordingly, there is a need in the art for a tractable set of genetic circuits useful in mammalian cells that do not require unduly laborious testing and empirical trial-and-error tuning. The present disclosure fulfills that need by providing such genetic circuits with a variety of functions including, but not limited to, digital and analog information processing, and sense-and-respond behaviors.
Genetically engineering cells to perform customizable functions is an emerging frontier with numerous technological and translational applications. However, it remains challenging to systematically engineer mammalian cells to execute complex functions. To address this need, the present disclosure provides methods enabling accurate genetic program design using high-performing genetic parts and predictive computational models, as well as novel multi-functional proteins integrating both transcriptional and post-translational control, validated models for describing these mechanisms, implementations of digital and analog processing, and genetic circuits effectively linked with sensors for multi-input evaluations. The functional modularity and compositional versatility of these parts enable one to satisfy a given design objective via multiple synonymous programs. The platform described herein enables bioengineers to predictively design mammalian cellular functions that perform as expected even at high levels of biological complexity.
In one aspect, the disclosure provides an engineered genetic circuit comprising: (a) one or more engineered proteins selected from the group consisting of: (i) an engineered protein that activates gene expression, wherein the engineered protein comprises a DNA binding domain, a transcription activator domain, and at least one split intein on the C-terminus or N-terminus of the DNA binding domain and/or the transcription activator domain; (ii) an engineered protein that inhibits gene expression, the engineered protein comprising a DNA binding domain, a transcription inhibitor domain, and at least one split intein on the C-terminus or N-terminus of the DNA binding domain and/or the transcription inhibitor domain; and (iii) a combination of two engineered proteins comprising a first engineered protein comprising a DNA binding domain fused to a dimerization domain, and a second engineered protein comprising a transcription regulator domain fused to a dimerization domain, wherein the dimerization domains of the two engineered proteins dimerize in the presence of a stimulus to which the dimerization domains of the two engineered proteins bind, and wherein the first engineered protein and the second engineered protein each comprise at least one split intein; and (b) one or more engineered expression vectors comprising a minimal promoter and one or more DNA binding sites for the DNA binding domains of the engineered proteins of (a), and optionally a gene of interest that is expressed from the minimal promoter.
In some embodiments, the genetic circuit may comprise the engineered protein of (i) and the engineered protein of (ii).
In some embodiments, the DNA binding domain of the one or more engineered proteins of (i), (ii), and (iii) comprises one or more zinc fingers. In some embodiments, the zinc finger may be, for example, ZF1, ZF2, ZF3, ZF4, ZF5, ZF6, ZF7, ZF8, ZF9, ZF10, ZF11, ZF12, ZF13, ZF14, or ZF15. In some embodiments, the DNA binding domain of the one or more engineered proteins of (i), (ii), and (iii) may comprise, for example, all of or a functional fragment of a zinc finger protein comprising more than three DNA-binding domains, other classes of programmable DNA binding domains (e.g., transcription activator-like effector (TALE)), DNA binding domains derived from microbial proteins (e.g., tetR, lacI, etc.), and/or Cas9 or variants of Cas9 and other Cas proteins, including catalytically inactive variants (e.g., dCas9). In some embodiments, the DNA binding domain of the one or more engineered proteins of (i), (ii), and (iii) comprises 2, 3, or more zinc fingers or 2, 3, or more of the other DNA binding domains provided herein.
In some embodiments, the engineered proteins are fusion proteins comprising heterologous domains.
In some embodiments, the transcription activator domain of the engineered protein of (i), (ii), and/or (iii) comprises a domain from a transcription activator selected from the group consisting of Herpes simplex virus protein 16 (VP16), a synthetic tetramer of VP16 (VP64), nuclear factor (NF) kappa-B (p65), heat shock transcription factor 1 (HSF1), replication and transcription activator (RTA) of the gamma-herpesvirus family, p53, an acidic domain (also known as “acid blobs” or “negative noodles,” rich in D and E amino acids, present in Gal4, Gcn4 and VP16), a glutamine-rich domain (which may comprise multiple repetitions like “QQQXXXQQQ,” like those present in transcription factor Sp1), a proline-rich domain (which may comprise repetitions like “PPPXXXPPP,” like those present in c-jun, AP2, and Oct-2), an isoleucine-rich domain (which may comprise repetitions of “IIXXII,” like those present in NTF-1), and a multipartite activator.
In some embodiments, the engineered protein of (ii) inhibits activation of transcription by the engineered protein of (i).
In some embodiments, the transcription regulator domain of the second engineered protein of the combination of engineered proteins of (iii) is a transcription activator domain optionally selected from the group consisting of Herpes simplex virus protein 16 (VP16), a synthetic tetramer of VP16 (VP64), nuclear factor (NF) kappa-B (p65), heat shock transcription factor 1 (HSF1), replication and transcription activator (RTA) of the gamma-herpesvirus family, p53, an acidic domain (also known as “acid blobs” or “negative noodles,” rich in D and E amino acids, present in Gal4, Gcn4 and VP16), a glutamine-rich domain (which may comprise multiple repetitions like “QQQXXXQQQ,” like those present in transcription factor Sp1), a proline-rich domain (which may comprise repetitions like “PPPXXXPPP,” like those present in c-jun, AP2, and Oct-2), an isoleucine-rich domain (which may comprise repetitions of “IIXXII,” like those present in NTF-1), and a multipartite activator.
In some embodiments, the engineered proteins of (i) or (ii) are present in an exogenous extracellular sensor. In some embodiments, the extracellular sensor comprises: (a) a ligand binding domain, (b) a transmembrane domain, (c) a protease cleavage site, and (d) the engineered protein of (i) or (ii).
In some embodiments, the split intein is a wild-type split intein. In some embodiments, the at least one split intein is a mutated split intein. In some embodiments, the at least one split intein is appended to the N-terminus of the engineered protein. In some embodiments, the split intein comprises SEQ ID NO: 1 or an amino acid sequence that possesses at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity thereto. In some embodiments, the at least one split intein is appended to the C-terminus of the engineered protein. In some embodiments, the split intein comprises SEQ ID NO: 3 or an amino acid sequence that possesses at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity thereto.
In some embodiments, the circuit components are eukaryotic, while in some embodiments, the circuit components are mammalian.
In some embodiments, the stimulus is a ligand, exposure to light, removal from light, phosphorylation, dephosphorylation, a post-translational modification of the dimerization domain, or a change in the state of the environment in which the engineered genetic circuit is expressed.
In another aspect, the present disclosure provides an engineered genetic circuit, comprising: (a) a first engineered protein that activates gene expression, the first engineered protein comprising a first DNA binding domain, a first transcription activator domain, and at least one split intein on the C-terminus or N-terminus of the first DNA binding domain and/or the first transcription activator domain; (b) a first engineered expression vector comprising a minimal promoter and first DNA binding sites for the first DNA binding domain of the first engineered protein, and a first gene of interest that is expressed from the minimal promoter, wherein the gene of interest encodes a second engineered protein, the second engineered protein comprising a second DNA binding domain, a second transcription activator domain, and at least one split intein on the C-terminus or N-terminus of the second DNA binding domain and/or the second transcription activator domain; and (c) a second engineered expression vector comprising a minimal promoter and second DNA binding sites for the second DNA binding domain of the second engineered protein, and a second gene of interest that is expressed from the minimal promoter, wherein the second gene of interest encodes a detectable reporter protein; wherein the first engineered protein increases expression from the first engineered expression vector and the second engineered protein increases expression from the second engineered vector.
In another aspect, the present disclosure provides an exogenous extracellular sensor system comprising: (i) a first exogenous extracellular sensor component comprising:
(ii) a second exogenous extracellular sensor component comprising
(iii) an engineered expression vector comprising a minimal promoter and one or more DNA binding sites for the DNA binding domains of the first exogenous extracellular sensor, and, optionally, a gene of interest that is expressed from the minimal promoter; wherein the ligand binding domain of the first exogenous extracellular sensor component and the ligand binding domain of the second exogenous extracellular sensor component bind to the same ligand to form a tertiary complex; wherein the protease domain of the second exogenous extracellular sensor component cleaves the protease cleavage site of the first exogenous extracellular sensor component to release the engineered protein domain comprising the DNA binding domain and transcription activator domain; and wherein the DNA binding domain of the engineered protein domain binds to the one or more DNA binding sites of the engineered expression vector and increases expression from the minimal promoter of the engineered expression vector.
In another aspect, the present disclosure provides a host cell comprising the engineered genetic circuit of any one of the foregoing aspects or embodiments or the exogenous extracellular sensor system of the foregoing aspect. In some embodiments, the host cell is eukaryotic. In some embodiments, the host cell is mammalian.
The foregoing general description and following detailed description are exemplary and explanatory and are intended to provide further explanation of the disclosure as claimed. Other objects, advantages, and novel features will be readily apparent to those skilled in the art from the following brief description of the drawings and detailed description of the disclosure.
The present disclosure provides a platform of composable mammalian elements of transcription (COMET) with novel functionalities and which may be integrated in novel ways. The present disclosure achieves the incorporation of protein splicing-based post-translational control in COMET transcription factors, the behavior of which can be accurately predicted using mathematical models. This mechanism employs split inteins: complementary domains that fold and trans-splice to covalently ligate flanking domains. The present disclosure demonstrates the utility of this platform by designing and building genetic circuits that implement a variety of functions including digital and analog information processing and sense-and-respond behaviors. Part of this implementation includes demonstrating the combination of modular extracellular sensor architecture (MESA) receptors and COMET transcription factors. Ultimately, this capability enables the construction of sophisticated engineered cellular functions for applications in biotechnology, medicine, and fundamental research.
I. Definitions
It is to be understood that methods are not limited to the particular embodiments described, and as such may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. The scope of the present technology will be limited only by the appended claims.
As used herein, certain terms may have the following defined meanings. As used in the specification and claims, the singular forms “a,” “an,” and “the” include singular and plural references unless the context clearly dictates otherwise. For example, the term “a peptide” includes a single peptide as well as a plurality of peptides, including mixtures thereof.
As used herein, the term “comprising” is intended to mean that the compositions and methods include the recited elements, but not excluding others. “Consisting essentially of” when used to define compositions and methods, shall mean excluding other elements of any essential significance to the composition or method. “Consisting of” shall mean excluding more than trace elements of other ingredients for claimed compositions and substantial method steps. Embodiments defined by each of these transition terms are within the scope of this disclosure. Accordingly, it is intended that the methods and compositions can include additional steps and components (comprising) or alternatively including steps and compositions of no significance (consisting essentially of) or alternatively, intending only the stated method steps or compositions (consisting of).
As used herein, “about” means plus or minus 10% as well as the specified number. For example, “about 10” should be understood as both “10” and “9-11.”
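By way of non-limiting illustration, the arithmetic of this definition can be sketched in Python as follows. The function names `about_range` and `is_about` are hypothetical and are introduced here only for illustration; the 10% tolerance is taken from the definition above.

```python
def about_range(value, tolerance=0.10):
    """Return the (low, high) bounds covered by "about value".

    The default tolerance of 0.10 reflects the plus-or-minus 10%
    stated in the definition; the function name is illustrative only.
    """
    delta = abs(value) * tolerance
    return (value - delta, value + delta)


def is_about(x, value, tolerance=0.10):
    """Return True if x falls within "about value", bounds included."""
    low, high = about_range(value, tolerance)
    return low <= x <= high


# "about 10" covers the specified number itself and the range 9-11.
bounds_for_ten = about_range(10)
```

Under these assumptions, `about_range(10)` yields the bounds 9 and 11, so that a value of 9 is "about 10" while a value of 11.5 is not.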
As used herein, “optional” or “optionally” means that the subsequently described event or circumstance may or may not occur, and that the description includes instances where said event or circumstance occurs and instances where it does not.
Regarding proteins, the phrases “percent identity” and “% identity” refer to the percentage of residue matches between at least two amino acid sequences aligned using a standardized algorithm. Methods of amino acid sequence alignment are well-known. Some alignment methods take into account conservative amino acid substitutions. Such conservative substitutions, explained in more detail below, generally preserve the charge and hydrophobicity at the site of substitution, thus preserving the structure (and therefore function) of the polypeptide. Percent identity for amino acid sequences may be determined as understood in the art. (See, e.g., U.S. Pat. No. 7,396,664, which is incorporated herein by reference in its entirety). A suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST), which is available from several sources, including the NCBI, Bethesda, Md., at its website. The BLAST software suite includes various sequence analysis programs including “blastp,” which is used to align a known amino acid sequence with other amino acid sequences from a variety of databases.
Regarding proteins, percent identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in the tables, figures or Sequence Listing, may be used to describe a length over which percentage identity may be measured.
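By way of non-limiting illustration, a percent identity calculation over two sequences that have already been aligned may be sketched in Python as follows. This sketch is not BLAST and does not perform alignment; it assumes the alignment has been produced by a standardized algorithm, that gaps are written as “-”, and that identity is counted as matching residues divided by aligned columns. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences share a residue.

    Assumptions (illustrative only): inputs are pre-aligned and of equal
    length, "-" denotes a gap, and columns that are gaps in both
    sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical 10-residue sequences differing at a single position.
full_length = percent_identity("MKTAYIAKQR", "MKTAHIAKQR")

# Identity may also be measured over a shorter fragment of the alignment.
first_five = percent_identity("MKTAYIAKQR"[:5], "MKTAHIAKQR"[:5])
```

In this example, the full-length comparison gives 90% identity (9 of 10 columns match), while the comparison over the first five residues gives 80% identity (4 of 5 columns match), illustrating that the reported value depends on the length over which identity is measured.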
Regarding polynucleotide and amino acid sequences, “variant,” “mutant,” or “derivative” may be defined as a sequence having at least 50% sequence identity to the particular sequence over a certain length of one of the sequences using blastn or blastp with the “BLAST 2 Sequences” tool available at the National Center for Biotechnology Information's website. (See Tatiana A. Tatusova, Thomas L. Madden (1999), “Blast 2 sequences—a new tool for comparing protein and nucleotide sequences”, FEMS Microbiol Lett. 174:247-250). Such a pair of variant, mutant, or derivative sequences may show, for example, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length.
Nucleic acid sequences that do not show a high degree of identity may nevertheless encode the same or similar amino acid sequences due to the degeneracy of the genetic code where multiple codons may encode for a single amino acid. It is understood that changes in a nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein. For example, polynucleotide sequences as contemplated herein may encode a protein and may be codon-optimized for expression in a particular host. In the art, codon usage frequency tables have been prepared for a number of host organisms including humans, mouse, rat, pig, E. coli, plants, and other host cells.
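By way of non-limiting illustration, the degeneracy described above can be sketched in Python using the standard genetic code. The function name `translate` and the two example coding sequences are hypothetical and were chosen only to show that nucleic acid sequences differing at many positions can encode the same amino acid sequence.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons beginning with T
    "LLLLPPPPHHQQRRRR"  # codons beginning with C
    "IIIMTTTTNNKKSSRR"  # codons beginning with A
    "VVVVAAAADDEEGGGG"  # codons beginning with G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    peptide = []
    for start in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[start:start + 3].upper()]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Two hypothetical coding sequences built from synonymous codons.
seq_one = "ATGCTGTCTCGT"  # ATG CTG TCT CGT
seq_two = "ATGTTAAGCAGA"  # ATG TTA AGC AGA

same_protein = translate(seq_one) == translate(seq_two)
```

Both example sequences translate to the peptide M-L-S-R even though they differ at most nucleotide positions after the start codon. Codon optimization for a particular host selects among such synonymous codons according to that host's codon usage frequencies.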
“Transformation” or “transfection” describes a process by which exogenous nucleic acid (e.g., DNA or RNA) is introduced into a recipient cell. Transformation or transfection may occur under natural or artificial conditions according to various methods well known in the art, and may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method for transformation or transfection is selected based on the type of host cell being transformed and may include, but is not limited to, bacteriophage or viral infection or non-viral delivery. Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, electroporation, heat shock, particle bombardment, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424; WO 91/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration). The term “transformed cells” or “transfected cells” includes stably transformed or transfected cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as transiently transformed or transfected cells which express the inserted DNA or RNA for limited periods of time.
The polynucleotide sequences contemplated herein may be present in expression vectors. For example, the vectors may comprise: (a) a polynucleotide encoding an ORF of a protein; (b) a polynucleotide that expresses an RNA that directs RNA-mediated binding, nicking, and/or cleaving of a target DNA sequence; or both (a) and (b). The polynucleotide present in the vector may be operably linked to a prokaryotic or eukaryotic promoter. “Operably linked” refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with a second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Operably linked DNA sequences may be in close proximity or contiguous and, where necessary to join two protein coding regions, in the same reading frame. Vectors contemplated herein may comprise a heterologous promoter (e.g., a eukaryotic or prokaryotic promoter) operably linked to a polynucleotide that encodes a protein. A “heterologous promoter” refers to a promoter that is not the native or endogenous promoter for the protein or RNA that is being expressed.
As used herein, “expression” refers to the process by which a polynucleotide is transcribed from a DNA template (such as into an mRNA or other RNA transcript) and/or the process by which a transcribed mRNA is subsequently translated into peptides, polypeptides, or proteins. Transcripts and encoded polypeptides may be collectively referred to as “gene product.” If the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell.
The term “vector” refers to some means by which nucleic acid (e.g., DNA) can be introduced into a host organism or host tissue. There are various types of vectors including plasmid vectors, bacteriophage vectors, cosmid vectors, bacterial vectors, and viral vectors. As used herein, a “vector” may refer to a recombinant nucleic acid that has been engineered to express a heterologous polypeptide (e.g., the fusion proteins disclosed herein). The recombinant nucleic acid typically includes cis-acting elements for expression of the heterologous polypeptide. Any of the conventional vectors used for expression in eukaryotic cells may be used for directly introducing DNA into a subject. Expression vectors containing regulatory elements from eukaryotic viruses may be used in eukaryotic expression vectors (e.g., vectors containing SV40, CMV, or retroviral promoters or enhancers). Exemplary vectors include those that express proteins under the direction of such promoters as the SV40 early promoter, SV40 late promoter, metallothionein promoter, human cytomegalovirus promoter, murine mammary tumor virus promoter, and Rous sarcoma virus promoter. Expression vectors as contemplated herein may include eukaryotic or prokaryotic control sequences that modulate expression of a heterologous protein (e.g., the fusion protein disclosed herein).
Certain proteins or polypeptide sequences disclosed herein (e.g., split inteins) may include “wild type” proteins and variants, mutants, and derivatives thereof. As used herein, the term “wild type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms. As used herein, a “variant,” “mutant,” or “derivative” refers to a protein molecule having an amino acid sequence that differs from a reference protein or polypeptide molecule. A variant or mutant may have one or more insertions, deletions, or substitutions of an amino acid residue relative to a reference molecule. A variant or mutant may include a fragment of a reference molecule. For example, a mutant or variant molecule may have one or more insertions, deletions, or substitutions of at least one amino acid residue relative to a reference polypeptide.
II. Abbreviations
III. Predictable Engineered Genetic Circuit
The technical field of the disclosed platform technology relates to biological engineering in mammalian synthetic biology. Mammalian cells can be programmed for numerous applications, ranging from customized cell-based therapeutics to tools for probing fundamental biological questions. To date, however, the tools available for composing such biological programs are limited in number, and tuning the performance of such biological parts is challenging, limiting the scope of applications that can be pursued. To meet this need, a Composable Mammalian Elements of Transcription (COMET) toolkit has been developed, and the current technology platform builds onto the COMET toolkit, making it more precise and, simultaneously, more versatile and genetically compact, by incorporating into the genetic circuits split parts (e.g., by utilizing “split inteins”). A basic COMET toolkit comprises a suite of engineered proteins that regulate gene expression, including both activation and suppression of gene expression, and engineered DNA sequences that are regulated by these engineered proteins. Both the proteins and the cognate DNA sequences are modular in design, enabling one to tune the quantitative performance of the system and to multiplex these elements to build sophisticated, customized, cellular functions. The incorporation of split parts can aid in the splicing or dimerization of the engineered proteins, thus improving the COMET system by adding a layer of post-translational control that was previously unutilized.
Thus, the present disclosure provides a platform for accurate genetic program design by engineering new parts (e.g., “split parts” utilizing “split inteins”) that combine transcriptional and post-translational control and which are validated by a computational modeling framework. The experimental observations disclosed herein (see Examples 1-6, below) closely matched simulations, even in scenarios employing new proteins (including those with many domains) and new topologies (including those with many interacting parts), demonstrating a high predictive capacity across a range of complexity (
The disclosed genetic circuits comprise an expression vector and one or more (e.g., 1, 2, 3, 4, 5, or more) engineered proteins which comprise at least two functional domains: (i) a DNA binding domain, and (ii) a transcription modulation domain, which may activate or inhibit transcription of the expression vector. In general, the DNA binding domain of the one or more engineered proteins is capable of binding to a DNA binding site on the expression vector. In some embodiments, the expression vector may comprise a minimal promoter and a gene of interest, such as a reporter gene (e.g., a gene encoding a fluorescent protein or another detectable protein/peptide/signal). The presently disclosed genetic circuits are unique compared to other platforms because the engineered proteins of the present system comprise one or more (e.g., 1, 2, 3, 4, 5, or more) split inteins. Split inteins are short peptide elements comprising complementary domains that fold and trans-splice to covalently ligate flanking domains. Accordingly, the incorporation of split inteins into the one or more engineered proteins allows for post-translational modification of the engineered proteins in response to stimuli from the system in which the genetic circuit is employed.
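By way of non-limiting illustration, the way in which split inteins can add post-translational control to such a circuit may be sketched in Python as a purely qualitative toy model. The model assumes one hypothetical arrangement in which a DNA binding domain (here labeled ZF1) is expressed with the N-terminal half of a split intein and a transcription activator domain (here labeled VP64) is expressed with the C-terminal half, so that a functional activator exists only when both fragments are present. The class and function names (`Fragment`, `trans_splice`, `reporter_expressed`) are hypothetical, and the sketch is not the computational model referred to elsewhere in this disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fragment:
    """A protein fragment carrying functional domains and one intein half."""
    domains: Tuple[str, ...]      # flanking functional domains
    intein_half: str              # "N" or "C" half of the split intein
    intein_name: str = "gp41-1"   # halves must derive from the same intein


def trans_splice(
    n_frag: Optional[Fragment], c_frag: Optional[Fragment]
) -> Optional[Tuple[str, ...]]:
    """Ligate flanking domains when complementary intein halves meet.

    Returns the domains of the spliced product (intein excised), or None
    if either fragment is absent or the halves are not complementary.
    """
    if n_frag is None or c_frag is None:
        return None
    if n_frag.intein_half != "N" or c_frag.intein_half != "C":
        return None
    if n_frag.intein_name != c_frag.intein_name:
        return None
    return n_frag.domains + c_frag.domains


def reporter_expressed(
    protein: Optional[Tuple[str, ...]],
    binding_site: str = "ZF1",
    activator: str = "VP64",
) -> bool:
    """Toy rule: expression requires the cognate DNA binding domain and an
    activator domain to be present in a single spliced protein."""
    return protein is not None and binding_site in protein and activator in protein


dbd_fragment = Fragment(domains=("ZF1",), intein_half="N")
ad_fragment = Fragment(domains=("VP64",), intein_half="C")

# Evaluate every combination of fragment presence (a two-input truth table).
truth_table = {
    (has_dbd, has_ad): reporter_expressed(
        trans_splice(
            dbd_fragment if has_dbd else None,
            ad_fragment if has_ad else None,
        )
    )
    for has_dbd in (False, True)
    for has_ad in (False, True)
}
```

In this toy model the reporter is expressed only when both fragments are present, which corresponds to a two-input AND behavior. The sketch ignores expression levels, splicing efficiency, and binding kinetics, all of which would be needed for any quantitative prediction.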
In some embodiments, the DNA binding domain of an engineered protein may comprise, for example, all of or a functional fragment of a zinc finger domain, such as ZF1, ZF2, ZF3, ZF4, ZF5, ZF6, ZF7, ZF8, ZF9, ZF10, ZF11, ZF12, ZF13, ZF14, or ZF15. In some embodiments, the DNA binding domain of an engineered protein may comprise, for example, all of or a functional fragment of a zinc finger protein comprising more than three DNA-binding domains, other classes of programmable DNA binding domains (e.g., transcription activator-like effector (TALE)), DNA binding domains derived from microbial proteins (e.g., tetR, lacI, etc.), and/or Cas9 or variants of Cas9 and other Cas proteins, including catalytically inactive variants (e.g., dCas9).
In some embodiments, a transcription activator domain of an engineered protein may comprise, for example, all of or a functional fragment of Herpes simplex virus protein 16 (VP16), a synthetic tetramer of VP16 (VP64), nuclear factor (NF) kappa-B (p65) or a subunit thereof, heat shock transcription factor 1 (HSF1), replication and transcription activator (RTA) of the gamma-herpesvirus family, p53, acidic domains (also known as “acid blobs” or “negative noodles,” rich in D and E amino acids, present in Gal4, Gcn4 and VP16), glutamine-rich domains (which may comprise multiple repetitions like “QQQXXXQQQ,” like those present in transcription factor Sp1), proline-rich domains (which may comprise repetitions like “PPPXXXPPP,” like those present in c-jun, AP2, and Oct-2), isoleucine-rich domains (which may comprise repetitions of “IIXXII,” like those present in NTF-1), and/or multipartite activators, such as VP64-p65-Rta (i.e., “VPR”; see Chavez et al., Nat Methods. 2015 Apr.; 12(4): 326-328).
In some embodiments, a transcription inhibition/inhibitor domain of an engineered protein may comprise, for example, all of or a functional fragment of ZF, DsRed-ZF, KRAB, Polycomb complexes, any domain that can fulfill a similar function for inhibition as a bulky DsRed variant, any domain which sterically occludes recruitment of the RNA polymerase complex or accessory factors, and/or chromatin modification modalities including histone de-acetylation, histone methylation, etc., as reviewed in Beisel & Paro, Nature Reviews Genetics, 12:123-135 (2011).
In some embodiments, a split intein (or one or more split inteins) may be incorporated into an engineered protein between the DNA binding domain and the transcription modulation domain (e.g., transcription activation domain or transcription inhibition domain). In some embodiments, a split intein (or one or more split inteins) may be incorporated onto the N-terminus of an engineered protein. In some embodiments, a split intein (or one or more split inteins) may be incorporated onto the C-terminus of an engineered protein. In some embodiments, a split intein (or one or more split inteins) may be incorporated onto both the N-terminus and the C-terminus of an engineered protein.
Many split inteins are known in the art, including but not limited to, gp41-1, Npu DnaE, Ssp DnaE, Mtu RecA, Sce VMA, Ssp DnaB-S0, Ssp DnaB-S1, and Ssp GyrB-S11. Any of these known split inteins may be incorporated into an engineered protein for the purposes of the disclosed genetic circuits. In some embodiments, the split intein is or comprises gp41-1 or a sequence derived therefrom. In some embodiments, an engineered protein of the present disclosure may comprise one, two, or all of SEQ ID NOs: 1, 2, and/or 3, or an amino acid sequence that possesses at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 1, 2, and/or 3.
The disclosed strategy and examples can be used to prepare further circuits beyond those expressly disclosed. Key strategies that enabled sophisticated design included the use of antagonistic bifunctionality (48), in which a component can exert opposing effects on a target gene depending on the other components in the circuit (
The disclosed genetic circuit platform comprising split inteins may be integrated with previously described technology related to the use of Modular Extracellular Sensor Architecture (MESA). MESA technology is known in the art. (See e.g., Rachel M. Dudek, Ph.D. Dissertation entitled “Engineering Multiparametric Evaluation of Environmental Cues by Mammalian Cell-based Devices,” Northwestern University, August 2015; Daringer et al., “Modular Extracellular Sensor Architecture for Engineering Mammalian Cell-based Devices,” Nichole M. Daringer, Rachel M. Dudek, Kelly A. Schwarz, and Josh N. Leonard, ACS Synth. Biol. 2014, 3, 892-902, published Feb. 25, 2014; and international publication WO 2013/022739, published on Feb. 14, 2013; the contents of which are incorporated herein by reference in their entireties).
MESA systems typically include a pair of extracellular receptors where both receptors of the pair contain a ligand binding domain and transmembrane domain, and one receptor contains a protease cleavage site and a functional domain (e.g., a transcription regulator that promotes transcription or a transcription regulator that inhibits transcription) and the other receptor contains a protease domain. As used herein, a transcription regulator may include a transcription factor that promotes transcription (e.g., by recruiting additional cellular components for transcription) and/or a transcription inhibitor or transcription repressor. In some embodiments of the disclosed subject matter, a MESA receptor may comprise a transcription factor or transcription inhibitor as described herein for use in the technology platform as described herein.
The disclosed genetic circuit platform comprising split inteins may be integrated with previous technology related to the use of TANGO assays. (See Barnea et al., “The genetic design of signaling cascades to record receptor activation,” Proc Natl Acad Sci USA. 2008 Jan. 8; 105(1):64-69; the content of which is incorporated herein by reference in its entirety). In some embodiments of the disclosed subject matter, a TANGO assay and/or a receptor utilized in a TANGO assay may comprise a transcription factor or transcription inhibitor as described herein for use in the technology platform as described herein.
The disclosed genetic circuit platform comprising split inteins may be integrated with previous technology related to the use of synNotch assays. (See Morsut et al., “Engineering Customized Cell Sensing and Response Behaviors Using Synthetic Notch Receptors,” Cell. 2016 Feb. 11; 164(4): 780-791; the content of which is incorporated herein by reference in its entirety). In some embodiments of the disclosed subject matter, a synNotch pathway and/or a receptor utilized in a synNotch pathway may comprise or utilize a transcription factor or transcription inhibitor as described herein for use in the technology platform as described herein.
IV. Methods of Using the Disclosed Engineered Circuits
The present disclosure provides methods in which a host cell may be transiently or non-transiently transfected (i.e., stably transfected) with one or more vectors described herein. In some embodiments, a cell is transfected as it naturally occurs in a subject (i.e., in situ). In some embodiments, a cell that is transfected is taken from a subject (i.e., explanted). In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. Suitable cells may include stem cells (e.g., embryonic stem cells and pluripotent stem cells). A cell transfected with one or more vectors described herein may be used to establish a new cell line comprising one or more vector-derived sequences. In the methods contemplated herein, a cell may be transiently transfected with the components of a system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of a complex, in order to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence.
The presently disclosed methods may include delivering one or more polynucleotides, such as one or more vectors as described herein and/or one or more proteins transcribed therefrom, to a host cell. Further contemplated are host cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells. Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids in mammalian cells or target tissues. Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.
V. Applications and Advantages of the Disclosed Platform
In one aspect, the disclosed compositions and models can be used in methods comprising mammalian cell-based therapies for treating diverse diseases including cancer, autoimmune disease, and metabolic diseases. In another aspect, the disclosed compositions and models can be used in methods of biomanufacturing using cells engineered to perform sophisticated functions and improve yield, quality, and efficiency of a biologic product. In another aspect, the disclosed compositions and models can be used in methods of gene therapy comprising delivery of compact genetic programs using parts/strategies described in this invention. In another aspect, the disclosed compositions and models can be used in methods of preparing stem cell-based products (e.g., for therapy or research) in which differentiation is controlled by a genetic program built using this technology platform.
Further applications of the disclosed technology platform may include, but are not limited to: (i) engineered cell-based therapies for cancer, autoimmune disease, regenerative medicine, and many other diseases; (ii) investigating fundamental biological questions (research), for example by expressing transgenes in mammalian cells at various levels or only under certain conditions; and (iii) control of gene expression in biotechnology, for example production of recombinant proteins in mammalian cells.
The disclosed compositions and models enable construction of multi-tasking proteins, which carry out multiple transcriptional or post-translational activities, and enable the preparation/construction of genetically compact designs for executing complex operations, e.g., implementing two logic gates simultaneously. The disclosed compositions and models enable use of a modeling framework for designing genetic programs and predicting circuit performance based upon descriptions of the component parts, and the disclosed circuits are interoperable with existing technologies, e.g., they can be integrated with upstream COMET-compatible sensors.
This technology enables the construction of cell-based therapies and biomanufacturing platforms that perform functions not achievable with existing approaches. This capability enables the engineering of cell-based therapies that are safer and more effective (e.g., cancer treatments that better distinguish a tumor from healthy tissue to maximize treatment and minimize toxicity). This technology also enables the creation of novel types of therapeutic cells (e.g., stem cell-derived products that cannot be manufactured with existing protocols). This technology also enables the engineering of cells for biomanufacturing, such as for cells to scale up and adapt to culture conditions, or to control the timing/rate/yield of production of a biologic product.
Further advantages of the disclosed technology platform may include, but are not limited to: (i) the disclosed technology comprises a set of comparable transcription factors which recognize orthogonal binding sites and can therefore be multiplexed and used in combination to perform different tasks within a single cell; and (ii) many different parameters are readily tunable in the disclosed technology using either design-driven or experimentally identified variations in the engineered proteins and/or DNA sequences of the disclosed technology.
Altogether, the present disclosure greatly expands the mammalian genetic program design space. In the presently disclosed system, one can propose and formulate models for candidate designs based on principles for how the functionally modular parts operate and then evaluate outcomes in silico. Indeed, it is possible to further automate this process by using software to sweep large combinatorial spaces and identify candidates that satisfy specified performance objectives. Such advances could further speed up the design process and broaden the scope of possible circuits and behaviors beyond those accessible solely by intuition. The new components and quantitative approaches developed here will enable bioengineers to build customized cellular functions for applications ranging from fundamental research to biotechnology and medicine.
The following examples are given to illustrate the present invention. It should be understood, however, that the invention is not to be limited to the specific conditions or details described in these examples.
Experimental Method Details
Split Inteins
The COMET toolkit was expanded by incorporating gp41-1 (
The protein sequence for intN was:
The protein sequence for intC was:
The mutagenesis investigation (
Plasmid Cloning and Purification
Genetic components that were used in this study are listed in Table 1. Plasmids were designed in SnapGene (GSL Biotech LLC), and primers were ordered from Integrated DNA Technologies. Several domains were sourced from Donahue, et al. (60) but originated prior to the COMET study: VP16 and ZF domains are from Khalil, et al. (61), VP64 is from Chavez, et al. (Addgene #63798) (62), FRB and FKBP are from Daringer, et al. (Addgene #58876, #58877) (63), and DsRed-Express2 was a gift.
i Generated and used in the COMET study (60)
ii Mutations in intN fiveA: K41A, K43A, K45A, K48A, E52A
iii Mutations in intC sixA: K92A, K93A, E98A, E99A, E102A, E104A
Split inteins were from Hermann, et al. (Addgene #51267, #51268) (53). The ABA-binding domains PYL1 and ABI1 (64-68) from Gao, et al. (69) were utilized to make ABA-ZFa. The PEST tag was from the mouse ornithine decarboxylase gene (70). Two types of plasmid backbones were used: pcDNA (pPD005, Addgene #138749), which was modified from Thermo Fisher Scientific #V87020 as described by Donahue, et al. (60); and a series of transcription unit positioning vectors (TUPVs), which were derived from the modified pcDNA and previously published by Donahue, et al. (60), and based upon the mMoClo system from Duportet, et al. (71). Insulator sequences in TUPVs were from Bintu, et al. (Addgene #78099) (72).
Cloning was performed primarily using standard PCR, restriction, and ligation methods (reagents from New England Biolabs and Thermo Fisher Scientific), and in some cases through Golden Gate assembly, followed by transformation into chemically competent TOP10 E. coli (Thermo Fisher Scientific). Transformed E. coli were grown on LB/Ampicillin agar plates at 37° C., colonies were picked and grown in liquid LB/Ampicillin cultures, plasmid DNA was isolated (E.Z.N.A. plasmid mini kit, Omega Bio-tek), and DNA inserts were sequence-verified (ACGT, Inc.). Plasmids were prepared using polyethylene glycol-based extraction as described previously (60). DNA purity and concentration were measured using a Nanodrop 2000 (Thermo Fisher Scientific).
Mammalian Cell Culture
HEK293FT cells were cultured in complete DMEM medium containing 1% DMEM powder (Gibco #31600091), 0.35% w/v D-glucose (Sigma #50-99-7), 0.37% w/v sodium bicarbonate (Fisher #S233-500), 10% heat-inactivated FBS (Gibco #16140071), 4 mM L-glutamine (Gibco #25030081), and 100 U ml−1 penicillin and 100 μg ml−1 streptomycin (Gibco #15140122) in tissue culture-treated 10 cm dishes (Corning #500001672) at 37° C. in 5% CO2. To passage, medium was aspirated, and cells were washed in PBS, incubated in trypsin-EDTA (Gibco #25300054; 37° C., 5 min), detached by tapping the dish, and resuspended in fresh medium and plated. This cell line tested negative for Mycoplasma using the MycoAlert Mycoplasma detection kit (Lonza #LT07-318).
Transfection
Cells were plated in 24-well plates (Corning #3524; 3×10^5 cells ml−1, 0.5 ml per well) and transfected after adhering to the plates, typically between 8-14 h after plating. Transfections were carried out using the calcium phosphate protocol (60): plasmids were mixed together in defined amounts, CaCl2 (2 M, 15% v/v) was added, and this solution was pipetted dropwise into an equal volume of 2×HEPES-buffered saline (500 mM HEPES, 280 mM NaCl, 1.5 mM Na2HPO4); the solution was gently pipetted four times, and three minutes later it was vigorously pipetted 20 times and added dropwise onto plated cells. In this study, DNA doses are reported in plasmid mass (ng) per well of cells or gene copies per well of cells. In each transfection experiment, “empty vector” (pPD005) was included in the transfection mix to maintain a consistent total mass of DNA per well. At one day after plating, medium was aspirated and replaced with fresh medium. In some experiments, the fresh medium contained vehicle or ligand. In
Flow Cytometry
Samples were prepared for flow cytometry generally at 40-48 h post-transfection. For each well, medium was aspirated, five drops of PBS were added, PBS was aspirated, and two drops of trypsin-EDTA were added. Cells were incubated (37° C., 5 min), plates were tapped to detach cells, and four drops of cold (4° C.) DMEM were added. The contents of each well were pipetted up and down several times to detach cells and pipetted into FACS tubes containing FACS buffer (FB; 2 ml; PBS pH 7.4, 5 mM EDTA, 0.1% w/v BSA). Tubes were centrifuged (150×g, 5 min), liquid was decanted, and two drops of FB were added. Samples were kept on ice and wrapped in foil, and then run on a BD LSR Fortessa special order research product using the following configuration: Pacific Blue channel with 405 nm excitation laser and 450/50 nm filter for EBFP2; FITC channel with 488 nm excitation laser and 505LP 530/30 nm filter for EYFP; and PE-Texas Red channel with 552 nm excitation laser and 600LP 610/20 nm filter for mKate2. Approximately 10^4 live single-cell events were collected per sample.
Flow Cytometry Data Analysis
Flow cytometry data were analyzed using FlowJo software (FlowJo, LLC) to gate on single-cell (FSC-A vs. FSC-H) and live (FSC-A vs. SSC-A) bases, compensated using compensation control samples, and gated as transfection-positive (
Nomenclature
Genes were named by their protein domains in order from N-terminus to C-terminus. Domains were generally connected by flexible linkers comprising glycine and serine. Several abbreviations were used: ZFa was an AD-ZF for any choice of AD and ZF; similarly, RaZFa was an AD-FRB and FKBP-ZF, and ABA-ZFa was an AD-PYL1 and ABI1-ZF. DsRed refers to wild type DsRed-Express2, and DsDed was a DsRed-Express2 R95K mutant. This is a streamlined nomenclature that differs from that used in the original COMET report (60), in that inhibitors do not use ZFi notation: ZFi is now termed ZF, and DsRed-ZFi is now termed DsRed-ZF.
The constitutive promoters used were CMV and EF1α. The inducible promoters used were COMET promoters, which were named as “[ZF domain]x[number of binding sites]-[binding site arrangement]”. For example, ZF1x6-C has six compact sites for ZF1. There were two non-standard cases: ZF1/2x6-C has six compact overlapping sites for ZF1 and ZF2 (up to six sites occupied, and up to six per ZF); (ZF2/ZF6)x3 has six compact sites alternating between ZF2 and ZF6 (up to six sites occupied, and up to three per ZF).
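The standard promoter naming form described above can be read mechanically. The following is a minimal sketch, assuming only the standard "[ZF domain]x[number of binding sites]-[binding site arrangement]" form; the names `CometPromoter` and `parse_promoter` are hypothetical and introduced here for illustration, and the two non-standard cases are deliberately rejected rather than interpreted.

```python
import re
from typing import NamedTuple


class CometPromoter(NamedTuple):
    """Fields of a standard COMET promoter name (hypothetical container)."""
    zf_domain: str    # e.g. "ZF1"
    n_sites: int      # number of binding sites, e.g. 6
    arrangement: str  # binding site arrangement, e.g. "C" for compact


# Pattern for the standard form only, e.g. "ZF1x6-C".
_STANDARD_NAME = re.compile(r"^(ZF\d+)x(\d+)-([A-Za-z]+)$")


def parse_promoter(name: str) -> CometPromoter:
    """Split a standard promoter name into its three fields."""
    match = _STANDARD_NAME.match(name)
    if match is None:
        # Non-standard names such as "ZF1/2x6-C" or "(ZF2/ZF6)x3" land here.
        raise ValueError(f"not a standard COMET promoter name: {name!r}")
    zf_domain, n_sites, arrangement = match.groups()
    return CometPromoter(zf_domain, int(n_sites), arrangement)


print(parse_promoter("ZF1x6-C"))
```

Rejecting the non-standard names keeps the sketch from guessing how overlapping or alternating sites should be counted.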
Statistical Analysis
Each sensor in
Computational Method Details
Overview
This section describes the extension of the explanatory model previously developed for COMET TFs (60) to a predictive model incorporating split intein-mediated splicing and other attributes. Rules for formulating systems of ODEs are provided to support the formulation of models for new genetic circuits based on the genetic parts from this study.
A Statistical Model for Gene Expression Heterogeneity
This modeling approach accounts for variation in gene expression—including differences in the expression of a gene between cells and differences in the expression of genes within a cell—in a cell population. A population matrix was generated using the constrained sampling method (60, 73), which was used here to describe the distribution of gene expression observed when cells were harvested via trypsin digest (as noted in
Each entry of the population matrix Z, indexed by the ith row (cell) and pth column (plasmid), was a scalar for the relative expression of a gene. This value was used as a multiplier in the production term for each RNA species.
$$\text{Production rate} = Z_{i,p} \cdot k_{\text{txEF1a}} \cdot \text{dose} \qquad (1)$$
A dynamical model was run separately for each cell in the simulated population, and the mean end-point simulated reporter protein level for the population was calculated. Layering the statistical model on the dynamical models at the level of RNA production enables the simulations to account for cell-to-cell variation (e.g., this method incorporates potential outlier effects that could skew a population mean) and therefore should generally enable better predictions (73).
Some figures employ a standard single-cell (homogeneous) model, as indicated in the figure description. These cases forgo the incorporation of heterogeneity and instead simulate the mean-transfected cell, which represents a scenario for average gene expression from each plasmid.
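The difference between the heterogeneous population model and the mean-transfected-cell model can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions: the constrained sampling method is not reproduced here, so lognormal multipliers stand in for the population matrix Z; the saturating `toy_reporter` readout stands in for the full dynamical model; and the values of `K_TX` and `DOSE_NG` are hypothetical.

```python
import random


def make_population(n_cells, n_plasmids, seed=0):
    """Stand-in for Z: lognormal multipliers rescaled so each column has mean 1."""
    rng = random.Random(seed)
    z = [[rng.lognormvariate(0.0, 1.0) for _ in range(n_plasmids)]
         for _ in range(n_cells)]
    for p in range(n_plasmids):
        col_mean = sum(row[p] for row in z) / n_cells
        for row in z:
            row[p] /= col_mean
    return z


def rna_production_rate(z_ip, k_tx, dose_ng):
    """Production term with the per-cell, per-plasmid multiplier (form of Eq. 1)."""
    return z_ip * k_tx * dose_ng


def toy_reporter(rate):
    """Hypothetical saturating end-point readout, not the model from the text."""
    return rate / (1.0 + rate)


K_TX, DOSE_NG = 0.01, 200.0  # hypothetical values for illustration
Z = make_population(n_cells=1000, n_plasmids=1)

# Heterogeneous model: evaluate every simulated cell, then average.
population_mean = sum(
    toy_reporter(rna_production_rate(row[0], K_TX, DOSE_NG)) for row in Z
) / len(Z)

# Homogeneous model: evaluate a single mean-transfected cell (multiplier of 1).
mean_cell = toy_reporter(rna_production_rate(1.0, K_TX, DOSE_NG))

print(population_mean, mean_cell)
```

Because the toy readout is nonlinear, averaging over cells and evaluating the average cell give different answers, which is the motivation for layering the statistical model on the dynamical model.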
Dynamical Models
Genetic programs were represented by systems of ODEs. State variables included RNA and protein species in arbitrary concentration units. Processes included transcription (constitutive, inducible, inhibitable), RNA degradation, protein translation, split intein-mediated splicing, small molecule-based reconstitution, and protein degradation. Parameter values are in Table 2.
Constitutive transcription from the EF1α promoter or CMV promoter was proportional to plasmid dose (ng).
$$\text{RNA production rate} = k_{\text{txEF1a}} \cdot \text{dose} \qquad (2)$$
$$\text{RNA production rate} = k_{\text{txCMV}} \cdot \text{dose} \qquad (3)$$
Functions for regulated transcription were broadly represented by f. The dose term d for a regulated gene was empirically defined (i.e., based on a heuristic) and calculated by dividing the plasmid dose (ng) by 200 ng; then, the square root of this fraction was used, e.g., for 200 ng, $d^{1/2} = 1$, and for 50 ng, $d^{1/2} = 0.5$. The 200 ng dose was defined as a reference point because this was the dose of reporter plasmid used in the original characterization (60).
$$\text{RNA production rate} = k_{\text{txZF}} \cdot d^{1/2} \cdot f \qquad (4)$$
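The heuristic dose term can be checked numerically. The sketch below reproduces the two worked values given in the text (200 ng and 50 ng) and applies the dose term in the regulated production rate of Equation 4; the function names and the example values of the rate constant and f are hypothetical.

```python
import math

REFERENCE_DOSE_NG = 200.0  # reference reporter plasmid dose stated in the text


def dose_term(dose_ng):
    """Return the square root of d, where d is plasmid dose divided by 200 ng."""
    return math.sqrt(dose_ng / REFERENCE_DOSE_NG)


def regulated_rna_production(k_tx_zf, dose_ng, f):
    """RNA production rate for a regulated gene (form of Eq. 4)."""
    return k_tx_zf * dose_term(dose_ng) * f


print(dose_term(200.0))  # 1.0
print(dose_term(50.0))   # 0.5
print(regulated_rna_production(k_tx_zf=2.0, dose_ng=50.0, f=0.5))  # 0.5
```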
ZFa-inducible transcription uses the COMET model formulation, in which b is TF-independent (background) transcription, m is the maximal activation, and w is a steepness parameter. Activation mediated by AD-ZF-containing proteins that also contain intC, intN, or additional ZF domains was modeled equivalently to that of a base case ZFa. The TF variable refers to the simulated amount of TF protein, not to plasmid dose.
ZFa-inducible transcription can be inhibited by a ZF, which sterically blocks the activator from binding to sites in a promoter. Inhibition mediated by ZF proteins that also contain intC, intN, FKBP, or additional ZF domains was modeled equivalently to that of a base case ZF. The subscripts A and I denote parameters for an activator and inhibitor, respectively.
ZFa-inducible transcription can also be inhibited by a DsDed-ZF, which acts through a dual mechanism of steric inhibition and reduction of effective promoter cooperativity. The effect of the latter mechanism is that at increasing strength or dose of inhibitor compared to activator, the cooperativity represented by m ramps down to an effective value of 1. Inhibition mediated by DsDed-ZF-containing proteins that also contain intC, intN, or additional ZF domains was modeled equivalently to that of a base case DsDed-ZF.
RNA degradation is represented as a first-order process.
$$\text{RNA degradation rate} = -k_{\text{degRNA}} \cdot \text{Species}_{\text{RNA}} \qquad (8)$$
Protein translation is also first order.
$$\text{Translation rate} = k_{\text{tl}} \cdot \text{Species}_{\text{RNA}} \qquad (9)$$
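For a constitutively transcribed gene, the production and first-order degradation terms above imply that RNA relaxes toward a steady state equal to the production rate divided by the degradation rate constant. The following is a minimal sketch using forward Euler integration; the parameter values are hypothetical and are not the values in Table 2.

```python
def simulate_rna(k_tx, dose_ng, k_deg_rna, t_end_h, dt=0.01):
    """Integrate d(RNA)/dt = k_tx * dose - k_deg_rna * RNA from RNA = 0."""
    rna = 0.0
    for _ in range(int(round(t_end_h / dt))):
        production = k_tx * dose_ng      # constitutive transcription term
        degradation = -k_deg_rna * rna   # first-order RNA degradation term
        rna += (production + degradation) * dt
    return rna


def translation_rate(k_tl, rna):
    """First-order translation: protein production proportional to RNA."""
    return k_tl * rna


# Hypothetical values chosen only to make the arithmetic easy to follow.
rna_end = simulate_rna(k_tx=0.01, dose_ng=200.0, k_deg_rna=1.0, t_end_h=42.0)
analytic_steady_state = 0.01 * 200.0 / 1.0

print(rna_end, analytic_steady_state)
print(translation_rate(k_tl=3.0, rna=rna_end))
```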
Splicing is a second-order reaction between an intN-containing protein and an intC-containing protein with a fitted rate constant.
$$\text{Splicing rate} = k_{\text{rec}} \cdot \text{Species1}_{\text{Protein}} \cdot \text{Species2}_{\text{Protein}} \qquad (10)$$
For example, the following terms represent the splicing of A-intN-B and X-intC-Y to A-Y and X-intC/intN-B, where A, B, X, and Y can be DNA-binding, activating, or inhibitory domains or no domain.
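$$\text{A-intN-B}_{\text{Protein}}\ \text{splicing rate} = -k_{\text{rec}} \cdot \text{A-intN-B}_{\text{Protein}} \cdot \text{X-intC-Y}_{\text{Protein}} \qquad (11)$$
$$\text{X-intC-Y}_{\text{Protein}}\ \text{splicing rate} = -k_{\text{rec}} \cdot \text{A-intN-B}_{\text{Protein}} \cdot \text{X-intC-Y}_{\text{Protein}} \qquad (12)$$
$$\text{A-Y}_{\text{Protein}}\ \text{splicing rate} = k_{\text{rec}} \cdot \text{A-intN-B}_{\text{Protein}} \cdot \text{X-intC-Y}_{\text{Protein}} \qquad (13)$$
$$\text{X-intC/intN-B}_{\text{Protein}}\ \text{splicing rate} = k_{\text{rec}} \cdot \text{A-intN-B}_{\text{Protein}} \cdot \text{X-intC-Y}_{\text{Protein}} \qquad (14)$$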
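The four splicing terms share a single second-order rate, consumed by the two reactants and gained by the two products. The sketch below makes that bookkeeping explicit; the function name and the numerical values are hypothetical.

```python
def splicing_terms(k_rec, a_intn_b, x_intc_y):
    """Return the splicing contribution to each species' rate of change."""
    rate = k_rec * a_intn_b * x_intc_y  # second-order splicing rate
    return {
        "A-intN-B": -rate,       # reactant consumed
        "X-intC-Y": -rate,       # reactant consumed
        "A-Y": rate,             # spliced product formed
        "X-intC/intN-B": rate,   # spliced byproduct formed
    }


terms = splicing_terms(k_rec=0.5, a_intn_b=2.0, x_intc_y=3.0)
print(terms)
print(sum(terms.values()))  # reactant losses balance product gains
```

Each splicing event removes one molecule of each reactant and yields one molecule of each product, so the four terms sum to zero.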
Small molecule-based reconstitution to form RaZFa uses the Heaviside function H with ligand treatment at time τ (hours) post-transfection. For simplicity, the krec parameter was also used to describe reconstitution.
$$\text{RaZFa reconstitution rate} = k_{\text{rec}} \cdot \text{AD-FRB}_{\text{Protein}} \cdot \text{FKBP-ZF}_{\text{Protein}} \cdot H(t - \tau) \qquad (15)$$
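The Heaviside factor simply switches the reconstitution term on at the time of ligand treatment. A minimal sketch follows; the convention that H equals 1 exactly at the switching time is an assumption made here, and the numerical values are hypothetical.

```python
def heaviside(x):
    """Step function: 0 before the switch, 1 at and after it (assumed convention)."""
    return 1.0 if x >= 0 else 0.0


def razfa_reconstitution_rate(k_rec, ad_frb, fkbp_zf, t_h, tau_h):
    """Reconstitution rate (form of Eq. 15) at time t_h with ligand added at tau_h."""
    return k_rec * ad_frb * fkbp_zf * heaviside(t_h - tau_h)


# Before ligand treatment the term is zero; afterwards it is second order.
print(razfa_reconstitution_rate(0.5, 2.0, 3.0, t_h=6.0, tau_h=12.0))   # 0.0
print(razfa_reconstitution_rate(0.5, 2.0, 3.0, t_h=18.0, tau_h=12.0))  # 3.0
```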
Prior to reconstitution, FKBP-ZF can act as a ZF-like inhibitor against RaZFa or ZFa at a target promoter.
Protein degradation is first order. Rate constants vary for non-intC-containing TFs, intC-containing TFs, and reporter protein, respectively.
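$$\text{Protein degradation rate} = -k_{\text{degZFP}} \cdot \text{TF}_{\text{Protein}} \qquad (17)$$
$$\text{Protein degradation rate} = -k_{\text{degintC}} \cdot \text{TF}_{\text{Protein}} \qquad (18)$$
$$\text{Protein degradation rate} = -k_{\text{degRep}} \cdot \text{Reporter}_{\text{Protein}} \qquad (19)$$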
As an example, the following system of equations represents the reconstitution of a ZFa and induction of a reporter. This system produces an AND gate for: if AD-intN and intC-ZF are present, then induce reporter.
Since the genetic parts employed for activation, inhibition, splicing, and dimerization exhibit functional modularity, one can utilize the formalisms described above to generate systems of equations to represent a variety of circuits. ODEs for the circuits in this study were provided in MATLAB files.
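To illustrate how the formalisms compose, the following sketch assembles the AND gate described above (AD-intN and intC-ZF splicing to form a ZFa that induces a reporter) from the individual rate terms. It is not the MATLAB code referenced in the text. Several elements are assumptions made for illustration: all parameter values are hypothetical rather than taken from Table 2, the activation function `f_activation` is an assumed saturating form built from b, m, and w, and forward Euler integration is used for simplicity.

```python
import math

# Hypothetical parameter values for illustration only (not the values in Table 2).
PARAMS = dict(k_tx_ef1a=0.01, k_tx_zf=1.0, k_deg_rna=1.0, k_tl=1.0, k_rec=1.0,
              k_deg_zfp=0.1, k_deg_intc=0.2, k_deg_rep=0.03,
              b=0.05, m=10.0, w=0.5)


def f_activation(zfa, p):
    """Assumed saturating form: equals b at zfa = 0 and approaches m at high zfa."""
    return (p["b"] + p["m"] * p["w"] * zfa) / (1.0 + p["w"] * zfa)


def simulate_and_gate(dose_ad_intn, dose_intc_zf, dose_reporter=200.0,
                      t_end_h=42.0, dt=0.01, p=PARAMS):
    """Return end-point reporter protein for the given input plasmid doses (ng)."""
    rna_n = rna_c = rna_r = 0.0            # RNAs: AD-intN, intC-ZF, reporter
    ad_intn = intc_zf = zfa = rep = 0.0    # proteins
    d_half = math.sqrt(dose_reporter / 200.0)  # heuristic dose term
    for _ in range(int(round(t_end_h / dt))):
        splice = p["k_rec"] * ad_intn * intc_zf  # second-order splicing
        d_rna_n = p["k_tx_ef1a"] * dose_ad_intn - p["k_deg_rna"] * rna_n
        d_rna_c = p["k_tx_ef1a"] * dose_intc_zf - p["k_deg_rna"] * rna_c
        d_rna_r = (p["k_tx_zf"] * d_half * f_activation(zfa, p)
                   - p["k_deg_rna"] * rna_r)
        d_ad_intn = p["k_tl"] * rna_n - splice - p["k_deg_zfp"] * ad_intn
        d_intc_zf = p["k_tl"] * rna_c - splice - p["k_deg_intc"] * intc_zf
        d_zfa = splice - p["k_deg_zfp"] * zfa
        d_rep = p["k_tl"] * rna_r - p["k_deg_rep"] * rep
        rna_n += d_rna_n * dt
        rna_c += d_rna_c * dt
        rna_r += d_rna_r * dt
        ad_intn += d_ad_intn * dt
        intc_zf += d_intc_zf * dt
        zfa += d_zfa * dt
        rep += d_rep * dt
    return rep


# Truth table over the two inputs (0 ng or 50 ng of each input plasmid).
for n_dose in (0.0, 50.0):
    for c_dose in (0.0, 50.0):
        print(n_dose, c_dose, simulate_and_gate(n_dose, c_dose))
```

In this sketch the reporter rises above background only when both input plasmids are present, because the ZFa species is produced solely by the splicing term.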
Parameterization
Some parameter values are from the COMET study (60) and others are newly estimated or fitted here (Table 2).
Ultrasensitivity
Ultrasensitivity is a type of nonlinear signal processing in which a small change in an input produces a large change in an output. It was demonstrated how this property can be achieved with engineered motifs such as a double inhibition cascade (
Diagrams
Genetic programs for digital functions are depicted using genetic diagrams and electronic diagrams. The former represents each promoter, protein, and regulatory interaction, and the latter represents the logic underlying these interactions.
The strategy that was pursued for genetic program design was uniquely enabled by the COmposable Mammalian Elements of Transcription (COMET): a toolkit of TFs and promoters with tunable properties enabling precise and orthogonal control of gene expression (13). These TFs comprised a ZF DNA-binding domain and a functional domain, e.g., VP16 and VP64 are activation domains (AD) that in combination with a ZF form an activator (ZFa). A protein including a ZF domain but lacking an AD can function as a competitive inhibitor of the cognate ZFa. Promoters in this library contained ZF binding sites arranged in different configurations (e.g., ZF1x6-C has six compactly arranged ZF1 sites). Each combination of a promoter and a ZFa (and potentially an inhibitor) conferred a characteristic level of transcriptional activity (
Although COMET includes many parts for implementing transcriptional regulation, complex genetic program design was facilitated by introducing a mechanism for regulation at the post-translational level (
As a first test of the predictive capacity of the revised model, a panel of circuits that could carry out various logic operations was simulated (
A versatile design framework would enable one to achieve a given performance objective via multiple circuits. Thus, the combined properties of COMET and splicing-based extensions were developed here to provide a sufficient basis for this capability. To investigate, four designs for a NIMPLY gate were compared, each of which utilizes a different mechanism (i.e., topology and/or choice of parts). The first two designs used inhibition mediated by ZF1 (
Across the panel, five of the eight gates exhibited a goodness of prediction metric (comparing all simulated and observed outcomes, Q2) of at least 90%, indicating a high capacity for predicting dose response landscapes that had not been used in model training (
A putative advantage of orthogonal parts like COMET TFs and promoters is that these parts may be used together without disrupting their functions. However, simply appending modules can lead to inefficient and cumbersome designs, and thus, one focus of the current approach was achieving genetic compactness as well as performance. Enhancing compactness could eliminate potential failure modes and reduce cargo size for gene delivery vehicles. Genetic compression—reducing the number of components for a given specification—has been investigated by using recombinase-mediated DNA rearrangement (35) and by borrowing from a software engineering strategy to eliminate redundancy (36). Here, it was sought to implement a previously unexplored form of topological compaction based on protein multi-tasking (
It was investigated whether functional modularity could enable the design of compact multi-input multi-output (MIMO) systems. Ultimately, this capability could support the encoding of sophisticated decision-making strategies in which cells take different actions in different situations. As a base case, a NIMPLY gate and a NOT gate were appended in a non-compact manner, and the combination functioned as expected (
Notably, when performance at the single-cell level was examined, some population-level outcomes were driven by subpopulations of cells. In some circuits, subpopulations induced one reporter or the other, but not both, and thus population outcomes were driven by shifts in subpopulation frequencies (
Although digital logic has many uses, biology also processes analog signals for many purposes, and it was next examined whether the disclosed tools could be employed in this way. The first property for which implementation was sought was ultrasensitivity, which is desirable in engineering sharp activation (37, 38) and is observed in the natural control of processes including cell growth, division, and apoptosis (39). The second property was bandpass concentration filtering, in which an output is produced only when the input falls within a certain range of magnitudes (22, 40). Bandpass concentration filtering is salient for both natural and synthetic spatial patterning (41). To develop a strategy for implementing these properties, mechanistic insights were used. It was determined that ZFa-mediated activation is cooperative at the level of transcription initiation, and in comparing promoter architectures, maximal transcription increased with the number and compactness of binding sites (13). This COMET promoter feature confers high inducibility as well as a high sensitivity to inhibition by proteins that compete for DNA binding. It was also deduced that TF binding to the promoter is generally non-cooperative, and transcriptional output from such promoters is not inherently ultrasensitive to ZFa dose (n=1). To construct systems that do exhibit ultrasensitivity (n>1), several strategies were examined in which the output is inhibited only at low activator doses (
Compared to a ZFa base case (n=1.0) (
Next, circuits to implement bandpass concentration filtering were investigated. The strategy was to use mechanisms that inhibit reporter output only at high doses of activator input, and the predictions were based on a fitted ZFa base case (
While the predictive design of genetic programs is a substantial technical advance in and of itself, employing this capability to enable many potential applications will require integrating genetic circuits with native or synthetic parts that sense and modulate the state of the cell or its environment. A recurring challenge associated with this goal is level-matching the output of a sensor to the input requirements of a downstream circuit (32, 42). It was investigated whether the disclosed designed circuits could overcome this challenge and be effectively linked to sensors without requiring laborious trial-and-error tuning. Simulations suggested that adding an upstream layer of signal processing (i.e., for sensing) is feasible, since in the model, ZFa can be arranged in series without prohibitively driving up background or dampening induced signal (
Two classes of synthetic sensors (intracellular and transmembrane) were considered for which it was hypothesized that signaling (i.e., sensor output) could be coupled to COMET-based circuits. For the intracellular sensor, a new TF, ABA-ZFa, which was analogous to RaZFa, was built by fusing the abscisic acid (ABA)-binding domains PYL1 and ABI1 (43) to an AD and a ZF, respectively. For transmembrane sensing, the modular extracellular sensor architecture (MESA), a self-contained receptor and signal transduction system that transduces ligand binding into orthogonal regulation of target genes (44, 45), was selected. In this mechanism, ligand-mediated dimerization of two transmembrane proteins called the target chain (TC) and protease chain (PC) promoted PC-mediated proteolytic trans-cleavage of a TC-bound TF. Several strategies were explored for building COMET-compatible MESA based on a recently reported improved MESA design (46) and the parts developed in the current study (
The two validated sensors were carried forward, and downstream circuits comprising genetic parts and designed topologies from this disclosure were examined to determine whether they could be seamlessly linked with the new input layer. To this end, a panel of four synonymous topologies was designed that implement AND logic through different mechanisms (
All patents and publications mentioned in the specification are indicative of the levels of those of ordinary skill in the art to which the disclosure pertains. All patents and publications are herein incorporated by reference to the same extent as if each individual publication was specifically and individually indicated to be incorporated by reference.
Further, one skilled in the art readily appreciates that the present disclosure is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those inherent therein. Modifications therein and other uses will occur to those skilled in the art. These modifications are encompassed within the spirit of the disclosure and are defined by the scope of the claims, which set forth non-limiting embodiments of the disclosure.
The present technology is not to be limited in terms of the particular embodiments described in this application, which are intended as single illustrations of individual aspects of the present technology. Many modifications and variations of the present technology can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the present technology, in addition to those enumerated herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the present technology. It is to be understood that the present technology is not limited to particular methods, reagents, compounds, compositions, or biological systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 63/079,882, filed Sep. 17, 2020, the entire contents of which are incorporated herein by reference.
This invention was made with government support under grant number EB026510 awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/050584 | 9/16/2021 | WO | |

Number | Date | Country |
---|---|---|
63079882 | Sep 2020 | US |