A central goal of synthetic biology is to achieve multi-signal integration and processing in living cells for diagnostic, therapeutic, and biotechnology applications. Digital logic has been used to build small-scale circuits, but other paradigms are needed for efficient computation in resource-limited cellular environments. Using fundamental properties of the scaling laws of thermodynamic noise with temperature and molecular count, which hold in both biological and electronic systems, the pros and cons of analog versus digital computation have been analyzed for neurobiological systems21 and for systems in cell biology20. These results show that analog computation is more efficient than digital computation in part count, speed, and energy consumption below a certain crossover computational precision20,21. For the limited computational precision seen in biological cells, analog computation therefore has benefits over digital computation.
Herein we demonstrate that synthetic analog gene circuits can be engineered to execute sophisticated computational functions in living cells using only a few interacting components, such as fewer than three transcription factors. Such synthetic analog gene circuits exploit positive and negative feedback to implement logarithmically linear sensing, addition, subtraction, and scaling thus enabling multiplicative, ratiometric, and power-law computations. The circuits exhibit Weber's Law behavior as in natural biological systems, operate over a wide dynamic range of up to four orders of magnitude, and can be architected to have tunable transfer functions. The molecular circuits described herein can be composed together to implement higher-order functions that are well-described by both intricate biochemical models and by simple mathematical functions. By exploiting analog building-block functions that are already naturally present in cells20,21, this paradigm efficiently implements arithmetic operations and complex functions in the logarithmic domain. Such circuits can open up new applications for synthetic biology and biotechnology that require complex computations with limited parts, that need wide-dynamic-range bio-sensing, or that would benefit from the fine control of gene expression. The molecular circuits described herein enable logarithmically linear analog computation within in-vitro and in-vivo systems with a broad class of molecules, all of which obey the Boltzmann exponential equations of thermodynamics that govern molecular association, attenuation, transformation, and degradation.
Examples of embodiments are provided herein and throughout the present application.
Accordingly, provided herein in some aspects are graded positive-feedback molecular circuits comprising
In some embodiments of these aspects, the circuit is executable in a cell, a cellular system, or an in vitro system.
In some embodiments of these aspects, the molecular species are selected from DNA, RNA, peptides, proteins, and small molecule inducers.
In some embodiments of these aspects, the proteins are one or more of transcription factors, nucleic acid binding proteins, enzymes, and hormones.
In some embodiments of these aspects, the RNA is one or more of a microRNA, a short-hairpin RNA, and antisense RNA.
In some embodiments of these aspects, strength of the graded positive feedback of the circuit is adjusted by altering any of the association, attenuation, transformation, or degradation strengths of any of the blocks or components in the feedback loop.
In some embodiments of these aspects, the Kd of binding of one molecular species to another is used to adjust the association, attenuation, transformation, or degradation strength of any of the blocks in the feedback circuit.
In some embodiments of these aspects, decoy or sequestration binding molecules or fragments of molecules serve to change the attenuation strength of any of the blocks/components in the feedback circuit.
In some embodiments of these aspects, degradation strength of any block is altered by adding one or more ssrA tags, antisense RNAs, microRNAs, proteases, degrons, PEST tags, or anti-sigma factors, in any block.
In some embodiments of these aspects, the circuit comprises low-copy plasmids and high-copy plasmids, each plasmid expressing one or more components of the association block, the control block, the transformation block, and the feedback block.
In some embodiments of these aspects, the attenuation strength of any block is altered by increasing a ratio of a high-copy plasmid number to a low-copy plasmid number.
In some embodiments of these aspects, graded positive feedback is used to widen a logarithmically linear range of transduction from an input molecular species to an output molecule.
Also provided herein, in some aspects, are molecular circuits for performing addition or weighted addition, wherein any of two outputs of an association, attenuation, transformation, or degradation block of any of the graded positive-feedback molecular circuits described herein is a common molecule.
In some aspects, provided herein are molecular circuits comprising at least two of any of the molecular circuits described herein, wherein the output slopes from any of these circuits with a common output molecule are adjusted by weighting to create a logarithmically linear function of the concentrations of the input molecular species.
In some aspects, provided herein are molecular circuits for performing subtraction or weighted subtraction wherein any of two outputs of an association, attenuation, transformation, or degradation block is a common molecule, and wherein the subtraction input to the block whose output is subtracted is a repressory input.
In some embodiments of these aspects, at least two of the inputs to the circuit arise from the output of logarithmically linear circuits, such that logarithmic subtraction, weighted logarithmic subtraction, division, or ratioing of these inputs is enabled.
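The relationship between weighted addition or subtraction of logarithmically transduced signals and multiplication, division, or ratioing of the underlying inputs can be illustrated numerically. The following is a minimal sketch only: the function names, weights, and concentrations are hypothetical and are not taken from the circuits described herein, and each transducer is idealized as an exact logarithm.

```python
import math

def log_transduce(conc, scale=1.0):
    """Idealized logarithmically linear transducer: output = scale * ln(conc)."""
    return scale * math.log(conc)

def weighted_log_sum(concs, weights):
    """Weighted addition of log-domain signals onto a common output.

    A negative weight models a repressory (subtracting) input.
    """
    return sum(w * log_transduce(c) for c, w in zip(concs, weights))

# Hypothetical input concentrations in arbitrary units.
a, b = 8.0, 2.0

# Addition in the log domain corresponds to multiplication of the inputs.
log_product = weighted_log_sum([a, b], [1.0, 1.0])

# Subtraction in the log domain corresponds to division (a ratio).
log_ratio = weighted_log_sum([a, b], [1.0, -1.0])

# Unequal weights give a product of powers, here a**2 / b**0.5.
log_power = weighted_log_sum([a, b], [2.0, -0.5])
```

Exponentiating each result recovers the product, the ratio, and the weighted power combination of the two inputs, respectively.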
A “block” referred to herein and throughout the specification can be understood to comprise one or more components that execute the function, e.g., the biological function, as described.
Provided herein, in some aspects, are graded negative-feedback molecular circuits comprising
In some embodiments of these aspects, the circuit is executable in a cell, a cellular system, or an in vitro system.
In some embodiments of these aspects, the molecular species are selected from DNA, RNA, peptides, proteins, and small molecule inducers.
In some embodiments of these aspects, the input-output molecular transfer function is a power law or equivalently creates a molecular output whose logarithmic concentration is a scaled version of the logarithmic concentration of the input.
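The equivalence stated above can be written out explicitly. In the sketch below, k and n are illustrative symbols for a scale constant and the power-law exponent; they are not parameters defined in this specification.

```latex
[M_{out}] = k\,[M_{in}]^{n}
\quad\Longleftrightarrow\quad
\ln [M_{out}] = n\,\ln [M_{in}] + \ln k
```

On a log-log plot, such a transfer function is a straight line whose slope is the exponent n.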
Also provided herein are molecular circuits for use in performing fine control of gene, protein, or other molecular expression.
Also provided herein are logarithmically linear molecular circuits for use in performing logarithmically linear analog computation.
A central goal of synthetic biology is to achieve multi-signal integration and processing in living cells for diagnostic, therapeutic, and biotechnology applications. Digital logic has been used to build small-scale circuits, but other paradigms are needed for efficient computation in resource-limited cellular environments. We demonstrate herein that synthetic gene circuits can be engineered to encode sophisticated computational functions in living cells, using, for example, just three transcription factors. We demonstrate herein that such synthetic analog gene circuits can exploit feedback to implement logarithmically linear sensing, addition, ratiometric, and power-law computations. The circuits described herein can exhibit Weber's Law behavior as in natural biological systems, operate over a wide dynamic range of up to four orders of magnitude, and can be architected to have tunable transfer functions. The circuits described herein can be composed together to implement higher-order functions that are well-described by both intricate biochemical models and by simple mathematical functions. By exploiting analog building-block functions that are already naturally present in cells, the paradigms and circuit structures described herein efficiently implement arithmetic operations and complex functions in the logarithmic domain. Such circuits open up new applications for synthetic biology and biotechnology that require complex computations with limited parts, that need wide-dynamic-range bio-sensing, and/or that benefit from fine control of gene expression.
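The connection between a logarithmic transfer function and Weber's Law behavior can be illustrated with a short numerical sketch. It assumes an idealized response of the form ln(1+x), where x is a scaled input; the input values and function names are hypothetical. Under this assumption, a fixed fold change in the input produces nearly the same change in the output across several orders of magnitude, approaching ln(2) for a two-fold step when x is much larger than 1.

```python
import math

def log_response(x):
    """Idealized transfer function of the form ln(1 + x), x = scaled input."""
    return math.log1p(x)

def increment_for_fold_change(x, fold=2.0):
    """Change in output when the input is multiplied by `fold`."""
    return log_response(fold * x) - log_response(x)

# Hypothetical scaled inputs spanning four orders of magnitude.
inputs = [1e1, 1e2, 1e3, 1e4]
increments = [increment_for_fold_change(x) for x in inputs]

for x, dy in zip(inputs, increments):
    print(f"x = {x:8.0f}  output change for a 2-fold step = {dy:.4f}")
```

The output change depends on the relative, not the absolute, change in the input, which is the defining feature of Weber's Law.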
In natural biological systems, digital behavior is appropriate for settings where decision making is necessary, such as in developmental circuits (1). The digital paradigm is an abstraction of graded analog functions where values above a threshold are classified as ‘1’ and values below this threshold as ‘0’ (
As demonstrated herein, analog synthetic circuit motifs were created that perform positive wide-dynamic-range logarithmic transformations of inducer concentration inputs to fluorescent protein outputs (
Provided in the various aspects described herein are molecular circuits and circuit configurations comprising two or more modular functional blocks, each such modular functional block comprising one or more molecular or biological component parts for executing the circuit function, such as positive logarithmic feedback, negative logarithmic feedback, power law functions, division function, addition function, subtraction function etc. As understood by one of ordinary skill in the art, the various modular blocks described herein in the various molecular/biological circuit configurations are governed and defined by their functional properties, but need not be physically distinct or physically separate in all embodiments. For example, two or more such modular blocks can be incorporated in one physical structure or component, such as a plasmid or vector; a single given modular block can be incorporated in more than one physical structure or component, such as multiple plasmids or vectors; or a single physical structure or component can comprise two or more modular functional blocks, as described herein. For example, a high copy-number plasmid is a physical structure or component part that can comprise two or more modular functional blocks, or part of two or more functional blocks, as described herein.
In some embodiments, the molecular circuits described herein incorporate the effects of biochemical interactions, such as the binding of inducer molecules to transcription factors, the binding of transcription factors to promoters, the degradation of free and DNA-bound transcription factors, the effective variation of transcription-factor diffusion-limited binding rates inside the cell with variation in plasmid copy number, microRNA binding to microRNA target sequences, etc., and the integration of all these effects. As used herein, transcription factors are called “free transcription factors” if they are not interacting with inducers or DNA. When inducers complex with transcription factors, the resulting product is referred to herein as an “inducer-transcription-factor complex.” When free transcription factors bind to DNA, they are referred to herein as “bound transcription factors.” When inducer-transcription-factor complexes bind to DNA, the result is referred to herein as a “bound inducer-transcription-factor complex.”
Accordingly, provided herein, in some aspects, are graded or analog feedback molecular circuits comprising two or more modular functional blocks configured for performing positive wide-dynamic range logarithmic transduction of molecular inputs or configured for performing computations with input molecular species to generate output molecular species, wherein the molecular/biological circuit is implementable or executable in a cell, cellular system, or in vitro system comprising molecular or biological machinery or components, such as transcriptional or translational machinery or components.
In some embodiments of these aspects and all such aspects described herein, the two or more modular functional blocks comprise an association block, a control block, a transformation block, and a feedback block. These graded molecular circuits can use, for example, transcriptional and translational regulation mechanisms via component parts to implement logarithmic mathematical functions, as described herein.
As used herein, an “association block” or “association module” or “association component” refers to a modular functional component of a biological circuit in which two or more input molecular species associate to create one or more associated output molecular species via a chemical/molecular reaction by the association block. Such molecular species include nucleic acids, such as RNA and DNA; proteins, such as transcription factors, enzymes, and protein hormones; small molecule inducers and small-molecule hormones; or any other molecular species that undergoes chemical reactions as defined by the input-output block combination(s). The “association strength” of the block is a monotonically increasing or monotonically decreasing function of the ability of the two species to associate or bind with each other. It is often represented by the dissociation constant Kd (20), with a large 1/Kd (i.e., a small Kd) signifying a high association strength.
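As a minimal sketch of how Kd sets association strength, the equilibrium occupancy for simple one-to-one binding can be computed as follows. The sketch assumes non-cooperative binding and that the free concentration of the binding species is approximately equal to its total concentration; the function name and numerical values are hypothetical.

```python
def fraction_bound(ligand_conc, kd):
    """Equilibrium fraction of binding sites occupied for simple 1:1 binding.

    Assumes non-cooperative binding and free ligand ~ total ligand.
    """
    return ligand_conc / (kd + ligand_conc)

# Hypothetical ligand concentration, in the same (arbitrary) units as Kd.
ligand = 10.0

strong = fraction_bound(ligand, kd=1.0)    # small Kd: high association strength
half = fraction_bound(ligand, kd=10.0)     # Kd equal to the ligand concentration
weak = fraction_bound(ligand, kd=100.0)    # large Kd: low association strength
```

Under these assumptions, occupancy is one half when the concentration equals Kd, and lowering Kd raises the occupancy at any fixed concentration.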
Input and output molecular species in an association block can include nucleic acids, such as RNA and DNA; proteins, such as transcription factors, enzymes, and protein hormones; small molecule inducers or small-molecule hormones; or any other molecular species that undergoes chemical reactions as defined and controlled by the association block. Examples of means to alter association strengths include mutating the binding sequence on a fragment of a DNA molecule such that a transcription-factor molecule associates with the DNA more strongly or weakly (
As used herein, a molecular input species is transformed to a different molecular output species via a chemical reaction in a “transformation block.” The “transformation strength” of the transformation block is a monotonically increasing function of the ratio of the concentration of the output species with respect to the input species. Examples of means to alter transformation strengths include mutating the sequences of promoter and/or transcription-factor binding strengths to DNA such that the output mRNA to input transcription factor ratio is increased, altering the ribosome binding sequence on the mRNA such that the output protein to input mRNA ratio is increased, or having the output of transcription itself be an RNA polymerase, e.g., the T7 RNA polymerase, such that this polymerase amplifies the gain of transcription through two stages of amplification rather than one.
As used herein, a molecular input species is degraded via a “degradation block” if the action of the degradation block serves to decrease the concentration of the input molecular species by degrading or destroying it in an irreversible fashion. The “degradation strength” of the degradation block is a monotonically increasing function of its ability to decrease the concentration of the species that it degrades. Examples of means to alter the degradation strength include tagging protein molecules with recognition sequences such as ‘ssrA tags’ that enable proteases (protein-destroying enzymes) to speed their destruction, or altering the terminal sequences of mRNA molecules such that RNase enzymes speed their destruction.
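The effect of degradation strength on the level of a species can be sketched with a first-order model. The sketch assumes constant production and first-order removal by dilution plus active degradation, i.e., d[X]/dt = production - (dilution + degradation)·[X]; the function name and rate values are hypothetical and are not measured values from this specification.

```python
def steady_state_level(production_rate, dilution_rate, degradation_rate):
    """Steady-state concentration for
    d[X]/dt = production_rate - (dilution_rate + degradation_rate) * [X].
    """
    return production_rate / (dilution_rate + degradation_rate)

# Hypothetical rates in arbitrary units.
# Without a degradation tag, removal is by dilution only.
untagged = steady_state_level(production_rate=10.0,
                              dilution_rate=0.5,
                              degradation_rate=0.0)

# With a tag that adds active proteolysis, the steady-state level drops.
tagged = steady_state_level(production_rate=10.0,
                            dilution_rate=0.5,
                            degradation_rate=2.0)
```

In this simplified picture, increasing the degradation strength lowers the steady-state concentration in inverse proportion to the total removal rate.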
As used herein, a molecular input species is attenuated via an “attenuation block” if the species is reduced in number by virtue of its binding with another molecular species that sequesters it or that attenuates the species without destroying it irreversibly (
As used herein, a molecular species Min is converted to an output molecular species C in an “input block”, “input module”, or “input component” if the input block comprises at least one association block with an association strength that may (or may not) be altered by design.
As used herein, a molecular species C is converted to C′ in a “control block”, “control module”, or “control component” when that block is itself composed of one or more of an association, transformation, attenuation, or degradation block with respective association, transformation, attenuation, and degradation strengths that may (or may not) be altered by design. The control block can also serve to just be an identity function with no net transformation as a special case, i.e., C=C′ and [C]=[C′] such that the identity and concentration of the molecular input and output species are identical, or with the identity being the same (C=C′ as a molecular species) but the concentration of the input and output species differing from one another ([C]≠[C′]).
As used herein, an “output block” or “output module” or “output component” refers to a modular functional component of a biological circuit in which the molecular species C′ generated by the control block is converted to a molecular species termed herein as “Mout” via a transformation block with a transformation strength that may (or may not) be altered by design. The output block can also serve to just be an identity function with no net transformation as a special case, i.e., Mout=C′ and [Mout]=[C′] such that the identity and concentration of the molecular input and output species are identical, or with the identity being the same (Mout=C′ as a molecular species) but the concentration of the input and output species differing from one another ([Mout]≠[C′]).
As used herein, a “feedback block” or “feedback module” or “feedback component” refers to a modular functional component of a biological circuit that takes one or more output molecular species Mout of the circuit as its input and produces one or more molecular species at its output via the composition of one or more of an association, transformation, attenuation, or degradation block with respective association, transformation, attenuation, and degradation strengths that may (or may not) be altered by design. The feedback block can also serve to just be an identity function, in some embodiments, with no net transformation as a special case, i.e., Mout=Mout′ and [Mout]=[Mout′] such that the identity and concentration of the molecular input and output species of the feedback block are identical or with the same identity but differing concentration (Mout=Mout′; [Mout]≠[Mout′]).
In some aspects, provided herein are graded positive-feedback molecular circuits, also referred to as “wide-dynamic-range positive-logarithm circuits,” comprising a “positive-feedback (PF) component” located on a low-copy plasmid (LCP) and a “shunt component” located on a high-copy plasmid (HCP).
As demonstrated herein, the positive-feedback (PF) component cascades the successive outputs of an input block, control block, output block, and feedback block in a positive feedback loop (
The shunt component (shunt) of the molecular circuit provides a means for controlling the attenuation and/or degradation strength of the feedback block and the control block thus affecting the overall strength of the positive feedback to enable optimally wide-dynamic-range graded analog operation. The shunt component binds and sequesters molecules away from the LCP, thus providing control of the attenuation strength of the LCP PF component (for example in
In some embodiments of the aspects described herein, the PF component on the LCP comprises one or more inducible promoters operably linked to sequences encoding transcription factors (TFs) that bind to these same promoters, i.e., TFs that are “specific for the inducible promoter.” Thus, the TFs generated by the PF component increase their own generation via a positive-feedback loop and alleviate saturation of the inducer-TF interaction. In some embodiments, the one or more inducible promoters of the PF component is/are also operably linked to sequences encoding a protein output, such as a detectable output, for example, a reporter protein.
In some embodiments of the aspects described herein, the shunt component on the HCP is comprised of one or more inducible promoters that are bound by and shunt away the same TFs generated by the LCP, thus reducing saturation of the TF-DNA interaction on the LCP.
In addition, in some embodiments of the aspects described herein, the shunt component on the HCP also generates a protein output, such as a reporter protein, that is different from the TF output of the LCP (
In addition, in some embodiments, the feedback loop can comprise any other molecular species acting on another molecular species, such as any other protein acting on a promoter, or other genetic regulatory element, a microRNA (miRNA) or any other RNA species acting on an RNA-based genetic regulatory element, or a microRNA (miRNA) or any other RNA species bound to a protein acting on a promoter, or other genetic regulatory element.
Accordingly, as demonstrated herein, in some exemplary embodiments of these aspects (
In some embodiments of the graded positive-feedback molecular circuits described herein, where a configuration involving a “positive-feedback (PF) component” located on a low-copy plasmid (LCP) and a “shunt component” located on a high-copy plasmid (HCP) is used, the attenuation and degradation strength of the control block and/or the feedback block of the circuits is determined by the relative copy numbers or ratio of the number of high-copy plasmids versus the low-copy plasmids. For example, the ratio of the number of high-copy plasmids versus the low-copy plasmids is at least 2:1, at least 3:1, at least 4:1, at least 5:1, at least 6:1, at least 7:1, at least 8:1, at least 9:1, at least 10:1, at least 11:1, at least 12:1, at least 13:1, at least 14:1, at least 15:1, at least 16:1, at least 17:1, at least 18:1, at least 19:1, at least 20:1, at least 25:1, at least 30:1, at least 40:1, at least 50:1, at least 60:1, at least 70:1, at least 80:1, at least 90:1, at least 100:1, or more, or any ratio in between, e.g., 27:5 and the like. For the embodiments described herein in the examples, a ratio of 63:1, as determined by modeling and experiments, was found to provide optimally wide-dynamic-range operation, but other embodiments with other transcription factors will have different values.
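How the plasmid copy-number ratio can set the attenuation strength may be illustrated with a deliberately simplified partition calculation. The sketch below is not the biochemical model of this specification. It assumes, purely for illustration, that each plasmid carries one binding site, that sites on both plasmids have the same affinity for the transcription factor, and that the sites are far from saturation, so that bound transcription factor distributes in proportion to the number of sites. The function name is hypothetical.

```python
def fraction_of_bound_tf_on_lcp(hcp_to_lcp_ratio):
    """Share of DNA-bound transcription factor located on LCP sites.

    Assumptions (illustrative only): one site per plasmid, identical
    affinity on both plasmids, and unsaturated binding, so bound TF
    partitions in proportion to site count.
    """
    return 1.0 / (1.0 + hcp_to_lcp_ratio)

# Hypothetical HCP:LCP copy-number ratios, including the 63:1 ratio
# mentioned in the text.
for ratio in (2, 10, 63, 100):
    share = fraction_of_bound_tf_on_lcp(ratio)
    print(f"HCP:LCP = {ratio:3d}:1  ->  share of bound TF on LCP = {share:.4f}")
```

Under these assumptions, raising the HCP:LCP ratio diverts a larger share of bound transcription factor to the shunt, which corresponds to a larger attenuation of the positive-feedback component.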
In some embodiments of the graded positive-feedback molecular circuits described herein, where a configuration involving a “positive-feedback (PF) component” located on a low-copy plasmid (LCP) and a “shunt component” located on a high-copy plasmid (HCP) is used, the transformation strength of the circuits is determined by the Kd of the molecular binding of Mout′ to the input component, for example, the binding of AraC to PBAD in the control block of the exemplary circuit described above. In addition, the degradation strength can be set by dilution and protein degradation of the molecular species C′, such as dilution and protein degradation of AraCcb in the control block of the exemplary circuit described above. Similarly, the attenuation strength of the feedback blocks of the circuits can be determined by dilution and protein degradation of the molecular species Mout or Mout′, for example, AraC or AraCc in the feedback block of the exemplary circuit described above.
The AraC-based embodiment of the graded molecular circuits described herein exhibited an input-output transfer function that was well-fit by a simple mathematical function of the form ln(1+x), which is a first-order approximation for the Hill function at small values of x, where x is a scaled version of the input concentration (
To gain deeper insights into the mechanisms that may give rise to logarithmically linear transfer functions, detailed biochemical models were built which capture the effects of inducer-to-TF binding, TF-to-DNA binding, the “PF-shunt” circuit topology, and protein degradation (
In some embodiments of the aspects described herein, the quorum-sensing LuxR transcriptional activator, which is induced by Acyl Homoserine Lactone (AHL) and activates the promoter Plux, can be applied to a graded molecular circuit comprising a positive-feedback (PF) component located on a low-copy plasmid (LCP) and a shunt component located on a high-copy plasmid (HCP) (
In some such embodiments of the aspects described herein, the positive-feedback component on the LCP comprises one or more inducible promoters operably linked to sequences encoding the luxR transcription factor that binds to the Plux promoter, which is induced by AHL. In some such embodiments, the one or more inducible promoters of the positive-feedback component is/are also operably linked to sequences encoding a protein output, such as a detectable output, for example, a reporter protein, such as GFP, in addition to the transcription factor specific for the inducible promoter. Thus, the luxR transcription factor generated by the positive-feedback component increases its own generation via a positive-feedback loop, and alleviates saturation of the inducer (AHL)-TF interaction.
In some embodiments of the aspects described herein, the shunt component on the HCP is comprised of one or more inducible promoters, such as Plux, that are bound by and shunt away the luxR transcription factor generated by the LCP, thus reducing saturation of the luxR transcription factor-DNA interaction on the LCP.
In addition, in some embodiments, the shunt component on the HCP also generates a protein output, such as a reporter protein, that is different from the TF output of the LCP and the reporter output of the LCP, such as mCherry (
Accordingly, as demonstrated herein, in some embodiments of these aspects, a graded molecular circuit uses AHL as the molecular input species Min; LuxR bound to AHL, termed “LuxRc,” as the output molecular species produced by the association block or C, and LuxRd, bound to DNA, i.e., the Plux promoter as the C′ molecular species produced by the control component. The output transformation block then produces LuxR as Mout with a transformation strength that may be altered by ribosome binding sequences (
In some embodiments of the graded molecular circuits described herein, where a configuration involving a “positive-feedback (PF) component” located on a low-copy plasmid (LCP) and a “shunt component” located on a high-copy plasmid (HCP) is used, the association strength and consequent effective strength of the control block is determined by the Kd of the molecular binding of C to DNA, i.e., LuxRc to Plux in the control block of the exemplary circuit described above. In addition, the degradation strength can be set, in some embodiments, by dilution and protein degradation of the bound molecular species C′=LuxRcb, such as dilution and protein degradation of LuxRcb in the control block of the exemplary circuit described above. Similarly, the degradation strength of the feedback blocks of the circuits is determined by dilution and protein degradation of the molecular species Mout or Mout′, for example, LuxR or LuxRc in the feedback block of the exemplary circuit described herein. The attenuation strength of the feedback block and the attenuation strength of the control block can be altered, in some embodiments, by changing the ratio of the HCP and LCP.
As demonstrated herein, a fluorescent output of this circuit, GFP, was fused to the C-terminus of LuxR, and an HCP Plux-mCherry shunt was used. The LuxR PF-shunt circuit also had an input dynamic range of more than three orders of magnitude (
In some embodiments of the aspects described herein, the behavior of the PF-shunt circuit motifs can be dynamically tuned by changing the relative copy numbers of the PF and shunt plasmids. For example, in some embodiments, such tuning can be achieved by combining a HCP shunt with a variable-copy plasmid (VCP), based on a pBAC/oriV vector 24, carrying the PF component (
Accordingly, in some embodiments of the graded positive-feedback molecular/biological circuits described herein, where a configuration involving a “positive-feedback (PF) component” located on a low-copy plasmid (LCP) and a “shunt component” located on a high-copy plasmid (HCP) is used, the ratio of the number of high-copy plasmids versus the low-copy plasmids is at least 2:1, at least 3:1, at least 4:1, at least 5:1, at least 6:1, at least 7:1, at least 8:1, at least 9:1, at least 10:1, at least 11:1, at least 12:1, at least 13:1, at least 14:1, at least 15:1, at least 16:1, at least 17:1, at least 18:1, at least 19:1, at least 20:1, at least 25:1, at least 30:1, at least 40:1, at least 50:1, at least 60:1, at least 70:1, at least 80:1, at least 90:1, at least 100:1, or more, or any ratio in between, e.g., 63:1, 27:5 and the like. Modeling and experimental data indicate that the ratio of 63:1 is effective in this embodiment.
Embodiments for graded molecular circuits do not necessarily need an LCP and HCP and can be all implemented on the same plasmid, in some embodiments. For example,
The difference between the DNA sequence of PluxR vs. PluxR56 corresponds to just four base pairs: The ACCT start of the standard PluxR promoter was mutated to TGGG in PluxR56 to obtain the results shown in
In some aspects, the analog computation modules described herein can be used to generate more complex circuits for higher-order functions. For example, as described herein, in some aspects, a molecular circuit can be created for implementing wide-dynamic-range negative logarithms, a broadly useful computation, for example, in division, which can be achieved via logarithmic subtraction, and in applications that need to compute pH or pKa. Such functionality can be built by combining the PF-shunt positive-logarithm component parts described herein with an additional repressor component part, or inversion component, as shown in
For example, in some embodiments, to achieve a molecular circuit having a wide-dynamic-range negative logarithm function, an additional output promoter is added to the LCP of the PF-shunt motif as described for the graded positive-feedback molecular circuits. As shown herein, the behavior of such a circuit was predicted by the biochemical models described herein and was also well fit by a ln(1+x) mathematical function (
In the embodiment of
The log-transformed ratio of two different input inducers as shown in the embodiment of
In addition to the above positive-feedback logarithmic transduction, addition, and subtraction circuits, also provided herein, in some aspects, are “negative-feedback molecular circuits” comprising two or more modular functional components for implementing wide-dynamic-range computations, wherein the output molecular species concentration is a desired power-law function of the input molecular species concentration. Such a molecular circuit can be implementable or executable in a cell, cellular system, or in vitro system comprising molecular or biological machinery or components, such as transcriptional or translational machinery or components.
In some embodiments of these aspects and all such aspects described herein, the two or more modular components comprise an input association block, a control block, an output transformation block, and a feedback block as in
For example, in some embodiments of these aspects and all such aspects described herein, for example in
In some embodiments of these aspects, the LCP comprises one or more inducible promoters operably linked to sequences encoding transcription factors (TFs) that bind to these same promoters, i.e., TFs that are “specific for the inducible promoter.” In some embodiments, the one or more inducible promoters of the PF component is/are also operably linked to sequences encoding a protein output, such as a detectable output, for example, a reporter protein.
In some embodiments of these aspects, the HCP, acting in its function as an output transformation block, generates a protein output, that can also be operably linked to sequences encoding a reporter protein (lacI-mCherry in
In addition, in some embodiments, the feedback loop can comprise any other molecular species acting on another molecular species, such as any other protein acting on a promoter, or other genetic regulatory element, a microRNA (miRNA) or any other RNA species acting on a promoter or other genetic regulatory element, or a microRNA (miRNA) or any other RNA species bound to a protein acting on a promoter, or other genetic regulatory element.
The circuit of
The circuits described herein, which represent exemplary embodiments, provide a complete basis function set for logarithmically linear analog computation that requires logarithmic transduction (
As described herein, complex synthetic analog circuits can be designed using detailed biochemical models. However, a simpler predictive abstraction can be derived from the fact that the behavior of the circuit motifs described herein can be well fit to logarithmic functions. These biochemical models and mathematical functions provide complementary tools with varying levels of granularity for composing simple analog circuit modules (e.g., input-inducer-to-output-protein modules and input-protein-to-output-protein modules) to implement more complex functions in a predictable fashion. Indeed, abstractions with different levels of granularity are commonly used in other engineering fields during various stages of design20. For example, the straightforward cascade of logarithms from
As demonstrated herein, powerful wide-dynamic-range analog computations can be performed with just three biological parts in living cells. Qian and Winfree recently demonstrated the impressive implementation of an in vitro 4-input-bit and 2-output-bit square-root digital calculator using 130 DNA strands within a DNA-based computation framework25. In comparison, the in-vivo analog power-law circuits described herein exploit binding functions that are already present in the biochemistry and therefore require only two transcription factors. Even 1-bit full adders and subtractors in digital computation require several logic gates and, thus, numerous synthetic parts8,9,11. The wide-dynamic-range analog adders and ratiometer circuits described herein are inherently implemented by circuits that add flux to or subtract flux from a common output molecule and can be constructed with no more than three transcription factors (
As demonstrated herein, the analog motifs described herein can be applied to different transcription factor families (e.g., AraC and LuxR). Thus, the analog circuits and motifs described herein are generalizable to other transcription factor-inducer systems, such as those provided herein, via part mining to enable wide-dynamic-range biosensors that provide quantitative measurements of inducer concentrations, rather than binary read-outs26,27.
In some aspects, the mechanisms underlying the analog circuits and motifs described herein are adaptable to other host cells, including yeast and mammalian cells. Indeed, shunt or decoy TF binding sites are naturally present in eukaryotes and are expected to influence the behavior of gene networks28. The circuits and motifs described herein can also find applications, in some aspects, in biotechnology by allowing engineers to finely tune the expression level of toxic proteins, enzymes in a metabolic pathway, or stress-response proteins29,30. For example, in some embodiments, ratios between small molecules (e.g., NAD+/NADH) and between proteins (e.g., Oct3/4, Sox2, Klf4, and c-Myc for cellular reprogramming) are important control parameters that could serve as inputs into ratiometric circuits that trigger downstream effectors. More advanced systems can incorporate analog biosensors with feedback control of endogenous genetic circuits to regulate phenotypes in a precise and dynamic fashion. The wide-dynamic-range analog computation circuits and motifs described herein can be further integrated with dynamical systems, such as timers31 and oscillators32-34, negative-feedback linearizing circuits35,36, endogenous circuits37, and cell-cell communication8,9,38,39, and can be implemented using RNA components7,40, synthetic transcriptional regulation3,41, or protein-protein interactions42.
Using fundamental properties of the scaling laws of thermodynamic noise with temperature and molecular count, which are true in both biological and in electronic systems, the pros and cons of analog versus digital computation have been analyzed for neurobiological systems21 and for systems in cell biology20. These results show that analog computation is more efficient than digital computation in part count, speed, and energy consumption below a certain crossover computational precision. While the exact crossover precision varies with the computation, in both electronics and in actual biological cells, the exploitation of feedback loops, calibration loops, technological basis functions, redundancy, signal averaging, and error-correcting topologies can push this crossover precision to higher values. Alternatively, for a given speed of operation, more energy must be expended in creating a higher molecular production rate that leads to a higher molecular count and thus higher precision2,21. Thus, tradeoffs between error and use of resources are inherent to the design of synthetic circuits in living cells. To demonstrate the tunability of the analog circuits described herein, an AraC PF-shunt circuit with two PBAD promoters on the shunt plasmid was constructed, leading to an increase in the log-linear gain of about 2-fold over its single PBAD counterpart (
An efficient and accurate computational paradigm for synthetic biological networks can ultimately be used to integrate both analog and digital processing (a simple example of switched analog computation is shown, for example, in
Also, provided herein, in some aspects, are positive-feedback molecular circuits comprising:
In some embodiments of these circuits and all such circuits described herein, the shunt component further comprises a second molecular species, the expression, activity, and/or generation of which is regulated by the first molecular species of the shunt component. In some embodiments of these circuits and all such circuits described herein, the second molecular species is a detectable output, such as a fluorescent molecule or other well-known detectable biomolecule.
In some embodiments of these circuits and all such circuits described herein, the positive feedback component further comprises a third molecular species, expression, activity, and/or generation of which is regulated by the first molecular species of the positive feedback loop. In some embodiments of these circuits and all such circuits described herein, the second molecular species is a detectable output. In some embodiments of these circuits and all such circuits described herein, the third molecular species of the positive feedback component is different from the second molecular species of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the first molecular species of the shunt component is an inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the first molecular species of the positive feedback component is an inducible promoter sequence. In some embodiments of these circuits and all such circuits described herein, a sequence encoding the second molecular species of the positive feedback component is operably linked to the inducible promoter sequence. In some embodiments of these circuits and all such circuits described herein, the sequence encoding the second molecular species of the positive feedback component encodes for an RNA molecule or protein that is specific for the inducible promoter sequence and increases its transcriptional activity. In some embodiments of these circuits and all such circuits described herein, the protein that is specific for the inducible promoter sequence is a transcription factor. In some embodiments of these circuits and all such circuits described herein, the transcription factor is an engineered transcription factor.
In some embodiments of these circuits and all such circuits described herein, the second molecular species of the feedback component increases transcriptional activity of the first molecular species of the positive feedback component and the first molecular species of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the second molecular species is a transcriptional activator.
In some embodiments of these circuits and all such circuits described herein, a ratio of the shunt component to the positive feedback component is at least 2:1.
In some embodiments of these circuits and all such circuits described herein, the positive feedback component is located on a low-copy plasmid.
In some embodiments of these circuits and all such circuits described herein, the shunt component is located on a high-copy plasmid.
In some embodiments of these circuits and all such circuits described herein,
In some embodiments of these circuits and all such circuits described herein, the positive feedback component further comprises a sequence encoding a detectable output operably linked to the first molecular species.
In some embodiments of these circuits and all such circuits described herein, the shunt component further comprises a sequence encoding a detectable output operably linked to the inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the detectable output of the positive feedback component is different from the detectable output of the shunt component.
In some embodiments of these circuits and all such circuits described herein,
In some embodiments of these circuits and all such circuits described herein, the positive feedback component further comprises a sequence encoding a detectable output operably linked to the PLUX promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the shunt component further comprises a sequence encoding a detectable output operably linked to the PLUX promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the detectable output of the positive feedback component is different from the detectable output of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the detectable output is a reporter output. In some embodiments of these circuits and all such circuits described herein, the detectable output is a fluorescent output.
In some embodiments of these circuits and all such circuits described herein,
In some embodiments of these circuits and all such circuits described herein, the positive feedback component further comprises a sequence encoding a detectable output operably linked to the PBAD promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the shunt component further comprises a sequence encoding a detectable output operably linked to the PBAD promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the detectable output of the positive feedback component is different from the detectable output of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the detectable output is a reporter output.
In some embodiments of these circuits and all such circuits described herein, the detectable output is a fluorescent output.
Also provided herein, in some aspects, are adder molecular circuits or molecular circuits for performing addition or weighted addition comprising two or more of the positive feedback molecular circuits described herein, as shown in, for example,
In some embodiments of these circuits and all such circuits described herein, the inducing molecular species of each of the two or more positive feedback molecular circuits is different.
In some embodiments of these circuits and all such circuits described herein, the inducing molecular species of at least one of the two or more positive feedback molecular circuits is different from the inducing molecular species of any of the other two or more positive feedback molecular circuits.
In some embodiments of these circuits and all such circuits described herein, the shunt component of each of the two or more positive feedback molecular circuits comprises a second molecular species. In some embodiments of these circuits, the second molecular species of the shunt component is a detectable output. In some embodiments of these circuits, the second molecular species of the shunt components of each of the two or more positive feedback molecular circuits is the same or functionally equivalent.
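By way of non-limiting illustration, the adder concept can be sketched numerically as two or more log-transducing circuits contributing to a common output. The sketch below assumes, for illustration only, that each positive-feedback circuit behaves like the ln(1 + x) form noted herein as a good fit; the dissociation-like constants `ks`, the weights `gains`, and the function names are hypothetical.

```python
import math


def log_transducer(x, k=1.0, gain=1.0):
    """Idealized log-linear circuit: gain * ln(1 + x / k).

    k and gain are hypothetical parameters, not measured values.
    """
    return gain * math.log1p(x / k)


def adder(inputs, ks, gains):
    """Weighted addition: each circuit adds its contribution to a shared output."""
    return sum(log_transducer(x, k, g) for x, k, g in zip(inputs, ks, gains))


# Two different inducers (arbitrary units) driving the same output species.
total = adder(inputs=[1.0, 3.0], ks=[1.0, 1.0], gains=[1.0, 1.0])
print(f"summed output: {total:.4f}")
```

Under this assumption, when each input is much larger than its constant k, the summed output approximates the logarithm of the product of the inputs, which is how addition in the logarithmic domain corresponds to multiplication.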
Also provided herein, in some aspects, are negative-slope molecular circuits comprising:
In some embodiments of these circuits and all such circuits described herein, the shunt component further comprises a second molecular species, the expression, activity, and/or generation of which is regulated by the first molecular species of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the second molecular species is a detectable output.
In some embodiments of these circuits and all such circuits described herein, the positive feedback component further comprises a third molecular species, expression, activity, and/or generation of which is regulated by the first molecular species of the positive feedback component.
In some embodiments of these circuits and all such circuits described herein, the second molecular species is a detectable output.
In some embodiments of these circuits and all such circuits described herein, the third molecular species of the positive feedback component is different from the second molecular species of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the first molecular species of the shunt component is an inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the first molecular species of the positive feedback component is an inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, a sequence encoding the second molecular species of the positive feedback component is operably linked to the inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the sequence encoding the second molecular species of the positive feedback component encodes for an RNA molecule or protein that is specific for the inducible promoter sequence and increases its transcriptional activity.
In some embodiments of these circuits and all such circuits described herein, the protein that is specific for the inducible promoter sequence is a transcription factor.
In some embodiments of these circuits and all such circuits described herein, the transcription factor is an engineered transcription factor.
In some embodiments of these circuits and all such circuits described herein, the second molecular species of the feedback component increases transcriptional activity of: (i) the first molecular species of the positive feedback component and (ii) the first molecular species of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the second molecular species is a transcriptional activator.
In some embodiments of these circuits and all such circuits described herein, the inversion component further comprises a fourth molecular species, the expression, activity, and/or generation of which is regulated by the third molecular species of the inversion component.
In some embodiments of these circuits and all such circuits described herein, the fourth molecular species is a detectable output.
In some embodiments of these circuits and all such circuits described herein, the first molecular species of the inversion component is an inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, a sequence encoding the second molecular species of the inversion component is operably linked to the inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the sequence encoding the second molecular species of the inversion component encodes for an RNA molecule or protein that is specific for the third molecular species and decreases its activity.
In some embodiments of these circuits and all such circuits described herein, the third molecular species is an inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, a ratio of the shunt component to the positive feedback component is at least 2:1.
In some embodiments of these circuits and all such circuits described herein, the positive feedback component and the first and second molecular species of the inversion component are located on a low-copy plasmid.
In some embodiments of these circuits and all such circuits described herein, the shunt component and the third molecular species of the inversion component are located on a high-copy plasmid.
In some embodiments of these circuits and all such circuits described herein,
In some embodiments of these circuits and all such circuits described herein, the positive feedback component further comprises a sequence encoding a detectable output operably linked to the first molecular species.
In some embodiments of these circuits and all such circuits described herein, the shunt component further comprises a sequence encoding a detectable output operably linked to the inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the detectable output of the positive feedback component is different from the detectable output of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the inversion component further comprises a sequence encoding a detectable output operably linked to the inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the positive feedback component further comprises a sequence encoding a detectable output operably linked to the PLUX promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the shunt component further comprises a sequence encoding a detectable output operably linked to the PLUX promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the detectable output of the positive feedback component is different from the detectable output of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the inversion component further comprises a sequence encoding a detectable output operably linked to the PlacO promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the detectable output is a reporter output.
In some embodiments of these circuits and all such circuits described herein, the detectable output is a fluorescent output.
Also provided herein, in some aspects, are ratiometric molecular circuits or molecular circuits for performing division comprising at least one positive feedback molecular circuit and at least one negative-slope molecular circuit, as shown in, for example,
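By way of non-limiting illustration, a ratiometric computation can be sketched as the sum of a positive-slope logarithmic term and a negative-slope logarithmic term acting on a common output. The sketch assumes idealized ln(1 + x) transfer functions with equal gains; the function names and the `offset` parameter (a hypothetical baseline for the negative-slope circuit) are assumptions for illustration only.

```python
import math


def positive_slope(x, k=1.0, gain=1.0):
    """Idealized positive-feedback circuit: output rises as gain * ln(1 + x / k)."""
    return gain * math.log1p(x / k)


def negative_slope(x, k=1.0, gain=1.0, offset=0.0):
    """Idealized negative-slope circuit: output falls as offset - gain * ln(1 + x / k)."""
    return offset - gain * math.log1p(x / k)


def ratiometer(x1, x2, offset=0.0):
    """Combine both circuits on a common output.

    For x1, x2 much larger than k this approaches offset + ln(x1 / x2).
    """
    return positive_slope(x1) + negative_slope(x2, offset=offset)


# Equal inputs cancel; a 100-fold input ratio gives roughly ln(100).
print(ratiometer(5.0, 5.0))
print(ratiometer(1e6, 1e4))
```

In this idealization the output depends on the ratio of the two inputs rather than on their absolute levels once both inputs are well above the constant k.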
Provided herein, in other aspects, are power-law molecular circuits comprising:
In some embodiments of these circuits and all such circuits described herein, the shunt component further comprises a third molecular species, the expression, activity, and/or generation of which is regulated by the first molecular species of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the second molecular species is a detectable output.
In some embodiments of these circuits and all such circuits described herein, the feedback component further comprises a third molecular species, expression, activity, and/or generation of which is regulated by the first molecular species of the feedback component.
In some embodiments of these circuits and all such circuits described herein, the third molecular species is a detectable output.
In some embodiments of these circuits and all such circuits described herein, the third molecular species of the feedback component is different from the third molecular species of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the first molecular species of the feedback component is an inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, a sequence encoding the second molecular species of the feedback component is operably linked to the inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the sequence encoding the second molecular species of the feedback component encodes for an RNA molecule or protein that is specific for the first molecular species of the shunt component and increases its activity.
In some embodiments of these circuits and all such circuits described herein, the protein that is specific for the first molecular species of the shunt component is a transcription factor.
In some embodiments of these circuits and all such circuits described herein, the transcription factor is an engineered transcription factor.
In some embodiments of these circuits and all such circuits described herein, the first molecular species of the shunt component is an inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, a sequence encoding the second molecular species of the shunt component is operably linked to the inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the sequence encoding the second molecular species of the shunt component encodes for an RNA molecule or protein that is specific for the first molecular species of the shunt component and decreases its activity.
In some embodiments of these circuits and all such circuits described herein, the protein that is specific for the first molecular species of the shunt component is a transcription factor.
In some embodiments of these circuits and all such circuits described herein, the transcription factor is an engineered transcription factor.
In some embodiments of these circuits and all such circuits described herein, the second molecular species of the feedback component increases transcriptional activity of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the second molecular species is a transcriptional activator.
In some embodiments of these circuits and all such circuits described herein, a ratio of the shunt component to the feedback component is at least 2:1.
In some embodiments of these circuits and all such circuits described herein, the feedback component is located on a low-copy plasmid.
In some embodiments of these circuits and all such circuits described herein, the shunt component is located on a high-copy plasmid.
In some embodiments of these circuits and all such circuits described herein,
In some embodiments of these circuits and all such circuits described herein, the feedback component further comprises a sequence encoding a detectable output operably linked to the inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the shunt component further comprises a sequence encoding a detectable output operably linked to the inducible promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the detectable output of the feedback component is different from the detectable output of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the feedback component further comprises a sequence encoding a detectable output operably linked to the PlacO promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the shunt component further comprises a sequence encoding a detectable output operably linked to the PBAD promoter sequence.
In some embodiments of these circuits and all such circuits described herein, the detectable output of the feedback component is different from the detectable output of the shunt component.
In some embodiments of these circuits and all such circuits described herein, the detectable output is a reporter output.
In some embodiments of these circuits and all such circuits described herein, the detectable output is a fluorescent output.
In some aspects of all the embodiments of the invention, the circuits are made using nucleic acids as “building blocks” to encode other nucleic acids or proteins that interact with a promoter, enhancer, repressor or other responsive component that can regulate the circuit's expression.
In some aspects of all the embodiments of the invention, the circuits are made using enzymes and ligands thereto to execute similar functions by regulating the enzyme activity, using, e.g., catalysts and coenzymes to provide the increase or decrease for the enzymatic reaction driving the circuits.
Provided herein are component molecular species or molecular parts that can be used to generate the molecular circuit configurations comprising the modular functional blocks for performing complex mathematical functions described herein. Such molecular species include nucleic acid sequences, such as inducible promoters, transcriptional activators and repressors, degradation tags, ribosome binding sites, microRNA binding sequences, and the like. As understood by one of skill in the art, these molecular species can be used to generate the circuit configurations, and specific combinations of these molecular species can be used alone or in combination to modulate the functionalities of the circuits and alter circuit parameters, such as the strength of a given modular functional block.
Accordingly, provided herein are promoter sequences as component molecular species for use in the molecular/biological circuits, and functional and physical modules described herein. In some embodiments of the aspects described herein, the promoters used in the multi-input molecular circuits, and functional and physical modules described herein drive expression of an operably linked output sequence, such as, for example, a transcription factor sequence, a reporter sequence, an enzyme sequence, or a microRNA or other nucleic acid sequence.
The term “promoter” as used herein refers to any nucleic acid sequence that regulates the expression of another nucleic acid sequence by driving transcription of the nucleic acid sequence, which can be a heterologous target gene, encoding a protein or an RNA. Promoters can be constitutive, inducible, activatable, repressible, tissue-specific, or any combination thereof. A promoter is a control region of a nucleic acid sequence at which initiation and rate of transcription of the remainder of a nucleic acid sequence are controlled. A promoter can also contain genetic elements at which regulatory proteins and molecules can bind, such as RNA polymerase and other transcription factors.
In some embodiments of the aspects, a promoter can drive the expression of a transcription factor that regulates the expression of the promoter itself, or that of another promoter used in another modular component described herein.
A promoter can be said to drive expression or drive transcription of the nucleic acid sequence that it regulates. The phrases “operably linked”, “operatively positioned,” “operatively linked,” “under control,” and “under transcriptional control” indicate that a promoter is in a correct functional location and/or orientation in relation to a nucleic acid sequence it regulates to control transcriptional initiation and/or expression of that sequence. An “inverted promoter” is a promoter in which the nucleic acid sequence is in the reverse orientation, such that what was the coding strand is now the non-coding strand, and vice versa.
In addition, in various embodiments described herein, a promoter can be used in conjunction with an “enhancer,” which refers to a cis-acting regulatory sequence involved in the transcriptional activation of a nucleic acid sequence downstream of the promoter. The enhancer can be located at any functional location before or after the promoter, and/or the encoded nucleic acid. A promoter for use in the molecular/biological circuits described herein can also be “bidirectional,” wherein such promoters can initiate transcription of operably linked sequences in both directions.
A promoter can be one naturally associated with a gene or sequence, as can be obtained by isolating the 5′ non-coding sequences located upstream of the coding segment and/or exon of a given gene or sequence. Such a promoter can be referred to as “endogenous.” Similarly, an enhancer can be one naturally associated with a nucleic acid sequence, located either downstream or upstream of that sequence.
Alternatively, certain advantages can be gained by positioning a coding nucleic acid segment under the control of a recombinant or heterologous promoter, which refers to a promoter that is not normally associated with the encoded nucleic acid sequence in its natural environment. A recombinant or heterologous enhancer refers to an enhancer not normally associated with a nucleic acid sequence in its natural environment. Such promoters or enhancers can include promoters or enhancers of other genes; promoters or enhancers isolated from any other prokaryotic, viral, or eukaryotic cell; and synthetic promoters or enhancers that are not “naturally occurring”, i.e., contain different elements of different transcriptional regulatory regions, and/or mutations that alter expression through methods of genetic engineering that are known in the art. In addition to producing nucleic acid sequences of promoters and enhancers synthetically, sequences can be produced using recombinant cloning and/or nucleic acid amplification technology, including PCR, in connection with the molecular/biological circuits described herein (see U.S. Pat. No. 4,683,202 and U.S. Pat. No. 5,928,906, each incorporated herein by reference). Furthermore, it is contemplated that control sequences that direct transcription and/or expression of sequences within non-nuclear organelles such as mitochondria, chloroplasts, and the like, can be employed as well.
As described herein, an “inducible promoter” is one that is characterized by initiating or enhancing transcriptional activity when in the presence of, influenced by, or contacted by an inducer or inducing agent. An “inducer” or “inducing agent” can be endogenous, or a normally exogenous compound or protein that is administered in such a way as to be active in inducing transcriptional activity from the inducible promoter.
In some embodiments of the aspects described herein, the inducer or inducing agent, i.e., a chemical, a compound or a protein, can itself be the result of transcription or expression of a nucleic acid sequence (i.e., an inducer can be a transcriptional repressor protein, such as LacI), which itself can be under the control of an inducible promoter. In some embodiments, an inducible promoter is induced in the absence of certain agents, such as a repressor. In other words, in such embodiments, the inducible promoter drives transcription of an operably linked sequence except when the repressor is present. Examples of inducible promoters include, but are not limited to, tetracycline, metallothionein, ecdysone, mammalian viruses (e.g., the adenovirus late promoter; and the mouse mammary tumor virus long terminal repeat (MMTV-LTR)) and other steroid-responsive promoters, rapamycin-responsive promoters, and the like.
Inducible promoters useful in molecular/biological circuits, methods of use, and systems described herein are capable of functioning in both prokaryotic and eukaryotic host organisms. In some embodiments of the different aspects described herein, mammalian inducible promoters are included, although inducible promoters from other organisms, as well as synthetic promoters designed to function in a prokaryotic or eukaryotic host can be used. One important functional characteristic of the inducible promoters described herein is their ultimate inducibility by exposure to an externally applied inducer, such as an environmental inducer. Appropriate environmental inducers include exposure to heat (i.e., thermal pulses or constant heat exposure), various steroidal compounds, divalent cations (including Cu2+ and Zn2+), galactose, tetracycline or doxycycline, IPTG (isopropyl-β-D-thiogalactoside), as well as other naturally occurring and synthetic inducing agents and gratuitous inducers.
The promoters for use in the molecular/biological circuits described herein encompass the inducibility of a prokaryotic or eukaryotic promoter by, in part, either of two mechanisms. In some embodiments of the aspects described herein, the molecular/biological circuits comprise suitable inducible promoters that can be dependent upon transcriptional activators that, in turn, are reliant upon an environmental inducer. In other embodiments, the inducible promoters can be repressed by a transcriptional repressor which itself is rendered inactive by an environmental inducer, such as the product of a sequence driven by another promoter. Thus, unless specified otherwise, an inducible promoter can be either one that is induced by an inducing agent that positively activates a transcriptional activator, or one which is derepressed by an inducing agent that negatively regulates a transcriptional repressor. In such embodiments of the various aspects described herein, where it is required to distinguish between an activating and a repressing inducing agent, explicit distinction will be made.
Inducible promoters that are useful in the molecular/biological circuits and methods of use described herein also include those controlled by the action of latent transcriptional activators that are subject to induction by the action of environmental inducing agents. Some non-limiting examples include the copper-inducible promoters of the yeast genes CUP1, CRS5, and SOD1 that are subject to copper-dependent activation by the yeast ACE1 transcriptional activator (see e.g. Strain and Culotta, 1996; Hottiger et al., 1994; Lapinskas et al., 1993; and Gralla et al., 1991). Alternatively, the copper inducible promoter of the yeast gene CTT1 (encoding cytosolic catalase T), which operates independently of the ACE1 transcriptional activator (Lapinskas et al., 1993), can be utilized. The copper concentrations required for effective induction of these genes are suitably low so as to be tolerated by most cell systems, including yeast and Drosophila cells. Alternatively, other naturally occurring inducible promoters can be used in the present invention including: steroid inducible gene promoters (see e.g. Oligino et al. (1998) Gene Ther. 5: 491-6); galactose inducible promoters from yeast (see e.g. Johnston (1987) Microbiol Rev 51: 458-76; Ruzzi et al. (1987) Mol Cell Biol 7: 991-7); and various heat shock gene promoters. Many eukaryotic transcriptional activators have been shown to function in a broad range of eukaryotic host cells, and so, for example, many of the inducible promoters identified in yeast can be adapted for use in a mammalian host cell as well. For example, a unique synthetic transcriptional induction system for mammalian cells has been developed based upon a GAL4-estrogen receptor fusion protein that induces mammalian promoters containing GAL4 binding sites (Braselmann et al. (1993) Proc Natl Acad Sci USA 90: 1657-61). 
These and other inducible promoters responsive to transcriptional activators that are dependent upon specific inducers are suitable for use with the molecular/biological circuits described herein.
Inducible promoters useful in some embodiments of the molecular/biological circuits and methods of use disclosed herein also include those that are repressed by “transcriptional repressors” that are subject to inactivation by the action of environmental, external agents, or the product of another gene. Such inducible promoters can also be termed “repressible promoters” where it is required to distinguish between other types of promoters in a given module or component of a molecular/biological circuit described herein. Examples include prokaryotic repressors that can transcriptionally repress eukaryotic promoters that have been engineered to incorporate appropriate repressor-binding operator sequences.
In some embodiments, repressors for use in the circuits described herein are sensitive to inactivation by a physiologically benign agent. Thus, where a lac repressor protein is used to control the expression of a promoter sequence that has been engineered to contain a lacO operator sequence, treatment of the host cell with IPTG will cause the dissociation of the lac repressor from the engineered promoter containing a lacO operator sequence and allow transcription to occur. Similarly, where a tet repressor is used to control the expression of a promoter sequence that has been engineered to contain a tetO operator sequence, treatment of the host cell with tetracycline or doxycycline will cause the dissociation of the tet repressor from the engineered promoter and allow transcription of the sequence downstream of the engineered promoter to occur.
An inducible promoter useful in the methods and systems as disclosed herein can be induced by one or more physiological conditions, such as changes in pH, temperature, radiation, osmotic pressure, saline gradients, cell surface binding, and the concentration of one or more extrinsic or intrinsic inducing agents. The extrinsic inducer or inducing agent can comprise amino acids and amino acid analogs, saccharides and polysaccharides, nucleic acids, protein transcriptional activators and repressors, cytokines, toxins, petroleum-based compounds, metal containing compounds, salts, ions, enzyme substrate analogs, hormones, and combinations thereof. In specific embodiments, the inducible promoter is activated or repressed in response to a change of an environmental condition, such as the change in concentration of a chemical, metal, temperature, radiation, nutrient or change in pH. Thus, an inducible promoter useful in the molecular/biological circuits, methods and systems as disclosed herein can be a phage inducible promoter, nutrient inducible promoter, temperature inducible promoter, radiation inducible promoter, metal inducible promoter, hormone inducible promoter, steroid inducible promoter, and/or hybrids and combinations thereof.
Promoters that are inducible by ionizing radiation can be used in certain embodiments, where gene expression is induced locally in a cell by exposure to ionizing radiation such as UV or x-rays. Radiation inducible promoters include the non-limiting examples of fos promoter, c-jun promoter or at least one CArG domain of an Egr-1 promoter. Further non-limiting examples of inducible promoters include promoters from genes such as cytochrome P450 genes, inducible heat shock protein genes, metallothionein genes, hormone-inducible genes, such as the estrogen gene promoter, and such. In further embodiments, an inducible promoter useful in the methods and systems as described herein can be Zn2+ metallothionein promoter, metallothionein-1 promoter, human metallothionein IIA promoter, lac promoter, lacO promoter, mouse mammary tumor virus early promoter, mouse mammary tumor virus LTR promoter, triose dehydrogenase promoter, herpes simplex virus thymidine kinase promoter, simian virus 40 early promoter or retroviral myeloproliferative sarcoma virus promoter. Examples of inducible promoters also include mammalian probasin promoter, lactalbumin promoter, GRP78 promoter, or the bacterial tetracycline-inducible promoter. Other examples include phorbol ester, adenovirus E1A element, interferon, and serum inducible promoters.
Inducible promoters useful in the functional modules and molecular/biological circuits as described herein for in vivo uses can include those responsive to biologically compatible agents, such as those that are usually encountered in defined animal tissues or cells. An example is the human PAI-1 promoter, which is inducible by tumor necrosis factor. Further suitable examples include cytochrome P450 gene promoters, inducible by various toxins and other agents; heat shock protein genes, inducible by various stresses; hormone-inducible genes, such as the estrogen gene promoter, and such.
The administration or removal of an inducer or repressor as disclosed herein results in a switch between the “on” or “off” states of the transcription of the operably linked heterologous target gene. Thus, as defined herein the “on” state, as it refers to a promoter operably linked to a nucleic acid sequence, refers to the state when the promoter is actively driving transcription of the operably linked nucleic acid sequence, i.e., the linked nucleic acid sequence is expressed. Several small molecule ligands have been shown to mediate regulated gene expression in tissue culture cells and/or in transgenic animal models. These include the FK1012 and rapamycin immunosuppressive drugs (Spencer et al., 1993; Magari et al., 1997), the progesterone antagonist mifepristone (RU486) (Wang, 1994; Wang et al., 1997), the tetracycline antibiotic derivatives (Gossen and Bujard, 1992; Gossen et al., 1995; Kistner et al., 1996), and the insect steroid hormone ecdysone (No et al., 1996). All of these references are herein incorporated by reference. By way of further example, Yao discloses in U.S. Pat. No. 6,444,871, which is incorporated herein by reference, prokaryotic elements associated with the tetracycline resistance (tet) operon, a system in which the tet repressor protein is fused with polypeptides known to modulate transcription in mammalian cells. The fusion protein is then directed to specific sites by the positioning of the tet operator sequence. For example, the tet repressor has been fused to a transactivator (VP16) and targeted to a tet operator sequence positioned upstream from the promoter of a selected gene (Gossen et al., 1992; Kim et al., 1995; Hennighausen et al., 1995). The tet repressor portion of the fusion protein binds to the operator thereby targeting the VP16 activator to the specific site where the induction of transcription is desired.
An alternative approach has been to fuse the tet repressor to the KRAB repressor domain and target this protein to an operator placed several hundred base pairs upstream of a gene. Using this system, it has been found that the chimeric protein, but not the tet repressor alone, is capable of producing a 10 to 15-fold suppression of CMV-regulated gene expression (Deuschle et al., 1995).
One example of a repressible promoter useful in the molecular/biological circuits described herein is the Lac repressor (lacR)/operator/inducer system of E. coli that has been used to regulate gene expression by three different approaches: (1) prevention of transcription initiation by properly placed lac operators at promoter sites (Hu and Davidson, 1987; Brown et al., 1987; Figge et al., 1988; Fuerst et al., 1989; Deuschle et al., 1989); (2) blockage of transcribing RNA polymerase II during elongation by a LacR/operator complex (Deuschle et al. (1990)); and (3) activation of a promoter responsive to a fusion between LacR and the activation domain of herpes simplex virus (HSV) virion protein 16 (VP16) (Labow et al., 1990; Baim et al., 1991). In one version of the Lac system, expression of lac operator-linked sequences is constitutively activated by a LacR-VP16 fusion protein and is turned off in the presence of isopropyl-β-D-1-thiogalactopyranoside (IPTG) (Labow et al. (1990), cited supra). In another version of the system, a lacR-VP16 variant is used that binds to lac operators in the presence of IPTG, which can be enhanced by increasing the temperature of the cells (Baim et al. (1991), cited supra).
Thus, in some embodiments described herein, components of the Lac system are utilized. For example, a lac operator (LacO) can be operably linked to a tissue-specific promoter, and control the transcription and expression of the heterologous target gene and another protein, such as a repressor protein for another inducible promoter. Accordingly, the expression of the heterologous target gene is inversely regulated as compared to the expression or presence of Lac repressor in the system.
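The inverse relationship between Lac repressor level and target gene expression, and its relief by IPTG, can be illustrated with a minimal sketch. The Hill-type form, the function names, and every parameter value below (`k_i`, `k_r`, `m`, `n`, `max_rate`) are illustrative assumptions and are not taken from the circuits described herein.

```python
def active_repressor(r_total: float, iptg: float, k_i: float = 1.0, m: float = 2.0) -> float:
    """Repressor still able to bind the operator (assumed Hill-type inactivation by IPTG)."""
    return r_total / (1.0 + (iptg / k_i) ** m)


def promoter_output(r_total: float, iptg: float, max_rate: float = 1.0,
                    k_r: float = 1.0, n: float = 2.0) -> float:
    """Transcription rate from a lacO-controlled promoter (assumed Hill-type repression)."""
    r = active_repressor(r_total, iptg)
    return max_rate / (1.0 + (r / k_r) ** n)


if __name__ == "__main__":
    # More repressor lowers output; adding IPTG restores it.
    for r_total, iptg in [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 100.0)]:
        print(r_total, iptg, promoter_output(r_total, iptg))
```

Under these assumed parameters the output falls as repressor is added and recovers when inducer is supplied, which is the qualitative behavior the paragraph above describes.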
Components of the tetracycline (Tc) resistance system of E. coli have also been found to function in eukaryotic cells and have been used to regulate gene expression. For example, the Tet repressor (TetR), which binds to tet operator (tetO) sequences in the absence of tetracycline or doxycycline and represses gene transcription, has been expressed in plant cells at sufficiently high concentrations to repress transcription from a promoter containing tet operator sequences (Gatz, C. et al. (1992) Plant J. 2:397-404). In some embodiments described herein, the Tet repressor system is similarly utilized in the molecular/biological circuits described herein.
A temperature- or heat-inducible gene regulatory system can also be used in the circuits and modules described herein, such as the exemplary TIGR system comprising a cold-inducible transactivator in the form of a fusion protein having a heat shock responsive regulator, rheA, fused to the VP16 transactivator (Weber et al., 2003a). The promoter responsive to this fusion thermosensor comprises a rheO element operably linked to a minimal promoter, such as the minimal version of the human cytomegalovirus immediate early promoter. At the permissive temperature of 37° C., the cold-inducible transactivator transactivates the exemplary rheO-CMVmin promoter, permitting expression of the target gene. At 41° C., the cold-inducible transactivator no longer transactivates the rheO promoter. Any such heat-inducible or heat-regulated promoter can be used in accordance with the circuits and methods described herein, including but not limited to a heat-responsive element in a heat shock gene (e.g., hsp20-30, hsp27, hsp40, hsp60, hsp70, and hsp90). See Easton et al. (2000) Cell Stress Chaperones 5(4):276-290; Csermely et al. (1998) Pharmacol Ther 79(2):129-168; Ohtsuka & Hata (2000) Int J Hyperthermia 16(3):231-245; and references cited therein. Sequence similarity to heat shock proteins and heat-responsive promoter elements have also been recognized in genes initially characterized with respect to other functions, and the DNA sequences that confer heat inducibility are suitable for use in the disclosed gene therapy vectors. For example, expression of glucose-responsive genes (e.g., grp94, grp78, mortalin/grp75) (Merrick et al. (1997) Cancer Lett 119(2):185-190; Kiang et al. (1998) FASEB J 12(14):1571-1579), calreticulin (Szewczenko-Pawlikowski et al. (1997) Mol Cell Biochem 177(1-2):145-152), clusterin (Viard et al. (1999) J Invest Dermatol 112(3):290-296; Michel et al. (1997) Biochem J 328(Pt 1):45-50; Clark & Griswold (1997) J Androl 18(3):257-263), histocompatibility class I gene (HLA-G) (Ibrahim et al. (2000) Cell Stress Chaperones 5(3):207-218), and the Kunitz protease isoform of amyloid precursor protein (Shepherd et al. (2000) Neuroscience 99(2):317-325) are upregulated in response to heat. In the case of clusterin, a 14 base pair element that is sufficient for heat-inducibility has been delineated (Michel et al. (1997) Biochem J 328(Pt 1):45-50). Similarly, a two sequence unit comprising a 10- and a 14-base pair element in the calreticulin promoter region has been shown to confer heat-inducibility (Szewczenko-Pawlikowski et al. (1997) Mol Cell Biochem 177(1-2):145-152).
Other inducible promoters useful in the molecular/biological circuits described herein include the erythromycin-resistance regulon from E. coli, having repressible (Eoff) and inducible (Eon) systems responsive to macrolide antibiotics, such as erythromycin, clarithromycin, and roxithromycin (Weber et al., 2002). The Eoff system utilizes an erythromycin-dependent transactivator, wherein providing a macrolide antibiotic represses transgene expression. In the Eon system, the binding of the repressor to the operator results in repression of transgene expression. Thus, in the presence of macrolides, gene expression is induced.
Fussenegger et al. (2000) describe repressible and inducible systems using a Pip (pristinamycin-induced protein) repressor encoded by the streptogramin resistance operon of Streptomyces coelicolor, wherein the systems are responsive to streptogramin-type antibiotics (such as, for example, pristinamycin, virginiamycin, and Synercid). The Pip DNA-binding domain is fused to a VP16 transactivation domain or to the KRAB silencing domain, for example. The presence or absence of, for example, pristinamycin, regulates the PipON and PipOFF systems in their respective manners, as described therein.
Another example of a promoter expression system useful for the molecular/biological circuits described herein utilizes a quorum-sensing (referring to particular prokaryotic molecule communication systems having diffusible signal molecules that prevent binding of a repressor to an operator site, resulting in derepression of a target regulon) system. For example, Weber et al. (2003b) employ a fusion protein comprising the Streptomyces coelicolor quorum-sensing receptor fused to a transactivating domain that regulates a chimeric promoter having a respective operator that the fusion protein binds. The expression is fine-tuned with non-toxic butyrolactones, such as SCB1 and MP133.
In some embodiments, multiregulated, multigene gene expression systems that are functionally compatible with one another are utilized in the modules and molecular/biological circuits described herein (see, for example, Kramer et al. (2003)). For example, in Weber et al. (2002), the macrolide-responsive erythromycin resistance regulon system is used in conjunction with streptogramin (PIP)-regulated and tetracycline-regulated expression systems.
Other promoters responsive to non-heat stimuli can also be used. For example, the mortalin promoter is induced by low doses of ionizing radiation (Sadekova (1997) Int J Radiat Biol 72(6):653-660), the hsp27 promoter is activated by 17-β-estradiol and estrogen receptor agonists (Porter et al. (2001) J Mol Endocrinol 26(1):31-42), the HLA-G promoter is induced by arsenite, and hsp promoters can be activated by photodynamic therapy (Luna et al. (2000) Cancer Res 60(6):1637-1644). A suitable promoter can incorporate factors such as tissue-specific activation. For example, hsp70 is transcriptionally impaired in stressed neuroblastoma cells (Drujan & De Maio (1999) 12(6):443-448) and the mortalin promoter is up-regulated in human brain tumors (Takano et al. (1997) Exp Cell Res 237(1):38-45). A promoter employed in methods described herein can show selective up-regulation in tumor cells as described, for example, for mortalin (Takano et al. (1997) Exp Cell Res 237(1):38-45), hsp27 and calreticulin (Szewczenko-Pawlikowski et al. (1997) Mol Cell Biochem 177(1-2):145-152; Yu et al. (2000) Electrophoresis 21(14):3058-3068), grp94 and grp78 (Gazit et al. (1999) Breast Cancer Res Treat 54(2):135-146), and hsp27, hsp70, hsp73, and hsp90 (Cardillo et al. (2000) Anticancer Res 20(6B):4579-4583; Strik et al. (2000) Anticancer Res 20(6B):4457-4552).
In some exemplary embodiments of the circuits described herein, an inducible promoter is an arabinose-inducible promoter PBAD comprising the sequence:
In some exemplary embodiments of the circuits described herein, an inducible promoter is a LuxR-inducible promoter PLuxR comprising the sequence:
In some exemplary embodiments of the circuits described herein, an inducible promoter is a mutated LuxR-targeted promoter with modulated binding efficiency for LuxR, such as, for example,
In some exemplary embodiments of the circuits described herein, the inducible promoter comprises an Anhydrotetracycline (aTc)-inducible promoter as provided in PLtetO-1 (Pubmed Nucleotide# U66309) with the sequence comprising:
In some exemplary embodiments of the circuits described herein, the inducible promoter is an isopropyl β-D-1-thiogalactopyranoside (IPTG) inducible promoter. In one embodiment, the IPTG-inducible promoter comprises the PTAC sequence found in the vector encoded by PubMed Accession ID #EU546824. In one embodiment, the IPTG-inducible promoter sequence comprises the PTrc-2 sequence:
In some exemplary embodiments of the circuits described herein, the IPTG-inducible promoter comprises the PLlacO-1 sequence:
In some exemplary embodiments of the circuits described herein, the IPTG-inducible promoter comprises the PAllacO-1 sequence:
In some exemplary embodiments of the circuits described herein, the IPTG-inducible promoter comprises the Plac/ara-1 sequence
In some exemplary embodiments, the inducible promoter sequence comprises the PLs1con sequence:
Other non-limiting examples of promoters that are useful in the molecular circuits described herein are provided in Tables 1-36.
In addition to the above-described promoter sequences, the molecular circuits and modular functional blocks described herein can comprise, in addition, one or more molecular species, including, but not limited to, ribosome binding sequences, degradation tag sequences, translational terminator sequences, and anti-sense sequences, that are added to, for example, enhance translation of mRNA sequences for protein synthesis, prevent further transcription downstream of an encoded protein, or enhance degradation of an mRNA sequence or protein sequence. Such additional molecular species, by enhancing the fidelity and accuracy of the molecular circuits described herein, permit, for example, increased numbers and combinations of molecular circuits and improve the capabilities of the molecular circuits described herein. Known enhancer and repressor sequences from promoter regions or intronic regions and their corresponding regulatory proteins or RNAs can also be used to regulate, e.g., transcription.
Ribosome binding sites (RBS) are sequences that promote efficient and accurate translation of mRNAs for protein synthesis, and are also provided for use as molecular species in the molecular circuits and modular functional blocks described herein to enable modulation of the efficiency and rates of synthesis of the proteins encoded by the molecular circuits and modular functional blocks. An RBS affects the translation rate of an open reading frame in two main ways—i) the rate at which ribosomes are recruited to the mRNA and initiate translation is dependent on the sequence of the RBS, and ii) the RBS can also affect the stability of the mRNA, thereby affecting the number of proteins made over the lifetime of the mRNA. Accordingly, one or more ribosome binding site sequences (RBS) can be added to the molecular circuits and modular functional blocks described herein to control expression of proteins, such as transcription factors or protein output products.
Translation initiation in prokaryotes is a complex process involving the ribosome, the mRNA, and several other proteins, such as initiation factors, as described in Laursen B S, et al., Microbiol Mol Biol Rev 2005 March; 69(1) 101-23. Translation initiation can be broken down into two major steps: i) binding of the ribosome and associated factors to the mRNA, and ii) conversion of the bound ribosome into a translating ribosome that proceeds processively along the mRNA. The rate of the first step can be increased by making the RBS highly complementary to the free end of the 16S rRNA and by ensuring that the start codon is AUG. The rate of ribosome binding can also be increased by ensuring that there is minimal secondary structure in the neighborhood of the RBS. Since binding between the RBS and the ribosome is mediated by base-pairing interactions, competition for the RBS from other sequences on the mRNA can reduce the rate of ribosome binding. The rate of the second step in translation initiation, conversion of the bound ribosome into an initiation complex, is dependent on the spacing between the RBS and the start codon being optimal (5-6 bp).
Thus, a “ribosome binding site” (“RBS”), as defined herein, is a segment of the 5′ (upstream) part of an mRNA molecule that binds to the ribosome to position the message correctly for the initiation of translation. The RBS controls the accuracy and efficiency with which the translation of mRNA begins. In prokaryotes (such as E. coli) the RBS typically lies about 7 nucleotides upstream from the start codon (i.e., the first AUG). The sequence itself in general is called the “Shine-Dalgarno” sequence after its discoverers, regardless of the exact identity of the bases. Strong Shine-Dalgarno sequences are rich in purines (A's, G's), and the “Shine-Dalgarno consensus” sequence, derived statistically from lining up many well-characterized strong ribosome binding sites, has the sequence AGGAGG. The complementary sequence (CCUCCU) occurs at the 3′-end of the structural RNA (“16S”) of the small ribosomal subunit and it base-pairs with the Shine-Dalgarno sequence in the mRNA to facilitate proper initiation of protein synthesis. In some embodiments of the aspects described herein, a ribosome binding site (RBS) is added to a molecular circuit to regulate expression of a protein encoded by the circuit.
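The relationship between the Shine-Dalgarno consensus, its 16S rRNA complement, and the distance to the start codon can be checked with a short sketch. The function names and the example mRNA are hypothetical, and the search accepts only exact matches to the consensus, which is a simplifying assumption.

```python
SD_CONSENSUS = "AGGAGG"  # Shine-Dalgarno consensus quoted in the text (mRNA sense)


def reverse_complement_rna(seq: str) -> str:
    """Return the antiparallel Watson-Crick partner of an RNA sequence."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_sd_spacing(mrna: str, start_codon: str = "AUG"):
    """Return (sd_index, start_index, spacing) for the first start codon that has
    an exact consensus match upstream of it, or None if there is none.
    Spacing counts the nucleotides between the end of the motif and the start codon."""
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find(start_codon)
    while start != -1:
        sd = mrna.rfind(SD_CONSENSUS, 0, start)
        if sd != -1:
            return sd, start, start - (sd + len(SD_CONSENSUS))
        start = mrna.find(start_codon, start + 1)
    return None


if __name__ == "__main__":
    example = "UUU" + SD_CONSENSUS + "UUUUUUU" + "AUGGCU"  # hypothetical 5' region
    print(reverse_complement_rna(SD_CONSENSUS))
    print(find_sd_spacing(example))
```

In this hypothetical example the motif ends seven nucleotides before the AUG, matching the typical position described above.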
For protein synthesis in eukaryotes and eukaryotic cells, the 5′ end of the mRNA has a modified chemical structure (“cap”) recognized by the ribosome, which then binds the mRNA and moves along it (“scans”) until it finds the first AUG codon. A characteristic pattern of bases (called a “Kozak sequence”) is sometimes found around that codon and assists in positioning the mRNA correctly in a manner reminiscent of the Shine-Dalgarno sequence, but does not involve base pairing with the ribosomal RNA.
RBSs can include only a portion of the Shine-Dalgarno sequence. When looking at the spacing between the RBS and the start codon, the aligned spacing rather than just the absolute spacing is important. In essence, if only a portion of the Shine-Dalgarno sequence is included in the RBS, the spacing that matters is between wherever the center of the full Shine-Dalgarno sequence would be and the start codon rather than between the included portion of the Shine-Dalgarno sequence and the start codon.
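The difference between aligned and absolute spacing can be made concrete with a small sketch. It assumes the partial motif is an exact substring of the AGGAGG consensus and, where the substring could align at more than one offset (for example "AGG"), takes the first offset; the function name and the example sequences are hypothetical.

```python
SD_CONSENSUS = "AGGAGG"


def aligned_spacing(mrna: str, partial_sd: str, start_codon: str = "AUG"):
    """Distance from the center of where the full consensus would sit to the
    first base of the start codon, or None if motif or start codon is absent."""
    mrna = mrna.upper().replace("T", "U")
    partial_sd = partial_sd.upper().replace("T", "U")
    offset = SD_CONSENSUS.find(partial_sd)  # first possible alignment (assumption)
    if offset == -1:
        raise ValueError("partial_sd must be a substring of the consensus")
    start = mrna.find(start_codon)
    if start == -1:
        return None
    pos = mrna.rfind(partial_sd, 0, start)  # motif occurrence nearest the start codon
    if pos == -1:
        return None
    implied_full_start = pos - offset       # where the full AGGAGG would begin
    center = implied_full_start + len(SD_CONSENSUS) / 2
    return start - center


if __name__ == "__main__":
    full = "UUU" + "AGGAGG" + "UUUUUUU" + "AUGGCU"
    partial = "UUU" + "CGGAGC" + "UUUUUUU" + "AUGGCU"  # only GGAG retained
    print(aligned_spacing(full, "AGGAGG"), aligned_spacing(partial, "GGAG"))
```

Both hypothetical sequences give the same aligned spacing even though the retained portion of the motif ends at a different absolute distance from the start codon.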
While the Shine-Dalgarno portion of the RBS is critical to the strength of the RBS, the sequence upstream of the Shine-Dalgarno sequence is also important. One of the ribosomal proteins, S1, is known to bind to adenine bases upstream from the Shine-Dalgarno sequence. As a result, in some embodiments of the molecular circuits and modular functional blocks described herein, an RBS can be made stronger by adding more adenines to the sequence upstream of the RBS. A promoter may add some bases onto the start of the mRNA that may affect the strength of the RBS by affecting S1 binding.
In addition, the degree of secondary structure can affect the translation initiation rate. This fact can be used to produce regulated translation initiation rates, as described in Isaacs F J et al., Nat Biotechnol 2004 July; 22(7) 841-7.
In addition to affecting the translation rate per unit time, an RBS can affect the level of protein synthesis in a second way. That is because the stability of the mRNA affects the steady state level of mRNA, i.e., a stable mRNA will have a higher steady state level than an unstable mRNA that is being produced at an identical rate. Since the primary sequence and the secondary structure of an RBS (for example, the RBS could introduce an RNase site) can affect the stability of the mRNA, the RBS can affect the amount of mRNA and hence the amount of protein that is synthesized.
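The dependence of steady state levels on mRNA stability can be sketched with a simple first-order model, in which mRNA is produced at a constant rate and degraded in proportion to its abundance, and protein is translated from mRNA and degraded likewise. The model form, the function name, and all rate values are assumptions chosen for illustration only.

```python
def steady_state_levels(k_tx: float, mrna_deg: float, k_tl: float, prot_deg: float):
    """Steady state of dm/dt = k_tx - mrna_deg*m and dp/dt = k_tl*m - prot_deg*p."""
    m_ss = k_tx / mrna_deg
    p_ss = k_tl * m_ss / prot_deg
    return m_ss, p_ss


if __name__ == "__main__":
    # Identical production rates; only the mRNA degradation rate differs.
    stable = steady_state_levels(k_tx=2.0, mrna_deg=0.1, k_tl=5.0, prot_deg=0.05)
    unstable = steady_state_levels(k_tx=2.0, mrna_deg=0.4, k_tl=5.0, prot_deg=0.05)
    print("stable mRNA:", stable)
    print("unstable mRNA:", unstable)
```

In this sketch a fourfold faster mRNA degradation rate lowers both the mRNA and the protein steady state levels by the same factor.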
A “regulated RBS” is an RBS for which the binding affinity of the RBS and the ribosome can be controlled, thereby changing the RBS strength. One strategy for regulating the strength of prokaryotic RBSs is to control the accessibility of the RBS to the ribosome. By occluding the RBS in RNA secondary structure, translation initiation can be significantly reduced. By contrast, by reducing secondary structure and revealing the RBS, translation initiation rate can be increased. Isaacs and coworkers engineered mRNA sequences with an upstream sequence partially complementary to the RBS. Base-pairing between the upstream sequence and the RBS ‘locks’ the RBS off. A ‘key’ RNA molecule that disrupts the mRNA secondary structure by preferentially base-pairing with the upstream sequence can be used to expose the RBS and increase translation initiation rate.
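The lock and key arrangement can be illustrated with a toy base-pairing check. This sketch only counts Watson-Crick pairs between equal-length antiparallel sequences; it does not compute RNA folding energies, and the function names, the 0.5 threshold, and the example sequences are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_fraction(seq: str, partner: str) -> float:
    """Fraction of positions in `seq` that pair with `partner` read antiparallel."""
    seq = seq.upper().replace("T", "U")
    partner = partner.upper().replace("T", "U")
    if len(seq) != len(partner):
        raise ValueError("sequences must have equal length in this toy model")
    paired = sum((a, b) in WATSON_CRICK for a, b in zip(seq, reversed(partner)))
    return paired / len(seq)


def rbs_exposed(rbs: str, lock: str, key=None, threshold: float = 0.5) -> bool:
    """True if the RBS is free: either the lock pairs too weakly with the RBS,
    or a key pairs with the lock better than the RBS does."""
    lock_strength = paired_fraction(rbs, lock)
    if lock_strength < threshold:
        return True
    return key is not None and paired_fraction(lock, key) > lock_strength


if __name__ == "__main__":
    rbs, lock, key = "AGGAGG", "CCUCAA", "UUGAGG"  # lock is partially complementary
    print(rbs_exposed(rbs, lock), rbs_exposed(rbs, lock, key))
```

Here the lock pairs with four of the six RBS bases, while the key is fully complementary to the lock, so the key displaces the lock and exposes the RBS in this simplified picture.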
Accordingly, in some embodiments of the aspects described herein, a ribosome binding site (RBS) for use as molecular species in the molecular circuits and modular functional blocks described herein comprises a sequence that is selected from the group consisting of those provided in the MIT Parts Registry. In some embodiments of the aspects described herein, novel ribosome binding sites can be generated using automated design of synthetic ribosome sites, as described in Salis H M et al., Nature Biotechnology 27, 946-950 (2009).
Terminators are sequences that usually occur at the end of a gene or operon and cause transcription to stop, and are also provided for use as molecular species in the molecular circuits and modular functional blocks described herein to regulate transcription and prevent transcription from occurring in an unregulated fashion, i.e., a terminator sequence prevents activation of downstream modules by upstream promoters. A “terminator” or “termination signal”, as described herein, is comprised of the DNA sequences involved in specific termination of an RNA transcript by an RNA polymerase. Thus, in certain embodiments a terminator that ends the production of an RNA transcript is contemplated for use as a molecular species. A terminator can be necessary in vivo to achieve desirable message levels.
In prokaryotes, terminators usually fall into two categories: (1) rho-independent terminators and (2) rho-dependent terminators. Rho-independent terminators are generally composed of a palindromic sequence that forms a stem loop rich in G-C base pairs followed by several T bases. Without wishing to be bound by a theory, the conventional model of transcriptional termination is that the stem loop causes RNA polymerase to pause, and transcription of the poly-A tail causes the RNA:DNA duplex to unwind and dissociate from RNA polymerase.
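The structural signature of a rho-independent terminator described above (a G-C rich inverted repeat followed by a run of T bases) can be searched for with a simple pattern scan. This is a minimal sketch, not a validated terminator predictor: the stem length, loop range, T-run length, and G-C threshold are all assumed values, and only perfect inverted repeats are detected.

```python
def reverse_complement_dna(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def find_terminator_candidates(dna: str, stem_len: int = 6, loop_range=(3, 8),
                               min_t_run: int = 4, min_gc: float = 0.5):
    """Return (stem_start, loop_length, t_run_start) for every perfect, G-C rich
    inverted repeat that is immediately followed by a run of T bases."""
    dna = dna.upper()
    hits = []
    last_start = len(dna) - 2 * stem_len - loop_range[0] - min_t_run
    for i in range(last_start + 1):
        left = dna[i:i + stem_len]
        gc_fraction = (left.count("G") + left.count("C")) / stem_len
        if gc_fraction < min_gc:
            continue
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem_len + loop                 # start of the right stem arm
            right = dna[j:j + stem_len]
            if len(right) < stem_len:
                break
            if right != reverse_complement_dna(left):
                continue
            tail = dna[j + stem_len:j + stem_len + min_t_run]
            if tail == "T" * min_t_run:
                hits.append((i, loop, j + stem_len))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: stem GCCGCC, loop TTCG, stem GGCGGC, then six T bases.
    example = "CC" + "GCCGCC" + "TTCG" + "GGCGGC" + "TTTTTT" + "AA"
    print(find_terminator_candidates(example))
```

The scan reports where the stem begins, the loop length, and where the T run starts; a sequence with the same hairpin but no trailing T run yields no candidates.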
The most commonly used type of terminator is a forward terminator. When placed downstream of a nucleic acid sequence that is usually transcribed, a forward transcriptional terminator will cause transcription to abort. In some embodiments, bidirectional transcriptional terminators are provided. Such terminators will usually cause transcription to terminate on both the forward and reverse strand. Finally, in some embodiments, reverse transcriptional terminators are provided that terminate transcription on the reverse strand only.
In eukaryotic systems, the terminator region can also comprise specific DNA sequences that permit site-specific cleavage of the new transcript so as to expose a polyadenylation site. This signals a specialized endogenous polymerase to add a stretch of about 200 A residues (polyA) to the 3′ end of the transcript. RNA molecules modified with this polyA tail appear to be more stable and are translated more efficiently. Thus, in those embodiments involving eukaryotes, it is preferred that a terminator comprises a signal for the cleavage of the RNA, and it is more preferred that the terminator signal promotes polyadenylation of the message. The terminator and/or polyadenylation site elements can serve to enhance message levels and/or to minimize read through between modules of the biological converter switches. As disclosed herein, terminators contemplated for use in molecular circuits and modular functional blocks, and methods of use thereof can include any known terminator of transcription described herein or known to one of ordinary skill in the art. Such terminators include, but are not limited to, the termination sequences of genes, such as for example, the bovine growth hormone terminator, or viral termination sequences, such as for example, the SV40 terminator. In certain embodiments, the termination signal encompasses a lack of transcribable or translatable sequence, such as due to a sequence truncation. The terminator used can be unidirectional or bidirectional.
Terminators for use as molecular species in the molecular circuits and modular functional blocks described herein can be selected from the non-limiting examples of Tables 37-41.
S. cerevisiae
In some embodiments of the aspects described herein, a nucleic acid sequence encoding a protein degradation tag can be added as a molecular species to the molecular circuits and modular functional blocks described herein to enhance degradation of a protein. As defined herein, a “degradation tag” is a genetic addition to the end of a nucleic acid sequence that modifies the protein that is expressed from that sequence, such that the protein undergoes faster degradation by cellular degradation mechanisms. Such protein degradation tags ‘mark’ a protein for degradation, thereby decreasing the protein's half-life.
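The effect of a shortened half-life can be illustrated with a minimal first-order decay sketch. The model (simple exponential decay, ignoring dilution by cell growth) and the two half-life values are assumptions chosen only for illustration; they are not measurements from this disclosure.

```python
import math


def decay_rate(half_life_min: float) -> float:
    """First-order decay constant (1/min) corresponding to a half-life."""
    return math.log(2) / half_life_min


def fraction_remaining(half_life_min: float, t_min: float) -> float:
    """Fraction of a protein pool left t_min minutes after synthesis stops."""
    return math.exp(-decay_rate(half_life_min) * t_min)


# Hypothetical half-lives (minutes), chosen only to contrast the two cases.
untagged = fraction_remaining(half_life_min=600.0, t_min=60.0)
tagged = fraction_remaining(half_life_min=30.0, t_min=60.0)

print(f"untagged protein remaining after 60 min: {untagged:.2f}")
print(f"tagged protein remaining after 60 min:   {tagged:.2f}")
```

Under these assumed values, the tagged protein pool has passed through two half-lives after one hour, whereas the untagged pool is nearly unchanged, which is why a tagged reporter can follow changes in gene activity more closely in time.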
One of the useful aspects of degradation tags is the ability to detect and regulate gene activity in a time-sensitive manner. Such protein degradation tags can operate through the use of protein-degrading enzymes, such as proteases, within the cell. In some embodiments, the tags encode for a sequence of about eleven amino acids at the C-terminus of a protein, wherein said sequence is normally generated in E. coli when a ribosome gets stuck on a broken (“truncated”) mRNA. Without a normal termination codon, the ribosome can't detach from the defective mRNA. A special type of RNA known as ssrA (“small stable RNA A”) or tmRNA (“transfer-messenger RNA”) rescues the ribosome by adding the degradation tag followed by a stop codon. This allows the ribosome to break free and continue functioning. The tagged, incomplete protein can get degraded by the proteases ClpXP or ClpAP. Although the ssrA/tmRNA tag was initially found to comprise eleven amino acids, the efficacy of mutating the last three amino acids of the tag has been tested. Thus, the tags AAV, ASV, LVA, and LAA are designated by only their last three amino acids.
In some exemplary embodiments of the aspects described herein, the protein degradation tag is an ssrA tag. In some embodiments of the aspects described herein, the ssrA tag comprises a sequence that is selected from the group consisting of sequences that encode for the peptides RPAANDENYALAA (SEQ ID NO: 815), RPAANDENYALVA (SEQ ID NO: 816), RPAANDENYAAAV (SEQ ID NO: 817), and RPAANDENYAASV (SEQ ID NO: 818).
In some exemplary embodiments of the aspects described herein, the protein degradation tag is an LAA variant comprising the sequence GCAGCAAACGACGAAAACTACGCTTTAGCAGCTTAA (SEQ ID NO: 819). In some embodiments of the aspects described herein, the protein degradation tag is an AAV variant comprising the sequence GCAGCAAACGACGAAAACTACGCTGCAGCAGTTTAA (SEQ ID NO: 820). In some exemplary embodiments of the aspects described herein, the protein degradation tag is an ASV variant comprising the sequence GCAGCAAACGACGAAAACTACGCTGCATCAGTTTAA (SEQ ID NO: 821).
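As a quick consistency check, the three nucleotide sequences above can be translated with the standard genetic code. The following is a minimal sketch; the helper names (`CODON_TABLE`, `translate`, `TAGS`) are illustrative and not part of this disclosure.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Tag-encoding sequences as listed above (SEQ ID NOs: 819-821).
TAGS = {
    "LAA": "GCAGCAAACGACGAAAACTACGCTTTAGCAGCTTAA",
    "AAV": "GCAGCAAACGACGAAAACTACGCTGCAGCAGTTTAA",
    "ASV": "GCAGCAAACGACGAAAACTACGCTGCATCAGTTTAA",
}

peptides = {name: translate(seq) for name, seq in TAGS.items()}
for name, peptide in peptides.items():
    print(name, peptide)
```

Each sequence translates to eleven residues followed by a stop codon, and the three variants differ only in their last three residues (LAA, AAV, and ASV).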
Also provided herein are a variety of biological outputs for use as molecular species in the various molecular circuits and modular functional blocks described herein. These biological outputs, or “output products,” as defined herein, refer to products that can be used as markers of specific states of the molecular circuits and modular functional blocks described herein, or as the output product of one modular block that becomes the input molecular species for a subsequent modular block. An output sequence for use as a molecular species can encode for a protein or an RNA molecule that is used to track or mark the state of the cell upon receiving a particular input for a molecular circuit. Such output products can be used to distinguish between various states of a cell.
Double-stranded RNA (dsRNA) has been shown to direct the sequence-specific silencing of mRNA through a process known as RNA interference (RNAi). The process occurs in a wide variety of organisms, including mammals and other vertebrates. Accordingly, in some embodiments of the aspects described herein, sequences encoding RNA molecules can be used as molecular species or components or output products in the molecular circuits and modular functional blocks. Such RNA molecules can be double-stranded or single-stranded and are designed, in some embodiments, to mediate RNAi, e.g., with respect to another output product or molecular species. In those embodiments where a sequence encodes an RNA molecule that acts to mediate RNAi, the sequence can be said to encode an “iRNA molecule.”
In some embodiments, an iRNA molecule can have any architecture described herein, e.g., it can incorporate an overhang structure, a hairpin or other single strand structure, or a two-strand structure, as described herein. An “iRNA molecule” as used herein, is an RNA molecule which can by itself, or which can be cleaved into an RNA agent that can, downregulate the expression of a target sequence, e.g., an output product encoded by another molecular circuit or modular functional block, as described herein. While not wishing to be bound by theory, an iRNA molecule can act by one or more of a number of mechanisms, including post-transcriptional cleavage of a target mRNA sometimes referred to in the art as RNAi, or pre-transcriptional or pre-translational mechanisms. An iRNA molecule can include a single strand or can include more than one strand, e.g., it can be a double stranded iRNA molecule.
The sequence encoding an iRNA molecule should include a region of sufficient homology to a target sequence, and be of sufficient length in terms of nucleotides, such that the iRNA molecule, or a fragment thereof, can mediate down regulation of the target sequence. Thus, the iRNA molecule is or includes a region that is at least partially, and in some embodiments fully, complementary to a target RNA sequence. It is not necessary that there be perfect complementarity between the iRNA molecule and the target sequence, but the correspondence must be sufficient to enable the iRNA molecule, or a cleavage product thereof, to direct sequence specific silencing, e.g., by RNAi cleavage of the target RNA sequence, e.g., mRNA.
Complementarity, or degree of homology with the target strand, is most critical in the antisense strand. While perfect complementarity, particularly in the antisense strand, is often desired, some embodiments can include, particularly in the antisense strand, one or more but preferably 6, 5, 4, 3, 2, or fewer mismatches (with respect to the target RNA). The mismatches, particularly in the antisense strand, are most tolerated in the terminal regions and if present are preferably in a terminal region or regions, e.g., within 6, 5, 4, or 3 nucleotides of the 5′ and/or 3′ terminus. The sense strand need only be sufficiently complementary with the antisense strand to maintain the overall double strand character of the molecule.
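The mismatch criteria above can be sketched in code as follows. This is a minimal illustration, assuming strict Watson-Crick pairing only (G:U wobble pairs are counted as mismatches) and an antisense strand of the same length as the target site; the target sequence and all function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def mismatch_positions(antisense: str, target_site: str) -> list[int]:
    """1-based antisense positions (from its 5' end) that fail to pair.

    Both strands are written 5'->3', so antisense position i pairs with
    the i-th base counted from the 3' end of the target site.
    """
    if len(antisense) != len(target_site):
        raise ValueError("strands must have equal length in this sketch")
    return [
        i
        for i, (a, t) in enumerate(zip(antisense, reversed(target_site)), start=1)
        if COMPLEMENT[a] != t
    ]


def all_terminal(positions: list[int], length: int, window: int = 6) -> bool:
    """True if every mismatch lies within `window` nt of either terminus."""
    return all(p <= window or p > length - window for p in positions)


# Hypothetical 21-nt target site, for illustration only.
target = "GCAUGCUAGCUAGGAUCCAUG"
perfect = reverse_complement(target)
# Introduce a single mismatch at antisense position 2.
mutated = perfect[0] + ("A" if perfect[1] != "A" else "C") + perfect[2:]

print(mismatch_positions(perfect, target))  # no mismatches
print(mismatch_positions(mutated, target))  # one mismatch near the 5' end
```

A design rule such as "at most a few mismatches, all in terminal regions" then reduces to checking the length of the returned list and the result of `all_terminal`.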
iRNA molecules for use in the molecular circuits and modular functional blocks described herein include: molecules that are long enough to trigger the interferon response (which can be cleaved by Dicer (Bernstein et al. 2001. Nature, 409:363-366) and enter a RISC (RNA-induced silencing complex)); and molecules that are sufficiently short that they do not trigger the interferon response (which molecules can also be cleaved by Dicer and/or enter a RISC), e.g., molecules that are of a size which allows entry into a RISC, e.g., molecules which resemble Dicer-cleavage products. Molecules that are short enough that they do not trigger an interferon response are termed “sRNA molecules” or “shorter iRNA molecules” herein. Accordingly, an sRNA molecule or shorter iRNA molecule, as used herein, refers to an iRNA molecule, e.g., a double stranded RNA molecule or single strand molecule, that is sufficiently short that it does not induce a deleterious interferon response in a mammalian cell, such as a human cell, e.g., it has a duplexed region of less than 60 but preferably less than 50, 40, or 30 nucleotide pairs. The sRNA molecule, or a cleavage product thereof, can downregulate a target sequence, e.g., by inducing RNAi with respect to a target RNA sequence.
Each strand of an sRNA molecule can be equal to or less than 30, 25, 24, 23, 22, 21, or 20 nucleotides in length. The strand is preferably at least 19 nucleotides in length. For example, each strand can be between 21 and 25 nucleotides in length. Preferred sRNA molecules have a duplex region of 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotide pairs, and one or more overhangs, preferably one or two 3′ overhangs, of 2-3 nucleotides.
A “single strand iRNA molecule” as used herein, is an iRNA molecule that is made up of a single molecule. It may include a duplexed region, formed by intra-strand pairing, e.g., it may be, or include, a hairpin or pan-handle structure. Single strand iRNA molecules are preferably antisense with regard to the target sequence. A single strand iRNA molecule should be sufficiently long that it can enter the RISC and participate in RISC mediated cleavage of a target mRNA. A single strand iRNA molecule for use in the modules and biological converter switches described herein is at least 14, and more preferably at least 15, 20, 25, 29, 35, 40, or 50 nucleotides in length. It is preferably less than 200, 100, or 60 nucleotides in length.
Hairpin iRNA molecules can have a duplex region equal to or at least 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotide pairs. The duplex region is preferably equal to or less than 200, 100, or 50 nucleotide pairs in length. Preferred ranges for the duplex region are 15-30, 17 to 23, 19 to 23, and 19 to 21 nucleotide pairs in length. The hairpin preferably has a single strand overhang or terminal unpaired region, preferably the 3′, and preferably of the antisense side of the hairpin. Preferred overhangs are 2-3 nucleotides in length.
A “double stranded (ds) iRNA molecule” as used herein, refers to an iRNA molecule that includes more than one, and preferably two, strands in which interchain hybridization can form a region of duplex structure. The antisense strand of a double stranded iRNA molecule should be equal to or at least 14, 15, 16, 17, 18, 19, 25, 29, 40, or 60 nucleotides in length. It should be equal to or less than 200, 100, or 50 nucleotides in length. Preferred ranges are 17 to 25, 19 to 23, and 19 to 21 nucleotides in length. The sense strand of a double stranded iRNA molecule should be equal to or at least 14, 15, 16, 17, 18, 19, 25, 29, 40, or 60 nucleotides in length. It should be equal to or less than 200, 100, or 50 nucleotides in length. Preferred ranges are 17 to 25, 19 to 23, and 19 to 21 nucleotides in length. The double strand portion of a double stranded iRNA molecule should be equal to or at least 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 29, 40, or 60 nucleotide pairs in length. It should be equal to or less than 200, 100, or 50 nucleotide pairs in length. Preferred ranges are 15-30, 17 to 23, 19 to 23, and 19 to 21 nucleotide pairs in length.
In some embodiments, the ds iRNA molecule is sufficiently large that it can be cleaved by an endogenous molecule, e.g., by Dicer, to produce smaller ds iRNA agents, e.g., sRNA agents.
It is preferred that the sense and antisense strands be chosen such that the ds iRNA molecule includes a single strand or unpaired region at one or both ends of the molecule. Thus, an iRNA agent contains sense and antisense strands, preferably paired to contain an overhang, e.g., one or two 5′ or 3′ overhangs but preferably a 3′ overhang of 2-3 nucleotides. Most embodiments have a 3′ overhang. Preferred sRNA molecules have single-stranded overhangs, preferably 3′ overhangs, of 1 or preferably 2 or 3 nucleotides in length at each end. The overhangs can be the result of one strand being longer than the other, or the result of two strands of the same length being staggered. 5′ ends are preferably phosphorylated.
Preferred lengths for the duplexed region are between 15 and 30, most preferably 18, 19, 20, 21, 22, and 23 nucleotides in length, e.g., in the sRNA molecule range discussed above. sRNA molecules can resemble in length and structure the natural Dicer processed products from long dsRNAs. Hairpin, or other single strand structures which provide the required double stranded region, and preferably a 3′ overhang are also encompassed within the term sRNA molecule, as used herein.
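The case of two strands of the same length being staggered can be illustrated with a short sketch. It assumes a hypothetical 23-nucleotide target region and derives two 21-nucleotide strands that pair over 19 positions and leave a 2-nucleotide 3′ overhang at each end; the sequence and function names are illustrative only, and the overhang nucleotides here are simply taken from the target region.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def staggered_duplex(site: str, overhang: int = 2) -> tuple[str, str]:
    """Return (sense, antisense), both written 5'->3'.

    The sense strand omits the first `overhang` bases of the target region;
    the antisense strand is complementary to the region minus its last
    `overhang` bases. Each strand therefore ends in an unpaired 3' overhang.
    """
    if len(site) <= 2 * overhang:
        raise ValueError("target region too short for the requested overhang")
    sense = site[overhang:]
    antisense = reverse_complement(site[: len(site) - overhang])
    return sense, antisense


# Hypothetical 23-nt target region, for illustration only.
SITE = "AAGCAUGCUAGCUAGGAUCCAUG"
sense, antisense = staggered_duplex(SITE, overhang=2)
duplex_bp = len(SITE) - 2 * 2

print("sense     5'-" + sense + "-3'")
print("antisense 5'-" + antisense + "-3'")
print("duplex length (bp):", duplex_bp)
```

In this sketch both strands have the same length, and the overhangs arise purely from the two-nucleotide offset between them.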
The iRNA molecules described herein, including ds iRNA molecules and sRNA molecules, can mediate silencing of a target RNA, e.g., mRNA, e.g., a transcript of a sequence that encodes a protein expressed in one or more modules or biological converter switches as described herein. For convenience, such a target mRNA is also referred to herein as an mRNA to be silenced or translationally regulated. Such a sequence is also referred to as a target sequence. As used herein, the phrase “mediates RNAi” refers to the ability to silence, in a sequence specific manner, a target RNA molecule or sequence. While not wishing to be bound by theory, it is believed that silencing uses the RNAi machinery or process and a guide RNA, e.g., an sRNA agent of 21 to 23 nucleotides.
In other embodiments of the aspects described herein, RNA molecules for use as molecular species in the molecular circuits and modular functional blocks described herein comprise natural or engineered microRNA sequences. Also provided herein are references and resources, such as programs and databases found on the World Wide Web, that can be used for obtaining information on endogenous microRNAs and their expression patterns, as well as information in regard to cognate microRNA sequences and their properties.
Mature microRNAs (also referred to as miRNAs) are short, highly conserved, endogenous non-coding regulatory RNAs (18 to 24 nucleotides in length), expressed from longer transcripts (termed “pre-microRNAs”) encoded in animal, plant and virus genomes, as well as in single-celled eukaryotes. Endogenous miRNAs found in genomes regulate the expression of target genes by binding to complementary sites, termed herein as “microRNA target sequences,” in the mRNA transcripts of target genes to cause translational repression and/or transcript degradation. miRNAs have been implicated in processes and pathways such as development, cell proliferation, apoptosis, metabolism and morphogenesis, and in diseases including cancer (S. Griffiths-Jones et al., “miRBase: tools for microRNA genomics.” Nuc. Acid. Res., 2007: 36, D154-D158). Expression of a microRNA target sequence refers to transcription of the DNA sequence that encodes the microRNA target sequence to RNA. In some embodiments, a microRNA target sequence is operably linked to or driven by a promoter sequence. In some embodiments, a microRNA target sequence comprises part of another sequence that is operably linked to a promoter sequence, and is said to be linked to, attached to, or fused to, the sequence encoding the output product.
The way microRNA and their targets interact in animals and plants is different in certain aspects. Translational repression is thought to be the primary mechanism in animals, with transcript degradation the dominant mechanism for plant target transcripts. The difference in mechanisms lies in the fact that plant miRNA exhibits perfect or nearly perfect base pairing with the target but in the case of animals, the pairing is rather imperfect. Also, miRNAs in plants bind to their targets within coding regions cleaving at single sites, whereas most of the miRNA binding sites in animals are in the 3′ un-translated regions (UTR). In animals, functional miRNA:miRNA target sequence duplexes are found to be more variable in structure and they contain only short complementary sequence stretches, interrupted by gaps and mismatches. In animal miRNA: miRNA target sequence interactions, multiplicity (one miRNA targeting more than one gene) and cooperation (one gene targeted by several miRNAs) are very common but rare in the case of plants. All these make the approaches in miRNA target prediction in plants and animals different in details (V. Chandra et al., “MTar: a computational microRNA target prediction architecture for human transcriptome.” BMC Bioinformatics 2010, 11(Suppl 1):S2).
Experimental evidence shows that the miRNA target sequence needs enough complementarity at either the 3′ end or the 5′ end for its binding to a miRNA. Based on the complementarity of the miRNA:miRNA target sequence duplex, the miRNA target sequence can be divided into three main classes. They are the 5′ dominant seed site targets (5′ seed-only), the 5′ dominant canonical seed site targets (5′ dominant) and the 3′ complementary seed site targets (3′ canonical). The 5′ dominant canonical targets possess high complementarity at the 5′ end and a few complementary pairs at the 3′ end. The 5′ dominant seed-only targets possess high complementarity at the 5′ end (of the miRNA) and only very few or no complementary pairs at the 3′ end. The seed-only sites have perfect base pairing to the seed portion of the 5′ end of the miRNA and limited base pairing to the 3′ end of the miRNA. The 3′ complementary targets have high complementarity at the 3′ end and insufficient pairing at the 5′ end. The seed region of the miRNA is a consecutive stretch of seven or eight nucleotides at the 5′ end. The 3′ complementary sites have extensive base pairing to the 3′ end of the miRNA that compensates for imperfection or a shorter stretch of base pairing to the seed portion of the miRNA. All of these site types are used to mediate regulation by miRNAs, and the 3′ complementary class of target site is used to discriminate among individual members of miRNA families in vivo. A genome-wide statistical analysis shows that on average one miRNA has approximately 100 evolutionarily conserved target sites, indicating that miRNAs regulate a large fraction of protein-coding genes.
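A simple seed-site search illustrates how perfect seed pairing can be located in a transcript. This is a minimal sketch under stated assumptions: the seed is taken as miRNA nucleotides 2 to 8 (a commonly used convention that is not specified above), only exact Watson-Crick matches are reported, and the miRNA and 3′ UTR sequences are made up for illustration. It does not classify 3′ compensatory pairing.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str, seed_start: int = 1, seed_len: int = 7) -> list[int]:
    """0-based start indices in `utr` of perfect matches to the miRNA seed.

    seed_start=1 and seed_len=7 select miRNA nucleotides 2-8 (assumed
    convention); pass seed_len=8 to require an eight-nucleotide match.
    """
    seed = mirna[seed_start:seed_start + seed_len]
    site = reverse_complement(seed)
    return [
        i
        for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


# Hypothetical sequences, for illustration only.
MIRNA = "UACGGAUCCAUGGCUAAGCUAU"
SEED_SITE = reverse_complement(MIRNA[1:8])
UTR = "AAAA" + SEED_SITE + "CCCC" + SEED_SITE + "AA"

hits = seed_sites(MIRNA, UTR)
print("seed-complementary site:", SEED_SITE)
print("site start positions in UTR:", hits)
```

Real target prediction tools additionally weigh site conservation, pairing at the miRNA 3′ end, and sequence context, none of which is modeled here.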
At present, miRNA databases include miRNAs for human, Caenorhabditis elegans, D. melanogaster, Danio rerio (zebrafish), Gallus gallus (chicken), and Arabidopsis thaliana. miRNAs are even present in simple multicellular organisms, such as poriferans (sponges) and cnidarians (starlet sea anemone). Many of the bilaterian animal miRNAs are phylogenetically conserved; 55% of C. elegans miRNAs have homologues in humans, which indicates that miRNAs have had important roles throughout animal evolution. Animal miRNAs seem to have evolved separately from those in plants because their sequences, precursor structure and biogenesis mechanisms are distinct from those in plants (Kim V N et al., “Biogenesis of small RNAs in animals.” Nat Rev Mol Cell Biol. 2009 February; 10(2):126-39).
miRNAs useful as components and output products for designing the molecular circuits and modular functional blocks described herein can be found at a variety of databases as known by one of skill in the art, such as those described at “miRBase: tools for microRNA genomics.” Nuc. Acid. Res., 2007: 36 (Database Issue), D154-D158; “miRBase: microRNA sequences, targets and gene nomenclature.” Nuc. Acid. Res., 2006 34 (Database Issue):D140-D144; and “The microRNA Registry.” Nuc. Acid. Res., 2004 32 (Database Issue):D109-D111, which are incorporated herein in their entirety by reference.
Accordingly, in some embodiments of the aspects described herein, a molecular circuit or modular functional block can further comprise as a molecular species a sequence encoding an RNA molecule, such as an iRNA molecule or microRNA molecule. In such embodiments, the sequence encoding the RNA molecule can be operably linked to a promoter sequence, or comprise part of another sequence, such as a sequence encoding a protein output. In those embodiments where the RNA molecule comprises part of, is linked to, attached to, or fused to, the sequence encoding, e.g., an output product, transcription of the sequence results in expression of both the mRNA of the output product and expression of the RNA molecule.
In some embodiments of the aspects described herein, the output product of a given molecular circuit, or one modular component of such a circuit, is itself a transcriptional activator or repressor, the production of which by a module or circuit can provide additional input signals to subsequent or additional modules or molecular circuits. For example, the output product encoded by an inversion component can be a transcriptional repressor that prevents transcription from another module of a molecular circuit.
Transcriptional regulators either activate or repress transcription from cognate promoters. Transcriptional activators typically bind near transcriptional promoters and recruit RNA polymerase to directly initiate transcription. Transcriptional repressors bind to transcriptional promoters and sterically hinder transcriptional initiation by RNA polymerase. Some transcriptional regulators serve as either an activator or a repressor depending on where they bind and on cellular conditions. Examples of transcriptional regulators for use as output products in the molecular circuits described herein are provided in Table 41.
Rhizobium leguminosarum (+LVA)
Vibrio cholerae
Bacillus licheniformis (+LVA)
Salmonella phage P22 (+LVA)
E. coli)
P. aeruginosa PA3477 (+LVA)
P. aeruginosa PA3477 (no LVA)
An enzyme can be a molecular species for use in different embodiments of the molecular circuits described herein. In some embodiments, an enzyme output is used as a response to a particular set of inputs. For example, in response to a particular number of inputs received by one or more molecular circuits described herein, a molecular circuit or modular block thereof can encode as an output product an enzyme as a molecular species that can degrade or otherwise destroy specific products produced by the cell.
In some embodiments, output product sequences encode “biosynthetic enzymes” that catalyze the conversion of substrates to products. For example, such biosynthetic enzymes can be combined together along with or within the modules and molecular circuits described herein to construct pathways that produce or degrade useful chemicals and materials, in response to specific signals. These combinations of enzymes can reconstitute either natural or synthetic biosynthetic pathways. These enzymes have applications in specialty chemicals, biofuels, and bioremediation. Descriptions of enzymes useful as molecular species for the modules and molecular circuits are provided herein.
N-Acyl Homoserine lactones (AHLs or N-AHLs) are a class of signaling molecules involved in bacterial quorum sensing. Several similar quorum sensing systems exist across different bacterial species; thus, there are several known enzymes that synthesize or degrade different AHL molecules that can be used for the modules and molecular circuits described herein.
Pseudomonas aeruginosa
Pseudomonas aeruginosa
Isoprenoids, also known as terpenoids, are a large and highly diverse class of natural organic chemicals with many functions in plant primary and secondary metabolism. Most are multicyclic structures that differ from one another not only in functional groups but also in their basic carbon skeletons. Isoprenoids are synthesized from common prenyl diphosphate precursors through the action of terpene synthases and terpene-modifying enzymes such as cytochrome P450 monooxygenases. Plant terpenoids are used extensively for their aromatic qualities. They play a role in traditional herbal remedies and are under investigation for antibacterial, antineoplastic, and other pharmaceutical functions. Much effort has been directed toward their production in microbial hosts.
There are two primary pathways for making isoprenoids: the mevalonate pathway and the non-mevalonate pathway.
Odorants are volatile compounds that have an aroma detectable by the olfactory system. Odorant enzymes convert a substrate to an odorant product. Exemplary odorant enzymes are described in Table 45.
The following are exemplary enzymes involved in the biosynthesis of plastic, specifically polyhydroxybutyrate.
The following are exemplary enzymes involved in the biosynthesis of butanol and butanol metabolism.
Other miscellaneous enzymes for use as molecular species for the modules and molecular circuits are provided in Table 48.
Cellulomonas fimi exoglucanase
Cellulomonas fimi endoglucanase A
Cytophaga hutchinsonii
Synechocystis
synechocystis
Other enzymes of use as molecular species for the modules and molecular circuits described herein include enzymes that phosphorylate or dephosphorylate either small molecules or other proteins, and enzymes that methylate or demethylate other proteins or DNA.
In some embodiments of the aspects described herein, nucleic acid sequences encoding selection markers are used as molecular species for the modules and molecular circuits. “Selection markers,” as defined herein, refer to output products that confer a selective advantage or disadvantage to a biological unit, such as a cell or cellular system. For example, a common type of prokaryotic selection marker is one that confers resistance to a particular antibiotic. Thus, cells that carry the selection marker can grow in media despite the presence of antibiotic. For example, most plasmids contain antibiotic selection markers to ensure that the plasmid is maintained during cell replication and division, as cells that lose a copy of the plasmid will soon either die or fail to grow in media supplemented with antibiotic. A second common type of selection marker, often termed a positive selection marker, includes those selection markers that are toxic to the cell. Positive selection markers are frequently used during cloning to select against cells transformed with the cloning vector and ensure that only cells transformed with a plasmid containing the insert survive. Examples of selection markers for use as molecular species are provided in Table 50.
In some embodiments of the aspects described herein, the output molecular species are “reporters.” As defined herein, “reporters” refer to proteins that can be used to measure gene expression. Reporters generally produce a measurable signal such as fluorescence, color, or luminescence. Reporter protein coding sequences encode proteins whose presence in the cell or organism is readily observed. For example, fluorescent proteins cause a cell to fluoresce when excited with light of a particular wavelength, luciferases cause a cell to catalyze a reaction that produces light, and enzymes such as β-galactosidase convert a substrate to a colored product. In some embodiments, reporters are used to quantify the strength or activity of the signal received by the modules or biological converter switches of the invention. In some embodiments, reporters can be fused in-frame to other protein coding sequences to identify where a protein is located in a cell or organism.
There are several different ways to measure or quantify a reporter depending on the particular reporter and what kind of characterization data is desired. In some embodiments, microscopy can be a useful technique for obtaining both spatial and temporal information on reporter activity, particularly at the single cell level. In other embodiments, flow cytometers can be used for measuring the distribution in reporter activity across a large population of cells. In some embodiments, plate readers may be used for taking population average measurements of many different samples over time. In other embodiments, instruments that combine such functions can be used, such as multiplex plate readers designed for flow cytometers, and combination microscopy and flow cytometric instruments.
Fluorescent proteins are convenient ways to visualize or quantify the output of a molecular circuit or modular functional block described herein. Fluorescence can be readily quantified using a microscope, plate reader or flow cytometer equipped to excite the fluorescent protein with the appropriate wavelength of light. Since several different fluorescent proteins are available, multiple gene expression measurements can be made in parallel. Non-limiting examples of fluorescent proteins are provided in Table 51.
Discosoma striata (coral)
Luminescence can be readily quantified using a plate reader or luminescence counter. Luciferases can be used as output products for various embodiments described herein, for example, measuring low levels of gene expression, because cells tend to have little to no background luminescence in the absence of a luciferase. Non-limiting examples of luciferases are provided in Table 52.
Photinus pyralis
In other embodiments, enzymes that produce colored products can be quantified using spectrophotometers or other instruments that can take absorbance measurements, including plate readers. Like luciferases, enzymes like β-galactosidase can be used for measuring low levels of gene expression because they tend to amplify low signals. Non-limiting examples of such enzymes are provided in Table 53.
Another reporter output product for use as a molecular species in the different aspects and embodiments described herein includes fluoresceine-A-binding (BBa_K157004).
Also useful as output products for use as molecular species for the modules and molecular circuits described herein are receptors, ligands, and lytic proteins. Receptors tend to have three domains: an extracellular domain for binding ligands such as proteins, peptides or small molecules, a transmembrane domain, and an intracellular or cytoplasmic domain which frequently can participate in some sort of signal transduction event such as phosphorylation. In some embodiments, transporter, channel, or pump gene sequences are used as molecular species, such as output product genes. Transporters are membrane proteins responsible for transport of substances across the cell membrane. Channels are made up of proteins that form transmembrane pores through which selected ions can diffuse. Pumps are membrane proteins that can move substances against their gradients in an energy-dependent process known as active transport. In some embodiments, nucleic acid sequences encoding proteins and protein domains whose primary purpose is to bind other proteins, ions, small molecules, and other ligands are used. Exemplary receptors, ligands, and lytic proteins are listed in Table 55.
Vibrio cholerae
Kluyveromyces lactis
The methods and uses of the molecular circuits described herein can involve in vivo, ex vivo, or in vitro systems. The term “in vivo” refers to assays or processes that occur in or within an organism, such as a multicellular animal. In some of the aspects described herein, a method or use can be said to occur “in vivo” when a unicellular organism, such as a bacterium, is used. The term “ex vivo” refers to methods and uses that are performed using a living cell with an intact membrane that is outside of the body of a multicellular animal or plant, e.g., explants, cultured cells, including primary cells and cell lines, transformed cell lines, and extracted tissue or cells, including blood cells, among others. The term “in vitro” refers to assays and methods that do not require the presence of a cell with an intact membrane, such as cellular extracts, and can refer to introducing a molecular circuit into a non-cellular system, such as media or solutions not comprising cells or cellular systems, such as cellular extracts.
A cell for use with the molecular circuits described herein can be any cell or host cell. As defined herein, a “cell” or “cellular system” is the basic structural and functional unit of all known independently living organisms. It is the smallest unit of life that is classified as a living thing, and is often called the building block of life. Some organisms, such as most bacteria, are unicellular (consist of a single cell). Other organisms, such as humans, are multicellular. A “natural cell,” as defined herein, refers to any prokaryotic or eukaryotic cell found naturally. A “prokaryotic cell” can comprise a cell envelope and a cytoplasmic region that contains the cell genome (DNA) and ribosomes and various sorts of inclusions.
In some embodiments, the cell is a eukaryotic cell, preferably a mammalian cell. A eukaryotic cell comprises membrane-bound compartments in which specific metabolic activities take place, such as a nucleus. In other embodiments, the cell or cellular system is an artificial or synthetic cell. As defined herein, an “artificial cell” or a “synthetic cell” is a minimal cell formed from artificial parts that can do many things a natural cell can do, such as transcribe and translate proteins and generate ATP.
Cells of use in the various aspects described herein upon transformation or transfection with the molecular circuits described herein include any cell that is capable of supporting the activation and expression of the molecular circuits. In some embodiments of the aspects described herein, a cell can be from any organism, including a multicellular organism. Examples of eukaryotic cells that can be useful in aspects described herein include eukaryotic cells selected from, e.g., mammalian, insect, yeast, or plant cells. The molecular circuits described herein can be introduced into a variety of cells including, e.g., fungal, plant, or animal cells (e.g., nematode, insect, bird, reptile, or mammal, such as a mouse, rat, rabbit, hamster, gerbil, dog, cat, goat, pig, cow, horse, whale, monkey, or human). The cells can be primary cells, immortalized cells, stem cells, or transformed cells. In some preferred embodiments, the cells comprise stem cells. Expression vectors for the components of the molecular circuit will generally have a promoter and/or an enhancer suitable for expression in a particular host cell of interest. The present invention contemplates the use of any such vertebrate cells for the molecular circuits, including, but not limited to, reproductive cells including sperm, ova and embryonic cells, and non-reproductive cells, such as kidney, lung, spleen, lymphoid, cardiac, gastric, intestinal, pancreatic, muscle, bone, neural, brain, and epithelial cells.
As used herein, the term “stem cells” is used in a broad sense and includes traditional stem cells, progenitor cells, preprogenitor cells, reserve cells, and the like. The terms “stem cell” and “progenitor cell” are used interchangeably herein, and refer to an undifferentiated cell which is capable of proliferation and giving rise to more progenitor cells having the ability to generate a large number of mother cells that can in turn give rise to differentiated, or differentiable, daughter cells. Stem cells for use with the molecular circuits and the methods described herein can be obtained from endogenous sources such as cord blood, or can be generated using in vitro or ex vivo techniques as known to one of skill in the art. For example, a stem cell can be an induced pluripotent stem cell (iPS cell) derived using any methods known in the art. The daughter cells themselves can be induced to proliferate and produce progeny that subsequently differentiate into one or more mature cell types, while also retaining one or more cells with parental developmental potential. The term “stem cell” refers, then, to a cell with the capacity or potential, under particular circumstances, to differentiate to a more specialized or differentiated phenotype, and which retains the capacity, under certain circumstances, to proliferate without substantially differentiating. In one embodiment, the term progenitor or stem cell refers to a generalized mother cell whose descendants (progeny) specialize, often in different directions, by differentiation, e.g., by acquiring completely individual characters, as occurs in progressive diversification of embryonic cells and tissues. Cellular differentiation is a complex process typically occurring through many cell divisions. A differentiated cell can derive from a multipotent cell which itself is derived from a multipotent cell, and so on. 
While each of these multipotent cells can be considered stem cells, the range of cell types each can give rise to can vary considerably. Some differentiated cells also have the capacity to give rise to cells of greater developmental potential. Such capacity can be natural or can be induced artificially upon treatment with various factors. In many biological instances, stem cells are also “multipotent” because they can produce progeny of more than one distinct cell type, but this is not required for “stem-ness.” Self-renewal is the other classical part of the stem cell definition, and it is essential as used in this document. In theory, self-renewal can occur by either of two major mechanisms. Stem cells can divide asymmetrically, with one daughter retaining the stem state and the other daughter expressing some distinct other specific function and phenotype. Alternatively, some of the stem cells in a population can divide symmetrically into two stems, thus maintaining some stem cells in the population as a whole, while other cells in the population give rise to differentiated progeny only. Formally, it is possible that cells that begin as stem cells might proceed toward a differentiated phenotype, but then “reverse” and re-express the stem cell phenotype, a term often referred to as “dedifferentiation”.
Exemplary stem cells include, but are not limited to, embryonic stem cells, adult stem cells, pluripotent stem cells, induced pluripotent stem cells (iPS cells), neural stem cells, liver stem cells, muscle stem cells, muscle precursor stem cells, endothelial progenitor cells, bone marrow stem cells, chondrogenic stem cells, lymphoid stem cells, mesenchymal stem cells, hematopoietic stem cells, central nervous system stem cells, peripheral nervous system stem cells, and the like. Descriptions of stem cells, including methods for isolating and culturing them, can be found in, among other places, Embryonic Stem Cells, Methods and Protocols, Turksen, ed., Humana Press, 2002; Weisman et al., Annu. Rev. Cell. Dev. Biol. 17:387-403; Pittinger et al., Science, 284:143-47, 1999; Animal Cell Culture, Masters, ed., Oxford University Press, 2000; Jackson et al., PNAS 96(25):14482-86, 1999; Zuk et al., Tissue Engineering, 7:211-228, 2001 (“Zuk et al.”); Atala et al., particularly Chapters 33-41; and U.S. Pat. Nos. 5,559,022, 5,672,346 and 5,827,735. Descriptions of stromal cells, including methods for isolating them, can be found in, among other places, Prockop, Science, 276:71-74, 1997; Theise et al., Hepatology, 31:235-40, 2000; Current Protocols in Cell Biology, Bonifacino et al., eds., John Wiley & Sons, 2000 (including updates through March, 2002); and U.S. Pat. No. 4,963,489; Phillips B W and Crook J M, Pluripotent human stem cells: A novel tool in drug discovery. BioDrugs. 2010 Apr. 1; 24(2):99-108; Mari Ohnuki et al., Generation and Characterization of Human Induced Pluripotent Stem Cells, Current Protocols in Stem Cell Biology Unit Number: UNIT 4A., September, 2009.
The term “biological sample” as used herein refers to a cell or population of cells or a quantity of tissue or fluid from a subject. Most often, the sample has been removed from a subject, but the term “biological sample” can also refer to cells or tissue analyzed in vivo, i.e. without removal from the subject. Often, a “biological sample” will contain cells from the animal, but the term can also refer to non-cellular biological material.
The terms “disease” and “disorder” are used interchangeably herein and refer to any alteration in the state of the body or of some of the organs, interrupting or disturbing the performance of the functions and/or causing symptoms such as discomfort, dysfunction, distress, or even death to the person afflicted or those in contact with a person. A disease or disorder can also relate to a distemper, ailing, ailment, malady, disorder, sickness, illness, complaint, indisposition, or affection. A disease or disorder includes, but is not limited to, any condition manifested as one or more physical and/or psychological symptoms for which treatment is desirable, and includes previously and newly identified diseases and other disorders.
In some embodiments of the aspects described herein, the cells for use with the molecular circuits described herein are bacterial cells. The term “bacteria” as used herein is intended to encompass all variants of bacteria, for example, prokaryotic organisms and cyanobacteria. In some embodiments, the bacterial cells are gram-negative cells and in alternative embodiments, the bacterial cells are gram-positive cells. Non-limiting examples of species of bacterial cells useful for engineering with the molecular circuits described herein include cells from Escherichia coli, Bacillus subtilis, Salmonella typhimurium and various species of Pseudomonas, Streptomyces, and Staphylococcus. Other examples of bacterial cells that can be genetically engineered for use with the molecular circuits described herein include, but are not limited to, cells from Yersinia spp., Escherichia spp., Klebsiella spp., Bordetella spp., Neisseria spp., Aeromonas spp., Francisella spp., Corynebacterium spp., Citrobacter spp., Chlamydia spp., Hemophilus spp., Brucella spp., Mycobacterium spp., Legionella spp., Rhodococcus spp., Pseudomonas spp., Helicobacter spp., Salmonella spp., Vibrio spp., Bacillus spp., and Erysipelothrix spp. In some embodiments, the bacterial cells are E. coli cells.
Other examples of organisms from which cells can be transformed or transfected with the molecular circuits described herein include, but are not limited to, the following: Staphylococcus aureus, Bacillus subtilis, Clostridium butyricum, Brevibacterium lactofermentum, Streptococcus agalactiae, Lactococcus lactis, Leuconostoc lactis, Streptomyces, Actinobacillus actinomycetemcomitans, Bacteroides, cyanobacteria, Escherichia coli, Helicobacter pylori, Selenomonas ruminantium, Shigella sonnei, Zymomonas mobilis, Mycoplasma mycoides, Treponema denticola, Bacillus thuringiensis, Staphylococcus lugdunensis, Leuconostoc oenos, Corynebacterium xerosis, Lactobacillus plantarum, Streptococcus faecalis, Bacillus coagulans, Bacillus cereus, Bacillus popilliae, Synechocystis strain PCC6803, Bacillus liquefaciens, Pyrococcus abyssi, Lactobacillus hilgardii, Streptococcus ferus, Lactobacillus pentosus, Bacteroides fragilis, Staphylococcus epidermidis, Streptomyces phaeochromogenes, Streptomyces ghanaensis, Halobacterium strain GRB, and Haloferax sp. strain Aa2.2.
In other embodiments of the aspects described herein, molecular circuits can be introduced into a non-cellular system such as a virus or phage, by direct integration of the molecular circuit nucleic acid, for example, into the viral genome. A virus for use with the molecular circuits described herein can be a dsDNA virus (e.g. Adenoviruses, Herpesviruses, Poxviruses); an ssDNA virus ((+)sense DNA) (e.g. Parvoviruses); a dsRNA virus (e.g. Reoviruses); a (+)ssRNA virus ((+)sense RNA) (e.g. Picornaviruses, Togaviruses); a (−)ssRNA virus ((−)sense RNA) (e.g. Orthomyxoviruses, Rhabdoviruses); an ssRNA-Reverse Transcriptase virus ((+)sense RNA with DNA intermediate in life-cycle) (e.g. Retroviruses); or a dsDNA-Reverse Transcriptase virus (e.g. Hepadnaviruses).
Viruses can also include plant viruses and bacteriophages or phages. Examples of phage families that can be used with the molecular circuits described herein include, but are not limited to, Myoviridae (T4-like viruses; P1-like viruses; P2-like viruses; Mu-like viruses; SPO1-like viruses; φH-like viruses); Siphoviridae (λ-like viruses; T1-like viruses; T5-like viruses; c2-like viruses; L5-like viruses; ψM1-like viruses; φC31-like viruses; N15-like viruses); Podoviridae (T7-like viruses; φ29-like viruses; P22-like viruses; N4-like viruses); Tectiviridae (Tectivirus); Corticoviridae (Corticovirus); Lipothrixviridae (Alphalipothrixvirus, Betalipothrixvirus, Gammalipothrixvirus, Deltalipothrixvirus); Plasmaviridae (Plasmavirus); Rudiviridae (Rudivirus); Fuselloviridae (Fusellovirus); Inoviridae (Inovirus, Plectrovirus); Microviridae (Microvirus, Spiromicrovirus, Bdellomicrovirus, Chlamydiamicrovirus); Leviviridae (Levivirus, Allolevivirus); and Cystoviridae (Cystovirus). Such phages can be naturally occurring or engineered phages.
In some embodiments of the aspects described herein, the molecular circuits are introduced into a cellular or non-cellular system using a vector or plasmid. As used herein, the term “vector” is used interchangeably with “plasmid” to refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. Vectors capable of directing the expression of genes and/or nucleic acid sequences to which they are operatively linked are referred to herein as “expression vectors.” In general, expression vectors of utility in the methods and molecular circuits described herein are often in the form of “plasmids,” which refer to circular double-stranded DNA loops which, in their vector form, are not bound to the chromosome. In some embodiments, all components of a given molecular circuit can be encoded in a single vector. For example, a lentiviral vector can be constructed, which contains all components necessary for a functional molecular circuit as described herein. In some embodiments, individual components (e.g., a positive-feedback component, a shunt component, an inversion component) can be separately encoded in different vectors and introduced into one or more cells separately.
Other expression vectors can be used in different embodiments described herein, for example, but not limited to, plasmids, episomes, bacteriophages or viral vectors, and such vectors can integrate into the host's genome or replicate autonomously in the particular cellular system used. Viral vectors include, but are not limited to, retroviral vectors, such as lentiviral vectors or gammaretroviral vectors, adenoviral vectors, and baculoviral vectors. In some embodiments, lentiviral vectors comprising the nucleic acid sequences encoding the molecular circuits described herein are used. For example, a lentiviral vector can be used in the form of lentiviral particles. Other forms of expression vectors known by those skilled in the art which serve the equivalent functions can also be used. Expression vectors comprise expression vectors for stable or transient expression encoding the DNA. A vector can be either a self-replicating extrachromosomal vector or a vector which integrates into a host genome. One type of vector is a genomic integrated vector, or “integrated vector”, which can become integrated into the chromosomal DNA or RNA of a host cell, cellular system, or non-cellular system. In some embodiments, the nucleic acid sequence or sequences encoding the biological classifier circuits and component input detector modules described herein integrate into the chromosomal DNA or RNA of a host cell, cellular system, or non-cellular system along with components of the vector sequence.
In other embodiments, the nucleic acid sequence encoding a molecular circuit directly integrates into chromosomal DNA or RNA of a host cell, cellular system, or non-cellular system, in the absence of any components of the vector by which it was introduced. In such embodiments, the nucleic acid sequence encoding the molecular circuits can be integrated using targeted insertions, such as knock-in technologies or homologous recombination techniques, or by non-targeted insertions, such as gene trapping techniques or non-homologous recombination.
Another type of vector for use in the methods and molecular circuits described herein is an episomal vector, i.e., a nucleic acid capable of extra-chromosomal replication. Such plasmids or vectors can include plasmid sequences from bacteria, viruses or phages. Such vectors include chromosomal, episomal and virus-derived vectors e.g., vectors derived from bacterial plasmids, bacteriophages, yeast episomes, yeast chromosomal elements, and viruses, vectors derived from combinations thereof, such as those derived from plasmid and bacteriophage genetic elements, cosmids and phagemids. A vector can be a plasmid, bacteriophage, bacterial artificial chromosome (BAC) or yeast artificial chromosome (YAC). A vector can be a single or double-stranded DNA, RNA, or phage vector. In some embodiments, the molecular circuits and component modules are introduced into a cellular system using a BAC vector.
The vectors comprising the molecular circuits and component modules described herein can be “introduced” into cells as polynucleotides, preferably DNA, by techniques well-known in the art for introducing DNA and RNA into cells. The term “transduction” refers to any method whereby a nucleic acid sequence is introduced into a cell, e.g., by transfection, lipofection, electroporation, biolistics, passive uptake, lipid:nucleic acid complexes, viral vector transduction, injection, contacting with naked DNA, gene gun, and the like. The vectors, in the case of phage and viral vectors can also be introduced into cells as packaged or encapsidated virus by well-known techniques for infection and transduction. Viral vectors can be replication competent or replication defective. In the latter case, viral propagation generally occurs only in complementing host cells. In some embodiments, the biological classifier circuits and component input detector modules are introduced into a cell using other mechanisms known to one of skill in the art, such as a liposome, microspheres, gene gun, fusion proteins, such as a fusion of an antibody moiety with a nucleic acid binding moiety, or other such delivery vehicle.
The molecular circuits or the vectors comprising the molecular circuits described herein can be introduced into a cell using any method known to one of skill in the art. The term “transformation” as used herein refers to the introduction of genetic material (e.g., a vector comprising a biological classifier circuit) comprising one or more modules or biological classifier circuits described herein into a cell, tissue or organism. Transformation of a cell can be stable or transient. The term “transient transformation” or “transiently transformed” refers to the introduction of one or more transgenes into a cell in the absence of integration of the transgene into the host cell's genome. Transient transformation can be detected by, for example, enzyme linked immunosorbent assay (ELISA), which detects the presence of a polypeptide encoded by one or more of the transgenes. For example, a molecular circuit can further comprise a promoter operably linked to an output product, such as a reporter protein. Expression of that reporter protein indicates that a cell has been transformed or transfected with the molecular circuit, and is hence implementing the circuit. Alternatively, transient transformation can be detected by detecting the activity of the protein encoded by the transgene. The term “transient transformant” refers to a cell which has transiently incorporated one or more transgenes.
In contrast, the term “stable transformation” or “stably transformed” refers to the introduction and integration of one or more transgenes into the genome of a cell or cellular system, preferably resulting in chromosomal integration and stable heritability through meiosis. Stable transformation of a cell can be detected by Southern blot hybridization of genomic DNA of the cell with nucleic acid sequences that are capable of binding to one or more of the transgenes. Alternatively, stable transformation of a cell can also be detected by the polymerase chain reaction of genomic DNA of the cell to amplify transgene sequences. The term “stable transformant” refers to a cell or cellular system that has stably integrated one or more transgenes into the genomic DNA. Thus, a stable transformant is distinguished from a transient transformant in that, whereas genomic DNA from the stable transformant contains one or more transgenes, genomic DNA from the transient transformant does not contain a transgene. Transformation also includes introduction of genetic material into plant cells in the form of plant viral vectors involving epichromosomal replication and gene expression, which can exhibit variable properties with respect to meiotic stability. Transformed cells, tissues, or plants are understood to encompass not only the end product of a transformation process, but also transgenic progeny thereof.
The terms “nucleic acids” and “nucleotides” refer to naturally occurring or synthetic or artificial nucleic acids or nucleotides. The terms “nucleic acids” and “nucleotides” comprise deoxyribonucleotides or ribonucleotides or any nucleotide analogue and polymers or hybrids thereof in either single- or double-stranded, sense or antisense form. As will also be appreciated by those in the art, many variants of a nucleic acid can be used for the same purpose as a given nucleic acid. Thus, a nucleic acid also encompasses substantially identical nucleic acids and complements thereof. Nucleotide analogues include nucleotides having modifications in the chemical structure of the base, sugar and/or phosphate, including, but not limited to, 5-position pyrimidine modifications, 8-position purine modifications, modifications at cytosine exocyclic amines, substitution of 5-bromo-uracil, and the like; and 2′-position sugar modifications, including but not limited to, sugar-modified ribonucleotides in which the 2′-OH is replaced by a group selected from H, OR, R, halo, SH, SR, NH2, NHR, NR2, or CN. shRNAs also can comprise non-natural elements such as non-natural bases, e.g., inosine and xanthine, non-natural sugars, e.g., 2′-methoxy ribose, or non-natural phosphodiester linkages, e.g., methylphosphonates, phosphorothioates and peptides.
The terms “nucleic acid sequence”, “oligonucleotide”, and “polynucleotide” are used interchangeably herein and refer to at least two nucleotides covalently linked together. The term “nucleic acid sequence” is also used interchangeably herein with “gene”, “cDNA”, and “mRNA”. As will be appreciated by those in the art, the depiction of a single nucleic acid sequence also defines the sequence of the complementary nucleic acid sequence. Thus, a nucleic acid sequence also encompasses the complementary strand of a depicted single strand. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. As will also be appreciated by those in the art, a single nucleic acid sequence provides a probe that can hybridize to the target sequence under stringent hybridization conditions. Thus, a nucleic acid sequence also encompasses a probe that hybridizes under stringent hybridization conditions. The term “nucleic acid sequence” refers to a single or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases read from the 5′- to the 3′-end. It includes chromosomal DNA, self-replicating plasmids, infectious polymers of DNA or RNA and DNA or RNA that performs a primarily structural role. “Nucleic acid sequence” also refers to a consecutive list of abbreviations, letters, characters or words, which represent nucleotides. Nucleic acid sequences can be single stranded or double stranded, or can contain portions of both double stranded and single stranded sequence. The nucleic acid sequence can be DNA, both genomic and cDNA, RNA, or a hybrid, where the nucleic acid sequence can contain combinations of deoxyribo- and ribonucleotides, and combinations of bases including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine. 
Nucleic acid sequences can be obtained by chemical synthesis methods or by recombinant methods. A nucleic acid sequence will generally contain phosphodiester bonds, although nucleic acid analogs can be included that can have at least one different linkage, e.g., phosphoramidate, phosphorothioate, phosphorodithioate, or O-methylphosphoroamidite linkages and peptide nucleic acid backbones and linkages in the nucleic acid sequence. Other analog nucleic acids include those with positive backbones, non-ionic backbones, and non-ribose backbones, including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, which are incorporated by reference. Nucleic acid sequences containing one or more non-naturally occurring or modified nucleotides are also included within one definition of nucleic acid sequences. The modified nucleotide analog can be located, for example, at the 5′-end and/or the 3′-end of the nucleic acid sequence. Representative examples of nucleotide analogs can be selected from sugar- or backbone-modified ribonucleotides. It should be noted, however, that nucleobase-modified ribonucleotides are also suitable, i.e. ribonucleotides containing a non-naturally occurring nucleobase instead of a naturally occurring nucleobase, such as uridines or cytidines modified at the 5-position, e.g. 5-(2-amino)propyl uridine, 5-bromo uridine; adenosines and guanosines modified at the 8-position, e.g. 8-bromo guanosine; deaza nucleotides, e.g. 7-deaza-adenosine; and O- and N-alkylated nucleotides, e.g. N6-methyl adenosine. The 2′-OH group can be replaced by a group selected from H, OR, R, halo, SH, SR, NH2, NHR, NR2 or CN, wherein R is C1-C6 alkyl, alkenyl or alkynyl and halo is F, Cl, Br or I. Modifications of the ribose-phosphate backbone can be done for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments or as probes on a biochip. 
Mixtures of naturally occurring nucleic acids and analogs can be used; alternatively, mixtures of different nucleic acid analogs can be used. Nucleic acid sequences include, but are not limited to, nucleic acid sequences encoding proteins, for example proteins that act as reporters or transcriptional repressors, as well as antisense molecules, ribozymes, and small inhibitory nucleic acid sequences, for example but not limited to RNAi, shRNA, siRNA, microRNA (miRNA), antisense oligonucleotides, etc.
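As an illustration of the statement above that the depiction of a single nucleic acid sequence also defines its complementary sequence, the following minimal Python sketch derives the complementary strand of a DNA sequence written 5′ to 3′. The function name `reverse_complement` and the pairing table are illustrative assumptions and are not part of the disclosure; the sketch handles only the four standard DNA bases and none of the analogs described above.

```python
# Watson-Crick pairing for the four standard DNA bases.
# Illustrative assumption: analogs and RNA bases are not handled here.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, written 5' to 3'.

    The input is read 5' to 3'; the complement pairs antiparallel to it,
    so the bases are complemented and the order is reversed.
    """
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    depicted = "ATGCGT"
    print(depicted, "->", reverse_complement(depicted))
```

Applying the function twice returns the original sequence, which reflects the fact that either strand fully determines the other.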
In its broadest sense, the term “substantially complementary”, when used herein with respect to a nucleotide sequence in relation to a reference or target nucleotide sequence, means a nucleotide sequence having a percentage of identity between the substantially complementary nucleotide sequence and the exact complementary sequence of said reference or target nucleotide sequence of at least 60%, at least 70%, at least 80% or 85%, at least 90%, at least 93%, at least 95% or 96%, at least 97% or 98%, at least 99% or 100% (the latter being equivalent to the term “identical” in this context). For example, identity is assessed over a length of at least 10 nucleotides, or at least 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 or up to 50 nucleotides of the entire length of the nucleic acid sequence to said reference sequence (if not specified otherwise below). Sequence comparisons are carried out using default GAP analysis with the University of Wisconsin GCG, SEQWEB application of GAP, based on the algorithm of Needleman and Wunsch (Needleman and Wunsch (1970) J. Mol. Biol. 48: 443-453; as defined above). A nucleotide sequence “substantially complementary” to a reference nucleotide sequence hybridizes to the reference nucleotide sequence under low stringency conditions, preferably medium stringency conditions, most preferably high stringency conditions (as defined above).
In its broadest sense, the term “substantially identical”, when used herein with respect to a nucleotide sequence, means a nucleotide sequence corresponding to a reference or target nucleotide sequence, wherein the percentage of identity between the substantially identical nucleotide sequence and the reference or target nucleotide sequence is at least 60%, at least 70%, at least 80% or 85%, at least 90%, at least 93%, at least 95% or 96%, at least 97% or 98%, at least 99% or 100% (the latter being equivalent to the term “identical” in this context). For example, identity is assessed over a length of 10-22 nucleotides, such as at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 or up to 50 nucleotides of a nucleic acid sequence to said reference sequence (if not specified otherwise below). Sequence comparisons are carried out using default GAP analysis with the University of Wisconsin GCG, SEQWEB application of GAP, based on the algorithm of Needleman and Wunsch (Needleman and Wunsch (1970) J. Mol. Biol. 48: 443-453; as defined above). A nucleotide sequence that is “substantially identical” to a reference nucleotide sequence hybridizes to the exact complementary sequence of the reference nucleotide sequence (i.e. its corresponding strand in a double-stranded molecule) under low stringency conditions, preferably medium stringency conditions, most preferably high stringency conditions (as defined above). Homologues of a specific nucleotide sequence include nucleotide sequences that encode an amino acid sequence that is at least 24% identical, at least 35% identical, at least 50% identical, or at least 65% identical to the reference amino acid sequence, as measured using the parameters described above, wherein the amino acid sequence encoded by the homolog has the same biological activity as the protein encoded by the specific nucleotide sequence. 
The term “substantially non-identical” refers to a nucleotide sequence that does not hybridize to the nucleic acid sequence under stringent conditions.
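The percent-identity thresholds above can be illustrated with a minimal Python sketch of a Needleman-Wunsch global alignment followed by an identity calculation. This is not the GCG GAP program and does not reproduce its default parameters; the scoring values (match +1, mismatch −1, linear gap −1), the choice of alignment length as the denominator, and the names `needleman_wunsch`, `percent_identity`, and `is_substantially_identical` are all assumptions made for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align `a` and `b`; return the two gapped strings.

    Scoring values are illustrative assumptions, not GCG GAP defaults.
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment length."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def is_substantially_identical(a, b, threshold=60.0):
    """Apply one identity threshold from the definition (60% by default)."""
    return percent_identity(a, b) >= threshold


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    query = "ACGTACGTAA"
    print(percent_identity(ref, query), is_substantially_identical(ref, query, 90.0))
```

The same calculation applies to the “substantially complementary” definition if the query is compared against the exact complement of the reference rather than the reference itself. The hybridization criteria in the definitions are experimental and are not modeled here.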
As used herein, the term “gene” refers to a nucleic acid sequence comprising an open reading frame encoding a polypeptide, including both exon and (optionally) intron sequences. A “gene” refers to the coding sequence of a gene product, as well as non-coding regions of the gene product, including 5′UTR and 3′UTR regions, introns and the promoter of the gene product. These definitions generally refer to a single-stranded molecule, but in specific embodiments will also encompass an additional strand that is partially, substantially or fully complementary to the single-stranded molecule. Thus, a nucleic acid sequence can encompass a double-stranded molecule or a triple-stranded molecule that comprises one or more complementary strand(s) or “complement(s)” of a particular sequence comprising a molecule. As used herein, a single stranded nucleic acid can be denoted by the prefix “ss”, a double stranded nucleic acid by the prefix “ds”, and a triple stranded nucleic acid by the prefix “ts.”
The terms “operable linkage” and “operably linked” are used interchangeably herein and are to be understood as meaning, for example, the sequential arrangement of a regulatory element (e.g. a promoter) with a nucleic acid sequence to be expressed and, if appropriate, further regulatory elements (such as, e.g., a terminator) in such a way that each of the regulatory elements can fulfill its intended function to allow, modify, facilitate or otherwise influence expression of the linked nucleic acid sequence. The expression can result depending on the arrangement of the nucleic acid sequences in relation to sense or antisense RNA. To this end, direct linkage in the chemical sense is not necessarily required. Genetic control sequences such as, for example, enhancer sequences, can also exert their function on the target sequence from positions which are further away, or indeed from other DNA molecules. In some embodiments, arrangements are those in which the nucleic acid sequence to be expressed recombinantly is positioned behind the sequence acting as promoter, so that the two sequences are linked covalently to each other. The distance between the promoter sequence and the nucleic acid sequence to be expressed recombinantly can be any distance, and in some embodiments is less than 200 base pairs, especially less than 100 base pairs or less than 50 base pairs. In some embodiments, the nucleic acid sequence to be transcribed is located behind the promoter in such a way that the transcription start is identical with the desired beginning of the chimeric RNA described herein. Operable linkage, and an expression construct, can be generated by means of customary recombination and cloning techniques as described (e.g., in Maniatis T, Fritsch E F and Sambrook J (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor (N.Y.); Silhavy et al. (1984) Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor (N.Y.); Ausubel et al. (1987) Current Protocols in Molecular Biology, Greene Publishing Assoc and Wiley Interscience; Gelvin et al. (Eds) (1990) Plant Molecular Biology Manual; Kluwer Academic Publisher, Dordrecht, The Netherlands). However, further sequences can also be positioned between the two sequences. The insertion of sequences can also lead to the expression of fusion proteins, or can serve as ribosome binding sites. In some embodiments, the expression construct, consisting of a linkage of promoter and nucleic acid sequence to be expressed, can exist in a vector-integrated form and be inserted into a plant genome, for example by transformation.
The term “expression” as used herein refers to the biosynthesis of a gene product, preferably to the transcription and/or translation of a nucleotide sequence, for example an endogenous gene or a heterologous gene, in a cell. For example, in the case of a heterologous nucleic acid sequence, expression involves transcription of the heterologous nucleic acid sequence into mRNA and, optionally, the subsequent translation of mRNA into one or more polypeptides. Expression also refers to biosynthesis of a microRNA or RNAi molecule, which refers to expression and transcription of an RNAi agent such as siRNA, shRNA, and antisense DNA but does not require translation to polypeptide sequences. The terms “expression construct” and “nucleic acid construct” as used herein are synonyms and refer to a nucleic acid sequence capable of directing the expression of a particular nucleotide sequence, such as the heterologous target gene sequence in an appropriate host cell (e.g., a prokaryotic cell, eukaryotic cell, or mammalian cell). If translation of the desired heterologous target gene is required, the construct also typically comprises sequences required for proper translation of the nucleotide sequence. The coding region can code for a protein of interest but can also code for a functional RNA of interest, for example, microRNA, microRNA target sequence, antisense RNA, dsRNA, or a nontranslated RNA, in the sense or antisense direction. The nucleic acid construct as disclosed herein can be chimeric, meaning that at least one of its components is heterologous with respect to at least one of its other components.
The terms “polypeptide”, “peptide”, “oligopeptide”, “gene product”, “expression product” and “protein” are used interchangeably herein to refer to a polymer or oligomer of consecutive amino acid residues.
The term “subject” refers to any living organism from which a biological sample, such as a cell sample, can be obtained. The term includes, but is not limited to, humans; non-human primates, such as chimpanzees and other apes and monkey species; farm animals such as cattle, sheep, pigs, goats and horses; domestic subjects such as dogs and cats; laboratory animals including rodents such as mice, rats and guinea pigs; and the like. The term does not denote a particular age or sex. Thus, adult and newborn subjects, as well as fetuses, whether male or female, are intended to be covered. The term “subject” is also intended to include living organisms susceptible to conditions or diseases caused or contributed to by bacteria, pathogens, disease states or conditions as generally disclosed, but not limited to, throughout this specification. Examples of subjects include humans, dogs, cats, cows, goats, and mice.
The terms “higher” or “increased” or “increase” as used herein in the context of expression or biological activity of a microRNA or protein generally mean an increase in the expression level or activity of the microRNA or protein by a statistically significant amount relative to a reference level, state or condition. For the avoidance of doubt, a “higher” or “increased” expression of a microRNA means a statistically significant increase of at least about 50% as compared to a reference level or state, including an increase of at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100% or more, including, for example at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 6-fold, at least 7-fold, at least 8-fold, at least 9-fold, at least 10-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold, at least 60-fold, at least 70-fold, at least 80-fold, at least 90-fold, at least 100-fold, at least 500-fold, at least 1000-fold increase or greater of the level of expression of the microRNA relative to the reference level.
Similarly, the terms “lower”, “reduced”, or “decreased” are all used herein generally to mean a decrease by a statistically significant amount. However, for avoidance of doubt, “lower”, “reduced”, “reduction” or “decreased” means a decrease by at least 50% as compared to a reference level, for example a decrease by at least about 60%, or at least about 70%, or at least about 80%, or at least about 90%, or at least about 95%, or up to and including a 100% decrease (i.e. absent level as compared to a reference sample), or any decrease between 50-100% as compared to a reference level.
As used herein, the term “comprising” means that other elements can also be present in addition to the defined elements presented. The use of “comprising” indicates inclusion rather than limitation. Accordingly, the term “comprising” means “including principally, but not necessarily solely”. Furthermore, variations of the word “comprising”, such as “comprise” and “comprises”, have correspondingly the same meanings. The term “consisting essentially of” means “including principally, but not necessarily solely at least one”, and as such, is intended to mean a “selection of one or more, and in any combination”. Stated another way, the term “consisting essentially of” means that an element can be added, subtracted or substituted without materially affecting the novel characteristics described herein. This applies equally to steps within a described method as well as compositions and components therein. In other embodiments, the inventions, compositions, methods, and respective components thereof, described herein are intended to be exclusive of any element not deemed an essential element to the component, composition or method (“consisting of”). For example, a biological classifier circuit that comprises a repressor sequence and a microRNA target sequence encompasses both the repressor sequence and a microRNA target sequence of a larger sequence. By way of further example, a composition that comprises elements A and B also encompasses a composition consisting of A, B and C.
As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, references to “the method” include one or more methods, and/or steps of the type described herein and/or which will become apparent to those persons skilled in the art upon reading this disclosure and so forth.
It is understood that the foregoing detailed description and the following examples are illustrative only and are not to be taken as limitations upon the scope described herein. Various changes and modifications to the disclosed embodiments, which will be apparent to those of skill in the art, can be made without departing from the spirit and scope described herein. Further, all patents, patent applications, publications, and websites identified are expressly incorporated herein by reference for the purpose of describing and disclosing, for example, the methodologies described in such publications that might be used in connection with the present invention. These publications are provided solely for their disclosure prior to the filing date of the present application. Nothing in this regard should be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention or for any other reason. All statements as to the date or representation as to the contents of these documents are based on the information available to the applicants and do not constitute any admission as to the correctness of the dates or contents of these documents.
Unless otherwise defined herein, scientific and technical terms used in connection with the present application shall have the meanings that are commonly understood by those of ordinary skill in the art to which this disclosure belongs. It should be understood that this invention is not limited to the particular methodology, protocols, and reagents, etc., described herein and as such can vary. The terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention, which is defined solely by the claims. Definitions of common terms in immunology, and molecular biology can be found in The Merck Manual of Diagnosis and Therapy, 18th Edition, published by Merck Research Laboratories, 2006 (ISBN 0-911910-18-2); Robert S. Porter et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0-632-02182-9); and Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8); Immunology by Werner Luttmann, published by Elsevier, 2006. Definitions of common terms in molecular biology are found in Benjamin Lewin, Genes IX, published by Jones & Bartlett Publishing, 2007 (ISBN-13: 9780763740634); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0-632-02182-9); and Robert A. Meyers (ed.), Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., USA (1982); Sambrook et al., Molecular Cloning: A Laboratory Manual (2 ed.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., USA (1989); Davis et al., Basic Methods in Molecular Biology, Elsevier Science Publishing, Inc., New York, USA (1986); or Methods in Enzymology: Guide to Molecular Cloning Techniques Vol. 152, S. L. Berger and A. R. 
Kimmel Eds., Academic Press Inc., San Diego, USA (1987); Current Protocols in Molecular Biology (CPMB) (Fred M. Ausubel, et al. ed., John Wiley and Sons, Inc.), Current Protocols in Protein Science (CPPS) (John E. Coligan, et al., ed., John Wiley and Sons, Inc.) and Current Protocols in Immunology (CPI) (John E. Coligan, et al., ed. John Wiley and Sons, Inc.), which are all incorporated by reference herein in their entireties.
Presented herein are strategies for designing synthetic gene circuits which implement analog computation in living cells. One approach involves detailed biochemical models which capture the effects of positive feedback, shunt plasmids, protein degradation, and transcription-factor diffusion. These detailed biochemical models enable us to accurately capture the behavior of the various analog circuit topologies by solely changing the parameters that are expected to vary between experiments (e.g., plasmid copy number).
Another approach described herein uses simple mathematical functions, such as logarithms, to capture the behavior of the analog circuit motifs described herein with a handful of parameters. These empirical mathematical functions enable the composition of analog circuit modules together with predictable behavior. Thus, they are useful in the synthetic circuit design process because they are easily interpretable by human designers and remain accurate in circuits of higher complexity.
Described herein are detailed biochemical models for synthetic analog genetic circuits. The models described and demonstrated herein incorporate effects of biochemical interactions, such as binding of inducers to transcription factors, binding of transcription factors to promoters, degradation of free and DNA-bound transcription factors, the effective variation of transcription-factor diffusion-limited binding rates inside the cell with variation in plasmid copy number, and the integration of all these effects in the positive-feedback-and-shunt (PF-shunt) topology described herein. To clarify the various interactions within these biochemical reaction models, analog circuit schematics that represent steady-state mass-action kinetics are also shown.
The models described and demonstrated herein yield insight into and predict network behavior. The models assume that the concentration of chemical species is uniformly distributed and the behavior of the genetic circuits described herein can be analyzed in the steady state. For each experiment, only model parameter values that varied in that experiment (e.g., the copy number of plasmids used) were adjusted. All other parameter values were used consistently throughout all of our models.
As used herein, to describe interactions between inducers, transcription factors, and DNA, transcription factors are called “free” if they are not interacting with inducers or DNA. When inducers complex with transcription factors, the resulting product is termed the inducer-transcription-factor “complex”. When free transcription factors bind to DNA, these are termed “bound” transcription factors. When inducer-transcription factor complexes bind to DNA, these are termed as “bound complex transcription factors”. (For all the abbreviations, refer to Table 1).
The set of ordinary differential equations which model the process of free inducer (In) binding to free transcription factor (T) (In+T⇄TC) can be described by:
Where TC is the concentration of transcription factor bound to the inducer, k1 is the rate of the forward reaction and k−1 is the rate of the reverse reaction. At equilibrium, the bound transcription factor is equal to:
Where InT is the concentration of total inducer, TT is the concentration of total transcription factor and Km=k−1/k1 is the dissociation constant. In the case that
we can approximate Eq. 2.4 as:
Note that the Michaelis-Menten approximation is a special case of Eq. 3 (where TT<<InT). Eq. 3 shows that the amount of bound transcription factor (TC) saturates at high values of total transcription factor (TT) because it is limited by the inducer concentration (InT); in contrast, in the Michaelis-Menten model, bound transcription factor increases linearly with increasing total transcription factor, without being limited by the available inducer.
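To make this comparison concrete, the following minimal sketch solves the 1:1 binding equilibrium In+T⇄TC directly from mass action and conservation of total inducer and total transcription factor, and compares the result with two approximations. The closed forms are standard algebra and are not a transcription of Eqs. 2 and 3; the first-order form in particular is an assumption about what Eq. 3 looks like. Function names and numerical values are illustrative only.

```python
import math


def bound_exact(in_total, t_total, km):
    """Exact equilibrium TC for In + T <-> TC.

    Solves (InT - TC) * (TT - TC) = Km * TC for the physically valid
    (smaller) root, written in a numerically stable form.
    """
    s = in_total + t_total + km
    disc = s * s - 4.0 * in_total * t_total  # always >= 0
    return 2.0 * in_total * t_total / (s + math.sqrt(disc))


def bound_first_order(in_total, t_total, km):
    """ASSUMED approximation, valid when 4*InT*TT << (InT + TT + Km)^2."""
    return in_total * t_total / (in_total + t_total + km)


def bound_michaelis_menten(in_total, t_total, km):
    """Michaelis-Menten limit (TT << InT): linear in TT, no inducer limit."""
    return t_total * in_total / (in_total + km)


if __name__ == "__main__":
    km, in_total = 5.0, 10.0  # illustrative values, arbitrary units
    for t_total in (0.1, 1.0, 10.0, 100.0, 1000.0):
        print(
            t_total,
            round(bound_exact(in_total, t_total, km), 4),
            round(bound_first_order(in_total, t_total, km), 4),
            round(bound_michaelis_menten(in_total, t_total, km), 4),
        )
```

In this sketch the exact and first-order values level off near InT as TT grows, whereas the Michaelis-Menten value keeps rising in proportion to TT, which is the qualitative difference described above.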
Many binding reactions include cooperativity between inducers and transcription factors. We will study two specific cases of cooperativity (h=2 and 3, where h is the Hill Coefficient):
In the case of h=2 (Hill Coefficient=2):
The set of the ordinary differential equations which describes the set of biochemical reactions in Eq. 4 includes:
At equilibrium:
Where Km1=k−1/k1, and Km2=k−2/k2. Substituting Eq. 6.1, 6.3 and 6.4 into Eq. 6.2, we get:
We will assume that the concentration of the product of the final reaction is larger than the concentration of the product of the intermediate reactions (Km2<<Km1); in this case, Eq. 7 can be approximated by:
Where $K_m^2 = K_{m1}\cdot K_{m2}$. In the case that
we can approximate Eq. 8 as
In the case of h=3 (Hill Coefficient=3):
The set of the ordinary differential equations which describes the set of biochemical reactions in Eq. 10 includes:
At equilibrium:
Where Km1=k−1/k1, Km2=k−2/k2 and Km3=k−3/k3. Substituting Eq. 12.1, 12.2, 12.4 and 12.5 into Eq. 12.3 we get:
We will assume that the concentration of the product of the final reaction is larger than the concentration of the products of the intermediate reactions (Km3<<Km2, Km1); in this case Eq. 13 can be approximated by:
Where $K_m^3 = K_{m1}\cdot K_{m2}\cdot K_{m3}$. In the case that
we can approximate Eq. 14 as
Based on these specific cases, we can generalize Eq. 3, 9 and 15 by using the Hill function2:
where h1 is the Hill coefficient, and h2 and Kn are fitting parameters with h2<h1 and Kn<Km. We study the condition
in two different cases:
We use Eq. 16 to describe inducer-transcription factor binding reactions in combination with literature-based values for the Hill coefficient h1 and dissociation constant Km (Supplementary Table 2).
Transcription factor (TF) binding to promoters is modeled according to the Shea-Ackers formalism3,4. The total expression PT from a promoter is described by a weighted sum of the basal level probability (1−P) and the induced level probability P:
$$P_T = \mathrm{Const}_1\cdot(1-P) + \mathrm{Const}_2\cdot P \;\rightarrow\; P_T = \mathrm{Const}_1 + (\mathrm{Const}_2-\mathrm{Const}_1)\cdot P, \tag{17}$$
where Const1 and Const2 are constants that correspond to basal or induced expression respectively. In this study we used two activator-type transcription factors: LuxR5 and AraC6. The probability of the Lux promoter (Plux) being induced is described by the following equation:
where Kd is the dissociation constant for the binding of the inducer-transcription factor (AHL-LuxR) complex (LuxRC) to the promoter Plux. The concentration of the bound-promoter complex (AHL-LuxR-Plux) is directly proportional to the probability of the promoter being induced and the concentration of promoter binding sites (OT):
The sum of the free (AHL-LuxR) complex (LuxRC) and the bound (AHL-LuxR) complex (LuxRCb) is equal to the total (AHL-LuxR) complex LuxRCT:
$$\mathrm{LuxR}_{CT} = \mathrm{LuxR}_{C} + \mathrm{LuxR}_{Cb} \tag{20}$$
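The promoter relations above can be combined numerically. The sketch below takes the conservation relation of Eq. 20, the statement that the bound complex equals the number of binding sites times the induction probability, and the weighted sum of Eq. 17. The single-site occupancy P = (C/Kd)/(1 + C/Kd) used here is an assumption standing in for the induction-probability equation, which may include additional terms or cooperativity; all names and values are hypothetical.

```python
import math


def free_complex(total_complex, kd, sites):
    """Free activator complex C solving C_T = C + O_T * C / (Kd + C).

    Uses Eq. 20 with bound complex = O_T * P and the ASSUMED
    single-site occupancy P = (C / Kd) / (1 + C / Kd).
    """
    b = kd + sites - total_complex
    root = math.sqrt(b * b + 4.0 * total_complex * kd)
    if b >= 0.0:
        # Stable form of (-b + root) / 2 when b is non-negative.
        return 2.0 * total_complex * kd / (b + root)
    return 0.5 * (root - b)


def induced_probability(c_free, kd):
    """ASSUMED single-site probability that the promoter is induced."""
    return (c_free / kd) / (1.0 + c_free / kd)


def promoter_output(p, const1, const2):
    """Eq. 17: basal level plus the induced increment weighted by P."""
    return const1 + (const2 - const1) * p


if __name__ == "__main__":
    kd, sites = 10.0, 20.0  # illustrative values, arbitrary units
    for total in (1.0, 10.0, 50.0, 500.0):
        c = free_complex(total, kd, sites)
        p = induced_probability(c, kd)
        print(total, round(c, 3), round(p, 3), round(promoter_output(p, 2.0, 100.0), 2))
```

Solving for the free complex first matters when the number of binding sites is comparable to the amount of complex, because a noticeable fraction of the complex is then sequestered on DNA.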
The PBAD promoter is activated by the AraC transcription factor when it is induced by arabinose. The probability of the PBAD promoter being induced by the arabinose-AraC complex is described by the following equation7:
where AraCC is the concentration of the arabinose-AraC complex, AraC is the concentration of free AraC transcription factor, Kd is the dissociation constant for binding of the arabinose-AraC complex to the PBAD promoter, and Kdf is the dissociation constant for free AraC binding to PBAD. The probability of free AraC binding to the promoter is equal to:
The concentration of the bound-promoter complex arabinose-AraC-PBAD (AraCCb) is directly proportional to the probability of the promoter being induced and the number of the promoter binding sites (OT):
The concentration of the bound AraC (AraCb) to the promoter is directly proportional to the probability of binding the free AraC to the promoter and the number of the promoter binding sites:
The sum of the free (arabinose-AraC) complex (AraCC) and bound (arabinose-AraC) complex (AraCCb) to DNA is equal to the total (arabinose-AraC) complex AraCCT, and the sum of free AraC (AraC) and bound AraC (AraCb) to DNA is equal to AraCT−AraCCT:
$$\mathrm{AraC}_{CT} = \mathrm{AraC}_{C} + \mathrm{AraC}_{Cb} \tag{25}$$
$$\mathrm{AraC}_{T} - \mathrm{AraC}_{CT} = \mathrm{AraC} + \mathrm{AraC}_{b} \tag{26}$$
In the models described herein, as in others, free and DNA-bound transcription factor degrade at different rates8. Generally DNA can protect a transcription factor from degradation, thereby decreasing its degradation rate. The degradation process for a transcription factor can be described by the following reactions9,10:
where T is the concentration of free transcription factor; Tb is the concentration of transcription factor bound to DNA; E is the concentration of free protein-degrading enzyme; kf and kfb are the forward reaction rates of the binding of free transcription factor and DNA-bound transcription factor to the protein-degrading enzyme, respectively; kr and krb are the reverse reaction rates of the binding of free transcription factor and DNA-bound transcription factor to the protein-degrading enzyme, respectively; kc and kcb are the forward reaction rates of enzyme function and release for the enzyme-free-transcription-factor complex and the enzyme-DNA-bound-transcription-factor-complex, respectively; and γ is the dilution rate of total transcription factor due to cell growth. We assume that the degradation rate is not directly affected by the binding of inducers to transcription factors.
The set of ordinary differential equations which model the degradation process is:
In steady state dTE/dt=0, dTbE/dt=0, which leads to:
The decay of free and bound transcription factor can be expressed by:
Substituting Eq. 29 into Eq. 30, we get:
The sum of free protein-degrading enzyme E and bound enzyme to the transcription factors (TE and TbE) is equal to the total enzyme concentration (ET):
$$E_T = E + TE + T_bE \tag{32}$$
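As an illustration of how Eq. 32 constrains the free enzyme, the sketch below assumes that each enzyme-bound complex is in quasi-steady state and proportional to the free enzyme, TE = T·E/K and TbE = Tb·E/Kb, with K and Kb acting as effective Michaelis-type constants. These proportionality forms are assumptions made for illustration and should be checked against Eqs. 29.1 and 29.2; the function name and values are hypothetical.

```python
def free_enzyme(e_total, free_tfs, bound_tfs):
    """Free protein-degrading enzyme from enzyme conservation (Eq. 32).

    free_tfs:  iterable of (T_i, K_i) pairs for free proteins.
    bound_tfs: iterable of (Tb_j, Kb_j) pairs for DNA-bound factors.
    ASSUMES TE_i = T_i * E / K_i and TbE_j = Tb_j * E / Kb_j.
    """
    load = sum(t / k for t, k in free_tfs)
    load += sum(tb / kb for tb, kb in bound_tfs)
    return e_total / (1.0 + load)


if __name__ == "__main__":
    e_total = 1.0  # illustrative value, arbitrary units
    free_tfs = [(2.0, 4.0)]
    bound_tfs = [(3.0, 6.0)]
    e = free_enzyme(e_total, free_tfs, bound_tfs)
    te = sum(t * e / k for t, k in free_tfs)
    tbe = sum(tb * e / kb for tb, kb in bound_tfs)
    print(e, te, tbe, e + te + tbe)  # the last value recovers e_total
```

Under these assumptions, every species that binds the enzyme adds a term to the denominator, so a heavily loaded enzyme pool degrades each individual species more slowly.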
Substituting Eq. 29.1 and Eq. 29.2 into Eq. 32, we can express the concentration of free protein-degrading enzyme as:
In the general case where there are multiple protein species that are degraded by enzyme E, the concentration of free protein-degrading enzyme can be described as:
Where i indexes the different free proteins and transcription factors, and j indexes the different DNA-bound transcription factors. In this model, the degradation of free transcription factors or proteins is significantly faster than the degradation of DNA-bound transcription factors, such that most protein-degrading enzyme is typically free or associated with bound transcription factors. Therefore, if we assume that Ti/Ki<<Tbi/Kbi, the free protein-degrading enzyme concentration can be expressed by:
Substituting the general form of the free protein-degrading enzyme concentration (Eq. 35) into Eq. 31, the general decay of free and bound transcription factors can be modeled as:
The steady-state mass-action model assumes that there is a balance between the overall production rate and the degradation rate of the transcription factor:
where G is the total production rate. The sum of the free and the bound forms of transcription factor to DNA is equal to the total transcription factor (TTi=Ti+Tbi):
In steady state we get:
Where μeff is given by:
and the “protection parameter”
The protection parameter generally varies in the range 0≦θi≦1, with two extreme cases:
Positive-feedback loops are commonly used motifs in genetic circuits and depending on their context exhibit different behavior, including bi-stability in toggle-switch circuits11 and hysteresis in digital memory devices12. While positive feedback has many different forms, the simplest form of genetic positive feedback is the production of a transcriptional activator by its promoter (
A schematic diagram that represents LuxR positive feedback is shown in
where g is the production rate for induced promoter expression and Basal is the basal level. Similarly, the schematic diagram for AraC positive feedback is shown in
The modeling and experimental results are presented in
The shunt circuit with positive feedback is depicted in
where subscripts with “1” refer to the positive-feedback plasmid and subscripts with “2” refer to the shunt plasmid.
In the case that the plasmids are distributed uniformly inside the cell, we can assume that the distance between plasmid copies Δx is approximately equal to $(V/N)^{1/3}$, where N is the total plasmid copy number and V is the cell volume. Since the jumping of transcription factors between the plasmids is described by a 3D diffusion process, we can express the jumping time as14:
The forward reaction rate of TF binding to DNA is inversely proportional to the search time, such that:
$$K_{d1} = K_{-11}\cdot\tau_{slide1} \tag{45.1}$$
$$K_{d2} = K_{-12}\cdot(\tau_{slide2}+\tau_{jump}) \tag{45.2}$$
where Kd1 and Kd2 are the dissociation constants of the transcription factor for the PF plasmid and shunt plasmid respectively, K−11 and K−12 are proportional to the reverse reaction rates of the transcription factor binding to the promoter of the PF plasmid and shunt plasmid, respectively, and τslide1 and τslide2 are the sliding times of the transcription factor in the PF plasmid and shunt plasmid, respectively. If we assume that the sliding time is not dependent on the plasmid copy number, then dividing Eq. 45.1 by Eq. 45.2 yields:
where D is the diffusion coefficient, and (k2=ln(2)/τslide2) is a rate constant that describes transcription-factor binding to the shunt-plasmid promoter.
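The copy-number dependence described above can be sketched numerically. The spacing Δx = (V/N)^(1/3) and the ratio of Eqs. 45.2 and 45.1 come from the text; the jump time τjump = Δx²/(6·D) is an assumed textbook form for three-dimensional diffusion and may differ from the expression used in the model by a constant factor. Note that the function below returns Kd2/Kd1, the reciprocal of the quotient formed in the text. All parameter values are illustrative placeholders.

```python
def mean_spacing(volume, copies):
    """Mean distance between uniformly distributed plasmid copies."""
    return (volume / copies) ** (1.0 / 3.0)


def jump_time(volume, copies, diffusion, geometry_factor=6.0):
    """ASSUMED 3D diffusion time over the mean spacing: dx^2 / (6 * D)."""
    dx = mean_spacing(volume, copies)
    return dx * dx / (geometry_factor * diffusion)


def kd_ratio(tau_slide1, tau_slide2, tau_jump, k_rev_ratio=1.0):
    """Kd2 / Kd1 from Eqs. 45.1 and 45.2; k_rev_ratio is K_-12 / K_-11."""
    return k_rev_ratio * (tau_slide2 + tau_jump) / tau_slide1


if __name__ == "__main__":
    volume = 1.0     # cell volume in cubic micrometers (placeholder)
    diffusion = 3.0  # diffusion coefficient in um^2/s (placeholder)
    tau_slide = 0.005  # sliding time in seconds (placeholder)
    for copies in (5, 50, 500):
        tj = jump_time(volume, copies, diffusion)
        print(copies, round(tj, 5), round(kd_ratio(tau_slide, tau_slide, tj), 3))
```

With these assumptions the jump time scales as N^(-2/3), so raising the total copy number shortens the search and moves the effective Kd of the shunt plasmid toward the reference value.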
We note two important points:
In our models, transcription-factor diffusion processes only influence the Kd of the shunt plasmid and not that of the PF plasmid. Therefore, Kd1 is defined as the reference dissociation constant (when the distance between the TF gene and its cognate binding site on the same plasmid is less than 1 Kbp13 or the search type is local).
When we fit our model (
The experimental and modeling results of the PF-shunt circuit for LuxR and AraC with different copy numbers are presented in
Using transcriptional activators and repressors in multi-component circuits, we developed several synthetic analog gene circuits. The first circuit gives a wide-dynamic-range negative-slope logarithm (
Where LacIT is the total LacI concentration, Km is the dissociation constant between IPTG and LacI, and h1 is the Hill coefficient which represents cooperativity between IPTG and LacI. The concentration of free LacI is expressed by:
$$\mathrm{LacI} = \mathrm{LacI}_{T} - \mathrm{LacI}_{C} \tag{48}$$
We consider three possible binding states for the PlacO promoter: (1) The promoter is empty with probability 1, (2) Free LacI repressor is bound to the promoter with probability LacI/Kdf, and (3) IPTG-LacI complex (LacIC) is bound to the promoter with probability LacIC/Kd, where Kdf<<Kd. The probability of the PlacO promoter being in an open complex P is described by the following equation:
where ni represents the cooperativity between LacI and the promoter. In the work described herein, we used the PlacO promoter in two networks:
The genetic circuit of the wide-dynamic-range negative-slope is shown in
The WDR PF-shunt subcircuit of
The dissociation constant for binding of LuxR to the Plux promoter is defined according to Eq. 47. We use
where N is the sum of the high and the low copy numbers and
where N is the low copy number. Subscripts ‘1’, ‘2’, and ‘3’ correspond to the Plux1, Plux2, and Plux3 promoters in
The experimental characterization and the modeling results of the PlacO promoter are shown in
$$G = g\cdot O_T\cdot P, \tag{50}$$
where g is the production rate, OT is the number of PlacO binding sites, and P is the probability of the PlacO promoter being in an open complex (Eq. 49). Since the output of the PlacO promoter is the mCherry reporter protein, the degradation rate is calculated according to:
μeff=μ0+γ (51)
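The PlacO stage can be sketched end to end as follows. Eqs. 48, 50 and 51 are used as stated; the steady-state output is taken as production divided by the effective degradation rate. Two pieces are assumptions standing in for equations not reproduced here: a Hill-type expression for the IPTG-LacI complex, and an open-complex probability formed as the empty-state weight divided by the sum of the three state weights, each repressor weight raised to the power ni. Parameter values are placeholders, not the values of Table 2.

```python
def laci_complex(iptg, laci_total, km, h1):
    """ASSUMED Hill-type binding of IPTG to LacI (stand-in for Eq. 47)."""
    x = (iptg / km) ** h1
    return laci_total * x / (1.0 + x)


def open_probability(laci_free, laci_cplx, kdf, kd, ni):
    """ASSUMED three-state form: empty weight over the sum of all weights."""
    return 1.0 / (1.0 + (laci_free / kdf) ** ni + (laci_cplx / kd) ** ni)


def steady_state_output(iptg, laci_total=10.0, km=1.0, h1=2.0, kdf=0.1,
                        kd=10.0, ni=2.0, g=1.0, sites=1.0, mu0=0.5, gamma=0.5):
    """Steady-state reporter level G / mu_eff for a given IPTG input."""
    cplx = laci_complex(iptg, laci_total, km, h1)
    free = laci_total - cplx                  # Eq. 48
    p = open_probability(free, cplx, kdf, kd, ni)
    production = g * sites * p                # Eq. 50
    mu_eff = mu0 + gamma                      # Eq. 51
    return production / mu_eff


if __name__ == "__main__":
    for iptg in (0.0, 0.1, 1.0, 10.0, 100.0):
        print(iptg, steady_state_output(iptg))
```

With Kdf much smaller than Kd, free LacI dominates repression at low IPTG, and the output rises as IPTG converts free LacI into the weakly binding complex.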
Model parameters are listed in Table 2. We found that the ratio
is consistent with published parameters16.
By combining the WDR PF-shunt subcircuit of
We used negative feedback to create a genetic power-law circuit (
N is the copy number of the high copy plasmid (HCP). The experimental and modeling results of the power-law circuit are shown in
We constructed four open loop circuits to test the effect of adding a shunt plasmid. The first circuit is shown in
The experimental and modeling results of the open-loop circuits are shown in
We constructed two open loop circuits with AraC. The first circuit is shown in
To test the specific effect of the shunt on linearization, we constructed a new circuit (
As described herein, we fit our experimental results to simple mathematical approximations which enable straightforward analog circuit design. Unlike the detailed biochemical models also discussed herein, these approximations are not based on physical parameters; they are useful for quick design and for insight into circuit behavior.
General genetic circuits including our wide-dynamic-range PF-shunt circuit can be empirically approximated by a simple Hill function8:
where In is the inducer concentration (AHL, Arab), n is the Hill coefficient, a is an amplification parameter, d is the basal level of expression and f( ) represents the output. The Hill function $x^n/(1+x^n)$ can be re-written as:
For small values of $\ln(1+x^n)$, we get:
Then, we approximate our PF-shunt output as:
For $(In/b)^n>1$, we can approximate Eq. 56 as:
In practice, a and n are represented by one parameter $a' = a\cdot n$ and n is set to 1 in all fits.
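The chain of approximations above can be checked numerically. The sketch below relies on the identity x^n/(1+x^n) = 1 − exp(−ln(1+x^n)), which reduces to ln(1+x^n) when that logarithm is small, and on an empirical output of the form f = a·ln(1+(In/b)^n) + d. That output form is an assumed reading of Eq. 56, with b taken to be the input scale; parameter values are illustrative.

```python
import math


def hill(x, n=1.0):
    """Normalized Hill function x^n / (1 + x^n)."""
    xn = x ** n
    return xn / (1.0 + xn)


def log_form(x, n=1.0):
    """ln(1 + x^n), the small-signal replacement for the Hill function."""
    return math.log1p(x ** n)


def pf_shunt_fit(inducer, a, b, d, n=1.0):
    """ASSUMED empirical output: a * ln(1 + (In/b)^n) + d."""
    return a * math.log1p((inducer / b) ** n) + d


if __name__ == "__main__":
    for x in (0.01, 0.1, 1.0, 10.0):
        print(x, hill(x), log_form(x))
    # Each tenfold increase in input adds roughly a * n * ln(10) to the output.
    a, b, d = 2.0, 1.0, 0.5
    for inducer in (1e1, 1e2, 1e3, 1e4):
        print(inducer, pf_shunt_fit(inducer, a, b, d))
```

Well above the input scale b, the fitted output increases by an approximately constant amount per decade of inducer, which is the log-linear behavior the simple expression is meant to capture.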
Because log-domain electronic circuits obey the exponential laws of Boltzmann thermodynamics like biochemical circuits do, highly accurate biochemical functions and Hill-function approximations thereof can be implemented by analog circuits that only use a single transistor or a handful of transistors1,20. Therefore, the ln(1+x) function is a good approximation for describing the input-output behavior of electronic circuits as well.
The wide-dynamic-range negative-slope circuit includes two stages:
According to the approximation of Eq. 55, PlacO promoter activity is then well-fit by:
The fitting results for PlacO promoter activity are shown in
The fitting results are shown in
For 1<<AHL/b1, we get a negative-slope logarithm function:
External tuning of the multi-stage analog circuits described herein via inducers is not essential, which is an advantage for the scalability of our circuits in situations where an inducer may not be available. For example,
The log-linear adder circuit can be fit by the simple expression, indicating a sum of log-transformed inputs:
The ratiometer can be fit by the simple mathematical expression, indicating a difference between log-transformed inputs:
In the case that a1=a2=a:
In
From
from the LCP. Here, G1 represents maximal production from the PlacO promoter. Similarly, from the HCP,
where G2 represents maximal production from the PBAD promoter. These two equations need to be consistent as per the negative-feedback loop of
According to Eq. 46.1, for the LacI production from the HCP we get:
Similarly, from Eq. 46.1, for the AraC production from the LCP we get:
For large NHCP we get:
In the range where
Thus, we have a power-law circuit as confirmed by the measurements of
Analog functions can be integrated with digital control as a powerful mixed-signal strategy for tuning dynamic circuit behavior. To demonstrate such functionality, we built a positive-logarithm circuit that could be toggled by the presence or absence of an input inducer (
The same circuit can implement a negative-logarithm circuit with AHL as its input that can be digitally toggled by the presence or absence of arabinose. As shown in
We constructed a new wide-dynamic-range PF-shunt circuit with two identical promoters on the shunt HCP. The circuit is shown in
Time-course experiments were performed on our AHL wide-dynamic-range positive-logarithm circuit described herein (the circuit of
Once the diluted cultures grew to an OD600 of ~0.5 (~3 hours), 20 μl of culture was moved into a new 96-well plate containing 200 μl of media, antibiotics, and inducers and then incubated in a VWR microplate shaker at 37° C. and 700 rpm.
At OD600 ~0.5, 50 μl of culture was moved to a 96-well plate with 200 μl of PBS and taken to a FACS machine for measurement. In addition, 20 μl of culture was moved into a new 96-well plate containing 200 μl of media, antibiotics, and inducers and then incubated in a VWR microplate shaker at 37° C. and 700 rpm. This iterative dilution, growth, and measurement process was repeated over 10 hours.
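The dilution arithmetic implied by these transfer volumes can be written out as a short calculation. It assumes that volumes are additive, that the stated 200 μl is the volume in the well before the culture is added, and that growth between transfers is exponential; none of these assumptions is stated in the protocol, and the function names are hypothetical.

```python
import math


def dilution_factor(transfer_ul, diluent_ul):
    """Fold dilution when transfer_ul of culture is added to diluent_ul."""
    return (transfer_ul + diluent_ul) / transfer_ul


def doublings_to_recover(transfer_ul, diluent_ul):
    """Population doublings needed to return to the pre-dilution density."""
    return math.log2(dilution_factor(transfer_ul, diluent_ul))


if __name__ == "__main__":
    passage = dilution_factor(20.0, 200.0)  # culture into fresh media
    facs = dilution_factor(50.0, 200.0)     # culture into PBS for FACS
    print(passage, facs, 0.5 / passage, doublings_to_recover(20.0, 200.0))
```

Under these assumptions each passage is an 11-fold dilution, so a culture restarted from an OD600 of about 0.5 begins near 0.045 and needs roughly 3.5 doublings to return to the measurement density.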
The experimental results corresponding to different times are shown in
Here, we explore the effects of the circuit motifs described herein on sensitivity. If we change the input signal In to In+ΔIn and measure the response Δf in the output signal f, then the sensitivity is defined as24:
where < > denotes the stationary values of In and f.
We calculate the sensitivity for input-output transfer curves that fit a log-linear function and for input-output transfer curves that fit a Hill function:
If the input-output transfer curve does not saturate and fits a log-linear function (Eq. 56), for example, in our PF-and-shunt circuits, then:
If the input-output transfer curve saturates and fits a Hill function (Eq. 53), for example, in circuits with strong positive feedback and in circuits with open-loop motifs, then:
In
As described in Madar et al. and illustrated in
Rewriting Eq. 70.3 by substituting in Eq. 72, the sensitivity of our analog circuits can be defined as:
where d in Eq. 70.3 is defined as the basal level (Basal) of the transfer function.
Based on Eq. 73, the sensitivity is influenced by the IDR and the ratio between the basal level and the maximum output, a.
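The contrast between log-linear and saturating transfer curves can be illustrated numerically. The following is a minimal sketch, assuming that the sensitivity is the relative change in output per relative change in input, estimated here as the logarithmic derivative d ln f / d ln In. The functions `log_linear` and `hill`, and all parameter values, are hypothetical stand-ins chosen for illustration; they are not the fitted forms of Eq. 56 or Eq. 53.

```python
import math


def log_linear(x, a=1.0, b=1.0, d=0.1):
    """Assumed non-saturating curve: f = a*ln(1 + x/b) + d, with d as the basal level."""
    return a * math.log(1.0 + x / b) + d


def hill(x, fmax=1.0, k=1.0, n=2.0, d=0.1):
    """Assumed saturating Hill curve with basal level d (illustrative parameters)."""
    return d + fmax * x**n / (k**n + x**n)


def sensitivity(f, x, rel_step=1e-4):
    """Central-difference estimate of d ln f / d ln x at input x."""
    hi = x * (1.0 + rel_step)
    lo = x * (1.0 - rel_step)
    return (math.log(f(hi)) - math.log(f(lo))) / (math.log(hi) - math.log(lo))


if __name__ == "__main__":
    # Compare the two assumed curves over four decades of input.
    for x in (0.1, 1.0, 10.0, 100.0, 1000.0):
        s_log = sensitivity(log_linear, x)
        s_hill = sensitivity(hill, x)
        print(f"In={x:8.1f}  S_loglinear={s_log:.4f}  S_hill={s_hill:.6f}")
```

Under these assumed forms, the Hill curve loses sensitivity quickly once the input exceeds its half-maximal point, whereas the log-linear curve retains a measurable response at inputs several decades higher.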
In this section, we describe minimal models for graded positive feedback without a shunt and for graded positive feedback with a shunt that are based only on biochemical reactions. These minimal models, while sacrificing some accuracy compared to our previously described complex biophysical models, nevertheless provide insight and intuition about the mechanism of linearization enabled by positive feedback. For example, they reveal that the use of graded positive feedback enables linearization and wide-dynamic-range operation on just a single plasmid if the Kd for biochemical binding of the transcription-factor complex to DNA is appropriate: the positive feedback, whose strength depends on this Kd, must not be so strong that it yields latching or reduced-dynamic-range analog operation, and it must not be so weak that it is ineffective at compensating for saturating effects. Indeed, our scheme for widening the log-linear dynamic range of operation via graded positive feedback is conceptually general and applies to both genetic and electronic circuits: expansive sinh-based linearization of compressive tanh-based functions in log-domain electronic circuits27 is analogous to the use of expansive positive-feedback linearization of compressive biochemical binding functions in log-domain genetic circuits, and such circuits show an optimum as well.
The biochemical reactions that describe graded positive feedback without a shunt are:
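$$I_n + T \rightleftharpoons T_C \qquad (79.1)$$
$$T_C + \mathrm{DNA}_{LCP} \rightleftharpoons G_{LCP} \qquad (79.2)$$
$$G_{LCP} \rightarrow G_{LCP} + T \qquad (79.3)$$
$$T \rightarrow \phi \qquad (79.4)$$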
Eq. 79.1 describes the binding reaction of the inducer to the transcription factor. Eq. 79.2 describes the binding of the complex to the promoter. Eq. 79.3 describes the positive feedback loop and Eq. 79.4 describes the degradation of the transcription factor due to dilutive cell division. We define the input dynamic range (IDR) as the ratio of the input concentrations required for 90% and 10% of the maximal output25 as shown in
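The IDR definition used here (the ratio of the input concentrations giving 90% and 10% of the maximal output) can be evaluated numerically for any monotonically increasing transfer curve. The sketch below is illustrative only: the `hill` function and its parameters are assumptions, not the circuit models of this document, and the helper names are hypothetical. For a pure Hill function with coefficient n and no basal level, the IDR equals 81^(1/n), which gives a convenient check.

```python
import math


def hill(x, k=1.0, n=1.0):
    """Illustrative Hill curve with maximal output 1 and no basal level."""
    return x**n / (k**n + x**n)


def input_at_fraction(f, fraction, f_max, lo=1e-12, hi=1e12, iters=200):
    """Input at which an increasing curve f reaches fraction*f_max (bisection in log space)."""
    target = fraction * f_max
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def input_dynamic_range(f, f_max):
    """IDR: ratio of the inputs giving 90% and 10% of the maximal output."""
    return input_at_fraction(f, 0.9, f_max) / input_at_fraction(f, 0.1, f_max)


if __name__ == "__main__":
    for n in (1.0, 2.0, 4.0):
        idr = input_dynamic_range(lambda x: hill(x, n=n), f_max=1.0)
        print(f"n={n}: IDR={idr:.3f} ({math.log10(idr):.2f} decades)")
```

A steeper (more cooperative) curve therefore has a narrower IDR, which is why a wide log-linear range requires a shallow effective transfer function.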
A minimal set of biochemical equations for graded positive feedback involving a shunt is given by:
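$$I_n + T \rightleftharpoons T_C \qquad (80.1)$$
$$T_C + \mathrm{DNA}_{LCP} \rightleftharpoons G_{LCP} \qquad (80.2)$$
$$T_C + \mathrm{DNA}_{HCP} \rightleftharpoons G_{HCP} \qquad (80.3)$$
$$G_{LCP} \rightarrow G_{LCP} + T \qquad (80.4)$$
$$G_{HCP} \rightarrow G_{HCP} + \mathrm{Signal} \qquad (80.5)$$
$$T \rightarrow \phi \qquad (80.6)$$
$$\mathrm{Signal} \rightarrow \phi \qquad (80.7)$$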
Eq. 80.1 describes the binding of the inducer to the transcription factor. Eq. 80.2 and Eq. 80.3 describe the binding of the complex to the promoter on the LCP and HCP. For simplicity in the minimal model, we assume that the forward and reverse rates of binding to the LCP and HCP are equal. Eq. 80.4 describes the positive-feedback loop and Eq. 80.5 describes the expression of the signal by the shunt. The final two reactions describe the degradation of the transcription factor and the signal, which we assume occur at identical rates due to dilutive cell division. The simulation results are shown in
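Reactions 80.1 to 80.7 can be written as mass-action rate equations. The following is a minimal sketch under stated assumptions, not the simulation used for the results in this document: the inducer is treated as a constant (in excess), promoter copies are conserved on each plasmid, binding rates to the LCP and HCP are equal, and a single dilution rate applies to the transcription factor and the signal. All names and parameter values in `PARAMS` are hypothetical, and forward Euler is used only to keep the example dependency-free.

```python
# Hypothetical parameter values, chosen only so the example runs stably.
PARAMS = {
    "n_lcp": 5.0,       # promoter copies on the low-copy plasmid
    "n_hcp": 50.0,      # promoter copies on the high-copy (shunt) plasmid
    "k_on_ind": 1.0,    # inducer-TF association (80.1, forward)
    "k_off_ind": 1.0,   # inducer-TF dissociation (80.1, reverse)
    "k_on_dna": 1.0,    # complex-promoter association (80.2, 80.3)
    "k_off_dna": 1.0,   # complex-promoter dissociation (80.2, 80.3)
    "beta_t": 1.0,      # TF production per bound LCP promoter (80.4)
    "beta_s": 1.0,      # signal production per bound HCP promoter (80.5)
    "gamma": 0.1,       # dilution rate of TF and signal (80.6, 80.7)
}


def shunt_rhs(state, inducer, p):
    """Mass-action time derivatives for (T, TC, G_LCP, G_HCP, Signal)."""
    t, tc, g_lcp, g_hcp, signal = state
    free_lcp = p["n_lcp"] - g_lcp
    free_hcp = p["n_hcp"] - g_hcp
    v_ind = p["k_on_ind"] * inducer * t - p["k_off_ind"] * tc        # 80.1
    v_lcp = p["k_on_dna"] * tc * free_lcp - p["k_off_dna"] * g_lcp   # 80.2
    v_hcp = p["k_on_dna"] * tc * free_hcp - p["k_off_dna"] * g_hcp   # 80.3
    d_t = -v_ind + p["beta_t"] * g_lcp - p["gamma"] * t              # 80.4, 80.6
    d_tc = v_ind - v_lcp - v_hcp
    d_signal = p["beta_s"] * g_hcp - p["gamma"] * signal             # 80.5, 80.7
    return (d_t, d_tc, v_lcp, v_hcp, d_signal)


def simulate(inducer, p, t_end=100.0, dt=1e-3, state=(1.0, 0.0, 0.0, 0.0, 0.0)):
    """Forward-Euler integration from an assumed initial state with T = 1."""
    for _ in range(int(t_end / dt)):
        deriv = shunt_rhs(state, inducer, p)
        state = tuple(s + dt * ds for s, ds in zip(state, deriv))
    return state


if __name__ == "__main__":
    for inducer in (0.1, 1.0, 10.0):
        final = simulate(inducer, PARAMS)
        print(f"In={inducer}: Signal={final[4]:.3f}")
```

Because the reactions as listed contain no basal production term, the simulation must start from a nonzero transcription-factor level; a stiff ODE solver would be a better choice than forward Euler for realistic rate constants.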
All fluorescence intensities presented in the data described herein were smoothed using Matlab.
Strains and Plasmids.
All plasmids in this work were constructed using basic molecular cloning techniques (Supplementary Information). E. coli 10β (araD139 Δ(ara-leu)7697 fhuA lacX74 galK (φ80 Δ(lacZ)M15) mcrA galU recA1 endA1 nupG rpsL (StrR) Δ(mrr-hsdRMS-mcrBC)) or E. coli EPI300 (F− mcrA Δ(mrr-hsdRMS-mcrBC) Φ80dlacZΔM15 ΔlacX74 recA1 endA1 araD139 Δ(ara, leu)7697 galU galK λ-rpsL (StrR) nupG trfA tonA), where noted, were used as bacterial hosts for the circuits in
Circuit Characterization.
Overnight cultures of E. coli strains were grown from glycerol freezer stocks at 37° C. and 300 rpm in 3 mL of Luria-Bertani (LB)-Miller medium (Fisher #BP1426-2) with appropriate antibiotics: carbenicillin (50 μg/ml), kanamycin (30 μg/ml), and chloramphenicol (25 μg/ml). The inducers used were arabinose, isopropyl-β-D-1-thiogalactopyranoside, and AHL 3OC6HSL (Sigma-Aldrich #K3007-10MG). Where appropriate, COPYCONTROL24 from Epicentre (Madison, Wis.) was added to overnight cultures at 1× active concentration. Overnight cultures were diluted 1:100 into 3 mL of fresh LB with antibiotics and were incubated at 37° C. and 300 rpm for 20 minutes. 200 μl of culture was then moved into 96-well plates, combined with inducers, and incubated for 4 hours and 20 minutes in a VWR microplate shaker at 37° C. and 700 rpm, reaching an OD600 of ~0.6-0.8.
Cells were then diluted 4-fold into a new 96-well plate containing fresh 1×PBS and immediately assayed using a BD LSRFORTESSA-HTS. At least 50,000 events were recorded for all data, which were then gated by forward scatter and side scatter using CYFLOGIC v.1.2.1 software (CyFlo, Turku, Finland). The geometric means of the gated fluorescence distributions were calculated using Matlab. Fluorescence values are based on geometric means of flow cytometry populations from three experiments, and the error bars represent standard errors of the mean.
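The summary statistics described above (a geometric mean per gated population, then a mean and standard error across three experiments) can be sketched as follows. This is an illustration in Python rather than the Matlab analysis actually used, and the event values in `replicates` are made-up placeholders, not measured data.

```python
import math
import statistics


def population_geometric_mean(events):
    """Geometric mean of positive per-event fluorescence values from one gated population."""
    return statistics.geometric_mean(events)


def mean_and_sem(values):
    """Arithmetic mean and standard error of the mean across replicate experiments."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


if __name__ == "__main__":
    # Placeholder event lists standing in for three gated flow cytometry runs.
    replicates = [
        [120.0, 340.0, 95.0, 410.0],
        [150.0, 300.0, 110.0, 380.0],
        [130.0, 360.0, 90.0, 450.0],
    ]
    per_experiment = [population_geometric_mean(r) for r in replicates]
    mean, sem = mean_and_sem(per_experiment)
    print(f"fluorescence = {mean:.1f} +/- {sem:.1f} (SEM, n={len(per_experiment)})")
```

The geometric mean is the natural summary for fluorescence distributions that are approximately log-normal, since it corresponds to the arithmetic mean on a logarithmic axis.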
All the plasmids in this work were constructed using basic molecular cloning techniques19. New England Biolabs (Beverly, Mass.) restriction endonucleases, T4 DNA Ligase, and Taq Polymerase were used. PCRs were carried out with a BIO-RAD S1000™ Thermal Cycler With Dual 48/48 Fast Reaction Modules. Synthetic oligonucleotides were synthesized by Integrated DNA Technologies (Coralville, Iowa). As described in the Methods Summary, plasmids were transformed into E. coli 10β (araD139 Δ(ara-leu)7697 fhuA lacX74 galK (φ80 Δ(lacZ)M15) mcrA galU recA1 endA1 nupG rpsL (StrR) Δ(mrr-hsdRMS-mcrBC)), E. coli EPI300 (F− mcrA Δ(mrr-hsdRMS-mcrBC) φ80dlacZΔM15 ΔlacX74 recA1 endA1 araD139 Δ(ara, leu)7697 galU galK rpsL (StrR) nupG trfA tonA), or E. coli MG1655 Pro, which contains integrated constitutive constructs for TetR and LacI proteins (
All devices (promoter-RBS-gene-terminator) were initially assembled in the Lutz and Bujard expression vector pZE11G15 containing ampicillin resistance and the ColE1 origin of replication. Parts are defined as promoters, RBSs, genes, and terminators. Manipulations of different parts of the same type were carried out using the same restriction sites. For example, to change a gene in a device we used KpnI and XmaI. To assemble two devices together, we used a single restriction site flanking one device and used oligonucleotide primers and PCR to add that restriction site to the 5′ and 3′ ends of a second device. After assembling devices in the ampicillin-resistant ColE1 backbone, antibiotic-resistance genes were changed using AatII and SacI, and origins of replication were changed with SacI and AvrII. For gene fusions, oligonucleotide primers were designed to delete the stop codon at the C-terminus of the first gene as well as the start codon at the N-terminus of the second gene and to insert a 12-bp (Gly-Gly-Ser-Gly) linker between the two genes. The genes were amplified separately with appropriate primers using standard PCR techniques, and the PCR products were assembled in a subsequent PCR reaction with the linker region serving as the means of annealing the two templates. The variable copy plasmid (VCP) containing Plux positive feedback was built by adding an AatII site to the 5′ end and a PacI site to the 3′ end of the Plux positive feedback device using PCR. This PCR product was cloned into the AatII and PacI sites of a pBAC/oriV vector17.
For each experiment, fluorescence readings were taken on a BioTek Synergy H1 Microplate reader using BioTek Gen5 software to determine the minimum and maximum expression level for cultures in each 96-well plate. GFP fluorescence was quantified by excitation at wavelength 484 nm and emission at wavelength 510 nm. mCherry fluorescence was quantified by excitation at wavelength 587 nm and emission at wavelength 610 nm. The gain of the plate reader was automatically sensed and adjusted by the machine.
Cultures containing the minimum and maximum fluorescence levels, as determined by the plate reader, were used to calibrate the FITC and PE-TexasRed filter voltages on a BD LSRFORTESSA-HTS in order to measure GFP and mCherry expression levels, respectively. The FACS voltages were adjusted using BD FACSDIVA software so that the maximum and minimum expression levels could be measured with the same voltage settings. Thus, consistent voltages were used across each entire experiment. The same voltages were used for subsequent repetitions of the same experiment. GFP was excited with a wavelength 488 nm laser and mCherry was excited with a wavelength 561 nm laser. Voltage compensation for FITC and PE-TexasRed was not necessary for any experiments.
a: Parameter was set according to the literature.
b: Kd/Kdf was set according to the literature.
c: For the wide-dynamic-range negative-slope circuit we obtained 1.65 for this parameter. In the negative-feedback circuit where mCherry is fused to the C-terminus of LacI we obtained 1.4.
This application claims benefit under 35 U.S.C. §119(e) of U.S. Provisional Patent Application Ser. No. 61/623,936 filed on Apr. 13, 2012, the contents of which are incorporated herein in their entirety by reference.
This invention was made with Government support under Grant No. CCF-1124247 awarded by the National Science Foundation, under Grant No. N00014-11-1-0725 awarded by the Office of Naval Research, and under Grant No. FA8721-05-C-0002 awarded by the U.S. Air Force. The Government has certain rights in this invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2013/036411 | 4/12/2013 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
61623936 | Apr 2012 | US |