LIGHT-ACTIVATED, CALCIUM-GATED POLYPEPTIDE AND METHODS OF USE THEREOF

INTRODUCTION

Calcium indicators that signal a change in intracellular calcium concentration are useful in a variety of applications. For example, neuronal activity is tightly coupled to rises in cytosolic calcium, both in distal dendrites and in the cell body, or soma, of neurons. Consequently, a very important class of tools for studying calcium signaling is real-time fluorescence calcium indicators, including the GCaMP series and small-molecule dyes such as Fura-2 and Fluo-4. However, these tools have two important limitations. First, the real-time imaging required for the use of calcium indicators is both technically demanding and restricted to small fields of view, should one desire single-cell resolution. Second, these indicators allow one to only passively observe calcium patterns, but not to respond to them—for example, to selectively manipulate or further characterize subsets of neurons based on their history of activity.

There is a need in the art for compositions and methods for detecting, and responding to, changes in intracellular calcium levels.

SUMMARY

The present disclosure provides a light-activated, calcium-gated polypeptide; and a system comprising: a) the light-activated, calcium-gated polypeptide; and b) a fusion protein comprising a calcium responsive polypeptide and a protease that cleaves a proteolytically cleavable linker present in the light-activated, calcium-gated polypeptide. The present disclosure provides nucleic acids encoding the light-activated, calcium-gated polypeptide or the system, and cells comprising the nucleic acids. The present disclosure provides methods of detecting a change in intracellular calcium ion concentration. The present disclosure provides methods of controlling or modulating an activity of a cell.

The present disclosure provides a light-activated, calcium-gated transcriptional control polypeptide; and a system comprising: a) the light-activated, calcium-gated transcriptional control polypeptide; and b) a fusion protein comprising a calcium responsive polypeptide and a protease that cleaves a proteolytically cleavable linker present in the light-activated, calcium-gated transcriptional control polypeptide. The present disclosure provides nucleic acids encoding the light-activated, calcium-gated transcriptional control polypeptide or the system, and cells comprising the nucleic acids. The present disclosure provides methods of detecting a change in intracellular calcium ion concentration. The present disclosure provides methods of controlling or modulating an activity of a cell.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A-1C depicts the FLARE design and optimization of calcium response.

FIG. 2 provides a table of published TEV protease catalytic constants.

FIG. 3A-3C depicts light gating upon LOV domain insertion.

FIG. 4A-4D depicts the directed evolution of the LOV domain.

FIG. 5 depicts FACS plots showing library progression during directed evolution of the LOV domain.

FIG. 6 depicts the sequencing analysis of clones derived from the directed evolution of the LOV domain.

FIG. 7A-7C depicts FACS plots showing the analysis of specific LOV mutants.

FIG. 8A-8B depicts immunofluorescence images showing the directed evolution of the LOV domain.

FIG. 9 depicts an immunofluorescence image showing light gating by eLOV in vivo.

FIG. 10A-10G depicts the FLARE design and optimization of calcium response in neurons.

FIG. 11A-11B depicts the screening of alternative TEV cleavage sites.

FIG. 12A-12B depicts the analysis of FLARE sensitivity in neurons.

FIG. 13A-13B depicts the functional reactivation of neurons marked by FLARE.

FIG. 14 depicts immunofluorescence images showing the results of a second FLARE design.

FIG. 15A-15G provide amino acid sequences of LOV domains of light-activated polypeptides.

FIG. 16A-16B provide amino acid sequences of calmodulin.

FIG. 17A-17D provide amino acid sequences of calmodulin-binding polypeptides.

FIG. 18 provides an amino acid sequence of troponin C.

FIG. 19A-19B provide amino acid sequences of troponin I polypeptides.

FIG. 20A-20D provide amino acid sequences of tobacco etch virus (TEV) protease.

FIG. 21 depicts the amino acid sequence of a Streptococcus pyogenes Cas9 polypeptide.

FIG. 22 depicts the amino acid sequence of a Staphylococcus aureus Cas9 polypeptide.

FIG. 23 provides amino acid sequences of various depolarizing opsins.

FIG. 24 provides amino acid sequences of various hyperpolarizing opsins.

FIG. 25A-25B provide an amino acid sequence of a FLARE component 1 of the present disclosure (e.g., a FLARE component comprising a calmodulin-binding polypeptide, a LOV domain polypeptide, a proteolytically cleavable linker, and a transcription factor) (FIG. 25A); and amino acid sequences of the FLARE component 1 (FIG. 25B).

FIG. 26A-26B provide an amino acid sequence of a FLARE component 2 of the present disclosure (e.g., a FLARE component comprising a calmodulin polypeptide and a TEV protease) (FIG. 26A); and amino acid sequences of the FLARE component 2 (FIG. 26B).

FIG. 27 provides a nucleotide sequence of a FLARE component 3 of the present disclosure (e.g., a FLARE component comprising a promoter operably linked to a nucleotide sequence encoding a fluorescent protein).

FIG. 28A-28E depict activity of FLARE in vivo.

DEFINITIONS

The terms “polynucleotide” and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.

“Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For instance, a promoter is operably linked to a coding region of a nucleic acid if the promoter affects transcription or expression of the coding region of a nucleic acid.

A “vector” or “expression vector” is a replicon, such as a plasmid, phage, virus, or cosmid, to which another DNA segment, i.e., an “insert”, may be attached so as to bring about the replication of the attached segment in a cell.

“Heterologous,” as used herein, refers to a nucleotide or polypeptide sequence that is not found in the native (e.g., naturally-occurring) nucleic acid or protein, respectively.

As used herein, the term “affinity” refers to the equilibrium constant for the reversible binding of two agents (e.g., a protease and a polypeptide comprising a protease cleavage site) and is expressed as Km. Km is the concentration of peptide at which the catalytic rate of proteolytic cleavage is half of Vmax (maximal catalytic rate). Km is often used in the literature as an approximation of affinity when speaking about enzyme-substrate interactions.
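By way of non-limiting illustration, the relationship between Km and Vmax stated above can be sketched numerically. The sketch assumes the standard Michaelis-Menten rate relation, which this section does not itself recite, and uses made-up constants (`V_MAX`, `K_M`) rather than any value from FIG. 2; the function name is hypothetical.

```python
def michaelis_menten_rate(substrate_conc, v_max, k_m):
    """Catalytic rate under the standard Michaelis-Menten relation.

    rate = v_max * [S] / (k_m + [S])
    Any units may be used, provided they are used consistently.
    """
    if substrate_conc < 0 or v_max < 0 or k_m <= 0:
        raise ValueError("substrate_conc and v_max must be >= 0 and k_m must be > 0")
    return v_max * substrate_conc / (k_m + substrate_conc)


# Illustrative (made-up) constants; these are NOT values from FIG. 2.
V_MAX = 1.0   # maximal catalytic rate, arbitrary units
K_M = 50.0    # substrate concentration giving half-maximal rate, arbitrary units

# When the peptide concentration equals Km, the rate is half of Vmax.
half_max_rate = michaelis_menten_rate(K_M, V_MAX, K_M)
```

Under this relation, a lower Km means that the half-maximal rate is reached at a lower peptide concentration, which is why Km is commonly read as an approximation of affinity.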

The term “binding” refers to a direct association between two molecules, due to, for example, covalent, electrostatic, hydrophobic, and ionic and/or hydrogen-bond interactions, including interactions such as salt bridges and water bridges. “Specific binding” refers to binding with an affinity of at least about 10⁻⁷M or greater, e.g., 5×10⁻⁷M, 10⁻⁸M, 5×10⁻⁸M, and greater. “Non-specific binding” refers to binding with an affinity of less than about 10⁻⁷M, e.g., binding with an affinity of 10⁻⁶M, 10⁻⁵M, 10⁻⁴M, etc.

The terms “polypeptide,” “peptide,” and “protein”, used interchangeably herein, refer to a polymeric form of amino acids of any length, which can include genetically coded and non-genetically coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. The term includes fusion proteins, including, but not limited to, fusion proteins with a heterologous amino acid sequence, fusions with heterologous and homologous leader sequences, with or without N-terminal methionine residues; immunologically tagged proteins; and the like.

An “isolated” polypeptide is one that has been identified and separated and/or recovered from a component of its natural environment. Contaminant components of its natural environment are materials that would interfere with diagnostic or therapeutic uses for the polypeptide, and may include enzymes, hormones, and other proteinaceous or nonproteinaceous solutes. In some embodiments, the polypeptide will be purified (1) to greater than 90%, greater than 95%, or greater than 98%, by weight of polypeptide as determined by the Lowry method, for example, more than 99% by weight, (2) to a degree sufficient to obtain at least 15 residues of N-terminal or internal amino acid sequence by use of a spinning cup sequenator, or (3) to homogeneity by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) under reducing or nonreducing conditions using Coomassie blue or silver stain. Isolated polypeptide includes the polypeptide in situ within recombinant cells since at least one component of the polypeptide's natural environment will not be present. In some instances, isolated polypeptide will be prepared by at least one purification step.

The term “genetic modification” refers to a permanent or transient genetic change induced in a cell following introduction into the cell of a heterologous nucleic acid (e.g., a nucleic acid exogenous to the cell). Genetic change (“modification”) can be accomplished by incorporation of the heterologous nucleic acid into the genome of the host cell, or by transient or stable maintenance of the heterologous nucleic acid as an extrachromosomal element. Where the cell is a eukaryotic cell, a permanent genetic change can be achieved by introduction of the nucleic acid into the genome of the cell. Suitable methods of genetic modification include viral infection, transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, use of a CRISPR/Cas9 system, and the like.

A “host cell,” as used herein, denotes an in vivo or in vitro eukaryotic cell, or a cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which eukaryotic cells can be, or have been, used as recipients for a nucleic acid (e.g., an expression vector that comprises a nucleotide sequence encoding an eLOV polypeptide; or any other nucleic acid or expression vector described herein), and include the progeny of the original cell which has been genetically modified by the nucleic acid. It is understood that the progeny of a single cell may not necessarily be completely identical in morphology or in genomic or total DNA complement as the original parent, due to natural, accidental, or deliberate mutation. A “recombinant host cell” (also referred to as a “genetically modified host cell”) is a host cell into which has been introduced a heterologous nucleic acid, e.g., an expression vector. For example, a genetically modified eukaryotic host cell is genetically modified by virtue of introduction into a suitable eukaryotic host cell of a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to the eukaryotic host cell, or a recombinant nucleic acid that is not normally found in the eukaryotic host cell, where such nucleic acids and expression vectors are described herein.

Before the present invention is further described, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.

It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a transcription factor” includes a plurality of such transcription factors and reference to “the proteolytically cleavable linker” includes reference to one or more proteolytically cleavable linkers and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.

The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.

DETAILED DESCRIPTION

A system of the present disclosure is a calcium- and light-gated system. Thus, a system of the present disclosure provides an “AND” gate that can be used to detect a change in intracellular calcium ion concentration, e.g., in response of a cell to any of a variety of stimuli. A system of the present disclosure provides a high signal-to-noise (S/N) ratio. A system of the present disclosure can be used to control an activity of a cell. For example, once a change in intracellular calcium ion concentration in the cell is detected, one or more activities of the cell can be modulated in response. An activity of the cell can be activated; or an activity of the cell can be inhibited. Thus, a system of the present disclosure provides a means not only to detect a change in intracellular calcium ion concentration, but to react to the change by modulating an activity of the cell. Furthermore, a change in intracellular calcium ion concentration can be detected in a temporal manner using a system of the present disclosure; i.e., the change can be detected over time. In addition to, or as an alternative to, modulating (e.g., controlling) an activity of a cell in response to an increase in intracellular calcium ion concentration, the cell can be further characterized; for example, a cell can be further characterized by any of a variety of techniques, including, e.g., proteomic analysis, transcriptomic analysis, imaging with a real-time calcium indicator, imaging with a synaptic marker, etc.

FIG. 1A presents a schematic representation of certain embodiments of a system of the present disclosure. Some embodiments of a system of the present disclosure, e.g., embodiments comprising a transcription factor, are also referred to as “FLARE” for Fast Light and Activity Reporter giving Expression. As depicted schematically in FIG. 1A, a FLARE system of the present disclosure comprises two polypeptides: 1) a first polypeptide comprising: a) a transmembrane domain; b) a polypeptide that binds a calcium-responsive polypeptide; c) a LOV light-activated polypeptide; d) a proteolytically cleavable linker that is caged by the LOV light-activated polypeptide, and that becomes uncaged upon exposure of the LOV light-activated polypeptide to light of an activating wavelength (e.g., blue light); and e) a transcription factor; and 2) a second polypeptide comprising: a) a calcium-responsive polypeptide; and b) a protease that cleaves the proteolytically cleavable linker.

As depicted in the left panel of FIG. 1A, in the absence of light of an activating wavelength, and under conditions of low intracellular Ca²⁺ concentration, the first polypeptide and the second polypeptide do not substantially bind to one another, as the polypeptide that binds the calcium-responsive polypeptide present in the first polypeptide and the calcium-responsive polypeptide present in the second polypeptide do not substantially bind to one another under conditions of low intracellular calcium concentration. Furthermore, even if the first polypeptide and the second polypeptide were to bind to one another, since the LOV light-activated polypeptide cages the proteolytically cleavable linker in the absence of light of an activating wavelength, the proteolytically cleavable linker is not accessible to the protease. Thus, two signals are required: 1) a high intracellular calcium concentration, for binding of the calcium-responsive polypeptide to the polypeptide that binds the calcium-responsive polypeptide; and 2) light of an activating wavelength, for cleavage of the proteolytically cleavable linker by the protease.

As shown in the right panel of FIG. 1A, in the presence of a high intracellular Ca²⁺ concentration in the cell, and upon exposure of the cell to light of an activating wavelength, the first polypeptide and the second polypeptide bind to one another. The high intracellular Ca²⁺ concentration in the cell triggers binding of the calcium-responsive polypeptide present in the second polypeptide to the polypeptide that binds the calcium-responsive polypeptide present in the first polypeptide. Exposure of the cell to light of an activating wavelength induces a conformational change in the LOV light-activated polypeptide, exposing the proteolytically cleavable linker in the first polypeptide to the protease present in the second polypeptide. Cleavage of the proteolytically cleavable linker releases the transcription factor, which can enter the nucleus and modulate transcription of a coding region operably linked to a promoter that is recognized by the transcription factor. The coding region can encode any of a variety of gene products, including, e.g., an inhibitory RNA; a guide RNA; a reporter gene product; an opsin; a toxin; a DREADD; an RNA-guided endonuclease; a kinase; a biotin ligase; a transcription factor; a recombinase; an antibiotic resistance factor; a calcium sensor; a peroxidase; a fluorescent protein; a synaptic marker; etc.
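By way of non-limiting illustration, the two-signal requirement described for FIG. 1A can be sketched as an idealized Boolean model. The sketch is a simplification: it ignores kinetics, expression levels, and any background cleavage, and the function and variable names are hypothetical.

```python
from itertools import product


def transcription_factor_released(high_calcium, activating_light):
    """Idealized AND gate for the two-polypeptide system of FIG. 1A.

    high_calcium: True if intracellular calcium is high.
    activating_light: True if the cell is exposed to light of an
        activating wavelength (e.g., blue light).
    """
    # High calcium brings the protease (second polypeptide) to the
    # first polypeptide via the calcium-responsive binding pair.
    protease_recruited = high_calcium
    # Light uncages the proteolytically cleavable linker from the LOV domain.
    linker_uncaged = activating_light
    # Cleavage, and hence release of the transcription factor, needs both.
    return protease_recruited and linker_uncaged


# Enumerate all four combinations of the two input signals.
truth_table = {
    (calcium, light): transcription_factor_released(calcium, light)
    for calcium, light in product([False, True], repeat=2)
}
```

In this model, only the combination of high calcium and activating light yields release of the transcription factor; the other three input combinations correspond to the background conditions against which signal is compared.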

A FLARE system of the present disclosure, when present in a cell, provides a signal-to-noise ratio of at least 3:1, at least 4:1, at least 5:1, at least 6:1, at least 7:1, at least 8:1, at least 9:1, at least 10:1, from 10:1 to 15:1, from 15:1 to 20:1, or more than 20:1 (e.g., from 20:1 to 50:1, from 50:1 to 100:1, from 100:1 to 150:1, or more than 150:1); i.e., the signal produced when the cell is exposed to light of an activating wavelength (e.g., blue light) and to a second signal that increases the intracellular calcium concentration in the cell above about 100 nM is at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 6-fold, at least 7-fold, at least 8-fold, at least 9-fold, at least 10-fold, at least 15-fold, at least 20-fold, or more than 20-fold (e.g., more than 25-fold, more than 50-fold, more than 75-fold, more than 100-fold, more than 125-fold, or more than 150-fold), higher than the signal produced by the cell when the cell is: i) not exposed to either light of an activating wavelength or to a second signal that increases the intracellular calcium concentration in the cell above about 100 nM; ii) exposed to light of an activating wavelength, but not to a second signal that increases the intracellular calcium concentration in the cell above about 100 nM; or iii) exposed to a second signal that increases the intracellular calcium concentration in the cell above about 100 nM, but not to light of an activating wavelength.
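By way of non-limiting illustration, a signal-to-noise ratio of the kind recited above can be computed from reporter readouts as sketched below. The sketch assumes one conservative reading, in which the signal under light plus calcium is compared against the highest of the three background conditions; the readout values, the dictionary keys, and the function name are hypothetical and are not data from the present disclosure.

```python
def signal_to_noise(signal_light_and_calcium, background_signals):
    """Ratio of the two-signal readout to the highest background readout.

    background_signals: mapping of condition name -> readout for the
        conditions lacking light, calcium, or both.
    """
    if not background_signals:
        raise ValueError("at least one background condition is required")
    highest_background = max(background_signals.values())
    if highest_background <= 0:
        raise ValueError("background readouts must be positive")
    return signal_light_and_calcium / highest_background


# Hypothetical reporter readouts in arbitrary units (illustration only).
background = {
    "no_light_no_calcium": 1.0,
    "light_only": 2.0,
    "calcium_only": 4.0,
}
ratio = signal_to_noise(40.0, background)  # 40.0 / 4.0
```

Comparing against the highest background readout means that a reported ratio holds for each of conditions i) through iii) individually.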

A FLARE system of the present disclosure, its components, and methods of use are described in detail herein.

Light- and Calcium-Gated Systems

System 1.

The present disclosure provides a nucleic acid system comprising: A) a first nucleic acid comprising, in order from 5′ to 3′: a) a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide comprising, in order from amino terminus to carboxyl terminus: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV-domain light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15A-15D; and iv) a proteolytically cleavable linker; and b) an insertion site for a nucleic acid comprising a nucleotide sequence encoding a polypeptide of interest; and B) a second nucleic acid comprising a nucleotide sequence encoding a second fusion polypeptide comprising: i) a calcium-binding polypeptide selected from a calmodulin polypeptide and troponin C polypeptide; and ii) a protease that cleaves the proteolytically cleavable linker. This nucleic acid system allows the user to insert into the insertion site a nucleic acid comprising a nucleotide sequence encoding a polypeptide of interest.

The present disclosure provides a nucleic acid system comprising: A) a first nucleic acid comprising, in order from 5′ to 3′: a) a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide comprising, in order from amino terminus to carboxyl terminus: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV-domain light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15E-15G; and iv) a proteolytically cleavable linker; and b) an insertion site for a nucleic acid comprising a nucleotide sequence encoding a polypeptide of interest; and B) a second nucleic acid comprising a nucleotide sequence encoding a second fusion polypeptide comprising: i) a calcium-binding polypeptide selected from a calmodulin polypeptide and troponin C polypeptide; and ii) a protease that cleaves the proteolytically cleavable linker. This nucleic acid system allows the user to insert into the insertion site a nucleic acid comprising a nucleotide sequence encoding a polypeptide of interest.
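By way of non-limiting illustration, the "at least 80% amino acid sequence identity" criterion recited above can be checked as sketched below. The sketch assumes that the two sequences have already been aligned (equal length, with "-" marking gaps) and counts identity as matching residues divided by aligned columns; other conventions exist, and this section does not specify one. The example fragments are hypothetical and are not sequences from FIG. 15.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences.

    Gaps are written as '-'. Columns where both sequences have a gap
    are ignored; a residue aligned to a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical 10-residue aligned fragments (illustration only).
reference = "MKTAYIAKQR"
candidate = "MKTAYLAKQK"

identity = percent_identity(reference, candidate)  # 8 of 10 positions match
meets_threshold = identity >= 80.0
```

In practice the alignment itself would typically be produced by a dedicated alignment tool; the sketch covers only the identity count that follows.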

In some cases, the insertion site is a multiple cloning site. For example, the insertion site can comprise multiple (e.g., 2, 3, 4, or more) restriction endonuclease cleavage sites. The insertion site can comprise a restriction endonuclease cleavage site; in such a case, a nucleic acid comprising a nucleotide sequence encoding a polypeptide of interest can comprise, at its 5′ and 3′ ends, nucleotide sequences (e.g., complementary overhangs) that anneal with the ends created by restriction endonuclease cleavage.
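By way of non-limiting illustration, the restriction endonuclease cleavage sites in a candidate multiple cloning site can be located as sketched below. The recognition sequences shown for EcoRI, BamHI, and HindIII are the well-known ones; the example insertion-site sequence and the function name are hypothetical and are not taken from the present disclosure.

```python
# Recognition sequences for three widely used restriction endonucleases.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_restriction_sites(sequence, sites=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} for sites found.

    Only the given strand is scanned; the three recognition sequences
    above are palindromic, so the other strand gives the same positions.
    """
    sequence = sequence.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = sequence.find(motif)
        while start != -1:
            positions.append(start)
            start = sequence.find(motif, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


# Hypothetical insertion site containing three cleavage sites.
mcs = "GAATTCGGATCCAAGCTT"
sites_found = find_restriction_sites(mcs)
```

A sequence returning two or more enzymes from such a scan would be consistent with an insertion site comprising multiple restriction endonuclease cleavage sites.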

The insertion site is within 10 nucleotides (nt), within 9 nt, within 8 nt, within 7 nt, within 6 nt, within 5 nt, within 4 nt, within 3 nt, within 2 nt, or 1 nt, of the 3′ end of the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide. The insertion site is positioned relative to the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide such that, after insertion of a nucleic acid comprising a nucleotide sequence encoding a gene product of interest, and after transcription and translation, a fusion polypeptide comprising: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV-domain light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15A-15D; iv) a proteolytically cleavable linker; and v) the gene product of interest, is produced.

The insertion site is within 10 nucleotides (nt), within 9 nt, within 8 nt, within 7 nt, within 6 nt, within 5 nt, within 4 nt, within 3 nt, within 2 nt, or 1 nt, of the 3′ end of the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide. The insertion site is positioned relative to the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide such that, after insertion of a nucleic acid comprising a nucleotide sequence encoding a gene product of interest, and after transcription and translation, a fusion polypeptide comprising: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV-domain light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15E-15G; iv) a proteolytically cleavable linker; and v) the gene product of interest, is produced.
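By way of non-limiting illustration, the positioning of an insertion site relative to the 3′ end of the coding sequence can be checked as sketched below. The sketch makes two assumptions that are not stated above: that distances from 0 to 10 nt qualify, and that production of a single fusion polypeptide requires the intervening nucleotides to preserve the reading frame (a multiple of 3). It does not check for stop codons in the intervening nucleotides. All names and coordinates are hypothetical.

```python
def check_insertion_site(coding_end, insertion_pos, max_distance=10):
    """Evaluate an insertion site downstream of a coding sequence.

    coding_end: 0-based index of the first nucleotide after the 3' end
        of the coding sequence (assumed to fall on a codon boundary).
    insertion_pos: 0-based index at which the insert would begin.
    Returns a dict with the distance and two Boolean checks.
    """
    distance = insertion_pos - coding_end
    return {
        "distance_nt": distance,
        # Assumption: 0..max_distance nucleotides downstream qualifies.
        "within_range": 0 <= distance <= max_distance,
        # Assumption: the fusion stays in frame only if the spacer
        # length is a multiple of 3.
        "in_frame": distance % 3 == 0,
    }


# Hypothetical coordinates (illustration only).
in_frame_site = check_insertion_site(coding_end=300, insertion_pos=306)
out_of_frame_site = check_insertion_site(coding_end=300, insertion_pos=307)
too_far_site = check_insertion_site(coding_end=300, insertion_pos=312)
```

Reporting the two checks separately lets a user see whether a candidate site fails on distance, on reading frame, or on both.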

System 2.

The present disclosure provides a nucleic acid system comprising: a) a first nucleic acid comprising a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide comprising, in order from amino terminus to carboxyl terminus: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15A-15D; iv) a proteolytically cleavable linker; and v) a gene product of interest; and b) a second nucleic acid comprising a nucleotide sequence encoding a second fusion polypeptide comprising: i) a calcium-binding polypeptide selected from a calmodulin polypeptide and troponin C polypeptide; and ii) a protease that cleaves the proteolytically cleavable linker. Thus, in some cases, the present disclosure provides a nucleic acid system in which the first nucleic acid comprises a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide that comprises a gene product of interest.

The present disclosure provides a nucleic acid system comprising: a) a first nucleic acid comprising a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide comprising, in order from amino terminus to carboxyl terminus: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15E-15G; iv) a proteolytically cleavable linker; and v) a gene product of interest; and b) a second nucleic acid comprising a nucleotide sequence encoding a second fusion polypeptide comprising: i) a calcium-binding polypeptide selected from a calmodulin polypeptide and troponin C polypeptide; and ii) a protease that cleaves the proteolytically cleavable linker. Thus, in some cases, the present disclosure provides a nucleic acid system in which the first nucleic acid comprises a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide that comprises a gene product of interest.

A transmembrane domain, a calmodulin polypeptide, a calmodulin-binding polypeptide, a troponin C polypeptide, a troponin I polypeptide, a LOV-domain light-activated polypeptide, a proteolytically cleavable linker, and a protease, that can be encoded by a nucleotide sequence included in one or more embodiments of System 1 or System 2 are described below.

Polypeptides

The present disclosure provides a light-activated, calcium-gated polypeptide comprising, in order from amino terminus to carboxyl terminus: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV-domain light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15A-15D; iv) a proteolytically cleavable linker; and v) a polypeptide of interest. The present disclosure provides a light-activated, calcium-gated polypeptide comprising, in order from amino terminus to carboxyl terminus: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV-domain light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15E-15G; iv) a proteolytically cleavable linker; and v) a polypeptide of interest.

Suitable transmembrane domains, calmodulin-binding polypeptides, troponin I polypeptides, LOV-domain light-activated polypeptides, proteolytically cleavable linkers, and polypeptides of interest are described below.

In some cases, a light-activated, calcium-gated polypeptide of the present disclosure is isolated. In some cases, a light-activated, calcium-gated polypeptide of the present disclosure is present in a cell in vitro. In some cases, a light-activated, calcium-gated polypeptide of the present disclosure is present in a cell in vivo. Suitable cells are described below.

System Components

The present disclosure provides components of a system of the present disclosure, e.g., components of System 1 and System 2.

For example, the present disclosure provides a nucleic acid comprising: a) a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide comprising, in order from amino terminus to carboxyl terminus: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV-domain light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15A-15D; and iv) a proteolytically cleavable linker; and b) an insertion site for a nucleic acid comprising a nucleotide sequence encoding a polypeptide of interest. In some cases, the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide is operably linked to a promoter. Suitable promoters are described below. In some cases, the nucleic acid is present in a recombinant expression vector, e.g., a recombinant viral vector. Suitable vectors are described below. The present disclosure provides a genetically modified host cell that is genetically modified with the nucleic acid. The present disclosure provides a genetically modified host cell that is genetically modified with the recombinant expression vector. Suitable host cells are described below.

As another example, the present disclosure provides a nucleic acid comprising: a) a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide comprising, in order from amino terminus to carboxyl terminus: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV-domain light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15E-15G; and iv) a proteolytically cleavable linker; and b) an insertion site for a nucleic acid comprising a nucleotide sequence encoding a polypeptide of interest. In some cases, the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide is operably linked to a promoter. Suitable promoters are described below. In some cases, the nucleic acid is present in a recombinant expression vector, e.g., a recombinant viral vector. Suitable vectors are described below. The present disclosure provides a genetically modified host cell that is genetically modified with the nucleic acid. The present disclosure provides a genetically modified host cell that is genetically modified with the recombinant expression vector. Suitable host cells are described below.

As another example, the present disclosure provides a nucleic acid comprising a nucleotide sequence encoding a fusion polypeptide comprising: i) a calcium-binding polypeptide selected from a calmodulin polypeptide and troponin C polypeptide; and ii) a protease. In some cases, the nucleotide sequence encoding the fusion polypeptide is operably linked to a promoter. Suitable promoters are described below. In some cases, the nucleic acid is present in a recombinant expression vector, e.g., a recombinant viral vector. Suitable vectors are described below. The present disclosure provides a genetically modified host cell that is genetically modified with the nucleic acid. The present disclosure provides a genetically modified host cell that is genetically modified with the recombinant expression vector. Suitable host cells are described below.

As another example, the present disclosure provides a nucleic acid comprising a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide comprising, in order from amino terminus to carboxyl terminus: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV-domain light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15A-15D; iv) a proteolytically cleavable linker; and v) a polypeptide of interest. In some cases, the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide is operably linked to a promoter. Suitable promoters are described below. In some cases, the nucleic acid is present in a recombinant expression vector, e.g., a recombinant viral vector. Suitable vectors are described below. The present disclosure provides a genetically modified host cell that is genetically modified with the nucleic acid. The present disclosure provides a genetically modified host cell that is genetically modified with the recombinant expression vector. Suitable host cells are described below.

As another example, the present disclosure provides a nucleic acid comprising a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide comprising, in order from amino terminus to carboxyl terminus: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV-domain light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15E-15G; iv) a proteolytically cleavable linker; and v) a polypeptide of interest. In some cases, the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide is operably linked to a promoter. Suitable promoters are described below. In some cases, the nucleic acid is present in a recombinant expression vector, e.g., a recombinant viral vector. Suitable vectors are described below. The present disclosure provides a genetically modified host cell that is genetically modified with the nucleic acid. The present disclosure provides a genetically modified host cell that is genetically modified with the recombinant expression vector. Suitable host cells are described below.

Transmembrane Domain

Any of a variety of transmembrane domains (polypeptides) can be used in a light-activated, calcium-gated transcriptional control polypeptide of the present disclosure. A suitable transmembrane domain is any polypeptide that is thermodynamically stable in a membrane, e.g., a eukaryotic cell membrane such as a mammalian cell membrane. Suitable transmembrane domains include a single alpha helix, a transmembrane beta barrel, or any other structure.

A “eukaryotic cell membrane” includes the membrane of a membrane-bound organelle (e.g., the nucleus, a mitochondrion, a lysosome, the endoplasmic reticulum, the Golgi apparatus, a vacuole, or a chloroplast) and the plasma membrane. Thus, a suitable transmembrane domain is in some cases a transmembrane domain that provides for insertion into the plasma membrane. In some cases, a suitable transmembrane domain provides for insertion into a chloroplast membrane. In some cases, a suitable transmembrane domain provides for insertion into a mitochondrial membrane. In some cases, a suitable transmembrane domain provides for insertion into a lysosomal membrane.

A suitable transmembrane domain can have a length of from about 10 to 50 amino acids, e.g., from about 10 amino acids to about 40 amino acids, from about 20 amino acids to about 40 amino acids, from about 15 amino acids to about 25 amino acids, e.g., from about 10 amino acids to about 15 amino acids, from about 15 amino acids to about 20 amino acids, from about 20 amino acids to about 25 amino acids, from about 25 amino acids to about 30 amino acids, from about 30 amino acids to about 35 amino acids, from about 35 amino acids to about 40 amino acids, from about 40 amino acids to about 45 amino acids, or from about 45 amino acids to about 50 amino acids.

Suitable transmembrane (TM) domains include, e.g., a Syne homology nuclear TM domain; a CD4 TM domain; a CD8 TM domain; a KASH protein TM domain; a neurexin3b TM domain; a Notch receptor polypeptide TM domain; etc.

For example, a CD4 TM domain can comprise the amino acid sequence MALIVLGGVAGLLLFIGLGIFF (SEQ ID NO://); a CD8 TM domain can comprise the amino acid sequence IYIWAPLAGTCGVLLLSLVIT (SEQ ID NO://); a neurexin3b TM domain can comprise the amino acid sequence GMVVGIVAAAALCILILLYAM (SEQ ID NO://); a Notch receptor polypeptide TM domain can comprise the amino acid sequence FMYVAAAAFVLLFFVGCGVLL (SEQ ID NO://).

Alternative Tethers

In some cases, in place of a transmembrane domain, the light-activated, calcium-gated fusion polypeptide comprises a polypeptide that tethers the light-activated, calcium-gated fusion polypeptide to actin. A suitable actin-binding polypeptide includes, e.g., filamin, spectrin, transgelin, fimbrin, villin, fascin, formin, tensin, tropomodulin, gelsolin, and actin-binding fragments thereof.

In some cases, in place of a transmembrane domain, the light-activated, calcium-gated fusion polypeptide comprises a polypeptide that excludes the light-activated, calcium-gated fusion polypeptide from the nucleus. Such a polypeptide can be a nuclear exclusion signal (NES) or nuclear export signal. Suitable NES polypeptides include, e.g., MVKELQEIRL (SEQ ID NO://); MTASALARMEV (SEQ ID NO://); LALKLAGLDI (SEQ ID NO://); LQKKLEELEL (SEQ ID NO://); LESNLRELQI (SEQ ID NO://); LCQAFSDVLI (SEQ ID NO://); MVKELQEIRLEP (SEQ ID NO://); LQKKLEELELA (SEQ ID NO://); LALKLAGLDIN (SEQ ID NO://); LQLPPLERLTLD (SEQ ID NO://); LQKKLEELELE (SEQ ID NO://); MTKKFGTLTI (SEQ ID NO://); LAEMLEDLHI (SEQ ID NO://); LDQQFAGLDL (SEQ ID NO://); LCQAFSDVIL (SEQ ID NO://); LPVLENLTL (SEQ ID NO://); and IQQQLGQLTLENLQML (SEQ ID NO://).

Another suitable protein is an estrogen receptor protein. For example, an estrogen receptor protein can comprise an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: PSAGDMRAANLWPSPLMIKRSKKNSLALSLTADQMVSALLDAEPPILYSEYDPTRPFSE ASMMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQVHLLECAWLEILMIGLVWRSM EHPVKLLFAPNLLLDRNQGKCVEGMVEIFDMLLATSSRFRMMNLQGEEFVCLKSIILLN SGVYTFLSSTLKSLEEKDHIHRVLDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRH MSNKGMEHLYSMKCKNVVPLYDLLLEAADAHRLHAPTSRGGASVEETDQSHLATAGS TSSHSLQKYYITGEAEGFPATA; where the amino acid sequence is a MyoD-ERT2 fusion polypeptide, comprising the ligand-binding domain of estrogen receptor (amino acids 203-440), a basic domain in helix-loop-helix proteins of the MYOD family (amino acids 1-114).

Calmodulin/Calmodulin-Binding Polypeptide

In some cases, the light-activated, calcium-gated fusion polypeptide comprises a calmodulin-binding polypeptide; and the second fusion polypeptide comprises a calmodulin polypeptide.

A suitable calmodulin-binding polypeptide binds a calmodulin polypeptide under conditions of high Ca²⁺ concentration. For example, a suitable calmodulin-binding polypeptide binds a calmodulin polypeptide when the concentration of Ca²⁺ is greater than 100 nM, greater than 150 nM, greater than 200 nM, greater than 250 nM, greater than 300 nM, greater than 350 nM, greater than 400 nM, greater than 500 nM, or greater than 750 nM.

A suitable calmodulin-binding polypeptide does not substantially bind a calmodulin polypeptide under conditions of low Ca²⁺ concentration. For example, a suitable calmodulin-binding polypeptide does not substantially bind a calmodulin polypeptide when the intracellular Ca²⁺ concentration is less than about 300 nM, less than about 250 nM, less than about 200 nM, less than about 110 nM, less than about 105 nM, or less than about 100 nM.

A calmodulin-binding polypeptide can have a length of from about 10 amino acids to about 50 amino acids, e.g., from about 10 amino acids to about 40 amino acids, from about 20 amino acids to about 40 amino acids, from about 15 amino acids to about 25 amino acids, e.g., from about 10 amino acids to about 15 amino acids, from about 15 amino acids to about 20 amino acids, from about 20 amino acids to about 25 amino acids, from about 25 amino acids to about 30 amino acids, from about 30 amino acids to about 35 amino acids, from about 35 amino acids to about 40 amino acids, from about 40 amino acids to about 45 amino acids, or from about 45 amino acids to about 50 amino acids.

A suitable calmodulin-binding polypeptide in some cases comprises an amino acid sequence having at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: KRRWKKNFIAVSAANRFKKISSSGAL (SEQ ID NO://); and has a length of from about 26 amino acids to about 30 amino acids.
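As a rough illustration of what a percent identity threshold means for a sequence of this length, the following Python sketch counts matching positions between two equal-length sequences. It is a simplified, ungapped comparison and is an assumption made for illustration only; sequence identity is ordinarily determined with an alignment program, and the helper names (`percent_identity`, `with_alanine_at`) and the alanine-substituted variants are hypothetical.

```python
# 26-residue calmodulin-binding sequence recited in the paragraph above.
REFERENCE = "KRRWKKNFIAVSAANRFKKISSSGAL"


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped, position-by-position identity for equal-length sequences."""
    if len(candidate) != len(reference):
        raise ValueError("this simplified sketch only compares equal-length sequences")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def with_alanine_at(seq: str, positions: list[int]) -> str:
    """Hypothetical helper: place Ala at the given 1-based positions."""
    residues = list(seq)
    for pos in positions:
        residues[pos - 1] = "A"
    return "".join(residues)


# Hypothetical variants, used only to show where the 90% threshold falls.
two_changes = with_alanine_at(REFERENCE, [1, 4])       # K1A, W4A
three_changes = with_alanine_at(REFERENCE, [1, 4, 5])  # K1A, W4A, K5A

for name, variant in [("two changes", two_changes), ("three changes", three_changes)]:
    pid = percent_identity(variant, REFERENCE)
    print(f"{name}: {pid:.1f}% identity, at least 90%: {pid >= 90.0}")
```

Under this simplified measure, a 26-residue variant with two substitutions retains 24 of 26 positions and meets the 90% threshold, whereas a variant with three substitutions does not.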

In some cases, a suitable calmodulin-binding polypeptide comprises an amino acid sequence having at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: FNARRKLKGAILTTMLFTRNFS (SEQ ID NO://); and has a length of from 22 amino acids to about 25 amino acids. In some cases, a suitable calmodulin-binding polypeptide comprises an amino acid sequence having at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: FNARRKLKGAILTTMLFTRNFS (SEQ ID NO://); and has a K8 amino acid substitution; and has a length of from 22 amino acids to about 25 amino acids. In some cases, a suitable calmodulin-binding polypeptide comprises an amino acid sequence having at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: FNARRKLKGAILTTMLFTRNFS (SEQ ID NO://); and has a K8A amino acid substitution; and has a length of from 22 amino acids to about 25 amino acids. In some cases, a suitable calmodulin-binding polypeptide comprises an amino acid sequence having at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: FNARRKLKGAILTTMLFTRNFS (SEQ ID NO://); and has a T13 substitution; and has a length of from 22 amino acids to about 25 amino acids. In some cases, a suitable calmodulin-binding polypeptide comprises an amino acid sequence having at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: FNARRKLKGAILTTMLFTRNFS (SEQ ID NO://); and has a T13F substitution; and has a length of from 22 amino acids to about 25 amino acids. In some cases, a suitable calmodulin-binding polypeptide comprises the following amino acid sequence: FNARRKLKGAILFTMLFTRNFS; and has a length of 22 amino acids. 
In some cases, a suitable calmodulin-binding polypeptide comprises the following amino acid sequence: FNARRKLAGAILFTMLFTRNFS; and has a length of 22 amino acids.
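The substitution notation used above (e.g., K8A, T13F) can be checked mechanically. The following Python sketch, offered only as an illustration, applies a substitution written as original residue, 1-based position, and replacement residue to the parent sequence FNARRKLKGAILTTMLFTRNFS; the function name `apply_substitution` is hypothetical.

```python
import re

# Parent calmodulin-binding sequence recited above.
PARENT = "FNARRKLKGAILTTMLFTRNFS"


def apply_substitution(seq: str, code: str) -> str:
    """Apply a substitution such as 'K8A' (original residue, 1-based position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"unrecognized substitution: {code!r}")
    original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(seq) or seq[position - 1] != original:
        raise ValueError(f"{code}: residue {position} of the sequence is not {original}")
    return seq[: position - 1] + replacement + seq[position:]


k8a = apply_substitution(PARENT, "K8A")
t13f = apply_substitution(PARENT, "T13F")
k8a_t13f = apply_substitution(k8a, "T13F")

print(t13f)      # T13F alone
print(k8a_t13f)  # K8A together with T13F
```

With the parent sequence as written here, T13F alone gives FNARRKLKGAILFTMLFTRNFS and K8A together with T13F gives FNARRKLAGAILFTMLFTRNFS, the two 22-amino acid sequences recited above.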

In some cases, two copies of a calmodulin-binding polypeptide are used. For example, a calmodulin-binding polypeptide can comprise the amino acid sequence FNARRKLAGAILFTMLATRNFSGSFNARRKLAGAILFTMLATRNFS (SEQ ID NO://) which contains two copies of FNARRKLAGAILFTMLATRNFS (SEQ ID NO://) and an intervening Gly-Ser (GS) linker.
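The two-copy arrangement can be sketched as a simple concatenation. The Python snippet below is illustrative only; the function name `tandem` and its `copies` parameter are hypothetical, and the unit and linker are those recited in the paragraph above.

```python
UNIT = "FNARRKLAGAILFTMLATRNFS"  # single calmodulin-binding copy recited above
LINKER = "GS"                     # intervening Gly-Ser linker


def tandem(unit: str, linker: str, copies: int = 2) -> str:
    """Join the requested number of copies of a unit with the linker between them."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return linker.join([unit] * copies)


two_copy = tandem(UNIT, LINKER)
print(two_copy, len(two_copy))
```

The result is 46 amino acids long: two 22-amino acid copies and the 2-amino acid linker.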

A suitable calmodulin polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following calmodulin amino acid sequence: MDQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD EEVDEMIREADIDGDGQVNYEEFVQMMTAK (SEQ ID NO://); and has a length of from about 148 amino acids to about 160 amino acids. In some cases, the calmodulin polypeptide has a length of 148 amino acids.

In some cases, a suitable calmodulin polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following calmodulin amino acid sequence: MDQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD EEVDEMIREADIDGDGQVNYEEFVQMMTAK (SEQ ID NO://); and has a substitution of F19; and has a length of from about 148 amino acids to about 160 amino acids. In some cases, the calmodulin polypeptide has a length of 148 amino acids. In some cases, the F19 substitution is an F19L substitution, an F19I substitution, an F19V substitution, or an F19A substitution.

In some cases, a suitable calmodulin polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following calmodulin amino acid sequence: MDQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD EEVDEMIREADIDGDGQVNYEEFVQMMTAK (SEQ ID NO://); and has a substitution of V35; and has a length of from about 148 amino acids to about 160 amino acids. In some cases, the calmodulin polypeptide has a length of 148 amino acids. In some cases, the V35 substitution is a V35G substitution, a V35A substitution, a V35L substitution, or a V35I substitution.

In some cases, a suitable calmodulin polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following calmodulin amino acid sequence: MDQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD EEVDEMIREADIDGDGQVNYEEFVQMMTAK (SEQ ID NO://); and has an F19 substitution (e.g., an F19L substitution, an F19I substitution, an F19V substitution, or an F19A substitution) and a V35 substitution (e.g., a V35G substitution, a V35A substitution, a V35L substitution, or a V35I substitution); and has a length of from about 148 amino acids to about 160 amino acids. In some cases, the calmodulin polypeptide has a length of 148 amino acids.

In some cases, a suitable calmodulin polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following calmodulin amino acid sequence: MDQLTEEQIAEFKEAFSLLDKDGDGTITTKELGTGMRSLGQNPTEAELQDMINEVDADG DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD EEVDEMIREADIDGDGQVNYEEFVQMMTAK (SEQ ID NO://); and comprises a Leu at amino acid 19 and a Gly at amino acid 35; and has a length of from about 148 amino acids to about 160 amino acids. In some cases, the calmodulin polypeptide has a length of 148 amino acids.

Troponin C/Troponin I

In some cases, the light-activated, calcium-gated fusion polypeptide comprises a troponin C-binding polypeptide (e.g., a troponin I polypeptide); and the second fusion polypeptide comprises a troponin C polypeptide.

A suitable troponin I polypeptide binds a troponin C polypeptide under conditions of high Ca²⁺ concentration. For example, a suitable troponin I polypeptide binds a troponin C polypeptide when the concentration of Ca²⁺ is greater than 100 nM, greater than 150 nM, greater than 200 nM, greater than 250 nM, greater than 300 nM, greater than 350 nM, greater than 400 nM, greater than 500 nM, or greater than 750 nM.

A suitable troponin I polypeptide does not substantially bind a troponin C polypeptide under conditions of low Ca²⁺ concentration. For example, a suitable troponin I polypeptide does not substantially bind a troponin C polypeptide when the intracellular Ca²⁺ concentration is less than about 300 nM, less than about 250 nM, less than about 200 nM, less than about 110 nM, less than about 105 nM, or less than about 100 nM.

A troponin I polypeptide can have a length of from about 10 amino acids to about 200 amino acids, e.g., from about 10 amino acids to about 40 amino acids, from about 20 amino acids to about 40 amino acids, from about 15 amino acids to about 25 amino acids, e.g., from about 10 amino acids to about 15 amino acids, from about 15 amino acids to about 20 amino acids, from about 20 amino acids to about 25 amino acids, from about 25 amino acids to about 30 amino acids, from about 30 amino acids to about 35 amino acids, from about 35 amino acids to about 40 amino acids, from about 40 amino acids to about 45 amino acids, from about 45 amino acids to about 50 amino acids, from about 50 amino acids to about 75 amino acids, from about 75 amino acids to about 100 amino acids, from about 100 amino acids to about 150 amino acids, or from about 150 amino acids to about 200 amino acids.

(SEQ ID NO: //)

MPEVERKPKI TASRKLLLKS LMLAKAKECW EQEHEEREAE

KVRYLAERIP TLQTRGLSLS ALQDLCRELH AKVEVVDEER

YDIEAKCLHN TREIKDLKLK VMDLRGKFKR PPLRRVRVSA

DAMLRALLGS KHKVSMDLRA NLKSVKKEDT EKERPVEVGD

WRKNVEAMSG MEGRKKMFDA AKSPTSQ.

A fragment of troponin I can be used. See, e.g., Tung et al. (2000) Protein Sci. 9:1312. For example, troponin I (95-114) can be used. Thus, for example, in some cases, the troponin I polypeptide can comprise an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following troponin I amino acid sequence: KDLKLK VMDLRGKFKR PPLR (SEQ ID NO://); and has a length of about 20 amino acids to about 50 amino acids (e.g., from about 20 amino acids to about 25 amino acids, from about 25 amino acids to about 30 amino acids, from about 30 amino acids to about 35 amino acids, from about 35 amino acids to about 40 amino acids, from about 40 amino acids to about 45 amino acids, or from about 45 amino acids to about 50 amino acids). In some cases, the troponin I polypeptide has a length of 20 amino acids. In some cases, the troponin I polypeptide has the amino acid sequence: KDLKLK VMDLRGKFKR PPLR (SEQ ID NO://); and has a length of 20 amino acids.
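The residue-range notation "troponin I (95-114)" can be illustrated with the full-length troponin I sequence set out above. The Python sketch below is illustrative only: the helper name `residues` is hypothetical, and the sequence is copied from the listing above with its spacing removed.

```python
# Full-length troponin I sequence as listed above, in the same blocks of ten.
TROPONIN_I = "".join(
    """
    MPEVERKPKI TASRKLLLKS LMLAKAKECW EQEHEEREAE
    KVRYLAERIP TLQTRGLSLS ALQDLCRELH AKVEVVDEER
    YDIEAKCLHN TREIKDLKLK VMDLRGKFKR PPLRRVRVSA
    DAMLRALLGS KHKVSMDLRA NLKSVKKEDT EKERPVEVGD
    WRKNVEAMSG MEGRKKMFDA AKSPTSQ
    """.split()
)


def residues(seq: str, first: int, last: int) -> str:
    """Return residues first..last, using 1-based inclusive numbering."""
    if not 1 <= first <= last <= len(seq):
        raise ValueError("residue range falls outside the sequence")
    return seq[first - 1 : last]


fragment = residues(TROPONIN_I, 95, 114)
print(fragment, len(fragment))
```

With the sequence as transcribed here, residues 95-114 correspond to the 20-amino acid fragment KDLKLKVMDLRGKFKRPPLR recited above.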

In some cases, a suitable troponin I polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following troponin I amino acid sequence: RMSADAMLKALLGSKHKVAMDLRAN (SEQ ID NO://); and has a length of from about 25 amino acids to about 50 amino acids (e.g., from about 25 amino acids to about 30 amino acids, from about 30 amino acids to about 35 amino acids, from about 35 amino acids to about 40 amino acids, from about 40 amino acids to about 45 amino acids, or from about 45 amino acids to about 50 amino acids). In some cases, the troponin I polypeptide has the amino acid sequence: RMSADAMLKALLGSKHKVAMDLRAN (SEQ ID NO://); and has a length of 25 amino acids.

In some cases, a suitable troponin I polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following troponin I amino acid sequence: NQKLFDLRGKFKRPPLRRVRMSADAMLKALLGSKHKVAMDLRAN (SEQ ID NO://); and has a length of from about 44 amino acids to about 50 amino acids (e.g., 44, 45, 46, 47, 48, 49, or 50 amino acids). In some cases, the troponin I polypeptide has the amino acid sequence: NQKLFDLRGKFKRPPLRRVRMSADAMLKALLGSKHKVAMDLRAN (SEQ ID NO://); and has a length of 44 amino acids.

A suitable troponin C polypeptide can have a length of from about 100 amino acids to about 175 amino acids, e.g., from about 100 amino acids to about 125 amino acids, from about 125 amino acids to about 150 amino acids, or from about 150 amino acids to about 175 amino acids.

A suitable troponin C polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following troponin C amino acid sequence: MTDQQAEARSYLSEEMIAEFKAAFDMFDADGGGDISVKELGTVMRMLGQTPTKEELD AIIEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRDANGYIDAEELA EIFRASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ (SEQ ID NO://); and has a length of from about 160 amino acids to about 175 amino acids (e.g., from about 160 amino acids to about 165 amino acids, from about 165 amino acids to about 170 amino acids, or from about 170 amino acids to about 175 amino acids). In some cases, a suitable troponin C polypeptide comprises the amino acid sequence: MTDQQAEARSYLSEEMIAEFKAAFDMFDADGGGDISVKELGTVMRMLGQTPTKEELD AIIEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRDANGYIDAEELA EIFRASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ (SEQ ID NO://); and has a length of 160 amino acids.

LOV-Domain Light-Activated Polypeptide

A LOV domain light-activated polypeptide that can be encoded by a nucleotide sequence present in a nucleic acid of a system (System 1 or System 2) of the present disclosure is activatable by blue light, and can cage a proteolytically cleavable linker attached to the light-activated polypeptide. Thus, in the absence of blue light, the proteolytically cleavable linker is caged, i.e., inaccessible to a protease. In the presence of blue light, the light-activated polypeptide undergoes a conformational change, such that the proteolytically cleavable linker is uncaged and becomes accessible to a protease. A LOV domain light-activated polypeptide comprises a light, oxygen, or voltage (LOV) domain (a “LOV polypeptide”).

A suitable LOV domain light-activated polypeptide can have a length of from about 100 amino acids to about 150 amino acids. For example, a LOV polypeptide can comprise an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the LOV2 domain of Avena sativa phototropin 1 (AsLOV2).

In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following LOV2 amino acid sequence: DLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVRKI RDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRDAAEREGVM LIKKTAENIDEAAK (SEQ ID NO://); GenBank AF033096. In some cases, a suitable LOV polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following LOV2 amino acid sequence: DLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVRKI RDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRDAAEREGVM LIKKTAENIDEAAK (SEQ ID NO://); and has a length of from 142 amino acids to 150 amino acids. In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following LOV2 amino acid sequence: DLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVRKI RDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRDAAEREGVM LIKKTAENIDEAAK (SEQ ID NO://); and has a length of 142 amino acids.

In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: SLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVR KIRDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRD AAEREAVMLIKKTAEEIDEAAK (SEQ ID NO://). In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: SLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVR KIRDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRD AAEREAVMLIKKTAEEIDEAAK (SEQ ID NO://); and has a length of from about 142 amino acids to about 150 amino acids. In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: SLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVR KIRDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRD AAEREAVMLIKKTAEEIDEAAK (SEQ ID NO://); and has a length of 142 amino acids.

A suitable LOV domain light-activated polypeptide comprises one or more amino acid substitutions relative to the LOV2 amino acid sequence depicted in FIG. 15A. In some cases, a suitable LOV domain light-activated polypeptide comprises one or more amino acid substitutions at positions selected from 1, 2, 12, 25, 28, 91, 100, 117, 118, 119, 120, 126, 128, 135, 136, and 138, relative to the LOV2 amino acid sequence depicted in FIG. 15A. Suitable substitutions include Asp→Ser at amino acid 1; Asp→Phe at amino acid 1; Leu→Arg at amino acid 2; Asn→Ser at amino acid 12; Ile→Val at amino acid 25; Ala→Val at amino acid 28; Leu→Val at amino acid 91; Gln→Tyr at amino acid 100; His→Arg at amino acid 117; Val→Leu at amino acid 118; Arg→His at amino acid 119; Asp→Gly at amino acid 120; Gly→Ala at amino acid 126; Met→Cys at amino acid 128; Glu→Phe at amino acid 135; Asn→Gln at amino acid 136; Asn→Glu at amino acid 136; and Asp→Ala at amino acid 138, where the amino acid numbering is based on the numbering of the LOV2 amino acid sequence depicted in FIG. 15A.

In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15C, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 28 is Ala; amino acid 117 is Arg; amino acid 126 is Ala; and amino acid 136 is Glu. In some cases, the suitable LOV domain light-activated polypeptide has a length of 142 amino acids.

In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15D, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 25 is Val; amino acid 28 is Val; amino acid 117 is Arg; amino acid 126 is Ala; amino acid 130 is Val; and amino acid 136 is Glu. In some cases, the LOV domain light-activated polypeptide has a length of 142 amino acids.

In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15E, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 28 is Ala; amino acid 91 is Val; amino acid 100 is Tyr; amino acid 117 is Arg; amino acid 118 is Leu; amino acid 119 is His; amino acid 120 is Gly; amino acid 126 is Ala; amino acid 128 is Cys; amino acid 130 is Val; amino acid 135 is Phe; amino acid 136 is Gln; and amino acid 138 is Ala. In some cases, the LOV domain light-activated polypeptide has a length of 142 amino acids.

In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15F, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 28 is Val; amino acid 117 is Arg; amino acid 126 is Ala; amino acid 130 is Val; and amino acid 136 is Glu. In some cases, the LOV domain light-activated polypeptide has a length of 138 amino acids.

In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15G, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 28 is Val; amino acid 91 is Val; amino acid 100 is Tyr; amino acid 117 is Arg; amino acid 118 is Leu; amino acid 119 is His; amino acid 120 is Gly; amino acid 126 is Ala; amino acid 128 is Cys; amino acid 130 is Val; amino acid 135 is Phe; amino acid 136 is Gln; and amino acid 138 is Ala. In some cases, the LOV domain light-activated polypeptide has a length of 138 amino acids.

In some cases, the LOV domain light-activated polypeptide comprises a substitution selected from an L2R substitution, an L2H substitution, an L2P substitution, and an L2K substitution. In some cases, the LOV polypeptide comprises a substitution selected from an N12S substitution, an N12T substitution, and an N12Q substitution. In some cases, the LOV polypeptide comprises a substitution selected from an A28V substitution, an A28I substitution, and an A28L substitution. In some cases, the LOV polypeptide comprises a substitution selected from an H117R substitution, and an H117K substitution. In some cases, the LOV polypeptide comprises a substitution selected from an I130V substitution, an I130A substitution, and an I130L substitution. In some cases, the LOV polypeptide comprises substitutions at amino acids L2, N12, and I130. In some cases, the LOV polypeptide comprises substitutions at amino acids L2, N12, H117, and I130. In some cases, the LOV polypeptide comprises substitutions at amino acids A28 and H117. In some cases, the LOV polypeptide comprises substitutions at amino acids N12 and I130. In some cases, the LOV polypeptide comprises an L2R substitution, an N12S substitution, and an I130V substitution. In some cases, the LOV polypeptide comprises an N12S substitution and an I130V substitution. In some cases, the LOV polypeptide comprises an A28V substitution and an H117R substitution. In some cases, the LOV polypeptide comprises an L2P substitution, an N12S substitution, an I130V substitution, and an H117R substitution. In some cases, the LOV polypeptide comprises an L2P substitution, an N12S substitution, an A28V substitution, an H117R substitution, and an I130V substitution.
In some cases, the LOV polypeptide comprises an L2R substitution, an N12S substitution, an A28V substitution, an H117R substitution, and an I130V substitution. In some cases, the LOV polypeptide has a length of 142 amino acids, 143 amino acids, 144 amino acids, 145 amino acids, 146 amino acids, 147 amino acids, 148 amino acids, 149 amino acids, or 150 amino acids. In some cases, the LOV polypeptide has a length of 142 amino acids.

In some cases, a LOV light-activated polypeptide comprises the following amino acid sequence:

(SEQ ID NO://)

FRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRNCRF

LQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNVFHLQPMRDY

KGDVQYFIGVQLDGTERLHGAAEREAVCLVKKTAFQIA.

In some cases, a LOV light-activated polypeptide comprises the following amino acid sequence:

(SEQ ID NO://)

SRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRNCRF

LQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQ

KGDVQYFIGVQLDGTERVRDAAEREAVMLVKKTAEEID.
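
By way of illustration only, the percent-identity thresholds recited above can be related to the two 138-residue LOV sequences just listed. The following minimal sketch counts identical residues position by position; it assumes the two sequences are already aligned without gaps, which holds here only because they have the same length. In practice, sequence identity is determined with an alignment program. The names `percent_identity`, `thresholds_met`, `LOV_A`, and `LOV_B` are hypothetical and are not part of the disclosure.

```python
# Illustrative sketch: position-by-position identity between two
# equal-length sequences (no gaps, no alignment step).

# The two 138-residue LOV sequences listed above (names are hypothetical).
LOV_A = (
    "FRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRNCRF"
    "LQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNVFHLQPMRDY"
    "KGDVQYFIGVQLDGTERLHGAAEREAVCLVKKTAFQIA"
)
LOV_B = (
    "SRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRNCRF"
    "LQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQ"
    "KGDVQYFIGVQLDGTERVRDAAEREAVMLVKKTAEEID"
)

# Identity thresholds recited in the text, in percent.
THRESHOLDS = (80, 85, 90, 95, 98, 99, 100)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percentage of positions carrying the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles equal-length, gap-free sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def thresholds_met(identity: float) -> list:
    """Return the recited thresholds that the given identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    pid = percent_identity(LOV_A, LOV_B)
    print(f"{pid:.1f}% identity; thresholds met: {thresholds_met(pid)}")
```

Under these assumptions, the two sequences satisfy the 80%, 85%, and 90% thresholds but not the 95% threshold.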

In some cases, a LOV light-activated polypeptide comprises the following amino acid sequence:

(SEQ ID NO://)

FRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRNCRF

LQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNVFHLQPMRDY

KGDVQYFIGVQLDGTERLHGAAEREAVCLVKKTAFQIA.

In some cases, a LOV light-activated polypeptide comprises the following amino acid sequence:

(SEQ ID NO://)

SRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRNCRF

LQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNVFHLQPMRDY

KGDVQYFIGVQLDGTERLHGAAEREAVCLVKKTAFEIDEAAK.

In some cases, a LOV light-activated polypeptide comprises the following amino acid sequence:

(SEQ ID NO: //)

SRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRN

CRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNLFHL

QPMRDQKGDVQYFIGVQLDGTERVRDAAEREAVMLVKKTAEEIDEAA

K.

The LOV light-activated polypeptide cages the proteolytically cleavable linker: in the absence of light of an activating wavelength, the proteolytically cleavable linker is substantially not accessible to the protease. Thus, e.g., in the absence of light of an activating wavelength (e.g., in the dark; or in the presence of light of a wavelength other than blue light), the proteolytically cleavable linker is cleaved, if at all, to a degree that is more than 50% less, more than 60% less, more than 70% less, more than 80% less, more than 90% less, more than 95% less, more than 98% less, or more than 99% less, than the degree of cleavage of the proteolytically cleavable linker in the presence of light of an activating wavelength (e.g., blue light, e.g., light of a wavelength in the range of from about 450 nm to about 495 nm, from about 460 nm to about 490 nm, from about 470 nm to about 480 nm, e.g., 473 nm).
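
The comparison between dark-state and light-state cleavage can be expressed as a simple calculation. The sketch below is illustrative only: it treats the broadest recited range (about 450 nm to about 495 nm) as hard limits, although the text qualifies both ends with "about", and it assumes that cleavage has been measured in the same arbitrary units under both conditions. The function names and the example measurements are hypothetical.

```python
# Illustrative sketch: wavelength check and "percent less" cleavage.

# Broadest blue-light range recited in the text, treated here as hard limits.
ACTIVATING_RANGE_NM = (450.0, 495.0)


def is_activating_wavelength(wavelength_nm: float) -> bool:
    """Return True if the wavelength lies within the recited blue-light range."""
    low, high = ACTIVATING_RANGE_NM
    return low <= wavelength_nm <= high


def percent_less_cleavage(dark_cleavage: float, light_cleavage: float) -> float:
    """Return how much lower, in percent, dark-state cleavage is than light-state cleavage."""
    if light_cleavage <= 0:
        raise ValueError("light-state cleavage must be positive")
    return 100.0 * (1.0 - dark_cleavage / light_cleavage)


if __name__ == "__main__":
    # Hypothetical measurements in arbitrary units.
    print(is_activating_wavelength(473.0))
    print(percent_less_cleavage(dark_cleavage=5.0, light_cleavage=100.0))
```

With these hypothetical values, dark-state cleavage is about 95% less than light-state cleavage, which meets the "more than 90% less" criterion.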

Non-limiting examples of suitable polypeptides comprising: a) a LOV light-activated polypeptide; and b) a proteolytically cleavable linker include the following (where the proteolytically cleavable linker is underlined, and where the triangle indicates the cleavage site):

1)

(SEQ ID NO: //)

SRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRN

CRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNLFHL

QPMRDQKGDVQYFIGVQLDGTERVRDAAEREAVMLVKKTAEEIDEAA

KENLYFQ_▴M;

2)

(SEQ ID NO: //)

SRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRN

CRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNVFHL

QPMRDYKGDVQYFIGVQLDGTERLHGAAEREAVCLVKKTAFEIDEAA

KENLYFQ_▴M;

3)

(SEQ ID NO: //)

FRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRN

CRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNVFHL

QPMRDYKGDVQYFIGVQLDGTERLHGAAEREAVCLVKKTAFQIAENL

YFQ_▴M;

4)

(SEQ ID NO: //)

SRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRN

CRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNLFHL

QPMRDQKGDVQYFIGVQLDGTERVRDAAEREAVMLVKKTAEEIDENL

YFQ_▴G;

and

5)

(SEQ ID NO: //)

FRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRN

CRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNVFHL

QPMRDYKGDVQYFIGVQLDGTERLHGAAEREAVCLVKKTAFQIAENL

YFQ_▴G.

Proteolytically Cleavable Linker

The proteolytically cleavable linker can include a protease recognition sequence recognized by a protease selected from the group consisting of alanine carboxypeptidase, Armillaria mellea astacin, bacterial leucyl aminopeptidase, cancer procoagulant, cathepsin B, clostripain, cytosol alanyl aminopeptidase, elastase, endoproteinase Arg-C, enterokinase, gastricsin, gelatinase, Gly-X carboxypeptidase, glycyl endopeptidase, human rhinovirus 3C protease, hypodermin C, IgA-specific serine endopeptidase, leucyl aminopeptidase, leucyl endopeptidase, lysC, lysosomal pro-X carboxypeptidase, lysyl aminopeptidase, methionyl aminopeptidase, myxobacter, nardilysin, pancreatic endopeptidase E, picornain 2A, picornain 3C, proendopeptidase, prolyl aminopeptidase, proprotein convertase I, proprotein convertase II, russellysin, saccharopepsin, semenogelase, T-plasminogen activator, thrombin, tissue kallikrein, tobacco etch virus (TEV), togavirin, tryptophanyl aminopeptidase, U-plasminogen activator, V8, venombin A, venombin AB, and Xaa-pro aminopeptidase.

For example, the proteolytically cleavable linker can comprise a matrix metalloproteinase (MMP) cleavage site, e.g., a cleavage site for a MMP selected from collagenase-1, -2, and -3 (MMP-1, -8, and -13), gelatinase A and B (MMP-2 and -9), stromelysin 1, 2, and 3 (MMP-3, -10, and -11), matrilysin (MMP-7), and membrane metalloproteinases (MT1-MMP and MT2-MMP). For example, the cleavage sequence of MMP-9 is Pro-X-X-Hy (wherein, X represents an arbitrary residue; Hy, a hydrophobic residue), e.g., Pro-X-X-Hy-(Ser/Thr), e.g., Pro-Leu/Gln-Gly-Met-Thr-Ser (SEQ ID NO://) or Pro-Leu/Gln-Gly-Met-Thr (SEQ ID NO://). Another example of a protease cleavage site is a plasminogen activator cleavage site, e.g., a uPA or a tissue plasminogen activator (tPA) cleavage site. Another example of a suitable protease cleavage site is a prolactin cleavage site. Specific examples of cleavage sequences of uPA and tPA include sequences comprising Val-Gly-Arg. Another example of a protease cleavage site that can be included in a proteolytically cleavable linker is a tobacco etch virus (TEV) protease cleavage site, e.g., ENLYFQS (SEQ ID NO://), where the protease cleaves between the glutamine and the serine; or ENLYFQY (SEQ ID NO://), where the protease cleaves between the glutamine and the tyrosine; or ENLYFQL (SEQ ID NO://), where the protease cleaves between the glutamine and the leucine. Another example of a protease cleavage site that can be included in a proteolytically cleavable linker is an enterokinase cleavage site, e.g., DDDDK (SEQ ID NO://), where cleavage occurs after the lysine residue. Another example of a protease cleavage site that can be included in a proteolytically cleavable linker is a thrombin cleavage site, e.g., LVPR (SEQ ID NO://) (e.g., where the proteolytically cleavable linker comprises the sequence LVPRGS (SEQ ID NO://)). 
Additional suitable linkers comprising protease cleavage sites include linkers comprising one or more of the following amino acid sequences: LEVLFQGP (SEQ ID NO://), cleaved by PreScission protease (a fusion protein comprising human rhinovirus 3C protease and glutathione-S-transferase; Walker et al. (1994) Biotechnol. 12:601); a thrombin cleavage site, e.g., CGLVPAGSGP (SEQ ID NO://); SLLKSRMVPNFN (SEQ ID NO://) or SLLIARRMPNFN (SEQ ID NO://), cleaved by cathepsin B; SKLVQASASGVN (SEQ ID NO://) or SSYLKASDAPDN (SEQ ID NO://), cleaved by an Epstein-Barr virus protease; RPKPQQFFGLMN (SEQ ID NO://) cleaved by MMP-3 (stromelysin); SLRPLALWRSFN (SEQ ID NO://) cleaved by MMP-7 (matrilysin); SPQGIAGQRNFN (SEQ ID NO://) cleaved by MMP-9; DVDERDVRGFASFL (SEQ ID NO://) cleaved by a thermolysin-like MMP; SLPLGLWAPNFN (SEQ ID NO://) cleaved by matrix metalloproteinase 2 (MMP-2); SLLIFRSWANFN (SEQ ID NO://) cleaved by cathepsin L; SGVVIATVIVIT (SEQ ID NO://) cleaved by cathepsin D; SLGPQGIWGQFN (SEQ ID NO://) cleaved by matrix metalloproteinase 1 (MMP-1); KKSPGRVVGGSV (SEQ ID NO://) cleaved by urokinase-type plasminogen activator; PQGLLGAPGILG (SEQ ID NO://) cleaved by membrane type 1 matrix metalloproteinase (MT-MMP); HGPEGLRVGFYESDVMGRGHARLVHVEEPHT (SEQ ID NO://) cleaved by stromelysin 3 (or MMP-11), thermolysin, fibroblast collagenase and stromelysin-1; GPQGLAGQRGIV (SEQ ID NO://) cleaved by matrix metalloproteinase 13 (collagenase-3); GGSGQRGRKALE (SEQ ID NO://) cleaved by tissue-type plasminogen activator (tPA); SLSALLSSDIFN (SEQ ID NO://) cleaved by human prostate-specific antigen; SLPRFKIIGGFN (SEQ ID NO://) cleaved by kallikrein (hK3); SLLGIAVPGNFN (SEQ ID NO://) cleaved by neutrophil elastase; and FFKNIVTPRTPP (SEQ ID NO://) cleaved by calpain (calcium activated neutral protease).

Suitable proteolytically cleavable linkers also include ENLYFQX (SEQ ID NO://; where X is any amino acid), ENLYFQS (SEQ ID NO://), ENLYFQG (SEQ ID NO://), ENLYFQY (SEQ ID NO://), ENLYFQL (SEQ ID NO://), ENLYFQW (SEQ ID NO://), ENLYFQM (SEQ ID NO://), ENLYFQH (SEQ ID NO://), ENLYFQN (SEQ ID NO://), ENLYFQA (SEQ ID NO://), and ENLYFQQ (SEQ ID NO://).
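
For illustration, the ENLYFQX family of sites can be located in a polypeptide sequence with a short pattern search, with cleavage modeled between the glutamine and the following residue as described above for the TEV protease cleavage sites. The sketch below is a simplified model and not a prediction of protease behavior: it accepts any of the twenty standard amino acids at position X, as the text recites, and does not account for caging by the LOV polypeptide. The names `TEV_SITE` and `cleave_at_tev_sites` are hypothetical.

```python
# Illustrative sketch: find ENLYFQX sites and split between Q and X.
import re

# ENLYFQ followed by any standard amino acid (X), matched as a lookahead
# so that X stays at the start of the downstream fragment.
TEV_SITE = re.compile(r"ENLYFQ(?=[ACDEFGHIKLMNPQRSTVWY])")


def cleave_at_tev_sites(sequence: str) -> list:
    """Split a sequence after the glutamine of every ENLYFQX site."""
    fragments = []
    start = 0
    for match in TEV_SITE.finditer(sequence):
        fragments.append(sequence[start:match.end()])
        start = match.end()
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    # C-terminal portion of one of the LOV-linker polypeptides listed above.
    print(cleave_at_tev_sites("KKTAEEIDEAAKENLYFQM"))
```

In this model, the upstream fragment retains ENLYFQ at its carboxyl terminus, and the residue at position X becomes the amino terminus of the released fragment.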

Suitable proteolytically cleavable linkers also include NS3 protease cleavage sites such as: DEVVECS (SEQ ID NO://), DEAEDVVECS (SEQ ID NO://), EDAAEEVVECS (SEQ ID NO://).

Suitable proteolytically cleavable linkers also include calpain cleavage sites, where suitable calpain cleavage sites include, e.g., PLFAAR (SEQ ID NO://) and QQEVYGMMPRD (SEQ ID NO://).

In some cases, the proteolytically cleavable linker comprises an amino acid sequence that is substantially not cleaved by any endogenous protease in a given cell (e.g., a eukaryotic cell; e.g., a mammalian cell; e.g., a particular type of mammalian cell). In some cases, the proteolytically cleavable linker comprises an amino acid sequence that is cleaved by a viral protease, and that is substantially not cleaved by any endogenous protease in a given cell (e.g., a eukaryotic cell; e.g., a mammalian cell; e.g., a particular type of mammalian cell). In some cases, the proteolytically cleavable linker comprises an amino acid sequence that is cleaved by a non-naturally occurring (e.g., engineered) protease, and that is substantially not cleaved by any endogenous protease in a given cell (e.g., a eukaryotic cell; e.g., a mammalian cell; e.g., a particular type of mammalian cell).

In some cases, the proteolytically cleavable linker comprises an amino acid sequence that is cleaved by a protease that is endogenous to a given cell (e.g., a eukaryotic cell; e.g., a mammalian cell; e.g., a particular type of mammalian cell).

Proteases

In some cases, the protease is a protease that is not normally produced in a particular cell; e.g., the protease is heterologous to the cell. For example, in some cases, the protease is one that is not normally produced in a mammalian cell. Examples of such proteases include viral proteases, insect-specific proteases, venom proteases, and the like.

In some cases, the protease is a protease that is normally produced in a particular cell; e.g., the protease is an endogenous protease (e.g., a calpain protease; etc.).

Suitable proteases include, but are not limited to, alanine carboxypeptidase, Armillaria mellea astacin, bacterial leucyl aminopeptidase, cancer procoagulant, cathepsin B, clostripain, cytosol alanyl aminopeptidase, elastase, endoproteinase Arg-C, enterokinase, gastricsin, gelatinase, Gly-X carboxypeptidase, glycyl endopeptidase, human rhinovirus 3C protease, hypodermin C, IgA-specific serine endopeptidase, leucyl aminopeptidase, leucyl endopeptidase, lysC, lysosomal pro-X carboxypeptidase, lysyl aminopeptidase, methionyl aminopeptidase, myxobacter, nardilysin, pancreatic endopeptidase E, picornain 2A, picornain 3C, proendopeptidase, prolyl aminopeptidase, proprotein convertase I, proprotein convertase II, russellysin, saccharopepsin, semenogelase, T-plasminogen activator, thrombin, tissue kallikrein, tobacco etch virus (TEV), togavirin, tryptophanyl aminopeptidase, U-plasminogen activator, Factor Xa, V8, venombin A, venombin AB, a calpain protease, and an Xaa-pro aminopeptidase.

Suitable proteases include a matrix metalloproteinase (MMP) (e.g., an MMP selected from collagenase-1, -2, and -3 (MMP-1, -8, and -13), gelatinase A and B (MMP-2 and -9), stromelysin 1, 2, and 3 (MMP-3, -10, and -11), matrilysin (MMP-7), and membrane metalloproteinases (MT1-MMP and MT2-MMP)); and a plasminogen activator (e.g., a uPA or a tissue plasminogen activator (tPA)). Another example of a suitable protease is prolactin. Another example of a suitable protease is a tobacco etch virus (TEV) protease. Another example of a suitable protease is enterokinase. Another example of a suitable protease is thrombin. Additional examples of suitable proteases are: a PreScission protease (a fusion protein comprising human rhinovirus 3C protease and glutathione-S-transferase; Walker et al. (1994) Biotechnol. 12:601); cathepsin B; an Epstein-Barr virus protease; cathepsin L; cathepsin D; thermolysin; kallikrein (hK3); neutrophil elastase; calpain (calcium activated neutral protease); and NS3 protease.

In some cases, a suitable protease is a TEV protease. In some cases, a suitable protease comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 20A. In some cases, a suitable protease comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 20B. In some cases, a suitable protease comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 20C. In some cases, a suitable protease comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 20D.

In some cases, a suitable TEV protease comprises the amino acid sequence

(SEQ ID NO: //)

GESLFKGPRDYNPISSTICHLTNESDGHTTSLYGIGFGPFIITNKHL

FRRNNGTLLVQSLHGVFKVKNTTTLQQHLIDGRDMIIIRMPKDFPPF

PQKLKFREPQREERICLVTTNFQTKSMSSMVSDTSCTFPSSDGIFWK

HWIQTKDGQCGSPLVSTRDGFIVGIHSASNFTNTNNYFTSVPKNFME

LLTNQEAQQWVSGWRLNADSVLWGGHKVFMV.

A suitable TEV protease can have a length of from about 200 amino acids to about 250 amino acids. For example, a suitable TEV protease can have a length of from about 200 amino acids to about 220 amino acids, from about 220 amino acids to about 240 amino acids, or from about 240 amino acids to about 250 amino acids. For example, a suitable TEV protease can have a length of 219 amino acids, 242 amino acids, or 238 amino acids.

System Comprising a Nucleic Acid Comprising a Nucleotide Sequence Encoding a Polypeptide of Interest

As noted above, a system of the present disclosure includes a nucleic acid system (“System 2”) comprising: a) a first nucleic acid comprising a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide comprising, in order from amino terminus to carboxyl terminus: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15A-15D; iv) a proteolytically cleavable linker; and v) a polypeptide of interest; and b) a second nucleic acid comprising a nucleotide sequence encoding a second fusion polypeptide comprising: i) a calcium-binding polypeptide selected from a calmodulin polypeptide and a troponin C polypeptide; and ii) a protease that cleaves the proteolytically cleavable linker. Thus, in some cases, the present disclosure provides a nucleic acid system in which the first nucleic acid comprises a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide that comprises a polypeptide of interest.

A system of the present disclosure can include a nucleic acid system (“System 2”) comprising: a) a first nucleic acid comprising a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide comprising, in order from amino terminus to carboxyl terminus: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15E-15G; iv) a proteolytically cleavable linker; and v) a polypeptide of interest; and b) a second nucleic acid comprising a nucleotide sequence encoding a second fusion polypeptide comprising: i) a calcium-binding polypeptide selected from a calmodulin polypeptide and a troponin C polypeptide; and ii) a protease that cleaves the proteolytically cleavable linker. Thus, in some cases, the present disclosure provides a nucleic acid system in which the first nucleic acid comprises a nucleotide sequence encoding a light-activated, calcium-gated fusion polypeptide that comprises a polypeptide of interest.
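
The domain order of the first fusion polypeptide, and the dual requirement implied by the term "light-activated, calcium-gated", can be summarized in a small model. The sketch below is a deliberate simplification offered for illustration only: it reduces light and calcium to two Boolean inputs and treats release of the polypeptide of interest as requiring both, which is our reading of the system rather than a quantitative description. The class name, field names, and example labels are hypothetical.

```python
# Illustrative sketch: domain order of the first fusion polypeptide and a
# Boolean model of release of the polypeptide of interest.
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstFusionPolypeptide:
    """Components of the first fusion polypeptide (labels only, no sequences)."""

    transmembrane_domain: str
    calcium_dependent_binding_partner: str  # calmodulin-binding or troponin I polypeptide
    lov_polypeptide: str
    cleavable_linker: str
    polypeptide_of_interest: str

    def domain_order(self) -> list:
        """Return the components from amino terminus to carboxyl terminus."""
        return [
            self.transmembrane_domain,
            self.calcium_dependent_binding_partner,
            self.lov_polypeptide,
            self.cleavable_linker,
            self.polypeptide_of_interest,
        ]


def polypeptide_of_interest_released(light_on: bool, calcium_elevated: bool) -> bool:
    """Simplified AND gate: both activating light and elevated calcium are needed."""
    return light_on and calcium_elevated


if __name__ == "__main__":
    fusion = FirstFusionPolypeptide(
        transmembrane_domain="transmembrane domain",
        calcium_dependent_binding_partner="calmodulin-binding polypeptide",
        lov_polypeptide="LOV light-activated polypeptide",
        cleavable_linker="TEV protease cleavage site",
        polypeptide_of_interest="transcription factor",
    )
    print(" - ".join(fusion.domain_order()))
    for light in (False, True):
        for calcium in (False, True):
            print(light, calcium, polypeptide_of_interest_released(light, calcium))
```

In this model, only the combination of activating light and elevated calcium yields release; either condition alone does not.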

Polypeptides of Interest

Suitable polypeptides of interest that can be encoded in a system of the present disclosure include, but are not limited to, a reporter gene product, an opsin, a DREADD, a toxin, an enzyme, a transcription factor, an antibiotic resistance factor, a genome editing endonuclease, an RNA-guided endonuclease, a protease, a kinase, a phosphatase, a phosphorylase, a lipase, a receptor, an antibody, a fluorescent protein, a biotin ligase, a peroxidase such as APEX or APEX2, a base editing enzyme, a recombinase, a synaptic marker, a signaling protein, an effector protein of a receptor, a protein that regulates synaptic vesicle fusion or protein trafficking or organelle trafficking, and a portion (e.g., a split half) of any one of the aforementioned polypeptides. In some cases, the gene product is inactive until released from the calcium-gated, light-activated polypeptide. In some cases, the gene product is a nuclear protein. In some cases, the gene product is a cytosolic protein. In some cases, the gene product is a mitochondrial protein. In some cases, the gene product is a transmembrane protein.

Biotin Ligase

A suitable biotin ligase includes a BirA biotin-protein ligase polypeptide. A BirA biotin-protein ligase activates biotin to form biotinyl 5′ adenylate and transfers the biotin to a biotin-acceptor tag (BAT). A BAT can be present in a fusion protein, where the fusion protein comprises: a) a BAT; and b) a heterologous polypeptide. Suitable BATs include, e.g., GLNDIFEAQKIEWHE (SEQ ID NO://; see, e.g., Fairhead and Howarth (2015) Methods Mol. Biol. 1266:171).

A suitable BirA biotin-protein ligase polypeptide can comprise an amino acid sequence having at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence:

(SEQ ID NO: //)

MKDNTVPLKL IALLANGEFH SGEQLGETLG MSRAAINKHI

QTLRDWGVDV FTVPGKGYSL PEPIQLLNAE EILSQLDGGS

VAVLPVIDST NQYLLDRIGE LKSGDACVAE YQQAGRGRRG

RKWFSPFGAN LYLSMFWRLE QGPAAAIGLS LVIGIVMAEV

LRKLGADKVR VKWPNDLYLQ DRKLAGILVE LTGKTGDAAQ

IVIGAGINMA MRRVEESVVN QGWITLQEAG INLDRNTLAA

MLIRELRAAL ELFEQEGLAP YLSRWEKLDN FINRPVKLII

GDKEIFGISR GIDKQGALLL EQDGIIKPWM GGEISLRSAE

K.

Synaptic Markers

In some cases, a polypeptide of interest is a synaptic marker. Synaptic markers include, but are not limited to, PSD-95, SV2, homer, bassoon, synapsin I, synaptotagmin, synaptophysin, synaptobrevin, SAP102, α-adaptin, GluA1, NMDA receptor, LRRTM1, LRRTM2, SLITRK, neuroligin-1, neuroligin-2, gephyrin, GABA receptor, and the like.

Nucleic Acid Editing Enzymes

In some cases, a polypeptide of interest is a nucleic acid-editing enzyme. Suitable nucleic acid-editing enzymes include, e.g., a DNA-editing enzyme, a cytidine deaminase, an adenosine deaminase, an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase, an activation-induced cytidine deaminase (AID), an ACF1/ASE deaminase, and an ADAT family deaminase.

Peroxidases

A suitable polypeptide of interest is in some cases a peroxidase, where suitable peroxidases include, e.g., horse radish peroxidase, yeast cytochrome c peroxidase (CCP), ascorbate peroxidase (APX), bacterial catalase-peroxidase (BCP), APEX, and APEX2. See, e.g., U.S. Patent Publication No. 2014/0206013.

An example of a suitable peroxidase is an APX, which has the following amino acid sequence: MGKSYPTVSA DYQKAVEKAK KKLRGFIAEK RCAPLMLRLA WHSAGTFDKG TKTGGPFGTI KHPAELAHSA NNGLDIAVRL LEPLKAEFPI LSYADFYQLA GVVAVEVTGG PEVPFHPGRE DKPEPPPEGR LPDATKGSDH LRDVFGKAMG LTDQDIVALS GGHTIGAAHK ERSGFEGPWT SNPLIFDNSY FTELLSGEKE GLLQLPSDKA LLSDPVFRPL VDKYAADEDA FFADYAEAHQ KLSELGFADA (SEQ ID NO://). In some cases, the peroxidase comprises a K14D substitution. In some cases, the peroxidase can contain a combination of (a) K14D, E112K, E228K, D229K, K14D/E112K, K14D/E228K, K14D/D229K, E17N/K20A/R21L, or K14D/W41F/E112K, and (b) S69F, G174F, W41F/S69F, D133A/T135F/K136F, W41F/D133A/T135F/K136F, S69F/D133A/T135F/K136F, or W41F/S69F/D133A/T135F/K136F. In some cases, the peroxidase can contain a combination of (a) single mutant K14D, single mutant E112K, single mutant E228K, single mutant D229K, double mutant K14D/E112K, double mutant K14D/E228K, double mutant K14D/D229K, triple mutant E17N/K20A/R21L, or triple mutant K14D/W41F/E112K, and (b) single mutant W41F, single mutant S69F, single mutant G174F, double mutant W41F/S69F, triple mutant D133A/T135F/K136F, quadruple mutant W41F/D133A/T135F/K136F, quadruple mutant S69F/D133A/T135F/K136F, or quintuple mutant W41F/S69F/D133A/T135F/K136F. Examples of such combined mutants include, but are not limited to, K14D/E112K/W41F (APEX), and K14D/E112K/W41F/D133A/T135F/K136F. The amino acid numbering is based on the above-provided APX amino acid sequence.
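
The substitution notation used above (e.g., K14D, meaning that the lysine at position 14 is replaced with aspartate, with numbering based on the above-provided APX sequence) can be applied mechanically. The following sketch is illustrative only; it parses a slash-separated list such as K14D/E112K/W41F and applies each substitution after confirming that the stated original residue is present at the stated position. The names `APX`, `apply_substitutions`, and `_SUBSTITUTION` are hypothetical.

```python
# Illustrative sketch: apply substitutions written as <original><position><new>,
# separated by "/", to the APX sequence given above (1-based numbering).
import re

APX = (
    "MGKSYPTVSA DYQKAVEKAK KKLRGFIAEK RCAPLMLRLA WHSAGTFDKG "
    "TKTGGPFGTI KHPAELAHSA NNGLDIAVRL LEPLKAEFPI LSYADFYQLA "
    "GVVAVEVTGG PEVPFHPGRE DKPEPPPEGR LPDATKGSDH LRDVFGKAMG "
    "LTDQDIVALS GGHTIGAAHK ERSGFEGPWT SNPLIFDNSY FTELLSGEKE "
    "GLLQLPSDKA LLSDPVFRPL VDKYAADEDA FFADYAEAHQ KLSELGFADA"
).replace(" ", "")

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, spec: str) -> str:
    """Return the sequence with every substitution in the spec applied."""
    residues = list(sequence)
    for token in spec.split("/"):
        match = _SUBSTITUTION.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(
                f"expected {original} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    apex = apply_substitutions(APX, "K14D/E112K/W41F")
    print(apex[13], apex[40], apex[111])
```

Checking the original residue before substituting guards against numbering mismatches between a substitution list and the sequence to which it is applied.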

Antibodies

A suitable polypeptide of interest is in some cases an antibody. The terms “antibodies” and “immunoglobulin” include antibodies or immunoglobulins of any isotype, fragments of antibodies that retain specific binding to antigen, including, but not limited to, Fab, Fv, scFv, and Fd fragments, chimeric antibodies, humanized antibodies, single-chain antibodies (scAb), single domain antibodies (dAb), single domain heavy chain antibodies, single domain light chain antibodies, nanobodies, bi-specific antibodies, multi-specific antibodies, and fusion proteins comprising an antigen-binding (also referred to herein as antigen binding) portion of an antibody and a non-antibody protein. Also encompassed by the term are Fab′, Fv, F(ab′)₂, and other antibody fragments that retain specific binding to antigen, and monoclonal antibodies.

The term “nanobody” (Nb), as used herein, refers to the smallest antigen binding fragment or single variable domain (V_HH) derived from a naturally occurring heavy chain antibody and is known to the person skilled in the art. Nanobodies are derived from heavy chain only antibodies, seen in camelids (Hamers-Casterman et al., 1993; Desmyter et al., 1996). In the family of “camelids,” immunoglobulins devoid of light polypeptide chains are found. “Camelids” comprise old world camelids (Camelus bactrianus and Camelus dromedarius) and new world camelids (for example, Llama paccos, Llama glama, Llama guanicoe and Llama vicugna). A single variable domain heavy chain antibody is referred to herein as a nanobody or a V_HH antibody.

“Antibody fragments” comprise a portion of an intact antibody, for example, the antigen binding or variable region of the intact antibody. Examples of antibody fragments include Fab, Fab′, F(ab′)₂, and Fv fragments; diabodies; linear antibodies (Zapata et al., Protein Eng. 8(10): 1057-1062 (1995)); domain antibodies (dAb; Holt et al. (2003) Trends Biotechnol. 21:484); single-chain antibody molecules; and multi-specific antibodies formed from antibody fragments. Papain digestion of antibodies produces two identical antigen-binding fragments, called “Fab” fragments, each with a single antigen-binding site, and a residual “Fc” fragment, a designation reflecting the ability to crystallize readily. Pepsin treatment yields an F(ab′)₂ fragment that has two antigen combining sites and is still capable of cross-linking antigen. Antibody fragments include, e.g., scFv, sdAb, dAb, Fab, Fab′, Fab′₂, F(ab′)₂, Fd, Fv, Feb, and SMIP. An example of an sdAb is a camelid VHH.

“Fv” is the minimum antibody fragment that contains a complete antigen-recognition and -binding site. This region consists of a dimer of one heavy- and one light-chain variable domain in tight, non-covalent association. It is in this configuration that the three complementarity determining regions (CDRs) of each variable domain interact to define an antigen-binding site on the surface of the V_H-V_L dimer. Collectively, the six CDRs confer antigen-binding specificity to the antibody. However, even a single variable domain (or half of an Fv comprising only three CDRs specific for an antigen) has the ability to recognize and bind antigen, although at a lower affinity than the entire binding site.

“Single-chain Fv” or “sFv” or “scFv” antibody fragments comprise the V_H and V_L domains of an antibody, wherein these domains are present in a single polypeptide chain. In some embodiments, the Fv polypeptide further comprises a polypeptide linker between the V_H and V_L domains, which enables the sFv to form the desired structure for antigen binding. For a review of sFv, see Pluckthun in The Pharmacology of Monoclonal Antibodies, vol. 113, Rosenburg and Moore eds., Springer-Verlag, New York, pp. 269-315 (1994).

The term “diabodies” refers to small antibody fragments with two antigen-binding sites, which fragments comprise a heavy-chain variable domain (V_H) connected to a light-chain variable domain (V_L) in the same polypeptide chain (V_H-V_L). By using a linker that is too short to allow pairing between the two domains on the same chain, the domains are forced to pair with the complementary domains of another chain and create two antigen-binding sites. Diabodies are described more fully in, for example, EP 404,097; WO 93/11161; and Hollinger et al. (1993) Proc. Natl. Acad. Sci. USA 90:6444-6448.

DREADDs

A suitable polypeptide of interest is in some cases a Designer Receptor Exclusively Activated by Designer Drugs (DREADD; also known as a “RASSL”). See, e.g., Roth (2016) Neuron 89:683; Bang et al. (2016) Exp. Neurobiol. 25:205; Whissell et al. (2016) Front. Genet. 7:70; and U.S. Pat. No. 6,518,480. For example, a modified G protein-coupled receptor (GPCR) is genetically engineered so that it: 1) retains binding affinity for a synthetic small molecule; and 2) has decreased binding affinity for a selected naturally occurring peptide or nonpeptide ligand relative to binding by its corresponding wild-type GPCR (e.g., the GPCR from which the modified GPCR was derived). Synthetic small molecule binding to the modified receptor induces the target cell to respond with a specific physiological response (e.g., cellular proliferation, cellular secretion, cell migration, cell contraction, or pigment production).

Any G protein-coupled receptor having separable domains for: 1) natural ligand (e.g., a natural peptide ligand) binding; 2) synthetic small molecule binding; and 3) G protein interaction can be modified to produce a DREADD.

GPCRs that bind peptide as their natural ligand are in some cases used to generate a DREADD. Such GPCRs include, but are not limited to: Type-1 Angiotensin II Receptor, Type-1a Angiotensin II Receptor, Type-1B Angiotensin II Receptor, Type-1C Angiotensin II Receptor, Type-2 Angiotensin II Receptor, Neuromedin-B Receptor, Gastrin-releasing Peptide Receptor, Bombesin Subtype-3 Receptor, B1 Bradykinin Receptor, B2 Bradykinin Receptor, Interleukin-8 A Receptor, Interleukin-8 B Receptor, FMet-Leu-Phe Receptor, Monocyte Chemoattractant Protein 1 Receptor, C-C Chemokine Receptor Type 1 Receptor, C5a Anaphylatoxin Receptor, Cholecystokinin Type A Receptor, Gastrin/cholecystokinin Type B Receptor, Endothelin-1 Receptor, Endothelin B Receptor, Follicle Stimulating Hormone (FSH-R) Receptor, Lutropin-choriogonadotropic Hormone (LH/CG-R) Receptor, Adrenocorticotropic Hormone Receptor (ACTH-R), Melanocyte Stimulating Hormone Receptor (MSH-R), Melanocortin-3 Receptor, Melanocortin-4 Receptor, Melanocortin-5 Receptor, Melatonin Type 1A Receptor, Melatonin Type 1B Receptor, Melatonin Type 1C Receptor, Neuropeptide Y Type 1 Receptor, Neuropeptide Y Type 2 Receptor, Neurotensin Receptor, Delta-type Opioid Receptor, Kappa-type Opioid Receptor, Mu-type Opioid Receptor, Nociceptin Receptor, Gonadotropin-releasing Hormone Receptor, Somatostatin Type 1 Receptor, Somatostatin Type 2 Receptor, Somatostatin Type 3 Receptor, Somatostatin Type 4 Receptor, Somatostatin Type 5 Receptor, Substance-P Receptor, Substance-K Receptor, Neuromedin K Receptor, Vasopressin V1a Receptor, Vasopressin V1B Receptor, Vasopressin V2 Receptor, Oxytocin Receptor, Galanin Receptor, Calcitonin Receptor, Calcitonin A Receptor, Calcitonin B Receptor, Growth Hormone-releasing Hormone Receptor, Parathyroid Hormone/parathyroid Hormone-related Peptide Receptor, Pituitary Adenylate Cyclase Activating Polypeptide Type I Receptor, Secretin Receptor, Vasoactive Intestinal Polypeptide 1 Receptor, and Vasoactive Intestinal Polypeptide 2 Receptor.

A DREADD can interact with a G protein selected from Gi, Gq, and Gs. Thus, a DREADD can be a Gi-coupled DREADD, a Gq-coupled DREADD, or a Gs-coupled DREADD.

DREADDs include, but are not limited to, hM3Dq, a DREADD generated from the human M3 muscarinic receptor; hM4Di, a DREADD generated from the Gi-coupled human M4 muscarinic receptor; a DREADD generated from a kappa opioid receptor (see U.S. Pat. No. 6,518,480); KORD; and the like.

Transcription Factors

Suitable transcription factors include naturally-occurring transcription factors and recombinant (e.g., non-naturally occurring, engineered, artificial, synthetic) transcription factors. In some cases, the transcription factor is a transcriptional activator. In some cases, the transcriptional activator is an engineered protein, such as a zinc finger or TALE based DNA binding domain fused to an effector domain such as VP64 (transcriptional activation).

A transcription factor can comprise: i) a DNA binding domain (DBD); and ii) an activation domain (AD). The DBD can be any DBD with a known response element, including synthetic and chimeric DNA binding domains, or analogs, combinations, or modifications thereof. Suitable DNA binding domains include, but are not limited to, a GAL4 DBD, a LexA DBD, a transcription factor DBD, a Group H nuclear receptor member DBD, a steroid/thyroid hormone nuclear receptor superfamily member DBD, a bacterial LacZ DBD, and an EcR DBD. Suitable ADs include, but are not limited to, a Group H nuclear receptor member AD, a steroid/thyroid hormone nuclear receptor AD, a CJ7 AD, a p65-TA1 AD, a synthetic or chimeric AD, a polyglutamine AD, a basic or acidic amino acid AD, a VP16 AD, a GAL4 AD, an NF-κB AD, a BP64 AD, a B42 acidic activation domain (B42AD), a p65 transactivation domain (p65AD), SAD, NF-1, AP-2, SP1-A, SP1-B, Oct-1, Oct-2, MTF-1, BTEB-2, and LKLF, or an analog, combination, or modification thereof.

Suitable transcription factors include transcriptional repressors, where suitable transcriptional repressors (e.g., a transcription repressor domain) include, but are not limited to, Krüppel-associated box (KRAB); the Mad mSIN3 interaction domain (SID); the ERF repressor domain (ERD); MDB-2B; v-ErbA; MBD3; and the like.

Reporter Gene Products

Suitable reporter gene products include polypeptides that generate a detectable signal. Suitable detectable signal-producing proteins include, e.g., fluorescent proteins; enzymes that catalyze a reaction that generates a detectable signal as a product; and the like.

Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilised EGFP (dEGFP), destabilised ECFP (dECFP), destabilised EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP, Kaede protein and kindling protein, Phycobiliproteins and Phycobiliprotein conjugates including B-Phycoerythrin, R-Phycoerythrin and Allophycocyanin. Other examples of fluorescent proteins include mHoneydew, mBanana, mOrange, dTomato, tdTomato, mTangerine, mStrawberry, mCherry, mGrape1, mRaspberry, mGrape2, mPlum (Shaner et al. (2005) Nat. Methods 2:905-909), and the like. Any of a variety of fluorescent and colored proteins from Anthozoan species, as described in, e.g., Matz et al. (1999) Nature Biotechnol. 17:969-973, is suitable for use.

Suitable enzymes include, but are not limited to, horseradish peroxidase (HRP), alkaline phosphatase (AP), beta-galactosidase (GAL), glucose-6-phosphate dehydrogenase, beta-N-acetylglucosaminidase, β-glucuronidase, invertase, xanthine oxidase, firefly luciferase, glucose oxidase (GO), and the like.

Genome-Editing Endonuclease

A “genome editing endonuclease” is an endonuclease, e.g., sequence-specific endonuclease, which can be used for the editing of a cell's genome (e.g., by cleaving at a targeted location within the cell's genomic DNA). Examples of genome editing endonucleases include but are not limited to: (i) Zinc finger nucleases, (ii) TAL endonucleases, and (iii) CRISPR/Cas endonucleases. Examples of CRISPR/Cas endonucleases include class 2 CRISPR/Cas endonucleases such as: (a) type II CRISPR/Cas proteins, e.g., a Cas9 protein; (b) type V CRISPR/Cas proteins, e.g., a Cpf1 polypeptide, a C2c1 polypeptide, a C2c3 polypeptide, and the like; and (c) type VI CRISPR/Cas proteins, e.g., a C2c2 polypeptide.

Examples of suitable sequence-specific, e.g., genome editing, endonucleases include, but are not limited to, zinc finger nucleases, meganucleases, TAL-effector DNA binding domain-nuclease fusion proteins (transcription activator-like effector nucleases (TALEN®s)), and CRISPR/Cas endonucleases (e.g., class 2 CRISPR/Cas endonucleases such as a type II, type V, or type VI CRISPR/Cas endonuclease). Thus, in some cases, a gene product is a sequence-specific, e.g., genome editing, endonuclease selected from: a zinc finger nuclease, a TAL-effector DNA binding domain-nuclease fusion protein (TALEN), and a CRISPR/Cas endonuclease (e.g., a class 2 CRISPR/Cas endonuclease such as a type II, type V, or type VI CRISPR/Cas endonuclease). In some cases, a sequence-specific genome editing endonuclease includes a zinc finger nuclease or a TALEN. In some cases, a sequence-specific genome editing endonuclease includes a class 2 CRISPR/Cas endonuclease. In some cases, a sequence-specific genome editing endonuclease includes a class 2 type II CRISPR/Cas endonuclease (e.g., a Cas9 protein). In some cases, a sequence-specific genome editing endonuclease includes a class 2 type V CRISPR/Cas endonuclease (e.g., a Cpf1 protein, a C2c1 protein, or a C2c3 protein). In some cases, a sequence-specific genome editing endonuclease includes a class 2 type VI CRISPR/Cas endonuclease (e.g., a C2c2 protein).

RNA-mediated adaptive immune systems in bacteria and archaea rely on Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) genomic loci and CRISPR-associated (Cas) proteins that function together to provide protection from invading viruses and plasmids. In some cases, an RNA-guided endonuclease is a class 2 CRISPR/Cas endonuclease. In class 2 CRISPR systems, the functions of the effector complex (e.g., the cleavage of target DNA) are carried out by a single endonuclease (e.g., see Zetsche et al, Cell. 2015 Oct. 22; 163(3):759-71; Makarova et al, Nat Rev Microbiol. 2015 November; 13(11):722-36; and Shmakov et al., Mol Cell. 2015 Nov. 5; 60(3):385-97). As such, the term “class 2 CRISPR/Cas protein” is used herein to encompass the endonuclease (the target nucleic acid cleaving protein) from class 2 CRISPR systems. Thus, the term “class 2 CRISPR/Cas endonuclease” as used herein encompasses type II CRISPR/Cas proteins (e.g., Cas9), type V CRISPR/Cas proteins (e.g., Cpf1, C2c1, C2c3), and type VI CRISPR/Cas proteins (e.g., C2c2). To date, class 2 CRISPR/Cas proteins encompass type II, type V, and type VI CRISPR/Cas proteins, but the term is also meant to encompass any class 2 CRISPR/Cas protein suitable for binding to a corresponding guide RNA and forming an RNP complex.

In some cases, a suitable RNA-guided endonuclease comprises an amino acid sequence having at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the Streptococcus pyogenes Cas9 amino acid sequence depicted in FIG. 21.

In some cases, a suitable RNA-guided endonuclease comprises an amino acid sequence having at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the Staphylococcus aureus Cas9 amino acid sequence depicted in FIG. 22.
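By way of illustration only, a percent amino acid sequence identity of the kind recited above could be evaluated along the following lines. This is a minimal sketch and not a prescribed method: it assumes the two sequences have already been aligned (gaps written as "-"), it assumes identity is computed as identical residues divided by aligned columns, and the function names `percent_identity` and `meets_identity_threshold` are hypothetical. Other conventions for the denominator exist and would give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences (gaps as '-').

    Assumed convention: identical residues / aligned columns, where columns
    that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep every column except those that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold_percent: float) -> bool:
    """True if the pair has at least the stated percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent
```

For example, under these assumptions a candidate sequence would be checked against a reference with `meets_identity_threshold(candidate, reference, 90.0)` for an "at least 90%" recitation.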

In some cases, the RNA-guided endonuclease is a nickase (see, e.g., Jinek et al., Science. 2012 Aug. 17; 337(6096):816-21).

In some cases, the RNA-guided endonuclease is a variant Cas9 protein that has reduced catalytic activity (e.g., when a Cas9 protein has a D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or an A987 mutation of the amino acid sequence depicted in FIG. 21, e.g., D10A, G12A, G17A, E762A, H840A, N854A, N863A, H982A, H983A, A984A, and/or D986A); and the variant Cas9 protein retains the ability to bind to target nucleic acid in a site-specific manner (e.g., when complexed with a guide RNA).
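The substitution notation used above (e.g., D10A: aspartate at position 10 replaced by alanine, with positions counted from 1) can be illustrated with a short sketch. The function name `apply_substitution` is hypothetical, and the sequence in the usage note is a short placeholder, not the amino acid sequence of FIG. 21.

```python
import re

# One-letter original residue, 1-based position, one-letter replacement.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(sequence: str, mutation: str) -> str:
    """Return `sequence` with a single substitution such as 'D10A' applied."""
    match = _SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized substitution: {mutation!r}")
    original = match.group(1)
    index = int(match.group(2)) - 1  # convert 1-based position to 0-based index
    replacement = match.group(3)
    if index < 0 or index >= len(sequence):
        raise ValueError(f"position out of range for {mutation!r}")
    # Refuse to mutate if the named residue is not actually at that position.
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {index + 1}, "
            f"found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]
```

With the placeholder sequence `"MDKKYSIGLDIGT"`, `apply_substitution("MDKKYSIGLDIGT", "D10A")` replaces the aspartate at position 10 with alanine. The residue check guards against applying a mutation to a sequence numbered differently from the reference.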

In some cases, the RNA-guided endonuclease is a type V CRISPR/Cas protein. In some cases, the RNA-guided endonuclease is a type VI CRISPR/Cas protein. Examples and guidance related to type V and type VI CRISPR/Cas proteins (e.g., Cpf1, C2c1, C2c2, and C2c3 guide RNAs) can be found in the art, for example, see Zetsche et al, Cell. 2015 Oct. 22; 163(3):759-71; Makarova et al, Nat Rev Microbiol. 2015 November; 13(11):722-36; and Shmakov et al., Mol Cell. 2015 Nov. 5; 60(3):385-97.

In some cases, the RNA-guided endonuclease is a chimeric polypeptide (e.g., a fusion polypeptide) comprising: a) an RNA-guided endonuclease; and b) a fusion partner, where the fusion partner provides a functionality or activity other than an endonuclease activity. For example, the fusion partner can be a polypeptide having an enzymatic activity that modifies a polypeptide (e.g., a histone) associated with, or proximal to, a target nucleic acid (e.g., methyltransferase activity, deaminase activity (e.g., cytidine deaminase activity), demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity or demyristoylation activity).

In some cases, the RNA-guided endonuclease is a base editor; for example, in some cases, the RNA-guided endonuclease is a fusion polypeptide comprising: a) an RNA-guided endonuclease; and b) a cytidine deaminase. See, e.g., Komor et al. (2016) Nature 533:420.

Opsins

In some cases, a gene product encoded in a system of the present disclosure is a hyperpolarizing or a depolarizing light-activated polypeptide (an “opsin”). The light-activated polypeptide may be a light-activated ion channel or a light-activated ion pump. The light-activated ion channel polypeptides are adapted to allow one or more ions to pass through the plasma membrane of a neuron when the polypeptide is illuminated with light of an activating wavelength. Light-activated proteins may be characterized as ion pump proteins, which facilitate the passage of a small number of ions through the plasma membrane per photon of light, or as ion channel proteins, which allow a stream of ions to freely flow through the plasma membrane when the channel is open. In some embodiments, the light-activated polypeptide depolarizes the neuron when activated by light of an activating wavelength. Suitable depolarizing light-activated polypeptides, without limitation, are shown in FIG. 23. In some embodiments, the light-activated polypeptide hyperpolarizes the neuron when activated by light of an activating wavelength. Suitable hyperpolarizing light-activated polypeptides, without limitation, are shown in FIG. 24.

In some cases, a light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to an opsin amino acid sequence depicted in FIG. 23. In some cases, a light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to an opsin amino acid sequence depicted in FIG. 24.

In some embodiments, the light-activated polypeptides are activated by blue light. In some embodiments, the light-activated polypeptides are activated by green light. In some embodiments, the light-activated polypeptides are activated by yellow light. In some embodiments, the light-activated polypeptides are activated by orange light. In some embodiments, the light-activated polypeptides are activated by red light.

In some embodiments, the light-activated polypeptide expressed in a cell can be fused to one or more amino acid sequence motifs selected from the group consisting of a signal peptide, an endoplasmic reticulum (ER) export signal, a membrane trafficking signal, and/or an N-terminal Golgi export signal. The one or more amino acid sequence motifs which enhance light-activated protein transport to the plasma membranes of mammalian cells can be fused to the N-terminus, the C-terminus, or to both the N- and C-terminal ends of the light-activated polypeptide. In some cases, the one or more amino acid sequence motifs which enhance light-activated polypeptide transport to the plasma membranes of mammalian cells are fused internally within a light-activated polypeptide. Optionally, the light-activated polypeptide and the one or more amino acid sequence motifs may be separated by a linker.

In some embodiments, the light-activated polypeptide can be modified by the addition of a trafficking signal (ts) which enhances transport of the protein to the cell plasma membrane. In some embodiments, the trafficking signal can be derived from the amino acid sequence of the human inward rectifier potassium channel Kir2.1. In other embodiments, the trafficking signal can comprise the amino acid sequence KSRITSEGEYIPLDQIDINV (SEQ ID NO:56). Trafficking sequences that are suitable for use can comprise an amino acid sequence having at least 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, amino acid sequence identity to an amino acid sequence such as a trafficking sequence of human inward rectifier potassium channel Kir2.1 (e.g., KSRITSEGEYIPLDQIDINV (SEQ ID NO:56)).

A trafficking sequence can have a length of from about 10 amino acids to about 50 amino acids, e.g., from about 10 amino acids to about 20 amino acids, from about 20 amino acids to about 30 amino acids, from about 30 amino acids to about 40 amino acids, or from about 40 amino acids to about 50 amino acids.

ER export sequences that are suitable for use with a light-activated polypeptide include, e.g., VXXSL (where X is any amino acid; SEQ ID NO:52) (e.g., VKESL (SEQ ID NO:53); VLGSL (SEQ ID NO:54); etc.); NANSFCYENEVALTSK (SEQ ID NO:55); FXYENE (SEQ ID NO:57) (where X is any amino acid), e.g., FCYENEV (SEQ ID NO:58); and the like. An ER export sequence can have a length of from about 5 amino acids to about 25 amino acids, e.g., from about 5 amino acids to about 10 amino acids, from about 10 amino acids to about 15 amino acids, from about 15 amino acids to about 20 amino acids, or from about 20 amino acids to about 25 amino acids.
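As a non-limiting illustration, the consensus motifs VXXSL and FXYENE recited above (where X is any amino acid) can be located in a polypeptide sequence with a simple pattern search. The sketch below is one possible approach; the names `ER_EXPORT_PATTERNS` and `find_er_export_motifs` are hypothetical, X is assumed to be any single uppercase one-letter amino acid code, and reported positions are 0-based.

```python
import re

# Consensus motifs from the text; X (any amino acid) is written as [A-Z].
ER_EXPORT_PATTERNS = {
    "VXXSL": re.compile(r"V[A-Z][A-Z]SL"),
    "FXYENE": re.compile(r"F[A-Z]YENE"),
}


def find_er_export_motifs(sequence: str) -> list:
    """Return (motif name, 0-based start, matched text) for each motif hit."""
    hits = []
    for name, pattern in ER_EXPORT_PATTERNS.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start(), match.group(0)))
    return hits
```

Applied to the exemplary sequences, `find_er_export_motifs("VKESL")` reports a VXXSL hit at position 0, and `find_er_export_motifs("NANSFCYENEVALTSK")` reports an FXYENE hit ("FCYENE") at position 4.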

In some cases, a light-activated polypeptide is a fusion polypeptide that comprises an endoplasmic reticulum (ER) export signal (e.g., FCYENEV). In some cases, a light-activated polypeptide is a fusion polypeptide that comprises a membrane trafficking signal (e.g., KSRITSEGEYIPLDQIDINV). In some cases, a light-activated polypeptide is a fusion polypeptide comprising, in order from N-terminus to C-terminus: a) a light-activated polypeptide comprising an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to an opsin amino acid sequence depicted in FIG. 23 or FIG. 24; b) an ER export signal; and c) a membrane trafficking signal.

Toxins

Suitable toxins include polypeptide toxins present in a natural source (e.g., naturally-occurring), recombinantly produced toxins, and synthetically produced toxins. Suitable toxins include ribosome inactivating proteins (RIPs); a bacterial toxin; and the like.

Suitable toxins include, e.g., anthopleurin B (GVPCLCDSDG-PRPRGNTLSG-ILWFYPSGCP-SGWHNCKAHG-PNIGWCCKK; SEQ ID NO://), anthopleurin C, anthopleurin Q, calitoxin (MKTQVLALFV LCVLFCLAES RTTLNKRNDI EKRIECKCEG DAPDLSHMTG TVYFSCKGGD GSWSKCNTYT AVADCCHQA; SEQ ID NO://), a conotoxin, ectatomin, HsTx1, omega-atracotoxin, a raventoxin, a scorpion toxin, and the like.

Suitable bacterial toxins include, e.g., cholera toxin, botulinum toxin, diphtheria toxin (produced by Corynebacterium diphtheriae), tetanospasmin, an enterotoxin, hemolysin, shiga toxin, erythrogenic toxin, adenylate cyclase toxin, pertussis toxin, ST toxin, LT toxin, ricin, abrin, tetanus toxin, and the like.

Exemplary Type I RIPs include, but are not limited to, gelonin, dodecandrin, trichosanthin, trichokirin, bryodin, Mirabilis antiviral protein (MAP), barley ribosome-inactivating protein (BRIP), pokeweed antiviral proteins (PAPs), saporins, luffins, and momordins. Exemplary Type II RIPs include, but are not limited to, ricin and abrin.

Antibiotic Resistance Factors

As noted above, in some cases, the gene product of interest is an antibiotic resistance factor, e.g., a polypeptide that confers antibiotic resistance to a cell that produces the polypeptide.

Suitable antibiotic resistance factors include, but are not limited to, polypeptides that confer resistance to kanamycin, gentamicin, rifampin, trimethoprim, chloramphenicol, tetracycline, penicillin, methicillin, blasticidin, puromycin, hygromycin, or other antimicrobial agent. Suitable antibiotic resistance factors include, but are not limited to, aminoglycoside acetyltransferases, rifampin ADP-ribosyltransferases, dihydrofolate reductases, transporters, β-lactamases, chloramphenicol acetyltransferases, and efflux pumps. See, e.g., McGarvey et al. (2012) Applied Environ. Microbiol. 78:1708. Suitable antibiotic resistance factors include, but are not limited to, aminoglycoside 6′-N-acetyltransferase; gentamicin 3′-N-acetyltransferase; rifampin ADP-ribosyltransferase; dihydrofolate reductase; MFS transporter; ABC transporter; blasticidin-S deaminase; blasticidin acetyltransferase; puromycin N-acetyl-transferase; hygromycin kinase; and the like.

Recombinases

In some cases, the gene product of interest is a recombinase. The term “recombinase” refers to an enzyme that catalyzes DNA exchange at a specific target site, for example, a palindromic sequence, by excision/insertion, inversion, translocation, and exchange.

Suitable recombinases include, but are not limited to, a Cre recombinase; a FLP recombinase; a Tel recombinase; and the like. A suitable recombinase is one that targets (and cleaves) a target site selected from a telRL site, a loxP site, a phi K02 telRL site, an FRT site, a phiC31 attP site, and a λ attP site.

A suitable recombinase can be selected from the group consisting of: TelN; Tel; Tel (gp26 K02 phage); Cre; Flp; phiC31; Int; and a lambdoid phage integrase (e.g. a phi 80 recombinase, a HK022 recombinase; an HP1 recombinase).

Examples of target sites for such recombinases include, e.g.: a telRL site (targeted by a TelN recombinase): TATCAGCACACAATTGCCCATTATACGCGCGTATAATGGACTATTGTGTGCTGA (SEQ ID NO://); a pal site: ACCTATTTCAGCATACTACGCGCGTAGTATGCTGAAATAGGT (SEQ ID NO://); a phi K02 telRL site: CCATTATACGCGCGTATAATGG (SEQ ID NO://); a loxP site (targeted by a Cre recombinase): TAACTTCGTATAGCATACATTATACGAAGTTAT (SEQ ID NO://); an FRT site (targeted by a Flp recombinase): GAAGTTCCTATTCTCTAGAAAGTATAGGAACTTC (SEQ ID NO://); a phiC31 attP site (targeted by a phiC31 recombinase): CCCAGGTCAGAAGCGGTTTTCGGGAGTAGTGCCCCAACTGGGGTAACCTTTGAGTTCTCTCAGTTGGGGGCGTAGGGTCGCCGACAYGACACAAGGGGTT (SEQ ID NO://); and a λ attP site: TGATAGTGACCTGTTCGTTTGCAACACATTGATGAGCAATGCTTTTTTATAATGCCAACTTTGTACAAAAAAGCTGAACGAGAAACGTAAAATGATATAAA (SEQ ID NO://).
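Because a recombinase target site may be a palindromic sequence, it can be useful to check whether a given site reads the same as its reverse complement. The following is a minimal sketch under stated assumptions: the function names are hypothetical, only the unambiguous bases A, C, G, and T are complemented (an ambiguity code such as Y is left unchanged and is therefore not handled), and "palindromic" is taken to mean exactly equal to the reverse complement.

```python
# Translation table for complementing unambiguous DNA bases only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase-normalized DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(dna: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return dna.upper() == reverse_complement(dna)
```

Under this strict definition, the pal site and the phi K02 telRL site listed above are perfect palindromes, whereas the loxP sequence as listed is not, since inverted repeats flanking an asymmetric spacer do not form a perfect palindrome.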

Additional Amino Acid Sequences

In some cases, the gene product is a fusion polypeptide comprising a fusion partner, where the fusion partner can be, e.g., a soma localization signal, a nuclear localization signal, a protein transduction domain, a mitochondrial localization signal, a chloroplast localization signal, an endoplasmic reticulum retention signal, an epitope tag, etc. For example, a suitable mitochondrial localization sequence is LGRVIPRKIASRASLM (SEQ ID NO://); or MSVLTPLLLRGLTGSARRLPVPRAKIHSLL (SEQ ID NO://).

Soma Localization Signal

In some cases, the transcription factor includes a soma localization signal. For example, a 66 amino acid C-terminal sequence of Kv2.1 or a 27 amino acid sequence of Nav1.6 induces localization to the soma of a neuron. For example, the Nav1.6 soma localization signal comprises the amino acid sequence: TVRVPIAVGESDFENLNTEDVSSESDP (SEQ ID NO://).

Nuclear Localization Signals

Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO://); the NLS from nucleoplasmin (e.g. the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO://)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO://) or RQRRNELKRSP (SEQ ID NO://); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO://); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO://) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO://) and PPKKARED (SEQ ID NO://) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO://) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO://) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO://) and PKQKKRK (SEQ ID NO://) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO://) of the Hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO://) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO://) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO://) of the steroid hormone receptors (human) glucocorticoid.

A gene product can include a “Protein Transduction Domain” or PTD (also known as a CPP—cell penetrating peptide), which refers to a polypeptide that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane. A PTD attached to another polypeptide (a polypeptide gene product of interest) facilitates the polypeptide traversing a membrane, for example going from extracellular space to intracellular space, or cytosol to within an organelle. In some cases, a PTD attached to a polypeptide gene product of interest facilitates entry of the polypeptide into the nucleus (e.g., in some cases, a PTD includes a nuclear localization signal). In some cases, a PTD is covalently linked to the amino terminus of a polypeptide gene product of interest. In some cases, a PTD is covalently linked to the carboxyl terminus of a polypeptide gene product of interest. In some cases, a PTD is covalently linked to the amino terminus and to the carboxyl terminus of a polypeptide gene product of interest. Exemplary PTDs include but are not limited to a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO://); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); a Drosophila Antennapedia protein transduction domain (Noguchi et al. (2003) Diabetes 52(7):1732-1737); a truncated human calcitonin peptide (Trehin et al. (2004) Pharm. Research 21:1248-1256); polylysine (Wender et al. (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008); RRQRRTSKLMKR (SEQ ID NO://); Transportan GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO://); KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO://); and RQIKIWFQNRRMKWKK (SEQ ID NO://).
Exemplary PTDs include, but are not limited to, YGRKKRRQRRR (SEQ ID NO://); RKKRRQRRR (SEQ ID NO://); and an arginine homopolymer of from 3 arginine residues to 50 arginine residues. Exemplary PTD domain amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO://); RKKRRQRR (SEQ ID NO://); YARAAARQARA (SEQ ID NO://); THRLPRRRRRR (SEQ ID NO://); and GGRRARRRRRR (SEQ ID NO://).
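For illustration, whether a candidate peptide is an arginine homopolymer within the recited range of 3 to 50 residues can be tested with a single pattern. This is a sketch only; the name `is_arginine_homopolymer` is hypothetical, and the peptide is assumed to be given in uppercase one-letter code.

```python
import re

# 3 to 50 consecutive arginine residues and nothing else.
_POLYARGININE = re.compile(r"R{3,50}")


def is_arginine_homopolymer(peptide: str) -> bool:
    """True if the whole peptide is 3 to 50 arginine (R) residues."""
    return _POLYARGININE.fullmatch(peptide) is not None
```

Here `fullmatch` is used rather than `search` so that a longer peptide that merely contains a run of arginines (e.g., YGRKKRRQRRR) is not classified as a homopolymer.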

Nucleic Acids

As noted above, a nucleic acid system of the present disclosure (e.g., System 1; System 2; as described above) comprises two nucleic acids.

In some cases, the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide and/or the nucleotide sequence encoding the second fusion polypeptide (the second fusion polypeptide comprising a calmodulin polypeptide or a troponin C polypeptide fused to a protease) is operably linked to a transcriptional control element (e.g., a promoter; an enhancer; etc.). In some cases, the transcriptional control element is inducible. In some cases, the transcriptional control element is constitutive. In some cases, the promoters are functional in eukaryotic cells. In some cases, the promoters are cell type-specific promoters. In some cases, the promoters are tissue-specific promoters. In some cases, the promoter to which the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide is operably linked, and the promoter to which the nucleotide sequence encoding the second fusion polypeptide is operably linked, are substantially the same. In other cases, the promoter to which the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide is operably linked is different from the promoter to which the nucleotide sequence encoding the second fusion polypeptide is operably linked.

Depending on the host/vector system utilized, any of a number of suitable transcription and translation control elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector (see e.g., Bitter et al. (1987) Methods in Enzymology, 153:516-544).

A promoter can be a constitutively active promoter (i.e., a promoter that is constitutively in an active/“ON” state), it may be an inducible promoter (i.e., a promoter whose state, active/“ON” or inactive/“OFF”, is controlled by an external stimulus, e.g., the presence of a particular temperature, compound, or protein), it may be a spatially restricted promoter (i.e., transcriptional control element, enhancer, etc.) (e.g., tissue specific promoter, cell type specific promoter, etc.), or it may be a temporally restricted promoter (i.e., the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process, e.g., hair follicle cycle in mice).

Suitable promoter and enhancer elements are known in the art. For expression in a eukaryotic cell, suitable promoters include, but are not limited to, light and/or heavy chain immunoglobulin gene promoter and enhancer elements; cytomegalovirus immediate early promoter; herpes simplex virus thymidine kinase promoter; early and late SV40 promoters; promoter present in long terminal repeats from a retrovirus; mouse metallothionein-I promoter; and various art-known tissue-specific promoters. Suitable promoters include, but are not limited to, the SV40 early promoter; mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter; a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE); a Rous sarcoma virus (RSV) promoter; a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)); an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep. 1; 31(17)); a human H1 promoter (H1); and the like.

Suitable reversible promoters, including reversible inducible promoters, are known in the art. Such reversible promoters may be isolated and derived from many organisms, e.g., eukaryotes and prokaryotes. Modification of reversible promoters derived from a first organism for use in a second organism, e.g., a first prokaryote and a second eukaryote, a first eukaryote and a second prokaryote, etc., is well known in the art. Such reversible promoters, and systems based on such reversible promoters but also comprising additional control proteins, include, but are not limited to, alcohol regulated promoters (e.g., alcohol dehydrogenase I (alcA) gene promoter, promoters responsive to alcohol transactivator proteins (AlcR), etc.), tetracycline regulated promoters (e.g., promoter systems including TetActivators, TetON, TetOFF, etc.), steroid regulated promoters (e.g., rat glucocorticoid receptor promoter systems, human estrogen receptor promoter systems, retinoid promoter systems, thyroid promoter systems, ecdysone promoter systems, mifepristone promoter systems, etc.), metal regulated promoters (e.g., metallothionein promoter systems, etc.), pathogenesis-related regulated promoters (e.g., salicylic acid regulated promoters, ethylene regulated promoters, benzothiadiazole regulated promoters, etc.), temperature regulated promoters (e.g., heat shock inducible promoters (e.g., HSP-70, HSP-90, soybean heat shock promoter, etc.)), light regulated promoters, synthetic inducible promoters, and the like.

Inducible promoters suitable for use include any inducible promoter described herein or known to one of ordinary skill in the art. Examples of inducible promoters include, without limitation, chemically/biochemically-regulated and physically-regulated promoters such as alcohol-regulated promoters, tetracycline-regulated promoters (e.g., anhydrotetracycline (aTc)-responsive promoters and other tetracycline-responsive promoter systems, which include a tetracycline repressor protein (tetR), a tetracycline operator sequence (tetO) and a tetracycline transactivator fusion protein (tTA)), steroid-regulated promoters (e.g., promoters based on the rat glucocorticoid receptor, human estrogen receptor, moth ecdysone receptors, and promoters from the steroid/retinoid/thyroid receptor superfamily), metal-regulated promoters (e.g., promoters derived from metallothionein (proteins that bind and sequester metal ions) genes from yeast, mouse and human), pathogenesis-regulated promoters (e.g., induced by salicylic acid, ethylene or benzothiadiazole (BTH)), temperature/heat-inducible promoters (e.g., heat shock promoters), and light-regulated promoters (e.g., light responsive promoters from plant cells).

In some cases, the promoter is a neuron-specific promoter. Suitable neuron-specific control sequences include, but are not limited to, a neuron-specific enolase (NSE) promoter (see, e.g., EMBL HSENO2, X51956; see also, e.g., U.S. Pat. No. 6,649,811, U.S. Pat. No. 5,387,742); an aromatic amino acid decarboxylase (AADC) promoter; a neurofilament promoter (see, e.g., GenBank HUMNFL, L04147); a synapsin promoter (see, e.g., GenBank HUMSYNIB, M55301); a thy-1 promoter (see, e.g., Chen et al. (1987) Cell 51:7-19; and Llewellyn et al. (2010) Nat. Med. 16:1161); a serotonin receptor promoter (see, e.g., GenBank S62283); a tyrosine hydroxylase promoter (TH) (see, e.g., Nucl. Acids. Res. 15:2363-2384 (1987) and Neuron 6:583-594 (1991)); a GnRH promoter (see, e.g., Radovick et al., Proc. Natl. Acad. Sci. USA 88:3402-3406 (1991)); an L7 promoter (see, e.g., Oberdick et al., Science 248:223-226 (1990)); a DNMT promoter (see, e.g., Bartge et al., Proc. Natl. Acad. Sci. USA 85:3648-3652 (1988)); an enkephalin promoter (see, e.g., Comb et al., EMBO J. 17:3793-3805 (1988)); a myelin basic protein (MBP) promoter; a CMV enhancer/platelet-derived growth factor-β promoter (see, e.g., Liu et al. (2004) Gene Therapy 11:52-60); a motor neuron-specific gene Hb9 promoter (see, e.g., U.S. Pat. No. 7,632,679; and Lee et al. (2004) Development 131:3295-3306); and an alpha subunit of Ca(²⁺)-calmodulin-dependent protein kinase II (CaMKIIα) promoter (see, e.g., Mayford et al. (1996) Proc. Natl. Acad. Sci. USA 93:13250). Other suitable promoters include elongation factor (EF) 1α and dopamine transporter (DAT) promoters.

In some cases, a nucleic acid of a system of the present disclosure is a recombinant expression vector. In some cases, the recombinant expression vector is a viral construct, e.g., a recombinant adeno-associated virus (AAV) construct, a recombinant adenoviral construct, a recombinant lentiviral construct, a recombinant retroviral construct, etc. In some cases, a nucleic acid of a system of the present disclosure is a recombinant lentivirus vector. In some cases, a nucleic acid of a system of the present disclosure is a recombinant AAV vector.

Suitable expression vectors include, but are not limited to, viral vectors (e.g. viral vectors based on vaccinia virus; poliovirus; adenovirus (see, e.g., Li et al., Invest Ophthalmol Vis Sci 35:2543-2549, 1994; Borras et al., Gene Ther 6:515-524, 1999; Li and Davidson, PNAS 92:7700-7704, 1995; Sakamoto et al., Hum Gene Ther 5:1088-1097, 1999; WO 94/12649, WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655); adeno-associated virus (see, e.g., Ali et al., Hum Gene Ther 9:81-86, 1998, Flannery et al., PNAS 94:6916-6921, 1997; Bennett et al., Invest Ophthalmol Vis Sci 38:2857-2863, 1997; Jomary et al., Gene Ther 4:683-690, 1997, Rolling et al., Hum Gene Ther 10:641-648, 1999; Ali et al., Hum Mol Genet 5:591-594, 1996; Srivastava in WO 93/09239, Samulski et al., J. Vir. (1989) 63:3822-3828; Mendelson et al., Virol. (1988) 166:154-165; and Flotte et al., PNAS (1993) 90:10613-10617); SV40; herpes simplex virus; human immunodeficiency virus (see, e.g., Miyoshi et al., PNAS 94:10319-23, 1997; Takahashi et al., J Virol 73:7812-7816, 1999); a retroviral vector (e.g., Murine Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, a lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus); and the like). In some cases, the vector is a lentivirus vector. Also suitable are transposon-mediated vectors, such as piggyBac and Sleeping Beauty vectors.

In some cases, a nucleic acid system of the present disclosure is packaged in a viral particle. For example, in some cases, the nucleic acids of a nucleic acid system of the present disclosure are recombinant AAV vectors, and are packaged in recombinant AAV particles. Thus, the present disclosure provides a recombinant viral particle comprising a nucleic acid system of the present disclosure.

Genetically Modified Host Cells

The present disclosure provides a genetically modified host cell (e.g., an in vitro genetically modified host cell) comprising a nucleic acid system of the present disclosure. In some cases, one or both of the first and the second nucleic acid of a nucleic acid system of the present disclosure is stably integrated into the genome of the host cell. In some instances, one or both of the first and the second nucleic acid of a nucleic acid system of the present disclosure is present episomally in the genetically modified host cell.

In some cases, the genetically modified host cell is a primary (non-immortalized) cell. In some cases, the genetically modified host cell is an immortalized cell line.

Suitable host cells include mammalian cells, insect cells, reptile cells, amphibian cells, arachnid cells, plant cells, bacterial cells, archaeal cells, yeast cells, algal cells, fungal cells, and the like.

In some cases, the genetically modified host cell is a mammalian cell, e.g., a human cell, a non-human primate cell, a rodent cell, a feline (e.g., a cat) cell, a canine (e.g., a dog) cell, an ungulate cell, an equine (e.g., a horse) cell, an ovine cell, a caprine cell, a bovine cell, etc. In some cases, the genetically modified host cell is a rodent cell (e.g., a rat cell; a mouse cell). In some cases, the genetically modified host cell is a human cell. In some cases, the genetically modified host cell is a non-human primate cell.

Suitable mammalian cells include primary cells and immortalized cell lines. Suitable mammalian cell lines include human cell lines, non-human primate cell lines, rodent (e.g., mouse, rat) cell lines, and the like. Suitable mammalian cell lines include, but are not limited to, HeLa cells (e.g., American Type Culture Collection (ATCC) No. CCL-2), CHO cells (e.g., ATCC Nos. CRL9618, CCL61, CRL9096), 293 cells (e.g., ATCC No. CRL-1573), Vero cells, NIH 3T3 cells (e.g., ATCC No. CRL-1658), Huh-7 cells, BHK cells (e.g., ATCC No. CCL10), PC12 cells (ATCC No. CRL1721), COS cells, COS-7 cells (ATCC No. CRL1651), RAT1 cells, mouse L cells (ATCC No. CCL1.3), human embryonic kidney (HEK) cells (ATCC No. CRL1573), HLHepG2 cells, and the like.

Suitable host cells include cells of, e.g., Bacteria (e.g., Eubacteria); Archaebacteria; Protista; Fungi; Plantae; and Animalia. Suitable host cells include cells of plant-like members of the kingdom Protista, including, but not limited to, algae (e.g., green algae, red algae, glaucophytes, cyanobacteria); fungus-like members of Protista, e.g., slime molds, water molds, etc.; animal-like members of Protista, e.g., flagellates (e.g., Euglena), amoeboids (e.g., amoeba), sporozoans (e.g., Apicomplexa, Myxozoa, Microsporidia), and ciliates (e.g., Paramecium). Suitable host cells include cells of members of the kingdom Fungi, including, but not limited to, members of any of the phyla: Basidiomycota (club fungi; e.g., members of Agaricus, Amanita, Boletus, Cantharellus, etc.); Ascomycota (sac fungi, including, e.g., Saccharomyces); Mycophycophyta (lichens); Zygomycota (conjugation fungi); and Deuteromycota. Suitable host cells include cells of members of the kingdom Plantae, including, but not limited to, members of any of the following divisions: Bryophyta (e.g., mosses), Anthocerotophyta (e.g., hornworts), Hepaticophyta (e.g., liverworts), Lycophyta (e.g., club mosses), Sphenophyta (e.g., horsetails), Psilophyta (e.g., whisk ferns), Ophioglossophyta, Pterophyta (e.g., ferns), Cycadophyta, Ginkgophyta, Pinophyta, Gnetophyta, and Magnoliophyta (e.g., flowering plants).
Suitable host cells include cells of members of the kingdom Animalia, including, but not limited to, members of any of the following phyla: Porifera (sponges); Placozoa; Orthonectida (parasites of marine invertebrates); Rhombozoa; Cnidaria (corals, anemones, jellyfish, sea pens, sea pansies, sea wasps); Ctenophora (comb jellies); Platyhelminthes (flatworms); Nemertina (ribbon worms); Gnathostomulida (jawed worms); Gastrotricha; Rotifera; Priapulida; Kinorhyncha; Loricifera; Acanthocephala; Entoprocta; Nematoda; Nematomorpha; Cycliophora; Mollusca (mollusks); Sipuncula (peanut worms); Annelida (segmented worms); Tardigrada (water bears); Onychophora (velvet worms); Arthropoda (including the subphyla: Chelicerata, Myriapoda, Hexapoda, and Crustacea, where the Chelicerata include, e.g., arachnids, Merostomata, and Pycnogonida, where the Myriapoda include, e.g., Chilopoda (centipedes), Diplopoda (millipedes), Pauropoda, and Symphyla, where the Hexapoda include insects, and where the Crustacea include shrimp, krill, barnacles, etc.); Phoronida; Ectoprocta (moss animals); Brachiopoda; Echinodermata (e.g., starfish, sea daisies, feather stars, sea urchins, sea cucumbers, brittle stars, brittle baskets, etc.); Chaetognatha (arrow worms); Hemichordata (acorn worms); and Chordata. Suitable members of Chordata include any member of the following subphyla: Urochordata (sea squirts; including Ascidiacea, Thaliacea, and Larvacea); Cephalochordata (lancelets); Myxini (hagfish); and Vertebrata, where members of Vertebrata include, e.g., members of Petromyzontida (lampreys), Chondrichthyes (cartilaginous fish), Actinopterygii (ray-finned fish), Actinistia (coelacanths), Dipnoi (lungfish), Reptilia (reptiles, e.g., snakes, alligators, crocodiles, lizards, etc.), Aves (birds), and Mammalia (mammals). Suitable plant cells include cells of any monocotyledon and cells of any dicotyledon. Plant cells include, e.g., a cell of a leaf, a root, a tuber, a flower, and the like.
In some cases, the genetically modified host cell is a plant cell. In some cases, the genetically modified host cell is a bacterial cell. In some cases, the genetically modified host cell is an archaeal cell.

Suitable eukaryotic host cells include, but are not limited to, Pichia pastoris, Pichia finlandica, Pichia trehalophila, Pichia koclamae, Pichia membranaefaciens, Pichia opuntiae, Pichia thermotolerans, Pichia salictaria, Pichia quercuum, Pichia pijperi, Pichia stipitis, Pichia methanolica, Pichia sp., Saccharomyces cerevisiae, Saccharomyces sp., Hansenula polymorpha, Kluyveromyces sp., Kluyveromyces lactis, Candida albicans, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Trichoderma reesei, Chrysosporium lucknowense, Fusarium sp., Fusarium gramineum, Fusarium venenatum, Neurospora crassa, Chlamydomonas reinhardtii, and the like. In some cases, the genetically modified host cell is a yeast cell. In some instances, the yeast cell is Saccharomyces cerevisiae.

Suitable prokaryotic cells include any of a variety of bacteria, including laboratory bacterial strains, pathogenic bacteria, etc. Suitable prokaryotic hosts include, but are not limited to, any of a variety of gram-positive, gram-negative, or gram-variable bacteria. Examples include, but are not limited to, cells belonging to the genera: Agrobacterium, Alicyclobacillus, Anabaena, Anacystis, Arthrobacter, Azotobacter, Bacillus, Brevibacterium, Chromatium, Clostridium, Corynebacterium, Enterobacter, Erwinia, Escherichia, Lactobacillus, Lactococcus, Mesorhizobium, Methylobacterium, Microbacterium, Phormidium, Pseudomonas, Rhodobacter, Rhodopseudomonas, Rhodospirillum, Rhodococcus, Salmonella, Scenedesmus, Serratia, Shigella, Staphylococcus, Streptomyces, Synechococcus, and Zymomonas. Examples of prokaryotic strains include, but are not limited to: Bacillus subtilis, Bacillus amyloliquefaciens, Brevibacterium ammoniagenes, Brevibacterium immariophilum, Clostridium beijerinckii, Enterobacter sakazakii, Escherichia coli, Lactococcus lactis, Mesorhizobium loti, Pseudomonas aeruginosa, Pseudomonas mevalonii, Pseudomonas putida, Rhodobacter capsulatus, Rhodobacter sphaeroides, Rhodospirillum rubrum, Salmonella enterica, Salmonella typhi, Salmonella typhimurium, Shigella dysenteriae, Shigella flexneri, Shigella sonnei, and Staphylococcus aureus. One example of a suitable bacterial host cell is an Escherichia coli cell.

Suitable plant cells include cells of a monocotyledon; cells of a dicotyledon; cells of an angiosperm; cells of a gymnosperm; etc.

System for Light-Activated, Calcium-Gated Transcription Control

The present disclosure provides a system (a “FLARE” system) for light-activated, calcium-gated transcriptional control of expression of a target gene product. A FLARE system of the present disclosure in some cases comprises 3 components: 1) a first fusion polypeptide comprising: a) a calcium-binding polypeptide; and b) a protease; 2) a second fusion polypeptide comprising: a) a transmembrane domain; b) a polypeptide that binds the calcium-binding polypeptide under certain Ca²⁺ concentration conditions (e.g., a Ca²⁺ concentration above about 100 nM); c) a light-activated polypeptide comprising a LOV domain; d) a proteolytically cleavable linker that is caged by the light-activated polypeptide in the absence of blue light; and e) a transcription factor; and 3) a construct that comprises: a) a promoter that is activated by the transcription factor; and b) a nucleotide sequence encoding a gene product of interest, where the nucleotide sequence is operably linked to the promoter. Each of these components is described in detail below. In some cases, a FLARE system of the present disclosure comprises one of the above-mentioned components. In some cases, a FLARE system of the present disclosure comprises two of the above-mentioned components.
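The coincidence-detection logic described above can be illustrated with a minimal Boolean sketch. This is an illustration only, not the disclosed implementation: the class and function names are hypothetical, the 100 nM value is taken from the example threshold in the text, and real calcium binding and LOV uncaging are graded rather than all-or-nothing.

```python
from dataclasses import dataclass

# Assumption: "about 100 nM" is the example threshold from the text; a sharp
# cutoff is a simplification of a graded binding response.
CA_THRESHOLD_NM = 100.0


@dataclass
class FlareState:
    """Hypothetical snapshot of the conditions inside one cell."""
    calcium_nM: float              # intracellular Ca2+ concentration
    blue_light: bool               # whether blue light is being delivered
    reporter_present: bool = True  # component 3: promoter plus gene of interest


def protease_recruited(state: FlareState) -> bool:
    # Component 1 (calcium-binding polypeptide + protease) binds component 2
    # only when Ca2+ is above the threshold.
    return state.calcium_nM > CA_THRESHOLD_NM


def linker_uncaged(state: FlareState) -> bool:
    # The LOV domain cages the cleavable linker in the absence of blue light.
    return state.blue_light


def transcription_factor_released(state: FlareState) -> bool:
    # Cleavage needs both the recruited protease and an accessible linker.
    return protease_recruited(state) and linker_uncaged(state)


def target_gene_expressed(state: FlareState) -> bool:
    return state.reporter_present and transcription_factor_released(state)
```

In this sketch, neither elevated calcium alone nor blue light alone releases the transcription factor; only their coincidence does, which is the gating behavior the three components are arranged to provide.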

The present disclosure provides one or more nucleic acids comprising nucleotide sequences encoding one or more components of a FLARE system of the present disclosure, as well as genetically modified host cells comprising the one or more nucleic acids.

Thus, the present disclosure provides a system comprising: 1) a first fusion polypeptide comprising: a) a calcium-binding polypeptide selected from a calmodulin polypeptide and a troponin C polypeptide; and b) a protease; 2) a second fusion polypeptide comprising: a) a transmembrane domain; b) a polypeptide that binds the calcium-binding polypeptide under certain Ca²⁺ concentration conditions (e.g., a Ca²⁺ concentration above about 100 nM); c) a light-activated polypeptide comprising a LOV domain; d) a proteolytically cleavable linker that is caged by the light-activated polypeptide in the absence of blue light; and e) a transcription factor. The present disclosure provides a nucleic acid system comprising: 1) a first nucleic acid comprising a nucleotide sequence encoding the first fusion polypeptide; and 2) a second nucleic acid comprising a nucleotide sequence encoding the second fusion polypeptide. In some cases, the system comprises a genetically modified host cell, where the host cell is genetically modified with a nucleotide sequence encoding a gene product of interest, where the nucleotide sequence is operably linked to a promoter that is controlled by the transcription factor.

The present disclosure provides a system comprising: a nucleic acid comprising: a) a nucleotide sequence encoding a fusion polypeptide comprising: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide that binds calmodulin or troponin C, respectively, under certain Ca²⁺ concentration conditions (e.g., a Ca²⁺ concentration above about 100 nM); iii) a light-activated polypeptide comprising a LOV domain; and iv) a proteolytically cleavable linker that is caged by the light-activated polypeptide in the absence of blue light; and b) an insertion site for inserting a nucleic acid comprising a nucleotide sequence encoding a transcription factor.

Fusion Polypeptide Comprising a Calcium-Binding Protein and a Protease

As noted above, a component of a FLARE system of the present disclosure can include a fusion polypeptide comprising: a) a calcium-binding polypeptide selected from a calmodulin polypeptide and a troponin C polypeptide; and b) a protease.

Calmodulin

In some cases, a suitable calmodulin polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following calmodulin amino acid sequence: MDQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD EEVDEMIREADIDGDGQVNYEEFVQMMTAK (SEQ ID NO://); and has a substitution of V35; and has a length of from about 148 amino acids to about 160 amino acids. In some cases, the calmodulin polypeptide has a length of 148 amino acids. In some cases, the V35 substitution is a V35G substitution, a V35A substitution, a V35L substitution, or a V35I substitution.

In some cases, a suitable calmodulin polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following calmodulin amino acid sequence: MDQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD EEVDEMIREADIDGDGQVNYEEFVQMMTAK (SEQ ID NO://); and has an F19 substitution (e.g., an F19L substitution, an F19I substitution, an F19V substitution, or an F19A substitution) and a V35 substitution (e.g., a V35G substitution, a V35A substitution, a V35L substitution, or a V35I substitution); and has a length of from about 148 amino acids to about 160 amino acids. In some cases, the calmodulin polypeptide has a length of 148 amino acids.

In some cases, a suitable calmodulin polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following calmodulin amino acid sequence: MDQLTEEQIAEFKEAFSLLDKDGDGTITTKELGTGMRSLGQNPTEAELQDMINEVDADG DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD EEVDEMIREADIDGDGQVNYEEFVQMMTAK (SEQ ID NO://); and comprises a Leu at amino acid 19 and a Gly at amino acid 35; and has a length of from about 148 amino acids to about 160 amino acids. In some cases, the calmodulin polypeptide has a length of 148 amino acids.
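The relationship between the reference calmodulin sequence and the Leu-19/Gly-35 variant recited above can be checked with a short sketch. The helper names are hypothetical, and the comparison is a simple position-by-position count over equal-length sequences; percent identity between sequences of different lengths would normally be computed from an alignment, which this sketch does not attempt.

```python
# Reference calmodulin sequence as recited in the text (148 amino acids).
REFERENCE_CALMODULIN = (
    "MDQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG"
    "DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD"
    "EEVDEMIREADIDGDGQVNYEEFVQMMTAK"
)

# Variant recited in the text, with Leu at position 19 and Gly at position 35.
VARIANT_CALMODULIN = (
    "MDQLTEEQIAEFKEAFSLLDKDGDGTITTKELGTGMRSLGQNPTEAELQDMINEVDADG"
    "DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD"
    "EEVDEMIREADIDGDGQVNYEEFVQMMTAK"
)


def list_substitutions(reference: str, variant: str) -> list[str]:
    """Return substitutions such as 'F19L' using 1-based residue numbering."""
    if len(reference) != len(variant):
        raise ValueError("this ungapped comparison needs equal-length sequences")
    return [
        f"{ref}{position}{var}"
        for position, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


def percent_identity(reference: str, variant: str) -> float:
    """Ungapped percent identity over equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("this ungapped comparison needs equal-length sequences")
    matches = sum(ref == var for ref, var in zip(reference, variant))
    return 100.0 * matches / len(reference)


substitutions = list_substitutions(REFERENCE_CALMODULIN, VARIANT_CALMODULIN)
identity = percent_identity(REFERENCE_CALMODULIN, VARIANT_CALMODULIN)
```

Two substitutions in 148 residues leave 146 identical positions, so the variant falls within the "at least 98%" identity tier relative to the reference sequence.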

Troponin C

A suitable troponin C polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following troponin C amino acid sequence: MTDQQAEARSYLSEEMIAEFKAAFDMFDADGGGDISVKELGTVMRMLGQTPTKEELD AIIEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRDANGYIDAEELA EIFRASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ (SEQ ID NO://); and has a length of from about 160 amino acids to about 175 amino acids (e.g., from about 160 amino acids to about 165 amino acids, from about 165 amino acids to about 170 amino acids, or from about 170 amino acids to about 175 amino acids). In some cases, a suitable troponin C polypeptide comprises the amino acid sequence: MTDQQAEARSYLSEEMIAEFKAAFDMFDADGGGDISVKELGTVMRMLGQTPTKEELD AIIEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRDANGYIDAEELA EIFRASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ (SEQ ID NO://); and has a length of 160 amino acids.

Proteases

In some cases, the protease is a protease that is normally produced in a particular cell; e.g., the protease is an endogenous protease.

Suitable proteases include, but are not limited to, alanine carboxypeptidase, Armillaria mellea astacin, bacterial leucyl aminopeptidase, cancer procoagulant, cathepsin B, clostripain, cytosol alanyl aminopeptidase, elastase, endoproteinase Arg-C, enterokinase, gastricsin, gelatinase, Gly-X carboxypeptidase, glycyl endopeptidase, human rhinovirus 3C protease, hypodermin C, IgA-specific serine endopeptidase, leucyl aminopeptidase, leucyl endopeptidase, lysC, lysosomal pro-X carboxypeptidase, lysyl aminopeptidase, methionyl aminopeptidase, myxobacter, nardilysin, pancreatic endopeptidase E, picornain 2A, picornain 3C, proendopeptidase, prolyl aminopeptidase, proprotein convertase I, proprotein convertase II, russellysin, saccharopepsin, semenogelase, T-plasminogen activator, thrombin, tissue kallikrein, tobacco etch virus (TEV), togavirin, tryptophanyl aminopeptidase, U-plasminogen activator, V8, venombin A, venombin AB, and Xaa-pro aminopeptidase.

Fusion Polypeptide Comprising a Transcription Factor

As noted above, a component of a FLARE system of the present disclosure can include a fusion polypeptide comprising: a) a transmembrane domain; b) a polypeptide that binds a calmodulin polypeptide or a troponin C polypeptide under certain Ca²⁺ concentration conditions (e.g., a Ca²⁺ concentration above about 100 nM); c) a light-activated polypeptide comprising a LOV domain; d) a proteolytically cleavable linker that is caged by the light-activated polypeptide in the absence of blue light; and e) a transcription factor.

The present disclosure provides a light-activated, calcium-gated transcriptional control polypeptide. A light-activated, calcium-gated transcriptional control polypeptide can comprise, in order from amino terminus (N-terminus) to carboxyl terminus (C-terminus): i) a transmembrane domain; ii) a polypeptide that binds a calmodulin polypeptide or a troponin C polypeptide under certain Ca²⁺ concentration conditions (e.g., a Ca²⁺ concentration above about 100 nM); iii) a light-activated polypeptide that comprises a LOV domain; iv) a proteolytically cleavable linker; and v) a transcription factor.

Transmembrane Domain

Any of a variety of transmembrane domains (transmembrane polypeptides) can be used in a light-activated, calcium-gated transcriptional control polypeptide of the present disclosure. A suitable transmembrane domain is any polypeptide that is thermodynamically stable in a membrane, e.g., a eukaryotic cell membrane such as a mammalian cell membrane. Suitable transmembrane domains include a single alpha helix, a transmembrane beta barrel, or any other structure.

Calmodulin-Binding Polypeptides and Troponin I Polypeptides

In some cases, a light-activated, calcium-gated transcriptional control polypeptide comprises a calmodulin-binding polypeptide. In some cases, a light-activated, calcium-gated transcriptional control polypeptide comprises a troponin I polypeptide.

Troponin I Polypeptides

In some cases, a suitable troponin I polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following troponin I amino acid sequence: NQKLFDLRGKFKRPPLRRVRMSADAMLKALLGSKHKVAMDLRAN (SEQ ID NO://); and has a length of from about 44 amino acids to about 50 amino acids (e.g., 44, 45, 46, 47, 48, 49, or 50 amino acids). In some cases, the troponin I polypeptide has the amino acid sequence: NQKLFDLRGKFKRPPLRRVRMSADAMLKALLGSKHKVAMDLRAN (SEQ ID NO://); and has a length of 44 amino acids.

Calmodulin-Binding Polypeptides

In some cases, a suitable calmodulin polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following calmodulin amino acid sequence: MDQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD EEVDEMIREADIDGDGQVNYEEFVQMMTAK (SEQ ID NO://); and has a substitution of V35; and has a length of from about 148 amino acids to about 160 amino acids. In some cases, the calmodulin polypeptide has a length of 148 amino acids. In some cases, the V35 substitution is a V35G substitution, a V35A substitution, a V35L substitution, or a V35I substitution.

In some cases, a suitable calmodulin polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following calmodulin amino acid sequence: MDQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADG DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD EEVDEMIREADIDGDGQVNYEEFVQMMTAK (SEQ ID NO://); and has an F19 substitution (e.g., an F19L substitution, an F19I substitution, an F19V substitution, or an F19A substitution) and a V35 substitution (e.g., a V35G substitution, a V35A substitution, a V35L substitution, or a V35I substitution); and has a length of from about 148 amino acids to about 160 amino acids. In some cases, the calmodulin polypeptide has a length of 148 amino acids.

In some cases, a suitable calmodulin polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following calmodulin amino acid sequence: MDQLTEEQIAEFKEAFSLLDKDGDGTITTKELGTGMRSLGQNPTEAELQDMINEVDADG DGTIDFPEFLTMMARKMKYTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTD EEVDEMIREADIDGDGQVNYEEFVQMMTAK (SEQ ID NO://); and comprises a Leu at amino acid 19 and a Gly at amino acid 35; and has a length of from about 148 amino acids to about 160 amino acids. In some cases, the calmodulin polypeptide has a length of 148 amino acids.

LOV Domain Light-Responsive Polypeptide

A LOV domain light-activated polypeptide suitable for inclusion in a light-activated, calcium-gated transcriptional control polypeptide of the present disclosure is activatable by blue light, and can cage a proteolytically cleavable linker attached to the light-activated polypeptide. Thus, in the absence of blue light, the proteolytically cleavable linker is caged, i.e., inaccessible to a protease. In the presence of blue light, the light-activated polypeptide undergoes a conformational change, such that the proteolytically cleavable linker is uncaged and becomes accessible to a protease. A light-activated polypeptide suitable for inclusion in a light-activated, calcium-gated transcriptional control polypeptide of the present disclosure is a light, oxygen, or voltage (LOV) polypeptide.

A LOV polypeptide suitable for inclusion in a light-activated, calcium-gated transcriptional control polypeptide of the present disclosure can have a length of from about 100 amino acids to about 150 amino acids. For example, a LOV polypeptide can comprise an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the LOV2 domain of Avena sativa phototropin 1 (AsLOV2).

In some cases, a suitable LOV polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: SLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVR KIRDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRD AAEREAVMLIKKTAEEIDEAAK (SEQ ID NO://); and comprises a substitution at one or more of amino acids L2, N12, A28, H117, and I130, where the numbering is based on the amino acid sequence SLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVR KIRDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRD AAEREAVMLIKKTAEEIDEAAK (SEQ ID NO://). In some cases, the LOV polypeptide comprises a substitution selected from an L2R substitution, an L2H substitution, an L2P substitution, and an L2K substitution. In some cases, the LOV polypeptide comprises a substitution selected from an N12S substitution, an N12T substitution, and an N12Q substitution. In some cases, the LOV polypeptide comprises a substitution selected from an A28V substitution, an A28I substitution, and an A28L substitution. In some cases, the LOV polypeptide comprises a substitution selected from an H117R substitution and an H117K substitution. In some cases, the LOV polypeptide comprises a substitution selected from an I130V substitution, an I130A substitution, and an I130L substitution. In some cases, the LOV polypeptide comprises substitutions at amino acids L2, N12, and I130. In some cases, the LOV polypeptide comprises substitutions at amino acids L2, N12, H117, and I130. In some cases, the LOV polypeptide comprises substitutions at amino acids A28 and H117. In some cases, the LOV polypeptide comprises substitutions at amino acids N12 and I130. In some cases, the LOV polypeptide comprises an L2R substitution, an N12S substitution, and an I130V substitution.
In some cases, the LOV polypeptide comprises an N12S substitution and an I130V substitution. In some cases, the LOV polypeptide comprises an A28V substitution and an H117R substitution. In some cases, the LOV polypeptide comprises an L2P substitution, an N12S substitution, an I130V substitution, and an H117R substitution. In some cases, the LOV polypeptide comprises an L2P substitution, an N12S substitution, an A28V substitution, an H117R substitution, and an I130V substitution. In some cases, the LOV polypeptide comprises an L2R substitution, an N12S substitution, an A28V substitution, an H117R substitution, and an I130V substitution. In some cases, the LOV polypeptide has a length of 142 amino acids, 143 amino acids, 144 amino acids, 145 amino acids, 146 amino acids, 147 amino acids, 148 amino acids, 149 amino acids, or 150 amino acids. In some cases, the LOV polypeptide has a length of 142 amino acids.
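The substitution nomenclature used above (e.g., "L2R" for Leu at position 2 replaced by Arg) can be applied mechanically to the 142-residue reference sequence. The sketch below is illustrative only; the function name is hypothetical, and it uses the 1-based numbering stated in the text. It rejects any substitution whose stated original residue does not match the reference, which is a convenient sanity check on the numbering.

```python
import re

# 142-residue LOV reference sequence as recited in the text.
LOV_REFERENCE = (
    "SLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVR"
    "KIRDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRD"
    "AAEREAVMLIKKTAEEIDEAAK"
)


def apply_substitutions(reference: str, substitutions: list[str]) -> str:
    """Apply substitutions written as <original><position><replacement>."""
    residues = list(reference)
    for substitution in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
        if match is None:
            raise ValueError(f"unrecognized substitution: {substitution!r}")
        original, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3)
        )
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {substitution!r}")
        # Compare against the unmodified reference, using 1-based numbering.
        if reference[position - 1] != original:
            raise ValueError(
                f"{substitution!r} expects {original} at position {position}, "
                f"but the reference has {reference[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


lov_variant = apply_substitutions(
    LOV_REFERENCE, ["L2R", "N12S", "A28V", "H117R", "I130V"]
)
```

Applying these five substitutions leaves the length at 142 amino acids and changes only positions 2, 12, 28, 117, and 130.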

In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15D, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 25 is Val; amino acid 28 is Val; amino acid 117 is Arg; amino acid 126 is Ala; amino acid 130 is Val; and amino acid 136 is Glu. In some cases, the LOV domain light-activated polypeptide has a length of 142 amino acids.

In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15E, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 28 is Ala; amino acid 91 is Val; amino acid 100 is Tyr; amino acid 117 is Arg; amino acid 118 is Leu; amino acid 119 is His; amino acid 120 is Gly; amino acid 126 is Ala; amino acid 128 is Cys; amino acid 130 is Val; amino acid 135 is Phe; amino acid 136 is Gln; and amino acid 138 is Ala. In some cases, the LOV domain light-activated polypeptide has a length of 142 amino acids.

In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15F, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 28 is Val; amino acid 117 is Arg; amino acid 126 is Ala; amino acid 130 is Val; and amino acid 136 is Glu. In some cases, the LOV domain light-activated polypeptide has a length of 138 amino acids.

In some cases, a suitable LOV domain light-activated polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15G, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 28 is Val; amino acid 91 is Val; amino acid 100 is Tyr; amino acid 117 is Arg; amino acid 118 is Leu; amino acid 119 is His; amino acid 120 is Gly; amino acid 126 is Ala; amino acid 128 is Cys; amino acid 130 is Val; amino acid 135 is Phe; amino acid 136 is Gln; and amino acid 138 is Ala. In some cases, the LOV domain light-activated polypeptide has a length of 138 amino acids.

In some cases, a LOV light-activated polypeptide comprises the following amino acid sequence:

(SEQ ID NO: //)

FRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRN

CRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNVFHL

QPMRDYKGDVQYFIGVQLDGTERLHGAAEREAVCLVKKTAFQIA.

In some cases, a LOV light-activated polypeptide comprises the following amino acid sequence:

(SEQ ID NO: //)

SRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRN

CRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNLFHL

QPMRDQKGDVQYFIGVQLDGTERVRDAAEREAVMLVKKTAEEID.

In some cases, a LOV light-activated polypeptide comprises the following amino acid sequence:

(SEQ ID NO: //)

FRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRN

CRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNVFHL

QPMRDYKGDVQYFIGVQLDGTERLHGAAEREAVCLVKKTAFQIA.

In some cases, a LOV light-activated polypeptide comprises the following amino acid sequence:

(SEQ ID NO: //)

SRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRN

CRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNVFHL

QPMRDYKGDVQYFIGVQLDGTERLHGAAEREAVCLVKKTAFEIDEAA

K.

In some cases, a LOV light-activated polypeptide comprises the following amino acid sequence:

(SEQ ID NO: //)

SRATTLERIEKSFVITDPRLPDNPIIFVSDSFLQLTEYSREEILGRN

CRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNLFHL

QPMRDQKGDVQYFIGVQLDGTERVRDAAEREAVMLVKKTAEEIDEAA

K.

Proteolytically Cleavable Linker

Suitable proteolytically cleavable linkers also include ENLYFQS (SEQ ID NO://), ENLYFQY (SEQ ID NO://), ENLYFQL (SEQ ID NO://), ENLYFQW (SEQ ID NO://), ENLYFQM (SEQ ID NO://), ENLYFQH (SEQ ID NO://), ENLYFQN (SEQ ID NO://), ENLYFQA (SEQ ID NO://), and ENLYFQQ (SEQ ID NO://).

Suitable proteolytically cleavable linkers also include NS3 protease cleavage sites such as: DEVVECS (SEQ ID NO://), DEAEDVVECS (SEQ ID NO://), and EDAAEEVVECS (SEQ ID NO://).

Suitable proteolytically cleavable linkers also include a calpain cleavage site, where suitable calpain cleavage sites include, e.g., PLFAAR (SEQ ID NO://) and QQEVYGMMPRD (SEQ ID NO://).
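As an illustration of how the linker sequences listed above might be located within a candidate fusion polypeptide, the following sketch performs a plain substring search. The family labels and the function name are hypothetical conveniences, and the search reports only where a listed motif occurs; it says nothing about whether a given protease would actually cleave at that site in a folded, caged protein.

```python
# Linker motifs copied from the lists above; the family labels are
# hypothetical names chosen for this sketch.
CLEAVABLE_LINKERS = {
    "ENLYFQ-type": [
        "ENLYFQS", "ENLYFQY", "ENLYFQL", "ENLYFQW", "ENLYFQM",
        "ENLYFQH", "ENLYFQN", "ENLYFQA", "ENLYFQQ",
    ],
    "NS3": ["DEVVECS", "DEAEDVVECS", "EDAAEEVVECS"],
    "calpain": ["PLFAAR", "QQEVYGMMPRD"],
}


def find_linkers(polypeptide: str) -> list[tuple[str, str, int]]:
    """Return (family, motif, 1-based start position) for every motif hit."""
    hits = []
    for family, motifs in CLEAVABLE_LINKERS.items():
        for motif in motifs:
            start = polypeptide.find(motif)
            while start != -1:
                hits.append((family, motif, start + 1))
                # Continue searching after this hit to catch repeats.
                start = polypeptide.find(motif, start + 1)
    # Report hits in order of position along the polypeptide.
    return sorted(hits, key=lambda hit: hit[2])
```

For example, calling `find_linkers` on a made-up test string such as `"GGSENLYFQMGGSPLFAARGG"` would report an ENLYFQM motif starting at residue 4 and a PLFAAR motif starting at residue 14.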

Transcription Factor

Suitable transcription factors include naturally-occurring transcription factors and recombinant (e.g., non-naturally occurring, engineered, artificial, synthetic) transcription factors. In some cases the transcriptional activator is an engineered protein, such as a zinc finger or TALE based DNA binding domain fused to an effector domain such as VP64 (transcriptional activation).

Suitable transcription factors include transcriptional activators, where suitable transcriptional activators include, but are not limited to, GAL4-VP16, GAL4-VP64, Tbx21, tTA-VP16, VP16, VP64, GAL4, p65, LexA-VP16, GAL4-NFκB, and the like. Amino acid sequences of suitable transcriptional activators are known in the art. For example, a tTA-VP16 transcription factor can comprise an amino acid sequence having at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence:

MSRLDKSKVINSALELLNEVGIEGLTTRKLAQKLGVEQPTLYWHVKNKRALLD ALAIEMLDRHHTHFCPLEGESWQDFLRNNAKSFRCALLSHRDGAKVHLGTRPTEKQYE TLENQLAFLCQQGFSLENALYALSAVGHFTLGCVLEDQEHQVAKEERETPTTDSMPPLL RQAIELFDHQGAEPAFLFGLELIICGLEKQLKCESGSAYSRARTKNNYGSTIEGLLDLPDD DAPEEAGLAAPRLSFLPAGHTRRLSTAPPTDVSLGDELHLDGEDVAMAHADALDDFDL DMLGDGDSPGPGFTPHDSAPYGALDMADFEFEQMFTDALGIDEYGG (SEQ ID NO://). A tTA-VP16 transcription activator binds to, e.g., a TRE promoter (see, e.g., FIGS. 27A and 27B).

Additional Amino Acid Sequences

A fusion polypeptide comprising: a) a TM domain; b) a polypeptide that binds a calcium-binding polypeptide; c) a light-activated polypeptide comprising a LOV domain; d) a proteolytically cleavable linker; and e) a transcription factor can include one or more additional polypeptides. The one or more additional polypeptides can be, e.g., a soma localization signal; a nuclear localization signal; etc.

Soma Localization Signal

Nuclear Localization Signals

A transcription factor can include a “Protein Transduction Domain” or PTD (also known as a CPP, or cell penetrating peptide), which refers to a polypeptide that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane. A PTD attached to another polypeptide (a polypeptide gene product of interest) facilitates the polypeptide traversing a membrane, for example going from extracellular space to intracellular space, or cytosol to within an organelle. In some cases, a PTD attached to a polypeptide gene product of interest facilitates entry of the polypeptide into the nucleus (e.g., in some cases, a PTD includes a nuclear localization signal). In some cases, a PTD is covalently linked to the amino terminus of a polypeptide gene product of interest. In some cases, a PTD is covalently linked to the carboxyl terminus of a polypeptide gene product of interest. In some cases, a PTD is covalently linked to the amino terminus and to the carboxyl terminus of a polypeptide gene product of interest. Exemplary PTDs include but are not limited to a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO://); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); a Drosophila Antennapedia protein transduction domain (Noguchi et al. (2003) Diabetes 52(7):1732-1737); a truncated human calcitonin peptide (Trehin et al. (2004) Pharm. Research 21:1248-1256); polylysine (Wender et al. (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008); RRQRRTSKLMKR (SEQ ID NO://); Transportan GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO://); KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO://); and RQIKIWFQNRRMKWKK (SEQ ID NO://).
Exemplary PTDs include, but are not limited to, YGRKKRRQRRR (SEQ ID NO://); RKKRRQRRR (SEQ ID NO://); and an arginine homopolymer of from 3 arginine residues to 50 arginine residues. Exemplary PTD amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO://); RKKRRQRR (SEQ ID NO://); YARAAARQARA (SEQ ID NO://); THRLPRRRRRR (SEQ ID NO://); and GGRRARRRRRR (SEQ ID NO://).

Target Genes

The transcription factor can control expression of any of a variety of gene products. “Gene products” as used herein, include polypeptide gene products and nucleic acid gene products.

Polypeptide Gene Products

In some cases, a transcription factor present in a light-activated, calcium-gated transcription control polypeptide of the present disclosure, when released from the light-activated, calcium-gated transcription control polypeptide by cleavage of the proteolytically cleavable linker, controls transcription of a nucleotide sequence encoding a polypeptide.

Suitable polypeptide gene products include, but are not limited to, a reporter gene product, an opsin, a DREADD, a toxin, an enzyme, a transcription factor, an antibiotic resistance factor, a genome editing endonuclease, an RNA-guided endonuclease, a protease, a kinase, a phosphatase, a phosphorylase, a lipase, a receptor, an antibody, a fluorescent protein, a peroxidase such as APEX or APEX2, a base editing enzyme, a recombinase, a synaptic marker, a signaling protein, an effector protein of a receptor, a protein that regulates synaptic vesicle fusion or protein trafficking or organelle trafficking, a portion (e.g., a split half) of any one of the aforementioned polypeptides.

Synaptic Markers

Nucleic Acid Editing Enzymes

Peroxidases

Antibodies

Reporter Gene Products

Genome-Editing Endonuclease

In some cases, a suitable RNA-guided endonuclease comprises an amino acid sequence having at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the Staphylococcus aureus Cas9 amino acid sequence depicted in FIG. 22.
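The percent amino acid sequence identity thresholds recited above can be illustrated with a minimal sketch. The function name `percent_identity`, the two toy peptides, and the choice of denominator (aligned columns in which neither sequence has a gap) are assumptions made for illustration only; the disclosure does not specify a particular identity calculation, and in practice the two sequences would first be aligned with a dedicated alignment tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: gaps are written as '-' and gapped columns are excluded
    from the denominator. Other conventions exist and give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


# Hypothetical toy peptides (not sequences from the disclosure):
# 11 aligned residues, 10 of which match.
toy_reference = "MKTAYIAKQRQ"
toy_variant = "MKTAYLAKQRQ"
identity = percent_identity(toy_reference, toy_variant)
print(f"{identity:.1f}% identity")
print("meets 'at least 90%' threshold:", identity >= 90.0)
```

Under these assumptions, a candidate sequence would satisfy an "at least 90%" limitation when the computed value is 90.0 or greater.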

In some cases, the RNA-guided endonuclease is a nickase (see, e.g., Jinek et al., Science. 2012 Aug. 17; 337(6096):816-21).

Opsins

Transcription Factors

Toxins

Antibiotic Resistance Factors

As noted above, in some cases, the gene product of interest is an antibiotic resistance factor, e.g., a polypeptide that confers antibiotic resistance to a cell that produces the polypeptide.

Recombinases

DREADDs

A DREADD can interact with a G protein selected from Gi, Gq, and Gs. Thus, a DREADD can be a Gi-coupled DREADD, a Gq-coupled DREADD, or a Gs-coupled DREADD.

Nucleic Acid Gene Products

Suitable nucleic acid gene products include, but are not limited to, an inhibitory nucleic acid, a ribozyme, a guide RNA that binds a target nucleic acid and an RNA-guided endonuclease, a microRNA (miRNA), an antisense RNA, a decoy RNA, an anti-mir RNA, a long non-coding RNA, and the like. Typically, the nucleic acid gene product is not translated.

Guide RNAs

Guide RNAs include RNAs (where a guide RNA can be a single RNA molecule or two RNA molecules) that comprise a first segment that comprises a nucleotide sequence that is complementary to (and hybridizes with) a target nucleotide sequence (e.g., a target nucleotide sequence present in genomic DNA), and a second segment that comprises a nucleotide sequence that binds to an RNA-guided endonuclease (e.g., a Cas9 polypeptide, a Cpf1 polypeptide, a C2c2 polypeptide, as described above).

In some cases, the guide RNA(s) bind to a Cas9 polypeptide. The first segment (targeting segment) of a Cas9 guide RNA includes a nucleotide sequence (a guide sequence) that is complementary to (and therefore hybridizes with) a specific sequence (a target site) within a target nucleic acid (e.g., a target ssRNA, a target ssDNA, the complementary strand of a double stranded target DNA, etc.). The protein-binding segment (or “protein-binding sequence”) interacts with (binds to) a Cas9 polypeptide. The protein-binding segment of a Cas9 guide RNA includes two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex). Site-specific binding and/or cleavage of a target nucleic acid (e.g., genomic DNA) can occur at locations (e.g., target sequence of a target locus) determined by base-pairing complementarity between the Cas9 guide RNA (the guide sequence of the Cas9 guide RNA) and the target nucleic acid.

In some cases, a guide RNA includes two separate nucleic acid molecules: an “activator” and a “targeter” and is referred to herein as a “dual guide RNA”, a “double-molecule guide RNA”, a “two-molecule guide RNA”, or a “dgRNA.” In some cases, the guide RNA is one molecule (e.g., for some class 2 CRISPR/Cas proteins, the corresponding guide RNA is a single molecule; and in some cases, an activator and targeter are covalently linked to one another, e.g., via intervening nucleotides), and the guide RNA is referred to as a “single guide RNA”, a “single-molecule guide RNA,” a “one-molecule guide RNA”, or simply “sgRNA.”

A “target nucleic acid” as used herein is a polynucleotide (e.g., a chromosomal DNA sequence; or an extrachromosomal sequence, e.g., an episomal sequence, a minicircle sequence, a mitochondrial sequence, a chloroplast sequence, etc.) that includes a site (“target site”, “target sequence”, or “endonuclease-recognized sequence”) targeted by a sequence-specific endonuclease, e.g., a genome-editing endonuclease. When the sequence-specific endonuclease, e.g., genome-editing endonuclease, is a CRISPR/Cas endonuclease, the target sequence is the sequence to which the guide sequence of a CRISPR/Cas guide RNA (e.g., a Cas9 guide RNA) will hybridize. For example, the target site (or target sequence) 5′-GAGCAUAUC-3′ within a target nucleic acid is targeted by (or is bound by, or hybridizes with, or is complementary to) the sequence 5′-GAUAUGCUC-3′. Suitable hybridization conditions include physiological conditions normally present in a cell. For a double stranded target nucleic acid, the strand of the target nucleic acid that is complementary to and hybridizes with the guide RNA is referred to as the “complementary strand” or “target strand”; while the strand of the target nucleic acid that is complementary to the “target strand” (and is therefore not complementary to the guide RNA) is referred to as the “non-target strand” or “non-complementary strand”.
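The complementarity relationship in the 5′-GAGCAUAUC-3′ example can be checked with a short sketch. The function name `reverse_complement_rna` is introduced here for illustration, and the sketch assumes strict Watson-Crick pairing (A-U, G-C) with no wobble pairs or mismatches.

```python
# Watson-Crick RNA base pairs only; G-U wobble pairing is deliberately ignored.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


target_site = "GAGCAUAUC"  # 5'-GAGCAUAUC-3', from the example in the text
guide_sequence = reverse_complement_rna(target_site)
print(guide_sequence)  # GAUAUGCUC
```

Reading both strands 5′ to 3′, the sequence that hybridizes to a target site is its reverse complement, which is why the two nonamers in the example look different at first glance.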

Guide RNAs are well known in the art. Nucleotide sequences of the portion of the guide RNA that binds to a particular RNA-guided endonuclease (e.g., Cas9, Cpf1, C2c2, etc.) are known in the art. The portion of the guide RNA that hybridizes to a target nucleic acid can be designed based on the sequence of the target nucleic acid.

Inhibitory RNAs

Inhibitory RNAs are well known in the art. RNAi is the sequence-specific, post-transcriptional silencing of a gene's expression by double-stranded RNA. RNAi is mediated by 21- to 25-nucleotide, double-stranded RNA molecules referred to as small interfering RNAs (siRNAs). siRNAs can be derived by enzymatic cleavage of double-stranded precursor short hairpin RNAs (shRNAs) expressed from genetic constructs or microRNA precursors in cells.

Cells Comprising a Polypeptide System

The present disclosure provides a cell comprising a FLARE system of the present disclosure. In some cases, the cell is in vitro. In some cases, the cell is in vivo.

The present disclosure provides a cell comprising a fusion polypeptide comprising: a) a transmembrane domain; b) a polypeptide that binds a calmodulin polypeptide or a troponin C polypeptide under certain Ca²⁺ concentration conditions (e.g., a Ca²⁺ concentration above about 100 nM); c) a light-activated polypeptide comprising a LOV domain; d) a proteolytically cleavable linker that is caged by the light-activated polypeptide in the absence of blue light; and e) a transcription factor.

The present disclosure provides a cell comprising a fusion polypeptide comprising: a) a calmodulin polypeptide; and b) a protease. The present disclosure provides a cell comprising a fusion polypeptide comprising: a) a troponin C polypeptide; and b) a protease.

The present disclosure provides a cell comprising: a first fusion polypeptide comprising: a) a transmembrane domain; b) a calmodulin-binding polypeptide that binds a calmodulin polypeptide under certain Ca²⁺ concentration conditions (e.g., a Ca²⁺ concentration above about 100 nM); c) a light-activated polypeptide comprising a LOV domain; d) a proteolytically cleavable linker that is caged by the light-activated polypeptide in the absence of blue light; and e) a transcription factor; and a second fusion polypeptide comprising: a) a calmodulin polypeptide; and b) a protease that cleaves the proteolytically cleavable linker under certain conditions.

The present disclosure provides a cell comprising: a first fusion polypeptide comprising: a) a transmembrane domain; b) a troponin I polypeptide that binds a troponin C polypeptide under certain Ca²⁺ concentration conditions (e.g., a Ca²⁺ concentration above about 100 nM); c) a light-activated polypeptide comprising a LOV domain; d) a proteolytically cleavable linker that is caged by the light-activated polypeptide in the absence of blue light; and e) a transcription factor; and a second fusion polypeptide comprising: a) a troponin C polypeptide; and b) a protease that cleaves the proteolytically cleavable linker under certain conditions.
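The two-component arrangement described above behaves as a coincidence detector: the transcription factor is released only when elevated Ca²⁺ and blue light are present together. The following is a deliberately simplified boolean sketch of that logic. The names `CellState`, `protease_recruited`, `linker_uncaged`, and `transcription_factor_released` are hypothetical, and treating "about 100 nM" as a sharp cutoff is an assumption for illustration; real binding and cleavage are graded and time-dependent.

```python
from dataclasses import dataclass

# Example threshold taken from "a Ca2+ concentration above about 100 nM".
# Assumption: modeled as a hard cutoff rather than a graded binding curve.
CA_THRESHOLD_NM = 100.0


@dataclass
class CellState:
    calcium_nm: float  # intracellular Ca2+ concentration, in nM
    blue_light: bool   # whether the cell is exposed to blue light


def protease_recruited(state: CellState) -> bool:
    """Calmodulin-protease (or troponin C-protease) fusion binds the first fusion."""
    return state.calcium_nm > CA_THRESHOLD_NM


def linker_uncaged(state: CellState) -> bool:
    """The LOV domain cages the cleavable linker in the absence of blue light."""
    return state.blue_light


def transcription_factor_released(state: CellState) -> bool:
    """Cleavage requires both protease recruitment and an uncaged linker."""
    return protease_recruited(state) and linker_uncaged(state)


for calcium_nm in (50.0, 500.0):
    for blue_light in (False, True):
        state = CellState(calcium_nm=calcium_nm, blue_light=blue_light)
        print(calcium_nm, blue_light, transcription_factor_released(state))
```

Only the combination of high Ca²⁺ and blue light yields release in this model, mirroring the requirement that the linker be both accessible and in proximity to the protease.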

Suitable cells include mammalian cells, amphibian cells, avian cells, insect cells, reptile cells, arachnid cells, and the like. In some cases, the cell is a primary (non-immortalized) cell. In some cases, the cell is an immortalized cell line.

In some cases, the cell is a mammalian cell, e.g., a human cell, a non-human primate cell, a rodent cell, a feline (e.g., a cat) cell, a canine (e.g., a dog) cell, an ungulate cell, an equine (e.g., a horse) cell, an ovine cell, a caprine cell, a bovine cell, etc. In some cases, the genetically modified host cell is a rodent cell (e.g., a rat cell; a mouse cell). In some cases, the genetically modified host cell is a human cell. In some cases, the genetically modified host cell is a non-human primate cell.

Suitable host cells include cells of, e.g., Bacteria (e.g., Eubacteria); Archaebacteria; Protista; Fungi; Plantae; and Animalia. Suitable host cells include cells of plant-like members of the kingdom Protista, including, but not limited to, algae (e.g., green algae, red algae, glaucophytes, cyanobacteria); fungus-like members of Protista, e.g., slime molds, water molds, etc.; animal-like members of Protista, e.g., flagellates (e.g., Euglena), amoeboids (e.g., amoeba), sporozoans (e.g., Apicomplexa, Myxozoa, Microsporidia), and ciliates (e.g., Paramecium). Suitable host cells include cells of members of the kingdom Fungi, including, but not limited to, members of any of the phyla: Basidiomycota (club fungi; e.g., members of Agaricus, Amanita, Boletus, Cantharellus, etc.); Ascomycota (sac fungi, including, e.g., Saccharomyces); Mycophycophyta (lichens); Zygomycota (conjugation fungi); and Deuteromycota. Suitable host cells include cells of members of the kingdom Plantae, including, but not limited to, members of any of the following divisions: Bryophyta (e.g., mosses), Anthocerotophyta (e.g., hornworts), Hepaticophyta (e.g., liverworts), Lycophyta (e.g., club mosses), Sphenophyta (e.g., horsetails), Psilophyta (e.g., whisk ferns), Ophioglossophyta, Pterophyta (e.g., ferns), Cycadophyta, Ginkgophyta, Pinophyta, Gnetophyta, and Magnoliophyta (e.g., flowering plants).
Suitable host cells include cells of members of the kingdom Animalia, including, but not limited to, members of any of the following phyla: Porifera (sponges); Placozoa; Orthonectida (parasites of marine invertebrates); Rhombozoa; Cnidaria (corals, anemones, jellyfish, sea pens, sea pansies, sea wasps); Ctenophora (comb jellies); Platyhelminthes (flatworms); Nemertina (ribbon worms); Gnathostomulida (jawed worms); Gastrotricha; Rotifera; Priapulida; Kinorhyncha; Loricifera; Acanthocephala; Entoprocta; Nematoda; Nematomorpha; Cycliophora; Mollusca (mollusks); Sipuncula (peanut worms); Annelida (segmented worms); Tardigrada (water bears); Onychophora (velvet worms); Arthropoda (including the subphyla: Chelicerata, Myriapoda, Hexapoda, and Crustacea, where the Chelicerata include, e.g., arachnids, Merostomata, and Pycnogonida, where the Myriapoda include, e.g., Chilopoda (centipedes), Diplopoda (millipedes), Pauropoda, and Symphyla, where the Hexapoda include insects, and where the Crustacea include shrimp, krill, barnacles, etc.); Phoronida; Ectoprocta (moss animals); Brachiopoda; Echinodermata (e.g., starfish, sea daisies, feather stars, sea urchins, sea cucumbers, brittle stars, brittle baskets, etc.); Chaetognatha (arrow worms); Hemichordata (acorn worms); and Chordata. Suitable members of Chordata include any member of the following subphyla: Urochordata (sea squirts; including Ascidiacea, Thaliacea, and Larvacea); Cephalochordata (lancelets); Myxini (hagfish); and Vertebrata, where members of Vertebrata include, e.g., members of Petromyzontida (lampreys), Chondrichthyes (cartilaginous fish), Actinopterygii (ray-finned fish), Actinistia (coelacanths), Dipnoi (lungfish), Reptilia (reptiles, e.g., snakes, alligators, crocodiles, lizards, etc.), Aves (birds), and Mammalia (mammals). Suitable plant cells include cells of any monocotyledon and cells of any dicotyledon. Plant cells include, e.g., a cell of a leaf, a root, a tuber, a flower, and the like.
In some cases, the genetically modified host cell is a plant cell. In some cases, the genetically modified host cell is a bacterial cell. In some cases, the genetically modified host cell is an archaeal cell.

Suitable plant cells include cells of a monocotyledon; cells of a dicotyledon; cells of an angiosperm; cells of a gymnosperm; etc.

Nucleic Acids, Expression Vectors, and Host Cells

The present disclosure provides nucleic acid(s) comprising nucleotide sequences encoding one or more components of a FLARE system of the present disclosure. The present disclosure provides host cells genetically modified with the one or more nucleic acid(s).

The present disclosure provides a nucleic acid system comprising: a) a first nucleic acid comprising a nucleotide sequence encoding a first fusion polypeptide comprising: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide that binds calmodulin or troponin C, respectively, under certain Ca²⁺ concentration conditions (e.g., a Ca²⁺ concentration above about 100 nM); iii) a light-activated polypeptide comprising a LOV domain; iv) a proteolytically cleavable linker that is caged by the light-activated polypeptide in the absence of blue light; and v) a transcription factor; and b) a second nucleic acid comprising a nucleotide sequence encoding a second fusion polypeptide comprising: i) a calmodulin polypeptide or a troponin C polypeptide; and ii) a protease that cleaves the proteolytically cleavable linker under certain conditions.

The present disclosure provides a nucleic acid system comprising: a) a first nucleic acid comprising a nucleotide sequence encoding a first fusion polypeptide comprising: i) a transmembrane domain; ii) a calmodulin-binding polypeptide that binds calmodulin under certain Ca²⁺ concentration conditions (e.g., a Ca²⁺ concentration above about 100 nM); iii) a light-activated polypeptide comprising a LOV domain; iv) a proteolytically cleavable linker that is caged by the light-activated polypeptide in the absence of blue light; and v) a transcription factor; and b) a second nucleic acid comprising a nucleotide sequence encoding a second fusion polypeptide comprising: i) a calmodulin polypeptide; and ii) a protease that cleaves the proteolytically cleavable linker under certain conditions.

The present disclosure provides a nucleic acid system comprising: a) a first nucleic acid comprising a nucleotide sequence encoding a first fusion polypeptide comprising: i) a transmembrane domain; ii) a troponin I polypeptide that binds a troponin C polypeptide under certain Ca²⁺ concentration conditions (e.g., a Ca²⁺ concentration above about 100 nM); iii) a light-activated polypeptide comprising a LOV domain; iv) a proteolytically cleavable linker that is caged by the light-activated polypeptide in the absence of blue light; and v) a transcription factor; and b) a second nucleic acid comprising a nucleotide sequence encoding a second fusion polypeptide comprising: i) a troponin C polypeptide; and ii) a protease that cleaves the proteolytically cleavable linker under certain conditions.

The present disclosure provides a nucleic acid comprising: a) a nucleotide sequence encoding a fusion polypeptide comprising: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide that binds calmodulin or troponin C, respectively, under certain Ca²⁺ concentration conditions (e.g., a Ca²⁺ concentration above about 100 nM); iii) a light-activated polypeptide comprising a LOV domain; and iv) a proteolytically cleavable linker that is caged by the light-activated polypeptide in the absence of blue light; and b) an insertion site for inserting a nucleic acid comprising a nucleotide sequence encoding a transcription factor. The insertion site is within 10 nucleotides (nt), within 9 nt, within 8 nt, within 7 nt, within 6 nt, within 5 nt, within 4 nt, within 3 nt, within 2 nt, or 1 nt, of the 3′ end of the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide. The insertion site is positioned relative to the nucleotide sequence encoding the light-activated, calcium-gated fusion polypeptide such that, after insertion of a nucleic acid comprising a nucleotide sequence encoding a transcription factor, and after transcription and translation, a fusion polypeptide comprising: i) a transmembrane domain; ii) a calmodulin-binding polypeptide or a troponin I polypeptide; iii) a LOV-domain light-activated polypeptide comprising an amino acid sequence having at least 80% amino acid sequence identity to the amino acid sequence depicted in any one of FIG. 15A-15G; iv) a proteolytically cleavable linker; and v) the transcription factor, is produced. In some cases, the insertion site is a multiple cloning site.

In any of the above embodiments, the nucleic acid(s) can be present in a recombinant expression vector. In some cases, the recombinant expression vector is a viral construct, e.g., a recombinant adeno-associated virus (AAV) construct, a recombinant adenoviral construct, a recombinant lentiviral construct, a recombinant retroviral construct, etc. In some cases, a nucleic acid of a system of the present disclosure is a recombinant lentivirus vector. In some cases, a nucleic acid of a system of the present disclosure is a recombinant AAV vector.

In some cases, a nucleic acid or a nucleic acid system of the present disclosure is packaged in a viral particle. For example, in some cases, one or more of the nucleic acids of a nucleic acid system of the present disclosure are recombinant AAV vectors, and are packaged in recombinant AAV particles. Thus, the present disclosure provides a recombinant viral particle comprising a nucleic acid or a nucleic acid system of the present disclosure.

The present disclosure provides genetically modified host cells, where a host cell is genetically modified with a nucleic acid(s) comprising nucleotide sequences encoding one or more FLARE components, as described above. In some cases, a nucleic acid(s) comprising nucleotide sequences encoding one or more FLARE components, as described above, is stably integrated into the genome of the host cell. In some cases, a nucleic acid(s) comprising nucleotide sequences encoding one or more FLARE components, as described above, is present in the host cell episomally. The genetically modified cell can be in vitro or in vivo.

In some cases, the genetically modified host cell is a primary (non-immortalized) cell. In some cases, the genetically modified host cell is an immortalized cell line.

A genetically modified host cell of the present disclosure is a eukaryotic cell. Suitable host cells include mammalian cells, insect cells, reptile cells, amphibian cells, arachnid cells, and the like.

Suitable plant cells include cells of a monocotyledon; cells of a dicotyledon; cells of an angiosperm; cells of a gymnosperm; etc.

Enhanced LOV Polypeptide

The present disclosure provides an enhanced LOV-domain light-activated polypeptide (also referred to herein as an “enhanced LOV polypeptide” or an “eLOV polypeptide”). The present disclosure provides a nucleic acid comprising a nucleotide sequence encoding an eLOV polypeptide of the present disclosure, and a recombinant expression vector comprising the nucleic acid. The present disclosure provides a genetically modified host cell comprising a nucleic acid comprising a nucleotide sequence encoding an eLOV polypeptide of the present disclosure, or a recombinant expression vector comprising the nucleic acid.

In some cases, an eLOV polypeptide comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the following amino acid sequence: SLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRDAAEREAVMLIKKTAEEIDEAAK (SEQ ID NO://); and comprises a substitution at one or more of amino acids L2, N12, A28, H117, and I130, where the numbering is based on the amino acid sequence SLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVRKIRDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRDAAEREAVMLIKKTAEEIDEAAK (SEQ ID NO://). In some cases, the eLOV polypeptide comprises a substitution selected from an L2R substitution, an L2H substitution, an L2P substitution, and an L2K substitution. In some cases, the eLOV polypeptide comprises a substitution selected from an N12S substitution, an N12T substitution, and an N12Q substitution. In some cases, the eLOV polypeptide comprises a substitution selected from an A28V substitution, an A28I substitution, and an A28L substitution. In some cases, the eLOV polypeptide comprises a substitution selected from an H117R substitution and an H117K substitution. In some cases, the eLOV polypeptide comprises a substitution selected from an I130V substitution, an I130A substitution, and an I130L substitution. In some cases, the eLOV polypeptide comprises substitutions at amino acids L2, N12, and I130. In some cases, the eLOV polypeptide comprises substitutions at amino acids L2, N12, H117, and I130. In some cases, the eLOV polypeptide comprises substitutions at amino acids A28 and H117. In some cases, the eLOV polypeptide comprises substitutions at amino acids N12 and I130. In some cases, the eLOV polypeptide comprises an L2R substitution, an N12S substitution, and an I130V substitution.
In some cases, the eLOV polypeptide comprises an N12S substitution and an I130V substitution. In some cases, the eLOV polypeptide comprises an A28V substitution and an H117R substitution. In some cases, the eLOV polypeptide comprises an L2P substitution, an N12S substitution, an I130V substitution, and an H117R substitution. In some cases, the eLOV polypeptide comprises an L2P substitution, an N12S substitution, an A28V substitution, an H117R substitution, and an I130V substitution. In some cases, the eLOV polypeptide comprises an L2R substitution, an N12S substitution, an A28V substitution, an H117R substitution, and an I130V substitution. In some cases, the eLOV polypeptide has a length of 142 amino acids, 143 amino acids, 144 amino acids, 145 amino acids, 146 amino acids, 147 amino acids, 148 amino acids, 149 amino acids, or 150 amino acids. In some cases, the eLOV polypeptide has a length of 142 amino acids.
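The substitution nomenclature used above (e.g., L2R, N12S, I130V) can be made concrete with a minimal sketch that applies named substitutions to the 142-residue reference sequence recited above, written here without the line-break whitespace. The helper name `apply_substitutions` is hypothetical; the sketch uses 1-based numbering, as in the text, and rejects any substitution whose stated original residue does not match the reference.

```python
import re

# Reference sequence as recited in the text, with whitespace removed.
ELOV_REFERENCE = (
    "SLATTLERIEKNFVITDPRLPDNPIIFASDSFLQLTEYSREEILGRNCRFLQGPETDRATVR"
    "KIRDAIDNQTEVTVQLINYTKSGKKFWNLFHLQPMRDQKGDVQYFIGVQLDGTEHVRD"
    "AAEREAVMLIKKTAEEIDEAAK"
)


def apply_substitutions(reference: str, substitutions: list[str]) -> str:
    """Apply substitutions such as 'L2R' (1-based positions) to a reference."""
    residues = list(reference)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the reference")
        if reference[pos - 1] != old:
            raise ValueError(
                f"{sub}: reference has {reference[pos - 1]} at position {pos}"
            )
        residues[pos - 1] = new
    return "".join(residues)


variant = apply_substitutions(ELOV_REFERENCE, ["L2R", "N12S", "I130V"])
print(len(variant))
print(variant[1], variant[11], variant[129])
```

Checking the stated original residue against the reference catches numbering errors early, for example a substitution written against a different reference or an off-by-one position.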

In some cases, an eLOV polypeptide of the present disclosure comprises one or more amino acid substitutions relative to the LOV2 amino acid sequence depicted in FIG. 15A. In some cases, an eLOV polypeptide of the present disclosure comprises one or more amino acid substitutions at positions selected from 1, 2, 12, 25, 28, 91, 100, 117, 118, 119, 120, 126, 128, 135, 136, and 138, relative to the LOV2 amino acid sequence depicted in FIG. 15A. Suitable substitutions include Asp→Ser at amino acid 1; Asp→Phe at amino acid 1; Leu→Arg at amino acid 2; Asn→Ser at amino acid 12; Ile→Val at amino acid 25; Ala→Val at amino acid 28; Leu→Val at amino acid 91; Gln→Tyr at amino acid 100; His→Arg at amino acid 117; Val→Leu at amino acid 118; Arg→His at amino acid 119; Asp→Gly at amino acid 120; Gly→Ala at amino acid 126; Met→Cys at amino acid 128; Glu→Phe at amino acid 135; Asn→Gln at amino acid 136; Asn→Glu at amino acid 136; and Asp→Ala at amino acid 138, where the amino acid numbering is based on the numbering of the LOV2 amino acid sequence depicted in FIG. 15A.

In some cases, an eLOV polypeptide of the present disclosure comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15C, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 28 is Ala; amino acid 117 is Arg; amino acid 126 is Ala; and amino acid 136 is Glu. In some cases, an eLOV polypeptide of the present disclosure has a length of 142 amino acids.

In some cases, an eLOV polypeptide of the present disclosure comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15D, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 25 is Val; amino acid 28 is Val; amino acid 117 is Arg; amino acid 126 is Ala; amino acid 130 is Val; and amino acid 136 is Glu. In some cases, an eLOV polypeptide of the present disclosure has a length of 142 amino acids.

In some cases, an eLOV polypeptide of the present disclosure comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15E, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 28 is Ala; amino acid 91 is Val; amino acid 100 is Tyr; amino acid 117 is Arg; amino acid 118 is Leu; amino acid 119 is His; amino acid 120 is Gly; amino acid 126 is Ala; amino acid 128 is Cys; amino acid 130 is Val; amino acid 135 is Phe; amino acid 136 is Gln; and amino acid 138 is Ala. In some cases, an eLOV polypeptide of the present disclosure has a length of 142 amino acids.

In some cases, an eLOV polypeptide of the present disclosure comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15F, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 28 is Val; amino acid 117 is Arg; amino acid 126 is Ala; amino acid 130 is Val; and amino acid 136 is Glu. In some cases, an eLOV polypeptide of the present disclosure has a length of 138 amino acids.

In some cases, an eLOV polypeptide of the present disclosure comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%, amino acid sequence identity to the amino acid sequence depicted in FIG. 15G, where amino acid 1 is Ser; amino acid 2 is Arg; amino acid 12 is Ser; amino acid 28 is Val; amino acid 91 is Val; amino acid 100 is Tyr; amino acid 117 is Arg; amino acid 118 is Leu; amino acid 119 is His; amino acid 120 is Gly; amino acid 126 is Ala; amino acid 128 is Cys; amino acid 130 is Val; amino acid 135 is Phe; amino acid 136 is Gln; and amino acid 138 is Ala. In some cases, an eLOV polypeptide of the present disclosure has a length of 138 amino acids.