A computer readable form of the Sequence Listing is filed with this application by electronic submission and is incorporated into this application by reference in its entirety. The Sequence Listing is contained in the file created on Jan. 8, 2023 having the file name “21-1622-WO.xml” and is 638 kb in size.
Bioluminescent light produced by the enzymatic oxidation of a luciferin substrate is widely used for bioassays and imaging in biomedical research. Because no excitation light source is needed, luminescent photons are produced in the dark which results in higher sensitivity than fluorescence imaging in live animal models and in biological samples where autofluorescence or phototoxicity is a concern. However, the development of luciferases as molecular probes has lagged behind that of well-developed fluorescent protein toolkits for a number of reasons: (i) very few native luciferases have been identified; (ii) many of those that have been identified require multiple disulfide bonds to stabilize the structure and are therefore prone to misfolding in mammalian cells; (iii) most native luciferases do not recognize synthetic luciferins with more desirable photophysical properties; and (iv) multiplexed imaging to follow multiple processes in parallel using mutually orthogonal luciferase-luciferin pairs has been limited by the low substrate specificity of native luciferases.
In one aspect, the disclosure provides proteins having luciferase activity, comprising the secondary structure arrangement H1-L1-H2-L2-E1-L3-E2-L4-H3-L5-E3-L6-E4-L7-E5-L8-E6, wherein “H” is a helical domain, “L” is a loop domain, and “E” is a beta strand domain; wherein:
In various embodiments, residue 7 of the E5 domain is M; the E6 domain is at least 9, 10, 11, 12, or 13 amino acids in length and residue 5 of the E6 domain is V; residue 1 of the L5 domain is S; residue 7 of the E5 domain is M and residue 5 of the E6 domain is V; and/or residue 7 of the E5 domain is M, residue 5 of the E6 domain is V, and residue 1 of the L5 domain is S.
In other embodiments, the H2 domain is at least 5, 6, or 7 amino acids in length, the H3 domain is at least 9, 10, 11, 12, 13, or 14 amino acids in length, the E1 domain is at least 3 or 4 amino acids in length, the E2 domain is at least 3 or 4 amino acids in length, and/or the E4 domain is at least 8, 9, 10, 11, or 12 amino acids in length.
In one embodiment, 1, 2, 3, 4, or all 5 of the following are true:
In another embodiment, 1, 2, 3, 4, 5, or all 6 of the following are true:
In a further embodiment, the protein comprises an amino acid sequence at least 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:1-181, or SEQ ID NO:1-3. In another embodiment, the protein comprises the amino acid sequence of SEQ ID NO:4.
In one aspect, the disclosure provides proteins having luciferase activity, and comprising an amino acid sequence at least 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence of SEQ ID NO:1, wherein:
Residue 14 is Y, D, or E and residue 98 is H or N; and
Residue 18 is D or E and residue 65 is R.
In various embodiments, the protein comprises one or both of A96M and M110V substitutions relative to SEQ ID NO:1; both of A96M and M110V substitutions relative to SEQ ID NO:1; an R60S substitution relative to SEQ ID NO:1; and/or R60S, A96M, and M110V substitutions relative to SEQ ID NO:1. In other embodiments, the protein comprises an amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:1-3, or SEQ ID NO:1-181.
In another aspect, the disclosure provides a protein comprising the formula X1-Z1-X2-Z2-X3-Z3-X4-Z4-X5-Z5-X6-Z6-X7-Z7-X8-Z8, wherein:
In another embodiment, the disclosure provides self-complementing multipartite proteins having luciferase activity, comprising at least a first polypeptide component and a second polypeptide component, wherein the at least first polypeptide component and the second polypeptide component are not covalently linked, wherein in total the at least first polypeptide component and the second polypeptide component comprise domains X1-Z1-X2-Z2-X3-Z3-X4-Z4-X5-Z5-X6-Z6-X7-Z7-X8-Z8-X9, wherein each domain is as defined herein:
In a further embodiment, the disclosure provides self-complementing multipartite proteins having luciferase activity, comprising at least a first polypeptide component and a second polypeptide component, wherein the at least first polypeptide component and the second polypeptide component are not covalently linked, wherein in total the at least first polypeptide component and the second polypeptide component comprise the secondary structure arrangement H1-L1-H2-L2-E1-L3-E2-L4-H3-L5-E3-L6-E4-L7-E5-L8-E6, wherein each domain is as defined herein.
In a further embodiment, the disclosure provides proteins having luciferase activity, comprising the secondary structure arrangement H1-L1-H2-L2-E1-L3-E2-L4-H3-L5-E3-L6-E4-L7-E5-L8-E6, wherein “H” is a helical domain, “L” is a loop domain, and “E” is a beta strand domain; wherein:
The disclosure also provides fusion proteins comprising:
The disclosure further provides nucleic acids encoding the protein, polypeptide component, or fusion proteins of the disclosure; expression vectors comprising the nucleic acids operatively linked to a suitable control element; host cells comprising a protein, polypeptide component, fusion protein, nucleic acid, and/or expression vector of the disclosure; and kits comprising a protein, polypeptide component, fusion protein, nucleic acid, expression vector, and/or host cell of the disclosure, and instructions for their use. The disclosure also provides methods for use of a protein, polypeptide component, fusion protein, nucleic acid, expression vector, host cell, and/or kit of the disclosure.
All references cited are herein incorporated by reference in their entirety.
As used herein, the singular forms “a”, “an” and “the” include plural referents unless the context clearly dictates otherwise.
As used herein, the amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V).
In all embodiments of polypeptides disclosed herein, any N-terminal methionine residues are optional (i.e.: the N-terminal methionine residue may be present or may be deleted).
All embodiments of any aspect of the disclosure can be used in combination, unless the context clearly dictates otherwise.
Unless the context clearly requires otherwise, throughout the description and the claims, the words ‘comprise’, ‘comprising’, and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”. Words using the singular or plural number also include the plural and singular number, respectively. Additionally, the words “herein,” “above,” and “below” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of the application.
In a first aspect, the disclosure provides proteins having luciferase activity, comprising the secondary structure arrangement H1-L1-H2-L2-E1-L3-E2-L4-H3-L5-E3-L6-E4-L7-E5-L8-E6, wherein “H” is a helical domain, “L” is a loop domain, and “E” is a beta strand domain; wherein:
As disclosed in the examples that follow, the proteins of the disclosure are non-naturally occurring, have luciferase activity and share this recited secondary structure arrangement. The arrangement is shown with respect to the amino acid sequence of SEQ ID NO:1 in
Except as noted, the different domains may be any suitable length.
In one embodiment, residue 7 of the E5 domain is M. In another embodiment, the E6 domain is at least 9, 10, 11, 12, or 13 amino acids in length and residue 5 of the E6 domain is V. In a further embodiment, residue 1 of the L5 domain is S. In one embodiment, residue 7 of the E5 domain is M and residue 5 of the E6 domain is V. In a further embodiment, residue 7 of the E5 domain is M, residue 5 of the E6 domain is V, and residue 1 of the L5 domain is S. In another embodiment, the H2 domain is at least 5, 6, or 7 amino acids in length, the H3 domain is at least 9, 10, 11, 12, 13, or 14 amino acids in length, the E1 domain is at least 3 or 4 amino acids in length, the E2 domain is at least 3 or 4 amino acids in length, and the E4 domain is at least 8, 9, 10, 11, or 12 amino acids in length. In all these embodiments, one or more of the recited domains may independently include additional amino acid residues. In one embodiment, one or more of the domains may independently include an additional 1, 2, 3, 4, or 5 residues.
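By way of illustration only, the domain length and residue constraints recited above can be checked programmatically. The following is a minimal sketch, assuming that a candidate protein has already been divided into named domains. The function name check_domains, the dictionary layout, and all of the example domain sequences are hypothetical and are not taken from the Sequence Listing. The sketch applies the broadest recited minimum lengths and all three residue constraints at once, which corresponds to only one of the combinations of embodiments described above.

```python
# Minimum domain lengths, using the broadest value of each recited range.
MIN_LENGTHS = {"H2": 5, "H3": 9, "E1": 3, "E2": 3, "E4": 8, "E6": 9}

# Recited residue constraints: (domain, 1-based position) -> required residue.
RESIDUE_RULES = {("E5", 7): "M", ("E6", 5): "V", ("L5", 1): "S"}


def check_domains(domains):
    """Return a list of violations for a mapping of domain name -> sequence."""
    problems = []
    for name, min_len in MIN_LENGTHS.items():
        seq = domains.get(name, "")
        if len(seq) < min_len:
            problems.append(f"{name} has {len(seq)} residues, fewer than {min_len}")
    for (name, pos), required in RESIDUE_RULES.items():
        seq = domains.get(name, "")
        if len(seq) < pos or seq[pos - 1] != required:
            problems.append(f"residue {pos} of {name} is not {required}")
    return problems


# Hypothetical domain sequences, invented purely for illustration.
example = {
    "H2": "AEELLKK",
    "H3": "PEELAKLAEEALKR",
    "E1": "VTVE",
    "E2": "KVEV",
    "E4": "TVRVEVTYEDGR",
    "E5": "GKVTVEMTL",
    "E6": "RIVKVEVRYDGSL",
    "L5": "SG",
}
violations = check_domains(example)  # an empty list means every check passed
```

An empty result indicates that the candidate satisfies every constraint encoded in the two tables; narrower embodiments can be represented by raising the values in MIN_LENGTHS.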
In one embodiment:
The loop domains may be of any length and may include insertions, relative to the sequences exemplified herein, of any residues or functional domains as deemed appropriate, including but not limited to metal binding domains, drug binding domains, GPCR receptors, protein switches, and small molecule binding domains.
In another embodiment, the proteins comprise an amino acid sequence at least 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:1-3, wherein residues in parentheses are optional and may be present or may be deleted.
SEQ ID NO:1 is the LuxSit construct disclosed herein.
SEQ ID NO:2 is the LuxSit-i construct disclosed herein.
SEQ ID NO:3 is the LuxSit-f construct disclosed herein.
In each of the annotated sequences shown for SEQ ID NO:1-3:
In some embodiments of the proteins, 1, 2, 3, 4, or all 5 of the following is true:
In other embodiments of the proteins, the H2 domain is 7 amino acids in length, the H3 domain is 14 amino acids in length, the E1 domain is 4 amino acids in length, the E2 domain is 4 amino acids in length, and/or the E4 domain is 12 amino acids in length. In further embodiments, 1, 2, 3, 4, 5, or all 6 of the following are true:
In another embodiment, the protein comprises the amino acid sequence of SEQ ID NO:4.
In another embodiment, the proteins comprise an amino acid sequence at least 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:1-181, as shown in Table 1. SEQ ID NO:5-181 in Table 1 are re-designed amino acid sequences based on LuxSit-i (SEQ ID NO:2), with their luciferase activities shown.
In another aspect, the disclosure provides proteins having luciferase activity, comprising an amino acid sequence at least 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO:1, wherein:
The proteins of this aspect are non-naturally occurring. In one embodiment, the percent identity relative to the reference sequence is carried out by sequence alignment with the Needleman-Wunsch algorithm, a common sequence alignment tool for those of skill in the art, which allows for insertions and deletions.
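The percent identity determination described above can be sketched as follows. This is a minimal illustration of Needleman-Wunsch global alignment, not a definitive implementation. The scoring values (match +1, mismatch -1, linear gap -1) and the choice to divide the number of identical positions by the full alignment length are assumptions made for this sketch, since the disclosure does not specify them, and the short test sequences used with it are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Percent of alignment columns in which both sequences share a residue."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    identical = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)
```

For example, percent_identity("ACDEFGHIK", "ACDEFGIK") aligns the two hypothetical sequences with a single gap and counts 8 identical positions over 9 alignment columns. Established alignment tools typically use substitution matrices and affine gap penalties, so their reported identities may differ from this sketch.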
In one embodiment, the protein comprises one or both of A96M and M110V substitutions relative to SEQ ID NO:1. In another embodiment, the protein comprises an R60S substitution relative to SEQ ID NO:1. In a further embodiment, the protein comprises R60S, A96M, and M110V substitutions relative to SEQ ID NO:1. In another embodiment, any substitutions relative to SEQ ID NO:1 at residues F12, I35, W38, F49, V81, L83, V94, A97, W100, M110, and V112 are conservative amino acid substitutions. In one embodiment, the protein comprises an amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from SEQ ID NO:1-3. In another embodiment, the protein comprises an amino acid sequence at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from SEQ ID NO:1-181.
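The substitution notation used above (for example A96M, meaning that the alanine at position 96 of the reference sequence is replaced by methionine) can be applied mechanically to a reference sequence. The following is a minimal sketch. The function name apply_substitutions is hypothetical, positions are taken to be 1-based, and the six-residue sequence is an invented placeholder rather than SEQ ID NO:1.

```python
import re


def apply_substitutions(sequence, substitutions):
    """Apply substitutions written as <old residue><1-based position><new residue>."""
    residues = list(sequence)
    for sub in substitutions:
        parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if parsed is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        old, pos, new = parsed.group(1), int(parsed.group(2)), parsed.group(3)
        if pos < 1 or pos > len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        # Guard against applying a substitution to the wrong reference.
        if residues[pos - 1] != old:
            raise ValueError(
                f"expected {old} at position {pos}, found {residues[pos - 1]}"
            )
        residues[pos - 1] = new
    return "".join(residues)


# Invented placeholder sequence, used only to show the notation.
toy_reference = "MKRAEL"
toy_variant = apply_substitutions(toy_reference, ["R3S", "A4M"])
```

The check on the original residue is a deliberate design choice: it rejects a substitution list that was numbered against a different reference, such as a sequence with or without the optional N-terminal methionine.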
In another embodiment, the proteins comprise the formula X1-Z1-X2-Z2-X3-Z3-X4-Z4-X5-Z5-X6-Z6-X7-Z7-X8-Z8, wherein:
In one embodiment, 1, 2, 3, 4, 5, 6, 7, or all 8 of the following are true:
In one embodiment of all of the proteins of the disclosure, amino acid substitutions relative to the reference protein are conservative amino acid substitutions. As used herein, “conservative amino acid substitution” means a given amino acid can be replaced by a residue having similar physicochemical characteristics, e.g., substituting one aliphatic residue for another (such as Ile, Val, Leu, or Ala for one another), or substitution of one polar residue for another (such as between Lys and Arg; Glu and Asp; or Gln and Asn). Other such conservative substitutions, e.g., substitutions of entire regions having similar hydrophobicity characteristics, are known. Proteins comprising conservative amino acid substitutions can be tested in any one of the assays described herein to confirm that a desired activity is retained. Amino acids can be grouped according to similarities in the properties of their side chains (in A. L. Lehninger, in Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H). Alternatively, naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe. Non-conservative substitutions will entail exchanging a member of one of these classes for another class.
Particular conservative substitutions include, for example: Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu.
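As an illustration of the class-based definition, the sketch below treats a substitution as conservative when both residues fall in the same one of the four side-chain groups attributed above to Lehninger. It encodes only that first grouping; the alternative six-class grouping and the list of particular substitutions recited above would need their own lookup tables and can give different answers for some residue pairs. The names SIDE_CHAIN_GROUPS, group_of, and is_conservative are hypothetical.

```python
# The four side-chain groups recited above, by one-letter code.
SIDE_CHAIN_GROUPS = {
    "non-polar": set("AVLIPFWM"),
    "uncharged polar": set("GSTCYNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def group_of(residue):
    """Return the name of the side-chain group containing a residue."""
    for name, members in SIDE_CHAIN_GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"not a standard amino acid code: {residue}")


def is_conservative(old, new):
    """True when both residues belong to the same side-chain group."""
    return group_of(old) == group_of(new)
```

Under this grouping, is_conservative("K", "R") and is_conservative("D", "E") are true, while is_conservative("A", "D") is false.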
In another embodiment, the disclosure provides self-complementing multipartite proteins having luciferase activity, comprising at least a first polypeptide component and a second polypeptide component, wherein the at least first polypeptide component and the second polypeptide component are not covalently linked, wherein in total the at least first polypeptide component and the second polypeptide component comprise domains X1-Z1-X2-Z2-X3-Z3-X4-Z4-X5-Z5-X6-Z6-X7-Z7-X8-Z8-X9, wherein each domain is as defined above; and
The split proteins are non-naturally occurring. The split proteins comprise at least a first polypeptide component and a second polypeptide component in which X domains are preserved while split points are taken only in the Z domains. In other words, each X domain (X1, X2, X3, X4, X5, X6, X7, X8, and X9) is fully present within one polypeptide component of the at least first polypeptide component and the second polypeptide component, while the protein is split into separate components at a Z domain (of Z1, Z2, Z3, Z4, Z5, Z6, Z7, and Z8), wherein the Z domain at which the split occurs may be absent, or may be partially present in one or both of the first and second polypeptide components. By way of non-limiting example, in various embodiments of a split luciferase protein, the first polypeptide component and the second polypeptide component may comprise components as exemplified in Table 2.
In various embodiments, the split may occur at Z4, Z5, Z6, or Z7.
In another embodiment, the disclosure provides self-complementing multipartite proteins having luciferase activity, comprising at least a first polypeptide component and a second polypeptide component, wherein the at least first polypeptide component and the second polypeptide component are not covalently linked, wherein in total the at least first polypeptide component and the second polypeptide component comprise the secondary structure arrangement H1-L1-H2-L2-E1-L3-E2-L4-H3-L5-E3-L6-E4-L7-E5-L8-E6, wherein each domain is as defined above;
In this embodiment, the split proteins comprise at least a first polypeptide component and a second polypeptide component in which H and E domains are preserved while split points are taken only in the L domains. In various embodiments, the split occurs at L4, L5, L6, L7, or L8.
The split proteins of these embodiments are only active when they are brought together, and thus are conditionally active.
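The splitting scheme described above, in which helical and strand domains remain intact and the split is taken within a loop domain that may be absent or partially retained, can be sketched as follows. This is an illustrative sketch only. The function name split_at_loop, its keep_n and keep_c parameters, and every domain sequence in the example are hypothetical and do not correspond to any sequence of the disclosure.

```python
# Domain order of the recited secondary structure arrangement.
ORDER = ["H1", "L1", "H2", "L2", "E1", "L3", "E2", "L4", "H3",
         "L5", "E3", "L6", "E4", "L7", "E5", "L8", "E6"]


def split_at_loop(domains, loop, keep_n=0, keep_c=0):
    """Split a domain map into two components at a loop domain.

    keep_n residues from the start of the loop stay with the first component,
    keep_c residues from the end of the loop stay with the second component,
    and any remaining loop residues are dropped.
    """
    if not loop.startswith("L"):
        raise ValueError("split points are only taken in loop domains")
    loop_seq = domains[loop]
    if keep_n < 0 or keep_c < 0 or keep_n + keep_c > len(loop_seq):
        raise ValueError("cannot keep more loop residues than the loop contains")
    idx = ORDER.index(loop)
    first = "".join(domains[d] for d in ORDER[:idx]) + loop_seq[:keep_n]
    second = loop_seq[len(loop_seq) - keep_c:] + "".join(
        domains[d] for d in ORDER[idx + 1:]
    )
    return first, second


# Invented placeholder domains; each domain repeats one letter for readability.
toy_domains = {
    "H1": "AAAA", "L1": "GG", "H2": "EEEE", "L2": "GS", "E1": "VVV",
    "L3": "NG", "E2": "TTT", "L4": "GD", "H3": "KKKK", "L5": "SGN",
    "E3": "III", "L6": "DG", "E4": "YYY", "L7": "PG", "E5": "FFF",
    "L8": "GT", "E6": "WWW",
}
first_part, second_part = split_at_loop(toy_domains, "L5", keep_n=1, keep_c=1)
```

With the default keep_n=0 and keep_c=0 the loop at the split point is absent from both components; nonzero values model a loop that is partially present in one or both components.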
In another embodiment the disclosure provides fusion proteins comprising:
As used herein, a “functional domain” is any polypeptide that can be usefully fused to the luciferase protein or split protein component of the disclosure. By way of non-limiting examples, the one or more additional functional domains may comprise a diagnostic polypeptide, any protein that one might want to localize within a cell, tissue, or organism; etc.
In another aspect the disclosure provides nucleic acids encoding the protein, protein component, or fusion protein of any embodiment or combination of embodiments of the disclosure. The nucleic acid sequence may comprise single stranded or double stranded RNA (such as an mRNA) or DNA in genomic or cDNA form, or DNA-RNA hybrids, each of which may include chemically or biochemically modified, non-natural, or derivatized nucleotide bases. Such nucleic acid sequences may comprise additional sequences useful for promoting expression and/or purification of the encoded polypeptide, including but not limited to polyA sequences, modified Kozak sequences, and sequences encoding epitope tags, export signals, and secretory signals, nuclear localization signals, and plasma membrane localization signals. It will be apparent to those of skill in the art, based on the teachings herein, what nucleic acid sequences will encode the polypeptides of the disclosure. In various non-limiting embodiments, the nucleic acid may comprise the nucleotide sequence of any one of SEQ ID NO:200-380, wherein residues in parentheses are optional and may be present or absent.
In a further aspect, the disclosure provides expression vectors comprising the nucleic acid of any aspect of the disclosure operatively linked to a suitable control sequence. “Expression vector” includes vectors that operatively link a nucleic acid coding region or gene to any control sequences capable of effecting expression of the gene product. “Control sequences” operably linked to the nucleic acid sequences of the disclosure are nucleic acid sequences capable of effecting the expression of the nucleic acid molecules. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences and the promoter sequence can still be considered “operably linked” to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, termination signals, and ribosome binding sites. Such expression vectors can be of any type, including but not limited to plasmid and viral-based expression vectors. The control sequence used to drive expression of the disclosed nucleic acid sequences in a mammalian system may be constitutive (driven by any of a variety of promoters, including but not limited to, CMV, SV40, RSV, actin, EF) or inducible (driven by any of a number of inducible promoters including, but not limited to, tetracycline, ecdysone, steroid-responsive). The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In various embodiments, the expression vector may comprise a plasmid, viral-based vector, or any other suitable expression vector.
In another aspect, the disclosure provides host cells that comprise the nucleic acids, expression vectors (i.e.: episomal or chromosomally integrated), non-naturally occurring polypeptides, fusion protein, or compositions disclosed herein, wherein the host cells can be either prokaryotic or eukaryotic. The cells can be transiently or stably engineered to incorporate the nucleic acids or expression vector of the disclosure, using techniques including but not limited to bacterial transformations, calcium phosphate co-precipitation, electroporation, or liposome mediated-, DEAE dextran mediated-, polycationic mediated-, or viral mediated transfection.
The disclosure also provides kits, comprising:
In one embodiment, the kits further comprise diphenylterazine (DTZ).
In another aspect, the disclosure provides methods for use of the protein, polypeptide component, fusion protein, nucleic acid, expression vector, host cell, and/or kit of any preceding claim for any suitable purpose, including but not limited to use in luminescent reporting assays, diagnostic assays, cellular localization of targets of interest, cellular imaging, gene editing, live animal imaging, cancer labeling, CAR-T cell reporting, secreted assays, gene delivery, tissue engineering, etc. Additional details can be found in the examples.
In another aspect, the disclosure provides methods for making a luciferase, comprising de novo design using the methods of any embodiment disclosed herein, starting with the protein comprising the amino acid sequence of SEQ ID NO:381. The examples provide detailed methods for de novo design of luciferases for DTZ. As described in the examples, the methods involve designing a shape complementary catalytic site that stabilizes the anionic state of DTZ and lowers the SET energy barrier, assuming that the downstream dioxetane light emitter thermolysis steps are spontaneous. To stabilize the anionic species of DTZ, we focused on the placement of the positively charged guanidinium group of an arginine residue to interact with the anionic imidazopyrazinone core. To computationally design such active sites, the inventors first used AIMNet to generate an ensemble of anionic DTZ conformers (
De novo enzyme design has sought to introduce active sites and substrate binding pockets predicted to catalyze a reaction of interest into geometrically compatible native scaffolds1,2, but has been limited by a lack of suitable protein structures and the complexity of native protein sequence-structure relationships. Here we describe a deep-learning based “family-wide hallucination” approach that generates large numbers of idealized protein structures containing diverse pocket shapes and designed sequences that encode them. We use these scaffolds to design artificial luciferases that selectively catalyze the oxidative chemiluminescence of the synthetic luciferin substrates, diphenylterazine (DTZ); through the placement of an arginine guanidinium group adjacent to an anion species that develops during the reaction in a high shape complementarity binding pocket. For both luciferin substrates, we obtain designed luciferases with high selectivity; the most active of these is a small (13.9 kDa) and thermostable (Tm > 95 °C) enzyme with a catalytic efficiency on DTZ (kcat/KM = 10^6 M^-1 s^-1) comparable to native luciferases but with much higher substrate specificity. The design of highly active and specific biocatalysts from scratch with broad applications in biomedicine is an important milestone for computational enzyme design, and our approach should enable the design of a wide range of new and useful luciferases and other enzymes.
Bioluminescent light produced by the enzymatic oxidation of a luciferin substrate is widely used for bioassays and imaging in biomedical research. Because no excitation light source is needed, luminescent photons are produced in the dark which results in higher sensitivity than fluorescence imaging in live animal models and in biological samples where autofluorescence or phototoxicity is a concern4,5. However, the development of luciferases as molecular probes has lagged behind that of well-developed fluorescent protein toolkits for a number of reasons: (i) very few native luciferases have been identified; (ii) many of those that have been identified require multiple disulfide bonds to stabilize the structure and are therefore prone to misfolding in mammalian cells; (iii) most native luciferases do not recognize synthetic luciferins with more desirable photophysical properties; and (iv) multiplexed imaging to follow multiple processes in parallel using mutually orthogonal luciferase-luciferin pairs has been limited by the low substrate specificity of native luciferases.
We sought to use de novo protein design to create new luciferases that are small, highly stable, well-expressed in cells, specific for one substrate, and need no cofactors to function. As we are not constrained to natural luciferase substrates, we chose a synthetic luciferin, diphenylterazine (DTZ), as the target substrate due to its good quantum yield, red-shifted emission3, favorable in vivo pharmacokinetics14,15, and lack of required cofactors for emission. Previous computational enzyme design studies have primarily repurposed native protein scaffolds in the PDB, but there are few native structures with binding pockets appropriate for DTZ, and the effects of sequence changes on native proteins can be unpredictable. To circumvent these limitations, we set out to generate large numbers of ideal protein scaffolds with pockets of the appropriate size and shape for DTZ, and with clear sequence-structure relationships to facilitate subsequent active site incorporation. To identify protein folds capable of hosting such pockets, we first docked DTZ into 4000 native small molecule binding proteins. We found that many NTF2 (nuclear transport factor 2)-like folds have binding pockets with appropriate shape complementarity and size for DTZ placement (
Native NTF2 structures have a range of pocket sizes and shapes but also contain non-ideal features such as long loops which compromise stability. To create large numbers of ideal NTF2-like structures, we developed a deep-learning based “family-wide hallucination” approach that integrates unconstrained de novo design19,21 and fixed backbone sequence design approaches21 to enable the generation of an essentially unlimited number of proteins having a desired fold (
We chose the NTF2 scaffold of SEQ ID NO:381 from which to design luciferases for DTZ, as described in detail below.
Standard computational enzyme design generally starts from an ideal active site or theozyme consisting of protein functional groups surrounding the reaction transition state that is then extrapolated into a set of existing scaffolds1,2. However, the detailed mechanism of native marine luciferases is not well defined as only a handful of apo-structures and no holo-structures have been solved24,25 (excluding calcium-regulated photoproteins). Both quantum chemistry calculations26,27 and experimental data28,29 suggest that the chemiluminescent reaction proceeds through an anionic species and that the polarity of the surroundings can substantially alter the free energy of the subsequent single electron transfer (SET) process with triplet molecular oxygen (³O₂). Guided by these data (
To computationally design such active sites into large numbers of hallucinated NTF2 scaffolds, we first used AIMNet30 to generate an ensemble of anionic DTZ conformers (
Oligonucleotides encoding the two halves of each design were assembled into full-length genes and cloned into an E. coli expression vector (see Methods). A colony-based screening method was used to directly image active luciferase colonies from the library and the activities of selected clones were confirmed using a 96-well plate expression (
To better understand the contributions to catalysis of LuxSit, the most active of our luciferase designs, we constructed a site saturation mutagenesis (SSM) library in which every mutation was made at every pocket residue one at a time (see Methods).
The most active catalysts, LuxSit-f and LuxSit-i were both expressed solubly in E. coli at high levels and are monomeric (some dimerization was observed at the high protein concentration,
As luciferases are commonly used genetic tags and reporters for the study of cellular functions, we evaluated the expression and function of LuxSit-i in live mammalian cells. LuxSit-i-mTagBFP2-expressing HEK293T cells had DTZ specific luminescence (
The high substrate specificity of LuxSit-i might allow multiplexing of luminescent reporters through substrate-specific or spectrally resolved luminescent signals (
Computational enzyme design to date has been constrained by the number of available scaffolds, which limits the extent to which catalytic configurations and enzyme-substrate shape complementarity can be achieved16-18. The use of deep-learning to generate large numbers of de novo designed scaffolds here eliminates this restriction; moving forward, the more accurate RoseTTAfold™39 and AlphaFold2™33 should enable still more effective protein scaffold generation by leveraging family-wide hallucination abilities. The diversity of scaffold pocket shapes and sizes enabled the exploration of a range of catalytic geometries and the maximization of substrate-enzyme shape complementarity; to our knowledge, no native luciferases have folds similar to LuxSit, and the two enzymes have high specificity for fully synthetic luciferin substrates that do not exist in nature. With the incorporation of 2-3 substitutions that provide a more complementary pocket to stabilize the transition state, LuxSit-i has higher activity than any previously de novo designed enzyme; the kcat/KM of 10^6 M^-1 s^-1 is in the range of native luciferases. This is a notable advance for computational enzyme design, as tens of rounds of directed evolution were required to obtain catalytic proficiencies in this range for a designed retroaldolase, and the structure was remodeled considerably40; in contrast, the predicted differences in ligand-sidechain interactions between LuxSit and LuxSit-i are very subtle. Achieving such high activities directly from the computer remains an outstanding goal for computational enzyme design. The small size of LuxSit makes it well suited as a genetic tag for capacity-limited viral vectors, biosensor development, and fusions to proteins of interest. On the basic science side, the small size, simplicity, and high activity make LuxSit-i an excellent model system for computational and experimental studies aimed at improving understanding of the luciferase catalytic mechanism.
Extension of the approach used here to create similarly specific new luciferases for synthetic luciferin substrates beyond DTZ and h-CTZ would considerably extend the multiplexing opportunities illustrated in
a Mean ± s.d., n = 3 (technical triplicates). LQY (luminescent quantum yield) measurements were performed by consuming 125 pmol of DTZ (w/LuxSit-i, and LuxSit-f) or CTZ (w/RLuc) in 50 µL PBS with 50 nM corresponding recombinant luciferases.
b All LQY values were estimated relative to the reported quantum yield of RLuc42. All values were calculated by the assumptions of the simplest Michaelis-Menten kinetics model, excluding potential substrate/product inhibition or enzyme modification during the reaction43,44.
Synthetic genes and oligonucleotides were purchased from Integrated DNA Technologies or GenScript. The synthetic gene was inserted between NdeI and XhoI sites of a pET29b+ vector, containing an N-terminal hexahistidine tag followed by a TEV protease cleavage site and a C-terminal stop codon. Restriction endonucleases, Q5 PCR polymerase, and T4 ligase were purchased from NEB. Plasmid DNA, PCR products, or digested fragments were purified by Qiagen DNA purification kits. DNA sequences were analyzed by Genewiz. Coelenterazine (CTZ) was purchased from Gold Biotechnology. Diphenylterazine (DTZ), pyridyl diphenylterazine (8pyDTZ), and Furimazine (FRZ) were purchased from MedChemExpress. All other coelenterazine analogs (bis-CTZ: bisdeoxycoelenterazine; f-CTZ: f-Coelenterazine; e-CTZ: e-Coelenterazine-F; PP-CTZ: methoxy e-Coelenterazine; v-CTZ: v-Coelenterazine). All other chemicals were purchased from Sigma-Aldrich or Fisher Scientific and used without further purification. To identify the molecular mass of each protein, intact mass spectra were obtained via reverse-phase LC/MS on an Agilent 6230B TOF with an AdvanceBio RP-Desalting column and subsequently deconvoluted by Bioconfirm software using a total entropy algorithm. An AKTA pure M with UNICORN 6.3.2 Workstation control (GE Healthcare) coupled with a Superdex™ 75 Increase 10/300 GL column was used for size exclusion chromatography. DNA and protein concentrations were determined by a NanoDrop™ small-volume 8-channel UV/vis spectrometer. CD spectra and CD melting experiments were performed with the default settings on a J-1500 Circular Dichroism Spectropolarimeter (Jasco). All luminescence measurements were acquired by a Biotek Synergy Neo2™ Multi-Mode Plate Reader.
To convert relative light units (RLU) to the number of photons, the Neo2 plate reader was calibrated by determining the chemiluminescence of luminol with known quantum yield in the presence of horseradish peroxidase and hydrogen peroxide in K2CO3 aqueous solution, as previously described45. SDS-PAGE and luminescence images were captured by a Bio-Rad ChemiDoc™ XRS+. Images were analyzed using the Fiji image analysis software.
The Lemo21(DE3) strain was used for transformation with the pET29b+ plasmid encoding the gene of interest. Transformed cells were grown for 12 h in TB medium supplemented with kanamycin. Cells were inoculated at a 1:50 ratio in 100 mL fresh TB medium, grown at 37° C. for 4 h, and then induced by IPTG for an additional 18 h at 16° C. Cells were harvested by centrifugation at 4,000 g for 10 min and resuspended in 30 mL lysis buffer (20 mM Tris-HCl pH 8.0, 300 mM NaCl, 30 mM imidazole, and Pierce™ Protease Inhibitor Tablets). Cell resuspensions were lysed by sonication for 5 min (10 s per cycle). Lysates were clarified by centrifugation at 24,000 g at 12° C. for 40 min and pre-equilibrated with 1 mL of Ni-NTA nickel agarose at 4° C. for 1 h. The resin was washed twice with 10 mL wash buffer and then eluted in 1 mL elution buffer (20 mM Tris-HCl pH 8.0, 300 mM NaCl, 300 mM imidazole). The eluted proteins were purified by size exclusion chromatography in PBS. Fractions were collected based on the A280 trace, snap-frozen in liquid nitrogen, and stored at −80° C.
Our generation of idealized NTF2-scaffolds can be divided into four parts: (3.1) Generation of seed-structures, (3.2) optimization of backbone geometries using trRosetta™-based hallucination, (3.3) generation of structure-conditioned sequence models to bias design, (3.4) design and filtering.
We sought to increase the set of NTF2 structures by complementing experimentally resolved structures from the PDB with highly accurate models generated by trRosetta™22. To achieve this, we first collected 85 NTF2-like protein structures from the PDB based on SCOPe annotation (d.17.4, SCOPe v2.05). Corresponding sequences were then used as queries to collect sequence homologs from UniProt™ by performing 8 iterations of hhblits at an e-value cutoff of 1e-20 against the uniclust30_2018_08 database; default filtering cutoffs were relaxed (-maxfilt 100000000 -neffmax 20 -nodiff -realign_max 10000000) to maximize the number of output hits. All hits were redundancy-reduced using cd-hit46 with a sequence identity cutoff of 60%, yielding a set of 7,573 candidates for modeling.
To generate inputs for structure modeling with trRosetta™, we built multiple sequence alignments (MSAs) for each of the 7,573 selected sequences with hhblits using a more conservative e-value cutoff of 1e-50; the resulting MSAs were also complemented by hits from hmmsearch against uniref100 (release-2019_11) with the bit-score threshold of 115 (i.e. ˜1 bit per position). After joining the above two sets of alignments and filtering them at 90% sequence identity and 75% coverage cutoffs, only sequences with more than 50 homologs in the corresponding MSAs were retained for modeling (2,005 sequences). The filtered MSAs along with information on the top 25 putative structural homologs as identified by hhsearch against the PDB100 database of templates were used as inputs to the template-aware version of trRosetta™47 to predict residue pair distances and orientations. Network predictions were then used to reconstruct full atom 3D structure models using a Rosetta™-based folding protocol described previously22.
Seeking to idealize the native structure seeds, we reasoned that trRosetta™, a convolutional residual neural network, which predicts residue-residue orientations and distances from sequence, could serve as a key component in a protein idealizer. Previously, this network has been used to generate diverse proteins that resemble the “ideal” structures of de novo designed proteins by changing the protein sequence to optimize the contrast (KL-divergence) between the predicted geometry and that of randomly generated sequences19.
For our purpose, the desired fold-space is not diverse but instead focused on the NTF2-like topology. To guarantee generation of ideal structures within this fold-space, we implemented a new fold-specific loss-function, which biased hallucinations based on observed geometries in native crystal structures. As many experimentally characterized NTF2s contain non-ideal regions, we began by creating a set (χ) of trimmed but ideal NTF2s by manually removing non-ideal structural elements such as kinked helices, and long or rarely observed loops. For each seed structure, we then used a structure-based sequence alignment method (see 3.3) to find equivalent positions between the seed structure and χ. Residue pairs were considered to be in a conserved tertiary motif (TERM) if there were 5 or more equivalent positions in χ. The smooth probability distributions based on observed geometries in χ were then computed. For distances we used a Gaussian distribution with mean equal to the true distance, denoted by D, and standard deviation, denoted by σ, equal to 0.5 Å. The probability density function for distances d is given by:
ƒ(d; D, σ)=[1/(σ√(2π))] exp[−(d−D)²/(2σ²)]
Using this density function one can construct a categorical distribution for binned distances by evaluating this function at the centers of the bins and then normalizing by the sum of the values over all bins. Similarly, a von Mises distribution was used for omega angle smoothing with probability density function given by ƒ(ω; Ω, κ)=N(κ) exp[κ cos(ω−Ω)] where N(κ) is a normalizing constant, Ω is the crystal value, κ is the inverse variance chosen to be 100, and ω is the smoothed angle. For phi and theta angles a von Mises-Fisher blur is given by ƒ(x; μ, κ)=N(κ) exp[κ μᵀx] where N(κ) is a normalizing constant, μ is a unit vector on a 3D sphere corresponding to the phi and theta angles from the crystal structure, x is a smoothed unit vector, and κ is the inverse variance chosen to be 100.
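The smoothing step described above can be sketched as follows. This is only an illustration under stated assumptions: the bin layout (36 distance bins of 0.5 Å between 2 and 20 Å, 24 omega bins of 15 degrees) and the example values of D and Ω are assumptions made for the sketch and are not taken from the text.

```python
import math

def gaussian_pdf(d, D, sigma=0.5):
    """Gaussian density with mean D (observed distance) and s.d. sigma, both in angstroms."""
    return math.exp(-0.5 * ((d - D) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def von_mises_unnormalized(omega, Omega, kappa=100.0):
    """Proportional to exp[kappa*cos(omega - Omega)]; N(kappa) cancels once bins are normalized.
    The constant shift by -kappa only keeps the exponent small."""
    return math.exp(kappa * (math.cos(omega - Omega) - 1.0))

def to_categorical(density, centers):
    """Evaluate a density at the bin centers and normalize by the sum over bins."""
    values = [density(c) for c in centers]
    total = sum(values)
    return [v / total for v in values]

# Assumed binning (hypothetical, for illustration only).
DIST_CENTERS = [2.25 + 0.5 * k for k in range(36)]                   # angstroms
OMEGA_CENTERS = [math.radians(-172.5 + 15.0 * k) for k in range(24)]  # radians

# Example observed geometry for one residue pair (made-up values).
dist_probs = to_categorical(lambda d: gaussian_pdf(d, D=8.1), DIST_CENTERS)
omega_probs = to_categorical(
    lambda w: von_mises_unnormalized(w, Omega=math.radians(50.0)), OMEGA_CENTERS
)
```

With σ = 0.5 Å and 0.5 Å bins, most of the probability mass falls in the two or three bins nearest the observed distance, which gives a soft rather than one-hot target.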
Next, we converted those probability distributions to energy landscapes (i.e., negative log likelihoods) and sought to minimize the expected energy. This soft restraint encouraged the network to seek out the consensus structure, while still allowing deviations where needed. Specifically, we formulated the fold-specific loss as:
where p is the network prediction and s is the smoothed probability distribution of the conserved residue pairs. For the second part of the loss function and similar to previous work19, we sought to maximize the Kullback-Leibler (KL) divergence between the predicted probability distribution and a background distribution for all i,j residue pairs not in a TERM.
where b is the background distribution and Nx is the number of bins in each probability distribution (Nd=37, Nω,θ=25, Nφ=13). Briefly, b is calculated by a network of similar architecture to trRosetta™ trained on the same training data, except it is never given sequence information as an input. The final loss is given by:
We used a Markov Chain Monte Carlo (MCMC) procedure to search for sequences that trRosetta™ predicted to fold into structures that minimize this loss function. We allowed four types of moves with different sampling probabilities: mutations (p=0.55), insertions (p=0.15), deletions (p=0.15), and moving segments (p=0.15). Mutations randomly changed one amino acid to another, with an equal transition probability for all 20 amino acids. Insertions inserted a new amino acid (all equally likely) into a random location subject to the KL-divergence loss. Deletions deleted a random residue from the same locations. Finally, we also allowed “segments” to move, cutting and pasting themselves from one part of the sequence to another, while maintaining the same overall segment order. Here, a “segment” is a continuous stretch of amino acids all subject to the fold-specific loss, often composed of a single strand or helix. Starting from a random sequence of an initial length (typically 120 amino acids), we used the standard Metropolis criterion to accept or reject moves:
A_i=min{1, exp[−(L_i−L_{i−1})/T]}
where A_i is the probability of accepting the move at step i, L_i is the loss at the current step, L_{i−1} is the loss at the previous step, and T is the temperature. The temperature started at 0.2 and was halved every 5,000 steps. Generally, it took 30,000 steps to converge.
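A minimal sketch of the sampling loop described above, assuming the standard Metropolis acceptance rule. The callables `propose` and `loss_fn` are hypothetical placeholders: in the procedure described here the loss would come from trRosetta™ predictions, and the move implementations would respect the TERM and segment rules, neither of which is reproduced in this sketch.

```python
import math
import random

# Move types and sampling probabilities as stated in the text.
MOVE_PROBS = {"mutation": 0.55, "insertion": 0.15, "deletion": 0.15, "segment_move": 0.15}

def temperature(step, t0=0.2, half_every=5000):
    """Temperature starts at t0 and is halved every `half_every` steps."""
    return t0 * 0.5 ** (step // half_every)

def acceptance_probability(loss_new, loss_old, temp):
    """Metropolis rule: always accept improvements, otherwise accept with exp(-dL/T)."""
    if loss_new <= loss_old:
        return 1.0
    return math.exp(-(loss_new - loss_old) / temp)

def choose_move(rng):
    """Draw one move type according to MOVE_PROBS."""
    moves, weights = zip(*MOVE_PROBS.items())
    return rng.choices(moves, weights=weights, k=1)[0]

def run_mcmc(initial, propose, loss_fn, n_steps=30000, seed=0):
    """Generic loop; `propose(seq, move, rng)` and `loss_fn(seq)` are user-supplied (hypothetical)."""
    rng = random.Random(seed)
    current, current_loss = initial, loss_fn(initial)
    for step in range(n_steps):
        candidate = propose(current, choose_move(rng), rng)
        candidate_loss = loss_fn(candidate)
        accept = acceptance_probability(candidate_loss, current_loss, temperature(step))
        if rng.random() < accept:
            current, current_loss = candidate, candidate_loss
    return current, current_loss
```

With this schedule the temperature falls from 0.2 to about 0.006 by step 25,000, so late in a 30,000-step run the search accepts almost only improving moves.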
Given the complexity of the NTF2-like protein fold, we hypothesized that it was necessary to impose sequence design rules to disfavor alternative states (negative design). Towards this end, we computed a structure-conditioned multiple sequence alignment based on native NTF2-like proteins. Specifically, we used TMalign48 to superimpose each of the 2,005 predicted native structures (from 3.1) onto each hallucinated backbone (from 3.2). Next, to find structurally corresponding positions, we implemented a structure-based dynamic programming algorithm, similar to the Needleman-Wunsch algorithm49. However, instead of using amino acid similarity as the scoring metric, we used a tunable structure-based score function. After aligning the two structures, we scored the structural similarity of any two residues by empirically weighting several metrics: (1) the distance between Cα atoms, (2) the differences between backbone torsion angles (phi and psi), and (3) the angle (degrees) between the vectors pointing from Cα to Cβ in each residue. To calculate the unweighted score for each component, we normalized each by a maximum possible value (180 degrees for angles and 10 Å for distances) and included a “set point” that approximately delineated when we judged a metric to indicate two residues to be more similar than not. Values above this set point are positive, indicating that two residues are similar, and values below the set point indicate that two residues are dissimilar.
Each value was scaled by its normalized weight and summed to give an overall similarity score between any two amino acids.
These similarity scores were used as the similarity metric in our dynamic programming algorithm, in place of the typical BLOSUM62 similarity metric. We used a gap penalty of 0.1 and an extension penalty of 0.0. Finally, after concatenating all the structure-conditioned aligned sequences, we used PSI-BLAST-exB50,51 to compute sequence redundancy weighted log-odds scores for each amino acid at each position (position-specific scoring matrices, PSSMs).
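The alignment step can be illustrated with a small affine-gap global alignment (Gotoh-style dynamic programming) that takes a precomputed residue-residue similarity matrix. This is a sketch under stated assumptions: the component formula, weights, and set points in `residue_similarity` are hypothetical, since the exact scoring expression is not given in the text; only the normalization maxima (10 Å, 180 degrees) and the gap penalties (0.1 to open, 0.0 to extend) come from the description above. Only the optimal score is returned; traceback is omitted for brevity.

```python
NEG_INF = float("-inf")

def component_score(value, max_value, set_point):
    """Assumed form: (1 - normalized deviation) - set_point, positive when similar."""
    return (1.0 - min(value, max_value) / max_value) - set_point

def residue_similarity(ca_dist, torsion_diff, cb_angle,
                       weights=(0.5, 0.25, 0.25), set_points=(0.7, 0.7, 0.7)):
    """Weighted sum of the three components; weights and set points are hypothetical."""
    parts = (
        component_score(ca_dist, 10.0, set_points[0]),        # angstroms
        component_score(torsion_diff, 180.0, set_points[1]),  # degrees
        component_score(cb_angle, 180.0, set_points[2]),      # degrees
    )
    total_w = sum(weights)
    return sum(w / total_w * p for w, p in zip(weights, parts))

def align_score(sim, gap_open=0.1, gap_extend=0.0):
    """Best global alignment score; sim[i][j] scores residue i of A against residue j of B."""
    n = len(sim)
    m = len(sim[0]) if n else 0
    # M: i and j aligned; X: residue i of A faces a gap; Y: residue j of B faces a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = sim[i - 1][j - 1] + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

Because the extension penalty is zero, a long insertion costs the same as a single-residue gap, which suits structure-based alignment where whole loops may be present in one protein and absent in the other.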
To design the resulting backbones, we sought, in addition to the sequence patterns captured in the PSSM (3.3), to further specify the backbone conformation and functionalize the pocket, by installing entire hydrogen bonding networks from native NTF2-like proteins. We compiled two sets of hydrogen bonding networks: a set for the cavity containing 85 networks and another set of networks connecting the C-terminal region of the first helix with the third beta-strand containing 25 networks. In 20 independent attempts for each backbone, we randomly grafted a network from each set, fixed the identities of hydrogen bonding residues, and designed the sequences for all other positions under PSSM constraints. The resulting models were filtered for various backbone quality metrics and for maintenance of hydrogen bonding networks in the absence of constraints, resulting in a total of 1615 idealized scaffolds.
The hierarchical search framework of RifDock is a powerful way to search through 6-dimensional rigid body orientations. While originally designed to work with physics-based forcefields, the scoring machinery can easily be modified for other purposes. A system called “Tuning Files” was added that allows one to tune the energetics of RifDock by “requiring” specific interactions. Specified interactions can range from specific hydrogen bonds, to specific bidentates, and even to specific hydrophobic interactions. Specifically, during the RifGen stage, each stored rotamer is compared against a list of definitions in the Tuning File. If the rotamer satisfies a definition, it is stored into the RIF with a “Requirement Number”. Later, during RifDock, these Requirement Numbers are available during scoring, and the presence or absence of certain rotameric interactions may be used to penalize or even completely discard dock solutions. In this work, the Tuning Files were used to require the specific hydrogen bond interactions between the arginine and the secondary amine in the pyrazine ring of the coelenterazine-like substrate.
5. Designing Theozyme Architectures into De Novo NTF2 Scaffolds
De novo design of luciferases can be divided into three main steps—scaffold construction, substrate placement with required interactions, and sequence design. With the idealized NTF2-like scaffolds in hand, we selected 5 diverse rotamers from AIMNet and used the Rotamer Interaction Field (RIF) docking method31 to exhaustively search a large space of interacting side chains to the anionic form of DTZ. Chemically, deprotonation of N1 hydrogen is the first step to forming an anionic species (
RifDock was then used to hierarchically search for the best combination of RIF to place on the input backbone. Although the negative charge can move to another electronegative atom, O1, via resonance of the imidazopyrazinone core, it is unclear which anionic species is more critical for the luciferase-catalyzed luminescence emission. Thus, we let RifDock place the polar rotamers on the basis of hydrogen-bond geometry to O1 and apolar rotamers to DTZ without specific requirements. In the next docking step, we passed the -scaffold_res argument with a list of residue numbers as scaffold backbone positions that were annotated as pocket residues to allow a hierarchical search of RIF placement. We only allowed the RIF placements in the pocket residues and left pre-defined hydrogen bond networks (HBNets) intact. After RifDock, we proceeded to Rosetta™ sequence design, where the score function was reweighted for a higher buried_unsat_penalty52 and the amino acid selection was biased by giving a pre-generated PSSM file via the SeqprofConsensus task operation. This would minimize buried unsatisfied residues and increase pre-organized architectures in the core that are known to be beneficial for a catalytic pocket53. Two rounds of FastDesign calculation were included: in the first round, we restricted the RIF rotamers and core HBNets to repacking while allowing other residues to be re-designed based on the PSSM during the Monte Carlo simulated annealing procedure. After the surrounding residues were optimized to retain the RIF interactions, we allowed the re-design of RIF rotamers to find efficient aromatic and hydrophobic packing around DTZ, while catalytic residues (the N1 requirement) were still limited to repacking only. The final set of designs was obtained after filtering by ligand-binding interface energy, shape complementarity, contact molecular surface, number of HbondsToResidue, and the presence of N1_hbond.
6. Structure Prediction of LuxSit with AlphaFold2 and Comparison to Design Model
To computationally assess the accuracy of the LuxSit design model, we performed single sequence structure prediction using AlphaFold2. All models were run with 12 recycles and generated models were relaxed using AMBER54. The model with the highest pLDDT was used for comparison to the Rosetta™ design model and structural superpositions were performed using the Theseus alignment tool to determine backbone RMSD between the design model and AlphaFold2 model33.
The Rosetta™ cartesian_ddg application55,56 was used to computationally estimate the enzyme-substrate binding free energy. The LuxSit design model was relaxed beforehand in cartesian space with the substrate bound. For the 21 positions that were experimentally screened for single mutation effects on luciferase activity, each residue was computationally mutated into other amino acid types, and packing and cartesian relaxation were performed to evaluate the final score in REU. This procedure was applied three times in parallel for both the substrate-bound and apo states. The average of the three calculation results was used to calculate the relative binding free energy (ddGbind) by subtracting the total score of the apo state from that of the complex state.
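The arithmetic of this step can be sketched as below. The text states only that the apo-state score is subtracted from the complex-state score after averaging three runs; referencing each mutant to the parent sequence, as `ddg_bind` does, is an assumption of this sketch, and the scores in the usage lines are made-up numbers rather than results.

```python
from statistics import mean

def binding_score(complex_scores, apo_scores):
    """Mean total score of the substrate-bound state minus mean total score of the apo state (REU)."""
    return mean(complex_scores) - mean(apo_scores)

def ddg_bind(mut_complex, mut_apo, wt_complex, wt_apo):
    """Mutant binding score relative to the parent (assumed reference).
    Negative values would suggest tighter predicted binding than the parent."""
    return binding_score(mut_complex, mut_apo) - binding_score(wt_complex, wt_apo)

# Illustrative triplicate scores (hypothetical values, in REU).
example = ddg_bind(
    mut_complex=[-310.0, -311.0, -312.0],
    mut_apo=[-300.0, -301.0, -302.0],
    wt_complex=[-308.0, -308.0, -308.0],
    wt_apo=[-300.0, -300.0, -300.0],
)
```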
The construction of assembled gene libraries was described previously in detail57. In brief, the amino acid sequences of all designed luciferases were first reverse-translated into E. coli codon-optimized DNA sequences. All DNA sequences were categorized into multiple sub-pools by gene length (˜500 designs per sub-pool). Each gene was subsequently split into two fragments (fragment A and fragment B), and outer and inner primer sequences were added to the 5′ and 3′ ends (e.g., Outer_oligoA_5primer+design_half A+Inner_oligoA_3primer and Inner_oligoB_5primer+design_half B+Outer_oligoB_3primer). All oligos were ordered in one Twist 250 nt Oligo Pool. To construct the library of each sub-pool, polymerase chain reaction (PCR) with oligoA_5primer/oligoA_3primer or oligoB_5primer/oligoB_3primer oligonucleotide pairs was used to amplify the individual fragment A or fragment B from each sub-pool. The pool-specific sequences were removed with Uracil Specific Excision Reagent (USER) followed by the NEB End Repair kit. Outer primers (oligoA_5primer and oligoB_3primer) were then used for fragment A and fragment B assembly and amplification. The assembled full-length fragment was digested with XhoI/HindIII and ligated into a predigested pBAD/His B vector. All ligation products were used to transform ElectroMAX™ DH10B Cells, which were next plated on 150 mm×15 mm LB agar plates supplemented with carbenicillin and L-arabinose. We sequenced 30 random colonies, and 11 of the sequences were in our designed library. The plates (˜2000 colonies per plate) were incubated at 37° C. overnight to form bacterial colonies and left at 4° C. for another 24 h. To directly image luminescence activity from bacterial colonies, we sprayed a PBS solution containing 30 μM DTZ onto each agar plate, waited for 2 min, and acquired and processed the luminescence images with a Bio-Rad ChemiDoc XRS+.
After screening 15 plates, active colonies were collected for sequencing, protein expression, and other downstream characterization; LuxSit was selected from three active designs that showed catalytic signal above background.
To create libraries of each single amino acid substitution at residues 13, 14, 17, 18, 35, 37, 38, 49, 52, 53, 56, 60, 65, 81, 83, 94, 96, 98, 100, 110, and 112, a mixture of forward oligos with degenerate codons (NDT, VHG, and TGG at a 1:1:0.1 ratio) and an overlapping reverse oligo were used to amplify the plasmid of LuxSit. The resulting PCR products were circularized by the Gibson Assembly protocol and were subsequently used to transform ElectroMAX™ DH10B Cells. The cells were plated on 150 mm×15 mm LB agar plates supplemented with carbenicillin and L-arabinose, incubated at 37° C. overnight, and left at 4° C. for another 24 h. As described in the screening of luciferase libraries, colony-based screening by spraying DTZ solution was used to identify active colonies. Inactive colonies were also randomly picked. As a result, a total of 32 colonies were picked for each residue library. The 32×21 individual colonies were grown in 1 mL of TB supplemented with carbenicillin and L-arabinose in 96-well deep-well culture plates. The plates were shaken at 37° C. overnight (˜16-18 h) on 96-well plate shakers at 1,100 rpm. Cells were pelleted by centrifugation at 4,000 g for 15 min in a tabletop centrifuge. Media was discarded and the cell pellets were resuspended in 0.2 mL BugBuster HT Protein Extraction buffer. The plates were transferred back to 96-well plate shakers and incubated at 1,100 rpm for an additional 30 min. Cellular debris was pelleted again by centrifugation at 4,000 g for 15 min, and soluble lysates were transferred to a new semi-deep 96-well plate and incubated with 10 μL of magnetic Ni-NTA beads for 30 min to allow binding. The magnetic extractor was used to first transfer the beads from the binding plates to wash plates with 200 μL IMAC wash buffer in each well, and then transfer the beads to elution plates containing 30 μL IMAC elution buffer in each well. The concentrations of all proteins in each well were determined directly by the Bradford assay.
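To see why the NDT, VHG, and TGG codon mixture used for the substitution libraries above is sufficient, the degenerate codons can be expanded against the standard genetic code. The sketch below uses standard IUPAC nucleotide codes and the standard codon table; the helper names are illustrative only.

```python
from itertools import product

# IUPAC nucleotide codes needed for NDT, VHG, and TGG.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "D": "AGT", "V": "ACG", "H": "ACT"}

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def expand(degenerate):
    """List every concrete codon encoded by a degenerate codon such as 'NDT'."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate))]

def encoded(degenerate_codons):
    """Map each concrete codon in the mixture to its amino acid ('*' would mark a stop)."""
    return {codon: CODON_TABLE[codon]
            for deg in degenerate_codons
            for codon in expand(deg)}

library = encoded(["NDT", "VHG", "TGG"])
amino_acids = sorted(set(library.values()))
```

NDT contributes 12 codons, VHG 9, and TGG 1, for 22 codons that together encode all 20 amino acids without any stop codon; only leucine and valine are encoded twice.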
The elution solution in each well was used to make a 25 μL protein solution at the indicated concentration and mixed with 25 μL of 50 μM DTZ PBS solution. The luminescence signals were acquired over the course of 15 min, while the actual point mutation was identified by sequencing. Thus, the mutation-to-activity relationship can be mapped. To evaluate whether these beneficial mutations are synergistic, we ordered individual mutants with combinatorial mutations at residues 14, 60, 96, 98, and 110 (see Table 4), expressed and purified these LuxSit variants, and measured their kinetics, emission spectra, and luminescence intensity. We identified four mutants that can produce 47- to 77-fold more photons than the parent LuxSit. We designated one of them LuxSit-f (A96M/M110V) for its strong initial flash emission. Since the mutations at residues 96 and 110 are robust and mutations at residue 60 are versatile, we generated a fully randomized library at positions 60, 96, and 110 to exhaustively screen all possible combinations. After the colony-based screening, we identified many colonies with strong luciferase activities with DTZ (
For Michaelis-Menten kinetics measurements, 25 μL of serially diluted DTZ substrate in Tris pH 8.0 buffer was added into the wells of a white 96-well half-area microplate containing 25 μL of purified luciferases (final enzyme concentration: 100 nM; substrate concentration: 0.78 to 50 μM). Measurements were taken every 1 min (0.1 s integration and 10 s shaking between each interval) for a total of 20 min. Initial velocities were estimated as the average of the light intensities from the first three data points to fit the Michaelis-Menten equation. All relative light unit (RLU) per second values were converted to photon/s by the luminol-H2O2-HRP calibration method45. Following the equation: Imax=LQY×kcat×[E], Imax is the maximal photon flux (photon s⁻¹), [E] is the total enzyme concentration, and Vmax is the maximum photon flux per molecule (photon s⁻¹ molecule⁻¹) from the fitting of the Michaelis-Menten equation. To determine the luminescent quantum yields, 25 μL of 5 μM individual substrate in PBS was injected into 25 μL PBS containing 100 nM corresponding luciferase. DTZ was used for all LuxSit variants while CTZ was used as the substrate of native RLuc. The luminescence signals were monitored until the reactions were completed (0.1 s integration and measurements were taken every 5 s for a total of 40 min). The sum of luminescence photon counts was normalized to the total photon counts of the RLuc/CTZ pair (LQY=5.3±0.1%)58 to derive relative luminescent quantum yields of LuxSit variants (
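The kinetic calculations described in this section can be sketched as follows. The fitting method is not stated in the text; this sketch substitutes a Hanes-Woolf linearization ([S]/v against [S]) so that it needs no external libraries, whereas a nonlinear least-squares fit would normally be preferred for noisy data. The function names, the synthetic data, and the assumption that equal amounts of substrate are consumed when comparing total photon counts are all illustrative.

```python
AVOGADRO = 6.02214076e23

def initial_velocity(trace, n_points=3):
    """Average of the first n_points light intensities, used as the initial velocity."""
    return sum(trace[:n_points]) / n_points

def fit_michaelis_menten(substrate, velocity):
    """Hanes-Woolf fit: S/v = S/Vmax + Km/Vmax, solved by ordinary least squares."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, velocity)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km

def kcat_from_photon_flux(i_max, lqy, enzyme_molar, volume_l):
    """Rearranged Imax = LQY x kcat x [E], with [E] expressed as a number of enzyme molecules."""
    n_enzyme = enzyme_molar * volume_l * AVOGADRO
    return i_max / (lqy * n_enzyme)

def relative_lqy(total_photons, ref_total_photons, ref_lqy=0.053):
    """Scale the reference quantum yield (RLuc/CTZ, 5.3%) by the ratio of total photon counts."""
    return ref_lqy * total_photons / ref_total_photons

# Two-fold dilution series from 50 uM down to about 0.78 uM, with synthetic noiseless data.
substrate_um = [50.0 / 2 ** k for k in range(7)]
velocity = [1000.0 * s / (5.0 + s) for s in substrate_um]
vmax_fit, km_fit = fit_michaelis_menten(substrate_um, velocity)
```

On the noiseless synthetic series the fit recovers the generating parameters (Vmax of 1000 and KM of 5 μM), which is a convenient sanity check before applying it to measured data.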
Purified protein samples were prepared at 15 μM in 10 mM phosphate buffer, pH 7.4. Spectra from 190 nm to 260 nm were recorded at 25° C., 50° C., 75° C., 95° C., and after cooling back to 25° C. Thermal denaturation was monitored at 220 nm from 25° C. to 95° C. (1° C. per min increments). Tm values were not reported because the melting curves showed no obvious inflection points.
HEK293T and HeLa cell lines were maintained at 37° C. in a humidified 5% CO2 atmosphere and cultured in Dulbecco's Modified Eagle's Medium (DMEM, GIBCO) supplemented with 10% fetal bovine serum (FBS, Sigma). Cells were transfected using Turbofectin™ 8.0 (Origene) with 500 μg of plasmid DNA. After 24 h at 37° C. in a CO2 incubator, the medium was removed, and cells were collected and resuspended in Dulbecco's phosphate-buffered saline (DPBS).
Cells were washed twice with HBSS and subsequently imaged in HBSS in the dark at 37° C. Right before imaging, cells were incubated with 25 μM DTZ. Epifluorescence imaging was conducted on a Yokogawa CSU-X1 microscope equipped with a Hamamatsu ORCA-Fusion scientific CMOS camera and Lumencor Celesta light engine. Objectives used (Plan Apochromat Lambda) were: 10× (NA 0.45, WD 4.0 mm), 20× (NA 1.4, WD 0.13 mm), and 40× (NA 0.95, WD 0.17-0.25 mm, with correction collar for cover glass thickness of 0.11 mm to 0.23 mm). Imaging for BFP utilized a 408 nm laser, 432/36 nm dichroic, and a 440/40 nm emission filter (Semrock). Exposure times were 200 ms for BFP and 10 s for luminescence. All epifluorescence experiments were subsequently analyzed using NIS Elements software.
15. Multiplex Dual-Luciferase Reporter Assay for the cAMP/PKA and NF-κB Pathways
HEK293T cells were grown in a tissue culture-grade white 96-well plate and transfected with the indicated CRE-RLuc, NFκB-LuxSit-i, and CMV-CyOFP plasmids. 24 h after transfection, the medium was replaced by 2 μM of Forskolin (FSK) or 300 ng/mL human tumor necrosis factor alpha (TNFα) in regular cell media. 23 h after stimulation, the cells were resuspended in DPBS by pipette mixing. 25 μL of DPBS containing 30,000 intact cells was mixed with 25 μL of CelLytic M for 15 min to make cell lysates. For the intact cell assay, 25 μL of DPBS containing 15,000 intact cells was mixed with 25 μL of PP-CTZ (2 μM) and/or DTZ (10 μM) in DPBS. For the cell lysate assay, 25 μL of cell lysate was added to 25 μL of PP-CTZ (2 μM) and/or DTZ (10 μM) to initiate luminescence reactions. The signals were recorded every 1 min for a total of 10 min. The light signals were collected in the substrate-resolved mode without filters and with 528/20 and 390/35 filters under the spectrally resolved mode. Area scanning of the fluorescence intensity of CyOFP at 480 nm (excitation wavelength) and 580 nm (emission wavelength) was used to estimate the total cell numbers and transfection efficiency. The reported unit was the average of the first 10 min luminescence (RLU) over the relative fluorescence units (a.u.). To derive fold-of-activation, all values were normalized to the corresponding non-stimulated control.
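The normalization described above reduces to two ratios, sketched here. The function names and the numbers in the usage lines are illustrative only and do not represent measured data.

```python
from statistics import mean

def normalized_signal(luminescence_trace, fluorescence):
    """Average luminescence (RLU) over the recorded time points divided by CyOFP fluorescence (a.u.)."""
    return mean(luminescence_trace) / fluorescence

def fold_activation(stimulated, control):
    """Ratio of the normalized signal of a stimulated well to its non-stimulated control.
    Each argument is a (luminescence_trace, fluorescence) pair."""
    return normalized_signal(*stimulated) / normalized_signal(*control)

# Hypothetical wells: (luminescence readings over time, CyOFP fluorescence).
stimulated_well = ([200.0, 400.0], 10.0)
control_well = ([50.0, 50.0], 5.0)
fold = fold_activation(stimulated_well, control_well)
```

Dividing by the CyOFP fluorescence before taking the ratio corrects for well-to-well differences in cell number and transfection efficiency, so the fold change reflects promoter activation rather than plating variation.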
No statistical methods were used to pre-determine the sample size. No sample was excluded from data analysis. Results were reproduced using different batches of pure proteins on different days. Unless otherwise indicated, data are shown as mean±s.d., and error bars in figures represent s.d. of technical triplicate. Data were analyzed and plotted using GraphPad Prism 8, seaborn, and matplotlib.
a Mean (95% CI), n = 3 (technical triplicates).
b Integrated luminescence intensity over the first 20 min. All values were normalized to the signal of LuxSit with 25 μM DTZ in Tris pH 8.0 buffer.
This application claims priority to U.S. Provisional Patent Application Serial Nos. 63/300,171 filed Jan. 17, 2022 and 63/381,922 filed Nov. 1, 2022, each incorporated by reference herein in their entirety.
This invention was made with government support under Grant No. K99EB031913, awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2023/060615 | 1/13/2023 | WO |
Number | Date | Country | |
---|---|---|---|
63381922 | Nov 2022 | US | |
63300171 | Jan 2022 | US |