Rationally manipulating protein localization can provide fundamental insights into cellular processes and is a powerful tool for engineering cellular behaviors. Techniques that allow temporal regulation of protein localization are particularly valuable for interrogating and programming dynamic cellular processes, with light and small molecules serving as the most widely used means of user-defined control.
In one aspect, the disclosure provides non-naturally occurring polypeptides comprising the general formula X1-X2-X3-X4-X5, wherein:
X1 optionally comprises first, second, third, and fourth helical domains;
X2 comprises a fifth helical domain comprising the amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of HSIVYAIEAAIF (SEQ ID NO:1), wherein 1, 2, 3, or all 4 of the following changes from SEQ ID NO:1 are not permissible: H1K, S2L, Y5E, and F12R;
X3 comprises a sixth helical domain;
X4 comprises a seventh helical domain comprising the amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of RNVEHALMRIVLAIY (SEQ ID NO:2), wherein 1, 2, 3, or all 4 of the following changes from SEQ ID NO:2 are not permissible: R1E, H5E, M8K, and L12K; and
X5 comprises an eighth helical domain. In various embodiments, acceptable substitutions in X2 relative to SEQ ID NO:1 are selected from the group shown in Table 1 and Table 2; acceptable substitutions in X4 relative to SEQ ID NO:2 are selected from the group shown in Table 3 and Table 4; X2 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
X4 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
RNVEHALMRIVLAIY
LAEENLREAEES;
X3 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of EVRELARELVRLAVEAAEEVQR (SEQ ID NO:5); X5 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of EKREKARERVREAVERAEEVQR (SEQ ID NO:6); and/or X1, when present, comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of:
In another aspect, the disclosure provides non-naturally occurring polypeptides comprising the general formula X1-X2-X3-X4-X5-X6-X7, wherein:
X1 comprises a first helical domain;
X2 comprises a second helical domain comprising the amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of DLANLAVAAVLTACL (SEQ ID NO:20), wherein 1, 2, 3, 4, 5, 6, or all 7 of the following changes from SEQ ID NO:20 are not permissible: D1K, N4S, L5Q, A8E, L11K, T12L, and L15E;
X3 comprises a third helical domain;
X4 comprises a fourth helical domain comprising the amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of RAVILAIM (SEQ ID NO:21), wherein 1, 2, 3, or all 4 of the following changes from SEQ ID NO:21 are not permissible: R1E, I4K, I7C, and M8E;
X5 comprises a fifth helical domain;
X6 comprises a sixth helical domain comprising the amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of RAIWLAAE (SEQ ID NO:22), wherein 1, 2, 3, or all 4 of the following changes from SEQ ID NO:22 are not permissible: R1L, I3C, W4E, and A7Q; and
X7 comprises seventh and eighth helical domains. In various embodiments, acceptable substitutions in X2 relative to SEQ ID NO:20 are selected from those shown in Table 6 and Table 7; acceptable substitutions in X4 relative to SEQ ID NO:21 are selected from those shown in Table 8 and Table 9; acceptable substitutions in X6 relative to SEQ ID NO:22 are selected from those shown in Table 10 and Table 11; X2 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
X4 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
X6 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
X1 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
X3 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO: 27); X5 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO:28); and/or X7 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
In a further aspect, the disclosure provides a fusion protein comprising:
(a) the polypeptide of any embodiment or combination of embodiments of the disclosure; and
(b) a polypeptide localization domain at the N-terminus and/or the C-terminus of the fusion protein, and/or a protein having one or more interaction surfaces.
In one aspect, the disclosure provides recombinant fusion proteins, comprising a polypeptide of the general formula X1-B1-X2-B2-X3, wherein
(a) one of X1 and X3 is selected from the group consisting of
(b) the other of X1 and X3 is an NS3a peptide (either catalytically active or dead), wherein if X1 or X3 is the ANR peptide, then NS3a is one of SEQ ID NOS:30-38;
(c) X2 is a protein having one or more interaction surfaces; and
(d) B1 and B2 are optional amino acid linkers.
In one embodiment, the NS3a peptide comprises the amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of the amino acid sequence selected from the group consisting of SEQ ID NOS:30-38, wherein the bolded amino acid residue is the catalytic position, wherein the bolded “S” residue represents catalytically active NS3a peptides, and wherein the bolded “S” residue can be substituted with an alanine (or other) residue to render the NS3a peptide catalytically dead.
In another aspect, the disclosure provides polypeptides comprising the amino acid sequence selected from the group consisting of SEQ ID NOS:31-38, wherein the bolded amino acid residue is the catalytic position, wherein the bolded “S” residue represents catalytically active NS3a peptides, and wherein the bolded “S” residue can be substituted with an alanine (or other) residue to render the NS3a peptide catalytically dead.
In a further aspect, the disclosure provides combinations, comprising:
(a) a first fusion protein comprising:
(b) one or more second fusion proteins comprising:
In various further aspects, the disclosure provides nucleic acids encoding the polypeptide, fusion protein, or the recombinant fusion protein of any embodiment or combination of embodiments disclosed herein; expression vectors comprising the nucleic acid operatively linked to a promoter sequence; host cells comprising the nucleic acids and/or expression vectors; and use of the polypeptide, fusion protein, recombinant fusion protein, combination, nucleic acid, expression vector, or host cell or any embodiment disclosed herein to carry out any methods, including but not limited to those disclosed herein.
As used herein and unless otherwise indicated, the terms “a” and “an” are taken to mean “one”, “at least one” or “one or more”. Unless otherwise required by context, singular terms used herein shall include pluralities and plural terms shall include the singular.
Unless the context clearly requires otherwise, throughout the description and the claims, the words ‘comprise’, ‘comprising’, and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”. Words using the singular or plural number also include the plural or singular number, respectively. Additionally, the words “herein,” “above” and “below” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of this application.
As used herein, the amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V).
All embodiments of any aspect of the invention can be used in combination, unless the context clearly dictates otherwise.
In a first aspect, the disclosure provides a non-naturally occurring polypeptide comprising the general formula X1-X2-X3-X4-X5, wherein:
X1 optionally comprises first, second, third, and fourth helical domains;
X2 comprises a fifth helical domain comprising the amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of HSIVYAIEAAIF (SEQ ID NO:1), wherein 1, 2, 3, or all 4 of the following changes from SEQ ID NO:1 are not permissible: H1K, S2L, Y5E, and F12R;
X3 comprises a sixth helical domain;
X4 comprises a seventh helical domain comprising the amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of RNVEHALMRIVLAIY (SEQ ID NO:2), wherein 1, 2, 3, or all 4 of the following changes from SEQ ID NO:2 are not permissible: R1E, H5E, M8K, and L12K; and
X5 comprises an eighth helical domain.
The polypeptides of this aspect are danoprevir/NS3a complex reader (DNCR) polypeptides that selectively bind a danoprevir/NS3a complex over the apo NS3a protein, where NS3a is any variant of the HCV protease NS3/4a (any genotype and catalytically active or dead), as described in detail in the attached appendices. The functional part of DNCR is the interface with danoprevir/NS3a, which includes portions of helices 5 and 7. This interface could be grafted onto any protein backbone that supported the arrangement of these helices while retaining activity as a danoprevir/NS3a complex reader. There is flexibility in the amino acid sequence of these interface helices, with the general mutational trends permitted discussed in the examples that follow. The X1 helical domains are optional, in that the inventors have shown binding in the absence of the first four helical domains. As will be understood, 1, 2, 3, or all 4 helical domains may be present or absent. For example, only helical domain 4 may be present; only helical domains 3-4 may be present, only helical domains 2-4 may be present; helical domains 1-4 may be present, or none of helical domains 1-4 may be present.
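By way of non-limiting illustration only, the identity thresholds and non-permissible changes recited for the interface helices X2 and X4 can be checked computationally. The following Python sketch is not part of the disclosed methods; the function names are hypothetical, and it assumes an ungapped, position-by-position comparison of a candidate sequence of the same length as the reference, which is only one of several ways percent identity may be determined.

```python
# Illustrative sketch only; names and the ungapped comparison are assumptions.
SEQ_ID_NO_1 = "HSIVYAIEAAIF"       # X2 reference (fifth helical domain)
SEQ_ID_NO_2 = "RNVEHALMRIVLAIY"    # X4 reference (seventh helical domain)

# Changes recited as not permissible, as (1-based position, replacement residue).
FORBIDDEN_X2 = {(1, "K"), (2, "L"), (5, "E"), (12, "R")}   # H1K, S2L, Y5E, F12R
FORBIDDEN_X4 = {(1, "E"), (5, "E"), (8, "K"), (12, "K")}   # R1E, H5E, M8K, L12K


def percent_identity(candidate: str, reference: str) -> float:
    """Percent of reference positions matched by an equal-length candidate."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch compares equal-length sequences only")
    matches = sum(c == r for c, r in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def forbidden_changes_present(candidate: str, forbidden: set) -> list:
    """Return the recited non-permissible changes found in the candidate."""
    return sorted(
        (pos, aa)
        for pos, aa in forbidden
        if pos <= len(candidate) and candidate[pos - 1] == aa
    )
```

For example, a candidate X2 sequence carrying only the Y5E change (HSIVEAIEAAIF) matches 11 of the 12 reference positions, and `forbidden_changes_present` reports that change as `[(5, "E")]`. How many of the recited changes must be absent depends on the embodiment and is left to the caller.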
As used herein, a “helical domain” is any sequence of amino acids that forms an alpha-helical secondary structure. In one embodiment, the helical domains do not include any proline residues. In another embodiment, the 5th and 7th helical domains are each at least 12 amino acids in length. In other embodiments, each helical domain is at least 12 amino acids in length. In other exemplary embodiments, the length of each helical domain is independently 12-35, 12-30, 15-30, 20-30, 22-28, 23-27, 24-26, or 25 amino acids.
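As a non-limiting illustration, the sequence-level constraints of these embodiments (absence of proline and a length window) can be screened with a short Python sketch. The function name and default bounds are assumptions chosen to mirror the 12 to 35 residue range recited above; the sketch does not predict whether a sequence actually adopts an alpha-helical structure.

```python
def meets_helix_constraints(domain: str, min_len: int = 12, max_len: int = 35) -> bool:
    """Check only the proline-free and length embodiments; no structure prediction."""
    no_proline = "P" not in domain.upper()
    return no_proline and min_len <= len(domain) <= max_len
```

Under these assumptions, EVRELARELVRLAVEAAEEVQR (SEQ ID NO:5, 22 residues, no proline) passes, whereas a sequence shorter than 12 residues or one containing proline does not.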
In various embodiments:
In one embodiment, acceptable substitutions in X2 relative to SEQ ID NO:1 are selected from the group consisting of those shown in Table 1.
As used herein, aliphatic residues include Ile, Val, Leu, and Ala; polar residues include Lys, Arg, Glu, Asp, Gln, Ser, Thr, and Asn; aromatic residues include Trp, Tyr, Phe; and small residues include Gly, Ser, Cys, Ala, and Thr. In another embodiment, acceptable substitutions in X2 relative to SEQ ID NO:1 are selected from the group consisting of those shown in Table 2.
In a further embodiment, acceptable substitutions in X4 relative to SEQ ID NO:2 are selected from the group consisting of those shown in Table 3.
In another embodiment, acceptable substitutions in X4 relative to SEQ ID NO:2 are selected from the group consisting of those shown in Table 4.
In one embodiment, X2 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
In another embodiment, X4 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
RNVEHALMRIVLAIY
LAEENLREAEES.
In a further embodiment, X3 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of EVRELARELVRLAVEAAEEVQR (SEQ ID NO:5). In another embodiment, X5 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of EKREKARERVREAVERAEEVQR (SEQ ID NO:6). In one embodiment, X1, when present, comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of SEQ ID NO:7.
In various embodiments:
X4 comprises the amino acid sequence having at least 60% identity to the full length of
RNVEHALMRIVLAIY
LAEENLREAEES,
X3 comprises the amino acid sequence having at least 60% identity to the full length of EVRELARELVRLAVEAAEEVQR (SEQ ID NO:5), X5 comprises the amino acid sequence having at least 60% identity to the full length of EKREKARERVREAVERAEEVQR (SEQ ID NO:6), and X1, when present, comprises the amino acid sequence having at least 60% identity to the full length of SEQ ID NO:7;
X4 comprises the amino acid sequence having at least 70% identity to the full length of
RNVEHALMRIVLAIY
LAEENLREAEES,
X3 comprises the amino acid sequence having at least 70% identity to the full length of EVRELARELVRLAVEAAEEVQR (SEQ ID NO:5), X5 comprises the amino acid sequence having at least 70% identity to the full length of EKREKARERVREAVERAEEVQR (SEQ ID NO:6), and X1, when present, comprises the amino acid sequence having at least 70% identity to the full length of SEQ ID NO:7;
X4 comprises the amino acid sequence having at least 80% identity to the full length of
RNVEHALMRIVLAIY
LAEENLREAEES,
X3 comprises the amino acid sequence having at least 80% identity to the full length of EVRELARELVRLAVEAAEEVQR (SEQ ID NO:5), X5 comprises the amino acid sequence having at least 80% identity to the full length of EKREKARERVREAVERAEEVQR (SEQ ID NO:6), and X1, when present, comprises the amino acid sequence having at least 80% identity to the full length of SEQ ID NO:7;
X4 comprises the amino acid sequence having at least 90% identity to the full length of
RNVEHALMRIVLAIY
LAEENLREAEES,
X3 comprises the amino acid sequence having at least 90% identity to the full length of EVRELARELVRLAVEAAEEVQR (SEQ ID NO:5), X5 comprises the amino acid sequence having at least 90% identity to the full length of EKREKARERVREAVERAEEVQR (SEQ ID NO:6), and X1, when present, comprises the amino acid sequence having at least 90% identity to the full length of SEQ ID NO:7;
X4 comprises the amino acid sequence having at least 95% identity to the full length of
RNVEHALMRIVLAIY
LAEENLREAEES,
X3 comprises the amino acid sequence having at least 95% identity to the full length of EVRELARELVRLAVEAAEEVQR (SEQ ID NO:5), X5 comprises the amino acid sequence having at least 95% identity to the full length of EKREKARERVREAVERAEEVQR (SEQ ID NO:6), and X1, when present, comprises the amino acid sequence having at least 95% identity to the full length of SEQ ID NO:7; or
X4 comprises the amino acid sequence having 100% identity to the full length of
RNVEHALMRIVLAIY
LAEENLREAEES,
X3 comprises the amino acid sequence having 100% identity to the full length of EVRELARELVRLAVEAAEEVQR (SEQ ID NO:5), X5 comprises the amino acid sequence having 100% identity to the full length of EKREKARERVREAVERAEEVQR (SEQ ID NO:6), and X1, when present, comprises the amino acid sequence having 100% identity to the full length of SEQ ID NO:7.
In various further embodiments, the polypeptide comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of SEQ ID NO:8, SEQ ID NO:9, or SEQ ID NO:10.
As discussed in the examples that follow, the inventors have extensively characterized permitted variability in the sequence of the DNCR polypeptides disclosed herein. Exemplary substitutions are provided in Table 5 and based on experimental variation of DNCR1 (SEQ ID NO: 9) positions 117-191. Thus, in one embodiment, acceptable substitutions relative to SEQ ID NO:8-10 are selected from the group shown in Table 5.
In another aspect, the disclosure provides a non-naturally occurring polypeptide comprising the general formula X1-X2-X3-X4-X5-X6-X7, wherein:
X1 comprises a first helical domain;
X2 comprises a second helical domain comprising the amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of DLANLAVAAVLTACL (SEQ ID NO:20), wherein 1, 2, 3, 4, 5, 6, or all 7 of the following changes from SEQ ID NO:20 are not permissible: D1K, N4S, L5Q, A8E, L11K, T12L, and L15E;
X3 comprises a third helical domain;
X4 comprises a fourth helical domain comprising the amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of RAVILAIM (SEQ ID NO:21), wherein 1, 2, 3, or all 4 of the following changes from SEQ ID NO:21 are not permissible: R1E, I4K, I7C, and M8E;
X5 comprises a fifth helical domain;
X6 comprises a sixth helical domain comprising the amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of RAIWLAAE (SEQ ID NO:22), wherein 1, 2, 3, or all 4 of the following changes from SEQ ID NO:22 are not permissible: R1L, I3C, W4E, and A7Q; and
X7 comprises seventh and eighth helical domains.
The polypeptides of this aspect are grazoprevir/NS3a complex reader (GNCR) polypeptides, defined as proteins that selectively bind the grazoprevir/NS3a complex over the apo NS3a protein, where NS3a is any variant of the HCV protease NS3/4a (any genotype and catalytically active or dead), as described in detail herein. The functional part of GNCR is the interface with grazoprevir/NS3a, which includes portions of helices 2, 4, and 6, as defined herein. This interface can be grafted onto any protein backbone that supports the arrangement of these helices and still serve as a grazoprevir/NS3a complex reader. Additionally, there is flexibility in the sequence of these interface helices, with exemplary mutational trends discussed in the examples herein.
In one embodiment, acceptable substitutions in X2 relative to SEQ ID NO:20 are selected from the group consisting of those shown in Table 6.
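As a non-limiting illustration, the substitution notation used for the GNCR interface helices (for example, “W4E” denotes replacement of the Trp at position 4 with Glu) can be parsed and checked against the recited reference sequences. The Python sketch below is illustrative only; the helper names are hypothetical, and the sketch assumes candidates are compared position by position against a reference of the same length.

```python
import re

# Interface-helix reference sequences recited for the GNCR aspect.
REFERENCES = {
    "X2": "DLANLAVAAVLTACL",  # SEQ ID NO:20
    "X4": "RAVILAIM",         # SEQ ID NO:21
    "X6": "RAIWLAAE",         # SEQ ID NO:22
}

# Changes recited as not permissible for each interface helix.
NOT_PERMISSIBLE = {
    "X2": ["D1K", "N4S", "L5Q", "A8E", "L11K", "T12L", "L15E"],
    "X4": ["R1E", "I4K", "I7C", "M8E"],
    "X6": ["R1L", "I3C", "W4E", "A7Q"],
}


def parse_change(change: str):
    """Split notation such as 'W4E' into (original, 1-based position, replacement)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if m is None:
        raise ValueError(f"unrecognized substitution notation: {change!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def apply_change(reference: str, change: str) -> str:
    """Apply one substitution after confirming the original residue matches."""
    original, pos, replacement = parse_change(change)
    if not 1 <= pos <= len(reference) or reference[pos - 1] != original:
        raise ValueError(f"{change} does not match the reference sequence")
    return reference[: pos - 1] + replacement + reference[pos:]


def recited_changes_in(candidate: str, domain: str) -> list:
    """List the recited non-permissible changes present in a candidate helix."""
    found = []
    for change in NOT_PERMISSIBLE[domain]:
        _, pos, replacement = parse_change(change)
        if pos <= len(candidate) and candidate[pos - 1] == replacement:
            found.append(change)
    return found
```

Applying “W4E” to RAIWLAAE (SEQ ID NO:22) yields RAIELAAE, which `recited_changes_in` then flags as carrying that change. Each recited change is consistent with its reference, in that the stated original residue occupies the stated position.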
In another embodiment, acceptable substitutions in X2 relative to SEQ ID NO:20 are selected from the group shown in Table 7.
In a further embodiment, acceptable substitutions in X4 relative to SEQ ID NO:21 are selected from the group shown in Table 8.
In another embodiment, acceptable substitutions in X4 relative to SEQ ID NO:21 are selected from the group consisting of those shown in Table 9.
In one embodiment, acceptable substitutions in X6 relative to SEQ ID NO:22 are selected from the group consisting of those shown in Table 10.
In a further embodiment, acceptable substitutions in X6 relative to SEQ ID NO:22 are selected from those shown in Table 11.
In various embodiments,
DLANLAVAAVLTACL
,
DLANLAVAAVLTACL
,
DLANLAVAAVLTACL
,
DLANLAVAAVLTACL
,
DLANLAVAAVLTACL
,
In another embodiment, X2 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
In a further embodiment, X4 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
In one embodiment, X6 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
In another embodiment, X1 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of IEKLCKKAEEEAKEAQEKADELRQRH (SEQ ID NO:26). In a further embodiment, X3 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO: 27). In one embodiment, X5 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO:28). In another embodiment, X7 comprises the amino acid sequence having at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of
In various embodiments
X4 comprises the amino acid sequence having at least 60% identity to the full length of
X6 comprises the amino acid sequence having at least 60% identity to the full length of
X1 comprises the amino acid sequence having at least 60% identity to the full length of IEKLCKKAEEEAKEAQEKADELRQRH (SEQ ID NO:26), X3 comprises the amino acid sequence having at least 60% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO: 27), X5 comprises the amino acid sequence having at least 60% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO:28), and X7 comprises the amino acid sequence having at least 60% identity to the full length of
X4 comprises the amino acid sequence having at least 70% identity to the full length of
X6 comprises the amino acid sequence having at least 70% identity to the full length of
X1 comprises the amino acid sequence having at least 70% identity to the full length of IEKLCKKAEEEAKEAQEKADELRQRH (SEQ ID NO:26), X3 comprises the amino acid sequence having at least 70% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO: 27), X5 comprises the amino acid sequence having at least 70% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO:28), and X7 comprises the amino acid sequence having at least 70% identity to the full length of
X4 comprises the amino acid sequence having at least 80% identity to the full length of
X6 comprises the amino acid sequence having at least 80% identity to the full length of
X1 comprises the amino acid sequence having at least 80% identity to the full length of IEKLCKKAEEEAKEAQEKADELRQRH (SEQ ID NO:26), X3 comprises the amino acid sequence having at least 80% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO: 27), X5 comprises the amino acid sequence having at least 80% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO:28), and X7 comprises the amino acid sequence having at least 80% identity to the full length of
X4 comprises the amino acid sequence having at least 90% identity to the full length of
X6 comprises the amino acid sequence having at least 90% identity to the full length of
X1 comprises the amino acid sequence having at least 90% identity to the full length of IEKLCKKAEEEAKEAQEKADELRQRH (SEQ ID NO:26), X3 comprises the amino acid sequence having at least 90% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO: 27), X5 comprises the amino acid sequence having at least 90% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO:28), and X7 comprises the amino acid sequence having at least 90% identity to the full length of
X4 comprises the amino acid sequence having at least 95% identity to the full length of
X6 comprises the amino acid sequence having at least 95% identity to the full length of
X1 comprises the amino acid sequence having at least 95% identity to the full length of IEKLCKKAEEEAKEAQEKADELRQRH (SEQ ID NO:26), X3 comprises the amino acid sequence having at least 95% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO: 27), X5 comprises the amino acid sequence having at least 95% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO:28), and X7 comprises the amino acid sequence having at least 95% identity to the full length of
X4 comprises the amino acid sequence having 100% identity to the full length of
X6 comprises the amino acid sequence having 100% identity to the full length of
X1 comprises the amino acid sequence having 100% identity to the full length of IEKLCKKAEEEAKEAQEKADELRQRH (SEQ ID NO:26), X3 comprises the amino acid sequence having 100% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO: 27), X5 comprises the amino acid sequence having 100% identity to the full length of DIAKLCIKAASEAAEAASKAAELAQR (SEQ ID NO:28), and X7 comprises the amino acid sequence having 100% identity to the full length of
In another embodiment, the polypeptide has at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of a polypeptide selected from the group consisting of SEQ ID NOS:11-12.
The inventors have extensively characterized permitted variability in the sequence of the GNCR polypeptides disclosed herein. In one embodiment, acceptable substitutions relative to SEQ ID NO:11-12 are selected from the group shown in Table 12.
In another embodiment, amino acid substitutions relative to the reference peptides are conservative amino acid substitutions. As used herein, “conservative amino acid substitution” means a given amino acid can be replaced by a residue having similar physicochemical characteristics, e.g., substituting one aliphatic residue for another (such as Ile, Val, Leu, or Ala for one another), or substitution of one polar residue for another (such as between Lys and Arg; Glu and Asp; or Gln and Asn). Other such conservative substitutions, e.g., substitutions of entire regions having similar hydrophobicity characteristics, are known. Polypeptides comprising conservative amino acid substitutions can be tested in any one of the assays described herein to confirm that a desired activity, e.g., antigen-binding activity and specificity of a native or reference polypeptide, is retained. Amino acids can be grouped according to similarities in the properties of their side chains (A. L. Lehninger, in Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H). Alternatively, naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe. Non-conservative substitutions will entail exchanging a member of one of these classes for another class.
Particular conservative substitutions include, for example: Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu.
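As a non-limiting illustration, the first side-chain grouping recited above (non-polar, uncharged polar, acidic, basic) can be encoded so that a substitution is treated as conservative when both residues fall in the same group. The Python sketch below uses hypothetical names and implements only that first grouping; it does not reproduce the alternative six-class grouping or the list of particular substitutions, some of which (for example, Ala into Gly) cross the boundaries of the first grouping.

```python
# First side-chain grouping recited in the text (one-letter codes).
SIDE_CHAIN_GROUPS = {
    "non-polar": set("AVLIPFWM"),
    "uncharged polar": set("GSTCYNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def side_chain_group(residue: str) -> str:
    """Return the name of the group containing a one-letter residue code."""
    for name, members in SIDE_CHAIN_GROUPS.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"not one of the 20 standard residues: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues belong to the same side-chain group."""
    return side_chain_group(original) == side_chain_group(replacement)
```

Under this grouping, Lys to Arg, Glu to Asp, Gln to Asn, and Ile to Val are classed as conservative, whereas Ala to Glu is not.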
In all of the above embodiments of the DNCR and the GNCR polypeptides, the polypeptides may comprise amino acid linkers between one or more of the helical domains. Any suitable linkers can be used, having any amino acid composition and length as determined appropriate for an intended use. In various embodiments, the linkers may be flexible, for example being rich in glycine, serine, and/or threonine residues. In other embodiments, the linker may not include proline residues.
In one embodiment, the disclosure provides a fusion protein comprising:
(a) the polypeptide of any embodiment or combination of embodiments disclosed herein; and
(b) a polypeptide localization domain at the N-terminus and/or the C-terminus of the fusion protein.
This embodiment permits localization to a cellular target. Any suitable localization domain can be used as deemed appropriate for an intended purpose. In non-limiting embodiments, the localization domain may target the fusion protein to the cell membrane, the nucleus, the mitochondria, Golgi apparatus, cell surface receptors, etc.
In another embodiment, the disclosure provides a fusion protein comprising:
(a) the polypeptide of any embodiment or combination of embodiments disclosed herein; and
(b) a protein having one or more interaction surfaces.
This embodiment provides additional functionality to the polypeptides by regulating interactions with binding partners of the protein having one or more interaction surfaces. Any suitable protein can be used as deemed appropriate for an intended purpose. In non-limiting embodiments, the protein having one or more interaction surfaces comprises an enzymatic protein, a protein-protein interaction domain, a nucleic acid-binding domain, etc. In various further embodiments, the protein having one or more interaction surfaces is selected from the group consisting of: Cas9 and related CRISPR proteins (catalytically active or dead), a DNA binding domain of a transcription factor (such as the Gal4 DNA binding domain), a pro-apoptotic domain (such as caspase 9), and a cell surface receptor (such as a chimeric antigen receptor).
In another aspect, the disclosure provides recombinant fusion proteins, comprising a polypeptide of the general formula X1-B1-X2-B2-X3, wherein
(a) one of X1 and X3 is selected from the group consisting of
(b) the other of X1 and X3 is an NS3a peptide (either catalytically active or dead), wherein if X1 or X3 is the ANR peptide, then NS3a is one of the following variants of HCV protease NS3/4a: NS3a (SEQ ID NO:30), or engineered variants NS3a* (SEQ ID NO:31), NS3a-H1 (SEQ ID NO:32), -H2 (SEQ ID NO:33), -H3 (SEQ ID NO:34), -H4 (SEQ ID NO:35), -H5 (SEQ ID NO:36), or -H6 (SEQ ID NO:37);
(c) X2 is a protein having one or more interaction surfaces; and
(d) B1 and B2 are optional amino acid linkers.
As described in detail in the examples that follow, the inventors have discovered that the recombinant fusion proteins of the disclosure may be used, for example, to disallow access to the X2 protein by occlusion of its interaction surface by an X1/X3 complex in the basal state (“intramolecular binding”). This complex can then be disrupted by any of the small molecule NS3a inhibitors, allowing access to the X2 protein, as described herein. Alternatively, when X1 or X3 is the DNCR or GNCR polypeptide, access to the X2 protein interaction surface is enabled in the basal state and occluded by interaction with NS3a when the appropriate small molecule NS3a inhibitor is present (danoprevir or grazoprevir, respectively).
In one embodiment, the NS3a peptide comprises the amino acid sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of the amino acid sequence selected from the group consisting of SEQ ID NO:30-38, wherein the bolded amino acid residue is the catalytic position, wherein the bolded “S” residue represents catalytically active NS3a peptides, and wherein the bolded “S” residue can be substituted with an alanine (or other) residue to render the NS3a peptide catalytically “dead” (which will also work in all applications):
In various embodiments, one or both of B1 and B2 are present, or both B1 and B2 are present. Any suitable linkers can be used, having any amino acid composition and length as determined appropriate for an intended use. As disclosed in the examples that follow, the inventors have provided extensive guidance on identifying the appropriate linkers in light of the protein having one or more interaction surfaces included in the fusion protein. In various embodiments, the linkers may be flexible, for example being rich in glycine, serine, and/or threonine residues. In other embodiments, the linker may not include proline residues.
In another embodiment, one of X1 and X3 is a peptide comprising the amino acid sequence having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of the amino acid sequence selected from GELGRLVYLLDGPGYDPIHSD (SEQ ID NO:13), GELDELVYLLDGPGYDPIHSD (SEQ ID NO: 14), GELGELVYLLDGPGYDPIHSD (SEQ ID NO: 15), or GELDRLVYLLDGPGYDPIHSD (SEQ ID NO:16), or GELDELVYLLDGPGYDPIHSDVVTRGGSHLFNF (SEQ ID NO: 17) (“ANR peptide”). In this embodiment, the recombinant fusion proteins may be used, for example, to bring any protein domains that are genetically fused to ANR and NS3a together in the basal state. This complex can then be disrupted by any of the small molecule NS3a inhibitors as described herein.
Use of catalytically active vs. dead NS3a enables the creation of orthogonal ANR/NS3a systems, in which only the catalytically active NS3a/ANR complex can be disrupted by covalent inhibitors such as telaprevir or non-covalent inhibitors, while the catalytically dead NS3a/ANR complex can only be disrupted by non-covalent inhibitors such as asunaprevir. Catalytically active variants of NS3a contain the catalytic serine, bolded in “LKGSSGG” (SEQ ID NO:18) and in SEQ ID NOS:30-38, while catalytically dead versions have that serine mutated to an alanine. Exemplary embodiments of this system are described in the examples that follow, such as a demonstrated application for intramolecular gating of an enzyme or interaction domain, and a demonstrated application as an intermolecular off switch for transcription or signaling (demonstrated for transcriptional control for exogenous or endogenous promoters in mammalian cells).
In one embodiment, one of X1 and X3 is the DNCR polypeptide of any embodiment or combination of embodiments disclosed herein. In another embodiment, one of X1 and X3 is the GNCR polypeptide of any embodiment or combination of embodiments disclosed herein. In these embodiments, the recombinant fusion proteins may be used, for example, to turn off activity of the X2 protein. One possible application is to have an enzymatic domain that is constitutively active in the basal, no-drug state and inhibited upon NS3a inhibitor addition. Another possible application is to allow constitutive transcription in the basal, no-drug state, where X2 is a transcription factor or catalytically dead Cas9 domain, and to have this transcription inactivated by formation of the complex of DNCR or GNCR with NS3a upon NS3a inhibitor addition.
The recombinant fusion protein may comprise any protein having one or more interaction surfaces as the X2 moiety, as deemed most suitable for an intended use, such as those described herein and in the attached appendices. Any suitable protein having one or more interaction surfaces can be used as deemed appropriate for an intended purpose. In non-limiting embodiments, the protein having one or more interaction surfaces comprises an enzymatic protein, a protein-protein interaction domain, a nucleic acid-binding domain, etc. In various further embodiments, the protein having one or more interaction surfaces is selected from the group consisting of: Cas9 and related CRISPR proteins (catalytically active or dead), a DNA binding domain of a transcription factor (such as the Gal4 DNA binding domain), a pro-apoptotic domain (such as caspase 9), and a cell surface receptor (such as a chimeric antigen receptor). In another embodiment, X2 may be a protein including, but not limited to, a guanine nucleotide exchange factor (GEF) such as SOS, Cas9 and related CRISPR proteins (catalytically active or dead), a DNA binding domain of a transcription factor (such as the Gal4 DNA binding domain), a pro-apoptotic domain (such as caspase 9), and a cell surface receptor (such as a chimeric antigen receptor).
In another embodiment, the recombinant fusion protein further comprises a peptide localization tag at the N-terminus and/or the C-terminus of the fusion protein. Any suitable localization tag can be used as deemed appropriate for an intended purpose. In non-limiting embodiments, the localization tag may target the recombinant fusion protein to the cell membrane, the nucleus, the mitochondria, Golgi apparatus, cell surface receptors, etc. In one embodiment, the localization tag comprises a membrane localization or nuclear localization tag.
In non-limiting embodiments, the recombinant fusion protein comprises the amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of the amino acid sequence of:
LIERLTYHMYADPNEVRTFLTTYRSFCKPQELLSLITERFEIPEPEPTEADRIAIENGDQPLSAELKRFRKEYIQ
PVQLRVLNVCRHWVEHHFYDFERDAYLLQRMEEFIGTVRGKAMKKWVESITKIIQRKKIARDNGPGHNITFQSSP
PTVEWHISRPGHIETFDLLTLHPIEIARQLTLLESDLYRAVQPSELVGSVWTKEDKEINSPNLLKMIRHTTNLTL
WFEKCIVETENLEERVAVVSRIIEILQVFQELNNFNGVLEVVSAMNSSPVYRLDHTFEQIPSRQKKILEEAHELS
EDHYKKYLAKLRSINPPCVPFFGIYLTNILKTEEGNPEVLKRHGKELINFSKRRKVAEILGEIQQYQNQPYCLRV
ESDIKRFFENLNPMGNSME
KEFTDYLFNKSLEI
EP
GSGTGSGMAKGSVVIVGRINLSGDTAYSQQTRGLLGIIIT
SLTGRDKNQVDGEVQVLSTATQSFLATCVNGVCWTVYHGAGSKTLAGPKGPITQMYTNVDQDLVGWPAPPGARSM
TPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPVSYLKGSSGGPLLCPSGHVVGIFRAAVCTRGVAKAVDF
IPVESMETTMR
GSGTGSGGSGTGDYKDDDDKQHKLRKLNPPDESGPGCMSCKCVLS
In another aspect, the disclosure provides polypeptides comprising the amino acid sequence selected from the group consisting of SEQ ID NOS:31-38, wherein the bolded amino acid residue is the catalytic position, wherein the bolded “S” residue represents catalytically active NS3a peptides, and wherein the bolded “S” residue can be substituted with an alanine (or other) residue to render the NS3a peptide catalytically “dead” (which will also work in all applications):
As disclosed herein, the polypeptides of this aspect of the disclosure reduce membrane binding of the NS3a protein, and thus are particularly useful for the intermolecular binding aspects and embodiments disclosed herein. The polypeptides of this aspect are engineered chimeras of natural genotype 1b HCV protease NS3/4a and a solubility-optimized genotype 1a HCV protease NS3/4a (catalytically active or dead). These non-natural variants of NS3a allow binding to the peptide ANR while having reduced binding to cellular membranes.
In another aspect, the disclosure provides combinations, comprising:
(a) a first fusion protein comprising:
(b) one or more second fusion proteins comprising:
These combinations can be used for intermolecular binding uses of any kind. Numerous exemplary embodiments are disclosed herein. The localization tags and proteins having one or more interaction surfaces can be any suitable ones, including but not limited to those disclosed herein and in the attached examples. In one embodiment, the first fusion protein comprises the NS3a polypeptide comprising the amino acid sequence selected from the group consisting of SEQ ID NOS:31-38, wherein the bolded amino acid residue is the catalytic position, wherein the bolded “S” residue represents catalytically active NS3a peptides, and wherein the bolded “S” residue can be substituted with an alanine (or other) residue to render the NS3a peptide catalytically “dead”. In another embodiment, the second fusion protein comprises a polypeptide comprising the amino acid sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to the full length of the amino acid sequence selected from GELGRLVYLLDGPGYDPIHSD (SEQ ID NO:13), GELDELVYLLDGPGYDPIHSD (SEQ ID NO:14), GELGELVYLLDGPGYDPIHSD (SEQ ID NO:15), GELDRLVYLLDGPGYDPIHSD (SEQ ID NO:16), or GELDELVYLLDGPGYDPIHSDVVTRGGSHLFNF (SEQ ID NO:17) (“ANR peptide”).
In further embodiments, the second fusion protein comprises the DNCR polypeptide of any embodiment or combination of embodiments disclosed herein. In other embodiments, the second fusion protein comprises the GNCR polypeptide of any embodiment or combination of embodiments disclosed herein.
The polypeptides, fusion proteins, and recombinant fusion proteins described herein may be chemically synthesized or recombinantly expressed (when the polypeptide is genetically encodable), and may include additional residues at the N-terminus, C-terminus, or both that are not present in the polypeptides or peptide domains of the disclosure; these additional residues are not included in determining the percent identity of the polypeptides or peptide domains of the disclosure relative to the reference polypeptide. Such residues may be any residues suitable for an intended use, including but not limited to detection tags (e.g., fluorescent proteins, antibody epitope tags, etc.), adaptors, ligands suitable for purposes of purification (His tags, etc.), other peptide domains that add functionality to the polypeptides, etc.
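The percent identity determination described above can be illustrated with a minimal sketch. It assumes the simplest case only: the candidate and reference are already aligned without gaps, and any additional terminal residues (here a hypothetical hexahistidine tag) are removed before comparison. The function names are illustrative and are not part of the disclosure; sequences of differing length would require a proper alignment tool.

```python
def strip_flanks(sequence: str, n_extra: int, c_extra: int) -> str:
    """Remove additional N- and C-terminal residues (e.g. tags) before comparison."""
    end = len(sequence) - c_extra
    return sequence[n_extra:end]


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity over the full length of the reference.

    Assumption: both sequences are the same length and already aligned.
    """
    if len(candidate) != len(reference):
        raise ValueError("sketch handles only equal-length, pre-aligned sequences")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


SEQ_ID_13 = "GELGRLVYLLDGPGYDPIHSD"
SEQ_ID_14 = "GELDELVYLLDGPGYDPIHSD"

# Hypothetical construct: SEQ ID NO:14 carrying an N-terminal His6 tag.
tagged = "HHHHHH" + SEQ_ID_14
core = strip_flanks(tagged, n_extra=6, c_extra=0)

# SEQ ID NO:14 differs from SEQ ID NO:13 at two of 21 positions (19/21 identical).
identity = percent_identity(core, SEQ_ID_13)
print(f"{identity:.1f}% identity")
```

Because the tag residues are stripped first, the result reflects only the peptide domain itself, consistent with the statement that additional residues are not counted.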
In a further aspect, the present disclosure provides nucleic acids encoding a polypeptide, fusion protein, and/or recombinant fusion proteins of the present invention that can be genetically encoded. The nucleic acid sequence may comprise RNA, DNA, and/or modified nucleic acids. Such nucleic acid sequences may comprise additional sequences useful for promoting expression and/or purification of the encoded protein, including but not limited to polyA sequences, modified Kozak sequences, and sequences encoding epitope tags, export signals, and secretory signals, nuclear localization signals, and plasma membrane localization signals. It will be apparent to those of skill in the art, based on the teachings herein, what nucleic acid sequences will encode the polypeptides, fusion protein, and/or recombinant fusion proteins of the invention.
In another aspect, the present disclosure provides expression vectors comprising the nucleic acid of any embodiment or combination of embodiments disclosed herein operatively linked to a suitable control sequence. Expression vectors include vectors that operatively link a nucleic acid coding region or gene to any control sequences capable of effecting expression of the gene product. “Control sequences” operably linked to the nucleic acid sequences of the invention are nucleic acid sequences capable of effecting the expression of the nucleic acid molecules. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences and the promoter sequence can still be considered “operably linked” to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, termination signals, and ribosome binding sites. Such expression vectors include, but are not limited to, plasmid and viral-based expression vectors. The control sequence used to drive expression of the disclosed nucleic acid sequences in a mammalian system may be constitutive (driven by any of a variety of promoters, including but not limited to, CMV, SV40, RSV, actin, EF) or inducible (driven by any of a number of inducible promoters including, but not limited to, tetracycline, ecdysone, steroid-responsive). The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In various embodiments, the expression vector may comprise a plasmid, viral-based vector, or any other suitable expression vector.
In a further aspect, the present disclosure provides host cells that comprise the nucleic acid and/or expression vectors disclosed herein, wherein the host cells can be either prokaryotic or eukaryotic. The cells can be transiently or stably engineered to incorporate the expression vector of the invention, using standard techniques in the art, including but not limited to standard bacterial transformations, calcium phosphate co-precipitation, electroporation, or liposome mediated-, DEAE dextran mediated-, polycationic mediated-, or viral mediated transfection. (See, for example, Molecular Cloning: A Laboratory Manual (Sambrook, et al., 1989, Cold Spring Harbor Laboratory Press); Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed. (R.I. Freshney, 1987, Liss, Inc., New York, N.Y.).) A method of producing a polypeptide according to the invention is an additional part of the disclosure. The method comprises the steps of (a) culturing a host according to this aspect of the invention under conditions conducive to the expression of the polypeptide, and (b) optionally, recovering the expressed polypeptide. The expressed polypeptide can be recovered from the cell-free extract, but preferably is recovered from the culture medium.
In another aspect, the disclosure provides use of the polypeptides, fusion proteins, recombinant fusion proteins, combinations, nucleic acids, expression vectors, and/or host cells of any embodiment or combination of embodiments disclosed herein, to carry out any methods, including but not limited to those disclosed herein. Numerous exemplary uses of the polypeptides, fusion proteins, recombinant fusion proteins, combinations, nucleic acids, expression vectors, and/or host cells are described in the examples that follow. In exemplary non-limiting embodiments, the methods may include:
Here, we describe a new chemically-controlled method for rapidly disrupting the interaction between two basally co-localized protein binding partners. Our chemically-disrupted proximity (CDP) system is based on the interaction between the hepatitis C virus protease (HCVp) NS3a and a genetically-encoded peptide inhibitor. Using clinically-approved antiviral inhibitors as chemical disrupters of the NS3a/peptide interaction, we demonstrate that our CDP system can be used to confer temporal control over diverse intracellular processes. This NS3a-based CDP system represents a new modality for engineering chemical control over intracellular protein function that is complementary to currently available technologies.
Rationally manipulating protein localization can provide fundamental insights into cellular processes and is a powerful tool for engineering cellular behaviors. Techniques that allow temporal regulation of protein localization are particularly valuable for interrogating and programming dynamic cellular processes, with light and small molecules serving as the most widely used means of user-defined control. A strategy for the chemical control of protein localization is the use of chemically-induced proximity (CIP), which allows two proteins to be colocalized upon addition of a bridging small molecule.
Systems that allow the interaction of two basally colocalized proteins to be rapidly disrupted with a small molecule provide a method for temporally controlling intracellular protein function (
Here, we describe the development and use of a CDP system based on the hepatitis C virus protease (HCVp) NS3a and its interaction with a peptide inhibitor. Clinically-approved protease inhibitors that efficiently disrupt the NS3a/peptide interaction are available as bio-orthogonal inputs for this system. We first show that our NS3a-based CDP system can be used as a chemically-disruptable autoinhibitory switch for controlling the activity of an enzyme that activates RAS GTPase. We also demonstrate that the NS3a-based CDP system can be used to rapidly disrupt subcellular protein colocalization. Demonstrating the functional utility of chemically disrupting protein colocalization, we show that our NS3a-based CDP system can be used as a transcriptional off switch.
In order to use NS3a as a platform for a CDP system, a genetically-encoded binding partner that can be displaced with protease inhibitors was used. To provide this, we investigated the use of a peptide inhibitor of NS3a's serine protease activity (
We first explored using the NS3a/ANR interaction as a chemically-disruptable autoinhibitory switch for intramolecularly controlling the guanine nucleotide exchange factor (GEF) activity of the RAS GTPase activator Son of sevenless (SOS).
We used the computational modeling tool RosettaRemodel™ to guide the selection of flexible linker lengths with which to fuse ANR and NS3a to opposing termini of SOScat. Our goal was to identify linkers of sufficient length that NS3a and ANR can form an intramolecular complex but short enough that the complex is primarily centered over SOScat's active site, with an energetic penalty for adopting non-inhibitory conformations. To do this, we computationally treated variable linker length SOScat fusions with ANR at the N-terminus and NS3a at the C-terminus as a single loop closure problem (
To demonstrate the utility of our NS3a-CDAR design for activating the RAS/ERK pathway, we transfected HEK293 cells with a membrane-targeted variant of our computationally-designed construct (
We next investigated the utility of the NS3a/ANR interaction as an intermolecular CDP system by determining whether it could provide temporal control over protein colocalization. An N-terminal amphipathic helix—helix α0—from the NS3a variant used in our NS3a-CDAR construct has previously been demonstrated to interact with membranes (
To functionally test our NS3a chimeras, we used a fluorescent protein colocalization assay (
We next determined how rapidly the intracellular NS3a(H1)/ANR interaction can be disrupted. We found that the interaction between EGFP-ANR with mitochondrially-localized NS3a(H1) was completely disrupted within five minutes of asunaprevir addition (
The localization of transcriptional activation domains upstream of genes can drive transcription and subsequent protein expression. We reasoned that the NS3a(H1)/ANR interaction could function as a chemically-disruptable off switch of transcription. To test this notion, we first determined whether ANR was capable of colocalizing the transcriptional activator VP64-p65-Rta (VPR) with a Gal4 DNA-binding domain-NS3a(H1) fusion bound upstream of an mCherry™ reporter gene (
Finally, we explored whether our CDP system could be combined with chemical methods for activating transcription. To do this, we used a nuclease-null, chemically-inducible Cas9 (dciCas9) variant that is autoinhibited by the BCL-xL/BH3 interaction and can be activated with a chemical disrupter. An NS3a(H1)-VPR fusion was recruited upstream of a GFP reporter gene through its interaction with an MCP-ANR fusion bound to an MS2 stem loop of a scaffold RNA targeted to the Tet operator (
In sum, we have developed a CDP system based on the interaction between the viral protease NS3a and a genetically-encoded peptide inhibitor. We demonstrated that our NS3a-based CDP system can be used to engineer chemical control over a number of intracellular protein functions. The use of NS3a as a component of a CDP system further expands the utility of this protease as a chemically-controllable module. The reagents and chemically-controlled methods disclosed can be used to confer temporal control over intracellular protein function. Furthermore, the orthogonality of our CDP components to currently available CIP systems allows integration of these strategies.
The NS3a-CDAR construct was modeled after a previously developed BCL-xL/BH3 autoinhibited SOScat fusion design wherein a BH3 peptide was fused to the N-terminus (residue 574) of SOScat and BCL-xL was fused to the C-terminus (residue 1020). Due to similarities in the topology between the BCL-xL/BH3 complex and the NS3a/ANR complex, we limited our computational modeling to a construct composed of SOScat (574-1029) containing ANR fused to the N-terminus and NS3a fused to the C-terminus. ANR and NS3a were fused to SOScat through flexible linkers.
The NS3a/ANR complex (PDB 4A1X) was modeled using the RosettaRemodel™ conformational sampling protocol described previously (Rose, J. C. et al. Nat. Chem. Biol. 2017, 13, 119-126.). Briefly, the NS3a/ANR autoinhibitory complex was treated as a single rigid-body between the N- and C-termini of SOScat (PDB 1XD2). To allow this setup, the SOScat structure was circularly permuted, with a chain break introduced arbitrarily, away from the termini. This scheme allows for treatment of the NS3a/ANR complex across the termini as a loop closure problem, wherein a break is randomly introduced into one of the linkers to be reconnected via both random fragment moves and chain-closure algorithms guided by the Rosetta™ energy function; trajectories that properly reconnected the chain were considered successful.
Linkers were assigned the identity of repeating glycine-serine/threonine residues. We tested N-terminal linkers between 1 and 13 residues in length at 2 residue increments, and C-terminal linkers between 5 and 29 residues in length at 2 residue increments, giving 91 different linker length combinations.
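The linker-length grid described above can be enumerated in a few lines. This is only a sketch of the bookkeeping, not the modeling protocol itself; the variable names are illustrative.

```python
from itertools import product

# N-terminal linkers: 1 to 13 residues in 2-residue increments (7 lengths).
n_term_lengths = list(range(1, 14, 2))
# C-terminal linkers: 5 to 29 residues in 2-residue increments (13 lengths).
c_term_lengths = list(range(5, 30, 2))

# Every (N-terminal, C-terminal) pairing that would be modeled.
linker_combinations = list(product(n_term_lengths, c_term_lengths))

print(len(n_term_lengths), len(c_term_lengths), len(linker_combinations))
```

Seven N-terminal lengths paired with thirteen C-terminal lengths give the 91 combinations stated in the text.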
1,000 independent trajectories were sampled in 100 parallel runs that used the flags above. The lowest energy model from each successful trajectory was saved as a PDB file.
Non-biotinylated NS3a variants and ANR-GST fusions were obtained as double stranded DNA G-Blocks (IDT) containing Gibson Assembly overhangs designed in NEBuilder™ (NEB). ANR was designed with an N-terminal hexahistidine tag and a C-terminal Glutathione S-Transferase domain. NS3a protease genes were sub-cloned into the pMCSG7 vector backbone by PCR linearization of the vector, then Gibson assembly of the vector with the gene insert (NEB, product number E2611L). All NS3a constructs contained an N-terminal hexahistidine tag. This NS3a fusion was used for all in vitro experiments with NS3a except for the protease assay shown in
NS3a for biotinylation was cloned into the pDW363 vector. NS3a was N-terminally fused to AviTag™ biotin acceptor peptide followed by a hexahistidine tag. The pDW363 vector contains a bi-cistronic BirA biotin ligase. Avi-tagged NS3a was cloned into pDW363 via PCR-linearization of the vector, followed by Gibson assembly with the gene insert, obtained as double stranded DNA G-Blocks containing Gibson Assembly overhangs designed in NEBuilder™.
All constructs for NS3a-CDAR and sub-cellular colocalization microscopy experiments were obtained as codon-optimized, double-stranded DNA G-Blocks™ (Integrated DNA Technologies) containing Gibson Assembly overhangs designed in NEBuilder™ (NEB). Genes were sub-cloned into pcDNA5/FRT/TO vector (Thermo Fisher Scientific) by PCR linearization of the vector, then Gibson Assembly of the vector with the gene insert. ANR and NS3a sequence variants were obtained via Quikchange™ mutagenesis.
Plasmids containing single-guide RNAs (TRE3G) were generated by cloning into gRNA Cloning Vector (a gift from George Church; Addgene plasmid #41824). DNA corresponding to the guide target was ordered as a single-stranded oligonucleotide containing Gibson assembly overhangs complementary to the vector and assembled with AflII-digested gRNA vector. A scaffold RNA (scRNA) targeting TRE3G containing two MS2 hairpins was cloned into dual insert vectors derived from pSico™, expressing the scaffold RNA under a U6 promoter and the protein inserts under a CMV promoter: pJZC34 (MS2/MCP) (gift from Jesse Zalatan). All MS2 fusions were expressed as P2A-BFP fusions instead of the IRES-mCherry fusions in the original vectors.
The parental pLenti Gal4 reporter plasmid ‘G143’ (UAS-mCherry™/CMV-Gal4-ERT2-VP16-P2A-Puro) was a gift from Doug Fowler. The ERT2-VP16 and Puromycin resistance cassette was exchanged for NS3a(H1)-P2A-ANR—BFP-NLS-VPR. Fragments were obtained from the previously mentioned pcDNA5/FRT/TO expression systems by PCR and restriction digesting G143 with BamHI and SexAI. Fragments and digested vector were assembled using Gibson Assembly.
All PCR reactions (vector linearizations, Gibson assembly insert preparation, and Quikchanges) were performed with Q5 polymerase (New England Biolabs). All Gibson assembly reactions were performed with NEBuilder™ HiFi Assembly Master Mix (New England Biolabs). Oligonucleotides and Gene Blocks™ used for cloning were synthesized by Integrated DNA Technologies. Correct insertion of the genes and vector preparations were verified by whole gene sequencing (Genewiz). Protein sequences for all constructs used are provided in Table 13.
The SNAPtag™-NS3a-His6 plasmid was transformed into BL21(DE3) E. coli cells. One colony was used to inoculate 5 mL of LB broth with ampicillin (100 μg/mL). 18 hours post inoculation, the entirety of the 5 mL culture was used to inoculate 500 mL of LB broth with ampicillin (100 μg/mL). Cultures were grown at 37° C. to an OD600 of 0.8, cooled to 18° C. and induced with 0.25 mM IPTG. Protein was expressed at 18° C. overnight. Cells were harvested by centrifugation and pellets stored at −80° C. For SNAPtag-NS3a purification, the pellets were thawed on ice and re-suspended in 10 mL of LS-His6 Lysis Buffer (50 mM HEPES pH 7.8, 100 mM NaCl, 20% (w/v) glycerol, 20 mM imidazole, 5 mM DTT). The re-suspended cell pellet was lysed via sonication and the lysate was cleared by centrifugation. The cleared lysate was purified using Ni-NTA agarose (Qiagen) by rotating at 4° C. for 1 hour. The resin was subsequently washed with 10 mL of LS-Lysis Buffer and the protein was eluted in 3 mL of LS-Elution Buffer (50 mM HEPES pH 7.8, 100 mM NaCl, 20% (w/v) glycerol, 200 mM imidazole, 5 mM DTT). Purified protein was dialyzed twice into 1000 mL LS-Storage Buffer (50 mM HEPES pH 7.8, 100 mM NaCl, 20% (w/v) glycerol, 5 mM DTT, 0.6 mM lauryldimethylamine-N-oxide). Protein was stored by snap-freezing aliquots and storing at −80° C.
NS3a variant expressions were performed in BL21(DE3) E. coli by growing cells at 37° C. to an OD600 of 0.5-1.0, then moving them to 18° C. Immediately following transfer to 18° C., protein expression was induced with 0.5 mM IPTG overnight. For biotinylated constructs, 12.5 mg of D(+)-biotin/L was added during inoculation with the overnight culture. Following 16-20 hours of overnight growth, cultures were harvested, and cell pellets were frozen at −80° C. Cell pellets were then re-suspended in 20 mM Tris pH 8.0, 500 mM NaCl, 5 mM imidazole, 1 mM DTT, 0.1% Tween-20. All buffers for NS3a variant purifications included 10% v/v glycerol. Cells were lysed by sonication, and the supernatant was incubated with Ni-NTA resin (Qiagen) for a minimum of 1 hour at 4° C. Ni-NTA resin was then washed with three volumes of “NS3a-Wash Buffer” (20 mM Tris pH 8.0, 500 mM NaCl, 20 mM imidazole, 10% glycerol), and proteins were eluted with “NS3a Elution Buffer” (20 mM Tris pH 8.0, 500 mM NaCl, 300 mM imidazole, 10% glycerol). Purified protein was dialyzed twice (3.5 kDa mwco Slide-A-Lyzer™ dialysis cassettes, Thermo Scientific) into 1000 mL NS3a-Storage Buffer (50 mM HEPES pH 7.8, 100 mM NaCl, 10% (w/v) glycerol, 5 mM DTT, 0.6 mM lauryldimethylamine-N-oxide). Protein was stored by snap-freezing aliquots in liquid nitrogen and storing at −80° C. Biotinylated constructs were then further purified by size exclusion chromatography on a Superdex-75 10/300 GL column (GE Healthcare) in a buffer of 20 mM Tris pH 8.0, 300 mM NaCl, 1 mM DTT, 10% glycerol.
His6-ANR-GST plasmid was expressed in BL21(DE3) E. coli cells. 18 hours post inoculation, the entirety of the 5 mL culture was used to inoculate 250 mL of LB broth with ampicillin (100 μg/mL). Cultures were grown at 37° C. to an OD600 of 0.8, cooled to 18° C. and induced with 0.5 mM IPTG. Protein was expressed at 18° C. overnight. Cells were harvested by centrifugation and pellets stored at −80° C. For ANR-GST purification, the pellet was thawed on ice and re-suspended in 10 mL of His6 Lysis Buffer (50 mM HEPES pH 7.8, 100 mM NaCl, 20 mM imidazole, 5 mM DTT) supplemented with PMSF (1 mM). The re-suspended cell pellet was lysed via sonication and the lysate was cleared by centrifugation. The cleared lysate was purified using Ni-NTA agarose (Qiagen) by rotating at 4° C. for 1 hour. The resin was subsequently washed with 10 mL of Lysis Buffer and the protein was eluted in 3 mL of Elution Buffer (50 mM HEPES pH 7.8, 100 mM NaCl, 200 mM imidazole, 5 mM DTT). Purified protein was dialyzed twice into 1000 mL Storage Buffer (50 mM HEPES pH 7.8, 100 mM NaCl, 5 mM DTT). Protein was stored by snap-freezing aliquots and storing at −80° C.
Grazoprevir was purchased from MedChem Express (MK-5172, product #: HY-15298). Asunaprevir (BMS-650032, product #: A3195) and Danoprevir (RG7227, product #: A4024) were both purchased from ApexBio. A-115463 was purchased from ChemieTek (Product #: CT-A115).
The affinities of the NS3a variants for ANR were determined using a fluorescence polarization assay. Fluorescently labeled ANR (FAM-ANR,
Fluorescence polarization competition assays were used to determine the ability of danoprevir to displace ANR. A 75 nM solution of NS3a in FP-Buffer was incubated with 50 nM FAM-ANR in a black 96-well plate for 1 hour in the dark. 3-fold serial dilutions of danoprevir were prepared in FP-Buffer such that, when added to the NS3a/FAM-ANR solution, the highest concentration of danoprevir tested was 10 μM. Plates were incubated for 1 hour in the dark. Fluorescence polarization was measured at 22° C. on a Perkin Elmer EnVision™ fluorometer (excitation, 495 nm; emission 520 nm). Each measurement was carried out in triplicate. Anisotropy values were obtained and a nonlinear regression model was used to fit curves with GraphPad Prism.
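The 3-fold danoprevir dilution series can be sketched as follows. The top concentration (10 μM) and the dilution factor come from the text; the number of points in the series is not stated, so the value of eight used here is an assumption for illustration only.

```python
def serial_dilution(top_conc: float, fold: float, n_points: int) -> list[float]:
    """Final concentrations for a serial dilution, highest first."""
    return [top_conc / fold**i for i in range(n_points)]


# 10 uM top concentration, 3-fold steps; n_points=8 is a hypothetical choice.
danoprevir_uM = serial_dilution(top_conc=10.0, fold=3.0, n_points=8)

for conc in danoprevir_uM:
    print(f"{conc:.4f} uM")
```

The same helper would describe the ANR-GST titration in the protease assay, which also uses 3-fold dilutions starting at 10 μM.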
The potency of ANR against NS3a protease was determined via a FRET assay. Titrations of ANR-GST (3-fold serial dilutions starting at 10 μM) were added to a black 96-well plate (Corning, product number 3720) containing 50 nM SNAPtag-NS3a. Reactions were incubated with NS3a-SNAPtag at room temperature for 1 hour. To each well was simultaneously added substrate M-2235 (Bachem) to a final concentration of 5 μM and reactions were monitored by measuring the fluorescence intensity every minute for 30 minutes at 22° C. on a Perkin Elmer EnVision™ fluorometer (excitation, 360 nm; emission 460 nm). Each measurement was carried out in triplicate. Slopes of the fluorescence increase were compared to a no-protease control. A nonlinear regression model was used to fit curves using GraphPad™ Prism.
Pierce high-capacity streptavidin beads (Thermo-Fisher #P120359) were prepared by washing three times with Buffer PDA (TBS+0.05% Tween+0.5 mg/mL BSA). For each condition and each replicate, beads were washed and incubated separately. The wash was performed by adding 200 μL Buffer PDA to 30 μL of a 50/50 bead slurry, inverting to mix, and spinning down (2500×g for 2 min). The supernatant was removed by pipetting, and the wash was repeated two more times to end with a 50/50 slurry of beads in wash buffer.
Purified biotinylated NS3a was prepared at a 50× final concentration and 10 μL were added to a 490 μL 50/50 slurry of streptavidin beads and Buffer PD for final NS3a concentration of 125 nM. Beads were incubated and rotated at 4° C. After one hour, beads were harvested and washed three times as described previously, ending in a 50/50 bead/buffer slurry. ANR was added to all samples at a final concentration of 5 μM. For the danoprevir treated samples, danoprevir was added to a final concentration of 10 μM. Buffer PD was added to a final volume of 500 μL, and the beads were incubated and rotated at 4° C. After 1 hour, beads were pelleted and washed three times in Buffer FDB (TBS buffer+0.05% Tween) with 5 minute incubations between washes on a rotator at 4° C. To obtain final bound protein, beads were pelleted and supernatant was aspirated, resulting in a final volume of beads of 20 μL. 10 μL 3×SDS loading dye was added directly to beads and boiled at 90° C. for 10 min. Bead mixture was pelleted and supernatants were loaded directly onto a polyacrylamide gel for Western Blot analysis (Mini-PROTEAN™ TGX Any kD, Bio-Rad #456-9036).
NIH-3T3 cells were maintained in DMEM (Gibco, product number 11065092) supplemented with 10% FBS (Gibco, product number A3160602). All transient transfections were done using LipoFectamine3000 (ThermoFisher, product number L3000015) at a ratio of 3:2:1 LipoFectamine3000:p3000Reagent:DNA (μg) prepared in OptiMem™ (Gibco, product number 11058021) 16-20 hours after plating of cells. Transfections were allowed to proceed for 24 hours before experiments were performed. Cells were tested and found free of mycoplasma monthly.
24 hours prior to transfection, 3×10⁴ 3T3 cells were plated onto 18 mm glass cover slips (Fisher, product number 12-546) in a standard 12-well plate. After co-transfection with the appropriate NS3a/ANR pairs (Tom20-mCherry™-NS3a(H #)/EGFP-ANR2, Myr-mCherry™-ANR2/EGFP-NS3a(H1), or NLS3-BFP-ANR2/EGFP-NS3a(H1)), cells were allowed to recover for 24 hours before treatment with 10 μM asunaprevir or DMSO (0.5% DMSO final concentration). Cells were incubated with drug for the stated time points before media was aspirated, then washed once with chilled PBS, and immediately fixed in 4% paraformaldehyde (Electron Microscopy Services, product number 15710). Paraformaldehyde solution was prepared in 1×PBS and cells were allowed to fix for 15 minutes. Paraformaldehyde was removed and cells were washed twice with chilled PBS. Slides were mounted onto glass cover slips using Fluoromount G (Southern Biotechnology, product number 0100-01) and sealed. Images were generated using a Leica SP8× Confocal Microscope. A UV laser at 405 nm was used for BFP. White lasers (488 nm and 587 nm) were used for EGFP and mCherry™, respectively. BFP fluorescence emissions were recorded using a PMT detector. EGFP and mCherry™ fluorescence emissions were recorded by separate HyD detectors. Images were acquired using a 63× oil objective at 512×512 resolution. Only images of cells exhibiting both mCherry™ and EGFP (or both BFP and EGFP for nuclear colocalization) were collected. The degree of colocalization was measured as Pearson's r-correlation coefficients. Pearson's r coefficients were determined using ImageJ™.
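For reference, the Pearson's r statistic used as the colocalization measure can be written out directly. The sketch below is not the ImageJ™ procedure used in the study; it only shows the underlying formula applied to two equal-length lists of pixel intensities (for example, flattened mCherry™ and EGFP channels). The function name `pearson_r` and the flat-list input format are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the channel means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant channel has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

Values near 1 indicate that the two channels rise and fall together across pixels, and values near 0 indicate no linear relationship between them.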
All P-values are from unpaired, two-sided t-tests, computed using Graphpad™ Prism 5.
HEK293 and HEK293T cells were maintained in DMEM (Gibco, #11065092) supplemented with 10% FBS (Gibco, product number A3160602). Transient transfections for all experiments were carried out using TurboFectin8.0 (Origene) at a ratio of 3:1 TurboFectin™:DNA (μg) prepared in OptiMem™ (Gibco, #11058021) 16-20 hours after plating of cells. Transfections were allowed to proceed for 18-24 hours before experiments were performed or media was exchanged. Cells were tested and found free of mycoplasma monthly.
18-24 hours prior to transfection, 3.0×10⁵ HEK293 cells were plated onto poly-D-lysine 12 well plates. Immediately prior to transfection, media was aspirated and cells were washed with 1 mL of pre-warmed (37° C.) PBS, then serum starved with FBS-free DMEM. Following serum starvation, cells were transfected with 1 μg of FLAG-tagged NS3a-CDAR, BH3-NS3a-CDAR, or an empty pCDNA5 vector. Transfected cells were allowed to serum starve for 18-20 hours prior to drug treatment. For drug treatment, serum-free media was prepared with DMSO or 10 μM of a drug. Media was aspirated, and cells were washed once with pre-warmed DPBS, then treated with drug/DMSO media for the requisite amount of time. Media was subsequently aspirated and the cells were washed twice with 1 mL chilled PBS, then lysed with 75 μL Mod. RIPA buffer (50 mM Tris, pH 7.8, 1% IGEPAL CA-630, 150 mM NaCl, 1 mM EDTA, 2 mM Na3VO4, 30 mM NaF, Pierce Protease Inhibitor Tablet). Cleared lysates were subjected to SDS-PAGE and transferred to nitrocellulose. Blocking and antibody incubations were done in TBS with 0.1% Tween-20 (v/v) and blocking buffer (Odyssey). Primary antibodies were all purchased from Cell Signaling Technologies and were diluted as follows: Total ERK (1:2500, #9107), phosphorylated ERK (1:2500, #4370), FLAG (1:2500, #D6W5B). Blots were washed three times in TBS with 0.1% Tween-20. Antibody binding was detected by using near-infrared-dye-conjugated secondary antibodies and visualized on the LI-COR Odyssey scanner. Blots were quantified via densitometry with Image Studio (LI-COR).
18-24 hours prior to transfection, HEK293T cells were plated in a 12-well plate at a density of 1.25×10⁵ cells/mL. Cells were subsequently transfected with 1 μg of the Gal4 reporter plasmid (UAS-mCherry™/CMV-Gal4-NS3a(H1)-P2A-ANR-Myc-BFP-VPR-NLS) in OptiMem™. For the negative control experiment, 500 ng of a plasmid where ANR was replaced with the non-NS3a binding protein DNCR2 (UAS-mCherry™/CMV-Gal4-NS3a(H1)-P2A-DNCR2-Myc-VPR-NLS) was co-transfected with 500 ng of a BFP expressing reporter plasmid in OptiMem™. 16 hours post transfection, cells were washed with 1 mL DPBS. Complete media containing 1 μM danoprevir, 1 μM grazoprevir, or DMSO was subsequently added to each well. 24 hours after drug treatment, media was removed and cells were washed with 1 mL DPBS, then detached with 200 μL Versene™ (Sigma-Aldrich, 15-040-066). Cells were then re-suspended with 500 μL DPBS, and pelleted at 2500 rpm for 3 min at room temperature. Supernatant was subsequently removed and the cells were re-suspended in 400 μL DPBS and analyzed on a FACS LSRII (BD Biosciences).
For Gal4/NS3a-CDP mediated transcriptional activation FACS experiments, 10,000 single cell events were collected for each of the samples run. Of these 10,000 single cell events, the median mCherry™ fluorescence signal is reported only for cells exhibiting BFP signal greater than that of non-transfected cells. The gathered FACS data were analyzed using FlowJo™ (v.10.1).
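The gating-then-median summary described above can be illustrated with a short sketch. This is not the FlowJo™ workflow itself. The function name `gated_median`, the `(bfp, reporter)` tuple format, and the way the BFP threshold is supplied are all assumptions; the text states only that the threshold is set relative to non-transfected cells, not how it is computed.

```python
from statistics import median


def gated_median(events, bfp_threshold):
    """Median reporter signal among events whose BFP signal exceeds the threshold.

    events: iterable of (bfp, reporter) intensity pairs, one per single-cell event.
    bfp_threshold: BFP level taken to represent non-transfected cells (assumed input).
    Returns None if no event passes the gate.
    """
    gated = [reporter for bfp, reporter in events if bfp > bfp_threshold]
    if not gated:
        return None
    return median(gated)


# Toy data: the first event falls below the gate and is ignored.
example_events = [(10, 100), (200, 500), (300, 700), (250, 600)]
example_median = gated_median(example_events, bfp_threshold=100)
```

Reporting the median rather than the mean keeps the summary robust to the small number of very bright cells that are typical of transient transfections.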
dciCas9-Mediated Transcription
GFP expression experiments were performed in a HEK293T cell line with GFP stably integrated downstream of a tetracycline-inducible landing pad (7x-TRE3G operator) created in a similar manner as a previously reported Tet-Bxb1-BFP HEK293T cell line (Matreyek et al. Nucleic Acids Res. 2017, 45, e102.). For the dciCas9-mediated transcriptional activation experiment, 6×10⁴ cells/well were plated in 12-well plates on day 1 and transfected with 1 μg total DNA on day 2 (0.3 μg dciCas9 vector, 0.3 μg NS3a(H1)-VPR vector, and 0.4 μg NLS-MCP-ANR2/TRE3G scaffold RNA vector). 18 hours after transfection, media was replaced with complete DMEM containing DMSO, 10 μM A115, or 10 μM A115 and 10 μM grazoprevir. 48 hours post drug treatment, media was aspirated and cells were washed with 1 mL pre-warmed DPBS, then detached and analyzed as described in the chemically-disruptable Gal4(DBD)-NS3a(H1)/VPR-ANR/transcriptional regulation experiment.
For FACS analysis, 10,000 single cell events were collected for each of the samples run. Of these 10,000 single cell events, the median GFP fluorescence signal is reported only for cells exhibiting BFP signal greater than that of non-transfected cells. The gathered FACS data were analyzed using FlowJo (v.10.1).
All P-values are from unpaired, two-sided t-tests, computed using Graphpad™ Prism 5.
NS3a
LLGIIITSATGRDKNQVDGEVQVLSTATQSFLATCVNGVCWTVYHGAGSKT
LAGPKGPITQMYTNVDQDLVGWPAPPGARSMTPCTCGSSDLYLVTRHADV
IPVRRRGDSRGSLLSPRPVSYLKGSSGGPLLCPSGHVVGIFRAAVCTRGVAK
AVDFIPVESMETTMRGSHHHHHH (SEQ ID NO: 51)
S139A
DTAYSQQTRGLLGCIITSATGRDKNQVDGEVQVLSTATQSFLATCVNGVC
WTVYHGAGSKTLAGPKGPITQMYTNVDQDLVGWPAPPGARSMTPCTCGS
SDLYLVTRHADVIPVRRRGDSRGSLLSPRPVSYLKGSAGGPLLCPSGHVVGI
FRAAVCTRGVAKAVDFIPVESMETTMR (SEQ ID NO: 52)
DTAYSQQTRGLLGCIITSATGRDKNQVDGEVQVLSTATQSFLATCVNGVC
WTVYHGAGSKTLAGPKGPITQMYTNVDQDLVGWPAPPGARSMTPCTCGS
SDLYLVTRHADVIPVRRRGDSRGSLLSPRPVSYLKGSSGGPLLCPSGHVVGI
FRAAVCTRGVAKAVDFIPVESMETTMR (SEQ ID NO: 53)
S139A
ETSQTGRDKNQVEGEVQIVSTATQTFLATSINGVLWTVYHGAGTRTIASPK
GPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRR
RGDSRGSLLSPRPISYLKGSAGGPLLCPAGHAVGIFRAAVSTRGVAKAVDFI
PVESLETTMRSP (SEQ ID NO: 54)
ETSQTGRDKNQVEGEVQIVSTATQTFLATSINGVLWTVYHGAGTRTIASPK
GPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRR
RGDSRGSLESPRPISYLKGSSGGPLLCPAGHAVGIFRAAVSTRGVAKAVDFI
PVESLETTMRSP (SEQ ID NO: 55)
TSQTGRDKNQVEGEVQVVSTATQSFLATSINGVLWTVYHGAGTRTIASPK
GPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRR
RGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAAVSTRGVAKAVDFI
PVESLETTMRSP (SEQ ID NO: 56)
NS3a/NS3a*
chimera H1-
YSQQTRGLEGCQETSQTGRDKNQVEGEVQVVSTATQSFLATSINGVLWTV
YHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLY
LVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAA
VSTRGVAKAVDFIPVESLETTMRSPGSGTGSGTSGSTGTGSTGDYKDDDDK
NS3a/NS3a*
chimera H2-
YSQQTRGELGCQETSQTGRDKNQVEGEVQVVSTATQSFLATSINGVLWTV
YHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLY
LVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAA
VSTRGVAKAVDFIPVESLETTMRSPGSGTGSGTSGSTGTGSTGDYKDDDDK
NS3a/NS3a*
chimera H3-
YSQQTRGLLGCQETSQTGRDKNQVEGEVQVVSTATQSFLATSINGVLWTV
YHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLY
LVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAA
VSTRGVAKAVDFIPVESLETTMRSPGSGTGSGTSGSTGTGSTGDYKDDDDK
NS3a/NS3a*
chimera H4-
YSQQTRGLLGCIETSQTGRDKNQVEGEVQVVSTATQSFLATSINGVLWTV
YHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLY
LVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAA
VSTRGVAKAVDFIPVESLETTMRSPGSGTGSGTSGSTGTGSTGDYKDDDDK
NS3a/NS3a*
chimera H5-
YSQQTRGLLGCIITSQTGRDKNQVEGEVQVVSTATQSFLATSINGVLWTVY
HGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLYL
VTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAAV
STRGVAKAVDFIPVESLETTMRSPGSGTGSGTSGSTGTGSTGDYKDDDDK
NS3a/NS3a*
chimera H6-
YSQQTRGLEGCIETSQTGRDKNQVEGEVQVVSTATQSFLATSINGVLWTV
YHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLY
LVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAA
VSTRGVAKAVDFIPVESLETTMRSPGSGTGSGTSGSTGTGSTGDYKDDDDK
NS3a/NS3a*
chimera H7-
YSQQTRGEEGCQETSQTGRDKNQVEGEVQVVSTATQSFLATSINGVLWTV
YHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLY
LVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAA
VSTRGVAKAVDFIPVESLETTMRSPGSGTGSGTSGSTGTGSTGDYKDDDDK
NS3a/NS3a*
chimera H1-
LSGDTAYAQQTRGLEGCQETSQTGRDKNQVEGEVQIVSTATQTFLATSING
VLWTVYHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTC
GSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSAGGPLLCPAGHAV
GIFRAAVSTRGVAKAVDFIPVESLETTMRSP (SEQ ID NO: 64)
ANR-mCherry
ANR
DGSGTGSGTGSGTGTTSGTGTGGSTGGELDELVYLLDGPGYDPIHSD
NS3a/NS3a*
AKGSVVIVGRINLSGDTAYSQQTRGLEGCQETSQTGRDKNQVEGEVQVVS
chimera H1-P2A-
TATQSFLATSINGVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDLVGWQ
ANR-Myc-BFP-
APQGSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGS
AGGPLLCPAGHAVGIFRAAVSTRGVAKAVDFIPVESLETTMRSP
GSGATNFS
LLKQAGDVEENPGPGALSGMGELDELVYLLDGPGYDPIHSDGVLSGSGTGSGTG
DNCR2-VPR
VQIVSTATQTFLATSINGVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDL
VGWQAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISY
LKGSAGGPLLCPAGHAVGIFRAAVSTRGVAKAVDFIPVESLETTMRSP
GSG
ATNFSLLKQAGDVEENPGP
MSSDEEEARELIERAKEAAERAQEAAERTGDPRVREL
ARELKRLAQEAAEEVKRDPSSSDVNEALKLIVEAIEAAVDALEAAERTGDPEVRELAR
ELVRLAVEAAEEVQRNPSSSDVNEALHSIVYAIEAAIFALEAAERTGDPEVRELARELV
RLAVEAAEEVQRNPSSRNVEHALMRIVLAIYLAEENLREAEESGDPEKREKARERVRE
AVERAEEVQRDPSGWLNHEQKLISEEDLDALDDFDLDMLGSDALDDFDLDMLG
GTACGTTCTCTATCACTGATAGTTTAAGAGCTATGCTGGAAACAGCATAG
Methods for post-translational, dynamic control over intracellular protein function are valuable tools for studying naturally-occurring biological systems and for engineering synthetic systems. Existing chemical and optogenetic systems for controlling protein function are largely restricted to providing single-input/single-output control schemes. To address this, we have created a system using the hepatitis C virus protease NS3a as a single receiver protein that binds multiple drug inputs and is recognized by a set of reader proteins to produce divergent outputs. The keys to the development of this multi-input/multi-output system, called Pleiotropic Response Outputs from a Chemically-Inducible Single Receiver (PROCISiR), are computationally designed reader proteins that can discriminate between different NS3a-drug complexes. The unique, responsive architecture of PROCISiR enables proportional and temporal control modes that are unobtainable with current systems. In signaling or transcriptional applications, we demonstrate output reversibility, switching, tunability, ratiometric control, and fine specification of intermediate levels of two outputs. Given the availability of multiple NS3a-targeting drugs and our ability to create protein readers of specific drug-bound NS3a complexes, PROCISiR can be scaled to provide unprecedented multi-state control over intracellular protein function. These complex control modalities can be readily applied to both in vitro studies of mammalian cellular processes and in vivo signaling and transcriptional control programs for engineered cell therapies.
Mammalian cells are complex information processing systems that receive and transmit many signals through interconnected signaling networks to produce diverse arrays of responses. Multi-functional proteins, such as receptor tyrosine kinases and GPCRs, that can receive multiple inputs and provide variable outputs are central components of these networks, allowing flexible and complex control over cellular behavior. We identified HCV protease NS3a as an attractive central receiver protein that can serve as a control hub for a chemically-controlled multi-input/multi-output system called PROCISiR (
Rosetta™ interface design allowed us to develop protein readers that selectively recognize a binding surface centered on NS3a-bound inhibitors (
To improve D5's affinity for the NS3a/danoprevir complex, we used two sequential yeast surface display libraries (
The high specificity of DNCR2 provided confidence that we could design additional readers that selectively recognize other NS3a/drug complexes. We computationally designed a reader of the grazoprevir/NS3a complex by applying a similar methodology. One design of the 29 tested, G3, showed modest, grazoprevir-dependent binding, which was not observed for the original scaffold, DHR18, or for G3 variants containing interface mutations (M112E and A175Q) (
With our two drug/NS3a complex readers, DNCR2 and GNCR1, and the apo-NS3a reader (ANR), we now had three readers to combine with NS3a in our PROCISiR system (
The ability of our readers to discriminate between different states of NS3a allows complex control modes to be achieved by combining inputs and/or readers, a capability not shared by chemically inducible systems for which there is only one input and one protein complex. First, we used danoprevir as an agonist and grazoprevir as an antagonist to temporally and proportionally control transcription of one endogenous gene using DNCR2-VPR (a transcriptional activator) and an NS3a-dCas9 fusion (Streptococcus pyogenes). We used danoprevir to induce transcriptional activation of CXCR4 from its endogenous promoter, and then rapidly reversed CXCR4 expression by using grazoprevir as a competitive chaser (mRNA reversion t1/2 of 1.3 hours) (
We then applied our PROCISiR method to provide orthogonal control of multi-gene transcription using dCas9 with scaffold RNAs (scRNAs) that contain loci-targeting, single guide RNAs and embedded stem loops recognized by RNA-binding proteins (RBPs). Using an MS2 scRNA targeting endogenous CXCR4 and a PP7 scRNA targeting the Tet operator of a GFP reporter, together with GNCR1-MCP and DNCR2-PCP RBP fusions, respectively, we directed NS3a-VPR to orthogonally induce transcription of each gene (
Finally, we applied PROCISiR to directly control the relative activation of two signaling pathways through localization of DNCR2 and GNCR1 to the plasma membrane via NS3a-CAAX (
Here, we present two new readers with de novo designed interfaces that selectively recognize highly similar protein-small molecule complexes. The ability to discriminate between such closely-related binding surfaces highlights the power of computational protein design and suggests that it will be possible to exploit the wealth of additional NS3a inhibitors available to rapidly expand the number of protein readers, and subsequent outputs, available for the PROCISiR system. Furthermore, a similar strategy can be applied to alternative protein-small molecule complexes. Our designed readers have several characteristics that will make them useful replacements for the existing chemically induced dimerizers, in particular, the high potency, reversibility, favorable pharmacokinetics, and bio-orthogonal nature of the NS3a inhibitors. These characteristics are in demand for in vivo applications such as drug-based control of cellular therapeutics.
The architecture of the PROCISiR system with its multiple inputs, three readers, and single receiver protein enables many unique, fine-scale modulations for in vitro mammalian cell biology. Use of PROCISiR as a post-translational controller allows simulation of a wide range of signaling and transcription states in a quantitative and targeted manner. Our ability to use a combination of inputs and readers to finely modulate gene expression allows temporal induction of the small-scale changes of gene expression observed during development and cancer progression, a capability not matched by the binary, and often non-physiological levels achievable with existing gene induction systems. We extended this fine proportional control of two outputs to concurrently modulate the levels of activity of two signaling pathways, demonstrating the ability to tune levels of individual pathway activity and their crosstalk. Because the danoprevir/grazoprevir ratios are manifested in the fractions of total NS3a bound to each drug, these proportional response regimes are not limited to the narrow drug concentrations of a bimolecular binding interaction, as they are for individual chemically induced dimerizers. The integrated nature of our system enables these more nuanced input-output response structures, which allows researchers to simulate and study the subtle perturbations to signaling and transcription that occur between normal and diseased cell states.
Briefly, small molecule parameters were generated with OpenBabel™ and scaffolds were docked to NS3a/drug complexes with PatchDock™ or RIFdock™ (grazoprevir/NS3a reader). The interface of the scaffold was designed with a custom RosettaScript™, and designs to test were manually selected after filtering by several design metrics.
Note that there were three variants of the NS3a protein sequence used in this study. A solubility optimized NS3a/4a (either catalytically active or catalytically dead, S139A) derived from HCV genotype 1a was used for the majority of the work with the designed readers. Genotype 1a NS3a/4a does not interact with the peptide ANR, which was selected to interact with genotype 1b NS3a; therefore, we engineered a hybrid NS3a/4a, NS3aH1, which is the solubility optimized NS3a/4a with four mutations needed for interaction with ANR: A7S, E13L, I35V, and T42S. NS3aH1 (catalytically active) was used for the majority of the microscopy colocalization and transcription-control constructs. NS3a/4a solubility optimized S139A was used for membrane signaling constructs with DNCR2 and GNCR1. The NS3a/4a fusion is referred to as NS3a throughout the paper. The NS3a variant used is described for each experiment below and in Table 14.
Bacterial expression constructs: Biotinylated proteins were expressed from the pDW363 vector, which encodes a bi-cistronic BirA biotin ligase. Proteins were N-terminally tagged with the biotin acceptor peptide, followed by a His6 tag. Constructs were cloned into pDW363 via PCR-linearization of the vector, followed by Gibson assembly with the gene insert. Untagged proteins were expressed from the pCDB24 vector (gift of Christopher Bahl, Baker lab), which encodes proteins with an N-terminal His10-Smt3 tag, which is scarlessly removed by ULP1. Linear gene inserts with overhangs and a stop codon added were inserted via Gibson assembly into pCDB24 that had been linearized with XhoI (New England Biolabs).
Yeast surface expression constructs: Danoprevir/NS3a reader designs were synthesized as linear genes by Gen9. All yeast constructs were cloned by homologous recombination in yeast with linearized pETCON™ vector (NdeI-/XhoI-cut, New England BioLabs). pETCON™ encodes Aga-2, the inserted gene, and a C-terminal c-myc tag for expression detection. Grazoprevir/NS3a reader designs were synthesized and constructed in complete pETCON™ plasmids by Genscript.
Mammalian expression constructs: All constructs were made in pcDNA5/FRT/TO (Thermo Fisher Scientific) unless otherwise noted. pcDNA5/FRT/TO was either linearized via PCR, or cut by BamHI and EcoRV, and inserts and vector were assembled by Gibson assembly. Dual expression constructs of DNCR2-VPR/KRAB and NS3aH1-dCas9 were made in PiggyBac™ vectors (pSLQ2818 pPB: CAG-PYL1-KRAB-IRES-Puro-WPRE-SV40PA-PGK-ABI-tagBFP-SpdCas9 and pSLQ2817 pPB: CAG-PYL1-VPR-IRES-Puro-WPRE-SV40PA-PGK-ABI-tagBFP-SpdCas9, gifts from Stanley Qi (Addgene plasmids #84241 and 84239)). The PiggyBac vectors were linearized by restriction enzyme digest, and PCR amplified inserts and digested vector were assembled by Gibson assembly. pCDNA5/FRT/TO-MCP-NS3a-P2a-DNCR2-KRAB-MeCP2-P2a-GNCR1-VPR-IRES-BFP was assembled with fragments PCR amplified from the following sources: MCP from pJZC34 (see below), KRAB-MeCP2 was a gift from Alejandro Chavez & George Church (Addgene 110821), VPR from one of the above-mentioned pPB vectors, and DNCR2, GNCR1, and NS3a (solubility optimized S139A) from gBlocks.
Single-guide RNAs (CXCR4, CD95, TRE3G) were cloned into the gRNA Cloning Vector, a gift from George Church (Addgene plasmid #41824). DNA corresponding to the guide target was ordered as a single stranded oligo with overlap to the vector and assembled with AflII-digested gRNA vector by Gibson Assembly. Scaffold RNAs (targeting CXCR4, CD95, or TRE3G with com, PP7, or MS2, respectively) were cloned into dual insert vectors derived from pSico™, expressing the scaffold RNA under a U6 promoter and the protein inserts under a CMV promoter: pJZC33 or 34 (MS2/MCP), pJZC43 (PP7/PCP), pJZC48 (com/com), gifts from Jesse Zalatan. All RNA-binding protein-reader fusions were expressed with P2a-tagBFP in place of the IRES-mCherry™ in the original vectors. This vector was also the basis of the scRNA-only vectors, which were used when all readers/RBPs were expressed separately. These vectors expressed only a tagBFP downstream of the CMV, and the guide plus 2×MS2 (wt+f6 sequences) under the U6 promoter.
pCDNA5/FRT/TO-Lifeact-mCherry™ was created from mCherry™-Lifeact-7, a gift from Michael Davidson (Addgene plasmid #54491). pEF5-FRT-mCherry-NS3a-CAAX-IRES-EGFP-DNCR2-P2a-BFP-GNCR1 was created by assembling readers and fluorescent proteins from other constructs in a pEF5-FRT backbone obtained by digestion of Addgene plasmid #61684, a gift from Maxence Nachury. pPB-NS3a-CAAX-IRES-EGFP-DNCR2-TIAM-BFP-GNCR1-LARG and pPB-NS3a-CAAX-IRES-EGFP-DNCR2-ITSN-BFP-GNCR1-iSH2 were assembled with NS3a, reader, and fluorescent protein fragments from the previously mentioned construct, with addition of signaling effector domains from the following sources: human TIAM DH-domain residues 1033-1240 from Maly lab source, human ITSN DH-domain residues 1228-1429 from Maly lab source, LARG DH-domain was a gift from Michael Glotzer (Addgene plasmid #80408), iSH2 residues 420-615 aa from human p85 from Maly lab source. The PiggyBac vector used for these two constructs was linearized by digesting the multiple cloning site of PB501B (Systems Biosciences).
pLenti-UAS-minCMV-mCherry™/CMV-Gal4DBD-NS3a-P2a-DNCR2-VPR was based on a pLenti-UAS-minCMV-mCherry™/CMV-Gal4DBD-ERT2VP16 vector, a gift from Kenneth Matreyek (the Gal4-UAS-minCMV element in that vector came from Addgene plasmid #79130, a gift from Wendell Lim), which was digested with BamHI-HF and SexAI to insert the NS3a-P2a-DNCR2-VPR fragment.
All cloning PCR reactions were performed with Q5 polymerase (New England BioLabs), and all Gibson assembly reactions were performed with NEBuilder HiFi™ Assembly Master Mix (New England BioLabs). Oligonucleotides and gBlocks were synthesized by Integrated DNA Technologies. The complete insert was verified by sequencing for each construct (Genewiz). Select mammalian expression vectors constructed in this study are available on Addgene, and bacterial or yeast expression vectors are available upon request. See Table 14 for all sequences.
Grazoprevir was purchased from MedChem Express (MK-5172, product number HY-15298). Asunaprevir (BMS-650032, product number A3195) and danoprevir (RG7227, product number A4024) were purchased from ApexBio.
Proteins were expressed in BL21 (DE3) E. coli at 37° C. to an O.D.600 of 0.5-1.0, then moved to 18° C. and induced with 0.5 mM IPTG overnight. For biotinylated constructs, 12.5 mg D(+)-biotin/L culture was added upon inoculation with overnight culture. After 16-20 hours of overnight growth, cultures were harvested, and cell pellets frozen at −80° C. Cell pellets were resuspended in 20 mM Tris pH 8.0, 500 mM NaCl, 5 mM imidazole, 1 mM DTT, 0.1% v/v Tween-20. All buffers for NS3a purifications additionally included 10% v/v glycerol. Cells were lysed by sonication, and supernatant was incubated with NiNTA resin (Qiagen) for at least 1 h at 4° C. Resin was washed with 20 mM Tris pH 8.0, 500 mM NaCl, 20 mM imidazole, and proteins were eluted with 20 mM Tris pH 8.0, 500 mM NaCl, 300 mM imidazole. Biotinylated constructs were then further purified by size exclusion chromatography on a Superdex 75 10/300 GL column (GE Healthcare) in 20 mM Tris pH 8.0, 300 mM NaCl, 1 mM DTT, 10% v/v glycerol. Proteins were stored in this buffer at −80° C. For proteins tagged with His10-Smt3, the tag was removed by overnight cleavage at room temperature using His-tagged ULP1 protease (purified in-house) at a ratio of 1 mg ULP1:250 mg protein. Cleavage was performed concurrent with dialysis (3.5 kDa mwco Slide-A-Lyzer™ dialysis cassettes, Thermo Scientific) in 20 mM Tris pH 8.0, 300 mM NaCl, 1 mM DTT, 10% v/v glycerol. Cleaved protein was then put through a second NiNTA purification, with the desired protein collected in the flowthrough and wash (20 mM Tris pH 8.0, 500 mM NaCl, 20 mM imidazole, 10% v/v glycerol). NS3a S139A and DNCR2 for crystallization were further purified via ion exchange chromatography on a HiTrap™ SP column (GE Healthcare) and HiTrap Q column (GE Healthcare), respectively, followed by size exclusion chromatography on a Superdex™ 75 10/300 GL column (GE Healthcare) in 20 mM Tris pH 8.0, 100 mM NaCl, 2 mM DTT.
60 μM NS3a and 100 μM DNCR2 were mixed with 500 μM danoprevir and incubated at 4° C. overnight. The NS3a S139A/DNCR2/danoprevir complex was further purified by size exclusion chromatography on a Superdex 75 10/300 GL column (GE Healthcare) in 20 mM Tris pH 8.0, 50 mM NaCl, 2 mM DTT. The protein complex peak fractions were pooled and subsequently concentrated to 7 mg/mL for crystallization.
Crystals were obtained using the hanging drop method by adding 1 μL of the above NS3a/DNCR2/danoprevir complex to 1 μL of a well solution containing 100 mM Bis-Tris, pH 6.5, 200 mM Li2SO4 and 22% w/v PEG 3350. Crystals formed in 24-36 h at room temperature. Crystals were flash-frozen with liquid nitrogen in a cryoprotectant with 20% v/v glycerol.
Data collection was performed at the ALS beamlines 8.2.1 and 8.2.2. The diffraction data were processed with the HKL2000 package in the space group P21. The structure was determined, at 2.3 Å resolution, using one data set collected at a wavelength of 1.00 Å, which was also used for refinement (Extended Data Table 2). The initial phases were determined by molecular replacement with the program Phaser, using the crystal structure of NS3a (PDB code: 3M5L) as the initial search model. Two NS3a/4a molecules were found in one asymmetric unit, and the experimental electron density map clearly showed the presence of two molecules of DNCR2 with two molecules of danoprevir in one asymmetric unit. The complex model was improved using iterative cycles of manual rebuilding with the program COOT and refinement with Refmac5 of the CCP4 program suite. There were no Ramachandran outliers (98.3% most favored, 1.7% allowed).
5 nmoles of each protein or drug were mixed in 300 μL total volume (16.7 μM final concentration), in a buffer of 20 mM Tris pH 8.0, 300 mM NaCl, 10% glycerol, 1 mM DTT. Complexes were incubated on ice for 1 h before injection of 250 μL into a 500 μL loop and onto a Superdex™-75 10/300 GL column (GE Healthcare) at 4° C. Untagged NS3a S139A (solubility optimized) and untagged DNCR2 were used for SEC.
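The stated final concentration follows from the amount and volume given above. As a small worked check (the helper name `micromolar` is introduced here only for illustration):

```python
def micromolar(amount_nmol, volume_uL):
    """Convert an amount in nmol and a volume in uL to a concentration in uM.

    nmol per uL equals mmol per L (mM), so multiplying by 1000 gives uM.
    """
    if volume_uL <= 0:
        raise ValueError("volume must be positive")
    return amount_nmol / volume_uL * 1000.0


# 5 nmol in 300 uL, as described in the text
sec_conc_uM = micromolar(5, 300)
```

Rounded to one decimal place, `sec_conc_uM` is 16.7, matching the concentration quoted for each component.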
Library design to improve the affinity of the original designs proceeded through three stages: 1) Redesign of the D5 or G3 interface using RosettaDesign™, 2) selection of positions to vary in the library, and 3) optimization of degenerate codon choices to encode the library using a previously described integer linear programming approach.
Redesign of the interfaces was done using the RosettaScript™ cid_roll_design.xml (Supplemental Methods). ~1000 redesigns were generated for D5/G3. Unique sequences from designs that had a ROSETTA™ ddg score below that of the original design (700-800 sequences) were used to assemble a position specific scoring matrix (PSSM).
To select positions to vary in the library, this PSSM was visually examined with reference to the original design and the redesign models. Positions with significant changes in the redesigns that were proximal to the interface were chosen to vary in the library. Additionally, to enable construction of each library from two oligonucleotides, the positions varied were constrained to two helices (helices 5 and 7 for D5, and helices 2 and 4 for G3).
The library design scripts require two inputs: a short list of residues required to be varied in the library, and a longer list of preferred residues and/or a PSSM.37 Required residue lists generally included the original residue from the design, with a further hand-selected set of residues highly preferred in the redesigns. Preferred residue lists included all amino acids occurring in the redesigns. The D5 library was designed by optimizing degenerate codon choice to encode as many preferred residues as possible within a DNA library size constraint of 10⁷. The resulting library encoded 4.1×10⁶ protein variants. The G3 library was designed by optimizing the sum of the PSSM scores from the redesigns within a DNA library size constraint of 10⁷. The resulting G3 library encoded 7.1×10⁶ protein variants.
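The distinction between DNA library size and the number of encoded protein variants can be illustrated with a short size calculation. The sketch below covers only this bookkeeping, not the integer linear programming optimization itself; the function names and the example codons are hypothetical, and excluding stop codons from the protein count is an assumption.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AA[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# IUPAC degenerate nucleotide codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
         "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

def expand(degenerate_codon):
    """List every concrete codon encoded by one degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]

def library_sizes(degenerate_codons):
    """Return (DNA library size, protein library size) for a list of codons.

    The protein count ignores stop codons (an assumption of this sketch).
    """
    dna_size, protein_size = 1, 1
    for codon in degenerate_codons:
        codons = expand(codon)
        residues = {CODON_TABLE[c] for c in codons} - {"*"}
        dna_size *= len(codons)
        protein_size *= len(residues)
    return dna_size, protein_size

# Hypothetical three-position library.
print(library_sizes(["NNK", "VRK", "GCT"]))
```

Because several codons can encode the same residue, the protein library is never larger than the DNA library, which is consistent with the sizes reported above being smaller than the DNA size constraint.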
DNCR1 combinatorial library design used the same library optimization approach as above, but used experimentally determined mutational preferences as the input, rather than design-determined preferences. The enrichment values from the DNCR1 SSM library (see below) were standardized (Z-value) for each positive sort (performed at 50 nM or 500 nM NS3a). The Z-values for the two sorts were then averaged. These average standardized enrichment values were used as a PSSM input to the library design script. Positions to vary were hand-chosen based on their proximity to the designed interface (based on the original D5 model), as well as the presence of multiple enriched mutations in the SSM results. The mutations that were required to be included in the library design were also hand-picked from the most enriched mutations (top 10% of enrichment values), while the inclusion of additional mutations was optimized by maximizing the sum of the enrichment scores. Some large codon choices were removed to enforce a modest number of mutations at each position. Additionally, chemical diversity classes were defined to prioritize inclusion of certain classes of residues. The library DNA size was constrained to be <10⁸ variants, and final size in protein sequences was 2.76×10⁷.
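The standardization and averaging of enrichment values across the two positive sorts can be sketched as below. The mutation labels and values are invented for illustration, the helper names are hypothetical, and the use of the sample standard deviation and the restriction to mutations observed in both sorts are assumptions not stated in the text.

```python
from statistics import mean, stdev

def standardize(enrichment):
    """Convert {mutation: enrichment} to {mutation: Z-value} within one sort."""
    values = list(enrichment.values())
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumption)
    return {mut: (val - mu) / sigma for mut, val in enrichment.items()}

def average_z(sort_a, sort_b):
    """Average Z-values over the mutations present in both sorts."""
    z_a, z_b = standardize(sort_a), standardize(sort_b)
    shared = z_a.keys() & z_b.keys()
    return {mut: (z_a[mut] + z_b[mut]) / 2 for mut in shared}

# Hypothetical enrichment values for the 50 nM and 500 nM NS3a sorts.
sort_50nM = {"A10V": -2.0, "L12K": 0.0, "G3P": 2.0}
sort_500nM = {"A10V": -1.0, "L12K": 1.0, "G3P": 3.0}
avg = average_z(sort_50nM, sort_500nM)
print(sorted(avg.items()))
```

Standardizing each sort separately before averaging puts sorts with different stringency on a common scale, so neither sort dominates the combined score.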
Combinatorial libraries were assembled from two ultramer oligonucleotides (Integrated DNA Technologies), which contained a short, overlapping region corresponding to part of the constant helix between the two varied helices (helix 6 for the D5 libraries, and helix 3 for the G3 library). Linear, double stranded fragments were generated in the first PCR by pairing each varied primer with a constant primer that annealed 5′ or 3′ to the end of the full gene. These fragments were excised and extracted from an agarose gel. A second round of PCR was performed to overlap these fragments, with further amplification by addition of the outside primers in the 10th cycle (out of 35). The correct-sized product was gel extracted and used as the template for 1-2 more rounds of PCR with the outside primers to yield sufficient DNA. The DNCR1 SSM library was assembled using a pair of primers (Integrated DNA Technologies) for each of the 75 protein positions varied, where the forward primer contained the NNK site in a central position, and the reverse primer overlapped with the 5′ end of the forward primer.38 Linear fragments corresponding to each primer pair were overlapped in a second round of PCR to yield the full gene insert. Combinatorial library PCRs were performed with Q5 polymerase (New England BioLabs), and the SSM library PCRs were performed with Phusion™ polymerase (Thermo Fisher Scientific). For all libraries, the linear library DNA was combined with NdeI- and XhoI-digested pETCON™ at a ratio of 4 μg insert:1 μg vector and electroporated into freshly-prepared electrocompetent EBY100 S. cerevisiae.
Yeast were grown overnight at 30° C. in yeast minimal media (-ura for strain selection, -trp for pETCON™ selection) supplemented with 2% w/v glucose. Overnights were used to inoculate SGCAA cultures (2% w/v galactose, 0.67% w/v yeast nitrogen base, 0.5% w/v casamino acids, and 0.1 M sodium phosphate, pH 6.6) to an O.D. 600 of 1.0-2.0 and protein expression was induced overnight at 30° C. Before sorting or analysis, cells were pelleted and resuspended in PBS supplemented with 0.5% w/v bovine serum albumin (PBSA). Protein solutions of biotinylated NS3a with danoprevir or grazoprevir were made in PBSA and incubated with the yeast for 30 min-1 h at 22° C. For analysis and sorting of initial, low-affinity designs, NS3a was pre-tetramerized by incubation with streptavidin-phycoerythrin (SAPE, Invitrogen) at a molar ratio of 1 SAPE:4 NS3a for at least 10 minutes prior to incubation with yeast; these sorts are denoted as “avid” below. Cells were washed in cold PBSA and incubated for 15 min on ice with SAPE and fluorescein isothiocyanate-conjugated chicken anti-c-myc (Immunology Consultants Laboratory), both diluted 1:100 in PBSA. After the labeling incubation, cells were washed again in cold PBSA and analyzed on a C6 flow cytometer (Accuri) or a FACSCanto™ cytometer (BD Biosciences), or sorted on a SH800 (Sony Biotechnology) cell sorter or a FACSAria III (BD Biosciences) cell sorter. All FACS data were analyzed using FlowJo (v.10.1). See
Titration curves for NS3a/drug complexes on yeast-displayed designs used construct NS3a_3 (solubility-optimized, catalytically active). Drug concentrations were at a fixed molar ratio of 10 drug:1 NS3a, with the exception of the DNCR2-danoprevir titration, for which a fixed concentration of 50 nM danoprevir was used for all points to stay above the NS3a/danoprevir Ki. Curves were fit using Graphpad Prism 5 to a one-site specific binding model with Hill coefficient.
For the first D5 library, the following sequential sorts were performed using catalytically active NS3a (NS3a_3): 1 μM NS3a/10 μM danoprevir, 0.5 μM NS3a avid/5 μM danoprevir, 0.5 μM NS3a avid/5 μM danoprevir, 0.25 μM NS3a avid/2.5 μM danoprevir, 2 μM NS3a/20 μM danoprevir, 20 nM NS3a/200 nM danoprevir. The highest 1-3% PE/FITC-positive events were collected for each sort, with the gate set along the binding/expression diagonal. For the DNCR1 combinatorial library, the following sequential sorts were performed using catalytically inactive NS3a (NS3a_2): 100 nM NS3a/1 μM danoprevir, 100 nM NS3a/1 μM danoprevir, 50 nM NS3a/500 nM danoprevir, 5 nM NS3a/50 nM danoprevir, 500 pM NS3a/50 nM danoprevir, 20 pM NS3a/50 nM danoprevir. The top 0.5-9% were collected in each sort. For the G3 library, the following sequential sorts were performed using catalytically inactive NS3a (NS3a_2): 500 nM NS3a avid/5 μM grazoprevir, 50 nM NS3a avid/500 nM grazoprevir, 500 nM NS3a/5 μM grazoprevir, 500 nM NS3a/5 μM grazoprevir, 250 nM NS3a/2.5 μM grazoprevir, 100 nM NS3a/1 μM grazoprevir, 30 nM NS3a/300 nM grazoprevir.
The most-enriched clones were assessed by colony PCR and sequencing (Genewiz) of ˜50 colonies from the final 2-3 pools of each library. Titrations of NS3a/drug were performed on several of the most enriched clones to verify that the most-enriched clones (DNCR1 and GNCR1) exhibited the tightest binding. DNCR2 was selected from multiple very high-affinity clones based on its superior expression on yeast.
For the DNCR1 site saturation mutagenesis (SSM) library, two sorts were performed on the same day at 50 nM NS3a (NS3a_2)/500 nM danoprevir and 500 nM NS3a (NS3a_2)/5 μM danoprevir. For both conditions, a positive-sort gate was set to collect the top 1% of binders, and a negative-sort gate was set to collect the bottom 6% of binders. All gates were set along the binding/expression diagonal. The naïve population for sequencing analysis was saved from the same day of growth.
DNCR1 SSM library sequencing
At least 20 million cells were harvested for each selected library pool and the naïve library, and DNA was extracted and prepared for Illumina sequencing. The first round of PCR, to amplify the 150 bp varied region, was performed for 25-35 cycles using Phusion polymerase. After gel extraction, a second round of PCR was performed to add on barcodes and Illumina adaptors. Sequencing was performed with a 600-cycle reagent kit (Illumina) on a MiSeq™ sequencer (Illumina). Enrich was used to align and filter the paired-end reads.40 An average quality for each read was required to be greater than 20, no N's were allowed, and the maximum number of nucleotide mutations allowed per sequence was 3. The sequence counts output by Enrich were processed by an in-house Python script to calculate the enrichment value (enrichment ratio for each mutant, normalized by the wild-type enrichment ratio): log2[(Fv,sel/Fv,inp)/(Fwt,sel/Fwt,inp)], where Fv is the frequency of the variant in the selected or input (naïve library) pool, and Fwt is the frequency of the wild-type residue. Only single mutants that had at least 15 counts in the naïve library were included in the analysis.
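The enrichment value defined above can be computed from raw counts as in the following sketch. It is not the in-house script referred to in the text: the function name, the count dictionaries, the mutation labels, and the choice to skip variants with zero counts in the selected pool (where the logarithm is undefined) are assumptions.

```python
import math

def enrichment_values(counts_sel, counts_inp, wild_type="WT", min_input_counts=15):
    """Return {variant: log2 enrichment ratio normalized to wild type}.

    counts_sel / counts_inp: {variant: read count} for the selected and
    input (naive library) pools. Variants with fewer than min_input_counts
    reads in the input pool are excluded.
    """
    total_sel = sum(counts_sel.values())
    total_inp = sum(counts_inp.values())

    def ratio(variant):
        f_sel = counts_sel[variant] / total_sel
        f_inp = counts_inp[variant] / total_inp
        return f_sel / f_inp

    wt_ratio = ratio(wild_type)
    scores = {}
    for variant, n_inp in counts_inp.items():
        if variant == wild_type or n_inp < min_input_counts:
            continue
        if counts_sel.get(variant, 0) == 0:
            continue  # log2 undefined; skipping is an assumption of this sketch
        scores[variant] = math.log2(ratio(variant) / wt_ratio)
    return scores

# Hypothetical counts for three single mutants and the wild type.
naive = {"WT": 1000, "A10V": 100, "L12K": 100, "G3P": 10}
selected = {"WT": 500, "A10V": 200, "L12K": 25, "G3P": 40}
scores = enrichment_values(selected, naive)
print(scores)
```

A positive value means the mutant was enriched more than wild type in the sort; a negative value means it was depleted relative to wild type.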
All cells were cultured in high-glucose DMEM, 4 mM L-glutamine, 10% fetal bovine serum (FBS, Life Technologies) at 37° C., 5% CO2. Cells were tested monthly and found free of mycoplasma.
A Leica SP8× system was used for confocal microscopy. A UV laser at 405 nm was used to excite tagBFP. White light lasers of 488 and 587 nm were used for EGFP and mCherry™, respectively. TagBFP emission was recorded on a PMT detector, and EGFP and mCherry™ were detected by separate HyD™ detectors. All images were taken using a 63× objective with oil, at 512×512 resolution.
Colocalization experiments were performed in NIH3T3 cells (Flp-In-3T3, Thermo Fisher Scientific). For fixed-cell experiments, cells were plated at 3×10⁴ cells/mL on sterile glass coverslides placed in 12-well culture plates. Cells were transfected 24 hours after plating with Lipofectamine™ 2000 or 3000 (Thermo Fisher Scientific) at a ratio of 3 μL reagent:1 μg DNA, according to manufacturer's instructions. 3-vector transfections were performed with 0.3 μg NS3a and 0.35 μg each ANR/DNCR2/GNBP vectors, while 2-vector transfections were performed with 0.3 μg free component and 0.7 μg of the immobilized component. One day after transfection, cells were treated with drug or DMSO and fixed. Drug additions were performed by exchanging the media for DMEM+10% v/v FBS plus drug. To fix, cells were washed once with DPBS (Thermo Fisher Scientific), then incubated with 4% v/v paraformaldehyde in DPBS for 15 minutes. After washing twice with DPBS, coverslides with cells were removed from the plate and mounted on glass slides using Fluoromount-G (SouthernBiotech).
For the live cell experiment assaying DNCR2 membrane association time, cells were plated at 3×10⁴ cells/mL in 35 mm glass-bottomed dishes (MatTek) that were coated with poly-D-lysine. Experiments were performed in FluorBrite™ DMEM (Thermo Fisher Scientific) media supplemented with GlutaMax™ (Thermo Fisher Scientific) and 10% v/v FBS. Cells were imaged with dishes open on a heated stage (˜55° C., which resulted in the media at the center of the plate remaining at ˜30° C.). 5 μM drug additions were performed by removing 1 mL media from the dish, mixing with drug, and returning to the dish after 2 minutes of imaging. All cells were imaged within 30 minutes of removal from incubator, and no environmental controls were used beyond heating. The constructs used for live cell membrane localization kinetics were myristoyl-tag-mCherry™-NS3a and DNCR2-EGFP.
Colocalization of NS3a and DNCR1 at the plasma membrane, nucleus, mitochondria and Golgi was performed with two sets of constructs, with either NS3a or DNCR1 as the immobilized component. mCherry™-NS3a was used with Tom20-DNCR1-EGFP, DNCR1-EGFP-Giantin, and 3×NLS-DNCR1-EGFP. DNCR1-EGFP was used with Tom20-mCherry™-NS3a, mCherry-NS3a-Giantin, 3×NLS-mCherry™-NS3a, and myristoyl-tag-mCherry™-NS3a. Drug specificity of DNCR1 was analyzed with mCherry™-NS3a and Tom20-DNCR1-EGFP or DNCR1-EGFP-Giantin, and drug specificity of DNCR2 and NS3a with DNCR2-EGFP and Tom20-mCherry™-NS3a. Colocalization was analyzed after 1 h of 10 μM drug or equal volume DMSO treatment.
Colocalization of NS3a, ANR, and DNCR2 was performed with NS3aH1-mCherry™ in combination with 2 separate vectors encoding 3×NLS-DNCR2-EGFP and ANR-ANR-BFP-CAAX (0.3 μg, 0.35 μg, 0.35 μg, respectively) or one vector encoding Tom20-BFP-ANR-ANR-P2a-DNCR2-EGFP-CAAX (0.3 μg NS3a, 0.75 μg ANR/DNCR2). Colocalization of NS3a, DNCR2 and GNCR1 was performed with NS3aH1-mCherry™, Tom20-DNCR2-EGFP, and GNCR1-BFP-CAAX (2-location; 0.3 μg, 0.35 μg, 0.35 μg, respectively), or with DNCR2-EGFP, GNCR1-BFP, and NS3aH1-mCherry™-CAAX (1-location; 0.25 μg, 0.25 μg, 0.5 μg, respectively). For all 3-color experiments, 15-minute drug treatments with 5 μM danoprevir or grazoprevir or equal volume DMSO were performed prior to fixing.
For the colocalization experiment shown in
All images were analyzed using ImageJ. Pearson's r values reported are Rcolocalization values generated using an automatic thresholding program (Colocalization Threshold plugin).4 For DNCR2 membrane association kinetics analysis, a square ROI was set to include only cytoplasm. EGFP fluorescence was quantified in the ROI over the timecourse. 15 min timecourses (2 min pre-drug addition, 13 min post-drug) were collected for 18 cells from 4 independent plates. The cytoplasmic fluorescence was normalized to the value in the first and last frame for each cell. Because the cells were imaged at different time points (every ˜20-30 seconds), we used an in-house Python script to fit a 1-D interpolation to each timecourse and plotted the average and standard deviation value of the 1-D functions at 20 second intervals. Time points after drug addition were fit to an exponential decay model to calculate a t1/2 using Graphpad Prism 5 (y = (y0 − b)·e^(−kx) + b, where b was constrained to 0, but y0 was left unconstrained to account for minor variability in drug addition and mixing times).
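The interpolation and half-life calculation can be illustrated with the sketch below. It is not the in-house script or the Prism fit: linear interpolation is assumed for the 1-D interpolation, and the decay constant is obtained by a log-linear least-squares fit (valid here because b is constrained to 0), which is a swapped-in simplification of a nonlinear fit. All function names and data are hypothetical.

```python
import math

def interpolate(times, values, grid):
    """Linearly interpolate one timecourse onto a common time grid."""
    out = []
    for t in grid:
        if not times[0] <= t <= times[-1]:
            raise ValueError("grid point outside the sampled time range")
        for i in range(1, len(times)):
            if t <= times[i]:
                frac = (t - times[i - 1]) / (times[i] - times[i - 1])
                out.append(values[i - 1] + frac * (values[i] - values[i - 1]))
                break
    return out

def fit_decay(times, values):
    """Fit y = y0 * exp(-k * x) by least squares on ln(y); returns (y0, k).

    Requires all values > 0. Log-linear fitting is an assumption of this
    sketch and weights points differently from a nonlinear fit.
    """
    logs = [math.log(v) for v in values]
    n = len(times)
    mean_x = sum(times) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(times, logs))
    sxx = sum((x - mean_x) ** 2 for x in times)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic post-drug timecourse (seconds) with a known decay constant.
times = [0.0, 25.0, 50.0, 75.0, 100.0]
values = [math.exp(-0.02 * t) for t in times]

on_grid = interpolate(times, values, [0.0, 20.0, 40.0, 60.0, 80.0, 100.0])
y0, k = fit_decay(times, values)
t_half = math.log(2) / k
print(round(y0, 3), round(k, 4), round(t_half, 1))
```

With b fixed at 0, the half-life follows directly from the fitted rate constant as t1/2 = ln(2)/k.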
Widefield images were collected in an environmental chamber with humidity control, 37° C., and 5% CO2 on a Leica DMi8 automated fluorescence microscope. Cells were plated on glass-bottomed 96-well plates (Cellvis). Plates were treated with 10 μg/mL bovine fibronectin (Sigma Aldrich) for 1 hour and washed once with PBS.
The cell line used was TRex™-HeLa (ThermoFisher Scientific), into which Lifeact-mCherry™ was stably integrated into the doxycycline-regulated Flp-In site by co-transfection of the pCDNA5-FRT/TO-Lifeact-mCherry™ vector with the Flp recombinase plasmid pOG44 (ThermoFisher Scientific) according to manufacturer's protocols. Lifeact-mCherry™ was induced by addition of 1 μg/mL doxycycline to culture media. For expression of signaling effector proteins, 1 day prior to imaging, 5×10⁶ cells were transiently transfected with 10 μg DNA in a 100 μL electroporation tip using a Neon transfection system (ThermoFisher Scientific) according to manufacturer's recommendations for HeLa cells. 5×10³ cells were plated in each well of the 96-well plate used for imaging. Cells recovered in complete DMEM with 10% FBS overnight. The following day, media was aspirated, cells were washed once with PBS, and cells were serum starved for 3-8 hours before imaging in 100 μL FluorBrite™ DMEM (Thermo Fisher Scientific) media supplemented with GlutaMax™ (Thermo Fisher Scientific) (“imaging media”). For Rac/Rho regulation, the construct PB-NS3a-CAAX-IRES-EGFP-DNCR2-TIAM-P2a-BFP-GNCR1-LARG was used, with images collected for the mCherry™ (Lifeact) and EGFP (DNCR2-TIAM) channels. Cells were imaged for 10 minutes prior to drug addition, and drug was added by pipetting 100 μL 2× drug in prewarmed imaging media, after which cells were imaged for a further 60 minutes.
COS-7 cells (ATCC) were plated in 24-well plates at 2×10⁵ cells/mL (0.5 mL volume). One day later, cells were transfected using TurboFectin™ 8.0 (OriGene) according to the manufacturer's instructions with 0.75 μg myristoyl-tag-mCherry™-NS3a and 0.25 μg DNCR2-iSH2 vectors. One day after transfection, cells were washed once with DPBS, and media was replaced with serum-free DMEM. After serum-starving for 22 hours, cells were exposed to a 15-min drug treatment using 12, 3-fold dilutions of danoprevir from 5 μM to 0 μM, in triplicate. After drug treatment, cells were washed once in DPBS, then lysed in 50 μL modified RIPA buffer (50 mM Tris-HCl, pH 7.8, 1% v/v IGEPAL CA-630, 150 mM NaCl, 1 mM EDTA, 1× Pierce Protease Inhibitor Tablet) for 30 minutes on ice. Cell debris was cleared by centrifugation at 17,000 × g for 10 min at 4° C. Lysate was mixed with protein loading dye and denatured at 95° C. for 7 minutes, then run on an SDS-PAGE gel (Criterion, Bio-Rad) and transferred to nitrocellulose. Blocking and primary antibody incubations were done in a 1:1 mix of TBS plus 0.1% v/v Tween-20 (TBST) and blocking buffer (Odyssey). Primary antibodies used were pSER473 AKT (1:2000, Cell Signaling Technologies #4060), and pan-AKT (1:2000, Cell Signaling Technologies #2920). Blots were washed with TBST, then incubated with secondary antibodies diluted 1:10,000 in TBST (goat anti-rabbit-IRDye™ 800 CW (926-32211) and goat anti-mouse-IRDye™ 680LT (926-68020), LI-COR), washed, and imaged on a LI-COR™ Odyssey scanner. pAKT signal was divided by AKT signal for each lane, and the titration curve was fit to a three-parameter dose-response curve (fitting top, bottom, and EC50) in Graphpad™ Prism 5.
dCas9 Transcription Control
CXCR4 and CD95 induction experiments with DNCR2-VPR and NS3aH1-dCas9 were performed in HEK293T cells (293T/17, ATCC) following the protocol and using the same materials as detailed in Gao et al. Antibodies used were: APC anti-human CD184 (CXCR4) [12G5] (BioLegend 306510), PE anti-human CD95 (Fas) [DX2] (BioLegend 305607), PE Mouse IgG1, κ Isotype Ctrl [MOPC-21] (BioLegend 400111), APC Mouse IgG2b, κ Isotype Ctrl [MPC-11] (BioLegend 400322). No binding of isotype controls was observed to HEK293T cells; therefore, no background adjustments were made for isotype binding. Briefly, cells were plated in 12-well plates at 6×10⁴ cells/mL on day 1 and transfected with TurboFectin™ 8.0 (OriGene) according to the manufacturer's instructions on day 2. 1 μg total DNA was transfected per well (0.5 μg pB-DNCR2-VPR/NS3a-dCas9, 0.5 μg equal mix of 3 CD95 or CXCR4 guide RNA vectors (or unrelated guide for “No guide” controls)). 10 μM danoprevir was added on day 3, and cells were harvested on day 5 (VPR), incubated with antibodies for 1 hr, and analyzed on a FACSLSRII™ (BD Biosciences). For gene repression experiments with KRAB, cells were passaged on day 5, incubated with fresh drug, and analyzed on day 7. For all mammalian FACS experiments (unless otherwise noted), 10,000 single cell events were collected for each sample, and the median fluorescence signal of cells with BFP signals greater than that of untransfected cells was reported. All FACS data were analyzed using FlowJo™ (v.10.1). See
Danoprevir/grazoprevir titrations to linearize CXCR4 or CD95 expression were performed with DNCR2-VPR and NS3a-dCas9 following the protocol detailed above for gene induction with VPR, but in 24-well plates with 0.5 μg total DNA. Danoprevir was titrated in 12 concentrations in 2.5-fold dilutions starting from 1000 nM. Grazoprevir dilutions were added to the danoprevir titration, all starting from 10 nM grazoprevir, and decreasing across 12 concentration points in 2-, 1.5-, or 1.25-fold dilutions. Data were fit to four-parameter log dose-response curves (fitting EC50, upper and lower baselines, and Hill coefficient) in Graphpad Prism 5.
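The dilution series and the dose-response model can be sketched as follows. The function names are hypothetical, and the Hill-type equation shown is the commonly used four-parameter form; that it matches the exact model selected in Prism is an assumption.

```python
def dilution_series(start, fold, n_points):
    """Concentrations for an n-point serial dilution starting at `start`."""
    return [start / fold ** i for i in range(n_points)]

def four_param_response(conc, bottom, top, ec50, hill):
    """Four-parameter dose-response: baselines, EC50, and Hill coefficient."""
    if conc <= 0:
        return bottom
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)

# Danoprevir: 12 concentrations, 2.5-fold dilutions starting from 1000 nM.
danoprevir_nM = dilution_series(1000.0, 2.5, 12)

# Grazoprevir: 12 points from 10 nM in 2-, 1.5-, or 1.25-fold dilutions.
grazoprevir_nM = {fold: dilution_series(10.0, fold, 12)
                  for fold in (2.0, 1.5, 1.25)}

# Response at the EC50 sits halfway between the two baselines
# (parameter values here are arbitrary illustrations).
midpoint = four_param_response(5.0, bottom=100.0, top=900.0, ec50=5.0, hill=1.3)
print(danoprevir_nM[:3], midpoint)
```

Pairing a steep danoprevir dilution with a shallower grazoprevir dilution changes the ratio of the two drugs at every point, which is what allows the competitor to reshape the response curve.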
Induction and reversion timecourses of CXCR4 expression that were analyzed by qPCR were performed in a similar manner, with 10 μM danoprevir replaced by 10 μM grazoprevir or equal DMSO after 24 hours of danoprevir treatment. Wells (in triplicate for each condition) were harvested at each time point by aspirating, washing with 1 mL DPBS, adding 300 μL Versene (ThermoFisher Scientific) and incubating for 5 minutes at 37° C., then pelleting at 3.5 krpm for 2 minutes at 4° C., aspirating, and freezing the pellets at −80° C.
GFP expression experiments were performed in a HEK293T cell line with GFP stably integrated in a single tetracycline-inducible landing pad (7×TRE3G operator with rTA) created in a similar manner as a previously published TetBxblBFP-rTA HEK293T cell line (gift from Doug Fowler). Combined CXCR4 and GFP induction was performed in this line transfected with 0.3 μg pCDNA5-FRT/TO-dCas9, 0.3 μg pCDNA5/FRT/TO-NS3aH1-VPR, 0.2 μg CXCR4-2×MS2/MCP-GNCR1-P2a-BFP (equal mix of 3 scRNAs), and 0.2 μg TRE3G-2×PP7/PCP-DNCR2-P2a-BFP. Drug treatment (48 hours) with 10 μM danoprevir or 10 μM grazoprevir or danoprevir/grazoprevir matrix, harvesting, CXCR4 antibody incubation and FACS analysis were performed as described above for immunofluorescence analysis.
The 3-gene experiment was performed in the GFP reporter HEK293T cell line transfected with 0.25 μg pCDNA5-FRT/TO-dCas9, 0.25 μg pCDNA5/FRT/TO NS3aH1-VPR, 0.166 μg TRE3G-2×MS2(wt+f6)/MCP-ANR-ANR-P2a-BFP, 0.166 μg CXCR4-com/com-GNCR1-P2a-BFP (equal mix of 3 scRNAs), and 0.166 μg CD95-2×PP7/PCP-DNCR2-P2a-BFP (equal mix of 3 scRNAs). Cells were plated in 12-well plates at 6×10⁴ cells/mL on day 1 and transfected with TurboFectin™ 8.0 (OriGene) according to the manufacturer's instructions on day 2, and 1 μM or 10 μM drug was added on day 3. Cells were harvested on day 5 as described above for other samples to be analyzed by qPCR.
For RT-qPCR analysis, RNA was extracted with the Aurum™ Total RNA Mini Kit (Bio-Rad). Integrity of the total RNA was confirmed by running on an agarose gel. Reverse transcription was performed on 1 μg total RNA using the iScript™ Reverse Transcription Kit (Bio-Rad), according to manufacturer's instructions. A no-RT control was performed on several samples per experiment to confirm that there was no significant genomic DNA contamination. qPCR was performed on 50 ng cDNA (1 μL of RT reaction) in a 10 μL reaction volume using SsoAdvanced Universal SYBR™ Green Supermix (Bio-Rad). For each biological sample, technical duplicates of the qPCR were performed and averaged. qPCR primers for GAPDH (reference gene), CXCR4, CD95, and GFP are listed in Table 14. CXCR4 and GAPDH primers are from Zalatan et al., and CD95 and GFP primers were designed to amplify a 94 bp product using Primer3 (v. 0.4.0).20,44 A thermocycle of 95° C. for 2 min, (95° C. 10 sec, 58° C. 30 sec)×40 cycles, 65° C.-95° C. at 0.5° C. increments 5 sec/step was performed on a Bio-Rad CFX Connect Real-Time System. For the CXCR4 reversibility experiment, fold-change in CXCR4 expression was calculated relative to a 0 hr timepoint using the 2^(−ΔΔCT) method.45 For analysis of the 3-gene experiment, fold-change was calculated relative to untransfected TRE3G-GFP HEK293Ts.
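The fold-change calculation can be sketched as below, with GAPDH as the reference gene and technical duplicates averaged before the calculation. The function name and the CT values are hypothetical, and the calculation assumes equal, near-100% amplification efficiency for target and reference primers, as the 2^(−ΔΔCT) method requires.

```python
from statistics import mean

def fold_change(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold-change by the 2^(-ddCT) method.

    Each argument is a list of technical-replicate CT values. The *_cal
    arguments come from the calibrator sample (e.g. the 0 hr timepoint or
    untransfected cells).
    """
    delta_sample = mean(ct_target) - mean(ct_reference)
    delta_calibrator = mean(ct_target_cal) - mean(ct_reference_cal)
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)

# Hypothetical duplicates: target gene vs. GAPDH, treated vs. 0 hr.
fc = fold_change(ct_target=[22.1, 21.9], ct_reference=[18.0, 18.0],
                 ct_target_cal=[25.0, 25.0], ct_reference_cal=[18.0, 18.0])
print(fc)
```

In this toy case the target amplifies three cycles earlier than in the calibrator after normalization to the reference gene, which corresponds to an eight-fold increase.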
The switchable gene expression/repression experiment on CXCR4 and CD95 was performed in TReX™-HEK293 cells (ThermoFisher Scientific), into which Sp dCas9 was stably integrated using vector pCDNA5/FRT/TO-nFLAG-dCas9 and the Flp recombinase vector pOG44, according to manufacturer's protocols. This experiment followed our general dCas9 transcription experiment workflow described above. Briefly, cells were plated on day 1, transfected and induced with doxycycline on day 2, had 100 nM danoprevir or grazoprevir or equal volume DMSO added on day 3, and harvested for FACS analysis on day 5. All readers were transfected via one plasmid, pCDNA5/FRT/TO-MCP-NS3a-P2a-DNCR2-KRAB-MeCP2-P2a-GNCR1-VPR-IRES-BFP. A mix of 3 guides each for CXCR4 or CD95 were transfected, or a gal4-4 control guide, all in a pU6-guide-2×MS2(wt+f6)/CMV-BFP vector. 0.5 μg reader and 0.5 μg guide plasmids were co-transfected in each well. Cells were incubated with antibodies and analyzed as described above, with 20,000 single-cell events collected per sample, and the median fluorescence plotted for cells with the top ˜30% BFP expression signal.
HEK293T/17 cells (ATCC) were plated at 7×10⁴ cells/mL in 0.5 mL in 24-well plates. One day later, they were transfected with 0.35 μg pLenti-UAS-mCherry™/CMV-Gal4DBD-NS3a-P2a-DNCR2-VPR and 0.15 μg of a BFP-expressing vector to use for gating on transfection-positive cells. The next day, a 12-point dilution series of danoprevir was added with 2.5-fold dilutions starting at 100 nM danoprevir. Two days later, cells were removed from the plate with Versene (Gibco), and analyzed for mCherry™ and BFP fluorescence on a FACSLSRII (BD Biosciences). 20,000 single-cell events were collected, and median mCherry™ fluorescence was reported for the cells with the top ˜50% of BFP signal for each sample.
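The gating described above, in which the reporter median is taken only over the cells with the highest transfection-marker signal, can be sketched as follows. The analysis in the text was done in FlowJo; this is only an illustration of the same idea on exported per-cell values, and the function name and event tuples are hypothetical.

```python
from statistics import median

def median_reporter_top_bfp(events, top_fraction=0.5):
    """Median reporter signal among the cells with the highest BFP signal.

    events: list of (bfp_signal, reporter_signal) tuples, one per cell.
    top_fraction: fraction of cells, ranked by BFP, to keep (e.g. 0.5).
    """
    if not events:
        raise ValueError("no events supplied")
    ranked = sorted(events, key=lambda event: event[0], reverse=True)
    n_keep = max(1, int(len(ranked) * top_fraction))
    return median(reporter for _, reporter in ranked[:n_keep])

# Hypothetical single-cell events: (BFP, mCherry).
events = [(10.0, 1.0), (20.0, 2.0), (30.0, 3.0), (40.0, 4.0)]
print(median_reporter_top_bfp(events, top_fraction=0.5))
```

Restricting the median to BFP-high cells limits the readout to cells that actually received plasmid, so untransfected cells do not dilute the reporter signal.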
All P-values are from unpaired, two-sided t-tests, computed using Graphpad™ Prism 5.
The danoprevir/NS3a complex reader design process started with docking, using PatchDock™, a set of highly stable, de novo designed proteins on a danoprevir/NS3a structure: leucine-rich repeat proteins, designed helical repeat proteins (DHRs), ferredoxins, and helical bundles.1-3 One design, D5, based on a DHR, showed danoprevir-dependent binding to NS3a when assayed via yeast surface display.
To improve D5's affinity for the NS3a/danoprevir complex, we used two sequential yeast surface display libraries (
We performed a detailed biochemical analysis of the DNCR2/danoprevir/NS3a complex to confirm that it had the expected properties of a chemically-induced heterodimer. DNCR2 does not appear to bind substantially to danoprevir alone based on the inability of a high concentration (100 μM) of the free drug to disrupt the DNCR2/danoprevir/NS3a complex on yeast (
For our second drug/NS3a complex reader, we targeted the NS3a/grazoprevir complex. Grazoprevir is an FDA-approved drug with picomolar affinity to NS3a (Ki of 140 pM).6 For this round of design, we exclusively used DHR scaffolds, as our first-generation design had indicated that they were more suitable scaffolds for our design goal. We assembled a DHR scaffold set of many curvatures and sizes from published DHR crystal structures, as well as an in-house set of models (available upon request). We used both PatchDock™ and a new rotamer interaction field docking protocol (RIFDock™) to center the DHR scaffolds over grazoprevir, followed by the same design approach that was used for the danoprevir CID design. We ordered and tested 29 designs by yeast surface display. Five designs based on DHR models showed very weak, but grazoprevir-dependent binding (data not shown). One design, G3, based on the crystal structure of DHR18, showed modest binding, similar to the first-generation danoprevir reader design, D5 (
We computationally characterized the mutational preferences of the G3 interface via a similar Rosetta™-based approach used to predict the mutational preferences of D5. The predicted mutational preferences at the G3 interface are shown in
As an assay for colocalization of NS3a and DBP, we used confocal fluorescence microscopy of NIH3T3 cells transiently transfected with pairs of NS3a-mCherry™ and DNCR1-EGFP constructs. NS3a was localized to different subcellular compartments via N-terminal Tom20 (mitochondria), nuclear localization signal (NLS, nucleus), or myristoylation tags (plasma membrane), or a C-terminal Giantin tag (Golgi). DNCR1-EGFP was diffuse throughout the cell under DMSO treatment (
Subcellular Localization Control with PROCISiR
In addition to the GNCR1/DNCR2 and DNCR2/ANR combinations used for subcellular location control of NS3a demonstrated in
To predict drug concentration regimes that would yield intermediate levels of NS3a:DNCR2 and NS3a:GNCR1 complexes, we modeled the fraction of NS3a bound to different drugs. For this, we simply used NS3a:drug Ki values and the Cheng-Prusoff approximations for equilibrium drug:receptor binding in the presence of a competitive inhibitor:8
where fNd is the fraction of NS3a bound to the target drug, and fNc is the fraction of NS3a bound to the competitor drug, D is the free concentration of target drug, C is the free concentration of competitor drug, Ki,d is the NS3a Ki for the target drug, and Ki,c is the NS3a Ki for the competitor drug. The following NS3a:drug Ki values used are from published enzyme inhibition studies: danoprevir:NS3a, 1.0 nM, asunaprevir:NS3a, 1.0 nM, grazoprevir:NS3a, 0.14 nM.6,9 There are several assumptions made in applying these equations that are unlikely to be valid in all cellular conditions. These include that the total drug concentration is equal to the free drug concentration and the direct inverse relationship between fNd and fNc, which is unlikely to be true when NS3a concentrations are high. Additionally, in applying these equations to model the fractions of NS3a:drug:reader complexes, we make the further approximation that all NS3a:drug complexes will be fully bound by their corresponding reader.
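The equations themselves are not reproduced in this text. As an illustration only, the sketch below uses the standard expression for equilibrium binding in the presence of a competitive inhibitor, fNd = D / (D + Ki,d·(1 + C/Ki,c)), with the symmetric expression for fNc. That this is the exact form used is an assumption; the function name is hypothetical, and the Ki values are those quoted above.

```python
def fractions_bound(d, c, ki_d, ki_c):
    """Fractions of NS3a bound to target drug and to competitor drug.

    d, c: free concentrations of target and competitor drug (same units
    as the Ki values). Uses the standard competitive-binding form, which
    is assumed here rather than taken from the source equations.
    """
    f_nd = d / (d + ki_d * (1.0 + c / ki_c)) if d > 0 else 0.0
    f_nc = c / (c + ki_c * (1.0 + d / ki_d)) if c > 0 else 0.0
    return f_nd, f_nc

KI_DANOPREVIR_NM = 1.0    # from the text
KI_GRAZOPREVIR_NM = 0.14  # from the text

# Danoprevir as target drug, grazoprevir as competitor, both in nM.
for grazoprevir_nM in (0.0, 1.0, 10.0):
    f_dano, f_grazo = fractions_bound(100.0, grazoprevir_nM,
                                      KI_DANOPREVIR_NM, KI_GRAZOPREVIR_NM)
    print(grazoprevir_nM, round(f_dano, 3), round(f_grazo, 3))
```

Because grazoprevir binds roughly seven-fold more tightly than danoprevir, a comparatively small amount of grazoprevir displaces a substantial fraction of danoprevir, which is the basis for choosing intermediate concentration regimes.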
Nevertheless, in comparing the predicted fraction NS3a bound to danoprevir or grazoprevir with transcriptional outputs coming from NS3a:danoprevir:DNCR2 or NS3a:grazoprevir:GNCR1, we see very good correspondence between the model and experimental results in
In
To enable temporal switching or graded control of gene expression from repression to overexpression, we utilized a scaffold RNA/RNA-binding protein (RBP) system with NS3a fused to the RBP MS2, GNCR1 fused to VPR, and DNCR2 fused to KRAB-MeCP2, a repressor with enhanced activity over KRAB.11 While more modest than the effect seen from the direct fusion system, this switchable system also demonstrated statistically significant overexpression (from grazoprevir treatment) or repression (from danoprevir treatment) of CXCR4 and CD95 (
Finally, in a demonstration of the multi-state transcriptional outputs that can be achieved with PROCISiR, we combined GNCR1, DNCR2, and ANR with three orthogonal scRNA/RBP pairs (com/com, PP7/PCP, and MS2/MCP) to control the expression of CXCR4, CD95, and GFP, respectively (
EGEVQIVSTATQTFLATSINGVLWIVYHGAGIRTIASPKGPVTQMYTNVDKD
LVGWQAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPIS
YLKGSAGGPLLCPAGHAVGIFRAAVSTRGVAKAVDFIPVESLETTMRSP
QEAAEEVKRDPSSSDVNEALKLIVEAIEAAVDALEAAERTGDPEVRELAREL
VRLAVEAAEEVQRNPSSSDVNEALHSIVYAIEAAIFALEAAERTGDPEVREL
ARELVRLAVEAAEEVQRNPSSRNVEHALMRIVLAIYLAEENLREAEESGDPE
KREKARERVREAVERAEEVQRDPSGWLNH (SEQ ID NO: 79)
QEAAEEVKRDPSSSDVNEALKLIVEAIEAAVDALEAAERTGDPEVRELAREL
VRLAVEAAEEVQRNPSSSDVNEALLSIVIAIEAAVHALEAAERTGDPEVREL
ARELVRLAVEAAEEVQRNPSSREVEHALMKIVLAIYEAEESLREAEESGDPE
KREKARERVREAVERAEEVQRDPSGWLNH (SEQ ID NO: 80)
GDTAYAQQTRGEEGCQETSQTGRDKNQVEGEVQIVSTATQTFLATSINGVLW
TVYHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLY
LVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSAGGPLLCPAGHAVGIFRAA
VSTRGVAKAVDFIPVESLETTMRSP (SEQ ID NO: 81)
GDTAYAQQTRGEEGCQETSQTGRDKNQVEGEVQIVSTATQTFLATSINGVLW
TVYHGAGIRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLY
LVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAA
VSTRGVAKAVDFIPVESLETTMRSP (SEQ ID NO: 82)
KAEEEAKEAQEKADELRQRHPDSQAAEDAEDLANEAEAAVLAACSLAQEHPN
ADIAKLCIKAASEAAEAASKAAELAQRHPDSQAARDAIKLASQAARAVILAI
MLAAENPNADIAKLCIKAASEAAEAASKAAELAQRHPDSQAARDAIKLASQA
AEAVERAIWLAAENPNADIAKKCIKAASEAAEEASKAAEEAQRHPDSQKARD
EIKEASQKAEEVKERCKSLEGGGSEQKLISEEDL (SEQ ID NO: 83)
ADIAKLCIKAASEAAEAASKAAELAQRHPDSQAARDAIKLASQAARAVILAI
MLAAENPNADIAKLCIKAASEAAEAASKAAELAQRHPDSQAARDAIKLASQA
AEAVERAIWLAAENPNADIAKKCIKAASEAAEEASKAAEEAQRHPDSQKARD
EIKEASQKAEEVKERCKSLEGGGSEQKLISEEDL (SEQ ID NO: 84)
RELIERAKEAAERAQEAAERTGDPRVRELARELKRLAQEAAEEVKRDPSSSD
VNEALKLIVEAIEAAVDALEAAERTGDPEVRELARELVRLAVEAAEEVQRNP
SSSDVNEALLTIVIAIEAAVNALEAAERTGDPEVRELARELVRLAVEAAEEV
QRNPSSREVNIALWKIVLAIQEAVESLREAEESGDPEKREKARERVREAVER
AEEVQRDPSGWLNHLEGGGSEQKLISEEDL (SEQ ID NO: 85)
RELIERAKEAAERAQEAAERTGDPRVRELARELKRLAQEAAEEVKRDPSSSD
VNEALKLIVEAIEAAVDALEAAERTGDPEVRELARELVRLAVEAAEEVQRNP
SSSDVNEALLSIVIAIEAAVHALEAAERTGDPEVRELARELVRLAVEAAEEV
QRNPSSREVEHALMKIVLAIYEAEESLREAEESGDPEKREKARERVREAVER
AEEVQRDPSGWLNHLEGGGSEQKLISEEDL (SEQ ID NO: 86)
RELIERAKEAAERAQEAAERTGDPRVRELARELKRLAQEAAEEVKRDPSSSD
VNEALKLIVEAIEAAVDALEAAERTGDPEVRELARELVRLAVEAAEEVQRNP
SSSDVNEALHSIVYAIEAAIFALEAAERTGDPEVRELARELVRLAVEAAEEV
QRNPSSRNVEHALMRIVLAIYLAEENLREAEESGDPEKREKARERVREAVER
AEEVQRDPSGWLNHLEGGGSEQKLISEEDL (SEQ ID NO: 87)
MSSDEEEARELIERAKEAAERAQEAAERTGDPRVRELARELKRLAQEAAEEV
KRDPSSSDVNEALKLIVEAIEAAVDALEAAERTGDPEVRELARELVRLAVEA
AEEVQRNPSSSDVNEALHSIVYAIEAAIFALEAAERTGDPEVRELARELVRL
AVEAAEEVQRNPSSRNVEHALMRIVLAIYLAEENLREAEESGDPEKREKARE
RVREAVERAEEVQRDPSGWLNHEQKLISEEDLGSGIGSGTMVSKGEELFTGV
ERAQEAAERTGDPRVRELARELKRLAQEAAEEVKRDPSSSDVNEALKLIVEA
IEAAVDALEAAERTGDPEVRELARELVRLAVEAAEEVQRNPSSSDVNEALLS
IVIAIEAAVHALEAAERTGDPEVRELARELVRLAVEAAEEVQRNPSSREVEH
ALMKIVLAIYEAEESLREAEESGDPEKREKARERVREAVERAEEVQRDPSGW
LNHEQKLISEEDLGSGTGSGTMVSKGEELFTGVVPILVELDGDVNGHKFSVS
MSSDEEEARELIERAKEAAERAQEAAERTGDPRVRELARELKRLAQEAAEEV
KRDPSSSDVNEALKLIVEAIEAAVDALEAAERTGDPEVRELARELVRLAVEA
AEEVQRNPSSSDVNEALLSIVIAIEAAVHALEAAERTGDPEVRELARELVRL
AVEAAEEVQRNPSSREVEHALMKIVLAIYEAEESLREAEESGDPEKREKARE
RVREAVERAEEVQRDPSGWLNHEQKLISEEDLGSGTGSGTMVSKGEELFTGV
TGDPRVRELARELKRLAQEAAEEVKRDPSSSDVNEALKLIVEATEAAVDALE
AAERTGDPEVRELARELVRLAVEAAEEVQRNPSSSDVNEALLSIVIAIEAAV
HALEAAERTGDPEVRELARELVRLAVEAAEEVQRNPSSREVEHALMKIVLAI
YEAEESLREAEESGDPEKREKARERVREAVERAEEVQRDPSGWLNHEQKLIS
MSSDEEEARELIERAKEAAERAQEAAERTGDPRVRELARELKRLAQEAAEEV
KRDPSSSDVNEALKLIVEAIEAAVDALEAAERTGDPEVRELARELVRLAVEA
AEEVQRNPSSSDVNEALLSIVIAIEAAVHALEAAERTGDPEVRELARELVRL
AVEAAEEVQRNPSSREVEHALMKIVLAIYEAEESLREAEESGDPEKREKARE
RVREAVERAEEVQRDPSGWLNHEQKLISEEDLGSGTGSGTMVSKGEELFTGV
INLSGDTAYAQQTRGEEGCQETSQTGRDKNQVEGEVQIVSTATQTFLATSIN
GVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGS
SDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSAGGPLLCPAGHAVGI
FRAAVSTRGVAKAVDFIPVESLETTMRSP (SEQ ID NO: 93)
EEGCQETSQTGRDKNQVEGEVQIVSTATQTFLATSINGVLWTVYHGAGTRTI
ASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLYLVTRHADVIPV
RRRGDSRGSLLSPRPISYLKGSAGGPLLCPAGHAVGIFRAAVSTRGVAKAVD
FIPVESLETTMRSP (SEQ ID NO: 94)
INLSGDTAYAQQTRGEEGCQETSQTGRDKNQVEGEVQIVSTATQTFLATSIN
GVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTCGS
SDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSAGGPLLCPAGHAVGI
FRAAVSTRGVAKAVDFIPVESLETTMRSPGSGTGSGSGEPQQSFSEAQQQLC
TGRDKNQVEGEVQIVSTATQTFLATSINGVLWTVYHGAGTRTIASPKGPVTQ
MYTNVDKDLVGWQAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGS
LLSPRPISYLKGSAGGPLLCPAGHAVGIFRAAVSTRGVAKAVDFIPVESLET
TMRSP (SEQ ID NO: 96)
GEVQIVSTATQTFLATSINGVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDL
VGWQAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISY
LKGSAGGPLLCPAGHAVGIFRAAVSTRGVAKAVDFIPVESLETTMRSP
MKKKGSVVIVGRINLSGDTAYSQQTRGLEGCQETSQTGRDKNQVEGEVQVVS
TATQSFLATSINGVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQ
GSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGG
PLLCPAGHAVGIFRAAVSTRGVAKAVDFIPVESLETTMRSPGSGTGSGMVSK
AERTGDPRVRELARELKRLAQEAAEEVKRDPSSSDVNEALKLIVEAIEAAVD
ALEAAERTGDPEVRELARELVRLAVEAAEEVQRNPSSSDVNEALHSIVYAIE
AAIFALEAAERTGDPEVRELARELVRLAVEAAEEVQRNPSSRNVEHALMRIV
LAIYLAEENLREAEESGDPEKREKARERVREAVERAEEVQRDPSGWLNHEQK
LVYLLDGPGYDPIHSDGSGTGSGTGSGTGTTSGTGTGGSTGEQKLISEEDLG
VYLLDGPGYDPIHSDGSGTGSGTGSGTGTTSGTGTGGSTGGELDELVYLLDG
PGYDPIHSDGSGATNFSLLKQAGDVEENPGPMSSDEEEARELIERAKEAAER
AQEAAERTGDPRVRELARELKRLAQEAAEEVKRDPSSSDVNEALKLIVEAIE
AAVDALEAAERTGDPEVRELARELVRLAVEAAEEVQRNPSSSDVNEALHSIV
YAIEAAIFALEAAERTGDPEVRELARELVRLAVEAAEEVQRNPSSRNVEHAL
MRIVLAIYLAEENLREAEESGDPEKREKARERVREAVERAEEVQRDPSGWLN
HEQKLISEEDLGSGTGSGTMVSKGEELFTGVVPILVELDGDVNGHKFSVSGE
ERAQEAAERTGDPRVRELARELKRLAQEAAEEVKRDPSSSDVNEALKLIVEA
IEAAVDALEAAERTGDPEVRELARELVRLAVEAAEEVQRNPSSSDVNEALHS
IVYAIEAAIFALEAAERTGDPEVRELARELVRLAVEAAEEVQRNPSSRNVEH
ALMRIVLAIYLAEENLREAFESGDPEKREKARERVREAVERAEEVQRDPSGW
LNHEQKLISEEDLGSGTGSGTMVSKGEELFTGVVPILVELDGDVNGHKFSVS
MDIEKLCKKAEEEAKEAQEKADELRQRHPDSQAAEDAEDLANLAVAAVLTAC
LLAQEHPNADIAKLCIKAASEAAEAASKAAELAQRHPDSQAARDAIKLASQA
ARAVILAIMLAAENPNADIAKLCIKAASEAAEAASKAAELAQRHPDSQAARD
AIKLASQAAEAVERAIWLAAENPNADIAKKCIKAASEAAEEASKAAEEAQRH
PDSQKARDEIKEASQKAEEVKERCKSEQKLISEEDLGSGSSELIKENMHMKL
MKKKGSVVIVGRINLSGDTAYSQQTRGLEGCQETSQTGRDKNQVEGEVQVVS
TATQSFLATSINGVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQ
GSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGG
PLLCPAGHAVGIFRAAVSTRGVAKAVDFIPVESLETTMRSPGSGTGSGMVSK
MDIEKLCKKAEEEAKEAQEKADELRQRHPDSQAAEDAEDLANLAVAAVLTAC
LLAQEHPNADIAKLCIKAASEAAEAASKAAELAQRHPDSQAARDAIKLASQA
ARAVILAIMLAAENPNADIAKLCIKAASEAAEAASKAAELAQRHPDSQAARD
AIKLASQAAEAVERAIWLAAENPNADIAKKCIKAASEAAEEASKAAEEAQRH
PDSQKARDEIKEASQKAEEVKERCKSEQKLISEEDLGSGSSELIKENMHMKL
MSSDEEEARELIERAKEAAERAQEAAERTGDPRVRELARELKRLAQEAAEEV
KRDPSSSDVNEALKLIVEAIEAAVDALEAAERTGDPEVRELARELVRLAVEA
AEEVQRNPSSSDVNEALHSIVYAIEAAIFALEAAERTGDPEVRELARELVRL
AVEAAEEVQRNPSSRNVEHALMRIVLAIYLAEENLREAEESGDPEKREKARE
RVREAVERAEEVQRDPSGWLNHEQKLISEEDLGSGTGSGTRLLYPVSKYQQD
MSSDEEEARELIERAKEAAERAQEAAERTGDPRVRELARELKRLAQEAAEEV
KRDPSSSDVNEALKLIVEAIEAAVDALEAAERTGDPEVRELARELVRLAVEA
AEEVQRNPSSSDVNEALHSIVYAIEAAIFALEAAERTGDPEVRELARELVRL
AVEAAEEVQRNPSSRNVEHALMRIVLAIYLAEENLREAEESGDPEKREKARE
RVREAVERAEEVQRDPSGWLNHEQKLISEEDLEFSSAAGTSDALDDFDLDML
MSSDEEEARELIERAKEAAERAQEAAERTGDPRVRELARELKRLAQEAAEEV
KRDPSSSDVNEALKLIVEAIEAAVDALEAAERTGDPEVRELARELVRLAVEA
AEEVQRNPSSSDVNEALHSIVYAIEAAIFALEAAERTGDPEVRELARELVRL
AVEAAEEVQRNPSSRNVEHALMRIVLAIYLAEENLREAEESGDPEKREKARE
RVREAVERAEEVQRDPSGWLNHEQKLISEEDLEFSSAAGTSGGGGGMDAKSL
MKKKGSVVIVGRINLSGDTAYSQQTRGLEGCQETSQTGRDKNQVEGEVQVVS
TATQSFLATSINGVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQ
GSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGG
PLLCPAGHAVGIFRAAVSTRGVAKAVDFIPVESLETTMRSPHMSSAAGATMS
GCGGGTGGTCGGTAGTGAGTCGTTTAAGAGCTATGCTGGAAACAGCATAGCA
GCAGACGCGAGGAAGGAGGGCGCGTTTAAGAGCTATGCTGGAAACAGCATAG
GCCTCTGGGAGGTCCTGTCCGGCTCGTTTAAGAGCTATGCTGGAAACAGCAT
RQRHPDSQAAEDAEDLANLAVAAVLTACLLAQEHPNADIAKLCIKAASEAAE
AASKAAELAQRHPDSQAARDAIKLASQAARAVILAIMLAAENPNADIAKLCI
KAASEAAEAASKAAELAQRHPDSQAARDAIKLASQAAEAVERAIWLAAENPN
ADIAKKCIKAASEAAEEASKAAEEAQRHPDSQKARDEIKEASQKAEEVKERC
KSEQKLISEEDLGSGATNFSLLKQAGDVEENPGPSELIKENMHMKLYMEGTV
GCGGGTGGTCGGTAGTGAGTCGTTTAAGAGCTATGCTGGAAACAGCATAGCA
GCAGACGCGAGGAAGGAGGGCGCGTTTAAGAGCTATGCTGGAAACAGCATAG
GCCTCTGGGAGGTCCTGTCCGGCTCGTTTAAGAGCTATGCTGGAAACAGCAT
RHPDSQAAEDAEDLANLAVAAVLTACLLAQEHPNADIAKLCIKAASEAAEAA
SKAAELAQRHPDSQAARDAIKLASQAARAVILAIMLAAENPNADIAKLCIKA
ASEAAEAASKAAELAQRHPDSQAARDAIKLASQAAEAVERAIWLAAENPNAD
IAKKCIKAASEAAEEASKAAEEAQRHPDSQKARDEIKEASQKAEEVKERCKS
GTACAGCAGAAGCCTTTAGAAGTTTAAGAGCTATGCTGGAAACAGCATAGCA
GTGGCATGCTCACTTCAGGTGGTTTAAGAGCTATGCTGGAAACAGCATAGCA
GAAGCCTCGCTGGGGAACGCCGTTTAAGAGCTATGCTGGAAACAGCATAGCA
GTACGTTCTCTATCACTGATAGTTTAAGAGCTATGCTGGAAACAGCATAGCA
RAQEAAERTGDPRVRELARELKRLAQEAAEEVKRDPSSSDVNEALKLIVEAI
EAAVDALEAAERTGDPEVRELARELVRLAVEAAEEVQRNPSSSDVNEALHSI
VYAIEAAIFALEAAERTGDPEVRELARELVRLAVEAAEEVQRNPSSRNVEHA
LMRIVLAIYLAEENLREAEESGDPEKREKARERVREAVERAEEVQRDPSGWL
NHEQKLISEEDLGSGATNFSLLKQAGDVEENPGPSELIKENMHMKLYMEGTV
GTACGTTCTCTATCACTGATAGTTTAAGAGCTATGCTGGAAACAGCATAGCA
ELDELVYLLDGPGYDPIHSDGSGTGSGTGSGTGTTSGTGTGGSTGGELDELV
YLLDGPGYDPIHSDGSGATNFSLLKQAGDVEENPGPSELIKENMHMKLYMEG
MKKKGSVVIVGRINLSGDTAYSQQTRGLEGCQETSQTGRDKNQVEGEVQVVS
TATQSFLATSINGVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQ
GSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGG
PLLCPAGHAVGIFRAAVSTRGVAKAVDFIPVESLETTMRSPGSGTGSGEQKL
GCGGGTGGTCGGTAGTGAGTCGTTTAAGAGCTATGCTGGAAACAGCATAGCA
GCAGACGCGAGGAAGGAGGGCGCGTTTAAGAGCTATGCTGGAAACAGCATAG
GCCTCTGGGAGGTCCTGTCCGGCTCGTTTAAGAGCTATGCTGGAAACAGCAT
GTACAGCAGAAGCCTTTAGAAGTTTAAGAGCTATGCTGGAAACAGCATAGCA
GTGGCATGCTCACTTCAGGTGGTTTAAGAGCTATGCTGGAAACAGCATAGCA
GAAGCCTCGCTGGGGAACGCCGTTTAAGAGCTATGCTGGAAACAGCATAGCA
GRINLSGDTAYAQQTRGEEGCQETSQTGRDKNQVEGEVQIVSTATQTFLATS
INGVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPCTC
GSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSAGGPLLCPAGHAV
GIFRAAVSTRGVAKAVDFIPVESLETTMRSPGSGATNFSLLKQAGDVEENPG
VKRDPSSSDVNEALKLIVEAIEAAVDALEAAERTGDPEVRELARELVRLAVE
AAEEVQRNPSSSDVNEALHSIVYAIEAAIFALEAAERTGDPEVRELARELVR
LAVEAAEEVQRNPSSRNVEHALMRIVLAIYLAEENLREAEESGDPEKREKAR
ERVREAVERAEEVQRDPSGWLNHEQKLISEEDLSGGGSGGSGSMDAKSLTAW
ELRQRHPDSQAAEDAEDLANLAVAAVLTACLLAQEHPNADIAKLCIKAASEA
AEAASKAAELAQRHPDSQAARDAIKLASQAARAVILAIMLAAENPNADIAKL
CIKAASEAAEAASKAAELAQRHPDSQAARDAIKLASQAAEAVERAIWLAAEN
PNADIAKKCIKAASEAAEEASKAAEEAQRHPDSQKARDEIKEASQKAEEVEE
RCKSEQKLISEEDLEFSSAAGTSDALDDFDLDMLGSDALDDFDLDMLGSDAL
IVGRINLSGDTAYAQQTRGEEGCQETSQTGRDKNQVEGEVQIVSTATQTFLA
TSINGVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDLVGWQAPQGSRSLTPC
TCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSAGGPLLCPAGH
AVGIFRAAVSTRGVAKAVDFIPVESLETTMRSPSAGGSAGGEKMSKDGKKKK
EEARELIERAKEAAERAQEAAERTGDPRVRELARELKRLAQEAAEEVKRDPS
SSDVNEALKLIVEAIEAAVDALEAAERTGDPEVRELARELVRLAVEAAEEVQ
RNPSSSDVNEALHSIVYAIEAAIFALEAAERTGDPEVRELARELVRLAVEAA
EEVQRNPSSRNVEHALMRIVLAIYLAEENLREAEESGDPEKREKARERVREA
VERAEEVQRDPSGWLNHSAGGSAGGSAGGSAGGSGASGSGATNFSLLKQAGD
EDAEDLANLAVAAVLTACLLAQEHPNADIAKLCIKAASEAAEAASKAAELAQ
RHPDSQAARDAIKLASQAARAVILAIMLAAENPNADIAKLCIKAASEAAEAA
SKAAELAQRHPDSQAARDAIKLASQAAEAVERAIWLAAENPNADIAKKCIKA
ASEAAEEASKAAEEAQRHPDSQKARDEIKEASQKAEEVKERCKSSAGGSAGG
KNQVEGEVQIVSTATQTFLATSINGVLWTVYHGAGTRTIASPKGPVTQMYTN
VDKDLVGWQAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSP
RPISYLKGSAGGPLLCPAGHAVGIFRAAVSTRGVAKAVDFIPVESLETTMRS
PSAGGSAGGEKMSKDGKKKKKKSKTKCVIM - (IRES) -
EEARELIERAKEAAERAQEAAERTGDPRVRELARELKRLAQEAAEEVKRDPS
SSDVNEALKLIVEATEAAVDALEAAERTGDPEVRELARELVRLAVEAAEEVQ
RNPSSSDVNEALHSIVYAIEAAIFALEAAERTGDPEVRELARELVRLAVEAA
EEVQRNPSSRNVEHALMRIVLAIYLAEENLREAEESGDPEKREKARERVREA
VERAEEVQRDPSGWLNHSAGGSAGGSAGGSAGGSGASRQLSDADKLRKVICE
LLETERTYVKDLNCLMERYLKPLQKETFLTQDELDVLFGNLTEMVEFQVEFL
KTLEDGVRLVPDLEKLEKVDQFKKVLFSLGGSFLYYADRFKLYSAFCASHTK
VPKVLVKAKTDTAFKAFLDAQNPKQQHSSTLESYLIKPIQRILKYPLLLREL
FALTDAESEEHYHLDVAIKTMNKVASHINEMQKIHEEGSGATNFSLLKQAGD
EDAEDLANLAVAAVLTACLLAQEHPNADIAKLCIKAASEAAEAASKAAELAQ
RHPDSQAARDAIKLASQAARAVILAIMLAAENPNADIAKLCIKAASEAAEAA
SKAAELAQRHPDSQAARDAIKLASQAAEAVERAIWLAAENPNADIAKKCIKA
ASEAAEEASKAAEEAQRHPDSQKARDEIKEASQKAEEVKERCKSSAGGSAGG
LKVLDQVFYQRVSREGILSPSELRKIFSNLEDILQLHIGLNEQMKAVRKRNE
TSVIDQIGEDLLTWFSGPGEEKLKHAAATFCSNQPFALEMIKSRQKKDSRFQ
TFVQDAESNPLCRRLQLKDIIPTQMQRLTKYPLLLDNIAKYTEWPTEREKVK
KAADHCRQILNYVNQAVKEAENKQR (SEQ ID NO: 143)
GEVQIVSTATQTFLATSINGVLWTVYHGAGTRTIASPKGPVTQMYTNVDKDL
VGWQAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISY
LKGSAGGPLLCPAGHAVGIFRAAVSTRGVAKAVDFIPVESLETTMRSPGSGA
RELARELKRLAQEAAEEVKRDPSSSDVNEALKLIVEAIEAAVDALEAAERTG
DPEVRELARELVRLAVEAAEEVQRNPSSSDVNEALHSIVYAIEAAIFALEAA
ERTGDPEVRELARELVRLAVEAAEEVQRNPSSRNVEHALMRIVLAIYLAEEN
LREAEESGDPEKREKARERVREAVERAEEVQRDPSGWLNHEQKLISEEDLDA
This application claims priority to U.S. Provisional Patent Application Ser. No. 62/775,171 filed Dec. 4, 2018, incorporated by reference herein in its entirety.
This invention was made with government support under Grant No. R01GM086858 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2019/064203 | 12/3/2019 | WO | 00 |

Number | Date | Country |
---|---|---|
62775171 | Dec 2018 | US |