A computer readable form of the Sequence Listing is filed with this application by electronic submission and is incorporated into this application by reference in its entirety. The Sequence Listing is contained in the file created on Apr. 11, 2022 having the file name “21-0379-WO-SeqList_ST25.txt” and is 322 kb in size.
Cyclic GMP-AMP Synthase (cGAS) is a pattern recognition receptor critical for the innate immune response to intracellular pathogens, DNA damage, tumorigenesis, and senescence. Binding to double-stranded DNA (dsDNA) induces conformational changes in cGAS that activate the enzyme to produce 2′-3′ cyclic GMP-AMP (cGAMP), a second messenger that initiates a potent interferon response through its receptor, STING. The cGAS-STING pathway is a major target for prevention and treatment of infectious disease, cancer, and autoimmunity, although most current efforts are limited to small molecule drugs.
In one aspect, the disclosure provides polypeptides comprising an amino acid sequence at least 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:1 or 2, wherein the polypeptide is a constitutively active Cyclic GMP-AMP Synthase (cGAS) mutant;
In various aspects, residue 55 is V, I, A, S, or G relative to residue numbering in SEQ ID NO:1, preferably wherein residue 55 is V or I, most preferably wherein residue 55 is V; residue 61 is K relative to residue numbering in SEQ ID NO:1; 1, 2, or all 3 of the following are true: residue 62 is V, residue 63 is K, and/or residue 64 is I, relative to residue numbering in SEQ ID NO:1; and/or one or more of the following is true relative to residue numbering in SEQ ID NO: 1:
In another aspect, the disclosure provides polypeptides comprising an amino acid sequence at least 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:3 or 4,
In various embodiments, residue 51 is P, L, or F and residue 55 is F, I, W, or L, relative to residue numbering in SEQ ID NO:3; residue 52 is V, I, A, G, or S relative to residue numbering in SEQ ID NO:3; preferably wherein residue 52 is V or I, or more preferably wherein residue 52 is V; residue 58 is K relative to residue numbering in SEQ ID NO:3; 1, 2, or all 3 are true: residue 59 is V, residue 60 is K, and/or residue 61 is I relative to residue numbering in SEQ ID NO:3; and/or one or more of the following is true relative to residue numbering in SEQ ID NO:3:
In various embodiments, the polypeptide does not include the mutation or combination of mutations listed on a single line in Table 1 relative to residue numbering in SEQ ID NO:1, in Table 2 relative to residue numbering in SEQ ID NO:2, in Table 3 relative to residue numbering in SEQ ID NO:3, or in Table 4 relative to residue numbering in SEQ ID NO:4.
In other embodiments, the polypeptides comprise an amino acid sequence at least 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NOS:5-59.
In other aspects, the disclosure provides nucleic acids encoding the polypeptide of any embodiment of the disclosure; expression vectors comprising the nucleic acid operatively linked to a suitable control sequence; host cells comprising the polypeptide, composition, nucleic acid, or expression vector of any embodiment herein; pharmaceutical compositions comprising the nucleic acid, expression vector, or host cell of any embodiment herein, and a pharmaceutically acceptable carrier; and methods and uses of the polypeptides, compositions, nucleic acids, expression vectors, or host cells of any embodiment herein for any therapeutic use, including but not limited to use as an adjuvant, such as for use as an adjuvant in combination with a prophylactic or therapeutic vaccine.
All references cited are herein incorporated by reference in their entirety. As used herein, the singular forms “a”, “an” and “the” include plural referents unless the context clearly dictates otherwise.
As used herein, the amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V).
Any N-terminal methionines in the polypeptides of the disclosure are optional, and may be present or absent.
All embodiments of any aspect of the disclosure can be used in combination, unless the context clearly dictates otherwise.
Unless the context clearly requires otherwise, throughout the description and the claims, the words ‘comprise’, ‘comprising’, and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”. Words using the singular or plural number also include the plural and singular number, respectively. Additionally, the words “herein,” “above,” and “below” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of the application.
In one aspect, the disclosure provides polypeptides comprising an amino acid sequence at least 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:1 or 2,
CDSAFRGVGLLNTGSYYEHVKISAPNEFDVMFKLEVPRIQLEEYSNTRAY
YFVKFKRNPKENPLSQFLEGEILSASKMLSKFRKIIKEEINDIKDTDVIM
KRKRGGSPAVTLLISEKISVDITLALESKSSWPASTQEGLRIQNWLSAKV
RKQLRLKPFYLVPKHAKEGNGFQEETWRLSFSHIEKEILNNHGKSKTCCE
NKEEKCCRKDCLKLMKYLLEQLKERFKDKKHLDKFSSYHVKTAFFHVCTQ
NPQDSQWDRKDLGLCFDNCVTYFLWCLRTEKLENYFIPEFNLFSSNLIDK
RSKEFLTKQIEYERNNEFPVFDEF (Full length human cGAS)
The polypeptides disclosed herein are constitutively active Cyclic GMP-AMP Synthase (cGAS) mutants, as disclosed in detail in the examples that follow, and thus can be used, for example, as adjuvants for vaccine administration and to stimulate anti-tumor immunity.
As disclosed herein, the inventors developed a simple, knowledge-based two-state design protocol that can be generally applied to stabilize specific conformations of dynamic proteins where target and off-target structures are known. Extensive studies were carried out, as detailed in the examples below, to identify key residues involved in the structural rearrangements between DNA-free and DNA bound conformations of cGAS to arrive at the DNA-independent, constitutively active cGAS mutants of the disclosure, and to verify that other residues can vary quite broadly.
SEQ ID NO:1 is the amino acid sequence of truncated human cGAS, and SEQ ID NO:2 is the amino acid sequence of full length cGAS, which is 156 amino acids longer than SEQ ID NO:1. Residue numbering to define amino acid sequences is relative to SEQ ID NO:1 (the truncated version). Thus, if residue numbering is relative to the amino acid of SEQ ID NO:2, then 156 amino acids would be added. For example:
Those of skill in the art will be able, based on the disclosure, to determine the amino acid residue relative to SEQ ID NO:2 for other specified positions relative to SEQ ID NO:1 disclosed herein.
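The numbering convention described above can be illustrated with a short Python sketch. It adds the stated offset to a position given relative to the truncated sequence to obtain the position relative to the full-length sequence. The offsets (156 for human, and 145 for mouse as stated elsewhere in this disclosure) come from the text; the function and dictionary names are hypothetical and chosen only for illustration.

```python
# Number of residues by which each full-length sequence is longer than its
# truncated counterpart, as stated in the disclosure. The names used here are
# illustrative and not part of the disclosure.
FULL_LENGTH_OFFSET = {"human": 156, "mouse": 145}


def to_full_length_numbering(position, species="human"):
    """Convert a 1-based position in the truncated sequence to full-length numbering."""
    if position < 1:
        raise ValueError("residue positions are 1-based")
    return position + FULL_LENGTH_OFFSET[species]


def to_truncated_numbering(position, species="human"):
    """Convert a 1-based full-length position back to truncated-sequence numbering."""
    truncated = position - FULL_LENGTH_OFFSET[species]
    if truncated < 1:
        raise ValueError("position lies outside the truncated sequence")
    return truncated


if __name__ == "__main__":
    # Residue 55 relative to SEQ ID NO:1 corresponds to residue 211 relative to SEQ ID NO:2.
    print(to_full_length_numbering(55, "human"))
```

Under this convention, residue 55 relative to SEQ ID NO:1 corresponds to residue 211 relative to SEQ ID NO:2.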
In one embodiment, residue 19 is P and residue 20 is A relative to SEQ ID NO:1. In another embodiment, residue 19 is D and residue 20 is P relative to SEQ ID NO:1.
In various further embodiments, the following positions relative to SEQ ID NO:1 numbering may be one or more of the following:
In one embodiment, the polypeptide does not include the mutation or combination of mutations listed on a single line in Table 1, relative to SEQ ID NO: 1.
In another embodiment, the polypeptide does not include the mutation or combination of mutations listed on a single line in Table 2, relative to SEQ ID NO: 2.
In another aspect, the disclosure provides polypeptides comprising an amino acid sequence at least 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:3 or 4,
KLKKVLDKLRLKRKDISEAAETVNKVVERLLRRMQKRESEFKGVEQLNTG
SYYEHVKISAPNEFDVMFKLEVPRIELQEYYETGAFYLVKFKRIPRGNPL
SHFLEGEVLSATKMLSKFRKIIKEEVKEIKDIDVSVEKEKPGSPAVTLLI
RNPEEISVDIILALESKGSWPISTKEGLPIQGWLGTKVRTNLRREPFYLV
PKNAKDGNSFQGETWRLSFSHTEKYILNNHGIEKTCCESSGAKCCRKECL
KMKYLLEQLKKEFQELDAFCSYHVKTAIFHMWTQDPQDSQWDPRNLSSCF
DKLLAFFLECLRTEKLDHYFIPKENLFSQELIDRKSKEFLSKKIEYERNN
GFPIFDKL Full length mouse cGAS
SEQ ID NO:3 is the amino acid sequence of truncated mouse cGAS, and SEQ ID NO:4 is the amino acid sequence of full length mouse cGAS, which is 145 amino acids longer than SEQ ID NO:3. Residue numbering to define amino acid sequences is relative to SEQ ID NO:3 (the truncated version). Thus, if residue numbering is relative to the amino acid of SEQ ID NO:4, then 145 amino acids would be added.
In one embodiment, residue 15 is P and residue 16 is A relative to SEQ ID NO: 3. In another embodiment, residue 15 is D and residue 16 is P relative to SEQ ID NO: 3.
In various further embodiments, the following positions relative to SEQ ID NO:3 numbering may be one or more of the following:
In another embodiment, the polypeptide does not include the mutation or combination of mutations listed on a single line in Table 3, relative to SEQ ID NO: 3.
In another embodiment, the polypeptide does not include the mutation or combination of mutations listed on a single line in Table 4, relative to SEQ ID NO: 4.
In another embodiment, the polypeptides comprise an amino acid sequence at least 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence selected from the group consisting of SEQ ID NO:5-59, wherein any N-terminal methionines are optional and may be present or absent.
DVMFKLEVPRIQLEEYSNTRAYYFVKFKRNPKENPLSQFLEGEILSASKMLSKFRKIIKEEINDIKDTDV
In an embodiment of all aspects and embodiments of the polypeptides disclosed herein, a given amino acid at a non-specified residue can be replaced by a residue having similar physicochemical characteristics, e.g., substituting one aliphatic residue for another (such as Ile, Val, Leu, or Ala for one another), or substitution of one polar residue for another (such as between Lys and Arg; Glu and Asp; or Gln and Asn). Other such conservative substitutions, e.g., substitutions of entire regions having similar hydrophobicity characteristics, are known. Polypeptides comprising conservative amino acid substitutions can be tested in any one of the assays described herein to confirm that the desired activity is retained. Amino acids can be grouped according to similarities in the properties of their side chains (A. L. Lehninger, in Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H). Alternatively, naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe. Non-conservative substitutions will entail exchanging a member of one of these classes for another class.
Particular conservative substitutions include, for example: Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu.
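By way of illustration only, the first side-chain grouping described above (non-polar, uncharged polar, acidic, basic) can be expressed as a simple lookup. The Python sketch below treats a substitution as conservative when both residues fall in the same group. The function and variable names are hypothetical, and this same-group rule is only one possible reading of "similar physicochemical characteristics"; it does not reproduce the list of particular substitutions given above.

```python
# Side-chain property groups as listed in the disclosure (first grouping),
# keyed by one-letter amino acid codes. Names are illustrative only.
SIDE_CHAIN_GROUPS = {
    "non-polar": set("AVLIPFWM"),
    "uncharged polar": set("GSTCYNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def side_chain_group(residue):
    """Return the name of the group containing the given one-letter residue code."""
    for name, members in SIDE_CHAIN_GROUPS.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unrecognized amino acid code: {residue!r}")


def is_conservative(original, replacement):
    """Assumed rule: a substitution is conservative if both residues share a group."""
    return side_chain_group(original) == side_chain_group(replacement)


if __name__ == "__main__":
    print(is_conservative("K", "R"))  # both basic
    print(is_conservative("D", "K"))  # acidic versus basic
```

A table-driven check of this kind makes it straightforward to swap in the alternative six-class grouping, or an explicit list of allowed pairs, without changing the calling code.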
The polypeptides of all aspects and embodiments of the disclosure may further comprise additional amino acid residues at the C-terminus and/or N-terminus. Any additional residues or functional domains may be added as appropriate for an intended purpose. In various non-limiting embodiments, the polypeptides may further comprise one or more functional domains (including but not limited to a therapeutic polypeptide domain, a targeting domain, a diagnostic polypeptide domain, etc.), detectable sequences (e.g., fluorescent domains), purification tags, etc.
In another embodiment, compositions are provided comprising 2, 3, 4, 5, 6, 7, 8, 9, 10, or more polypeptides of the disclosure.
In another embodiment, the disclosure provides nucleic acids encoding the polypeptide of any embodiment or combination of embodiments herein. The nucleic acid sequence may comprise single stranded or double stranded RNA or DNA in genomic or cDNA form, or DNA-RNA hybrids, each of which may include chemically or biochemically modified, non-natural, or derivatized nucleotide bases. Such nucleic acid sequences may comprise additional sequences useful for promoting expression and/or purification of the encoded polypeptide, including but not limited to polyA sequences, modified Kozak sequences, and sequences encoding epitope tags, export signals, and secretory signals, nuclear localization signals, and plasma membrane localization signals. It will be apparent to those of skill in the art, based on the teachings herein, what nucleic acid sequences will encode the polypeptides of the disclosure.
In one embodiment, the disclosure provides expression vectors comprising a nucleic acid of the disclosure operatively linked to a suitable control sequence. “Expression vector” includes vectors that operatively link a nucleic acid coding region or gene to any control sequences capable of effecting expression of the gene product. “Control sequences” operatively linked to the nucleic acid sequences of the disclosure are nucleic acid sequences capable of effecting the expression of the nucleic acid molecules. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences and the promoter sequence can still be considered “operably linked” to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, termination signals, and ribosome binding sites. Such expression vectors can be of any type, including but not limited to plasmid and viral-based expression vectors. The control sequence used to drive expression of the disclosed nucleic acid sequences in a mammalian system may be constitutive (driven by any of a variety of promoters, including but not limited to, CMV, SV40, RSV, actin, EF) or inducible (driven by any of a number of inducible promoters including, but not limited to, tetracycline, ecdysone, steroid-responsive). The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In various embodiments, the expression vector may comprise a plasmid, viral-based vector, or any other suitable expression vector.
In another embodiment, the disclosure provides host cells comprising the polypeptide, composition, nucleic acid, or expression vector of any embodiment or combination of embodiments of the disclosure. The host cells can be either prokaryotic or eukaryotic, and may transiently or constitutively comprise the polypeptide, composition, nucleic acid, or expression vector.
In a further embodiment, pharmaceutical compositions are provided comprising the polypeptide, composition, nucleic acid, expression vector, and/or host cell of any embodiment or combination of embodiments disclosed herein, and a pharmaceutically acceptable carrier. The pharmaceutical compositions of the disclosure can be used, for example, in the methods of the disclosure described herein. The pharmaceutical composition may further comprise (a) a lyoprotectant; (b) a surfactant; (c) a bulking agent; (d) a tonicity adjusting agent; (e) a stabilizer; (f) a preservative and/or (g) a buffer. In some embodiments, the buffer in the pharmaceutical composition is a Tris buffer, a histidine buffer, a phosphate buffer, a citrate buffer or an acetate buffer. The pharmaceutical composition may also include a lyoprotectant, e.g., sucrose, sorbitol or trehalose. In certain embodiments, the pharmaceutical composition includes a preservative, e.g., benzalkonium chloride, benzethonium, chlorhexidine, phenol, m-cresol, benzyl alcohol, methylparaben, propylparaben, chlorobutanol, o-cresol, p-cresol, chlorocresol, phenylmercuric nitrate, thimerosal, benzoic acid, and various mixtures thereof. In other embodiments, the pharmaceutical composition includes a bulking agent, like glycine. In yet other embodiments, the pharmaceutical composition includes a surfactant, e.g., polysorbate-20, polysorbate-40, polysorbate-60, polysorbate-65, polysorbate-80, polysorbate-85, poloxamer-188, sorbitan monolaurate, sorbitan monopalmitate, sorbitan monostearate, sorbitan monooleate, sorbitan trilaurate, sorbitan tristearate, sorbitan trioleate, or a combination thereof. The pharmaceutical composition may also include a tonicity adjusting agent, e.g., a compound that renders the formulation substantially isotonic or isoosmotic with human blood.
Exemplary tonicity adjusting agents include sucrose, sorbitol, glycine, methionine, mannitol, dextrose, inositol, sodium chloride, arginine and arginine hydrochloride. In other embodiments, the pharmaceutical composition additionally includes a stabilizer, e.g., a molecule which, when combined with a protein of interest substantially prevents or reduces chemical and/or physical instability of the protein of interest in lyophilized or liquid form. Exemplary stabilizers include sucrose, sorbitol, glycine, inositol, sodium chloride, methionine, arginine, and arginine hydrochloride.
In another embodiment, the disclosure provides methods and uses of the polypeptide, composition, nucleic acid, expression vector, or host cell of any preceding claim for any therapeutic use, including but not limited to use as an adjuvant. The polypeptides disclosed herein are DNA-independent constitutively active Cyclic GMP-AMP Synthase (cGAS) mutants, as disclosed in detail in the examples that follow, and thus the polypeptides, compositions, nucleic acids, expression vectors, or host cells can be used, for example, as adjuvants for vaccine administration and to stimulate anti-tumor immunity.
Cyclic GMP-AMP Synthase (cGAS) is a pattern recognition receptor critical for the innate immune response to intracellular pathogens, DNA damage, tumorigenesis, and senescence. Binding to double-stranded DNA (dsDNA) induces conformational changes in cGAS that activate the enzyme to produce 2′-3′ cyclic GMP-AMP (cGAMP), a second messenger that initiates a potent interferon response through its receptor, STING. The cGAS-STING pathway is a major target for prevention and treatment of infectious disease, cancer, and autoimmunity, although most current efforts are limited to small molecule drugs. Here, we combined two-state computational design with informatics-guided design to create constitutively active, dsDNA ligand-independent variants of cGAS (CA-cGAS). We identified CA-cGAS mutants with interferon-stimulating activity approaching that of dsDNA-stimulated wild-type cGAS. DNA-independent adoption of the active conformation by CA-cGAS was directly confirmed by X-ray crystallography. Inducible expression of CA-cGAS in tumor cells in vivo resulted in STING-dependent tumor regression, demonstrating that the designed proteins have therapeutically relevant biological activity. Our work provides a general framework for stabilizing active conformations of enzymes through two-state computational design and provides CA-cGAS variants that could be useful as genetically-encoded adjuvants and as research tools for understanding inflammatory diseases.
The ability to induce a controlled and robust innate immune response is highly desirable for prophylactic and therapeutic immunopotentiation. The cGAS-STING pathway has rapidly become a promising target, with numerous small molecule STING agonists in development, several of which have entered clinical trials. However, several first-generation molecules suffered from low efficacy or adverse reactions. Constitutively active cGAS (CA-cGAS) variants that are ligand-independent could be useful as therapeutic agents because their activity would not depend on factors such as the cell cycle, subcellular localization, or disease state.
cGAS is a member of the OAS-family, a conserved group of metazoan proteins responsible for double stranded nucleic acid surveillance in the cytosol. OAS family proteins are characterized by an extended N-terminal α-helix, designated the “spine,” a mixed α/β nucleotidyltransferase fold domain (NTase core), and an all α-helical C-terminal domain. cGAS recognizes and is allosterically activated by dsDNA in a sequence-independent manner (
We hypothesized that mutations which i) stabilize the dsDNA-bound (active) conformation or ii) destabilize the unbound (inactive) conformation would shift the conformational equilibrium of the enzyme towards the active state, an outcome usually achieved by dsDNA binding (
To identify key residues involved in the structural rearrangements between DNA-free and DNA bound conformations, we calculated a distance difference matrix between the DNA-bound (active; PDB ID 4K97) and -unbound (inactive; PDB ID 4K8V) forms of mcGAS. The matrix measures the change in distance for each residue-residue pair in the two structures. Our analysis highlighted two regions that move significantly and are likely critical for cGAS activation: the spine helix above the active site that bridges the NTase core and C-domain (region 1, residues 155-170), and the active site loop and flanking beta strands (region 2, residues 196-214) (
To identify potential activating mutations in these regions, we evaluated the energy of all possible single amino acid substitutions in both the active and inactive conformations using Rosetta™ (660 mutations total) (20). Mutations were ranked by taking the difference in the energetic impact of each mutation between the active and inactive states, where the energetic impact is defined as the difference between the WT energy and single-mutant energy for a given state (
We screened the enzymatic activity of each cGAS variant in HEK293T cells, which do not naturally express cGAS, by transfecting each cGAS construct together with a STING expression vector and a luciferase-based IFN-stimulated response element (ISRE) reporter plasmid (
To close the activity gap between CA-cGAS-04 and WT cGAS, we used a combination of bioinformatics and computational approaches. We reasoned that proteins with structural homology specific to the active conformation of cGAS may inspire additional active conformation-stabilizing mutations. We used Protein Data Bank in Europe (21) to search for distantly related proteins with structures closely matching the active conformation of cGAS, and then reviewed available structural and functional data to identify those that are not known to undergo large conformational changes. One protein that matched these criteria was MiD51, an ADP-binding mitochondrial receptor that facilitates mitochondrial fission and adopts the same structure in ADP-bound and unliganded crystal forms (22). MiD51 contains an NTase domain that aligns well with active cGAS (1.96 Å Cα RMSD) (
To identify such mutations, we generated multiple sequence alignments (MSAs) of varying permissiveness for cGAS and MiD51 using the hhblits algorithm (23). At low stringency many sequences appeared in both alignments, in keeping with the distant homology between the two proteins, but cGAS-like and MiD51-like clusters were apparent (
Based on this analysis, a second set of mutants was selected and cloned into the CA-cGAS-04 background. Four of these variants, containing primarily polar to non-polar mutations (
To further characterize these active variants, we expressed and purified the core catalytic domain of each (d147 CA-cGAS), ensuring that our preparations were free of nucleic acid (
To explore alternative activating mutations in the spine helix, we also characterized a variant of CA-cGAS-41 containing the bioinformatics consensus mutations L159I/K160P/R161A. This variant is as active as CA-cGAS-41 (
The local environment around regions 1 and 2 in cGAS is reasonably well conserved, which suggests activating mutations should generalize to cGAS from different species. Mutations from CA-cGAS-22, -41, -42, and -50 were introduced with K395M/K399M into human cGAS (hcGAS) and activity was measured by ISRE-luciferase assay. All CA-hcGAS variants had near-WT activity (
To test the activity and therapeutic potential of CA-cGAS in vivo, we used the B16-F10 mouse melanoma model. Prior studies have shown that intratumoral injection of nuclease-resistant, modified 2′,3′ cyclic di-AMP causes tumor regression, and that this effect requires STING expression in the host, but not the tumor (24-26). This approach is currently being evaluated in humans (NCT02675439 and NCT03172936). Instead of injecting immunostimulatory molecules, we sought to determine whether controlled expression of CA-cGAS could generate therapeutically relevant levels of cGAMP in the tumor itself. To do this, we created a clonal line of B16-F10 cells transduced with a lentivirus encoding doxycycline (dox)-inducible CA-cGAS-50. We found that dox-mediated induction of CA-cGAS-50 did not affect cell growth in vitro (
Our results establish a general multi-state design framework for stabilizing target conformations in structurally dynamic proteins. In brief, this framework has three steps: 1) Identify target and non-target conformations for the system in question, 2) enumerate all possible mutations in dynamic regions in both conformations to identify those with the largest energetic difference between the two states, and 3) combine top-ranked mutations with additional design and bioinformatics analyses based on conformation-specific homologs to identify supporting mutations and prioritize designs for experimental testing. Most current computational design methods typically explicitly consider only a single, static state, which is fundamentally inconsistent with the physical reality of proteins. This is especially true for enzymes and other dynamic systems, making these difficult targets for design. The computational design framework presented herein overcomes many hurdles in multistate design and should be applicable to other multi-state protein design challenges. Our method is also computationally inexpensive and effective even without consideration of possible unknown structural states, such as the unanticipated crystal contacts we observed in our structures. Our designed CA-cGAS variants establish that structural rearrangements alone, without dsDNA binding (12), oligomerization (14, 15), or liquid-liquid phase separation (27), are sufficient for enzyme activity. Their independence of dsDNA binding and therefore the biological status of the cell (e.g., infected vs. uninfected) could make CA-cGAS molecules useful tools for better understanding the biological role of the cGAS-STING pathway in various tissues. 
Finally, our demonstration that induction of CA-cGAS-50 resulted in STING-dependent tumor regression in vivo establishes CA-cGAS molecules as biologic alternatives to small molecule STING activators with potential prophylactic and therapeutic applications in infectious disease and cancer.
Structures of mcGAS in the active (PDB ID 4K97) and inactive (PDB ID 4K8V) conformations were downloaded from RCSB. The residue-residue distance difference matrix was generated by calculating the alpha-carbon or side-chain heavy atom distances for the active and inactive conformations. The difference is the inactive conformation residue-residue distance subtracted from the active conformation residue-residue distance. To prepare the PDB files for computational design, all heteroatoms and DNA atoms were removed. For PDB 4K8V all chains except chain A were removed. The Rosetta™ computational design methodology was used to introduce point mutants at every residue in regions of the enzyme that undergo putatively relevant conformational changes during activation. Each residue was mutated to every other residue and any residues in a 10 Å sphere were allowed to repack (sample rotamer conformations) to accommodate the mutation. The entire pose was then minimized, but backbone atoms were constrained outside of a 15 Å sphere around the target residue. This protocol was applied to both the active and inactive conformation. The difference in total energy in Rosetta™ Energy Units (REU) was calculated for the mutant and WT active conformations (dActive), and for the mutant and WT inactive conformations (dInactive). The difference between dActive and dInactive (normalized REU) was used to rank order mutants. A mutant with negative normalized REU should bias the structure towards the active conformation. The best scoring single mutants were combined or improved by either allowing any other site where a beneficial mutation was observed to mutate to one of those beneficial mutations or allowing design to any residue at those sites. Designs were scored as described above.
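The two calculations described above, the residue-residue distance difference matrix and the two-state ranking score, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the protocol actually used: coordinates are assumed to be one (x, y, z) point per residue (for example, alpha-carbon positions) in matching residue order, energies are assumed to have been computed elsewhere (for example, by Rosetta™), and dActive and dInactive are assumed to be mutant energy minus WT energy, so that a negative score favors the active conformation. All function names, mutation labels, and numerical values are hypothetical.

```python
import math


def distance_matrix(coords):
    """All-against-all Euclidean distances for a list of (x, y, z) points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_difference_matrix(active_coords, inactive_coords):
    """Active-state distances minus inactive-state distances, pair by pair."""
    if len(active_coords) != len(inactive_coords):
        raise ValueError("both conformations must contain the same residues")
    d_act = distance_matrix(active_coords)
    d_inact = distance_matrix(inactive_coords)
    return [[a - b for a, b in zip(row_a, row_i)]
            for row_a, row_i in zip(d_act, d_inact)]


def rank_mutations(mutant_energies, wt_active, wt_inactive):
    """Rank mutations by dActive - dInactive, most negative (most activating) first.

    mutant_energies maps a mutation label to (E_active_mutant, E_inactive_mutant).
    Assumed sign convention: d = E_mutant - E_WT for each state.
    """
    scored = []
    for label, (e_active, e_inactive) in mutant_energies.items():
        d_active = e_active - wt_active
        d_inactive = e_inactive - wt_inactive
        scored.append((label, d_active - d_inactive))
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy three-residue example; coordinates and energies are made up.
    active = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
    inactive = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
    print(distance_difference_matrix(active, inactive)[0][1])
    energies = {"mutA": (-105.0, -98.0), "mutB": (-99.0, -101.0)}
    print(rank_mutations(energies, wt_active=-100.0, wt_inactive=-100.0))
```

In the toy example, the hypothetical "mutA" lowers the active-state energy and raises the inactive-state energy, so it receives the more negative score and is ranked first.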
We also took a bioinformatics approach to identify likely variants. First we searched the PDB for cGAS structural homologues that had low sequence conservation using the protein structure comparison service PDBeFold at the European Bioinformatics Institute, paying close attention to the alignment of the active site and dynamic regions identified in cGAS. We then performed a literature search for information about the biochemistry of hits, looking specifically for a protein that did not undergo structural rearrangements, and identified a suitable protein, MiD51 (PDB ID 4OAF). Using MiD51 and cGAS sequences excluding the unstructured N-terminal domain as inputs, we generated two overlapping multiple sequence alignments (MSAs) using the HHblits algorithm with a high e-value cutoff. We then combined the two alignments and calculated bit-scores for cGAS and MiD51. We clustered the bit-score from alignments to cGAS and MiD51 using a Gaussian mixture model with Dirichlet process. We then generated MSAs with increasing e-value cutoff until there was no overlap between the cGAS MSA and MiD51 MSA, resulting in MSAs containing cGAS-like and MiD51-like proteins. We calculated the site-specific amino acid frequencies for each group, and the frequency difference between groups. To evaluate the magnitude of the differences in site-specific amino acid frequencies we calculated the divergence. The site-specific amino acid frequency divergence between cGAS and MiD51 multiple sequence alignments was calculated from the site-specific entropy S = −Σ p_a × log2(p_a). The divergence was then calculated according to
The sequences for mouse cGAS (uniprot: Q8C6L5) and human cGAS (uniprot: Q8N884) were optimized for expression in mammalian cells and purchased as synthetic gBlocks from IDT. The gene fragment was amplified using Phusion™ Polymerase per the manufacturer's instructions with primers incorporating homology regions for the 5′ or 3′ ends of the target cut vector (pCDNA3, pET28b, or pCDB179) and for the gBlock. For pCDNA3.1 and pET28b vectors the primers were homologous to the 5′ and 3′ ends of the gBlock. In some applications the sequence for a myc tag (GAACAGAAACTGATTAGCGAAGAAGAT; SE ID NO:60) was included in the antisense primer. For pCDB179 the sense primer was homologous for the 3′ end of the cut vector, and for the 436th-454th base pairs in the gBlock, corresponding to a truncation at the 148th amino acid. Because cloning into pCDB179 introduces a SUMO domain on the N terminus of the protein and the 148th amino acid is a proline, a non-homologous serine codon (TCG) was added between the primer and gBlock homology regions. When the SUMO domain is cleaved from the peptide, the serine residue remains. Constructs with this truncation will be referred to as d147 cGAS. The antisense primer was homologous for the 5′ cut end of the vector. The resulting amplicons were assembled with restriction-digested (NdeI and XhoI) pET28b, (XhoI and BamHI) pCDB179, or (XhoI and NotI) pCDNA3.1 using Gibson Assembly and transformed into chemically competent E. coli DH5a cells (NEB). Colonies were verified by Sanger sequencing. To identify CA-cGAS, the ability to bind and be activated by DNA was first removed by introducing the mutations K395M and K399M into the wild-type sequence. The mutations were made by site directed mutagenesis. The resulting amplified plasmids were transformed into chemically competent K coil DH5 cells (NEB) and colonies verified by Sanger sequencing.
Activating mutations identified by computational design were generated by site directed mutagenesis using the Kunkel method. Single stranded, uracilated plasmid was generated by transforming genes in pCDNA3 plasmid into chemically competent CJ236 cells and inoculating six colonies into 3 mL of LB with carbenicillin. Cultures were allowed to expand for three to four hours at 37° C., shaking at 200 rpm before adding 3×10^9 plaque forming units of M13K07 helper phage. After an additional hour the culture was expanded 1:50 and grown overnight at 37° C., shaking at 200 rpm. Phage was isolated from the culture medium by first clarifying the medium and then pelleting the phage in 3.3% PEG 8000, 420 mM NaCl. Single stranded DNA was harvested using a Qiagen Qiaprep™ M13 kit (#27704). Oligonucleotide primers containing the desired mutation or mutations were obtained from IDT. Oligonucleotides were phosphorylated with T4 polynucleotide kinase (NEB) and diluted to an appropriate working concentration. Oligonucleotides were annealed to the single stranded template by slowly lowering the temperature from 95° C. to 25° C. at 1° C./min. The phosphorylated and annealed oligonucleotides were used to prime in vitro DNA synthesis, with T4 DNA ligase and T7 DNA polymerase (NEB). The product, double stranded heteroduplex plasmid, was transformed into chemically competent E. coli DH5α cells (NEB). Colonies were verified by Sanger sequencing.
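The annealing step above is a linear cool-down, and its duration and per-minute set points follow directly from the stated temperatures and rate. The sketch below is illustrative only; the function name is hypothetical and one-minute steps are an assumption about how a thermocycler might be programmed.

```python
def annealing_ramp(start_c=95.0, end_c=25.0, rate_c_per_min=1.0):
    """Return (total_minutes, schedule) for a linear cool-down.

    schedule is a list of (minute, temperature_c) set points at one-minute
    intervals (assumed step size).
    """
    total_minutes = (start_c - end_c) / rate_c_per_min
    steps = int(total_minutes)
    schedule = [(minute, start_c - minute * rate_c_per_min) for minute in range(steps + 1)]
    return total_minutes, schedule


total, schedule = annealing_ramp()
print(total)         # ramp duration in minutes
print(schedule[:3])  # first few set points
```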
CA-cGAS genes were cloned from pCDNA3 to pCDB179 or pET28b+ by in vitro amplification with an appropriate set of primers containing homology for the target plasmid, and then Gibson assembled into cut vector. pET28b was cut with XhoI and NdeI. pCDB179 was cut with XhoI and BamHI. For crystallography and in vitro activity assays, a truncated d147 variant was cloned into pCDB179 with a single serine residue inserted between the C-terminal SUMO glycine and N-terminal P147 residue in cGAS. The serine residue facilitates cleavage during purification.
To screen likely constitutively active mutants, pCDNA3 plasmid containing the mutant cGAS gene was purified from chemically competent E. coli DH5α cells (NEB) using a Qiagen Miniprep™ kit. 293T cells were seeded into 96-well flat bottom plates at 2.5×10^4 cells per well. Cells were transiently transfected with three plasmids containing mutant cGAS (5 ng/well), STING (20 ng/well), and luciferase under regulation by an Interferon Stimulated Response Element (5 ng/well). The cells were incubated for 8 to 18 hours before being lysed and luciferase activity assessed using the Luciferase Assay System (Promega E4550) according to the manufacturer's instructions.
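The per-well plasmid amounts above scale linearly with the number of wells, which can be sketched as a simple master-mix calculation. The 10% pipetting overage and the function name are assumptions for illustration and are not part of the described protocol.

```python
# Plasmid mass per well, in nanograms, as stated in the text.
NG_PER_WELL = {"mutant cGAS": 5.0, "STING": 20.0, "ISRE-luciferase": 5.0}


def master_mix_ng(n_wells, overage_fraction=0.10):
    """Total plasmid mass (ng) for n_wells that receive the same mutant.

    overage_fraction is a hypothetical allowance for pipetting loss.
    """
    scale = n_wells * (1.0 + overage_fraction)
    return {name: ng * scale for name, ng in NG_PER_WELL.items()}


print(sum(NG_PER_WELL.values()))  # total DNA per well, ng
print(master_mix_ng(3))           # e.g. triplicate wells with overage
```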
cGAS mutants showing activity by ISRE assay were purified recombinantly from chemically competent E. coli T7 cells (NEB). A small culture of TB with kanamycin was inoculated and grown overnight at 37° C. Cultures were then expanded 1:50 into 500 mL of TB with kanamycin in a 2 L baffled shake flask and grown up to approximately OD 0.6 at 37° C., shaking at 220 rpm. Once the indicated cell density was achieved, expression was induced by the addition of isopropyl β-D-1-thiogalactopyranoside (IPTG) at a final concentration of 1 mM. Expression proceeded for 18 h at 18° C. with shaking at 220 r.p.m. Cultures were harvested by centrifugation at 5,000 r.c.f. for 10 minutes, and the supernatant discarded. The pellet was resuspended in lysis buffer (50 mM Tris pH 8.0, 300 mM NaCl, 20 mM imidazole, 1 mM PMSF, 1 mM DTT, 0.1 mg/ml DNase, and 0.1 μM RNase) and lysed at 4° C. by sonication with a probe sonicator at 70% power for two minutes or microfluidization with a single pass at 20,000 psi. Lysate was clarified by centrifugation at 12,000 r.c.f. for 30 minutes. Clarified lysate was further filtered through a 0.22 μm PVDF membrane before loading onto a 5 mL HisTrap™ HP (Cytiva Life Sciences) column. The column was washed with approximately 25 mL of running buffer (50 mM Tris pH 8.0, 300 mM NaCl, 20 mM imidazole, 1 mM DTT) at 3 mL per minute and the protein eluted with a linear gradient from 0% to 100% elution buffer (50 mM Tris pH 8.0, 300 mM NaCl, 500 mM imidazole, 1 mM DTT) over 40 minutes. Protein elution was monitored by absorbance at 280 nm and purity estimated by monitoring absorbance at 260 nm.
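For orientation, the imidazole concentration at any point in the linear elution gradient can be estimated from the two buffer compositions and the gradient length. This is a minimal sketch assuming ideal mixing and no system delay volume; the function name is hypothetical.

```python
def gradient_concentration(t_min, start_mM=20.0, end_mM=500.0, duration_min=40.0):
    """Concentration (mM) at time t_min in a linear 0% to 100% gradient.

    Defaults follow the imidazole content of the running and elution buffers
    described above. Times outside the gradient are clamped to its end points.
    """
    fraction_b = min(max(t_min / duration_min, 0.0), 1.0)
    return start_mM + fraction_b * (end_mM - start_mM)


for t in (0, 10, 20, 40):
    print(t, gradient_concentration(t))
```

The same function applies to any two-buffer linear gradient by changing the start and end concentrations.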
To remove any bound dsDNA, the major elution fractions were pooled and further purified by Heparin affinity chromatography. Pooled fractions were diluted in 20 mM Tris pH 8.0 to bring the final NaCl concentration down to approximately 200 mM. For full-length cGAS constructs the NaCl concentration must not drop below 200 mM, otherwise precipitation occurs. Truncated d147 cGAS constructs are more tolerant of low ionic strength. The diluted protein was then loaded onto a 5 mL HiTrap™ Heparin HP column (Cytiva Life Sciences). Nucleic acids were eluted by washing the column with 25 mL of wash buffer (20 mM Tris pH 8.0, 250 mM NaCl, 1 mM DTT), then the protein was eluted with a linear gradient from 0% to 100% elution buffer (20 mM Tris pH 8.0, 1000 mM NaCl, 1 mM DTT) over 40 minutes. Elution was monitored by absorbance at 260 nm (nucleic acids) and 280 nm (protein).
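The dilution described above is a standard mixing calculation. The sketch below assumes the pooled nickel-column fractions contain 300 mM NaCl (both nickel-column buffers are listed at 300 mM) and that the 20 mM Tris diluent contains no NaCl; the function name and the 10 mL pool volume are hypothetical.

```python
def diluent_volume_ml(pool_ml, pool_nacl_mM, target_nacl_mM, diluent_nacl_mM=0.0):
    """Volume of diluent (mL) needed to bring a pool to the target NaCl concentration."""
    if not diluent_nacl_mM < target_nacl_mM <= pool_nacl_mM:
        raise ValueError("target must lie between diluent and pool concentrations")
    return pool_ml * (pool_nacl_mM - target_nacl_mM) / (target_nacl_mM - diluent_nacl_mM)


# Hypothetical 10 mL pool at 300 mM NaCl diluted to about 200 mM NaCl.
print(diluent_volume_ml(10.0, 300.0, 200.0))
```

Because full-length constructs precipitate below 200 mM NaCl, the calculated volume is best treated as an upper limit on how much diluent to add.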
For d147 cGAS constructs, the SUMO domain was cleaved with a custom Ulp1 variant produced as previously described (28). The protein concentration was first estimated by absorbance at 280 nm. After correcting for scattering, the concentration was calculated based on the theoretical molar extinction coefficient, assuming all cysteine residues are reduced. One milligram of protease per 100 mg cGAS was added directly to the pooled fractions from the heparin column, without modification of the buffer. The cleavage reaction was allowed to proceed overnight at 4° C. The extent of cleavage was followed by SDS-PAGE. To remove SUMO, Ulp1 protease, and uncleaved cGAS, the entire reaction volume was loaded onto a 5 mL HisTrap™ HP column and the flowthrough collected. The column was then washed with 25 mL of running buffer at pH 7.5 and protein elution monitored at 280 nm. The major wash fractions were pooled with the flowthrough.
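The concentration estimate and protease dosing described above can be sketched as follows. The extinction coefficient uses the commonly applied per-residue values of 5500 M^-1 cm^-1 for tryptophan and 1490 M^-1 cm^-1 for tyrosine, with no cystine term because all cysteines are assumed reduced. The function names, the 1 cm path length, and the requirement that the absorbance is already scatter-corrected are assumptions for illustration; the scattering correction method itself is not specified here.

```python
def extinction_coefficient_reduced(sequence):
    """Theoretical molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Counts Trp (5500) and Tyr (1490) only; cystines are omitted because all
    cysteines are assumed to be reduced.
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y")


def concentration_mg_per_ml(a280, epsilon, molecular_weight_da, path_cm=1.0):
    """Beer-Lambert concentration in mg/mL from a scatter-corrected A280."""
    molar = a280 / (epsilon * path_cm)   # mol/L
    return molar * molecular_weight_da   # g/L, equal to mg/mL


def protease_mass_mg(cgas_mass_mg, ratio=100.0):
    """Protease mass for a 1:ratio (w/w) protease-to-substrate digest."""
    return cgas_mass_mg / ratio


# Hypothetical numbers for illustration only.
print(extinction_coefficient_reduced("MWYYC"))
print(protease_mass_mg(50.0))
```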
Full length or cleaved d147 cGAS constructs were further purified by SEC. Nickel or Heparin affinity eluates were pooled and concentrated in a spin concentrator to approximately one mL volume or a final concentration of ˜20 mg/mL, whichever was larger. One mL aliquots were injected onto a Superdex™ 75 10/300 GL column. For d147 cGAS constructs the column was equilibrated and run with low salt SEC buffers (20 mM Tris pH 7.5, 100 mM NaCl, 1 mM DTT). If the protein was to be frozen for storage, the buffer included 250 mM glycine and 5% v/v glycerol, except for d147 WT and d147 K395M/K399M cGAS, which were eluted in low salt SEC buffer, without glycine. Full-length constructs were eluted into low-salt SEC buffers. Again, if the protein was to be frozen, it was eluted in buffer with 250 mM glycine and 5% v/v glycerol, except WT and K395M/K399M cGAS, which were always eluted in high-salt (300 mM NaCl) SEC buffer. Full-length constructs eluted at 10.8 mL, and d147 constructs eluted at 12.6 mL. Major fractions were pooled and quantified by absorbance at 280 nm. Presence of DNA was evaluated by 260/280 nm ratio. Protein identity and purity were confirmed by SDS-PAGE, Liquid Chromatography-Mass Spectrometry, and UV-Vis spectroscopy. Constructs were frozen by diluting in SEC buffer to an appropriate concentration and aliquoting into cryo-safe tubes. Tubes were flash frozen in liquid nitrogen and stored at −80° C.
In Vitro cGAS Activity Assay
In vitro cGAS activity was determined using the method previously reported by Andreeva et al. (15). Activity was measured by combining 1 μM cGAS, 500 μM GTP, and 50 μM 2-aminopurine riboside-5′-O-triphosphate (fATP) (Biolog) with or without 2.6 ng/μL plasmid dsDNA. The reaction was initiated by spiking in 5 mM MgCl2. The fluorescence intensity of fATP was monitored for three hours at 32° C. on a Synergy Neo2 plate reader (Biotek), with λex 305 nm and λem 363 nm. As cGAS converts GTP and fATP into cyclic GMP-fAMP, the fluorescence intensity decreases. The fluorescence intensity was normalized to a buffer-only sample with 50 μM fATP. The initial rate was fit with a linear model.
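The normalization and initial-rate fit described above can be sketched with an ordinary least-squares line. This is an illustrative sketch only: it assumes normalization means dividing each reading by the buffer-only reading at the same time point, and the number of early points used for the fit (n_points) is a hypothetical choice, since the fitting window is not stated here.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def initial_rate(times, sample, buffer_only, n_points):
    """Slope of buffer-normalized fluorescence over the first n_points readings.

    A negative slope corresponds to fATP consumption.
    """
    normalized = [s / b for s, b in zip(sample, buffer_only)]
    slope, _ = linear_fit(times[:n_points], normalized[:n_points])
    return slope


# Hypothetical readings: time in minutes, raw fluorescence units.
times = [0, 1, 2, 3]
sample = [100.0, 90.0, 80.0, 70.0]
buffer_only = [100.0, 100.0, 100.0, 100.0]
print(initial_rate(times, sample, buffer_only, n_points=4))
```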
The interaction between CA-cGAS and dsDNA was evaluated by electrophoretic mobility shift. A 1% agarose gel was prepared in 1×Tris/Glycine buffer (Bio-Rad). CA-cGAS was mixed with a 40 bp dsDNA fragment purchased from IDT in 20 mM Tris pH 7.5, 100 mM NaCl buffer. Mixtures were set up such that the final concentration of dsDNA was constant for all ratios. After incubating CA-cGAS and dsDNA mixtures for 30 minutes, 2 μL of glycerol was added to 10 μL of the reaction and loaded into the pre-prepared agarose gel. Gels were run in 1×Tris/Glycine buffer at 50 volts for one hour. The gel was stained for 15 minutes in 1×SYBR™ Gold (Invitrogen) in the dark and imaged with a UV transilluminator. The gel was then stained for protein with GelCode™ Blue Stain Reagent (Thermo Scientific) for 15 minutes, de-stained overnight in water, and imaged.
Murine cGAS (mcGAS) mut50 was cloned into the pSLIK™ vector as described (10).
B16-F10 cells were maintained in Dulbecco's modification of Eagle medium (DMEM) supplemented with 10% FBS, 2 mM L-glutamine, 10 mM HEPES, 1 mM sodium pyruvate, 0.05 mM beta-mercaptoethanol, 100 IU penicillin, and 100 μg/ml streptomycin (complete DMEM). A clonal line of parental B16-F10 cells was created by lentiviral transduction of a construct that constitutively expressed the mWasabi™ fluorescent protein fused to a nuclear localization signal and a peptide derived from ovalbumin (SIINFEKL). These cells were transduced with pSLIK CA-cGAS-50. Single clones were grown up and tested for expression levels of CA-cGAS-50. A clone with low basal cGAMP production and high inducible expression of CA-cGAS-50 was selected for further experiments. For quantification of in vitro cell growth and CA-cGAS-50 induction, doxycycline was used at 1 μg/ml in complete DMEM.
C57BL/6J mice were purchased from Jackson and allowed at least one week to acclimate prior to experiment initiation. STING KO mice (Tmem173−/−) were generated as described (29) and backcrossed to C57BL/6J (30). All mice were maintained in a specific pathogen-free (SPF) barrier facility at the University of Washington, and all experiments were done in accordance with the Institutional Animal Care and Use Committee guidelines of the University of Washington.
9-13 week old female mice were injected subcutaneously into the flank with 1×10^5 tumor cells mixed 1:1 with Matrigel™ Basement Matrix (Corning #354248) for a final injection volume of 100 μL. Tumor volume was calculated using the following formula: Volume = (Short axis)^2 × Long axis × 0.523 (31). Mice receiving doxycycline were switched to doxycycline chow (Envigo #TD.01306 625 mg/kg irradiated) starting at day 10-11 post tumor implantation.
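The tumor volume formula above, read as the short axis squared times the long axis times 0.523, can be written as a small helper. The function name is hypothetical, and caliper measurements in millimeters (giving cubic millimeters) are an assumption, since units are not stated here.

```python
def tumor_volume(axis_a, axis_b):
    """Tumor volume = (short axis)^2 x long axis x 0.523.

    The two caliper measurements may be passed in either order.
    """
    short_axis, long_axis = sorted((axis_a, axis_b))
    return short_axis ** 2 * long_axis * 0.523


# Hypothetical measurement: 5 mm by 10 mm.
print(tumor_volume(10.0, 5.0))
```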
Tumor cGAMP Measurements
Tumors were dissected out of the mice and cut into small pieces in 3 mL of dissociation buffer (2.7 mg/mL Collagenase A (Sigma #110088793001), 23 U/mL DNaseI (Sigma #D4263-5VL), and 2 mM CaCl2 in 1×PBS). Tumors were then incubated at 37° C. with shaking for 30 minutes. 3 mL of termination buffer (2% FCS, 5 mM EDTA in 1×PBS) was then added. Samples were then filtered through a 70 μm mesh strainer, washed twice in 1×PBS, and lysed in 200 μL RIPA buffer (150 mM NaCl, 1.0% NP-40 substitute, 0.5% sodium deoxycholate, 0.1% SDS, 50 mM Tris pH 8.0). 50 μL of sample was used per well to measure cGAMP content (Arbor Assays 2′,3′-Cyclic GAMP ELISA Kit #K067-H5W) per the manufacturer's instructions.
Tumor growth and cGAMP quantification data were visualized and analyzed using GraphPad™ Prism software. Statistical tests used to analyze data are noted in the figure legends.
This application claims priority to U.S. Provisional Application Ser. No. 63/174,279 filed Apr. 13, 2021, incorporated by reference herein in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/024255 | 4/11/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63174279 | Apr 2021 | US |