The Sequence Listing written in file 081906-226310US-1106551_SL.txt created on Dec. 6, 2018, 325,708 bytes, machine format IBM-PC, MS-Windows operating system, is hereby incorporated by reference in its entirety for all purposes.
While genomic DNA holds the key to the genetic code, epigenetics offers another layer of information that establishes cell fate during development, aging and disease, as well as in response to the environment. Epigenetics is a means by which the transcriptome (and thus the proteome) of a cell can be changed without alteration of the genetic content. Epigenetic regulation is thought to be accomplished through epigenetic marks such as posttranslational modifications of histones and DNA methylation, and also via other mechanisms involving noncoding RNAs (1,2). Regions of active gene expression and open chromatin carry a signature of epigenetic marks that is distinct from repressed and heterochromatic regions (2). For example, histone acetylation is associated with active transcription, while different histone methylation marks are associated with active versus repressed chromatin. Specifically, trimethylation of lysine 4 on histone H3 (H3K4me3) is associated with active transcription, while trimethylation of H3K9 (H3K9me3) and H3K27 (H3K27me3) are associated with repressed chromatin regions. There has been a significant effort to decipher the relationship between epigenetic marks, regulatory element activity and gene regulation. Large consortia projects such as ENCODE and the Roadmap Epigenomics Project have mapped epigenetic signatures across the human genome in many different human cell types and tissues, which have then been correlated with gene expression (3,4). These association-based studies have provided epigenomic landscapes of epigenetic marks present at promoters and other regulatory elements, but cannot dissect the dynamic relationships between the epigenome and transcriptional control. While some evidence suggests that silencing of gene expression precedes de novo DNA methylation (5), the causal relationship between the presence of a histone mark and gene expression is still unclear.
Accordingly, there is a need in the art for new tools that can be used to further explore the relationships between epigenetic modifications, transcriptional control, organismal development, and disease states. The present invention satisfies this need, and provides related advantages as well.
In one aspect, the present invention provides a fusion protein comprising (1) a catalytically inactive Cas9 (dCas9) domain and (2) an effector domain, wherein the effector domain is enhancer of zeste homolog 2 (Ezh2), Friend of GATA1 (FOG1), histone H3 lysine 9 methyltransferase G9A (G9A), histone-lysine N-methyltransferase SUV39H1 (SUV39H1), Krüppel-associated box (KRAB), DNA (cytosine-5)-methyltransferase 3A (DNMT3A), or a combination thereof. In some embodiments, the effector domain is located N-terminal and/or C-terminal to the dCas9 domain. In some embodiments, the fusion protein further comprises a nuclear localization signal (NLS) domain, a FLAG epitope tag, an amino acid linker, or a combination thereof. In some embodiments, the NLS domain, the FLAG epitope tag, and/or the amino acid linker are located N-terminal and/or C-terminal to the dCas9 domain. In some instances, the amino acid linker comprises the amino acid sequence (GGS)n, wherein the subscript n is the number of repeat units and is between 1 and 10 (e.g., n is equal to 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) (SEQ ID NO: 95). In some embodiments, the amino acid linker sequence comprises the amino acid sequence of any one of SEQ ID NOS:71-80. In particular embodiments, the effector domain is KRAB or DNMT3A and the effector domain is located N-terminal to the dCas9 domain.
In some embodiments, the effector domain is Ezh2 and the Ezh2 effector domain comprises the conserved cysteine-rich (CXC) and Su(var)3-9, Enhancer-of-zeste and Trithorax (SET) domains. In some embodiments, the Ezh2 effector domain further comprises the embryonic ectoderm development (EED) binding domain. In some embodiments, the Ezh2 effector domain comprises amino acids 1-746 of Ezh2 (SEQ ID NO:1). In some instances, the Ezh2 effector domain is located N-terminal to the dCas9 domain.
In some embodiments, the effector domain comprises amino acids 1-45 of FOG1 (SEQ ID NO:3), a first NLS domain is located at the N-terminal end of the protein, and a second NLS domain is located at the C-terminal end of the protein. In particular embodiments, the fusion protein further comprises a FLAG epitope tag that is located between the first NLS domain and the N-terminal end of the dCas9 domain.
In some embodiments, the FOG1 effector domain comprises 1, 2, 3, or 4 FOG1 effector domains that are located between the FLAG epitope tag and the N-terminal end of the dCas9 domain. In particular embodiments, the fusion protein further comprises an amino acid linker comprising the amino acid sequence (GGS)n, wherein n is 5 (SEQ ID NO:75), and the amino acid linker is located between the FOG1 effector domain and the N-terminal end of the dCas9 domain. In some embodiments, the fusion protein further comprises an amino acid linker comprising the amino acid sequence (GGS)n, wherein n is 5 (SEQ ID NO:75), and the amino acid linker is located between the C-terminal end of the dCas9 domain and the second NLS domain.
In some embodiments, the FOG1 effector domain is located between the second NLS domain and the C-terminal end of the dCas9 domain. In particular embodiments, the fusion protein further comprises an amino acid linker comprising the amino acid sequence (GGS)n, wherein n is 5 (SEQ ID NO:75), and the amino acid linker is located between the FLAG epitope tag and the N-terminal end of the dCas9 domain. In some embodiments, the fusion protein further comprises an amino acid linker comprising the amino acid sequence (GGS)n, wherein n is 5 (SEQ ID NO:75), and the amino acid linker is located between the C-terminal end of the dCas9 domain and the FOG1 effector domain.
In some embodiments, a first FOG1 effector domain is located between the FLAG epitope tag and the N-terminal end of the dCas9 domain, and a second FOG1 effector domain is located between the C-terminal end of the dCas9 domain and the second NLS domain. In particular embodiments, the fusion protein further comprises an amino acid linker comprising the amino acid sequence (GGS)n, wherein n is 5 (SEQ ID NO:75), and the amino acid linker is located between the first FOG1 effector domain and the N-terminal end of the dCas9 domain. In some embodiments, the fusion protein further comprises an amino acid linker comprising the amino acid sequence (GGS)n, wherein n is 5 (SEQ ID NO:75), and the amino acid linker is located between the C-terminal end of the dCas9 domain and the second FOG1 effector domain.
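The FOG1 arrangements described above can be summarized as N-terminal to C-terminal domain orders. The following is a minimal sketch, not a definitive construct design: the function name `fog1_fusion`, its parameters, and the string labels are hypothetical and stand in for domains only (no actual sequences are encoded).

```python
# Illustrative only: labels name domains; they are not amino acid sequences.
GGS5 = "(GGS)5"          # linker with n = 5 (SEQ ID NO:75 in the text)
FOG1 = "FOG1[1-45]"      # N-terminal 45 residues of FOG1

def fog1_fusion(n_term_fog1=0, c_term_fog1=False, linkers=True):
    """Return the N-to-C domain order of a FOG1-dCas9 fusion.

    n_term_fog1: FOG1 copies between the FLAG tag and dCas9 (0 to 4).
    c_term_fog1: place one FOG1 copy between dCas9 and the second NLS.
    linkers:     flank dCas9 with (GGS)5 linkers on both sides.
    """
    if not 0 <= n_term_fog1 <= 4:
        raise ValueError("N-terminal FOG1 copy number must be 0 to 4")
    layout = ["NLS", "FLAG"] + [FOG1] * n_term_fog1
    if linkers:
        layout.append(GGS5)
    layout.append("dCas9")
    if linkers:
        layout.append(GGS5)
    if c_term_fog1:
        layout.append(FOG1)
    layout.append("NLS")
    return layout

# N-terminal FOG1, C-terminal FOG1, and FOG1 on both sides of dCas9
print("-".join(fog1_fusion(n_term_fog1=1)))
print("-".join(fog1_fusion(c_term_fog1=True)))
print("-".join(fog1_fusion(n_term_fog1=1, c_term_fog1=True)))
```

In each layout the first NLS and the FLAG epitope tag sit at the N-terminal end and the second NLS at the C-terminal end, as in the embodiments above.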
In another aspect, the present invention provides a nucleic acid comprising a polynucleotide sequence encoding a fusion protein provided herein. In yet another aspect, the present invention provides an expression vector comprising a nucleic acid provided herein. In still another aspect, the present invention provides a cell comprising a fusion protein provided herein or an expression vector provided herein.
In another aspect, the present invention provides a method for producing an epigenetic modification of a target chromatin site comprising a Cas9 recognition site. In some embodiments, the method comprises contacting the target chromatin site with a fusion protein provided herein. In particular embodiments, the epigenetic modification comprises acetylation, deacetylation, methylation, or a combination thereof. In some instances, methylation comprises the addition of one, two, or three methyl groups.
In some embodiments, an epigenetic modification of a nucleic acid and/or a histone protein is produced. In particular embodiments, an epigenetic modification of histone H3 is produced. In some instances, lysine 9 on histone H3 is trimethylated (H3K9me3) and/or lysine 27 on histone H3 is trimethylated (H3K27me3).
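The shorthand used for histone methylation marks (e.g., H3K9me3, H3K27me3) encodes the histone, the lysine position, and the number of methyl groups. As an illustrative sketch, the notation can be decoded as follows; the function name `parse_methyl_mark` and the `ASSOCIATION` table are hypothetical helpers, and the table lists only the associations stated in this document.

```python
import re

# Histone, lysine position, and methyl count, e.g. "H3K27me3"
MARK_RE = re.compile(r"^(H\d)K(\d+)me([123])$")

# Associations as stated in the background discussion of this document
ASSOCIATION = {
    "H3K4me3": "active transcription",
    "H3K9me3": "repressed chromatin",
    "H3K27me3": "repressed chromatin",
}

def parse_methyl_mark(mark):
    """Split a lysine methylation mark into its components."""
    m = MARK_RE.match(mark)
    if m is None:
        raise ValueError(f"not a lysine methylation mark: {mark!r}")
    histone, position, count = m.groups()
    return {
        "histone": histone,
        "lysine": int(position),
        "methyl_groups": int(count),  # one, two, or three methyl groups
    }

print(parse_methyl_mark("H3K27me3"))
print(ASSOCIATION["H3K9me3"])
```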
In some embodiments, an epigenetic modification is produced in vitro. In other embodiments, the fusion protein and the target chromatin site are in a cell. In some embodiments, the method further comprises contacting the target chromatin site with a single guide RNA (sgRNA). In particular embodiments, expression of the target chromatin site is suppressed.
Other objects, features, and advantages of the present invention will be apparent to one of skill in the art from the following detailed description and figures.
Epigenome editing is an emerging tool to alter epigenetic marks at defined genomic loci (6). Precise DNA targeting was first accomplished with the design of programmable proteins based on zinc fingers (ZFs) and Transcription Activator-Like Effectors (TALEs) (7, 8). However, the field has been revolutionized by the discovery of the RNA-guided DNA-targeting platform CRISPR/Cas9 (clustered, regularly interspaced, short palindromic repeat/CRISPR-associated protein 9) (9, 10). dCas9 can be fused to heterologous effector domains to regulate transcription in a highly specific manner (13-15).
There has been considerable focus on dCas9-tethered epigenetic enzymes that alter DNA methylation. In particular, dCas9 fusions to DNMT3A/B or TET1/2 have been shown to target the deposition of 5-methylcytosine (5-mC) or the acquisition of 5-hydroxy-mC (5-hmC, considered to be the initial step in the removal of DNA methylation), respectively (16-21). Fewer studies have explored dCas9 fusions with enzymes affecting histone modifications. Gene activation has been explored using the histone acetyltransferase p300, histone demethylase LSD1 and a H3K4 methylase (22-24). Gene repression has been attempted using dCas9-KRAB fusions (15, 25). The Krüppel associated box (KRAB) domain recruits endogenous chromatin modifying complexes including the KAP1 co-repressor complex (26, 27) and the nucleosome remodeling and deacetylase (NuRD) complex (28) and thus has the potential to both trimethylate histone H3 on lysine 9 and to deacetylate histones. Catalytic domains from several other enzymes that catalyze H3K9me3 (such as G9A and SUV39H1) have been linked to either zinc finger or TALE DNA-binding domains, causing repression of the HER2 gene promoter (29). Although H3K27me3 is associated with repression, Ezh2 (the catalytic subunit of the Polycomb repressive complex 2 that causes deposition of H3K27me3) has not yet been studied as a fusion to a programmable DNA-binding domain. Importantly, in these previous studies, only changes in gene expression were used to assess the efficacy of the targeted epigenetic regulators. Few studies have monitored the changes in histone modification at the target site bound by the epigenetic regulator. However, such studies are essential to dissect the cause-and-effect relationship between histone modifications and transcriptional regulation.
The present invention is based, in part, on the development of novel fusion proteins comprising a catalytically-inactive Cas9 (dCas9) domain and an effector domain that imparts the ability of the fusion proteins to make epigenetic modifications at target chromatin sites. In particular, the inventors discovered that fusion proteins comprising dCas9 and either Krüppel-associated box (KRAB) or the N-terminal 45 amino acids of Friend of GATA1 (FOG1) were not only able to effect epigenetic modifications of chromatin, but were also particularly potent transcriptional repressors.
As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
The terms “a,” “an,” or “the” as used herein not only include aspects with one member, but also include aspects with more than one member. For instance, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a cell” includes a plurality of such cells and reference to “the agent” includes reference to one or more agents known to those skilled in the art, and so forth.
The terms “about” and “approximately” as used herein shall generally mean an acceptable degree of error for the quantity measured given the nature or precision of the measurements. Typical, exemplary degrees of error are within 20 percent (%), preferably within 10%, and more preferably within 5% of a given value or range of values. Any reference to “about X” specifically indicates at least the values X, 0.95X, 0.96X, 0.97X, 0.98X, 0.99X, 1.01X, 1.02X, 1.03X, 1.04X, and 1.05X. Thus, “about X” is intended to teach and provide written description support for a claim limitation of, e.g., “0.98X.”
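As a worked illustration of this definition, the sketch below tests whether a measured value falls within a chosen degree of error of a reference value X. The function name `within_about` and its default tolerance are assumptions made for illustration; the 20%, 10%, and 5% tolerances are the ones named above.

```python
def within_about(value, x, tolerance=0.05):
    """Return True if value lies within +/- tolerance (a fraction) of x."""
    return abs(value - x) <= tolerance * abs(x)

# With X = 100: 98 is within 5%, 106 is not, and 115 is within 20%
print(within_about(98, 100))                   # True
print(within_about(106, 100))                  # False
print(within_about(115, 100, tolerance=0.20))  # True
```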
The term “nucleic acid” or “polynucleotide” refers to deoxyribonucleic acids (DNA) or ribonucleic acids (RNA) and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, SNPs, and complementary sequences as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid is used interchangeably with gene, cDNA, and mRNA encoded by a gene.
The term “gene” means the segment of DNA involved in producing a polypeptide chain. It may include regions preceding and following the coding region (leader and trailer) as well as intervening sequences (introns) between individual coding segments (exons).
The term “amino acid” refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. “Amino acid mimetics” refers to chemical compounds having a structure that is different from the general chemical structure of an amino acid, but that functions in a manner similar to a naturally occurring amino acid.
There are various known methods in the art that permit the incorporation of an unnatural amino acid derivative or analog into a polypeptide chain in a site-specific manner, see, e.g., WO 02/086075.
Amino acids may be referred to herein by either the commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
The term “conservatively modified variants” applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, “conservatively modified variants” refers to those nucleic acids that encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are “silent variations,” which are one species of conservatively modified variations. Every nucleic acid sequence herein that encodes a polypeptide also describes every possible silent variation of the nucleic acid. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid that encodes a polypeptide is implicit in each described sequence.
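The degeneracy described above can be made concrete with the standard genetic code. The sketch below lists the synonymous codons for a given codon and counts how many coding sequences (including the original) encode the same polypeptide. It uses the DNA alphabet (T rather than U), so the alanine codons appear as GCA, GCC, GCG, and GCT; the function names and the short example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def synonymous_codons(codon):
    """Return all codons encoding the same amino acid as the given codon."""
    aa = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)

def count_silent_variants(cds):
    """Count coding sequences that translate identically to cds."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    total = 1
    for i in range(0, len(cds), 3):
        total *= len(synonymous_codons(cds[i:i + 3]))
    return total

print(synonymous_codons("GCA"))            # ['GCA', 'GCC', 'GCG', 'GCT']
print(synonymous_codons("ATG"))            # ['ATG'] (methionine)
print(synonymous_codons("TGG"))            # ['TGG'] (tryptophan)
print(count_silent_variants("ATGGCATGG"))  # 4 (Met-Ala-Trp)
```

Only the alanine codon contributes alternatives in the Met-Ala-Trp example, which matches the statement that the methionine and tryptophan codons ordinarily have no synonyms.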
As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded sequence is a “conservatively modified variant” where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the invention.
The following eight groups each contain amino acids that are conservative substitutions for one another:
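The groups themselves are not reproduced in this text. As an illustrative sketch only, the code below assumes the eight-group conservative substitution table commonly cited in the art (see, e.g., Creighton, Proteins (1984)); the grouping shown is that assumption, not a list recovered from this document, and the function name `is_conservative` is hypothetical.

```python
# Assumed grouping (commonly cited table); one-letter amino acid codes
CONSERVATIVE_GROUPS = [
    "AG",    # 1) alanine, glycine
    "DE",    # 2) aspartic acid, glutamic acid
    "NQ",    # 3) asparagine, glutamine
    "RK",    # 4) arginine, lysine
    "ILMV",  # 5) isoleucine, leucine, methionine, valine
    "FYW",   # 6) phenylalanine, tyrosine, tryptophan
    "ST",    # 7) serine, threonine
    "CM",    # 8) cysteine, methionine
]

def is_conservative(original, replacement):
    """True if two different residues share at least one group."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)

print(is_conservative("D", "E"))  # True
print(is_conservative("D", "K"))  # False
print(is_conservative("M", "C"))  # True
```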
In the present application, amino acid residues are numbered according to their relative positions from the left most residue, which is numbered 1, in an unmodified wild-type polypeptide sequence.
The terms “polypeptide,” “peptide,” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. All three terms apply to amino acid polymers in which one or more amino acid residue is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers. As used herein, the terms encompass amino acid chains of any length, including full-length proteins, wherein the amino acid residues are linked by covalent peptide bonds.
The term “expression vector” refers to a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular polynucleotide sequence (e.g., encoding a fusion protein of the present invention or a guide RNA molecule) in a host cell. An expression cassette may be part of a plasmid, viral genome, or nucleic acid fragment. Typically, an expression cassette includes a polynucleotide to be transcribed, operably linked to a promoter. Other elements that may be present in an expression cassette include those that enhance transcription (e.g., enhancers) and terminate transcription (e.g., terminators), as well as those that confer certain binding affinity or antigenicity to the recombinant protein produced from the expression cassette.
The term “enhancer of zeste homolog 2 (Ezh2)” refers to a histone-lysine-N-methyltransferase that is encoded by the EZH2 gene and participates, for example, in the methylation of lysine 27 of histone H3 (H3K27). Ezh2 methylation facilitates the formation of heterochromatin and subsequent suppression of gene expression. Ezh2 serves as the catalytic subunit of the Polycomb Repressive Complex 2 (PRC2), which plays important roles in embryonic development through the epigenetic regulation of genes that are involved with development and differentiation. Non-limiting examples of human Ezh2 mRNA sequences are set forth under NCBI Reference Sequence identifiers NM_004456.4→NP_004447.2 (transcript variant 1), NM_152998.2→NP_694543.1 (transcript variant 2), NM_001203247.1→NP_001190176.1 (transcript variant 3), NM_001203248.1→NP_001190177.1 (transcript variant 4), and NM_001203249.1→NP_001190178.1 (transcript variant 5). A non-limiting example of a human Ezh2 amino acid sequence is set forth under NCBI Reference Sequence identifier AAH10858.1 (SEQ ID NO:67). A non-limiting example of a mouse Ezh2 amino acid sequence is set forth under NCBI Reference Sequence identifier NP_031997.2.
The term “conserved cysteine-rich (CXC) domain” refers to a region near the C-terminal end of Ezh2, located N-terminal to the SET domain, that is coordinated by two clusters of three zinc ions. Mutations within the CXC domain are associated with a decrease in histone methyltransferase activity. As a non-limiting example, the CXC domain of Ezh2 in humans spans from about amino acid 503 to about amino acid 605 of the amino acid sequence set forth under NCBI Reference Sequence identifier AAH10858.1 (SEQ ID NO:67).
The term “Su(var)3-9, Enhancer-of-zeste and Trithorax (SET) domain” refers to a protein domain that is commonly present as part of a larger multidomain protein. In the context of the present invention, a SET domain is found near the C-terminal end of Ezh2, spanning the region from about amino acid 617 to about amino acid 738. The SET domain functions as the catalytic active site of Ezh2, and can play a role in determining substrate preference. For example, mutating the tyrosine at amino acid position 641 to phenylalanine has been shown to convey a preference for H3K27 trimethylation.
The term “embryonic ectoderm development (EED) binding domain” refers to a region near the N-terminal end of Ezh2, spanning the region from about amino acid 39 to about amino acid 67, that is required for histone methyltransferase activity. In particular, the EED binding domain is required for recognition of Ezh2 by the EED protein, which is part of the PRC2.
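The approximate domain boundaries given above for Ezh2 can be used to check which domains a given fragment retains. This is a minimal sketch: the coordinates are the "about" values stated in the text, residues are numbered from 1 as elsewhere in this document, and the names `EZH2_DOMAINS`, `domains_in_fragment`, and `residue_slice` are hypothetical.

```python
# Approximate 1-based, inclusive residue ranges as stated in the text
EZH2_DOMAINS = {
    "EED-binding": (39, 67),
    "CXC": (503, 605),
    "SET": (617, 738),
}

def domains_in_fragment(start, end, domains=EZH2_DOMAINS):
    """List domains lying entirely within residues start..end (inclusive)."""
    return [name for name, (s, e) in domains.items() if start <= s and e <= end]

def residue_slice(start, end):
    """Convert a 1-based inclusive residue range to a Python slice."""
    return slice(start - 1, end)

# A fragment spanning amino acids 1-746 retains all three domains
print(domains_in_fragment(1, 746))    # ['EED-binding', 'CXC', 'SET']
# A C-terminal fragment retains only the CXC and SET domains
print(domains_in_fragment(500, 746))  # ['CXC', 'SET']
```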
The term “Friend of GATA1 (FOG1)” refers to a zinc finger protein also known as ZFPM1 that is encoded by the ZFPM1 gene and is a cofactor of the GATA1 transcription factor. FOG1 is involved with recruiting the nucleosome remodeling and deacetylase (NuRD) complex to target sites, causing deacetylation, as well as methylation of lysine 27 of histone H3 via the recruitment of the PRC2. A non-limiting example of a human FOG1 mRNA sequence is set forth under NCBI Reference Sequence identifier NM_153813.2→NP_722520.2. A non-limiting example of a human FOG1 amino acid sequence is set forth under NCBI Reference Sequence identifier AAN45858.1.
The term “euchromatic histone-lysine N-methyltransferase 2 (G9A)” refers to a histone methyltransferase that is also known as EHMT2 and is encoded by the EHMT2 gene in humans. G9A participates in the methylation of lysine 9 of histone H3 (H3K9), which is associated with the suppression of gene expression. Non-limiting examples of human G9A mRNA sequences are set forth under NCBI Reference Sequence identifiers NM_001289413.1→NP_001276342.1 (transcript variant 1), NM_006709.4→NP_006700.3 (transcript variant 2), NM_025256.6→NP_079532.5 (transcript variant 3), and NM_001318833.1→NP_001305762.1 (transcript variant 4).
The term “histone-lysine N-methyltransferase SUV39H1 (SUV39H1)” refers to an enzyme that is encoded by the SUV39H1 gene in humans. SUV39H1 contains an N-terminal chromodomain and a C-terminal Su(var)3-9, Enhancer-of-zeste and Trithorax (SET) domain. SUV39H1 participates in the methylation of H3K9, which is associated with the suppression of gene expression. Non-limiting examples of human SUV39H1 mRNA sequences are set forth under NCBI Reference Sequence identifiers NM_001282166.1→NP_001269095.1 (transcript variant 1) and NM_003173.3→NP_003164.1 (transcript variant 2). A non-limiting example of an SUV39H1 amino acid sequence is set forth under NCBI Reference Sequence identifier NP_003164.1 (isoform 2).
The term “Krüppel-associated box (KRAB)” refers to a group of transcriptional repression domains that are present in about 400 zinc finger protein-based transcription factors in humans. Typically, the KRAB domain contains about 75 amino acid residues, although the minimal repression module contains about 45 amino acid residues. Similar to FOG1, the KRAB domain functions by recruiting chromatin-modifying complexes to target sites. KRAB participates in the trimethylation of H3K9, which is achieved with the KRAB-associated protein 1 (KAP1) co-repressor complex. Over 10 independently coded KRAB domains have been identified that are functional suppressors of transcription. Non-limiting examples of human genes that encode KRAB zinc finger proteins include ZNF10, ZNF708, ZNF43, ZNF184, ZNF91, HPF4, HTF10 and HTF34. A non-limiting example of a human KRAB amino acid sequence is set forth under NCBI Reference Sequence identifier NP_056209.2.
The term “DNA (cytosine-5)-methyltransferase 3A (DNMT3A)” refers to a methyltransferase that is encoded by the DNMT3A gene in humans. DNMT3A catalyzes the methylation of CpG structures within DNA, in particular de novo DNA methylation, as opposed to maintenance methylation. DNA methylation by DNMT3A plays roles in cellular differentiation, embryonic development, transcriptional regulation (e.g., suppression of gene expression), heterochromatin formation, X-inactivation, imprinting, and the maintenance of genome stability. Non-limiting examples of human DNMT3A mRNA sequences are set forth under NCBI Reference Sequence identifiers NM_175629.2→NP_783328.1 (transcript variant 1), NM_153759.3 →NP_715640.2 (transcript variant 2), NM_022552.4→NP_072046.2 (transcript variant 3), NM_175630.1→NP_783329.1 (transcript variant 4), and NM_001320892.1→NP_001307821.1 (transcript variant 5).
The term “effector domain” refers to a protein, or a functional portion thereof, that modifies chromatin or a component thereof (e.g., a nucleic acid (e.g., DNA), a nucleotide, or a protein (e.g., a histone)). The chromatin or component thereof can be directly modified by the effector domain, or can be indirectly modified, e.g., by another protein that interacts with or is recruited by the effector domain. Non-limiting examples of modifications include methylation, demethylation, trimethylation, acetylation, deacetylation, citrullination, and combinations thereof. In some embodiments, the effector domain produces two or more different modifications (e.g., deacetylation, followed by methylation). In such embodiments, the two or more different modifications can be achieved by the effector domain interacting with different additional proteins (i.e., the effector domain recruits or interacts with different proteins, each of which participates in or produces a different modification). As non-limiting examples, an effector domain can participate in the epigenetic modification of nucleotides, specific structures within nucleic acids (e.g., CpG structures), histones (e.g., histone H3), specific amino acid residues within a histone (e.g., lysine residues such as lysine 9 or lysine 27 of histone H3), or any combination thereof.
The terms “nuclear localization signal (NLS)” and “nuclear localization signal domain” refer to a peptide comprising an amino acid sequence that causes a protein (i.e., a protein to which the NLS is attached) to be imported into the nucleus of a cell. Typically, an NLS comprises one or more short sequences of positively charged amino acid residues (e.g., lysine, arginine). NLSs are commonly classified as being either classical or non-classical. Furthermore, classical NLSs are commonly classified as being either monopartite or bipartite, wherein bipartite NLSs contain two clusters of basic amino acid residues that are separated by a short peptide linker (e.g., a peptide of about 10 amino acids in length). A non-limiting example of a monopartite NLS is the Simian Vacuolating Virus 40 (SV40) NLS, having the sequence PKKKRKV (SEQ ID NO:68) or PKKKRKVG (SEQ ID NO:69). A non-limiting example of a bipartite NLS is KRPAATKKAGQAKKKK (SEQ ID NO:70). Classical NLSs are commonly recognized by the importin α class of nuclear import adaptor proteins, which are in turn recognized by importin β. Non-classical NLSs are typically recognized by importin β receptors without the involvement of importin α proteins.
The term “FLAG epitope tag” refers to a peptide having the sequence motif DYKDDDDK (SEQ ID NO:65). FLAG epitope tags can be used for protein purification (e.g., by affinity chromatography). In addition, by using antibodies that recognize the FLAG epitope, FLAG epitope tags can be used for the detection of proteins (i.e., when the protein or a complex comprising the protein contains the FLAG epitope tag), which is especially useful when no antibody specific for the protein of interest is readily available. In some instances, a FLAG epitope tag comprises a longer sequence, such as DYKDHDGDYKDHDIDYKDDDDK (SEQ ID NO:66).
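Because the SV40 NLS and the FLAG epitope tag are short fixed motifs, their presence and position in a protein sequence can be checked by simple string search. The following is a minimal sketch; the function name `find_tags` and the short test peptide are hypothetical, while the motif sequences are those given above (SEQ ID NOS:68 and 65). Positions are 1-based.

```python
KNOWN_TAGS = {
    "SV40 NLS": "PKKKRKV",   # SEQ ID NO:68
    "FLAG": "DYKDDDDK",      # SEQ ID NO:65
}

def find_tags(protein, tags=KNOWN_TAGS):
    """Return (tag name, 1-based start) for every motif occurrence."""
    hits = []
    for name, motif in tags.items():
        start = protein.find(motif)
        while start != -1:
            hits.append((name, start + 1))
            start = protein.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])

# The longer FLAG sequence (SEQ ID NO:66) ends with the core FLAG motif
print(find_tags("DYKDHDGDYKDHDIDYKDDDDK"))  # [('FLAG', 15)]

# Hypothetical peptide: Met, SV40 NLS, FLAG, then a short GGS linker
print(find_tags("MPKKKRKVDYKDDDDKGGS"))     # [('SV40 NLS', 2), ('FLAG', 9)]
```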
The term “amino acid linker” refers to a contiguous sequence of amino acids that links one domain or portion of a fusion protein of the present invention to another. Amino acid linkers can contain natural amino acids, unnatural amino acids, or a combination thereof. In the context of the present invention, amino acid linkers commonly comprise a combination of glycine and serine amino acids. An amino acid linker can be of any length, and can contain any number of repeat units (e.g., repeat units comprising the sequence GGS (SEQ ID NO: 71)). Repeat units can be of any length.
The term “dCas9” refers to a Cas9 nuclease that contains one or more mutations that decrease or abolish the nuclease activity of Cas9, but leave the ability of Cas9 to function as an RNA-guided DNA-binding protein intact. As a non-limiting example, dCas9 can refer to a Cas9 nuclease that contains the two single amino acid mutations D10A and H840A, which render Cas9 catalytically inactive. The terms “catalytically inactive Cas9 domain” and “dCas9 domain” refer to a dCas9 protein, or a functional portion thereof (i.e., a portion of dCas9 that retains the ability of the protein to function as an RNA-guided DNA-binding protein), that recognizes and binds to a Cas9 recognition site as described herein.
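Substitution notation such as D10A or H840A names the wild-type residue, its 1-based position, and the replacement residue. As an illustrative sketch, the helper below applies such a substitution after confirming that the expected wild-type residue is present. The function name `apply_substitution` is hypothetical, and the placeholder sequence is a toy string built only so that positions 10 and 840 hold D and H; it is not a Cas9 sequence.

```python
import re

def apply_substitution(seq, mutation):
    """Apply a point substitution written as <wild type><position><new>."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, position, new = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    if seq[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {seq[position - 1]}"
        )
    return seq[:position - 1] + new + seq[position:]

# Toy placeholder (NOT Cas9): glycines with D at position 10, H at 840
toy = "G" * 9 + "D" + "G" * 829 + "H"
dead = apply_substitution(apply_substitution(toy, "D10A"), "H840A")
print(dead[9], dead[839])  # A A
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors when a sequence is numbered from a different starting residue.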
In one aspect, the present invention provides a fusion protein comprising (1) a catalytically inactive Cas nuclease (e.g., catalytically inactive Cas9, or dCas9) domain and (2) an effector domain. In some embodiments, the effector domain is selected from the group consisting of enhancer of zeste homolog 2 (Ezh2), Friend of GATA1 (FOG1), histone H3 lysine 9 methyltransferase G9A (G9A), histone-lysine N-methyltransferase SUV39H1 (SUV39H1), Krüppel-associated box (KRAB), and DNA (cytosine-5)-methyltransferase 3A (DNMT3A). The fusion protein can contain 1, 2, 3, 4, 5, or 6 effector domains selected from the group consisting of Ezh2, FOG1, G9A, SUV39H1, KRAB, and DNMT3A. In particular embodiments, the effector domain is Ezh2 and/or FOG1. In some embodiments, the fusion protein comprises a DNMT3A effector domain and a full-length Ezh2 domain. The effector domain can comprise a full-length protein (e.g., full-length Ezh2, FOG1, G9A, SUV39H1, KRAB, or DNMT3A) or can comprise a functional portion or fragment of the full-length protein. In some embodiments, the effector domain comprises a catalytic domain of the full-length protein (e.g., a domain that is capable of producing an epigenetic modification (e.g., acetylation or methylation)).
The effector domain can be located either N-terminal or C-terminal to the catalytically inactive Cas nuclease (e.g., dCas9) domain. In some embodiments, the effector domain is located both N-terminal and C-terminal to the catalytically inactive Cas nuclease domain. In some embodiments, the fusion protein further comprises a nuclear localization signal (NLS) domain. In other embodiments, the fusion protein further comprises a FLAG epitope tag. In some embodiments, the fusion protein further comprises an amino acid linker. The fusion protein can comprise 1, 2, 3, 4, 5, or more NLS domains, FLAG epitope tags, and/or amino acid linkers, which can be present in any number of combinations. In some embodiments, the fusion protein comprises two NLS domains. In other embodiments, the fusion protein comprises a FLAG epitope tag. In particular embodiments, the fusion protein comprises two NLS domains and a FLAG epitope tag.
In some embodiments, the amino acid linker has about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 amino acid residues. In other embodiments, the amino acid linker has at least about 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more amino acid residues. In some embodiments, the amino acid linker comprises one or more repeat units (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more repeat units). Each repeat unit can have, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more amino acid residues. The amino acids can be natural amino acids, unnatural amino acids, or any combination thereof. In some embodiments, the amino acid linker comprises repeat units that have three amino acids (e.g., GGS (SEQ ID NO: 71)). In particular embodiments, the amino acid linker has the amino acid sequence (GGS)n (SEQ ID NO: 71), wherein the subscript n is the number of repeat units. In some embodiments, the amino acid linker comprises the amino acid sequence of any one of SEQ ID NOS:71-80. In some instances, n is 5 (SEQ ID NO:75).
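As a non-limiting illustration of the (GGS)n repeat-unit notation, the following minimal Python sketch builds a linker string from a repeat unit and a repeat count. The function name build_linker and its parameters are hypothetical and are used for illustration only; they are not part of the disclosure.

```python
def build_linker(repeat_unit: str = "GGS", n: int = 5) -> str:
    """Return a linker made of n copies of repeat_unit (one-letter amino acid codes)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not repeat_unit:
        raise ValueError("repeat_unit must not be empty")
    return repeat_unit * n


# Example: three-residue repeat unit GGS with n = 5 repeat units.
linker = build_linker("GGS", 5)
print(linker, len(linker))  # GGSGGSGGSGGSGGS 15
```

With n equal to 5 the sketch yields a 15-residue linker, which falls within the linker lengths recited above.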
In some embodiments, the NLS domain, the FLAG epitope tag, and/or the amino acid linker are located N-terminal to the catalytically inactive Cas nuclease (e.g., dCas9) domain. In other embodiments, the NLS domain, the FLAG epitope tag, and/or the amino acid linker are located C-terminal to the catalytically inactive Cas nuclease domain. In some embodiments, the NLS domain, the FLAG epitope tag, and/or the amino acid linker are located both N-terminal to the catalytically inactive Cas nuclease domain and C-terminal to the catalytically inactive Cas nuclease domain. As non-limiting examples, amino acid linkers can be located between the catalytically inactive Cas nuclease domain and the effector domain, between two or more effector domains (i.e., when the fusion protein comprises a plurality of effector domains), between the catalytically inactive Cas nuclease domain and the NLS domain, between the catalytically inactive Cas nuclease domain and the FLAG epitope tag, between the FLAG epitope tag and the NLS domain, or any combination thereof.
When the fusion protein comprises two or more effector domains, they can be all of the same type, they can each be different, or a combination thereof. The effector domains can be located N-terminal to the catalytically inactive Cas nuclease (e.g., dCas9) domain, C-terminal to the catalytically inactive Cas nuclease domain, or both N-terminal to the catalytically inactive Cas nuclease domain and C-terminal to the catalytically inactive Cas nuclease domain. In some embodiments, the fusion protein comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more effector domains that are N-terminal to the catalytically inactive Cas nuclease domain. In other embodiments, the fusion protein comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more effector domains that are C-terminal to the catalytically inactive Cas nuclease domain. In still other embodiments, the fusion protein comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more effector domains that are N-terminal to the catalytically inactive Cas nuclease domain and 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more effector domains that are C-terminal to the catalytically inactive Cas nuclease domain. In some instances, the fusion protein comprises one effector domain that is N-terminal to the catalytically inactive Cas nuclease domain. In other instances, the fusion protein comprises one effector domain that is C-terminal to the catalytically inactive Cas nuclease domain. In some other instances, the fusion protein comprises one or more effector domains that are N-terminal to the catalytically inactive Cas nuclease domain and one or more effector domains that are C-terminal to the catalytically inactive Cas nuclease domain. In still other instances, the fusion protein comprises 2, 3, or 4 effector domains that are N-terminal to the catalytically inactive Cas nuclease domain.
In some embodiments, the effector domain is KRAB and is located N-terminal to a dCas9 domain. In particular embodiments, the fusion protein comprises a single KRAB effector domain, and the KRAB effector domain is not located C-terminal to a dCas9 domain. In other embodiments, the effector domain is DNMT3A and is located N-terminal to a dCas9 domain. In particular embodiments, the fusion protein comprises a single DNMT3A effector domain, and the DNMT3A effector domain is not located C-terminal to a dCas9 domain.
In some embodiments, the fusion protein comprises an effector domain that comprises a functional portion of Ezh2. In some embodiments, the functional portion of Ezh2 comprises the Su(var)3-9, Enhancer-of-zeste and Trithorax (SET) domain. In particular embodiments, the functional portion of Ezh2 comprises the CXC and SET domains. In some embodiments, the functional portion of Ezh2 further comprises the embryonic ectoderm development (EED) binding domain. In particular embodiments, the functional portion of Ezh2 comprises the SET domain, the CXC domain, and the EED binding domain. In some instances, the fusion protein comprises an effector domain that comprises a full-length Ezh2 protein. As a non-limiting example, the full-length Ezh2 effector domain can comprise amino acids 1-746 of SEQ ID NO:1.
In some embodiments, the fusion protein comprises an effector domain that comprises a functional portion of FOG1. In some embodiments, the functional portion of FOG1 comprises the N-terminal 45 amino acids of a full-length FOG1 protein (e.g., the full-length FOG1 protein having the amino acid sequence set forth under NCBI Reference Sequence identifier AAN45858.1). As a non-limiting example, the functional portion of FOG1 can comprise the amino acid sequence set forth in SEQ ID NO:3 (FOG1[1-45]). In other embodiments, the fusion protein comprises an effector domain that comprises a full-length FOG1 protein (e.g., the full-length FOG1 protein having the amino acid sequence set forth under NCBI Reference Sequence identifier AAN45858.1). When the fusion protein comprises a plurality of FOG1 effector domains, they can all comprise a functional portion of FOG1, they can all comprise full-length FOG1, or a combination thereof. In some embodiments, the fusion protein comprises 1, 2, 3, 4, or more effector domains, wherein each effector domain comprises the amino acid sequence set forth in SEQ ID NO:3 (FOG1[1-45]). In some embodiments, the functional portion of FOG1 comprises the N-terminal 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, or 45 amino acids of a full-length FOG1 protein (e.g., the full-length FOG1 protein having the amino acid sequence set forth under NCBI Reference Sequence identifier AAN45858.1). In other embodiments, the functional portion of FOG1 comprises the first about 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1,000, or more N-terminal amino acids of a full-length FOG1 protein (e.g., the full-length FOG1 protein having the amino acid sequence set forth under NCBI Reference Sequence identifier AAN45858.1).
In some embodiments, the fusion protein comprises an effector domain that comprises the amino acid sequence set forth in SEQ ID NO:3 (FOG1[1-45]). In some instances, the fusion protein further comprises an NLS domain that is located at the N-terminal end of the fusion protein. In some instances, the fusion protein further comprises an NLS domain that is located at the C-terminal end of the fusion protein. In particular embodiments, the fusion protein comprises a first NLS domain that is located at the N-terminal end of the fusion protein, and a second NLS domain that is located at the C-terminal end of the fusion protein. In some embodiments, the fusion protein further comprises a FLAG epitope tag that is located between the NLS domain at the N-terminal end of the protein and the N-terminal end of the catalytically inactive Cas nuclease (e.g., dCas9) domain. As a non-limiting example, the fusion protein can further comprise a FLAG epitope tag that is located between the first NLS domain and the N-terminal end of the catalytically inactive Cas nuclease domain.
In some embodiments, the fusion protein comprises 1, 2, 3, or 4 FOG1 effector domains that are located between the FLAG epitope tag and the N-terminal end of the catalytically inactive Cas nuclease (e.g., dCas9) domain. In some instances, each of the 1, 2, 3, or 4 FOG1 effector domains comprises the amino acid sequence set forth in SEQ ID NO:3 (FOG1[1-45]). In particular embodiments, the fusion protein further comprises one or more amino acid linkers. In some instances, the amino acid linker(s) comprise the amino acid sequence (GGS)n, wherein n is 5 (SEQ ID NO:75). The amino acid linker(s) can be located between the FOG1 effector domain (e.g., 1, 2, 3, or 4 FOG1 effector domains) and the N-terminal end of the catalytically inactive Cas nuclease domain, between the C-terminal end of the catalytically inactive Cas nuclease domain and the second NLS domain, between adjacent FOG1 effector domains (i.e., when the fusion protein contains 2 or more FOG1 effector domains), between the FOG1 effector domain and the FLAG epitope tag, between the FLAG epitope tag and the first NLS domain, or a combination thereof.
In some embodiments, the FOG1 effector domain is located between the second NLS domain and the C-terminal end of the catalytically inactive Cas nuclease (e.g., dCas9) domain. In some instances, the FOG1 effector domain comprises the amino acid sequence set forth in SEQ ID NO:3 (FOG1[1-45]). In particular embodiments, the fusion protein further comprises one or more amino acid linkers. In some instances, the amino acid linker(s) comprise the amino acid sequence (GGS)n, wherein n is 5 (SEQ ID NO:75). The amino acid linker(s) can be located between the FLAG epitope tag and the N-terminal end of the catalytically inactive Cas nuclease domain, between the C-terminal end of the catalytically inactive Cas nuclease domain and the FOG1 effector domain, between the FOG1 effector domain and the second NLS domain, between the FLAG epitope tag and the first NLS domain, or a combination thereof.
In some embodiments, a first FOG1 effector domain is located between the FLAG epitope tag and the N-terminal end of the catalytically inactive Cas nuclease (e.g., dCas9) domain, and a second FOG1 effector domain is located between the C-terminal end of the catalytically inactive Cas nuclease domain and the second NLS domain. In some instances, one or both FOG1 effector domains comprise the amino acid sequence set forth in SEQ ID NO:3 (FOG1[1-45]). In particular embodiments, the fusion protein further comprises one or more amino acid linkers. In some instances, the amino acid linker(s) comprise the amino acid sequence (GGS)n, wherein n is 5 (SEQ ID NO:75). The amino acid linker(s) can be located between the first FOG1 effector domain and the N-terminal end of the catalytically inactive Cas nuclease domain, between the C-terminal end of the catalytically inactive Cas nuclease domain and the second FOG1 effector domain, between the first FOG1 effector domain and the FLAG epitope tag, between the second FOG1 effector domain and the second NLS domain, between the FLAG epitope tag and the first NLS domain, or a combination thereof.
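As a non-limiting illustration of one such arrangement, the following Python sketch lists domain labels from the N-terminus to the C-terminus for a fusion protein having FOG1 effector domains on one or both sides of the dCas9 domain. The function name fusion_layout and the text labels are hypothetical placeholders rather than sequences, and the sketch assumes only one of the many linker placements described above: a linker after each N-terminal FOG1 copy and before each C-terminal FOG1 copy.

```python
from typing import List


def fusion_layout(n_fog1: int = 1, c_fog1: int = 1, linker: str = "(GGS)5") -> List[str]:
    """List domain labels from N-terminus to C-terminus.

    Labels are placeholders, not amino acid sequences. A linker follows each
    N-terminal FOG1 copy and precedes each C-terminal FOG1 copy, so linkers
    sit between adjacent FOG1 copies and between FOG1 and dCas9.
    """
    if n_fog1 < 0 or c_fog1 < 0:
        raise ValueError("domain counts must be non-negative")
    parts = ["NLS", "FLAG"]           # first NLS domain, then FLAG epitope tag
    for _ in range(n_fog1):
        parts += ["FOG1[1-45]", linker]
    parts.append("dCas9")
    for _ in range(c_fog1):
        parts += [linker, "FOG1[1-45]"]
    parts.append("NLS")               # second NLS domain at the C-terminal end
    return parts


print("-".join(fusion_layout(1, 1)))
# NLS-FLAG-FOG1[1-45]-(GGS)5-dCas9-(GGS)5-FOG1[1-45]-NLS
```

Setting c_fog1 to 0 and n_fog1 to 1, 2, 3, or 4 gives the N-terminal-only arrangements, and setting n_fog1 to 0 gives the C-terminal-only arrangement.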
In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO:9. In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO:10. In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO:9 or 10, and the position denoted by “Z” comprises the amino acid sequence of SEQ ID NO:1. In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO:9 or 10, and the position denoted by “Z” comprises the amino acid sequence of SEQ ID NO:2. In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO:9 or 10, and the position denoted by “Z” comprises the amino acid sequence of SEQ ID NO:3. In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO:9 or 10, and the position denoted by “Z” comprises the amino acid sequence of SEQ ID NO:4. In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO:9 or 10, and the position denoted by “Z” comprises the amino acid sequence of SEQ ID NO:5. In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO:9 or 10, and the position denoted by “Z” comprises the amino acid sequence of SEQ ID NO:6. In some embodiments, the fusion protein comprises the amino acid sequence of SEQ ID NO:9 or 10, and the position denoted by “Z” comprises the amino acid sequence of SEQ ID NO:7.
In some embodiments, the fusion protein comprises an amino acid sequence having at least about 90% (e.g., at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%) identity to any one of SEQ ID NOS:81-94. In some embodiments, the fusion protein comprises the amino acid sequence of any one of SEQ ID NOS:81-94.
Fusion proteins of the present invention comprise a catalytically inactive Cas nuclease domain that comprises a catalytically inactive Cas nuclease (e.g., catalytically inactive Cas9, or dCas9) protein, or a fragment thereof, that has the ability to target a particular polynucleotide sequence (e.g., a Cas9 recognition site) within a target chromatin site. As a non-limiting example, a catalytically inactive variant of Cas9 can be used in which the Cas9 contains two single amino acid mutations (e.g., D10A, H840A) that abolish its nuclease activity, giving rise to an RNA-guided DNA-binding protein that lacks enzymatic activity (dCas9) (10). Typically, the catalytically inactive Cas nuclease domain will comprise a Cas nuclease, or a fragment thereof, that contains one or more mutations that abolish or decrease the ability of the Cas nuclease to cleave DNA, but allow the Cas nuclease to retain the ability to recognize a desired polynucleotide sequence.
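As a non-limiting illustration of the substitution notation used above (e.g., D10A denotes replacement of aspartate at position 10 with alanine), the following Python sketch applies such point mutations to a one-letter amino acid string and checks that the stated wild-type residue is actually present. The function name apply_point_mutations is hypothetical, and the sequence in the example is a toy string, not a Cas9 sequence.

```python
import re
from typing import Iterable


def apply_point_mutations(seq: str, mutations: Iterable[str]) -> str:
    """Apply substitutions written as <wild-type><1-based position><new>, e.g. 'D10A'."""
    residues = list(seq)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if position < 1 or position > len(residues):
            raise ValueError(f"position out of range: {mutation}")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = new
    return "".join(residues)


# Toy sequence (hypothetical) with D at position 10.
print(apply_point_mutations("GGGGGGGGGDGGG", ["D10A"]))  # GGGGGGGGGAGGG
```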
Cas nucleases (e.g., Cas9) are part of what is known as the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated protein (Cas) nuclease system, which is an engineered nuclease system that is based on the adaptive immune response of many bacteria and archaea. Briefly, when a bacterium is invaded by a virus or plasmid, segments of the viral or plasmid DNA are converted into CRISPR RNAs (crRNA) by the “immune” response. The crRNA then associates with a type of RNA called tracrRNA to guide the Cas nuclease to a region that is homologous to the crRNA in the target DNA called a “protospacer.” In the case of catalytically active Cas nucleases, the DNA is cleaved by the Cas nuclease to generate blunt ends at double-strand break sites that are specified by a guide sequence contained within the crRNA transcript that is about 20 nucleotides in length. Depending on the particular Cas nuclease, both the crRNA and the tracrRNA may be required for site-specific DNA recognition and cleavage. This system has been modified such that the crRNA and tracrRNA, if needed, can be combined into one molecule (i.e., a “single guide RNA” or “sgRNA”), and the crRNA equivalent portion of the guide RNA can be engineered to guide the Cas (e.g., Cas9) nuclease to target any desired sequence (e.g., a nucleotide sequence within a target chromatin site).
Catalytically inactive variants of any number of Cas nucleases can be used in fusion proteins of the present invention. There are three main types of Cas nucleases (type I, type II, and type III), and 10 subtypes including 5 type I, 3 type II, and 2 type III proteins. Type II Cas nucleases include Cas1, Cas2, Csn2, Cas9, and Cpf1. A number of Cas nucleases will be known to one of skill in the art, for which catalytically inactive variants (e.g., mutants) thereof and homologs, fragments, derivatives, and combinations of the catalytically inactive variants find utility in fusion proteins of the present invention.
Non-limiting examples of additional Cas nucleases for which catalytically inactive variants find utility in fusion proteins of the present invention include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cpf1. For each of these examples, one of skill in the art will be able to identify mutants in which catalytic ability is abolished or decreased, but polynucleotide sequence-targeting ability is retained.
Catalytically inactive variants of Cas nucleases can be derived from a variety of bacterial species including, but not limited to, Veillonella atypica, Fusobacterium nucleatum, Filifactor alocis, Solobacterium moorei, Coprococcus catus, Treponema denticola, Peptoniphilus duerdenii, Catenibacterium mitsuokai, Streptococcus mutans, Listeria innocua, Staphylococcus pseudintermedius, Acidaminococcus intestini, Olsenella uli, Oenococcus kitaharae, Bifidobacterium bifidum, Lactobacillus rhamnosus, Lactobacillus gasseri, Finegoldia magna, Mycoplasma mobile, Mycoplasma gallisepticum, Mycoplasma ovipneumoniae, Mycoplasma canis, Mycoplasma synoviae, Eubacterium rectale, Streptococcus thermophilus, Eubacterium dolichum, Lactobacillus coryniformis subsp. torquens, Ilyobacter polytropus, Ruminococcus albus, Akkermansia muciniphila, Acidothermus cellulolyticus, Bifidobacterium longum, Bifidobacterium dentium, Corynebacterium diphtheriae, Elusimicrobium minutum, Nitratifractor salsuginis, Sphaerochaeta globus, Fibrobacter succinogenes subsp. succinogenes, Bacteroides fragilis, Capnocytophaga ochracea, Rhodopseudomonas palustris, Prevotella micans, Prevotella ruminicola, Flavobacterium columnare, Aminomonas paucivorans, Rhodospirillum rubrum, Candidatus Puniceispirillum marinum, Verminephrobacter eiseniae, Ralstonia syzygii, Dinoroseobacter shibae, Azospirillum, Nitrobacter hamburgensis, Bradyrhizobium, Wolinella succinogenes, Campylobacter jejuni subsp. jejuni, Helicobacter mustelae, Bacillus cereus, Acidovorax ebreus, Clostridium perfringens, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria meningitidis, Pasteurella multocida subsp. multocida, Sutterella wadsworthensis, proteobacterium, Legionella pneumophila, Parasutterella excrementihominis, and Francisella novicida.
For each of these examples, one of skill in the art will be able to clone nucleases and subsequently identify mutants in which catalytic ability is abolished or decreased, but polynucleotide sequence-targeting ability is retained.
“Cas9” refers to a particular type II Cas nuclease that is an RNA-guided double-stranded DNA-binding nuclease protein. Catalytically active Cas9 nuclease has two functional domains, e.g., RuvC and HNH, that cut different DNA strands. Cas9 requires two RNA molecules (e.g., a crRNA and a tracrRNA), or alternatively, a single guide RNA (sgRNA) that comprises a crRNA and a tracrRNA. Cas9 utilizes a G-rich protospacer-adjacent motif (PAM) that is 3′ of the guide RNA targeting sequence and creates double-strand cuts having blunt ends. As non-limiting examples, the amino acid sequence of the Streptococcus pyogenes wild-type Cas9 polypeptide is set forth, e.g., in NCBI Ref. Seq. No. NP_269215 and the amino acid sequence of Streptococcus thermophilus wild-type Cas9 polypeptide is set forth, e.g., in NCBI Ref. Seq. No. WP_011681470.
The fusion proteins of the present invention are typically guided to a target site (e.g., a target chromatin site containing a Cas recognition site (e.g., Cas9 recognition site)) by a guide RNA (gRNA) (e.g., a single guide RNA (sgRNA)). The gRNAs for use in the methods of the present invention typically include a crRNA sequence that is complementary to a target nucleic acid sequence and may include a scaffold sequence (e.g., tracrRNA) that interacts with a Cas nuclease variant (e.g., dCas9) or fragment thereof, depending on the particular nuclease being used.
The gRNA molecule can comprise any nucleic acid sequence, so long as the sequence has sufficient complementarity to the intended target polynucleotide sequence (e.g., target DNA sequence at or near the target chromatin site) to permit hybridization with the target sequence and direct sequence-specific binding of a catalytically inactive Cas nuclease domain (and thus the fusion protein) to the target sequence. The gRNA molecule typically recognizes a PAM sequence that is near or adjacent to the target sequence. The target DNA site may be located immediately 5′ of a PAM sequence, the PAM sequence being specific to the particular bacterial species of the catalytically inactive Cas nuclease being used. Non-limiting examples of PAM sequences include NGG (Streptococcus pyogenes), NNNNGATT (Neisseria meningitidis), NNAGAA (Streptococcus thermophilus), NAAAAC (Treponema denticola). The PAM sequence can be NGG, wherein N is any nucleotide, NRG, wherein N is any nucleotide and R is a purine, or NNGRR, wherein N is any nucleotide and R is a purine. For Cas nucleases derived from S. pyogenes, the target sequence should immediately precede (i.e., be located 5′ of) a 5′NGG PAM.
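As a non-limiting illustration of locating candidate target sequences that immediately precede an NGG PAM, the following Python sketch scans a DNA string for positions at which a protospacer of a chosen length is directly followed by NGG. The function name find_ngg_targets and the default protospacer length of 20 nucleotides are assumptions made for illustration. The sketch examines only the strand supplied and does not search the reverse complement.

```python
from typing import List, Tuple


def find_ngg_targets(dna: str, protospacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, pam) for each site where a protospacer
    of protospacer_len nucleotides is immediately 5' of an NGG PAM."""
    dna = dna.upper()
    hits = []
    # i is the 0-based index of the first PAM nucleotide (the "N" of NGG).
    for i in range(protospacer_len, len(dna) - 2):
        pam = dna[i:i + 3]
        if pam[0] in "ACGT" and pam[1:] == "GG":
            start = i - protospacer_len
            hits.append((start, dna[start:i], pam))
    return hits


# Toy example with a 4-nucleotide protospacer for readability.
print(find_ngg_targets("ACGTAGGT", protospacer_len=4))  # [(0, 'ACGT', 'AGG')]
```

Other PAM sequences, such as those listed above for other bacterial species, could be handled by replacing the NGG test with the corresponding pattern.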
In some embodiments, the degree of complementarity between a guide sequence of the gRNA (i.e., the crRNA sequence) and its corresponding target sequence is about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more. In some embodiments, a crRNA sequence is about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides in length. In some embodiments, the crRNA sequence is about 20, 21, 22, 23, 24, or 25 nucleotides in length.
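As a non-limiting illustration of how a degree of complementarity might be computed, the following Python sketch counts Watson-Crick base pairs between an RNA guide sequence and a DNA target strand of equal length, pairing the two antiparallel. The function name percent_complementarity is hypothetical, and the sketch assumes an ungapped comparison with no wobble pairing; the alignment algorithms discussed elsewhere herein can be used where gaps must be considered.

```python
# RNA base -> complementary DNA base (Watson-Crick pairing only).
PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    """Percent of guide positions that base-pair with the target strand.

    Both sequences are written 5'->3'; the strands pair antiparallel, so
    guide position i is compared with target position len - 1 - i.
    """
    guide = guide_rna.upper()
    target = target_dna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(PAIR.get(g) == t for g, t in zip(guide, reversed(target)))
    return 100.0 * paired / len(guide)


print(percent_complementarity("AUGC", "GCAT"))  # 100.0
print(percent_complementarity("AUGC", "GCAA"))  # 75.0
```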
In some embodiments, the length of the gRNA molecule is about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, or more nucleotides in length. In some instances, the length of the gRNA is about 100 nucleotides in length.
Non-limiting examples of algorithms for determining sequence complementarity include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND, SOAP, and Maq.
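As a non-limiting illustration of one of the listed approaches, the following Python sketch computes a Needleman-Wunsch global alignment score for two sequences using dynamic programming. The function name needleman_wunsch_score and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; the sketch returns only the optimal score and does not reconstruct the alignment itself.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b (linear gap penalty)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))  # 7
print(needleman_wunsch_score("ACGT", "ACT"))         # 2
```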
Basic texts disclosing general methods and techniques in the field of recombinant genetics include Sambrook and Russell, Molecular Cloning, A Laboratory Manual (3rd ed. 2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Ausubel et al., eds., Current Protocols in Molecular Biology (1994).
For nucleic acids, sizes are given in either kilobases (kb) or base pairs (bp). These are estimates derived from agarose or acrylamide gel electrophoresis, from sequenced nucleic acids, or from published DNA sequences. For proteins, sizes are given in kilodaltons (kDa) or amino acid residue numbers. Protein sizes are estimated from gel electrophoresis, from sequenced proteins, from derived amino acid sequences, or from published protein sequences.
Oligonucleotides that are not commercially available can be chemically synthesized, e.g., according to the solid phase phosphoramidite triester method first described by Beaucage & Caruthers, Tetrahedron Lett. 22: 1859-1862 (1981), using an automated synthesizer, as described in Van Devanter et al., Nucleic Acids Res. 12: 6159-6168 (1984). Purification of oligonucleotides is performed using any art-recognized strategy, e.g., native acrylamide gel electrophoresis or anion-exchange HPLC as described in Pearson & Reanier, J. Chrom. 255: 137-149 (1983).
The sequence of a protein domain or gene of interest, such as a Cas nuclease (e.g., Cas9) domain or an effector domain, can be verified after cloning or subcloning using, e.g., the chain termination method for sequencing double-stranded templates of Wallace et al., Gene 16: 21-26 (1981).
A large number of possible tags may be used for practicing the present invention. Non-limiting examples include: biotin (small molecule); StrepTag (StrepII) (8 a.a.); SBP (38 a.a.); biotin carboxyl carrier protein or BCCP (100 a.a.); epitope tags such as FLAG (8 a.a.), 3× FLAG (22 a.a.), and myc (22 a.a.); S-tag (Novagen) (15 a.a.); Xpress (Invitrogen) (25 a.a.); eXact (Bio-Rad) (75 a.a.); HA (9 a.a.); VSV-G (11 a.a.); Protein A/G (280 a.a.); HIS (6-10 a.a.) (SEQ ID NO: 96); glutathione s-transferase or GST (218 a.a.); maltose binding protein or MBP (396 a.a.); CBP (28 a.a.); CYD (5 a.a.); HPC (12 a.a.); CBD intein-chitin binding domain (51 a.a.); Trx (Invitrogen) (109 a.a.); NorpA (5 a.a.); and NusA (495 a.a.).
In another aspect, the present invention provides nucleic acids that comprise a polynucleotide sequence encoding a fusion protein of the present invention. The rapid progress in the studies of the human genome has made possible a cloning approach where a human DNA sequence database can be searched for any gene segment that has a certain percentage of sequence homology to a known nucleotide sequence, such as one encoding a previously identified Cas nuclease (e.g., Cas9) or an effector domain protein described herein. Any DNA sequence so identified can be subsequently obtained by chemical synthesis and/or a polymerase chain reaction (PCR) technique such as the overlap extension method. For a short sequence, completely de novo synthesis may be sufficient; whereas further isolation of a full-length coding sequence from a human cDNA or genomic library using a synthetic probe may be necessary to obtain a larger gene.
Alternatively, a nucleic acid sequence can be isolated from a cDNA or genomic DNA library (e.g., human or rodent cDNA or genomic DNA library) using standard cloning techniques such as polymerase chain reaction (PCR), where homology-based primers can often be derived from a known nucleic acid sequence. Most commonly used techniques for this purpose are described in standard texts, e.g., Sambrook and Russell, supra.
cDNA libraries may be commercially available or can be constructed. The general methods of isolating mRNA, making cDNA by reverse transcription, ligating cDNA into a recombinant vector, transfecting into a recombinant host for propagation, screening, and cloning are well known (see, e.g., Gubler and Hoffman, Gene, 25: 263-269 (1983); Ausubel et al., supra). Upon obtaining an amplified segment of nucleotide sequence by PCR, the segment can be further used as a probe to isolate the full length polynucleotide sequence encoding the protein of interest from the cDNA library. A general description of appropriate procedures can be found in Sambrook and Russell, supra.
A similar procedure can be followed to obtain a full-length sequence encoding a protein of interest from a human genomic library. Human genomic libraries are commercially available or can be constructed according to various art-recognized methods. In general, to construct a genomic library, the DNA is first extracted from a tissue where a protein of interest is likely found. The DNA is then either mechanically sheared or enzymatically digested to yield fragments of about 12-20 kb in length. The fragments are subsequently separated by gradient centrifugation from polynucleotide fragments of undesired sizes and are inserted in bacteriophage λ vectors. These vectors and phages are packaged in vitro. Recombinant phages are analyzed by plaque hybridization as described in Benton and Davis, Science, 196: 180-182 (1977). Colony hybridization is carried out as described by Grunstein et al., Proc. Natl. Acad. Sci. USA, 72: 3961-3965 (1975).
Based on sequence homology, degenerate oligonucleotides can be designed as primer sets and PCR can be performed under suitable conditions (see, e.g., White et al., PCR Protocols: Current Methods and Applications, 1993; Griffin and Griffin, PCR Technology, CRC Press Inc. 1994) to amplify a segment of nucleotide sequence from a cDNA or genomic library. Using the amplified segment as a probe, the full-length nucleic acid encoding a protein of interest is obtained.
Upon acquiring a nucleic acid sequence encoding a protein of interest, such as a Cas nuclease (e.g., Cas9) or an effector domain protein, the coding sequence can be further modified by a number of well-known techniques such as restriction endonuclease digestion, PCR, and PCR-related methods to generate coding sequences, including mutants and variants derived from the wild-type protein. The polynucleotide sequence encoding the desired polypeptide can then be subcloned into a vector, for instance, an expression vector, so that a recombinant polypeptide can be produced from the resulting construct. Further modifications to the coding sequence, e.g., nucleotide substitutions, may be subsequently made to alter the characteristics of the polypeptide.
A variety of mutation-generating protocols are established and described in the art, and can be readily used to modify a polynucleotide sequence encoding a protein of interest. See, e.g., Zhang et al., Proc. Natl. Acad. Sci. USA, 94: 4504-4509 (1997); and Stemmer, Nature, 370: 389-391 (1994). The procedures can be used separately or in combination to produce variants of a set of nucleic acids, and hence variants of encoded polypeptides. Kits for mutagenesis, library construction, and other diversity-generating methods are commercially available.
Mutational methods of generating diversity include, for example, site-directed mutagenesis (Botstein and Shortle, Science, 229: 1193-1201 (1985)), mutagenesis using uracil-containing templates (Kunkel, Proc. Natl. Acad. Sci. USA, 82: 488-492 (1985)), oligonucleotide-directed mutagenesis (Zoller and Smith, Nucl. Acids Res., 10: 6487-6500 (1982)), phosphorothioate-modified DNA mutagenesis (Taylor et al., Nucl. Acids Res., 13: 8749-8764 and 8765-8787 (1985)), and mutagenesis using gapped duplex DNA (Kramer et al., Nucl. Acids Res., 12: 9441-9456 (1984)).
Other possible methods for generating mutations include point mismatch repair (Kramer et al., Cell, 38: 879-887 (1984)), mutagenesis using repair-deficient host strains (Carter et al., Nucl. Acids Res., 13: 4431-4443 (1985)), deletion mutagenesis (Eghtedarzadeh and Henikoff, Nucl. Acids Res., 14: 5115 (1986)), restriction-selection and restriction-purification (Wells et al., Phil. Trans. R. Soc. Lond. A, 317: 415-423 (1986)), mutagenesis by total gene synthesis (Nambiar et al., Science, 223: 1299-1301 (1984)), double-strand break repair (Mandecki, Proc. Natl. Acad. Sci. USA, 83: 7177-7181 (1986)), mutagenesis by polynucleotide chain termination methods (U.S. Pat. No. 5,965,408), and error-prone PCR (Leung et al., Biotechniques, 1: 11-15 (1989)).
C. Modification of Nucleic Acids for Preferred Codon Usage in a Host Organism
The nucleic acid comprising a polynucleotide sequence encoding a protein of interest, e.g., a fusion protein of the present invention or a portion thereof (e.g., a Cas nuclease domain, effector domain), can be further altered to coincide with the preferred codon usage of a particular host. For example, the preferred codon usage of one strain of bacterial cells can be used to derive a polynucleotide that encodes a recombinant polypeptide of the invention and includes the codons favored by this strain. The frequency of preferred codon usage exhibited by a host cell can be calculated by averaging the frequency of preferred codon usage in a large number of genes expressed by the host cell (e.g., a calculation service is available from the web site of the Kazusa DNA Research Institute, Japan). This analysis is preferably limited to genes that are highly expressed by the host cell.
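As a non-limiting illustration of deriving preferred codons from a set of host genes, the following Python sketch tallies codon usage across in-frame coding sequences, selects the most frequent codon for each amino acid, and recodes a protein sequence accordingly. The function names preferred_codons and recode are hypothetical, the example genes are toy sequences, and the standard genetic code is assumed. For simplicity the sketch pools codon counts across all genes rather than averaging per-gene frequencies.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict, Iterable

# Standard genetic code, codons enumerated in T, C, A, G order.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)}


def preferred_codons(genes: Iterable[str]) -> Dict[str, str]:
    """Map each amino acid (or '*' for stop) to its most frequent codon."""
    counts = Counter()
    for gene in genes:
        gene = gene.upper()
        for i in range(0, len(gene) - len(gene) % 3, 3):
            codon = gene[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    by_amino_acid = defaultdict(dict)
    for codon, n in counts.items():
        by_amino_acid[CODON_TABLE[codon]][codon] = n
    return {aa: max(c, key=c.get) for aa, c in by_amino_acid.items()}


def recode(protein: str, preferred: Dict[str, str]) -> str:
    """Back-translate a protein using the preferred codon for each residue."""
    return "".join(preferred[aa] for aa in protein)


toy_genes = ["ATGCTGCTGCTCTAA", "ATGCTGAAATAA"]  # hypothetical host genes
pref = preferred_codons(toy_genes)
print(pref["L"], recode("MLK", pref))  # CTG ATGCTGAAA
```

In practice the input would be restricted to genes that are highly expressed by the host cell, consistent with the preference stated above.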
At the completion of modification, the coding sequences are verified by sequencing and are then subcloned into an appropriate expression vector for recombinant production of a protein of interest, such as a fusion protein comprising a Cas nuclease domain or a variant thereof and an effector domain or a variant thereof.
Following verification of the coding sequence, a fusion protein of the present invention can be produced using routine techniques in the field of recombinant genetics, relying on the polynucleotide sequences encoding the polypeptide disclosed herein.
To obtain high-level expression of a nucleic acid encoding a fusion protein of this invention, one typically subclones a polynucleotide encoding the protein of interest in the correct reading frame into an expression vector (e.g., an expression vector of the present invention that comprises a nucleic acid of the present invention) that contains a strong promoter to direct transcription, a transcription/translation terminator and a ribosome binding site for translational initiation. Suitable bacterial promoters are well known in the art and described, e.g., in Sambrook and Russell, supra, and Ausubel et al., supra. Bacterial expression systems for expressing the polypeptide are available in, e.g., E. coli, Bacillus sp., Salmonella, and Caulobacter. Kits for such expression systems are commercially available. Eukaryotic expression systems for mammalian cells (including human cells), yeast, and insect cells are well known in the art and are also commercially available. In one embodiment, the eukaryotic expression vector is an adenoviral vector, an adeno-associated vector, or a retroviral vector.
The promoter used to direct expression of a heterologous nucleic acid depends on the particular application. The promoter is optionally positioned about the same distance from the heterologous transcription start site as it is from the transcription start site in its natural setting. As is known in the art, however, some variation in this distance can be accommodated without loss of promoter function.
In another aspect, the present invention provides host cells that have been transfected by expression vectors of the present invention (i.e., expression vectors comprising nucleic acids that comprise nucleotide sequences encoding fusion proteins of the present invention). The compositions and methods of the present invention can be used for producing epigenetic modifications in the genome of any host cell of interest. The host cell can be a cell from any organism, e.g., a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a plant cell (e.g., a rice cell, a wheat cell, a tomato cell, an Arabidopsis thaliana cell, a Zea mays cell and the like), an algal cell (e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and the like), a fungal cell (e.g., yeast cell, etc.), an animal cell, a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.), a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal, etc.), a cell from a mammal, a cell from a human, a cell from a healthy human, a cell from a human patient, a cell from a cancer patient, etc. In some cases, the host cell treated by the method disclosed herein can be transplanted to a subject (e.g., patient). For instance, the host cell in which the epigenetic modification is made can be derived from the subject to be treated (e.g., patient).
Epigenetic modifications by fusion proteins of the present invention can be made in any cell of interest, such as a stem cell, e.g., embryonic stem cell, induced pluripotent stem cell, adult stem cell, e.g., mesenchymal stem cell, neural stem cell, hematopoietic stem cell, organ stem cell, a progenitor cell, a somatic cell, e.g., fibroblast, hepatocyte, heart cell, liver cell, pancreatic cell, muscle cell, skin cell, blood cell, neural cell, immune cell, and any other cell of the body, e.g., human body. The cells can be primary cells or primary cell cultures derived from a subject, e.g., an animal subject or a human subject, and allowed to grow in vitro for a limited number of passages. In some embodiments, epigenetic modifications are made in cells that are disease cells or derived from a subject with a disease. For instance, the cells can be cancer or tumor cells. The cells can also be immortalized cells (e.g., cell lines), for instance, from a cancer cell line.
Depending on the host cell and expression system used, the expression vector (e.g., for expression of a fusion protein of the present invention and/or a gRNA molecule) may contain transcription and translation control elements, including promoters, transcription enhancers, transcription terminators, and the like. Useful promoters can be derived from viruses, or any organism, e.g., prokaryotic or eukaryotic organisms. Promoters may also be inducible (i.e., capable of responding to environmental factors and/or external stimuli that can be artificially controlled). For expressing fusion proteins of the present invention, non-limiting examples of promoters that find utility in expression vectors of the present invention include RNA polymerase II promoters (e.g., pGAL7 and pTEF1), RNA polymerase III promoters (e.g., RPR-tetO, SNR52, and tRNA-tyr), the SV40 early promoter, mouse mammary tumor virus long terminal repeat (LTR) promoter, adenovirus major late promoter (Ad MLP), a herpes simplex virus (HSV) promoter, a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE), a Rous sarcoma virus (RSV) promoter, a human U6 small nuclear promoter (U6), an enhanced U6 promoter, a human H1 promoter (H1), etc. Suitable terminators for use in fusion protein-expressing vectors of the present invention include, but are not limited to, SNR52 and RPR terminator sequences, which can be used with transcripts created under the control of an RNA polymerase III promoter. Additionally, various primer binding sites may be incorporated into a vector to facilitate vector cloning, sequencing, genotyping, and the like. Other suitable promoter, enhancer, terminator, and primer binding sequences will readily be known to one of skill in the art.
The particular expression vector used to transport the genetic information into the cell is not particularly critical. Any of the conventional vectors used for expression in eukaryotic or prokaryotic cells may be used. Standard bacterial expression vectors include plasmids such as pBR322 based plasmids, pSKF, pET23D, and fusion expression systems such as GST and LacZ. Epitope tags can also be added to recombinant proteins to provide convenient methods of isolation, e.g., c-myc.
Expression vectors containing regulatory elements from eukaryotic viruses are typically used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the SV40 early promoter, SV40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.
Some expression systems have markers that provide gene amplification such as thymidine kinase, hygromycin B phosphotransferase, and dihydrofolate reductase. Alternatively, high yield expression systems not involving gene amplification are also suitable, such as a baculovirus vector in insect cells, with a polynucleotide sequence encoding the protein of interest under the direction of the polyhedrin promoter or other strong baculovirus promoters.
The elements that are typically included in expression vectors also include a replicon that functions in E. coli, a gene encoding antibiotic resistance to permit selection of bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential regions of the plasmid to allow insertion of eukaryotic sequences. The particular antibiotic resistance gene chosen is not critical; any of the many resistance genes known in the art are suitable. The prokaryotic sequences are optionally chosen such that they do not interfere with the replication of the DNA in eukaryotic cells, if necessary. Similar to antibiotic resistance selection markers, metabolic selection markers based on known metabolic pathways may also be used as a means for selecting transfected host cells.
When periplasmic expression of a fusion protein of the present invention is desired, the expression vector further comprises a sequence encoding a secretion signal, such as the E. coli OppA (Periplasmic Oligopeptide Binding Protein) secretion signal or a modified version thereof, which is directly connected to the 5′ end of the coding sequence of the protein to be expressed. This signal sequence directs the recombinant protein produced in the cytoplasm through the cell membrane into the periplasmic space. The expression vector may further comprise a coding sequence for signal peptidase 1, which is capable of enzymatically cleaving the signal sequence when the recombinant protein is entering the periplasmic space. More detailed description for periplasmic production of a recombinant protein can be found in, e.g., Gray et al., Gene 39: 247-254 (1985), U.S. Pat. Nos. 6,160,089 and 6,436,674.
A person skilled in the art will recognize that various conservative substitutions can be made to any wild-type or mutant/variant protein to produce a fusion protein of the present invention. Moreover, modifications of a polynucleotide coding sequence may also be made to accommodate preferred codon usage in a particular expression host without altering the resulting amino acid sequence.
Standard transfection methods are used to produce bacterial, mammalian, yeast, insect, or plant cell lines that express large quantities of a recombinant fusion protein of this invention, which are then purified using standard techniques (see, e.g., Colley et al., J. Biol. Chem. 264: 17619-17622 (1989); Guide to Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed., 1990)). Transformation of eukaryotic and prokaryotic cells is performed according to standard techniques (see, e.g., Morrison, J. Bact. 132: 349-351 (1977); Clark-Curtiss & Curtiss, Methods in Enzymology 101: 347-362 (Wu et al., eds, 1983)).
Any of the well-known procedures for introducing foreign nucleotide sequences into host cells may be used. These include the use of calcium phosphate transfection, polybrene, protoplast fusion, electroporation, liposomes, microinjection, plasmid vectors, viral vectors and any of the other well-known methods for introducing cloned genomic DNA, cDNA, synthetic DNA, or other foreign genetic material into a host cell (see, e.g., Sambrook and Russell, supra). It is only necessary that the particular genetic engineering procedure used be capable of successfully introducing at least one gene into the host cell capable of expressing the fusion protein of this invention.
Once the expression of a recombinant fusion protein of the present invention in transfected host cells is confirmed, e.g., via an immunoassay such as Western blotting assay, the host cells are then cultured in an appropriate scale for the purpose of purifying the recombinant polypeptide.
1. Purification of Recombinantly Produced Polypeptides from Bacteria
When the fusion proteins of the present invention are produced recombinantly by transformed bacteria in large amounts, typically after promoter induction, although expression can be constitutive, the polypeptides may form insoluble aggregates. There are several protocols that are suitable for purification of protein inclusion bodies. For example, purification of aggregate proteins (hereinafter referred to as inclusion bodies) typically involves the extraction, separation and/or purification of inclusion bodies by disruption of bacterial cells, e.g., by incubation in a buffer of about 100-150 μg/ml lysozyme and 0.1% Nonidet P40, a non-ionic detergent. The cell suspension can be ground using a Polytron grinder (Brinkman Instruments, Westbury, N.Y.). Alternatively, the cells can be sonicated on ice. Additional methods of lysing bacteria are described in Ausubel et al. and Sambrook and Russell, both supra, and will be apparent to those of skill in the art.
The cell suspension is generally centrifuged and the pellet containing the inclusion bodies resuspended in buffer which does not dissolve but washes the inclusion bodies, e.g., 20 mM Tris-HCl (pH 7.2), 1 mM EDTA, 150 mM NaCl and 2% Triton-X 100, a non-ionic detergent. It may be necessary to repeat the wash step to remove as much cellular debris as possible. The remaining pellet of inclusion bodies may be resuspended in an appropriate buffer (e.g., 20 mM sodium phosphate, pH 6.8, 150 mM NaCl). Other appropriate buffers will be apparent to those of skill in the art.
Following the washing step, the inclusion bodies are solubilized by the addition of a solvent that is both a strong hydrogen acceptor and a strong hydrogen donor (or a combination of solvents each having one of these properties). The proteins that formed the inclusion bodies may then be renatured by dilution or dialysis with a compatible buffer. Suitable solvents include, but are not limited to, urea (from about 4 M to about 8 M), formamide (at least about 80%, volume/volume basis), and guanidine hydrochloride (from about 4 M to about 8 M). Some solvents that are capable of solubilizing aggregate-forming proteins, such as SDS (sodium dodecyl sulfate) and 70% formic acid, may be inappropriate for use in this procedure due to the possibility of irreversible denaturation of the proteins, accompanied by a lack of immunogenicity and/or activity. Although guanidine hydrochloride and similar agents are denaturants, this denaturation is not irreversible and renaturation may occur upon removal (by dialysis, for example) or dilution of the denaturant, allowing re-formation of the immunologically and/or biologically active protein of interest. After solubilization, the protein can be separated from other bacterial proteins by standard separation techniques. For further description of purifying recombinant polypeptides from bacterial inclusion body, see, e.g., Patra et al., Protein Expression and Purification 18: 182-190 (2000).
Alternatively, it is possible to purify recombinant polypeptides from bacterial periplasm. Where the recombinant protein is exported into the periplasm of the bacteria, the periplasmic fraction of the bacteria can be isolated by cold osmotic shock in addition to other methods known to those of skill in the art (see e.g., Ausubel et al., supra). To isolate recombinant proteins from the periplasm, the bacterial cells are centrifuged to form a pellet. The pellet is resuspended in a buffer containing 20% sucrose. To lyse the cells, the bacteria are centrifuged and the pellet is resuspended in ice-cold 5 mM MgSO4 and kept in an ice bath for approximately 10 minutes. The cell suspension is centrifuged and the supernatant decanted and saved. The recombinant proteins present in the supernatant can be separated from the host proteins by standard separation techniques well known to those of skill in the art.
2. Standard Protein Separation Techniques for Purification
When a recombinant polypeptide of the present invention, e.g., a fusion protein of the present invention, is expressed in host cells (such as human cells) in a soluble form, its purification can follow the standard protein purification procedure described below. This standard purification procedure is also suitable for purifying fusion proteins obtained from chemical synthesis.
i. Solubility Fractionation
Often as an initial step, and if the protein mixture is complex, an initial salt fractionation can separate many of the unwanted host cell proteins (or proteins derived from the cell culture media) from the recombinant protein of interest, e.g., a fusion protein of the present invention. The preferred salt is ammonium sulfate. Ammonium sulfate precipitates proteins by effectively reducing the amount of water in the protein mixture. Proteins then precipitate on the basis of their solubility. The more hydrophobic a protein is, the more likely it is to precipitate at lower ammonium sulfate concentrations. A typical protocol is to add saturated ammonium sulfate to a protein solution so that the resultant ammonium sulfate concentration is between 20% and 30%. This will precipitate the most hydrophobic proteins. The precipitate is discarded (unless the protein of interest is hydrophobic) and ammonium sulfate is added to the supernatant to a concentration known to precipitate the protein of interest. The precipitate is then solubilized in buffer and the excess salt removed, if necessary, through either dialysis or diafiltration. Other methods that rely on solubility of proteins, such as cold ethanol precipitation, are well known to those of skill in the art and can be used to fractionate complex protein mixtures.
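The volume of saturated ammonium sulfate needed for each cut can be estimated with a simple dilution calculation. The following Python sketch assumes that the stock solution is 100% saturated and that volumes are additive; the function name and the example volumes are hypothetical and are given for illustration only.

```python
def saturated_as_volume(initial_volume_ml, target_pct, initial_pct=0.0):
    """Volume (ml) of saturated ammonium sulfate to add to reach target_pct saturation.

    Derived from the mass balance
        (V0 * S1 + V * 100) / (V0 + V) = S2
    which gives V = V0 * (S2 - S1) / (100 - S2).
    Assumes a 100% saturated stock and additive volumes.
    """
    if not 0.0 <= initial_pct < target_pct < 100.0:
        raise ValueError("require 0 <= initial_pct < target_pct < 100")
    return initial_volume_ml * (target_pct - initial_pct) / (100.0 - target_pct)

# First cut: bring 100 ml of lysate from 0% to 25% saturation.
first_add = saturated_as_volume(100.0, 25.0)

# Second cut: bring the resulting supernatant from 25% to 50% saturation
# (supernatant volume approximated as the full volume after the first addition).
second_add = saturated_as_volume(100.0 + first_add, 50.0, initial_pct=25.0)
```

Under these assumptions, about 33.3 ml of saturated solution brings 100 ml to 25% saturation, and a further 66.7 ml or so brings the resulting 133.3 ml to 50% saturation.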
ii. Size Differential Filtration
Based on a calculated molecular weight, proteins of greater and lesser size than a protein of interest can be removed using ultrafiltration through membranes of different pore sizes (for example, Amicon or Millipore membranes). As a first step, the protein mixture is ultrafiltered through a membrane with a pore size that has a lower molecular weight cut-off than the molecular weight of a protein of interest, e.g., a fusion protein of the present invention. The retentate of the ultrafiltration is then ultrafiltered against a membrane with a molecular weight cut-off greater than the molecular weight of the protein of interest. The recombinant protein will pass through the membrane into the filtrate. The filtrate can then be chromatographed as described below.
iii. Column Chromatography
The proteins of interest (such as a fusion protein of the present invention) can also be separated from other proteins on the basis of their size, net surface charge, hydrophobicity, or affinity for ligands, such as amylose. In addition, antibodies raised against a segment of the protein of interest can be conjugated to column matrices and the target fusion protein can therefore be immunopurified. All of these methods are well known in the art.
It will be apparent to one of skill that chromatographic techniques can be performed at any scale and using equipment from many different manufacturers (e.g., Pharmacia Biotech).
In another aspect, the present invention provides a method for producing an epigenetic modification of a target chromatin site. A nucleic acid component of chromatin (e.g., DNA) and/or a protein component of chromatin (e.g., a histone protein such as histone H3) present at the target site can be modified. In some embodiments, the target chromatin site comprises a Cas nuclease recognition site (e.g., a Cas9 recognition site). In some embodiments, the target chromatin site comprises a polynucleotide sequence that is recognized by a guide RNA (gRNA) molecule. In some embodiments, the method comprises contacting the target chromatin site with a fusion protein provided herein.
The term “epigenetic modification” refers to a change in genetic information that does not arise from a change in a nucleotide sequence (e.g., a DNA sequence). Typically, epigenetic modifications, such as those that can be produced by fusion proteins and other compositions of the present invention, affect the expression or activity of a target chromatin site (e.g., the expression or activity of a gene), although an epigenetic modification can be any modification of genetic material (e.g., chromatin or a component thereof) that does not arise from a nucleotide sequence change but produces a change in a phenotype (e.g., of an organism comprising the target chromatin site). In the context of the present invention, epigenetic modifications typically comprise modifications to a nucleic acid (e.g., DNA) or a protein (e.g., a histone). Such modifications typically comprise methylation, dimethylation, trimethylation, demethylation, acetylation, deacetylation, citrullination, or a combination thereof. Epigenetic modifications can either decrease or increase the expression or activity of a target chromatin site (e.g., gene expression or activity). Epigenetic modifications, and the resulting effects (e.g., changes in gene expression or phenotype), can be either transient or persistent. The choice of effector domain can be used to determine whether a transient or persistent epigenetic modification and/or resulting effect is produced. As a non-limiting example, a combination of a DNMT3A effector domain and a full-length Ezh2 domain can be used to achieve a persistent effect (e.g., gene silencing).
“Chromatin” refers to the macromolecular complex typically found in cells that comprises nucleic acids (e.g., DNA, RNA) and proteins (e.g., histones). Chromatin performs several functions, including packaging DNA into more compact forms, controlling DNA replication and gene expression (e.g., transcriptional regulation), and protecting against DNA damage. In eukaryotic cells, nucleosomes, which comprise DNA wrapped around histone proteins and are separated by relatively short sections of linker DNA, form the fundamental repeat unit of chromatin. Furthermore, multiple histones can wrap into a 30 nm fiber structure. The 30 nm fibers can undergo further high-level packaging into metaphase chromosomes. The relatively loosely packed form of chromatin wherein DNA is wrapped around histone proteins, but the histone proteins are not wrapped into 30 nm fibers, is known as euchromatin and is the form of chromatin that is typically associated with gene transcription. Conversely, the more densely packed form of chromatin, in which histones have wrapped into 30 nm fibers, is known as heterochromatin. The density of chromatin packaging in heterochromatin typically precludes the ability of RNA polymerases to access DNA and carry out transcription. Accordingly, epigenetic modifications of structural proteins in chromatin (e.g., histones), such as those produced by fusion proteins and other compositions of the present invention, control local chromatin structure (e.g., whether the chromatin is in the form of heterochromatin or euchromatin), which in turn affects a target chromatin site (e.g., gene) expression or activity.
“Histones,” which can be modified by fusion proteins and other compositions of the present invention, are highly alkaline proteins that are found in eukaryotic cells and, together with DNA, form the fundamental unit of chromatin known as the nucleosome. Histones function to increase chromatin packaging density, in part, by serving as a structure which DNA can wrap around. The five major families of histone proteins include H2A, H2B, H3, H4, and H1/H5. The latter family constitutes what are known as linker histones, while the first four families are known as core histones. The nucleosome core consists of two H2A-H2B dimers and an H3-H4 tetramer.
In mammals, there are several subfamilies of histone H3: H3.1, H3.2, H3.3, H3.4, H3.5, H3.X, and H3.Y. In humans, H3.1 histone proteins include those encoded by the HIST1H3A, HIST1H3B, HIST1H3C, HIST1H3D, HIST1H3E, HIST1H3F, HIST1H3G, HIST1H3H, HIST1H3I, and HIST1H3J genes. H3.2 histone proteins in humans include those encoded by the HIST2H3A, HIST2H3C, and HIST2H3D genes. In humans, H3.3 histone proteins include those encoded by the H3F3A and H3F3B genes.
Various modifications of amino acid residues within a histone protein, such as those produced by fusion proteins and other compositions of the present invention, can affect the chemical properties of the histone, and by extension, affect processes such as chromatin packing. Lysine and arginine are residues modified within histone proteins. For example, lysine residues can be methylated or acetylated, or arginine residues can be methylated or citrullinated by fusion proteins of the present invention. Also, serine, threonine, and tyrosine residues can be phosphorylated by fusion proteins of the present invention. Acetylation, which is typically associated with increased transcriptional activity, can, for example, neutralize the positive charge of the lysine residue side chain, thus decreasing the electrostatic interaction between the histone protein and associated DNA molecules. While histone methylation can be associated with different chromatin packing states or levels of transcription activity, methylation of lysines 9 and 27 of histone H3 and of lysine 20 of histone H4 is typically associated with suppressed transcription. In particular, dimethylation and trimethylation of lysine 9 of histone H3 (H3K9me2/3), trimethylation of lysine 27 of histone H3 (H3K27me3), and trimethylation of lysine 20 of histone H4 (H4K20me3) are associated with suppressed transcription.
In some embodiments, an epigenetic modification of a nucleic acid (e.g., DNA) component of chromatin at a target site is produced. In other embodiments, an epigenetic modification of a protein (e.g., a histone protein such as a histone H3 protein) component of chromatin at a target site is produced. In some embodiments, an epigenetic modification of a nucleic acid component and a protein component of chromatin at a target site are produced. When an epigenetic modification of a histone H3 protein is produced, in particular embodiments lysine 9 and/or lysine 27 of histone H3 are modified. In some embodiments, a fusion protein of the present invention removes an acetyl group from lysine 27 on histone H3. In some embodiments, a fusion protein of the present invention adds 1, 2, or 3 methyl groups to lysine 9 on histone H3. In other embodiments, a fusion protein of the present invention adds 1, 2, or 3 methyl groups to lysine 27 on histone H3. In particular embodiments, a fusion protein of the present invention adds 1, 2, or 3 methyl groups to lysine 9 on histone H3 and adds 1, 2, or 3 methyl groups to lysine 27 on histone H3. In some instances, lysine 9 on histone H3 is trimethylated (H3K9me3) and/or lysine 27 on histone H3 is trimethylated (H3K27me3). In some embodiments, a fusion protein deacetylates lysine 27 on histone H3 and methylates (e.g., trimethylates) lysine 27 on histone H3. In particular embodiments, the deacetylation event precedes the methylation event.
In some embodiments, epigenetic modification of a target chromatin site by a fusion protein of the present invention produces or is associated with a change in chromatin packing. In some instances, the epigenetic modification results in or is associated with heterochromatin formation (e.g., a transition from euchromatin to heterochromatin). In other instances, the epigenetic modification results in or is associated with euchromatin formation (e.g., a transition from heterochromatin to euchromatin). Such changes in chromatin packing can produce or be associated with a change in target chromatin site expression.
Chromatin immunoprecipitation (ChIP) assays and assays of epigenetic modification can be used to identify or confirm epigenetic modifications produced by fusion proteins of the present invention. ChIP assays are techniques that allow the detection of interactions between proteins and nucleic acids (e.g., DNA). ChIP assays can be used, for example, to detect interactions between DNA and transcription factors or chromatin-modifying proteins. ChIP assays can also be used to analyze the chromatin structure and epigenetic modifications at specific sites of interest (e.g., particular DNA sequences of interest). In one type of ChIP assay, commonly referred to as xChIP, formaldehyde is used to crosslink chromatin (i.e., DNA and associated proteins). Following crosslinking, DNA-protein complexes are immunoprecipitated (e.g., using antibodies specific for the protein(s) of interest). The crosslinks are then reversed, and the isolated DNA can be analyzed (e.g., by sequencing, PCR, or the detection of epigenetic modification such as a methylation assay). Another type of ChIP assay, commonly referred to as nChIP, uses nuclease digestion to prepare chromatin for analysis. nChIP assays allow for more accurate detection of epigenetic modification of histones such as methylation and acetylation than is typically possible with formaldehyde crosslinking, although nChIP assays do not always allow for the detection of DNA-protein interactions when the proteins have a weak binding affinity for DNA. Many ChIP assays are semi-quantitative, although in some cases it is desirable to couple a ChIP assay with a method such as quantitative PCR.
ChIP assays can also be combined with an assay to detect epigenetic modifications, such as DNA methylation assays. A non-limiting example of a DNA methylation assay is DNA bisulfite modification, wherein DNA obtained from a ChIP assay is treated with bisulfite and methylation-specific primers are used to detect changes in DNA methylation.
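The logic of bisulfite-based detection can be illustrated in silico. Bisulfite treatment converts unmethylated cytosines to uracil, which is read as thymine after amplification, whereas methylated cytosines are protected and remain cytosine. The following Python sketch simulates this for a single strand; the sequence, the methylated position, and the function names are hypothetical and are given for illustration only.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Simulate bisulfite conversion of one DNA strand as read after PCR.

    Unmethylated C is reported as T; a methylated C (0-based position listed
    in methylated_positions) is protected and remains C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)

def surviving_cytosines(original, converted):
    """Return positions where a C in the original is still a C after conversion."""
    return [
        i
        for i, (before, after) in enumerate(zip(original.upper(), converted))
        if before == "C" and after == "C"
    ]

# Hypothetical sequence with a methylated cytosine at position 1 (a CpG site).
seq = "ACGTCCGA"
converted = bisulfite_convert(seq, methylated_positions=[1])
methylated_sites = surviving_cytosines(seq, converted)
```

Here the converted read is ACGTTTGA, and only the cytosine at position 1 survives. A methylation-specific primer designed against the unconverted CG would therefore anneal only to template that was methylated at that site.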
Changes in chromatin structure (e.g., arising from epigenetic modifications effected by fusion proteins of the present invention) can be assessed by additional methods, non-limiting examples of which include DNase I hypersensitivity assays and trichostatin A (TSA) assays. DNase I hypersensitivity sites are typically located in or around promoter regions; as such, DNase I hypersensitivity assays can be used to differentiate transcriptionally active from transcriptionally inactive chromatin regions. TSA, at low doses, inhibits the activity of histone deacetylases (HDACs). Accordingly, TSA assays can be used to determine the role that acetylation (or deacetylation) plays at a particular target chromatin site of interest (e.g., a gene of interest).
In some embodiments, epigenetic modification of a target chromatin site (e.g., a gene) by a fusion protein of the present invention produces or is associated with a reduction in, or suppression of, expression of the target chromatin site (e.g., gene expression is reduced or suppressed). In some instances, expression is reduced by at least about 1.1-, 1.2-, 1.3-, 1.4-, 1.5-, 1.6-, 1.7-, 1.8-, 1.9-, 2-, 2.1-, 2.2-, 2.3-, 2.4-, 2.5-, 2.6-, 2.7-, 2.8-, 2.9-, 3-, 3.1-, 3.2-, 3.3-, 3.4-, 3.5-, 3.6-, 3.7-, 3.8-, 3.9-, 4-, 4.1-, 4.2-, 4.3-, 4.4-, 4.5-, 4.6-, 4.7-, 4.8-, 4.9-, 5-, 5.1-, 5.2-, 5.3-, 5.4-, 5.5-, 5.6-, 5.7-, 5.8-, 5.9-, 6-, 6.1-, 6.2-, 6.3-, 6.4-, 6.5-, 6.6-, 6.7-, 6.8-, 6.9-, 7-, 7.1-, 7.2-, 7.3-, 7.4-, 7.5-, 7.6-, 7.7-, 7.8-, 7.9-, 8-, 8.1-, 8.2-, 8.3-, 8.4-, 8.5-, 8.6-, 8.7-, 8.8-, 8.9-, 9-, 9.1-, 9.2-, 9.3-, 9.4-, 9.5-, 9.6-, 9.7-, 9.8-, 9.9-, 10-, 10.5-, 11-, 11.5-, 12-, 12.5-, 13-, 13.5-, 14-, 14.5-, 15-, 15.5-, 16-, 16.5-, 17-, 17.5-, 18-, 18.5-, 19-, 19.5-, or 20-fold. The reduction in expression can be determined, for example, with respect to a control (e.g., expression of a target chromatin site that has not been epigenetically modified by the fusion protein of the present invention for which the comparison is being made).
In some embodiments, epigenetic modification of a target chromatin site (e.g., a gene) by a fusion protein of the present invention produces or is associated with an increase in, or exacerbation of, expression of the target chromatin site (e.g., gene expression is increased or exacerbated). In some instances, expression is increased by at least about 1.1-, 1.2-, 1.3-, 1.4-, 1.5-, 1.6-, 1.7-, 1.8-, 1.9-, 2-, 2.1-, 2.2-, 2.3-, 2.4-, 2.5-, 2.6-, 2.7-, 2.8-, 2.9-, 3-, 3.1-, 3.2-, 3.3-, 3.4-, 3.5-, 3.6-, 3.7-, 3.8-, 3.9-, 4-, 4.1-, 4.2-, 4.3-, 4.4-, 4.5-, 4.6-, 4.7-, 4.8-, 4.9-, 5-, 5.1-, 5.2-, 5.3-, 5.4-, 5.5-, 5.6-, 5.7-, 5.8-, 5.9-, 6-, 6.1-, 6.2-, 6.3-, 6.4-, 6.5-, 6.6-, 6.7-, 6.8-, 6.9-, 7-, 7.1-, 7.2-, 7.3-, 7.4-, 7.5-, 7.6-, 7.7-, 7.8-, 7.9-, 8-, 8.1-, 8.2-, 8.3-, 8.4-, 8.5-, 8.6-, 8.7-, 8.8-, 8.9-, 9-, 9.1-, 9.2-, 9.3-, 9.4-, 9.5-, 9.6-, 9.7-, 9.8-, 9.9-, 10-, 10.5-, 11-, 11.5-, 12-, 12.5-, 13-, 13.5-, 14-, 14.5-, 15-, 15.5-, 16-, 16.5-, 17-, 17.5-, 18-, 18.5-, 19-, 19.5-, or 20-fold. The increase in expression can be determined, for example, with respect to a control (e.g., expression of a target chromatin site that has not been epigenetically modified by the fusion protein of the present invention for which the comparison is being made).
Typically, epigenetic modifications produced by fusion proteins of the present invention will produce a decrease or increase in the level of mRNA expression (i.e., a decrease or increase in transcription of a gene expressed by the target chromatin site or under the control of a genetic regulatory element at the target chromatin site). Accordingly, the amount of a decrease or increase in expression can be determined or quantified by measuring mRNA levels (e.g., of a gene expressed by the target chromatin site or under the control of a genetic regulatory element at the target chromatin site). In some embodiments, the amount of a decrease or increase in expression is expressed as a fold change in the level of one or more mRNA transcripts. Exemplary methods for measuring mRNA levels include, without limitation, PCR (e.g., reverse-transcription quantitative PCR) and microarray analysis.
In addition, epigenetic modifications produced by fusion proteins of the present invention can produce changes in the level of protein expression. Accordingly, the amount of a decrease or increase in expression effected by an epigenetic modification can be determined or quantified by measuring protein levels (e.g., of a protein expressed from a gene expressed by the target chromatin site or under the control of a genetic regulatory element at the target chromatin site). In some embodiments, the amount of a decrease or increase in expression is expressed as a fold change in the level of one or more proteins. Exemplary methods for determining protein expression or quantifying the presence of other compounds (e.g., metabolites or other biochemicals that can be used to assay metabolic activity) include, without limitation, Western Blot, dot blot, enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), immunoprecipitation, immunofluorescence, immunohistochemistry, FACS analysis, chemiluminescence, and multiplex bead assays (e.g., using Luminex or fluorescent microbeads).
Epigenetic modifications produced according to compositions and methods of the present invention can produce changes in one or more phenotypes (e.g., the level or activity of a biochemical pathway, or the morphology or developmental fate of a cell or tissue). In some embodiments, the effects of epigenetic modifications can be assessed by employing a reporter or selectable marker to examine the phenotype of an organism or a population of organisms. In some instances, the marker produces a visible phenotype, such as the color of an organism or population of organisms. As a non-limiting example, the phenotype can be examined by growing the target organisms (e.g., cells or other organisms that have had their genome epigenetically modified) and/or their progeny under conditions that result in a phenotype, wherein the phenotype may not be visible under ordinary growth conditions.
In some embodiments, the reporter or selectable marker, used for assessing the effects of an epigenetic modification made by a fusion protein of the present invention, is a fluorescent tagged protein, an antibody, a labeled antibody, a chemical stain, a chemical indicator, or a combination thereof. In other embodiments, the reporter or selectable marker responds to a stimulus, a biochemical, or a change in environmental conditions. In some instances, the reporter or selectable marker responds to the concentration of a metabolic product, a protein product, a synthesized drug of interest, a cellular phenotype of interest, a cellular product of interest, or a combination thereof. A cellular product of interest can be, as a non-limiting example, an RNA molecule (e.g., messenger RNA (mRNA), long non-coding RNA (lncRNA), microRNA (miRNA)), which can be produced, for example, under the control of a target chromatin site that is epigenetically modified by a fusion protein of the present invention.
In some embodiments, an epigenetic modification is produced in vitro. In other embodiments, the fusion protein and the target chromatin site are in a cell. As a non-limiting example, the fusion protein, or a combination of the fusion protein and a gRNA, can be introduced into a cell, and the fusion protein subsequently produces an epigenetic modification at a target chromatin site (e.g., a target chromatin site that is present within the cell's genome). Alternatively, a nucleic acid or a vector comprising a polynucleotide sequence encoding the fusion protein and/or the gRNA can be introduced into a cell, and subsequently the fusion protein can be expressed by the cell. The expressed fusion protein can then produce an epigenetic modification at a target chromatin site within the cell.
Epigenetic modification methods of the present invention can be performed in a multiplex format. In some embodiments, multiplexing comprises introducing two or more gRNA molecules into a host cell, or cloning two or more nucleic acids comprising polynucleotide sequences that encode gRNA molecules in tandem into a single expression vector (i.e., an expression vector that is subsequently introduced into a host cell). In some instances, at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, or more gRNA molecules are introduced into a host cell. In some embodiments, at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, or more polynucleotide sequences that encode gRNA molecules (e.g., different gRNA molecules) are included in a single vector. In some embodiments, at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more expression vectors are introduced into a host cell. Each of the expression vectors can encode one or more different gRNA molecules.
In still other embodiments, multiplexing comprises transfecting a plurality of host cells. Each host cell can be transfected with a single expression vector or multiple different expression vectors. In some embodiments, a plurality of host cells comprises about 10^3, about 10^4, about 10^5, about 10^6, about 10^7, or about 10^8 cells. Also, multiple embodiments of multiplexing can be combined.
By using one or a combination of the various multiplexing embodiments, it is possible to epigenetically modify any number of target sites within a genome. In some instances, at least about 10 (e.g., at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) target sites are modified. In other instances, between about 10 and about 100 (e.g., about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100) target sites are modified. In some instances, between about 100 and about 1,000 (e.g., about 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1,000) target sites are modified. In other instances, between about 1,000 and about 30,000 (e.g., about 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, 20,000, 21,000, 22,000, 23,000, 24,000, 25,000, 26,000, 27,000, 28,000, 29,000, or 30,000) target sites are modified.
In some embodiments, more than one gRNA molecule (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) is used to modify each target site. In some instances, a multiplexed experiment utilizes at least about 2 to about 100 (e.g., at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100) different gRNA molecules. In other instances, a multiplexed experiment utilizes at least about 100 to about 10,000 (e.g., at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 4,500, 5,000, 5,500, 6,000, 6,500, 7,000, 7,500, 8,000, 8,500, 9,000, 9,500, or 10,000) different gRNA molecules. In some instances, a multiplexed experiment utilizes at least about 10,000 to about 500,000 (e.g., at least about 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 150,000, 200,000, 250,000, 300,000, 350,000, 400,000, 450,000, or 500,000) different gRNA molecules.
In some embodiments, the host cell comprises a population of cells (e.g., host cells). In some instances, one or more epigenetic modifications are produced in at least about 20 percent (e.g., at least about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 percent) of the population of cells. In other instances, one or more epigenetic modifications are produced in at least about 50 percent (e.g., at least about 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 65, 70, 75, 80, 85, 90, 95, or 100 percent) of the population of cells. In still other instances, one or more epigenetic modifications are produced in at least about 75 percent (e.g., at least about 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 95, or 100 percent) of the population of cells. In other instances, one or more epigenetic modifications are produced in at least about 90 percent (e.g., at least about 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 percent) of the population of cells. In particular instances, one or more epigenetic modifications are produced in at least about 95 percent (e.g., at least about 95, 96, 97, 98, 99, or 100 percent) of the population of cells.
The compositions and methods of the present invention can be used to screen for one or more target chromatin sites (e.g., within the genome of a cell or organism). As a non-limiting example, compositions and methods of the present invention can be used to produce epigenetic modification(s) at one or more target chromatin sites, and then the effects of the epigenetic modification(s) on the expression of the target site(s) (e.g., one or more genes) can be assessed. Target site expression can be assessed in terms of transcriptional activity (e.g., mRNA levels), translational activity (e.g., protein levels), or phenotype, using techniques that are described herein and will be known to one of skill in the art.
Screening methods can be performed in a multiplex format as described herein. In some embodiments, multiplexed screening comprises introducing two or more gRNA molecules into a host cell, or cloning two or more nucleic acids comprising polynucleotide sequences that encode gRNA molecules in tandem into a single expression vector. In some instances, at least about 2 to about 10 gRNA molecules are introduced into a host cell. In some embodiments, at least about 2 to about 10 polynucleotide sequences that encode gRNA molecules (e.g., different gRNA molecules) are included in a single vector (i.e., a vector that is introduced into a host cell). In some embodiments, at least about 2 to about 10, or more expression vectors are introduced into a host cell. Each of the expression vectors can encode one or more different gRNA molecules.
In still other embodiments, multiplexed screening comprises transfecting a plurality of host cells. Each host cell can be transfected with a single expression vector or multiple different expression vectors. In some embodiments, a plurality of host cells comprises between about 10^3 and about 10^8 cells. Also, multiple embodiments of multiplexed screening can be combined. One of skill in the art will recognize that the progeny of epigenetically modified cells can also be used for screening according to methods of the present invention.
By using one or a combination of the various multiplexing embodiments, it is possible to screen any number of target sites within a genome. In some instances, at least about 10 to about 30,000 loci are screened. In some embodiments, more than one gRNA molecule is used to screen each locus. In some instances, a multiplexed screening experiment utilizes at least about 2 to about 500,000 different gRNA molecules.
The compositions and methods provided by the present invention are useful for any number of applications. As non-limiting examples, epigenetic modifications (e.g., of a genome) can be performed in order to prevent or treat a disease, or to identify one or more specific target chromatin sites (e.g., genetic loci) that contribute to a phenotype, disease, biological function, and the like. As another non-limiting example, epigenetic modifications for the purposes of screening according to the compositions and methods of the present invention can be used to improve or optimize a biological function or pathway.
The compositions and methods of the present invention are useful for preventing or treating any number of genetic diseases (e.g., in a subject in need thereof). The present invention is particularly well-suited for the prevention or treatment of diseases that result from the underexpression or overexpression of a gene product, such as a protein or enzyme. The present invention is also particularly well-suited for the prevention or treatment of diseases that arise from abnormal cell differentiation or development, as many of these processes are under the direct control of epigenetic regulation.
In some embodiments, the subject is treated (e.g., a target chromatin site in the subject is epigenetically modified) before any symptoms or sequelae of the genetic disease develop. In other embodiments, the subject has symptoms or sequelae of the genetic disease. In some instances, treatment results in a reduction or elimination of the symptoms or sequelae of the genetic disease.
In some embodiments, treatment (e.g., epigenetic modification of a target chromatin site) includes administering compositions (e.g., fusion proteins, nucleic acids, expression vectors, or cells) of the present invention directly to a subject. As a non-limiting example, pharmaceutical compositions of the present invention (e.g., comprising a fusion protein, nucleic acid, expression vector, or cell of the present invention and a pharmaceutically acceptable carrier) can be delivered directly to a subject (e.g., by local injection or systemic administration). In other embodiments, the compositions of the present invention are delivered to a host cell or population of host cells, and then the host cell or population of host cells is administered or transplanted into the subject. The host cell or population of host cells can be administered or transplanted with a pharmaceutically acceptable carrier. In some instances, epigenetic modification of the target chromatin site (e.g., of the host cell genome) has not yet been completed prior to administration or transplantation to the subject. In other instances, epigenetic modification of the target chromatin site has been completed when administration or transplantation occurs. In certain instances, progeny of the host cell or population of host cells are transplanted into the subject. In some embodiments, correct epigenetic modification of the host cell or population of host cells, or the progeny thereof, is verified before administering or transplanting cells containing modified chromatin or the progeny thereof into a subject. Procedures for transplantation, administration, and verification of correct epigenetic modification are discussed herein and will be known to one of skill in the art.
Compositions of the present invention, including cells and/or progeny thereof that have had their target chromatin sites epigenetically modified by the methods and/or compositions of the present invention, may be administered as a single dose or as multiple doses, for example two doses administered at an interval of about one month, about two months, about three months, about six months or about 12 months. Other suitable dosage schedules can be determined by a medical practitioner.
In another aspect, the present invention provides kits for producing epigenetic modifications at a target chromatin site comprising a Cas nuclease (e.g., Cas9) recognition site, the kit comprising one or more fusion proteins of the present invention. The kit may also comprise one or more nucleic acids (e.g., encoding a fusion protein of the present invention), one or more expression vectors (e.g., comprising a nucleic acid comprising a polynucleotide sequence encoding a fusion protein of the present invention), or one or more cells (e.g., transfected with a nucleic acid or expression vector) of the present invention. The kit may further comprise guide RNA (gRNA) molecule(s), or nucleic acids or expression vectors containing polynucleotide sequences encoding the gRNA molecule(s).
Kits of the present invention can be packaged in a way that allows for safe or convenient storage or use (e.g., in a box or other container having a lid). Typically, kits of the present invention include one or more containers, each container storing a particular kit component such as a reagent, a control sample, and so on. The choice of container will depend on the particular form of its contents, e.g., a kit component that is in liquid form, powder form, etc. Furthermore, containers can be made of materials that are designed to maximize the shelf-life of the kit components. As a non-limiting example, kit components that are light-sensitive can be stored in containers that are opaque.
In some embodiments, the kit contains one or more reagents. In some instances, the reagents are useful for transfecting a host cell with a nucleic acid (e.g., encoding a fusion protein of the present invention), expression vector (e.g., comprising a nucleic acid of the present invention), or a plurality thereof, and/or inducing expression from the nucleic acid(s) and/or expression vector(s). The kit may further comprise one or more reagents useful for delivering fusion proteins of the present invention into a host cell. In yet other embodiments, the kit further comprises instructions for use.
The present invention will be described in greater detail by way of a specific example. The following example is offered for illustrative purposes only, and is not intended to limit the invention in any manner.
This example demonstrates the use of fusion proteins of the present invention for producing epigenetic modifications of target chromatin sites. In particular, a broad set of epigenetic enzymes (epigenetic writers) and epigenetic recruiters (peptides or proteins recruiting chromatin modifying complexes) were investigated for their ability to produce transcriptionally repressive histone marks when fused to a catalytically inactive Cas9 (dCas9) platform. In addition to the writers of H3K9me3 (i.e., G9A, SUV39H1) and the KRAB repressor domain (6, 30), fusions to Ezh2 (i.e., a writer of H3K27me3) and to the N-terminal 45 residues of FOG1 (which has been associated with acquisition of H3K27me3 and loss of histone acetylation (31, 32)) were also created and used; these domains had not been previously investigated as dCas9 fusions. The effects of the marks introduced by these proteins on gene expression were compared to the effects of DNA methylation by dCas9-DNMT3A. This example shows that dCas9 fusions to catalytic domains of EZH2, G9A and SUV39H1, as well as dCas9 fused to the N terminus of FOG1, were sufficient for some level of repression of three different promoters in two different cell types, but that repression was not always correlated with the expected histone modification. This example also shows that the dCas9-like targeting protein dCpf1 was not able to substitute for dCas9 in these experiments. Finally, this example shows that combinations of targeted effectors were able to produce persistent silencing.
Construction of dCas9 Expression Plasmids
A variety of epigenetic effectors were fused to human codon-optimized and catalytically inactive “dead” Cas9 (dCas9) in different conformations. The improved pCDNA3-dCas9 expression plasmid was obtained by altering the original dCas9 plasmid (33) using Gibson cloning. The improved pCDNA3-dCas9 contained two nuclear localization signals (NLS), a 3× FLAG epitope tag, and [(GGS)5] (SEQ ID NO:75) amino acid linkers at the N- and C-termini of dCas9 with flanking restriction sites KpnI and NheI, respectively. The improved dCas9 protein sequence is set forth in SEQ ID NO:8. Effector domains were amplified using 2× Phusion Master Mix (New England Biolabs) according to the manufacturer's instructions. PCR primers for cDNA amplification of individual effector domains were designed with cloning vector overhangs for Gibson cloning. Primer sequences are set forth in SEQ ID NOS:11-22 and 27-34. cDNA for G9A[SET], SUV[SET], and DNMT3A was kindly provided by the lab of Marianne Rots (29, 34). The DNMT3L expression plasmid pCDNA-DNMT3L was a kind gift from Dr. Fred Chedin (35). Mouse Ezh2[FL] cDNA was synthesized by Bio Basic, Inc. Ezh2[FL] was used as a template to amplify the shorter Ezh2[SET] domain. Catalytic mutants Ezh2[SET-Y641A]-dCas9 and Ezh2[SET-Y726F]-dCas9 were created by site-directed mutagenesis using the QuikChange II XL Site-Directed Mutagenesis kit (Stratagene). The sequences of primers used for mutagenesis are set forth in SEQ ID NOS:23-26. The KRAB domain was amplified from dCas9-KRAB (33) and FOG1 cDNA was amplified from HEK293FT cells. Total RNA was isolated from HEK293FT using the RNeasy mini kit (Qiagen) and cDNA was synthesized using random hexamer primers using the RevertAid cDNA synthesis kit (ThermoScientific). Using Gibson Assembly (New England Biolabs), amplified cDNAs were cloned into either KpnI or NheI digested dCas9 for N-terminal or C-terminal fusions to dCas9, respectively. 
Finally, the FOG1 epigenetic effector construct was Gibson assembled (New England Biolabs). Protein sequences of dCas9-fusions are set forth in SEQ ID NOS:9 and 10. For arrays of two, three, and four FOG1 domains to the N-terminus of dCas9, FOG1 monomer coding sequences were amplified separately by PCR introducing a GS linker between individual monomer coding sequences and the KpnI and FseI restriction sites at the beginning of first monomer and the end of the last monomer for each array. In addition, a BsaI endonuclease site was added to either end of the FOG1 monomers and each fragment contained a distinct 4-base overhang that directed the assembly of multiple monomers. The sequences of amplification primers are set forth in SEQ ID NOS:35-42. Two, three, or four monomer coding sequences were mixed with pFusA plasmid for Golden Gate Assembly cloning with BsaI and T4 DNA ligase (New England Biolabs). DNA fragments of arrays of two, three, and four FOG1 domains were digested with KpnI and FseI and ligated into the KpnI/FseI digested dCas9 plasmid.
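As a non-limiting illustration of how distinct 4-base overhangs can direct the order of monomers in a Golden Gate assembly, the following Python sketch chains fragments by matching each fragment's right overhang to the next fragment's left overhang. The fragment names, the overhang sequences, and the function assembly_order are hypothetical and chosen for illustration only; they are not the overhangs of SEQ ID NOS:35-42.

```python
def assembly_order(fragments, start_overhang, end_overhang):
    """Return fragment names in the order dictated by their overhangs.

    fragments maps a name to (left_overhang, right_overhang).
    start_overhang and end_overhang are the overhangs exposed by the
    digested recipient plasmid (hypothetical values in this sketch).
    """
    by_left = {}
    for name, (left, right) in fragments.items():
        if left in by_left:
            # Two fragments sharing a left overhang would assemble ambiguously.
            raise ValueError("left overhangs must be distinct: " + left)
        by_left[left] = (name, right)

    order = []
    current = start_overhang
    while current != end_overhang:
        if current not in by_left:
            raise ValueError("no fragment starts with overhang " + current)
        # Removing the used fragment guarantees the loop terminates.
        name, current = by_left.pop(current)
        order.append(name)

    if by_left:
        raise ValueError("unused fragments: " + ", ".join(n for n, _ in by_left.values()))
    return order


# Hypothetical three-monomer array; overhangs are illustrative only.
fragments = {
    "FOG1_b": ("GGCT", "TTAC"),
    "FOG1_a": ("AACG", "GGCT"),
    "FOG1_c": ("TTAC", "CAGT"),
}
order = assembly_order(fragments, start_overhang="AACG", end_overhang="CAGT")
```

In this sketch the order is determined entirely by the overhangs, independent of the order in which the fragments are listed.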
The cloning vector was obtained from Addgene (36; Addgene, plasmid #41824) and was linearized using the AflII restriction enzyme. 19-bp gRNA target sequences were selected within 500 base pairs of the relevant gene promoter using the online tool CHOPCHOP (37). Each gRNA sequence was selected and incorporated into two 60-mer oligonucleotides that contained cloning vector overhangs for Gibson assembly. After annealing and extending the oligonucleotides to 100-bp, the PCR purified (PCR purification kit; QIAGEN) dsDNA was Gibson assembled into the AflII linearized plasmid. The sequences of oligomers used to create target specific vectors are set forth in SEQ ID NOS:43-45.
Construction of dCpf1 Expression Plasmids and crRNA
The inactive Cpf1 was generated by mutating the catalytic domain of AsCpf1 (D908A; (38)). This amino acid change was introduced by including the mutation in the primers used for PCR amplification with pcDNA3.1-hAsCpf1 (Addgene, plasmid #69982) as template. Primer sequences are set forth in SEQ ID NOS:49-52. Two PCR fragments were inserted into the FseI/NheI linearized pCDNA3-dCas9 backbone using Gibson assembly, thereby replacing dCas9 with dCpf1. Effector domains were then added using KpnI and/or NheI digested plasmid to generate N- and/or C-terminal dCpf1 fusions following the same principle as dCas9 fusions. This step used the same cDNA amplification primers as described for dCas9 fusions. crRNA was designed to target 23-bp adjacent to the 5′-NTTT-3′ PAM. crRNA target sequences are listed in SEQ ID NOS:46-48. For Cpf1 cleavage assays and dCpf1 ChIP assays, the U6-crRNA cassette was amplified by PCR (39). The U6-crRNA cassette was then co-transfected with dCpf1 expressing plasmids as described below. To determine repression by dCpf1 fusion proteins, plasmids containing the U6-crRNA cassette were coexpressed with plasmids expressing dCpf1 fusions (40).
The human colon cancer cell line HCT116 (ATCC #CCL-247) was grown in McCoy's 5A Medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin. Cells were maintained at 37° C. and 5% CO2. HCT116 cells were authenticated by the Bioreagent and Cell Culture Core, USC Norris Comprehensive Cancer Center. Cells of 50-60% confluency were transfected using Lipofectamine 3000 (Life Technologies) following the manufacturer's instructions. Transfections for RNA extraction were performed in 12-well plates using 625 ng dCas9 expression vector, 500 ng of equimolar pooled expression vectors, and 125 ng pBABE-puro. Transfections with dCpf1 were carried out using the same protocol except that U6-crRNA expressing plasmids were co-transfected with dCpf1 expressing plasmids as described elsewhere (39). For ChIP assays and DNA-methylation analysis, cells were plated in 10-cm2 culture dishes and transfection was scaled up accordingly. Transfection medium was replaced 24 hours post-transfection with growth medium containing 3 μg/mL puromycin to enrich for transfected cells. Subsequently, puromycin-containing media was exchanged every 24 hours. To assay for persistent repression, media was switched to standard growth media four days after transfection.
Transfected cells were rinsed in 1× DPBS and RNA stabilized by adding 500 μg RNAlater (Ambion) and stored at 4° C. for up to one week. Total RNA was extracted 3-4 days after transfection using the RNeasy Mini kit (QIAGEN) and 500 ng RNA were reverse-transcribed using the SuperScript VILO MasterMix (Invitrogen) according to the manufacturer's instructions. Real-time PCR was performed in triplicate with 2× iQ SYBR mix (BioRad) using the CFX384 Real-Time System C1000 Touch Thermo Cycler (BioRad) and the included software was used to extract raw Cq values. Gene expression analysis was performed with GAPDH as a reference gene in at least two biological replicates using intron-spanning HER2 primers (HER2-F 5′-GGGAAACCTGGAACTCACCT-3′ (SEQ ID NO:53); HER2-R 5′-GACCTGCCTCACTTGGTTGT-3′ (SEQ ID NO:54)), EPCAM primers (EPCAM-F 5′-CTGGCCGTAAACTGCTTTGT-3′ (SEQ ID NO:55); EPCAM-R 5′-TCCCAAGTTTTGAGCCATTC-3′ (SEQ ID NO:56)), MYC primers (MYC-F 5′-AAACACAAACTTGAACAGCTAC-3′ (SEQ ID NO:57); MYC-R 5′-ATTTGAGGCAGTTTACATTATGG-3′ (SEQ ID NO:58)) and GAPDH primers (GAPDH-F 5′-AATCCCATCACCATCTTCCA-3′ (SEQ ID NO:59); GAPDH-R 5′-CTCCATGGTGGTGAAGACG-3′ (SEQ ID NO:60)). Relative target gene expression was calculated as the difference between the target gene and the GAPDH reference gene (i.e., dCq=Cq[target]−Cq[GAPDH]). Gene expression results are indicated as fold change relative to a reference sample (usually dCas9 without any effector domain), using the ddCq method. A one-way ANOVA (ANalysis Of VAriance) with post-hoc Tukey HSD (Honestly Significant Difference) test was applied to determine statistical significance for different dCas9 fusions.
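As a non-limiting illustration of the ddCq calculation described above, the following Python sketch computes dCq for a sample and for a reference sample and converts the difference to a fold change. It assumes an amplification efficiency of approximately 100% (i.e., a base of 2), which is an assumption of this sketch; the Cq values and function names are hypothetical and are not measured data.

```python
from statistics import mean


def delta_cq(cq_target, cq_reference_gene):
    """dCq = Cq[target] - Cq[GAPDH], using the mean of technical replicates."""
    return mean(cq_target) - mean(cq_reference_gene)


def fold_change(dcq_sample, dcq_reference_sample):
    """Fold change by the ddCq method, assuming a doubling per cycle."""
    ddcq = dcq_sample - dcq_reference_sample
    return 2.0 ** (-ddcq)


# Hypothetical triplicate Cq values (illustrative only).
# Reference sample: dCas9 without any effector domain.
dcq_no_effector = delta_cq([24.0, 24.1, 23.9], [18.0, 18.0, 18.0])
# Test sample: a dCas9 fusion protein.
dcq_fusion = delta_cq([26.0, 26.1, 25.9], [18.0, 18.0, 18.0])

# A value below 1 indicates reduced expression relative to the reference sample.
fc = fold_change(dcq_fusion, dcq_no_effector)
```

With these hypothetical values the target Cq is about two cycles later in the fusion sample, corresponding to roughly a four-fold reduction in expression.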
For ChIP assays of histone marks, transfected cells were cross-linked 3-4 days post transfection by incubation with 1% formaldehyde solution for 10 minutes at room temperature and the reaction was stopped by the addition of glycine to a final concentration of 125 mM. Cross-linked cell pellets were stored at −80° C. Chromatin was extracted and ChIP was performed using StaphA cells (Sigma-Aldrich, St. Louis, MO, USA) to collect the immunoprecipitates as previously described (33,41). Briefly, chromatin was sheared to an average fragment size of 500-bp using a Bioruptor 2000 (Diagenode). 10 μg chromatin were used per ChIP assay. ChIP enrichment was performed by incubation with 3 μg H3K9me3 antibody (Abcam ab8898), 3 μg H3K9me2 antibody (MP 07-441), 2 μg H3K27me3 antibody (MP 07-449), 2 μg H3K27ac antibody (Active Motif #39133), or 2 μg normal rabbit IgG (Abcam ab46540) for 16 hours at 4° C. Immuno complexes were bound to StaphA cells for 15 minutes at room temperature. For dCpf1 and dCas9 ChIP assays, HCT116 cells were transfected in 10 cm culture dishes as described above, but puromycin selection was omitted. After cross-linking of chromatin, ChIP assays were performed using 3 μg FLAG antibody (SIGMA M2 F1804) at 4° C. overnight. Immuno complexes were captured with 3 μg rabbit anti-mouse antibody for 1 hour at 4° C. and were bound to StaphA cells for 15 minutes at room temperature. After washing and reversal of cross-links, DNA was purified using the QIAquick PCR Purification Kit (Qiagen). ChIP-DNA and diluted input control were used for subsequent qPCR reactions with 2× SYBR FAST Master Mix (KAPA Biosystems) according to the manufacturer's recommendations using the CFX384 Real-Time System C1000 Touch Thermo Cycler (BioRad). ChIP enrichment was calculated relative to input samples using the dCq method (i.e., dCq=Cq[HER2-ChIP]−Cq[input]).
HER2 ChIP amplification primers are as follows: HER2-ChIP-F (5′-TTGGAATGCAGTTGGAGGGG-3′ (SEQ ID NO:61)) and HER2-ChIP-R (5′-GGTTTCTCCGGTCCCAATGG-3′ (SEQ ID NO:62)). A one-way ANOVA (ANalysis Of VAriance) with post-hoc Tukey HSD (Honestly Significant Difference) test was applied to determine statistical significance for different dCas9 fusions.
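As a non-limiting illustration of the dCq-based ChIP enrichment calculation, the following Python sketch expresses ChIP signal as a percentage of input and then as a fold enrichment relative to a control sample (e.g., dCas9 without an effector domain). The input dilution (input_fraction), the assumption of a doubling per cycle, and all Cq values are hypothetical and are not taken from the experiments described herein.

```python
def percent_input(cq_chip, cq_input, input_fraction=0.01):
    """Percent of input recovered in the ChIP sample.

    dCq = Cq[ChIP] - Cq[input]; input_fraction is the (hypothetical)
    fraction of chromatin represented by the diluted input control.
    """
    dcq = cq_chip - cq_input
    return 100.0 * input_fraction * 2.0 ** (-dcq)


def fold_enrichment(sample_percent, control_percent):
    """Enrichment of a fusion-protein sample over the control sample."""
    return sample_percent / control_percent


# Hypothetical Cq values (illustrative only).
control = percent_input(cq_chip=28.0, cq_input=25.0)  # dCas9, no effector domain
sample = percent_input(cq_chip=26.0, cq_input=25.0)   # dCas9 fusion protein
fold = fold_enrichment(sample, control)
```

Because the same input dilution applies to both samples, the dilution factor cancels in the fold enrichment; it matters only when percent-of-input values are reported directly.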
Genomic DNA from transfected and untreated cells was isolated using the Quick-gDNA MiniPrep kit (ZYMO). Bisulfite conversion was performed using the EZ DNA Methylation-Lightning Kit (ZYMO) following the manufacturer's instructions. Bisulfite-Sequencing PCR primers (HER2-BSP-F 5′-GGAGGGGGTAGAGTTATTAGTTTTT-3′ (SEQ ID NO:63) and HER2-BSP-R 5′-AAATAACAACTCCCAACTTCACTTT-3′ (SEQ ID NO:64)) were designed using MethPrimer (42). Bisulfite converted DNA was used for PCR amplification with GoTaq polymerase (Promega) and the 152-bp PCR product was purified with the QIAquick PCR Purification Kit (Qiagen). Amplicons were inserted into the pCR4-TOPO TA vector using the TOPO-TA-cloning kit (ThermoFisher) and transformed into NEB5α competent cells. Plasmid DNA from individual recombinant clones was isolated and subjected to Sanger sequencing using M13F primers at the College of Biological Sciences UC DNA Sequencing Facility. Methylation status of CpGs for each clone was determined by sequence comparison.
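As a non-limiting illustration of determining CpG methylation status by sequence comparison, the following Python sketch compares a bisulfite-converted clone read to the unconverted reference amplicon. After bisulfite conversion and PCR, an unmethylated cytosine reads as T, while a methylated cytosine remains C. The sketch assumes the read is already aligned to the reference on the same strand and has the same length; the sequences and function names are hypothetical and are not the HER2 amplicon.

```python
def cpg_positions(reference):
    """Indices of the cytosine in every CpG dinucleotide of the reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def call_methylation(reference, read):
    """Call each reference CpG as methylated, unmethylated, or ambiguous."""
    if len(reference) != len(read):
        raise ValueError("read must be aligned to the reference (equal length)")
    calls = {}
    for i in cpg_positions(reference):
        if read[i] == "C":
            calls[i] = "methylated"      # C protected from conversion
        elif read[i] == "T":
            calls[i] = "unmethylated"    # C converted and read as T
        else:
            calls[i] = "ambiguous"       # sequencing error or mismatch
    return calls


# Hypothetical 10-nt reference with CpGs at indices 2 and 7.
reference = "ATCGGTCCGA"
# Hypothetical clone read: first CpG methylated, second CpG unmethylated,
# and the non-CpG cytosine at index 6 converted to T.
read = "ATCGGTTTGA"
calls = call_methylation(reference, read)
methylated_fraction = sum(v == "methylated" for v in calls.values()) / len(calls)
```

Repeating the call for each sequenced clone gives a per-CpG methylation frequency across clones.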
For the pPGK-mCherry reporter plasmid, the Cpf1 nuclease binding site (crRNA binding region on HER2 promoter) was inserted between XhoI/BamHI sites, which are flanked by 200-bp direct repeats derived from mCherry as single-strand annealing (SSA) arms (43). The ORF of the mCherry gene was interrupted by the insertion of the relevant binding region and a series of three stop codons (
Transfected cells were lysed 48 hours post transfection in 1× RIPA buffer (Millipore) supplemented with protease inhibitor cocktail (Roche). Protein concentrations were determined by Bradford assay (BioRad) and 20 μg protein were separated on a 4-15% TGX gel (BioRad) in Tris/Glycine/SDS buffer and transferred onto nitrocellulose membranes. Protein loading was evaluated by Ponceau S stain. After rinsing the membrane with deionized water, non-specific antigen binding was blocked in TBST (50 mM Tris, 150 mM NaCl, and 0.1% Tween-20) with 5% nonfat dry milk (Cell Signaling). Membranes were incubated with primary antibody in blocking solution at 4° C. overnight. Monoclonal antibodies against FLAG (1:1000; SIGMA M2 F1804) or beta-actin (1:2500; SIGMA A5441) were used. Membranes were washed with TBST three times for 10 minutes and then incubated with HRP-conjugated anti-mouse secondary antibody at room temperature. After 45 minutes, the membrane was washed three times in TBST and proteins were visualized with Amersham ECL Prime Western Blotting Detection Reagent (GE Healthcare) and autoradiography film.
Systematic Evaluation of Repression by dCas9 Fused to Catalytic Domains of Histone Lysine Methyltransferases G9A and SUV39H1
Epigenetic effector domains for H3K9 methylation have been previously fused to artificial zinc finger proteins (ZFP) to affect transcriptional regulation in a targeted manner. More specifically, the C-terminal end of ZFP E2C, which targets the HER2 promoter, had been previously fused to the catalytic SET domains of the histone methyltransferases G9A or SUV39H1 (herein referred to as G9A[SET] and SUV[SET], respectively;
Repression by dCas9-SUV[SET] Does Not Require Trimethylation of H3K9 at the HER2 Gene Promoter
To determine whether repression by G9A[SET]-dCas9 and SUV[SET]-dCas9 was associated with the trimethylation of H3K9, histone ChIP-qPCR assays were performed to quantitatively measure H3K9me3 enrichment at the HER2 promoter. ChIP enrichment was evaluated relative to dCas9 that did not contain an effector domain. G9A[SET]-dCas9 co-transfected with the three guide RNAs produced a 13-fold increase in H3K9 trimethylation compared to dCas9 with no ED (Tukey HSD test, P<0.05;
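Fold-enrichment values of this kind can be derived from raw qPCR cycle thresholds (Ct) in more than one way, and the calculation used for the assays is not stated here. The following is a minimal sketch that assumes the common percent-input method, with enrichment expressed relative to the dCas9 control lacking an effector domain. All Ct values, the 1% input fraction, and the function names are hypothetical.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float = 0.01) -> float:
    """Percent of input recovered in the IP, assuming 100% PCR efficiency.

    The input Ct is first adjusted to what it would be if the whole
    chromatin amount (rather than input_fraction of it) had been measured.
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def fold_enrichment(sample_pct: float, control_pct: float) -> float:
    """Enrichment of the sample relative to the no-effector control."""
    return sample_pct / control_pct


# Hypothetical Ct values for an H3K9me3 ChIP at one promoter amplicon.
control = percent_input(ct_ip=28.0, ct_input=24.0)  # dCas9, no effector domain
sample = percent_input(ct_ip=26.0, ct_input=24.0)   # dCas9 fused to an effector

print(f"control: {control:.4f}% of input")
print(f"sample:  {sample:.4f}% of input")
print(f"fold enrichment: {fold_enrichment(sample, control):.1f}")
```

Because each PCR cycle corresponds to a doubling, a two-cycle difference in the IP Ct at equal input corresponds to a four-fold difference in enrichment under these assumptions.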
Full-Length Histone Methyltransferase Ezh2 is Required for H3K27 Methylation, but H3K27me3 is Not Correlated with Repressive Activity
H3K9me3 and H3K27me3 mark distinct regions in the genome (45); H3K9me3 is a mark typical of constitutive heterochromatin, while H3K27me3 is usually enriched on facultative heterochromatin (2, 46). Since enzymes mediating the repressive H3K27me3 mark had not yet been targeted to a specific genomic locus by dCas9, dCas9 N-terminal fusions were created with the full-length mouse methyltransferase (Ezh2[FL]), as well as a truncated form (Ezh2[SET]) containing the CXC and SET domains (aa482-746) but lacking some of the N-terminal domains (
Taken together, these results supported the hypothesis that neither H3K9me3 nor H3K27me3 needs to precede repression or is causative for it. A possible non-epigenetic mechanism for repression was simple steric interference with endogenous regulatory components by the binding of the dCas9-ED fusions. dCas9 alone did not cause repression by this mechanism, as cells transfected with only an mCherry expression plasmid displayed HER2 expression at a level similar to that of cells transfected with dCas9 with no ED (
dCas9-FOG1[1-45] is a Novel and Efficient Transcriptional Repressor Producing H3K27 Trimethylation
As an alternative to the “direct tethering” of the H3K27me3 methyltransferase Ezh2 (
Since FOG1[1-45]-dCas9-FOG1[1-45] (also referred to herein as dCas9-FOG1[N+C]) showed the strongest repression at the HER2 target locus, ChIP-qPCR assays were performed to determine enrichment of the histone marks H3K27ac and H3K27me3. While the effect on H3K27ac was not significant (Tukey test, P=0.07), H3K27me3 was increased 5.8-fold (Tukey test, P<0.01;
The effect of targeted epigenetic reprogramming is influenced by factors such as epigenetic marks, three-dimensional interactions (e.g., between a promoter and an enhancer, or localization of the DNA region to a subnuclear compartment such as a transcriptional factory), and initial expression levels, which in some instances are locus- and cell-type dependent. Therefore, seven epigenetic modifiers at the HER2, MYC, and EPCAM promoters were investigated in HCT116 and HEK293T cells. To be more comprehensive in the comparison of epigenetic modifiers having a common dCas9 architecture, the additional constructs KRAB-dCas9 and DNMT3A-dCas9 were created. The Krüppel-associated box (KRAB) domain is a commonly used repression domain that, like FOG1, acts by the recruitment of chromatin-modifying complexes. The KRAB domain achieves repression in association with the recruitment of the KAP1 co-repressor complex and is associated with H3K9me3 deposition (27). The DNMT3A repression domain extended the toolbox to include targeted de novo DNA methylation (16-21). As reported in previous studies (16, 17, 22, 25, 48), KRAB-dCas9 caused trimethylation of H3K9 and DNMT3A-dCas9 induced DNA methylation at the targeted HER2 promoter (
Next, dCas9 fusions were tested at different gene promoters. Very modest or no repressive activity was observed at the MYC promoter in HCT116 cells (Tukey HSD Test, P<0.05 and P<0.01;
Effector Fusions to the Catalytically Inactive Cpf1 (dCpf1) Are Not Effective
To guide different epigenetic effector domains to unique sites within the same or different regulatory elements, it is useful to employ orthogonal programmable DNA-binding platforms. The RNA-guided endonuclease Cpf1, a type V CRISPR/Cas system, offers a genome editing alternative to the type II CRISPR/Cas9 endonuclease (39, 50, 51). Unlike Cas9, for which a CRISPR targeting RNA and a trans-activating RNA are combined to form a guide RNA, Cpf1 requires only a single CRISPR gRNA (crRNA). Acidaminococcus (As)Cpf1 efficiently cleaves target DNA adjacent to a short T-rich PAM recognition site (5′-TTTN-3′), whereas Streptococcus pyogenes (Sp)Cas9 requires a G-rich PAM site (5′-NGG-3′); in principle, the use of both nucleases therefore broadens the number and diversity of target sites in the genome that are accessible to precise gene editing. Since the goal is to develop tools that target the epigenome, but do not cleave the target DNA, a catalytically “dead” Cpf1 [D908A] (dCpf1;
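The difference between the two PAM requirements can be illustrated with a short script. The following is a minimal sketch, not part of the reported methods: it locates candidate PAM occurrences for AsCpf1 (5′-TTTN-3′) and SpCas9 (5′-NGG-3′) on both strands of a DNA sequence. The function names and the toy sequence are hypothetical. The sketch only finds PAM motifs; actual guide design would also account for the protospacer itself (the TTTN PAM lies 5′ of the Cpf1 protospacer, whereas the NGG PAM lies 3′ of the Cas9 protospacer) and for off-target considerations.

```python
import re
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase/lowercase A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_pams(seq: str, pam: str) -> List[int]:
    """0-based start positions of every PAM match on the given strand.

    'N' in the PAM matches any base; a lookahead is used so that
    overlapping matches are all reported.
    """
    pattern = "(?=(" + pam.upper().replace("N", "[ACGT]") + "))"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def count_pam_sites(seq: str) -> Dict[str, int]:
    """Count PAM motifs on both strands for the two nucleases."""
    rc = reverse_complement(seq)
    return {
        "AsCpf1 (TTTN)": len(find_pams(seq, "TTTN")) + len(find_pams(rc, "TTTN")),
        "SpCas9 (NGG)": len(find_pams(seq, "NGG")) + len(find_pams(rc, "NGG")),
    }


# Toy sequence for illustration only; it is not a HER2 promoter sequence.
toy = "TTTACGGAAAC"
print(find_pams(toy, "TTTN"))
print(find_pams(toy, "NGG"))
print(count_pam_sites(toy))
```

Positions on the reverse strand are reported in reverse-complement coordinates; converting them back to forward-strand coordinates is left out to keep the sketch short.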
EZH2[FL]-dCas9 and DNMT3A-dCas9 Establish Persistent Repression, while FOG1[1-45]-dCas9-FOG1[1-45] and KRAB-dCas9 Drive Robust Transient Repression
Next, it was tested whether transient expression of dCas9 fusion proteins could cause persistent HER2 gene repression and if combinations of dCas9 fusion proteins could increase transient and/or persistent downregulation of HER2 expression. Transient repression was measured four days after transfection under puromycin selection to enrich for transfected cells, while the persistent effect was determined after cells were grown for an additional ten days in puromycin-free media (
Precise control of transcription and epigenetics at a defined genomic locus provides an ability to dissect links between the two processes in a way not formerly possible. In this study, a set of epigenome editing tools was generated to deposit epigenetic marks typically associated with a repressed chromatin state, including DNA methylation and histone methylation (both H3K9me3 and H3K27me3). The epigenetic fusions of dCas9 with histone methyltransferases (HMT) described herein complement recently described epigenetic editing tools, which have been mostly focused on DNA methylation and demethylation (16-21). The present study made use of a dCas9 architecture and assayed a broad assortment of epigenetic effector domains at three loci in two cell types. Direct enzyme tethering vs. co-repressor recruitment strategies were also examined.
The major finding of this study was that transcriptional repression was independent of deposition of the expected repressive chromatin mark. While dCas9 alone did not produce repression, evidence from Ezh2[SET]-dCas9 catalytic mutants (
The KRAB domain achieves repression in association with recruitment of the KAP1 co-repressor complex, which contains the histone methyltransferase SETDB1, initiating trimethylation of H3K9 (27). The histone methyltransferases SUV39H1 and G9A have also been associated with H3K9me3. In contrast, the two new functional domains introduced in this study, Ezh2 and FOG1, are both associated with H3K27me3. Ezh2 is a catalytic component of the PRC2 complex responsible for H3K27me2/3. GATA-1 and its cofactor Friend of GATA-1 (FOG1) bind to their genomic targets and repress gene expression through recruitment of the nucleosome remodeling and deacetylase (NuRD) complex. In biochemical studies, FOG1[1-45] has been shown to interact with several proteins that are part of the NuRD complex, such as the histone deacetylases HDAC1/2, as well as CHD4, MBD2/3, MTA-1, and MTA-2 (31, 54). NuRD-mediated deacetylation of H3K27 in turn allows for H3K27 trimethylation by the PRC2 complex (32-55). In the studies described herein, FOG1[1-45]-dCas9-FOG1[1-45] showed the strongest repression at the HER2 target locus compared to any of the other effector domains tested, and also provided strong deposition of H3K27me3. These findings present FOG1[1-45]-dCas9-FOG1[1-45] as a newly described, highly efficient transcriptional repressor associated with H3K27 trimethylation.
The catalytic domains for Ezh2, G9A and SUV39H1 have been mapped to their C-terminal SET domains (30, 47). G9A[SET]-dCas9 was able to deposit H3K9me3 and a full-length Ezh2[FL]-dCas9 was able to deposit H3K27me3; however, SUV[SET]-dCas9 and Ezh2[SET]-dCas9 were not able to deposit their expected marks. These observations indicate that the SET domains of SUV and Ezh2 are not sufficient for H3K9 or H3K27 trimethylation, and that other parts of the full-length proteins may also be required for histone methylation, at least in the context of dCas9 fusion proteins. Perhaps this is not unexpected, as other domains of the Ezh2 protein are important for interaction with members of the PRC2 complex, such as Suz12 and EED, as well as other epigenetic modifying enzymes such as DNA methyltransferases (56, 57). It should also be noted that SUV39H1 has Glu-repeat, Cys-repeat, Ankyrin, and chromodomain domains upstream of the SET domain (30), which may be important for catalytic (epigenetic writing) activity.
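Truncated constructs of this kind are defined by residue ranges; Ezh2[SET], for instance, is described herein as spanning aa482-746. The following minimal sketch shows how such a range, given in 1-based inclusive numbering, maps onto a sequence string. The function name is hypothetical, and the sequence used is a placeholder of arbitrary residues rather than the Ezh2 sequence.

```python
def slice_residues(protein: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError(f"range {start}-{end} is outside 1-{len(protein)}")
    return protein[start - 1:end]


# Placeholder sequence only; it is NOT the Ezh2 sequence. Its length is
# chosen so that the residue range 482-746 is valid.
placeholder = "A" * 481 + "S" * 265

fragment = slice_residues(placeholder, 482, 746)
print(len(fragment))  # number of residues in the 482-746 range
```

The inclusive convention means that a range of 482-746 contains 746 - 482 + 1 residues, which is a frequent source of off-by-one errors when constructs are designed from annotated domain boundaries.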
Two strategies can be used to epigenetically repress a specific endogenous gene: 1) direct targeting of a chromatin modifying enzyme itself to DNA or 2) recruitment of a chromatin remodeling complex that contains several enzymatic capabilities. Although in nature, epigenetic enzymes are rarely attached to DNA-binding domains directly, the results presented here using the enzymatic domains of EZH2, SUV, and G9A, as well as those of several other studies (16, 17, 22-24), suggest that the first strategy can be effective experimentally. The novel transcriptional repressor consisting of dCas9 fused to FOG1[1-45] is an example of the alternative repression strategy based on recruitment of a co-repressor, as opposed to fusion of an enzymatic component to dCas9. In addition to any functional advantages (e.g., improved target-gene repression), the use of a short peptide is less likely to interfere with endogenous regulatory factors at the promoter than the direct tethering of large enzymes. It also provides an opportunity to increase its effect by multiplexing the short interaction peptides, such as is frequently done with the herpes simplex VP16 activation domain to produce the more effective VP64 (58, 59). However, the data demonstrate that some configurations of arrayed repeats can actually reduce protein expression, which could have accounted for the reduced repression of the tandem FOG1 arrays.
The toolbox of epigenetic editors described herein was found to have locus- and cell-type dependent effects on transcriptional repression, ranging from almost no significant repression by any factor at the MYC promoter in HCT116 cells to nearly 10-fold repression by one factor at MYC in HEK293T cells. HCT116 is a colon cancer cell line that contains amplified regions in the genome resulting in additional copies of affected genes. The MYC gene is located in such an amplified region in HCT116 cells and is thus present in three copies, while there are two copies of the EPCAM and HER2 genes. It cannot be concluded whether the lack of repression is a cell-type specific phenomenon, per se, or if it is more difficult to achieve repression in the presence of additional MYC gene copies. The effect of targeted epigenetic reprogramming might also be influenced by existing epigenetic marks, three-dimensional interactions, and initial expression levels, as well as other factors.
Surprisingly, none of the dCpf1-effector domain fusions had an effect on gene expression, despite evidence of binding to the DNA target sites. In contrast to the G-rich PAM site (5′-NGG-3′) required by the Streptococcus pyogenes (Sp)Cas9, Acidaminococcus sp. BV3L6 (As)Cpf1 is an RNA-guided nuclease that can use a short T-rich PAM recognition site (5′-TTTN-3′) (39, 50). Targeting both T-rich as well as C-rich chromatin regions would broaden the number of target sites in the genome that are accessible to epigenetic editing, and would have been a useful orthogonal platform for targeting different effectors to the same gene or simultaneously activating and repressing different genes in the same cell. Notably, there have not been any reports of dCpf1-based activators (e.g., VP64) or repressors (e.g., KRAB) in mammalian cells. In Arabidopsis, fusions of catalytically inactive Cpf1 (AsCpf1[D908A] and LbCpf1[D832A]) with three copies of the SRDX repressor domain were used to repress a noncoding RNA (60). Unfortunately, the dCpf1 used here was not suitable for targeted transcriptional regulation. It is also noted that Ezh2[SET]-dCas9 was observed to produce gene repression through a non-catalytic process such as steric hindrance (
In addition to orthogonal gene regulation, epigenetic editing is useful for effecting persistent changes in gene expression without altering genetic sequence. In nature, H3K9me3 and H3K27me3 are often associated with silenced states of genes and other elements that are stable over the lifetime of an individual. However, far less is known about the transitions between active and silenced states. It has been shown that targeting DNMT3A to a gene promoter can be sufficient to achieve persistent gene silencing (16, 61, 62). Although targeting DNMT3A results in methylation at the target site, it has been found that the downregulation of gene expression is often modest (17, 48). In certain cell types, targeting KRAB and DNMT3L in addition to DNMT3A was required for persistent gene silencing (61). However, KRAB-dCas9 had no effect on promoting persistent silencing in the present study, while the dCas9 fusion with the epigenetic writer of H3K27me3 (Ezh2[FL]) facilitated persistence.
Targeting epigenetic modifying enzymes allowed for the interrogation of the causal relationship between the epigenetic marks and gene expression at the target site. Surprisingly, it was found that deposition of the expected histone modification was not sufficient for transcriptional repression. This result was similar to a previous finding that the level of H3K27ac at an enhancer region was not correlated with the activity of that enhancer in its endogenous genomic context (63). The present study has expanded the list of tools available for epigenetic editing (6) to include new targeted tools to deposit H3K27me3. However, almost all targeted epigenetic modifiers reported to date have fallen well short of producing the dramatic differences in the level of gene repression observed in natural epigenetic states.
It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, patent applications, and sequence accession numbers cited herein are hereby incorporated by reference in their entirety for all purposes.
PSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYT
RRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIF
GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFR
GHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAIL
SARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLA
EDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI
LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKRKVGL
PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD
NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEV
VDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK
VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFK
KIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE
DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGR
LSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI
QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH
PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQ
SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL
DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSE
QEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV
WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK
LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE
LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENG
RKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ
KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATL
IHQSITGLYETRIDLSQLGGDGGSGGSGGSGGSGGSASGGGSGGGSK
KVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRR
YTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHP
IFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIK
FRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAK
AILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNF
DLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAIL
LSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKR
KVGLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDG
TEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYP
FLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWN
FEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE
LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKED
YFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENED
ILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGW
GRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK
EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKV
MGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI
LKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVD
AIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWR
QLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITK
HVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKV
REINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRK
MIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNG
ETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILP
KRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKK
LKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSL
FELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGS
PEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYN
KHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEV
LDATLIHQSITGLYETRIDLSQLGGD
GGSGGSGGSGGSGGSASGGGS
PSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYT
RRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIF
GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFR
GHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAIL
SARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLA
EDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI
LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKRKVGL
PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD
NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEV
VDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK
VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFK
KIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE
DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGR
LSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI
QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH
PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQ
SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL
DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSE
QEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV
WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK
LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE
LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENG
RKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ
KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATL
IHQSITGLYETRIDLSQLGGD
GGSGGSGGSGGSGGSASZTSGGGSGG
SSNRQKILERTETLNQEWKQRRIQPVHIMTSVSSLRGTRECSVTSD
LDFPAQVIPLKTLNAVASVPIMYSWSPLQQNFMVEDETVLHNIPYM
GDEVLDQDGTFIEELIKNYDGKVHGDRECGFINDEIFVELVNALGQ
YNDDDDDDDGDDPDEREEKQKDLEDNRDDKETCPPRKFPADKIFE
AISSMFPDKGTAEELKEKYKELTEQQLPGALPPECTPNIDGPNAKS
VQREQSLHSFHTLFCRRCFKYDCFLHPFHATPNTYKRKNTETALD
NKPCGPQCYQHLEGAKEFAAALTAERIKTPPKRPGGRRRGRLPNN
SSRPSTPTISVLESKDTDSDREAGTETGGENNDKEEEEKKDETSSSS
EANSRCQTPIKMKPNIEPPENVEWSGAEASMFRVLIGTYYDNFCAI
ARLIGTKTCRQVYEFRVKESSIIAPVPTEDVDTPPRKKKRKHRLWA
AHCRKIQLKKDGSSNHVYNYQPCDHPRQPCDSSCPCVIAQNFCEK
FCQCSSECQNRFPGCRCKAQCNTKQCPCYLAVRECDPDLCLTCGA
ADHWDSKNVSCKNCSIQRGSKKHLLLAPSDVAGWGIFIKDPVQKN
EFISEYCGEIISQDEADRRGKVYDKYMCSFLFNLNNDFVVDATRKG
NKIRFANHSVNPNCYAKVMMVNGDHRIGIFAKRAIQTGEELFFDY
RYSQADALKYVGIEREMEIP
STGGSGGSGGSGGSGGSGRPMDKKYSI
GLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF
DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFH
RLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDS
TDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQT
YNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLF
GNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGD
QYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQD
LTLLKALVRQQKKKRKVGLPEKYKEIFFDQSKNGYAGYIDGGASQ
EEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIH
LGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRF
AWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKV
LPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLL
FKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLL
KIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDD
KVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFAN
RNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKG
ILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRER
MKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVD
QELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPS
EEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGF
IKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKL
VSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESE
FVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA
NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKT
EVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVL
VVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEV
KKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFL
YLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILA
DANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDT
TIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD
GGSGGSGG
SGGSGGSASGGGSGGGSKRPAATKKAGQAKKKKGGSGSGATNFSLLKQ
HPRQPCDSSCPCVIAQNFCEKFCQCSSECQNRFPGCRCKAQCNTK
QCPCYLAVRECDPDLCLTCGAADHWDSKNVSCKNCSIQRGSKKH
LLLAPSDVAGWGIFIKDPVQKNEFISEYCGEIISQDEADRRGKVYD
KYMCSFLFNLNNDFVVDATRKGNKIRFANHSVNPNCYAKVMMVN
GDHRIGIFAKRAIQTGEELFFDYRYSQADALKYVGIEREMEIP
STGG
SGGSGGSGGSGGSGRPMDKKYSIGLAIGTNSVGWAVITDEYKVPSK
KFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRR
KNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNI
VDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGH
FLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSA
RLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAE
DAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDIL
RVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKRKVGLP
EKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELL
VKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDN
REKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVV
DKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKV
KYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI
ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDI
VLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLS
RKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQ
KAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRH
KPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHP
VENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQS
FLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL
DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSE
QEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV
WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK
LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE
LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENG
RKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ
KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATL
IHQSITGLYETRIDLSQLGGD
GGSGGSGGSGGSGGSASGGGSGGGSK
RPAATKKAGQAKKKKGGSGSGATNFSLLKQAGDVEENPGPAAA
P
STGGSGGSGGSGGSGGSGRPMDKKYSIGLAIGTNSVGWAVITDEY
KVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRR
YTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHP
IFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIK
FRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAK
AILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNF
DLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAIL
LSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKR
KVGLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDG
TEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYP
FLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWN
FEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE
LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKED
YFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENED
ILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGW
GRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK
EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKV
MGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI
LKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVD
AIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWR
QLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITK
HVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKV
REINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRK
MIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNG
ETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILP
KRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKK
LKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSL
FELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGS
PEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYN
KHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEV
LDATLIHQSITGLYETRIDLSQLGGD
GGSGGSGGSGGSGGSASGGGS
TPRQNLKCVRILKQFHKDLERELLRRHHRSKTPRHLDPSLANYLV
QKAKQRRALRRWEQELNAKRSHLGRITVENEVDLDGPPRAFVYIN
EYRVGEGITLNQVAVGCECQDCLWAPTGGCCPGASLHKFAYNDQ
GQVRLRAGLPIYECNSRCRCGYDCPNRVVQKGIRYDLCIFRTDDG
RGWGVRTLEKIRKNSFVMEYVGEIITSEEAERRGQIYDRQGATYLF
DLDYVEDVYTVDAAYYGNISHFVNHSCDPNLQVYNVFIDNLDERLP
RIAFFATRTIRAGEELTFDYNMQVDPVDMESTRMDSNFGLAGLPG
SPKKRVRIECKCGTESCRKYLF
STGGSGGSGGSGGSGGSGRPMDKK
YSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGA
LLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDS
FFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKL
VDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQL
VQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKK
NGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLA
QIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDE
HHQDLTLLKALVRQQKKKRKVGLPEKYKEIFFDQSKNGYAGYID
GGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSI
PHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLAR
GNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNL
PNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKA
IVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGT
YHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA
HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKS
DGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSP
AIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQK
NSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGR
DMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGK
SDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSE
LDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVI
TLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKY
PKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFK
TEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVN
IVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTV
AYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK
GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPS
KYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEF
SKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPA
AFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD
G
GSGGSGGSGGSGGSASGGGSGGGSKRPAATKKAGQAKKKKGGSGSGAT
GANPELRNKEGDTAWDLTPERSDVWFALQLNRKLRLGVGNRAIR
TEKIICRDVARGYENVPIPCVNGVDGEPCPEDYKYISENCETSTMNI
DRNITHLQHCTCVDDCSSSNCLCGQLSIRCWYDKDGRLLQEFNKIE
PPLIFECNQACSCWRNCKNRVVQSGIKVRLQLYRTAKMGWGVRA
LQTIPQGTFICEYVGELISDAEADVREDDSYLFDLDNKDGEVYCIDA
RYYGNISRFINHLCDPNIIPVRVFMLHQDLRFPRIAFFSSRDIRTGEE
LGFDYGDRFWDIKSKYFTCQCGSEKCKHSAEAIALEQSRLARLDP
HPELLPELGSLPPVN
STGGSGGSGGSGGSGGSGRPMDKKYSIGLAIG
TNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGET
AEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES
FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKAD
LRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLF
EENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIA
LSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYAD
LFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLL
KALVRQQKKKRKVGLPEKYKEIFFDQSKNGYAGYIDGGASQEEFY
KFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGEL
HAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWM
TRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKH
SLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTN
RKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIK
DKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVM
KQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNF
MQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQ
TVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKR
IEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQEL
DINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEV
VKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKR
QLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSD
FRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVY
GDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGE
IRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQ
TGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVA
KVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD
LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA
SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADAN
LDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDR
KRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD
GGSGGSGGSGGS
GGSASGGGSGGGSKRPAATKKAGQAKKKKGGSGSGATNFSLLKQAGD
TGLLVLKDLGIQVDRYIASEVCEDSITVGMVRHQGKIMYVGDVRS
VTQKHIQEWGPFDLVIGGSPCNDLSIVNPARKGLYEGTGRLFFEFY
RLLHDARPKEGDDRPFFWLFENVVAMGVSDKRDISRFLESNPVMI
DAKEVSAAHRARYFWGNLPGMNRPLASTVNDKLELQECLEHGRI
AKFSKVRTITTRSNSIKQGKDQHFPVFMNEKEDILWCTEMERVFG
FPVHYTDVSNMSRLARQRLLGRSWSVPVIRHLFAPLKEYFACV
STG
GSGGSGGSGGSGGSGRPMDKKYSIGLAIGTNSVGWAVITDEYKVPS
KKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTR
RKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFG
NIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRG
HFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS
ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLA
EDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI
LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKRKVGL
PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD
NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEV
VDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK
VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFK
KIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE
DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGR
LSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI
QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH
PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQ
SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL
DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSE
QEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV
WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK
LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE
LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENG
RKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ
KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATL
IHQSITGLYETRIDLSQLGGD
GGSGGSGGSGGSGGSASGGGSGGGSK
RPAATKKAGQAKKKKGGSGSGATNFSLLKQAGDVEENPGPAAA
QLTKPDVILRLEKGEEPWLVEREIHQETHP
STGGSGGSGGSGGSGG
SGRPMDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHS
IKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNE
MAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTI
YHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDV
DKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQ
LPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDD
LDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASM
IKRYDEHHQDLTLLKALVRQQKKKRKVGLPEKYKEIFFDQSKNG
YAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQR
TFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYY
VGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMT
NFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLS
GEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRF
NASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEE
RLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKT
ILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHI
ANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQT
TQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLY
YLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRS
DKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE
RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI
REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGT
ALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSN
IMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVL
SMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYG
GFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNP
IDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKG
NELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDE
IIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLT
NLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS
QLGGD
GGSGGSGGSGGSGGSASGGGSGGGSKRPAATKKAGQAKKKKG
PSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYT
RRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIF
GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFR
GHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAIL
SARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLA
EDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI
LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKRKVGL
PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD
NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEV
VDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK
VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFK
KIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE
DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGR
LSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI
QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH
PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQ
SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL
DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSE
QEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV
WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK
LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE
LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENG
RKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ
KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATL
IHQSITGLYETRIDLSQLGGD
GGSGGSGGSGGSGGSASMGQTGKKS
EKGPVCWRKRVKSEYMRLRQLKRFRRADEVKTMFSSNRQKILER
TETLNQEWKQRRIQPVHIMTSVSSLRGTRECSVTSDLDFPAQVIPL
KTLNAVASVPIMYSWSPLQQNFMVEDETVLHNIPYMGDEVLDQDG
TFIEELIKNYDGKVHGDRECGFINDEIFVELVNALGQYNDDDDDDD
GDDPDEREEKQKDLEDNRDDKETCPPRKFPADKIFEAISSMFPDKG
TAEELKEKYKELTEQQLPGALPPECTPNIDGPNAKSVQREQSLHSF
HTLFCRRCFKYDCFLHPFHATPNTYKRKNTETALDNKPCGPQCYQ
HLEGAKEFAAALTAERIKTPPKRPGGRRRGRLPNNSSRPSTPTISVL
ESKDTDSDREAGTETGGENNDKEEEEKKDETSSSSEANSRCQTPIK
MKPNIEPPENVEWSGAEASMFRVLIGTYYDNFCAIARLIGTKTCRQ
VYEFRVKESSIIAPVPTEDVDTPPRKKKRKHRLWAAHCRKIQLKK
DGSSNHVYNYQPCDHPRQPCDSSCPCVIAQNFCEKFCQCSSECQNR
FPGCRCKAQCNTKQCPCYLAVRECDPDLCLTCGAADHWDSKNVS
CKNCSIQRGSKKHLLLAPSDVAGWGIFIKDPVQKNEFISEYCGEIIS
QDEADRRGKVYDKYMCSFLFNLNNDFVVDATRKGNKIRFANHSV
NPNCYAKVMMVNGDHRIGIFAKRAIQTGEELFFDYRYSQADALKY
VGIEREMEIP
TSGGGSGGGSKRPAATKKAGQAKKKKGGSGSGATNFSL
PSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYT
RRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIF
GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFR
GHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAIL
SARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLA
EDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI
LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKRKVGL
PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD
NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEV
VDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK
VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFK
KIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE
DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGR
LSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI
QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH
PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQ
SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL
DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSE
QEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV
WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK
LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE
LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENG
RKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ
KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATL
IHQSITGLYETRIDLSQLGGD
GGSGGSGGSGGSGGSASTEDVDTPPR
KKKRKHRLWAAHCRKIQLKKDGSSNHVYNYQPCDHPRQPCDSSC
PCVIAQNFCEKFCQCSSECQNRFPGCRCKAQCNTKQCPCYLAVRE
CDPDLCLTCGAADHWDSKNVSCKNCSIQRGSKKHLLLAPSDVAG
WGIFIKDPVQKNEFISEYCGEIISQDEADRRGKVYDKYMCSFLFNL
NNDFVVDATRKGNKIRFANHSVNPNCYAKVMMVNGDHRIGIFAK
RAIQTGEELFFDYRYSQADALKYVGIEREMEIP
TSGGGSGGGSKRPA
ATKKAGQAKKKKGGSGSGATNFSLLKQAGDVEENPGPAAA
PSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYT
RRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIF
GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFR
GHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAIL
SARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLA
EDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI
LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKRKVGL
PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD
NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEV
VDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK
VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFK
KIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE
DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGR
LSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI
QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH
PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQ
SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL
DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSE
QEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV
WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK
LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE
LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENG
RKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ
KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATL
IHQSITGLYETRIDLSQLGGD
GGSGGSGGSGGSGGSASMSRRKQSNP
RQIKRSLGDMEAREEVQLVGASHMEQKATAPEAPSP
TSGGGSGGG
PSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYT
RRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIF
GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFR
GHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAIL
SARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLA
EDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI
LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKRKVGL
PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD
NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEV
VDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK
VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFK
KIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE
DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGR
LSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI
QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH
PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQ
SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL
DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSE
QEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV
WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK
LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE
LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENG
RKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ
KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATL
IHQSITGLYETRIDLSQLGGD
GGSGGSGGSGGSGGSASPRQNLKCVR
ILKQFHKDLERELLRRHHRSKTPRHLDPSLANYLVQKAKQRRALR
RWEQELNAKRSHLGRITVENEVDLDGPPRAFVYINEYRVGEGITLN
QVAVGCECQDCLWAPTGGCCPGASLHKFAYNDQGQVRLRAGLPI
YECNSRCRCGYDCPNRVVQKGIRYDLCIFRTDDGRGWGVRTLEKI
RKNSFVMEYVGEIITSEEAERRGQIYDRQGATYLFDLDYVEDVYTV
DAAYYGNISHFVNHSCDPNLQVYNVFIDNLDERLPRIAFFATRTIRA
GEELTFDYNMQVDPVDMESTRMDSNFGLAGLPGSPKKRVRIECKC
GTESCRKYLF
TSGGGSGGGSKRPAATKKAGQAKKKKGGSGSGATNFSL
PSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYT
RRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIF
GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFR
GHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAIL
SARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLA
EDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI
LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKRKVGL
PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD
NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEV
VDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK
VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFK
KIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE
DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGR
LSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI
QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH
PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQ
SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL
DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSE
QEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV
WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK
LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE
LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENG
RKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ
KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATL
IHQSITGLYETRIDLSQLGGD
GGSGGSGGSGGSGGSASGSAAIAEVLL
NARCDLHAVNYHGDTPLHIAARESYHDCVLLFLSRGANPELRNKE
GDTAWDLTPERSDVWFALQLNRKLRLGVGNRAIRTEKIICRDVAR
GYENVPIPCVNGVDGEPCPEDYKYISENCETSTMNIDRNITHLQHC
TCVDDCSSSNCLCGQLSIRCWYDKDGRLLQEFNKIEPPLIFECNQA
CSCWRNCKNRVVQSGIKVRLQLYRTAKMGWGVRALQTIPQGTFI
CEYVGELISDAEADVREDDSYLFDLDNKDGEVYCIDARYYGNISRFI
NHLCDPNIIPVRVFMLHQDLRFPRIAFFSSRDIRTGEELGFDYGDRF
WDIKSKYFTCQCGSEKCKHSAEAIALEQSRLARLDPHPELLPELGS
LPPVN
TSGGGSGGGSKRPAATKKAGQAKKKKGGSGSGATNFSLLKQAG
PSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYT
RRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIF
GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFR
GHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAIL
SARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLA
EDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI
LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKRKVGL
PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD
NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEV
VDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK
VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFK
KIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE
DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGR
LSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI
QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH
PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQ
SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL
DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSE
QEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV
WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK
LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE
LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENG
RKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ
KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATL
IHQSITGLYETRIDLSQL
GGDGGSGGSGGSGGSGGSASRAPSRLQMF
FANNHDQEFDPPKVYPPVPAEKRKPIRVLSLFDGIATGLLVLKDLG
IQVDRYIASEVCEDSITVGMVRHQGKIMYVGDVRSVTQKHIQEWG
PFDLVIGGSPCNDLSIVNPARKGLYEGTGRLFFEFYRLLHDARPKE
GDDRPFFWLFENVVAMGVSDKRDISRFLESNPVMIDAKEVSAAHR
ARYFWGNLPGMNRPLASTVNDKLELQECLEHGRIAKFSKVRTITT
RSNSIKQGKDQHFPVFMNEKEDILWCTEMERVFGFPVHYTDVSNM
SRLARQRLLGRSWSVPVIRHLFAPLKEYFACV
TSGGGSGGGSKRPAA
TKKAGQAKKKKGGSGSGATNFSLLKQAGDVEENPGPAAA
PSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYT
RRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIF
GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFR
GHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAIL
SARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLA
EDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI
LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQKKKRKVGL
PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD
NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEV
VDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK
VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFK
KIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE
DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGR
LSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI
QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH
PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQ
SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL
DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY
HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSE
QEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV
WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDK
LIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE
LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENG
RKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ
KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDK
PIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATL
IHQSITGLYETRIDLSQLGGD
GGSGGSGGSGGSGGSASTLVTFKDVF
VDFTREEWKLLDTAQQIVYRNVMLENYKNLVSLGYQLTKPDVILR
LEKGEEPWLVEREIHQETHP
TSGGGSGGGSKRPAATKKAGQAKKKKG
The present application claims priority to U.S. Provisional Application No. 62/568,156, filed Oct. 4, 2017, the disclosure of which is herein incorporated by reference in its entirety for all purposes.
This invention was made with Government support under Grant No. CA204563, awarded by the National Institutes of Health. The Government has certain rights in this invention.
Number | Date | Country
---|---|---
62568156 | Oct 2017 | US