This invention pertains to chimeric proteins and methods for their use in living cells.
The instant application contains a Sequence Listing that has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. The ASCII copy, created on ______, is named ______, and is ______ bytes in size.
This invention pertains to optimized protein fusion linkers for creating multi-functional chimeric proteins and methods of using the same. Additionally, the invention pertains to chimeric proteins for use in guided endonuclease systems.
The fusion of guided endonucleases to one or more unrelated enzymes is desirable for many reasons. In some cases, it is useful to visualize and monitor successful delivery of guided endonuclease reagents into the target cell nucleus or nucleolus. In other cases, it may be useful to fuse DNA modifying enzymes to affect a specific repair outcome after guided endonuclease cleavage.
The construction of recombinant fusion protein requires the selection of a suitable linker to join the protein domains. Direct fusion of functional domains without a linker may lead to many problems, including misfolding of the fusion protein, low yield in protein production, or impaired bioactivity. For example, a barrier to using engineered nucleases fused to a fluorescent protein is that such fusions tend to have lower editing activity presumably due to steric interference caused by the unnatural fusion of the two proteins.
Achieving high functional activity for the fusion proteins can be dependent on the nature of the linker between the subunits as direct fusion of proteins can inhibit folding, stability and biological activity. Additionally, linked domains can interfere with each other's functions and physically separating these domains can lead to improved function. The nature of the linker between domains can influence average separation distances depending on linker rigidity and length.
Linkers are typically placed in three classifications: flexible, rigid, and cleavable. Flexible linkers usually consist of glycine repeats, with the periodic addition of a polar serine residue to disrupt linker-protein interaction. Rigid linkers include A(EAAAK)nA repeats that form an alpha helical structure or a Pro-rich sequence, (XP)n that form relatively rigid extended rod like structures.
As such, what are needed are methods and compositions to overcome the existing challenges of current technologies. New methods and compositions for a set of universal peptide linkers that have been specifically optimized in guided endonuclease enzyme fusions are desirable to improve function of the guided endonuclease as well as the covalently fused protein partner.
In general, this invention pertains to methods and compositions for improved multi-functional chimeric proteins and improved linkers. In some embodiments the chimeric proteins comprise guided endonuclease proteins such as RNA guided endonucleases (“RGEN(s)”), including a Cas/CRISPR protein, that are covalently fused to a partner protein with a set of universal peptide linkers. In additional embodiments, the fusion protein comprises rigid linker systems to fuse two proteins. In a further embodiment, the rigid linkers may be used for fusion proteins comprising a Cas protein fused to a fluorescent protein.
The fusion of Cas proteins of guided endonucleases to one or more unrelated protein partners is desirable for many reasons. For example, one embodiment includes the ability to generate CRISPR/Cas9 protein fusions to cleave double stranded DNA at precise locations in living cells. Cas9 is an RNA guided endonuclease from the Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas (CRISPR-associated) bacterial adaptive immune system of Streptococcus pyogenes. Cas9 is guided to a 23-nt DNA target sequence by a target site specific 20-nt complementary RNA (part of the 44-nt crRNA) and a universal 89-nt tracrRNA, collectively referred to as the guide RNA (gRNA) complex. The Cas9-gRNA ribonucleoprotein (RNP) complex mediates double-stranded DNA breaks (DSBs), which are then repaired either by the non-homologous end joining (NHEJ) system, which typically introduces mutations or indels at the cut site that frequently lead to gene disruption through frameshift mutation, or by the homology directed repair (HDR) system if a suitable template nucleic acid is present.
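By way of illustration only, the 23-nt target described above can be pictured as a 20-nt protospacer followed by a 3-nt protospacer adjacent motif (PAM). The following minimal Python sketch scans one DNA strand for such sites. The NGG PAM of S. pyogenes Cas9 is general knowledge rather than something recited in this passage, and the function name and the example sequence are hypothetical.

```python
import re

def find_cas9_targets(dna: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for every NGG-adjacent site on this strand."""
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]

# Hypothetical sequence, used only to exercise the function.
example = "TTACGATCGATCGGCTAGCTAGCATCGATCGTAGCTAGGCTAGCTAACGG"
for start, protospacer, pam in find_cas9_targets(example):
    print(start, protospacer, pam)
```

The sketch inspects only the strand supplied; a fuller search would also scan the reverse complement.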
In one embodiment, a set of universal peptide linkers that have been specifically optimized for CRISPR enzyme fusions is desirable to improve the function of both Cas9 and the covalently-fused partner protein or protein domain. The universal linkers would not be specific to a Cas9 protein or any mutant variant thereof. In an additional embodiment, the CRISPR enzyme could be optimally linked with the universal linkers to a fluorescent protein.
In one embodiment, rigid linkers are used to covalently fuse two different proteins or protein domains. In one respect, the rigid linkers include A(EAAAK)nA repeats that form an alpha helical structure to provide rigidity. In another respect, a (XP)n repeat that forms relatively rigid extended rod like structures is used.
In a first aspect, a chimeric protein is provided. The chimeric protein includes a guided endonuclease protein, a rigid linker, and a second protein.
In a second aspect, a method of enriching cells having a chimeric protein is provided. The method includes several steps. The first step includes incubating a chimeric protein according to the first aspect with a guide RNA to form a RNP complex. The second step includes contacting the RNP complex to a plurality of target cells to produce recipient cells having the RNP complex. The third step includes sorting the recipient cells based on a fluorescence signal.
In a third aspect, an isolated nucleic acid encoding a chimeric protein is provided. The chimeric protein includes the chimeric protein of the first aspect.
In a fourth aspect, a chimeric protein is provided. The chimeric protein includes a guided endonuclease protein, a universal linker, and a second protein.
In a fifth aspect, an isolated nucleic acid is provided. The isolated nucleic acid encodes the chimeric protein according to any of the respects of the fourth aspect.
In a sixth aspect, a method of enriching cells having a chimeric protein is provided. The method includes several steps. The first step includes incubating a chimeric protein according to any of the respects of the fourth aspect with a guide RNA to form a RNP complex. The second step includes contacting the RNP complex to a plurality of target cells to produce recipient cells having the RNP complex. The third step includes sorting the recipient cells based on a fluorescence signal.
The methods and compositions of the invention described herein provide for improved universal linkers for covalently fusing two or more proteins or protein domains. Additionally, the present invention describes methods of using the chimeric fusions. In particular, the disclosed chimeric proteins provide a robust manner whereby cells that contain the chimeric protein may be identified and sorted based upon the ability of the chimeric protein to generate a fluorescence signal. Moreover, not only do the chimeric proteins retain their enzymatic activity as guided endonucleases, but also several chimeric proteins possess surprisingly enhanced activity when compared to unmodified endonucleases or fusion proteins lacking a linker. These and other advantages of the invention, as well as additional inventive features, will be apparent from the description of the invention provided herein.
The use of the terms “a” and “an” and “the” and similar referents in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising”, “having”, “including” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to”) unless otherwise noted. Recitation of ranges of values herein are merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.
The term, “codon-optimized” as that term modifies “nucleic acid,” “gene,” “DNA,” or “RNA,” that encodes a protein or polypeptide, refers to a nucleic acid, gene, DNA or RNA that includes preferred codons for efficient expression of the protein or polypeptide in a given host cell or organism based upon the naturally occurring abundance of charged tRNA's specific for codons in that host cell or organism. By way of example, a codon-optimized Cas9 nucleic acid for E. coli will be suitable for optimal expression of the Cas9 protein when that nucleic acid is expressed in E. coli. Similarly, a codon-optimized Cas9 nucleic acid for human cells will be suitable for optimal expression of the Cas9 protein when that nucleic acid is expressed in a human cell. Exemplary codon-optimized nucleic acids are disclosed herein. A codon-optimized nucleic acid, gene, DNA, RNA can be readily generated from a naturally occurring endogenous DNA or RNA derived from the original host cell or organism or from reverse translation of the protein amino acid sequence for the relevant protein or polypeptide sequence. One of ordinary skill in the art would understand from the literature the codon bias or preference rules for a variety of host cells or organisms. Furthermore, codon-optimized nucleic acid conversion software programs are readily available online or generally known in the art.
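The reverse-translation route mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to generate the sequences of this disclosure: the two preferred-codon tables are hypothetical, cover only the residues found in the linkers discussed herein, and would need to be replaced with published codon-usage data for the intended host. The names `PREFERRED_CODONS` and `reverse_translate` are likewise assumptions.

```python
# Hypothetical preferred-codon tables (one codon per residue). Real work
# should use complete, published codon-usage tables for the intended host.
PREFERRED_CODONS = {
    "e_coli": {"A": "GCG", "P": "CCG", "E": "GAA", "K": "AAA", "G": "GGC", "S": "AGC"},
    "human": {"A": "GCC", "P": "CCC", "E": "GAG", "K": "AAG", "G": "GGC", "S": "AGC"},
}

def reverse_translate(protein: str, host: str) -> str:
    """Map each residue to the single preferred codon listed for the host."""
    table = PREFERRED_CODONS[host]
    codons = []
    for residue in protein.upper():
        if residue not in table:
            raise ValueError(f"no codon listed for residue {residue!r}")
        codons.append(table[residue])
    return "".join(codons)

print(reverse_translate("EAAAK", "e_coli"))
print(reverse_translate("EAAAK", "human"))
```

Choosing one codon per residue is the simplest possible policy; available software typically also weighs factors such as repeats and secondary structure, which this sketch ignores.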
The terms “RNA-guided endonuclease” and “RGEN” refer to a ribonucleoprotein endonuclease that includes an RNA component for targeted enzyme activity on a given substrate. Exemplary RGEN's include CRISPR-associated endonucleases.
The term “CRISPR” refers to Clustered Regularly Interspaced Short Palindromic Repeat bacterial adaptive immune system.
The terms “Cas” and “Cas endonuclease” generally refer to a CRISPR-associated endonuclease.
The term “Cas protein” generally refers to a wild-type protein, including a variant thereof, of a CRISPR-associated endonuclease (including the interchangeable terms Cas and Cas endonuclease).
The term “Cas nucleic acid” generally refers to a nucleic acid of a CRISPR-associated endonuclease, including a guide RNA, sgRNA, crRNA, or tracrRNA.
The terms “Cas9” and “CRISPR/Cas9” refer to the CRISPR-associated bacterial adaptive immune system of Streptococcus pyogenes.
The terms “AsCas12a” and “CRISPR/AsCas12a” refer to the CRISPR-associated bacterial adaptive immune system of Acidaminococcus sp.
The terms “LbCas12a” and “CRISPR/LbCas12a” refer to the CRISPR-associated bacterial adaptive immune system of Lachnospiraceae bacterium.
The term “Cas9 protein” refers to the protein of the Cas9 or CRISPR/Cas9 endonuclease system. For the purposes of this disclosure, the wild-type Cas9 protein amino acid sequence is SEQ ID NO: 1.
The term “AsCas12a protein” refers to the protein of the AsCas12a or CRISPR/AsCas12a endonuclease system. For the purposes of this disclosure, the wild-type AsCas12a protein amino acid sequence is SEQ ID NO: 2.
The term “LbCas12a protein” refers to the protein of the LbCas12a or CRISPR/LbCas12a endonuclease system. For the purposes of this disclosure, the wild-type LbCas12a protein amino acid sequence is SEQ ID NO: 3.
The term “Cas9 nucleic acid” refers to a nucleic acid (e.g., DNA or RNA) that encodes a Cas9 protein or polypeptide. A Cas9 nucleic acid can be selected from the naturally occurring nucleic acid from Streptococcus pyogenes or a codon-optimized nucleic acid for efficient expression in a given host cell or organism. For the purposes of this disclosure, exemplary codon-optimized versions of a Cas9 nucleic acid are SEQ ID NOS: 337 and 338.
The term “AsCas12a nucleic acid” refers to a nucleic acid (e.g., DNA or RNA) that encodes an AsCas12a protein or polypeptide. An AsCas12a nucleic acid can be selected from the naturally occurring nucleic acid from Acidaminococcus sp. or a codon-optimized nucleic acid for efficient expression in a given host cell or organism. For the purposes of this disclosure, exemplary codon-optimized versions of an AsCas12a nucleic acid are SEQ ID NOS: 339 and 340.
The term “LbCas12a nucleic acid” refers to a nucleic acid (e.g., DNA or RNA) that encodes a LbCas12a protein or polypeptide. A LbCas12a nucleic acid can be selected from the naturally occurring nucleic acid from Lachnospiraceae bacterium or a codon-optimized nucleic acid for efficient expression in a given host cell or organism. For the purposes of this disclosure, exemplary codon-optimized versions of a LbCas12a nucleic acid are SEQ ID NOS: 341 and 342.
The terms “guide RNA,” “guide RNA complex” and “gRNA complex” refer to a target site-specific crRNA, a universal tracrRNA or a combination of both.
The term “sgRNA” refers to a guide RNA complex in which the crRNA is covalently linked to the tracrRNA in a single molecule.
The term “variant,” as that term modifies a protein (for example, a Cas9 protein, AsCas12a protein or LbCas12a protein), refers to a protein that includes at least one amino acid substitution of the reference wild-type protein amino acid sequence, additional amino acids (for example, an affinity tag or nuclear localization signal), or a combination thereof. An exemplary LbCas12a protein variant amino acid sequence is LbCas12a (E795L) (SEQ ID NO: 310), and chimeric proteins comprising this amino acid sequence are presented in Tables V.7, V.8 and V.10.
The term ALT-R®, as that term modifies an RNA (for example, such as a crRNA, a tracrRNA, a guide RNA, or a sgRNA), refers to an isolated, chemically-synthesized, synthetic RNA.
The term ALT-R®, as that term modifies a protein (for example, such as a Cas9 protein, an AsCas12a protein, or a LbCas12a protein), refers to an isolated, recombinant protein.
The terms “fusion protein,” “protein fusion,” “chimeric protein,” “protein chimera,” and “chimeric fusion protein” refer to a first protein or polypeptide having a covalent bond to at least one or more proteins or polypeptides, wherein the at least one or more proteins or polypeptides differ in primary sequence composition from the first protein or polypeptide. The terms fusion protein, protein fusion, chimeric protein, protein chimera and chimeric fusion protein have the same meaning and are used interchangeably. Exemplary fusion proteins are disclosed herein.
The term “universal,” as applied to modify a “guide RNA,” a “linker,” or a “linker peptide,” refers to a guide RNA, linker or linker peptide for use as a guide RNA, linker or linker peptide in more than one RGEN (as related to a guide RNA) or more than one fusion protein (as related to a linker or linker peptide). Exemplary universal linkers and universal linker peptides include flexible linkers, rigid linkers and mixed linkers, including those linkers identified in Table I and references cited therein, which are incorporated by reference in their entirety.
The term “editing activity assay” refers to an assay to determine the extent of editing of a locus by an RGEN, such as a Cas endonuclease, on at least one locus targeted by the guide RNA. Exemplary editing activity assays include those selected from a T7EI assay and a Next Generation Sequencing (NGS) assay. Exemplary T7EI assay procedures are disclosed in U.S. patent application Ser. No. 14/975,709, filed Dec. 18, 2015 (Attorney Docket IDT01-008-US), the contents of which are incorporated by reference herein. Exemplary NGS assays are disclosed in U.S. patent application Ser. No. 13/935,451, filed Jul. 3, 2013 (Attorney Docket IDT01-001-US), the contents of which are incorporated by reference herein.
The terms “linker” and “linker polypeptide” refer to a polypeptide amino acid sequence of 2 or more amino acids that covalently joins a first protein or polypeptide to a second protein or polypeptide. Exemplary linkers and linker polypeptides are disclosed herein.
The term “rigid linker” refers to a linker that restricts at least one degree of freedom in the motion or conformation for an affected fusion protein, protein fusion, chimeric protein, or protein chimera that includes the linker. Exemplary rigid linkers are disclosed herein.
In a first embodiment a composition for an improved multi-functional chimeric protein and improved linkers is provided. In another respect the universal peptide linker is a rigid linker system.
In another embodiment, the rigid linker is used to fuse two proteins, recombinant proteins, or protein domains into a single covalently fused protein construct. In one respect the alpha helix forming linker with the sequence of (EAAAK)n is used. In another respect the EAAAK amino acid sequence is repeated n times where n represents 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14 repeats.
In another embodiment, the rigid linker is a Pro-rich sequence. In a further respect, the Pro-rich sequence is (XP)n with X designating any amino acid, including preferably Ala, Lys, or Glu and more preferably Alanine. In another respect, the XP sequence is repeated n times where n represents 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14 repeats. In another respect, the number of repeats is preferably 5, 6, 7, 8, or 9 repeats. In another respect the number of repeats is more preferably 7 or 9 repeats.
In another embodiment, the rigid linker is a Pro-rich sequence. In a further respect, the Pro-rich sequence is (XP)nA with X designating any amino acid, including preferably Ala, Lys, or Glu and more preferably, Alanine and where the XP sequence is repeated n times in the range from 1-14 repeats, including n being 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14 repeats. In another respect the number of repeats is preferably 5, 6, 7, 8, or 9 repeats. In another respect the number of repeats is more preferably 7 or 9 repeats.
In another embodiment, the rigid linker is used to covalently fuse a guided endonuclease and a second protein. In one respect, the rigid linker is used to covalently fuse the C-terminal end of a guided endonuclease to the N-terminal end of the second protein. Preferably, the rigid linker is used to covalently fuse the C-terminal of a RGEN to the N-terminal of a second protein. More preferably, the rigid linker is used to covalently fuse the C-terminal of a CRISPR-Cas to the N-terminal of a second protein.
In a further embodiment, the Pro-rich (XP)n rigid linker, where n is the repeat unit in the range from 1-14 repeats including n being 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14 repeats, is used to covalently fuse the C-terminal end of a guided endonuclease to the N-terminal end of a second protein. Preferably, the (XP)n rigid linker is used to covalently fuse the C-terminal of a RGEN to the N-terminal of a second protein. More preferably, the (XP)n rigid linker is used to covalently fuse the C-terminal of a Cas protein to the N-terminal of a second protein.
In another embodiment, the Pro-rich sequence includes a (PX)nP rigid linker system, where n is the repeat unit in the range from 1-14 repeats including n being 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14 repeats. In one respect, the Pro-rich sequence includes (PA)nP rigid linker system where n is the repeat unit in the range from 1-14 repeats including n being 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14 repeats. In another respect, the Pro-rich sequence includes (PA)6P rigid linker system.
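The repeat formulas recited in the preceding embodiments can be expanded mechanically into amino acid strings. The following Python sketch does so under stated assumptions: the function names are hypothetical, X defaults to alanine as the stated preference, and n is restricted to the recited range of 1 to 14. The strings produced follow the formulas as written and are not a substitute for the Sequence Listing.

```python
def _check_repeats(n: int) -> None:
    # The embodiments recite n from 1 to 14 repeats.
    if not 1 <= n <= 14:
        raise ValueError("n must be between 1 and 14")

def helical_linker(n: int) -> str:
    """A(EAAAK)nA alpha helix forming linker."""
    _check_repeats(n)
    return "A" + "EAAAK" * n + "A"

def xp_linker(n: int, x: str = "A", trailing_a: bool = False) -> str:
    """(XP)n, or (XP)nA when trailing_a is True."""
    _check_repeats(n)
    return (x + "P") * n + ("A" if trailing_a else "")

def pxp_linker(n: int, x: str = "A") -> str:
    """(PX)nP Pro-rich linker."""
    _check_repeats(n)
    return ("P" + x) * n + "P"

print(helical_linker(4))               # AEAAAKEAAAKEAAAKEAAAKA
print(xp_linker(7, trailing_a=True))   # APAPAPAPAPAPAPA
print(pxp_linker(6))                   # PAPAPAPAPAPAP
```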
In another embodiment, the rigid linker is used to covalently fuse the C-terminal end of a first protein to the N-terminal of guided endonuclease. In another respect, the rigid linker is used to covalently fuse the C-terminal of a first protein to the N-terminal of a RGEN. In another respect, the rigid linker is used to covalently fuse the C-terminal of a first protein to the N-terminal of a Cas protein.
In another embodiment, the Pro-rich (XP)n rigid linker is used to covalently fuse the C-terminal end of a first protein to the N-terminal of guided endonuclease. In another respect, the (XP)n linker is used to covalently fuse the C-terminal of a first protein to the N-terminal of a RGEN. In another respect the (XP)n rigid linker is used to covalently fuse the C-terminal of a first protein to the N-terminal of a Cas protein.
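The two fusion orientations described in the preceding embodiments differ only in the order in which the parts are joined when sequences are written from N-terminus to C-terminus. A minimal Python sketch follows; the function name is hypothetical and the short strings are placeholders, not the sequences of any protein disclosed herein.

```python
def build_fusion(endonuclease: str, linker: str, partner: str,
                 partner_at_c_terminus: bool = True) -> str:
    """Join the parts N-terminus to C-terminus in the chosen orientation."""
    if partner_at_c_terminus:
        # C-terminal end of the endonuclease fused to the N-terminal end of the partner.
        parts = (endonuclease, linker, partner)
    else:
        # C-terminal end of the partner fused to the N-terminal end of the endonuclease.
        parts = (partner, linker, endonuclease)
    return "".join(parts)

# Placeholder fragments only; real inputs would come from the Sequence Listing.
endonuclease_fragment = "MDKK"
partner_fragment = "MVSK"
linker = "AP" * 7 + "A"  # (AP)7A written out from its formula

print(build_fusion(endonuclease_fragment, linker, partner_fragment))
print(build_fusion(endonuclease_fragment, linker, partner_fragment,
                   partner_at_c_terminus=False))
```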
In another embodiment, the CRISPR/Cas9 protein is one of the proteins covalently fused to the rigid linker. In one respect, the Cas9 protein may be a wild type Cas9 protein. In another respect the Cas protein may be a mutant or variant protein.
In another embodiment, the first protein is a guided endonuclease joined through a rigid linker to a fluorescent protein. Preferably, the guided endonuclease is an RGEN and more preferably is a CRISPR/Cas enzyme. In one respect, the fluorescent protein is eGFP or mCherry. In another respect, the CRISPR/Cas enzyme is covalently linked to the fluorescent protein with a rigid linker to generate a CRISPR/Cas fluorescent chimeric fusion protein. The CRISPR/Cas fluorescent fusion protein makes it possible to visualize and monitor successful delivery of CRISPR/Cas reagents into the target cell nucleus or nucleolus. In another respect, the cells may be sorted and enriched based on the detection of the fluorescent fusion protein.
In a first aspect, a chimeric protein is provided. The chimeric protein includes a guided endonuclease protein, a rigid linker, and a second protein. In a first respect, the guided endonuclease protein is a Cas protein. In a second respect, the Cas protein is selected from the group consisting of Cas9 protein, AsCas12a protein, LbCas12a protein and a variant of Cas9 protein, AsCas12a protein, or LbCas12a protein. In a third respect, the Cas protein is selected from the group comprising SEQ ID NOS: 1, 2, and 3. In a fourth respect, the Cas protein is preferably SEQ ID NO: 1. In a fifth respect, the Cas protein is preferably SEQ ID NO: 2. In a sixth respect, the Cas protein is preferably SEQ ID NO: 3. In a seventh respect, the second protein is a fluorescent protein. In an eighth respect, the fluorescent protein is selected from the group comprising SEQ ID NOS: 4 and 5. In a ninth respect, the fluorescent protein is SEQ ID NO: 4. In a tenth respect, the fluorescent protein is SEQ ID NO: 5. In an eleventh respect, the rigid linker is selected from the group comprising XP, APA, and SEQ ID NOS: 8, 11-23 and 24-36. In a twelfth respect, the rigid linker is SEQ ID NO: 8. In a thirteenth respect, the rigid linker is selected from the group comprising XP and SEQ ID NOS: 24-36. In a fourteenth respect, the rigid linker is selected from SEQ ID NOS: 29-31. In a fifteenth respect, the rigid linker is selected from the group comprising APA and SEQ ID NOS: 11-23. In a sixteenth respect, the rigid linker is selected from the group comprising APA and SEQ ID NOS: 17-18. In an eighteenth respect, the chimeric protein is selected from the group that includes SEQ ID NOS: 40-70, 74-104, 108-138, 142-172, 176-206, 210-240, 244-274, and 278-308.
In a second aspect, a method of enriching cells having a chimeric protein is provided. The method includes several steps. The first step includes incubating a chimeric protein according to the first aspect with a guide RNA to form a RNP complex. The second step includes contacting the RNP complex to a plurality of target cells to produce recipient cells having the RNP complex. The third step includes sorting the recipient cells based on a fluorescence signal. In a first respect, the method includes an additional step of performing an editing activity assay on at least one locus targeted by the guide RNA. In a second respect, the step of performing an editing activity assay on at least one locus targeted by the guide RNA is selected from a T7EI assay and a Next Generation Sequencing assay. In a third respect, the method includes a chimeric protein selected from the group consisting of SEQ ID NOS: 40-70, 74-104, 108-138, 142-172, 176-206, 210-240, 244-274, and 278-308.
In a third aspect, an isolated nucleic acid encoding a chimeric protein is provided. The chimeric protein includes the chimeric protein of the first aspect. In a first respect, the chimeric protein comprises a member selected from the group consisting of SEQ ID NOS: 40-70, 74-104, 108-138, 142-172, 176-206, 210-240, 244-274, and 278-308. In a second respect, the isolated nucleic acid is codon optimized for expression in an organism or host cell. In a third respect, the organism or host cell is selected from E. coli or H. sapiens.
In a fourth aspect, a chimeric protein is provided. The chimeric protein includes a guided endonuclease protein, a universal linker, and a second protein. In a first respect, the guided endonuclease protein is a Cas protein. In a second respect, the Cas protein is selected from the group consisting of Cas9 protein, AsCas12a protein, LbCas12a protein, and a variant of Cas9 protein, AsCas12a protein, or LbCas12a protein. In a third respect, the Cas protein is selected from the group comprising SEQ ID NOS: 1, 2, and 3. In a fourth respect, the Cas protein is SEQ ID NO: 1. In a fifth respect, the Cas protein is SEQ ID NO: 2. In a sixth respect, the Cas protein is SEQ ID NO: 3. In a seventh respect, the second protein is a fluorescent protein. In an eighth respect, the fluorescent protein is selected from the group comprising SEQ ID NOS: 4 and 5. In a ninth respect, the fluorescent protein is SEQ ID NO: 4. In a tenth respect, the fluorescent protein is SEQ ID NO: 5. In an eleventh respect according to any of the previous respects of the fourth aspect, the universal linker is selected from a flexible linker, a rigid linker, and a mixed linker. In a twelfth respect, the universal linker comprises a flexible linker selected from the group consisting of SEQ ID NOS: 6 and 7. In a thirteenth respect, the universal linker comprises a rigid linker selected from the group consisting of XP, APA, and SEQ ID NOS: 8-36. In a fourteenth respect, the universal linker comprises a mixed linker of SEQ ID NO: 37. In a fifteenth respect, the chimeric protein is selected from the group consisting of SEQ ID NOS: 38-309.
In a fifth aspect, an isolated nucleic acid is provided. The isolated nucleic acid encodes the chimeric protein according to any of the respects of the fourth aspect. In a first respect, the isolated nucleic acid is codon optimized for expression in an organism or host cell. In a second respect, the organism or host cell is selected from E. coli or H. sapiens. In a third respect, the isolated nucleic acid is selected from the group consisting of SEQ ID NOS: 348-387 and 389-429.
In a sixth aspect, a method of enriching cells having a chimeric protein is provided. The method includes several steps. The first step includes incubating a chimeric protein according to any of the respects of the fourth aspect with a guide RNA to form a RNP complex. The second step includes contacting the RNP complex to a plurality of target cells to produce recipient cells having the RNP complex. The third step includes sorting the recipient cells based on a fluorescence signal. In a first respect, the method includes an additional step of performing an editing activity assay on at least one locus targeted by the guide RNA. In a second respect, the step of performing an editing activity assay on at least one locus targeted by the guide RNA is selected from a T7EI assay and a Next Generation Sequencing assay.
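The sorting step of the enrichment methods can be pictured as a gate on a per-cell fluorescence reading. The Python sketch below is illustrative only: the record layout, the signal values, and the threshold are hypothetical, and an actual sort would be performed on a cell sorter with gates set against appropriate controls.

```python
def sort_by_fluorescence(cells, threshold):
    """Split cell records into (positive, negative) by a fluorescence cutoff."""
    positive, negative = [], []
    for cell in cells:
        if cell["signal"] >= threshold:
            positive.append(cell)   # retained as the enriched population
        else:
            negative.append(cell)
    return positive, negative

# Hypothetical readings in arbitrary units.
recipient_cells = [
    {"id": "cell-1", "signal": 1520.0},
    {"id": "cell-2", "signal": 85.0},
    {"id": "cell-3", "signal": 940.0},
]
enriched, discarded = sort_by_fluorescence(recipient_cells, threshold=500.0)
print([cell["id"] for cell in enriched])   # ['cell-1', 'cell-3']
```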
This example identifies linkers that improve Cas9-eGFP activity. Nineteen separate plasmids that express recombinant versions of Cas9 in mammalian cells were constructed wherein the encoded protein has eGFP positioned downstream of the Cas9 carboxy-terminal domain (CTD) and the peptide sequence between Cas9 and eGFP was varied (Table I).
1Waldo 1999;
2Gergeron 2009;
3Bae and Shen 2006;
4McCormick 2001;
5Chen 2017
The constructs were first assayed through expression from plasmids delivered into HEK293 cells with editing assayed after 48-72 hours by T7EI digestion. Constructs showing improved activity were then subcloned into vectors for protein expression in E. coli and recombinant Cas9-fluorescent protein fusion proteins were purified by immobilized metal affinity chromatography followed by ion exchange chromatography. Purified proteins were tested for Cas9 endonuclease activity when delivered as RNP into HEK293 cells with editing assayed after 48 hours by T7 Endonuclease (T7EI) digestion.
The editing efficiency of the 19 different recombinant versions of Cas9 protein when expressed from transiently transfected plasmid in HEK293 cells was determined. Additionally, a subset of the constructs was delivered as RNP. The methods and compositions disclosed herein identify linkers that result in improved editing efficiency over a baseline flexible linker design. Rigid linkers of both Pro-rich sequence (XP)n and EAAAK repeat varieties were found to result in an increase in Cas9 activity when delivered as plasmid. Additionally, it was found that certain (XP)n repeat structures showed improvement in editing activity when delivered as RNP.
This example demonstrates that introduction of rigid linkers between Cas9 and eGFP results in an improvement in Cas9 editing activity of a Cas9-eGFP fusion protein when expressed in human cells from transfected plasmid. The set of rigid and flexible linkers that were generated in Example 1 were further tested in the context of Cas9-linker-eGFP to assess their impact on Cas9 activity.
The editing activity of Cas9-eGFP fusion proteins containing a rigid helical [A(EAAAK)4A] linker (linker SEQ ID NO: 8), a rigid alanine-proline [(AP)7A] linker (linker SEQ ID NO: 16), a flexible (GGGGS)4 linker (linker SEQ ID NO: 7), or a mixed flexibility linker GGGGSEAAAKGGGGS (linker SEQ ID NO: 37) was compared to that of wild-type Cas9 protein and of the Cas9-eGFP protein with a flexible GSAGSAAGSGEF linker (abbreviated GSA; linker SEQ ID NO: 6), which served as a baseline for comparison of editing activity. Fusion proteins were expressed from plasmids using a CMV promoter in HEK293 cells. Plasmids were delivered using the Lonza Nucleofector 96-well shuttle, using SF cell line solution, on setting DS-150 with 400 ng protein expression plasmid and 350,000 cells per well, with cells split into three wells following transfection. Two sgRNAs (HPRT-38087 (SEQ ID NO: 324) and HPRT-38285 (SEQ ID NO: 326)) targeting positions within the HPRT1 gene region were delivered at a final concentration of 30 nM by reverse transfection 24 hours after plasmid delivery using Lipofectamine RNAi Max. To analyze editing activity, genomic DNA was extracted 48 hours after reverse transfection using QuickExtract™ DNA extraction solution. Editing was assayed by PCR amplifying an approximately 1 kb region of the HPRT1 gene containing target cleavage sites, melting and re-hybridizing the PCR products, digesting the products with T7EI, and quantifying the proportion of cleaved to full length sequence using the Fragment Analyzer (Agilent).
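The final quantification step can be sketched as simple arithmetic on the cleaved and full length signals. In the Python sketch below, the function names and the example signal values are hypothetical. The first function computes the cleaved fraction described above. The second applies a correction that is commonly used in the literature for heteroduplex cleavage assays, 1 - sqrt(1 - fraction cleaved); that correction is an assumption added here for illustration and is not stated in this example.

```python
import math

def fraction_cleaved(cleaved_signal: float, full_length_signal: float) -> float:
    """Cleaved signal as a fraction of total (cleaved + full length) signal."""
    total = cleaved_signal + full_length_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return cleaved_signal / total

def estimated_indel_fraction(f_cleaved: float) -> float:
    """Commonly used heteroduplex correction: 1 - sqrt(1 - f_cleaved)."""
    if not 0.0 <= f_cleaved <= 1.0:
        raise ValueError("f_cleaved must lie between 0 and 1")
    return 1.0 - math.sqrt(1.0 - f_cleaved)

# Hypothetical band intensities in arbitrary units.
f = fraction_cleaved(cleaved_signal=36.0, full_length_signal=64.0)
print(f"cleaved fraction: {f:.2f}")
print(f"estimated indel fraction: {estimated_indel_fraction(f):.2f}")
```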
The results are shown in
To further test rigid linkers, another experiment was performed using the same protocol as in Example 2. A set of 12 different gRNAs (Table II) targeting positions within the HPRT1 gene were tested. The PCR Primers are provided in Table III.
The results are shown in
This example demonstrates the effect of linker length by varying the number of alanine-proline repeats between 1 and 14 (APA and linker SEQ ID NOS: 11-23) and testing two different extended helical linker designs. This experiment follows the same protocol as established in Example 2; however, the guide RNA was expressed from plasmid co-delivered at 100 ng per well and the cells were allowed to grow for three days before collecting genomic DNA.
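The repeat-based linker families discussed here follow simple patterns, so the amino acid strings can be generated programmatically. The Python sketch below builds the (AP)nA series for n = 1 to 14, together with the A(EAAAK)nA helical and (GGGGS)n flexible forms. The function names are hypothetical and chosen for illustration; the extended helical designs containing LE junctions are not covered by this sketch.

```python
def ap_linker(n):
    """(AP)nA: n alanine-proline repeats followed by a terminal alanine."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return "AP" * n + "A"


def helical_linker(n):
    """A(EAAAK)nA: n EAAAK repeats flanked by alanines."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return "A" + "EAAAK" * n + "A"


def flexible_linker(n):
    """(GGGGS)n: n glycine-serine repeats."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return "GGGGS" * n


# The alanine-proline length series varied in this example (1 to 14 repeats).
ap_series = {n: ap_linker(n) for n in range(1, 15)}
```

For instance, `ap_linker(7)` gives the (AP)7A sequence APAPAPAPAPAPAPA and `helical_linker(4)` gives AEAAAKEAAAKEAAAKEAAAKA, matching the linker sequences listed in the tables.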
The results are shown in
This example demonstrates that introduction of an (AP)7A linker (SEQ ID NO: 16) or (AP)9A linker (SEQ ID NO: 18) between Cas9 and eGFP results in an improvement in the editing activity of a Cas9-eGFP fusion protein when delivered into human cells as RNP.
To determine whether rigid linkers increase editing activity for Cas9-GFP when delivered as RNP, Cas9-GFP linker variants containing either the GSA linker, helical rigid linker, or alanine-proline rigid linker were expressed in E. coli and purified. RNP complex was formed by incubating Cas9 with ALT-R® sgRNA in PBS in a 1:1.2 ratio for 10 min and then delivered into HEK293 cells at a final RNP concentration of either 0.0625 micromolar, 0.25 micromolar, or 2.0 micromolar, along with 4 micromolar ALT-R® Cas9 Electroporation Enhancer. Editing activity was assayed by T7EI assay as previously described and the results are shown in
At the 2 micromolar dose of RNP, an increase in editing activity for linker variants containing the (AP)7A linker (linker SEQ ID NO: 16; chimeric Cas9 SEQ ID NO: 49) and (AP)9A linker (linker SEQ ID NO: 18; chimeric Cas9 SEQ ID NO: 51) was observed compared to the flexible GSA linker (linker SEQ ID NO: 6; chimeric Cas9 SEQ ID NO: 38) over the tested sites. The benefit of these linkers was also observed for a second different fluorophore. The second fluorophore tested was mCherry, and the same (AP)nA linkers were used to covalently attach the mCherry to Cas9. As seen in
This example demonstrates that an (AP)nA linker between Cas9 and eGFP is compatible with the use of Cas9-(AP)nA-eGFP with FACS to enrich for an edited cell population.
To determine whether Cas9-(AP)nA-eGFP can be used with FACS to enrich for cells with edits, RNP complex was formed as in Example 5 using a chimeric Cas9-(AP)7A-eGFP protein (SEQ ID NO: 49) or wild-type Cas9 (SEQ ID NO: 1) and delivered into cells using Lipofectamine RNAiMAX at 10 nM final concentration in 3 wells of a 6-well dish with 1.2 million cells per well. After ~16 hours, cells were trypsinized, washed with PBS, and resuspended in PBS containing 1% FBS. Cells were filtered through a 70 µm Flowmi Tip Strainer (Bel-Art). Approximately 10% of cells were transferred into a collection tube containing 500 µL of PBS with 1% FBS as the unsorted control. Cells into which Cas9-eGFP was delivered were sorted based on GFP signal using a Becton Dickinson Aria II cell sorter. Cells were sorted into three populations consisting of the cells with the top ~20% of signal, the mid ~80-60% of signal, and the bottom ~60% of signal. Total editing in cells 48-72 hours after delivery was measured by NGS and results are shown in
To determine whether rigid linkers increase editing efficiency in the context of another RNA-guided endonuclease, the LbCas12a protein variant (E795L) (SEQ ID NO: 310) was fused to eGFP (SEQ ID NO: 4) using either no linker (chimeric LbCas12a(E795L)-eGFP protein; SEQ ID NO: 314) or a subset of the linkers listed in Table I, and the resulting fusion proteins were expressed in E. coli and purified. RNP complex was formed by incubating untagged or chimeric LbCas12a (E795L) proteins with ALT-R® LbCas12a crRNA (see Table IV) in PBS in a 1:1.2 ratio for 10 min and then delivered into HEK293 cells at a final concentration of 50 nM along with 3 micromolar ALT-R® Cas12a Electroporation Enhancer. Editing 48 hours after delivery was measured by NGS and the results are shown in
The wild-type Cas9 protein amino acid sequence is presented as SEQ ID NO: 1. Polynucleotides codon-optimized for expression in E. coli and human cells are presented in SEQ ID NOs.: 337 and 338, respectively. Exemplary variants of Cas9 protein, the polynucleotides encoding them, as well as exemplary guide RNAs are disclosed in U.S. patent application Ser. Nos. 15/729,491 and 15/964,041, filed Oct. 10, 2017 and Apr. 26, 2018, respectively (Attorney Docket Nos. IDT01-009-US and IDT01-009-US-CIP, respectively), the contents of which are incorporated by reference herein.
The wild-type AsCas12a protein amino acid sequence is presented as SEQ ID NO: 2. Polynucleotides codon-optimized for expression in E. coli and human cells are presented as SEQ ID NOs.: 339 and 340, respectively. Exemplary variants of AsCas12a protein, the polynucleotides encoding them, as well as exemplary guide RNAs are disclosed in U.S. patent application Ser. No. 16/536,256, filed Aug. 8, 2019, (Attorney Docket No. IDT01-013-US), the contents of which are incorporated by reference herein.
The wild-type LbCas12a protein amino acid sequence is presented as SEQ ID NO: 3. Polynucleotides codon-optimized for expression in E. coli and human cells are presented in SEQ ID NOS: 341 and 342, respectively. Exemplary variants of LbCas12a protein, the polynucleotides encoding them, as well as exemplary guide RNAs are disclosed in U.S. Patent Application Ser. No. 63/018,592, filed May 1, 2020, (Attorney Docket No. IDT01-017-PRO), (now U.S. patent application Ser. No. ______, filed ______), the contents of which are incorporated by reference herein.
Table V.1 provides chimera fusion proteins between Cas9 amino acid sequences, linker amino acid sequences (underlined), and eGFP amino acid sequences (bolded, italicized).
Table V.2 provides chimera fusion proteins between Cas9 amino acid sequences, linker amino acid sequences (underlined), and mCherry amino acid sequences (bolded, italicized).
GSAGSAAGSGEF
GGGGSGGGGSGGGGSGGGGS
AEAAAKEAAAKEAAAKEAAAKA
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
LEAEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKALE
APA
APAPA
APAPAPA
APAPAPAPA
APAPAPAPAPA
APAPAPAPAPAPA
APAPAPAPAPAPAPA
APAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPAPA
XP
XPXP
XPXPXP
XPXPXPXP
XPXPXPXPXP
XPXPXPXPXPXP
XPXPXPXPXPXPXP
XPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXPXP
Table V.3 provides chimera fusion proteins between AsCas12a amino acid sequences, linker amino acid sequences (underlined), and eGFP amino acid sequences (bolded, italicized).
GSAGSAAGSGEF
GGGGSGGGGSGGGGSGGGGS
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
LEAEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKALE
APA
APAPA
APAPAPA
APAPAPAPA
APAPAPAPAPA
APAPAPAPAPAPA
APAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPA
XP
XPXP
XPXPXP
XPXPXPXP
XPXPXPXPXP
XPXPXPXPXPXP
XPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXP
Table V.4 provides chimera fusion proteins between AsCas12a amino acid sequences, linker amino acid sequences (underlined), and mCherry amino acid sequences (bolded, italicized).
GSAGSAAGSGEF
GGGGSGGGGSGGGGSGGGGS
AEAAAKEAAAKEAAAKEAAAKA
LEAEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKALE
APA
APAPA
APAPAPA
APAPAPAPA
APAPAPAPAPA
APAPAPAPAPAPA
APAPAPAPAPAPAPA
APAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPAPA
XP
XPXPXP
XPXPXPXP
XPXPXPXPXP
XPXPXPXPXPXP
XPXPXPXPXPXPXP
XPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXPXP
GGGGSEAAAKGGGGS
Table V.5 provides chimera fusion proteins between LbCas12a amino acid sequences, linker amino acid sequences (underlined), and eGFP amino acid sequences (bolded, italicized).
GSAGSAAGSGEF
GGGGSGGGGSGGGGSGGGGS
AEAAAKEAAAKEAAAKEAAAKA
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
LEAEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKALE
APA
APAPA
APAPAPA
APAPAPAPA
APAPAPAPAPA
APAPAPAPAPAPA
APAPAPAPAPAPAPA
APAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPAPA
XP
XPXP
XPXPXP
XPXPXPXP
XPXPXPXPXP
XPXPXPXPXPXP
XPXPXPXPXPXPXP
XPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXPXP
GGGGSEAAAKGGGGS
Table V.6 provides chimera fusion proteins between LbCas12a amino acid sequences, linker amino acid sequences (underlined), and mCherry amino acid sequences (bolded, italicized).
GSAGSAAGSGEF
GGGGSGGGGSGGGGSGGGGS
AEAAAKEAAAKEAAAKEAAAKA
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
LEAEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKALE
APA
APAPA
APAPAPA
APAPAPAPA
APAPAPAPAPA
APAPAPAPAPAPA
APAPAPAPAPAPAPA
APAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPAPA
XP
XPXP
XPXPXP
XPXPXPXP
XPXPXPXPXP
XPXPXPXPXPXP
XPXPXPXPXPXPXP
XPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXPXP
GGGGSEAAAKGGGGS
Table V.7 provides chimera fusion proteins between LbCas12a variant amino acid sequences (E795L), linker amino acid sequences (underlined), and eGFP amino acid sequences (bolded, italicized).
GSAGSAAGSGEF
GGGGSGGGGSGGGGSGGGGS
AEAAAKEAAAKEAAAKEAAAKA
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
LEAEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKALE
APA
APAPA
APAPAPA
APAPAPAPA
APAPAPAPAPA
APAPAPAPAPAPA
APAPAPAPAPAPAPA
APAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPAPA
XP
XPXP
XPXPXP
XPXPXPXP
XPXPXPXPXP
XPXPXPXPXPXP
XPXPXPXPXPXPXP
XPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXPXP
GGGGSEAAAKGGGGS
Table V.8 provides chimera fusion proteins between LbCas12a variant amino acid sequences (E795L), linker amino acid sequences (underlined), and mCherry amino acid sequences (bolded, italicized).
GSAGSAAGSGEF
GGGGSGGGGSGGGGSGGGGS
AEAAAKEAAAKEAAAKEAAAKA
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
LEAEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKALE
APA
APAPA
APAPAPA
APAPAPAPA
APAPAPAPAPA
APAPAPAPAPAPA
APAPAPAPAPAPAPA
APAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPA
APAPAPAPAPAPAPAPAPAPAPAPAPAPA
XP
XPXP
XPXPXP
XPXPXPXP
XPXPXPXPXP
XPXPXPXPXPXP
XPXPXPXPXPXPXP
XPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXP
XPXPXPXPXPXPXPXPXPXPXPXPXPXP
GGGGSEAAAKGGGGS
Table V.9 provides the LbCas12a protein variant (E795L) amino acid sequence.
Table V.10 provides chimera fusion proteins between Cas protein amino acid sequences to either eGFP or mCherry amino acid sequences (bolded, italicized) without an intervening linker polypeptide.
Exemplary DNA sequences encoding Cas9 protein chimeras
All references, including publications, patent applications, and patents cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.
This application claims benefit of priority under 35 U.S.C. 119 to U.S. Provisional Patent Application Ser. No. 63/012,658, filed Apr. 20, 2020 and entitled “OPTIMIZED PROTEIN FUSIONS AND LINKERS,” the contents of which are herein incorporated by reference in their entirety.
Number | Date | Country | |
---|---|---|---|
63012658 | Apr 2020 | US |