The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jan. 19, 2022, is named A2186-7040WO_SL.txt and is 184,784 bytes in size.
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) genes, collectively known as CRISPR-Cas or CRISPR/Cas systems, are adaptive immune systems in archaea and bacteria that defend particular species against foreign genetic elements.
It is against the above background that the present invention provides certain advantages and advancements over the prior art.
The disclosure provides Cas12i2 fusion proteins, compositions, systems, and methods of using the Cas12i2 fusion proteins. In particular, such Cas12i2 fusion proteins contain one or more domains, wherein at least one of the domains includes a portion of a Cas12i2 domain and one or more heterologous sequences. The heterologous sequences in the Cas12i2 fusion proteins may include a fusion domain (e.g., a base editing domain, a ssDNA binding domain, an NLS, a poly-basic domain, a restriction endonuclease, or a CRISPR nuclease). The Cas12i2 domain (e.g., at least a portion of SEQ ID NO: 1 or any of SEQ ID NOs: 39-43) in the Cas12i2 fusion proteins may contact (e.g., associate with, recognize, or bind) a target nucleic acid at a position specified by an RNA guide. While the amino acid numbering system used herein is in relation to SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, other Cas12i2 sequences can be used. One of ordinary skill in the art can identify the corresponding amino acid positions in other Cas12i2 sequences using available tools, such as sequence alignment algorithms.
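As a minimal sketch of how corresponding amino acid positions might be identified, the following Python function assumes that a pairwise alignment of the reference sequence and another Cas12i2 sequence has already been produced by an alignment tool, and it walks the two gapped rows to map 1-based reference positions onto the other sequence. The function name and the toy sequences are hypothetical and are not taken from the Sequence Listing.

```python
def map_positions(aligned_ref, aligned_other, gap="-"):
    """Map 1-based reference positions to 1-based positions in the other sequence.

    Both arguments are rows of one pairwise alignment (equal length, gaps as '-').
    A reference residue aligned to a gap maps to None.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    ref_pos = 0
    other_pos = 0
    for ref_char, other_char in zip(aligned_ref, aligned_other):
        if other_char != gap:
            other_pos += 1
        if ref_char != gap:
            ref_pos += 1
            mapping[ref_pos] = other_pos if other_char != gap else None
    return mapping


# Toy alignment rows (hypothetical sequences, for illustration only).
toy_ref = "MKD-EFS"
toy_other = "M-DAEFS"
toy_mapping = map_positions(toy_ref, toy_other)
print(toy_mapping)
```

In this toy alignment, reference position 3 corresponds to position 2 of the other sequence, and reference position 2 has no counterpart.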
In one aspect, the disclosure provides a Cas12i2 fusion protein comprising:
In some embodiments, n<m. In some embodiments, m=n+1.
In some embodiments, the Cas12i2 fusion protein is a Cas12i2 fusion protein as described herein (e.g., in section Cas12i2 fusion proteins of the detailed description).
In some embodiments, the heterologous sequence comprises a fusion domain (e.g., a base editing domain, a ssDNA binding domain, an NLS, a poly-basic domain, a restriction endonuclease, or a CRISPR nuclease). In certain embodiments, the heterologous sequence comprises at least one linker sequence. In some embodiments, the heterologous sequence comprises a first linker (e.g., a first peptide linker) and a second linker (e.g., a second peptide linker). In certain embodiments, the first linker and the second linker each independently comprise between 3 and 60 amino acid residues. In some embodiments, the first linker and the second linker each independently comprise one or more Gly residues and one or more Ser residues. In certain embodiments, the first linker and the second linker each independently comprise (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In some embodiments, the first linker is N-terminal of the fusion domain and the second linker is C-terminal of the fusion domain. In certain embodiments, the first linker and the second linker are the same. In some embodiments, the first linker and the second linker are different.
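The repeat notation for the linkers can be illustrated with a short Python sketch. It expands (GSG)x, (GGGS)x, or (GSSG)x for a given x and checks whether the result falls within the 3 to 60 residue range, which is treated here as inclusive (an assumption). The function names are hypothetical.

```python
LINKER_UNITS = ("GSG", "GGGS", "GSSG")


def build_linker(unit, x):
    """Expand a repeat unit x times, e.g. ("GGGS", 3) -> "GGGSGGGSGGGS"."""
    if unit not in LINKER_UNITS:
        raise ValueError("unit must be one of %s" % (LINKER_UNITS,))
    if not 0 <= x <= 15:
        raise ValueError("x must be an integer between 0 and 15")
    return unit * x


def within_length_range(linker, low=3, high=60):
    """Check the linker length against an inclusive residue range (assumed inclusive)."""
    return low <= len(linker) <= high


first_linker = build_linker("GGGS", 3)
second_linker = build_linker("GSG", 1)
print(first_linker, len(first_linker), within_length_range(first_linker))
print(second_linker, len(second_linker), within_length_range(second_linker))
```

Under these assumptions, (GGGS)15 gives the longest linker in the family at 60 residues, while x = 0 gives an empty string that falls outside the 3 to 60 residue range.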
In one aspect, the disclosure provides a Cas12i2 fusion protein comprising:
In some embodiments, the heterologous sequence further comprises a fusion domain. In some embodiments, the Cas12i2 fusion protein is a Cas12i2 fusion protein as described herein (e.g., in section Fusion proteins with dimerization domains of the detailed description).
In one aspect, the disclosure provides a Cas12i2 fusion protein comprising:
In some embodiments, the Cas12i2 fusion protein is a Cas12i2 fusion protein as described herein (e.g., in section N-terminal and C-terminal split fusion).
In another aspect, the disclosure provides an engineered, non-naturally occurring Cas12i2 fusion protein comprising:
In some embodiments, the Cas12i2 fusion protein is capable of specifically binding or contacting a target nucleic acid (e.g., a target nucleic acid complementary to the spacer sequence). In some embodiments, the first portion and the second portion are linked by a heterologous sequence. In certain embodiments, the heterologous sequence comprises one or more of:
In some embodiments, the C-terminal most amino acid of the first portion is any amino acid residue of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 chosen from residues:
In any of the embodiments described herein, the Cas12i2 fusion protein further comprises a second heterologous sequence at its N-terminus. In some embodiments, the Cas12i2 fusion protein further comprises an additional heterologous sequence at its C-terminus. In some embodiments, the second heterologous sequence and/or the additional heterologous sequence is chosen from a purification tag, stability tag, or restriction endonuclease or domain thereof. In some embodiments, the heterologous sequence comprises a FokI nuclease domain (e.g., a catalytically active FokI nuclease or a catalytically inactive FokI nuclease domain). In certain embodiments, the N-terminal Met residue of SEQ ID NO: 1, 39-43, 73, or 74 is absent.
In any of the embodiments described herein, the first portion further comprises a fusion domain, the second portion comprises a fusion domain, or the first portion and the second portion comprise a fusion domain. In certain embodiments, a) the first portion comprises a catalytically active FokI nuclease domain and the second portion comprises a catalytically inactive FokI nuclease domain; or
In some embodiments, the Cas12i2 fusion protein is capable of binding an RNA guide, wherein the RNA guide comprises a direct repeat sequence and a spacer sequence, wherein the spacer is capable of hybridizing to a target nucleic acid, e.g., a target strand (i.e., non-PAM strand) of a target nucleic acid.
In any of the aspects described herein, the Cas12i2 fusion protein comprises a catalytic residue (e.g., D599, E833, and D1019). In certain embodiments, the Cas12i2 fusion protein comprises a mutation at any one of amino acid residue D599, E833, or D1019 of SEQ ID NO: 1. In certain embodiments, the Cas12i2 fusion protein is a deadCas12i2 fusion protein (e.g., a variant Cas12i2 fusion protein comprising D599A, E833A, and/or D1019A). In some embodiments, the Cas12i2 fusion protein comprises a catalytically inactive RuvC domain. In certain embodiments, the Cas12i2 fusion protein comprises nickase activity. In some embodiments, the Cas12i2 fusion protein is capable of binding an RNA guide comprising a direct repeat sequence and a spacer sequence, wherein the spacer sequence is capable of hybridizing to a target nucleic acid.
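The substitution notation used above (e.g., D599A) can be read as "wild-type residue, 1-based position, replacement residue." The following Python sketch applies such substitutions to a sequence and verifies the expected wild-type residue first. Because SEQ ID NO: 1 is not reproduced here, the example runs on a placeholder sequence of arbitrary length that merely carries D, E, and D at the three named positions; the placeholder and the function name are hypothetical.

```python
import re


def apply_substitutions(sequence, substitutions):
    """Apply substitutions such as "D599A" to a protein sequence (1-based positions)."""
    residues = list(sequence)
    for substitution in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
        if match is None:
            raise ValueError("cannot parse substitution: %s" % substitution)
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError("position %d is outside the sequence" % position)
        if residues[position - 1] != wild_type:
            raise ValueError("expected %s at position %d" % (wild_type, position))
        residues[position - 1] = replacement
    return "".join(residues)


# Placeholder only: glycine everywhere except the three named catalytic positions.
placeholder = list("G" * 1100)
for position, residue in ((599, "D"), (833, "E"), (1019, "D")):
    placeholder[position - 1] = residue
placeholder = "".join(placeholder)

dead_variant = apply_substitutions(placeholder, ["D599A", "E833A", "D1019A"])
print(dead_variant[598], dead_variant[832], dead_variant[1018])
```

The wild-type check guards against applying a substitution to a sequence whose numbering differs from the reference numbering.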
In some embodiments, the heterologous sequence comprises a peptide tag, a fluorescent protein, a base-editing domain, a DNA methylation domain, a histone residue modification domain, a localization factor (e.g., an NLS), a transcription modification factor, a light-gated control factor, a chemically inducible factor, a chromatin visualization factor, a restriction endonuclease, or a CRISPR nuclease. In some embodiments, the Cas12i2 fusion protein comprises a fusion domain having an amino acid sequence of SEQ ID NO: 66 or SEQ ID NO: 67, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, the fusion domain is situated at the N-terminus or C-terminus of the Cas12i2 fusion protein. In certain embodiments, the fusion domain comprises an NLS. In some embodiments, the NLS comprises an amino acid sequence of any one of SEQ ID NOs: 61-65, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In certain embodiments, the Cas12i2 protein comprises an amino acid sequence of SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 73, or SEQ ID NO: 74, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, the heterologous sequence is about 1-5, 5-10, 10-20, 20-30, 30-40, 40-50, 50-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1100, 1100-1400, 1400-1600, 1600-1800, or 1800-2000 amino acids in length.
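The percent identity thresholds recited above can be illustrated with a small Python sketch. It assumes the two sequences have already been aligned and computes identity as identical columns divided by alignment columns; this is only one of several conventions for percent identity, and the text above does not specify which is intended. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns (one convention among several)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    # Ignore columns in which both rows carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


THRESHOLDS = (80, 85, 90, 95, 96, 97, 98, 99)

# Toy sequences differing at one of ten positions (illustration only).
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
thresholds_met = [t for t in THRESHOLDS if identity >= t]
print(identity, thresholds_met)
```

With one mismatch in ten aligned positions, the toy pair meets the 80%, 85%, and 90% thresholds but not the 95% threshold.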
In another aspect, the disclosure provides a system comprising:
In another aspect, the disclosure provides a nucleic acid encoding the Cas12i2 fusion protein or the system described herein.
In one aspect, the disclosure provides a composition comprising: a first nucleic acid encoding the Cas12i2 fusion protein of any aspect described herein and a second nucleic acid comprising or encoding an RNA guide comprising a direct repeat sequence and a spacer sequence, wherein the spacer sequence is capable of hybridizing to a target nucleic acid, e.g., to a target strand (i.e., non-PAM strand) of a target nucleic acid.
In another aspect, the disclosure provides a vector comprising:
Another aspect of the invention provides a cell comprising the Cas12i2 fusion protein of any aspect described herein or the system of any aspect described herein. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a prokaryotic cell.
In another aspect, the disclosure provides a cell comprising the Cas12i2 fusion protein, the system, the nucleic acid, or the vector of any aspect described herein. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a prokaryotic cell.
In another aspect, the disclosure provides a method of binding or contacting the Cas12i2 fusion protein of any aspect described herein, or any system described herein with a target nucleic acid in a cell comprising:
In some embodiments, the target nucleic acid is a double-stranded DNA.
In another aspect, the disclosure provides a method of modifying a target nucleic acid, the method comprising delivering to the target nucleic acid (i) a Cas12i2 fusion protein of any aspect described herein, or any system described herein and (ii) an RNA guide comprising a direct repeat sequence and a spacer sequence, wherein the spacer sequence is capable of hybridizing to the target nucleic acid, e.g., to a target strand (i.e., non-PAM strand) of the target nucleic acid, wherein the Cas12i2 fusion protein is capable of binding to the RNA guide, wherein recognition of the target nucleic acid by the CRISPR-associated protein and RNA guide results in a modification of the target nucleic acid. In some embodiments, the modification comprises DNA methylation, epigenetic modification, or DNA cleavage (e.g., single stranded cleavage, double stranded cleavage, or nicking). In some embodiments, the target nucleic acid comprises a target strand and a non-target strand, and the system modifies the target strand. In certain embodiments, the Cas12i2 fusion protein is any Cas12i2 protein comprising a heterologous sequence disposed between any one of residues i) 342-358 (e.g., 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358); ii) 373-378 (e.g., 373, 374, 375, 376, 377, or 378); iii) 408-413 (e.g., 408, 409, 410, 411, 412, or 413); iv) 677-685 (e.g., 677, 678, 679, 680, 681, 682, 683, 684, or 685); v) 718-723 (e.g., 718, 719, 720, 721, 722, or 723); vi) 771-782 (e.g., 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782); or vii) 953-965 (e.g., 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965) of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43.
In some embodiments, the target nucleic acid comprises a target strand and a non-target strand, and the system modifies the non-target strand. In certain embodiments, the Cas12i2 fusion protein is any Cas12i2 protein comprising a heterologous sequence disposed between any one of residues viii) 55-65 (e.g., 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, or 65); ix) 99-105 (e.g., 99, 100, 101, 102, 103, 104, or 105); x) 112-120 (e.g., 112, 113, 114, 115, 116, 117, 118, 119, or 120); xi) 195-206 (e.g., 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, or 206); xii) 241-250 (e.g., 241, 242, 243, 244, 245, 246, 247, 248, 249, or 250); xiii) 583-594 (e.g., 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, or 594); or xiv) 877-901 (e.g., 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, or 901) of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43.
In some embodiments of any of the compositions described herein, the system is present in a delivery composition comprising a nanoparticle, a liposome, an exosome, a microvesicle, or a gene-gun.
In some embodiments of any of the compositions described herein, the compositions are within a cell. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the cell is a prokaryotic cell.
Other features and advantages of the invention will be apparent from the following detailed description and from the claims.
The figures are a series of schematics that represent exemplary Cas12i2 fusion proteins.
The present disclosure relates to novel Cas12i2 fusion proteins and methods of use thereof. In some aspects, a composition comprising a Cas12i2 fusion protein having one or more characteristics is described herein. In some aspects, a method of producing a Cas12i2 fusion protein is described. In some aspects, a method of delivering a composition comprising a Cas12i2 fusion protein is described.
The term “base editing domain,” as described herein refers to an agent comprising a polypeptide that is capable of making a chemical modification to a base (e.g., A, T, C, G, or U) within a nucleic acid sequence (e.g., DNA or RNA). In some embodiments, a base editing domain changes a first canonical base into a second canonical base. In some embodiments, a base editing domain changes a canonical base into a non-canonical base.
As used herein, a “biologically active portion” of a polypeptide is a portion of a polypeptide that maintains a function (e.g., completely, partially, or minimally) of the polypeptide (e.g., a Cas12i2 domain (e.g., a “minimal” or “core” domain) or a fusion domain).
As used herein, the term “catalytic residue” refers to an amino acid that activates catalysis. A catalytic residue is an amino acid that is involved (e.g., directly involved) in catalysis.
The term “Cas12i2 fusion protein,” as used herein, refers to a polypeptide having: i) one or more domains, wherein at least one of the domains includes a portion of a Cas12i2 domain and ii) a heterologous sequence, wherein the Cas12i2 fusion protein comes into contact with a target nucleic acid specified by an RNA guide. In some embodiments, the Cas12i2 fusion protein has enzymatic (e.g., nuclease) activity. In some embodiments, an enzymatic activity (e.g., nuclease activity) can be carried out by the Cas12i2 domain. In some instances, the Cas12i2 domain comprises an amino acid sequence having at least 80% (e.g., 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) sequence identity to any one of SEQ ID NOs: 1 or 39-43 or a portion thereof. In some instances, the Cas12i2 domain has the sequence of SEQ ID NO: 1 or a portion thereof. In some instances, the Cas12i2 domain includes a first portion and a second portion, wherein the first portion and the second portion together bind to an RNA guide comprising a direct repeat sequence and a spacer sequence. Optionally, the first and second portions are not directly adjacent to each other. For example, in some instances, a heterologous sequence is adjacent to the first portion and to the second portion. In some instances, a heterologous sequence is C-terminal of the first portion and N-terminal of the second portion. In some instances, the heterologous sequence is N-terminal of the first portion and C-terminal of the second portion. While the amino acid numbering system used herein is in relation to SEQ ID NO: 1, other Cas12i2 sequences can be used. One of ordinary skill in the art can identify the corresponding amino acid positions in other Cas12i2 sequences using available tools, such as sequence alignment algorithms.
As used herein, the term “dimerization domain,” refers to a polypeptide domain capable of specifically binding a separate, and compatible, polypeptide domain (e.g., a second compatible dimerization domain). In some embodiments, the dimer is formed by a non-covalent bond between the first dimerization domain and the second compatible dimerization domain. In some embodiments, the first dimerization domain and the second compatible dimerization domain are identical (e.g., a homodimer). In some embodiments, the first dimerization domain and the second dimerization domain are not identical (e.g., a heterodimer). In some embodiments, a dimerization domain is a leucine zipper. In some instances, the dimerization domain is a chemically inducible dimerization domain (e.g., a rapamycin sensitive dimerization domain) that can be regulated by the presence of a small molecule.
As used herein, the terms “domain” and “protein domain” refer to a distinct functional and/or structural unit of a polypeptide. In some embodiments, a domain may comprise a conserved amino acid sequence. As used herein, the term “RuvC domain” refers to a conserved domain or motif of amino acids having nuclease (e.g., endonuclease) activity. As used herein, a protein having a split RuvC domain refers to a protein having two or more RuvC motifs, at sequentially disparate sites within a sequence, that interact in a tertiary structure to form a RuvC domain.
The term “fusion domain,” as used herein, refers to a polypeptide domain that is operably linked to a second, heterologous domain. In some embodiments, the fusion domain is about 1-5, 10-20, 20-50, 50-100, or 100-200 amino acids in length.
The term “heterologous,” when used to describe a first element in reference to a second element means that the first element and second element do not exist in nature disposed as described. For example, a heterologous polypeptide sequence refers to (a) a polypeptide, or portion of a polypeptide that is operably linked to a second polypeptide sequence to which it is not operably linked in nature, (b) a polypeptide or portion of a polypeptide that is not native to a cell in which it is expressed, (c) a polypeptide or portion of a polypeptide that has been altered or mutated relative to its native state, or (d) a polypeptide with an altered expression as compared to the native expression levels under similar conditions. As an example, a heterologous sequence of a polypeptide may be a different sequence or from a different source, relative to other domains or portions of a polypeptide. In some instances, the heterologous sequence includes a fusion domain and at least one linker sequence.
As used herein, the term “insertion” refers to a gain of residues in an amino acid sequence.
As used herein, the term “nuclease” refers to an enzyme capable of cleaving a phosphodiester bond. A nuclease hydrolyzes phosphodiester bonds in a nucleic acid backbone. As used herein, the term “endonuclease” refers to an enzyme capable of cleaving a phosphodiester bond between nucleotides.
As used herein, the terms “parent,” “parent polypeptide,” and “parent sequence” refer to an original polypeptide (e.g., starting polypeptide) to which an alteration is made to produce a variant polypeptide. In some embodiments, the parent is a Cas12i2 having an amino acid sequence identical to that of the variant at one or more specified positions. The parent may be a naturally occurring (wild-type) polypeptide. In a particular embodiment, the parent is a polypeptide with at least 60%, at least 61%, at least 62%, at least 63%, at least 64%, at least 65%, at least 70%, at least 72%, at least 73%, at least 74%, at least 75%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity to a polypeptide described herein of any one of SEQ ID NO: 1 and SEQ ID NOs: 39-43.
The term “basic domain,” as used herein refers to a polypeptide domain comprising a plurality of basic amino acids (e.g., histidine, lysine, arginine, or any combination thereof). In some embodiments, the basic domain can bind to a nucleic acid. Optionally, a basic domain can comprise one or more non-basic (e.g., polar, nonpolar, or acidic) amino acids dispersed throughout. For example, in some embodiments, the basic domain comprises a plurality of lysine residues but no histidine or arginine residues. In other embodiments, the basic domain may comprise a plurality of lysine residues and one or both of histidine and arginine residues.
The term “poly-basic domain,” as used herein refers to a polypeptide domain comprising a combination of histidine, lysine, and/or arginine that can bind a nucleic acid, e.g., by interacting with the negatively charged phosphate backbone of DNA through electrostatic interactions, and, optionally, one or more non-basic (e.g., polar, nonpolar, or acidic) amino acids dispersed throughout. In some instances, the poly-basic domain comprises between 5 and 50 (e.g., between 5-10, 10-20, 20-30, 30-40, or 40-50) arginine residues. In some instances, the poly-basic domain comprises between 5 and 50 (e.g., between 5-10, 10-20, 20-30, 30-40, or 40-50) lysine residues. In some instances, the poly-basic domain comprises between 5 and 50 (e.g., between 5-10, 10-20, 20-30, 30-40, or 40-50) histidine residues. In some instances, the poly-basic domain comprises one or more polar amino acids (e.g., Q, N, and/or S) located between two poly-basic sequences, each independently between 5 and 25 (e.g., between 5-10, 10-15, 15-20, or 20-25) residues in length.
The term “polypeptide linker,” as used herein refers to a linker that comprises amino acids and links together two amino acid sequences (e.g., domains). In some embodiments, the polypeptide linker comprises glycine and/or serine residues used alone or in combination. In some embodiments, the polypeptide linker connects two portions of the Cas12i2 fusion protein together.
As used herein, the term “protospacer adjacent motif” or “PAM” refers to a DNA sequence adjacent to a target sequence to which a complex comprising a CRISPR nuclease (e.g., a Cas12i2 fusion protein) and an RNA guide binds. In some embodiments, a PAM is required for binding of a Cas12i2 fusion protein and an RNA guide to a target nucleic acid. As used herein, the term “adjacent” includes instances in which an RNA guide of the complex specifically binds, interacts, or associates with a target sequence that is immediately adjacent to a PAM. In such instances, there are no nucleotides between the target sequence and the PAM. The term “adjacent” also includes instances in which there are a small number (e.g., 1, 2, 3, 4, or 5) of nucleotides between the target sequence, to which the targeting moiety binds, and the PAM.
As used herein, the terms “reference composition,” “reference sequence,” and “reference” refer to a control, such as a negative control or a parent (e.g., a parent sequence, a parent protein, a wild-type protein, or a complex comprising a parent sequence).
As used herein, the terms “RNA guide” or “RNA guide sequence” refer to any RNA molecule that facilitates the targeting of a polypeptide described herein (e.g., a Cas12i2 fusion protein) to a target nucleic acid. For example, an RNA guide can be a molecule that recognizes (e.g., binds to) a target nucleic acid. An RNA guide may be designed to be complementary to a target nucleic acid, e.g., a target strand (i.e., non-PAM strand) of a target nucleic acid sequence. An RNA guide comprises a DNA targeting sequence and a direct repeat (DR) sequence. The terms CRISPR RNA (crRNA), pre-crRNA, mature crRNA, and gRNA are also used herein to refer to an RNA guide. As used herein, the term “pre-crRNA” refers to an unprocessed RNA molecule comprising a DR-spacer-DR sequence. As used herein, the term “mature crRNA” refers to a processed form of a pre-crRNA; a mature crRNA may comprise a DR-spacer sequence, wherein the DR is a truncated form of the DR of a pre-crRNA and/or the spacer is a truncated form of the spacer of a pre-crRNA.
The term “split fusion domain,” as used herein refers to: (i) a first portion (e.g., an N-terminal portion, a C-terminal portion, or a central portion) of a reference polypeptide, and (ii) a second portion of the reference polypeptide; wherein (i) and (ii) are non-contiguous (e.g., are present on a single polypeptide chain but separated by a Cas12i2 domain or are present on different polypeptide chains); and wherein (i) and (ii) bound together have one or more activity of the reference polypeptide.
The term “ssDNA binding domain,” as used herein refers to a polypeptide domain that binds a single stranded DNA molecule (e.g., an unwound portion of a largely double stranded DNA molecule). In some instances, the ssDNA binding domain comprises a single-stranded DNA binding protein (SSB) found in E. coli (see, e.g., Oakley A. J. Nucleic Acids Research 42(4): 2750-2757, 2014).
As used herein, the term “substantially identical” refers to a sequence, polynucleotide, or polypeptide, that has a certain degree of identity to a reference sequence.
As used herein, the terms “target nucleic acid” and “target sequence” refer to a nucleic acid sequence to which a targeting moiety (e.g., RNA guide) specifically binds. In some embodiments, the DNA targeting sequence of an RNA guide binds to a target nucleic acid. The target nucleic acid is typically a double-stranded molecule, wherein one strand comprises the target sequence adjacent to the PAM and is referred to as the “PAM strand” (i.e., the non-target strand or the non-spacer-complementary strand), and the other, complementary strand is referred to as the “non-PAM strand” (i.e., the target strand or the spacer-complementary strand).
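The strand terminology can be made concrete with a short Python sketch. Given the target sequence as it reads on the PAM strand (the non-target strand), the non-PAM strand (the target strand) is its reverse complement, and a spacer that hybridizes to the non-PAM strand has the same base sequence as the PAM-strand target with U in place of T. The function names and the toy sequence are hypothetical, and the sketch ignores PAM position and spacer length.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def spacer_rna_for(pam_strand_target):
    """Spacer RNA (5' to 3') complementary to the non-PAM (target) strand."""
    return pam_strand_target.upper().replace("T", "U")


pam_strand_target = "GATTACAGGC"  # toy sequence, illustration only
non_pam_strand = reverse_complement(pam_strand_target)
spacer = spacer_rna_for(pam_strand_target)
print(non_pam_strand, spacer)
```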
The present disclosure provides, e.g., fusion proteins including: i) one or more domains, wherein at least one of the domains includes a portion of a Cas12i2 domain and ii) a heterologous sequence, wherein the Cas12i2 fusion protein comes into contact with (e.g., associates with, recognizes, or binds) a target nucleic acid with an RNA guide. In some embodiments, the Cas12i2 fusion protein has enzymatic activity. In some embodiments, the enzymatic activity can be carried out by the Cas12i2 domain. In some embodiments, the heterologous sequence comprises a fusion domain (e.g., a domain having various activities, e.g., methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity, DNA cleavage activity, nucleic acid binding activity, and switch activity (e.g., light inducible)). In some embodiments, the Cas12i2 fusion protein comprises a domain architecture shown, for example, in any of
In one aspect, the disclosure provides a Cas12i2 fusion protein comprising:
In some embodiments, n<m. In some embodiments, m=n+1.
In some embodiments of any Cas12i2 fusion protein described herein, a) n is 342 and m is 343, or b) n is 347 and m is 348. In some embodiments, the first portion comprises at least 273, 280, 290, 300, 310, 320, 330, 340, 341, or 342 amino acids. In certain embodiments, the second portion comprises at least 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 711, or 712 amino acids. In certain embodiments, the C-terminal amino acid(s) of the first portion comprise FDS, DS, or S. In some embodiments, the N-terminal amino acid(s) of the second portion comprise EFS, EF, or E. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of SEFFSGEETYTICVHHL (SEQ ID NO: 2), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, 11 and 12, 12 and 13, or 13 and 14, of SEQ ID NO: 2. In certain embodiments, one or more amino acids of SEQ ID NO: 2 are absent from the Cas12i2 fusion protein. In other embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 sequential amino acids of SEQ ID NO: 2 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 sequential amino acids of SEQ ID NO: 2 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
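The relationship between n, m, and the two portions can be sketched in Python as follows. The sketch assumes that the first portion corresponds to residues 1 through n and the second portion to residues m through the C-terminus, with m = n + 1, and that the heterologous sequence is placed between them. The placeholder sequence, its length of 1054 residues (inferred here from the largest listed portion sizes, 342 + 712), the toy insert, and the function name are all assumptions made for illustration.

```python
def insert_heterologous(cas_sequence, n, heterologous):
    """Split after residue n (1-based) and place a heterologous sequence at the junction.

    Returns (first_portion, second_portion, fusion), where the second portion
    starts at residue m = n + 1 of the original sequence.
    """
    if not 1 <= n < len(cas_sequence):
        raise ValueError("n must leave at least one residue in each portion")
    first_portion = cas_sequence[:n]
    second_portion = cas_sequence[n:]
    return first_portion, second_portion, first_portion + heterologous + second_portion


# Placeholder sequence; the length is an inferred assumption, not a quoted value.
placeholder_cas = "A" * 1054
toy_insert = "GSG" + "KKKKK" + "GSG"  # hypothetical linker-domain-linker insert

first, second, fusion = insert_heterologous(placeholder_cas, 342, toy_insert)
print(len(first), len(second), len(fusion))
```

With n = 342, the portions in this sketch are 342 and 712 residues long, which matches the largest values in the lists above.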
In certain embodiments, n is 374 and m is 375. In some embodiments, the first portion comprises at least 300, 310, 320, 330, 340, 350, 360, 370, 373, 374, 375, 376, or 377 amino acids. In certain embodiments, the second portion comprises at least 544, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, or 680 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise DDP, DP, or P. In some embodiments, the N-terminal amino acid(s) of the second portion comprise ADP, AD, or A. In certain embodiments, the heterologous moiety is situated between any two adjacent amino acids of DPADPE (SEQ ID NO: 3), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, or 5 and 6 of SEQ ID NO: 3. In some embodiments, one or more amino acids of SEQ ID NO: 3 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 3 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 3 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments of the Cas12i2 fusion proteins described herein, a) n is 409 and m is 410 or b) n is 410 and m is 411. In certain embodiments, the first portion comprises at least 328, 330, 340, 350, 360, 370, 380, 390, 400, 405, 406, 407, 408, 409, or 410 amino acids. In some embodiments, the second portion comprises at least 516, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 641, 642, 643, 644, or 645 amino acids. In certain embodiments, the C-terminal amino acid(s) of the first portion comprise IRQE, RQ, Q, or E. In some embodiments, the N-terminal amino acid(s) of the second portion comprise ECS, EC, E, or C. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of RQECSA (SEQ ID NO: 4), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, or 5 and 6 of SEQ ID NO: 4. In some embodiments, one or more amino acids of SEQ ID NO: 4 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 4 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 4 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments, n is 682 and m is 683. In some embodiments, the first portion comprises at least 546, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 681, or 682 amino acids. In certain embodiments, the second portion comprises at least 298, 300, 310, 320, 330, 340, 350, 360, 370, 371, or 372 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise KKK, KK, or K. In some embodiments, the N-terminal amino acid(s) of the second portion comprise EIV, EI, or E. In certain embodiments, the heterologous moiety is situated between any two adjacent amino acids of KKNKKKEIV (SEQ ID NO: 5), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, or 8 and 9 of SEQ ID NO: 5. In some embodiments, one or more amino acids of SEQ ID NO: 5 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 5 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 5 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
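As a minimal sketch, the following Python function enumerates every junction between adjacent amino acids of a motif and reports the residue numbers on each side. It is run on KKNKKKEIV (SEQ ID NO: 5) under the assumption that the motif spans residues 677-685 of the reference numbering, which is consistent with n = 682 and m = 683 falling between the KKK and EIV segments but is an inference rather than a quoted statement. The function name and the dictionary keys are hypothetical.

```python
def enumerate_junctions(motif, first_residue_number):
    """List each junction between adjacent motif residues.

    motif_positions are 1-based positions within the motif; residues are the
    corresponding 1-based positions (n, m) in the full-length numbering.
    """
    junctions = []
    for i in range(1, len(motif)):
        junctions.append({
            "motif_positions": (i, i + 1),
            "residues": (first_residue_number + i - 1, first_residue_number + i),
            "n_terminal_side": motif[:i],
            "c_terminal_side": motif[i:],
        })
    return junctions


# Assumed mapping: the first residue of the motif is residue 677.
sites = enumerate_junctions("KKNKKKEIV", 677)
for site in sites:
    print(site["motif_positions"], site["residues"],
          site["n_terminal_side"], site["c_terminal_side"])
```

Under this assumed numbering, the junction with n = 682 and m = 683 has KKK immediately N-terminal and EIV immediately C-terminal of the insertion point.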
Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the PAM Distal Region of Amino Acids V718-L723
In some embodiments, n is 721 and m is 722. In some embodiments, the first portion comprises at least 577, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, or 721 amino acids. In certain embodiments, the second portion comprises at least 266, 270, 280, 290, 300, 310, 320, 330, 331, 332, or 333 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise RGK, GK, or K. In some embodiments, the N-terminal amino acid(s) of the second portion comprise SLV, SL, or S. In certain embodiments, the heterologous moiety is situated between any two adjacent amino acids of VRGKSL (SEQ ID NO: 6), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, or 5 and 6 of SEQ ID NO: 6. In some embodiments, one or more amino acids of SEQ ID NO: 6 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 6 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 6 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments, n is 778 and m is 779. In certain embodiments, the first portion comprises at least 622, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 775, 776, 777, or 778 amino acids. In certain embodiments, the second portion comprises at least 221, 225, 230, 240, 250, 260, 270, 275, or 276 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise KNN, NN, or N. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise PIS, PI, or P. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of ALNASKNNPISD (SEQ ID NO: 7), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, or 11 and 12 of SEQ ID NO: 7. In some embodiments, one or more amino acids of SEQ ID NO: 7 are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 7 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 7 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments, n is 960 and m is 961. In certain embodiments, the first portion comprises at least 768, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, or 960 amino acids. In certain embodiments, the second portion comprises at least 75, 80, 85, 90, 91, 92, 93, or 94 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise DRK, RK, or K. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise SNI, SN, or S. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of LKWRSDRKSNIPC (SEQ ID NO: 8), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, 11 and 12, or 12 and 13 of SEQ ID NO: 8. In certain embodiments, one or more amino acids of SEQ ID NO: 8 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 sequential amino acids of SEQ ID NO: 8 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 sequential amino acids of SEQ ID NO: 8 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments of the Cas12i2 fusion protein described herein, a) n is 61 and m is 62, or b) n is 62 and m is 63. In some embodiments, the first portion comprises at least 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, or 61 amino acids. In certain embodiments, the second portion comprises at least 795, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, or 991 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise EKQ, KQ, or Q. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise QQD, QQ, or Q. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of STEQEKQQQDI (SEQ ID NO: 9), e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, or 8 and 9 of SEQ ID NO: 9. In certain embodiments, one or more amino acids of SEQ ID NO: 9 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 amino acids of SEQ ID NO: 9 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 9 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In certain embodiments of the Cas12i2 fusion protein described herein, a) n is 101 and m is 102, or b) n is 102 and m is 103. In certain embodiments, the first portion comprises at least 81, 90, 100, or 101 amino acids. In certain embodiments, the second portion comprises at least 762, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 951, 952, or 953 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise YGGT, YGG, GG, G, or T. In some embodiments, the N-terminal amino acid(s) of the second portion comprise TAS, TA, AS, T, or A. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of YGGTASD (SEQ ID NO: 10), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, or 6 and 7 of SEQ ID NO: 10. In some embodiments, one or more amino acids of SEQ ID NO: 10 are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, or 7 sequential amino acids of SEQ ID NO: 10 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, or 7 sequential amino acids of SEQ ID NO: 10 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments, n is 116 and m is 117. In certain embodiments, the first portion comprises at least 81, 90, 100, or 101 amino acids. In some embodiments, the second portion comprises at least 762, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 951, 952, or 953 amino acids. In other embodiments, the C-terminal amino acid(s) of the first portion comprise SIG, IG, or G. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise ESY, ES, or E. In other embodiments, the heterologous moiety is situated between any two adjacent amino acids of SASIGESYY (SEQ ID NO: 11), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, or 8 and 9 of SEQ ID NO: 11. In some embodiments, one or more amino acids of SEQ ID NO: 11 are absent from the Cas12i2 fusion protein. In other embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 11 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 11 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments, n is 199 and m is 200. In other embodiments, the first portion comprises at least 160, 170, 180, 190, 195, 196, 197, 198, or 199 amino acids. In certain embodiments, the second portion comprises at least 684, 690, 700, 710, 720, 730, 740, 750, 760, 780, 790, 800, 810, 820, 830, 840, 850, or 855 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise LKE, KE, or E. In some embodiments, the N-terminal amino acid(s) of the second portion comprise IPK, IP, or I. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of SNLKEIPKNVAP (SEQ ID NO: 12), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, or 11 and 12 of SEQ ID NO: 12. In some embodiments, one or more amino acids of SEQ ID NO: 12 are absent from the Cas12i2 fusion protein. In other embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 12 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In other embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 12 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments, n is 246 and m is 247. In other embodiments, the first portion comprises at least 197, 200, 210, 220, 230, 240, 245, or 246 amino acids. In certain embodiments, the second portion comprises at least 646, 650, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 780, 790, 800, 805, 806, 807, or 808 amino acids. In yet another embodiment, the C-terminal amino acid(s) of the first portion comprise GQK, QK, or K. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise EFD, EF, or E. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of KDGQKEFDL (SEQ ID NO: 13), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, or 8 and 9 of SEQ ID NO: 13. In some embodiments, one or more amino acids of SEQ ID NO: 13 are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 13 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 13 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments of the Cas12i2 fusion protein described herein, a) n is 587 and m is 588, or b) n is 590 and m is 591. In other embodiments, the first portion comprises at least 470, 472, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 585, 587, or 590 amino acids. In certain embodiments, the second portion comprises at least 371, 374, 380, 390, 400, 410, 420, 430, 440, 450, 460, 464, or 467 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise: a) QKG, KG, or G; or b) TLQ, LQ, or Q. In some embodiments, the N-terminal amino acid(s) of the second portion comprise: a) TLQ, TL, or T; or b) IGD, IG, or I. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of GRQKGTLQIGDR (SEQ ID NO: 14), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, or 11 and 12 of SEQ ID NO: 14. In certain embodiments, one or more amino acids of SEQ ID NO: 14 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 14 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 14 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments of the Cas12i2 fusion protein described herein, a) n is 893 and m is 894, or b) n is 894 and m is 895. In other embodiments, the first portion comprises at least 715, 716, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 891, 892, 893, or 894 amino acids. In some embodiments, the second portion comprises at least 128, 129, 130, 140, 150, 160, or 161 amino acids. In certain embodiments, the C-terminal amino acid(s) of the first portion comprise: a) RNP, NP, or P; or b) NPD, PD, or D. In some embodiments, the N-terminal amino acid(s) of the second portion comprise: a) DKA, DK, or D; or b) KAM, KA, or K. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of CGSLYTSHQDPLVHRNPDKAMKCRW (SEQ ID NO: 15), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, 11 and 12, 12 and 13, 13 and 14, 14 and 15, 15 and 16, 16 and 17, 17 and 18, 18 and 19, 19 and 20, 20 and 21, 21 and 22, 22 and 23, 23 and 24, or 24 and 25 of SEQ ID NO: 15. In other embodiments, one or more amino acids of SEQ ID NO: 15 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 sequential amino acids of SEQ ID NO: 15 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In other embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 sequential amino acids of SEQ ID NO: 15 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments, n is 175 and m is 176. In certain embodiments, the heterologous sequence comprises a localization sequence, e.g., a nuclear localization sequence (NLS).
In certain embodiments, the heterologous sequence comprises an NLS, and n and m are each independently a number between:
In some embodiments, the first portion comprises at least 140, 145, 150, 155, 160, 165, 170, or 175 amino acids. In some embodiments, the second portion comprises at least 703, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 875, 876, 877, or 878 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise GTG, TG, or G. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise EKE, EK, or E. In some embodiments, the heterologous moiety is situated between any two adjacent amino acid residues of GTGEKED (SEQ ID NO: 16), e.g., between positions 1 and 2, 2 and 3, 3 and 4, or 4 and 5 of SEQ ID NO: 16. In some embodiments, one or more amino acids of SEQ ID NO: 16 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, or 5 amino acids of SEQ ID NO: 16 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, or 5 sequential amino acids of SEQ ID NO: 16 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments of the Cas12i2 fusion proteins described herein, a) n is 218 and m is 219; or b) n is 219 and m is 220. In certain embodiments, the first portion comprises at least 175, 176, 180, 190, 200, 210, 218, or 219 amino acids. In some embodiments, the second portion comprises at least 668, 669, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 835, or 836 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise: a) KAT, AT, or T; or b) ATK, TK, or K. In certain embodiments, N-terminal amino acid(s) of the second portion comprise: a) KET, KE, or K; or b) ETF, ET, or E. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of KATKET (SEQ ID NO: 17), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, or 6 and 7 of SEQ ID NO: 17. In certain embodiments, one or more amino acids of SEQ ID NO: 17 are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, or 7 sequential amino acids of SEQ ID NO: 17 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, or 7 sequential amino acids of SEQ ID NO: 17 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In certain embodiments, n is 266 and m is 267. In some embodiments, the first portion comprises at least 213, 220, 230, 240, 250, 260, 265, or 266 amino acids. In some embodiments, the second portion comprises at least 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, or 788 amino acids. In certain embodiments, the C-terminal amino acid(s) of the first portion comprise KSK, SK, or K. In other embodiments, the N-terminal amino acid(s) of the second portion comprise ERD, ER, or E. In certain embodiments, the heterologous moiety is situated between any two adjacent amino acids of SKERDWCC (SEQ ID NO: 18), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, or 7 and 8 of SEQ ID NO: 18. In other embodiments, one or more amino acids of SEQ ID NO: 18 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, or 8 sequential amino acids of SEQ ID NO: 18 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, or 8 sequential amino acids of SEQ ID NO: 18 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments of the Cas12i2 fusion protein described herein, a) n is 409 and m is 410, or b) n is 410 and m is 411. In certain embodiments, the first portion comprises at least 328, 330, 340, 350, 360, 370, 380, 390, 400, 405, 406, 407, 408, 409, or 410 amino acids. In some embodiments, the second portion comprises at least 516, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 641, 642, 643, 644, or 645 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise IRQE, RQ, Q, or E. In some embodiments, the N-terminal amino acid(s) of the second portion comprise ECS, EC, E, or C. In certain embodiments, the heterologous moiety is situated between any two adjacent amino acids of RQECSA (SEQ ID NO: 19), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, or 5 and 6 of SEQ ID NO: 19. In some embodiments, one or more amino acids of SEQ ID NO: 19 are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 19 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 19 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments, n is 462 and m is 463. In other embodiments, the first portion comprises at least 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 461, or 462 amino acids. In some embodiments, the second portion comprises at least 474, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 591, or 592 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise DRP, RP, or P. In some embodiments, the N-terminal amino acid(s) of the second portion comprise NSL, NS, or S. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of AQRNDRPNSLDLR (SEQ ID NO: 20), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, 11 and 12, or 12 and 13 of SEQ ID NO: 20. In other embodiments, one or more amino acids of SEQ ID NO: 20 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 sequential amino acids of SEQ ID NO: 20 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 sequential amino acids of SEQ ID NO: 20 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In certain embodiments, n is 478 and m is 479. In some embodiments, the first portion comprises at least 383, 390, 400, 410, 420, 430, 440, 450, 460, 470, 475, or 478 amino acids. In some embodiments, the second portion comprises at least 461, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, or 578 amino acids. In other embodiments, the C-terminal amino acid(s) of the first portion comprise RHP, HP, or P. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise DGR, DG, or D. In certain embodiments, the heterologous moiety is situated between any two adjacent amino acids of HPDGRW (SEQ ID NO: 21), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, or 5 and 6 of SEQ ID NO: 21. In other embodiments, one or more amino acids of SEQ ID NO: 21 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 21 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 21 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
Exemplary Cas12i2 Fusion Proteins Comprising a Heterologous Sequence Such as an NLS at Amino Acid Residues I498-T513
In some embodiments of any Cas12i2 fusion protein described herein, a) n is 504 and m is 505; or b) n is 505 and m is 506. In other embodiments, the first portion comprises at least 404, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 504, or 505 amino acids. In certain embodiments, the second portion comprises at least 439, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 549, or 550 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise: a) GNS, NS, or S; or b) NSP, SP, or P. In some embodiments, the N-terminal amino acid(s) of the second portion comprise: a) PVD, PV, or P; or b) VDT, VD, or V. In certain embodiments, the heterologous moiety is situated between any two adjacent amino acids of IYAAGNSPVDTCQFRT (SEQ ID NO: 22), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, 11 and 12, 12 and 13, 13 and 14, 14 and 15, or 15 and 16 of SEQ ID NO: 22. In some embodiments, one or more amino acids of SEQ ID NO: 22 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 sequential amino acids of SEQ ID NO: 22 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 sequential amino acids of SEQ ID NO: 22 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments, n is 614 and m is 615. In some embodiments, the first portion comprises at least 492, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, or 614 amino acids. In certain embodiments, the second portion comprises at least 352, 360, 370, 380, 390, 400, 410, 420, 430, or 440 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise EVV, VV, or V. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise KEG, KE, or K. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of VKEGQYHKELGC (SEQ ID NO: 23), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, or 11 and 12 of SEQ ID NO: 23. In certain embodiments, one or more amino acids of SEQ ID NO: 23 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 23 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 23 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments, n is 977 and m is 978. In some embodiments, the first portion comprises at least 782, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, or 977 amino acids. In some embodiments, the second portion comprises at least 352, 360, 370, 380, 390, 400, 410, 420, 430, or 440 amino acids. In certain embodiments, the C-terminal amino acid(s) of the first portion comprise KLG, LG, or G. In some embodiments, the N-terminal amino acid(s) of the second portion comprise NKE, NK, or N. In certain embodiments, the heterologous moiety is situated between any two adjacent amino acids of GNKEAV (SEQ ID NO: 24), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, or 5 and 6 of SEQ ID NO: 24. In some embodiments, one or more amino acids of SEQ ID NO: 24 are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 24 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 24 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments, n is 1007 and m is 1008. In certain embodiments, the first portion comprises at least 806, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000, or 1007 amino acids. In certain embodiments, the second portion comprises at least 38, 39, 40, 41, 42, 43, 44, 45, 46, or 47 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise SIV, IV, or V. In some embodiments, the N-terminal amino acid(s) of the second portion comprise FDW, FD, or F. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of VFDQKQ (SEQ ID NO: 25), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, or 5 and 6 of SEQ ID NO: 25. In some embodiments, one or more amino acids of SEQ ID NO: 25 are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 25 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 25 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
In some embodiments, the heterologous sequence comprises a fusion domain (e.g., a base editing domain, a ssDNA binding domain, an NLS, or a poly-basic domain). For instance, a Cas12i2 fusion protein of this disclosure may comprise a nuclear localization sequence (NLS) such as an SV40 (simian virus 40) NLS, c-Myc NLS, or other suitable monopartite NLS. The NLS may be fused to the N-terminus and/or C-terminus of the Cas12i2 polypeptide, and may be fused singly (i.e., a single NLS) or concatenated (e.g., a chain of 2, 3, 4, etc. NLS).
In some embodiments, at least one Nuclear Export Signal (NES) is attached to a nucleic acid sequence encoding the Cas12i2 fusion protein. In some embodiments, a C-terminal and/or N-terminal NLS or NES is attached for optimal expression and nuclear targeting in eukaryotic cells, e.g., human cells.
In certain embodiments, the heterologous sequence comprises at least one linker sequence. In some embodiments, the heterologous sequence comprises a first linker (e.g., a first peptide linker) and a second linker (e.g., a second peptide linker). In some embodiments, the first linker and the second linker each independently comprise between 3 and 60 amino acid residues (e.g., 5, 10, 15, 20, 25, 30, 35, 40, 50, 55, or 60, between 3-10, between 10-20, between 20-30, between 30-40, between 40-50, or between 50-60). In some embodiments, the first linker and the second linker each independently comprise one or more Gly residues and/or one or more Ser residues. In other embodiments, the first linker and the second linker each independently comprise (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In some embodiments, the first linker is N-terminal of the fusion domain and the second linker is C-terminal of the fusion domain. In certain embodiments, the first linker and the second linker are the same. In some embodiments, the first linker and the second linker are different.
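By way of illustration only, the arrangement of a first portion, a first linker, a fusion domain, a second linker, and a second portion can be sketched in a few lines of Python. The sketch assumes 1-based numbering in which the first portion is amino acids 1 to n of the parent sequence and the second portion runs from amino acid m to the C-terminus, so that any residues between n and m are absent. The helper names repeat_linker and build_fusion and the 10-residue toy parent are hypothetical and are not part of the disclosure; the toy parent is not a Cas12i2 sequence. PKKKRKV is the commonly cited SV40 large T antigen NLS.

```python
def repeat_linker(unit: str, x: int) -> str:
    """Return a repeat linker such as (GGGS)x; x = 0 yields an empty linker."""
    if not 0 <= x <= 15:
        raise ValueError("x is expected to be an integer between 0 and 15")
    return unit * x


def build_fusion(parent: str, n: int, m: int, fusion_domain: str,
                 first_linker: str = "", second_linker: str = "") -> str:
    """Assemble first portion + first linker + fusion domain + second linker + second portion.

    Assumption (not taken from the disclosure): positions are 1-based, the first
    portion is residues 1..n, and the second portion is residues m..end, so
    residues n+1..m-1 (if any) are absent from the result.
    """
    if not 1 <= n < m <= len(parent):
        raise ValueError("expected 1 <= n < m <= len(parent)")
    first_portion = parent[:n]        # residues 1..n
    second_portion = parent[m - 1:]   # residues m..end
    return first_portion + first_linker + fusion_domain + second_linker + second_portion


# Hypothetical toy parent sequence (NOT a Cas12i2 sequence), used only to show the layout.
toy_parent = "ACDEFGHIKL"
sv40_nls = "PKKKRKV"

fusion = build_fusion(
    toy_parent, n=4, m=5,
    fusion_domain=sv40_nls,
    first_linker=repeat_linker("GGGS", 1),   # N-terminal of the fusion domain
    second_linker=repeat_linker("GSG", 2),   # C-terminal of the fusion domain
)
print(fusion)
```

With m = n + 1 no parent residue is removed; choosing a larger m in this sketch drops the residues between the two portions, which mirrors the embodiments in which amino acids flanking the heterologous sequence are absent.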
In some embodiments, a Cas12i2 protein comprises a heterologous sequence (e.g., an insertion) within the Wed domain, the Rec1 domain, or the Nuc domain. In some embodiments, the insertion occurs at the interface of the Wed domain and the Rec1 domain.
In some embodiments, n is 430 and m is 431. In some embodiments, n is 431 and m is 432. In some embodiments, n is 432 and m is 433. In some embodiments, n is 433 and m is 434. In some embodiments, n is 434 and m is 435. In some embodiments, n is 435 and m is 436. In some embodiments, n is 436 and m is 437. In some embodiments, n is 437 and m is 438. In some embodiments, n is 438 and m is 439. In some embodiments, n is 440 and m is 441. In some embodiments, n is 441 and m is 442. In some embodiments, n is 442 and m is 443. In some embodiments, n is 443 and m is 444. In some embodiments, n is 444 and m is 445. In some embodiments, n is 445 and m is 446. In some embodiments, n is 446 and m is 447. In some embodiments, n is 447 and m is 448. In some embodiments, n is 448 and m is 449. In some embodiments, n is 449 and m is 450. In some embodiments, n is 920 and m is 921. In some embodiments, n is 921 and m is 922. In some embodiments, n is 922 and m is 923. In some embodiments, n is 923 and m is 924. In some embodiments, n is 924 and m is 925. In some embodiments, n is 925 and m is 926. In some embodiments, n is 926 and m is 927. In some embodiments, n is 927 and m is 928. In some embodiments, n is 928 and m is 929. In some embodiments, n is 929 and m is 930. In some embodiments, n is 930 and m is 931. In some embodiments, n is 931 and m is 932. In some embodiments, n is 932 and m is 933. In some embodiments, n is 933 and m is 934. In some embodiments, n is 934 and m is 935. In some embodiments, n is 935 and m is 936. In some embodiments, n is 936 and m is 937. In some embodiments, n is 937 and m is 938. In some embodiments, n is 938 and m is 939. In some embodiments, n is 939 and m is 940.
In some embodiments, the insertion is one residue to about 10 residues in length (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 residues). In some embodiments, the insertion comprises one or more of a glycine, serine, aspartate, or asparagine residue. In some embodiments, the insertion comprises a one-residue insertion (e.g., one glycine, one serine, one aspartate, or one asparagine). In some embodiments, the insertion comprises a two-residue insertion (e.g., two glycines, two serines, two aspartates, or two asparagines). In some embodiments, the insertion comprises a two-residue insertion comprising at least one glycine. In some embodiments, the insertion comprises a three-residue insertion (e.g., three glycines, three serines, three aspartates, or three asparagines). In some embodiments, the insertion comprises a three-residue insertion comprising at least one glycine. In some embodiments, the insertion comprises a four-residue insertion (e.g., four glycines, four serines, four aspartates, or four asparagines). In some embodiments, the insertion comprises a four-residue insertion comprising at least one glycine. In some embodiments, the insertion comprises a five-residue insertion (e.g., five glycines, five serines, five aspartates, or five asparagines). In some embodiments, the insertion comprises a five-residue insertion comprising at least one glycine.
In some embodiments, a Cas12i2 protein has a glycine-glycine insertion in the Wed domain or the Rec domain. In some embodiments, n is 440, m is 441, and the heterologous sequence is a glycine-glycine insertion. In some embodiments, n is 440, m is 441, and the heterologous sequence is a serine-serine insertion. In some embodiments, n is 440, m is 441, and the heterologous sequence is an aspartate-aspartate insertion. In some embodiments, n is 440, m is 441, and the heterologous sequence is an asparagine-asparagine insertion. In some embodiments, n is 440, m is 441, and the heterologous sequence is a glycine-serine insertion. In some embodiments, n is 440, m is 441, and the heterologous sequence is a glycine-aspartate insertion. In some embodiments, n is 440, m is 441, and the heterologous sequence is a glycine-asparagine insertion. In some embodiments, n is 440, m is 441, and the heterologous sequence is a serine-glycine insertion. In some embodiments, n is 440, m is 441, and the heterologous sequence is an aspartate-glycine insertion. In some embodiments, n is 440, m is 441, and the heterologous sequence is an asparagine-glycine insertion.
In some embodiments, a Cas12i2 protein has a glycine-glycine insertion in the Nuc domain. In some embodiments, n is 927, m is 928, and the heterologous sequence is a glycine-glycine insertion. In some embodiments, n is 927, m is 928, and the heterologous sequence is a serine-serine insertion. In some embodiments, n is 927, m is 928, and the heterologous sequence is an aspartate-aspartate insertion. In some embodiments, n is 927, m is 928, and the heterologous sequence is an asparagine-asparagine insertion. In some embodiments, n is 927, m is 928, and the heterologous sequence is a glycine-serine insertion. In some embodiments, n is 927, m is 928, and the heterologous sequence is a glycine-aspartate insertion. In some embodiments, n is 927, m is 928, and the heterologous sequence is a glycine-asparagine insertion. In some embodiments, n is 927, m is 928, and the heterologous sequence is a serine-glycine insertion. In some embodiments, n is 927, m is 928, and the heterologous sequence is an aspartate-glycine insertion. In some embodiments, n is 927, m is 928, and the heterologous sequence is an asparagine-glycine insertion.
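As a non-limiting sketch, placing a heterologous sequence between residue n and residue m can be expressed as a string operation. The function name `insert_between` and the toy sequence are hypothetical; the sketch assumes 1-based residue numbering and m = n + 1, as in the residue pairs listed above.

```python
# Illustrative sketch only; names and the toy sequence are hypothetical.
def insert_between(seq: str, n: int, m: int, insertion: str) -> str:
    """Insert `insertion` between residues n and m (1-based, m = n + 1)."""
    if m != n + 1:
        raise ValueError("this sketch assumes adjacent residues (m = n + 1)")
    if not 1 <= n < len(seq):
        raise ValueError("n must leave at least one residue after the insertion")
    # Residues 1..n occupy seq[:n]; residue m starts at index n.
    return seq[:n] + insertion + seq[n:]


toy = "ACDEFGHIK"                         # stand-in for a Cas12i2 sequence
variant = insert_between(toy, 4, 5, "GG")  # glycine-glycine insertion
```

For a full-length sequence, the same call with n = 440, m = 441 or n = 927, m = 928 would correspond to the glycine-glycine embodiments recited above.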
Fusion Proteins with Dimerization Domains
In another aspect, the disclosure provides a Cas12i2 fusion protein (see, e.g.,
In certain embodiments, the first heterologous sequence further comprises a fusion domain. In some embodiments, the fusion domain is disposed between the Cas12i2 domain and the dimerization domain. In some embodiments, the first heterologous sequence comprises (i) a first dimerization domain and (ii) a fusion domain, wherein the fusion domain is disposed between the first dimerization domain and the Cas12i2 domain. In some embodiments, the second heterologous sequence comprises a second, compatible dimerization domain. In some embodiments, the Cas12i2 domain is linked to the first heterologous sequence by a first linker (e.g., a first peptide linker). In some embodiments, the Cas12i2 domain is linked to the second heterologous sequence by a second linker (e.g., a second peptide linker). In certain embodiments, the fusion domain is linked to the first dimerization domain by a third linker (e.g., a third peptide linker). In other embodiments, the first linker, the second linker, or the third linker each independently comprise between 4 and 60 amino acid residues. In certain embodiments, the first linker, the second linker, or the third linker each independently comprise a combination of Gly residues and Ser residues. In some embodiments, the first linker, the second linker, or the third linker each independently comprise an amino acid sequence comprising (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15).
In another aspect, the disclosure features a Cas12i2 fusion protein (see, e.g.,
In some embodiments, the first portion of a split fusion domain is linked to the Cas12i2 domain by a first linker (e.g., a first peptide linker). In some embodiments, the second portion of a split fusion domain is linked to the Cas12i2 domain by a second linker (e.g., a second peptide linker). In certain embodiments, the first linker and the second linker each independently comprise between 4 and 60 amino acid residues. In certain embodiments, the first linker and the second linker each independently comprise a combination of Gly and Ser residues. In certain embodiments, the first linker and the second linker each independently comprise (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). Examples of split fusion domains that may be used are beta-lactamase, dihydrofolate reductase (DHFR), focal adhesion kinase (FAK), green fluorescent protein (GFP), enhanced GFP (EGFP), horseradish peroxidase, infrared fluorescent protein IFP1.4, LacZ, luciferase (e.g., recombinase enhanced bimolecular luciferase (ReBiL), Gaussia princeps luciferase, NanoLuc, and NanoBIT), Tobacco etch virus protease (TEV), and ubiquitin.
In another aspect, the disclosure provides an engineered, non-naturally occurring Cas12i2 protein comprising:
In certain embodiments, the first portion and the second portion are linked by a heterologous sequence. In some embodiments, the heterologous sequence comprises one or more of:
In some embodiments, the heterologous sequence comprises each of a first linker (e.g., a first peptide linker), a second linker (e.g., a second peptide linker), and a fusion domain, wherein the fusion domain is disposed between the first linker and the second linker. In certain embodiments, the first linker and the second linker, when present, comprise between 3 and 60 amino acid residues. In some embodiments, the first linker and the second linker each independently comprise the amino acid sequence (GSS)x, (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15).
In some embodiments, the C-terminal most amino acid of the first portion is any amino acid residue of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 chosen from residues:
In some embodiments, the N-terminal most amino acid of the second portion is any amino acid residue of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 chosen from residues:
In any of the embodiments described herein, the circularly permuted Cas12i2 protein further comprises a second heterologous sequence at its N-terminus. In some embodiments, the circularly permuted Cas12i2 protein further comprises an additional heterologous sequence at its C-terminus. In some embodiments, the second heterologous sequence and/or the additional heterologous sequence is chosen from a purification tag, a stability tag, or a restriction endonuclease or restriction endonuclease domain.
In some embodiments, a circularly permutated Cas12i2 protein comprises a) a first portion comprising an amino acid sequence of an N-terminal portion of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and b) a second portion comprising an amino acid sequence of a C-terminal portion of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, wherein the second portion is N-terminal of the first portion, and wherein the C-terminal most amino acid of the first portion is an amino acid residue of a flexible loop within the Helical II, Helical III, Nuc, or RuvC II domain. In some embodiments, the flexible loop is in proximity to or in contact with target DNA, such as a loop depicted in
In some embodiments, a circularly permutated Cas12i2 protein comprises a) a first portion comprising an amino acid sequence of an N-terminal portion of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and b) a second portion comprising an amino acid sequence of a C-terminal portion of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, wherein the second portion is N-terminal of the first portion, and wherein the N-terminal most amino acid of the second portion is an amino acid residue of a flexible loop within the Helical II, Helical III, Nuc, or RuvC II domain. In some embodiments, the flexible loop is in proximity to or in contact with target DNA, such as a loop depicted in
In some embodiments, a circularly permutated Cas12i2 protein comprises a) a first portion comprising an amino acid sequence of an N-terminal portion of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and b) a second portion comprising an amino acid sequence of a C-terminal portion of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, wherein the second portion is N-terminal of the first portion, and wherein the C-terminal most amino acid of the first portion is any amino acid residue of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 chosen from residues a) 342-358 (e.g., residue 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358); b) 373-378 (e.g., residue 373, 374, 375, 376, 377, or 378); c) 386-397 (e.g., residue 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397); d) 677-685 (e.g., residue 677, 678, 679, 680, 681, 682, 683, 684, or 685); e) 771-782 (e.g., residue 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782); f) 831-844 (e.g., residue 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, or 844); g) 953-965 (e.g., residue 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965); h) 55-65 (e.g., residue 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, or 65); i) 99-105 (e.g., residue 99, 100, 101, 102, 103, 104, or 105); j) 112-120 (e.g., residue 112, 113, 114, 115, 116, 117, 118, 119, or 120); k) 195-206 (e.g., residue 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, or 206); l) 241-250 (e.g., residue 241, 242, 243, 244, 245, 246, 247, 248, 249, or 250); or n) 877-901 (e.g., residue 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, or 901). The positions of the residues are indicated in
In some embodiments, a circularly permutated Cas12i2 protein comprises a) a first portion comprising an amino acid sequence of an N-terminal portion of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and b) a second portion comprising an amino acid sequence of a C-terminal portion of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, wherein the second portion is N-terminal of the first portion, and wherein the N-terminal most amino acid of the second portion is any amino acid residue of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 chosen from residues a) 342-358 (e.g., residue 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358); b) 373-378 (e.g., residue 373, 374, 375, 376, 377, or 378); c) 386-397 (e.g., residue 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397); d) 677-685 (e.g., residue 677, 678, 679, 680, 681, 682, 683, 684, or 685); e) 771-782 (e.g., residue 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782); f) 831-844 (e.g., residue 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, or 844), or g) 953-965 (e.g., residue 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965). The positions of the residues are indicated in
In some embodiments, a circularly permutated Cas12i2 protein comprises a) a first portion comprising an amino acid sequence of an N-terminal portion of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and b) a second portion comprising an amino acid sequence of a C-terminal portion of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, wherein the second portion is N-terminal of the first portion, and wherein the C-terminal most amino acid of the first portion is any amino acid residue of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 chosen from residues c) 408-413 (e.g., residue 408, 409, 410, 411, 412, or 413); d) 677-685 (e.g., residue 677, 678, 679, 680, 681, 682, 683, 684, or 685); h) 55-65 (e.g., residue 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, or 65); i) 99-105 (e.g., residue 99, 100, 101, 102, 103, 104, or 105); j) 112-120 (e.g., residue 112, 113, 114, 115, 116, 117, 118, 119, or 120); k) 195-206 (e.g., residue 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, or 206); l) 241-250 (e.g., residue 241, 242, 243, 244, 245, 246, 247, 248, 249, or 250); or n) 877-901 (e.g., residue 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, or 901).
In some embodiments, a circularly permutated Cas12i2 protein comprises a) a first portion comprising an amino acid sequence of an N-terminal portion of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and b) a second portion comprising an amino acid sequence of a C-terminal portion of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, wherein the second portion is N-terminal of the first portion, and wherein the N-terminal most amino acid of the second portion is any amino acid residue of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 chosen from residues c) 408-413 (e.g., residue 408, 409, 410, 411, 412, or 413); d) 677-685 (e.g., residue 677, 678, 679, 680, 681, 682, 683, 684, or 685); h) 55-65 (e.g., residue 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, or 65); i) 99-105 (e.g., residue 99, 100, 101, 102, 103, 104, or 105); j) 112-120 (e.g., residue 112, 113, 114, 115, 116, 117, 118, 119, or 120); k) 195-206 (e.g., residue 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, or 206); l) 241-250 (e.g., residue 241, 242, 243, 244, 245, 246, 247, 248, 249, or 250); or n) 877-901 (e.g., residue 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, or 901).
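For illustration only, the residue ranges recited in the preceding paragraphs can be collected into a simple lookup that reports whether a candidate breakpoint residue falls within one of them. The names `LISTED_RANGES` and `in_listed_range` are hypothetical; the sketch pools the ranges from all of the preceding embodiments and assumes numbering relative to SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43.

```python
# Illustrative sketch only; names are hypothetical.
# Inclusive residue ranges pooled from the embodiments recited above.
LISTED_RANGES = [
    (55, 65), (99, 105), (112, 120), (195, 206), (241, 250),
    (342, 358), (373, 378), (386, 397), (408, 413), (677, 685),
    (771, 782), (831, 844), (877, 901), (953, 965),
]


def in_listed_range(residue: int) -> bool:
    """Return True if the 1-based residue number lies in any listed range."""
    return any(lo <= residue <= hi for lo, hi in LISTED_RANGES)


candidates = [61, 410, 500]
flags = [in_listed_range(r) for r in candidates]
```

Individual embodiments recite different subsets of these ranges, so a stricter check would use only the subset recited for the embodiment of interest.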
In some embodiments, the N-terminus of a circularly permutated Cas12i2 protein comprises at least one fusion domain. In some embodiments, the fusion domain comprises an NLS. In some embodiments, the circularly permuted Cas12i2 protein comprises an NLS at its N-terminus and/or C-terminus. In some embodiments, the circularly permuted Cas12i2 protein comprises an NLS at its N-terminus. In some embodiments, the circularly permuted Cas12i2 protein comprises an NLS at its C-terminus. In some embodiments, the NLS comprises an amino acid sequence of any one of SEQ ID NOs: 61-65, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, the fusion domain is a FokI nuclease domain. See, e.g., Ramirez et al., Nucleic Acids Res. 40(12): 5560-8 (2012) and Guilinger et al., Nature Biotechnology 32: 577-82 (2014). In some embodiments, the FokI nuclease domain is a catalytically active FokI nuclease domain. In some embodiments, the FokI nuclease domain is a dead (e.g., a catalytically inactive) FokI nuclease domain. In some embodiments, the circularly permuted Cas12i2 protein comprises a FokI nuclease domain at its N-terminus (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain). In some embodiments, the circularly permuted Cas12i2 protein comprises a FokI nuclease domain at its C-terminus (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain). In some embodiments, the circularly permuted Cas12i2 protein comprises a FokI nuclease domain at its N-terminus and at its C-terminus. In some embodiments, the circularly permuted Cas12i2 protein comprises a catalytically active FokI nuclease domain at its N-terminus and a catalytically active FokI nuclease domain at its C-terminus.
In some embodiments, the circularly permuted Cas12i2 protein comprises a catalytically active FokI nuclease domain at its N-terminus and a catalytically inactive FokI nuclease domain at its C-terminus. In some embodiments, the circularly permuted Cas12i2 protein comprises a catalytically inactive FokI nuclease domain at its N-terminus and a catalytically active FokI nuclease domain at its C-terminus. In some embodiments, the circularly permuted Cas12i2 protein comprises a catalytically inactive FokI nuclease domain at its N-terminus and a catalytically inactive FokI nuclease domain at its C-terminus. In some embodiments wherein a circularly permuted Cas12i2 protein comprises a FokI nuclease domain at its N-terminus and at its C-terminus, the FokI nuclease domains form a dimer (e.g., a homodimer or a heterodimer). See, e.g.,
In some embodiments, the FokI nuclease domain further comprises an additional fusion domain. In some embodiments, the FokI nuclease domain is a catalytically active FokI nuclease domain, and the additional fusion domain is a protein or a peptide. In some embodiments, the FokI nuclease domain is a catalytically inactive FokI nuclease domain and the additional fusion domain is a protein or a peptide. In some embodiments, the protein is a polymerase.
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 55-65 (e.g., residue 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, or 65). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 61 corresponding to SEQ ID NO: 40, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 60 corresponding to SEQ ID NO: 40. In some embodiments, the circularly permuted Cas12i2 protein comprises an amino acid sequence of SEQ ID NO: 47, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain comprises an NLS.
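The relationship between residue "x" and residue "y" (y = x−1) can be sketched, purely as an illustration, as a rotation of the parent sequence. The function name `circularly_permute` and the toy sequence are hypothetical; the sketch assumes 1-based numbering and an optional linker joining the original C-terminus to the original N-terminus, and it does not reproduce any sequence of the Sequence Listing.

```python
# Illustrative sketch only; names and the toy sequence are hypothetical.
def circularly_permute(seq: str, x: int, linker: str = "") -> str:
    """Make original residue x the new N-terminus and x-1 the new C-terminus.

    The original C-terminal portion (residues x..end) is placed N-terminal
    of the original N-terminal portion (residues 1..x-1), optionally joined
    by a linker.
    """
    if not 2 <= x <= len(seq):
        raise ValueError("x must be between 2 and the sequence length")
    c_terminal_portion = seq[x - 1:]   # residues x..end of the parent
    n_terminal_portion = seq[:x - 1]   # residues 1..x-1 of the parent
    return c_terminal_portion + linker + n_terminal_portion


toy = "ABCDEFGH"                               # stand-in for a parent sequence
permuted = circularly_permute(toy, 4)          # x = 4, y = 3
linked = circularly_permute(toy, 4, "GSG")     # same breakpoint with a linker
```

In this toy case the permuted sequence begins with original residue 4 and ends with original residue 3, mirroring the x = 61, y = 60 example given above.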
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 99-105 (e.g., residue 99, 100, 101, 102, 103, 104, or 105). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 102 corresponding to SEQ ID NO: 40, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 101 corresponding to SEQ ID NO: 40. In some embodiments, the circularly permuted Cas12i2 protein comprises an amino acid sequence of SEQ ID NO: 48, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain comprises an NLS.
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 112-120 (e.g., residue 112, 113, 114, 115, 116, 117, 118, 119, or 120). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 117 corresponding to SEQ ID NO: 40, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 116 corresponding to SEQ ID NO: 40. In some embodiments, the circularly permuted Cas12i2 protein comprises an amino acid sequence of SEQ ID NO: 49, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain comprises an NLS.
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 195-206 (e.g., residue 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, or 206). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 200 corresponding to SEQ ID NO: 40, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 199 corresponding to SEQ ID NO: 40. In some embodiments, the circularly permuted Cas12i2 protein comprises an amino acid sequence of SEQ ID NO: 50, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain comprises an NLS.
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 241-250 (e.g., residue 241, 242, 243, 244, 245, 246, 247, 248, 249, or 250). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 247 corresponding to SEQ ID NO: 40, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 246 corresponding to SEQ ID NO: 40. In some embodiments, the circularly permuted Cas12i2 protein comprises an amino acid sequence of SEQ ID NO: 51, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain comprises an NLS.
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 342-358 (e.g., residue 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 343 corresponding to SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 342. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain is a FokI nuclease domain (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain).
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 373-378 (e.g., residue 373, 374, 375, 376, 377, or 378). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 374 corresponding to SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 373. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain is a FokI nuclease domain (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain).
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 386-397 (e.g., residue 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 387 corresponding to SEQ ID NO: 1, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 386. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain is a FokI nuclease domain (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain).
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 408-413 (e.g., residue 408, 409, 410, 411, 412, or 413). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 410 corresponding to SEQ ID NO: 40, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 409 corresponding to SEQ ID NO: 40. In some embodiments, the circularly permuted Cas12i2 protein comprises an amino acid sequence of SEQ ID NO: 45, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain comprises an NLS.
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 677-685 (e.g., residue 677, 678, 679, 680, 681, 682, 683, 684, or 685). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 678 corresponding to SEQ ID NO: 1, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 677. In some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 681 corresponding to SEQ ID NO: 40, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 680 corresponding to SEQ ID NO: 40. In some embodiments, the circularly permuted Cas12i2 protein comprises an amino acid sequence of SEQ ID NO: 46, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain comprises an NLS. In some embodiments, the fusion domain is a FokI nuclease domain (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain).
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 771-782 (e.g., residue 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 772 corresponding to SEQ ID NO: 1, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 771. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain is a FokI nuclease domain (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain).
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 831-844 (e.g., residue 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, or 844). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 832 corresponding to SEQ ID NO: 1, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 831. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain is a FokI nuclease domain (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain).
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 877-901 (e.g., residue 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, or 901). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 893 corresponding to SEQ ID NO: 40, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 892 corresponding to SEQ ID NO: 40. In some embodiments, the circularly permuted Cas12i2 protein comprises an amino acid sequence of SEQ ID NO: 52, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain comprises an NLS.
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue “x” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein x is chosen from residues 953-965 (e.g., residue 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue “y” of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), wherein y is x−1. For example, in some embodiments, the N-terminal residue of a circularly permutated Cas12i2 protein comprises residue 954 corresponding to SEQ ID NO: 1, and the C-terminal residue of a circularly permutated Cas12i2 protein comprises residue 953. In some embodiments, residue “x” and/or residue “y” is linked to a fusion domain. In some embodiments, the fusion domain is a FokI nuclease domain (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain).
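The relationship between residue "x" (the new N-terminus) and residue "y" = x−1 (the new C-terminus) in the preceding paragraphs can be illustrated with a minimal Python sketch. The function name `circularly_permute` and the toy sequence are hypothetical. The sketch assumes the original C- and N-termini are joined directly, with no linker, added Met, or truncation, which may not hold for any particular construct described herein.

```python
def circularly_permute(seq: str, x: int) -> str:
    """Rearrange seq so that original residue x (1-based) becomes the new
    N-terminal residue and original residue x-1 becomes the new C-terminal
    residue. Assumption: the original termini are joined without a linker."""
    if not 2 <= x <= len(seq):
        raise ValueError("x must be between 2 and len(seq)")
    # Residues x..end come first, followed by residues 1..x-1.
    return seq[x - 1:] + seq[:x - 1]


toy = "ABCDEFGHIJ"  # hypothetical 10-residue placeholder, not a Cas12i2 sequence
permuted = circularly_permute(toy, 4)
print(permuted)  # DEFGHIJABC
```

In this toy example, x is 4, so the permuted sequence begins with original residue 4 ("D") and ends with original residue 3 ("C").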
In some embodiments, a circularly permuted Cas12i2 protein is truncated relative to a Cas12i2 protein of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43. In some embodiments, a circularly permuted Cas12i2 protein has a modified Helical II domain relative to the Cas12i2 protein of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43. For example, in some embodiments, the circularly permuted Cas12i2 protein comprises substitutions or deletions in the Helical II domain relative to the sequence of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43. In some embodiments, a circularly permuted Cas12i2 protein comprises a truncated Helical II domain. For example, in some embodiments, the circularly permuted Cas12i2 protein does not comprise one or more flexible loops or alpha helices of the Helical II domain. For example, in some embodiments, the circularly permuted Cas12i2 protein does not comprise the loop of residues 342-358 (or 343-357), the loop of residues 386-397 (or 387-396), or the alpha helices of residues 359-385 (or 358-386).
In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein comprises an amino acid residue of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto) chosen from residues 386-397 (e.g., residue 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397) or residues 677-685 (e.g., residue 677, 678, 679, 680, 681, 682, 683, 684, or 685). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto) chosen from residues 342-358 (e.g., residue 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358). In some embodiments, the C-terminal residue of the circularly permuted Cas12i2 protein comprises an amino acid residue of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 (or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto) chosen from residues 330-342 (e.g., residue 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, or 342). In some embodiments, the N-terminal residue and/or C-terminal residue further comprises a fusion domain. In some embodiments, the fusion domain is a FokI nuclease domain (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain). In some embodiments, the fusion domain comprises an NLS.
In certain embodiments, the circularly permuted Cas12i2 protein comprises an additional heterologous sequence disposed between a first amino acid residue “n” and a second amino acid residue “m” of the circularly permuted Cas12i2 protein, wherein n and m are each independently an amino acid residue of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43. In some embodiments, n and m are each independently a number between:
In some embodiments, n<m. In some embodiments, m=n+1.
In certain embodiments, the N-terminal Met residue of any of SEQ ID NO: 1 or any one of SEQ ID NOs: 39-43 is absent. In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein is a Met residue. In some embodiments, the Met residue is added to the N-terminus of any one of the circularly permuted Cas12i2 proteins described herein.
In some embodiments, the circularly permuted Cas12i2 protein is capable of binding an RNA guide comprising a direct repeat sequence and a spacer sequence, wherein the spacer sequence is capable of hybridizing to a target nucleic acid.
In any of the aspects described herein, the circularly permuted Cas12i2 protein comprises a catalytic residue (e.g., D599, E833, and D1019). In certain embodiments, the circularly permuted Cas12i2 protein comprises a mutation (e.g., an alanine mutation) at any one of amino acid residue D599, E833, or D1019 of SEQ ID NO: 1. In certain embodiments, the circularly permuted Cas12i2 protein is a dead Cas12i2 protein (e.g., a catalytically inactive Cas12i2 protein).
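Because the catalytic residues are numbered relative to SEQ ID NO: 1, it can be useful to locate them in a circularly permuted sequence. The following Python sketch shows one way to do so. The function name `permuted_position` is hypothetical. The sketch assumes a reference length of 1054 residues (inferred from the "amino acids m-1054 of SEQ ID NO: 1" language used herein) and assumes no linker, added Met, or truncation in the permuted protein.

```python
def permuted_position(residue: int, x: int, length: int) -> int:
    """Return the 1-based position, in the permuted protein, of the residue
    numbered `residue` in the reference sequence, when reference residue x
    is the new N-terminus. Assumption: no linker, added Met, or truncation."""
    if not (1 <= residue <= length and 1 <= x <= length):
        raise ValueError("residue and x must lie within the reference sequence")
    return (residue - x) % length + 1


REFERENCE_LENGTH = 1054  # assumed length of SEQ ID NO: 1
NEW_N_TERMINUS = 678     # example value of x from the 677-685 range

for name, number in [("D599", 599), ("E833", 833), ("D1019", 1019)]:
    print(name, permuted_position(number, NEW_N_TERMINUS, REFERENCE_LENGTH))
```

Under these assumptions, D599, E833, and D1019 fall at positions 976, 156, and 342 of the permuted protein, respectively, and reference residue 677 is the final residue (position 1054).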
In some embodiments, a circularly permuted Cas12i2 protein described herein comprises nickase activity. In some embodiments, a circularly permuted Cas12i2 protein described herein nicks the target strand of a target nucleic acid. In some embodiments, a circularly permuted Cas12i2 protein described herein nicks the non-target strand of a target nucleic acid. In some embodiments, a circularly permuted Cas12i2 protein described herein nicks a target sequence adjacent to a Cas12i2 PAM sequence (e.g., a 5′-NTTN-3′ sequence). See, e.g.,
In some embodiments, the heterologous sequence comprises a peptide tag, a fluorescent protein, a base-editing domain, a DNA methylation domain, a histone residue modification domain, a localization factor (e.g., an NLS), a transcription modification factor, a light-gated control factor, a chemically inducible factor, a chromatin visualization factor, or a restriction endonuclease. In some embodiments, the heterologous sequence is about 5-10, 10-20, 20-30, 30-40, 40-50, 50-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1100, 1100-1400, 1400-1600, 1600-1800, or 1800-2000 amino acids in length.
This application relates to Cas12i2 fusion proteins comprising heterologous sequences as described herein. In some instances, the heterologous sequence comprises a fusion domain (e.g., a base editing domain, a ssDNA binding domain, an NLS domain, a poly-basic domain, or a nuclease domain). In some embodiments, the fusion domain can have various activities, e.g., methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity, DNA cleavage activity, nucleic acid binding activity, ligase activity (e.g., an EC 6.1, 6.2, 6.3, 6.4, 6.5, or 6.6 ligase), transcriptase activity, reverse transcriptase activity, and switch activity (e.g., light inducible). In some embodiments, the fusion domain is chosen from a peptide tag, a fluorescent protein, a base-editing domain, a DNA methylation domain, a histone residue modification domain, a localization factor (e.g., an NLS), a transcription modification factor, a ligase, a light-gated control factor, a chemically inducible factor, or a chromatin visualization factor. In some embodiments, the fusion domains are chosen from Krüppel associated box (KRAB), VP64, VP16, FokI, P65, HSF1, MyoD1, Geminin, Streptavidin, an asialoglycoprotein receptor ligand, and biotin-APEX, or biologically active portions thereof. In some embodiments, the fusion domain is selected from a restriction endonuclease, a CRISPR nuclease, or a domain thereof. The restriction endonuclease can be any restriction endonuclease known in the art (see, e.g., https://www.neb.com/tools-and-resources/selection-charts/alphabetized-list-of-recognition-specificities). In some embodiments, the restriction endonuclease is FokI or the nuclease domain thereof. The CRISPR nuclease can be any CRISPR nuclease known in the art, e.g., a class I or class II enzyme.
The CRISPR nuclease can be a type I, type II, type III, type IV, type V, or type VI CRISPR nuclease. In some embodiments, the CRISPR nuclease is any CRISPR nuclease having a RuvC domain or split RuvC domain such that a Cas12i2 fusion protein comprises two or more RuvC domains or two or more split RuvC domains. The CRISPR nuclease can be a Cas9, Cas12, or Cas13 ortholog. The CRISPR nuclease can be a Cpf1 (Cas12a), C2c1 (Cas12b), Cas12c, Cas12d, Cas12e, Cas12g, Cas12h, Cas12i (e.g., Cas12i1 or Cas12i2), or Cas12j (also known as CasPhi). In some embodiments, the fusion domain is a splint ligase. In some embodiments, the fusion domains are chosen from a protein comprising a DNA binding domain (e.g., a helix-turn-helix motif (Aravind et al., FEMS Microbiology 29(2): 231-262, 2005), a zinc finger domain, a leucine zipper domain, a winged helix domain, a winged helix-turn-helix domain, a basic helix-loop-helix domain, an HMG-Box domain, a Wor3 domain, an OB-fold domain (Flynn and Zou Crit. Rev. Biochem. Mol. Biol. 45(4): 266-275, 2010), an immunoglobulin fold domain, a B3 domain, or a Tal effector domain), or a biologically active portion thereof. In some embodiments, the fusion domain comprises a multimerized fusion domain comprising two or more copies of any fusion domain described herein, optionally linked by a linker. The positioning of the one or more functional domains on the inactivated CRISPR nuclease is one that allows for correct spatial orientation for the fusion domain to affect the target with the attributed functional effect.
In some embodiments, Cas12i2 fusion proteins described herein comprise a fusion domain comprising a base editor that enables the Cas12i2 fusion proteins to edit a single nucleic acid base. In some instances, the fusion domain comprises a polypeptide that is capable of making a modification to a base (e.g., A, T, C, G, or U) within a nucleic acid sequence (e.g., DNA or RNA). In some instances, the base editing domain is capable of deaminating a base within a nucleic acid. In some instances, the base editing domain is capable of deaminating a base within a DNA molecule. In some instances, the base editing domain is capable of deaminating cytosine (C) in DNA. In some embodiments, the base editing domain is capable of deaminating a thymine (T) in DNA.
In some embodiments, the fusion domain is capable of methylating a base within a nucleic acid. In some instances, the fusion domain is capable of methylating cytosine (C) in DNA. In some embodiments, the fusion domain is capable of methylating adenine (A) in DNA. In some embodiments, the fusion domain is capable of methylating uracil (U) in RNA.
In some embodiments, the fusion domain is capable of demethylating a base within a nucleic acid. In some embodiments, the fusion domain is capable of demethylating a thymine (T) in DNA. In some embodiments, the fusion domain is capable of demethylating guanine (G) in DNA.
Examples of fusion domains include methylases (e.g., an M6a (EC 2.1.1.72), M4c (EC 2.1.1.113), or M5c (EC 2.1.1.37) methylase), RNA methyltransferases (NSUN1, NSUN2, NSUN3, NSUN4, NSUN5, NSUN6, NSUN7, TRDMT1 (previously DNMT2)), and DNA methyltransferases (DNMT1, DNMT3 (3a, 3b, 3c, 3L)).
In some embodiments, a Cas12i2 fusion protein comprises a nuclear localization sequence (also known as a nuclear localization signal) that promotes translocation through the nuclear envelope via nuclear pore complexes. The nuclear pore complex is composed of nucleoporins. Nucleoporins interact with transport molecules known as karyopherins. Karyopherins bind to proteins containing a nuclear localization sequence and transport the protein across the nuclear pore complex. In some embodiments, a nuclear localization sequence consists of one or more short (e.g., <50 amino-acid residues) sequences of basic amino acids. In some embodiments, a nuclear localization sequence consists of one or more short (e.g., <50 amino-acid residues) sequences of lysines or arginines. In some embodiments, the nuclear localization sequence is monopartite or bipartite. In some embodiments, the nuclear localization sequence is a nucleoplasmin NLS (npNLS).
In some embodiments, the NLS comprises: KRPAATKKAGQAKKKK (SEQ ID NO: 61), MKRTADGSEFESPKKKRKV (SEQ ID NO: 62), MKRTADGSEFESPKKKRKVE (SEQ ID NO: 63), KRTADGSEFESPKKKRKV (SEQ ID NO: 64), or KRTADGSEFESPKKKRKVE (SEQ ID NO: 65). In some embodiments, the NLS comprises an amino acid sequence of any one of SEQ ID NOs: 61-65, or an amino acid sequence having at least 80% (e.g., 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) identity thereto. In some embodiments, a linker (e.g., a polypeptide linker) is disposed between the Cas12i2 domain and the NLS. In some embodiments, the polypeptide linker comprises a glycine and/or serine residue (e.g., a GS linker). For example, in some embodiments, the Cas12i2 fusion proteins of SEQ ID NO: 68 and SEQ ID NO: 73 comprise the NLS of SEQ ID NO: 65, and the Cas12i2 fusion proteins of SEQ ID NO: 69 and SEQ ID NO: 74 comprise the NLS of SEQ ID NO: 64. In some embodiments, a Cas12i2 fusion protein comprises at least 80% (81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) identity to SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 73, or SEQ ID NO: 74.
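By way of illustration only, the arrangement of a Cas12i2 domain, a polypeptide linker, and an NLS can be sketched in Python as shown below. The NLS strings are those recited above for SEQ ID NOs: 61-65. The function name `attach_nls`, the two-residue "GS" linker, and the placeholder domain are hypothetical and are not taken from the sequences of SEQ ID NOs: 68, 69, 73, or 74.

```python
# NLS sequences as recited in the text (SEQ ID NOs: 61-65).
NLS = {
    61: "KRPAATKKAGQAKKKK",
    62: "MKRTADGSEFESPKKKRKV",
    63: "MKRTADGSEFESPKKKRKVE",
    64: "KRTADGSEFESPKKKRKV",
    65: "KRTADGSEFESPKKKRKVE",
}


def attach_nls(domain: str, nls: str, terminus: str = "C", linker: str = "GS") -> str:
    """Join an NLS to the N- or C-terminus of a domain through a linker.
    The default "GS" linker is an assumption chosen for illustration."""
    if terminus == "N":
        return nls + linker + domain
    if terminus == "C":
        return domain + linker + nls
    raise ValueError("terminus must be 'N' or 'C'")


toy_domain = "AAAA"  # hypothetical placeholder, not a Cas12i2 sequence
print(attach_nls(toy_domain, NLS[65]))                # AAAAGSKRTADGSEFESPKKKRKVE
print(attach_nls(toy_domain, NLS[64], terminus="N"))  # KRTADGSEFESPKKKRKVGSAAAA
```

As recited, SEQ ID NOs: 62 and 63 differ from SEQ ID NOs: 64 and 65, respectively, only by an N-terminal Met residue.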
In some embodiments, the nuclear localization sequence is disposed in the middle of the Cas12i2 fusion protein and is exposed on the fusion protein surface. In some embodiments, a nuclear localization sequence is recognized by a karyopherin. In some embodiments, the nuclear localization sequence interacts with one or more karyopherins. In some embodiments, the karyopherin recognizes a nuclear localization sequence as it emerges from a ribosome. In some embodiments, the karyopherin recognizes a nuclear localization sequence on a fully translated protein.
In some embodiments, the nuclear localization sequence is defined as the nuclear localization sequence from the proteins listed in Table 6 of US 2015-0246139, which is incorporated by reference herein.
In some embodiments, the nuclear localization sequence is included in a heterologous sequence. In certain embodiments, the heterologous sequence comprising an NLS is located between a first portion comprising amino acids 1-n of SEQ ID NO: 1, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, and a second portion comprising amino acids m-1054 of SEQ ID NO: 1, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, wherein n and m are each independently a number between:
In some embodiments, the heterologous sequence comprises an NLS. In certain embodiments, the heterologous sequence comprising an NLS is located at the N-terminus and/or C-terminus of a circularly permuted Cas12i2 protein. In certain embodiments, the heterologous sequence comprising an NLS is located at the N-terminus of a circularly permuted Cas12i2 protein. In certain embodiments, the heterologous sequence comprising an NLS is located at the C-terminus of a circularly permuted Cas12i2 protein.
In some embodiments, the Cas12i2 fusion protein comprises a split fusion domain. Typically, a split fusion domain is a domain wherein a reference protein is split into two parts, which together substantially comprise a functioning fusion domain. A split can be done in any way such that the function of the fusion domain(s) is unaffected. In some embodiments, the split is substantially proportional (e.g., a first split fusion portion and a second split fusion portion are substantially equal in amino acid length). In some embodiments, one portion of the split fusion domain has a greater number of amino acid residues than a second portion of the split fusion domain. In some embodiments, a split fusion domain is chosen from beta-lactamase, dihydrofolate reductase (DHFR), focal adhesion kinase (FAK), green fluorescent protein (GFP), enhanced GFP (EGFP), horseradish peroxidase, infrared fluorescent protein IFP1.4, LacZ, luciferase (e.g., recombinase enhanced bimolecular luciferase (ReBiL), Gaussia princeps luciferase, NanoLuc, and NanoBIT), Tobacco etch virus protease (TEV), and ubiquitin.
In some embodiments, the Cas12i2 fusion protein comprises a dimerization domain. Typically, a dimerization domain is a polypeptide domain capable of specifically binding a separate, and compatible, polypeptide domain (e.g., a second compatible dimerization domain). In some embodiments, the dimer is formed by a non-covalent bond between the first dimerization domain and the second compatible dimerization domain. In some embodiments, the first dimerization domain and the second compatible dimerization domain are identical (e.g., a homodimer). In some embodiments, the first dimerization domain and the second dimerization domain are not identical (e.g., a heterodimer). In some embodiments, a dimerization domain is a leucine zipper. In some instances, the dimerization domain is a chemically inducible dimerization domain (e.g., a rapamycin sensitive dimerization domain) that can be regulated by the presence of a small molecule. In some embodiments, the dimerization domain is a light inducible dimerization domain (e.g., a far-red light inducible dimerization domain) that can be regulated by light exposure.
In some embodiments, the Cas12i2 fusion protein of the present invention includes a Cas12i2 domain described herein.
A nucleic acid sequence encoding a Cas12i2 domain described herein may be substantially identical to a reference nucleic acid sequence if the nucleic acid encoding the Cas12i2 domain comprises a sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity to the reference nucleic acid sequence. The percent identity between two such nucleic acids can be determined manually by inspection of the two optimally aligned nucleic acid sequences or by using software programs or algorithms (e.g., BLAST, ALIGN, CLUSTAL) using standard parameters. One indication that two nucleic acid sequences are substantially identical is that the two nucleic acid molecules hybridize to each other under stringent conditions (e.g., within a range of medium to high stringency).
In some embodiments, a Cas12i2 domain described herein is encoded by a nucleic acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity to a reference nucleic acid sequence.
A nuclease described herein may be substantially identical to a reference polypeptide if the nuclease comprises an amino acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity to the amino acid sequence of the reference polypeptide. The percent identity between two such polypeptides can be determined manually by inspection of the two optimally aligned polypeptide sequences or by using software programs or algorithms (e.g., BLAST, ALIGN, CLUSTAL) using standard parameters. One indication that two polypeptides are substantially identical is that the first polypeptide is immunologically cross-reactive with the second polypeptide. Typically, polypeptides that differ by conservative amino acid substitutions are immunologically cross-reactive. Thus, a polypeptide is substantially identical to a second polypeptide, for example, where the two polypeptides differ only by one or more conservative amino acid substitutions.
In some embodiments, a Cas12i2 domain of the present invention comprises a polypeptide sequence having 50, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to any one of SEQ ID NOs: 1 and 39-43. In some embodiments, a Cas12i2 domain of the present invention comprises a polypeptide sequence having greater than 50, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to any one of SEQ ID NO: 1 and SEQ ID NOs: 39-43.
In some embodiments, a nuclease of the present invention is a Cas12i2 domain having a specified degree of amino acid sequence identity to one or more reference polypeptides, e.g., at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or even at least 99% sequence identity to the amino acid sequence of any one of SEQ ID NO: 1 and SEQ ID NOs: 39-43. Homology or identity can be determined by amino acid sequence alignment, e.g., using a program such as BLAST, ALIGN, or CLUSTAL, as described herein. In some embodiments, a Cas12i2 domain having a specified degree of amino acid sequence identity to one or more reference polypeptides retains one or more characteristics, e.g., nuclease activity and/or DNA binding activity, as the one or more reference polypeptides.
Also provided is a Cas12i2 domain of the present invention having enzymatic activity, e.g., nuclease activity, and comprising an amino acid sequence which differs from the amino acid sequence of any one of SEQ ID NO: 1 and SEQ ID NOs: 39-43 by no more than 50, no more than 40, no more than 35, no more than 30, no more than 25, no more than 20, no more than 19, no more than 18, no more than 17, no more than 16, no more than 15, no more than 14, no more than 13, no more than 12, no more than 11, no more than 10, no more than 9, no more than 8, no more than 7, no more than 6, no more than 5, no more than 4, no more than 3, no more than 2, or no more than 1 amino acid residue(s), when aligned using any of the previously described alignment methods.
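The percent identity and residue-difference criteria discussed above can be illustrated with a minimal Python sketch operating on two sequences that have already been aligned (for example, by one of the programs named above). The function names and toy sequences are hypothetical. The sketch assumes that gaps are written as "-" and that identity is the number of matching columns divided by the number of aligned columns; alignment programs may use other denominators, so the values they report can differ.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.
    Assumption: identity = matching columns / aligned columns, where
    columns that are gaps in both sequences are ignored."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(p, q) for p, q in zip(a, b) if not (p == "-" and q == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for p, q in columns if p == q)
    return 100.0 * matches / len(columns)


def count_differences(a: str, b: str) -> int:
    """Number of aligned columns at which the two sequences differ
    (substitutions, insertions, and deletions each count once per column)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for p, q in zip(a, b) if p != q)


# Hypothetical toy alignment, not derived from any SEQ ID NO.
reference = "MKRTADGS-E"
candidate = "MKRSADGSFE"
print(percent_identity(reference, candidate))   # 80.0
print(count_differences(reference, candidate))  # 2
```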
In some embodiments, a Cas12i2 domain of the present invention comprises a RuvC domain. In some embodiments, a Cas12i2 domain of the present invention comprises a split RuvC domain or two or more partial RuvC domains. For example, a Cas12i2 domain comprises RuvC motifs that are not contiguous with respect to the primary amino acid sequence of the Cas12i2 domain but form a RuvC domain once the protein folds. In some embodiments, the catalytic residue of a RuvC motif is a glutamic acid residue and/or an aspartic acid residue. For example, in some embodiments, the nuclease of SEQ ID NO: 1 comprises one or more of the following catalytic residues: D599, E833, and D1019.
In some embodiments, the invention includes an isolated, recombinant, substantially pure, or non-naturally occurring Cas12i2 fusion protein comprising a Cas12i2 domain comprising a RuvC domain, wherein the Cas12i2 domain has enzymatic activity, e.g., nuclease activity, wherein the Cas12i2 domain comprises an amino acid sequence having at least about 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to any one of SEQ ID NO: 1 and SEQ ID NOs: 39-43.
In some embodiments, the biochemistry of a Cas12i2 fusion protein (e.g., a Cas12i2 domain of a Cas12i2 fusion protein) described herein is analyzed using one or more assays. In some embodiments, the biochemical characteristics of a Cas12i2 fusion protein described herein are analyzed in vitro using a purified nuclease incubated with an RNA guide (e.g., a mature crRNA) and a target DNA molecule. In some embodiments, the biochemical characteristics of a Cas12i2 fusion protein described herein are analyzed in vitro using a fluorescence depletion assay. In some embodiments, the biochemical characteristics of a Cas12i2 fusion protein described herein are analyzed in mammalian cells, as described in Example 1.
Described herein are Cas12i2 fusion proteins, compositions, and methods relating to a Cas12i2 fusion protein of the present invention. The compositions and methods are based, in part, on the observation that cloned and expressed polypeptides of the present invention have nuclease activity.
In some embodiments, a Cas12i2 fusion protein and an RNA guide as described herein form a complex (e.g., an RNP). In some embodiments, the complex includes other components. In some embodiments, the complex is activated upon binding to a target nucleic acid, e.g., to a target strand of a target nucleic acid, that has complementarity to a spacer sequence in the RNA guide. In some embodiments, the target nucleic acid is a double-stranded DNA (dsDNA). In some embodiments, the target nucleic acid is a single-stranded DNA (ssDNA). In some embodiments, the target nucleic acid is a single-stranded RNA (ssRNA). In some embodiments, the target nucleic acid is a double-stranded RNA (dsRNA). In some embodiments, the sequence-specificity requires a complete match of the spacer sequence in the RNA guide to the target nucleic acid, e.g., to a target strand of the target nucleic acid. In other embodiments, the sequence specificity requires a partial (contiguous or non-contiguous) match of the spacer sequence in the RNA guide to the target nucleic acid, e.g., to a target strand of the target nucleic acid.
In some embodiments, the complex becomes activated upon binding to the target nucleic acid. In some embodiments, the activated complex exhibits “multiple turnover” activity, whereby upon acting on (e.g., cleaving) the target nucleic acid, the activated complex remains in an activated state. In some embodiments, the activated complex exhibits “single turnover” activity, whereby upon acting on the target nucleic acid, the complex reverts to an inactive state.
In some embodiments, a Cas12i2 fusion protein described herein comes into contact with a target nucleic acid at a sequence defined by the region of complementarity between the RNA guide and the target nucleic acid. In some embodiments, the PAM sequence of a Cas12i2 fusion protein described herein is located directly upstream of the target sequence of the target nucleic acid (e.g., directly 5′ of the target sequence). In some embodiments, the PAM sequence of a Cas12i2 fusion protein described herein is located directly 5′ of the target sequence on the non-spacer-complementary strand (e.g., non-target strand) of the target nucleic acid.
In some embodiments, a nuclease of the present invention targets a sequence adjacent to a PAM, wherein the PAM comprises a nucleotide sequence set forth as 5′-TTN-3′, 5′-TTH-3′, 5′-TTY-3′, or 5′-TTC-3′, wherein “N” is any nucleobase, “H” is A, C, or T, and “Y” is C or T. In some embodiments, a Cas12i2 fusion protein (e.g., a Cas12i2 domain) described herein cleaves ssDNA. In some embodiments, a Cas12i2 fusion protein described herein cleaves dsDNA. In some embodiments, a Cas12i2 fusion protein described herein is a nickase (e.g., the Cas12i2 domain cleaves one strand of a double-stranded target nucleic acid).
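The PAM definitions above use degenerate nucleotide codes ("N", "H", and "Y"). The following Python sketch illustrates one way to locate such PAMs on a given strand and report the sequence directly 3′ of each PAM. The function names, the toy strand, and the `target_len` parameter are hypothetical. The sketch scans only the strand supplied (read 5′ to 3′, corresponding to the non-target strand) and does not examine the reverse complement.

```python
import re

# Degenerate codes used in the PAM definitions above.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "H": "[ACT]", "Y": "[CT]"}


def pam_regex(pam: str) -> re.Pattern:
    """Compile a PAM such as 'TTN' into a regex; the lookahead allows
    overlapping PAM occurrences to be reported."""
    return re.compile("(?=(" + "".join(CODES[base] for base in pam) + "))")


def find_targets(strand: str, pam: str = "TTN", target_len: int = 20):
    """Return (pam_start, pam_sequence, adjacent_sequence) for every PAM on
    `strand` that is followed directly 3' by at least target_len nucleotides.
    pam_start is a 0-based index; target_len is an illustrative assumption."""
    hits = []
    for match in pam_regex(pam).finditer(strand):
        start = match.start() + len(pam)
        adjacent = strand[start:start + target_len]
        if len(adjacent) == target_len:
            hits.append((match.start(), match.group(1), adjacent))
    return hits


toy_strand = "CCTTCAAAAAGGGGG"  # hypothetical sequence
print(find_targets(toy_strand, pam="TTC", target_len=5))  # [(2, 'TTC', 'AAAAA')]
```

A strand containing "TTG" would match 5′-TTN-3′ but not 5′-TTH-3′, 5′-TTY-3′, or 5′-TTC-3′, because "G" is excluded from both "H" and "Y".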
In some embodiments, a Cas12i2 fusion protein (e.g., the Cas12i2 domain or the fusion domain) of the present invention has enzymatic activity, e.g., nuclease activity, over a broad range of pH conditions. In some embodiments, the Cas12i2 fusion protein has enzymatic activity, e.g., nuclease activity, at a pH of from about 3.0 to about 12.0. In some embodiments, the Cas12i2 fusion protein has enzymatic activity at a pH of from about 4.0 to about 10.5. In some embodiments, the Cas12i2 fusion protein has enzymatic activity at a pH of from about 5.5 to about 8.5. In some embodiments, the Cas12i2 fusion protein has enzymatic activity at a pH of from about 6.0 to about 8.0. In some embodiments, the Cas12i2 fusion protein has enzymatic activity at a pH of about 7.0.
In some embodiments, a Cas12i2 fusion protein (e.g., the Cas12i2 domain or the fusion domain) of the present invention has enzymatic activity, e.g., nuclease activity, at a temperature range of from about 10° C. to about 100° C. In some embodiments, a Cas12i2 fusion protein of the present invention has enzymatic activity at a temperature range from about 20° C. to about 90° C. In some embodiments, a Cas12i2 fusion protein of the present invention has enzymatic activity at a temperature of about 20° C. to about 25° C. or at a temperature of about 37° C.
In some embodiments wherein a Cas12i2 fusion protein (e.g., the Cas12i2 domain or the fusion domain) of the present invention induces double-stranded breaks or single-stranded breaks in a target nucleic acid (e.g., genomic DNA), the double-stranded break can stimulate cellular endogenous DNA-repair pathways, including Homology Directed Recombination (HDR), Non-Homologous End Joining (NHEJ), or Alternative Non-Homologous End-Joining (A-NHEJ). NHEJ can repair cleaved target nucleic acid without the need for a homologous template. This can result in deletion or insertion of one or more nucleotides at the target locus. HDR can occur with a homologous template, such as the donor DNA. The homologous template can comprise sequences that are homologous to sequences flanking the target nucleic acid cleavage site. In some cases, HDR can insert an exogenous polynucleotide sequence into the cleaved target locus. The modifications of the target DNA due to NHEJ and/or HDR can lead to, for example, mutations, deletions, alterations, integrations, gene correction, gene replacement, gene tagging, transgene knock-in, gene disruption, and/or gene knock-outs.
In some embodiments, binding of a Cas12i2 fusion protein/RNA guide complex to a target locus in a cell recruits one or more endogenous cellular molecules or pathways other than DNA repair pathways to modify the target nucleic acid. In some embodiments, binding of a Cas12i2 fusion protein/RNA guide complex blocks access of one or more endogenous cellular molecules or pathways to the target nucleic acid, thereby modifying the target nucleic acid. For example, binding of a Cas12i2 fusion protein/RNA guide complex may block endogenous transcription or translation machinery to decrease the expression of the target nucleic acid.
In some embodiments, the present invention includes variants of a Cas12i2 domain described herein. In some embodiments, a Cas12i2 domain described herein can be mutated at one or more amino acid residues to modify one or more functional activities. For example, in some embodiments, a Cas12i2 domain of the present invention is mutated at one or more amino acid residues to modify its nuclease activity (e.g., cleavage activity). For example, in some embodiments, a Cas12i2 domain may comprise one or more mutations that increase the ability of the Cas12i2 domain to cleave a target nucleic acid. In some embodiments, a Cas12i2 domain is mutated at one or more amino acid residues to modify its ability to functionally associate with an RNA guide. In some embodiments, a Cas12i2 domain is mutated at one or more amino acid residues to modify its ability to functionally associate with a target nucleic acid.
In some embodiments, a variant Cas12i2 domain has a conservative or non-conservative amino acid substitution, deletion or addition. In some embodiments, the variant Cas12i2 domain has a silent substitution, deletion or addition, or a conservative substitution, none of which alter the polypeptide activity of the present invention. Typical examples of the conservative substitution include substitution whereby one amino acid is exchanged for another, such as exchange among aliphatic amino acids Ala, Val, Leu and Ile, exchange between hydroxyl residues Ser and Thr, exchange between acidic residues Asp and Glu, substitution between amide residues Asn and Gln, exchange between basic residues Lys and Arg, and substitution between aromatic residues Phe and Tyr. In some embodiments, one or more residues of a Cas12i2 domain disclosed herein are mutated to an Arg residue. In some embodiments, one or more residues of a Cas12i2 domain disclosed herein are mutated to a Gly residue.
A variety of methods are known in the art that are suitable for generating modified polynucleotides that encode variant Cas12i2 domains of the invention, including, but not limited to, for example, site-saturation mutagenesis, scanning mutagenesis, insertional mutagenesis, deletion mutagenesis, random mutagenesis, site-directed mutagenesis, and directed-evolution, as well as various other recombinatorial approaches. Methods for making modified polynucleotides and proteins (e.g., nucleases) include DNA shuffling methodologies, methods based on non-homologous recombination of genes, such as ITCHY (See, Ostermeier et al., 7:2139-44 [1999]), SCRACHY (See, Lutz et al. 98:11248-53 [2001]), SHIPREC (See, Sieber et al., 19:456-60 [2001]), and NRR (See, Bittker et al., 20:1024-9 [2001]; Bittker et al., 101:7011-6 [2004]), and methods that rely on the use of oligonucleotides to insert random and targeted mutations, deletions and/or insertions (See, Ness et al., 20:1251-5 [2002]; Coco et al., 20:1246-50 [2002]; Zha et al., 4:34-9 [2003]; Glaser et al., 149:3903-13 [1992]).
In some embodiments, a Cas12i2 domain of the present invention comprises an alteration at one or more (e.g., several) amino acids, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, or more amino acids are altered.
In some embodiments, a variant Cas12i2 domain comprises one or more of the amino acid substitutions listed in Table 2 relative to the sequence of SEQ ID NO: 1. In some embodiments, the variant Cas12i2 domain comprises at least one of a D581, G624, F626, D835, L836, P868, S879, D911, I926, V1020, V1030, E1035, and S1046 substitution. In some embodiments, the variant Cas12i2 domain comprises at least one of a D581R, G624R, F626R, D835R, L836R, P868R, S879R, D911R, I926R, V1020R, V1030R, E1035R, and S1046R substitution. In some embodiments, the variant Cas12i2 domain comprises at least one of a D581G, G624G, F626G, D835G, L836G, P868G, S879G, D911G, I926G, V1020G, V1030G, and S1046G substitution. In some embodiments, the variant Cas12i2 domain comprises at least one of a D581R, G624R, F626R, D835R, L836R, P868R, S879R, D911R, I926R, V1020G, V1030G, E1035R, and S1046G substitution and at least one additional substitution listed in Table 2.
In some embodiments, the variant Cas12i2 domain of SEQ ID NO: 39 comprises the following mutations relative to SEQ ID NO: 1: D581R, D911R, I926R, and V1030G. In some embodiments, the variant Cas12i2 domain of SEQ ID NO: 40 comprises the following mutations relative to SEQ ID NO: 1: D581R, I926R, and V1030G. In some embodiments, the variant Cas12i2 domain of SEQ ID NO: 41 comprises the following mutations relative to SEQ ID NO: 1: D581R, I926R, V1030G, and S1046G. In some embodiments, the variant Cas12i2 domain of SEQ ID NO: 42 comprises the following mutations relative to SEQ ID NO: 1: D581R, G624R, F626R, I926R, V1030G, E1035R, and S1046G. In some embodiments, the variant Cas12i2 domain of SEQ ID NO: 43 comprises the following mutations relative to SEQ ID NO: 1: D581R, G624R, F626R, P868T, I926R, V1030G, E1035R, and S1046G.
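The substitution notation used above (original residue, 1-based position, replacement residue) can be applied to a reference sequence mechanically. The following is a minimal sketch with hypothetical helper names. It checks that the reference carries the expected residue at each position, which also catches mistyped labels. The toy reference in the usage note is not SEQ ID NO: 1.

```python
import re


def parse_substitution(label):
    """Split a label such as 'D581R' into (original, 1-based position, new)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(reference, labels):
    """Return a copy of `reference` carrying every substitution in `labels`.

    Raises ValueError when a position is out of range or when the reference
    does not have the residue that the label says it should have.
    """
    residues = list(reference)
    for label in labels:
        old, pos, new = parse_substitution(label)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{label}: position {pos} is outside the reference")
        if residues[pos - 1] != old:
            raise ValueError(
                f"{label}: reference has {residues[pos - 1]} at position {pos}"
            )
        residues[pos - 1] = new
    return "".join(residues)
```

With a toy reference, `apply_substitutions("MDKIF", ["D2R", "I4G"])` returns `"MRKGF"`. The same call with a real reference sequence and a label list such as `["D581R", "I926R", "V1030G"]` would follow the same pattern.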
In some embodiments, the variant Cas12i2 domain comprises the amino acid substitutions listed in Table 3.
Although the changes described herein may be one or more amino acid changes, changes to a Cas12i2 fusion protein may also be of a substantive nature, such as fusion of polypeptides as amino- and/or carboxyl-terminal extensions. For example, a Cas12i2 fusion protein may contain additional peptides, e.g., one or more peptides. Examples of additional peptides may include epitope peptides for labelling, such as a polyhistidine tag (His-tag), Myc, and FLAG. In some embodiments, a Cas12i2 fusion protein comprises: MKIEEGKGHHHHHH (SEQ ID NO: 66) or KIEEGKGHHHHHH (SEQ ID NO: 67). For example, in some embodiments, a Cas12i2 fusion protein comprises at least 80% (81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) identity to SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 73, or SEQ ID NO: 74. In some embodiments, a Cas12i2 fusion protein of any one of SEQ ID NOs: 45-52 is fused to a peptide sequence of SEQ ID NO: 66 or SEQ ID NO: 67. In some embodiments, a Cas12i2 fusion protein described herein can be fused to a detectable moiety such as a fluorescent protein (e.g., green fluorescent protein (GFP) or yellow fluorescent protein (YFP)).
In those embodiments where a tag is fused to a CRISPR nuclease (e.g., a Cas12i2 fusion protein), such tag may facilitate affinity-based or charge-based purification of the CRISPR nuclease (e.g., the Cas12i2 fusion protein), e.g., by liquid chromatography or bead separation utilizing an immobilized affinity or ion-exchange reagent. As a non-limiting example, a recombinant CRISPR nuclease of this disclosure comprises a polyhistidine (His) tag, and for purification is loaded onto a chromatography column comprising an immobilized metal ion (e.g., a Zn2+, Ni2+, or Cu2+ ion chelated by a chelating ligand immobilized on the resin), which resin may be an individually prepared resin or a commercially available resin or ready-to-use column. Following the loading step, the column is optionally rinsed, e.g., using one or more suitable buffer solutions, and the His-tagged protein is then eluted using a suitable elution buffer. Alternatively, or additionally, if the recombinant CRISPR nuclease of this disclosure utilizes a FLAG-tag, such protein may be purified using immunoprecipitation methods known in the industry. Other suitable purification methods for tagged CRISPR nucleases or accessory proteins of this disclosure will be evident to those of skill in the art.
A nuclease described herein can be modified to have diminished nuclease activity, e.g., nuclease inactivation of at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, or 100%, as compared to a reference nuclease. Nuclease activity can be diminished by several methods known in the art, e.g., introducing mutations into the RuvC domain (e.g., one or more catalytic residues of the RuvC domain). In a non-limiting example, a variant of SEQ ID NO: 1 comprising a mutation in residue D599, residue E833, and/or residue D1019 demonstrates diminished or no nuclease activity.
In some embodiments, the Cas12i2 fusion protein described herein can be self-inactivating. See, Epstein et al., “Engineering a Self-Inactivating CRISPR System for AAV Vectors,” Mol. Ther., 24 (2016): S50, which is incorporated by reference in its entirety.
Nucleic acid molecules encoding the Cas12i2 fusion protein described herein can further be codon-optimized. The nucleic acid can be codon-optimized for use in a particular host cell, such as a bacterial cell or a mammalian cell.
In some instances, a linker is a covalent linkage or connection between two or more components described herein. In some embodiments, the linker comprises a chemical linker. In some embodiments, a linker comprises a functional group pair. In some embodiments, a linker is a peptide linker. In some instances, the linker(s) is located N-terminal of the fusion domain. In some instances, the linker(s) is located C-terminal of the fusion domain. In some instances, a first linker is located N-terminal of the fusion domain and the second linker is located C-terminal of the fusion domain. In some embodiments, a first linker(s) is located C-terminal of a first fusion domain and a second linker is located N-terminal of a second fusion domain.
In some embodiments, a heterologous sequence comprises one or more linkers (e.g., peptide linkers) of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, or more amino acid residues. In some embodiments, the linker can be located N-terminal of a fusion domain. In certain embodiments, the linker can be located C-terminal of a fusion domain. The linker sequence may comprise any naturally occurring amino acid. In some embodiments, the linker comprises amino acids glycine and serine. In some embodiments, the linker comprises sets of glycine and serine repeats such as (G4S)x, where x is a positive integer between 0 and 15 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In some embodiments, the linker comprises an amino acid sequence of (GSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In some embodiments, the linker comprises an amino acid sequence of (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In some embodiments, the linker comprises an amino acid sequence of (GSS)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In some embodiments, the linker comprises an amino acid sequence of GSSGSSGSSGSSGSS (SEQ ID NO: 44). In some embodiments, the linker can comprise the amino acid sequence of any of the following:
In some embodiments, the linker comprises the 16 residue “XTEN” linker, or a variant thereof (see, e.g., Schellenberger et al., Nat. Biotechnol. 27: 1186-1190, 2009, the entirety of which is incorporated herein by reference).
In some embodiments, any peptide linker described herein may further comprise between 1-5 (e.g., 1, 2, 3, 4, or 5) amino acid residues N-terminal or C-terminal of the peptide linker. The 1-5 amino acids residues N-terminal or C-terminal of the peptide linker can comprise any naturally occurring or modified amino acid residue.
Also included within the scope of the invention are linkers described in WO2012/138475, incorporated herein by reference in its entirety.
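The repeat notation used for the glycine and serine linkers, such as (G4S)x or (GSS)x, expands to a plain amino acid string by repeating the unit x times. The sketch below illustrates this with hypothetical helper names. The second helper models the optional 1-5 flanking residues described above, under the assumption that the flanks are given in single-letter code.

```python
def repeat_linker(unit, x):
    """Expand a repeat linker such as (GSS)x into its amino acid string."""
    if not 0 <= x <= 15:
        raise ValueError("x is expected to lie between 0 and 15")
    return unit * x


def add_flanking_residues(linker, n_term="", c_term=""):
    """Attach up to five extra residues on either side of a peptide linker."""
    if len(n_term) > 5 or len(c_term) > 5:
        raise ValueError("at most 5 flanking residues are allowed per side")
    return n_term + linker + c_term
```

For instance, `repeat_linker("GSS", 5)` gives `GSSGSSGSSGSSGSS`, the sequence recited as SEQ ID NO: 44, and `repeat_linker("GGGGS", 3)` gives a (G4S)3 linker.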
In some embodiments, the composition described herein comprises a targeting moiety.
The targeting moiety may be substantially identical to a reference nucleic acid sequence if the targeting moiety comprises a sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity to the reference nucleic acid sequence. The percent identity between two such nucleic acids can be determined manually by inspection of the two optimally aligned nucleic acid sequences or by using software programs or algorithms (e.g., BLAST, ALIGN, CLUSTAL) using standard parameters. One indication that two nucleic acid sequences are substantially identical is that the two nucleic acid molecules hybridize to each other under stringent conditions (e.g., within a range of medium to high stringency).
In some embodiments, the targeting moiety has at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity to the reference nucleic acid sequence.
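As an illustration of determining percent identity by inspection of two optimally aligned sequences, the sketch below counts matching columns in a pre-computed pairwise alignment. It assumes the alignment is already available (for example, from one of the programs named above) and uses the number of alignment columns as the denominator. Other tools may use a different denominator, so the value is one convention among several. The function names and the default threshold are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def is_substantially_identical(aligned_a, aligned_b, threshold=60.0):
    """Compare percent identity against a chosen threshold (e.g., 60 to 99.5)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `percent_identity("ACGTACGTAC", "ACGTTCGTAC")` returns `90.0`, since nine of the ten aligned columns match.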
In some embodiments, the targeting moiety comprises, or is, an RNA guide sequence. In some embodiments, the RNA guide sequence directs a Cas12i2 fusion protein described herein to a particular nucleic acid sequence. Those skilled in the art reading the below examples of particular kinds of RNA guide sequences will understand that, in some embodiments, an RNA guide sequence is site-specific. That is, in some embodiments, an RNA guide sequence associates specifically with one or more target nucleic acid sequences (e.g., specific DNA or genomic DNA sequences) and not with non-targeted nucleic acid sequences (e.g., non-specific DNA or random sequences).
In some embodiments, the composition as described herein comprises an RNA guide sequence that associates with a Cas12i2 domain of a Cas12i2 fusion protein described herein and directs a Cas12i2 fusion protein to a target nucleic acid sequence (e.g., DNA). The RNA guide sequence may associate with a nucleic acid sequence and alter functionality of a Cas12i2 fusion protein (e.g., alters affinity of the Cas12i2 fusion protein to a molecule, e.g., at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or more).
The RNA guide sequence may target (e.g., associate with, be directed to, contact, or bind) one or more nucleotides of a sequence, e.g., a site-specific sequence or a site-specific target. In some embodiments, a Cas12i2 domain (e.g., a Cas12i2 domain of a Cas12i2 fusion protein plus an RNA guide) is activated upon binding to a target nucleic acid, e.g., to a target strand of a target nucleic acid, wherein the target strand of the target nucleic acid has complementarity to a spacer sequence in the RNA guide.
In some embodiments, an RNA guide sequence comprises a spacer sequence. In some embodiments, the spacer sequence of the RNA guide sequence may be generally designed to have a length of between 15-35 nucleotides (e.g., 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides) and be complementary to a specific nucleic acid sequence. In some particular embodiments, the RNA guide sequence may be designed to be complementary to a specific DNA strand, e.g., of a genomic locus. In some embodiments, the spacer sequence is designed to be complementary to a specific DNA strand, e.g., of a genomic locus.
In certain embodiments, the RNA guide sequence includes, consists essentially of, or comprises a direct repeat sequence linked to a sequence or spacer sequence. In some embodiments, the RNA guide sequence includes a direct repeat sequence and a spacer sequence or a direct repeat-spacer-direct repeat sequence. In some embodiments, the RNA guide sequence includes a truncated direct repeat sequence and a spacer sequence, which is typical of processed or mature crRNA. In some embodiments, a nuclease forms a complex with the RNA guide sequence, and the RNA guide sequence directs the complex to associate with site-specific target nucleic acid that is complementary to at least a portion of the RNA guide sequence.
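A spacer that is complementary to a given target strand can be derived as the reverse complement of that strand, written as RNA. The sketch below shows this together with a simple guide assembly. It assumes DNA input restricted to A, C, G, and T, enforces the 15-35 nucleotide spacer range mentioned above, and places the direct repeat 5′ of the spacer, which is an assumption made for illustration. The direct repeat used in the usage note is a made-up placeholder, not a sequence from Table 4.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def spacer_from_target_strand(target_strand_dna):
    """Return the RNA spacer (5'->3') that pairs with a target-strand segment.

    The input is the target (spacer-complementary) DNA strand written 5'->3'.
    """
    dna = target_strand_dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("only A, C, G and T are supported in this sketch")
    if not 15 <= len(dna) <= 35:
        raise ValueError("spacer length outside the 15-35 nucleotide range")
    reverse_complement = dna.translate(DNA_COMPLEMENT)[::-1]
    return reverse_complement.replace("T", "U")


def assemble_guide(direct_repeat_rna, spacer_rna):
    """Join a direct repeat and a spacer (repeat placed 5' of the spacer,
    an assumption of this sketch)."""
    return direct_repeat_rna + spacer_rna
```

For example, `spacer_from_target_strand("AAACCCGGGTTTACGTA")` returns `UACGUAAACCCGGGUUU`, and `assemble_guide("GGAAUU", "UACGUAAACCCGGGUUU")` prepends the placeholder repeat `GGAAUU` to that spacer.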
In some embodiments, the RNA guide sequence comprises a sequence, e.g., RNA sequence, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% complementary to a target nucleic acid sequence. In some embodiments, the RNA guide sequence comprises a sequence at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% complementary to a DNA sequence. In some embodiments, the RNA guide sequence comprises a sequence at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% complementary to a target nucleic acid sequence. In some embodiments, the RNA guide sequence comprises a sequence at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% complementary to a genomic sequence. In some embodiments, the RNA guide sequence comprises a sequence complementary to or a sequence comprising at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% complementarity to a genomic sequence.
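Percent complementarity between an RNA guide sequence and a DNA target can be estimated by pairing the two strands antiparallel and counting Watson-Crick pairs. The following is a minimal sketch under stated assumptions: both sequences are given 5′ to 3′, they have equal length, no gaps or bulges are modeled, and G-U wobble pairs are not counted. The function name is hypothetical.

```python
# Watson-Crick partners of each RNA base on a DNA strand.
RNA_TO_DNA_PAIR = {"A": "T", "C": "G", "G": "C", "U": "A"}


def percent_complementarity(guide_rna, target_dna):
    """Percent of guide positions that form Watson-Crick pairs with the target.

    Both inputs are written 5'->3'; the target is reversed so that the two
    strands are compared in antiparallel orientation.
    """
    guide = guide_rna.upper()
    target = target_dna.upper()[::-1]
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(1 for g, t in zip(guide, target)
                 if RNA_TO_DNA_PAIR.get(g) == t)
    return 100.0 * paired / len(guide)
```

For example, `percent_complementarity("AAAAAAAAAC", "TTTTTTTTTT")` returns `90.0`, because the final C of the guide faces a T rather than a G.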
In some embodiments, a nuclease described herein includes one or more (e.g., two, three, four, five, six, seven, eight, or more) RNA guide sequences, e.g., RNA guides.
In some embodiments, the RNA guide has an architecture similar to that described in, for example, International Publication Nos. WO 2014/093622 and WO 2015/070083, the entire contents of each of which are incorporated herein by reference.
In some embodiments, an RNA guide sequence of the present invention comprises a direct repeat sequence having 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to the direct repeat sequences of Table 4. In some embodiments, an RNA guide of the present invention comprises a direct repeat sequence having greater than 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identity to the direct repeat sequences of Table 4.
In some embodiments, a Cas12i2 fusion protein and an RNA guide (e.g., an RNA guide comprising a direct repeat and a spacer) form a complex. In some embodiments, a Cas12i2 fusion protein and an RNA guide (e.g., an RNA guide comprising direct repeat-spacer-direct repeat sequence or pre-crRNA) form a complex. In some embodiments, the complex binds a target nucleic acid. In some embodiments, the Cas12i2 fusion protein comprises an amino acid sequence that is at least 80% (e.g., 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%) identical to the amino acid sequence of SEQ ID NO: 1, and the direct repeat sequence comprises a nucleotide sequence that is at least 80% (e.g., 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%) identical to the nucleotide sequence of SEQ ID NO: 36 or SEQ ID NO: 37.
In some embodiments, the spacer of an RNA guide binds to a target nucleic acid, e.g., to the target strand (i.e., non-PAM strand) of a target nucleic acid, wherein the non-target strand (i.e., PAM strand) comprises a target sequence adjacent to a PAM sequence of any one of 5′-TTN-3′, 5′-TTH-3′, 5′-TTY-3′, or 5′-TTC-3′.
In some embodiments, the gRNA (e.g., a crRNA) comprises:
In some embodiments, a Cas12i2 fusion protein and an RNA guide (e.g., an RNA guide comprising a direct repeat and a spacer) form a complex. In some embodiments, a Cas12i2 fusion protein and an RNA guide (e.g., an RNA guide comprising direct repeat-spacer-direct repeat sequence or pre-crRNA) form a complex. In some embodiments, the complex binds a target nucleic acid. In some embodiments, the Cas12i2 fusion protein comprises an amino acid sequence that is at least 80% (e.g., 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%) identical to the amino acid sequence of SEQ ID NO: 1, and the direct repeat sequence comprises a nucleotide sequence that is at least 80% (e.g., 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%) identical to the nucleotide sequence of SEQ ID NO: 38.
In some embodiments, an RNA guide described herein comprises a uracil (U). In some embodiments, an RNA guide described herein comprises a thymine (T). In some embodiments, a direct repeat sequence of an RNA guide described herein comprises a uracil (U). In some embodiments, a direct repeat sequence of an RNA guide described herein comprises a thymine (T).
Unless otherwise noted, all compositions and nucleases provided herein are made in reference to the active level of that composition or nuclease, and are exclusive of impurities, for example, residual solvents or by-products, which may be present in commercially available sources. Nuclease component weights are based on total active protein. All percentages and ratios are calculated by weight unless otherwise indicated. All percentages and ratios are calculated based on the total composition unless otherwise indicated. In the exemplified composition, the nuclease levels are expressed by pure enzyme by weight of the total composition and unless otherwise specified, the ingredients are expressed by weight of the total compositions.
The RNA guide sequence or any of the nucleic acid sequences encoding a Cas12i2 fusion protein described herein may include one or more covalent modifications with respect to a reference sequence, in particular the parent polyribonucleotide, which are included within the scope of this invention.
Exemplary modifications can include any modification to the sugar, the nucleobase, the internucleoside linkage (e.g. to a linking phosphate/to a phosphodiester linkage/to the phosphodiester backbone), and any combination thereof. Some of the exemplary modifications provided herein are described in detail below.
The RNA guide sequence or any of the nucleic acid sequences encoding components of a Cas12i2 fusion protein may include any useful modification, such as to the sugar, the nucleobase, or the internucleoside linkage (e.g., to a linking phosphate/to a phosphodiester linkage/to the phosphodiester backbone). One or more atoms of a pyrimidine nucleobase may be replaced or substituted with optionally substituted amino, optionally substituted thiol, optionally substituted alkyl (e.g., methyl or ethyl), or halo (e.g., chloro or fluoro). In certain embodiments, modifications (e.g., one or more modifications) are present in each of the sugar and the internucleoside linkage. Modifications may be modifications of ribonucleic acids (RNAs) to deoxyribonucleic acids (DNAs), threose nucleic acids (TNAs), glycol nucleic acids (GNAs), peptide nucleic acids (PNAs), locked nucleic acids (LNAs), or hybrids thereof. Additional modifications are described herein.
In some embodiments, the modification may include a chemical or cellular induced modification. For example, some nonlimiting examples of intracellular RNA modifications are described by Lewis and Pan in “RNA modifications and structures cooperate to guide RNA-protein interactions” from Nat Reviews Mol Cell Biol, 2017, 18:202-210.
Different sugar modifications, nucleotide modifications, and/or internucleoside linkages (e.g., backbone structures) may exist at various positions in the sequence. One of ordinary skill in the art will appreciate that the nucleotide analogs or other modification(s) may be located at any position(s) of the sequence, such that the function of the sequence is not substantially decreased. The sequence may include from about 1% to about 100% modified nucleotides (either in relation to overall nucleotide content, or in relation to one or more types of nucleotide, i.e. any one or more of A, G, U or C) or any intervening percentage (e.g., from 1% to 20%, from 1% to 25%, from 1% to 50%, from 1% to 60%, from 1% to 70%, from 1% to 80%, from 1% to 90%, from 1% to 95%, from 10% to 20%, from 10% to 25%, from 10% to 50%, from 10% to 60%, from 10% to 70%, from 10% to 80%, from 10% to 90%, from 10% to 95%, from 10% to 100%, from 20% to 25%, from 20% to 50%, from 20% to 60%, from 20% to 70%, from 20% to 80%, from 20% to 90%, from 20% to 95%, from 20% to 100%, from 50% to 60%, from 50% to 70%, from 50% to 80%, from 50% to 90%, from 50% to 95%, from 50% to 100%, from 70% to 80%, from 70% to 90%, from 70% to 95%, from 70% to 100%, from 80% to 90%, from 80% to 95%, from 80% to 100%, from 90% to 95%, from 90% to 100%, and from 95% to 100%).
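The two ways of expressing the modified fraction described above, relative to overall nucleotide content or relative to one type of nucleotide, can be computed as follows. This is a minimal sketch that assumes each position of the sequence is recorded as a (base, is_modified) pair. That representation and the function name are illustrative choices.

```python
def percent_modified(positions, base=None):
    """Percent of modified nucleotides, overall or for a single base type.

    positions: iterable of (base, is_modified) pairs, one per nucleotide.
    base: if given (e.g., "U"), only nucleotides of that type are counted.
    """
    selected = [bool(modified) for b, modified in positions
                if base is None or b == base]
    if not selected:
        raise ValueError("no nucleotides of the requested type")
    return 100.0 * sum(selected) / len(selected)
```

For a five-nucleotide sequence recorded as `[("A", True), ("G", False), ("U", True), ("U", False), ("C", False)]`, the overall value is `40.0` and the value restricted to `base="U"` is `50.0`.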
In some embodiments, the sequence may include sugar modifications (e.g., at the 2′ position or 4′ position) or replacement of the sugar at one or more ribonucleotides, as well as backbone modifications, including modification or replacement of the phosphodiester linkages. Specific examples of a sequence include, but are not limited to, sequences including modified backbones or no natural internucleoside linkages such as internucleoside modifications, including modification or replacement of the phosphodiester linkages. Sequences having modified backbones include, among others, those that do not have a phosphorus atom in the backbone. For the purposes of this application, and as sometimes referenced in the art, modified RNAs that do not have a phosphorus atom in their internucleoside backbone can also be considered to be oligonucleosides. In particular embodiments, a sequence will include ribonucleotides with a phosphorus atom in its internucleoside backbone.
Modified sequence backbones may include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates such as 3′-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates such as 3′-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3′-5′ to 5′-3′ or 2′-5′ to 5′-2′. Various salts, mixed salts and free acid forms are also included. In some embodiments, the sequence may be negatively or positively charged.
The modified nucleotides, which may be incorporated into the sequence, can be modified on the internucleoside linkage (e.g., phosphate backbone). Herein, in the context of the polynucleotide backbone, the phrases “phosphate” and “phosphodiester” are used interchangeably. Backbone phosphate groups can be modified by replacing one or more of the oxygen atoms with a different substituent. Further, the modified nucleosides and nucleotides can include the wholesale replacement of an unmodified phosphate moiety with another internucleoside linkage as described herein. Examples of modified phosphate groups include, but are not limited to, phosphorothioate, phosphoroselenates, boranophosphates, boranophosphate esters, hydrogen phosphonates, phosphoramidates, phosphorodiamidates, alkyl or aryl phosphonates, and phosphotriesters. Phosphorodithioates have both non-linking oxygens replaced by sulfur. The phosphate linker can also be modified by the replacement of a linking oxygen with nitrogen (bridged phosphoramidates), sulfur (bridged phosphorothioates), and carbon (bridged methylene-phosphonates).
The α-thio substituted phosphate moiety is provided to confer stability to RNA and DNA polymers through the unnatural phosphorothioate backbone linkages. Phosphorothioate DNA and RNA have increased nuclease resistance and subsequently a longer half-life in a cellular environment.
In specific embodiments, a modified nucleoside includes an alpha-thio-nucleoside (e.g., 5′-O-(1-thiophosphate)-adenosine, 5′-O-(1-thiophosphate)-cytidine (α-thio-cytidine), 5′-O-(1-thiophosphate)-guanosine, 5′-O-(1-thiophosphate)-uridine, or 5′-O-(1-thiophosphate)-pseudouridine).
Other internucleoside linkages that may be employed according to the present invention, including internucleoside linkages which do not contain a phosphorus atom, are described herein.
In some embodiments, the sequence may include one or more cytotoxic nucleosides. For example, cytotoxic nucleosides may be incorporated into the sequence, such as a bifunctional modification. Cytotoxic nucleosides may include, but are not limited to, adenosine arabinoside, 5-azacytidine, 4′-thio-aracytidine, cyclopentenylcytosine, cladribine, clofarabine, cytarabine, cytosine arabinoside, 1-(2-C-cyano-2-deoxy-beta-D-arabino-pentofuranosyl)-cytosine, decitabine, 5-fluorouracil, fludarabine, floxuridine, gemcitabine, a combination of tegafur and uracil, tegafur ((RS)-5-fluoro-1-(tetrahydrofuran-2-yl)pyrimidine-2,4(1H,3H)-dione), troxacitabine, tezacitabine, 2′-deoxy-2′-methylidenecytidine (DMDC), and 6-mercaptopurine. Additional examples include fludarabine phosphate, N4-behenoyl-1-beta-D-arabinofuranosylcytosine, N4-octadecyl-1-beta-D-arabinofuranosylcytosine, N4-palmitoyl-1-(2-C-cyano-2-deoxy-beta-D-arabino-pentofuranosyl) cytosine, and P-4055 (cytarabine 5′-elaidic acid ester).
In some embodiments, the sequence includes one or more post-transcriptional modifications (e.g., capping, cleavage, polyadenylation, splicing, poly-A sequence, methylation, acylation, phosphorylation, methylation of lysine and arginine residues, acetylation, and nitrosylation of thiol groups and tyrosine residues, etc.). The one or more post-transcriptional modifications can be any post-transcriptional modification, such as any of the more than one hundred different nucleoside modifications that have been identified in RNA (Rozenski, J, Crain, P, and McCloskey, J. (1999). The RNA Modification Database: 1999 update. Nucl Acids Res 27: 196-197). In some embodiments, the first isolated nucleic acid comprises messenger RNA (mRNA). In some embodiments, the mRNA comprises at least one nucleoside selected from the group consisting of pyridine-4-one ribonucleoside, 5-aza-uridine, 2-thio-5-aza-uridine, 2-thiouridine, 4-thio-pseudouridine, 2-thio-pseudouridine, 5-hydroxyuridine, 3-methyluridine, 5-carboxymethyl-uridine, 1-carboxymethyl-pseudouridine, 5-propynyl-uridine, 1-propynyl-pseudouridine, 5-taurinomethyluridine, 1-taurinomethyl-pseudouridine, 5-taurinomethyl-2-thio-uridine, 1-taurinomethyl-4-thio-uridine, 5-methyl-uridine, 1-methyl-pseudouridine, 4-thio-1-methyl-pseudouridine, 2-thio-1-methyl-pseudouridine, 1-methyl-1-deaza-pseudouridine, 2-thio-1-methyl-1-deaza-pseudouridine, dihydrouridine, dihydropseudouridine, 2-thio-dihydrouridine, 2-thio-dihydropseudouridine, 2-methoxyuridine, 2-methoxy-4-thio-uridine, 4-methoxy-pseudouridine, and 4-methoxy-2-thio-pseudouridine. 
In some embodiments, the mRNA comprises at least one nucleoside selected from the group consisting of 5-aza-cytidine, pseudoisocytidine, 3-methyl-cytidine, N4-acetylcytidine, 5-formylcytidine, N4-methylcytidine, 5-hydroxymethylcytidine, 1-methyl-pseudoisocytidine, pyrrolo-cytidine, pyrrolo-pseudoisocytidine, 2-thio-cytidine, 2-thio-5-methyl-cytidine, 4-thio-pseudoisocytidine, 4-thio-1-methyl-pseudoisocytidine, 4-thio-1-methyl-1-deaza-pseudoisocytidine, 1-methyl-1-deaza-pseudoisocytidine, zebularine, 5-aza-zebularine, 5-methyl-zebularine, 5-aza-2-thio-zebularine, 2-thio-zebularine, 2-methoxy-cytidine, 2-methoxy-5-methyl-cytidine, 4-methoxy-pseudoisocytidine, and 4-methoxy-1-methyl-pseudoisocytidine. In some embodiments, the mRNA comprises at least one nucleoside selected from the group consisting of 2-aminopurine, 2,6-diaminopurine, 7-deaza-adenine, 7-deaza-8-aza-adenine, 7-deaza-2-aminopurine, 7-deaza-8-aza-2-aminopurine, 7-deaza-2,6-diaminopurine, 7-deaza-8-aza-2,6-diaminopurine, 1-methyladenosine, N6-methyladenosine, N6-isopentenyladenosine, N6-(cis-hydroxyisopentenyl)adenosine, 2-methylthio-N6-(cis-hydroxyisopentenyl)adenosine, N6-glycinylcarbamoyladenosine, N6-threonylcarbamoyladenosine, 2-methylthio-N6-threonylcarbamoyladenosine, N6,N6-dimethyladenosine, 7-methyladenine, 2-methylthio-adenine, and 2-methoxy-adenine. In some embodiments, the mRNA comprises at least one nucleoside selected from the group consisting of inosine, 1-methyl-inosine, wyosine, wybutosine, 7-deaza-guanosine, 7-deaza-8-aza-guanosine, 6-thio-guanosine, 6-thio-7-deaza-guanosine, 6-thio-7-deaza-8-aza-guanosine, 7-methyl-guanosine, 6-thio-7-methyl-guanosine, 7-methylinosine, 6-methoxy-guanosine, 1-methylguanosine, N2-methylguanosine, N2,N2-dimethylguanosine, 8-oxo-guanosine, 7-methyl-8-oxo-guanosine, 1-methyl-6-thio-guanosine, N2-methyl-6-thio-guanosine, and N2,N2-dimethyl-6-thio-guanosine.
The sequence may or may not be uniformly modified along the entire length of the molecule. For example, one or more or all types of nucleotide (e.g., naturally-occurring nucleotides, purine or pyrimidine, or any one or more or all of A, G, U, C, I, pU) may or may not be uniformly modified in the sequence, or in a given predetermined sequence region thereof. In some embodiments, the sequence includes a pseudouridine. In some embodiments, the sequence includes an inosine, which may help the immune system recognize the sequence as endogenous rather than viral RNA. The incorporation of inosine may also mediate improved RNA stability/reduced degradation. See, for example, Yu, Z. et al. (2015) RNA editing by ADAR1 marks dsRNA as “self”. Cell Res. 25, 1283-1284, which is incorporated by reference in its entirety.
The present invention also provides a vector for expressing a Cas12i2 fusion protein described herein. Nucleic acids encoding a Cas12i2 fusion protein described herein may be incorporated into a vector. In some embodiments, a vector of the invention includes a nucleotide sequence encoding a Cas12i2 fusion protein described herein.
The present invention also provides a vector that may be used for preparation of a Cas12i2 fusion protein described herein or compositions comprising a Cas12i2 fusion protein described herein. In some embodiments, the invention includes the composition or vector described herein in a cell. In some embodiments, the invention includes a method of expressing a composition comprising a Cas12i2 fusion protein of the present invention, or vector or nucleic acid encoding the Cas12i2 fusion protein, in a cell. The method may comprise the steps of providing the Cas12i2 fusion protein, e.g., vector or nucleic acid, and delivering the Cas12i2 fusion protein to the cell.
Expression of natural or synthetic polynucleotides is typically achieved by operably linking a polynucleotide encoding the gene of interest, e.g., nucleotide sequence encoding a Cas12i2 fusion protein of the present invention, to a promoter and incorporating the construct into an expression vector. The expression vector is not particularly limited as long as it includes a polynucleotide encoding a Cas12i2 fusion protein of the present invention and can be suitable for replication and integration in eukaryotic cells.
Typical expression vectors include transcription and translation terminators, initiation sequences, and promoters useful for expression of the desired polynucleotide. For example, plasmid vectors carrying a recognition sequence for RNA polymerase (pSP64, pBluescript, etc.) may be used. Vectors including those derived from retroviruses such as lentivirus are suitable tools to achieve long-term gene transfer since they allow long-term, stable integration of a transgene and its propagation in daughter cells. Examples of vectors include expression vectors, replication vectors, probe generation vectors, and sequencing vectors. The expression vector may be provided to a cell in the form of a viral vector.
Viral vector technology is well known in the art and described in a variety of virology and molecular biology manuals. Viruses which are useful as vectors include, but are not limited to phage viruses, retroviruses, adenoviruses, adeno-associated viruses, herpes viruses, and lentiviruses. In general, a suitable vector contains an origin of replication functional in at least one organism, a promoter sequence, convenient restriction endonuclease sites, and one or more selectable markers.
The kind of the vector is not particularly limited, and a vector that can be expressed in host cells can be appropriately selected. To be more specific, depending on the kind of the host cell, a promoter sequence to ensure the expression of a nuclease of the present invention from a polynucleotide is appropriately selected, and this promoter sequence and the polynucleotide are inserted into any of various plasmids etc. for preparation of the expression vector.
Additional promoter elements, e.g., enhancing sequences, regulate the frequency of transcriptional initiation. Typically, these are located in the region 30-110 bp upstream of the start site, although a number of promoters have recently been shown to contain functional elements downstream of the start site as well. Depending on the promoter, it appears that individual elements can function either cooperatively or independently to activate transcription.
Further, the disclosure should not be limited to the use of constitutive promoters. Inducible promoters are also contemplated as part of the disclosure. The use of an inducible promoter provides a molecular switch capable of turning on expression of the polynucleotide sequence to which it is operatively linked when such expression is desired, or turning off the expression when expression is not desired. Examples of inducible promoters include, but are not limited to, a metallothionein promoter, a glucocorticoid promoter, a progesterone promoter, and a tetracycline promoter.
The expression vector to be introduced can also contain either a selectable marker gene or a reporter gene or both to facilitate identification and selection of expressing cells from the population of cells sought to be transfected or infected through viral vectors. In other aspects, the selectable marker may be carried on a separate piece of DNA and used in a co-transfection procedure. Both selectable markers and reporter genes may be flanked with appropriate transcriptional control sequences to enable expression in the host cells. Examples of such a marker include a dihydrofolate reductase gene and a neomycin resistance gene for eukaryotic cell culture; and a tetracycline resistance gene and an ampicillin resistance gene for culture of E. coli and other bacteria. By use of such a selection marker, it can be confirmed whether the polynucleotide encoding a nuclease of the present invention has been transferred into the host cells and then expressed without fail.
The preparation method for recombinant expression vectors is not particularly limited, and examples thereof include methods using a plasmid, a phage or a cosmid.
The Cas12i2 fusion protein described herein can be introduced into a variety of cells. In some embodiments, the cell is an isolated cell. In some embodiments, the cell is in cell culture. In some embodiments, the cell is ex vivo. In some embodiments, the cell is obtained from a living organism, and maintained in a cell culture. In some embodiments, the cell is a single-cellular organism.
In some embodiments, the cell is a prokaryotic cell. In some embodiments, the cell is a bacterial cell or derived from a bacterial cell. In some embodiments, the cell is an archaeal cell or derived from an archaeal cell.
In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a plant cell or derived from a plant cell. In some embodiments, the cell is a fungal cell or derived from a fungal cell. In some embodiments, the cell is an animal cell or derived from an animal cell. In some embodiments, the cell is an invertebrate cell or derived from an invertebrate cell. In some embodiments, the cell is a vertebrate cell or derived from a vertebrate cell. In some embodiments, the cell is a mammalian cell or derived from a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the cell is a zebra fish cell. In some embodiments, the cell is a rodent cell. In some embodiments, the cell is synthetically made, sometimes termed an artificial cell.
In some embodiments, the cell is derived from a cell line. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, 293T, MF7, K562, HeLa, and transgenic varieties thereof. Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, a cell transfected with one or more nucleic acids (such as a nuclease polypeptide-encoding vector and an RNA guide) is used to establish a new cell line comprising one or more vector-derived sequences, or to establish a new cell line comprising a modification to the target nucleic acid or target locus. In some embodiments, the cell is an immortal or immortalized cell.
In some embodiments, the cell is a primary cell. In some embodiments, the cell is a stem cell such as a totipotent stem cell (e.g., omnipotent), a pluripotent stem cell, a multipotent stem cell, an oligopotent stem cell, or a unipotent stem cell. In some embodiments, the cell is an induced pluripotent stem cell (iPSC) or derived from an iPSC. In some embodiments, the cell is a differentiated cell. For example, in some embodiments, the differentiated cell is a muscle cell (e.g., a myocyte), a fat cell (e.g., an adipocyte), a bone cell (e.g., an osteoblast, osteocyte, osteoclast), a blood cell (e.g., a monocyte, a lymphocyte, a neutrophil, an eosinophil, a basophil, a macrophage, an erythrocyte, or a platelet), a nerve cell (e.g., a neuron), an epithelial cell, an immune cell (e.g., a lymphocyte, a neutrophil, a monocyte, or a macrophage), a liver cell (e.g., a hepatocyte), a fibroblast, or a sex cell. In some embodiments, the cell is a terminally differentiated cell. For example, in some embodiments, the terminally differentiated cell is a neuronal cell, an adipocyte, a cardiomyocyte, a skeletal muscle cell, an epidermal cell, or a gut cell. In some embodiments, the cell is a mammalian cell, e.g., a human cell or a murine cell. In some embodiments, the murine cell is derived from a wild-type mouse, an immunosuppressed mouse, or a disease-specific mouse model.
In some embodiments, a Cas12i2 fusion protein of the present invention can be prepared by an in vitro coupled transcription-translation system. Bacteria that can be used for preparation of a Cas12i2 fusion protein of the present invention are not particularly limited as long as they can produce a Cas12i2 fusion protein of the present invention. Some non-limiting examples of the bacteria include E. coli cells described herein.
The present invention includes a method for protein expression, comprising translating a Cas12i2 fusion protein described herein.
In some embodiments, a host cell described herein is used to express a Cas12i2 fusion protein. The host cell is not particularly limited, and various known cells can be preferably used. Specific examples of the host cell include bacteria such as E. coli, yeasts (budding yeast, Saccharomyces cerevisiae, and fission yeast, Schizosaccharomyces pombe), nematodes (Caenorhabditis elegans), Xenopus laevis oocytes, and animal cells (for example, CHO cells, COS cells and HEK293 cells). The method for transferring the expression vector described above into host cells, i.e., the transformation method, is not particularly limited, and known methods such as electroporation, the calcium phosphate method, the liposome method and the DEAE dextran method can be used.
After a host is transformed with the expression vector, the host cells may be cultured, cultivated or bred, for production of a Cas12i2 fusion protein. After expression of the Cas12i2 fusion protein, the host cells can be collected and Cas12i2 fusion protein purified from the cultures etc. according to conventional methods (for example, filtration, centrifugation, cell disruption, gel filtration chromatography, ion exchange chromatography, etc.).
In some embodiments, the methods for Cas12i2 fusion protein expression comprise translation of at least 5 amino acids, at least 10 amino acids, at least 15 amino acids, at least 20 amino acids, at least 50 amino acids, at least 100 amino acids, at least 150 amino acids, at least 200 amino acids, at least 250 amino acids, at least 300 amino acids, at least 400 amino acids, at least 500 amino acids, at least 600 amino acids, at least 700 amino acids, at least 800 amino acids, at least 900 amino acids, or at least 1000 amino acids of a nuclease. In some embodiments, the methods for protein expression comprise translation of about 5 amino acids, about 10 amino acids, about 15 amino acids, about 20 amino acids, about 50 amino acids, about 100 amino acids, about 150 amino acids, about 200 amino acids, about 250 amino acids, about 300 amino acids, about 400 amino acids, about 500 amino acids, about 600 amino acids, about 700 amino acids, about 800 amino acids, about 900 amino acids, about 1000 amino acids, about 1100 amino acids, about 1200 amino acids, about 1300 amino acids, about 1400 amino acids, about 1500 amino acids, about 1600 amino acids, about 1700 amino acids, about 1800 amino acids, about 1900 amino acids, about 2000 amino acids, or more of a Cas12i2 fusion protein.
A variety of methods can be used to determine the level of production of a Cas12i2 fusion protein in a host cell. Such methods include, but are not limited to, for example, methods that utilize either polyclonal or monoclonal antibodies specific for a Cas12i2 fusion protein. Exemplary methods include, but are not limited to, enzyme-linked immunosorbent assays (ELISA), radioimmunoassays (RIA), fluorescent immunoassays (FIA), and fluorescent activated cell sorting (FACS). These and other assays are well known in the art (See, e.g., Maddox et al., J. Exp. Med. 158:1211 [1983]).
The present disclosure provides methods of in vivo expression of a Cas12i2 fusion protein in a cell, comprising providing a polyribonucleotide encoding the Cas12i2 fusion protein to a host cell wherein the polyribonucleotide encodes the Cas12i2 fusion protein, expressing the Cas12i2 fusion protein in the cell, and obtaining the Cas12i2 fusion protein from the cell.
Compositions described herein may be formulated, for example, including a carrier, such as a carrier and/or a polymeric carrier, e.g., a liposome, and delivered by known methods to a cell (e.g., a prokaryotic, eukaryotic, plant, or mammalian cell, etc.). Such methods include, but are not limited to, transfection (e.g., lipid-mediated, cationic polymers, calcium phosphate, dendrimers); electroporation or other methods of membrane disruption (e.g., nucleofection), viral delivery (e.g., lentivirus, retrovirus, adenovirus, AAV), microinjection, microprojectile bombardment (“gene gun”), fugene, direct sonic loading, cell squeezing, optical transfection, protoplast fusion, impalefection, magnetofection, exosome-mediated transfer, lipid nanoparticle-mediated transfer, and any combination thereof.
In some embodiments, the method comprises delivering one or more nucleic acids (e.g., nucleic acids encoding a Cas12i2 fusion protein, RNA guide, donor DNA, etc.), one or more transcripts thereof, and/or a pre-formed Cas12i2 fusion protein/RNA guide complex to a cell. Exemplary intracellular delivery methods include, but are not limited to: viruses or virus-like agents; chemical-based transfection methods, such as those using calcium phosphate, dendrimers, liposomes, or cationic polymers (e.g., DEAE-dextran or polyethylenimine); non-chemical methods, such as microinjection, electroporation, cell squeezing, sonoporation, optical transfection, impalefection, protoplast fusion, bacterial conjugation, delivery of plasmids or transposons; particle-based methods, such as using a gene gun, magnetofection or magnet assisted transfection, particle bombardment; and hybrid methods, such as nucleofection. In some embodiments, the present application further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells.
All references and publications cited herein are hereby incorporated by reference.
The following examples are provided to further illustrate some embodiments of the present invention but are not intended to limit the scope of the invention; it will be understood by their exemplary nature that other procedures, methodologies, or techniques known to those skilled in the art may alternatively be used.
This Example describes fusion protein activity (e.g., base editing or methylation) assessment on multiple targets using Cas12i2 fusion proteins introduced into mammalian cells by transient transfection.
The Cas12i2 fusion proteins described herein can be cloned into a pcDNA3.1 backbone (Invitrogen™). The plasmids can then be maxi-prepped and diluted. For RNA guide preparation, a dsDNA fragment encoding a crRNA can be derived by ultramers containing the target sequence scaffold, and the U6 promoter. Ultramers can be resuspended in Tris·HCl at a pH of 7.5. The amplification of the crRNA can be done using the aforementioned template, a forward primer, a reverse primer, NEB HiFi Polymerase, and water. Cycling conditions are: 1×(30 s at 98° C.), 30×(10 s at 98° C., 15 s at 67° C.), 1×(2 min at 72° C.). PCR products can be cleaned up with a 1.8× SPRI treatment and normalized to 25 ng/μL.
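As a rough illustration of the arithmetic implied by the cycling program and the normalization step above, the following Python sketch totals the programmed hold times and computes the diluent volume needed to bring a cleaned-up PCR product to 25 ng/μL. The function names, the example stock concentration, and the example volume are hypothetical and are not taken from this disclosure; ramp times between temperatures are ignored.

```python
# Illustrative sketch only. The cycling steps and the 25 ng/uL target come
# from the text; all names and example inputs below are assumptions.

# Each stage is (number of cycles, [hold times in seconds]).
CYCLING_PROGRAM = [
    (1, [30]),       # 1x (30 s at 98 C)
    (30, [10, 15]),  # 30x (10 s at 98 C, 15 s at 67 C)
    (1, [120]),      # 1x (2 min at 72 C)
]


def total_hold_seconds(program):
    """Sum the programmed hold times, ignoring temperature ramps."""
    return sum(cycles * sum(holds) for cycles, holds in program)


def diluent_volume_ul(stock_ng_per_ul, stock_volume_ul, target_ng_per_ul=25.0):
    """Volume of diluent to add so the sample reaches the target concentration.

    Uses C1 * V1 = C2 * V2, so V2 = C1 * V1 / C2 and diluent = V2 - V1.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    final_volume_ul = stock_ng_per_ul * stock_volume_ul / target_ng_per_ul
    return final_volume_ul - stock_volume_ul


if __name__ == "__main__":
    print(total_hold_seconds(CYCLING_PROGRAM))  # seconds of programmed holds
    print(diluent_volume_ul(100.0, 20.0))       # hypothetical 100 ng/uL, 20 uL
```

A product that measures below the target concentration cannot be normalized by dilution, which is why the helper raises an error in that case rather than returning a negative volume.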
Approximately 16 hours prior to transfection, 25,000 HEK293T cells in DMEM/10% FBS+Pen/Strep can be plated into each well of a 96-well plate. On the day of transfection, the cells are 70-90% confluent. For each well to be transfected, a mixture of Lipofectamine™ 2000 and Opti-MEM™ can be prepared and then incubated at room temperature for 5-20 minutes (Solution 1). After incubation, the Lipofectamine™:OptiMEM™ mixture can be added to a separate mixture containing Cas12i2 plasmid and crRNA and water (Solution 2). In the case of negative controls, the crRNA is not included in Solution 2. The Solution 1 and Solution 2 mixtures can be mixed by pipetting up and down and then incubated at room temperature for 25 minutes. Following incubation, the Solution 1 and Solution 2 mixture can be added dropwise to each well of a 96-well plate containing the cells. 72 hours post-transfection, cells can be trypsinized by adding 10 μL of TrypLE™ to the center of each well and incubated for approximately 5 minutes. 100 μL of D10 media can then be added to each well and mixed to resuspend cells. The cells can then be spun down, and the supernatant can be discarded. QuickExtract™ buffer can be added to the amount of the original cell suspension volume. Cells can be incubated at 65° C. for 15 minutes, 68° C. for 15 minutes, and 98° C. for 10 minutes.
Activity of a Cas12i2 fusion protein comprising a base editing domain can be monitored by next gen sequencing. Samples for Next Generation Sequencing can be prepared by two rounds of PCR. The first round (PCR1) is used to amplify specific genomic regions depending on the target. PCR1 products can be purified by column purification. Round 2 PCR (PCR2) can be done to add Illumina adapters and indexes. Reactions can then be pooled and purified by column purification. Sequencing runs can be done with a 150 cycle NextSeq v2.5 mid or high output kit.
Activity of a Cas12i2 fusion protein comprising a DNA methylation domain can be monitored, e.g., by methylation-specific PCR or whole-genome bisulfite sequencing.
This Example describes engineering and protein activity (e.g., indel activity) assessment of circularly permuted Cas12i2 polypeptides. The native amino and carboxy termini (residues 1 and 1,054) of the variant Cas12i2 polypeptide of SEQ ID NO: 40 were covalently linked with the following amino acid linker: GGSGGSGGSGGSGGS (SEQ ID NO: 71), and new N- and C-termini were introduced, thereby reorganizing the amino acid sequence of the protein. The positions of the new N- and C-termini relative to the amino acid positions of SEQ ID NO: 40 are shown in Table 5, and the sequences of the circularly permuted Cas12i2 polypeptides are shown in Table 6.
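The sequence rearrangement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name and the short toy parent sequence are hypothetical, the linker is the one recited in the text (SEQ ID NO: 71), and the sketch does not add an initiator methionine or any tag at the new N-terminus because the text does not say whether one was used. The actual termini positions are those of Table 5 and are not reproduced here.

```python
# Illustrative sketch of circular permutation at the sequence level.
# Only the linker comes from the text; everything else is an assumption.

LINKER = "GGSGGSGGSGGSGGS"  # SEQ ID NO: 71, as recited in the text


def circularly_permute(sequence, new_n_terminus, linker=LINKER):
    """Return the permuted sequence.

    `new_n_terminus` is the 1-based position in the parent sequence that
    becomes the first residue of the permuted protein. The parent C-terminus
    is joined to the parent N-terminus through `linker`, so the residue just
    before `new_n_terminus` becomes the new C-terminus.
    """
    if not 1 < new_n_terminus <= len(sequence):
        raise ValueError("new_n_terminus must be in 2..len(sequence)")
    cut = new_n_terminus - 1  # convert to a 0-based slice index
    return sequence[cut:] + linker + sequence[:cut]


if __name__ == "__main__":
    toy_parent = "MKTAYIAKQR"  # hypothetical 10-residue stand-in, not Cas12i2
    print(circularly_permute(toy_parent, 4))
```

Every residue of the parent is retained exactly once, so the permuted length is always the parent length plus the linker length; only the order of the residues changes.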
The variant Cas12i2 polypeptide of SEQ ID NO: 40 and the circularly permuted Cas12i2 polypeptides of SEQ ID NOs: 45-52 were cloned into a pcDNA3.1 backbone (Invitrogen™). RNA guides were cloned into a pUC19 backbone (New England Biolabs®). The plasmids were then maxi-prepped and diluted. The tested RNA guide and target sequences are shown in Table 7.
Approximately 16 hours prior to transfection, HEK293T cells in DMEM/10% FBS+Pen/Strep were plated into each well of a 96-well plate. On the day of transfection, the cells were 70-90% confluent. For each well to be transfected, a mixture of Lipofectamine™ 2000 and Opti-MEM™ was prepared and then incubated at room temperature for 5-20 minutes (Solution 1). After incubation, the Lipofectamine™:OptiMEM™ mixture was added to a separate mixture containing Cas12i2 plasmid and RNA guide plasmid and water (Solution 2). In the case of negative controls, the crRNA was not included in Solution 2. The Solution 1 and Solution 2 mixtures were mixed by pipetting up and down and then incubated at room temperature for 25 minutes. Following incubation, the Solution 1 and Solution 2 mixture was added dropwise to each well of a 96-well plate containing the cells. 72 hours post-transfection, cells were trypsinized by adding TrypLE™ to the center of each well and incubated for approximately 5 minutes. D10 media was then added to each well and mixed to resuspend cells. The cells were then spun down, and the supernatant was discarded. QuickExtract™ buffer was added to ⅕ the amount of the original cell suspension volume. Cells were incubated at 65° C. for 15 minutes, 68° C. for 15 minutes, and 98° C. for 10 minutes.
Samples for Next Generation Sequencing were prepared by two rounds of PCR. The first round (PCR1) was used to amplify specific genomic regions depending on the target. PCR1 products were purified by column purification. Round 2 PCR (PCR2) was done to add Illumina adapters and indexes. Reactions were then pooled and purified by column purification. Sequencing runs were done with a 150 cycle NextSeq v2.5 mid or high output kit.
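The text does not describe how indel activity was computed from the sequencing reads. Purely as an illustration of the kind of summary statistic involved, the Python sketch below estimates the percentage of amplicon reads that carry a net insertion or deletion by comparing each read length with the reference amplicon length. The function name and the toy sequences are hypothetical, and the length comparison is a deliberately crude stand-in: it ignores substitutions, misses a read whose insertions and deletions cancel in length, and assumes full-length, already-trimmed reads. A real analysis would typically align reads to the reference.

```python
# Illustrative sketch only; this is not the analysis method of the Example.
# All names and sequences are hypothetical.


def percent_indel_reads(reads, reference_amplicon):
    """Percentage of reads whose length differs from the reference length.

    A length difference is used here as a crude proxy for a net insertion
    or deletion; substitutions leave the length unchanged and are not counted.
    """
    if not reads:
        raise ValueError("no reads supplied")
    reference_length = len(reference_amplicon)
    indel_reads = sum(1 for read in reads if len(read) != reference_length)
    return 100.0 * indel_reads / len(reads)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    toy_reads = [
        "ACGTACGTAC",   # unedited
        "ACGTACGTAC",   # unedited
        "ACGTTCGTAC",   # substitution only, same length
        "ACGTCGTAC",    # 1-nt deletion
        "ACGTAACGTAC",  # 1-nt insertion
    ]
    print(percent_indel_reads(toy_reads, reference))
```

Comparing the value for a transfected sample with the value for the corresponding negative control (no RNA guide) would give a sense of background from sequencing or PCR error.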
This Example thus shows that circularly permuted Cas12i2 polypeptides function as nucleases. Therefore, the circularly permuted Cas12i2 polypeptides appeared to maintain domain structure despite rearrangement of the amino acid sequences.
It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
This application claims priority to U.S. Provisional Application 63/139,651 filed on Jan. 20, 2021, U.S. Provisional Application 63/227,404 filed on Jul. 30, 2021, and U.S. Provisional Application 63/270,512 filed on Oct. 21, 2021, the entire contents of each of which are hereby incorporated by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/013133 | 1/20/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63270512 | Oct 2021 | US | |
63227404 | Jul 2021 | US | |
63139651 | Jan 2021 | US |