COMPOSITIONS COMPRISING A CAS12I POLYPEPTIDE AND USES THEREOF

Information

  • Patent Application
  • 20230287456
  • Publication Number
    20230287456
  • Date Filed
    September 09, 2022
  • Date Published
    September 14, 2023
Abstract
The present invention relates to compositions comprising a Cas12i polypeptide, a deaminase polypeptide, and an RNA guide, processes for characterizing the compositions, cells comprising the compositions, Cas12i fusion proteins, Cas12i complexes, and methods of using the compositions.
Description
SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in XML file format and is hereby incorporated by reference in its entirety. Said XML copy, created on Nov. 17, 2022, is named A2186-704110_SL.xml and is 184,137 bytes in size. No new matter has been added.


BACKGROUND

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) genes, collectively known as CRISPR-Cas or CRISPR/Cas systems, are adaptive immune systems in archaea and bacteria that defend particular species against foreign genetic elements.


SUMMARY OF THE INVENTION

It is against the above background that the present invention provides certain advantages and advancements over the prior art. Although the invention disclosed herein is not limited to specific advantages or functionalities, the invention provides Cas12i fusion proteins, compositions, systems, and methods of using the Cas12i fusion proteins. In particular, such Cas12i fusion proteins contain one or more domains, wherein at least one of the domains is a deaminase domain and wherein at least one of the domains is a Cas12i domain or biologically active portion thereof. The Cas12i domain in the Cas12i fusion proteins may bind to a target sequence on a target nucleic acid specified by an RNA guide. While the amino acid numbering system used herein is in relation to SEQ ID NO: 2, other Cas12i sequences can be used. One of ordinary skill in the art can identify the corresponding amino acid positions in other Cas12i sequences using available tools, such as sequence alignment algorithms.
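As an illustration of how a position numbered against a reference sequence might be carried over to another sequence, the following minimal sketch walks two rows of an already computed pairwise alignment. The function name `map_position` and the toy alignment rows are hypothetical and are not taken from the Sequence Listing; producing the alignment itself is left to whichever alignment tool is used.

```python
from typing import Optional


def map_position(aligned_ref: str, aligned_other: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based residue position in the reference to the other sequence.

    Both inputs are rows of the same pairwise alignment, with '-' as the gap
    character. Returns the 1-based position in the other sequence, or None if
    the reference residue is aligned to a gap or ref_pos is out of range.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned rows must have equal length")
    ref_count = 0
    other_count = 0
    for ref_char, other_char in zip(aligned_ref, aligned_other):
        if ref_char != "-":
            ref_count += 1
        if other_char != "-":
            other_count += 1
        # Stop at the alignment column that holds the requested reference residue.
        if ref_char != "-" and ref_count == ref_pos:
            return other_count if other_char != "-" else None
    return None


# Toy alignment rows (hypothetical, for illustration only).
REF_ROW = "MKD-GEL"
OTHER_ROW = "M-DAGEL"

if __name__ == "__main__":
    for pos in (1, 2, 3, 4):
        print(pos, "->", map_position(REF_ROW, OTHER_ROW, pos))
```

In this toy case the reference residue at position 2 has no counterpart in the other sequence, which is why the function returns None rather than guessing a neighbouring position.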


In one aspect, the disclosure provides a Cas12i fusion protein comprising:

    • i) a Cas12i polypeptide comprising an alteration (e.g., comprising a plurality of alterations) relative to the amino acid sequence of SEQ ID NO: 2, wherein the alteration is selected from the group comprising G587R, G624R, F626R, E833Q, E833N, D1019K, D1019N, D581R, D911R, I926R, V1030G, E1035R, S1046G, and P868T, and wherein the Cas12i2 polypeptide comprises at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% identity to the amino acid sequence of SEQ ID NO: 2; and
    • ii) a heterologous sequence comprising a deaminase domain.


In one aspect, the disclosure provides a Cas12i fusion protein comprising:

    • i) a Cas12i polypeptide comprising an alteration (e.g., comprising a plurality of alterations) relative to the amino acid sequence of SEQ ID NO: 2, wherein the alteration (e.g., plurality of alterations) is selected from the group comprising D581R, D911R, I926R, V1030G, S1046G, G624R, F626R, E1035R, and P868T, wherein the Cas12i2 polypeptide comprises at least 90% identity to the amino acid sequence of SEQ ID NO: 2; and
    • ii) a heterologous sequence comprising a deaminase domain.


In one aspect, the disclosure provides a Cas12i fusion protein comprising:

    • i) a Cas12i polypeptide comprising an alteration (e.g., comprising a plurality of alterations) relative to the amino acid sequence of SEQ ID NO: 2, wherein the alteration (e.g., plurality of alterations) is selected from the group comprising D581R, D911R, I926R, V1030G, S1046G, G624R, F626R, E1035R, and P868T, wherein the Cas12i2 polypeptide comprises at least 90% identity to the amino acid sequence of SEQ ID NO: 2, and wherein the Cas12i polypeptide has reduced nuclease activity or is a nuclease dead Cas12i polypeptide; and
    • ii) a heterologous sequence comprising a deaminase domain.


In some embodiments, the Cas12i polypeptide comprises one or more alterations in a catalytic residue, wherein the one or more catalytic residues are selected from D599, E833, and D1019.


In certain embodiments, the Cas12i polypeptide comprises one or more alterations in a catalytic residue, wherein the one or more alterations are selected from D599A, D599K, E833Q, E833N, D1019K, and D1019N.


In some embodiments, the alteration in a catalytic residue comprises D599A. In certain embodiments, the alteration in a catalytic residue comprises D599K. In some embodiments, the alteration in a catalytic residue comprises E833Q. In one embodiment, the alteration in a catalytic residue comprises E833N. In certain embodiments, the alteration in a catalytic residue comprises D1019K. In some embodiments, the alteration in a catalytic residue comprises D1019N.


In one embodiment, the one or more alterations in a catalytic residue comprise D1019K and D599K.


In certain embodiments, the one or more alterations in the catalytic residue comprise D1019N and D599K.


In one embodiment, the one or more alterations in the catalytic residue comprise D1019K, E833N, and D599K.


In certain embodiments, the plurality of alterations further comprises G587R.


In some embodiments, the alteration comprises G624R. In some embodiments, the alteration comprises F626R. In some embodiments, the alteration comprises D581R. In certain embodiments, the alteration comprises D911R. In some embodiments, the alteration comprises I926R. In certain embodiments, the alteration comprises V1030G. In some embodiments, the alteration comprises S1046G. In certain embodiments, the alteration comprises E1035R. In one embodiment, the alteration comprises P868T.


In certain embodiments, the plurality of alterations further comprise a second alteration relative to the amino acid sequence of SEQ ID NO: 2.


In certain embodiments, the second alteration comprises a substitution, insertion, or deletion.


In some embodiments, the Cas12i polypeptide further comprises a third alteration relative to the amino acid sequence of SEQ ID NO: 2, and optionally further comprises a fourth, a fifth, a sixth, a seventh, eighth, ninth, and tenth alteration relative to the amino acid sequence of SEQ ID NO: 2.


In certain embodiments, the third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth alterations, if present, each independently comprises a substitution, insertion, or deletion.


In some embodiments, the plurality of alterations comprise one or more of (e.g., 2, 3, or all of) D581R, D911R, I926R, and V1030G.


In certain embodiments, the plurality of alterations comprise one or more of (e.g., 2 or all of) D581R, I926R, and V1030G.


In certain embodiments, the plurality of alterations comprise one or more of (e.g., 2, 3, or all of) D581R, I926R, V1030G, and S1046G.


In some embodiments, the plurality of alterations comprise one or more of (e.g., 2, 3, 4, 5, 6, or all of) D581R, G624R, F626R, I926R, V1030G, E1035R, and S1046G.


In certain embodiments, the plurality of alterations comprise one or more of (e.g., 2, 3, 4, 5, 6, or all of) D581R, G624R, F626R, P868T, I926R, V1030G, E1035R, and S1046G.


In one embodiment, the plurality of alterations comprise:

    • i) D581R, D911R, I926R, and V1030G;
    • ii) D581R, I926R, and V1030G;
    • iii) D581R, I926R, V1030G, and S1046G;
    • iv) D581R, G624R, F626R, I926R, V1030G, E1035R, and S1046G; or
    • v) D581R, G624R, F626R, P868T, I926R, V1030G, E1035R, and S1046G.
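The substitution labels used above follow the usual pattern of wild-type residue, 1-based position, and replacement residue (for example, D581R replaces the aspartate at position 581 with arginine). The sketch below shows one way such labels could be parsed and applied to a sequence string. The helper names and the five-residue toy sequence are hypothetical; SEQ ID NO: 2 itself is not reproduced here.

```python
import re
from typing import Iterable, Tuple

# One-letter wild-type residue, 1-based position, one-letter replacement residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Tuple[str, int, str]:
    """Split a label such as 'D581R' into ('D', 581, 'R')."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence: str, labels: Iterable[str]) -> str:
    """Return a copy of sequence with every labelled substitution applied."""
    residues = list(sequence)
    for label in labels:
        wild_type, position, replacement = parse_substitution(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        # Guard against numbering a label against the wrong reference.
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: found {residues[position - 1]} at that position")
        residues[position - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    toy_reference = "MDGIV"  # hypothetical toy sequence, not SEQ ID NO: 2
    print(apply_substitutions(toy_reference, ["D2R", "I4R", "V5G"]))
```

The wild-type check is a design choice: it makes the sketch fail loudly when a label is applied to a sequence whose numbering does not match the reference the label was written against.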


In certain embodiments, the Cas12i polypeptide comprises at least 95% or 99% identity to the amino acid sequence of SEQ ID NO: 2.
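Percent identity thresholds such as those recited above depend on how identity is counted. The following minimal sketch assumes one common convention, identical aligned residues divided by the total number of alignment columns, and takes two rows of an existing pairwise alignment as input. The convention, the function names, and the toy rows are assumptions made for illustration and are not taken from the specification.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over all alignment columns ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both rows hold the same residue.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the two aligned rows reach at least the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy aligned rows (hypothetical): five of six columns are identical.
    print(round(percent_identity("MKDGEL", "MKDAEL"), 2))
    print(meets_threshold("MKDGEL", "MKDAEL", 80.0))
```

Other conventions, for example dividing by the length of the shorter sequence or ignoring gap columns, give different values for the same alignment, so the denominator should be stated whenever a threshold is applied.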


In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 41, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.


In some embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 42, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.


In one embodiment, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 43, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.


In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 44, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.


In some embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 45, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.


In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 46, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.


In one aspect, the disclosure provides a Cas12i polypeptide comprising an alteration relative to the amino acid sequence of SEQ ID NO: 2, wherein the alteration is selected from D1019K or D1019N.


In one aspect, the disclosure provides a Cas12i fusion protein comprising the Cas12i polypeptide of the immediate preceding aspect and a heterologous sequence comprising a deaminase domain.


In one aspect, the disclosure provides a Cas12i fusion protein comprising:

    • i) a Cas12i polypeptide comprising an alteration (e.g., a plurality of alterations) relative to the amino acid sequence of SEQ ID NO: 9, wherein the alteration (e.g., plurality of alterations) is selected from the group comprising E480R, G564R, V592R, or E1042R, wherein the Cas12i polypeptide comprises at least 90% identity to the amino acid sequence of SEQ ID NO: 9, and wherein the Cas12i polypeptide has reduced nuclease activity or is a nuclease dead Cas12i polypeptide; and
    • ii) a heterologous sequence comprising a deaminase domain.


In some embodiments, the alteration comprises E480R. In one embodiment, the alteration comprises G564R. In certain embodiments, the alteration comprises V592R. In some embodiments, the alteration comprises E1042R.


In certain embodiments, the Cas12i polypeptide comprises an alteration in a catalytic residue, wherein optionally the alteration comprises an alteration at one or more of D608 (e.g., D608A), E844, and D1022.


In certain embodiments, the Cas12i polypeptide further comprises a second alteration relative to the amino acid sequence of SEQ ID NO: 9.


In some embodiments, the second alteration comprises a substitution, insertion, or deletion.


In certain embodiments, the Cas12i polypeptide further comprises a third alteration, and optionally further comprises a fourth, a fifth, a sixth, a seventh, eighth, ninth, and tenth alteration.


In certain embodiments, the third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth alterations, if present, each independently comprises a substitution, insertion, or deletion.


In certain embodiments, the plurality of alterations comprise E480R, G564R, V592R, and E1042R.


In some embodiments, the Cas12i polypeptide further comprises an alteration in a catalytic residue, wherein the alteration comprises D608A.


In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 60, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.


In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 61, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.


In some embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 62, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.


In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 63, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.


In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 64, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.


In certain embodiments, the heterologous sequence is N-terminal or C-terminal of the Cas12i polypeptide. In some embodiments, the heterologous sequence is N-terminal of the Cas12i polypeptide. In certain embodiments, the heterologous sequence is C-terminal of the Cas12i polypeptide.


In some embodiments, the deaminase domain is chosen from a human APOBEC3 family deaminase, an Activation Induced Deaminase (AID), or an ABE8 deaminase, or a functional fragment or variant thereof, or a deaminase domain having at least 80% amino acid sequence identity thereto.


In certain embodiments, the human APOBEC3 family deaminase is A3A comprising an amino acid sequence of SEQ ID NO: 29, the AID deaminase comprises an amino acid sequence of SEQ ID NO: 28, or the ABE8 is ABE8_20 (SEQ ID NO: 30), or a functional fragment or variant thereof, or a deaminase domain having at least 80% amino acid sequence identity thereto.


In some embodiments, the deaminase domain is chosen from human APOBEC3a (A3A; SEQ ID NO: 29) or Activation Induced Deaminase (AID; SEQ ID NO: 28), or a functional fragment or variant thereof, or a deaminase domain having at least 80% amino acid sequence identity thereto.


In certain embodiments, the deaminase domain is chosen from an APOBEC3 family deaminase or ABE8_20, or a functional fragment or variant thereof, or a deaminase domain having at least 80% amino acid sequence identity thereto.


In certain embodiments, the heterologous sequence further comprises at least one peptide linker. In some embodiments, the peptide linker comprises between 3 and 70 (e.g., 3-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, 45-50, 50-55, 55-60, 60-65, or 65-70) amino acid residues. In certain embodiments, the peptide linker comprises one or more Gly residues and one or more Ser residues. In some embodiments, the peptide linker comprises (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In certain embodiments, the peptide linker comprises one or more proline residues.


In some embodiments, the peptide linker comprises the structure of:

L1-L2-L3

    • wherein L1 and L3 are each independently chosen from (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15), and
    • L2 is a polypeptide comprising between 0-40 (e.g., 0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, or 35-40) amino acid residues. In certain embodiments, L2 is an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SGSETPGTSESATPES (SEQ ID NO: 106). In certain embodiments, the peptide linker comprises an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 40.
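The L1-L2-L3 linker structure described above can be sketched as simple string assembly. The example below assumes that (GSG)x denotes x tandem copies of the GSG unit, with x = 0 giving an empty segment, and uses the L2 sequence quoted in the text (SEQ ID NO: 106) as one possible middle segment. The function names are hypothetical and the sketch checks only the numeric ranges recited above.

```python
REPEAT_UNITS = ("GSG", "GGGS", "GSSG")
L2_EXAMPLE = "SGSETPGTSESATPES"  # sequence quoted in the text as SEQ ID NO: 106


def repeat_linker(unit: str, x: int) -> str:
    """Return x tandem copies of a repeat unit, e.g. (GGGS)2 -> 'GGGSGGGS'."""
    if unit not in REPEAT_UNITS:
        raise ValueError(f"unsupported repeat unit: {unit!r}")
    if not 0 <= x <= 15:
        raise ValueError("x must be an integer between 0 and 15")
    return unit * x


def compose_linker(l1: str, l2: str, l3: str) -> str:
    """Join the three segments in L1-L2-L3 order."""
    if len(l2) > 40:
        raise ValueError("L2 is described as comprising between 0 and 40 residues")
    return l1 + l2 + l3


if __name__ == "__main__":
    linker = compose_linker(repeat_linker("GGGS", 2), L2_EXAMPLE, repeat_linker("GSG", 1))
    print(linker, len(linker))
```

Because x may be 0 and L2 may be empty, the same two functions also cover linkers built from a single repeat segment.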


In some embodiments, the Cas12i fusion protein does not comprise a linker sequence.


In some embodiments, the heterologous sequence is heterologous to both the Cas12i polypeptide and the deaminase domain.


In certain embodiments, the heterologous sequence comprises a Uracil Glycosylase Inhibitor (UGI) polypeptide.


In some embodiments, the heterologous sequence comprises a nuclear localization sequence (NLS) polypeptide.


In some embodiments, the Cas12i fusion protein forms a complex with a ribonucleic acid (RNA) guide wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).


In one aspect, the disclosure provides a method of introducing a substitution in a target sequence of a target nucleic acid in a cell, the method comprising:

    • contacting the cell with: (i) a Cas12i domain, (ii) an RNA guide, and (iii) a deaminase domain, or nucleic acid encoding (i), (ii), and (iii),
    • wherein the target nucleic acid comprises a nucleotide position 0, wherein nucleotide position 0 of the target strand hybridizes with the direct repeat-proximal end of the spacer sequence of the RNA guide, and a position x (wherein optionally x=20), wherein position x of the target strand hybridizes with the direct repeat-distal end of the spacer sequence of the RNA guide,
    • and wherein the target nucleic acid comprises an A or a C between positions 5-16 (e.g., between positions 7-12, e.g., between positions 8-11) on the target strand or the non-target strand,
    • wherein the A is substituted to an inosine (I) (e.g., converts an A:T base pair to an I:C, I:U, or I:A base pair) or a guanine (G) or the C is substituted to a U or T (e.g., converts a C:G base pair to a T:A base pair).
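The position numbering in the method above can be illustrated with a short sketch that lists the A and C bases falling inside a chosen window of a strand. It assumes the strand is supplied as a string whose index 0 corresponds to nucleotide position 0 (the direct repeat-proximal end) and that the window bounds are inclusive; both are assumptions made for illustration, as are the function name and the toy sequence.

```python
from typing import List, Tuple


def editable_positions(strand: str, start: int = 5, end: int = 16,
                       bases: str = "AC") -> List[Tuple[int, str]]:
    """List (position, base) pairs for A or C bases within the window.

    Index 0 of the string is treated as nucleotide position 0; the window
    [start, end] is treated as inclusive (an assumption of this sketch).
    """
    return [
        (position, base)
        for position, base in enumerate(strand.upper())
        if start <= position <= end and base in bases
    ]


if __name__ == "__main__":
    # Toy 21-nucleotide strand (positions 0-20): A at 5, C at 9, A at 17.
    toy = "GGGGG" + "A" + "GGG" + "C" + "GGGGGGG" + "A" + "GGG"
    print(editable_positions(toy))           # default window, positions 5-16
    print(editable_positions(toy, 8, 11))    # narrower window, positions 8-11
```

The narrower windows mentioned in the text (for example positions 7-12 or 8-11) can be explored by passing different `start` and `end` values.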


In one aspect, the disclosure provides a method of introducing a substitution in a target sequence of a target nucleic acid in a cell, the method comprising:

    • contacting the cell with (i) a Cas12i fusion protein described herein, and (ii) an RNA guide, or a nucleic acid encoding (i) and (ii),
    • thereby introducing the substitution.


In certain embodiments, the cell is in vivo.


In some embodiments, the cell is ex vivo.


In one aspect, the disclosure provides a composition comprising:

    • a) the Cas12i fusion protein described herein; and
    • b) an RNA guide or a nucleic acid encoding the RNA guide, wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).


In some embodiments of the aspects or embodiments described herein, the spacer sequence is about 10 nucleotides to about 50 nucleotides in length. In some embodiments, the spacer sequence is about 15 nucleotides to about 35 nucleotides in length.


In certain embodiments, the spacer sequence is substantially identical to a target sequence of a target nucleic acid.


In some embodiments, the target sequence is adjacent to a protospacer adjacent motif (PAM) sequence. In certain embodiments, the PAM sequence comprises a sequence set forth as 5′-NTTN-3′, wherein N is any nucleotide.
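A 5′-NTTN-3′ motif can be located in a DNA string with a regular expression in which N matches any of the four bases. The sketch below reports the start index of every occurrence, including overlapping ones, on the strand as written. The function name and the toy sequence are hypothetical, and the sketch does not decide which side of the PAM the target sequence lies on, since the passage above states only that the two are adjacent.

```python
import re
from typing import List

# Zero-width lookahead so that overlapping NTTN motifs are all reported.
_PAM = re.compile(r"(?=([ACGT]TT[ACGT]))")


def find_pam_sites(dna: str) -> List[int]:
    """Return 0-based start indices of every 5'-NTTN-3' motif in dna."""
    return [match.start() for match in _PAM.finditer(dna.upper())]


if __name__ == "__main__":
    toy = "GATTCAAATTTG"  # hypothetical toy sequence
    for start in find_pam_sites(toy):
        print(start, toy[start:start + 4])
```

Scanning the reverse complement in the same way would cover motifs on the opposite strand.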


In one aspect, the disclosure provides a Cas12i fusion protein comprising, in an N-terminal to C-terminal direction:

    • (a) an N-terminal portion of a Cas12i polypeptide, wherein the N-terminal portion of the Cas12i polypeptide comprises a Cas12i sequence from the N-terminus to a loop, or a functional fragment or variant thereof;
    • (b) a heterologous sequence comprising a deaminase domain, and
    • (c) a C-terminal portion of the Cas12i polypeptide, wherein the C-terminal portion of the Cas12i polypeptide comprises a Cas12i sequence from the loop to the C-terminus, or a fragment or variant thereof.


In some embodiments, the N-terminal portion of the Cas12i polypeptide comprises amino acids 1-n of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and

    • the C-terminal portion of the Cas12i polypeptide comprises amino acids m-1054 of SEQ ID NO: 2, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


In some embodiments, n and m are each independently a number between:

    • i) 342-358 (e.g., 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358);
    • ii) 373-378 (e.g., 373, 374, 375, 376, 377, or 378);
    • iii) 408-413 (e.g., 408, 409, 410, 411, 412, or 413);
    • iv) 677-685 (e.g., 677, 678, 679, 680, 681, 682, 683, 684, or 685);
    • v) 718-723 (e.g., 718, 719, 720, 721, 722, or 723);
    • vi) 771-782 (e.g., 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782);
    • vii) 953-965 (e.g., 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965);
    • viii) 55-65 (e.g., 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, or 65);
    • ix) 99-105 (e.g., 99, 100, 101, 102, 103, 104, or 105);
    • x) 112-120 (e.g., 112, 113, 114, 115, 116, 117, 118, 119, or 120);
    • xi) 195-206 (e.g., 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, or 206);
    • xii) 241-250 (e.g., 241, 242, 243, 244, 245, 246, 247, 248, 249, or 250);
    • xiii) 583-594 (e.g., 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, or 594);
    • xiv) 877-901 (e.g., 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, or 901); or
    • xv) 386-397 (e.g., residue 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397).


In certain embodiments, n<m. In some embodiments, m=n+1.
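The arrangement described above, an N-terminal portion ending at residue n, an inserted heterologous sequence, and a C-terminal portion starting at residue m, can be sketched as string slicing with 1-based residue numbers. The function name and the toy strings are hypothetical; when m = n + 1 no residues of the original polypeptide are dropped, and when m > n + 1 the residues between n and m are omitted.

```python
from typing import Optional


def insert_domain(cas_sequence: str, insert: str, n: int, m: Optional[int] = None) -> str:
    """Build N-terminal portion (1..n) + insert + C-terminal portion (m..end).

    n and m are 1-based residue numbers; m defaults to n + 1.
    """
    if m is None:
        m = n + 1
    if not 1 <= n < m <= len(cas_sequence):
        raise ValueError("require 1 <= n < m <= length of the sequence")
    n_terminal = cas_sequence[:n]       # residues 1..n
    c_terminal = cas_sequence[m - 1:]   # residues m..end
    return n_terminal + insert + c_terminal


if __name__ == "__main__":
    toy = "ABCDEFGH"  # hypothetical stand-in for a Cas12i sequence
    print(insert_domain(toy, "xx", 3))       # m = n + 1, nothing removed
    print(insert_domain(toy, "xx", 3, 5))    # residue 4 omitted
```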


In particular embodiments, the Cas12i polypeptide is a Cas12i4 polypeptide.


In some embodiments, the heterologous sequence comprises at least one linker (e.g., any linker described herein).


In certain embodiments, the heterologous sequence comprises a first linker (e.g., a first peptide linker) and a second linker (e.g., a second peptide linker). In some embodiments, the first linker and the second linker each independently comprise between 3 and 70 (e.g., 3-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, 45-50, 50-55, 55-60, 60-65, or 65-70) amino acid residues. In certain embodiments, the first linker and the second linker each independently comprise (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In some embodiments, the first linker and the second linker independently comprise an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 40. In some embodiments, the first linker and the second linker each independently comprise one or more proline residues. In certain embodiments, the first linker is N-terminal of the deaminase domain and the second linker is C-terminal of the deaminase domain. In some embodiments, the first linker and the second linker have the same sequence. In certain embodiments, the first linker and the second linker have different sequences.


In one aspect, the disclosure provides a fusion protein comprising:

    • (a) a Cas12i4 polypeptide; and
    • (b) a deaminase domain chosen from APOBEC3 or ABE8_20, or a biologically active portion or variant thereof.


In one embodiment, the deaminase domain is N-terminal or C-terminal of the Cas12i4 polypeptide. In certain embodiments, the deaminase domain is N-terminal of the Cas12i4 polypeptide. In certain embodiments, the deaminase domain is C-terminal of the Cas12i4 polypeptide. In some embodiments, the fusion protein does not comprise a linker sequence.


In certain embodiments, the fusion protein further comprises at least one heterologous sequence, which is heterologous to both the Cas12i4 domain and the deaminase domain.


In certain embodiments, the heterologous sequence comprises a Uracil Glycosylase Inhibitor (UGI) polypeptide.


In some embodiments, the heterologous sequence comprises a nuclear localization sequence (NLS) polypeptide.


In some embodiments, the fusion protein comprises one, two, or three of:

    • i) a first heterologous sequence situated between the Cas12i4 domain and the deaminase domain;
    • ii) a second heterologous sequence situated between the Cas12i4 domain and the terminus nearest the Cas12i4 domain; or
    • iii) a third heterologous sequence situated between the deaminase domain and the terminus nearest the deaminase domain.
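The three heterologous sequence positions defined above imply an N-terminal to C-terminal order that depends on which side of the Cas12i4 domain the deaminase domain sits. The following minimal sketch derives that order from the definitions alone: the first sequence lies between the two domains, the second lies on the Cas12i4 side facing its nearest terminus, and the third lies on the deaminase side facing its nearest terminus. The function name and the string labels are hypothetical placeholders rather than sequences.

```python
from typing import List, Optional


def domain_order(deaminase_is_n_terminal: bool,
                 first: Optional[str] = None,
                 second: Optional[str] = None,
                 third: Optional[str] = None) -> List[str]:
    """Return the N-terminal to C-terminal order of the present elements."""
    if deaminase_is_n_terminal:
        # terminus - third - deaminase - first - Cas12i4 - second - terminus
        slots = [third, "deaminase", first, "Cas12i4", second]
    else:
        # terminus - second - Cas12i4 - first - deaminase - third - terminus
        slots = [second, "Cas12i4", first, "deaminase", third]
    return [slot for slot in slots if slot is not None]


if __name__ == "__main__":
    print(domain_order(True, first="UGI", second="NLS", third="NLS"))
    print(domain_order(False, first="linker", third="NLS"))
```

Omitted arguments model the cases in which only one or two of the three heterologous sequences are present.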


In some embodiments, the deaminase domain is N-terminal of the Cas12i4 domain and the UGI domain. In certain embodiments, the deaminase domain is C-terminal of the Cas12i4 domain.


In some embodiments, the deaminase domain is N-terminal of the Cas12i4 domain, the first heterologous sequence comprises the UGI domain, the second heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide), and the third heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide).


In some embodiments, the deaminase domain is C-terminal of the Cas12i4 domain, the first heterologous sequence comprises a linker, the second heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide) and the UGI domain, and the third heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide).


In some embodiments, the deaminase domain is C-terminal of the Cas12i4 domain, the first heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide), the second heterologous sequence is absent or comprises a peptide tag (e.g., a peptide purification tag), and the third heterologous sequence comprises a UGI domain and an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide).


In certain embodiments, the first heterologous sequence comprises the UGI polypeptide. In some embodiments, the UGI polypeptide is flanked by peptide linkers.


In some embodiments, the second and third heterologous sequences each independently comprise an NLS polypeptide.


In some embodiments, the first heterologous sequence comprises a peptide linker and, when present, the second heterologous sequence or the third heterologous sequence comprises an NLS polypeptide, one or more linkers, and a UGI polypeptide.


In certain embodiments, the NLS polypeptide is N-terminal of the UGI polypeptide. In certain embodiments, the NLS polypeptide is C-terminal of the UGI polypeptide.


In some embodiments, one of the one or more linkers is situated between the NLS polypeptide and the UGI polypeptide.


In certain embodiments, the third heterologous sequence comprises a first and a second linker, wherein the first linker is situated N-terminal of the NLS polypeptide and the UGI polypeptide and the second linker is situated between the NLS polypeptide and the UGI polypeptide.


In some embodiments, the first heterologous sequence further comprises an NLS sequence. In certain embodiments, the NLS polypeptide is situated N-terminal of the linker.


In some embodiments, the fusion protein does not comprise the second heterologous sequence.


In one aspect, the disclosure provides a fusion protein comprising:

    • (a) a Cas12i4 polypeptide;
    • (b) a deaminase domain; and
    • (c) a UGI polypeptide.


In some embodiments, the deaminase domain is N-terminal or C-terminal of the Cas12i4 polypeptide. In some embodiments, the deaminase domain is N-terminal of the Cas12i4 polypeptide. In certain embodiments, the deaminase domain is C-terminal of the Cas12i4 polypeptide.


In some embodiments, the fusion protein does not comprise a linker sequence.


In certain embodiments, the fusion protein further comprises at least one heterologous sequence, which is heterologous to each of the Cas12i4 domain, the deaminase domain, and the UGI polypeptide.


In one embodiment, the heterologous sequence comprises a nuclear localization sequence (NLS) polypeptide.


In certain embodiments, the fusion protein comprises one, two, or three of:

    • i) a first heterologous sequence situated between the Cas12i4 domain and the deaminase domain;
    • ii) a second heterologous sequence situated between the Cas12i4 domain and the terminus nearest the Cas12i4 domain; or
    • iii) a third heterologous sequence situated between the deaminase domain and the terminus nearest the deaminase domain.


In certain embodiments, the deaminase domain is N-terminal of the Cas12i4 domain and the UGI domain.


In some embodiments, the deaminase domain is C-terminal of the Cas12i4 domain.


In certain embodiments, the fusion protein does not comprise the first heterologous sequence, and the UGI domain is situated between the deaminase domain and the Cas12i4 domain.


In some embodiments, the UGI domain is situated C-terminal of both the deaminase domain and the Cas12i4 domain.


In certain embodiments, the UGI domain is flanked by peptide linkers.


In certain embodiments, when present, the first and second heterologous sequences each independently comprise an NLS polypeptide.


In some embodiments, the one, two, or three of the first, the second, and the third heterologous sequence each independently comprise one or more peptide linkers and the third heterologous sequence comprises an NLS polypeptide. In some embodiments, the NLS polypeptide is N-terminal of the UGI polypeptide. In certain embodiments, the NLS polypeptide is C-terminal of the UGI polypeptide.


In some embodiments, at least one (e.g., one) of the one or more of the linkers is situated between the NLS polypeptide and the UGI polypeptide.


In certain embodiments, the third heterologous sequence comprises a first and a second linker, wherein the first linker is situated N-terminal of the NLS polypeptide and the UGI polypeptide and the second linker is situated between the NLS polypeptide and the UGI polypeptide.


In some embodiments, the first heterologous sequence further comprises an NLS sequence. In certain embodiments, the NLS polypeptide is situated N-terminal of the linker.


In some embodiments, the NLS polypeptide is selected from a nuclear plasma NLS (npNLS) polypeptide or a bipartite NLS (bpNLS) polypeptide. In certain embodiments, the fusion protein comprises an npNLS polypeptide and a bpNLS polypeptide.


In some embodiments, the npNLS polypeptide is situated N-terminal of the bpNLS polypeptide.


In certain embodiments, the npNLS polypeptide is situated C-terminal of the bpNLS polypeptide.


In some embodiments, the npNLS polypeptide comprises an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 36. In certain embodiments, the bpNLS polypeptide comprises an amino acid sequence having at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 38.


In some embodiments, the npNLS polypeptide comprises an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 36, and the bpNLS polypeptide comprises an amino acid sequence having at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 38.


In certain embodiments, each peptide linker independently comprises between 2 and 200 amino acid residues. In some embodiments, each peptide linker independently comprises one or more Gly residues and one or more Ser residues. In certain embodiments, each peptide linker independently comprises (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15).


In particular embodiments, the peptide linker comprises the structure of:

L1-L2-L3

    • wherein L1 and L3 are each independently chosen from (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15), and
    • L2 is a polypeptide comprising between 0-40 (e.g., 0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, or 35-40) amino acid residues.


In certain embodiments, L2 is an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SGSETPGTSESATPES (SEQ ID NO: 106).
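By way of a non-limiting sketch, the L1-L2-L3 linker structure described above can be assembled programmatically. The repeat units, the range of x, the 0-40 residue limit for L2, and the example L2 sequence are taken from the text; the function name build_linker and the constant names are hypothetical and are introduced only for illustration.

```python
ALLOWED_UNITS = {"GSG", "GGGS", "GSSG"}   # repeat units recited for L1 and L3
L2_EXAMPLE = "SGSETPGTSESATPES"           # L2 sequence quoted in the text (SEQ ID NO: 106)


def build_linker(l1_unit: str, x1: int, l2: str, l3_unit: str, x3: int) -> str:
    """Concatenate (l1_unit)x1, l2, and (l3_unit)x3 into one linker sequence."""
    for unit in (l1_unit, l3_unit):
        if unit not in ALLOWED_UNITS:
            raise ValueError(f"unsupported repeat unit: {unit}")
    for x in (x1, x3):
        if not 0 <= x <= 15:
            raise ValueError("x must be an integer between 0 and 15")
    if len(l2) > 40:
        raise ValueError("L2 must comprise between 0 and 40 residues")
    return l1_unit * x1 + l2 + l3_unit * x3


linker = build_linker("GGGS", 2, L2_EXAMPLE, "GSG", 1)
```

Setting x to 0 for L1 or L3 simply omits that segment, which is consistent with x being permitted to equal 0.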


In some embodiments, the peptide linker comprises an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 40.


In certain embodiments, at least one of the first, second, or third heterologous sequence comprises a linker comprising an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 40.


In some embodiments, the fusion protein comprises an N-terminal or C-terminal peptide tag.


In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 60, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can introduce a substitution in a target sequence of a target nucleic acid.


In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 61, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can introduce a substitution in a target sequence of a target nucleic acid.


In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 62, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can introduce a substitution in a target sequence of a target nucleic acid.


In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 63, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can introduce a substitution in a target sequence of a target nucleic acid.


In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 64, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can introduce a substitution in a target sequence of a target nucleic acid.


In certain embodiments, the fusion protein forms a complex with a ribonucleic acid (RNA) guide wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).


In one aspect, the disclosure provides a polypeptide system comprising:

    • (a) a first polypeptide comprising a Cas12i domain and a first dimerization domain, and
    • (b) a second polypeptide comprising a deaminase domain and a second, compatible dimerization domain.


In certain embodiments, the first polypeptide comprises a first peptide linker situated between the Cas12i domain and the first dimerization domain.


In some embodiments, the second polypeptide comprises a second peptide linker situated between the deaminase domain and the second dimerization domain.


In certain embodiments, each peptide linker independently comprises between 2 and 200 amino acid residues.


In some embodiments, each peptide linker independently comprises one or more Gly residues and one or more Ser residues.


In certain embodiments, each peptide linker independently comprises (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15).


In some embodiments, each peptide linker independently comprises one or more proline residues.


In particular embodiments, the peptide linker comprises the structure of:

L1-L2-L3

    • wherein L1 and L3 are each independently chosen from (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15), and
    • L2 is a polypeptide comprising between 0-40 (e.g., 0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, or 35-40) amino acid residues.


In some embodiments, the first polypeptide and the second polypeptide form a complex upon dimerization of the first dimerization domain and the second dimerization domain.


In certain embodiments, the Cas12i domain comprises a Cas12i1 polypeptide, a Cas12i2 polypeptide, a Cas12i3 polypeptide, or a Cas12i4 polypeptide, and wherein:

    • (a) the Cas12i1 polypeptide comprises an amino acid sequence with at least 80% (e.g., 85%, 90%, 95%, 97%, 98%, or 99%) identity to SEQ ID NO: 8;
    • (b) the Cas12i2 polypeptide comprises an amino acid sequence with at least 80% (e.g., 85%, 90%, 95%, 97%, 98%, or 99%) identity to any one of SEQ ID NOs: 2-7;
    • (c) the Cas12i3 polypeptide comprises an amino acid sequence with at least 80% (e.g., 85%, 90%, 95%, 97%, 98%, or 99%) identity to SEQ ID NO: 11; and
    • (d) the Cas12i4 polypeptide comprises an amino acid sequence with at least 80% (e.g., 85%, 90%, 95%, 97%, 98%, or 99%) identity to SEQ ID NO: 9 or SEQ ID NO: 10.


In some embodiments, the Cas12i domain forms a complex with an RNA guide, wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).


In certain embodiments, the first dimerization domain and the second dimerization domain are identical (e.g., a homodimer). In some embodiments, the first dimerization domain and the second dimerization domain are not identical (e.g., a heterodimer). In certain embodiments, the first dimerization domain is chosen from leucine zipper, nanobody, antibody, or coiled-coil domain. In certain embodiments, the first and second dimerization domains are chemically inducible dimerization domains (e.g., a rapamycin sensitive dimerization domain) that can be regulated by the presence of a small molecule.


In one aspect, the disclosure provides a fusion protein comprising:

    • a) a first portion comprising an amino acid sequence of an N-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and
    • b) a second portion comprising an amino acid sequence of a C-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto,
    • wherein the second portion is N-terminal of the first portion,
    • wherein the first portion and second portion together bind to an RNA guide comprising a direct repeat sequence and a spacer sequence.


In some embodiments, the fusion protein is capable of specifically binding a target nucleic acid complementary to the spacer sequence.


In certain embodiments, the first portion and the second portion are linked by a heterologous sequence.


In some embodiments, the heterologous sequence comprises one or more of:

    • a) a first linker (e.g., a first peptide linker);
    • b) a second linker (e.g., a second peptide linker); and
    • c) an effector domain.


In certain embodiments, the C-terminal-most amino acid of the first portion is any amino acid residue of any one of SEQ ID NOs: 2-7 chosen from residues:

    • a) 342-358 (e.g., residue 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358);
    • b) 373-378 (e.g., residue 373, 374, 375, 376, 377, or 378);
    • c) 408-413 (e.g., residue 408, 409, 410, 411, 412, or 413);
    • d) 677-685 (e.g., residue 677, 678, 679, 680, 681, 682, 683, 684, or 685);
    • e) 718-723 (e.g., residue 718, 719, 720, 721, 722, or 723);
    • f) 771-782 (e.g., residue 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782);
    • g) 953-965 (e.g., residue 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965);
    • h) 55-65 (e.g., residue 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, or 65);
    • i) 99-105 (e.g., residue 99, 100, 101, 102, 103, 104, or 105);
    • j) 112-120 (e.g., residue 112, 113, 114, 115, 116, 117, 118, 119, or 120);
    • k) 195-206 (e.g., residue 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, or 206);
    • l) 241-250 (e.g., residue 241, 242, 243, 244, 245, 246, 247, 248, 249, or 250);
    • m) 583-594 (e.g., residue 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, or 594);
    • n) 877-901 (e.g., residue 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, or 901);
    • o) 173-179 (e.g., residue 173, 174, 175, 176, 177, 178, or 179);
    • p) 216-221 (e.g., residue 216, 217, 218, 219, 220, or 221);
    • q) 265-272 (e.g., residue 265, 266, 267, 268, 269, 270, 271, or 272);
    • r) 456-468 (e.g., residue 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, or 468);
    • s) 476-482 (e.g., residue 476, 477, 478, 479, 480, 481, or 482);
    • t) 498-513 (e.g., residue 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, or 513);
    • u) 614-625 (e.g., residue 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, or 625);
    • v) 977-982 (e.g., residue 977, 978, 979, 980, 981, or 982);
    • w) 1007-1012 (e.g., residue 1007, 1008, 1009, 1010, 1011, or 1012);
    • x) 386-397 (e.g., residue 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397); or
    • y) 831-844 (e.g., residue 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, or 844).
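The rearrangement described above, in which the C-terminal portion is placed N-terminal of the N-terminal portion, can be sketched as follows. The residue ranges are copied from the list above and use 1-based numbering. The sketch assumes that the second portion begins at the residue immediately following the C-terminal-most residue of the first portion, which the text does not state explicitly, and the function names are hypothetical.

```python
# Residue ranges (1-based, inclusive) listed above for the C-terminal-most
# residue of the first portion.
SPLIT_RANGES = [
    (342, 358), (373, 378), (408, 413), (677, 685), (718, 723),
    (771, 782), (953, 965), (55, 65), (99, 105), (112, 120),
    (195, 206), (241, 250), (583, 594), (877, 901), (173, 179),
    (216, 221), (265, 272), (456, 468), (476, 482), (498, 513),
    (614, 625), (977, 982), (1007, 1012), (386, 397), (831, 844),
]


def is_listed_split_residue(residue: int) -> bool:
    """True if the residue number falls within one of the listed ranges."""
    return any(lo <= residue <= hi for lo, hi in SPLIT_RANGES)


def rearrange(sequence: str, last_residue_of_first: int, heterologous: str = "") -> str:
    """Return second portion + heterologous sequence + first portion.

    Assumption: the second portion starts right after the first portion ends.
    """
    if not 1 <= last_residue_of_first < len(sequence):
        raise ValueError("split residue must leave two non-empty portions")
    first = sequence[:last_residue_of_first]     # N-terminal portion
    second = sequence[last_residue_of_first:]    # C-terminal portion
    return second + heterologous + first
```

For example, with a toy eight-residue sequence and a split after residue 3, the last five residues are placed first, followed by any heterologous sequence, followed by the first three residues.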


In some embodiments, the first portion further comprises a fusion domain, the second portion comprises a fusion domain, or the first portion and the second portion comprise a fusion domain.


In certain embodiments, the fusion domain is a deaminase.


In some embodiments, the fusion domain is a UGI polypeptide and/or an NLS.


In certain embodiments, the fusion domain is a FokI nuclease domain.


In some embodiments, the FokI nuclease domain is a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain.


In certain embodiments, the FokI nuclease domain is fused to a deaminase.


In some embodiments, the FokI nuclease domain is fused to a UGI polypeptide and/or an NLS.


In some embodiments, the first portion comprises a catalytically active FokI nuclease domain and the second portion comprises a catalytically inactive FokI nuclease domain, or the first portion comprises a catalytically inactive FokI nuclease domain and the second portion comprises a catalytically active FokI nuclease domain.


In certain embodiments, the fusion protein comprises a catalytically inactive RuvC domain.


In some embodiments, the fusion protein comprises nickase activity.


In one aspect, the disclosure provides a method of producing a substitution in a target sequence of a target nucleic acid in a cell, the method comprising:

    • contacting the cell with: (i) a Cas12i domain, (ii) an RNA guide, and (iii) a deaminase domain, or nucleic acid encoding (i), (ii), and (iii),
    • wherein the target sequence comprises a nucleotide position 0, wherein nucleotide position 0 of the target strand hybridizes with the direct repeat-proximal end of the spacer sequence of the RNA guide, and a position x (wherein optionally x=20), wherein position x of the target strand hybridizes with the direct repeat-distal end of the spacer sequence of the RNA guide,
    • and wherein the target sequence comprises an A or a C between positions 5-16 (e.g., between positions 7-12, e.g., between positions 8-11) on the target strand or the non-target strand,
    • wherein the A is substituted to an inosine (I) (e.g., converts an A:T base pair to an I:C, I:U, or I:A base pair) or a guanine (G) or the C is substituted to a U or T (e.g., converts a C:G base pair to a T:A base pair).
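As a non-limiting illustration, candidate A or C nucleotides within the recited position window can be listed as sketched below. The sketch assumes that the supplied string is written so that index 0 corresponds to nucleotide position 0, and that the position bounds are inclusive; both are assumptions made for illustration, and the function name is hypothetical.

```python
def editable_positions(strand: str, first: int = 5, last: int = 16, bases: str = "AC"):
    """Return (position, base) pairs for A or C within positions first..last.

    Assumptions: index 0 of `strand` is nucleotide position 0, and the
    window bounds are inclusive.
    """
    strand = strand.upper()
    hits = []
    for position in range(first, min(last, len(strand) - 1) + 1):
        if strand[position] in bases:
            hits.append((position, strand[position]))
    return hits


# Toy 20-nucleotide sequence, not taken from the disclosure.
example = "GGGGGACGTTTTTTTTCAGG"
window_hits = editable_positions(example)
narrow_hits = editable_positions(example, first=7, last=12)
```

Narrowing the window to positions 7-12 or 8-11 uses the same function with different bounds.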


In one aspect, the disclosure provides a method of introducing a substitution in a target sequence of a target nucleic acid in a cell, the method comprising:

    • contacting the cell with (i) a fusion protein described herein or the polypeptide system described herein, and (ii) an RNA guide, or a nucleic acid encoding (i) and (ii),
    • thereby introducing the substitution.


In certain embodiments, the target sequence comprises a nucleotide position 0, wherein nucleotide position 0 of the target strand hybridizes with the direct repeat-proximal end of the spacer sequence of the RNA guide, and a position x (wherein optionally x=20), wherein position x of the target strand hybridizes with the direct repeat-distal end of the spacer sequence of the RNA guide,

    • and wherein the target sequence comprises an A or a C between positions 5-16 (e.g., between positions 7-12, e.g., between positions 8-11) on the target strand or the non-target strand,
    • wherein the A is substituted to an inosine (I) (e.g., an A:T base pair is converted to an I:C, I:U, or I:A base pair) or to guanine (G), or the C is substituted to a U (e.g., converts a C:G base pair to a T:A base pair).


In some embodiments, the method introduces a C:G base pair to T:A base pair alteration in the target sequence.


In some embodiments, the alteration occurs at one or more C:G base pairs between positions 7-12 (e.g., between positions 8-11) of the target sequence. In certain embodiments, the cell is selected from a eukaryotic cell, a mammalian cell, or a human cell. In some embodiments, the cell is in vivo. In certain embodiments, the cell is ex vivo. In some embodiments, the cell is in vitro.


In one aspect, the disclosure provides a composition comprising:

    • a) the fusion protein described herein, or the polypeptide system described herein; and
    • b) an RNA guide or a nucleic acid encoding the RNA guide, wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).


In some embodiments, the Cas12i polypeptide is a Cas12i1 polypeptide, a Cas12i2 polypeptide, a Cas12i3 polypeptide, or a Cas12i4 polypeptide, and wherein:

    • (a) the Cas12i1 polypeptide comprises an amino acid sequence with at least 80% identity to SEQ ID NO: 8;
    • (b) the Cas12i2 polypeptide comprises an amino acid sequence with at least 80% identity to any one of SEQ ID NOs: 2-7;
    • (c) the Cas12i3 polypeptide comprises an amino acid sequence with at least 80% identity to SEQ ID NO: 11; and
    • (d) the Cas12i4 polypeptide comprises an amino acid sequence with at least 80% identity to SEQ ID NO: 9 or SEQ ID NO: 10.


In some embodiments of the compositions, methods, or systems described herein:

    • (a) the Cas12i1 polypeptide comprises an amino acid sequence with at least 95% identity to SEQ ID NO: 8;
    • (b) the Cas12i2 polypeptide comprises an amino acid sequence with at least 95% identity to any one of SEQ ID NOs: 2-7;
    • (c) the Cas12i3 polypeptide comprises an amino acid sequence with at least 95% identity to SEQ ID NO: 11; and
    • (d) the Cas12i4 polypeptide comprises an amino acid sequence with at least 95% identity to SEQ ID NO: 9 or SEQ ID NO: 10.


In some embodiments of the compositions, methods, or systems described herein:

    • (a) the Cas12i1 polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 8;
    • (b) the Cas12i2 polypeptide comprises the amino acid sequence set forth in any one of SEQ ID NOs: 2-7;
    • (c) the Cas12i3 polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 11; and
    • (d) the Cas12i4 polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 9 or SEQ ID NO: 10.


In certain embodiments, the Cas12i2 polypeptide comprises at least 80% identity to any one of SEQ ID NOs: 2-7, and wherein the Cas12i2 polypeptide further comprises one or more of the following substitutions: G587R, D599A, D599K, F626R, E833Q, E833N, D1019K, and D1019N.


In some embodiments, the Cas12i2 polypeptide comprises at least 95% identity to any one of SEQ ID NOs: 2-7, and wherein the Cas12i2 polypeptide further comprises one or more of the following substitutions: G587R, D599A, D599K, F626R, E833Q, E833N, D1019K, and D1019N.
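Substitutions written in the form recited above (e.g., G587R) name the original residue, the 1-based position, and the replacement residue. A minimal sketch of applying such substitutions to a sequence is given below. It uses a toy sequence rather than any sequence of the disclosure, it assumes that two substitutions at the same position (e.g., D599A and D599K) are alternatives that are not combined, and the function names are hypothetical.

```python
import re


def parse_substitution(label: str):
    """Split a label such as 'G587R' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized substitution: {label}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence: str, labels):
    """Apply each substitution, checking the original residue at each position."""
    residues = list(sequence)
    seen_positions = set()
    for label in labels:
        original, position, replacement = parse_substitution(label)
        if position in seen_positions:
            raise ValueError(f"position {position} substituted more than once")
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(f"expected {original} at position {position}")
        residues[position - 1] = replacement
        seen_positions.add(position)
    return "".join(residues)
```

When the reference is a Cas12i2 sequence other than SEQ ID NO: 2, the position would first need to be mapped through a sequence alignment, which this sketch does not attempt.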


In certain embodiments, the fusion protein or first polypeptide comprises at least one of an epitope peptide, a nuclear localization signal, and a nuclear export signal.


In some embodiments of the compositions, methods, or systems described herein:

    • (a) the Cas12i1 polypeptide comprises an amino acid sequence with at least 80% identity to SEQ ID NO: 8 and the direct repeat sequence comprises a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 12-14;
    • (b) the Cas12i2 polypeptide comprises an amino acid sequence with at least 80% identity to any one of SEQ ID NOs: 2-7 and the direct repeat sequence comprises a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 15-17;
    • (c) the Cas12i3 polypeptide comprises an amino acid sequence with at least 80% identity to SEQ ID NO: 11 and the direct repeat sequence comprises a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 18-20; and
    • (d) the Cas12i4 polypeptide comprises an amino acid sequence with at least 80% identity to SEQ ID NO: 9 or SEQ ID NO: 10 and the direct repeat sequence comprises a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 21-24.


In some embodiments of the compositions, methods, or systems described herein:

    • (a) the Cas12i1 polypeptide comprises an amino acid sequence with at least 95% identity to SEQ ID NO: 8 and the direct repeat sequence comprises a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 12-14;
    • (b) the Cas12i2 polypeptide comprises an amino acid sequence with at least 95% identity to any one of SEQ ID NOs: 2-7 and the direct repeat sequence comprises a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 15-17;
    • (c) the Cas12i3 polypeptide comprises an amino acid sequence with at least 95% identity to SEQ ID NO: 11 and the direct repeat sequence comprises a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 18-20; and
    • (d) the Cas12i4 polypeptide comprises an amino acid sequence with at least 95% identity to SEQ ID NO: 9 or SEQ ID NO: 10 and the direct repeat sequence comprises a nucleotide sequence with at least 90% identity to any one of SEQ ID NOs: 21-24.


In some embodiments of the compositions, methods, or systems described herein:

    • (a) the Cas12i1 polypeptide comprises an amino acid sequence with at least 95% identity to SEQ ID NO: 8 and the direct repeat sequence comprises a nucleotide sequence with at least 95% identity to any one of SEQ ID NOs: 12-14;
    • (b) the Cas12i2 polypeptide comprises an amino acid sequence with at least 95% identity to any one of SEQ ID NOs: 2-7 and the direct repeat sequence comprises a nucleotide sequence with at least 95% identity to any one of SEQ ID NOs: 15-17;
    • (c) the Cas12i3 polypeptide comprises an amino acid sequence with at least 95% identity to SEQ ID NO: 11 and the direct repeat sequence comprises a nucleotide sequence with at least 95% identity to any one of SEQ ID NOs: 18-20; and
    • (d) the Cas12i4 polypeptide comprises an amino acid sequence with at least 95% identity to SEQ ID NO: 9 or SEQ ID NO: 10 and the direct repeat sequence comprises a nucleotide sequence with at least 95% identity to any one of SEQ ID NOs: 21-24.


In certain embodiments, the spacer sequence comprises about 10 nucleotides to about 50 (e.g., about 10 to about 20, about 20 to about 30, about 30 to about 40, or about 40 to about 50) nucleotides in length. In some embodiments, the spacer sequence comprises about 15 nucleotides to about 35 (e.g., about 15 to about 20, about 20 to about 25, about 25 to about 30, or about 30 to about 35) nucleotides in length.


In some embodiments, the spacer sequence is substantially complementary to a target sequence of a target nucleic acid.


In certain embodiments, the target sequence is adjacent to a protospacer adjacent motif (PAM) sequence. In some embodiments, the PAM sequence comprises a sequence set forth as 5′-NTTN-3′, wherein N is any nucleotide.


In one aspect, the disclosure provides a modified cell comprising a target sequence adjacent to a 5′-NTTN-3′ sequence, wherein the 3′ N is designated as position 0 and position numbers increase in the 3′ direction, wherein the target nucleic acid comprises a nucleotide substitution between positions 5-16 (e.g., between positions 7-12 (e.g., 7, 8, 9, 10, 11, or 12)) relative to an unmodified cell from which the modified cell was produced.


In one aspect, the disclosure provides a modified cell comprising a target sequence comprising a nucleotide position 1 at the 3′ end of a 5′-NTTN-3′ sequence (e.g., positions −3 to −0) and a position x (wherein optionally x=20) nucleotides downstream from position 1, wherein the target sequence comprises a nucleotide substitution between positions 5-16 (e.g., between positions 7-12 (e.g., 7, 8, 9, 10, 11, or 12)) relative to an unmodified cell from which the modified cell was produced.


In certain embodiments, the unmodified cell comprises at least one C between 5-16 (e.g., between positions 7-12, e.g., between positions 8-11) nucleotides downstream from position 0.


In some embodiments, the at least one C is substituted to a U or a T (e.g., a C:G base pair is converted to a T:A base pair).


In certain embodiments, the unmodified cell comprises at least one A between 5-16 (e.g., between positions 7-12, e.g., between positions 8-11) nucleotides downstream from position 0.


In some embodiments, the at least one A is substituted to inosine (I) (e.g., an A:T base pair is converted to an I:C, I:U, or I:A base pair) or to guanine (G).


In certain embodiments, the cell is modified by any fusion protein, polypeptide system, method, or composition described herein.


In some embodiments, the modified cell comprises 2, 3, or more nucleotide substitutions between nucleotide positions 5-16.


In some embodiments of any of the compositions described herein, the system is present in a delivery composition comprising a virus, a nanoparticle, a liposome, an exosome, a microvesicle, or a gene-gun.


In some embodiments of any of the compositions described herein, the compositions are within a cell. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the cell is a prokaryotic cell.


Other features and advantages of the invention will be apparent from the following detailed description and from the claims.


Definitions

The present invention will be described with respect to particular embodiments and with reference to certain Figures, but the invention is not limited thereto but only by the claims. Terms as set forth hereinafter are generally to be understood in their common sense unless indicated otherwise.


As used herein, the term “activity” refers to a biological activity. In some embodiments, the activity refers to effector activity. In some embodiments, activity includes enzymatic activity, e.g., catalytic ability of an effector. For example, activity can include nuclease activity. In another example, activity refers to the ability of an enzyme to generate DNA from RNA or to introduce an edit into a target sequence.


As used herein, the term “adjacent to” refers to a nucleotide or amino acid sequence in close proximity to another nucleotide or amino acid sequence. In some embodiments, a nucleotide sequence is adjacent to another nucleotide sequence if no nucleotides separate the two sequences. In some embodiments, a nucleotide sequence is adjacent to another nucleotide sequence if a small number of nucleotides separate the two sequences (e.g., about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides). In some embodiments, a first sequence is adjacent to a second sequence if the two sequences are separated by about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nucleotides.


As used herein, a “biologically active portion” of a polypeptide is a portion of a polypeptide that maintains a function (e.g., completely, partially, or minimally) of the polypeptide (e.g., a Cas12i domain (e.g., a “minimal” or “core” domain) or a deaminase domain).


As used herein, the term “Cas12i polypeptide” (also referred to herein as Cas12i) refers to a polypeptide that binds to a target sequence on a target nucleic acid specified by an RNA guide, wherein the polypeptide has at least some amino acid sequence homology to a wild-type Cas12i polypeptide. In some embodiments, the Cas12i polypeptide comprises at least 75%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity with any one of SEQ ID NOs: 1-5 and 11-18 of U.S. Pat. No. 10,808,245, which is incorporated by reference herein in its entirety. In some embodiments, a Cas12i polypeptide comprises at least 75%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% sequence identity with any one of SEQ ID NO: 3 (Cas12i1), SEQ ID NO: 5 (Cas12i2), SEQ ID NO: 14 (Cas12i3), or SEQ ID NO: 16 (Cas12i4) of U.S. Pat. No. 10,808,245, corresponding to SEQ ID NOs: 8, 2, 11, and 9 of the present application. In some embodiments, a Cas12i polypeptide of the disclosure is a Cas12i1 polypeptide or Cas12i2 polypeptide as described in PCT/US2021/025257. In some embodiments, the Cas12i polypeptide cleaves a target nucleic acid (e.g., as a nick or a double strand break).


The term “Cas12i fusion protein,” as used herein, refers to a polypeptide having: i) one or more domains, wherein at least one of the domains includes a portion of a Cas12i domain and ii) a fusion domain such as a deaminase domain, wherein the Cas12i fusion protein binds to a target sequence on a target nucleic acid specified by an RNA guide. In some embodiments, the Cas12i fusion protein has enzymatic (e.g., nuclease) activity. In some embodiments, an enzymatic activity (e.g., nuclease activity) can be carried out by the Cas12i domain. In some instances, the Cas12i domain comprises an amino acid sequence having at least 80% (e.g., 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) sequence identity to any one of SEQ ID NOs: 2-11 or a portion thereof. In some instances, the Cas12i domain has the sequence of SEQ ID NO: 2 or a portion thereof. In some instances, the Cas12i domain has the sequence of SEQ ID NO: 4 or a portion thereof. While the amino acid numbering system used herein is in relation to SEQ ID NO: 2, other Cas12i2 sequences can be used. One of ordinary skill in the art can identify the corresponding amino acid positions in other Cas12i2 sequences using available tools, such as sequence alignment algorithms. In some embodiments, the Cas12i fusion protein was produced by translation of a single nucleic acid encoding the fusion protein. In some embodiments, the Cas12i domain and the heterologous domain were produced separately (e.g., from separate genes) and then covalently linked.


As used herein, the term “complex” refers to a grouping of two or more molecules. In some embodiments, the complex comprises a polypeptide and a nucleic acid molecule interacting with (e.g., binding to, coming into contact with, adhering to) one another. In some embodiments, the term “complex” is used to refer to association of a Cas12i polypeptide and a deaminase polypeptide. In some embodiments, the term “complex” is used to refer to association of an RNA guide and a Cas12i polypeptide. In some embodiments, the term “complex” is used to refer to association of a Cas12i polypeptide, a deaminase polypeptide, and an RNA guide.


As used herein, the term “deaminase” or “deaminase domain” refers to a polypeptide or polypeptide domain capable of removing an amino group from a substrate molecule (such as a nucleotide base). In some embodiments, the deaminase domain is an enzyme. In some embodiments, the deaminase domain is an enzyme classified in EC 3.5.4.


As used herein, the term “dimerization domain” refers to a polypeptide domain capable of specifically binding a separate, and compatible, polypeptide domain (e.g., a second compatible dimerization domain). In some embodiments, the dimer is formed by a non-covalent bond between the first dimerization domain and the second compatible dimerization domain. In some embodiments, the first dimerization domain and the second compatible dimerization domain have identical sequences (e.g., form a homodimer). In some embodiments, the first dimerization domain and the second dimerization domain do not have identical sequences (e.g., form a heterodimer). In some embodiments, a dimerization domain is a leucine zipper. In some instances, the dimerization domain is a nanobody, antibody, or coiled-coil domain. In some instances, the dimerization domain is a chemically inducible dimerization domain (e.g., a rapamycin sensitive dimerization domain) that can be regulated by the presence of a small molecule.


As used herein, the terms “domain” and “protein domain” refer to a distinct functional and/or structural unit of a polypeptide. In some embodiments, a domain may comprise a conserved amino acid sequence.


The term “fusion domain,” as used herein, refers to a polypeptide domain that is operably linked to a second, heterologous domain. In some embodiments, the fusion domain is about 10-20, 20-50, 50-100, 100-200, or 200-300 amino acids in length.


The term “heterologous,” when used to describe a first element in reference to a second element means that the first element and second element do not exist in nature disposed as described. For example, a heterologous polypeptide sequence refers to (a) a polypeptide, or portion of a polypeptide that is operably linked to a second polypeptide sequence to which it is not operably linked in nature, (b) a polypeptide or portion of a polypeptide that is not native to a cell in which it is expressed, (c) a polypeptide or portion of a polypeptide that has been altered or mutated relative to its native state, or (d) a polypeptide with an altered expression as compared to the native expression levels under similar conditions. As an example, a heterologous sequence of a polypeptide may be a different sequence or from a different source, relative to other domains or portions of a polypeptide. In some instances, the heterologous sequence includes a protein domain and at least one linker sequence.


The term “loop,” as used herein, refers to a consecutive group of amino acids in an amino acid sequence of a polypeptide, comprising substantially no regular secondary structure, that connects two regular secondary structure elements when the polypeptide is under physiological conditions. In some embodiments, the loop is located on the surface in a solvent exposed area of a polypeptide, protein, or fragment thereof. In some embodiments, the loop comprises at least 3 amino acids. In some embodiments, loops are identified using analytical methods, such as X-ray crystallography, nuclear magnetic resonance (NMR), and small-angle X-ray scattering (SAXS). In some embodiments, loops can be determined using molecular modeling techniques.


The term “polypeptide linker,” as used herein, refers to a linker that comprises amino acids and links together two amino acid sequences (e.g., domains). In some embodiments, the polypeptide linker comprises glycine and/or serine residues used alone or in combination. In some embodiments, the polypeptide linker connects two portions of the Cas12i fusion protein together.


As used herein, the term “protospacer adjacent motif” or “PAM sequence” refers to a DNA sequence adjacent to a target sequence to which a binary complex comprising a Cas12i polypeptide and an RNA guide binds. In some embodiments, a PAM sequence is required for enzyme activity. In the case of a double-stranded target, the RNA guide binds to a first strand of the target, and a PAM sequence as described herein is present in the second, complementary strand. For example, in some embodiments, the RNA guide binds to the target strand (TS) (e.g., the spacer-complementary strand), and the PAM sequence as described herein is present in the non-target strand (i.e., the non-spacer-complementary strand). In a double-stranded DNA molecule, the strand containing the PAM motif is called the “PAM strand” and the complementary strand is called the “non-PAM strand.” The RNA guide binds to a site in the non-PAM strand that is complementary to a target sequence disclosed herein. In some embodiments, the PAM strand is a coding (e.g., sense) strand. In other embodiments, the PAM strand is a non-coding (e.g., antisense) strand. Since an RNA guide binds the non-PAM strand via base-pairing, the non-PAM strand is also known as the target strand, while the PAM strand is also known as the non-target strand.


As used herein, the terms “RNA guide” or “RNA guide sequence” refer to any RNA molecule that facilitates the targeting of a Cas12i polypeptide described herein to a target sequence. For example, an RNA guide can be a molecule that recognizes (e.g., binds to) a target sequence. An RNA guide may be designed to be complementary to a specific nucleic acid sequence. An RNA guide comprises a DNA-targeting sequence (e.g., a DNA-binding sequence or a spacer) and a nuclease binding sequence (e.g., a direct repeat (DR) sequence). The terms CRISPR RNA (crRNA), pre-crRNA, and mature crRNA are also used herein to refer to an RNA guide. In some instances, the RNA guide can be a modified RNA molecule comprising one or more deoxyribonucleotides, for example, in a DNA-binding sequence contained in the RNA guide, which binds the non-PAM strand of a target nucleic acid. In some examples, the DNA-binding sequence may contain a DNA sequence or a DNA/RNA hybrid sequence.


As used herein, the term “substantially complementary” refers to a polynucleotide (e.g., a spacer sequence of an RNA guide) that has a certain level of complementarity to a target sequence. In some embodiments, the level of complementarity is such that the polynucleotide can hybridize to the target sequence with sufficient affinity to permit a Cas12i polypeptide that is complexed with the polynucleotide to act on (e.g., cleave) the target sequence.


As used herein, the term “substitution” refers to a replacement of a nucleotide or nucleotides with a different nucleotide or nucleotides, relative to a reference sequence. No particular process is implied in how to make a sequence comprising a substitution. For instance, a sequence comprising a substitution can be synthesized directly from individual nucleotides. In other embodiments, a substitution is made by providing and then altering a reference sequence. The nucleic acid sequence can be in a genome of an organism. The nucleic acid sequence can be in a cell. The nucleic acid sequence can be a DNA sequence. The substitution described herein refers to a substitution of up to several kilobases.


As used herein, the term “target sequence” refers to a sequence to which an RNA guide specifically binds. In some embodiments, the DNA-binding sequence of an RNA guide (e.g., the spacer) binds to a target sequence. In some embodiments, the term “target nucleic acid” is used to refer to a nucleic acid such as a chromosome where a target sequence can be found. For example, a target nucleic acid comprises the target sequence and additional coding or non-coding sequences. In some embodiments, an edit is introduced into a target sequence or target nucleic acid by a composition described herein. In some embodiments, the target sequence is a segment of DNA adjacent to a PAM motif (on the PAM strand). The complementary region of the target sequence is on the non-PAM strand. A target sequence may be immediately adjacent to the PAM motif. Alternatively, the target sequence and the PAM may be separated by a small sequence segment (e.g., up to 5 nucleotides, for example, up to 4, 3, 2, or 1 nucleotide). A target sequence may be located at the 3′ end of the PAM motif or at the 5′ end of the PAM motif, depending upon the CRISPR nuclease that recognizes the PAM motif, which is known in the art. For example, a target sequence is located at the 3′ end of a PAM motif for a Cas12i polypeptide (e.g., a Cas12i2 polypeptide such as those disclosed herein). It is of course understood that DNA is often double-stranded, and that an RNA guide will bind to one of the two strands, to which it is complementary. The location in the DNA where the RNA guide binds can be conveniently described by providing either the sequence of the strand to which the RNA guide binds (the non-PAM strand) or the sequence of the strand to which the RNA guide does not bind (the PAM strand). Thus, as is clear from context throughout the application, a target nucleic acid sequence may be described by providing the nucleic acid sequence of either strand of the double-stranded DNA targeted by an RNA guide described herein.
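
The relationship between the PAM, the target sequence, and the two strands can be sketched in Python. This is a minimal illustration rather than a method from the disclosure: the function names, the toy DNA string, and the PAM motif `TTN` are placeholder assumptions, and the target is assumed to lie immediately 3′ of the PAM on the PAM strand, as stated above for a Cas12i polypeptide. Reported PAM indices are 0-based.

```python
import re

# IUPAC codes supported by this sketch; N matches any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(pam_strand, pam, target_len):
    """List (pam_index, target_on_pam_strand, complement_on_non_pam_strand)
    for every PAM match with a full-length target immediately 3' of it."""
    pattern = "".join(IUPAC[base] for base in pam)
    hits = []
    # The lookahead lets overlapping PAM matches be reported.
    for match in re.finditer(f"(?=({pattern}))", pam_strand):
        start = match.start() + len(pam)
        target = pam_strand[start:start + target_len]
        if len(target) == target_len:
            hits.append((match.start(), target, reverse_complement(target)))
    return hits


# Toy PAM-strand sequence and placeholder PAM motif (both hypothetical).
for hit in find_targets("GGATTCACGTACGTAAGG", "TTN", 6):
    print(hit)  # (3, 'ACGTAC', 'GTACGT')
```

Each hit carries both strand representations, reflecting the point above that a target may be described by the sequence of either strand.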


It is understood that, herein, when a nucleic acid is said to comprise a particular nucleotide between specified positions, the end positions are included. For example, a nucleic acid comprising A between positions 8-11 could comprise the A at position 8, 9, 10, or 11.
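
The inclusive-range convention above can be made concrete with a short Python sketch. The function name and the toy sequences are hypothetical; positions are 1-based and both end positions are included.

```python
def has_base_between(seq, base, first, last):
    """True if `base` occurs at any 1-based position from `first` to `last`, inclusive."""
    if not 1 <= first <= last <= len(seq):
        raise ValueError("positions must satisfy 1 <= first <= last <= len(seq)")
    # Python slices are 0-based and end-exclusive, so only the start is shifted.
    return base in seq[first - 1:last]


with_a_at_11 = "C" * 10 + "AC"   # A at position 11
with_a_at_12 = "C" * 11 + "A"    # A at position 12

print(has_base_between(with_a_at_11, "A", 8, 11))  # True
print(has_base_between(with_a_at_12, "A", 8, 11))  # False
```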





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a bar graph that shows % C>T edits for AAVS1, EMX1, and VEGFA targets by Cas12i2-deaminase and Cas9-deaminase fusion polypeptides.



FIG. 2 is a graph that shows C>T base editing by a Cas12i2-NA3A-NUGI construct of SEQ ID NO: 46.



FIG. 3 is a graph that shows C>T base editing by a Cas12i2-NA3A-NUGI construct of SEQ ID NO: 45.



FIG. 4 is a graph that shows C>T base editing by a dCas9-NA3A-CUGI construct of SEQ ID NO: 51.



FIG. 5 is a graph that shows C>T base editing by an nCas9-NAID-CUGI construct of SEQ ID NO: 54.



FIG. 6A is a bar graph that shows C>T base editing by Cas12i2-deaminase and Cas9-deaminase fusion polypeptides within an EMX1_T4 target. Positions of the Cas12i2 and Cas9 targets are shown in the schematic diagram below the graph.



FIG. 6B is a bar graph that shows indel activity by Cas12i2 and Cas9 constructs within an EMX1_T4 target.



FIG. 7A is a bar graph that shows C>T base editing by Cas12i2-deaminase and Cas9-deaminase fusion polypeptides within an EMX1_T7 target. Positions of the Cas12i2 and Cas9 targets are shown.



FIG. 7B is a bar graph that shows indel activity by Cas12i2 and Cas9 constructs within an EMX1_T7 target.



FIG. 8 is a bar graph that shows C>T base editing activity for variants of the Cas12i2-deaminase fusion polypeptide of SEQ ID NO: 45.



FIG. 9 is a bar graph that shows C>T base editing activity for variants of the Cas12i2-deaminase fusion polypeptide of SEQ ID NO: 45.



FIG. 10 is a graph that shows C>T base editing activity and indel activity by Cas12i2, Cas12i4, and Cas9 constructs of SEQ ID NO: 45, SEQ ID NO: 64, and SEQ ID NO: 51, respectively.



FIG. 11 depicts a schematic representation of a Cas12i2 fusion protein comprising a FokI nuclease domain. In some instances, the FokI nuclease domain is a heterodimeric FokI nuclease domain. In this exemplary schematic, the heterodimeric FokI nuclease domain comprises a catalytically active FokI nuclease domain and a catalytically inactive FokI nuclease domain. In some aspects, a FokI domain as depicted in FIG. 11 is further fused to a deaminase. In some aspects, the Cas12i2 protein as depicted in FIG. 11 is further fused to a deaminase.



FIGS. 12A, 12B, 12C, and 12D depict flexible loops of the Cas12i2 protein in proximity to target DNA. FIG. 12A depicts the positions of flexible loops in the Helical II domain (loops at residues 342-358, 373-378, and 386-397), the Helical III domain (loops at residues 677-685 and 771-782), the RuvC II motif (loop at residues 831-844), and the Nuc domain (loop at residues 953-965). FIG. 12B depicts the positions of the loops at residues 373-378, 677-685, and 953-965. FIG. 12C depicts the positions of the loops at residues 342-358 and 386-397. In some embodiments, a FokI nuclease domain is introduced by way of a linker in the loop at residues 342-358 and in the loop at residues 386-397. For example, in some embodiments, a catalytically active FokI nuclease domain is introduced into the loop at residues 342-358 and a catalytically inactive FokI nuclease domain is introduced into the loop at residues 386-397. In another example, in some embodiments, a catalytically inactive FokI nuclease domain is introduced into the loop at residues 342-358 and a catalytically active FokI nuclease domain is introduced into the loop at residues 386-397. FIG. 12D depicts the positions of the loops at residues 342-358 and 386-397 as well as the helices between the two loops. In some instances, a circular permutation is introduced at any one of the indicated loops. In some instances, the portion of the Helical II domain positioned from about residue 342 to about residue 397 is deleted.



FIG. 13A depicts a schematic representation for engineering a circularly permuted, non-naturally occurring Cas12i2 protein. The top panel depicts the domains of a reference Cas12i2 protein. In the middle panel of this exemplary schematic, the N-terminus and the C-terminus of the Cas12i2 protein are linked by way of a heterologous sequence (e.g., a linker), and a new N-terminus and C-terminus are located at a loop of interest (e.g., a loop within the Helical II domain). In some instances, the new N-terminus and/or C-terminus comprise a fusion domain. In some instances, the fusion domain is a FokI nuclease domain. As depicted in this exemplary schematic, the new N-terminus can be fused to a dead FokI nuclease domain, and the new C-terminus can be fused to an active FokI nuclease domain. In some aspects, a FokI domain as depicted in FIG. 13A is further fused to a deaminase. In some aspects, the Cas12i2 protein as depicted in FIG. 13A is further fused to a deaminase.



FIG. 13B depicts a schematic representation for engineering a circularly permuted, non-naturally occurring Cas12i2 protein. The top panel depicts the domains of a reference Cas12i2 protein and a portion of the Helical II domain that can be mutated or deleted (see asterisk). In the middle panel of this exemplary schematic, the N-terminus and the C-terminus of the Cas12i2 protein are linked by way of a heterologous sequence (e.g., a linker), a portion of the Helical II domain is deleted (e.g., the portion from about residue 342 to about residue 397), and a new N-terminus and C-terminus are located within the Helical II domain. In some instances, the new N-terminus and/or C-terminus comprise a fusion domain. In some instances, the fusion domain is a FokI nuclease domain. As depicted in this exemplary schematic, the new N-terminus can be fused to a dead FokI nuclease domain, and the new C-terminus can be fused to an active FokI nuclease domain. In some aspects, a FokI domain as depicted in FIG. 13B is further fused to a deaminase. In some aspects, the Cas12i2 protein as depicted in FIG. 13B is further fused to a deaminase.
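
At the sequence level, the circular permutations described for FIGS. 13A and 13B can be sketched as a string rearrangement. The Python below is an illustration under stated assumptions, not the construct design used in the disclosure: the function name, the ten-residue toy sequence, and the `GGS` linker are hypothetical, and the fusion domains at the new termini are omitted.

```python
def circularly_permute(seq, cut_after, linker, resume_at=None):
    """Join the original C-terminus to the original N-terminus through `linker`
    and open new termini inside the sequence.

    cut_after: 1-based residue that becomes the new C-terminus.
    resume_at: 1-based residue that becomes the new N-terminus. It defaults to
               cut_after + 1; a larger value deletes the residues in between.
    """
    if resume_at is None:
        resume_at = cut_after + 1
    if not 1 <= cut_after < resume_at <= len(seq):
        raise ValueError("need 1 <= cut_after < resume_at <= len(seq)")
    # New order: [resume_at .. old C-term] + linker + [old N-term .. cut_after]
    return seq[resume_at - 1:] + linker + seq[:cut_after]


toy = "MKTAYIAKQR"  # hypothetical 10-residue stand-in for a full-length protein

# Permutation only (as in the FIG. 13A description): new termini between residues 4 and 5.
print(circularly_permute(toy, 4, "GGS"))               # YIAKQRGGSMKTA

# Permutation plus deletion (as in the FIG. 13B description): residues 5-6 removed.
print(circularly_permute(toy, 4, "GGS", resume_at=7))  # AKQRGGSMKTA
```

For the deletion described above, `cut_after` and `resume_at` would correspond to the residues flanking the deleted portion of the Helical II domain (about residue 342 to about residue 397), with numbering relative to SEQ ID NO: 2.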





DETAILED DESCRIPTION

The present disclosure relates to compositions comprising a Cas12i polypeptide, a deaminase, and an RNA guide. In some aspects, a composition having one or more characteristics is described herein. In some aspects, a method of producing the composition is described. In some aspects, a method of delivering the composition is described.


Composition


In some embodiments, a composition of the present invention comprises at least one protein component. In some embodiments, the at least one protein component is a Cas12i polypeptide, a deaminase polypeptide, or a Cas12i fusion protein (e.g., Cas12i-deaminase fusion polypeptide).


In some embodiments, a composition of the present invention is capable of binding to a target sequence of a target nucleic acid. In some embodiments, the target nucleic acid is DNA. In some embodiments, a composition of the present invention modifies a target nucleic acid. In some embodiments, a composition of the present invention introduces a substitution into a target sequence of a target nucleic acid. In some embodiments, a composition of the present invention is capable of introducing a substitution into the target strand of a target nucleic acid. In some embodiments, a composition of the present invention is capable of introducing a substitution into the non-target strand of a target nucleic acid.


Cas12i Domains and Polypeptides


In some embodiments, a composition of the present invention comprises a Cas12i polypeptide. In some embodiments, the Cas12i polypeptide is an RNA-guided nuclease. In some embodiments, the Cas12i polypeptide is a DNA-targeting nuclease.


In some embodiments, the Cas12i polypeptide is encoded by a nucleotide sequence such as SEQ ID NO: 1 or comprises an amino acid sequence such as SEQ ID NO: 2. In some embodiments, the Cas12i polypeptide of the present invention is a variant of a parent Cas12i polypeptide, wherein the parent is encoded by a nucleotide sequence such as SEQ ID NO: 1 or comprises an amino acid sequence such as SEQ ID NO: 2. See Table 1.
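
As a quick consistency check on the relationship between SEQ ID NO: 1 and SEQ ID NO: 2, the Python sketch below translates the first 30 nucleotides of SEQ ID NO: 1, as listed in Table 1, using the standard genetic code. The function and variable names are illustrative assumptions; only the nucleotide string comes from the table.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 30 nucleotides of SEQ ID NO: 1 as listed in Table 1.
seq1_start = "ATGAGCAGCGCGATCAAAAGCTACAAGAGC"
print(translate(seq1_start))  # MSSAIKSYKS
```

The result matches the first ten residues of SEQ ID NO: 2 as listed in Table 1.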









TABLE 1
Cas12i sequences.
SEQ ID NO / Sequence / Description





SEQ ID NO: 1. Nucleotide sequence encoding Cas12i2
ATGAGCAGCGCGATCAAAAGCTACAAGAGCGTTCTGCGTCCGAACGAGC
GTAAGAACCAACTGCTGAAAAGCACCATTCAGTGCCTGGAAGACGGTAG
CGCGTTCTTTTTCAAGATGCTGCAAGGCCTGTTTGGTGGCATCACCCCG
GAGATTGTTCGTTTCAGCACCGAACAGGAGAAACAGCAACAGGATATCG



CGCTGTGGTGCGCGGTTAACTGGTTCCGTCCGGTGAGCCAAGACAGCCT




GACCCACACCATTGCGAGCGATAACCTGGTGGAGAAGTTTGAGGAATAC




TATGGTGGCACCGCGAGCGACGCGATCAAACAGTACTTCAGCGCGAGCA




TTGGCGAAAGCTACTATTGGAACGACTGCCGTCAACAGTACTATGATCT




GTGCCGTGAGCTGGGTGTTGAGGTGAGCGACCTGACCCATGATCTGGAG




ATCCTGTGCCGTGAAAAGTGCCTGGCGGTTGCGACCGAGAGCAACCAGA




ACAACAGCATCATTAGCGTTCTGTTTGGCACCGGCGAAAAAGAGGACCG




TAGCGTGAAACTGCGTATCACCAAGAAAATTCTGGAGGCGATCAGCAAC




CTGAAAGAAATCCCGAAGAACGTTGCGCCGATTCAAGAGATCATTCTGA




ACGTGGCGAAAGCGACCAAGGAAACCTTCCGTCAGGTGTATGCGGGTAA




CCTGGGTGCGCCGAGCACCCTGGAGAAATTTATCGCGAAGGACGGCCAA




AAAGAGTTCGATCTGAAGAAACTGCAGACCGACCTGAAGAAAGTTATTC




GTGGTAAAAGCAAGGAGCGTGATTGGTGCTGCCAGGAAGAGCTGCGTAG




CTACGTGGAGCAAAACACCATCCAGTATGACCTGTGGGCGTGGGGCGAA




ATGTTCAACAAAGCGCACACCGCGCTGAAAATCAAGAGCACCCGTAACT




ACAACTTTGCGAAGCAACGTCTGGAACAGTTCAAAGAGATTCAGAGCCT




GAACAACCTGCTGGTTGTGAAGAAGCTGAACGACTTTTTCGATAGCGAA




TTTTTCAGCGGCGAGGAAACCTACACCATCTGCGTTCACCATCTGGGTG




GCAAGGACCTGAGCAAACTGTATAAGGCGTGGGAGGATGATCCGGCGGA




CCCGGAAAACGCGATTGTGGTTCTGTGCGACGATCTGAAAAACAACTTT




AAGAAAGAGCCGATCCGTAACATTCTGCGTTACATCTTCACCATTCGTC




AAGAATGCAGCGCGCAGGACATCCTGGCGGCGGCGAAGTACAACCAACA




GCTGGATCGTTATAAAAGCCAAAAGGCGAACCCGAGCGTTCTGGGTAAC




CAGGGCTTTACCTGGACCAACGCGGTGATCCTGCCGGAGAAGGCGCAGC




GTAACGACCGTCCGAACAGCCTGGATCTGCGTATTTGGCTGTACCTGAA




ACTGCGTCACCCGGACGGTCGTTGGAAGAAACACCATATCCCGTTCTAC




GATACCCGTTTCTTCCAAGAAATTTATGCGGCGGGCAACAGCCCGGTTG




ACACCTGCCAGTTTCGTACCCCGCGTTTCGGTTATCACCTGCCGAAACT




GACCGATCAGACCGCGATCCGTGTTAACAAGAAACATGTGAAAGCGGCG




AAGACCGAGGCGCGTATTCGTCTGGCGATCCAACAGGGCACCCTGCCGG




TGAGCAACCTGAAGATCACCGAAATTAGCGCGACCATCAACAGCAAAGG




TCAAGTGCGTATTCCGGTTAAGTTTGACGTGGGTCGTCAAAAAGGCACC




CTGCAGATCGGTGACCGTTTCTGCGGCTACGATCAAAACCAGACCGCGA




GCCACGCGTATAGCCTGTGGGAAGTGGTTAAAGAGGGTCAATACCATAA




AGAGCTGGGCTGCTTTGTTCGTTTCATCAGCAGCGGTGACATCGTGAGC




ATTACCGAGAACCGTGGCAACCAATTTGATCAGCTGAGCTATGAAGGTC




TGGCGTACCCGCAATATGCGGACTGGCGTAAGAAAGCGAGCAAGTTCGT




GAGCCTGTGGCAGATCACCAAGAAAAACAAGAAAAAGGAAATCGTGACC




GTTGAAGCGAAAGAGAAGTTTGACGCGATCTGCAAGTACCAGCCGCGTC




TGTATAAATTCAACAAGGAGTACGCGTATCTGCTGCGTGATATTGTTCG




TGGCAAAAGCCTGGTGGAACTGCAACAGATTCGTCAAGAGATCTTTCGT




TTCATTGAACAGGACTGCGGTGTTACCCGTCTGGGCAGCCTGAGCCTGA




GCACCCTGGAAACCGTGAAAGCGGTTAAGGGTATCATTTACAGCTATTT




TAGCACCGCGCTGAACGCGAGCAAGAACAACCCGATCAGCGACGAACAG




CGTAAAGAGTTTGATCCGGAACTGTTCGCGCTGCTGGAAAAGCTGGAGC




TGATTCGTACCCGTAAAAAGAAACAAAAAGTGGAACGTATCGCGAACAG




CCTGATTCAGACCTGCCTGGAGAACAACATCAAGTTCATTCGTGGTGAA




GGCGACCTGAGCACCACCAACAACGCGACCAAGAAAAAGGCGAACAGCC




GTAGCATGGATTGGTTGGCGCGTGGTGTTTTTAACAAAATCCGTCAACT




GGCGCCGATGCACAACATTACCCTGTTCGGTTGCGGCAGCCTGTACACC




AGCCACCAGGACCCGCTGGTGCATCGTAACCCGGATAAAGCGATGAAGT




GCCGTTGGGCGGCGATCCCGGTTAAGGACATTGGCGATTGGGTGCTGCG




TAAGCTGAGCCAAAACCTGCGTGCGAAAAACATCGGCACCGGCGAGTAC




TATCACCAAGGTGTTAAAGAGTTCCTGAGCCATTATGAACTGCAGGACC




TGGAGGAAGAGCTGCTGAAGTGGCGTAGCGATCGTAAAAGCAACATTCC




GTGCTGGGTGCTGCAGAACCGTCTGGCGGAGAAGCTGGGCAACAAAGAA




GCGGTGGTTTACATCCCGGTTCGTGGTGGCCGTATTTATTTTGCGACCC




ACAAGGTGGCGACCGGTGCGGTGAGCATCGTTTTCGACCAAAAACAAGT




GTGGGTTTGCAACGCGGATCATGTTGCGGCGGCGAACATCGCGCTGACC




GTGAAGGGTATTGGCGAACAAAGCAGCGACGAAGAGAACCCGGATGGTA




GCCGTATCAAACTGCAGCTGACCAGC






SEQ ID NO: 2. Cas12i2 amino acid sequence
MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLFGGITP
EIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIASDNLVEKFEEY
YGGTASDAIKQYFSASIGESYYWNDCRQQYYDLCRELGVEVSDLTHDLE



ILCREKCLAVATESNQNNSIISVLFGTGEKEDRSVKLRITKKILEAISN




LKEIPKNVAPIQEIILNVAKATKETFRQVYAGNLGAPSTLEKFIAKDGQ




KEFDLKKLQTDLKKVIRGKSKERDWCCQEELRSYVEQNTIQYDLWAWGE




MFNKAHTALKIKSTRNYNFAKQRLEQFKEIQSLNNLLVVKKLNDFFDSE




FFSGEETYTICVHHLGGKDLSKLYKAWEDDPADPENAIVVLCDDLKNNF




KKEPIRNILRYIFTIRQECSAQDILAAAKYNQQLDRYKSQKANPSVLGN




QGFTWTNAVILPEKAQRNDRPNSLDLRIWLYLKLRHPDGRWKKHHIPFY




DTRFFQEIYAAGNSPVDTCQFRTPRFGYHLPKLTDQTAIRVNKKHVKAA




KTEARIRLAIQQGTLPVSNLKITEISATINSKGQVRIPVKFDVGRQKGT




LQIGDRFCGYDQNQTASHAYSLWEVVKEGQYHKELGCFVRFISSGDIVS




ITENRGNQFDQLSYEGLAYPQYADWRKKASKFVSLWQITKKNKKKEIVT




VEAKEKFDAICKYQPRLYKFNKEYAYLLRDIVRGKSLVELQQIRQEIFR




FIEQDCGVTRLGSLSLSTLETVKAVKGIIYSYFSTALNASKNNPISDEQ




RKEFDPELFALLEKLELIRTRKKKQKVERIANSLIQTCLENNIKFIRGE




GDLSTTNNATKKKANSRSMDWLARGVFNKIRQLAPMHNITLFGCGSLYT




SHQDPLVHRNPDKAMKCRWAAIPVKDIGDWVLRKLSQNLRAKNIGTGEY




YHQGVKEFLSHYELQDLEEELLKWRSDRKSNIPCWVLQNRLAEKLGNKE




AVVYIPVRGGRIYFATHKVATGAVSIVFDQKQVWVCNADHVAAANIALT




VKGIGEQSSDEENPDGSRIKLQLTS






SEQ ID NO: 3. Variant Cas12i2 of SEQ ID NO: 3 of PCT/US2021/025257
MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLFGGITP
EIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIASDNLVEKFEEY
YGGTASDAIKQYFSASIGESYYWNDCRQQYYDLCRELGVEVSDLTHDLE
ILCREKCLAVATESNQNNSIISVLFGTGEKEDRSVKLRITKKILEAISN
LKEIPKNVAPIQEIILNVAKATKETFRQVYAGNLGAPSTLEKFIAKDGQ
KEFDLKKLQTDLKKVIRGKSKERDWCCQEELRSYVEQNTIQYDLWAWGE



MFNKAHTALKIKSTRNYNFAKQRLEQFKEIQSLNNLLVVKKLNDFFDSE




FFSGEETYTICVHHLGGKDLSKLYKAWEDDPADPENAIVVLCDDLKNNF




KKEPIRNILRYIFTIRQECSAQDILAAAKYNQQLDRYKSQKANPSVLGN




QGFTWTNAVILPEKAQRNDRPNSLDLRIWLYLKLRHPDGRWKKHHIPFY




DTRFFQEIYAAGNSPVDTCQFRTPRFGYHLPKLTDQTAIRVNKKHVKAA




KTEARIRLAIQQGTLPVSNLKITEISATINSKGQVRIPVKFRVGRQKGT




LQIGDRFCGYDQNQTASHAYSLWEVVKEGQYHKELGCFVRFISSGDIVS




ITENRGNQFDQLSYEGLAYPQYADWRKKASKFVSLWQITKKNKKKEIVT




VEAKEKFDAICKYQPRLYKFNKEYAYLLRDIVRGKSLVELQQIRQEIFR




FIEQDCGVTRLGSLSLSTLETVKAVKGIIYSYFSTALNASKNNPISDEQ




RKEFDPELFALLEKLELIRTRKKKQKVERIANSLIQTCLENNIKFIRGE




GDLSTTNNATKKKANSRSMDWLARGVFNKIRQLAPMHNITLFGCGSLYT




SHQDPLVHRNPDKAMKCRWAAIPVKDIGRWVLRKLSQNLRAKNRGTGEY




YHQGVKEFLSHYELQDLEEELLKWRSDRKSNIPCWVLQNRLAEKLGNKE




AVVYIPVRGGRIYFATHKVATGAVSIVFDQKQVWVCNADHVAAANIALT




GKGIGEQSSDEENPDGSRIKLQLTS






SEQ ID NO: 4. Variant Cas12i2 of SEQ ID NO: 4 of PCT/US2021/025257
MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLFGGITP
EIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIASDNLVEKFEEY
YGGTASDAIKQYFSASIGESYYWNDCRQQYYDLCRELGVEVSDLTHDLE
ILCREKCLAVATESNQNNSIISVLFGTGEKEDRSVKLRITKKILEAISN
LKEIPKNVAPIQEIILNVAKATKETFRQVYAGNLGAPSTLEKFIAKDGQ
KEFDLKKLQTDLKKVIRGKSKERDWCCQEELRSYVEQNTIQYDLWAWGE



MFNKAHTALKIKSTRNYNFAKQRLEQFKEIQSLNNLLVVKKLNDFFDSE




FFSGEETYTICVHHLGGKDLSKLYKAWEDDPADPENAIVVLCDDLKNNF




KKEPIRNILRYIFTIRQECSAQDILAAAKYNQQLDRYKSQKANPSVLGN




QGFTWTNAVILPEKAQRNDRPNSLDLRIWLYLKLRHPDGRWKKHHIPFY




DTRFFQEIYAAGNSPVDTCQFRTPRFGYHLPKLTDQTAIRVNKKHVKAA




KTEARIRLAIQQGTLPVSNLKITEISATINSKGQVRIPVKFRVGRQKGT




LQIGDRFCGYDQNQTASHAYSLWEVVKEGQYHKELGCFVRFISSGDIVS




ITENRGNQFDQLSYEGLAYPQYADWRKKASKFVSLWQITKKNKKKEIVT




VEAKEKFDAICKYQPRLYKFNKEYAYLLRDIVRGKSLVELQQIRQEIFR




FIEQDCGVTRLGSLSLSTLETVKAVKGIIYSYFSTALNASKNNPISDEQ




RKEFDPELFALLEKLELIRTRKKKQKVERIANSLIQTCLENNIKFIRGE




GDLSTTNNATKKKANSRSMDWLARGVFNKIRQLAPMHNITLFGCGSLYT




SHQDPLVHRNPDKAMKCRWAAIPVKDIGDWVLRKLSQNLRAKNRGTGEY




YHQGVKEFLSHYELQDLEEELLKWRSDRKSNIPCWVLQNRLAEKLGNKE




AVVYIPVRGGRIYFATHKVATGAVSIVFDQKQVWVCNADHVAAANIALT




GKGIGEQSSDEENPDGSRIKLQLTS






SEQ ID NO: 5. Variant Cas12i2 of SEQ ID NO: 5 of PCT/US2021/025257
MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLFGGITP
EIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIASDNLVEKFEEY
YGGTASDAIKQYFSASIGESYYWNDCRQQYYDLCRELGVEVSDLTHDLE
ILCREKCLAVATESNQNNSIISVLFGTGEKEDRSVKLRITKKILEAISN
LKEIPKNVAPIQEIILNVAKATKETFRQVYAGNLGAPSTLEKFIAKDGQ
KEFDLKKLQTDLKKVIRGKSKERDWCCQEELRSYVEQNTIQYDLWAWGE



MFNKAHTALKIKSTRNYNFAKQRLEQFKEIQSLNNLLVVKKLNDFFDSE




FFSGEETYTICVHHLGGKDLSKLYKAWEDDPADPENAIVVLCDDLKNNF




KKEPIRNILRYIFTIRQECSAQDILAAAKYNQQLDRYKSQKANPSVLGN




QGFTWTNAVILPEKAQRNDRPNSLDLRIWLYLKLRHPDGRWKKHHIPFY




DTRFFQEIYAAGNSPVDTCQFRTPRFGYHLPKLTDQTAIRVNKKHVKAA




KTEARIRLAIQQGTLPVSNLKITEISATINSKGQVRIPVKFRVGRQKGT




LQIGDRFCGYDQNQTASHAYSLWEVVKEGQYHKELGCFVRFISSGDIVS




ITENRGNQFDQLSYEGLAYPQYADWRKKASKFVSLWQITKKNKKKEIVT




VEAKEKFDAICKYQPRLYKFNKEYAYLLRDIVRGKSLVELQQIRQEIFR




FIEQDCGVTRLGSLSLSTLETVKAVKGIIYSYFSTALNASKNNPISDEQ




RKEFDPELFALLEKLELIRTRKKKQKVERIANSLIQTCLENNIKFIRGE




GDLSTTNNATKKKANSRSMDWLARGVFNKIRQLAPMHNITLFGCGSLYT




SHQDPLVHRNPDKAMKCRWAAIPVKDIGDWVLRKLSQNLRAKNRGTGEY




YHQGVKEFLSHYELQDLEEELLKWRSDRKSNIPCWVLQNRLAEKLGNKE




AVVYIPVRGGRIYFATHKVATGAVSIVFDQKQVWVCNADHVAAANIALT




GKGIGEQSSDEENPDGGRIKLQLTS






SEQ ID NO: 6. Variant Cas12i2 of SEQ ID NO: 495 of PCT/US2021/025257
MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLFGGITP
EIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIASDNLVEKFEEY
YGGTASDAIKQYFSASIGESYYWNDCRQQYYDLCRELGVEVSDLTHDLE
ILCREKCLAVATESNQNNSIISVLFGTGEKEDRSVKLRITKKILEAISN
LKEIPKNVAPIQEIILNVAKATKETFRQVYAGNLGAPSTLEKFIAKDGQ
KEFDLKKLQTDLKKVIRGKSKERDWCCQEELRSYVEQNTIQYDLWAWGE



MFNKAHTALKIKSTRNYNFAKQRLEQFKEIQSLNNLLVVKKLNDFFDSE




FFSGEETYTICVHHLGGKDLSKLYKAWEDDPADPENAIVVLCDDLKNNF




KKEPIRNILRYIFTIRQECSAQDILAAAKYNQQLDRYKSQKANPSVLGN




QGFTWTNAVILPEKAQRNDRPNSLDLRIWLYLKLRHPDGRWKKHHIPFY




DTRFFQEIYAAGNSPVDTCQFRTPRFGYHLPKLTDQTAIRVNKKHVKAA




KTEARIRLAIQQGTLPVSNLKITEISATINSKGQVRIPVKFRVGRQKGT




LQIGDRFCGYDQNQTASHAYSLWEVVKEGQYHKELRCRVRFISSGDIVS




ITENRGNQFDQLSYEGLAYPQYADWRKKASKFVSLWQITKKNKKKEIVT




VEAKEKFDAICKYQPRLYKFNKEYAYLLRDIVRGKSLVELQQIRQEIFR




FIEQDCGVTRLGSLSLSTLETVKAVKGIIYSYFSTALNASKNNPISDEQ




RKEFDPELFALLEKLELIRTRKKKQKVERIANSLIQTCLENNIKFIRGE




GDLSTTNNATKKKANSRSMDWLARGVFNKIRQLAPMHNITLFGCGSLYT




SHQDPLVHRNPDKAMKCRWAAIPVKDIGDWVLRKLSQNLRAKNRGTGEY




YHQGVKEFLSHYELQDLEEELLKWRSDRKSNIPCWVLQNRLAEKLGNKE




AVVYIPVRGGRIYFATHKVATGAVSIVFDQKQVWVCNADHVAAANIALT




GKGIGRQSSDEENPDGGRIKLQLTS






SEQ ID NO: 7. Variant Cas12i2 of SEQ ID NO: 496 of PCT/US2021/025257
MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLFGGITP
EIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIASDNLVEKFEEY
YGGTASDAIKQYFSASIGESYYWNDCRQQYYDLCRELGVEVSDLTHDLE
ILCREKCLAVATESNQNNSIISVLFGTGEKEDRSVKLRITKKILEAISN
LKEIPKNVAPIQEIILNVAKATKETFRQVYAGNLGAPSTLEKFIAKDGQ
KEFDLKKLQTDLKKVIRGKSKERDWCCQEELRSYVEQNTIQYDLWAWGE



MFNKAHTALKIKSTRNYNFAKQRLEQFKEIQSLNNLLVVKKLNDFFDSE




FFSGEETYTICVHHLGGKDLSKLYKAWEDDPADPENAIVVLCDDLKNNF




KKEPIRNILRYIFTIRQECSAQDILAAAKYNQQLDRYKSQKANPSVLGN




QGFTWTNAVILPEKAQRNDRPNSLDLRIWLYLKLRHPDGRWKKHHIPFY




DTRFFQEIYAAGNSPVDTCQFRTPRFGYHLPKLTDQTAIRVNKKHVKAA




KTEARIRLAIQQGTLPVSNLKITEISATINSKGQVRIPVKFRVGRQKGT




LQIGDRFCGYDQNQTASHAYSLWEVVKEGQYHKELRCRVRFISSGDIVS




ITENRGNQFDQLSYEGLAYPQYADWRKKASKFVSLWQITKKNKKKEIVT




VEAKEKFDAICKYQPRLYKFNKEYAYLLRDIVRGKSLVELQQIRQEIFR




FIEQDCGVTRLGSLSLSTLETVKAVKGIIYSYFSTALNASKNNPISDEQ




RKEFDPELFALLEKLELIRTRKKKQKVERIANSLIQTCLENNIKFIRGE




GDLSTTNNATKKKANSRSMDWLARGVFNKIRQLATMHNITLFGCGSLYT




SHQDPLVHRNPDKAMKCRWAAIPVKDIGDWVLRKLSQNLRAKNRGTGEY




YHQGVKEFLSHYELQDLEEELLKWRSDRKSNIPCWVLQNRLAEKLGNKE




AVVYIPVRGGRIYFATHKVATGAVSIVFDQKQVWVCNADHVAAANIALT




GKGIGRQSSDEENPDGGRIKLQLTS






SEQ ID NO: 8. Cas12i1 (SEQ ID NO: 3 of U.S. Pat. No. 10,808,245)
MSNKEKNASETRKAYTTKMIPRSHDRMKLLGNFMDYLMDGTPIFFELWN
QFGGGIDRDIISGTANKDKISDDLLLAVNWFKVMPINSKPQGVSPSNLA
NLFQQYSGSEPDIQAQEYFASNFDTEKHQWKDMRVEYERLLAELQLSRS
DMHHDLKLMYKEKCIGLSLSTAHYITSVMFGTGAKNNRQTKHQFYSKVI
QLLEESTQINSVEQLASIILKAGDCDSYRKLRIRCSRKGATPSILKIVQ



DYELGTNHDDEVNVPSLIANLKEKLGRFEYECEWKCMEKIKAFLASKVG




PYYLGSYSAMLENALSPIKGMTTKNCKFVLKQIDAKNDIKYENEPFGKI




VEGFFDSPYFESDTNVKWVLHPHHIGESNIKTLWEDLNAIHSKYEEDIA




SLSEDKKEKRIKVYQGDVCQTINTYCEEVGKEAKTPLVQLLRYLYSRKD




DIAVDKIIDGITFLSKKHKVEKQKINPVIQKYPSFNFGNNSKLLGKIIS




PKDKLKHNLKCNRNQVDNYIWIEIKVLNTKTMRWEKHHYALSSTRFLEE




VYYPATSENPPDALAARFRTKTNGYEGKPALSAEQIEQIRSAPVGLRKV




KKRQMRLEAARQQNLLPRYTWGKDFNINICKRGNNFEVTLATKVKKKKE




KNYKVVLGYDANIVRKNTYAAIEAHANGDGVIDYNDLPVKPIESGFVTV




ESQVRDKSYDQLSYNGVKLLYCKPHVESRRSFLEKYRNGTMKDNRGNNI




QIDFMKDFEAIADDETSLYYFNMKYCKLLQSSIRNHSSQAKEYREEIFE




LLRDGKLSVLKLSSLSNLSFVMFKVAKSLIGTYFGHLLKKPKNSKSDVK




APPITDEDKQKADPEMFALRLALEEKRLNKVKSKKEVIANKIVAKALEL




RDKYGPVLIKGENISDTTKKGKKSSTNSFLMDWLARGVANKVKEMVMMH




QGLEFVEVNPNFTSHQDPFVHKNPENTFRARYSRCTPSELTEKNRKEIL




SFLSDKPSKRPTNAYYNEGAMAFLATYGLKKNDVLGVSLEKFKQIMANI




LHQRSEDQLLFPSRGGMFYLATYKLDADATSVNWNGKQFWVCNADLVAA




YNVGLVDIQKDFKKK






SEQ ID NO: 9. Cas12i4 (SEQ ID NO: 16 of U.S. Pat. No. 10,808,245)
MASISRPYGTKLRPDARKKEMLDKFFNTLTKGQRVFADLALCIYGSLTL
EMAKSLEPESDSELVCAIGWFRLVDKTIWSKDGIKQENLVKQYEAYSGK
EASEVVKTYLNSPSSDKYVWIDCRQKFLRFQRELGTRNLSEDFECMLFE
QYIRLTKGEIEGYAAISNMFGNGEKEDRSKKRMYATRMKDWLEANENIT
WEQYREALKNQLNAKNLEQVVANYKGNAGGADPFFKYSFSKEGMVSKKE



HAQQLDKFKTVLKNKARDLNFPNKEKLKQYLEAEIGIPVDANVYSQMFS




NGVSEVQPKTTRNMSFSNEKLDLLTELKDLNKGDGFEYAREVLNGFFDS




ELHTTEDKFNITSRYLGGDKSNRLSKLYKIWKKEGVDCEEGIQQFCEAV




KDKMGQIPIRNVLKYLWQFRETVSAEDFEAAAKANHLEEKISRVKAHPI




VISNRYWAFGTSALVGNIMPADKRHQGEYAGQNFKMWLEAELHYDGKKA




KHHLPFYNARFFEEVYCYHPSVAEITPFKTKQFGCEIGKDIPDYVSVAL




KDNPYKKATKRILRAIYNPVANTTGVDKTTNCSFMIKRENDEYKLVINR




KISVDRPKRIEVGRTIMGYDRNQTASDTYWIGRLVPPGTRGAYRIGEWS




VQYIKSGPVLSSTQGVNNSTTDQLVYNGMPSSSERFKAWKKARMAFIRK




LIRQLNDEGLESKGQDYIPENPSSFDVRGETLYVFNSNYLKALVSKHRK




AKKPVEGILDEIEAWTSKDKDSCSLMRLSSLSDASMQGIASLKSLINSY




FNKNGCKTIEDKEKFNPVLYAKLVEVEQRRTNKRSEKVGRIAGSLEQLA




LLNGVEVVIGEADLGEVEKGKSKKQNSRNMDWCAKQVAQRLEYKLAFHG




IGYFGVNPMYTSHQDPFEHRRVADHIVMRARFEEVNVENIAEWHVRNFS




NYLRADSGTGLYYKQATMDFLKHYGLEEHAEGLENKKIKFYDFRKILED




KNLTSVIIPKRGGRIYMATNPVTSDSTPITYAGKTYNRCNADEVAAANI




VISVLAPRSKKNEEQDDIPLITKKAESKSPPKDRKRSKTSQLPQK






SEQ ID NO: 10. Variant Cas12i4
MASISRPYGTKLRPDARKKEMLDKFFNTLTKGQRVFADLALCIYGSLTL
EMAKSLEPESDSELVCAIGWFRLVDKTIWSKDGIKQENLVKQYEAYSGK



EASEVVKTYLNSPSSDKYVWIDCRQKFLRFQRELGTRNLSEDFECMLFE




QYIRLTKGEIEGYAAISNMFGNGEKEDRSKKRMYATRMKDWLEANENIT




WEQYREALKNQLNAKNLEQVVANYKGNAGGADPFFKYSFSKEGMVSKKE




HAQQLDKFKTVLKNKARDLNFPNKEKLKQYLEAEIGIPVDANVYSQMFS




NGVSEVQPKTTRNMSFSNEKLDLLTELKDLNKGDGFEYAREVLNGFFDS




ELHTTEDKFNITSRYLGGDKSNRLSKLYKIWKKEGVDCEEGIQQFCEAV




KDKMGQIPIRNVLKYLWQFRETVSAEDFEAAAKANHLEEKISRVKAHPI




VISNRYWAFGTSALVGNIMPADKRHQGEYAGQNFKMWLRAELHYDGKKA




KHHLPFYNARFFEEVYCYHPSVAEITPFKTKQFGCEIGKDIPDYVSVAL




KDNPYKKATKRILRAIYNPVANTTRVDKTTNCSFMIKRENDEYKLVINR




KISRDRPKRIEVGRTIMGYDRNQTASDTYWIGRLVPPGTRGAYRIGEWS




VQYIKSGPVLSSTQGVNNSTTDQLVYNGMPSSSERFKAWKKARMAFIRK




LIRQLNDEGLESKGQDYIPENPSSFDVRGETLYVFNSNYLKALVSKHRK




AKKPVEGILDEIEAWTSKDKDSCSLMRLSSLSDASMQGIASLKSLINSY




FNKNGCKTIEDKEKFNPVLYAKLVEVEQRRTNKRSEKVGRIAGSLEQLA




LLNGVEVVIGEADLGEVEKGKSKKQNSRNMDWCAKQVAQRLEYKLAFHG




IGYFGVNPMYTSHQDPFEHRRVADHIVMRARFEEVNVENIAEWHVRNFS




NYLRADSGTGLYYKQATMDFLKHYGLEEHAEGLENKKIKFYDFRKILED




KNLTSVIIPKRGGRIYMATNPVTSDSTPITYAGKTYNRCNADEVAAANI




VISVLAPRSKKNREQDDIPLITKKAESKSPPKDRKRSKTSQLPQK






SEQ ID NO: 11. Cas12i3 (SEQ ID NO: 14 of U.S. Pat. No. 10,808,245)
MSISNNNILPYNPKLLPDDRKHKMLVDTFNQLDLIRNNLHDMIIALYGA
LKYDNIKQFASKEKPHISADALCSINWFRLVKTNERKPAIESNQIISKF
IQYSGHTPDKYALSHITGNHEPSHKWIDCREYAINYARIMHLSFSQFQD
LATACLNCKILILNGTLTSSWAWGANSALFGGSDKENFSVKAKILNSFI
ENLKDEMNTTKFQVVEKVCQQIGSSDAADLFDLYRSTVKDGNRGPATGR



NPKVMNLFSQDGEISSEQREDFIESFQKVMQEKNSKQIIPHLDKLKYHL




VKQSGLYDIYSWAAAIKNANSTIVASNSSNLNTILNKTEKQQTFEELRK




DEKIVACSKILLSVNDTLPEDLHYNPSTSNLGKNLDVFFDLLNENSVHT




IENKEEKNKIVKECVNQYMEECKGLNKPPMPVLLTFISDYAHKHQAQDF




LSAAKMNFIDLKIKSIKVVPTVHGSSPYTWISNLSKKNKDGKMIRTPNS




SLIGWIIPPEEIHDQKFAGQNPIIWAVLRVYCNNKWEMHHFPFSDSRFF




TEVYAYKPNLPYLPGGENRSKRFGYRHSTNLSNESRQILLDKSKYAKAN




KSVLRCMENMTHNVVFDPKTSLNIRIKTDKNNSPVLDDKGRITFVMQIN




HRILEKYNNTKIEIGDRILAYDQNQSENHTYAILQRTEEGSHAHQFNGW




YVRVLETGKVTSIVQGLSGPIDQLNYDGMPVTSHKFNCWQADRSAFVSQ




FASLKISETETFDEAYQAINAQGAYTWNLFYLRILRKALRVCHMENINQ




FREEILAISKNRLSPMSLGSLSQNSLKMIRAFKSIINCYMSRMSFVDEL




QKKEGDLELHTIMRLTDNKLNDKRVEKINRASSFLTNKAHSMGCKMIVG




ESDLPVADSKTSKKQNVDRMDWCARALSHKVEYACKLMGLAYRGIPAYM




SSHQDPLVHLVESKRSVLRPRFVVADKSDVKQHHLDNLRRMLNSKTKVG




TAVYYREAVELMCEELGIHKTDMAKGKVSLSDFVDKFIGEKAIFPQRGG




RFYMSTKRLTTGAKLICYSGSDVWLSDADEIAAINIGMFVVCDQTGAFK




KKKKEKLDDEECDILPFRPM









A nucleic acid sequence encoding the Cas12i polypeptide described herein may be substantially identical to a reference nucleic acid sequence, e.g., SEQ ID NO: 1. In some embodiments, the Cas12i polypeptide is encoded by a nucleic acid comprising a sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity to the reference nucleic acid sequence, e.g., a nucleic acid sequence encoding the parent polypeptide, e.g., SEQ ID NO: 1. The percent identity between two such nucleic acids can be determined manually by inspection of the two optimally aligned nucleic acid sequences or by using software programs or algorithms (e.g., BLAST, ALIGN, CLUSTAL) using standard parameters. One indication that two nucleic acid sequences are substantially identical is that each nucleic acid molecule hybridizes to the complementary sequence of the other under stringent conditions (e.g., within a range of medium to high stringency).
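
Once two sequences have been optimally aligned, the manual percent-identity calculation mentioned above reduces to counting identical columns. The Python sketch below assumes the alignment has already been produced (for example, by one of the programs named above) and is supplied as two equal-length strings with `-` for gaps. It uses one common convention, identical columns divided by alignment length; other tools may normalize differently. The function names are hypothetical, and the thresholds are treated as exact values even though the text qualifies them with "about".

```python
# Identity levels recited in the paragraph above, in percent.
THRESHOLDS = (60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 99.5)


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same non-gap character."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("inputs must be non-empty aligned strings of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def highest_threshold_met(identity):
    """Largest recited identity level that `identity` reaches, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


identity = percent_identity("ACGTACGT", "ACGTACGA")  # toy aligned sequences
print(identity)                         # 87.5
print(highest_threshold_met(identity))  # 85
```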


In some embodiments, the Cas12i polypeptide is encoded by a nucleic acid sequence having at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more sequence identity, but not 100% sequence identity, to a reference nucleic acid sequence, e.g., a nucleic acid sequence encoding the Cas12i polypeptide, e.g., SEQ ID NO: 1.


In some embodiments, the Cas12i polypeptide of the present invention comprises a polypeptide sequence having 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 2. In some embodiments, the Cas12i polypeptide of the present invention comprises a sequence having greater than 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%, but not 100%, identity to SEQ ID NO: 2.


In some embodiments, the present invention describes a Cas12i polypeptide having a specified degree of amino acid sequence identity to one or more reference polypeptides, e.g., a parent polypeptide, e.g., at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or even at least 99%, but not 100%, sequence identity to the amino acid sequence of SEQ ID NO: 2. Homology or identity can be determined by amino acid sequence alignment, e.g., using a program such as BLAST, ALIGN, or CLUSTAL, as described herein.


In some embodiments, the Cas12i polypeptide is a variant Cas12i2 polypeptide described in PCT/US2021/025257, which is incorporated by reference in its entirety. In some embodiments, the variant Cas12i2 polypeptide comprises one or more of the amino acid substitutions listed in Table 2 of PCT/US2021/025257. In some embodiments, the Cas12i polypeptide is a variant Cas12i2 polypeptide having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 3 of PCT/US2021/025257. In some embodiments, the Cas12i polypeptide is a variant Cas12i2 polypeptide having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 4 of PCT/US2021/025257. In some embodiments, the Cas12i polypeptide is a variant Cas12i2 polypeptide having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 5 of PCT/US2021/025257. In some embodiments, the Cas12i polypeptide is a variant Cas12i2 polypeptide having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 495 of PCT/US2021/025257. In some embodiments, the Cas12i polypeptide is a variant Cas12i2 polypeptide having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 496 of PCT/US2021/025257. 
In some embodiments, the Cas12i polypeptide is a variant Cas12i2 polypeptide having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 3-146 and 495-512 of PCT/US2021/025257.


In some embodiments, a Cas12i2 polypeptide further comprises one or more of the following substitutions: G587R, D599A, D599K, F626R, E833Q, E833N, D1019K, or D1019N.
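As a non-limiting illustration, substitution notation of the form G587R (original residue, 1-based position, replacement residue) can be applied to a sequence as sketched below. The helper name `apply_substitution` and the toy sequence are hypothetical; positions are assumed to be numbered relative to the reference sequence being modified.

```python
import re

# Pattern for notation such as "G587R": original residue, position, replacement.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(sequence: str, substitution: str) -> str:
    """Return `sequence` with one substitution applied (1-based numbering)."""
    match = _SUBSTITUTION.match(substitution)
    if match is None:
        raise ValueError(f"unrecognized substitution: {substitution!r}")
    original = match.group(1)
    position = int(match.group(2))
    replacement = match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != original:
        # Guards against numbering that is offset from the reference sequence.
        raise ValueError(
            f"expected {original} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + replacement + sequence[position:]


if __name__ == "__main__":
    # Toy sequence, not taken from the Sequence Listing.
    print(apply_substitution("MGDE", "G2R"))
```

Checking the original residue before substituting helps detect a numbering mismatch when a different Cas12i sequence is used as the reference.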


In some embodiments, the Cas12i polypeptide is a Cas12i1 polypeptide. In some embodiments, the Cas12i1 polypeptide of the present invention comprises a polypeptide sequence having 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 8. In some embodiments, the Cas12i1 polypeptide of the present invention comprises a polypeptide sequence having greater than 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 8.


In some embodiments, the Cas12i polypeptide has a specified degree of amino acid sequence identity to one or more reference polypeptides, e.g., a Cas12i1 polypeptide, e.g., at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 8. Homology or identity can be determined by amino acid sequence alignment, e.g., using a program such as BLAST, ALIGN, or CLUSTAL, as described herein.


In some embodiments, a nucleic acid encoding the Cas12i1 polypeptide as described herein encodes an amino acid sequence having at least 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 8. In some embodiments, the Cas12i1 polypeptide has a sequence greater than 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 8.


In some embodiments, a Cas12i1 polypeptide described herein having enzymatic activity, e.g., nuclease or endonuclease activity, comprises an amino acid sequence which differs from the amino acid sequences of any one of a Cas12i polypeptide and SEQ ID NO: 8 by 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 40, 35, 30, 25, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 amino acid residue(s), when aligned using any of the previously described alignment methods.


In some embodiments, the Cas12i polypeptide is a Cas12i3 polypeptide. In some embodiments, the Cas12i3 polypeptide of the present invention comprises a polypeptide sequence having 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 11. In some embodiments, the Cas12i3 polypeptide of the present invention comprises a polypeptide sequence having greater than 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 11.


In some embodiments, the Cas12i3 polypeptide has a specified degree of amino acid sequence identity to one or more reference polypeptides, e.g., a parent polypeptide, e.g., at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 11. Homology or identity can be determined by amino acid sequence alignment, e.g., using a program such as BLAST, ALIGN, or CLUSTAL, as described herein.


In some embodiments, a nucleic acid encoding the Cas12i3 polypeptide as described herein encodes an amino acid sequence having at least 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 11. In some embodiments, the Cas12i3 polypeptide has a sequence greater than 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 11.


In some embodiments, a Cas12i3 polypeptide described herein having enzymatic activity, e.g., nuclease or endonuclease activity, comprises an amino acid sequence which differs from the amino acid sequences of any one of a Cas12i polypeptide and SEQ ID NO: 11 by 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 40, 35, 30, 25, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 amino acid residue(s), when aligned using any of the previously described alignment methods.


In some embodiments, the Cas12i polypeptide is a Cas12i4 polypeptide. In some embodiments, the Cas12i4 polypeptide of the present invention comprises a polypeptide sequence having 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 9 or SEQ ID NO: 10. In some embodiments, the Cas12i4 polypeptide of the present invention comprises a polypeptide sequence having greater than 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 9 or SEQ ID NO: 10.


In some embodiments, the Cas12i4 polypeptide has a specified degree of amino acid sequence identity to one or more reference polypeptides, e.g., a parent polypeptide, e.g., at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to the amino acid sequence of SEQ ID NO: 9 or SEQ ID NO: 10. Homology or identity can be determined by amino acid sequence alignment, e.g., using a program such as BLAST, ALIGN, or CLUSTAL, as described herein.


In some embodiments, a nucleic acid encoding the Cas12i4 polypeptide as described herein encodes an amino acid sequence having at least 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 9 or SEQ ID NO: 10. In some embodiments, the Cas12i4 polypeptide has a sequence greater than 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity to SEQ ID NO: 9 or SEQ ID NO: 10.


In some embodiments, a Cas12i4 polypeptide described herein having enzymatic activity, e.g., nuclease or endonuclease activity, comprises an amino acid sequence which differs from the amino acid sequences of any one of a Cas12i polypeptide and SEQ ID NO: 9 or SEQ ID NO: 10 by 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 40, 35, 30, 25, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 amino acid residue(s), when aligned using any of the previously described alignment methods.


In some embodiments, the Cas12i polypeptide comprises an alteration at one or more (e.g., several) amino acids of a parent polypeptide, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, or more are altered.


An alteration may comprise a substitution, insertion, deletion, addition, or fusion of an amino acid or amino acids in a peptide or polypeptide, or of a nucleotide or nucleotides in a nucleic acid, relative to a reference sequence. No particular process is implied in how to make a sequence comprising an alteration. For instance, a sequence comprising an alteration can be synthesized directly from individual nucleotides. In other embodiments, an alteration is made by providing and then altering a reference sequence.


In some embodiments, the nucleotide sequence encoding the Cas12i polypeptide described herein can be codon-optimized for use in a particular host cell or organism. For example, the nucleic acid can be codon-optimized for any non-human eukaryote including mice, rats, rabbits, dogs, livestock, or non-human primates. Codon usage tables are readily available, for example, at the “Codon Usage Database” available at www.kazusa.or.jp/codon/ and these tables can be adapted in a number of ways. See Nakamura et al. Nucl. Acids Res. 28:292 (2000), which is incorporated herein by reference in its entirety. Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell are also available, such as Gene Forge (Aptagen; Jacobus, Pa.).
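As a simplified, non-limiting sketch, one form of codon optimization replaces each amino acid with a single preferred codon for the chosen host. The table below is a placeholder assumed for illustration only: each codon encodes the indicated amino acid under the standard genetic code, but the choice among synonymous codons is not taken from the Codon Usage Database or from any other host-specific table, and the function name `back_translate` is hypothetical.

```python
# Illustrative placeholder: one codon per amino acid (standard genetic code).
# In practice the preferred codon would be chosen from a host-specific
# codon usage table; these choices are assumptions for the example.
PREFERRED_CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGA",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def back_translate(protein: str, table=PREFERRED_CODON, stop_codon: str = "TGA") -> str:
    """Return a coding sequence for `protein` using one codon per residue."""
    codons = []
    for index, residue in enumerate(protein.upper(), start=1):
        if residue not in table:
            raise ValueError(f"no codon for residue {residue!r} at position {index}")
        codons.append(table[residue])
    return "".join(codons) + stop_codon


if __name__ == "__main__":
    # Toy peptide, not taken from the Sequence Listing.
    print(back_translate("MKW"))
```

Practical optimization tools typically weigh further factors (e.g., GC content, repeats, and restriction sites) that this sketch does not attempt to model.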


Although the changes described herein may be one or more amino acid changes, changes to the Cas12i polypeptide may also be of a structural or substantive nature, such as fusion of polypeptides as amino- and/or carboxyl-terminal extensions. For example, the Cas12i polypeptide may contain additional peptides, e.g., one or more peptides. Examples of additional peptides may include epitope peptides for labelling, such as a polyhistidine tag (His-tag), Myc, and FLAG. In some embodiments, the Cas12i polypeptide described herein can be fused to a detectable moiety such as a fluorescent protein (e.g., green fluorescent protein (GFP) or yellow fluorescent protein (YFP)).


In some embodiments, the Cas12i polypeptide as in any one of the embodiments described herein comprises at least one (e.g., two, three, four, five, six, or more) nuclear localization signal (NLS). In some embodiments, the Cas12i polypeptide comprises at least one (e.g., two, three, four, five, six, or more) nuclear export signal (NES). In some embodiments, the Cas12i polypeptide comprises at least one (e.g., two, three, four, five, six, or more) NLS and at least one (e.g., two, three, four, five, six, or more) NES.


In some embodiments, the Cas12i polypeptide comprises at least a RuvC domain but less than the whole Cas12i polypeptide. In some embodiments, the Cas12i polypeptide is a truncated Cas12i polypeptide relative to a wild-type Cas12i polypeptide. In some embodiments, the truncated Cas12i polypeptide comprises a RuvC domain. In some embodiments, the Cas12i polypeptide comprises at least one functional domain of the whole Cas12i polypeptide. In some embodiments, the Cas12i polypeptide comprises at least two RuvC domains or at least two RuvC motifs. In some embodiments, the Cas12i polypeptide comprises at least three RuvC domains or at least three RuvC motifs. In some embodiments, the Cas12i polypeptide comprises at least one catalytically dead RuvC domain and at least one catalytically active RuvC domain. In some embodiments, the Cas12i polypeptide comprises two RuvC domains from one or more Type V or Type II nucleases. In some embodiments, the Cas12i polypeptide comprises at least a RuvC domain and a dimerization domain.


In some embodiments, the Cas12i polypeptide as described in any one of the previous embodiments is fused to a deaminase polypeptide. In some embodiments, the Cas12i polypeptide comprises an N-terminal deaminase polypeptide. In some embodiments, the Cas12i polypeptide comprises a C-terminal deaminase polypeptide. In some embodiments, the Cas12i polypeptide comprises a deaminase polypeptide at an intramolecular position within the Cas12i polypeptide (e.g., the deaminase is within a loop of the Cas12i polypeptide).


In some embodiments, the Cas12i polypeptide as in any one of the embodiments described herein interacts with a deaminase polypeptide (e.g., through electrostatic interactions). In some embodiments, the Cas12i polypeptide comprises a dimerization domain. In some embodiments, the dimer is formed by a non-covalent bond between the first dimerization domain and the second compatible dimerization domain. In some embodiments, a dimerization domain is a leucine zipper, nanobody, or antibody. In some embodiments, the dimerization domain recruits a deaminase polypeptide. In some embodiments, the Cas12i polypeptide and the deaminase polypeptide interact through coiled-coil peptide heterodimers.


Deaminase Domains


In some embodiments, the deaminase domain comprises an enzyme classified in EC 3.5.4 (e.g., cytosine deaminase (EC 3.5.4.1), adenine deaminase (EC 3.5.4.2), guanine deaminase (EC 3.5.4.3), adenosine deaminase (EC 3.5.4.4), cytidine deaminase (EC 3.5.4.5), AMP deaminase (EC 3.5.4.6), ADP deaminase (EC 3.5.4.7), aminoimidazolase (EC 3.5.4.8), methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9), IMP cyclohydrolase (EC 3.5.4.10), pterin deaminase (EC 3.5.4.11), dCMP deaminase (EC 3.5.4.12), dCTP deaminase (EC 3.5.4.13), EC 3.5.4.14 (dCTP deaminase), EC 3.5.4.5, (deoxy)cytidine deaminase (EC 3.5.4.14), guanosine deaminase (EC 3.5.4.15), adenosine-phosphate deaminase (EC 3.5.4.17), ATP deaminase (EC 3.5.4.18), phosphoribosyl-AMP cyclohydrolase (EC 3.5.4.19), pyrithiamine deaminase (EC 3.5.4.20), creatinine deaminase (EC 3.5.4.21), 1-pyrroline-4-hydroxy-2-carboxylate deaminase (EC 3.5.4.22), blasticidin-S deaminase (EC 3.5.4.23), sepiapterin deaminase (EC 3.5.4.24), GTP cyclohydrolase II (EC 3.5.4.25), diaminohydroxyphosphoribosylaminopyrimidine deaminase (EC 3.5.4.26), methenyltetrahydromethanopterin cyclohydrolase (EC 3.5.4.27), GTP cyclohydrolase IIa (EC 3.5.4.29), dCTP deaminase (dUMP-forming) (EC 3.5.4.30), S-methyl-5′-thioadenosine deaminase (EC 3.5.4.31), 8-oxoguanine deaminase (EC 3.5.4.32), tRNAAla(adenine37) deaminase (EC 3.5.4.34), tRNA(cytosine8) deaminase (EC 3.5.4.35), mRNA(cytosine6666) deaminase (EC 3.5.4.36), double-stranded RNA adenine deaminase (EC 3.5.4.37), single-stranded DNA cytosine deaminase (EC 3.5.4.38), GTP cyclohydrolase IV (EC 3.5.4.39), aminodeoxyfutalosine deaminase (EC 3.5.4.40), 5′-deoxyadenosine deaminase (EC 3.5.4.41), N-isopropylammelide isopropylaminohydrolase (EC 3.5.4.42), hydroxydechloroatrazine ethylaminohydrolase (EC 3.5.4.43), ectoine hydrolase (EC 3.5.4.44), melamine deaminase (EC 3.5.4.45), cAMP deaminase (EC 3.5.4.46), EC 3.5.4.31 (EC 3.5.4.n1), EC 3.5.4.39 (EC 3.5.4.n2), and EC 3.5.4.45 (EC 3.5.4.n3)), or any biologically active portion thereof.


In particular embodiments, the deaminase domain is a cytidine deaminase domain. In certain embodiments, the cytidine deaminase domain is an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase. In certain embodiments, the cytidine deaminase is an APOBEC1 (UniprotKB—P41238), an APOBEC2 (UniprotKB—Q9Y235), an APOBEC3 (e.g., APOBEC3A (UniprotKB—P31941), APOBEC3B (UniprotKB—Q9UH17), APOBEC3C (UniprotKB—Q9NRW3), APOBEC3D (Q96AK3), APOBEC3E, APOBEC3F (UniprotKB—Q8IUX4), APOBEC3G (UniprotKB—Q9HC16), or APOBEC3H (UniprotKB—Q6NTF7)), an APOBEC4 (UniprotKB—Q8WW27) deaminase, or an Activation-induced (cytidine) deaminase (AID) (UniprotKB—Q9GZX7), or a biologically active portion or variant thereof. In certain embodiments, the cytidine deaminase is APOBEC3a (A3A) (e.g., human APOBEC3a), or a biologically active portion thereof. In certain embodiments, the cytidine deaminase is Activation Induced Deaminase (AID), or a biologically active portion thereof.


In certain embodiments, the deaminase domain is an adenine deaminase domain. In certain embodiments, the deaminase domain is an ABE8 deaminase. In certain embodiments, the ABE8 is selected from ABE8.1, ABE8.2, ABE8.3, ABE8.4, ABE8.5, ABE8.6, ABE8.7, ABE8.8, ABE8.9, ABE8.10, ABE8.11, ABE8.12, ABE8.13, ABE8.17, or ABE8.20.


In some embodiments, the deaminase domain is an adenosine deaminase domain. In some embodiments, the adenosine deaminase is a TadA deaminase. In some embodiments, the TadA deaminase is a TadA variant. In some embodiments, the TadA variant is a TadA*8. In some embodiments, the deaminase or deaminase domain is a variant of a naturally occurring deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, or mouse. In some embodiments, the deaminase or deaminase domain does not occur in nature. For example, in some embodiments, the deaminase or deaminase domain is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% identical to a naturally occurring deaminase. For example, deaminase domains are described in International PCT Application Nos. PCT/2017/045381 (WO 2018/027078) and PCT/US2016/058344 (WO 2017/070632), each of which is incorporated herein by reference in its entirety. Also, see Komor, A. C., et al., “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage” Nature 533, 420-424 (2016); Gaudelli, N. M., et al., “Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage” Nature 551, 464-471 (2017); Komor, A. C., et al., “Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity” Science Advances 3:eaao4774 (2017); and Rees, H. A., et al., “Base editing: precision chemistry on the genome and transcriptome of living cells.” Nat Rev Genet. 2018 December; 19(12):770-788. doi: 10.1038/s41576-018-0059-1, the entire contents of which are hereby incorporated by reference.
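For illustration only, the sequence-level outcome of deamination can be sketched as follows, consistent with the C:G-to-T:A and A·T-to-G·C conversions named in the references cited above: a deaminated cytidine is read as T, and a deaminated adenosine is read as G. The function name `deaminate`, the 0-based `window` argument, and the toy strand are assumptions made for this example; no particular editing window is implied for any Cas12i fusion protein.

```python
def deaminate(strand: str, kind: str, window: range) -> str:
    """Return `strand` with target bases inside `window` converted.

    kind: "cytidine"  -> C is read as T after deamination
          "adenosine" -> A is read as G after deamination
    window: 0-based positions assumed accessible to the deaminase
            (a hypothetical parameter for this sketch).
    """
    rules = {"cytidine": ("C", "T"), "adenosine": ("A", "G")}
    if kind not in rules:
        raise ValueError(f"unknown deaminase kind: {kind!r}")
    source, product = rules[kind]
    bases = list(strand.upper())
    for i in window:
        if 0 <= i < len(bases) and bases[i] == source:
            bases[i] = product
    return "".join(bases)


if __name__ == "__main__":
    # Toy strand, not taken from the Sequence Listing.
    print(deaminate("ACGTAC", "cytidine", range(0, 3)))
    print(deaminate("ACGTAC", "adenosine", range(3, 6)))
```

The sketch converts every target base in the window; actual editing outcomes depend on the deaminase, the sequence context, and cellular repair, none of which is modeled here.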


Cas12i-Deaminase Fusion Polypeptides


The present disclosure provides Cas12i fusion proteins comprising a Cas12i domain (e.g., a Cas12i1, Cas12i2, Cas12i3, or Cas12i4 domain) and a deaminase domain as described herein wherein the Cas12i fusion protein binds to a target on a nucleic acid specified by an RNA guide. In some embodiments, the Cas12i2 fusion protein has enzymatic activity. In some embodiments, the enzymatic activity can be carried out by the Cas12i2 domain. In some embodiments, the enzymatic activity is carried out by the deaminase domain. In some embodiments, the deaminase domain is fused N-terminally to the Cas12i domain. In some embodiments, the deaminase domain is fused C-terminally to the Cas12i domain. In certain embodiments, the deaminase domain is fused directly to the Cas12i domain. In some embodiments, the Cas12i fusion proteins comprise a first deaminase domain fused N-terminally to the Cas12i domain and a second deaminase domain fused C-terminally to the Cas12i domain. In some embodiments, the deaminase domain is fused to the Cas12i domain through a linker. In some embodiments, the linker is a peptide linker as described herein.


In one aspect, the disclosure provides a Cas12i fusion protein comprising, in an N-terminal to C-terminal direction:

    • (a) a first, N-terminal portion of a Cas12i polypeptide, wherein the N-terminal portion of the Cas12i polypeptide comprises a Cas12i sequence from the N-terminus to a loop, or a functional fragment or variant thereof;
    • (b) a heterologous sequence comprising a deaminase domain; and
    • (c) a second, C-terminal portion of the Cas12i polypeptide, wherein the C-terminal portion of the Cas12i polypeptide comprises a Cas12i sequence from the loop to the C-terminus, or a fragment or variant thereof.


In one aspect, the disclosure provides a Cas12i fusion protein, wherein the N-terminal portion of the Cas12i polypeptide comprises amino acids 1-n of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and

    • the C-terminal portion of the Cas12i polypeptide comprises amino acids m-1054 of SEQ ID NO: 1, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


In some embodiments, n and m are each independently a number between:

    • i) 342-358 (e.g., 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358);
    • ii) 373-378 (e.g., 373, 374, 375, 376, 377, or 378);
    • iii) 408-413 (e.g., 408, 409, 410, 411, 412, or 413);
    • iv) 677-685 (e.g., 677, 678, 679, 680, 681, 682, 683, 684, or 685);
    • v) 718-723 (e.g., 718, 719, 720, 721, 722, or 723);
    • vi) 771-782 (e.g., 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782);
    • vii) 953-965 (e.g., 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965);
    • viii) 55-65 (e.g., 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, or 65);
    • ix) 99-105 (e.g., 99, 100, 101, 102, 103, 104, or 105);
    • x) 112-120 (e.g., 112, 113, 114, 115, 116, 117, 118, 119, or 120);
    • xi) 195-206 (e.g., 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, or 206);
    • xii) 241-250 (e.g., 241, 242, 243, 244, 245, 246, 247, 248, 249, or 250);
    • xiii) 583-594 (e.g., 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, or 594);
    • xiv) 877-901 (e.g., 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, or 901); or
    • xv) 386-397 (e.g., residue 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397).


In some embodiments, n<m. In some embodiments, m=n+1. In certain embodiments, the Cas12i fusion protein comprises a component of Table 3.
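The arrangement described above (residues 1 to n, then the heterologous sequence, then residues m to the C-terminus) can be sketched as follows. This is an illustrative sketch only: the function names and toy sequences are hypothetical, and 1-based residue numbering is assumed. When m = n + 1 no residues are omitted; when m > n + 1, residues n + 1 through m - 1 are absent from the fusion.

```python
def build_internal_fusion(cas12i: str, heterologous: str, n: int, m: int) -> str:
    """Join residues 1..n, the heterologous sequence, and residues m..end."""
    if not 1 <= n < m <= len(cas12i):
        raise ValueError("require 1 <= n < m <= length of the Cas12i sequence")
    n_terminal = cas12i[:n]       # residues 1..n (1-based, inclusive)
    c_terminal = cas12i[m - 1:]   # residues m..end (1-based, inclusive)
    return n_terminal + heterologous + c_terminal


def portion_lengths(total_length: int, n: int, m: int) -> tuple:
    """Lengths of the N-terminal (1..n) and C-terminal (m..total) portions."""
    return n, total_length - m + 1


if __name__ == "__main__":
    # Toy sequences; lowercase marks the inserted heterologous sequence.
    print(build_internal_fusion("ABCDEFGH", "xyz", 3, 4))
    # For a 1054-residue polypeptide with n = 342 and m = 343:
    print(portion_lengths(1054, 342, 343))
```

For a polypeptide of 1054 residues with n = 342 and m = 343, the two portions contain 342 and 712 residues, respectively.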


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids S342-L358


In some embodiments of any Cas12i2 fusion protein described herein, a) n is 342 and m is 343, or b) n is 347 and m is 348. In some embodiments, the first portion comprises at least 273, 280, 290, 300, 310, 320, 330, 340, 341, or 342 amino acids. In certain embodiments, the second portion comprises at least 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 711, or 712 amino acids. In certain embodiments, the C-terminal amino acid(s) of the first portion comprise FDS, DS, or S. In some embodiments, the N-terminal amino acid(s) of the second portion comprise EFS, EF, or E. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of SEFFSGEETYTICV (SEQ ID NO: 107), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, 11 and 12, 12 and 13, or 13 and 14 of SEQ ID NO: 107. In certain embodiments, one or more amino acids of SEQ ID NO: 107 are absent from the Cas12i2 fusion protein. In other embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 sequential amino acids of SEQ ID NO: 107 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 sequential amino acids of SEQ ID NO: 107 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.
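As a non-limiting illustration, the candidate insertion sites "between any two adjacent amino acids" of a loop sequence can be enumerated as sketched below, using the sequence recited above for SEQ ID NO: 107. The function name `insertion_sites` is hypothetical, and positions are numbered 1-based within the loop sequence itself rather than within the full polypeptide.

```python
LOOP_107 = "SEFFSGEETYTICV"  # sequence recited in the text for SEQ ID NO: 107


def insertion_sites(loop: str) -> list:
    """List each site between adjacent residues of `loop`.

    Each entry is (left_position, right_position, left_part, right_part),
    with positions 1-based within the loop sequence.
    """
    return [(i, i + 1, loop[:i], loop[i:]) for i in range(1, len(loop))]


if __name__ == "__main__":
    for left, right, before, after in insertion_sites(LOOP_107):
        print(f"between positions {left} and {right}: {before}|{after}")
```

A loop of 14 residues yields 13 such sites, matching the position pairs 1 and 2 through 13 and 14 enumerated above.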


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids D373-E378


In certain embodiments, n is 374 and m is 375. In some embodiments, the first portion comprises at least 300, 310, 320, 330, 340, 350, 360, 370, 373, 374, 375, 376, or 377 amino acids. In certain embodiments, the second portion comprises at least 544, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, or 680 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise DDP, DP, or P. In some embodiments, the N-terminal amino acid(s) of the second portion comprise ADP, AD, or A. In certain embodiments, the heterologous moiety is situated between any two adjacent amino acids of DPADPE (SEQ ID NO: 108), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, or 5 and 6 of SEQ ID NO: 108. In some embodiments, one or more amino acids of SEQ ID NO: 108 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 108 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 108 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids D386-I397


In some embodiments of any Cas12i2 fusion protein described herein, a) n is 386 and m is 387, b) n is 387 and m is 388, c) n is 388 and m is 389, d) n is 389 and m is 390, e) n is 390 and m is 391, f) n is 391 and m is 392, g) n is 392 and m is 393, h) n is 393 and m is 394, i) n is 394 and m is 395, j) n is 395 and m is 396, or k) n is 396 and m is 397. In some embodiments, the first portion comprises at least 308, 310, 320, 330, 340, 350, 360, 370, 380, or 390 amino acids. In certain embodiments, the second portion comprises at least 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, or 680 amino acids. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of DDLKNNFKKEPI (SEQ ID NO: 131), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, or 11 and 12 of SEQ ID NO: 131. In certain embodiments, one or more amino acids of SEQ ID NO: 131 are absent from the Cas12i2 fusion protein. In other embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 131 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 131 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids R408-A413


In some embodiments of the fusion Cas12i2 proteins described herein, a) n is 409 and m is 410 or b) n is 410 and m is 411. In certain embodiments, the first portion comprises at least 328, 330, 340, 350, 360, 370, 380, 390, 400, 405, 406, 407, 408, 409, or 410 amino acids. In some embodiments, the second portion comprises at least 516, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 641, 642, 643, 644, or 645 amino acids. In certain embodiments, the C-terminal amino acid(s) of the first portion comprise IRQE, RQ, Q, or E. In some embodiments, the N-terminal amino acid(s) of the second portion comprise ECS, EC, E, or C. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of RQECSA (SEQ ID NO: 109), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, or 5 and 6 of SEQ ID NO: 109. In some embodiments, one or more amino acids of SEQ ID NO: 109 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 109 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 109 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids K677-V685


In some embodiments, n is 682 and m is 683. In some embodiments, the first portion comprises at least 546, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 681, or 682 amino acids. In certain embodiments, the second portion comprises at least 298, 300, 310, 320, 330, 340, 350, 360, 370, 371, or 372 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise KKK, KK, or K. In some embodiments, the N-terminal amino acid(s) of the second portion comprise EIV, EI, or E. In certain embodiments, the heterologous moiety is situated between any two adjacent amino acids of KKNKKKEIV (SEQ ID NO: 110), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, or 8 and 9 of SEQ ID NO: 110. In some embodiments, one or more amino acids of SEQ ID NO: 110 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 110 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 110 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids V718-L723


In some embodiments, n is 721 and m is 722. In some embodiments, the first portion comprises at least 577, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, or 721 amino acids. In certain embodiments, the second portion comprises at least 266, 270, 280, 290, 300, 310, 320, 330, 331, 332, or 333 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise RGK, GK, or K. In some embodiments, the N-terminal amino acid(s) of the second portion comprise SLV, SL, or S. In certain embodiments, the heterologous moiety is situated between any two adjacent amino acids of VRGKSL (SEQ ID NO: 111), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, or 5 and 6 of SEQ ID NO: 111. In some embodiments, one or more amino acids of SEQ ID NO: 111 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 111 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, or 6 sequential amino acids of SEQ ID NO: 111 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids A771-D782


In some embodiments, n is 778 and m is 779. In certain embodiments, the first portion comprises at least 622, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 775, 776, 777 or 778 amino acids. In certain embodiments, the second portion comprises at least 221, 225, 230, 240, 250, 260, 270, 275, or 276 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise KNN, NN, or N. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise PIS, PI, or P. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of ALNASKNNPISD (SEQ ID NO: 112), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, or 11 and 12 of SEQ ID NO: 112. In some embodiments, one or more amino acids of SEQ ID NO: 112 are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 112 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 112 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids L953-C965


In some embodiments, n is 960 and m is 961. In certain embodiments, the first portion comprises at least 768, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, or 960 amino acids. In certain embodiments, the second portion comprises at least 75, 80, 85, 90, 91, 92, 93, or 94 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise DRK, RK, or K. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise SNI, SN, or S. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of LKWRSDRKSNIPC (SEQ ID NO: 113), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, 11 and 12, or 12 and 13 of SEQ ID NO: 113. In certain embodiments, one or more amino acids of SEQ ID NO: 113 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 sequential amino acids of SEQ ID NO: 113 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 sequential amino acids of SEQ ID NO: 113 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids S55-I65


In some embodiments of the Cas12i2 fusion protein described herein, a) n is 61 and m is 62, or b) n is 62 and m is 63. In some embodiments, the first portion comprises at least 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, or 61 amino acids. In certain embodiments, the second portion comprises at least 795, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, or 991 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise EKQ, KQ, or Q. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise QQD, QQ, or Q. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of STEQEKQQQDI (SEQ ID NO: 114), e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, or 8 and 9 of SEQ ID NO: 114. In certain embodiments, one or more amino acids of SEQ ID NO: 114 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 amino acids of SEQ ID NO: 114 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 114 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids Y99-D105


In certain embodiments of the Cas12i2 fusion protein described herein, a) n is 101 and m is 102, or b) n is 102 and m is 103. In certain embodiments, the first portion comprises at least 81, 90, 100, or 101 amino acids. In certain embodiments, the second portion comprises at least 762, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 951, 952, or 953 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise YGGT, YGG, GG, G, or T. In some embodiments, the N-terminal amino acid(s) of the second portion comprise TAS, TA, AS, T, or A. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of YGGTASD (SEQ ID NO: 115), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, or 6 and 7 of SEQ ID NO: 115. In some embodiments, one or more amino acids of SEQ ID NO: 115 are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, or 7 sequential amino acids of SEQ ID NO: 115 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, or 7 sequential amino acids of SEQ ID NO: 115 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids S112-Y120


In some embodiments, n is 116 and m is 117. In certain embodiments, the first portion comprises at least 81, 90, 100, or 101 amino acids. In some embodiments, the second portion comprises at least 762, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 951, 952, or 953 amino acids. In other embodiments, the C-terminal amino acid(s) of the first portion comprise SIG, IG, or G. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise ESY, ES, or E. In other embodiments, the heterologous moiety is situated between any two adjacent amino acids of SASIGESYY (SEQ ID NO: 116), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, or 8 and 9 of SEQ ID NO: 116. In some embodiments, one or more amino acids of SEQ ID NO: 116 are absent from the Cas12i2 fusion protein. In other embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 116 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 116 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids S195-P206


In some embodiments, n is 199 and m is 200. In other embodiments, the first portion comprises at least 160, 170, 180, 190, 195, 196, 197, 198, or 199 amino acids. In certain embodiments, the second portion comprises at least 684, 690, 700, 710, 720, 730, 740, 750, 760, 780, 790, 800, 810, 820, 830, 840, 850, or 855 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise LKE, KE, or E. In some embodiments, the N-terminal amino acid(s) of the second portion comprise IPK, IP, or I. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of SNLKEIPKNVAP (SEQ ID NO: 117), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, or 11 and 12 of SEQ ID NO: 117. In some embodiments, one or more amino acids of SEQ ID NO: 117 are absent from the Cas12i2 fusion protein. In other embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 117 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In other embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 117 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids K241-L250


In some embodiments, n is 246 and m is 247. In other embodiments, the first portion comprises at least 197, 200, 210, 220, 230, 240, 245, or 246 amino acids. In certain embodiments, the second portion comprises at least 646, 650, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 780, 790, 800, 805, 806, 807, or 808 amino acids. In yet another embodiment, the C-terminal amino acid(s) of the first portion comprise GQK, QK, or K. In certain embodiments, the N-terminal amino acid(s) of the second portion comprise EFD, EF, or E. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of KDGQKEFDL (SEQ ID NO: 118), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, or 8 and 9 of SEQ ID NO: 118. In some embodiments, one or more amino acids of SEQ ID NO: 118 are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 118 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or 9 sequential amino acids of SEQ ID NO: 118 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids G583-R594


In some embodiments of the Cas12i2 fusion protein described herein, a) n is 587 and m is 588, or b) n is 590 and m is 591. In other embodiments, the first portion comprises at least 470, 472, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 585, 587, or 590 amino acids. In certain embodiments, the second portion comprises at least 371, 374, 380, 390, 400, 410, 420, 430, 440, 450, 460, 464, or 467 amino acids. In some embodiments, the C-terminal amino acid(s) of the first portion comprise: a) QKG, KG, or G; or b) TLQ, LQ, or Q. In some embodiments, the N-terminal amino acid(s) of the second portion comprise: a) TLQ, TL, or T; or b) IGD, IG, or I. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of GRQKGTLQIGDR (SEQ ID NO: 119), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, or 11 and 12 of SEQ ID NO: 119. In certain embodiments, one or more amino acids of SEQ ID NO: 119 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 119 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 sequential amino acids of SEQ ID NO: 119 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


Exemplary Cas12i2 Fusion Proteins Having a Heterologous Sequence at the Loop Region of Amino Acids C877-W901


In some embodiments of the Cas12i2 fusion protein described herein, a) n is 893 and m is 894, or b) n is 894 and m is 895. In other embodiments, the first portion comprises at least 715, 716, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 891, 892, 893, or 894 amino acids. In some embodiments, the second portion comprises at least 128, 129, 130, 140, 150, 160, or 161 amino acids. In certain embodiments, the C-terminal amino acid(s) of the first portion comprise: a) RNP, NP, or P; or b) NPD, PD, or D. In some embodiments, the N-terminal amino acid(s) of the second portion comprise: a) DKA, DK, or D; or b) KAM, KA, or K. In some embodiments, the heterologous moiety is situated between any two adjacent amino acids of CGSLYTSHQDPLVHRNPDKAMKCRW (SEQ ID NO: 120), or an amino acid sequence having at least 80% (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) identity thereto, e.g., between positions 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 8 and 9, 9 and 10, 10 and 11, 11 and 12, 12 and 13, 13 and 14, 14 and 15, 15 and 16, 16 and 17, 17 and 18, 18 and 19, 19 and 20, 20 and 21, 21 and 22, 22 and 23, 23 and 24, or 24 and 25 of SEQ ID NO: 120. In other embodiments, one or more amino acids of SEQ ID NO: 120 are absent from the Cas12i2 fusion protein. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 sequential amino acids of SEQ ID NO: 120 that are N-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein. In other embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 sequential amino acids of SEQ ID NO: 120 that are C-terminal to the heterologous sequence are absent from the Cas12i2 fusion protein.


In certain embodiments, the heterologous sequence comprises at least one linker sequence. In some embodiments, the heterologous sequence comprises a first linker (e.g., a first peptide linker) and a second linker (e.g., a second peptide linker). In some embodiments, the first linker and the second linker each independently comprise between 3 and 70 amino acid residues (e.g., 5, 10, 15, 20, 25, 30, 35, 40, 50, 55, 60, 65, or 70, between 3-10, between 10-15, between 15-20, between 20-25, between 25-30, between 30-35, between 35-40, between 40-45, between 45-50, between 50-55, between 55-60, between 60-65, or between 65-70). In some embodiments, the first linker and the second linker each independently comprise one or more Gly residues and/or one or more Ser residues. In other embodiments, the first linker and the second linker each independently comprise (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In certain embodiments, the first linker and the second linker each independently comprise one or more proline residues. In some embodiments, the first linker is N-terminal of the deaminase domain, and the second linker is C-terminal of the deaminase domain. In certain embodiments, the first linker and the second linker have the same sequence. In some embodiments, the first linker and the second linker have different sequences.
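
By way of non-limiting illustration, the repeat-based linkers (GSG)x, (GGGS)x, and (GSSG)x may be generated as in the following Python sketch. The helper names build_linker and length_in_range are hypothetical. The sketch treats the repeat formula and the 3 to 70 residue length range as separate embodiments, as in the paragraph above, so a linker with x equal to 0 is an empty string that does not satisfy the length check.

```python
# Repeat units recited in the text for the first and second linkers.
REPEAT_UNITS = ("GSG", "GGGS", "GSSG")


def build_linker(unit, x):
    """Return the linker (unit)x for an integer x between 0 and 15."""
    if unit not in REPEAT_UNITS:
        raise ValueError(f"unsupported repeat unit: {unit}")
    if not 0 <= x <= 15:
        raise ValueError("x must be an integer between 0 and 15")
    return unit * x


def length_in_range(linker, low=3, high=70):
    """Check the 'between 3 and 70 amino acid residues' embodiment."""
    return low <= len(linker) <= high


first_linker = build_linker("GGGS", 3)
second_linker = build_linker("GSG", 2)
print(first_linker, length_in_range(first_linker))
print(second_linker, length_in_range(second_linker))
```

The two linkers are built independently, which mirrors the statement that they may have the same sequence or different sequences.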


In one aspect, the Cas12i fusion protein comprises

    • (a) a Cas12i (e.g., a Cas12i1, Cas12i2, Cas12i3, or Cas12i4) polypeptide, and
    • (b) a deaminase domain (e.g., any deaminase described herein), or a biologically active portion or variant thereof.


In some embodiments, the Cas12i polypeptide is a Cas12i1 polypeptide. In some embodiments, the Cas12i polypeptide is a Cas12i2 polypeptide. In some embodiments, the Cas12i polypeptide is a Cas12i3 polypeptide. In some embodiments, the Cas12i polypeptide is a Cas12i4 polypeptide.


In some embodiments, the deaminase domain is N-terminal of the Cas12i polypeptide. In certain embodiments, the deaminase domain is C-terminal of the Cas12i polypeptide.


In certain embodiments, the fusion protein does not comprise a linker sequence.


In some embodiments, the fusion protein further comprises at least one heterologous sequence, which is heterologous to both the Cas12i domain and the deaminase domain. In certain embodiments, the heterologous sequence comprises a Uracil Glycosylase Inhibitor (UGI) polypeptide. In some embodiments, the heterologous sequence comprises a nuclear localization sequence (NLS) polypeptide.


In some embodiments, the fusion protein comprises one, two, or three of:

    • i. a first heterologous sequence situated between the Cas12i domain and the deaminase domain;
    • ii. a second heterologous sequence situated between the Cas12i domain and the terminus nearest the Cas12i domain; or
    • iii. a third heterologous sequence situated between the deaminase domain and the terminus nearest the deaminase domain.


In certain embodiments, the deaminase domain is N-terminal of the Cas12i domain and the UGI domain. In certain embodiments, the deaminase domain is C-terminal of the Cas12i domain. In some embodiments, the deaminase domain is N-terminal of the Cas12i domain, the first heterologous sequence comprises the UGI domain, the second heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide), and the third heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide).


In some embodiments, the deaminase domain is C-terminal of the Cas12i domain, the first heterologous sequence comprises a linker, the second heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide) and the UGI domain, and the third heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide).


In certain embodiments, the deaminase domain is C-terminal of the Cas12i domain, the first heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide), the second heterologous sequence is absent or comprises a peptide tag (e.g., a peptide purification tag), and the third heterologous sequence comprises a UGI domain and an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide).
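
By way of non-limiting illustration, the domain orders implied by the first, second, and third heterologous sequences may be written out from N-terminus to C-terminus as in the following Python sketch. The function name layout and the string labels are hypothetical. Where a single heterologous sequence comprises two elements (e.g., an NLS polypeptide and the UGI domain), the paragraphs above do not fix their relative order, so the sketch keeps them as one combined label.

```python
def layout(deaminase_side, first=None, second=None, third=None):
    """Return the parts of a fusion protein in N-to-C order.

    deaminase_side: "N" if the deaminase domain is N-terminal of the
    Cas12i domain, "C" if it is C-terminal of the Cas12i domain.
    first:  heterologous sequence between the Cas12i and deaminase domains.
    second: heterologous sequence between the Cas12i domain and its
            nearest terminus.
    third:  heterologous sequence between the deaminase domain and its
            nearest terminus.
    Absent sequences are passed as None and omitted from the result.
    """
    if deaminase_side == "N":
        order = [third, "deaminase", first, "Cas12i", second]
    elif deaminase_side == "C":
        order = [second, "Cas12i", first, "deaminase", third]
    else:
        raise ValueError("deaminase_side must be 'N' or 'C'")
    return [part for part in order if part is not None]


# Deaminase N-terminal; UGI between the domains; an NLS at each end.
print(layout("N", first="UGI", second="NLS", third="NLS"))
# Deaminase C-terminal; linker between the domains.
print(layout("C", first="linker", second="NLS+UGI", third="NLS"))
# Deaminase C-terminal; second heterologous sequence absent.
print(layout("C", first="NLS", third="UGI+NLS"))
```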


In some embodiments, the first heterologous sequence comprises the UGI polypeptide. In certain embodiments, the UGI polypeptide is flanked by peptide linkers. In some embodiments, the second and third heterologous sequence each independently comprise an NLS polypeptide.


In some embodiments, the first heterologous sequence comprises a peptide linker and, when present, the second heterologous sequence or the third heterologous sequence comprises an NLS polypeptide, one or more linkers, and a UGI polypeptide. In certain embodiments, the NLS polypeptide is N-terminal of the UGI polypeptide.


In some embodiments, the NLS polypeptide is C-terminal of the UGI polypeptide.


In certain embodiments, one of the one or more linkers is situated between the NLS polypeptide and the UGI polypeptide. In certain embodiments, the third heterologous sequence comprises a first and a second linker, wherein the first linker is situated N-terminal of the NLS polypeptide and the UGI polypeptide and the second linker is situated between the NLS polypeptide and the UGI polypeptide. In some embodiments, the first heterologous sequence further comprises an NLS sequence. In certain embodiments, the NLS polypeptide is situated N-terminal of the linker.


In some embodiments, the fusion protein does not comprise the second heterologous sequence.


In one aspect, the disclosure provides a fusion protein comprising:

    • (a) a Cas12i (e.g., a Cas12i1, Cas12i2, Cas12i3, or Cas12i4) polypeptide;
    • (b) a deaminase domain; and
    • (c) a UGI polypeptide.


In some embodiments, the deaminase domain is N-terminal of the Cas12i domain and the UGI domain. In certain embodiments, the deaminase domain is C-terminal of the Cas12i domain. In some embodiments, the deaminase domain is N-terminal of the Cas12i4 polypeptide. In some embodiments, the deaminase domain is C-terminal of the Cas12i4 domain.


In some embodiments, the fusion protein does not comprise a linker sequence.


In some embodiments, the fusion protein comprises at least one heterologous sequence. In certain embodiments, the heterologous sequence is heterologous to each of the Cas12i domain (e.g., Cas12i4 domain), the deaminase domain, and the UGI polypeptide. In certain embodiments, the heterologous sequence comprises a nuclear localization sequence (NLS) polypeptide.


In some embodiments, the fusion protein comprises one, two, or three of:

    • i. a first heterologous sequence situated between the Cas12i domain and the deaminase domain;
    • ii. a second heterologous sequence situated between the Cas12i domain and the terminus nearest the Cas12i domain; or
    • iii. a third heterologous sequence situated between the deaminase domain and the terminus nearest the deaminase domain.


In some embodiments, the fusion protein does not comprise the first heterologous sequence, and the UGI domain is situated between the deaminase domain and the Cas12i domain.


In certain embodiments, the UGI domain is situated C-terminal of both the deaminase domain and the Cas12i domain.


In certain embodiments, the UGI domain is flanked by peptide linkers.


In some embodiments, when present, the first and second heterologous sequence each independently comprise an NLS polypeptide.


In certain embodiments, the one, two, or three of the first, the second, and the third heterologous sequence each independently comprise one or more peptide linkers and the third heterologous sequence comprises an NLS polypeptide. In certain embodiments, the NLS polypeptide is N-terminal of the UGI polypeptide. In some embodiments, the NLS polypeptide is C-terminal of the UGI polypeptide. In some embodiments, one of the one or more linkers is situated between the NLS polypeptide and the UGI polypeptide.


In certain embodiments, the third heterologous sequence comprises a first and a second linker, wherein the first linker is situated N-terminal of the NLS polypeptide and the UGI polypeptide and the second linker is situated between the NLS polypeptide and the UGI polypeptide.


In some embodiments, the first heterologous sequence comprises an NLS sequence. In certain embodiments, the NLS polypeptide is situated N-terminal of the linker.


In some embodiments, the Cas12i fusion protein is a fusion protein of Table 4. In some embodiments, a Cas12i fusion protein comprises an amino acid sequence of any one of SEQ ID NOs: 41-46.


In some embodiments, a Cas12i fusion protein is a polypeptide of Table 8. In some embodiments, a Cas12i fusion protein comprises an amino acid sequence of any one of SEQ ID NOs: 60-65.


In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 60, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can produce a substitution in a target sequence of a target nucleic acid.


In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 61, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can produce a substitution in a target sequence of a target nucleic acid.


In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 62, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can produce a substitution in a target sequence of a target nucleic acid.


In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 63, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can produce a substitution in a target sequence of a target nucleic acid.


In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 64, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can produce a substitution in a target sequence of a target nucleic acid.
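
By way of non-limiting illustration, a percent identity value may be compared against the recited thresholds as in the following Python sketch. The sketch assumes that the two sequences have already been aligned to equal length by a separate alignment tool, with "-" marking gaps, and that identity is counted as matching positions divided by alignment length. That convention, the function names, and the toy sequences are assumptions made for illustration only; they are not a definition of percent identity taken from the disclosure.

```python
# Identity thresholds recited for SEQ ID NOs: 60-64.
THRESHOLDS = (80, 85, 90, 95, 97, 98, 99)


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue.

    Both inputs must be pre-aligned to the same length; "-" marks a gap
    and never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def thresholds_met(identity):
    """Return every recited threshold that the identity value reaches."""
    return [t for t in THRESHOLDS if identity >= t]


# Toy pre-aligned sequences that differ at one of ten positions.
reference = "VRGKSLVRGK"
candidate = "VRGKSLVRGA"
identity = percent_identity(reference, candidate)
print(identity, thresholds_met(identity))
```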


In one aspect, the disclosure provides a fusion protein that forms a complex with a ribonucleic acid (RNA) guide wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).


Exemplary Circularly Permuted Cas12i2 Fusion Proteins


In another aspect, the disclosure provides an engineered, non-naturally occurring Cas12i2 protein comprising:

    • a) a first portion comprising an amino acid sequence of an N-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and
    • b) a second portion comprising an amino acid sequence of a C-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto,
    • wherein the second portion is N-terminal of the first portion,
    • wherein the first portion and second portion together bind to an RNA guide comprising a direct repeat sequence and a spacer sequence. In some embodiments, the circularly permuted Cas12i2 protein is capable of specifically binding a target nucleic acid complementary to the spacer sequence.


In certain embodiments, the first portion and the second portion are linked by a heterologous sequence. In some embodiments, the heterologous sequence comprises one or more of:

    • a) a first linker (e.g., a first peptide linker);
    • b) a second linker (e.g., a second peptide linker); and
    • c) a fusion domain.


In some embodiments, the heterologous sequence comprises each of a first linker (e.g., a first peptide linker), a second linker (e.g., a second peptide linker), and a fusion domain, wherein the fusion domain is disposed between the first linker and the second linker. In certain embodiments, the first linker and the second linker, when present, comprise between 3 and 60 amino acid residues. In some embodiments, the first linker and the second linker each independently comprise the amino acid sequence (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15).
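
By way of non-limiting illustration, the rearrangement that yields a circularly permuted protein may be sketched in Python as follows. The sketch splits a parent sequence after residue n, places the C-terminal (second) portion first, and joins it to the N-terminal (first) portion through an optional heterologous sequence. The function name, the toy parent sequence, and the lowercase placeholder strings are hypothetical, and the choice of which linker adjoins which portion is an assumption, since the text specifies only that the fusion domain lies between the two linkers.

```python
def circularly_permute(parent, n, first_linker="", fusion_domain="",
                       second_linker=""):
    """Return second portion + heterologous sequence + first portion.

    parent: parent amino acid sequence (residue 1 is parent[0]).
    n: residue number of the C-terminal-most amino acid of the first
       portion, so the second portion begins at residue n + 1.
    """
    if not 0 < n < len(parent):
        raise ValueError("n must leave both portions non-empty")
    first_portion = parent[:n]    # residues 1..n
    second_portion = parent[n:]   # residues n+1..end
    heterologous = first_linker + fusion_domain + second_linker
    return second_portion + heterologous + first_portion


# Toy 10-residue parent; the real parent would be one of SEQ ID NOs: 2-7.
toy_parent = "ABCDEFGHIJ"
print(circularly_permute(toy_parent, 4))
# Lowercase strings are placeholders for linkers and a fusion domain.
print(circularly_permute(toy_parent, 4, "gsg", "x", "gsg"))
```

The original N-terminus and C-terminus of the parent become internal to the permuted protein, and the junction after residue n supplies the new termini.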


In some embodiments, the C-terminal most amino acid of the first portion is any amino acid residue of any one of SEQ ID NOs: 2-7 chosen from residues:

    • a) 342-358 (e.g., residue 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358);
    • b) 373-378 (e.g., residue 373, 374, 375, 376, 377, or 378);
    • c) 408-413 (e.g., residue 408, 409, 410, 411, 412, or 413);
    • d) 677-685 (e.g., residue 677, 678, 679, 680, 681, 682, 683, 684, or 685);
    • e) 718-723 (e.g., residue 718, 719, 720, 721, 722, or 723);
    • f) 771-782 (e.g., residue 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782);
    • g) 953-965 (e.g., residue 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965);
    • h) 55-65 (e.g., residue 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, or 65);
    • i) 99-105 (e.g., residue 99, 100, 101, 102, 103, 104, or 105);
    • j) 112-120 (e.g., residue 112, 113, 114, 115, 116, 117, 118, 119, or 120);
    • k) 195-206 (e.g., residue 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, or 206);
    • l) 241-250 (e.g., residue 241, 242, 243, 244, 245, 246, 247, 248, 249, or 250);
    • m) 583-594 (e.g., residue 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, or 594);
    • n) 877-901 (e.g., residue 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, or 901);
    • o) 173-179 (e.g., residue 173, 174, 175, 176, 177, 178, or 179);
    • p) 216-221 (e.g., residue 216, 217, 218, 219, 220, or 221);
    • q) 265-272 (e.g., residue 265, 266, 267, 268, 269, 270, 271, or 272);
    • r) 456-468 (e.g., residue 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, or 468);
    • s) 476-482 (e.g., residue 476, 477, 478, 479, 480, 481, or 482);
    • t) 498-513 (e.g., residue 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, or 513);
    • u) 614-625 (e.g., residue 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, or 625);
    • v) 977-982 (e.g., residue 977, 978, 979, 980, 981, or 982);
    • w) 1007-1012 (e.g., residue 1007, 1008, 1009, 1010, 1011, or 1012);
    • x) 386-397 (e.g., residue 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397); or
    • y) 831-844 (e.g., residue 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, or 844).


In some embodiments, the N-terminal most amino acid of the second portion is any amino acid residue of any one of SEQ ID NOs: 2-7 chosen from residues:

    • a) 342-358 (e.g., residue 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358);
    • b) 373-378 (e.g., residue 373, 374, 375, 376, 377, or 378);
    • c) 408-413 (e.g., residue 408, 409, 410, 411, 412, or 413);
    • d) 677-685 (e.g., residue 677, 678, 679, 680, 681, 682, 683, 684, or 685);
    • e) 718-723 (e.g., residue 718, 719, 720, 721, 722, or 723);
    • f) 771-782 (e.g., residue 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782);
    • g) 953-965 (e.g., residue 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965);
    • h) 55-65 (e.g., residue 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, or 65);
    • i) 99-105 (e.g., residue 99, 100, 101, 102, 103, 104, or 105);
    • j) 112-120 (e.g., residue 112, 113, 114, 115, 116, 117, 118, 119, or 120);
    • k) 195-206 (e.g., residue 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, or 206);
    • l) 241-250 (e.g., residue 241, 242, 243, 244, 245, 246, 247, 248, 249, or 250);
    • m) 583-594 (e.g., residue 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, or 594);
    • n) 877-901 (e.g., residue 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, or 901);
    • o) 173-179 (e.g., residue 173, 174, 175, 176, 177, 178, or 179);
    • p) 216-221 (e.g., residue 216, 217, 218, 219, 220, or 221);
    • q) 265-272 (e.g., residue 265, 266, 267, 268, 269, 270, 271, or 272);
    • r) 456-468 (e.g., residue 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, or 468);
    • s) 476-482 (e.g., residue 476, 477, 478, 479, 480, 481, or 482);
    • t) 498-513 (e.g., residue 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, or 513);
    • u) 614-625 (e.g., residue 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, or 625);
    • v) 977-982 (e.g., residue 977, 978, 979, 980, 981, or 982);
    • w) 1007-1012 (e.g., residue 1007, 1008, 1009, 1010, 1011, or 1012);
    • x) 386-397 (e.g., residue 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397); or
    • y) 831-844 (e.g., residue 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, or 844).
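
By way of non-limiting illustration, whether a candidate junction residue falls within one of the residue ranges a) through y) listed above may be checked as in the following Python sketch. The range values are copied from the list; the names LOOP_RANGES and loop_containing are hypothetical.

```python
# Residue ranges a) through y), in the order listed in the text.
LOOP_RANGES = [
    (342, 358), (373, 378), (408, 413), (677, 685), (718, 723),
    (771, 782), (953, 965), (55, 65), (99, 105), (112, 120),
    (195, 206), (241, 250), (583, 594), (877, 901), (173, 179),
    (216, 221), (265, 272), (456, 468), (476, 482), (498, 513),
    (614, 625), (977, 982), (1007, 1012), (386, 397), (831, 844),
]


def loop_containing(residue):
    """Return the (start, end) range holding residue, or None."""
    for start, end in LOOP_RANGES:
        if start <= residue <= end:
            return (start, end)
    return None


for candidate in (721, 500, 400):
    print(candidate, loop_containing(candidate))
```

The same lookup applies to the C-terminal-most residue of the first portion and to the N-terminal-most residue of the second portion, because both lists recite the same ranges.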


In any of the embodiments described herein, the circularly permuted Cas12i2 protein further comprises a second heterologous sequence at its N-terminus. In some embodiments, the circularly permuted Cas12i2 protein further comprises an additional heterologous sequence at its C-terminus. In some embodiments, the second heterologous sequence and/or the additional heterologous sequence are chosen from a deaminase, a purification tag, a stability tag, or a restriction endonuclease or restriction endonuclease domain.


In some embodiments, a circularly permutated Cas12i2 protein comprises a) a first portion comprising an amino acid sequence of an N-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and b) a second portion comprising an amino acid sequence of a C-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, wherein the second portion is N-terminal of the first portion, and wherein the C-terminal most amino acid of the first portion is an amino acid residue of a flexible loop within the Helical II, Helical III, Nuc, or RuvC II domain. In some embodiments, the flexible loop is in proximity to or in contact with target DNA, such as a loop depicted in FIG. 12A-D.


In some embodiments, a circularly permutated Cas12i2 protein comprises a) a first portion comprising an amino acid sequence of an N-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and b) a second portion comprising an amino acid sequence of a C-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, wherein the second portion is N-terminal of the first portion, and wherein the N-terminal most amino acid of the second portion is an amino acid residue of a flexible loop within the Helical II, Helical III, Nuc, or RuvC II domain. In some embodiments, the flexible loop is in proximity to or in contact with target DNA, such as a loop depicted in FIG. 12A-D.


In some embodiments, a circularly permutated Cas12i2 protein comprises a) a first portion comprising an amino acid sequence of an N-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and b) a second portion comprising an amino acid sequence of a C-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, wherein the second portion is N-terminal of the first portion, and wherein the C-terminal most amino acid of the first portion is any amino acid residue of any one of SEQ ID NOs: 2-7 chosen from residues a) 342-358 (e.g., residue 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358); b) 373-378 (e.g., residue 373, 374, 375, 376, 377, or 378); c) 386-397 (e.g., residue 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397); d) 677-685 (e.g., residue 677, 678, 679, 680, 681, 682, 683, 684, or 685); e) 771-782 (e.g., residue 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782); f) 831-844 (e.g., residue 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, or 844), or g) 953-965 (e.g., residue 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965). The positions of the residues are indicated in FIG. 12A-D.


In some embodiments, a circularly permutated Cas12i2 protein comprises a) a first portion comprising an amino acid sequence of an N-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and b) a second portion comprising an amino acid sequence of a C-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, wherein the second portion is N-terminal of the first portion, and wherein the N-terminal most amino acid of the second portion is any amino acid residue of any one of SEQ ID NOs: 2-7 chosen from residues a) 342-358 (e.g., residue 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358); b) 373-378 (e.g., residue 373, 374, 375, 376, 377, or 378); c) 386-397 (e.g., residue 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397); d) 677-685 (e.g., residue 677, 678, 679, 680, 681, 682, 683, 684, or 685); e) 771-782 (e.g., residue 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782); f) 831-844 (e.g., residue 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, or 844), or g) 953-965 (e.g., residue 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965). The positions of the residues are indicated in FIG. 12A-D.
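The rearrangement described in the preceding embodiments can be illustrated with a minimal Python sketch. It assumes the protein is handled as a plain one-letter amino acid string and that residue numbers are 1-based positions in the unpermuted reference sequence. The function names and the toy sequence are hypothetical and are not taken from the disclosure; only the loop ranges are copied from the text.

```python
# Loop ranges (1-based, inclusive) copied from the embodiments above.
LOOP_RANGES = [(342, 358), (373, 378), (386, 397), (677, 685),
               (771, 782), (831, 844), (953, 965)]


def in_listed_loop(residue: int) -> bool:
    """Return True if the 1-based residue number falls in a listed range."""
    return any(start <= residue <= end for start, end in LOOP_RANGES)


def circularly_permute(seq: str, last_of_first_portion: int) -> str:
    """Place the C-terminal portion in front of the N-terminal portion.

    last_of_first_portion is the 1-based position of the C-terminal-most
    residue of the first (originally N-terminal) portion.
    """
    if not 1 <= last_of_first_portion < len(seq):
        raise ValueError("breakpoint must leave two non-empty portions")
    first = seq[:last_of_first_portion]    # original N-terminal portion
    second = seq[last_of_first_portion:]   # original C-terminal portion
    return second + first                  # second portion is now N-terminal


# Toy 10-residue string for illustration only; NOT a Cas12i2 sequence.
toy = "ABCDEFGHIJ"
permuted = circularly_permute(toy, 4)
print(permuted)
```

In this sketch the first residue of the permuted string is the residue that immediately followed the breakpoint in the reference sequence, which corresponds to the "N-terminal most amino acid of the second portion" in the text.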


In some embodiments, a circularly permuted Cas12i2 protein is truncated relative to a Cas12i2 protein of any one of SEQ ID NOs: 2-7. In some embodiments, a circularly permuted Cas12i2 protein has a modified Helical II domain relative to the Cas12i2 protein of any one of SEQ ID NOs: 2-7. For example, in some embodiments, the circularly permuted Cas12i2 protein comprises substitutions or deletions in the Helical II domain relative to the sequence of any one of SEQ ID NOs: 2-7. In some embodiments, a circularly permuted Cas12i2 protein comprises a truncated Helical II domain. For example, in some embodiments, the circularly permuted Cas12i2 protein does not comprise one or more flexible loops or alpha helices of the Helical II domain. For example, in some embodiments, the circularly permuted Cas12i2 protein does not comprise the loop of residues 342-358 (or 343-357), the loop of residues 386-397 (or 387-396), or the alpha helices of residues 359-385 (or 358-386).


In some embodiments, the N-terminus of a circularly permutated Cas12i2 protein comprises at least one fusion domain. In some embodiments, the fusion domain is a FokI nuclease domain. See e.g., Ramirez et al., Nucleic Acids Res. 40(12): 5560-8 (2012) and Guilinger et al., Nature Biotechnology 32: 577-82 (2014). In some embodiments, the FokI nuclease domain is a catalytically active FokI nuclease domain. In some embodiments, the FokI nuclease domain is a dead (e.g., a catalytically inactive) FokI nuclease domain. In some embodiments, the circularly permuted Cas12i2 protein comprises a FokI nuclease domain at its N-terminus (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain). In some embodiments, the circularly permuted Cas12i2 protein comprises a FokI nuclease domain at its C-terminus (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain). In some embodiments, the circularly permuted Cas12i2 protein comprises a FokI nuclease domain at its N-terminus and at its C-terminus. In some embodiments, the circularly permuted Cas12i2 protein comprises a catalytically active FokI nuclease domain at its N-terminus and a catalytically active FokI nuclease domain at its C-terminus. In some embodiments, the circularly permuted Cas12i2 protein comprises a catalytically active FokI nuclease domain at its N-terminus and a catalytically inactive FokI nuclease domain at its C-terminus. In some embodiments, the circularly permuted Cas12i2 protein comprises a catalytically inactive FokI nuclease domain at its N-terminus and a catalytically active FokI nuclease domain at its C-terminus. In some embodiments, the circularly permuted Cas12i2 protein comprises a catalytically inactive FokI nuclease domain at its N-terminus and a catalytically inactive FokI nuclease domain at its C-terminus. 
In some embodiments wherein a circularly permuted Cas12i2 protein comprises a FokI nuclease domain at its N-terminus and at its C-terminus, the FokI nuclease domains form a dimer (e.g., a homodimer or a heterodimer). See, e.g., FIG. 11, FIG. 13A, and FIG. 13B.


In some embodiments, the FokI nuclease domain further comprises an additional fusion domain. In some embodiments, the FokI nuclease domain is a catalytically active FokI nuclease domain, and the additional fusion domain is a deaminase. In some embodiments, the FokI nuclease domain is a catalytically inactive FokI nuclease domain and the additional fusion domain is a deaminase.


In some embodiments, the circularly permuted Cas12i2 fusion protein further comprises an additional fusion domain. In some embodiments, the additional fusion domain is a deaminase. In some embodiments, the deaminase is fused to the N-terminus of the circularly permuted Cas12i2 fusion protein. In some embodiments, the deaminase is fused to the C-terminus of the circularly permuted Cas12i2 fusion protein. In some embodiments, the deaminase is inserted at an internal residue (e.g., a residue of a loop) of the circularly permuted Cas12i2 fusion protein.


In some embodiments, the circularly permuted Cas12i2 fusion protein further comprises a UGI polypeptide. In some embodiments, the UGI polypeptide is fused to the N-terminus of the circularly permuted Cas12i2 fusion protein. In some embodiments, the UGI polypeptide is fused to the C-terminus of the circularly permuted Cas12i2 fusion protein. In some embodiments, the UGI polypeptide is inserted at an internal residue (e.g., a residue of a loop) of the circularly permuted Cas12i2 fusion protein. In some embodiments, the UGI polypeptide is fused to a FokI nuclease domain (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain). In some embodiments, the circularly permuted Cas12i2 fusion protein does not comprise a UGI polypeptide.


In some embodiments, the circularly permuted Cas12i2 fusion protein further comprises at least one NLS. In some embodiments, the NLS is fused to the N-terminus of the circularly permuted Cas12i2 fusion protein. In some embodiments, the NLS is fused to the C-terminus of the circularly permuted Cas12i2 fusion protein. In some embodiments, the NLS polypeptide is inserted at an internal residue (e.g., a residue of a loop) of the circularly permuted Cas12i2 fusion protein. In some embodiments, the NLS is fused to a FokI nuclease domain (e.g., a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain).


In certain embodiments, the N-terminal Met residue of any one of SEQ ID NOs: 2-7 is absent. In some embodiments, the N-terminal residue of a circularly permuted Cas12i2 protein is a Met residue. In some embodiments, the Met residue is added to the N-terminus of any one of the circularly permuted Cas12i2 proteins described herein.


In some embodiments, the circularly permuted Cas12i2 protein is capable of binding an RNA guide comprising a direct repeat sequence and a spacer sequence, wherein the spacer sequence is capable of hybridizing to a target nucleic acid.


In any of the aspects described herein, the circularly permuted Cas12i2 protein comprises a catalytic residue (e.g., D599, E833, and D1019). In certain embodiments, the circularly permuted Cas12i2 protein comprises a mutation (e.g., an alanine mutation) at any one of amino acid residue D599, E833, or D1019 of any one of SEQ ID NOs: 2-7. In certain embodiments, the circularly permuted Cas12i2 protein is a dead Cas12i2 protein (e.g., a catalytically inactive Cas12i2 protein).
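As an illustration of how a point mutation such as D599A is written and applied, the following Python sketch parses the shorthand and substitutes the residue in a one-letter amino acid string. The function name and the toy sequence are hypothetical; the sketch assumes 1-based numbering relative to the unpermuted reference sequence, so for a circularly permuted protein the substitution would be applied before the portions are rearranged.

```python
import re


def apply_substitution(seq: str, mutation: str) -> str:
    """Apply a substitution written as <wild-type><position><new>, e.g. D599A."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(seq):
        raise ValueError("position lies outside the sequence")
    if seq[position - 1] != wild_type:
        raise ValueError("wild-type residue does not match the sequence")
    # Positions are 1-based, Python indices are 0-based.
    return seq[:position - 1] + new + seq[position:]


# Toy 5-residue string for illustration only; NOT a Cas12i2 sequence.
toy = "MKDLE"
mutated = apply_substitution(toy, "D3A")
print(mutated)
```

Checking the wild-type residue before substituting guards against applying a mutation to a sequence that uses a different numbering.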


In some embodiments, a circularly permuted Cas12i2 protein described herein comprises nickase activity. In some embodiments, a circularly permuted Cas12i2 protein described herein nicks the target strand of a target nucleic acid. In some embodiments, a circularly permuted Cas12i2 protein described herein nicks the non-target strand of a target nucleic acid. In some embodiments, a circularly permuted Cas12i2 protein described herein nicks a target sequence adjacent to a Cas12i2 PAM sequence (e.g., a 5′-NTTN-3′ sequence). See, e.g., FIG. 11.
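The 5′-NTTN-3′ PAM pattern mentioned above can be located in a DNA string with a short scan. The Python sketch below is illustrative only: it searches a single strand written 5′ to 3′, treats N as any of A, C, G, or T, and reports overlapping matches. The function name and the example sequence are hypothetical.

```python
def find_pam_sites(dna: str) -> list[int]:
    """Return 0-based start indices of every 5'-NTTN-3' match on the given strand."""
    dna = dna.upper()
    bases = set("ACGT")
    hits = []
    for i in range(len(dna) - 3):
        window = dna[i:i + 4]
        # Positions 2 and 3 must be T; positions 1 and 4 may be any base.
        if window[1:3] == "TT" and window[0] in bases and window[3] in bases:
            hits.append(i)
    return hits


# Hypothetical 10-nt example sequence.
sites = find_pam_sites("GATTCGGTTA")
print(sites)
```

To cover both strands of a double-stranded target, the same function could be run on the reverse complement of the input.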


NLS Polypeptides


In some embodiments, a Cas12i2 fusion protein comprises a nuclear localization sequence (also known as a nuclear localization signal) that promotes translocation through the nuclear envelope via nuclear pore complexes. The nuclear pore complex is composed of nucleoporins. Nucleoporins interact with transport molecules known as karyopherins. Karyopherins bind to proteins containing a nuclear localization sequence and transport the protein across the nuclear pore complex. In some embodiments, a nuclear localization sequence consists of one or more short (e.g., <50 amino-acid residues) sequences of basic amino acids. In some embodiments, a nuclear localization sequence consists of one or more short (e.g., <50 amino-acid residues) sequences of lysines or arginines. In some embodiments, the nuclear localization sequence is monopartite or bipartite.


In some embodiments, the NLS polypeptide is selected from a nuclear plasma NLS (npNLS) polypeptide or a bipartite NLS (bpNLS) polypeptide. In some embodiments, the npNLS polypeptide comprises an amino acid sequence having at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 36. In some embodiments, the bpNLS polypeptide comprises an amino acid sequence having at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 38.


In some embodiments, the nuclear localization sequence is disposed in the middle of the Cas12i2 fusion protein and is exposed on the fusion protein surface. In some embodiments, a nuclear localization sequence is recognized by a karyopherin. In some embodiments, the nuclear localization sequence interacts with one or more karyopherins. In some embodiments, the karyopherin recognizes a nuclear localization sequence as it emerges from a ribosome. In some embodiments, the karyopherin recognizes a nuclear localization sequence on a fully translated protein.


In some embodiments, the nuclear localization sequence is defined as the nuclear localization sequence from the proteins listed in Table 6 of US 2015-0246139, which is incorporated by reference herein.


Cas12i Polypeptide Systems


Also provided within this disclosure is a polypeptide system comprising:

    • (a) a first polypeptide comprising a Cas12i domain and a first dimerization domain, and
    • (b) a second polypeptide comprising a deaminase domain and a second, compatible dimerization domain.


In some embodiments, the first polypeptide comprises a first peptide linker situated between the Cas12i domain and the first dimerization domain.


In certain embodiments, the second polypeptide comprises a second peptide linker situated between the deaminase domain and the second dimerization domain.


In some embodiments, the first polypeptide and the second polypeptide form a complex.


In some embodiments, the disclosure provides a first nucleic acid sequence encoding the first polypeptide and a second nucleic acid sequence encoding the second polypeptide. The first and second nucleic acid sequences may be in the same or different nucleic acid molecules.


Dimerization Domains


In some embodiments, a protein described herein, e.g., a polypeptide comprising a Cas12i domain, a polypeptide comprising a deaminase domain, or a Cas12i fusion protein, comprises a dimerization domain. Typically, a dimerization domain is a polypeptide domain capable of specifically binding a separate, and compatible, polypeptide domain (e.g., a second compatible dimerization domain). In some embodiments, the dimer is formed by a non-covalent bond between the first dimerization domain and the second compatible dimerization domain. In some embodiments, the first dimerization domain and the second compatible dimerization domain have identical sequences (e.g., form a homodimer). In some embodiments, the first dimerization domain and the second dimerization domain do not have identical sequences (e.g., form a heterodimer). In some embodiments, a dimerization domain is a leucine zipper. In some instances, the dimerization domain is a nanobody, antibody, or coiled-coil domain. In some instances, the dimerization domain is a chemically inducible dimerization domain (e.g., a rapamycin sensitive dimerization domain) that can be regulated by the presence of a small molecule. In some embodiments, the dimerization domain is a light inducible dimerization domain (e.g., a far-red light inducible dimerization domain) that can be regulated by light exposure.


Linkers


In some instances, a linker is a covalent linkage or connection between two or more components described herein. In some embodiments, the linker comprises a chemical linker. In some embodiments, a linker is a peptide linker. In some instances, the linker(s) is located N-terminal of the fusion domain. In some instances, the linker(s) is located C-terminal of the fusion domain. In some instances, a first linker is located N-terminal of the fusion domain and the second linker is located C-terminal of the fusion domain. In some embodiments, a first linker(s) is located C-terminal of a first fusion domain and a second linker is located N-terminal of a second fusion domain.


In some embodiments, a heterologous sequence comprises one or more linkers (e.g., peptide linkers) of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, or more amino acid residues. In some embodiments, the linker can be located N-terminal of a fusion domain. In certain embodiments, the linker can be located C-terminal of a fusion domain. The linker sequence may comprise any naturally occurring amino acid. In some embodiments, the linker sequence may comprise between 2 and 200 amino acid residues. In some embodiments, the linker comprises amino acids glycine and serine. In some embodiments, the linker comprises sets of glycine and serine repeats such as (G4S)x, where x is a positive integer between 0 and 15 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In some embodiments, the linker comprises an amino acid sequence of (GSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In some embodiments, the linker comprises an amino acid sequence of (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In some embodiments, the linker can comprise the amino acid sequence of any of the following:


Linker Amino Acid Sequence          SEQ ID NO
GGGGS                               SEQ ID NO: 121
GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS      SEQ ID NO: 122
GGGGSGGGGSGGGGS                     SEQ ID NO: 123
GSSG                                SEQ ID NO: 124
GSSGGSSG                            SEQ ID NO: 125
GSSGGSSGGSSG                        SEQ ID NO: 126
GSSGGSSGGSSGGSSG                    SEQ ID NO: 127
GSG                                 SEQ ID NO: 128
GSGGSGGSGGSG                        SEQ ID NO: 129
GGGS                                SEQ ID NO: 130


In some embodiments, the linker comprises the 16 residue “XTEN” linker, or a variant thereof (see, e.g., Schellenberger et al., Nat. Biotechnol. 27: 1186-1190, 2009, the entirety of which is incorporated herein by reference).


In some embodiments, any peptide linker described herein may further comprise between 1-5 (e.g., 1, 2, 3, 4, or 5) amino acid residues N-terminal or C-terminal of the peptide linker. The 1-5 amino acid residues N-terminal or C-terminal of the peptide linker can comprise any naturally occurring or modified amino acid residue.


Also included within the scope of the invention are linkers described in WO2012/138475, incorporated herein by reference in its entirety.


In some embodiments, the peptide linker comprises the structure of:

L1-L2-L3

    • wherein L1 and L3 are each independently chosen from (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15), and
    • L2 is a polypeptide comprising between 0-40 (e.g., 0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, or 35-40) amino acid residues.


In certain embodiments, L2 is an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SGSETPGTSESATPES (SEQ ID NO: 106).


In certain embodiments, the peptide linker comprises an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 40 or 106.
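The L1-L2-L3 arrangement can be made concrete with a small Python sketch that assembles a linker string from repeat units and a central segment. This is an illustration under stated assumptions, not a construct taken from the disclosure: the function names are hypothetical, and the choice of two GSSG repeats on each side is arbitrary. Only the repeat units, the 0-15 repeat range, the 0-40 residue limit on L2, and the L2 sequence of SEQ ID NO: 106 come from the text.

```python
# L2 sequence as quoted in the text for SEQ ID NO: 106.
L2_SEQ_ID_106 = "SGSETPGTSESATPES"

# Repeat units named in the text for L1 and L3.
ALLOWED_UNITS = {"GSG", "GGGS", "GSSG"}


def repeat_unit(unit: str, x: int) -> str:
    """Return (unit)x for one of the listed units and x between 0 and 15."""
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unit must be one of {sorted(ALLOWED_UNITS)}")
    if not 0 <= x <= 15:
        raise ValueError("x must be an integer between 0 and 15")
    return unit * x


def build_linker(l1_unit: str, l1_x: int, l2: str, l3_unit: str, l3_x: int) -> str:
    """Concatenate L1, L2, and L3 in N-terminal to C-terminal order."""
    if len(l2) > 40:
        raise ValueError("L2 is limited to 0-40 residues in this sketch")
    return repeat_unit(l1_unit, l1_x) + l2 + repeat_unit(l3_unit, l3_x)


# Arbitrary illustrative choice: (GSSG)2 - SEQ ID NO: 106 - (GSSG)2.
linker = build_linker("GSSG", 2, L2_SEQ_ID_106, "GSSG", 2)
print(linker, len(linker))
```

Because x may be 0, either flanking segment can be omitted, in which case the linker reduces to L2 alone or to L2 with a single flank.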


RNA Guide


In some embodiments, a composition as described herein comprises a nuclease binding sequence and a DNA-binding sequence. In some embodiments, an RNA guide comprises a nuclease binding sequence and a DNA-binding sequence. The RNA guide can bind any one of the Cas12i polypeptides described herein with specific binding affinity. In some embodiments, the RNA guide further comprises specific binding affinity to a target sequence. In some embodiments, a composition described herein comprises two or more RNA guides (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or more). In some embodiments, the RNA guide is encoded in a vector. In some embodiments, the vector comprises a Pol II promoter or a Pol III promoter.


In some embodiments, the RNA guide can associate with a Cas12i polypeptide described herein. In some embodiments, the RNA guide directs the polypeptide to a target nucleic acid sequence (e.g., DNA).


Nuclease Binding Sequence


In some embodiments, the nuclease binding sequence comprises a direct repeat sequence. In certain embodiments, the nuclease binding sequence includes a direct repeat sequence linked to a DNA-binding sequence (e.g., a DNA-targeting sequence or spacer). In some embodiments, the nuclease binding sequence includes a direct repeat sequence and a DNA-binding sequence or a direct repeat-DNA-binding sequence-direct repeat sequence. In some embodiments, the nuclease binding sequence includes a truncated direct repeat sequence and a DNA-binding sequence, which is typical of processed or mature crRNA.
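The direct repeat and spacer arrangements described above can be sketched as simple string concatenation. The Python example below is a hedged illustration: it assumes the direct repeat is placed 5′ of the spacer, restricts both parts to A, C, G, and U for simplicity, and uses a made-up 20-nt spacer. The function names and the spacer are hypothetical; the direct repeat is the sequence listed in Table 2 for SEQ ID NO: 17.

```python
RNA_BASES = set("ACGU")

# Direct repeat listed in Table 2 for SEQ ID NO: 17 (Cas12i2).
DIRECT_REPEAT_SEQ_ID_17 = "AGAAAUCCGUCUUUCAUUGACGG"


def check_rna(seq: str, name: str) -> str:
    """Upper-case the sequence and require a non-empty A/C/G/U string."""
    seq = seq.upper()
    if not seq or set(seq) - RNA_BASES:
        raise ValueError(f"{name} must be a non-empty A/C/G/U string")
    return seq


def build_rna_guide(direct_repeat: str, spacer: str, trailing_repeat: bool = False) -> str:
    """Return repeat + spacer, or repeat + spacer + repeat if trailing_repeat is set."""
    repeat = check_rna(direct_repeat, "direct repeat")
    guide = repeat + check_rna(spacer, "spacer")
    if trailing_repeat:
        guide += repeat
    return guide


# Hypothetical 20-nt spacer, for illustration only.
hypothetical_spacer = "GCUAGCUAGGAUCCAUGCAU"
guide = build_rna_guide(DIRECT_REPEAT_SEQ_ID_17, hypothetical_spacer)
print(guide, len(guide))
```

Setting `trailing_repeat=True` models the direct repeat-DNA-binding sequence-direct repeat arrangement mentioned in the same paragraph.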


In some embodiments, the direct repeat sequence comprises at least 90% identity to any one of SEQ ID NOs: 12-24. In some embodiments, the direct repeat sequence comprises at least 95% (e.g., at least 97%, at least 99%, or 100%) identity to any one of SEQ ID NOs: 12-24. In some embodiments, the direct repeat sequence comprises any one of SEQ ID NOs: 12-24. In some embodiments, the direct repeat sequence comprises a portion of any one of SEQ ID NOs: 12-24.


TABLE 2
Direct repeat sequences.

Sequence identifier   Direct Repeat Sequence                  Cas12i Description
SEQ ID NO: 12         GUUGGAAUGACUAAUUUUUGUGCCCACCGUUGGCAC    Cas12i1 (SEQ ID NO: 3 of U.S. Pat. No. 10,808,245, SEQ ID NO: 8 of present application)
SEQ ID NO: 13         AAUUUUUGUGCCCAUCGUUGGCAC                Cas12i1 (SEQ ID NO: 3 of U.S. Pat. No. 10,808,245, SEQ ID NO: 8 of present application)
SEQ ID NO: 14         AUUUUUGUGCCCAUCGUUGGCAC                 Cas12i1 (SEQ ID NO: 3 of U.S. Pat. No. 10,808,245, SEQ ID NO: 8 of present application)
SEQ ID NO: 15         GUUGCAAAACCCAAGAAAUCCGUCUUUCAUUGACGG    Cas12i2 (SEQ ID NO: 5 of U.S. Pat. No. 10,808,245 or SEQ ID NOs: 2-7 of present application)
SEQ ID NO: 16         GCAACACCUAAGAAAUCCGUCUUUCAUUGACGGG      Cas12i2 (SEQ ID NO: 5 of U.S. Pat. No. 10,808,245 or SEQ ID NOs: 2-7 of present application)
SEQ ID NO: 17         AGAAAUCCGUCUUUCAUUGACGG                 Cas12i2 (SEQ ID NO: 5 of U.S. Pat. No. 10,808,245 or SEQ ID NOs: 2-7 of present application)
SEQ ID NO: 18         CUAGCAAUGACCUAAUAGUGUGUCCUUAGUUGACAU    Cas12i3 (SEQ ID NO: 14 of U.S. Pat. No. 10,808,245 or SEQ ID NO: 11 of present application)
SEQ ID NO: 19         CCUACAAUACCUAAGAAAUCCGUCCUAAGUUGACGG    Cas12i3 (SEQ ID NO: 14 of U.S. Pat. No. 10,808,245 or SEQ ID NO: 11 of present application)
SEQ ID NO: 20         AUAGUGUGUCCUUAGUUGACAU                  Cas12i3 (SEQ ID NO: 14 of U.S. Pat. No. 10,808,245 or SEQ ID NO: 11 of present application)
SEQ ID NO: 21         GUUGGAAUGACUAAUUUUUGUGCCCACCGUUGGCAC    Cas12i4 (SEQ ID NO: 16 of U.S. Pat. No. 10,808,245, SEQ ID NOs: 9 or 10 of present application)
SEQ ID NO: 22         CCCACAAUACCUGAGAAAUCCGUCCUACGUUGACGG    Cas12i4 (SEQ ID NO: 16 of U.S. Pat. No. 10,808,245, SEQ ID NOs: 9 or 10 of present application)
SEQ ID NO: 23         UCUCAACGAUAGUCAGACAUGUGUCCUCAGUGACAC    Cas12i4 (SEQ ID NO: 16 of U.S. Pat. No. 10,808,245, SEQ ID NOs: 9 or 10 of present application)
SEQ ID NO: 24         AGACAUGUGUCCUCAGUGACAC                  Cas12i4 (SEQ ID NO: 16 of U.S. Pat. No. 10,808,245, SEQ ID NOs: 9 or 10 of present application)


DNA-Binding Sequence


In some embodiments, the DNA-binding sequence is a DNA-targeting sequence (e.g., spacer) having a length of from about 7 nucleotides to about 100 nucleotides. For example, the spacer can have a length of from about 7 nucleotides to about 80 nucleotides, from about 7 nucleotides to about 50 nucleotides, from about 7 nucleotides to about 40 nucleotides, from about 7 nucleotides to about 30 nucleotides, from about 7 nucleotides to about 25 nucleotides, from about 7 nucleotides to about 20 nucleotides, or from about 7 nucleotides to about 19 nucleotides. For example, the spacer can have a length of from about 7 nucleotides to about 20 nucleotides, from about 7 nucleotides to about 25 nucleotides, from about 7 nucleotides to about 30 nucleotides, from about 7 nucleotides to about 35 nucleotides, from about 7 nucleotides to about 40 nucleotides, from about 7 nucleotides to about 45 nucleotides, from about 7 nucleotides to about 50 nucleotides, from about 7 nucleotides to about 60 nucleotides, from about 7 nucleotides to about 70 nucleotides, from about 7 nucleotides to about 80 nucleotides, from about 7 nucleotides to about 90 nucleotides, from about 7 nucleotides to about 100 nucleotides, from about 10 nucleotides to about 25 nucleotides, from about 10 nucleotides to about 30 nucleotides, from about 10 nucleotides to about 35 nucleotides, from about 10 nucleotides to about 40 nucleotides, from about 10 nucleotides to about 45 nucleotides, from about 10 nucleotides to about 50 nucleotides, from about 10 nucleotides to about 60 nucleotides, from about 10 nucleotides to about 70 nucleotides, from about 10 nucleotides to about 80 nucleotides, from about 10 nucleotides to about 90 nucleotides, or from about 10 nucleotides to about 100 nucleotides.


In some embodiments, the DNA-binding sequence may be generally designed to have a length of between 7 and 50 nucleotides or between 15 and 35 nucleotides (e.g., 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 or 50 nucleotides) and be complementary to a specific target sequence. In some embodiments, the RNA guide may be designed to be complementary to a specific DNA strand, e.g., of a genomic locus. In some embodiments, the DNA-binding sequence is designed to be complementary to a specific DNA strand, e.g., of a genomic locus.


In some embodiments, the DNA-binding sequence has at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity to a specific DNA sequence.


In some embodiments, a spacer or spacer sequence (e.g., the DNA-binding sequence) is a portion in an RNA guide that is the RNA equivalent of the target sequence (a DNA sequence). Typically, the spacer contains a sequence capable of binding to the non-PAM strand via base-pairing at the site complementary to the target sequence (in the PAM strand). In some instances, the spacer may be at least 75% identical to the target sequence (e.g., at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%), when considering T to be equivalent to U for the purpose of this comparison. In some instances, the spacer may be 100% identical to the target sequence when considering T to be equivalent to U for the purpose of this comparison.
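The identity comparison described above, in which T is treated as equivalent to U, can be sketched in a few lines of Python. The sketch assumes an ungapped, position-by-position comparison of two sequences of equal length, which is a simplification; the function name and the example sequences are hypothetical.

```python
def percent_identity(spacer_rna: str, target_dna: str) -> float:
    """Percent identity between a spacer and a target sequence, treating T as U."""
    spacer = spacer_rna.upper().replace("T", "U")
    target = target_dna.upper().replace("T", "U")
    if not spacer or len(spacer) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(spacer, target))
    return 100.0 * matches / len(spacer)


# Hypothetical 8-nt examples: a perfect match and a single mismatch.
full = percent_identity("ACGUUGCA", "ACGTTGCA")
partial = percent_identity("ACGUUGCC", "ACGTTGCA")
print(full, partial)
```

With one mismatch in eight positions the sketch reports 87.5%, which would satisfy the "at least 75%" and "at least 85%" thresholds in the text but not "at least 90%".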


In some instances, a polynucleotide is complementary to another when a first polynucleotide (e.g., a spacer sequence of an RNA guide) has a certain level of complementarity to a second polynucleotide (e.g., the complementary sequence of a target sequence) such that the first and second polynucleotides can form a double-stranded complex via base-pairing to permit an effector polypeptide that is complexed with the first polynucleotide to act on (e.g., cleave) the second polynucleotide. In some embodiments, the first polynucleotide may be substantially complementary to the second polynucleotide. In some embodiments, the first polynucleotide has at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementarity to the second polynucleotide. In some embodiments, the first polynucleotide is completely complementary to the second polynucleotide, i.e., having 100% complementarity to the second polynucleotide.
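Percent complementarity can be illustrated in a similar way. The Python sketch below assumes both sequences are written 5′ to 3′ and have equal length, pairs them in antiparallel orientation, and counts only Watson-Crick RNA:DNA pairs (A with T, U with A, G with C, C with G); G-U wobble pairing is deliberately not counted. The function name and example sequences are hypothetical.

```python
# Watson-Crick pairs written as (RNA base, DNA base).
PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(spacer_rna: str, dna_strand: str) -> float:
    """Percent of spacer positions that base-pair with the antiparallel DNA strand."""
    spacer = spacer_rna.upper()
    strand = dna_strand.upper()
    if not spacer or len(spacer) != len(strand):
        raise ValueError("sequences must be non-empty and of equal length")
    # Reverse the DNA strand so that position i of the spacer faces its partner.
    paired = sum((r, d) in PAIRS for r, d in zip(spacer, reversed(strand)))
    return 100.0 * paired / len(spacer)


# Hypothetical 5-nt examples: fully complementary and one mismatch.
full = percent_complementarity("AACGU", "ACGTT")
partial = percent_complementarity("AACGU", "ACGTA")
print(full, partial)
```

Here the DNA argument is the strand the spacer hybridizes to, i.e., the complementary sequence of the target sequence in the terminology of the text.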


In some embodiments, the DNA-binding sequence and specific DNA sequence do not base pair with 100% complementarity (e.g., there are mismatches between the DNA-binding sequence and specific DNA sequence). In some embodiments, mismatches between the DNA-binding sequence and the specific DNA sequence prevent retargeting by the Cas12i polypeptide.


In some embodiments, the DNA-binding sequence comprises only RNA bases. In some embodiments, the DNA-binding sequence comprises a DNA base (e.g., the spacer comprises at least one thymine). In some embodiments, the DNA-binding sequence comprises RNA bases and DNA bases (e.g., the DNA-binding sequence comprises at least one thymine and at least one uracil).


Modifications


An RNA guide or a nucleic acid sequence encoding a Cas12i polypeptide, a deaminase polypeptide, or Cas12i-deaminase fusion polypeptide may include one or more covalent modifications with respect to a reference sequence, in particular the parent polyribonucleotide, which are included within the scope of this invention.


Exemplary modifications can include any modification to the sugar, the nucleobase, the internucleoside linkage (e.g. to a linking phosphate/to a phosphodiester linkage/to the phosphodiester backbone), and any combination thereof. Some of the exemplary modifications provided herein are described in detail below.


The RNA guide or any of the nucleic acid sequences encoding components of the variant polypeptides may include any useful modification, such as to the sugar, the nucleobase, or the internucleoside linkage (e.g. to a linking phosphate/to a phosphodiester linkage/to the phosphodiester backbone). One or more atoms of a pyrimidine nucleobase may be replaced or substituted with optionally substituted amino, optionally substituted thiol, optionally substituted alkyl (e.g., methyl or ethyl), or halo (e.g., chloro or fluoro). In certain embodiments, modifications (e.g., one or more modifications) are present in each of the sugar and the internucleoside linkage. Modifications may be modifications of ribonucleic acids (RNAs) to deoxyribonucleic acids (DNAs), threose nucleic acids (TNAs), glycol nucleic acids (GNAs), peptide nucleic acids (PNAs), locked nucleic acids (LNAs) or hybrids thereof. Additional modifications are described herein.


In some embodiments, the modification may include a chemical or cellular induced modification. For example, some nonlimiting examples of intracellular RNA modifications are described by Lewis and Pan in “RNA modifications and structures cooperate to guide RNA-protein interactions” from Nat Reviews Mol Cell Biol, 2017, 18:202-210.


Different sugar modifications, nucleotide modifications, and/or internucleoside linkages (e.g., backbone structures) may exist at various positions in the sequence. One of ordinary skill in the art will appreciate that the nucleotide analogs or other modification(s) may be located at any position(s) of the sequence, such that the function of the sequence is not substantially decreased. The sequence may include from about 1% to about 100% modified nucleotides (either in relation to overall nucleotide content, or in relation to one or more types of nucleotide, i.e. any one or more of A, G, U or C) or any intervening percentage (e.g., from 1% to 20%, from 1% to 25%, from 1% to 50%, from 1% to 60%, from 1% to 70%, from 1% to 80%, from 1% to 90%, from 1% to 95%, from 10% to 20%, from 10% to 25%, from 10% to 50%, from 10% to 60%, from 10% to 70%, from 10% to 80%, from 10% to 90%, from 10% to 95%, from 10% to 100%, from 20% to 25%, from 20% to 50%, from 20% to 60%, from 20% to 70%, from 20% to 80%, from 20% to 90%, from 20% to 95%, from 20% to 100%, from 50% to 60%, from 50% to 70%, from 50% to 80%, from 50% to 90%, from 50% to 95%, from 50% to 100%, from 70% to 80%, from 70% to 90%, from 70% to 95%, from 70% to 100%, from 80% to 90%, from 80% to 95%, from 80% to 100%, from 90% to 95%, from 90% to 100%, and from 95% to 100%).


In some embodiments, the sequence may include sugar modifications (e.g., at the 2′ position or 4′ position) or replacement of the sugar at one or more ribonucleotides of the sequence, as well as backbone modifications, which include modification or replacement of the phosphodiester linkages. Specific examples of a sequence include, but are not limited to, sequences including modified backbones or no natural internucleoside linkages such as internucleoside modifications, including modification or replacement of the phosphodiester linkages. Sequences having modified backbones include, among others, those that do not have a phosphorus atom in the backbone. For the purposes of this application, and as sometimes referenced in the art, modified RNAs that do not have a phosphorus atom in their internucleoside backbone can also be considered to be oligonucleosides. In particular embodiments, a sequence will include ribonucleotides with a phosphorus atom in its internucleoside backbone.


Modified sequence backbones may include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates such as 3′-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates such as 3′-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3′-5′ to 5′-3′ or 2′-5′ to 5′-2′. Various salts, mixed salts and free acid forms are also included. In some embodiments, the sequence may be negatively or positively charged.


The modified nucleotides, which may be incorporated into the sequence, can be modified on the internucleoside linkage (e.g., phosphate backbone). Herein, in the context of the polynucleotide backbone, the phrases “phosphate” and “phosphodiester” are used interchangeably. Backbone phosphate groups can be modified by replacing one or more of the oxygen atoms with a different substituent. Further, the modified nucleosides and nucleotides can include the wholesale replacement of an unmodified phosphate moiety with another internucleoside linkage as described herein. Examples of modified phosphate groups include, but are not limited to, phosphorothioate, phosphoroselenates, boranophosphates, boranophosphate esters, hydrogen phosphonates, phosphoramidates, phosphorodiamidates, alkyl or aryl phosphonates, and phosphotriesters. Phosphorodithioates have both non-linking oxygens replaced by sulfur. The phosphate linker can also be modified by the replacement of a linking oxygen with nitrogen (bridged phosphoramidates), sulfur (bridged phosphorothioates), and carbon (bridged methylene-phosphonates).


The α-thio substituted phosphate moiety is provided to confer stability to RNA and DNA polymers through the unnatural phosphorothioate backbone linkages. Phosphorothioate DNA and RNA have increased nuclease resistance and consequently a longer half-life in a cellular environment.


In specific embodiments, a modified nucleoside includes an alpha-thio-nucleoside (e.g., 5′-O-(1-thiophosphate)-adenosine, 5′-O-(1-thiophosphate)-cytidine (α-thio-cytidine), 5′-O-(1-thiophosphate)-guanosine, 5′-O-(1-thiophosphate)-uridine, or 5′-O-(1-thiophosphate)-pseudouridine).


Other internucleoside linkages that may be employed according to the present invention, including internucleoside linkages which do not contain a phosphorus atom, are described herein.


In some embodiments, the sequence may include one or more cytotoxic nucleosides. For example, cytotoxic nucleosides may be incorporated into the sequence, such as a bifunctional modification. Cytotoxic nucleosides may include, but are not limited to, adenosine arabinoside, 5-azacytidine, 4′-thio-aracytidine, cyclopentenylcytosine, cladribine, clofarabine, cytarabine, cytosine arabinoside, 1-(2-C-cyano-2-deoxy-beta-D-arabino-pentofuranosyl)-cytosine, decitabine, 5-fluorouracil, fludarabine, floxuridine, gemcitabine, a combination of tegafur and uracil, tegafur ((RS)-5-fluoro-1-(tetrahydrofuran-2-yl)pyrimidine-2,4(1H,3H)-dione), troxacitabine, tezacitabine, 2′-deoxy-2′-methylidenecytidine (DMDC), and 6-mercaptopurine. Additional examples include fludarabine phosphate, N4-behenoyl-1-beta-D-arabinofuranosylcytosine, N4-octadecyl-1-beta-D-arabinofuranosylcytosine, N4-palmitoyl-1-(2-C-cyano-2-deoxy-beta-D-arabino-pentofuranosyl) cytosine, and P-4055 (cytarabine 5′-elaidic acid ester).


In some embodiments, the sequence includes one or more post-transcriptional modifications (e.g., capping, cleavage, polyadenylation, splicing, poly-A sequence, methylation, acylation, phosphorylation, methylation of lysine and arginine residues, acetylation, and nitrosylation of thiol groups and tyrosine residues, etc.). The one or more post-transcriptional modifications can be any post-transcriptional modification, such as any of the more than one hundred different nucleoside modifications that have been identified in RNA (Rozenski, J, Crain, P, and McCloskey, J. (1999). The RNA Modification Database: 1999 update. Nucl Acids Res 27: 196-197). In some embodiments, the first isolated nucleic acid comprises messenger RNA (mRNA). In some embodiments, the mRNA comprises at least one nucleoside selected from the group consisting of pyridin-4-one ribonucleoside, 5-aza-uridine, 2-thio-5-aza-uridine, 2-thiouridine, 4-thio-pseudouridine, 2-thio-pseudouridine, 5-hydroxyuridine, 3-methyluridine, 5-carboxymethyl-uridine, 1-carboxymethyl-pseudouridine, 5-propynyl-uridine, 1-propynyl-pseudouridine, 5-taurinomethyluridine, 1-taurinomethyl-pseudouridine, 5-taurinomethyl-2-thio-uridine, 1-taurinomethyl-4-thio-uridine, 5-methyl-uridine, 1-methyl-pseudouridine, 4-thio-1-methyl-pseudouridine, 2-thio-1-methyl-pseudouridine, 1-methyl-1-deaza-pseudouridine, 2-thio-1-methyl-1-deaza-pseudouridine, dihydrouridine, dihydropseudouridine, 2-thio-dihydrouridine, 2-thio-dihydropseudouridine, 2-methoxyuridine, 2-methoxy-4-thio-uridine, 4-methoxy-pseudouridine, and 4-methoxy-2-thio-pseudouridine.
In some embodiments, the mRNA comprises at least one nucleoside selected from the group consisting of 5-aza-cytidine, pseudoisocytidine, 3-methyl-cytidine, N4-acetylcytidine, 5-formylcytidine, N4-methylcytidine, 5-hydroxymethylcytidine, 1-methyl-pseudoisocytidine, pyrrolo-cytidine, pyrrolo-pseudoisocytidine, 2-thio-cytidine, 2-thio-5-methyl-cytidine, 4-thio-pseudoisocytidine, 4-thio-1-methyl-pseudoisocytidine, 4-thio-1-methyl-1-deaza-pseudoisocytidine, 1-methyl-1-deaza-pseudoisocytidine, zebularine, 5-aza-zebularine, 5-methyl-zebularine, 5-aza-2-thio-zebularine, 2-thio-zebularine, 2-methoxy-cytidine, 2-methoxy-5-methyl-cytidine, 4-methoxy-pseudoisocytidine, and 4-methoxy-1-methyl-pseudoisocytidine. In some embodiments, the mRNA comprises at least one nucleoside selected from the group consisting of 2-aminopurine, 2,6-diaminopurine, 7-deaza-adenine, 7-deaza-8-aza-adenine, 7-deaza-2-aminopurine, 7-deaza-8-aza-2-aminopurine, 7-deaza-2,6-diaminopurine, 7-deaza-8-aza-2,6-diaminopurine, 1-methyladenosine, N6-methyladenosine, N6-isopentenyladenosine, N6-(cis-hydroxyisopentenyl)adenosine, 2-methylthio-N6-(cis-hydroxyisopentenyl)adenosine, N6-glycinylcarbamoyladenosine, N6-threonylcarbamoyladenosine, 2-methylthio-N6-threonylcarbamoyladenosine, N6,N6-dimethyladenosine, 7-methyladenine, 2-methylthio-adenine, and 2-methoxy-adenine. In some embodiments, the mRNA comprises at least one nucleoside selected from the group consisting of inosine, 1-methyl-inosine, wyosine, wybutosine, 7-deaza-guanosine, 7-deaza-8-aza-guanosine, 6-thio-guanosine, 6-thio-7-deaza-guanosine, 6-thio-7-deaza-8-aza-guanosine, 7-methyl-guanosine, 6-thio-7-methyl-guanosine, 7-methylinosine, 6-methoxy-guanosine, 1-methylguanosine, N2-methylguanosine, N2,N2-dimethylguanosine, 8-oxo-guanosine, 7-methyl-8-oxo-guanosine, 1-methyl-6-thio-guanosine, N2-methyl-6-thio-guanosine, and N2,N2-dimethyl-6-thio-guanosine.


The sequence may or may not be uniformly modified along the entire length of the molecule. For example, one or more or all types of nucleotide (e.g., naturally-occurring nucleotides, purine or pyrimidine, or any one or more or all of A, G, U, C, I, pU) may or may not be uniformly modified in the sequence, or in a given predetermined sequence region thereof. In some embodiments, the sequence includes a pseudouridine. In some embodiments, the sequence includes an inosine, which may aid in the immune system characterizing the sequence as endogenous versus viral RNAs. The incorporation of inosine may also mediate improved RNA stability/reduced degradation. See for example, Yu, Z. et al. (2015) RNA editing by ADAR1 marks dsRNA as “self”. Cell Res. 25, 1283-1284, which is incorporated by reference in its entirety.


Target Sequence


The compositions disclosed herein are applicable for editing a variety of target sequences. In some embodiments, the target sequence is a DNA molecule, such as a DNA locus (referred to herein as a target sequence or an on-target sequence). In some embodiments, the target sequence is an RNA, such as an RNA locus or mRNA. In some embodiments, the target sequence is single-stranded (e.g., single-stranded DNA). In some embodiments, the target sequence is double-stranded (e.g., double-stranded DNA). In some embodiments, the target sequence comprises both single-stranded and double-stranded regions. In some embodiments, the target sequence is linear. In some embodiments, the target sequence is circular. In some embodiments, the target sequence comprises one or more modified nucleotides, such as methylated nucleotides, damaged nucleotides, or nucleotide analogs. In some embodiments, the target sequence is not modified. In some embodiments, a single-stranded target sequence does not require a PAM sequence.


The target sequence may be of any length, such as at least about any one of 100 bp, 200 bp, 500 bp, 1000 bp, 2000 bp, 5000 bp, 10 kb, 20 kb, 50 kb, 100 kb, 200 kb, 500 kb, 1 Mb, or longer. The target sequence may also comprise any sequence. In some embodiments, the target sequence is GC-rich, such as having at least about any one of 40%, 45%, 50%, 55%, 60%, 65%, or higher GC content. In some embodiments, the target sequence has a GC content of at least about 70%, 80%, or more. In some embodiments, the target sequence is a GC-rich fragment in a non-GC-rich target sequence. In some embodiments, the target sequence is not GC-rich. In some embodiments, the target sequence has one or more secondary structures or higher-order structures. In some embodiments, the target sequence is not in a condensed state, such as in chromatin, that would render the target sequence inaccessible to a ribonucleoprotein.
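
By way of non-limiting illustration, GC content may be computed as the percentage of G and C nucleotides in a sequence and compared against a chosen threshold. The following Python sketch assumes a DNA string over A, C, G, and T. The function names `gc_content` and `is_gc_rich` and the default threshold of 40% (the lowest value recited above) are assumptions made for illustration only.

```python
def gc_content(sequence):
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(1 for nt in seq if nt in "GC")
    return 100.0 * gc_count / len(seq)


def is_gc_rich(sequence, threshold=40.0):
    """Return True if the GC content is at least `threshold` percent.

    The default of 40% is an illustrative choice, not a required value.
    """
    return gc_content(sequence) >= threshold


# Illustrative sequences only.
for seq in ("GGCCAATT", "ATATATGC"):
    print(seq, f"{gc_content(seq):.1f}%", "GC-rich" if is_gc_rich(seq) else "not GC-rich")
```

A stricter embodiment (e.g., at least about 70%) can be modeled in the same sketch by passing a different `threshold` value.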


In some embodiments, the target sequence is present in a cell. In some embodiments, the target sequence is present in the nucleus of the cell. In some embodiments, the target sequence is endogenous to the cell. In some embodiments, the target sequence is a genomic DNA. In some embodiments, the target sequence is a chromosomal DNA. In some embodiments, the target sequence is a protein-coding gene or a functional region thereof, such as a coding region, or a regulatory element, such as a promoter, an enhancer, a 5′ or 3′ untranslated region, etc. In some embodiments, the target sequence is a non-coding gene, such as a transposon, miRNA, tRNA, ribosomal RNA, ribozyme, or lincRNA. In some embodiments, the target sequence is a plasmid.


In some embodiments, the target sequence is exogenous to a cell. In some embodiments, the target sequence is a viral nucleic acid, such as viral DNA or viral RNA. In some embodiments, the target sequence is a horizontally transferred plasmid. In some embodiments, the target sequence is integrated in the genome of the cell. In some embodiments, the target sequence is not integrated in the genome of the cell. In some embodiments, the target sequence is a plasmid in the cell. In some embodiments, the target sequence is present in an extrachromosomal array.


In some embodiments, the target sequence is an isolated nucleic acid, such as an isolated DNA or an isolated RNA. In some embodiments, the target sequence is present in a cell-free environment. In some embodiments, the target sequence is an isolated vector, such as a plasmid. In some embodiments, the target sequence is an ultrapure plasmid.


The target is a segment of the target sequence that hybridizes to the RNA guide. In some embodiments, the target sequence has only one copy of the target sequence. In some embodiments, the target sequence has more than one copy, such as at least about any one of 2, 3, 4, 5, 10, 100, or more copies of the target sequence. For example, a target sequence comprising a repeated sequence in a genome of a viral nucleic acid or a bacterium may be targeted by the Cas12i polypeptide.


In some embodiments, the target sequence is present in a readily accessible region of the target sequence. In some embodiments, the target sequence is in an exon of a target gene. In some embodiments, the target sequence is across an exon-intron junction of a target gene. In some embodiments, the target sequence is present in a non-coding region, such as a regulatory region of a gene. In some embodiments, wherein the target sequence is exogenous to a cell, the target sequence comprises a sequence that is not found in the genome of the cell.


Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell. Other suitable DNA/RNA binding conditions (e.g., conditions in a cell-free system) are known in the art; see, e.g., Sambrook, supra. The strand of the target sequence that is complementary to and hybridizes with the RNA guide is referred to as the “complementary strand” and the strand of the target sequence that is complementary to the “complementary strand” (and is therefore not complementary to the RNA guide) is referred to as the “noncomplementary strand” or “non-complementary strand”.


In some embodiments, the PAM sequence comprises 5′-NTTN-3′ wherein N is any nucleotide (e.g., A, G, T, or C). In other embodiments, a PAM sequence of the disclosure comprises the sequence 5′-TTY-3′ or 5′-TTB-3′, wherein Y is C or T, and B is G, T, or C. The PAM sequence may be immediately adjacent to the target sequence or, for example, within a small number (e.g., 1, 2, 3, 4, or 5) of nucleotides of the target sequence. In the case of a double-stranded target, the RNA guide binds to a first strand of the target and a PAM sequence as described herein is present in the second, complementary strand. In such a case, the PAM sequence is immediately adjacent to (or within a small number, e.g., 1, 2, 3, 4, or 5 nucleotides of) a sequence in the second strand that is complementary to the sequence in the first strand to which the binding moiety binds.
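
By way of non-limiting illustration, the degenerate PAM sequences recited above (N is any nucleotide; Y is C or T; B is G, T, or C) may be located in a DNA strand by expanding each degenerate symbol into a character class. The following Python sketch uses the standard `re` module. The function names `pam_to_regex` and `find_pam_sites` and the example strand are assumptions made for illustration only; the sketch reports PAM start indices on the strand supplied and does not itself determine which adjacent sequence is targeted.

```python
import re

# Degenerate symbols used in the PAM sequences described above.
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # any nucleotide
    "Y": "[CT]",    # C or T
    "B": "[CGT]",   # G, T, or C
}


def pam_to_regex(pam):
    """Translate a PAM such as 'NTTN' into a regular-expression string."""
    return "".join(DEGENERATE[symbol] for symbol in pam.upper())


def find_pam_sites(strand, pam):
    """Return 0-based start indices of every PAM occurrence in `strand` (5' to 3').

    A lookahead is used so that overlapping occurrences are all reported.
    """
    pattern = re.compile(f"(?={pam_to_regex(pam)})")
    return [match.start() for match in pattern.finditer(strand.upper())]


# Illustrative strand only.
strand = "GATTCAGTTG"
for pam in ("NTTN", "TTY", "TTB"):
    print(pam, find_pam_sites(strand, pam))
```

In this sketch the 5′-TTB-3′ pattern matches at more positions than 5′-TTY-3′ because B additionally permits G at the third position.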


In some embodiments, the target sequence is a gene that is involved in an immune response in a subject. In some embodiments, the target sequence is an immune checkpoint gene. In some embodiments, the target sequence is selected from the group consisting of: BCL11A intronic erythroid enhancer, CD3, Beta-2 microglobulin (B2M), T Cell Receptor Alpha Constant (TRAC), Programmed Cell Death 1 (PDCD1), T-cell receptor alpha, T-cell receptor beta, B-cell lymphoma/leukemia 11A (BCL11A), Cytotoxic T-Lymphocyte Antigen 4 (CTLA-4), chemokine (C-C motif) receptor 5 (gene/pseudogene) (CCR5), CXCR4 gene, CD160 molecule (CD160), adenosine A2a receptor (ADORA), CD276, B7-H3, B7-H4, BTLA, nicotinamide adenine dinucleotide phosphate NADPH oxidase isoform 2 (NOX2), V-domain Ig suppressor of T cell activation (VISTA), Sialic acid-binding immunoglobulin-type lectin 7 (SIGLEC7), Sialic acid-binding immunoglobulin-type lectin 9 (SIGLEC9), SIGLEC10, V-set domain containing T cell activation inhibitor 1 (VTCN1), B and T lymphocyte associated (BTLA), Indoleamine 2,3-dioxygenase (IDO), indoleamine 2,3-dioxygenase 1 (IDO1), Killer-cell Immunoglobulin-like Receptor (KIR), killer cell immunoglobulin-like receptor, three domains, long cytoplasmic tail, 1 (KIR3DL1), lymphocyte-activation gene 3 (LAG3), T-cell Immunoglobulin domain and Mucin domain 3 (TIM3), hepatitis A virus cellular receptor 2 (HAVCR2), natural killer cell receptor 2B4 (CD244), hypoxanthine phosphoribosyltransferase 1 (HPRT), T-cell immunoreceptor with Ig and ITIM domains (TIGIT), CD96 molecule (CD96), cytotoxic and regulatory T-cell molecule (CRTAM), leukocyte associated immunoglobulin like receptor 1 (LAIR1), adeno-associated virus integration site 1 (AAVS1), AAVS2, AAVS3, AAVS4, AAVS5, AAVS6, AAVS7, AAVS8, transforming growth factor beta receptor II (TGFBRII), transforming growth factor beta receptor I (TGFBR1), SMAD family member 2 (SMAD2), SMAD family member 3 (SMAD3), SMAD family member 4 (SMAD4), SKI proto-oncogene (SKI), 
SKI-like proto-oncogene (SKIL), egl-9 family hypoxia-inducible factor 1 (EGLN1), egl-9 family hypoxia-inducible factor 2 (EGLN2), egl-9 family hypoxia-inducible factor 3 (EGLN3), protein phosphatase 1 regulatory subunit 12C (PPP1R12C), TGFB induced factor homeobox 1 (TGIF1), tumor necrosis factor receptor superfamily member, tumor necrosis factor receptor superfamily member 10b (TNFRSF10B), tumor necrosis factor receptor superfamily member 10a (TNFRSF10A), BY55, B7H5, caspase 8 (CASP8), caspase 10 (CASP10), caspase 3 (CASP3), caspase 6 (CASP6), caspase 7 (CASP7), Fas associated via death domain (FADD), Fas cell surface death receptor (FAS), interleukin 10 receptor subunit alpha (IL10RA), interleukin 10 receptor subunit beta (IL10RB), heme oxygenase 2 (HMOX2), interleukin 6 receptor (IL6R), interleukin 6 signal transducer (IL6ST), c-src tyrosine kinase (CSK), phosphoprotein membrane anchor with glycosphingolipid microdomains 1 (PAG1), guanylate cyclase 1, soluble, beta 3 (GUCY1B3), signaling threshold regulating transmembrane adaptor 1 (SIT1), forkhead box P3 (FOXP3), PR domain 1 (PRDM1), basic leucine zipper transcription factor, ATF-like (BATF), guanylate cyclase 1, soluble, alpha 2 (GUCY1A2), guanylate cyclase 1, soluble, alpha 3 (GUCY1A3), guanylate cyclase 1, soluble, beta 2 (GUCY1B2), prolyl hydroxylase domain (PHD1, PHD2, PHD3) family of proteins, CD27, CD28, CD40, CD122, CD137, OX40, GITR, and ICOS. In some embodiments, the modified gene is programmed death ligand 1 (PD-L1), class II major histocompatibility complex transactivator (CIITA), citramalyl-CoA lyase (CLYBL), transthyretin (TTR), lactate dehydrogenase-A (LDHA), hydroxyacid oxidase-1 (HAO1), alanine-glyoxylate and serine-pyruvate aminotransferase (AGXT), glyoxylate reductase/hydroxypyruvate reductase (GRHPR), 4-hydroxy-2-oxoglutarate aldolase (HOGA), polypyrimidine tract binding protein 1 (PTBP1), stathmin 2 (STMN2), or actin beta (ACTB).


Base Editing


In some embodiments, a composition described herein introduces at least one edit into a target sequence of a target nucleic acid. In some embodiments, the edit may include a substitution relative to a wild-type nucleic acid sequence. In some embodiments, the edit is a one-nucleotide substitution. In some embodiments, the edit is a two-nucleotide substitution. In some embodiments, the edit is a three-nucleotide substitution. In some embodiments, the edit is a four-nucleotide substitution. In some embodiments, the edit is a five-nucleotide substitution.


In one aspect, the disclosure provides a method of producing an edit (e.g., a substitution) in a target sequence of a target nucleic acid (e.g., a target nucleic acid in a cell), the method comprising:

    • contacting the target nucleic acid (e.g., the target nucleic acid in the cell) with: (i) a Cas12i domain, (ii) an RNA guide, and (iii) a deaminase domain, or nucleic acid encoding (i), (ii), and (iii),
    • wherein the target nucleic acid comprises a nucleotide position 0, wherein nucleotide position 0 of the target strand hybridizes with the direct repeat-proximal end of the spacer sequence of the RNA guide, and a position x (wherein optionally x=20), wherein position x of the target strand hybridizes with the direct repeat-distal end of the spacer sequence of the RNA guide,
    • and wherein the target nucleic acid comprises an A or a C between positions 5-16 (e.g., between positions 7-12, e.g., between positions 8-11) on the target strand or the non-target strand,
    • wherein the A is mutated to an inosine (I) (e.g., converts an A:T base pair to an I:C, I:U, or I:A base pair) or the C is mutated to a U or T (e.g., converts a C:G base pair to a T:A base pair).


In one aspect, the disclosure provides a method of producing an edit (e.g., a substitution) in a target sequence of a target nucleic acid (e.g., a target nucleic acid in a cell), the method comprising:

    • contacting the target nucleic acid (e.g., the target nucleic acid in the cell) with (i) a fusion protein described herein or the polypeptide system described herein, and (ii) an RNA guide, or a nucleic acid encoding (i) and (ii),
    • thereby introducing the substitution.


In certain embodiments, the target nucleic acid comprises a nucleotide position 0, wherein nucleotide position 0 of the target strand hybridizes with the direct repeat-proximal end of the spacer sequence of the RNA guide, and a position x (wherein optionally x=20), wherein position x of the target strand hybridizes with the direct repeat-distal end of the spacer sequence of the RNA guide,

    • and wherein the target nucleic acid comprises an A or a C between positions 5-16 (e.g., between positions 7-12 (e.g., 7, 8, 9, 10, 11, or 12)) on the target strand or the non-target strand,
    • wherein the A is mutated to an inosine (I) or the C is mutated to a U (e.g., converts a C:G base pair to a T:A base pair).


In some embodiments, the method produces a C:G base pair to T:A base pair alteration in the target nucleic acid.


In certain embodiments, the alteration occurs at one or more C:G base pairs between positions 7-12 (e.g., 7, 8, 9, 10, 11, or 12) of the target nucleic acid.


It is understood that, herein, when a nucleic acid is said to comprise a particular nucleotide between specified positions, the end positions are included. For example, a nucleic acid comprising A between positions 8-11 could comprise the A at position 8, 9, 10, or 11.
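
By way of non-limiting illustration, the inclusive position convention described above may be applied to list candidate A or C nucleotides within a position window (e.g., positions 5-16, 7-12, or 8-11). The following Python sketch assumes that the input string lists the nucleotides of one strand in position order, so that index 0 corresponds to position 0 and the last index corresponds to position x. The function name `editable_positions`, its parameters, and the example sequence are assumptions made for illustration only; the sketch identifies candidate positions and does not predict editing efficiency.

```python
def editable_positions(region, start=5, end=16, bases=("A", "C")):
    """List (position, nucleotide) pairs for A or C within an inclusive window.

    region: nucleotides of one strand listed in position order, where index 0
        is position 0 and the last index is position x (an assumed convention).
    start, end: window bounds; both end positions are included.
    """
    seq = region.upper()
    last = min(end, len(seq) - 1)  # do not run past position x
    return [(pos, seq[pos]) for pos in range(start, last + 1) if seq[pos] in bases]


# Illustrative 21-nucleotide region covering positions 0 through 20.
example_region = "GGTTGACTGGATCGTTAGCTA"

print(editable_positions(example_region))                   # window 5-16
print(editable_positions(example_region, start=8, end=11))  # window 8-11
```

Because both end positions are included, an A located exactly at position 5 or position 16 of the example region is reported for the 5-16 window.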


In some embodiments wherein the Cas12i domain is a circularly permuted domain, the target nucleic acid comprises an alteration between positions 1-30. For example, in some embodiments, the alteration is between positions 1-30 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30), positions 1-25 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25), positions 1-20 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20), positions 5-25 (e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25), or positions 5-20 (e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20). In some embodiments wherein the Cas12i domain comprises a FokI nuclease domain, the target nucleic acid comprises an alteration between positions 1-30. For example, in some embodiments, the alteration is between positions 1-30 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30), positions 1-25 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25), positions 1-20 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20), positions 5-25 (e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25), or positions 5-20 (e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20). In some embodiments wherein the alteration is between positions 1-30, the alteration is in the target strand. In some embodiments wherein the alteration is between positions 1-30, the alteration is in the non-target strand.


In some embodiments, the cell is selected from a eukaryotic cell, a mammalian cell, or a human cell. In certain embodiments, the cell is in vivo. In some embodiments, the cell is ex vivo. In certain embodiments, the cell is in vitro.


Production


In some embodiments, a composition of the present invention comprising a Cas12i polypeptide and a deaminase or a Cas12i polypeptide-deaminase fusion can be prepared by (a) culturing bacteria which produce the Cas12i polypeptide and the deaminase polypeptide of the present invention, isolating the Cas12i polypeptide and the deaminase, optionally purifying the Cas12i polypeptide and the deaminase, and complexing the Cas12i polypeptide and the deaminase with the RNA guide. The Cas12i polypeptide and the deaminase can also be prepared by (b) a known genetic engineering technique, specifically, by isolating a gene encoding the Cas12i polypeptide and the deaminase of the present invention from bacteria, constructing a recombinant expression vector, and then transferring the vector into an appropriate host cell that expresses the RNA guide for expression of a recombinant protein that complexes with the RNA guide in the host cell. Alternatively, the Cas12i polypeptide and the deaminase can be prepared by (c) an in vitro coupled transcription-translation system and then complexed with the RNA guide. Bacteria that can be used for preparation of the Cas12i polypeptide and the deaminase of the present invention are not particularly limited as long as they can produce the Cas12i polypeptide and the deaminase of the present invention. Some nonlimiting examples of the bacteria include E. coli cells described herein.


Unless otherwise noted, all compositions and complexes and polypeptides provided herein are made in reference to the active level of that composition or complex or polypeptide, and are exclusive of impurities, for example, residual solvents or by-products, which may be present in commercially available sources. Enzymatic component weights are based on total active protein. All percentages and ratios are calculated by weight unless otherwise indicated. All percentages and ratios are calculated based on the total composition unless otherwise indicated. In the exemplified composition, the enzymatic levels are expressed by pure enzyme by weight of the total composition and unless otherwise specified, the ingredients are expressed by weight of the total compositions.


Vectors


The present invention provides a vector for expressing the Cas12i polypeptide and the deaminase described herein. Nucleic acids encoding the composition components described herein may be incorporated into a vector. In some embodiments, a vector of the invention includes a nucleotide sequence encoding the Cas12i polypeptide and the deaminase.


In some embodiments, the RNA guide or any portion thereof is encoded in a vector. In some embodiments, the vector comprises a Pol II promoter or a Pol III promoter.


The present invention also provides a vector that may be used for preparation of the Cas12i polypeptide and the deaminase and/or the RNA guide or compositions comprising the Cas12i polypeptide and the deaminase and/or the RNA guide as described herein. In some embodiments, the invention includes the composition or vector described herein in a cell. In some embodiments, the invention includes a method of expressing the composition comprising the Cas12i polypeptide and the deaminase and/or the RNA guide, or vector or nucleic acid encoding the Cas12i polypeptide and the deaminase and/or the RNA guide, in a cell. The method may comprise the steps of providing the composition, e.g., vector or nucleic acid, and delivering the composition to the cell.


Expression of natural or synthetic polynucleotides is typically achieved by operably linking a polynucleotide encoding the gene of interest, e.g., nucleotide sequence encoding the Cas12i polypeptide and the deaminase and/or the RNA guide, to a promoter and incorporating the construct into an expression vector. The expression vector is not particularly limited as long as it includes a polynucleotide encoding the Cas12i polypeptide and the deaminase and/or the RNA guide of the present invention and can be suitable for replication and integration in eukaryotic cells.


Typical expression vectors include transcription and translation terminators, initiation sequences, and promoters useful for expression of the desired polynucleotide. For example, plasmid vectors carrying a recognition sequence for RNA polymerase (pSP64, pBluescript, etc.) may be used. Vectors including those derived from retroviruses such as lentivirus are suitable tools to achieve long-term gene transfer since they allow long-term, stable integration of a transgene and its propagation in daughter cells. Examples of vectors include expression vectors, replication vectors, probe generation vectors, and sequencing vectors. The expression vector may be provided to a cell in the form of a viral vector.


Viral vector technology is well known in the art and described in a variety of virology and molecular biology manuals. Viruses which are useful as vectors include, but are not limited to, phage viruses, retroviruses, adenoviruses, adeno-associated viruses, herpes viruses, and lentiviruses. In general, a suitable vector contains an origin of replication functional in at least one organism, a promoter sequence, convenient restriction endonuclease sites, and one or more selectable markers.


The kind of the vector is not particularly limited, and a vector that can be expressed in host cells can be appropriately selected. To be more specific, depending on the kind of the host cell, a promoter sequence to ensure the expression of the effector polypeptide(s) from the polynucleotide is appropriately selected, and this promoter sequence and the polynucleotide are inserted into any of various plasmids etc. for preparation of the expression vector.


Additional promoter elements, e.g., enhancing sequences, regulate the frequency of transcriptional initiation. Typically, these are located in the region 30-110 bp upstream of the start site, although a number of promoters have recently been shown to contain functional elements downstream of the start site as well. Depending on the promoter, it appears that individual elements can function either cooperatively or independently to activate transcription.


Further, the disclosure should not be limited to the use of constitutive promoters. Inducible promoters are also contemplated as part of the disclosure. The use of an inducible promoter provides a molecular switch capable of turning on expression of the polynucleotide sequence to which it is operatively linked when such expression is desired or turning off the expression when expression is not desired. Examples of inducible promoters include, but are not limited to, a metallothionein promoter, a glucocorticoid promoter, a progesterone promoter, and a tetracycline promoter.


The expression vector to be introduced can also contain either a selectable marker gene or a reporter gene or both to facilitate identification and selection of expressing cells from the population of cells sought to be transfected or infected through viral vectors. In other aspects, the selectable marker may be carried on a separate piece of DNA and used in a co-transfection procedure. Both selectable markers and reporter genes may be flanked with appropriate transcriptional control sequences to enable expression in the host cells. Examples of such a marker include a dihydrofolate reductase gene and a neomycin resistance gene for eukaryotic cell culture; and a tetracycline resistance gene and an ampicillin resistance gene for culture of E. coli and other bacteria. By use of such a selection marker, it can be confirmed whether the polynucleotide encoding the effector polypeptide(s) of the present invention has been transferred into the host cells and then expressed without fail.


The preparation method for recombinant expression vectors is not particularly limited, and examples thereof include methods using a plasmid, a phage or a cosmid.


Methods of Expression


The present invention includes a method for protein expression, comprising translating the Cas12i polypeptide and the deaminase, and expressing the RNA guide described herein.


In some embodiments, a host cell described herein is used to express the Cas12i polypeptide and the deaminase and/or the RNA guide. The host cell is not particularly limited, and various known cells can be preferably used. Specific examples of the host cell include bacteria such as E. coli, yeasts (budding yeast, Saccharomyces cerevisiae, and fission yeast, Schizosaccharomyces pombe), nematodes (Caenorhabditis elegans), Xenopus laevis oocytes, and animal cells (for example, CHO cells, COS cells and HEK293 cells). The method for transferring the expression vector described above into host cells, i.e., the transformation method, is not particularly limited, and known methods such as electroporation, the calcium phosphate method, the liposome method and the DEAE dextran method can be used.


After a host is transformed with the expression vector, the host cells may be cultured, cultivated or bred, for production of the Cas12i polypeptide, the deaminase and/or the RNA guide. After expression of the Cas12i polypeptide, the deaminase and/or the RNA guide, the host cells can be collected and the Cas12i polypeptide, the deaminase and/or the RNA guide purified from the cultures according to conventional methods (for example, filtration, centrifugation, cell disruption, gel filtration chromatography, ion exchange chromatography, etc.).


In some embodiments, the methods for expression comprise translation of at least 5 amino acids, at least 10 amino acids, at least 15 amino acids, at least 20 amino acids, at least 50 amino acids, at least 100 amino acids, at least 150 amino acids, at least 200 amino acids, at least 250 amino acids, at least 300 amino acids, at least 400 amino acids, at least 500 amino acids, at least 600 amino acids, at least 700 amino acids, at least 800 amino acids, at least 900 amino acids, or at least 1000 amino acids of the effector polypeptide(s). In some embodiments, the methods for protein expression comprise translation of about 5 amino acids, about 10 amino acids, about 15 amino acids, about 20 amino acids, about 50 amino acids, about 100 amino acids, about 150 amino acids, about 200 amino acids, about 250 amino acids, about 300 amino acids, about 400 amino acids, about 500 amino acids, about 600 amino acids, about 700 amino acids, about 800 amino acids, about 900 amino acids, about 1000 amino acids or more of the Cas12i polypeptide and the deaminase.


A variety of methods can be used to determine the level of production of a mature Cas12i polypeptide, the deaminase and/or the RNA guide in a host cell. Such methods include, but are not limited to, for example, methods that utilize either polyclonal or monoclonal antibodies specific for the proteins or a labeling tag as described elsewhere herein. Exemplary methods include, but are not limited to, enzyme-linked immunosorbent assays (ELISA), radioimmunoassays (RIA), fluorescent immunoassays (FIA), and fluorescence-activated cell sorting (FACS). These and other assays are well known in the art (See, e.g., Maddox et al., J. Exp. Med. 158:1211 [1983]).


The present disclosure provides methods of in vivo expression of the Cas12i polypeptide and the deaminase and/or the RNA guide in a cell, comprising providing to a host cell a polyribonucleotide encoding the Cas12i polypeptide, the deaminase and/or the RNA guide, expressing the Cas12i polypeptide, the deaminase and/or the RNA guide in the cell, and obtaining the Cas12i polypeptide, the deaminase and/or the RNA guide from the cell.


Compositions and Formulations


The disclosure also provides a composition or formulation comprising a cell modified by a composition described herein. In some embodiments, the composition or formulation includes a cell or plurality of cells modified by a system described herein (e.g., (i) an RNA guide and (ii) a Cas12i fusion protein or a protein system comprising a Cas12i polypeptide and a deaminase polypeptide). In some embodiments, the composition or formulation includes a cell or plurality of cells comprising a substitution, insertion, or deletion described herein. In some embodiments, the composition or formulation includes a cell line modified by a system described herein. In some embodiments, the composition or formulation includes a cell line comprising a substitution, insertion, or deletion described herein. The composition or formulation can additionally include, optionally, media and/or instructions for use of the modified cell or cell line.


In some embodiments, the composition is a pharmaceutical composition. A pharmaceutical composition that is useful may be prepared, packaged, or sold in a formulation suitable for oral, rectal, vaginal, parenteral, topical, pulmonary, intranasal, intra-lesional, buccal, ophthalmic, intravenous, intra-organ or another route of administration. A pharmaceutical composition of the disclosure may be prepared, packaged, or sold in bulk, as a single unit dose, or as a plurality of single unit doses. As used herein, a “unit dose” is a discrete amount of the pharmaceutical composition comprising a predetermined number of cells. The number of cells is generally equal to the dosage of the cells which would be administered to a subject or a convenient fraction of such a dosage such as, for example, one-half or one-third of such a dosage.


A formulation of a pharmaceutical composition suitable for parenteral administration may comprise the cells combined with a pharmaceutically acceptable carrier, such as sterile water or sterile isotonic saline. Such a formulation may be prepared, packaged, or sold in a form suitable for bolus administration or for continuous administration. Some injectable formulations may be prepared, packaged, or sold in unit dosage form, such as in ampules or in multi-dose containers containing a preservative. Some formulations for parenteral administration include, but are not limited to, suspensions, solutions, emulsions in oily or aqueous vehicles, pastes, and implantable sustained-release or biodegradable formulations. Some formulations may further comprise one or more additional ingredients including, but not limited to, suspending, stabilizing, or dispersing agents.


The pharmaceutical composition may be prepared, packaged, or sold in the form of a sterile injectable aqueous or oily suspension or solution. This suspension or solution may be formulated according to the known art, and may comprise, in addition to the cells, additional ingredients such as the dispersing agents, wetting agents, or suspending agents described herein. Such sterile injectable formulation may be prepared using a non-toxic parenterally-acceptable diluent or solvent, such as water or saline. Other acceptable diluents and solvents include, but are not limited to, Ringer's solution, isotonic sodium chloride solution, and fixed oils such as synthetic mono- or di-glycerides. Other parenterally-administrable formulations that are useful include those which may comprise the cells in a packaged form, in a liposomal preparation, or as a component of a biodegradable polymer system. Some compositions for sustained release or implantation may comprise pharmaceutically acceptable polymeric or hydrophobic materials such as an emulsion, an ion exchange resin, a sparingly soluble polymer, or a sparingly soluble salt.


Kits and Uses


The invention also provides kits or systems that can be used, for example, to carry out a method described herein. In some embodiments, the kits or systems include a Cas12i polypeptide and a deaminase. In some embodiments, the kits or systems include a polynucleotide that encodes a Cas12i polypeptide and deaminase, and optionally the polynucleotide is comprised within a vector, e.g., as described herein. In some embodiments, the kits or systems include a Cas12i-deaminase fusion polypeptide. The kits or systems also can include a deaminase, and an RNA guide as described herein. The RNA guide of the kits or systems of the invention can be designed to target a sequence of interest. The Cas12i polypeptide, deaminase, and RNA guide can be packaged within the same vial or other vessel within a kit or system or can be packaged in separate vials or other vessels, the contents of which can be mixed prior to use. The kits or systems can additionally include, optionally, a buffer and/or instructions for use of the Cas12i polypeptide and deaminase, along with the RNA guide.


In some embodiments, the kit may be useful for research purposes. For example, in some embodiments, the kit may be useful to study gene function.


Delivery


Compositions described herein may be formulated, for example, including a carrier, such as a carrier and/or a polymeric carrier, e.g., a liposome, and delivered by known methods to a cell (e.g., a prokaryotic, eukaryotic, plant, or mammalian cell). Such methods include, but are not limited to, transfection (e.g., lipid-mediated, cationic polymers, calcium phosphate, dendrimers), electroporation or other methods of membrane disruption (e.g., nucleofection), viral delivery (e.g., lentivirus, retrovirus, adenovirus, AAV), microinjection, microprojectile bombardment (“gene gun”), fugene, direct sonic loading, cell squeezing, optical transfection, protoplast fusion, impalefection, magnetofection, exosome-mediated transfer, lipid nanoparticle-mediated transfer, and any combination thereof.


In some embodiments, compositions are delivered using an AAV particle comprising an AAV vector. In some embodiments, the AAV particle is an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, or AAV11 particle (e.g., an AAV8, AAV3, or AAV2 particle). In some embodiments, the AAV particle comprises an AAV capsid. In some embodiments, the AAV capsid comprises one or more AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, or AAV11 proteins. In some embodiments, all the protein components of the AAV capsid are proteins of the same AAV serotype (e.g., all AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, or AAV11 proteins). In some embodiments, a first protein component of the AAV capsid is a protein of a first AAV serotype, and a second protein component of the AAV capsid is a protein of a second different AAV serotype. In some embodiments, the AAV particle is a pseudotype particle. In some embodiments, the first AAV ITR is from a different AAV serotype than the serotype of one or more of the proteins of the AAV capsid. In some embodiments, the second AAV ITR is from a different AAV serotype than the serotype of one or more of the proteins of the AAV capsid. In some embodiments, the first AAV ITR is from the same AAV serotype as the serotype of one or more of the proteins of the AAV capsid. In some embodiments, the second AAV ITR is from the same AAV serotype as the serotype of one or more of the proteins of the AAV capsid.


In some embodiments, the method comprises delivering one or more nucleic acids (e.g., nucleic acids encoding the Cas12i polypeptide, deaminase, RNA guide), one or more transcripts thereof, and/or a pre-formed ribonucleoprotein to a cell. Exemplary intracellular delivery methods include, but are not limited to: viruses or virus-like agents; chemical-based transfection methods, such as those using calcium phosphate, dendrimers, liposomes, or cationic polymers (e.g., DEAE-dextran or polyethylenimine); non-chemical methods, such as microinjection, electroporation, cell squeezing, sonoporation, optical transfection, impalefection, protoplast fusion, bacterial conjugation, delivery of plasmids or transposons; particle-based methods, such as using a gene gun, magnetofection or magnet assisted transfection, particle bombardment; and hybrid methods, such as nucleofection. In some embodiments, the present application further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells. In some embodiments, a composition of the present invention is further delivered with an agent (e.g., compound, molecule, or biomolecule) that affects DNA repair or DNA repair machinery. In some embodiments, a composition of the present invention is further delivered with an agent (e.g., compound, molecule, or biomolecule) that affects the cell cycle.


Cells


In embodiments described herein the composition is delivered to or introduced into a cell. The cell described herein can be a variety of cells. In some embodiments, the cell is an isolated cell. In some embodiments, the cell is in cell culture or a co-culture of two or more cell types. In some embodiments, the cell is ex vivo. In some embodiments, the cell is obtained from a living organism and maintained in a cell culture. In some embodiments, the cell is a single-cellular organism.


In some embodiments, the cell is a prokaryotic cell. In some embodiments, the cell is a bacterial cell or derived from a bacterial cell. In some embodiments, the cell is an archaeal cell or derived from an archaeal cell.


In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a plant cell or derived from a plant cell. In some embodiments, the cell is a fungal cell or derived from a fungal cell. In some embodiments, the cell is an animal cell or derived from an animal cell. In some embodiments, the cell is an invertebrate cell or derived from an invertebrate cell. In some embodiments, the cell is a vertebrate cell or derived from a vertebrate cell. In some embodiments, the cell is a mammalian cell or derived from a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the cell is a zebra fish cell. In some embodiments, the cell is a primate cell. In some embodiments, the cell is a rodent cell. In some embodiments, the cell is synthetically made, sometimes termed an artificial cell.


In some embodiments, the cell is derived from a cell line. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, 293T, MF7, K562, HeLa, CHO, and transgenic varieties thereof. Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, the cell is an immortal or immortalized cell. In some embodiments, the cell is a stem cell such as a totipotent stem cell (e.g., omnipotent), a pluripotent stem cell, a multipotent stem cell, an oligopotent stem cell, or a unipotent stem cell. In some embodiments, the cell is an induced pluripotent stem cell (iPSC) or derived from an iPSC. In some embodiments, the cell is a mesenchymal stem cell. In some embodiments, the cell is an embryonic stem cell. In some embodiments, the cell is a hematopoietic stem cell. In some embodiments, the cell is a differentiated cell. For example, in some embodiments, the differentiated cell is a muscle cell (e.g., a myocyte), a fat cell (e.g., an adipocyte), a bone cell (e.g., an osteoblast, osteocyte, osteoclast), a blood cell (e.g., a monocyte, a lymphocyte, a neutrophil, an eosinophil, a basophil, a macrophage, an erythrocyte, or a platelet), a nerve cell (e.g., a neuron), an epithelial cell, an immune cell (e.g., a lymphocyte, a neutrophil, a monocyte, or a macrophage), a liver cell (e.g., a hepatocyte), a fibroblast, or a sex cell. In some embodiments, the cell is a terminally differentiated cell. For example, in some embodiments, the terminally differentiated cell is a neuronal cell, an adipocyte, a cardiomyocyte, a skeletal muscle cell, an epidermal cell, or a gut cell. In some embodiments, the cell is a glial cell. In some embodiments, the cell is a pancreatic islet cell, including an alpha cell, beta cell, delta cell, or enterochromaffin cell. In some embodiments, the cell is an immune cell. In some embodiments, the immune cell is a T cell. In some embodiments, the immune cell is a B cell. In some embodiments, the immune cell is a Natural Killer (NK) cell. In some embodiments, the immune cell is a Tumor Infiltrating Lymphocyte (TIL). In some embodiments, the cell is a mammalian cell, e.g., a human cell or primate cell or a murine cell. In some embodiments, the murine cell is derived from a wild-type mouse, an immunosuppressed mouse, or a disease-specific mouse model. In some embodiments, the cell is a cell within a living tissue, organ, or organism.


In some embodiments, the cell is a primary cell. For example, cultures of primary cells can be passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, 15 times or more. In some embodiments, the primary cells are harvested from an individual by any known method. For example, leukocytes may be harvested by apheresis, leukocytapheresis, density gradient separation, etc. Cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. can be harvested by biopsy. An appropriate solution may be used for dispersion or suspension of the harvested cells. Such solution can generally be a balanced salt solution (e.g., normal saline, phosphate-buffered saline (PBS), Hank's balanced salt solution, etc.), conveniently supplemented with fetal calf serum or other naturally occurring factors, in conjunction with an acceptable buffer at low concentration. Buffers can include HEPES, phosphate buffers, lactate buffers, etc. Cells may be used immediately, or they may be stored (e.g., by freezing). Frozen cells can be thawed and can be capable of being reused. Cells can be frozen in a DMSO, serum, medium buffer (e.g., 10% DMSO, 50% serum, 40% buffered medium), and/or some other such common solution used to preserve cells at freezing temperatures.
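As a worked illustration of the example freezing medium mentioned above (10% DMSO, 50% serum, 40% buffered medium), the following minimal Python sketch converts those percentages into component volumes for a chosen total volume. The function name `freezing_medium_volumes`, its parameters, and the 10 mL batch size are hypothetical and are not part of the disclosure.

```python
def freezing_medium_volumes(total_ml, recipe=None):
    """Split a total volume (mL) into per-component volumes by percentage.

    `recipe` maps component name -> percent (v/v). The default is the
    example composition given in the text; any other recipe is the
    caller's assumption.
    """
    if recipe is None:
        recipe = {"DMSO": 10, "serum": 50, "buffered medium": 40}
    if sum(recipe.values()) != 100:
        raise ValueError("component percentages must sum to 100")
    if total_ml <= 0:
        raise ValueError("total volume must be positive")
    return {name: total_ml * percent / 100 for name, percent in recipe.items()}


# Hypothetical 10 mL batch of the example freezing medium.
for component, volume in freezing_medium_volumes(10).items():
    print(f"{component}: {volume} mL")
```

For a 10 mL batch this yields 1 mL DMSO, 5 mL serum, and 4 mL buffered medium.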


In embodiments wherein a composition of the present invention is introduced into a plurality of cells, at least about 0.5% of the cells comprise the desired edit. In some embodiments, at least about 1% of the cells comprise the desired edit. In some embodiments, at least about 2% of the cells comprise the desired edit. In some embodiments, at least about 3% of the cells comprise the desired edit. In some embodiments, at least about 4% of the cells comprise the desired edit. In some embodiments, at least about 5% of the cells comprise the desired edit. In some embodiments, at least about 10% of the cells comprise the desired edit. In some embodiments, at least about 20% of the cells comprise the desired edit. In some embodiments, at least about 30% of the cells comprise the desired edit. In some embodiments, at least about 40% of the cells comprise the desired edit. In some embodiments, at least about 50% of the cells comprise the desired edit.
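The editing thresholds listed above can be checked against cell counts with a short calculation. The Python sketch below is illustrative only: the function names and the example counts are assumptions, and the word "about" in the text is not modeled (the comparison is a plain greater-than-or-equal test).

```python
# Thresholds (percent of cells comprising the desired edit) named in the text.
THRESHOLDS_PERCENT = [0.5, 1, 2, 3, 4, 5, 10, 20, 30, 40, 50]


def percent_edited(edited_cells, total_cells):
    """Return the percentage of cells that comprise the desired edit."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= edited_cells <= total_cells:
        raise ValueError("edited_cells must be between 0 and total_cells")
    return 100 * edited_cells / total_cells


def thresholds_met(edited_cells, total_cells):
    """List every threshold that the observed editing percentage reaches."""
    observed = percent_edited(edited_cells, total_cells)
    return [t for t in THRESHOLDS_PERCENT if observed >= t]


# Hypothetical counts: 12 edited cells out of 100 analyzed.
print(percent_edited(12, 100))
print(thresholds_met(12, 100))
```

How edited cells are counted (for example, by sequencing reads or by single-cell genotyping) is outside the scope of this sketch.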


In some embodiments, the composition or formulation comprising a cell modified by a Cas12i polypeptide, deaminase, and RNA guide as described herein may be useful as an expression system to manufacture biomolecules. For example, in some embodiments, the composition or formulation comprising the modified cell may be useful to produce biomolecules such as proteins (e.g., cytokines, antibodies, antibody-based molecules), peptides, lipids, carbohydrates, nucleic acids, amino acids, and vitamins. In other embodiments, the composition or formulation comprising the modified cell may be useful in the production of a viral vector such as a lentivirus, adenovirus, adeno-associated virus, and oncolytic virus vector. In some embodiments, the composition or formulation comprising the modified cell may be useful in cytotoxicity studies. In some embodiments, the composition or formulation comprising the modified cell may be useful as a disease model. In some embodiments, the composition or formulation comprising the modified cell may be useful in vaccine production. In some embodiments, the composition or formulation comprising the modified cell may be useful in therapeutics. For example, in some embodiments, the composition or formulation comprising the modified cell may be useful in cellular therapies such as transfusions and transplantations.


In some embodiments, the composition or formulation comprising a cell modified by a Cas12i polypeptide, deaminase, and RNA guide as described herein may be useful to establish a new cell line comprising a modified genomic sequence. In some embodiments, a modified cell of the disclosure is a modified stem cell (e.g., a modified totipotent/omnipotent stem cell, a modified pluripotent stem cell, a modified multipotent stem cell, a modified oligopotent stem cell, or a modified unipotent stem cell) that differentiates into one or more cell lineages comprising the deletion of the modified stem cell. The disclosure further provides organisms (such as animals, plants, or fungi) comprising or produced from a modified cell of the disclosure.


All references and publications cited herein are hereby incorporated by reference.


EXAMPLES

The following examples are provided to further illustrate some embodiments of the present invention but are not intended to limit the scope of the invention; it will be understood by their exemplary nature that other procedures, methodologies, or techniques known to those skilled in the art may alternatively be used.


Example 1—Base Editing Mediated by Cas12i2

This Example describes editing of multiple mammalian targets using inactivated Cas12i2 fused to a deaminase.


To generate base editing fusion constructs, the variant Cas12i2 of SEQ ID NO: 4 was first deactivated by mutating the catalytic D599 residue to alanine. The deactivated Cas12i2 variant (referred to as dCas12i2 herein and having the sequence set forth in SEQ ID NO: 25) was then fused to one of two cytidine deaminases: human APOBEC3A (A3A) (SEQ ID NO: 29) or Activation Induced Deaminase (AID) (SEQ ID NO: 28). In addition to the deaminase, a copy of Uracil Glycosylase Inhibitor (UGI) (SEQ ID NO: 31) was also fused. See Table 3. Various N- and C-terminal fusion combinations were generated, as shown in Table 4. Cas9 base editing constructs were also generated with either inactivated Cas9 (dCas9) or Cas9 nickase (nCas9) carrying the D10A mutation. Base editing constructs were cloned into a pcDNA3.1 backbone (Invitrogen).
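The aspartate-to-alanine substitution described above can be expressed as a simple in silico operation on a protein sequence. The Python sketch below is an illustration under stated assumptions: `substitute_residue` is a hypothetical helper, and it is applied to a 15-residue fragment taken from the region where SEQ ID NO: 4 and SEQ ID NO: 25 differ, so the position used (8) is an offset within that fragment and not the full-length numbering.

```python
def substitute_residue(sequence, position, expected, replacement):
    """Replace the residue at a 1-based position after checking what is there.

    Verifying the expected residue first guards against off-by-one errors
    and against applying the numbering of one sequence to another.
    """
    index = position - 1
    if not 0 <= index < len(sequence):
        raise IndexError(f"position {position} is outside the sequence")
    found = sequence[index]
    if found != expected:
        raise ValueError(f"expected {expected} at position {position}, found {found}")
    return sequence[:index] + replacement + sequence[index + 1:]


# Fragment of the variant Cas12i2 sequence surrounding the mutated aspartate.
fragment = "GDRFCGYDQNQTASH"
offset_in_fragment = 8  # position within the fragment, not the full protein

mutant = substitute_residue(fragment, offset_in_fragment, "D", "A")
print(mutant)  # GDRFCGYAQNQTASH
```

Applied to a full-length sequence, the same call would take the full-length position (for example, 599 for a D599A substitution) in place of the fragment offset.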









TABLE 3. Base editing construct components

Component
Sequence





Variant Cas12i2 (SEQ ID NO: 4)
MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLFGGIT
PEIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIASDNLVEKF
EEYYGGTASDAIKQYFSASIGESYYWNDCRQQYYDLCRELGVEVSD
LTHDLEILCREKCLAVATESNQNNSIISVLFGTGEKEDRSVKLRITKKI
LEAISNLKEIPKNVAPIQEIILNVAKATKETFRQVYAGNLGAPSTLEKF
IAKDGQKEFDLKKLQTDLKKVIRGKSKERDWCCQEELRSYVEQNTI
QYDLWAWGEMFNKAHTALKIKSTRNYNFAKQRLEQFKEIQSLNNLL
VVKKLNDFFDSEFFSGEETYTICVHHLGGKDLSKLYKAWEDDPADP
ENAIVVLCDDLKNNFKKEPIRNILRYIFTIRQECSAQDILAAAKYNQQ
LDRYKSQKANPSVLGNQGFTWTNAVILPEKAQRNDRPNSLDLRIWL
YLKLRHPDGRWKKHHIPFYDTRFFQEIYAAGNSPVDTCQFRTPRFGY
HLPKLTDQTAIRVNKKHVKAAKTEARIRLAIQQGTLPVSNLKITEISA
TINSKGQVRIPVKFRVGRQKGTLQIGDRFCGYDQNQTASHAYSLWE
VVKEGQYHKELGCFVRFISSGDIVSITENRGNQFDQLSYEGLAYPQY
ADWRKKASKFVSLWQITKKNKKKEIVTVEAKEKFDAICKYQPRLYK
FNKEYAYLLRDIVRGKSLVELQQIRQEIFRFIEQDCGVTRLGSLSLSTL
ETVKAVKGIIYSYFSTALNASKNNPISDEQRKEFDPELFALLEKLELIR
TRKKKQKVERIANSLIQTCLENNIKFIRGEGDLSTTNNATKKKANSRS
MDWLARGVFNKIRQLAPMHNITLFGCGSLYTSHQDPLVHRNPDKA
MKCRWAAIPVKDIGDWVLRKLSQNLRAKNRGTGEYYHQGVKEFLS
HYELQDLEEELLKWRSDRKSNIPCWVLQNRLAEKLGNKEAVVYIPV
RGGRIYFATHKVATGAVSIVFDQKQVWVCNADHVAAANIALTGKGI
GEQSSDEENPDGSRIKLQLTS





dCas12i2(D599A) (SEQ ID NO: 25)
MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLFGGIT
PEIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIASDNLVEKF
EEYYGGTASDAIKQYFSASIGESYYWNDCRQQYYDLCRELGVEVSD
LTHDLEILCREKCLAVATESNQNNSIISVLFGTGEKEDRSVKLRITKKI
LEAISNLKEIPKNVAPIQEIILNVAKATKETFRQVYAGNLGAPSTLEKF
IAKDGQKEFDLKKLQTDLKKVIRGKSKERDWCCQEELRSYVEQNTI
QYDLWAWGEMFNKAHTALKIKSTRNYNFAKQRLEQFKEIQSLNNLL
VVKKLNDFFDSEFFSGEETYTICVHHLGGKDLSKLYKAWEDDPADP
ENAIVVLCDDLKNNFKKEPIRNILRYIFTIRQECSAQDILAAAKYNQQ
LDRYKSQKANPSVLGNQGFTWTNAVILPEKAQRNDRPNSLDLRIWL
YLKLRHPDGRWKKHHIPFYDTRFFQEIYAAGNSPVDTCQFRTPRFGY
HLPKLTDQTAIRVNKKHVKAAKTEARIRLAIQQGTLPVSNLKITEISA
TINSKGQVRIPVKFRVGRQKGTLQIGDRFCGYAQNQTASHAYSLWE
VVKEGQYHKELGCFVRFISSGDIVSITENRGNQFDQLSYEGLAYPQY
ADWRKKASKFVSLWQITKKNKKKEIVTVEAKEKFDAICKYQPRLYK
FNKEYAYLLRDIVRGKSLVELQQIRQEIFRFIEQDCGVTRLGSLSLSTL
ETVKAVKGIIYSYFSTALNASKNNPISDEQRKEFDPELFALLEKLELIR
TRKKKQKVERIANSLIQTCLENNIKFIRGEGDLSTTNNATKKKANSRS
MDWLARGVFNKIRQLAPMHNITLFGCGSLYTSHQDPLVHRNPDKA
MKCRWAAIPVKDIGDWVLRKLSQNLRAKNRGTGEYYHQGVKEFLS
HYELQDLEEELLKWRSDRKSNIPCWVLQNRLAEKLGNKEAVVYIPV
RGGRIYFATHKVATGAVSIVFDQKQVWVCNADHVAAANIALTGKGI
GEQSSDEENPDGSRIKLQLTS





dCas9(D10A_H840A) (SEQ ID NO: 26)
MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKN
LIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKV
DDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKK
LVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQL
VQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKN
GLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQI
GDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHH
QDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFI
KPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILR
RQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET
ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFT
VYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLK
EDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENE
DILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGW
GRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKED
IQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPV
ENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFL
KDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLI
TQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRM
NTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHD
AYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA
TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDF
ATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDW
DPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSS
FEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGEL
QKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYL
DEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLT
NLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQ
LGGD





nCas9(D10A) (SEQ ID NO: 27)
MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKN
LIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKV
DDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKK
LVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQL
VQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKN
GLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQI
GDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHH
QDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFI
KPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILR
RQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET
ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFT
VYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLK
EDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENE
DILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGW
GRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKED
IQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPV
ENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFL
KDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLI
TQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRM
NTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHD
AYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA
TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDF
ATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDW
DPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSS
FEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGEL
QKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYL
DEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLT
NLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQ
LGGD





AID (SEQ ID NO: 28)
MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLR
NKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVA
DFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGVQIAIMTFKD
YFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRILQ

A3A (SEQ ID NO: 29)
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK
MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVT
WFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYDYDYLYKEAL
QMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQA
LSGRLRAILQNQGN

ABE8_20 (SEQ ID NO: 30)
MSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGW
NRAIGLHDPTAHAEIMALRQGGLVMQNYRLYDATLYSTFEPCVMCA
GAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHHPGMNHRVEITEGIL
ADECAALLCRFFRMPRRVFNAQKKAQSSTD

UGI (SEQ ID NO: 31)
MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDE
STDENVMLLTSDAPEYKPWALVIQDSNGENKIKML





SGGS (SEQ ID NO: 32)
SGGS

SV40NLS (SEQ ID NO: 33)
PKKKRKV

mH6 (SEQ ID NO: 34)
MKIEEGKGHHHHHH

3xGGGGS (SEQ ID NO: 35)
GGGGSGGGGSGGGGS

Nuclear plasma NLS (npNLS) (SEQ ID NO: 36)
KRPAATKKAGQAKKKK

T2A (SEQ ID NO: 37)
EGRGSLLTCGDVEENPGP

bipartite NLS (bpNLS) (SEQ ID NO: 38)
MKRTADGSEFESPKKKRKV

P2A (SEQ ID NO: 39)
ATNFSLLKQAGDVEENPGP

2xSGGS-XTEN-2xSGGS (SEQ ID NO: 40)
SGGSSGGSSGSETPGTSESATPESSGGSSGGS
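The fusion constructs of Table 4 are ordered concatenations of the components of Table 3. The Python sketch below illustrates that assembly step; it is a sketch under stated assumptions and not the cloning procedure used in the Example. The dictionary holds only the short components copied from Table 3, the helper `assemble_fusion` is hypothetical, and the component order is passed as a list because names such as 2xSGGS-XTEN-2xSGGS contain hyphens and cannot be recovered by splitting a construct name on "-".

```python
# Short components copied from Table 3. Long components (AID, A3A, UGI,
# dCas12i2, ...) would be added to the mapping by the caller in the same way.
SHORT_COMPONENTS = {
    "SGGS": "SGGS",
    "SV40NLS": "PKKKRKV",
    "mH6": "MKIEEGKGHHHHHH",
    "3xGGGGS": "GGGGSGGGGSGGGGS",
    "npNLS": "KRPAATKKAGQAKKKK",
    "bpNLS": "MKRTADGSEFESPKKKRKV",
    "2xSGGS-XTEN-2xSGGS": "SGGSSGGSSGSETPGTSESATPESSGGSSGGS",
}


def assemble_fusion(order, components, separator=""):
    """Concatenate component sequences from N-terminus to C-terminus.

    `separator` is empty for a real polypeptide sequence; "-" reproduces
    the visual component boundaries used in Table 4.
    """
    missing = [name for name in order if name not in components]
    if missing:
        raise KeyError(f"unknown component(s): {missing}")
    return separator.join(components[name] for name in order)


# Toy order for illustration only; it is not one of the Table 4 constructs.
toy_order = ["bpNLS", "SGGS", "npNLS"]
print(assemble_fusion(toy_order, SHORT_COMPONENTS, separator="-"))
```

Keeping the component order as explicit data makes it straightforward to generate N-terminal and C-terminal arrangements of the same parts by permuting the list.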
















TABLE 4. Base editing constructs

Construct
Sequence





bpNLS-AID-SGGS-UGI-2xSGGS-XTEN-2xSGGS-dCas12i2(D599A)-npNLS (SEQ ID NO: 41)
MKRTADGSEFESPKKKRKV-
MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFG
YLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC
ARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGV
QIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLR
RILQ-SGGS-
MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA
YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML-
SGGSSGGSSGSETPGTSESATPESSGGSSGGS-
MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLF
GGITPEIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIAS
DNLVEKFEEYYGGTASDAIKQYFSASIGESYYWNDCRQQYYDL
CRELGVEVSDLTHDLEILCREKCLAVATESNQNNSIISVLFGTGE
KEDRSVKLRITKKILEAISNLKEIPKNVAPIQEIILNVAKATKETF
RQVYAGNLGAPSTLEKFIAKDGQKEFDLKKLQTDLKKVIRGKS
KERDWCCQEELRSYVEQNTIQYDLWAWGEMFNKAHTALKIKS
TRNYNFAKQRLEQFKEIQSLNNLLVVKKLNDFFDSEFFSGEETY
TICVHHLGGKDLSKLYKAWEDDPADPENAIVVLCDDLKNNFKK
EPIRNILRYIFTIRQECSAQDILAAAKYNQQLDRYKSQKANPSVL
GNQGFTWTNAVILPEKAQRNDRPNSLDLRIWLYLKLRHPDGRW
KKHHIPFYDTRFFQEIYAAGNSPVDTCQFRTPRFGYHLPKLTDQ
TAIRVNKKHVKAAKTEARIRLAIQQGTLPVSNLKITEISATINSK
GQVRIPVKFRVGRQKGTLQIGDRFCGYAQNQTASHAYSLWEVV
KEGQYHKELGCFVRFISSGDIVSITENRGNQFDQLSYEGLAYPQ
YADWRKKASKFVSLWQITKKNKKKEIVTVEAKEKFDAICKYQP
RLYKFNKEYAYLLRDIVRGKSLVELQQIRQEIFRFIEQDCGVTRL
GSLSLSTLETVKAVKGIIYSYFSTALNASKNNPISDEQRKEFDPEL
FALLEKLELIRTRKKKQKVERIANSLIQTCLENNIKFIRGEGDLST
TNNATKKKANSRSMDWLARGVFNKIRQLAPMHNITLFGCGSLY
TSHQDPLVHRNPDKAMKCRWAAIPVKDIGDWVLRKLSQNLRA
KNRGTGEYYHQGVKEFLSHYELQDLEEELLKWRSDRKSNIPCW
VLQNRLAEKLGNKEAVVYIPVRGGRIYFATHKVATGAVSIVFD
QKQVWVCNADHVAAANIALTGKGIGEQSSDEENPDGSRIKLQL
TS-KRPAATKKAGQAKKKK





bpNLS-AID-2xSGGS-XTEN-2xSGGS-dCas12i2(D599A)-npNLS-3xGGGGS-UGI (SEQ ID NO: 42)
MKRTADGSEFESPKKKRKV-
MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFG
YLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC
ARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGV
QIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLR
RILQ-SGGSSGGSSGSETPGTSESATPESSGGSSGGS-
MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLF
GGITPEIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIAS
DNLVEKFEEYYGGTASDAIKQYFSASIGESYYWNDCRQQYYDL
CRELGVEVSDLTHDLEILCREKCLAVATESNQNNSIISVLFGTGE
KEDRSVKLRITKKILEAISNLKEIPKNVAPIQEIILNVAKATKETF
RQVYAGNLGAPSTLEKFIAKDGQKEFDLKKLQTDLKKVIRGKS
KERDWCCQEELRSYVEQNTIQYDLWAWGEMFNKAHTALKIKS
TRNYNFAKQRLEQFKEIQSLNNLLVVKKLNDFFDSEFFSGEETY
TICVHHLGGKDLSKLYKAWEDDPADPENAIVVLCDDLKNNFKK
EPIRNILRYIFTIRQECSAQDILAAAKYNQQLDRYKSQKANPSVL
GNQGFTWTNAVILPEKAQRNDRPNSLDLRIWLYLKLRHPDGRW
KKHHIPFYDTRFFQEIYAAGNSPVDTCQFRTPRFGYHLPKLTDQ
TAIRVNKKHVKAAKTEARIRLAIQQGTLPVSNLKITEISATINSK
GQVRIPVKFRVGRQKGTLQIGDRFCGYAQNQTASHAYSLWEVV
KEGQYHKELGCFVRFISSGDIVSITENRGNQFDQLSYEGLAYPQ
YADWRKKASKFVSLWQITKKNKKKEIVTVEAKEKFDAICKYQP
RLYKFNKEYAYLLRDIVRGKSLVELQQIRQEIFRFIEQDCGVTRL
GSLSLSTLETVKAVKGIIYSYFSTALNASKNNPISDEQRKEFDPEL
FALLEKLELIRTRKKKQKVERIANSLIQTCLENNIKFIRGEGDLST
TNNATKKKANSRSMDWLARGVFNKIRQLAPMHNITLFGCGSLY
TSHQDPLVHRNPDKAMKCRWAAIPVKDIGDWVLRKLSQNLRA
KNRGTGEYYHQGVKEFLSHYELQDLEEELLKWRSDRKSNIPCW
VLQNRLAEKLGNKEAVVYIPVRGGRIYFATHKVATGAVSIVFD
QKQVWVCNADHVAAANIALTGKGIGEQSSDEENPDGSRIKLQL
TS-KRPAATKKAGQAKKKK-GGGGSGGGGSGGGGS-
MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA
YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML





mH6-dCas12i2(D599A)-npNLS-2xSGGS-XTEN-2xSGGS-AID-SGGS-UGI-SGGS-bpNLS (SEQ ID NO: 43)

MKIEEGKGHHHHHH-

MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLF

GGITPEIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIAS

DNLVEKFEEYYGGTASDAIKQYFSASIGESYYWNDCRQQYYDL

CRELGVEVSDLTHDLEILCREKCLAVATESNQNNSIISVLFGTGE



KEDRSVKLRITKKILEAISNLKEIPKNVAPIQEIILNVAKATKETF



RQVYAGNLGAPSTLEKFIAKDGQKEFDLKKLQTDLKKVIRGKS



KERDWCCQEELRSYVEQNTIQYDLWAWGEMFNKAHTALKIKS



TRNYNFAKQRLEQFKEIQSLNNLLVVKKLNDFFDSEFFSGEETY



TICVHHLGGKDLSKLYKAWEDDPADPENAIVVLCDDLKNNFKK



EPIRNILRYIFTIRQECSAQDILAAAKYNQQLDRYKSQKANPSVL



GNQGFTWTNAVILPEKAQRNDRPNSLDLRIWLYLKLRHPDGRW



KKHHIPFYDTRFFQEIYAAGNSPVDTCQFRTPRFGYHLPKLTDQ



TAIRVNKKHVKAAKTEARIRLAIQQGTLPVSNLKITEISATINSK



GQVRIPVKFRVGRQKGTLQIGDRFCGYAQNQTASHAYSLWEVV



KEGQYHKELGCFVRFISSGDIVSITENRGNQFDQLSYEGLAYPQ



YADWRKKASKFVSLWQITKKNKKKEIVTVEAKEKFDAICKYQP



RLYKFNKEYAYLLRDIVRGKSLVELQQIRQEIFRFIEQDCGVTRL



GSLSLSTLETVKAVKGIIYSYFSTALNASKNNPISDEQRKEFDPEL



FALLEKLELIRTRKKKQKVERIANSLIQTCLENNIKFIRGEGDLST



TNNATKKKANSRSMDWLARGVFNKIRQLAPMHNITLFGCGSLY



TSHQDPLVHRNPDKAMKCRWAAIPVKDIGDWVLRKLSQNLRA



KNRGTGEYYHQGVKEFLSHYELQDLEEELLKWRSDRKSNIPCW



VLQNRLAEKLGNKEAVVYIPVRGGRIYFATHKVATGAVSIVFD



QKQVWVCNADHVAAANIALTGKGIGEQSSDEENPDGSRIKLQL



TS-KRPAATKKAGQAKKKK-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFG



YLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC



ARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGV



QIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLR



RILQ-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGS-MKRTADGSEFESPKKKRKV





bpNLS-A3A-SGGS-UGI-2xSGGS-XTEN-2xSGGS-dCas12i2(D599A)-npNLS (SEQ ID NO: 44)

MKRTADGSEFESPKKKRKV-

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGT

SVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPA

QIYRVTWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYD

YDYLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPF

QPWDGLDEHSQALSGRLRAILQNQGN-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLF



GGITPEIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIAS



DNLVEKFEEYYGGTASDAIKQYFSASIGESYYWNDCRQQYYDL



CRELGVEVSDLTHDLEILCREKCLAVATESNQNNSIISVLFGTGE



KEDRSVKLRITKKILEAISNLKEIPKNVAPIQEIILNVAKATKETF



RQVYAGNLGAPSTLEKFIAKDGQKEFDLKKLQTDLKKVIRGKS



KERDWCCQEELRSYVEQNTIQYDLWAWGEMFNKAHTALKIKS



TRNYNFAKQRLEQFKEIQSLNNLLVVKKLNDFFDSEFFSGEETY



TICVHHLGGKDLSKLYKAWEDDPADPENAIVVLCDDLKNNFKK



EPIRNILRYIFTIRQECSAQDILAAAKYNQQLDRYKSQKANPSVL



GNQGFTWTNAVILPEKAQRNDRPNSLDLRIWLYLKLRHPDGRW



KKHHIPFYDTRFFQEIYAAGNSPVDTCQFRTPRFGYHLPKLTDQ



TAIRVNKKHVKAAKTEARIRLAIQQGTLPVSNLKITEISATINSK



GQVRIPVKFRVGRQKGTLQIGDRFCGYAQNQTASHAYSLWEVV



KEGQYHKELGCFVRFISSGDIVSITENRGNQFDQLSYEGLAYPQ



YADWRKKASKFVSLWQITKKNKKKEIVTVEAKEKFDAICKYQP



RLYKFNKEYAYLLRDIVRGKSLVELQQIRQEIFRFIEQDCGVTRL



GSLSLSTLETVKAVKGIIYSYFSTALNASKNNPISDEQRKEFDPEL



FALLEKLELIRTRKKKQKVERIANSLIQTCLENNIKFIRGEGDLST



TNNATKKKANSRSMDWLARGVFNKIRQLAPMHNITLFGCGSLY



TSHQDPLVHRNPDKAMKCRWAAIPVKDIGDWVLRKLSQNLRA



KNRGTGEYYHQGVKEFLSHYELQDLEEELLKWRSDRKSNIPCW



VLQNRLAEKLGNKEAVVYIPVRGGRIYFATHKVATGAVSIVFD



QKQVWVCNADHVAAANIALTGKGIGEQSSDEENPDGSRIKLQL



TS-KRPAATKKAGQAKKKK





bpNLS-A3A-2xSGGS-XTEN-2xSGGS-dCas12i2(D599A)-npNLS-3xGGGGS-UGI (SEQ ID NO: 45)

MKRTADGSEFESPKKKRKV-

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGT

SVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPA

QIYRVTWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYD

YDYLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPF



QPWDGLDEHSQALSGRLRAILQNQGN-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLF



GGITPEIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIAS



DNLVEKFEEYYGGTASDAIKQYFSASIGESYYWNDCRQQYYDL



CRELGVEVSDLTHDLEILCREKCLAVATESNQNNSIISVLFGTGE



KEDRSVKLRITKKILEAISNLKEIPKNVAPIQEIILNVAKATKETF



RQVYAGNLGAPSTLEKFIAKDGQKEFDLKKLQTDLKKVIRGKS



KERDWCCQEELRSYVEQNTIQYDLWAWGEMFNKAHTALKIKS



TRNYNFAKQRLEQFKEIQSLNNLLVVKKLNDFFDSEFFSGEETY



TICVHHLGGKDLSKLYKAWEDDPADPENAIVVLCDDLKNNFKK



EPIRNILRYIFTIRQECSAQDILAAAKYNQQLDRYKSQKANPSVL



GNQGFTWTNAVILPEKAQRNDRPNSLDLRIWLYLKLRHPDGRW



KKHHIPFYDTRFFQEIYAAGNSPVDTCQFRTPRFGYHLPKLTDQ



TAIRVNKKHVKAAKTEARIRLAIQQGTLPVSNLKITEISATINSK



GQVRIPVKFRVGRQKGTLQIGDRFCGYAQNQTASHAYSLWEVV



KEGQYHKELGCFVRFISSGDIVSITENRGNQFDQLSYEGLAYPQ



YADWRKKASKFVSLWQITKKNKKKEIVTVEAKEKFDAICKYQP



RLYKFNKEYAYLLRDIVRGKSLVELQQIRQEIFRFIEQDCGVTRL



GSLSLSTLETVKAVKGIIYSYFSTALNASKNNPISDEQRKEFDPEL



FALLEKLELIRTRKKKQKVERIANSLIQTCLENNIKFIRGEGDLST



TNNATKKKANSRSMDWLARGVFNKIRQLAPMHNITLFGCGSLY



TSHQDPLVHRNPDKAMKCRWAAIPVKDIGDWVLRKLSQNLRA



KNRGTGEYYHQGVKEFLSHYELQDLEEELLKWRSDRKSNIPCW



VLQNRLAEKLGNKEAVVYIPVRGGRIYFATHKVATGAVSIVFD



QKQVWVCNADHVAAANIALTGKGIGEQSSDEENPDGSRIKLQL



TS-KRPAATKKAGQAKKKK-GGGGSGGGGSGGGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML





mH6-dCas12i2(D599A)-npNLS-2xSGGS-XTEN-2xSGGS-A3A-SGGS-UGI-SGGS-bpNLS (SEQ ID NO: 46)

MKIEEGKGHHHHHH-

MSSAIKSYKSVLRPNERKNQLLKSTIQCLEDGSAFFFKMLQGLF

GGITPEIVRFSTEQEKQQQDIALWCAVNWFRPVSQDSLTHTIAS

DNLVEKFEEYYGGTASDAIKQYFSASIGESYYWNDCRQQYYDL

CRELGVEVSDLTHDLEILCREKCLAVATESNQNNSIISVLFGTGE



KEDRSVKLRITKKILEAISNLKEIPKNVAPIQEIILNVAKATKETF



RQVYAGNLGAPSTLEKFIAKDGQKEFDLKKLQTDLKKVIRGKS



KERDWCCQEELRSYVEQNTIQYDLWAWGEMFNKAHTALKIKS



TRNYNFAKQRLEQFKEIQSLNNLLVVKKLNDFFDSEFFSGEETY



TICVHHLGGKDLSKLYKAWEDDPADPENAIVVLCDDLKNNFKK



EPIRNILRYIFTIRQECSAQDILAAAKYNQQLDRYKSQKANPSVL



GNQGFTWTNAVILPEKAQRNDRPNSLDLRIWLYLKLRHPDGRW



KKHHIPFYDTRFFQEIYAAGNSPVDTCQFRTPRFGYHLPKLTDQ



TAIRVNKKHVKAAKTEARIRLAIQQGTLPVSNLKITEISATINSK



GQVRIPVKFRVGRQKGTLQIGDRFCGYAQNQTASHAYSLWEVV



KEGQYHKELGCFVRFISSGDIVSITENRGNQFDQLSYEGLAYPQ



YADWRKKASKFVSLWQITKKNKKKEIVTVEAKEKFDAICKYQP



RLYKFNKEYAYLLRDIVRGKSLVELQQIRQEIFRFIEQDCGVTRL



GSLSLSTLETVKAVKGIIYSYFSTALNASKNNPISDEQRKEFDPEL



FALLEKLELIRTRKKKQKVERIANSLIQTCLENNIKFIRGEGDLST



TNNATKKKANSRSMDWLARGVFNKIRQLAPMHNITLFGCGSLY



TSHQDPLVHRNPDKAMKCRWAAIPVKDIGDWVLRKLSQNLRA



KNRGTGEYYHQGVKEFLSHYELQDLEEELLKWRSDRKSNIPCW



VLQNRLAEKLGNKEAVVYIPVRGGRIYFATHKVATGAVSIVFD



QKQVWVCNADHVAAANIALTGKGIGEQSSDEENPDGSRIKLQL



TS-KRPAATKKAGQAKKKK-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGT



SVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPA



QIYRVTWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYD



YDYLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPF



QPWDGLDEHSQALSGRLRAILQNQGN-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGS-MKRTADGSEFESPKKKRKV





bpNLS-AID-SGGS-UGI-2xSGGS-XTEN-2xSGGS-dCas9(D10A_H840A)-npNLS (SEQ ID NO: 47)

MKRTADGSEFESPKKKRKV-

MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFG

YLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC

ARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGV

QIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLR

RILQ-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI



KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSN



EMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY



PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP



DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR



RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL



QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV



NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF



DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE



DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE



KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKG



ASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKY



VTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI



ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE



DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG



RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK



EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK



VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGS



QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY



DVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKM



KNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLV



ETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF



RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV



YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA



NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKK



TEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY



SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK



GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALP



SKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQI



SEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNL



GAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS



QLGGD-KRPAATKKAGQAKKKK





bpNLS-AID-2xSGGS-XTEN-2xSGGS-dCas9(D10A_H840A)-npNLS-3xGGGGS-UGI (SEQ ID NO: 48)

MKRTADGSEFESPKKKRKV-

MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFG

YLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC

ARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGV

QIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLR



RILQ-SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI



KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSN



EMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY



PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP



DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR



RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL



QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV



NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF



DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE



DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE



KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKG



ASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKY



VTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI



ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE



DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG



RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK



EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK



VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGS



QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY



DVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKM



KNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLV



ETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF



RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV



YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA



NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKK



TEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY



SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK



GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALP



SKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQI



SEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNL



GAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS



QLGGD-KRPAATKKAGQAKKKK-GGGGSGGGGSGGGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML





mH6-dCas9(D10A_H840A)-npNLS-2xSGGS-XTEN-2xSGGS-AID-SGGS-UGI-SGGS-bpNLS (SEQ ID NO: 49)

MKIEEGKGHHHHHH-

MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI

KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSN

EMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY

PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP

DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR



RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL



QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV



NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF



DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE



DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE



KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKG



ASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKY



VTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI



ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE



DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG



RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK



EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK



VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGS



QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY



DVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKM



KNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLV



ETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF



RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV



YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA



NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKK



TEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY



SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK



GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALP



SKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQI



SEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNL



GAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS



QLGGD-KRPAATKKAGQAKKKK-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFG



YLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC



ARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGV



QIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLR



RILQ-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGS-MKRTADGSEFESPKKKRKV





bpNLS-A3A-SGGS-UGI-2xSGGS-XTEN-2xSGGS-dCas9(D10A_H840A)-npNLS (SEQ ID NO: 50)

MKRTADGSEFESPKKKRKV-

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGT

SVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPA

QIYRVTWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYD

YDYLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPF

QPWDGLDEHSQALSGRLRAILQNQGN-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI



KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSN



EMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY



PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP



DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR



RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL



QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV



NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF



DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE



DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE



KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKG



ASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKY



VTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI



ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE



DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG



RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK



EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK



VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGS



QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY



DVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKM



KNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLV



ETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF



RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV



YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA



NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKK



TEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY



SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK



GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALP



SKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQI



SEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNL



GAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS



QLGGD-KRPAATKKAGQAKKKK





bpNLS-A3A-2xSGGS-XTEN-2xSGGS-dCas9(D10A_H840A)-npNLS-3xGGGGS-UGI (SEQ ID NO: 51)

MKRTADGSEFESPKKKRKV-

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGT

SVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPA

QIYRVTWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYD

YDYLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPF



QPWDGLDEHSQALSGRLRAILQNQGN-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI



KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSN



EMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY



PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP



DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR



RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL



QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV



NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF



DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE



DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE



KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKG



ASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKY



VTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI



ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE



DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG



RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK



EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK



VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGS



QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY



DVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKM



KNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLV



ETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF



RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV



YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA



NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKK



TEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY



SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK



GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALP



SKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQI



SEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNL



GAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS



QLGGD-KRPAATKKAGQAKKKK-GGGGSGGGGSGGGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML





mH6-dCas9(D10A_H840A)-npNLS-2xSGGS-XTEN-2xSGGS-A3A-SGGS-UGI-SGGS-bpNLS (SEQ ID NO: 52)

MKIEEGKGHHHHHH-

MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI

KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSN

EMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY

PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP

DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR



RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL



QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV



NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF



DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE



DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE



KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKG



ASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKY



VTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI



ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE



DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG



RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK



EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK



VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGS



QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY



DVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKM



KNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLV



ETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF



RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV



YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA



NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKK



TEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY



SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK



GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALP



SKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQI



SEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNL



GAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS



QLGGD-KRPAATKKAGQAKKKK-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGT



SVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPA



QIYRVTWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYD



YDYLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPF



QPWDGLDEHSQALSGRLRAILQNQGN-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGS-MKRTADGSEFESPKKKRKV





bpNLS-AID-SGGS-UGI-2xSGGS-XTEN-2xSGGS-nCas9(D10A)-npNLS (SEQ ID NO: 53)

MKRTADGSEFESPKKKRKV-

MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFG

YLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC

ARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGV

QIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLR



RILQ-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI



KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSN



EMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY



PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP



DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR



RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL



QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV



NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF



DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE



DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE



KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKG



ASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKY



VTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI



ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE



DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG



RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK



EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK



VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGS



QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY



DVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKM



KNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLV



ETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF



RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV



YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA



NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKK



TEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY



SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK



GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALP



SKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQI



SEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNL



GAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS



QLGGD-KRPAATKKAGQAKKKK





bpNLS-AID-2xSGGS-XTEN-2xSGGS-nCas9(D10A)-npNLS-3xGGGGS-UGI (SEQ ID NO: 54)

MKRTADGSEFESPKKKRKV-

MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFG

YLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC

ARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGV

QIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLR



RILQ-SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI



KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSN



EMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY



PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP



DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR



RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL



QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV



NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF



DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE



DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE



KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKG



ASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKY



VTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI



ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE



DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG



RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK



EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK



VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGS



QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY



DVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKM



KNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLV



ETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF



RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV



YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA



NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKK



TEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY



SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK



GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALP



SKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQI



SEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNL



GAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS



QLGGD-KRPAATKKAGQAKKKK-GGGGSGGGGSGGGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML





mH6-nCas9(D10A)-npNLS-2xSGGS-XTEN-2xSGGS-AID-SGGS-UGI-SGGS-bpNLS (SEQ ID NO: 55)

MKIEEGKGHHHHHH-

MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI

KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSN

EMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY

PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP



DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR



RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL



QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV



NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF



DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE



DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE



KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKG



ASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKY



VTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI



ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE



DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG



RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK



EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK



VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGS



QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY



DVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKM



KNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLV



ETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF



RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV



YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA



NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKK



TEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY



SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK



GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALP



SKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQI



SEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNL



GAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS



QLGGD-KRPAATKKAGQAKKKK-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFG



YLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC



ARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGV



QIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLR



RILQ-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGS-MKRTADGSEFESPKKKRKV





bpNLS-A3A-SGGS-UGI-2xSGGS-XTEN-2xSGGS-nCas9(D10A)-npNLS (SEQ ID NO: 56)

MKRTADGSEFESPKKKRKV-

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGT

SVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPA

QIYRVTWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYD

YDYLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPF



QPWDGLDEHSQALSGRLRAILQNQGN-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI



KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSN



EMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY



PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP



DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR



RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL



QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV



NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF



DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE



DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE



KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKG



ASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKY



VTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI



ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE



DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG



RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK



EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK



VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGS



QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY



DVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKM



KNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLV



ETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF



RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV



YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA



NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKK



TEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY



SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK



GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALP



SKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQI



SEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNL



GAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS



QLGGD-KRPAATKKAGQAKKKK





bpNLS-A3A-2xSGGS-XTEN-2xSGGS-nCas9(D10A)-npNLS-3xGGGGS-UGI (SEQ ID NO: 57)

MKRTADGSEFESPKKKRKV-

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGT

SVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPA

QIYRVTWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYD

YDYLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPF



QPWDGLDEHSQALSGRLRAILQNQGN-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI



KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSN



EMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY



PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP



DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR



RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL



QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV



NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF



DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE



DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE



KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKG



ASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKY



VTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI



ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE



DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG



RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK



EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK



VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGS



QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY



DVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKM



KNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLV



ETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF



RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV



YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA



NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKK



TEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY



SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK



GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALP



SKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQI



SEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNL



GAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS



QLGGD-KRPAATKKAGQAKKKK-GGGGSGGGGSGGGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML





mH6-nCas9(D10A)-npNLS-2xSGGS-XTEN-2xSGGS-A3A-SGGS-UGI-SGGS-bpNLS (SEQ ID NO: 58)

MKIEEGKGHHHHHH-

MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI

KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSN

EMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY

PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP



DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR



RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL



QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV



NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF



DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE



DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE



KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKG



ASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKY



VTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI



ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILE



DIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG



RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK



EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVK



VMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGS



QILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY



DVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKM



KNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLV



ETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF



RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV



YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA



NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKK



TEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY



SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK



GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALP



SKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQI



SEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNL



GAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLS



QLGGD-KRPAATKKAGQAKKKK-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGT



SVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPA



QIYRVTWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYD



YDYLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPF



QPWDGLDEHSQALSGRLRAILQNQGN-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTA



YDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGS-MKRTADGSEFESPKKKRKV









Each RNA guide sequence with a U6 promoter (Table 5) was cloned into a plasmid backbone and maxi-prepped. A working solution of 144 ng/μL effector plasmids was prepared in water (effector working solution), and a working solution of 50 ng/μL of corresponding guide RNA plasmids was prepared in water (guide working solution).
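The Cas12i2 entries of Table 5 appear to follow one pattern: each RNA guide sequence begins with the same 23-nucleotide stretch (AGAAAUCCGUCUUUCAUUGACGG), followed by the 20-nucleotide target sequence written as RNA (T replaced by U). The Python sketch below illustrates that reading only. The constant `CAS12I2_DIRECT_REPEAT` and the function names `dna_to_rna` and `cas12i2_guide` are hypothetical, inferred from the table rows rather than stated in the specification, and the sketch does not cover the Cas9 rows.

```python
# Assumption: this 23-nt prefix is inferred from the Cas12i2 rows of Table 5;
# the specification does not name it as a separate element in this passage.
CAS12I2_DIRECT_REPEAT = "AGAAAUCCGUCUUUCAUUGACGG"


def dna_to_rna(seq: str) -> str:
    """Return the RNA form of a DNA sequence (T -> U), validating the alphabet."""
    seq = seq.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(invalid)}")
    return seq.replace("T", "U")


def cas12i2_guide(target_dna: str) -> str:
    """Concatenate the inferred direct repeat with the RNA form of the target."""
    return CAS12I2_DIRECT_REPEAT + dna_to_rna(target_dna)


if __name__ == "__main__":
    # AAVS1_T3 target sequence as listed in Table 5 (SEQ ID NO: 66)
    print(cas12i2_guide("GTGAGAATGGTGCGTCCTAG"))
```

Under these assumptions, the output for the AAVS1_T3 target can be compared directly against the corresponding RNA guide entry (SEQ ID NO: 86) in Table 5.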









TABLE 5

RNA guide sequences

| Target | Nuclease | Target Sequence | RNA Guide Sequence |
| --- | --- | --- | --- |
| AAVS1_T3 | Cas12i2 | GTGAGAATGGTGCGTCCTAG (SEQ ID NO: 66) | AGAAAUCCGUCUUUCAUUGACGGGUGAGAAUGGUGCGUCCUAG (SEQ ID NO: 86) |
| AAVS1_T3 | Cas9 | TTGTGAGAATGGTGCGTCCT (SEQ ID NO: 67) | GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUGUGAGAAUGGUGCGUCCU (SEQ ID NO: 87) |
| AAVS1_T5 | Cas12i2 | AACTGGCCCTGGCTTTGGCA (SEQ ID NO: 68) | AGAAAUCCGUCUUUCAUUGACGGAACUGGCCCUGGCUUUGGCA (SEQ ID NO: 88) |
| AAVS1_T5 | Cas9 | GCTTTAACTGGCCCTGGCTT (SEQ ID NO: 69) | GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCAACUGGCCCUGGCUUUGGCA (SEQ ID NO: 89) |
| AAVS1_T6 | Cas12i2 | GTAGCCTCTCCCGCTCTGGT (SEQ ID NO: 70) | AGAAAUCCGUCUUUCAUUGACGGGUAGCCUCUCCCGCUCUGGU (SEQ ID NO: 90) |
| AAVS1_T6 | Cas9 | CTTTGTAGCCTCTCCCGCTC (SEQ ID NO: 71) | GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCCUUUGUAGCCUCUCCCGCUC (SEQ ID NO: 91) |
| EMX1_T2 | Cas12i2 | GGATGGCGACTTCAGGCACA (SEQ ID NO: 72) | AGAAAUCCGUCUUUCAUUGACGGGGAUGGCGACUUCAGGCACA (SEQ ID NO: 92) |
| EMX1_T2 | Cas9 | TGGATGGCGACTTCAGGCAC (SEQ ID NO: 73) | GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUGGAUGGCGACUUCAGGCAC (SEQ ID NO: 93) |
| EMX1_T4 | Cas12i2 | GGGGAGGCCTGGAGTCATGG (SEQ ID NO: 74) | AGAAAUCCGUCUUUCAUUGACGGGGGGAGGCCUGGAGUCAUGG (SEQ ID NO: 94) |
| EMX1_T4 | Cas9 | TTTGGGGAGGCCTGGAGTCA (SEQ ID NO: 75) | GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUGGGGAGGCCUGGAGUCA (SEQ ID NO: 95) |
| EMX1_T7 | Cas12i2 | AGCAAGGGACTATTCAGGGA (SEQ ID NO: 76) | AGAAAUCCGUCUUUCAUUGACGGAGCAAGGGACUAUUCAGGGA (SEQ ID NO: 96) |
| EMX1_T7 | Cas9 | CTTTAGCAAGGGACTATTCA (SEQ ID NO: 77) | GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCCUUUAGCAAGGGACUAUUCA (SEQ ID NO: 97) |
| EMX1_T8 | Cas12i2 | AAAATTGAGCAATCTACCCT (SEQ ID NO: 78) | AGAAAUCCGUCUUUCAUUGACGGAAAAUUGAGCAAUCUACCCU (SEQ ID NO: 98) |
| EMX1_T8 | Cas9 | TAAAATTGAGCAATCTACCC (SEQ ID NO: 79) | GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUAAAAUUGAGCAAUCUACCC (SEQ ID NO: 99) |
| VEGFA_T1 | Cas12i2 | TGGGGGTGACCGCCGGAGCG (SEQ ID NO: 80) | AGAAAUCCGUCUUUCAUUGACGGUGGGGGUGACCGCCGGAGCG (SEQ ID NO: 100) |
| VEGFA_T1 | Cas9 | TGGGGGTGACCGCCGGAGCG (SEQ ID NO: 81) | GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUGGGGGUGACCGCCGGAGCG (SEQ ID NO: 101) |
| VEGFA_T3 | Cas12i2 | GTTGACATTGTCCACACCTG (SEQ ID NO: 82) | AGAAAUCCGUCUUUCAUUGACGGGUUGACAUUGUCCACACCUG (SEQ ID NO: 102) |





VEGFA_T3
Cas9
TTGTTGACATTGTCCACACC
GUUUUAGAGCUAGAAAUAGCA




(SEQ ID NO: 83)
AGUUAAAAUAAGGCUAGUCCG





UUAUCAACUUGAAAAAGUGGC





ACCGAGUCGGUGCUUGUUGAC





AUUGUCCACACC





(SEQ ID NO: 103)





VEGFA_T5
Cas12i2
TTAAACTCTCCATGGACCAG
AGAAAUCCGUCUUUCAUUGAC




(SEQ ID NO: 84)
GGUUAAACUCUCCAUGGACCAG





(SEQ ID NO: 104)





VEGFA_T5
Cas9
TTTTAAACTCTCCATGGACC
GUUUUAGAGCUAGAAAUAGCA




(SEQ ID NO: 85)
AGUUAAAAUAAGGCUAGUCCG





UUAUCAACUUGAAAAAGUGGC





ACCGAGUCGGUGCUUUUAAAC





UCUCCAUGGACC





(SEQ ID NO: 105)









Approximately 16 hours prior to transfection, 25,000 HEK293T cells in 100 μL of DMEM/10% FBS+Pen/Strep were plated into each well of a 96-well plate. On the day of transfection, the cells were 70-90% confluent. For each well to be transfected, a mixture of 0.5 μL of Lipofectamine 2000 and 9.5 μL of Opti-MEM was prepared and then incubated at room temperature for 5-20 minutes (Solution 1). After incubation, Solution 1 was added to a separate mixture containing 1 μL of the effector working solution, 1 μL of the guide working solution and 8 μL of Opti-MEM (Solution 2). For apo controls, the crRNA was not included in Solution 2. The combined Solution 1 and Solution 2 mixture was mixed by pipetting up and down and then incubated at room temperature for 25 minutes. Following incubation, 20 μL of the Solution 1 and Solution 2 mixture was added dropwise to each well of a 96-well plate containing the cells. At 72 hours post transfection, cells were trypsinized by adding 10 μL of TrypLE to the center of each well and incubated for approximately 5 minutes. 100 μL of D10 media was then added to each well and mixed to resuspend cells. The cells were then spun down at 500 g for 10 minutes, and the supernatant was discarded. QuickExtract buffer was added at ⅕ of the original cell suspension volume. Cells were incubated at 65° C. for 15 minutes, 68° C. for 15 minutes, and 98° C. for 10 minutes.
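The per-well volumes in the transfection step can be tallied and scaled to a plate. The Python sketch below uses only the volumes and concentrations stated above. The `overage` parameter (extra volume to cover pipetting loss) and all names are assumptions added for illustration, not part of the described protocol.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WellMix:
    """Per-well transfection volumes (in microliters) as stated in the text."""
    lipofectamine_ul: float = 0.5
    optimem_solution1_ul: float = 9.5
    effector_ul: float = 1.0
    guide_ul: float = 1.0
    optimem_solution2_ul: float = 8.0
    effector_ng_per_ul: float = 144.0
    guide_ng_per_ul: float = 50.0

    @property
    def solution1_ul(self) -> float:
        return self.lipofectamine_ul + self.optimem_solution1_ul

    @property
    def solution2_ul(self) -> float:
        return self.effector_ul + self.guide_ul + self.optimem_solution2_ul

    @property
    def total_ul(self) -> float:
        return self.solution1_ul + self.solution2_ul

    @property
    def effector_ng(self) -> float:
        return self.effector_ul * self.effector_ng_per_ul

    @property
    def guide_ng(self) -> float:
        return self.guide_ul * self.guide_ng_per_ul


def master_mix(mix: WellMix, wells: int, overage: float = 0.1) -> Dict[str, float]:
    """Scale each component to `wells` wells; `overage` is a hypothetical safety margin."""
    factor = wells * (1.0 + overage)
    return {
        "lipofectamine_ul": mix.lipofectamine_ul * factor,
        "optimem_solution1_ul": mix.optimem_solution1_ul * factor,
        "effector_ul": mix.effector_ul * factor,
        "guide_ul": mix.guide_ul * factor,
        "optimem_solution2_ul": mix.optimem_solution2_ul * factor,
    }


mix = WellMix()
print(mix.total_ul, mix.effector_ng, mix.guide_ng)
print(master_mix(mix, wells=96, overage=0.0))
```

With these inputs each well receives 20 μL in total, carrying 144 ng of effector plasmid and 50 ng of guide plasmid, which matches the 20 μL added dropwise per well.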


Samples for Next Generation Sequencing were prepared by two rounds of PCR. The first round (PCR1) was used to amplify specific genomic regions depending on the target. PCR1 products were purified by column purification. Round 2 PCR (PCR2) was done to add Illumina adapters and indexes. Reactions were then pooled and purified by column purification. Sequencing runs were done with a 150 cycle NextSeq v2.5 mid or high output kit.


For each target, the percentage of reads with C>T edits was measured for every C within the target. For all targets tested, each of the Cas12i2-deaminase fusion constructs demonstrated C>T editing at one or more cytosines within the target. FIG. 1 shows the highest C>T editing efficiency observed at different targets for each base editing construct. All the Cas12i2-deaminase fusion constructs had similar editing efficiencies at any given target. For EMX1_T4, EMX1_T7, EMX1_T8 and AAVS1_T5, the Cas12i2 base editing efficiency was comparable to that of the dCas9-A3A fusion construct.
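One way to compute a per-cytosine C>T percentage is sketched below in Python. It is a simplified illustration, not the analysis pipeline used for the figures: it assumes reads are already aligned to the reference, have the same length as the reference, and contain no indels. The reads in the example are made up solely to exercise the function.

```python
from typing import Dict, List


def c_to_t_percent(reference: str, reads: List[str]) -> Dict[int, float]:
    """Return {1-based position of each reference C: percent of reads showing T}.

    Reads whose length differs from the reference are skipped, since this
    sketch does not handle indels or partial alignments.
    """
    reference = reference.upper()
    usable = [r.upper() for r in reads if len(r) == len(reference)]
    result: Dict[int, float] = {}
    for index, base in enumerate(reference):
        if base != "C":
            continue
        # Count only unambiguous base calls at this position.
        calls = [r[index] for r in usable if r[index] in "ACGT"]
        if not calls:
            continue
        edited = sum(1 for call in calls if call == "T")
        result[index + 1] = 100.0 * edited / len(calls)
    return result


# Toy input: first 10 nucleotides of the EMX1_T7 target and four invented reads.
reference = "AGCAAGGGAC"
reads = ["AGCAAGGGAC", "AGCAAGGGAT", "AGCAAGGGAT", "AGTAAGGGAC"]
print(c_to_t_percent(reference, reads))
```

A production analysis would also need base-quality filtering and alignment handling, which are outside the scope of this sketch.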



FIG. 2 and FIG. 3 show base editing efficiencies of Cas12i2 constructs according to positions within the tested targets. Edit ratio is defined as the fraction of analyzed reads (typically N>=10K) aligning to the genomic reference sequence that also resulted in a gap in said sequence alignment. For each target, the position of C from the 5′-NTTN-3′ PAM sequence (PAM is −3 to 0) is shown on the x-axis and the corresponding C>T editing efficiency at that C is plotted on the y-axis. These aggregated data sets show that for most Cas12i2-deaminase fusion constructs, the optimal editing window was 8-10 nucleotides from the PAM sequence. Compared to the Cas9 base editing constructs with the same deaminases, shown in FIG. 4 and FIG. 5, the Cas12i2 editing window was found to be narrower, potentially allowing for more specific editing compared to Cas9.
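The position numbering described above (PAM at −3 to 0, target starting at position 1) can be made concrete with a short Python sketch. The function name is illustrative. The example PAM `CTTT` for EMX1_T7 is an inference from Table 5, where the Cas9 target for the same site begins four nucleotides upstream of the Cas12i2 target; it is not stated explicitly in the text.

```python
from typing import List


def cytosine_positions(pam: str, target: str) -> List[int]:
    """List cytosine positions with the 4-nt PAM numbered -3..0 and the target 1..n."""
    pam = pam.upper()
    target = target.upper()
    if len(pam) != 4 or pam[1:3] != "TT":
        raise ValueError("expected a 4-nucleotide 5'-NTTN-3' PAM")
    positions = []
    for i, base in enumerate(pam):
        if base == "C":
            positions.append(i - 3)  # PAM occupies -3, -2, -1, 0
    for j, base in enumerate(target):
        if base == "C":
            positions.append(j + 1)  # first target nucleotide is position 1
    return positions


# EMX1_T7 Cas12i2 target from Table 5, with the inferred upstream PAM.
print(cytosine_positions("CTTT", "AGCAAGGGACTATTCAGGGA"))
```

For this input the cytosines fall at −3, 3, 10 and 15, which agrees with the C10 and C15 positions discussed for EMX1_T7.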


Comparisons of C>T base editing by Cas12i2- and Cas9-deaminase fusion constructs at various positions within the EMX1_T4 or EMX1_T7 targets are shown in FIG. 6A-B and FIG. 7A-B, respectively. As shown in FIG. 6A, dCas9-deaminase and nCas9-deaminase constructs induced C>T substitutions primarily at C-3, C8, and C9 (or C0, C10, and C11 according to Cas12i2 numbering). Cas12i2-deaminase constructs induced C>T substitutions primarily at positions C10 and C11, with Cas12i2-deaminase activity exceeding Cas9-deaminase activity. As shown in FIG. 7A, dCas9-deaminase and nCas9-deaminase fusion constructs favored C>T substitutions at positions C1 and C7 (or C-3 and C3 according to Cas12i2 numbering). Cas12i2-deaminase fusion constructs, however, favored C>T substitutions at positions C10 and C15. Additionally, as shown in both FIG. 6B and FIG. 7B, Cas12i2- and Cas9-deaminase fusion constructs did not demonstrate significant indel activity. Control sequences (e.g., variant Cas12i2 of SEQ ID NO: 4 and wild-type Cas9), however, were active nucleases.


To increase base editing efficiency, several mutations were introduced into the dCas12i2-NA3A-CUGI fusion construct. These mutations are listed in Table 6. Most mutations replaced the negatively charged catalytic site residues (D599, D1019 and E833) with positively charged or uncharged polar amino acid residues such as K, N or Q. Some additional mutations tested, such as F626R, G587R and G624R, were predicted from structural analysis to enhance the binding contacts with the dsDNA target. FIG. 8 and FIG. 9 show the raw editing efficiency for each of these variants. Two variants showed a consistent fold improvement of 1.0-2.5 across most targets tested: the variant containing the single point mutation G587R, and the variant containing the combination mutations G587R_G624R_F626R. In addition, some catalytic residue mutations such as D599K D1019K also showed an improvement over dCas12i2-NA3A-CUGI. Therefore, this result demonstrates that the base editing efficiency of dCas12i2 base editors can be improved significantly by engineering the dCas12i2 effector for improved substrate binding.
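Variant labels such as G587R_G624R_F626R or D599K D1019K encode one or more substitutions as wild-type residue, 1-based position, and replacement residue. The Python sketch below parses such labels and applies them to a sequence, checking that the stated wild-type residue is actually present. The function names and the four-residue toy sequence are illustrative assumptions, and the real positions refer to the numbering of SEQ ID NO: 2.

```python
import re
from typing import List, Tuple

# One substitution: wild-type letter, 1-based position, replacement letter.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variant(label: str) -> List[Tuple[str, int, str]]:
    """Split a label on underscores or whitespace and parse each substitution."""
    parsed = []
    for part in re.split(r"[_\s]+", label.strip()):
        match = SUBSTITUTION.match(part)
        if match is None:
            raise ValueError(f"not a substitution: {part!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


def apply_variant(sequence: str, label: str) -> str:
    """Apply every substitution in `label`, verifying the wild-type residue first."""
    residues = list(sequence)
    for wild_type, position, replacement in parse_variant(label):
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


print(parse_variant("G587R_G624R_F626R"))
print(apply_variant("MDGF", "D2K G3R"))  # toy sequence, not a Cas12i2 fragment
```

The wild-type check is useful when transferring substitutions between reference sequences, because a numbering mismatch surfaces as an error instead of a silent wrong edit.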









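TABLE 6

dCas12i2 Variants for increased base editing activity.

Variant type | Substitution(s)
Single Mutants | G587R
Single Mutants | D599A
Single Mutants | D599K
Single Mutants | F626R
Single Mutants | E833Q
Single Mutants | E833N
Single Mutants | D1019K
Single Mutants | D1019N
Combination Mutants | D599K D1019K
Combination Mutants | D599K D1019N
Combination Mutants | D599K E833N D1019K
Combination Mutants | G587R G624R F626R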










Example 2—Base Editing Mediated by Cas12i4

This Example describes editing of multiple mammalian targets using inactivated Cas12i4 fused to a deaminase.


To generate base editing fusion constructs, the variant Cas12i4 of SEQ ID NO: 10 was first deactivated by mutating the catalytic D608 residue to alanine. See Table 7. The deactivated Cas12i4 variant (referred to as dCas12i4 herein and having the sequence set forth in SEQ ID NO: 59) was then fused to one of two cytidine deaminases: human APOBEC3a (A3A) or Activation Induced Deaminase (AID). In addition to the deaminase, a copy of Uracil Glycosylase Inhibitor (UGI) was also fused. Various N- and C-terminal fusion combinations were generated, as shown in Table 8.









TABLE 7







Cas12i4 sequences.








Cas12i4
Sequence





Variant Cas12i4
MASISRPYGTKLRPDARKKEMLDKFFNTLTKGQRVFADLALCIYGSLT


(SEQ ID NO: 10)
LEMAKSLEPESDSELVCAIGWFRLVDKTIWSKDGIKQENLVKQYEAYS



GKEASEVVKTYLNSPSSDKYVWIDCRQKFLRFQRELGTRNLSEDFECM



LFEQYIRLTKGEIEGYAAISNMFGNGEKEDRSKKRMYATRMKDWLEA



NENITWEQYREALKNQLNAKNLEQVVANYKGNAGGADPFFKYSFSKE



GMVSKKEHAQQLDKFKTVLKNKARDLNFPNKEKLKQYLEAEIGIPVD



ANVYSQMFSNGVSEVQPKTTRNMSFSNEKLDLLTELKDLNKGDGFEY



AREVLNGFFDSELHTTEDKFNITSRYLGGDKSNRLSKLYKIWKKEGVD



CEEGIQQFCEAVKDKMGQIPIRNVLKYLWQFRETVSAEDFEAAAKAN



HLEEKISRVKAHPIVISNRYWAFGTSALVGNIMPADKRHQGEYAGQNF



KMWLRAELHYDGKKAKHHLPFYNARFFEEVYCYHPSVAEITPFKTKQ



FGCEIGKDIPDYVSVALKDNPYKKATKRILRAIYNPVANTTRVDKTTN



CSFMIKRENDEYKLVINRKISRDRPKRIEVGRTIMGYARNQTASDTYWI



GRLVPPGTRGAYRIGEWSVQYIKSGPVLSSTQGVNNSTTDQLVYNGM



PSSSERFKAWKKARMAFIRKLIRQLNDEGLESKGQDYIPENPSSFDVRG



ETLYVFNSNYLKALVSKHRKAKKPVEGILDEIEAWTSKDKDSCSLMRL



SSLSDASMQGIASLKSLINSYFNKNGCKTIEDKEKFNPVLYAKLVEVEQ



RRTNKRSEKVGRIAGSLEQLALLNGVEVVIGEADLGEVEKGKSKKQNS



RNMDWCAKQVAQRLEYKLAFHGIGYFGVNPMYTSHQDPFEHRRVAD



HIVMRARFEEVNVENIAEWHVRNFSNYLRADSGTGLYYKQATMDFLK



HYGLEEHAEGLENKKIKFYDFRKILEDKNLTSVIIPKRGGRIYMATNPV



TSDSTPITYAGKTYNRCNADEVAAANIVISVLAPRSKKNREQDDIPLIT



KKAESKSPPKDRKRSKTSQLPQK





dCas12i4(D608A)
MASISRPYGTKLRPDARKKEMLDKFFNTLTKGQRVFADLALCIYGSLT


(SEQ ID NO: 59)
LEMAKSLEPESDSELVCAIGWFRLVDKTIWSKDGIKQENLVKQYEAYS



GKEASEVVKTYLNSPSSDKYVWIDCRQKFLRFQRELGTRNLSEDFECM



LFEQYIRLTKGEIEGYAAISNMFGNGEKEDRSKKRMYATRMKDWLEA



NENITWEQYREALKNQLNAKNLEQVVANYKGNAGGADPFFKYSFSKE



GMVSKKEHAQQLDKFKTVLKNKARDLNFPNKEKLKQYLEAEIGIPVD



ANVYSQMFSNGVSEVQPKTTRNMSFSNEKLDLLTELKDLNKGDGFEY



AREVLNGFFDSELHTTEDKFNITSRYLGGDKSNRLSKLYKIWKKEGVD



CEEGIQQFCEAVKDKMGQIPIRNVLKYLWQFRETVSAEDFEAAAKAN



HLEEKISRVKAHPIVISNRYWAFGTSALVGNIMPADKRHQGEYAGQNF



KMWLRAELHYDGKKAKHHLPFYNARFFEEVYCYHPSVAEITPFKTKQ



FGCEIGKDIPDYVSVALKDNPYKKATKRILRAIYNPVANTTRVDKTTN



CSFMIKRENDEYKLVINRKISRDRPKRIEVGRTIMGYDRNQTASDTYWI



GRLVPPGTRGAYRIGEWSVQYIKSGPVLSSTQGVNNSTTDQLVYNGM



PSSSERFKAWKKARMAFIRKLIRQLNDEGLESKGQDYIPENPSSFDVRG



ETLYVFNSNYLKALVSKHRKAKKPVEGILDEIEAWTSKDKDSCSLMRL



SSLSDASMQGIASLKSLINSYFNKNGCKTIEDKEKFNPVLYAKLVEVEQ



RRTNKRSEKVGRIAGSLEQLALLNGVEVVIGEADLGEVEKGKSKKQNS



RNMDWCAKQVAQRLEYKLAFHGIGYFGVNPMYTSHQDPFEHRRVAD



HIVMRARFEEVNVENIAEWHVRNFSNYLRADSGTGLYYKQATMDFLK



HYGLEEHAEGLENKKIKFYDFRKILEDKNLTSVIIPKRGGRIYMATNPV



TSDSTPITYAGKTYNRCNADEVAAANIVISVLAPRSKKNREQDDIPLIT



KKAESKSPPKDRKRSKTSQLPQK
















TABLE 8







Cas12i4 base editing constructs.








Construct
Sequence





bpNLS-AID-
MKRTADGSEFESPKKKRKV-


SGGS-UGI-
MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKN


2xSGGS-XTEN-
GCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGN


2xSGGS-Cas12i4
PNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGVQIAIMTFKDYFYCWNTF


(D608A)-npNLS
VENHERTFKAWEGLHENSVRLSRQLRRILQ-SGGS-


(SEQ ID NO: 60)
MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDEST



DENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MASISRPYGTKLRPDARKKEMLDKFFNTLTKGQRVFADLALCIYGSLTL



EMAKSLEPESDSELVCAIGWFRLVDKTIWSKDGIKQENLVKQYEAYSGK



EASEVVKTYLNSPSSDKYVWIDCRQKFLRFQRELGTRNLSEDFECMLFE



QYIRLTKGEIEGYAAISNMFGNGEKEDRSKKRMYATRMKDWLEANENI



TWEQYREALKNQLNAKNLEQVVANYKGNAGGADPFFKYSFSKEGMVS



KKEHAQQLDKFKTVLKNKARDLNFPNKEKLKQYLEAEIGIPVDANVYS



QMFSNGVSEVQPKTTRNMSFSNEKLDLLTELKDLNKGDGFEYAREVLN



GFFDSELHTTEDKFNITSRYLGGDKSNRLSKLYKIWKKEGVDCEEGIQQF



CEAVKDKMGQIPIRNVLKYLWQFRETVSAEDFEAAAKANHLEEKISRVK



AHPIVISNRYWAFGTSALVGNIMPADKRHQGEYAGQNFKMWLRAELHY



DGKKAKHHLPFYNARFFEEVYCYHPSVAEITPFKTKQFGCEIGKDIPDYV



SVALKDNPYKKATKRILRAIYNPVANTTRVDKTTNCSFMIKRENDEYKL



VINRKISRDRPKRIEVGRTIMGYDRNQTASDTYWIGRLVPPGTRGAYRIG



EWSVQYIKSGPVLSSTQGVNNSTTDQLVYNGMPSSSERFKAWKKARMA



FIRKLIRQLNDEGLESKGQDYIPENPSSFDVRGETLYVFNSNYLKALVSK



HRKAKKPVEGILDEIEAWTSKDKDSCSLMRLSSLSDASMQGIASLKSLIN



SYFNKNGCKTIEDKEKFNPVLYAKLVEVEQRRTNKRSEKVGRIAGSLEQ



LALLNGVEVVIGEADLGEVEKGKSKKQNSRNMDWCAKQVAQRLEYKL



AFHGIGYFGVNPMYTSHQDPFEHRRVADHIVMRARFEEVNVENIAEWH



VRNFSNYLRADSGTGLYYKQATMDFLKHYGLEEHAEGLENKKIKFYDF



RKILEDKNLTSVIIPKRGGRIYMATNPVTSDSTPITYAGKTYNRCNADEV



AAANIVISVLAPRSKKNREQDDIPLITKKAESKSPPKDRKRSKTSQLPQK-



KRPAATKKAGQAKKKK





bpNLS-AID-
MKRTADGSEFESPKKKRKV-


2xSGGS-XTEN- 
MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKN


2xSGGS-Cas12i4
GCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGN


(D608A)-
PNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGVQIAIMTFKDYFYCWNTF


npNLS-
VENHERTFKAWEGLHENSVRLSRQLRRILQ-


3xGGGGS-UGI
SGGSSGGSSGSETPGTSESATPESSGGSSGGS-


(SEQ ID NO: 61)
MASISRPYGTKLRPDARKKEMLDKFFNTLTKGQRVFADLALCIYGSLTL



EMAKSLEPESDSELVCAIGWFRLVDKTIWSKDGIKQENLVKQYEAYSGK



EASEVVKTYLNSPSSDKYVWIDCRQKFLRFQRELGTRNLSEDFECMLFE



QYIRLTKGEIEGYAAISNMFGNGEKEDRSKKRMYATRMKDWLEANENI



TWEQYREALKNQLNAKNLEQVVANYKGNAGGADPFFKYSFSKEGMVS



KKEHAQQLDKFKTVLKNKARDLNFPNKEKLKQYLEAEIGIPVDANVYS



QMFSNGVSEVQPKTTRNMSFSNEKLDLLTELKDLNKGDGFEYAREVLN



GFFDSELHTTEDKFNITSRYLGGDKSNRLSKLYKIWKKEGVDCEEGIQQF



CEAVKDKMGQIPIRNVLKYLWQFRETVSAEDFEAAAKANHLEEKISRVK



AHPIVISNRYWAFGTSALVGNIMPADKRHQGEYAGQNFKMWLRAELHY



DGKKAKHHLPFYNARFFEEVYCYHPSVAEITPFKTKQFGCEIGKDIPDYV



SVALKDNPYKKATKRILRAIYNPVANTTRVDKTTNCSFMIKRENDEYKL



VINRKISRDRPKRIEVGRTIMGYDRNQTASDTYWIGRLVPPGTRGAYRIG



EWSVQYIKSGPVLSSTQGVNNSTTDQLVYNGMPSSSERFKAWKKARMA



FIRKLIRQLNDEGLESKGQDYIPENPSSFDVRGETLYVFNSNYLKALVSK



HRKAKKPVEGILDEIEAWTSKDKDSCSLMRLSSLSDASMQGIASLKSLIN



SYFNKNGCKTIEDKEKFNPVLYAKLVEVEQRRTNKRSEKVGRIAGSLEQ



LALLNGVEVVIGEADLGEVEKGKSKKQNSRNMDWCAKQVAQRLEYKL



AFHGIGYFGVNPMYTSHQDPFEHRRVADHIVMRARFEEVNVENIAEWH



VRNFSNYLRADSGTGLYYKQATMDFLKHYGLEEHAEGLENKKIKFYDF



RKILEDKNLTSVIIPKRGGRIYMATNPVTSDSTPITYAGKTYNRCNADEV



AAANIVISVLAPRSKKNREQDDIPLITKKAESKSPPKDRKRSKTSQLPQK-



KRPAATKKAGQAKKKK-GGGGSGGGGSGGGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDEST



DENVMLLTSDAPEYKPWALVIQDSNGENKIKML





mH6-Cas12i4
MKIEEGKGHHHHHH-


(D608A)-
MASISRPYGTKLRPDARKKEMLDKFFNTLTKGQRVFADLALCIYGSLTL


npNLS-2xSGGS-
EMAKSLEPESDSELVCAIGWFRLVDKTIWSKDGIKQENLVKQYEAYSGK


XTEN-2xSGGS-
EASEVVKTYLNSPSSDKYVWIDCRQKFLRFQRELGTRNLSEDFECMLFE


AID-SGGS-
QYIRLTKGEIEGYAAISNMFGNGEKEDRSKKRMYATRMKDWLEANENI


UGI-SGGS-
TWEQYREALKNQLNAKNLEQVVANYKGNAGGADPFFKYSFSKEGMVS


bpNLS
KKEHAQQLDKFKTVLKNKARDLNFPNKEKLKQYLEAEIGIPVDANVYS


(SEQ ID NO: 62)
QMFSNGVSEVQPKTTRNMSFSNEKLDLLTELKDLNKGDGFEYAREVLN



GFFDSELHTTEDKFNITSRYLGGDKSNRLSKLYKIWKKEGVDCEEGIQQF



CEAVKDKMGQIPIRNVLKYLWQFRETVSAEDFEAAAKANHLEEKISRVK



AHPIVISNRYWAFGTSALVGNIMPADKRHQGEYAGQNFKMWLRAELHY



DGKKAKHHLPFYNARFFEEVYCYHPSVAEITPFKTKQFGCEIGKDIPDYV



SVALKDNPYKKATKRILRAIYNPVANTTRVDKTTNCSFMIKRENDEYKL



VINRKISRDRPKRIEVGRTIMGYDRNQTASDTYWIGRLVPPGTRGAYRIG



EWSVQYIKSGPVLSSTQGVNNSTTDQLVYNGMPSSSERFKAWKKARMA



FIRKLIRQLNDEGLESKGQDYIPENPSSFDVRGETLYVFNSNYLKALVSK



HRKAKKPVEGILDEIEAWTSKDKDSCSLMRLSSLSDASMQGIASLKSLIN



SYFNKNGCKTIEDKEKFNPVLYAKLVEVEQRRTNKRSEKVGRIAGSLEQ



LALLNGVEVVIGEADLGEVEKGKSKKQNSRNMDWCAKQVAQRLEYKL



AFHGIGYFGVNPMYTSHQDPFEHRRVADHIVMRARFEEVNVENIAEWH



VRNFSNYLRADSGTGLYYKQATMDFLKHYGLEEHAEGLENKKIKFYDF



RKILEDKNLTSVIIPKRGGRIYMATNPVTSDSTPITYAGKTYNRCNADEV



AAANIVISVLAPRSKKNREQDDIPLITKKAESKSPPKDRKRSKTSQLPQK-



KRPAATKKAGQAKKKK-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MDPATFTYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKN



GCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGN



PNLSLRIFTARLYFCEDRKAEPEGLRRLAEAGVQIAIMTFKDYFYCWNTF



VENHERTFKAWEGLHENSVRLSRQLRRILQ-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDEST



DENVMLLTSDAPEYKPWALVIQDSNGENKIKML-SGGS-



MKRTADGSEFESPKKKRKV





bpNLS-A3A-
MKRTADGSEFESPKKKRKV-


SGGS-UGI-
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMD


2xSGGS-XTEN-
QHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISW


2xSGGS-Cas12i4
SPCFSAGCAGEVRAFLQENTHVRLRIFAARIYDYDYLYKEALQMLRDAG


(D608A)-npNLS
AQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQ


(SEQ ID NO: 63)
NQGN-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDEST



DENVMLLTSDAPEYKPWALVIQDSNGENKIKML-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MASISRPYGTKLRPDARKKEMLDKFFNTLTKGQRVFADLALCIYGSLTL



EMAKSLEPESDSELVCAIGWFRLVDKTIWSKDGIKQENLVKQYEAYSGK



EASEVVKTYLNSPSSDKYVWIDCRQKFLRFQRELGTRNLSEDFECMLFE



QYIRLTKGEIEGYAAISNMFGNGEKEDRSKKRMYATRMKDWLEANENI



TWEQYREALKNQLNAKNLEQVVANYKGNAGGADPFFKYSFSKEGMVS



KKEHAQQLDKFKTVLKNKARDLNFPNKEKLKQYLEAEIGIPVDANVYS



QMFSNGVSEVQPKTTRNMSFSNEKLDLLTELKDLNKGDGFEYAREVLN



GFFDSELHTTEDKFNITSRYLGGDKSNRLSKLYKIWKKEGVDCEEGIQQF



CEAVKDKMGQIPIRNVLKYLWQFRETVSAEDFEAAAKANHLEEKISRVK



AHPIVISNRYWAFGTSALVGNIMPADKRHQGEYAGQNFKMWLRAELHY



DGKKAKHHLPFYNARFFEEVYCYHPSVAEITPFKTKQFGCEIGKDIPDYV



SVALKDNPYKKATKRILRAIYNPVANTTRVDKTTNCSFMIKRENDEYKL



VINRKISRDRPKRIEVGRTIMGYDRNQTASDTYWIGRLVPPGTRGAYRIG



EWSVQYIKSGPVLSSTQGVNNSTTDQLVYNGMPSSSERFKAWKKARMA



FIRKLIRQLNDEGLESKGQDYIPENPSSFDVRGETLYVFNSNYLKALVSK



HRKAKKPVEGILDEIEAWTSKDKDSCSLMRLSSLSDASMQGIASLKSLIN



SYFNKNGCKTIEDKEKFNPVLYAKLVEVEQRRTNKRSEKVGRIAGSLEQ



LALLNGVEVVIGEADLGEVEKGKSKKQNSRNMDWCAKQVAQRLEYKL



AFHGIGYFGVNPMYTSHQDPFEHRRVADHIVMRARFEEVNVENIAEWH



VRNFSNYLRADSGTGLYYKQATMDFLKHYGLEEHAEGLENKKIKFYDF



RKILEDKNLTSVIIPKRGGRIYMATNPVTSDSTPITYAGKTYNRCNADEV



AAANIVISVLAPRSKKNREQDDIPLITKKAESKSPPKDRKRSKTSQLPQK-



KRPAATKKAGQAKKKK





bpNLS-A3A-
MKRTADGSEFESPKKKRKV-


2xSGGS-XTEN-
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMD


2xSGGS-Cas12i4
QHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISW


(D608A)-
SPCFSAGCAGEVRAFLQENTHVRLRIFAARIYDYDYLYKEALQMLRDAG


npNLS-
AQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQ


3xGGGGS-UGI
NQGN-SGGSSGGSSGSETPGTSESATPESSGGSSGGS-


(SEQ ID NO: 64)
MASISRPYGTKLRPDARKKEMLDKFFNTLTKGQRVFADLALCIYGSLTL



EMAKSLEPESDSELVCAIGWFRLVDKTIWSKDGIKQENLVKQYEAYSGK



EASEVVKTYLNSPSSDKYVWIDCRQKFLRFQRELGTRNLSEDFECMLFE



QYIRLTKGEIEGYAAISNMFGNGEKEDRSKKRMYATRMKDWLEANENI



TWEQYREALKNQLNAKNLEQVVANYKGNAGGADPFFKYSFSKEGMVS



KKEHAQQLDKFKTVLKNKARDLNFPNKEKLKQYLEAEIGIPVDANVYS



QMFSNGVSEVQPKTTRNMSFSNEKLDLLTELKDLNKGDGFEYAREVLN



GFFDSELHTTEDKFNITSRYLGGDKSNRLSKLYKIWKKEGVDCEEGIQQF



CEAVKDKMGQIPIRNVLKYLWQFRETVSAEDFEAAAKANHLEEKISRVK



AHPIVISNRYWAFGTSALVGNIMPADKRHQGEYAGQNFKMWLRAELHY



DGKKAKHHLPFYNARFFEEVYCYHPSVAEITPFKTKQFGCEIGKDIPDYV



SVALKDNPYKKATKRILRAIYNPVANTTRVDKTTNCSFMIKRENDEYKL



VINRKISRDRPKRIEVGRTIMGYDRNQTASDTYWIGRLVPPGTRGAYRIG



EWSVQYIKSGPVLSSTQGVNNSTTDQLVYNGMPSSSERFKAWKKARMA



FIRKLIRQLNDEGLESKGQDYIPENPSSFDVRGETLYVFNSNYLKALVSK



HRKAKKPVEGILDEIEAWTSKDKDSCSLMRLSSLSDASMQGIASLKSLIN



SYFNKNGCKTIEDKEKFNPVLYAKLVEVEQRRTNKRSEKVGRIAGSLEQ



LALLNGVEVVIGEADLGEVEKGKSKKQNSRNMDWCAKQVAQRLEYKL



AFHGIGYFGVNPMYTSHQDPFEHRRVADHIVMRARFEEVNVENIAEWH



VRNFSNYLRADSGTGLYYKQATMDFLKHYGLEEHAEGLENKKIKFYDF



RKILEDKNLTSVIIPKRGGRIYMATNPVTSDSTPITYAGKTYNRCNADEV



AAANIVISVLAPRSKKNREQDDIPLITKKAESKSPPKDRKRSKTSQLPQK-



KRPAATKKAGQAKKKK-GGGGSGGGGSGGGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDEST



DENVMLLTSDAPEYKPWALVIQDSNGENKIKML





mH6-Cas12i4
MKIEEGKGHHHHHH-


(D608A)-
MASISRPYGTKLRPDARKKEMLDKFFNTLTKGQRVFADLALCIYGSLTL


npNLS-2xSGGS-
EMAKSLEPESDSELVCAIGWFRLVDKTIWSKDGIKQENLVKQYEAYSGK


XTEN-2xSGGS-
EASEVVKTYLNSPSSDKYVWIDCRQKFLRFQRELGTRNLSEDFECMLFE


A3A-SGGS-
QYIRLTKGEIEGYAAISNMFGNGEKEDRSKKRMYATRMKDWLEANENI


UGI-SGGS-
TWEQYREALKNQLNAKNLEQVVANYKGNAGGADPFFKYSFSKEGMVS


bpNLS
KKEHAQQLDKFKTVLKNKARDLNFPNKEKLKQYLEAEIGIPVDANVYS


(SEQ ID NO: 65)
QMFSNGVSEVQPKTTRNMSFSNEKLDLLTELKDLNKGDGFEYAREVLN



GFFDSELHTTEDKFNITSRYLGGDKSNRLSKLYKIWKKEGVDCEEGIQQF



CEAVKDKMGQIPIRNVLKYLWQFRETVSAEDFEAAAKANHLEEKISRVK



AHPIVISNRYWAFGTSALVGNIMPADKRHQGEYAGQNFKMWLRAELHY



DGKKAKHHLPFYNARFFEEVYCYHPSVAEITPFKTKQFGCEIGKDIPDYV



SVALKDNPYKKATKRILRAIYNPVANTTRVDKTTNCSFMIKRENDEYKL



VINRKISRDRPKRIEVGRTIMGYDRNQTASDTYWIGRLVPPGTRGAYRIG



EWSVQYIKSGPVLSSTQGVNNSTTDQLVYNGMPSSSERFKAWKKARMA



FIRKLIRQLNDEGLESKGQDYIPENPSSFDVRGETLYVFNSNYLKALVSK



HRKAKKPVEGILDEIEAWTSKDKDSCSLMRLSSLSDASMQGIASLKSLIN



SYFNKNGCKTIEDKEKFNPVLYAKLVEVEQRRTNKRSEKVGRIAGSLEQ



LALLNGVEVVIGEADLGEVEKGKSKKQNSRNMDWCAKQVAQRLEYKL



AFHGIGYFGVNPMYTSHQDPFEHRRVADHIVMRARFEEVNVENIAEWH



VRNFSNYLRADSGTGLYYKQATMDFLKHYGLEEHAEGLENKKIKFYDF



RKILEDKNLTSVIIPKRGGRIYMATNPVTSDSTPITYAGKTYNRCNADEV



AAANIVISVLAPRSKKNREQDDIPLITKKAESKSPPKDRKRSKTSQLPQK-



KRPAATKKAGQAKKKK-



SGGSSGGSSGSETPGTSESATPESSGGSSGGS-



MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMD



QHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISW



SPCFSAGCAGEVRAFLQENTHVRLRIFAARIYDYDYLYKEALQMLRDAG



AQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQ



NQGN-SGGS-



MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDEST



DENVMLLTSDAPEYKPWALVIQDSNGENKIKML-SGGS-



MKRTADGSEFESPKKKRKV









Each RNA guide sequence with a U6 promoter (Table 9) was cloned into a plasmid backbone and maxi-prepped. A working solution of 144 ng/μL effector plasmids was prepared in water (effector working solution), and a working solution of 50 ng/μL of corresponding guide RNA plasmids was prepared in water (guide working solution).









TABLE 9

RNA guide sequences

Target | Nuclease | Target Sequence | RNA Guide Sequence
EMX1_T7 | Cas12i4 | AGCAAGGGACTATTCAGGGA (SEQ ID NO: 76) | AGACAUGUGUCCUCAGUGACACAGCAAGGGACUAUUCAGGGA (SEQ ID NO: 121)
EMX1_T7 | Cas12i2 | AGCAAGGGACTATTCAGGGA (SEQ ID NO: 76) | AGAAAUCCGUCUUUCAUUGACGGAGCAAGGGACUAUUCAGGGA (SEQ ID NO: 96)
EMX1_T7 | Cas9 | CTTTAGCAAGGGACTATTCA (SEQ ID NO: 77) | GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCCUUUAGCAAGGGACUAUUCA (SEQ ID NO: 97)









Cells were transfected and C>T reads were measured for every C within the target as described in Example 1. Each of the Cas12i4-deaminase fusion constructs demonstrated C>T editing at one or more cytosines within the EMX1_T7 target. FIG. 10 shows base editing efficiencies of Cas12i4, Cas12i2, and Cas9 constructs according to positions within the tested targets. As shown in FIG. 10, the Cas12i4-deaminase fusion construct of SEQ ID NO: 64 and the Cas12i2-deaminase fusion construct of SEQ ID NO: 45 each demonstrated C>T base editing activity at C10 and C15 within the Cas12i EMX1_T7 target, and the Cas9-deaminase fusion construct of SEQ ID NO: 51 demonstrated C>T base editing activity at C7 and C14 of the Cas9 EMX1_T7 target. Therefore, the fusion strategy used for Cas12i2 was compatible with Cas12i4, and Cas12i4-deaminase fusion constructs exhibited editing profiles similar to those of the Cas12i2-deaminase fusion constructs.


Therefore, this Example shows that, like Cas12i2-deaminase constructs, Cas12i4-deaminase constructs introduced C>T edits in targets.


Other Embodiments

The disclosures of each and every patent, patent application, and publication cited herein are hereby incorporated herein by reference in their entirety. While this disclosure has been described with reference to specific embodiments, it is apparent that other embodiments and variations of this disclosure may be devised by others skilled in the art without departing from the true spirit and scope of the invention. The appended claims are intended to be construed to include all such embodiments and equivalent variations.

Claims
  • 1. A Cas12i fusion protein comprising: i) a Cas12i polypeptide comprising an alteration (e.g., comprising a plurality of alterations) relative to the amino acid sequence of SEQ ID NO: 2, wherein the alteration (e.g., plurality of alterations) is selected from the group comprising D581R, D911R, I926R, V1030G, S1046G, G624R, F626R, E1035R, and P868T, wherein the Cas12i2 polypeptide comprises at least 90% identity to the amino acid sequence of SEQ ID NO: 2, and wherein the Cas12i polypeptide has reduced nuclease activity or is a nuclease dead Cas12i polypeptide; and ii) a heterologous sequence comprising a deaminase domain.
  • 2. The Cas12i fusion protein of claim 1, wherein the Cas12i polypeptide comprises one or more alterations in a catalytic residue, wherein the one or more catalytic residues are selected from D599, E833, and D1019.
  • 3. The Cas12i fusion protein of claim 1 or 2, wherein the Cas12i polypeptide comprises one or more alterations in a catalytic residue, wherein the one or more alterations are selected from D599A, D599K, E833Q, E833N, D1019K, and D1019N.
  • 4. The Cas12i fusion protein of claim 2 or 3, wherein the one or more alterations in a catalytic residue comprise: (i) D1019K and D599K; (ii) D1019N and D599K; or (iii) D1019K, E833N, and D599K.
  • 5. The Cas12i fusion protein of any of the preceding claims, wherein the plurality of alterations further comprises G587R.
  • 6. The Cas12i fusion protein of any of the preceding claims, wherein the plurality of alterations further comprise a second alteration relative to the amino acid sequence of SEQ ID NO: 2.
  • 7. The Cas12i fusion protein of claim 6, wherein the second alteration comprises a substitution, insertion, or deletion.
  • 8. The Cas12i fusion protein of claim 7, wherein the Cas12i polypeptide further comprises a third alteration relative to the amino acid sequence of SEQ ID NO: 2, and optionally further comprises a fourth, a fifth, a sixth, a seventh, eighth, ninth, and tenth alteration relative to the amino acid sequence of SEQ ID NO: 2.
  • 9. The Cas12i fusion protein of claim 8, wherein the third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth alterations, if present, each independently comprises a substitution, insertion, or deletion.
  • 10. The Cas12i fusion protein of any of the preceding claims, wherein the plurality of alterations comprise one or more of (e.g., 2, 3, or all of) D581R, D911R, I926R, and V1030G.
  • 11. The Cas12i fusion protein of any of claims 1-9, wherein the plurality of alterations comprise one or more of (e.g., 2 or all of) D581R, I926R, and V1030G.
  • 12. The Cas12i fusion protein of any of claims 1-9, wherein the plurality of alterations comprise one or more of (e.g., 2, 3, or all of) D581R, I926R, V1030G, and S1046G.
  • 13. The Cas12i fusion protein of any of claims 1-9, wherein the plurality of alterations comprise one or more of (e.g., 2, 3, 4, 5, 6, or all of) D581R, G624R, F626R, I926R, V1030G, E1035R, and S1046G.
  • 14. The Cas12i fusion protein of any of claims 1-9, wherein the plurality of alterations comprise one or more of (e.g., 2, 3, 4, 5, 6, or all of) D581R, G624R, F626R, P868T, I926R, V1030G, E1035R, and S1046G.
  • 15. The Cas12i fusion protein of any one of claims 1-14, wherein the Cas12i polypeptide comprises at least 95% or 99% identity to the amino acid sequence of SEQ ID NO: 2.
  • 16. The Cas12i fusion protein of any one of claims 1-15, wherein the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 45, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the Cas12i fusion protein, when bound to a RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.
  • 17. The Cas12i fusion protein of any one of claims 1-15, wherein the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 41-44 or 46, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.
  • 18. A Cas12i polypeptide comprising an alteration relative to the amino acid sequence of SEQ ID NO: 2, wherein the alteration is selected from D1019K or D1019N.
  • 19. A Cas12i fusion protein comprising the Cas12i polypeptide of claim 18 and a heterologous sequence comprising a deaminase domain.
  • 20. A Cas12i fusion protein comprising: i) a Cas12i polypeptide comprising an alteration (e.g., a plurality of alterations) relative to the amino acid sequence of SEQ ID NO: 9, wherein the alteration (e.g., plurality of alterations) is selected from the group comprising E480R, G564R, V592R, or E1042R, wherein the Cas12i polypeptide comprises at least 90% identity to the amino acid sequence of SEQ ID NO: 9, and wherein the Cas12i polypeptide has reduced nuclease activity or is a nuclease dead Cas12i polypeptide; and ii) a heterologous sequence comprising a deaminase domain.
  • 21. The Cas12i fusion protein of claim 20, wherein the Cas12i polypeptide comprises an alteration in a catalytic residue, wherein optionally the alteration comprises an alteration at one or more of D608 (e.g., D608A), E844, and D1022.
  • 22. The Cas12i fusion protein of claim 20 or 21, wherein the Cas12i polypeptide further comprises a second alteration relative to the amino acid sequence of SEQ ID NO: 9.
  • 23. The Cas12i fusion protein of claim 22, wherein the second alteration comprises a substitution, insertion, or deletion.
  • 24. The Cas12i fusion protein of claim 23, wherein the Cas12i polypeptide further comprises a third alteration, and optionally further comprises a fourth, a fifth, a sixth, a seventh, eighth, ninth, and tenth alteration.
  • 25. The Cas12i fusion protein of claim 24, wherein the third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth alterations, if present, each independently comprises a substitution, insertion, or deletion.
  • 26. The Cas12i fusion protein of any of claims 20-25, wherein the plurality of alterations comprise E480R, G564R, V592R, and E1042R.
  • 27. The Cas12i fusion protein of claim 26, wherein the Cas12i polypeptide further comprises an alteration in a catalytic residue, wherein the alteration comprises D608A.
  • 28. The Cas12i fusion protein of any one of claims 20-27, wherein the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 60-63, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the Cas12i fusion protein, when bound to a RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.
  • 29. The Cas12i fusion protein of any one of claims 20-27, wherein the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 64, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.
  • 30. The Cas12i fusion protein of any of the preceding claims, wherein the heterologous sequence is N-terminal or C-terminal of the Cas12i polypeptide.
  • 31. The Cas12i fusion protein of any of the preceding claims, wherein the heterologous sequence is N-terminal of the Cas12i polypeptide.
  • 32. The Cas12i fusion protein of any one of claims 1-30, wherein the heterologous sequence is C-terminal of the Cas12i polypeptide.
  • 33. The Cas12i fusion protein of any of the preceding claims, wherein the deaminase domain is chosen from a human APOBEC3 family deaminase, an Activation Induced Deaminase (AID), or an ABE8 deaminase, or a functional fragment or variant thereof, or a deaminase domain having at least 80% amino acid sequence identity thereto.
  • 34. The Cas12i fusion protein of claim 33, wherein the human APOBEC3 family deaminase is A3A comprising an amino acid sequence of SEQ ID NO: 29, the AID deaminase comprises an amino acid sequence of SEQ ID NO: 28, or the ABE8 is ABE8_20 (SEQ ID NO: 30), or a functional fragment or variant thereof, or a deaminase domain having at least 80% amino acid sequence identity thereto.
  • 35. The Cas12i fusion protein of any one of claims 1-19, wherein the deaminase domain is chosen from human APOBEC3a (A3A; SEQ ID NO: 29) or Activation Induced Deaminase (AID; SEQ ID NO: 28), or a functional fragment or variant thereof, or a deaminase domain having at least 80% amino acid sequence identity thereto.
  • 36. The Cas12i fusion protein of any one of claims 20-33, wherein the deaminase domain is chosen from an APOBEC3 family deaminase or ABE8_20, or a functional fragment or variant thereof, or a deaminase domain having at least 80% amino acid sequence identity thereto.
  • 37. The Cas12i fusion protein of any of the preceding claims, wherein the heterologous sequence further comprises at least one peptide linker.
  • 38. The Cas12i fusion protein of claim 37, wherein the peptide linker comprises between 3 and 70 (e.g., 3-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, 45-50, 50-55, 55-60, 60-65, or 65-70) amino acid residues.
  • 39. The Cas12i fusion protein of any of the preceding claims, wherein the peptide linker comprises one or more Gly residues and one or more Ser residues.
  • 40. The Cas12i fusion protein of any one of claims 37-39, wherein the peptide linker comprises (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15).
  • 41. The Cas12i fusion protein of any of claims 37-40, wherein the peptide linker comprises one or more proline residues.
  • 42. The Cas12i fusion protein of any of claims 39-41, wherein the peptide linker comprises the structure of: L1-L2-L3, wherein L1 and L3 are each independently chosen from (GSG)x, (GGGS)x, or (GSSG)x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15), and L2 is a polypeptide comprising between 0-40 (e.g., 0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, or 35-40) amino acid residues.
  • 43. The fusion protein of claim 42, wherein L2 is an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SGSETPGTSESATPES (SEQ ID NO: 106).
  • 44. The Cas12i fusion protein of any of claims 37-43, wherein the peptide linker comprises an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 40.
  • 45. The Cas12i fusion protein of any of claims 1-36, wherein the Cas12i fusion protein does not comprise a linker sequence.
  • 46. The Cas12i fusion protein of any of the preceding claims, wherein the heterologous sequence is heterologous to both the Cas12i polypeptide and the deaminase domain.
  • 47. The Cas12i fusion protein of any of the preceding claims, wherein the heterologous sequence comprises a Uracil Glycosylase Inhibitor (UGI) polypeptide.
  • 48. The Cas12i fusion protein of any of the preceding claims, wherein the heterologous sequence comprises a nuclear localization sequence (NLS) polypeptide.
  • 49. The Cas12i fusion protein of any of the preceding claims, wherein the Cas12i fusion protein forms a complex with a ribonucleic acid (RNA) guide wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).
  • 50. A method of introducing a substitution in a target sequence of a target nucleic acid in a cell, the method comprising: contacting the cell with: (i) a Cas12i domain, (ii) an RNA guide, and (iii) a deaminase domain, or nucleic acid encoding (i), (ii), and (iii), wherein the target nucleic acid comprises a nucleotide position 0, wherein nucleotide position 0 of the target strand hybridizes with the direct repeat-proximal end of the spacer sequence of the RNA guide, and a position x (wherein optionally x=20), wherein position x of the target strand hybridizes with the direct repeat-distal end of the spacer sequence of the RNA guide, and wherein the target nucleic acid comprises an A or a C between positions 5-16 (e.g., between positions 7-12, e.g., between positions 8-11) on the target strand or the non-target strand, wherein the A is substituted to an inosine (I) (e.g., converts an A:T base pair to an I:C, I:U, or I:A base pair) or a guanine (G) or the C is substituted to a U or T (e.g., converts a C:G base pair to a T:A base pair).
  • 51. A method of introducing a substitution in a target sequence of a target nucleic acid in a cell, the method comprising: contacting the cell with (i) a Cas12i fusion protein of any of claims 1-49, and (ii) an RNA guide, or a nucleic acid encoding (i) and (ii), thereby introducing the substitution.
  • 52. The method of claim 51, wherein the cell is in vivo.
  • 53. The method of claim 51, wherein the cell is ex vivo.
  • 54. A composition comprising: a) the Cas12i fusion protein of any one of claims 1-49; and b) an RNA guide or a nucleic acid encoding the RNA guide, wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).
  • 55. The Cas12i fusion protein of claim 49, the method of any one of claims 50-53, or the composition of claim 54, wherein the spacer sequence comprises about 10 nucleotides to about 50 nucleotides in length (e.g., about 15 nucleotides and about 35 nucleotides in length).
  • 56. The Cas12i fusion protein of claim 49 or 55, the method of any one of claims 50-53 or 55, or the composition of claim 54 or 55, wherein the spacer sequence is substantially identical to a target sequence of a target nucleic acid.
  • 57. The Cas12i fusion protein, the method, or the composition of claim 56, wherein the target sequence is adjacent to a protospacer adjacent motif (PAM) sequence.
  • 58. The Cas12i fusion protein, the method, or the composition of claim 57, wherein the PAM sequence comprises a sequence set forth as 5′-NTTN-3′, wherein N is any nucleotide.
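By way of illustration only, the positional language of claims 50, 57, and 58 can be sketched as a short sequence scan. The sketch below is not part of the claimed subject matter and rests on several assumptions that the claims do not state: the function name find_candidate_sites is hypothetical; the 5′-NTTN-3′ PAM is assumed to lie immediately 5′ of the target sequence on the strand supplied; position 0 is assumed to be the first nucleotide after the PAM; the range "positions 5-16" is treated as inclusive; and the target length defaults to 21 nucleotides (positions 0 through 20, corresponding to the optional x=20). Only the supplied strand is scanned for A or C.

```python
import re

PAM_LENGTH = 4  # length of the 5'-NTTN-3' motif recited in claim 58


def find_candidate_sites(seq, target_len=21, window=(5, 16)):
    """Return PAM-adjacent target sequences with an A or C in the window.

    Assumptions (not taken from the claims): the PAM lies immediately 5'
    of the target on the supplied strand, position 0 is the first
    nucleotide after the PAM, and the window bounds are inclusive.
    """
    seq = seq.upper()
    lo, hi = window
    sites = []
    # A lookahead is used so that overlapping PAM matches are all reported.
    for match in re.finditer(r"(?=([ACGT]TT[ACGT]))", seq):
        pam_start = match.start()
        target_start = pam_start + PAM_LENGTH
        target = seq[target_start:target_start + target_len]
        if len(target) < target_len:
            continue  # not enough sequence left for a full-length target
        # Collect (position, base) pairs for every A or C inside the window.
        editable = [
            (i, target[i])
            for i in range(lo, min(hi, target_len - 1) + 1)
            if target[i] in "AC"
        ]
        if editable:
            sites.append(
                {
                    "pam": match.group(1),
                    "pam_start": pam_start,
                    "target": target,
                    "editable": editable,
                }
            )
    return sites


if __name__ == "__main__":
    # Hypothetical example sequence: one PAM followed by a 21-nt target.
    example = "GTTG" + "G" * 7 + "AC" + "G" * 12
    for site in find_candidate_sites(example):
        print(site["pam"], site["target"], site["editable"])
```

Passing window=(7, 12) or window=(8, 11) narrows the scan to the example ranges recited in claim 50. To cover an A or C on the complementary strand, one option is to run the same scan on the reverse complement of the input.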
CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application No. 63/242,940, filed Sep. 10, 2021, and U.S. Provisional Application No. 63/270,513, filed Oct. 21, 2021. The contents of the aforementioned applications are hereby incorporated by reference in their entirety.

Provisional Applications (2)
Number      Date        Country
63270513    Oct 2021    US
63242940    Sep 2021    US