COMPOSITIONS COMPRISING A CAS12I POLYPEPTIDE AND USES THEREOF

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in XML file format and is hereby incorporated by reference in its entirety. Said XML copy, created on Nov. 17, 2022, is named A2186-704110_SL.xml and is 184,137 bytes in size. No new matter has been added.

BACKGROUND

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) genes, collectively known as CRISPR-Cas or CRISPR/Cas systems, are adaptive immune systems in archaea and bacteria that defend particular species against foreign genetic elements.

SUMMARY OF THE INVENTION

It is against the above background that the present invention provides certain advantages and advancements over the prior art. Although the invention disclosed herein is not limited to specific advantages or functionalities, the invention provides Cas12i fusion proteins, compositions, systems, and methods of using the Cas12i fusion proteins. In particular, such Cas12i fusion proteins contain one or more domains, wherein at least one of the domains is a deaminase domain and wherein at least one of the domains is a Cas12i domain or biologically active portion thereof. The Cas12i domain in the Cas12i fusion proteins may bind to a target sequence on a target nucleic acid specified by an RNA guide. While the amino acid numbering system used herein is in relation to SEQ ID NO: 2, other Cas12i sequences can be used. One of ordinary skill in the art can identify the corresponding amino acid positions in other Cas12i sequences using available tools, such as sequence alignment algorithms.
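By way of illustration only, the identification of corresponding amino acid positions from a pairwise alignment may be sketched as follows. The function name `map_positions` and the toy sequences are hypothetical and are not taken from the Sequence Listing; the sketch assumes the two gapped rows of an alignment have already been produced by any available alignment tool.

```python
from typing import Dict, Optional


def map_positions(aligned_ref: str, aligned_other: str) -> Dict[int, Optional[int]]:
    """Map 1-based reference positions to 1-based positions in another sequence.

    Both inputs are rows of one pairwise alignment (equal length, '-' for gaps).
    A reference residue aligned to a gap maps to None.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, Optional[int]] = {}
    ref_pos = 0
    other_pos = 0
    for ref_res, other_res in zip(aligned_ref, aligned_other):
        if other_res != "-":
            other_pos += 1
        if ref_res != "-":
            ref_pos += 1
            mapping[ref_pos] = other_pos if other_res != "-" else None
    return mapping


# Toy alignment rows (hypothetical, for illustration only).
ref_row = "MKD-LGE"
other_row = "M-DALGE"
pos_map = map_positions(ref_row, other_row)
# Reference position 3 (D) corresponds to position 2 of the other sequence;
# reference position 2 (K) has no counterpart and maps to None.
```

In such a sketch, an alteration stated relative to the reference numbering can be looked up in `pos_map` to find the residue to alter in the other sequence, where one exists.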

In one aspect, the disclosure provides a Cas12i fusion protein comprising:

- i) a Cas12i polypeptide comprising an alteration (e.g., comprising a plurality of alterations) relative to the amino acid sequence of SEQ ID NO: 2, wherein the alteration is selected from the group comprising G587R, G624R, F626R, E833Q, E833N, D1019K, D1019N, D581R, D911R, I926R, V1030G, E1035R, S1046G, and P868T, and wherein the Cas12i2 polypeptide comprises at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% identity to the amino acid sequence of SEQ ID NO: 2; and
- ii) a heterologous sequence comprising a deaminase domain.
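By way of illustration only, a percent identity value may be checked against the recited levels as sketched below. The sketch assumes one common convention (identical columns divided by the number of aligned columns, with gaps counted as mismatches); other conventions exist, and the text does not fix one here. The names `percent_identity` and `thresholds_met` and the toy sequences are hypothetical.

```python
from typing import List

THRESHOLDS = (80, 85, 90, 95, 97, 99)  # identity levels recited in the text


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns (assumed convention, gaps count as mismatches)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both rows carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> List[int]:
    """Return every recited threshold that the identity value satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Toy 10-residue sequences (hypothetical); 9 of 10 columns match.
toy_a = "MKDLGEAVRT"
toy_b = "MKDLGEAVRA"
identity = percent_identity(toy_a, toy_b)
met = thresholds_met(identity)
```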

In one aspect, the disclosure provides a Cas12i fusion protein comprising:

- i) a Cas12i polypeptide comprising an alteration (e.g., comprising a plurality of alterations) relative to the amino acid sequence of SEQ ID NO: 2, wherein the alteration (e.g., plurality of alterations) is selected from the group comprising D581R, D911R, I926R, V1030G, S1046G, G624R, F626R, E1035R, and P868T, wherein the Cas12i2 polypeptide comprises at least 90% identity to the amino acid sequence of SEQ ID NO: 2; and
- ii) a heterologous sequence comprising a deaminase domain.

In one aspect, the disclosure provides a Cas12i fusion protein comprising:

- i) a Cas12i polypeptide comprising an alteration (e.g., comprising a plurality of alterations) relative to the amino acid sequence of SEQ ID NO: 2, wherein the alteration (e.g., plurality of alterations) is selected from the group comprising D581R, D911R, I926R, V1030G, S1046G, G624R, F626R, E1035R, and P868T, wherein the Cas12i2 polypeptide comprises at least 90% identity to the amino acid sequence of SEQ ID NO: 2, and wherein the Cas12i polypeptide has reduced nuclease activity or is a nuclease dead Cas12i polypeptide; and
- ii) a heterologous sequence comprising a deaminase domain.

In some embodiments, the Cas12i polypeptide comprises one or more alterations in a catalytic residue, wherein the one or more catalytic residues are selected from D599, E833, and D1019.

In certain embodiments, the Cas12i polypeptide comprises one or more alterations in a catalytic residue, wherein the one or more alterations are selected from D599A, D599K, E833Q, E833N, D1019K, and D1019N.

In some embodiments, the alteration in a catalytic residue comprises D599A. In certain embodiments, the alteration in a catalytic residue comprises D599K. In some embodiments, the alteration in a catalytic residue comprises E833Q. In one embodiment, the alteration in a catalytic residue comprises E833N. In certain embodiments, the alteration in a catalytic residue comprises D1019K. In some embodiments, the alteration in a catalytic residue comprises D1019N.

In one embodiment, the one or more alterations in a catalytic residue comprises D1019K and D599K.

In certain embodiments, the one or more alterations in the catalytic residue comprises D1019N and D599K.

In one embodiment, the one or more alterations in the catalytic residue comprises D1019K, E833N, and D599K.

In certain embodiments, the plurality of alterations further comprises G587R.

In some embodiments, the alteration comprises G624R. In some embodiments, the alteration comprises F626R. In some embodiments, the alteration comprises D581R. In certain embodiments, the alteration comprises D911R. In some embodiments, the alteration comprises I926R. In certain embodiments, the alteration comprises V1030G. In some embodiments, the alteration comprises S1046G. In certain embodiments, the alteration comprises E1035R. In one embodiment, the alteration comprises P868T.

In certain embodiments, the plurality of alterations further comprise a second alteration relative to the amino acid sequence of SEQ ID NO: 2.

In certain embodiments, the second alteration comprises a substitution, insertion, or deletion.

In some embodiments, the Cas12i polypeptide further comprises a third alteration relative to the amino acid sequence of SEQ ID NO: 2, and optionally further comprises a fourth, a fifth, a sixth, a seventh, eighth, ninth, and tenth alteration relative to the amino acid sequence of SEQ ID NO: 2.

In certain embodiments, the third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth alterations, if present, each independently comprises a substitution, insertion, or deletion.

In some embodiments, the plurality of alterations comprise one or more of (e.g., 2, 3, or all of) D581R, D911R, I926R, and V1030G.

In certain embodiments, the plurality of alterations comprise one or more of (e.g., 2 or all of) D581R, I926R, and V1030G.

In certain embodiments, the plurality of alterations comprise one or more of (e.g., 2, 3, or all of) D581R, I926R, V1030G, and S1046G.

In some embodiments, the plurality of alterations comprise one or more of (e.g., 2, 3, 4, 5, 6, or all of) D581R, G624R, F626R, I926R, V1030G, E1035R, and S1046G.

In certain embodiments, the plurality of alterations comprise one or more of (e.g., 2, 3, 4, 5, 6, or all of) D581R, G624R, F626R, P868T, I926R, V1030G, E1035R, and S1046G.

In one embodiment, the plurality of alterations comprise:

- i) D581R, D911R, I926R, and V1030G;
- ii) D581R, I926R, and V1030G;
- iii) D581R, I926R, V1030G, and S1046G;
- iv) D581R, G624R, F626R, I926R, V1030G, E1035R, and S1046G; or
- v) D581R, G624R, F626R, P868T, I926R, V1030G, E1035R, and S1046G.
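By way of illustration only, the alteration notation used above (wild-type residue, 1-based position, replacement residue, e.g., D581R) may be applied to a sequence as sketched below. The function name `apply_alterations` and the five-residue toy sequence are hypothetical; the toy sequence is a stand-in and is not SEQ ID NO: 2.

```python
import re
from typing import List

ALTERATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_alterations(sequence: str, alterations: List[str]) -> str:
    """Apply substitutions such as 'D581R', verifying the wild-type residue first."""
    residues = list(sequence)
    for alt in alterations:
        match = ALTERATION.match(alt)
        if match is None:
            raise ValueError(f"unrecognized alteration: {alt}")
        wild_type = match.group(1)
        position = int(match.group(2))  # 1-based, as in the text
        replacement = match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {alt}")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)


# Hypothetical toy reference and toy alterations, for illustration only.
toy_reference = "MDGIV"
variant = apply_alterations(toy_reference, ["D2R", "I4R", "V5G"])
```

The wild-type check is a design choice in this sketch: it fails loudly when the numbering of the supplied sequence does not agree with the numbering assumed by the alteration.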

In certain embodiments, the Cas12i polypeptide comprises at least 95% or 99% identity to the amino acid sequence of SEQ ID NO: 2.

In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 41, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.

In some embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 42, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.

In one embodiment, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 43, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.

In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 44, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.

In some embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 45, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.

In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 46, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.

In one aspect, the disclosure provides a Cas12i polypeptide comprising an alteration relative to the amino acid sequence of SEQ ID NO: 2, wherein the alteration is selected from D1019K or D1019N.

In one aspect, the disclosure provides a Cas12i fusion protein comprising the Cas12i polypeptide of the immediate preceding aspect and a heterologous sequence comprising a deaminase domain.

In one aspect, the disclosure provides a Cas12i fusion protein comprising:

- i) a Cas12i polypeptide comprising an alteration (e.g., a plurality of alterations) relative to the amino acid sequence of SEQ ID NO: 9, wherein the alteration (e.g., plurality of alterations) is selected from the group comprising E480R, G564R, V592R, and E1042R, wherein the Cas12i polypeptide comprises at least 90% identity to the amino acid sequence of SEQ ID NO: 9, and wherein the Cas12i polypeptide has reduced nuclease activity or is a nuclease dead Cas12i polypeptide; and
- ii) a heterologous sequence comprising a deaminase domain.

In some embodiments, the alteration comprises E480R. In one embodiment, the alteration comprises G564R. In certain embodiments, the alteration comprises V592R. In some embodiments, the alteration comprises E1042R.

In certain embodiments, the Cas12i polypeptide comprises an alteration in a catalytic residue, wherein optionally the alteration comprises an alteration at one or more of D608 (e.g., D608A), E844, and D1022.

In certain embodiments, the Cas12i polypeptide further comprises a second alteration relative to the amino acid sequence of SEQ ID NO: 9.

In some embodiments, the second alteration comprises a substitution, insertion, or deletion.

In certain embodiments, the Cas12i polypeptide further comprises a third alteration, and optionally further comprises a fourth, a fifth, a sixth, a seventh, eighth, ninth, and tenth alteration.

In certain embodiments, the third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth alterations, if present, each independently comprises a substitution, insertion, or deletion.

In certain embodiments, the plurality of alterations comprise E480R, G564R, V592R, and E1042R.

In some embodiments, the Cas12i polypeptide further comprises an alteration in a catalytic residue, wherein the alteration comprises D608A.

In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 60, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.

In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 61, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.

In some embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 62, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.

In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 63, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.

In certain embodiments, the Cas12i fusion protein comprises an amino acid sequence according to SEQ ID NO: 64, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to an RNA guide, can introduce a substitution in a target sequence of a target nucleic acid.

In certain embodiments, the heterologous sequence is N-terminal or C-terminal of the Cas12i polypeptide. In some embodiments, the heterologous sequence is N-terminal of the Cas12i polypeptide. In certain embodiments, the heterologous sequence is C-terminal of the Cas12i polypeptide.

In some embodiments, the deaminase domain is chosen from a human APOBEC3 family deaminase, an Activation Induced Deaminase (AID), or an ABE8 deaminase, or a functional fragment or variant thereof, or a deaminase domain having at least 80% amino acid sequence identity thereto.

In certain embodiments, the human APOBEC3 family deaminase is A3A comprising an amino acid sequence of SEQ ID NO: 29, the AID deaminase comprises an amino acid sequence of SEQ ID NO: 28, or the ABE8 is ABE8_20 (SEQ ID NO: 30), or a functional fragment or variant thereof, or a deaminase domain having at least 80% amino acid sequence identity thereto.

In some embodiments, the deaminase domain is chosen from human APOBEC3a (A3A; SEQ ID NO: 29) or Activation Induced Deaminase (AID; SEQ ID NO: 28), or a functional fragment or variant thereof, or a deaminase domain having at least 80% amino acid sequence identity thereto.

In certain embodiments, the deaminase domain is chosen from an APOBEC3 family deaminase or ABE8_20, or a functional fragment or variant thereof, or a deaminase domain having at least 80% amino acid sequence identity thereto.

In certain embodiments, the heterologous sequence further comprises at least one peptide linker. In some embodiments, the peptide linker comprises between 3 and 70 (e.g., 3-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, 45-50, 50-55, 55-60, 60-65, or 65-70) amino acid residues. In certain embodiments, the peptide linker comprises one or more Gly residues and one or more Ser residues. In some embodiments, the peptide linker comprises (GSG)_x, (GGGS)_x, or (GSSG)_x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In certain embodiments, the peptide linker comprises one or more proline residues.

In some embodiments, the peptide linker comprises the structure of:

L₁-L₂-L₃

- wherein L₁ and L₃ are each independently chosen from (GSG)_x, (GGGS)_x, or (GSSG)_x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15), and
- L₂ is a polypeptide comprising between 0-40 (e.g., 0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, or 35-40) amino acid residues. In certain embodiments, L₂ is an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SGSETPGTSESATPES (SEQ ID NO: 106). In certain embodiments, the peptide linker comprises an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 40.
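By way of illustration only, a linker of the three-part structure described above may be assembled as sketched below. The function name `build_linker` and the particular choice of repeat units and repeat counts are hypothetical; the middle segment is the SGSETPGTSESATPES sequence quoted in the text, and the repeat units, the 0 to 15 range, and the 0 to 40 residue limit are taken from the text.

```python
ALLOWED_UNITS = ("GSG", "GGGS", "GSSG")  # repeat units recited in the text
L2_CORE = "SGSETPGTSESATPES"             # SEQ ID NO: 106 as quoted in the text


def build_linker(l1_unit: str, x1: int, l2: str, l3_unit: str, x3: int) -> str:
    """Concatenate (l1_unit)_x1, l2, and (l3_unit)_x3 after checking the recited ranges."""
    for unit, x in ((l1_unit, x1), (l3_unit, x3)):
        if unit not in ALLOWED_UNITS:
            raise ValueError(f"unsupported repeat unit: {unit}")
        if not 0 <= x <= 15:
            raise ValueError("x must be an integer between 0 and 15")
    if len(l2) > 40:
        raise ValueError("the middle segment is limited to 0-40 residues")
    return l1_unit * x1 + l2 + l3_unit * x3


# Hypothetical example: (GGGS)_2, then the quoted middle segment, then (GSG)_1.
linker = build_linker("GGGS", 2, L2_CORE, "GSG", 1)
```

Because x may be 0 and the middle segment may be empty, the same sketch also covers a linker consisting of only one or two of the three parts.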

In some embodiments, the Cas12i fusion protein does not comprise a linker sequence.

In some embodiments, the heterologous sequence is heterologous to both the Cas12i polypeptide and the deaminase domain.

In certain embodiments, the heterologous sequence comprises a Uracil Glycosylase Inhibitor (UGI) polypeptide.

In some embodiments, the heterologous sequence comprises a nuclear localization sequence (NLS) polypeptide.

In some embodiments, the Cas12i fusion protein forms a complex with a ribonucleic acid (RNA) guide wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).

In one aspect, the disclosure provides a method of introducing a substitution in a target sequence of a target nucleic acid in a cell, the method comprising:

- contacting the cell with: (i) a Cas12i domain, (ii) an RNA guide, and (iii) a deaminase domain, or nucleic acid encoding (i), (ii), and (iii),
- wherein the target nucleic acid comprises a nucleotide position 0, wherein nucleotide position 0 of the target strand hybridizes with the direct repeat-proximal end of the spacer sequence of the RNA guide, and a position x (wherein optionally x=20), wherein position x of the target strand hybridizes with the direct repeat-distal end of the spacer sequence of the RNA guide,
- and wherein the target nucleic acid comprises an A or a C between positions 5-16 (e.g., between positions 7-12, e.g., between positions 8-11) on the target strand or the non-target strand,
- wherein the A is substituted to an inosine (I) (e.g., converts an A:T base pair to an I:C, I:U, or I:A base pair) or a guanine (G) or the C is substituted to a U or T (e.g., converts a C:G base pair to a T:A base pair).
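By way of illustration only, the identification of A or C nucleotides within the recited positions may be sketched as follows. The sketch assumes the strand is written so that string index 0 is nucleotide position 0 (the direct repeat-proximal end) and that the recited ranges are inclusive; both are assumptions of the sketch. The function name `editable_positions` and the 21-nucleotide toy sequence are hypothetical.

```python
from typing import List, Tuple


def editable_positions(
    strand: str, start: int = 5, end: int = 16, bases: str = "AC"
) -> List[Tuple[int, str]]:
    """Return (position, base) pairs for A or C at positions start..end (inclusive)."""
    strand = strand.upper()
    return [
        (position, base)
        for position, base in enumerate(strand)
        if start <= position <= end and base in bases
    ]


# Hypothetical 21-nt sequence covering positions 0 through 20 (x = 20).
toy_strand = "GGTTGACGTTAGCTTGGTTGA"
broad_window = editable_positions(toy_strand)            # positions 5-16
narrow_window = editable_positions(toy_strand, 8, 11)    # positions 8-11
```

The same function may be applied to either the target strand or the non-target strand, provided the strand is supplied in the position 0 to position x orientation assumed above.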

In one aspect, the disclosure provides a method of introducing a substitution in a target sequence of a target nucleic acid in a cell, the method comprising:

- contacting the cell with (i) a Cas12i fusion protein described herein, and (ii) an RNA guide, or a nucleic acid encoding (i) and (ii),
- thereby introducing the substitution.

In certain embodiments, the cell is in vivo.

In some embodiments, the cell is ex vivo.

In one aspect, the disclosure provides a composition comprising:

- a) the Cas12i fusion protein described herein; and
- b) an RNA guide or a nucleic acid encoding the RNA guide, wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).

In some embodiments of the aspects or embodiments described herein, the spacer sequence comprises about 10 nucleotides to about 50 nucleotides in length. In some embodiments, the spacer sequence comprises about 15 nucleotides to about 35 nucleotides in length.

In certain embodiments, the spacer sequence is substantially identical to a target sequence of a target nucleic acid.

In some embodiments, the target sequence is adjacent to a protospacer adjacent motif (PAM) sequence. In certain embodiments, the PAM sequence comprises a sequence set forth as 5′-NTTN-3′, wherein N is any nucleotide.
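By way of illustration only, occurrences of the 5′-NTTN-3′ motif in a DNA sequence may be located as sketched below. The function name `find_pams` and the toy sequence are hypothetical, and the sketch only locates the four-nucleotide motif on the strand supplied; it makes no assumption about which side of the motif the target sequence lies on.

```python
import re
from typing import List, Tuple

# N is any nucleotide; a lookahead is used so that overlapping motifs are all reported.
PAM_PATTERN = re.compile(r"(?=([ACGT]TT[ACGT]))")


def find_pams(dna: str) -> List[Tuple[int, str]]:
    """Return (0-based start index, motif) for every 5'-NTTN-3' occurrence."""
    return [(m.start(), m.group(1)) for m in PAM_PATTERN.finditer(dna.upper())]


# Hypothetical toy sequence, written 5' to 3'.
toy_dna = "ACTTGGATTTC"
pam_hits = find_pams(toy_dna)
```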

In one aspect, the disclosure provides a Cas12i fusion protein comprising, in an N-terminal to C-terminal direction:

- (a) an N-terminal portion of a Cas12i polypeptide, wherein the N-terminal portion of the Cas12i polypeptide comprises a Cas12i sequence from the N-terminus to a loop, or a functional fragment or variant thereof;
- (b) a heterologous sequence comprising a deaminase domain; and
- (c) a C-terminal portion of the Cas12i polypeptide, wherein the C-terminal portion of the Cas12i polypeptide comprises a Cas12i sequence from the loop to the C-terminus, or a fragment or variant thereof.

In some embodiments, the N-terminal portion of the Cas12i polypeptide comprises amino acids 1-n of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and

- the C-terminal portion of the Cas12i polypeptide comprises amino acids m-1054 of SEQ ID NO: 2, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.

In some embodiments, n and m are each independently a number between:

- i) 342-358 (e.g., 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358);
- ii) 373-378 (e.g., 373, 374, 375, 376, 377, or 378);
- iii) 408-413 (e.g., 408, 409, 410, 411, 412, or 413);
- iv) 677-685 (e.g., 677, 678, 679, 680, 681, 682, 683, 684, or 685);
- v) 718-723 (e.g., 718, 719, 720, 721, 722, or 723);
- vi) 771-782 (e.g., 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782);
- vii) 953-965 (e.g., 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965);
- viii) 55-65 (e.g., 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, or 65);
- ix) 99-105 (e.g., 99, 100, 101, 102, 103, 104, or 105);
- x) 112-120 (e.g., 112, 113, 114, 115, 116, 117, 118, 119, or 120);
- xi) 195-206 (e.g., 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, or 206);
- xii) 241-250 (e.g., 241, 242, 243, 244, 245, 246, 247, 248, 249, or 250);
- xiii) 583-594 (e.g., 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, or 594);
- xiv) 877-901 (e.g., 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, or 901); or
- xv) 386-397 (e.g., 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397).

In certain embodiments, n<m. In some embodiments, m=n+1.
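By way of illustration only, the arrangement of an N-terminal portion (amino acids 1 to n), a heterologous sequence, and a C-terminal portion (amino acids m to the C-terminus) may be sketched as follows. The function name `assemble_inlaid_fusion`, the ten-residue toy sequence, and the toy insert are hypothetical; positions are 1-based as in the text.

```python
def assemble_inlaid_fusion(cas_seq: str, n: int, m: int, heterologous: str) -> str:
    """Join residues 1..n, the heterologous sequence, and residues m..end (1-based, n < m)."""
    if not 1 <= n < m <= len(cas_seq):
        raise ValueError("require 1 <= n < m <= length of the Cas sequence")
    n_terminal_portion = cas_seq[:n]       # amino acids 1 through n
    c_terminal_portion = cas_seq[m - 1:]   # amino acids m through the C-terminus
    return n_terminal_portion + heterologous + c_terminal_portion


# Hypothetical toy sequence and insert, for illustration only.
toy_cas = "MKTAYIAKQR"
toy_insert = "GSG"

# m = n + 1: no residue of the toy sequence is omitted.
contiguous = assemble_inlaid_fusion(toy_cas, 4, 5, toy_insert)
# n < m with m > n + 1: residues n+1 through m-1 are omitted.
with_gap = assemble_inlaid_fusion(toy_cas, 4, 7, toy_insert)
```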

In particular embodiments, the Cas12i polypeptide is a Cas12i4 polypeptide.

In some embodiments, the heterologous sequence comprises at least one linker (e.g., any linker described herein).

In certain embodiments, the heterologous sequence comprises a first linker (e.g., a first peptide linker) and a second linker (e.g., a second peptide linker). In some embodiments, the first linker and the second linker each independently comprise between 3 and 70 (e.g., 3-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, 45-50, 50-55, 55-60, 60-65, or 65-70) amino acid residues. In certain embodiments, the first linker and the second linker each independently comprise (GSG)_x, (GGGS)_x, or (GSSG)_x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15). In some embodiments, the first linker and the second linker each independently comprise an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 40. In some embodiments, the first linker and the second linker each independently comprise one or more proline residues. In certain embodiments, the first linker is N-terminal of the deaminase domain and the second linker is C-terminal of the deaminase domain. In some embodiments, the first linker and the second linker have the same sequence. In certain embodiments, the first linker and the second linker have different sequences.

In one aspect, the disclosure provides a fusion protein comprising:

- (a) a Cas12i4 polypeptide,
- (b) a deaminase domain chosen from APOBEC3 or ABE8_20, or a biologically active portion or variant thereof.

In one embodiment, the deaminase domain is N-terminal or C-terminal of the Cas12i4 polypeptide. In certain embodiments, the deaminase domain is N-terminal of the Cas12i4 polypeptide. In certain embodiments, the deaminase domain is C-terminal of the Cas12i4 polypeptide. In some embodiments, the fusion protein does not comprise a linker sequence.

In certain embodiments, the fusion protein further comprises at least one heterologous sequence, which is heterologous to both the Cas12i4 domain and the deaminase domain.

In certain embodiments, the heterologous sequence comprises a Uracil Glycosylase Inhibitor (UGI) polypeptide.

In some embodiments, the heterologous sequence comprises a nuclear localization sequence (NLS) polypeptide.

In some embodiments, the fusion protein comprises one, two, or three of:

- i) a first heterologous sequence situated between the Cas12i4 domain and the deaminase domain;
- ii) a second heterologous sequence situated between the Cas12i4 domain and the terminus nearest the Cas12i4 domain; or
- iii) a third heterologous sequence situated between the deaminase domain and the terminus nearest the deaminase domain.

In some embodiments, the deaminase domain is N-terminal of the Cas12i4 domain and the UGI domain. In certain embodiments, the deaminase domain is C-terminal of the Cas12i4 domain.

In some embodiments, the deaminase domain is N-terminal of the Cas12i4 domain, the first heterologous sequence comprises the UGI domain, the second heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide), and the third heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide).

In some embodiments, the deaminase domain is C-terminal of the Cas12i4 domain, the first heterologous sequence comprises a linker, the second heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide) and the UGI domain, and the third heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide).

In some embodiments, the deaminase domain is C-terminal of the Cas12i4 domain, the first heterologous sequence comprises an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide), the second heterologous sequence is absent or comprises a peptide tag (e.g., a peptide purification tag), and the third heterologous sequence comprises a UGI domain and an NLS polypeptide (e.g., a bpNLS or an npNLS polypeptide).
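By way of illustration only, the N-terminal to C-terminal order implied by the first, second, and third heterologous sequences described above may be sketched as follows. The function name `fusion_layout` and the text labels are hypothetical; the order of components within a single heterologous sequence (e.g., a UGI domain and an NLS polypeptide together) is not resolved by this sketch.

```python
from typing import List, Optional


def fusion_layout(
    deaminase_is_n_terminal: bool,
    first: Optional[str] = None,
    second: Optional[str] = None,
    third: Optional[str] = None,
) -> List[str]:
    """List the components from N-terminus to C-terminus, skipping absent ones.

    first:  between the Cas12i4 domain and the deaminase domain
    second: between the Cas12i4 domain and the terminus nearest to it
    third:  between the deaminase domain and the terminus nearest to it
    """
    if deaminase_is_n_terminal:
        parts = [third, "deaminase", first, "Cas12i4", second]
    else:
        parts = [second, "Cas12i4", first, "deaminase", third]
    return [part for part in parts if part is not None]


# Deaminase N-terminal; first = UGI, second = NLS, third = NLS.
layout_a = fusion_layout(True, first="UGI", second="NLS", third="NLS")
# Deaminase C-terminal; first = NLS, second absent, third = UGI and NLS.
layout_b = fusion_layout(False, first="NLS", third="UGI+NLS")
```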

In certain embodiments, the first heterologous sequence comprises the UGI polypeptide. In some embodiments, the UGI polypeptide is flanked by peptide linkers.

In some embodiments, the second and third heterologous sequences each independently comprise an NLS polypeptide.

In some embodiments, the first heterologous sequence comprises a peptide linker and, when present, the second heterologous sequence or the third heterologous sequence comprises an NLS polypeptide, one or more linkers, and a UGI polypeptide.

In certain embodiments, the NLS polypeptide is N-terminal of the UGI polypeptide. In certain embodiments, the NLS polypeptide is C-terminal of the UGI polypeptide.

In some embodiments, one of the one or more linkers is situated between the NLS polypeptide and the UGI polypeptide.

In certain embodiments, the third heterologous sequence comprises a first and a second linker, wherein the first linker is situated N-terminal of the NLS polypeptide and the UGI polypeptide and the second linker is situated between the NLS polypeptide and the UGI polypeptide.

In some embodiments, the first heterologous sequence further comprises an NLS sequence. In certain embodiments, the NLS polypeptide is situated N-terminal of the linker.

In some embodiments, the fusion protein does not comprise the second heterologous sequence.

In one aspect, the disclosure provides a fusion protein comprising:

- (a) a Cas12i4 polypeptide,
- (b) a deaminase domain; and
- (c) a UGI polypeptide.

In some embodiments, the deaminase domain is N-terminal or C-terminal of the Cas12i4 polypeptide. In some embodiments, the deaminase domain is N-terminal of the Cas12i4 polypeptide. In certain embodiments, the deaminase domain is C-terminal of the Cas12i4 polypeptide.

In some embodiments, the fusion protein does not comprise a linker sequence.

In certain embodiments, the fusion protein further comprises at least one heterologous sequence, which is heterologous to each of the Cas12i4 domain, the deaminase domain, and the UGI polypeptide.

In one embodiment, the heterologous sequence comprises a nuclear localization sequence (NLS) polypeptide.

In certain embodiments, the fusion protein comprises one, two, or three of:

- i) a first heterologous sequence situated between the Cas12i4 domain and the deaminase domain;
- ii) a second heterologous sequence situated between the Cas12i4 domain and the terminus nearest the Cas12i4 domain; or
- iii) a third heterologous sequence situated between the deaminase domain and the terminus nearest the deaminase domain.

In certain embodiments, the deaminase domain is N-terminal of the Cas12i4 domain and the UGI domain.

In some embodiments, the deaminase domain is C-terminal of the Cas12i4 domain.

In certain embodiments, the fusion protein does not comprise the first heterologous sequence, and the UGI domain is situated between the deaminase domain and the Cas12i4 domain.

In some embodiments, the UGI domain is situated C-terminal of both the deaminase domain and the Cas12i4 domain.

In certain embodiments, the UGI domain is flanked by peptide linkers.

In certain embodiments, when present, the first and second heterologous sequences each independently comprise an NLS polypeptide.

In some embodiments, the one, two, or three of the first, the second, and the third heterologous sequence each independently comprise one or more peptide linkers and the third heterologous sequence comprises an NLS polypeptide. In some embodiments, the NLS polypeptide is N-terminal of the UGI polypeptide. In certain embodiments, the NLS polypeptide is C-terminal of the UGI polypeptide.

In some embodiments, at least one (e.g., one) of the one or more of the linkers is situated between the NLS polypeptide and the UGI polypeptide.

In some embodiments, the first heterologous sequence further comprises an NLS sequence. In certain embodiments, the NLS polypeptide is situated N-terminal of the linker.

In some embodiments, the NLS polypeptide is selected from a nuclear plasma NLS (npNLS) polypeptide or a bipartite NLS (bpNLS) polypeptide. In certain embodiments, the fusion protein comprises an npNLS polypeptide and a bpNLS polypeptide.

In some embodiments, the npNLS polypeptide is situated N-terminal of the bpNLS polypeptide.

In certain embodiments, the npNLS polypeptide is situated C-terminal of the bpNLS polypeptide.

In some embodiments, the npNLS polypeptide comprises an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 36. In certain embodiments, the bpNLS polypeptide comprises an amino acid sequence having at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 38.

In some embodiments, the npNLS polypeptide comprises an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 36, and the bpNLS polypeptide comprises an amino acid sequence having at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 38.

In certain embodiments, each peptide linker independently comprises between 2 and 200 amino acid residues. In some embodiments, each peptide linker independently comprises one or more Gly residues and one or more Ser residues. In certain embodiments, each peptide linker independently comprises (GSG)_x, (GGGS)_x, or (GSSG)_x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15).

In particular embodiments, the peptide linker comprises the structure of:

L₁-L₂-L₃

- wherein L₁ and L₃ are each independently chosen from (GSG)_x, (GGGS)_x, or (GSSG)_x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15), and
- L₂ is a polypeptide comprising between 0-40 (e.g., 0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, or 35-40) amino acid residues.

In certain embodiments, L₂ is an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SGSETPGTSESATPES (SEQ ID NO: 106).

In some embodiments, the peptide linker comprises an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 40.

In certain embodiments, at least one of the first, second, or third heterologous sequence comprises a linker comprising an amino acid sequence comprising at least 80% (e.g., 85%, 90%, 95%, 99% or 100%) sequence identity to SEQ ID NO: 40.

In some embodiments, the fusion protein comprises an N-terminal or C-terminal peptide tag.

In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 60, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can introduce a substitution in a target sequence of a target nucleic acid.

In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 61, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can introduce a substitution in a target sequence of a target nucleic acid.

In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 62, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can introduce a substitution in a target sequence of a target nucleic acid.

In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 63, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can introduce a substitution in a target sequence of a target nucleic acid.

In one aspect, the disclosure provides a fusion protein comprising an amino acid sequence according to SEQ ID NO: 64, or a sequence having at least 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the fusion protein, when bound to a gRNA, can introduce a substitution in a target sequence of a target nucleic acid.

In certain embodiments, the fusion protein forms a complex with a ribonucleic acid (RNA) guide wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).

In one aspect, the disclosure provides a polypeptide system comprising:

- (a) a first polypeptide comprising a Cas12i domain and a first dimerization domain, and
- (b) a second polypeptide comprising a deaminase domain and a second, compatible dimerization domain.

In certain embodiments, the first polypeptide comprises a first peptide linker situated between the Cas12i domain and the first dimerization domain.

In some embodiments, the second polypeptide comprises a second peptide linker situated between the deaminase domain and the second dimerization domain.

In certain embodiments, each peptide linker independently comprises between 2 and 200 amino acid residues.

In some embodiments, each peptide linker independently comprises one or more Gly residues and one or more Ser residues.

In certain embodiments, each peptide linker independently comprises (GSG)_x, (GGGS)_x, or (GSSG)_x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15).

In some embodiments, each peptide linker independently comprises one or more proline residues.

In particular embodiments, the peptide linker comprises the structure of:

L₁-L₂-L₃

- wherein L₁ and L₃ are each independently chosen from (GSG)_x, (GGGS)_x, or (GSSG)_x, wherein x is an integer between 0 and 15 (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15), and
- L₂ is a polypeptide comprising between 0-40 (e.g., 0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, or 35-40) amino acid residues.

In some embodiments, the first polypeptide and the second polypeptide form a complex upon dimerization of the first dimerization domain and the second dimerization domain.

In certain embodiments, the Cas12i domain comprises a Cas12i1 polypeptide, a Cas12i2 polypeptide, a Cas12i3 polypeptide, or a Cas12i4 polypeptide, and wherein:

- (a) the Cas12i1 polypeptide comprises an amino acid sequence with at least 80% (e.g., 85%, 90%, 95%, 97%, 98%, or 99%) identity to SEQ ID NO: 8;
- (b) the Cas12i2 polypeptide comprises an amino acid sequence with at least 80% (e.g., 85%, 90%, 95%, 97%, 98%, or 99%) identity to any one of SEQ ID NOs: 2-7;
- (c) the Cas12i3 polypeptide comprises an amino acid sequence with at least 80% (e.g., 85%, 90%, 95%, 97%, 98%, or 99%) identity to SEQ ID NO: 11; and
- (d) the Cas12i4 polypeptide comprises an amino acid sequence with at least 80% (e.g., 85%, 90%, 95%, 97%, 98%, or 99%) identity to SEQ ID NO: 9 or SEQ ID NO: 10.
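
By way of non-limiting illustration only, a percent identity threshold such as those recited above could be checked as in the following Python sketch. The sketch assumes that the two sequences have already been aligned by a separate tool (so that they have equal length, with `-` marking gaps) and that identity is expressed as matching columns divided by alignment length; other conventions for the denominator exist, and the disclosure above does not specify one. The function names are hypothetical.

```python
# Illustrative sketch only: percent identity over a pre-computed pairwise alignment.
# Assumption: identity = identical non-gap columns / total alignment columns.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity for two aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the aligned pair reaches at least the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy ten-residue sequences, not taken from the Sequence Listing.
print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))
print(meets_threshold("MKTAYIAKQR", "MKTAYLAKQK"))
```

A gap in either sequence counts as a non-matching column under this convention, so insertions and deletions lower the reported identity.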

In some embodiments, the Cas12i domain forms a complex with an RNA guide, wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).

In certain embodiments, the first dimerization domain and the second dimerization domain are identical (e.g., a homodimer). In some embodiments, the first dimerization domain and the second dimerization domain are not identical (e.g., a heterodimer). In certain embodiments, the first dimerization domain is chosen from leucine zipper, nanobody, antibody, or coiled-coil domain. In certain embodiments, the first and second dimerization domains are chemically inducible dimerization domains (e.g., a rapamycin sensitive dimerization domain) that can be regulated by the presence of a small molecule.

In one aspect, the disclosure provides a fusion protein comprising:

- a) a first portion comprising an amino acid sequence of an N-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and
- b) a second portion comprising an amino acid sequence of a C-terminal portion of any one of SEQ ID NOs: 2-7, or a sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto,
- wherein the second portion is N-terminal of the first portion,
- wherein the first portion and second portion together bind to an RNA guide comprising a direct repeat sequence and a spacer sequence.

In some embodiments, the fusion protein is capable of specifically binding a target nucleic acid complementary to the spacer sequence.

In certain embodiments, the first portion and the second portion are linked by a heterologous sequence.

In some embodiments, the heterologous sequence comprises one or more of:

- a) a first linker (e.g., a first peptide linker);
- b) a second linker (e.g., a second peptide linker); and
- c) an effector domain.
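
By way of non-limiting illustration only, the rearrangement described above, in which the C-terminal portion of a parent sequence is placed N-terminal of the N-terminal portion and the two portions are optionally joined by a heterologous sequence, can be sketched in Python as follows. The function name `circularly_permute` and the toy sequences are hypothetical; the sketch assumes 1-based residue numbering and assumes that the first portion ends at the chosen split residue while the second portion begins at the next residue.

```python
# Illustrative sketch only: places the C-terminal portion of a parent sequence
# in front of its N-terminal portion, optionally joined by a heterologous sequence.
# Assumption: split_residue is the 1-based index of the last residue of the
# first (N-terminal) portion.


def circularly_permute(parent: str, split_residue: int, heterologous: str = "") -> str:
    """Return second_portion + heterologous + first_portion."""
    if not 1 <= split_residue < len(parent):
        raise ValueError("split residue must leave both portions non-empty")
    first_portion = parent[:split_residue]   # residues 1..split_residue
    second_portion = parent[split_residue:]  # residues split_residue+1..end
    return second_portion + heterologous + first_portion


# Toy parent sequence, not taken from the Sequence Listing.
print(circularly_permute("ABCDEFGHIJ", 4, "GSG"))
```

Running the example prints `EFGHIJGSGABCD`: the former C-terminal residues lead, followed by the joining sequence and then the former N-terminal residues.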

In certain embodiments, the C-terminal-most amino acid of the first portion is any amino acid residue of any one of SEQ ID NOs: 2-7 chosen from residues:

- a) 342-358 (e.g., residue 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, or 358);
- b) 373-378 (e.g., residue 373, 374, 375, 376, 377, or 378);
- c) 408-413 (e.g., residue 408, 409, 410, 411, 412, or 413);
- d) 677-685 (e.g., residue 677, 678, 679, 680, 681, 682, 683, 684, or 685);
- e) 718-723 (e.g., residue 718, 719, 720, 721, 722, or 723);
- f) 771-782 (e.g., residue 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, or 782);
- g) 953-965 (e.g., residue 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, or 965);
- h) 55-65 (e.g., residue 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, or 65);
- i) 99-105 (e.g., residue 99, 100, 101, 102, 103, 104, or 105);
- j) 112-120 (e.g., residue 112, 113, 114, 115, 116, 117, 118, 119, or 120);
- k) 195-206 (e.g., residue 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, or 206);
- l) 241-250 (e.g., residue 241, 242, 243, 244, 245, 246, 247, 248, 249, or 250);
- m) 583-594 (e.g., residue 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, or 594);
- n) 877-901 (e.g., residue 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, or 901);
- o) 173-179 (e.g., residue 173, 174, 175, 176, 177, 178, or 179);
- p) 216-221 (e.g., residue 216, 217, 218, 219, 220, or 221);
- q) 265-272 (e.g., residue 265, 266, 267, 268, 269, 270, 271, or 272);
- r) 456-468 (e.g., residue 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, or 468);
- s) 476-482 (e.g., residue 476, 477, 478, 479, 480, 481, or 482);
- t) 498-513 (e.g., residue 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, or 513);
- u) 614-625 (e.g., residue 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, or 625);
- v) 977-982 (e.g., residue 977, 978, 979, 980, 981, or 982);
- w) 1007-1012 (e.g., residue 1007, 1008, 1009, 1010, 1011, or 1012);
- x) 386-397 (e.g., residue 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, or 397); or
- y) 831-844 (e.g., residue 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, or 844).
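
By way of non-limiting illustration only, the residue ranges a) through y) listed above can be held in a lookup table and queried as in the following Python sketch. The names `SPLIT_RANGES` and `split_group` are hypothetical; the range endpoints are copied from the list above and are treated as inclusive.

```python
# Illustrative sketch only: lookup of the residue ranges a) through y) listed above.
# Endpoints are inclusive; names are assumptions made for this example.
from typing import Optional

SPLIT_RANGES = {
    "a": (342, 358), "b": (373, 378), "c": (408, 413), "d": (677, 685),
    "e": (718, 723), "f": (771, 782), "g": (953, 965), "h": (55, 65),
    "i": (99, 105), "j": (112, 120), "k": (195, 206), "l": (241, 250),
    "m": (583, 594), "n": (877, 901), "o": (173, 179), "p": (216, 221),
    "q": (265, 272), "r": (456, 468), "s": (476, 482), "t": (498, 513),
    "u": (614, 625), "v": (977, 982), "w": (1007, 1012), "x": (386, 397),
    "y": (831, 844),
}


def split_group(residue: int) -> Optional[str]:
    """Return the label of the listed range containing the residue, or None."""
    for label, (low, high) in SPLIT_RANGES.items():
        if low <= residue <= high:
            return label
    return None


print(split_group(350))  # falls within range a)
print(split_group(359))  # not within any listed range
```

Because the listed ranges do not overlap, each residue maps to at most one label.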

In some embodiments, the first portion further comprises a fusion domain, the second portion comprises a fusion domain, or the first portion and the second portion comprise a fusion domain.

In certain embodiments, the fusion domain is a deaminase.

In some embodiments, the fusion domain is a UGI polypeptide and/or an NLS.

In certain embodiments, the fusion domain is a FokI nuclease domain.

In some embodiments, the FokI nuclease domain is a catalytically active FokI nuclease domain or a catalytically inactive FokI nuclease domain.

In certain embodiments, the FokI nuclease domain is fused to a deaminase.

In some embodiments, the FokI nuclease domain is fused to a UGI polypeptide and/or an NLS.

In some embodiments, the first portion comprises a catalytically active FokI nuclease domain and the second portion comprises a catalytically inactive FokI nuclease domain, or the first portion comprises a catalytically inactive FokI nuclease domain and the second portion comprises a catalytically active FokI nuclease domain.

In certain embodiments, the fusion protein comprises a catalytically inactive RuvC domain.

In some embodiments, the fusion protein comprises nickase activity.

In one aspect, the disclosure provides a method of producing a substitution in a target sequence of a target nucleic acid in a cell, the method comprising:

- contacting the cell with: (i) a Cas12i domain, (ii) an RNA guide, and (iii) a deaminase domain, or nucleic acid encoding (i), (ii), and (iii),
- wherein the target sequence comprises a nucleotide position 0, wherein nucleotide position 0 of the target strand hybridizes with the direct repeat-proximal end of the spacer sequence of the RNA guide, and a position x (wherein optionally x=20), wherein position x of the target strand hybridizes with the direct repeat-distal end of the spacer sequence of the RNA guide,
- and wherein the target sequence comprises an A or a C between positions 5-16 (e.g., between positions 7-12, e.g., between positions 8-11) on the target strand or the non-target strand,
- wherein the A is substituted to an inosine (I) (e.g., an A:T base pair is converted to an I:C, I:U, or I:A base pair) or a guanine (G), or the C is substituted to a U or T (e.g., a C:G base pair is converted to a T:A base pair).
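
By way of non-limiting illustration only, the position numbering described above can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the disclosure: the sequence is supplied as a string whose index 0 is the nucleotide pairing with the direct repeat-proximal end of the spacer, the window bounds are treated as inclusive, a single strand is examined at a time, and an edited A is recorded as G (the usual read-out of inosine) while an edited C is recorded as T. The function names are hypothetical.

```python
# Illustrative sketch only: lists A and C nucleotides inside a position window
# and applies the A-to-G or C-to-T outcome at chosen positions.
# Assumptions: index 0 is the direct repeat-proximal position; bounds are inclusive.
from typing import List, Tuple

OUTCOME = {"A": "G", "C": "T"}  # A read out as G (via inosine); C read out as T (via U)


def editable_positions(seq: str, window: Tuple[int, int] = (5, 16)) -> List[Tuple[int, str]]:
    """Return (position, base) for every A or C inside the inclusive window."""
    low, high = window
    return [
        (i, base)
        for i, base in enumerate(seq.upper())
        if low <= i <= high and base in OUTCOME
    ]


def apply_edits(seq: str, positions: List[int]) -> str:
    """Return the sequence with the A-to-G or C-to-T outcome at each position."""
    bases = list(seq.upper())
    for i in positions:
        if bases[i] not in OUTCOME:
            raise ValueError(f"position {i} is not an A or a C")
        bases[i] = OUTCOME[bases[i]]
    return "".join(bases)


# Toy 21-nucleotide sequence (positions 0 through 20), not taken from the Sequence Listing.
toy = "GGGGG" + "A" + "C" + "G" * 10 + "A" + "GGG"
hits = editable_positions(toy)
print(hits)
print(apply_edits(toy, [i for i, _ in hits]))
```

In the toy sequence, the A at position 17 lies outside the 5-16 window and is therefore not reported; passing `window=(7, 12)` or `window=(8, 11)` narrows the search to the exemplary sub-ranges.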

In one aspect, the disclosure provides a method of introducing a substitution in a target sequence of a target nucleic acid in a cell, the method comprising:

- contacting the cell with (i) a fusion protein described herein or the polypeptide system described herein, and (ii) an RNA guide, or a nucleic acid encoding (i) and (ii),
- thereby introducing the substitution.

In certain embodiments, the target sequence comprises a nucleotide position 0, wherein nucleotide position 0 of the target strand hybridizes with the direct repeat-proximal end of the spacer sequence of the RNA guide, and a position x (wherein optionally x=20), wherein position x of the target strand hybridizes with the direct repeat-distal end of the spacer sequence of the RNA guide,

- and wherein the target sequence comprises an A or a C between positions 5-16 (e.g., between positions 7-12, e.g., between positions 8-11) on the target strand or the non-target strand,
- wherein the A is substituted to an inosine (I) (e.g., an A:T base pair is converted to an I:C, I:U, or I:A base pair) or to guanine (G), or the C is substituted to a U (e.g., a C:G base pair is converted to a T:A base pair).

In some embodiments, the method introduces a C:G to T:A base pair alteration in the target sequence.

In some embodiments, the alteration occurs at one or more C:G base pairs between positions 7-12 (e.g., between positions 8-11) of the target sequence. In certain embodiments, the cell is selected from a eukaryotic cell, a mammalian cell, or a human cell. In some embodiments, the cell is in vivo. In certain embodiments, the cell is ex vivo. In some embodiments, the cell is in vitro.

In one aspect, the disclosure provides a composition comprising:

- a) the fusion protein described herein, or the polypeptide system described herein; and
- b) an RNA guide or a nucleic acid encoding the RNA guide, wherein the RNA guide comprises a nuclease binding sequence (e.g., a direct repeat sequence) and a DNA-binding sequence (e.g., a spacer sequence).

In some embodiments, the Cas12i polypeptide is a Cas12i1 polypeptide, a Cas12i2 polypeptide, a Cas12i3 polypeptide, or a Cas12i4 polypeptide, and wherein:

- (a) the Cas12i1 polypeptide comprises an amino acid sequence with at least 80% identity to SEQ ID NO: 8;
- (b) the Cas12i2 polypeptide comprises an amino acid sequence with at least 80% identity to any one of SEQ ID NOs: 2-7;
- (c) the Cas12i3 polypeptide comprises an amino acid sequence with at least 80% identity to SEQ ID NO: 11; and
- (d) the Cas12i4 polypeptide comprises an amino acid sequence with at least 80% identity to SEQ ID NO: 9 or SEQ ID NO: 10.