PROGRAMMABLE INSERTION APPROACHES VIA REVERSE TRANSCRIPTASE RECRUITMENT

BACKGROUND

Genome editing based on the RNA-guided DNA targeting principle of CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats-CRISPR associated proteins) has been widely exploited and has become a powerful tool for a wide variety of applications, including applications that use additional proteins to confer extra functional properties. However, there exists a need for strategies to recruit these additional proteins to the CRISPR system in the genome.

SUMMARY

In one aspect, the disclosure provides a complex for genome editing comprising: (i) an RNA-guided nuclease; (ii) a fusion protein comprising a reverse transcriptase domain linked to a nucleic acid binding protein; and (iii) a guide RNA (gRNA) comprising a 5′ end and a 3′ end and comprising at least one protein-recruiting stem-loop nucleic acid sequence, wherein the protein-recruiting stem-loop nucleic acid sequence binds to the nucleic acid binding protein.

In certain embodiments, the nucleic acid binding protein is MS2 coat protein (MCP) or PP7 coat protein.

In certain embodiments, the protein-recruiting stem-loop nucleic acid sequence is an MS2 sequence or a PP7 stem-loop sequence. In certain embodiments, the MS2 sequence comprises a nucleic acid sequence of ACAUGAGGAUCACCCAUGU (SEQ ID NO: 54).
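
As an illustration of why the recited MS2 sequence can fold into a stem-loop, the following minimal sketch counts how many bases at the 5′ end can form Watson-Crick pairs with bases at the 3′ end, working inward. The function names are hypothetical and are not part of the disclosure; only the sequence itself is taken from the text above. The sketch checks simple complementarity only and does not predict RNA secondary structure.

```python
# Sketch only: checks end-to-end complementarity, not a full folding prediction.
MS2 = "ACAUGAGGAUCACCCAUGU"  # SEQ ID NO: 54, as recited above

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def outer_stem_length(rna: str) -> int:
    """Count contiguous Watson-Crick pairs between the 5' and 3' ends, moving inward."""
    pairs = 0
    while pairs < len(rna) // 2 and COMPLEMENT[rna[pairs]] == rna[-1 - pairs]:
        pairs += 1
    return pairs


if __name__ == "__main__":
    print(outer_stem_length(MS2))                 # prints 5
    print(reverse_complement(MS2[:5]), MS2[-5:])  # prints CAUGU CAUGU
```

The five outermost bases on each side are reverse complements of one another, which is consistent with the sequence forming a base-paired stem that closes a loop.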

In certain embodiments, the gRNA comprises a primer binding site (PBS), a reverse transcriptase (RT) template sequence, and an integration site sequence.

In certain embodiments, the gRNA comprises 1, 2, 3, 4, 5, or 6 protein-recruiting stem-loop nucleic acid sequences.

In certain embodiments, the gRNA comprises 2 or more distinct protein-recruiting stem-loop nucleic acid sequences.

In certain embodiments, the protein-recruiting stem-loop nucleic acid sequences are identical.

In certain embodiments, the protein-recruiting stem-loop nucleic acid sequence is present at the 5′ end of the gRNA, the 3′ end of the gRNA, or both. In certain embodiments, the gRNA comprises two protein-recruiting stem-loop nucleic acid sequences present at the 5′ end of the gRNA, the 3′ end of the gRNA, or both.
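
The placement options above (5′ end, 3′ end, or both, with 1 to 6 stem-loops in total) can be sketched as simple string assembly. This is a minimal sketch under stated assumptions: the function `add_stem_loops` and the placeholder core gRNA are hypothetical, the stem-loops are joined directly with no spacer, and the 1-to-6 limit is applied to the total count. None of these choices is specified by the disclosure.

```python
# Sketch only: direct concatenation with no spacers is an assumption.
MS2 = "ACAUGAGGAUCACCCAUGU"  # SEQ ID NO: 54, as recited above


def add_stem_loops(core_grna: str, n_five_prime: int = 0,
                   n_three_prime: int = 0, stem_loop: str = MS2) -> str:
    """Attach copies of a protein-recruiting stem-loop to either or both gRNA ends."""
    if n_five_prime < 0 or n_three_prime < 0:
        raise ValueError("stem-loop counts must be non-negative")
    total = n_five_prime + n_three_prime
    if not 1 <= total <= 6:
        raise ValueError("expected 1 to 6 stem-loops in total")
    return stem_loop * n_five_prime + core_grna + stem_loop * n_three_prime


if __name__ == "__main__":
    core = "NNNNNNNNNN"  # hypothetical placeholder, not a real gRNA sequence
    print(add_stem_loops(core, n_three_prime=1))
    print(add_stem_loops(core, n_five_prime=1, n_three_prime=1))
```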

In certain embodiments, the complex comprises one or more additional gRNAs.

In certain embodiments, the one or more additional gRNAs comprise at least one protein-recruiting stem-loop nucleic acid sequence.

In certain embodiments, the complex comprises two or more gRNAs, each gRNA targeting a different desired location in a cell genome.

In certain embodiments, the RNA-guided nuclease comprises a CRISPR nuclease. In certain embodiments, the CRISPR nuclease is Cas9 or Cas12. In certain embodiments, the CRISPR nuclease comprises nickase activity. In certain embodiments, the CRISPR nuclease is selected from Cas9-D10A, Cas9-H840A, and Cas12a/b nickase.

In certain embodiments, the reverse transcriptase domain is selected from the group consisting of Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase domain, reverse transcription xenopolymerase (RTX), avian myeloblastosis virus reverse transcriptase (AMV-RT), and Eubacterium rectale maturase RT (MarathonRT).

In certain embodiments, the reverse transcriptase domain comprises a mutation relative to the wild-type sequence or contains a stabilization domain, such as the DNA-binding Sto7d protein from Sulfolobus tokodaii.

In certain embodiments, the M-MLV reverse transcriptase domain comprises one or more mutations selected from the group consisting of D200N, T306K, W313F, T330P, L603W, and L139P.
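
The mutation labels above follow the common convention of wild-type residue, 1-based position, then replacement residue. The following sketch parses such labels and applies one to a protein string. The names `Substitution`, `parse_substitution`, and `apply_substitution` are hypothetical, and the three-residue sequence in the usage example is a toy placeholder, not the M-MLV sequence.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_LABEL = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class Substitution(NamedTuple):
    wild_type: str
    position: int  # 1-based residue number
    mutant: str


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'D200N' into wild-type residue, position, and mutant residue."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    return Substitution(wild_type, int(position), mutant)


def apply_substitution(protein: str, sub: Substitution) -> str:
    """Return the protein with the substitution applied, checking the wild-type residue first."""
    index = sub.position - 1
    if index >= len(protein) or protein[index] != sub.wild_type:
        raise ValueError(f"residue {sub.position} is not {sub.wild_type}")
    return protein[:index] + sub.mutant + protein[index + 1:]


if __name__ == "__main__":
    labels = ["D200N", "T306K", "W313F", "T330P", "L603W", "L139P"]
    for label in labels:
        print(parse_substitution(label))
    # Toy sequence for illustration only.
    print(apply_substitution("MDK", parse_substitution("D2N")))  # prints MNK
```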

In certain embodiments, the reverse transcriptase domain is linked to the nucleic acid binding protein via a linker. In certain embodiments, the linker is cleavable. In certain embodiments, the linker is non-cleavable. In certain embodiments, the complex comprises any one or more of the linker sequences recited in Table 4.

In certain embodiments, one or both of the RNA-guided nuclease and the fusion protein are linked to an integration enzyme or fragment thereof (e.g., an integrase or fragment thereof).

In certain embodiments, the RNA-guided nuclease is linked to an integration enzyme or fragment thereof (e.g., an integrase or fragment thereof).

In certain embodiments, the fusion protein is linked to an integration enzyme or fragment thereof (e.g., an integrase or fragment thereof).

In certain embodiments, the integration enzyme is selected from the group consisting of Cre, Dre, Vika, Bxb1, BceINT, φC31, RDF, FLP, φBT1, R1, R2, R3, R4, R5, TP901-1, A118, φFC1, φC1, MR11, TG1, φ370.1, Wβ, BL3, SPBc, K38, Peaches, Veracruz, Rebeuca, Theia, Benedict, KSSJEB, PattyP, Doom, Scowl, Lockley, Switzer, Bob3, Troube, Abrogate, Anglerfish, Sarfire, SkiPole, ConceptII, Museum, Severus, Airmid, Hinder, ICleared, Sheen, Mundrea, BxZ2, φRV, retrotransposases encoded by R2, L1, Tol2, Tc1, Tc3, Mariner (Himar 1), Mariner (mos 1), and Minos, and any mutants thereof.

In certain embodiments, the integration enzyme is Bxb1 or a mutant thereof.

In certain embodiments, the integration enzyme is BceINT or a mutant thereof.

In certain embodiments, the integration enzyme comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in any one of SEQ ID NOs: 1-16.
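
For orientation, a percent-identity threshold of this kind can be evaluated on two sequences that have already been aligned. The sketch below is one simple convention, assumed here for illustration: identical residues divided by the number of aligned columns, with `-` marking a gap. The disclosure may define identity differently (for example, with a particular alignment algorithm or denominator), and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumes the inputs are pre-aligned to equal length, with '-' as the gap character.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not SEQ ID NOs: 1-16.
    identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQK")
    print(identity, identity >= 90.0)  # prints 90.0 True
```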

In certain embodiments, the integration enzyme recognizes an integration site.

In certain embodiments, the integration site is an attB site, an attP site, an attL site, an attR site, a lox71 site, a Vox site, or a FRT site.

In certain embodiments, the integration enzyme recognizes nucleic acid attachment sites attB and attP, other recognition site pairs, or any pseudosites in a human genome.

In certain embodiments, the attB and/or attP nucleic acid sequence is between 12 and 60 nucleotides in length or between 18 and 50 nucleotides in length.

In certain embodiments, the attB and/or attP nucleic acid sequence comprises one or more truncations. In certain embodiments, the attB and/or attP nucleic acid sequence is truncated by 1 to 32 nucleotides from one or both of the 5′ end and 3′ end.
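
The length ranges and truncation limits above can be expressed as two small checks. This is a minimal sketch: the function names are hypothetical, the 50-nucleotide site in the usage example is an arbitrary placeholder rather than a real attB or attP sequence, and the default bounds correspond to the broader 12-to-60-nucleotide range.

```python
def truncate_site(site: str, five_prime: int = 0, three_prime: int = 0) -> str:
    """Remove nucleotides from the 5' end, the 3' end, or both.

    Each truncated end must lose between 1 and 32 nucleotides; an end left at 0 is untouched.
    """
    for removed in (five_prime, three_prime):
        if removed != 0 and not 1 <= removed <= 32:
            raise ValueError("each truncation must remove 1 to 32 nucleotides")
    if five_prime == 0 and three_prime == 0:
        raise ValueError("at least one end must be truncated")
    if five_prime + three_prime >= len(site):
        raise ValueError("truncation would remove the entire site")
    return site[five_prime:len(site) - three_prime]


def length_in_range(site: str, low: int = 12, high: int = 60) -> bool:
    """Check whether the site length falls within the inclusive range."""
    return low <= len(site) <= high


if __name__ == "__main__":
    placeholder_site = "ACGT" * 12 + "AC"  # hypothetical 50-nt sequence
    shortened = truncate_site(placeholder_site, five_prime=4, three_prime=6)
    print(len(shortened), length_in_range(shortened))          # prints 40 True
    print(length_in_range(shortened, low=18, high=50))         # prints True
```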

In certain embodiments, the integration enzyme binds to any one of the attB nucleic acid sequences selected from the group consisting of SEQ ID NOs: 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, and 47. In certain embodiments, the integration enzyme binds to any one of the attP nucleic acid sequences selected from the group consisting of SEQ ID NOs: 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, and 48.

In certain embodiments:

- (a) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 1, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 17 and the attP nucleic acid set forth in SEQ ID NO: 18;
- (b) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 2, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 19 and the attP nucleic acid set forth in SEQ ID NO: 20;
- (c) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 3, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 21 and the attP nucleic acid set forth in SEQ ID NO: 22;
- (d) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 4, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 23 and the attP nucleic acid set forth in SEQ ID NO: 24;
- (e) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 5, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 25 and the attP nucleic acid set forth in SEQ ID NO: 26;
- (f) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 6, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 27 and the attP nucleic acid set forth in SEQ ID NO: 28;
- (g) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 7, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 29 and the attP nucleic acid set forth in SEQ ID NO: 30;
- (h) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 8, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 31 and the attP nucleic acid set forth in SEQ ID NO: 32;
- (i) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 9, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 33 and the attP nucleic acid set forth in SEQ ID NO: 34;
- (j) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 10, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 35 and the attP nucleic acid set forth in SEQ ID NO: 36;
- (k) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 11, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 37 and the attP nucleic acid set forth in SEQ ID NO: 38;
- (l) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 12, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 39 and the attP nucleic acid set forth in SEQ ID NO: 40;
- (m) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 13, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 41 and the attP nucleic acid set forth in SEQ ID NO: 42;
- (n) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 14, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 43 and the attP nucleic acid set forth in SEQ ID NO: 44;
- (o) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 15, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 45 and the attP nucleic acid set forth in SEQ ID NO: 46; or
- (p) the integrase or fragment thereof comprises an amino acid sequence that is at least 90% identical to an amino acid sequence set forth in SEQ ID NO: 16, wherein the integrase binds to the attB nucleic acid set forth in SEQ ID NO: 47 and the attP nucleic acid set forth in SEQ ID NO: 48.
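
The sixteen integrase pairings recited above follow a regular numbering pattern: the integrase of SEQ ID NO: n is paired with the attB site of SEQ ID NO: 15 + 2n and the attP site of SEQ ID NO: 16 + 2n. The following sketch reproduces that pairing as a lookup; the function name is hypothetical, and the arithmetic is simply an observation about the numbering recited above.

```python
from typing import Dict, Tuple


def att_site_seq_ids(integrase_seq_id: int) -> Tuple[int, int]:
    """Return the (attB, attP) SEQ ID NOs paired with an integrase SEQ ID NO from 1 to 16."""
    if not 1 <= integrase_seq_id <= 16:
        raise ValueError("integrase SEQ ID NO must be between 1 and 16")
    att_b = 15 + 2 * integrase_seq_id
    return att_b, att_b + 1


# Full pairing table: integrase SEQ ID NO -> (attB SEQ ID NO, attP SEQ ID NO)
PAIRINGS: Dict[int, Tuple[int, int]] = {n: att_site_seq_ids(n) for n in range(1, 17)}

if __name__ == "__main__":
    for integrase, (att_b, att_p) in PAIRINGS.items():
        print(f"integrase SEQ ID NO: {integrase} -> attB {att_b}, attP {att_p}")
```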

In certain embodiments, any one of the attP nucleic acid sequences selected from the group consisting of SEQ ID NOs: 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, and 48 is truncated by 1 to 32 nucleotides from one or both of the 5′ end and 3′ end.

In certain embodiments, the RNA-guided nuclease interacts with a gRNA comprising a primer binding sequence linked to an integration sequence.

In certain embodiments, the gRNA interacts with the RNA-guided nuclease and targets a desired location in a cell genome.

In certain embodiments, the RNA-guided nuclease nicks a strand of the cell genome and the reverse transcriptase domain incorporates the integration sequence of the gRNA into the nicked site, thereby providing the integration site at the desired location of the cell genome.

In certain embodiments, the integrase is capable of binding the integration sequence.

In one aspect, the disclosure provides a polynucleotide comprising a nucleic acid sequence encoding the RNA-guided nuclease described above.

In one aspect, the disclosure provides a polynucleotide comprising a nucleic acid sequence encoding the gRNA described above.

In one aspect, the disclosure provides a polynucleotide comprising a nucleic acid sequence encoding the fusion protein described above.

In one aspect, the disclosure provides a vector comprising any of the polynucleotides described above.

In one aspect, the disclosure provides a host cell comprising the vector described above.

In one aspect, the disclosure provides a method of site-specific integration of a nucleic acid into a cell genome, the method comprising:

- (a) incorporating an integration site at a desired location in the cell genome by introducing into the cell:
  - i. an RNA-guided nuclease comprising a nickase activity;
  - ii. a fusion protein comprising a reverse transcriptase domain linked to a nucleic acid binding protein; and
  - iii. a guide RNA (gRNA) comprising a 5′ end and a 3′ end and comprising a primer binding sequence linked to an integration sequence and at least one protein-recruiting stem-loop nucleic acid sequence, wherein the protein-recruiting stem-loop nucleic acid sequence binds to the nucleic acid binding protein, wherein the gRNA interacts with the RNA-guided nuclease and targets the desired location in the cell genome, wherein the RNA-guided nuclease nicks a strand of the cell genome and the reverse transcriptase domain incorporates the integration sequence of the gRNA into the nicked site, thereby providing the integration site at the desired location of the cell genome; and
- (b) integrating the nucleic acid into the cell genome by introducing into the cell:
  - i. a DNA or RNA strand comprising the nucleic acid linked to a sequence that is complementary to or associated with the integration site; and
  - ii. an integration enzyme or fragment thereof, wherein the integration enzyme or fragment thereof incorporates the nucleic acid into the cell genome at the integration site by integration, recombination, or reverse transcription of the sequence that is complementary to or associated with the integration site, thereby introducing the nucleic acid into the desired location of the cell genome.
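
The two steps above can be pictured with a string-level toy model: step (a) writes an integration site at the nicked position, and step (b) places the nucleic acid at that site. This is only a sketch under loud assumptions. The function names and all sequences are hypothetical, and splitting the site into two halves that flank the inserted nucleic acid is a simplification that does not model the actual enzymatic products of integration, recombination, or reverse transcription.

```python
def write_integration_site(genome: str, nick_position: int, integration_site: str) -> str:
    """Step (a), toy version: insert the integration site at the nicked position."""
    if not 0 <= nick_position <= len(genome):
        raise ValueError("nick position lies outside the genome")
    return genome[:nick_position] + integration_site + genome[nick_position:]


def integrate_cargo(genome: str, integration_site: str, cargo: str) -> str:
    """Step (b), toy version: place the cargo inside the first copy of the integration site."""
    index = genome.find(integration_site)
    if index == -1:
        raise ValueError("integration site not present in genome")
    half = len(integration_site) // 2
    left, right = integration_site[:half], integration_site[half:]
    end = index + len(integration_site)
    return genome[:index] + left + cargo + right + genome[end:]


if __name__ == "__main__":
    genome = "aaaaaccccc"   # hypothetical genome; lower case marks original sequence
    site = "GGTT"           # hypothetical integration site
    with_site = write_integration_site(genome, nick_position=5, integration_site=site)
    print(with_site)                                   # prints aaaaaGGTTccccc
    print(integrate_cargo(with_site, site, "[CARGO]")) # prints aaaaaGG[CARGO]TTccccc
```

The point of the model is the ordering: the cargo can only be placed once step (a) has installed the site, which is why `integrate_cargo` fails on a genome that lacks it.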

In certain embodiments, the nucleic acid binding protein is MS2 coat protein (MCP) or PP7 coat protein.

In certain embodiments, the protein-recruiting stem-loop nucleic acid sequence is an MS2 sequence or a PP7 stem-loop sequence.

In certain embodiments, the MS2 sequence comprises a nucleic acid sequence of ACAUGAGGAUCACCCAUGU (SEQ ID NO: 54).

In certain embodiments, the gRNA comprises 1, 2, 3, 4, 5, or 6 protein-recruiting stem-loop nucleic acid sequences.

In certain embodiments, the gRNA comprises 2 or more distinct protein-recruiting stem-loop nucleic acid sequences.

In certain embodiments, the protein-recruiting stem-loop nucleic acid sequences are identical.

In certain embodiments, the method comprises one or more additional gRNAs. In certain embodiments, the one or more additional gRNAs comprise at least one protein-recruiting stem-loop nucleic acid sequence.

In certain embodiments, the M-MLV reverse transcriptase domain comprises one or more mutations selected from the group consisting of D200N, T306K, W313F, T330P, L603W, and L139P.

In certain embodiments, the reverse transcriptase domain is linked to the nucleic acid binding protein via a linker. In certain embodiments, the linker is cleavable. In certain embodiments, the linker is non-cleavable. In certain embodiments, the linker comprises any one or more of the linker sequences recited in Table 4.