CFTR-MODULATING COMPOSITIONS AND METHODS

Abstract
The disclosure provides, e.g., compositions, systems, and methods for targeting, editing, modifying, or manipulating a host cell's genome at one or more locations in a DNA sequence in a cell, tissue, or subject. Gene modifying systems for treating cystic fibrosis, e.g., in subjects having a mutation resulting in F508del, are described.
Description
SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in XML format compliant with WIPO Standard ST.26 and is hereby incorporated by reference in its entirety. Said XML copy, created on Sep. 15, 2023, is named V2065-703420FT_SL.xml and is 33,706,554 bytes in size.


BACKGROUND

Integration of a nucleic acid of interest into a genome occurs at low frequency and with little site specificity, in the absence of a specialized protein to promote the insertion event. Some existing approaches, like CRISPR/Cas9, are more suited for small edits that rely on host repair pathways, and are less effective at integrating longer sequences. Other existing approaches, like Cre/loxP, require a first step of inserting a loxP site into the genome and then a second step of inserting a sequence of interest into the loxP site. There is a need in the art for improved compositions (e.g., proteins and nucleic acids) and methods for inserting, altering, or deleting sequences of interest in a genome.


Cystic fibrosis (CF) is caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene, which result in either no CFTR protein being made or a malformed CFTR protein that cannot perform its key function in the cell. CFTR's function is to create channels on the cell surface to allow the movement of chloride in and out of the cell. When the CFTR protein functions properly, the balance of chloride and fluid at the cell surface remains normal. If the CFTR protein does not function properly, the balance of chloride and fluids is disrupted, causing mucus in various organs to become thick and sticky. This leads to lung infections and, eventually, respiratory failure in the lungs, poor digestion, and problems in the reproductive system.


More than 1,700 different mutations in the CFTR gene have been identified that can cause CF. One classification system groups mutations by the problems that they cause in the production of the CFTR protein: protein production mutations (Class 1); protein processing mutations (Class 2); gating mutations (Class 3); conduction mutations (Class 4); and insufficient protein mutations (Class 5).


In Class 2 mutations, the CFTR protein is created but misfolds, which keeps it from moving to the cell surface. An exemplary Class 2 mutation is F508del.


There is no cure for cystic fibrosis, but current treatment options can ease symptoms, reduce complications, and improve quality of life. Close monitoring and early, aggressive intervention are recommended to slow the progression of CF and extend the life expectancy of cystic fibrosis patients. Managing cystic fibrosis is complex and includes preventing and controlling infections that occur in the lungs; removing and loosening mucus from the lungs; treating and preventing intestinal blockage; and providing adequate nutrition. Accordingly, there is a need for new and more effective treatments for cystic fibrosis.


SUMMARY OF THE INVENTION

This disclosure relates to novel compositions, systems and methods for altering a genome at one or more locations in a host cell, tissue or subject, in vivo or in vitro. In particular, the invention features compositions, systems and methods for inserting, altering, or deleting sequences of interest in a host genome. For example, the disclosure provides systems that are capable of modulating (e.g., inserting, altering, or deleting sequences of interest) the cystic fibrosis transmembrane conductance regulator (CFTR) gene activity and methods of treating cystic fibrosis by administering one or more such systems to alter a genomic sequence at one or more nucleotides to correct a pathogenic mutation causing cystic fibrosis.


In one aspect, the disclosure relates to a system for modifying DNA to correct a human CFTR gene mutation causing cystic fibrosis comprising (a) a nucleic acid encoding a gene modifying polypeptide capable of target primed reverse transcription, the polypeptide comprising (i) a reverse transcriptase domain and (ii) a Cas9 nickase that binds DNA and has endonuclease activity, and (b) a template RNA comprising (i) a gRNA spacer that is complementary to a first portion of the human CFTR gene, (ii) a gRNA scaffold that binds the polypeptide, (iii) a heterologous object sequence comprising a mutation region to correct the mutation, and (iv) a primer binding site (PBS) sequence comprising at least 3, 4, 5, 6, 7, or 8 bases of 100% homology to a target DNA strand at the 3′ end of the template RNA. The CFTR gene may comprise a F508 deletion (F508del) mutation. The template RNA sequence may comprise a sequence described herein, e.g., in Table 1, 3, 4, E2, E2A, E3, or E3A, and optionally G3 or G3A.
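The four-part layout of the template RNA described above can be illustrated with a minimal sketch. It assumes the components are simply concatenated from 5′ to 3′ in the order (i) to (iv). The class name `TemplateRNA` and its field names are hypothetical, and the scaffold, heterologous object sequence, and PBS strings below are arbitrary placeholders rather than sequences from any table of this disclosure; only the spacer is taken from the enumerated embodiments (SEQ ID NO: 19587).

```python
from dataclasses import dataclass

RNA_ALPHABET = set("ACGU")


@dataclass(frozen=True)
class TemplateRNA:
    """Hypothetical container for the four template RNA components."""
    spacer: str           # (i) gRNA spacer, 5' end
    scaffold: str         # (ii) gRNA scaffold
    object_sequence: str  # (iii) heterologous object sequence
    pbs: str              # (iv) primer binding site, 3' end

    def __post_init__(self):
        # Reject empty components and any non-RNA characters.
        for name in ("spacer", "scaffold", "object_sequence", "pbs"):
            seq = getattr(self, name)
            if not seq or set(seq) - RNA_ALPHABET:
                raise ValueError(f"{name} must be a non-empty string over A, C, G, U")

    def sequence(self) -> str:
        """Concatenate the components from 5' to 3'."""
        return self.spacer + self.scaffold + self.object_sequence + self.pbs


example = TemplateRNA(
    spacer="ACCAUUAAAGAAAAUAUCAU",  # SEQ ID NO: 19587
    scaffold="GUUUUAGAGC",          # placeholder, not a Table 12 scaffold
    object_sequence="AAACCC",       # placeholder
    pbs="GGGUUUAA",                 # placeholder, 8 bases
)
print(len(example.sequence()))
```

Keeping the components as separate fields, rather than one flat string, makes it straightforward to check length or identity constraints on each part independently.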


The gRNA spacer may comprise at least 15 bases of 100% homology to the target DNA at the 5′ end of the template RNA. The template RNA may further comprise a PBS sequence comprising at least 5 bases of at least 80% homology to the target DNA strand. The template RNA may comprise one or more chemical modifications.
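A percentage threshold such as "at least 80% homology" can be made concrete with a short sketch. The text does not specify an alignment method here, so this example assumes a simple ungapped, position-by-position comparison of two equal-length sequences, with RNA U treated as equivalent to DNA T. The function names `percent_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped position-wise identity (in percent) of two equal-length sequences.

    Assumption: U and T are treated as the same base so that an RNA
    sequence can be compared against a DNA sequence.
    """
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_threshold(a: str, b: str, min_length: int = 5, min_identity: float = 80.0) -> bool:
    """Check a length floor and an identity floor together."""
    return len(a) >= min_length and percent_identity(a, b) >= min_identity


print(percent_identity("ACGUA", "ACGTC"))
```

A gapped alignment would give different values for sequences with insertions or deletions; the ungapped form is used here only because it is the simplest reading of a percentage over a fixed number of bases.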


The domains of the gene modifying polypeptide may be joined by a peptide linker. The polypeptide may comprise one or more peptide linkers. The gene modifying polypeptide may further comprise a nuclear localization signal. The polypeptide may comprise more than one nuclear localization signal, e.g., multiple adjacent nuclear localization signals or one or more nuclear localization signals in different regions of the polypeptide, e.g., one or more nuclear localization signals in the N-terminus of the polypeptide and one or more nuclear localization signals in the C-terminus of the polypeptide. The nucleic acid encoding the gene modifying polypeptide may encode one or more intein domains.


Introduction of the system into a target cell may result in insertion of at least 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 500, or 1000 base pairs of exogenous DNA. Introduction of the system into a target cell may result in deletion, wherein the deletion is less than 2, 3, 4, 5, 10, 50, or 100 base pairs of genomic DNA upstream or downstream of the insertion. Introduction of the system into a target cell may result in substitution, e.g., substitution of 1, 2, or 3 nucleotides, e.g., consecutive nucleotides.
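The three outcomes listed above (insertion, deletion, substitution) can be told apart with a very simple comparison. The sketch below is illustrative only: it assumes a single edit event, attributes any length difference entirely to one insertion or deletion, and counts mismatches position by position when the lengths are equal. The helper name `classify_edit` is hypothetical.

```python
from typing import Tuple


def classify_edit(reference: str, edited: str) -> Tuple[str, int]:
    """Return (edit_type, size) for a single simple edit.

    Simplifying assumption: a length difference is attributed to one
    insertion or deletion; equal-length sequences are compared base by base.
    """
    if len(edited) > len(reference):
        return ("insertion", len(edited) - len(reference))
    if len(edited) < len(reference):
        return ("deletion", len(reference) - len(edited))
    mismatches = sum(r != e for r, e in zip(reference, edited))
    return ("substitution", mismatches) if mismatches else ("unchanged", 0)


print(classify_edit("ACGT", "ACGGGT"))
```

Real editing outcomes can combine several events at one site, in which case a proper sequence alignment would be needed instead of this length-based heuristic.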


The heterologous object sequence may be at least 5, 10, 25, 50, 100, 150, 200, 250, 300, 400, 500, 600, or 700 base pairs.


In one aspect, the disclosure relates to a pharmaceutical composition comprising the system described above and a pharmaceutically acceptable excipient or carrier, wherein the pharmaceutically acceptable excipient or carrier is selected from the group consisting of a plasmid vector, a viral vector, a vesicle, and a lipid nanoparticle. In one aspect, the disclosure relates to a pharmaceutical composition comprising the system described above and multiple pharmaceutically acceptable excipients or carriers, wherein the pharmaceutically acceptable excipients or carriers are selected from the group consisting of a plasmid vector, a viral vector, a vesicle, and a lipid nanoparticle, e.g., where the system described above is delivered by two distinct excipients or carriers, e.g., two lipid nanoparticles, two viral vectors, or one lipid nanoparticle and one viral vector. The viral vector may be an adeno-associated virus (AAV).


In one aspect, the disclosure relates to a host cell (e.g., a mammalian cell, e.g., a human cell) comprising the system described above.


In one aspect, the disclosure relates to a method of correcting a mutation (e.g., a F508del mutation) in the human CFTR gene in a cell, tissue or subject, the method comprising administering the system described above to the cell, tissue or subject, wherein optionally the correction of the mutant CFTR gene results in an amino acid insertion of F508 (reversing the pathogenic deletion).
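How a three-nucleotide insertion restores F508 can be shown with a small worked example. The nucleotide context used here comes from general knowledge of the CFTR coding sequence (the common F508del allele is a deletion of CTT spanning codons 507 and 508), not from the text above, and should be checked against a reference sequence before any real use. The codon table covers only the codons that appear in this fragment, and the function names are hypothetical.

```python
# Partial codon table: only the codons needed for this fragment.
CODONS = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}


def translate(dna: str) -> str:
    """Translate a DNA fragment whose length is a multiple of three."""
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Codons 506-510 of wild-type CFTR (assumed context): Ile Ile Phe Gly Val.
wild_type = "ATCATCTTTGGTGTT"

# F508del: remove the three nucleotides CTT at fragment positions 5-7.
f508del = wild_type[:5] + wild_type[8:]

# Correction: re-insert CTT at the same position.
corrected = f508del[:5] + "CTT" + f508del[5:]

print(translate(wild_type), translate(f508del), translate(corrected))
```

In this fragment the deletion leaves codon 507 still encoding isoleucine (ATC becomes ATT), so the net effect at the protein level is the loss of a single phenylalanine, and re-inserting the three nucleotides restores the wild-type reading.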


In some embodiments, the system is capable of correcting more than one mutation in the CFTR gene. For example, where the CFTR gene has one or more mutations within a particular region, the system can correct mutations within the CFTR gene occurring anywhere within the region of the gene that corresponds to the RT template sequence of the template RNA.


The system may be introduced in vivo, in vitro, ex vivo, or in situ. The nucleic acid of (a) may be integrated into the genome of the host cell. In some embodiments, the nucleic acid of (a) is not integrated into the genome of the host cell. In some embodiments, the heterologous object sequence is inserted at only one target site in the host cell genome. The heterologous object sequence may be inserted at two or more target sites in the host cell genome, e.g., at the same corresponding site in two homologous chromosomes or at two different sites on the same or different chromosomes. The heterologous object sequence may encode a mammalian polypeptide, or a fragment or a variant thereof. The components of the system may be delivered on 1, 2, 3, 4, or more distinct nucleic acid molecules. The system may be introduced into a host cell by electroporation or by using at least one vehicle selected from a plasmid vector, a viral vector, a vesicle, and a lipid nanoparticle.


Features of the compositions or methods can include one or more of the following enumerated embodiments.


ENUMERATED EMBODIMENTS

1. A template RNA comprising, e.g., from 5′ to 3′:

    • (i) a gRNA spacer that is complementary to a first portion of the human CFTR gene, wherein the gRNA spacer has a sequence comprising the core nucleotides of a gRNA spacer sequence of Table 1, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the gRNA spacer (e.g., comprises one or more flanking nucleotides that are adjacent to the core nucleotides), or wherein the gRNA spacer has a sequence of a spacer chosen from Table E2, E2A, E3, or E3A, or a sequence having 1, 2, or 3 substitutions thereto;
    • (ii) a gRNA scaffold that binds a gene modifying polypeptide (e.g., binds the Cas domain of the gene modifying polypeptide),
    • (iii) a heterologous object sequence comprising a mutation region to introduce a mutation into a second portion of the human CFTR gene (wherein optionally the heterologous object sequence comprises, from 5′ to 3′, a post-edit homology region, a mutation region, and a pre-edit homology region), and
    • (iv) a primer binding site (PBS) sequence comprising at least 3, 4, 5, 6, 7, or 8 bases with 100% identity to a third portion of the human CFTR gene.


      2. The template RNA of embodiment 1, wherein the heterologous object sequence comprises the core nucleotides of an RT template sequence from Table 3, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the RT template sequence, or wherein the heterologous object sequence comprises a sequence of an RT template sequence from Table E3 or E3A, or a sequence having 1, 2, or 3 substitutions thereto.


      3. The template RNA of embodiment 1, wherein the heterologous object sequence comprises the core nucleotides of the RT template sequence of Table 3 that corresponds to the gRNA spacer sequence, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the RT template sequence (e.g., comprises one or more flanking nucleotides that are adjacent to the core nucleotides), or wherein the heterologous object sequence comprises a sequence of an RT template sequence from Table E3 or E3A, or a sequence having 1, 2, or 3 substitutions thereto.


      4. The template RNA according to any one of embodiments 1-3 wherein the PBS sequence has a sequence comprising the core nucleotides of the PBS sequence from the same row of Table 3 as the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 5′ end of the flanking nucleotides of the PBS sequence (e.g., comprises one or more flanking nucleotides that are adjacent to the core nucleotides).


      5. The template RNA according to any one of embodiments 1-3, wherein the PBS sequence has a sequence comprising the core nucleotides of a PBS sequence of Table 3 that corresponds to the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto, the gRNA spacer sequence, or both, and optionally comprises one or more consecutive nucleotides starting with the 5′ end of the flanking nucleotides of the PBS sequence, or wherein the PBS sequence comprises a PBS sequence from Table E3 or E3A that corresponds to the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto, the gRNA spacer sequence, or both.


      6. The template RNA according to any of embodiments 1-5, wherein the gRNA scaffold comprises a sequence of a gRNA scaffold of Table 12, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      7. The template RNA according to any of embodiments 1-5, wherein the gRNA scaffold comprises a sequence of a gRNA scaffold of Table 12 that corresponds to the RT template sequence, the gRNA spacer sequence, or both, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      8. A template RNA comprising, e.g., from 5′ to 3′:
    • (i) a gRNA spacer that is complementary to a first portion of the human CFTR gene,
    • (ii) a gRNA scaffold that binds a gene modifying polypeptide (e.g., binds the Cas domain of the gene modifying polypeptide),
    • (iii) a heterologous object sequence comprising a mutation region to introduce a mutation into a second portion of the human CFTR gene, wherein the heterologous object sequence comprises the core nucleotides of an RT template sequence of Table 3, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the RT template sequence, or wherein the heterologous object sequence comprises an RT template sequence of Table E3 or E3A, or a sequence having 1, 2, or 3 substitutions thereto; and
    • (iv) a PBS sequence comprising at least 3, 4, 5, 6, 7, or 8 bases of 100% identity to a third portion of the human CFTR gene.


      9. The template RNA of embodiment 8, wherein the gRNA spacer comprises the core nucleotides of a gRNA spacer sequence of Table 1, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the gRNA spacer sequence, or wherein the gRNA spacer comprises a gRNA spacer sequence of Table E2, E2A, E3, or E3A, or a sequence having 1, 2, or 3 substitutions thereto.


      10. The template RNA of any one of embodiments 1-9, wherein the gRNA spacer comprises ACCAUUAAAGAAAAUAUCAU (SEQ ID NO: 19587).


      11. The template RNA of embodiment 9 or 10, wherein the heterologous object sequence comprises the core nucleotides of the gRNA spacer sequence of Table 1 that corresponds to the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the gRNA spacer sequence, or wherein the heterologous object sequence comprises the nucleotides of the gRNA spacer sequence of Table E3 or E3A that corresponds to the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto.


      12. The template RNA according to any one of embodiments 8-11, wherein the PBS sequence has a sequence comprising the core nucleotides of the PBS sequence from the same row of Table 3 as the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 5′ end of the flanking nucleotides of the PBS sequence.


      13. The template RNA according to any one of embodiments 8-11, wherein the PBS sequence has a sequence comprising the core nucleotides of a PBS sequence of Table 3 that corresponds to the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto, the gRNA spacer sequence, or both, and optionally comprises one or more consecutive nucleotides starting with the 5′ end of the flanking nucleotides of the PBS sequence, or wherein the PBS sequence has a sequence comprising a PBS sequence of Table E3 or E3A that corresponds to the RT template sequence, the gRNA spacer sequence, or both.


      14. The template RNA according to any of embodiments 8-13, wherein the gRNA scaffold comprises a sequence of a gRNA scaffold of Table 12, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      15. The template RNA according to any of embodiments 8-13, wherein the gRNA scaffold comprises a sequence of a gRNA scaffold of Table 12 that corresponds to the RT template sequence, the gRNA spacer sequence, or both, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      16. A gene modifying system for modifying DNA, comprising:
    • (a) a first RNA comprising, from 5′ to 3′, (i) a guide RNA sequence that is complementary to a first portion of the human CFTR gene, wherein the guide RNA sequence has a sequence comprising the core nucleotides of a spacer sequence of Table 1, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the guide RNA sequence; and (ii) a sequence (e.g., a scaffold region) that binds a gene modifying polypeptide (e.g., binds the Cas domain of the gene modifying polypeptide), or wherein the guide RNA sequence has a sequence of a spacer chosen from Table E2, E2A, E3, or E3A, or a sequence having 1, 2, or 3 substitutions thereto; and (ii) a sequence (e.g., a scaffold region) that binds a gene modifying polypeptide (e.g., binds the Cas domain of the gene modifying polypeptide), and
    • (b) a second RNA comprising (iii) a heterologous object sequence comprising a nucleotide substitution to introduce a mutation into a second portion of the human CFTR gene (wherein optionally the heterologous object sequence comprises, from 5′ to 3′, a post-edit homology region, a mutation region, and a pre-edit homology region), (iv) a primer region comprising at least 5, 6, 7, or 8 bases of 100% identity to a third portion of the human CFTR gene, and (v) an RRS (RNA binding protein recognition sequence) that binds a gene modifying protein.


      17. The gene modifying system of embodiment 16, wherein the heterologous object sequence comprises the core nucleotides of an RT template sequence from Table 3, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the RT template sequence, or wherein the heterologous object sequence comprises a sequence of an RT template sequence from Table E3 or E3A, or a sequence having 1, 2, or 3 substitutions thereto.


      18. The gene modifying system of embodiment 16, wherein the heterologous object sequence comprises the core nucleotides of the RT template sequence of Table 3 that corresponds to the gRNA spacer sequence, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the RT template sequence, or wherein the heterologous object sequence comprises a sequence of an RT template sequence from Table E3 or E3A that corresponds to the gRNA spacer sequence, or a sequence having 1, 2, or 3 substitutions thereto.


      19. The gene modifying system of any one of embodiments 16-18, wherein the PBS sequence has a sequence comprising the core nucleotides of the PBS sequence from the same row of Table 3 as the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 5′ end of the flanking nucleotides of the PBS sequence, or wherein the PBS sequence has a sequence of a PBS sequence from Table E3 or E3A, or a sequence having 1, 2, or 3 substitutions thereto.


      20. The gene modifying system of one of embodiments 16-18, wherein the PBS sequence has a sequence comprising the core nucleotides of a PBS sequence of Table 3 that corresponds to the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto, the gRNA spacer sequence, or both, and optionally comprises one or more consecutive nucleotides starting with the 5′ end of the flanking nucleotides of the PBS sequence, or wherein the PBS sequence has a sequence of a PBS sequence from Table E3 or E3A that corresponds to the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto.


      21. The gene modifying system of any one of embodiments 16-20, wherein the gRNA scaffold comprises a sequence of a gRNA scaffold of Table 12, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      22. The gene modifying system of any one of embodiments 16-20, wherein the gRNA scaffold comprises a sequence of a gRNA scaffold of Table 12 that corresponds to the RT template sequence, the gRNA spacer sequence, or both, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      23. A gene modifying system for modifying DNA, comprising:
    • (a) a first RNA comprising, from 5′ to 3′, (i) a guide RNA sequence that is complementary to a first portion of the human CFTR gene, and (ii) a sequence (e.g., a scaffold region) that binds a gene modifying polypeptide (e.g., binds the Cas domain of the gene modifying polypeptide), and
    • (b) a second RNA comprising (iii) a heterologous object sequence comprising a nucleotide substitution to introduce a mutation into a second portion of the human CFTR gene, wherein the heterologous object sequence comprises the core nucleotides of an RT template sequence of Table 3, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the RT template sequence, or wherein the heterologous object sequence comprises an RT sequence from Table E3 or E3A, or a sequence having 1, 2, or 3 substitutions thereto, and (iv) a primer region comprising at least 5, 6, 7, or 8 bases of 100% homology to a third portion of the human CFTR gene, and (v) an RRS (RNA binding protein recognition sequence) that binds a gene modifying protein.


      24. The gene modifying system of embodiment 23, wherein the gRNA spacer comprises the core nucleotides of a gRNA spacer sequence of Table 1, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the gRNA spacer sequence, or wherein the gRNA spacer comprises a gRNA spacer sequence from Table E2, E2A, E3, or E3A, or a sequence having 1, 2, or 3 substitutions thereto.


      25. The gene modifying system of embodiment 23, wherein the heterologous object sequence comprises the core nucleotides of the gRNA spacer sequence of Table 1 that corresponds to the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the gRNA spacer sequence, or wherein the heterologous object sequence comprises a gRNA spacer sequence from Table E3 or E3A, that corresponds to the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto.


      26. The gene modifying system of any one of embodiments 23-25, wherein the PBS sequence has a sequence comprising the core nucleotides of the PBS sequence from the same row of Table 3 as the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 5′ end of the flanking nucleotides of the PBS sequence, or wherein the PBS sequence has a sequence comprising a PBS sequence from Table E3 or E3A, or a sequence having 1, 2, or 3 substitutions thereto.


      27. The gene modifying system of any one of embodiments 23-25, wherein the PBS sequence has a sequence comprising the core nucleotides of a PBS sequence of Table 3 that corresponds to the RT template sequence, the gRNA spacer sequence, or both, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 5′ end of the flanking nucleotides of the PBS sequence, or wherein the PBS sequence has a sequence comprising a PBS sequence from Table E3 or E3A that corresponds to the RT template sequence, the gRNA spacer sequence, or both, or a sequence having 1, 2, or 3 substitutions thereto.


      28. The gene modifying system of any one of embodiments 23-27, wherein the gRNA scaffold comprises a sequence of a gRNA scaffold of Table 12, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      29. The gene modifying system of any one of embodiments 23-27, wherein the gRNA scaffold comprises a sequence of a gRNA scaffold of Table 12 that corresponds to the RT template sequence, the gRNA spacer sequence, or both, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      30. A gRNA comprising (i) a gRNA spacer sequence that is complementary to a first portion of the human CFTR gene, wherein the gRNA spacer has a sequence comprising the core nucleotides of a gRNA spacer sequence of Table 1, Table 2, or Table 4, or a sequence having 1, 2, or 3 substitutions thereto and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the gRNA spacer sequence, or wherein the gRNA spacer has a sequence comprising a gRNA spacer sequence from Table E2, E2A, E3, or E3A or a sequence having 1, 2, or 3 substitutions thereto; and (ii) a gRNA scaffold.


      31. The gRNA of embodiment 30, wherein the gRNA scaffold comprises a sequence of a gRNA scaffold of Table 12, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      32. The gRNA of embodiment 30, wherein the gRNA scaffold comprises a sequence of a gRNA scaffold of Table 12 that corresponds to the gRNA spacer sequence, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      33. A template RNA comprising: (iii) a heterologous object sequence comprising a mutation region to introduce a mutation into a second portion of the human CFTR gene, wherein the heterologous object sequence comprises the core nucleotides of an RT template sequence of Table 3, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the RT template sequence, or wherein the heterologous object sequence comprises an RT template sequence from Table E3 or E3A or a sequence having 1, 2, or 3 substitutions thereto, and (iv) a PBS sequence comprising at least 5, 6, 7, or 8 bases of 100% homology to a third portion of the human CFTR gene.


      34. The template RNA according to embodiment 33, wherein the PBS sequence has a sequence comprising the core nucleotides of the PBS sequence from the same row of Table 3 as the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 5′ end of the flanking nucleotides of the PBS sequence.


      35. The template RNA according to embodiment 33, wherein the PBS sequence has a sequence comprising the core nucleotides of a PBS sequence of Table 3 that corresponds to the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 5′ end of the flanking nucleotides of the PBS sequence, or wherein the PBS sequence comprises a PBS sequence from Table E3 or E3A that corresponds to the RT template sequence, or a sequence having 1, 2, or 3 substitutions thereto.


      36. The template RNA according to any one of embodiments 1-15 or 33-35, the gene modifying system of any one of embodiments 16-29, or the gRNA of any one of embodiments 30-32, wherein the mutation comprises an amino acid insertion of F508 (e.g., to correct a pathogenic F508 deletion mutation) of the CFTR gene.


      37. The template RNA according to any one of embodiments 1-15 or 33-36 or the gene modifying system of any one of embodiments 16-29 or 36, wherein the pre-edit sequence comprises between about 1 nucleotide to about 35 nucleotides (e.g., comprises about 1-5, 5-10, 10-15, 15-20, 20-25, 25-30, or 30-35 nucleotides) in length.


      38. The template RNA according to any one of embodiments 1-15 or 33-35, the gene modifying system of any one of embodiments 16-29, or the gRNA of any one of embodiments 30-32, wherein the mutation comprises a change at a portion of the CFTR gene corresponding to the RT template sequence of Table 3.


      39. The template RNA according to any one of embodiments 1-15 or 33-37 or the gene modifying system of any one of embodiments 16-29, 36, or 37, wherein the mutation region is at least three nucleotides in length.


      40. The template RNA according to any one of embodiments 1-15, 33-37, or 39 or the gene modifying system of any one of embodiments 16-29, 36, 37, or 39, wherein the mutation region is up to 32 (e.g., up to 5, 10, 15, 20, 25, 30, or 32) nucleotides in length and comprises one, two, or three sequence differences relative to a second portion of the human CFTR gene.


      41. The template RNA according to any one of embodiments 1-15, 33-37, 39, or 40 or the gene modifying system of any one of embodiments 16-29, 36, 37, 39, or 40, wherein the mutation region comprises two sequence differences relative to a second portion of the human CFTR gene.


      42. The template RNA according to any one of embodiments 1-15, 33-37, or 39-41 or the gene modifying system of any one of embodiments 16-29, 36, 37, or 39-41, wherein the mutation region comprises a first region (e.g., a first nucleotide) designed to correct a pathogenic mutation in the CFTR gene and a second region (e.g., a second nucleotide) designed to inactivate a PAM sequence (e.g., a “PAM-kill” mutation as described herein).


      43. The template RNA according to any one of embodiments 1-15 or 33-41 or the gene modifying system of any one of embodiments 16-29 or 36-41, wherein the mutation region comprises less than 80%, 70%, 60%, 50%, 40%, or 30% identity to the corresponding portion of the human CFTR gene.


      44. The template RNA of any one of the preceding embodiments, wherein the template RNA comprises one or more silent mutations (e.g., silent substitutions).


      45. The template RNA of any of the preceding embodiments, wherein the mutation region comprises a first region designed to correct a pathogenic mutation in the CFTR gene and a second region designed to introduce a silent substitution.


      46. The template RNA of any one of the preceding embodiments, which comprises one or more chemically modified nucleotides.


      47. A gene modifying system comprising: a template RNA of any of embodiments 1-15, 33-41, or 44-46, or a system of any of embodiments 16-29 or 36-41, and a gene modifying polypeptide, or a nucleic acid (e.g., RNA) encoding the gene modifying polypeptide.


      48. The gene modifying system of embodiment 47, wherein the gene modifying polypeptide comprises:
    • a reverse transcriptase (RT) domain (e.g., an RT domain from a retrovirus, or a polypeptide domain having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity thereto); and
    • a Cas domain that binds to the target DNA molecule and is heterologous to the RT domain (e.g., a Cas9 domain); and
    • optionally, a linker disposed between the RT domain and the Cas domain.


      49. The gene modifying system of embodiment 48, wherein the RT domain comprises:
    • (a) an RT domain of Table 6 or F3; or
    • (b) an RT domain from a murine leukemia virus (MMLV), a porcine endogenous retrovirus (PERV), an avian reticuloendotheliosis virus (AVIRE), a feline leukemia virus (FLV), simian foamy virus (SFV) (e.g., SFV3L), bovine leukemia virus (BLV), Mason-Pfizer monkey virus (MPMV), human foamy virus (HFV), or bovine foamy/syncytial virus (BFV/BSV).


      50. The gene modifying system of embodiment 48 or 49, wherein the Cas domain comprises a Cas domain of Table 7 or Table 8.


      51. The gene modifying system of any one of embodiments 48-50, wherein the Cas domain:
    • (a) is a Cas9 domain;
    • (b) is a SpCas9 domain, a BlatCas9 domain, a Nme2Cas9 domain, a PnpCas9 domain, a SauCas9 domain, a SauCas9-KKH domain, a SauriCas9 domain, a SauriCas9-KKH domain, a ScaCas9-Sc++ domain, a SpyCas9 domain, a SpyCas9-NG domain, a SpyCas9-SpRY domain, or a St1Cas9 domain; and/or
    • (c) is a Cas9 domain comprising an N670A mutation, an N611A mutation, an N605A mutation, an N580A mutation, an N588A mutation, an N872A mutation, an N863A mutation, an N622A mutation, or an H840A mutation.


      52. The gene modifying system of embodiment 51, wherein the Cas9 domain binds a PAM sequence listed in Table 7 or Table 12.


      53. The gene modifying system of embodiment 52, wherein a second portion of the human CFTR gene overlaps with a PAM recognized by the Cas domain, e.g., wherein the second portion of the human CFTR gene is within the PAM or wherein the PAM is within the second portion of the human CFTR gene.


      54. The gene modifying system of any one of embodiments 47-53, wherein the gRNA spacer is a gRNA spacer according to Table 1, and the Cas domain comprises a Cas domain listed in the same row of Table 1, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      55. The gene modifying system of any one of embodiments 47-53, wherein the template RNA comprises a sequence of a template RNA sequence of Table E3 or E3A or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      56. The gene modifying system of any one of embodiments 47-55, wherein:
    • (a) the template RNA comprises a sequence of a template RNA sequence of Table 3, E3 or E3A;
    • (b) the Cas domain comprises a Cas domain of Table 7 or Table 8;
    • (c) the linker comprises a linker sequence of Table 10; and
    • (d) the gene modifying polypeptide comprises one or two NLS sequences from Table 11.


      57. The gene modifying system of any of embodiments 47-56, which produces a first nick in a first strand of the human CFTR gene.


      58. The gene modifying system of embodiment 57, which further comprises a second strand-targeting gRNA spacer that directs a second nick to the second strand of the human CFTR gene.


      59. The gene modifying system of embodiment 58, wherein the second strand-targeting gRNA comprises a sequence comprising the core nucleotides of a left gRNA spacer sequence or a right gRNA spacer sequence from Table 2, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the left gRNA spacer sequence or right gRNA spacer sequence.


      60. The gene modifying system of embodiment 58, wherein the second strand-targeting gRNA comprises a sequence comprising the core nucleotides of a left gRNA spacer sequence or a right gRNA spacer sequence from Table 2 that corresponds to the gRNA spacer sequence of (i), and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the left gRNA spacer sequence or right gRNA spacer sequence.


      61. The gene modifying system of embodiment 58, wherein the second strand-targeting gRNA comprises:
    • (i) a sequence comprising the core nucleotides of a second nick gRNA sequence from Table 4, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the second nick gRNA sequence; or
    • (ii) a second-strand-targeting gRNA comprising a spacer sequence from Table G3 or G3A or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      62. The gene modifying system of embodiment 58, wherein the second strand-targeting gRNA comprises a sequence comprising the core nucleotides of the second nick gRNA sequence from Table 4 that corresponds to the gRNA spacer sequence of (i), or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the second nick gRNA sequence.


      63. The gene modifying system of any one of embodiments 58-62, wherein the second strand-targeting gRNA has a “PAM-in orientation” with the template RNA of the gene modifying system.


      64. The gene modifying system of any one of embodiments 58-63, wherein the second strand-targeting gRNA targets a sequence overlapping the target mutation of the template RNA.


      65. The gene modifying system of embodiment 64, wherein the second strand-targeting gRNA comprises:
    • (i) a sequence (e.g., a spacer sequence) complementary to the CFTR mutation;
    • (ii) a sequence (e.g., a spacer sequence) complementary to the wild-type sequence at the target locus;
    • (iii) a sequence (e.g., a spacer sequence) complementary to a SNP proximal to the target locus, e.g., a SNP contained in the genomic DNA of a subject (e.g., a patient); or
    • (iv) a sequence (e.g., spacer sequence) complementary to or comprising one or more silent substitutions proximal to the target locus.


      66. The template RNA, gene modifying system, or gRNA, of any one of the preceding embodiments, wherein the gRNA spacer comprises about 1, 2, 3, or more flanking nucleotides of the gRNA spacer.


      67. The template RNA or gene modifying system of any one of the preceding embodiments, wherein the heterologous object sequence comprises about 2, 3, 4, 5, 10, 20, 30, 40, or more flanking nucleotides of the RT template sequence.


      68. The template RNA or gene modifying system, of any one of the preceding embodiments, wherein the heterologous object sequence comprises between about 8-30, 9-25, 10-20, 11-16, or 12-15 (e.g., about 11-16) nucleotides.


      69. The template RNA or gene modifying system, of any one of the preceding embodiments, wherein the mutation region comprises 1, 2, or 3 nucleotide positions of sequence differences relative to the corresponding portion of the human CFTR gene.


      70. The template RNA or gene modifying system of any one of the preceding embodiments, wherein the mutation region comprises at least 2 nucleotide positions of sequence difference relative to the corresponding portion of the human CFTR gene.


      71. The template RNA or gene modifying system, of any one of the preceding embodiments, wherein the post-edit homology region and/or pre-edit homology region comprises 100% identity to the CFTR gene.


      72. The template RNA or gene modifying system of any one of the preceding embodiments, wherein the PBS sequence additionally comprises about 1, 2, 3, 4, 5, 6, 7, or more flanking nucleotides.


      73. The template RNA or gene modifying system of any one of the preceding embodiments, wherein the PBS sequence comprises about 5-20, 8-16, 8-14, 8-13, 9-13, 9-12, or 10-12 (e.g., about 9-12) nucleotides.


      74. The template RNA or gene modifying system of any one of the preceding embodiments, wherein the PBS sequence binds within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides of a nick site in the CFTR gene.


      75. The gene modifying system of any one of the preceding embodiments, wherein the domains of the gene modifying polypeptide are joined by a peptide linker.


      76. The gene modifying system of embodiment 75, wherein the linker comprises a sequence of a linker of Table 10 (e.g., of any of SEQ ID NOs: 5217, 5106, 5190, and 5218).


      77. The gene modifying system of any one of the preceding embodiments, wherein the gene modifying polypeptide further comprises one or more nuclear localization sequences (NLS).


      78. The gene modifying system of embodiment 77, wherein the gene modifying polypeptide comprises a first NLS and a second NLS.


      79. The gene modifying system of embodiment 77 or 78, wherein the NLS comprises a sequence of a NLS of Table 11 (e.g., of any of SEQ ID NOs: 5245, 5290, 5323, 5330, 5349, 5350, 5351, and 4001).


      80. A template RNA comprising a sequence of a template RNA of Table 4, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      81. A template RNA comprising a sequence of a template RNA of Table 4.


      82. A gene modifying system comprising:
    • (i) a template RNA comprising a sequence of a template RNA of Table 4, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and
    • (ii) a second-nick gRNA sequence from the same row of Table 4 as (i), or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


      83. A gene modifying system comprising:
    • (i) a template RNA comprising a sequence of a template RNA of Table 4; and
    • (ii) a second-nick gRNA sequence from the same row of Table 4 as (i).


      84. A DNA encoding the template RNA of any one of embodiments 1-15, 33-46, 66-74, 80, or 81, or the gRNA of any one of embodiments 30-32.


      85. A pharmaceutical composition, comprising the system of any one of embodiments 47-79, 82, or 83, or one or more nucleic acids encoding the same, and a pharmaceutically acceptable excipient or carrier.


      86. The pharmaceutical composition of embodiment 85, wherein the pharmaceutically acceptable excipient or carrier is selected from the group consisting of a plasmid vector, a viral vector, a vesicle, and a lipid nanoparticle.


      87. The pharmaceutical composition of embodiment 86, wherein the viral vector is an adeno-associated virus.


      88. A host cell (e.g., a mammalian cell, e.g., a human cell) comprising the template RNA or gene modifying system of any one of the preceding embodiments.


      89. A method of making the template RNA of any one of embodiments 1-15, 33-46, 66-74, 80, or 81, the method comprising synthesizing the template RNA in vitro (e.g., by in vitro transcription or by solid state synthesis) or by introducing a DNA encoding the template RNA into a host cell under conditions that allow for production of the template RNA.


      90. A method for modifying a target site in the human CFTR gene in a cell, the method comprising contacting the cell with the gene modifying system of any one of embodiments 47-79, 82, or 83, or DNA encoding the same, thereby modifying the target site in the human CFTR gene in a cell.


      91. A method for modifying a target site in the human CFTR gene in a cell, the method comprising contacting the cell with: (i) the template RNA of any one of embodiments 47-79, 82, or 83, or DNA encoding the same; and (ii) a gene modifying polypeptide or a nucleic acid encoding a gene modifying polypeptide, thereby modifying the target site in the human CFTR gene in a cell.


      92. A method for treating a subject having a disease or condition associated with a mutation in the human CFTR gene, the method comprising administering to the subject the gene modifying system of any one of embodiments 47-79, 82, or 83, or DNA encoding the same, thereby treating the subject having a disease or condition associated with a mutation in the human CFTR gene.


      93. A method for treating a subject having a disease or condition associated with a mutation in the human CFTR gene, the method comprising administering to the subject (i) the template RNA of any one of embodiments 47-79, 82, or 83, or DNA encoding the same; and (ii) a gene modifying polypeptide or a nucleic acid encoding a gene modifying polypeptide, thereby treating the subject having a disease or condition associated with a mutation in the human CFTR gene.


      94. The method of embodiment 92 or 93, wherein the disease or condition is cystic fibrosis.


      95. The method of any one of embodiments 92-94, wherein the subject has a mutation at a portion of the CFTR gene corresponding to the RT template sequence of Table 3, e.g., a F508 deletion (F508del) mutation.


      96. A method for treating a subject having cystic fibrosis, the method comprising administering to the subject the gene modifying system of any one of embodiments 47-79, 82, or 83, or DNA encoding the same, thereby treating the subject having cystic fibrosis.


      97. A method for treating a subject having cystic fibrosis, the method comprising administering to the subject (i) the template RNA of any one of embodiments 47-79, 82, or 83, or DNA encoding the same, and (ii) a gene modifying polypeptide or a nucleic acid encoding a gene modifying polypeptide, thereby treating the subject having cystic fibrosis.


      98. The gene modifying system or method of any one of the preceding embodiments, wherein introduction of the system into a target cell results in a correction of one or more pathogenic mutations in the CFTR gene.


      99. The gene modifying system or method of any one of the preceding embodiments, wherein the pathogenic mutation is a F508del mutation, and wherein the correction results in an amino acid insertion of F508.


      100. The gene modifying system or method of any of the preceding embodiments, wherein correction of the mutation occurs in at least 30% (e.g., 30%, 40%, 50%, 60%, 70%, or more) of target nucleic acids.


      101. The gene modifying system or method of any of the preceding embodiments, wherein correction of the mutation occurs in at least 30% (e.g., 30%, 40%, 50%, 60%, 70%, or more) of target cells.


      102. The gene modifying system or method of any of the preceding embodiments, wherein the system comprises a second strand-targeting gRNA spacer that directs a second nick to the second strand of the human CFTR gene, and wherein introduction of the system into a target cell results in an increase of the frequency of correction of the mutation as compared to a system that comprises the template RNA and the gene modifying polypeptide or nucleic acid encoding the gene modifying polypeptide but does not comprise a second strand targeting gRNA spacer.


      103. The method of any of the preceding embodiments, wherein the cell is a mammalian cell, such as a human cell.


      104. The method of any one of the preceding embodiments, wherein the subject is a human.


      105. The method of any of the preceding embodiments, wherein the contacting occurs ex vivo, e.g., wherein the cell's or subject's DNA is modified ex vivo.


      106. The method of any of the preceding embodiments, wherein the contacting occurs in vivo, e.g., wherein the cell's or subject's DNA is modified in vivo.


      107. The method of any of the preceding embodiments, wherein contacting the cell or the subject with the system comprises contacting the cell or a cell within the subject with a nucleic acid (e.g., DNA or RNA) encoding the gene modifying polypeptide under conditions that allow for production of the gene modifying polypeptide.


      108. A template RNA comprising, e.g., from 5′ to 3′:
    • (i) a gRNA spacer that is complementary to a first portion of the human CFTR gene;
    • (ii) a gRNA scaffold that binds a gene modifying polypeptide,
    • (iii) a heterologous object sequence comprising a mutation region to introduce a mutation into a second portion of the human CFTR gene wherein the second portion of the human CFTR gene is a portion that comprises mutations at a plurality of nucleotide positions in the human population, and
    • (iv) a primer binding site (PBS) sequence comprising at least 3, 4, 5, 6, 7, or 8 bases with 100% identity to a third portion of the human CFTR gene,
    • wherein the template RNA is capable of directing editing of mutations at a plurality of nucleotide positions in the human population.
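By way of non-limiting illustration only, the 5′ to 3′ ordering of elements recited in embodiment 108 can be sketched in Python as follows. The sketch is not part of any system described herein: the class name `TemplateRNA`, its method names, and every sequence shown are hypothetical placeholders (they are not sequences from any Table herein), and the length windows merely borrow the about 8-30 nucleotide heterologous object sequence range and the about 5-20 nucleotide PBS sequence range recited in embodiments 68 and 73.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateRNA:
    """Hypothetical container for the four template RNA elements, 5' to 3'."""
    grna_spacer: str
    grna_scaffold: str
    heterologous_object_sequence: str
    pbs_sequence: str

    def __post_init__(self):
        # Every element must be a non-empty RNA string (A, C, G, U only).
        for name, seq in self.elements():
            if not seq or set(seq) - set("ACGU"):
                raise ValueError(f"{name} must be a non-empty RNA sequence")

    def elements(self):
        """Return (name, sequence) pairs in 5' to 3' order."""
        return [
            ("gRNA spacer", self.grna_spacer),
            ("gRNA scaffold", self.grna_scaffold),
            ("heterologous object sequence", self.heterologous_object_sequence),
            ("PBS sequence", self.pbs_sequence),
        ]

    def full_sequence(self):
        """Concatenate the elements in 5' to 3' order."""
        return "".join(seq for _, seq in self.elements())

    def within_example_lengths(self):
        """Check the illustrative length windows (an assumption of this sketch)."""
        return (8 <= len(self.heterologous_object_sequence) <= 30
                and 5 <= len(self.pbs_sequence) <= 20)


# Arbitrary placeholder sequences, for illustration only.
example = TemplateRNA(
    grna_spacer="GACUGACUGACUGACUGACU",
    grna_scaffold="GGCCAAUUGGCCAAUUGGCC",
    heterologous_object_sequence="AACACCAAAGAUG",
    pbs_sequence="AUAUUUUCUU",
)
print(len(example.full_sequence()), example.within_example_lengths())
```

Keeping the four elements as separate fields, rather than one concatenated string, allows each element to be checked against its own constraints before the full sequence is assembled.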





BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.



FIG. 1 depicts a gene modifying system as described herein. The left hand diagram shows the gene modifying polypeptide, which comprises a Cas nickase domain (e.g., spCas9 N863A) and a reverse transcriptase domain (RT domain) which are linked by a linker. The right hand diagram shows the template RNA which comprises, from 5′ to 3′, a gRNA spacer, a gRNA scaffold, a heterologous object sequence, and a primer binding site sequence (PBS sequence). The heterologous object sequence can comprise a mutation region that comprises one or more sequence differences relative to the target site. The heterologous object sequence can also comprise a pre-edit homology region and a post-edit homology region, which flank the mutation region. Without wishing to be bound by theory, it is thought that the gRNA spacer of the template RNA binds to the second strand of a target site in the genome, and the gRNA scaffold of the template RNA binds to the gene modifying polypeptide, e.g., localizing the gene modifying polypeptide to the target site in the genome. It is thought that the Cas domain of the gene modifying polypeptide nicks the target site (e.g., the first strand of the target site), e.g., allowing the PBS sequence to bind to a sequence adjacent to the site to be altered on the first strand of the target site. It is thought that the RT domain of the gene modifying polypeptide uses the first strand of the target site that is bound to the complementary sequence comprising the PBS sequence of the template RNA as a primer and the heterologous object sequence of the template RNA as a template to, e.g., polymerize a sequence complementary to the heterologous object sequence. Without wishing to be bound by theory, it is thought that reverse transcription can then proceed through the pre-edit homology region, then through the mutation region, and then through the post-edit homology region, thereby producing a DNA strand comprising a mutation specified by the heterologous object sequence.
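The order of reverse transcription described for FIG. 1 can be illustrated with a minimal Python sketch. The sketch assumes, based only on the order of polymerization described above, that the heterologous object sequence is arranged 5′ to 3′ as post-edit homology region, mutation region, pre-edit homology region, so that the region nearest the PBS sequence is copied first. The function names and all sequences are hypothetical placeholders, and the sketch models only base complementarity, not any enzymatic behavior.

```python
# RNA base -> complementary DNA base
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the DNA strand (5' to 3') complementary to an RNA given 5' to 3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))


def synthesized_dna(post_edit, mutation_region, pre_edit):
    """Model the DNA polymerized from a heterologous object sequence.

    The heterologous object sequence is assumed to read, 5' to 3':
    post-edit homology region, mutation region, pre-edit homology region.
    The template is copied from its 3' end, so the pre-edit homology
    region is copied first and appears at the 5' end of the new DNA.
    """
    heterologous_object_sequence = post_edit + mutation_region + pre_edit
    return reverse_transcribe(heterologous_object_sequence)


# Hypothetical placeholder regions, not sequences from this disclosure.
new_strand = synthesized_dna(post_edit="GGCA", mutation_region="AAG", pre_edit="UCCA")
print(new_strand)
```

In this toy example the mutation region AAG in the template specifies the DNA trinucleotide CTT in the newly synthesized strand, flanked by the complements of the two homology regions.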



FIG. 2 is a graph showing the percent rewriting achieved using the RNAV209-013 or RNAV214-040 gene modifying polypeptides with the indicated template RNAs.



FIG. 3 is a graph showing the amount of Fah mRNA relative to wild type when template RNAs are used with the RNAV209-013 or RNAV214-040 gene modifying polypeptides.



FIG. 4 is a graph showing the percentage of Cas9-positive hepatocytes 6 hours following dosing with LNPs containing various gene modifying polypeptides and template RNAs.



FIG. 5 is a graph showing the rewrite levels in liver samples 6 days following dosing with LNPs containing various gene modifying polypeptides and template RNAs.



FIG. 6 is a graph showing wild type Fah mRNA restoration compared to littermate heterozygous mice in liver samples following dosing with LNPs containing various gene modifying polypeptides and template RNAs.



FIG. 7 is a graph showing Fah protein distribution in liver samples following dosing with LNPs containing various gene modifying polypeptides and template RNAs.



FIG. 8 is a series of western blots showing Cas9-RT Expression 6 hours after infusion of Cas9-RT mRNA+TTR guide LNP. Each lane represents an individual animal where 20 μg of tissue homogenate was added per lane. Positive control was from an in vitro cell experiment where Cas9-RT was expressed (described previously). GAPDH was used as a loading control for each sample. n=4 per group, vehicle or treated.



FIG. 9 is a graph showing gene editing of TTR locus after treatment with Cas9-RT mRNA+TTR guide LNP. Level of indels detected at the TTR locus measured by TIDE analysis of Sanger sequencing of the TTR locus where the protospacer targets.



FIG. 10 is a graph showing that TTR Serum levels decrease after treatment with Cas9-RT mRNA+TTR guide LNP. Measurement of circulating TTR levels 5 days after mice were treated with LNPs encapsulating Cas9-RT+TTR guide RNA.



FIG. 11 is a graph showing Cas9-RT Expression after infusion of Cas9-RT mRNA+TTR guide LNP. Relative expression quantified by ProteinSimple Jess capillary electrophoresis Western blot. Numbers in the symbols are animal number in group. Vehicle n=2, Cas9-RT+TTR guide n=3.



FIG. 12 is a graph showing gene editing of TTR locus after infusion of Cas9-RT mRNA+TTR guide LNP. Levels of indels detected at the TTR locus were measured by amplicon sequencing of the TTR locus where the protospacer targets. Each animal had 8 different biopsies taken across the liver where amplicon sequencing measured the percentage of reads showing an indel.



FIG. 13 is a graph showing INDEL % assessed by Amp-Seq for M470 cells nucleofected with select exemplary gene modifying systems containing different F508 spacer guide RNAs and mRNAs encoding different Cas-RT fusion protein variants, where the systems were designed to assess cutting activity with the different Cas9 domains.



FIG. 14 is a graph showing INDEL percentage as assessed by Amp-Seq for primary HBE cells nucleofected with select exemplary gene modifying systems containing different spacer guide RNAs designed to produce cutting activity at an F508del proximal nick site and mRNA encoding different Cas-RT fusion protein variants (SpCas9, SpRY Cas9, and SpCas9-NG).



FIG. 15 is a graph showing editing percentage as assessed by Amp-Seq for M470 cells nucleofected with exemplary gene modifying systems containing F508 template RNAs designed to produce 3 nucleotide CTT insertions at the nick site to correct the F508 mutation from T to TCTT.



FIG. 16 is a graph showing editing percentage as assessed by Amp-Seq for primary HBE cells nucleofected with exemplary gene modifying systems containing F508 template RNAs designed to produce 3 nucleotide CTT insertions at the nick site to correct the F508 mutation selected from the screen in M470 cells.



FIG. 17 is a graph showing rewriting percentage as assessed by amplicon sequencing for M470 cells nucleofected with exemplary gene modifying systems containing an F508 template RNA designed to produce 3 nucleotide CTT insertions at the nick site to correct the F508 mutation from T to TCTT along with various second nick gRNAs.



FIG. 18 is a graph showing editing percentage as assessed by Amp-Seq for primary HBE cells nucleofected with exemplary gene modifying systems containing select F508 template RNAs designed to produce 3 nucleotide CTT insertion at the F508 mutation along with second nick gRNA RNACS2288.





DETAILED DESCRIPTION
Definitions

The term “expression cassette,” as used herein, refers to a nucleic acid construct comprising nucleic acid elements sufficient for the expression of the nucleic acid molecule of the instant invention.


A “gRNA spacer”, as used herein, refers to a portion of a nucleic acid that has complementarity to a target nucleic acid and can, together with a gRNA scaffold, target a Cas protein to the target nucleic acid.


A “gRNA scaffold”, as used herein, refers to a portion of a nucleic acid that can bind a Cas protein and can, together with a gRNA spacer, target the Cas protein to the target nucleic acid. In some embodiments, the gRNA scaffold comprises a crRNA sequence, tetraloop, and tracrRNA sequence.


A “gene modifying polypeptide”, as used herein, refers to a polypeptide comprising a retroviral reverse transcriptase, or a polypeptide comprising an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity to a retroviral reverse transcriptase, which is capable of integrating a nucleic acid sequence (e.g., a sequence provided on a template nucleic acid) into a target DNA molecule (e.g., in a mammalian host cell, such as a genomic DNA molecule in the host cell). In some embodiments, the gene modifying polypeptide is capable of integrating the sequence substantially without relying on host machinery. In some embodiments, the gene modifying polypeptide integrates a sequence into a random position in a genome, and in some embodiments, the gene modifying polypeptide integrates a sequence into a specific target site. In some embodiments, a gene modifying polypeptide includes one or more domains that, collectively, facilitate 1) binding the template nucleic acid, 2) binding the target DNA molecule, and 3) integration of at least a portion of the template nucleic acid into the target DNA. Gene modifying polypeptides include both naturally occurring polypeptides as well as engineered variants of the foregoing, e.g., having one or more amino acid substitutions to the naturally occurring sequence. Gene modifying polypeptides also include heterologous constructs, e.g., where one or more of the domains recited above are heterologous to each other, whether through a heterologous fusion (or other conjugate) of otherwise wild-type domains, as well as fusions of modified domains, e.g., by way of replacement or fusion of a heterologous sub-domain or other substituted domain. 
Exemplary gene modifying polypeptides, and systems comprising them and methods of using them, that can be used in the methods provided herein are described, e.g., in PCT/US2021/020948, which is incorporated herein by reference with respect to gene modifying polypeptides that comprise a retroviral reverse transcriptase domain. In some embodiments, a gene modifying polypeptide integrates a sequence into a gene. In some embodiments, a gene modifying polypeptide integrates a sequence into a sequence outside of a gene. A “gene modifying system,” as used herein, refers to a system comprising a gene modifying polypeptide and a template nucleic acid.


The term “domain” as used herein refers to a structure of a biomolecule that contributes to a specified function of the biomolecule. A domain may comprise a contiguous region (e.g., a contiguous sequence) or distinct, non-contiguous regions (e.g., non-contiguous sequences) of a biomolecule. Examples of protein domains include, but are not limited to, an endonuclease domain, a DNA binding domain, a reverse transcription domain; an example of a domain of a nucleic acid is a regulatory domain, such as a transcription factor binding domain. In some embodiments, a domain (e.g., a Cas domain) can comprise two or more smaller domains (e.g., a DNA binding domain and an endonuclease domain).


As used herein, the term “exogenous”, when used with reference to a biomolecule (such as a nucleic acid sequence or polypeptide) means that the biomolecule was introduced into a host genome, cell or organism by the hand of man. For example, a nucleic acid that is added into an existing genome, cell, tissue or subject using recombinant DNA techniques or other methods is exogenous to the existing nucleic acid sequence, cell, tissue or subject.


As used herein, “first strand” and “second strand”, as used to describe the individual DNA strands of target DNA, distinguish the two DNA strands based upon the strand on which the reverse transcriptase domain initiates polymerization, e.g., based upon where target primed synthesis initiates. The first strand refers to the strand of the target DNA upon which the reverse transcriptase domain initiates polymerization, e.g., where target primed synthesis initiates. The second strand refers to the other strand of the target DNA. First and second strand designations do not describe the target site DNA strands in other respects; for example, in some embodiments the first and second strands are nicked by a polypeptide described herein, but the designations ‘first’ and ‘second’ strand have no bearing on the order in which such nicks occur.


The term “heterologous,” as used herein to describe a first element in reference to a second element means that the first element and second element do not exist in nature disposed as described. For example, a heterologous polypeptide, nucleic acid molecule, construct or sequence refers to (a) a polypeptide, nucleic acid molecule or portion of a polypeptide or nucleic acid molecule sequence that is not native to a cell in which it is expressed, (b) a polypeptide or nucleic acid molecule or portion of a polypeptide or nucleic acid molecule that has been altered or mutated relative to its native state, or (c) a polypeptide or nucleic acid molecule with an altered expression as compared to the native expression levels under similar conditions. For example, a heterologous regulatory sequence (e.g., promoter, enhancer) may be used to regulate expression of a gene or a nucleic acid molecule in a way that is different than the gene or a nucleic acid molecule is normally expressed in nature. In another example, a heterologous domain of a polypeptide or nucleic acid sequence (e.g., a DNA binding domain of a polypeptide or nucleic acid encoding a DNA binding domain of a polypeptide) may be disposed relative to other domains or may be a different sequence or from a different source, relative to other domains or portions of a polypeptide or its encoding nucleic acid. In certain embodiments, a heterologous nucleic acid molecule may exist in a native host cell genome, but may have an altered expression level or have a different sequence or both. 
In other embodiments, heterologous nucleic acid molecules may not be endogenous to a host cell or host genome but instead may have been introduced into a host cell by transformation (e.g., transfection, electroporation), wherein the added molecule may integrate into the host genome or can exist as extra-chromosomal genetic material either transiently (e.g., mRNA) or semi-stably for more than one generation (e.g., episomal viral vector, plasmid or other self-replicating vector).


As used herein, “insertion” of a sequence into a target site refers to the net addition of DNA sequence at the target site, e.g., where there are new nucleotides in the heterologous object sequence with no cognate positions in the unedited target site. In some embodiments, a nucleotide alignment of the PBS sequence and heterologous object sequence to the target nucleic acid sequence would result in an alignment gap in the target nucleic acid sequence.


As used herein, a “deletion” generated by a heterologous object sequence in a target site refers to the net deletion of DNA sequence at the target site, e.g., where there are nucleotides in the unedited target site with no cognate positions in the heterologous object sequence. In some embodiments, a nucleotide alignment of the PBS sequence and heterologous object sequence to the target nucleic acid sequence would result in an alignment gap in the molecule comprising the PBS sequence and heterologous object sequence.
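The “net addition” and “net deletion” language of the two preceding definitions can be illustrated with a minimal Python sketch. The function names and the short flanking sequences are hypothetical placeholders chosen for illustration; the first example simply mirrors the T to TCTT change mentioned for FIG. 15, and the helper assumes the position of the inserted nucleotides is already known rather than computing an alignment.

```python
def classify_net_edit(unedited_site, edited_site):
    """Classify an edit by the net change in DNA length at the target site."""
    net_change = len(edited_site) - len(unedited_site)
    if net_change > 0:
        return ("insertion", net_change)
    if net_change < 0:
        return ("deletion", -net_change)
    return ("no net change", 0)


def gap_unedited(unedited_site, insert_at, inserted_length):
    """Show the alignment gap that an insertion leaves in the unedited site.

    insert_at is the (assumed known) index at which new nucleotides are added.
    """
    return (unedited_site[:insert_at]
            + "-" * inserted_length
            + unedited_site[insert_at:])


# T -> TCTT is a net insertion of three nucleotides.
print(classify_net_edit("T", "TCTT"))

# Toy flanking sequence (hypothetical): three nucleotides added after index 5.
unedited = "ATCATTGG"
edited = "ATCATCTTTGG"
print(gap_unedited(unedited, 5, 3))
print(edited)
```

Printing the gapped unedited site above the edited site places the new nucleotides opposite the gap, which corresponds to the alignment gap in the target nucleic acid sequence described for an insertion.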


The term “inverted terminal repeats” or “ITRs” as used herein refers to AAV viral cis-elements named so because of their symmetry. These elements promote efficient multiplication of an AAV genome. It is hypothesized that the minimal elements for ITR function are a Rep-binding site (RBS; 5′-GCGCGCTCGCTCGCTC-3′ (SEQ ID NO: 37642) for AAV2) and a terminal resolution site (TRS; 5′-AGTTGG-3′ for AAV2) plus a variable palindromic sequence allowing for hairpin formation. According to the present invention, an ITR comprises at least these three elements (RBS, TRS, and sequences allowing the formation of a hairpin). In addition, in the present invention, the term “ITR” refers to ITRs of known natural AAV serotypes (e.g., ITR of a serotype 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11 AAV), to chimeric ITRs formed by the fusion of ITR elements derived from different serotypes, and to functional variants thereof. “Functional variant” refers to a sequence presenting a sequence identity of at least 80%, 85%, 90%, preferably of at least 95% with a known ITR and allowing multiplication of the sequence that includes said ITR in the presence of Rep proteins.
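
By way of a non-limiting sketch, a candidate sequence can be screened for the two AAV2 motifs quoted above. The helper names are hypothetical, the check searches both strands independently, and it does not evaluate the third element (a palindromic sequence able to form a hairpin), so a positive result is necessary but not sufficient for ITR function as defined here.

```python
AAV2_RBS = "GCGCGCTCGCTCGCTC"  # Rep-binding site quoted in the text for AAV2
AAV2_TRS = "AGTTGG"            # terminal resolution site quoted in the text for AAV2

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_rbs_and_trs(candidate: str) -> bool:
    """True if both AAV2 motifs occur on either strand of `candidate`.

    Simplification (assumption of this sketch): each motif is searched on
    both strands independently; hairpin formation is not assessed.
    """
    seq = candidate.upper()
    rc = reverse_complement(seq)
    has_rbs = AAV2_RBS in seq or AAV2_RBS in rc
    has_trs = AAV2_TRS in seq or AAV2_TRS in rc
    return has_rbs and has_trs
```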


The term “mutation region,” as used herein, refers to a region in a template RNA having one or more sequence differences relative to the corresponding sequence in a target nucleic acid. The sequence difference may comprise, for example, a substitution, insertion, frameshift, or deletion.


The term “mutated” when applied to nucleic acid sequences means that nucleotides in a nucleic acid sequence are inserted, deleted, or changed compared to a reference (e.g., native) nucleic acid sequence. A single alteration may be made at a locus (a point mutation), or multiple nucleotides may be inserted, deleted, or changed at a single locus. In addition, one or more alterations may be made at any number of loci within a nucleic acid sequence. A nucleic acid sequence may be mutated by any method known in the art.


“Nucleic acid molecule” refers to both RNA and DNA molecules including, without limitation, complementary DNA (“cDNA”), genomic DNA (“gDNA”), and messenger RNA (“mRNA”), and also includes synthetic nucleic acid molecules, such as those that are chemically synthesized or recombinantly produced, such as RNA templates, as described herein. The nucleic acid molecule can be double-stranded or single-stranded, circular, or linear. If single-stranded, the nucleic acid molecule can be the sense strand or the antisense strand. Unless otherwise indicated, and as an example for all sequences described herein under the general format “SEQ ID NO:”, the phrase “nucleic acid comprising SEQ ID NO:1” refers to a nucleic acid, at least a portion of which has either (i) the sequence of SEQ ID NO:1, or (ii) a sequence complementary to SEQ ID NO:1. The choice between the two is dictated by the context in which SEQ ID NO:1 is used. For instance, if the nucleic acid is used as a probe, the choice between the two is dictated by the requirement that the probe be complementary to the desired target. Nucleic acid sequences of the present disclosure may be modified chemically or biochemically or may contain non-natural or derivatized nucleotide bases, as will be readily appreciated by those of skill in the art. 
Such modifications include, for example, labels, methylation, substitution of one or more naturally occurring nucleotides with an analog, inter-nucleotide modifications such as uncharged linkages (for example, methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), charged linkages (for example, phosphorothioates, phosphorodithioates, etc.), pendant moieties, (for example, polypeptides), intercalators (for example, acridine, psoralen, etc.), chelators, alkylators, and modified linkages (for example, alpha anomeric nucleic acids, etc.). Also included are chemically modified bases (see, for example, Table 13), backbones (see, for example, Table 14), and modified caps (see, for example, Table 15). Also included are synthetic molecules that mimic polynucleotides in their ability to bind to a designated sequence via hydrogen bonding and other chemical interactions. Such molecules are known in the art and include, for example, those in which peptide linkages substitute for phosphate linkages in the backbone of a molecule, e.g., peptide nucleic acids (PNAs). Other modifications can include, for example, analogs in which the ribose ring contains a bridging moiety or other structure such as modifications found in “locked” nucleic acids (LNAs). In various embodiments, the nucleic acids are in operative association with additional genetic elements, such as tissue-specific expression-control sequence(s) (e.g., tissue-specific promoters and tissue-specific microRNA recognition sequences), as well as additional elements, such as inverted repeats (e.g., inverted terminal repeats, such as elements from or derived from viruses, e.g., AAV ITRs) and tandem repeats, inverted repeats/direct repeats, homology regions (segments with various degrees of homology to a target DNA), untranslated regions (UTRs) (5′, 3′, or both 5′ and 3′ UTRs), and various combinations of the foregoing. 
The nucleic acid elements of the systems provided by the invention can be provided in a variety of topologies, including single-stranded, double-stranded, circular, linear, linear with open ends, linear with closed ends, and particular versions of these, such as doggybone DNA (dbDNA) and closed-ended DNA (ceDNA).


As used herein, a “gene expression unit” is a nucleic acid sequence comprising at least one regulatory nucleic acid sequence operably linked to at least one effector sequence. A first nucleic acid sequence is operably linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter or enhancer is operably linked to a coding sequence if the promoter or enhancer affects the transcription or expression of the coding sequence. Operably linked DNA sequences may be contiguous or non-contiguous. Where necessary to join two protein-coding regions, operably linked sequences may be in the same reading frame.


The terms “host genome” or “host cell”, as used herein, refer to a cell and/or its genome into which protein and/or genetic material has been introduced. It should be understood that such terms are intended to refer not only to the particular subject cell and/or genome, but to the progeny of such a cell and/or the genome of the progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term “host cell” as used herein. A host genome or host cell may be an isolated cell or cell line grown in culture, or genomic material isolated from such a cell or cell line, or may be a host cell or host genome that composes living tissue or an organism. In some instances, a host cell may be an animal cell or a plant cell, e.g., as described herein. In certain instances, a host cell may be a mammalian cell, a human cell, avian cell, reptilian cell, bovine cell, horse cell, pig cell, goat cell, sheep cell, chicken cell, or turkey cell. In certain instances, a host cell may be a corn cell, soy cell, wheat cell, or rice cell.


As used herein, “operative association” describes a functional relationship between two nucleic acid sequences, such as 1) a promoter and 2) a heterologous object sequence, and means, in such example, the promoter and heterologous object sequence (e.g., a gene of interest) are oriented such that, under suitable conditions, the promoter drives expression of the heterologous object sequence. For instance, a template nucleic acid carrying a promoter and a heterologous object sequence may be single-stranded, e.g., either the (+) or (−) orientation. An “operative association” between the promoter and the heterologous object sequence in this template means that, regardless of whether the template nucleic acid will be transcribed in a particular state, when it is in the suitable state (e.g., is in the (+) orientation, in the presence of required catalytic factors, and NTPs, etc.), it is accurately transcribed. Operative association applies analogously to other pairs of nucleic acids, including other tissue-specific expression control sequences (such as enhancers, repressors and microRNA recognition sequences), IR/DR, ITRs, UTRs, or homology regions and heterologous object sequences or sequences encoding a retroviral RT domain.


The term “primer binding site sequence” or “PBS sequence,” as used herein, refers to a portion of a template RNA capable of binding to a region comprised in a target nucleic acid sequence. In some instances, a PBS sequence is a nucleic acid sequence comprising at least 3, 4, 5, 6, 7, or 8 bases with 100% identity to the region comprised in the target nucleic acid sequence. In some embodiments, the primer region comprises at least 5, 6, 7, or 8 bases with 100% identity to the region comprised in the target nucleic acid sequence. Without wishing to be bound by theory, in some embodiments when a template RNA comprises a PBS sequence and a heterologous object sequence, the PBS sequence binds to a region comprised in a target nucleic acid sequence, allowing a reverse transcriptase domain to use that region as a primer for reverse transcription, and to use the heterologous object sequence as a template for reverse transcription.


As used herein, a “stem-loop sequence” refers to a nucleic acid sequence (e.g., RNA sequence) with sufficient self-complementarity to form a stem-loop, e.g., having a stem comprising at least two (e.g., 3, 4, 5, 6, 7, 8, 9, or 10) base pairs, and a loop with at least three (e.g., four) bases. The stem may comprise mismatches or bulges.
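
A minimal sketch of the stem-loop criterion follows. It assumes a simplified model in which the two ends of the sequence pair inward to form a perfect stem with no mismatches or bulges, counts the loop in unpaired bases, and accepts G-U wobble pairs in addition to Watson-Crick pairs. The function name and these modelling choices are assumptions made for illustration and are not drawn from the disclosure.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (the wobble pair is an assumption).
_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def terminal_stem_loop(rna: str, min_stem: int = 2, min_loop: int = 3):
    """Return (stem_base_pairs, loop_bases) or None.

    The 5' and 3' ends are paired inward for as long as they are
    complementary and at least `min_loop` bases remain enclosed.
    Mismatches and bulges are not modelled in this sketch.
    """
    seq = rna.upper().replace("T", "U")
    i, j = 0, len(seq) - 1
    stem = 0
    while j - i - 1 >= min_loop and (seq[i], seq[j]) in _PAIRS:
        stem += 1
        i += 1
        j -= 1
    if stem < min_stem:
        return None
    return stem, j - i + 1  # bases left unpaired between the stem arms
```

Under these assumptions, a sequence such as GGGAAAACCC is read as a three-base-pair stem enclosing a four-base loop.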


As used herein, a “tissue-specific expression-control sequence” means nucleic acid elements that increase or decrease the level of a transcript comprising the heterologous object sequence in a target tissue in a tissue-specific manner, e.g., preferentially in on-target tissue(s), relative to off-target tissue(s). In some embodiments, a tissue-specific expression-control sequence preferentially drives or represses transcription, activity, or the half-life of a transcript comprising the heterologous object sequence in the target tissue in a tissue-specific manner, e.g., preferentially in an on-target tissue(s), relative to an off-target tissue(s). Exemplary tissue-specific expression-control sequences include tissue-specific promoters, repressors, enhancers, or combinations thereof, as well as tissue-specific microRNA recognition sequences. Tissue specificity refers to on-target (tissue(s) where expression or activity of the template nucleic acid is desired or tolerable) and off-target (tissue(s) where expression or activity of the template nucleic acid is not desired or is not tolerable). For example, a tissue-specific promoter drives expression preferentially in on-target tissues, relative to off-target tissues. In contrast, a microRNA that binds the tissue-specific microRNA recognition sequences is preferentially expressed in off-target tissues, relative to on-target tissues, thereby reducing expression of a template nucleic acid in off-target tissues. Accordingly, a promoter and a microRNA recognition sequence that are specific for the same tissue, such as the target tissue, have contrasting functions (promote and repress, respectively, with concordant expression levels, i.e., high levels of the microRNA in off-target tissues and low levels in on-target tissues, while promoters drive high expression in on-target tissues and low expression in off-target tissues) with regard to the transcription, activity, or half-life of an associated sequence in that tissue.


Table of Contents


1) Introduction


2) Gene modifying systems


 a) Polypeptide components of gene modifying systems


  i) Writing domain


  ii) Endonuclease domains and DNA binding domains


   (1) Gene modifying polypeptides comprising Cas domains


   (2) TAL Effectors and Zinc Finger Nucleases


  iii) Linkers


  iv) Localization sequences for gene modifying systems


  v) Evolved Variants of Gene Modifying Polypeptides and Systems


  vi) Inteins


  vii) Additional domains


 b) Template nucleic acids


  i) gRNA spacer and gRNA scaffold


  ii) Heterologous object sequence


  iii) PBS sequence


  iv) Exemplary Template Sequences


 c) gRNAs with inducible activity


 d) Circular RNAs and Ribozymes in Gene Modifying Systems


 e) Target Nucleic Acid Site


 f) Second strand nicking


3) Production of Compositions and Systems


4) Therapeutic Applications


5) Administration and Delivery


 a) Tissue Specific Activity/Administration


  i) Promoters


  ii) microRNAs


 b) Viral vectors and components thereof


 c) AAV Administration


 d) Lipid Nanoparticles


6) Kits, Articles of Manufacture, and Pharmaceutical Compositions


7) Chemistry, Manufacturing, and Controls (CMC)



INTRODUCTION

This disclosure relates to methods for treating cystic fibrosis (CF) and compositions for targeting, editing, modifying or manipulating a DNA sequence (e.g., inserting a heterologous object sequence into a target site of a mammalian genome) at one or more locations in a DNA sequence in a cell, tissue or subject, e.g., in vivo or in vitro. The heterologous object DNA sequence may include, e.g., a substitution, a deletion, an insertion, e.g., a coding sequence, a regulatory sequence, or a gene expression unit.


More specifically, the disclosure provides methods for treating cystic fibrosis using reverse transcriptase-based systems for altering a genomic DNA sequence of interest, e.g., by inserting, deleting, or substituting one or more nucleotides into/from the sequence of interest.


The disclosure provides, in part, methods for treating cystic fibrosis using a gene modifying system comprising a gene modifying polypeptide component and a template nucleic acid (e.g., template RNA) component. In some embodiments, a gene modifying system can be used to introduce an alteration into a target site in a genome. In some embodiments, the gene modifying polypeptide component comprises a writing domain (e.g., a reverse transcriptase domain), a DNA-binding domain, and an endonuclease domain (e.g., nickase domain). In some embodiments, the template nucleic acid (e.g., template RNA) comprises a sequence (e.g., a gRNA spacer) that binds a target site in the genome (e.g., that binds to a second strand of the target site), a sequence (e.g., a gRNA scaffold) that binds the gene modifying polypeptide component, a heterologous object sequence, and a PBS sequence. Without wishing to be bound by theory, it is thought that the template nucleic acid (e.g., template RNA) binds to the second strand of a target site in the genome and binds to the gene modifying polypeptide component (e.g., localizing the polypeptide component to the target site in the genome). It is thought that the endonuclease (e.g., nickase) of the gene modifying polypeptide component cuts the target site (e.g., the first strand of the target site), e.g., allowing the PBS sequence to bind to a sequence adjacent to the site to be altered on the first strand of the target site. It is thought that the writing domain (e.g., reverse transcriptase domain) of the polypeptide component uses the first strand of the target site that is bound to the complementary sequence comprising the PBS sequence of the template nucleic acid as a primer and the heterologous object sequence of the template nucleic acid as a template to, e.g., polymerize a sequence complementary to the heterologous object sequence. 
Without wishing to be bound by theory, it is thought that selection of an appropriate heterologous object sequence can result in substitution, deletion, and/or insertion of one or more nucleotides at the target site.
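
The proposed priming and writing steps can be sketched computationally. The example below is a simplified illustration under stated assumptions: the nicked first strand is given 5′ to 3′ as DNA, the nick lies immediately 5′ of position `nick_index`, the PBS sequence must be perfectly complementary to the bases directly 5′ of the nick, and the function returns only the newly polymerized DNA rather than the resolved edit. The function and parameter names are hypothetical.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp_dna(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def write_flap(first_strand: str, nick_index: int, pbs_rna: str, object_rna: str) -> str:
    """Return the DNA polymerized onto the 3' end of the nicked first strand.

    first_strand -- 5'->3' DNA of the strand that is nicked
    nick_index   -- the nick lies between positions nick_index-1 and nick_index
    pbs_rna      -- PBS sequence of the template RNA, 5'->3'
    object_rna   -- heterologous object sequence of the template RNA, 5'->3'
    """
    pbs_dna = pbs_rna.upper().replace("U", "T")
    if nick_index < len(pbs_dna) or nick_index > len(first_strand):
        raise ValueError("nick position leaves no room for the primer")
    primer = first_strand[nick_index - len(pbs_dna):nick_index].upper()
    if revcomp_dna(pbs_dna) != primer:
        raise ValueError("PBS is not complementary to the sequence 5' of the nick")
    # The primer's 3' end is extended by copying the object sequence,
    # so the new DNA is the reverse complement of that template.
    return revcomp_dna(object_rna.upper().replace("U", "T"))
```

For instance, with the first strand ACGTTGCAAGGCTTAC nicked after position 10, a PBS of CUUGCA anneals to TGCAAG and an object sequence of AAGC yields the new DNA GCTT.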


Gene Modifying Systems

In some embodiments, a gene modifying system described herein comprises: (A) a gene modifying polypeptide or a nucleic acid encoding the gene modifying polypeptide, wherein the gene modifying polypeptide comprises (i) a reverse transcriptase domain, and either (x) an endonuclease domain that contains DNA binding functionality or (y) an endonuclease domain and separate DNA binding domain; and (B) a template RNA. A gene modifying polypeptide, in some embodiments, acts as a substantially autonomous protein machine capable of integrating a template nucleic acid sequence into a target DNA molecule (e.g., in a mammalian host cell, such as a genomic DNA molecule in the host cell), substantially without relying on host machinery. For example, the gene modifying protein may comprise a DNA-binding domain, a reverse transcriptase domain, and an endonuclease domain. In some embodiments, the DNA-binding function may involve an RNA component that directs the protein to a DNA sequence, e.g., a gRNA spacer. In other embodiments, the gene modifying polypeptide may comprise a reverse transcriptase domain and an endonuclease domain. The RNA template element of a gene modifying system is typically heterologous to the gene modifying polypeptide element and provides an object sequence to be inserted (reverse transcribed) into the host genome. In some embodiments, the gene modifying polypeptide is capable of target primed reverse transcription. In some embodiments, the gene modifying polypeptide is capable of second-strand synthesis.


In some embodiments the gene modifying system is combined with a second polypeptide. In some embodiments, the second polypeptide may comprise an endonuclease domain. In some embodiments, the second polypeptide may comprise a polymerase domain, e.g., a reverse transcriptase domain. In some embodiments, the second polypeptide may comprise a DNA-dependent DNA polymerase domain. In some embodiments, the second polypeptide aids in completion of the genome edit, e.g., by contributing to second-strand synthesis or DNA repair resolution.


A functional gene modifying polypeptide can be made up of unrelated DNA binding, reverse transcription, and endonuclease domains. This modular structure allows combining of functional domains, e.g., dCas9 (DNA binding), MMLV reverse transcriptase (reverse transcription), FokI (endonuclease). In some embodiments, multiple functional domains may arise from a single protein, e.g., Cas9 or Cas9 nickase (DNA binding, endonuclease).


In some embodiments, a gene modifying polypeptide includes one or more domains that, collectively, facilitate 1) binding the template nucleic acid, 2) binding the target DNA molecule, and 3) integration of at least a portion of the template nucleic acid into the target DNA. In some embodiments, the gene modifying polypeptide is an engineered polypeptide that comprises one or more amino acid substitutions to a corresponding naturally occurring sequence. In some embodiments, the gene modifying polypeptide comprises two or more domains that are heterologous relative to each other, e.g., through a heterologous fusion (or other conjugate) of otherwise wild-type domains, as well as fusions of modified domains, e.g., by way of replacement or fusion of a heterologous sub-domain or other substituted domain. For instance, in some embodiments, one or more of: the RT domain is heterologous to the DBD; the DBD is heterologous to the endonuclease domain; or the RT domain is heterologous to the endonuclease domain.


In some embodiments, a template RNA molecule for use in the system comprises, from 5′ to 3′: (1) a gRNA spacer; (2) a gRNA scaffold; (3) a heterologous object sequence; and (4) a primer binding site (PBS) sequence. In some embodiments:

    • (1) Is a gRNA spacer of ~18-22 nt, e.g., is 20 nt
    • (2) Is a gRNA scaffold comprising one or more hairpin loops, e.g., 1, 2, or 3 loops for associating the template with a Cas domain, e.g., a nickase Cas9 domain. In some embodiments, the gRNA scaffold comprises the sequence, from 5′ to 3′,
      GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCC (SEQ ID NO: 5008).

    • (3) In some embodiments, the heterologous object sequence is, e.g., 7-74 nt, e.g., 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, or 80-90 nt in length. In some embodiments, the first (most 5′) base of the sequence is not C.

    • (4) In some embodiments, the PBS sequence that binds the target priming sequence after nicking occurs is, e.g., 3-20 nt, e.g., 7-15 nt, e.g., 12-14 nt. In some embodiments, the PBS sequence has 40-60% GC content.
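
The exemplary layout and length ranges listed above can be collected into a small checking routine. This is a sketch only: the class and function names are hypothetical, the spacer range is taken as 18-22 nt, and the chosen bounds are the broadest exemplary values recited in items (1), (3), and (4). Because those values are stated for some embodiments, a reported problem indicates departure from these examples and not an invalid template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TemplateRNA:
    """Template RNA components in 5'->3' order (names are illustrative)."""
    spacer: str
    scaffold: str
    object_seq: str
    pbs: str

    def sequence(self) -> str:
        # Concatenate in the recited order: spacer, scaffold, object sequence, PBS.
        return self.spacer + self.scaffold + self.object_seq + self.pbs


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_template(t: TemplateRNA) -> List[str]:
    """Return a list of departures from the exemplary ranges (empty if none)."""
    problems = []
    if not 18 <= len(t.spacer) <= 22:
        problems.append("spacer length outside 18-22 nt")
    if not 7 <= len(t.object_seq) <= 74:
        problems.append("heterologous object sequence length outside 7-74 nt")
    if t.object_seq[:1].upper() == "C":
        problems.append("first (most 5') base of the object sequence is C")
    if not 3 <= len(t.pbs) <= 20:
        problems.append("PBS length outside 3-20 nt")
    elif not 0.40 <= gc_fraction(t.pbs) <= 0.60:
        problems.append("PBS GC content outside 40-60%")
    return problems
```

The scaffold is carried through unchanged and is not validated, since the text specifies it by sequence and hairpin count rather than by a numeric range.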





In some embodiments, a second gRNA associated with the system may help drive complete integration. In some embodiments, the second gRNA may target a location that is 0-200 nt away from the first-strand nick, e.g., 0-50, 50-100, 100-200 nt away from the first-strand nick. In some embodiments, the second gRNA can only bind its target sequence after the edit is made, e.g., the gRNA binds a sequence present in the heterologous object sequence, but not in the initial target sequence.


In some embodiments, a gene modifying system described herein is used to make an edit in HEK293, K562, U2OS, or HeLa cells. In some embodiments, a gene modifying system is used to make an edit in primary cells, e.g., primary cortical neurons from E18.5 mice.


In some embodiments, a gene modifying polypeptide as described herein comprises a reverse transcriptase or RT domain (e.g., as described herein) that comprises a MoMLV RT sequence or variant thereof. In embodiments, the MoMLV RT sequence comprises one or more mutations selected from D200N, L603W, T330P, T306K, W313F, D524G, E562Q, D583N, P51L, S67R, E67K, T197A, H204R, E302K, F309N, L435G, N454K, H594Q, D653N, R110S, and K103L. In embodiments, the MoMLV RT sequence comprises a combination of mutations, such as D200N, L603W, and T330P, optionally further including T306K and/or W313F.
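
Substitution labels of the form used above (original residue, position, replacement residue) can be applied programmatically to a reference sequence. The sketch below is illustrative: the function names are hypothetical, positions are assumed to be 1-based relative to whatever reference sequence is supplied, and the numbering convention for any particular MoMLV RT sequence is not specified here.

```python
import re

# One-letter original residue, 1-based position, one-letter replacement residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str):
    """Split a label such as 'D200N' into ('D', 200, 'N')."""
    match = _SUBSTITUTION.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def apply_substitutions(protein: str, labels) -> str:
    """Apply each substitution, checking the expected original residue."""
    residues = list(protein)
    for label in labels:
        original, position, replacement = parse_substitution(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(f"{label}: reference has {residues[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)
```

Checking the expected original residue before substituting guards against applying a label to a sequence numbered under a different convention.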


In some embodiments, an endonuclease domain (e.g., as described herein) comprises nCas9, e.g., comprising an N863A mutation (e.g., in SpCas9) or an H840A mutation.


In some embodiments, the heterologous object sequence (e.g., of a system as described herein) is about 1-50, 50-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1000, or more, nucleotides in length.


In some embodiments, the RT and endonuclease domains are joined by a flexible linker, e.g., comprising the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 5006).


In some embodiments, the endonuclease domain is N-terminal relative to the RT domain. In some embodiments, the endonuclease domain is C-terminal relative to the RT domain.


In some embodiments, the system incorporates a heterologous object sequence into a target site by TPRT, e.g., as described herein.


In some embodiments, a gene modifying polypeptide comprises a DNA binding domain. In some embodiments, a gene modifying polypeptide comprises an RNA binding domain. In some embodiments, the RNA binding domain comprises an RNA binding domain of B-box protein, MS2 coat protein, dCas, or an element of a sequence of a table herein. In some embodiments, the RNA binding domain is capable of binding to a template RNA with greater affinity than a reference RNA binding domain.


In some embodiments, a gene modifying system is capable of producing an insertion into the target site of at least 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides (and optionally no more than 500, 400, 300, 200, or 100 nucleotides). In some embodiments, a gene modifying system is capable of producing an insertion into the target site of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides (and optionally no more than 500, 400, 300, 200, or 100 nucleotides). In some embodiments, a gene modifying system is capable of producing an insertion into the target site of at least 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5 or 10 kilobases (and optionally no more than 1, 5, 10, or 20 kilobases). In some embodiments, a gene modifying system is capable of producing a deletion of at least 81, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides (and optionally no more than 500, 400, 300, or 200 nucleotides). In some embodiments, a gene modifying system is capable of producing a deletion of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides (and optionally no more than 500, 400, 300, or 200 nucleotides). In some embodiments, a gene modifying system is capable of producing a deletion of at least 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5 or 10 kilobases (and optionally no more than 1, 5, 10, or 20 kilobases). 
In some embodiments, a gene modifying system is capable of producing a substitution in the target site of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 or more nucleotides. In some embodiments, a gene modifying system is capable of producing a substitution in the target site of 1-2, 2-3, 3-4, 4-5, 5-10, 10-15, 15-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, or 90-100 nucleotides.


In some embodiments, the substitution is a transition mutation. In some embodiments, the substitution is a transversion mutation. In some embodiments, the substitution converts an adenine to a thymine, an adenine to a guanine, an adenine to a cytosine, a guanine to a thymine, a guanine to a cytosine, a guanine to an adenine, a thymine to a cytosine, a thymine to an adenine, a thymine to a guanine, a cytosine to an adenine, a cytosine to a guanine, or a cytosine to a thymine.
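
The twelve single-base substitutions listed above fall into two classes under the standard definitions: a transition exchanges a purine for a purine or a pyrimidine for a pyrimidine, and a transversion exchanges a purine for a pyrimidine or the reverse. A minimal sketch of that classification follows; the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("bases must be A, C, G, or T")
    if ref == alt:
        raise ValueError("reference and replacement bases are identical")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

Applied to all twelve ordered pairs of distinct bases, this yields four transitions and eight transversions.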


In some embodiments, an insertion, deletion, substitution, or combination thereof, increases or decreases expression (e.g., transcription or translation) of a gene. In some embodiments, an insertion, deletion, substitution, or combination thereof, increases or decreases expression (e.g., transcription or translation) of a gene by altering, adding, or deleting sequences in a promoter or enhancer, e.g., sequences that bind transcription factors. In some embodiments, an insertion, deletion, substitution, or combination thereof alters translation of a gene (e.g., alters an amino acid sequence), inserts or deletes a start or stop codon, or alters or fixes the translation frame of a gene. In some embodiments, an insertion, deletion, substitution, or combination thereof alters splicing of a gene, e.g., by inserting, deleting, or altering a splice acceptor or donor site. In some embodiments, an insertion, deletion, substitution, or combination thereof alters transcript or protein half-life. In some embodiments, an insertion, deletion, substitution, or combination thereof alters protein localization in the cell (e.g., from the cytoplasm to a mitochondrion, or from the cytoplasm into the extracellular space (e.g., adds a secretion tag)). In some embodiments, an insertion, deletion, substitution, or combination thereof alters (e.g., improves) protein folding (e.g., to prevent accumulation of misfolded proteins). In some embodiments, an insertion, deletion, substitution, or combination thereof, alters, increases, or decreases the activity of a gene, e.g., a protein encoded by the gene.


Exemplary gene modifying polypeptides, and systems comprising them and methods of using them are described, e.g., in PCT/US2021/020948, which is incorporated herein by reference with respect to retroviral RT domains, including the amino acid and nucleic acid sequences therein.


Exemplary gene modifying polypeptides and retroviral RT domain sequences are also described, e.g., in International Application No. PCT/US21/20948 filed Mar. 4, 2021, e.g., at Table 30, Table 31, and Table 44 therein; the entire application is incorporated by reference herein with respect to retroviral RTs, e.g., in said sequences and tables. Accordingly, a gene modifying polypeptide described herein may comprise an amino acid sequence according to any of the Tables mentioned in this paragraph, or a domain thereof (e.g., a retroviral RT domain), or a functional fragment or variant of any of the foregoing, or an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In some embodiments, a polypeptide for use in any of the systems described herein can be a molecular reconstruction or ancestral reconstruction based upon the aligned polypeptide sequence of multiple homologous proteins. In some embodiments, a reverse transcriptase domain for use in any of the systems described herein can be a molecular reconstruction or an ancestral reconstruction, or can be modified at particular residues, based upon alignments of reverse transcriptase domains from the same or different sources. A skilled artisan can, based on the Accession numbers provided herein, align polypeptides or nucleic acid sequences, e.g., by using routine sequence analysis tools such as the Basic Local Alignment Search Tool (BLAST) or CD-Search for conserved domain analysis. Molecular reconstructions can be created based upon sequence consensus, e.g., using approaches described in Ivics et al., Cell 1997, 501-510; Wagstaff et al., Molecular Biology and Evolution 2013, 88-99.
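
As a simple illustration of a consensus-based reconstruction, the sketch below derives a majority-rule consensus from sequences that have already been aligned. It is a simplification and not the method of the cited references: it assumes equal-length gapped input with "-" as the gap character, drops columns that are entirely gaps, and resolves ties in favor of the residue seen first in input order. The function name is hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Majority-rule consensus of pre-aligned, equal-length sequences."""
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]
        if not residues:
            continue  # column is all gaps, contributes nothing
        # most_common keeps first-seen order among equal counts
        most_common, _count = Counter(residues).most_common(1)[0]
        out.append(most_common)
    return "".join(out)
```

A phylogeny-aware ancestral reconstruction would weight sequences by their evolutionary relationships, which this per-column vote does not attempt.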


Polypeptide Components of Gene Modifying Systems

In some embodiments, the gene modifying polypeptide possesses the functions of DNA target site binding, template nucleic acid (e.g., RNA) binding, DNA target site cleavage, and template nucleic acid (e.g., RNA) writing, e.g., reverse transcription. In some embodiments, each function is contained within a distinct domain. In some embodiments, a function may be attributed to two or more domains (e.g., two or more domains, together, exhibit the functionality). In some embodiments, two or more domains may have the same or similar function (e.g., two or more domains each independently have DNA-binding functionality, e.g., for two different DNA sequences). In other embodiments, one or more domains may be capable of enabling one or more functions, e.g., a Cas9 domain enabling both DNA binding and target site cleavage. In some embodiments, the domains are all located within a single polypeptide. In some embodiments, a first domain is in one polypeptide and a second domain is in a second polypeptide. For example, in some embodiments, the sequences may be split between a first polypeptide and a second polypeptide, e.g., wherein the first polypeptide comprises a reverse transcriptase (RT) domain and wherein the second polypeptide comprises a DNA-binding domain and an endonuclease domain, e.g., a nickase domain. As a further example, in some embodiments, the first polypeptide and the second polypeptide each comprise a DNA binding domain (e.g., a first DNA binding domain and a second DNA binding domain). In some embodiments, the first and second polypeptide may be brought together post-translationally via a split-intein to form a single gene modifying polypeptide.


In some aspects, a gene modifying polypeptide described herein comprises (e.g., a system described herein comprises a gene modifying polypeptide that comprises): 1) a Cas domain (e.g., a Cas nickase domain, e.g., a Cas9 nickase domain); 2) a reverse transcriptase (RT) domain of Table D, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto, wherein the RT domain is C-terminal of the Cas domain; and 3) a linker disposed between the RT domain and the Cas domain, wherein the linker has a sequence from the same row of Table D as the RT domain, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto.


In some embodiments, the RT domain has a sequence with 100% identity to the RT domain of Table D and the linker has a sequence with 100% identity to the linker sequence from the same row of Table D as the RT domain. In some embodiments, the Cas domain comprises a sequence of Table 8, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises an amino acid sequence according to any of SEQ ID NOs: 1-3332 in the sequence listing, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto.


In some embodiments, the gene modifying polypeptide comprises a GG amino acid sequence between the Cas domain and the linker, an AG amino acid sequence between the RT domain and the second NLS, and/or a GG amino acid sequence between the linker and the RT domain. In some embodiments, the gene modifying polypeptide comprises a sequence of SEQ ID NO: 4000 which comprises the first NLS and the Cas domain, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide comprises a sequence of SEQ ID NO: 4001 which comprises the second NLS, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% identity thereto.


Exemplary N-Terminal NLS-Cas9 Domain
(SEQ ID NO: 4000)



MPAAKRVKLDGGDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF






DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHP





IFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDV





DKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALS





LGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVN





TEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFY





KFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNR





EKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKN





LPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLK





EDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDR





EMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNF





MQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPE





NIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRD





MYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKKMKNYWRQ





LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKL





IREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDY





KVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKG





RDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAY





SVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFE





LENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEI





IEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDR





KRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDGG






Exemplary C-Terminal Sequence Comprising an NLS
(SEQ ID NO: 4001)



AGKRTADGSEFEKRTADGSEFESPKKKAKVE
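The N- to C-terminal arrangement described above (first NLS and Cas domain, linker, RT domain, C-terminal sequence comprising the second NLS) can be pictured as simple string concatenation. The following is only an illustrative sketch: the function name, the spacer parameters, and the placeholder blocks are assumptions introduced here, and only the C-terminal sequence is taken from SEQ ID NO: 4001 as printed above.

```python
# SEQ ID NO: 4001 as printed above (it begins with the AG dipeptide).
C_TERM_NLS = "AGKRTADGSEFEKRTADGSEFESPKKKAKVE"


def assemble_polypeptide(nls_cas, linker, rt, c_term=C_TERM_NLS,
                         cas_linker_spacer="GG", linker_rt_spacer="GG"):
    """Join the blocks in N- to C-terminal order (hypothetical helper).

    Pass an empty spacer when a block already carries the dipeptide;
    SEQ ID NO: 4000 as printed, for example, already ends in GG.
    """
    blocks = [nls_cas, cas_linker_spacer, linker, linker_rt_spacer, rt, c_term]
    return "".join(blocks)


# Placeholder blocks for illustration only, NOT sequences from the disclosure.
example = assemble_polypeptide("NLSCAS", "LINKER", "RTRTRT")
no_spacers = assemble_polypeptide("NLSCAS", "LINKER", "RTRTRT",
                                  cas_linker_spacer="", linker_rt_spacer="")
```

Whether the GG and AG residues are supplied as separate spacers or are already part of a neighboring block is left to the caller in this sketch.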






Writing Domain (RT Domain)


In certain aspects of the present invention, the writing domain of the gene modifying system possesses reverse transcriptase activity and is also referred to as a reverse transcriptase domain (an RT domain). In some embodiments, the RT domain comprises an RT catalytic portion and an RNA-binding region (e.g., a region that binds the template RNA).


In some embodiments, a nucleic acid encoding the reverse transcriptase is altered from its natural sequence to have altered codon usage, e.g., improved for human cells. In some embodiments, the reverse transcriptase domain is a heterologous reverse transcriptase from a retrovirus. In some embodiments, the RT domain of a gene modifying polypeptide has been mutated from its original amino acid sequence, e.g., has at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions. In some embodiments, the RT domain is derived from the RT of a retrovirus, e.g., HIV-1 RT, Moloney Murine Leukemia Virus (MMLV) RT, avian myeloblastosis virus (AMV) RT, or Rous Sarcoma Virus (RSV) RT.


In some embodiments, the retroviral reverse transcriptase (RT) domain exhibits enhanced stringency of target-primed reverse transcription (TPRT) initiation, e.g., relative to an endogenous RT domain. In some embodiments, the RT domain initiates TPRT when the 3 nt in the target site immediately upstream of the first strand nick, e.g., the genomic DNA priming the RNA template, have at least 66% or 100% complementarity to the 3 nt of homology in the RNA template. In some embodiments, the RT domain initiates TPRT when there are fewer than 5 nt mismatched (e.g., fewer than 1, 2, 3, 4, or 5 nt mismatched) between the template RNA homology and the target DNA priming reverse transcription. In some embodiments, the RT domain is modified such that the stringency for mismatches in priming the TPRT reaction is increased, e.g., wherein the RT domain does not tolerate any mismatches or tolerates fewer mismatches in the priming region relative to a wild-type (e.g., unmodified) RT domain. In some embodiments, the RT domain comprises an HIV-1 RT domain. In embodiments, the HIV-1 RT domain initiates lower levels of synthesis even with three nucleotide mismatches relative to an alternative RT domain (e.g., as described by Jamburuthugoda and Eickbush J Mol Biol 407(5):661-672 (2011); incorporated herein by reference in its entirety). In some embodiments, the RT domain forms a dimer (e.g., a heterodimer or homodimer). In some embodiments, the RT domain is monomeric. In some embodiments, an RT domain naturally functions as a monomer or as a dimer (e.g., heterodimer or homodimer). In some embodiments, an RT domain naturally functions as a monomer, e.g., is derived from a virus wherein it functions as a monomer.
In embodiments, the RT domain is selected from an RT domain from murine leukemia virus (MLV; sometimes referred to as MoMLV) (e.g., UniProt P03355), porcine endogenous retrovirus (PERV) (e.g., UniProt Q4VFZ2), mouse mammary tumor virus (MMTV) (e.g., UniProt P03365), avian reticuloendotheliosis virus (AVIRE) (e.g., UniProtKB accession: P03360), feline leukemia virus (FLV or FeLV) (e.g., UniProtKB accession: P10273), Mason-Pfizer monkey virus (MPMV) (e.g., UniProt P07572), bovine leukemia virus (BLV) (e.g., UniProt P03361), human T-cell leukemia virus-1 (HTLV-1) (e.g., UniProt P03362), human foamy virus (HFV) (e.g., UniProt P14350), simian foamy virus (SFV) (e.g., SFV3L) (e.g., UniProt P23074 or P27401), or bovine foamy/syncytial virus (BFV/BSV) (e.g., UniProt O41894), or a functional fragment or variant thereof (e.g., an amino acid sequence having at least 70%, 80%, 90%, 95%, or 99% identity thereto). In some embodiments, an RT domain is dimeric in its natural functioning. In some embodiments, the RT domain is derived from a virus wherein it functions as a dimer. In embodiments, the RT domain is selected from an RT domain from avian sarcoma/leukemia virus (ASLV) (e.g., UniProt A0A142BKH1), Rous sarcoma virus (RSV) (e.g., UniProt P03354), avian myeloblastosis virus (AMV) (e.g., UniProt Q83133), human immunodeficiency virus type I (HIV-1) (e.g., UniProt P03369), human immunodeficiency virus type II (HIV-2) (e.g., UniProt P15833), simian immunodeficiency virus (SIV) (e.g., UniProt P05896), bovine immunodeficiency virus (BIV) (e.g., UniProt P19560), equine infectious anemia virus (EIAV) (e.g., UniProt P03371), or feline immunodeficiency virus (FIV) (e.g., UniProt P16088) (Herschhorn and Hizi Cell Mol Life Sci 67(16):2717-2747 (2010)), or a functional fragment or variant thereof (e.g., an amino acid sequence having at least 70%, 80%, 90%, 95%, or 99% identity thereto). Naturally heterodimeric RT domains may, in some embodiments, also be functional as homodimers.
In some embodiments, dimeric RT domains are expressed as fusion proteins, e.g., as homodimeric fusion proteins or heterodimeric fusion proteins. In some embodiments, the RT function of the system is fulfilled by multiple RT domains (e.g., as described herein). In further embodiments, the multiple RT domains are fused or separate, e.g., may be on the same polypeptide or on different polypeptides.
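The priming criterion described above (complementarity between the 3 nt of target DNA upstream of the nick and the 3 nt of homology in the template RNA) can be illustrated with a small calculation. This is a minimal sketch under stated assumptions: the function names are hypothetical, both strings are assumed to be written so that position i of the DNA pairs with position i of the RNA (the caller handles strand orientation), and only Watson-Crick pairs are counted.

```python
# DNA base -> RNA base it pairs with (Watson-Crick pairs only; an assumption).
PAIRS_WITH = {"A": "U", "C": "G", "G": "C", "T": "A"}


def priming_complementarity(target_dna, template_rna):
    """Fraction of positions at which the DNA base pairs with the RNA base.

    Both strings must be pre-aligned position by position.
    """
    if not target_dna or len(target_dna) != len(template_rna):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = zip(target_dna.upper(), template_rna.upper())
    matches = sum(PAIRS_WITH.get(d) == r for d, r in pairs)
    return matches / len(target_dna)


def permits_initiation(target_dna, template_rna, min_fraction=0.66):
    """True if complementarity meets the chosen threshold (e.g., 66% or 100%)."""
    return priming_complementarity(target_dna, template_rna) >= min_fraction


full = priming_complementarity("GAT", "CUA")     # all 3 positions pair
partial = priming_complementarity("GAT", "CUG")  # 2 of 3 positions pair
```

With a 3 nt priming region, the 66% threshold corresponds to tolerating a single mismatch, and the 100% threshold to tolerating none.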


In some embodiments, a gene modifying system described herein comprises an integrase domain, e.g., wherein the integrase domain may be part of the RT domain. In some embodiments, an RT domain (e.g., as described herein) comprises an integrase domain. In some embodiments, an RT domain (e.g., as described herein) lacks an integrase domain, or comprises an integrase domain that has been inactivated by mutation or deleted. In some embodiments, a gene modifying system described herein comprises an RNase H domain, e.g., wherein the RNase H domain may be part of the RT domain. In some embodiments, the RNase H domain is not part of the RT domain and is covalently linked via a flexible linker. In some embodiments, an RT domain (e.g., as described herein) comprises an RNase H domain, e.g., an endogenous RNase H domain or a heterologous RNase H domain. In some embodiments, an RT domain (e.g., as described herein) lacks an RNase H domain. In some embodiments, an RT domain (e.g., as described herein) comprises an RNase H domain that has been added, deleted, mutated, or swapped for a heterologous RNase H domain. In some embodiments, the polypeptide comprises an inactivated endogenous RNase H domain. In some embodiments, an endogenous RNase H domain from one of the other domains of the polypeptide is genetically removed such that it is not included in the polypeptide, e.g., the endogenous RNase H domain is partially or completely truncated from the domain that comprises it. In some embodiments, mutation of an RNase H domain yields a polypeptide exhibiting lower RNase H activity, e.g., as determined by the methods described in Kotewicz et al. Nucleic Acids Res 16(1):265-277 (1988) (incorporated herein by reference in its entirety), e.g., lower by at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% compared to an otherwise similar domain without the mutation. In some embodiments, RNase H activity is abolished.


In some embodiments, an RT domain is mutated to increase fidelity compared to an otherwise similar domain without the mutation. For instance, in some embodiments, a YADD (SEQ ID NO: 37635) or YMDD (SEQ ID NO: 37636) motif in an RT domain (e.g., in a reverse transcriptase) is replaced with YVDD (SEQ ID NO: 37637). In embodiments, replacement of the YADD (SEQ ID NO: 37635) or YMDD (SEQ ID NO: 37636) motif with YVDD (SEQ ID NO: 37637) results in higher fidelity in retroviral reverse transcriptase activity (e.g., as described in Jamburuthugoda and Eickbush J Mol Biol 2011; incorporated herein by reference in its entirety).
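The motif replacement described above can be sketched as a pattern substitution on an amino acid string. The function name is hypothetical, and the sketch naively rewrites every YADD or YMDD occurrence; an actual design would presumably target only the motif at the catalytic site. The two test fragments are short substrings of the kind that appear in the Table 6 sequences.

```python
import re


def replace_fidelity_motif(rt_sequence):
    """Replace each YADD or YMDD motif with YVDD (naive, position-agnostic)."""
    return re.sub(r"Y[AM]DD", "YVDD", rt_sequence)


mutated = replace_fidelity_motif("VSYMDDILY")    # YMDD-containing fragment
unchanged = replace_fidelity_motif("LQYVDDLL")   # already carries YVDD
```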


In some embodiments, a gene modifying polypeptide described herein comprises an RT domain having an amino acid sequence according to Table 6, or a sequence having at least 70%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto. In some embodiments, a nucleic acid described herein encodes an RT domain having an amino acid sequence according to Table 6, or a sequence having at least 70%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto.
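The identity thresholds recited above can be illustrated with a simple percent-identity calculation. This is a sketch under stated assumptions: the two sequences are assumed to be already aligned to equal length (with "-" marking gaps), the denominator is the number of aligned columns, and the function names are hypothetical. Other conventions for the denominator exist, and a real comparison would first run a sequence alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns with identical residues (gap columns count as mismatches)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if percent identity is at least the given threshold (e.g., 70, 80, ... 99)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Two 7-residue fragments differing at one position (6 of 7 identical).
identity = percent_identity("NSPTLFD", "NSPTLFN")
```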









TABLE 6
Exemplary reverse transcriptase domains from retroviruses

RT Name; SEQ ID NO:; RT amino acid sequence





AVIRE_P03360 (SEQ ID NO: 8001)
TAPLEEEYRLFLEAPIQNVTLLEQWKREIPKVWAEINPPGLASTQAPIHVQLLSTALPVRVRQYPITLEAKRSLRETIRKFRAAGILRPVHSPWNTPLLPV

RKSGTSEYRMVQDLREVNKRVETIHPTVPNPYTLLSLLPPDRIWYSVLDLKDAFFCIPLAPESQLIFAFEWADAEEGESGQLTWTRLPQGFKNSPTLFD




EALNRDLQGFRLDHPSVSLLQYVDDLLIAADTQAACLSATRDLLMTLAELGYRVSGKKAQLCQEEVTYLGFKIHKGSRSLSNSRTQAILQIPVPKTKRQV




REFLGTIGYCRLWIPGFAELAQPLYAATRGGNDPLVWGEKEEEAFQSLKLALTQPPALALPSLDKPFQLFVEETSGAAKGVLTQALGPWKRPVAYLSK




RLDPVAAGWPRCLRAIAAAALLTREASKLTFGQDIEITSSHNLESLLRSPPDKWLTNARITQYQVLLLDPPRVRFKQTAALNPATLLPETDDTLPIHHCLD




TLDSLTSTRPDLTDQPLAQAEATLFTDGSSYIRDGKRYAGAAVVTLDSVIWAEPLPIGTSAQKAELIALTKALEWSKDKSVNIYTDSRYAFATLHVHGMIY




RERGLLTAGGKAIKNAPEILALLTAVWLPKRVAVMHCKGHQKDDAPTSTGNRRADEVAREVAIRPLSTQATIS





AVIRE_P03360_3mut (SEQ ID NO: 8002)
TAPLEEEYRLFLEAPIQNVTLLEQWKREIPKVWAEINPPGLASTQAPIHVQLLSTALPVRVRQYPITLEAKRSLRETIRKFRAAGILRPVHSPWNTPLLPV

RKSGTSEYRMVQDLREVNKRVETIHPTVPNPYTLLSLLPPDRIWYSVLDLKDAFFCIPLAPESQLIFAFEWADAEEGESGQLTWTRLPQGFKNSPTLFN

EALNRDLQGFRLDHPSVSLLQYVDDLLIAADTQAACLSATRDLLMTLAELGYRVSGKKAQLCQEEVTYLGFKIHKGSRSLSNSRTQAILQIPVPKTKRQV




REFLGTIGYCRLWIPGFAELAQPLYAATRPGNDPLVWGEKEEEAFQSLKLALTQPPALALPSLDKPFQLFVEETSGAAKGVLTQALGPWKRPVAYLSK




RLDPVAAGWPRCLRAIAAAALLTREASKLTFGQDIEITSSHNLESLLRSPPDKWLTNARITQYQVLLLDPPRVRFKQTAALNPATLLPETDDTLPIHHCLD




TLDSLTSTRPDLTDQPLAQAEATLFTDGSSYIRDGKRYAGAAVVTLDSVIWAEPLPIGTSAQKAELIALTKALEWSKDKSVNIYTDSRYAFATLHVHGMIY




RERGWLTAGGKAIKNAPEILALLTAVWLPKRVAVMHCKGHQKDDAPTSTGNRRADEVAREVAIRPLSTQATIS





AVIRE_P03360_3mutA (SEQ ID NO: 8003)
TAPLEEEYRLFLEAPIQNVTLLEQWKREIPKVWAEINPPGLASTQAPIHVQLLSTALPVRVRQYPITLEAKRSLRETIRKFRAAGILRPVHSPWNTPLLPV

RKSGTSEYRMVQDLREVNKRVETIHPTVPNPYTLLSLLPPDRIWYSVLDLKDAFFCIPLAPESQLIFAFEWADAEEGESGQLTWTRLPQGFKNSPTLFN

EALNRDLQGFRLDHPSVSLLQYVDDLLIAADTQAACLSATRDLLMTLAELGYRVSGKKAQLCQEEVTYLGFKIHKGSRSLSNSRTQAILQIPVPKTKRQV




REFLGKIGYCRLFIPGFAELAQPLYAATRPGNDPLVWGEKEEEAFQSLKLALTQPPALALPSLDKPFQLFVEETSGAAKGVLTQALGPWKRPVAYLSKR




LDPVAAGWPRCLRAIAAAALLTREASKLTFGQDIEITSSHNLESLLRSPPDKWLTNARITQYQVLLLDPPRVRFKQTAALNPATLLPETDDTLPIHHCLDT




LDSLTSTRPDLTDQPLAQAEATLFTDGSSYIRDGKRYAGAAVVTLDSVIWAEPLPIGTSAQKAELIALTKALEWSKDKSVNIYTDSRYAFATLHVHGMIY




RERGWLTAGGKAIKNAPEILALLTAVWLPKRVAVMHCKGHQKDDAPTSTGNRRADEVAREVAIRPLSTQATIS





BAEVM_P10272 (SEQ ID NO: 8004)
TVSLQDEHRLFDIPVTTSLPDVWLQDFPQAWAETGGLGRAKCQAPIIIDLKPTAVPVSIKQYPMSLEAHMGIRQHIIKFLELGVLRPCRSPWNTPLLPVK

KPGTQDYRPVQDLREINKRTVDIHPTVPNPYNLLSTLKPDYSWYTVLDLKDAFFCLPLAPQSQELFAFEWKDPERGISGQLTWTRLPQGFKNSPTLFD




EALHRDLTDFRTQHPEVTLLQYVDDLLLAAPTKKACTQGTRHLLQELGEKGYRASAKKAQICQTKVTYLGYILSEGKRWLTPGRIETVARIPPPRNPRE




VREFLGTAGFCRLWIPGFAELAAPLYALTKESTPFTWQTEHQLAFEALKKALLSAPALGLPDTSKPFTLFLDERQGIAKGVLTQKLGPWKRPVAYLSKK




LDPVAAGWPPCLRIMAATAMLVKDSAKLTLGQPLTVITPHTLEAIVRQPPDRWITNARLTHYQALLLDTDRVQFGPPVTLNPATLLPVPENQPSPHDCR




QVLAETHGTREDLKDQELPDADHTWYTDGSSYLDSGTRRAGAAVVDGHNTIWAQSLPPGTSAQKAELIALTKALELSKGKKANIYTDSRYAFATAHTH




GSIYERRGLLTSEGKEIKNKAEIIALLKALFLPQEVAIIHCPGHQKGQDPVAVGNRQADRVARQAAMAEVLTLATEPDNTSHIT





BAEVM_P10272_3mut (SEQ ID NO: 8005)
TVSLQDEHRLFDIPVTTSLPDVWLQDFPQAWAETGGLGRAKCQAPIIIDLKPTAVPVSIKQYPMSLEAHMGIRQHIIKFLELGVLRPCRSPWNTPLLPVK

KPGTQDYRPVQDLREINKRTVDIHPTVPNPYNLLSTLKPDYSWYTVLDLKDAFFCLPLAPQSQELFAFEWKDPERGISGQLTWTRLPQGFKNSPTLFN

EALHRDLTDFRTQHPEVTLLQYVDDLLLAAPTKKACTQGTRHLLQELGEKGYRASAKKAQICQTKVTYLGYILSEGKRWLTPGRIETVARIPPPRNPRE




VREFLGTAGFCRLWIPGFAELAAPLYALTKPSTPFTWQTEHQLAFEALKKALLSAPALGLPDTSKPFTLFLDERQGIAKGVLTQKLGPWKRPVAYLSKK




LDPVAAGWPPCLRIMAATAMLVKDSAKLTLGQPLTVITPHTLEAIVRQPPDRWITNARLTHYQALLLDTDRVQFGPPVTLNPATLLPVPENQPSPHDCR




QVLAETHGTREDLKDQELPDADHTWYTDGSSYLDSGTRRAGAAVVDGHNTIWAQSLPPGTSAQKAELIALTKALELSKGKKANIYTDSRYAFATAHTH




GSIYERRGWLTSEGKEIKNKAEIIALLKALFLPQEVAIIHCPGHQKGQDPVAVGNRQADRVARQAAMAEVLTLATEPDNTSHIT





BAEVM_P10272_3mutA (SEQ ID NO: 8006)
TVSLQDEHRLFDIPVTTSLPDVWLQDFPQAWAETGGLGRAKCQAPIIIDLKPTAVPVSIKQYPMSLEAHMGIRQHIIKFLELGVLRPCRSPWNTPLLPVK

KPGTQDYRPVQDLREINKRTVDIHPTVPNPYNLLSTLKPDYSWYTVLDLKDAFFCLPLAPQSQELFAFEWKDPERGISGQLTWTRLPQGFKNSPTLFN

EALHRDLTDFRTQHPEVTLLQYVDDLLLAAPTKKACTQGTRHLLQELGEKGYRASAKKAQICQTKVTYLGYILSEGKRWLTPGRIETVARIPPPRNPRE




VREFLGKAGFCRLFIPGFAELAAPLYALTKPSTPFTWQTEHQLAFEALKKALLSAPALGLPDTSKPFTLFLDERQGIAKGVLTQKLGPWKRPVAYLSKKL




DPVAAGWPPCLRIMAATAMLVKDSAKLTLGQPLTVITPHTLEAIVRQPPDRWITNARLTHYQALLLDTDRVQFGPPVTLNPATLLPVPENQPSPHDCRQ




VLAETHGTREDLKDQELPDADHTWYTDGSSYLDSGTRRAGAAVVDGHNTIWAQSLPPGTSAQKAELIALTKALELSKGKKANIYTDSRYAFATAHTHG




SIYERRGWLTSEGKEIKNKAEIIALLKALFLPQEVAIIHCPGHQKGQDPVAVGNRQADRVARQAAMAEVLTLATEPDNTSHIT





BLVAU_P25059 (SEQ ID NO: 8007)
GVLDAPPSHIGLEHLPPPPEVPQFPLNLERLQALQDLVHRSLEAGYISPWDGPGNNPVFPVRKPNGAWRFVHDLRVTNALTKPIPALSPGPPDLTAIPT

HLPHIICLDLKDAFFQIPVEDRFRSYFAFTLPTPGGLQPHRRFAWRVLPQGFINSPALFERALQEPLRQVSAAFSQSLLVSYMDDILYVSPTEEQRLQCY




QTMAAHLRDLGFQVASEKTRQTPSPVPFLGQMVHERMVTYQSLPTLQISSPISLHQLQTVLGDLQWVSRGTPTTRRPLQLLYSSLKGIDDPRAIIHLSP




EQQQGIAELRQALSHNARSRYNEQEPLLAYVHLTRAGSTLVLFQKGAQFPLAYFQTPLTDNQASPWGLLLLLGCQYLQAQALSSYAKTILKYYHNLPK




TSLDNWIQSSEDPRVQELLQLWPQISSQGIQPPGPWKTLVTRAEVFLTPQFSPEPIPAALCLFSDGAARRGAYCLWKDHLLDFQAVPAPESAQKGELA




GLLAGLAAAPPEPLNIWVDSKYLYSLLRTLVLGAWLQPDPVPSYALLYKSLLRHPAIFVGHVRSHSSASHPIASLNNYVDQL





BLVAU_P25059_2mut (SEQ ID NO: 8008)
GVLDAPPSHIGLEHLPPPPEVPQFPLNLERLQALQDLVHRSLEAGYISPWDGPGNNPVFPVRKPNGAWRFVHDLRVTNALTKPIPALSPGPPDLTAIPT

HLPHIICLDLKDAFFQIPVEDRFRSYFAFTLPTPGGLQPHRRFAWRVLPQGFINSPALFQRALQEPLRQVSAAFSQSLLVSYMDDILYVSPTEEQRLQCY

QTMAAHLRDLGFQVASEKTRQTPSPVPFLGQMVHERMVTYQSLPTLQISSPISLHQLQTVLGDLQWVSRGTPTTRRPLQLLYSSLKPIDDPRAIIHLSP




EQQQGIAELRQALSHNARSRYNEQEPLLAYVHLTRAGSTLVLFQKGAQFPLAYFQTPLTDNQASPWGLLLLLGCQYLQAQALSSYAKTILKYYHNLPK




TSLDNWIQSSEDPRVQELLQLWPQISSQGIQPPGPWKTLVTRAEVFLTPQFSPEPIPAALCLFSDGAARRGAYCLWKDHLLDFQAVPAPESAQKGELA




GLLAGLAAAPPEPLNIWVDSKYLYSLLRTLVLGAWLQPDPVPSYALLYKSLLRHPAIFVGHVRSHSSASHPIASLNNYVDQL





BLVJ_P03361 (SEQ ID NO: 8009)
GVLDTPPSHIGLEHLPPPPEVPQFPLNLERLQALQDLVHRSLEAGYISPWDGPGNNPVFPVRKPNGAWRFVHDLRATNALTKPIPALSPGPPDLTAIPT

HPPHIICLDLKDAFFQIPVEDRFRFYLSFTLPSPGGLQPHRRFAWRVLPQGFINSPALFERALQEPLRQVSAAFSQSLLVSYMDDILYASPTEEQRSQCY




QALAARLRDLGFQVASEKTSQTPSPVPFLGQMVHEQIVTYQSLPTLQISSPISLHQLQAVLGDLQWVSRGTPTTRRPLQLLYSSLKRHHDPRAIIQLSPE




QLQGIAELRQALSHNARSRYNEQEPLLAYVHLTRAGSTLVLFQKGAQFPLAYFQTPLTDNQASPWGLLLLLGCQYLQTQALSSYAKPILKYYHNLPKTS




LDNWIQSSEDPRVQELLQLWPQISSQGIQPPGPWKTLITRAEVFLTPQFSPDPIPAALCLFSDGATGRGAYCLWKDHLLDFQAVPAPESAQKGELAGL




LAGLAAAPPEPVNIWVDSKYLYSLLRTLVLGAWLQPDPVPSYALLYKSLLRHPAIVVGHVRSHSSASHPIASLNNYVDQL





BLVJ_P03361_2mut (SEQ ID NO: 8010)
GVLDTPPSHIGLEHLPPPPEVPQFPLNLERLQALQDLVHRSLEAGYISPWDGPGNNPVFPVRKPNGAWRFVHDLRATNALTKPIPALSPGPPDLTAIPT

HPPHIICLDLKDAFFQIPVEDRFRFYLSFTLPSPGGLQPHRRFAWRVLPQGFINSPALFNRALQEPLRQVSAAFSQSLLVSYMDDILYASPTEEQRSQCY

QALAARLRDLGFQVASEKTSQTPSPVPFLGQMVHEQIVTYQSLPTLQISSPISLHQLQAVLGDLQWVSRGTPTTRRPLQLLYSSLKRHHDPRAIIQLSPE




QLQGIAELRQALSHNARSRYNEQEPLLAYVHLTRAGSTLVLFQKGAQFPLAYFQTPLTDNQASPWGLLLLLGCQYLQTQALSSYAKPILKYYHNLPKTS




LDNWIQSSEDPRVQELLQLWPQISSQGIQPPGPWKTLITRAEVFLTPQFSPDPIPAALCLFSDGATGRGAYCLWKDHLLDFQAVPAPESAQKGELAGL




LAGLAAAPPEPVNIWVDSKYLYSLLRTWVLGAWLQPDPVPSYALLYKSLLRHPAIVVGHVRSHSSASHPIASLNNYVDQL





BLVJ_P03361_2mutB (SEQ ID NO: 8011)
GVLDTPPSHIGLEHLPPPPEVPQFPLNLERLQALQDLVHRSLEAGYISPWDGPGNNPVFPVRKPNGAWRFVHDLRATNALTKPIPALSPGPPDLTAPP

THPPHIICLDLKDAFFQIPVEDRFRFYLSFTLPSPGGLQPHRRFAWRVLPQGFINSPALFQRALQEPLRQVSAAFSQSLLVSYMDDILYASPTEEQRSQC

YQALAARLRDLGFQVASEKTSQTPSPVPFLGQMVHEQIVTYQSLPTLQISSPISLHQLQAVLGDLQWVSRGTPTTRRPLQLLYSSLKRHHDPRAIIQLSP




EQLQGIAELRQALSHNARSRYNEQEPLLAYVHLTRAGSTLVLFQKGAQFPLAYFQTPLTDNQASPWGLLLLLGCQYLQTQALSSYAKPILKYYHNLPKT




SLDNWIQSSEDPRVQELLQLWPQISSQGIQPPGPWKTLITRAEVFLTPQFSPDPIPAALCLFSDGATGRGAYCLWKDHLLDFQAVPAPESAQKGELAG




LLAGLAAAPPEPVNIWVDSKYLYSLLRTWVLGAWLQPDPVPSYALLYKSLLRHPAIVVGHVRSHSSASHPIASLNNYVDQL





FFV_O93209 (SEQ ID NO: 8012)
MDLLKPLTVERKGVKIKGYWNSQADITCVPKDLLQGEEPVRQQNVTTIHGTQEGDVYYVNLKIDGRRINTEVIGTTLDYAIITPGDVPWILKKPLELTIKLD

LEEQQGTLLNNSILSKKGKEELKQLFEKYSALWQSWENQVGHRRIRPHKIATGTVKPTPQKQYHINPKAKPDIQIVINDLLKQGVLIQKESTMNTPVYPV




PKPNGRWRMVLDYRAVNKVTPLIAVQNQHSYGILGSLFKGRYKTTIDLSNGFWAHPIVPEDYWITAFTWQGKQYCWTVLPQGFLNSPGLFTGDVVDL




LQGIPNVEVYVDDVYISHDSEKEHLEYLDILFNRLKEAGYIISLKKSNIANSIVDFLGFQITNEGRGLTDTFKEKLENITAPTTLKQLQSILGLLNFARNFIPD




FTELIAPLYALIPKSTKNYVPWQIEHSTTLETLITKLNGAEYLQGRKGDKTLIMKVNASYTTGYIRYYNEGEKKPISYVSIVFSKTELKFTELEKLLTTVHKG




LLKALDLSMGQNIHVYSPIVSMQNIQKTPQTAKKALASRWLSWLSYLEDPRIRFFYDPQMPALKDLPAVDTGKDNKKHPSNFQHIFYTDGSAITSPTKE




GHLNAGMGIVYFINKDGNLQKQQEWSISLGNHTAQFAEIAAFEFALKKCLPLGGNILVVTDSNYVAKAYNEELDVWASNGFVNNRKKPLKHISKWKSV




ADLKRLRPDVVVTHEPGHQKLDSSPHAYGNNLADQLATQASFKVH





FFV_O93209_2mut (SEQ ID NO: 8013)
MDLLKPLTVERKGVKIKGYWNSQADITCVPKDLLQGEEPVRQQNVTTIHGTQEGDVYYVNLKIDGRRINTEVIGTTLDYAIITPGDVPWILKKPLELTIKLD

LEEQQGTLLNNSILSKKGKEELKQLFEKYSALWQSWENQVGHRRIRPHKIATGTVKPTPQKQYHINPKAKPDIQIVINDLLKQGVLIQKESTMNTPVYPV

PKPNGRWRMVLDYRAVNKVTPLIAVQNQHSYGILGSLFKGRYKTTIDLSNGFWAHPIVPEDYWITAFTWQGKQYCWTVLPQGFLNSPGLFNGDVVDL




LQGIPNVEVYVDDVYISHDSEKEHLEYLDILFNRLKEAGYIISLKKSNIANSIVDFLGFQITNEGRGLTDTFKEKLENITAPTTLKQLQSILGLLNFARNFIPD




FTELIAPLYALIPKSPKNYVPWQIEHSTTLETLITKLNGAEYLQGRKGDKTLIMKVNASYTTGYIRYYNEGEKKPISYVSIVFSKTELKFTELEKLLTTVHKG




LLKALDLSMGQNIHVYSPIVSMQNIQKTPQTAKKALASRWLSWLSYLEDPRIRFFYDPQMPALKDLPAVDTGKDNKKHPSNFQHIFYTDGSAITSPTKE




GHLNAGMGIVYFINKDGNLQKQQEWSISLGNHTAQFAEIAAFEFALKKCLPLGGNILVVTDSNYVAKAYNEELDVWASNGFVNNRKKPLKHISKWKSV




ADLKRLRPDVVVTHEPGHQKLDSSPHAYGNNLADQLATQASFKVH





FFV_O93209_2mutA (SEQ ID NO: 8014)
MDLLKPLTVERKGVKIKGYWNSQADITCVPKDLLQGEEPVRQQNVTTIHGTQEGDVYYVNLKIDGRRINTEVIGTTLDYAIITPGDVPWILKKPLELTIKLD

LEEQQGTLLNNSILSKKGKEELKQLFEKYSALWQSWENQVGHRRIRPHKIATGTVKPTPQKQYHINPKAKPDIQIVINDLLKQGVLIQKESTMNTPVYPV

PKPNGRWRMVLDYRAVNKVTPLIAVQNQHSYGILGSLFKGRYKTTIDLSNGFWAHPIVPEDYWITAFTWQGKQYCWTVLPQGFLNSPGLFNGDVVDL




LQGIPNVEVYVDDVYISHDSEKEHLEYLDILFNRLKEAGYIISLKKSNIANSIVDFLGFQITNEGRGLTDTFKEKLENITAPTTLKQLQSILGKLNFARNFIPD




FTELIAPLYALIPKSPKNYVPWQIEHSTTLETLITKLNGAEYLQGRKGDKTLIMKVNASYTTGYIRYYNEGEKKPISYVSIVFSKTELKFTELEKLLTTVHKG




LLKALDLSMGQNIHVYSPIVSMQNIQKTPQTAKKALASRWLSWLSYLEDPRIRFFYDPQMPALKDLPAVDTGKDNKKHPSNFQHIFYTDGSAITSPTKE




GHLNAGMGIVYFINKDGNLQKQQEWSISLGNHTAQFAEIAAFEFALKKCLPLGGNILVVTDSNYVAKAYNEELDVWASNGFVNNRKKPLKHISKWKSV




ADLKRLRPDVVVTHEPGHQKLDSSPHAYGNNLADQLATQASFKVH





FFV_O93209-Pro (SEQ ID NO: 8015)
VPWILKKPLELTIKLDLEEQQGTLLNNSILSKKGKEELKQLFEKYSALWQSWENQVGHRRIRPHKIATGTVKPTPQKQYHINPKAKPDIQIVINDLLKQGV

LIQKESTMNTPVYPVPKPNGRWRMVLDYRAVNKVTPLIAVQNQHSYGILGSLFKGRYKTTIDLSNGFWAHPIVPEDYWITAFTWQGKQYCWTVLPQGF

LNSPGLFTGDVVDLLQGIPNVEVYVDDVYISHDSEKEHLEYLDILFNRLKEAGYIISLKKSNIANSIVDFLGFQITNEGRGLTDTFKEKLENITAPTTLKQLQ




SILGLLNFARNFIPDFTELIAPLYALIPKSTKNYVPWQIEHSTTLETLITKLNGAEYLQGRKGDKTLIMKVNASYTTGYIRYYNEGEKKPISYVSIVFSKTELK




FTELEKLLTTVHKGLLKALDLSMGQNIHVYSPIVSMQNIQKTPQTAKKALASRWLSWLSYLEDPRIRFFYDPQMPALKDLPAVDTGKDNKKHPSNFQHI




FYTDGSAITSPTKEGHLNAGMGIVYFINKDGNLQKQQEWSISLGNHTAQFAEIAAFEFALKKCLPLGGNILVVTDSNYVAKAYNEELDVWASNGFVNNR




KKPLKHISKWKSVADLKRLRPDVVVTHEPGHQKLDSSPHAYGNNLADQLATQASFKVH





FFV_O93209-Pro_2mut (SEQ ID NO: 8016)
VPWILKKPLELTIKLDLEEQQGTLLNNSILSKKGKEELKQLFEKYSALWQSWENQVGHRRIRPHKIATGTVKPTPQKQYHINPKAKPDIQIVINDLLKQGV

LIQKESTMNTPVYPVPKPNGRWRMVLDYRAVNKVTPLIAVQNQHSYGILGSLFKGRYKTTIDLSNGFWAHPIVPEDYWITAFTWQGKQYCWTVLPQGF

LNSPGLFNGDVVDLLQGIPNVEVYVDDVYISHDSEKEHLEYLDILFNRLKEAGYIISLKKSNIANSIVDFLGFQITNEGRGLTDTFKEKLENITAPTTLKQLQ




SILGLLNFARNFIPDFTELIAPLYALIPKSPKNYVPWQIEHSTTLETLITKLNGAEYLQGRKGDKTLIMKVNASYTTGYIRYYNEGEKKPISYVSIVFSKTELK




FTELEKLLTTVHKGLLKALDLSMGQNIHVYSPIVSMQNIQKTPQTAKKALASRWLSWLSYLEDPRIRFFYDPQMPALKDLPAVDTGKDNKKHPSNFQHI




FYTDGSAITSPTKEGHLNAGMGIVYFINKDGNLQKQQEWSISLGNHTAQFAEIAAFEFALKKCLPLGGNILVVTDSNYVAKAYNEELDVWASNGFVNNR




KKPLKHISKWKSVADLKRLRPDVVVTHEPGHQKLDSSPHAYGNNLADQLATQASFKVH





FFV_O93209-Pro_2mutA (SEQ ID NO: 8017)
VPWILKKPLELTIKLDLEEQQGTLLNNSILSKKGKEELKQLFEKYSALWQSWENQVGHRRIRPHKIATGTVKPTPQKQYHINPKAKPDIQIVINDLLKQGV

LIQKESTMNTPVYPVPKPNGRWRMVLDYRAVNKVTPLIAVQNQHSYGILGSLFKGRYKTTIDLSNGFWAHPIVPEDYWITAFTWQGKQYCWTVLPQGF

LNSPGLFNGDVVDLLQGIPNVEVYVDDVYISHDSEKEHLEYLDILFNRLKEAGYIISLKKSNIANSIVDFLGFQITNEGRGLTDTFKEKLENITAPTTLKQLQ




SILGKLNFARNFIPDFTELIAPLYALIPKSPKNYVPWQIEHSTTLETLITKLNGAEYLQGRKGDKTLIMKVNASYTTGYIRYYNEGEKKPISYVSIVFSKTELK




FTELEKLLTTVHKGLLKALDLSMGQNIHVYSPIVSMQNIQKTPQTAKKALASRWLSWLSYLEDPRIRFFYDPQMPALKDLPAVDTGKDNKKHPSNFQHI




FYTDGSAITSPTKEGHLNAGMGIVYFINKDGNLQKQQEWSISLGNHTAQFAEIAAFEFALKKCLPLGGNILVVTDSNYVAKAYNEELDVWASNGFVNNR




KKPLKHISKWKSVADLKRLRPDVVVTHEPGHQKLDSSPHAYGNNLADQLATQASFKVH





FLV_P10273 (SEQ ID NO: 8018)
TLQLEEEYRLFEPESTQKQEMDIWLKNFPQAWAETGGMGTAHCQAPVLIQLKATATPISIRQYPMPHEAYQGIKPHIRRMLDQGILKPCQSPWNTPLLP

VKKPGTEDYRPVQDLREVNKRVEDIHPTVPNPYNLLSTLPPSHPWYTVLDLKDAFFCLRLHSESQLLFAFEWRDPEIGLSGQLTWTRLPQGFKNSPTL




FDEALHSDLADFRVRYPALVLLQYVDDLLLAAATRTECLEGTKALLETLGNKGYRASAKKAQICLQEVTYLGYSLKDGQRWLTKARKEAILSIPVPKNSR




QVREFLGTAGYCRLWIPGFAELAAPLYPLTRPGTLFQWGTEQQLAFEDIKKALLSSPALGLPDITKPFELFIDENSGFAKGVLVQKLGPWKRPVAYLSK




KLDTVASGWPPCLRMVAAIAILVKDAGKLTLGQPLTILTSHPVEALVRQPPNKWLSNARMTHYQAMLLDAERVHFGPTVSLNPATLLPLPSGGNHHDC




LQILAETHGTRPDLTDQPLPDADLTWYTDGSSFIRNGEREAGAAVTTESEVIWAAPLPPGTSAQRAELIALTQALKMAEGKKLTVYTDSRYAFATTHVH




GEIYRRRGLLTSEGKEIKNKNEILALLEALFLPKRLSIIHCPGHQKGDSPQAKGNRLADDTAKKAATETHSSLTVLP





FLV_P10273_3mut (SEQ ID NO: 8019)
TLQLEEEYRLFEPESTQKQEMDIWLKNFPQAWAETGGMGTAHCQAPVLIQLKATATPISIRQYPMPHEAYQGIKPHIRRMLDQGILKPCQSPWNTPLLP

VKKPGTEDYRPVQDLREVNKRVEDIHPTVPNPYNLLSTLPPSHPWYTVLDLKDAFFCLRLHSESQLLFAFEWRDPEIGLSGQLTWTRLPQGFKNSPTL

FNEALHSDLADFRVRYPALVLLQYVDDLLLAAATRTECLEGTKALLETLGNKGYRASAKKAQICLQEVTYLGYSLKDGQRWLTKARKEAILSIPVPKNSR




QVREFLGTAGYCRLWIPGFAELAAPLYPLTRPGTLFQWGTEQQLAFEDIKKALLSSPALGLPDITKPFELFIDENSGFAKGVLVQKLGPWKRPVAYLSK




KLDTVASGWPPCLRMVAAIAILVKDAGKLTLGQPLTILTSHPVEALVRQPPNKWLSNARMTHYQAMLLDAERVHFGPTVSLNPATLLPLPSGGNHHDC




LQILAETHGTRPDLTDQPLPDADLTWYTDGSSFIRNGEREAGAAVTTESEVIWAAPLPPGTSAQRAELIALTQALKMAEGKKLTVYTDSRYAFATTHVH




GEIYRRRGWLTSEGKEIKNKNEILALLEALFLPKRLSIIHCPGHQKGDSPQAKGNRLADDTAKKAATETHSSLTVLP





FLV_P10273_3mutA (SEQ ID NO: 8020)
TLQLEEEYRLFEPESTQKQEMDIWLKNFPQAWAETGGMGTAHCQAPVLIQLKATATPISIRQYPMPHEAYQGIKPHIRRMLDQGILKPCQSPWNTPLLP

VKKPGTEDYRPVQDLREVNKRVEDIHPTVPNPYNLLSTLPPSHPWYTVLDLKDAFFCLRLHSESQLLFAFEWRDPEIGLSGQLTWTRLPQGFKNSPTL

FNEALHSDLADFRVRYPALVLLQYVDDLLLAAATRTECLEGTKALLETLGNKGYRASAKKAQICLQEVTYLGYSLKDGQRWLTKARKEAILSIPVPKNSR




QVREFLGKAGYCRLFIPGFAELAAPLYPLTRPGTLFQWGTEQQLAFEDIKKALLSSPALGLPDITKPFELFIDENSGFAKGVLVQKLGPWKRPVAYLSKK




LDTVASGWPPCLRMVAAIAILVKDAGKLTLGQPLTILTSHPVEALVRQPPNKWLSNARMTHYQAMLLDAERVHFGPTVSLNPATLLPLPSGGNHHDCL




QILAETHGTRPDLTDQPLPDADLTWYTDGSSFIRNGEREAGAAVTTESEVIWAAPLPPGTSAQRAELIALTQALKMAEGKKLTVYTDSRYAFATTHVHG




EIYRRRGWLTSEGKEIKNKNEILALLEALFLPKRLSIIHCPGHQKGDSPQAKGNRLADDTAKKAATETHSSLTVLP





FOAMV_P14350 (SEQ ID NO: 8021)
MNPLQLLQPLPAEIKGTKLLAHWNSGATITCIPESFLEDEQPIKKTLIKTIHGEKQQNVYYVTFKVKGRKVEAEVIASPYEYILLSPTDVPWLTQQPLQLTIL

VPLQEYQEKILSKTALPEDQKQQLKTLFVKYDNLWQHWENQVGHRKIRPHNIATGDYPPRPQKQYPINPKAKPSIQIVIDDLLKQGVLTPQNSTMNTPV




YPVPKPDGRWRMVLDYREVNKTIPLTAAQNQHSAGILATIVRQKYKTTLDLANGFWAHPITPESYWLTAFTWQGKQYCWTRLPQGFLNSPALFTADV




VDLLKEIPNVQVYVDDIYLSHDDPKEHVQQLEKVFQILLQAGYVVSLKKSEIGQKTVEFLGFNITKEGRGLTDTFKTKLLNITPPKDLKQLQSILGLLNFAR




NFIPNFAELVQPLYNLIASAKGKYIEWSEENTKQLNMVIEALNTASNLEERLPEQRLVIKVNTSPSAGYVRYYNETGKKPIMYLNYVFSKAELKFSMLEKL




LTTMHKALIKAMDLAMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMTYLEDPRIQFHYDKTLPELKHIPDVYTSSQSPVKHPSQYEGVFYTDGSAI




KSPDPTKSNNAGMGIVHATYKPEYQVLNQWSIPLGNHTAQMAEIAAVEFACKKALKIPGPVLVITDSFYVAESANKELPYWKSNGFVNNKKKPLKHISK




WKSIAECLSMKPDITIQHEKGISLQIPVFILKGNALADKLATQGSYVVN





FOAMV_P14350_2mut (SEQ ID NO: 8022)
MNPLQLLQPLPAEIKGTKLLAHWNSGATITCIPESFLEDEQPIKKTLIKTIHGEKQQNVYYVTFKVKGRKVEAEVIASPYEYILLSPTDVPWLTQQPLQLTIL

VPLQEYQEKILSKTALPEDQKQQLKTLFVKYDNLWQHWENQVGHRKIRPHNIATGDYPPRPQKQYPINPKAKPSIQIVIDDLLKQGVLTPQNSTMNTPV

YPVPKPDGRWRMVLDYREVNKTIPLTAAQNQHSAGILATIVRQKYKTTLDLANGFWAHPITPESYWLTAFTWQGKQYCWTRLPQGFLNSPALFNADV




VDLLKEIPNVQVYVDDIYLSHDDPKEHVQQLEKVFQILLQAGYVVSLKKSEIGQKTVEFLGFNITKEGRGLTDTFKTKLLNITPPKDLKQLQSILGLLNFAR




NFIPNFAELVQPLYNLIAPAKGKYIEWSEENTKQLNMVIEALNTASNLEERLPEQRLVIKVNTSPSAGYVRYYNETGKKPIMYLNYVFSKAELKFSMLEKL




LTTMHKALIKAMDLAMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMTYLEDPRIQFHYDKTLPELKHIPDVYTSSQSPVKHPSQYEGVFYTDGSAI




KSPDPTKSNNAGMGIVHATYKPEYQVLNQWSIPLGNHTAQMAEIAAVEFACKKALKIPGPVLVITDSFYVAESANKELPYWKSNGFVNNKKKPLKHISK




WKSIAECLSMKPDITIQHEKGISLQIPVFILKGNALADKLATQGSYVVN





FOAMV_P14350_2mutA (SEQ ID NO: 8023)
MNPLQLLQPLPAEIKGTKLLAHWNSGATITCIPESFLEDEQPIKKTLIKTIHGEKQQNVYYVTFKVKGRKVEAEVIASPYEYILLSPTDVPWLTQQPLQLTIL

VPLQEYQEKILSKTALPEDQKQQLKTLFVKYDNLWQHWENQVGHRKIRPHNIATGDYPPRPQKQYPINPKAKPSIQIVIDDLLKQGVLTPQNSTMNTPV

YPVPKPDGRWRMVLDYREVNKTIPLTAAQNQHSAGILATIVRQKYKTTLDLANGFWAHPITPESYWLTAFTWQGKQYCWTRLPQGFLNSPALFNADV




VDLLKEIPNVQVYVDDIYLSHDDPKEHVQQLEKVFQILLQAGYVVSLKKSEIGQKTVEFLGFNITKEGRGLTDTFKTKLLNITPPKDLKQLQSILGKLNFAR




NFIPNFAELVQPLYNLIAPAKGKYIEWSEENTKQLNMVIEALNTASNLEERLPEQRLVIKVNTSPSAGYVRYYNETGKKPIMYLNYVFSKAELKFSMLEKL




LTTMHKALIKAMDLAMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMTYLEDPRIQFHYDKTLPELKHIPDVYTSSQSPVKHPSQYEGVFYTDGSAI




KSPDPTKSNNAGMGIVHATYKPEYQVLNQWSIPLGNHTAQMAEIAAVEFACKKALKIPGPVLVITDSFYVAESANKELPYWKSNGFVNNKKKPLKHISK




WKSIAECLSMKPDITIQHEKGISLQIPVFILKGNALADKLATQGSYVVN





FOAMV_P14350-Pro (SEQ ID NO: 8024)
VPWLTQQPLQLTILVPLQEYQEKILSKTALPEDQKQQLKTLFVKYDNLWQHWENQVGHRKIRPHNIATGDYPPRPQKQYPINPKAKPSIQIVIDDLLKQG

VLTPQNSTMNTPVYPVPKPDGRWRMVLDYREVNKTIPLTAAQNQHSAGILATIVRQKYKTTLDLANGFWAHPITPESYWLTAFTWQGKQYCWTRLPQ

GFLNSPALFTADVVDLLKEIPNVQVYVDDIYLSHDDPKEHVQQLEKVFQILLQAGYVVSLKKSEIGQKTVEFLGFNITKEGRGLTDTFKTKLLNITPPKDLK




QLQSILGLLNFARNFIPNFAELVQPLYNLIASAKGKYIEWSEENTKQLNMVIEALNTASNLEERLPEQRLVIKVNTSPSAGYVRYYNETGKKPIMYLNYVF




SKAELKFSMLEKLLTTMHKALIKAMDLAMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMTYLEDPRIQFHYDKTLPELKHIPDVYTSSQSPVKHPS




QYEGVFYTDGSAIKSPDPTKSNNAGMGIVHATYKPEYQVLNQWSIPLGNHTAQMAEIAAVEFACKKALKIPGPVLVITDSFYVAESANKELPYWKSNGF




VNNKKKPLKHISKWKSIAECLSMKPDITIQHEKGISLQIPVFILKGNALADKLATQGSYVVN





FOAMV_P14350-Pro_2mut (SEQ ID NO: 8025)
VPWLTQQPLQLTILVPLQEYQEKILSKTALPEDQKQQLKTLFVKYDNLWQHWENQVGHRKIRPHNIATGDYPPRPQKQYPINPKAKPSIQIVIDDLLKQG

VLTPQNSTMNTPVYPVPKPDGRWRMVLDYREVNKTIPLTAAQNQHSAGILATIVRQKYKTTLDLANGFWAHPITPESYWLTAFTWQGKQYCWTRLPQ

GFLNSPALFNADVVDLLKEIPNVQVYVDDIYLSHDDPKEHVQQLEKVFQILLQAGYVVSLKKSEIGQKTVEFLGFNITKEGRGLTDTFKTKLLNITPPKDL




KQLQSILGLLNFARNFIPNFAELVQPLYNLIAPAKGKYIEWSEENTKQLNMVIEALNTASNLEERLPEQRLVIKVNTSPSAGYVRYYNETGKKPIMYLNYV




FSKAELKFSMLEKLLTTMHKALIKAMDLAMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMTYLEDPRIQFHYDKTLPELKHIPDVYTSSQSPVKHP




SQYEGVFYTDGSAIKSPDPTKSNNAGMGIVHATYKPEYQVLNQWSIPLGNHTAQMAEIAAVEFACKKALKIPGPVLVITDSFYVAESANKELPYWKSNG




FVNNKKKPLKHISKWKSIAECLSMKPDITIQHEKGISLQIPVFILKGNALADKLATQGSYVVN





FOAMV_P14350-Pro_2mutA (SEQ ID NO: 8026)
VPWLTQQPLQLTILVPLQEYQEKILSKTALPEDQKQQLKTLFVKYDNLWQHWENQVGHRKIRPHNIATGDYPPRPQKQYPINPKAKPSIQIVIDDLLKQG

VLTPQNSTMNTPVYPVPKPDGRWRMVLDYREVNKTIPLTAAQNQHSAGILATIVRQKYKTTLDLANGFWAHPITPESYWLTAFTWQGKQYCWTRLPQ

GFLNSPALFNADVVDLLKEIPNVQVYVDDIYLSHDDPKEHVQQLEKVFQILLQAGYVVSLKKSEIGQKTVEFLGFNITKEGRGLTDTFKTKLLNITPPKDL




KQLQSILGKLNFARNFIPNFAELVQPLYNLIAPAKGKYIEWSEENTKQLNMVIEALNTASNLEERLPEQRLVIKVNTSPSAGYVRYYNETGKKPIMYLNYV




FSKAELKFSMLEKLLTTMHKALIKAMDLAMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMTYLEDPRIQFHYDKTLPELKHIPDVYTSSQSPVKHP




SQYEGVFYTDGSAIKSPDPTKSNNAGMGIVHATYKPEYQVLNQWSIPLGNHTAQMAEIAAVEFACKKALKIPGPVLVITDSFYVAESANKELPYWKSNG




FVNNKKKPLKHISKWKSIAECLSMKPDITIQHEKGISLQIPVFILKGNALADKLATQGSYVVN





GALV_P21414 (SEQ ID NO: 8027)
VLNLEEEYRLHEKPVPSSIDPSWLQLFPTVWAERAGMGLANQVPPVVVELRSGASPVAVRQYPMSKEAREGIRPHIQKFLDLGVLVPCRSPWNTPLL

PVKKPGTNDYRPVQDLREINKRVQDIHPTVPNPYNLLSSLPPSYTWYSVLDLKDAFFCLRLHPNSQPLFAFEWKDPEKGNTGQLTWTRLPQGFKNSP




TLFDEALHRDLAPFRALNPQVVLLQYVDDLLVAAPTYEDCKKGTQKLLQELSKLGYRVSAKKAQLCQREVTYLGYLLKEGKRWLTPARKATVMKIPVP




TTPRQVREFLGTAGFCRLWIPGFASLAAPLYPLTKESIPFIWTEEHQQAFDHIKKALLSAPALALPDLTKPFTLYIDERAGVARGVLTQTLGPWRRPVAY




LSKKLDPVASGWPTCLKAVAAVALLLKDADKLTLGQNVTVIASHSLESIVRQPPDRWMTNARMTHYQSLLLNERVSFAPPAVLNPATLLPVESEATPVH




RCSEILAEETGTRRDLEDQPLPGVPTWYTDGSSFITEGKRRAGAPIVDGKRTVWASSLPEGTSAQKAELVALTQALRLAEGKNINIYTDSRYAFATAHIH




GAIYKQRGLLTSAGKDIKNKEEILALLEAIHLPRRVAIIHCPGHQRGSNPVATGNRRADEAAKQAALSTRVLAGTTKP





GALV_P21414_3mut (SEQ ID NO: 8028)
VLNLEEEYRLHEKPVPSSIDPSWLQLFPTVWAERAGMGLANQVPPVVVELRSGASPVAVRQYPMSKEAREGIRPHIQKFLDLGVLVPCRSPWNTPLL

PVKKPGTNDYRPVQDLREINKRVQDIHPTVPNPYNLLSSLPPSYTWYSVLDLKDAFFCLRLHPNSQPLFAFEWKDPEKGNTGQLTWTRLPQGFKNSP

TLFNEALHRDLAPFRALNPQVVLLQYVDDLLVAAPTYEDCKKGTQKLLQELSKLGYRVSAKKAQLCQREVTYLGYLLKEGKRWLTPARKATVMKIPVP




TTPRQVREFLGTAGFCRLWIPGFASLAAPLYPLTKPSIPFIWTEEHQQAFDHIKKALLSAPALALPDLTKPFTLYIDERAGVARGVLTQTLGPWRRPVAY




LSKKLDPVASGWPTCLKAVAAVALLLKDADKLTLGQNVTVIASHSLESIVRQPPDRWMTNARMTHYQSLLLNERVSFAPPAVLNPATLLPVESEATPVH




RCSEILAEETGTRRDLEDQPLPGVPTWYTDGSSFITEGKRRAGAPIVDGKRTVWASSLPEGTSAQKAELVALTQALRLAEGKNINIYTDSRYAFATAHIH




GAIYKQRGWLTSAGKDIKNKEEILALLEAIHLPRRVAIIHCPGHQRGSNPVATGNRRADEAAKQAALSTRVLAGTTKP





GALV_P21414_3mutA (SEQ ID NO: 8029)
VLNLEEEYRLHEKPVPSSIDPSWLQLFPTVWAERAGMGLANQVPPVVVELRSGASPVAVRQYPMSKEAREGIRPHIQKFLDLGVLVPCRSPWNTPLL

PVKKPGTNDYRPVQDLREINKRVQDIHPTVPNPYNLLSSLPPSYTWYSVLDLKDAFFCLRLHPNSQPLFAFEWKDPEKGNTGQLTWTRLPQGFKNSP

TLFNEALHRDLAPFRALNPQVVLLQYVDDLLVAAPTYEDCKKGTQKLLQELSKLGYRVSAKKAQLCQREVTYLGYLLKEGKRWLTPARKATVMKIPVP




TTPRQVREFLGKAGFCRLFIPGFASLAAPLYPLTKPSIPFIWTEEHQQAFDHIKKALLSAPALALPDLTKPFTLYIDERAGVARGVLTQTLGPWRRPVAYL




SKKLDPVASGWPTCLKAVAAVALLLKDADKLTLGQNVTVIASHSLESIVRQPPDRWMTNARMTHYQSLLLNERVSFAPPAVLNPATLLPVESEATPVH




RCSEILAEETGTRRDLEDQPLPGVPTWYTDGSSFITEGKRRAGAPIVDGKRTVWASSLPEGTSAQKAELVALTQALRLAEGKNINIYTDSRYAFATAHIH




GAIYKQRGWLTSAGKDIKNKEEILALLEAIHLPRRVAIIHCPGHQRGSNPVATGNRRADEAAKQAALSTRVLAGTTKP





HTL1A_P03362
8,030
AVLGLEHLPRPPQISQFPLNPERLQALQHLVRKALEAGHIEPYTGPGNNPVFPVKKANGTWRFIHDLRATNSLTIDLSSSSPGPPDLSSLPTTLAHLQTI

DLRDAFFQIPLPKQFQPYFAFTVPQQCNYGPGTRYAWKVLPQGFKNSPTLFEMQLAHILQPIRQAFPQCTILQYMDDILLASPSHEDLLLLSEATMASLI




SHGLPVSENKTQQTPGTIKFLGQIISPNHLTYDAVPTVPIRSRWALPELQALLGEIQWVSKGTPTLRQPLHSLYCALQRHTDPRDQIYLNPSQVQSLVQL




RQALSQNCRSRLVQTLPLLGAIMLTLTGTTTVVFQSKEQWPLVWLHAPLPHTSQCPWGQLLASAVLLLDKYTLQSYGLLCQTIHHNISTQTFNQFIQTS




DHPSVPILLHHSHRFKNLGAQTGELWNTFLKTAAPLAPVKALMPVFTLSPVIINTAPCLFSDGSTSRAAYILWDKQILSQRSFPLPPPHKSAQRAELLGLL




HGLSSARSWRCLNIFLDSKYLYHYLRTLALGTFQGRSSQAPFQALLPRLLSRKVVYLHHVRSHTNLPDPISRLNALTDALLITPVLQL





HTL1A_P03362_2mut
8,031
AVLGLEHLPRPPQISQFPLNPERLQALQHLVRKALEAGHIEPYTGPGNNPVFPVKKANGTWRFIHDLRATNSLTIDLSSSSPGPPDLSSLPTTLAHLQTI

DLRDAFFQIPLPKQFQPYFAFTVPQQCNYGPGTRYAWKVLPQGFKNSPTLFQMQLAHILQPIRQAFPQCTILQYMDDILLASPSHEDLLLLSEATMASLI

SHGLPVSENKTQQTPGTIKFLGQIISPNHLTYDAVPTVPIRSRWALPELQALLGEIQWVSKGTPTLRQPLHSLYCALQPHTDPRDQIYLNPSQVQSLVQL




RQALSQNCRSRLVQTLPLLGAIMLTLTGTTTVVFQSKEQWPLVWLHAPLPHTSQCPWGQLLASAVLLLDKYTLQSYGLLCQTIHHNISTQTFNQFIQTS




DHPSVPILLHHSHRFKNLGAQTGELWNTFLKTAAPLAPVKALMPVFTLSPVIINTAPCLFSDGSTSRAAYILWDKQILSQRSFPLPPPHKSAQRAELLGLL




HGLSSARSWRCLNIFLDSKYLYHYLRTLALGTFQGRSSQAPFQALLPRLLSRKVVYLHHVRSHTNLPDPISRLNALTDALLITPVLQL





HTL1A_P03362_2mutB
8,032
AVLGLEHLPRPPQISQFPLNPERLQALQHLVRKALEAGHIEPYTGPGNNPVFPVKKANGTWRFIHDLRATNSLTIDLSSSSPGPPDLSSPPTTLAHLQTI

DLRDAFFQIPLPKQFQPYFAFTVPQQCNYGPGTRYAWKVLPQGFKNSPTLFQMQLAHILQPIRQAFPQCTILQYMDDILLASPSHEDLLLLSEATMASLI

SHGLPVSENKTQQTPGTIKFLGQIISPNHLTYDAVPTVPIRSRWALPELQALLGEIQWVSKGTPTLRQPLHSLYCALQPHTDPRDQIYLNPSQVQSLVQL




RQALSQNCRSRLVQTLPLLGAIMLTLTGTTTVVFQSKEQWPLVWLHAPLPHTSQCPWGQLLASAVLLLDKYTLQSYGLLCQTIHHNISTQTFNQFIQTS




DHPSVPILLHHSHRFKNLGAQTGELWNTFLKTAAPLAPVKALMPVFTLSPVIINTAPCLFSDGSTSRAAYILWDKQILSQRSFPLPPPHKSAQRAELLGLL




HGLSSARSWRCLNIFLDSKYLYHYLRTLALGTFQGRSSQAPFQALLPRLLSRKVVYLHHVRSHTNLPDPISRLNALTDALLITPVLQL





HTL1C_P14078
8,033
AVLGLEHLPRPPEISQFPLNPERLQALQHLVRKALEAGHIEPYTGPGNNPVFPVKKANGTWRFIHDLRATNSLTIDLSSSSPGPPDLSSLPTTLAHLQTI

DLKDAFFQIPLPKQFQPYFAFTVPQQCNYGPGTRYAWRVLPQGFKNSPTLFEMQLAHILQPIRQAFPQCTILQYMDDILLASPSHADLQLLSEATMASLI




SHGLPVSENKTQQTPGTIKFLGQIISPNHLTYDAVPKVPIRSRWALPELQALLGEIQWVSKGTPTLRQPLHSLYCALQRHTDPRDQIYLNPSQVQSLVQL




RQALSQNCRSRLVQTLPLLGAIMLTLTGTTTVVFQSKQQWPLVWLHAPLPHTSQCPWGQLLASAVLLLDKYTLQSYGLLCQTIHHNISTQTFNQFIQTS




DHPSVPILLHHSHRFKNLGAQTGELWNTFLKTTAPLAPVKALMPVFTLSPVIINTAPCLFSDGSTSQAAYILWDKHILSQRSFPLPPPHKSAQRAELLGLL




HGLSSARSWRCLNIFLDSKYLYHYLRTLALGTFQGRSSQAPFQALLPRLLSRKVVYLHHVRSHTNLPDPISRLNALTDALLITPVLQL





HTL1C_P14078_2mut
8,034
AVLGLEHLPRPPEISQFPLNPERLQALQHLVRKALEAGHIEPYTGPGNNPVFPVKKANGTWRFIHDLRATNSLTIDLSSSSPGPPDLSSLPTTLAHLQTI

DLKDAFFQIPLPKQFQPYFAFTVPQQCNYGPGTRYAWRVLPQGFKNSPTLFQMQLAHILQPIRQAFPQCTILQYMDDILLASPSHADLQLLSEATMASLI

SHGLPVSENKTQQTPGTIKFLGQIISPNHLTYDAVPKVPIRSRWALPELQALLGEIQWVSKGTPTLRQPLHSLYCALQPHTDPRDQIYLNPSQVQSLVQL




RQALSQNCRSRLVQTLPLLGAIMLTLTGTTTVVFQSKQQWPLVWLHAPLPHTSQCPWGQLLASAVLLLDKYTLQSYGLLCQTIHHNISTQTFNQFIQTS




DHPSVPILLHHSHRFKNLGAQTGELWNTFLKTTAPLAPVKALMPVFTLSPVIINTAPCLFSDGSTSQAAYILWDKHILSQRSFPLPPPHKSAQRAELLGLL




HGLSSARSWRCLNIFLDSKYLYHYLRTLALGTFQGRSSQAPFQALLPRLLSRKVVYLHHVRSHTNLPDPISRLNALTDALLITPVLQL





HTL1L_P0C211
8,035
GLEHLPRPPEISQFPLNPERLQALQHLVRKALEAGHIEPYTGPGNNPVFPVKKANGTWRFIHDLRATNSLTVDLSSSSPGPPDLSSLPTTLAHLQTIDLK

DAFFQIPLPKQFQPYFAFTVPQQCNYGPGTRYAWKVLPQGFKNSPTLFEMQLASILQPIRQAFPQCVILQYMDDILLASPSPEDLQQLSEATMASLISH




GLPVSQDKTQQTPGTIKFLGQIISPNHITYDAVPTVPIRSRWALPELQALLGEIQWVSKGTPTLRQPLHSLYCALQGHTDPRDQIYLNPSQVQSLMQLQ




QALSQNCRSRLAQTLPLLGAIMLTLTGTTTVVFQSKQQWPLVWLHAPLPHTSQCPWGQLLASAVLLLDKYTLQSYGLLCQTIHHNISIQTFNQFIQTSD




HPSVPILLHHSHRFKNLGAQTGELWNTFLKTAAPLAPVKALTPVFTLSPIIINTAPCLFSDGSTSQAAYILWDKHILSQRSFPLPPPHKSAQQAELLGLLH




GLSSARSWHCLNIFLDSKYLYHYLRTLALGTFQGKSSQAPFQALLPRLLAHKVIYLHHVRSHTNLPDPISKLNALTDALLITPIL





HTL1L_P0C211_2mut
8,036
GLEHLPRPPEISQFPLNPERLQALQHLVRKALEAGHIEPYTGPGNNPVFPVKKANGTWRFIHDLRATNSLTVDLSSSSPGPPDLSSLPTTLAHLQTIDLK

DAFFQIPLPKQFQPYFAFTVPQQCNYGPGTRYAWKVLPQGFKNSPTLFQMQLASILQPIRQAFPQCVILQYMDDILLASPSPEDLQQLSEATMASLISH

GLPVSQDKTQQTPGTIKFLGQIISPNHITYDAVPTVPIRSRWALPELQALLGEIQWVSKGTPTLRQPLHSLYCALQGHTDPRDQIYLNPSQVQSLMQLQ




QALSQNCRSRLAQTLPLLGAIMLTLTGTTTVVFQSKQQWPLVWLHAPLPHTSQCPWGQLLASAVLLLDKYTLQSYGLLCQTIHHNISIQTFNQFIQTSD




HPSVPILLHHSHRFKNLGAQTGELWNTFLKTAAPLAPVKALTPVFTLSPIIINTAPCLFSDGSTSQAAYILWDKHILSQRSFPLPPPHKSAQQAELLGLLH




GLSSARSWHCLNIFLDSKYLYHYLRTLAWGTFQGKSSQAPFQALLPRLLAHKVIYLHHVRSHTNLPDPISKLNALTDALLITPIL





HTL1L_P0C211_2mutB
8,037
GLEHLPRPPEISQFPLNPERLQALQHLVRKALEAGHIEPYTGPGNNPVFPVKKANGTWRFIHDLRATNSLTVDLSSSSPGPPDLSSPPTTLAHLQTIDLK

DAFFQIPLPKQFQPYFAFTVPQQCNYGPGTRYAWKVLPQGFKNSPTLFQMQLASILQPIRQAFPQCVILQYMDDILLASPSPEDLQQLSEATMASLISH

GLPVSQDKTQQTPGTIKFLGQIISPNHITYDAVPTVPIRSRWALPELQALLGEIQWVSKGTPTLRQPLHSLYCALQGHTDPRDQIYLNPSQVQSLMQLQ




QALSQNCRSRLAQTLPLLGAIMLTLTGTTTVVFQSKQQWPLVWLHAPLPHTSQCPWGQLLASAVLLLDKYTLQSYGLLCQTIHHNISIQTFNQFIQTSD




HPSVPILLHHSHRFKNLGAQTGELWNTFLKTAAPLAPVKALTPVFTLSPIIINTAPCLFSDGSTSQAAYILWDKHILSQRSFPLPPPHKSAQQAELLGLLH




GLSSARSWHCLNIFLDSKYLYHYLRTLAWGTFQGKSSQAPFQALLPRLLAHKVIYLHHVRSHTNLPDPISKLNALTDALLITPIL





HTL32_Q0R5R2
8,038
GLEHLPPPPEVSQFPLNPERLQALTDLVSRALEAKHIEPYQGPGNNPIFPVKKPNGKWRFIHDLRATNSVTRDLASPSPGPPDLTSLPQGLPHLRTIDLT

DAFFQIPLPTIFQPYFAFTLPQPNNYGPGTRYSWRVLPQGFKNSPTLFEQQLSHILTPVRKTFPNSLIIQYMDDILLASPAPGELAALTDKVTNALTKEGL




PLSPEKTQATPGPIHFLGQVISQDCITYETLPSINVKSTWSLAELQSMLGELQWVSKGTPVLRSSLHQLYLALRGHRDPRDTIKLTSIQVQALRTIQKALT




LNCRSRLVNQLPILALIMLRPTGTTAVLFQTKQKWPLVWLHTPHPATSLRPWGQLLANAVIILDKYSLQHYGQVCKSFHHNISNQALTYYLHTSDQSSV




AILLQHSHRFHNLGAQPSGPWRSLLQMPQIFQNIDVLRPPFTISPVVINHAPCLFSDGSASKAAFIIWDRQVIHQQVLSLPSTCSAQAGELFGLLAGLQK




SQPWVALNIFLDSKFLIGHLRRMALGAFPGPSTQCELHTQLLPLLQGKTVYVHHVRSHTLLQDPISRLNEATDALMLAPLLPL





HTL32_Q0R5R2_2mut
8,039
GLEHLPPPPEVSQFPLNPERLQALTDLVSRALEAKHIEPYQGPGNNPIFPVKKPNGKWRFIHDLRATNSVTRDLASPSPGPPDLTSLPQGLPHLRTIDLT

DAFFQIPLPTIFQPYFAFTLPQPNNYGPGTRYSWRVLPQGFKNSPTLFQQQLSHILTPVRKTFPNSLIIQYMDDILLASPAPGELAALTDKVTNALTKEGL

PLSPEKTQATPGPIHFLGQVISQDCITYETLPSINVKSTWSLAELQSMLGELQWVSKGTPVLRSSLHQLYLALRGHRDPRDTIKLTSIQVQALRTIQKALT




LNCRSRLVNQLPILALIMLRPTGTTAVLFQTKQKWPLVWLHTPHPATSLRPWGQLLANAVIILDKYSLQHYGQVCKSFHHNISNQALTYYLHTSDQSSV




AILLQHSHRFHNLGAQPSGPWRSLLQMPQIFQNIDVLRPPFTISPVVINHAPCLFSDGSASKAAFIIWDRQVIHQQVLSLPSTCSAQAGELFGLLAGLQK




SQPWVALNIFLDSKFLIGHLRRMAWGAFPGPSTQCELHTQLLPLLQGKTVYVHHVRSHTLLQDPISRLNEATDALMLAPLLPL





HTL32_Q0R5R2_2mutB
8,040
GLEHLPPPPEVSQFPLNPERLQALTDLVSRALEAKHIEPYQGPGNNPIFPVKKPNGKWRFIHDLRATNSVTRDLASPSPGPPDLTSPPQGLPHLRTIDL

TDAFFQIPLPTIFQPYFAFTLPQPNNYGPGTRYSWRVLPQGFKNSPTLFQQQLSHILTPVRKTFPNSLIIQYMDDILLASPAPGELAALTDKVTNALTKEG

LPLSPEKTQATPGPIHFLGQVISQDCITYETLPSINVKSTWSLAELQSMLGELQWVSKGTPVLRSSLHQLYLALRGHRDPRDTIKLTSIQVQALRTIQKAL




TLNCRSRLVNQLPILALIMLRPTGTTAVLFQTKQKWPLVWLHTPHPATSLRPWGQLLANAVIILDKYSLQHYGQVCKSFHHNISNQALTYYLHTSDQSS




VAILLQHSHRFHNLGAQPSGPWRSLLQMPQIFQNIDVLRPPFTISPVVINHAPCLFSDGSASKAAFIIWDRQVIHQQVLSLPSTCSAQAGELFGLLAGLQ




KSQPWVALNIFLDSKFLIGHLRRMAWGAFPGPSTQCELHTQLLPLLQGKTVYVHHVRSHTLLQDPISRLNEATDALMLAPLLPL





HTL3P_Q4U0X6
8,041
GLEHLPPPPEVSQFPLNPERLQALTDLVSRALEAKHIEPYQGPGNNPIFPVKKPNGKWRFIHDLRATNSLTRDLASPSPGPPDLTSLPQDLPHLRTIDLT

DAFFQIPLPAVFQPYFAFTLPQPNNHGPGTRYSWRVLPQGFKNSPTLFEQQLSHILAPVRKAFPNSLIIQYMDDILLASPALRELTALTDKVTNALTKEGL




PMSLEKTQATPGSIHFLGQVISPDCITYETLPSIHVKSIWSLAELQSMLGELQWVSKGTPVLRSSLHQLYLALRGHRDPRDTIELTSTQVQALKTIQKALA




LNCRSRLVSQLPILALIILRPTGTTAVLFQTKQKWPLVWLHTPHPATSLRPWGQLLANAIITLDKYSLQHYGQICKSFHHNISNQALTYYLHTSDQSSVAIL




LQHSHRFHNLGAQPSGPWRSLLQVPQIFQNIDVLRPPFIISPVVIDHAPCLFSDGATSKAAFILWDKQVIHQQVLPLPSTCSAQAGELFGLLAGLQKSKP




WPALNIFLDSKFLIGHLRRMALGAFLGPSTQCDLHARLFPLLQGKTVYVHHVRSHTLLQDPISRLNEATDALMLAPLLPL





HTL3P_Q4U0X6_2mut
8,042
GLEHLPPPPEVSQFPLNPERLQALTDLVSRALEAKHIEPYQGPGNNPIFPVKKPNGKWRFIHDLRATNSLTRDLASPSPGPPDLTSLPQDLPHLRTIDLT

DAFFQIPLPAVFQPYFAFTLPQPNNHGPGTRYSWRVLPQGFKNSPTLFQQQLSHILAPVRKAFPNSLIIQYMDDILLASPALRELTALTDKVTNALTKEG

LPMSLEKTQATPGSIHFLGQVISPDCITYETLPSIHVKSIWSLAELQSMLGELQWVSKGTPVLRSSLHQLYLALRGHRDPRDTIELTSTQVQALKTIQKAL




ALNCRSRLVSQLPILALIILRPTGTTAVLFQTKQKWPLVWLHTPHPATSLRPWGQLLANAIITLDKYSLQHYGQICKSFHHNISNQALTYYLHTSDQSSVAI




LLQHSHRFHNLGAQPSGPWRSLLQVPQIFQNIDVLRPPFIISPVVIDHAPCLFSDGATSKAAFILWDKQVIHQQVLPLPSTCSAQAGELFGLLAGLQKSK




PWPALNIFLDSKFLIGHLRRMAWGAFLGPSTQCDLHARLFPLLQGKTVYVHHVRSHTLLQDPISRLNEATDALMLAPLLPL





HTL3P_Q4U0X6_2mutB
8,043
GLEHLPPPPEVSQFPLNPERLQALTDLVSRALEAKHIEPYQGPGNNPIFPVKKPNGKWRFIHDLRATNSLTRDLASPSPGPPDLTSPPQDLPHLRTIDLT

DAFFQIPLPAVFQPYFAFTLPQPNNHGPGTRYSWRVLPQGFKNSPTLFQQQLSHILAPVRKAFPNSLIIQYMDDILLASPALRELTALTDKVTNALTKEG

LPMSLEKTQATPGSIHFLGQVISPDCITYETLPSIHVKSIWSLAELQSMLGELQWVSKGTPVLRSSLHQLYLALRGHRDPRDTIELTSTQVQALKTIQKAL




ALNCRSRLVSQLPILALIILRPTGTTAVLFQTKQKWPLVWLHTPHPATSLRPWGQLLANAIITLDKYSLQHYGQICKSFHHNISNQALTYYLHTSDQSSVAI




LLQHSHRFHNLGAQPSGPWRSLLQVPQIFQNIDVLRPPFIISPVVIDHAPCLFSDGATSKAAFILWDKQVIHQQVLPLPSTCSAQAGELFGLLAGLQKSK




PWPALNIFLDSKFLIGHLRRMAWGAFLGPSTQCDLHARLFPLLQGKTVYVHHVRSHTLLQDPISRLNEATDALMLAPLLPL





HTLV2_P03363_2mut
8,044
HLPPPPQVDQFPLNLPERLQALNDLVSKALEAGHIEPYSGPGNNPVFPVKKPNGKWRFIHDLRATNAITTTLTSPSPGPPDLTSLPTALPHLQTIDLTDA

FFQIPLPKQYQPYFAFTIPQPCNYGPGTRYAWTVLPQGFKNSPTLFQQQLAAVLNPMRKMFPTSTIVQYMDDILLASPTNEELQQLSQLTLQALTTHGL

PISQEKTQQTPGQIRFLGQVISPNHITYESTPTIPIKSQWTLTELQVILGEIQWVSKGTPILRKHLQSLYSALHPYRDPRACITLTPQQLHALHAIQQALQH




NCRGRLNPALPLLGLISLSTSGTTSVIFQPKQNWPLAWLHTPHPPTSLCPWGHLLACTILTLDKYTLQHYGQLCQSFHHNMSKQALCDFLRNSPHPSV




GILIHHMGRFHNLGSQPSGPWKTLLHLPTLLQEPRLLRPIFTLSPVVLDTAPCLFSDGSPQKAAYVLWDQTILQQDITPLPSHETHSAQKGELLALICGLR




AAKPWPSLNIFLDSKYLIKYLHSLAIGAFLGTSAHQTLQAALPPLLQGKTIYLHHVRSHTNLPDPISTFNEYTDSLILAPLVPL





JSRV_P31623
8,045
PLGTSDSPVTHADPIDWKSEEPVWVDQWPLTQEKLSAAQQLVQEQLRLGHIEPSTSAWNSPIFVIKKKSGKWRLLQDLRKVNETMMHMGALQPGLPT

PSAIPDKSYIIVIDLKDCFYTIPLAPQDCKRFAFSLPSVNFKEPMQRYQWRVLPQGMTNSPTLCQKFVATAIAPVRQRFPQLYLVHYMDDILLAHTDEHLL




YQAFSILKQHLSLNGLVIADEKIQTHFPYNYLGFSLYPRVYNTQLVKLQTDHLKTLNDFQKLLGDINWIRPYLKLPTYTLQPLFDILKGDSDPASPRTLSLE




GRTALQSIEEAIRQQQITYCDYQRSWGLYILPTPRAPTGVLYQDKPLRWIYLSATPTKHLLPYYELVAKIIAKGRHEAIQYFGMEPPFICVPYALEQQDWL




FQFSDNWSIAFANYPGQITHHYPSDKLLQFASSHAFIFPKIVRRQPIPEATLIFTDGSSNGTAALIINHQTYYAQTSFSSAQVVELFAVHQALLTVPTSFNL




FTDSSYVVGALQMIETVPIIGTTSPEVLNLFTLIQQVLHCRQHPCFFGHIRAHSTLPGALVQGNHTADVLTKQVFFQS





JSRV_P31623_2mutB
8,046
PLGTSDSPVTHADPIDWKSEEPVWVDQWPLTQEKLSAAQQLVQEQLRLGHIEPSTSAWNSPIFVIKKKSGKWRLLQDLRKVNETMMHMGALQPGLPT

PSPIPDKSYIIVIDLKDCFYTIPLAPQDCKRFAFSLPSVNFKEPMQRYQWRVLPQGMTNSPTLCQKFVATAIAPVRQRFPQLYLVHYMDDILLAHTDEHLL

YQAFSILKQHLSLNGLVIADEKIQTHFPYNYLGFSLYPRVYNTQLVKLQTDHLKTLNDFQKLLGDINWIRPYLKLPTYTLQPLFDILKGDSDPASPRTLSLE




GRTALQSIEEAIRQQQITYCDYQRSWGLYILPTPRAPTGVLYQDKPLRWIYLSATPTKHLLPYYELVAKIIAKGRHEAIQYFGMEPPFICVPYALEQQDWL




FQFSDNWSIAFANYPGQITHHYPSDKLLQFASSHAFIFPKIVRRQPIPEATLIFTDGSSNGTAALIINHQTYYAQTSFSSAQVVELFAVHQALLTVPTSFNL




FTDSSYVVGALQMIETVPIIGTTSPEVLNLFTLIQQVLHCRQHPCFFGHIRAHSTLPGALVQGNHTADVLTKQVFFQS





KORV_Q9TTC1
8,047
TLGDQGSRGSDPLPEPRVTLTVEGIPTEFLVNTGAEHSVLTKPMGKMGSKRTVVAGATGSKVYPWTTKRLLKIGQKQVTHSFLVIPECPAPLLGRDLLT

KLKAQIQFSTEGPQVTWEDRPAMCLVLNLEEEYRLHEKPVPPSIDPSWLQLFPMVWAEKAGMGLANQVPPVVVELKSDASPVAVRQYPMSKEAREGI




RPHIQRFLDLGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVQDIHPTVPNPYNLLSSLPPSHTWYSVLDLKDAFFCLKLHPNSQPLFAFEW




RDPEKGNTGQLTWTRLPQGFKNSPTLFDEALHRDLASFRALNPQVVMLQYVDDLLVAAPTYRDCKEGTRRLLQELSKLGYRVSAKKAQLCREEVTYL




GYLLKGGKRWLTPARKATVMKIPTPTTPRQVREFLGTAGFCRLWIPGFASLAAPLYPLTREKVPFTWTEAHQEAFGRIKEALLSAPALALPDLTKPFAL




YVDEKEGVARGVLTQTLGPWRRPVAYLSKKLDPVASGWPTCLKAIAAVALLLKDADKLTLGQNVLVIAPHNLESIVRQPPDRWMTNARMTHYQSLLLN




ERVSFAPPAILNPATLLPVESDDTPIHICSEILAEETGTRPDLRDQPLPGVPAWYTDGSSFIMDGRRQAGAAIVDNKRTVWASNLPEGTSAQKAELIALT




QALRLAEGKSINIYTDSRYAFATAHVHGAIYKQRGLLTSAGKDIKNKEEILALLEAIHLPKRVAIIHCPGHQRGTDPVATGNRKADEAAKQAAQSTRILTET




TKN





KORV_Q9TTC1_3mut
8,048
TLGDQGSRGSDPLPEPRVTLTVEGIPTEFLVNTGAEHSVLTKPMGKMGSKRTVVAGATGSKVYPWTTKRLLKIGQKQVTHSFLVIPECPAPLLGRDLLT

KLKAQIQFSTEGPQVTWEDRPAMCLVLNLEEEYRLHEKPVPPSIDPSWLQLFPMVWAEKAGMGLANQVPPVVVELKSDASPVAVRQYPMSKEAREGI

RPHIQRFLDLGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVQDIHPTVPNPYNLLSSLPPSHTWYSVLDLKDAFFCLKLHPNSQPLFAFEW




RDPEKGNTGQLTWTRLPQGFKNSPTLFNEALHRDLASFRALNPQVVMLQYVDDLLVAAPTYRDCKEGTRRLLQELSKLGYRVSAKKAQLCREEVTYL




GYLLKGGKRWLTPARKATVMKIPTPTTPRQVREFLGTAGFCRLWIPGFASLAAPLYPLTRPKVPFTWTEAHQEAFGRIKEALLSAPALALPDLTKPFAL




YVDEKEGVARGVLTQTLGPWRRPVAYLSKKLDPVASGWPTCLKAIAAVALLLKDADKLTLGQNVLVIAPHNLESIVRQPPDRWMTNARMTHYQSLLLN




ERVSFAPPAILNPATLLPVESDDTPIHICSEILAEETGTRPDLRDQPLPGVPAWYTDGSSFIMDGRRQAGAAIVDNKRTVWASNLPEGTSAQKAELIALT




QALRLAEGKSINIYTDSRYAFATAHVHGAIYKQRGWLTSAGKDIKNKEEILALLEAIHLPKRVAIIHCPGHQRGTDPVATGNRKADEAAKQAAQSTRILTE




TTKN





KORV_Q9TTC1_3mutA
8,049
TLGDQGSRGSDPLPEPRVTLTVEGIPTEFLVNTGAEHSVLTKPMGKMGSKRTVVAGATGSKVYPWTTKRLLKIGQKQVTHSFLVIPECPAPLLGRDLLT

KLKAQIQFSTEGPQVTWEDRPAMCLVLNLEEEYRLHEKPVPPSIDPSWLQLFPMVWAEKAGMGLANQVPPVVVELKSDASPVAVRQYPMSKEAREGI

RPHIQRFLDLGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVQDIHPTVPNPYNLLSSLPPSHTWYSVLDLKDAFFCLKLHPNSQPLFAFEW




RDPEKGNTGQLTWTRLPQGFKNSPTLFNEALHRDLASFRALNPQVVMLQYVDDLLVAAPTYRDCKEGTRRLLQELSKLGYRVSAKKAQLCREEVTYL




GYLLKGGKRWLTPARKATVMKIPTPTTPRQVREFLGKAGFCRLFIPGFASLAAPLYPLTRPKVPFTWTEAHQEAFGRIKEALLSAPALALPDLTKPFALY




VDEKEGVARGVLTQTLGPWRRPVAYLSKKLDPVASGWPTCLKAIAAVALLLKDADKLTLGQNVLVIAPHNLESIVRQPPDRWMTNARMTHYQSLLLNE




RVSFAPPAILNPATLLPVESDDTPIHICSEILAEETGTRPDLRDQPLPGVPAWYTDGSSFIMDGRRQAGAAIVDNKRTVWASNLPEGTSAQKAELIALTQ




ALRLAEGKSINIYTDSRYAFATAHVHGAIYKQRGWLTSAGKDIKNKEEILALLEAIHLPKRVAIIHCPGHQRGTDPVATGNRKADEAAKQAAQSTRILTET




TKN





KORV_Q9TTC1-Pro
8,050
LLGRDLLTKLKAQIQFSTEGPQVTWEDRPAMCLVLNLEEEYRLHEKPVPPSIDPSWLQLFPMVWAEKAGMGLANQVPPVVVELKSDASPVAVRQYPM

SKEAREGIRPHIQRFLDLGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVQDIHPTVPNPYNLLSSLPPSHTWYSVLDLKDAFFCLKLHPNSQ

PLFAFEWRDPEKGNTGQLTWTRLPQGFKNSPTLFDEALHRDLASFRALNPQVVMLQYVDDLLVAAPTYRDCKEGTRRLLQELSKLGYRVSAKKAQLC




REEVTYLGYLLKGGKRWLTPARKATVMKIPTPTTPRQVREFLGTAGFCRLWIPGFASLAAPLYPLTREKVPFTWTEAHQEAFGRIKEALLSAPALALPD




LTKPFALYVDEKEGVARGVLTQTLGPWRRPVAYLSKKLDPVASGWPTCLKAIAAVALLLKDADKLTLGQNVLVIAPHNLESIVRQPPDRWMTNARMTH




YQSLLLNERVSFAPPAILNPATLLPVESDDTPIHICSEILAEETGTRPDLRDQPLPGVPAWYTDGSSFIMDGRRQAGAAIVDNKRTVWASNLPEGTSAQ




KAELIALTQALRLAEGKSINIYTDSRYAFATAHVHGAIYKQRGLLTSAGKDIKNKEEILALLEAIHLPKRVAIIHCPGHQRGTDPVATGNRKADEAAKQAAQ




STRILTETTKN





KORV_Q9TTC1-Pro_3mut
8,051
LLGRDLLTKLKAQIQFSTEGPQVTWEDRPAMCLVLNLEEEYRLHEKPVPPSIDPSWLQLFPMVWAEKAGMGLANQVPPVVVELKSDASPVAVRQYPM

SKEAREGIRPHIQRFLDLGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVQDIHPTVPNPYNLLSSLPPSHTWYSVLDLKDAFFCLKLHPNSQ

PLFAFEWRDPEKGNTGQLTWTRLPQGFKNSPTLFNEALHRDLASFRALNPQVVMLQYVDDLLVAAPTYRDCKEGTRRLLQELSKLGYRVSAKKAQLC




REEVTYLGYLLKGGKRWLTPARKATVMKIPTPTTPRQVREFLGTAGFCRLWIPGFASLAAPLYPLTRPKVPFTWTEAHQEAFGRIKEALLSAPALALPD




LTKPFALYVDEKEGVARGVLTQTLGPWRRPVAYLSKKLDPVASGWPTCLKAIAAVALLLKDADKLTLGQNVLVIAPHNLESIVRQPPDRWMTNARMTH




YQSLLLNERVSFAPPAILNPATLLPVESDDTPIHICSEILAEETGTRPDLRDQPLPGVPAWYTDGSSFIMDGRRQAGAAIVDNKRTVWASNLPEGTSAQ




KAELIALTQALRLAEGKSINIYTDSRYAFATAHVHGAIYKQRGWLTSAGKDIKNKEEILALLEAIHLPKRVAIIHCPGHQRGTDPVATGNRKADEAAKQAA




QSTRILTETTKN





KORV_Q9TTC1-Pro_3mutA
8,052
LLGRDLLTKLKAQIQFSTEGPQVTWEDRPAMCLVLNLEEEYRLHEKPVPPSIDPSWLQLFPMVWAEKAGMGLANQVPPVVVELKSDASPVAVRQYPM

SKEAREGIRPHIQRFLDLGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVQDIHPTVPNPYNLLSSLPPSHTWYSVLDLKDAFFCLKLHPNSQ

PLFAFEWRDPEKGNTGQLTWTRLPQGFKNSPTLFNEALHRDLASFRALNPQVVMLQYVDDLLVAAPTYRDCKEGTRRLLQELSKLGYRVSAKKAQLC




REEVTYLGYLLKGGKRWLTPARKATVMKIPTPTTPRQVREFLGKAGFCRLFIPGFASLAAPLYPLTRPKVPFTWTEAHQEAFGRIKEALLSAPALALPDL




TKPFALYVDEKEGVARGVLTQTLGPWRRPVAYLSKKLDPVASGWPTCLKAIAAVALLLKDADKLTLGQNVLVIAPHNLESIVRQPPDRWMTNARMTHY




QSLLLNERVSFAPPAILNPATLLPVESDDTPIHICSEILAEETGTRPDLRDQPLPGVPAWYTDGSSFIMDGRRQAGAAIVDNKRTVWASNLPEGTSAQK




AELIALTQALRLAEGKSINIYTDSRYAFATAHVHGAIYKQRGWLTSAGKDIKNKEEILALLEAIHLPKRVAIIHCPGHQRGTDPVATGNRKADEAAKQAAQ




STRILTETTKN





MLVAV_P03356
8,053
TLNLEDEYRLYETSAEPEVSPGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEAKLGIKPHIQRLLDQGILVPCQSPWNTPLL

PVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHRWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGMGISGQLTWTRLPQGFKNSP




TLFDEALHRDLADFRIQHPDLILLQYVDDILLAATSELDCQQGTRALLLTLGNLGYRASAKKAQLCQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPK




TPRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVA




YLSKKLDPVAAGWPPCLRMVAAIAVLRKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPVVALNPATLLPLPEEG




APHDCLEILAETHGTRPDLTDQPIPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWARALPAGTSAQRAELIALTQALKMAEGKRLNVYTDSRYAF




ATAHIHGEIYRRRGLLTSEGREIKNKSEILALLKALFLPKRLSIIHCLGHQKGDSAEARGNRLADQAAREAAIKTPPDTSTLL





MLVAV_P03356_3mut
8,054
TLNLEDEYRLYETSAEPEVSPGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEAKLGIKPHIQRLLDQGILVPCQSPWNTPLL

PVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHRWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGMGISGQLTWTRLPQGFKNSP

TLFNEALHRDLADFRIQHPDLILLQYVDDILLAATSELDCQQGTRALLLTLGNLGYRASAKKAQLCQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPK




TPRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPV




AYLSKKLDPVAAGWPPCLRMVAAIAVLRKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPVVALNPATLLPLPEE




GAPHDCLEILAETHGTRPDLTDQPIPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWARALPAGTSAQRAELIALTQALKMAEGKRLNVYTDSRYA




FATAHIHGEIYRRRGWLTSEGREIKNKSEILALLKALFLPKRLSIIHCLGHQKGDSAEARGNRLADQAAREAAIKTPPDTSTLL





MLVAV_P03356_3mutA
8,055
TLNLEDEYRLYETSAEPEVSPGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEAKLGIKPHIQRLLDQGILVPCQSPWNTPLL

PVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHRWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGMGISGQLTWTRLPQGFKNSP

TLFNEALHRDLADFRIQHPDLILLQYVDDILLAATSELDCQQGTRALLLTLGNLGYRASAKKAQLCQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPK




TPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVA




YLSKKLDPVAAGWPPCLRMVAAIAVLRKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPVVALNPATLLPLPEEG




APHDCLEILAETHGTRPDLTDQPIPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWARALPAGTSAQRAELIALTQALKMAEGKRLNVYTDSRYAF




ATAHIHGEIYRRRGWLTSEGREIKNKSEILALLKALFLPKRLSIIHCLGHQKGDSAEARGNRLADQAAREAAIKTPPDTSTLL





MLVBM_Q7SVK7
8,056
TLGIEDEYRLHETSTEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIQQYPMSHEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGMGISGQLTWTRLPQGFKNSPT




LFDEALHRDLADFRIQHPDLILLQYVDDILLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLREGQRWLTEARKETVMGQPVPKT




PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFSWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPVVALNPATLLPLPEEGAP




HDCLEILAETHGTRPDLTDQPIPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWAGALPAGTSAQRAELIALTQALKMAEGKRLNVYTDSRYAFAT




AHIHGEIYRRRGLLTSEGREIKNKSEILALLKALFLPKRLSIIHCLGHQKGDSAEARGNRLADQAAREAAIKTPPDTSTLL





MLVBM_Q7SVK7
8,057
TLGIEDEYRLHETSTEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIQQYPMSHEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGMGISGQLTWTRLPQGFKNSPT




LFDEALHRDLADFRIQHPDLILLQYVDDILLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLREGQRWLTEARKETVMGQPVPKT




PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFSWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPVVALNPATLLPLPEEGAP




HDCLEILAETHGTRPDLTDQPIPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWAGALPAGTSAQRAELIALTQALKMAEGKRLNVYTDSRYAFAT




AHIHGEIYRRRGLLTSEGREIKNKSEILALLKALFLPKRLSIIHCLGHQKGDSAEARGNRLADQAAREAAIKTPPDTSTLL





MLVBM_Q7SVK7_3mut
8,058
TLGIEDEYRLHETSTEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIQQYPMSHEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDILLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLREGQRWLTEARKETVMGQPVPKT




PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKPGTLFSWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVA




YLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPVVALNPATLLPLPEEGA




PHDCLEILAETHGTRPDLTDQPIPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWAGALPAGTSAQRAELIALTQALKMAEGKRLNVYTDSRYAFA




TAHIHGEIYRRRGWLTSEGREIKNKSEILALLKALFLPKRLSIIHCLGHQKGDSAEARGNRLADQAAREAAIKTPPDTSTLL





MLVBM_Q7SVK7_3mut
8,059
TLGIEDEYRLHETSTEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIQQYPMSHEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDILLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLREGQRWLTEARKETVMGQPVPKT




PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKPGTLFSWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVA




YLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPVVALNPATLLPLPEEGA




PHDCLEILAETHGTRPDLTDQPIPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWAGALPAGTSAQRAELIALTQALKMAEGKRLNVYTDSRYAFA




TAHIHGEIYRRRGWLTSEGREIKNKSEILALLKALFLPKRLSIIHCLGHQKGDSAEARGNRLADQAAREAAIKTPPDTSTLL





MLVBM_Q7SVK7_3mutA_WS
8,060
LGIEDEYRLHETSTEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIQQYPMSHEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPV

KKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGMGISGQLTWTRLPQGFKNSPTL

FNEALHRDLADFRIQHPDLILLQYVDDILLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLREGQRWLTEARKETVMGQPVPKTP

RQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFSWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYL




SKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPVVALNPATLLPLPEEGAP




HDCLEILAETHGTRPDLTDQPIPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWAGALPAGTSAQRAELIALTQALKMAEGKRLNVYTDSRYAFAT




AHIHGEIYRRRGWLTSEGREIKNKSEILALLKALFLPKRLSIIHCLGHQKGDSAEARGNRLADQAAREAAIKTPPDTSTLLI





MLVBM_Q7SVK7_3mutA_WS
8,061
LGIEDEYRLHETSTEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIQQYPMSHEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPV

KKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGMGISGQLTWTRLPQGFKNSPTL

FNEALHRDLADFRIQHPDLILLQYVDDILLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLREGQRWLTEARKETVMGQPVPKTP

RQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFSWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYL




SKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPVVALNPATLLPLPEEGAP




HDCLEILAETHGTRPDLTDQPIPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWAGALPAGTSAQRAELIALTQALKMAEGKRLNVYTDSRYAFAT




AHIHGEIYRRRGWLTSEGREIKNKSEILALLKALFLPKRLSIIHCLGHQKGDSAEARGNRLADQAAREAAIKTPPDTSTLLI





MLVCB_P08361
8,062
TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT




LFDEALHRDLAGFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPIPKT




PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAFQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQ




HDCLDILAEAHGTRSDLMDQPLPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWARALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFAT




AHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGNSAEARGNRMADQAAREVATRETPETSTLL





MLVCB_P08361_3mut
8,063
TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLAGFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPIPKT




PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAFQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVA




YLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGL




QHDCLDILAEAHGTRSDLMDQPLPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWARALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAF




ATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGNSAEARGNRMADQAAREVATRETPETSTLL





MLVCB_P08361_3mutA
8,064
TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLAGFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPIPKT




PRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAFQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQ




HDCLDILAEAHGTRSDLMDQPLPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWARALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFAT




AHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGNSAEARGNRMADQAAREVATRETPETSTLL





MLVF5_P26810
8,065
TLNIEDEYRLHETSKGPDVPLGSTWLSDFPQAWAETGGMGLAFRQAPLIISLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQSLFAFEWKDPEMGISGQLTWTRLPQGFKNSPT




LFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT




PRQLREFLGTAGLCRLWIPGFAEMAAPLYPLTKTGTLFKWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDVGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPIVALNPATLLPLPEEGLQ




HDCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSFLQEGQRRAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAAGKKLNVYTDSRYAFAT




AHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGNHAEARGNRMADQAAREVATRETPETSTLL





MLVF5_P26810_3mut
8,066
TLNIEDEYRLHETSKGPDVPLGSTWLSDFPQAWAETGGMGLAFRQAPLIISLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQSLFAFEWKDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT




PRQLREFLGTAGLCRLWIPGFAEMAAPLYPLTKPGTLFKWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDVGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPIVALNPATLLPLPEEGLQ




HDCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSFLQEGQRRAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAAGKKLNVYTDSRYAFAT




AHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGNHAEARGNRMADQAAREVATRETPETSTLL





MLVF5_P26810_3mutA
8,067
TLNIEDEYRLHETSKGPDVPLGSTWLSDFPQAWAETGGMGLAFRQAPLIISLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQSLFAFEWKDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT




PRQLREFLGKAGLCRLFIPGFAEMAAPLYPLTKPGTLFKWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDVGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPIVALNPATLLPLPEEGLQ




HDCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSFLQEGQRRAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAAGKKLNVYTDSRYAFAT




AHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGNHAEARGNRMADQAAREVATRETPETSTLL





MLVFF_P26809_3mut
8,068
TLNIEDEYRLHETSKGPDVPLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQSLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT




PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKPGTLFEWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVA




YLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPIVALNPATLLPLPEEGLQ




HDCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVVWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFA




TAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGNRAEARGNRMADQAAREVATRETPETSTLL





MLVFF_P26809_3mutA
8,069
TLNIEDEYRLHETSKGPDVPLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQSLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGDLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT




PRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFEWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPIVALNPATLLPLPEEGLQ




HDCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVVWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFA




TAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGNRAEARGNRMADQAAREVATRETPETSTLL





MLVMS_P03355
8,070
TLNIEDEHRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT




LFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT




PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVA




YLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGL




QHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFA




TAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLL





MLVMS_reference
8,137
TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT




LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT




PRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQ




HNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFAT




AHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP





MLVMS_P03355
8,071
TLNIEDEHRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT




LFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT




PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVA




YLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGL




QHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFA




TAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLL





MLVMS_P03355_3mut
8,072
TLNIEDEHRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT




PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVA




YLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGL




QHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFA




TAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLL





MLVMS_P03355_3mut
8,073
TLNIEDEHRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT




PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVA




YLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGL




QHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFA




TAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLL





MLVMS_P03355_3mutA_WS
8,074
TLNIEDEHRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT

PRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQ




HNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFAT




AHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLL





MLVMS_P03355_3mutA_WS
8,075
TLNIEDEHRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT

PRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQ




HNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFAT




AHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLL





MLVMS_P03355_PLV919
8,076
TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT




PRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPWVALNPATLLPLPEEGLQ




HNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFAT




AHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEF




E





MLVMS_P03355_PLV919
8,077
TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT




PRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQ




HNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFAT




AHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEF




E





MLVRD_P11227
8,078
TLNIEDEYRLHEISTEPDVSPGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEAKLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQGLREVNKRVEDIHPTVPNPYNLLSGLPTSHRWYTVLDLKDAFFCLRLHPTSQPLFASEWRDPGMGISGQLTWTRLPQGFKNSPT




LFDEALHRGLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLKTLGNLGYRASAKKAQICQKQVKYLGYLLREGQRWLTEARKETVMGQPTPKT




PRQLREFLGTAGFCRLWIPRFAEMAAPLYPLTKTGTLFNWGPDQQKAYHEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPWVALNPATLLPLPEEGAP




HDCLEILAETHGTEPDLTDQPIPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWARALPAGTSAQRAELIALTQALKMAEGKRLNVYTDSRYAFATA




HIHGEIYKRRGLLTSEGREIKNKSEILALLKALFLPKRLSIIHCLGHQKGDSAEARGNRLADQAAREAAIKTPPDTSTLL





MLVRD_P11227_3mut
8,079
TLNIEDEYRLHEISTEPDVSPGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEAKLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQGLREVNKRVEDIHPTVPNPYNLLSGLPTSHRWYTVLDLKDAFFCLRLHPTSQPLFASEWRDPGMGISGQLTWTRLPQGFKNSPT

LFNEALHRGLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLKTLGNLGYRASAKKAQICQKQVKYLGYLLREGQRWLTEARKETVMGQPTPKT




PRQLREFLGTAGFCRLWIPRFAEMAAPLYPLTKPGTLFNWGPDQQKAYHEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAY




LSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPVVALNPATLLPLPEEGAP




HDCLEILAETHGTEPDLTDQPIPDADHTWYTDGSSFLQEGQRKAGAAVTTETEVIWARALPAGTSAQRAELIALTQALKMAEGKRLNVYTDSRYAFATA




HIHGEIYKRRGWLTSEGREIKNKSEILALLKALFLPKRLSIIHCLGHQKGDSAEARGNRLADQAAREAAIKTPPDTSTLL





MMTVB_P03365
8,080
WVQEISDSRPMLHIYLNGRRFLGLLNTGADKTCIAGRDWPANWPIHQTESSLQGLGMACGVARSSQPLRWQHEDKSGIIHPFVIPTLPFTLWGRDIMK

DIKVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLLQDLRAV




NATMHDMGALQPGLPSPVAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVRDKYQDS




YIVHYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTTGELKPLF




EILNGDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHRSKELFSK




DPDYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQNTAQQA




EIVAVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILT





MMTVB_P03365
8,081
WVQEISDSRPMLHIYLNGRRFLGLLNTGADKTCIAGRDWPANWPIHQTESSLQGLGMACGVARSSQPLRWQHEDKSGIIHPFVIPTLPFTLWGRDIMK

DIKVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLLQDLRAV




NATMHDMGALQPGLPSPVAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVRDKYQDS




YIVHYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTTGELKPLF




EILNGDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHRSKELFSK




DPDYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQNTAQQA




EIVAVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILT





MMTVB_P03365_2mut
8,082
WVQEISDSRPMLHIYLNGRRFLGLLNTGADKTCIAGRDWPANWPIHQTESSLQGLGMACGVARSSQPLRWQHEDKSGIIHPFVIPTLPFTLWGRDIMK

DIKVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLLQDLRAV

NATMHDMGALQPGLPSPVAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVRDKYQDS




YIVHYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTTGELKPLF




EILNPDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHRSKELFSK




DPDYIWVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQNTAQQA




EIVAVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILT





MMTVB_P03365_2mut_WS
8,083
VQEISDSRPMLHIYLNGRRFLGLLDTGADKTCIAGRDWPANWPIHQTESSLQGLGMACGVARSSQPLRWQHEDKSGIIHPFVIPTLPFTLWGRDIMKDI

KVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLLQDLRAVNA

TMHDMGALQPGLPSPVAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVRDKYQDSYIV

HYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTTGELKPLFEIL




NPDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGWVEWIHLPHISPKVITPYDIFCTQLIIKGRHRSKELFSKDP




DYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQNTAQQAEIV




AVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILTA





MMTVB_P03365_2mut_WS
8,084
VQEISDSRPMLHIYLNGRRFLGLLDTGADKTCIAGRDWPANWPIHQTESSLQGLGMACGVARSSQPLRWQHEDKSGIIHPFVIPTLPFTLWGRDIMKDI

KVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLLQDLRAVNA

TMHDMGALQPGLPSPVAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVRDKYQDSYIV

HYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTTGELKPLFEIL




NPDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHRSKELFSKDP




DYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQNTAQQAEIV




AVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILTA





MMTVB_P03365_2mutB
8,085
WVQEISDSRPMLHIYLNGRRFLGLLNTGADKTCIAGRDWPANWPIHQTESSLQGLGMACGVARSSQPLRWQHEDKSGIIHPFVIPTLPFTLWGRDIMK

DIKVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLLQDLRAV

NATMHDMGALQPGLPSPVAPPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVRDKYQDS




YIVHYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTTGELKPLF




EILNPDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHRSKELFSK




DPDYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQNTAQQA




EIVAVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILT





MMTVB_P03365_2mutB
8,086
WVQEISDSRPMLHIYLNGRRFLGLLNTGADKTCIAGRDWPANWPIHQTESSLQGLGMACGVARSSQPLRWQHEDKSGIIHPFVIPTLPFTLWGRDIMK

DIKVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLLQDLRAV

NATMHDMGALQPGLPSPVAPPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVRDKYQDS




YIVHYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTTGELKPLF




EILNPDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHRSKELFSK




DPDYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQNTAQQA




EIVAVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILT





MMTVB_P03365_2mutB_WS
8,087
VQEISDSRPMLHIYLNGRRFLGLLDTGADKTCIAGRDWPANWPIHQTESSLQGLGMACGVARSSQPLRWQHEDKSGIIHPFVIPTLPFTLWGRDIMKDI

KVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLLQDLRAVNA

TMHDMGALQPGLPSPPAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVRDKYQDSYIV

HYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTTGELKPLFEIL




NPDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHRSKELFSKDP




DYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQNTAQQAEIV




AVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILTA





MMTVB_P03365_2mutB_WS
8,088
VQEISDSRPMLHIYLNGRRFLGLLDTGADKTCIAGRDWPANWPIHQTESSLQGLGMACGVARSSQPLRWQHEDKSGIIHPFVIPTLPFTLWGRDIMKDI

KVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLLQDLRAVNA

TMHDMGALQPGLPSPPAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVRDKYQDSYIV

HYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTTGELKPLFEIL




NPDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHRSKELFSKDP




DYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQNTAQQAEIV




AVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILTA





MMTVB_P03365_WS
8,089
VQEISDSRPMLHIYLNGRRFLGLLDTGADKTCIAGRDWPANWPIHQTESSLQGLGMACGVARSSQPLRWQHEDKSGIIHPFVIPTLPFTLWGRDIMKDI

KVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLLQDLRAVNA

TMHDMGALQPGLPSPVAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVRDKYQDSYIV




HYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTTGELKPLFEIL




NGDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHRSKELFSKDP




DYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQNTAQQAEIV




AVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILTA





MMTVB_P03365_WS
8,090
VQEISDSRPMLHIYLNGRRFLGLLDTGADKTCIAGRDWPANWPIHQTESSLQGLGMACGVARSSQPLRWQHEDKSGIIHPFVIPTLPFTLWGRDIMKDI

KVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLLQDLRAVNA

TMHDMGALQPGLPSPVAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVRDKYQDSYIV




HYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTTGELKPLFEIL




NGDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGWVEWIHLPHISPKVITPYDIFCTQLIIKGRHRSKELFSKDP




DYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQNTAQQAEIV




AVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILTA





MMTVB_P03365-Pro
8,091
GRDIMKDIKVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLL

QDLRAVNATMHDMGALQPGLPSPVAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVR

DKYQDSYIVHYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTT




GELKPLFEILNGDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHR




SKELFSKDPDYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQ




NTAQQAEIVAVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILT





MMTVB_P03365-Pro
8,092
GRDIMKDIKVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLL

QDLRAVNATMHDMGALQPGLPSPVAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVR

DKYQDSYIVHYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTT




GELKPLFEILNGDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHR




SKELFSKDPDYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQ




NTAQQAEIVAVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILT





MMTVB_P03365-Pro_2mut
8,093
GRDIMKDIKVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLL

QDLRAVNATMHDMGALQPGLPSPVAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVR

DKYQDSYIVHYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTT




GELKPLFEILNPDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHR




SKELFSKDPDYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQ




NTAQQAEIVAVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILT





MMTVB_P03365-Pro_2mut
8,094
GRDIMKDIKVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLL

QDLRAVNATMHDMGALQPGLPSPVAVPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVR

DKYQDSYIVHYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTT




GELKPLFEILNPDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHR




SKELFSKDPDYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQ




NTAQQAEIVAVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILT





MMTVB_P03365-Pro_2mutB
8,095
GRDIMKDIKVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLL

QDLRAVNATMHDMGALQPGLPSPVAPPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVR

DKYQDSYIVHYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTT




GELKPLFEILNPDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHR




SKELFSKDPDYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQ




NTAQQAEIVAVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILT





MMTVB_P03365-Pro_2mutB
8,096
GRDIMKDIKVRLMTDSPDDSQDLMIGAIESNLFADQISWKSDQPVWLNQWPLKQEKLQALQQLVTEQLQLGHLEESNSPWNTPVFVIKKKSGKWRLL

QDLRAVNATMHDMGALQPGLPSPVAPPKGWEIIIIDLQDCFFNIKLHPEDCKRFAFSVPSPNFKRPYQRFQWKVLPQGMKNSPTLCQKFVDKAILTVR

DKYQDSYIVHYMDDILLAHPSRSIVDEILTSMIQALNKHGLVVSTEKIQKYDNLKYLGTHIQGDSVSYQKLQIRTDKLRTLNDFQKLLGNINWIRPFLKLTT




GELKPLFEILNPDSNPISTRKLTPEACKALQLMNERLSTARVKRLDLSQPWSLCILKTEYTPTACLWQDGVVEWIHLPHISPKVITPYDIFCTQLIIKGRHR




SKELFSKDPDYIVVPYTKVQFDLLLQEKEDWPISLLGFLGEVHFHLPKDPLLTFTLQTAIIFPHMTSTTPLEKGIVIFTDGSANGRSVTYIQGREPIIKENTQ




NTAQQAEIVAVITAFEEVSQPFNLYTDSKYVTGLFPEIETATLSPRTKIYTELKHLQRLIHKRQEKFYIGHIRGHTGLPGPLAQGNAYADSLTRILT





MPMV_P07572
8,097
LTAAIDILAPQQCAEPITWKSDEPVWVDQWPLTNDKLAAAQQLVQEQLEAGHITESSSPWNTPIFVIKKKSGKWRLLQDLRAVNATMVLMGALQPGLP

SPVAIPQGYLKIIIDLKDCFFSIPLHPSDQKRFAFSLPSTNFKEPMQRFQWKVLPQGMANSPTLCQKYVATAIHKVRHAWKQMYIIHYMDDILIAGKDGQ




QVLQCFDQLKQELTAAGLHIAPEKVQLQDPYTYLGFELNGPKITNQKAVIRKDKLQTLNDFQKLLGDINWLRPYLKLTTGDLKPLFDTLKGDSDPNSHR




SLSKEALASLEKVETAIAEQFVTHINYSLPLIFLIFNTALTPTGLFWQDNPIMWIHLPASPKKVLLPYYDAIADLIILGRDHSKKYFGIEPSTIIQPYSKSQIDW




LMQNTEMWPIACASFVGILDNHYPPNKLIQFCKLHTFVFPQIISKTPLNNALLVFTDGSSTGMAAYTLTDTTIKFQTNLNSAQLVELQALIAVLSAFPNQPL




NIYTDSAYLAHSIPLLETVAQIKHISETAKLFLQCQQLIYNRSIPFYIGHVRAHSGLPGPIAQGNQRADLATKIVASNINT





MPMV_P07572_2mutB
8,098
LTAAIDILAPQQCAEPITWKSDEPVWVDQWPLTNDKLAAAQQLVQEQLEAGHITESSSPWNTPIFVIKKKSGKWRLLQDLRAVNATMVLMGALQPGLP

SPVAPPQGYLKIIIDLKDCFFSIPLHPSDQKRFAFSLPSTNFKEPMQRFQWKVLPQGMANSPTLCQKYVATAIHKVRHAWKQMYIIHYMDDILIAGKDGQ

QVLQCFDQLKQELTAAGLHIAPEKVQLQDPYTYLGFELNGPKITNQKAVIRKDKLQTLNDFQKLLGDINWLRPYLKLTTGDLKPLFDTLKPDSDPNSHRS




LSKEALASLEKVETAIAEQFVTHINYSLPLIFLIFNTALTPTGLFWQDNPIMWIHLPASPKKVLLPYYDAIADLIILGRDHSKKYFGIEPSTIIQPYSKSQIDWL




MQNTEMWPIACASFVGILDNHYPPNKLIQFCKLHTFVFPQIISKTPLNNALLVFTDGSSTGMAAYTLTDTTIKFQTNLNSAQLVELQALIAVLSAFPNQPL




NIYTDSAYLAHSIPLLETVAQIKHISETAKLFLQCQQLIYNRSIPFYIGHVRAHSGLPGPIAQGNQRADLATKIVASNINT





PERV_Q4VFZ2
8,099
TLQLDDEYRLYSPLVKPDQNIQFWLEQFPQAWAETAGMGLAKQVPPQVIQLKASATPVSVRQYPLSKEAQEGIRPHVQRLIQQGILVPVQSPWNTPLL

PVRKPGTNDYRPVQDLREVNKRVQDIHPTVPNPYNLLCALPPQRSWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGTGRTGQLTWTRLPQGFKNS




PTIFDEALHRDLANFRIQHPQVTLLQYVDDLLLAGATKQDCLEGTKALLLELSDLGYRASAKKAQICRREVTYLGYSLRDGQRWLTEARKKTVQIPAPT




TAKQVREFLGTAGFCRLWIPGFATLAAPLYPLTKEKGEFSWAPEHQKAFDAIKKALLSAPALALPDVTKPFTLYVDERKGVARGVLTQTLGPWRRPVA




YLSKKLDPVASGWPVCLKAIAAVAILVKDADKLTLGQNITVIAPHALENIVRQPPDRWMTNARMTHYQSLLLTERVTFAPPAALNPATLLPEETDEPVTH




DCHQLLIEETGVRKDLTDIPLTGEVLTWFTDGSSYVVEGKRMAGAAVVDGTRTIWASSLPEGTSAQKAELMALTQALRLAEGKSINIYTDSRYAFATAH




VHGAIYKQRGLLTSAGREIKNKEEILSLLEALHLPKRLAIIHCPGHQKAKDPISRGNQMADRVAKQAAQGVNLL





PERV_Q4VFZ2
8,100
TLQLDDEYRLYSPLVKPDQNIQFWLEQFPQAWAETAGMGLAKQVPPQVIQLKASATPVSVRQYPLSKEAQEGIRPHVQRLIQQGILVPVQSPWNTPLL

PVRKPGTNDYRPVQDLREVNKRVQDIHPTVPNPYNLLCALPPQRSWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGTGRTGQLTWTRLPQGFKNS




PTIFDEALHRDLANFRIQHPQVTLLQYVDDLLLAGATKQDCLEGTKALLLELSDLGYRASAKKAQICRREVTYLGYSLRDGQRWLTEARKKTVQIPAPT




TAKQVREFLGTAGFCRLWIPGFATLAAPLYPLTKEKGEFSWAPEHQKAFDAIKKALLSAPALALPDVTKPFTLYVDERKGVARGVLTQTLGPWRRPVA




YLSKKLDPVASGWPVCLKAIAAVAILVKDADKLTLGQNITVIAPHALENIVRQPPDRWMTNARMTHYQSLLLTERVTFAPPAALNPATLLPEETDEPVTH




DCHQLLIEETGVRKDLTDIPLTGEVLTWFTDGSSYVVEGKRMAGAAVVDGTRTIWASSLPEGTSAQKAELMALTQALRLAEGKSINIYTDSRYAFATAH




VHGAIYKQRGLLTSAGREIKNKEEILSLLEALHLPKRLAIIHCPGHQKAKDPISRGNQMADRVAKQAAQGVNLL





PERV_Q4VFZ2_3mut
8,101
TLQLDDEYRLYSPLVKPDQNIQFWLEQFPQAWAETAGMGLAKQVPPQVIQLKASATPVSVRQYPLSKEAQEGIRPHVQRLIQQGILVPVQSPWNTPLL

PVRKPGTNDYRPVQDLREVNKRVQDIHPTVPNPYNLLCALPPQRSWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGTGRTGQLTWTRLPQGFKNS

PTIFNEALHRDLANFRIQHPQVTLLQYVDDLLLAGATKQDCLEGTKALLLELSDLGYRASAKKAQICRREVTYLGYSLRDGQRWLTEARKKTVQIPAPT




TAKQVREFLGTAGFCRLWIPGFATLAAPLYPLTKPKGEFSWAPEHQKAFDAIKKALLSAPALALPDVTKPFTLYVDERKGVARGVLTQTLGPWRRPVA




YLSKKLDPVASGWPVCLKAIAAVAILVKDADKLTLGQNITVIAPHALENIVRQPPDRWMTNARMTHYQSLLLTERVTFAPPAALNPATLLPEETDEPVTH




DCHQLLIEETGVRKDLTDIPLTGEVLTWFTDGSSYVVEGKRMAGAAVVDGTRTIWASSLPEGTSAQKAELMALTQALRLAEGKSINIYTDSRYAFATAH




VHGAIYKQRGWLTSAGREIKNKEEILSLLEALHLPKRLAIIHCPGHQKAKDPISRGNQMADRVAKQAAQGVNLL





PERV_Q4VFZ2_3mut
8,102
TLQLDDEYRLYSPLVKPDQNIQFWLEQFPQAWAETAGMGLAKQVPPQVIQLKASATPVSVRQYPLSKEAQEGIRPHVQRLIQQGILVPVQSPWNTPLL

PVRKPGTNDYRPVQDLREVNKRVQDIHPTVPNPYNLLCALPPQRSWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGTGRTGQLTWTRLPQGFKNS

PTIFNEALHRDLANFRIQHPQVTLLQYVDDLLLAGATKQDCLEGTKALLLELSDLGYRASAKKAQICRREVTYLGYSLRDGQRWLTEARKKTVVQIPAPT




TAKQVREFLGTAGFCRLWIPGFATLAAPLYPLTKPKGEFSWAPEHQKAFDAIKKALLSAPALALPDVTKPFTLYVDERKGVARGVLTQTLGPWRRPVA




YLSKKLDPVASGWPVCLKAIAAVAILVKDADKLTLGQNITVIAPHALENIVRQPPDRWMTNARMTHYQSLLLTERVTFAPPAALNPATLLPEETDEPVTH




DCHQLLIEETGVRKDLTDIPLTGEVLTWFTDGSSYVVEGKRMAGAAVVDGTRTIWASSLPEGTSAQKAELMALTQALRLAEGKSINIYTDSRYAFATAH




VHGAIYKQRGWLTSAGREIKNKEEILSLLEALHLPKRLAIIHCPGHQKAKDPISRGNQMADRVAKQAAQGVNLL





PERV_Q4VFZ2_3mutA_WS
8,103
LDDEYRLYSPLVKPDQNIQFWLEQFPQAWAETAGMGLAKQVPPQVIQLKASATPVSVRQYPLSKEAQEGIRPHVQRLIQQGILVPVQSPWNTPLLPVR

KPGTNDYRPVQDLREVNKRVQDIHPTVPNPYNLLCALPPQRSWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGTGRTGQLTWTRLPQGFKNSPTIF

NEALHRDLANFRIQHPQVTLLQYVDDLLLAGATKQDCLEGTKALLLELSDLGYRASAKKAQICRREVTYLGYSLRDGQRWLTEARKKTVVQIPAPTTAK

QVREFLGKAGFCRLFIPGFATLAAPLYPLTKPKGEFSWAPEHQKAFDAIKKALLSAPALALPDVTKPFTLYVDERKGVARGVLTQTLGPWRRPVAYLSK




KLDPVASGWPVCLKAIAAVAILVKDADKLTLGQNITVIAPHALENIVRQPPDRWMTNARMTHYQSLLLTERVTFAPPAALNPATLLPEETDEPVTHDCHQ




LLIEETGVRKDLTDIPLTGEVLTWFTDGSSYVVEGKRMAGAAVVDGTRTIWASSLPEGTSAQKAELMALTQALRLAEGKSINIYTDSRYAFATAHVHGAI




YKQRGWLTSAGREIKNKEEILSLLEALHLPKRLAIIHCPGHQKAKDPISRGNQMADRVAKQAAQGVNLLP





PERV_Q4VFZ2_3mutA_WS
8,104
LDDEYRLYSPLVKPDQNIQFWLEQFPQAWAETAGMGLAKQVPPQVIQLKASATPVSVRQYPLSKEAQEGIRPHVQRLIQQGILVPVQSPWNTPLLPVR

KPGTNDYRPVQDLREVNKRVQDIHPTVPNPYNLLCALPPQRSWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPGTGRTGQLTWTRLPQGFKNSPTIF

NEALHRDLANFRIQHPQVTLLQYVDDLLLAGATKQDCLEGTKALLLELSDLGYRASAKKAQICRREVTYLGYSLRDGQRWLTEARKKTVVQIPAPTTAK

QVREFLGKAGFCRLFIPGFATLAAPLYPLTKPKGEFSWAPEHQKAFDAIKKALLSAPALALPDVTKPFTLYVDERKGVARGVLTQTLGPWRRPVAYLSK




KLDPVASGWPVCLKAIAAVAILVKDADKLTLGQNITVIAPHALENIVRQPPDRWMTNARMTHYQSLLLTERVTFAPPAALNPATLLPEETDEPVTHDCHQ




LLIEETGVRKDLTDIPLTGEVLTWFTDGSSYVVEGKRMAGAAVVDGTRTIWASSLPEGTSAQKAELMALTQALRLAEGKSINIYTDSRYAFATAHVHGAI




YKQRGWLTSAGREIKNKEEILSLLEALHLPKRLAIIHCPGHQKAKDPISRGNQMADRVAKQAAQGVNLLP





SFV1_P23074
8,105
MDPLQLLQPLEAEIKGTKLKAHWNSGATITCVPEAFLEDERPIQTMLIKTIHGEKQQDVYYLTFKVQGRKVEAEVLASPYDYILLNPSDVPWLMKKPLQL

TVLVPLHEYQERLLQQTALPKEQKELLQKLFLKYDALWQHWENQVGHRRIKPHNIATGTLAPRPQKQYPINPKAKPSIQIVIDDLLKQGVLIQQNSTMNT




PVYPVPKPDGKWRMVLDYREVNKTIPLIAAQNQHSAGILSSIYRGKYKTTLDLTNGFWAHPITPESYWLTAFTWQGKQYCWTRLPQGFLNSPALFTAD




WVDLLKEIPNVQAYVDDIYISHDDPQEHLEQLEKIFSILLNAGYVVSLKKSEIAQREVEFLGFNITKEGRGLTDTFKQKLLNITPPKDLKQLQSILGLLNFAR




NFIPNYSELVKPLYTIVANANGKFISWTEDNSNQLQHIISVLNQADNLEERNPETRLIIKVNSSPSAGYIRYYNEGSKRPIMYVNYIFSKAEAKFTQTEKLL




TTMHKGLIKAMDLAMGQEILVYSPIVSMTKIQRTPLPERKALPVRWITWMTYLEDPRIQFHYDKSLPELQQIPNVTEDVIAKTKHPSEFAMVFYTDGSAIK




HPDVNKSHSAGMGIAQVQFIPEYKIVHQWSIPLGDHTAQLAEIAAVEFACKKALKISGPVLIVTDSFYVAESANKELPYWKSNGFLNNKKKPLRHVSKW




KSIAECLQLKPDIIIMHEKGHQQPMTTLHTEGNNLADKLATQGSYVVH





SFV1_P23074_2mut
8,106
MDPLQLLQPLEAEIKGTKLKAHWNSGATITCVPEAFLEDERPIQTMLIKTIHGEKQQDVYYLTFKVQGRKVEAEVLASPYDYILLNPSDVPWLMKKPLQL

TVLVPLHEYQERLLQQTALPKEQKELLQKLFLKYDALWQHWENQVGHRRIKPHNIATGTLAPRPQKQYPINPKAKPSIQIVIDDLLKQGVLIQQNSTMNT

PVYPVPKPDGKWRMVLDYREVNKTIPLIAAQNQHSAGILSSIYRGKYKTTLDLTNGFWAHPITPESYWLTAFTWQGKQYCWTRLPQGFLNSPALFNAD




WVDLLKEIPNVQAYVDDIYISHDDPQEHLEQLEKIFSILLNAGYVVSLKKSEIAQREVEFLGFNITKEGRGLTDTFKQKLLNITPPKDLKQLQSILGLLNFAR




NFIPNYSELVKPLYTIVAPANGKFISWTEDNSNQLQHIISVLNQADNLEERNPETRLIIKVNSSPSAGYIRYYNEGSKRPIMYVNYIFSKAEAKFTQTEKLLT




TMHKGLIKAMDLAMGQEILVYSPIVSMTKIQRTPLPERKALPVRWITWMTYLEDPRIQFHYDKSLPELQQIPNVTEDVIAKTKHPSEFAMVFYTDGSAIKH




PDVNKSHSAGMGIAQVQFIPEYKIVHQWSIPLGDHTAQLAEIAAVEFACKKALKISGPVLIVTDSFYVAESANKELPYWKSNGFLNNKKKPLRHVSKWK




SIAECLQLKPDIIIMHEKGHQQPMTTLHTEGNNLADKLATQGSYVVH





SFV1_P23074_2mutA
8,107
MDPLQLLQPLEAEIKGTKLKAHWNSGATITCVPEAFLEDERPIQTMLIKTIHGEKQQDVYYLTFKVQGRKVEAEVLASPYDYILLNPSDVPWLMKKPLQL

TVLVPLHEYQERLLQQTALPKEQKELLQKLFLKYDALWQHWENQVGHRRIKPHNIATGTLAPRPQKQYPINPKAKPSIQIVIDDLLKQGVLIQQNSTMNT

PVYPVPKPDGKWRMVLDYREVNKTIPLIAAQNQHSAGILSSIYRGKYKTTLDLTNGFWAHPITPESYWLTAFTWQGKQYCWTRLPQGFLNSPALFNAD




WDLLKEIPNVQAYVDDIYISHDDPQEHLEQLEKIFSILLNAGYVVSLKKSEIAQREVEFLGFNITKEGRGLTDTFKQKLLNITPPKDLKQLQSILGKLNFAR




NFIPNYSELVKPLYTIVAPANGKFISWTEDNSNQLQHIISVLNQADNLEERNPETRLIIKVNSSPSAGYIRYYNEGSKRPIMYVNYIFSKAEAKFTQTEKLLT




TMHKGLIKAMDLAMGQEILVYSPIVSMTKIQRTPLPERKALPVRWITWMTYLEDPRIQFHYDKSLPELQQIPNVTEDVIAKTKHPSEFAMVFYTDGSAIKH




PDVNKSHSAGMGIAQVQFIPEYKIVHQWSIPLGDHTAQLAEIAAVEFACKKALKISGPVLIVTDSFYVAESANKELPYWKSNGFLNNKKKPLRHVSKWK




SIAECLQLKPDIIIMHEKGHQQPMTTLHTEGNNLADKLATQGSYVVH





SFV1_P23074-Pro
8,108
VPWLMKKPLQLTVLVPLHEYQERLLQQTALPKEQKELLQKLFLKYDALWQHWENQVGHRRIKPHNIATGTLAPRPQKQYPINPKAKPSIQIVIDDLLKQ

GVLIQQNSTMNTPVYPVPKPDGKWRMVLDYREVNKTIPLIAAQNQHSAGILSSIYRGKYKTTLDLTNGFWAHPITPESYWLTAFTWQGKQYCWTRLPQ

GFLNSPALFTADVVDLLKEIPNVQAYVDDIYISHDDPQEHLEQLEKIFSILLNAGYVVSLKKSEIAQREVEFLGFNITKEGRGLTDTFKQKLLNITPPKDLKQ




LQSILGLLNFARNFIPNYSELVKPLYTIVANANGKFISWTEDNSNQLQHIISVLNQADNLEERNPETRLIIKVNSSPSAGYIRYYNEGSKRPIMYVNYIFSKA




EAKFTQTEKLLTTMHKGLIKAMDLAMGQEILVYSPIVSMTKIQRTPLPERKALPVRWITWMTYLEDPRIQFHYDKSLPELQQIPNVTEDVIAKTKHPSEFA




MVFYTDGSAIKHPDVNKSHSAGMGIAQVQFIPEYKIVHQWSIPLGDHTAQLAEIAAVEFACKKALKISGPVLIVTDSFYVAESANKELPYWKSNGFLNNK




KKPLRHVSKWKSIAECLQLKPDIIIMHEKGHQQPMTTLHTEGNNLADKLATQGSYVVH





SFV1_P23074-Pro_2mut
8,109
VPWLMKKPLQLTVLVPLHEYQERLLQQTALPKEQKELLQKLFLKYDALWQHWENQVGHRRIKPHNIATGTLAPRPQKQYPINPKAKPSIQIVIDDLLKQ

GVLIQQNSTMNTPVYPVPKPDGKWRMVLDYREVNKTIPLIAAQNQHSAGILSSIYRGKYKTTLDLTNGFWAHPITPESYWLTAFTWQGKQYCWTRLPQ

GFLNSPALFNADVVDLLKEIPNVQAYVDDIYISHDDPQEHLEQLEKIFSILLNAGYVVSLKKSEIAQREVEFLGFNITKEGRGLTDTFKQKLLNITPPKDLK




QLQSILGLLNFARNFIPNYSELVKPLYTIVAPANGKFISWTEDNSNQLQHIISVLNQADNLEERNPETRLIIKVNSSPSAGYIRYYNEGSKRPIMYVNYIFSK




AEAKFTQTEKLLTTMHKGLIKAMDLAMGQEILVYSPIVSMTKIQRTPLPERKALPVRWITWMTYLEDPRIQFHYDKSLPELQQIPNVTEDVIAKTKHPSEF




AMVFYTDGSAIKHPDVNKSHSAGMGIAQVQFIPEYKIVHQWSIPLGDHTAQLAEIAAVEFACKKALKISGPVLIVTDSFYVAESANKELPYWKSNGFLNN




KKKPLRHVSKWKSIAECLQLKPDIIIMHEKGHQQPMTTLHTEGNNLADKLATQGSYVVH





SFV1_P23074-Pro_2mutA
8,110
VPWLMKKPLQLTVLVPLHEYQERLLQQTALPKEQKELLQKLFLKYDALWQHWENQVGHRRIKPHNIATGTLAPRPQKQYPINPKAKPSIQIVIDDLLKQ

GVLIQQNSTMNTPVYPVPKPDGKWRMVLDYREVNKTIPLIAAQNQHSAGILSSIYRGKYKTTLDLTNGFWAHPITPESYWLTAFTWQGKQYCWTRLPQ

GFLNSPALFNADVVDLLKEIPNVQAYVDDIYISHDDPQEHLEQLEKIFSILLNAGYVVSLKKSEIAQREVEFLGFNITKEGRGLTDTFKQKLLNITPPKDLK




QLQSILGKLNFARNFIPNYSELVKPLYTIVAPANGKFISWTEDNSNQLQHIISVLNQADNLEERNPETRLIIKVNSSPSAGYIRYYNEGSKRPIMYVNYIFSK




AEAKFTQTEKLLTTMHKGLIKAMDLAMGQEILVYSPIVSMTKIQRTPLPERKALPVRWITWMTYLEDPRIQFHYDKSLPELQQIPNVTEDVIAKTKHPSEF




AMVFYTDGSAIKHPDVNKSHSAGMGIAQVQFIPEYKIVHQWSIPLGDHTAQLAEIAAVEFACKKALKISGPVLIVTDSFYVAESANKELPYWKSNGFLNN




KKKPLRHVSKWKSIAECLQLKPDIIIMHEKGHQQPMTTLHTEGNNLADKLATQGSYVVH





SFV3L_P27401
8,111
MDPLQLLQPLEAEIKGTKLKAHWNSGATITCVPQAFLEEEVPIKNIWIKTIHGEKEQPVYYLTFKIQGRKVEAEVISSPYDYILVSPSDIPWLMKKPLQLTT

LVPLQEYEERLLKQTMLTGSYKEKLQSLFLKYDALWQHWENQVGHRRIKPHHIATGTVNPRPQKQYPINPKAKASIQTVINDLLKQGVLIQQNSIMNTP




VYPVPKPDGKWRMVLDYREVNKTIPLIAAQNQHSAGILSSIFRGKYKTTLDLSNGFWAHSITPESYWLTAFTWLGQQYCWTRLPQGFLNSPALFTADV




VDLLKEVPNVQVYVDDIYISHDDPREHLEQLEKVFSLLLNAGYVVSLKKSEIAQHEVEFLGFNITKEGRGLTETFKQKLLNITPPRDLKQLQSILGLLNFAR




NFIPNFSELVKPLYNIIATANGKYITWTTDNSQQLQNIISMLNSAENLEERNPEVRLIMKVNTSPSAGYIRFYNEFAKRPIMYLNYVYTKAEVKFTNTEKLL




TTIHKGLIKALDLGMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMSYLEDPRIQFHYDKTLPELQQVPTVTDDIIAKIKHPSEFSMVFYTDGSAIKHP




NVNKSHNAGMGIAQVQFKPEFTVINTWSIPLGDHTAQLAEVAAVEFACKKALKIDGPVLIVTDSFYVAESVNKELPYWQSNGFFNNKKKPLKHVSKWK




SIADCIQLKPDIIIIHEKGHQPTASTFHTEGNNLADKLATQGSYVVN





SFV3L_P27401_2mut
8,112
MDPLQLLQPLEAEIKGTKLKAHWNSGATITCVPQAFLEEEVPIKNIWIKTIHGEKEQPVYYLTFKIQGRKVEAEVISSPYDYILVSPSDIPWLMKKPLQLTT

LVPLQEYEERLLKQTMLTGSYKEKLQSLFLKYDALWQHWENQVGHRRIKPHHIATGTVNPRPQKQYPINPKAKASIQTVINDLLKQGVLIQQNSIMNTP

VYPVPKPDGKWRMVLDYREVNKTIPLIAAQNQHSAGILSSIFRGKYKTTLDLSNGFWAHSITPESYWLTAFTWLGQQYCWTRLPQGFLNSPALFNADV




VDLLKEVPNVQVYVDDIYISHDDPREHLEQLEKVFSLLLNAGYVVSLKKSEIAQHEVEFLGFNITKEGRGLTETFKQKLLNITPPRDLKQLQSILGLLNFAR




NFIPNFSELVKPLYNIIATAPGKYITWTTDNSQQLQNIISMLNSAENLEERNPEVRLIMKVNTSPSAGYIRFYNEFAKRPIMYLNYVYTKAEVKFTNTEKLL




TTIHKGLIKALDLGMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMSYLEDPRIQFHYDKTLPELQQVPTVTDDIIAKIKHPSEFSMVFYTDGSAIKHP




NVNKSHNAGMGIAQVQFKPEFTVINTWSIPLGDHTAQLAEVAAVEFACKKALKIDGPVLIVTDSFYVAESVNKELPYWQSNGFFNNKKKPLKHVSKWK




SIADCIQLKPDIIIIHEKGHQPTASTFHTEGNNLADKLATQGSYVVN





SFV3L_P27401_2mutA
8,113
MDPLQLLQPLEAEIKGTKLKAHWNSGATITCVPQAFLEEEVPIKNIWIKTIHGEKEQPVYYLTFKIQGRKVEAEVISSPYDYILVSPSDIPWLMKKPLQLTT

LVPLQEYEERLLKQTMLTGSYKEKLQSLFLKYDALWQHWENQVGHRRIKPHHIATGTVNPRPQKQYPINPKAKASIQTVINDLLKQGVLIQQNSIMNTP

VYPVPKPDGKWRMVLDYREVNKTIPLIAAQNQHSAGILSSIFRGKYKTTLDLSNGFWAHSITPESYWLTAFTWLGQQYCWTRLPQGFLNSPALFNADV




VDLLKEVPNVQVYVDDIYISHDDPREHLEQLEKVFSLLLNAGYVVSLKKSEIAQHEVEFLGFNITKEGRGLTETFKQKLLNITPPRDLKQLQSILGKLNFA




RNFIPNFSELVKPLYNIIATAPGKYITWTTDNSQQLQNIISMLNSAENLEERNPEVRLIMKVNTSPSAGYIRFYNEFAKRPIMYLNYVYTKAEVKFTNTEKL




LTTIHKGLIKALDLGMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMSYLEDPRIQFHYDKTLPELQQVPTVTDDIIAKIKHPSEFSMVFYTDGSAIKH




PNVNKSHNAGMGIAQVQFKPEFTVINTWSIPLGDHTAQLAEVAAVEFACKKALKIDGPVLIVTDSFYVAESVNKELPYWQSNGFFNNKKKPLKHVSKW




KSIADCIQLKPDIIIIHEKGHQPTASTFHTEGNNLADKLATQGSYVVN





SFV3L_P27401-Pro
8,114
IPWLMKKPLQLTTLVPLQEYEERLLKQTMLTGSYKEKLQSLFLKYDALWQHWENQVGHRRIKPHHIATGTVNPRPQKQYPINPKAKASIQTVINDLLKQ

GVLIQQNSIMNTPVYPVPKPDGKWRMVLDYREVNKTIPLIAAQNQHSAGILSSIFRGKYKTTLDLSNGFWAHSITPESYWLTAFTWLGQQYCWTRLPQ

GFLNSPALFTADVVDLLKEVPNVQVYVDDIYISHDDPREHLEQLEKVFSLLLNAGYVVSLKKSEIAQHEVEFLGFNITKEGRGLTETFKQKLLNITPPRDL




KQLQSILGLLNFARNFIPNFSELVKPLYNIIATANGKYITWTTDNSQQLQNIISMLNSAENLEERNPEVRLIMKVNTSPSAGYIRFYNEFAKRPIMYLNYVY




TKAEVKFTNTEKLLTTIHKGLIKALDLGMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMSYLEDPRIQFHYDKTLPELQQVPTVTDDIIAKIKHPSEF




SMVFYTDGSAIKHPNVNKSHNAGMGIAQVQFKPEFTVINTWSIPLGDHTAQLAEVAAVEFACKKALKIDGPVLIVTDSFYVAESVNKELPYWQSNGFFN




NKKKPLKHVSKWKSIADCIQLKPDIIIIHEKGHQPTASTFHTEGNNLADKLATQGSYVVN





SFV3L_P27401-Pro_2mut
8,115
IPWLMKKPLQLTTLVPLQEYEERLLKQTMLTGSYKEKLQSLFLKYDALWQHWENQVGHRRIKPHHIATGTVNPRPQKQYPINPKAKASIQTVINDLLKQ

GVLIQQNSIMNTPVYPVPKPDGKWRMVLDYREVNKTIPLIAAQNQHSAGILSSIFRGKYKTTLDLSNGFWAHSITPESYWLTAFTWLGQQYCWTRLPQ

GFLNSPALFNADVVDLLKEVPNVQVYVDDIYISHDDPREHLEQLEKVFSLLLNAGYVVSLKKSEIAQHEVEFLGFNITKEGRGLTETFKQKLLNITPPRDL




KQLQSILGLLNFARNFIPNFSELVKPLYNIIATAPGKYITWTTDNSQQLQNIISMLNSAENLEERNPEVRLIMKVNTSPSAGYIRFYNEFAKRPIMYLNYVY




TKAEVKFTNTEKLLTTIHKGLIKALDLGMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMSYLEDPRIQFHYDKTLPELQQVPTVTDDIIAKIKHPSEF




SMVFYTDGSAIKHPNVNKSHNAGMGIAQVQFKPEFTVINTWSIPLGDHTAQLAEVAAVEFACKKALKIDGPVLIVTDSFYVAESVNKELPYWQSNGFFN




NKKKPLKHVSKWKSIADCIQLKPDIIIIHEKGHQPTASTFHTEGNNLADKLATQGSYVVN





SFV3L_P27401-Pro_2mutA
8,116
IPWLMKKPLQLTTLVPLQEYEERLLKQTMLTGSYKEKLQSLFLKYDALWQHWENQVGHRRIKPHHIATGTVNPRPQKQYPINPKAKASIQTVINDLLKQ

GVLIQQNSIMNTPVYPVPKPDGKWRMVLDYREVNKTIPLIAAQNQHSAGILSSIFRGKYKTTLDLSNGFWAHSITPESYWLTAFTWLGQQYCWTRLPQ

GFLNSPALFNADVVDLLKEVPNVQVYVDDIYISHDDPREHLEQLEKVFSLLLNAGYVVSLKKSEIAQHEVEFLGFNITKEGRGLTETFKQKLLNITPPRDL




KQLQSILGKLNFARNFIPNFSELVKPLYNIIATAPGKYITWTTDNSQQLQNIISMLNSAENLEERNPEVRLIMKVNTSPSAGYIRFYNEFAKRPIMYLNYVY




TKAEVKFTNTEKLLTTIHKGLIKALDLGMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMSYLEDPRIQFHYDKTLPELQQVPTVTDDIIAKIKHPSEF




SMVFYTDGSAIKHPNVNKSHNAGMGIAQVQFKPEFTVINTWSIPLGDHTAQLAEVAAVEFACKKALKIDGPVLIVTDSFYVAESVNKELPYWQSNGFFN




NKKKPLKHVSKWKSIADCIQLKPDIIIIHEKGHQPTASTFHTEGNNLADKLATQGSYVVN





SFVCP_Q87040
8,117
MNPLQLLQPLPAEVKGTKLLAHWNSGATITCIPESFLEDEQPIKQTLIKTIHGEKQQNVYYLTFKVKGRKVEAEVIASPYEYILLSPTDVPWLTQQPLQLTI

LVPLQEYQDRILNKTALPEEQKQQLKALFTKYDNLWQHWENQVGHRKIRPHNIATGDYPPRPQKQYPINPKAKPSIQIVIDDLLKQGVLTPQNSTMNTP




VYPVPKPDGRWRMVLDYREVNKTIPLTAAQNQHSAGILATIVRQKYKTTLDLANGFWAHPITPDSYWLTAFTWQGKQYCWTRLPQGFLNSPALFTAD




AVDLLKEVPNVQVYVDDIYLSHDNPHEHIQQLEKVFQILLQAGYVVSLKKSEIGQRTVEFLGFNITKEGRGLTDTFKTKLLNVTPPKDLKQLQSILGLLNF




ARNFIPNFAELVQTLYNLIASSKGKYIEWTEDNTKQLNKVIEALNTASNLEERLPDQRLVIKVNTSPSAGYVRYYNESGKKPIMYLNYVFSKAELKFSMLE




KLLTTMHKALIKAMDLAMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMTYLEDPRIQFHYDKTLPELKHIPDVYTSSIPPLKHPSQYEGVFCTDGSA




IKSPDPTKSNNAGMGIVHAIYNPEYKILNQWSIPLGHHTAQMAEIAAVEFACKKALKVPGPVLVITDSFYVAESANKELPYWKSNGFVNNKKEPLKHISK




WKSIAECLSIKPDITIQHEKGHQPINTSIHTEGNALADKLATQGSYVVN





SFVCP_Q87040_2mut
8,118
MNPLQLLQPLPAEVKGTKLLAHWNSGATITCIPESFLEDEQPIKQTLIKTIHGEKQQNVYYLTFKVKGRKVEAEVIASPYEYILLSPTDVPWLTQQPLQLTI

LVPLQEYQDRINKTALPEEQKQQLKALFTKYDNLWQHWENQVGHRKIRPHNIATGDYPPRPQKQYPINPKAKPSIQIVIDDLLKQGVLTPQNSTMNTP

VYPVPKPDGRWRMVLDYREVNKTIPLTAAQNQHSAGILATIVRQKYKTTLDLANGFWAHPITPDSYWLTAFTWQGKQYCWTRLPQGFLNSPALFNAD




AVDLLKEVPNVQVYVDDIYLSHDNPHEHIQQLEKVFQILLQAGYVVSLKKSEIGQRTVEFLGFNITKEGRGLTDTFKTKLLNVTPPKDLKQLQSILGLLNF




ARNFIPNFAELVQTLYNLIASSPGKYIEWTEDNTKQLNKVIEALNTASNLEERLPDQRLVIKVNTSPSAGYVRYYNESGKKPIMYLNYVFSKAELKFSMLE




KLLTTMHKALIKAMDLAMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMTYLEDPRIQFHYDKTLPELKHIPDVYTSSIPPLKHPSQYEGVFCTDGSA




IKSPDPTKSNNAGMGIVHAIYNPEYKILNQWSIPLGHHTAQMAEIAAVEFACKKALKVPGPVLVITDSFYVAESANKELPYWKSNGFVNNKKEPLKHISK




WKSIAECLSIKPDITIQHEKGHQPINTSIHTEGNALADKLATQGSYVVN





SFVCP_Q87040_2mutA
8,119
MNPLQLLQPLPAEVKGTKLLAHWNSGATITCIPESFLEDEQPIKQTLIKTIHGEKQQNVYYLTFKVKGRKVEAEVIASPYEYILLSPTDVPWLTQQPLQLTI

LVPLQEYQDRILNKTALPEEQKQQLKALFTKYDNLWQHWENQVGHRKIRPHNIATGDYPPRPQKQYPINPKAKPSIQIVIDDLLKQGVLTPQNSTMNTP

VYPVPKPDGRWRMVLDYREVNKTIPLTAAQNQHSAGILATIVRQKYKTTLDLANGFWAHPITPDSYWLTAFTWQGKQYCWTRLPQGFLNSPALFNAD




AVDLLKEVPNVQVYVDDIYLSHDNPHEHIQQLEKVFQILLQAGYVVSLKKSEIGQRTVEFLGFNITKEGRGLTDTFKTKLLNVTPPKDLKQLQSILGKLNF




ARNFIPNFAELVQTLYNLIASSPGKYIEWTEDNTKQLNKVIEALNTASNLEERLPDQRLVIKVNTSPSAGYVRYYNESGKKPIMYLNYVFSKAELKFSMLE




KLLTTMHKALIKAMDLAMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMTYLEDPRIQFHYDKTLPELKHIPDVYTSSIPPLKHPSQYEGVFCTDGSA




IKSPDPTKSNNAGMGIVHAIYNPEYKILNQWSIPLGHHTAQMAEIAAVEFACKKALKVPGPVLVITDSFYVAESANKELPYWKSNGFVNNKKEPLKHISK




WKSIAECLSIKPDITIQHEKGHQPINTSIHTEGNALADKLATQGSYVVN





SFVCP_Q87040-Pro
8,120
VPWLTQQPLQLTILVPLQEYQDRILNKTALPEEQKQQLKALFTKYDNLWQHWENQVGHRKIRPHNIATGDYPPRPQKQYPINPKAKPSIQIVIDDLLKQG

VLTPQNSTMNTPVYPVPKPDGRWRMVLDYREVNKTIPLTAAQNQHSAGILATIVRQKYKTTLDLANGFWAHPITPDSYWLTAFTWQGKQYCWTRLPQ

GFLNSPALFTADAVDLLKEVPNVQVYVDDIYLSHDNPHEHIQQLEKVFQILLQAGYVVSLKKSEIGQRTVEFLGFNITKEGRGLTDTFKTKLLNVTPPKDL




KQLQSILGLLNFARNFIPNFAELVQTLYNLIASSKGKYIEWTEDNTKQLNKVIEALNTASNLEERLPDQRLVIKVNTSPSAGYVRYYNESGKKPIMYLNYV




FSKAELKFSMLEKLLTTMHKALIKAMDLAMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMTYLEDPRIQFHYDKTLPELKHIPDVYTSSIPPLKHPS




QYEGVFCTDGSAIKSPDPTKSNNAGMGIVHAIYNPEYKILNQWSIPLGHHTAQMAEIAAVEFACKKALKVPGPVLVITDSFYVAESANKELPYWKSNGF




VNNKKEPLKHISKWKSIAECLSIKPDITIQHEKGHQPINTSIHTEGNALADKLATQGSYVVN





SFVCP_Q87040-Pro_2mut
8,121
VPWLTQQPLQLTILVPLQEYQDRILNKTALPEEQKQQLKALFTKYDNLWQHWENQVGHRKIRPHNIATGDYPPRPQKQYPINPKAKPSIQIVIDDLLKQG

VLTPQNSTMNTPVYPVPKPDGRWRMVLDYREVNKTIPLTAAQNQHSAGILATIVRQKYKTTLDLANGFWAHPITPDSYWLTAFTWQGKQYCWTRLPQ

GFLNSPALFNADAVDLLKEVPNVQVYVDDIYLSHDNPHEHIQQLEKVFQILLQAGYVVSLKKSEIGQRTVEFLGFNITKEGRGLTDTFKTKLLNVTPPKDL




KQLQSILGLLNFARNFIPNFAELVQTLYNLIASSPGKYIEWTEDNTKQLNKVIEALNTASNLEERLPDQRLVIKVNTSPSAGYVRYYNESGKKPIMYLNYV




FSKAELKFSMLEKLLTTMHKALIKAMDLAMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMTYLEDPRIQFHYDKTLPELKHIPDVYTSSIPPLKHPS




QYEGVFCTDGSAIKSPDPTKSNNAGMGIVHAIYNPEYKILNQWSIPLGHHTAQMAEIAAVEFACKKALKVPGPVLVITDSFYVAESANKELPYWKSNGF




VNNKKEPLKHISKWKSIAECLSIKPDITIQHEKGHQPINTSIHTEGNALADKLATQGSYVVN





SFVCP_Q87040-Pro_2mutA
8,122
VPWLTQQPLQLTILVPLQEYQDRILNKTALPEEQKQQLKALFTKYDNLWQHWENQVGHRKIRPHNIATGDYPPRPQKQYPINPKAKPSIQIVIDDLLKQG

VLTPQNSTMNTPVYPVPKPDGRWRMVLDYREVNKTIPLTAAQNQHSAGILATIVRQKYKTTLDLANGFWAHPITPDSYWLTAFTWQGKQYCWTRLPQ

GFLNSPALFNADAVDLLKEVPNVQVYVDDIYLSHDNPHEHIQQLEKVFQILLQAGYVVSLKKSEIGQRTVEFLGFNITKEGRGLTDTFKTKLLNVTPPKDL




KQLQSILGKLNFARNFIPNFAELVQTLYNLIASSPGKYIEWTEDNTKQLNKVIEALNTASNLEERLPDQRLVIKVNTSPSAGYVRYYNESGKKPIMYLNYV




FSKAELKFSMLEKLLTTMHKALIKAMDLAMGQEILVYSPIVSMTKIQKTPLPERKALPIRWITWMTYLEDPRIQFHYDKTLPELKHIPDVYTSSIPPLKHPS




QYEGVFCTDGSAIKSPDPTKSNNAGMGIVHAIYNPEYKILNQWSIPLGHHTAQMAEIAAVEFACKKALKVPGPVLVITDSFYVAESANKELPYWKSNGF




VNNKKEPLKHISKWKSIAECLSIKPDITIQHEKGHQPINTSIHTEGNALADKLATQGSYVVN





SMRVH_P03364
8,123
PRSRAIDIPVPHADKISWKITDPVWVDQWPLTYEKTLAAIALVQEQLAAGHIEPTNSPWNTPIFIIKKKSGSWRLLQDLRAVNKVMVPMGALQPGLPSPV

AIPLNYHKIVIDLKDCFFTIPLHPEDRPYFAFSVPQINFQSPMPRYQWKVLPQGMANSPTLCQKFVAAAIAPVRSQWPEAYILHYMDDILLACDSAEAAK




ACYAHIISCLTSYGLKIAPDKVQVSEPFSYLGFELHHQQVFTPRVCLKTDHLKTLNDFQKLLGDIQWLRPYLKLPTSALVPLNNILKGDPNPLSVRALTPE




AKQSLALINKAIQNQSVQQISYNLPLVLLLLPTPHTPTAVFWQPNGTDPTKNGSPLLWLHLPASPSKVLLTYPSLLAMLIIKGRYTGRQLFGRDPHSIIIPY




TQDQLTWLLQTSDEWAIALSSFTGDIDNHYPSDPVIQFAKLHQFIFPKITKCAPIPQATLVFTDGSSNGIAAYVIDNQPISIKSPYLSAQLVELYAILQVFTV




LAHQPFNLYTDSAYIAQSVPLLETVPFIKSSTNATPLFSKLQQLILNRQHPFFIGHLRAHLNLPGPLAEGNALADAATQIFPIISD





SMRVH_P03364_2mut
8,124
PRSRAIDIPVPHADKISWKITDPVWVDQWPLTYEKTLAAIALVQEQLAAGHIEPTNSPWNTPIFIIKKKSGSWRLLQDLRAVNKVMVPMGALQPGLPSPV

AIPLNYHKIVIDLKDCFFTIPLHPEDRPYFAFSVPQINFQSPMPRYQWKVLPQGMANSPTLCQKFVAAAIAPVRSQWPEAYILHYMDDILLACDSAEAAK

ACYAHIISCLTSYGLKIAPDKVQVSEPFSYLGFELHHQQVFTPRVCLKTDHLKTLNDFQKLLGDIQWLRPYLKLPTSALVPLNNILKPDPNPLSVRALTPE




AKQSLALINKAIQNQSVQQISYNLPLVLLLLPTPHTPTAVFWQPNGTDPTKNGSPLLWLHLPASPSKVLLTYPSLLAMLIIKGRYTGRQLFGRDPHSIIIPY




TQDQLTWLLQTSDEWAIALSSFTGDIDNHYPSDPVIQFAKLHQFIFPKITKCAPIPQATLVFTDGSSNGIAAYVIDNQPISIKSPYLSAQLVELYAILQVFTV




LAHQPFNLYTDSAYIAQSVPLLETVPFIKSSTNATPLFSKLQQLILNRQHPFFIGHLRAHLNLPGPLAEGNALADAATQIFPIISD





SMRVH_P03364_2mutB
8,125
PRSRAIDIPVPHADKISWKITDPVWVDQWPLTYEKTLAAIALVQEQLAAGHIEPTNSPWNTPIFIIKKKSGSWRLLQDLRAVNKVMVPMGALQPGLPSPV

APPLNYHKIVIDLKDCFFTIPLHPEDRPYFAFSVPQINFQSPMPRYQWKVLPQGMANSPTLCQKFVAAAIAPVRSQWPEAYILHYMDDILLACDSAEAAK

ACYAHIISCLTSYGLKIAPDKVQVSEPFSYLGFELHHQQVFTPRVCLKTDHLKTLNDFQKLLGDIQWLRPYLKLPTSALVPLNNILKPDPNPLSVRALTPE




AKQSLALINKAIQNQSVQQISYNLPLVLLLLPTPHTPTAVFWQPNGTDPTKNGSPLLWLHLPASPSKVLLTYPSLLAMLIIKGRYTGRQLFGRDPHSIIIPY




TQDQLTWLLQTSDEWAIALSSFTGDIDNHYPSDPVIQFAKLHQFIFPKITKCAPIPQATLVFTDGSSNGIAAYVIDNQPISIKSPYLSAQLVELYAILQVFTV




LAHQPFNLYTDSAYIAQSVPLLETVPFIKSSTNATPLFSKLQQLILNRQHPFFIGHLRAHLNLPGPLAEGNALADAATQIFPIISD





SRV2_P51517
8,126
LATAVDILAPQRYADPITWKSDEPVWVDQWPLTQEKLAAAQQLVQEQLQAGHIIESNSPWNTPIFVIKKKSGKWRLLQDLRAVNATMVLMGALQPGLP

SPVAIPQGYFKIVIDLKDCFFTIPLQPVDQKRFAFSLPSTNFKQPMKRYQWKVLPQGMANSPTLCQKYVAAAIEPVRKSWAQMYIIHYMDDILIAGKLGE




QVLQCFAQLKQALTTTGLQIAPEKVQLQDPYTYLGFQINGPKITNQKAVIRRDKLQTLNDFQKLLGDINWLRPYLHLTTGDLKPLFDILKGDSNPNSPRS




LSEAALASLQKVETAIAEQFVTQIDYTQPLTFLIFNTTLTPTGLFWQNNPVMWVHLPASPKKVLLPYYDAIADLIILGRDNSKKYFGLEPSTIIQPYSKSQIH




WLMQNTETWPIACASYAGNIDNHYPPNKLIQFCKLHAVVFPRIISKTPLDNALLVFTDGSSTGIAAYTFEKTTVRFKTSHTSAQLVELQALIAVLSAFPHR




ALNVYTDSAYLAHSIPLLETVSHIKHISDTAKFFLQCQQLIYNRSIPFYLGHIRAHSGLPGPLSQGNHITDLATKVVATTLTT





SRV2_P51517_2mutB
8,127
LATAVDILAPQRYADPITWKSDEPVWVDQWPLTQEKLAAAQQLVQEQLQAGHIIESNSPWNTPIFVIKKKSGKWRLLQDLRAVNATMVLMGALQPGLP

SPVAPPQGYFKIVIDLKDCFFTIPLQPVDQKRFAFSLPSTNFKQPMKRYQWKVLPQGMANSPTLCQKYVAAAIEPVRKSWAQMYIIHYMDDILIAGKLGE

QVLQCFAQLKQALTTTGLQIAPEKVQLQDPYTYLGFQINGPKITNQKAVIRRDKLQTLNDFQKLLGDINWLRPYLHLTTGDLKPLFDILKGDSNPNSPRS




LSEAALASLQKVETAIAEQFVTQIDYTQPLTFLIFNTTLTPTGLFWQNNPVMWVHLPASPKKVLLPYYDAIADLIILGRDNSKKYFGLEPSTIIQPYSKSQIH




WLMQNTETWPIACASYAGNIDNHYPPNKLIQFCKLHAVVFPRIISKTPLDNALLVFTDGSSTGIAAYTFEKTTVRFKTSHTSAQLVELQALIAVLSAFPHR




ALNVYTDSAYLAHSIPLLETVSHIKHISDTAKFFLQCQQLIYNRSIPFYLGHIRAHSGLPGPLSQGNHITDLATKVVATTLTT





WDSV_O92815
8,128
SCQTKNTLNIDEYLLQFPDQLWASLPTDIGRMLVPPITIKIKDNASLPSIRQYPLPKDKTEGLRPLISSLENQGILIKCHSPCNTPIFPIKKAGRDEYRMIHD

LRAINNIVAPLTAVVASPTTVLSNLAPSLHWFTVIDLSNAFFSVPIHKDSQYLFAFTFEGHQYTWTVLPQGFIHSPTLFSQALYQSLHKIKFKISSEICIYMD




DVLIASKDRDTNLKDTAVMLQHLASEGHKVSKKKLQLCQQEVVYLGQLLTPEGRKILPDRKVTVSQFQQPTTIRQIRAFLGLVGYCRHWIPEFSIHSKFL




EKQLKKDTAEPFQLDDQQVEAFNKLKHAITTAPVLWPDPAKPFQLYTSHSEHASIAVLTQKHAGRTRPIAFLSSKFDAIESGLPPCLKACASIHRSLTQA




DSFILGAPLIIYTTHAICTLLQRDRSQLVTASRFSKWEADLLRPELTFVACSAVSPAHLYMQSCENNIPPHDCVLLTHTISRPRPDLSDLPIPDPDMTLFSD




GSYTTGRGGAAVVMHRPVTDDFIIIHQQPGGASAQTAELLALAAACHLATDKTVNIYTDSRYAYGWHDFGHLWMHRGFVTSAGTPIKNHKEIEYLLKQ




IMKPKQVSVIKIEAHTKGVSMEVRGNAAADEAAKNAVFLVQR





WDSV_O92815_2mut
8,129
SCQTKNTLNIDEYLLQFPDQLWASLPTDIGRMLVPPITIKIKDNASLPSIRQYPLPKDKTEGLRPLISSLENQGILIKCHSPCNTPIFPIKKAGRDEYRMIHD

LRAINNIVAPLTAVVASPTTVLSNLAPSLHWFTVIDLSNAFFSVPIHKDSQYLFAFTFEGHQYTWTVLPQGFIHSPTLFNQALYQSLHKIKFKISSEICIYMD

DVLIASKDRDTNLKDTAVMLQHLASEGHKVSKKKLQLCQQEVVYLGQLLTPEGRKILPDRKVTVSQFQQPTTIRQIRAFLGLVGYCRHWIPEFSIHSKFL




EKQLKPDTAEPFQLDDQQVEAFNKLKHAITTAPVLVVPDPAKPFQLYTSHSEHASIAVLTQKHAGRTRPIAFLSSKFDAIESGLPPCLKACASIHRSLTQA




DSFILGAPLIIYTTHAICTLLQRDRSQLVTASRFSKWEADLLRPELTFVACSAVSPAHLYMQSCENNIPPHDCVLLTHTISRPRPDLSDLPIPDPDMTLFSD




GSYTTGRGGAAVVMHRPVTDDFIIIHQQPGGASAQTAELLALAAACHLATDKTVNIYTDSRYAYGWHDFGHLWMHRGFVTSAGTPIKNHKEIEYLLKQ




IMKPKQVSVIKIEAHTKGVSMEVRGNAAADEAAKNAVFLVQR





WDSV_O92815_2mutA
8,130
SCQTKNTLNIDEYLLQFPDQLWASLPTDIGRMLVPPITIKIKDNASLPSIRQYPLPKDKTEGLRPLISSLENQGILIKCHSPCNTPIFPIKKAGRDEYRMIHD

LRAINNIVAPLTAVVASPTTVLSNLAPSLHWFTVIDLSNAFFSVPIHKDSQYLFAFTFEGHQYTWTVLPQGFIHSPTLFNQALYQSLHKIKFKISSEICIYMD

DVLIASKDRDTNLKDTAVMLQHLASEGHKVSKKKLQLCQQEVVYLGQLLTPEGRKILPDRKVTVSQFQQPTTIRQIRAFLGKVGYCRHFIPEFSIHSKFL




EKQLKPDTAEPFQLDDQQVEAFNKLKHAITTAPVLVVPDPAKPFQLYTSHSEHASIAVLTQKHAGRTRPIAFLSSKFDAIESGLPPCLKACASIHRSLTQA




DSFILGAPLIIYTTHAICTLLQRDRSQLVTASRFSKWEADLLRPELTFVACSAVSPAHLYMQSCENNIPPHDCVLLTHTISRPRPDLSDLPIPDPDMTLFSD




GSYTTGRGGAAVVMHRPVTDDFIIIHQQPGGASAQTAELLALAAACHLATDKTVNIYTDSRYAYGWHDFGHLWMHRGFVTSAGTPIKNHKEIEYLLKQ




IMKPKQVSVIKIEAHTKGVSMEVRGNAAADEAAKNAVFLVQR





WMSV_P03359
8,131
VLNLEEEYRLHEKPVPSSIDPSWLQLFPTVWAERAGMGLANQVPPVVVELRSGASPVAVRQYPMSKEAREGIRPHIQRFLDLGVLVPCQSPWNTPLL

PVKKPGTNDYRPVQDLREINKRVQDIHPTVPNPYNLLSSLPPSHTWYSVLDLKDAFFCLKLHPNSQPLFAFEWRDPEKGNTGQLTWTRLPQGFKNSP




TLFDEALHRDLAPFRALNPQVVLLQYVDDLLVAAPTYRDCKEGTQKLLQELSKLGYRVSAKKAQLCQKEVTYLGYLLKEGKRWLTPARKATVMKIPPP




TTPRQVREFLGTAGFCRLWIPGFASLAAPLYPLTKESIPFIWTEEHQKAFDRIKEALLSAPALALPDLTKPFTLYVDERAGVARGVLTQTLGPWRRPVAY




LSKKLDPVASGWPTCLKAVAAVALLLKDADKLTLGQNVTVIASHSLESIVRQPPDRWMTNARMTHYQSLLLNERVSFAPPAVLNPATLLPVESEATPVH




RCSEILAEETGTRRDLKDQPLPGVPAWYTDGSSFIAEGKRRAGAAIVDGKRTVWASSLPEGTSAQKAELVALTQALRLAEGKDINIYTDSRYAFATAHI




HGAIYKQRGLLTSAGKDIKNKEEILALLEAIHLPKRVAIIHCPGHQKGNDPVATGNRRADEAAKQAALSTRVLAETTKP





WMSV_P03359_3mut
8,132
VLNLEEEYRLHEKPVPSSIDPSWLQLFPTVWAERAGMGLANQVPPVVVELRSGASPVAVRQYPMSKEAREGIRPHIQRFLDLGVLVPCQSPWNTPLL

PVKKPGTNDYRPVQDLREINKRVQDIHPTVPNPYNLLSSLPPSHTWYSVLDLKDAFFCLKLHPNSQPLFAFEWRDPEKGNTGQLTWTRLPQGFKNSP

TLFNEALHRDLAPFRALNPQVVLLQYVDDLLVAAPTYRDCKEGTQKLLQELSKLGYRVSAKKAQLCQKEVTYLGYLLKEGKRWLTPARKATVMKIPPP




TTPRQVREFLGTAGFCRLWIPGFASLAAPLYPLTKPSIPFIWTEEHQKAFDRIKEALLSAPALALPDLTKPFTLYVDERAGVARGVLTQTLGPWRRPVAY




LSKKLDPVASGWPTCLKAVAAVALLLKDADKLTLGQNVTVIASHSLESIVRQPPDRWMTNARMTHYQSLLLNERVSFAPPAVLNPATLLPVESEATPVH




RCSEILAEETGTRRDLKDQPLPGVPAWYTDGSSFIAEGKRRAGAAIVDGKRTVWASSLPEGTSAQKAELVALTQALRLAEGKDINIYTDSRYAFATAHI




HGAIYKQRGWLTSAGKDIKNKEEILALLEAIHLPKRVAIIHCPGHQKGNDPVATGNRRADEAAKQAALSTRVLAETTKP





WMSV_P03359_3mutA
8,133
VLNLEEEYRLHEKPVPSSIDPSWLQLFPTVWAERAGMGLANQVPPVVVELRSGASPVAVRQYPMSKEAREGIRPHIQRFLDLGVLVPCQSPWNTPLL

PVKKPGTNDYRPVQDLREINKRVQDIHPTVPNPYNLLSSLPPSHTWYSVLDLKDAFFCLKLHPNSQPLFAFEWRDPEKGNTGQLTWTRLPQGFKNSP

TLFNEALHRDLAPFRALNPQVVLLQYVDDLLVAAPTYRDCKEGTQKLLQELSKLGYRVSAKKAQLCQKEVTYLGYLLKEGKRWLTPARKATVMKIPPP




TTPRQVREFLGKAGFCRLFIPGFASLAAPLYPLTKPSIPFIWTEEHQKAFDRIKEALLSAPALALPDLTKPFTLYVDERAGVARGVLTQTLGPWRRPVAY




LSKKLDPVASGWPTCLKAVAAVALLLKDADKLTLGQNVTVIASHSLESIVRQPPDRWMTNARMTHYQSLLLNERVSFAPPAVLNPATLLPVESEATPVH




RCSEILAEETGTRRDLKDQPLPGVPAWYTDGSSFIAEGKRRAGAAIVDGKRTVWASSLPEGTSAQKAELVALTQALRLAEGKDINIYTDSRYAFATAHI




HGAIYKQRGWLTSAGKDIKNKEEILALLEAIHLPKRVAIIHCPGHQKGNDPVATGNRRADEAAKQAALSTRVLAETTKP





XMRV6_A1Z651
8,134
TLNIEDEYRLHETSKEPDVPLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT




LFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSEQDCQRGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPK




TPRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVA




YLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPWVALNPATLLPLPEKEA




PHDCLEILAETHGTRPDLTDQPIPDADYTWYTDGSSFLQEGQRRAGAAVTTETEVIWARALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFAT




AHVHGEIYRRRGLLTSEGREIKNKNEILALLKALFLPKRLSIIHCPGHQKGNSAEARGNRMADQAAREAAMKAVLETSTLL





XMRV6_A1Z651_3mut
8,135
TLNIEDEYRLHETSKEPDVPLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSEQDCQRGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPK




TPRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPV




AYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPWVALNPATLLPLPEKE




APHDCLEILAETHGTRPDLTDQPIPDADYTWYTDGSSFLQEGQRRAGAAVTTETEVIWARALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAF




ATAHVHGEIYRRRGWLTSEGREIKNKNEILALLKALFLPKRLSIIHCPGHQKGNSAEARGNRMADQAAREAAMKAVLETSTLL





XMRV6_A1Z651_3mutA
8,136
TLNIEDEYRLHETSKEPDVPLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP

VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPT

LFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSEQDCQRGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPK




TPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVA




YLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQAMLLDTDRVQFGPVVALNPATLLPLPEKEA




PHDCLEILAETHGTRPDLTDQPIPDADYTWYTDGSSFLQEGQRRAGAAVTTETEVIWARALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFAT




AHVHGEIYRRRGWLTSEGREIKNKNEILALLKALFLPKRLSIIHCPGHQKGNSAEARGNRMADQAAREAAMKAVLETSTLL









In some embodiments, a gene modifying polypeptide described herein comprises an RT domain having an amino acid sequence according to Table F3, or a sequence having at least 70%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto. In some embodiments, a nucleic acid described herein encodes an RT domain having an amino acid sequence according to Table F3, or a sequence having at least 70%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identity thereto.
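As an illustration only, percent identity between two sequences that have already been aligned to equal length might be computed as in the following Python sketch. The function names `percent_identity` and `meets_identity_threshold`, the use of `-` as a gap character, and the choice to count every aligned column (other than gap-versus-gap columns) in the denominator are assumptions made for this example and are not taken from the disclosure; the alignment step itself is outside the sketch. The two example fragments are taken from RT domain sequences listed above that differ at a single position.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: both inputs are the same length and use '-' for gaps.
    Columns where both sequences have a gap are skipped; a gap opposite
    a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # gap-versus-gap column carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Fragments of two listed RT variants; they differ at one of 16 positions.
reference = "KQLQSILGLLNFARNF"
variant = "KQLQSILGKLNFARNF"
identity = percent_identity(reference, variant)  # 15 of 16 columns match
print(identity, meets_identity_threshold(reference, variant, 90.0))
```

In this sketch the single substitution leaves the pair above a 90% threshold but below a 95% threshold; over a full-length RT domain the same substitution would change the percentage far less.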









TABLE F3

Exemplary polypeptide sequences

Table F3 provides exemplary polypeptide sequences for use in combination with template sequences described herein for use in correcting the pathogenic F508del mutation in CFTR.















RNAIVT
Plasmid
Name
Polypeptide amino acid sequence
SEQ ID NO





RNAV252
PLV9103
pT7RNA_ModUTRs-SpCas9-WT_MLVMS
SEQ ID NO: 19543
MPAAKRVKLDGGDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS
GETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNI
VDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQL
VQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNF






DLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIK






RYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL






LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLAR






GNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE






LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNA






SLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRY






TGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHE






HIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGI






KELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDN






KVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK






RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA






HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEI






TLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSD






KLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLE






AKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSP






EDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTN






LGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSSGGSSGSETPGTS






ESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIP






LKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREV






NKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLT






WTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNL






GYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIP






GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGV






LTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALV






KQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDL






TDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAE






GKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKG






HSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFESPKKKA






KVE*






RNAV209
PLV8279
pT7RNA ModUTRs_N863A-cMyc-CAS9_MLVMS-bi-SV40A5 NLS (Prop_design3)_BspQI
SEQ ID NO: 19544
MPAAKRVKLDGGDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS
GETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNI
VDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQL
VQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNF
DLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIK
RYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLAR
GNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE
LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNA
SLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRY
TGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHE
HIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGI






KELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDN






KVLTRSDKARGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK






RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA






HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEI






TLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSD






KLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLE






AKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSP






EDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTN






LGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSSGGSSGSETPGTS






ESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIP






LKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREV






NKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLT






WTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNL






GYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIP






GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGV






LTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALV






KQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDL






TDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAE






GKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKG






HSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFESPKKKA






KVE*






RNAV253
PLV9106
pT7RNA_ModUTRs-St1-WT_MLVMS
SEQ ID NO: 19545
MPAAKRVKLDGGSDLVLGLDIGIGSVGVGILNKVTGEIIHKNSRIFPAAQAENNLVRRTNRQGRRL
ARRKKHRRVRLNRLFEESGLITDFTKISINLNPYQLRVKGLTDELSNEELFIALKNMVKHRGISYLD
DASDDGNSSVGDYAQIVKENSKQLETKTPGQIQLERYQTYGQLRGDFTVEKDGKKHRLINVFPTSA
YRSEALRILQTQQEFNPQITDEFINRYLEILTGKRKYYHGPGNEKSRTDYGRYRTSGETLDNIFGILIG






KCTFYPDEFRAAKASYTAQEFNLLNDLNNLTVPTETKKLSKEQKNQIINYVKNEKAMGPAKLFKYI






AKLLSCDVADIKGYRIDKSGKAEIHTFEAYRKMKTLETLDIEQMDRETLDKLAYVLTLNTEREGIQ






EALEHEFADGSFSQKQVDELVQFRKANSSIFGKGWHNFSVKLMMELIPELYETSEEQMTILTRLGK






QKTTSSSNKTKYIDEKLLTEEIYNPVVAKSVRQAIKIVNAAIKEYGDFDNIVIEMARETNEDDEKKAI






QKIQKANKDEKDAAMLKAANQYNGKAELPHSVFHGHKQLATKIRLWHQQGERCLYTGKTISIHD






LINNSNQFEVDHILPLSITFDDSLANKVLVYATANQEKGQRTPYQALDSMDDAWSFRELKAFVRES






KTLSNKKKEYLLTEEDISKFDVRKKFIERNLVDTRYASRVVLNALQEHFRAHKIDTKVSVVRGQFT






SQLRRHWGIEKTRDTYHHHAVDALIIAASSQLNLWKKQKNTLVSYSEDQLLDIETGELISDDEYKE






SVFKAPYQHFVDTLKSKEFEDSILFSYQVDSKFNRKISDATIYATRQAKVGKDKADETYVLGKIKDI






YTQDGYDAFMKIYKKDKSKFLMYRHDPQTFEKVIEPILENYPNKQINEKGKEVPCNPFLKYKEEHG






YIRKYSKKGNGPEIKSLKYYDSKLGNHIDITPKDSNNKVVLQSVSPWRADVYFNKTTGKYEILGLK






YADLQFEKGTGTYKISQEKYNDIKKKEGVDSDSEFKFTLYKNDLLLVKDTETKEQQLFRFLSRTMP






KQKHYVELKPYDKQKFEGGEALIKVLGNVANSGQCKKGLGKSNISIYKVRTDVLGNQHIIKNEGD






KPKLDFSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFP






QAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTP






LLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPT






SQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAA






TSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTP






KTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLP






DLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAG






KLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEE






GLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPA






GTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILAL






LKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADG






SEFEKRTADGSEFESPKKKAKVE*






RNAV236
PLV8388
pT7RNA ModUTRs-St1Cas9-N622A_MLVMS_BspQ1_BtgZI
SEQ ID NO: 19546
MPAAKRVKLDGGSDLVLGLDIGIGSVGVGILNKVTGEIIHKNSRIFPAAQAENNLVRRTNRQGRRL
ARRKKHRRVRLNRLFEESGLITDFTKISINLNPYQLRVKGLTDELSNEELFIALKNMVKHRGISYLD
DASDDGNSSVGDYAQIVKENSKQLETKTPGQIQLERYQTYGQLRGDFTVEKDGKKHRLINVFPTSA
YRSEALRILQTQQEFNPQITDEFINRYLEILTGKRKYYHGPGNEKSRTDYGRYRTSGETLDNIFGILIG
KCTFYPDEFRAAKASYTAQEFNLLNDLNNLTVPTETKKLSKEQKNQIINYVKNEKAMGPAKLFKYI
AKLLSCDVADIKGYRIDKSGKAEIHTFEAYRKMKTLETLDIEQMDRETLDKLAYVLTLNTEREGIQ
EALEHEFADGSFSQKQVDELVQFRKANSSIFGKGWHNFSVKLMMELIPELYETSEEQMTILTRLGK






QKTTSSSNKTKYIDEKLLTEEIYNPVVAKSVRQAIKIVNAAIKEYGDFDNIVIEMARETNEDDEKKAI






QKIQKANKDEKDAAMLKAANQYNGKAELPHSVFHGHKQLATKIRLWHQQGERCLYTGKTISIHD






LINNSNQFEVDHILPLSITFDDSLANKVLVYATAAQEKGQRTPYQALDSMDDAWSFRELKAFVRES






KTLSNKKKEYLLTEEDISKFDVRKKFIERNLVDTRYASRVVLNALQEHFRAHKIDTKVSVVRGQFT






SQLRRHWGIEKTRDTYHHHAVDALIIAASSQLNLWKKQKNTLVSYSEDQLLDIETGELISDDEYKE






SVFKAPYQHFVDTLKSKEFEDSILFSYQVDSKFNRKISDATIYATRQAKVGKDKADETYVLGKIKDI






YTQDGYDAFMKIYKKDKSKFLMYRHDPQTFEKVIEPILENYPNKQINEKGKEVPCNPFLKYKEEHG






YIRKYSKKGNGPEIKSLKYYDSKLGNHIDITPKDSNNKVVLQSVSPWRADVYFNKTTGKYEILGLK






YADLQFEKGTGTYKISQEKYNDIKKKEGVDSDSEFKFTLYKNDLLLVKDTETKEQQLFRFLSRTMP






KQKHYVELKPYDKQKFEGGEALIKVLGNVANSGQCKKGLGKSNISIYKVRTDVLGNQHIIKNEGD






KPKLDFSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFP






QAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTP






LLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPT






SQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAA






TSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTP






KTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLP






DLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAG






KLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEE






GLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPA






GTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILAL






LKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADG






SEFEKRTADGSEFESPKKKAKVE*






RNAV254
PLV9096
pT7RNA_ModUTRs-Nme2-WT_MLVMS
SEQ ID NO: 19547
MPAAKRVKLDGGAAFKPNPINYILGLDIGIASVGWAMVEIDEEENPIRLIDLGVRVFERAEVPKTGD
SLAMARRLARSVRRLTRRRAHRLLRARRLLKREGVLQAADFDENGLIKSLPNTPWQLRAAALDRK
LTPLEWSAVLLHLIKHRGYLSQRKNEGETADKELGALLKGVANNAHALQTGDFRTPAELALNKFE
KESGHIRNQRGDYSHTFSRKDLQAELILLFEKQKEFGNPHVSGGLKEGIETLLMTQRPALSGDAVQ






KMLGHCTFEPAEPKAAKNTYTAERFIWLTKLNNLRILEQGSERPLTDTERATLMDEPYRKSKLTYA






QARKLLGLEDTAFFKGLRYGKDNAEASTLMEMKAYHAISRALEKEGLKDKKSPLNLSSELQDEIG






TAFSLFKTDEDITGRLKDRVQPEILEALLKHISFDKFVQISLKALRRIVPLMEQGKRYDEACAEIYGD






HYGKKNTEEKIYLPPIPADEIRNPVVLRALSQARKVINGVVRRYGSPARIHIETAREVGKSFKDRKEI






EKRQEENRKDREKAAAKFREYFPNFVGEPKSKDILKLRLYEQQHGKCLYSGKEINLVRLNEKGYV






EIDHALPFSRTWDDSFNNKVLVLGSENQNKGNQTPYEYFNGKDNSREWQEFKARVETSRFPRSKK






QRILLQKFDEDGFKECNLNDTRYVNRFLCQFVADHILLTGKGKRRVFASNGQITNLLRGFWGLRK






VRAENDRHHALDAVVVACSTVAMQQKITRFVRYKEMNAFDGKTIDKETGKVLHQKTHFPQPWEF






FAQEVMIRVFGKPDGKPEFEEADTPEKLRTLLAEKLSSRPEAVHEYVTPLFVSRAPNRKMSGAHKD






TLRSAKRFVKHNEKISVKRVWLTEIKLADLENMVNYKNGREIELYEALKARLEAYGGNAKQAFDP






KDNPFYKKGGQLVKAVRVEKTQESGVLLNKKNAYTIADNGDMVRVDVFCKVDKKGKNQYFIVPI






YAWQVAENILPDIDCKGYRIDDSYTFCFSLHKYDLIAFQKDEKSKVEFAYYINCDSSNGRFYLAWH






DKGSKEQQFRISTQNLVLIQKYQVNELGKEIRPCRLKKRPPVRSGGSSGGSSGSETPGTSESATPESS






GGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPV






SIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH






PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGF






KNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKK






AQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAP






LYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPW






RRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWL






SNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDA






DHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVY






TDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARG






NRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFESPKKKAKVE*






RNAV237
PLV8379
pT7RNA ModUTRs-Nme2Cas9-N611A_MLVMS_BspQ1_BtgZI
SEQ ID NO: 19548
MPAAKRVKLDGGAAFKPNPINYILGLDIGIASVGWAMVEIDEEENPIRLIDLGVRVFERAEVPKTGD
SLAMARRLARSVRRLTRRRAHRLLRARRLLKREGVLQAADFDENGLIKSLPNTPWQLRAAALDRK
LTPLEWSAVLLHLIKHRGYLSQRKNEGETADKELGALLKGVANNAHALQTGDFRTPAELALNKFE
KESGHIRNQRGDYSHTFSRKDLQAELILLFEKQKEFGNPHVSGGLKEGIETLLMTQRPALSGDAVQ
KMLGHCTFEPAEPKAAKNTYTAERFIWLTKLNNLRILEQGSERPLTDTERATLMDEPYRKSKLTYA
QARKLLGLEDTAFFKGLRYGKDNAEASTLMEMKAYHAISRALEKEGLKDKKSPLNLSSELQDEIG
TAFSLFKTDEDITGRLKDRVQPEILEALLKHISFDKFVQISLKALRRIVPLMEQGKRYDEACAEIYGD






HYGKKNTEEKIYLPPIPADEIRNPVVLRALSQARKVINGVVRRYGSPARIHIETAREVGKSFKDRKEI






EKRQEENRKDREKAAAKFREYFPNFVGEPKSKDILKLRLYEQQHGKCLYSGKEINLVRLNEKGYV






EIDHALPFSRTWDDSFNNKVLVLGSEAQNKGNQTPYEYFNGKDNSREWQEFKARVETSRFPRSKK






QRILLQKFDEDGFKECNLNDTRYVNRFLCQFVADHILLTGKGKRRVFASNGQITNLLRGFWGLRK






VRAENDRHHALDAVVVACSTVAMQQKITRFVRYKEMNAFDGKTIDKETGKVLHQKTHFPQPWEF






FAQEVMIRVFGKPDGKPEFEEADTPEKLRTLLAEKLSSRPEAVHEYVTPLFVSRAPNRKMSGAHKD






TLRSAKRFVKHNEKISVKRVWLTEIKLADLENMVNYKNGREIELYEALKARLEAYGGNAKQAFDP






KDNPFYKKGGQLVKAVRVEKTQESGVLLNKKNAYTIADNGDMVRVDVFCKVDKKGKNQYFIVPI






YAWQVAENILPDIDCKGYRIDDSYTFCFSLHKYDLIAFQKDEKSKVEFAYYINCDSSNGRFYLAWH






DKGSKEQQFRISTQNLVLIQKYQVNELGKEIRPCRLKKRPPVRSGGSSGGSSGSETPGTSESATPESS






GGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPV






SIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH






PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGF






KNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKK






AQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAP






LYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPW






RRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWL






SNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDA






DHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVY






TDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARG






NRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFESPKKKAKVE*






RNAV255
PLV9105
pT7RNA_ModUTRs-Spy-SpRY-WT_MLVMS
SEQ ID NO: 19549
MPAAKRVKLDGGDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS
GETAERTRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNI
VDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQL
VQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNF
DLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIK






RYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL






LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLAR






GNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE






LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNA






SLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRY






TGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHE






HIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGI






KELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDN






KVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK






RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA






HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEI






TLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESIRPKRNSD






KLIARKKDWDPKKYGGFLWPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLE






AKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAKQLQKGNELALPSKYVNFLYLASHYEKLKGS






PEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLT






RLGAPRAFKYFDTTIDPKQYRSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSSGGSSGSETPGT






SESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIP






LKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREV






NKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLT






WTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNL






GYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIP






GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGV






LTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALV






KQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDL






TDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAE






GKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKG






HSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFESPKKKA






KVE*






RNAV256
PLV9095
pT7RNA_ModUTRs-Blat-WT_MLVMS
SEQ ID NO: 19550
MPAAKRVKLDGGAYTMGIDVGIASCGWAIVDLERQRIIDIGVRTFEKAENPKNGEALAVPRREARS
SRRRLRRKKHRIERLKHMFVRNGLAVDIQHLEQTLRSQNEIDVWQLRVDGLDRMLTQKEWLRVLI
HLAQRRGFQSNRKTDGSSEDGQVLVNVTENDRLMEEKDYRTVAEMMVKDEKFSDHKRNKNGNY
HGVVSRSSLLVEIHTLFETQRQHHNSLASKDFELEYVNIWSAQRPVATKDQIEKMIGTCTFLPKEKR






APKASWHFQYFMLLQTINHIRITNVQGTRSLNKEEIEQVVNMALTKSKVSYHDTRKILDLSEEYQF






VGLDYGKEDEKKKVESKETIIKLDDYHKLNKIFNEVELAKGETWEADDYDTVAYALTFFKDDEDI






RDYLQNKYKDSKNRLVKNLANKEYTNELIGKVSTLSFRKVGHLSLKALRKIIPFLEQGMTYDKAC






QAAGFDFQGISKKKRSVVLPVIDQISNPVVNRALTQTRKVINALIKKYGSPETIHIETARELSKTFDE






RKNITKDYKENRDKNEHAKKHLSELGIINPTGLDIVKYKLWCEQQGRCMYSNQPISFERLKESGYT






EVDHIIPYSRSMNDSYNNRVLVMTRENREKGNQTPFEYMGNDTQRWYEFEQRVTTNPQIKKEKRQ






NLLLKGFTNRRELEMLERNLNDTRYITKYLSHFISTNLEFSPSDKKKKVVNTSGRITSHLRSRWGLE






KNRGQNDLHHAMDAIVIAVTSDSFIQQVTNYYKRKERRELNGDDKFPLPWKFFREEVIARLSPNPK






EQIEALPNHFYSEDELADLQPIFVSRMPKRSITGEAHQAQFRRVVGKTKEGKNITAKKTALVDISYD






KNGDFNMYGRETDPATYEAIKERYLEFGGNVKKAFSTDLHKPKKDGTKGPLIKSVRIMENKTLVH






PVNKGKGVVYNSSIVRTDVFQRKEKYYLLPVYVTDVTKGKLPNKVIVAKKGYHDWIEVDDSFTFL






FSLYPNDLIFIRQNPKKKISLKKRIESHSISDSKEVQEIHAYYKGVDSSTAAIEFIIHDGSYYAKGVGV






QNLDCFEKYQVDILGNYFKVKGEKRLELETSDSNHKGKDVNSIKSTSRSGGSSGGSSGSETPGTSES






ATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLK






ATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNK






RVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWT






RLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY






RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGF






AEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLT






QKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQ






PPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTD






QPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEG






KKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGH






SAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFESPKKKAK






VE*






RNAV257
PLV9097
pT7RNA_ModUTRs-Ppn-WT_MLVMS
19551
MPAAKRVKLDGGQNNPLNYILGLDLGIASIGWAVVEIDEESSPIRLIDVGVRTFERAEVAKTGESLA
LSRRLARSSRRLIKRRAERLKKAKRLLKAEKILHSIDEKLPINVWQLRVKGLKEKLERQEWAAVLL
HLSKHRGYLSQRKNEGKSDNKELGALLSGIASNHQMLQSSEYRTPAEIAVKKFQVEEGHIRNQRGS
YTHTFSRLDLLAEMELLFQRQAELGNSYTSTTLLENLTALLMWQKPALAGDAILKMLGKCTFEPSE






YKAAKNSYSAERFVWLTKLNNLRILENGTERALNDNERFALLEQPYEKSKLTYAQVRAMLALSDN






AIFKGVRYLGEDKKTVESKTTLIEMKFYHQIRKTLGSAELKKEWNELKGNSDLLDEIGTAFSLYKT






DDDICRYLEGKLPERVLNALLENLNFDKFIQLSLKALHQILPLMLQGQRYDEAVSAIYGDHYGKKS






TETTRLLPTIPADEIRNPVVLRTLTQARKVINAVVRLYGSPARIHIETAREVGKSYQDRKKLEKQQE






DNRKQRESAVKKFKEMFPHFVGEPKGKDILKMRLYELQQAKCLYSGKSLELHRLLEKGYVEVDH






ALPFSRTWDDSFNNKVLVLANENQNKGNLTPYEWLDGKNNSERWQHFVVRVQTSGFSYAKKQRI






LNHKLDEKGFIERNLNDTRYVARFLCNFIADNMLLVGKGKRNVFASNGQITALLRHRWGLQKVRE






QNDRHHALDAVVVACSTVAMQQKITRFVRYNEGNVFSGERIDRETGEIIPLHFPSPWAFFKENVEIR






IFSENPKLELENRLPDYPQYNHEWVQPLFVSRMPTRKMTGQGHMETVKSAKRLNEGLSVLKVPLT






QLKLSDLERMVNRDREIALYESLKARLEQFGNDPAKAFAEPFYKKGGALVKAVRLEQTQKSGVLV






RDGNGVADNASMVRVDVFTKGGKYFLVPIYTWQVAKGILPNRAATQGKDENDWDIMDEMATFQ






FSLCQNDLIKLVTKKKTIFGYFNGLNRATSNINIKEHDLDKSKGKLGIYLEVGVKLAISLEKYQVDE






LGKNIRPCRPTKRQHVRSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPD






VSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQG






ILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDL






KDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLI






LLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTE






ARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIK






QALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRM






VAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVV






ALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVT






TETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTS






EGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIE






NSSPSGGSKRTADGSEFEKRTADGSEFESPKKKAKVE*






RNAV258
PLV9098
pT7RNA_ModUTRs-Sau-KKH-WT_MLVMS
19552
MPAAKRVKLDGGKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARR
LKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNV
NEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQ
KAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYN






ADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTG






KPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGY






TGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSI






KVINAIIKKYGLPNDIIIELAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLH






DMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQYLSS






SDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNLVDTRYATRGLMNLLR






SYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIANADFIFKEWKKLDKAKK






VMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRKLINDTLYSTRK






DDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKY






YEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGV






YKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYKNDLIKINGELYRVIGVNNDL






LNRIEVNMIDITYREYLENMNDKRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKGSGGS






SGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGG






MGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGT






NDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEW






RDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQ






GTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLRE






FLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELF






VDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPL






VILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLD






ILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAE






LIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPK






RLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTA






DGSEFESPKKKAKVE*






RNAV259
PLV9099
pT7RNA_ModUTRs-Sauri-KKH-WT_MLVMS
19553
MPAAKRVKLDGGQENQQKQNYILGLDIGITSVGYGLIDSKTREVIDAGVRLFPEADSENNSNRRSK
RGARRLKRRRIHRLNRVKDLLADYQMIDLNNVPKSTDPYTIRVKGLREPLTKEEFAIALLHIAKRRG
LHNISVSMGDEEQDNELSTKQQLQKNAQQLQDKYVCELQLERLTNINKVRGEKNRFKTEDFVKEV
KQLCETQRQYHNIDDQFIQQYIDLVSTRREYFEGPGNGSPYGWDGDLLKWYEKLMGRCTYFPEEL
RSVKYAYSADLFNALNDLNNLVVTRDDNPKLEYYEKYHIIENVFKQKKNPTLKQIAKEIGVQDYDI






RGYRITKSGKPQFTSFKLYHDLKNIFEQAKYLEDVEMLDEIAKILTIYQDEISIKKALDQLPELLTESE






KSQIAQLTGYTGTHRLSLKCIHIVIDELWESPENQMEIFTRLNLKPKKVEMSEIDSIPTTLVDEFILSP






VVKRAFIQSIKVINAVINRFGLPEDIIIELAREKNSKDRRKFINKLQKQNEATRKKIEQLLAKYGNTN






AKYMIEKIKLHDMQEGKCLYSLEAIPLEDLLSNPTHYEVDHIIPRSVSFDNSLNNKVLVKQSENSKK






GNRTPYQYLSSNESKISYNQFKQHILNLSKAKDRISKKKRDMLLEERDINKFEVQKEFINRNLVDTR






YATRELSNLLKTYFSTHDYAVKVKTINGGFTNHLRKVWDFKKHRNHGYKHHAEDALVIANADFL






FKTHKALRRTDKILEQPGLEVNDTTVKVDTEEKYQELFETPKQVKNIKQFRDFKYSHRVDKKPNR






KLINDTLYSTREIDGETYVVQTLKDLYAKDNEKVKKLFTERPQKILMYQHDPKTFEKLMTILNQYA






EAKNPLAAYYEDKGEYVTKYAKKGNGPAIHKIKYIDKKLGSYLDVSNKYPETQNKLVKLSLKSFR






FDIYKCEQGYKMVSIGYLDVLKKDNYYYIPKDKYEAEKQKKKIKESDLFVGSFYKNDLIMYEDELF






RVIGVNSDINNLVELNMVDITYKDFCEVNNVTGEKHIKKTIGKRVVLIEKYTTDILGNLYKTPLPKK






PQLIFKRGELSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWL






SDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSP






WNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCL






RLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDD






LLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVM






GQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAP






ALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVL






TKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATL






LPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIW






AKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKN






KDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGS






KRTADGSEFEKRTADGSEFESPKKKAKVE*






RNAV260
PLV9100
pT7RNA_ModUTRs-Sauri-WT_MLVMS
19554
MPAAKRVKLDGGQENQQKQNYILGLDIGITSVGYGLIDSKTREVIDAGVRLFPEADSENNSNRRSK
RGARRLKRRRIHRLNRVKDLLADYQMIDLNNVPKSTDPYTIRVKGLREPLTKEEFAIALLHIAKRRG
LHNISVSMGDEEQDNELSTKQQLQKNAQQLQDKYVCELQLERLTNINKVRGEKNRFKTEDFVKEV
KQLCETQRQYHNIDDQFIQQYIDLVSTRREYFEGPGNGSPYGWDGDLLKWYEKLMGRCTYFPEEL






RSVKYAYSADLFNALNDLNNLVVTRDDNPKLEYYEKYHIIENVFKQKKNPTLKQIAKEIGVQDYDI






RGYRITKSGKPQFTSFKLYHDLKNIFEQAKYLEDVEMLDEIAKILTIYQDEISIKKALDQLPELLTESE






KSQIAQLTGYTGTHRLSLKCIHIVIDELWESPENQMEIFTRLNLKPKKVEMSEIDSIPTTLVDEFILSP






VVKRAFIQSIKVINAVINRFGLPEDIIIELAREKNSKDRRKFINKLQKQNEATRKKIEQLLAKYGNTN






AKYMIEKIKLHDMQEGKCLYSLEAIPLEDLLSNPTHYEVDHIIPRSVSFDNSLNNKVLVKQSENSKK






GNRTPYQYLSSNESKISYNQFKQHILNLSKAKDRISKKKRDMLLEERDINKFEVQKEFINRNLVDTR






YATRELSNLLKTYFSTHDYAVKVKTINGGFTNHLRKVWDFKKHRNHGYKHHAEDALVIANADFL






FKTHKALRRTDKILEQPGLEVNDTTVKVDTEEKYQELFETPKQVKNIKQFRDFKYSHRVDKKPNR






QLINDTLYSTREIDGETYVVQTLKDLYAKDNEKVKKLFTERPQKILMYQHDPKTFEKLMTILNQYA






EAKNPLAAYYEDKGEYVTKYAKKGNGPAIHKIKYIDKKLGSYLDVSNKYPETQNKLVKLSLKSFR






FDIYKCEQGYKMVSIGYLDVLKKDNYYYIPKDKYEAEKQKKKIKESDLFVGSFYYNDLIMYEDELF






RVIGVNSDINNLVELNMVDITYKDFCEVNNVTGEKRIKKTIGKRVVLIEKYTTDILGNLYKTPLPKK






PQLIFKRGELSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWL






SDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSP






WNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCL






RLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDD






LLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVM






GQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAP






ALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVL






TKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATL






LPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIW






AKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKN






KDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGS






KRTADGSEFEKRTADGSEFESPKKKAKVE*






RNAV261
PLV9101
pT7RNA_ModUTRs-Sau-WT_MLVMS
19555
MPAAKRVKLDGGKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARR
LKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNV
NEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQ
KAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYN






ADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTG






KPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGY






TGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSI






KVINAIIKKYGLPNDIIIELAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLH






DMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQYLSS






SDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNLVDTRYATRGLMNLLR






SYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIANADFIFKEWKKLDKAKK






VMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRELINDTLYSTRK






DDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKY






YEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGV






YKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDL






LNRIEVNMIDITYREYLENMNDKRPPRIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKGSGGS






SGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGG






MGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGT






NDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEW






RDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQ






GTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLRE






FLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELF






VDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPL






VILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLD






ILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAE






LIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPK






RLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTA






DGSEFESPKKKAKVE*






RNAV262
PLV9102
pT7RNA_ModUTRs-Sca++-WT_MLVMS
19556
MPAAKRVKLDGGEKKYSIGLDIGTNSVGWAVITDDYKVPSKKFKVLGNTNRKSIKKNLMGALLFD
SGETAEATRLKRTARRRYTRRKNRIRYLQEIFANEMAKLDDSFFQRLEESFLVEEDKKNERHPIFGN
LADEVAYHRNYPTIYHLRKKLADSPEKADLRLIYLALAHIIKFRGHFLIEGKLNAENSDVAKLFYQL
IQTYNQLFEESPLDEIEVDAKGILSARLSKSKRLEKLIAVFPNEKKNGLFGNIIALALGLTPNFKSNFD






LTEDAKLQLSKDTYDDDLDELLGQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMVK






RYDEHHQDLALLKTLVRQQFPEKYAEIFKDDTKNGYAGYVGADKKLRKRSGKLATEEEFYKFIKPI






LEKMDGAEELLAKLNRDDLLRKQRTFDNGSIPHQIHLKELHAILRRQEEFYPFLKENREKIEKILTFR






IPYYVGPLARGNSRFAWLTRKSEEAITPWNFEEVVDKGASAQSFIERMTNFDEQLPNKKVLPKHSL






LYEYFTVYNELTKVKYVTERMRKPEFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVE






IIGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKV






MKQLKRRHYTGWGRLSRKMINGIRDKQSGKTILDFLKSDGFSNRNFMQLIHDDSLTFKEEIEKAQV






SGQGDSLHEQIADLAGSPAIKKGILQTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQQSRER






KKRIEEGIKELESQILKENPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFI






KDDSIDNKVLTRSVENRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSEA






DKAGFIKRQLVETRQITKHVARILDSRMNTKRDKNDKPIREVKVITLKSKLVSDFRKDFQLYKVRDI






NNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSNIM






NFFKTEVKLANGEIRKRPLIETNGETGEVVWNKEKDFATVRKVLAMPQVNIVKKTEVQTGGFSKE






SILSKRESAKLIPRKKGWDTRKYGGFGSPTVAYSILVVAKVEKGKAKKLKSVKVLVGITIMEKGSY






EKDPIGFLEAKGYKDIKKELIFKLPKYSLFELENGRRRMLASAKELQKANELVLPQHLVRLLYYTQ






NISATTGSNNLGYIEQHREEFKEIFEKIIDFSEKYILKNKVNSNLKSSFDEQFAVSDSILLSNSFVSLLK






YTSFGASGGFTFLDLDVKQGRLRYQTVTEVLDATLIYQSITGLYETRTDLSQLGGDSGGSSGGSSGS






ETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQ






APLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQ






DLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGI






SGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQ






TLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGF






CRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQG






YAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHA






VEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG






TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQA






LKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCP






GHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFES






PKKKAKVE*






RNAV263
PLV9104
pT7RNA_ModUTRs-Spy-NG-WT_MLVMS
19557
MPAAKRVKLDGGDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS
GETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNI
VDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQL
VQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNF
DLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIK






RYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL






LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLAR






GNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE






LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNA






SLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRY






TGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHE






HIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGI






KELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDN






KVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK






RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA






HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEI






TLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESIRPKRNSD






KLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLE






AKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARFLQKGNELALPSKYVNFLYLASHYEKLKGSP






EDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTN






LGAPRAFKYFDTTIDRKVYRSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSSGGSSGSETPGTS






ESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIP






LKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREV






NKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLT






WTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNL






GYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIP






GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGV






LTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALV






KQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDL






TDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAE






GKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKG






HSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFESPKKKA






KVE*






RNAV239
PLV8378
pT7RNA ModUTRs-BlatCas9-N607A_MLVMS_BspQI_BtgZI
19558
MPAAKRVKLDGGAYTMGIDVGIASCGWAIVDLERQRIIDIGVRTFEKAENPKNGEALAVPRREARS
SRRRLRRKKHRIERLKHMFVRNGLAVDIQHLEQTLRSQNEIDVWQLRVDGLDRMLTQKEWLRVLI
HLAQRRGFQSNRKTDGSSEDGQVLVNVTENDRLMEEKDYRTVAEMMVKDEKFSDHKRNKNGNY
HGVVSRSSLLVEIHTLFETQRQHHNSLASKDFELEYVNIWSAQRPVATKDQIEKMIGTCTFLPKEKR
APKASWHFQYFMLLQTINHIRITNVQGTRSLNKEEIEQVVNMALTKSKVSYHDTRKILDLSEEYQF
VGLDYGKEDEKKKVESKETIIKLDDYHKLNKIFNEVELAKGETWEADDYDTVAYALTFFKDDEDI
RDYLQNKYKDSKNRLVKNLANKEYTNELIGKVSTLSFRKVGHLSLKALRKIIPFLEQGMTYDKAC






QAAGFDFQGISKKKRSVVLPVIDQISNPVVNRALTQTRKVINALIKKYGSPETIHIETARELSKTFDE






RKNITKDYKENRDKNEHAKKHLSELGIINPTGLDIVKYKLWCEQQGRCMYSNQPISFERLKESGYT






EVDHIIPYSRSMNDSYNNRVLVMTREAREKGNQTPFEYMGNDTQRWYEFEQRVTTNPQIKKEKRQ






NLLLKGFTNRRELEMLERNLNDTRYITKYLSHFISTNLEFSPSDKKKKVVNTSGRITSHLRSRWGLE






KNRGQNDLHHAMDAIVIAVTSDSFIQQVTNYYKRKERRELNGDDKFPLPWKFFREEVIARLSPNPK






EQIEALPNHFYSEDELADLQPIFVSRMPKRSITGEAHQAQFRRVVGKTKEGKNITAKKTALVDISYD






KNGDFNMYGRETDPATYEAIKERYLEFGGNVKKAFSTDLHKPKKDGTKGPLIKSVRIMENKTLVH






PVNKGKGVVYNSSIVRTDVFQRKEKYYLLPVYVTDVTKGKLPNKVIVAKKGYHDWIEVDDSFTFL






FSLYPNDLIFIRQNPKKKISLKKRIESHSISDSKEVQEIHAYYKGVDSSTAAIEFIIHDGSYYAKGVGV






QNLDCFEKYQVDILGNYFKVKGEKRLELETSDSNHKGKDVNSIKSTSRSGGSSGGSSGSETPGTSES






ATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLK






ATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNK






RVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWT






RLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY






RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGF






AEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLT






QKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQ






PPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTD






QPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEG






KKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGH






SAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFESPKKKAK






VE*






RNAV240
PLV8380
pT7RNA ModUTRs-PpnCas9-N605A_MLVMS_BspQ1_BtgZI
19559
MPAAKRVKLDGGQNNPLNYILGLDLGIASIGWAVVEIDEESSPIRLIDVGVRTFERAEVAKTGESLA
LSRRLARSSRRLIKRRAERLKKAKRLLKAEKILHSIDEKLPINVWQLRVKGLKEKLERQEWAAVLL
HLSKHRGYLSQRKNEGKSDNKELGALLSGIASNHQMLQSSEYRTPAEIAVKKFQVEEGHIRNQRGS
YTHTFSRLDLLAEMELLFQRQAELGNSYTSTTLLENLTALLMWQKPALAGDAILKMLGKCTFEPSE
YKAAKNSYSAERFVWLTKLNNLRILENGTERALNDNERFALLEQPYEKSKLTYAQVRAMLALSDN
AIFKGVRYLGEDKKTVESKTTLIEMKFYHQIRKTLGSAELKKEWNELKGNSDLLDEIGTAFSLYKT
DDDICRYLEGKLPERVLNALLENLNFDKFIQLSLKALHQILPLMLQGQRYDEAVSAIYGDHYGKKS






TETTRLLPTIPADEIRNPVVLRTLTQARKVINAVVRLYGSPARIHIETAREVGKSYQDRKKLEKQQE






DNRKQRESAVKKFKEMFPHFVGEPKGKDILKMRLYELQQAKCLYSGKSLELHRLLEKGYVEVDH






ALPFSRTWDDSFNNKVLVLANEAQNKGNLTPYEWLDGKNNSERWQHFVVRVQTSGFSYAKKQRI






LNHKLDEKGFIERNLNDTRYVARFLCNFIADNMLLVGKGKRNVFASNGQITALLRHRWGLQKVRE






QNDRHHALDAVVVACSTVAMQQKITRFVRYNEGNVFSGERIDRETGEIIPLHFPSPWAFFKENVEIR






IFSENPKLELENRLPDYPQYNHEWVQPLFVSRMPTRKMTGQGHMETVKSAKRLNEGLSVLKVPLT






QLKLSDLERMVNRDREIALYESLKARLEQFGNDPAKAFAEPFYKKGGALVKAVRLEQTQKSGVLV






RDGNGVADNASMVRVDVFTKGGKYFLVPIYTWQVAKGILPNRAATQGKDENDWDIMDEMATFQ






FSLCQNDLIKLVTKKKTIFGYFNGLNRATSNINIKEHDLDKSKGKLGIYLEVGVKLAISLEKYQVDE






LGKNIRPCRPTKRQHVRSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPD






VSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQG






ILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDL






KDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLI






LLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTE






ARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIK






QALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRM






VAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVV






ALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVT






TETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTS






EGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIE






NSSPSGGSKRTADGSEFEKRTADGSEFESPKKKAKVE*






RNAV241
PLV8381
pT7RNA ModUTRs-SauCas9-KKH-N580A_MLVMS_BspQ1_BtgZI
19560
MPAAKRVKLDGGKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARR
LKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNV
NEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQ
KAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYN
ADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTG
KPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGY
TGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSI
KVINAIIKKYGLPNDIIIELAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLH






DMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEEASKKGNRTPFQYLSS






SDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNLVDTRYATRGLMNLLR






SYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIANADFIFKEWKKLDKAKK






VMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRKLINDTLYSTRK






DDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKY






YEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGV






YKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYKNDLIKINGELYRVIGVNNDL






LNRIEVNMIDITYREYLENMNDKRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKGSGGS






SGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGG






MGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGT






NDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEW






RDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQ






GTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLRE






FLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELF






VDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPL






VILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLD






ILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAE






LIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPK






RLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTA






DGSEFESPKKKAKVE*






RNAV242
PLV8383
pT7RNA ModUTRs-SauriCas9-KKH-N588A_MLVMS_BspQ1_BtgZI
19561
MPAAKRVKLDGGQENQQKQNYILGLDIGITSVGYGLIDSKTREVIDAGVRLFPEADSENNSNRRSK
RGARRLKRRRIHRLNRVKDLLADYQMIDLNNVPKSTDPYTIRVKGLREPLTKEEFAIALLHIAKRRG
LHNISVSMGDEEQDNELSTKQQLQKNAQQLQDKYVCELQLERLTNINKVRGEKNRFKTEDFVKEV
KQLCETQRQYHNIDDQFIQQYIDLVSTRREYFEGPGNGSPYGWDGDLLKWYEKLMGRCTYFPEEL
RSVKYAYSADLFNALNDLNNLVVTRDDNPKLEYYEKYHIIENVFKQKKNPTLKQIAKEIGVQDYDI
RGYRITKSGKPQFTSFKLYHDLKNIFEQAKYLEDVEMLDEIAKILTIYQDEISIKKALDQLPELLTESE
KSQIAQLTGYTGTHRLSLKCIHIVIDELWESPENQMEIFTRLNLKPKKVEMSEIDSIPTTLVDEFILSP
VVKRAFIQSIKVINAVINRFGLPEDIIIELAREKNSKDRRKFINKLQKQNEATRKKIEQLLAKYGNTN






AKYMIEKIKLHDMQEGKCLYSLEAIPLEDLLSNPTHYEVDHIIPRSVSFDNSLNNKVLVKQSEASKK






GNRTPYQYLSSNESKISYNQFKQHILNLSKAKDRISKKKRDMLLEERDINKFEVQKEFINRNLVDTR






YATRELSNLLKTYFSTHDYAVKVKTINGGFTNHLRKVWDFKKHRNHGYKHHAEDALVIANADFL






FKTHKALRRTDKILEQPGLEVNDTTVKVDTEEKYQELFETPKQVKNIKQFRDFKYSHRVDKKPNR






KLINDTLYSTREIDGETYVVQTLKDLYAKDNEKVKKLFTERPQKILMYQHDPKTFEKLMTILNQYA






EAKNPLAAYYEDKGEYVTKYAKKGNGPAIHKIKYIDKKLGSYLDVSNKYPETQNKLVKLSLKSFR






FDIYKCEQGYKMVSIGYLDVLKKDNYYYIPKDKYEAEKQKKKIKESDLFVGSFYKNDLIMYEDELF






RVIGVNSDINNLVELNMVDITYKDFCEVNNVTGEKHIKKTIGKRVVLIEKYTTDILGNLYKTPLPKK






PQLIFKRGELSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWL






SDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSP






WNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCL






RLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDD






LLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVM






GQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAP






ALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVL






TKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATL






LPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIW






AKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKN






KDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGS






KRTADGSEFEKRTADGSEFESPKKKAKVE*






RNAV243
PLV8384
pT7RNA ModUTRs-SauriCas9-N588A_MLVMS_BspQ1_BtgZI
19562
MPAAKRVKLDGGQENQQKQNYILGLDIGITSVGYGLIDSKTREVIDAGVRLFPEADSENNSNRRSK
RGARRLKRRRIHRLNRVKDLLADYQMIDLNNVPKSTDPYTIRVKGLREPLTKEEFAIALLHIAKRRG
LHNISVSMGDEEQDNELSTKQQLQKNAQQLQDKYVCELQLERLTNINKVRGEKNRFKTEDFVKEV
KQLCETQRQYHNIDDQFIQQYIDLVSTRREYFEGPGNGSPYGWDGDLLKWYEKLMGRCTYFPEEL
RSVKYAYSADLFNALNDLNNLVVTRDDNPKLEYYEKYHIIENVFKQKKNPTLKQIAKEIGVQDYDI
RGYRITKSGKPQFTSFKLYHDLKNIFEQAKYLEDVEMLDEIAKILTIYQDEISIKKALDQLPELLTESE
KSQIAQLTGYTGTHRLSLKCIHIVIDELWESPENQMEIFTRLNLKPKKVEMSEIDSIPTTLVDEFILSP






VVKRAFIQSIKVINAVINRFGLPEDIIIELAREKNSKDRRKFINKLQKQNEATRKKIEQLLAKYGNTN






AKYMIEKIKLHDMQEGKCLYSLEAIPLEDLLSNPTHYEVDHIIPRSVSFDNSLNNKVLVKQSEASKK






GNRTPYQYLSSNESKISYNQFKQHILNLSKAKDRISKKKRDMLLEERDINKFEVQKEFINRNLVDTR






YATRELSNLLKTYFSTHDYAVKVKTINGGFTNHLRKVWDFKKHRNHGYKHHAEDALVIANADFL






FKTHKALRRTDKILEQPGLEVNDTTVKVDTEEKYQELFETPKQVKNIKQFRDFKYSHRVDKKPNR






QLINDTLYSTREIDGETYVVQTLKDLYAKDNEKVKKLFTERPQKILMYQHDPKTFEKLMTILNQYA






EAKNPLAAYYEDKGEYVTKYAKKGNGPAIHKIKYIDKKLGSYLDVSNKYPETQNKLVKLSLKSFR






FDIYKCEQGYKMVSIGYLDVLKKDNYYYIPKDKYEAEKQKKKIKESDLFVGSFYYNDLIMYEDELF






RVIGVNSDINNLVELNMVDITYKDFCEVNNVTGEKRIKKTIGKRVVLIEKYTTDILGNLYKTPLPKK






PQLIFKRGELSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWL






SDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSP






WNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCL






RLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDD






LLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVM






GQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAP






ALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVL






TKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATL






LPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIW






AKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKN






KDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGS






KRTADGSEFEKRTADGSEFESPKKKAKVE*






RNAV244
PLV8382
pT7RNA ModUTRs-SauCas9-N580A_MLVMS_BspQ1_BtgZI
19563
MPAAKRVKLDGGKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARR
LKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNV
NEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQ
KAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYN
ADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTG
KPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGY
TGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSI






KVINAIIKKYGLPNDIIIELAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLH






DMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEEASKKGNRTPFQYLSS






SDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNLVDTRYATRGLMNLLR






SYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIANADFIFKEWKKLDKAKK






VMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRELINDTLYSTRK






DDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKY






YEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGV






YKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDL






LNRIEVNMIDITYREYLENMNDKRPPRIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKGSGGS






SGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGG






MGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGT






NDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEW






RDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQ






GTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLRE






FLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELF






VDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPL






VILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLD






ILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAE






LIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPK






RLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTA






DGSEFESPKKKAKVE*






RNAV245
PLV8385
pT7RNA ModUTRs-ScaCas9++-N872A_MLVMS_BspQ1_BtgZI
19564
MPAAKRVKLDGGEKKYSIGLDIGTNSVGWAVITDDYKVPSKKFKVLGNTNRKSIKKNLMGALLFD
SGETAEATRLKRTARRRYTRRKNRIRYLQEIFANEMAKLDDSFFQRLEESFLVEEDKKNERHPIFGN
LADEVAYHRNYPTIYHLRKKLADSPEKADLRLIYLALAHIIKFRGHFLIEGKLNAENSDVAKLFYQL
IQTYNQLFEESPLDEIEVDAKGILSARLSKSKRLEKLIAVFPNEKKNGLFGNIIALALGLTPNFKSNFD
LTEDAKLQLSKDTYDDDLDELLGQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMVK
RYDEHHQDLALLKTLVRQQFPEKYAEIFKDDTKNGYAGYVGADKKLRKRSGKLATEEEFYKFIKPI
LEKMDGAEELLAKLNRDDLLRKQRTFDNGSIPHQIHLKELHAILRRQEEFYPFLKENREKIEKILTFR






IPYYVGPLARGNSRFAWLTRKSEEAITPWNFEEVVDKGASAQSFIERMTNFDEQLPNKKVLPKHSL






LYEYFTVYNELTKVKYVTERMRKPEFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVE






IIGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKV






MKQLKRRHYTGWGRLSRKMINGIRDKQSGKTILDFLKSDGFSNRNFMQLIHDDSLTFKEEIEKAQV






SGQGDSLHEQIADLAGSPAIKKGILQTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQQSRER






KKRIEEGIKELESQILKENPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFI






KDDSIDNKVLTRSVEARGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSEA






DKAGFIKRQLVETRQITKHVARILDSRMNTKRDKNDKPIREVKVITLKSKLVSDFRKDFQLYKVRDI






NNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSNIM






NFFKTEVKLANGEIRKRPLIETNGETGEVVWNKEKDFATVRKVLAMPQVNIVKKTEVQTGGFSKE






SILSKRESAKLIPRKKGWDTRKYGGFGSPTVAYSILVVAKVEKGKAKKLKSVKVLVGITIMEKGSY






EKDPIGFLEAKGYKDIKKELIFKLPKYSLFELENGRRRMLASAKELQKANELVLPQHLVRLLYYTQ






NISATTGSNNLGYIEQHREEFKEIFEKIIDFSEKYILKNKVNSNLKSSFDEQFAVSDSILLSNSFVSLLK






YTSFGASGGFTFLDLDVKQGRLRYQTVTEVLDATLIYQSITGLYETRTDLSQLGGDSGGSSGGSSGS






ETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQ






APLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQ






DLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGI






SGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQ






TLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGF






CRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQG






YAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHA






VEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG






TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQA






LKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCP






GHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFES






PKKKAKVE*






RNAV246
PLV8386
pT7RNA ModUTRs-SpyCas9NG-N863A_MLVMS_BspQ1_BtgZI
19565
MPAAKRVKLDGGDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS
GETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNI
VDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQL
VQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNF
DLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIK
RYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLAR






GNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE






LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNA






SLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRY






TGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHE






HIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGI






KELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDN






KVLTRSDKARGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK






RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA






HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEI






TLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESIRPKRNSD






KLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLE






AKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARFLQKGNELALPSKYVNFLYLASHYEKLKGSP






EDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTN






LGAPRAFKYFDTTIDRKVYRSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSSGGSSGSETPGTS






ESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIP






LKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREV






NKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLT






WTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNL






GYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIP






GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGV






LTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALV






KQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDL






TDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAE






GKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKG






HSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFESPKKKA






KVE*






RNAV214
PLV8932
pT7RNA
MPAAKRVKLDGGDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS
19566




ModUTRs_
GETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNI





WT-
VDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQL





cMyc-
VQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNF





CAS9-
DLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIK





MLVMS-
RYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL





bi-
LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLAR





SV40A5
GNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE





NLS
LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNA





(Prop_
SLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRY





design3)_
TGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHE





BspQI-
HIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGI





internally
KELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDN






KVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK






RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA






HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEI






TLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSD






KLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLE






AKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSP






EDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTN






LGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSSGGSSGSETPGTS






ESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIP






LKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREV






NKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLT






WTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNL






GYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIP






GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGV






LTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALV






KQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDL






TDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAE






GKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKG






HSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFESPKKKA






KVE*









In some embodiments, reverse transcriptase domains are modified, for example by site-specific mutation. In some embodiments, reverse transcriptase domains are engineered to have improved properties, e.g., SuperScript IV (SSIV) reverse transcriptase derived from the MMLV RT. In some embodiments, the reverse transcriptase domain may be engineered to have lower error rates, e.g., as described in WO2001068895, incorporated herein by reference. In some embodiments, the reverse transcriptase domain may be engineered to be more thermostable. In some embodiments, the reverse transcriptase domain may be engineered to be more processive. In some embodiments, the reverse transcriptase domain may be engineered to have tolerance to inhibitors. In some embodiments, the reverse transcriptase domain may be engineered to be faster. In some embodiments, the reverse transcriptase domain may be engineered to better tolerate modified nucleotides in the RNA template. In some embodiments, the reverse transcriptase domain may be engineered to insert modified DNA nucleotides. In some embodiments, the reverse transcriptase domain is engineered to bind a template RNA. In some embodiments, one or more mutations are chosen from D200N, L603W, T330P, D524G, E562Q, D583N, P51L, S67R, E67K, T197A, H204R, E302K, F309N, W313F, L435G, N454K, H594Q, L671P, E69K, H8Y, T306K, or D653N in the RT domain of murine leukemia virus reverse transcriptase or a corresponding mutation at a corresponding position of another RT domain.


In some embodiments, a gene modifying polypeptide comprises the RT domain from a retroviral reverse transcriptase, e.g., a wild-type M-MLV RT, e.g., comprising the following sequence:









M-MLV (WT):


(SEQ ID NO: 5002)


TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLII





PLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP





VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLD





LKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFD





EALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNL





GYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQL





REFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQA





LLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLD





PVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDR





WLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILA





EAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAK





ALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRR





RGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNR





MADQAARKAAITETPDTSTLLI






In some embodiments, a gene modifying polypeptide comprises the RT domain from a retroviral reverse transcriptase, e.g., an M-MLV RT, e.g., comprising the following sequence:









(SEQ ID NO: 5003)


TLNIEDEHRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLII





PLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP





VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLD





LKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFD





EALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNL





GYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQL





REFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQA





LLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLD





PVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDR





WLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILA





EAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAK





ALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRR





RGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNR





MADQAARKAAITETPDTSTLL






In some embodiments, a gene modifying polypeptide comprises the RT domain from a retroviral reverse transcriptase comprising the sequence of amino acids 659-1329 of NP_057933. In embodiments, the gene modifying polypeptide further comprises one additional amino acid at the N-terminus of the sequence of amino acids 659-1329 of NP_057933, e.g., as shown below:









(SEQ ID NO: 5004)


TLNIEDEHRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLII





PLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP






VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLD







LKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFD







EALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNL







GYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQL






REFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQA





LLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLD





PVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDR





WLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILA





EAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAK






ALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRR







RGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNR







MADQAARKAA








Core RT (bold), annotated per above


RNAseH (underlined), annotated per above


In embodiments, the gene modifying polypeptide further comprises one additional amino acid at the C-terminus of the sequence of amino acids 659-1329 of NP_057933. In embodiments, the gene modifying polypeptide comprises an RNaseH1 domain (e.g., amino acids 1178-1318 of NP_057933).


In some embodiments, a retroviral reverse transcriptase domain, e.g., M-MLV RT, may comprise one or more mutations from a wild-type sequence that may improve features of the RT, e.g., thermostability, processivity, and/or template binding. In some embodiments, an M-MLV RT domain comprises, relative to the M-MLV (WT) sequence above, one or more mutations, e.g., selected from D200N, L603W, T330P, T306K, W313F, D524G, E562Q, D583N, P51L, S67R, E67K, T197A, H204R, E302K, F309N, L435G, N454K, H594Q, D653N, R110S, K103L, e.g., a combination of mutations, such as D200N, L603W, and T330P, optionally further including T306K and W313F. In some embodiments, an M-MLV RT used herein comprises the mutations D200N, L603W, T330P, T306K and W313F. In embodiments, the mutant M-MLV RT comprises the following amino acid sequence:









M-MLV:


(SEQ ID NO: 5005)


TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLII





PLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLP





VKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLD





LKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFN





EALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNL





GYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQL





REFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQA





LLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLD





PVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDR





WLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILA





EAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAK





ALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRR





RGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNR





MADQAARKAAITETPDTSTLLI
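
As a purely illustrative sketch (not part of the disclosed methods), substitutions written in the form D200N can be applied to a reference sequence programmatically. The function name apply_mutations and the ten-residue toy sequence are hypothetical. The sketch assumes 1-based positions that index directly into whichever wild-type sequence is supplied, and it checks the wild-type residue before substituting so that a numbering mismatch is reported rather than silently applied.

```python
import re


def apply_mutations(sequence, mutations):
    """Return a copy of `sequence` with substitutions such as 'D200N' applied.

    Positions are 1-based. Each mutation names the expected wild-type residue,
    the position, and the replacement residue. A mismatch between the expected
    and the actual residue raises ValueError.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation format: {mutation!r}")
        wild_type = match.group(1)
        position = int(match.group(2))
        replacement = match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Hypothetical toy sequence (not an RT domain), used only to show the call.
toy_sequence = "MKTDLWAGHE"
toy_mutant = apply_mutations(toy_sequence, ["D4N", "W6F"])
print(toy_mutant)
```

The same call could be made with a full-length wild-type RT sequence and a combination of mutations such as D200N, T330P, and L603W, provided the numbering of the mutations matches the numbering of the supplied sequence.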






In some embodiments, a writing domain (e.g., RT domain) comprises an RNA-binding domain, e.g., that specifically binds to an RNA sequence. In some embodiments, a template RNA comprises an RNA sequence that is specifically bound by the RNA-binding domain of the writing domain.


In some embodiments, the reverse transcription domain only recognizes and reverse transcribes a specific template, e.g., a template RNA of the system. In some embodiments, the template comprises a sequence or structure that enables recognition and reverse transcription by a reverse transcription domain. In some embodiments, the template comprises a sequence or structure that enables association with an RNA-binding domain of a polypeptide component of a genome engineering system described herein. In some embodiments, the genome engineering system preferentially reverse transcribes a template comprising an association sequence over a template lacking an association sequence.


The writing domain may also comprise DNA-dependent DNA polymerase activity, e.g., comprise enzymatic activity capable of writing DNA into the genome from a template DNA sequence. In some embodiments, DNA-dependent DNA polymerization is employed to complete second-strand synthesis of a target site edit. In some embodiments, the DNA-dependent DNA polymerase activity is provided by a DNA polymerase domain in the polypeptide. In some embodiments, the DNA-dependent DNA polymerase activity is provided by a reverse transcriptase domain that is also capable of DNA-dependent DNA polymerization, e.g., second-strand synthesis. In some embodiments, the DNA-dependent DNA polymerase activity is provided by a second polypeptide of the system. In some embodiments, the DNA-dependent DNA polymerase activity is provided by an endogenous host cell polymerase that is optionally recruited to the target site by a component of the genome engineering system.


In some embodiments, the reverse transcriptase domain has a lower premature termination rate (Poff) in vitro relative to a reference reverse transcriptase domain. In some embodiments, the reference reverse transcriptase domain is a viral reverse transcriptase domain, e.g., the RT domain from M-MLV.


In some embodiments, the reverse transcriptase domain has a premature termination rate (Poff) in vitro of less than about 5×10⁻³/nt, 5×10⁻⁴/nt, or 5×10⁻⁶/nt, e.g., as measured on a 1094 nt RNA. In embodiments, the in vitro premature termination rate is determined as described in Bibillo and Eickbush (2002) J Biol Chem 277(38):34836-34845 (incorporated by reference herein in its entirety).
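
To illustrate what a per-nucleotide premature termination rate implies for a template of a given length, the following sketch assumes, as a simplification that is not stated in this disclosure, that termination occurs independently at each nucleotide with the same probability. Under that assumption, the expected fraction of reverse transcription events that reach the end of a template of L nucleotides is (1 − rate)^L. The function name is hypothetical.

```python
def full_length_fraction(termination_rate_per_nt, template_length_nt):
    """Expected fraction of products reaching the end of the template.

    Assumes (for illustration only) independent, identical termination
    probability at every nucleotide of the template.
    """
    if not 0.0 <= termination_rate_per_nt <= 1.0:
        raise ValueError("termination rate must be between 0 and 1")
    if template_length_nt < 0:
        raise ValueError("template length must be non-negative")
    return (1.0 - termination_rate_per_nt) ** template_length_nt


# The three rate thresholds and the 1094 nt template length named in the text.
for rate in (5e-3, 5e-4, 5e-6):
    fraction = full_length_fraction(rate, 1094)
    print(f"{rate:g}/nt on 1094 nt: {fraction:.4f}")
```

Under this simplified model, the thresholds differ markedly in how much full-length product they would be expected to yield on a template of about 1 kb.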


In some embodiments, the reverse transcriptase domain is able to complete at least about 30% or 50% of integrations in cells. The percent of complete integrations can be measured by dividing the number of substantially full-length integration events (e.g., genomic sites that comprise at least 98% of the expected integrated sequence) by the number of total (including substantially full-length and partial) integration events in a population of cells. In embodiments, integration in cells is determined (e.g., across the integration site) using long-read amplicon sequencing, e.g., as described in Karst et al. (2020) bioRxiv doi.org/10.1101/645903 (incorporated by reference herein in its entirety).


In embodiments, quantifying integrations in cells comprises counting the fraction of integrations that contain at least about 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the DNA sequence corresponding to the template RNA (e.g., a template RNA having a length of at least 0.05, 0.1, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 3, 4, or 5 kb, e.g., a length between 0.5-0.6, 0.6-0.7, 0.7-0.8, 0.8-0.9, 1.0-1.2, 1.2-1.4, 1.4-1.6, 1.6-1.8, 1.8-2.0, 2-3, 3-4, or 4-5 kb).
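
The percent of complete integrations described above is a ratio of substantially full-length events to all integration events. A minimal sketch of that arithmetic follows. It assumes each integration event has already been reduced to the length of template-derived sequence recovered at the site. The function name, the parameter names, and the example lengths are hypothetical and are not data from this disclosure.

```python
def percent_complete_integrations(integrated_lengths, expected_length,
                                  min_fraction=0.98):
    """Percent of integration events that are substantially full length.

    integrated_lengths: template-derived nucleotides recovered per event.
    expected_length: length of the expected integrated sequence.
    min_fraction: fraction of the expected sequence an event must contain
        to count as substantially full length (0.98 mirrors the 98% example).
    """
    if expected_length <= 0:
        raise ValueError("expected_length must be positive")
    if not integrated_lengths:
        raise ValueError("at least one integration event is required")
    complete = sum(
        1 for length in integrated_lengths
        if length / expected_length >= min_fraction
    )
    return 100.0 * complete / len(integrated_lengths)


# Hypothetical per-event lengths for a 1000 nt expected insertion.
example_lengths = [1000, 995, 985, 640, 310, 1000, 75, 990]
print(percent_complete_integrations(example_lengths, expected_length=1000))
```

Changing min_fraction to another value (e.g., 0.75 or 0.95) corresponds to the alternative thresholds listed above.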


In some embodiments, the reverse transcriptase domain is capable of polymerizing dNTPs in vitro. In embodiments, the reverse transcriptase domain is capable of polymerizing dNTPs in vitro at a rate between 0.1-50 nt/sec (e.g., between 0.1-1, 1-10, or 10-50 nt/sec). In embodiments, polymerization of dNTPs by the reverse transcriptase domain is measured by a single-molecule assay, e.g., as described in Schwartz and Quake (2009) PNAS 106(48):20294-20299 (incorporated by reference in its entirety).


In some embodiments, the reverse transcriptase domain has an in vitro error rate (e.g., misincorporation of nucleotides) of between 1×10⁻³ and 1×10⁻⁴ or between 1×10⁻⁴ and 1×10⁻⁵ substitutions/nt, e.g., as described in Yasukawa et al. (2017) Biochem Biophys Res Commun 492(2):147-153 (incorporated herein by reference in its entirety). In some embodiments, the reverse transcriptase domain has an error rate (e.g., misincorporation of nucleotides) in cells (e.g., HEK293T cells) of between 1×10⁻³ and 1×10⁻⁴ or between 1×10⁻⁴ and 1×10⁻⁵ substitutions/nt, e.g., as measured by long-read amplicon sequencing, e.g., as described in Karst et al. (2020) bioRxiv doi.org/10.1101/645903 (incorporated by reference herein in its entirety).


In some embodiments, the reverse transcriptase domain is capable of performing reverse transcription of a target RNA in vitro. In some embodiments, the reverse transcriptase requires a primer of at least 3 nucleotides to initiate reverse transcription of a template. In some embodiments, reverse transcription of the target RNA is determined by detection of cDNA from the target RNA (e.g., when provided with a ssDNA primer, e.g., which anneals to the target with at least 3, 4, 5, 6, 7, 8, 9, or 10 nt at the 3′ end), e.g., as described in Bibillo and Eickbush (2002) J Biol Chem 277(38):34836-34845 (incorporated herein by reference in its entirety).


In some embodiments, the reverse transcriptase domain performs reverse transcription at least 5 or 10 times more efficiently (e.g., by cDNA production), e.g., when converting its RNA template to cDNA, for example, as compared to an RNA template lacking the protein binding motif (e.g., a 3′ UTR). In embodiments, efficiency of reverse transcription is measured as described in Yasukawa et al. (2017) Biochem Biophys Res Commun 492(2):147-153 (incorporated by reference herein in its entirety).


In some embodiments, the reverse transcriptase domain specifically binds a specific RNA template with higher frequency (e.g., about 5 or 10-fold higher frequency) than any endogenous cellular RNA, e.g., when expressed in cells (e.g., HEK293T cells). In embodiments, frequency of specific binding between the reverse transcriptase domain and the template RNA is measured by CLIP-seq, e.g., as described in Lin and Miles (2019) Nucleic Acids Res 47(11):5490-5501 (incorporated herein by reference in its entirety).


In some embodiments, an RT domain (e.g., as listed in Table 6) comprises one or more mutations as listed in Table 2A below. In some embodiments, an RT domain as listed in Table 6 comprises one, two, three, four, five, or six of the mutations listed in the corresponding row of Table 2A below.









TABLE 2A
Exemplary RT domain mutations (relative to corresponding wild-type sequences as listed in the corresponding row of Table 6)

RT Domain Name          Mutation(s)
AVIRE_P03360
AVIRE_P03360_3mut       D200N, G330P, L605W
AVIRE_P03360_3mutA      D200N, G330P, L605W, T306K, W313F
BAEVM_P10272
BAEVM_P10272_3mut       D198N, E328P, L602W
BAEVM_P10272_3mutA      D198N, E328P, L602W, T304K, W311F
BLVAU_P25059
BLVAU_P25059_2mut       E159Q, G286P
BLVJ_P03361
BLVJ_P03361_2mut        E159Q, L524W
BLVJ_P03361_2mutB       E159Q, L524W, I97P
FFV_O93209              D21N
FFV_O93209_2mut         D21N, T293N, T419P
FFV_O93209_2mutA        D21N, T293N, T419P, L393K
FFV_O93209-Pro
FFV_O93209-Pro_2mut     T207N, T333P
FFV_O93209-Pro_2mutA    T207N, T333P, L307K
FLV_P10273
FLV_P10273_3mut         D199N, L602W
FLV_P10273_3mutA        D199N, L602W, T305K, W312F
FOAMV_P14350            D24N
FOAMV_P14350_2mut       D24N, T296N, S420P
FOAMV_P14350_2mutA      D24N, T296N, S420P, L396K
FOAMV_P14350-Pro
FOAMV_P14350-Pro_2mut   T207N, S331P
FOAMV_P14350-Pro_2mutA  T207N, S331P, L307K
GALV_P21414
GALV_P21414_3mut        D198N, E328P, L600W
GALV_P21414_3mutA       D198N, E328P, L600W, T304K, W311F
HTL1A_P03362
HTL1A_P03362_2mut       E152Q, R279P
HTL1A_P03362_2mutB      E152Q, R279P, L90P
HTL1C_P14078
HTL1C_P14078_2mut       E152Q, R279P
HTL1L_P0C211
HTL1L_P0C211_2mut       E149Q, L527W
HTL_1L_P0C211_2mutB     E149Q, L527W, L87P
HTL32_Q0R5R2
HTL32_Q0R5R2_2mut       E149Q, L526W
HTL32_Q0R5R2_2mutB      E149Q, L526W, L87P
HTL3P_Q4U0X6
HTL3P_Q4U0X6_2mut       E149Q, L526W
HTL3P_Q4U0X6_2mutB      E149Q, L526W, L87P
HTLV2_P03363_2mut       E147Q, G274P
JSRV_P31623
JSRV_P31623_2mutB       A100P
KORV_Q9TTC1             D32N
KORV_Q9TTC1_3mut        D32N, D322N, E452P, L724W
KORV_Q9TTC1_3mutA       D32N, D322N, E452P, L724W, T428K, W435F
KORV_Q9TTC1-Pro
KORV_Q9TTC1-Pro_3mut    D231N, E361P, L633W
KORV_Q9TTC1-Pro_3mutA   D231N, E361P, L633W, T337K, W344F
MLVAV_P03356
MLVAV_P03356_3mut       D200N, T330P, L603W
MLVAV_P03356_3mutA      D200N, T330P, L603W, T306K, W313F
MLVBM_Q7SVK7
MLVBM_Q7SVK7
MLVBM_Q7SVK7_3mut       D200N, T330P, L603W
MLVBM_Q7SVK7_3mut       D200N, T330P, L603W
MLVBM_Q7SVK7_3mutA_WS   D199N, T329P, L602W, T305K, W312F
MLVBM_Q7SVK7_3mutA_WS   D199N, T329P, L602W, T305K, W312F
MLVCB_P08361
MLVCB_P08361_3mut       D200N, T330P, L603W
MLVCB_P08361_3mutA      D200N, T330P, L603W, T306K, W313F
MLVF5_P26810
MLVF5_P26810_3mut       D200N, T330P, L603W
MLVF5_P26810_3mutA      D200N, T330P, L603W, T306K, W313F
MLVFF_P26809_3mut       D200N, T330P, L603W
MLVFF_P26809_3mutA      D200N, T330P, L603W, T306K, W313F
MLVMS_P03355
MLVMS_P03355
MLVMS_P03355_3mut       D200N, T330P, L603W
MLVMS_P03355_3mut       D200N, T330P, L603W
MLVMS_P03355_3mutA_WS   D200N, T330P, L603W, T306K, W313F
MLVMS_P03355_3mutA_WS   D200N, T330P, L603W, T306K, W313F
MLVMS_P03355_PLV919     D200N, T330P, L603W, T306K, W313F, H8Y
MLVMS_P03355_PLV919     D200N, T330P, L603W, T306K, W313F, H8Y
MLVRD_P11227
MLVRD_P11227_3mut       D200N, T330P, L603W
MMTVB_P03365            D26N
MMTVB_P03365            D26N
MMTVB_P03365_2mut       D26N, G401P
MMTVB_P03365_2mut_WS    G400P
MMTVB_P03365_2mut_WS    G400P
MMTVB_P03365_2mutB      D26N, G401P, V215P
MMTVB_P03365_2mutB      D26N, G401P, V215P
MMTVB_P03365_2mutB_WS   G400P, V212P
MMTVB_P03365_2mutB_WS   G400P, V212P
MMTVB_P03365_WS
MMTVB_P03365_WS
MMTVB_P03365-Pro
MMTVB_P03365-Pro
MMTVB_P03365-Pro_2mut   G309P
MMTVB_P03365-Pro_2mut   G309P
MMTVB_P03365-Pro_2mutB  G309P, V123P
MMTVB_P03365-Pro_2mutB  G309P, V123P
MPMV_P07572
MPMV_P07572_2mutB       G289P, I103P
PERV_Q4VFZ2
PERV_Q4VFZ2
PERV_Q4VFZ2_3mut        D199N, E329P, L602W
PERV_Q4VFZ2_3mut        D199N, E329P, L602W
PERV_Q4VFZ2_3mutA_WS    D196N, E326P, L599W, T302K, W309F
PERV_Q4VFZ2_3mutA_WS    D196N, E326P, L599W, T302K, W309F
SFV1_P23074             D24N
SFV1_P23074_2mut        D24N, T296N, N420P
SFV1_P23074_2mutA       D24N, T296N, N420P, L396K
SFV1_P23074-Pro
SFV1_P23074-Pro_2mut    T207N, N331P
SFV1_P23074-Pro_2mutA   T207N, N331P, L307K
SFV3L_P27401            D24N
SFV3L_P27401_2mut       D24N, T296N, N422P
SFV3L_P27401_2mutA      D24N, T296N, N422P, L396K
SFV3L_P27401-Pro
SFV3L_P27401-Pro_2mut   T307N, N333P
SFV3L_P27401-Pro_2mutA  T307N, N333P, L307K
SFVCP_Q87040            D24N
SFVCP_Q87040_2mut       D24N, T296N, K422P
SFVCP_Q87040_2mutA      D24N, T296N, K422P, L396K
SFVCP_Q87040-Pro
SFVCP_Q87040-Pro_2mut   T207N, K333P
SFVCP_Q87040-Pro_2mutA  T207N, K333P, L307K
SMRVH_P03364
SMRVH_P03364_2mut       G288P
SMRVH_P03364_2mutB      G288P, I102P
SRV2_P51517
SRV2_P51517_2mutB       I103P
WDSV_O92815
WDSV_O92815_2mut        S183N, K312P
WDSV_O92815_2mutA       S183N, K312P, L288K, W295F
WMSV_P03359
WMSV_P03359_3mut        D198N, E328P, L600W
WMSV_P03359_3mutA       D198N, E328P, L600W, T304K, W311F
XMRV6_A1Z651
XMRV6_A1Z651_3mut       D200N, T330P, L603W
XMRV6_A1Z651_3mutA      D200N, T330P, L603W, T306K, W313F









Template Nucleic Acid Binding Domain

The gene modifying polypeptide typically contains regions capable of associating with the template nucleic acid (e.g., template RNA). In some embodiments, the template nucleic acid binding domain is an RNA binding domain. In some embodiments, the RNA binding domain is a modular domain that can associate with RNA molecules containing specific signatures, e.g., structural motifs. In other embodiments, the template nucleic acid binding domain (e.g., RNA binding domain) is contained within the reverse transcription domain, e.g., the reverse transcriptase-derived component has a known signature for RNA preference.


In other embodiments, the template nucleic acid binding domain (e.g., RNA binding domain) is contained within the target DNA binding domain. For example, in some embodiments, the DNA binding domain is a CRISPR-associated protein that recognizes the structure of a template nucleic acid (e.g., template RNA) comprising a gRNA. In some embodiments, a gene modifying polypeptide comprises a DNA-binding domain comprising a CRISPR-associated protein that associates with a gRNA scaffold that allows the DNA-binding domain to bind a target genomic DNA sequence. In some embodiments, the gRNA scaffold and gRNA spacer are comprised within the template nucleic acid (e.g., template RNA), thus the DNA-binding domain is also the template nucleic acid binding domain. In some embodiments, the polypeptide possesses RNA binding function in multiple domains, e.g., can bind a gRNA structure in a CRISPR-associated DNA binding domain and an additional sequence or structure in a reverse transcriptase domain.


In some embodiments, the RNA binding domain is capable of binding to a template RNA with greater affinity than a reference RNA binding domain. In some embodiments, the reference RNA binding domain is an RNA binding domain from Cas9 of S. pyogenes. In some embodiments, the RNA binding domain is capable of binding to a template RNA with an affinity between 100 pM-10 nM (e.g., between 100 pM-1 nM or 1 nM-10 nM). In some embodiments, the affinity of an RNA binding domain for its template RNA is measured in vitro, e.g., by thermophoresis, e.g., as described in Asmari et al. Methods 146:107-119 (2018) (incorporated by reference herein in its entirety). In some embodiments, the affinity of an RNA binding domain for its template RNA is measured in cells (e.g., by FRET or CLIP-Seq).


In some embodiments, the RNA binding domain is associated with the template RNA in vitro at a frequency at least about 5-fold or 10-fold higher than with a scrambled RNA. In some embodiments, the frequency of association between the RNA binding domain and the template RNA or scrambled RNA is measured by CLIP-seq, e.g., as described in Lin and Miles (2019) Nucleic Acids Res 47(11):5490-5501 (incorporated by reference herein in its entirety). In some embodiments, the RNA binding domain is associated with the template RNA in cells (e.g., in HEK293T cells) at a frequency at least about 5-fold or 10-fold higher than with a scrambled RNA. In some embodiments, the frequency of association between the RNA binding domain and the template RNA or scrambled RNA is measured by CLIP-seq, e.g., as described in Lin and Miles (2019), supra.
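
The 5-fold or 10-fold comparison above amounts to a ratio of association frequencies. The following sketch shows one simple way such a ratio could be computed from read counts. It assumes, purely for illustration, that association frequency is approximated as reads mapping to the RNA of interest divided by total mapped reads in the same library; this is not the analysis procedure of Lin and Miles (2019), and the function name and example counts are hypothetical.

```python
def fold_enrichment(template_reads, scrambled_reads,
                    template_total_reads, scrambled_total_reads):
    """Depth-normalized ratio of template RNA reads to scrambled RNA reads.

    Equivalent to (template_reads / template_total_reads) divided by
    (scrambled_reads / scrambled_total_reads), computed by
    cross-multiplication to limit floating-point rounding.
    """
    if min(template_total_reads, scrambled_total_reads) <= 0:
        raise ValueError("total read counts must be positive")
    if scrambled_reads <= 0:
        raise ValueError("scrambled_reads must be positive to form a ratio")
    if template_reads < 0:
        raise ValueError("template_reads must be non-negative")
    numerator = template_reads * scrambled_total_reads
    denominator = scrambled_reads * template_total_reads
    return numerator / denominator


# Hypothetical counts from two libraries of equal depth.
ratio = fold_enrichment(1200, 100, 1_000_000, 1_000_000)
print(ratio, ratio >= 10)
```

A ratio at or above 5 or 10 in such a calculation would correspond to the fold thresholds recited above, subject to whatever normalization the chosen assay actually requires.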


Endonuclease Domains and DNA Binding Domains


In some embodiments, a gene modifying polypeptide possesses the function of DNA target site cleavage via an endonuclease domain. In some embodiments, a gene modifying polypeptide comprises a DNA binding domain, e.g., for binding to a target nucleic acid. In some embodiments, a domain (e.g., a Cas domain) of the gene modifying polypeptide comprises two or more smaller domains, e.g., a DNA binding domain and an endonuclease domain. It is understood that when a DNA binding domain (e.g., a Cas domain) is said to bind to a target nucleic acid sequence, in some embodiments, the binding is mediated by a gRNA.


In some embodiments, a domain has two functions. For example, in some embodiments, the endonuclease domain is also a DNA-binding domain. In some embodiments, the endonuclease domain is also a template nucleic acid (e.g., template RNA) binding domain. For example, in some embodiments, a polypeptide comprises a CRISPR-associated endonuclease domain that binds a template RNA comprising a gRNA, binds a target DNA sequence (e.g., with complementarity to a portion of the gRNA), and cuts the target DNA sequence. In some embodiments, an endonuclease domain or endonuclease/DNA-binding domain from a heterologous source can be used or can be modified (e.g., by insertion, deletion, or substitution of one or more residues) in a gene modifying system described herein.


In some embodiments, a nucleic acid encoding the endonuclease domain or endonuclease/DNA binding domain is altered from its natural sequence to have altered codon usage, e.g., improved for human cells. In some embodiments, the endonuclease element is a heterologous endonuclease element, such as a Cas endonuclease (e.g., Cas9), a type-II restriction endonuclease (e.g., FokI), a meganuclease (e.g., I-SceI), or other endonuclease domain.


In certain aspects, the DNA-binding domain of a gene modifying polypeptide described herein is selected, designed, or constructed for binding to a desired host DNA target sequence. In certain embodiments, the DNA-binding domain of the polypeptide is a heterologous DNA-binding element. In some embodiments the heterologous DNA binding element is a zinc-finger element or a TAL effector element, e.g., a zinc-finger or TAL polypeptide or functional fragment thereof. In some embodiments the heterologous DNA binding element is a sequence-guided DNA binding element, such as Cas9, Cpf1, or other CRISPR-related protein that has been altered to have no endonuclease activity. In some embodiments the heterologous DNA binding element retains endonuclease activity. In some embodiments, the heterologous DNA binding element retains partial endonuclease activity to cleave ssDNA, e.g., possesses nickase activity. In specific embodiments, the heterologous DNA-binding domain can be any one or more of Cas9, TAL domain, ZF domain, Myb domain, combinations thereof, or multiples thereof.


In some embodiments, DNA-binding domains are modified, for example by site-specific mutation, increasing or decreasing DNA-binding elements (for example, number and/or specificity of zinc fingers), etc., to alter DNA-binding specificity and affinity. In some embodiments a nucleic acid sequence encoding the DNA binding domain is altered from its natural sequence to have altered codon usage, e.g., improved for human cells. In embodiments, the DNA binding domain comprises one or more modifications relative to a wild-type DNA binding domain, e.g., a modification via directed evolution, e.g., phage-assisted continuous evolution (PACE).


In some embodiments, the DNA binding domain comprises a meganuclease domain (e.g., as described herein, e.g., in the endonuclease domain section), or a functional fragment thereof. In some embodiments, the meganuclease domain possesses endonuclease activity, e.g., double-strand cleavage and/or nickase activity. In other embodiments, the meganuclease domain has reduced activity, e.g., lacks endonuclease activity, e.g., the meganuclease is catalytically inactive. In some embodiments, a catalytically inactive meganuclease is used as a DNA binding domain, e.g., as described in Fonfara et al. Nucleic Acids Res 40(2):847-860 (2012), incorporated herein by reference in its entirety.


In some embodiments, a gene modifying polypeptide comprises a modification to a DNA-binding domain, e.g., relative to the wild-type polypeptide. In some embodiments, the DNA-binding domain comprises an addition, deletion, replacement, or modification to the amino acid sequence of the original DNA-binding domain. In some embodiments, the DNA-binding domain is modified to include a heterologous functional domain that binds specifically to a target nucleic acid (e.g., DNA) sequence of interest. In some embodiments, the functional domain replaces at least a portion (e.g., the entirety of) the prior DNA-binding domain of the polypeptide. In some embodiments, the functional domain comprises a zinc finger (e.g., a zinc finger that specifically binds to the target nucleic acid (e.g., DNA) sequence of interest). In some embodiments, the functional domain comprises a Cas domain (e.g., a Cas domain that specifically binds to the target nucleic acid (e.g., DNA) sequence of interest). In some embodiments, the Cas domain comprises a Cas9 or a mutant or variant thereof (e.g., as described herein). In embodiments, the Cas domain is associated with a guide RNA (gRNA), e.g., as described herein. In embodiments, the Cas domain is directed to a target nucleic acid (e.g., DNA) sequence of interest by the gRNA. In embodiments, the Cas domain is encoded in the same nucleic acid (e.g., RNA) molecule as the gRNA. In embodiments, the Cas domain is encoded in a different nucleic acid (e.g., RNA) molecule from the gRNA.


In some embodiments, the DNA binding domain is capable of binding to a target sequence (e.g., a dsDNA target sequence) with greater affinity than a reference DNA binding domain. In some embodiments, the reference DNA binding domain is a DNA binding domain from Cas9 of S. pyogenes. In some embodiments, the DNA binding domain is capable of binding to a target sequence (e.g., a dsDNA target sequence) with an affinity between 100 pM-10 nM (e.g., between 100 pM-1 nM or 1 nM-10 nM).


In some embodiments, the affinity of a DNA binding domain for its target sequence (e.g., dsDNA target sequence) is measured in vitro, e.g., by thermophoresis, e.g., as described in Asmari et al. Methods 146:107-119 (2018) (incorporated by reference herein in its entirety).


In embodiments, the DNA binding domain is capable of binding to its target sequence (e.g., dsDNA target sequence), e.g., with an affinity between 100 pM-10 nM (e.g., between 100 pM-1 nM or 1 nM-10 nM) in the presence of a molar excess of scrambled sequence competitor dsDNA, e.g., of about 100-fold molar excess.


In some embodiments, the DNA binding domain is found associated with its target sequence (e.g., dsDNA target sequence) more frequently than any other sequence in the genome of a target cell, e.g., human target cell, e.g., as measured by ChIP-seq (e.g., in HEK293T cells), e.g., as described in He and Pu (2010) Curr. Protoc Mol Biol Chapter 21 (incorporated herein by reference in its entirety). In some embodiments, the DNA binding domain is found associated with its target sequence (e.g., dsDNA target sequence) at least about 5-fold or 10-fold more frequently than any other sequence in the genome of a target cell, e.g., as measured by ChIP-seq (e.g., in HEK293T cells), e.g., as described in He and Pu (2010), supra.
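The 5-fold and 10-fold comparisons above can be illustrated with a minimal sketch. It assumes that a normalized ChIP-seq signal is already available for the target site and for each off-target site; the function names and the input values in the usage note are hypothetical and are not taken from the cited protocol.

```python
def target_enrichment(target_signal: float, off_target_signals: list[float]) -> float:
    """Fold enrichment of the target site over the strongest off-target site.

    Inputs are assumed to be normalized ChIP-seq signals (hypothetical units).
    """
    strongest_off_target = max(off_target_signals)
    if strongest_off_target <= 0:
        raise ValueError("off-target signal must be positive to compute a ratio")
    return target_signal / strongest_off_target


def meets_fold_threshold(target_signal: float,
                         off_target_signals: list[float],
                         fold: float = 5.0) -> bool:
    """True if the target is enriched at least `fold` over every other site."""
    return target_enrichment(target_signal, off_target_signals) >= fold
```

For example, `meets_fold_threshold(120.0, [10.0, 12.0, 8.0], fold=10.0)` compares the target against the strongest off-target site (12.0), which is the most conservative reading of "any other sequence in the genome".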


In some embodiments, the endonuclease domain has nickase activity and cleaves one strand of a target DNA. In some embodiments, nickase activity reduces the formation of double-stranded breaks at the target site. In some embodiments, the endonuclease domain creates a staggered nick structure in the first and second strands of a target DNA. In some embodiments, a staggered nick structure generates free 3′ overhangs at the target site. In some embodiments, free 3′ overhangs at the target site improve editing efficiency, e.g., by enhancing access and annealing of a 3′ homology region of a template nucleic acid. In some embodiments, a staggered nick structure reduces the formation of double-stranded breaks at the target site.


In some embodiments, the endonuclease domain cleaves both strands of a target DNA, e.g., results in blunt-end cleavage of a target with no ssDNA overhangs on either side of the cut-site. The amino acid sequence of an endonuclease domain of a gene modifying system described herein may be at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to the amino acid sequence of an endonuclease domain described herein, e.g., an endonuclease domain from Table 8.
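As an illustration of the percent identity thresholds recited above, the following minimal sketch computes identity over a pairwise alignment that has already been produced by some alignment tool. The convention used here (identical columns divided by the number of aligned columns, with "-" marking a gap) is one common choice and is an assumption; the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    # A column counts as identical only if both residues match and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other conventions (e.g., dividing by the length of the shorter sequence) give different values for gapped alignments, so the convention should be stated whenever a threshold is applied.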


In certain embodiments, the heterologous endonuclease is FokI or a functional fragment thereof. In certain embodiments, the heterologous endonuclease is a Holliday junction resolvase or homolog thereof, such as the Holliday junction resolving enzyme from Sulfolobus solfataricus, Ssol Hje (Govindaraju et al., Nucleic Acids Research 44:7, 2016). In certain embodiments, the heterologous endonuclease is the endonuclease of the large fragment of a spliceosomal protein, such as Prp8 (Mahbub et al., Mobile DNA 8:16, 2017). In certain embodiments, the heterologous endonuclease is derived from a CRISPR-associated protein, e.g., Cas9. In certain embodiments, the heterologous endonuclease is engineered to have only ssDNA cleavage activity, e.g., only nickase activity, e.g., be a Cas9 nickase, e.g., SpCas9 with D10A, H840A, or N863A mutations. Table 8 provides exemplary Cas proteins and mutations associated with nickase activity. In still other embodiments, homologous endonuclease domains are modified, for example by site-specific mutation, to alter DNA endonuclease activity. In still other embodiments, endonuclease domains are modified to reduce DNA-sequence specificity, e.g., by truncation to remove domains that confer DNA-sequence specificity or mutation to inactivate regions conferring DNA-sequence specificity.


In some embodiments, the endonuclease domain has nickase activity and does not form double-stranded breaks. In some embodiments, the endonuclease domain forms single-stranded breaks at a higher frequency than double-stranded breaks, e.g., at least 90%, 95%, 96%, 97%, 98%, or 99% of the breaks are single-stranded breaks, or less than 10%, 5%, 4%, 3%, 2%, or 1% of the breaks are double-stranded breaks. In some embodiments, the endonuclease forms substantially no double-stranded breaks. In some embodiments, the endonuclease does not form detectable levels of double-stranded breaks.


In some embodiments, the endonuclease domain has nickase activity that nicks the target site DNA of the first strand; e.g., in some embodiments, the endonuclease domain cuts the genomic DNA of the target site near to the site of alteration on the strand that will be extended by the writing domain. In some embodiments, the endonuclease domain has nickase activity that nicks the target site DNA of the first strand and does not nick the target site DNA of the second strand. For example, when a polypeptide comprises a CRISPR-associated endonuclease domain having nickase activity, in some embodiments, said CRISPR-associated endonuclease domain nicks the target site DNA strand containing the PAM site (e.g., and does not nick the target site DNA strand that does not contain the PAM site). As a further example, when a polypeptide comprises a CRISPR-associated endonuclease domain having nickase activity, in some embodiments, said CRISPR-associated endonuclease domain nicks the target site DNA strand not containing the PAM site (e.g., and does not nick the target site DNA strand that contains the PAM site).


In some other embodiments, the endonuclease domain has nickase activity that nicks the target site DNA of the first strand and the second strand. Without wishing to be bound by theory, after a writing domain (e.g., RT domain) of a polypeptide described herein polymerizes (e.g., reverse transcribes) from the heterologous object sequence of a template nucleic acid (e.g., template RNA), the cellular DNA repair machinery must repair the nick on the first DNA strand. The target site DNA now contains two different sequences for the first DNA strand: one corresponding to the original genomic DNA (e.g., having a free 5′ end) and a second corresponding to that polymerized from the heterologous object sequence (e.g., having a free 3′ end). It is thought that the two different sequences equilibrate with one another, first one hybridizing the second strand, then the other, and which sequence the cellular DNA repair apparatus incorporates into its repaired target site may be a stochastic process. Without wishing to be bound by theory, it is thought that introducing an additional nick to the second-strand may bias the cellular DNA repair machinery to adopt the heterologous object sequence-based sequence more frequently than the original genomic sequence (Anzalone et al. Nature 576:149-157 (2019)). In some embodiments, the additional nick is positioned at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, or 150 nucleotides 5′ or 3′ of the target site modification (e.g., the insertion, deletion, or substitution) or to the nick on the first strand.


Alternatively or additionally, without wishing to be bound by theory, it is thought that an additional nick to the second strand may promote second-strand synthesis. In some embodiments, where the gene modifying system has inserted or substituted a portion of the first strand, synthesis of a new sequence corresponding to the insertion/substitution in the second strand is necessary.


In some embodiments, the polypeptide comprises a single domain having endonuclease activity (e.g., a single endonuclease domain) and said domain nicks both the first strand and the second strand. For example, in such an embodiment the endonuclease domain may be a CRISPR-associated endonuclease domain, and the template nucleic acid (e.g., template RNA) comprises a gRNA spacer that directs nicking of the first strand and an additional gRNA spacer that directs nicking of the second strand. In some embodiments, the polypeptide comprises a plurality of domains having endonuclease activity, and a first endonuclease domain nicks the first strand and a second endonuclease domain nicks the second strand (optionally, the first endonuclease domain does not (e.g., cannot) nick the second strand and the second endonuclease domain does not (e.g., cannot) nick the first strand).


In some embodiments, the endonuclease domain is capable of nicking a first strand and a second strand. In some embodiments, the first and second strand nicks occur at the same position in the target site but on opposite strands. In some embodiments, the second strand nick occurs in a staggered location, e.g., upstream or downstream, from the first nick. In some embodiments, the endonuclease domain generates a target site deletion if the second strand nick is upstream of the first strand nick. In some embodiments, the endonuclease domain generates a target site duplication if the second strand nick is downstream of the first strand nick. In some embodiments, the endonuclease domain generates no duplication and/or deletion if the first and second strand nicks occur in the same position of the target site. In some embodiments, the endonuclease domain has altered activity depending on protein conformation or RNA-binding status, e.g., which promotes the nicking of the first or second strand (e.g., as described in Christensen et al. PNAS 2006; incorporated by reference herein in its entirety).
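The relationship described above between the relative positions of the two nicks and the resulting target site change can be sketched as follows. The coordinate convention (integer positions along the target site, with smaller values being upstream) and the assumption that the size of the change equals the offset between the nicks are illustrative assumptions, and the function name is hypothetical.

```python
def staggered_nick_outcome(first_nick: int, second_nick: int) -> tuple[str, int]:
    """Classify the target site change implied by two nick positions.

    Positions are hypothetical integer coordinates along the target site,
    where a smaller value means further upstream.
    Returns (outcome, offset_in_nucleotides).
    """
    offset = second_nick - first_nick
    if offset < 0:
        # Second strand nick lies upstream of the first strand nick.
        return ("target site deletion", -offset)
    if offset > 0:
        # Second strand nick lies downstream of the first strand nick.
        return ("target site duplication", offset)
    # Both nicks at the same position on opposite strands.
    return ("no duplication or deletion", 0)
```

For instance, `staggered_nick_outcome(100, 96)` reports a deletion with a 4-nucleotide offset, while `staggered_nick_outcome(100, 100)` reports no duplication or deletion.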


In some embodiments, the endonuclease domain comprises a meganuclease, or a functional fragment thereof. In some embodiments, the endonuclease domain comprises a homing endonuclease, or a functional fragment thereof. In some embodiments, the endonuclease domain comprises a meganuclease from the LAGLIDADG (SEQ ID NO: 37638), GIY-YIG, HNH, His-Cys Box, or PD-(D/E)XK families, or a functional fragment or variant thereof, e.g., which possess conserved amino acid motifs, e.g., as indicated in the family names. In some embodiments, the endonuclease domain comprises a meganuclease, or fragment thereof, chosen from, e.g., I-SmaMI (Uniprot F7WD42), I-SceI (Uniprot P03882), I-AniI (Uniprot P03880), I-DmoI (Uniprot P21505), I-CreI (Uniprot P05725), I-TevI (Uniprot P13299), I-OnuI (Uniprot Q4VWW5), or I-BmoI (Uniprot Q9ANR6). In some embodiments, the meganuclease is naturally monomeric, e.g., I-SceI, I-TevI, or dimeric, e.g., I-CreI, in its functional form. For example, the LAGLIDADG meganucleases (SEQ ID NO: 37638) with a single copy of the LAGLIDADG motif (SEQ ID NO: 37638) generally form homodimers, whereas members with two copies of the LAGLIDADG motif (SEQ ID NO: 37638) are generally found as monomers. In some embodiments, a meganuclease that normally forms as a dimer is expressed as a fusion, e.g., the two subunits are expressed as a single ORF and, optionally, connected by a linker, e.g., an I-CreI dimer fusion (Rodriguez-Fornes et al. Gene Therapy 2020; incorporated by reference herein in its entirety). In some embodiments, a meganuclease, or a functional fragment thereof, is altered to favor nickase activity for one strand of a double-stranded DNA molecule, e.g., I-SceI (K122I and/or K223I) (Niu et al. J Mol Biol 2008), I-AniI (K227M) (McConnell Smith et al. PNAS 2009), I-DmoI (Q42A and/or K120M) (Molina et al. J Biol Chem 2015). In some embodiments, a meganuclease or functional fragment thereof possessing this preference for single-strand cleavage is used as an endonuclease domain, e.g., with nickase activity. In some embodiments, an endonuclease domain comprises a meganuclease, or a functional fragment thereof, which naturally targets or is engineered to target a safe harbor site, e.g., an I-CreI targeting SH6 site (Rodriguez-Fornes et al., supra). In some embodiments, an endonuclease domain comprises a meganuclease, or a functional fragment thereof, with a sequence tolerant catalytic domain, e.g., I-TevI recognizing the minimal motif CNNNG (Kleinstiver et al. PNAS 2012). In some embodiments, a target sequence tolerant catalytic domain is fused to a DNA binding domain, e.g., to direct activity, e.g., by fusing I-TevI to: (i) zinc fingers to create Tev-ZFEs (Kleinstiver et al. PNAS 2012), (ii) other meganucleases to create MegaTevs (Wolfs et al. Nucleic Acids Res 2014), and/or (iii) Cas9 to create TevCas9 (Wolfs et al. PNAS 2016).


In some embodiments, the endonuclease domain comprises a restriction enzyme, e.g., a Type IIS or Type IIP restriction enzyme. In some embodiments, the endonuclease domain comprises a Type IIS restriction enzyme, e.g., FokI, or a fragment or variant thereof. In some embodiments, the endonuclease domain comprises a Type IIP restriction enzyme, e.g., PvuII, or a fragment or variant thereof. In some embodiments, a dimeric restriction enzyme is expressed as a fusion such that it functions as a single chain, e.g., a FokI dimer fusion (Minczuk et al. Nucleic Acids Res 36(12):3926-3938 (2008)).


The use of additional endonuclease domains is described, for example, in Guha and Edgell Int J Mol Sci 18(22):2565 (2017), which is incorporated herein by reference in its entirety.


In some embodiments, a gene modifying polypeptide comprises a modification to an endonuclease domain, e.g., relative to a wild-type Cas protein. In some embodiments, the endonuclease domain comprises an addition, deletion, replacement, or modification to the amino acid sequence of the wild-type Cas protein. In some embodiments, the endonuclease domain is modified to include a heterologous functional domain that binds specifically to and/or induces endonuclease cleavage of a target nucleic acid (e.g., DNA) sequence of interest. In some embodiments, the endonuclease domain comprises a zinc finger. In embodiments, the endonuclease domain comprising the Cas domain is associated with a guide RNA (gRNA), e.g., as described herein. In some embodiments, the endonuclease domain is modified to include a functional domain that does not target a specific target nucleic acid (e.g., DNA) sequence. In embodiments, the endonuclease domain comprises a FokI domain.


In some embodiments, the endonuclease domain is associated with the target dsDNA in vitro at a frequency at least about 5-fold or 10-fold higher than with a scrambled dsDNA. In some embodiments, the endonuclease domain is associated with the target dsDNA in vitro at a frequency at least about 5-fold or 10-fold higher than with a scrambled dsDNA, e.g., in a cell (e.g., a HEK293T cell). In some embodiments, the frequency of association between the endonuclease domain and the target DNA or scrambled DNA is measured by ChIP-seq, e.g., as described in He and Pu (2010) Curr. Protoc Mol Biol Chapter 21 (incorporated by reference herein in its entirety).


In some embodiments, the endonuclease domain can catalyze the formation of a nick at a target sequence, e.g., to an increase of at least about 5-fold or 10-fold relative to a non-target sequence (e.g., relative to any other genomic sequence in the genome of the target cell). In some embodiments, the level of nick formation is determined using NickSeq, e.g., as described in Elacqua et al. (2019) bioRxiv doi.org/10.1101/867937 (incorporated herein by reference in its entirety).


In some embodiments, the endonuclease domain is capable of nicking DNA in vitro. In embodiments, the nick results in an exposed base. In embodiments, the exposed base can be detected using a nuclease sensitivity assay, e.g., as described in Chaudhry and Weinfeld (1995) Nucleic Acids Res 23(19):3805-3809 (incorporated by reference herein in its entirety). In embodiments, the level of exposed bases (e.g., detected by the nuclease sensitivity assay) is increased by at least 10%, 50%, or more relative to a reference endonuclease domain. In some embodiments, the reference endonuclease domain is an endonuclease domain from Cas9 of S. pyogenes.


In some embodiments, the endonuclease domain is capable of nicking DNA in a cell. In embodiments, the endonuclease domain is capable of nicking DNA in a HEK293T cell. In embodiments, an unrepaired nick that undergoes replication in the absence of Rad51 results in increased NHEJ rates at the site of the nick, which can be detected, e.g., by using a Rad51 inhibition assay, e.g., as described in Bothmer et al. (2017) Nat Commun 8:13905 (incorporated by reference herein in its entirety). In embodiments, NHEJ rates are increased above 0-5%. In embodiments, NHEJ rates are increased to 20-70% (e.g., between 30%-60% or 40-50%), e.g., upon Rad51 inhibition.


In some embodiments, the endonuclease domain releases the target after cleavage. In some embodiments, release of the target is indicated indirectly by assessing for multiple turnovers by the enzyme, e.g., as described in Yourik et al. RNA 25(1):35-44 (2019) (incorporated herein by reference in its entirety) and shown in FIG. 2. In some embodiments, the kexp of an endonuclease domain is 1×10⁻³ to 1×10⁻⁵ min⁻¹ as measured by such methods.


In some embodiments, the endonuclease domain has a catalytic efficiency (kcat/Km) greater than about 1×10⁸ s⁻¹ M⁻¹ in vitro. In embodiments, the endonuclease domain has a catalytic efficiency greater than about 1×10⁵, 1×10⁶, 1×10⁷, or 1×10⁸ s⁻¹ M⁻¹ in vitro. In embodiments, catalytic efficiency is determined as described in Chen et al. (2018) Science 360(6387):436-439 (incorporated herein by reference in its entirety). In some embodiments, the endonuclease domain has a catalytic efficiency (kcat/Km) greater than about 1×10⁸ s⁻¹ M⁻¹ in cells. In embodiments, the endonuclease domain has a catalytic efficiency greater than about 1×10⁵, 1×10⁶, 1×10⁷, or 1×10⁸ s⁻¹ M⁻¹ in cells.
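The catalytic efficiency comparison above reduces to a ratio of two kinetic constants. The following minimal sketch assumes that kcat (in s^-1) and Km (in M) have already been fitted from kinetic data; the function names and the numerical inputs in the usage note are hypothetical, and only the four thresholds are taken from the text above.

```python
# Thresholds recited above, in s^-1 M^-1.
THRESHOLDS = (1e5, 1e6, 1e7, 1e8)


def catalytic_efficiency(kcat_per_s: float, km_molar: float) -> float:
    """Return kcat/Km in s^-1 M^-1 from kcat (s^-1) and Km (M)."""
    if km_molar <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_molar


def highest_threshold_exceeded(efficiency: float):
    """Return the largest recited threshold that `efficiency` exceeds, or None."""
    exceeded = [t for t in THRESHOLDS if efficiency > t]
    return max(exceeded) if exceeded else None
```

For example, a hypothetical enzyme with kcat = 5 s^-1 and Km = 10 nM (1e-8 M) has kcat/Km of about 5×10^8 s^-1 M^-1, which exceeds the highest recited threshold.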


Gene Modifying Polypeptides Comprising Cas Domains


In some embodiments, a gene modifying polypeptide described herein comprises a Cas domain. In some embodiments, the Cas domain can direct the gene modifying polypeptide to a target site specified by a gRNA spacer, thereby modifying a target nucleic acid sequence in “cis”. In some embodiments, a gene modifying polypeptide is fused to a Cas domain. In some embodiments, a gene modifying polypeptide comprises a CRISPR/Cas domain (also referred to herein as a CRISPR-associated protein). In some embodiments, a CRISPR/Cas domain comprises a protein involved in the clustered regularly interspaced short palindromic repeat (CRISPR) system, e.g., a Cas protein, and optionally binds a guide RNA, e.g., single guide RNA (sgRNA).


CRISPR systems are adaptive defense systems originally discovered in bacteria and archaea. CRISPR systems use RNA-guided nucleases termed CRISPR-associated or “Cas” endonucleases (e.g., Cas9 or Cpf1) to cleave foreign DNA. For example, in a typical CRISPR-Cas system, an endonuclease is directed to a target nucleotide sequence (e.g., a site in the genome that is to be sequence-edited) by sequence-specific, non-coding “guide RNAs” that target single- or double-stranded DNA sequences. Three classes (I-III) of CRISPR systems have been identified. The class II CRISPR systems use a single Cas endonuclease (rather than multiple Cas proteins). One class II CRISPR system includes a type II Cas endonuclease such as Cas9, a CRISPR RNA (“crRNA”), and a trans-activating crRNA (“tracrRNA”). The crRNA contains a “spacer” sequence, a typically about 20-nucleotide RNA sequence that corresponds to a target DNA sequence (“protospacer”). In the wild-type system, and in some engineered systems, crRNA also contains a region that binds to the tracrRNA to form a partially double-stranded structure that is cleaved by RNase III, resulting in a crRNA/tracrRNA hybrid molecule. A crRNA/tracrRNA hybrid then directs the Cas endonuclease to recognize and cleave a target DNA sequence. A target DNA sequence is generally adjacent to a “protospacer adjacent motif” (“PAM”) that is specific for a given Cas endonuclease and required for cleavage activity at a target site matching the spacer of the crRNA. CRISPR endonucleases identified from various prokaryotic species have unique PAM sequence requirements, e.g., as listed for exemplary Cas enzymes in Table 7; examples of PAM sequences include 5′-NGG (Streptococcus pyogenes), 5′-NNAGAA (Streptococcus thermophilus CRISPR1), 5′-NGGNG (Streptococcus thermophilus CRISPR3), and 5′-NNNGATT (Neisseria meningitidis). Some endonucleases, e.g., Cas9 endonucleases, are associated with G-rich PAM sites, e.g., 5′-NGG, and perform blunt-end cleaving of the target DNA at a location 3 nucleotides upstream from (5′ from) the PAM site. Another class II CRISPR system includes the type V endonuclease Cpf1, which is smaller than Cas9; examples include AsCpf1 (from Acidaminococcus sp.) and LbCpf1 (from Lachnospiraceae sp.). Cpf1-associated CRISPR arrays are processed into mature crRNAs without the requirement of a tracrRNA; in other words, a Cpf1 system, in some embodiments, comprises only Cpf1 nuclease and a crRNA to cleave a target DNA sequence. Cpf1 endonucleases are typically associated with T-rich PAM sites, e.g., 5′-TTN. Cpf1 can also recognize a 5′-CTA PAM motif. Cpf1 typically cleaves a target DNA by introducing an offset or staggered double-strand break with a 4- or 5-nucleotide 5′ overhang, for example, cleaving a target DNA with a 5-nucleotide offset or staggered cut located 18 nucleotides downstream from (3′ from) a PAM site on the coding strand and 23 nucleotides downstream from the PAM site on the complementary strand; the 5-nucleotide overhang that results from such offset cleavage allows more precise genome editing by DNA insertion by homologous recombination than by insertion at blunt-end cleaved DNA. See, e.g., Zetsche et al. (2015) Cell, 163:759-771.
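The cut-site geometry described above (a blunt cut 3 nucleotides 5′ of a 5′-NGG PAM for Cas9, and a staggered cut 18 and 23 nucleotides 3′ of a 5′-TTN PAM for Cpf1) can be sketched as a simple sequence scan. This is an illustrative sketch only: it scans a single strand, uses 0-based indices in which a cut index i means the break falls between seq[i-1] and seq[i], and assumes a 20-nucleotide protospacer; the function names are hypothetical.

```python
import re


def cas9_cut_sites(seq: str, spacer_len: int = 20) -> list[int]:
    """Cut indices for every 5'-NGG PAM on the given strand.

    The protospacer is taken to lie immediately 5' of the PAM, and the
    blunt cut is placed 3 nucleotides 5' of the PAM.
    """
    sites = []
    # Lookahead so that overlapping PAM candidates are all reported.
    for match in re.finditer(r"(?=[ACGT]GG)", seq.upper()):
        pam_start = match.start()
        if pam_start >= spacer_len:  # room for a full protospacer
            sites.append(pam_start - 3)
    return sites


def cpf1_cut_sites(seq: str) -> list[tuple[int, int]]:
    """Staggered cut indices for every 5'-TTN PAM on the given strand.

    Returns (cut 18 nt 3' of the PAM, cut 23 nt 3' of the PAM); the
    second cut is the one attributed to the complementary strand.
    """
    sites = []
    for match in re.finditer(r"(?=TT[ACGT])", seq.upper()):
        pam_end = match.start() + 3
        if pam_end + 23 <= len(seq):
            sites.append((pam_end + 18, pam_end + 23))
    return sites
```

The difference between the two Cpf1 indices is the 5-nucleotide overhang described above. A practical design tool would also scan the reverse complement and apply guide-quality filters, which are omitted here.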


A variety of CRISPR associated (Cas) genes or proteins can be used in the technologies provided by the present disclosure and the choice of Cas protein will depend upon the particular conditions of the method. Specific examples of Cas proteins include class II systems including Cas1, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9, Cas10, Cpf1, C2C1, or C2C3. In some embodiments, a Cas protein, e.g., a Cas9 protein, may be from any of a variety of prokaryotic species. In some embodiments a particular Cas protein, e.g., a particular Cas9 protein, is selected to recognize a particular protospacer-adjacent motif (PAM) sequence. In some embodiments, a DNA-binding domain or endonuclease domain includes a sequence targeting polypeptide, such as a Cas protein, e.g., Cas9. In certain embodiments a Cas protein, e.g., a Cas9 protein, may be obtained from a bacterium or archaeon or synthesized using known methods. In certain embodiments, a Cas protein may be from a gram-positive bacterium or a gram-negative bacterium. In certain embodiments, a Cas protein may be from a Streptococcus (e.g., a S. pyogenes, or a S. thermophilus), a Francisella (e.g., an F. novicida), a Staphylococcus (e.g., an S. aureus), an Acidaminococcus (e.g., an Acidaminococcus sp. BV3L6), a Neisseria (e.g., an N. meningitidis), a Cryptococcus, a Corynebacterium, a Haemophilus, a Eubacterium, a Pasteurella, a Prevotella, a Veillonella, or a Marinobacter.


In some embodiments, a gene modifying polypeptide may comprise the amino acid sequence of SEQ ID NO: 4000 below, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% identity thereto. In embodiments, the amino acid sequence of SEQ ID NO: 4000 below, or the sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% identity thereto, is positioned at the N-terminal end of the heterologous gene modifying polypeptide. In embodiments, the amino acid sequence of SEQ ID NO: 4000 below, or the sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% identity thereto, is positioned within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, or 30 amino acids of the N-terminal end of the gene modifying polypeptide.


Exemplary N-Terminal NLS-Cas9 Domain (SEQ ID NO: 4000)

MPAAKRVKLDGGDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI
KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEE
SFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIK
FRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLE
NLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIG
DQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQL
PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQR
TFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAW
MTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE
LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISG
VEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLF
DDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDS
LTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVI
EMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGR
DMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKK
MKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDS
RMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGT
ALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANG
EIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRN
SDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSF
EKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYV
NFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSA
YNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITG
LYETRIDLSQLGGDGG






In some embodiments, a gene modifying polypeptide may comprise the amino acid sequence of SEQ ID NO: 4001 below, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% identity thereto. In embodiments, the amino acid sequence of SEQ ID NO: 4001 below, or the sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% identity thereto, is positioned at the C-terminal end of the heterologous gene modifying polypeptide. In embodiments, the amino acid sequence of SEQ ID NO: 4001 below, or the sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% identity thereto, is positioned within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, or 30 amino acids of the C-terminal end of the heterologous gene modifying polypeptide.


Exemplary C-Terminal Sequence (SEQ ID NO: 4001)

AGKRTADGSEFEKRTADGSEFESPKKKAKVE






Exemplary Benchmarking Sequence (SEQ ID NO: 4002)

MPAAKRVKLDGGDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI
KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEE
SFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIK
FRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLE
NLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIG
DQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQL
PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQR
TFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAW
MTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNE
LTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISG
VEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLF
DDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDS
LTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVI
EMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGR
DMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKK
MKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDS
RMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGT
ALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANG
EIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRN
SDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSF
EKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYV
NFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSA
YNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITG
LYETRIDLSQLGGDGGSGGSSGGSSGSETPGTSESATPESSGGSSGGSSGGTLNIEDEYRL
HETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQE
ARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTV
PNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRL
PQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLG
NLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLG
KAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTK
PFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTK
DAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVAL
NPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAG
AAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHG
EIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQA
ARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEAGKRTADGSEFEKRTADGSEFESPK
KKAKVE.






In some embodiments, a gene modifying polypeptide may comprise a Cas domain as listed in Table 7 or 8, or a functional fragment thereof, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% identity thereto.









TABLE 7

CRISPR/Cas Proteins, Species, and Mutations

Name | Enzyme | Species | # of AAs | PAM | Mutations to alter PAM recognition | Mutations to make catalytically dead
FnCas9 | Cas9 | Francisella novicida | 1629 | 5′-NGG-3′ | Wt | D11A/H969A/N995A
FnCas9 RHA | Cas9 | Francisella novicida | 1629 | 5′-YG-3′ | E1369R/E1449H/R1556A | D11A/H969A/N995A
SaCas9 | Cas9 | Staphylococcus aureus | 1053 | 5′-NNGRRT-3′ | Wt | D10A/H557A
SaCas9 KKH | Cas9 | Staphylococcus aureus | 1053 | 5′-NNNRRT-3′ | E782K/N968K/R1015H | D10A/H557A
SpCas9 | Cas9 | Streptococcus pyogenes | 1368 | 5′-NGG-3′ | Wt | D10A/D839A/H840A/N863A
SpCas9 VQR | Cas9 | Streptococcus pyogenes | 1368 | 5′-NGA-3′ | D1135V/R1335Q/T1337R | D10A/D839A/H840A/N863A
AsCpf1 RR | Cpf1 | Acidaminococcus sp. BV3L6 | 1307 | 5′-TYCV-3′ | S542R/K607R | E993A
AsCpf1 RVR | Cpf1 | Acidaminococcus sp. BV3L6 | 1307 | 5′-TATV-3′ | S542R/K548V/N552R | E993A
FnCpf1 | Cpf1 | Francisella novicida | 1300 | 5′-NTTN-3′ | Wt | D917A/E1006A/D1255A
NmCas9 | Cas9 | Neisseria meningitidis | 1082 | 5′-NNNGATT-3′ | Wt | D16A/D587A/H588A/N611A
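The PAM entries in Table 7 use IUPAC degenerate nucleotide codes (e.g., N for any base, R for A or G, Y for C or T, V for A, C, or G). As an illustrative sketch, a PAM written in these codes can be expanded into a regular expression and tested against a candidate sequence. The function names are hypothetical; the code table is the standard IUPAC nucleotide ambiguity set.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_to_regex(pam: str) -> str:
    """Expand an IUPAC-coded PAM (e.g., 'NNGRRT') into a regex pattern."""
    return "".join(f"[{IUPAC[code]}]" for code in pam.upper())


def pam_matches(pam: str, candidate: str) -> bool:
    """True if `candidate` (read 5' to 3') is an instance of the PAM."""
    return re.fullmatch(pam_to_regex(pam), candidate.upper()) is not None
```

For example, `pam_matches("NNGRRT", "ACGAGT")` is true for the SaCas9 PAM, whereas `pam_matches("TYCV", "TTCT")` is false because V excludes T.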
















TABLE 8

Amino Acid Sequences of CRISPR/Cas Proteins, Species, and Mutations

Variant | Parental Host(s) | Protein Sequence | SEQ ID NO: | Nickase (HNH) | Nickase (HNH) | Nickase (RuvC)





Nme2Cas9

Neisseria

MAAFKPNPINYILGLDIGIASVGWAMVEIDEEENPIRLIDLGVRVFERAEVPK
9,001
N611A
H588A
D16A




meningitidis

TGDSLAMARRLARSVRRLTRRRAHRLLRARRLLKREGVLQAADFDENGLIKS








LPNTPWQLRAAALDRKLTPLEWSAVLLHLIKHRGYLSQRKNEGETADKELG








ALLKGVANNAHALQTGDFRTPAELALNKFEKESGHIRNQRGDYSHTFSRKD








LQAELILLFEKQKEFGNPHVSGGLKEGIETLLMTQRPALSGDAVQKMLGHCT








FEPAEPKAAKNTYTAERFIWLTKLNNLRILEQGSERPLTDTERATLMDEPYRK








SKLTYAQARKLLGLEDTAFFKGLRYGKDNAEASTLMEMKAYHAISRALEKEG








LKDKKSPLNLSSELQDEIGTAFSLFKTDEDITGRLKDRVQPEILEALLKHISFDKF








VQISLKALRRIVPLMEQGKRYDEACAEIYGDHYGKKNTEEKIYLPPIPADEIRN








PVVLRALSQARKVINGVVRRYGSPARIHIETAREVGKSFKDRKEIEKRQEENR








KDREKAAAKFREYFPNFVGEPKSKDILKLRLYEQQHGKCLYSGKEINLVRLNE








KGYVEIDHALPFSRTWDDSFNNKVLVLGSENQNKGNQTPYEYFNGKDNSR








EWQEFKARVETSRFPRSKKQRILLQKFDEDGFKECNLNDTRYVNRFLCQFVA








DHILLTGKGKRRVFASNGQITNLLRGFWGLRKVRAENDRHHALDAVVVACS








TVAMQQKITRFVRYKEMNAFDGKTIDKETGKVLHQKTHFPQPWEFFAQEV








MIRVFGKPDGKPEFEEADTPEKLRTLLAEKLSSRPEAVHEYVTPLFVSRAPNR








KMSGAHKDTLRSAKRFVKHNEKISVKRVWLTEIKLADLENMVNYKNGREIEL








YEALKARLEAYGGNAKQAFDPKDNPFYKKGGQLVKAVRVEKTQESGVLLNK








KNAYTIADNGDMVRVDVFCKVDKKGKNQYFIVPIYAWQVAENILPDIDCKG








YRIDDSYTFCFSLHKYDLIAFQKDEKSKVEFAYYINCDSSNGRFYLAWHDKGS








KEQQFRISTQNLVLIQKYQVNELGKEIRPCRLKKRPPVR









PpnCas9 | Pasteurella pneumotropica | SEQ ID NO: 9,002 | Nickase (HNH): N605A | Nickase (HNH): H582A | Nickase (RuvC): D13A
MQNNPLNYILGLDLGIASIGWAVVEIDEESSPIRLIDVGVRTFERAEVAKTGE

SLALSRRLARSSRRLIKRRAERLKKAKRLLKAEKILHSIDEKLPINVWQLRVKGL








KEKLERQEWAAVLLHLSKHRGYLSQRKNEGKSDNKELGALLSGIASNHQML








QSSEYRTPAEIAVKKFQVEEGHIRNQRGSYTHTFSRLDLLAEMELLFQRQAEL








GNSYTSTTLLENLTALLMWQKPALAGDAILKMLGKCTFEPSEYKAAKNSYSA








ERFVWLTKLNNLRILENGTERALNDNERFALLEQPYEKSKLTYAQVRAMLAL








SDNAIFKGVRYLGEDKKTVESKTTLIEMKFYHQIRKTLGSAELKKEWNELKGN








SDLLDEIGTAFSLYKTDDDICRYLEGKLPERVLNALLENLNFDKFIQLSLKALHQ








ILPLMLQGQRYDEAVSAIYGDHYGKKSTETTRLLPTIPADEIRNPVVLRTLTQA








RKVINAVVRLYGSPARIHIETAREVGKSYQDRKKLEKQQEDNRKQRESAVKK








FKEMFPHFVGEPKGKDILKMRLYELQQAKCLYSGKSLELHRLLEKGYVEVDH








ALPFSRTWDDSFNNKVLVLANENQNKGNLTPYEWLDGKNNSERWQHFVV








RVQTSGFSYAKKQRILNHKLDEKGFIERNLNDTRYVARFLCNFIADNMLLVG








KGKRNVFASNGQITALLRHRWGLQKVREQNDRHHALDAVVVACSTVAMQ








QKITRFVRYNEGNVFSGERIDRETGEIIPLHFPSPWAFFKENVEIRIFSENPKLE








LENRLPDYPQYNHEWVQPLFVSRMPTRKMTGQGHMETVKSAKRLNEGLS








VLKVPLTQLKLSDLERMVNRDREIALYESLKARLEQFGNDPAKAFAEPFYKKG








GALVKAVRLEQTQKSGVLVRDGNGVADNASMVRVDVFTKGGKYFLVPIYT








WQVAKGILPNRAATQGKDENDWDIMDEMATFQFSLCQNDLIKLVTKKKTI








FGYFNGLNRATSNINIKEHDLDKSKGKLGIYLEVGVKLAISLEKYQVDELGKNI








RPCRPTKRQHVR









SauCas9 | Staphylococcus aureus | SEQ ID NO: 9,003 | Nickase (HNH): N580A | Nickase (HNH): H557A | Nickase (RuvC): D10A
MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGA

RRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSA








ALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKK








DGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYE








GPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLN








NLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTST








GKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSE








LTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKV








DLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSK








DAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYS








LEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQ








YLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNL








VDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKG








YKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQ








EYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRELINDTLYSTRKDDKGNTLIVN








NLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNPL








YKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVKL








SLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQA








EFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPP








RIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG









SauCas9-KKH | Staphylococcus aureus | SEQ ID NO: 9,004 | Nickase (HNH): N580A | Nickase (HNH): H557A | Nickase (RuvC): D10A
MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGA

RRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSA








ALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKK








DGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYE








GPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLN








NLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTST








GKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSE








LTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKV








DLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSK








DAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYS








LEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQ








YLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNL








VDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKG








YKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQ








EYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIV








NNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNP








LYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVK








LSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQ








AEFIASFYKNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRP








PHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG









SauriCas9 | Staphylococcus auricularis | SEQ ID NO: 9,005 | Nickase (HNH): N588A | Nickase (HNH): H565A | Nickase (RuvC): D15A
MQENQQKQNYILGLDIGITSVGYGLIDSKTREVIDAGVRLFPEADSENNSNR

RSKRGARRLKRRRIHRLNRVKDLLADYQMIDLNNVPKSTDPYTIRVKGLREPL








TKEEFAIALLHIAKRRGLHNISVSMGDEEQDNELSTKQQLQKNAQQLQDKY








VCELQLERLTNINKVRGEKNRFKTEDFVKEVKQLCETQRQYHNIDDQFIQQY








IDLVSTRREYFEGPGNGSPYGWDGDLLKWYEKLMGRCTYFPEELRSVKYAYS








ADLFNALNDLNNLVVTRDDNPKLEYYEKYHIIENVFKQKKNPTLKQIAKEIGV








QDYDIRGYRITKSGKPQFTSFKLYHDLKNIFEQAKYLEDVEMLDEIAKILTIYQ








DEISIKKALDQLPELLTESEKSQIAQLTGYTGTHRLSLKCIHIVIDELWESPENQ








MEIFTRLNLKPKKVEMSEIDSIPTTLVDEFILSPVVKRAFIQSIKVINAVINRFGL








PEDIIIELAREKNSKDRRKFINKLQKQNEATRKKIEQLLAKYGNTNAKYMIEKI








KLHDMQEGKCLYSLEAIPLEDLLSNPTHYEVDHIIPRSVSFDNSLNNKVLVKQ








SENSKKGNRTPYQYLSSNESKISYNQFKQHILNLSKAKDRISKKKRDMLLEER








DINKFEVQKEFINRNLVDTRYATRELSNLLKTYFSTHDYAVKVKTINGGFTNH








LRKVWDFKKHRNHGYKHHAEDALVIANADFLFKTHKALRRTDKILEQPGLE








VNDTTVKVDTEEKYQELFETPKQVKNIKQFRDFKYSHRVDKKPNRQLINDTL








YSTREIDGETYVVQTLKDLYAKDNEKVKKLFTERPQKILMYQHDPKTFEKLM








TILNQYAEAKNPLAAYYEDKGEYVTKYAKKGNGPAIHKIKYIDKKLGSYLDVS








NKYPETQNKLVKLSLKSFRFDIYKCEQGYKMVSIGYLDVLKKDNYYYIPKDKYE








AEKQKKKIKESDLFVGSFYYNDLIMYEDELFRVIGVNSDINNLVELNMVDITY








KDFCEVNNVTGEKRIKKTIGKRVVLIEKYTTDILGNLYKTPLPKKPQLIFKRGEL









SauriCas9-KKH | Staphylococcus auricularis | SEQ ID NO: 9,006 | Nickase (HNH): N588A | Nickase (HNH): H565A | Nickase (RuvC): D15A
MQENQQKQNYILGLDIGITSVGYGLIDSKTREVIDAGVRLFPEADSENNSNR

RSKRGARRLKRRRIHRLNRVKDLLADYQMIDLNNVPKSTDPYTIRVKGLREPL








TKEEFAIALLHIAKRRGLHNISVSMGDEEQDNELSTKQQLQKNAQQLQDKY








VCELQLERLTNINKVRGEKNRFKTEDFVKEVKQLCETQRQYHNIDDQFIQQY








IDLVSTRREYFEGPGNGSPYGWDGDLLKWYEKLMGRCTYFPEELRSVKYAYS








ADLFNALNDLNNLVVTRDDNPKLEYYEKYHIIENVFKQKKNPTLKQIAKEIGV








QDYDIRGYRITKSGKPQFTSFKLYHDLKNIFEQAKYLEDVEMLDEIAKILTIYQ








DEISIKKALDQLPELLTESEKSQIAQLTGYTGTHRLSLKCIHIVIDELWESPENQ








MEIFTRLNLKPKKVEMSEIDSIPTTLVDEFILSPVVKRAFIQSIKVINAVINRFGL








PEDIIIELAREKNSKDRRKFINKLQKQNEATRKKIEQLLAKYGNTNAKYMIEKI








KLHDMQEGKCLYSLEAIPLEDLLSNPTHYEVDHIIPRSVSFDNSLNNKVLVKQ








SENSKKGNRTPYQYLSSNESKISYNQFKQHILNLSKAKDRISKKKRDMLLEER








DINKFEVQKEFINRNLVDTRYATRELSNLLKTYFSTHDYAVKVKTINGGFTNH








LRKVWDFKKHRNHGYKHHAEDALVIANADFLFKTHKALRRTDKILEQPGLE








VNDTTVKVDTEEKYQELFETPKQVKNIKQFRDFKYSHRVDKKPNRKLINDTL








YSTREIDGETYVVQTLKDLYAKDNEKVKKLFTERPQKILMYQHDPKTFEKLM








TILNQYAEAKNPLAAYYEDKGEYVTKYAKKGNGPAIHKIKYIDKKLGSYLDVS








NKYPETQNKLVKLSLKSFRFDIYKCEQGYKMVSIGYLDVLKKDNYYYIPKDKYE








AEKQKKKIKESDLFVGSFYKNDLIMYEDELFRVIGVNSDINNLVELNMVDITY








KDFCEVNNVTGEKHIKKTIGKRVVLIEKYTTDILGNLYKTPLPKKPQLIFKRGEL









ScaCas9-Sc++ | Streptococcus canis | SEQ ID NO: 9,007 | Nickase (HNH): N872A | Nickase (HNH): H849A | Nickase (RuvC): D10A
MEKKYSIGLDIGTNSVGWAVITDDYKVPSKKFKVLGNTNRKSIKKNLMGALL

FDSGETAEATRLKRTARRRYTRRKNRIRYLQEIFANEMAKLDDSFFQRLEESF








LVEEDKKNERHPIFGNLADEVAYHRNYPTIYHLRKKLADSPEKADLRLIYLALA








HIIKFRGHFLIEGKLNAENSDVAKLFYQLIQTYNQLFEESPLDEIEVDAKGILSA








RLSKSKRLEKLIAVFPNEKKNGLFGNIIALALGLTPNFKSNFDLTEDAKLQLSKD








TYDDDLDELLGQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMV








KRYDEHHQDLALLKTLVRQQFPEKYAEIFKDDTKNGYAGYVGADKKLRKRS








GKLATEEEFYKFIKPILEKMDGAEELLAKLNRDDLLRKQRTFDNGSIPHQIHLK








ELHAILRRQEEFYPFLKENREKIEKILTFRIPYYVGPLARGNSRFAWLTRKSEEA








ITPWNFEEVVDKGASAQSFIERMTNFDEQLPNKKVLPKHSLLYEYFTVYNEL








TKVKYVTERMRKPEFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDS








VEIIGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIE








ERLKTYAHLFDDKVMKQLKRRHYTGWGRLSRKMINGIRDKQSGKTILDFLKS








DGFSNRNFMQLIHDDSLTFKEEIEKAQVSGQGDSLHEQIADLAGSPAIKKGIL








QTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQQSRERKKRIEEGIKELE








SQILKENPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVP








QSFIKDDSIDNKVLTRSVENRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQ








RKFDNLTKAERGGLSEADKAGFIKRQLVETRQITKHVARILDSRMNTKRDKN








DKPIREVKVITLKSKLVSDFRKDFQLYKVRDINNYHHAHDAYLNAVVGTALIK








KYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSNIMNFFKTEVKL








ANGEIRKRPLIETNGETGEVVWNKEKDFATVRKVLAMPQVNIVKKTEVQTG








GFSKESILSKRESAKLIPRKKGWDTRKYGGFGSPTVAYSILVVAKVEKGKAKKL








KSVKVLVGITIMEKGSYEKDPIGFLEAKGYKDIKKELIFKLPKYSLFELENGRRR








MLASAKELQKANELVLPQHLVRLLYYTQNISATTGSNNLGYIEQHREEFKEIF








EKIIDFSEKYILKNKVNSNLKSSFDEQFAVSDSILLSNSFVSLLKYTSFGASGGFT








FLDLDVKQGRLRYQTVTEVLDATLIYQSITGLYETRTDLSQLGGD









SpyCas9 | Streptococcus pyogenes | SEQ ID NO: 9,008 | Nickase (HNH): N863A | Nickase (HNH): H840A | Nickase (RuvC): D10A
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF








YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ








EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE








EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR








NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKV








VDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQ








ILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








ILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE








LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASA








GELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEII








EQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAF








KYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD









SpyCas9-NG | Streptococcus pyogenes | SEQ ID NO: 9,009 | Nickase (HNH): N863A | Nickase (HNH): H840A | Nickase (RuvC): D10A
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF








YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ








EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE








EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR








NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKV








VDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQ








ILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








IRPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKE








LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASA








RFLQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEII








EQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPRAF








KYFDTTIDRKVYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD









SpyCas9-SpRY | Streptococcus pyogenes | SEQ ID NO: 9,010 | Nickase (HNH): N863A | Nickase (HNH): H840A | Nickase (RuvC): D10A
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAERTRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF








YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ








EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE








EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR








NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKV








VDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQ








ILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








IRPKRNSDKLIARKKDWDPKKYGGFLWPTVAYSVLVVAKVEKGKSKKLKSVK








ELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLAS








AKQLQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDE








IIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTRLGAPRAF








KYFDTTIDPKQYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD









St1Cas9 | Streptococcus thermophilus | SEQ ID NO: 9,011 | Nickase (HNH): N622A | Nickase (HNH): H599A | Nickase (RuvC): D9A
MSDLVLGLDIGIGSVGVGILNKVTGEIIHKNSRIFPAAQAENNLVRRTNRQG

RRLARRKKHRRVRLNRLFEESGLITDFTKISINLNPYQLRVKGLTDELSNEELFI








ALKNMVKHRGISYLDDASDDGNSSVGDYAQIVKENSKQLETKTPGQIQLER








YQTYGQLRGDFTVEKDGKKHRLINVFPTSAYRSEALRILQTQQEFNPQITDEF








INRYLEILTGKRKYYHGPGNEKSRTDYGRYRTSGETLDNIFGILIGKCTFYPDEF








RAAKASYTAQEFNLLNDLNNLTVPTETKKLSKEQKNQIINYVKNEKAMGPAK








LFKYIAKLLSCDVADIKGYRIDKSGKAEIHTFEAYRKMKTLETLDIEQMDRETL








DKLAYVLTLNTEREGIQEALEHEFADGSFSQKQVDELVQFRKANSSIFGKGW








HNFSVKLMMELIPELYETSEEQMTILTRLGKQKTTSSSNKTKYIDEKLLTEEIY








NPVVAKSVRQAIKIVNAAIKEYGDFDNIVIEMARETNEDDEKKAIQKIQKAN








KDEKDAAMLKAANQYNGKAELPHSVFHGHKQLATKIRLWHQQGERCLYT








GKTISIHDLINNSNQFEVDHILPLSITFDDSLANKVLVYATANQEKGQRTPYQ








ALDSMDDAWSFRELKAFVRESKTLSNKKKEYLLTEEDISKFDVRKKFIERNLV








DTRYASRVVLNALQEHFRAHKIDTKVSVVRGQFTSQLRRHWGIEKTRDTYH








HHAVDALIIAASSQLNLWKKQKNTLVSYSEDQLLDIETGELISDDEYKESVFK








APYQHFVDTLKSKEFEDSILFSYQVDSKFNRKISDATIYATRQAKVGKDKADE








TYVLGKIKDIYTQDGYDAFMKIYKKDKSKFLMYRHDPQTFEKVIEPILENYPN








KQINEKGKEVPCNPFLKYKEEHGYIRKYSKKGNGPEIKSLKYYDSKLGNHIDIT








PKDSNNKVVLQSVSPWRADVYFNKTTGKYEILGLKYADLQFEKGTGTYKISQ








EKYNDIKKKEGVDSDSEFKFTLYKNDLLLVKDTETKEQQLFRFLSRTMPKQKH








YVELKPYDKQKFEGGEALIKVLGNVANSGQCKKGLGKSNISIYKVRTDVLGN








QHIIKNEGDKPKLDF









BlatCas9 | Brevibacillus laterosporus | SEQ ID NO: 9,012 | Nickase (HNH): N607A | Nickase (HNH): H584A | Nickase (RuvC): D8A
MAYTMGIDVGIASCGWAIVDLERQRIIDIGVRTFEKAENPKNGEALAVPRRE

ARSSRRRLRRKKHRIERLKHMFVRNGLAVDIQHLEQTLRSQNEIDVWQLRV








DGLDRMLTQKEWLRVLIHLAQRRGFQSNRKTDGSSEDGQVLVNVTENDRL








MEEKDYRTVAEMMVKDEKFSDHKRNKNGNYHGVVSRSSLLVEIHTLFETQ








RQHHNSLASKDFELEYVNIWSAQRPVATKDQIEKMIGTCTFLPKEKRAPKAS








WHFQYFMLLQTINHIRITNVQGTRSLNKEEIEQVVNMALTKSKVSYHDTRKI








LDLSEEYQFVGLDYGKEDEKKKVESKETIIKLDDYHKLNKIFNEVELAKGETWE








ADDYDTVAYALTFFKDDEDIRDYLQNKYKDSKNRLVKNLANKEYTNELIGKV








STLSFRKVGHLSLKALRKIIPFLEQGMTYDKACQAAGFDFQGISKKKRSVVLP








VIDQISNPVVNRALTQTRKVINALIKKYGSPETIHIETARELSKTFDERKNITKD








YKENRDKNEHAKKHLSELGIINPTGLDIVKYKLWCEQQGRCMYSNQPISFER








LKESGYTEVDHIIPYSRSMNDSYNNRVLVMTRENREKGNQTPFEYMGNDT








QRWYEFEQRVTTNPQIKKEKRQNLLLKGFTNRRELEMLERNLNDTRYITKYL








SHFISTNLEFSPSDKKKKVVNTSGRITSHLRSRWGLEKNRGQNDLHHAMDAI








VIAVTSDSFIQQVTNYYKRKERRELNGDDKFPLPWKFFREEVIARLSPNPKEQ








IEALPNHFYSEDELADLQPIFVSRMPKRSITGEAHQAQFRRVVGKTKEGKNIT








AKKTALVDISYDKNGDFNMYGRETDPATYEAIKERYLEFGGNVKKAFSTDLH








KPKKDGTKGPLIKSVRIMENKTLVHPVNKGKGVVYNSSIVRTDVFQRKEKYY








LLPVYVTDVTKGKLPNKVIVAKKGYHDWIEVDDSFTFLFSLYPNDLIFIRQNPK








KKISLKKRIESHSISDSKEVQEIHAYYKGVDSSTAAIEFIIHDGSYYAKGVGVQN








LDCFEKYQVDILGNYFKVKGEKRLELETSDSNHKGKDVNSIKSTSR









cCas9-v16 | Staphylococcus aureus | SEQ ID NO: 9,013 | Nickase (HNH): N580A | Nickase (HNH): H557A | Nickase (RuvC): D10A
MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGA

RRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSA








ALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKK








DGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYE








GPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLN








NLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTST








GKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSE








LTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKV








DLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSK








DAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYS








LEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQ








YLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNL








VDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKG








YKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQ








EYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIV








NNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNP








LYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVK








LSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQ








AEFIASFYKNDLIKINGELYRVIGVNSDKNNLIEVNMIDITYREYLENMNDKRP








PHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG









cCas9-v17 | Staphylococcus aureus | SEQ ID NO: 9,014 | Nickase (HNH): N580A | Nickase (HNH): H557A | Nickase (RuvC): D10A
MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGA

RRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSA








ALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKK








DGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYE








GPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLN








NLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTST








GKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSE








LTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKV








DLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSK








DAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYS








LEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQ








YLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNL








VDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKG








YKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQ








EYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIV








NNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNP








LYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVK








LSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQ








AEFIASFYKNDLIKINGELYRVIGVNNSTRNIVELNMIDITYREYLENMNDKRP








PHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG









cCas9-v21 | Staphylococcus aureus | SEQ ID NO: 9,015 | Nickase (HNH): N580A | Nickase (HNH): H557A | Nickase (RuvC): D10A
MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGA

RRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSA








ALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKK








DGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYE








GPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLN








NLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTST








GKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSE








LTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKV








DLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSK








DAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYS








LEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQ








YLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNL








VDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKG








YKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQ








EYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIV








NNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNP








LYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVK








LSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQ








AEFIASFYKNDLIKINGELYRVIGVNSDDRNIIELNMIDITYREYLENMNDKRP








PHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG









cCas9-v42 | Staphylococcus aureus | SEQ ID NO: 9,016 | Nickase (HNH): N580A | Nickase (HNH): H557A | Nickase (RuvC): D10A
MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGA

RRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSA








ALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKK








DGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYE








GPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLN








NLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTST








GKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSE








LTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKV








DLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSK








DAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYS








LEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQ








YLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNL








VDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKG








YKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQ








EYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIV








NNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNP








LYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVK








LSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQ








AEFIASFYKNDLIKINGELYRVIGVNNNRLNKIELNMIDITYREYLENMNDKRP








PHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG









CdiCas9 | Corynebacterium diphtheriae | SEQ ID NO: 9,017 | Nickase (HNH): N597A | Nickase (HNH): H573A | Nickase (RuvC): D8A
MKYHVGIDVGTFSVGLAAIEVDDAGMPIKTLSLVSHIHDSGLDPDEIKSAVT

RLASSGIARRTRRLYRRKRRRLQQLDKFIQRQGWPVIELEDYSDPLYPWKVR








AELAASYIADEKERGEKLSVALRHIARHRGWRNPYAKVSSLYLPDGPSDAFK








AIREEIKRASGQPVPETATVGQMVTLCELGTLKLRGEGGVLSARLQQSDYAR








EIQEICRMQEIGQELYRKIIDVVFAAESPKGSASSRVGKDPLQPGKNRALKAS








DAFQRYRIAALIGNLRVRVDGEKRILSVEEKNLVFDHLVNLTPKKEPEWVTIA








EILGIDRGQLIGTATMTDDGERAGARPPTHDTNRSIVNSRIAPLVDWWKTA








SALEQHAMVKALSNAEVDDFDSPEGAKVQAFFADLDDDVHAKLDSLHLPV








GRAAYSEDTLVRLTRRMLSDGVDLYTARLQEFGIEPSWTPPTPRIGEPVGNP








AVDRVLKTVSRWLESATKTWGAPERVIIEHVREGFVTEKRAREMDGDMRR








RAARNAKLFQEMQEKLNVQGKPSRADLWRYQSVQRQNCQCAYCGSPITF








SNSEMDHIVPRAGQGSTNTRENLVAVCHRCNQSKGNTPFAIWAKNTSIEG








VSVKEAVERTRHWVTDTGMRSTDFKKFTKAVVERFQRATMDEEIDARSME








SVAWMANELRSRVAQHFASHGTTVRVYRGSLTAEARRASGISGKLKFFDGV








GKSRLDRRHHAIDAAVIAFTSDYVAETLAVRSNLKQSQAHRQEAPQWREFT








GKDAEHRAAWRVWCQKMEKLSALLTEDLRDDRVVVMSNVRLRLGNGSA








HKETIGKLSKVKLSSQLSVSDIDKASSEALWCALTREPGFDPKEGLPANPERHI








RVNGTHVYAGDNIGLFPVSAGSIALRGGYAELGSSFHHARVYKITSGKKPAF








AMLRVYTIDLLPYRNQDLFSVELKPQTMSMRQAEKKLRDALATGNAEYLG








WLVVDDELVVDTSKIATDQVKAVEAELGTIRRWRVDGFFSPSKLRLRPLQM








SKEGIKKESAPELSKIIDRPGWLPAVNKLFSDGNVTVVRRDSLGRVRLESTAH








LPVTWKVQ









CjeCas9 | Campylobacter jejuni | SEQ ID NO: 9,018 | Nickase (HNH): N582A | Nickase (HNH): H559A | Nickase (RuvC): D8A
MARILAFDIGISSIGWAFSENDELKDCGVRIFTKVENPKTGESLALPRRLARSA

RKRLARRKARLNHLKHLIANEFKLNYEDYQSFDESLAKAYKGSLISPYELRFRA








LNELLSKQDFARVILHIAKRRGYDDIKNSDDKEKGAILKAIKQNEEKLANYQS








VGEYLYKEYFQKFKENSKEFTNVRNKKESYERCIAQSFLKDELKLIFKKQREFG








FSFSKKFEEEVLSVAFYKRALKDFSHLVGNCSFFTDEKRAPKNSPLAFMFVAL








TRIINLLNNLKNTEGILYTKDDLNALLNEVLKNGTLTYKQTKKLLGLSDDYEFK








GEKGTYFIEFKKYKEFIKALGEHNLSQDDLNEIAKDITLIKDEIKLKKALAKYDLN








QNQIDSLSKLEFKDHLNISFKALKLVTPLMLEGKKYDEACNELNLKVAINEDK








KDFLPAFNETYYKDEVTNPVVLRAIKEYRKVLNALLKKYGKVHKINIELAREVG








KNHSQRAKIEKEQNENYKAKKDAELECEKLGLKINSKNILKLRLFKEQKEFCAY








SGEKIKISDLQDEKMLEIDHIYPYSRSFDDSYMNKVLVFTKQNQEKLNQTPFE








AFGNDSAKWQKIEVLAKNLPTKKQKRILDKNYKDKEQKNFKDRNLNDTRYI








ARLVLNYTKDYLDFLPLSDDENTKLNDTQKGSKVHVEAKSGMLTSALRHTW








GFSAKDRNNHLHHAIDAVIIAYANNSIVKAFSDFKKEQESNSAELYAKKISELD








YKNKRKFFEPFSGFRQKVLDKIDEIFVSKPERKKPSGALHEETFRKEEEFYQSY








GGKEGVLKALELGKIRKVNGKIVKNGDMFRVDIFKHKKTNKFYAVPIYTMDF








ALKVLPNKAVARSKKGEIKDWILMDENYEFCFSLYKDSLILIQTKDMQEPEFV








YYNAFTSSTVSLIVSKHDNKFETLSKNQKILFKNANEKEVIAKSIGIQNLKVFEK








YIVSALGEVTKAEFRQREDFKK









GeoCas9 | Geobacillus stearothermophilus | SEQ ID NO: 9,019 | Nickase (HNH): N605A | Nickase (HNH): H582A | Nickase (RuvC): D8A
MRYKIGLDIGITSVGWAVMNLDIPRIEDLGVRIFDRAENPQTGESLALPRRLA

RSARRRLRRRKHRLERIRRLVIREGILTKEELDKLFEEKHEIDVWQLRVEALDR

KLNNDELARVLLHLAKRRGFKSNRKSERSNKENSTMLKHIEENRAILSSYRTV








GEMIVKDPKFALHKRNKGENYTNTIARDDLEREIRLIFSKQREFGNMSCTEEF








ENEYITIWASQRPVASKDDIEKKVGFCTFEPKEKRAPKATYTFQSFIAWEHIN








KLRLISPSGARGLTDEERRLLYEQAFQKNKITYHDIRTLLHLPDDTYFKGIVYDR








GESRKQNENIRFLELDAYHQIRKAVDKVYGKGKSSSFLPIDFDTFGYALTLFKD








DADIHSYLRNEYEQNGKRMPNLANKVYDNELIEELLNLSFTKFGHLSLKALRS








ILPYMEQGEVYSSACERAGYTFTGPKKKQKTMLLPNIPPIANPVVMRALTQA








RKVVNAIIKKYGSPVSIHIELARDLSQTFDERRKTKKEQDENRKKNETAIRQL








MEYGLTLNPTGHDIVKFKLWSEQNGRCAYSLQPIEIERLLEPGYVEVDHVIPY








SRSLDDSYTNKVLVLTRENREKGNRIPAEYLGVGTERWQQFETFVLTNKQFS








KKKRDRLLRLHYDENEETEFKNRNLNDTRYISRFFANFIREHLKFAESDDKQK








VYTVNGRVTAHLRSRWEFNKNREESDLHHAVDAVIVACTTPSDIAKVTAFY








QRREQNKELAKKTEPHFPQPWPHFADELRARLSKHPKESIKALNLGNYDDQ








KLESLQPVFVSRMPKRSVTGAAHQETLRRYVGIDERSGKIQTVVKTKLSEIKL








DASGHFPMYGKESDPRTYEAIRQRLLEHNNDPKKAFQEPLYKPKKNGEPGP








VIRTVKIIDTKNQVIPLNDGKTVAYNSNIVRVDVFEKDGKYYCVPVYTMDIM








KGILPNKAIEPNKPYSEWKEMTEDYTFRFSLYPNDLIRIELPREKTVKTAAGEE








INVKDVFVYYKTIDSANGGLELISHDHRFSLRGVGSRTLKRFEKYQVDVLGNI








YKVRGEKRVGLASSAHSKPGKTIRPLQSTRD









iSpyMacCas9 | Streptococcus spp. | SEQ ID NO: 9,020 | Nickase (HNH): N863A | Nickase (HNH): H840A | Nickase (RuvC): D10A
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF
DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRKLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF








YKFIKPILEKMDGTEELLVKLKREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ








EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE








EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR








NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKV








VDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQ








ILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEIQTVGQNGG








LFDDNPKSPLEVTPSKLVPLKKELNPKKYGGYQKPTTAYPVLLITDTKQLIPISV








MNKKQFEQNPVKFLRDRGYQQVGKNDFIKLPKYTLVDIGDGIKRLWASSKEI








HKGNQLVVSKKSQILLYHAHHLDSDLSNDYLQNHNQQFDVLFNEIISFSKKC








KLGKEHIQKIENVYSNKKNSASIEELAESFIKLLGFTQLGATSPFNFLGVKLNQ








KQYKGKKDYILPCTEGTLIRQSITGLYETRVDLSKIGEDSGGSGGSKRTADGSE








FES









NmeCas9 | Neisseria meningitidis | SEQ ID NO: 9,021 | Nickase (HNH): N611A | Nickase (HNH): H588A | Nickase (RuvC): D16A
MAAFKPNSINYILGLDIGIASVGWAMVEIDEEENPIRLIDLGVRVFERAEVPK

TGDSLAMARRLARSVRRLTRRRAHRLLRTRRLLKREGVLQAANFDENGLIKS








LPNTPWQLRAAALDRKLTPLEWSAVLLHLIKHRGYLSQRKNEGETADKELG








ALLKGVAGNAHALQTGDFRTPAELALNKFEKESGHIRNQRSDYSHTFSRKDL








QAELILLFEKQKEFGNPHVSGGLKEGIETLLMTQRPALSGDAVQKMLGHCTF








EPAEPKAAKNTYTAERFIWLTKLNNLRILEQGSERPLTDTERATLMDEPYRKS








KLTYAQARKLLGLEDTAFFKGLRYGKDNAEASTLMEMKAYHAISRALEKEGL








KDKKSPLNLSPELQDEIGTAFSLFKTDEDITGRLKDRIQPEILEALLKHISFDKFV








QISLKALRRIVPLMEQGKRYDEACAEIYGDHYGKKNTEEKIYLPPIPADEIRNP








VVLRALSQARKVINGVVRRYGSPARIHIETAREVGKSFKDRKEIEKRQEENRK








DREKAAAKFREYFPNFVGEPKSKDILKLRLYEQQHGKCLYSGKEINLGRLNEK








GYVEIDHALPFSRTWDDSFNNKVLVLGSENQNKGNQTPYEYFNGKDNSRE








WQEFKARVETSRFPRSKKQRILLQKFDEDGFKERNLNDTRYVNRFLCQFVA








DRMRLTGKGKKRVFASNGQITNLLRGFWGLRKVRAENDRHHALDAVVVA








CSTVAMQQKITRFVRYKEMNAFDGKTIDKETGEVLHQKTHFPQPWEFFAQ








EVMIRVFGKPDGKPEFEEADTLEKLRTLLAEKLSSRPEAVHEYVTPLFVSRAP








NRKMSGQGHMETVKSAKRLDEGVSVLRVPLTQLKLKDLEKMVNREREPKL








YEALKARLEAHKDDPAKAFAEPFYKYDKAGNRTQQVKAVRVEQVQKTGVW








VRNHNGIADNATMVRVDVFEKGDKYYLVPIYSWQVAKGILPDRAVVQGKD








EEDWQLIDDSFNFKFSLHPNDLVEVITKKARMFGYFASCHRGTGNINIRIHD








LDHKIGKNGILEGIGVKTALSFQKYQIDELGKEIRPCRLKKRPPVR









ScaCas9 | Streptococcus canis | SEQ ID NO: 9,022 | Nickase (HNH): N872A | Nickase (HNH): H849A | Nickase (RuvC): D10A
MEKKYSIGLDIGTNSVGWAVITDDYKVPSKKFKVLGNTNRKSIKKNLMGALL

FDSGETAEATRLKRTARRRYTRRKNRIRYLQEIFANEMAKLDDSFFQRLEESF








LVEEDKKNERHPIFGNLADEVAYHRNYPTIYHLRKKLADSPEKADLRLIYLALA








HIIKFRGHFLIEGKLNAENSDVAKLFYQLIQTYNQLFEESPLDEIEVDAKGILSA








RLSKSKRLEKLIAVFPNEKKNGLFGNIIALALGLTPNFKSNFDLTEDAKLQLSKD








TYDDDLDELLGQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMV








KRYDEHHQDLALLKTLVRQQFPEKYAEIFKDDTKNGYAGYVGIGIKHRKRTT








KLATQEEFYKFIKPILEKMDGAEELLAKLNRDDLLRKQRTFDNGSIPHQIHLKE








LHAILRRQEEFYPFLKENREKIEKILTFRIPYYVGPLARGNSRFAWLTRKSEEAI








TPWNFEEVVDKGASAQSFIERMTNFDEQLPNKKVLPKHSLLYEYFTVYNELT








KVKYVTERMRKPEFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSV








EIIGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEE








RLKTYAHLFDDKVMKQLKRRHYTGWGRLSRKMINGIRDKQSGKTILDFLKS








DGFSNRNFMQLIHDDSLTFKEEIEKAQVSGQGDSLHEQIADLAGSPAIKKGIL








QTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQQSRERKKRIEEGIKELE








SQILKENPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVP








QSFIKDDSIDNKVLTRSVENRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQ








RKFDNLTKAERGGLSEADKAGFIKRQLVETRQITKHVARILDSRMNTKRDKN








DKPIREVKVITLKSKLVSDFRKDFQLYKVRDINNYHHAHDAYLNAVVGTALIK








KYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSNIMNFFKTEVKL








ANGEIRKRPLIETNGETGEVVWNKEKDFATVRKVLAMPQVNIVKKTEVQTG








GFSKESILSKRESAKLIPRKKGWDTRKYGGFGSPTVAYSILVVAKVEKGKAKKL








KSVKVLVGITIMEKGSYEKDPIGFLEAKGYKDIKKELIFKLPKYSLFELENGRRR








MLASATELQKANELVLPQHLVRLLYYTQNISATTGSNNLGYIEQHREEFKEIF








EKIIDFSEKYILKNKVNSNLKSSFDEQFAVSDSILLSNSFVSLLKYTSFGASGGFT








FLDLDVKQGRLRYQTVTEVLDATLIYQSITGLYETRTDLSQLGGD









ScaCas9-HiFi-Sc++ | Streptococcus canis | SEQ ID NO: 9,023 | Nickase (HNH): N872A | Nickase (HNH): H849A | Nickase (RuvC): D10A
MEKKYSIGLDIGTNSVGWAVITDDYKVPSKKFKVLGNTNRKSIKKNLMGALL

FDSGETAEATRLKRTARRRYTRRKNRIRYLQEIFANEMAKLDDSFFQRLEESF








LVEEDKKNERHPIFGNLADEVAYHRNYPTIYHLRKKLADSPEKADLRLIYLALA








HIIKFRGHFLIEGKLNAENSDVAKLFYQLIQTYNQLFEESPLDEIEVDAKGILSA








RLSKSKRLEKLIAVFPNEKKNGLFGNIIALALGLTPNFKSNFDLTEDAKLQLSKD








TYDDDLDELLGQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMV








KRYDEHHQDLALLKTLVRQQFPEKYAEIFKDDTKNGYAGYVGADKKLRKRS








GKLATEEEFYKFIKPILEKMDGAEELLAKLNRDDLLRKQRTFDNGSIPHQIHLK








ELHAILRRQEEFYPFLKENREKIEKILTFRIPYYVGPLARGNSRFAWLTRKSEEA








ITPWNFEEVVDKGASAQSFIERMTNFDEQLPNKKVLPKHSLLYEYFTVYNEL








TKVKYVTERMRKPEFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDS








VEIIGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIE








ERLKTYAHLFDDKVMKQLKRRHYTGWGRLSRKMINGIRDKQSGKTILDFLKS








DGFSNANFMQLIHDDSLTFKEEIEKAQVSGQGDSLHEQIADLAGSPAIKKGIL








QTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQQSRERKKRIEEGIKELE








SQILKENPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVP








QSFIKDDSIDNKVLTRSVENRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQ








RKFDNLTKAERGGLSEADKAGFIKRQLVETRQITKHVARILDSRMNTKRDKN








DKPIREVKVITLKSKLVSDFRKDFQLYKVRDINNYHHAHDAYLNAVVGTALIK








KYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSNIMNFFKTEVKL








ANGEIRKRPLIETNGETGEVVWNKEKDFATVRKVLAMPQVNIVKKTEVQTG








GFSKESILSKRESAKLIPRKKGWDTRKYGGFGSPTVAYSILVVAKVEKGKAKKL








KSVKVLVGITIMEKGSYEKDPIGFLEAKGYKDIKKELIFKLPKYSLFELENGRRR








MLASAKELQKANELVLPQHLVRLLYYTQNISATTGSNNLGYIEQHREEFKEIF








EKIIDFSEKYILKNKVNSNLKSSFDEQFAVSDSILLSNSFVSLLKYTSFGASGGFT








FLDLDVKQGRLRYQTVTEVLDATLIYQSITGLYETRTDLSQLGGD









SpyCas9-3var-NRRH | Streptococcus pyogenes | SEQ ID NO: 9,024 | Nickase (HNH): N863A | Nickase (HNH): H840A | Nickase (RuvC): D10A
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MVKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEE








FYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGIIPHQIHLGELHAILRRQ








GDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE








EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRLRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRN








FMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV








DELVKVMGGHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI








LKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








ILPKGNSDKLIARKKDWDPKKYGGFNSPTAAYSVLVVAKVEKGKSKKLKSVK








ELLGITIMERSSFEKNPIGFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLAS








AGVLHKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDE








IIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGVPAA








FKYFDTTIDKKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD









SpyCas9-3var-NRTH | Streptococcus pyogenes | 9,025 | N863A | H840A | D10A

MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MVKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEE








FYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGIIPHQIHLGELHAILRRQ








GDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE








EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRLRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRN








FMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV








DELVKVMGGHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI








LKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








ILPKGNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKSKKLKSVK








ELLGITIMERSSFEKNPIGFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLAS








ASVLHKGNELALPSKYVNFLYLASHYEKLKGSSEDNKQKQLFVEQHKHYLDEI








IEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGASAAF








KYFDTTIGRKLYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD









SpyCas9-3var-NRCH | Streptococcus pyogenes | 9,026 | N863A | H840A | D10A

MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MVKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEE








FYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGIIPHQIHLGELHAILRRQ








GDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE








EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRLRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRN








FMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV








DELVKVMGGHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI








LKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








ILPKGNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKSKKLKSVK








ELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLAS








AGVLQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDE








IIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAA








FKYFDTTINRKQYNTTKEVLDATLIRQSITGLYETRIDLSQLGGD









SpyCas9-HF1 | Streptococcus pyogenes | 9,027 | N863A | H840A | D10A

MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF








YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ








EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE








EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR








NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKV








VDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQ








ILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








ILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE








LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASA








GELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEII








EQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAF








KYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD









SpyCas9-QQR1 | Streptococcus pyogenes | 9,028 | N863A | H840A | D10A

MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF








YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ








EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE








EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR








NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKV








VDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQ








ILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








ILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE








LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASA








RELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEII








EQISEFSKRVILADAQLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAF








KYFDTTFKQKQYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD









SpyCas9-SpG | Streptococcus pyogenes | 9,029 | N863A | H840A | D10A

MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF








YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ








EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE








EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR








NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKV








VDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQ








ILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








ILPKRNSDKLIARKKDWDPKKYGGFLWPTVAYSVLVVAKVEKGKSKKLKSVK








ELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLAS








AKQLQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDE








IIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAA








FKYFDTTIDRKQYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD









SpyCas9-VQR | Streptococcus pyogenes | 9,030 | N863A | H840A | D10A

MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF








YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ








EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE








EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR








NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKV








VDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQ








ILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








ILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKE








LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASA








GELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEII








EQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAF








KYFDTTIDRKQYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD









SpyCas9-VRER | Streptococcus pyogenes | 9,031 | N863A | H840A | D10A

MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF








YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ








EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE








EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR








NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKV








VDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQ








ILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








ILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKE








LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASA








RELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEII








EQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAF








KYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD









SpyCas9-xCas | Streptococcus pyogenes | 9,032 | N863A | H840A | D10A

MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDTKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MIKLYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF








YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGIIPHQIHLGELHAILRRQE








DFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEK








VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGDQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR








NFIQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV








DELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI








LKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








ILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE








LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASA








GVLQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEII








EQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAF








KYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD









SpyCas9-xCas-NG | Streptococcus pyogenes | 9,033 | N863A | H840A | D10A

MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF

DSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL








VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH








MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS








ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDTKLQLS








KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS








MIKLYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF








YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGIIPHQIHLGELHAILRRQE








DFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEK








VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE








GMRKPAFLSGDQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED








RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA








HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR








NFIQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV








DELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI








LKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF








LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF








DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLI








REVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK








LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEI








RKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES








IRPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKE








LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASA








RFLQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEII








EQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPRAF








KYFDTTIDRKVYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD









St1Cas9-CNRZ1066 | Streptococcus thermophilus | 9,034 | N622A | H599A | D9A

MSDLVLGLDIGIGSVGVGILNKVTGEIIHKNSRIFPAAQAENNLVRRTNRQG

RRLARRKKHRRVRLNRLFEESGLITDFTKISINLNPYQLRVKGLTDELSNEELFI








ALKNMVKHRGISYLDDASDDGNSSVGDYAQIVKENSKQLETKTPGQIQLER








YQTYGQLRGDFTVEKDGKKHRLINVFPTSAYRSEALRILQTQQEFNPQITDEF








INRYLEILTGKRKYYHGPGNEKSRTDYGRYRTSGETLDNIFGILIGKCTFYPDEF








RAAKASYTAQEFNLLNDLNNLTVPTETKKLSKEQKNQIINYVKNEKAMGPAK








LFKYIAKLLSCDVADIKGYRIDKSGKAEIHTFEAYRKMKTLETLDIEQMDRETL








DKLAYVLTLNTEREGIQEALEHEFADGSFSQKQVDELVQFRKANSSIFGKGW








HNFSVKLMMELIPELYETSEEQMTILTRLGKQKTTSSSNKTKYIDEKLLTEEIY








NPVVAKSVRQAIKIVNAAIKEYGDFDNIVIEMARETNEDDEKKAIQKIQKAN








KDEKDAAMLKAANQYNGKAELPHSVFHGHKQLATKIRLWHQQGERCLYT








GKTISIHDLINNSNQFEVDHILPLSITFDDSLANKVLVYATANQEKGQRTPYQ








ALDSMDDAWSFRELKAFVRESKTLSNKKKEYLLTEEDISKFDVRKKFIERNLV








DTRYASRVVLNALQEHFRAHKIDTKVSVVRGQFTSQLRRHWGIEKTRDTYH








HHAVDALIIAASSQLNLWKKQKNTLVSYSEEQLLDIETGELISDDEYKESVFKA








PYQHFVDTLKSKEFEDSILFSYQVDSKFNRKISDATIYATRQAKVGKDKKDET








YVLGKIKDIYTQDGYDAFMKIYKKDKSKFLMYRHDPQTFEKVIEPILENYPNK








QMNEKGKEVPCNPFLKYKEEHGYIRKYSKKGNGPEIKSLKYYDSKLLGNPIDI








TPENSKNKVVLQSLKPWRTDVYFNKATGKYEILGLKYADLQFEKGTGTYKIS








QEKYNDIKKKEGVDSDSEFKFTLYKNDLLLVKDTETKEQQLFRFLSRTLPKQK








HYVELKPYDKQKFEGGEALIKVLGNVANGGQCIKGLAKSNISIYKVRTDVLG








NQHIIKNEGDKPKLDF









St1Cas9-LMG1831 | Streptococcus thermophilus | 9,035 | N622A | H599A | D9A

MSDLVLGLDIGIGSVGVGILNKVTGEIIHKNSRIFPAAQAENNLVRRTNRQG

RRLARRKKHRRVRLNRLFEESGLITDFTKISINLNPYQLRVKGLTDELSNEELFI








ALKNMVKHRGISYLDDASDDGNSSVGDYAQIVKENSKQLETKTPGQIQLER








YQTYGQLRGDFTVEKDGKKHRLINVFPTSAYRSEALRILQTQQEFNPQITDEF








INRYLEILTGKRKYYHGPGNEKSRTDYGRYRTSGETLDNIFGILIGKCTFYPDEF








RAAKASYTAQEFNLLNDLNNLTVPTETKKLSKEQKNQIINYVKNEKAMGPAK








LFKYIAKLLSCDVADIKGYRIDKSGKAEIHTFEAYRKMKTLETLDIEQMDRETL








DKLAYVLTLNTEREGIQEALEHEFADGSFSQKQVDELVQFRKANSSIFGKGW








HNFSVKLMMELIPELYETSEEQMTILTRLGKQKTTSSSNKTKYIDEKLLTEEIY








NPVVAKSVRQAIKIVNAAIKEYGDFDNIVIEMARETNEDDEKKAIQKIQKAN








KDEKDAAMLKAANQYNGKAELPHSVFHGHKQLATKIRLWHQQGERCLYT








GKTISIHDLINNSNQFEVDHILPLSITFDDSLANKVLVYATANQEKGQRTPYQ








ALDSMDDAWSFRELKAFVRESKTLSNKKKEYLLTEEDISKFDVRKKFIERNLV








DTRYASRVVLNALQEHFRAHKIDTKVSVVRGQFTSQLRRHWGIEKTRDTYH








HHAVDALIIAASSQLNLWKKQKNTLVSYSEEQLLDIETGELISDDEYKESVFKA








PYQHFVDTLKSKEFEDSILFSYQVDSKFNRKISDATIYATRQAKVGKDKKDET








YVLGKIKDIYTQDGYDAFMKIYKKDKSKFLMYRHDPQTFEKVIEPILENYPNK








QMNEKGKEVPCNPFLKYKEEHGYIRKYSKKGNGPEIKSLKYYDSKLLGNPIDI








TPENSKNKVVLQSLKPWRTDVYFNKNTGKYEILGLKYADLQFEKKTGTYKISQ








EKYNGIMKEEGVDSDSEFKFTLYKNDLLLVKDTETKEQQLFRFLSRTMPNVK








YYVELKPYSKDKFEKNESLIEILGSADKSGRCIKGLGKSNISIYKVRTDVLGNQH








IIKNEGDKPKLDF









St1Cas9-MTH17CL396 | Streptococcus thermophilus | 9,036 | N622A | H599A | D9A

MSDLVLGLDIGIGSVGVGILNKVTGEIIHKNSRIFPAAQAENNLVRRTNRQG

RRLARRKKHRRVRLNRLFEESGLITDFTKISINLNPYQLRVKGLTDELSNEELFI








ALKNMVKHRGISYLDDASDDGNSSVGDYAQIVKENSKQLETKTPGQIQLER








YQTYGQLRGDFTVEKDGKKHRLINVFPTSAYRSEALRILQTQQEFNPQITDEF








INRYLEILTGKRKYYHGPGNEKSRTDYGRYRTSGETLDNIFGILIGKCTFYPDEF








RAAKASYTAQEFNLLNDLNNLTVPTETKKLSKEQKNQIINYVKNEKAMGPAK








LFKYIAKLLSCDVADIKGYRIDKSGKAEIHTFEAYRKMKTLETLDIEQMDRETL








DKLAYVLTLNTEREGIQEALEHEFADGSFSQKQVDELVQFRKANSSIFGKGW








HNFSVKLMMELIPELYETSEEQMTILTRLGKQKTTSSSNKTKYIDEKLLTEEIY








NPVVAKSVRQAIKIVNAAIKEYGDFDNIVIEMARETNEDDEKKAIQKIQKAN








KDEKDAAMLKAANQYNGKAELPHSVFHGHKQLATKIRLWHQQGERCLYT








GKTISIHDLINNSNQFEVDHILPLSITFDDSLANKVLVYATANQEKGQRTPYQ








ALDSMDDAWSFRELKAFVRESKTLSNKKKEYLLTEEDISKFDVRKKFIERNLV








DTRYASRVVLNALQEHFRAHKIDTKVSVVRGQFTSQLRRHWGIEKTRDTYH








HHAVDALIIAASSQLNLWKKQKNTLVSYSEDQLLDIETGELISDDEYKESVFK








APYQHFVDTLKSKEFEDSILFSYQVDSKFNRKISDATIYATRQAKVGKDKADE








TYVLGKIKDIYTQDGYDAFMKIYKKDKSKFLMYRHDPQTFEKVIEPILENYPN








KQINEKGKEVPCNPFLKYKEEHGYIRKYSKKGNGPEIKSLKYYDSKLGNHIDIT








PKDSNNKVVLQSLKPWRTDVYFNKNTGKYEILGLKYSDMQFEKGTGKYSISK








EQYENIKVREGVDENSEFKFTLYKNDLLLLKDSENGEQILLRFTSRNDTSKHYV








ELKPYNRQKFEGSEYLIKSLGTVAKGGQCIKGLGKSNISIYKVRTDVLGNQHII








KNEGDKPKLDF









St1Cas9-TH1477 | Streptococcus thermophilus | 9,037 | N622A | H599A | D9A

MSDLVLGLDIGIGSVGVGILNKVTGEIIHKNSRIFPAAQAENNLVRRTNRQG

RRLARRKKHRRVRLNRLFEESGLITDFTKISINLNPYQLRVKGLTDELSNEELFI








ALKNMVKHRGISYLDDASDDGNSSVGDYAQIVKENSKQLETKTPGQIQLER








YQTYGQLRGDFTVEKDGKKHRLINVFPTSAYRSEALRILQTQQEFNPQITDEF








INRYLEILTGKRKYYHGPGNEKSRTDYGRYRTSGETLDNIFGILIGKCTFYPDEF








RAAKASYTAQEFNLLNDLNNLTVPTETKKLSKEQKNQIINYVKNEKAMGPAK








LFKYIAKLLSCDVADIKGYRIDKSGKAEIHTFEAYRKMKTLETLDIEQMDRETL








DKLAYVLTLNTEREGIQEALEHEFADGSFSQKQVDELVQFRKANSSIFGKGW








HNFSVKLMMELIPELYETSEEQMTILTRLGKQKTTSSSNKTKYIDEKLLTEEIY








NPVVAKSVRQAIKIVNAAIKEYGDFDNIVIEMARETNEDDEKKAIQKIQKAN








KDEKDAAMLKAANQYNGKAELPHSVFHGHKQLATKIRLWHQQGERCLYT








GKTISIHDLINNSNQFEVDHILPLSITFDDSLANKVLVYATANQEKGQRTPYQ








ALDSMDDAWSFRELKAFVRESKTLSNKKKEYLLTEEDISKFDVRKKFIERNLV








DTRYASRVVLNALQEHFRAHKIDTKVSVVRGQFTSQLRRHWGIEKTRDTYH








HHAVDALIIAASSQLNLWKKQKNTLVSYSEDQLLDIETGELISDDEYKESVFK








APYQHFVDTLKSKEFEDSILFSYQVDSKFNRKISDATIYATRQAKVGKDKADE








TYVLGKIKDIYTQDGYDAFMKIYKKDKSKFLMYRHDPQTFEKVIEPILENYPN








KQINEKGKEVPCNPFLKYKEEHGYIRKYSKKGNGPEIKSLKYYDSKLGNHIDIT








PKDSNNKVVLQSLKPWRTDVYFNKNTGKYEILGLKYSDMQFEKGTGKYSISK








EQYENIKVREGVDENSEFKFTLYKNDLLLLKDSENGEQILLRFTSRNDTSKHYV








ELKPYNRQKFEGSEYLIKSLGTVVKGGRCIKGLGKSNISIYKVRTDVLGNQHIIK








NEGDKPKLDF









SRGN3.1 | Staphylococcus spp. | 9,038 | N585A | H562A | D10A

MNQKFILGLDIGITSVGYGLIDYETKNIIDAGVRLFPEANVENNEGRRSKRGS

RRLKRRRIHRLERVKLLLTEYDLINKEQIPTSNNPYQIRVKGLSEILSKDELAIAL








LHLAKRRGIHNVDVAADKEETASDSLSTKDQINKNAKFLESRYVCELQKERLE








NEGHVRGVENRFLTKDIVREAKKIIDTQMQYYPEIDETFKEKYISLVETRREYF








EGPGQGSPFGWNGDLKKWYEMLMGHCTYFPQELRSVKYAYSADLFNALN








DLNNLIIQRDNSEKLEYHEKYHIIENVFKQKKKPTLKQIAKEIGVNPEDIKGYRI








TKSGTPEFTSFKLFHDLKKVVKDHAILDDIDLLNQIAEILTIYQDKDSIVAELGQ








LEYLMSEADKQSISELTGYTGTHSLSLKCMNMIIDELWHSSMNQMEVFTYL








NMRPKKYELKGYQRIPTDMIDDAILSPVVKRTFIQSINVINKVIEKYGIPEDIIIE








LARENNSDDRKKFINNLQKKNEATRKRINEIIGQTGNQNAKRIVEKIRLHDQ








QEGKCLYSLESIPLEDLLNNPNHYEVDHIIPRSVSFDNSYHNKVLVKQSENSK








KSNLTPYQYFNSGKSKLSYNQFKQHILNLSKSQDRISKKKKEYLLEERDINKFE








VQKEFINRNLVDTRYATRELTNYLKAYFSANNMNVKVKTINGSFTDYLRKV








WKFKKERNHGYKHHAEDALIIANADFLFKENKKLKAVNSVLEKPEIETKQLDI








QVDSEDNYSEMFIIPKQVQDIKDFRNFKYSHRVDKKPNRQLINDTLYSTRKK








DNSTYIVQTIKDIYAKDNTTLKKQFDKSPEKFLMYQHDPRTFEKLEVIMKQYA








NEKNPLAKYHEETGEYLTKYSKKNNGPIVKSLKYIGNKLGSHLDVTHQFKSST








KKLVKLSIKNYRFDVYLTEKGYKFVTIAYLNVFKKDNYYYIPKDKYQELKEKKKI








KDTDQFIASFYKNDLIKLNGDLYKIIGVNSDDRNIIELDYYDIKYKDYCEINNIK








GEPRIKKTIGKKTESIEKFTTDVLGNLYLHSTEKAPQLIFKRGL









SRGN3.3 | Staphylococcus spp. | 9,039 | N585A | H562A | D10A

MNQKFILGLDIGITSVGYGLIDYETKNIIDAGVRLFPEANVENNEGRRSKRGS

RRLKRRRIHRLERVKLLLTEYDLINKEQIPTSNNPYQIRVKGLSEILSKDELAIAL








LHLAKRRGIHNVDVAADKEETASDSLSTKDQINKNAKFLESRYVCELQKERLE








NEGHVRGVENRFLTKDIVREAKKIIDTQMQYYPEIDETFKEKYISLVETRREYF








EGPGQGSPFGWNGDLKKWYEMLMGHCTYFPQELRSVKYAYSADLFNALN








DLNNLIIQRDNSEKLEYHEKYHIIENVFKQKKKPTLKQIAKEIGVNPEDIKGYRI








TKSGTPEFTSFKLFHDLKKVVKDHAILDDIDLLNQIAEILTIYQDKDSIVAELGQ








LEYLMSEADKQSISELTGYTGTHSLSLKCMNMIIDELWHSSMNQMEVFTYL








NMRPKKYELKGYQRIPTDMIDDAILSPVVKRTFIQSINVINKVIEKYGIPEDIIIE








LARENNSDDRKKFINNLQKKNEATRKRINEIIGQTGNQNAKRIVEKIRLHDQ








QEGKCLYSLESIPLEDLLNNPNHYEVDHIIPRSVSFDNSYHNKVLVKQSENSK








KSNLTPYQYFNSGKSKLSYNQFKQHILNLSKSQDRISKKKKEYLLEERDINKFE








VQKEFINRNLVDTRYATRELTSYLKAYFSANNMDVKVKTINGSFTNHLRKV








WRFDKYRNHGYKHHAEDALIIANADFLFKENKKLQNTNKILEKPTIENNTKK








VTVEKEEDYNNVFETPKLVEDIKQYRDYKFSHRVDKKPNRQLINDTLYSTRM








KDEHDYIVQTITDIYGKDNTNLKKQFNKNPEKFLMYQNDPKTFEKLSIIMKQ








YSDEKNPLAKYYEETGEYLTKYSKKNNGPIVKKIKLLGNKVGNHLDVTNKYEN








STKKLVKLSIKNYRFDVYLTEKGYKFVTIAYLNVFKKDNYYYIPKDKYQELKEKK








KIKDTDQFIASFYKNDLIKLNGDLYKIIGVNSDDRNIIELDYYDIKYKDYCEINNI








KGEPRIKKTIGKKTESIEKFTTDVLGNLYLHSTEKAPQLIFKRGL









In some embodiments, a Cas protein requires a protospacer adjacent motif (PAM) to be present in or adjacent to a target DNA sequence for the Cas protein to bind and/or function. In some embodiments, the PAM is or comprises, from 5′ to 3′, NGG, YG, NNGRRT, NNNRRT, NGA, TYCV, TATV, NTTN, or NNNGATT, where N stands for any nucleotide, Y stands for C or T, R stands for A or G, and V stands for A or C or G. In some embodiments, a Cas protein is a protein listed in Table 7 or 8. In some embodiments, a Cas protein comprises one or more mutations altering its PAM. In some embodiments, a Cas protein comprises E1369R, E1449H, and R1556A mutations or analogous substitutions to the amino acids corresponding to said positions. In some embodiments, a Cas protein comprises E782K, N968K, and R1015H mutations or analogous substitutions to the amino acids corresponding to said positions. In some embodiments, a Cas protein comprises D1135V, R1335Q, and T1337R mutations or analogous substitutions to the amino acids corresponding to said positions. In some embodiments, a Cas protein comprises S542R and K607R mutations or analogous substitutions to the amino acids corresponding to said positions. In some embodiments, a Cas protein comprises S542R, K548V, and N552R mutations or analogous substitutions to the amino acids corresponding to said positions. Exemplary advances in the engineering of Cas enzymes to recognize altered PAM sequences are reviewed in Collias et al., Nature Communications 12:555 (2021), which is incorporated herein by reference in its entirety.
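
By way of non-limiting illustration, the degenerate PAM notation above (N, Y, R, and V) may be expanded into a pattern and scanned against a DNA strand read 5′ to 3′. The following is a minimal sketch only; the function names `pam_to_regex` and `find_pam_sites` and the short test strands are hypothetical and are not part of the disclosure.

```python
import re
from typing import List

# Degenerate nucleotide codes as defined in the paragraph above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # any nucleotide
    "Y": "[CT]",    # C or T
    "R": "[AG]",    # A or G
    "V": "[ACG]",   # A or C or G
}


def pam_to_regex(pam: str) -> "re.Pattern":
    """Translate a degenerate PAM such as 'NNGRRT' into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in pam.upper()))


def find_pam_sites(sequence: str, pam: str) -> List[int]:
    """Return the 0-based start index of every PAM match, including overlaps."""
    pattern = pam_to_regex(pam)
    width = len(pam)
    seq = sequence.upper()
    # Test each window explicitly so that overlapping sites are all reported.
    return [
        i for i in range(len(seq) - width + 1)
        if pattern.fullmatch(seq, i, i + width)
    ]


# Hypothetical strand, for illustration only.
ngg_sites = find_pam_sites("ACGTGGATTAGGA", "NGG")
```

The sketch scans a single strand only; a practical search would typically also scan the reverse complement and account for where the PAM sits relative to the protospacer for the particular Cas protein.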


In some embodiments, the Cas protein is catalytically active and cuts one or both strands of the target DNA site. In some embodiments, cutting the target DNA site is followed by formation of an alteration, e.g., an insertion or deletion, e.g., by the cellular repair machinery.


In some embodiments, the Cas protein is modified to deactivate or partially deactivate the nuclease, e.g., nuclease-deficient Cas9. Whereas wild-type Cas9 generates double-strand breaks (DSBs) at specific DNA sequences targeted by a gRNA, a number of CRISPR endonucleases having modified functionalities are available, for example: a “nickase” version of Cas9 that has been partially deactivated generates only a single-strand break; a catalytically inactive Cas9 (“dCas9”) does not cut target DNA. In some embodiments, dCas9 binding to a DNA sequence may interfere with transcription at that site by steric hindrance. In some embodiments, dCas9 binding to an anchor sequence may interfere with (e.g., decrease or prevent) genomic complex (e.g., ASMC) formation and/or maintenance. In some embodiments, a DNA-binding domain comprises a catalytically inactive Cas9, e.g., dCas9. Many catalytically inactive Cas9 proteins are known in the art. In some embodiments, dCas9 comprises mutations in each endonuclease domain of the Cas protein, e.g., D10A and H840A or N863A mutations. In some embodiments, a catalytically inactive or partially inactive CRISPR/Cas domain comprises a Cas protein comprising one or more mutations, e.g., one or more of the mutations listed in Table 7. In some embodiments, a Cas protein described on a given row of Table 7 comprises one, two, three, or all of the mutations listed in the same row of Table 7. In some embodiments, a Cas protein, e.g., not described in Table 7, comprises one, two, three, or all of the mutations listed in a row of Table 7 or a corresponding mutation at a corresponding site in that Cas protein.
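
By way of non-limiting illustration, substitutions written in the form used above (e.g., D10A, meaning aspartate at position 10 replaced by alanine) may be applied to a protein sequence as sketched below. The helper names `parse_substitution` and `apply_substitutions` are hypothetical. The example input is only the first 53 residues of the SpyCas9 sequences tabulated above, so positions beyond that fragment (e.g., H840 or N863) are deliberately rejected rather than guessed.

```python
import re
from typing import Iterable, Tuple

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Tuple[str, int, str]:
    """Split a label such as 'D10A' into (original residue, 1-based position, new residue)."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(protein: str, labels: Iterable[str]) -> str:
    """Return a copy of `protein` carrying every substitution in `labels`."""
    residues = list(protein)
    for label in labels:
        original, position, replacement = parse_substitution(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position lies outside the sequence")
        # Refuse to edit if the sequence does not carry the expected residue,
        # which usually means the numbering does not apply to this protein.
        if residues[position - 1] != original:
            raise ValueError(
                f"{label}: found {residues[position - 1]} at position {position}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# N-terminal fragment of the SpyCas9 variants tabulated above (illustration only).
FRAGMENT = "MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLF"
d10a_fragment = apply_substitutions(FRAGMENT, ["D10A"])
```

Checking the original residue before editing is a design choice of this sketch: it guards against applying one protein's position numbering to a different Cas protein, for which a corresponding position would first have to be identified.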


In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a D11 mutation (e.g., D11A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a H969 mutation (e.g., H969A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a N995 mutation (e.g., N995A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, comprises mutations at one, two, or three of positions D11, H969, and N995 (e.g., D11A, H969A, and N995A mutations) or analogous substitutions to the amino acids corresponding to said positions.


In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a D10 mutation (e.g., a D10A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a H557 mutation (e.g., a H557A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, comprises a D10 mutation (e.g., a D10A mutation) and a H557 mutation (e.g., a H557A mutation) or analogous substitutions to the amino acids corresponding to said positions.


In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a D839 mutation (e.g., a D839A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a H840 mutation (e.g., a H840A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a N863 mutation (e.g., a N863A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, comprises a D10 mutation (e.g., D10A), a D839 mutation (e.g., D839A), a H840 mutation (e.g., H840A), and a N863 mutation (e.g., N863A) or analogous substitutions to the amino acids corresponding to said positions.
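
By way of non-limiting illustration, an amino acid "corresponding to" a given position in another Cas protein may be located from a pairwise alignment of the two sequences. The sketch below maps a 1-based position in a gapped reference sequence to the matching position in a gapped query sequence. The function name `corresponding_position` is hypothetical, and the short alignment shown was written by hand for illustration from the first residues of the SpyCas9 and St1Cas9 sequences tabulated above; it is not the output of an alignment program.

```python
from typing import Optional


def corresponding_position(ref_aligned: str, query_aligned: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based reference position to the 1-based query position.

    Both inputs are rows of the same pairwise alignment, with '-' marking gaps.
    Returns None when the reference residue is aligned to a gap in the query.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    ref_count = 0
    query_count = 0
    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_char != "-":
            ref_count += 1
        if query_char != "-":
            query_count += 1
        if ref_char != "-" and ref_count == ref_pos:
            return None if query_char == "-" else query_count
    raise ValueError("ref_pos lies outside the reference sequence")


# Hand-made alignment of the two N-termini (an assumption, for illustration only).
SPY_ALIGNED = "MDKKYSIGLDIG"
ST1_ALIGNED = "M-SDLVLGLDIG"
st1_position = corresponding_position(SPY_ALIGNED, ST1_ALIGNED, 10)
```

Under this illustrative alignment, D10 of the SpyCas9 fragment falls on position 9 of the St1Cas9 fragment, which agrees with the D9A entry listed for the St1Cas9 variants above.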


In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a E993 mutation (e.g., a E993A mutation) or an analogous substitution to the amino acid corresponding to said position.


In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a D917 mutation (e.g., a D917A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a E1006 mutation (e.g., a E1006A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a D1255 mutation (e.g., a D1255A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, comprises a D917 mutation (e.g., D917A), a E1006 mutation (e.g., E1006A), and a D1255 mutation (e.g., D1255A) or analogous substitutions to the amino acids corresponding to said positions.


In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a D16 mutation (e.g., a D16A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a D587 mutation (e.g., a D587A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a partially deactivated Cas domain has nickase activity. In some embodiments, a partially deactivated Cas9 domain is a Cas9 nickase domain. In some embodiments, the catalytically inactive Cas domain or dead Cas domain produces no detectable double strand break formation. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a H588 mutation (e.g., a H588A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, or partially deactivated Cas9 protein comprises a N611 mutation (e.g., a N611A mutation) or an analogous substitution to the amino acid corresponding to said position. In some embodiments, a catalytically inactive Cas9 protein, e.g., dCas9, comprises a D16 mutation (e.g., D16A), a D587 mutation (e.g., D587A), a H588 mutation (e.g., H588A), and a N611 mutation (e.g., N611A) or analogous substitutions to the amino acids corresponding to said positions.


In some embodiments, a DNA-binding domain or endonuclease domain may comprise a Cas molecule comprising or linked (e.g., covalently) to a gRNA (e.g., a template nucleic acid, e.g., template RNA, comprising a gRNA).


In some embodiments, an endonuclease domain or DNA binding domain comprises a Streptococcus pyogenes Cas9 (SpCas9) or a functional fragment or variant thereof. In some embodiments, the endonuclease domain or DNA binding domain comprises a modified SpCas9. In embodiments, the modified SpCas9 comprises a modification that alters protospacer-adjacent motif (PAM) specificity. In embodiments, the PAM has specificity for the nucleic acid sequence 5′-NGT-3′. In embodiments, the modified SpCas9 comprises one or more amino acid substitutions, e.g., at one or more of positions L1111, D1135, G1218, E1219, A1322, or R1335, e.g., selected from L1111R, D1135V, G1218R, E1219F, A1322R, and R1335V. In embodiments, the modified SpCas9 comprises the amino acid substitution T1337R and one or more additional amino acid substitutions, e.g., selected from L1111, D1135L, S1136R, G1218S, E1219V, D1332A, D1332S, D1332T, D1332V, D1332L, D1332K, D1332R, R1335Q, T1337, T1337L, T1337I, T1337V, T1337F, T1337S, T1337N, T1337K, T1337H, T1337Q, and T1337M, or corresponding amino acid substitutions thereto. In embodiments, the modified SpCas9 comprises: (i) one or more amino acid substitutions selected from D1135L, S1136R, G1218S, E1219V, A1322R, R1335Q, and T1337; and (ii) one or more amino acid substitutions selected from L1111R, G1218R, E1219F, D1332A, D1332S, D1332T, D1332V, D1332L, D1332K, D1332R, T1337L, T1337I, T1337V, T1337F, T1337S, T1337N, T1337K, T1337R, T1337H, T1337Q, and T1337M, or corresponding amino acid substitutions thereto.


In some embodiments, the endonuclease domain or DNA binding domain comprises a Cas domain, e.g., a Cas9 domain. In embodiments, the endonuclease domain or DNA binding domain comprises a nuclease-active Cas domain, a Cas nickase (nCas) domain, or a nuclease-inactive Cas (dCas) domain. In embodiments, the endonuclease domain or DNA binding domain comprises a nuclease-active Cas9 domain, a Cas9 nickase (nCas9) domain, or a nuclease-inactive Cas9 (dCas9) domain. In some embodiments, the endonuclease domain or DNA binding domain comprises a Cas9 domain of Cas9 (e.g., dCas9 and nCas9), Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i. In some embodiments, the endonuclease domain or DNA binding domain comprises a Cas9 (e.g., dCas9 and nCas9), Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i. In some embodiments, the endonuclease domain or DNA binding domain comprises an S. pyogenes or an S. thermophilus Cas9, or a functional fragment thereof. In some embodiments, the endonuclease domain or DNA binding domain comprises a Cas9 sequence, e.g., as described in Chylinski, Rhun, and Charpentier (2013) RNA Biology 10:5, 726-737; incorporated herein by reference. In some embodiments, the endonuclease domain or DNA binding domain comprises the HNH nuclease subdomain and/or the RuvC1 subdomain of a Cas, e.g., Cas9, e.g., as described herein, or a variant thereof. In some embodiments, the endonuclease domain or DNA binding domain comprises Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i. In some embodiments, the endonuclease domain or DNA binding domain comprises a Cas polypeptide (e.g., enzyme), or a functional fragment thereof. 
In embodiments, the Cas polypeptide (e.g., enzyme) is selected from Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5d, Cas5t, Cas5h, Cas5a, Cas6, Cas7, Cas8, Cas8a, Cas8b, Cas8c, Cas9 (e.g., Csn1 or Csx12), Cas10, Cas10d, Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, Cas12i, Csy1, Csy2, Csy3, Csy4, Cse1, Cse2, Cse3, Cse4, Cse5e, Csc1, Csc2, Csa5, Csn1, Csn2, Csm1, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx1S, Csx11, Csf1, Csf2, Csf3, Csf4, Csd1, Csd2, Cst1, Cst2, Csh1, Csh2, Csa1, Csa2, Csa3, Csa4, Csa5, Type II Cas effector proteins, Type V Cas effector proteins, Type VI Cas effector proteins, CARF, DinG, Cpf1, Cas12b/C2c1, Cas12c/C2c3, SpCas9(K855A), eSpCas9(1.1), SpCas9-HF1, hyper accurate Cas9 variant (HypaCas9), homologues thereof, modified or engineered versions thereof, and/or functional fragments thereof. In embodiments, the Cas9 comprises one or more substitutions, e.g., selected from H840A, D10A, P475A, W476A, N477A, D1125A, W1126A, and D1127A. In embodiments, the Cas9 comprises one or more mutations at positions selected from: D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987, e.g., one or more substitutions selected from D10A, G12A, G17A, E762A, H840A, N854A, N863A, H982A, H983A, A984A, and/or D986A. In some embodiments, the endonuclease domain or DNA binding domain comprises a Cas (e.g., Cas9) sequence from Corynebacterium ulcerans, Corynebacterium diphtheriae, Spiroplasma syrphidicola, Prevotella intermedia, Spiroplasma taiwanense, Streptococcus iniae, Belliella baltica, Psychroflexus torquis, Streptococcus thermophilus, Listeria innocua, Campylobacter jejuni, Neisseria meningitidis, Streptococcus pyogenes, or Staphylococcus aureus, or a fragment or variant thereof.


In some embodiments, the endonuclease domain or DNA binding domain comprises a Cpf1 domain, e.g., comprising one or more substitutions, e.g., at position D917, E1006, D1255, or any combination thereof, e.g., selected from D917A, E1006A, D1255A, D917A/E1006A, D917A/D1255A, E1006A/D1255A, and D917A/E1006A/D1255A.
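The seven substitution sets named above are the non-empty combinations of the three single Cpf1 substitutions. The following minimal Python sketch enumerates them; the function and constant names are hypothetical and introduced only for illustration.

```python
from itertools import combinations

# The three single substitutions named in the text.
SINGLE_SUBSTITUTIONS = ("D917A", "E1006A", "D1255A")


def substitution_sets(singles=SINGLE_SUBSTITUTIONS):
    """Return every non-empty combination of substitutions.

    Combinations are ordered by size (singles, then doubles, then the
    triple) and written with '/' between members, as in the text.
    """
    return [
        "/".join(combo)
        for size in range(1, len(singles) + 1)
        for combo in combinations(singles, size)
    ]


if __name__ == "__main__":
    for name in substitution_sets():
        print(name)
```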


In some embodiments, the endonuclease domain or DNA binding domain comprises spCas9, spCas9-VRQR, spCas9-VRER, xCas9 (sp), saCas9, saCas9-KKH, spCas9-MQKSER, spCas9-LRKIQK, or spCas9-LRVSQL.


In some embodiments, a gene modifying polypeptide has an endonuclease domain comprising a Cas9 nickase, e.g., Cas9 H840A. In embodiments, the Cas9 H840A has the following amino acid sequence:










Cas9 nickase (H840A) (SEQ ID NO: 11,001):



DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEA






TRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGN





IVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDV





DKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLI





ALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAIL





LSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAG





YIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAI





LRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVV





DKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLS





GEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKII





KDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG





RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSL





HEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRE





RMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDV





DAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRK





FDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVI





TLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYK





VYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWD





KGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGG





FDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKK





DLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPED





NEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHL





FTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD









In some embodiments, a gene modifying polypeptide comprises a dCas9 sequence comprising a D10A and/or H840A mutation, e.g., the following sequence:










(SEQ ID NO: 5007)



SMDKKYSIGLAIGTNSVGWAVITDDYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGET






AEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI





FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDN





SDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLF





GNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLS





DAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNG





YAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGE





LHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFE





EVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKP





AFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDL





LKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYT





GWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQ





GDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQK





NSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLS





DYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLI





TQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR





EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVY





GDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETG





EIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPK





KYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYK





EVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKG





SPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAE





NIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD






TAL Effectors and Zinc Finger Nucleases


In some embodiments, an endonuclease domain or DNA-binding domain comprises a TAL effector molecule. A TAL effector molecule, e.g., a TAL effector molecule that specifically binds a DNA sequence, typically comprises a plurality of TAL effector domains or fragments thereof, and optionally one or more additional portions of naturally occurring TAL effectors (e.g., N- and/or C-terminal of the plurality of TAL effector domains). Many TAL effectors are known to those of skill in the art and are commercially available, e.g., from Thermo Fisher Scientific.


Naturally occurring TALEs are natural effector proteins secreted by numerous species of bacterial pathogens including the plant pathogen Xanthomonas which modulates gene expression in host plants and facilitates bacterial colonization and survival. The specific binding of TAL effectors is based on a central repeat domain of tandemly arranged nearly identical repeats of typically 33 or 34 amino acids (the repeat-variable di-residues, RVD domain).


Members of the TAL effectors family differ mainly in the number and order of their repeats. The number of repeats typically ranges from 1.5 to 33.5 repeats and the C-terminal repeat is usually shorter in length (e.g., about 20 amino acids) and is generally referred to as a “half-repeat.” Each repeat of the TAL effector generally features a one-repeat-to-one-base-pair correlation with different repeat types exhibiting different base-pair specificity (one repeat recognizes one base-pair on the target gene sequence). Generally, the smaller the number of repeats, the weaker the protein-DNA interactions. A number of 6.5 repeats has been shown to be sufficient to activate transcription of a reporter gene (Scholze et al., 2010).


Repeat to repeat variations occur predominantly at amino acid positions 12 and 13, which have therefore been termed “hypervariable” and which are responsible for the specificity of the interaction with the target DNA promoter sequence, as shown in Table 9 listing exemplary repeat variable diresidues (RVD) and their correspondence to nucleic acid base targets.









TABLE 9
RVDs and Nucleic Acid Base Specificity

Target   Possible RVD Amino Acid Combinations
A        NI  NN  CI  HI  KI
G        NN  GN  SN  VN  LN  DN  QN  EN  HN  RH  NK  AN  FN
C        HD  RD  KD  ND  AD
T        NG  HG  VG  IG  EG  MG  YG  AA  EP  VA  QG  KG  RG









Accordingly, it is possible to modify the repeats of a TAL effector to target specific DNA sequences. Further studies have shown that the RVD NK can target G. Target sites of TAL effectors also tend to include a T flanking the 5′ base targeted by the first repeat, but the exact mechanism of this recognition is not known. More than 113 TAL effector sequences are known to date. Non-limiting examples of TAL effectors from Xanthomonas include Hax2, Hax3, Hax4, AvrXa7, AvrXa10, and AvrBs3.
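The one-repeat-to-one-base-pair code of Table 9 can be illustrated with a short Python sketch that proposes one RVD per target base. Choosing the first RVD listed for each base is an arbitrary assumption made here for illustration, not a preference stated in this disclosure, and the names RVDS_BY_BASE and design_rvd_array are hypothetical.

```python
# RVDs listed for each target base in Table 9.
RVDS_BY_BASE = {
    "A": ["NI", "NN", "CI", "HI", "KI"],
    "G": ["NN", "GN", "SN", "VN", "LN", "DN", "QN",
          "EN", "HN", "RH", "NK", "AN", "FN"],
    "C": ["HD", "RD", "KD", "ND", "AD"],
    "T": ["NG", "HG", "VG", "IG", "EG", "MG", "YG",
          "AA", "EP", "VA", "QG", "KG", "RG"],
}


def design_rvd_array(target: str) -> list[str]:
    """Return one RVD per base of the target, read 5' to 3'.

    Assumption: the first RVD listed in Table 9 for each base is used.
    """
    rvds = []
    for base in target.upper():
        if base not in RVDS_BY_BASE:
            raise ValueError(f"unsupported base: {base!r}")
        rvds.append(RVDS_BY_BASE[base][0])
    return rvds


if __name__ == "__main__":
    print(design_rvd_array("GATC"))
```

Because NN is listed for both A and G in Table 9, an array designed this way may not discriminate between those two bases at positions that use NN.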


Accordingly, the TAL effector domain of a TAL effector molecule described herein may be derived from a TAL effector from any bacterial species (e.g., Xanthomonas species such as the African strain of Xanthomonas oryzae pv. oryzae (Yu et al. 2011), Xanthomonas campestris pv. raphani strain 756C and Xanthomonas oryzae pv. oryzicola strain BLS256 (Bogdanove et al. 2011)). In some embodiments, the TAL effector domain comprises an RVD domain as well as flanking sequence(s) (sequences on the N-terminal and/or C-terminal side of the RVD domain) also from the naturally occurring TAL effector. It may comprise more or fewer repeats than the RVD of the naturally occurring TAL effector. The TAL effector molecule can be designed to target a given DNA sequence based on the above code and others known in the art. The number of TAL effector domains (e.g., repeats (monomers or modules)) and their specific sequence can be selected based on the desired DNA target sequence. For example, TAL effector domains, e.g., repeats, may be removed or added in order to suit a specific target sequence. In an embodiment, the TAL effector molecule of the present invention comprises between 6.5 and 33.5 TAL effector domains, e.g., repeats. In an embodiment, the TAL effector molecule of the present invention comprises between 8 and 33.5 TAL effector domains, e.g., repeats, e.g., between 10 and 25 TAL effector domains, e.g., repeats, e.g., between 10 and 14 TAL effector domains, e.g., repeats.


In some embodiments, the TAL effector molecule comprises TAL effector domains that correspond to a perfect match to the DNA target sequence. In some embodiments, a mismatch between a repeat and a target base-pair on the DNA target sequence is permitted as long as it allows for the function of the polypeptide comprising the TAL effector molecule. In general, TALE binding is inversely correlated with the number of mismatches. In some embodiments, the TAL effector molecule of a polypeptide of the present invention comprises no more than 7 mismatches, 6 mismatches, 5 mismatches, 4 mismatches, 3 mismatches, 2 mismatches, or 1 mismatch, and optionally no mismatch, with the target DNA sequence. Without wishing to be bound by theory, in general the smaller the number of TAL effector domains in the TAL effector molecule, the fewer mismatches will be tolerated while still allowing for the function of the polypeptide comprising the TAL effector molecule. The binding affinity is thought to depend on the sum of matching repeat-DNA combinations. For example, TAL effector molecules having 25 TAL effector domains or more may be able to tolerate up to 7 mismatches.
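As an illustration of counting mismatches between a repeat array and a target, the Python sketch below treats a repeat as matching when its RVD is listed for the target base in Table 9. This is a simplifying assumption: it ignores the affinity differences between RVDs, and the function names and the default limit of 7 (taken from the upper bound mentioned above) are chosen here only for illustration.

```python
# RVDs listed for each target base in Table 9.
RVDS_BY_BASE = {
    "A": "NI NN CI HI KI".split(),
    "G": "NN GN SN VN LN DN QN EN HN RH NK AN FN".split(),
    "C": "HD RD KD ND AD".split(),
    "T": "NG HG VG IG EG MG YG AA EP VA QG KG RG".split(),
}


def count_mismatches(rvds, target):
    """Count positions whose RVD is not listed for the target base."""
    target = target.upper()
    if len(rvds) != len(target):
        raise ValueError("one RVD is required per target base")
    mismatches = 0
    for rvd, base in zip(rvds, target):
        if base not in RVDS_BY_BASE:
            raise ValueError(f"unsupported base: {base!r}")
        if rvd not in RVDS_BY_BASE[base]:
            mismatches += 1
    return mismatches


def within_tolerance(rvds, target, max_mismatches=7):
    """Return True if the array has at most max_mismatches mismatches."""
    return count_mismatches(rvds, target) <= max_mismatches


if __name__ == "__main__":
    array = ["NN", "NI", "NG", "HD"]
    print(count_mismatches(array, "GATC"), count_mismatches(array, "GTTC"))
```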


In addition to the TAL effector domains, the TAL effector molecule of the present invention may comprise additional sequences derived from a naturally occurring TAL effector. The length of the C-terminal and/or N-terminal sequence(s) included on each side of the TAL effector domain portion of the TAL effector molecule can vary and be selected by one skilled in the art, for example based on the studies of Zhang et al. (2011). Zhang et al. have characterized a number of C-terminal and N-terminal truncation mutants in Hax3-derived TAL effector-based proteins and have identified key elements that contribute to optimal binding to the target sequence and thus activation of transcription. Generally, it was found that transcriptional activity is inversely correlated with the length of the N-terminus. Regarding the C-terminus, an important element for DNA binding was identified within the first 68 amino acids of the Hax3 sequence. Accordingly, in some embodiments, the first 68 amino acids on the C-terminal side of the TAL effector domains of the naturally occurring TAL effector are included in the TAL effector molecule. Accordingly, in an embodiment, a TAL effector molecule comprises 1) one or more TAL effector domains derived from a naturally occurring TAL effector; 2) at least 70, 80, 90, 100, 110, 120, 130, 140, 150, 170, 180, 190, 200, 220, 230, 240, 250, 260, 270, 280 or more amino acids from the naturally occurring TAL effector on the N-terminal side of the TAL effector domains; and/or 3) at least 68, 80, 90, 100, 110, 120, 130, 140, 150, 170, 180, 190, 200, 220, 230, 240, 250, 260 or more amino acids from the naturally occurring TAL effector on the C-terminal side of the TAL effector domains.


In some embodiments, an endonuclease domain or DNA-binding domain is or comprises a Zn finger molecule. A Zn finger molecule comprises a Zn finger protein, e.g., a naturally occurring Zn finger protein or engineered Zn finger protein, or fragment thereof. Many Zn finger proteins are known to those of skill in the art and are commercially available, e.g., from Sigma-Aldrich.


In some embodiments, a Zn finger molecule comprises a non-naturally occurring Zn finger protein that is engineered to bind to a target DNA sequence of choice. See, for example, Beerli, et al. (2002) Nature Biotechnol. 20:135-141; Pabo, et al. (2001) Ann. Rev. Biochem. 70:313-340; Isalan, et al. (2001) Nature Biotechnol. 19:656-660; Segal, et al. (2001) Curr. Opin. Biotechnol. 12:632-637; Choo, et al. (2000) Curr. Opin. Struct. Biol. 10:411-416; U.S. Pat. Nos. 6,453,242; 6,534,261; 6,599,692; 6,503,717; 6,689,558; 7,030,215; 6,794,136; 7,067,317; 7,262,054; 7,070,934; 7,361,635; 7,253,273; and U.S. Patent Publication Nos. 2005/0064474; 2007/0218528; 2005/0267061, all incorporated herein by reference in their entireties.


An engineered Zn finger protein may have a novel binding specificity, compared to a naturally occurring Zn finger protein. Engineering methods include, but are not limited to, rational design and various types of selection. Rational design includes, for example, using databases comprising triplet (or quadruplet) nucleotide sequences and individual Zn finger amino acid sequences, in which each triplet or quadruplet nucleotide sequence is associated with one or more amino acid sequences of zinc fingers which bind the particular triplet or quadruplet sequence. See, for example, U.S. Pat. Nos. 6,453,242 and 6,534,261, incorporated by reference herein in their entireties.


Exemplary selection methods, including phage display and two-hybrid systems, are disclosed in U.S. Pat. Nos. 5,789,538; 5,925,523; 6,007,988; 6,013,453; 6,410,248; 6,140,466; 6,200,759; and 6,242,568; as well as International Patent Publication Nos. WO 98/37186; WO 98/53057; WO 00/27878; and WO 01/88197 and GB 2,338,237. In addition, enhancement of binding specificity for zinc finger proteins has been described, for example, in International Patent Publication No. WO 02/077227.


In addition, as disclosed in these and other references, zinc finger domains and/or multi-fingered zinc finger proteins may be linked together using any suitable linker sequences, including for example, linkers of 5 or more amino acids in length. See, also, U.S. Pat. Nos. 6,479,626; 6,903,185; and 7,153,949 for exemplary linker sequences 6 or more amino acids in length. The proteins described herein may include any combination of suitable linkers between the individual zinc fingers of the protein. In addition, enhancement of binding specificity for zinc finger binding domains has been described, for example, in co-owned International Patent Publication No. WO 02/077227.


Zn finger proteins and methods for design and construction of fusion proteins (and polynucleotides encoding same) are known to those of skill in the art and described in detail in U.S. Pat. Nos. 6,140,081; 5,789,538; 6,453,242; 6,534,261; 5,925,523; 6,007,988; 6,013,453; and 6,200,759; International Patent Publication Nos. WO 95/19431; WO 96/06166; WO 98/53057; WO 98/54311; WO 00/27878; WO 01/60970; WO 01/88197; WO 02/099084; WO 98/53058; WO 98/53059; WO 98/53060; WO 02/016536; and WO 03/016496.


In addition, as disclosed in these and other references, Zn finger proteins and/or multi-fingered Zn finger proteins may be linked together, e.g., as a fusion protein, using any suitable linker sequences, including for example, linkers of 5 or more amino acids in length. See, also, U.S. Pat. Nos. 6,479,626; 6,903,185; and 7,153,949 for exemplary linker sequences 6 or more amino acids in length. The Zn finger molecules described herein may include any combination of suitable linkers between the individual zinc finger proteins and/or multi-fingered Zn finger proteins of the Zn finger molecule.


In certain embodiments, the DNA-binding domain or endonuclease domain comprises a Zn finger molecule comprising an engineered zinc finger protein that binds (in a sequence-specific manner) to a target DNA sequence. In some embodiments, the Zn finger molecule comprises one Zn finger protein or fragment thereof. In other embodiments, the Zn finger molecule comprises a plurality of Zn finger proteins (or fragments thereof), e.g., 2, 3, 4, 5, 6 or more Zn finger proteins (and optionally no more than 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, or 2 Zn finger proteins). In some embodiments, the Zn finger molecule comprises at least three Zn finger proteins. In some embodiments, the Zn finger molecule comprises four, five or six fingers. In some embodiments, the Zn finger molecule comprises 8, 9, 10, 11 or 12 fingers. In some embodiments, a Zn finger molecule comprising three Zn finger proteins recognizes a target DNA sequence comprising 9 or 10 nucleotides. In some embodiments, a Zn finger molecule comprising four Zn finger proteins recognizes a target DNA sequence comprising 12 to 14 nucleotides. In some embodiments, a Zn finger molecule comprising six Zn finger proteins recognizes a target DNA sequence comprising 18 to 21 nucleotides.


In some embodiments, a Zn finger molecule comprises a two-handed Zn finger protein. Two-handed zinc finger proteins are those proteins in which two clusters of zinc finger proteins are separated by intervening amino acids so that the two zinc finger domains bind to two discontinuous target DNA sequences. An example of a two-handed type of zinc finger binding protein is SIP1, where a cluster of four zinc finger proteins is located at the amino terminus of the protein and a cluster of three Zn finger proteins is located at the carboxyl terminus (see Remacle, et al. (1999) EMBO Journal 18(18):5073-5084). Each cluster of zinc fingers in these proteins is able to bind to a unique target sequence and the spacing between the two target sequences can comprise many nucleotides.


Linkers


In some embodiments, a gene modifying polypeptide may comprise a linker, e.g., a peptide linker, e.g., a linker as described in Table 10. In some embodiments, a gene modifying polypeptide comprises, in an N-terminal to C-terminal direction, a Cas domain (e.g., a Cas domain of Table 8), a linker of Table 10 (or a sequence having at least 70%, 80%, 85%, 90%, 95%, or 99% identity thereto), and an RT domain (e.g., an RT domain of Table 6 or F3). In some embodiments, a gene modifying polypeptide comprises a flexible linker between the endonuclease and the RT domain, e.g., a linker comprising the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 5006). In some embodiments, an RT domain of a gene modifying polypeptide may be located C-terminal to the endonuclease domain. In some embodiments, an RT domain of a gene modifying polypeptide may be located N-terminal to the endonuclease domain.









TABLE 10
Exemplary linker sequences

SEQ ID NO   Amino Acid Sequence
            GGS
5102        GGSGGS
5103        GGSGGSGGS
5104        GGSGGSGGSGGS
5105        GGSGGSGGSGGSGGS
5106        GGSGGSGGSGGSGGSGGS
5107        GGGGS
5108        GGGGSGGGGS
5109        GGGGSGGGGSGGGGS
5110        GGGGSGGGGSGGGGSGGGGS
5111        GGGGSGGGGSGGGGSGGGGSGGGGS
5112        GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
            GGG
5114        GGGG
5115        GGGGG
5116        GGGGGG
5117        GGGGGGG
5118        GGGGGGGG
            GSS
5120        GSSGSS
5121        GSSGSSGSS
5122        GSSGSSGSSGSS
5123        GSSGSSGSSGSSGSS
5124        GSSGSSGSSGSSGSSGSS
5125        EAAAK
5126        EAAAKEAAAK
5127        EAAAKEAAAKEAAAK
5128        EAAAKEAAAKEAAAKEAAAK
5129        EAAAKEAAAKEAAAKEAAAKEAAAK
5130        EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
            PAP
5132        PAPAP
5133        PAPAPAP
5134        PAPAPAPAP
5135        PAPAPAPAPAP
5136        PAPAPAPAPAPAP
5137        GGSGGG
5138        GGGGGS
5139        GGSGSS
5140        GSSGGS
5141        GGSEAAAK
5142        EAAAKGGS
5143        GGSPAP
5144        PAPGGS
5145        GGGGSS
5146        GSSGGG
5147        GGGEAAAK
5148        EAAAKGGG
5149        GGGPAP
5150        PAPGGG
5151        GSSEAAAK
5152        EAAAKGSS
5153        GSSPAP
5154        PAPGSS
5155        EAAAKPAP
5156        PAPEAAAK
5157        GGSGGGGSS
5158        GGSGSSGGG
5159        GGGGGSGSS
5160        GGGGSSGGS
5161        GSSGGSGGG
5162        GSSGGGGGS
5163        GGSGGGEAAAK
5164        GGSEAAAKGGG
5165        GGGGGSEAAAK
5166        GGGEAAAKGGS
5167        EAAAKGGSGGG
5168        EAAAKGGGGGS
5169        GGSGGGPAP
5170        GGSPAPGGG
5171        GGGGGSPAP
5172        GGGPAPGGS
5173        PAPGGSGGG
5174        PAPGGGGGS
5175        GGSGSSEAAAK
5176        GGSEAAAKGSS
5177        GSSGGSEAAAK
5178        GSSEAAAKGGS
5179        EAAAKGGSGSS
5180        EAAAKGSSGGS
5181        GGSGSSPAP
5182        GGSPAPGSS
5183        GSSGGSPAP
5184        GSSPAPGGS
5185        PAPGGSGSS
5186        PAPGSSGGS
5187        GGSEAAAKPAP
5188        GGSPAPEAAAK
5189        EAAAKGGSPAP
5190        EAAAKPAPGGS
5191        PAPGGSEAAAK
5192        PAPEAAAKGGS
5193        GGGGSSEAAAK
5194        GGGEAAAKGSS
5195        GSSGGGEAAAK
5196        GSSEAAAKGGG
5197        EAAAKGGGGSS
5198        EAAAKGSSGGG
5199        GGGGSSPAP
5200        GGGPAPGSS
5201        GSSGGGPAP
5202        GSSPAPGGG
5203        PAPGGGGSS
5204        PAPGSSGGG
5205        GGGEAAAKPAP
5206        GGGPAPEAAAK
5207        EAAAKGGGPAP
5208        EAAAKPAPGGG
5209        PAPGGGEAAAK
5210        PAPEAAAKGGG
5211        GSSEAAAKPAP
5212        GSSPAPEAAAK
5213        EAAAKGSSPAP
5214        EAAAKPAPGSS
5215        PAPGSSEAAAK
5216        PAPEAAAKGSS
5217        AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
5218        GGGGSEAAAKGGGGS
5219        EAAAKGGGGSEAAAK
5220        SGSETPGTSESATPES
5221        GSAGSAAGSGEF
5222        SGGSSGGSSGSETPGTSESATPESSGGSSGGSS









In some embodiments, a linker of a gene modifying polypeptide comprises a motif chosen from: (SGGS)n (SEQ ID NO: 5025), (GGGS)n (SEQ ID NO: 5026), (GGGGS)n (SEQ ID NO: 5027), (G)n, (EAAAK)n (SEQ ID NO: 5028), (GGS)n, or (XP)n.
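The repeat motifs above can be expanded into concrete linker sequences by simple repetition, as in the following minimal Python sketch. The function names are hypothetical, and the handling of (XP)n, where each X is supplied by the caller as an arbitrary residue, is an assumption about how that motif might be instantiated.

```python
def repeat_linker(motif: str, n: int) -> str:
    """Expand a motif written as (motif)n into an amino acid sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return motif * n


def xp_linker(x_residues: str) -> str:
    """Expand (XP)n, taking one caller-chosen residue per X position."""
    if not x_residues:
        raise ValueError("at least one X residue is required")
    return "".join(residue + "P" for residue in x_residues)


if __name__ == "__main__":
    print(repeat_linker("GGGGS", 3))
    print(repeat_linker("EAAAK", 2))
    print(xp_linker("AE"))
```

For example, three repeats of GGGGS and two repeats of EAAAK reproduce linker sequences that also appear in Table 10.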


Gene Modifying Polypeptide Selection by Pooled Screening


Candidate gene modifying polypeptides may be screened to evaluate a candidate's gene editing ability. For example, an RNA gene modifying system designed for the targeted editing of a coding sequence in the human genome may be used. In certain embodiments, such a gene modifying system may be used in conjunction with a pooled screening approach.


For example, a library of gene modifying polypeptide candidates and a template guide RNA (tgRNA) may be introduced into mammalian cells to test the candidates' gene editing abilities by a pooled screening approach. In specific embodiments, a library of gene modifying polypeptide candidates is introduced into mammalian cells followed by introduction of the tgRNA into the cells.


Representative, non-limiting examples of mammalian cells that may be used in screening include HEK293T cells, U2OS cells, HeLa cells, HepG2 cells, Huh7 cells, K562 cells, or iPS cells.


A gene modifying polypeptide candidate may comprise 1) a Cas nuclease, for example a wild-type Cas nuclease, e.g., a wild-type Cas9 nuclease, a mutant Cas nuclease, e.g., a Cas nickase, for example, a Cas9 nickase such as a Cas9 N863A nickase, or a Cas nuclease selected from Table 7 or 8; 2) a peptide linker, e.g., a sequence from Table D or 10, that may exhibit varying degrees of length, flexibility, hydrophobicity, and/or secondary structure; and 3) a reverse transcriptase (RT), e.g., an RT domain from Table D or 6. A gene modifying polypeptide candidate library comprises: a plurality of different gene modifying polypeptide candidates that differ from each other with respect to one, two or all three of the Cas nuclease, peptide linker or RT domain components, or a plurality of nucleic acid expression vectors that encode such gene modifying polypeptide candidates.
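A library in which candidates differ in the Cas nuclease, the linker, and the RT domain can be pictured as the Cartesian product of the three component lists. The Python sketch below is illustrative only: the component labels (for example "RT_1") are hypothetical placeholders rather than entries from the tables of this disclosure, and a real library need not contain every combination.

```python
from itertools import product
from typing import NamedTuple


class Candidate(NamedTuple):
    """One Cas-linker-RT fusion, listed N-terminal to C-terminal."""
    cas: str
    linker: str
    rt: str


def build_library(cas_domains, linkers, rt_domains):
    """Return every Cas/linker/RT combination as a Candidate."""
    return [
        Candidate(cas, linker, rt)
        for cas, linker, rt in product(cas_domains, linkers, rt_domains)
    ]


if __name__ == "__main__":
    # Placeholder labels, not actual table entries.
    library = build_library(
        ["Cas9_N863A", "Cas9_H840A"],
        ["GGGGS", "EAAAK"],
        ["RT_1", "RT_2", "RT_3"],
    )
    print(len(library), library[0])
```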


For screening of gene modifying polypeptide candidates, a two-component system may be used that comprises a gene modifying polypeptide component and a tgRNA component. A gene modifying component may comprise, for example, an expression vector, e.g., an expression plasmid or lentiviral vector, that encodes a gene modifying polypeptide candidate, for example, comprises a human codon-optimized nucleic acid that encodes a gene modifying polypeptide candidate, e.g., a Cas-linker-RT fusion as described above. In a particular embodiment, a lentiviral cassette is utilized that comprises: (i) a promoter for expression in mammalian cells, e.g., a CMV promoter; (ii) a gene modifying library candidate, e.g., a Cas-linker-RT fusion comprising a Cas nuclease of Table CC, a peptide linker of Table AA and an RT of Table BB, for example a Cas-linker-RT fusion as in Table D; (iii) a self-cleaving polypeptide, e.g., a T2A peptide; (iv) a marker enabling selection in mammalian cells, e.g., a puromycin resistance gene; and (v) a termination signal, e.g., a poly A tail.


The tgRNA component may comprise a tgRNA or expression vector, e.g., an expression plasmid, that produces the tgRNA, for example, utilizes a U6 promoter to drive expression of the tgRNA, wherein the tgRNA is a non-coding RNA sequence that is recognized by Cas and localizes it to the genomic locus of interest, and that also templates reverse transcription of the desired edit into the genome by the RT domain.


To prepare a pool of cells expressing gene modifying polypeptide library candidates, mammalian cells, e.g., HEK293T or U2OS cells, may be transduced with pooled gene modifying polypeptide candidate expression vector preparations, e.g., lentiviral preparations, of the gene modifying candidate polypeptide library. In a particular embodiment, lentiviral plasmids are utilized, and HEK293 Lenti-X cells are seeded in 15 cm plates (12×10^6 cells) prior to lentiviral plasmid transfection. In such an embodiment, lentiviral plasmid transfection may be performed using the Lentiviral Packaging Mix (Biosettia) and transfection of the plasmid DNA for the gene modifying candidate library is performed the following day using Lipofectamine 2000 and Opti-MEM media according to the manufacturer's protocol. In such an embodiment, extracellular DNA may be removed by a full media change the next day and virus-containing media may be harvested 48 hours after. Lentiviral media may be concentrated using Lenti-X Concentrator (TaKaRa Biosciences) and 5 mL lentiviral aliquots may be made and stored at −80° C. Lentiviral titering is performed by enumerating colony forming units post-selection, e.g., post-puromycin selection.


For monitoring gene editing of a target DNA, mammalian cells, e.g., HEK293T or U2OS cells, carrying a target DNA may be utilized. In other embodiments for monitoring gene editing of a target DNA, mammalian cells, e.g., HEK293T or U2OS cells, carrying a target DNA genomic landing pad may be utilized. In particular embodiments, the target DNA genomic landing pad may comprise a gene to be edited for treatment of a disease or disorder of interest. In other particular embodiments, the target DNA is a gene sequence that expresses a protein that exhibits detectable characteristics that may be monitored to determine whether gene editing has occurred. For example, in certain embodiments, a blue fluorescence protein (BFP)- or green fluorescence protein (GFP)-expressing genomic landing pad is utilized. In certain embodiments, mammalian cells, e.g., HEK293T or U2OS cells, comprising a target DNA, e.g., a target DNA genomic landing pad, are seeded in culture plates at 500×-3000× cells per gene modifying library candidate and transduced at a 0.2-0.3 multiplicity of infection (MOI) to minimize multiple infections per cell. Puromycin (2.5 μg/mL) may be added 48 hours post infection to allow for selection of infected cells. In such an embodiment, cells may be kept under puromycin selection for at least 7 days and then scaled up for tgRNA introduction, e.g., tgRNA electroporation.
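The reason a low MOI limits multiple infections per cell can be illustrated with a short calculation. The Python sketch below assumes that the number of integrations per cell follows a Poisson distribution with mean equal to the MOI; that statistical model is a common simplification and is not stated in this disclosure, and all function names are hypothetical.

```python
import math


def seeded_cells(library_size: int, coverage: int) -> int:
    """Cells to seed for a given number of cells per library candidate."""
    return library_size * coverage


def fraction_infected(moi: float) -> float:
    """Fraction of cells with at least one integration (Poisson model)."""
    return 1.0 - math.exp(-moi)


def fraction_multiply_infected(moi: float) -> float:
    """Among infected cells, the fraction carrying more than one integration."""
    if moi <= 0:
        raise ValueError("moi must be positive")
    p_infected = 1.0 - math.exp(-moi)
    p_single = moi * math.exp(-moi)
    return 1.0 - p_single / p_infected


if __name__ == "__main__":
    for moi in (0.2, 0.3):
        print(moi, fraction_infected(moi), fraction_multiply_infected(moi))
```

Under this model, at an MOI of 0.3 about 14% of infected cells would carry more than one candidate, and at an MOI of 0.2 about 10% would.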


To ascertain whether gene editing occurs, mammalian cells containing a target DNA to be edited may be infected with gene modifying polypeptide library candidates then transfected with tgRNA designed for use in editing of the target DNA. Subsequently, the cells may be analyzed to determine whether editing of the target locus has occurred according to the designed outcome, or whether no editing or imperfect editing has occurred, e.g., by using cell sorting and sequence analysis.


In a particular embodiment, to ascertain whether genome editing occurs, BFP- or GFP-expressing mammalian cells, e.g., HEK293T or U2OS cells, may be infected with gene modifying library candidates and then transfected or electroporated with tgRNA plasmid or RNA, e.g., by electroporation of 250,000 cells/well with 200 ng of a tgRNA plasmid designed to convert BFP-to-GFP or GFP-to-BFP, at a cell count ensuring >250×-1000× coverage per library candidate. In such an embodiment, the genome-editing capacity of the various constructs in this assay may be assessed by sorting the cells by Fluorescence-Activated Cell Sorting (FACS) for expression of the color-converted fluorescent protein (FP) at 4-10 days post-electroporation. Cells are sorted and harvested as distinct populations of unedited cells (exhibiting original fluorescence protein signal), edited cells (exhibiting converted fluorescence protein signal), and imperfect edit cells (exhibiting no fluorescence protein signal). A sample of unsorted cells may also be harvested as the input population to determine candidate enrichment during analysis.


To determine which gene modifying library candidates exhibit genome-editing capacity in an assay, genomic DNA (gDNA) is harvested from the sorted cell populations and analyzed by sequencing the gene modifying library candidates in each population. Briefly, gene modifying candidates may be amplified from the genome using primers specific to the gene modifying polypeptide expression vector, e.g., the lentiviral cassette, amplified in a second round of PCR to dilute genomic DNA, and then sequenced, for example, sequenced by a next-generation sequencing platform. After quality control of sequencing reads, reads of at least about 1500 nucleotides and generally no more than about 3200 nucleotides are mapped to the gene modifying polypeptide library sequences and those containing a minimum of about an 80% match to a library sequence are considered to be successfully aligned to a given candidate for purposes of this pooled screen. In order to identify candidates capable of performing gene editing in the assay, e.g., the BFP-to-GFP or GFP-to-BFP edit, the read count of each library candidate in the edited population is compared to its read count in the initial, unsorted population.
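As a non-limiting sketch, the read filtering and per-candidate counting described above may be expressed as follows. The sketch assumes that each read has already been aligned to its best-matching library sequence by an upstream tool, so that a candidate identifier, read length, and fractional match are available. The `AlignedRead` type and the function name are hypothetical. The thresholds are the approximate values stated above.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class AlignedRead(NamedTuple):
    """A read with its best-matching library candidate (hypothetical record)."""
    candidate_id: str
    length: int        # read length in nucleotides
    identity: float    # fraction of the read matching the library sequence


def count_candidate_reads(
    reads: Iterable[AlignedRead],
    min_len: int = 1500,
    max_len: int = 3200,
    min_identity: float = 0.80,
) -> Counter:
    """Count reads per candidate, keeping only reads that pass both filters."""
    counts: Counter = Counter()
    for read in reads:
        if min_len <= read.length <= max_len and read.identity >= min_identity:
            counts[read.candidate_id] += 1
    return counts
```

The same function may be applied separately to the edited, unedited, imperfect edit, and unsorted populations to obtain one count table per population.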


For purposes of pooled screening, gene modifying candidates with genome-editing capacity are identified based on enrichment in the edited (converted FP) population relative to unsorted (input) cells. In some embodiments, an enrichment of at least 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, or at least 100-fold over the input indicates potentially useful gene editing activity, e.g., at least 2-fold enrichment. In some embodiments, the enrichment is converted to a log-value by taking the log base 2 of the enrichment ratio. In some embodiments, a log2 enrichment score of at least 0, 1, 2, 3, 4, 5, 5.5, 6.0, 6.2, 6.3, 6.4, 6.5, or at least 6.6 indicates potentially useful gene editing activity, e.g., a log2 enrichment score of at least 1.0. In particular embodiments, enrichment values observed for gene modifying candidates may be compared to enrichment values observed under similar conditions utilizing a reference, e.g., Element ID No: 17380.
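For illustration, the enrichment comparison and log base 2 conversion described above may be sketched as follows. The sketch assumes that read counts are first normalized to the total aligned reads in each population so that populations sequenced to different depths are comparable. That normalization, the handling of zero counts, and the function names are assumptions of this example and are not specified in the disclosure.

```python
import math
from typing import Optional


def fold_enrichment(
    edited_count: int,
    edited_total: int,
    input_count: int,
    input_total: int,
) -> Optional[float]:
    """Fold enrichment of a candidate in the edited population over the input.

    Counts are normalized to population totals (an assumption of this sketch).
    Returns None when the ratio is undefined.
    """
    if input_count == 0 or edited_total == 0 or input_total == 0:
        return None
    return (edited_count / edited_total) / (input_count / input_total)


def log2_enrichment(fold: Optional[float]) -> Optional[float]:
    """Log base 2 of the enrichment ratio, or None when undefined."""
    if fold is None or fold <= 0:
        return None
    return math.log2(fold)
```

Under this definition, a candidate whose share of reads doubles in the edited population has a fold enrichment of 2 and a log2 enrichment score of 1, corresponding to the exemplary thresholds above.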


In some embodiments, multiple tgRNAs may be used to screen the gene modifying candidate library. In particular embodiments, a plurality of tgRNAs may be utilized to optimize template/Cas-linker-RT fusion pairs, e.g., for gene editing of particular target genes, for example, gene targets for the treatment of disease. In specific embodiments, a pooled approach to screening gene modifying candidates may be performed using a multiplicity of different tgRNAs in an arrayed format.


In some embodiments, multiple types of edits, e.g., insertions, substitutions, and/or deletions of different lengths, may be used to screen the gene modifying candidate library.


In some embodiments, multiple target sequences, e.g., different fluorescent proteins, may be used to screen the gene modifying candidate library. In some embodiments, multiple cell types, e.g., HEK293T or U2OS, may be used to screen the gene modifying candidate library. The person of ordinary skill in the art will appreciate that a given candidate may exhibit altered editing capacity or even the gain or loss of any observable or useful activity across different conditions, including tgRNA sequence (e.g., nucleotide modifications, PBS length, RT template length), target sequence, target location, type of edit, location of mutation relative to the first-strand nick of the gene modifying polypeptide, or cell type. Thus, in some embodiments, gene modifying library candidates are screened across multiple parameters, e.g., with at least two distinct tgRNAs in at least two cell types, and gene editing activity is identified by enrichment in any single condition. In other embodiments, a candidate with more robust activity across different tgRNA and cell types is identified by enrichment in at least two conditions, e.g., in all conditions screened. For clarity, candidates found to exhibit little to no enrichment under any given condition are not assumed to be inactive across all conditions and may be screened with different parameters or reconfigured at the polypeptide level, e.g., by swapping, shuffling, or evolving domains (e.g., RT domain), linkers, or other signals (e.g., NLS).
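The multi-condition logic described above may be sketched, purely as an example, by counting the conditions in which a candidate meets an enrichment threshold. The labels, the default threshold of a log2 enrichment score of 1.0, and the function name are assumptions chosen for this illustration.

```python
from typing import Dict, Optional


def classify_candidate(
    scores: Dict[str, Optional[float]],
    threshold: float = 1.0,
    robust_min: int = 2,
) -> str:
    """Classify a candidate from its log2 enrichment score per condition.

    `scores` maps a condition label (e.g., a tgRNA and cell type pair) to a
    log2 enrichment score, or None where no score could be computed.
    """
    passing = [c for c, s in scores.items() if s is not None and s >= threshold]
    if len(passing) >= robust_min:
        return "robust"        # enriched in at least `robust_min` conditions
    if passing:
        return "active"        # enriched in a single condition
    # Not enriched here; per the text above, this does not imply inactivity
    # under other conditions.
    return "not enriched"
```

Setting `robust_min` to the number of conditions screened corresponds to requiring enrichment in all conditions.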


Sequences of Exemplary Cas9-Linker-RT Fusions


In some embodiments, a gene modifying polypeptide comprises a linker sequence and an RT sequence. In some embodiments, a gene modifying polypeptide comprises a linker sequence as listed in Table D, or an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises the amino acid sequence of an RT domain as listed in Table D, or an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises a linker sequence as listed in Table D, or an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and the amino acid sequence of an RT domain as listed in Table D, or an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises: (i) a linker sequence as listed in a row of Table D, or an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and (ii) the amino acid sequence of an RT domain as listed in the same row of Table D, or an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


Exemplary Gene Modifying Polypeptides

In some embodiments, a gene modifying polypeptide (e.g., a gene modifying polypeptide that is part of a system described herein) comprises an amino acid sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 80% identity thereto. In some embodiments, a gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 90% identity thereto. In some embodiments, a gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 95% identity thereto. In some embodiments, a gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 1-7743. In some embodiments, a gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises an amino acid sequence of any one of SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In some embodiments, a gene modifying polypeptide comprises an amino acid sequence as listed in Table A1, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In some embodiments, a gene modifying polypeptide comprises an amino acid sequence as listed in Table T1, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises a linker comprising a linker sequence as listed in Table T1, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises an RT domain comprising an RT domain sequence as listed in Table T1, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises: (i) a linker comprising a linker sequence as listed in a row of Table T1, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto; and (ii) an RT domain comprising an RT domain sequence as listed in the same row of Table T1, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.









TABLE T1
Selection of exemplary gene modifying polypeptides

| SEQ ID NO: for Full Polypeptide Sequence | Linker Sequence | SEQ ID NO: of linker | RT name |
|---|---|---|---|
| 1372 | AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 15,401 | AVIRE_P03360_3mutA |
| 1197 | AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 15,402 | FLV_P10273_3mutA |
| 2784 | AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 15,403 | MLVMS_P03355_3mutA_WS |
| 647 | AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 15,404 | SFV3L_P27401_2mutA |









In some embodiments, a gene modifying polypeptide comprises an amino acid sequence as listed in Table T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises a linker comprising a linker sequence as listed in Table T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises an RT domain comprising an RT domain sequence as listed in Table T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, a gene modifying polypeptide comprises: (i) a linker comprising a linker sequence as listed in a row of Table T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto; and (ii) an RT domain comprising an RT domain sequence as listed in the same row of Table T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.









TABLE T2
Selection of exemplary gene modifying polypeptides

| SEQ ID NO: for Full Polypeptide Sequence | Linker Sequence | SEQ ID NO: of linker | RT name |
|---|---|---|---|
| 2311 | GGGGSGGGGSGGGGSGGGGS | 15,405 | MLVCB_P08361_3mutA |
| 1373 | GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS | 15,406 | AVIRE_P03360_3mutA |
| 2644 | GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS | 15,407 | MLVMS_P03355_PLV919 |
| 2304 | GSSGSSGSSGSSGSSGSS | 15,408 | MLVCB_P08361_3mutA |
| 2325 | EAAAKEAAAKEAAAKEAAAK | 15,409 | MLVCB_P08361_3mutA |
| 2322 | EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK | 15,410 | MLVCB_P08361_3mutA |
| 2187 | PAPAPAPAPAP | 15,411 | MLVBM_Q7SVK7_3mut |
| 2309 | PAPAPAPAPAPAP | 15,412 | MLVCB_P08361_3mutA |
| 2534 | PAPAPAPAPAPAP | 15,413 | MLVFF_P26809_3mutA |
| 2797 | PAPAPAPAPAPAP | 15,414 | MLVMS_P03355_3mutA_WS |
| 3084 | PAPAPAPAPAPAP | 15,415 | MLVMS_P03355_3mutA_WS |
| 2868 | PAPAPAPAPAPAP | 15,416 | MLVMS_P03355_PLV919 |
| 126 | EAAAKGGG | 15,417 | PERV_Q4VFZ2_3mut |
| 306 | EAAAKGGG | 15,418 | PERV_Q4VFZ2_3mut |
| 1410 | PAPGGG | 15,419 | AVIRE_P03360_3mutA |
| 804 | GGGGSSGGS | 15,420 | WMSV_P03359_3mut |
| 1937 | GGGGGSEAAAK | 15,421 | BAEVM_P10272_3mutA |
| 2721 | GGGEAAAKGGS | 15,422 | MLVMS_P03355_3mut |
| 3018 | GGGEAAAKGGS | 15,423 | MLVMS_P03355_3mut |
| 1018 | GGGEAAAKGGS | 15,424 | XMRV6_A1Z651_3mutA |
| 2317 | GGSGGGPAP | 15,425 | MLVCB_P08361_3mutA |
| 2649 | PAPGGSGGG | 15,426 | MLVMS_P03355_PLV919 |
| 2878 | PAPGGSGGG | 15,427 | MLVMS_P03355_PLV919 |
| 912 | GGSEAAAKPAP | 15,428 | WMSV_P03359_3mutA |
| 2338 | GGSPAPEAAAK | 15,429 | MLVCB_P08361_3mutA |
| 2527 | GGSPAPEAAAK | 15,430 | MLVFF_P26809_3mutA |
| 141 | EAAAKGGSPAP | 15,431 | PERV_Q4VFZ2_3mut |
| 341 | EAAAKGGSPAP | 15,432 | PERV_Q4VFZ2_3mut |
| 2315 | EAAAKPAPGGS | 15,433 | MLVCB_P08361_3mutA |
| 3080 | EAAAKPAPGGS | 15,434 | MLVMS_P03355_3mutA_WS |
| 2688 | GGGGSSEAAAK | 15,435 | MLVMS_P03355_PLV919 |
| 2885 | GGGGSSEAAAK | 15,436 | MLVMS_P03355_PLV919 |
| 2810 | GSSGGGEAAAK | 15,437 | MLVMS_P03355_3mutA_WS |
| 3057 | GSSGGGEAAAK | 15,438 | MLVMS_P03355_3mutA_WS |
| 1861 | GSSEAAAKGGG | 15,439 | MLVAV_P03356_3mutA |
| 3056 | GSSGGGPAP | 15,440 | MLVMS_P03355_3mutA_WS |
| 1038 | GSSPAPGGG | 15,441 | XMRV6_A1Z651_3mutA |
| 2308 | PAPGGGGSS | 15,442 | MLVCB_P08361_3mutA |
| 1672 | GGGEAAAKPAP | 15,443 | KORV_Q9TTC1-Pro_3mutA |
| 2526 | GGGEAAAKPAP | 15,444 | MLVFF_P26809_3mutA |
| 1938 | GGGPAPEAAAK | 15,445 | BAEVM_P10272_3mutA |
| 2641 | GSSEAAAKPAP | 15,446 | MLVMS_P03355_PLV919 |
| 2891 | GSSEAAAKPAP | 15,447 | MLVMS_P03355_PLV919 |
| 1225 | GSSPAPEAAAK | 15,448 | FLV_P10273_3mutA |
| 2839 | GSSPAPEAAAK | 15,449 | MLVMS_P03355_3mutA_WS |
| 3127 | GSSPAPEAAAK | 15,450 | MLVMS_P03355_3mutA_WS |
| 2798 | PAPGSSEAAAK | 15,451 | MLVMS_P03355_3mutA_WS |
| 3091 | PAPGSSEAAAK | 15,452 | MLVMS_P03355_3mutA_WS |
| 1372 | AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 15,453 | AVIRE_P03360_3mutA |
| 1197 | AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 15,454 | FLV_P10273_3mutA |
| 2611 | AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 15,455 | MLVMS_P03355_PLV919 |
| 2784 | AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 15,456 | MLVMS_P03355_3mutA_WS |
| 480 | AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 15,457 | SFV1_P23074_2mutA |
| 647 | AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 15,458 | SFV3L_P27401_2mutA |
| 1006 | AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 15,459 | XMRV6_A1Z651_3mutA |
| 2518 | SGSETPGTSESATPES | 15,460 | MLVFF_P26809_3mutA |









Subsequences of Exemplary Gene Modifying Polypeptides

In some embodiments, the gene modifying polypeptide comprises, in N-terminal to C-terminal order, one or more (e.g., 1, 2, 3, 4, 5, or all 6) of an N-terminal methionine residue, a first nuclear localization signal (NLS), a DNA binding domain, a linker, an RT domain, and/or a second NLS. In some embodiments, a gene modifying polypeptide comprises, in N-terminal to C-terminal order, a NLS (e.g., a first NLS), a DNA binding domain, a linker, and an RT domain, wherein the linker and RT domain are the linker and RT domain of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said linker and RT domain. In some embodiments, a gene modifying polypeptide comprises, in N-terminal to C-terminal order, a DNA binding domain, a linker, an RT domain, and an NLS (e.g., a second NLS) wherein the linker and RT domain are the linker and RT domain of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said linker and RT domain. In some embodiments, a gene modifying polypeptide comprises, in N-terminal to C-terminal order, a first NLS, a DNA binding domain, a linker, an RT domain, and a second NLS, wherein the linker and RT domain are the linker and RT domain of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said linker and RT domain. In some embodiments, the gene modifying polypeptide further comprises an N-terminal methionine residue.


In some embodiments, the gene modifying polypeptide comprises, in N-terminal to C-terminal order, one or more (e.g., 1, 2, 3, 4, 5, or all 6) of an N-terminal methionine residue, a first nuclear localization signal (NLS) (e.g., of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 and/or as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto), a DNA binding domain (e.g., a Cas domain, e.g., a SpyCas9 domain, e.g., as listed in Table 8, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto; or a DNA binding domain of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 and/or as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto), a linker (e.g., of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 and/or as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto), an RT domain (e.g., of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 and/or as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto), and a second NLS (e.g., of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 and/or as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto). In some embodiments, the gene modifying polypeptide further comprises (e.g., C-terminal to the second NLS) a T2A sequence and/or a puromycin sequence (e.g., of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743 and/or as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto). 
In some embodiments, a nucleic acid encoding a gene modifying polypeptide (e.g., as described herein) encodes a T2A sequence, e.g., wherein the T2A sequence is situated between a region encoding the gene modifying polypeptide and a second region, wherein the second region optionally encodes a selectable marker, e.g., puromycin.


In certain embodiments, the first NLS comprises a first NLS sequence of a gene modifying polypeptide having an amino acid sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the first NLS comprises a first NLS sequence of a gene modifying polypeptide as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the first NLS sequence comprises a C-myc NLS. In certain embodiments, the first NLS comprises the amino acid sequence PAAKRVKLD (SEQ ID NO: 11,095), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In certain embodiments, the gene modifying polypeptide further comprises a spacer sequence between the first NLS and the DNA binding domain. In certain embodiments, the spacer sequence between the first NLS and the DNA binding domain comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. In certain embodiments, the spacer sequence between the first NLS and the DNA binding domain comprises the amino acid sequence GG.


In certain embodiments, the DNA binding domain comprises a DNA binding domain of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the DNA binding domain comprises a DNA binding domain of a gene modifying polypeptide as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the DNA binding domain comprises a Cas domain (e.g., as listed in Table 8). In certain embodiments, the DNA binding domain comprises the amino acid sequence of a SpyCas9 polypeptide (e.g., as listed in Table 8, e.g., a Cas9 N863A polypeptide), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the DNA binding domain comprises the amino acid sequence:










(SEQ ID NO: 11,096)
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLK
RTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYH
EKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYN
QLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFD
LAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASM
IKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDG
TEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIP
YYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSL
LYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDS
VEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAH
LFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFK
EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQT
TQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL
SDYDVDHIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF
DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSK
LVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKS
EQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSM
PQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGK
SKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASA
GELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVI
LADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDA
TLIHQSITGLYETRIDLSQLGGD,







or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In certain embodiments, the gene modifying polypeptide further comprises a spacer sequence between the DNA binding domain and the linker. In certain embodiments, the spacer sequence between the DNA binding domain and the linker comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. In certain embodiments, the spacer sequence between the DNA binding domain and the linker comprises the amino acid sequence GG.


In certain embodiments, the linker comprises a linker sequence of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the linker comprises a linker sequence of a gene modifying polypeptide as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the linker comprises an amino acid sequence as listed in Table D or 10, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In certain embodiments, the gene modifying polypeptide further comprises a spacer sequence between the linker and the RT domain. In certain embodiments, the spacer sequence between the linker and the RT domain comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. In certain embodiments, the spacer sequence between the linker and the RT domain comprises the amino acid sequence GG.


In certain embodiments, the RT domain comprises an RT domain sequence of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the RT domain comprises an RT domain sequence of a gene modifying polypeptide as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the RT domain comprises an amino acid sequence as listed in Table D or 6, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain has a length of about 400-500, 500-600, 600-700, 700-800, 800-900, or 900-1000 amino acids.


In certain embodiments, the gene modifying polypeptide further comprises a spacer sequence between the RT domain and the second NLS. In certain embodiments, the spacer sequence between the RT domain and the second NLS comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. In certain embodiments, the spacer sequence between the RT domain and the second NLS comprises the amino acid sequence AG.


In certain embodiments, the second NLS comprises a second NLS sequence of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743. In certain embodiments, the second NLS comprises a second NLS sequence of a gene modifying polypeptide as listed in any of Tables A1, T1, or T2. In certain embodiments, the second NLS sequence comprises a plurality of partial NLS sequences. In embodiments, the NLS sequence, e.g., the second NLS sequence, comprises a first partial NLS sequence, e.g., comprising the amino acid sequence KRTADGSEFE (SEQ ID NO: 11,097), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In embodiments, the NLS sequence, e.g., the second NLS sequence, comprises a second partial NLS sequence. In embodiments, the NLS sequence, e.g., the second NLS sequence, comprises an SV40A5 NLS, e.g., a bipartite SV40A5 NLS, e.g., comprising the amino acid sequence KRTADGSEFESPKKKAKVE (SEQ ID NO: 11,098), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the NLS sequence, e.g., the second NLS sequence, comprises the amino acid sequence KRTADGSEFEKRTADGSEFESPKKKAKVE (SEQ ID NO: 11,099), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In certain embodiments, the gene modifying polypeptide further comprises a spacer sequence between the second NLS and the T2A sequence and/or puromycin sequence. In certain embodiments, the spacer sequence between the second NLS and the T2A sequence and/or puromycin sequence comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. In certain embodiments, the spacer sequence between the second NLS and the T2A sequence and/or puromycin sequence comprises the amino acid sequence GSG.
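As one non-limiting illustration of the N-terminal to C-terminal arrangement described above, the following sketch concatenates an N-terminal methionine, the first NLS, the DNA binding domain, the linker, the RT domain, and the second NLS, using the exemplary spacers GG, GG, GG, and AG recited in the preceding paragraphs. The function name is hypothetical, and the DNA binding domain and RT domain are passed in as arguments. The optional GSG spacer and T2A/puromycin region are omitted because their sequences are not recited in this section.

```python
FIRST_NLS = "PAAKRVKLD"              # SEQ ID NO: 11,095
SECOND_NLS = "KRTADGSEFESPKKKAKVE"   # SEQ ID NO: 11,098


def assemble_polypeptide(
    dna_binding_domain: str,
    linker: str,
    rt_domain: str,
    with_met: bool = True,
) -> str:
    """Concatenate the elements in N-terminal to C-terminal order.

    Spacers follow the exemplary embodiments above: GG after the first NLS,
    GG after the DNA binding domain, GG after the linker, AG after the RT
    domain. This is one possible combination, not the only one.
    """
    parts = [
        "M" if with_met else "",
        FIRST_NLS,
        "GG",
        dna_binding_domain,
        "GG",
        linker,
        "GG",
        rt_domain,
        "AG",
        SECOND_NLS,
    ]
    return "".join(parts)
```

For example, a linker sequence from a row of Table T1 or T2 may be supplied as `linker` together with the RT domain sequence named in the same row.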


Linkers and RT Domains

In some embodiments, the gene modifying polypeptide comprises a linker (e.g., as described herein) and an RT domain (e.g., as described herein). In certain embodiments, the gene modifying polypeptide comprises, in N-terminal to C-terminal order, a linker (e.g., as described herein) and an RT domain (e.g., as described herein).


In certain embodiments, the linker comprises a linker sequence as listed in Table 10, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the linker comprises a linker sequence of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the linker comprises a linker sequence of any one of SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the linker comprises a linker sequence of any one of SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the linker comprises a linker sequence of an exemplary gene modifying polypeptide listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the RT domain comprises an RT domain sequence as listed in Table 6 or F3, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the RT domain comprises an RT domain sequence of an exemplary gene modifying polypeptide listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In some embodiments, a gene modifying polypeptide comprises a portion of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, wherein the portion comprises a linker and RT domain, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion.


In some embodiments, a gene modifying polypeptide comprises a linker of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said linker. In some embodiments, a gene modifying polypeptide comprises a linker of a gene modifying polypeptide of any one of SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said linker. In some embodiments, a gene modifying polypeptide comprises a linker of a gene modifying polypeptide of any one of SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said linker. In some embodiments, a gene modifying polypeptide comprises a linker of a gene modifying polypeptide as listed in any of Tables A1, T1, or T2, or a linker comprising an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In some embodiments, a gene modifying polypeptide comprises an RT domain of a gene modifying polypeptide of any one of SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said RT domain. In some embodiments, a gene modifying polypeptide comprises an RT domain of a gene modifying polypeptide of any one of SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said RT domain. In some embodiments, a gene modifying polypeptide comprises an RT domain of a gene modifying polypeptide of any one of SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said RT domain. In some embodiments, a gene modifying polypeptide comprises an RT domain of a gene modifying polypeptide as listed in any of Tables A1, T1, or T2, or an RT domain comprising an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In certain embodiments, the linker and the RT domain of a gene modifying polypeptide comprise the amino acid sequences of a linker and RT domain (or amino acid sequences having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto) of a gene modifying polypeptide having the amino acid sequence of any one of SEQ ID NOs: 1-7743. In certain embodiments, the linker and the RT domain of a gene modifying polypeptide comprise amino acid sequences of a linker and RT domain having at least 80% identity to the linker and RT domains of any one of SEQ ID NOs: 1-7743. In certain embodiments, the linker and the RT domain of a gene modifying polypeptide comprise amino acid sequences of a linker and RT domain having at least 90% identity to the linker and RT domains of any one of SEQ ID NOs: 1-7743. In certain embodiments, the linker and the RT domain of a gene modifying polypeptide comprise amino acid sequences of a linker and RT domain having at least 95% identity to the linker and RT domains of any one of SEQ ID NOs: 1-7743. In certain embodiments, the linker and the RT domain of a gene modifying polypeptide comprise amino acid sequences of a linker and RT domain having at least 99% identity to the linker and RT domains of any one of SEQ ID NOs: 1-7743. In certain embodiments, the linker and the RT domain of a gene modifying polypeptide comprise the amino acid sequences of a linker and RT domain (or amino acid sequences having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto) of a gene modifying polypeptide having the amino acid sequence of any one of SEQ ID NOs: 6001-7743. In certain embodiments, the linker and the RT domain of a gene modifying polypeptide comprise the amino acid sequences of a linker and RT domain (or amino acid sequences having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto) of a gene modifying polypeptide having the amino acid sequence of any one of SEQ ID NOs: 4501-4541. 
In certain embodiments, the linker and the RT domain of a gene modifying polypeptide comprise the amino acid sequences of a linker and RT domain (or amino acid sequences having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto) from a single row of any of Tables A1, T1, or T2 (e.g., from a single exemplary gene modifying polypeptide as listed in any of Tables A1, T1, or T2).


In certain embodiments, the linker and the RT domain of a gene modifying polypeptide comprise the amino acid sequences of a linker and RT domain (or amino acid sequences having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto) from two different amino acid sequences selected from SEQ ID NOs: 1-7743. In certain embodiments, the linker and the RT domain of a gene modifying polypeptide comprise the amino acid sequences of a linker and RT domain (or amino acid sequences having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto) from different rows of any of Tables A1, T1, or T2.


In certain embodiments, the gene modifying polypeptide further comprises a first NLS (e.g., a 5′ NLS), e.g., as described herein. In certain embodiments, the gene modifying polypeptide further comprises a second NLS (e.g., a 3′ NLS), e.g., as described herein. In certain embodiments, the gene modifying polypeptide further comprises an N-terminal methionine residue.


RT Families and Mutants

In certain embodiments, a gene modifying polypeptide comprises the amino acid sequence of an RT domain sequence from a family selected from: AVIRE, BAEVM, FFV, FLV, FOAMV, GALV, KORV, MLVAV, MLVBM, MLVCB, MLVFF, MLVMS, PERV, SFV1, SFV3L, WMSV, XMRV6, BLVAU, BLVJ, HTL1A, HTL1C, HTL1L, HTL32, HTL3P, HTLV2, JSRV, MLVF5, MLVRD, MMTVB, MPMV, SFVCP, SMRVH, SRV1, SRV2, and WDSV. In certain embodiments, a gene modifying polypeptide comprises the amino acid sequence of an RT domain sequence from a family selected from: AVIRE, BAEVM, FFV, FLV, FOAMV, GALV, KORV, MLVAV, MLVBM, MLVCB, MLVFF, MLVMS, PERV, SFV1, SFV3L, WMSV, and XMRV6.


In certain embodiments, a gene modifying polypeptide comprises the amino acid sequence of an RT domain sequence from an MLVMS RT domain. In embodiments, the amino acid sequence of an RT domain sequence comprises one or more point mutations as listed in column 1 of Table M1, or a point mutation corresponding thereto. In embodiments, the amino acid sequence of an RT domain sequence comprises one or more point mutations as listed in column 3 of Table M1 (MLVMS), or a point mutation corresponding thereto. In embodiments, the amino acid sequence of an RT domain sequence comprises one or more point mutations at an amino acid position of the RT domain as listed in columns 1 and 2 of Table M2, or an amino acid position corresponding thereto.


In certain embodiments, a gene modifying polypeptide comprises the amino acid sequence of an RT domain sequence from an AVIRE RT domain. In embodiments, the amino acid sequence of an RT domain sequence comprises one or more point mutations as listed in column 2 of Table M1, or a point mutation corresponding thereto. In embodiments, the amino acid sequence of an RT domain sequence comprises one or more point mutations as listed in column 4 of Table M1 (AVIRE), or a point mutation corresponding thereto. In embodiments, the amino acid sequence of an RT domain sequence comprises one or more point mutations at an amino acid position of the RT domain as listed in columns 3 and 4 of Table M2, or an amino acid position corresponding thereto. In certain embodiments, the RT domain comprises an IENSSP sequence (SEQ ID NO: 37650) (e.g., at the C-terminus).









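TABLE M1
Exemplary point mutations in MLVMS and AVIRE RT domains

| RT-linker filing (MLVMS) | Corresponding AVIRE | MLVMS (PLV4921) | AVIRE (PLV10990) |
|---|---|---|---|
| | | H8Y | |
| P51L | Q51L | | |
| S67R | T67R | | |
| E67K | E67K | | |
| E69K | E69K | | |
| T197A | T197A | | |
| D200N | D200N | D200N | D200N |
| H204R | N204R | | |
| E302K | E302K | | |
| | | T306K | T306K |
| F309N | Y309N | | |
| W313F | W313F | W313F | W313F |
| T330P | G330P | T330P | G330P |
| L435G | T436G | | |
| N454K | N455K | | |
| D524G | D526G | | |
| E562Q | E564Q | | |
| D583N | D585N | | |
| H594Q | H596Q | | |
| L603W | L605W | L603W | L605W |
| D653N | D655N | | |
| L671P | L673P | | |
| | | | IENSSP (SEQ ID NO: 37650) at C-term |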
















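TABLE M2
Positions that can be mutated in exemplary MLVMS and AVIRE RT domains

WT residue & position

| MLVMS aa | MLVMS position # * | AVIRE aa | AVIRE position # * |
|---|---|---|---|
| H | 8 | Y | 8 |
| P | 51 | Q | 51 |
| S | 67 | T | 67 |
| E | 69 | E | 69 |
| T | 197 | T | 197 |
| D | 200 | D | 200 |
| H | 204 | N | 204 |
| E | 302 | E | 302 |
| T | 306 | T | 306 |
| F | 309 | Y | 309 |
| W | 313 | W | 313 |
| T | 330 | G | 330 |
| L | 435 | T | 436 |
| N | 454 | N | 455 |
| D | 524 | D | 526 |
| E | 562 | E | 564 |
| D | 583 | D | 585 |
| H | 594 | H | 596 |
| L | 603 | L | 605 |
| D | 653 | D | 655 |
| L | 671 | S | 673 |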
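As a non-limiting illustration of how Table M2 relates a position in the MLVMS RT domain to the corresponding position in the AVIRE RT domain, the following sketch transcribes the rows of Table M2 and rewrites an MLVMS point mutation in AVIRE numbering. The names `TABLE_M2` and `corresponding_avire_mutation` are hypothetical, and the sketch assumes the substituted residue is carried over unchanged.

```python
import re

# Rows of Table M2: (MLVMS residue, MLVMS position, AVIRE residue, AVIRE position).
TABLE_M2 = [
    ("H", 8, "Y", 8), ("P", 51, "Q", 51), ("S", 67, "T", 67),
    ("E", 69, "E", 69), ("T", 197, "T", 197), ("D", 200, "D", 200),
    ("H", 204, "N", 204), ("E", 302, "E", 302), ("T", 306, "T", 306),
    ("F", 309, "Y", 309), ("W", 313, "W", 313), ("T", 330, "G", 330),
    ("L", 435, "T", 436), ("N", 454, "N", 455), ("D", 524, "D", 526),
    ("E", 562, "E", 564), ("D", 583, "D", 585), ("H", 594, "H", 596),
    ("L", 603, "L", 605), ("D", 653, "D", 655), ("L", 671, "S", 673),
]

# Lookup keyed on the MLVMS wild-type residue and position.
MLVMS_TO_AVIRE = {
    (m_wt, m_pos): (a_wt, a_pos) for m_wt, m_pos, a_wt, a_pos in TABLE_M2
}


def corresponding_avire_mutation(mlvms_mutation: str) -> str:
    """Rewrite an MLVMS mutation such as 'T330P' in AVIRE numbering."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mlvms_mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mlvms_mutation!r}")
    wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
    if (wt, pos) not in MLVMS_TO_AVIRE:
        raise KeyError(f"{wt}{pos} is not listed in Table M2")
    avire_wt, avire_pos = MLVMS_TO_AVIRE[(wt, pos)]
    return f"{avire_wt}{avire_pos}{new}"


if __name__ == "__main__":
    for mutation in ("D200N", "T330P", "L603W"):
        print(mutation, "->", corresponding_avire_mutation(mutation))
```

For an RT domain not listed in Table M2, the corresponding position would instead be identified from a sequence alignment against the reference RT domain.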










In certain embodiments, a gene modifying polypeptide comprises a gamma retrovirus-derived RT domain. In certain embodiments, the gamma retrovirus-derived RT domain of a gene modifying polypeptide comprises the amino acid sequence of an RT domain sequence from a family selected from: AVIRE, BAEVM, FFV, FLV, FOAMV, GALV, KORV, MLVAV, MLVBM, MLVCB, MLVFF, MLVMS, PERV, SFV1, SFV3L, WMSV, and XMRV6. In some embodiments, the gamma retrovirus-derived RT domain of a gene modifying polypeptide is not derived from PERV. In some embodiments, said RT includes one, two, three, four, five, six or more mutations shown in Table 2A and corresponding to mutations D200N, L603W, T330P, D524G, E562Q, D583N, P51L, S67R, E67K, T197A, H204R, E302K, F309N, W313F, L435G, N454K, H594Q, L671P, E69K, or D653N in the RT domain of murine leukemia virus reverse transcriptase. In some embodiments, the gene modifying polypeptide further comprises a linker having at least 99% identity to a linker domain of any one of SEQ ID NOs: 1-7743. In some embodiments, the gene modifying polypeptide further comprises a linker having at least 99% or 100% identity to SEQ ID NO: 5217 or SEQ ID NO: 11,041.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of an AVIRE RT (e.g., an AVIRE_P03360 sequence, e.g., SEQ ID NO: 8001), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of an AVIRE RT further comprising one, two, three, four, or five mutations selected from the group consisting of D200N, G330P, L605W, T306K, and W313F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of an AVIRE RT further comprising one, two, or three mutations selected from the group consisting of D200N, G330P, and L605W, or a corresponding position in a homologous RT domain.
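As a non-limiting illustration, the phrase "one, two, or three mutations selected from the group consisting of" three listed mutations covers seven distinct combinations, and the corresponding phrase for five listed mutations covers thirty-one. The sketch below enumerates them for the AVIRE mutations recited above; the function name is hypothetical.

```python
from itertools import combinations

# Mutation groups recited for the AVIRE RT domain in the text above.
AVIRE_FIVE = ("D200N", "G330P", "L605W", "T306K", "W313F")
AVIRE_THREE = ("D200N", "G330P", "L605W")


def mutation_combinations(mutations, max_count):
    """List every combination of 1..max_count mutations from the group."""
    return [
        combo
        for count in range(1, max_count + 1)
        for combo in combinations(mutations, count)
    ]


if __name__ == "__main__":
    for combo in mutation_combinations(AVIRE_THREE, 3):
        print("+".join(combo))
    print(len(mutation_combinations(AVIRE_FIVE, 5)))
```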


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a BAEVM RT (e.g., a BAEVM_P10272 sequence, e.g., SEQ ID NO: 8004), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a BAEVM RT further comprising one, two, three, four, or five mutations selected from the group consisting of D198N, E328P, L602W, T304K, and W311F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a BAEVM RT further comprising one, two, or three mutations selected from the group consisting of D198N, E328P, and L602W, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of an FFV RT (e.g., an FFV_O93209 sequence, e.g., SEQ ID NO: 8012), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of an FFV RT further comprising one, two, three, or four mutations selected from the group consisting of D21N, T293N, T419P, and L393K, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of an FFV RT further comprising one, two, or three mutations selected from the group consisting of D21N, T293N, and T419P, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of an FFV RT further comprising the mutation D21N. In some embodiments, the RT domain comprises the amino acid sequence of an FFV RT further comprising one, two, or three mutations selected from the group consisting of T207N, T333P, and L307K, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of an FFV RT further comprising one or two mutations selected from the group consisting of T207N and T333P, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of an FLV RT (e.g., an FLV_P10273 sequence, e.g., SEQ ID NO: 8019), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of an FLV RT further comprising one, two, three, or four mutations selected from the group consisting of D199N, L602W, T305K, and W312F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of an FLV RT further comprising one or two mutations selected from the group consisting of D199N and L602W, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a FOAMV RT (e.g., a FOAMV_P14350 sequence, e.g., SEQ ID NO: 8021), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a FOAMV RT further comprising one, two, three, or four mutations selected from the group consisting of D24N, T296N, S420P, and L396K, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a FOAMV RT further comprising one, two, or three mutations selected from the group consisting of D24N, T296N, and S420P, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a FOAMV RT further comprising the mutation D24N, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a FOAMV RT further comprising one, two, or three mutations selected from the group consisting of T207N, S331P, and L307K, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a FOAMV RT further comprising one or two mutations selected from the group consisting of T207N and S331P, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a GALV RT (e.g., a GALV_P21414 sequence, e.g., SEQ ID NO: 8027), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a GALV RT further comprising one, two, three, four, or five mutations selected from the group consisting of D198N, E328P, L600W, T304K, and W311F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a GALV RT further comprising one, two, or three mutations selected from the group consisting of D198N, E328P, and L600W, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a KORV RT (e.g., a KORV_Q9TTC1 sequence, e.g., SEQ ID NO: 8047), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a KORV RT further comprising one, two, three, four, five, or six mutations selected from the group consisting of D32N, D322N, E452P, L274W, T428K, and W435F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a KORV RT further comprising one, two, three, or four mutations selected from the group consisting of D32N, D322N, E452P, and L274W, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a KORV RT further comprising the mutation D32N. In some embodiments, the RT domain comprises the amino acid sequence of a KORV RT further comprising one, two, three, four, or five mutations selected from the group consisting of D231N, E361P, L633W, T337K, and W344F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a KORV RT further comprising one, two, or three mutations selected from the group consisting of D231N, E361P, and L633W, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a MLVAV RT (e.g., an MLVAV_P03356 sequence, e.g., SEQ ID NO: 8053), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a MLVAV RT further comprising one, two, three, four, or five mutations selected from the group consisting of D200N, T330P, L603W, T306K, and W313F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a MLVAV RT further comprising one, two, or three mutations selected from the group consisting of D200N, T330P, and L603W, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a MLVBM RT (e.g., an MLVBM_Q7SVK7 sequence, e.g., SEQ ID NO: 8056), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a MLVBM RT further comprising one, two, three, four, or five mutations selected from the group consisting of D199N, T329P, L602W, T305K, and W312F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a MLVBM RT further comprising one, two, or three mutations selected from the group consisting of D200N, T330P, and L603W, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a MLVCB RT (e.g., an MLVCB_P08361 sequence, e.g., SEQ ID NO: 8062), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a MLVCB RT further comprising one, two, three, four, or five mutations selected from the group consisting of D200N, T330P, L603W, T306K, and W313F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a MLVCB RT further comprising one, two, or three mutations selected from the group consisting of D200N, T330P, and L603W, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a MLVFF RT, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a MLVFF RT further comprising one, two, three, four, or five mutations selected from the group consisting of D200N, T330P, L603W, T306K, and W313F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a MLVFF RT further comprising one, two, or three mutations selected from the group consisting of D200N, T330P, and L603W, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a MLVMS RT (e.g., an MLVMS reference sequence, e.g., SEQ ID NO: 8137; or an MLVMS_P03355 sequence, e.g., SEQ ID NO: 8070), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a MLVMS RT further comprising one, two, three, four, five, or six mutations selected from the group consisting of D200N, T330P, L603W, T306K, W313F, and H8Y, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a MLVMS RT further comprising one, two, three, four, or five mutations selected from the group consisting of D200N, T330P, L603W, T306K, and W313F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a MLVMS RT further comprising one, two, or three mutations selected from the group consisting of D200N, T330P, and L603W, or a corresponding position in a homologous RT domain.
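As a non-limiting illustration of how point mutations written in the notation used above (wild-type residue, 1-based position, substituted residue, e.g., D200N) may be applied to an RT domain sequence, the following sketch checks that the expected wild-type residue is present before substituting it. The function name and the short toy sequence are hypothetical; an actual RT domain sequence would be taken from the Sequence Listing.

```python
import re


def apply_point_mutations(sequence: str, mutations) -> str:
    """Apply mutations such as 'D200N' to a protein sequence.

    Positions are 1-based. A ValueError is raised if a mutation is
    malformed, out of range, or names a wild-type residue that does not
    match the residue actually found at that position.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation format: {mutation!r}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{mutation}: position outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(
                f"{mutation}: expected {wt} at position {pos}, "
                f"found {residues[pos - 1]}"
            )
        residues[pos - 1] = new
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical 5-residue toy sequence, not an RT domain.
    print(apply_point_mutations("MHDLT", ["H2Y", "D3N"]))
```

The wild-type check guards against applying a mutation in the wrong numbering scheme, e.g., using MLVMS numbering on an AVIRE sequence.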


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a PERV RT (e.g., a PERV_Q4VFZ2 sequence, e.g., SEQ ID NO: 8099), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a PERV RT further comprising one, two, three, four, or five mutations selected from the group consisting of D196N, E326P, L599W, T302K, and W309F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a PERV RT further comprising one, two, or three mutations selected from the group consisting of D196N, E326P, and L599W, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a SFV1 RT (e.g., an SFV1_P23074 sequence, e.g., SEQ ID NO: 8105), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a SFV1 RT further comprising one, two, three, or four mutations selected from the group consisting of D24N, T296N, N420P, and L396K, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a SFV1 RT further comprising one, two, or three mutations selected from the group consisting of D24N, T296N, and N420P, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a SFV1 RT further comprising the mutation D24N, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a SFV3L RT (e.g., an SFV3L_P27401 sequence, e.g., SEQ ID NO: 8111), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a SFV3L RT further comprising one, two, three, or four mutations selected from the group consisting of D24N, T296N, N422P, and L396K, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a SFV3L RT further comprising one, two, or three mutations selected from the group consisting of D24N, T296N, and N422P, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a SFV3L RT further comprising the mutation D24N, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a SFV3L RT further comprising one, two, or three mutations selected from the group consisting of T307N, N333P, and L307K, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a SFV3L RT further comprising one or two mutations selected from the group consisting of T307N and N333P, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a WMSV RT (e.g., a WMSV_P03359 sequence, e.g., SEQ ID NO: 8131), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a WMSV RT further comprising one, two, three, four, or five mutations selected from the group consisting of D198N, E328P, L600W, T304K, and W311F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a WMSV RT further comprising one, two, or three mutations selected from the group consisting of D198N, E328P, and L600W, or a corresponding position in a homologous RT domain.


In embodiments, the RT domain comprises the amino acid sequence of an RT domain of a XMRV6 RT (e.g., an XMRV6_A1Z651 sequence, e.g., SEQ ID NO: 8134), or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the RT domain comprises the amino acid sequence of a XMRV6 RT further comprising one, two, three, four, or five mutations selected from the group consisting of D200N, T330P, L603W, T306K, and W313F, or a corresponding position in a homologous RT domain. In some embodiments, the RT domain comprises the amino acid sequence of a XMRV6 RT further comprising one, two, or three mutations selected from the group consisting of D200N, T330P, and L603W, or a corresponding position in a homologous RT domain.


In certain embodiments, the RT domain of a gene modifying polypeptide comprises the amino acid sequence of an RT domain of an AVIRE RT, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In embodiments, the RT domain comprises the amino acid sequence of an RT domain comprised in a sequence listed in column 1 of Table A5, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide further comprises a linker having at least 99% or 100% identity to SEQ ID NO: 5217 or SEQ ID NO: 11,041.


In certain embodiments, the RT domain of a gene modifying polypeptide comprises the amino acid sequence of an RT domain of an MLVMS RT, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In embodiments, the RT domain comprises the amino acid sequence of an RT domain comprised in a sequence listed in any of columns 2-6 of Table A5, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gene modifying polypeptide further comprises a linker having at least 99% or 100% identity to SEQ ID NO: 5217 or SEQ ID NO: 11,041.









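TABLE A5
Exemplary gene modifying polypeptides comprising an AVIRE RT domain or an MLVMS RT domain.

| AVIRE SEQ ID NOs: | MLVMS SEQ ID NOs: | MLVMS SEQ ID NOs: | MLVMS SEQ ID NOs: | MLVMS SEQ ID NOs: | MLVMS SEQ ID NOs: |
|---|---|---|---|---|---|
| 1 | 2704 | 3007 | 3038 | 2638 | 2930 |
| 2 | 2706 | 3007 | 3038 | 2639 | 2930 |
| 3 | 2708 | 3008 | 3039 | 2639 | 2931 |
| 4 | 2709 | 3008 | 3039 | 2640 | 2931 |
| 5 | 2709 | 3009 | 3040 | 2640 | 2932 |
| 6 | 2710 | 3010 | 3040 | 2641 | 2932 |
| 7 | 2957 | 3010 | 3041 | 2641 | 2933 |
| 9 | 2957 | 3011 | 3041 | 2642 | 2933 |
| 10 | 2958 | 3012 | 3042 | 2642 | 2934 |
| 12 | 2959 | 3012 | 3042 | 2643 | 2934 |
| 13 | 2960 | 3013 | 3043 | 2643 | 2935 |
| 14 | 2962 | 3013 | 3043 | 2644 | 2935 |
| 6076 | 6042 | 3014 | 3044 | 2644 | 2936 |
| 6143 | 6068 | 3014 | 3044 | 2645 | 2936 |
| 6200 | 6097 | 3015 | 3045 | 2645 | 2937 |
| 6254 | 6136 | 3015 | 3045 | 2646 | 2937 |
| 6274 | 6156 | 3016 | 3046 | 2646 | 2938 |
| 6315 | 6215 | 3016 | 3046 | 2647 | 2938 |
| 6328 | 6216 | 3017 | 3047 | 2647 | 2939 |
| 6337 | 6301 | 3018 | 3047 | 2648 | 2939 |
| 6403 | 6352 | 3018 | 3048 | 2648 | 2940 |
| 6420 | 6365 | 3019 | 3048 | 2649 | 2940 |
| 6440 | 6411 | 3019 | 3049 | 2649 | 2941 |
| 6513 | 6436 | 3020 | 3049 | 2650 | 2941 |
| 6552 | 6458 | 3020 | 3050 | 2650 | 2942 |
| 6613 | 6459 | 3021 | 3051 | 2651 | 2942 |
| 6671 | 6524 | 3021 | 3051 | 2651 | 2943 |
| 6822 | 6562 | 3022 | 3052 | 2652 | 2943 |
| 6840 | 6563 | 3023 | 3052 | 2652 | 2944 |
| 6884 | 6699 | 3023 | 3053 | 2653 | 2945 |
| 6907 | 6865 | 3024 | 3053 | 2653 | 2945 |
| 6970 | 7022 | 3024 | 3054 | 2654 | 2946 |
| 7025 | 7037 | 3025 | 3054 | 2655 | 2946 |
| 7052 | 7088 | 3025 | 3055 | 2655 | 2947 |
| 7078 | 7116 | 3026 | 3055 | 2656 | 2947 |
| 7243 | 7175 | 3026 | 3056 | 2656 | 2948 |
| 7253 | 7200 | 3027 | 3056 | 2657 | 2948 |
| 7318 | 7206 | 3027 | 3057 | 2657 | 2949 |
| 7379 | 7277 | 3028 | 3057 | 2658 | 2949 |
| 7486 | 7294 | 3028 | 3058 | 2658 | 2950 |
| 7524 | 7330 | 3029 | 3058 | 2659 | 2950 |
| 7668 | 7411 | 3030 | 3059 | 2659 | 2951 |
| 7680 | 7455 | 3030 | 3059 | 2660 | 2951 |
| 7720 | 7477 | 3031 | 3060 | 2660 | 2952 |
| 1137 | 7511 | 3031 | 3060 | 2661 | 2952 |
| 1138 | 7538 | 3032 | 3061 | 2661 | 2953 |
| 1139 | 7559 | 3032 | 3061 | 2662 | 2953 |
| 1140 | 7560 | 3033 | 3062 | 2662 | 2954 |
| 1141 | 7593 | 3033 | 3062 | 2663 | 2954 |
| 1142 | 7594 | 3034 | 3063 | 2663 | 2955 |
| 1143 | 7607 | 3034 | 3063 | 2664 | 2955 |
| 1144 | 7623 | 6025 | 3064 | 2664 | 6485 |
| 1145 | 7638 | 6041 | 3064 | 2665 | 6486 |
| 1146 | 7717 | 6043 | 3065 | 2665 | 6504 |
| 1147 | 7731 | 6098 | 3065 | 2666 | 6505 |
| 1148 | 7732 | 6099 | 3066 | 2666 | 6595 |
| 1149 | 2711 | 6180 | 3066 | 2667 | 6596 |
| 1150 | 2711 | 6182 | 3067 | 2667 | 6751 |
| 1151 | 2712 | 6237 | 3067 | 2668 | 6752 |
| 1152 | 2712 | 6238 | 3068 | 2668 | 6777 |
| 1153 | 2713 | 6311 | 3068 | 2669 | 6778 |
| 1154 | 2713 | 6312 | 3069 | 2669 | 7172 |
| 1155 | 2714 | 6578 | 3069 | 2670 | 7174 |
| 1156 | 2714 | 6579 | 3070 | 2670 | 7313 |
| 1157 | 2715 | 6663 | 3070 | 2671 | 7314 |
| 1158 | 2715 | 6664 | 3071 | 2671 | |
| 1159 | 2716 | 6708 | 3071 | 2672 | |
| 1160 | 2716 | 6709 | 3072 | 2672 | |
| 1161 | 2717 | 6809 | 3072 | 2673 | |
| 1162 | 2717 | 6831 | 3073 | 2673 | |
| 1163 | 2718 | 6832 | 3073 | 2674 | |
| 1164 | 2718 | 6864 | 3074 | 2674 | |
| 1165 | 2719 | 6866 | 3074 | 2675 | |
| 1166 | 2719 | 7089 | 3075 | 2675 | |
| 1167 | 2720 | 7157 | 3075 | 2676 | |
| 6015 | 2720 | 7159 | 3076 | 2676 | |
| 6029 | 2721 | 7173 | 3076 | 2677 | |
| 6045 | 2721 | 7176 | 3077 | 2677 | |
| 6077 | 2722 | 7293 | 3077 | 2678 | |
| 6129 | 2722 | 7295 | 3078 | 2678 | |
| 6144 | 2723 | 7343 | 3078 | 2679 | |
| 6164 | 2723 | 7393 | 3079 | 2680 | |
| 6201 | 2724 | 7394 | 3079 | 2680 | |
| 6227 | 2724 | 7425 | 3080 | 2681 | |
| 6244 | 2725 | 7426 | 3080 | 2681 | |
| 6250 | 2725 | 7444 | 3081 | 2682 | |
| 6264 | 2726 | 7445 | 3081 | 2682 | |
| 6289 | 2726 | 7476 | 3082 | 2683 | |
| 6304 | 2727 | 7478 | 3082 | 2683 | |
| 6316 | 2727 | 7496 | 3083 | 2684 | |
| 6384 | 2728 | 7497 | 3083 | 2684 | |
| 6421 | 2728 | 7537 | 3084 | 2685 | |
| 6441 | 2729 | 7539 | 3084 | 2685 | |
| 6492 | 2729 | 2780 | 3085 | 2686 | |
| 6514 | 2730 | 2780 | 3085 | 2686 | |
| 6530 | 2730 | 2781 | 3086 | 2687 | |
| 6569 | 2731 | 2781 | 3086 | 2687 | |
| 6584 | 2731 | 2782 | 3087 | 2688 | |
| 6621 | 2732 | 2782 | 3087 | 2688 | |
| 6651 | 2732 | 2783 | 3088 | 2689 | |
| 6659 | 2733 | 2783 | 3088 | 2689 | |
| 6683 | 2734 | 2784 | 3089 | 2690 | |
| 6703 | 2734 | 2784 | 3089 | 2690 | |
| 6727 | 2735 | 2785 | 3090 | 2691 | |
| 6732 | 2735 | 2785 | 3090 | 2692 | |
| 6745 | 2736 | 2786 | 3091 | 2692 | |
| 6755 | 2736 | 2786 | 3091 | 2693 | |
| 6784 | 2737 | 2787 | 3092 | 2693 | |
| 6817 | 2737 | 2787 | 3092 | 2694 | |
| 6823 | 2738 | 2788 | 3093 | 2694 | |
| 6841 | 2739 | 2788 | 3093 | 2695 | |
| 6871 | 2740 | 2789 | 3094 | 2695 | |
| 6885 | 2740 | 2789 | 3095 | 2696 | |
| 6898 | 2741 | 2790 | 3095 | 2696 | |
| 6908 | 2741 | 2790 | 3096 | 2697 | |
| 6933 | 2742 | 2791 | 3096 | 2697 | |
| 6971 | 2742 | 2791 | 3097 | 2698 | |
| 7009 | 2743 | 2792 | 3097 | 2698 | |
| 7018 | 2743 | 2792 | 3098 | 2699 | |
| 7045 | 2744 | 2793 | 3098 | 2699 | |
| 7053 | 2744 | 2793 | 3099 | 2700 | |
| 7068 | 2745 | 2794 | 3099 | 2700 | |
| 7079 | 2745 | 2794 | 3100 | 2701 | |
| 7096 | 2746 | 2795 | 3100 | 2701 | |
| 7104 | 2746 | 2795 | 3101 | 2702 | |
| 7122 | 2747 | 2796 | 3101 | 2702 | |
| 7151 | 2747 | 2796 | 3102 | 2703 | |
| 7163 | 2748 | 2797 | 3102 | 2703 | |
| 7181 | 2748 | 2797 | 3103 | 2862 | |
| 7244 | 2749 | 2798 | 3103 | 2862 | |
| 7273 | 2750 | 2798 | 3104 | 2863 | |
| 7319 | 2750 | 2799 | 3104 | 2863 | |
| 7336 | 2751 | 2799 | 3105 | 2864 | |
| 7380 | 2751 | 2800 | 3105 | 2864 | |
| 7402 | 2752 | 2800 | 3106 | 2865 | |
| 7462 | 2752 | 2801 | 3106 | 2865 | |
| 7487 | 2753 | 2801 | 3107 | 2866 | |
| 7525 | 2753 | 2802 | 3107 | 2866 | |
| 7569 | 2754 | 2802 | 3108 | 2867 | |
| 7626 | 2754 | 2803 | 3108 | 2867 | |
| 7689 | 2755 | 2803 | 3109 | 2868 | |
| 7707 | 2755 | 2804 | 3109 | 2868 | |
| 7721 | 2756 | 2804 | 3110 | 2869 | |
| 1371 | 2756 | 2805 | 3110 | 2869 | |
| 1372 | 2757 | 2805 | 3111 | 2870 | |
| 1373 | 2758 | 2806 | 3111 | 2870 | |
| 1374 | 2758 | 2806 | 3112 | 2871 | |
| 1375 | 2759 | 2807 | 3112 | 2871 | |
| 1376 | 2759 | 2807 | 3113 | 2872 | |
| 1377 | 2760 | 2808 | 3113 | 2872 | |
| 1378 | 2760 | 2808 | 3114 | 2873 | |
| 1379 | 2761 | 2809 | 3114 | 2873 | |
| 1380 | 2761 | 2809 | 3115 | 2874 | |
| 1381 | 2762 | 2810 | 3115 | 2874 | |
| 1382 | 2762 | 2810 | 3116 | 2875 | |
| 1383 | 2763 | 2811 | 3116 | 2875 | |
| 1384 | 2763 | 2811 | 3117 | 2876 | |
| 1385 | 2764 | 2812 | 3117 | 2876 | |
| 1386 | 2764 | 2812 | 3118 | 2877 | |
| 1387 | 2765 | 2813 | 3118 | 2877 | |
| 1388 | 2765 | 2813 | 3119 | 2878 | |
| 1389 | 2766 | 2814 | 3119 | 2878 | |
| 1390 | 2766 | 2814 | 3120 | 2879 | |
| 1391 | 2767 | 2815 | 3120 | 2879 | |
| 1392 | 2767 | 2815 | 3121 | 2880 | |
| 1393 | 2768 | 2816 | 3121 | 2880 | |
| 1394 | 2768 | 2816 | 3122 | 2881 | |
| 1395 | 2769 | 2817 | 3122 | 2881 | |
| 1396 | 2769 | 2817 | 3123 | 2882 | |
| 1397 | 2770 | 2818 | 3123 | 2882 | |
| 1398 | 2770 | 2818 | 3124 | 2883 | |
| 1399 | 2771 | 2819 | 3124 | 2883 | |
| 1400 | 2771 | 2819 | 3125 | 2884 | |
| 1401 | 2772 | 2820 | 3125 | 2884 | |
| 1402 | 2773 | 2820 | 3126 | 2885 | |
| 1403 | 2773 | 2821 | 3126 | 2885 | |
| 1404 | 2774 | 2821 | 3127 | 2886 | |
| 1405 | 2774 | 2822 | 3127 | 2886 | |
| 1406 | 2775 | 2822 | 3128 | 2887 | |
| 1407 | 2775 | 2823 | 3128 | 2887 | |
| 1408 | 2776 | 2823 | 3129 | 2888 | |
| 1409 | 2776 | 2824 | 3129 | 2888 | |
| 1410 | 2777 | 2824 | 3130 | 2889 | |
| 1411 | 2777 | 2825 | 3130 | 2889 | |
| 1412 | 2778 | 2825 | 3131 | 2890 | |
| 1413 | 2779 | 2826 | 3131 | 2890 | |
| 1414 | 2779 | 2826 | 3132 | 2891 | |
| 1415 | 2965 | 2827 | 3133 | 2891 | |
| 1416 | 2965 | 2827 | 3133 | 2892 | |
| 1417 | 2966 | 2828 | 3134 | 2893 | |
| 1418 | 2966 | 2828 | 3134 | 2893 | |
| 1419 | 2967 | 2829 | 3135 | 2894 | |
| 1420 | 2968 | 2829 | 3135 | 2894 | |
| 1421 | 2968 | 2830 | 3136 | 2895 | |
| 1422 | 2969 | 2830 | 3136 | 2895 | |
| 1423 | 2969 | 2831 | 6181 | 2896 | |
| 1424 | 2970 | 2831 | 6183 | 2896 | |
| 1425 | 2970 | 2832 | 6284 | 2897 | |
| 1426 | 2971 | 2832 | 6285 | 2897 | |
| 1427 | 2971 | 2833 | 6760 | 2898 | |
| 1428 | 2972 | 2833 | 6761 | 2898 | |
| 1429 | 2972 | 2834 | 7036 | 2899 | |
| 1430 | 2973 | 2834 | 7038 | 2899 | |
| 1431 | 2974 | 2835 | 7158 | 2900 | |
| 1432 | 2974 | 2835 | 7160 | 2900 | |
| 1433 | 2975 | 2836 | 2610 | 2901 | |
| 1434 | 2976 | 2836 | 2610 | 2901 | |
| 1435 | 2976 | 2837 | 2611 | 2902 | |
| 1436 | 2977 | 2837 | 2611 | 2902 | |
| 1437 | 2977 | 2838 | 2612 | 2903 | |
| 1439 | 2978 | 2838 | 2612 | 2903 | |
| 1440 | 2978 | 2839 | 2613 | 2904 | |
| 1441 | 2979 | 2839 | 2613 | 2904 | |
| 1442 | 2979 | 2840 | 2614 | 2905 | |
| 1443 | 2980 | 2840 | 2614 | 2905 | |
| 1444 | 2980 | 2841 | 2615 | 2906 | |
| 1445 | 2981 | 2841 | 2615 | 2906 | |
| 1446 | 2981 | 2842 | 2616 | 2907 | |
| 1447 | 2982 | 2842 | 2616 | 2907 | |
| 6001 | 2982 | 2843 | 2617 | 2908 | |
| 6030 | 2983 | 2843 | 2617 | 2908 | |
| 6078 | 2983 | 2844 | 2618 | 2909 | |
| 6108 | 2984 | 2844 | 2618 | 2909 | |
| 6130 | 2985 | 2845 | 2619 | 2910 | |
| 6165 | 2985 | 2845 | 2619 | 2910 | |
| 6265 | 2986 | 2846 | 2620 | 2911 | |
| 6275 | 2987 | 2846 | 2620 | 2911 | |
| 6305 | 2987 | 2847 | 2621 | 2912 | |
| 6329 | 2988 | 2847 | 2621 | 2912 | |
| 6370 | 2988 | 2848 | 2622 | 2913 | |
| 6385 | 2989 | 2848 | 2622 | 2913 | |
| 6404 | 2989 | 2849 | 2623 | 2914 | |
| 6531 | 2990 | 2849 | 2623 | 2914 | |
| 6585 | 2990 | 2850 | 2624 | 2915 | |
| 6622 | 2991 | 2850 | 2624 | 2915 | |
| 6652 | 2991 | 2851 | 2625 | 2916 | |
| 6733 | 2992 | 2851 | 2625 | 2916 | |
| 6756 | 2992 | 2852 | 2626 | 2917 | |
| 6765 | 2993 | 2852 | 2626 | 2917 | |
| 6798 | 2993 | 2853 | 2627 | 2918 | |
| 6824 | 2994 | 2853 | 2627 | 2919 | |
| 6972 | 2994 | 2854 | 2628 | 2919 | |
| 7046 | 2995 | 2854 | 2628 | 2920 | |
| 7054 | 2995 | 2855 | 2629 | 2920 | |
| 7069 | 2996 | 2855 | 2629 | 2921 | |
| 7080 | 2996 | 2856 | 2630 | 2921 | |
| 7105 | 2997 | 2856 | 2630 | 2922 | |
| 7123 | 2998 | 2857 | 2631 | 2922 | |
| 7143 | 2998 | 2857 | 2631 | 2923 | |
| 7152 | 2999 | 2858 | 2632 | 2923 | |
| 7204 | 2999 | 2858 | 2632 | 2924 | |
| 7320 | 3001 | 2859 | 2633 | 2924 | |
| 7351 | 3001 | 2859 | 2633 | 2925 | |
| 7381 | 3002 | 2860 | 2634 | 2925 | |
| 7403 | 3002 | 2860 | 2634 | 2926 | |
| 7438 | 3003 | 2861 | 2635 | 2926 | |
| 7488 | 3003 | 2861 | 2635 | 2927 | |
| 7500 | 3004 | 3035 | 2636 | 2927 | |
| 7526 | 3004 | 3036 | 2636 | 2928 | |
| 7588 | 3005 | 3036 | 2637 | 2928 | |
| 7612 | 3005 | 3037 | 2637 | 2929 | |
| 7627 | 3006 | 3037 | 2638 | 2929 | |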









Systems

In an aspect, the disclosure relates to a system comprising a nucleic acid molecule encoding a gene modifying polypeptide (e.g., as described herein) and a template nucleic acid (e.g., a template RNA, e.g., as described herein). In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises one or more silent mutations in the coding region (e.g., in the sequence encoding the RT domain) relative to a nucleic acid molecule as described herein. In certain embodiments, the system further comprises a gRNA (e.g., a gRNA that binds to a polypeptide that induces a nick, e.g., in the opposite strand of the target DNA bound by the gene modifying polypeptide).
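As a non-limiting illustration of what it means for a coding region to differ from a reference only by silent mutations, the following sketch translates two in-frame coding sequences with the standard genetic code and reports the codons that differ while confirming that the encoded amino acid sequence is unchanged. The function names and the 12-nucleotide toy sequences are hypothetical, and the sketch assumes both sequences are the same length and in frame.

```python
from itertools import product

# Standard genetic code, codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def normalize(cds: str) -> str:
    """Upper-case the sequence and express RNA (U) as DNA (T)."""
    return cds.upper().replace("U", "T")


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    cds = normalize(cds)
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def silent_codon_changes(reference: str, variant: str) -> list:
    """Return 1-based indices of codons that differ between the sequences.

    Raises ValueError if the two sequences do not encode the same
    amino acid sequence, i.e., if any difference is not silent.
    """
    reference, variant = normalize(reference), normalize(variant)
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    if translate(reference) != translate(variant):
        raise ValueError("variant changes the encoded amino acid sequence")
    return [
        i // 3 + 1
        for i in range(0, len(reference), 3)
        if reference[i:i + 3] != variant[i:i + 3]
    ]


if __name__ == "__main__":
    # Hypothetical toy sequences, both encoding M-D-L-W.
    print(silent_codon_changes("ATGGACCTGTGG", "ATGGATCTCTGG"))
```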


In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide encodes a polypeptide having an amino acid sequence selected from SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide encodes a polypeptide having an amino acid sequence selected from SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide encodes a polypeptide having an amino acid sequence selected from SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide encodes a polypeptide as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding a portion of an amino acid sequence selected from SEQ ID NOs: 1-7743, wherein the portion comprises a linker and RT domain, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding a portion of an amino acid sequence selected from SEQ ID NOs: 6001-7743, wherein the portion comprises a linker and RT domain, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding a portion of an amino acid sequence selected from SEQ ID NOs: 4501-4541, wherein the portion comprises a linker and RT domain, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding a portion of a polypeptide listed in any of Tables A1, T1, or T2, wherein the portion comprises a linker and RT domain, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion.


In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the linker of an amino acid sequence selected from SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the linker of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the linker of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the linker of a polypeptide as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the RT domain of an amino acid sequence selected from SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the RT domain of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the RT domain of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the nucleic acid molecule encoding the gene modifying polypeptide comprises a sequence encoding the RT domain of a polypeptide as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In an aspect, the disclosure relates to a system comprising a gene modifying polypeptide (e.g., as described herein) and a template nucleic acid (e.g., a template RNA, e.g., as described herein).


In certain embodiments, the gene modifying polypeptide comprises a polypeptide having an amino acid sequence selected from SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises a polypeptide having an amino acid sequence selected from SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises a polypeptide having an amino acid sequence selected from SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises a polypeptide as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In certain embodiments, the gene modifying polypeptide comprises a portion of an amino acid sequence selected from SEQ ID NOs: 1-7743, wherein the portion comprises a linker and RT domain, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion. In certain embodiments, the gene modifying polypeptide comprises a portion of an amino acid sequence selected from SEQ ID NOs: 6001-7743, wherein the portion comprises a linker and RT domain, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion. In certain embodiments, the gene modifying polypeptide comprises a portion of an amino acid sequence selected from SEQ ID NOs: 4501-4541, wherein the portion comprises a linker and RT domain, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion. In certain embodiments, the gene modifying polypeptide comprises a portion of a polypeptide listed in any of Tables A1, T1, or T2, wherein the portion comprises a linker and RT domain, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity to said portion.


In certain embodiments, the gene modifying polypeptide comprises the linker of an amino acid sequence selected from SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises the linker of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises the linker of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises the linker of a polypeptide as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.


In certain embodiments, the gene modifying polypeptide comprises the RT domain of an amino acid sequence selected from SEQ ID NOs: 1-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises the RT domain of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 6001-7743, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises the RT domain of a polypeptide having an amino acid sequence selected from SEQ ID NOs: 4501-4541, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto. In certain embodiments, the gene modifying polypeptide comprises the RT domain of a polypeptide as listed in any of Tables A1, T1, or T2, or an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, or 99% identity thereto.
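The identity thresholds recited above (at least 70%, 75%, 80%, 85%, 90%, 95%, or 99%) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it uses one common convention (matching columns divided by aligned columns); the disclosure does not prescribe a particular alignment tool or identity formula, so the function names and the convention here are illustrative assumptions only.

```python
from typing import List

# Thresholds recited in the embodiments above.
THRESHOLDS = (70, 75, 80, 85, 90, 95, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns (assumed convention).

    Inputs must be pre-aligned to equal length, with '-' as the gap symbol.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> List[int]:
    """Return every recited threshold that the identity value satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Two short illustrative peptides differing at one of ten positions.
    pid = percent_identity("PKKKRKVEAS", "PKKKRKVEAA")
    print(pid, thresholds_met(pid))
```

Other identity conventions (for example, dividing by the length of the shorter sequence) would give different values for gapped alignments, so the convention should be stated whenever a threshold is applied.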










Lengthy table referenced here




US20240093186A1-20240321-T00001


Please refer to the end of the specification for access instructions.






Localization Sequences for Gene Modifying Systems


In certain embodiments, a gene editor system RNA further comprises an intracellular localization sequence, e.g., a nuclear localization sequence (NLS). In some embodiments, a gene modifying polypeptide comprises an NLS as comprised in SEQ ID NO: 4000 and/or SEQ ID NO: 4001, or an NLS having an amino acid sequence having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


The nuclear localization sequence may be an RNA sequence that promotes the import of the RNA into the nucleus. In certain embodiments the nuclear localization signal is located on the template RNA. In certain embodiments, the gene modifying polypeptide is encoded on a first RNA, and the template RNA is a second, separate, RNA, and the nuclear localization signal is located on the template RNA and not on an RNA encoding the gene modifying polypeptide. While not wishing to be bound by theory, in some embodiments, the RNA encoding the gene modifying polypeptide is targeted primarily to the cytoplasm to promote its translation, while the template RNA is targeted primarily to the nucleus to promote insertion into the genome. In some embodiments the nuclear localization signal is at the 3′ end, 5′ end, or in an internal region of the template RNA. In some embodiments the nuclear localization signal is 3′ of the heterologous sequence (e.g., is directly 3′ of the heterologous sequence) or is 5′ of the heterologous sequence (e.g., is directly 5′ of the heterologous sequence). In some embodiments the nuclear localization signal is placed outside of the 5′ UTR or outside of the 3′ UTR of the template RNA. In some embodiments the nuclear localization signal is placed between the 5′ UTR and the 3′ UTR, wherein optionally the nuclear localization signal is not transcribed with the transgene (e.g., the nuclear localization signal is in an anti-sense orientation or is downstream of a transcriptional termination signal or polyadenylation signal). In some embodiments the nuclear localization sequence is situated inside of an intron. In some embodiments a plurality of the same or different nuclear localization signals are in the RNA, e.g., in the template RNA. In some embodiments the nuclear localization signal is less than 5, 10, 25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900 or 1000 bp in length. Various RNA nuclear localization sequences can be used. For example, Lubelsky and Ulitsky, Nature 555 (107-111), 2018 describe RNA sequences which drive RNA localization into the nucleus. In some embodiments, the nuclear localization signal is a SINE-derived nuclear RNA localization (SIRLOIN) signal. In some embodiments the nuclear localization signal binds a nuclear-enriched protein. In some embodiments the nuclear localization signal binds the HNRNPK protein. In some embodiments the nuclear localization signal is rich in pyrimidines, e.g., is a C/T rich, C/U rich, C rich, T rich, or U rich region. In some embodiments the nuclear localization signal is derived from a long non-coding RNA. In some embodiments the nuclear localization signal is derived from MALAT1 long non-coding RNA or is the 600 nucleotide M region of MALAT1 (described in Miyagawa et al., RNA 18, (738-751), 2012). In some embodiments the nuclear localization signal is derived from BORG long non-coding RNA or is an AGCCC motif (described in Zhang et al., Molecular and Cellular Biology 34, 2318-2329 (2014)). In some embodiments the nuclear localization sequence is described in Shukla et al., The EMBO Journal e98452 (2018). In some embodiments the nuclear localization signal is derived from a retrovirus.
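As a rough illustration of two of the sequence features mentioned above (pyrimidine richness and the AGCCC motif), the following sketch computes the pyrimidine fraction of a candidate RNA region and lists the positions of a motif. The function names and the example sequence are hypothetical, and the disclosure does not define a numeric cutoff for "rich", so none is assumed here.

```python
from typing import List


def pyrimidine_fraction(seq: str) -> float:
    """Fraction of C, U, or T bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    pyrimidines = sum(seq.count(base) for base in "CUT")
    return pyrimidines / len(seq)


def find_motif(seq: str, motif: str = "AGCCC") -> List[int]:
    """Return 0-based start positions of every (possibly overlapping) motif hit.

    U and T are treated as equivalent so RNA and DNA spellings both work.
    """
    seq = seq.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    example = "GGAGCCCUUAGCCC"  # hypothetical sequence, for illustration only
    print(pyrimidine_fraction(example))
    print(find_motif(example))
```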


In some embodiments, a polypeptide described herein comprises one or more (e.g., 2, 3, 4, 5) nuclear targeting sequences, for example a nuclear localization sequence (NLS). In some embodiments, the NLS is a bipartite NLS. In some embodiments, an NLS facilitates the import of a protein comprising an NLS into the cell nucleus. In some embodiments, the NLS is fused to the N-terminus of a gene modifying polypeptide as described herein. In some embodiments, the NLS is fused to the C-terminus of the gene modifying polypeptide. In some embodiments, the NLS is fused to the N-terminus or the C-terminus of a Cas domain. In some embodiments, a linker sequence is disposed between the NLS and the neighboring domain of the gene modifying polypeptide.
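The N-terminal and C-terminal fusion arrangements described above, with an optional linker between the NLS and the neighboring domain, can be sketched as simple string concatenation. The NLS used is the sequence PKKKRKV; the core sequence and the "GGS" linker are placeholders chosen purely for illustration and are not taken from the disclosure.

```python
def fuse_nls(core: str, nls: str, terminus: str = "N", linker: str = "") -> str:
    """Return the amino acid sequence of `core` with `nls` fused at one terminus.

    terminus: "N" places the NLS before the core, "C" places it after.
    linker:   optional sequence disposed between the NLS and the core.
    """
    if terminus == "N":
        return nls + linker + core
    if terminus == "C":
        return core + linker + nls
    raise ValueError("terminus must be 'N' or 'C'")


if __name__ == "__main__":
    core = "ACDEFGH"   # placeholder for a gene modifying polypeptide sequence
    nls = "PKKKRKV"    # monopartite NLS sequence
    print(fuse_nls(core, nls, terminus="N", linker="GGS"))
    print(fuse_nls(core, nls, terminus="C", linker="GGS"))
```

Calling the function twice, once per terminus, models a polypeptide carrying NLS copies at both ends.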


In some embodiments, an NLS comprises the amino acid sequence MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 5009), PKKRKVEGADKRTADGSEFESPKKKRKV (SEQ ID NO: 5010), RKSGKIAAIWKRPRKPKKKRKV (SEQ ID NO: 5011), KRTADGSEFESPKKKRKV (SEQ ID NO: 5012), KKTELQTTNAENKTKKL (SEQ ID NO: 5013), KRGINDRNFWRGENGRKTR (SEQ ID NO: 5014), KRPAATKKAGQAKKKK (SEQ ID NO: 5015), PAAKRVKLD (SEQ ID NO: 4644), KRTADGSEFEKRTADGSEFESPKKKAKVE (SEQ ID NO: 4649), KRTADGSEFE (SEQ ID NO: 4650), KRTADGSEFESPKKKAKVE (SEQ ID NO: 11098), or AGKRTADGSEFEKRTADGSEFESPKKKAKVE (SEQ ID NO: 4001), or a functional fragment or variant thereof. Exemplary NLS sequences are also described in PCT/EP2000/011690, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences. In some embodiments, an NLS comprises an amino acid sequence as disclosed in Table 11. An NLS of this table may be utilized with one or more copies in a polypeptide in one or more locations in a polypeptide, e.g., 1, 2, 3 or more copies of an NLS in an N-terminal domain, between peptide domains, in a C-terminal domain, or in a combination of locations, in order to improve subcellular localization to the nucleus. Multiple unique sequences may be used within a single polypeptide. Sequences may be naturally monopartite or bipartite, e.g., having one or two stretches of basic amino acids, or may be used as chimeric bipartite sequences. Sequence references correspond to UniProt accession numbers, except where indicated as SeqNLS for sequences mined using a subcellular localization prediction algorithm (Lin et al., BMC Bioinformatics 13:157 (2012), incorporated herein by reference in its entirety).









TABLE 11
Exemplary nuclear localization signals for use in gene modifying systems

Sequence | Sequence References | SEQ ID No.
AHFKISGEKRPSTDPGKKAKNPKKKKKKDP | Q76IQ7 | 5223
AHRAKKMSKTHA | P21827 | 5224
ASPEYVNLPINGNG | SeqNLS | 5225
CTKRPRW | O88622, Q86W56, Q9QYM2, O02776 | 5226
DKAKRVSRNKSEKKRR | O15516, Q5RAK8, Q91YB2, Q91YB0, Q8QGQ6, O08785, Q9WVS9, Q6YGZ4 | 5227
EELRLKEELLKGIYA | Q9QY16, Q9UHL0, Q2TBP1, Q9QY15 | 5228
EEQLRRRKNSRLNNTG | G5EFF5 | 5229
EVLKVIRTGKRKKKAWKRMVTKVC | SeqNLS | 5230
HHHHHHHHHHHHQPH | Q63934, G3V7L5, Q12837 | 5231
HKKKHPDASVNFSEFSK | P10103, Q4R844, P12682, B0CM99, A9RA84, Q6YKA4, P09429, P63159, Q08IE6, P63158, Q9YH06, B1MTB0 | 5232
HKRTKK | Q2R2D5 | 5233
IINGRKLKLKKSRRRSSQTSNNSFTSRRS | SeqNLS | 5234
KAEQERRK | Q8LH59 | 5235
KEKRKRREELFIEQKKRK | SeqNLS | 5236
KKGKDEWFSRGKKP | P30999 | 5237
KKGPSVQKRKKT | Q6ZN17 | 5238
KKKTVINDLLHYKKEK | SeqNLS, P32354 | 5239
KKNGGKGKNKPSAKIKK | SeqNLS | 5240
KKPKWDDFKKKKK | Q15397, Q8BKS9, Q562C7 | 5241
KKRKKD | SeqNLS, Q91Z62, Q1A730, Q969P5, Q2KHT6, Q9CPU7 | 5242
KKRRKRRRK | SeqNLS | 5243
KKRRRRARK | Q9UMS6, D4A702, Q91YE8 | 5244
KKSKRGR | Q9UBS0 | 5245
KKSRKRGS | B4FG96 | 5246
KKSTALSRELGKIMRRR | SeqNLS, P32354 | 5247
KKSYQDPEIIAHSRPRK | Q9U7C9 | 5248
KKTGKNRKLKSKRVKTR | Q9Z301, O54943, Q8K3T2 | 5249
KKVSIAGQSGKLWRWKR | Q6YUL8 | 5250
KKYENVVIKRSPRKRGRPRK | SeqNLS | 5251
KNKKRK | SeqNLS | 5252
KPKKKR | SeqNLS | 5253
KRAMKDDSHGNSTSPKRRK | Q0E671 | 5254
KRANSNLVAAYEKAKKK | P23508 | 5255
KRASEDTTSGSPPKKSSAGPKR | Q9BZZ5, Q5R644 | 5256
KRFKRRWMVRKMKTKK | SeqNLS | 5257
KRGLNSSFETSPKKVK | Q8IV63 | 5258
KRGNSSIGPNDLSKRKQRKK | SeqNLS | 5259
KRIHSVSLSQSQIDPSKKVKRAK | SeqNLS | 5260
KRKGKLKNKGSKRKK | O15381 | 5261
KRRRRRRREKRKR | Q96GM8 | 5262
KRSNDRTYSPEEEKQRRA | Q91ZF2 | 5263
KRTVATNGDASGAHRAKKMSK | SeqNLS | 5264
KRVYNKGEDEQEHLPKGKKR | SeqNLS | 5265
KSGKAPRRRAVSMDNSNK | Q9WVH4, O43524 | 5266
KVNFLDMSLDDIIIYKELE | Q9P127 | 5267
KVQHRIAKKTTRRRR | Q9DXE6 | 5268
LSPSLSPL | Q9Y261, P32182, P35583 | 5269
MDSLLMNRRKFLYQFKNVRWAKGRRETYLC | Q9GZX7 | 5270
MPQNEYIELHRKRYGYRLDYHEKKRKKESREAHERSKKAKKMIGLKAKLYHK | SeqNLS | 5271
MVQLRPRASR | SeqNLS | 5272
NNKLLAKRRKGGASPKDDPMDDIK | Q965G5 | 5273
NYKRPMDGTYGPPAKRHEGE | O14497, A2BH40 | 5274
PDTKRAKLDSSETTMVKKK | SeqNLS | 5275
PEKRTKI | SeqNLS | 5276
PGGRGKKK | Q719N1, Q9UBP0, A2VDN5 | 5277
PGKMDKGEHRQERRDRPY | Q01844, Q61545 | 5278
PKKGDKYDKTD | Q45FA5 | 5279
PKKKSRK | O35914, Q01954 | 5280
PKKNKPE | Q22663 | 5281
PKKRAKV | P04295, P89438 | 5282
PKPKKLKVE | P55263, P55262, P55264, Q64640 | 5283
PKRGRGR | Q9FYS5, Q43386 | 5284
PKRRLVDDA | P0C797 | 5285
PKRRRTY | SeqNLS | 5286
PLFKRR | A8X6H4, Q9TXJ0 | 5287
PLRKAKR | Q86WB0, Q5R8V9 | 5288
PPAKRKCIF | Q6AZ28, O75928, Q8C5D8 | 5289
PPARRRRL | Q8NAG6 | 5290
PPKKKRKV | Q3L6L5, P03070, P14999, P03071 | 5291
PPNKRMKVKH | Q8BN78 | 5292
PPRIYPQLPSAPT | P0C799 | 5293
PQRSPFPKSSVKR | SeqNLS | 5294
PRPRKVPR | P0C799 | 5295
PRRRVQRKR | SeqNLS, Q5R448, Q5TAQ9 | 5296
PRRVRLK | Q58DJ0, P56477, Q13568 | 5297
PSRKRPR | Q62315, Q5F363, Q92833 | 5298
PSSKKRKV | SeqNLS | 5299
PTKKRVK | P07664 | 5300
QRPGPYDRP | SeqNLS | 5301
RGKGGKGLGKGGAKRHRK | SeqNLS | 5302
RKAGKGGGGHKTTKKRSAKDEKVP | B4FG96 | 5303
RKIKLKRAK | A1L3G9 | 5304
RKIKRKRAK | B9X187 | 5305
RKKEAPGPREELRSRGR | O35126, P54258, Q5IS70, P54259 | 5306
RKKRKGK | SeqNLS, Q29243, Q62165, Q28685, O18738, Q9TSZ6, Q14118 | 5307
RKKRRQRRR | P04326, P69697, P69698, P05907, P20879, P04613, P19553, P0C1J9, P20893, P12506, P04612, Q73370, P0CIK0, P05906, P35965, P04609, P04610, P04614, P04608, P05905 | 5308
RKKSIPLSIKNLKRKHKRKKNKITR | Q9C0C9 | 5309
RKLVKPKNTKMKTKLRTNPY | Q14190 | 5310
RKRLILSDKGQLDWKK | SeqNLS, Q91Z62, Q1A730, Q2KHT6, Q9CPU7 | 5311
RKRLKSK | Q13309 | 5312
RKRRVRDNM | Q8QPH4, Q809M7, A8C8X1, Q2VNC5, Q38SQ0, O89749, Q6DNQ9, Q809L9, Q0A429, Q20NV3, P16509, P16505, Q6DNQ5, P16506, Q6XT06, P26118, Q2ICQ2, Q2RCG8, Q0A2D0, Q0A2H9, Q9IQ46, Q809M3, Q6J847, Q6J856, B4URE4, A4GCM7, Q0A440, P26120, P16511 | 5313
RKRSPKDKKEKDLDGAGKRRKT | Q7RTP6 | 5314
RKRTPRVDGQTGENDMNKRRRK | O94851 | 5315
RLPVRRRRRR | P04499, P12541, P03269, P48313, P03270 | 5316
RLRFRKPKSK | P69469 | 5317
RQQRKR | Q14980 | 5318
RRDLNSSFETSPKKVK | Q8K3G5 | 5319
RRDRAKLR | Q9SLB8 | 5320
RRGDGRRR | Q80WE1, Q5R9B4, Q06787, P35922 | 5321
RRGRKRKAEKQ | Q812D1, Q5XXA9, Q99JF8, Q8MJG1, Q66T72, O75475 | 5322
RRKKRR | Q0VD86, Q58DS6, Q5R6G2, Q9ERI5, Q6AYK2, Q6NYC1 | 5323
RRKRSKSEDMDSVESKRRR | Q7TT18 | 5324
RRKRSR | Q99PU7, D3ZHS6, Q92560, A2VDM8 | 5325
RRPKGKTLQKRKPK | Q6ZN17 | 5326
RRRGFERFGPDNMGRKRK | Q63014, Q9DBR0 | 5327
RRRGKNKVAAQNCRK | SeqNLS | 5328
RRRKRR | Q5FVH8, Q6MZT1, Q08DH5, Q8BQP9 | 5329
RRRQKQKGGASRRR | SeqNLS | 5330
RRRREGPRARRRR | P08313, P10231 | 5331
RRTIRLKLVYDKCDRSCKIQKKNRNKCQYCRFHKCLSVGMSHNAIRFGRMPRSEKAKLKAE | SeqNLS | 5332
RRVPQRKEVSRCRKCRK | Q5RJN4, Q32L09, Q8CAK3, Q9NUL5 | 5333
RVGGRRQAVECIEDLLNEPGQPLDLSCKRPRP | P03255 | 5334
RVVKLRIAP | P52639, Q8JMN0 | 5335
RVVRRR | P70278 | 5336
SKRKTKISRKTR | Q5RAY1, O00443 | 5337
SYVKTVPNRTRTYIKL | P21935 | 5338
TGKNEAKKRKIA | P52739, Q8K3J5, Q5RAU9 | 5339
TLSPASSPSSVSCPVIPASTDESPGSALNI | SeqNLS | 5340
VSKKQRTGKKIH | P52739, Q8K3J5, Q5RAU9 | 5341
SPKKKRKVE | | 5342
KRTADGSEFESPKKKRKVE | | 5343
PAAKRVKLD | | 5344
PKKKRKV | | 5345
MDSLLMNRRKFLYQFKNVRWAKGRRETYLC | | 5346
SPKKKRKVEAS | | 5347
MAPKKKRKVGIHRGVP | | 5348
KRTADGSEFEKRTADGSEFESPKKKAKVE | | 5349
KRTADGSEFE | | 5350
KRTADGSEFESPKKKAKVE | | 5351
AGKRTADGSEFEKRTADGSEFESPKKKAKVE | | 4001


In some embodiments, the NLS is a bipartite NLS. A bipartite NLS typically comprises two basic amino acid clusters separated by a spacer sequence (which may be, e.g., about 10 amino acids in length). A monopartite NLS typically lacks a spacer. An example of a bipartite NLS is the nucleoplasmin NLS, having the sequence KR[PAATKKAGQA]KKKK (SEQ ID NO: 5015), wherein the spacer is bracketed. Another exemplary bipartite NLS has the sequence PKKKRKVEGADKRTADGSEFESPKKKRKV (SEQ ID NO: 5016). Exemplary NLSs are described in International Application WO2020051561, which is herein incorporated by reference in its entirety, including for its disclosures regarding nuclear localization sequences.
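The bipartite layout described above (two basic clusters separated by a spacer of about 10 amino acids) can be sketched with a regular expression. The cluster sizes (at least two, then at least three K/R residues) and the 8 to 12 residue spacer window are illustrative assumptions chosen so that the nucleoplasmin example splits as bracketed in the text; they are not definitions from the disclosure, and the sketch is not a validated NLS predictor.

```python
import re
from typing import Optional, Tuple

# Assumed heuristic: a basic cluster of >= 2 K/R residues, a spacer of 8-12
# residues (shortest spacer preferred), then a basic cluster of >= 3 K/R.
BIPARTITE = re.compile(r"([KR]{2,})([A-Z]{8,12}?)([KR]{3,})")


def split_bipartite(seq: str) -> Optional[Tuple[str, str, str]]:
    """Return (first cluster, spacer, second cluster), or None if no match."""
    match = BIPARTITE.search(seq.upper())
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    # Nucleoplasmin NLS from the text: KR[PAATKKAGQA]KKKK
    print(split_bipartite("KRPAATKKAGQAKKKK"))
    # A short monopartite sequence has no spacer and returns None.
    print(split_bipartite("PKKKRKV"))
```

Because the spacer itself may contain basic residues (as in PAATKKAGQA), a pattern of this kind can split some sequences in more than one plausible way; the lazy spacer quantifier simply prefers the shortest spacer that works.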


In certain embodiments, a gene editor system polypeptide (e.g., a gene modifying polypeptide as described herein) further comprises an intracellular localization sequence, e.g., a nuclear localization sequence and/or a nucleolar localization sequence. The nuclear localization sequence and/or nucleolar localization sequence may be amino acid sequences that promote the import of the protein into the nucleus and/or nucleolus, where it can promote integration of heterologous sequence into the genome. In certain embodiments, a gene editor system polypeptide (e.g., a gene modifying polypeptide as described herein) further comprises a nucleolar localization sequence. In certain embodiments, the gene modifying polypeptide is encoded on a first RNA, and the template RNA is a second, separate, RNA, and the nucleolar localization signal is encoded on the RNA encoding the gene modifying polypeptide and not on the template RNA. In some embodiments, the nucleolar localization signal is located at the N-terminus, C-terminus, or in an internal region of the polypeptide. In some embodiments, a plurality of the same or different nucleolar localization signals are used. In some embodiments, the nuclear localization signal is less than 5, 10, 25, 50, 75, or 100 amino acids in length. Various polypeptide nucleolar localization signals can be used. For example, Yang et al., Journal of Biomedical Science 22, 33 (2015), describe a nuclear localization signal that also functions as a nucleolar localization signal. In some embodiments, the nucleolar localization signal may also be a nuclear localization signal. In some embodiments, the nucleolar localization signal may overlap with a nuclear localization signal. In some embodiments, the nucleolar localization signal may comprise a stretch of basic residues. In some embodiments, the nucleolar localization signal may be rich in arginine and lysine residues. In some embodiments, the nucleolar localization signal may be derived from a protein that is enriched in the nucleolus. In some embodiments, the nucleolar localization signal may be derived from a protein enriched at ribosomal RNA loci. In some embodiments, the nucleolar localization signal may be derived from a protein that binds rRNA. In some embodiments, the nucleolar localization signal may be derived from MSP58. In some embodiments, the nucleolar localization signal may be a monopartite motif. In some embodiments, the nucleolar localization signal may be a bipartite motif. In some embodiments, the nucleolar localization signal may consist of multiple monopartite or bipartite motifs. In some embodiments, the nucleolar localization signal may consist of a mix of monopartite and bipartite motifs. In some embodiments, the nucleolar localization signal may be a dual bipartite motif. In some embodiments, the nucleolar localization motif may be KRASSQALGTIPKRRSSSRFIKRKK (SEQ ID NO: 5017). In some embodiments, the nucleolar localization signal may be derived from nuclear factor-κB-inducing kinase. In some embodiments, the nucleolar localization signal may be an RKKRKKK motif (SEQ ID NO: 5018) (described in Birbach et al., Journal of Cell Science, 117 (3615-3624), 2004).


Evolved Variants of Gene Modifying Polypeptides and Systems


In some embodiments, the invention provides evolved variants of gene modifying polypeptides as described herein. Evolved variants can, in some embodiments, be produced by mutagenizing a reference gene modifying polypeptide, or one of the fragments or domains comprised therein. In some embodiments, one or more of the domains (e.g., the reverse transcriptase domain) is evolved. One or more of such evolved variant domains can, in some embodiments, be evolved alone or together with other domains. An evolved variant domain or domains may, in some embodiments, be combined with unevolved cognate component(s) or evolved variants of the cognate component(s), e.g., which may have been evolved in either a parallel or serial manner.


In some embodiments, the process of mutagenizing a reference gene modifying polypeptide, or fragment or domain thereof, comprises mutagenizing the reference gene modifying polypeptide or fragment or domain thereof. In embodiments, the mutagenesis comprises a continuous evolution method (e.g., PACE) or non-continuous evolution method (e.g., PANCE), e.g., as described herein. In some embodiments, the evolved gene modifying polypeptide, or a fragment or domain thereof, comprises one or more amino acid variations introduced into its amino acid sequence relative to the amino acid sequence of the reference gene modifying polypeptide, or fragment or domain thereof. In embodiments, amino acid sequence variations may include one or more mutated residues (e.g., conservative substitutions, non-conservative substitutions, or a combination thereof) within the amino acid sequence of a reference gene modifying polypeptide, e.g., as a result of a change in the nucleotide sequence encoding the gene modifying polypeptide that results in, e.g., a change in the codon at any particular position in the coding sequence, the deletion of one or more amino acids (e.g., a truncated protein), the insertion of one or more amino acids, or any combination of the foregoing. The evolved variant gene modifying polypeptide may include variants in one or more components or domains of the gene modifying polypeptide (e.g., variants introduced into a reverse transcriptase domain).


In some aspects, the disclosure provides gene modifying polypeptides, systems, kits, and methods using or comprising an evolved variant of a gene modifying polypeptide, e.g., employs an evolved variant of a gene modifying polypeptide or a gene modifying polypeptide produced or producible by PACE or PANCE. In embodiments, the unevolved reference gene modifying polypeptide is a gene modifying polypeptide as disclosed herein.


The term “phage-assisted continuous evolution (PACE),” as used herein, generally refers to continuous evolution that employs phage as viral vectors. Examples of PACE technology have been described, for example, in International PCT Application No. PCT/US2009/056194, filed Sep. 8, 2009, published as WO 2010/028347 on Mar. 11, 2010; International PCT Application, PCT/US2011/066747, filed Dec. 22, 2011, published as WO 2012/088381 on Jun. 28, 2012; U.S. Pat. No. 9,023,594, issued May 5, 2015; U.S. Pat. No. 9,771,574, issued Sep. 26, 2017; U.S. Pat. No. 9,394,537, issued Jul. 19, 2016; International PCT Application, PCT/US2015/012022, filed Jan. 20, 2015, published as WO 2015/134121 on Sep. 11, 2015; U.S. Pat. No. 10,179,911, issued Jan. 15, 2019; and International PCT Application, PCT/US2016/027795, filed Apr. 15, 2016, published as WO 2016/168631 on Oct. 20, 2016, the entire contents of each of which are incorporated herein by reference.


The term “phage-assisted non-continuous evolution (PANCE),” as used herein, generally refers to non-continuous evolution that employs phage as viral vectors. Examples of PANCE technology have been described, for example, in Suzuki T. et al, Crystal structures reveal an elusive functional domain of pyrrolysyl-tRNA synthetase, Nat Chem Biol. 13(12): 1261-1266 (2017), incorporated herein by reference in its entirety. Briefly, PANCE is a technique for rapid in vivo directed evolution using serial flask transfers of evolving selection phage (SP), which contain a gene of interest to be evolved, across fresh host cells (e.g., E. coli cells). Genes inside the host cell may be held constant while genes contained in the SP continuously evolve. Following phage growth, an aliquot of infected cells may be used to transfect a subsequent flask containing host E. coli. This process can be repeated and/or continued until the desired phenotype is evolved, e.g., for as many transfers as desired.


Methods of applying PACE and PANCE to gene modifying polypeptides may be readily appreciated by the skilled artisan by reference to, inter alia, the foregoing references. Additional exemplary methods for directing continuous evolution of genome-modifying proteins or systems, e.g., in a population of host cells, e.g., using phage particles, can be applied to generate evolved variants of gene modifying polypeptides, or fragments or subdomains thereof. Non-limiting examples of such methods are described in International PCT Application, PCT/US2009/056194, filed Sep. 8, 2009, published as WO 2010/028347 on Mar. 11, 2010; International PCT Application, PCT/US2011/066747, filed Dec. 22, 2011, published as WO 2012/088381 on Jun. 28, 2012; U.S. Pat. No. 9,023,594, issued May 5, 2015; U.S. Pat. No. 9,771,574, issued Sep. 26, 2017; U.S. Pat. No. 9,394,537, issued Jul. 19, 2016; International PCT Application, PCT/US2015/012022, filed Jan. 20, 2015, published as WO 2015/134121 on Sep. 11, 2015; U.S. Pat. No. 10,179,911, issued Jan. 15, 2019; International Application No. PCT/US2019/37216, filed Jun. 14, 2019, International Patent Publication WO 2019/023680, published Jan. 31, 2019, International PCT Application, PCT/US2016/027795, filed Apr. 15, 2016, published as WO 2016/168631 on Oct. 20, 2016, and International Patent Publication No. PCT/US2019/47996, filed Aug. 23, 2019, each of which is incorporated herein by reference in its entirety.


In some non-limiting illustrative embodiments, a method of evolution of an evolved variant gene modifying polypeptide, or a fragment or domain thereof, comprises: (a) contacting a population of host cells with a population of viral vectors comprising the gene of interest (the starting gene modifying polypeptide or fragment or domain thereof), wherein: (1) the host cell is amenable to infection by the viral vector; (2) the host cell expresses viral genes required for the generation of viral particles; (3) the expression of at least one viral gene required for the production of an infectious viral particle is dependent on a function of the gene of interest; and/or (4) the viral vector allows for expression of the protein in the host cell, and can be replicated and packaged into a viral particle by the host cell. In some embodiments, the method comprises (b) contacting the host cells with a mutagen, using host cells with mutations that elevate mutation rate (e.g., either by carrying a mutation plasmid or some genome modification, e.g., a proofreading-impaired DNA polymerase or SOS genes, such as UmuC, UmuD′, and/or RecA, which mutations, if plasmid-bound, may be under control of an inducible promoter), or a combination thereof. In some embodiments, the method comprises (c) incubating the population of host cells under conditions allowing for viral replication and the production of viral particles, wherein host cells are removed from the host cell population, and fresh, uninfected host cells are introduced into the population of host cells, thus replenishing the population of host cells and creating a flow of host cells. In some embodiments, the cells are incubated under conditions allowing for the gene of interest to acquire a mutation. In some embodiments, the method further comprises (d) isolating a mutated version of the viral vector, encoding an evolved gene product (e.g., an evolved variant gene modifying polypeptide, or fragment or domain thereof), from the population of host cells.


The skilled artisan will appreciate a variety of features employable within the above-described framework. For example, in some embodiments, the viral vector or the phage is a filamentous phage, for example, an M13 phage, e.g., an M13 selection phage. In certain embodiments, the gene required for the production of infectious viral particles is the M13 gene III (gIII). In embodiments, the phage may lack a functional gIII, but otherwise comprise gI, gII, gIV, gV, gVI, gVII, gVIII, gIX, and gX. In some embodiments, the generation of infectious VSV particles involves the envelope protein VSV-G. Various embodiments can use different retroviral vectors, for example, Murine Leukemia Virus vectors, or Lentiviral vectors. In embodiments, the retroviral vectors can efficiently be packaged with VSV-G envelope protein, e.g., as a substitute for the native envelope protein of the virus.


In some embodiments, host cells are incubated according to a suitable number of viral life cycles, e.g., at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1250, at least 1500, at least 1750, at least 2000, at least 2500, at least 3000, at least 4000, at least 5000, at least 7500, at least 10000, or more consecutive viral life cycles, which, in one illustrative and non-limiting example of M13 phage, is 10-20 minutes per virus life cycle. Similarly, conditions can be modulated to adjust the time a host cell remains in a population of host cells, e.g., about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 70, about 80, about 90, about 100, about 120, about 150, or about 180 minutes. Host cell populations can be controlled in part by density of the host cells, or, in some embodiments, the host cell density in an inflow, e.g., 10^3 cells/ml, about 10^4 cells/ml, about 10^5 cells/ml, about 5·10^5 cells/ml, about 10^6 cells/ml, about 5·10^6 cells/ml, about 10^7 cells/ml, about 5·10^7 cells/ml, about 10^8 cells/ml, about 5·10^8 cells/ml, about 10^9 cells/ml, about 5·10^9 cells/ml, about 10^10 cells/ml, or about 5·10^10 cells/ml.
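As a rough illustration of the time scales implied by these cycle counts, the following Python sketch converts a number of consecutive viral life cycles into wall-clock hours, assuming the illustrative 10-20 minute M13 life cycle mentioned above. The function name is hypothetical, and the calculation ignores lag phases, washout, and other practical effects.

```python
def evolution_run_hours(n_cycles: int, minutes_per_cycle: float) -> float:
    """Approximate wall-clock hours for n consecutive viral life cycles."""
    if n_cycles < 0 or minutes_per_cycle <= 0:
        raise ValueError("cycle count must be >= 0 and cycle time must be > 0")
    return n_cycles * minutes_per_cycle / 60.0


# Bounds for 100 life cycles at the illustrative 10-20 minutes per cycle.
low_hours = evolution_run_hours(100, 10)
high_hours = evolution_run_hours(100, 20)
print(f"100 cycles: {low_hours:.1f} to {high_hours:.1f} hours")
```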


Inteins


In some embodiments, as described in more detail below, an intein-N (intN) domain may be fused to the N-terminal portion of a first domain of a gene modifying polypeptide described herein, and an intein-C (intC) domain may be fused to the C-terminal portion of a second domain of a gene modifying polypeptide described herein for the joining of the N-terminal portion to the C-terminal portion, thereby joining the first and second domains. In some embodiments, the first and second domains are each independently chosen from a DNA binding domain, an RNA binding domain, an RT domain, and an endonuclease domain.


An intein can occur as a self-splicing protein intron (e.g., peptide), e.g., which ligates flanking N-terminal and C-terminal exteins (e.g., fragments to be joined). An intein may, in some instances, comprise a fragment of a protein that is able to excise itself and join the remaining fragments (the exteins) with a peptide bond in a process known as protein splicing. Inteins are also referred to as “protein introns.” The process of an intein excising itself and joining the remaining portions of the protein is herein termed “protein splicing” or “intein-mediated protein splicing.”


In some embodiments, an intein of a precursor protein (an intein containing protein prior to intein-mediated protein splicing) comes from two genes. Such an intein is referred to herein as a split intein (e.g., split intein-N and split intein-C). Accordingly, an intein-based approach may be used to join a first polypeptide sequence and a second polypeptide sequence together. For example, in cyanobacteria, DnaE, the catalytic subunit α of DNA polymerase III, is encoded by two separate genes, dnaE-n and dnaE-c. An intein-N domain, such as that encoded by the dnaE-n gene, when situated as part of a first polypeptide sequence, may join the first polypeptide sequence with a second polypeptide sequence, wherein the second polypeptide sequence comprises an intein-C domain, such as that encoded by the dnaE-c gene. Accordingly, in some embodiments, a protein can be made by providing nucleic acid encoding the first and second polypeptide sequences (e.g., wherein a first nucleic acid molecule encodes the first polypeptide sequence and a second nucleic acid molecule encodes the second polypeptide sequence), and the nucleic acid is introduced into the cell under conditions that allow for production of the first and second polypeptide sequences, and for joining of the first to the second polypeptide sequence via an intein-based mechanism.


Use of inteins for joining heterologous protein fragments is described, for example, in Wood et al., J. Biol. Chem. 289(21):14512-9 (2014) (incorporated herein by reference in its entirety). For example, when fused to separate protein fragments, the inteins IntN and IntC may recognize each other, splice themselves out, and/or simultaneously ligate the flanking N- and C-terminal exteins of the protein fragments to which they were fused, thereby reconstituting a full-length protein from the two protein fragments.


In some embodiments, a synthetic intein based on the dnaE intein, the Cfa-N (e.g., split intein-N) and Cfa-C (e.g., split intein-C) intein pair, is used. Examples of such inteins have been described, e.g., in Stevens et al., J Am Chem Soc. 2016 Feb. 24; 138(7):2162-5 (incorporated herein by reference in its entirety). Non-limiting examples of intein pairs that may be used in accordance with the present disclosure include: Cfa DnaE intein, Ssp GyrB intein, Ssp DnaX intein, Ter DnaE3 intein, Ter ThyX intein, Rma DnaB intein and Cne Prp8 intein (e.g., as described in U.S. Pat. No. 8,394,604, incorporated herein by reference).


In some embodiments involving a split Cas9, an intein-N domain and an intein-C domain may be fused to the N-terminal portion of the split Cas9 and the C-terminal portion of a split Cas9, respectively, for the joining of the N-terminal portion of the split Cas9 and the C-terminal portion of the split Cas9. For example, in some embodiments, an intein-N is fused to the C-terminus of the N-terminal portion of the split Cas9, i.e., to form a structure of N-[N-terminal portion of the split Cas9]-[intein-N]-C. In some embodiments, an intein-C is fused to the N-terminus of the C-terminal portion of the split Cas9, i.e., to form a structure of N-[intein-C]-[C-terminal portion of the split Cas9]-C. The mechanism of intein-mediated protein splicing for joining the proteins the inteins are fused to (e.g., split Cas9) is described in Shah et al., Chem Sci. 2014; 5(1):446-461, incorporated herein by reference. Methods for designing and using inteins are known in the art and described, for example, by WO2020051561, WO2014004336, WO2017132580, US20150344549, and US20180127780, each of which is incorporated herein by reference in its entirety.
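The fusion architecture described above can be sketched as simple string operations. The following Python example is a toy model only: it treats each polypeptide as a string of one-letter amino acid codes, builds the two halves in the orientations N-[N-terminal portion]-[intein-N]-C and N-[intein-C]-[C-terminal portion]-C, and then removes the intein domains to give the joined product. All function names and the placeholder sequences are hypothetical and are not taken from the disclosure; real protein splicing depends on intein folding and chemistry that this sketch does not model.

```python
def fuse_n_half(n_fragment: str, intein_n: str) -> str:
    """Build N-[N-terminal portion]-[intein-N]-C."""
    return n_fragment + intein_n


def fuse_c_half(intein_c: str, c_fragment: str) -> str:
    """Build N-[intein-C]-[C-terminal portion]-C."""
    return intein_c + c_fragment


def splice(n_half: str, c_half: str, intein_n: str, intein_c: str) -> str:
    """Remove both intein domains and join the flanking fragments."""
    if not intein_n or not intein_c:
        raise ValueError("intein sequences must be non-empty")
    if not n_half.endswith(intein_n):
        raise ValueError("N-terminal half must end with intein-N")
    if not c_half.startswith(intein_c):
        raise ValueError("C-terminal half must start with intein-C")
    return n_half[: len(n_half) - len(intein_n)] + c_half[len(intein_c):]


# Toy placeholder sequences (hypothetical, not real protein sequences).
N_FRAGMENT, C_FRAGMENT = "MAAAA", "SGGGG"
INTEIN_N, INTEIN_C = "CLLLL", "VVVVN"

n_half = fuse_n_half(N_FRAGMENT, INTEIN_N)
c_half = fuse_c_half(INTEIN_C, C_FRAGMENT)
print(splice(n_half, c_half, INTEIN_N, INTEIN_C))
```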


In some embodiments, a split refers to a division into two or more fragments. In some embodiments, a split Cas9 protein or split Cas9 comprises a Cas9 protein that is provided as an N-terminal fragment and a C-terminal fragment encoded by two separate nucleotide sequences. The polypeptides corresponding to the N-terminal portion and the C-terminal portion of the Cas9 protein may be spliced to form a reconstituted Cas9 protein. In embodiments, the Cas9 protein is divided into two fragments within a disordered region of the protein, e.g., as described in Nishimasu et al., Cell, Volume 156, Issue 5, pp. 935-949, 2014, or as described in Jiang et al. (2016) Science 351: 867-871 and PDB file: 5F9R (each of which is incorporated herein by reference in its entirety). A disordered region may be determined by one or more protein structure determination techniques known in the art, including, without limitation, X-ray crystallography, NMR spectroscopy, electron microscopy (e.g., cryoEM), and/or in silico protein modeling. In some embodiments, the protein is divided into two fragments at any C, T, A, or S, e.g., within a region of SpCas9 between amino acids A292-G364, F445-K483, or E565-T637, or at corresponding positions in any other Cas9, Cas9 variant (e.g., nCas9, dCas9), or other napDNAbp. In some embodiments, the protein is divided into two fragments at SpCas9 T310, T313, A456, S469, or C574. In some embodiments, the process of dividing the protein into two fragments is referred to as splitting the protein.
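The residue-based selection of split sites described above can be illustrated with a short Python sketch. It lists every C, T, A, or S residue that falls inside a set of inclusive, 1-based residue ranges. The function name is hypothetical, the demonstration uses a toy sequence rather than the SpCas9 sequence, and the sketch does not decide whether the division falls before or after the reported residue.

```python
from typing import List, Tuple

# SpCas9 residue ranges named in the text (1-based, inclusive).
SPCAS9_SPLIT_REGIONS = [(292, 364), (445, 483), (565, 637)]


def candidate_split_sites(
    protein: str,
    regions: List[Tuple[int, int]],
    residues: str = "CTAS",
) -> List[Tuple[int, str]]:
    """Return (position, residue) pairs for allowed residues inside the regions."""
    sites = []
    for start, end in regions:
        # Clamp the region to the sequence so short inputs do not raise.
        for pos in range(max(start, 1), min(end, len(protein)) + 1):
            aa = protein[pos - 1]
            if aa in residues:
                sites.append((pos, aa))
    return sites


# Toy example: a hypothetical 9-residue sequence and one region (3-7).
toy_sites = candidate_split_sites("MKTAYCSLG", [(3, 7)])
print(toy_sites)
```

With a full-length SpCas9 sequence supplied by the user, passing `SPCAS9_SPLIT_REGIONS` as the second argument would enumerate candidates within the three ranges named above.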


In some embodiments, a protein fragment ranges from about 2-1000 amino acids (e.g., between 2-10, 10-50, 50-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, or 900-1000 amino acids) in length. In some embodiments, a protein fragment ranges from about 5-500 amino acids (e.g., between 5-10, 10-50, 50-100, 100-200, 200-300, 300-400, or 400-500 amino acids) in length. In some embodiments, a protein fragment ranges from about 20-200 amino acids (e.g., between 20-30, 30-40, 40-50, 50-100, or 100-200 amino acids) in length.


In some embodiments, a portion or fragment of a gene modifying polypeptide is fused to an intein. The nuclease can be fused to the N-terminus or the C-terminus of the intein. In some embodiments, a portion or fragment of a fusion protein is fused to an intein and fused to an AAV capsid protein. The intein, nuclease and capsid protein can be fused together in any arrangement (e.g., nuclease-intein-capsid, intein-nuclease-capsid, capsid-intein-nuclease, etc.). In some embodiments, the N-terminus of an intein is fused to the C-terminus of a fusion protein and the C-terminus of the intein is fused to the N-terminus of an AAV capsid protein.


In some embodiments, an endonuclease domain (e.g., a nickase Cas9 domain) is fused to intein-N and a polypeptide comprising an RT domain is fused to an intein-C.


Exemplary nucleotide and amino acid sequences of intein-N domains and compatible intein-C domains are provided below:









DnaE Intein-N DNA:
(SEQ ID NO: 5029)
TGCCTGTCATACGAAACCGAGATACTGACAGTAGAATATGGCCTTCTGCCAATCGGGAAGATTGTGGAGAAACGGATAGAATGCACAGTTTACTCTGTCGATAACAATGGTAACATTTATACTCAGCCAGTTGCCCAGTGGCACGACCGGGGAGAGCAGGAAGTATTCGAATACTGTCTGGAGGATGGAAGTCTCATTAGGGCCACTAAGGACCACAAATTTATGACAGTCGATGGCCAGATGCTGCCTATAGACGAAATCTTTGAGCGAGAGTTGGACCTCATGCGAGTTGACAACCTTCCTAAT

DnaE Intein-N Protein:
(SEQ ID NO: 5030)
CLSYETEILTVEYGLLPIGKIVEKRIECTVYSVDNNGNIYTQPVAQWHDRGEQEVFEYCLEDGSLIRATKDHKFMTVDGQMLPIDEIFERELDLMRVDNLPN

DnaE Intein-C DNA:
(SEQ ID NO: 5031)
ATGATCAAGATAGCTACAAGGAAGTATCTTGGCAAACAAAACGTTTATGATATTGGAGTCGAAAGAGATCACAACTTTGCTCTGAAGAACGGATTCATAGCTTCTAAT

DnaE Intein-C Protein:
(SEQ ID NO: 5032)
MIKIATRKYLGKQNVYDIGVERDHNFALKNGFIASN

Cfa-N DNA:
(SEQ ID NO: 5033)
TGCCTGTCTTATGATACCGAGATACTTACCGTTGAATATGGCTTCTTGCCTATTGGAAAGATTGTCGAAGAGAGAATTGAATGCACAGTATATACTGTAGACAAGAATGGTTTCGTTTACACACAGCCCATTGCTCAATGGCACAATCGCGGCGAACAAGAAGTATTTGAGTACTGTCTCGAGGATGGAAGCATCATACGAGCAACTAAAGATCATAAATTCATGACCACTGACGGGCAGATGTTGCCAATAGATGAGATATTCGAGCGGGGCTTGGATCTCAAACAAGTGGATGGATTGCCA

Cfa-N Protein:
(SEQ ID NO: 5034)
CLSYDTEILTVEYGFLPIGKIVEERIECTVYTVDKNGFVYTQPIAQWHNRGEQEVFEYCLEDGSIIRATKDHKFMTTDGQMLPIDEIFERGLDLKQVDGLP

Cfa-C DNA:
(SEQ ID NO: 5035)
ATGAAGAGGACTGCCGATGGATCAGAGTTTGAATCTCCCAAGAAGAAGAGGAAAGTAAAGATAATATCTCGAAAAAGTCTTGGTACCCAAAATGTCTATGATATTGGAGTGGAGAAAGATCACAACTTCCTTCTCAAGAACGGTCTCGTAGCCAGCAAC

Cfa-C Protein:
(SEQ ID NO: 5036)
MKRTADGSEFESPKKKRKVKIISRKSLGTQNVYDIGVEKDHNFLLKNGLVASN
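The DNA and protein listings above can be cross-checked against each other by translation. The following Python sketch translates the DnaE Intein-C DNA (SEQ ID NO: 5031) with the standard genetic code and compares the result with the DnaE Intein-C protein (SEQ ID NO: 5032). The helper name `translate` is hypothetical; the codon table is the standard code laid out in the conventional T, C, A, G order.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, frame 1, with the standard code."""
    dna = dna.replace(" ", "").upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# DnaE Intein-C DNA (SEQ ID NO: 5031) and protein (SEQ ID NO: 5032).
DNAE_INTEIN_C_DNA = (
    "ATGATCAAGATAGCTACAAGGAAGTATCTTGGCAAACAAAACGTTTATG"
    "ATATTGGAGTCGAAAGAGATCACAACTTTGCTCTGAAGAACGGATTCAT"
    "AGCTTCTAAT"
)
DNAE_INTEIN_C_PROTEIN = "MIKIATRKYLGKQNVYDIGVERDHNFALKNGFIASN"

print(translate(DNAE_INTEIN_C_DNA) == DNAE_INTEIN_C_PROTEIN)
```

The same helper can be applied to the other DNA listings; for instance, the final nine nucleotides of the Cfa-N DNA, GGATTGCCA, translate to GLP, matching the end of the Cfa-N protein.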






Additional Domains


The gene modifying polypeptide can bind a target DNA sequence and template nucleic acid (e.g., template RNA), nick the target site, and write (e.g., reverse transcribe) the template into DNA, resulting in a modification of the target site. In some embodiments, additional domains may be added to the polypeptide to enhance the efficiency of the process. In some embodiments, the gene modifying polypeptide may contain an additional DNA ligation domain to join reverse transcribed DNA to the DNA of the target site. In some embodiments, the polypeptide may comprise a heterologous RNA-binding domain. In some embodiments, the polypeptide may comprise a domain having 5′ to 3′ exonuclease activity (e.g., wherein the 5′ to 3′ exonuclease activity increases repair of the alteration of the target site, e.g., in favor of alteration over the original genomic sequence). In some embodiments, the polypeptide may comprise a domain having 3′ to 5′ exonuclease activity, e.g., proof-reading activity. In some embodiments, the writing domain, e.g., RT domain, has 3′ to 5′ exonuclease activity, e.g., proof-reading activity.


Template Nucleic Acids

The gene modifying systems described herein can modify a host target DNA site using a template nucleic acid sequence. In some embodiments, the gene modifying systems described herein transcribe an RNA sequence template into host target DNA sites by target-primed reverse transcription (TPRT). By modifying DNA sequence(s) via reverse transcription of the RNA sequence template directly into the host genome, the gene modifying system can insert an object sequence into a target genome without the need for exogenous DNA sequences to be introduced into the host cell (unlike, for example, CRISPR systems), as well as eliminate an exogenous DNA insertion step. The gene modifying system can also delete a sequence from the target genome or introduce a substitution using an object sequence. Therefore, the gene modifying system provides a platform for the use of customized RNA sequence templates containing object sequences, e.g., sequences comprising heterologous gene coding and/or function information.


In some embodiments, the template nucleic acid comprises one or more sequences (e.g., 2 sequences) that bind the gene modifying polypeptide.


In some embodiments, a system or method described herein comprises a single template nucleic acid (e.g., template RNA). In some embodiments, a system or method described herein comprises a plurality of template nucleic acids (e.g., template RNAs). For example, a system described herein comprises a first RNA comprising (e.g., from 5′ to 3′) a sequence that binds the gene modifying polypeptide (e.g., the DNA-binding domain and/or the endonuclease domain, e.g., a gRNA) and a sequence that binds a target site (e.g., a second strand of a site in a target genome), and a second RNA (e.g., a template RNA) comprising (e.g., from 5′ to 3′) optionally a sequence that binds the gene modifying polypeptide (e.g., that specifically binds the RT domain), a heterologous object sequence, and a PBS sequence. In some embodiments, when the system comprises a plurality of nucleic acids, each nucleic acid comprises a conjugating domain. In some embodiments, a conjugating domain enables association of nucleic acid molecules, e.g., by hybridization of complementary sequences. For example, in some embodiments a first RNA comprises a first conjugating domain and a second RNA comprises a second conjugating domain, and the first and second conjugating domains are capable of hybridizing to one another, e.g., under stringent conditions. In some embodiments, the stringent conditions for hybridization include hybridization in 4× sodium chloride/sodium citrate (SSC), at about 65° C., followed by a wash in 1×SSC, at about 65° C.
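As a minimal illustration of conjugating domains that associate by hybridization of complementary sequences, the following Python sketch computes the reverse complement of an RNA sequence and tests whether two domains are perfectly complementary. The function names and example sequences are hypothetical. Perfect Watson-Crick complementarity is used here as a simplifying assumption; the sketch does not model hybridization thermodynamics or the stringent conditions recited above.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    seq = seq.upper()
    if set(seq) - set("ACGU"):
        raise ValueError("sequence contains non-RNA characters")
    return seq.translate(RNA_COMPLEMENT)[::-1]


def perfectly_complementary(domain_a: str, domain_b: str) -> bool:
    """True if the two domains can pair over their full length, antiparallel."""
    return domain_b.upper() == reverse_complement_rna(domain_a)


# Hypothetical conjugating domains on a first and a second RNA.
first_domain = "AUGGCC"
second_domain = reverse_complement_rna(first_domain)
print(second_domain, perfectly_complementary(first_domain, second_domain))
```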


In some embodiments, the template nucleic acid comprises RNA. In some embodiments, the template nucleic acid comprises DNA (e.g., single stranded or double stranded DNA).


In some embodiments, the template nucleic acid comprises one or more (e.g., 2) homology domains that have homology to the target sequence. In some embodiments, the homology domains are about 10-20, 20-50, or 50-100 nucleotides in length.


In some embodiments, a template RNA can comprise a gRNA sequence, e.g., to direct the gene modifying polypeptide to a target site of interest. In some embodiments, a template RNA comprises (e.g., from 5′ to 3′) (i) optionally a gRNA spacer that binds a target site (e.g., a second strand of a site in a target genome), (ii) optionally a gRNA scaffold that binds a polypeptide described herein (e.g., a gene modifying polypeptide or a Cas polypeptide), (iii) a heterologous object sequence comprising a mutation region (optionally the heterologous object sequence comprises, from 5′ to 3′, a first homology region, a mutation region, and a second homology region), and (iv) a primer binding site (PBS) sequence comprising a 3′ target homology domain.


The template nucleic acid (e.g., template RNA) component of a genome editing system described herein typically is able to bind the gene modifying polypeptide of the system. In some embodiments the template nucleic acid (e.g., template RNA) has a 3′ region that is capable of binding a gene modifying polypeptide. The binding region, e.g., 3′ region, may be a structured RNA region, e.g., having at least 1, 2 or 3 hairpin loops, capable of binding the gene modifying polypeptide of the system. The binding region may associate the template nucleic acid (e.g., template RNA) with any of the polypeptide modules. In some embodiments, the binding region of the template nucleic acid (e.g., template RNA) may associate with an RNA-binding domain in the polypeptide. In some embodiments, the binding region of the template nucleic acid (e.g., template RNA) may associate with the reverse transcription domain of the gene modifying polypeptide (e.g., specifically bind to the RT domain). In some embodiments, the template nucleic acid (e.g., template RNA) may associate with the DNA binding domain of the polypeptide, e.g., a gRNA associating with a Cas9-derived DNA binding domain. In some embodiments, the binding region may also provide DNA target recognition, e.g., a gRNA hybridizing to the target DNA sequence and binding the polypeptide, e.g., a Cas9 domain. In some embodiments, the template nucleic acid (e.g., template RNA) may associate with multiple components of the polypeptide, e.g., DNA binding domain and reverse transcription domain.


In some embodiments the template RNA has a poly-A tail at the 3′ end. In some embodiments the template RNA does not have a poly-A tail at the 3′ end.


In some embodiments, the template nucleic acid is a template RNA. In some embodiments, the template RNA comprises one or more modified nucleotides. For example, in some embodiments, the template RNA comprises one or more deoxyribonucleotides. In some embodiments, regions of the template RNA are replaced by DNA nucleotides, e.g., to enhance stability of the molecule. For example, the 3′ end of the template may comprise DNA nucleotides, while the rest of the template comprises RNA nucleotides that can be reverse transcribed. For instance, in some embodiments, the heterologous object sequence is primarily or wholly made up of RNA nucleotides (e.g., at least 90%, 95%, 98%, or 99% RNA nucleotides). In some embodiments, the PBS sequence is primarily or wholly made up of DNA nucleotides (e.g., at least 90%, 95%, 98%, or 99% DNA nucleotides). In other embodiments, the heterologous object sequence for writing into the genome may comprise DNA nucleotides. In some embodiments, the DNA nucleotides in the template are copied into the genome by a domain capable of DNA-dependent DNA polymerase activity. In some embodiments, the DNA-dependent DNA polymerase activity is provided by a DNA polymerase domain in the polypeptide. In some embodiments, the DNA-dependent DNA polymerase activity is provided by a reverse transcriptase domain that is also capable of DNA-dependent DNA polymerization, e.g., second strand synthesis. In some embodiments, the template molecule is composed of only DNA nucleotides.


In some embodiments, a system described herein comprises two nucleic acids which together comprise the sequences of a template RNA described herein. In some embodiments, the two nucleic acids are associated with each other non-covalently, e.g., directly associated with each other (e.g., via base pairing), or indirectly associated as part of a complex comprising one or more additional molecules.


A template RNA described herein may comprise, from 5′ to 3′: (1) a gRNA spacer; (2) a gRNA scaffold; (3) a heterologous object sequence; and (4) a primer binding site (PBS) sequence. Each of these components is now described in more detail.
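The 5′ to 3′ component order listed above can be captured in a small data structure. The following Python sketch is illustrative only: the class name, field names, and the short placeholder sequences are hypothetical and do not correspond to any sequence in the disclosure. It concatenates the four components in order and reports where each one sits in the assembled molecule.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

COMPONENT_ORDER = (
    "grna_spacer",
    "grna_scaffold",
    "heterologous_object_sequence",
    "pbs",
)


@dataclass(frozen=True)
class TemplateRNA:
    """Four template RNA components, stored in 5' to 3' order."""
    grna_spacer: str
    grna_scaffold: str
    heterologous_object_sequence: str
    pbs: str

    def sequence(self) -> str:
        """Concatenate the components from 5' to 3'."""
        return "".join(getattr(self, name) for name in COMPONENT_ORDER)

    def component_spans(self) -> Dict[str, Tuple[int, int]]:
        """Map each component to its 0-based [start, end) span."""
        spans, start = {}, 0
        for name in COMPONENT_ORDER:
            end = start + len(getattr(self, name))
            spans[name] = (start, end)
            start = end
        return spans


# Placeholder sequences for illustration (hypothetical).
template = TemplateRNA(
    grna_spacer="GACU" * 5,
    grna_scaffold="GUUUUAGAGCUA",
    heterologous_object_sequence="AUGC",
    pbs="CCGGAAUU",
)
print(template.sequence())
print(template.component_spans())
```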


gRNA Spacer and gRNA Scaffold


A template RNA described herein may comprise a gRNA spacer that directs the gene modifying system to a target nucleic acid, and a gRNA scaffold that promotes association of the template RNA with the Cas domain of the gene modifying polypeptide. The systems described herein can also comprise a gRNA that is not part of a template nucleic acid. For example, a gRNA that comprises a gRNA spacer and gRNA scaffold, but not a heterologous object sequence or a PBS sequence, can be used, e.g., to induce second strand nicking, e.g., as described in the section herein entitled “Second Strand Nicking”.


In some embodiments, the gRNA is a short synthetic RNA composed of a scaffold sequence that participates in CRISPR-associated protein binding and a user-defined ~20 nucleotide targeting sequence for a genomic target. The structure of a complete gRNA was described by Nishimasu et al. Cell 156, P935-949 (2014). The gRNA (also referred to as sgRNA for single-guide RNA) consists of crRNA- and tracrRNA-derived sequences connected by an artificial tetraloop. The crRNA sequence can be divided into guide (20 nt) and repeat (12 nt) regions, whereas the tracrRNA sequence can be divided into anti-repeat (14 nt) and three tracrRNA stem loops (Nishimasu et al. Cell 156, P935-949 (2014)). In practice, guide RNA sequences are generally designed to have a length of between 17-24 nucleotides (e.g., 19, 20, or 21 nucleotides) and be complementary to a targeted nucleic acid sequence. Custom gRNA generators and algorithms are available commercially for use in the design of effective guide RNAs. In some embodiments, the gRNA comprises two RNA components from the native CRISPR system, e.g., crRNA and tracrRNA. As is well known in the art, the gRNA may also comprise a chimeric, single guide RNA (sgRNA) containing sequence from both a tracrRNA (for binding the nuclease) and at least one crRNA (to guide the nuclease to the sequence targeted for editing/binding). Chemically modified sgRNAs have also been demonstrated to be effective for use with CRISPR-associated proteins; see, for example, Hendel et al. (2015) Nature Biotechnol., 985-991. In some embodiments, a gRNA spacer comprises a nucleic acid sequence that is complementary to a DNA sequence associated with a target gene.


In some embodiments, the region of the template nucleic acid, e.g., template RNA, comprising the gRNA adopts an underwound ribbon-like structure of gRNA bound to target DNA (e.g., as described in Mulepati et al. Science 19 Sep. 2014:Vol. 345, Issue 6203, pp. 1479-1484). Without wishing to be bound by theory, this non-canonical structure is thought to be facilitated by rotation of every sixth nucleotide out of the RNA-DNA hybrid. Thus, in some embodiments, the region of the template nucleic acid, e.g., template RNA, comprising the gRNA may tolerate increased mismatching with the target site at some interval, e.g., every sixth base. In some embodiments, the region of the template nucleic acid, e.g., template RNA, comprising the gRNA comprising homology to the target site may possess wobble positions at a regular interval, e.g., every sixth base, that do not need to base pair with the target site.
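The idea of tolerating mismatches at a regular interval can be sketched as follows. This Python example counts mismatches between a spacer and a same-length target sequence while skipping every sixth position. The function name is hypothetical, and two simplifications are assumptions rather than statements from the text: the skipped positions are taken to be 6, 12, 18, and so on counted from the 5′ end, and the two sequences are compared base by base in the same orientation (i.e., the spacer against the protospacer sequence it is intended to match).

```python
def count_scored_mismatches(spacer: str, target: str, wobble_interval: int = 6) -> int:
    """Count mismatches, ignoring every `wobble_interval`-th position (1-based)."""
    if len(spacer) != len(target):
        raise ValueError("spacer and target must be the same length")
    if wobble_interval < 1:
        raise ValueError("wobble_interval must be a positive integer")
    return sum(
        1
        for pos, (a, b) in enumerate(zip(spacer.upper(), target.upper()), start=1)
        if a != b and pos % wobble_interval != 0
    )


# Mismatches at positions 6 and 12 are skipped; the one at position 2 is counted.
print(count_scored_mismatches("AAAAAAAAAAAA", "AAAAACAAAAAC"))
print(count_scored_mismatches("AAAAAAAAAAAA", "ACAAACAAAAAA"))
```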


In some embodiments, the template nucleic acid (e.g., template RNA) has at least 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 bases of at least 80%, 85%, 90%, 95%, 99%, or 100% homology to the target site, e.g., at the 5′ end, e.g., comprising a gRNA spacer sequence of length appropriate to the Cas9 domain of the gene modifying polypeptide (Table 8).


In some embodiments, a Cas9 derivative with enhanced activity may be used in the gene modifying polypeptide. In some embodiments, a Cas9 derivative may comprise mutations that improve activity of the HNH endonuclease domain, e.g., SpyCas9 R221K, N394K, or mutations that improve R-loop formation, e.g., SpyCas9 L1245V, or comprise a combination of such mutations, e.g., SpyCas9 R221K/N394K, SpyCas9 N394K/L1245V, SpyCas9 R221K/L1245V, or SpyCas9 R221K/N394K/L1245V (see, e.g., Spencer and Zhang, Sci Rep 7:16836 (2017), the Cas9 derivatives and mutations of which are incorporated herein by reference). In some embodiments, a Cas9 derivative may comprise one or more types of mutations described herein, e.g., PAM-modifying mutations, protein stabilizing mutations, activity enhancing mutations, and/or mutations partially or fully inactivating one or two endonuclease domains relative to the parental enzyme (e.g., one or more mutations to abolish endonuclease activity towards one or both strands of a target DNA, e.g., a nickase or catalytically dead enzyme). In some embodiments, a Cas9 enzyme used in a system described herein may comprise mutations that confer nickase activity on the enzyme (e.g., SpyCas9 N863A or H840A) in addition to mutations improving catalytic efficiency (e.g., SpyCas9 R221K, N394K, and/or L1245V). In some embodiments, a Cas9 enzyme used in a system described herein is a SpyCas9 enzyme or derivative that further comprises an N863A mutation to confer nickase activity in addition to R221K and N394K mutations to improve catalytic efficiency.


Table 12 provides parameters to define components for designing gRNA and/or Template RNAs to apply Cas variants listed in Table 8 for gene modifying. The table indicates the validated or predicted protospacer adjacent motif (PAM) requirements and the validated or predicted location of the cut site (relative to the most upstream base of the PAM site). The gRNA for a given enzyme can be assembled by concatenating the crRNA, Tetraloop, and tracrRNA sequences, and further adding a 5′ spacer of a length within Spacer (min) and Spacer (max) that matches a protospacer at a target site. Further, the predicted location of the ssDNA nick at the target is important for designing a PBS sequence of a Template RNA that can anneal to the sequence immediately 5′ of the nick in order to initiate target primed reverse transcription. In some embodiments, a gRNA scaffold described herein comprises a nucleic acid sequence comprising, in the 5′ to 3′ direction, a crRNA of Table 12, a tetraloop from the same row of Table 12, and a tracrRNA from the same row of Table 12, or a sequence having at least 70%, 80%, 85%, 90%, 95%, or 99% identity thereto. In some embodiments, the gRNA or template RNA comprising the scaffold further comprises a gRNA spacer having a length within the Spacer (min) and Spacer (max) indicated in the same row of Table 12. In some embodiments, the gRNA or template RNA having a sequence according to Table 12 is comprised by a system that further comprises a gene modifying polypeptide, wherein the gene modifying polypeptide comprises a Cas domain described in the same row of Table 12.
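The assembly rule described above (5′ spacer, then crRNA, tetraloop, and tracrRNA from the same row) can be sketched in Python. The example uses the SpyCas9 row of Table 12 (PAM NGG, spacer length 20 to 20) with sequences written in the same DNA alphabet as the table. The function names and the dictionary layout are hypothetical, the IUPAC lookup follows the standard nucleotide ambiguity codes, and the spacer used in the demonstration is a placeholder rather than a real target.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "H": "ACT", "D": "AGT", "V": "ACG", "B": "CGT", "N": "ACGT",
}

# SpyCas9 row of Table 12 (dictionary layout is an assumption of this sketch).
SPYCAS9_ROW = {
    "pam": "NGG",
    "spacer_min": 20,
    "spacer_max": 20,
    "crRNA": "GTTTTAGAGCTA",
    "tetraloop": "GAAA",
    "tracrRNA": "TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC",
}


def pam_matches(pam_pattern: str, site: str) -> bool:
    """True if `site` satisfies an IUPAC-coded PAM pattern of the same length."""
    if len(pam_pattern) != len(site):
        return False
    return all(base in IUPAC[code] for code, base in zip(pam_pattern.upper(), site.upper()))


def assemble_grna(spacer: str, row: dict) -> str:
    """Concatenate spacer + crRNA + tetraloop + tracrRNA for one table row."""
    if not row["spacer_min"] <= len(spacer) <= row["spacer_max"]:
        raise ValueError("spacer length is outside Spacer (min)/Spacer (max)")
    return spacer + row["crRNA"] + row["tetraloop"] + row["tracrRNA"]


placeholder_spacer = "G" * 20  # hypothetical spacer, not a real target
grna = assemble_grna(placeholder_spacer, SPYCAS9_ROW)
print(pam_matches(SPYCAS9_ROW["pam"], "TGG"), len(grna))
```

Other rows of Table 12 could be substituted by changing the dictionary values; a PBS design step based on the nick position is not included in this sketch.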









TABLE 12

Parameters to define components for designing gRNA and/or Template RNAs to apply Cas variants listed in Table 8 in gene modifying systems

Variant | PAM(s) | Cut | Tier | Spacer (min) | Spacer (max) | crRNA | SEQ ID NO: | Tetraloop | tracrRNA | SEQ ID NO:
Nme2Cas9 | NNNNCC | −3 | 1 | 22 | 24 | GTTGTAGCTCCCTTTCTCATTTCG | 10,051 | GAAA | CGAAATGAGAACCGTTGCTACAATAAGGCCGTCTGAAAAGATGTGCCGCAACGCTCTGCCCCTTAAAGCTTCTGCTTTAAGGGGCATCGTTTA | 10,151
PpnCas9 | NNNNRTT |  | 1 | 21 | 24 | GTTGTAGCTCCCTTTTTCATTTCGC | 10,052 | GAAA | GCGAAATGAAAAACGTTGTTACAATAAGAGATGAATTTCTCGCAAAGCTCTGCCTCTTGAAATTTCGGTTTCAAGAGGCATC | 10,152
SauCas9 | NNGRR; NNGRRT | −3 | 1 | 21 | 23 | GTTTTAGTACTCTG | 10,053 | GAAA | CAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 10,153
SauCas9-KKH | NNNRR; NNNRRT | −3 | 1 | 21 | 21 | GTTTTAGTACTCTG | 10,054 | GAAA | CAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 10,154
SauriCas9 | NNGG | −3 | 1 | 21 | 21 | GTTTTAGTACTCTG | 10,055 | GAAA | CAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 10,155
SauriCas9-KKH | NNRG | −3 | 1 | 21 | 21 | GTTTTAGTACTCTG | 10,056 | GAAA | CAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 10,156
ScaCas9-Sc++ | NNG | −3 | 1 | 20 | 20 | GTTTTAGAGCTA | 10,057 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,157
SpyCas9 | NGG | −3 | 1 | 20 | 20 | GTTTTAGAGCTA | 10,058 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,158
SpyCas9_i_v1 | NGG | −3 | 1 | 20 | 20 | GTTTTAGAGCTA | 10,058 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGGACTTCGGTCCAAGTGGCACCGAGTCGGTGC | 10,193
SpyCas9_i_v2 | NGG | −3 | 1 | 20 | 20 | GTTTTAGAGCTA | 10,058 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGGAGCTTGCTCCAAGTGGCACCGAGTCGGTGC | 10,194
SpyCas9_i_v3 | NGG | −3 | 1 | 20 | 20 | GTTTTAGAGCTA | 10,058 | GAAA | GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCGACTTGAAAAAGTCGCACCGAGTCGGTGC | 10,195
SpyCas9-NG | NG (NGG = NGA = NGT > NGC) | −3 | 1 | 20 | 20 | GTTTTAGAGCTA | 10,059 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,159
SpyCas9-SpRY | NRN > NYN | −3 | 1 | 20 | 20 | GTTTTAGAGCTA | 10,060 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,160
St1Cas9 | NNAGAAW > NNAGGAW = NNGGAAW | −3 | 1 | 20 | 20 | GTCTTTGTACTCTG | 10,061 | GTAC | CAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTT | 10,161
BlatCas9 | NNNNCNAA > NNNNCNDD > NNNNC | −3 | 1 | 19 | 23 | GCTATAGTTCCTTACT | 10,062 | GAAA | GGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 10,162
cCas9-v16 | NNVACT; NNVATGM; NNVATT; NNVGCT; NNVGTG; NNVGTT | −3 | 2 | 21 | 21 | GTCTTAGTACTCTG | 10,063 | GAAA | CAGAATCTACTAAGACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 10,163
cCas9-v17 | NNVRRN | −3 | 2 | 21 | 21 | GTCTTAGTACTCTG | 10,064 | GAAA | CAGAATCTACTAAGACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 10,164
cCas9-v21 | NNVACT; NNVATGM; NNVATT; NNVGCT; NNVGTG; NNVGTT | −3 | 2 | 21 | 21 | GTCTTAGTACTCTG | 10,065 | GAAA | CAGAATCTACTAAGACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 10,165
cCas9-v42 | NNVRRN | −3 | 2 | 21 | 21 | GTCTTAGTACTCTG | 10,066 | GAAA | CAGAATCTACTAAGACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 10,166
CdiCas9 | NNRHHHY; NNRAAAY |  | 2 | 22 | 22 | ACTGGGGTTCAG | 10,067 | GAAA | CTGAACCTCAGTAAGCATTGGCTCGTTTCCAATGTTGATTGCTCCGCCGGTGCTCCTTATTTTTAAGGGCGCCGGC | 10,167
CjeCas9 | NNNNRYAC | −3 | 2 | 21 | 23 | GTTTTAGTCCCT | 10,068 | GAAA | AGGGACTAAAATAAAGAGTTTGCGGGACTCTGCGGGGTTACAATCCCCTAAAACCGC | 10,168
GeoCas9 | NNNNCRAA |  | 2 | 21 | 23 | GTCATAGTTCCCCTGA | 10,069 | GAAA | TCAGGGTTACTATGATAAGGGCTTTCTGCCTAAGGCAGACTGACCCGCGGCGTTGGGGATCGCCTGTCGCCCGCTTTTGGCGGGCATTCCCCATCCTT | 10,169
iSpyMacCas9 | NAAN | −3 | 2 | 19 | 21 | GTTTTAGAGCTA | 10,070 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,170
NmeCas9 | NNNNGAYT; NNNNGYTT; NNNNGAYA; NNNNGTCT | −3 | 2 | 20 | 24 | GTTGTAGCTCCCTTTCTCATTTCG | 10,071 | GAAA | CGAAATGAGAACCGTTGCTACAATAAGGCCGTCTGAAAAGATGTGCCGCAACGCTCTGCCCCTTAAAGCTTCTGCTTTAAGGGGCATCGTTTA | 10,171
ScaCas9 | NNG | −3 | 2 | 20 | 20 | GTTTTAGAGCTA | 10,072 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,172
ScaCas9-HiFi-Sc++ | NNG | −3 | 2 | 20 | 20 | GTTTTAGAGCTA | 10,073 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,173
SpyCas9-3var-NRRH | NRRH | −3 | 2 | 20 | 20 | GTTTAAGAGCTATGCTG | 10,074 | GAAA | CAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,174
SpyCas9-3var-NRTH | NRTH | −3 | 2 | 20 | 20 | GTTTAAGAGCTATGCTG | 10,075 | GAAA | CAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,175
SpyCas9-3var-NRCH | NRCH | −3 | 2 | 20 | 20 | GTTTAAGAGCTATGCTG | 10,076 | GAAA | CAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,176
SpyCas9-HF1 | NGG | −3 | 2 | 20 | 20 | GTTTTAGAGCTA | 10,077 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,177
SpyCas9-QQR1 | NAAG | −3 | 2 | 20 | 20 | GTTTTAGAGCTA | 10,078 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,178
SpyCas9-SpG | NGN | −3 | 2 | 20 | 20 | GTTTTAGAGCTA | 10,079 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,179
SpyCas9-VQR | NGAN | −3 | 2 | 20 | 20 | GTTTTAGAGCTA | 10,080 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,180
SpyCas9-VRER | NGCG | −3 | 2 | 20 | 20 | GTTTTAGAGCTA | 10,081 | GAAA | TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,181
SpyCas9-xCas | NG; GAA; GAT | −3 | 2 | 20 | 20 | GTTTAAGAGCTATGCTG | 10,082 | GAAA | CAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 10,182






SpyCas9-xCas-
NG
−3
2
20
20
GTTTAAGA
10,083
GAAA
CAGCATAGCAAGTTTAAATAAGGCTAGTC
10,183


NG





GCTATGCT


CGTTATCAACTTGAAAAAGTGGCACCGAG









G


TCGGTGC






St1Cas9-
NNACAA
−3
2
20
20
GTCTTTGTA
10,084
GTAC
CAGAAGCTACAAAGATAAGGCTTCATGCC
10,184


CNRZ1066





CTCTG


GAAATCAACACCCTGTCATTTTATGGCAG












GGTGTTTT






St1Cas9-
NNGCAA
−3
2
20
20
GTCTTTGTA
10,085
GTAC
CAGAAGCTACAAAGATAAGGCTTCATGCC
10,185


LMG1831





CTCTG


GAAATCAACACCCTGTCATTTTATGGCAG












GGTGTTTT






St1Cas9-
NNAAAA
−3
2
20
20
GTCTTTGTA
10,086
GTAC
CAGAAGCTACAAAGATAAGGCTTCATGCC
10,186


MTH17CL396





CTCTG


GAAATCAACACCCTGTCATTTTATGGCAG












GGTGTTTT






St1Cas9-
NNGAAA
−3
2
20
20
GTCTTTGTA
10,087
GTAC
CAGAAGCTACAAAGATAAGGCTTCATGCC
10,187


TH1477





CTCTG


GAAATCAACACCCTGTCATTTTATGGCAG












GGTGTTTT






SRGN3.1
NNGG

1
21
23
GTTTTAGT
10,088
GAAA
CAGAATCTACTGAAACAAGACAATATGTC
10,188








ACTCTG


GTGTTTATCCCATCAATTTATTGGTGGGAT












TTT






sRGN3.3
NNGG

1
21
23
GTTTTAGT
10,089
GAAA
CAGAATCTACTGAAACAAGACAATATGTC
10,189








ACTCTG


GTGTTTATCCCATCAATTTATTGGTGGGAT












TTT









Herein, when an RNA sequence (e.g., a template RNA sequence) is said to comprise a particular sequence (e.g., a sequence of Table 12 or a portion thereof) that comprises thymine (T), it is of course understood that the RNA sequence may (and frequently does) comprise uracil (U) in place of T. For instance, the RNA sequence may comprise U at every position shown as T in the sequence in Table 12. More specifically, the present disclosure provides an RNA sequence according to every gRNA scaffold sequence of Table 12, wherein the RNA sequence has a U in place of each T in the sequence in Table 12. Additionally, it is understood that terminal Us and Ts may optionally be added or removed from tracrRNA sequences and may be modified or unmodified when provided as RNA. Without wishing to be bound by example, versions of gRNA scaffold sequences alternative to those exemplified in Table 12 may also function with the different Cas9 enzymes or derivatives thereof exemplified in Table 8, e.g., alternate gRNA scaffold sequences with nucleotide additions, substitutions, or deletions, e.g., sequences with stem-loop structures added or removed. It is contemplated herein that the gRNA scaffold sequences represent a component of gene modifying systems that can be similarly optimized for a given system, Cas-RT fusion polypeptide, indication, target mutation, template RNA, or delivery vehicle.
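The T-to-U convention described above can be illustrated with a minimal sketch. The function name `to_rna` is hypothetical and not part of the disclosure; the example fragment is the first tracrRNA segment shown for several Cas9 variants in Table 12.

```python
def to_rna(seq: str) -> str:
    """Return the RNA form of a sequence written with T, replacing T with U.

    Case is preserved, so lowercase nucleotides stay lowercase.
    """
    return seq.translate(str.maketrans("Tt", "Uu"))


# A tracrRNA fragment as written (with T) in Table 12.
fragment = "TAGCAAGTTAAAATAAGGCTAGTCCGTTA"
rna_fragment = to_rna(fragment)
print(rna_fragment)
```

The sketch covers only the substitution at every position shown as T; optional addition or removal of terminal Us and chemical modifications are not modeled.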


Heterologous Object Sequence

A template RNA described herein may comprise a heterologous object sequence that the gene modifying polypeptide can use as a template for reverse transcription, to write a desired sequence into the target nucleic acid. In some embodiments, the heterologous object sequence comprises, from 5′ to 3′, a post-edit homology region, the mutation region, and a pre-edit homology region. Without wishing to be bound by theory, an RT performing reverse transcription on the template RNA first reverse transcribes the pre-edit homology region, then the mutation region, and then the post-edit homology region, thereby creating a DNA strand comprising the desired mutation with a homology region on either side.
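As a rough illustration of the ordering described above, the following sketch assembles a heterologous object sequence from its three regions and computes the DNA strand that reverse transcription would produce. The function names and the short toy sequences are assumptions made for illustration only; they are not sequences from the disclosure.

```python
# Maps each RNA base to the DNA base that pairs with it.
_RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")


def build_object_sequence(post_edit: str, mutation: str, pre_edit: str) -> str:
    """Join the regions in the 5'-to-3' order given in the text (RNA letters)."""
    return post_edit + mutation + pre_edit


def reverse_transcribe(rna: str) -> str:
    """Return the cDNA, written 5' to 3', for an RNA template written 5' to 3'.

    The RT reads the template from its 3' end, so the pre-edit homology
    region is copied first and appears at the 5' end of the cDNA.
    """
    return rna.upper().translate(_RNA_TO_CDNA)[::-1]


# Toy regions (hypothetical), written as RNA.
template = build_object_sequence(post_edit="GGAU", mutation="C", pre_edit="AUUG")
cdna = reverse_transcribe(template)
print(template, cdna)
```

In the resulting DNA strand the complement of the pre-edit homology region comes first, followed by the mutation and then the post-edit homology region, so the desired mutation is flanked by homology on either side.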


In some embodiments, the heterologous object sequence is at least 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 120, 140, 160, 180, 200, 500, or 1,000 nucleotides (nts) in length, or at least 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10 kilobases in length. In some embodiments, the heterologous object sequence is no more than 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 120, 140, 160, 180, 200, 500, 1,000, or 2,000 nucleotides (nts) in length, or no more than 20, 15, 10, 9, 8, 7, 6, 5, 4, or 3 kilobases in length.
In some embodiments, the heterologous object sequence is 30-1000, 40-1000, 50-1000, 60-1000, 70-1000, 74-1000, 75-1000, 76-1000, 77-1000, 78-1000, 79-1000, 80-1000, 85-1000, 90-1000, 100-1000, 120-1000, 140-1000, 160-1000, 180-1000, 200-1000, 500-1000, 30-500, 40-500, 50-500, 60-500, 70-500, 74-500, 75-500, 76-500, 77-500, 78-500, 79-500, 80-500, 85-500, 90-500, 100-500, 120-500, 140-500, 160-500, 180-500, 200-500, 30-200, 40-200, 50-200, 60-200, 70-200, 74-200, 75-200, 76-200, 77-200, 78-200, 79-200, 80-200, 85-200, 90-200, 100-200, 120-200, 140-200, 160-200, 180-200, 30-100, 40-100, 50-100, 60-100, 70-100, 74-100, 75-100, 76-100, 77-100, 78-100, 79-100, 80-100, 85-100, or 90-100 nucleotides (nts) in length, or 1-20, 1-15, 1-10, 1-9, 1-8, 1-7, 1-6, 1-5, 1-4, 1-3, 1-2, 2-20, 2-15, 2-10, 2-9, 2-8, 2-7, 2-6, 2-5, 2-4, 2-3, 3-20, 3-15, 3-10, 3-9, 3-8, 3-7, 3-6, 3-5, 3-4, 4-20, 4-15, 4-10, 4-9, 4-8, 4-7, 4-6, 4-5, 5-20, 5-15, 5-10, 5-9, 5-8, 5-7, 5-6, 6-20, 6-15, 6-10, 6-9, 6-8, 6-7, 7-20, 7-15, 7-10, 7-9, 7-8, 8-20, 8-15, 8-10, 8-9, 9-20, 9-15, 9-10, 10-15, 10-20, or 15-20 kilobases in length. In some embodiments, the heterologous object sequence is 10-100, 10-90, 10-80, 10-70, 10-60, 10-50, 10-40, 10-30, or 10-20 nt in length, e.g., 10-80, 10-50, or 10-20 nt in length, e.g., about 10-20 nt in length. In some embodiments, the heterologous object sequence is 8-30, 9-25, 10-20, 11-16, or 12-15 nucleotides in length, e.g., is 11-16 nt in length. Without wishing to be bound by theory, in some embodiments, a larger insertion size, larger region of editing (e.g., the distance between a first edit/substitution and a second edit/substitution in the target region), and/or greater number of desired edits (e.g., mismatches of the heterologous object sequence to the target genome), may result in a longer optimal heterologous object sequence.


In certain embodiments, the template nucleic acid comprises a customized RNA sequence template which can be identified, designed, engineered and constructed to contain sequences altering or specifying host genome function, for example by introducing a heterologous coding region into a genome; affecting or causing exon structure/alternative splicing, e.g., leading to exon skipping of one or more exons; causing disruption of an endogenous gene, e.g., creating a genetic knockout; causing transcriptional activation of an endogenous gene; causing epigenetic regulation of an endogenous DNA; causing up-regulation of one or more operably linked genes, e.g., leading to gene activation or overexpression; causing down-regulation of one or more operably linked genes, e.g., creating a genetic knock-down; etc. In certain embodiments, a customized RNA sequence template can be engineered to contain sequences coding for exons and/or transgenes, provide binding sites for transcription factor activators, repressors, enhancers, etc., and combinations thereof. In some embodiments, a customized template can be engineered to encode a nucleic acid or peptide tag to be expressed in an endogenous RNA transcript or endogenous protein operably linked to the target site. In other embodiments, the coding sequence can be further customized with splice donor sites, splice acceptor sites, or poly-A tails.


The template nucleic acid (e.g., template RNA) of the system typically comprises an object sequence (e.g., a heterologous object sequence) for writing a desired sequence into a target DNA. The object sequence may be coding or non-coding. The template nucleic acid (e.g., template RNA) can be designed to result in insertions, mutations, or deletions at the target DNA locus. In some embodiments, the template nucleic acid (e.g., template RNA) may be designed to cause an insertion in the target DNA. For example, the template nucleic acid (e.g., template RNA) may contain a heterologous sequence, wherein the reverse transcription will result in insertion of the heterologous sequence into the target DNA. In other embodiments, the RNA template may be designed to introduce a deletion into the target DNA. For example, the template nucleic acid (e.g., template RNA) may match the target DNA upstream and downstream of the desired deletion, wherein the reverse transcription will result in the copying of the upstream and downstream sequences from the template nucleic acid (e.g., template RNA) without the intervening sequence, e.g., causing deletion of the intervening sequence. In other embodiments, the template nucleic acid (e.g., template RNA) may be designed to introduce an edit into the target DNA. For example, the template RNA may match the target DNA sequence with the exception of one or more nucleotides, wherein the reverse transcription will result in the copying of these edits into the target DNA, e.g., resulting in mutations, e.g., transition or transversion mutations.


In some embodiments, writing of an object sequence into a target site results in the substitution of nucleotides, e.g., where the full length of the object sequence corresponds to a matching length of the target site with one or more mismatched bases. In some embodiments, a heterologous object sequence may be designed such that a combination of sequence alterations may occur, e.g., a simultaneous addition and deletion, addition and substitution, or deletion and substitution.
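The distinction between insertions, deletions, substitutions, and combined alterations can be sketched by comparing a target sequence with the sequence that would result after writing. The helper `classify_edit` and the toy sequences below are hypothetical and serve only to illustrate the categories; the helper compares sequences written on the same strand and does not model reverse transcription.

```python
def classify_edit(target: str, edited: str) -> str:
    """Classify the change from `target` to `edited` (same strand, 5' to 3')."""
    shortest = min(len(target), len(edited))

    # Length of the shared prefix.
    p = 0
    while p < shortest and target[p] == edited[p]:
        p += 1

    # Length of the shared suffix, not overlapping the shared prefix.
    s = 0
    while s < shortest - p and target[-1 - s] == edited[-1 - s]:
        s += 1

    removed = target[p:len(target) - s]
    added = edited[p:len(edited) - s]

    if not removed and not added:
        return "no edit"
    if not removed:
        return "insertion"
    if not added:
        return "deletion"
    if len(removed) == len(added):
        return "substitution"
    return "combined"


print(classify_edit("AAATATCATTGGTGTTTCC", "AAATATCATCTTTGGTGTTTCC"))
print(classify_edit("AACCGGTT", "AATT"))
print(classify_edit("AACGTT", "AATGTT"))
```

A "combined" result corresponds to a simultaneous addition and deletion of different lengths; a run of mismatches of equal length is reported as a substitution.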


In some embodiments, the heterologous object sequence may contain an open reading frame or a fragment of an open reading frame. In some embodiments the heterologous object sequence has a Kozak sequence. In some embodiments the heterologous object sequence has an internal ribosome entry site. In some embodiments the heterologous object sequence has a self-cleaving peptide such as a T2A or P2A site. In some embodiments the heterologous object sequence has a start codon. In some embodiments the template RNA has a splice acceptor site. In some embodiments the template RNA has a splice donor site. Exemplary splice acceptor and splice donor sites are described in WO2016044416, incorporated herein by reference in its entirety. Exemplary splice acceptor site sequences are known to those of skill in the art. In some embodiments the template RNA has a microRNA binding site downstream of the stop codon. In some embodiments the template RNA has a polyA tail downstream of the stop codon of an open reading frame. In some embodiments the template RNA comprises one or more exons. In some embodiments the template RNA comprises one or more introns. In some embodiments the template RNA comprises a eukaryotic transcriptional terminator. In some embodiments the template RNA comprises an enhanced translation element or a translation enhancing element. In some embodiments the RNA comprises the human T-cell leukemia virus (HTLV-1) R region. In some embodiments the RNA comprises a posttranscriptional regulatory element that enhances nuclear export, such as that of Hepatitis B Virus (HPRE) or Woodchuck Hepatitis Virus (WPRE).


In some embodiments, the heterologous object sequence may contain a non-coding sequence. For example, the template nucleic acid (e.g., template RNA) may comprise a regulatory element, e.g., a promoter or enhancer sequence or miRNA binding site. In some embodiments, integration of the object sequence at a target site will result in upregulation of an endogenous gene. In some embodiments, integration of the object sequence at a target site will result in downregulation of an endogenous gene. In some embodiments the template nucleic acid (e.g., template RNA) comprises a tissue specific promoter or enhancer, each of which may be unidirectional or bidirectional. In some embodiments the promoter is an RNA polymerase I promoter, RNA polymerase II promoter, or RNA polymerase III promoter. In some embodiments the promoter comprises a TATA element. In some embodiments the promoter comprises a B recognition element. In some embodiments the promoter has one or more binding sites for transcription factors.


In some embodiments, the template nucleic acid (e.g., template RNA) comprises a site that coordinates epigenetic modification. In some embodiments, the template nucleic acid (e.g., template RNA) comprises a chromatin insulator. For example, the template nucleic acid (e.g., template RNA) comprises a CTCF site or a site targeted for DNA methylation.


In some embodiments, the template nucleic acid (e.g., template RNA) comprises a gene expression unit composed of at least one regulatory region operably linked to an effector sequence. The effector sequence may be a sequence that is transcribed into RNA (e.g., a coding sequence or a non-coding sequence such as a sequence encoding a micro RNA).


In some embodiments, the heterologous object sequence of the template nucleic acid (e.g., template RNA) is inserted into a target genome in an endogenous intron. In some embodiments, the heterologous object sequence of the template nucleic acid (e.g., template RNA) is inserted into a target genome and thereby acts as a new exon. In some embodiments, the insertion of the heterologous object sequence into the target genome results in replacement of a natural exon or the skipping of a natural exon.


The template nucleic acid (e.g., template RNA) can be designed to result in insertions, mutations, or deletions at the target DNA locus. In some embodiments, the template nucleic acid (e.g., template RNA) may be designed to cause an insertion in the target DNA. For example, the template nucleic acid (e.g., template RNA) may contain a heterologous object sequence, wherein the reverse transcription will result in insertion of the heterologous object sequence into the target DNA. In other embodiments, the RNA template may be designed to write a deletion into the target DNA. For example, the template nucleic acid (e.g., template RNA) may match the target DNA upstream and downstream of the desired deletion, wherein the reverse transcription will result in the copying of the upstream and downstream sequences from the template nucleic acid (e.g., template RNA) without the intervening sequence, e.g., causing deletion of the intervening sequence. In other embodiments, the template nucleic acid (e.g., template RNA) may be designed to write an edit into the target DNA. For example, the template RNA may match the target DNA sequence with the exception of one or more nucleotides, wherein the reverse transcription will result in the copying of these edits into the target DNA, e.g., resulting in mutations, e.g., transition or transversion mutations.


In some embodiments, the pre-edit homology domain comprises a nucleic acid sequence having 100% sequence identity with a nucleic acid sequence comprised in a target nucleic acid molecule.


In some embodiments, the post-edit homology domain comprises a nucleic acid sequence having 100% sequence identity with a nucleic acid sequence comprised in a target nucleic acid molecule.


PBS Sequence

In some embodiments, a template nucleic acid (e.g., template RNA) comprises a PBS sequence. In some embodiments, a PBS sequence is disposed 3′ of the heterologous object sequence and is complementary to a sequence adjacent to a site to be modified by a system described herein, or comprises no more than 1, 2, 3, 4, or 5 mismatches to a sequence complementary to the sequence adjacent to a site to be modified by the system/gene modifying polypeptide. In some embodiments, the PBS sequence binds within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides of a nick site in the target nucleic acid molecule. In some embodiments, binding of the PBS sequence to the target nucleic acid molecule permits initiation of target-primed reverse transcription (TPRT), e.g., with the 3′ homology domain acting as a primer for TPRT. In some embodiments, the PBS sequence is 3-5, 5-10, 10-30, 10-25, 10-20, 10-19, 10-18, 10-17, 10-16, 10-15, 10-14, 10-13, 10-12, 10-11, 11-30, 11-25, 11-20, 11-19, 11-18, 11-17, 11-16, 11-15, 11-14, 11-13, 11-12, 12-30, 12-25, 12-20, 12-19, 12-18, 12-17, 12-16, 12-15, 12-14, 12-13, 13-30, 13-25, 13-20, 13-19, 13-18, 13-17, 13-16, 13-15, 13-14, 14-30, 14-25, 14-20, 14-19, 14-18, 14-17, 14-16, 14-15, 15-30, 15-25, 15-20, 15-19, 15-18, 15-17, 15-16, 16-30, 16-25, 16-20, 16-19, 16-18, 16-17, 17-30, 17-25, 17-20, 17-19, 17-18, 18-30, 18-25, 18-20, 18-19, 19-30, 19-25, 19-20, 20-30, 20-25, or 25-30 nucleotides in length, e.g., 10-17, 12-16, or 12-14 nucleotides in length. In some embodiments, the PBS sequence is 5-20, 8-16, 8-14, 8-13, 9-13, 9-12, or 10-12 nucleotides in length, e.g., 9-12 nucleotides in length.
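The complementarity criterion above can be sketched as a mismatch count between a candidate PBS and the perfectly complementary sequence. This is a minimal sketch under stated assumptions: `target_dna` is taken to be the segment of the target strand to which the PBS anneals, written 5' to 3'; both sequences are assumed to have equal length; and the function names and example sequences are hypothetical.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement_as_rna(dna: str) -> str:
    """Return the RNA (5' to 3') that is perfectly complementary to `dna`."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1].replace("T", "U")


def pbs_mismatches(pbs_rna: str, target_dna: str) -> int:
    """Count positions where the PBS differs from the perfect complement."""
    if len(pbs_rna) != len(target_dna):
        raise ValueError("PBS and target segment must have the same length")
    perfect = reverse_complement_as_rna(target_dna)
    return sum(a != b for a, b in zip(pbs_rna.upper(), perfect))


def pbs_is_acceptable(pbs_rna: str, target_dna: str, max_mismatches: int = 5) -> bool:
    """Apply a mismatch ceiling such as the 'no more than 1, 2, 3, 4, or 5' above."""
    return pbs_mismatches(pbs_rna, target_dna) <= max_mismatches


target_segment = "GAAAATATCATT"  # toy 12-nt target segment
print(pbs_mismatches("AAUGAUAUUUUC", target_segment))
print(pbs_mismatches("AAUGAUAUUUUG", target_segment))
```

The default ceiling of five mismatches mirrors the largest value recited in the paragraph; a stricter design would pass a smaller `max_mismatches`.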


The template nucleic acid (e.g., template RNA) may have some homology to the target DNA. In some embodiments, the template nucleic acid (e.g., template RNA) PBS sequence domain may serve as an annealing region to the target DNA, such that the target DNA is positioned to prime the reverse transcription of the template nucleic acid (e.g., template RNA). In some embodiments the template nucleic acid (e.g., template RNA) has at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 175, 200 or more bases of exact homology to the target DNA at the 3′ end of the RNA. In some embodiments the template nucleic acid (e.g., template RNA) has at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 175, 200 or more bases of at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% homology to the target DNA, e.g., at the 5′ end of the template nucleic acid (e.g., template RNA).


Exemplary Template Sequences

In some embodiments of the systems and methods herein, the template RNA comprises a gRNA spacer comprising the core nucleotides of a gRNA spacer sequence of Table 1. In some embodiments, the gRNA spacer additionally comprises one or more (e.g., 2, 3, or all) consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the gRNA spacer. In some embodiments, the template RNA comprising a sequence of Table 1 is comprised by a system that further comprises a gene modifying polypeptide having an RT domain listed in the same line of Table 1. RT domain amino acid sequences can be found, e.g., in Table 6 and F3 herein.









TABLE 1
Exemplary gRNA spacer Cas pairs

| ID | PAM Sequence | SEQ ID NO | gRNA Spacer | SEQ ID NO | Cas Species | distance | Overlaps Mutation |
|---|---|---|---|---|---|---|---|
| 10 | GTG | | CATTAAAGAAAATATCATTG | 16921 | ScaCas9-Sc++ | 1 | 0 |
| 11 | ATG | | ATTCATCATAGGAAACACCA | 16922 | ScaCas9-Sc++ | 1 | 1 |
| 12 | GTG | | CATTAAAGAAAATATCATTG | 16923 | SpyCas9-SpRY | 1 | 0 |
| 13 | ATG | | ATTCATCATAGGAAACACCA | 16924 | SpyCas9-SpRY | 1 | 1 |
| 14 | GTGTTTC | | ACCATTAAAGAAAATATCATTG | 16925 | CdiCas9 | 1 | 0 |
| 15 | ATGATAT | | ATATTCATCATAGGAAACACCA | 16926 | CdiCas9 | 1 | 1 |
| 16 | GTG | | CATTAAAGAAAATATCATTG | 16927 | ScaCas9 | 1 | 0 |
| 17 | ATG | | ATTCATCATAGGAAACACCA | 16928 | ScaCas9 | 1 | 1 |
| 18 | GTG | | CATTAAAGAAAATATCATTG | 16929 | ScaCas9-HiFi-Sc++ | 1 | 0 |
| 19 | ATG | | ATTCATCATAGGAAACACCA | 16930 | ScaCas9-HiFi-Sc++ | 1 | 1 |
| 20 | AATGA | | ATATTCATCATAGGAAACACC | 16931 | SauCas9KKH | 2 | 1 |
| 21 | AATGAT | | ATATTCATCATAGGAAACACC | 16932 | SauCas9KKH | 2 | 1 |
| 22 | GG | | CCATTAAAGAAAATATCATT | 16933 | SpyCas9-NG | 2 | 0 |
| 23 | GGT | | CCATTAAAGAAAATATCATT | 16934 | SpyCas9-SpRY | 2 | 0 |
| 24 | AAT | | TATTCATCATAGGAAACACC | 16935 | SpyCas9-SpRY | 2 | 1 |
| 25 | GGT | | CCATTAAAGAAAATATCATT | 16936 | SpyCas9-SpG | 2 | 0 |
| 26 | GG | | CCATTAAAGAAAATATCATT | 16937 | SpyCas9-xCas | 2 | 0 |
| 27 | GG | | CCATTAAAGAAAATATCATT | 16938 | SpyCas9-xCas-NG | 2 | 0 |
| 28 | TGGTGTT | | tggCACCATTAAAGAAAATATCAT | 16939 | PpnCas9 | 3 | 0 |
| 29 | TGG | | ACCATTAAAGAAAATATCAT | 16940 | ScaCas9-Sc++ | 3 | 0 |
| 30 | TGG | | ACCATTAAAGAAAATATCAT | 16941 | SpyCas9 | 3 | 0 |
| 31 | TG | | ACCATTAAAGAAAATATCAT | 16942 | SpyCas9-NG | 3 | 0 |
| 32 | TGG | | ACCATTAAAGAAAATATCAT | 16943 | SpyCas9-SpRY | 3 | 0 |
| 33 | CAA | | ATATTCATCATAGGAAACAC | 16944 | SpyCas9-SpRY | 3 | 1 |
| 34 | CAAT | | taTATTCATCATAGGAAACAC | 16945 | iSpyMacCas9 | 3 | 1 |
| 35 | TGGTGTTT | | tggcACCATTAAAGAAAATATCAT | 16946 | NmeCas9 | 3 | 0 |
| 36 | CAATGATA | | atctATATTCATCATAGGAAACAC | 16947 | NmeCas9 | 3 | 1 |
| 37 | TGG | | ACCATTAAAGAAAATATCAT | 16948 | ScaCas9 | 3 | 0 |
| 38 | TGG | | ACCATTAAAGAAAATATCAT | 16949 | ScaCas9-HiFi-Sc++ | 3 | 0 |
| 39 | TGGT | | ACCATTAAAGAAAATATCAT | 16950 | SpyCas9-3var-NRRH | 3 | 0 |
| 40 | CAAT | | ATATTCATCATAGGAAACAC | 16951 | SpyCas9-3var-NRRH | 3 | 1 |
| 41 | TGG | | ACCATTAAAGAAAATATCAT | 16952 | SpyCas9-HF1 | 3 | 0 |
| 42 | TGG | | ACCATTAAAGAAAATATCAT | 16953 | SpyCas9-SpG | 3 | 0 |
| 43 | TG | | ACCATTAAAGAAAATATCAT | 16954 | SpyCas9-xCas | 3 | 0 |
| 44 | TG | | ACCATTAAAGAAAATATCAT | 16955 | SpyCas9-xCas-NG | 3 | 0 |
| 45 | TTGG | | GCACCATTAAAGAAAATATCA | 16956 | SauriCas9 | 4 | 1 |
| 46 | TTGG | | GCACCATTAAAGAAAATATCA | 16957 | SauriCas9-KKH | 4 | 1 |
| 47 | TTG | | CACCATTAAAGAAAATATCA | 16958 | ScaCas9-Sc++ | 4 | 1 |
| 48 | TTG | | CACCATTAAAGAAAATATCA | 16959 | SpyCas9-SpRY | 4 | 1 |
| 49 | CCA | | TATATTCATCATAGGAAACA | 16960 | SpyCas9-SpRY | 4 | 0 |
| 50 | TTGGTG | | GCACCATTAAAGAAAATATCA | 16961 | cCas9-v16 | 4 | 1 |
| 51 | CCAATGA | | CTATATTCATCATAGGAAACA | 16962 | cCas9-v16 | 4 | 1 |
| 52 | TTGGTG | | GCACCATTAAAGAAAATATCA | 16963 | cCas9-v21 | 4 | 1 |
| 53 | CCAATGA | | CTATATTCATCATAGGAAACA | 16964 | cCas9-v21 | 4 | 1 |
| 54 | TTG | | CACCATTAAAGAAAATATCA | 16965 | ScaCas9 | 4 | 1 |
| 55 | TTG | | CACCATTAAAGAAAATATCA | 16966 | ScaCas9-HiFi-Sc++ | 4 | 1 |
| 56 | ATTGG | | GGCACCATTAAAGAAAATATC | 16967 | SauCas9KKH | 5 | 1 |
| 57 | ATTGGT | | GGCACCATTAAAGAAAATATC | 16968 | SauCas9KKH | 5 | 1 |
| 58 | ACCAA | | TCTATATTCATCATAGGAAAC | 16969 | SauCas9KKH | 5 | 1 |
| 59 | ACCAAT | | TCTATATTCATCATAGGAAAC | 16970 | SauCas9KKH | 5 | 1 |
| 60 | ATT | | GCACCATTAAAGAAAATATC | 16971 | SpyCas9-SpRY | 5 | 1 |
| 61 | ACC | | CTATATTCATCATAGGAAAC | 16972 | SpyCas9-SpRY | 5 | 0 |
| 62 | ACCAAT | | TCTATATTCATCATAGGAAAC | 16973 | cCas9-v17 | 5 | 1 |
| 63 | ACCAAT | | TCTATATTCATCATAGGAAAC | 16974 | cCas9-v42 | 5 | 1 |
| 64 | CAT | | GGCACCATTAAAGAAAATAT | 16975 | SpyCas9-SpRY | 6 | 1 |
| 65 | CAC | | TCTATATTCATCATAGGAAA | 16976 | SpyCas9-SpRY | 6 | 0 |
| 66 | CATT | | GGCACCATTAAAGAAAATAT | 16977 | SpyCas9-3var-NRTH | 6 | 1 |
| 67 | CACC | | TCTATATTCATCATAGGAAA | 16978 | SpyCas9-3var-NRCH | 6 | 0 |
| 68 | TCA | | TGGCACCATTAAAGAAAATA | 16979 | SpyCas9-SpRY | 7 | 0 |
| 69 | ACA | | ATCTATATTCATCATAGGAA | 16980 | SpyCas9-SpRY | 7 | 0 |
| 70 | ACACCAAT | | tgtaTCTATATTCATCATAGGAA | 16981 | BlatCas9 | 7 | 1 |
| 71 | ACACC | | tgtaTCTATATTCATCATAGGAA | 16982 | BlatCas9 | 7 | 0 |
| 72 | AACACC | | tcTGTATCTATATTCATCATAGGA | 16983 | Nme2Cas9 | 8 | 0 |
| 73 | ATC | | CTGGCACCATTAAAGAAAAT | 16984 | SpyCas9-SpRY | 8 | 0 |
| 74 | AAC | | TATCTATATTCATCATAGGA | 16985 | SpyCas9-SpRY | 8 | 0 |
| 75 | AACACCAA | | ctgtATCTATATTCATCATAGGA | 16986 | BlatCas9 | 8 | 1 |
| 76 | AACACCAA | | ctgtATCTATATTCATCATAGGA | 16987 | BlatCas9 | 8 | 1 |
| 77 | AACAC | | ctgtATCTATATTCATCATAGGA | 16988 | BlatCas9 | 8 | 0 |
| 78 | ATCATT | | CCTGGCACCATTAAAGAAAAT | 16989 | cCas9-v16 | 8 | 1 |
| 79 | ATCATT | | CCTGGCACCATTAAAGAAAAT | 16990 | cCas9-v21 | 8 | 1 |
| 80 | AACA | | TATCTATATTCATCATAGGA | 16991 | SpyCas9-3var-NRCH | 8 | 0 |
| 81 | TATCATT | | tatGCCTGGCACCATTAAAGAAAA | 16992 | PpnCas9 | 9 | 1 |
| 82 | TAT | | CCTGGCACCATTAAAGAAAA | 16993 | SpyCas9-SpRY | 9 | 0 |
| 83 | AAA | | GTATCTATATTCATCATAGG | 16994 | SpyCas9-SpRY | 9 | 0 |
| 84 | AAACACC | | CTGTATCTATATTCATCATAGG | 16995 | CdiCas9 | 9 | 0 |
| 85 | AAAC | | tgTATCTATATTCATCATAGG | 16996 | iSpyMacCas9 | 9 | 0 |
| 86 | AAAC | | GTATCTATATTCATCATAGG | 16997 | SpyCas9-3var-NRRH | 9 | 0 |
| 87 | TATC | | CCTGGCACCATTAAAGAAAA | 16998 | SpyCas9-3var-NRTH | 9 | 0 |
| 88 | ATA | | GCCTGGCACCATTAAAGAAA | 16999 | SpyCas9-SpRY | 10 | 0 |
| 89 | GAA | | TGTATCTATATTCATCATAG | 17000 | SpyCas9-SpRY | 10 | 0 |
| 90 | ATATC | | tatgCCTGGCACCATTAAAGAAA | 17001 | BlatCas9 | 10 | 0 |
| 91 | ATATCATT | | tatgCCTGGCACCATTAAAGAAA | 17002 | BlatCas9 | 10 | 1 |
| 92 | GAAAC | | ttctGTATCTATATTCATCATAG | 17003 | BlatCas9 | 10 | 0 |
| 93 | ATATCAT | | ATGCCTGGCACCATTAAAGAAA | 17004 | CdiCas9 | 10 | 1 |
| 94 | GAAACAC | | TCTGTATCTATATTCATCATAG | 17005 | CdiCas9 | 10 | 0 |
| 95 | GAAA | | ctGTATCTATATTCATCATAG | 17006 | iSpyMacCas9 | 10 | 0 |
| 96 | GAAA | | TGTATCTATATTCATCATAG | 17007 | SpyCas9-3var-NRRH | 10 | 0 |
| 97 | GAA | | TGTATCTATATTCATCATAG | 17008 | SpyCas9-xCas | 10 | 0 |
| 98 | GGAAA | | TCTGTATCTATATTCATCATA | 17009 | SauCas9KKH | 11 | 0 |
| 99 | GG | | CTGTATCTATATTCATCATA | 17010 | SpyCas9-NG | 11 | 0 |
| 100 | AAT | | TGCCTGGCACCATTAAAGAA | 17011 | SpyCas9-SpRY | 11 | 0 |
| 101 | GGA | | CTGTATCTATATTCATCATA | 17012 | SpyCas9-SpRY | 11 | 0 |
| 102 | GGAAAC | | TCTGTATCTATATTCATCATA | 17013 | cCas9-v17 | 11 | 0 |
| 103 | GGAAAC | | TCTGTATCTATATTCATCATA | 17014 | cCas9-v42 | 11 | 0 |
| 104 | GGAAACAC | | ctTCTGTATCTATATTCATCATA | 17015 | CjeCas9 | 11 | 0 |
| 105 | GGAA | | CTGTATCTATATTCATCATA | 17016 | SpyCas9-3var-NRRH | 11 | 0 |
| 106 | AATA | | TGCCTGGCACCATTAAAGAA | 17017 | SpyCas9-3var-NRTH | 11 | 0 |
| 107 | GGA | | CTGTATCTATATTCATCATA | 17018 | SpyCas9-SpG | 11 | 0 |
| 108 | GGAA | | CTGTATCTATATTCATCATA | 17019 | SpyCas9-VQR | 11 | 0 |
| 109 | GG | | CTGTATCTATATTCATCATA | 17020 | SpyCas9-xCas | 11 | 0 |
| 110 | GG | | CTGTATCTATATTCATCATA | 17021 | SpyCas9-xCas-NG | 11 | 0 |
| 111 | AGGAA | | gcTTCTGTATCTATATTCATCAT | 17022 | SauCas9 | 12 | 0 |
| 112 | AGGAA | | TTCTGTATCTATATTCATCAT | 17023 | SauCas9KKH | 12 | 0 |
| 113 | AGG | | TCTGTATCTATATTCATCAT | 17024 | ScaCas9-Sc++ | 12 | 0 |
| 114 | AGG | | TCTGTATCTATATTCATCAT | 17025 | SpyCas9 | 12 | 0 |
| 115 | AG | | TCTGTATCTATATTCATCAT | 17026 | SpyCas9-NG | 12 | 0 |
| 116 | AAA | | ATGCCTGGCACCATTAAAGA | 17027 | SpyCas9-SpRY | 12 | 0 |
| 117 | AGG | | TCTGTATCTATATTCATCAT | 17028 | SpyCas9-SpRY | 12 | 0 |
| 118 | AGGAAA | | TTCTGTATCTATATTCATCAT | 17029 | cCas9-v17 | 12 | 0 |
| 119 | AGGAAA | | TTCTGTATCTATATTCATCAT | 17030 | cCas9-v42 | 12 | 0 |
| 120 | AAATATC | | TTATGCCTGGCACCATTAAAGA | 17031 | CdiCas9 | 12 | 0 |
| 121 | AGGAAAC | | CTTCTGTATCTATATTCATCAT | 17032 | CdiCas9 | 12 | 0 |
| 122 | AGGAAAC | | CTTCTGTATCTATATTCATCAT | 17033 | CdiCas9 | 12 | 0 |
| 123 | AAAT | | taTGCCTGGCACCATTAAAGA | 17034 | iSpyMacCas9 | 12 | 0 |
| 124 | AGG | | TCTGTATCTATATTCATCAT | 17035 | ScaCas9 | 12 | 0 |
| 125 | AGG | | TCTGTATCTATATTCATCAT | 17036 | ScaCas9-HiFi-Sc++ | 12 | 0 |
| 126 | AAAT | | ATGCCTGGCACCATTAAAGA | 17037 | SpyCas9-3var-NRRH | 12 | 0 |
| 127 | AGGA | | TCTGTATCTATATTCATCAT | 17038 | SpyCas9-3var-NRRH | 12 | 0 |
| 128 | AGG | | TCTGTATCTATATTCATCAT | 17039 | SpyCas9-HF1 | 12 | 0 |
| 129 | AGG | | TCTGTATCTATATTCATCAT | 17040 | SpyCas9-SpG | 12 | 0 |
| 130 | AG | | TCTGTATCTATATTCATCAT | 17041 | SpyCas9-xCas | 12 | 0 |
| 131 | AG | | TCTGTATCTATATTCATCAT | 17042 | SpyCas9-xCas-NG | 12 | 0 |
| 132 | AGGAAA | | TCTGTATCTATATTCATCAT | 17043 | St1Cas9-TH1477 | 12 | 0 |
| 133 | TAGGA | | cgCTTCTGTATCTATATTCATCA | 17044 | SauCas9 | 13 | 0 |
| 134 | TAGGA | | CTTCTGTATCTATATTCATCA | 17045 | SauCas9KKH | 13 | 0 |
| 135 | TAGG | | CTTCTGTATCTATATTCATCA | 17046 | SauriCas9 | 13 | 0 |
| 136 | TAGG | | CTTCTGTATCTATATTCATCA | 17047 | SauriCas9-KKH | 13 | 0 |
| 137 | TAG | | TTCTGTATCTATATTCATCA | 17048 | ScaCas9-Sc++ | 13 | 0 |
| 138 | AAA | | TATGCCTGGCACCATTAAAG | 17049 | SpyCas9-SpRY | 13 | 0 |
| 139 | TAG | | TTCTGTATCTATATTCATCA | 17050 | SpyCas9-SpRY | 13 | 0 |
| 140 | TAGGAAA | | TTCTGTATCTATATTCATCA | 17051 | St1Cas9 | 13 | 0 |
| 141 | TAGGAA | | CTTCTGTATCTATATTCATCA | 17052 | cCas9-v17 | 13 | 0 |
| 142 | TAGGAA | | CTTCTGTATCTATATTCATCA | 17053 | cCas9-v42 | 13 | 0 |
| 143 | AAAATAT | | ATTATGCCTGGCACCATTAAAG | 17054 | CdiCas9 | 13 | 0 |
| 144 | AAAA | | ttATGCCTGGCACCATTAAAG | 17055 | iSpyMacCas9 | 13 | 0 |
| 145 | TAG | | TTCTGTATCTATATTCATCA | 17056 | ScaCas9 | 13 | 0 |
| 146 | TAG | | TTCTGTATCTATATTCATCA | 17057 | ScaCas9-HiFi-Sc++ | 13 | 0 |
| 147 | AAAA | | TATGCCTGGCACCATTAAAG | 17058 | SpyCas9-3var-NRRH | 13 | 0 |
| 148 | GAAAA | | ATTATGCCTGGCACCATTAAA | 17059 | SauCas9KKH | 14 | 0 |
| 149 | GAAAAT | | ATTATGCCTGGCACCATTAAA | 17060 | SauCas9KKH | 14 | 0 |
| 150 | ATAGG | | GCTTCTGTATCTATATTCATC | 17061 | SauCas9KKH | 14 | 0 |
| 151 | ATAG | | GCTTCTGTATCTATATTCATC | 17062 | SauriCas9-KKH | 14 | 0 |
| 152 | GAA | | TTATGCCTGGCACCATTAAA | 17063 | SpyCas9-SpRY | 14 | 0 |
| 153 | ATA | | CTTCTGTATCTATATTCATC | 17064 | SpyCas9-SpRY | 14 | 0 |
| 154 | ATAGGAA | | CTTCTGTATCTATATTCATC | 17065 | St1Cas9 | 14 | 0 |
| 155 | GAAAAT | | ATTATGCCTGGCACCATTAAA | 17066 | cCas9-v17 | 14 | 0 |
| 156 | ATAGGA | | GCTTCTGTATCTATATTCATC | 17067 | cCas9-v17 | 14 | 0 |
| 157 | GAAAAT | | ATTATGCCTGGCACCATTAAA | 17068 | cCas9-v42 | 14 | 0 |
| 158 | ATAGGA | | GCTTCTGTATCTATATTCATC | 17069 | cCas9-v42 | 14 | 0 |
| 159 | GAAA | | atTATGCCTGGCACCATTAAA | 17070 | iSpyMacCas9 | 14 | 0 |
| 160 | GAAA | | TTATGCCTGGCACCATTAAA | 17071 | SpyCas9-3var-NRRH | 14 | 0 |
| 161 | GAA | | TTATGCCTGGCACCATTAAA | 17072 | SpyCas9-xCas | 14 | 0 |
| 162 | AGAAA | | GATTATGCCTGGCACCATTAA | 17073 | SauCas9KKH | 15 | 0 |
| 163 | CATAG | | CGCTTCTGTATCTATATTCAT | 17074 | SauCas9KKH | 15 | 0 |
| 164 | AG | | ATTATGCCTGGCACCATTAA | 17075 | SpyCas9-NG | 15 | 0 |
| 165 | AGA | | ATTATGCCTGGCACCATTAA | 17076 | SpyCas9-SpRY | 15 | 0 |
| 166 | CAT | | GCTTCTGTATCTATATTCAT | 17077 | SpyCas9-SpRY | 15 | 0 |
| 167 | AGAAAA | | GATTATGCCTGGCACCATTAA | 17078 | cCas9-v17 | 15 | 0 |
| 168 | AGAAAA | | GATTATGCCTGGCACCATTAA | 17079 | cCas9-v42 | 15 | 0 |
| 169 | AGAAAAT | | GGATTATGCCTGGCACCATTAA | 17080 | CdiCas9 | 15 | 0 |
| 170 | AGAAAAT | | GGATTATGCCTGGCACCATTAA | 17081 | CdiCas9 | 15 | 0 |
| 171 | AGAA | | ATTATGCCTGGCACCATTAA | 17082 | SpyCas9-3var-NRRH | 15 | 0 |
| 172 | CATA | | GCTTCTGTATCTATATTCAT | 17083 | SpyCas9-3var-NRTH | 15 | 0 |
| 173 | AGA | | ATTATGCCTGGCACCATTAA | 17084 | SpyCas9-SpG | 15 | 0 |
| 174 | AGAA | | ATTATGCCTGGCACCATTAA | 17085 | SpyCas9-VQR | 15 | 0 |
| 175 | AG | | ATTATGCCTGGCACCATTAA | 17086 | SpyCas9-xCas | 15 | 0 |
| 176 | AG | | ATTATGCCTGGCACCATTAA | 17087 | SpyCas9-xCas-NG | 15 | 0 |
| 177 | AGAAAA | | ATTATGCCTGGCACCATTAA | 17088 | St1Cas9-MTH17CL396 | 15 | 0 |
| 178 | AAGAA | | ctGGATTATGCCTGGCACCATTA | 17089 | SauCas9 | 16 | 0 |
| 179 | AAGAA | | GGATTATGCCTGGCACCATTA | 17090 | SauCas9KKH | 16 | 0 |
| 180 | AAG | | GATTATGCCTGGCACCATTA | 17091 | ScaCas9-Sc++ | 16 | 0 |
| 181 | AAG | | GATTATGCCTGGCACCATTA | 17092 | SpyCas9-SpRY | 16 | 0 |
| 182 | TCA | | CGCTTCTGTATCTATATTCA | 17093 | SpyCas9-SpRY | 16 | 0 |
| 183 | AAGAAA | | GGATTATGCCTGGCACCATTA | 17094 | cCas9-v17 | 16 | 0 |
| 184 | AAGAAA | | GGATTATGCCTGGCACCATTA | 17095 | cCas9-v42 | 16 | 0 |
| 185 | AAG | | GATTATGCCTGGCACCATTA | 17096 | ScaCas9 | 16 | 0 |
| 186 | AAG | | GATTATGCCTGGCACCATTA | 17097 | ScaCas9-HiFi-Sc++ | 16 | 0 |
| 187 | AAGA | | GATTATGCCTGGCACCATTA | 17098 | SpyCas9-3var-NRRH | 16 | 0 |
| 188 | AAGAAA | | GATTATGCCTGGCACCATTA | 17099 | St1Cas9-TH1477 | 16 | 0 |
| 189 | AAAGA | | TGGATTATGCCTGGCACCATT | 17100 | SauCas9KKH | 17 | 0 |
| 190 | AAAG | | TGGATTATGCCTGGCACCATT | 17101 | SauriCas9-KKH | 17 | 0 |
| 191 | AAA | | GGATTATGCCTGGCACCATT | 17102 | SpyCas9-SpRY | 17 | 0 |
| 192 | ATC | | ACGCTTCTGTATCTATATTC | 17103 | SpyCas9-SpRY | 17 | 0 |
| 193 | AAAGAAA | | GGATTATGCCTGGCACCATT | 17104 | St1Cas9 | 17 | 0 |
| 194 | AAAGAA | | TGGATTATGCCTGGCACCATT | 17105 | cCas9-v17 | 17 | 0 |
| 195 | AAAGAA | | TGGATTATGCCTGGCACCATT | 17106 | cCas9-v42 | 17 | 0 |
| 196 | AAAG | | tgGATTATGCCTGGCACCATT | 17107 | iSpyMacCas9 | 17 | 0 |
| 197 | AAAG | | GGATTATGCCTGGCACCATT | 17108 | SpyCas9-QQR1 | 17 | 0 |
| 198 | TAAAG | | CTGGATTATGCCTGGCACCAT | 17109 | SauCas9KKH | 18 | 0 |
| 199 | TAA | | TGGATTATGCCTGGCACCAT | 17110 | SpyCas9-SpRY | 18 | 0 |







200
CAT

GACGCTTCTGTATCTATATT
17111
SpyCas9-
18
0







SpRY







201
TAAAGA

CTGGATTATGCCTGGCACC
17112
cCas9-v17
18
0





AT









202
TAAAGA

CTGGATTATGCCTGGCACC
17113
cCas9-v42
18
0





AT









203
TAAA

ctGGATTATGCCTGGCACCA
17114
iSpyMacCas9
18
0





T









204
TAAA

TGGATTATGCCTGGCACCA
17115
SpyCas9-
18
0





T

3var-









NRRH







205
CATC

GACGCTTCTGTATCTATATT
17116
SpyCas9-
18
0







3var-









NRTH







206
TTAAA

CCTGGATTATGCCTGGCAC
17117
SauCas9KKH
19
0





CA









207
TTA

CTGGATTATGCCTGGCACC
17118
SpyCas9-
19
0





A

SpRY







208
TCA

TGACGCTTCTGTATCTATAT
17119
SpyCas9-
19
0







SpRY







209
TCATC

tgatGACGCTTCTGTATCTAT
17120
BlatCas9
19
0





AT









210
TCATCAT

tgatGACGCTTCTGTATCTAT
17121
BlatCas9
19
0



A

AT









211
TTAAAG

CCTGGATTATGCCTGGCAC
17122
cCas9-v17
19
0





CA









212
TTAAAG

CCTGGATTATGCCTGGCAC
17123
cCas9-v42
19
0





CA









213
TCATCAT

GATGACGCTTCTGTATCTAT
17124
CdiCas9
19
0





AT









214
ATTAA

TCCTGGATTATGCCTGGCA
17125
SauCas9KKH
20
0





CC









215
ATT

CCTGGATTATGCCTGGCAC
17126
SpyCas9-
20
0





C

SpRY







216
TTC

ATGACGCTTCTGTATCTATA
17127
SpyCas9-
20
0







SpRY







217
CAT

TCCTGGATTATGCCTGGCA
17128
SpyCas9-
21
0





C

SpRY







218
ATT

GATGACGCTTCTGTATCTAT
17129
SpyCas9-
21
0







SpRY







219
CATT

TCCTGGATTATGCCTGGCA
17130
SpyCas9-
21
0





C

3var-









NRTH







220
CCA

TTCCTGGATTATGCCTGGCA
17131
SpyCas9-
22
0







SpRY







221
TAT

TGATGACGCTTCTGTATCTA
17132
SpyCas9-
22
0







SpRY







222
TATTC

ctttGATGACGCTTCTGTATCT
17133
BlatCas9
22
0





A









223
TATT

TGATGACGCTTCTGTATCTA
17134
SpyCas9-
22
0







3var-









NRTH







224
ACC

TTTCCTGGATTATGCCTGGC
17135
SpyCas9-
23
0







SpRY







225
ATA

TTGATGACGCTTCTGTATCT
17136
SpyCas9-
23
0







SpRY







226
ACCATT

TTTTCCTGGATTATGCCTGG
17137
cCas9-v16
23
0





C









227
ACCATT

TTTTCCTGGATTATGCCTGG
17138
cCas9-v21
23
0





C









228
CACCATT

tcaGTTTTCCTGGATTATGCC
17139
PpnCas9
24
0





TGG









229
CAC

TTTTCCTGGATTATGCCTGG
17140
SpyCas9-
24
0







SpRY







230
TAT

TTTGATGACGCTTCTGTATC
17141
SpyCas9-
24
0







SpRY







231
TATA

TTTGATGACGCTTCTGTATC
17142
SpyCas9-
24
0







3var-









NRTH







232
CACC

TTTTCCTGGATTATGCCTGG
17143
SpyCas9-
24
0







3var-









NRCH







233
CTATATT

catGCTTTGATGACGCTTCTG
17144
PpnCas9
25
0





TAT









234
GCA

GTTTTCCTGGATTATGCCTG
17145
SpyCas9-
25
0







SpRY







235
CTA

CTTTGATGACGCTTCTGTAT
17146
SpyCas9-
25
0







SpRY







236
GCACCAT

tcagTTTTCCTGGATTATGCC
17147
BlatCas9
25
0



T

TG









237
GCACC

tcagTTTTCCTGGATTATGCC
17148
BlatCas9
25
0





TG









238
GCACCAT

CAGTTTTCCTGGATTATGCC
17149
CdiCas9
25
0





TG









239
CTATATT

TGCTTTGATGACGCTTCTGT
17150
CdiCas9
25
0





AT









240
GGCACC

tcTCAGTTTTCCTGGATTATG
17151
Nme2Cas9
26
0





CCT









241
GG

AGTTTTCCTGGATTATGCCT
17152
SpyCas9-
26
0







NG







242
GGC

AGTTTTCCTGGATTATGCCT
17153
SpyCas9-
26
0







SpRY







243
TCT

GCTTTGATGACGCTTCTGTA
17154
SpyCas9-
26
0







SpRY







244
GGCACCA

ctcaGTTTTCCTGGATTATGC
17155
BlatCas9
26
0



T

CT









245
GGCAC

ctcaGTTTTCCTGGATTATGC
17156
BlatCas9
26
0





CT









246
GGCA

AGTTTTCCTGGATTATGCCT
17157
SpyCas9-
26
0







3var-









NRCH







247
GGC

AGTTTTCCTGGATTATGCCT
17158
SpyCas9-
26
0







SpG







248
GG

AGTTTTCCTGGATTATGCCT
17159
SpyCas9-
26
0







xCas







249
GG

AGTTTTCCTGGATTATGCCT
17160
SpyCas9-
26
0







xCas-NG







250
TGG

CAGTTTTCCTGGATTATGCC
17161
ScaCas9-
27
0







Sc++







251
TGG

CAGTTTTCCTGGATTATGCC
17162
SpyCas9
27
0





252
TG

CAGTTTTCCTGGATTATGCC
17163
SpyCas9-
27
0







NG







253
TGG

CAGTTTTCCTGGATTATGCC
17164
SpyCas9-
27
0







SpRY







254
ATC

TGCTTTGATGACGCTTCTGT
17165
SpyCas9-
27
0







SpRY







255
TGGCACC

CTCAGTTTTCCTGGATTATG
17166
CdiCas9
27
0





CC









256
TGG

CAGTTTTCCTGGATTATGCC
17167
ScaCas9
27
0





257
TGG

CAGTTTTCCTGGATTATGCC
17168
ScaCas9-
27
0







HiFi-Sc++







258
TGGC

CAGTTTTCCTGGATTATGCC
17169
SpyCas9-
27
0







3var-









NRRH







259
TGG

CAGTTTTCCTGGATTATGCC
17170
SpyCas9-
27
0







HF1







260
TGG

CAGTTTTCCTGGATTATGCC
17171
SpyCas9-
27
0







SpG







261
TG

CAGTTTTCCTGGATTATGCC
17172
SpyCas9-
27
0







xCas







262
TG

CAGTTTTCCTGGATTATGCC
17173
SpyCas9-
27
0







xCas-NG







263
CTGG

CTCAGTTTTCCTGGATTATG
17174
SauriCas9
28
0





C









264
CTGG

CTCAGTTTTCCTGGATTATG
17175
SauriCas9-
28
0





C

KKH







265
CTG

TCAGTTTTCCTGGATTATGC
17176
ScaCas9-
28
0







Sc++







266
CTG

TCAGTTTTCCTGGATTATGC
17177
SpyCas9-
28
0







SpRY







267
TAT

ATGCTTTGATGACGCTTCTG
17178
SpyCas9-
28
0







SpRY







268
CTGGC

ttctCAGTTTTCCTGGATTATG
17179
BlatCas9
28
0





C









269
CTG

TCAGTTTTCCTGGATTATGC
17180
ScaCas9
28
0





270
CTG

TCAGTTTTCCTGGATTATGC
17181
ScaCas9-
28
0







HiFi-Sc++







271
TATC

ATGCTTTGATGACGCTTCTG
17182
SpyCas9-
28
0







3var-









NRTH







272
CCTGG

TCTCAGTTTTCCTGGATTAT
17183
SauCas9KKH
29
0





G









273
CCT

CTCAGTTTTCCTGGATTATG
17184
SpyCas9-
29
0







SpRY







274
GTA

CATGCTTTGATGACGCTTCT
17185
SpyCas9-
29
0







SpRY







275
GTATCTA

tggcATGCTTTGATGACGCTT
17186
BlatCas9
29
0



T

CT









276
GTATC

tggcATGCTTTGATGACGCTT
17187
BlatCas9
29
0





CT









277
CCTGGCA

gtTCTCAGTTTTCCTGGATTA
17188
CjeCas9
29
0



C

TG









278
TG

GCATGCTTTGATGACGCTTC
17189
SpyCas9-
30
0







NG







279
GCC

TCTCAGTTTTCCTGGATTAT
17190
SpyCas9-
30
0







SpRY







280
TGT

GCATGCTTTGATGACGCTTC
17191
SpyCas9-
30
0







SpRY







281
TGTA

GCATGCTTTGATGACGCTTC
17192
SpyCas9-
30
0







3var-









NRTH







282
TGT

GCATGCTTTGATGACGCTTC
17193
SpyCas9-
30
0







SpG







283
TG

GCATGCTTTGATGACGCTTC
17194
SpyCas9-
30
0







xCas







284
TG

GCATGCTTTGATGACGCTTC
17195
SpyCas9-
30
0







xCas-NG







285
CTG

GGCATGCTTTGATGACGCT
17196
ScaCas9-
31
0





T

Sc++







286
TG

TTCTCAGTTTTCCTGGATTA
17197
SpyCas9-
31
0







NG







287
TGC

TTCTCAGTTTTCCTGGATTA
17198
SpyCas9-
31
0







SpRY







288
CTG

GGCATGCTTTGATGACGCT
17199
SpyCas9-
31
0





T

SpRY







289
CTGTATC

TTGGCATGCTTTGATGACG
17200
CdiCas9
31
0





CTT









290
CTG

GGCATGCTTTGATGACGCT
17201
ScaCas9
31
0





T









291
CTG

GGCATGCTTTGATGACGCT
17202
ScaCas9-
31
0





T

HiFi-Sc++







292
TGCC

TTCTCAGTTTTCCTGGATTA
17203
SpyCas9-
31
0







3var-









NRCH







293
TGC

TTCTCAGTTTTCCTGGATTA
17204
SpyCas9-
31
0







SpG







294
TG

TTCTCAGTTTTCCTGGATTA
17205
SpyCas9-
31
0







xCas







295
TG

TTCTCAGTTTTCCTGGATTA
17206
SpyCas9-
31
0







xCas-NG







296
ATG

GTTCTCAGTTTTCCTGGATT
17207
ScaCas9-
32
0







Sc++







297
ATG

GTTCTCAGTTTTCCTGGATT
17208
SpyCas9-
32
0







SpRY







298
TCT

TGGCATGCTTTGATGACGC
17209
SpyCas9-
32
0





T

SpRY







299
ATGCCTG

tctgTTCTCAGTTTTCCTGGAT
17210
BlatCas9
32
0



G

T









300
ATGCC

tctgTTCTCAGTTTTCCTGGAT
17211
BlatCas9
32
0





T









301
ATG

GTTCTCAGTTTTCCTGGATT
17212
ScaCas9
32
0





302
ATG

GTTCTCAGTTTTCCTGGATT
17213
ScaCas9-
32
0







HiFi-Sc++







303
TATGCC

atTCTGTTCTCAGTTTTCCTG
17214
Nme2Cas9
33
0





GAT









304
TAT

TGTTCTCAGTTTTCCTGGAT
17215
SpyCas9-
33
0







SpRY







305
TTC

TTGGCATGCTTTGATGACG
17216
SpyCas9-
33
0





C

SpRY







306
TATGCCT

ttctGTTCTCAGTTTTCCTGGA
17217
BlatCas9
33
0



G

T









307
TATGC

ttctGTTCTCAGTTTTCCTGGA
17218
BlatCas9
33
0





T









308
TTA

CTGTTCTCAGTTTTCCTGGA
17219
SpyCas9-
34
0







SpRY







309
CTT

GTTGGCATGCTTTGATGAC
17220
SpyCas9-
34
0





G

SpRY







310
ATT

TCTGTTCTCAGTTTTCCTGG
17221
SpyCas9-
35
0







SpRY







311
GCT

AGTTGGCATGCTTTGATGA
17222
SpyCas9-
35
0





C

SpRY







312
GCTTC

tctaGTTGGCATGCTTTGATG
17223
BlatCas9
35
0





AC









313
GCTTCTG

tctaGTTGGCATGCTTTGATG
17224
BlatCas9
35
0



T

AC









314
CG

TAGTTGGCATGCTTTGATG
17225
SpyCas9-
36
0





A

NG







315
GAT

TTCTGTTCTCAGTTTTCCTG
17226
SpyCas9-
36
0







SpRY







316
CGC

TAGTTGGCATGCTTTGATG
17227
SpyCas9-
36
0





A

SpRY







317
GATT

TTCTGTTCTCAGTTTTCCTG
17228
SpyCas9-
36
0







3var-









NRTH







318
CGCT

TAGTTGGCATGCTTTGATG
17229
SpyCas9-
36
0





A

3var-









NRCH







319
CGC

TAGTTGGCATGCTTTGATG
17230
SpyCas9-
36
0





A

SpG







320
GAT

TTCTGTTCTCAGTTTTCCTG
17231
SpyCas9-
36
0







xCas







321
CG

TAGTTGGCATGCTTTGATG
17232
SpyCas9-
36
0





A

xCas







322
CG

TAGTTGGCATGCTTTGATG
17233
SpyCas9-
36
0





A

xCas-NG







323
ACG

CTAGTTGGCATGCTTTGATG
17234
ScaCas9-
37
0







Sc++







324
GG

ATTCTGTTCTCAGTTTTCCT
17235
SpyCas9-
37
0







NG







325
GGA

ATTCTGTTCTCAGTTTTCCT
17236
SpyCas9-
37
0







SpRY







326
ACG

CTAGTTGGCATGCTTTGATG
17237
SpyCas9-
37
0







SpRY







327
GGATTAT

TCATTCTGTTCTCAGTTTTC
17238
CdiCas9
37
0





CT









328
ACGCTTC

TTCTAGTTGGCATGCTTTGA
17239
CdiCas9
37
0





TG









329
ACG

CTAGTTGGCATGCTTTGATG
17240
ScaCas9
37
0





330
ACG

CTAGTTGGCATGCTTTGATG
17241
ScaCas9-
37
0







HiFi-Sc++







331
GGAT

ATTCTGTTCTCAGTTTTCCT
17242
SpyCas9-
37
0







3var-









NRRH







332
GGA

ATTCTGTTCTCAGTTTTCCT
17243
SpyCas9-
37
0







SpG







333
GGAT

ATTCTGTTCTCAGTTTTCCT
17244
SpyCas9-
37
0







VQR







334
GG

ATTCTGTTCTCAGTTTTCCT
17245
SpyCas9-
37
0







xCas







335
GG

ATTCTGTTCTCAGTTTTCCT
17246
SpyCas9-
37
0







xCas-NG







336
TGG

CATTCTGTTCTCAGTTTTCC
17247
ScaCas9-
38
0







Sc++







337
TGG

CATTCTGTTCTCAGTTTTCC
17248
SpyCas9
38
0





338
TG

CATTCTGTTCTCAGTTTTCC
17249
SpyCas9-
38
0







NG







339
TGG

CATTCTGTTCTCAGTTTTCC
17250
SpyCas9-
38
0







SpRY







340
GAC

TCTAGTTGGCATGCTTTGAT
17251
SpyCas9-
38
0







SpRY







341
GACGC

tcttCTAGTTGGCATGCTTTGA
17252
BlatCas9
38
0





T









342
TGGATT

TCATTCTGTTCTCAGTTTTC
17253
cCas9-v16
38
0





C









343
GACGCT

TTCTAGTTGGCATGCTTTGA
17254
cCas9-v16
38
0





T









344
TGGATT

TCATTCTGTTCTCAGTTTTC
17255
cCas9-v21
38
0





C









345
GACGCT

TTCTAGTTGGCATGCTTTGA
17256
cCas9-v21
38
0





T









346
TGG

CATTCTGTTCTCAGTTTTCC
17257
ScaCas9
38
0





347
TGG

CATTCTGTTCTCAGTTTTCC
17258
ScaCas9-
38
0







HiFi-Sc++







348
TGGA

CATTCTGTTCTCAGTTTTCC
17259
SpyCas9-
38
0







3var-









NRRH







349
TGG

CATTCTGTTCTCAGTTTTCC
17260
SpyCas9-
38
0







HF1







350
TGG

CATTCTGTTCTCAGTTTTCC
17261
SpyCas9-
38
0







SpG







351
TG

CATTCTGTTCTCAGTTTTCC
17262
SpyCas9-
38
0







xCas







352
TG

CATTCTGTTCTCAGTTTTCC
17263
SpyCas9-
38
0







xCas-NG







353
CTGGATT

aatTTCATTCTGTTCTCAGTT
17264
PpnCas9
39
0





TTC









354
CTGGAT

atTTCATTCTGTTCTCAGTTT
17265
SauCas9
39
0





TC









355
CTGGA

atTTCATTCTGTTCTCAGTTT
17266
SauCas9
39
0





TC









356
CTGGAT

TTCATTCTGTTCTCAGTTTT
17267
SauCas9KKH
39
0





C









357
CTGGA

TTCATTCTGTTCTCAGTTTT
17268
SauCas9KKH
39
0





C









358
CTGG

TTCATTCTGTTCTCAGTTTT
17269
SauriCas9
39
0





C









359
CTGG

TTCATTCTGTTCTCAGTTTT
17270
SauriCas9-
39
0





C

KKH







360
CTG

TCATTCTGTTCTCAGTTTTC
17271
ScaCas9-
39
0







Sc++







361
TG

TTCTAGTTGGCATGCTTTGA
17272
SpyCas9-
39
0







NG







362
CTG

TCATTCTGTTCTCAGTTTTC
17273
SpyCas9-
39
0







SpRY







363
TGA

TTCTAGTTGGCATGCTTTGA
17274
SpyCas9-
39
0







SpRY







364
CTGGAT

TTCATTCTGTTCTCAGTTTT
17275
cCas9-v17
39
0





C









365
CTGGAT

TTCATTCTGTTCTCAGTTTT
17276
cCas9-v42
39
0





C









366
TGACGCT

cctcTTCTAGTTGGCATGCTT
17277
NmeCas9
39
0



T

TGA









367
CTG

TCATTCTGTTCTCAGTTTTC
17278
ScaCas9
39
0





368
CTG

TCATTCTGTTCTCAGTTTTC
17279
ScaCas9-
39
0







HiFi-Sc++







369
TGAC

TTCTAGTTGGCATGCTTTGA
17280
SpyCas9-
39
0







3var-









NRRH







370
TGA

TTCTAGTTGGCATGCTTTGA
17281
SpyCas9-
39
0







SpG







371
TGAC

TTCTAGTTGGCATGCTTTGA
17282
SpyCas9-
39
0







VQR







372
TG

TTCTAGTTGGCATGCTTTGA
17283
SpyCas9-
39
0







xCas







373
TG

TTCTAGTTGGCATGCTTTGA
17284
SpyCas9-
39
0







xCas-NG







374
CCTGG

TTTCATTCTGTTCTCAGTTT
17285
SauCas9KKH
40
0





T









375
ATG

CTTCTAGTTGGCATGCTTTG
17286
ScaCas9-
40
0







Sc++







376
CCT

TTCATTCTGTTCTCAGTTTT
17287
SpyCas9-
40
0







SpRY







377
ATG

CTTCTAGTTGGCATGCTTTG
17288
SpyCas9-
40
0







SpRY







378
ATGAC

cctcTTCTAGTTGGCATGCTT
17289
BlatCas9
40
0





TG









379
CCTGGAT

gaatTTCATTCTGTTCTCAGTT
17290
NmeCas9
40
0



T

TT









380
ATG

CTTCTAGTTGGCATGCTTTG
17291
ScaCas9
40
0





381
ATG

CTTCTAGTTGGCATGCTTTG
17292
ScaCas9-
40
0







HiFi-Sc++







382
GATGA

CTCTTCTAGTTGGCATGCTT
17293
SauCas9KKH
41
0





T









383
TCC

TTTCATTCTGTTCTCAGTTT
17294
SpyCas9-
41
0







SpRY







384
GAT

TCTTCTAGTTGGCATGCTTT
17295
SpyCas9-
41
0







SpRY







385
GAT

TCTTCTAGTTGGCATGCTTT
17296
SpyCas9-
41
0







xCas







386
TG

CTCTTCTAGTTGGCATGCTT
17297
SpyCas9-
42
0







NG







387
TTC

ATTTCATTCTGTTCTCAGTT
17298
SpyCas9-
42
0







SpRY







388
TGA

CTCTTCTAGTTGGCATGCTT
17299
SpyCas9-
42
0







SpRY







389
TGAT

CTCTTCTAGTTGGCATGCTT
17300
SpyCas9-
42
0







3var-









NRRH







390
TGA

CTCTTCTAGTTGGCATGCTT
17301
SpyCas9-
42
0







SpG







391
TGAT

CTCTTCTAGTTGGCATGCTT
17302
SpyCas9-
42
0







VQR







392
TG

CTCTTCTAGTTGGCATGCTT
17303
SpyCas9-
42
0







xCas







393
TG

CTCTTCTAGTTGGCATGCTT
17304
SpyCas9-
42
0







xCas-NG







394
TTG

CCTCTTCTAGTTGGCATGCT
17305
ScaCas9-
43
0







Sc++







395
TTT

AATTTCATTCTGTTCTCAGT
17306
SpyCas9-
43
0







SpRY







396
TTG

CCTCTTCTAGTTGGCATGCT
17307
SpyCas9-
43
0







SpRY







397
TTTCCTG

aagaATTTCATTCTGTTCTCA
17308
BlatCas9
43
0



G

GT









398
TTTCC

aagaATTTCATTCTGTTCTCA
17309
BlatCas9
43
0





GT









399
TTGATGA

ACCTCTTCTAGTTGGCATGC
17310
cCas9-v16
43
0





T









400
TTGATGA

ACCTCTTCTAGTTGGCATGC
17311
cCas9-v21
43
0





T









401
TTG

CCTCTTCTAGTTGGCATGCT
17312
ScaCas9
43
0





402
TTG

CCTCTTCTAGTTGGCATGCT
17313
ScaCas9-
43
0







HiFi-Sc++







403
TTTTCC

ggAAGAATTTCATTCTGTTC
17314
Nme2Cas9
44
0





TCAG









404
TTTGA

TACCTCTTCTAGTTGGCATG
17315
SauCas9KKH
44
0





C









405
TTTGAT

TACCTCTTCTAGTTGGCATG
17316
SauCas9KKH
44
0





C









406
TTT

GAATTTCATTCTGTTCTCAG
17317
SpyCas9-
44
0







SpRY







407
TTT

ACCTCTTCTAGTTGGCATGC
17318
SpyCas9-
44
0







SpRY







408
TTTTC

gaagAATTTCATTCTGTTCTC
17319
BlatCas9
44
0





AG









409
TTTTCCT

gaagAATTTCATTCTGTTCTC
17320
BlatCas9
44
0



G

AG









410
GTT

AGAATTTCATTCTGTTCTCA
17321
SpyCas9-
45
0







SpRY







411
CTT

TACCTCTTCTAGTTGGCATG
17322
SpyCas9-
45
0







SpRY







412
AG

AAGAATTTCATTCTGTTCTC
17323
SpyCas9-
46
0







NG







413
AGT

AAGAATTTCATTCTGTTCTC
17324
SpyCas9-
46
0







SpRY







414
GCT

TTACCTCTTCTAGTTGGCAT
17325
SpyCas9-
46
0







SpRY







415
AGTT

AAGAATTTCATTCTGTTCTC
17326
SpyCas9-
46
0







3var-









NRTH







416
AGT

AAGAATTTCATTCTGTTCTC
17327
SpyCas9-
46
0







SpG







417
AG

AAGAATTTCATTCTGTTCTC
17328
SpyCas9-
46
0







xCas







418
AG

AAGAATTTCATTCTGTTCTC
17329
SpyCas9-
46
0







xCas-NG







419
CAG

GAAGAATTTCATTCTGTTCT
17330
ScaCas9-
47
0







Sc++







420
TG

CTTACCTCTTCTAGTTGGCA
17331
SpyCas9-
47
0







NG







421
CAG

GAAGAATTTCATTCTGTTCT
17332
SpyCas9-
47
0







SpRY







422
TGC

CTTACCTCTTCTAGTTGGCA
17333
SpyCas9-
47
0







SpRY







423
CAGTTTT

TGGAAGAATTTCATTCTGTT
17334
CdiCas9
47
0





CT









424
CAG

GAAGAATTTCATTCTGTTCT
17335
ScaCas9
47
0





425
CAG

GAAGAATTTCATTCTGTTCT
17336
ScaCas9-
47
0







HiFi-Sc++







426
CAGT

GAAGAATTTCATTCTGTTCT
17337
SpyCas9-
47
0







3var-









NRRH







427
TGCT

CTTACCTCTTCTAGTTGGCA
17338
SpyCas9-
47
0







3var-









NRCH







428
TGC

CTTACCTCTTCTAGTTGGCA
17339
SpyCas9-
47
0







SpG







429
TG

CTTACCTCTTCTAGTTGGCA
17340
SpyCas9-
47
0







xCas







430
TG

CTTACCTCTTCTAGTTGGCA
17341
SpyCas9-
47
0







xCas-NG







431
TCAG

TGGAAGAATTTCATTCTGTT
17342
SauriCas9-
48
0





C

KKH







432
ATG

TCTTACCTCTTCTAGTTGGC
17343
ScaCas9-
48
0







Sc++







433
TCA

GGAAGAATTTCATTCTGTTC
17344
SpyCas9-
48
0







SpRY







434
ATG

TCTTACCTCTTCTAGTTGGC
17345
SpyCas9-
48
0







SpRY







435
TCAGTT

TGGAAGAATTTCATTCTGTT
17346
cCas9-v16
48
0





C









436
TCAGTT

TGGAAGAATTTCATTCTGTT
17347
cCas9-v21
48
0





C









437
ATGCTTT

TTTCTTACCTCTTCTAGTTG
17348
CdiCas9
48
0





GC









438
ATG

TCTTACCTCTTCTAGTTGGC
17349
ScaCas9
48
0





439
ATG

TCTTACCTCTTCTAGTTGGC
17350
ScaCas9-
48
0







HiFi-Sc++







440
CTCAGTT

acaGTGGAAGAATTTCATTC
17351
PpnCas9
49
0





TGTT









441
CTCAG

GTGGAAGAATTTCATTCTG
17352
SauCas9KKH
49
0





TT









442
CTCAGT

GTGGAAGAATTTCATTCTG
17353
SauCas9KKH
49
0





TT









443
CTC

TGGAAGAATTTCATTCTGTT
17354
SpyCas9-
49
0







SpRY







444
CAT

TTCTTACCTCTTCTAGTTGG
17355
SpyCas9-
49
0







SpRY







445
CATGC

agttTCTTACCTCTTCTAGTTG
17356
BlatCas9
49
0





G









446
CATGCTT

agttTCTTACCTCTTCTAGTTG
17357
BlatCas9
49
0



T

G









447
CTCAGT

GTGGAAGAATTTCATTCTG
17358
cCas9-v17
49
0





TT









448
CTCAGT

GTGGAAGAATTTCATTCTG
17359
cCas9-v42
49
0





TT









449
CTCAGTT

acagTGGAAGAATTTCATTCT
17360
NmeCas9
49
0



T

GTT









450
TCT

GTGGAAGAATTTCATTCTG
17361
SpyCas9-
50
0





T

SpRY







451
GCA

TTTCTTACCTCTTCTAGTTG
17362
SpyCas9-
50
0







SpRY







452
GCATGCT

atagTTTCTTACCTCTTCTAGT
17363
NmeCas9
50
0



T

TG









453
GG

GTTTCTTACCTCTTCTAGTT
17364
SpyCas9-
51
0







NG







454
TTC

AGTGGAAGAATTTCATTCT
17365
SpyCas9-
51
0





G

SpRY







455
GGC

GTTTCTTACCTCTTCTAGTT
17366
SpyCas9-
51
0







SpRY







456
TTCTCAG

cacaGTGGAAGAATTTCATTC
17367
BlatCas9
51
0



T

TG









457
TTCTC

cacaGTGGAAGAATTTCATTC
17368
BlatCas9
51
0





TG









458
GGCATGC

AGTTTCTTACCTCTTCTAGT
17369
cCas9-v16
51
0





T









459
GGCATGC

AGTTTCTTACCTCTTCTAGT
17370
cCas9-v21
51
0





T









460
GGCA

GTTTCTTACCTCTTCTAGTT
17371
SpyCas9-
51
0







3var-









NRCH







461
GGC

GTTTCTTACCTCTTCTAGTT
17372
SpyCas9-
51
0







SpG







462
GG

GTTTCTTACCTCTTCTAGTT
17373
SpyCas9-
51
0







xCas







463
GG

GTTTCTTACCTCTTCTAGTT
17374
SpyCas9-
51
0







xCas-NG







464
TGG

AGTTTCTTACCTCTTCTAGT
17375
ScaCas9-
52
0







Sc++







465
TGG

AGTTTCTTACCTCTTCTAGT
17376
SpyCas9
52
0





466
TG

AGTTTCTTACCTCTTCTAGT
17377
SpyCas9-
52
0







NG







467
GTT

CAGTGGAAGAATTTCATTC
17378
SpyCas9-
52
0





T

SpRY







468
TGG

AGTTTCTTACCTCTTCTAGT
17379
SpyCas9-
52
0







SpRY







469
TGG

AGTTTCTTACCTCTTCTAGT
17380
ScaCas9
52
0





470
TGG

AGTTTCTTACCTCTTCTAGT
17381
ScaCas9-
52
0







HiFi-Sc++







471
TGGC

AGTTTCTTACCTCTTCTAGT
17382
SpyCas9-
52
0







3var-









NRRH







472
TGG

AGTTTCTTACCTCTTCTAGT
17383
SpyCas9-
52
0







HF1







473
TGG

AGTTTCTTACCTCTTCTAGT
17384
SpyCas9-
52
0







SpG







474
TG

AGTTTCTTACCTCTTCTAGT
17385
SpyCas9-
52
0







xCas







475
TG

AGTTTCTTACCTCTTCTAGT
17386
SpyCas9-
52
0







xCas-NG







476
TTGG

ATAGTTTCTTACCTCTTCTA
17387
SauriCas9
53
0





G









477
TTGG

ATAGTTTCTTACCTCTTCTA
17388
SauriCas9-
53
0





G

KKH







478
TTG

TAGTTTCTTACCTCTTCTAG
17389
ScaCas9-
53
0







Sc++







479
TG

ACAGTGGAAGAATTTCATT
17390
SpyCas9-
53
0





C

NG







480
TGT

ACAGTGGAAGAATTTCATT
17391
SpyCas9-
53
0





C

SpRY







481
TTG

TAGTTTCTTACCTCTTCTAG
17392
SpyCas9-
53
0







SpRY







482
TGTTC

agcaCAGTGGAAGAATTTCA
17393
BlatCas9
53
0





TTC









483
TTGGCAT

acatAGTTTCTTACCTCTTCTA
17394
BlatCas9
53
0



G

G









484
TTGGC

acatAGTTTCTTACCTCTTCTA
17395
BlatCas9
53
0





G









485
TTG

TAGTTTCTTACCTCTTCTAG
17396
ScaCas9
53
0





486
TTG

TAGTTTCTTACCTCTTCTAG
17397
ScaCas9-
53
0







HiFi-Sc++







487
TGTT

ACAGTGGAAGAATTTCATT
17398
SpyCas9-
53
0





C

3var-









NRTH







488
TGT

ACAGTGGAAGAATTTCATT
17399
SpyCas9-
53
0





C

SpG







489
TG

ACAGTGGAAGAATTTCATT
17400
SpyCas9-
53
0





C

xCas







490
TG

ACAGTGGAAGAATTTCATT
17401
SpyCas9-
53
0





C

xCas-NG







491
GTTGG

CATAGTTTCTTACCTCTTCT
17402
SauCas9KKH
54
0





A









492
CTG

CACAGTGGAAGAATTTCAT
17403
ScaCas9-
54
0





T

Sc++







493
CTG

CACAGTGGAAGAATTTCAT
17404
SpyCas9-
54
0





T

SpRY







494
GTT

ATAGTTTCTTACCTCTTCTA
17405
SpyCas9-
54
0







SpRY







495
CTGTTCT

AGCACAGTGGAAGAATTTC
17406
CdiCas9
54
0





ATT









496
CTG

CACAGTGGAAGAATTTCAT
17407
ScaCas9
54
0





T









497
CTG

CACAGTGGAAGAATTTCAT
17408
ScaCas9-
54
0





T

HiFi-Sc++







498
AG

CATAGTTTCTTACCTCTTCT
17409
SpyCas9-
55
0







NG







499
TCT

GCACAGTGGAAGAATTTCA
17410
SpyCas9-
55
0





T

SpRY







500
AGT

CATAGTTTCTTACCTCTTCT
17411
SpyCas9-
55
0







SpRY







501
AGTT

CATAGTTTCTTACCTCTTCT
17412
SpyCas9-
55
0







3var-









NRTH







502
AGT

CATAGTTTCTTACCTCTTCT
17413
SpyCas9-
55
0







SpG







503
AG

CATAGTTTCTTACCTCTTCT
17414
SpyCas9-
55
0







xCas







504
AG

CATAGTTTCTTACCTCTTCT
17415
SpyCas9-
55
0







xCas-NG







505
TTCTGTT

attAAGCACAGTGGAAGAAT
17416
PpnCas9
56
0





TTCA









506
TAG

ACATAGTTTCTTACCTCTTC
17417
ScaCas9-
56
0







Sc++







507
TTC

AGCACAGTGGAAGAATTTC
17418
SpyCas9-
56
0





A

SpRY







508
TAG

ACATAGTTTCTTACCTCTTC
17419
SpyCas9-
56
0







SpRY







509
TAG

ACATAGTTTCTTACCTCTTC
17420
ScaCas9
56
0





510
TAG

ACATAGTTTCTTACCTCTTC
17421
ScaCas9-
56
0







HiFi-Sc++







511
TAGT

ACATAGTTTCTTACCTCTTC
17422
SpyCas9-
56
0







3var-









NRRH







512
CTAG

TCACATAGTTTCTTACCTCT
17423
SauriCas9-
57
0





T

KKH







513
ATT

AAGCACAGTGGAAGAATTT
17424
SpyCas9-
57
0





C

SpRY







514
CTA

CACATAGTTTCTTACCTCTT
17425
SpyCas9-
57
0







SpRY







515
CTAGTT

TCACATAGTTTCTTACCTCT
17426
cCas9-v16
57
0





T









516
CTAGTT

TCACATAGTTTCTTACCTCT
17427
cCas9-v21
57
0





T









517
TCTAGTT

gttTTCACATAGTTTCTTACC
17428
PpnCas9
58
0





TCT









518
TCTAG

TTCACATAGTTTCTTACCTC
17429
SauCas9KKH
58
0





T









519
TCTAGT

TTCACATAGTTTCTTACCTC
17430
SauCas9KKH
58
0





T









520
CAT

TAAGCACAGTGGAAGAATT
17431
SpyCas9-
58
0





T

SpRY







521
TCT

TCACATAGTTTCTTACCTCT
17432
SpyCas9-
58
0







SpRY







522
CATTC

aattAAGCACAGTGGAAGAAT
17433
BlatCas9
58
0





TT









523
CATTCTG

aattAAGCACAGTGGAAGAAT
17434
BlatCas9
58
0



T

TT









524
CATT

TAAGCACAGTGGAAGAATT
17435
SpyCas9-
58
0





T

3var-









NRTH







525
TCA

TTAAGCACAGTGGAAGAAT
17436
SpyCas9-
59
0





T

SpRY







526
TTC

TTCACATAGTTTCTTACCTC
17437
SpyCas9-
59
0







SpRY







527
TCATTCT

AATTAAGCACAGTGGAAGA
17438
CdiCas9
59
0





ATT









528
TTC

ATTAAGCACAGTGGAAGAA
17439
SpyCas9-
60
0





T

SpRY







529
CTT

TTTCACATAGTTTCTTACCT
17440
SpyCas9-
60
0







SpRY







530
TTCATT

AATTAAGCACAGTGGAAGA
17441
cCas9-v16
60
0





AT









531
TTCATT

AATTAAGCACAGTGGAAGA
17442
cCas9-v21
60
0





AT









532
TTTCATT

gtaAAATTAAGCACAGTGGA
17443
PpnCas9
61
0





AGAA









533
TTT

AATTAAGCACAGTGGAAGA
17444
SpyCas9-
61
0





A

SpRY







534
TCT

TTTTCACATAGTTTCTTACC
17445
SpyCas9-
61
0







SpRY







535
TCTTC

aagtTTTCACATAGTTTCTTA
17446
BlatCas9
61
0





CC









536
TCTTCTA

aagtTTTCACATAGTTTCTTA
17447
BlatCas9
61
0



G

CC









537
ATT

AAATTAAGCACAGTGGAAG
17448
SpyCas9-
62
0





A

SpRY







538
CTC

GTTTTCACATAGTTTCTTAC
17449
SpyCas9-
62
0







SpRY







539
ATTTC

gtaaAATTAAGCACAGTGGA
17450
BlatCas9
62
0





AGA









540
ATTTCAT

gtaaAATTAAGCACAGTGGA
17451
BlatCas9
62
0



T

AGA









541
AAT

AAAATTAAGCACAGTGGAA
17452
SpyCas9-
63
0





G

SpRY







542
CCT

AGTTTTCACATAGTTTCTTA
17453
SpyCas9-
63
0







SpRY







543
AATT

AAAATTAAGCACAGTGGAA
17454
SpyCas9-
63
0





G

3var-









NRTH







544
GAA

TAAAATTAAGCACAGTGGA
17455
SpyCas9-
64
0





A

SpRY







545
ACC

AAGTTTTCACATAGTTTCTT
17456
SpyCas9-
64
0







SpRY







546
ACCTC

aaaaAGTTTTCACATAGTTTC
17457
BlatCas9
64
0





TT









547
GAATTTC

GGTAAAATTAAGCACAGTG
17458
CdiCas9
64
0





GAA









548
GAAT

gtAAAATTAAGCACAGTGGA
17459
iSpyMacCas9
64
0





A









549
GAAT

TAAAATTAAGCACAGTGGA
17460
SpyCas9-
64
0





A

3var-









NRRH







550
GAA

TAAAATTAAGCACAGTGGA
17461
SpyCas9-
64
0





A

xCas







551
AG

GTAAAATTAAGCACAGTGG
17462
SpyCas9-
65
0





A

NG







552
AGA

GTAAAATTAAGCACAGTGG
17463
SpyCas9-
65
0





A

SpRY







553
TAC

AAAGTTTTCACATAGTTTCT
17464
SpyCas9-
65
0







SpRY







554
AGAATT

GGTAAAATTAAGCACAGTG
17465
cCas9-v16
65
0





GA









555
AGAATT

GGTAAAATTAAGCACAGTG
17466
cCas9-v21
65
0





GA









556
AGAATTT

GGGTAAAATTAAGCACAGT
17467
CdiCas9
65
0





GGA









557
AGAA

GTAAAATTAAGCACAGTGG
17468
SpyCas9-
65
0





A

3var-









NRRH







558
TACC

AAAGTTTTCACATAGTTTCT
17469
SpyCas9-
65
0







3var-









NRCH







559
AGA

GTAAAATTAAGCACAGTGG
17470
SpyCas9-
65
0





A

SpG







560
AGAA

GTAAAATTAAGCACAGTGG
17471
SpyCas9-
65
0





A

VQR







561
AG

GTAAAATTAAGCACAGTGG
17472
SpyCas9-
65
0





A

xCas







562
AG

GTAAAATTAAGCACAGTGG
17473
SpyCas9-
65
0





A

xCas-NG







563
AAGAATT

agaGGGTAAAATTAAGCACA
17474
PpnCas9
66
0





GTGG









564
AAGAAT

gaGGGTAAAATTAAGCACA
17475
SauCas9
66
0





GTGG









565
AAGAA

gaGGGTAAAATTAAGCACA
17476
SauCas9
66
0





GTGG









566
AAGAAT

GGGTAAAATTAAGCACAGT
17477
SauCas9KKH
66
0





GG









567
AAGAA

GGGTAAAATTAAGCACAGT
17478
SauCas9KKH
66
0





GG









568
AAG

GGTAAAATTAAGCACAGTG
17479
ScaCas9-
66
0





G

Sc++







569
AAG

GGTAAAATTAAGCACAGTG
17480
SpyCas9-
66
0





G

SpRY







570
TTA

AAAAGTTTTCACATAGTTTC
17481
SpyCas9-
66
0







SpRY







571
TTACC

tcaaAAAGTTTTCACATAGTT
17482
BlatCas9
66
0





TC









572
AAGAAT

GGGTAAAATTAAGCACAGT
17483
cCas9-v17
66
0





GG









573
AAGAAT

GGGTAAAATTAAGCACAGT
17484
cCas9-v42
66
0





GG









574
AAGAATT

AGGGTAAAATTAAGCACAG
17485
CdiCas9
66
0





TGG









575
TTACCTC

CAAAAAGTTTTCACATAGT
17486
CdiCas9
66
0





TTC









576
AAG

GGTAAAATTAAGCACAGTG
17487
ScaCas9
66
0





G









577
AAG

GGTAAAATTAAGCACAGTG
17488
ScaCas9-
66
0





G

HiFi-Sc++







578
AAGA

GGTAAAATTAAGCACAGTG
17489
SpyCas9-
66
0





G

3var-









NRRH







579
CTTACC

aaTCAAAAAGTTTTCACATA
17490
Nme2Cas9
67
0





GTTT









580
GAAGA

AGGGTAAAATTAAGCACAG
17491
SauCas9KKH
67
0





TG









581
GAAG

AGGGTAAAATTAAGCACAG
17492
SauriCas9-
67
0





TG

KKH







582
GAA

GGGTAAAATTAAGCACAGT
17493
SpyCas9-
67
0





G

SpRY







583
CTT

AAAAAGTTTTCACATAGTT
17494
SpyCas9-
67
0





T

SpRY







584
GAAGAA

GGGTAAAATTAAGCACAGT
17495
St1Cas9
67
0



T

G









585
CTTAC

atcaAAAAGTTTTCACATAGT
17496
BlatCas9
67
0





TT









586
GAAGAA

AGGGTAAAATTAAGCACAG
17497
cCas9-v17
67
0





TG









587
GAAGAA

AGGGTAAAATTAAGCACAG
17498
cCas9-v42
67
0





TG









588
GAAG

agGGTAAAATTAAGCACAGT
17499
iSpyMacCas9
67
0





G









589
GAAG

GGGTAAAATTAAGCACAGT
17500
SpyCas9-
67
0





G

QQR1







590
GAA

GGGTAAAATTAAGCACAGT
17501
SpyCas9-
67
0





G

xCas







591
GGAAG

GAGGGTAAAATTAAGCACA
17502
SauCas9KKH
68
0





GT









592
GG

AGGGTAAAATTAAGCACAG
17503
SpyCas9-
68
0





T

NG







593
GGA

AGGGTAAAATTAAGCACAG
17504
SpyCas9-
68
0





T

SpRY







594
TCT

CAAAAAGTTTTCACATAGT
17505
SpyCas9-
68
0





T

SpRY







595
GGAAGA

GAGGGTAAAATTAAGCACA
17506
cCas9-v17
68
0





GT









596
GGAAGA

GAGGGTAAAATTAAGCACA
17507
cCas9-v42
68
0





GT









597
GGAA

AGGGTAAAATTAAGCACAG
17508
SpyCas9-
68
0





T

3var-









NRRH







598
GGA

AGGGTAAAATTAAGCACAG
17509
SpyCas9-
68
0





T

SpG







599
GGAA

AGGGTAAAATTAAGCACAG
17510
SpyCas9-
68
0





T

VQR







600
GG

AGGGTAAAATTAAGCACAG
17511
SpyCas9-
68
0





T

xCas







601
GG

AGGGTAAAATTAAGCACAG
17512
SpyCas9-
68
0





T

xCas-NG







602
TGGAA

tcAGAGGGTAAAATTAAGCA
17513
SauCas9
69
0





CAG









603
TGGAA

AGAGGGTAAAATTAAGCAC
17514
SauCas9KKH
69
0





AG









604
TGG

GAGGGTAAAATTAAGCACA
17515
ScaCas9-
69
0





G

Sc++







605
TGG

GAGGGTAAAATTAAGCACA
17516
SpyCas9
69
0





G









606
TG

GAGGGTAAAATTAAGCACA
17517
SpyCas9-
69
0





G

NG







607
TGG

GAGGGTAAAATTAAGCACA
17518
SpyCas9-
69
0





G

SpRY







608
TTC

TCAAAAAGTTTTCACATAG
17519
SpyCas9-
69
0





T

SpRY







609
TGGAAG

AGAGGGTAAAATTAAGCAC
17520
cCas9-v17
69
0





AG









610
TGGAAG

AGAGGGTAAAATTAAGCAC
17521
cCas9-v42
69
0





AG









611
TGG

GAGGGTAAAATTAAGCACA
17522
ScaCas9
69
0





G









612
TGG

GAGGGTAAAATTAAGCACA
17523
ScaCas9-
69
0





G

HiFi-Sc++







613
TGGA

GAGGGTAAAATTAAGCACA
17524
SpyCas9-
69
0





G

3var-









NRRH







614
TGG

GAGGGTAAAATTAAGCACA
17525
SpyCas9-
69
0





G

HF1







615
TGG

GAGGGTAAAATTAAGCACA
17526
SpyCas9-
69
0





G

SpG







616
TG

GAGGGTAAAATTAAGCACA
17527
SpyCas9-
69
0





G

xCas







617
TG

GAGGGTAAAATTAAGCACA
17528
SpyCas9-
69
0





G

xCas-NG







618
GTGGA

ttCAGAGGGTAAAATTAAGC
17529
SauCas9
70
0





ACA









619
GTGGA

CAGAGGGTAAAATTAAGCA
17530
SauCas9KKH
70
0





CA









620
GTGG

CAGAGGGTAAAATTAAGCA
17531
SauriCas9
70
0





CA









621
GTGG

CAGAGGGTAAAATTAAGCA
17532
SauriCas9-
70
0





CA

KKH







622
GTG

AGAGGGTAAAATTAAGCAC
17533
ScaCas9-
70
0





A

Sc++







623
GTG

AGAGGGTAAAATTAAGCAC
17534
SpyCas9-
70
0





A

SpRY







624
TTT

ATCAAAAAGTTTTCACATA
17535
SpyCas9-
70
0





G

SpRY







625
GTGGAA

CAGAGGGTAAAATTAAGCA
17536
cCas9-v17
70
0





CA









626
GTGGAA

CAGAGGGTAAAATTAAGCA
17537
cCas9-v42
70
0





CA









627
GTG

AGAGGGTAAAATTAAGCAC
17538
ScaCas9
70
0





A









628
GTG

AGAGGGTAAAATTAAGCAC
17539
ScaCas9-
70
0





A

HiFi-Sc++







629
AGTGG

TCAGAGGGTAAAATTAAGC
17540
SauCas9KKH
71
0





AC









630
AG

CAGAGGGTAAAATTAAGCA
17541
SpyCas9-
71
0





C

NG







631
AGT

CAGAGGGTAAAATTAAGCA
17542
SpyCas9-
71
0





C

SpRY







632
GTT

AATCAAAAAGTTTTCACAT
17543
SpyCas9-
71
0





A

SpRY







633
GTTTCTT

cataATCAAAAAGTTTTCACA
17544
BlatCas9
71
0



A

TA









634
GTTTC

cataATCAAAAAGTTTTCACA
17545
BlatCas9
71
0





TA









635
AGT

CAGAGGGTAAAATTAAGCA
17546
SpyCas9-
71
0





C

SpG







636
AG

CAGAGGGTAAAATTAAGCA
17547
SpyCas9-
71
0





C

xCas







637
AG

CAGAGGGTAAAATTAAGCA
17548
SpyCas9-
71
0





C

xCas-NG







638
CAG

TCAGAGGGTAAAATTAAGC
17549
ScaCas9-
72
0





A

Sc++







639
AG

TAATCAAAAAGTTTTCACA
17550
SpyCas9-
72
0





T

NG







640
CAG

TCAGAGGGTAAAATTAAGC
17551
SpyCas9-
72
0





A

SpRY







641
AGT

TAATCAAAAAGTTTTCACA
17552
SpyCas9-
72
0





T

SpRY







642
CAG

TCAGAGGGTAAAATTAAGC
17553
ScaCas9
72
0





A









643
CAG

TCAGAGGGTAAAATTAAGC
17554
ScaCas9-
72
0





A

HiFi-Sc++







644
CAGT

TCAGAGGGTAAAATTAAGC
17555
SpyCas9-
72
0





A

3var-









NRRH







645
AGTT

TAATCAAAAAGTTTTCACA
17556
SpyCas9-
72
0





T

3var-









NRTH







646
AGT

TAATCAAAAAGTTTTCACA
17557
SpyCas9-
72
0





T

SpG







647
AG

TAATCAAAAAGTTTTCACA
17558
SpyCas9-
72
0





T

xCas







648
AG

TAATCAAAAAGTTTTCACA
17559
SpyCas9-
72
0





T

xCas-NG







649
ACAG

CTTCAGAGGGTAAAATTAA
17560
SauriCas9-
73
0





GC

KKH







650
TAG

ATAATCAAAAAGTTTTCAC
17561
ScaCas9-
73
0





A

Sc++







651
ACA

TTCAGAGGGTAAAATTAAG
17562
SpyCas9-
73
0





C

SpRY







652
TAG

ATAATCAAAAAGTTTTCAC
17563
SpyCas9-
73
0





A

SpRY







653
ACAGTG

CTTCAGAGGGTAAAATTAA
17564
cCas9-v16
73
0





GC









654
ACAGTG

CTTCAGAGGGTAAAATTAA
17565
cCas9-v21
73
0





GC









655
TAGTTTC

GCATAATCAAAAAGTTTTC
17566
CdiCas9
73
0





ACA









656
TAG

ATAATCAAAAAGTTTTCAC
17567
ScaCas9
73
0





A









657
TAG

ATAATCAAAAAGTTTTCAC
17568
ScaCas9-
73
0





A

HiFi-Sc++







658
TAGT

ATAATCAAAAAGTTTTCAC
17569
SpyCas9-
73
0





A

3var-









NRRH







659
CACAG

CCTTCAGAGGGTAAAATTA
17570
SauCas9KKH
74
0





AG









660
CACAGT

CCTTCAGAGGGTAAAATTA
17571
SauCas9KKH
74
0





AG









661
ATAG

GCATAATCAAAAAGTTTTC
17572
SauriCas9-
74
0





AC

KKH







662
CAC

CTTCAGAGGGTAAAATTAA
17573
SpyCas9-
74
0





G

SpRY







663
ATA

CATAATCAAAAAGTTTTCA
17574
SpyCas9-
74
0





C

SpRY




664
ATAGTT

GCATAATCAAAAAGTTTTC
17575
cCas9-v16
74
0





AC












665
CACAGT

CCTTCAGAGGGTAAAATTA
17576
cCas9-v17
74
0





AG









666
ATAGTT

GCATAATCAAAAAGTTTTC
17577
cCas9-v21
74
0





AC









667
CACAGT

CCTTCAGAGGGTAAAATTA
17578
cCas9-v42
74
0





AG









668
CACA

CTTCAGAGGGTAAAATTAA
17579
SpyCas9-
74
0





G

3var-









NRCH







669
CATAGTT

ataTGCATAATCAAAAAGTTT
17580
PpnCas9
75
0





TCA









670
CATAGT

TGCATAATCAAAAAGTTTT
17581
SauCas9KKH
75
0





CA









671
CATAG

TGCATAATCAAAAAGTTTT
17582
SauCas9KKH
75
0





CA









672
GCA

CCTTCAGAGGGTAAAATTA
17583
SpyCas9-
75
0





A

SpRY







673
CAT

GCATAATCAAAAAGTTTTC
17584
SpyCas9-
75
0





A

SpRY







674
CATAGTT

atatGCATAATCAAAAAGTTT
17585
NmeCas9
75
0



T

TCA









675
CATA

GCATAATCAAAAAGTTTTC
17586
SpyCas9-
75
0





A

3var-









NRTH







676
AG

GCCTTCAGAGGGTAAAATT
17587
SpyCas9-
76
0





A

NG







677
AGC

GCCTTCAGAGGGTAAAATT
17588
SpyCas9-
76
0





A

SpRY







678
ACA

TGCATAATCAAAAAGTTTT
17589
SpyCas9-
76
0





C

SpRY







679
AGCAC

ggagCCTTCAGAGGGTAAAA
17590
BlatCas9
76
0





TTA









680
AGCACA

ggagCCTTCAGAGGGTAAAA
17591
BlatCas9
76
0



GT

TTA









681
AGCA

GCCTTCAGAGGGTAAAATT
17592
SpyCas9-
76
0





A

3var-









NRCH







682
AGC

GCCTTCAGAGGGTAAAATT
17593
SpyCas9-
76
0





A

SpG







683
AG

GCCTTCAGAGGGTAAAATT
17594
SpyCas9-
76
0





A

xCas







684
AG

GCCTTCAGAGGGTAAAATT
17595
SpyCas9-
76
0





A

xCas-NG







685
AAG

AGCCTTCAGAGGGTAAAAT
17596
ScaCas9-
77
0





T

Sc++







686
AAG

AGCCTTCAGAGGGTAAAAT
17597
SpyCas9-
77
0





T

SpRY







687
CAC

ATGCATAATCAAAAAGTTT
17598
SpyCas9-
77
0





T

SpRY







688
AAG

AGCCTTCAGAGGGTAAAAT
17599
ScaCas9
77
0





T









689
AAG

AGCCTTCAGAGGGTAAAAT
17600
ScaCas9-
77
0





T

HiFi-Sc++







690
AAGC

AGCCTTCAGAGGGTAAAAT
17601
SpyCas9-
77
0





T

3var-









NRRH







691
CACA

ATGCATAATCAAAAAGTTT
17602
SpyCas9-
77
0





T

3var-









NRCH







692
TAAG

GGAGCCTTCAGAGGGTAAA
17603
SauriCas9-
78
0





AT

KKH







693
TAA

GAGCCTTCAGAGGGTAAAA
17604
SpyCas9-
78
0





T

SpRY







694
TCA

TATGCATAATCAAAAAGTT
17605
SpyCas9-
78
0





T

SpRY







695
TAAGC

ctggAGCCTTCAGAGGGTAA
17606
BlatCas9
78
0





AAT









696
TAAG

ggAGCCTTCAGAGGGTAAA
17607
iSpyMacCas9
78
0





AT









697
TAAG

GAGCCTTCAGAGGGTAAAA
17608
SpyCas9-
78
0





T

QQR1







698
TTAAG

TGGAGCCTTCAGAGGGTAA
17609
SauCas9KKH
79
0





AA









699
TTA

GGAGCCTTCAGAGGGTAAA
17610
SpyCas9-
79
0





A

SpRY







700
TTC

ATATGCATAATCAAAAAGT
17611
SpyCas9-
79
0





T

SpRY







701
TTCACAT

ttcaTATGCATAATCAAAAAG
17612
BlatCas9
79
0



A

TT









702
TTCAC

ttcaTATGCATAATCAAAAAG
17613
BlatCas9
79
0





TT









703
TTAAGC

TGGAGCCTTCAGAGGGTAA
17614
cCas9-v17
79
0





AA









704
TTAAGC

TGGAGCCTTCAGAGGGTAA
17615
cCas9-v42
79
0





AA









705
TTAAGCA

acTGGAGCCTTCAGAGGGTA
17616
CjeCas9
79
0



C

AAA









706
ATTAA

CTGGAGCCTTCAGAGGGTA
17617
SauCas9KKH
80
0





AA









707
ATT

TGGAGCCTTCAGAGGGTAA
17618
SpyCas9-
80
0





A

SpRY







708
TTT

CATATGCATAATCAAAAAG
17619
SpyCas9-
80
0





T

SpRY







709
AAT

CTGGAGCCTTCAGAGGGTA
17620
SpyCas9-
81
0





A

SpRY







710
TTT

TCATATGCATAATCAAAAA
17621
SpyCas9-
81
0





G

SpRY







711
TTTTC

ggttCATATGCATAATCAAAA
17622
BlatCas9
81
0





AG









712
AATT

CTGGAGCCTTCAGAGGGTA
17623
SpyCas9-
81
0





A

3var-









NRTH







713
AAA

ACTGGAGCCTTCAGAGGGT
17624
SpyCas9-
82
0





A

SpRY







714
GTT

TTCATATGCATAATCAAAA
17625
SpyCas9-
82
0





A

SpRY







715
AAAT

aaCTGGAGCCTTCAGAGGGT
17626
iSpyMacCas9
82
0





A









716
AAAT

ACTGGAGCCTTCAGAGGGT
17627
SpyCas9-
82
0





A

3var-









NRRH







717
AG

GTTCATATGCATAATCAAA
17628
SpyCas9-
83
0





A

NG







718
AAA

AACTGGAGCCTTCAGAGGG
17629
SpyCas9-
83
0





T

SpRY







719
AGT

GTTCATATGCATAATCAAA
17630
SpyCas9-
83
0





A

SpRY







720
AAAATT

GAACTGGAGCCTTCAGAGG
17631
cCas9-v16
83
0





GT









721
AAAATT

GAACTGGAGCCTTCAGAGG
17632
cCas9-v21
83
0





GT









722
AAAA

gaACTGGAGCCTTCAGAGGG
17633
iSpyMacCas9
83
0





T









723
AAAA

AACTGGAGCCTTCAGAGGG
17634
SpyCas9-
83
0





T

3var-









NRRH







724
AGTT

GTTCATATGCATAATCAAA
17635
SpyCas9-
83
0





A

3var-









NRTH







725
AGT

GTTCATATGCATAATCAAA
17636
SpyCas9-
83
0





A

SpG







726
AG

GTTCATATGCATAATCAAA
17637
SpyCas9-
83
0





A

xCas







727
AG

GTTCATATGCATAATCAAA
17638
SpyCas9-
83
0





A

xCas-NG







728
TAAAATT

gggAGAACTGGAGCCTTCAG
17639
PpnCas9
84
0





AGGG









729
TAAAA

AGAACTGGAGCCTTCAGAG
17640
SauCas9KKH
84
0





GG









730
TAAAAT

AGAACTGGAGCCTTCAGAG
17641
SauCas9KKH
84
0





GG









731
AAG

GGTTCATATGCATAATCAA
17642
ScaCas9-
84
0





A

Sc++







732
TAA

GAACTGGAGCCTTCAGAGG
17643
SpyCas9-
84
0





G

SpRY







733
AAG

GGTTCATATGCATAATCAA
17644
SpyCas9-
84
0





A

SpRY







734
TAAAAT

AGAACTGGAGCCTTCAGAG
17645
cCas9-v17
84
0





GG









735
TAAAAT

AGAACTGGAGCCTTCAGAG
17646
cCas9-v42
84
0





GG









736
TAAAATT

GAGAACTGGAGCCTTCAGA
17647
CdiCas9
84
0





GGG









737
AAGTTTT

AGGGTTCATATGCATAATC
17648
CdiCas9
84
0





AAA









738
TAAA

agAACTGGAGCCTTCAGAGG
17649
iSpyMacCas9
84
0





G









739
AAG

GGTTCATATGCATAATCAA
17650
ScaCas9
84
0





A









740
AAG

GGTTCATATGCATAATCAA
17651
ScaCas9-
84
0





A

HiFi-Sc++







741
TAAA

GAACTGGAGCCTTCAGAGG
17652
SpyCas9-
84
0





G

3var-









NRRH







742
AAGT

GGTTCATATGCATAATCAA
17653
SpyCas9-
84
0





A

3var-









NRRH







743
GTAAA

GAGAACTGGAGCCTTCAGA
17654
SauCas9KKH
85
0





GG









744
AAAG

AGGGTTCATATGCATAATC
17655
SauriCas9-
85
0





AA

KKH







745
GTA

AGAACTGGAGCCTTCAGAG
17656
SpyCas9-
85
0





G

SpRY







746
AAA

GGGTTCATATGCATAATCA
17657
SpyCas9-
85
0





A

SpRY







747
AAAGTT

AGGGTTCATATGCATAATC
17658
cCas9-v16
85
0





AA









748
GTAAAA

GAGAACTGGAGCCTTCAGA
17659
cCas9-v17
85
0





GG









749
AAAGTT

AGGGTTCATATGCATAATC
17660
cCas9-v21
85
0





AA









750
GTAAAA

GAGAACTGGAGCCTTCAGA
17661
cCas9-v42
85
0





GG









751
GTAAAAT

GGAGAACTGGAGCCTTCAG
17662
CdiCas9
85
0





AGG









752
GTAAAAT

GGAGAACTGGAGCCTTCAG
17663
CdiCas9
85
0





AGG









753
AAAG

agGGTTCATATGCATAATCA
17664
iSpyMacCas9
85
0





A









754
AAAG

GGGTTCATATGCATAATCA
17665
SpyCas9-
85
0





A

QQR1







755
GTAAAA

AGAACTGGAGCCTTCAGAG
17666
St1Cas9-
85
0





G

MTH17CL396







756
AAAAGTT

gtgAAGGGTTCATATGCATA
17667
PpnCas9
86
0





ATCA









757
GGTAA

GGAGAACTGGAGCCTTCAG
17668
SauCas9KKH
86
0





AG









758
AAAAG

AAGGGTTCATATGCATAAT
17669
SauCas9KKH
86
0





CA









759
AAAAGT

AAGGGTTCATATGCATAAT
17670
SauCas9KKH
86
0





CA









760
GG

GAGAACTGGAGCCTTCAGA
17671
SpyCas9-
86
0





G

NG







761
GGT

GAGAACTGGAGCCTTCAGA
17672
SpyCas9-
86
0





G

SpRY







762
AAA

AGGGTTCATATGCATAATC
17673
SpyCas9-
86
0





A

SpRY







763
AAAAGT

AAGGGTTCATATGCATAAT
17674
cCas9-v17
86
0





CA









764
AAAAGT

AAGGGTTCATATGCATAAT
17675
cCas9-v42
86
0





CA









765
AAAA

aaGGGTTCATATGCATAATC
17676
iSpyMacCas9
86
0





A









766
AAAAGTT

gtgaAGGGTTCATATGCATAA
17677
NmeCas9
86
0



T

TCA









767
AAAA

AGGGTTCATATGCATAATC
17678
SpyCas9-
86
0





A

3var-









NRRH







768
GGTA

GAGAACTGGAGCCTTCAGA
17679
SpyCas9-
86
0





G

3var-









NRTH







769
GGT

GAGAACTGGAGCCTTCAGA
17680
SpyCas9-
86
0





G

SpG







770
GG

GAGAACTGGAGCCTTCAGA
17681
SpyCas9-
86
0





G

xCas







771
GG

GAGAACTGGAGCCTTCAGA
17682
SpyCas9-
86
0





G

xCas-NG







772
AAAAA

GAAGGGTTCATATGCATAA
17683
SauCas9KKH
87
0





TC









773
GGG

GGAGAACTGGAGCCTTCAG
17684
ScaCas9-
87
0





A

Sc++







774
GGG

GGAGAACTGGAGCCTTCAG
17685
SpyCas9
87
0





A









775
GG

GGAGAACTGGAGCCTTCAG
17686
SpyCas9-
87
0





A

NG







776
GGG

GGAGAACTGGAGCCTTCAG
17687
SpyCas9-
87
0





A

SpRY







777
AAA

AAGGGTTCATATGCATAAT
17688
SpyCas9-
87
0





C

SpRY







778
AAAAAG

GAAGGGTTCATATGCATAA
17689
cCas9-v17
87
0





TC









779
AAAAAG

GAAGGGTTCATATGCATAA
17690
cCas9-v42
87
0





TC









780
AAAA

gaAGGGTTCATATGCATAAT
17691
iSpyMacCas9
87
0





C









781
GGG

GGAGAACTGGAGCCTTCAG
17692
ScaCas9
87
0





A









782
GGG

GGAGAACTGGAGCCTTCAG
17693
ScaCas9-
87
0





A

HiFi-Sc++







783
GGGT

GGAGAACTGGAGCCTTCAG
17694
SpyCas9-
87
0





A

3var-









NRRH







784
AAAA

AAGGGTTCATATGCATAAT
17695
SpyCas9-
87
0





C

3var-









NRRH







785
GGG

GGAGAACTGGAGCCTTCAG
17696
SpyCas9-
87
0





A

HF1







786
GGG

GGAGAACTGGAGCCTTCAG
17697
SpyCas9-
87
0





A

SpG







787
GG

GGAGAACTGGAGCCTTCAG
17698
SpyCas9-
87
0





A

xCas







788
GG

GGAGAACTGGAGCCTTCAG
17699
SpyCas9-
87
0





A

xCas-NG







789
CAAAA

TGAAGGGTTCATATGCATA
17700
SauCas9KKH
88
0





AT









790
AGGG

TGGGAGAACTGGAGCCTTC
17701
SauriCas9
88
0





AG









791
AGGG

TGGGAGAACTGGAGCCTTC
17702
SauriCas9-
88
0





AG

KKH







792
AGG

GGGAGAACTGGAGCCTTCA
17703
ScaCas9-
88
0





G

Sc++







793
AGG

GGGAGAACTGGAGCCTTCA
17704
SpyCas9
88
0





G









794
AG

GGGAGAACTGGAGCCTTCA
17705
SpyCas9-
88
0





G

NG







795
AGG

GGGAGAACTGGAGCCTTCA
17706
SpyCas9-
88
0





G

SpRY







796
CAA

GAAGGGTTCATATGCATAA
17707
SpyCas9-
88
0





T

SpRY







797
CAAAAA

TGAAGGGTTCATATGCATA
17708
cCas9-v17
88
0





AT









798
CAAAAA

TGAAGGGTTCATATGCATA
17709
cCas9-v42
88
0





AT









799
CAAA

tgAAGGGTTCATATGCATAA
17710
iSpyMacCas9
88
0





T









800
AGG

GGGAGAACTGGAGCCTTCA
17711
ScaCas9
88
0





G









801
AGG

GGGAGAACTGGAGCCTTCA
17712
ScaCas9-
88
0





G

HiFi-Sc++







802
CAAA

GAAGGGTTCATATGCATAA
17713
SpyCas9-
88
0





T

3var-









NRRH







803
AGG

GGGAGAACTGGAGCCTTCA
17714
SpyCas9-
88
0





G

HF1







804
AGG

GGGAGAACTGGAGCCTTCA
17715
SpyCas9-
88
0





G

SpG







805
AG

GGGAGAACTGGAGCCTTCA
17716
SpyCas9-
88
0





G

xCas







806
AG

GGGAGAACTGGAGCCTTCA
17717
SpyCas9-
88
0





G

xCas-NG







807
CAAAAA

GAAGGGTTCATATGCATAA
17718
St1Cas9-
88
0





T

MTH17CL396







808
GAGGG

ttATGGGAGAACTGGAGCCT
17719
SauCas9
89
0





TCA









809
GAGGGT

ttATGGGAGAACTGGAGCCT
17720
SauCas9
89
0





TCA









810
GAGGG

ATGGGAGAACTGGAGCCTT
17721
SauCas9KKH
89
0





CA









811
GAGGGT

ATGGGAGAACTGGAGCCTT
17722
SauCas9KKH
89
0





CA









812
TCAAA

GTGAAGGGTTCATATGCAT
17723
SauCas9KKH
89
0





AA









813
GAGG

ATGGGAGAACTGGAGCCTT
17724
SauriCas9
89
0





CA









814
GAGG

ATGGGAGAACTGGAGCCTT
17725
SauriCas9-
89
0





CA

KKH







815
GAG

TGGGAGAACTGGAGCCTTC
17726
ScaCas9-
89
0





A

Sc++







816
GAG

TGGGAGAACTGGAGCCTTC
17727
SpyCas9-
89
0





A

SpRY







817
TCA

TGAAGGGTTCATATGCATA
17728
SpyCas9-
89
0





A

SpRY







818
GAGGGT

ATGGGAGAACTGGAGCCTT
17729
cCas9-v17
89
0





CA









819
TCAAAA

GTGAAGGGTTCATATGCAT
17730
cCas9-v17
89
0





AA









820
GAGGGT

ATGGGAGAACTGGAGCCTT
17731
cCas9-v42
89
0





CA









821
TCAAAA

GTGAAGGGTTCATATGCAT
17732
cCas9-v42
89
0





AA









822
GAG

TGGGAGAACTGGAGCCTTC
17733
ScaCas9
89
0





A









823
GAG

TGGGAGAACTGGAGCCTTC
17734
ScaCas9-
89
0





A

HiFi-Sc++







824
TCAAAA

TGAAGGGTTCATATGCATA
17735
St1Cas9-
89
0





A

MTH17CL396







825
AGAGG

TATGGGAGAACTGGAGCCT
17736
SauCas9KKH
90
0





TC









826
ATCAA

TGTGAAGGGTTCATATGCA
17737
SauCas9KKH
90
0





TA









827
AGAG

TATGGGAGAACTGGAGCCT
17738
SauriCas9-
90
0





TC

KKH







828
AG

ATGGGAGAACTGGAGCCTT
17739
SpyCas9-
90
0





C

NG







829
AGA

ATGGGAGAACTGGAGCCTT
17740
SpyCas9-
90
0





C

SpRY







830
ATC

GTGAAGGGTTCATATGCAT
17741
SpyCas9-
90
0





A

SpRY







831
AGAGGG

TATGGGAGAACTGGAGCCT
17742
cCas9-v17
90
0





TC









832
ATCAAA

TGTGAAGGGTTCATATGCA
17743
cCas9-v17
90
0





TA









833
AGAGGG

TATGGGAGAACTGGAGCCT
17744
cCas9-v42
90
0





TC









834
ATCAAA

TGTGAAGGGTTCATATGCA
17745
cCas9-v42
90
0





TA









835
AGA

ATGGGAGAACTGGAGCCTT
17746
SpyCas9-
90
0





C

SpG







836
AGAG

ATGGGAGAACTGGAGCCTT
17747
SpyCas9-
90
0





C

VQR







837
AG

ATGGGAGAACTGGAGCCTT
17748
SpyCas9-
90
0





C

xCas







838
AG

ATGGGAGAACTGGAGCCTT
17749
SpyCas9-
90
0





C

xCas-NG







839
CAGAG

gaTTATGGGAGAACTGGAGC
17750
SauCas9
91
0





CTT









840
CAGAG

TTATGGGAGAACTGGAGCC
17751
SauCas9KKH
91
0





TT









841
CAG

TATGGGAGAACTGGAGCCT
17752
ScaCas9-
91
0





T

Sc++







842
CAG

TATGGGAGAACTGGAGCCT
17753
SpyCas9-
91
0





T

SpRY







843
AAT

TGTGAAGGGTTCATATGCA
17754
SpyCas9-
91
0





T

SpRY







844
CAGAGG

TTATGGGAGAACTGGAGCC
17755
cCas9-v17
91
0





TT









845
CAGAGG

TTATGGGAGAACTGGAGCC
17756
cCas9-v42
91
0





TT









846
CAG

TATGGGAGAACTGGAGCCT
17757
ScaCas9
91
0





T









847
CAG

TATGGGAGAACTGGAGCCT
17758
ScaCas9-
91
0





T

HiFi-Sc++







848
CAGA

TATGGGAGAACTGGAGCCT
17759
SpyCas9-
91
0





T

3var-









NRRH







849
AATC

TGTGAAGGGTTCATATGCA
17760
SpyCas9-
91
0





T

3var-









NRTH







850
TCAGA

ATTATGGGAGAACTGGAGC
17761
SauCas9KKH
92
0





CT









851
TCAG

ATTATGGGAGAACTGGAGC
17762
SauriCas9-
92
0





CT

KKH







852
TCA

TTATGGGAGAACTGGAGCC
17763
SpyCas9-
92
0





T

SpRY







853
TAA

GTGTGAAGGGTTCATATGC
17764
SpyCas9-
92
0





A

SpRY







854
TAATCAA

gtagTGTGAAGGGTTCATATG
17765
BlatCas9
92
0



A

CA









855
TAATCAA

gtagTGTGAAGGGTTCATATG
17766
BlatCas9
92
0



A

CA









856
TAATC

gtagTGTGAAGGGTTCATATG
17767
BlatCas9
92
0





CA









857
TCAGAG

ATTATGGGAGAACTGGAGC
17768
cCas9-v17
92
0





CT









858
TCAGAG

ATTATGGGAGAACTGGAGC
17769
cCas9-v42
92
0





CT









859
TAATCAA

gtAGTGTGAAGGGTTCATAT
17770
GeoCas9
92
0



A

GCA









860
TAAT

agTGTGAAGGGTTCATATGC
17771
iSpyMacCas9
92
0





A









861
TAAT

GTGTGAAGGGTTCATATGC
17772
SpyCas9-
92
0





A

3var-









NRRH







862
TTCAG

GATTATGGGAGAACTGGAG
17773
SauCas9KKH
93
0





CC









863
TTC

ATTATGGGAGAACTGGAGC
17774
SpyCas9-
93
0





C

SpRY







864
ATA

AGTGTGAAGGGTTCATATG
17775
SpyCas9-
93
0





C

SpRY







865
TTCAGA

GATTATGGGAGAACTGGAG
17776
cCas9-v17
93
0





CC









866
TTCAGA

GATTATGGGAGAACTGGAG
17777
cCas9-v42
93
0





CC









867
CATAA

GTAGTGTGAAGGGTTCATA
17778
SauCas9KKH
94
0





TG









868
CATAAT

GTAGTGTGAAGGGTTCATA
17779
SauCas9KKH
94
0





TG









869
CTT

GATTATGGGAGAACTGGAG
17780
SpyCas9-
94
0





C

SpRY







870
CAT

TAGTGTGAAGGGTTCATAT
17781
SpyCas9-
94
0





G

SpRY







871
CATA

TAGTGTGAAGGGTTCATAT
17782
SpyCas9-
94
0





G

3var-









NRTH







872
CCT

TGATTATGGGAGAACTGGA
17783
SpyCas9-
95
0





G

SpRY







873
GCA

GTAGTGTGAAGGGTTCATA
17784
SpyCas9-
95
0





T

SpRY







874
CCTTC

tggtGATTATGGGAGAACTGG
17785
BlatCas9
95
0





AG









875
CCTTCAG

tggtGATTATGGGAGAACTGG
17786
BlatCas9
95
0



A

AG









876
GCATAAT

GGGTAGTGTGAAGGGTTCA
17787
CdiCas9
95
0





TAT









877
TG

GGTAGTGTGAAGGGTTCAT
17788
SpyCas9-
96
0





A

NG







878
GCC

GTGATTATGGGAGAACTGG
17789
SpyCas9-
96
0





A

SpRY







879
TGC

GGTAGTGTGAAGGGTTCAT
17790
SpyCas9-
96
0





A

SpRY







880
TGCA

GGTAGTGTGAAGGGTTCAT
17791
SpyCas9-
96
0





A

3var-









NRCH







881
TGC

GGTAGTGTGAAGGGTTCAT
17792
SpyCas9-
96
0





A

SpG







882
TG

GGTAGTGTGAAGGGTTCAT
17793
SpyCas9-
96
0





A

xCas







883
TG

GGTAGTGTGAAGGGTTCAT
17794
SpyCas9-
96
0





A

xCas-NG







884
ATG

GGGTAGTGTGAAGGGTTCA
17795
ScaCas9-
97
0





T

Sc++







885
AG

GGTGATTATGGGAGAACTG
17796
SpyCas9-
97
0





G

NG







886
AGC

GGTGATTATGGGAGAACTG
17797
SpyCas9-
97
0





G

SpRY







887
ATG

GGGTAGTGTGAAGGGTTCA
17798
SpyCas9-
97
0





T

SpRY







888
ATG

GGGTAGTGTGAAGGGTTCA
17799
ScaCas9
97
0





T









889
ATG

GGGTAGTGTGAAGGGTTCA
17800
ScaCas9-
97
0





T

HiFi-Sc++







890
AGCC

GGTGATTATGGGAGAACTG
17801
SpyCas9-
97
0





G

3var-









NRCH







891
AGC

GGTGATTATGGGAGAACTG
17802
SpyCas9-
97
0





G

SpG







892
AG

GGTGATTATGGGAGAACTG
17803
SpyCas9-
97
0





G

xCas







893
AG

GGTGATTATGGGAGAACTG
17804
SpyCas9-
97
0





G

xCas-NG







894
GAG

TGGTGATTATGGGAGAACT
17805
ScaCas9-
98
0





G

Sc++







895
GAG

TGGTGATTATGGGAGAACT
17806
SpyCas9-
98
0





G

SpRY







896
TAT

TGGGTAGTGTGAAGGGTTC
17807
SpyCas9-
98
0





A

SpRY







897
GAGCC

taatGGTGATTATGGGAGAAC
17808
BlatCas9
98
0





TG









898
TATGC

atttGGGTAGTGTGAAGGGTT
17809
BlatCas9
98
0





CA









899
TATGCAT

atttGGGTAGTGTGAAGGGTT
17810
BlatCas9
98
0



A

CA









900
GAGCCTT

AATGGTGATTATGGGAGAA
17811
CdiCas9
98
0





CTG









901
GAG

TGGTGATTATGGGAGAACT
17812
ScaCas9
98
0





G









902
GAG

TGGTGATTATGGGAGAACT
17813
ScaCas9-
98
0





G

HiFi-Sc++







903
GAGC

TGGTGATTATGGGAGAACT
17814
SpyCas9-
98
0





G

3var-









NRRH







904
GGAGCC

tcTAATGGTGATTATGGGAG
17815
Nme2Cas9
99
0





AACT









905
GGAG

AATGGTGATTATGGGAGAA
17816
SauriCas9-
99
0





CT

KKH







906
GG

ATGGTGATTATGGGAGAAC
17817
SpyCas9-
99
0





T

NG







907
GGA

ATGGTGATTATGGGAGAAC
17818
SpyCas9-
99
0





T

SpRY







908
ATA

TTGGGTAGTGTGAAGGGTT
17819
SpyCas9-
99
0





C

SpRY







909
GGAGCCT

ctaaTGGTGATTATGGGAGAA
17820
BlatCas9
99
0



T

CT









910
GGAGC

ctaaTGGTGATTATGGGAGAA
17821
BlatCas9
99
0





CT









911
GGA

ATGGTGATTATGGGAGAAC
17822
SpyCas9-
99
0





T

SpG







912
GGAG

ATGGTGATTATGGGAGAAC
17823
SpyCas9-
99
0





T

VQR







913
GG

ATGGTGATTATGGGAGAAC
17824
SpyCas9-
99
0





T

xCas







914
GG

ATGGTGATTATGGGAGAAC
17825
SpyCas9-
99
0





T

xCas-NG







915
TGGAG

tcTAATGGTGATTATGGGAG
17826
SauCas9
100
0





AAC









916
TGGAG

TAATGGTGATTATGGGAGA
17827
SauCas9KKH
100
0





AC









917
TGG

AATGGTGATTATGGGAGAA
17828
ScaCas9-
100
0





C

Sc++







918
TGG

AATGGTGATTATGGGAGAA
17829
SpyCas9
100
0





C









919
TG

AATGGTGATTATGGGAGAA
17830
SpyCas9-
100
0





C

NG







920
TGG

AATGGTGATTATGGGAGAA
17831
SpyCas9-
100
0





C

SpRY







921
TGGAGC

TAATGGTGATTATGGGAGA
17832
cCas9-v17
100
0





AC









922
TGGAGC

TAATGGTGATTATGGGAGA
17833
cCas9-v42
100
0





AC









923
TGG

AATGGTGATTATGGGAGAA
17834
ScaCas9
100
0





C









924
TGG

AATGGTGATTATGGGAGAA
17835
ScaCas9-
100
0





C

HiFi-Sc++







925
TGGA

AATGGTGATTATGGGAGAA
17836
SpyCas9-
100
0





C

3var-









NRRH







926
TGG

AATGGTGATTATGGGAGAA
17837
SpyCas9-
100
0





C

HF1







927
TGG

AATGGTGATTATGGGAGAA
17838
SpyCas9-
100
0





C

SpG







928
TG

AATGGTGATTATGGGAGAA
17839
SpyCas9-
100
0





C

xCas







929
TG

AATGGTGATTATGGGAGAA
17840
SpyCas9-
100
0





C

xCas-NG










Table 1 provides a gRNA database for correcting the pathogenic F508del mutation in CFTR. It lists spacers, PAMs, and Cas variants for generating a nick at an appropriate position to enable installation of a desired genomic edit with a gene modifying system. The spacers in this table are designed to be used with a gene modifying polypeptide comprising a nickase variant of the Cas species indicated in the table. Tables 2, 3, and 4 detail the other components of the system and are organized such that the ID number shown here in Column 1 (“ID”) corresponds to the same ID number in the subsequent tables.


In the exemplary template sequences provided herein, capital letters indicate “core nucleotides” while lower case letters indicate “flanking nucleotides.” Herein, when an RNA sequence (e.g., a template RNA sequence) is said to comprise a particular sequence (e.g., a sequence of Table 1 or a portion thereof) that comprises thymine (T), it is of course understood that the RNA sequence may (and frequently does) comprise uracil (U) in place of T. For instance, the RNA sequence may comprise U at every position shown as T in the sequence in Table 1. More specifically, the present disclosure provides an RNA sequence according to every gRNA spacer sequence shown in Table 1, wherein the RNA sequence has a U in place of each T in the sequence in Table 1.
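The T-to-U convention described above can be illustrated with a minimal Python sketch. The function name `spacer_dna_to_rna` is hypothetical and is not part of the disclosure; the example input is the gRNA spacer of ID 602 from Table 1, with its wrapped table cell rejoined into a single string (an assumption about the table layout).

```python
def spacer_dna_to_rna(spacer: str) -> str:
    """Return the RNA form of a spacer written with T (hypothetical helper).

    Every T is replaced with U. Letter case is preserved so that core
    (upper case) and flanking (lower case) nucleotides remain distinguishable.
    """
    return spacer.translate(str.maketrans({"T": "U", "t": "u"}))


# Spacer of ID 602 in Table 1, wrapped cell rejoined (assumed layout).
spacer_602 = "tcAGAGGGTAAAATTAAGCACAG"
print(spacer_dna_to_rna(spacer_602))
```

Preserving case is a design choice of this sketch only; the disclosure requires no more than that each T be read as U in the RNA sequence.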


In some embodiments of the systems and methods herein, the heterologous object sequence comprises the core nucleotides of an RT template sequence from Table 3. In some embodiments, the heterologous object sequence additionally comprises one or more (e.g., 2, 3, 4, 5, 10, 20, 30, 40, or all) consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the RT template sequence. In some embodiments, the heterologous object sequence comprises the core nucleotides of the RT template sequence of Table 3 that corresponds to the gRNA spacer sequence. In the context of the sequence tables, a first component “corresponds to” a second component when both components have the same ID number in the referenced table. For example, for a gRNA spacer of ID #1, the corresponding RT template would be the RT template also having ID #1. In some embodiments, the heterologous object sequence additionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the RT template sequence.
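As an illustration only, the following Python sketch shows one way to look up the RT template that corresponds to a gRNA spacer by ID number and to build a heterologous object sequence from its core nucleotides plus a chosen number of flanking nucleotides. The names `TABLE_3`, `split_rt_template`, and `heterologous_object_sequence` are hypothetical. The sketch assumes, from the layout of Table 3, that the flanking nucleotides (lower case) lie 5′ of the core nucleotides (upper case) in an RT template entry, and it uses the ID 10 entry with its wrapped cells rejoined.

```python
import re
from typing import Tuple

# ID 10 of Table 3, wrapped cells rejoined (assumed layout). Keys are ID numbers,
# so the entry "corresponds to" the gRNA spacer having the same ID in Table 1.
TABLE_3 = {
    10: {
        "rt": "gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA",
        "pbs": "TGATATTTtctttaatg",
    },
}


def split_rt_template(rt: str) -> Tuple[str, str]:
    """Split an RT template entry into (flanking, core) nucleotides.

    Assumes lower-case flanking nucleotides followed by upper-case core nucleotides.
    """
    match = re.fullmatch(r"([a-z]*)([A-Z]+)", rt)
    if match is None:
        raise ValueError("expected lower-case flanking then upper-case core nucleotides")
    return match.group(1), match.group(2)


def heterologous_object_sequence(rt: str, n_flank: int = 0) -> str:
    """Core nucleotides plus n_flank consecutive flanking nucleotides,
    counted from the 3' end of the flanking region (adjacent to the core)."""
    flank, core = split_rt_template(rt)
    if not 0 <= n_flank <= len(flank):
        raise ValueError("n_flank exceeds the number of flanking nucleotides")
    return flank[len(flank) - n_flank:] + core


rt_10 = TABLE_3[10]["rt"]
print(heterologous_object_sequence(rt_10))      # core nucleotides only
print(heterologous_object_sequence(rt_10, 5))   # core plus 5 flanking nucleotides
```

Passing the full length of the flanking region as `n_flank` yields the "or all" case, that is, the complete RT template entry.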


In some embodiments, the primer binding site (PBS) sequence has a sequence comprising the core nucleotides of a PBS sequence from the same row of Table 3 as the RT template sequence. In some embodiments, the PBS sequence additionally comprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or all) consecutive nucleotides starting with the 5′ end of the flanking nucleotides of the PBS sequence.
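A PBS sequence of this kind can be assembled as in the following Python sketch, which is illustrative only. The function name `pbs_sequence` is hypothetical. The sketch assumes, from the layout of Table 3, that the core nucleotides (upper case) lie 5′ of the flanking nucleotides (lower case) in a PBS entry, and it uses the PBS entry of ID 10 with its wrapped cell rejoined.

```python
import re


def pbs_sequence(pbs_entry: str, n_flank: int = 0) -> str:
    """Core PBS nucleotides plus n_flank consecutive flanking nucleotides,
    counted from the 5' end of the flanking region (adjacent to the core).

    Assumes upper-case core nucleotides followed by lower-case flanking nucleotides.
    """
    match = re.fullmatch(r"([A-Z]+)([a-z]*)", pbs_entry)
    if match is None:
        raise ValueError("expected upper-case core then lower-case flanking nucleotides")
    core, flank = match.groups()
    if not 0 <= n_flank <= len(flank):
        raise ValueError("n_flank exceeds the number of flanking nucleotides")
    return core + flank[:n_flank]


# PBS entry of ID 10 in Table 3, wrapped cell rejoined (assumed layout).
pbs_10 = "TGATATTTtctttaatg"
print(pbs_sequence(pbs_10))      # core nucleotides only
print(pbs_sequence(pbs_10, 3))   # core plus 3 flanking nucleotides
```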









TABLE 3







Exemplary RT sequence (heterologous object sequence) and PBS sequence pairs













ID
RT Sequence
SEQ ID NO
PBS Sequence
SEQ ID NO














10
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17841
TGATATTTtctttaat
17979





g






11
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17842
TGTTTCCTatgatga
17980



GG

at






12
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17843
TGATATTTtctttaat
17981





g






13
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17844
TGTTTCCTatgatga
17982



GG

at






20
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17845
GTTTCCTAtgatgaa
17983



GGT

ta






21
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17846
GTTTCCTAtgatgaa
17984



GGT

ta






22
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17847
GATATTTTctttaatg
17985



T

g






23
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17848
GATATTTTctttaatg
17986



T

g






24
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17849
GTTTCCTAtgatgaa
17987



GGT

ta






28
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17850
ATATTTTCtttaatgg
17988



TG

t






29
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17851
ATATTTTCtttaatgg
17989



TG

t






30
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17852
ATATTTTCtttaatgg
17990



TG

t






31
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17853
ATATTTTCtttaatgg
17991



TG

t






32
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17854
ATATTTTCtttaatgg
17992



TG

t






33
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17855
TTTCCTATgatgaat
17993



GGTG

at






45
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17856
TATTTTCTttaatggt
17994



TGA

g






46
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17857
TATTTTCTttaatggt
17995



TGA

g






47
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17858
TATTTTCTttaatggt
17996



TGA

g






48
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17859
TATTTTCTttaatggt
17997



TGA

g






49
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17860
TTCCTATGatgaatat
17998



GGTGT

a






56
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17861
ATTTTCTTtaatggtg
17999



TGAT

c






57
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17862
ATTTTCTTtaatggtg
18000



TGAT

c






58
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17863
TCCTATGAtgaatat
18001



GGTGTT

ag






59
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17864
TCCTATGAtgaatat
18002



GGTGTT

ag






60
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17865
ATTTTCTTtaatggtg
18003



TGAT

c






61
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17866
TCCTATGAtgaatat
18004



GGTGTT

ag






64
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17867
TTTTCTTTaatggtgc
18005



TGATA

c






65
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17868
CCTATGATgaatata
18006



GGTGTTT

ga






68
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17869
TTTCTTTAatggtgc
18007



TGATAT

ca






69
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17870
CTATGATGaatatag
18008



GGTGTTTC

at






70
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17871
CTATGATGaatatag
18009



GGTGTTTC

at






71
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17872
CTATGATGaatatag
18010



GGTGTTTC

at






72
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17873
TATGATGAatataga
18011



GGTGTTTCC

ta






73
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17874
TTCTTTAAtggtgcc
18012



TGATATT

ag






74
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17875
TATGATGAatataga
18013



GGTGTTTCC

ta






75
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17876
TATGATGAatataga
18014



GGTGTTTCC

ta






76
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17877
TATGATGAatataga
18015



GGTGTTTCC

ta






77
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17878
TATGATGAatataga
18016



GGTGTTTCC

ta






81
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17879
TCTTTAATggtgcca
18017



TGATATTT

gg






82
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17880
TCTTTAATggtgcca
18018



TGATATTT

gg






83
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17881
ATGATGAAtatagat
18019



GGTGTTTCCT

ac






88
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17882
CTTTAATGgtgcca
18020



TGATATTTT

ggc






89
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17883
TGATGAATatagata
18021



GGTGTTTCCTA

ca






90
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17884
CTTTAATGgtgcca
18022



TGATATTTT

ggc






91
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17885
CTTTAATGgtgcca
18023



TGATATTTT

ggc






92
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17886
TGATGAATatagata
18024



GGTGTTTCCTA

ca






98
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17887
GATGAATAtagata
18025



GGTGTTTCCTAT

cag






99
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17888
GATGAATAtagata
18026



GGTGTTTCCTAT

cag






100
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17889
TTTAATGGtgccag
18027



TGATATTTTC

gca






101
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17890
GATGAATAtagata
18028



GGTGTTTCCTAT

cag






111
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17891
ATGAATATagatac
18029



GGTGTTTCCTATG

aga






112
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17892
ATGAATATagatac
18030



GGTGTTTCCTATG

aga






113
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17893
ATGAATATagatac
18031



GGTGTTTCCTATG

aga






114
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17894
ATGAATATagatac
18032



GGTGTTTCCTATG

aga






115
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17895
ATGAATATagatac
18033



GGTGTTTCCTATG

aga






116
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17896
TTAATGGTgccagg
18034



TGATATTTTCT

cat






117
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17897
ATGAATATagatac
18035



GGTGTTTCCTATG

aga






133
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17898
TGAATATAgataca
18036



GGTGTTTCCTATGA

gaa






134
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17899
TGAATATAgataca
18037



GGTGTTTCCTATGA

gaa






135
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17900
TGAATATAgataca
18038



GGTGTTTCCTATGA

gaa






136
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17901
TGAATATAgataca
18039



GGTGTTTCCTATGA

gaa






137
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17902
TGAATATAgataca
18040



GGTGTTTCCTATGA

gaa






138
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17903
TAATGGTGccaggc
18041



TGATATTTTCTT

ata






139
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17904
TGAATATAgataca
18042



GGTGTTTCCTATGA

gaa






140
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17905
TGAATATAgataca
18043



GGTGTTTCCTATGA

gaa






148
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17906
AATGGTGCcaggca
18044



TGATATTTTCTTT

taa






149
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17907
AATGGTGCcaggca
18045



TGATATTTTCTTT

taa






150
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17908
GAATATAGatacag
18046



GGTGTTTCCTATGAT

aag






151
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17909
GAATATAGatacag
18047



GGTGTTTCCTATGAT

aag






152
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17910
AATGGTGCcaggca
18048



TGATATTTTCTTT

taa






153
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17911
GAATATAGatacag
18049



GGTGTTTCCTATGAT

aag






154
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17912
GAATATAGatacag
18050



GGTGTTTCCTATGAT

aag






162
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17913
ATGGTGCCaggcat
18051



TGATATTTTCTTTA

aat






163
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17914
AATATAGAtacaga
18052



GGTGTTTCCTATGATG

agc






164
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17915
ATGGTGCCaggcat
18053



TGATATTTTCTTTA

aat






165
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17916
ATGGTGCCaggcat
18054



TGATATTTTCTTTA

aat






166
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17917
AATATAGAtacaga
18055



GGTGTTTCCTATGATG

agc






178
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17918
TGGTGCCAggcata
18056



TGATATTTTCTTTAA

atc






179
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17919
TGGTGCCAggcata
18057



TGATATTTTCTTTAA

atc






180
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17920
TGGTGCCAggcata
18058



TGATATTTTCTTTAA

atc






181
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17921
TGGTGCCAggcata
18059



TGATATTTTCTTTAA

atc






182
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17922
ATATAGATacagaa
18060



GGTGTTTCCTATGATGA

gcg






189
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17923
GGTGCCAGgcataa
18061



TGATATTTTCTTTAAT

tcc






190
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17924
GGTGCCAGgcataa
18062



TGATATTTTCTTTAAT

tcc






191
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17925
GGTGCCAGgcataa
18063



TGATATTTTCTTTAAT

tcc






192
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17926
TATAGATAcagaag
18064



GGTGTTTCCTATGATGAA

cgt






193
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17927
GGTGCCAGgcataa
18065



TGATATTTTCTTTAAT

tcc






198
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17928
GTGCCAGGcataatc
18066



TGATATTTTCTTTAATG

ca






199
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17929
GTGCCAGGcataatc
18067



TGATATTTTCTTTAATG

ca






200
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17930
ATAGATACagaagc
18068



GGTGTTTCCTATGATGAAT

gtc






206
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17931
TGCCAGGCataatcc
18069



TGATATTTTCTTTAATGG

ag






207
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17932
TGCCAGGCataatcc
18070



TGATATTTTCTTTAATGG

ag






208
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17933
TAGATACAgaagcg
18071



GGTGTTTCCTATGATGAATA

tca






209
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17934
TAGATACAgaagcg
18072



GGTGTTTCCTATGATGAATA

tca






210
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17935
TAGATACAgaagcg
18073



GGTGTTTCCTATGATGAATA

tca






214
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17936
GCCAGGCAtaatcc
18074



TGATATTTTCTTTAATGGT

agg






215
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17937
GCCAGGCAtaatcc
18075



TGATATTTTCTTTAATGGT

agg






216
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17938
AGATACAGaagcgt
18076



GGTGTTTCCTATGATGAATAT

cat






217
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17939
CCAGGCATaatcca
18077



TGATATTTTCTTTAATGGTG

gga






218
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17940
GATACAGAagcgtc
18078



GGTGTTTCCTATGATGAATATA

atc






220
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17941
CAGGCATAatccag
18079



TGATATTTTCTTTAATGGTGC

gaa






221
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17942
ATACAGAAgcgtca
18080



GGTGTTTCCTATGATGAATATAG

tca






222
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17943
ATACAGAAgcgtca
18081



GGTGTTTCCTATGATGAATATAG

tca






224
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17944
AGGCATAAtccagg
18082



TGATATTTTCTTTAATGGTGCC

aaa






225
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17945
TACAGAAGcgtcat
18083



GGTGTTTCCTATGATGAATATAGA

caa






228
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17946
GGCATAATccagga
18084



TGATATTTTCTTTAATGGTGCCA

aaa






229
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17947
GGCATAATccagga
18085



TGATATTTTCTTTAATGGTGCCA

aaa






230
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17948
ACAGAAGCgtcatc
18086



GGTGTTTCCTATGATGAATATAGAT

aaa






233
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17949
CAGAAGCGtcatca
18087



GGTGTTTCCTATGATGAATATAGATA

aag






234
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17950
GCATAATCcaggaa
18088



TGATATTTTCTTTAATGGTGCCAG

aac






235
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17951
CAGAAGCGtcatca
18089



GGTGTTTCCTATGATGAATATAGATA

aag






236
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17952
GCATAATCcaggaa
18090



TGATATTTTCTTTAATGGTGCCAG

aac






237
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17953
GCATAATCcaggaa
18091



TGATATTTTCTTTAATGGTGCCAG

aac






240
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17954
CATAATCCaggaaa
18092



TGATATTTTCTTTAATGGTGCCAGG

act






241
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17955
CATAATCCaggaaa
18093



TGATATTTTCTTTAATGGTGCCAGG

act






242
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17956
CATAATCCaggaaa
18094



TGATATTTTCTTTAATGGTGCCAGG

act






243
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17957
AGAAGCGTcatcaa
18095



GGTGTTTCCTATGATGAATATAGATAC

agc






244
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17958
CATAATCCaggaaa
18096



TGATATTTTCTTTAATGGTGCCAGG

act






245
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17959
CATAATCCaggaaa
18097



TGATATTTTCTTTAATGGTGCCAGG

act






250
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17960
ATAATCCAggaaaa
18098



TGATATTTTCTTTAATGGTGCCAGGC

ctg






251
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17961
ATAATCCAggaaaa
18099



TGATATTTTCTTTAATGGTGCCAGGC

ctg






252
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17962
ATAATCCAggaaaa
18100



TGATATTTTCTTTAATGGTGCCAGGC

ctg






253
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17963
ATAATCCAggaaaa
18101



TGATATTTTCTTTAATGGTGCCAGGC

ctg






254
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17964
GAAGCGTCatcaaa
18102



GGTGTTTCCTATGATGAATATAGATACA

gca






263
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17965
TAATCCAGgaaaac
18103



TGATATTTTCTTTAATGGTGCCAGGCA

tga






264
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17966
TAATCCAGgaaaac
18104



TGATATTTTCTTTAATGGTGCCAGGCA

tga






265
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17967
TAATCCAGgaaaac
18105



TGATATTTTCTTTAATGGTGCCAGGCA

tga






266
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17968
TAATCCAGgaaaac
18106



TGATATTTTCTTTAATGGTGCCAGGCA

tga






267
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17969
AAGCGTCAtcaaag
18107



GGTGTTTCCTATGATGAATATAGATACAG

cat






268
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17970
TAATCCAGgaaaac
18108



TGATATTTTCTTTAATGGTGCCAGGCA

tga






272
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17971
AATCCAGGaaaact
18109



TGATATTTTCTTTAATGGTGCCAGGCAT

gag






273
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17972
AATCCAGGaaaact
18110



TGATATTTTCTTTAATGGTGCCAGGCAT

gag






274
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17973
AGCGTCATcaaagc
18111



GGTGTTTCCTATGATGAATATAGATACAGA

atg






275
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17974
AGCGTCATcaaagc
18112



GGTGTTTCCTATGATGAATATAGATACAGA

atg






276
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17975
AGCGTCATcaaagc
18113



GGTGTTTCCTATGATGAATATAGATACAGA

atg






278
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17976
GCGTCATCaaagca
18114



GGTGTTTCCTATGATGAATATAGATACAGAA

tgc






279
gcatgctttgatgacgcttctgtatctatattcatcataggaaacaccaAAGA
17977
ATCCAGGAaaactg
18115



TGATATTTTCTTTAATGGTGCCAGGCATA

aga






280
tgttctcagttttcctggattatgcctggcaccattaaagaaaatatcaTCTTT
17978
GCGTCATCaaagca
18116



GGTGTTTCCTATGATGAATATAGATACAGAA

tgc










Table 3 provides exemplary PBS sequences and heterologous object sequences (reverse transcription template regions) of a template RNA for correcting the pathogenic F508del mutation in CFTR. The gRNA spacers from Table 1 were filtered, e.g., by occurrence within 15 nt of the desired editing location and by use of a Tier 1 Cas enzyme. PBS sequences and heterologous object sequences (reverse transcription template regions) were designed relative to the nick site directed by the cognate gRNA from Table 1, as described in this application. For exemplification, the PBS (priming) regions were designed to be 8-17 nt long, and the heterologous object sequences (RT regions) were designed to extend 1-50 nt beyond the location of the edit. Without wishing to be limited by example, given the variability of length, sequences are provided that use the maximum length parameters and comprise all templates of shorter length within the given parameters. Sequences are shown with uppercase letters indicating core sequence and lowercase letters indicating flanking sequence that may be truncated within the described length parameters.
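As a minimal sketch of the truncation convention described above, the following Python enumerates every shorter variant implied by a maximum-length entry. It rests on assumptions that are illustrative rather than stated in this application: the uppercase core is contiguous, the lowercase flank sits on one side only, and the flank is shortened from its outer end. The helper name `truncation_variants` is hypothetical, and the example string is a reading of one Table 3 PBS entry with its wrapped cells (`ATGAATATagatac` and `aga`) joined.

```python
def truncation_variants(entry: str) -> list[str]:
    """List the core alone, then the core plus 1, 2, ... flanking nucleotides."""
    upper = [i for i, ch in enumerate(entry) if ch.isupper()]
    if not upper:
        raise ValueError("entry has no uppercase core")
    start, end = upper[0], upper[-1] + 1
    if not entry[start:end].isupper():
        raise ValueError("core must be contiguous")
    left, right = entry[:start], entry[end:]
    if left and right:
        raise ValueError("this sketch handles a flank on one side only")
    if right:
        # Core first: the flank is trimmed from the 3' end (assumption).
        return [entry[: end + k] for k in range(len(right) + 1)]
    # Flank first (or no flank): the flank is trimmed from the 5' end (assumption).
    return [entry[start - k:] for k in range(len(left) + 1)]


# PBS entry read from Table 3 (wrapped cells joined): 8-nt core, 9-nt flank.
pbs_variants = truncation_variants("ATGAATATagatacaga")
print(len(pbs_variants), pbs_variants[0], pbs_variants[-1])
```

Under these assumptions, one 17-nt entry stands for ten PBS candidates, one for each length from 8 to 17 nt.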


Capital letters indicate “core nucleotides” while lower case letters indicate “flanking nucleotides.” Herein, when an RNA sequence (e.g., a template RNA sequence) is said to comprise a particular sequence (e.g., a sequence of Table 3 or a portion thereof) that comprises thymine (T), it is of course understood that the RNA sequence may (and frequently does) comprise uracil (U) in place of T. For instance, the RNA sequence may comprise U at every position shown as T in the sequence in Table 3. More specifically, the present disclosure provides an RNA sequence according to every heterologous object sequence and PBS sequence shown in Table 3, wherein the RNA sequence has a U in place of each T in the sequence of Table 3.
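The T-to-U convention can be illustrated with a short Python sketch. The helper name `as_rna` is hypothetical; the code simply substitutes U for every T while preserving the uppercase/lowercase distinction between core and flanking nucleotides.

```python
# Map thymine to uracil in both cases so core/flank casing is preserved.
_DNA_TO_RNA = str.maketrans({"T": "U", "t": "u"})


def as_rna(seq: str) -> str:
    """Return the RNA form of a sequence written with T."""
    return seq.translate(_DNA_TO_RNA)


print(as_rna("ATGAATATagatac"))  # AUGAAUAUagauac
```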


In some embodiments of the systems and methods herein, the template RNA comprises a gRNA scaffold (e.g., that binds a gene modifying polypeptide, e.g., a Cas polypeptide) that comprises a sequence of a gRNA scaffold of Table 12. In some embodiments, the gRNA scaffold comprises a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity to a gRNA scaffold of Table 12. In some embodiments, the gRNA scaffold comprises a sequence of a scaffold region of Table 12 that corresponds to the RT template sequence, the spacer sequence, or both, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.


In some embodiments of the systems and methods herein, the system further comprises a second strand-targeting gRNA that directs a nick to the second strand of the human CFTR gene. In some embodiments, the second strand-targeting gRNA comprises a left gRNA spacer sequence or a right gRNA spacer sequence from Table 2. In some embodiments, the gRNA spacer additionally comprises one or more (e.g., 2, 3, or all) consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the left gRNA spacer sequence or right gRNA spacer sequence. In some embodiments, the second strand-targeting gRNA comprises a sequence comprising the core nucleotides of a second nick gRNA sequence from Table 4, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, the second nick gRNA sequence additionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the second nick gRNA sequence. In some embodiments, the second nick gRNA comprises a gRNA scaffold sequence that is orthogonal to the Cas domain of the gene modifying polypeptide. In some embodiments, the second strand-targeting gRNA comprises a sequence comprising the nucleotides of a second nick gRNA sequence from Table G3 or G3A, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. In some embodiments, the second nick gRNA comprises a gRNA scaffold sequence of Table 12.









TABLE 2
Exemplary left gRNA spacer and right gRNA spacer pairs
ID | Left gRNA Spacer | SEQ ID NO | Left PAM | SEQ ID NO | Right gRNA Spacer | SEQ ID NO | Right PAM | SEQ ID NO


















10
ATTTTACCCTCTGA
18117
CAG

GGGTAGTGTGAAG
18393
ATG




AGGCTC



GGTTCAT








11
TGGTGATTATGGGA
18118
GAG

AAAACTTTTTGATT
18394
ATG




GAACTG



ATGCAT








12
TCTGAAGGCTCCAG
18119
CAT

GGGTAGTGTGAAG
18395
ATG




TTCTCC



GGTTCAT








13
GGTGATTATGGGAG
18120
AGC

GATTATGCATATG
18396
CAC




AACTGG



AACCCTT








20
GATTATGGGAGAA
18121
TTCA

GCATATGAACCCTT
18397
CCCA




CTGGAGCC

G

CACACTA

A






21
GATTATGGGAGAA
18122
TTCA

GCATATGAACCCTT
18398
CCCA




CTGGAGCC

G

CACACTA

A






22
TTTTACCCTCTGAA
18123
AG

GGTAGTGTGAAGG
18399
TG




GGCTCC



GTTCATA








23
CTGAAGGCTCCAGT
18124
ATA

GGTAGTGTGAAGG
18400
TGC




TCTCCC



GTTCATA








24
GTGATTATGGGAGA
18125
GCC

ATTATGCATATGA
18401
ACA




ACTGGA



ACCCTTC








28
tgaAGGCTCCAGTTC
18126
CACC

gtgAAGGGTTCATAT
18402
AAAA




TCCCATAAT

ATT

GCATAATCA

GTT






29
AGTTCTCCCATAAT
18127
TAG

GGGTAGTGTGAAG
18403
ATG




CACCAT



GGTTCAT








30
TGCTTAATTTTACC
18128
AGG

TATAATTTGGGTAG
18404
GGG




CTCTGA



TGTGAA








31
TTTTACCCTCTGAA
18129
AG

GGTAGTGTGAAGG
18405
TG




GGCTCC



GTTCATA








32
TGAAGGCTCCAGTT
18130
TAA

GTAGTGTGAAGGG
18406
GCA




CTCCCA



TTCATAT








33
TGATTATGGGAGAA
18131
CCT

TTATGCATATGAAC
18407
CAC




CTGGAG



CCTTCA








45
TGTGCTTAATTTTA
18132
AAGG

TATATAATTTGGGT
18408
AGGG




CCCTCTG



AGTGTGA








46
CCAGTTCTCCCATA
18133
TTAG

AGGGTTCATATGC
18409
AAAG




ATCACCA



ATAATCAA








47
AGTTCTCCCATAAT
18134
TAG

GGGTAGTGTGAAG
18410
ATG




CACCAT



GGTTCAT








48
GAAGGCTCCAGTTC
18135
AAT

TAGTGTGAAGGGT
18411
CAT




TCCCAT



TCATATG








49
GATTATGGGAGAA
18136
CTT

TATGCATATGAAC
18412
ACT




CTGGAGC



CCTTCAC








56
CTCTGAAGGCTCCA
18137
CATA

GTAGTGTGAAGGG
18413
CATA




GTTCTCC

AT

TTCATATG

A






57
CTCTGAAGGCTCCA
18138
CATA

GTAGTGTGAAGGG
18414
CATA




GTTCTCC

AT

TTCATATG

A






58
GATTATGGGAGAA
18139
TTCA

GCATATGAACCCTT
18415
CCCA




CTGGAGCC

G

CACACTA

A






59
GATTATGGGAGAA
18140
TTCA

GCATATGAACCCTT
18416
CCCA




CTGGAGCC

G

CACACTA

A






60
AAGGCTCCAGTTCT
18141
ATC

AGTGTGAAGGGTT
18417
ATA




CCCATA



CATATGC








61
ATTATGGGAGAACT
18142
TTC

ATGCATATGAACC
18418
CTA




GGAGCC



CTTCACA








64
AGGCTCCAGTTCTC
18143
TCA

GTGTGAAGGGTTC
18419
TAA




CCATAA



ATATGCA








65
TTATGGGAGAACTG
18144
TCA

TGCATATGAACCCT
18420
TAC




GAGCCT



TCACAC








68
GGCTCCAGTTCTCC
18145
CAC

TGTGAAGGGTTCA
18421
AAT




CATAAT



TATGCAT








69
TATGGGAGAACTG
18146
CAG

GCATATGAACCCTT
18422
ACC




GAGCCTT



CACACT








70
tggtGATTATGGGAG
18147
CCTTC

ttatGCATATGAACC
18423
TACC




AACTGGAG



CTTCACAC

C






71
tggtGATTATGGGAG
18148
CCTTC

ttatGCATATGAACC
18424
TACC




AACTGGAG



CTTCACAC

C






72
tcTAATGGTGATTAT
18149
GGAG

gaTTATGCATATGA
18425
CTAC




GGGAGAACT

CC

ACCCTTCACA

CC






73
GCTCCAGTTCTCCC
18150
ACC

GTGAAGGGTTCAT
18426
ATC




ATAATC



ATGCATA








74
ATGGGAGAACTGG
18151
AGA

CATATGAACCCTTC
18427
CCC




AGCCTTC



ACACTA








75
tggtGATTATGGGAG
18152
CCTTC

ttatGCATATGAACC
18428
TACC




AACTGGAG



CTTCACAC

C






76
tggtGATTATGGGAG
18153
CCTTC

ttatGCATATGAACC
18429
TACC




AACTGGAG



CTTCACAC

C






77
tggtGATTATGGGAG
18154
CCTTC

ttatGCATATGAACC
18430
TACC




AACTGGAG



CTTCACAC

C






81
tgaAGGCTCCAGTTC
18155
CACC

gtgAAGGGTTCATAT
18431
AAAA




TCCCATAAT

ATT

GCATAATCA

GTT






82
CTCCAGTTCTCCCA
18156
CCA

TGAAGGGTTCATA
18432
TCA




TAATCA



TGCATAA








83
TGGGAGAACTGGA
18157
GAG

ATATGAACCCTTCA
18433
CCA




GCCTTCA



CACTAC








88
TCCAGTTCTCCCAT
18158
CAT

GAAGGGTTCATAT
18434
CAA




AATCAC



GCATAAT








89
GGGAGAACTGGAG
18159
AGG

TATGAACCCTTCAC
18435
CAA




CCTTCAG



ACTACC








90
tgaaGGCTCCAGTTCT
18160
TCAC

gtagTGTGAAGGGTT
18436
TAAT




CCCATAA

CATT

CATATGCA

CAAA






91
tgaaGGCTCCAGTTCT
18161
TCAC

gtagTGTGAAGGGTT
18437
TAAT




CCCATAA

CATT

CATATGCA

CAAA






92
tggtGATTATGGGAG
18162
CCTTC

ttatGCATATGAACC
18438
TACC




AACTGGAG



CTTCACAC

C






98
GGAGAACTGGAGC
18163
GGTA

CATATGAACCCTTC
18439
CCAA




CTTCAGAG

A

ACACTAC

A






99
GGAGAACTGGAGC
18164
GG

CACTACCCAAATT
18440
TG




CTTCAGA



ATATATT








100
CCAGTTCTCCCATA
18165
ATT

AAGGGTTCATATG
18441
AAA




ATCACC



CATAATC








101
GGAGAACTGGAGC
18166
GGG

ATGAACCCTTCAC
18442
AAA




CTTCAGA



ACTACCC








111
ttATGGGAGAACTGG
18167
GAGG

gtGAAAACTTTTTG
18443
ATGA




AGCCTTCA

G

ATTATGCAT

A






112
GGAGAACTGGAGC
18168
GGTA

CATATGAACCCTTC
18444
CCAA




CTTCAGAG

A

ACACTAC

A






113
GGAGAACTGGAGC
18169
GGG

ACACTACCCAAAT
18445
TTG




CTTCAGA



TATATAT








114
GGAGAACTGGAGC
18170
GGG

CACTACCCAAATT
18446
TGG




CTTCAGA



ATATATT








115
GAGAACTGGAGCC
18171
GG

CACTACCCAAATT
18447
TG




TTCAGAG



ATATATT








116
CAGTTCTCCCATAA
18172
TTA

AGGGTTCATATGC
18448
AAA




TCACCA



ATAATCA








117
GAGAACTGGAGCC
18173
GGT

TGAACCCTTCACAC
18449
AAT




TTCAGAG



TACCCA








133
ttATGGGAGAACTGG
18174
GAGG

gtGAAAACTTTTTG
18450
ATGA




AGCCTTCA

G

ATTATGCAT

A






134
GAGAACTGGAGCC
18175
GTAA

CATATGAACCCTTC
18451
CCAA




TTCAGAGG

A

ACACTAC

A






135
TGGGAGAACTGGA
18176
AGGG

CACACTACCCAAA
18452
TTGG




GCCTTCAG



TTATATAT








136
TGGGAGAACTGGA
18177
AGGG

CACACTACCCAAA
18453
TTGG




GCCTTCAG



TTATATAT








137
GGAGAACTGGAGC
18178
GGG

ACACTACCCAAAT
18454
TTG




CTTCAGA



TATATAT








138
AGTTCTCCCATAAT
18179
TAG

GGGTTCATATGCAT
18455
AAA




CACCAT



AATCAA








139
AGAACTGGAGCCTT
18180
GTA

GAACCCTTCACACT
18456
ATT




CAGAGG



ACCCAA








140
GGGTAAAATTAAG
18181
GAAG

AGCATGCCAACTA
18457
TAAG




CACAGTG

AAT

GAAGAGG

AAA






148
AGTTCTCCCATAAT
18182
AGAA

AAGGGTTCATATG
18458
AAAA




CACCATT

GT

CATAATCA

G






149
AGTTCTCCCATAAT
18183
AGAA

AAGGGTTCATATG
18459
AAAA




CACCATT

GT

CATAATCA

G






150
AGAACTGGAGCCTT
18184
TAAA

CATATGAACCCTTC
18460
CCAA




CAGAGGG

A

ACACTAC

A






151
TGGGAGAACTGGA
18185
AGGG

CACACTACCCAAA
18461
TTGG




GCCTTCAG



TTATATAT








152
GTTCTCCCATAATC
18186
AGA

GGTTCATATGCATA
18462
AAG




ACCATT



ATCAAA








153
GAACTGGAGCCTTC
18187
TAA

AACCCTTCACACTA
18463
TTA




AGAGGG



CCCAAA








154
GGGTAAAATTAAG
18188
GAAG

AGCATGCCAACTA
18464
TAAG




CACAGTG

AAT

GAAGAGG

AAA






162
AGTTCTCCCATAAT
18189
AGAA

AAGGGTTCATATG
18465
AAAA




CACCATT

GT

CATAATCA

G






163
AGAACTGGAGCCTT
18190
TAAA

CATATGAACCCTTC
18466
CCAA




CAGAGGG

A

ACACTAC

A






164
GTTCTCCCATAATC
18191
AG

GTTCATATGCATAA
18467
AG




ACCATT



TCAAAA








165
TTCTCCCATAATCA
18192
GAA

GTTCATATGCATAA
18468
AGT




CCATTA



TCAAAA








166
AACTGGAGCCTTCA
18193
AAA

ACCCTTCACACTAC
18469
TAT




GAGGGT



CCAAAT








178
ttCTCCCATAATCAC
18194
GTGA

aaATATATAATTTG
18470
AAGG




CATTAGAA

A

GGTAGTGTG

GT






179
TCTCCCATAATCAC
18195
AGTG

AAGGGTTCATATG
18471
AAAA




CATTAGA

A

CATAATCA

G






180
TCTCCCATAATCAC
18196
AAG

GGTTCATATGCATA
18472
AAG




CATTAG



ATCAAA








181
TCTCCCATAATCAC
18197
AAG

TTCATATGCATAAT
18473
GTT




CATTAG



CAAAAA








182
ACTGGAGCCTTCAG
18198
AAA

CCCTTCACACTACC
18474
ATA




AGGGTA



CAAATT








189
TCTCCCATAATCAC
18199
AGTG

AAGGGTTCATATG
18475
AAAA




CATTAGA

A

CATAATCA

G






190
GTTCTCCCATAATC
18200
GAAG

AGGGTTCATATGC
18476
AAAG




ACCATTA



ATAATCAA








191
CTCCCATAATCACC
18201
AGT

TCATATGCATAATC
18477
TTT




ATTAGA



AAAAAG








192
CTGGAGCCTTCAGA
18202
AAT

CCTTCACACTACCC
18478
TAT




GGGTAA



AAATTA








193
ATCACCATTAGAAG
18203
CTGG


18479

18617



TGAAGT

AAA










198
CTCCCATAATCACC
18204
GTGA

TGCATAATCAAAA
18480
CATA




ATTAGAA

A

AGTTTTCA

GT






199
TCCCATAATCACCA
18205
GTG

CATATGCATAATC
18481
TTT




TTAGAA



AAAAAGT








200
TGGAGCCTTCAGAG
18206
ATT

CTTCACACTACCCA
18482
ATA




GGTAAA



AATTAT








206
TCCCATAATCACCA
18207
TGAA

TGCATAATCAAAA
18483
CATA




TTAGAAG

GT

AGTTTTCA

GT






207
CCCATAATCACCAT
18208
TGA

ATATGCATAATCA
18484
TTC




TAGAAG



AAAAGTT








208
GGAGCCTTCAGAG
18209
TTA

TTCACACTACCCAA
18485
TAT




GGTAAAA



ATTATA








209
ctggAGCCTTCAGAG
18210
TAAG

ttcaCACTACCCAAA
18486
TTGG




GGTAAAAT

C

TTATATAT

C






210
ctggAGCCTTCAGAG
18211
TAAG

ttcaCACTACCCAAA
18487
TTGG




GGTAAAAT

C

TTATATAT

C






214
TCCCATAATCACCA
18212
TGAA

TGCATAATCAAAA
18488
CATA




TTAGAAG

GT

AGTTTTCA

GT






215
CCATAATCACCATT
18213
GAA

TATGCATAATCAA
18489
TCA




AGAAGT



AAAGTTT








216
GAGCCTTCAGAGG
18214
TAA

TCACACTACCCAA
18490
ATT




GTAAAAT



ATTATAT








217
CATAATCACCATTA
18215
AAG

ATGCATAATCAAA
18491
CAC




GAAGTG



AAGTTTT








218
AGCCTTCAGAGGGT
18216
AAG

CACACTACCCAAA
18492
TTT




AAAATT



TTATATA








220
ATAATCACCATTAG
18217
AGT

TGCATAATCAAAA
18493
ACA




AAGTGA



AGTTTTC








221
GCCTTCAGAGGGTA
18218
AGC

ACACTACCCAAAT
18494
TTG




AAATTA



TATATAT








222
ggagCCTTCAGAGGG
18219
AGCA

ttcaCACTACCCAAA
18495
TTGG




TAAAATTA

C

TTATATAT

C






224
TAATCACCATTAGA
18220
GTC

GCATAATCAAAAA
18496
CAT




AGTGAA



GTTTTCA








225
CCTTCAGAGGGTAA
18221
GCA

CACTACCCAAATT
18497
TGG




AATTAA



ATATATT








228
gaaGTGAAGTCTGGA
18222
CATC

ataTGCATAATCAAA
18498
CATA




AATAAAACC

ATT

AAGTTTTCA

GTT






229
AATCACCATTAGAA
18223
TCT

CATAATCAAAAAG
18499
ATA




GTGAAG



TTTTCAC








230
CTTCAGAGGGTAAA
18224
CAC

ACTACCCAAATTAT
18500
GGC




ATTAAG



ATATTT








233
agaGGGTAAAATTA
18225
AAGA

actACCCAAATTATA
18501
CCAT




AGCACAGTGG

ATT

TATTTGGCT

ATT






234
ATCACCATTAGAAG
18226
CTG

ATAATCAAAAAGT
18502
TAG




TGAAGT



TTTCACA








235
TTCAGAGGGTAAA
18227
ACA

CTACCCAAATTATA
18503
GCT




ATTAAGC



TATTTG








236
tcccATAATCACCATT
18228
AAGT

cataATCAAAAAGTT
18504
GTTTC




AGAAGTG

CTGG

TTCACATA








237
tcccATAATCACCATT
18229
AAGT

cataATCAAAAAGTT
18505
GTTTC




AGAAGTG

CTGG

TTCACATA








240
ccATTAGAAGTGAA
18230
AAAA

aaTCAAAAAGTTTT
18506
CTTAC




GTCTGGAAAT

CC

CACATAGTTT

C






241
TCACCATTAGAAGT
18231
TG

TAATCAAAAAGTT
18507
AG




GAAGTC



TTCACAT








242
TCACCATTAGAAGT
18232
TGG

TAATCAAAAAGTT
18508
AGT




GAAGTC



TTCACAT








243
TCAGAGGGTAAAA
18233
CAG

TACCCAAATTATAT
18509
CTC




TTAAGCA



ATTTGG








244
tcccATAATCACCATT
18234
AAGT

cataATCAAAAAGTT
18510
GTTTC




AGAAGTG

CTGG

TTCACATA








245
tcccATAATCACCATT
18235
AAGT

cataATCAAAAAGTT
18511
GTTTC




AGAAGTG

CTGG

TTCACATA








250
TCACCATTAGAAGT
18236
TGG

ATAATCAAAAAGT
18512
TAG




GAAGTC



TTTCACA








251
TCACCATTAGAAGT
18237
TGG

AGTTTCTTACCTCT
18513
TGG




GAAGTC



TCTAGT








252
CACCATTAGAAGTG
18238
GG

TAATCAAAAAGTT
18514
AG




AAGTCT



TTCACAT








253
CACCATTAGAAGTG
18239
GGA

AATCAAAAAGTTT
18515
GTT




AAGTCT



TCACATA








254
CAGAGGGTAAAAT
18240
AGT

ACCCAAATTATAT
18516
TCC




TAAGCAC



ATTTGGC








263
AATCACCATTAGAA
18241
CTGG

ATAGTTTCTTACCT
18517
TTGG




GTGAAGT



CTTCTAG








264
AATCACCATTAGAA
18242
CTGG

GCATAATCAAAAA
18518
ATAG




GTGAAGT



GTTTTCAC








265
TCACCATTAGAAGT
18243
TGG

ATAATCAAAAAGT
18519
TAG




GAAGTC



TTTCACA








266
ACCATTAGAAGTGA
18244
GAA

ATCAAAAAGTTTTC
18520
TTT




AGTCTG



ACATAG








267
AGAGGGTAAAATT
18245
GTG

CCCAAATTATATAT
18521
CCA




AAGCACA



TTGGCT








268
cattAGAAGTGAAGT
18246
AAAA

cataATCAAAAAGTT
18522
GTTTC




CTGGAAAT

C

TTCACATA








272
CCATTAGAAGTGAA
18247
AATA

TGCATAATCAAAA
18523
CATA




GTCTGGA

A

AGTTTTCA

GT






273
CCATTAGAAGTGAA
18248
AAA

TCAAAAAGTTTTCA
18524
TTC




GTCTGG



CATAGT








274
GAGGGTAAAATTA
18249
TGG

CCAAATTATATATT
18525
CAT




AGCACAG



TGGCTC








275
ggagCCTTCAGAGGG
18250
AGCA

cccaAATTATATATT
18526
TATTC




TAAAATTA

C

TGGCTCCA

AAT






276
ggagCCTTCAGAGGG
18251
AGCA

cccaAATTATATATT
18527
TATTC




TAAAATTA

C

TGGCTCCA

AAT






278
AGGGTAAAATTAA
18252
GG

ACTACCCAAATTAT
18528
GG




GCACAGT



ATATTT








279
CATTAGAAGTGAA
18253
AAT

CAAAAAGTTTTCA
18529
TCT




GTCTGGA



CATAGTT








280
AGGGTAAAATTAA
18254
GGA

CAAATTATATATTT
18530
ATA




GCACAGT



GGCTCC










Table 2 provides exemplary second-nick gRNA species for optional use in correcting the pathogenic F508del mutation in CFTR. The gRNA spacers from Table 1 were filtered, e.g., by occurrence within 15 nt of the desired editing location and by use of a Tier 1 Cas enzyme. Second-nick gRNAs were generated by searching the opposite strand of DNA, in the regions −40 to −140 (“left”) and +40 to +140 (“right”) relative to the first nick site defined by the first gRNA, for the PAM utilized by the corresponding Cas variant. One exemplary spacer is shown for each side of the target nick site.
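The window search described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used to build Table 2: the first nick is taken to lie on the supplied strand at 0-based index `nick`, the whole protospacer and PAM are required to fall inside the 40 to 140 window, the spacer is 20 nt with a 3′ PAM, and only a small subset of IUPAC codes is supported. The function name, parameters, and toy sequence are all hypothetical.

```python
import re

# Subset of IUPAC codes, enough for PAMs such as NGG or NNGRRT.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def second_nick_candidates(top, nick, pam="NGG", spacer_len=20, near=40, far=140):
    """Find (spacer, PAM) pairs on the opposite strand, left and right of the nick."""
    top = top.upper()
    windows = {
        "left": (max(0, nick - far), max(0, nick - near)),
        "right": (min(len(top), nick + near), min(len(top), nick + far)),
    }
    # Lookahead so that overlapping PAM matches are all reported.
    pam_re = re.compile("(?=(" + "".join(IUPAC[b] for b in pam) + "))")
    hits = {}
    for side, (lo, hi) in windows.items():
        bottom = revcomp(top[lo:hi])  # opposite strand, written 5'->3'
        hits[side] = [
            (bottom[m.start() - spacer_len:m.start()], m.group(1))
            for m in pam_re.finditer(bottom)
            if m.start() >= spacer_len  # need a full spacer 5' of the PAM
        ]
    return hits


# Toy sequence (not CFTR): one NGG site planted on the opposite strand, right side.
toy = "A" * 200 + "CCA" + "ACGT" * 5 + "A" * 77
print(second_nick_candidates(toy, nick=150))
```

A real search would swap in the PAM of the corresponding Cas variant and then choose one candidate per side, as Table 2 does.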


Capital letters indicate “core nucleotides” while lower case letters indicate “flanking nucleotides.” Herein, when an RNA sequence (e.g., a gRNA to produce a second nick) is said to comprise a particular sequence (e.g., a sequence of Table 2 or a portion thereof) that comprises thymine (T), it is of course understood that the RNA sequence may (and frequently does) comprise uracil (U) in place of T. For instance, the RNA sequence may comprise U at every position shown as T in the sequence in Table 2. More specifically, the present disclosure provides an RNA sequence according to every gRNA spacer sequence shown in Table 2, wherein the RNA sequence has a U in place of each T in the sequence in Table 2.


In some embodiments, the systems and methods provided herein may comprise a template sequence listed in Table 4. Table 4 provides exemplary template RNA sequences (column 4) and optional second-nick gRNA spacer sequences (column 5) designed to be paired with a gene modifying polypeptide to correct a mutation in the CFTR gene. The templates in Table 4 are meant to exemplify the total sequence of: (1) gRNA spacer (e.g., for targeting for first strand nick), (2) gRNA scaffold, (3) heterologous object sequence, and (4) PBS sequence (e.g., for initiating TPRT at first strand nick).
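The four-part composition listed above can be sketched as a simple concatenation in Python. The strings below are read from the first template entry shown in Table 4 with its wrapped cells joined. The boundaries between parts are assumptions made for illustration (a 20-nt spacer, and a split between heterologous object sequence and PBS inferred from the uppercase/lowercase pattern). The function name `assemble_template` is hypothetical.

```python
# Parts read from one Table 4 entry; the split points are assumptions.
SPACER = "CATTAAAGAAAATATCATTG"
SCAFFOLD = (
    "GTTTTAGAGCTA"
    "GAAATAGCAAGTTAAAATAAGGCTAGTCCGT"
    "TATCAACTTGAAAAAGTGGCACCGAGTCGGT"
    "GC"
)
HETEROLOGOUS_OBJECT = "atctatattcatcataggaaacaccaAAGA"
PBS = "TGATATTTtctt"


def assemble_template(spacer, scaffold, heterologous_object, pbs, as_rna=False):
    """Concatenate 5'->3': spacer, scaffold, heterologous object sequence, PBS."""
    template = spacer + scaffold + heterologous_object + pbs
    if as_rna:
        # Optional T->U conversion, preserving core/flank casing.
        template = template.translate(str.maketrans({"T": "U", "t": "u"}))
    return template


template = assemble_template(SPACER, SCAFFOLD, HETEROLOGOUS_OBJECT, PBS)
print(len(template), template)
```

Keeping the parts separate makes it straightforward to substitute a shorter PBS or heterologous object sequence within the length parameters given for Table 3.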









TABLE 4
Exemplary template RNA sequences and second nick gRNA spacer sequences
ID | Cas Species | strand | Template RNA | SEQ ID NO | Second-Nick gRNA | SEQ ID NO
















10
ScaCas9-
+
CATTAAAGAAAATATCATTGGTTTTAGAGCTA
18669
GGGTAGTGTGAAGGGTTCATGTTTTAGAGCT
18807



Sc++

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCatctatattcatcataggaaacaccaAAGATGATATTTtctt

GTGC






11
ScaCas9-
-
ATTCATCATAGGAAACACCAGTTTTAGAGCTA
18670
TGGTGATTATGGGAGAACTGGTTTTAGAGCT
18808



Sc++

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCtggcaccattaaagaaaatatcaTCTTTGGTGTTTCCTatg

GTGC






a








12
SpyCas9-
+
CATTAAAGAAAATATCATTGGTTTTAGAGCTA
18671
GGGTAGTGTGAAGGGTTCATGTTTTAGAGCT
18809



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCatctatattcatcataggaaacaccaAAGATGATATTTtctt

GTGC






13
SpyCas9-
-
ATTCATCATAGGAAACACCAGTTTTAGAGCTA
18672
GGTGATTATGGGAGAACTGGGTTTTAGAGCT
18810



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCtggcaccattaaagaaaatatcaTCTTTGGTGTTTCCTatg

GTGC






a








20
SauCas9KKH
-
ATATTCATCATAGGAAACACCGTTTTAGTACT
18673
GATTATGGGAGAACTGGAGCCGTTTTAGTAC
18811





CTGGAAACAGAATCTACTAAAACAAGGCAAA

TCTGGAAACAGAATCTACTAAAACAAGGCA






ATGCCGTGTTTATCTCGTCAACTTGTTGGCGA

AAATGCCGTGTTTATCTCGTCAACTTGTTGG






GAggcaccattaaagaaaatatcaTCTTTGGTGTTTCCTAt

CGAGA






gat








21
SauCas9KKH
-
ATATTCATCATAGGAAACACCGTTTTAGTACT
18674
GATTATGGGAGAACTGGAGCCGTTTTAGTAC
18812





CTGGAAACAGAATCTACTAAAACAAGGCAAA

TCTGGAAACAGAATCTACTAAAACAAGGCA






ATGCCGTGTTTATCTCGTCAACTTGTTGGCGA

AAATGCCGTGTTTATCTCGTCAACTTGTTGG






GAggcaccattaaagaaaatatcaTCTTTGGTGTTTCCTAt

CGAGA






gat








22
SpyCas9-
+
CCATTAAAGAAAATATCATTGTTTTAGAGCTA
18675
GGTAGTGTGAAGGGTTCATAGTTTTAGAGCT
18813



NG

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCtctatattcatcataggaaacaccaAAGATGATATTTTcttt

GTGC






23
SpyCas9-
+
CCATTAAAGAAAATATCATTGTTTTAGAGCTA
18676
GGTAGTGTGAAGGGTTCATAGTTTTAGAGCT
18814



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCtctatattcatcataggaaacaccaAAGATGATATTTTcttt

GTGC






24
SpyCas9-
-
TATTCATCATAGGAAACACCGTTTTAGAGCTA
18677
GTGATTATGGGAGAACTGGAGTTTTAGAGCT
18815



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCggcaccattaaagaaaatatcaTCTTTGGTGTTTCCTAtg

GTGC






at








28
PpnCas9
+
tggCACCATTAAAGAAAATATCATGTTGTAGCT
18678
gtgAAGGGTTCATATGCATAATCAGTTGTAGC
18816





CCCTTTTTCATTTCGCGAAAGCGAAATGAAAA

TCCCTTTTTCATTTCGCGAAAGCGAAATGAA






ACGTTGTTACAATAAGAGATGAATTTCTCGCA

AAACGTTGTTACAATAAGAGATGAATTTCTC






AAGCTCTGCCTCTTGAAATTTCGGTTTCAAGA

GCAAAGCTCTGCCTCTTGAAATTTCGGTTTC






GGCATCctatattcatcataggaaacaccaAAGATGATATT

AAGAGGCATC






TTCttta








29
ScaCas9-
+
ACCATTAAAGAAAATATCATGTTTTAGAGCTA
18679
GGGTAGTGTGAAGGGTTCATGTTTTAGAGCT
18817



Sc++

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCctatattcatcataggaaacaccaAAGATGATATTTTCttta

GTGC






30
SpyCas9
+
ACCATTAAAGAAAATATCATGTTTTAGAGCTA
18680
TATAATTTGGGTAGTGTGAAGTTTTAGAGCT
18818





GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCctatattcatcataggaaacaccaAAGATGATATTTTCttta

GTGC






31
SpyCas9-
+
ACCATTAAAGAAAATATCATGTTTTAGAGCTA
18681
GGTAGTGTGAAGGGTTCATAGTTTTAGAGCT
18819



NG

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCctatattcatcataggaaacaccaAAGATGATATTTTCttta

GTGC






32
SpyCas9-
+
ACCATTAAAGAAAATATCATGTTTTAGAGCTA
18682
GTAGTGTGAAGGGTTCATATGTTTTAGAGCT
18820



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCctatattcatcataggaaacaccaAAGATGATATTTTCttta

GTGC






33
SpyCas9-
-
ATATTCATCATAGGAAACACGTTTTAGAGCTA
18683
TGATTATGGGAGAACTGGAGGTTTTAGAGCT
18821



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCgcaccattaaagaaaatatcaTCTTTGGTGTTTCCTATg

GTGC






atg








45
SauriCas9
+
GCACCATTAAAGAAAATATCAGTTTTAGTACT
18684
TATATAATTTGGGTAGTGTGAGTTTTAGTAC
18822





CTGGAAACAGAATCTACTAAAACAAGGCAAA

TCTGGAAACAGAATCTACTAAAACAAGGCA






ATGCCGTGTTTATCTCGTCAACTTGTTGGCGA

AAATGCCGTGTTTATCTCGTCAACTTGTTGG






GAtatattcatcataggaaacaccaAAGATGATATTTTCTtta

CGAGA






a








46
SauriCas9-
+
GCACCATTAAAGAAAATATCAGTTTTAGTACT
18685
AGGGTTCATATGCATAATCAAGTTTTAGTAC
18823



KKH

CTGGAAACAGAATCTACTAAAACAAGGCAAA

TCTGGAAACAGAATCTACTAAAACAAGGCA






ATGCCGTGTTTATCTCGTCAACTTGTTGGCGA

AAATGCCGTGTTTATCTCGTCAACTTGTTGG






GAtatattcatcataggaaacaccaAAGATGATATTTTCTtta

CGAGA






a








47 | ScaCas9-Sc++ | + | CACCATTAAAGAAAATATCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCtatattcatcataggaaacaccaAAGATGATATTTTCTttaa | 18686 | GGGTAGTGTGAAGGGTTCATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18824
48 | SpyCas9-SpRY | + | CACCATTAAAGAAAATATCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCtatattcatcataggaaacaccaAAGATGATATTTTCTttaa | 18687 | TAGTGTGAAGGGTTCATATGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18825
49 | SpyCas9-SpRY |  | TATATTCATCATAGGAAACAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCcaccattaaagaaaatatcaTCTTTGGTGTTTCCTATGatga | 18688 | GATTATGGGAGAACTGGAGCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18826
56 | SauCas9KKH | + | GGCACCATTAAAGAAAATATCGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAatattcatcataggaaacaccaAAGATGATATTTTCTTtaat | 18689 | GTAGTGTGAAGGGTTCATATGGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18827
57 | SauCas9KKH | + | GGCACCATTAAAGAAAATATCGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAatattcatcataggaaacaccaAAGATGATATTTTCTTtaat | 18690 | GTAGTGTGAAGGGTTCATATGGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18828
58 | SauCas9KKH |  | TCTATATTCATCATAGGAAACGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAaccattaaagaaaatatcaTCTTTGGTGTTTCCTATGAtgaa | 18691 | GATTATGGGAGAACTGGAGCCGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18829
59 | SauCas9KKH |  | TCTATATTCATCATAGGAAACGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAaccattaaagaaaatatcaTCTTTGGTGTTTCCTATGAtgaa | 18692 | GATTATGGGAGAACTGGAGCCGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18830
60 | SpyCas9-SpRY | + | GCACCATTAAAGAAAATATCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCatattcatcataggaaacaccaAAGATGATATTTTCTTtaat | 18693 | AGTGTGAAGGGTTCATATGCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18831
61 | SpyCas9-SpRY |  | CTATATTCATCATAGGAAACGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaccattaaagaaaatatcaTCTTTGGTGTTTCCTATGAtgaa | 18694 | ATTATGGGAGAACTGGAGCCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18832
64 | SpyCas9-SpRY | + | GGCACCATTAAAGAAAATATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCtattcatcataggaaacaccaAAGATGATATTTTCTTTaatg | 18695 | GTGTGAAGGGTTCATATGCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18833
65 | SpyCas9-SpRY |  | TCTATATTCATCATAGGAAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCccattaaagaaaatatcaTCTTTGGTGTTTCCTATGATgaat | 18696 | TTATGGGAGAACTGGAGCCTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18834
68 | SpyCas9-SpRY | + | TGGCACCATTAAAGAAAATAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCattcatcataggaaacaccaAAGATGATATTTTCTTTAatgg | 18697 | TGTGAAGGGTTCATATGCATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18835
69 | SpyCas9-SpRY |  | ATCTATATTCATCATAGGAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCcattaaagaaaatatcaTCTTTGGTGTTTCCTATGATGaata | 18698 | TATGGGAGAACTGGAGCCTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18836
70 | BlatCas9 |  | tgtaTCTATATTCATCATAGGAAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTcattaaagaaaatatcaTCTTTGGTGTTTCCTATGATGaata | 18699 | tggtGATTATGGGAGAACTGGAGGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18837
71 | BlatCas9 |  | tgtaTCTATATTCATCATAGGAAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTcattaaagaaaatatcaTCTTTGGTGTTTCCTATGATGaata | 18700 | tggtGATTATGGGAGAACTGGAGGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18838
72 | Nme2Cas9 |  | tcTGTATCTATATTCATCATAGGAGTTGTAGCTCCCTTTCTCATTTCGGAAACGAAATGAGAACCGTTGCTACAATAAGGCCGTCTGAAAAGATGTGCCGCAACGCTCTGCCCCTTAAAGCTTCTGCTTTAAGGGGCATCGTTTAattaaagaaaatatcaTCTTTGGTGTTTCCTATGATGAatat | 18701 | tcTAATGGTGATTATGGGAGAACTGTTGTAGCTCCCTTTCTCATTTCGGAAACGAAATGAGAACCGTTGCTACAATAAGGCCGTCTGAAAAGATGTGCCGCAACGCTCTGCCCCTTAAAGCTTCTGCTTTAAGGGGCATCGTTTA | 18839
73 | SpyCas9-SpRY | + | CTGGCACCATTAAAGAAAATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCttcatcataggaaacaccaAAGATGATATTTTCTTTAAtggt | 18702 | GTGAAGGGTTCATATGCATAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18840
74 | SpyCas9-SpRY |  | TATCTATATTCATCATAGGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCattaaagaaaatatcaTCTTTGGTGTTTCCTATGATGAatat | 18703 | ATGGGAGAACTGGAGCCTTCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18841
75 | BlatCas9 |  | ctgtATCTATATTCATCATAGGAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTattaaagaaaatatcaTCTTTGGTGTTTCCTATGATGAatat | 18704 | tggtGATTATGGGAGAACTGGAGGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18842
76 | BlatCas9 |  | ctgtATCTATATTCATCATAGGAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTattaaagaaaatatcaTCTTTGGTGTTTCCTATGATGAatat | 18705 | tggtGATTATGGGAGAACTGGAGGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18843
77 | BlatCas9 |  | ctgtATCTATATTCATCATAGGAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTattaaagaaaatatcaTCTTTGGTGTTTCCTATGATGAatat | 18706 | tggtGATTATGGGAGAACTGGAGGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18844
81 | PpnCas9 | + | tatGCCTGGCACCATTAAAGAAAAGTTGTAGCTCCCTTTTTCATTTCGCGAAAGCGAAATGAAAAACGTTGTTACAATAAGAGATGAATTTCTCGCAAAGCTCTGCCTCTTGAAATTTCGGTTTCAAGAGGCATCtcatcataggaaacaccaAAGATGATATTTTCTTTAATggtg | 18707 | gtgAAGGGTTCATATGCATAATCAGTTGTAGCTCCCTTTTTCATTTCGCGAAAGCGAAATGAAAAACGTTGTTACAATAAGAGATGAATTTCTCGCAAAGCTCTGCCTCTTGAAATTTCGGTTTCAAGAGGCATC | 18845
82 | SpyCas9-SpRY | + | CCTGGCACCATTAAAGAAAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCtcatcataggaaacaccaAAGATGATATTTTCTTTAATggtg | 18708 | TGAAGGGTTCATATGCATAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18846
83 | SpyCas9-SpRY |  | GTATCTATATTCATCATAGGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCttaaagaaaatatcaTCTTTGGTGTTTCCTATGATGAAtata | 18709 | TGGGAGAACTGGAGCCTTCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18847
88 | SpyCas9-SpRY | + | GCCTGGCACCATTAAAGAAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCcatcataggaaacaccaAAGATGATATTTTCTTTAATGgtgc | 18710 | GAAGGGTTCATATGCATAATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18848
89 | SpyCas9-SpRY |  | TGTATCTATATTCATCATAGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCtaaagaaaatatcaTCTTTGGTGTTTCCTATGATGAATatag | 18711 | GGGAGAACTGGAGCCTTCAGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18849
90 | BlatCas9 | + | tatgCCTGGCACCATTAAAGAAAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTcatcataggaaacaccaAAGATGATATTTTCTTTAATGgtgc | 18712 | gtagTGTGAAGGGTTCATATGCAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18850
91 | BlatCas9 | + | tatgCCTGGCACCATTAAAGAAAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTcatcataggaaacaccaAAGATGATATTTTCTTTAATGgtgc | 18713 | gtagTGTGAAGGGTTCATATGCAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18851
92 | BlatCas9 |  | ttctGTATCTATATTCATCATAGGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTtaaagaaaatatcaTCTTTGGTGTTTCCTATGATGAATatag | 18714 | tggtGATTATGGGAGAACTGGAGGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18852
98 | SauCas9KKH |  | TCTGTATCTATATTCATCATAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAaaagaaaatatcaTCTTTGGTGTTTCCTATGATGAATAtaga | 18715 | GGAGAACTGGAGCCTTCAGAGGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18853
99 | SpyCas9-NG |  | CTGTATCTATATTCATCATAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaaagaaaatatcaTCTTTGGTGTTTCCTATGATGAATAtaga | 18716 | GGAGAACTGGAGCCTTCAGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18854
100 | SpyCas9-SpRY | + | TGCCTGGCACCATTAAAGAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCatcataggaaacaccaAAGATGATATTTTCTTTAATGGtgcc | 18717 | AAGGGTTCATATGCATAATCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18855
101 | SpyCas9-SpRY |  | CTGTATCTATATTCATCATAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaaagaaaatatcaTCTTTGGTGTTTCCTATGATGAATAtaga | 18718 | GGAGAACTGGAGCCTTCAGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18856
111 | SauCas9 |  | gcTTCTGTATCTATATTCATCATGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAaagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATagat | 18719 | ttATGGGAGAACTGGAGCCTTCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18857
112 | SauCas9KKH |  | TTCTGTATCTATATTCATCATGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAaagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATagat | 18720 | GGAGAACTGGAGCCTTCAGAGGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18858
113 | ScaCas9-Sc++ |  | TCTGTATCTATATTCATCATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATagat | 18721 | GGAGAACTGGAGCCTTCAGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18859
114 | SpyCas9 |  | TCTGTATCTATATTCATCATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATagat | 18722 | GGAGAACTGGAGCCTTCAGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18860
115 | SpyCas9-NG |  | TCTGTATCTATATTCATCATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATagat | 18723 | GAGAACTGGAGCCTTCAGAGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18861
116 | SpyCas9-SpRY | + | ATGCCTGGCACCATTAAAGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCtcataggaaacaccaAAGATGATATTTTCTTTAATGGTgcca | 18724 | AGGGTTCATATGCATAATCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18862
117 | SpyCas9-SpRY |  | TCTGTATCTATATTCATCATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATagat | 18725 | GAGAACTGGAGCCTTCAGAGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18863
133 | SauCas9 |  | cgCTTCTGTATCTATATTCATCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAgata | 18726 | ttATGGGAGAACTGGAGCCTTCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18864
134 | SauCas9KKH |  | CTTCTGTATCTATATTCATCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAgata | 18727 | GAGAACTGGAGCCTTCAGAGGGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18865
135 | SauriCas9 |  | CTTCTGTATCTATATTCATCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAgata | 18728 | TGGGAGAACTGGAGCCTTCAGGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18866
136 | SauriCas9-KKH |  | CTTCTGTATCTATATTCATCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAgata | 18729 | TGGGAGAACTGGAGCCTTCAGGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18867
137 | ScaCas9-Sc++ |  | TTCTGTATCTATATTCATCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAgata | 18730 | GGAGAACTGGAGCCTTCAGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18868
138 | SpyCas9-SpRY | + | TATGCCTGGCACCATTAAAGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCcataggaaacaccaAAGATGATATTTTCTTTAATGGTGccag | 18731 | GGGTTCATATGCATAATCAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18869
139 | SpyCas9-SpRY |  | TTCTGTATCTATATTCATCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAgata | 18732 | AGAACTGGAGCCTTCAGAGGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18870
140 | St1Cas9 |  | TTCTGTATCTATATTCATCAGTCTTTGTACTCTGGTACCAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTagaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAgata | 18733 | GGGTAAAATTAAGCACAGTGGTCTTTGTACTCTGGTACCAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTT | 18871
148 | SauCas9KKH | + | ATTATGCCTGGCACCATTAAAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAataggaaacaccaAAGATGATATTTTCTTTAATGGTGCcagg | 18734 | AAGGGTTCATATGCATAATCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18872
149 | SauCas9KKH | + | ATTATGCCTGGCACCATTAAAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAataggaaacaccaAAGATGATATTTTCTTTAATGGTGCcagg | 18735 | AAGGGTTCATATGCATAATCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18873
150 | SauCas9KKH |  | GCTTCTGTATCTATATTCATCGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAgaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAGatac | 18736 | AGAACTGGAGCCTTCAGAGGGGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18874
151 | SauriCas9-KKH |  | GCTTCTGTATCTATATTCATCGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAgaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAGatac | 18737 | TGGGAGAACTGGAGCCTTCAGGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18875
152 | SpyCas9-SpRY | + | TTATGCCTGGCACCATTAAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCataggaaacaccaAAGATGATATTTTCTTTAATGGTGCcagg | 18738 | GGTTCATATGCATAATCAAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18876
153 | SpyCas9-SpRY |  | CTTCTGTATCTATATTCATCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCgaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAGatac | 18739 | GAACTGGAGCCTTCAGAGGGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18877
154 | St1Cas9 |  | CTTCTGTATCTATATTCATCGTCTTTGTACTCTGGTACCAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTgaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAGatac | 18740 | GGGTAAAATTAAGCACAGTGGTCTTTGTACTCTGGTACCAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTT | 18878
162 | SauCas9KKH | + | GATTATGCCTGGCACCATTAAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAtaggaaacaccaAAGATGATATTTTCTTTAATGGTGCCaggc | 18741 | AAGGGTTCATATGCATAATCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18879
163 | SauCas9KKH |  | CGCTTCTGTATCTATATTCATGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAGAtaca | 18742 | AGAACTGGAGCCTTCAGAGGGGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18880
164 | SpyCas9-NG | + | ATTATGCCTGGCACCATTAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCtaggaaacaccaAAGATGATATTTTCTTTAATGGTGCCaggc | 18743 | GTTCATATGCATAATCAAAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18881
165 | SpyCas9-SpRY | + | ATTATGCCTGGCACCATTAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCtaggaaacaccaAAGATGATATTTTCTTTAATGGTGCCaggc | 18744 | GTTCATATGCATAATCAAAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18882
166 | SpyCas9-SpRY |  | GCTTCTGTATCTATATTCATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaaaatatcaTCTTTGGTGTTTCCTATGATGAATATAGAtaca | 18745 | AACTGGAGCCTTCAGAGGGTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18883
178 | SauCas9 | + | ctGGATTATGCCTGGCACCATTAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAaggaaacaccaAAGATGATATTTTCTTTAATGGTGCCAggca | 18746 | aaATATATAATTTGGGTAGTGTGGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18884
179 | SauCas9KKH | + | GGATTATGCCTGGCACCATTAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAaggaaacaccaAAGATGATATTTTCTTTAATGGTGCCAggca | 18747 | AAGGGTTCATATGCATAATCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18885
180 | ScaCas9-Sc++ | + | GATTATGCCTGGCACCATTAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaggaaacaccaAAGATGATATTTTCTTTAATGGTGCCAggca | 18748 | GGTTCATATGCATAATCAAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18886
181 | SpyCas9-SpRY | + | GATTATGCCTGGCACCATTAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaggaaacaccaAAGATGATATTTTCTTTAATGGTGCCAggca | 18749 | TTCATATGCATAATCAAAAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18887
182 | SpyCas9-SpRY |  | CGCTTCTGTATCTATATTCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaaatatcaTCTTTGGTGTTTCCTATGATGAATATAGATacag | 18750 | ACTGGAGCCTTCAGAGGGTAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18888
189 | SauCas9KKH | + | TGGATTATGCCTGGCACCATTGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAggaaacaccaAAGATGATATTTTCTTTAATGGTGCCAGgcat | 18751 | AAGGGTTCATATGCATAATCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18889
190 | SauriCas9-KKH | + | TGGATTATGCCTGGCACCATTGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAggaaacaccaAAGATGATATTTTCTTTAATGGTGCCAGgcat | 18752 | AGGGTTCATATGCATAATCAAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18890
191 | SpyCas9-SpRY | + | GGATTATGCCTGGCACCATTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCggaaacaccaAAGATGATATTTTCTTTAATGGTGCCAGgcat | 18753 | TCATATGCATAATCAAAAAGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18891
192 | SpyCas9-SpRY |  | ACGCTTCTGTATCTATATTCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaatatcaTCTTTGGTGTTTCCTATGATGAATATAGATAcaga | 18754 | CTGGAGCCTTCAGAGGGTAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18892
193 | St1Cas9 | + | GGATTATGCCTGGCACCATTGTCTTTGTACTCTGGTACCAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTggaaacaccaAAGATGATATTTTCTTTAATGGTGCCAGgcat | 18755 | GTCTTTGTACTCTGGTACCAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTT | 18893
198 | SauCas9KKH | + | CTGGATTATGCCTGGCACCATGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAgaaacaccaAAGATGATATTTTCTTTAATGGTGCCAGGcata | 18756 | TGCATAATCAAAAAGTTTTCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18894
199 | SpyCas9-SpRY | + | TGGATTATGCCTGGCACCATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCgaaacaccaAAGATGATATTTTCTTTAATGGTGCCAGGcata | 18757 | CATATGCATAATCAAAAAGTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18895
200 | SpyCas9-SpRY |  | GACGCTTCTGTATCTATATTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCatatcaTCTTTGGTGTTTCCTATGATGAATATAGATACagaa | 18758 | TGGAGCCTTCAGAGGGTAAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18896
206 | SauCas9KKH | + | CCTGGATTATGCCTGGCACCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAaaacaccaAAGATGATATTTTCTTTAATGGTGCCAGGCataa | 18759 | TGCATAATCAAAAAGTTTTCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18897
207 | SpyCas9-SpRY | + | CTGGATTATGCCTGGCACCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaaacaccaAAGATGATATTTTCTTTAATGGTGCCAGGCataa | 18760 | ATATGCATAATCAAAAAGTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18898
208 | SpyCas9-SpRY |  | TGACGCTTCTGTATCTATATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCtatcaTCTTTGGTGTTTCCTATGATGAATATAGATACAgaag | 18761 | GGAGCCTTCAGAGGGTAAAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18899
209 | BlatCas9 |  | tgatGACGCTTCTGTATCTATATGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTtatcaTCTTTGGTGTTTCCTATGATGAATATAGATACAgaag | 18762 | ctggAGCCTTCAGAGGGTAAAATGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18900
210 | BlatCas9 |  | tgatGACGCTTCTGTATCTATATGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTtatcaTCTTTGGTGTTTCCTATGATGAATATAGATACAgaag | 18763 | ctggAGCCTTCAGAGGGTAAAATGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18901
214 | SauCas9KKH | + | TCCTGGATTATGCCTGGCACCGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGAaacaccaAAGATGATATTTTCTTTAATGGTGCCAGGCAtaat | 18764 | TGCATAATCAAAAAGTTTTCAGTTTTAGTACTCTGGAAACAGAATCTACTAAAACAAGGCAAAATGCCGTGTTTATCTCGTCAACTTGTTGGCGAGA | 18902
215 | SpyCas9-SpRY | + | CCTGGATTATGCCTGGCACCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaacaccaAAGATGATATTTTCTTTAATGGTGCCAGGCAtaat | 18765 | TATGCATAATCAAAAAGTTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18903
216 | SpyCas9-SpRY |  | ATGACGCTTCTGTATCTATAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCatcaTCTTTGGTGTTTCCTATGATGAATATAGATACAGaagc | 18766 | GAGCCTTCAGAGGGTAAAATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18904
217 | SpyCas9-SpRY | + | TCCTGGATTATGCCTGGCACGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCacaccaAAGATGATATTTTCTTTAATGGTGCCAGGCATaatc | 18767 | ATGCATAATCAAAAAGTTTTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18905
218 | SpyCas9-SpRY |  | GATGACGCTTCTGTATCTATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCtcaTCTTTGGTGTTTCCTATGATGAATATAGATACAGAagcg | 18768 | AGCCTTCAGAGGGTAAAATTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18906
220 | SpyCas9-SpRY | + | TTCCTGGATTATGCCTGGCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCcaccaAAGATGATATTTTCTTTAATGGTGCCAGGCATAatcc | 18769 | TGCATAATCAAAAAGTTTTCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18907
221 | SpyCas9-SpRY |  | TGATGACGCTTCTGTATCTAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCcaTCTTTGGTGTTTCCTATGATGAATATAGATACAGAAgcgt | 18770 | GCCTTCAGAGGGTAAAATTAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18908
222 | BlatCas9 |  | ctttGATGACGCTTCTGTATCTAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTcaTCTTTGGTGTTTCCTATGATGAATATAGATACAGAAgcgt | 18771 | ggagCCTTCAGAGGGTAAAATTAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18909
224 | SpyCas9-SpRY | + | TTTCCTGGATTATGCCTGGCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaccaAAGATGATATTTTCTTTAATGGTGCCAGGCATAAtcca | 18772 | GCATAATCAAAAAGTTTTCAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18910
225 | SpyCas9-SpRY |  | TTGATGACGCTTCTGTATCTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCaTCTTTGGTGTTTCCTATGATGAATATAGATACAGAAGcgtc | 18773 | CCTTCAGAGGGTAAAATTAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18911
228 | PpnCas9 | + | tcaGTTTTCCTGGATTATGCCTGGGTTGTAGCTCCCTTTTTCATTTCGCGAAAGCGAAATGAAAAACGTTGTTACAATAAGAGATGAATTTCTCGCAAAGCTCTGCCTCTTGAAATTTCGGTTTCAAGAGGCATCccaAAGATGATATTTTCTTTAATGGTGCCAGGCATAATccag | 18774 | ataTGCATAATCAAAAAGTTTTCAGTTGTAGCTCCCTTTTTCATTTCGCGAAAGCGAAATGAAAAACGTTGTTACAATAAGAGATGAATTTCTCGCAAAGCTCTGCCTCTTGAAATTTCGGTTTCAAGAGGCATC | 18912
229 | SpyCas9-SpRY | + | TTTTCCTGGATTATGCCTGGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCccaAAGATGATATTTTCTTTAATGGTGCCAGGCATAATccag | 18775 | CATAATCAAAAAGTTTTCACGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18913
230 | SpyCas9-SpRY |  | TTTGATGACGCTTCTGTATCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTCTTTGGTGTTTCCTATGATGAATATAGATACAGAAGCgtca | 18776 | CTTCAGAGGGTAAAATTAAGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18914
233 | PpnCas9 |  | catGCTTTGATGACGCTTCTGTATGTTGTAGCTCCCTTTTTCATTTCGCGAAAGCGAAATGAAAAACGTTGTTACAATAAGAGATGAATTTCTCGCAAAGCTCTGCCTCTTGAAATTTCGGTTTCAAGAGGCATCCTTTGGTGTTTCCTATGATGAATATAGATACAGAAGCGtcat | 18777 | agaGGGTAAAATTAAGCACAGTGGGTTGTAGCTCCCTTTTTCATTTCGCGAAAGCGAAATGAAAAACGTTGTTACAATAAGAGATGAATTTCTCGCAAAGCTCTGCCTCTTGAAATTTCGGTTTCAAGAGGCATC | 18915
234 | SpyCas9-SpRY | + | GTTTTCCTGGATTATGCCTGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCcaAAGATGATATTTTCTTTAATGGTGCCAGGCATAATCcagg | 18778 | ATAATCAAAAAGTTTTCACAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18916
235 | SpyCas9-SpRY |  | CTTTGATGACGCTTCTGTATGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCCTTTGGTGTTTCCTATGATGAATATAGATACAGAAGCGtcat | 18779 | TTCAGAGGGTAAAATTAAGCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | 18917
236 | BlatCas9 | + | tcagTTTTCCTGGATTATGCCTGGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTcaAAGATGATATTTTCTTTAATGGTGCCAGGCATAATCcagg | 18780 | cataATCAAAAAGTTTTCACATAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18918
237 | BlatCas9 | + | tcagTTTTCCTGGATTATGCCTGGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCTcaAAGATGATATTTTCTTTAATGGTGCCAGGCATAATCcagg | 18781 | cataATCAAAAAGTTTTCACATAGCTATAGTTCCTTACTGAAAGGTAAGTTGCTATAGTAAGGGCAACAGACCCGAGGCGTTGGGGATCGCCTAGCCCGTGTTTACGGGCTCTCCCCATATTCAAAATAATGACAGACGAGCACCTTGGAGCATTTATCTCCGAGGTGCT | 18919
240 | Nme2Cas9 | + | tcTCAGTTTTCCTGGATTATGCCTGTTGTAGCTCCCTTTCTCATTTCGGAAACGAAATGAGAACCGTTGCTACAATAAGGCCGTCTGAAAAGATGTGCCGCAACGCTCTGCCCCTTAAAGCTTCTGCTTTAAGGGGCATCGTTTAaAAGATGATATTTTCTTTAATGGTGCCAGGCATAATCCagga | 18782 | aaTCAAAAAGTTTTCACATAGTTTGTTGTAGCTCCCTTTCTCATTTCGGAAACGAAATGAGAACCGTTGCTACAATAAGGCCGTCTGAAAAGATGTGCCGCAACGCTCTGCCCCTTAAAGCTTCTGCTTTAAGGGGCATCGTTTA | 18920
241
SpyCas9-
+
AGTTTTCCTGGATTATGCCTGTTTTAGAGCTA
18783
TAATCAAAAAGTTTTCACATGTTTTAGAGCT
18921



NG

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCaAAGATGATATTTTCTTTAATGGTGCCAGG

GTGC






CATAATCCagga








242
SpyCas9-
+
AGTTTTCCTGGATTATGCCTGTTTTAGAGCTA
18784
TAATCAAAAAGTTTTCACATGTTTTAGAGCT
18922



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCaAAGATGATATTTTCTTTAATGGTGCCAGG

GTGC






CATAATCCagga








243
SpyCas9-

GCTTTGATGACGCTTCTGTAGTTTTAGAGCTA
18785
TCAGAGGGTAAAATTAAGCAGTTTTAGAGCT
18923



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCTTTGGTGTTTCCTATGATGAATATAGATAC

GTGC






AGAAGCGTcatc








244
BlatCas9
+
ctcaGTTTTCCTGGATTATGCCTGCTATAGTTCC
18786
cataATCAAAAAGTTTTCACATAGCTATAGTTC
18924





TTACTGAAAGGTAAGTTGCTATAGTAAGGGC

CTTACTGAAAGGTAAGTTGCTATAGTAAGGG






AACAGACCCGAGGCGTTGGGGATCGCCTAGC

CAACAGACCCGAGGCGTTGGGGATCGCCTA






CCGTGTTTACGGGCTCTCCCCATATTCAAAAT

GCCCGTGTTTACGGGCTCTCCCCATATTCAA






AATGACAGACGAGCACCTTGGAGCATTTATCT

AATAATGACAGACGAGCACCTTGGAGCATTT






CCGAGGTGCTaAAGATGATATTTTCTTTAATG

ATCTCCGAGGTGCT






GTGCCAGGCATAATCCagga








245
BlatCas9
+
ctcaGTTTTCCTGGATTATGCCTGCTATAGTTCC
18787
cataATCAAAAAGTTTTCACATAGCTATAGTTC
18925





TTACTGAAAGGTAAGTTGCTATAGTAAGGGC

CTTACTGAAAGGTAAGTTGCTATAGTAAGGG






AACAGACCCGAGGCGTTGGGGATCGCCTAGC

CAACAGACCCGAGGCGTTGGGGATCGCCTA






CCGTGTTTACGGGCTCTCCCCATATTCAAAAT

GCCCGTGTTTACGGGCTCTCCCCATATTCAA






AATGACAGACGAGCACCTTGGAGCATTTATCT

AATAATGACAGACGAGCACCTTGGAGCATTT






CCGAGGTGCTaAAGATGATATTTTCTTTAATG

ATCTCCGAGGTGCT






GTGCCAGGCATAATCCagga








250
ScaCas9-
+
CAGTTTTCCTGGATTATGCCGTTTTAGAGCTA
18788
ATAATCAAAAAGTTTTCACAGTTTTAGAGCT
18926



Sc++

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCAAGATGATATTTTCTTTAATGGTGCCAGGC

GTGC






ATAATCCAggaa








251
SpyCas9
+
CAGTTTTCCTGGATTATGCCGTTTTAGAGCTA
18789
AGTTTCTTACCTCTTCTAGTGTTTTAGAGCTA
18927





GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

GAAATAGCAAGTTAAAATAAGGCTAGTCCG






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

TTATCAACTTGAAAAAGTGGCACCGAGTCGG






GCAAGATGATATTTTCTTTAATGGTGCCAGGC

TGC






ATAATCCAggaa








252
SpyCas9-
+
CAGTTTTCCTGGATTATGCCGTTTTAGAGCTA
18790
TAATCAAAAAGTTTTCACATGTTTTAGAGCT
18928



NG

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCAAGATGATATTTTCTTTAATGGTGCCAGGC

GTGC






ATAATCCAggaa








253
SpyCas9-
+
CAGTTTTCCTGGATTATGCCGTTTTAGAGCTA
18791
AATCAAAAAGTTTTCACATAGTTTTAGAGCT
18929



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCAAGATGATATTTTCTTTAATGGTGCCAGGC

GTGC






ATAATCCAggaa








254
SpyCas9-

TGCTTTGATGACGCTTCTGTGTTTTAGAGCTA
18792
CAGAGGGTAAAATTAAGCACGTTTTAGAGCT
18930



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCTTGGTGTTTCCTATGATGAATATAGATACA

GTGC






GAAGCGTCatca








263
SauriCas9
+
CTCAGTTTTCCTGGATTATGCGTTTTAGTACTC
18793
ATAGTTTCTTACCTCTTCTAGGTTTTAGTACT
18931





TGGAAACAGAATCTACTAAAACAAGGCAAAA

CTGGAAACAGAATCTACTAAAACAAGGCAA






TGCCGTGTTTATCTCGTCAACTTGTTGGCGAG

AATGCCGTGTTTATCTCGTCAACTTGTTGGC






AAGATGATATTTTCTTTAATGGTGCCAGGCAT

GAGA






AATCCAGgaaa








264
SauriCas9-
+
CTCAGTTTTCCTGGATTATGCGTTTTAGTACTC
18794
GCATAATCAAAAAGTTTTCACGTTTTAGTAC
18932



KKH

TGGAAACAGAATCTACTAAAACAAGGCAAAA

TCTGGAAACAGAATCTACTAAAACAAGGCA






TGCCGTGTTTATCTCGTCAACTTGTTGGCGAG

AAATGCCGTGTTTATCTCGTCAACTTGTTGG






AAGATGATATTTTCTTTAATGGTGCCAGGCAT

CGAGA






AATCCAGgaaa








265
ScaCas9-
+
TCAGTTTTCCTGGATTATGCGTTTTAGAGCTA
18795
ATAATCAAAAAGTTTTCACAGTTTTAGAGCT
18933



Sc++

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCAGATGATATTTTCTTTAATGGTGCCAGGCA

GTGC






TAATCCAGgaaa








266
SpyCas9-
+
TCAGTTTTCCTGGATTATGCGTTTTAGAGCTA
18796
ATCAAAAAGTTTTCACATAGGTTTTAGAGCT
18934



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCAGATGATATTTTCTTTAATGGTGCCAGGCA

GTGC






TAATCCAGgaaa








267
SpyCas9-

ATGCTTTGATGACGCTTCTGGTTTTAGAGCTA
18797
AGAGGGTAAAATTAAGCACAGTTTTAGAGC
18935



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

TAGAAATAGCAAGTTAAAATAAGGCTAGTC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

CGTTATCAACTTGAAAAAGTGGCACCGAGTC






GCTGGTGTTTCCTATGATGAATATAGATACAG

GGTGC






AAGCGTCAtcaa








268
BlatCas9
+
ttctCAGTTTTCCTGGATTATGCGCTATAGTTCCT
18798
cataATCAAAAAGTTTTCACATAGCTATAGTTC
18936





TACTGAAAGGTAAGTTGCTATAGTAAGGGCA

CTTACTGAAAGGTAAGTTGCTATAGTAAGGG






ACAGACCCGAGGCGTTGGGGATCGCCTAGCC

CAACAGACCCGAGGCGTTGGGGATCGCCTA






CGTGTTTACGGGCTCTCCCCATATTCAAAATA

GCCCGTGTTTACGGGCTCTCCCCATATTCAA






ATGACAGACGAGCACCTTGGAGCATTTATCTC

AATAATGACAGACGAGCACCTTGGAGCATTT






CGAGGTGCTAGATGATATTTTCTTTAATGGTG

ATCTCCGAGGTGCT






CCAGGCATAATCCAGgaaa








272
SauCas9KKH
+
TCTCAGTTTTCCTGGATTATGGTTTTAGTACTC
18799
TGCATAATCAAAAAGTTTTCAGTTTTAGTAC
18937





TGGAAACAGAATCTACTAAAACAAGGCAAAA

TCTGGAAACAGAATCTACTAAAACAAGGCA






TGCCGTGTTTATCTCGTCAACTTGTTGGCGAG

AAATGCCGTGTTTATCTCGTCAACTTGTTGG






AGATGATATTTTCTTTAATGGTGCCAGGCATA

CGAGA






ATCCAGGaaaa








273
SpyCas9-
+
CTCAGTTTTCCTGGATTATGGTTTTAGAGCTA
18800
TCAAAAAGTTTTCACATAGTGTTTTAGAGCT
18938



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCGATGATATTTTCTTTAATGGTGCCAGGCAT

GTGC






AATCCAGGaaaa








274
SpyCas9-

CATGCTTTGATGACGCTTCTGTTTTAGAGCTA
18801
GAGGGTAAAATTAAGCACAGGTTTTAGAGC
18939



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

TAGAAATAGCAAGTTAAAATAAGGCTAGTC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

CGTTATCAACTTGAAAAAGTGGCACCGAGTC






GCGGTGTTTCCTATGATGAATATAGATACAGA

GGTGC






AGCGTCATcaaa








275
BlatCas9

tggcATGCTTTGATGACGCTTCTGCTATAGTTCC
18802
ggagCCTTCAGAGGGTAAAATTAGCTATAGTT
18940





TTACTGAAAGGTAAGTTGCTATAGTAAGGGC

CCTTACTGAAAGGTAAGTTGCTATAGTAAGG






AACAGACCCGAGGCGTTGGGGATCGCCTAGC

GCAACAGACCCGAGGCGTTGGGGATCGCCT






CCGTGTTTACGGGCTCTCCCCATATTCAAAAT

AGCCCGTGTTTACGGGCTCTCCCCATATTCA






AATGACAGACGAGCACCTTGGAGCATTTATCT

AAATAATGACAGACGAGCACCTTGGAGCAT






CCGAGGTGCTGGTGTTTCCTATGATGAATATA

TTATCTCCGAGGTGCT






GATACAGAAGCGTCATcaaa








276
BlatCas9

tggcATGCTTTGATGACGCTTCTGCTATAGTTCC
18803
ggagCCTTCAGAGGGTAAAATTAGCTATAGTT
18941





TTACTGAAAGGTAAGTTGCTATAGTAAGGGC

CCTTACTGAAAGGTAAGTTGCTATAGTAAGG






AACAGACCCGAGGCGTTGGGGATCGCCTAGC

GCAACAGACCCGAGGCGTTGGGGATCGCCT






CCGTGTTTACGGGCTCTCCCCATATTCAAAAT

AGCCCGTGTTTACGGGCTCTCCCCATATTCA






AATGACAGACGAGCACCTTGGAGCATTTATCT

AAATAATGACAGACGAGCACCTTGGAGCAT






CCGAGGTGCTGGTGTTTCCTATGATGAATATA

TTATCTCCGAGGTGCT






GATACAGAAGCGTCATcaaa








278
SpyCas9-

GCATGCTTTGATGACGCTTCGTTTTAGAGCTA
18804
AGGGTAAAATTAAGCACAGTGTTTTAGAGCT
18942



NG

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCGTGTTTCCTATGATGAATATAGATACAGAA

GTGC






GCGTCATCaaag








279
SpyCas9-
+
TCTCAGTTTTCCTGGATTATGTTTTAGAGCTA
18805
CAAAAAGTTTTCACATAGTTGTTTTAGAGCT
18943



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCATGATATTTTCTTTAATGGTGCCAGGCATA

GTGC






ATCCAGGAaaac








280
SpyCas9-

GCATGCTTTGATGACGCTTCGTTTTAGAGCTA
18806
AGGGTAAAATTAAGCACAGTGTTTTAGAGCT
18944



SpRY

GAAATAGCAAGTTAAAATAAGGCTAGTCCGT

AGAAATAGCAAGTTAAAATAAGGCTAGTCC






TATCAACTTGAAAAAGTGGCACCGAGTCGGT

GTTATCAACTTGAAAAAGTGGCACCGAGTCG






GCGTGTTTCCTATGATGAATATAGATACAGAA

GTGC






GCGTCATCaaag










Table 4 provides the design of RNA components of gene modifying systems for correcting the pathogenic F508del mutation in CFTR. The gRNA spacers from Table 1 were filtered, e.g., by occurrence within 15 nt of the desired editing location and use of a Tier 1 Cas enzyme. For each gRNA ID, this table details the sequence of a complete template RNA, an optional second-nick gRNA, and the Cas variant for use in a Cas-RT fusion gene modifying polypeptide. For exemplification, PBS sequences and post-edit homology regions (after the location of the edit) are set to 12 nt and 30 nt, respectively. Additionally, a second-nick gRNA is selected with a first preference for a design resulting in a PAM-in system and a further preference for a distance near 100 nt from the first nick, as described elsewhere in this application.
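The selection rules described for Table 4 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions: the record types `SpacerCandidate` and `SecondNickCandidate`, their field names, and any example values are hypothetical and are not taken from Table 1 or Table 4. Only the thresholds (15 nt, Tier 1, a distance near 100 nt, PAM-in preferred first) come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SpacerCandidate:
    """Hypothetical record for one gRNA spacer candidate."""
    grna_id: int
    cas_variant: str
    tier: int              # Cas enzyme tier label (assumed to be an integer)
    nick_to_edit_nt: int   # distance from the first nick to the edit location


@dataclass(frozen=True)
class SecondNickCandidate:
    """Hypothetical record for one second-nick gRNA candidate."""
    name: str
    distance_nt: int       # distance from the first nick
    pam_in: bool           # True if the design results in a PAM-in system


def filter_spacers(spacers: List[SpacerCandidate],
                   max_distance_nt: int = 15,
                   required_tier: int = 1) -> List[SpacerCandidate]:
    """Keep spacers within max_distance_nt of the edit that use a Tier 1 Cas."""
    return [s for s in spacers
            if abs(s.nick_to_edit_nt) <= max_distance_nt
            and s.tier == required_tier]


def pick_second_nick(candidates: List[SecondNickCandidate],
                     target_distance_nt: int = 100) -> Optional[SecondNickCandidate]:
    """Prefer PAM-in designs first, then the distance closest to the target."""
    if not candidates:
        return None  # the second-nick gRNA is optional
    return min(candidates,
               key=lambda c: (not c.pam_in,
                              abs(c.distance_nt - target_distance_nt)))
```

Ranking by a tuple keeps the two preferences separate: a PAM-in design always outranks a PAM-out design, and distance from the first nick only breaks ties within each group.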


Capital letters indicate “core nucleotides” while lower case letters indicate “flanking nucleotides.” Herein, when an RNA sequence (e.g., a template RNA sequence) is said to comprise a particular sequence (e.g., a sequence of Table 4 or a portion thereof) that comprises thymine (T), it is of course understood that the RNA sequence may (and frequently does) comprise uracil (U) in place of T. For instance, the RNA sequence may comprise U at every position shown as T in the sequence in Table 4. More specifically, the present disclosure provides an RNA sequence according to every template sequence shown in Table 4, wherein the RNA sequence has a U in place of each T in the sequence of Table 4.
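The T-to-U convention and the upper/lower case convention described above can be expressed as a small Python sketch. The function names `to_rna` and `core_nucleotides` are hypothetical helpers introduced only for illustration; the example strings in the usage note are short fragments as they appear in Table 4.

```python
# Map thymine to uracil while preserving case, so that core (upper case)
# and flanking (lower case) nucleotides remain distinguishable.
_T_TO_U = str.maketrans({"T": "U", "t": "u"})


def to_rna(dna_like: str) -> str:
    """Return the sequence with U in place of every T (case preserved)."""
    return dna_like.translate(_T_TO_U)


def core_nucleotides(seq: str) -> str:
    """Return only the upper case (core) nucleotides of a sequence."""
    return "".join(ch for ch in seq if ch.isupper())
```

For example, `to_rna("CATAATCcagg")` gives `"CAUAAUCcagg"`, and `core_nucleotides("CATAATCcagg")` gives `"CATAATC"`.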


In some embodiments, the systems and methods provided herein may comprise a template sequence listed in Table E3 or E3A. Tables E3 and E3A provide exemplary template RNA sequences designed to be paired with a gene modifying polypeptide to correct a mutation in the CFTR gene. The templates in Tables E3 and E3A are meant to exemplify the total sequence of: (1) gRNA spacer (e.g., for targeting for first strand nick), (2) gRNA scaffold, (3) heterologous object sequence, and (4) PBS sequence (e.g., for initiating TPRT at first strand nick).
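The four-part composition of a template RNA listed above can be sketched in Python as a simple 5' to 3' concatenation. This is an illustration only. The component strings are the unmodified bases of the first row of Table E3, and the boundaries drawn between spacer, scaffold, heterologous object sequence, and PBS are an assumption made for this sketch, not a statement from the table. The helper `strip_annotations` likewise assumes that every character other than an upper case A, C, G, or U in a Table E3 sequence is a modification or linkage annotation.

```python
# Component strings: assumed split of the first Table E3 row (see lead-in).
SPACER = "ACCAUUAAAGAAAAUAUCAU"
SCAFFOLD = ("GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCA"
            "ACUUGAAAAAGUGGCACCGAGUCGGUGC")
HETEROLOGOUS_OBJECT = "CCAAAGAUG"
PBS = "AUAUUUUCUUUAAUGG"

_RNA_BASES = set("ACGU")


def assemble_template(spacer: str, scaffold: str,
                      heterologous_object: str, pbs: str) -> str:
    """Concatenate (1) spacer, (2) scaffold, (3) heterologous object
    sequence, and (4) PBS, in that order, after a basic alphabet check."""
    parts = (spacer, scaffold, heterologous_object, pbs)
    for part in parts:
        if not part or set(part.upper()) - _RNA_BASES:
            raise ValueError(f"not a non-empty RNA sequence: {part!r}")
    return "".join(parts)


def strip_annotations(annotated: str) -> str:
    """Keep only upper case A/C/G/U; drop annotation characters
    (assumption about the Table E3 notation)."""
    return "".join(ch for ch in annotated if ch in _RNA_BASES)


template = assemble_template(SPACER, SCAFFOLD, HETEROLOGOUS_OBJECT, PBS)
```

Comparing `strip_annotations(row_sequence)` with the assembled `template` is a quick consistency check on a transcribed table row.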









TABLE E3

Exemplary Template RNAs and Sequences

RNACS
Name
Sequence
SEQ ID NO





RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
18992


2106
P16R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArA*mU*mG*m





G






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
18993


2107
P15R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrA*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
18994


2108
P14R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrU*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
18995


2109
P13R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUmU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
18996


2110
P12R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrUrUrUrCrU*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
18997


2111
P11R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrUrUrUrC*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
18998


2112
P10R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrUrUrU*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
18999


2113
P9R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrUrU*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19000


2114
P8R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrU*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19001


2115
P7R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArU*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19002


2116
P16R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArA*mU*m





G*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19003


2117
P15R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrA*mA*mU*





mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19004


2118
P14R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrU*mA*mA*m





U






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19005


2119
P13R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrU*mU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19006


2120
P12R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrU*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19007


2121
P11R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrUrUrUrC*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19008


2122
P10R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrUrUrU*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19009


2123
P9R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrUrU*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19010


2124
P8R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrU*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19011


2125
P7R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArU*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19012


2126
P16R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArA*m





U*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19013


2127
P15R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrA*mA*





mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19014


2128
P14R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrU*mA*m





A*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19015


2129
P13R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrU*mU*mA*





mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19016


2130
P12R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrU*mU*mU*m





A






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19017


2131
P11R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrC*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19018


2132
P10R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrUrUrU*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19019


2133
P9R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrUrU*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19020


2134
P8R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrU*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19021


2135
P7R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArU*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19022


2136
P16R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrAr





A*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19023


2137
P15R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrA





*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19024


2138
P14R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrU*m





A*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19025


2139
P13R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrU*mU*





mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19026


2140
P12R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrU*mU*m





U*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19027


2141
P11R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrC*mU*mU*





mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19028


2142
P10R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrU*mC*mU*m





U






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19029


2143
P9R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrU*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19030


2144
P8R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrU*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19031


2145
P7R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArU*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19032


2146
P16R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUr





UrArA*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19033


2147
P15R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUr





UrA*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19034


2148
P14R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUr





U*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19035


2149
P13R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrU





*mU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19036


2150
P12R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrU*m





U*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19037


2151
P11R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrC*mU*





mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19038


2152
P10R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrU*mC*m





U*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19039


2153
P9R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrU*mU*mC*





mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19040


2154
P8R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrU*mU*mU*m





C






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19041


2155
P7R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArU*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19042


2156
P16R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCr





UrUrUrArA*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19043


2157
P15R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCr





UrUrUrA*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19044


2158
P14R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCr





UrUrU*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19045


2159
P13R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCr





UrU*mU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19046


2160
P12R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCr





U*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19047


2161
P11R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrC





*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19048


2162
P10R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrU*m





C*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19049


2163
P9R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrU*mU





*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19050


2164
P8R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrU*mU*m





U*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19051


2165
P7R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArU*mU*mU*





mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19052


2166
P16R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUr





UrCrUrUrUrArA*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19053


2167
P15R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUr





UrCrUrUrUrA*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19054


2168
P14R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUr





UrCrUrUrU*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19055


2169
P13R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUr





UrCrUrU*mU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19056


2170
P12R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUr





UrCrU*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19057


2171
P11R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUr





UrC*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19058


2172
P10R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUr





U*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19059


2173
P9R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrU





*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19060


2174
P8R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrU*m





U*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19061


2175
P7R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArU*mU*





mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19062


2176
P16R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





UrUrUrCrUrUrUrArA*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19063


2177
P15R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





UrUrUrCrUrUrUrA*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19064


2178
P14R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





UrUrUrCrUrUrU*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19065


2179
P13R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





UrUrUrCrUrU*mU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19066


2180
P12R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





UrUrUrCrU*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19067


2181
P11R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





UrUrUrC*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19068


2182
P10R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





UrUrU*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19069


2183
P9R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





UrU*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19070


2184
P8R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





U*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19071


2185
P7R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArU





*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19072


2186
P16R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrUrUrUrCrUrUrUrArA*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19073


2187
P15R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrUrUrUrCrUrUrUrA*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19074


2188
P14R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrUrUrUrCrUrUrU*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19075


2189
P13R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrUrUrUrCrUrU*mU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19076


2190
P12R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrUrUrUrCrU*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19077


2191
P11R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrUrUrUrC*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19078


2192
P10R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrUrUrU*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19079


2193
P9R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrUrU*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19080


2194
P8R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrU*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19081


2195
P7R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArU*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19082


2196
P16R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArA*mU*mG*





mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19083


2197
P15R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrA*mA*mU*m





G






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19084


2198
P14R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrU*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19085


2199
P13R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrU*mU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19086


2200
P12R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrU*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19087


2201
P11R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrC*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19088


2202
P10R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrUrUrU*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19089


2203
P9R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrUrU*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19090


2204
P8R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrU*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19091


2205
P7R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArU*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19092


2206
P16R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArA*mU*





mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19093


2207
P15R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrA*mA*m





U*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19094


2208
P14R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrU*mA*mA*





mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19095


2209
P13R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrU*mU*mA*m





A






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19096


2210
P12R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrU*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19097


2211
P11R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrC*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19098


2212
P10R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrUrUrU*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19099


2213
P9R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrUrU*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19100


2214
P8R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrU*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19101


2215
P7R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArU*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19102


2216
P16R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArA





*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19103


2217
P15R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrA*m





A*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19104


2218
P14R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrU*mA*





mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19105


2219
P13R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrU*mU*m





A*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19106


2220
P12R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrU*mU*mU*





mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19107


2221
P11R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrC*mU*mU*m





U






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19108


2222
P10R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrU*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19109


2223
P9R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrU*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19110


2224
P8R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrU*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19111


2225
P7R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArU*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19112


2226
P16R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUr





ArA*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19113


2227
P15R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUr





A*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19114


2228
P14R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrU





*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19115


2229
P13R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrU*m





U*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19116


2230
P12R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrU*mU*





mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19117


2231
P11R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrC*mU*m





U*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19118


2232
P10R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrU*mC*mU*





mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19119


2233
P9R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrU*mU*mC*m





U






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19120


2234
P8R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrU*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19121


2235
P7R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArU*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19122


2236
P16R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUr





UrUrArA*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19123


2237
P15R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUr





UrUrA*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19124


2238
P14R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUr





UrU*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19125


2239
P13R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUr





U*mU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19126


2240
P12R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrU





*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19127


2241
P11R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrC*m





U*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19128


2242
P10R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrU*mC*





mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19129


2243
P9R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrU*mU*m





C*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19130


2244
P8R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrU*mU*mU*





mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19131


2245
P7R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArU*mU*mU*m





U






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19132


2246
P16R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUr





CrUrUrUrArA*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19133


2247
P15R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUr





CrUrUrUrA*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19134


2248
P14R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUr





CrUrUrU*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19135


2249
P13R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUr





CrUrU*mU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19136


2250
P12R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUr





CrU*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19137


2251
P11R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUr





C*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19138


2252
P10R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrU





*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19139


2253
P9R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrU*m





U*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19140


2254
P8R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrU*mU*





mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19141


2255
P7R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArU*mU*m





U*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19142


2256
P16R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUr





UrUrCrUrUrUrArA*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19143


2257
P15R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUr





UrUrCrUrUrUrA*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19144


2258
P14R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUr





UrUrCrUrUrU*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19145


2259
P13R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUr





UrUrCrUrU*mU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19146


2260
P12R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUr





UrUrCrU*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19147


2261
P11R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUr





UrUrC*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19148


2262
P10R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUr





UrU*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19149


2263
P9R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUr





U*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19150


2264
P8R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrU





*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19151


2265
P7R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArU*m





U*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19152


2266
P16R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrUrUrUrCrUrUrUrArA*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19153


2267
P15R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrUrUrUrCrUrUrUrA*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19154


2268
P14R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrUrUrUrCrUrUrU*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19155


2269
P13R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrUrUrUrCrUrU*mU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19156


2270
P12R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrUrUrUrCrU*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19157


2271
P11R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrUrUrUrC*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19158


2272
P10R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrUrUrU*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19159


2273
P9R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrUrU*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19160


2274
P8R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrU*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19161


2275
P7R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





U*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19162


2276
P16R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrUrUrUrCrUrUrUrArA*mU*mG*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19163


2277
P15R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrUrUrUrCrUrUrUrA*mA*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19164


2278
P14R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrUrUrUrCrUrUrU*mA*mA*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19165


2279
P13R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrUrUrUrCrUrU*mU*mA*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19166


2280
P12R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrUrUrUrCrU*mU*mU*mA






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19167


2281
P11R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrUrUrUrC*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19168


2282
P10R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrUrUrU*mC*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19169


2283
P9R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrUrU*mU*mC*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19170


2284
P8R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrU*mU*mU*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19171


2285
P7R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArU*mU*mU*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19172


3676
P17R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArU*mG*mG*





mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19173


3677
P18R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUrG*mG*m





U*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19174


3678
P19R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUrGrG*mU





*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19175


3679
P20R9_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUrGrGrU*m





G*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19176


3680
P17R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArU*mG*





mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19177


3681
P18R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUrG*m





G*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19178


3682
P19R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUrGrG





*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19179


3683
P20R11_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUrGrGr





U*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19180


3684
P17R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArU





*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19181


3685
P18R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUr





G*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19182


3686
P19R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUr





GrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19183


3687
P20R13_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUr





GrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19184


3688
P17R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrAr





ArU*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19185


3689
P18R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrAr





ArUrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19186


3690
P19R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrAr





ArUrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19187


3691
P20R15_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUrUrAr





ArUrGrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19188


3692
P17R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUr





UrArArU*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19189


3693
P18R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUr





UrArArUrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19190


3694
P19R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUr





UrArArUrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19191


3695
P20R17_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCrUrUr





UrArArUrGrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19192


3696
P17R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCr





UrUrUrArArU*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19193


3697
P18R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCr





UrUrUrArArUrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19194


3698
P19R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCr





UrUrUrArArUrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19195


3699
P20R19_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUrUrCr





UrUrUrArArUrGrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19196


3700
P17R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUr





UrCrUrUrUrArArU*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19197


3701
P18R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUr





UrCrUrUrUrArArUrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19198


3702
P19R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUr





UrCrUrUrUrArArUrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19199


3703
P20R21_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUrUrUr





UrCrUrUrUrArArUrGrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19200


3704
P17R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





UrUrUrCrUrUrUrArArU*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19201


3705
P18R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





UrUrUrCrUrUrUrArArUrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19202


3706
P19R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





UrUrUrCrUrUrUrArArUrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19203


3707
P20R23_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUrArUr





UrUrUrCrUrUrUrArArUrGrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19204


3708
P17R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrUrUrUrCrUrUrUrArArU*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19205


3709
P18R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrUrUrUrCrUrUrUrArArUrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19206


3710
P19R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrUrUrUrCrUrUrUrArArUrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19207


3711
P20R25_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTins
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrUrCrArUrCrArUrArGrGrArArArCrArCrCrArArArGrArUrGrArUr





ArUrUrUrUrCrUrUrUrArArUrGrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19208


3712
P17R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArU*mG*m





G*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19209


3713
P18R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUrG*mG





*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19210


3714
P19R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUrGrG*m





U*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19211


3715
P20R10_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUrGrGrU





*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19212


3716
P17R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArU*m





G*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19213


3717
P18R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUrG





*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19214


3718
P19R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUrGr





G*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19215


3719
P20R12_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArArUrGr





GrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19216


3720
P17R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArAr





U*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19217


3721
P18R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArAr





UrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19218


3722
P19R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArAr





UrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19219


3723
P20R14_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUrArAr





UrGrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19220


3724
P17R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUr





ArArU*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19221


3725
P18R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUr





ArArUrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19222


3726
P19R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUr





ArArUrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19223


3727
P20R16_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUrUrUr





ArArUrGrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19224


3728
P17R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUr





UrUrArArU*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19225


3729
P18R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUr





UrUrArArUrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19226


3730
P19R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUr





UrUrArArUrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19227


3731
P20R18_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUrCrUr





UrUrArArUrGrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19228


3732
P17R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUr





CrUrUrUrArArU*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19229


3733
P18R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUr





CrUrUrUrArArUrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19230


3734
P19R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUr





CrUrUrUrArArUrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19231


3735
P20R20_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUrUrUr





CrUrUrUrArArUrGrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19232


3736
P17R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUr





UrUrCrUrUrUrArArU*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19233


3737
P18R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUr





UrUrCrUrUrUrArArUrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19234


3738
P19R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUr





UrUrCrUrUrUrArArUrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19235


3739
P20R22_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrArUrUr





UrUrCrUrUrUrArArUrGrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19236


3740
P17R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrUrUrUrCrUrUrUrArArU*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19237


3741
P18R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrUrUrUrCrUrUrUrArArUrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19238


3742
P19R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrUrUrUrCrUrUrUrArArUrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19239


3743
P20R24_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrArUrAr





UrUrUrUrCrUrUrUrArArUrGrGrU*mG*mC*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19240


3744
P17R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrUrUrUrCrUrUrUrArArU*mG*mG*mU






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19241


3745
P18R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrUrUrUrCrUrUrUrArArUrG*mG*mU*mG






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19242


3746
P19R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrUrUrUrCrUrUrUrArArUrGrG*mU*mG*mC






RNACS
hCFTR2_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmC
19243


3747
P20R26_
rArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGm




CTTinsSub
GmCmAmCmCmGmAmGmUmCmGmGmUmGmCrArUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrArArGrArUrGrAr





UrArUrUrUrUrCrUrUrUrArArUrGrGrU*mG*mC*mC










Table E3 provides exemplary template RNAs comprising the hCFTR2 gRNA spacer (ACCAUUAAAGAAAAUAUCAU; SEQ ID NO: 19587) for use in combination with gene modifying systems described herein for correcting the pathogenic F508del mutation in the CFTR gene. The description of each exemplary template RNA (column 2) identifies the gRNA spacer, the length of the PBS, the length of the heterologous object sequence, the nature and size of the edit, and the location in the target nucleic acid sequence. For example, hCFTR2_P16R9_CTTins indicates that the template RNA comprises the hCFTR2 gRNA spacer, a PBS length of 16 nucleotides, and an RT region (heterologous object sequence) of 9 nucleotides, and is designed to produce a 3-nucleotide CTT insertion at the nick site to correct the F508del mutation (T to TCTT). In the sequences of Table E3, the gRNA spacer is the 5′-most sequence; immediately 3′ of the gRNA spacer is the gRNA scaffold, which comprises a sequence of GuuuuAGAGCuAGAAAuAGCAAGuuAAAAuAAGGCuAGuCCGuuAuCAACuuGAAAAAGuGGCACCGAGuCGGuGC (SEQ ID NO: 19588); immediately 3′ of the gRNA scaffold is the heterologous object sequence, which has the length indicated in the description of column 2; and immediately 3′ of the heterologous object sequence is the PBS, which has the length indicated in the description of column 2. Nucleotide modifications are noted as follows: phosphorothioate linkages are denoted by an asterisk, and 2′-O-methyl groups are denoted by an ‘m’ preceding a nucleotide.
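The 5′-to-3′ layout described above (gRNA spacer, gRNA scaffold, heterologous object sequence, PBS) can be illustrated with a short Python sketch. It is offered only as an illustration and is not part of the disclosed methods. The helper name split_template_rna and the regular expression used to read the PBS and RT lengths from the name are assumptions made for this example. The spacer and scaffold strings are taken from the paragraph above (with the scaffold written in upper case), and the example sequence is the hCFTR2_P16R9_CTTins entry of Table E3A.

```python
import re

# Spacer and scaffold as stated in the text (scaffold written in upper case).
SPACER = "ACCAUUAAAGAAAAUAUCAU"
SCAFFOLD = (
    "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU"
    "CAACUUGAAAAAGUGGCACCGAGUCGGUGC"
)


def split_template_rna(name, sequence):
    """Hypothetical helper: split an unmodified template RNA into its parts.

    The name is assumed to contain '_P<pbs>R<rt>_', e.g. 'hCFTR2_P16R9_CTTins'.
    """
    match = re.search(r"_P(\d+)R(\d+)_", name)
    if match is None:
        raise ValueError(f"no P<n>R<n> lengths found in {name!r}")
    pbs_len, rt_len = int(match.group(1)), int(match.group(2))

    prefix = SPACER + SCAFFOLD
    if not sequence.startswith(prefix):
        raise ValueError("sequence does not start with spacer + scaffold")

    # Everything 3' of the scaffold: heterologous object sequence, then PBS.
    extension = sequence[len(prefix):]
    if len(extension) != rt_len + pbs_len:
        raise ValueError("extension length does not match the name")

    return {
        "spacer": SPACER,
        "scaffold": SCAFFOLD,
        "heterologous_object_sequence": extension[:rt_len],
        "pbs": extension[rt_len:],
    }


example = split_template_rna(
    "hCFTR2_P16R9_CTTins",
    "ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU"
    "CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCUUUAAUGG",
)
print(example["heterologous_object_sequence"], example["pbs"])
```

The length check is deliberately strict so that a mismatch between a name and its sequence raises an error rather than producing a silently shifted split.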


Table E3A shows the sequences of Table E3 without chemical modifications. In some embodiments, the sequences of Table E3A may be used without chemical modifications, or with one or more chemical modifications.
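The relationship between the annotated sequences of Table E3 and the unmodified sequences of Table E3A can be sketched in Python as follows. This is an illustrative sketch only. The function name strip_modifications is hypothetical, and the treatment of the ‘r’ prefix as marking a nucleotide without a 2′-O-methyl group is an assumption, since the text defines only the asterisk and the ‘m’ prefix. The example input is the annotated hCFTR2 spacer as it appears in Table E3.

```python
import re

# One token: optional prefix ('m' = 2'-O-methyl; 'r' assumed to mean no
# 2'-O-methyl), one base, and an optional '*' marking a phosphorothioate
# linkage to the next nucleotide.
TOKEN = re.compile(r"([mr]?)([ACGU])(\*?)")


def strip_modifications(annotated):
    """Hypothetical helper: return (plain sequence, 2'-O-methyl count,
    phosphorothioate count) for an annotated sequence."""
    compact = re.sub(r"\s+", "", annotated)  # rejoin fragments split across lines
    tokens = TOKEN.findall(compact)
    # Reject input containing anything the token pattern does not cover.
    if "".join(prefix + base + star for prefix, base, star in tokens) != compact:
        raise ValueError("unrecognized modification notation")
    plain = "".join(base for _, base, _ in tokens)
    n_2ome = sum(1 for prefix, _, _ in tokens if prefix == "m")
    n_ps = sum(1 for _, _, star in tokens if star)
    return plain, n_2ome, n_ps


spacer_plain, spacer_2ome, spacer_ps = strip_modifications(
    "mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArU"
)
print(spacer_plain, spacer_2ome, spacer_ps)
```

Because whitespace is removed before tokenizing, the several fragments of one Table E3 entry can be concatenated and passed in as a single string.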









TABLE E3A

Sequences without Chemical Modifications

RNACS    Name    Sequence    SEQ ID NO





RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19291


2106
P16R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCUUUAAUGG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19292


2107
P15R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCUUUAAUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19293


2108
P14R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCUUUAAU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19294


2109
P13R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCUUUAA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19295


2110
P12R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCUUUA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19296


2111
P11R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19297


2112
P10R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19298


2113
P9R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19299


2114
P8R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19300


2115
P7R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19301


2116
P16R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUCUUUAAUGG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19302


2117
P15R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUCUUUAAUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19303


2118
P14R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUCUUUAAU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19304


2119
P13R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUCUUUAA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19305


2120
P12R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUCUUUA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19306


2121
P11R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUCUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19307


2122
P10R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUCUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19308


2123
P9R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUCU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19309


2124
P8R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19310


2125
P7R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19311


2126
P16R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUCUUUAAUGG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19312


2127
P15R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUCUUUAAUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19313


2128
P14R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUCUUUAAU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19314


2129
P13R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUCUUUAA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19315


2130
P12R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUCUUUA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19316


2131
P11R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUCUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19317


2132
P10R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUCUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19318


2133
P9R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUCU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19319


2134
P8R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19320


2135
P7R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19321


2136
P16R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUCUUUAAUGG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19322


2137
P15R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUCUUUAAUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19323


2138
P14R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUCUUUAAU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19324


2139
P13R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUCUUUAA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19325


2140
P12R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUCUUUA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19326


2141
P11R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUCUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19327


2142
P10R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUCUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19328


2143
P9R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUCU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19329


2144
P8R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19330


2145
P7R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19331


2146
P16R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19332


2147
P15R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUCUUUAAUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19333


2148
P14R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUCUUUAAU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19334


2149
P13R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUCUUUAA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19335


2150
P12R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUCUUUA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19336


2151
P11R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUCUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19337


2152
P10R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUCUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19338


2153
P9R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUCU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19339


2154
P8R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19340


2155
P7R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19341


2156
P16R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19342


2157
P15R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19343


2158
P14R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19344


2159
P13R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19345


2160
P12R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUCUUUA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19346


2161
P11R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUCUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19347


2162
P10R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUCUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19348


2163
P9R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUCU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19349


2164
P8R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19350


2165
P7R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19351


2166
P16R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19352


2167
P15R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19353


2168
P14R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19354


2169
P13R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19355


2170
P12R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19356


2171
P11R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUCUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19357


2172
P10R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUCUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19358


2173
P9R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUCU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19359


2174
P8R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19360


2175
P7R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19361


2176
P16R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAU





GG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19362


2177
P15R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAU





G






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19363


2178
P14R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19364


2179
P13R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19365


2180
P12R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19366


2181
P11R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19367


2182
P10R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19368


2183
P9R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUCU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19369


2184
P8R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19370


2185
P7R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19371


2186
P16R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUA





AUGG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19372


2187
P15R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUA





AUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19373


2188
P14R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUA





AU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19374


2189
P13R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUA





A






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19375


2190
P12R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19376


2191
P11R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19377


2192
P10R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19378


2193
P9R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUCU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19379


2194
P8R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19380


2195
P7R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19381


2196
P16R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUCUUUAAUGG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19382


2197
P15R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUCUUUAAUG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19383


2198
P14R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUCUUUAAU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19384


2199
P13R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUCUUUAA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19385


2200
P12R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUCUUUA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19386


2201
P11R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUCUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19387


2202
P10R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUCUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19388


2203
P9R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUCU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19389


2204
P8R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19390


2205
P7R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19391


2206
P16R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUCUUUAAUGG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19392


2207
P15R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUCUUUAAUG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19393


2208
P14R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUCUUUAAU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19394


2209
P13R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUCUUUAA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19395


2210
P12R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUCUUUA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19396


2211
P11R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUCUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19397


2212
P10R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUCUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19398


2213
P9R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUCU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19399


2214
P8R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19400


2215
P7R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19401


2216
P16R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUCUUUAAUGG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19402


2217
P15R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUCUUUAAUG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19403


2218
P14R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUCUUUAAU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19404


2219
P13R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUCUUUAA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19405


2220
P12R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUCUUUA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19406


2221
P11R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUCUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19407


2222
P10R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUCUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19408


2223
P9R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUCU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19409


2224
P8R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19410


2225
P7R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19411


2226
P16R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUCUUUAAUGG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19412


2227
P15R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUCUUUAAUG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19413


2228
P14R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUCUUUAAU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19414


2229
P13R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUCUUUAA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19415


2230
P12R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUCUUUA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19416


2231
P11R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUCUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19417


2232
P10R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUCUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19418


2233
P9R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUCU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19419


2234
P8R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19420


2235
P7R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19421


2236
P16R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUGG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19422


2237
P15R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19423


2238
P14R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUCUUUAAU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19424


2239
P13R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUCUUUAA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19425


2240
P12R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUCUUUA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19426


2241
P11R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19427


2242
P10R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUCUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19428


2243
P9R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUCU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19429


2244
P8R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19430


2245
P7R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19431


2246
P16R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUGG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19432


2247
P15R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19433


2248
P14R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19434


2249
P13R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19435


2250
P12R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUCUUUA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19436


2251
P11R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19437


2252
P10R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUCUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19438


2253
P9R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUCU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19439


2254
P8R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19440


2255
P7R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19441


2256
P16R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUG




Sub
G






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19442


2257
P15R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19443


2258
P14R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19444


2259
P13R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19445


2260
P12R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19446


2261
P11R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19447


2262
P10R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19448


2263
P9R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUCU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19449


2264
P8R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19450


2265
P7R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19451


2266
P16R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAA




Sub
UGG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19452


2267
P15R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAA




Sub
UG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19453


2268
P14R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAA




Sub
U






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19454


2269
P13R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19455


2270
P12R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUA




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19456


2271
P11R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19457


2272
P10R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19458


2273
P9R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19459


2274
P8R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19460


2275
P7R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19461


2276
P16R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub
AAUGG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19462


2277
P15R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub
AAUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19463


2278
P14R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub
AAU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19464


2279
P13R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub
AA






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19465


2280
P12R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub
A






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19466


2281
P11R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19467


2282
P10R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19468


2283
P9R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19469


2284
P8R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19470


2285
P7R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19471


3676
P17R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCUUUAAUGGU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19472


3677
P18R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCUUUAAUGGUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19473


3678
P19R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCUUUAAUGGUGC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19474


3679
P20R9_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCCAAAGAUGAUAUUUUCUUUAAUGGUGCC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19475


3680
P17R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUCUUUAAUGGU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19476


3681
P18R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUCUUUAAUGGUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19477


3682
P19R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUCUUUAAUGGUGC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19478


3683
P20R11_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCACCAAAGAUGAUAUUUUCUUUAAUGGUGCC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19479


3684
P17R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUCUUUAAUGGU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19480


3685
P18R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUCUUUAAUGGUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19481


3686
P19R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUCUUUAAUGGUGC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19482


3687
P20R13_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAACACCAAAGAUGAUAUUUUCUUUAAUGGUGCC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19483


3688
P17R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUCUUUAAUGGU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19484


3689
P18R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUCUUUAAUGGUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19485


3690
P19R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUCUUUAAUGGUGC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19486


3691
P20R15_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGAAACACCAAAGAUGAUAUUUUCUUUAAUGGUGCC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19487


3692
P17R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGGU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19488


3693
P18R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGGUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19489


3694
P19R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGGUGC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19490


3695
P20R17_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGGUGCC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19491


3696
P17R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGGU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19492


3697
P18R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGGUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19493


3698
P19R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGGUG





C






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19494


3699
P20R19_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGGUG





CC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19495


3700
P17R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGG





U






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19496


3701
P18R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGG





UG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19497


3702
P19R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGG





UGC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19498


3703
P20R21_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAUGG





UGCC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19499


3704
P17R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAU





GGU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19500


3705
P18R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAU





GGUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19501


3706
P19R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAU





GGUGC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19502


3707
P20R23_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUAAU





GGUGCC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19503


3708
P17R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUA





AUGGU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19504


3709
P18R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUA





AUGGUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19505


3710
P19R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUA





AUGGUGC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19506


3711
P20R25_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUUCAUCAUAGGAAACACCAAAGAUGAUAUUUUCUUUA





AUGGUGCC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19507


3712
P17R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUCUUUAAUGGU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19508


3713
P18R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUCUUUAAUGGUG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19509


3714
P19R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUCUUUAAUGGUGC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19510


3715
P20R10_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACCGAAGAUGAUAUUUUCUUUAAUGGUGCC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19511


3716
P17R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUCUUUAAUGGU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19512


3717
P18R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUCUUUAAUGGUG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19513


3718
P19R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUCUUUAAUGGUGC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19514


3719
P20R12_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCACACCGAAGAUGAUAUUUUCUUUAAUGGUGCC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19515


3720
P17R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUCUUUAAUGGU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19516


3721
P18R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUCUUUAAUGGUG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19517


3722
P19R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUCUUUAAUGGUGC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19518


3723
P20R14_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAAACACCGAAGAUGAUAUUUUCUUUAAUGGUGCC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19519


3724
P17R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUCUUUAAUGGU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19520


3725
P18R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUCUUUAAUGGUG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19521


3726
P19R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUCUUUAAUGGUGC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19522


3727
P20R16_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCGGAAACACCGAAGAUGAUAUUUUCUUUAAUGGUGCC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19523


3728
P17R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUGGU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19524


3729
P18R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUGGUG




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19525


3730
P19R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUGGUGC




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19526


3731
P20R18_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUGGUGC




Sub
C






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19527


3732
P17R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUGGU




Sub







RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19528


3733
P18R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUGGU




Sub
G






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19529


3734
P19R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUGGU




Sub
GC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19530


3735
P20R20_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUGGU




Sub
GCC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19531


3736
P17R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUG




Sub
GU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19532


3737
P18R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUG




Sub
GUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19533


3738
P19R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUG




Sub
GUGC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19534


3739
P20R22_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAAUG




Sub
GUGCC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19535


3740
P17R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAA




Sub
UGGU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19536


3741
P18R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAA




Sub
UGGUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19537


3742
P19R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAA




Sub
UGGUGC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19538


3743
P20R24_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUUAA




Sub
UGGUGCC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19539


3744
P17R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub
AAUGGU






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19540


3745
P18R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub
AAUGGUG






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19541


3746
P19R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub
AAUGGUGC






RNACS
hCFTR2_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAU
19542


3747
P20R26_CTTins
CAACUUGAAAAAGUGGCACCGAGUCGGUGCAUUCAUCAUAGGAAACACCGAAGAUGAUAUUUUCUUU




Sub
AAUGGUGCC









The sequences in Tables E3 and E3A provide suitable versions of the RT template sequence and PBS sequence for use in the template RNA. Other suitable versions may comprise shorter or longer sequences compared to the sequences shown in Tables E3 and E3A. In some embodiments, the RT template sequence can be 59 nt or less. In some embodiments, where the RT template sequence comprises fewer bases than shown in Tables E3 and E3A, bases are removed from the 5′ end of the RT template sequence. Consequently, in some embodiments, a template RNA described herein comprises an RT sequence of Table E3 or E3A or a portion thereof (e.g., a 3′ portion of the RT sequence). In some embodiments, the 3′ portion of the RT sequence of Table E3 or E3A has a length of about 30-35, 35-40, 40-45, 45-50, 50-55, or 50-59 nucleotides. In some embodiments, the PBS sequence can be varied, e.g., from 5 to 17 nucleotides. In some embodiments, where the PBS sequence comprises fewer bases than shown in Tables E3 and E3A, bases are sequentially removed from the 3′ end of the PBS sequence. Consequently, in some embodiments, a template RNA described herein comprises a PBS sequence of Table E3 or E3A or a portion thereof (e.g., a 5′ portion of the PBS sequence). In some embodiments, the 5′ portion of the PBS sequence of Table E3 or E3A has a length of 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, or 17 nucleotides.
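By way of non-limiting illustration, the truncation rules described above (removing bases from the 5′ end of the RT template sequence, and from the 3′ end of the PBS sequence) can be sketched in Python. The function names and the example strings below are hypothetical and are used only for illustration; sequences are assumed to be written 5′ to 3′, and the split of the example into an RT template and a PBS is an assumption rather than a statement about any particular table entry.

```python
def trim_rt_template(rt_template: str, length: int) -> str:
    """Keep the 3' portion of an RT template by removing bases from its 5' end."""
    if not 1 <= length <= len(rt_template):
        raise ValueError("length must be between 1 and the full RT template length")
    return rt_template[len(rt_template) - length:]


def trim_pbs(pbs: str, length: int) -> str:
    """Keep the 5' portion of a PBS by removing bases from its 3' end."""
    if not 1 <= length <= len(pbs):
        raise ValueError("length must be between 1 and the full PBS length")
    return pbs[:length]


# Hypothetical example inputs, written 5'->3'.
example_rt = "CACCAAAGAUG"
example_pbs = "AUAUUUUCUUUAAUGGUG"

print(trim_rt_template(example_rt, 8))  # CAAAGAUG
print(trim_pbs(example_pbs, 13))        # AUAUUUUCUUUAA
```

In this sketch the retained RT portion always ends at the original 3′ end and the retained PBS portion always begins at the original 5′ end, so the junction between the two elements is unchanged by either truncation.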


In some embodiments, the template RNA comprises a spacer of Table E3 or E3A. In some embodiments, the template RNA comprises a sequence having at least 16, 17, 18, or 19 nucleotides of a spacer of Table E3 or E3A. In some embodiments, the 16, 17, 18, or 19 nucleotides are from the 5′ end of the spacer of Table E3 or E3A. In some embodiments, the 16, 17, 18, or 19 nucleotides are from the 3′ end of the spacer of Table E3 or E3A.
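As a minimal sketch of the preceding paragraph, the following Python function returns a 16, 17, 18, or 19 nucleotide portion taken from either the 5′ end or the 3′ end of a spacer. The function name, its parameters, and the example spacer are hypothetical and chosen only for illustration.

```python
def spacer_portion(spacer: str, n: int, end: str = "5prime") -> str:
    """Return n nucleotides from the 5' or 3' end of a spacer written 5'->3'."""
    if end not in ("5prime", "3prime"):
        raise ValueError("end must be '5prime' or '3prime'")
    if not 1 <= n <= len(spacer):
        raise ValueError("n must be between 1 and the spacer length")
    return spacer[:n] if end == "5prime" else spacer[-n:]


# Hypothetical 20-nt spacer used only for illustration.
example_spacer = "ACCAUUAAAGAAAAUAUCAU"

print(spacer_portion(example_spacer, 17, "5prime"))  # ACCAUUAAAGAAAAUAU
print(spacer_portion(example_spacer, 17, "3prime"))  # AUUAAAGAAAAUAUCAU
```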


In some embodiments, the systems and methods provided herein may comprise a second-nick gRNA spacer sequence listed in Table G3 or G3A. Tables G3 and G3A provide exemplary second-nick gRNA spacer sequences designed to be paired with a template RNA sequence (e.g., as described in Table E3 or E3A) and a gene modifying polypeptide to correct a mutation in the CFTR gene. In some embodiments, the second-nick gRNA spacer comprises a sequence having at least 16, 17, 18, or 19 nucleotides of a spacer of Table G3 or G3A. In some embodiments, the 16, 17, 18, or 19 nucleotides are from the 5′ end of the spacer of Table G3 or G3A. In some embodiments, the 16, 17, 18, or 19 nucleotides are from the 3′ end of the spacer of Table G3 or G3A.
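For illustration only, a second-nick gRNA written as a single string can be separated into its spacer and its scaffold as sketched below. The sketch assumes that the gRNA ends with the scaffold sequence shared by the unmodified entries of Table G3A and that everything preceding the scaffold is the spacer; the constant and function names are hypothetical.

```python
# Scaffold assumed to follow the spacer in the unmodified second-nick gRNAs.
SCAFFOLD = (
    "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU"
)


def split_grna(grna: str, scaffold: str = SCAFFOLD) -> tuple[str, str]:
    """Split a gRNA (5'->3') into (spacer, scaffold); fail if the scaffold is absent."""
    if not grna.endswith(scaffold):
        raise ValueError("gRNA does not end with the expected scaffold")
    return grna[: len(grna) - len(scaffold)], scaffold


# Illustrative input built from a 20-nt spacer followed by the assumed scaffold.
example_grna = "AGUUUCUUACCUCUUCUAGU" + SCAFFOLD
spacer, _ = split_grna(example_grna)
print(spacer, len(spacer))  # AGUUUCUUACCUCUUCUAGU 20
```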









TABLE G3







Exemplary Second Nick gRNA Spacer Sequences













RNACS | Name | Sequence | SEQ ID NO
RNACS2286 | CF_F508_ngRNA_1 | mU*mA*mU*rArArUrUrUrGrGrGrUrArGrUrGrUrGrArArGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCmU*mU*mU*mU | 19567
RNACS2287 | CF_F508_ngRNA_2 | mA*mU*mA*rUrArArUrUrUrGrGrGrUrArGrUrGrUrGrArGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCmU*mU*mU*mU | 19568
RNACS2288 | CF_F508_ngRNA_3 | mA*mG*mU*rUrUrCrUrUrArCrCrUrCrUrUrCrUrArGrUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCmU*mU*mU*mU | 19569
RNACS8395 | CF_F508_ngRNA_4 | mU*mC*mA*rUrCrArUrUrArGrArArGrUrGrArArGrUrCrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCmU*mU*mU*mU | 19570
RNACS8396 | CF_F508_ngRNA_5 | mU*mU*mC*rArCrUrUrCrUrArArUrGrArUrGrArUrUrArGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCmU*mU*mU*mU | 19571
RNACS8397 | CF_F508_ngRNA_6 | mA*mA*mU*rGrGrUrGrCrCrArGrGrCrArUrArArUrCrCrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCmU*mU*mU*mU | 19572
RNACS8398 | CF_F508_ngRNA_7 | mA*mU*mA*rUrUrUrUrCrUrUrUrArArUrGrGrUrGrCrCrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCmU*mU*mU*mU | 19573
RNACS8399 | CF_F508_ngRNA_8_mut | mA*mA*mA*rGrArUrGrArUrArUrUrUrUrCrUrUrUrArArGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCmU*mU*mU*mU | 19574
RNACS8400 | CF_F508_ngRNA_8 | mA*mC*mC*rArArUrGrArUrArUrUrUrUrCrUrUrUrArArGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCmU*mU*mU*mU | 19575
RNACS1714 | CF_F508_ngRNA_9 | mU*mC*mU*rGrUrArUrCrUrArUrArUrUrCrArUrCrArUrGrUrUrUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmUmGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCmU*mU*mU*mU | 19576









Table G3 provides exemplary second-nick gRNA species that may optionally be used to correct the pathogenic F508del mutation in CFTR.


Table G3A shows the sequences of Table G3 without chemical modifications. In some embodiments, the sequences of Table G3A may be used without chemical modifications, or with one or more chemical modifications.
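The relationship between a chemically annotated sequence and its unmodified counterpart can be sketched as follows. The sketch assumes, as in common oligonucleotide notation and not as a definition given in this section, that each residue is written as a prefix (“m” for a 2′-O-methyl residue or “r” for an unmodified ribonucleotide), a base letter, and an optional “*” marking a phosphorothioate linkage. The function name is hypothetical.

```python
import re

# One residue: prefix (m or r), base (A, C, G, or U), optional '*' linkage marker.
RESIDUE = re.compile(r"[mr]([ACGU])\*?")


def strip_modifications(annotated: str) -> str:
    """Return the plain base sequence of an annotated oligonucleotide string."""
    bases = []
    position = 0
    for match in RESIDUE.finditer(annotated):
        if match.start() != position:
            raise ValueError(f"unrecognized notation at position {position}")
        bases.append(match.group(1))
        position = match.end()
    if position != len(annotated):
        raise ValueError(f"unrecognized notation at position {position}")
    return "".join(bases)


# Illustrative fragment in the assumed notation.
fragment = "mU*mA*mU*rArArUrUrUrGrGrGrUrArGrUrGrUrGrArA"
print(strip_modifications(fragment))  # UAUAAUUUGGGUAGUGUGAA
```

The strict position check is a deliberate choice in this sketch: any character that does not fit the assumed notation raises an error rather than being dropped silently.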









TABLE G3A







Exemplary Second Nick gRNA Spacer Sequences without Chemical Modifications













RNACS | Name | Sequence | SEQ ID NO
RNACS2286 | CF_F508_ngRNA_1 | UAUAAUUUGGGUAGUGUGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU | 19577
RNACS2287 | CF_F508_ngRNA_2 | AUAUAAUUUGGGUAGUGUGAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU | 19578
RNACS2288 | CF_F508_ngRNA_3 | AGUUUCUUACCUCUUCUAGUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU | 19579
RNACS8395 | CF_F508_ngRNA_4 | UCAUCAUUAGAAGUGAAGUCGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU | 19580
RNACS8396 | CF_F508_ngRNA_5 | UUCACUUCUAAUGAUGAUUAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU | 19581
RNACS8397 | CF_F508_ngRNA_6 | AAUGGUGCCAGGCAUAAUCCGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU | 19582
RNACS8398 | CF_F508_ngRNA_7 | AUAUUUUCUUUAAUGGUGCCGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU | 19583
RNACS8399 | CF_F508_ngRNA_8_mut | AAAGAUGAUAUUUUCUUUAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU | 19584
RNACS8400 | CF_F508_ngRNA_8 | ACCAAUGAUAUUUUCUUUAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU | 19585
RNACS1714 | CF_F508_ngRNA_9 | UCUGUAUCUAUAUUCAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU | 19586









The template RNA sequences shown in Tables 1-4, E3, and E3A may be customized depending on the cell being targeted. For example, in some embodiments it is desired to inactivate a PAM sequence upon editing (e.g., using a “PAM-kill” modification) to decrease the potential for further gene editing (e.g., by Cas retargeting) following the initial edit. Consequently, certain template RNAs described herein are designed to write a mutation (e.g., a substitution) into the PAM of the target site, such that upon editing, the PAM site will be mutated to a sequence no longer recognized by the gene modifying polypeptide. Thus, a mutation region within the heterologous object sequence of the template RNA may comprise a PAM-kill sequence. Without wishing to be bound by theory, in some embodiments, a PAM-kill sequence prevents re-engagement of the gene modifying polypeptide upon completion of a gene modification, or decreases re-engagement relative to a template RNA lacking a PAM-kill sequence. In some embodiments, a PAM-kill sequence does not alter the amino acid sequence encoded by a gene, e.g., the PAM-kill sequence results in a silent mutation. In other embodiments, it is desired to leave the PAM sequence intact (no PAM-kill).
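As a non-limiting sketch of the PAM-kill concept, the following Python code tests whether a PAM that is recognized before editing is no longer recognized after editing. The sketch assumes a PAM consensus of NGG (N being any nucleotide) purely for illustration; the PAM recognized in practice depends on the Cas domain used, and the function names are hypothetical.

```python
# Subset of IUPAC nucleotide codes sufficient for simple PAM patterns.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "R": "AG", "Y": "CT"}


def matches_pam(site: str, pam: str = "NGG") -> bool:
    """True if the DNA site matches the PAM pattern position by position."""
    site, pam = site.upper(), pam.upper()
    return len(site) == len(pam) and all(b in IUPAC[p] for b, p in zip(site, pam))


def is_pam_kill(pam_before: str, pam_after: str, pam: str = "NGG") -> bool:
    """True if the site matched the PAM before editing but not after editing."""
    return matches_pam(pam_before, pam) and not matches_pam(pam_after, pam)


print(is_pam_kill("TGG", "TGC"))  # True: TGC no longer matches NGG
print(is_pam_kill("TGG", "AGG"))  # False: AGG still matches NGG
```

A substitution at the N position leaves the assumed PAM intact, which is why the second call returns False; only changes at the constrained positions abolish the match in this simplified model.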


Similarly, in some embodiments, to decrease the potential for further gene editing (e.g., by Cas retargeting) following the initial edit, it may be desirable to alter the first three nucleotides of the RT template sequence via a “seed-kill” motif. Consequently, certain template RNAs described herein are designed to write a mutation (e.g., a substitution) into the portion of the target site corresponding to the first three nucleotides of the RT template sequence, such that upon editing, the target site will be mutated to a sequence with lower homology to the RT template sequence. Thus, a mutation region within the heterologous object sequence of the template RNA may comprise a seed-kill sequence. Without wishing to be bound by theory, in some embodiments, a seed-kill sequence prevents re-engagement of the gene modifying polypeptide upon completion of genetic modification, or decreases re-engagement relative to an otherwise similar template RNA lacking a seed-kill sequence. In some embodiments, a seed-kill sequence does not alter the amino acid sequence encoded by a gene, e.g., the seed-kill sequence results in a silent mutation. In other embodiments, it is desired to leave the seed region intact, and a seed-kill sequence is not used.


In further embodiments, to optimize or improve gene editing efficiency, it may be desirable to evade the target cell's mismatch repair or nucleotide repair pathways or to bias the target cell's repair pathways toward preservation of the edited strand. In some embodiments, multiple silent mutations (for example, silent substitutions) may be introduced within the RT template sequence to evade the target cell's mismatch repair or nucleotide repair pathways or to bias the target cell's repair pathways toward preservation of the edited strand.


In some embodiments, the template RNA comprises one or more silent mutations.


It should be understood that the silent mutations may be used individually or combined in any manner in a template RNA sequence described herein.


In some embodiments, the template RNA comprises a sequence having one or more silent substitutions.
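Whether a given substitution is silent can be checked by translating the coding sequence before and after the change, as in the minimal sketch below. The sketch uses the standard genetic code and assumes that both sequences are supplied in frame; the function names and the example codons are hypothetical and are shown only for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(coding_sequence: str) -> str:
    """Translate an in-frame DNA or RNA coding sequence into one-letter amino acids."""
    dna = coding_sequence.upper().replace("U", "T")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_silent(original: str, edited: str) -> bool:
    """True if the sequences differ but encode the same amino acid sequence."""
    same_nucleotides = original.upper().replace("U", "T") == edited.upper().replace("U", "T")
    return not same_nucleotides and translate(original) == translate(edited)


# Illustrative in-frame codons: TTT and TTC both encode phenylalanine.
print(is_silent("ATCATCTTTGGT", "ATCATCTTCGGT"))  # True
print(is_silent("ATCATCTTTGGT", "ATCATCTTAGGT"))  # False: TTA encodes leucine
```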


gRNAs with Inducible Activity


In some embodiments, a gRNA described herein (e.g., a gRNA that is part of a template RNA or a gRNA used for second strand nicking) has inducible activity. Inducible activity may be achieved by the template nucleic acid, e.g., template RNA, further comprising (in addition to the gRNA) a blocking domain, wherein the sequence of a portion of or all of the blocking domain is at least partially complementary to a portion or all of the gRNA. The blocking domain is thus capable of hybridizing or substantially hybridizing to a portion of or all of the gRNA. In some embodiments, the blocking domain and inducibly active gRNA are disposed on the template nucleic acid, e.g., template RNA, such that the gRNA can adopt a first conformation where the blocking domain is hybridized or substantially hybridized to the gRNA, and a second conformation where the blocking domain is not hybridized or not substantially hybridized to the gRNA. In some embodiments, in the first conformation the gRNA is unable to bind to the gene modifying polypeptide (e.g., the template nucleic acid binding domain, DNA binding domain, or endonuclease domain (e.g., a CRISPR/Cas protein)) or binds with substantially decreased affinity compared to an otherwise similar template RNA lacking the blocking domain. In some embodiments, in the second conformation the gRNA is able to bind to the gene modifying polypeptide (e.g., the template nucleic acid binding domain, DNA binding domain, or endonuclease domain (e.g., a CRISPR/Cas protein)). In some embodiments, whether the gRNA is in the first or second conformation can influence whether the DNA binding or endonuclease activities of the gene modifying polypeptide (e.g., of the CRISPR/Cas protein the gene modifying polypeptide comprises) are active.
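By way of illustration only, the degree to which a blocking domain is complementary to a segment of a gRNA can be estimated as sketched below. The sketch makes simplifying assumptions that are not part of the description above: the two strands are of equal length, they are aligned antiparallel without gaps, and only Watson-Crick pairs are counted. All names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def paired_fraction(grna_segment: str, blocking_domain: str) -> float:
    """Fraction of gRNA positions that pair with the blocking domain (antiparallel, ungapped)."""
    if len(grna_segment) != len(blocking_domain) or not grna_segment:
        raise ValueError("sequences must be non-empty and of equal length")
    antiparallel = blocking_domain.upper()[::-1]
    paired = sum(COMPLEMENT[a] == b for a, b in zip(grna_segment.upper(), antiparallel))
    return paired / len(grna_segment)


segment = "ACCAUUAAAG"                      # hypothetical gRNA segment
full_block = reverse_complement(segment)    # fully complementary blocking domain
partial_block = "G" + full_block[1:]        # one mismatched position

print(full_block)                               # CUUUAAUGGU
print(paired_fraction(segment, full_block))     # 1.0
print(paired_fraction(segment, partial_block))  # 0.9
```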


In some embodiments, the gRNA that coordinates the second nick has inducible activity. In some embodiments, the gRNA that coordinates the second nick is induced after the template is reverse transcribed. In some embodiments, hybridization of the gRNA to the blocking domain can be disrupted using an opener molecule. In some embodiments, an opener molecule comprises an agent that binds to a portion or all of the gRNA or blocking domain and inhibits hybridization of the gRNA to the blocking domain. In some embodiments, the opener molecule comprises a nucleic acid, e.g., comprising a sequence that is partially or wholly complementary to the gRNA, blocking domain, or both. An appropriately chosen or designed opener molecule, when provided, can promote a change in the conformation of the gRNA such that it can associate with a CRISPR/Cas protein and provide the associated functions of the CRISPR/Cas protein (e.g., DNA binding and/or endonuclease activity). Without wishing to be bound by theory, providing the opener molecule at a selected time and/or location may allow for spatial and temporal control of the activity of the gRNA, CRISPR/Cas protein, or gene modifying system comprising the same. In some embodiments, the opener molecule is exogenous to the cell comprising the gene modifying polypeptide and/or template nucleic acid. In some embodiments, the opener molecule comprises an endogenous agent (e.g., endogenous to the cell comprising the gene modifying polypeptide and/or template nucleic acid comprising the gRNA and blocking domain). For example, an inducible gRNA, blocking domain, and opener molecule may be chosen such that the opener molecule is an endogenous agent expressed in a target cell or tissue, e.g., thereby ensuring activity of a gene modifying system in the target cell or tissue.
As a further example, an inducible gRNA, blocking domain, and opener molecule may be chosen such that the opener molecule is absent or not substantially expressed in one or more non-target cells or tissues, e.g., thereby ensuring that activity of a gene modifying system does not occur or substantially occur in the one or more non-target cells or tissues, or occurs at a reduced level compared to a target cell or tissue. Exemplary blocking domains, opener molecules, and uses thereof are described in PCT App. Publication WO2020044039A1, which is incorporated herein by reference in its entirety. In some embodiments, the template nucleic acid, e.g., template RNA, may comprise one or more sequences or structures for binding by one or more components of a gene modifying polypeptide, e.g., by a reverse transcriptase or RNA binding domain, and a gRNA. In some embodiments, the gRNA facilitates interaction with the template nucleic acid binding domain (e.g., RNA binding domain) of the gene modifying polypeptide. In some embodiments, the gRNA directs the gene modifying polypeptide to the matching target sequence, e.g., in a target cell genome.


Circular RNAs and Ribozymes in Gene Modifying Systems


It is contemplated that it may be useful to employ circular and/or linear RNA states during the formulation, delivery, or gene modifying reaction within the target cell. Thus, in some embodiments of any of the aspects described herein, a gene modifying system comprises one or more circular RNAs (circRNAs). In some embodiments of any of the aspects described herein, a gene modifying system comprises one or more linear RNAs. In some embodiments, a nucleic acid as described herein (e.g., a template nucleic acid, a nucleic acid molecule encoding a gene modifying polypeptide, or both) is a circRNA. In some embodiments, a circular RNA molecule encodes the gene modifying polypeptide. In some embodiments, the circRNA molecule encoding the gene modifying polypeptide is delivered to a host cell. In some embodiments, a circular RNA molecule encodes a recombinase, e.g., as described herein. In some embodiments, the circRNA molecule encoding the recombinase is delivered to a host cell. In some embodiments, the circRNA molecule encoding the gene modifying polypeptide is linearized (e.g., in the host cell, e.g., in the nucleus of the host cell) prior to translation.


Circular RNAs (circRNAs) have been found to occur naturally in cells and have been found to have diverse functions, including both non-coding and protein coding roles in human cells. It has been shown that a circRNA can be engineered by incorporating a self-splicing intron into an RNA molecule (or DNA encoding the RNA molecule) that results in circularization of the RNA, and that an engineered circRNA can have enhanced protein production and stability (Wesselhoeft et al. Nature Communications 2018). In some embodiments, the gene modifying polypeptide is encoded as circRNA. In certain embodiments, the template nucleic acid is a DNA, such as a dsDNA or ssDNA. In certain embodiments, the circDNA comprises a template RNA.


In some embodiments, the circRNA comprises one or more ribozyme sequences. In some embodiments, the ribozyme sequence is activated for autocleavage, e.g., in a host cell, e.g., thereby resulting in linearization of the circRNA. In some embodiments, the ribozyme is activated when the concentration of magnesium reaches a sufficient level for cleavage, e.g., in a host cell. In some embodiments the circRNA is maintained in a low magnesium environment prior to delivery to the host cell. In some embodiments, the ribozyme is a protein-responsive ribozyme. In some embodiments, the ribozyme is a nucleic acid-responsive ribozyme. In some embodiments, the circRNA comprises a cleavage site. In some embodiments, the circRNA comprises a second cleavage site.


In some embodiments, the circRNA is linearized in the nucleus of a target cell. In some embodiments, linearization of a circRNA in the nucleus of a cell involves components present in the nucleus of the cell, e.g., to activate a cleavage event. In some embodiments, a ribozyme, e.g., a ribozyme from a B2 or ALU element, that is responsive to a nuclear element, e.g., a nuclear protein, e.g., a genome-interacting protein, e.g., an epigenetic modifier, e.g., EZH2, is incorporated into a circRNA, e.g., of a gene modifying system. In some embodiments, nuclear localization of the circRNA results in an increase in autocatalytic activity of the ribozyme and linearization of the circRNA.


In some embodiments, the ribozyme is heterologous to one or more of the other components of the gene modifying system. In some embodiments, an inducible ribozyme (e.g., in a circRNA as described herein) is created synthetically, for example, by utilizing a protein ligand-responsive aptamer design. A system for utilizing the satellite RNA of tobacco ringspot virus hammerhead ribozyme with an MS2 coat protein aptamer has been described (Kennedy et al. Nucleic Acids Res 42(19):12306-12321 (2014), incorporated herein by reference in its entirety) that results in activation of the ribozyme activity in the presence of the MS2 coat protein. In embodiments, such a system responds to protein ligand localized to the cytoplasm or the nucleus. In some embodiments the protein ligand is not MS2. Methods for generating RNA aptamers to target ligands have been described, for example, based on the systematic evolution of ligands by exponential enrichment (SELEX) (Tuerk and Gold, Science 249(4968):505-510 (1990); Ellington and Szostak, Nature 346(6287):818-822 (1990); the methods of each of which are incorporated herein by reference) and have, in some instances, been aided by in silico design (Bell et al. PNAS 117(15):8486-8493, the methods of which are incorporated herein by reference). Thus, in some embodiments, an aptamer for a target ligand is generated and incorporated into a synthetic ribozyme system, e.g., to trigger ribozyme-mediated cleavage and circRNA linearization, e.g., in the presence of the protein ligand. In some embodiments, circRNA linearization is triggered in the cytoplasm, e.g., using an aptamer that associates with a ligand in the cytoplasm. In some embodiments, circRNA linearization is triggered in the nucleus, e.g., using an aptamer that associates with a ligand in the nucleus. In embodiments, the ligand in the nucleus comprises an epigenetic modifier or a transcription factor. 
In some embodiments the ligand that triggers linearization is present at higher levels in on-target cells than off-target cells.


It is further contemplated that a nucleic acid-responsive ribozyme system can be employed for circRNA linearization. For example, biosensors that sense defined target nucleic acid molecules to trigger ribozyme activation are described, e.g., in Penchovsky (Biotechnology Advances 32(5):1015-1027 (2014), incorporated herein by reference). By these methods, a ribozyme naturally folds into an inactive state and is only activated in the presence of a defined target nucleic acid molecule (e.g., an RNA molecule). In some embodiments, a circRNA of a gene modifying system comprises a nucleic acid-responsive ribozyme that is activated in the presence of a defined target nucleic acid, e.g., an RNA, e.g., an mRNA, miRNA, guide RNA, gRNA, sgRNA, ncRNA, lncRNA, tRNA, snRNA, or mtRNA. In some embodiments the nucleic acid that triggers linearization is present at higher levels in on-target cells than off-target cells.


In some embodiments of any of the aspects herein, a gene modifying system incorporates one or more ribozymes with inducible specificity to a target tissue or target cell of interest, e.g., a ribozyme that is activated by a ligand or nucleic acid present at higher levels in a target tissue or target cell of interest. In some embodiments, the gene modifying system incorporates a ribozyme with inducible specificity to a subcellular compartment, e.g., the nucleus, nucleolus, cytoplasm, or mitochondria. In some embodiments, the ribozyme is activated by a ligand or nucleic acid present at higher levels in the target subcellular compartment. In some embodiments, an RNA component of a gene modifying system is provided as circRNA, e.g., that is activated by linearization. In some embodiments, linearization of a circRNA encoding a gene modifying polypeptide activates the molecule for translation. In some embodiments, a signal that activates a circRNA component of a gene modifying system is present at higher levels in on-target cells or tissues, e.g., such that the system is specifically activated in these cells.


In some embodiments, an RNA component of a gene modifying system is provided as a circRNA that is inactivated by linearization. In some embodiments, a circRNA encoding the gene modifying polypeptide is inactivated by cleavage and degradation. In some embodiments, a circRNA encoding the gene modifying polypeptide is inactivated by cleavage that separates a translation signal from the coding sequence of the polypeptide. In some embodiments, a signal that inactivates a circRNA component of a gene modifying system is present at higher levels in off-target cells or tissues, such that the system is specifically inactivated in these cells.


Target Nucleic Acid Site

In some embodiments, after gene modification, the target site surrounding the edited sequence contains a limited number of insertions or deletions, for example, in less than about 50% or 10% of editing events, e.g., as determined by long-read amplicon sequencing of the target site, e.g., as described in Karst et al. bioRxiv doi.org/10.1101/645903 (2020) (incorporated herein by reference in its entirety). In some embodiments, the target site does not show multiple consecutive editing events, e.g., head-to-tail or head-to-head duplications, e.g., as determined by long-read amplicon sequencing of the target site, e.g., as described in Karst et al. bioRxiv doi.org/10.1101/645903 (2020) (incorporated herein by reference in its entirety). In some embodiments, the target site contains an integrated sequence corresponding to the template RNA. In some embodiments, the target site does not contain insertions resulting from endogenous RNA in more than about 1% or 10% of events, e.g., as determined by long-read amplicon sequencing of the target site, e.g., as described in Karst et al. bioRxiv doi.org/10.1101/645903 (2020) (incorporated herein by reference in its entirety).
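By way of illustration only, the percentage criteria above can be checked with a short calculation once each editing event has been assigned an outcome label. The following Python sketch assumes a hypothetical upstream step (not described herein) that has already classified each long-read amplicon as "precise", "insertion", or "deletion"; the label names, function names, and default limit are assumptions made for this example.

```python
from collections import Counter


def indel_fraction(read_calls):
    """Fraction of editing events labeled as an insertion or a deletion.

    read_calls: iterable of outcome labels, one per editing event
    (hypothetical labels: "precise", "insertion", "deletion").
    """
    counts = Counter(read_calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no editing events supplied")
    return (counts["insertion"] + counts["deletion"]) / total


def meets_indel_limit(read_calls, max_fraction=0.10):
    """True if indels occur in less than max_fraction of editing events."""
    return indel_fraction(read_calls) < max_fraction


if __name__ == "__main__":
    calls = ["precise"] * 95 + ["insertion"] * 3 + ["deletion"] * 2
    print(indel_fraction(calls), meets_indel_limit(calls))
```

The same function can be called with max_fraction=0.50 to test the less stringent limit recited above.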


In certain aspects of the present invention, the host DNA-binding site into which the gene modifying system integrates can be in a gene, in an intron, in an exon, in an ORF, outside of a coding region of any gene, in a regulatory region of a gene, or outside of a regulatory region of a gene. In other aspects, the polypeptide may bind to one or more than one host DNA sequence.


In some embodiments, a gene modifying system is used to edit a target locus in multiple alleles. In some embodiments, a gene modifying system is designed to edit a specific allele. For example, a gene modifying polypeptide may be directed to a specific sequence that is only present on one allele, e.g., the system comprises a template RNA with homology (e.g., in a gRNA or annealing domain) to a target allele but not to a second cognate allele. In some embodiments, a gene modifying system can alter a haplotype-specific allele. In some embodiments, a gene modifying system that targets a specific allele preferentially targets that allele, e.g., has at least a 2, 4, 6, 8, or 10-fold preference for a target allele.
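As a non-limiting sketch, a fold preference for a target allele can be expressed as the ratio of the editing rate at the target allele to the editing rate at the second cognate allele. The Python example below assumes that per-allele editing rates have already been measured by some assay; the function names and the choice of a simple ratio are assumptions for illustration, not a method prescribed herein.

```python
def allele_fold_preference(target_rate, cognate_rate):
    """Ratio of editing rate at the target allele to that at the cognate allele.

    Rates are fractions between 0 and 1 (hypothetical measured values).
    """
    if target_rate < 0 or cognate_rate < 0:
        raise ValueError("editing rates must be non-negative")
    if cognate_rate == 0:
        # No detectable editing at the cognate allele: treat as unbounded preference.
        return float("inf") if target_rate > 0 else 0.0
    return target_rate / cognate_rate


def meets_fold_preference(target_rate, cognate_rate, min_fold=2.0):
    """True if the target allele is preferred by at least min_fold."""
    return allele_fold_preference(target_rate, cognate_rate) >= min_fold


if __name__ == "__main__":
    print(allele_fold_preference(0.5, 0.125))
```

For instance, an editing rate of 0.5 at the target allele and 0.125 at the cognate allele corresponds to a 4-fold preference under this definition.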


Second Strand Nicking

In some embodiments, a gene modifying system described herein comprises a nickase activity (e.g., in the gene modifying polypeptide) that nicks the first strand, and a nickase activity (e.g., in a polypeptide separate from the gene modifying polypeptide) that nicks the second strand of target DNA. As discussed herein, without wishing to be bound by theory, nicking of the first strand of the target site DNA is thought to provide a 3′ OH that can be used by an RT domain to reverse transcribe a sequence of a template RNA, e.g., a heterologous object sequence. Without wishing to be bound by theory, it is thought that introducing an additional nick to the second strand may bias the cellular DNA repair machinery to adopt the heterologous object sequence-based sequence more frequently than the original genomic sequence. In some embodiments, the additional nick to the second strand is made by the same endonuclease domain (e.g., nickase domain) as the nick to the first strand. In some embodiments, the same gene modifying polypeptide performs both the nick to the first strand and the nick to the second strand. In some embodiments, the gene modifying polypeptide comprises a CRISPR/Cas domain and the additional nick to the second strand is directed by an additional nucleic acid, e.g., comprising a second gRNA directing the CRISPR/Cas domain to nick the second strand. In other embodiments, the additional second strand nick is made by a different endonuclease domain (e.g., nickase domain) than the nick to the first strand. In some embodiments, that different endonuclease domain is situated in an additional polypeptide (e.g., a system of the invention further comprises the additional polypeptide), separate from the gene modifying polypeptide. In some embodiments, the additional polypeptide comprises an endonuclease domain (e.g., nickase domain) described herein. In some embodiments, the additional polypeptide comprises a DNA binding domain, e.g., described herein.


It is contemplated herein that the position at which the second strand nick occurs relative to the first strand nick may influence the extent to which one or more of: desired gene modifying DNA modifications are obtained, undesired double-strand breaks (DSBs) occur, undesired insertions occur, or undesired deletions occur. Without wishing to be bound by theory, second strand nicking may occur in two general orientations: inward nicks and outward nicks.


In some embodiments, in the inward nick orientation, the RT domain polymerizes (e.g., using the template RNA (e.g., the heterologous object sequence)) away from the second strand nick. In some embodiments, in the inward nick orientation, the location of the nick to the first strand and the location of the nick to the second strand are positioned between the first PAM site and second PAM site (e.g., in a scenario wherein both nicks are made by a polypeptide (e.g., a gene modifying polypeptide) comprising a CRISPR/Cas domain). In some embodiments, in the inward nick orientation, the location of the nick to the first strand and the location of the nick to the second strand are between the sites where the polypeptide and the additional polypeptide bind to the target DNA. In some embodiments, in the inward nick orientation, the location of the nick to the second strand is positioned on the same side of the binding sites of the polypeptide and additional polypeptide relative to the location of the nick to the first strand. In some embodiments, in the inward nick orientation, the location of the nick to the first strand and the location of the nick to the second strand are positioned between the PAM site and the site at a distance from the target site.


An example of a gene modifying system that provides an inward nick orientation comprises a gene modifying polypeptide comprising a CRISPR/Cas domain, a template RNA comprising a gRNA that directs nicking of the target site DNA on the first strand, and an additional nucleic acid comprising an additional gRNA that directs nicking at a site a distance from the location of the first nick, wherein the location of the first nick and the location of the second nick are between the PAM sites of the sites to which the two gRNAs direct the gene modifying polypeptide. As a further example, another gene modifying system that provides an inward nick orientation comprises a gene modifying polypeptide comprising a zinc finger molecule and a first nickase domain wherein the zinc finger molecule binds to the target DNA in a manner that directs the first nickase domain to nick the first strand of the target site; an additional polypeptide comprising a CRISPR/Cas domain, and an additional nucleic acid comprising a gRNA that directs the additional polypeptide to nick a site a distance from the target site DNA on the second strand, wherein the location of the first nick and the location of the second nick are between the PAM site and the site to which the zinc finger molecule binds. 
As a further example, another gene modifying system that provides an inward nick orientation comprises a gene modifying polypeptide comprising a zinc finger molecule and a first nickase domain wherein the zinc finger molecule binds to the target DNA in a manner that directs the first nickase domain to nick the first strand of the target site; an additional polypeptide comprising a TAL effector molecule and a second nickase domain wherein the TAL effector molecule binds to a site a distance from the target site in a manner that directs the additional polypeptide to nick the second strand, wherein the location of the first nick and the location of the second nick are between the site to which the TAL effector molecule binds and the site to which the zinc finger molecule binds.


In some embodiments, in the outward nick orientation, the RT domain polymerizes (e.g., using the template RNA (e.g., the heterologous object sequence)) toward the second strand nick. In some embodiments, in the outward nick orientation when both the first and second nicks are made by a polypeptide comprising a CRISPR/Cas domain (e.g., a gene modifying polypeptide), the first PAM site and second PAM site are positioned between the location of the nick to the first strand and the location of the nick to the second strand. In some embodiments, in the outward nick orientation, the polypeptide (e.g., the gene modifying polypeptide) and the additional polypeptide bind to sites on the target DNA between the location of the nick to the first strand and the location of the nick to the second strand. In some embodiments, in the outward nick orientation, the location of the nick to the second strand is positioned on the opposite side of the binding sites of the polypeptide and additional polypeptide relative to the location of the nick to the first strand. In some embodiments, in the outward nick orientation, the PAM site and the site at a distance from the target site are positioned between the location of the nick to the first strand and the location of the nick to the second strand.


An example of a gene modifying system that provides an outward nick orientation comprises a gene modifying polypeptide comprising a CRISPR/Cas domain, a template RNA comprising a gRNA that directs nicking of the target site DNA on the first strand, and an additional nucleic acid comprising an additional gRNA that directs nicking at a site a distance from the location of the first nick, wherein the location of the first nick and the location of the second nick are outside of the PAM sites of the sites to which the two gRNAs direct the gene modifying polypeptide (i.e., the PAM sites are between the location of the first nick and the location of the second nick). As a further example, another gene modifying system that provides an outward nick orientation comprises a gene modifying polypeptide comprising a zinc finger molecule and a first nickase domain wherein the zinc finger molecule binds to the target DNA in a manner that directs the first nickase domain to nick the first strand of the target site; an additional polypeptide comprising a CRISPR/Cas domain, and an additional nucleic acid comprising a gRNA that directs the additional polypeptide to nick a site a distance from the target site DNA on the second strand, wherein the location of the first nick and the location of the second nick are outside the PAM site and the site to which the zinc finger molecule binds (i.e., the PAM site and the site to which the zinc finger molecule binds are between the location of the first nick and the location of the second nick). 
As a further example, another gene modifying system that provides an outward nick orientation comprises a gene modifying polypeptide comprising a zinc finger molecule and a first nickase domain wherein the zinc finger molecule binds to the target DNA in a manner that directs the first nickase domain to nick the first strand of the target site; an additional polypeptide comprising a TAL effector molecule and a second nickase domain wherein the TAL effector molecule binds to a site a distance from the target site in a manner that directs the additional polypeptide to nick the second strand, wherein the location of the first nick and the location of the second nick are outside the site to which the TAL effector molecule binds and the site to which the zinc finger molecule binds (i.e., the site to which the TAL effector molecule binds and the site to which the zinc finger molecule binds are between the location of the first nick and the location of the second nick).
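The positional definitions of the inward and outward nick orientations given above can be summarized in a minimal Python sketch. The sketch assumes that each nick and each binding or PAM site is reduced to a single integer coordinate on a common reference strand, which is a simplification; the function name and the "undetermined" label for mixed arrangements are hypothetical and introduced only for this example.

```python
def classify_nick_orientation(first_nick, second_nick, first_site, second_site):
    """Classify a pair of nicks relative to two binding (or PAM) sites.

    "inward":  both nicks lie between the two sites.
    "outward": both sites lie between the two nicks.
    Any other arrangement is reported as "undetermined".
    All arguments are integer coordinates on the same reference strand.
    """
    nick_lo, nick_hi = sorted((first_nick, second_nick))
    site_lo, site_hi = sorted((first_site, second_site))
    if site_lo < nick_lo and nick_hi < site_hi:
        return "inward"
    if nick_lo < site_lo and site_hi < nick_hi:
        return "outward"
    return "undetermined"


if __name__ == "__main__":
    # Sites at 90 and 160 flank nicks at 100 and 150.
    print(classify_nick_orientation(100, 150, 90, 160))
    # Nicks at 100 and 150 flank sites at 110 and 140.
    print(classify_nick_orientation(100, 150, 110, 140))
```

The sites may be two PAM sites, a PAM site and a zinc finger binding site, or a TAL effector binding site and a zinc finger binding site, consistent with the examples above.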


Without wishing to be bound by theory, it is thought that, for gene modifying systems where a second strand nick is provided, an outward nick orientation is preferred in some embodiments. As is described herein, an inward nick orientation may produce a higher number of double-strand breaks (DSBs) than an outward nick orientation. DSBs may be recognized by the DSB repair pathways in the nucleus of a cell, which can result in undesired insertions and deletions. An outward nick orientation may provide a decreased risk of DSB formation, and a corresponding lower amount of undesired insertions and deletions. In some embodiments, undesired insertions and deletions are insertions and deletions not encoded by the heterologous object sequence, e.g., an insertion or deletion produced by the double-strand break repair pathway unrelated to the modification encoded by the heterologous object sequence. In some embodiments, a desired gene modification comprises a change to the target DNA (e.g., a substitution, insertion, or deletion) encoded by the heterologous object sequence (e.g., and achieved by the gene modifying system writing the heterologous object sequence into the target site). In some embodiments, the first strand nick and the second strand nick are in an outward orientation.


In addition, the distance between the first strand nick and second strand nick may influence the extent to which one or more of: desired gene modifying system DNA modifications are obtained, undesired double-strand breaks (DSBs) occur, undesired insertions occur, or undesired deletions occur. Without wishing to be bound by theory, it is thought the second strand nick benefit, the biasing of DNA repair toward incorporation of the heterologous object sequence into the target DNA, increases as the distance between the first strand nick and second strand nick decreases. However, it is thought that the risk of DSB formation also increases as the distance between the first strand nick and second strand nick decreases. Correspondingly, it is thought that the number of undesired insertions and/or deletions may increase as the distance between the first strand nick and second strand nick decreases. In some embodiments, the distance between the first strand nick and second strand nick is chosen to balance the benefit of biasing DNA repair toward incorporation of the heterologous object sequence into the target DNA and the risk of DSB formation and of undesired deletions and/or insertions. In some embodiments, a system where the first strand nick and the second strand nick are at least a threshold distance apart has an increased level of desired gene modifying system modification outcomes, a decreased level of undesired deletions, and/or a decreased level of undesired insertions relative to an otherwise similar inward nick orientation system where the first nick and the second nick are less than the threshold distance apart. In some embodiments the threshold distance(s) is given below.


In some embodiments, the first nick and the second nick are at least 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides apart. In some embodiments, the first nick and the second nick are no more than 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, or 250 nucleotides apart. In some embodiments, the first nick and the second nick are 20-200, 30-200, 40-200, 50-200, 60-200, 70-200, 80-200, 90-200, 100-200, 110-200, 120-200, 130-200, 140-200, 150-200, 160-200, 170-200, 180-200, 190-200, 20-190, 30-190, 40-190, 50-190, 60-190, 70-190, 80-190, 90-190, 100-190, 110-190, 120-190, 130-190, 140-190, 150-190, 160-190, 170-190, 180-190, 20-180, 30-180, 40-180, 50-180, 60-180, 70-180, 80-180, 90-180, 100-180, 110-180, 120-180, 130-180, 140-180, 150-180, 160-180, 170-180, 20-170, 30-170, 40-170, 50-170, 60-170, 70-170, 80-170, 90-170, 100-170, 110-170, 120-170, 130-170, 140-170, 150-170, 160-170, 20-160, 30-160, 40-160, 50-160, 60-160, 70-160, 80-160, 90-160, 100-160, 110-160, 120-160, 130-160, 140-160, 150-160, 20-150, 30-150, 40-150, 50-150, 60-150, 70-150, 80-150, 90-150, 100-150, 110-150, 120-150, 130-150, 140-150, 20-140, 30-140, 40-140, 50-140, 60-140, 70-140, 80-140, 90-140, 100-140, 110-140, 120-140, 130-140, 20-130, 30-130, 40-130, 50-130, 60-130, 70-130, 80-130, 90-130, 100-130, 110-130, 120-130, 20-120, 30-120, 40-120, 50-120, 60-120, 70-120, 80-120, 90-120, 100-120, 110-120, 20-110, 30-110, 40-110, 50-110, 60-110, 70-110, 80-110, 90-110, 100-110, 20-100, 30-100, 40-100, 50-100, 60-100, 70-100, 80-100, 90-100, 20-90, 30-90, 40-90, 50-90, 60-90, 70-90, 80-90, 20-80, 30-80, 40-80, 50-80, 60-80, 70-80, 20-70, 30-70, 40-70, 50-70, 60-70, 20-60, 30-60, 40-60, 50-60, 20-50, 30-50, 40-50, 20-40, 30-40, or 20-30 nucleotides apart. In some embodiments, the first nick and the second nick are 40-100 nucleotides apart.
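For illustration, the enumerated ranges above correspond to every pair of lower and upper bounds drawn from 20, 30, ..., 200 nucleotides. The following Python sketch regenerates those ranges and reports which of them contain a given nick-to-nick distance. It assumes that both bounds of each range are inclusive and that nick positions are integer coordinates on the same reference strand; the function and constant names are hypothetical.

```python
from itertools import combinations

# Range endpoints recited above: 20, 30, ..., 200 nucleotides.
ENDPOINTS = list(range(20, 201, 10))


def enumerated_ranges(endpoints=ENDPOINTS):
    """All (lower, upper) pairs with lower < upper drawn from the endpoints."""
    return list(combinations(endpoints, 2))


def nick_distance(first_nick, second_nick):
    """Distance in nucleotides between two nick coordinates."""
    return abs(first_nick - second_nick)


def ranges_containing(distance, endpoints=ENDPOINTS):
    """Enumerated ranges that contain the distance (bounds assumed inclusive)."""
    return [(lo, hi) for lo, hi in enumerated_ranges(endpoints) if lo <= distance <= hi]


if __name__ == "__main__":
    d = nick_distance(1000, 1060)
    print(d, (40, 100) in ranges_containing(d))
```

A distance of 60 nucleotides, for example, falls within the 40-100 nucleotide range recited in the final sentence above.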


Without wishing to be bound by theory, it is thought that, for gene modifying systems where a second strand nick is provided and an inward nick orientation is selected, increasing the distance between the first strand nick and second strand nick may be preferred. As is described herein, an inward nick orientation may produce a higher number of DSBs than an outward nick orientation and may result in a higher amount of undesired insertions and deletions than an outward nick orientation, but increasing the distance between the nicks may mitigate that increase in DSBs, undesired deletions, and/or undesired insertions. In some embodiments, an inward nick orientation wherein the first nick and the second nick are at least a threshold distance apart has an increased level of desired gene modifying system modification outcomes, a decreased level of undesired deletions, and/or a decreased level of undesired insertions relative to an otherwise similar inward nick orientation system where the first nick and the second nick are less than the threshold distance apart. In some embodiments the threshold distance is given below.


In some embodiments, the first strand nick and the second strand nick are in an inward orientation. In some embodiments, the first strand nick and the second strand nick are in an inward orientation and the first strand nick and second strand nick are at least 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 220, 240, 260, 280, 300, 350, 400, 450, or 500 nucleotides apart, e.g., at least 100 nucleotides apart, (and optionally no more than 500, 400, 300, 200, 190, 180, 170, 160, 150, 140, 130, or 120 nucleotides apart). In some embodiments, the first strand nick and the second strand nick are in an inward orientation and the first strand nick and second strand nick are 100-200, 110-200, 120-200, 130-200, 140-200, 150-200, 160-200, 170-200, 180-200, 190-200, 100-190, 110-190, 120-190, 130-190, 140-190, 150-190, 160-190, 170-190, 180-190, 100-180, 110-180, 120-180, 130-180, 140-180, 150-180, 160-180, 170-180, 100-170, 110-170, 120-170, 130-170, 140-170, 150-170, 160-170, 100-160, 110-160, 120-160, 130-160, 140-160, 150-160, 100-150, 110-150, 120-150, 130-150, 140-150, 100-140, 110-140, 120-140, 130-140, 100-130, 110-130, 120-130, 100-120, 110-120, or 100-110 nucleotides apart.


Chemically Modified Nucleic Acids and Nucleic Acid End Features

A nucleic acid described herein (e.g., a template nucleic acid, e.g., a template RNA; or a nucleic acid (e.g., mRNA) encoding a gene modifying polypeptide; or a gRNA) can comprise unmodified or modified nucleobases. Naturally occurring RNAs are synthesized from four basic ribonucleotides: ATP, CTP, UTP and GTP, but may contain post-transcriptionally modified nucleotides. Further, approximately one hundred different nucleoside modifications have been identified in RNA (Rozenski, J, Crain, P, and McCloskey, J. (1999). The RNA Modification Database: 1999 update. Nucl Acids Res 27: 196-197). An RNA can also comprise wholly synthetic nucleotides that do not occur in nature.


In some embodiments, the chemical modification is one provided in WO/2016/183482, US Pat. Pub. No. 20090286852, or International Application No. WO/2012/019168, WO/2012/045075, WO/2012/135805, WO/2012/158736, WO/2013/039857, WO/2013/039861, WO/2013/052523, WO/2013/090648, WO/2013/096709, WO/2013/101690, WO/2013/106496, WO/2013/130161, WO/2013/151669, WO/2013/151736, WO/2013/151672, WO/2013/151664, WO/2013/151665, WO/2013/151668, WO/2013/151671, WO/2013/151667, WO/2013/151670, WO/2013/151666, WO/2013/151663, WO/2014/028429, WO/2014/081507, WO/2014/093924, WO/2014/093574, WO/2014/113089, WO/2014/144711, WO/2014/144767, WO/2014/144039, WO/2014/152540, WO/2014/152030, WO/2014/152031, WO/2014/152027, WO/2014/152211, WO/2014/158795, WO/2014/159813, WO/2014/164253, WO/2015/006747, WO/2015/034928, WO/2015/034925, WO/2015/038892, WO/2015/048744, WO/2015/051214, WO/2015/051173, WO/2015/051169, WO/2015/058069, WO/2015/085318, WO/2015/089511, WO/2015/105926, WO/2015/164674, WO/2015/196130, WO/2015/196128, WO/2015/196118, WO/2016/011226, WO/2016/011222, WO/2016/011306, WO/2016/014846, WO/2016/022914, WO/2016/036902, WO/2016/077125, or WO/2016/077123, each of which is herein incorporated by reference in its entirety. It is understood that incorporation of a chemically modified nucleotide into a polynucleotide can result in the modification being incorporated into a nucleobase, the backbone, or both, depending on the location of the modification in the nucleotide. In some embodiments, the backbone modification is one provided in EP 2813570, which is herein incorporated by reference in its entirety. In some embodiments, the modified cap is one provided in US Pat. Pub. No. 20050287539, which is herein incorporated by reference in its entirety.


In some embodiments, the chemically modified nucleic acid (e.g., RNA, e.g., mRNA) comprises one or more of ARCA: anti-reverse cap analog (m27,3′-OGP3G), GP3G (Unmethylated Cap Analog), m7GP3G (Monomethylated Cap Analog), m32,2,7GP3G (Trimethylated Cap Analog), m5CTP (5-methyl-cytidine triphosphate), m6ATP (N6-methyl-adenosine-5′-triphosphate), s2UTP (2-thio-uridine triphosphate), and Ψ (pseudouridine triphosphate).


In some embodiments, the chemically modified nucleic acid comprises a 5′ cap, e.g.: a 7-methylguanosine cap (e.g., an O-Me-m7G cap); a hypermethylated cap analog; an NAD+-derived cap analog (e.g., as described in Kiledjian, Trends in Cell Biology 28, 454-464 (2018)); or a modified, e.g., biotinylated, cap analog (e.g., as described in Bednarek et al., Phil Trans R Soc B 373, 20180167 (2018)).


In some embodiments, the chemically modified nucleic acid comprises a 3′ feature selected from one or more of: a polyA tail; a 16-nucleotide long stem-loop structure flanked by 5 unpaired nucleotides (e.g., as described by Mannironi et al., Nucleic Acids Research 17, 9113-9126 (1989)); a triple-helical structure (e.g., as described by Brown et al., PNAS 109, 19202-19207 (2012)); a tRNA, Y RNA, or vault RNA structure (e.g., as described by Labno et al., Biochimica et Biophysica Acta 1863, 3125-3147 (2016)); incorporation of one or more deoxyribonucleotide triphosphates (dNTPs), 2′O-Methylated NTPs, or phosphorothioate-NTPs; a single nucleotide chemical modification (e.g., oxidation of the 3′ terminal ribose to a reactive aldehyde followed by conjugation of the aldehyde-reactive modified nucleotide); or chemical ligation to another nucleic acid molecule.


In some embodiments, the nucleic acid (e.g., template nucleic acid) comprises one or more modified nucleotides, e.g., selected from dihydrouridine, inosine, 7-methylguanosine, 5-methylcytidine (5mC), 5′ Phosphate ribothymidine, 2′-O-methyl ribothymidine, 2′-O-ethyl ribothymidine, 2′-fluoro ribothymidine, C-5 propynyl-deoxycytidine (pdC), C-5 propynyl-deoxyuridine (pdU), C-5 propynyl-cytidine (pC), C-5 propynyl-uridine (pU), 5-methyl cytidine, 5-methyl uridine, 5-methyl deoxycytidine, 5-methyl deoxyuridine methoxy, 2,6-diaminopurine, 5′-Dimethoxytrityl-N4-ethyl-2′-deoxycytidine, C-5 propynyl-f-cytidine (pfC), C-5 propynyl-f-uridine (pfU), 5-methyl f-cytidine, 5-methyl f-uridine, C-5 propynyl-m-cytidine (pmC), C-5 propynyl-m-uridine (pmU), 5-methyl m-cytidine, 5-methyl m-uridine, LNA (locked nucleic acid), MGB (minor groove binder), pseudouridine (Ψ), 1-N-methylpseudouridine (1-Me-Ψ), or 5-methoxyuridine (5-MO-U).


In some embodiments, the nucleic acid comprises a backbone modification, e.g., a modification to a sugar or phosphate group in the backbone. In some embodiments, the nucleic acid comprises a nucleobase modification.


In some embodiments, the nucleic acid comprises one or more chemically modified nucleotides of Table 13, one or more chemical backbone modifications of Table 14, or one or more chemically modified caps of Table 15. For instance, in some embodiments, the nucleic acid comprises two or more (e.g., 3, 4, 5, 6, 7, 8, 9, or 10 or more) different types of chemical modifications. As an example, the nucleic acid may comprise two or more (e.g., 3, 4, 5, 6, 7, 8, 9, or 10 or more) different types of modified nucleobases, e.g., as described herein, e.g., in Table 13. Alternatively or in combination, the nucleic acid may comprise two or more (e.g., 3, 4, 5, 6, 7, 8, 9, or 10 or more) different types of backbone modifications, e.g., as described herein, e.g., in Table 14. Alternatively or in combination, the nucleic acid may comprise one or more modified caps, e.g., as described herein, e.g., in Table 15. For instance, in some embodiments, the nucleic acid comprises one or more types of modified nucleobase and one or more types of backbone modification; one or more types of modified nucleobase and one or more modified caps; one or more types of modified cap and one or more types of backbone modification; or one or more types of modified nucleobase, one or more types of backbone modification, and one or more types of modified cap.


In some embodiments, the nucleic acid comprises one or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, or more) modified nucleobases. In some embodiments, all nucleobases of the nucleic acid are modified. In some embodiments, the nucleic acid is modified at one or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, or more) positions in the backbone. In some embodiments, all backbone positions of the nucleic acid are modified.









TABLE 13





Modified nucleotides
















5-aza-uridine
N2-methyl-6-thio-guanosine


2-thio-5-aza-midine
N2,N2-dimethyl-6-thio-guanosine


2-thiouridine
pyridin-4-one ribonucleoside


4-thio-pseudouridine
2-thio-5-aza-uridine


2-thio-pseudouridine
2-thiomidine


5-hydroxyuridine
4-thio-pseudomidine


3-methyluridine
2-thio-pseudowidine


5-carboxymethyl-uridine
3-methylmidine


1-carboxymethyl-pseudouridine
1-propynyl-pseudomidine


5-propynyl-uridine
1-methyl-1-deaza-pseudomidine


1-propynyl-pseudouridine
2-thio-1-methyl-1-deaza-pseudouridine


5-taurinomethyluridine
4-methoxy-pseudomidine


1-taurinomethyl-pseudouridine
5′-O-(1-Thiophosphate)-Adenosine


5-taurinomethyl-2-thio-uridine
5′-O-(1-Thiophosphate)-Cytidine


1-taurinomethyl-4-thio-uridine
5′-O-(1-thiophosphate)-Guanosine


5-methyl-uridine
5′-O-(1-Thiophophate)-Uridine


1-methyl-pseudouridine
5′-O-(1-Thiophosphate)-Pseudouridine


4-thio-1-methyl-pseudouridine
2′-O-methyl-Adenosine


2-thio-1-methyl-pseudouridine
2′-O-methyl-Cytidine


1-methyl-1-deaza-pseudouridine
2′-O-methyl-Guanosine


2-thio-1-methyl-1-deaza-pseudomidine
2′-O-methyl-Uridine


dihydrouridine
2′-O-methyl-Pseudouridine


dihydropseudouridine
2′-O-methyl-Inosine


2-thio-dihydromidine
2-methyladenosine


2-thio-dihydropseudouridine
2-methylthio-N6-methyladenosine


2-methoxyuridine
2-methylthio-N6 isopentenyladenosine


2-methoxy-4-thio-uridine
2-methylthio-N6-(cis-


4-methoxy-pseudouridine
hydroxyisopentenyl)adenosine


4-methoxy-2-thio-pseudouridine
N6-methyl-N6-threonylcarbamoyladenosine


5-aza-cytidine
N6-hydroxynorvalylcarbamoyladenosine


pseudoisocytidine
2-methylthio-N6-hydroxynorvalyl


3-methyl-cytidine
carbamoyladenosine


N4-acetylcytidine
2′-O-ribosyladenosine (phosphate)


5-formylcytidine
1,2′-O-dimethylinosine


N4-methylcytidine
5,2′-O-dimethylcytidine


5-hydroxymethylcytidine
N4-acetyl-2′-O-methylcytidine


1-methyl-pseudoisocytidine
Lysidine


pyrrolo-cytidine
7-methylguanosine


pyrrolo-pseudoisocytidine
N2,2′-O-dimethylguanosine


2-thio-cytidine
N2,N2,2′-O-trimethylguanosine


2-thio-5-methyl-cytidine
2′-O-ribosylguanosine (phosphate)


4-thio-pseudoisocytidine
Wybutosine


4-thio-1-methyl-pseudoisocytidine
Peroxywybutosine


4-thio-1-methyl-1-deaza-pseudoisocytidine
Hydroxywybutosine


1-methyl-1-deaza-pseudoisocytidine
undermodified hydroxywybutosine


zebularine
methylwyosine


5-aza-zebularine
queuosine


5-methyl-zebularine
epoxyqueuosine


5-aza-2-thio-zebularine
galactosyl-queuosine


2-thio-zebularine
mannosyl-queuosine


2-methoxy-cytidine
7-cyano-7-deazaguanosine


2-methoxy-5-methyl-cytidine
7-aminomethyl-7-deazaguanosine


4-methoxy-pseudoisocytidine
archaeosine


4-methoxy-1-methyl-pseudoisocytidine
5,2′-O-dimethyluridine


2-aminopurine
4-thiouridine


2,6-diaminopurine
5-methyl-2-thiouridine


7-deaza-adenine
2-thio-2′-O-methyluridine


7-deaza-8-aza-adenine
3-(3-amino-3-carboxypropyl)uridine


7-deaza-2-aminopurine
5-methoxyuridine


7-deaza-8-aza-2-aminopurine
uridine 5-oxyacetic acid


7-deaza-2,6-diaminopurine
uridine 5-oxyacetic acid methyl ester


7-deaza-8-aza-2,6-diarninopurine
5-(carboxyhydroxymethyl)uridine)


1-methyladenosine
5-(carboxyhydroxymethyl)uridine methyl ester


N6-isopentenyladenosine
5-methoxycarbonylmethyluridine


N6-(cis-hydroxyisopentenyl)adenosine
5-methoxycarbonylmethyl-2′-O-methyluridine


2-methylthio-N6-(cis-hydroxyisopentenyl)
5-methoxycarbonylmethyl-2-thiouridine


adenosine
5-aminomethyl-2-thiouridine


N6-glycinylcarbamoyladenosine
5-methylaminomethyluridine


N6-threonylcarbamoyladenosine
5-methylaminomethyl-2-thiouridine


2-methylthio-N6-threonyl carbamoyladenosine
5-methylaminomethyl-2-selenouridine


5-carbamoylmethyluridine


N6,N6-dimethyladenosine
5-carbamoylmethyl-2′-O-methyluridine


7-methyladenine
5-carboxymethylaminomethyluridine


2-methylthio-adenine
5-carboxymethylaminomethyl-2′-O-methyluridine


2-methoxy-adenine


inosine
5-carboxymethylaminomethyl-2-thiouridine


1-methyl-inosine
N4,2′-O-dimethylcytidine


wyosine
5-carboxymethyluridine


wybutosine
N6,2′-O-dimethyladenosine


7-deaza-guanosine
N,N6,O-2′-trimethyladenosine


7-deaza-8-aza-guanosine
N2,7-dimethylguanosine


6-thio-guanosine
N2,N2,7-trimethylguanosine


6-thio-7-deaza-guanosine
3,2′-O-dimethyluridine


6-thio-7-deaza-8-aza-guanosine
5-methyldihydrouridine


7-methyl-guanosine
5-formyl-2′-O-methylcytidine


6-thio-7-methyl-guanosine
1,2′-O-dimethylguanosine


7-methylinosine
4-demethylwyosine


6-methoxy-guanosine
Isowyosine


1-methylguanosine
N6-acetyladenosine


N2-methylguanosine



N2,N2-dimethylguanosine



8-oxo-guanosine



7-methyl-8-oxo-guanosine



1-methyl-6-thio-guanosine
















TABLE 14

Backbone modifications

2′-O-Methyl backbone
Peptide Nucleic Acid (PNA) backbone
phosphorothioate backbone
morpholino backbone
carbamate backbone
siloxane backbone
sulfide backbone
sulfoxide backbone
sulfone backbone
formacetyl backbone
thioformacetyl backbone
methyleneformacetyl backbone
riboacetyl backbone
alkene containing backbone
sulfamate backbone
sulfonate backbone
sulfonamide backbone
methyleneimino backbone
methylenehydrazino backbone
amide backbone

















TABLE 15

Modified caps

m7GpppA
m7GpppC
m2,7GpppG
m2,2,7GpppG
m7Gpppm7G
m7,2′OmeGpppG
m72′dGpppG
m7,3′OmeGpppG
m7,3′dGpppG
GppppG
m7GppppG
m7GppppA
m7GppppC
m2,7GppppG
m2,2,7GppppG
m7Gppppm7G
m7,2′OmeGppppG
m72′dGppppG
m7,3′OmeGppppG
m7,3′dGppppG










The nucleotides comprising the template of the gene modifying system can be natural or modified bases, or a combination thereof. For example, the template may contain pseudouridine, dihydrouridine, inosine, 7-methylguanosine, or other modified bases. In some embodiments, the template may contain locked nucleic acid nucleotides. In some embodiments, the modified bases used in the template do not inhibit the reverse transcription of the template. In some embodiments, the modified bases used in the template may improve reverse transcription, e.g., specificity or fidelity.


In some embodiments, an RNA component of the system (e.g., a template RNA or a gRNA) comprises one or more nucleotide modifications. In some embodiments, the modification pattern of a gRNA can significantly affect in vivo activity compared to unmodified or end-modified guides (e.g., as shown in FIG. 1D from Finn et al. Cell Rep 22(9):2227-2235 (2018); incorporated herein by reference in its entirety). Without wishing to be bound by theory, this effect may be due, at least in part, to a stabilization of the RNA conferred by the modifications. Non-limiting examples of such modifications may include 2′-O-methyl (2′-O-Me), 2′-O-(2-methoxyethyl) (2′-O-MOE), 2′-fluoro (2′-F), phosphorothioate (PS) bond between nucleotides, G-C substitutions, and inverted abasic linkages between nucleotides and equivalents thereof.


In some embodiments, the template RNA (e.g., at the portion thereof that binds a target site) or the guide RNA comprises a 5′ terminus region. In some embodiments, the template RNA or the guide RNA does not comprise a 5′ terminus region. In some embodiments, the 5′ terminus region comprises a gRNA spacer region, e.g., as described with respect to sgRNA in Briner A E et al., Molecular Cell 56: 333-339 (2014) (incorporated herein by reference in its entirety; applicable herein, e.g., to all guide RNAs). In some embodiments, the 5′ terminus region comprises a 5′ end modification. In some embodiments, a 5′ terminus region with or without a spacer region may be associated with a crRNA, trRNA, sgRNA and/or dgRNA. The gRNA spacer region can, in some instances, comprise a guide region, guide domain, or targeting domain.


In some embodiments, the template RNAs (e.g., at the portion thereof that binds a target site) or guide RNAs described herein comprise any of the sequences shown in Table 4 of WO2018107028A1, incorporated herein by reference in its entirety. In some embodiments, where a sequence shows a guide and/or spacer region, the composition may comprise this region or not. In some embodiments, a guide RNA comprises one or more of the modifications of any of the sequences shown in Table 4 of WO2018107028A1, e.g., as identified therein by a SEQ ID NO. In embodiments, the nucleotides may be the same or different, and/or the modification pattern shown may be the same or similar to a modification pattern of a guide sequence as shown in Table 4 of WO2018107028A1. In some embodiments, a modification pattern includes the relative position and identity of modifications of the gRNA or a region of the gRNA (e.g., 5′ terminus region, lower stem region, bulge region, upper stem region, nexus region, hairpin 1 region, hairpin 2 region, 3′ terminus region). In some embodiments, the modification pattern contains at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the modifications of any one of the sequences shown in the sequence column of Table 4 of WO2018107028A1, and/or over one or more regions of the sequence. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the modification pattern of any one of the sequences shown in the sequence column of Table 4 of WO2018107028A1. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over one or more regions of the sequence shown in Table 4 of WO2018107028A1, e.g., in a 5′ terminus region, lower stem region, bulge region, upper stem region, nexus region, hairpin 1 region, hairpin 2 region, and/or 3′ terminus region. 
In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the modification pattern of a sequence over the 5′ terminus region. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the lower stem. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the bulge. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the upper stem. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the nexus. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the hairpin 1. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the hairpin 2. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the 3′ terminus. In some embodiments, the modification pattern differs from the modification pattern of a sequence of Table 4 of WO2018107028A1, or a region (e.g., 5′ terminus, lower stem, bulge, upper stem, nexus, hairpin 1, hairpin 2, 3′ terminus) of such a sequence, e.g., at 0, 1, 2, 3, 4, 5, 6, or more nucleotides. In some embodiments, the gRNA comprises modifications that differ from the modifications of a sequence of Table 4 of WO2018107028A1, e.g., at 0, 1, 2, 3, 4, 5, 6, or more nucleotides. 
In some embodiments, the gRNA comprises modifications that differ from modifications of a region (e.g., 5′ terminus, lower stem, bulge, upper stem, nexus, hairpin 1, hairpin 2, 3′ terminus) of a sequence of Table 4 of WO2018107028A1, e.g., at 0, 1, 2, 3, 4, 5, 6, or more nucleotides.
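The percent-identity and nucleotide-difference comparisons of modification patterns described above can be illustrated with a minimal sketch. The disclosure does not specify how such identity is computed; the sketch assumes one plausible reading, namely a position-by-position comparison of two equal-length patterns. The function names, modification labels, region names, and region boundaries are hypothetical and are not taken from Table 4 of WO2018107028A1.

```python
from typing import Dict, Sequence, Tuple


def pattern_identity(reference: Sequence[str], query: Sequence[str]) -> float:
    """Percent of positions carrying the same modification label.

    Assumption: both patterns are already aligned and of equal length.
    """
    if len(reference) != len(query):
        raise ValueError("patterns must have the same length")
    if not reference:
        raise ValueError("patterns must not be empty")
    matches = sum(r == q for r, q in zip(reference, query))
    return 100.0 * matches / len(reference)


def count_differences(reference: Sequence[str], query: Sequence[str]) -> int:
    """Number of positions whose modification label differs."""
    if len(reference) != len(query):
        raise ValueError("patterns must have the same length")
    return sum(r != q for r, q in zip(reference, query))


def region_identity(
    reference: Sequence[str],
    query: Sequence[str],
    regions: Dict[str, Tuple[int, int]],
) -> Dict[str, float]:
    """Percent identity per region; each region is a 0-based [start, end) slice."""
    return {
        name: pattern_identity(reference[start:end], query[start:end])
        for name, (start, end) in regions.items()
    }


# Hypothetical 10-nucleotide patterns ("none" marks an unmodified position).
reference = ["2OMe", "2OMe", "2OMe", "none", "none", "2F", "none", "none", "2OMe", "2OMe"]
query = ["2OMe", "2OMe", "2OMe", "none", "none", "none", "none", "none", "2OMe", "2OMe"]

# Hypothetical region boundaries, chosen only for illustration.
regions = {"five_prime_terminus": (0, 3), "middle": (3, 8), "three_prime_terminus": (8, 10)}

overall = pattern_identity(reference, query)
differing = count_differences(reference, query)
per_region = region_identity(reference, query, regions)
```

Under these assumptions the two patterns differ at one nucleotide, so the whole-sequence identity and the per-region identity can be compared against thresholds such as those recited above.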


In some embodiments, the template RNA (e.g., at the portion thereof that binds a target site) or the gRNA comprises a 2′-O-methyl (2′-O-Me) modified nucleotide. In some embodiments, the gRNA comprises a 2′-O-(2-methoxyethyl) (2′-O-MOE) modified nucleotide. In some embodiments, the gRNA comprises a 2′-fluoro (2′-F) modified nucleotide. In some embodiments, the gRNA comprises a phosphorothioate (PS) bond between nucleotides. In some embodiments, the gRNA comprises a 5′ end modification, a 3′ end modification, or 5′ and 3′ end modifications. In some embodiments, the 5′ end modification comprises a phosphorothioate (PS) bond between nucleotides. In some embodiments, the 5′ end modification comprises a 2′-O-methyl (2′-O-Me), 2′-O-(2-methoxyethyl) (2′-O-MOE), and/or 2′-fluoro (2′-F) modified nucleotide. In some embodiments, the 5′ end modification comprises at least one phosphorothioate (PS) bond and one or more of a 2′-O-methyl (2′-O-Me), 2′-O-(2-methoxyethyl) (2′-O-MOE), and/or 2′-fluoro (2′-F) modified nucleotide. The end modification may comprise a phosphorothioate (PS), 2′-O-methyl (2′-O-Me), 2′-O-(2-methoxyethyl) (2′-O-MOE), and/or 2′-fluoro (2′-F) modification. Equivalent end modifications are also encompassed by embodiments described herein. In some embodiments, the template RNA or gRNA comprises an end modification in combination with a modification of one or more regions of the template RNA or gRNA. Additional exemplary modifications and methods for protecting RNA, e.g., gRNA, and formulae thereof, are described in WO2018126176A1, which is incorporated herein by reference in its entirety.


In some embodiments, a template RNA described herein comprises three phosphorothioate linkages at the 5′ end and three phosphorothioate linkages at the 3′ end. In some embodiments, a template RNA described herein comprises three 2′-O-methyl ribonucleotides at the 5′ end and three 2′-O-methyl ribonucleotides at the 3′ end. In some embodiments, the 5′ most three nucleotides of the template RNA are 2′-O-methyl ribonucleotides, the 5′ most three internucleotide linkages of the template RNA are phosphorothioate linkages, the 3′ most three nucleotides of the template RNA are 2′-O-methyl ribonucleotides, and the 3′ most three internucleotide linkages of the template RNA are phosphorothioate linkages. In some embodiments, the template RNA comprises alternating blocks of ribonucleotides and 2′-O-methyl ribonucleotides, for instance, blocks of between 12 and 28 nucleotides in length. In some embodiments, the central portion of the template RNA comprises the alternating blocks and the 5′ and 3′ ends each comprise three 2′-O-methyl ribonucleotides and three phosphorothioate linkages.
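The end-modification pattern described above (2′-O-methyl ribonucleotides at the three 5′-most and three 3′-most positions, with phosphorothioate linkages at the three 5′-most and three 3′-most internucleotide linkages) can be illustrated with a minimal sketch. The shorthand used in the output, in which a leading "m" marks a 2′-O-methyl ribonucleotide and a trailing "*" marks a phosphorothioate linkage, is an assumed notation chosen for illustration and is not defined in this disclosure. The function name and the example sequence are likewise hypothetical.

```python
def annotate_end_modifications(sequence: str, n_end: int = 3) -> str:
    """Annotate a template RNA sequence with end modifications.

    Assumed shorthand: "m" before a base marks a 2'-O-methyl ribonucleotide;
    "*" after a base marks a phosphorothioate linkage to the next nucleotide.
    """
    length = len(sequence)
    if n_end < 1:
        raise ValueError("n_end must be at least 1")
    if length < 2 * n_end:
        raise ValueError("sequence is too short for non-overlapping end modifications")

    n_linkages = length - 1  # linkage i joins nucleotide i and nucleotide i + 1
    parts = []
    for i, base in enumerate(sequence):
        # 2'-O-methyl at the n_end 5'-most and n_end 3'-most nucleotides.
        is_methylated = i < n_end or i >= length - n_end
        # Phosphorothioate at the n_end 5'-most and n_end 3'-most linkages.
        has_ps = i < n_linkages and (i < n_end or i >= n_linkages - n_end)
        parts.append(("m" if is_methylated else "") + base + ("*" if has_ps else ""))
    return "".join(parts)


# Hypothetical 10-nucleotide sequence, used only to show the output format.
annotated = annotate_end_modifications("ACGUACGUAC")
print(annotated)  # mA*mC*mG*UACG*mU*mA*mC
```

The central, unannotated portion of the output is where alternating blocks of ribonucleotides and 2′-O-methyl ribonucleotides would be placed in embodiments that use them.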


In some embodiments, structure-guided and systematic approaches are used to introduce modifications (e.g., 2′-OMe-RNA, 2′-F-RNA, and PS modifications) to a template RNA or guide RNA, for example, as described in Mir et al. Nat Commun 9:2641 (2018) (incorporated by reference herein in its entirety). In some embodiments, the incorporation of 2′-F-RNAs increases thermal and nuclease stability of RNA:RNA or RNA:DNA duplexes, e.g., while minimally interfering with C3′-endo sugar puckering. In some embodiments, 2′-F may be better tolerated than 2′-OMe at positions where the 2′-OH is important for RNA:DNA duplex stability. In some embodiments, a crRNA comprises one or more modifications that do not reduce Cas9 activity, e.g., C10, C20, or C21 (fully modified), e.g., as described in Supplementary Table 1 of Mir et al. Nat Commun 9:2641 (2018), incorporated herein by reference in its entirety. In some embodiments, a tracrRNA comprises one or more modifications that do not reduce Cas9 activity, e.g., T2, T6, T7, or T8 (fully modified) of Supplementary Table 1 of Mir et al. Nat Commun 9:2641 (2018). In some embodiments, a crRNA comprising one or more modifications (e.g., as described herein) may be paired with a tracrRNA comprising one or more modifications, e.g., C20 and T2. In some embodiments, a gRNA comprises a chimera, e.g., of a crRNA and a tracrRNA (e.g., Jinek et al. Science 337(6096):816-821 (2012)). In embodiments, modifications from the crRNA and tracrRNA are mapped onto the single-guide chimera, e.g., to produce a modified gRNA with enhanced stability.


In some embodiments, gRNA molecules may be modified by the addition or subtraction of the naturally occurring structural components, e.g., hairpins. In some embodiments, a gRNA may comprise a gRNA with one or more 3′ hairpin elements deleted, e.g., as described in WO2018106727, incorporated herein by reference in its entirety. In some embodiments, a gRNA may contain an added hairpin structure, e.g., an added hairpin structure in the spacer region, which was shown to increase specificity of a CRISPR-Cas system in the teachings of Kocak et al. Nat Biotechnol 37(6):657-666 (2019). Additional modifications, including examples of shortened gRNA and specific modifications improving in vivo activity, can be found in US20190316121, incorporated herein by reference in its entirety.


In some embodiments, structure-guided and systematic approaches (e.g., as described in Mir et al. Nat Commun 9:2641 (2018); incorporated herein by reference in its entirety) are employed to find modifications for the template RNA. In embodiments, the modifications are identified with the inclusion or exclusion of a guide region of the template RNA. In some embodiments, a structure of polypeptide bound to template RNA is used to determine non-protein-contacted nucleotides of the RNA that may then be selected for modifications, e.g., with lower risk of disrupting the association of the RNA with the polypeptide. Secondary structures in a template RNA can also be predicted in silico by software tools, e.g., the RNAstructure tool available at rna.urmc.rochester.edu/RNAstructureWeb (Bellaousov et al. Nucleic Acids Res 41:W471-W474 (2013); incorporated by reference herein in its entirety), e.g., to determine secondary structures for selecting modifications, e.g., hairpins, stems, and/or bulges.


Production of Compositions and Systems

As will be appreciated by one of skill, methods of designing and constructing nucleic acid constructs and proteins or polypeptides (such as the systems, constructs and polypeptides described herein) are routine in the art. Generally, recombinant methods may be used. See, in general, Smales & James (Eds.), Therapeutic Proteins: Methods and Protocols (Methods in Molecular Biology), Humana Press (2005); and Crommelin, Sindelar & Meibohm (Eds.), Pharmaceutical Biotechnology: Fundamentals and Applications, Springer (2013). Methods of designing, preparing, evaluating, purifying and manipulating nucleic acid compositions are described in Green and Sambrook (Eds.), Molecular Cloning: A Laboratory Manual (Fourth Edition), Cold Spring Harbor Laboratory Press (2012).


The disclosure provides, in part, a nucleic acid, e.g., vector, encoding a gene modifying polypeptide described herein, a template nucleic acid described herein, or both. In some embodiments, a vector comprises a selective marker, e.g., an antibiotic resistance marker. In some embodiments, the antibiotic resistance marker is a kanamycin resistance marker. In some embodiments, the antibiotic resistance marker does not confer resistance to beta-lactam antibiotics. In some embodiments, the vector does not comprise an ampicillin resistance marker. In some embodiments, the vector comprises a kanamycin resistance marker and does not comprise an ampicillin resistance marker. In some embodiments, a vector encoding a gene modifying polypeptide is integrated into a target cell genome (e.g., upon administration to a target cell, tissue, organ, or subject). In some embodiments, a vector encoding a gene modifying polypeptide is not integrated into a target cell genome (e.g., upon administration to a target cell, tissue, organ, or subject). In some embodiments, a vector encoding a template nucleic acid (e.g., template RNA) is not integrated into a target cell genome (e.g., upon administration to a target cell, tissue, organ, or subject). In some embodiments, if a vector is integrated into a target site in a target cell genome, the selective marker is not integrated into the genome. In some embodiments, if a vector is integrated into a target site in a target cell genome, genes or sequences involved in vector maintenance (e.g., plasmid maintenance genes) are not integrated into the genome. In some embodiments, if a vector is integrated into a target site in a target cell genome, transfer regulating sequences (e.g., inverted terminal repeats, e.g., from an AAV) are not integrated into the genome. 
In some embodiments, administration of a vector (e.g., encoding a gene modifying polypeptide described herein, a template nucleic acid described herein, or both) to a target cell, tissue, organ, or subject results in integration of a portion of the vector into one or more target sites in the genome(s) of said target cell, tissue, organ, or subject. In some embodiments, less than 99, 95, 90, 80, 70, 60, 50, 40, 30, 20, 10, 5, 4, 3, 2, or 1% of target sites (e.g., no target sites) comprising integrated material comprise a selective marker (e.g., an antibiotic resistance gene), a transfer regulating sequence (e.g., an inverted terminal repeat, e.g., from an AAV), or both from the vector.


Exemplary methods for producing a therapeutic pharmaceutical protein or polypeptide described herein involve expression in mammalian cells, although recombinant proteins can also be produced using insect cells, yeast, bacteria, or other cells under control of appropriate promoters. Mammalian expression vectors may comprise non-transcribed elements such as an origin of replication, a suitable promoter, and other 5′ or 3′ flanking non-transcribed sequences, and 5′ or 3′ non-translated sequences such as necessary ribosome binding sites, a polyadenylation site, splice donor and acceptor sites, and termination sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, splice, and polyadenylation sites may be used to provide other genetic elements required for expression of a heterologous DNA sequence. Appropriate cloning and expression vectors for use with bacterial, fungal, yeast, and mammalian cellular hosts are described in Green & Sambrook, Molecular Cloning: A Laboratory Manual (Fourth Edition), Cold Spring Harbor Laboratory Press (2012).


Various mammalian cell culture systems can be employed to express and manufacture recombinant protein. Examples of mammalian expression systems include CHO, COS, HEK293, HeLa, and BHK cell lines. Processes of host cell culture for production of protein therapeutics are described in Zhou and Kantardjieff (Eds.), Mammalian Cell Cultures for Biologics Manufacturing (Advances in Biochemical Engineering/Biotechnology), Springer (2014). Compositions described herein may include a vector, such as a viral vector, e.g., a lentiviral vector, encoding a recombinant protein. In some embodiments, a vector, e.g., a viral vector, may comprise a nucleic acid encoding a recombinant protein.


Purification of protein therapeutics is described in Franks, Protein Biotechnology: Isolation, Characterization, and Stabilization, Humana Press (2013); and in Cutler, Protein Purification Protocols (Methods in Molecular Biology), Humana Press (2010).


The disclosure also provides compositions and methods for the production of template nucleic acid molecules (e.g., template RNAs) with specificity for a gene modifying polypeptide and/or a genomic target site. In an aspect, the method comprises production of RNA segments including an upstream homology segment, a heterologous object sequence segment, a gene modifying polypeptide binding motif, and a gRNA segment.


Therapeutic Applications

In some embodiments, a gene modifying system as described herein can be used to modify a cell (e.g., an animal cell, plant cell, or fungal cell). In some embodiments, a gene modifying system as described herein can be used to modify a mammalian cell (e.g., a human cell). In some embodiments, a gene modifying system as described herein can be used to modify a cell from a livestock animal (e.g., a cow, horse, sheep, goat, pig, llama, alpaca, camel, yak, chicken, duck, goose, or ostrich). In some embodiments, a gene modifying system as described herein can be used as a laboratory tool or a research tool, or used in a laboratory method or research method, e.g., to modify an animal cell, e.g., a mammalian cell (e.g., a human cell), a plant cell, or a fungal cell.


By integrating coding genes into an RNA sequence template, the gene modifying system can address therapeutic needs, for example, by providing expression of a therapeutic transgene in individuals with loss-of-function mutations, by replacing gain-of-function mutations with normal transgenes, by providing regulatory sequences to eliminate gain-of-function mutation expression, and/or by controlling the expression of operably linked genes, transgenes and systems thereof. In certain embodiments, the RNA sequence template encodes a promoter region specific to the therapeutic needs of the host cell, for example a tissue-specific promoter or enhancer. In still other embodiments, a promoter can be operably linked to a coding sequence.


Accordingly, provided herein are methods for treating cystic fibrosis in a subject in need thereof. In some embodiments, treatment results in amelioration of one or more symptoms associated with cystic fibrosis.


In some embodiments, a system herein is used to treat a subject having a mutation affecting amino acid F508 (e.g., F508del). In some embodiments, a system described herein is used to treat a subject having a mutation at a portion of the CFTR gene corresponding to the corresponding RT template sequence of Table 3.


Clusters of Mutations


Certain mutations that occur in the CFTR gene in the human population are concentrated within a particular region of the CFTR gene. In particular, these mutations causing the amino acid changes may form a cluster at the population level. However, in many cases, an individual patient has a mutation at only one position in the cluster and is wild type at the other positions. In some embodiments, a template RNA described herein can be used to treat a patient having one or more mutations in a cluster. More specifically, a template RNA may comprise a heterologous object sequence (e.g., RT template sequence of Table 3) that overlaps with this cluster region and contains the correct, wild type sequence for that region. As a result, a single template RNA sequence described herein may be useful for treating several different sub-populations of patients having different mutations within the cluster. Without wishing to be bound by theory, in some embodiments, a gene editing polypeptide writes the entire RT template sequence into the target genome, resulting in correction of a mutation anywhere in the cluster. More generally, in some embodiments, a system described herein is used to treat a subject having a mutation at a portion of the CFTR gene corresponding to the corresponding RT template sequence of Table 3.
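The cluster concept can be illustrated with a minimal sketch that checks which mutations fall entirely within the region rewritten by a single RT template sequence. The sketch assumes that a mutation is addressable by a template when its coordinates lie within the template window. The class name, function name, coordinates, and mutation names are hypothetical placeholders and do not correspond to actual CFTR positions or to the sequences of Table 3.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Mutation:
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def mutations_covered(window: Tuple[int, int], mutations: List[Mutation]) -> List[str]:
    """Return names of mutations lying entirely inside the template window.

    Assumption: the window is the 0-based [start, end) span of target sequence
    that the RT template sequence rewrites to wild type.
    """
    w_start, w_end = window
    if w_start >= w_end:
        raise ValueError("window must have positive length")
    return [m.name for m in mutations if w_start <= m.start and m.end <= w_end]


# Hypothetical coordinates, chosen only for illustration.
template_window = (100, 140)
cluster = [
    Mutation("mutation_A", 105, 106),  # single-nucleotide change inside the window
    Mutation("mutation_B", 120, 123),  # three-nucleotide deletion inside the window
    Mutation("mutation_C", 150, 151),  # outside the window
]

covered = mutations_covered(template_window, cluster)
print(covered)  # ['mutation_A', 'mutation_B']
```

In this illustration, patients carrying either of the two covered mutations could be addressed by the same template RNA, while the third mutation would call for a template with a different window.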


In some embodiments, treatment with a system disclosed herein results in correction of the F508del mutation in between about 5-50% (e.g., about 5-10%, 10-20%, 20-30%, 30-40%, 40-50%, or about 10%) of cells. In some embodiments, treatment with a system disclosed herein results in correction of the F508del mutation in between about 5-50% (e.g., about 5-10%, 10-20%, 20-30%, 30-40%, 40-50%, or about 10%) of DNA isolated from the treated cells.


In some embodiments, treatment with a gene modifying system described herein results in one or more of:

    • (a) an improvement of function of the CFTR gene;
    • (b) a restoration of balance of chloride and fluid at the cell surface;
    • (c) a reduction of the thickness and/or stickiness of mucus in an organ;
    • (d) a reduction in lung infection;
    • (e) improved respiratory function in the lungs;
    • (f) an improvement in digestion; and
    • (g) an improvement of function of the reproductive system,


      as compared to a subject having cystic fibrosis that has not been treated with a gene modifying system described herein.


Administration and Delivery

The compositions and systems described herein may be used in vitro or in vivo. In some embodiments the system or components of the system are delivered to cells (e.g., mammalian cells, e.g., human cells), e.g., in vitro or in vivo. In some embodiments, the cells are eukaryotic cells, e.g., cells of a multicellular organism, e.g., an animal, e.g., a mammal (e.g., human, swine, bovine), a bird (e.g., poultry, such as chicken, turkey, or duck), or a fish. In some embodiments, the cells are non-human animal cells (e.g., a laboratory animal, a livestock animal, or a companion animal). In some embodiments, the cell is a stem cell (e.g., a hematopoietic stem cell), a fibroblast, or a T cell. In some embodiments, the cell is an immune cell, e.g., a T cell (e.g., a Treg, CD4, CD8, γδ, or memory T cell), B cell (e.g., memory B cell or plasma cell), or NK cell. In some embodiments, the cell is a non-dividing cell, e.g., a non-dividing fibroblast or non-dividing T cell. In some embodiments, the cell is an HSC and p53 is not upregulated or is upregulated by less than 10%, 5%, 2%, or 1%, e.g., as determined according to the method described in Example 30 of PCT/US2019/048607. The skilled artisan will understand that the components of the gene modifying system may be delivered in the form of polypeptide, nucleic acid (e.g., DNA, RNA), and combinations thereof.


In one embodiment the system and/or components of the system are delivered as nucleic acid. For example, the gene modifying polypeptide may be delivered in the form of a DNA or RNA encoding the polypeptide, and the template RNA may be delivered in the form of RNA or its complementary DNA to be transcribed into RNA. In some embodiments the system or components of the system are delivered on 1, 2, 3, 4, or more distinct nucleic acid molecules. In some embodiments the system or components of the system are delivered as a combination of DNA and RNA. In some embodiments the system or components of the system are delivered as a combination of DNA and protein. In some embodiments the system or components of the system are delivered as a combination of RNA and protein. In some embodiments the gene modifying polypeptide is delivered as a protein.


In some embodiments the system or components of the system are delivered to cells, e.g., mammalian cells or human cells, using a vector. The vector may be, e.g., a plasmid or a virus. In some embodiments, delivery is in vivo, in vitro, ex vivo, or in situ. In some embodiments the virus is an adeno-associated virus (AAV), a lentivirus, or an adenovirus. In some embodiments the system or components of the system are delivered to cells with a virus-like particle or a virosome. In some embodiments the delivery uses more than one virus, virus-like particle or virosome.


In one embodiment, the compositions and systems described herein can be formulated in liposomes or other similar vesicles. Liposomes are spherical vesicle structures composed of a uni- or multilamellar lipid bilayer surrounding internal aqueous compartments and a relatively impermeable outer lipophilic phospholipid bilayer. Liposomes may be anionic, neutral or cationic. Liposomes are biocompatible, nontoxic, can deliver both hydrophilic and lipophilic drug molecules, protect their cargo from degradation by plasma enzymes, and transport their load across biological membranes and the blood brain barrier (BBB) (see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol. 2011, Article ID 469679, 12 pages, 2011. doi:10.1155/2011/469679 for review).


Vesicles can be made from several different types of lipids; however, phospholipids are most commonly used to generate liposomes as drug carriers. Methods for preparation of multilamellar vesicle lipids are known in the art (see for example U.S. Pat. No. 6,693,086, the teachings of which relating to multilamellar vesicle lipid preparation are incorporated herein by reference). Although vesicle formation can be spontaneous when a lipid film is mixed with an aqueous solution, it can also be expedited by applying force in the form of shaking by using a homogenizer, sonicator, or an extrusion apparatus (see, e.g., Spuch and Navarro, Journal of Drug Delivery, vol. 2011, Article ID 469679, 12 pages, 2011. doi:10.1155/2011/469679 for review). Extruded lipids can be prepared by extruding through filters of decreasing size, as described in Templeton et al., Nature Biotech, 15:647-652, 1997, the teachings of which relating to extruded lipid preparation are incorporated herein by reference.


A variety of nanoparticles can be used for delivery, such as a liposome, a lipid nanoparticle, a cationic lipid nanoparticle, an ionizable lipid nanoparticle, a polymeric nanoparticle, a gold nanoparticle, a dendrimer, a cyclodextrin nanoparticle, a micelle, or a combination of the foregoing.


Lipid nanoparticles are an example of a carrier that provides a biocompatible and biodegradable delivery system for the pharmaceutical compositions described herein. Nanostructured lipid carriers (NLCs) are modified solid lipid nanoparticles (SLNs) that retain the characteristics of the SLN, improve drug stability and loading capacity, and prevent drug leakage. Polymer nanoparticles (PNPs) are an important component of drug delivery. These nanoparticles can effectively direct drug delivery to specific targets and improve drug stability and controlled drug release. Lipid-polymer nanoparticles (PLNs), a type of carrier that combines liposomes and polymers, may also be employed. These nanoparticles possess the complementary advantages of PNPs and liposomes. A PLN is composed of a core-shell structure; the polymer core provides a stable structure, and the phospholipid shell offers good biocompatibility. As such, the two components increase the drug encapsulation efficiency rate, facilitate surface modification, and prevent leakage of water-soluble drugs. For a review, see, e.g., Li et al. 2017, Nanomaterials 7, 122; doi:10.3390/nano7060122.


Exosomes can also be used as drug delivery vehicles for the compositions and systems described herein. For a review, see Ha et al. July 2016. Acta Pharmaceutica Sinica B. Volume 6, Issue 4, Pages 287-296; https://doi.org/10.1016/j.apsb.2016.02.001.


Fusosomes interact and fuse with target cells, and thus can be used as delivery vehicles for a variety of molecules. They generally consist of a bilayer of amphipathic lipids enclosing a lumen or cavity and a fusogen that interacts with the amphipathic lipid bilayer. The fusogen component has been shown to be engineerable in order to confer target cell specificity for the fusion and payload delivery, allowing the creation of delivery vehicles with programmable cell specificity (see for example Patent Application WO2020014209, the teachings of which relating to fusosome design, preparation, and usage are incorporated herein by reference).


In some embodiments, the protein component(s) of the gene modifying system may be pre-associated with the template nucleic acid (e.g., template RNA). For example, in some embodiments, the gene modifying polypeptide may be first combined with the template nucleic acid (e.g., template RNA) to form a ribonucleoprotein (RNP) complex. In some embodiments, the RNP may be delivered to cells via, e.g., transfection, nucleofection, virus, vesicle, LNP, exosome, or fusosome.


A gene modifying system can be introduced into cells, tissues and multicellular organisms. In some embodiments the system or components of the system are delivered to the cells via mechanical means or physical means.


Formulation of protein therapeutics is described in Meyer (Ed.), Therapeutic Protein Drug Products: Practical Approaches to Formulation in the Laboratory, Manufacturing, and the Clinic, Woodhead Publishing Series (2012).


Tissue Specific Activity/Administration

In some embodiments, a system described herein can make use of one or more features (e.g., a promoter or microRNA binding site) to limit activity in off-target cells or tissues.


In some embodiments, a nucleic acid described herein (e.g., a template RNA or a DNA encoding a template RNA) comprises a promoter sequence, e.g., a tissue specific promoter sequence. In some embodiments, the tissue-specific promoter is used to increase the target-cell specificity of a gene modifying system. For instance, the promoter can be chosen on the basis that it is active in a target cell type but not active in (or active at a lower level in) a non-target cell type. Thus, even if the promoter were integrated into the genome of a non-target cell, it would not drive expression (or would drive only low-level expression) of an integrated gene. A system having a tissue-specific promoter sequence in the template RNA may also be used in combination with a microRNA binding site, e.g., in the template RNA or a nucleic acid encoding a gene modifying protein, e.g., as described herein. A system having a tissue-specific promoter sequence in the template RNA may also be used in combination with a DNA encoding a gene modifying polypeptide, driven by a tissue-specific promoter, e.g., to achieve higher levels of gene modifying protein in target cells than in non-target cells. In some embodiments, e.g., for liver indications, a tissue-specific promoter is selected from Table 3 of WO2020014209, incorporated herein by reference.


In some embodiments, a nucleic acid described herein (e.g., a template RNA or a DNA encoding a template RNA) comprises a microRNA binding site. In some embodiments, the microRNA binding site is used to increase the target-cell specificity of a gene modifying system. For instance, the microRNA binding site can be chosen on the basis that it is recognized by a miRNA that is present in a non-target cell type, but that is not present (or is present at a reduced level relative to the non-target cell) in a target cell type. Thus, when the template RNA is present in a non-target cell, it would be bound by the miRNA, and when the template RNA is present in a target cell, it would not be bound by the miRNA (or would be bound but at reduced levels relative to the non-target cell). While not wishing to be bound by theory, binding of the miRNA to the template RNA may interfere with its activity, e.g., may interfere with insertion of the heterologous object sequence into the genome. Accordingly, the system would edit the genome of target cells more efficiently than it edits the genome of non-target cells, e.g., the heterologous object sequence would be inserted into the genome of target cells more efficiently than into the genome of non-target cells, or an insertion or deletion would be produced more efficiently in target cells than in non-target cells. A system having a microRNA binding site in the template RNA (or DNA encoding it) may also be used in combination with a nucleic acid encoding a gene modifying polypeptide, wherein expression of the gene modifying polypeptide is regulated by a second microRNA binding site, e.g., as described herein. In some embodiments, e.g., for liver indications, a miRNA is selected from Table 4 of WO2020014209, incorporated herein by reference.
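By way of illustration only, the following Python sketch shows one way a fully complementary microRNA binding site might be derived from a mature miRNA sequence and arranged in tandem copies for inclusion in a template RNA. The function names, the spacer, the copy number, and the placeholder miRNA sequence are assumptions introduced for this example and are not taken from the present disclosure or from WO2020014209.

```python
# Sketch only: names, spacer, copy number, and sequence are illustrative assumptions.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    seq = seq.upper().replace("T", "U")
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


def build_binding_site_array(mirna: str, copies: int = 3, spacer: str = "AAUU") -> str:
    """Concatenate `copies` fully complementary sites separated by `spacer`."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    site = reverse_complement_rna(mirna)
    return spacer.join([site] * copies)


# Hypothetical 22-nt mature miRNA, chosen only as a placeholder.
HYPOTHETICAL_MIRNA = "AUGGCUAACGUUCGAUCCGAUA"

site_array = build_binding_site_array(HYPOTHETICAL_MIRNA, copies=3)
print(site_array)
```

In such a sketch, the miRNA would be one that is present in the non-target cell type and absent or reduced in the target cell type, so that the resulting sites are bound preferentially in non-target cells.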


In some embodiments, the template RNA comprises a microRNA sequence, an siRNA sequence, a guide RNA sequence, or a piwi RNA sequence.


Promoters


In some embodiments, one or more promoter or enhancer elements are operably linked to a nucleic acid encoding a gene modifying protein or a template nucleic acid, e.g., that controls expression of the heterologous object sequence. In certain embodiments, the one or more promoter or enhancer elements comprise cell-type or tissue specific elements. In some embodiments, the promoter or enhancer is the same as or derived from the promoter or enhancer that naturally controls expression of the heterologous object sequence. For example, the ornithine transcarbamylase promoter and enhancer may be used to control expression of the ornithine transcarbamylase gene in a system or method provided by the invention for correcting ornithine transcarbamylase deficiencies. In some embodiments, the promoter is a promoter of Table 16 or 17 or a functional fragment or variant thereof.


Exemplary tissue specific promoters that are commercially available can be found, for example, at a uniform resource locator (e.g., www.invivogen.com/tissue-specific-promoters). In some embodiments, a promoter is a native promoter or a minimal promoter, e.g., which consists of a single fragment from the 5′ region of a given gene. In some embodiments, a native promoter comprises a core promoter and its natural 5′ UTR. In some embodiments, the 5′ UTR comprises an intron. In other embodiments, these include composite promoters, which combine promoter elements of different origins or were generated by assembling a distal enhancer with a minimal promoter of the same origin.


Exemplary cell or tissue specific promoters are provided in the tables, below, and exemplary nucleic acid sequences encoding them are known in the art and can be readily accessed using a variety of resources, such as the NCBI database, including RefSeq, as well as the Eukaryotic Promoter Database (//epd.epfl.ch//index.php).









TABLE 16
Exemplary cell or tissue-specific promoters

Promoter | Target cells
B29 Promoter | B cells
CD14 Promoter | Monocytic Cells
CD43 Promoter | Leukocytes and platelets
CD45 Promoter | Hematopoietic cells
CD68 promoter | macrophages
Desmin promoter | muscle cells
Elastase-1 promoter | pancreatic acinar cells
Endoglin promoter | endothelial cells
fibronectin promoter | differentiating cells, healing tissue
Flt-1 promoter | endothelial cells
GFAP promoter | Astrocytes
GPIIB promoter | megakaryocytes
ICAM-2 Promoter | Endothelial cells
IFN-Beta promoter | Hematopoietic cells
Mb promoter | muscle cells
Nphs1 promoter | podocytes
OG-2 promoter | Osteoblasts, Odontoblasts
SP-B promoter | Lung
Syn1 promoter | Neurons
WASP promoter | Hematopoietic cells
SV40/bAlb promoter | Liver
SV40/Cd3 promoter | Leukocytes and platelets
SV40/CD45 promoter | hematopoietic cells
NSE/RU5′ promoter | Mature Neurons

















TABLE 17
Additional exemplary cell or tissue-specific promoters

Promoter | Gene Description | Gene Specificity
APOA2 | Apolipoprotein A-II | Hepatocytes (from hepatocyte progenitors)
SERPINA1 (hAAT) | Serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1 (also named alpha 1 anti-trypsin) | Hepatocytes (from definitive endoderm stage)
CYP3A | Cytochrome P450, family 3, subfamily A, polypeptide | Mature Hepatocytes
MIR122 | MicroRNA 122 | Hepatocytes (from early stage embryonic liver cells) and endoderm

Pancreatic specific promoters
INS | Insulin | Pancreatic beta cells (from definitive endoderm stage)
IRS2 | Insulin receptor substrate 2 | Pancreatic beta cells
Pdx1 | Pancreatic and duodenal homeobox 1 | Pancreas (from definitive endoderm stage)
Alx3 | Aristaless-like homeobox 3 | Pancreatic beta cells (from definitive endoderm stage)
Ppy | Pancreatic polypeptide | PP pancreatic cells (gamma cells)

Cardiac specific promoters
Myh6 (aMHC) | Myosin, heavy chain 6, cardiac muscle, alpha | Late differentiation marker of cardiac muscle cells (atrial specificity)
MYL2 (MLC-2v) | Myosin, light chain 2, regulatory, cardiac, slow | Late differentiation marker of cardiac muscle cells (ventricular specificity)
TNNI3 (cTnI) | Troponin I type 3 (cardiac) | Cardiomyocytes (from immature state)
NPPA (ANF) | Natriuretic peptide precursor A (also named Atrial Natriuretic Factor) | Atrial specificity in adult cells
Slc8a1 (Ncx1) | Solute carrier family 8 (sodium/calcium exchanger), member 1 | Cardiomyocytes from early developmental stages

CNS specific promoters
SYN1 (hSyn) | Synapsin I | Neurons
GFAP | Glial fibrillary acidic protein | Astrocytes
INA | Internexin neuronal intermediate filament protein, alpha (a-internexin) | Neuroprogenitors
NES | Nestin | Neuroprogenitors and ectoderm
MOBP | Myelin-associated oligodendrocyte basic protein | Oligodendrocytes
MBP | Myelin basic protein | Oligodendrocytes
TH | Tyrosine hydroxylase | Dopaminergic neurons
FOXA2 (HNF3 beta) | Forkhead box A2 | Dopaminergic neurons (also used as a marker of endoderm)

Skin specific promoters
FLG | Filaggrin | Keratinocytes from granular layer
K14 | Keratin 14 | Keratinocytes from granular and basal layers
TGM3 | Transglutaminase 3 | Keratinocytes from granular layer

Immune cell specific promoters
ITGAM (CD11B) | Integrin, alpha M (complement component 3 receptor 3 subunit) | Monocytes, macrophages, granulocytes, natural killer cells

Urogenital cell specific promoters
Pbsn | Probasin | Prostatic epithelium
Upk2 | Uroplakin 2 | Bladder
Sbp | Spermine binding protein | Prostate
Fer1l4 | Fer-1-like 4 | Bladder

Endothelial cell specific promoters
ENG | Endoglin | Endothelial cells

Pluripotent and embryonic cell specific promoters
Oct4 (POU5F1) | POU class 5 homeobox 1 | Pluripotent cells (germ cells, ES cells, iPS cells)
NANOG | Nanog homeobox | Pluripotent cells (ES cells, iPS cells)
Synthetic Oct4 | Synthetic promoter based on an Oct-4 core enhancer element | Pluripotent cells (ES cells, iPS cells)
T brachyury | Brachyury | Mesoderm
NES | Nestin | Neuroprogenitors and Ectoderm
SOX17 | SRY (sex determining region Y)-box 17 | Endoderm
FOXA2 (HNF3 beta) | Forkhead box A2 | Endoderm (also used as a marker of dopaminergic neurons)
MIR122 | MicroRNA 122 | Endoderm and hepatocytes (from early stage embryonic liver cells)









Depending on the host/vector system utilized, any of a number of suitable transcription and translation control elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector (see e.g., Bitter et al. (1987) Methods in Enzymology, 153:516-544; incorporated herein by reference in its entirety).


In some embodiments, a nucleic acid encoding a gene modifying protein or template nucleic acid is operably linked to a control element, e.g., a transcriptional control element, such as a promoter. The transcriptional control element may, in some embodiments, be functional in either a eukaryotic cell, e.g., a mammalian cell; or a prokaryotic cell (e.g., bacterial or archaeal cell). In some embodiments, a nucleotide sequence encoding a polypeptide is operably linked to multiple control elements, e.g., that allow expression of the nucleotide sequence encoding the polypeptide in both prokaryotic and eukaryotic cells.


For illustration purposes, examples of spatially restricted promoters include, but are not limited to, neuron-specific promoters, adipocyte-specific promoters, cardiomyocyte-specific promoters, smooth muscle-specific promoters, photoreceptor-specific promoters, etc. Neuron-specific spatially restricted promoters include, but are not limited to, a neuron-specific enolase (NSE) promoter (see, e.g., EMBL HSENO2, X51956); an aromatic amino acid decarboxylase (AADC) promoter, a neurofilament promoter (see, e.g., GenBank HUMNFL, L04147); a synapsin promoter (see, e.g., GenBank HUMSYNIB, M55301); a thy-1 promoter (see, e.g., Chen et al. (1987) Cell 51:7-19; and Llewellyn, et al. (2010) Nat. Med. 16(10):1161-1166); a serotonin receptor promoter (see, e.g., GenBank S62283); a tyrosine hydroxylase promoter (TH) (see, e.g., Oh et al. (2009) Gene Ther 16:437; Sasaoka et al. (1992) Mol. Brain Res. 16:274; Boundy et al. (1998) J. Neurosci. 18:9989; and Kaneda et al. (1991) Neuron 6:583-594); a GnRH promoter (see, e.g., Radovick et al. (1991) Proc. Natl. Acad. Sci. USA 88:3402-3406); an L7 promoter (see, e.g., Oberdick et al. (1990) Science 248:223-226); a DNMT promoter (see, e.g., Bartge et al. (1988) Proc. Natl. Acad. Sci. USA 85:3648-3652); an enkephalin promoter (see, e.g., Comb et al. (1988) EMBO J. 17:3793-3805); a myelin basic protein (MBP) promoter; a Ca2+-calmodulin-dependent protein kinase II-alpha (CamKIIa) promoter (see, e.g., Mayford et al. (1996) Proc. Natl. Acad. Sci. USA 93:13250; and Casanova et al. (2001) Genesis 31:37); a CMV enhancer/platelet-derived growth factor-β promoter (see, e.g., Liu et al. (2004) Gene Therapy 11:52-60); and the like.


Adipocyte-specific spatially restricted promoters include, but are not limited to, the aP2 gene promoter/enhancer, e.g., a region from −5.4 kb to +21 bp of a human aP2 gene (see, e.g., Tozzo et al. (1997) Endocrinol. 138:1604; Ross et al. (1990) Proc. Natl. Acad. Sci. USA 87:9590; and Pavjani et al. (2005) Nat. Med. 11:797); a glucose transporter-4 (GLUT4) promoter (see, e.g., Knight et al. (2003) Proc. Natl. Acad. Sci. USA 100:14725); a fatty acid translocase (FAT/CD36) promoter (see, e.g., Kuriki et al. (2002) Biol. Pharm. Bull. 25:1476; and Sato et al. (2002) J. Biol. Chem. 277:15703); a stearoyl-CoA desaturase-1 (SCD1) promoter (Tabor et al. (1999) J. Biol. Chem. 274:20603); a leptin promoter (see, e.g., Mason et al. (1998) Endocrinol. 139:1013; and Chen et al. (1999) Biochem. Biophys. Res. Comm. 262:187); an adiponectin promoter (see, e.g., Kita et al. (2005) Biochem. Biophys. Res. Comm. 331:484; and Chakrabarti (2010) Endocrinol. 151:2408); an adipsin promoter (see, e.g., Platt et al. (1989) Proc. Natl. Acad. Sci. USA 86:7490); a resistin promoter (see, e.g., Seo et al. (2003) Molec. Endocrinol. 17:1522); and the like.


Cardiomyocyte-specific spatially restricted promoters include, but are not limited to, control sequences derived from the following genes: myosin light chain-2, α-myosin heavy chain, AE3, cardiac troponin C, cardiac actin, and the like. See, e.g., Franz et al. (1997) Cardiovasc. Res. 35:560-566; Robbins et al. (1995) Ann. N.Y. Acad. Sci. 752:492-505; Linn et al. (1995) Circ. Res. 76:584-591; Parmacek et al. (1994) Mol. Cell. Biol. 14:1870-1885; Hunter et al. (1993) Hypertension 22:608-617; and Sartorelli et al. (1992) Proc. Natl. Acad. Sci. USA 89:4047-4051.


Smooth muscle-specific spatially restricted promoters include, but are not limited to, an SM22a promoter (see, e.g., Akyürek et al. (2000) Mol. Med. 6:983; and U.S. Pat. No. 7,169,874); a smoothelin promoter (see, e.g., WO 2001/018048); an α-smooth muscle actin promoter; and the like. For example, a 0.4 kb region of the SM22a promoter, within which lie two CArG elements, has been shown to mediate vascular smooth muscle cell-specific expression (see, e.g., Kim, et al. (1997) Mol. Cell. Biol. 17, 2266-2278; Li, et al., (1996) J. Cell Biol. 132, 849-859; and Moessler, et al. (1996) Development 122, 2415-2425).


Photoreceptor-specific spatially restricted promoters include, but are not limited to, a rhodopsin promoter; a rhodopsin kinase promoter (Young et al. (2003) Ophthalmol. Vis. Sci. 44:4076); a beta phosphodiesterase gene promoter (Nicoud et al. (2007) J. Gene Med. 9:1015); a retinitis pigmentosa gene promoter (Nicoud et al. (2007) supra); an interphotoreceptor retinoid-binding protein (IRBP) gene enhancer (Nicoud et al. (2007) supra); an IRBP gene promoter (Yokoyama et al. (1992) Exp Eye Res. 55:225); and the like.


In some embodiments, a gene modifying system, e.g., DNA encoding a gene modifying polypeptide, DNA encoding a template RNA, or DNA or RNA encoding a heterologous object sequence, is designed such that one or more elements is operably linked to a tissue-specific promoter, e.g., a promoter that is active in T-cells. In further embodiments, the T-cell active promoter is inactive in other cell types, e.g., B-cells, NK cells. In some embodiments, the T-cell active promoter is derived from a promoter for a gene encoding a component of the T-cell receptor, e.g., TRAC, TRBC, TRGC, TRDC. In some embodiments, the T-cell active promoter is derived from a promoter for a gene encoding a component of a T-cell-specific cluster of differentiation protein, e.g., CD3, e.g., CD3D, CD3E, CD3G, CD3Z. In some embodiments, T-cell-specific promoters in gene modifying systems are discovered by comparing publicly available gene expression data across cell types and selecting promoters from the genes with enhanced expression in T-cells. In some embodiments, promoters may be selected depending on the desired expression breadth, e.g., promoters that are active in T-cells only, promoters that are active in NK cells only, or promoters that are active in both T-cells and NK cells.
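As a non-limiting sketch of the comparison described above, the following Python example sorts candidate genes by expression breadth (T-cells only, NK cells only, or both) using a simple fold-enrichment rule. The gene names, expression values, cell-type labels, and thresholds are hypothetical placeholders introduced for this example; they do not represent measured data or a selection rule prescribed by this disclosure.

```python
from typing import Dict, List

# Hypothetical expression values (arbitrary units) for placeholder genes.
EXPRESSION: Dict[str, Dict[str, float]] = {
    "GENE_A": {"T": 120.0, "NK": 2.0, "B": 1.0, "Monocyte": 0.5},
    "GENE_B": {"T": 3.0, "NK": 90.0, "B": 1.0, "Monocyte": 2.0},
    "GENE_C": {"T": 80.0, "NK": 70.0, "B": 2.0, "Monocyte": 1.0},
    "GENE_D": {"T": 50.0, "NK": 40.0, "B": 45.0, "Monocyte": 60.0},
}

# Desired expression breadths and the cell types each one must cover.
BREADTHS: Dict[str, List[str]] = {
    "T only": ["T"],
    "NK only": ["NK"],
    "T and NK": ["T", "NK"],
}


def is_restricted_to(expr: Dict[str, float], wanted: List[str],
                     min_expr: float = 10.0, min_fold: float = 5.0) -> bool:
    """True if every wanted cell type is expressed at >= min_expr and at least
    min_fold above the highest-expressing cell type outside `wanted`."""
    others = [value for cell, value in expr.items() if cell not in wanted]
    background = max(others, default=0.0)
    lowest_wanted = min(expr[cell] for cell in wanted)
    return lowest_wanted >= min_expr and lowest_wanted >= min_fold * background


def select_promoter_candidates(expression: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Group genes whose promoters may be candidates for each desired breadth."""
    hits: Dict[str, List[str]] = {label: [] for label in BREADTHS}
    for gene, expr in expression.items():
        for label, wanted in BREADTHS.items():
            if is_restricted_to(expr, wanted):
                hits[label].append(gene)
    return hits


result = select_promoter_candidates(EXPRESSION)
print(result)
```

A gene expressed broadly across all cell types (GENE_D in this placeholder data) falls into none of the groups, which reflects the intent of excluding promoters lacking the desired restriction.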


Cell-specific promoters known in the art may be used to direct expression of a gene modifying protein, e.g., as described herein. Nonlimiting exemplary mammalian cell-specific promoters have been characterized and used in mice expressing Cre recombinase in a cell-specific manner. Certain nonlimiting exemplary mammalian cell-specific promoters are listed in Table 1 of U.S. Pat. No. 9,845,481, incorporated herein by reference.


In some embodiments, a vector as described herein comprises an expression cassette. Typically, an expression cassette comprises the nucleic acid molecule of the instant invention operatively linked to a promoter sequence. For example, a promoter is operatively linked with a coding sequence when it is capable of affecting the expression of that coding sequence (e.g., the coding sequence is under the transcriptional control of the promoter). Encoding sequences can be operatively linked to regulatory sequences in sense or antisense orientation. In certain embodiments, the promoter is a heterologous promoter. In certain embodiments, an expression cassette may comprise additional elements, for example, an intron, an enhancer, a polyadenylation site, a woodchuck response element (WRE), and/or other elements known to affect expression levels of the encoding sequence. A promoter typically controls the expression of a coding sequence or functional RNA. In certain embodiments, a promoter sequence comprises proximal and more distal upstream elements and can further comprise an enhancer element. An enhancer can typically stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue-specificity of a promoter. In certain embodiments, the promoter is derived in its entirety from a native gene. In certain embodiments, the promoter is composed of different elements derived from different naturally occurring promoters. In certain embodiments, the promoter comprises a synthetic nucleotide sequence. It will be understood by those skilled in the art that different promoters will direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions or to the presence or the absence of a drug or transcriptional co-factor. 
Ubiquitous, cell-type-specific, tissue-specific, developmental stage-specific, and conditional promoters, for example, drug-responsive promoters (e.g., tetracycline-responsive promoters), are well known to those of skill in the art. Exemplary promoters include, but are not limited to, the phosphoglycerate kinase (PGK) promoter, CAG (a composite of the CMV enhancer, the chicken beta actin promoter (CBA), and the rabbit beta globin intron), NSE (neuronal specific enolase), synapsin or NeuN promoters, the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late promoter (Ad MLP), a herpes simplex virus (HSV) promoter, a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE), SFFV promoter, Rous sarcoma virus (RSV) promoter, synthetic promoters, hybrid promoters, and the like. Other promoters can be of human origin or from other species, including from mice. Common promoters include, e.g., the human cytomegalovirus (CMV) immediate early gene promoter, the SV40 early promoter, the Rous sarcoma virus long terminal repeat, β-actin, rat insulin promoter, the phosphoglycerate kinase promoter, the human alpha-1 antitrypsin (hAAT) promoter, the transthyretin promoter, the TBG promoter and other liver-specific promoters, the desmin promoter and similar muscle-specific promoters, the EF1-alpha promoter, hybrid promoters with multi-tissue specificity, promoters specific for neurons like synapsin, and the glyceraldehyde-3-phosphate dehydrogenase promoter, all of which are promoters well known and readily available to those of skill in the art and can be used to obtain high-level expression of the coding sequence of interest. In addition, sequences derived from non-viral genes, such as the murine metallothionein gene, will also find use herein. Such promoter sequences are commercially available from, e.g., Stratagene (San Diego, CA).
Additional exemplary promoter sequences are described, for example, in WO2018213786A1 (incorporated by reference herein in its entirety).


In some embodiments, the apolipoprotein E enhancer (ApoE) or a functional fragment thereof is used, e.g., to drive expression in the liver. In some embodiments, two copies of the ApoE enhancer or a functional fragment thereof are used. In some embodiments, the ApoE enhancer or functional fragment thereof is used in combination with a promoter, e.g., the human alpha-1 antitrypsin (hAAT) promoter.


In some embodiments, the regulatory sequences impart tissue-specific gene expression capabilities. In some cases, the tissue-specific regulatory sequences bind tissue-specific transcription factors that induce transcription in a tissue specific manner. Various tissue-specific regulatory sequences (e.g., promoters, enhancers, etc.) are known in the art. Exemplary tissue-specific regulatory sequences include, but are not limited to, the following tissue-specific promoters: a liver-specific thyroxin binding globulin (TBG) promoter, an insulin promoter, a glucagon promoter, a somatostatin promoter, a pancreatic polypeptide (PPY) promoter, a synapsin-1 (Syn) promoter, a creatine kinase (MCK) promoter, a mammalian desmin (DES) promoter, an α-myosin heavy chain (a-MHC) promoter, or a cardiac Troponin T (cTnT) promoter. Other exemplary promoters include a beta-actin promoter; a hepatitis B virus core promoter (Sandig et al., Gene Ther., 3:1002-9 (1996)); an alpha-fetoprotein (AFP) promoter (Arbuthnot et al., Hum. Gene Ther., 7:1503-14 (1996)); a bone osteocalcin promoter (Stein et al., Mol. Biol. Rep., 24:185-96 (1997)); a bone sialoprotein promoter (Chen et al., J. Bone Miner. Res., 11:654-64 (1996)); a CD2 promoter (Hansal et al., J. Immunol., 161:1063-8 (1998)); an immunoglobulin heavy chain promoter; a T cell receptor α-chain promoter; neuronal promoters such as a neuron-specific enolase (NSE) promoter (Andersen et al., Cell. Mol. Neurobiol., 13:503-15 (1993)), a neurofilament light-chain gene promoter (Piccioli et al., Proc. Natl. Acad. Sci. USA, 88:5611-5 (1991)), and the neuron-specific vgf gene promoter (Piccioli et al., Neuron, 15:373-84 (1995)); and others. Additional exemplary promoter sequences are described, for example, in U.S. Pat. No. 10,300,146 (incorporated herein by reference in its entirety).
In some embodiments, a tissue-specific regulatory element, e.g., a tissue-specific promoter, is selected from one known to be operably linked to a gene that is highly expressed in a given tissue, e.g., as measured by RNA-seq or protein expression data, or a combination thereof. Methods for analyzing tissue specificity by expression are taught in Fagerberg et al. Mol Cell Proteomics 13(2):397-406 (2014), which is incorporated herein by reference in its entirety.
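For illustration, tissue specificity of a candidate gene might be summarized from expression data with a single score. The Python sketch below computes the tau index, one commonly used specificity metric, which ranges from 0 (uniform expression) to 1 (expression in a single tissue). The use of tau here is an assumption made for this example and is not asserted to be the method of the reference cited above; the function name and input values are likewise hypothetical.

```python
from typing import Sequence


def tau_index(values: Sequence[float]) -> float:
    """Tau specificity index: sum(1 - x_i / x_max) / (n - 1).

    `values` holds one non-negative expression value per tissue.
    Returns 0.0 when the gene is not expressed in any tissue.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")
    x_max = max(values)
    if x_max == 0:
        return 0.0
    return sum(1.0 - v / x_max for v in values) / (n - 1)


# Hypothetical expression profiles across four tissues (arbitrary units).
single_tissue = tau_index([100.0, 0.0, 0.0, 0.0])
uniform = tau_index([5.0, 5.0, 5.0, 5.0])
print(single_tissue, uniform)
```

Under this sketch, promoters of genes scoring near 1 with peak expression in the tissue of interest would be the more plausible starting points for a tissue-specific regulatory element.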


In some embodiments, a vector described herein is a multicistronic expression construct. Multicistronic expression constructs include, for example, constructs harboring a first expression cassette, e.g. comprising a first promoter and a first encoding nucleic acid sequence, and a second expression cassette, e.g. comprising a second promoter and a second encoding nucleic acid sequence. Such multicistronic expression constructs may, in some instances, be particularly useful in the delivery of non-translated gene products, such as hairpin RNAs, together with a polypeptide, for example, a gene modifying polypeptide and gene modifying template. In some embodiments, multicistronic expression constructs may exhibit reduced expression levels of one or more of the included transgenes, for example, because of promoter interference or the presence of incompatible nucleic acid elements in close proximity. If a multicistronic expression construct is part of a viral vector, the presence of a self-complementary nucleic acid sequence may, in some instances, interfere with the formation of structures necessary for viral reproduction or packaging.


In some embodiments, the sequence encodes an RNA with a hairpin. In some embodiments, the hairpin RNA is a guide RNA, a template RNA, a shRNA, or a microRNA. In some embodiments, the first promoter is an RNA polymerase I promoter. In some embodiments, the first promoter is an RNA polymerase II promoter. In some embodiments, the second promoter is an RNA polymerase III promoter. In some embodiments, the second promoter is a U6 or H1 promoter.
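As a rough sketch of the two-cassette arrangement described above, the following Python example models each expression cassette as a small record and flags promoter and product pairings that would generally be unexpected, e.g., a polypeptide-coding sequence not driven by an RNA polymerase II promoter, or a U6 or H1 promoter annotated with a polymerase other than RNA polymerase III. The class, field names, and example construct are hypothetical and serve only to illustrate the arrangement.

```python
from dataclasses import dataclass
from typing import List

# U6 and H1 are named in the text as second-promoter options (RNA polymerase III).
POL_III_PROMOTERS = {"U6", "H1"}


@dataclass
class Cassette:
    promoter: str
    polymerase: str      # "Pol I", "Pol II", or "Pol III"
    product: str
    product_kind: str    # "polypeptide" or "hairpin RNA"


def check_construct(cassettes: List[Cassette]) -> List[str]:
    """Return human-readable warnings; an empty list means none were found."""
    warnings: List[str] = []
    for index, cassette in enumerate(cassettes, start=1):
        if cassette.product_kind == "polypeptide" and cassette.polymerase != "Pol II":
            warnings.append(
                f"cassette {index}: polypeptide-coding sequences are generally "
                f"transcribed by Pol II, not {cassette.polymerase}"
            )
        if cassette.promoter in POL_III_PROMOTERS and cassette.polymerase != "Pol III":
            warnings.append(
                f"cassette {index}: {cassette.promoter} is a Pol III promoter"
            )
    return warnings


# Hypothetical construct: a Pol II cassette for the polypeptide and a U6 cassette
# for a hairpin-containing template RNA.
construct = [
    Cassette("EF1-alpha", "Pol II", "gene modifying polypeptide", "polypeptide"),
    Cassette("U6", "Pol III", "template RNA", "hairpin RNA"),
]
print(check_construct(construct))
```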


Without wishing to be bound by theory, multicistronic expression constructs may not achieve optimal expression levels as compared to expression systems containing only one cistron. One of the suggested causes of lower expression levels achieved with multicistronic expression constructs comprising two or more promoter elements is the phenomenon of promoter interference (see, e.g., Curtin J A, Dane A P, Swanson A, Alexander I E, Ginn S L. Bidirectional promoter interference between two widely used internal heterologous promoters in a late-generation lentiviral construct. Gene Ther. 2008 March; 15(5):384-90; and Martin-Duque P, Jezzard S, Kaftansis L, Vassaux G. Direct comparison of the insulating properties of two genetic elements in an adenoviral vector containing two different expression cassettes. Hum Gene Ther. 2004 October; 15(10):995-1002; both references incorporated herein by reference for disclosure of promoter interference phenomenon). In some embodiments, the problem of promoter interference may be overcome, e.g., by producing multicistronic expression constructs comprising only one promoter driving transcription of multiple encoding nucleic acid sequences separated by internal ribosomal entry sites, or by separating cistrons comprising their own promoter with transcriptional insulator elements. In some embodiments, single-promoter driven expression of multiple cistrons may result in uneven expression levels of the cistrons. In some embodiments, a promoter cannot efficiently be isolated and isolation elements may not be compatible with some gene transfer vectors, for example, some retroviral vectors.


MicroRNAs


MicroRNAs (miRNAs) and other small interfering nucleic acids generally regulate gene expression via target RNA transcript cleavage/degradation or translational repression of the target messenger RNA (mRNA). miRNAs may, in some instances, be natively expressed, typically as final 19-25 nucleotide non-translated RNA products. miRNAs generally exert their activity through sequence-specific interactions with the 3′ untranslated regions (UTR) of target mRNAs. These endogenously expressed miRNAs may form hairpin precursors that are subsequently processed into an miRNA duplex, and further into a mature single stranded miRNA molecule. This mature miRNA generally guides a multiprotein complex, miRISC, which identifies target 3′ UTR regions of target mRNAs based upon their complementarity to the mature miRNA. Useful transgene products may include, for example, miRNAs or miRNA binding sites that regulate the expression of a linked polypeptide. A non-limiting list of miRNA genes, the products of which (and their homologues) are useful as transgenes or as targets for small interfering nucleic acids (e.g., miRNA sponges, antisense oligonucleotides), e.g., in methods such as those listed in U.S. Pat. No. 10,300,146, 22:25-25:48, is incorporated herein by reference. In some embodiments, one or more binding sites for one or more of the foregoing miRNAs are incorporated in a transgene, e.g., a transgene delivered by a rAAV vector, e.g., to inhibit the expression of the transgene in one or more tissues of an animal harboring the transgene. In some embodiments, a binding site may be selected to control the expression of a transgene in a tissue specific manner. For example, binding sites for the liver-specific miR-122 may be incorporated into a transgene to inhibit expression of that transgene in the liver. Additional exemplary miRNA sequences are described, for example, in U.S. Pat. No. 10,300,146 (incorporated herein by reference in its entirety).


An miR inhibitor or miRNA inhibitor is generally an agent that blocks miRNA expression and/or processing. Examples of such agents include, but are not limited to, microRNA antagonists, microRNA specific antisense, microRNA sponges, and microRNA oligonucleotides (double-stranded, hairpin, short oligonucleotides) that inhibit miRNA interaction with a Drosha complex. MicroRNA inhibitors, e.g., miRNA sponges, can be expressed in cells from transgenes (e.g., as described in Ebert, M. S. Nature Methods, Epub Aug. 12, 2007; incorporated by reference herein in its entirety). In some embodiments, microRNA sponges, or other miR inhibitors, are used with the AAVs. microRNA sponges generally specifically inhibit miRNAs through a complementary heptameric seed sequence. In some embodiments, an entire family of miRNAs can be silenced using a single sponge sequence. Other methods for silencing miRNA function (derepression of miRNA targets) in cells will be apparent to one of ordinary skill in the art.
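To illustrate how a single sponge sequence may address a family of miRNAs, the Python sketch below groups miRNAs by their heptameric seed, taken here as nucleotides 2 to 8 of the mature sequence (a common convention, assumed for this example), and derives the seed-complementary site that a sponge would carry for each family. The miRNA names and sequences are hypothetical placeholders, and the function names are not part of this disclosure.

```python
from collections import defaultdict
from typing import Dict, List

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_of(mirna: str) -> str:
    """Return the heptameric seed (nucleotides 2-8, 1-based) of a mature miRNA."""
    seq = mirna.upper().replace("T", "U")
    if len(seq) < 8:
        raise ValueError("mature miRNA sequence is too short to contain a seed")
    return seq[1:8]


def seed_match_site(seed: str) -> str:
    """Return the sequence (5'->3') complementary to a seed, as carried by a sponge."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seed))


def group_by_seed(mirnas: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each seed to the names of the miRNAs that share it."""
    families: Dict[str, List[str]] = defaultdict(list)
    for name, seq in mirnas.items():
        families[seed_of(seq)].append(name)
    return dict(families)


# Hypothetical mature miRNA sequences; hyp-miR-a and hyp-miR-b share a seed.
MIRNAS = {
    "hyp-miR-a": "UGACCGAUUCGGAUCCAUGCAA",
    "hyp-miR-b": "AGACCGAUAAGCUUGGCAUCGU",
    "hyp-miR-c": "CUUAGGCAUCGAUCGGAUACGA",
}

families = group_by_seed(MIRNAS)
for seed, members in families.items():
    print(seed, seed_match_site(seed), members)
```

In this placeholder data, one seed-complementary site covers both hyp-miR-a and hyp-miR-b, while hyp-miR-c would require a separate site.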


In some embodiments, a gene modifying system, template RNA, or polypeptide described herein is administered to or is active in (e.g., is more active in) a target tissue, e.g., a first tissue. In some embodiments, the gene modifying system, template RNA, or polypeptide is not administered to or is less active in (e.g., not active in) a non-target tissue. In some embodiments, a gene modifying system, template RNA, or polypeptide described herein is useful for modifying DNA in a target tissue, e.g., a first tissue, (e.g., and not modifying DNA in a non-target tissue).


In some embodiments, a gene modifying system comprises (a) a polypeptide described herein or a nucleic acid encoding the same, (b) a template nucleic acid (e.g., template RNA) described herein, and (c) one or more first tissue-specific expression-control sequences specific to the target tissue, wherein the one or more first tissue-specific expression-control sequences specific to the target tissue are in operative association with (a), (b), or (a) and (b), wherein, when associated with (a), (a) comprises a nucleic acid encoding the polypeptide.


In some embodiments, the nucleic acid in (b) comprises RNA.


In some embodiments, the nucleic acid in (b) comprises DNA.


In some embodiments, the nucleic acid in (b): (i) is single-stranded or comprises a single-stranded segment, e.g., is single-stranded DNA or comprises a single-stranded segment and one or more double stranded segments; (ii) has inverted terminal repeats; or (iii) both (i) and (ii).


In some embodiments, the nucleic acid in (b) is double-stranded or comprises a double-stranded segment.


In some embodiments, (a) comprises a nucleic acid encoding the polypeptide.


In some embodiments, the nucleic acid in (a) comprises RNA.


In some embodiments, the nucleic acid in (a) comprises DNA.


In some embodiments, the nucleic acid in (a): (i) is single-stranded or comprises a single-stranded segment, e.g., is single-stranded DNA or comprises a single-stranded segment and one or more double stranded segments; (ii) has inverted terminal repeats; or (iii) both (i) and (ii).


In some embodiments, the nucleic acid in (a) is double-stranded or comprises a double-stranded segment.


In some embodiments, the nucleic acid in (a), (b), or (a) and (b) is linear.


In some embodiments, the nucleic acid in (a), (b), or (a) and (b) is circular, e.g., a plasmid or minicircle.


In some embodiments, the heterologous object sequence is in operative association with a first promoter.


In some embodiments, the one or more first tissue-specific expression-control sequences comprise a tissue-specific promoter.


In some embodiments, the tissue-specific promoter comprises a first promoter in operative association with: (i) the heterologous object sequence, (ii) a nucleic acid encoding the retroviral RT, or (iii) (i) and (ii).


In some embodiments, the one or more first tissue-specific expression-control sequences comprise a tissue-specific microRNA recognition sequence in operative association with: (i) the heterologous object sequence, (ii) a nucleic acid encoding the retroviral RT domain, or (iii) (i) and (ii).


In some embodiments, a system comprises a tissue-specific promoter, and the system further comprises one or more tissue-specific microRNA recognition sequences, wherein: (i) the tissue-specific promoter is in operative association with: (I) the heterologous object sequence, (II) a nucleic acid encoding the retroviral RT domain, or (III) (I) and (II); and/or (ii) the one or more tissue-specific microRNA recognition sequences are in operative association with: (I) the heterologous object sequence, (II) a nucleic acid encoding the retroviral RT, or (III) (I) and (II).


In some embodiments, wherein (a) comprises a nucleic acid encoding the polypeptide, the nucleic acid comprises a promoter in operative association with the nucleic acid encoding the polypeptide.


In some embodiments, the nucleic acid encoding the polypeptide comprises one or more second tissue-specific expression-control sequences specific to the target tissue in operative association with the polypeptide coding sequence.


In some embodiments, the one or more second tissue-specific expression-control sequences comprise a tissue-specific promoter.


In some embodiments, the tissue-specific promoter is the promoter in operative association with the nucleic acid encoding the polypeptide.


In some embodiments, the one or more second tissue-specific expression-control sequences comprise a tissue-specific microRNA recognition sequence.


In some embodiments, the promoter in operative association with the nucleic acid encoding the polypeptide is a tissue-specific promoter, the system further comprising one or more tissue-specific microRNA recognition sequences.


In some embodiments, a nucleic acid component of a system provided by the invention is a sequence (e.g., encoding the polypeptide or comprising a heterologous object sequence) flanked by untranslated regions (UTRs) that modify protein expression levels. Various 5′ and 3′ UTRs can affect protein expression. For example, in some embodiments, the coding sequence may be preceded by a 5′ UTR that modifies RNA stability or protein translation. In some embodiments, the sequence may be followed by a 3′ UTR that modifies RNA stability or translation. In some embodiments, the sequence may be preceded by a 5′ UTR and followed by a 3′ UTR that modify RNA stability or translation. In some embodiments, the 5′ and/or 3′ UTR may be selected from the 5′ and 3′ UTRs of complement factor 3 (C3) (CACTCCTCCCCATCCTCTCCCTCTGTCCCTCTGTCCCTCTGACCCTGCACTGTCCCAGCACC; SEQ ID NO: 11,004) or orosomucoid 1 (ORM1) (CAGGACACAGCCTTGGATCAGGACAGAGACTTGGGGGCCATCCTGCCCCTCCAACCCGACATGTGTACCTCAGCTTTTTCCCTCACTTGCATCAATAAAGCTTCTGTGTTTGGAACAGCTAA; SEQ ID NO: 11,005) (Asrani et al. RNA Biology 2018). In certain embodiments, the 5′ UTR is the 5′ UTR from C3 and the 3′ UTR is the 3′ UTR from ORM1. In certain embodiments, a 5′ UTR and 3′ UTR for protein expression, e.g., mRNA (or DNA encoding the RNA) for a gene modifying polypeptide or heterologous object sequence, comprise optimized expression sequences. In some embodiments, the 5′ UTR comprises GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC (SEQ ID NO: 11,006) and/or the 3′ UTR comprises UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGA (SEQ ID NO: 11,007), e.g., as described in Richner et al. Cell 168(6): P1114-1125 (2017), the sequences of which are incorporated herein by reference. In some embodiments, a 5′ and/or 3′ UTR may be selected to enhance protein expression.


In some embodiments, a 5′ and/or 3′ UTR may be selected to modify protein expression such that overproduction inhibition is minimized. In some embodiments, UTRs are around a coding sequence, e.g., outside the coding sequence and in other embodiments proximal to the coding sequence. In some embodiments, additional regulatory elements (e.g., miRNA binding sites, cis-regulatory sites) are included in the UTRs.


In some embodiments, an open reading frame of a gene modifying system, e.g., an ORF of an mRNA (or DNA encoding an mRNA) encoding a gene modifying polypeptide or one or more ORFs of an mRNA (or DNA encoding an mRNA) of a heterologous object sequence, is flanked by a 5′ and/or 3′ untranslated region (UTR) that enhances the expression thereof. In some embodiments, the 5′ UTR of an mRNA component (or transcript produced from a DNA component) of the system comprises the sequence 5′-GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC-3′ (SEQ ID NO: 11,008). In some embodiments, the 3′ UTR of an mRNA component (or transcript produced from a DNA component) of the system comprises the sequence 5′-UGAUAAUAGGCUGGAGCCUCGGUGGCCAUGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGA-3′ (SEQ ID NO: 11,009). This combination of 5′ UTR and 3′ UTR has been shown to result in desirable expression of an operably linked ORF by Richner et al. Cell 168(6): P1114-1125 (2017), the teachings and sequences of which are incorporated herein by reference. In some embodiments, a system described herein comprises a DNA encoding a transcript, wherein the DNA comprises the corresponding 5′ UTR and 3′ UTR sequences (with T substituting for U in the above-listed sequences). In some embodiments, a DNA vector used to produce an RNA component of the system further comprises a promoter upstream of the 5′ UTR for initiating in vitro transcription, e.g., a T7, T3, or SP6 promoter. The 5′ UTR above begins with GGG, which is a suitable start for optimizing transcription using T7 RNA polymerase. For tuning transcription levels and altering the transcription start site nucleotides to fit alternative 5′ UTRs, the teachings of Davidson et al. Pac Symp Biocomput 433-443 (2010) describe T7 promoter variants, and the methods of discovery thereof, that fulfill both of these traits.
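
By way of illustration only, deriving a DNA template from the RNA sequences above can be sketched in Python. The sketch uses the 5′ UTR exactly as listed above; the open reading frame is a made-up placeholder, the T7 promoter core shown is the commonly cited consensus (an assumption that is not taken from this disclosure), and all function names are hypothetical.

```python
# Illustrative sketch only.
# UTR5_RNA is the 5' UTR listed in the text above.
# Assumptions (not from the text): the placeholder ORF and the T7 promoter
# core "TAATACGACTCACTATA", which is the commonly cited consensus upstream
# of the +1 G.
UTR5_RNA = "GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGAGCCACC"
PLACEHOLDER_ORF_RNA = "AUGGCCUAA"       # hypothetical, for illustration
T7_PROMOTER_CORE = "TAATACGACTCACTATA"  # assumed consensus, positions -17 to -1


def clean(seq: str) -> str:
    """Uppercase a sequence and drop whitespace left over from line wrapping."""
    return "".join(seq.split()).upper()


def rna_to_dna(seq: str) -> str:
    """Return the DNA form of an RNA sequence (T substituted for U)."""
    return clean(seq).replace("U", "T")


def starts_with_ggg(seq: str) -> bool:
    """Check that the transcript begins with GGG, as noted for T7 transcription."""
    return clean(seq).startswith("GGG")


def ivt_template(utr5_rna: str, orf_rna: str, utr3_rna: str = "") -> str:
    """Promoter followed by the DNA equivalents of 5' UTR, ORF, and 3' UTR."""
    if not starts_with_ggg(utr5_rna):
        raise ValueError("5' UTR does not begin with GGG")
    return T7_PROMOTER_CORE + "".join(
        rna_to_dna(part) for part in (utr5_rna, orf_rna, utr3_rna)
    )


if __name__ == "__main__":
    print(ivt_template(UTR5_RNA, PLACEHOLDER_ORF_RNA))
```

A 5′ UTR that does not begin with GGG is rejected in this sketch rather than silently altered, since the text above contemplates changing the promoter variant, not the UTR, in that case.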


Viral Vectors and Components Thereof

Viruses are a useful source of delivery vehicles for the systems described herein, in addition to a source of relevant enzymes or domains as described herein, e.g., as sources of polymerases and polymerase functions used herein, e.g., DNA-dependent DNA polymerase, RNA-dependent RNA polymerase, RNA-dependent DNA polymerase, DNA-dependent RNA polymerase, reverse transcriptase. Some enzymes, e.g., reverse transcriptases, may have multiple activities, e.g., be capable of both RNA-dependent DNA polymerization and DNA-dependent DNA polymerization, e.g., first and second strand synthesis. In some embodiments, the virus used as a gene modifying delivery system or a source of components thereof may be selected from a group as described by Baltimore Bacteriol Rev 35(3):235-241 (1971).


In some embodiments, the virus is selected from a Group I virus, e.g., is a DNA virus and packages dsDNA into virions. In some embodiments, the Group I virus is selected from, e.g., Adenoviruses, Herpesviruses, Poxviruses.


In some embodiments, the virus is selected from a Group II virus, e.g., is a DNA virus and packages ssDNA into virions. In some embodiments, the Group II virus is selected from, e.g., Parvoviruses. In some embodiments, the parvovirus is a dependoparvovirus, e.g., an adeno-associated virus (AAV).


In some embodiments, the virus is selected from a Group III virus, e.g., is an RNA virus and packages dsRNA into virions. In some embodiments, the Group III virus is selected from, e.g., Reoviruses. In some embodiments, one or both strands of the dsRNA contained in such virions is a coding molecule able to serve directly as mRNA upon transduction into a host cell, e.g., can be directly translated into protein upon transduction into a host cell without requiring any intervening nucleic acid replication or polymerization steps.


In some embodiments, the virus is selected from a Group IV virus, e.g., is an RNA virus and packages ssRNA(+) into virions. In some embodiments, the Group IV virus is selected from, e.g., Coronaviruses, Picornaviruses, Togaviruses. In some embodiments, the ssRNA(+) contained in such virions is a coding molecule able to serve directly as mRNA upon transduction into a host cell, e.g., can be directly translated into protein upon transduction into a host cell without requiring any intervening nucleic acid replication or polymerization steps.


In some embodiments, the virus is selected from a Group V virus, e.g., is an RNA virus and packages ssRNA(−) into virions. In some embodiments, the Group V virus is selected from, e.g., Orthomyxoviruses, Rhabdoviruses. In some embodiments, an RNA virus with an ssRNA(−) genome also carries an enzyme inside the virion that is transduced to host cells with the viral genome, e.g., an RNA-dependent RNA polymerase, capable of copying the ssRNA(−) into ssRNA(+) that can be translated directly by the host.


In some embodiments, the virus is selected from a Group VI virus, e.g., is a retrovirus and packages ssRNA(+) into virions. In some embodiments, the Group VI virus is selected from, e.g., retroviruses. In some embodiments, the retrovirus is a lentivirus, e.g., HIV-1, HIV-2, SIV, BIV. In some embodiments, the retrovirus is a spumavirus, e.g., a foamy virus, e.g., HFV, SFV, BFV. In some embodiments, the ssRNA(+) contained in such virions is a coding molecule able to serve directly as mRNA upon transduction into a host cell, e.g., can be directly translated into protein upon transduction into a host cell without requiring any intervening nucleic acid replication or polymerization steps. In some embodiments, the ssRNA(+) is first reverse transcribed and copied to generate a dsDNA genome intermediate from which mRNA can be transcribed in the host cell. In some embodiments, an RNA virus with an ssRNA(+) genome also carries an enzyme inside the virion that is transduced to host cells with the viral genome, e.g., an RNA-dependent DNA polymerase, capable of copying the ssRNA(+) into dsDNA that can be transcribed into mRNA and translated by the host. In some embodiments, the reverse transcriptase from a Group VI retrovirus is incorporated as the reverse transcriptase domain of a gene modifying polypeptide.


In some embodiments, the virus is selected from a Group VII virus, e.g., is a retrovirus and packages dsRNA into virions. In some embodiments, the Group VII virus is selected from, e.g., Hepadnaviruses. In some embodiments, one or both strands of the dsRNA contained in such virions is a coding molecule able to serve directly as mRNA upon transduction into a host cell, e.g., can be directly translated into protein upon transduction into a host cell without requiring any intervening nucleic acid replication or polymerization steps. In some embodiments, one or both strands of the dsRNA contained in such virions is first reverse transcribed and copied to generate a dsDNA genome intermediate from which mRNA can be transcribed in the host cell. In some embodiments, an RNA virus with a dsRNA genome also carries an enzyme inside the virion that is transduced to host cells with the viral genome, e.g., an RNA-dependent DNA polymerase, capable of copying the dsRNA into dsDNA that can be transcribed into mRNA and translated by the host. In some embodiments, the reverse transcriptase from a Group VII retrovirus is incorporated as the reverse transcriptase domain of a gene modifying polypeptide.


In some embodiments, virions used to deliver nucleic acid in this invention may also carry enzymes involved in the process of gene modification. For example, a retroviral virion may contain a reverse transcriptase domain that is delivered into a host cell along with the nucleic acid. In some embodiments, an RNA template may be associated with a gene modifying polypeptide within a virion, such that both are co-delivered to a target cell upon transduction of the nucleic acid from the viral particle. In some embodiments, the nucleic acid in a virion may comprise DNA, e.g., linear ssDNA, linear dsDNA, circular ssDNA, circular dsDNA, minicircle DNA, dbDNA, ceDNA. In some embodiments, the nucleic acid in a virion may comprise RNA, e.g., linear ssRNA, linear dsRNA, circular ssRNA, circular dsRNA. In some embodiments, a viral genome may circularize upon transduction into a host cell, e.g., a linear ssRNA molecule may undergo a covalent linkage to form a circular ssRNA, a linear dsRNA molecule may undergo a covalent linkage to form a circular dsRNA or one or more circular ssRNA. In some embodiments, a viral genome may replicate by rolling circle replication in a host cell. In some embodiments, a viral genome may comprise a single nucleic acid molecule, e.g., comprise a non-segmented genome. In some embodiments, a viral genome may comprise two or more nucleic acid molecules, e.g., comprise a segmented genome. In some embodiments, a nucleic acid in a virion may be associated with one or more proteins. In some embodiments, one or more proteins in a virion may be delivered to a host cell upon transduction. In some embodiments, a natural virus may be adapted for nucleic acid delivery by the addition of virion packaging signals to the target nucleic acid, wherein a host cell is used to package the target nucleic acid containing the packaging signals.


In some embodiments, a virion used as a delivery vehicle may comprise a commensal human virus. In some embodiments, a virion used as a delivery vehicle may comprise an anellovirus, the use of which is described in WO2018232017A1, which is incorporated herein by reference in its entirety.


AAV Administration

In some embodiments, an adeno-associated virus (AAV) is used in conjunction with the system, template nucleic acid, and/or polypeptide described herein. In some embodiments, an AAV is used to deliver, administer, or package the system, template nucleic acid, and/or polypeptide described herein. In some embodiments, the AAV is a recombinant AAV (rAAV).


In some embodiments, a system comprises (a) a polypeptide described herein or a nucleic acid encoding the same, (b) a template nucleic acid (e.g., template RNA) described herein, and (c) one or more first tissue-specific expression-control sequences specific to the target tissue, wherein the one or more first tissue-specific expression-control sequences specific to the target tissue are in operative association with (a), (b), or (a) and (b), wherein, when associated with (a), (a) comprises a nucleic acid encoding the polypeptide.


In some embodiments, a system described herein further comprises a first recombinant adeno-associated virus (rAAV) capsid protein, wherein at least one of (a) or (b) is associated with the first rAAV capsid protein, and wherein the at least one of (a) or (b) is flanked by AAV inverted terminal repeats (ITRs).


In some embodiments, (a) and (b) are associated with the first rAAV capsid protein.


In some embodiments, (a) and (b) are on a single nucleic acid.


In some embodiments, the system further comprises a second rAAV capsid protein, wherein at least one of (a) or (b) is associated with the second rAAV capsid protein, and wherein the at least one of (a) or (b) associated with the second rAAV capsid protein is different from the at least one of (a) or (b) that is associated with the first rAAV capsid protein.


In some embodiments, the at least one of (a) or (b) that is associated with the first or second rAAV capsid protein is dispersed in the interior of the first or second rAAV capsid protein, which first or second rAAV capsid protein is in the form of an AAV capsid particle.


In some embodiments, the system further comprises a nanoparticle, wherein the nanoparticle is associated with at least one of (a) or (b).


In some embodiments, (a) and (b), respectively are associated with: a) a first rAAV capsid protein and a second rAAV capsid protein; b) a nanoparticle and a first rAAV capsid protein; c) a first rAAV capsid protein; d) a first adenovirus capsid protein; e) a first nanoparticle and a second nanoparticle; or f) a first nanoparticle.


Viral vectors are useful for delivering all or part of a system provided by the invention, e.g., for use in methods provided by the invention. Systems derived from different viruses have been employed for the delivery of polypeptides or nucleic acids; for example: integrase-deficient lentivirus, adenovirus, adeno-associated virus (AAV), herpes simplex virus, and baculovirus (reviewed in Hodge et al. Hum Gene Ther 2017; Narayanavari et al. Crit Rev Biochem Mol Biol 2017; Boehme et al. Curr Gene Ther 2015).


Adenoviruses are common viruses that have been used as gene delivery vehicles given well-defined biology, genetic stability, high transduction efficiency, and ease of large-scale production (see, for example, review by Lee et al. Genes & Diseases 2017). They possess linear dsDNA genomes and come in a variety of serotypes that differ in tissue and cell tropisms. In order to prevent replication of infectious virus in recipient cells, adenovirus genomes used for packaging are deleted of some or all endogenous viral proteins, which are provided in trans in viral production cells. This renders the genomes helper-dependent, meaning they can only be replicated and packaged into viral particles in the presence of the missing components provided by so-called helper functions. A helper-dependent adenovirus system with all viral ORFs removed may be compatible with packaging foreign DNA of up to ~37 kb (Parks et al. J Virol 1997). In some embodiments, an adenoviral vector is used to deliver DNA corresponding to the polypeptide or template component of the gene modifying system, or both are contained on separate or the same adenoviral vector. In some embodiments, the adenovirus is a helper-dependent adenovirus (HD-AdV) that is incapable of self-packaging. In some embodiments, the adenovirus is a high-capacity adenovirus (HC-AdV) that has had all or a substantial portion of endogenous viral ORFs deleted, while retaining the necessary sequence components for packaging into adenoviral particles. For this type of vector, the only adenoviral sequences required for genome packaging are noncoding sequences: the inverted terminal repeats (ITRs) at both ends and the packaging signal at the 5′-end (Jager et al. Nat Protoc 2009). In some embodiments, the adenoviral genome also comprises stuffer DNA to meet a minimal genome size for optimal production and stability (see, for example, Hausl et al. Mol Ther 2010).
In some embodiments, an adenovirus is used to deliver a gene modifying system to the liver.


In some embodiments, an adenovirus is used to deliver a gene modifying system to HSCs, e.g., HDAd5/35++. HDAd5/35++ is an adenovirus with modified serotype 35 fibers that de-target the vector from the liver (Wang et al. Blood Adv 2019). In some embodiments, the adenovirus that delivers a gene modifying system to HSCs utilizes a receptor that is expressed specifically on primitive HSCs, e.g., CD46.


Adeno-associated viruses (AAV) belong to the parvoviridae family and more specifically constitute the dependoparvovirus genus. The AAV genome is composed of a linear single-stranded DNA molecule which contains approximately 4.7 kilobases (kb) and consists of two major open reading frames (ORFs) encoding the non-structural Rep (replication) and structural Cap (capsid) proteins. A second ORF within the cap gene was identified that encodes the assembly-activating protein (AAP). The DNAs flanking the AAV coding regions are two cis-acting inverted terminal repeat (ITR) sequences, approximately 145 nucleotides in length, with interrupted palindromic sequences that can be folded into energetically stable hairpin structures that function as primers of DNA replication. In addition to their role in DNA replication, the ITR sequences have been shown to be involved in viral DNA integration into the cellular genome, rescue from the host genome or plasmid, and encapsidation of viral nucleic acid into mature virions (Muzyczka, (1992) Curr. Top. Micro. Immunol. 158:97-129). In some embodiments, one or more gene modifying nucleic acid components is flanked by ITRs derived from AAV for viral packaging. See, e.g., WO2019113310.


In some embodiments, one or more components of the gene modifying system are carried via at least one AAV vector. In some embodiments, the at least one AAV vector is selected for tropism to a particular cell, tissue, or organism. In some embodiments, the AAV vector is pseudotyped, e.g., AAV2/8, wherein AAV2 describes the design of the construct but the capsid protein is replaced by that from AAV8. It is understood that any of the described vectors could be pseudotype derivatives, wherein the capsid protein used to package the AAV genome is derived from that of a different AAV serotype. Without wishing to be limited in vector choice, a list of exemplary AAV serotypes can be found in Table 18. In some embodiments, an AAV to be employed for gene modifying may be evolved for novel cell or tissue tropism as has been demonstrated in the literature (e.g., Davidsson et al. Proc Natl Acad Sci USA 2019).


In some embodiments, the AAV delivery vector is a vector which has two AAV inverted terminal repeats (ITRs) and a nucleotide sequence of interest (for example, a sequence coding for a gene modifying polypeptide or a DNA template, or both), each of said ITRs having an interrupted (or noncontiguous) palindromic sequence, i.e., a sequence composed of three segments: a first segment and a last segment that are identical when read 5′→3′ but hybridize when placed against each other, and a segment that is different that separates the identical segments. See, for example, WO2012123430.


Conventionally, AAV virions with capsids are produced by introducing a plasmid or plasmids encoding the rAAV or scAAV genome, Rep proteins, and Cap proteins (Grimm et al., 1998). Upon introduction of these helper plasmids in trans, the AAV genome is “rescued” (i.e., released and subsequently recovered) from the host genome, and is further encapsidated to produce infectious AAV. In some embodiments, one or more gene modifying nucleic acids are packaged into AAV particles by introducing the ITR-flanked nucleic acids into a packaging cell in conjunction with the helper functions.


In some embodiments, the AAV genome is a so-called self-complementary genome (referred to as scAAV), such that the sequence located between the ITRs contains both the desired nucleic acid sequence (e.g., DNA encoding the gene modifying polypeptide or template, or both) and the reverse complement of the desired nucleic acid sequence, such that these two components can fold over and self-hybridize. In some embodiments, the self-complementary modules are separated by an intervening sequence that permits the DNA to fold back on itself, e.g., forms a stem-loop. An scAAV has the advantage of being poised for transcription upon entering the nucleus, rather than being first dependent on ITR priming and second-strand synthesis to form dsDNA. In some embodiments, one or more gene modifying components is designed as an scAAV, wherein the sequence between the AAV ITRs contains two reverse complementing modules that can self-hybridize to create dsDNA.
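
By way of illustration only, the arrangement of a self-complementary insert (a module, an intervening sequence, and the reverse complement of the module) can be sketched in Python. The sequences are made-up placeholders and the function names are hypothetical; ITRs are omitted for brevity.

```python
# Illustrative sketch only: layout of the sequence between the ITRs of a
# self-complementary genome. All sequences below are made-up placeholders.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def self_complementary_insert(module: str, loop: str) -> str:
    """Module, then an intervening loop, then the module's reverse complement."""
    module = module.upper()
    return module + loop.upper() + reverse_complement(module)


def can_fold_back(insert: str, module_len: int) -> bool:
    """True if the first and last `module_len` bases can base-pair with each other."""
    head = insert[:module_len]
    tail = insert[-module_len:]
    return head == reverse_complement(tail)


if __name__ == "__main__":
    insert = self_complementary_insert("ATGGCC", "TTTT")
    print(insert, can_fold_back(insert, 6))
```

Note that the insert is roughly twice the length of the module, so the module length available in a self-complementary design is about half of what a single-stranded design of the same total length would allow.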


In some embodiments, nucleic acid (e.g., encoding a polypeptide, or a template, or both) delivered to cells is closed-ended, linear duplex DNA (CELiD DNA or ceDNA). In some embodiments, ceDNA is derived from the replicative form of the AAV genome (Li et al. PLoS One 2013). In some embodiments, the nucleic acid (e.g., encoding a polypeptide, or a template DNA, or both) is flanked by ITRs, e.g., AAV ITRs, wherein at least one of the ITRs comprises a terminal resolution site and a replication protein binding site (sometimes referred to as a replicative protein binding site). In some embodiments, the ITRs are derived from an adeno-associated virus, e.g., AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, or a combination thereof. In some embodiments, the ITRs are symmetric. In some embodiments, the ITRs are asymmetric. In some embodiments, at least one Rep protein is provided to enable replication of the construct. In some embodiments, the at least one Rep protein is derived from an adeno-associated virus, e.g., AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, or a combination thereof. In some embodiments, ceDNA is generated by providing a production cell with (i) DNA flanked by ITRs, e.g., AAV ITRs, and (ii) components required for ITR-dependent replication, e.g., AAV proteins Rep78 and Rep52 (or nucleic acid encoding the proteins). In some embodiments, ceDNA is free of any capsid protein, e.g., is not packaged into an infectious AAV particle. In some embodiments, ceDNA is formulated into LNPs (see, for example, WO2019051289A1).


In some embodiments, the ceDNA vector consists of two self-complementary sequences, e.g., asymmetrical or symmetrical or substantially symmetrical ITRs as defined herein, flanking said expression cassette, wherein the ceDNA vector is not associated with a capsid protein. In some embodiments, the ceDNA vector comprises two self-complementary sequences found in an AAV genome, where at least one ITR comprises an operative Rep-binding element (RBE) (also sometimes referred to herein as “RBS”) and a terminal resolution site (trs) of AAV or a functional variant of the RBE. See, for example, WO2019113310.


In some embodiments, the AAV genome comprises two genes that encode four replication proteins and three capsid proteins, respectively. In some embodiments, the genes are flanked on either side by 145-bp inverted terminal repeats (ITRs). In some embodiments, the virion comprises up to three capsid proteins (Vp1, Vp2, and/or Vp3), e.g., produced in a 1:1:10 ratio. In some embodiments, the capsid proteins are produced from the same open reading frame and/or from differential splicing (Vp1) and alternative translational start sites (Vp2 and Vp3, respectively). Generally, Vp3 is the most abundant subunit in the virion and participates in receptor recognition at the cell surface defining the tropism of the virus. In some embodiments, Vp1 comprises a phospholipase domain, e.g., which functions in viral infectivity, in the N-terminus of Vp1.


In some embodiments, packaging capacity of the viral vectors limits the size of the gene modifying system that can be packaged into the vector. For example, the packaging capacity of the AAVs can be about 4.5 kb (e.g., about 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, or 6.0 kb), e.g., including one or two inverted terminal repeats (ITRs), e.g., 145 base ITRs.


In some embodiments, recombinant AAV (rAAV) comprises cis-acting 145-bp ITRs flanking vector transgene cassettes, e.g., providing up to 4.5 kb for packaging of foreign DNA. Subsequent to infection, rAAV can, in some instances, express a fusion protein of the invention and persist without integration into the host genome by existing episomally in circular head-to-tail concatemers. rAAV can be used, for example, in vitro and in vivo. In some embodiments, AAV-mediated gene delivery requires that the length of the coding sequence of the gene is equal to or smaller in size than the wild-type AAV genome. AAV delivery of genes that exceed this size and/or the use of large physiological regulatory elements can be accomplished, for example, by dividing the protein(s) to be delivered into two or more fragments. In some embodiments, the N-terminal fragment is fused to an intein-N sequence. In some embodiments, the C-terminal fragment is fused to an intein-C sequence. In embodiments, the fragments are packaged into two or more AAV vectors.


In some embodiments, dual AAV vectors are generated by splitting a large transgene expression cassette in two separate halves (5′ and 3′ ends, or head and tail), e.g., wherein each half of the cassette is packaged in a single AAV vector (of <5 kb). The re-assembly of the full-length transgene expression cassette can, in some embodiments, then be achieved upon co-infection of the same cell by both dual AAV vectors. In some embodiments, co-infection is followed by one or more of: (1) homologous recombination (HR) between 5′ and 3′ genomes (dual AAV overlapping vectors); (2) ITR-mediated tail-to-head concatemerization of 5′ and 3′ genomes (dual AAV trans-splicing vectors); and/or (3) a combination of these two mechanisms (dual AAV hybrid vectors). In some embodiments, the use of dual AAV vectors in vivo results in the expression of full-length proteins. In some embodiments, the use of the dual AAV vector platform represents an efficient and viable gene transfer strategy for transgenes of greater than about 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, or 5.0 kb in size. In some embodiments, AAV vectors can also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides. In some embodiments, AAV vectors can be used for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994); each of which is incorporated herein by reference in its entirety). The construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin, et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat & Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:3822-3828 (1989) (incorporated by reference herein in their entirety).
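
By way of illustration only, splitting a cassette into overlapping 5′ and 3′ halves and rejoining them at the shared region (as in the dual AAV overlapping approach) can be sketched in Python. The cassette is a randomly generated placeholder, the overlap length is arbitrary, the per-vector limit reflects the "<5 kb" figure above, and the function names are hypothetical. The sketch models only sequence bookkeeping, not the biology of recombination.

```python
# Illustrative sketch only: sequence bookkeeping for a dual-vector split.
# The cassette and the overlap length are made-up; the <5 kb per-half limit
# is the figure given in the text above.
import random

MAX_HALF_BP = 5000


def split_cassette(cassette: str, overlap: int = 0):
    """Split at the midpoint; both halves share `overlap` bases around it."""
    mid = len(cassette) // 2
    left_extra = overlap // 2
    right_extra = overlap - left_extra
    if overlap < 0 or mid - right_extra < 0 or mid + left_extra > len(cassette):
        raise ValueError("overlap is not compatible with the cassette length")
    return cassette[: mid + left_extra], cassette[mid - right_extra :]


def reassemble(five_prime: str, three_prime: str, overlap: int) -> str:
    """Rejoin the halves, requiring the shared region to match exactly."""
    if overlap and five_prime[-overlap:] != three_prime[:overlap]:
        raise ValueError("halves do not share the expected overlap")
    return five_prime + three_prime[overlap:]


def each_half_fits(five_prime: str, three_prime: str, limit: int = MAX_HALF_BP) -> bool:
    """Check that each half is below the per-vector limit."""
    return len(five_prime) < limit and len(three_prime) < limit


if __name__ == "__main__":
    rng = random.Random(0)
    cassette = "".join(rng.choice("ACGT") for _ in range(7000))
    head, tail = split_cassette(cassette, overlap=400)
    print(len(head), len(tail), each_half_fits(head, tail))
    print(reassemble(head, tail, 400) == cassette)
```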


In some embodiments, a gene modifying polypeptide described herein (e.g., with or without one or more guide nucleic acids) can be delivered using AAV, lentivirus, adenovirus or other plasmid or viral vector types, in particular, using formulations and doses from, for example, U.S. Pat. No. 8,454,972 (formulations, doses for adenovirus), U.S. Pat. No. 8,404,658 (formulations, doses for AAV) and U.S. Pat. No. 5,846,946 (formulations, doses for DNA plasmids) and from clinical trials and publications regarding the clinical trials involving lentivirus, AAV and adenovirus. For example, for AAV, the route of administration, formulation and dose can be as described in U.S. Pat. No. 8,404,658 and as in clinical trials involving AAV. For adenovirus, the route of administration, formulation and dose can be as described in U.S. Pat. No. 8,454,972 and as in clinical trials involving adenovirus. For plasmid delivery, the route of administration, formulation and dose can be as described in U.S. Pat. No. 5,846,946 and as in clinical studies involving plasmids. Doses can be based on or extrapolated to an average 70 kg individual (e.g., a male adult human), and can be adjusted for patients, subjects, mammals of different weight and species. Frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian), depending on usual factors including the age, sex, general health, other conditions of the patient or subject and the particular condition or symptoms being addressed. In some embodiments, the viral vectors can be injected into the tissue of interest. For cell-type specific gene modifying, the expression of the gene modifying polypeptide and optional guide nucleic acid can, in some embodiments, be driven by a cell-type specific promoter.


In some embodiments, AAV allows for low toxicity, for example, due to the purification method not requiring ultracentrifugation of cell particles that can activate the immune response. In some embodiments, AAV allows low probability of causing insertional mutagenesis, for example, because it does not substantially integrate into the host genome.


In some embodiments, AAV has a packaging limit of about 4.4, 4.5, 4.6, 4.7, or 4.75 kb. In some embodiments, a gene modifying polypeptide-encoding sequence, promoter, and transcription terminator can fit into a single viral vector. SpCas9 (4.1 kb) may, in some instances, be difficult to package into AAV. Therefore, in some embodiments, a gene modifying polypeptide coding sequence is used that is shorter in length than other gene modifying polypeptide coding sequences or base editors. In some embodiments, the gene modifying polypeptide encoding sequences are less than about 4.5 kb, 4.4 kb, 4.3 kb, 4.2 kb, 4.1 kb, 4 kb, 3.9 kb, 3.8 kb, 3.7 kb, 3.6 kb, 3.5 kb, 3.4 kb, 3.3 kb, 3.2 kb, 3.1 kb, 3 kb, 2.9 kb, 2.8 kb, 2.7 kb, 2.6 kb, 2.5 kb, 2 kb, or 1.5 kb.
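The size reasoning in the preceding paragraph can be sketched as a simple length check. This is only an illustration: the element sizes below are hypothetical, the function names are not from the disclosure, and the approximate ("about") limits are treated as hard cut-offs.

```python
# Illustrative check of a cassette against the AAV packaging limits
# recited in the text. All element sizes used here are hypothetical.

PACKAGING_LIMITS_KB = (4.4, 4.5, 4.6, 4.7, 4.75)  # values recited in the text


def cassette_length_kb(promoter_kb: float, coding_kb: float, terminator_kb: float) -> float:
    """Sum the lengths of the promoter, coding sequence, and terminator."""
    return promoter_kb + coding_kb + terminator_kb


def limits_satisfied(total_kb: float) -> list:
    """Return the recited packaging limits that the cassette does not exceed."""
    return [limit for limit in PACKAGING_LIMITS_KB if total_kb <= limit]


def fits_single_vector(total_kb: float, limit_kb: float = 4.7) -> bool:
    """True if the cassette is no longer than the chosen packaging limit."""
    return total_kb <= limit_kb


if __name__ == "__main__":
    compact = cassette_length_kb(0.5, 3.0, 0.25)  # hypothetical compact cassette
    large = cassette_length_kb(0.5, 4.5, 0.25)    # hypothetical larger cassette
    for name, total in (("compact", compact), ("large", large)):
        print(name, total, fits_single_vector(total), limits_satisfied(total))
```

A cassette that fails this check would be a candidate for a shorter coding sequence or for the dual AAV vector approach described elsewhere herein.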


An AAV can be AAV1, AAV2, AAV5 or any combination thereof. In some embodiments, the type of AAV is selected with respect to the cells to be targeted; e.g., AAV serotypes 1, 2, 5 or a hybrid capsid AAV1, AAV2, AAV5 or any combination thereof can be selected for targeting brain or neuronal cells; or AAV4 can be selected for targeting cardiac tissue. In some embodiments, AAV8 is selected for delivery to the liver. Exemplary AAV serotypes as to these cells are described, for example, in Grimm, D. et al, J. Virol. 82: 5887-5911 (2008) (incorporated herein by reference in its entirety). In some embodiments, AAV refers to all serotypes, subtypes, and naturally-occurring AAV as well as recombinant AAV. AAV may be used to refer to the virus itself or a derivative thereof. In some embodiments, AAV includes AAV1, AAV2, AAV3, AAV3B, AAV4, AAV5, AAV6, AAV6.2, AAV7, AAVrh.64R1, AAVhu.37, AAVrh.8, AAVrh.32.33, AAV8, AAV9, AAV-DJ, AAV2/8, AAVrh10, AAVLK03, AAV10, AAV11, AAV12, rh10, and hybrids thereof, avian AAV, bovine AAV, canine AAV, equine AAV, primate AAV, non-primate AAV, and ovine AAV. The genomic sequences of various serotypes of AAV, as well as the sequences of the native terminal repeats (TRs), Rep proteins, and capsid subunits are known in the art. Such sequences may be found in the literature or in public databases such as GenBank. Additional exemplary AAV serotypes are listed in Table 18.









TABLE 18

Exemplary AAV serotypes.

Target Tissue | Vehicle | Reference

Liver | AAV (AAV8¹, AAVrh.8¹, AAVhu.37¹, AAV2/8, AAV2/rh10², AAV9, AAV2, NP40³, NP59²,³, AAV3B⁵, AAV-DJ⁴, AAV-LK01⁴, AAV-LK02⁴, AAV-LK03⁴, AAV-LK19⁴, AAV5⁷); Adenovirus (Ad5, HC-AdV⁶) | 1. Wang et al., Mol. Ther. 18, 118-25 (2010); 2. Ginn et al., JHEP Reports, 100065 (2019); 3. Paulk et al., Mol. Ther. 26, 289-303 (2018); 4. L. Lisowski et al., Nature 506, 382-6 (2014); 5. L. Wang et al., Mol. Ther. 23, 1877-87 (2015); 6. Hausl, Mol Ther (2010); 7. Davidoff et al., Mol. Ther. 11, 875-88 (2005)

Lung | AAV (AAV4, AAV5, AAV6¹, AAV9, H22²); Adenovirus (Ad5, Ad3, Ad21, Ad14)³ | 1. Duncan et al., Mol Ther Methods Clin Dev (2018); 2. Cooney et al., Am J Respir Cell Mol Biol (2019); 3. Li et al., Mol Ther Methods Clin Dev (2019)

Skin | AAV (AAV6¹, AAV-LK19²) | 1. Petek et al., Mol. Ther. (2010); 2. L. Lisowski et al., Nature 506, 382-6 (2014)

HSCs | Adenovirus (HDAd5/35++) | Wang et al., Blood Adv (2019)









In some embodiments, a pharmaceutical composition (e.g., comprising an AAV as described herein) has less than 10% empty capsids, less than 8% empty capsids, less than 7% empty capsids, less than 5% empty capsids, less than 3% empty capsids, or less than 1% empty capsids. In some embodiments, the pharmaceutical composition has less than about 5% empty capsids. In some embodiments, the number of empty capsids is below the limit of detection. In some embodiments, it is advantageous for the pharmaceutical composition to have low amounts of empty capsids, e.g., because empty capsids may generate an adverse response (e.g., immune response, inflammatory response, liver response, and/or cardiac response), e.g., with little or no substantial therapeutic benefit.


In some embodiments, the residual host cell protein (rHCP) in the pharmaceutical composition is less than or equal to 100 ng/ml rHCP per 1×10¹³ vg/ml, e.g., less than or equal to 40 ng/ml rHCP per 1×10¹³ vg/ml or 1-50 ng/ml rHCP per 1×10¹³ vg/ml. In some embodiments, the pharmaceutical composition comprises less than 10 ng rHCP per 1.0×10¹³ vg, or less than 5 ng rHCP per 1.0×10¹³ vg, less than 4 ng rHCP per 1.0×10¹³ vg, or less than 3 ng rHCP per 1.0×10¹³ vg, or any concentration in between. In some embodiments, the residual host cell DNA (hcDNA) in the pharmaceutical composition is less than or equal to 5×10⁶ pg/ml hcDNA per 1×10¹³ vg/ml, less than or equal to 1.2×10⁶ pg/ml hcDNA per 1×10¹³ vg/ml, or 1×10⁵ pg/ml hcDNA per 1×10¹³ vg/ml. In some embodiments, the residual host cell DNA in said pharmaceutical composition is less than 5.0×10⁵ pg per 1×10¹³ vg, less than 2.0×10⁵ pg per 1.0×10¹³ vg, less than 1.1×10⁵ pg per 1.0×10¹³ vg, less than 1.0×10⁵ pg hcDNA per 1.0×10¹³ vg, less than 0.9×10⁵ pg hcDNA per 1.0×10¹³ vg, less than 0.8×10⁵ pg hcDNA per 1.0×10¹³ vg, or any concentration in between.
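The residual limits above are expressed relative to a vector genome titer of 1×10^13 vg/ml. A minimal sketch of that normalization follows. The function names and measured values are hypothetical, and the example limit (100 ng/ml rHCP per 1×10^13 vg/ml) is just one of the thresholds recited above.

```python
# Normalize a measured impurity concentration to a reference titer of
# 1e13 vg/ml, then compare it with a limit. Measured values are hypothetical.

REFERENCE_TITER_VG_PER_ML = 1e13


def per_1e13_vg(impurity_per_ml: float, titer_vg_per_ml: float) -> float:
    """Scale an impurity concentration to what it would be at 1e13 vg/ml."""
    if titer_vg_per_ml <= 0:
        raise ValueError("titer must be positive")
    return impurity_per_ml * REFERENCE_TITER_VG_PER_ML / titer_vg_per_ml


def within_limit(normalized_value: float, limit: float) -> bool:
    """True if the normalized value is less than or equal to the limit."""
    return normalized_value <= limit


if __name__ == "__main__":
    # Hypothetical lot: 80 ng/ml rHCP measured at a titer of 2e13 vg/ml.
    rhcp = per_1e13_vg(80.0, 2.0e13)
    print(rhcp, within_limit(rhcp, 100.0))
```

Normalizing in this way lets lots with different genomic titers be compared against the same limit.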


In some embodiments, the residual plasmid DNA in the pharmaceutical composition is less than or equal to 1.7×10⁵ pg/ml per 1.0×10¹³ vg/ml, or 1×10⁵ pg/ml per 1×1.0×10¹³ vg/ml, or 1.7×10⁶ pg/ml per 1.0×10¹³ vg/ml. In some embodiments, the residual DNA plasmid in the pharmaceutical composition is less than 10.0×10⁵ pg per 1.0×10¹³ vg, less than 8.0×10⁵ pg per 1.0×10¹³ vg or less than 6.8×10⁵ pg per 1.0×10¹³ vg. In embodiments, the pharmaceutical composition comprises less than 0.5 ng per 1.0×10¹³ vg, less than 0.3 ng per 1.0×10¹³ vg, less than 0.22 ng per 1.0×10¹³ vg or less than 0.2 ng per 1.0×10¹³ vg or any intermediate concentration of bovine serum albumin (BSA). In embodiments, the benzonase in the pharmaceutical composition is less than 0.2 ng per 1.0×10¹³ vg, less than 0.1 ng per 1.0×10¹³ vg, less than 0.09 ng per 1.0×10¹³ vg, less than 0.08 ng per 1.0×10¹³ vg or any intermediate concentration. In embodiments, Poloxamer 188 in the pharmaceutical composition is about 10 to 150 ppm, about 15 to 100 ppm or about 20 to 80 ppm. In embodiments, the cesium in the pharmaceutical composition is less than 50 μg/g (ppm), less than 30 μg/g (ppm) or less than 20 μg/g (ppm) or any intermediate concentration.


In embodiments, the pharmaceutical composition comprises total impurities, e.g., as determined by SDS-PAGE, of less than 10%, less than 8%, less than 7%, less than 6%, less than 5%, less than 4%, less than 3%, less than 2%, or any percentage in between. In embodiments, the total purity, e.g., as determined by SDS-PAGE, is greater than 90%, greater than 92%, greater than 93%, greater than 94%, greater than 95%, greater than 96%, greater than 97%, greater than 98%, or any percentage in between. In embodiments, no single unnamed related impurity, e.g., as measured by SDS-PAGE, is greater than 5%, greater than 4%, greater than 3% or greater than 2%, or any percentage in between. In embodiments, the pharmaceutical composition comprises a percentage of filled capsids relative to total capsids (e.g., peak 1+peak 2 as measured by analytical ultracentrifugation) of greater than 85%, greater than 86%, greater than 87%, greater than 88%, greater than 89%, greater than 90%, greater than 91%, greater than 91.9%, greater than 92%, greater than 93%, or any percentage in between. In embodiments of the pharmaceutical composition, the percentage of filled capsids measured in peak 1 by analytical ultracentrifugation is 20-80%, 25-75%, 30-75%, 35-75%, or 37.4-70.3%. In embodiments of the pharmaceutical composition, the percentage of filled capsids measured in peak 2 by analytical ultracentrifugation is 20-80%, 20-70%, 22-65%, 24-62%, or 24.9-60.1%.
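The filled-capsid figure above is described as the sum of two analytical ultracentrifugation peaks. A minimal sketch of that arithmetic follows. The peak values are hypothetical, the function names are not from the disclosure, and the 85% threshold is one of the values recited above.

```python
# Percentage of filled capsids computed as peak 1 + peak 2, as described
# in the text. Peak percentages used in the example are hypothetical.


def percent_filled(peak1_pct: float, peak2_pct: float) -> float:
    """Filled capsids as a percentage of total capsids (peak 1 + peak 2)."""
    return peak1_pct + peak2_pct


def meets_filled_threshold(peak1_pct: float, peak2_pct: float,
                           threshold_pct: float = 85.0) -> bool:
    """True if the filled-capsid percentage is greater than the threshold."""
    return percent_filled(peak1_pct, peak2_pct) > threshold_pct


if __name__ == "__main__":
    print(percent_filled(50.0, 42.0), meets_filled_threshold(50.0, 42.0))
```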


In one embodiment, the pharmaceutical composition comprises a genomic titer of 1.0 to 5.0×10¹³ vg/mL, 1.2 to 3.0×10¹³ vg/mL or 1.7 to 2.3×10¹³ vg/ml. In one embodiment, the pharmaceutical composition exhibits a biological load of less than 5 CFU/mL, less than 4 CFU/mL, less than 3 CFU/mL, less than 2 CFU/mL or less than 1 CFU/mL or any intermediate concentration. In embodiments, the amount of endotoxin according to USP, for example, USP <85> (incorporated by reference in its entirety) is less than 1.0 EU/mL, less than 0.8 EU/mL or less than 0.75 EU/mL. In embodiments, the osmolarity of a pharmaceutical composition according to USP, for example, USP <785> (incorporated by reference in its entirety) is 350 to 450 mOsm/kg, 370 to 440 mOsm/kg or 390 to 430 mOsm/kg. In embodiments, the pharmaceutical composition contains less than 1200 particles that are greater than 25 μm per container, less than 1000 particles that are greater than 25 μm per container, less than 500 particles that are greater than 25 μm per container or any intermediate value. In embodiments, the pharmaceutical composition contains less than 10,000 particles that are greater than 10 μm per container, less than 8000 particles that are greater than 10 μm per container or less than 600 particles that are greater than 10 μm per container.


In one embodiment, the pharmaceutical composition has a genomic titer of 0.5 to 5.0×10¹³ vg/mL, 1.0 to 4.0×10¹³ vg/mL, 1.5 to 3.0×10¹³ vg/ml or 1.7 to 2.3×10¹³ vg/ml. In one embodiment, the pharmaceutical composition described herein comprises one or more of the following: less than about 0.09 ng benzonase per 1.0×10¹³ vg, less than about 30 μg/g (ppm) of cesium, about 20 to 80 ppm Poloxamer 188, less than about 0.22 ng BSA per 1.0×10¹³ vg, less than about 6.8×10⁵ pg of residual DNA plasmid per 1.0×10¹³ vg, less than about 1.1×10⁵ pg of residual hcDNA per 1.0×10¹³ vg, less than about 4 ng of rHCP per 1.0×10¹³ vg, pH 7.7 to 8.3, about 390 to 430 mOsm/kg, less than about 600 particles that are >25 μm in size per container, less than about 6000 particles that are >10 μm in size per container, about 1.7×10¹³-2.3×10¹³ vg/mL genomic titer, infectious titer of about 3.9×10⁸ to 8.4×10¹⁰ IU per 1.0×10¹³ vg, total protein of about 100-300 μg per 1.0×10¹³ vg, mean survival of >24 days in Δ7SMA mice with about 7.5×10¹³ vg/kg dose of viral vector, about 70 to 130% relative potency based on an in vitro cell based assay and/or less than about 5% empty capsid. In various embodiments, the pharmaceutical compositions described herein comprise any of the viral particles discussed herein and retain a potency within ±20%, within ±15%, within ±10% or within ±5% of a reference standard. In some embodiments, potency is measured using a suitable in vitro cell assay or in vivo animal model.


Additional methods of preparation, characterization, and dosing AAV particles are taught in WO2019094253, which is incorporated herein by reference in its entirety.


Additional rAAV constructs that can be employed consonant with the invention include those described in Wang et al 2019, available at: https://doi.org/10.1038/s41573-019-0012-9, including Table 1 thereof, which is incorporated by reference in its entirety.


Lipid Nanoparticles

The methods and systems provided herein may employ any suitable carrier or delivery modality, including, in certain embodiments, lipid nanoparticles (LNPs). Lipid nanoparticles, in some embodiments, comprise one or more ionic lipids, such as non-cationic lipids (e.g., neutral or anionic, or zwitterionic lipids); one or more conjugated lipids (such as PEG-conjugated lipids or lipids conjugated to polymers described in Table 5 of WO2019217941; incorporated herein by reference in its entirety); one or more sterols (e.g., cholesterol); and, optionally, one or more targeting molecules (e.g., conjugated receptors, receptor ligands, antibodies); or combinations of the foregoing.


Lipids that can be used in nanoparticle formations (e.g., lipid nanoparticles) include, for example, those described in Table 4 of WO2019217941, which is incorporated by reference; e.g., a lipid-containing nanoparticle can comprise one or more of the lipids in Table 4 of WO2019217941. Lipid nanoparticles can include additional elements, such as polymers, e.g., the polymers described in Table 5 of WO2019217941, incorporated by reference.


In some embodiments, conjugated lipids, when present, can include one or more of PEG-diacylglycerol (DAG) (such as 1-(monomethoxy-polyethyleneglycol)-2,3-dimyristoylglycerol (PEG-DMG)), PEG-dialkyloxypropyl (DAA), PEG-phospholipid, PEG-ceramide (Cer), a pegylated phosphatidylethanolamine (PEG-PE), PEG succinate diacylglycerol (PEGS-DAG) (such as 4-O-(2′,3′-di(tetradecanoyloxy)propyl-1-O-(ω-methoxy(polyethoxy)ethyl) butanedioate (PEG-S-DMG)), PEG dialkoxypropylcarbam, N-(carbonyl-methoxypolyethylene glycol 2000)-1,2-distearoyl-sn-glycero-3-phosphoethanolamine sodium salt, and those described in Table 2 of WO2019051289 (incorporated by reference), and combinations of the foregoing.


In some embodiments, sterols that can be incorporated into lipid nanoparticles include one or more of cholesterol or cholesterol derivatives, such as those in WO2009/127060 or US2010/0130588, which are incorporated by reference. Additional exemplary sterols include phytosterols, including those described in Eygeris et al (2020), dx.doi.org/10.1021/acs.nanolett.0c01386, incorporated herein by reference.


In some embodiments, the lipid particle comprises an ionizable lipid, a non-cationic lipid, a conjugated lipid that inhibits aggregation of particles, and a sterol. The amounts of these components can be varied independently to achieve desired properties. For example, in some embodiments, the lipid nanoparticle comprises an ionizable lipid in an amount from about 20 mol % to about 90 mol % of the total lipids (in other embodiments it may be 20-70% (mol), 30-60% (mol) or 40-50% (mol); or about 50 mol % to about 90 mol % of the total lipid present in the lipid nanoparticle), a non-cationic lipid in an amount from about 5 mol % to about 30 mol % of the total lipids, a conjugated lipid in an amount from about 0.5 mol % to about 20 mol % of the total lipids, and a sterol in an amount from about 20 mol % to about 50 mol % of the total lipids. The ratio of total lipid to nucleic acid (e.g., encoding the gene modifying polypeptide or template nucleic acid) can be varied as desired. For example, the total lipid to nucleic acid (mass or weight) ratio can be from about 10:1 to about 30:1.
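The composition ranges above can be sketched as a simple validation. This is an illustration only: the component keys, function names, and example formulation are hypothetical, and the approximate ("about") ranges are treated as hard bounds for simplicity.

```python
# Check a four-component LNP formulation (in mol %) against the ranges
# recited in the text, and compute the total lipid mass for a chosen
# lipid:nucleic acid mass ratio. The example formulation is hypothetical.

MOL_PCT_RANGES = {
    "ionizable": (20.0, 90.0),
    "non_cationic": (5.0, 30.0),
    "conjugated": (0.5, 20.0),
    "sterol": (20.0, 50.0),
}


def check_composition(mol_pct: dict) -> dict:
    """Return, per component, whether its mol % lies in the recited range."""
    if set(mol_pct) != set(MOL_PCT_RANGES):
        raise ValueError("expected exactly the four named components")
    if abs(sum(mol_pct.values()) - 100.0) > 1e-6:
        raise ValueError("mol % values must sum to 100")
    return {name: low <= mol_pct[name] <= high
            for name, (low, high) in MOL_PCT_RANGES.items()}


def total_lipid_mass_mg(nucleic_acid_mg: float, lipid_to_na_ratio: float = 10.0) -> float:
    """Total lipid mass for a given nucleic acid mass and mass ratio."""
    if not 10.0 <= lipid_to_na_ratio <= 30.0:
        raise ValueError("ratio outside the 10:1 to 30:1 range in the text")
    return nucleic_acid_mg * lipid_to_na_ratio


if __name__ == "__main__":
    example = {"ionizable": 50.0, "non_cationic": 10.0,
               "conjugated": 1.5, "sterol": 38.5}  # hypothetical formulation
    print(check_composition(example))
    print(total_lipid_mass_mg(0.5, 20.0))
```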


In some embodiments, an ionizable lipid may be a cationic lipid, an ionizable cationic lipid, e.g., a cationic lipid that can exist in a positively charged or neutral form depending on pH, or an amine-containing lipid that can be readily protonated. In some embodiments, the cationic lipid is a lipid capable of being positively charged, e.g., under physiological conditions. Exemplary cationic lipids include one or more amine group(s) which bear the positive charge. In some embodiments, the lipid particle comprises a cationic lipid in formulation with one or more of neutral lipids, ionizable amine-containing lipids, biodegradable alkyne lipids, steroids, phospholipids including polyunsaturated lipids, structural lipids (e.g., sterols), PEG, cholesterol and polymer conjugated lipids. In some embodiments, the cationic lipid may be an ionizable cationic lipid. An exemplary cationic lipid as disclosed herein may have an effective pKa over 6.0. In embodiments, a lipid nanoparticle may comprise a second cationic lipid having a different effective pKa (e.g., greater than the first effective pKa) than the first cationic lipid. A lipid nanoparticle may comprise between 40 and 60 mol percent of a cationic lipid, a neutral lipid, a steroid, a polymer conjugated lipid, and a therapeutic agent, e.g., a nucleic acid (e.g., RNA) described herein (e.g., a template nucleic acid or a nucleic acid encoding a gene modifying polypeptide), encapsulated within or associated with the lipid nanoparticle. In some embodiments, the nucleic acid is co-formulated with the cationic lipid. The nucleic acid may be adsorbed to the surface of an LNP, e.g., an LNP comprising a cationic lipid. In some embodiments, the nucleic acid may be encapsulated in an LNP, e.g., an LNP comprising a cationic lipid. In some embodiments, the lipid nanoparticle may comprise a targeting moiety, e.g., coated with a targeting agent. In embodiments, the LNP formulation is biodegradable.
In some embodiments, a lipid nanoparticle comprising one or more lipids described herein, e.g., Formula (i), (ii), (iii), (vii) and/or (ix), encapsulates at least 1%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 92%, at least 95%, at least 97%, at least 98% or 100% of an RNA molecule, e.g., template RNA and/or an mRNA encoding the gene modifying polypeptide.
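The encapsulation percentages above can be illustrated with a short calculation. The sketch assumes that the total RNA and the unencapsulated (free) RNA in a sample have each been quantified by some assay; the disclosure does not specify the measurement method here, and the function name and values are hypothetical.

```python
# Encapsulation percentage from total and free (unencapsulated) RNA.
# How the two quantities are measured is an assumption of this sketch.


def encapsulation_pct(total_rna: float, free_rna: float) -> float:
    """Percentage of RNA that is encapsulated: 100 * (total - free) / total."""
    if total_rna <= 0:
        raise ValueError("total RNA must be positive")
    if not 0 <= free_rna <= total_rna:
        raise ValueError("free RNA must be between 0 and total RNA")
    return 100.0 * (total_rna - free_rna) / total_rna


if __name__ == "__main__":
    # Hypothetical measurement: 100 units of RNA in total, 8 units free.
    print(encapsulation_pct(100.0, 8.0))
```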


In some embodiments, the lipid to nucleic acid ratio (mass/mass ratio; w/w ratio) can be in the range of from about 1:1 to about 25:1, from about 10:1 to about 14:1, from about 3:1 to about 15:1, from about 4:1 to about 10:1, from about 5:1 to about 9:1, or about 6:1 to about 9:1. The amounts of lipids and nucleic acid can be adjusted to provide a desired N/P ratio, for example, N/P ratio of 3, 4, 5, 6, 7, 8, 9, 10 or higher. Generally, the lipid nanoparticle formulation's overall lipid content can range from about 5 mg/ml to about 30 mg/mL. Exemplary ionizable lipids that can be used in lipid nanoparticle formulations include, without limitation, those listed in Table 1 of WO2019051289, incorporated herein by reference. Additional exemplary lipids include, without limitation, one or more of the following formulae: X of US2016/0311759; I of US20150376115 or in US2016/0376224; I, II or III of US20160151284; I, IA, II, or IIA of US20170210967; I-c of US20150140070; A of US2013/0178541; I of US2013/0303587 or US2013/0123338; I of US2015/0141678; II, III, IV, or V of US2015/0239926; I of US2017/0119904; I or II of WO2017/117528; A of US2012/0149894; A of US2015/0057373; A of WO2013/116126; A of US2013/0090372; A of US2013/0274523; A of US2013/0274504; A of US2013/0053572; A of WO2013/016058; A of WO2012/162210; I of US2008/042973; I, II, III, or IV of US2012/01287670; I or II of US2014/0200257; I, II, or III of US2015/0203446; I or III of US2015/0005363; I, IA, IB, IC, ID, II, IIA, IIB, IIC, IID, or III-XXIV of US2014/0308304; of US2013/0338210; I, II, III, or IV of WO2009/132131; A of US2012/01011478; I or XXXV of US2012/0027796; XIV or XVII of US2012/0058144; of US2013/0323269; I of US2011/0117125; I, II, or III of US2011/0256175; I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII of US2012/0202871; I, II, III, IV, V, VI, VII, VIII, X, XII, XIII, XIV, XV, or XVI of US2011/0076335; I or II of US2006/008378; I of US2013/0123338; I or X-A-Y-Z of US2015/0064242; XVI, XVII, 
or XVIII of US2013/0022649; I, II, or III of US2013/0116307; I or II of US2010/0062967; I-X of US2013/0189351; I of US2014/0039032; V of US2018/0028664; I of US2016/0317458; I of US2013/0195920; 5, 6, or 10 of U.S. Pat. No. 10,221,127; III-3 of WO2018/081480; I-5 or I-8 of WO2020/081938; 18 or 25 of U.S. Pat. No. 9,867,888; A of US2019/0136231; II of WO2020/219876; 1 of US2012/0027803; OF-02 of US2019/0240349; 23 of U.S. Pat. No. 10,086,013; cKK-E12/A6 of Miao et al (2020); C12-200 of WO2010/053572; 7C1 of Dahlman et al (2017); 304-O13 or 503-O13 of Whitehead et al; TS-P4C2 of U.S. Pat. No. 9,708,628; or I of WO2020/106946.
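The N/P ratio mentioned above relates the ionizable amine nitrogens (N) contributed by the lipid to the phosphate groups (P) of the nucleic acid. A rough sketch of the calculation follows. It rests on assumptions that are not in the disclosure: one phosphate per nucleotide, an average nucleotide mass of about 330 g/mol for RNA, and a known number of ionizable nitrogens per lipid molecule. The lipid molecular weight and masses in the example are hypothetical.

```python
# Approximate N/P ratio for an LNP formulation.
# Assumptions (not from the source text): one phosphate per nucleotide and
# an average RNA nucleotide mass of ~330 g/mol.

AVG_NUCLEOTIDE_MW = 330.0  # g/mol, assumed average for RNA


def n_to_p_ratio(ionizable_lipid_mg: float, lipid_mw: float, rna_mg: float,
                 nitrogens_per_lipid: int = 1,
                 nucleotide_mw: float = AVG_NUCLEOTIDE_MW) -> float:
    """Moles of ionizable nitrogen divided by moles of phosphate."""
    if min(ionizable_lipid_mg, lipid_mw, rna_mg, nucleotide_mw) <= 0:
        raise ValueError("masses and molecular weights must be positive")
    mmol_nitrogen = ionizable_lipid_mg / lipid_mw * nitrogens_per_lipid
    mmol_phosphate = rna_mg / nucleotide_mw
    return mmol_nitrogen / mmol_phosphate


if __name__ == "__main__":
    # Hypothetical lipid (MW 660 g/mol, one ionizable nitrogen):
    # 6.6 mg of lipid formulated with 0.55 mg of RNA.
    print(round(n_to_p_ratio(6.6, 660.0, 0.55), 2))
```

In practice the lipid and nucleic acid amounts would be adjusted until the computed value reaches the desired N/P ratio, e.g., one of the values listed above.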


In some embodiments, the ionizable lipid is (6Z,9Z,28Z,31Z)-heptatriaconta-6,9,28,31-tetraen-19-yl-4-(dimethylamino)butanoate (DLin-MC3-DMA or MC3), e.g., as described in Example 9 of WO2019051289A9 (incorporated by reference herein in its entirety). In some embodiments, the ionizable lipid is the lipid ATX-002, e.g., as described in Example 10 of WO2019051289A9 (incorporated by reference herein in its entirety). In some embodiments, the ionizable lipid is (13Z,16Z)-N,N-dimethyl-3-nonyldocosa-13,16-dien-1-amine (Compound 32), e.g., as described in Example 11 of WO2019051289A9 (incorporated by reference herein in its entirety). In some embodiments, the ionizable lipid is Compound 6 or Compound 22, e.g., as described in Example 12 of WO2019051289A9 (incorporated by reference herein in its entirety). In some embodiments, the ionizable lipid is heptadecan-9-yl 8-((2-hydroxyethyl)(6-oxo-6-(undecyloxy)hexyl)amino)octanoate (SM-102), e.g., as described in Example 1 of U.S. Pat. No. 9,867,888 (incorporated by reference herein in its entirety). In some embodiments, the ionizable lipid is (9Z,12Z)-3-((4,4-bis(octyloxy)butanoyl)oxy)-2-((((3-(diethylamino)propoxy)carbonyl)oxy)methyl)propyl octadeca-9,12-dienoate (LP01), e.g., as synthesized in Example 13 of WO2015/095340 (incorporated by reference herein in its entirety). In some embodiments, the ionizable lipid is Di((Z)-non-2-en-1-yl) 9-((4-(dimethylamino)butanoyl)oxy)heptadecanedioate (L319), e.g., as synthesized in Example 7, 8, or 9 of US2012/0027803 (incorporated by reference herein in its entirety). In some embodiments, the ionizable lipid is 1,1′-((2-(4-(2-((2-(Bis(2-hydroxydodecyl)amino)ethyl)(2-hydroxydodecyl)amino)ethyl)piperazin-1-yl)ethyl)azanediyl)bis(dodecan-2-ol) (C12-200), e.g., as synthesized in Examples 14 and 16 of WO2010/053572 (incorporated by reference herein in its entirety).
In some embodiments, the ionizable lipid is imidazole cholesterol ester (ICE) lipid, (3S,10R,13R,17R)-10,13-dimethyl-17-((R)-6-methylheptan-2-yl)-2,3,4,7,8,9,10,11,12,13,14,15,16,17-tetradecahydro-1H-cyclopenta[a]phenanthren-3-yl 3-(1H-imidazol-4-yl)propanoate, e.g., Structure (I) from WO2020/106946 (incorporated by reference herein in its entirety).


Some non-limiting examples of lipid compounds that may be used (e.g., in combination with other lipid components) to form lipid nanoparticles for the delivery of compositions described herein, e.g., nucleic acid (e.g., RNA) described herein (e.g., a template nucleic acid or a nucleic acid encoding a gene modifying polypeptide), include:




embedded image


In some embodiments an LNP comprising Formula (i) is used to deliver a gene modifying composition described herein to the liver and/or hepatocyte cells.




embedded image


In some embodiments an LNP comprising Formula (ii) is used to deliver a gene modifying composition described herein to the liver and/or hepatocyte cells.




embedded image


In some embodiments an LNP comprising Formula (iii) is used to deliver a gene modifying composition described herein to the liver and/or hepatocyte cells.




embedded image


In some embodiments an LNP comprising Formula (v) is used to deliver a gene modifying composition described herein to the liver and/or hepatocyte cells.




embedded image


In some embodiments an LNP comprising Formula (vi) is used to deliver a gene modifying composition described herein to the liver and/or hepatocyte cells.




embedded image


In some embodiments an LNP comprising Formula (viii) is used to deliver a gene modifying composition described herein to the liver and/or hepatocyte cells.




embedded image


In some embodiments an LNP comprising Formula (ix) is used to deliver a gene modifying composition described herein to the liver and/or hepatocyte cells.




embedded image


wherein


X1 is O, NR1, or a direct bond, X2 is C2-5 alkylene, X3 is C(═O) or a direct bond, R1 is H or Me, R3 is C1-3 alkyl, R2 is C1-3 alkyl, or R2 taken together with the nitrogen atom to which it is attached and 1-3 carbon atoms of X2 form a 4-, 5-, or 6-membered ring, or X1 is NR1, R1 and R2 taken together with the nitrogen atoms to which they are attached form a 5- or 6-membered ring, or R2 taken together with R3 and the nitrogen atom to which they are attached form a 5-, 6-, or 7-membered ring, Y1 is C2-12 alkylene, Y2 is selected from




embedded image


n is 0 to 3, R4 is C1-15 alkyl, Z1 is C1-6 alkylene or a direct bond,


Z2 is



embedded image


(in either orientation) or absent, provided that if Z1 is a direct bond, Z2 is absent;


R5 is C5-9 alkyl or C6-10 alkoxy, R6 is C5-9 alkyl or C6-10 alkoxy, W is methylene or a direct bond, and R7 is H or Me, or a salt thereof, provided that if R3 and R2 are C2 alkyls, X1 is O, X2 is linear C3 alkylene, X3 is C(═O), Y1 is linear Ce alkylene, (Y2)n-R4 is




embedded image


R4 is linear C5 alkyl, Z1 is C2 alkylene, Z2 is absent, W is methylene, and R7 is H, then R5 and R6 are not Cx alkoxy.


In some embodiments an LNP comprising Formula (xii) is used to deliver a gene modifying composition described herein to the liver and/or hepatocyte cells.




embedded image


In some embodiments an LNP comprising Formula (xi) is used to deliver a gene modifying composition described herein to the liver and/or hepatocyte cells.




embedded image


In some embodiments an LNP comprises a compound of Formula (xiii) and a compound of Formula (xiv).




embedded image


In some embodiments an LNP comprising Formula (xv) is used to deliver a gene modifying composition described herein to the liver and/or hepatocyte cells.




embedded image


In some embodiments an LNP comprising a formulation of Formula (xvi) is used to deliver a gene modifying composition described herein to the lung endothelial cells.




embedded image


In some embodiments, a lipid compound used to form lipid nanoparticles for the delivery of compositions described herein, e.g., nucleic acid (e.g., RNA) described herein (e.g., a template nucleic acid or a nucleic acid encoding a gene modifying polypeptide) is made by one of the following reactions:




embedded image


Exemplary non-cationic lipids include, but are not limited to, distearoyl-sn-glycero-phosphoethanolamine, distearoylphosphatidylcholine (DSPC), dioleoylphosphatidylcholine (DOPC), dipalmitoylphosphatidylcholine (DPPC), dioleoylphosphatidylglycerol (DOPG), dipalmitoylphosphatidylglycerol (DPPG), dioleoyl-phosphatidylethanolamine (DOPE), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), palmitoyloleoylphosphatidylcholine (POPC), palmitoyloleoylphosphatidylethanolamine (POPE), dioleoyl-phosphatidylethanolamine 4-(N-maleimidomethyl)-cyclohexane-1-carboxylate (DOPE-mal), dipalmitoyl phosphatidyl ethanolamine (DPPE), dimyristoylphosphoethanolamine (DMPE), distearoyl-phosphatidyl-ethanolamine (DSPE), monomethyl-phosphatidylethanolamine (such as 16-O-monomethyl PE), dimethyl-phosphatidylethanolamine (such as 16-O-dimethyl PE), 18-1-trans PE, 1-stearoyl-2-oleoyl-phosphatidylethanolamine (SOPE), hydrogenated soy phosphatidylcholine (HSPC), egg phosphatidylcholine (EPC), dioleoylphosphatidylserine (DOPS), sphingomyelin (SM), dimyristoyl phosphatidylcholine (DMPC), dimyristoyl phosphatidylglycerol (DMPG), distearoylphosphatidylglycerol (DSPG), dierucoylphosphatidylcholine (DEPC), palmitoyloleoylphosphatidylglycerol (POPG), dielaidoyl-phosphatidylethanolamine (DEPE), lecithin, phosphatidylethanolamine, lysolecithin, lysophosphatidylethanolamine, phosphatidylserine, phosphatidylinositol, sphingomyelin, egg sphingomyelin (ESM), cephalin, cardiolipin, phosphatidic acid, cerebrosides, dicetylphosphate, lysophosphatidylcholine, dilinoleoylphosphatidylcholine, or mixtures thereof. It is understood that other diacylphosphatidylcholine and diacylphosphatidylethanolamine phospholipids can also be used. The acyl groups in these lipids are preferably acyl groups derived from fatty acids having C10-C24 carbon chains, e.g., lauroyl, myristoyl, palmitoyl, stearoyl, or oleoyl. Additional exemplary lipids, in certain embodiments, include, without limitation, those described in Kim et al. (2020) dx.doi.org/10.1021/acs.nanolett.0c01386, incorporated herein by reference. Such lipids include, in some embodiments, plant lipids found to improve liver transfection with mRNA (e.g., DGTS). In some embodiments, the non-cationic lipid may have the following structure:




embedded image


Other examples of non-cationic lipids suitable for use in the lipid nanoparticles include, without limitation, nonphosphorous lipids such as, e.g., stearylamine, dodecylamine, hexadecylamine, acetyl palmitate, glycerol ricinoleate, hexadecyl stearate, isopropyl myristate, amphoteric acrylic polymers, triethanolamine-lauryl sulfate, alkyl-aryl sulfate polyethyloxylated fatty acid amides, dioctadecyl dimethyl ammonium bromide, ceramide, sphingomyelin, and the like. Other non-cationic lipids are described in WO2017/099823 or US patent publication US2018/0028664, the contents of which are incorporated herein by reference in their entirety.


In some embodiments, the non-cationic lipid is oleic acid or a compound of Formula I, II, or IV of US2018/0028664, incorporated herein by reference in its entirety. The non-cationic lipid can comprise, for example, 0-30% (mol) of the total lipid present in the lipid nanoparticle. In some embodiments, the non-cationic lipid content is 5-20% (mol) or 10-15% (mol) of the total lipid present in the lipid nanoparticle. In embodiments, the molar ratio of ionizable lipid to the neutral lipid ranges from about 2:1 to about 8:1 (e.g., about 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, or 8:1).


In some embodiments, the lipid nanoparticles do not comprise any phospholipids.


In some aspects, the lipid nanoparticle can further comprise a component, such as a sterol, to provide membrane integrity. One exemplary sterol that can be used in the lipid nanoparticle is cholesterol and derivatives thereof. Non-limiting examples of cholesterol derivatives include polar analogues such as 5α-cholestanol, 5β-coprostanol, cholesteryl-(2′-hydroxy)-ethyl ether, cholesteryl-(4′-hydroxy)-butyl ether, and 6-ketocholestanol; non-polar analogues such as 5α-cholestane, cholestenone, 5α-cholestanone, 5β-cholestanone, and cholesteryl decanoate; and mixtures thereof. In some embodiments, the cholesterol derivative is a polar analogue, e.g., cholesteryl-(4′-hydroxy)-butyl ether. Exemplary cholesterol derivatives are described in PCT publication WO2009/127060 and US patent publication US2010/0130588, each of which is incorporated herein by reference in its entirety.


In some embodiments, the component providing membrane integrity, such as a sterol, can comprise 0-50% (mol) (e.g., 0-10%, 10-20%, 20-30%, 30-40%, or 40-50%) of the total lipid present in the lipid nanoparticle. In some embodiments, such a component is 20-50% (mol) or 30-40% (mol) of the total lipid content of the lipid nanoparticle.


In some embodiments, the lipid nanoparticle can comprise a polyethylene glycol (PEG) or a conjugated lipid molecule. Generally, these are used to inhibit aggregation of lipid nanoparticles and/or provide steric stabilization. Exemplary conjugated lipids include, but are not limited to, PEG-lipid conjugates, polyoxazoline (POZ)-lipid conjugates, polyamide-lipid conjugates (such as ATTA-lipid conjugates), cationic-polymer lipid (CPL) conjugates, and mixtures thereof. In some embodiments, the conjugated lipid molecule is a PEG-lipid conjugate, for example, a (methoxy polyethylene glycol)-conjugated lipid.


Exemplary PEG-lipid conjugates include, but are not limited to, PEG-diacylglycerol (DAG) (such as 1-(monomethoxy-polyethyleneglycol)-2,3-dimyristoylglycerol (PEG-DMG)), PEG-dialkyloxypropyl (DAA), PEG-phospholipid, PEG-ceramide (Cer), a pegylated phosphatidylethanoloamine (PEG-PE), 1,2-dimyristoyl-sn-glycerol, methoxypoly ethylene glycol (DMG-PEG-2K), PEG succinate diacylglycerol (PEGS-DAG) (such as 4-O-(2′,3′-di(tetradecanoyloxy)propyl-1-O-(w-methoxy(polyethoxy)ethyl) butanedioate (PEG-S-DMG)), PEG dialkoxypropylcarbam, N-(carbonyl-methoxypolyethylene glycol 2000)-1,2-distearoyl-sn-glycero-3-phosphoethanolamine sodium salt, or a mixture thereof. Additional exemplary PEG-lipid conjugates are described, for example, in U.S. Pat. Nos. 5,885,613, 6,287,591, US2003/0077829, US2003/0077829, US2005/0175682, US2008/0020058, US2011/0117125, US2010/0130588, US2016/0376224, US2017/0119904, and US/099823, the contents of all of which are incorporated herein by reference in their entirety. In some embodiments, a PEG-lipid is a compound of Formula III, III-a-2, III-b-1, III-b-2, or V of US2018/0028664, the content of which is incorporated herein by reference in its entirety. In some embodiments, a PEG-lipid is of Formula II of US20150376115 or US2016/0376224, the content of both of which is incorporated herein by reference in its entirety. In some embodiments, the PEG-DAA conjugate can be, for example, PEG-dilauryloxypropyl, PEG-dimyristyloxypropyl, PEG-dipalmityloxypropyl, or PEG-distearyloxypropyl. 
The PEG-lipid can be one or more of PEG-DMG, PEG-dilaurylglycerol, PEG-dipalmitoylglycerol, PEG-disterylglycerol, PEG-dilaurylglycamide, PEG-dimyristylglycamide, PEG-dipalmitoylglycamide, PEG-disterylglycamide, PEG-cholesterol (1-[8′-(Cholest-5-en-3β-oxy)carboxamido-3′,6′-dioxaoctanyl] carbamoyl-ω-methyl-poly(ethylene glycol)), PEG-DMB (3,4-Ditetradecoxylbenzyl-ω-methyl-poly(ethylene glycol) ether), and 1,2-dimyristoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-2000]. In some embodiments, the PEG-lipid comprises PEG-DMG, 1,2-dimyristoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-2000]. In some embodiments, the PEG-lipid comprises a structure selected from:




embedded image


In some embodiments, lipids conjugated with a molecule other than a PEG can also be used in place of PEG-lipid. For example, polyoxazoline (POZ)-lipid conjugates, polyamide-lipid conjugates (such as ATTA-lipid conjugates), and cationic-polymer lipid (CPL) conjugates can be used in place of or in addition to the PEG-lipid.


Exemplary conjugated lipids, i.e., PEG-lipids, (POZ)-lipid conjugates, ATTA-lipid conjugates and cationic polymer-lipids are described in the PCT and US patent applications listed in Table 2 of WO2019051289A9 and in WO2020106946A1, the contents of all of which are incorporated herein by reference in their entirety.


In some embodiments an LNP comprises a compound of Formula (xix), a compound of Formula (xxi) and a compound of Formula (xxv). In some embodiments an LNP comprising a formulation of Formula (xix), Formula (xxi) and Formula (xxv) is used to deliver a gene modifying composition described herein to the lung or pulmonary cells.


In some embodiments, a lipid nanoparticle may comprise one or more cationic lipids selected from Formula (i), Formula (ii), Formula (iii), Formula (vii), and Formula (ix). In some embodiments, the LNP may further comprise one or more neutral lipids, e.g., DSPC, DPPC, DMPC, DOPC, POPC, DOPE, SM, a steroid, e.g., cholesterol, and/or one or more polymer conjugated lipids, e.g., a pegylated lipid, e.g., PEG-DAG, PEG-PE, PEG-S-DAG, PEG-cer or a PEG dialkoxypropylcarbamate.


In some embodiments, the PEG or the conjugated lipid can comprise 0-20% (mol) of the total lipid present in the lipid nanoparticle. In some embodiments, PEG or the conjugated lipid content is 0.5-10% or 2-5% (mol) of the total lipid present in the lipid nanoparticle. Molar ratios of the ionizable lipid, non-cationic-lipid, sterol, and PEG/conjugated lipid can be varied as needed. For example, the lipid particle can comprise 30-70% ionizable lipid by mole or by total weight of the composition, 0-60% cholesterol by mole or by total weight of the composition, 0-30% non-cationic-lipid by mole or by total weight of the composition and 1-10% conjugated lipid by mole or by total weight of the composition. Preferably, the composition comprises 30-40% ionizable lipid by mole or by total weight of the composition, 40-50% cholesterol by mole or by total weight of the composition, and 10-20% non-cationic-lipid by mole or by total weight of the composition. In some other embodiments, the composition is 50-75% ionizable lipid by mole or by total weight of the composition, 20-40% cholesterol by mole or by total weight of the composition, and 5 to 10% non-cationic-lipid, by mole or by total weight of the composition and 1-10% conjugated lipid by mole or by total weight of the composition. The composition may contain 60-70% ionizable lipid by mole or by total weight of the composition, 25-35% cholesterol by mole or by total weight of the composition, and 5-10% non-cationic-lipid by mole or by total weight of the composition. The composition may also contain up to 90% ionizable lipid by mole or by total weight of the composition and 2 to 15% non-cationic lipid by mole or by total weight of the composition. 
The formulation may also be a lipid nanoparticle formulation, for example comprising 8-30% ionizable lipid by mole or by total weight of the composition, 5-30% non-cationic lipid by mole or by total weight of the composition, and 0-20% cholesterol by mole or by total weight of the composition; 4-25% ionizable lipid by mole or by total weight of the composition, 4-25% non-cationic lipid by mole or by total weight of the composition, 2 to 25% cholesterol by mole or by total weight of the composition, 10 to 35% conjugate lipid by mole or by total weight of the composition, and 5% cholesterol by mole or by total weight of the composition; or 2-30% ionizable lipid by mole or by total weight of the composition, 2-30% non-cationic lipid by mole or by total weight of the composition, 1 to 15% cholesterol by mole or by total weight of the composition, 2 to 35% conjugate lipid by mole or by total weight of the composition, and 1-20% cholesterol by mole or by total weight of the composition; or even up to 90% ionizable lipid by mole or by total weight of the composition and 2-10% non-cationic lipids by mole or by total weight of the composition, or even 100% cationic lipid by mole or by total weight of the composition. In some embodiments, the lipid particle formulation comprises ionizable lipid, phospholipid, cholesterol and a PEG-ylated lipid in a molar ratio of 50:10:38.5:1.5. In some other embodiments, the lipid particle formulation comprises ionizable lipid, cholesterol and a PEG-ylated lipid in a molar ratio of 60:38.5:1.5.


In some embodiments, the lipid particle comprises ionizable lipid, non-cationic lipid (e.g. phospholipid), a sterol (e.g., cholesterol) and a PEG-ylated lipid, where the molar ratio of lipids ranges from 20 to 70 mole percent for the ionizable lipid, with a target of 40-60, the mole percent of non-cationic lipid ranges from 0 to 30, with a target of 0 to 15, the mole percent of sterol ranges from 20 to 70, with a target of 30 to 50, and the mole percent of PEG-ylated lipid ranges from 1 to 6, with a target of 2 to 5.


In some embodiments, the lipid particle comprises ionizable lipid/non-cationic-lipid/sterol/conjugated lipid at a molar ratio of 50:10:38.5:1.5.
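The arithmetic behind such a molar ratio can be sketched as follows. This is an illustrative sketch only: the component keys and the helper names `to_mole_percent` and `out_of_range` are hypothetical, and the ranges are the broad mole-percent ranges recited above (20 to 70 ionizable lipid, 0 to 30 non-cationic lipid, 20 to 70 sterol, 1 to 6 PEG-ylated lipid). The sketch normalizes a ratio to mole percent and reports any component that falls outside its range.

```python
# Illustrative only: component keys and helper names are hypothetical.
# Broad mole-percent ranges as recited in the text (lower, upper), inclusive.
RANGES_MOL_PCT = {
    "ionizable": (20.0, 70.0),
    "non_cationic": (0.0, 30.0),
    "sterol": (20.0, 70.0),
    "peg_lipid": (1.0, 6.0),
}


def to_mole_percent(ratio):
    """Normalize a molar ratio (arbitrary parts) to mole percent."""
    total = sum(ratio.values())
    if total <= 0:
        raise ValueError("molar ratio must have a positive total")
    return {name: 100.0 * parts / total for name, parts in ratio.items()}


def out_of_range(mol_pct, ranges=RANGES_MOL_PCT):
    """Return the components whose mole percent lies outside its range."""
    return [
        name
        for name, (low, high) in ranges.items()
        if not low <= mol_pct[name] <= high
    ]


# The 50:10:38.5:1.5 ratio recited in the text.
ratio = {"ionizable": 50, "non_cationic": 10, "sterol": 38.5, "peg_lipid": 1.5}
mol_pct = to_mole_percent(ratio)
violations = out_of_range(mol_pct)
print(mol_pct, violations)
```

Because the 50:10:38.5:1.5 ratio already sums to 100 parts, its mole percentages equal the ratio entries; a ratio written in other units (e.g., 100:20:77:3) would normalize to the same values.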


In an aspect, the disclosure provides a lipid nanoparticle formulation comprising phospholipids, lecithin, phosphatidylcholine and phosphatidylethanolamine.


In some embodiments, one or more additional compounds can also be included. Those compounds can be administered separately, or the additional compounds can be included in the lipid nanoparticles of the invention. In other words, the lipid nanoparticles can contain other compounds in addition to the nucleic acid or at least a second nucleic acid, different than the first. Without limitations, other additional compounds can be selected from the group consisting of small or large organic or inorganic molecules, monosaccharides, disaccharides, trisaccharides, oligosaccharides, polysaccharides, peptides, proteins, peptide analogs and derivatives thereof, peptidomimetics, nucleic acids, nucleic acid analogs and derivatives, an extract made from biological materials, or any combinations thereof.


In some embodiments, a lipid nanoparticle (or a formulation comprising lipid nanoparticles) lacks reactive impurities (e.g., aldehydes or ketones), or comprises less than a preselected level of reactive impurities (e.g., aldehydes or ketones). While not wishing to be bound by theory, in some embodiments, a lipid reagent is used to make a lipid nanoparticle formulation, and the lipid reagent may comprise a contaminating reactive impurity (e.g., an aldehyde or ketone). A lipid reagent may be selected for manufacturing based on having less than a preselected level of reactive impurities (e.g., aldehydes or ketones). Without wishing to be bound by theory, in some embodiments, aldehydes can cause modification and damage of RNA, e.g., cross-linking between bases and/or covalently conjugating lipid to RNA (e.g., forming lipid-RNA adducts). This may, in some instances, lead to failure of a reverse transcriptase reaction and/or incorporation of inappropriate bases, e.g., at the site(s) of lesion(s), e.g., a mutation in a newly synthesized target DNA.


In some embodiments, a lipid nanoparticle formulation is produced using a lipid reagent comprising less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content. In some embodiments, a lipid nanoparticle formulation is produced using a lipid reagent comprising less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species. In some embodiments, a lipid nanoparticle formulation is produced using a lipid reagent comprising: (i) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content; and (ii) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species. In some embodiments, the lipid nanoparticle formulation is produced using a plurality of lipid reagents, and each lipid reagent of the plurality independently meets one or more criterion described in this paragraph. In some embodiments, each lipid reagent of the plurality meets the same criterion, e.g., a criterion of this paragraph.
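The two criteria in this paragraph (total reactive impurity content, and content of any single species) can be checked together, as in the minimal sketch below. The function names, the species labels, and the default limits of 1% and 0.5% are hypothetical choices for illustration; any of the thresholds listed above could be substituted.

```python
# Illustrative only: names, species labels, and default limits are hypothetical.
def meets_impurity_criteria(species_pct, total_limit=1.0, single_limit=0.5):
    """Check criterion (i), total content, and criterion (ii), any single species.

    species_pct maps each reactive impurity species to its content in percent.
    Both comparisons are strict, matching the "less than" wording of the text.
    """
    total = sum(species_pct.values())
    worst = max(species_pct.values(), default=0.0)
    return total < total_limit and worst < single_limit


def all_reagents_meet(reagents, **limits):
    """Apply the same criteria independently to each lipid reagent of a plurality."""
    return all(meets_impurity_criteria(r, **limits) for r in reagents.values())


reagents = {
    "ionizable_lipid": {"aldehyde_A": 0.2, "aldehyde_B": 0.1},
    "peg_lipid": {"aldehyde_A": 0.05},
}
print(all_reagents_meet(reagents))
```

Keeping the two limits as separate parameters reflects that a reagent can pass the total criterion while failing the single-species criterion, or the reverse.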


In some embodiments, the lipid nanoparticle formulation comprises less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content. In some embodiments, the lipid nanoparticle formulation comprises less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species. In some embodiments, the lipid nanoparticle formulation comprises: (i) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content; and (ii) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.


In some embodiments, one or more, or optionally all, of the lipid reagents used for a lipid nanoparticle as described herein or a formulation thereof comprise less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content. In some embodiments, one or more, or optionally all, of the lipid reagents used for a lipid nanoparticle as described herein or a formulation thereof comprise less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species. In some embodiments, one or more, or optionally all, of the lipid reagents used for a lipid nanoparticle as described herein or a formulation thereof comprise: (i) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content; and (ii) less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.


In some embodiments, total aldehyde content and/or quantity of any single reactive impurity (e.g., aldehyde) species is determined by liquid chromatography (LC), e.g., coupled with tandem mass spectrometry (MS/MS), e.g., according to the method described in Example 40 of PCT/US21/20948. In some embodiments, reactive impurity (e.g., aldehyde) content and/or quantity of reactive impurity (e.g., aldehyde) species is determined by detecting one or more chemical modifications of a nucleic acid molecule (e.g., an RNA molecule, e.g., as described herein) associated with the presence of reactive impurities (e.g., aldehydes), e.g., in the lipid reagents. In some embodiments, reactive impurity (e.g., aldehyde) content and/or quantity of reactive impurity (e.g., aldehyde) species is determined by detecting one or more chemical modifications of a nucleotide or nucleoside (e.g., a ribonucleotide or ribonucleoside, e.g., comprised in or isolated from a template nucleic acid, e.g., as described herein) associated with the presence of reactive impurities (e.g., aldehydes), e.g., in the lipid reagents, e.g., according to the method described in Example 41 of PCT/US21/20948. In embodiments, chemical modifications of a nucleic acid molecule, nucleotide, or nucleoside are detected by determining the presence of one or more modified nucleotides or nucleosides, e.g., using LC-MS/MS analysis, e.g., according to the method described in Example 41 of PCT/US21/20948.


In some embodiments, a nucleic acid (e.g., RNA) described herein (e.g., a template nucleic acid or a nucleic acid encoding a gene modifying polypeptide) does not comprise an aldehyde modification, or comprises less than a preselected amount of aldehyde modifications. In some embodiments, on average, a nucleic acid has less than 50, 20, 10, 5, 2, or 1 aldehyde modifications per 1000 nucleotides, e.g., wherein a single cross-linking of two nucleotides is a single aldehyde modification. In some embodiments, the aldehyde modification is an RNA adduct (e.g., a lipid-RNA adduct). In some embodiments, the aldehyde modification is a cross-link between bases. In some embodiments, a nucleic acid (e.g., RNA) described herein comprises less than 50, 20, 10, 5, 2, or 1 cross-links between nucleotides.


In some embodiments, LNPs are directed to specific tissues by the addition of targeting domains. For example, biological ligands may be displayed on the surface of LNPs to enhance interaction with cells displaying cognate receptors, thus driving association with and cargo delivery to tissues wherein cells express the receptor. In some embodiments, the biological ligand may be a ligand that drives delivery to the liver, e.g., LNPs that display GalNAc result in delivery of nucleic acid cargo to hepatocytes that display asialoglycoprotein receptor (ASGPR). The work of Akinc et al. Mol Ther 18(7):1357-1364 (2010) teaches the conjugation of a trivalent GalNAc ligand to a PEG-lipid (GalNAc-PEG-DSG) to yield LNPs dependent on ASGPR for observable LNP cargo effect (see, e.g., FIG. 6 therein). Other ligand-displaying LNP formulations, e.g., incorporating folate, transferrin, or antibodies, are discussed in WO2017223135, which is incorporated herein by reference in its entirety, in addition to the references used therein, namely Kolhatkar et al., Curr Drug Discov Technol. 2011 8:197-206; Musacchio and Torchilin, Front Biosci. 2011 16:1388-1412; Yu et al., Mol Membr Biol. 2010 27:286-298; Patil et al., Crit Rev Ther Drug Carrier Syst. 2008 25:1-61; Benoit et al., Biomacromolecules. 2011 12:2708-2714; Zhao et al., Expert Opin Drug Deliv. 2008 5:309-319; Akinc et al., Mol Ther. 2010 18:1357-1364; Srinivasan et al., Methods Mol Biol. 2012 820:105-116; Ben-Arie et al., Methods Mol Biol. 2012 757:497-507; Peer 2010 J Control Release. 20:63-68; Peer et al., Proc Natl Acad Sci USA. 2007 104:4095-4100; Kim et al., Methods Mol Biol. 2011 721:339-353; Subramanya et al., Mol Ther. 2010 18:2028-2037; Song et al., Nat Biotechnol. 2005 23:709-717; Peer et al., Science. 2008 319:627-630; and Peer and Lieberman, Gene Ther. 2011 18:1127-1133.


In some embodiments, LNPs are selected for tissue-specific activity by the addition of a Selective ORgan Targeting (SORT) molecule to a formulation comprising traditional components, such as ionizable cationic lipids, amphipathic phospholipids, cholesterol and poly(ethylene glycol) (PEG) lipids. The teachings of Cheng et al. Nat Nanotechnol 15(4):313-320 (2020) demonstrate that the addition of a supplemental “SORT” component precisely alters the in vivo RNA delivery profile and mediates tissue-specific (e.g., lungs, liver, spleen) gene delivery and editing as a function of the percentage and biophysical property of the SORT molecule.


In some embodiments, the LNPs comprise biodegradable, ionizable lipids. In some embodiments, the LNPs comprise (9Z,12Z)-3-((4,4-bis(octyloxy)butanoyl)oxy)-2-((((3-(diethylamino)propoxy)carbonyl)oxy)methyl)propyl octadeca-9,12-dienoate (also called 3-((4,4-bis(octyloxy)butanoyl)oxy)-2-((((3-(diethylamino)propoxy)carbonyl)oxy)methyl)propyl (9Z,12Z)-octadeca-9,12-dienoate) or another ionizable lipid. See, e.g., lipids of WO2019/067992, WO2017/173054, WO2015/095340, and WO2014/136086, as well as references provided therein. In some embodiments, the terms cationic and ionizable in the context of LNP lipids are interchangeable, e.g., wherein ionizable lipids are cationic depending on the pH.


In some embodiments, an LNP described herein comprises a lipid described in Table 19.












TABLE 19
Exemplary Lipids

LIPID ID | Chemical Name | Molecular Weight | Structure
LIPIDV003 | (9Z,12Z)-3-((4,4-bis(octyloxy)butanoyl)oxy)-2-((((3-(diethylamino)propoxy)carbonyl)oxy)methyl)propyl octadeca-9,12-dienoate | 852.29 | embedded image
LIPIDV004 | Heptadecan-9-yl 8-((2-hydroxyethyl)(8-(nonyloxy)-8-oxooctyl)amino)octanoate | 710.18 | embedded image
LIPIDV005 | | 919.56 | embedded image











In some embodiments, multiple components of a gene modifying system may be prepared as a single LNP formulation, e.g., an LNP formulation comprises mRNA encoding for the gene modifying polypeptide and an RNA template. Ratios of nucleic acid components may be varied in order to maximize the properties of a therapeutic. In some embodiments, the ratio of RNA template to mRNA encoding a gene modifying polypeptide is about 1:1 to 100:1, e.g., about 1:1 to 20:1, about 20:1 to 40:1, about 40:1 to 60:1, about 60:1 to 80:1, or about 80:1 to 100:1, by molar ratio. In other embodiments, a system of multiple nucleic acids may be prepared by separate formulations, e.g., one LNP formulation comprising a template RNA and a second LNP formulation comprising an mRNA encoding a gene modifying polypeptide. In some embodiments, the system may comprise more than two nucleic acid components formulated into LNPs. In some embodiments, the system may comprise a protein, e.g., a gene modifying polypeptide, and a template RNA formulated into at least one LNP formulation.
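Because the ratio above is stated on a molar basis, formulating by mass requires a conversion that accounts for the different lengths of the two RNAs. The sketch below is illustrative only: the function names are hypothetical, the example lengths (200 nt and 5000 nt) are placeholders rather than values from this disclosure, and the molar mass uses a common approximation for single-stranded RNA (about 320.5 g/mol per nucleotide plus 159 g/mol), which ignores caps, tails, and modified nucleotides.

```python
# Illustrative only: function names and example lengths are hypothetical.
def rna_molar_mass(n_nt):
    """Approximate molar mass (g/mol) of a single-stranded RNA of n_nt nucleotides.

    Assumption: ~320.5 g/mol per nucleotide plus 159 g/mol, a common
    rule of thumb; it is not a value taken from the disclosure.
    """
    return n_nt * 320.5 + 159.0


def template_to_mrna_molar_ratio(template_ug, template_nt, mrna_ug, mrna_nt):
    """Molar ratio of template RNA to mRNA from their masses and lengths."""
    template_pmol = template_ug * 1e6 / rna_molar_mass(template_nt)
    mrna_pmol = mrna_ug * 1e6 / rna_molar_mass(mrna_nt)
    return template_pmol / mrna_pmol


def within_recited_range(ratio, low=1.0, high=100.0):
    """True if the ratio lies within the about 1:1 to 100:1 range recited above."""
    return low <= ratio <= high


# Equal masses of a short template and a long mRNA give a molar excess of template.
ratio = template_to_mrna_molar_ratio(1.0, 200, 1.0, 5000)
print(round(ratio, 1), within_recited_range(ratio))
```

The example shows why equal masses are far from equimolar: the shorter template contributes many more molecules per microgram than the longer mRNA.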


In some embodiments, the average LNP diameter of the LNP formulation may be between 10s of nm and 100s of nm, e.g., measured by dynamic light scattering (DLS). In some embodiments, the average LNP diameter of the LNP formulation may be from about 40 nm to about 150 nm, such as about 40 nm, 45 nm, 50 nm, 55 nm, 60 nm, 65 nm, 70 nm, 75 nm, 80 nm, 85 nm, 90 nm, 95 nm, 100 nm, 105 nm, 110 nm, 115 nm, 120 nm, 125 nm, 130 nm, 135 nm, 140 nm, 145 nm, or 150 nm. In some embodiments, the average LNP diameter of the LNP formulation may be from about 50 nm to about 100 nm, from about 50 nm to about 90 nm, from about 50 nm to about 80 nm, from about 50 nm to about 70 nm, from about 50 nm to about 60 nm, from about 60 nm to about 100 nm, from about 60 nm to about 90 nm, from about 60 nm to about 80 nm, from about 60 nm to about 70 nm, from about 70 nm to about 100 nm, from about 70 nm to about 90 nm, from about 70 nm to about 80 nm, from about 80 nm to about 100 nm, from about 80 nm to about 90 nm, or from about 90 nm to about 100 nm. In some embodiments, the average LNP diameter of the LNP formulation may be from about 70 nm to about 100 nm. In a particular embodiment, the average LNP diameter of the LNP formulation may be about 80 nm. In some embodiments, the average LNP diameter of the LNP formulation may be about 100 nm. In some embodiments, the average LNP diameter of the LNP formulation ranges from about 1 nm to about 500 nm, from about 5 nm to about 200 nm, from about 10 nm to about 100 nm, from about 20 nm to about 80 nm, from about 25 nm to about 60 nm, from about 30 nm to about 55 nm, from about 35 nm to about 50 nm, or from about 38 nm to about 42 nm.


An LNP may, in some instances, be relatively homogenous. A polydispersity index may be used to indicate the homogeneity of an LNP, e.g., the particle size distribution of the lipid nanoparticles. A small (e.g., less than 0.3) polydispersity index generally indicates a narrow particle size distribution. An LNP may have a polydispersity index from about 0 to about 0.25, such as 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, or 0.25. In some embodiments, the polydispersity index of an LNP may be from about 0.10 to about 0.20.


The zeta potential of an LNP may be used to indicate the electrokinetic potential of the composition. In some embodiments, the zeta potential may describe the surface charge of an LNP. Lipid nanoparticles with relatively low charges, positive or negative, are generally desirable, as more highly charged species may interact undesirably with cells, tissues, and other elements in the body. In some embodiments, the zeta potential of an LNP may be from about −10 mV to about +20 mV, from about −10 mV to about +15 mV, from about −10 mV to about +10 mV, from about −10 mV to about +5 mV, from about −10 mV to about 0 mV, from about −10 mV to about −5 mV, from about −5 mV to about +20 mV, from about −5 mV to about +15 mV, from about −5 mV to about +10 mV, from about −5 mV to about +5 mV, from about −5 mV to about 0 mV, from about 0 mV to about +20 mV, from about 0 mV to about +15 mV, from about 0 mV to about +10 mV, from about 0 mV to about +5 mV, from about +5 mV to about +20 mV, from about +5 mV to about +15 mV, or from about +5 mV to about +10 mV.


The efficiency of encapsulation of a protein and/or nucleic acid, e.g., gene modifying polypeptide or mRNA encoding the polypeptide, describes the amount of protein and/or nucleic acid that is encapsulated or otherwise associated with an LNP after preparation, relative to the initial amount provided. The encapsulation efficiency is desirably high (e.g., close to 100%). The encapsulation efficiency may be measured, for example, by comparing the amount of protein or nucleic acid in a solution containing the lipid nanoparticle before and after breaking up the lipid nanoparticle with one or more organic solvents or detergents. An anion exchange resin may be used to measure the amount of free protein or nucleic acid (e.g., RNA) in a solution. Fluorescence may be used to measure the amount of free protein and/or nucleic acid (e.g., RNA) in a solution. For the lipid nanoparticles described herein, the encapsulation efficiency of a protein and/or nucleic acid may be at least 50%, for example 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%. In some embodiments, the encapsulation efficiency may be at least 80%. In some embodiments, the encapsulation efficiency may be at least 90%. In some embodiments, the encapsulation efficiency may be at least 95%.
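The before-and-after comparison described above reduces to a simple calculation, sketched below under stated assumptions: the free amount is taken as the signal measured on intact particles (e.g., by fluorescence), the total amount as the signal measured after disrupting the particles with a solvent or detergent, and the function name is hypothetical. This is one common way to express the comparison, not necessarily the exact assay used in the disclosure.

```python
# Illustrative only: function name and input conventions are assumptions.
def encapsulation_efficiency(free_amount, total_amount):
    """Percent of cargo that is encapsulated or otherwise associated with the LNP.

    free_amount:  cargo measured in solution with the particles intact
    total_amount: cargo measured after breaking up the particles
    Both values must be in the same units (e.g., micrograms of RNA).
    """
    if total_amount <= 0:
        raise ValueError("total_amount must be positive")
    if not 0 <= free_amount <= total_amount:
        raise ValueError("free_amount must lie between 0 and total_amount")
    return 100.0 * (total_amount - free_amount) / total_amount


ee = encapsulation_efficiency(free_amount=5.0, total_amount=100.0)
print(ee, ee >= 90.0)
```

A result can then be compared against any of the thresholds recited above, such as at least 80%, 90%, or 95%.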


An LNP may optionally comprise one or more coatings. In some embodiments, an LNP may be formulated in a capsule, film, or tablet having a coating. A capsule, film, or tablet including a composition described herein may have any useful size, tensile strength, hardness or density.


Additional exemplary lipids, formulations, methods, and characterization of LNPs are taught by WO2020061457, which is incorporated herein by reference in its entirety.


In some embodiments, in vitro or ex vivo cell lipofections are performed using Lipofectamine MessengerMax (Thermo Fisher) or TransIT-mRNA Transfection Reagent (Mirus Bio). In certain embodiments, LNPs are formulated using the GenVoy_ILM ionizable lipid mix (Precision NanoSystems). In certain embodiments, LNPs are formulated using 2,2-dilinoleyl-4-dimethylaminoethyl-[1,3]-dioxolane (DLin-KC2-DMA) or dilinoleylmethyl-4-dimethylaminobutyrate (DLin-MC3-DMA or MC3), the formulation and in vivo use of which are taught in Jayaraman et al. Angew Chem Int Ed Engl 51(34):8529-8533 (2012), incorporated herein by reference in its entirety.


LNP formulations optimized for the delivery of CRISPR-Cas systems, e.g., Cas9-gRNA RNP, gRNA, Cas9 mRNA, are described in WO2019067992 and WO2019067910, both incorporated by reference.


Additional specific LNP formulations useful for delivery of nucleic acids are described in U.S. Pat. Nos. 8,158,601 and 8,168,775, both incorporated by reference, which include formulations used in patisiran, sold under the name ONPATTRO.


Exemplary dosing of gene modifying LNP may include about 0.1, 0.25, 0.3, 0.5, 1, 2, 3, 4, 5, 6, 8, 10, or 100 mg/kg (RNA). Exemplary dosing of AAV comprising a nucleic acid encoding one or more components of the system may include an MOI of about 10^11, 10^12, 10^13, and 10^14 vg/kg.


Kits, Articles of Manufacture, and Pharmaceutical Compositions

In an aspect the disclosure provides a kit comprising a gene modifying polypeptide or a gene modifying system, e.g., as described herein. In some embodiments, the kit comprises a gene modifying polypeptide (or a nucleic acid encoding the polypeptide) and a template RNA (or DNA encoding the template RNA). In some embodiments, the kit further comprises a reagent for introducing the system into a cell, e.g., transfection reagent, LNP, and the like. In some embodiments, the kit is suitable for any of the methods described herein. In some embodiments, the kit comprises one or more elements, compositions (e.g., pharmaceutical compositions), gene modifying polypeptides, and/or gene modifying systems, or a functional fragment or component thereof, e.g., disposed in an article of manufacture. In some embodiments, the kit comprises instructions for use thereof.


In an aspect, the disclosure provides an article of manufacture, e.g., in which a kit as described herein, or a component thereof, is disposed.


In an aspect, the disclosure provides a pharmaceutical composition comprising a gene modifying polypeptide or a gene modifying system, e.g., as described herein. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier or excipient. In some embodiments, the pharmaceutical composition comprises a template RNA and/or an RNA encoding the polypeptide. In embodiments, the pharmaceutical composition has one or more (e.g., 1, 2, 3, or 4) of the following characteristics:

    • (a) less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) DNA template relative to the template RNA and/or the RNA encoding the polypeptide, e.g., on a molar basis;
    • (b) less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) uncapped RNA relative to the template RNA and/or the RNA encoding the polypeptide, e.g., on a molar basis;
    • (c) less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) partial length RNAs relative to the template RNA and/or the RNA encoding the polypeptide, e.g., on a molar basis;
    • (d) substantially lacks unreacted cap dinucleotides.


Chemistry, Manufacturing, and Controls (CMC)

Purification of protein therapeutics is described, for example, in Franks, Protein Biotechnology: Isolation, Characterization, and Stabilization, Humana Press (2013); and in Cutler, Protein Purification Protocols (Methods in Molecular Biology), Humana Press (2010).


In some embodiments, a gene modifying system, polypeptide, and/or template nucleic acid (e.g., template RNA) conforms to certain quality standards. In some embodiments, a gene modifying system, polypeptide, and/or template nucleic acid (e.g., template RNA) produced by a method described herein conforms to certain quality standards. Accordingly, the disclosure is directed, in some aspects, to methods of manufacturing a gene modifying system, polypeptide, and/or template nucleic acid (e.g., template RNA) that conforms to certain quality standards, e.g., in which said quality standards are assayed. The disclosure is also directed, in some aspects, to methods of assaying said quality standards in a gene modifying system, polypeptide, and/or template nucleic acid (e.g., template RNA). In some embodiments, quality standards include, but are not limited to, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12) of the following:

    • (i) the length of the template RNA, e.g., whether the template RNA has a length that is above a reference length or within a reference length range, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the template RNA present is greater than 100, 125, 150, 175, or 200 nucleotides long;
    • (ii) the presence, absence, and/or length of a polyA tail on the template RNA, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the template RNA present contains a polyA tail (e.g., a polyA tail that is at least 5, 10, 20, 30, 50, 70, or 100 nucleotides in length (SEQ ID NO: 37640));
    • (iii) the presence, absence, and/or type of a 5′ cap on the template RNA, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the template RNA present contains a 5′ cap, e.g., whether that cap is a 7-methylguanosine cap, e.g., an O-Me-m7G cap;
    • (iv) the presence, absence, and/or type of one or more modified nucleotides (e.g., selected from pseudouridine, dihydrouridine, inosine, 7-methylguanosine, 1-N-methylpseudouridine (1-Me-Ψ), 5-methoxyuridine (5-MO-U), 5-methylcytidine (5mC), or a locked nucleotide) in the template RNA, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the template RNA present contains one or more modified nucleotides;
    • (v) the stability of the template RNA (e.g., over time and/or under a pre-selected condition), e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the template RNA remains intact (e.g., greater than 100, 125, 150, 175, or 200 nucleotides long) after a stability test;
    • (vi) the potency of the template RNA in a system for modifying DNA, e.g., whether at least 1% of target sites are modified after a system comprising the template RNA is assayed for potency;
    • (vii) the length of the polypeptide, first polypeptide, or second polypeptide, e.g., whether the polypeptide, first polypeptide, or second polypeptide has a length that is above a reference length or within a reference length range, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the polypeptide, first polypeptide, or second polypeptide present is greater than 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1600, 1700, 1800, 1900, or 2000 amino acids long (and optionally, no larger than 2500, 2000, 1500, 1400, 1300, 1200, 1100, 1000, 900, 800, 700, or 600 amino acids long);
    • (viii) the presence, absence, and/or type of post-translational modification on the polypeptide, first polypeptide, or second polypeptide, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the polypeptide, first polypeptide, or second polypeptide contains phosphorylation, methylation, acetylation, myristoylation, palmitoylation, isoprenylation, glypiation, or lipoylation, or any combination thereof;
    • (ix) the presence, absence, and/or type of one or more artificial, synthetic, or non-canonical amino acids (e.g., selected from ornithine, β-alanine, GABA, δ-Aminolevulinic acid, PABA, a D-amino acid (e.g., D-alanine or D-glutamate), aminoisobutyric acid, dehydroalanine, cystathionine, lanthionine, Djenkolic acid, Diaminopimelic acid, Homoalanine, Norvaline, Norleucine, Homonorleucine, homoserine, O-methyl-homoserine and O-ethyl-homoserine, ethionine, selenocysteine, selenohomocysteine, selenomethionine, selenoethionine, tellurocysteine, or telluromethionine) in the polypeptide, first polypeptide, or second polypeptide, e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the polypeptide, first polypeptide, or second polypeptide present contains one or more artificial, synthetic, or non-canonical amino acids;
    • (x) the stability of the polypeptide, first polypeptide, or second polypeptide (e.g., over time and/or under a pre-selected condition), e.g., whether at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the polypeptide, first polypeptide, or second polypeptide remains intact (e.g., greater than 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1600, 1700, 1800, 1900, or 2000 amino acids long (and optionally, no larger than 2500, 2000, 1500, 1400, 1300, 1200, 1100, 1000, 900, 800, 700, or 600 amino acids long)) after a stability test;
    • (xi) the potency of the polypeptide, first polypeptide, or second polypeptide in a system for modifying DNA, e.g., whether at least 1% of target sites are modified after a system comprising the polypeptide, first polypeptide, or second polypeptide is assayed for potency; or
    • (xii) the presence, absence, and/or level of one or more of a pyrogen, virus, fungus, bacterial pathogen, or host cell protein, e.g., whether the system is free or substantially free of pyrogen, virus, fungus, bacterial pathogen, or host cell protein contamination.


In some embodiments, a system or pharmaceutical composition described herein is endotoxin free.


In some embodiments, the presence, absence, and/or level of one or more of a pyrogen, virus, fungus, bacterial pathogen, and/or host cell protein is determined. In embodiments, whether the system is free or substantially free of pyrogen, virus, fungus, bacterial pathogen, and/or host cell protein contamination is determined.


In some embodiments, a pharmaceutical composition or system as described herein has one or more (e.g., 1, 2, 3, or 4) of the following characteristics:

    • (a) less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) DNA template relative to the template RNA and/or the RNA encoding the polypeptide, e.g., on a molar basis;
    • (b) less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) uncapped RNA relative to the template RNA and/or the RNA encoding the polypeptide, e.g., on a molar basis;
    • (c) less than 1% (e.g., less than 0.5%, 0.4%, 0.3%, 0.2%, or 0.1%) partial length RNAs relative to the template RNA and/or the RNA encoding the polypeptide, e.g., on a molar basis;
    • (d) substantially lacks unreacted cap dinucleotides.


EXAMPLES
Example 1: Screening Configurations of Template RNAs that Correct the F508del Mutation in a Genomic Landing Pad in Human Cells

This example describes the use of a gene modifying system containing a gene modifying polypeptide and template RNAs comprising varied lengths of heterologous object sequences and PBS sequences to quantify the activity of template RNAs for correction of the F508del mutation. In this example, a template RNA contains:

    • (1) a gRNA spacer;
    • (2) a gRNA scaffold;
    • (3) a heterologous object sequence; and
    • (4) a primer binding site (PBS) sequence.


One or more template RNAs described in Tables 1-4 can be tested as described in this example. The heterologous object sequences and PBS sequences were designed to correct the F508del mutation in a landing pad by inserting a “CTT” nucleotide triplet at the mutation site via gene editing, to reverse a F508del mutation in the corresponding protein. It is of course understood that the correction of the mutation may be accomplished by directly modifying the opposite (e.g., non-coding) strand of DNA. For example, to reverse a F508del mutation, the correction may be effected by inserting a “GAA” nucleotide triplet in the opposite (e.g., non-coding) strand of DNA to restore the CFTR gene to the wild type reference sequence.
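
The strand relationship described above can be illustrated with a short sketch. The helper names are hypothetical and not part of the disclosure, and the sketch assumes that the "GAA" triplet is written in the same left-to-right order as the "CTT" strand, i.e., base-by-base opposite it.

```python
# Illustrative only: helper names are assumptions, not part of the disclosure.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(seq: str) -> str:
    """Base-by-base complement, written in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq)

def reverse_complement(seq: str) -> str:
    """Complement of seq, written 5' to 3' on the opposite strand."""
    return complement(seq)[::-1]

inserted = "CTT"  # triplet inserted on one strand, per the text
print(complement(inserted))          # bases paired opposite CTT
print(reverse_complement(inserted))  # the same bases read 5' to 3'
```

Under that assumption, the bases paired opposite "CTT" are "GAA", which reads "AAG" when the opposite strand is written 5′ to 3′.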


A cell line is created to have a “landing pad” or a stable integration that mimics a region of the CFTR gene that contains the F508del mutation site and flanking sequences. The DNA for the landing pad is chemically synthesized and cloned into the pLenti-N-tGFP vector. The cloned landing pad sequence in the lentiviral expression vector is verified by Sanger sequencing. The sequence-verified plasmids (9 μg) along with the lentiviral packaging mix (9 μg, Biosettia) are transfected using Lipofectamine2000™ according to the manufacturer's instructions into a packaging cell line, LentiX-293T (Takara Bio). The transfected cells are incubated at 37° C., 5% CO2 for 48 hours (including one medium change at 24 hrs) and the viral particle-containing medium is collected from the cell culture dish. The collected medium is filtered through a 0.2 μm filter to remove cell debris and is prepared for transduction of HEK293T cells. The virus-containing medium is diluted in DMEM and mixed with polybrene to prepare a dilution series for transduction of HEK293T cells where the final concentration of polybrene is 8 μg/ml. The HEK293T cells are grown in virus-containing medium for 48 hours and then split with fresh medium. The split cells are grown to confluence and transduction efficiency of the different dilutions of virus is measured by GFP expression via flow cytometry and ddPCR detection of the genomic integrated lentivirus that contains GFP and the CFTR landing pads.


A gene modifying system comprising (i) a compatible gene modifying polypeptide described herein, e.g., having: an NLS of Table 11, a compatible Cas9 domain having a sequence of Table 8 (e.g., SpyCas9-SpRY), a linker of Table 10, an RT sequence of Table 6 or F3 (e.g., MLVMS_P03355_PLV919 or an RT domain from porcine endogenous retrovirus (PERV)), and a second NLS of Table 11 and (ii) a template RNA of any of Tables 1-4, E3, or E3A (e.g., a template RNA of ID #1) is transfected into the HEK293T landing pad cell line. The gene modifying polypeptide and the template RNAs are delivered by nucleofection in RNA format. Specifically, 1 μg of gene modifying polypeptide mRNA is combined with 10 μM template RNAs. The mRNA and template RNAs are added to 25 μL SF buffer containing 250,000 HEK293T landing pad cells and cells are nucleofected using program DS-150. After nucleofection, cells are grown at 37° C., 5% CO2 for 3 days prior to cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the CFTR mutation site are used to amplify across the locus. Amplicons are analyzed via short read sequencing using an Illumina MiSeq. In some embodiments, the assay will indicate that at least 10%, 20%, 30%, 40%, 50%, 60%, or 70% of copies of the CFTR gene in the sample are converted to the desired wild-type sequence.


Example 2: Optimization of Gene Modifying Systems for the Therapeutic Correction of a Pathogenic Mutation Associated with Cystic Fibrosis

This example describes a wet lab experimental process designed to determine an optimized gene modifying system composition for the modification of a disease-causing mutation in model cells. More specifically, therapeutic template RNAs that work with different Cas variants are generated to replace a pathogenic mutation in genomic DNA with a non-pathogenic sequence, e.g., a sequence comprising the corresponding sequence found in a reference genome, e.g., the Genome Reference Consortium Human Build 38 (GRCh38/hg38) human reference genome. A Cas variant domain in a Cas-RT fusion as described herein, is used with an appropriate template RNA composition to create the gene modifying systems for screening. Here, gene modifying systems are designed to correct a pathogenic mutation known to be associated with cystic fibrosis. More specifically, this Example describes the screening of a template RNA and gene modifying polypeptide combination in order to correct the disease-causing F508del mutation in the human CFTR gene.


To perform the wet lab screening of combinations, a screening cell line comprising the target mutation is acquired or engineered as per Example 1 or using methods well known in the art. To evaluate suitable nucleotide compositions of the template RNA to achieve the correction of the mutation, a screen is performed in two stages. In the first stage, initial gRNAs designed for subsequent incorporation into template RNA molecules (Table 1), as well as optional secondary gRNAs that can serve as second-nick partners, are tested for independent cleavage activity using the corresponding Cas variant(s). In this context, the parental Cas variant, e.g., the enzyme comprising double-stranded cleavage activity, may be used, e.g., to improve the mutational signal generated by target site cleavage and thus the sensitivity of the assay. Selection of spacers corresponding to these template gRNA molecules and optional second-nick gRNA molecules then follows (e.g., Table 2).


gRNAs are tested for activity in combination with an appropriate Cas9 by lipid delivery or electroporation of RNA (gRNA and/or mRNA encoding the Cas9), DNA (DNA plasmid encoding the gRNA and/or Cas9), or a combination of RNA and DNA molecules. For screening in DNA format, the gRNA is placed in a plasmid with expression driven by a U6 promoter. The corresponding Cas9 species, comprising double-stranded cleavage activity, is cloned into a plasmid with expression driven by a CMV promoter. For screening in RNA format, the gRNAs are chemically synthesized and the Cas9 mRNA is produced by in vitro transcription of the Cas9 variant from a plasmid or PCR amplicon comprising a T7 promoter to drive T7 RNA polymerase-dependent transcription, the Cas9 coding sequence flanked by untranslated regions for optimal protein expression, and a DNA template-encoded polyA tail. The screening cell line is transfected or electroporated with Cas9 and gRNA expression plasmids for screening in DNA format or Cas9 mRNA and gRNA for screening in RNA format.


At 3-5 days post-transfection, the efficiency of Cas9 editing is measured using amplicon sequencing. Cells are harvested and lysed to extract genomic DNA for analysis. Amplicon sequencing libraries are prepared using primers to generate amplicons across the target site and sequenced using an Illumina MiSeq. Indel rates, corresponding to the activity of the individual gRNAs, are determined using the CRISPResso2 pipeline (Clement et al Nat Biotechnol 37(3):224-226 (2019)). The five gRNAs from the pool of potential template gRNAs producing the highest indel rates are then selected for use in full-length template RNA optimization. Additionally, four gRNAs from the pool of potential second-nick gRNAs are selected as candidate second-nick gRNAs.
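
The selection step can be sketched as a simple ranking of per-gRNA indel rates. The gRNA names and rates below are placeholders for illustration, not data from this disclosure, and the sketch assumes that second-nick gRNAs are ranked by the same criterion (highest indel rate) as template gRNAs.

```python
# Placeholder indel rates (fraction of reads with indels); not real data.
def top_n(indel_rates: dict[str, float], n: int) -> list[str]:
    """Return the n gRNA names with the highest indel rates, best first."""
    ranked = sorted(indel_rates.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]

template_pool = {"tg1": 0.42, "tg2": 0.10, "tg3": 0.55, "tg4": 0.31,
                 "tg5": 0.08, "tg6": 0.47, "tg7": 0.22}
second_nick_pool = {"sn1": 0.35, "sn2": 0.05, "sn3": 0.60,
                    "sn4": 0.18, "sn5": 0.44, "sn6": 0.27}

template_grnas = top_n(template_pool, 5)        # five template gRNAs
second_nick_grnas = top_n(second_nick_pool, 4)  # four second-nick gRNAs
print(template_grnas)
print(second_nick_grnas)
```

In practice the input rates would come from the CRISPResso2 output for each gRNA; parsing that output is outside the scope of this sketch.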


To screen candidate gene modifying systems (gene modifying polypeptide comprising the appropriate Cas variant fused to an RT domain, template RNA, and optional second-nick gRNA), a template RNA screening library (e.g., Table 3) is built from the selected template gRNAs, e.g., template gRNAs enabling the highest cleavage activity at the target site. Exemplary gene modifying systems can be built by assembling the components from Table 1, Table 3, and, optionally, from Table 2. Without wishing to be limited, exemplary gene modifying systems are listed in Table 4. A full-length system comprises (1) a gene modifying polypeptide, e.g., a Cas-RT fusion as described herein, e.g., a Cas-RT fusion comprising a Cas nickase listed in Table 1 and as described in Table 7, Table 8, and Table 12; (2) a full-length template RNA, further comprising, from 5′ to 3′, (i) a gRNA spacer, e.g., a spacer from Table 1; (ii) a gRNA scaffold further comprising a crRNA, tetraloop, and tracrRNA, as described for the specific Cas of interest in Table 12; (iii) an indication-specific RT template sequence (heterologous object sequence), e.g., an RT template comprised in Table 3; and (iv) a primer binding site sequence, e.g., a primer binding site sequence comprised in Table 3; and, optionally, (3) a second-nick gRNA, e.g., a second-nick gRNA from Table 2. The systems comprising full-length template RNAs are tested for activity by lipid delivery or electroporation in the screening cell line in both DNA and RNA formats, as above. For screening in DNA format, the template RNA is encoded on a plasmid with expression driven by a U6 promoter.
The corresponding gene modifying polypeptide, e.g., a nickase mutant of the Cas9 variant matching the gRNA of interest (gRNA parameters and nickase mutation noted in Table 7, Table 8, or Table 12) fused to a reverse transcriptase domain, e.g., a domain comprising the MMLV RT or PERV RT (Table 6) is cloned into a plasmid with expression of the Cas-RT fusion protein driven by a CMV promoter. Selected secondary gRNAs targeting the non-edited strand, e.g., second-nick gRNAs enabling the highest cleavage activity, are cloned into a plasmid with expression driven by a U6 promoter. For screening in RNA format, template RNAs are chemically synthesized and gene modifying polypeptides are produced by in vitro transcription of the coding sequence from a plasmid or PCR amplicon comprising a T7 promoter to drive T7 RNA polymerase-dependent transcription, the Cas9-RT coding sequence flanked by untranslated regions for optimal protein expression, and a DNA template-encoded polyA tail. Where relevant, second-nick gRNAs are produced by chemical synthesis. For screening in DNA format, the screening cell line is transfected or electroporated with (1) the Cas9-RT fusion and template RNA expression plasmids, or (2) the Cas9-RT fusion, template RNA, and second-nick gRNA expression plasmids. For screening in RNA format, the screening cell line is transfected or electroporated with (1) an mRNA encoding the Cas9-RT fusion and a template RNA, or (2) an mRNA encoding the Cas9-RT fusion, a template RNA, and a second-nick gRNA.


At three days post-transfection, the efficiency of gene modifying system-mediated target correction is measured using amplicon sequencing. Cells are harvested and lysed to extract genomic DNA for analysis. Amplicon sequencing libraries are prepared using primers to generate amplicons across the target site and sequenced using an Illumina MiSeq. Rates of precise correction of the target mutation, as well as edits comprising unintended mutations (Indels), are determined using the CRISPResso2 pipeline (Clement et al Nat Biotechnol 37(3):224-226 (2019)). The best performing gene modifying systems, e.g., those with the highest precise correction rate and/or ratio of precise corrections to indels, with and without second-nick gRNA components, are selected for subsequent validation experiments in patient-derived cells, primary cell lines, or iPSCs comprising the target mutation.
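
The two selection criteria named above (precise correction rate and ratio of precise corrections to indels) can be sketched as follows. The system labels and read counts are placeholders, not results from this disclosure, and the class and field names are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class EditingResult:
    system: str         # hypothetical system label
    total_reads: int    # all aligned reads at the target site
    precise_reads: int  # reads carrying only the intended correction
    indel_reads: int    # reads carrying unintended indels

    @property
    def precise_rate(self) -> float:
        return self.precise_reads / self.total_reads

    @property
    def precise_to_indel(self) -> float:
        # Floor the denominator at 1 read to avoid dividing by zero.
        return self.precise_reads / max(self.indel_reads, 1)

results = [
    EditingResult("sysA", 10_000, 3_000, 300),
    EditingResult("sysB", 10_000, 2_500, 50),
    EditingResult("sysC", 10_000, 500, 400),
]
by_rate = max(results, key=lambda r: r.precise_rate)
by_ratio = max(results, key=lambda r: r.precise_to_indel)
print(by_rate.system, by_ratio.system)
```

With these placeholder counts the two criteria pick different systems, which is why the text allows selection on either or both.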


As used in this Example, gene modifying polypeptides (Cas-RT fusions) comprise a Cas nickase, appropriate for use with a paired gRNA design, fused to an RT domain via a peptide linker (e.g., Table 10 and Table D), and preferably further comprise at least one nuclear localization signal (e.g., Table 11). Though this Example focuses on assaying the template RNA and optional second-nick gRNA in combination with a given Cas variant, it is contemplated that a combined approach of gene modifying polypeptide optimization, e.g., exploring gene modifying system activity improvements with various Cas, RT, linker, and NLS module combinations, with template RNA and optional gRNA combinations may result in further improvement of the system.


Example 3: Gene Modifying Polypeptide Selection by Pooled Screening in HEK293T & U2OS Cells

This example describes the use of an RNA gene modifying system for the targeted editing of a coding sequence in the human genome. More specifically, this example describes the infection of HEK293T and U2OS cells with a library of gene modifying candidates, followed by transfection of a template guide RNA (tgRNA) for in vitro gene modifying in the cells, e.g., as a means of evaluating a new gene modifying polypeptide for editing activity in human cells by a pooled screening approach.


The gene modifying polypeptide library candidates assayed herein each comprise: 1) a S. pyogenes (Spy) Cas9 nickase containing an N863A mutation that inactivates one endonuclease active site; 2) one of the 122 peptide linkers depicted in Table 10; and 3) a reverse transcriptase (RT) domain from Table 6 of retroviral origin. Retroviral RT domains were selected for inclusion if they were expected to function as monomers. For each selected RT domain, the wild-type sequence was tested, as well as versions with point mutations installed in the primary wild-type sequence. In particular, 143 RT domains were tested, either wild type or containing various mutations. In total, 17,446 Cas-linker-RT gene modifying polypeptides were tested.
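
The library size follows from the counts given above, assuming every linker was paired with every RT domain (a full combinatorial design, which the numbers are consistent with but the text does not state outright):

```python
n_linkers = 122     # peptide linkers, per the text
n_rt_domains = 143  # RT domains tested (wild type or mutated), per the text

# Assumption: each linker is combined with each RT domain exactly once.
library_size = n_linkers * n_rt_domains
print(library_size)
```

The product equals the 17,446 Cas-linker-RT polypeptides reported.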


The system described here is a two-component system comprising: 1) an expression plasmid encoding a human codon-optimized gene modifying polypeptide library candidate within a lentiviral cassette, and 2) a tgRNA expression plasmid expressing a non-coding tgRNA sequence that is recognized by Cas and localizes it to the genomic locus of interest, and that also templates reverse transcription of the desired edit into the genome by the RT domain, driven by a U6 promoter. The lentiviral cassette comprises: (i) a CMV promoter for expression in mammalian cells; (ii) a gene modifying polypeptide library candidate as shown; (iii) a self-cleaving T2A polypeptide; (iv) a puromycin resistance gene enabling selection in mammalian cells; and (v) a polyA tail termination signal.


To prepare a pool of cells expressing gene modifying polypeptide library candidates, HEK293T or U2OS cells were transduced with pooled lentiviral preparations of the gene modifying candidate plasmid library. HEK293 Lenti-X cells were seeded in 15 cm plates (12×10^6 cells) prior to lentiviral plasmid transfection. Lentiviral plasmid transfection using the Lentiviral Packaging Mix (Biosettia, 27 μg) and the plasmid DNA for the gene modifying candidate library (27 μg) was performed the following day using Lipofectamine 2000 and Opti-MEM media according to the manufacturer's protocol. Extracellular DNA was removed by a full media change the next day and virus-containing media was harvested 48 hours after. Lentiviral media was concentrated using Lenti-X Concentrator (TaKaRa Biosciences) and 5 mL lentiviral aliquots were made and stored at −80° C. Lentiviral titering was performed by enumerating colony forming units post Puromycin selection. HEK293T or U2OS cells carrying a BFP-expressing genomic landing pad were seeded at 6×10^7 cells in culture plates and transduced at a 0.3 multiplicity of infection (MOI) to minimize multiple infections per cell. Puromycin (2.5 μg/mL) was added 48 hours post infection to allow for selection of infected cells. Cells were kept under puromycin selection for at least 7 days and then scaled up for tgRNA electroporation.
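
The rationale for a low MOI can be made concrete with a Poisson model of integration events per cell. The Poisson assumption is a common approximation and is not stated in the text; the function name is illustrative.

```python
import math

def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k integrations (Poisson model)."""
    return math.exp(-moi) * moi**k / math.factorial(k)

moi = 0.3  # multiplicity of infection, per the text
p_uninfected = poisson_pmf(0, moi)
p_single = poisson_pmf(1, moi)
p_multiple = 1.0 - p_uninfected - p_single

# Fraction of infected (puromycin-selectable) cells with more than one integration.
frac_multi_among_infected = p_multiple / (1.0 - p_uninfected)

print(f"uninfected: {p_uninfected:.3f}")
print(f"single integration: {p_single:.3f}")
print(f"multiple integrations: {p_multiple:.3f}")
print(f"multiple, among infected cells: {frac_multi_among_infected:.3f}")
```

Under this model most cells are uninfected at an MOI of 0.3, and roughly one in seven of the infected cells would carry more than one library candidate.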


To determine the genome-editing capacity of the gene modifying library candidates in the assay, infected BFP-expressing HEK293T or U2OS cells were then transfected by electroporation of 250,000 cells/well with 200 ng of a tgRNA (either g4 or g10) plasmid, designed to convert BFP to GFP, at sufficient cell count for >1000× coverage per library candidate.
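
The coverage requirement translates into a minimum cell and well count. The sketch below is a lower bound only: it assumes each electroporated cell carries one library candidate and ignores cell loss during electroporation and sorting, neither of which is specified in the text.

```python
import math

library_size = 17_446     # candidates in the library, per the text
coverage = 1_000          # cells per candidate (text specifies >1000x)
cells_per_well = 250_000  # cells electroporated per well, per the text

min_cells = library_size * coverage
min_wells = math.ceil(min_cells / cells_per_well)
print(min_cells, min_wells)
```

This gives about 17.4 million cells, or at least 70 wells per condition, before any allowance for losses.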


The g4 tgRNA (5′ to 3′) is as follows: 20 nucleotide spacer region (GCCGAAGCACTGCACGCCGT (SEQ ID NO: 37643)), a scaffold region (GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 37644)), the template region encoding the single base pair substitution to change BFP to GFP (bold) and a PAM inactivation that introduces a synonymous point mutation in the SpyCas9 PAM (NGG to NCG) that prevents re-engagement of the gene modifying polypeptide upon completion of a functional gene modifying reaction (underline) (ACCCTGACGTACG (SEQ ID NO: 37645)), and the 13 nucleotide PBS (GCGTGCAGTGCTT (SEQ ID NO: 37646)). Similarly, the g10 tgRNA (5′ to 3′) is as follows: 20 nucleotide spacer region (AGAAGTCGTGCTGCTTCATG (SEQ ID NO: 37647)), a scaffold region (GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 37644)), the template region encoding the single base pair substitution to change BFP to GFP (bold) and a PAM inactivation that introduces a synonymous point mutation in the SpyCas9 PAM (NGG to NGA) that prevents re-engagement of the gene modifying polypeptide upon completion of a functional gene modifying reaction (underline) (ACCCTGACCTACGGCGTGCAGTGCTTCGGCCGCTACCCCGATCACAT (SEQ ID NO: 37648)), and the 13 nucleotide PBS (GAAGCAGCACGAC (SEQ ID NO: 37649)).
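
The 5′ to 3′ layout of the two tgRNAs can be sketched by concatenating the four parts listed above. The sequences are copied from the text (written as DNA, with the scaffold joined into one string); the function and variable names are hypothetical. The consistency check rests on the usual design principle that the PBS anneals to the nicked target strand, so its reverse complement should occur within the spacer sequence; that principle is assumed here rather than stated in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

SCAFFOLD = ("GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAA"
            "AAGTGGCACCGAGTCGGTGC")

TGRNAS = {
    "g4": {"spacer": "GCCGAAGCACTGCACGCCGT",
           "template": "ACCCTGACGTACG",
           "pbs": "GCGTGCAGTGCTT"},
    "g10": {"spacer": "AGAAGTCGTGCTGCTTCATG",
            "template": "ACCCTGACCTACGGCGTGCAGTGCTTCGGCCGCTACCCCGATCACAT",
            "pbs": "GAAGCAGCACGAC"},
}

def assemble(parts: dict[str, str]) -> str:
    """Concatenate spacer, scaffold, template, and PBS in 5' to 3' order."""
    return parts["spacer"] + SCAFFOLD + parts["template"] + parts["pbs"]

for name, parts in TGRNAS.items():
    full = assemble(parts)
    # Position (0-based) where the PBS target site starts within the spacer;
    # -1 would mean the PBS does not match the spacer at all.
    pbs_site_start = parts["spacer"].find(reverse_complement(parts["pbs"]))
    print(name, len(full), pbs_site_start)
```

For both tgRNAs the reverse complement of the PBS is found inside the spacer, ending three nucleotides before the spacer's 3′ end, which is consistent with priming from the expected SpyCas9 nick position.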


To assess the genome-editing capacity of the various constructs in the assay, cells were sorted by Fluorescence-Activated Cell Sorting (FACS) for GFP expression 6-7 days post-electroporation. Cells were sorted and harvested as distinct populations of unedited (BFP+) cells, edited (GFP+) cells and imperfect edit (BFP−, GFP−) cells. A sample of unsorted cells was also harvested as the input population to determine enrichment during analysis.


To determine which gene modifying library candidates have genome-editing capacity in this assay, genomic DNA (gDNA) was harvested from sorted and unsorted cell populations and analyzed by sequencing the gene modifying library candidates in each population. Briefly, gene modifying sequences were amplified from the genome using primers specific to the lentiviral cassette, amplified in a second round of PCR to dilute genomic DNA, and then sequenced using Oxford Nanopore Sequencing Technology according to the manufacturer's protocol.


After quality control of sequencing reads, reads of at least 1500 and no more than 3200 nucleotides were mapped to the gene modifying polypeptide library sequences and those containing a minimum of an 80% match to a library sequence were considered to be successfully aligned to a given candidate. To identify gene modifying candidates capable of performing gene editing in the assay, the read count of each library candidate in the edited population was compared to its read count in the initial, unsorted population. For purposes of this pooled screen, gene modifying candidates with genome-editing capacity were selected as those candidates that were enriched in the converted (GFP+) population relative to unsorted (input) cells and wherein the enrichment was determined to be at or above the enrichment level of a reference (Element ID No: 17380).
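
The read filter and enrichment comparison can be sketched as below. The read counts and candidate names are placeholders, and normalizing each count by the total reads in its population is an assumption; the text states only that read counts in the edited and unsorted populations were compared against a reference.

```python
def passes_length_filter(read_len: int) -> bool:
    """Keep reads of at least 1500 and no more than 3200 nucleotides."""
    return 1500 <= read_len <= 3200

def enrichment(sorted_count: int, sorted_total: int,
               input_count: int, input_total: int) -> float:
    """Candidate's read fraction in GFP+ cells divided by its fraction in input."""
    return (sorted_count / sorted_total) / (input_count / input_total)

# Placeholder counts: candidate -> (reads in GFP+ population, reads in input)
counts = {"reference": (200, 100), "candA": (450, 150), "candB": (50, 100)}
gfp_total = sum(g for g, _ in counts.values())
input_total = sum(i for _, i in counts.values())

scores = {name: enrichment(g, gfp_total, i, input_total)
          for name, (g, i) in counts.items()}

# Candidates enriched at or above the reference are called as hits.
hits = [name for name, score in scores.items()
        if name != "reference" and score >= scores["reference"]]
print(scores)
print(hits)
```

The 80% identity threshold for assigning a read to a library sequence would be applied by the aligner upstream of this step and is not modeled here.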


A large number of gene modifying polypeptide candidates were determined to be enriched in the GFP+ cell populations. For example, of the 17,446 candidates tested, over 3,300 exhibited enrichment in GFP+ sorted populations (relative to unsorted) that was at least equivalent to that of the reference under similar experimental conditions (HEK293T cells using g4 tgRNA; HEK293T cells using g10 tgRNA; or U2OS cells using g4 tgRNA), as shown in Table D. Although the 17,446 candidates were also tested in U2OS cells using g10 tgRNA, the pooled screen did not yield candidates that were enriched in the converted (GFP+) population relative to unsorted (input) cells under that experimental condition; further investigation is required to explain these results.









TABLE D

Combinations of linker and RT sequences screened. The amino acid sequence of each RT in this table is provided in Table 6.

Linker amino acid sequence | SEQ ID NO: | RT domain name
EAAAKGSS | 12,001 | PERV_Q4VFZ2_3mutA_WS
EAAAKEAAAKEAAAKEAAAK | 12,002 | MLVMS_P03355_PLV919
PAPEAAAK | 12,003 | MLVFF_P26809_3mutA
EAAAKPAPGGG | 12,004 | MLVFF_P26809_3mutA
GSSGSSGSSGSSGSSGSS | 12,005 | PERV_Q4VFZ2_3mut
PAPGGGEAAAK | 12,006 | MLVAV_P03356_3mutA
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 12,007 | MLVMS_P03355_PLV919
GSSEAAAK | 12,008 | MLVFF_P26809_3mutA
EAAAKPAPGGS | 12,009 | MLVFF_P26809_3mutA
GGSGGSGGSGGSGGSGGS | 12,010 | MLVFF_P26809_3mutA
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 12,011 | XMRV6_A1Z651_3mutA
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 12,012 | PERV_Q4VFZ2_3mutA_WS
EAAAKEAAAKEAAAK | 12,013 | MLVFF_P26809_3mutA
PAPEAAAKGSS | 12,014 | MLVFF_P26809_3mutA
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 12,015 | PERV_Q4VFZ2_3mutA_WS
EAAAKEAAAKEAAAK | 12,016 | PERV_Q4VFZ2_3mutA_WS
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 12,017 | AVIRE_P03360_3mutA
PAPAPAPAPAP | 12,018 | MLVCB_P08361_3mutA
PAPAPAPAPAP | 12,019 | MLVFF_P26809_3mutA
EAAAKGGSPAP | 12,020 | PERV_Q4VFZ2_3mutA_WS
PAP |  | MLVMS_P03355_PLV919
PAPGGGGSS | 12,022 | WMSV_P03359_3mutA
SGSETPGTSESATPES | 12,023 | MLVFF_P26809_3mutA
PAPEAAAKGSS | 12,024 | XMRV6_A1Z651_3mutA
EAAAKGGSGGG | 12,025 | MLVMS_P03355_PLV919
GGGGSGGGGS | 12,026 | MLVFF_P26809_3mutA
GGGPAPGSS | 12,027 | MLVAV_P03356_3mutA
GGSGGSGGSGGSGGSGGS | 12,028 | XMRV6_A1Z651_3mut
GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS | 12,029 | MLVCB_P08361_3mutA
GSSPAP | 12,030 | AVIRE_P03360_3mutA
EAAAKGSSPAP | 12,031 | MLVFF_P26809_3mutA
GSSGGGEAAAK | 12,032 | MLVFF_P26809_3mutA
GGSGGSGGSGGSGGSGGS | 12,033 | MLVMS_P03355_3mutA_WS
PAPAPAPAP | 12,034 | MLVFF_P26809_3mutA
EAAAKEAAAKEAAAKEAAAK | 12,035 | XMRV6_A1Z651_3mutA
EAAAKGGSPAP | 12,036 | MLVMS_P03355_3mutA_WS
PAPGGSEAAAK | 12,037 | AVIRE_P03360_3mutA
GGGGGGGGSGGGGSGGGGSGGGGSGGGGS | 12,038 | AVIRE_P03360_3mutA
EAAAKGGGGSEAAAK | 12,039 | MLVCB_P08361_3mutA
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 12,040 | WMSV_P03359_3mutA
GSS |  | MLVMS_P03355_PLV919
GSSGSSGSSGSS | 12,042 | MLVMS_P03355_PLV919
GSSPAPEAAAK | 12,043 | XMRV6_A1Z651_3mutA
GGSPAPEAAAK | 12,044 | MLVFF_P26809_3mutA
GGGEAAAKGGS | 12,045 | MLVFF_P26809_3mutA
EAAAKEAAAKEAAAKEAAAKEAAAK | 12,046 | PERV_Q4VFZ2_3mutA_WS
GGGGGGGG | 12,047 | PERV_Q4VFZ2_3mut
GGGPAP | 12,048 | MLVCB_P08361_3mutA
PAPAPAPAPAPAP | 12,049 | MLVCB_P08361_3mutA
GGSGGSGGSGGSGGSGGS | 12,050 | MLVCB_P08361_3mutA
PAP |  | MLVMS_P03355_3mutA_WS
GGSGGSGGSGGSGGSGGS | 12,052 | PERV_Q4VFZ2_3mutA_WS
PAPAPAPAPAPAP | 12,053 | MLVMS_P03355_PLV919
EAAAKPAPGSS | 12,054 | MLVMS_P03355_3mutA_WS
EAAAKEAAAKEAAAKEAAAK | 12,055 | MLVMS_P03355_3mutA_WS
EAAAKGGS | 12,056 | MLVMS_P03355_3mutA_WS
GGGGSEAAAKGGGGS | 12,057 | MLVFF_P26809_3mutA
EAAAKPAPGSS | 12,058 | MLVFF_P26809_3mutA
GGGGSGGGGSGGGGSGGGGS | 12,059 | MLVMS_P03355_PLV919
EAAAKGGGGGS | 12,060 | MLVMS_P03355_PLV919
GGSPAP | 12,061 | XMRV6_A1Z651_3mutA
EAAAKGGGPAP | 12,062 | MLVMS_P03355_PLV919
EAAAKEAAAKEAAAKEAAAKEAAAK | 12,063 | MLVFF_P26809_3mutA
PAP |  | MLVCB_P08361_3mutA
EAAAK | 12,065 | XMRV6_A1Z651_3mutA
GGSGSSPAP | 12,066 | PERV_Q4VFZ2_3mutA_WS
GSSGSSGSSGSSGSSGSS | 12,067 | MLVMS_P03355_PLV919
GSSEAAAKGGG | 12,068 | MLVAV_P03356_3mutA
GGGEAAAKGGS | 12,069 | XMRV6_A1Z651_3mutA
EAAAKGGGGSEAAAK | 12,070 | MLVAV_P03356_3mutA
GGGGSGGGGSGGGGS | 12,071 | MLVFF_P26809_3mutA
GGGGSGGGGSGGGGSGGGGS | 12,072 | AVIRE_P03360_3mutA
SGSETPGTSESATPES | 12,073 | AVIRE_P03360_3mutA
GGGEAAAKPAP | 12,074 | MLVFF_P26809_3mutA
EAAAKGSSGGG | 12,075 | MLVMS_P03355_3mutA_WS
EAAAKEAAAKEAAAKEAAAKEAAAK | 12,076 | WMSV_P03359_3mut
GGSGGSGGSGGS | 12,077 | XMRV6_A1Z651_3mutA
GGSEAAAKPAP | 12,078 | MLVFF_P26809_3mutA
EAAAKGSSGGG | 12,079 | XMRV6_A1Z651_3mutA
GGGGS | 12,080 | MLVFF_P26809_3mutA
GGGEAAAKGSS | 12,081 | MLVMS_P03355_PLV919
PAPAPAPAPAPAP | 12,082 | MLVAV_P03356_3mutA
GGGGSGGGGSGGGGSGGGGS | 12,083 | MLVCB_P08361_3mutA
GGGEAAAKGSS | 12,084 | MLVCB_P08361_3mutA
PAPGGSGSS | 12,085 | MLVFF_P26809_3mutA
GSAGSAAGSGEF | 12,086 | MLVCB_P08361_3mutA
PAPGGSEAAAK | 12,087 | MLVMS_P03355_3mutA_WS
GGSGSS | 12,088 | XMRV6_A1Z651_3mutA
PAPGGGGSS | 12,089 | MLVMS_P03355_PLV919
GSSGSSGSS | 12,090 | XMRV6_A1Z651_3mut
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 12,091 | MLVMS_P03355_3mutA_WS
EAAAK | 12,092 | MLVMS_P03355_PLV919
GSSGSSGSSGSS | 12,093 | MLVFF_P26809_3mutA
PAPGGGGSS | 12,094 | MLVCB_P08361_3mutA
GGGEAAAKGGS | 12,095 | MLVCB_P08361_3mutA
PAPGGGEAAAK | 12,096 | MLVMS_P03355_PLV919
GGGGGSPAP | 12,097 | XMRV6_A1Z651_3mutA
EAAAKGGS | 12,098 | XMRV6_A1Z651_3mutA
EAAAKGSSPAP | 12,099 | XMRV6_A1Z651_3mut
PAPEAAAK | 12,100 | MLVAV_P03356_3mutA
GGSGGSGGSGGS | 12,101 | MLVMS_P03355_3mutA_WS
GGGPAPGGS | 12,102 | MLVMS_P03355_PLV919
GSSGSSGSSGSS | 12,103 | PERV_Q4VFZ2_3mutA_WS
EAAAKPAPGGS | 12,104 | MLVCB_P08361_3mutA
GSSGSS | 12,105 | MLVFF_P26809_3mutA
EAAAKEAAAKEAAAKEAAAK | 12,106 | MLVCB_P08361_3mutA
EAAAKEAAAKEAAAKEAAAK | 12,107 | FLV_P10273_3mutA
GSS |  | MLVFF_P26809_3mutA
EAAAKEAAAK | 12,109 | MLVMS_P03355_3mutA_WS
PAPEAAAKGGG | 12,110 | MLVAV_P03356_3mutA
GGSGSSEAAAK | 12,111 | MLVFF_P26809_3mutA
EAAAKEAAAKEAAAKEAAAKEAAAK | 12,112 | PERV_Q4VFZ2
GSSEAAAKPAP | 12,113 | AVIRE_P03360_3mutA
EAAAKEAAAKEAAAKEAAAKEAAAK | 12,114 | MLVCB_P08361_3mutA
EAAAKGGG | 12,115 | MLVFF_P26809_3mutA
GSSPAPGGG | 12,116 | MLVCB_P08361_3mutA
GGGPAPGSS | 12,117 | MLVMS_P03355_PLV919
GGGGGS | 12,118 | MLVMS_P03355_3mutA_WS
EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK | 12,119 | PERV_Q4VFZ2_3mut
GGGGSGGGGSGGGGSGGGGSGGGGS | 12,120 | WMSV_P03359_3mutA
EAAAKEAAAKEAAAK | 12,121 | PERV_Q4VFZ2_3mut
PAPAPAPAP | 12,122 | MLVCB_P08361_3mutA
GSSGSSGSSGSSGSS | 12,123 | PERV_Q4VFZ2_3mut
GGGGSSEAAAK | 12,124 | MLVMS_P03355_3mutA_WS
GGSGGSGGSGGS | 12,125 | MLVCB_P08361_3mutA
PAPEAAAKGGS | 12,126 | MLVCB_P08361_3mutA
EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK | 12,127 | MLVCB_P08361_3mutA
EAAAKGGGGSEAAAK | 12,128 | MLVMS_P03355_PLV919
EAAAKGGGGSEAAAK | 12,129 | MLVMS_P03355_3mutA_WS
EAAAKGGGPAP | 12,130 | XMRV6_A1Z651_3mut
EAAAKEAAAKEAAAKEAAAKEAAAK | 12,131 | MLVMS_P03355_3mutA_WS
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 12,132 | FLV_P10273_3mutA
GGSEAAAKGGG | 12,133 | MLVMS_P03355_3mutA_WS
GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS | 12,134 | KORV_Q9TTC1-Pro_3mutA
GGGPAPGGS | 12,135 | MLVCB_P08361_3mutA
PAPAPAPAPAPAP | 12,136 | XMRV6_A1Z651_3mutA
GGSGSSGGG | 12,137 | XMRV6_A1Z651_3mutA
GGSGSSGGG | 12,138 | MLVCB_P08361_3mutA
GGGEAAAKGGS | 12,139 | MLVMS_P03355_3mutA_WS
EAAAK | 12,140 | MLVCB_P08361_3mutA
GGSPAPGSS | 12,141 | MLVMS_P03355_3mutA_WS
GGGGSSEAAAK | 12,142 | PERV_Q4VFZ2_3mut
PAPAPAPAPAP | 12,143 | MLVBM_Q7SVK7_3mut
EAAAKEAAAKEAAAKEAAAK | 12,144 | MLVAV_P03356_3mutA
GGGGGSGSS | 12,145 | MLVCB_P08361_3mutA
EAAAKGSSPAP | 12,146 | MLVMS_P03355_3mutA_WS
PAPAPAPAPAPAP | 12,147 | MLVMS_P03355_3mutA_WS
GSSGGGGGS | 12,148 | MLVMS_P03355_3mutA_WS
PAPGSSGGG | 12,149 | MLVMS_P03355_PLV919
GGSGGGPAP | 12,150 | MLVCB_P08361_3mutA
GGGGGGG | 12,151 | MLVCB_P08361_3mutA
GSSGSSGSSGSSGSSGSS | 12,152 | MLVCB_P08361_3mutA
GGGPAPGGS | 12,153 | MLVFF_P26809_3mutA
EAAAKGGSGGG | 12,154 | PERV_Q4VFZ2_3mut
EAAAKGGGGSS | 12,155 | MLVMS_P03355_3mutA_WS
GSSGSSGSSGSSGSSGSS | 12,156 | MLVMS_P03355_3mut
GGGGSGGGGGGGGSGGGGS | 12,157 | MLVBM_Q7SVK7_3mutA_WS
PAPAPAPAPAP | 12,158 | MLVMS_P03355_PLV919
GGGEAAAKGGS | 12,159 | MLVMS_P03355_PLV919
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 12,160 | MLVMS_P03355_3mut
GSAGSAAGSGEF | 12,161 | MLVMS_P03355_3mutA_WS
GSSGSSGSSGSSGSS | 12,162 | MLVFF_P26809_3mutA
EAAAKGGSGSS | 12,163 | MLVFF_P26809_3mutA
PAPGGG | 12,164 | MLVFF_P26809_3mutA
GGGPAPGSS | 12,165 | XMRV6_A1Z651_3mutA
PAPEAAAKGGS | 12,166 | AVIRE_P03360_3mutA
PAPGGGEAAAK | 12,167 | MLVFF_P26809_3mut
GGGGSSEAAAK | 12,168 | MLVCB_P08361_3mutA
EAAAK | 12,169 | MLVMS_P03355_PLV919
GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS | 12,170 | BAEVM_P10272_3mutA
GGSGGGEAAAK | 12,171 | MLVMS_P03355_PLV919
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 12,172 | MLVFF_P26809_3mutA
GSSPAPGGS | 12,173 | XMRV6_A1Z651_3mutA
GGSGGGPAP | 12,174 | MLVMS_P03355_PLV919
EAAAK | 12,175 | AVIRE_P03360_3mutA
GSS |  | XMRV6_A1Z651_3mutA
GGSGGSGGS | 12,177 | MLVFF_P26809_3mutA
EAAAKEAAAKEAAAKEAAAK | 12,178 | AVIRE_P03360_3mut
PAPEAAAKGGG | 12,179 | PERV_Q4VFZ2_3mutA_WS
GGGGGSEAAAK | 12,180 | BAEVM_P10272_3mutA
GGSGSSGGG | 12,181 | MLVMS_P03355_3mutA_WS
GGGGGGG | 12,182 | MLVMS_P03355_3mutA_WS
GSSEAAAKPAP | 12,183 | PERV_Q4VFZ2_3mut
GGGGGSEAAAK | 12,184 | WMSV_P03359_3mut
GGGGSGGGGSGGGGSGGGGSGGGGS | 12,185 | MLVFF_P26809_3mut
GGGEAAAKGGS | 12,186 | AVIRE_P03360_3mutA
GGSPAPGGG | 12,187 | AVIRE_P03360_3mutA
GSAGSAAGSGEF | 12,188 | MLVAV_P03356_3mutA
EAAAK | 12,189 | MLVAV_P03356_3mutA
EAAAKPAPGSS | 12,190 | WMSV_P03359_3mutA
EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK | 12,191 | PERV_Q4VFZ2_3mutA_WS
GGSEAAAKPAP | 12,192 | MLVCB_P08361_3mutA
PAPAPAPAPAPAP | 12,193 | MLVBM_Q7SVK7_3mutA_WS
GGSPAPGGG | 12,194 | MLVMS_P03355_3mutA_WS
GGSEAAAKGGG | 12,195 | MLVMS_P03355_3mut
GGSGGSGGSGGS | 12,196 | MLVFF_P26809_3mutA
EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK | 12,197 | MLVFF_P26809_3mutA
GGG |  | AVIRE_P03360_3mutA
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 12,199 | PERV_Q4VFZ2_3mut
GGSGGSGGSGGS | 12,200 | MLVMS_P03355_3mutA_WS
GGGEAAAK | 12,201 | MLVCB_P08361_3mutA
GSSGSSGSSGSSGSSGSS | 12,202 | MLVMS_P03355_3mutA_WS
GSSGGGPAP | 12,203 | MLVMS_P03355_3mutA_WS
GSSEAAAKPAP | 12,204 | MLVFF_P26809_3mutA
EAAAKEAAAK | 12,205 | MLVMS_P03355_PLV919
GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS | 12,206 | MLVCB_P08361_3mut
GGGGGG | 12,207 | MLVMS_P03355_3mutA_WS
GGSGSSGGG | 12,208 | MLVFF_P26809_3mutA
GSSGGGEAAAK | 12,209 | PERV_Q4VFZ2_3mutA_WS
PAPAPAPAPAP | 12,210 | PERV_Q4VFZ2_3mut
EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK | 12,211 | SFV3L_P27401_2mut
EAAAKGGSGGG | 12,212 | BAEVM_P10272_3mutA
GGGGSSPAP | 12,213 | PERV_Q4VFZ2_3mutA_WS
GGGEAAAKPAP | 12,214 | MLVMS_P03355_PLV919
GGSGGGPAP | 12,215 | BAEVM_P10272_3mutA
PAPGSSGGS | 12,216 | MLVMS_P03355_PLV919
GGSGGGPAP | 12,217 | MLVMS_P03355_3mutA_WS
EAAAKGGSPAP | 12,218 | PERV_Q4VFZ2_3mutA_WS
EAAAKGGSGGG | 12,219 | MLVMS_P03355_3mutA_WS
PAPGSSGGG | 12,220 | MLVFF_P26809_3mutA
GSSEAAAKGGS | 12,221 | MLVFF_P26809_3mutA
PAPGSSEAAAK | 12,222 | MLVFF_P26809_3mutA
EAAAKGSSPAP | 12,223 | KORV_Q9TTC1-Pro_3mutA
EAAAKEAAAKEAAAKEAAAKEAAAK | 12,224 | MLVBM_Q7SVK7_3mutA_WS
PAPGSSEAAAK | 12,225 | MLVMS_P03355_PLV919
EAAAKGSSGGG | 12,226 | MLVMS_P03355_3mutA_WS
EAAAKGGGGGS | 12,227 | AVIRE_P03360_3mutA
EAAAKEAAAKEAAAK | 12,228 | MLVMS_P03355_PLV919
PAPAPAPAPAPAP | 12,229 | MLVFF_P26809_3mutA
GGGGSGGGGSGGGGS | 12,230 | MLVCB_P08361_3mutA
PAPGGSEAAAK | 12,231 | MLVCB_P08361_3mutA
PAPGSSEAAAK | 12,232 | MLVBM_Q7SVK7_3mutA_WS
PAPEAAAKGSS | 12,233 | AVIRE_P03360_3mutA
GGSPAPGSS | 12,234 | WMSV_P03359_3mutA
PAPGGSGGG | 12,235 | MLVMS_P03355_PLV919
EAAAKGGSGSS | 12,236 | MLVMS_P03355_3mutA_WS
GGSGGG | 12,237 | MLVFF_P26809_3mutA
GGSEAAAKGSS | 12,238 | KORV_Q9TTC1_3mutA
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA | 12,239 | MLVCB_P08361_3mutA
PAPAPAPAPAPAP | 12,240 | PERV_Q4VFZ2_3mutA_WS
PAPEAAAK | 12,241 | MLVMS_P03355_3mutA_WS
GGSEAAAKGGG | 12,242 | MLVMS_P03355_PLV919
GSSPAP | 12,243 | MLVMS_P03355_3mutA_WS
GGGGSS | 12,244 | MLVMS_P03355_PLV919
GGGEAAAKPAP | 12,245 | AVIRE_P03360_3mutA





EAAAKPAPGGS
12,246
MLVAV_P03356_3mutA





EAAAKGGGPAP
12,247
MLVAV_P03356_3mutA





PAPGGSEAAAK
12,248
BAEVM_P10272_3mutA





PAPGGSGSS
12,249
MLVMS_P03355_3mutA_WS





PAPGGSGSS
12,250
AVIRE_P03360_3mutA





GGSGGGPAP
12,251
MLVMS_P03355_3mutA_WS





EAAAKEAAAKEAAAKEAAAK
12,252
BAEVM_P10272_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGS
12,253
MLVMS_P03355_PLV919





GGGGSSPAP
12,254
MLVCB_P08361_3mutA





GSSGGGPAP
12,255
MLVFF_P26809_3mutA





GGGGSSGGS
12,256
MLVMS_P03355_PLV919





GGSGGG
12,257
MLVCB_P08361_3mutA





GSSGGGGGS
12,258
MLVMS_P03355_PLV919





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
12,259
XMRV6_A1Z651_3mutA





GGGGGSGSS
12,260
KORV_Q9TTC1_3mut





GGGEAAAKGGS
12,261
BAEVM_P10272_3mutA





GGSGGG
12,262
BAEVM_P10272_3mutA





PAPAPAP
12,263
KORV_Q9TTC1-Pro_3mut





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,264
SFV3L_P27401_2mutA





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,265
MLVBM_Q7SVK7_3mutA_WS





GSSGSSGSSGSSGSS
12,266
MLVMS_P03355_3mutA_WS





GSSGGGEAAAK
12,267
MLVMS_P03355_3mutA_WS





GSSGGSEAAAK
12,268
MLVFF_P26809_3mutA





PAP

MLVMS_P03355_PLV919





EAAAKGGGGSEAAAK
12,270
MLVBM_Q7SVK7_3mutA_WS





PAPAP
12,271
AVIRE_P03360_3mutA





PAP

MLVFF_P26809_3mutA





GSSGGG
12,273
MLVMS_P03355_3mut





GSSPAPGGS
12,274
MLVFF_P26809_3mutA





PAPAPAPAP
12,275
XMRV6_A1Z651_3mutA





EAAAKGSSGGS
12,276
PERV_Q4VFZ2_3mut





PAPEAAAKGGG
12,277
KORV_Q9TTC1-Pro_3mutA





PAPGGS
12,278
MLVCB_P08361_3mutA





EAAAKGGG
12,279
MLVCB_P08361_3mutA





GSSEAAAKPAP
12,280
MLVMS_P03355_PLV919





PAPGGS
12,281
MLVFF_P26809_3mutA





EAAAKGGS
12,282
MLVCB_P08361_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
12,283
FLV_P10273_3mutA





PAPGGSEAAAK
12,284
MLVAV_P03356_3mutA





GSS

MLVCB_P08361_3mutA





GSSGSSGSSGSS
12,286
AVIRE_P03360_3mutA





GSSGSSGSS
12,287
MLVFF_P26809_3mutA





GSSGGG
12,288
MLVMS_P03355_PLV919





EAAAK
12,289
MLVFF_P26809_3mutA





GGSPAPEAAAK
12,290
MLVCB_P08361_3mutA





GGSGSS
12,291
MLVCB_P08361_3mutA





GSSPAPGGG
12,292
MLVMS_P03355_PLV919





EAAAKEAAAKEAAAKEAAAKEAAAK
12,293
MLVAV_P03356_3mutA





EAAAKGSSPAP
12,294
FLV_P10273_3mutA





GGGGSS
12,295
XMRV6_A1Z651_3mutA





GGSPAPGSS
12,296
MLVMS_P03355_PLV919





EAAAKEAAAKEAAAKEAAAKEAAAK
12,297
MLVMS_P03355_3mutA_WS





PAPEAAAKGGG
12,298
FLV_P10273_3mutA





EAAAKPAPGGS
12,299
XMRV6_A1Z651_3mut





PAPAP
12,300
BAEVM_P10272_3mutA





EAAAKEAAAKEAAAKEAAAK
12,301
MLVMS_P03355_PLV919





GSSPAPGGG
12,302
MLVMS_P03355_PLV919





EAAAKGGGPAP
12,303
KORV_Q9TTC1_3mutA





PAPEAAAK
12,304
MLVMS_P03355_PLV919





PAPGGGEAAAK
12,305
PERV_Q4VFZ2_3mutA_WS





EAAAKGSSGGS
12,306
MLVMS_P03355_3mutA_WS





EAAAKEAAAKEAAAK
12,307
MLVMS_P03355_PLV919





GSSEAAAK
12,308
MLVMS_P03355_3mutA_WS





GSSGSSGSSGSS
12,309
MLVMS_P03355_3mutA_WS





GGGGSGGGGSGGGGSGGGGS
12,310
MLVMS_P03355_3mutA_WS





EAAAKGGGGSEAAAK
12,311
MLVMS_P03355_3mut





GGS

MLVCB_P08361_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
12,313
XMRV6_A1Z651_3mutA





GGSGSSPAP
12,314
MLVCB_P08361_3mutA





GGGGSGGGGSGGGGS
12,315
XMRV6_A1Z651_3mutA





PAPAPAPAPAP
12,316
BAEVM_P10272_3mutA





PAPAPAPAPAP
12,317
MLVMS_P03355_3mutA_WS





EAAAKEAAAKEAAAKEAAAK
12,318
MLVBM_Q7SVK7_3mut





GGGGSGGGGSGGGGSGGGGSGGGGS
12,319
BAEVM_P10272_3mutA





GGSGGSGGS
12,320
MLVMS_P03355_3mutA_WS





EAAAKPAPGSS
12,321
MLVMS_P03355_PLV919





GSS

MLVMS_P03355_3mutA_WS





PAPEAAAKGGS
12,323
MLVMS_P03355_3mutA_WS





GGGPAPGGS
12,324
MLVMS_P03355_3mutA_WS





EAAAKGGGGSS
12,325
MLVAV_P03356_3mutA





GSSGSSGSSGSSGSS
12,326
MLVFF_P26809_3mut





SGSETPGTSESATPES
12,327
PERV_Q4VFZ2_3mut





GGSEAAAKGGG
12,328
MLVMS_P03355_3mut





GSSGSSGSSGSSGSSGSS
12,329
AVIRE_P03360_3mutA





PAPAPAPAPAPAP
12,330
AVIRE_P03360_3mut





GGSGGS
12,331
XMRV6_A1Z651_3mutA





PAPGSSEAAAK
12,332
MLVCB_P08361_3mut





GGSPAPEAAAK
12,333
PERV_Q4VFZ2_3mut





EAAAKGGGGGS
12,334
MLVCB_P08361_3mutA





GGSGGSGGSGGS
12,335
MLVMS_P03355_PLV919





GGGGSSEAAAK
12,336
MLVMS_P03355_PLV919





GSSEAAAKGGG
12,337
MLVFF_P26809_3mutA





PAPGGS
12,338
MLVMS_P03355_3mutA_WS





EAAAKGGSGGG
12,339
MLVCB_P08361_3mutA





EAAAKGGG
12,340
PERV_Q4VFZ2_3mut





PAPGGS
12,341
XMRV6_A1Z651_3mutA





GSSPAPGGG
12,342
XMRV6_A1Z651_3mutA





PAPEAAAKGGG
12,343
MLVMS_P03355_3mutA_WS





GSSEAAAKGGG
12,344
PERV_Q4VFZ2_3mutA_WS





PAPGGSEAAAK
12,345
XMRV6_A1Z651_3mutA





GGGGGS
12,346
MLVMS_P03355_3mutA_WS





GGSPAPEAAAK
12,347
MLVMS_P03355_3mutA_WS





GGGPAP
12,348
MLVFF_P26809_3mutA





PAPGSSGGG
12,349
XMRV6_A1Z651_3mutA





PAPGSSGGG
12,350
MLVBM_Q7SVK7_3mutA_WS





GGGEAAAKGSS
12,351
MLVMS_P03355_3mutA_WS





GSSEAAAKGGS
12,352
MLVCB_P08361_3mutA





PAPGGSGSS
12,353
MLVCB_P08361_3mutA





EAAAKGGGGSEAAAK
12,354
BAEVM_P10272_3mutA





PAPAPAP
12,355
PERV_Q4VFZ2_3mutA_WS





GGGGGG
12,356
MLVAV_P03356_3mutA





GSSPAPEAAAK
12,357
MLVCB_P08361_3mutA





GGSGGSGGS
12,358
MLVMS_P03355_3mutA_WS





GSSGSSGSSGSSGSS
12,359
XMRV6_A1Z651_3mut





GGGPAPGGS
12,360
XMRV6_A1Z651_3mutA





GGGPAPEAAAK
12,361
BAEVM_P10272_3mutA





GGSGGG
12,362
AVIRE_P03360_3mutA





SGSETPGTSESATPES
12,363
PERV_Q4VFZ2_3mutA_WS





EAAAKGSSPAP
12,364
MLVMS_P03355_PLV919





GSSEAAAK
12,365
XMRV6_A1Z651_3mut





GSSGGSGGG
12,366
MLVFF_P26809_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAK
12,367
WMSV_P03359_3mutA





GGGGSEAAAKGGGGS
12,368
MLVMS_P03355_PLV919





PAPGGGGSS
12,369
MLVMS_P03355_3mutA_WS





SGSETPGTSESATPES
12,370
MLVMS_P03355_3mutA_WS





GGSPAPEAAAK
12,371
KORV_Q9TTC1-Pro_3mutA





GSSEAAAKGGG
12,372
MLVMS_P03355_3mutA_WS





GSSEAAAK
12,373
WMSV_P03359_3mutA





GGGGSEAAAKGGGGS
12,374
AVIRE_P03360_3mutA





GSS

WMSV_P03359_3mutA





PAPGGSEAAAK
12,376
MLVFF_P26809_3mutA





GGGGS
12,377
MLVMS_P03355_3mutA_WS





GGGPAP
12,378
MLVMS_P03355_3mutA_WS





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
12,379
MLVMS_P03355_3mutA_WS





EAAAKPAPGSS
12,380
PERV_Q4VFZ2_3mut





EAAAKPAPGSS
12,381
MLVCB_P08361_3mutA





GGGGGG
12,382
WMSV_P03359_3mutA





EAAAKPAPGGS
12,383
MLVMS_P03355_PLV919





PAPGGGEAAAK
12,384
PERV_Q4VFZ2_3mut





EAAAKEAAAKEAAAKEAAAKEAAAK
12,385
AVIRE_P03360_3mutA





GSSEAAAKPAP
12,386
XMRV6_A1Z651_3mutA





PAPGGSEAAAK
12,387
MLVBM_Q7SVK7_3mutA_WS





PAPGSS
12,388
MLVCB_P08361_3mutA





EAAAKGGG
12,389
MLVMS_P03355_3mutA_WS





EAAAKPAP
12,390
MLVCB_P08361_3mutA





PAPEAAAKGGS
12,391
MLVBM_Q7SVK7_3mutA_WS





GGSPAPGGG
12,392
MLVCB_P08361_3mutA





PAPGGSGSS
12,393
WMSV_P03359_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
12,394
MLVMS_P03355_PLV919





GGSGGGPAP
12,395
MLVMS_P03355_PLV919





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,396
MLVMS_P03355





PAPEAAAKGSS
12,397
MLVCB_P08361_3mutA





EAAAKGSS
12,398
MLVMS_P03355_3mutA_WS





GGSGGS
12,399
MLVMS_P03355_3mutA_WS





EAAAKEAAAKEAAAKEAAAKEAAAK
12,400
BAEVM_P10272_3mutA





GGGGSEAAAKGGGGS
12,401
FLV_P10273_3mutA





GGSEAAAKGGG
12,402
MLVCB_P08361_3mutA





GSSGSSGSSGSSGSS
12,403
BAEVM_P10272_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
12,404
MLVFF_P26809_3mutA





EAAAKGGG
12,405
PERV_Q4VFZ2_3mut





GGGGGSEAAAK
12,406
MLVCB_P08361_3mutA





EAAAKPAPGGS
12,407
MLVMS_P03355_3mutA_WS





GGGGGSGSS
12,408
XMRV6_A1Z651_3mutA





PAPGSSEAAAK
12,409
MLVMS_P03355_3mutA_WS





GSSEAAAKPAP
12,410
MLVCB_P08361_3mutA





EAAAKGSSPAP
12,411
MLVAV_P03356_3mutA





GGGPAPGGS
12,412
WMSV_P03359_3mutA





GGSPAP
12,413
MLVMS_P03355_3mutA_WS





GGSEAAAKGGG
12,414
MLVMS_P03355_3mutA_WS





GGGGGGGG
12,415
MLVFF_P26809_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
12,416
MLVMS_P03355_3mutA_WS





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
12,417
MLVBM_Q7SVK7_3mutA_WS





GSSPAPGGG
12,418
MLVAV_P03356_3mutA





GGGGGG
12,419
AVIRE_P03360_3mutA





GSSGGS
12,420
MLVMS_P03355_3mutA_WS





GGSPAPGSS
12,421
MLVFF_P26809_3mutA





PAPEAAAKGGG
12,422
PERV_Q4VFZ2_3mut





EAAAKGGGPAP
12,423
MLVFF_P26809_3mutA





GGGEAAAKGGS
12,424
MLVMS_P03355_PLV919





GGSGSSPAP
12,425
MLVFF_P26809_3mutA





SGSETPGTSESATPES
12,426
WMSV_P03359_3mutA





PAPGGSEAAAK
12,427
MLVBM_Q7SVK7_3mutA_WS





GGSGGG
12,428
MLVMS_P03355_PLV919





GGGGSSPAP
12,429
PERV_Q4VFZ2_3mut





GGGEAAAKGSS
12,430
MLVAV_P03356_3mutA





PAPAPAPAPAPAP
12,431
MLVMS_P03355_3mutA_WS





EAAAKGGGGSEAAAK
12,432
PERV_Q4VFZ2





EAAAKEAAAKEAAAKEAAAKEAAAK
12,433
MLVMS_P03355_PLV919





GGGGGSEAAAK
12,434
PERV_Q4VFZ2_3mut





PAPGSSEAAAK
12,435
MLVCB_P08361_3mutA





GSAGSAAGSGEF
12,436
PERV_Q4VFZ2_3mutA_WS





EAAAKGGGGSEAAAK
12,437
MLVFF_P26809_3mutA





GGSPAPGGG
12,438
PERV_Q4VFZ2_3mutA_WS





GSSEAAAKGGG
12,439
AVIRE_P03360_3mutA





GGGEAAAKPAP
12,440
MLVMS_P03355_3mutA_WS





GGGPAP
12,441
AVIRE_P03360_3mutA





GGSEAAAK
12,442
MLVCB_P08361_3mutA





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
12,443
PERV_Q4VFZ2_3mut





EAAAKPAPGGS
12,444
MLVBM_Q7SVK7_3mutA_WS





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,445
XMRV6_A1Z651_3mut





GGGGGGGG
12,446
MLVCB_P08361_3mutA





PAPGSS
12,447
PERV_Q4VFZ2_3mut





EAAAK
12,448
PERV_Q4VFZ2_3mut





GSAGSAAGSGEF
12,449
MLVMS_P03355_3mutA_WS





PAPGGGEAAAK
12,450
PERV_Q4VFZ2_3mut





EAAAKGSSGGS
12,451
MLVFF_P26809_3mut





GGGGSEAAAKGGGGS
12,452
BAEVM_P10272_3mutA





GGGGSGGGGSGGGGS
12,453
MLVMS_P03355_PLV919





EAAAKGGGGSEAAAK
12,454
BAEVM_P10272_3mut





PAPGGGEAAAK
12,455
MLVMS_P03355_3mutA_WS





GGSEAAAKPAP
12,456
MLVMS_P03355_3mutA_WS





PAPAP
12,457
MLVCB_P08361_3mutA





PAPAP
12,458
MLVFF_P26809_3mutA





GGSPAP
12,459
AVIRE_P03360_3mutA





EAAAKGSSGGS
12,460
MLVCB_P08361_3mutA





PAPGSSGGS
12,461
AVIRE_P03360_3mutA





EAAAKGGGGSEAAAK
12,462
XMRV6_A1Z651_3mutA





PAPAPAP
12,463
BAEVM_P10272_3mutA





GGSGGSGGSGGSGGSGGS
12,464
MLVMS_P03355_PLV919





GGGGGSGSS
12,465
MLVMS_P03355_PLV919





PAPGSSEAAAK
12,466
XMRV6_A1Z651_3mut





GGSEAAAKPAP
12,467
XMRV6_A1Z651_3mutA





EAAAKEAAAKEAAAKEAAAK
12,468
XMRV6_A1Z651_3mut





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,469
WMSV_P03359_3mut





GGSGGGEAAAK
12,470
XMRV6_A1Z651_3mutA





GGGEAAAK
12,471
XMRV6_A1Z651_3mutA





GGGGSGGGGSGGGGS
12,472
MLVMS_P03355_3mutA_WS





GGSGGSGGSGGSGGS
12,473
MLVFF_P26809_3mutA





GSSGGGGGS
12,474
MLVMS_P03355_3mut





PAPGGSEAAAK
12,475
MLVMS_P03355_3mutA_WS





GSSGGSPAP
12,476
MLVMS_P03355_3mutA_WS





SGSETPGTSESATPES
12,477
XMRV6_A1Z651_3mutA





GGGGSGGGGS
12,478
MLVMS_P03355_PLV919





PAPAPAPAPAP
12,479
MLVMS_P03355_3mut





GSSGSS
12,480
XMRV6_A1Z651_3mutA





GSSEAAAKPAP
12,481
PERV_Q4VFZ2_3mut





GGSGSSGGG
12,482
MLVMS_P03355_3mutA_WS





EAAAKEAAAK
12,483
MLVCB_P08361_3mutA





GSSGSSGSSGSS
12,484
MLVMS_P03355_3mutA_WS





GSSPAPGGG
12,485
PERV_Q4VFZ2_3mutA_WS





EAAAKEAAAKEAAAK
12,486
MLVMS_P03355_3mutA_WS





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,487
SFV1_P23074_2mutA





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
12,488
MLVMS_P03355_PLV919





GSAGSAAGSGEF
12,489
MLVMS_P03355_PLV919





PAPGSSEAAAK
12,490
MLVMS_P03355_3mutA_WS





GGSEAAAK
12,491
MLVMS_P03355_3mutA_WS





GSSGSSGSSGSSGSS
12,492
PERV_Q4VFZ2_3mutA_WS





GGSEAAAKPAP
12,493
PERV_Q4VFZ2_3mutA_WS





GGSGGSGGS
12,494
MLVCB_P08361_3mutA





EAAAKGGSGSS
12,495
MLVCB_P08361_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGS
12,496
FLV_P10273_3mutA





EAAAKEAAAKEAAAKEAAAK
12,497
MLVBM_Q7SVK7_3mutA_WS





GGSGSSPAP
12,498
BAEVM_P10272_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAK
12,499
XMRV6_A1Z651_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGS
12,500
MLVBM_Q7SVK7_3mutA_WS





GGSGSS
12,501
WMSV_P03359_3mutA





PAPEAAAK
12,502
MLVCB_P08361_3mutA





EAAAKPAP
12,503
BAEVM_P10272_3mutA





GSSPAP
12,504
PERV_Q4VFZ2_3mutA_WS





GGGPAP
12,505
PERV_Q4VFZ2_3mutA_WS





EAAAKGGSGSS
12,506
MLVMS_P03355_3mutA_WS





EAAAKGGGGSEAAAK
12,507
AVIRE_P03360_3mutA





GGSGGG
12,508
KORV_Q9TTC1-Pro_3mutA





GSSPAP
12,509
MLVFF_P26809_3mutA





GGSGSSEAAAK
12,510
BAEVM_P10272_3mutA





PAPGSSGGS
12,511
BAEVM_P10272_3mutA





GGGGGG
12,512
MLVFF_P26809_3mutA





PAPGGSEAAAK
12,513
MLVMS_P03355_PLV919





PAPGGS
12,514
MLVMS_P03355_PLV919





GGSGGSGGSGGS
12,515
BAEVM_P10272_3mutA





GSSPAP
12,516
MLVCB_P08361_3mutA





PAPAPAPAP
12,517
MLVMS_P03355_3mutA_WS





GGGGGG
12,518
MLVCB_P08361_3mutA





GSSGSSGSSGSSGSSGSS
12,519
KORV_Q9TTC1-Pro_3mutA





GSSEAAAKGGS
12,520
BAEVM_P10272_3mutA





GGSEAAAK
12,521
FLV_P10273_3mutA





GGSGGSGGSGGSGGS
12,522
KORV_Q9TTC1-Pro_3mutA





GSSPAPEAAAK
12,523
PERV_Q4VFZ2_3mut





GSSGSSGSSGSSGSS
12,524
XMRV6_A1Z651_3mutA





EAAAKPAPGGS
12,525
MLVMS_P03355_3mut





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
12,526
FLV_P10273_3mut





GGSPAPEAAAK
12,527
XMRV6_A1Z651_3mut





EAAAKGGSGGG
12,528
MLVFF_P26809_3mutA





EAAAKEAAAKEAAAKEAAAK
12,529
MLVFF_P26809_3mutA





GSSPAP
12,530
WMSV_P03359_3mutA





PAPAPAPAP
12,531
MLVAV_P03356_3mutA





PAPGGSEAAAK
12,532
KORV_Q9TTC1_3mut





GGSGSSEAAAK
12,533
MLVBM_Q7SVK7_3mutA_WS





GSSGGG
12,534
MLVCB_P08361_3mutA





GGGEAAAKGSS
12,535
PERV_Q4VFZ2_3mut





PAPGGSGGG
12,536
MLVFF_P26809_3mutA





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,537
FFV_O93209





PAPGGGGSS
12,538
MLVMS_P03355_3mutA_WS





EAAAKGGS
12,539
MLVAV_P03356_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
12,540
MLVBM_Q7SVK7_3mutA_WS





GGSGGSGGS
12,541
WMSV_P03359_3mutA





PAPAP
12,542
MLVMS_P03355_3mutA_WS





GSSGGGEAAAK
12,543
MLVAV_P03356_3mutA





GGGGSSEAAAK
12,544
MLVFF_P26809_3mutA





EAAAKGSSGGS
12,545
MLVMS_P03355_PLV919





EAAAKGGGGSEAAAK
12,546
MLVMS_P03355_3mutA_WS





GGGGGGGG
12,547
MLVMS_P03355_PLV919





GSSGSSGSS
12,548
MLVMS_P03355_PLV919





GGGEAAAKPAP
12,549
PERV_Q4VFZ2_3mutA_WS





GGGGGSGSS
12,550
MLVMS_P03355_3mutA_WS





GGGGGGG
12,551
MLVMS_P03355_PLV919





GGS

MLVMS_P03355_PLV919





GSSGGG
12,553
MLVMS_P03355_3mutA_WS





EAAAKGGSGSS
12,554
PERV_Q4VFZ2_3mutA_WS





PAPGSSEAAAK
12,555
MLVMS_P03355_PLV919





GSSEAAAKPAP
12,556
MLVMS_P03355_PLV919





GGSPAPGSS
12,557
BAEVM_P10272_3mutA





GSAGSAAGSGEF
12,558
MLVCB_P08361_3mut





GGSPAPGGG
12,559
PERV_Q4VFZ2_3mut





GGGGSGGGGSGGGGSGGGGS
12,560
MLVMS_P03355_3mut





GSSGSSGSS
12,561
PERV_Q4VFZ2_3mutA_WS





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
12,562
PERV_Q4VFZ2_3mut





GGGGSEAAAKGGGGS
12,563
MLVCB_P08361_3mutA





GGSEAAAKGSS
12,564
MLVAV_P03356_3mutA





EAAAKGGGGSEAAAK
12,565
MLVCB_P08361_3mut





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
12,566
XMRV6_A1Z651_3mutA





PAPGGGEAAAK
12,567
MLVMS_P03355_3mutA_WS





GSSGGGEAAAK
12,568
PERV_Q4VFZ2_3mutA_WS





GSSGSS
12,569
MLVCB_P08361_3mut





PAPAPAPAPAPAP
12,570
PERV_Q4VFZ2_3mut





GGSPAPGGG
12,571
MLVFF_P26809_3mutA





GGSGGSGGSGGSGGS
12,572
MLVCB_P08361_3mutA





EAAAKEAAAK
12,573
MLVFF_P26809_3mutA





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,574
GALV_P21414_3mut





PAPAPAPAPAPAP
12,575
WMSV_P03359_3mutA





GGGEAAAKGGS
12,576
KORV_Q9TTC1_3mutA





EAAAKGGGPAP
12,577
KORV_Q9TTC1_3mut





PAPEAAAKGSS
12,578
MLVBM_Q7SVK7_3mutA_WS





PAPEAAAKGSS
12,579
FLV_P10273_3mutA





PAPGGSEAAAK
12,580
MLVMS_P03355_3mut





GSSPAPGGG
12,581
BAEVM_P10272_3mutA





GGGEAAAKPAP
12,582
KORV_Q9TTC1-Pro_3mutA





GGGGSGGGGS
12,583
MLVMS_P03355_PLV919





GGGEAAAKGSS
12,584
MLVFF_P26809_3mutA





PAPGGGGSS
12,585
MLVBM_Q7SVK7_3mutA_WS





GSSEAAAK
12,586
BAEVM_P10272_3mutA





GGGGGGGG
12,587
MLVMS_P03355_PLV919





PAPGSSGGS
12,588
MLVAV_P03356_3mutA





GGGGSGGGGSGGGGSGGGGS
12,589
BAEVM_P10272_3mutA





PAP

MLVMS_P03355_3mut





EAAAKGSSPAP
12,591
XMRV6_A1Z651_3mutA





PAPEAAAKGGS
12,592
MLVFF_P26809_3mutA





GSSGGGEAAAK
12,593
BAEVM_P10272_3mutA





PAPAPAP
12,594
MLVMS_P03355_3mutA_WS





GGSEAAAKGGG
12,595
MLVMS_P03355_PLV919





GSSEAAAK
12,596
PERV_Q4VFZ2_3mut





GGGG
12,597
MLVMS_P03355_3mutA_WS





GGGGGS
12,598
MLVMS_P03355_3mut





GGGGSSEAAAK
12,599
PERV_Q4VFZ2_3mut





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
12,600
SFV3L_P27401-Pro_2mutA





GGSEAAAKGSS
12,601
MLVMS_P03355_3mutA_WS





PAPGSSGGS
12,602
XMRV6_A1Z651_3mutA





GGSPAP
12,603
MLVMS_P03355_3mutA_WS





GGGGSSEAAAK
12,604
BAEVM_P10272_3mut





GGSGGSGGSGGS
12,605
AVIRE_P03360_3mutA





PAPGSSGGS
12,606
MLVFF_P26809_3mutA





GSSPAPGGG
12,607
MLVMS_P03355_3mutA_WS





GGGGGGG
12,608
MLVMS_P03355_3mutA_WS





EAAAKGGGGGS
12,609
MLVMS_P03355_3mutA_WS





EAAAKGGSGGG
12,610
MLVMS_P03355_PLV919





GGGGSSEAAAK
12,611
XMRV6_A1Z651_3mutA





GGGGSEAAAKGGGGS
12,612
MLVBM_Q7SVK7_3mutA_WS





GSSGSS
12,613
MLVMS_P03355_PLV919





GGSGGG
12,614
MLVMS_P03355_PLV919





PAPEAAAKGGG
12,615
AVIRE_P03360_3mutA





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,616
FOAMV_P14350-Pro_2mutA





GGGGGSGSS
12,617
PERV_Q4VFZ2_3mut





GSSGSSGSSGSSGSS
12,618
KORV_Q9TTC1-Pro_3mut





GGGGSEAAAKGGGGS
12,619
MLVMS_P03355_3mutA_WS





GGGGGSPAP
12,620
FLV_P10273_3mut





GGGEAAAK
12,621
MLVMS_P03355_3mutA_WS





GGSGGSGGSGGS
12,622
FLV_P10273_3mutA





GGG

MLVMS_P03355_PLV919





GGSPAPEAAAK
12,624
BAEVM_P10272_3mutA





EAAAKEAAAK
12,625
FLV_P10273_3mutA





GGGEAAAKPAP
12,626
BAEVM_P10272_3mutA





GGGEAAAKGGS
12,627
PERV_Q4VFZ2_3mut





GGSGGSGGS
12,628
PERV_Q4VFZ2_3mut





EAAAKGGGPAP
12,629
XMRV6_A1Z651_3mutA





EAAAK
12,630
MLVBM_Q7SVK7_3mutA_WS





PAPEAAAKGGG
12,631
PERV_Q4VFZ2_3mut





EAAAKGSS
12,632
MLVCB_P08361_3mutA





GGSEAAAKGGG
12,633
MLVBM_Q7SVK7_3mutA_WS





GGGGGGGGSGGGGSGGGGS
12,634
XMRV6_A1Z651_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGS
12,635
BAEVM_P10272_3mut





GGGGSSPAP
12,636
PERV_Q4VFZ2_3mutA_WS





GGSGGSGGSGGSGGSGGS
12,637
PERV_Q4VFZ2_3mut





GGGEAAAKPAP
12,638
PERV_Q4VFZ2_3mut





EAAAKEAAAK
12,639
BAEVM_P10272_3mutA





GGSGSSEAAAK
12,640
XMRV6_A1Z651_3mutA





PAPEAAAKGSS
12,641
WMSV_P03359_3mutA





PAPAPAPAPAP
12,642
XMRV6_A1Z651_3mutA





GSSGGGEAAAK
12,643
MLVMS_P03355_PLV919





GSSPAPGGG
12,644
MLVFF_P26809_3mutA





GGSPAPEAAAK
12,645
MLVFF_P26809_3mut





PAPGGSEAAAK
12,646
PERV_Q4VFZ2_3mut





GGGGSS
12,647
MLVFF_P26809_3mutA





GGSGSSGGG
12,648
BAEVM_P10272_3mutA





GSSGGGEAAAK
12,649
MLVMS_P03355_3mutA_WS





EAAAKGGS
12,650
MLVBM_Q7SVK7_3mutA_WS





GGGPAPGGS
12,651
MLVMS_P03355_PLV919





EAAAKEAAAK
12,652
MLVMS_P03355_PLV919





GSSGSSGSS
12,653
MLVMS_P03355_PLV919





GGGEAAAKPAP
12,654
MLVAV_P03356_3mutA





SGSETPGTSESATPES
12,655
FLV_P10273_3mutA





PAPAPAPAPAP
12,656
KORV_Q9TTC1-Pro_3mut





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,657
BAEVM_P10272_3mutA





PAPGSSGGG
12,658
MLVMS_P03355_3mutA_WS





GSSGGGEAAAK
12,659
XMRV6_A1Z651_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGS
12,660
XMRV6_A1Z651_3mutA





GGGGSSPAP
12,661
MLVFF_P26809_3mutA





GGSGGGPAP
12,662
PERV_Q4VFZ2_3mutA_WS





GSS

PERV_Q4VFZ2_3mut





EAAAKGSSPAP
12,664
MLVMS_P03355_3mut





EAAAKGGG
12,665
XMRV6_A1Z651_3mutA





GSSGSSGSSGSS
12,666
WMSV_P03359_3mutA





PAPEAAAKGSS
12,667
MLVMS_P03355_PLV919





GSSEAAAK
12,668
AVIRE_P03360_3mutA





EAAAKGGSGSS
12,669
AVIRE_P03360_3mutA





GSSEAAAK
12,670
MLVMS_P03355_3mut





GGSGSSEAAAK
12,671
MLVMS_P03355_PLV919





GGSEAAAKGGG
12,672
MLVFF_P26809_3mutA





GGGGSGGGGSGGGGSGGGGS
12,673
MLVAV_P03356_3mutA





PAPAPAPAPAPAP
12,674
MLVFF_P26809_3mut





EAAAKPAPGSS
12,675
KORV_Q9TTC1-Pro_3mut





PAPGSSEAAAK
12,676
MLVAV_P03356_3mutA





GGGGSSPAP
12,677
WMSV_P03359_3mutA





EAAAKGGGGGS
12,678
MLVMS_P03355_3mutA_WS





GGGEAAAKGGS
12,679
MLVMS_P03355_3mut





GGSGSSGGG
12,680
MLVMS_P03355_3mut





GGGPAPGGS
12,681
MLVAV_P03356_3mutA





PAPGGGGGS
12,682
MLVMS_P03355_PLV919





GGGPAPGSS
12,683
PERV_Q4VFZ2_3mut





GGGGGGG
12,684
MLVFF_P26809_3mutA





GGSGGGGSS
12,685
MLVCB_P08361_3mutA





GGGGGG
12,686
FLV_P10273_3mutA





GGSEAAAKGSS
12,687
PERV_Q4VFZ2_3mut





GGSPAPGGG
12,688
BAEVM_P10272_3mutA





GGSPAPGSS
12,689
AVIRE_P03360_3mutA





GGSGGSGGSGGS
12,690
KORV_Q9TTC1_3mut





EAAAKEAAAKEAAAKEAAAKEAAAK
12,691
MLVBM_Q7SVK7_3mut





PAPGSSGGS
12,692
XMRV6_A1Z651_3mut





EAAAKGGGGSS
12,693
PERV_Q4VFZ2_3mutA_WS





GGSGGSGGSGGSGGS
12,694
PERV_Q4VFZ2_3mutA_WS





PAPGGSGGG
12,695
MLVMS_P03355_PLV919





PAPGSSGGG
12,696
PERV_Q4VFZ2_3mutA_WS





GSSGSS
12,697
BAEVM_P10272_3mutA





EAAAKGSS
12,698
MLVFF_P26809_3mutA





GGGPAP
12,699
MLVMS_P03355_PLV919





EAAAKGGGGGS
12,700
MLVFF_P26809_3mutA





EAAAKGGSPAP
12,701
MLVBM_Q7SVK7_3mutA_WS





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
12,702
WMSV_P03359_3mutA





GSSPAPGGG
12,703
MLVBM_Q7SVK7_3mutA_WS





GGGEAAAKGSS
12,704
AVIRE_P03360_3mutA





GGGGSSEAAAK
12,705
AVIRE_P03360_3mutA





GGGGGGGG
12,706
PERV_Q4VFZ2_3mutA_WS





PAPGSSEAAAK
12,707
BAEVM_P10272_3mutA





EAAAKGSS
12,708
MLVFF_P26809_3mut





GSSEAAAKGGG
12,709
MLVCB_P08361_3mutA





GGSEAAAK
12,710
MLVBM_Q7SVK7_3mutA_WS





GSSEAAAKGGG
12,711
PERV_Q4VFZ2_3mutA_WS





PAPGGSGGG
12,712
WMSV_P03359_3mutA





GSSGGSGGG
12,713
MLVCB_P08361_3mutA





EAAAKGSSGGG
12,714
FLV_P10273_3mutA





GSSEAAAK
12,715
MLVCB_P08361_3mutA





GSSGGGEAAAK
12,716
MLVMS_P03355_3mut





GGGGSGGGGS
12,717
MLVCB_P08361_3mutA





EAAAKGGGGSEAAAK
12,718
MLVBM_Q7SVK7_3mutA_WS





EAAAKGGG
12,719
PERV_Q4VFZ2_3mutA_WS





EAAAKGGSPAP
12,720
MLVMS_P03355_PLV919





GGGPAPGGS
12,721
AVIRE_P03360_3mutA





GSSEAAAK
12,722
MLVBM_Q7SVK7_3mutA_WS





GSSGGGEAAAK
12,723
PERV_Q4VFZ2_3mut





SGSETPGTSESATPES
12,724
MLVMS_P03355_PLV919





GGSGSSPAP
12,725
MLVMS_P03355_3mut





GGGGGG
12,726
MLVBM_Q7SVK7_3mutA_WS





GGSPAPGGG
12,727
XMRV6_A1Z651_3mutA





GGSGSS
12,728
PERV_Q4VFZ2_3mutA_WS





PAP

MLVBM_Q7SVK7_3mutA_WS





EAAAKPAPGSS
12,730
MLVMS_P03355_PLV919





EAAAKGGG
12,731
MLVMS_P03355_3mut





GSSEAAAKPAP
12,732
PERV_Q4VFZ2_3mutA_WS





GGGGSS
12,733
MLVMS_P03355_3mutA_WS





GGSGSSEAAAK
12,734
PERV_Q4VFZ2_3mut





GGGGSS
12,735
BAEVM_P10272_3mutA





PAPAP
12,736
MLVFF_P26809_3mut





PAPEAAAKGGG
12,737
BAEVM_P10272_3mutA





EAAAKGGS
12,738
MLVMS_P03355_PLV919





PAPAPAPAPAPAP
12,739
PERV_Q4VFZ2_3mutA_WS





GGGGGSEAAAK
12,740
MLVMS_P03355_3mut





PAPGGS
12,741
PERV_Q4VFZ2_3mut





GGGGSS
12,742
MLVCB_P08361_3mutA





GGGGS
12,743
MLVAV_P03356_3mutA





GSSPAPEAAAK
12,744
MLVMS_P03355_PLV919





GGGGSSGGS
12,745
MLVFF_P26809_3mutA





PAPEAAAKGSS
12,746
MLVMS_P03355_PLV919





GGSGSSEAAAK
12,747
MLVMS_P03355_3mutA_WS





EAAAKGGG
12,748
MLVAV_P03356_3mutA





PAPGSSEAAAK
12,749
FLV_P10273_3mutA





EAAAKGSSGGG
12,750
MLVCB_P08361_3mutA





PAPEAAAK
12,751
KORV_Q9TTC1-Pro_3mutA





GGSPAPEAAAK
12,752
KORV_Q9TTC1-Pro_3mut





GGSGGSGGSGGSGGSGGS
12,753
MLVAV_P03356_3mutA





GSSEAAAKPAP
12,754
MLVBM_Q7SVK7_3mutA_WS





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,755
KORV_Q9TTC1-Pro_3mutA





GSSGGGEAAAK
12,756
XMRV6_A1Z651_3mut





PAPGGSGGG
12,757
AVIRE_P03360_3mutA





PAPGGSEAAAK
12,758
PERV_Q4VFZ2_3mutA_WS





GGGGS
12,759
MLVMS_P03355_3mutA_WS





GGGGSGGGGSGGGGS
12,760
MLVBM_Q7SVK7_3mutA_WS





PAPAPAPAPAP
12,761
PERV_Q4VFZ2_3mutA_WS





EAAAKEAAAKEAAAKEAAAKEAAAK
12,762
MLVMS_P03355_3mut





GSSGGSEAAAK
12,763
MLVMS_P03355_3mutA_WS





GGSGGSGGSGGS
12,764
WMSV_P03359_3mutA





EAAAKGSSGGG
12,765
WMSV_P03359_3mutA





EAAAKGGG
12,766
PERV_Q4VFZ2_3mutA_WS





SGSETPGTSESATPES
12,767
PERV_Q4VFZ2_3mut





PAPGSSGGS
12,768
MLVMS_P03355_3mutA_WS





PAPEAAAKGSS
12,769
PERV_Q4VFZ2_3mut





PAPEAAAK
12,770
AVIRE_P03360_3mutA





GSSEAAAKGGG
12,771
BAEVM_P10272_3mutA





GSSPAP
12,772
MLVAV_P03356_3mutA





EAAAKEAAAKEAAAKEAAAK
12,773
MLVFF_P26809_3mut





PAPGGSGSS
12,774
MLVAV_P03356_3mutA





GGGGSGGGGSGGGGS
12,775
PERV_Q4VFZ2_3mutA_WS





GSSGGSEAAAK
12,776
MLVCB_P08361_3mutA





EAAAKGGS
12,777
KORV_Q9TTC1-Pro_3mutA





EAAAKGGS
12,778
MLVFF_P26809_3mutA





GGSPAP
12,779
MLVMS_P03355_PLV919





GGSGSS
12,780
MLVMS_P03355_PLV919





SGSETPGTSESATPES
12,781
WMSV_P03359_3mut





GGGGGGG
12,782
WMSV_P03359_3mut





GGSPAPGSS
12,783
MLVCB_P08361_3mutA





GGGGSSGGS
12,784
WMSV_P03359_3mut





PAPGGS
12,785
MLVMS_P03355_PLV919





PAPGSSGGS
12,786
MLVCB_P08361_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAK
12,787
MLVFF_P26809_3mut





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
12,788
PERV_Q4VFZ2_3mut





GGSGGSGGSGGSGGS
12,789
BAEVM_P10272_3mutA





GSSEAAAK
12,790
PERV_Q4VFZ2_3mut





EAAAKEAAAKEAAAKEAAAK
12,791
KORV_Q9TTC1-Pro_3mutA





GGSGGSGGSGGSGGS
12,792
MLVMS_P03355_3mut





PAPAPAPAPAPAP
12,793
MLVMS_P03355_3mut





GGSPAPEAAAK
12,794
MLVMS_P03355_PLV919





EAAAK
12,795
WMSV_P03359_3mutA





EAAAKGSSGGS
12,796
MLVBM_Q7SVK7_3mutA_WS





GGSGGGGSS
12,797
MLVMS_P03355_3mutA_WS





GGGEAAAKPAP
12,798
MLVMS_P03355_3mut





EAAAKGGSGGG
12,799
XMRV6_A1Z651_3mutA





GGGGGSEAAAK
12,800
KORV_Q9TTC1-Pro_3mutA





GGGGGG
12,801
BAEVM_P10272_3mutA





GGGGGG
12,802
MLVMS_P03355_3mut





GGGGGGG
12,803
MLVBM_Q7SVK7_3mutA_WS





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,804
AVIRE_P03360





PAPGSSGGS
12,805
PERV_Q4VFZ2_3mut





GGGGGS
12,806
XMRV6_A1Z651_3mut





EAAAKPAP
12,807
XMRV6_A1Z651_3mutA





GGG

MLVMS_P03355_3mutA_WS





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,809
FLV_P10273_3mut





EAAAKGSSPAP
12,810
MLVMS_P03355_3mut





SGSETPGTSESATPES
12,811
BAEVM_P10272_3mutA





GGSPAPEAAAK
12,812
MLVMS_P03355_3mut





GSSGSSGSSGSS
12,813
MLVAV_P03356_3mutA





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,814
MLVMS_P03355_3mut





GGSPAP
12,815
MLVCB_P08361_3mutA





GGGGGSEAAAK
12,816
MLVMS_P03355_3mutA_WS





GGGGG
12,817
MLVFF_P26809_3mutA





GSSEAAAK
12,818
MLVAV_P03356_3mutA





GGS

BAEVM_P10272_3mut





EAAAKGGSPAP
12,820
MLVCB_P08361_3mutA





PAPAPAPAP
12,821
FLV_P10273_3mutA





PAPGGGEAAAK
12,822
MLVCB_P08361_3mutA





GGGGSSEAAAK
12,823
MLVMS_P03355_3mutA_WS





GGGGG
12,824
PERV_Q4VFZ2_3mutA_WS





GGSGGSGGSGGSGGSGGS
12,825
PERV_Q4VFZ2_3mut





GGGGG
12,826
MLVMS_P03355_3mut





PAPEAAAKGGG
12,827
MLVBM_Q7SVK7_3mutA_WS





GSSGGGPAP
12,828
XMRV6_A1Z651_3mutA





GSSGSSGSSGSSGSSGSS
12,829
PERV_Q4VFZ2_3mutA_WS





EAAAKGGSPAP
12,830
PERV_Q4VFZ2_3mut





GSSGGSEAAAK
12,831
MLVMS_P03355_PLV919





GSS

PERV_Q4VFZ2_3mut





EAAAKGGS
12,833
WMSV_P03359_3mutA





GGGGGSPAP
12,834
PERV_Q4VFZ2_3mutA_WS





EAAAKGSS
12,835
MLVMS_P03355_PLV919





EAAAKGGGGSS
12,836
KORV_Q9TTC1-Pro_3mutA





PAPGSSGGG
12,837
PERV_Q4VFZ2_3mut





GGGGSSEAAAK
12,838
MLVFF_P26809_3mut





PAPAPAP
12,839
MLVMS_P03355_3mut





GSSGGSEAAAK
12,840
XMRV6_A1Z651_3mut





PAPEAAAKGSS
12,841
MLVMS_P03355_3mutA_WS





GGSGGSGGSGGSGGS
12,842
MLVMS_P03355_3mutA_WS





GGSGSSPAP
12,843
XMRV6_A1Z651_3mutA





GGGGSSPAP
12,844
MLVMS_P03355_PLV919





GGGGS
12,845
MLVCB_P08361_3mutA





EAAAKEAAAKEAAAKEAAAK
12,846
PERV_Q4VFZ2_3mutA_WS





EAAAKEAAAK
12,847
KORV_Q9TTC1_3mutA





PAPGGGEAAAK
12,848
BAEVM_P10272_3mutA





GSSGGSEAAAK
12,849
XMRV6_A1Z651_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
12,850
FLV_P10273_3mut





GSSEAAAKPAP
12,851
MLVMS_P03355_3mutA_WS





EAAAKPAPGSS
12,852
PERV_Q4VFZ2_3mutA_WS





GSSGGSPAP
12,853
XMRV6_A1Z651_3mutA





GSSEAAAKGGG
12,854
PERV_Q4VFZ2_3mut





GGGEAAAKGGS
12,855
WMSV_P03359_3mutA





GSSEAAAKGGG
12,856
MLVFF_P26809_3mut





PAPAPAP
12,857
KORV_Q9TTC1-Pro_3mutA





EAAAKGGSPAP
12,858
MLVMS_P03355_3mutA_WS





PAPGGSEAAAK
12,859
PERV_Q4VFZ2_3mut





GGGGS
12,860
MLVBM_Q7SVK7_3mutA_WS





EAAAKGSSGGG
12,861
KORV_Q9TTC1_3mut





EAAAKGGGPAP
12,862
MLVCB_P08361_3mutA





EAAAKGSS
12,863
BAEVM_P10272_3mutA





GGSPAPGGG
12,864
MLVBM_Q7SVK7_3mutA_WS





GGGGSEAAAKGGGGS
12,865
MLVMS_P03355_3mutA_WS





GGGEAAAKGGS
12,866
PERV_Q4VFZ2_3mutA_WS





EAAAKGGGGSS
12,867
MLVMS_P03355_3mutA_WS





EAAAKGGGPAP
12,868
MLVFF_P26809_3mut





GSSPAP
12,869
PERV_Q4VFZ2_3mutA_WS





EAAAKGGS
12,870
MLVMS_P03355_3mut





GGGGSS
12,871
KORV_Q9TTC1-Pro_3mutA





EAAAKGSSPAP
12,872
MLVMS_P03355_3mutA_WS





GGGPAP
12,873
PERV_Q4VFZ2_3mut





EAAAKGSSGGS
12,874
XMRV6_A1Z651_3mutA





PAPGGG
12,875
MLVAV_P03356_3mutA





GSSPAPEAAAK
12,876
BAEVM_P10272_3mutA





GGGPAP
12,877
MLVBM_Q7SVK7_3mutA_WS





GSSGGGGGS
12,878
AVIRE_P03360_3mutA





SGSETPGTSESATPES
12,879
MLVMS_P03355_PLV919





GGGPAP
12,880
MLVFF_P26809_3mut





EAAAKGGGGSS
12,881
XMRV6_A1Z651_3mutA





GGGGSSPAP
12,882
XMRV6_A1Z651_3mut





GGGGSEAAAKGGGGS
12,883
MLVMS_P03355_3mut





GSSPAP
12,884
MLVBM_Q7SVK7_3mutA_WS





GGSGSSEAAAK
12,885
FLV_P10273_3mutA





SGSETPGTSESATPES
12,886
MLVBM_Q7SVK7_3mutA_WS





PAPGGG
12,887
AVIRE_P03360_3mutA





GGGEAAAKPAP
12,888
MLVMS_P03355_3mutA_WS





EAAAKGGSGSS
12,889
PERV_Q4VFZ2_3mut





GGSPAPGGG
12,890
MLVAV_P03356_3mutA





PAPGGSGSS
12,891
BAEVM_P10272_3mutA





GSSGGSPAP
12,892
MLVFF_P26809_3mutA





EAAAKGSSGGG
12,893
PERV_Q4VFZ2_3mut





GGGGSGGGGS
12,894
PERV_Q4VFZ2_3mutA_WS





GSSGGGGGS
12,895
BAEVM_P10272_3mutA





GGGGSSGGS
12,896
MLVBM_Q7SVK7_3mutA_WS





EAAAKGGS
12,897
PERV_Q4VFZ2_3mutA_WS





GSSGSSGSSGSS
12,898
MLVMS_P03355_3mut





GGS

MLVMS_P03355_3mutA_WS





GSSGGSEAAAK
12,900
MLVBM_Q7SVK7_3mutA_WS





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
12,901
XMRV6_A1Z651





GGGGG
12,902
FLV_P10273_3mutA





PAPEAAAKGSS
12,903
PERV_Q4VFZ2_3mut





GGGGGG
12,904
WMSV_P03359_3mut





EAAAKGGG
12,905
BAEVM_P10272_3mutA





GGGGSS
12,906
MLVMS_P03355_3mutA_WS





GSSGGGEAAAK
12,907
KORV_Q9TTC1_3mut





GGSGSS
12,908
AVIRE_P03360_3mutA





EAAAKPAP
12,909
MLVMS_P03355_3mut





EAAAKEAAAKEAAAK
12,910
FLV_P10273_3mutA





GGGG
12,911
XMRV6_A1Z651_3mutA





GSSPAPGGS
12,912
BAEVM_P10272_3mutA





GSSGGGGGS
12,913
MLVFF_P26809_3mutA





GGGGSSGGS
12,914
MLVAV_P03356_3mutA





GGS

PERV_Q4VFZ2_3mut





GGGGG
12,916
WMSV_P03359_3mutA





GSSGSSGSSGSSGSSGSS
12,917
FLV_P10273_3mutA





PAPGGGGSS
12,918
MLVAV_P03356_3mutA





GGGGGGGG
12,919
BAEVM_P10272_3mutA





SGSETPGTSESATPES
12,920
MLVCB_P08361_3mutA





PAPGGG
12,921
BAEVM_P10272_3mutA





GSSGSSGSS
12,922
MLVCB_P08361_3mutA





GGSGSS
12,923
MLVMS_P03355_3mutA_WS





EAAAKGGGGSEAAAK
12,924
WMSV_P03359_3mutA





GGGGGGGG
12,925
FLV_P10273_3mutA





GSSGSS
12,926
MLVMS_P03355_3mutA_WS





PAPEAAAKGGS
12,927
XMRV6_A1Z651_3mutA





EAAAKEAAAK
12,928
MLVMS_P03355_3mut





GGGGSGGGGSGGGGS
12,929
BAEVM_P10272_3mutA





EAAAKGSSPAP
12,930
MLVMS_P03355_PLV919





GGGGSSEAAAK
12,931
MLVMS_P03355_3mut





GGGGSSEAAAK
12,932
BAEVM_P10272_3mutA





PAPGGSGSS
12,933
PERV_Q4VFZ2_3mut





GGSGGGEAAAK
12,934
MLVFF_P26809_3mut





PAPEAAAKGGS
12,935
PERV_Q4VFZ2_3mut





GGGPAPGSS
12,936
AVIRE_P03360_3mut





PAPGGSGGG
12,937
PERV_Q4VFZ2_3mutA_WS





GGGGGGGG
12,938
PERV_Q4VFZ2_3mutA_WS





GSSEAAAK
12,939
MLVMS_P03355_3mutA_WS





GGGGSGGGGSGGGGS
12,940
PERV_Q4VFZ2_3mutA_WS





EAAAKGGS
12,941
MLVMS_P03355_3mut





GGGGGSGSS
12,942
MLVCB_P08361_3mut





GGGPAP
12,943
KORV_Q9TTC1-Pro_3mutA





EAAAKPAPGGG
12,944
MLVCB_P08361_3mut





GSSGGSPAP
12,945
MLVCB_P08361_3mutA





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
12,946
MLVMS_P03355_3mut





PAPAPAPAP
12,947
MLVMS_P03355_3mut





GSSGGS
12,948
XMRV6_A1Z651_3mutA





GSSEAAAKGGG
12,949
MLVMS_P03355_3mut





GGSGSSPAP
12,950
MLVMS_P03355_3mutA_WS





GSSEAAAKGGS
12,951
MLVMS_P03355_PLV919





EAAAKEAAAKEAAAKEAAAKEAAAK
12,952
BAEVM_P10272_3mut





PAPGGGGSS
12,953
KORV_Q9TTC1_3mutA





EAAAKGSS
12,954
MLVMS_P03355_3mutA_WS





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,955
FFV_O93209_2mut





GGSGGSGGSGGSGGSGGS
12,956
BAEVM_P10272_3mutA





GGGGGG
12,957
MLVMS_P03355_PLV919





PAPEAAAK
12,958
BAEVM_P10272_3mutA





GGSGSSEAAAK
12,959
MLVAV_P03356_3mutA





GGG

MLVCB_P08361_3mutA





GGGGG
12,961
MLVCB_P08361_3mutA





GGSGGSGGSGGS
12,962
KORV_Q9TTC1-Pro_3mutA





GSSGSSGSSGSSGSSGSS
12,963
XMRV6_A1Z651_3mutA





GSSEAAAKPAP
12,964
FLV_P10273_3mutA





GGGEAAAKPAP
12,965
MLVCB_P08361_3mutA





GSSGSSGSS
12,966
MLVMS_P03355_3mutA_WS





PAPAPAPAP
12,967
MLVMS_P03355_PLV919





EAAAKGGG
12,968
MLVMS_P03355_PLV919





PAPAPAPAPAPAP
12,969
FLV_P10273_3mutA





EAAAKGGSGSS
12,970
MLVMS_P03355_3mut





GGGGGG
12,971
PERV_Q4VFZ2_3mutA_WS





PAPGGG
12,972
MLVCB_P08361_3mutA





GGGGGSGSS
12,973
KORV_Q9TTC1_3mutA





GGGGSGGGGSGGGGSGGGGS
12,974
XMRV6_A1Z651_3mut





GGSGGSGGS
12,975
KORV_Q9TTC1-Pro_3mutA





EAAAKPAPGGG
12,976
MLVMS_P03355_3mutA_WS





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
12,977
XMRV6_A1Z651





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
12,978
FLV_P10273_3mutA





EAAAKGGGGSEAAAK
12,979
PERV_Q4VFZ2_3mutA_WS





GGGPAPGSS
12,980
AVIRE_P03360_3mutA





GGGGG
12,981
MLVMS_P03355_3mutA_WS





GGGGSGGGGSGGGGSGGGGGGGGSGGGGS
12,982
MLVMS_P03355_3mut





GGGGSGGGGS
12,983
MLVMS_P03355_3mutA_WS





EAAAKGGSPAP
12,984
XMRV6_A1Z651_3mutA





EAAAKGSSPAP
12,985
AVIRE_P03360_3mutA





PAPGGSGSS
12,986
KORV_Q9TTC1-Pro_3mutA





GSS

MLVBM_Q7SVK7_3mutA_WS





GSS

WMSV_P03359_3mut





GGGPAPGSS
12,989
MLVFF_P26809_3mutA





EAAAKPAP
12,990
MLVMS_P03355_3mut





GSSPAPEAAAK
12,991
FLV_P10273_3mutA





GGSPAPGSS
12,992
MLVBM_Q7SVK7_3mutA_WS





GGGGGSEAAAK
12,993
XMRV6_A1Z651_3mut





PAPEAAAKGGG
12,994
WMSV_P03359_3mutA





PAPGGG
12,995
PERV_Q4VFZ2_3mut





GGSPAPEAAAK
12,996
WMSV_P03359_3mutA





GGSGGGGSS
12,997
PERV_Q4VFZ2_3mut





EAAAKGGGGSS
12,998
PERV_Q4VFZ2_3mut





EAAAKGGSPAP
12,999
AVIRE_P03360_3mut





GGSGGGGSS
13,000
WMSV_P03359_3mutA





PAPGSSEAAAK
13,001
MLVFF_P26809_3mut





GSSEAAAK
13,002
MLVMS_P03355_PLV919





GSAGSAAGSGEF
13,003
AVIRE_P03360_3mutA





EAAAKGGSGSS
13,004
MLVMS_P03355_3mut





GGSEAAAKPAP
13,005
MLVMS_P03355_PLV919





GGGGSGGGGSGGGGSGGGGSGGGGS
13,006
MLVFF_P26809_3mutA





PAPGSSEAAAK
13,007
PERV_Q4VFZ2_3mutA_WS





GGGGSSPAP
13,008
MLVMS_P03355_3mutA_WS





PAPAPAP
13,009
MLVCB_P08361_3mutA





EAAAKPAPGGG
13,010
MLVBM_Q7SVK7_3mutA_WS





GGGPAPGSS
13,011
BAEVM_P10272_3mutA





PAP

MLVMS_P03355_3mutA_WS





PAPGGSGGG
13,013
MLVMS_P03355_3mutA_WS





GGSGGSGGSGGSGGS
13,014
MLVBM_Q7SVK7_3mutA_WS





PAPAPAPAP
13,015
XMRV6_A1Z651_3mut





GSSPAPGGG
13,016
MLVMS_P03355_3mutA_WS





GSSPAPGGG
13,017
MLVMS_P03355_3mut





PAPGGG
13,018
MLVMS_P03355_PLV919





GGGEAAAKGSS
13,019
WMSV_P03359_3mut





EAAAKGSS
13,020
KORV_Q9TTC1-Pro_3mutA





EAAAKGGS
13,021
PERV_Q4VFZ2_3mut





EAAAKEAAAKEAAAKEAAAKEAAAK
13,022
PERV_Q4VFZ2_3mut





PAPEAAAKGGG
13,023
MLVMS_P03355_PLV919





EAAAKGSSGGG
13,024
MLVFF_P26809_3mut





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,025
PERV_Q4VFZ2





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,026
MLVAV_P03356_3mutA





GSSGGSGGG
13,027
MLVFF_P26809_3mut





GSSGSSGSSGSS
13,028
PERV_Q4VFZ2_3mutA_WS





GGSPAPGGG
13,029
MLVMS_P03355_PLV919





GSS

BAEVM_P10272_3mut





GGGPAPGSS
13,031
MLVMS_P03355_3mutA_WS





GGGGSS
13,032
KORV_Q9TTC1_3mutA





GSSGGSGGG
13,033
BAEVM_P10272_3mutA





EAAAKEAAAKEAAAK
13,034
MLVCB_P08361_3mutA





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
13,035
FLV_P10273_3mutA





PAPGGGGGS
13,036
PERV_Q4VFZ2_3mut





PAPAPAPAPAP
13,037
KORV_Q9TTC1-Pro_3mutA





EAAAK
13,038
MLVMS_P03355_3mutA_WS





GGG

MLVCB_P08361_3mut





GGSEAAAKGGG
13,040
BAEVM_P10272_3mutA





GGGGGSGSS
13,041
MLVAV_P03356_3mutA





EAAAKGSSPAP
13,042
MLVBM_Q7SVK7_3mutA_WS





GGSGGSGGS
13,043
XMRV6_A1Z651_3mut





EAAAKPAPGGG
13,044
KORV_Q9TTC1-Pro_3mutA





GGGPAPEAAAK
13,045
FLV_P10273_3mutA





GGSPAPEAAAK
13,046
MLVMS_P03355_3mutA_WS





GGSGGSGGSGGSGGS
13,047
MLVFF_P26809_3mut





EAAAKGGSGSS
13,048
MLVMS_P03355_PLV919





GGGEAAAKGGS
13,049
MLVBM_Q7SVK7_3mutA_WS





PAPAPAPAP
13,050
BAEVM_P10272_3mutA





EAAAKEAAAKEAAAKEAAAK
13,051
MLVMS_P03355_3mut





EAAAKPAP
13,052
XMRV6_A1Z651_3mut





EAAAKEAAAK
13,053
MLVBM_Q7SVK7_3mutA_WS





EAAAKGGG
13,054
BAEVM_P10272_3mut





EAAAKGSS
13,055
MLVAV_P03356_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,056
MLVFF_P26809_3mut





GGGPAPGSS
13,057
PERV_Q4VFZ2_3mutA_WS





GGGG
13,058
PERV_Q4VFZ2_3mut





EAAAKGGSGSS
13,059
MLVMS_P03355_PLV919





GGGGSGGGGSGGGGS
13,060
MLVMS_P03355_3mutA_WS





EAAAK
13,061
MLVMS_P03355_3mutA_WS





GGGGSS
13,062
PERV_Q4VFZ2





PAPEAAAKGGS
13,063
MLVCB_P08361_3mut





GSS

MLVMS_P03355_3mut





GSAGSAAGSGEF
13,065
MLVFF_P26809_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,066
KORV_Q9TTC1-Pro_3mut





GGGGSGGGGS
13,067
AVIRE_P03360_3mutA





EAAAK
13,068
MLVMS_P03355_3mut





GGGPAPGGS
13,069
PERV_Q4VFZ2_3mut





GGGGSGGGGSGGGGS
13,070
MLVMS_P03355_PLV919





PAPGGG
13,071
MLVMS_P03355_3mutA_WS





GGGEAAAKPAP
13,072
PERV_Q4VFZ2_3mutA_WS





EAAAKPAPGSS
13,073
KORV_Q9TTC1-Pro_3mutA





PAPGSS
13,074
KORV_Q9TTC1_3mutA





GSAGSAAGSGEF
13,075
PERV_Q4VFZ2_3mut





PAPGGGGSS
13,076
KORV_Q9TTC1-Pro_3mutA





GSSGGGEAAAK
13,077
MLVCB_P08361_3mutA





GSS

AVIRE_P03360_3mutA





GSSGSSGSSGSS
13,079
XMRV6_A1Z651_3mutA





PAPEAAAKGGG
13,080
MLVMS_P03355_PLV919





GGGPAPEAAAK
13,081
MLVCB_P08361_3mutA





PAPGGGGGS
13,082
MLVCB_P08361_3mutA





EAAAKEAAAKEAAAKEAAAK
13,083
PERV_Q4VFZ2_3mutA_WS





GGGGGSPAP
13,084
MLVFF_P26809_3mutA





GSSGSSGSSGSSGSS
13,085
PERV_Q4VFZ2





GSSPAPEAAAK
13,086
MLVMS_P03355_PLV919





GSSGSSGSSGSSGSSGSS
13,087
MLVBM_Q7SVK7_3mutA_WS





GSSGSSGSSGSSGSSGSS
13,088
MLVMS_P03355_3mutA_WS





GGSPAPEAAAK
13,089
MLVAV_P03356_3mutA





GSSGGG
13,090
BAEVM_P10272_3mut





EAAAKGSSGGS
13,091
KORV_Q9TTC1-Pro_3mutA





GGSGSSEAAAK
13,092
MLVMS_P03355_3mutA_WS





GGGPAPEAAAK
13,093
MLVFF_P26809_3mutA





GGGPAPGGS
13,094
MLVMS_P03355_3mutA_WS





GGGGG
13,095
MLVMS_P03355_PLV919





GGGEAAAKPAP
13,096
MLVBM_Q7SVK7_3mutA_WS





GGGGSGGGGS
13,097
WMSV_P03359_3mut





GGGPAPEAAAK
13,098
PERV_Q4VFZ2_3mut





GGSGSSEAAAK
13,099
MLVMS_P03355_PLV919





EAAAKGGGPAP
13,100
MLVMS_P03355_3mutA_WS





GSSGSSGSSGSSGSS
13,101
KORV_Q9TTC1-Pro_3mutA





PAPAP
13,102
WMSV_P03359_3mutA





GGSPAPGSS
13,103
MLVAV_P03356_3mutA





GGSGGGPAP
13,104
MLVMS_P03355_3mut





GGSPAP
13,105
MLVMS_P03355_PLV919





EAAAKGGSPAP
13,106
PERV_Q4VFZ2_3mut





GSSPAPGGG
13,107
KORV_Q9TTC1-Pro_3mutA





GSAGSAAGSGEF
13,108
MLVMS_P03355_3mut





GGSPAP
13,109
PERV_Q4VFZ2_3mut





GSSGSS
13,110
KORV_Q9TTC1-Pro_3mut





GGGPAPGSS
13,111
MLVMS_P03355_3mutA_WS





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,112
FOAMV_P14350





PAPGSSGGG
13,113
MLVMS_P03355_PLV919





GGSEAAAKPAP
13,114
BAEVM_P10272_3mutA





GGGGGS
13,115
MLVCB_P08361_3mutA





PAPEAAAKGGS
13,116
MLVMS_P03355_3mutA_WS





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,117
BAEVM_P10272_3mutA





GGSEAAAK
13,118
BAEVM_P10272_3mutA





GSSPAPEAAAK
13,119
MLVMS_P03355_3mutA_WS





PAPGGG
13,120
WMSV_P03359_3mut





EAAAKPAP
13,121
PERV_Q4VFZ2_3mut





GSSGSSGSSGSSGSS
13,122
WMSV_P03359_3mut





PAPGGG
13,123
MLVBM_Q7SVK7_3mutA_WS





GGSGGGEAAAK
13,124
BAEVM_P10272_3mutA





PAPGGS
13,125
MLVMS_P03355_3mut





GGSGGSGGSGGS
13,126
MLVBM_Q7SVK7_3mutA_WS





EAAAKEAAAKEAAAKEAAAK
13,127
PERV_Q4VFZ2_3mut





GGSEAAAKGGG
13,128
WMSV_P03359_3mutA





GGGPAP
13,129
BAEVM_P10272_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
13,130
XMRV6_A1Z651_3mut





GGSPAPGSS
13,131
KORV_Q9TTC1_3mut





GGGPAPGSS
13,132
MLVMS_P03355_3mut





GGGGSSGGS
13,133
BAEVM_P10272_3mutA





GGGEAAAKGSS
13,134
KORV_Q9TTC1-Pro_3mutA





PAPAP
13,135
MLVBM_Q7SVK7_3mutA_WS





GGSPAPGGG
13,136
PERV_Q4VFZ2_3mut





PAPGSS
13,137
PERV_Q4VFZ2_3mutA_WS





GSSGGSPAP
13,138
MLVBM_Q7SVK7_3mutA_WS





EAAAKGGGGSEAAAK
13,139
PERV_Q4VFZ2_3mut





GSSEAAAKGGS
13,140
KORV_Q9TTC1-Pro_3mut





PAPAPAPAP
13,141
KORV_Q9TTC1-Pro_3mutA





GGSEAAAKPAP
13,142
WMSV_P03359_3mutA





PAPGGS
13,143
FLV_P10273_3mutA





EAAAKGGGPAP
13,144
PERV_Q4VFZ2_3mut





GGSGSSGGG
13,145
AVIRE_P03360_3mutA





EAAAKGGSGSS
13,146
BAEVM_P10272_3mutA





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
13,147
MLVCB_P08361_3mutA





GSSEAAAKGGS
13,148
XMRV6_A1Z651_3mutA





GGGGG
13,149
BAEVM_P10272_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
13,150
SFV3L_P27401_2mutA





GGGEAAAKGSS
13,151
MLVMS_P03355_PLV919





EAAAKGGGGSEAAAK
13,152
KORV_Q9TTC1_3mutA





EAAAKGGG
13,153
AVIRE_P03360_3mut





GGSGGG
13,154
MLVMS_P03355_3mutA_WS





GGSGSSGGG
13,155
MLVMS_P03355_PLV919





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
13,156
KORV_Q9TTC1_3mut





GGGGSEAAAKGGGGS
13,157
KORV_Q9TTC1_3mutA





PAPAPAPAPAP
13,158
FLV_P10273_3mutA





GGS

MLVBM_Q7SVK7_3mutA_WS





GGGGGSEAAAK
13,160
MLVBM_Q7SVK7_3mutA_WS





GSSGSSGSSGSSGSS
13,161
MLVMS_P03355_3mutA_WS





EAAAKEAAAKEAAAKEAAAKEAAAK
13,162
MLVMS_P03355_3mut





GGSGSSGGG
13,163
PERV_Q4VFZ2_3mut





PAP

MLVFF_P26809_3mut





GSSPAPEAAAK
13,165
MLVAV_P03356_3mutA





EAAAKGGGGSS
13,166
MLVMS_P03355_3mut





GGGEAAAKGGS
13,167
XMRV6_A1Z651_3mut





GGSGGGPAP
13,168
MLVBM_Q7SVK7_3mutA_WS





GSAGSAAGSGEF
13,169
BAEVM_P10272_3mutA





GSSEAAAK
13,170
MLVCB_P08361_3mut





PAPGSS
13,171
MLVMS_P03355_3mut





EAAAKEAAAKEAAAK
13,172
MLVAV_P03356_3mutA





GSAGSAAGSGEF
13,173
XMRV6_A1Z651_3mutA





GSSGSSGSSGSS
13,174
BAEVM_P10272_3mutA





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,175
KORV_Q9TTC1-Pro_3mut





GGGGSSEAAAK
13,176
WMSV_P03359_3mut





GSSGGGEAAAK
13,177
MLVBM_Q7SVK7_3mutA_WS





EAAAKPAP
13,178
MLVFF_P26809_3mutA





GGSPAPGGG
13,179
KORV_Q9TTC1_3mutA





PAPEAAAK
13,180
FLV_P10273_3mutA





GSSGSSGSS
13,181
MLVBM_Q7SVK7_3mutA_WS





GSSGGGEAAAK
13,182
FLV_P10273_3mutA





GGSPAP
13,183
MLVBM_Q7SVK7_3mutA_WS





GSAGSAAGSGEF
13,184
KORV_Q9TTC1-Pro_3mutA





PAPGGSEAAAK
13,185
MLVMS_P03355_PLV919





GGSPAPEAAAK
13,186
MLVBM_Q7SVK7_3mutA_WS





GGGGGSPAP
13,187
MLVBM_Q7SVK7_3mutA_WS





EAAAKGSSPAP
13,188
WMSV_P03359_3mut





EAAAKGGGPAP
13,189
MLVBM_Q7SVK7_3mutA_WS





PAPGSS
13,190
KORV_Q9TTC1-Pro_3mutA





GGSGSSGGG
13,191
BAEVM_P10272_3mut





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
13,192
FFV_O93209-Pro_2mut





GGSGGSGGSGGSGGSGGS
13,193
WMSV_P03359_3mutA





GGSGGSGGS
13,194
PERV_Q4VFZ2_3mutA_WS





GGGGG
13,195
PERV_Q4VFZ2_3mutA_WS





GGGPAP
13,196
FLV_P10273_3mutA





PAPGGSGGG
13,197
XMRV6_A1Z651_3mutA





GGGGSEAAAKGGGGS
13,198
XMRV6_A1Z651_3mut





EAAAKGSSGGG
13,199
KORV_Q9TTC1-Pro_3mutA





GSSGGSEAAAK
13,200
WMSV_P03359_3mut





EAAAKGGSGSS
13,201
PERV_Q4VFZ2_3mut





PAPAPAPAPAP
13,202
PERV_Q4VFZ2_3mut





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
13,203
MLVMS_P03355_3mutA_WS





GGGGGGG
13,204
KORV_Q9TTC1_3mutA





EAAAK
13,205
KORV_Q9TTC1-Pro_3mutA





GGGEAAAKGGS
13,206
KORV_Q9TTC1-Pro_3mutA





GGGEAAAKGGS
13,207
PERV_Q4VFZ2_3mutA_WS





GGGGGSPAP
13,208
XMRV6_A1Z651_3mut





GGGGSGGGGSGGGGSGGGGS
13,209
MLVFF_P26809_3mut





GGGGGGG
13,210
MLVFF_P26809_3mut





PAPAPAPAPAPAP
13,211
AVIRE_P03360_3mutA





GSSPAPGGG
13,212
FLV_P10273_3mutA





GGGGGSPAP
13,213
MLVMS_P03355_3mutA_WS





GGGGSGGGGSGGGGS
13,214
MLVMS_P03355_3mut





GGGGSGGGGSGGGGS
13,215
KORV_Q9TTC1_3mut





GSSEAAAKGGS
13,216
MLVAV_P03356_3mutA





GSSGSSGSSGSSGSS
13,217
MLVMS_P03355_3mut





EAAAKGGGGGS
13,218
PERV_Q4VFZ2_3mutA_WS





GSSGGGGGS
13,219
PERV_Q4VFZ2_3mut





GGGEAAAKPAP
13,220
MLVMS_P03355_3mut





GSSGGSPAP
13,221
PERV_Q4VFZ2_3mutA_WS





GSSGGGPAP
13,222
BAEVM_P10272_3mutA





GGGGGSGSS
13,223
MLVMS_P03355_PLV919





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,224
BAEVM_P10272_3mut





PAPEAAAK
13,225
MLVMS_P03355_3mut





GGGGSGGGGSGGGGS
13,226
FLV_P10273_3mutA





GGSGSSGGG
13,227
WMSV_P03359_3mutA





EAAAKGGS
13,228
PERV_Q4VFZ2_3mut





EAAAKGSSPAP
13,229
MLVCB_P08361_3mut





EAAAKGGSGSS
13,230
WMSV_P03359_3mutA





GSSGSS
13,231
PERV_Q4VFZ2_3mutA_WS





PAPAPAPAP
13,232
MLVMS_P03355_PLV919





GGSGGG
13,233
PERV_Q4VFZ2_3mutA_WS





GSS

MLVBM_Q7SVK7_3mutA_WS





PAP

KORV_Q9TTC1-Pro_3mutA





GGSGSSEAAAK
13,236
MLVFF_P26809_3mut





PAPEAAAKGSS
13,237
KORV_Q9TTC1-Pro_3mutA





GGSGGS
13,238
MLVCB_P08361_3mutA





GGGGGGG
13,239
PERV_Q4VFZ2_3mutA_WS





GGSPAPEAAAK
13,240
MLVBM_Q7SVK7_3mut





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,241
KORV_Q9TTC1_3mutA





GGSPAP
13,242
MLVMS_P03355_3mut





GGSEAAAKGGG
13,243
PERV_Q4VFZ2_3mut





GGGGSGGGGS
13,244
FLV_P10273_3mutA





GGGEAAAK
13,245
BAEVM_P10272_3mutA





GGGGSGGGGGGGGSGGGGSGGGGGGGGS
13,246
SFV3L_P27401_2mut





GGSEAAAKPAP
13,247
KORV_Q9TTC1-Pro_3mutA





GSSGGGEAAAK
13,248
MLVMS_P03355_PLV919





GGGGGSEAAAK
13,249
MLVMS_P03355_PLV919





EAAAKGGSGGG
13,250
MLVMS_P03355_3mutA_WS





GGGGSSPAP
13,251
MLVAV_P03356_3mutA





EAAAKEAAAK
13,252
MLVMS_P03355_3mutA_WS





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,253
SFV3L_P27401_2mut





GSSGSSGSSGSSGSS
13,254
MLVMS_P03355_PLV919





GSSGGG
13,255
KORV_Q9TTC1-Pro_3mutA





GSSGGS
13,256
MLVFF_P26809_3mutA





GGGGSGGGGS
13,257
XMRV6_A1Z651_3mutA





PAPGSS
13,258
MLVBM_Q7SVK7_3mutA_WS





GGGPAPEAAAK
13,259
XMRV6_A1Z651_3mutA





EAAAKGGS
13,260
MLVFF_P26809_3mut





GSS

KORV_Q9TTC1_3mutA





GGGG
13,262
PERV_Q4VFZ2_3mut





GGGGGSEAAAK
13,263
AVIRE_P03360_3mutA





GSSGSSGSSGSSGSS
13,264
MLVMS_P03355_PLV919





PAPGGSGGG
13,265
PERV_Q4VFZ2_3mut





GGGPAP
13,266
PERV_Q4VFZ2_3mut





GGGPAPEAAAK
13,267
AVIRE_P03360_3mutA





GGGEAAAK
13,268
MLVCB_P08361_3mut





GGG

MLVFF_P26809_3mutA





EAAAKPAPGSS
13,270
XMRV6_A1Z651_3mutA





GGSGSSEAAAK
13,271
PERV_Q4VFZ2_3mutA_WS





EAAAKGSS
13,272
MLVMS_P03355_3mut





GGSGSSEAAAK
13,273
BAEVM_P10272_3mut





GGSGGG
13,274
MLVBM_Q7SVK7_3mutA_WS





GGGPAP
13,275
MLVMS_P03355_PLV919





GGSPAPGGG
13,276
PERV_Q4VFZ2_3mutA_WS





GGGGGSEAAAK
13,277
MLVFF_P26809_3mutA





EAAAKGSSGGS
13,278
MLVBM_Q7SVK7_3mut





PAPAP
13,279
XMRV6_A1Z651_3mut





GSSPAPGGS
13,280
MLVBM_Q7SVK7_3mutA_WS





GSSEAAAKGGG
13,281
WMSV_P03359_3mutA





EAAAKGGGGGS
13,282
PERV_Q4VFZ2_3mut





GSSGSSGSSGSSGSS
13,283
MLVCB_P08361_3mutA





EAAAKGGGGSS
13,284
PERV_Q4VFZ2_3mut





EAAAKGSS
13,285
PERV_Q4VFZ2_3mut





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,286
AVIRE_P03360_3mutA





EAAAKGGS
13,287
MLVCB_P08361_3mut





GSSGGSEAAAK
13,288
MLVAV_P03356_3mutA





EAAAKPAPGGS
13,289
PERV_Q4VFZ2_3mut





GGSGGS
13,290
MLVAV_P03356_3mutA





EAAAKGSSGGG
13,291
AVIRE_P03360_3mutA





GGSGGSGGSGGS
13,292
PERV_Q4VFZ2_3mut





GGGGGGGG
13,293
KORV_Q9TTC1_3mutA





GGSGSSEAAAK
13,294
MLVCB_P08361_3mutA





EAAAKGGG
13,295
MLVBM_Q7SVK7_3mutA_WS





GGGGSGGGGSGGGGS
13,296
MLVCB_P08361_3mut





GGSGGSGGSGGS
13,297
PERV_Q4VFZ2_3mutA_WS





PAPAPAPAPAP
13,298
WMSV_P03359_3mut





EAAAKEAAAKEAAAKEAAAK
13,299
PERV_Q4VFZ2_3mut





GGSGGSGGS
13,300
XMRV6_A1Z651_3mutA





PAPGGGGSS
13,301
BAEVM_P10272_3mutA





GSSEAAAKGGS
13,302
MLVCB_P08361_3mut





GSSGGGPAP
13,303
MLVCB_P08361_3mutA





GGSGSS
13,304
MLVBM_Q7SVK7_3mutA_WS





GGGGGSEAAAK
13,305
MLVAV_P03356_3mutA





GSSEAAAK
13,306
PERV_Q4VFZ2_3mutA_WS





GGGGGSGSS
13,307
MLVBM_Q7SVK7_3mutA_WS





EAAAKGGSGSS
13,308
MLVFF_P26809_3mut





PAP

FLV_P10273_3mutA





GGGGG
13,310
MLVMS_P03355_3mutA_WS





EAAAK
13,311
PERV_Q4VFZ2_3mut





GSS

FLV_P10273_3mutA





PAPAPAPAPAPAP
13,313
KORV_Q9TTC1-Pro_3mutA





EAAAKEAAAKEAAAKEAAAK
13,314
MLVCB_P08361_3mut





EAAAKGGGGSEAAAK
13,315
XMRV6_A1Z651_3mut





PAPGGSGGG
13,316
MLVBM_Q7SVK7_3mutA_WS





GGSGGGPAP
13,317
WMSV_P03359_3mutA





GGGGSSEAAAK
13,318
MLVBM_Q7SVK7_3mutA_WS





PAPGGGGSS
13,319
MLVCB_P08361_3mut





GGSGGSGGSGGS
13,320
PERV_Q4VFZ2_3mutA_WS





PAPGGSGGG
13,321
MLVMS_P03355_3mutA_WS





GSSPAPGGS
13,322
MLVCB_P08361_3mutA





GSSGSSGSS
13,323
MLVFF_P26809_3mut





PAPGGGGGS
13,324
MLVBM_Q7SVK7_3mutA_WS





GSSPAP
13,325
PERV_Q4VFZ2_3mut





GGSGGG
13,326
KORV_Q9TTC1-Pro_3mut





EAAAKGGGGSEAAAK
13,327
PERV_Q4VFZ2_3mutA_WS





GGSPAPEAAAK
13,328
PERV_Q4VFZ2_3mutA_WS





EAAAKPAP
13,329
BAEVM_P10272_3mut





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
13,330
MLVMS_P03355_3mut





EAAAKGGGGSS
13,331
MLVFF_P26809_3mut





EAAAKEAAAK
13,332
MLVCB_P08361_3mut





GSSEAAAKGGS
13,333
PERV_Q4VFZ2_3mut





GGSPAP
13,334
KORV_Q9TTC1-Pro_3mutA





EAAAKEAAAKEAAAKEAAAK
13,335
MLVMS_P03355_3mutA_WS





GSSGSSGSSGSSGSS
13,336
BAEVM_P10272_3mut





PAPEAAAK
13,337
MLVMS_P03355_3mut





GSSGGSPAP
13,338
PERV_Q4VFZ2





GGGPAPGGS
13,339
BAEVM_P10272_3mutA





EAAAKPAPGGS
13,340
MLVMS_P03355_PLV919





GGGGSGGGGS
13,341
PERV_Q4VFZ2





GGGEAAAK
13,342
KORV_Q9TTC1-Pro_3mut





EAAAKGGGGGS
13,343
FLV_P10273_3mutA





GGSPAPGSS
13,344
MLVMS_P03355_3mut





GSSPAPEAAAK
13,345
MLVMS_P03355_3mutA_WS





GSAGSAAGSGEF
13,346
MLVBM_Q7SVK7_3mutA_WS





EAAAK
13,347
BAEVM_P10272_3mutA





EAAAKGGGGSS
13,348
BAEVM_P10272_3mutA





GGG

WMSV_P03359_3mut





GGSGSSPAP
13,350
BAEVM_P10272_3mut





GGSEAAAKPAP
13,351
MLVBM_Q7SVK7_3mutA_WS





EAAAKGGSGSS
13,352
MLVCB_P08361_3mut





PAPGSS
13,353
MLVAV_P03356_3mutA





PAPEAAAKGGG
13,354
MLVCB_P08361_3mutA





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,355
FOAMV_P14350-Pro_2mut





GSSGSSGSS
13,356
PERV_Q4VFZ2_3mut





PAPGGG
13,357
MLVMS_P03355_3mut





PAPGGS
13,358
PERV_Q4VFZ2_3mut





GSSGGG
13,359
MLVMS_P03355_PLV919





GSSGSSGSSGSSGSSGSS
13,360
WMSV_P03359_3mut





PAP

AVIRE_P03360_3mutA





EAAAKGSSPAP
13,362
MLVBM_Q7SVK7_3mutA_WS





GSSGSSGSSGSS
13,363
MLVMS_P03355_PLV919





GGGGSGGGGSGGGGSGGGGSGGGGS
13,364
AVIRE_P03360





GGGGS
13,365
PERV_Q4VFZ2_3mut





EAAAKGSSGGG
13,366
MLVBM_Q7SVK7_3mutA_WS





GGGGGG
13,367
KORV_Q9TTC1-Pro_3mut





GGSGSSEAAAK
13,368
PERV_Q4VFZ2_3mut





GSSPAPEAAAK
13,369
MLVBM_Q7SVK7_3mutA_WS





GGGGSGGGGS
13,370
MLVBM_Q7SVK7_3mutA_WS





GSSGGGGGS
13,371
MLVAV_P03356_3mutA





GSAGSAAGSGEF
13,372
WMSV_P03359_3mutA





GGGEAAAKGSS
13,373
BAEVM_P10272_3mutA





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,374
FFV_O93209-Pro_2mut





PAPGGSGGG
13,375
MLVCB_P08361_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAK
13,376
SFV3L_P27401_2mut





GGSGSSPAP
13,377
MLVMS_P03355_PLV919





GGGGGG
13,378
PERV_Q4VFZ2_3mut





EAAAKEAAAKEAAAKEAAAKEAAAK
13,379
PERV_Q4VFZ2_3mut





EAAAKGSSPAP
13,380
MLVFF_P26809_3mut





GGGPAPGGS
13,381
MLVBM_Q7SVK7_3mutA_WS





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,382
SFV3L_P27401





PAP

PERV_Q4VFZ2_3mut





EAAAKGGS
13,384
MLVMS_P03355_PLV919





GSSGGSEAAAK
13,385
WMSV_P03359_3mutA





GGSGSSEAAAK
13,386
KORV_Q9TTC1-Pro_3mutA





EAAAKEAAAKEAAAK
13,387
PERV_Q4VFZ2





GGSGGGEAAAK
13,388
MLVMS_P03355_3mutA_WS





GGGGSGGGGSGGGGSGGGGS
13,389
BAEVM_P10272_3mut





EAAAKGSS
13,390
XMRV6_A1Z651_3mutA





GSSGGGGGS
13,391
WMSV_P03359_3mutA





GSSGSSGSSGSSGSSGSS
13,392
MLVFF_P26809_3mutA





GGSGSS
13,393
MLVAV_P03356_3mutA





EAAAKGGGGSEAAAK
13,394
MLVMS_P03355_PLV919





EAAAKGGGPAP
13,395
PERV_Q4VFZ2





GGSEAAAKGGG
13,396
MLVAV_P03356_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,397
MLVBM_Q7SVK7_3mut





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,398
KORV_Q9TTC1-Pro_3mutA





GSSPAPEAAAK
13,399
MLVFF_P26809_3mutA





GGGGSEAAAKGGGGS
13,400
PERV_Q4VFZ2_3mut





GSSGSSGSSGSS
13,401
PERV_Q4VFZ2_3mut





GGSEAAAK
13,402
MLVFF_P26809_3mutA





GGGGGGGG
13,403
MLVMS_P03355_3mut





GSSGGG
13,404
XMRV6_A1Z651_3mutA





EAAAKGGS
13,405
BAEVM_P10272_3mutA





GGGGS
13,406
BAEVM_P10272_3mutA





GGSEAAAKGGG
13,407
KORV_Q9TTC1-Pro_3mutA





GGSGSSGGG
13,408
KORV_Q9TTC1_3mutA





GGSGSSEAAAK
13,409
WMSV_P03359_3mut





EAAAKGGSGSS
13,410
MLVBM_Q7SVK7_3mutA_WS





GGS

BAEVM_P10272_3mutA





GGGPAPGSS
13,412
WMSV_P03359_3mutA





GSSGSSGSSGSSGSS
13,413
AVIRE_P03360_3mut





GGGEAAAKPAP
13,414
XMRV6_A1Z651_3mut





GSSGGG
13,415
MLVFF_P26809_3mutA





GGSPAPGSS
13,416
PERV_Q4VFZ2_3mut





PAPGGS
13,417
MLVCB_P08361_3mut





PAPAPAPAPAP
13,418
KORV_Q9TTC1_3mutA





GSSGGS
13,419
MLVCB_P08361_3mutA





GSSGGSEAAAK
13,420
PERV_Q4VFZ2_3mut





EAAAKGSSGGS
13,421
MLVMS_P03355_PLV919





EAAAKGGG
13,422
WMSV_P03359_3mut





PAPGGGGGS
13,423
BAEVM_P10272_3mutA





GGGGSEAAAKGGGGS
13,424
WMSV_P03359_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,425
MLVMS_P03355_3mutA_WS





GGS

KORV_Q9TTC1-Pro_3mutA





GSSGGSPAP
13,427
BAEVM_P10272_3mutA





GGG

MLVMS_P03355_PLV919





PAPGSS
13,429
KORV_Q9TTC1-Pro_3mut





GGSEAAAKGGG
13,430
FLV_P10273_3mutA





GGSEAAAKPAP
13,431
PERV_Q4VFZ2_3mutA_WS





GGGGSSPAP
13,432
XMRV6_A1Z651_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAK
13,433
PERV_Q4VFZ2_3mutA_WS





GGGG
13,434
PERV_Q4VFZ2_3mutA_WS





GGSEAAAKPAP
13,435
MLVMS_P03355_3mut





PAPGSSGGG
13,436
MLVMS_P03355_3mutA_WS





PAPEAAAKGGS
13,437
AVIRE_P03360_3mut





GGGGSSPAP
13,438
MLVMS_P03355_3mutA_WS





GGGGSGGGGSGGGGSGGGGS
13,439
PERV_Q4VFZ2_3mut





GGGEAAAK
13,440
MLVMS_P03355_3mut





GGGGSS
13,441
MLVFF_P26809_3mut





GGSPAPGSS
13,442
XMRV6_A1Z651_3mut





GGGGS
13,443
KORV_Q9TTC1-Pro_3mutA





EAAAKGSSGGS
13,444
FLV_P10273_3mutA





GSS

MLVMS_P03355_PLV919





GGGG
13,446
MLVMS_P03355_PLV919





GSSGGS
13,447
MLVMS_P03355_PLV919





GGSGGSGGSGGS
13,448
MLVMS_P03355_3mut





PAPEAAAKGGS
13,449
MLVMS_P03355_3mut





EAAAKGSSGGG
13,450
BAEVM_P10272_3mutA





GSSEAAAK
13,451
KORV_Q9TTC1-Pro_3mutA





GSAGSAAGSGEF
13,452
KORV_Q9TTC1_3mutA





GGGGGSEAAAK
13,453
MLVCB_P08361_3mut





GGGG
13,454
WMSV_P03359_3mut





GGGGSSEAAAK
13,455
MLVMS_P03355_PLV919





PAPGGG
13,456
WMSV_P03359_3mutA





EAAAKGGSGGG
13,457
MLVAV_P03356_3mutA





GGGPAPGGS
13,458
MLVMS_P03355_3mut





EAAAKPAP
13,459
PERV_Q4VFZ2_3mutA_WS





GSSGSSGSS
13,460
KORV_Q9TTC1-Pro_3mutA





GSSPAPGGS
13,461
XMRV6_A1Z651_3mut





GGGGGSPAP
13,462
BAEVM_P10272_3mutA





GGSGSSGGG
13,463
PERV_Q4VFZ2_3mutA_WS





GGGEAAAKGSS
13,464
AVIRE_P03360_3mut





GSSEAAAK
13,465
FLV_P10273_3mutA





EAAAK
13,466
MLVMS_P03355_3mut





EAAAKGGSGSS
13,467
WMSV_P03359_3mut





GSSEAAAKGGG
13,468
PERV_Q4VFZ2_3mut





PAPGSSGGG
13,469
BAEVM_P10272_3mutA





EAAAKGGGGGS
13,470
MLVMS_P03355_3mut





GGSEAAAKPAP
13,471
AVIRE_P03360_3mut





GGGPAPGGS
13,472
XMRV6_A1Z651_3mut





GGGGS
13,473
KORV_Q9TTC1_3mutA





GGSGGSGGSGGSGGS
13,474
XMRV6_A1Z651_3mut





GGGPAP
13,475
KORV_Q9TTC1-Pro_3mut





EAAAKPAP
13,476
MLVBM_Q7SVK7_3mutA_WS





GGSEAAAK
13,477
MLVMS_P03355_PLV919





GSSEAAAKPAP
13,478
KORV_Q9TTC1-Pro_3mutA





GGSGSS
13,479
MLVMS_P03355_3mut





EAAAKPAPGGG
13,480
PERV_Q4VFZ2_3mut





GGSPAPEAAAK
13,481
KORV_Q9TTC1_3mutA





GGSEAAAKGGG
13,482
AVIRE_P03360_3mutA





GGGGSEAAAKGGGGS
13,483
MLVMS_P03355_PLV919





GSSGGGEAAAK
13,484
KORV_Q9TTC1-Pro_3mutA





EAAAKGGGPAP
13,485
WMSV_P03359_3mut





GSSPAP
13,486
XMRV6_A1Z651_3mutA





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,487
SFV3L_P27401-Pro





GGSEAAAKGSS
13,488
MLVMS_P03355_PLV919





GSSGGSEAAAK
13,489
KORV_Q9TTC1-Pro_3mutA





GGSEAAAKGSS
13,490
KORV_Q9TTC1-Pro_3mutA





EAAAKGGG
13,491
AVIRE_P03360_3mutA





GSSGGSEAAAK
13,492
BAEVM_P10272_3mutA





GGGGSEAAAKGGGGS
13,493
KORV_Q9TTC1-Pro_3mut





PAPGSSEAAAK
13,494
MLVMS_P03355_3mut





PAPEAAAK
13,495
WMSV_P03359_3mut





PAPGGSGSS
13,496
PERV_Q4VFZ2_3mutA_WS





PAPGSS
13,497
BAEVM_P10272_3mut





PAPGGGGGS
13,498
MLVMS_P03355_3mut





EAAAKPAPGSS
13,499
MLVBM_Q7SVK7_3mutA_WS





GSSPAPGGS
13,500
MLVMS_P03355_PLV919





GGSGSSEAAAK
13,501
MLVMS_P03355_3mut





GGGGGG
13,502
KORV_Q9TTC1-Pro_3mutA





EAAAKEAAAKEAAAKEAAAK
13,503
MLVBM_Q7SVK7_3mut





GGSPAPGSS
13,504
MLVMS_P03355_PLV919





PAPAPAPAPAP
13,505
MLVCB_P08361_3mut





GGSGSSPAP
13,506
WMSV_P03359_3mutA





EAAAKGGSGGG
13,507
PERV_Q4VFZ2_3mutA_WS





GSSGSSGSSGSSGSS
13,508
PERV_Q4VFZ2_3mut





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,509
KORV_Q9TTC1_3mutA





GSSGGGEAAAK
13,510
WMSV_P03359_3mutA





GSSGGSEAAAK
13,511
FLV_P10273_3mutA





GGGGGGGG
13,512
PERV_Q4VFZ2_3mut





PAPGGSEAAAK
13,513
FLV_P10273_3mutA





GGGGSSPAP
13,514
BAEVM_P10272_3mutA





PAPAPAPAP
13,515
WMSV_P03359_3mut





GGSEAAAKPAP
13,516
PERV_Q4VFZ2_3mut





PAPGGSGGG
13,517
BAEVM_P10272_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,518
MLVMS_P03355_3mut





GGGGSGGGGSGGGGS
13,519
PERV_Q4VFZ2_3mut





GGSGGGPAP
13,520
PERV_Q4VFZ2_3mut





GGGPAPEAAAK
13,521
MLVFF_P26809_3mut





GGGGGSGSS
13,522
MLVMS_P03355_3mutA_WS





GSS

MLVCB_P08361_3mut





GGGGGSPAP
13,524
MLVMS_P03355_PLV919





GGSPAP
13,525
MLVAV_P03356_3mutA





GGGPAPGGS
13,526
KORV_Q9TTC1-Pro_3mutA





PAPGSSGGG
13,527
FLV_P10273_3mutA





PAPGSSGGG
13,528
WMSV_P03359_3mutA





PAPGGS
13,529
MLVBM_Q7SVK7_3mutA_WS





GGGEAAAKGSS
13,530
PERV_Q4VFZ2_3mutA_WS





GGSEAAAKGSS
13,531
MLVBM_Q7SVK7_3mutA_WS





PAPGGSEAAAK
13,532
MLVCB_P08361_3mut





GGSEAAAKGGG
13,533
XMRV6_A1Z651_3mutA





GGSGGGGSS
13,534
WMSV_P03359_3mut





GGGEAAAKPAP
13,535
KORV_Q9TTC1_3mutA





EAAAKGSS
13,536
KORV_Q9TTC1-Pro_3mut





PAPEAAAKGSS
13,537
MLVFF_P26809_3mut





GSAGSAAGSGEF
13,538
PERV_Q4VFZ2_3mut





EAAAKGGGGGS
13,539
WMSV_P03359_3mut





EAAAKGSSPAP
13,540
WMSV_P03359_3mutA





GGGGSEAAAKGGGGS
13,541
XMRV6_A1Z651_3mutA





GSSEAAAKPAP
13,542
SFV3L_P27401-Pro_2mutA





GGGGGG
13,543
PERV_Q4VFZ2_3mutA_WS





PAPGGS
13,544
BAEVM_P10272_3mut





PAP

AVIRE_P03360_3mut





PAPAPAP
13,546
MLVBM_Q7SVK7_3mutA_WS





GGGG
13,547
PERV_Q4VFZ2_3mutA_WS





GSSGGSEAAAK
13,548
MLVBM_Q7SVK7_3mut





GGSGGGGSS
13,549
MLVFF_P26809_3mut





GGGGSSGGS
13,550
AVIRE_P03360_3mutA





GSSPAPGGG
13,551
PERV_Q4VFZ2_3mutA_WS





GGSEAAAKPAP
13,552
MLVMS_P03355_PLV919





PAP

KORV_Q9TTC1-Pro_3mut





GSSGGS
13,554
PERV_Q4VFZ2_3mut





GGGGG
13,555
PERV_Q4VFZ2_3mut





GSSGGGPAP
13,556
FLV_P10273_3mutA





GSSEAAAKGGG
13,557
KORV_Q9TTC1-Pro_3mut





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,558
MLVCB_P08361_3mut





GGSEAAAKPAP
13,559
MLVCB_P08361_3mut





PAPAPAPAPAPAP
13,560
BAEVM_P10272_3mutA





GGGGSEAAAKGGGGS
13,561
MLVMS_P03355_3mut





EAAAKPAPGSS
13,562
MLVMS_P03355_3mut





GSSGSSGSSGSSGSS
13,563
MLVBM_Q7SVK7_3mutA_WS





PAPEAAAKGSS
13,564
MLVAV_P03356_3mut





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,565
AVIRE_P03360_3mut





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,566
PERV_Q4VFZ2_3mut





GGSEAAAKGGG
13,567
PERV_Q4VFZ2_3mutA_WS





GGSGGGGSS
13,568
MLVFF_P26809_3mutA





PAPEAAAKGSS
13,569
MLVCB_P08361_3mut





GGG

PERV_Q4VFZ2_3mutA_WS





GGSGGGEAAAK
13,571
MLVMS_P03355_3mut





EAAAKGGGGSS
13,572
WMSV_P03359_3mut





GSSPAPGGG
13,573
WMSV_P03359_3mutA





EAAAKGSSGGG
13,574
PERV_Q4VFZ2_3mut





GGSGGGEAAAK
13,575
PERV_Q4VFZ2_3mutA_WS





GGSGGSGGSGGSGGS
13,576
PERV_Q4VFZ2_3mutA_WS





EAAAKPAPGGS
13,577
PERV_Q4VFZ2_3mutA_WS





GGGGGSEAAAK
13,578
PERV_Q4VFZ2_3mutA_WS





GSSPAP
13,579
MLVFF_P26809_3mut





GGGEAAAKPAP
13,580
AVIRE_P03360_3mut





GSSGGSEAAAK
13,581
MLVMS_P03355_PLV919





EAAAKPAPGGS
13,582
WMSV_P03359_3mutA





PAPGGG
13,583
KORV_Q9TTC1_3mutA





EAAAKGSSPAP
13,584
KORV_Q9TTC1-Pro_3mut





GSSPAPEAAAK
13,585
MLVFF_P26809_3mut





GGSGGGEAAAK
13,586
MLVFF_P26809_3mutA





GSSGSSGSS
13,587
WMSV_P03359_3mutA





EAAAKGGS
13,588
BAEVM_P10272_3mut





EAAAKPAPGGS
13,589
KORV_Q9TTC1_3mutA





EAAAKPAPGGS
13,590
BAEVM_P10272_3mutA





GSSGGGGGS
13,591
PERV_Q4VFZ2_3mut





PAPGGGGSS
13,592
PERV_Q4VFZ2_3mut





GSSGSSGSS
13,593
WMSV_P03359_3mut





EAAAKEAAAKEAAAKEAAAK
13,594
WMSV_P03359_3mut





GGS

AVIRE_P03360_3mut





EAAAKPAPGSS
13,596
MLVFF_P26809_3mut





EAAAKGGG
13,597
KORV_Q9TTC1_3mut





PAPGSSEAAAK
13,598
MLVMS_P03355_3mut





PAPGSSGGS
13,599
MLVMS_P03355_PLV919





GSSPAPEAAAK
13,600
MLVMS_P03355_3mut





GSSGSSGSSGSSGSSGSS
13,601
WMSV_P03359_3mutA





GGGGS
13,602
BAEVM_P10272_3mut





GSSPAP
13,603
MLVMS_P03355_3mut





EAAAKGGGGSEAAAK
13,604
KORV_Q9TTC1-Pro_3mutA





EAAAKEAAAK
13,605
WMSV_P03359_3mutA





GGGGSSGGS
13,606
MLVCB_P08361_3mutA





PAPGGSEAAAK
13,607
BAEVM_P10272_3mut





EAAAKGGSPAP
13,608
MLVFF_P26809_3mut





GSSGGSGGG
13,609
MLVBM_Q7SVK7_3mutA_WS





GSSGGS
13,610
PERV_Q4VFZ2_3mut





PAPGGSGSS
13,611
PERV_Q4VFZ2_3mutA_WS





EAAAKGGSGSS
13,612
KORV_Q9TTC1-Pro_3mutA





PAPAP
13,613
MLVCB_P08361_3mut





EAAAKGSSPAP
13,614
PERV_Q4VFZ2_3mutA_WS





EAAAKPAPGGG
13,615
MLVMS_P03355_PLV919





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
13,616
MLVBM_Q7SVK7_3mut





EAAAKGGGGSS
13,617
MLVMS_P03355_PLV919





PAPEAAAK
13,618
PERV_Q4VFZ2_3mut





EAAAKPAPGSS
13,619
BAEVM_P10272_3mutA





GGSPAP
13,620
PERV_Q4VFZ2_3mutA_WS





GGSGGS
13,621
BAEVM_P10272_3mutA





PAPEAAAKGSS
13,622
KORV_Q9TTC1_3mut





PAPGSS
13,623
MLVMS_P03355_PLV919





PAPAPAPAPAP
13,624
MLVAV_P03356_3mutA





GGG

XMRV6_A1Z651_3mutA





GGGPAP
13,626
PERV_Q4VFZ2_3mutA_WS





GSSPAPEAAAK
13,627
KORV_Q9TTC1_3mutA





PAP

BAEVM_P10272_3mutA





GGSPAP
13,629
BAEVM_P10272_3mutA





PAPEAAAKGGS
13,630
MLVMS_P03355_PLV919





PAPGSSGGS
13,631
PERV_Q4VFZ2_3mutA_WS





PAPAPAPAPAPAP
13,632
PERV_Q4VFZ2_3mut





EAAAKEAAAKEAAAK
13,633
MLVCB_P08361_3mut





GGSGGSGGSGGSGGS
13,634
MLVMS_P03355_PLV919





EAAAKPAPGGS
13,635
MLVMS_P03355_3mut





GGSGGS
13,636
MLVMS_P03355_PLV919





EAAAKPAP
13,637
MLVMS_P03355_3mutA_WS





GGSEAAAK
13,638
XMRV6_A1Z651_3mutA





GGSGGG
13,639
KORV_Q9TTC1_3mut





GGSGGGEAAAK
13,640
PERV_Q4VFZ2_3mut





PAPEAAAKGGG
13,641
AVIRE_P03360





PAPAP
13,642
PERV_Q4VFZ2_3mut





GSS

KORV_Q9TTC1-Pro_3mutA





EAAAKGSSGGG
13,644
MLVAV_P03356_3mutA





GGSPAPGSS
13,645
MLVBM_Q7SVK7_3mutA_WS





PAPEAAAK
13,646
MLVAV_P03356_3mut





EAAAKGGSPAP
13,647
BAEVM_P10272_3mutA





PAPAPAPAP
13,648
WMSV_P03359_3mutA





PAPGGSEAAAK
13,649
MLVMS_P03355_3mut





GGSGGSGGSGGS
13,650
WMSV_P03359_3mut





GGGGGSGSS
13,651
XMRV6_A1Z651_3mut





PAPGGSGGG
13,652
KORV_Q9TTC1_3mutA





GGS

MLVMS_P03355_3mut





EAAAK
13,654
WMSV_P03359_3mut





GGGEAAAKGSS
13,655
MLVBM_Q7SVK7_3mutA_WS





GGSPAPGSS
13,656
MLVCB_P08361_3mut





GGSEAAAKPAP
13,657
PERV_Q4VFZ2_3mut





GGGGSGGGGSGGGGSGGGGSGGGGS
13,658
MLVCB_P08361_3mutA





GGSGSS
13,659
BAEVM_P10272_3mutA





GGGEAAAKGSS
13,660
WMSV_P03359_3mutA





EAAAKGGSPAP
13,661
WMSV_P03359_3mut





GSSPAPEAAAK
13,662
MLVMS_P03355_3mut





GGSGGSGGSGGS
13,663
MLVMS_P03355_PLV919





GSSPAPEAAAK
13,664
WMSV_P03359_3mut





GSSGSSGSSGSS
13,665
PERV_Q4VFZ2





GGSGSSEAAAK
13,666
WMSV_P03359_3mutA





GGSGGG
13,667
MLVFF_P26809_3mut





GGSPAPGGG
13,668
MLVFF_P26809_3mut





GGSGGSGGS
13,669
BAEVM_P10272_3mutA





GGGGSSEAAAK
13,670
MLVBM_Q7SVK7_3mut





GGSPAPGSS
13,671
MLVMS_P03355_3mut





EAAAKPAPGSS
13,672
AVIRE_P03360_3mut





GGGGSSGGS
13,673
FLV_P10273_3mutA





GGSPAPEAAAK
13,674
PERV_Q4VFZ2_3mut





GGSEAAAK
13,675
MLVMS_P03355_3mutA_WS





GSSGSSGSSGSS
13,676
MLVCB_P08361_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAK
13,677
MLVMS_P03355_PLV919





GGGGG
13,678
PERV_Q4VFZ2_3mut





GGSEAAAKGSS
13,679
MLVCB_P08361_3mutA





GSSGGG
13,680
MLVBM_Q7SVK7_3mutA_WS





PAPGSSGGG
13,681
KORV_Q9TTC1-Pro_3mutA





GGSGGS
13,682
BAEVM_P10272_3mut





EAAAKGGGGGS
13,683
MLVBM_Q7SVK7_3mutA_WS





GGSGSSPAP
13,684
MLVCB_P08361_3mut





PAPGSSGGG
13,685
KORV_Q9TTC1





PAPGGSGGG
13,686
MLVMS_P03355_3mut





GGGG
13,687
WMSV_P03359_3mutA





EAAAKGGSPAP
13,688
MLVCB_P08361_3mut





GSSGSS
13,689
FLV_P10273_3mutA





GGSEAAAKPAP
13,690
SFV3L_P27401_2mut





EAAAKGSSGGS
13,691
MLVAV_P03356_3mutA





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,692
MLVAV_P03356_3mutA





EAAAKGGSGSS
13,693
PERV_Q4VFZ2_3mutA_WS





GGGGG
13,694
MLVCB_P08361_3mut





GGGEAAAK
13,695
BAEVM_P10272_3mut





GGSGGSGGSGGS
13,696
MLVCB_P08361_3mut





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,697
PERV_Q4VFZ2





PAPAPAPAPAP
13,698
MLVMS_P03355_3mutA_WS





EAAAKEAAAK
13,699
XMRV6_A1Z651_3mut





GSSGGSEAAAK
13,700
PERV_Q4VFZ2_3mutA_WS





PAPGGSEAAAK
13,701
KORV_Q9TTC1-Pro_3mutA





EAAAKGGGPAP
13,702
MLVBM_Q7SVK7_3mutA_WS





PAPGGSGSS
13,703
PERV_Q4VFZ2





SGSETPGTSESATPES
13,704
MLVMS_P03355_3mut





GGSGGS
13,705
MLVMS_P03355_PLV919





EAAAKGGS
13,706
FLV_P10273_3mut





GGSPAPGSS
13,707
MLVMS_P03355_3mutA_WS





EAAAKEAAAKEAAAKEAAAK
13,708
FFV_O93209_2mut





GSSGGSGGG
13,709
MLVMS_P03355_3mutA_WS





PAPGSSEAAAK
13,710
WMSV_P03359_3mut





PAPAPAPAPAPAP
13,711
KORV_Q9TTC1_3mutA





GGGGSS
13,712
BAEVM_P10272_3mut





GGGGSEAAAKGGGGS
13,713
AVIRE_P03360_3mut





GSSPAPEAAAK
13,714
KORV_Q9TTC1-Pro_3mutA





PAPEAAAKGGG
13,715
MLVBM_Q7SVK7_3mut





EAAAKEAAAK
13,716
WMSV_P03359_3mut





EAAAK
13,717
SFV3L_P27401-Pro_2mutA





GSSGGSGGG
13,718
XMRV6_A1Z651_3mutA





GGGEAAAKPAP
13,719
WMSV_P03359_3mutA





GGSGGS
13,720
MLVFF_P26809_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,721
FOAMV_P14350_2mutA





GGGGG
13,722
MLVAV_P03356_3mutA





GSSGGSEAAAK
13,723
BAEVM_P10272_3mut





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
13,724
SFV1_P23074





GGSGGGPAP
13,725
MLVCB_P08361_3mut





GGSGSS
13,726
PERV_Q4VFZ2_3mut





SGSETPGTSESATPES
13,727
MLVFF_P26809_3mut





EAAAKGGSPAP
13,728
MLVMS_P03355_3mut





PAPAP
13,729
PERV_Q4VFZ2_3mut





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,730
MLVBM_Q7SVK7_3mut





GGGGGS
13,731
BAEVM_P10272_3mutA





EAAAKEAAAK
13,732
AVIRE_P03360_3mut





GSSGGSEAAAK
13,733
PERV_Q4VFZ2_3mut





GGGEAAAK
13,734
WMSV_P03359_3mut





GSSGGGEAAAK
13,735
AVIRE_P03360_3mutA





GGG

XMRV6_A1Z651_3mut





GGGGSEAAAKGGGGS
13,737
BAEVM_P10272_3mut





GGGG
13,738
MLVMS_P03355_3mut





GGSGGS
13,739
MLVMS_P03355_3mutA_WS





GGSGGGGSS
13,740
MLVBM_Q7SVK7_3mutA_WS





GSSPAPGGS
13,741
PERV_Q4VFZ2_3mut





GSSPAPEAAAK
13,742
PERV_Q4VFZ2_3mutA_WS





EAAAKGGS
13,743
WMSV_P03359_3mut





GGSGGSGGSGGS
13,744
PERV_Q4VFZ2_3mut





GGGGSSEAAAK
13,745
KORV_Q9TTC1-Pro_3mut





PAPAPAPAPAPAP
13,746
MLVAV_P03356_3mut





EAAAKGSSGGG
13,747
MLVMS_P03355_PLV919





GGGGG
13,748
MLVBM_Q7SVK7_3mutA_WS





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,749
FFV_O93209_2mutA





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
13,750
KORV_Q9TTC1-Pro_3mut





GGSPAPGGG
13,751
MLVMS_P03355_3mutA_WS





GGGEAAAKGGS
13,752
MLVMS_P03355_3mut





GGGEAAAK
13,753
PERV_Q4VFZ2_3mut





PAPEAAAKGGG
13,754
MLVMS_P03355_3mut





GSSGSSGSSGSSGSSGSS
13,755
BAEVM_P10272_3mutA





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,756
GALV_P21414_3mutA





EAAAKGGSPAP
13,757
FFV_O93209-Pro





EAAAKEAAAK
13,758
MLVFF_P26809_3mut





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
13,759
PERV_Q4VFZ2_3mutA_WS





GGSGGSGGSGGS
13,760
MLVAV_P03356_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAK
13,761
SFV3L_P27401_2mutA





GSSGSSGSSGSSGSSGSS
13,762
BAEVM_P10272_3mut





GGGGS
13,763
MLVMS_P03355_PLV919





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
13,764
SFV1_P23074





GGGGSGGGGS
13,765
KORV_Q9TTC1-Pro_3mutA





GGGGSGGGGS
13,766
MLVMS_P03355_3mut





GGSGSS
13,767
KORV_Q9TTC1_3mutA





GSSPAPGGG
13,768
PERV_Q4VFZ2_3mut





GSSGGSPAP
13,769
PERV_Q4VFZ2_3mutA_WS





PAPGGS
13,770
PERV_Q4VFZ2_3mutA_WS





GGSPAPEAAAK
13,771
FOAMV_P14350_2mutA





GGGPAPGGS
13,772
SFV3L_P27401_2mut





PAPGSSGGG
13,773
MLVCB_P08361_3mut





GSSGGGEAAAK
13,774
AVIRE_P03360_3mut





GSSGGG
13,775
XMRV6_A1Z651_3mut





GSSGSS
13,776
PERV_Q4VFZ2_3mut





GSSGGG
13,777
MLVAV_P03356_3mutA





PAPGGGGGS
13,778
PERV_Q4VFZ2_3mut





GSSEAAAK
13,779
MLVMS_P03355_3mut





PAPGGG
13,780
FLV_P10273_3mutA





GGGGSGGGGS
13,781
PERV_Q4VFZ2_3mut





GSSGGS
13,782
MLVMS_P03355_PLV919





GGGGSGGGGS
13,783
SFV3L_P27401_2mut





EAAAKGGSGSS
13,784
FLV_P10273_3mutA





GSSEAAAKGGS
13,785
MLVMS_P03355_3mutA_WS





PAPGSSEAAAK
13,786
SFV3L_P27401_2mutA





GGGGSGGGGS
13,787
SFV3L_P27401-Pro_2mutA





PAPGSSEAAAK
13,788
PERV_Q4VFZ2_3mut





PAPGSSEAAAK
13,789
PERV_Q4VFZ2





GGSPAPGGG
13,790
AVIRE_P03360_3mut





GGGGGS
13,791
PERV_Q4VFZ2_3mutA_WS





GGGGSSGGS
13,792
PERV_Q4VFZ2_3mut





PAPAPAPAP
13,793
AVIRE_P03360_3mutA





GGSGGS
13,794
WMSV_P03359_3mutA





GGGPAPGGS
13,795
PERV_Q4VFZ2_3mut





GGSGGSGGSGGSGGS
13,796
MLVMS_P03355_PLV919





GGSGGG
13,797
PERV_Q4VFZ2_3mut





EAAAKEAAAK
13,798
SFV3L_P27401_2mut





PAPGSS
13,799
XMRV6_A1Z651_3mut





GSSEAAAK
13,800
MLVFF_P26809_3mut





GGSPAPGGG
13,801
MLVMS_P03355_3mut





EAAAKGGG
13,802
WMSV_P03359_3mutA





GSSEAAAKGGS
13,803
PERV_Q4VFZ2_3mutA_WS





GSSGGSPAP
13,804
FFV_O93209





GGGGGS
13,805
KORV_Q9TTC1-Pro_3mut





GSSGGG
13,806
MLVCB_P08361_3mut





GSSGSS
13,807
MLVCB_P08361_3mutA





GGSEAAAKPAP
13,808
BAEVM_P10272_3mut





EAAAKGGGGSS
13,809
MLVCB_P08361_3mut





EAAAKPAPGGS
13,810
KORV_Q9TTC1-Pro_3mutA





GSSGSSGSSGSSGSS
13,811
MLVAV_P03356_3mutA





GGGGSEAAAKGGGGS
13,812
PERV_Q4VFZ2_3mutA_WS





GGSGSS
13,813
KORV_Q9TTC1-Pro_3mut





GSS

SFV3L_P27401-Pro_2mutA





PAPAP
13,815
BAEVM_P10272_3mut





EAAAKPAP
13,816
BAEVM_P10272





EAAAKEAAAKEAAAKEAAAKEAAAK
13,817
KORV_Q9TTC1-Pro_3mut





GGGGGGG
13,818
PERV_Q4VFZ2_3mutA_WS





GGGGS
13,819
MLVMS_P03355_3mut





GSSGGG
13,820
FLV_P10273_3mutA





PAPAPAPAPAP
13,821
FLV_P10273_3mut





EAAAKEAAAKEAAAK
13,822
WMSV_P03359_3mutA





GSSGGS
13,823
MLVBM_Q7SVK7_3mutA_WS





EAAAKPAPGGG
13,824
MLVMS_P03355_3mut





GSSPAPGGS
13,825
WMSV_P03359_3mut





PAPGSSGGG
13,826
PERV_Q4VFZ2_3mutA_WS





GSSGGG
13,827
AVIRE_P03360_3mutA





PAPGGSGSS
13,828
MLVFF_P26809_3mut





PAPGSS
13,829
PERV_Q4VFZ2_3mut





GGGGGSGSS
13,830
WMSV_P03359_3mutA





EAAAKGGGGSS
13,831
MLVBM_Q7SVK7_3mutA_WS





GGGGGGG
13,832
BAEVM_P10272_3mut





PAPEAAAKGSS
13,833
MLVMS_P03355_3mut





GGSGGGEAAAK
13,834
MLVMS_P03355_PLV919





EAAAKGGGGGS
13,835
MLVCB_P08361_3mut





PAPGGS
13,836
KORV_Q9TTC1-Pro_3mut





GGGG
13,837
FLV_P10273_3mutA





EAAAKGGSGSS
13,838
MLVBM_Q7SVK7_3mutA_WS





GGGGSSGGS
13,839
MLVMS_P03355_3mutA_WS





GGGGGGGG
13,840
WMSV_P03359_3mut





GGSGSSGGG
13,841
MLVMS_P03355_PLV919





GSSEAAAKGGS
13,842
KORV_Q9TTC1-Pro_3mutA





EAAAKPAPGSS
13,843
MLVCB_P08361_3mut





GGSPAPGSS
13,844
KORV_Q9TTC1_3mutA





PAPGSSGGG
13,845
BAEVM_P10272_3mut





EAAAKPAPGSS
13,846
WMSV_P03359_3mut





GGSPAPEAAAK
13,847
XMRV6_A1Z651_3mutA





GSSPAP
13,848
FLV_P10273_3mutA





GSS

BAEVM_P10272_3mutA





EAAAKPAPGGS
13,850
FLV_P10273_3mutA





GGSGSSPAP
13,851
FLV_P10273_3mutA





PAPGSSGGS
13,852
MLVMS_P03355_3mut





GSAGSAAGSGEF
13,853
PERV_Q4VFZ2_3mutA_WS





GSSGGSEAAAK
13,854
KORV_Q9TTC1_3mutA





GSSGGS
13,855
MLVMS_P03355_3mutA_WS





EAAAKGGGGSEAAAK
13,856
SFV3L_P27401_2mut





GSSGGS
13,857
PERV_Q4VFZ2_3mutA_WS





GGSPAPEAAAK
13,858
FLV_P10273_3mut





GGSEAAAKGSS
13,859
PERV_Q4VFZ2_3mutA_WS





GSSPAPEAAAK
13,860
PERV_Q4VFZ2_3mutA_WS





GGSGSSGGG
13,861
PERV_Q4VFZ2_3mut





GGGG
13,862
AVIRE_P03360_3mutA





GGSEAAAKPAP
13,863
WMSV_P03359_3mut





GSSGGSPAP
13,864
MLVAV_P03356_3mutA





GSSGGSEAAAK
13,865
MLVMS_P03355_3mut





PAPEAAAKGGS
13,866
KORV_Q9TTC1-Pro_3mut





GGSPAP
13,867
PERV_Q4VFZ2_3mutA_WS





GGSEAAAK
13,868
MLVAV_P03356_3mutA





EAAAKGGGGSEAAAK
13,869
KORV_Q9TTC1-Pro_3mut





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
13,870
MLVMS_P03355_PLV919





GSSEAAAK
13,871
KORV_Q9TTC1_3mutA





GGG

AVIRE_P03360





GGSEAAAKGSS
13,873
MLVBM_Q7SVK7_3mut





GGSEAAAKGSS
13,874
MLVMS_P03355_3mut





GGSPAPEAAAK
13,875
MLVCB_P08361_3mut





GGSGGGEAAAK
13,876
MLVCB_P08361_3mut





GGSEAAAKPAP
13,877
MLVMS_P03355_3mutA_WS





EAAAKGGSGSS
13,878
KORV_Q9TTC1-Pro_3mut





GGGEAAAKGGS
13,879
MLVCB_P08361_3mut





EAAAKGGGGSEAAAK
13,880
FLV_P10273_3mutA





GGSPAP
13,881
MLVFF_P26809_3mut





GGGGSSGGS
13,882
XMRV6_A1Z651_3mutA





PAP

MLVCB_P08361_3mut





GGS

SFV3L_P27401-Pro_2mutA





GGGGSGGGGS
13,885
MLVMS_P03355_3mut





GGGEAAAKGGS
13,886
MLVAV_P03356_3mutA





GSSGSSGSSGSSGSSGSS
13,887
MLVMS_P03355_PLV919





PAPGSS
13,888
MLVCB_P08361_3mut





GGSGGSGGS
13,889
MLVMS_P03355_PLV919





PAPGGSGGG
13,890
FLV_P10273_3mutA





GGGGSGGGGSGGGGS
13,891
FLV_P10273_3mut





GGSGSSGGG
13,892
KORV_Q9TTC1-Pro_3mutA





GGSGGSGGS
13,893
GALV_P21414_3mutA





GGGEAAAKGGS
13,894
WMSV_P03359_3mut





SGSETPGTSESATPES
13,895
KORV_Q9TTC1_3mutA





EAAAKGGGGGS
13,896
KORV_Q9TTC1-Pro_3mut





EAAAKGSSPAP
13,897
BAEVM_P10272_3mut





GGGG
13,898
MLVCB_P08361_3mut





GGGGSGGGGSGGGGSGGGGSGGGGS
13,899
MLVBM_Q7SVK7_3mut





GSSGGSGGG
13,900
MLVMS_P03355_PLV919





GGSGSS
13,901
MLVFF_P26809_3mut





EAAAKGGS
13,902
AVIRE_P03360_3mutA





GSSEAAAKGGS
13,903
MLVBM_Q7SVK7_3mutA_WS





EAAAKPAPGGG
13,904
WMSV_P03359_3mut





PAPGSSGGG
13,905
MLVCB_P08361_3mutA





GGGGSSEAAAK
13,906
KORV_Q9TTC1-Pro_3mutA





GSSEAAAKPAP
13,907
BAEVM_P10272_3mutA





PAPGGGEAAAK
13,908
MLVBM_Q7SVK7_3mutA_WS





GGSGGGEAAAK
13,909
MLVCB_P08361_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
13,910
FFV_O93209





EAAAKGGGGGS
13,911
GALV_P21414_3mutA





GGSPAPGGG
13,912
MLVMS_P03355_3mut





GSSGSSGSS
13,913
FLV_P10273_3mutA





EAAAK
13,914
MLVBM_Q7SVK7_3mut





GGGGSSGGS
13,915
MLVMS_P03355_3mut





GGSGSSPAP
13,916
PERV_Q4VFZ2_3mut





EAAAKEAAAKEAAAKEAAAK
13,917
BAEVM_P10272_3mut





GGGPAPGSS
13,918
MLVMS_P03355_3mut





GSSPAPGGS
13,919
PERV_Q4VFZ2_3mutA_WS





PAPAP
13,920
FLV_P10273_3mutA





PAPAPAPAP
13,921
PERV_Q4VFZ2_3mut





GGGGGSEAAAK
13,922
GALV_P21414_3mutA





GGGGGSGSS
13,923
BAEVM_P10272_3mutA





GGGEAAAKGSS
13,924
KORV_Q9TTC1_3mutA





GGGGGSPAP
13,925
AVIRE_P03360_3mut





GGGGGSEAAAK
13,926
SFV3L_P27401_2mutA





GGS

KORV_Q9TTC1_3mutA





GGGGGGG
13,928
PERV_Q4VFZ2_3mut





SGSETPGTSESATPES
13,929
SFV3L_P27401_2mutA





EAAAKGGSGGG
13,930
MLVMS_P03355_3mut





GGGGS
13,931
MLVFF_P26809_3mut





EAAAKGSSGGG
13,932
BAEVM_P10272_3mut





EAAAKPAPGGS
13,933
MLVF5_P26810_3mutA





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
13,934
SFV3L_P27401_2mutA





GGSPAPGGG
13,935
WMSV_P03359_3mutA





GSAGSAAGSGEF
13,936
MLVFF_P26809_3mut





GGGGSSGGS
13,937
MLVMS_P03355_3mutA_WS





GGGGGGG
13,938
MLVCB_P08361_3mut





GSSEAAAK
13,939
WMSV_P03359_3mut





PAPGSS
13,940
FLV_P10273_3mutA





GSSGGG
13,941
PERV_Q4VFZ2_3mutA_WS





PAPGGG
13,942
MLVFF_P26809_3mut





GGGGGSPAP
13,943
MLVMS_P03355_3mut





GGSEAAAK
13,944
XMRV6_A1Z651_3mut





GSSGGG
13,945
PERV_Q4VFZ2_3mut





GGSGGSGGSGGS
13,946
MLVMS_P03355_3mut





PAPAP
13,947
AVIRE_P03360_3mut





GGSEAAAK
13,948
PERV_Q4VFZ2_3mut





GGGGS
13,949
MLVMS_P03355_PLV919





GGGG
13,950
BAEVM_P10272_3mutA





EAAAKGGGGSS
13,951
MLVCB_P08361_3mutA





EAAAKEAAAKEAAAK
13,952
GALV_P21414_3mutA





PAPGGGEAAAK
13,953
KORV_Q9TTC1





EAAAKGGSPAP
13,954
MLVMS_P03355_3mut





GGSGSSEAAAK
13,955
MLVMS_P03355_3mut





GGSPAPEAAAK
13,956
FLV_P10273_3mutA





GGGGGGG
13,957
PERV_Q4VFZ2_3mut





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
13,958
SFV1_P23074_2mutA





EAAAKGSSGGS
13,959
MLVMS_P03355_3mut





GSSEAAAKPAP
13,960
MLVFF_P26809_3mut





GGGGSS
13,961
FLV_P10273_3mutA





EAAAKGGSGGG
13,962
AVIRE_P03360_3mutA





GGSGGS
13,963
PERV_Q4VFZ2_3mutA_WS





GGGGGSPAP
13,964
AVIRE_P03360_3mutA





EAAAKEAAAKEAAAK
13,965
XMRV6_A1Z651_3mut





PAPEAAAKGGS
13,966
FLV_P10273_3mutA





GSSGGSEAAAK
13,967
MLVCB_P08361_3mut





EAAAKGGSGGG
13,968
MLVMS_P03355





GGSGGGPAP
13,969
MLVMS_P03355_3mut





GGS

XMRV6_A1Z651_3mut





GGSEAAAKPAP
13,971
MLVFF_P26809_3mut





EAAAKGGG
13,972
MLVMS_P03355_PLV919





GSSGSSGSSGSS
13,973
WMSV_P03359_3mut





GGSGSSPAP
13,974
PERV_Q4VFZ2_3mut





GGGEAAAK
13,975
MLVMS_P03355_3mutA_WS





GSSPAPGGS
13,976
KORV_Q9TTC1-Pro_3mutA





GSSEAAAKGGG
13,977
SFV3L_P27401_2mut





EAAAKPAPGGS
13,978
MLVCB_P08361_3mut





GGSGGGEAAAK
13,979
PERV_Q4VFZ2





GGSGSS
13,980
MLVCB_P08361_3mut





GGSGGGEAAAK
13,981
MLVBM_Q7SVK7_3mutA_WS





GGSGGSGGSGGSGGSGGS
13,982
FLV_P10273_3mut





PAPEAAAKGSS
13,983
MLVMS_P03355_3mut





EAAAKGSSGGS
13,984
WMSV_P03359_3mutA





GGSGSSEAAAK
13,985
MLVCB_P08361_3mut





GGSGSSEAAAK
13,986
KORV_Q9TTC1_3mutA





GSSGGSGGG
13,987
MLVMS_P03355_PLV919





EAAAKGGSGGG
13,988
SFV3L_P27401-Pro_2mutA





GGSGGS
13,989
AVIRE_P03360_3mutA





GSAGSAAGSGEF
13,990
MLVMS_P03355_PLV919





GGSGSS
13,991
GALV_P21414_3mutA





GGGG
13,992
MLVFF_P26809_3mutA





GGGGSGGGGSGGGGSGGGGS
13,993
WMSV_P03359_3mut





SGSETPGTSESATPES
13,994
BAEVM_P10272_3mut





EAAAKEAAAKEAAAKEAAAK
13,995
FOAMV_P14350_2mutA





GGGEAAAKGGS
13,996
FLV_P10273_3mutA





GSSGGSEAAAK
13,997
MLVFF_P26809_3mut





EAAAKGGGGSS
13,998
MLVAV_P03356_3mut





PAPGGSEAAAK
13,999
KORV_Q9TTC1-Pro_3mut





EAAAK
14,000
XMRV6_A1Z651_3mut





GSSGSSGSSGSSGSSGSS
14,001
PERV_Q4VFZ2_3mut





GGGG
14,002
MLVCB_P08361_3mutA





GSSGSS
14,003
WMSV_P03359_3mutA





GSSGGSPAP
14,004
AVIRE_P03360_3mut





GGSGGSGGS
14,005
MLVCB_P08361_3mut





EAAAKGGGPAP
14,006
FLV_P10273_3mutA





GGGGSGGGGS
14,007
MLVCB_P08361_3mut





GGSEAAAKGSS
14,008
PERV_Q4VFZ2_3mutA_WS





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
14,009
SFV3L_P27401_2mutA





GGSGSSEAAAK
14,010
PERV_Q4VFZ2_3mutA_WS





EAAAKEAAAKEAAAKEAAAK
14,011
SFV3L_P27401-Pro_2mutA





GSSEAAAKGGS
14,012
FLV_P10273_3mutA





GGSGSS
14,013
PERV_Q4VFZ2





GGSGSSEAAAK
14,014
SFV3L_P27401-Pro_2mutA





GSSGSSGSS
14,015
XMRV6_A1Z651_3mutA





EAAAKGSSPAP
14,016
KORV_Q9TTC1_3mutA





EAAAKPAP
14,017
FLV_P10273_3mutA





GGSGSSEAAAK
14,018
KORV_Q9TTC1-Pro_3mut





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
14,019
KORV_Q9TTC1_3mutA





GGGGSGGGGSGGGGS
14,020
KORV_Q9TTC1-Pro_3mutA





GGGGGGG
14,021
FLV_P10273_3mut





EAAAKGSS
14,022
WMSV_P03359_3mut





EAAAKGGGPAP
14,023
MLVCB_P08361_3mut





GSSGSS
14,024
MLVBM_Q7SVK7_3mutA_WS





EAAAKGGGGGS
14,025
MLVFF_P26809_3mut





GGSGGGEAAAK
14,026
FLV_P10273_3mutA





PAPGSS
14,027
MLVFF_P26809_3mutA





PAPGSS
14,028
BAEVM_P10272_3mutA





GGSPAPGSS
14,029
AVIRE_P03360_3mut





GGGGSSEAAAK
14,030
MLVMS_P03355_3mut





GSSGGGGGS
14,031
FFV_O93209-Pro





EAAAKGSSPAP
14,032
PERV_Q4VFZ2_3mut





GSSPAPGGS
14,033
PERV_Q4VFZ2_3mut





GGGGGG
14,034
BAEVM_P10272_3mut





EAAAKGGGGSS
14,035
PERV_Q4VFZ2_3mutA_WS





PAPGGSEAAAK
14,036
KORV_Q9TTC1_3mutA





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
14,037
MLVMS_P03355_3mutA_WS





GSSGSSGSSGSS
14,038
MLVMS_P03355_3mut





EAAAKGSSGGG
14,039
MLVMS_P03355_PLV919





GGSEAAAKPAP
14,040
AVIRE_P03360_3mutA





GSSGSSGSSGSSGSS
14,041
WMSV_P03359_3mutA





GGGEAAAKPAP
14,042
FLV_P10273_3mutA





PAPGSSGGG
14,043
KORV_Q9TTC1_3mutA





GSSGSS
14,044
MLVMS_P03355_3mutA_WS





PAPEAAAK
14,045
BAEVM_P10272_3mut





GGGPAPGSS
14,046
PERV_Q4VFZ2





GSSGGSPAP
14,047
MLVFF_P26809_3mut





GGGGSS
14,048
SFV3L_P27401_2mut





PAPEAAAKGSS
14,049
SFV3L_P27401_2mut





GGSGGGPAP
14,050
XMRV6_A1Z651_3mutA





PAPGGS
14,051
BAEVM_P10272_3mutA





EAAAKGGGGGS
14,052
AVIRE_P03360_3mut





GSSGGSPAP
14,053
KORV_Q9TTC1-Pro_3mutA





GSSGGGGGS
14,054
WMSV_P03359_3mut





GGGEAAAKGGS
14,055
AVIRE_P03360_3mut





GGGEAAAKGSS
14,056
BAEVM_P10272_3mut





PAPEAAAKGSS
14,057
MLVAV_P03356_3mutA





GSSGSSGSSGSSGSS
14,058
MLVCB_P08361_3mut





GGSPAPGSS
14,059
FLV_P10273_3mutA





EAAAKGSSPAP
14,060
BAEVM_P10272_3mutA





GGSGGSGGSGGSGGSGGS
14,061
PERV_Q4VFZ2





GGGGSSEAAAK
14,062
FLV_P10273_3mutA





GGGGSSPAP
14,063
FFV_O93209





GSSGGSPAP
14,064
MLVMS_P03355_3mut





GGGPAPGSS
14,065
MLVMS_P03355_PLV919





PAPGSSGGS
14,066
PERV_Q4VFZ2_3mut





GGGGGSPAP
14,067
MLVFF_P26809_3mut





SGSETPGTSESATPES
14,068
MLVMS_P03355_3mutA_WS





GSSGSSGSSGSSGSS
14,069
KORV_Q9TTC1_3mutA





GSSPAPGGG
14,070
WMSV_P03359_3mut





PAPAPAPAPAPAP
14,071
SFV3L_P27401_2mutA





GGGPAPGGS
14,072
MLVMS_P03355_3mut





PAPGGSEAAAK
14,073
WMSV_P03359_3mut





GGGGSSEAAAK
14,074
FFV_O93209-Pro





GGSPAPGGG
14,075
FLV_P10273_3mutA





GSSPAPEAAAK
14,076
AVIRE_P03360_3mut





GGGEAAAK
14,077
FLV_P10273_3mutA





PAPEAAAKGGG
14,078
MLVCB_P08361_3mut





GGSPAPGGG
14,079
MLVCB_P08361_3mut





GGSGGGGSS
14,080
BAEVM_P10272_3mutA





GSSPAPEAAAK
14,081
MLVCB_P08361_3mut





GGSPAPGGG
14,082
KORV_Q9TTC1-Pro_3mutA





PAPGGSGSS
14,083
KORV_Q9TTC1_3mutA





GSSPAP
14,084
KORV_Q9TTC1-Pro_3mutA





SGSETPGTSESATPES
14,085
MLVMS_P03355





GSSGSSGSS
14,086
MLVAV_P03356_3mutA





PAPGSSGGS
14,087
PERV_Q4VFZ2_3mutA_WS





PAPGGS
14,088
KORV_Q9TTC1-Pro_3mutA





PAPEAAAKGGG
14,089
SFV3L_P27401-Pro_2mutA





GGSGGSGGS
14,090
BAEVM_P10272_3mut





PAPGGS
14,091
MLVFF_P26809_3mut





GSSGGSPAP
14,092
MLVMS_P03355_PLV919





GSSGGGGGS
14,093
FLV_P10273_3mutA





GGGGGSPAP
14,094
KORV_Q9TTC1-Pro_3mut





EAAAKPAPGSS
14,095
SFV3L_P27401-Pro_2mutA





EAAAKGGSPAP
14,096
KORV_Q9TTC1-Pro





GGGPAPEAAAK
14,097
MLVMS_P03355_PLV919





GGSEAAAKGSS
14,098
MLVMS_P03355





PAPEAAAKGSS
14,099
KORV_Q9TTC1_3mutA





PAPEAAAKGGS
14,100
WMSV_P03359_3mutA





GSSGGG
14,101
PERV_Q4VFZ2_3mutA_WS





EAAAKGGGGSS
14,102
MLVMS_P03355_PLV919





EAAAKGGSPAP
14,103
AVIRE_P03360_3mutA





GGGGSSGGS
14,104
MLVMS_P03355_PLV919





PAPEAAAKGSS
14,105
PERV_Q4VFZ2_3mutA_WS





EAAAKGGGGGS
14,106
BAEVM_P10272_3mut





GSSGGGGGS
14,107
MLVMS_P03355_3mut





PAPAPAPAP
14,108
KORV_Q9TTC1_3mutA





GGSGGSGGSGGS
14,109
MLVAV_P03356_3mut





PAPAPAPAP
14,110
SFV3L_P27401_2mut





GSSEAAAKPAP
14,111
MLVMS_P03355_3mut





GGSGGGEAAAK
14,112
SFV3L_P27401_2mutA





GSSGGSGGG
14,113
MLVMS_P03355_3mutA_WS





GGGGGSPAP
14,114
MLVCB_P08361_3mutA





GGGEAAAKGSS
14,115
XMRV6_A1Z651_3mutA





GGGGSSPAP
14,116
BAEVM_P10272_3mut





GGSGGG
14,117
PERV_Q4VFZ2_3mut





GGGGSS
14,118
MLVBM_Q7SVK7_3mutA_WS





EAAAKGSSGGS
14,119
PERV_Q4VFZ2_3mutA_WS





GSSGGGGGS
14,120
PERV_Q4VFZ2





EAAAKGSSGGS
14,121
PERV_Q4VFZ2_3mut





EAAAKEAAAK
14,122
MLVAV_P03356_3mut





GSSGGGEAAAK
14,123
MLVAV_P03356_3mut





GSSPAPGGG
14,124
XMRV6_A1Z651_3mut





GGGGSGGGGSGGGGS
14,125
PERV_Q4VFZ2_3mut





EAAAKEAAAKEAAAKEAAAK
14,126
KORV_Q9TTC1_3mutA





EAAAKGGSGSS
14,127
MLVBM_Q7SVK7_3mut





PAPEAAAK
14,128
BLVJ_P03361





GSSGGG
14,129
FFV_O93209-Pro





GGSGGGEAAAK
14,130
KORV_Q9TTC1-Pro_3mutA





EAAAK
14,131
FLV_P10273_3mutA





GGGGSSPAP
14,132
MLVMS_P03355_3mut





GSS

SFV3L_P27401-Pro_2mut





PAPEAAAKGSS
14,134
BAEVM_P10272_3mut





GGGGGSPAP
14,135
PERV_Q4VFZ2_3mut





GSSGSSGSS
14,136
BAEVM_P10272_3mutA





GGGGSGGGGSGGGGSGGGGS
14,137
SFV1_P23074_2mut





GGGGSSEAAAK
14,138
SFV3L_P27401_2mutA





GGGGSGGGGSGGGGSGGGGS
14,139
FOAMV_P14350-Pro_2mut





PAPGSSEAAAK
14,140
MLVBM_Q7SVK7_3mutA_WS





GGGGGSGSS
14,141
MLVFF_P26809_3mutA





GGSEAAAKGGG
14,142
MLVBM_Q7SVK7_3mut





PAPGSSGGG
14,143
PERV_Q4VFZ2





GGS

PERV_Q4VFZ2_3mutA_WS





EAAAKGGSGSS
14,145
FLV_P10273_3mut





GGGEAAAK
14,146
WMSV_P03359_3mutA





GGSEAAAKPAP
14,147
MLVBM_Q7SVK7_3mut





SGSETPGTSESATPES
14,148
FOAMV_P14350-Pro_2mutA





EAAAKPAPGGS
14,149
AVIRE_P03360_3mut





EAAAKGGGGGS
14,150
KORV_Q9TTC1-Pro_3mutA





GGGGS
14,151
PERV_Q4VFZ2_3mut





GGSEAAAKGSS
14,152
MLVFF_P26809_3mutA





GGSEAAAKGGG
14,153
AVIRE_P03360





GGSGGSGGSGGSGGSGGS
14,154
SFV3L_P27401_2mut





GGSEAAAKGSS
14,155
SFV3L_P27401-Pro_2mutA





GGGEAAAKPAP
14,156
MLVCB_P08361_3mut





GGSEAAAK
14,157
MLVMS_P03355_PLV919





GGSPAPGSS
14,158
KORV_Q9TTC1-Pro_3mutA





GSSPAPEAAAK
14,159
WMSV_P03359_3mutA





GGSGSS
14,160
KORV_Q9TTC1-Pro_3mutA





PAPGGGGGS
14,161
AVIRE_P03360_3mut





PAPEAAAKGSS
14,162
FFV_O93209-Pro





GGSGGGEAAAK
14,163
WMSV_P03359_3mut





PAPGGG
14,164
MLVMS_P03355_3mut





EAAAKGGG
14,165
FLV_P10273_3mutA





GSSGSSGSSGSS
14,166
MLVCB_P08361_3mut





EAAAKGGSGGG
14,167
FFV_O93209





GSSPAPGGS
14,168
PERV_Q4VFZ2_3mutA_WS





GSSPAPGGS
14,169
MLVCB_P08361_3mut





GGGPAP
14,170
WMSV_P03359_3mutA





GGGPAP
14,171
KORV_Q9TTC1_3mutA





GGSPAPGSS
14,172
KORV_Q9TTC1-Pro_3mut





PAPAP
14,173
MLVMS_P03355_3mut





GGGGGGG
14,174
MLVMS_P03355_3mut





GGGGG
14,175
KORV_Q9TTC1-Pro_3mut





GSAGSAAGSGEF
14,176
FOAMV_P14350_2mutA





PAPAP
14,177
KORV_Q9TTC1-Pro_3mutA





GGSEAAAKGGG
14,178
SFV3L_P27401-Pro_2mutA





PAPAP
14,179
WMSV_P03359_3mut





GGGGSGGGGSGGGGS
14,180
SFV3L_P27401_2mut





PAPGGS
14,181
KORV_Q9TTC1_3mutA





GGGEAAAKPAP
14,182
FLV_P10273_3mut





GGGGGS
14,183
MLVAV_P03356_3mutA





GSSEAAAKGGG
14,184
WMSV_P03359_3mut





EAAAKGGGGSS
14,185
GALV_P21414_3mutA





GSSGGS
14,186
MLVAV_P03356_3mutA





GSSGGG
14,187
MLVBM_Q7SVK7_3mut





PAPAPAP
14,188
SFV3L_P27401-Pro_2mutA





GGGG
14,189
KORV_Q9TTC1_3mutA





EAAAKPAPGGS
14,190
MLVFF_P26809_3mut





GGGGSGGGGS
14,191
XMRV6_A1Z651_3mut





EAAAKGGG
14,192
MLVCB_P08361_3mut





GGGGSSPAP
14,193
KORV_Q9TTC1_3mutA





GSSEAAAKGGG
14,194
KORV_Q9TTC1-Pro_3mutA





GGGGG
14,195
BLVJ_P03361_2mutB





GGGEAAAKGSS
14,196
FFV_O93209-Pro





GSSGSSGSS
14,197
BAEVM_P10272_3mut





GSSGGSPAP
14,198
PERV_Q4VFZ2_3mut





EAAAKGGS
14,199
KORV_Q9TTC1_3mut





GGSPAPEAAAK
14,200
AVIRE_P03360_3mut





GGSEAAAK
14,201
WMSV_P03359_3mut





GSSGGS
14,202
KORV_Q9TTC1-Pro_3mutA





GGGPAPEAAAK
14,203
KORV_Q9TTC1_3mutA





PAPGSS
14,204
WMSV_P03359_3mutA





GGSEAAAKGSS
14,205
FLV_P10273_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAK
14,206
SFV3L_P27401





GSSEAAAKGGG
14,207
SFV3L_P27401-Pro_2mutA





GGGGSEAAAKGGGGS
14,208
KORV_Q9TTC1-Pro_3mutA





GGSGGSGGS
14,209
WMSV_P03359_3mut





GGGGGSGSS
14,210
KORV_Q9TTC1-Pro





GGGGSGGGGSGGGGSGGGGS
14,211
MLVMS_P03355_3mut





EAAAKGGG
14,212
PERV_Q4VFZ2





GGSEAAAKGGG
14,213
KORV_Q9TTC1-Pro_3mut





GSSGGSGGG
14,214
PERV_Q4VFZ2_3mutA_WS





GGGGGS
14,215
PERV_Q4VFZ2_3mut





GSAGSAAGSGEF
14,216
PERV_Q4VFZ2





PAPEAAAKGSS
14,217
BAEVM_P10272_3mutA





GSSPAPGGG
14,218
MLVCB_P08361_3mut





GGGGSSPAP
14,219
KORV_Q9TTC1-Pro_3mutA





PAPGGSGGG
14,220
MLVFF_P26809_3mut





GSSPAP
14,221
KORV_Q9TTC1_3mutA





PAPGSS
14,222
SFV3L_P27401-Pro_2mut





GGSGGGGSS
14,223
MLVMS_P03355_PLV919





GSSGGS
14,224
WMSV_P03359_3mutA





EAAAKGGGGGS
14,225
PERV_Q4VFZ2





GGGGG
14,226
KORV_Q9TTC1_3mutA





EAAAKGSS
14,227
MLVMS_P03355_PLV919





EAAAKEAAAKEAAAKEAAAKEAAAK
14,228
FLV_P10273_3mut





EAAAKEAAAKEAAAKEAAAK
14,229
SFV3L_P27401-Pro_2mut





GSAGSAAGSGEF
14,230
SFV3L_P27401_2mutA





GGGPAPGGS
14,231
FLV_P10273_3mutA





GGSEAAAKGGG
14,232
MLVCB_P08361_3mut





PAPGGGEAAAK
14,233
BAEVM_P10272_3mut





EAAAKPAPGSS
14,234
FOAMV_P14350_2mut





GGSEAAAK
14,235
KORV_Q9TTC1_3mutA





GGSGSS
14,236
AVIRE_P03360





GGSPAPEAAAK
14,237
MLVMS_P03355_PLV919





GGGGS
14,238
XMRV6_A1Z651_3mut





GGSPAPGGG
14,239
XMRV6_A1Z651_3mut





EAAAKPAPGGS
14,240
PERV_Q4VFZ2





GSSPAP
14,241
BAEVM_P10272_3mut





GGSGSSGGG
14,242
FLV_P10273_3mutA





PAPGGG
14,243
PERV_Q4VFZ2_3mutA_WS





GSSGGSEAAAK
14,244
MLVBM_Q7SVK7_3mut





GGSEAAAK
14,245
MLVMS_P03355_3mut





GGGPAPGGS
14,246
MLVFF_P26809_3mut





GSAGSAAGSGEF
14,247
MLVBM_Q7SVK7_3mutA_WS





EAAAKPAPGGS
14,248
SFVCP_Q87040





PAPGGG
14,249
PERV_Q4VFZ2_3mutA_WS





GSSPAPEAAAK
14,250
MLVBM_Q7SVK7





PAPEAAAK
14,251
MLVBM_Q7SVK7_3mut





PAPGGGGGS
14,252
AVIRE_P03360_3mutA





GGSEAAAKPAP
14,253
MLVBM_Q7SVK7_3mut





EAAAKGSS
14,254
WMSV_P03359_3mutA





GGGEAAAK
14,255
MLVFF_P26809_3mutA





EAAAKEAAAKEAAAK
14,256
MLVMS_P03355_3mut





PAPEAAAKGGG
14,257
BAEVM_P10272_3mut





PAPAPAP
14,258
MLVCB_P08361_3mut





EAAAKPAPGGS
14,259
BAEVM_P10272_3mut





GGGGSGGGGS
14,260
FLV_P10273_3mut





GGGGSEAAAKGGGGS
14,261
KORV_Q9TTC1_3mut





EAAAK
14,262
FLV_P10273_3mut





PAPAPAP
14,263
WMSV_P03359_3mut





GGGGSEAAAKGGGGS
14,264
FFV_O93209-Pro





GGSPAPEAAAK
14,265
MLVMS_P03355_3mut





GGSGSSGGG
14,266
XMRV6_A1Z651_3mut





GGSPAPGSS
14,267
PERV_Q4VFZ2_3mut





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
14,268
SFV3L_P27401-Pro_2mutA





EAAAKGGGPAP
14,269
BAEVM_P10272_3mutA





GSSGGSEAAAK
14,270
MLVMS_P03355_3mutA_WS





SGSETPGTSESATPES
14,271
PERV_Q4VFZ2_3mutA_WS





EAAAKEAAAKEAAAKEAAAKEAAAK
14,272
KORV_Q9TTC1-Pro_3mutA





GSSGSSGSS
14,273
KORV_Q9TTC1_3mutA





GSSPAPGGG
14,274
SFV3L_P27401-Pro_2mutA





GSSGGGEAAAK
14,275
KORV_Q9TTC1_3mutA





GGSGGGGSS
14,276
PERV_Q4VFZ2_3mutA_WS





GSSGGGEAAAK
14,277
MLVCB_P08361_3mut





GSSEAAAKGGG
14,278
MLVCB_P08361_3mut





GGSGGGGSS
14,279
KORV_Q9TTC1_3mutA





GGSGSSPAP
14,280
PERV_Q4VFZ2_3mutA_WS





GSSPAP
14,281
MLVMS_P03355_3mut





GGGGSSEAAAK
14,282
AVIRE_P03360





GGS

WMSV_P03359_3mut





EAAAKEAAAK
14,284
PERV_Q4VFZ2_3mut





PAPAPAPAP
14,285
MLVAV_P03356_3mut





GGSEAAAKGGG
14,286
KORV_Q9TTC1_3mutA





PAPGGG
14,287
MLVAV_P03356_3mut





EAAAKGSS
14,288
BAEVM_P10272_3mut





GGGGSGGGGS
14,289
WMSV_P03359_3mutA





GGSGGSGGS
14,290
SFV3L_P27401_2mut





EAAAK
14,291
MLVCB_P08361_3mut





GGGGSSGGS
14,292
WMSV_P03359_3mutA





GGGPAPEAAAK
14,293
MLVAV_P03356_3mutA





EAAAKEAAAKEAAAK
14,294
FFV_O93209





GSSEAAAKGGG
14,295
MLVBM_Q7SVK7_3mut





GGGPAPGGS
14,296
FLV_P10273_3mut





GGSEAAAKGGG
14,297
WMSV_P03359_3mut





EAAAKGGGGGS
14,298
XMRV6_A1Z651_3mutA





EAAAKGGSGGG
14,299
FLV_P10273_3mutA





GGSEAAAKGGG
14,300
SFV3L_P27401_2mutA





GGGGS
14,301
PERV_Q4VFZ2_3mutA_WS





GSSGGS
14,302
MLVMS_P03355_3mut





GSSGSS
14,303
MLVAV_P03356_3mutA





GGSPAPGGG
14,304
MLVBM_Q7SVK7_3mutA_WS





GSSGGGGGS
14,305
MLVF5_P26810_3mut





PAPAPAPAP
14,306
MLVCB_P08361_3mut





PAPAP
14,307
PERV_Q4VFZ2_3mutA_WS





PAPGSSGGS
14,308
KORV_Q9TTC1_3mut





PAPGSSGGG
14,309
PERV_Q4VFZ2_3mut





GGGEAAAK
14,310
MLVMS_P03355_PLV919





GGSGGSGGSGGSGGS
14,311
SFV3L_P27401-Pro_2mutA





GGSGGG
14,312
FLV_P10273_3mut





PAPEAAAKGGG
14,313
MLVFF_P26809_3mut





PAP

PERV_Q4VFZ2_3mutA_WS





PAPGGSGSS
14,315
FFV_O93209_2mut





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
14,316
FFV_O93209-Pro_2mut





GSSGSSGSSGSS
14,317
FFV_O93209-Pro





GSSGSSGSSGSSGSS
14,318
FLV_P10273_3mutA





GGGEAAAKPAP
14,319
PERV_Q4VFZ2





PAPGSSGGG
14,320
SFV3L_P27401_2mut





PAPGGSGSS
14,321
KORV_Q9TTC1-Pro_3mut





PAPAPAPAPAP
14,322
GALV_P21414_3mutA





GGSGGGEAAAK
14,323
PERV_Q4VFZ2_3mut





GSSPAP
14,324
MLVCB_P08361_3mut





EAAAKPAP
14,325
MLVF5_P26810_3mut





GGGGSGGGGSGGGGSGGGGS
14,326
MLVBM_Q7SVK7_3mut





GGSGGG
14,327
WMSV_P03359_3mut





GGSGGSGGS
14,328
KORV_Q9TTC1_3mut





GGGGGGGG
14,329
MLVFF_P26809_3mut





GGGGSS
14,330
MLVAV_P03356_3mut





GSSGGGGGS
14,331
SFV3L_P27401_2mut





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
14,332
GALV_P21414_3mutA





GSSGSSGSS
14,333
PERV_Q4VFZ2_3mut





GSSPAPGGS
14,334
MLVFF_P26809_3mut





PAPAPAP
14,335
AVIRE_P03360_3mutA





EAAAKEAAAKEAAAKEAAAK
14,336
WMSV_P03359_3mutA





PAPAPAPAP
14,337
SFV3L_P27401_2mutA





GGGGSS
14,338
MLVAV_P03356_3mutA





GSSGSSGSSGSSGSS
14,339
SFV3L_P27401_2mutA





PAPGGS
14,340
WMSV_P03359_3mutA





GSSEAAAKGGG
14,341
PERV_Q4VFZ2





GSSGGSPAP
14,342
MLVMS_P03355_PLV919





GSSGSSGSSGSSGSSGSS
14,343
SFV3L_P27401_2mutA





GGSGSSGGG
14,344
MLVCB_P08361_3mut





GGGPAPGSS
14,345
SFV3L_P27401-Pro_2mutA





GSSEAAAKGGS
14,346
WMSV_P03359_3mut





GSSEAAAKGGG
14,347
MLVAV_P03356_3mut





GGSGGGPAP
14,348
FFV_O93209-Pro





GSSGSS
14,349
PERV_Q4VFZ2_3mut





PAPGGGGGS
14,350
GALV_P21414_3mutA





EAAAKPAPGGS
14,351
MLVAV_P03356_3mut





GSSGSS
14,352
MLVMS_P03355_3mut





EAAAKPAPGGS
14,353
FFV_O93209-Pro





GGGPAPEAAAK
14,354
MLVMS_P03355_3mutA_WS





GSSEAAAKGGG
14,355
MLVBM_Q7SVK7_3mut





GGGEAAAKGGS
14,356
BAEVM_P10272_3mut





GSSGSS
14,357
KORV_Q9TTC1-Pro_3mutA





EAAAKEAAAKEAAAK
14,358
SFV1_P23074





PAPGSSGGS
14,359
KORV_Q9TTC1-Pro_3mut





PAPAPAPAPAP
14,360
MLVMS_P03355





GSSEAAAK
14,361
SFV3L_P27401_2mut





PAP

PERV_Q4VFZ2_3mut





GGSEAAAKGGG
14,363
MLVBM_Q7SVK7_3mut





GGSGGGPAP
14,364
MLVBM_Q7SVK7_3mutA_WS





GSSGSS
14,365
MLVMS_P03355_3mut





GGSEAAAK
14,366
MLVMS_P03355





GSSEAAAKGGS
14,367
MLVMS_P03355_PLV919





PAPGGGGGS
14,368
MLVFF_P26809_3mut





GSSGGG
14,369
PERV_Q4VFZ2_3mut





GSSGGS
14,370
PERV_Q4VFZ2_3mutA_WS





PAPGGG
14,371
BAEVM_P10272_3mut





PAPGSSGGG
14,372
MLVBM_Q7SVK7_3mut





GGSEAAAK
14,373
SFV3L_P27401_2mut





GSSPAPEAAAK
14,374
SFV3L_P27401-Pro_2mut





GSSGGSPAP
14,375
BAEVM_P10272_3mut





GGSPAPGSS
14,376
PERV_Q4VFZ2_3mutA_WS





GGSGGSGGS
14,377
PERV_Q4VFZ2





GGSGGGPAP
14,378
FLV_P10273_3mut





GGGPAPEAAAK
14,379
SFV3L_P27401_2mutA





GGGGS
14,380
FLV_P10273_3mutA





GSSGGSGGG
14,381
XMRV6_A1Z651_3mut





EAAAKGGGGSS
14,382
PERV_Q4VFZ2





GGSGSSGGG
14,383
SFV3L_P27401-Pro_2mutA





GGSGGSGGS
14,384
MLVFF_P26809_3mut





GGGPAPEAAAK
14,385
FLV_P10273_3mut





GSSGGGEAAAK
14,386
MLVMS_P03355_3mut





GGG

SFV3L_P27401_2mut





GSAGSAAGSGEF
14,388
WMSV_P03359_3mut





GSSGGGPAP
14,389
MLVMS_P03355_PLV919





GGGGSS
14,390
KORV_Q9TTC1-Pro_3mut





GGGGSSEAAAK
14,391
KORV_Q9TTC1





PAPGGSGGG
14,392
SFV3L_P27401_2mut





GSSGSSGSSGSSGSS
14,393
FFV_O93209





GSSGGSPAP
14,394
MLVMS_P03355_3mut





GGSEAAAK
14,395
KORV_Q9TTC1-Pro_3mutA





GGGGSGGGGS
14,396
BAEVM_P10272_3mut





GSSEAAAKGGG
14,397
AVIRE_P03360_3mut





EAAAKPAPGGG
14,398
FLV_P10273_3mut





EAAAKGGSPAP
14,399
SFV3L_P27401-Pro_2mutA





GSSEAAAKPAP
14,400
MLVBM_Q7SVK7_3mut





GGGPAPGGS
14,401
MLVCB_P08361_3mut





GGG

SFV3L_P27401_2mutA





EAAAKGGGGSEAAAK
14,403
SFV3L_P27401_2mutA





GGSGSSGGG
14,404
MLVBM_Q7SVK7_3mut





GSAGSAAGSGEF
14,405
BAEVM_P10272_3mut





GGGEAAAK
14,406
FOAMV_P14350_2mutA





PAPEAAAKGGS
14,407
WMSV_P03359_3mut





PAPAPAPAPAPAP
14,408
MLVF5_P26810_3mutA





GGSGGGGSS
14,409
FLV_P10273_3mutA





PAPGSSGGS
14,410
BAEVM_P10272_3mut





PAPEAAAK
14,411
WMSV_P03359_3mutA





GSSGSSGSSGSSGSSGSS
14,412
FFV_O93209-Pro_2mut





GGGGGSGSS
14,413
FFV_O93209-Pro





GGGGGGGG
14,414
SFV3L_P27401-Pro_2mutA





GGGGGG
14,415
FLV_P10273_3mut





GSSGGSGGG
14,416
MLVAV_P03356_3mutA





GGGGSS
14,417
SFV3L_P27401-Pro_2mutA





GGSGGGPAP
14,418
FOAMV_P14350_2mut





GSSGSS
14,419
AVIRE_P03360_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAK
14,420
SFV3L_P27401-Pro_2mutA





EAAAKEAAAK
14,421
BAEVM_P10272_3mut





GSSPAPEAAAK
14,422
GALV_P21414_3mutA





GGSEAAAKPAP
14,423
SFV3L_P27401_2mutA





GGSGGGEAAAK
14,424
SFV3L_P27401-Pro_2mutA





EAAAKGSSPAP
14,425
FOAMV_P14350_2mut





GGSGSSEAAAK
14,426
SFV3L_P27401_2mut





GGG

PERV_Q4VFZ2





GGGGGSGSS
14,428
FOAMV_P14350_2mut





GGSGGGEAAAK
14,429
KORV_Q9TTC1-Pro_3mut





GSSGGSGGG
14,430
AVIRE_P03360_3mutA





EAAAKPAPGGG
14,431
SFV3L_P27401_2mutA





PAPGGSGGG
14,432
KORV_Q9TTC1-Pro_3mut





PAPAPAP
14,433
WMSV_P03359_3mutA





GSSEAAAKPAP
14,434
SFV1_P23074





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
14,435
SRV2_P51517





GSSGGSGGG
14,436
PERV_Q4VFZ2_3mutA_WS





GSSGSSGSSGSSGSSGSS
14,437
FFV_O93209





GSSGGGPAP
14,438
WMSV_P03359_3mut





PAPAPAPAPAPAP
14,439
MLVBM_Q7SVK7_3mut





GGGGGSPAP
14,440
KORV_Q9TTC1-Pro_3mutA





PAPGSS
14,441
MLVBM_Q7SVK7_3mutA_WS





PAPEAAAKGGS
14,442
SFV3L_P27401-Pro_2mut





GGGGSSPAP
14,443
MLVMS_P03355_3mut





GGSEAAAK
14,444
FFV_O93209-Pro





EAAAKPAPGGS
14,445
AVIRE_P03360_3mutA





PAPGSS
14,446
WMSV_P03359_3mut





PAPGSSGGG
14,447
SFV3L_P27401-Pro_2mutA





EAAAKEAAAKEAAAK
14,448
SFV3L_P27401_2mut





GGS

MLVRD_P11227_3mut





GGGGS
14,450
KORV_Q9TTC1-Pro_3mut





GGSGGGGSS
14,451
KORV_Q9TTC1





GGSGGG
14,452
MLVMS_P03355_3mutA_WS





GGGEAAAKPAP
14,453
BAEVM_P10272_3mut





EAAAKEAAAKEAAAKEAAAKEAAAK
14,454
FLV_P10273





PAPGGSGGG
14,455
KORV_Q9TTC1-Pro_3mutA





GSSGSSGSSGSSGSSGSS
14,456
HTL1L_P0C211





GGGEAAAKPAP
14,457
WMSV_P03359





GSSGGSPAP
14,458
FFV_O93209-Pro





PAPAPAPAPAP
14,459
SFV3L_P27401-Pro_2mutA





GSSGGSEAAAK
14,460
SFV3L_P27401_2mutA





GGSPAPGSS
14,461
SFV3L_P27401_2mut





GGSGGSGGS
14,462
KORV_Q9TTC1-Pro_3mut





PAPEAAAKGSS
14,463
KORV_Q9TTC1-Pro_3mut





EAAAKGGS
14,464
KORV_Q9TTC1_3mutA





EAAAKGGGGSEAAAK
14,465
SFV3L_P27401-Pro_2mut





GGGGSSPAP
14,466
FFV_O93209-Pro





EAAAK
14,467
SFV3L_P27401_2mut





EAAAKGGGGSS
14,468
BAEVM_P10272_3mut





GGGGGSEAAAK
14,469
MLVBM_Q7SVK7_3mut





GGGG
14,470
PERV_Q4VFZ2





GGGGGSEAAAK
14,471
FLV_P10273_3mut





EAAAKGGGPAP
14,472
KORV_Q9TTC1-Pro





GGGGSGGGGSGGGGSGGGGS
14,473
FFV_O93209_2mutA





GSSGGSGGG
14,474
PERV_Q4VFZ2_3mut





GGGGSGGGGSGGGGS
14,475
GALV_P21414_3mutA





GGSGGGEAAAK
14,476
AVIRE_P03360_3mutA





PAPEAAAKGGG
14,477
SFV3L_P27401_2mut





GGGGSGGGGS
14,478
AVIRE_P03360





GSSGGGEAAAK
14,479
SFV3L_P27401_2mutA





GGGGG
14,480
AVIRE_P03360_3mutA





GGSGSS
14,481
KORV_Q9TTC1_3mut





PAPAPAPAPAPAP
14,482
FOAMV_P14350_2mut





GGSEAAAKPAP
14,483
KORV_Q9TTC1-Pro_3mut





GGGGGG
14,484
PERV_Q4VFZ2_3mut





GSSGGGEAAAK
14,485
MLVBM_Q7SVK7





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
14,486
MLVAV_P03356





GGSPAPGSS
14,487
BAEVM_P10272_3mut





GGGGSSPAP
14,488
BAEVM_P10272





GGGGSEAAAKGGGGS
14,489
SFV3L_P27401_2mut





GGGGGGGG
14,490
GALV_P21414_3mutA





PAPAP
14,491
MLVAV_P03356_3mut





GGGEAAAK
14,492
PERV_Q4VFZ2_3mutA_WS





GSSPAPGGG
14,493
FFV_O93209_2mut





GGSGGSGGSGGSGGS
14,494
BAEVM_P10272





GGGGGS
14,495
MLVF5_P26810_3mutA





PAPGGGGSS
14,496
FLV_P10273_3mutA





GGGEAAAK
14,497
MLVBM_Q7SVK7_3mut





PAPEAAAKGGG
14,498
WMSV_P03359_3mut





GSSEAAAK
14,499
MLVBM_Q7SVK7_3mut





EAAAKEAAAK
14,500
AVIRE_P03360





EAAAKGGGGGS
14,501
MLVBM_Q7SVK7_3mut





GGGEAAAKGGS
14,502
SFV3L_P27401-Pro_2mutA





PAPAPAPAPAP
14,503
MLVF5_P26810_3mut





PAPGSSEAAAK
14,504
SFV3L_P27401-Pro_2mutA





EAAAKEAAAKEAAAK
14,505
BAEVM_P10272_3mutA





GGSPAPGSS
14,506
MLVMS_P03355





PAPGSSGGS
14,507
FLV_P10273_3mutA





EAAAKEAAAKEAAAKEAAAK
14,508
FOAMV_P14350-Pro_2mut





EAAAKGGG
14,509
KORV_Q9TTC1_3mutA





EAAAKGGSGGG
14,510
MLVBM_Q7SVK7_3mut





GGGGGS
14,511
KORV_Q9TTC1-Pro_3mutA





PAPGGSGGG
14,512
WMSV_P03359_3mut





GGGPAPGGS
14,513
KORV_Q9TTC1_3mutA





GSS

FFV_O93209





GGSGGSGGS
14,515
PERV_Q4VFZ2_3mut





GGGGS
14,516
GALV_P21414_3mutA





GGGG
14,517
MLVF5_P26810_3mut





GGSEAAAKPAP
14,518
FFV_O93209-Pro_2mut





PAPAPAPAP
14,519
FFV_O93209-Pro





PAP

MLVF5_P26810_3mut





EAAAKEAAAKEAAAK
14,521
FFV_O93209_2mut





EAAAKGSS
14,522
MLVCB_P08361_3mut





EAAAKGGG
14,523
MLVBM_Q7SVK7_3mut





PAPEAAAKGGG
14,524
FFV_O93209_2mut





GSSGGGEAAAK
14,525
SFV1_P23074-Pro_2mut





PAPGGGEAAAK
14,526
GALV_P21414_3mutA





GGGGSGGGGSGGGGSGGGGS
14,527
FOAMV_P14350-Pro_2mutA





GSSGGG
14,528
FOAMV_P14350_2mut





GGGGSGGGGGGGGSGGGGS
14,529
SFV3L_P27401_2mutA





GGSGSS
14,530
AVIRE_P03360_3mut





GGSGSSEAAAK
14,531
MMTVB_P03365_WS





PAPAPAP
14,532
MLVAV_P03356_3mutA





GSSGGSPAP
14,533
SFV3L_P27401-Pro_2mut





GGSPAP
14,534
AVIRE_P03360





GGSGGGPAP
14,535
FFV_O93209





GSSEAAAK
14,536
PERV_Q4VFZ2





GSSGGGPAP
14,537
PERV_Q4VFZ2_3mutA_WS





GGGGSSEAAAK
14,538
KORV_Q9TTC1_3mutA





GGSEAAAKPAP
14,539
SFVCP_Q87040





GGSGGGPAP
14,540
FOAMV_P14350_2mutA





GGGGSGGGGSGGGGSGGGGS
14,541
BLVJ_P03361_2mutB





GGGGSSPAP
14,542
SFV3L_P27401_2mutA





EAAAKGGS
14,543
MLVF5_P26810_3mut





GGSEAAAKGSS
14,544
MLVCB_P08361_3mut





GGGGSSEAAAK
14,545
SFV3L_P27401_2mut





EAAAKGGSGGG
14,546
FOAMV_P14350_2mut





GGSGGS
14,547
FLV_P10273_3mut





EAAAKGGG
14,548
FFV_O93209-Pro





GSSGSSGSSGSSGSS
14,549
SFV3L_P27401





GSSGGGPAP
14,550
PERV_Q4VFZ2_3mutA_WS





PAPGGSEAAAK
14,551
SFV3L_P27401-Pro_2mutA





GGSPAP
14,552
KORV_Q9TTC1





EAAAKPAPGSS
14,553
KORV_Q9TTC1_3mutA





SGSETPGTSESATPES
14,554
SFV1_P23074





GSSPAP
14,555
SFV3L_P27401-Pro_2mutA





GSSPAPGGG
14,556
SFV3L_P27401_2mut





GGGEAAAKGSS
14,557
SFV1_P23074_2mut





GGGPAPGGS
14,558
BAEVM_P10272_3mut





EAAAKGGG
14,559
KORV_Q9TTC1-Pro_3mutA





GSSGGG
14,560
SFV3L_P27401-Pro_2mut





GGSPAPEAAAK
14,561
BAEVM_P10272_3mut





EAAAKGSSPAP
14,562
FFV_O93209





EAAAKGGGGSEAAAK
14,563
SFV3L_P27401-Pro_2mutA





GSSGSSGSSGSSGSS
14,564
SFV1_P23074_2mut





EAAAKGGSPAP
14,565
FOAMV_P14350_2mut





GGSGGS
14,566
KORV_Q9TTC1-Pro_3mutA





EAAAKGSSGGS
14,567
GALV_P21414





GSSGGGPAP
14,568
MLVAV_P03356





PAPEAAAKGGS
14,569
FOAMV_P14350_2mut





EAAAKPAPGGG
14,570
AVIRE_P03360_3mut





GGSPAP
14,571
SFV3L_P27401_2mutA





GGGGSGGGGS
14,572
SFV3L_P27401_2mutA





GGGGSS
14,573
AVIRE_P03360_3mutA





GGSPAPGGG
14,574
SFV3L_P27401-Pro_2mutA





EAAAKPAPGSS
14,575
SFV3L_P27401





EAAAKPAP
14,576
FOAMV_P14350-Pro_2mut





PAPEAAAKGSS
14,577
PERV_Q4VFZ2_3mutA_WS





EAAAKGGSGSS
14,578
SFV3L_P27401_2mutA





GGGEAAAKGSS
14,579
GALV_P21414_3mutA





GGGGSEAAAKGGGGS
14,580
PERV_Q4VFZ2_3mut





PAPGGSGSS
14,581
FFV_O93209-Pro_2mutA





GGSEAAAKPAP
14,582
GALV_P21414_3mutA





GGSGGSGGSGGSGGS
14,583
FFV_O93209-Pro





GSSGGSEAAAK
14,584
SFV3L_P27401-Pro_2mut





GGS

GALV_P21414_3mutA





PAPGGSEAAAK
14,586
MLVMS_P03355





PAPEAAAKGGS
14,587
BAEVM_P10272_3mutA





GGSGSSPAP
14,588
SFV3L_P27401-Pro_2mutA





GSSPAP
14,589
WMSV_P03359_3mut





GGGEAAAK
14,590
MMTVB_P03365





GGGGSS
14,591
PERV_Q4VFZ2_3mut





GGSPAPGSS
14,592
SFV3L_P27401-Pro_2mut





PAPGGS
14,593
MLVBM_Q7SVK7_3mut





EAAAKGSSPAP
14,594
MLVBM_Q7SVK7_3mut





GGGGSSGGS
14,595
PERV_Q4VFZ2_3mut





PAPAPAPAPAPAP
14,596
SFV1_P23074





GGSEAAAKGGG
14,597
SFV3L_P27401-Pro_2mut





GGSGGS
14,598
SFV1_P23074_2mut





GSSGGGGGS
14,599
MLVF5_P26810_3mutA





EAAAKGGGPAP
14,600
SFV3L_P27401





EAAAKEAAAKEAAAKEAAAK
14,601
FOAMV_P14350-Pro_2mutA





GGGPAPGSS
14,602
SFV3L_P27401_2mutA





GGGGSGGGGSGGGGSGGGGS
14,603
SFV3L_P27401_2mut





EAAAKEAAAKEAAAKEAAAK
14,604
MMTVB_P03365_WS





PAPGSSGGS
14,605
KORV_Q9TTC1-Pro_3mutA





PAPGSSEAAAK
14,606
FOAMV_P14350-Pro_2mut





GSSPAPEAAAK
14,607
BAEVM_P10272_3mut





EAAAKGGGGSEAAAK
14,608
FFV_O93209-Pro





GGSPAP
14,609
PERV_Q4VFZ2





GGSGSSEAAAK
14,610
XMRV6_A1Z651_3mut





GGSEAAAKGGG
14,611
GALV_P21414_3mutA





PAPGGGGSS
14,612
AVIRE_P03360_3mutA





GGSGGSGGSGGS
14,613
PERV_Q4VFZ2





GGGGSSGGS
14,614
PERV_Q4VFZ2_3mutA_WS





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
14,615
BAEVM_P10272_3mutA





GGGPAP
14,616
MLVAV_P03356_3mut





GGGGGGGGSGGGGSGGGGS
14,617
FFV_O93209_2mut





GSSEAAAK
14,618
FFV_O93209





GGSPAPEAAAK
14,619
FOAMV_P14350_2mut





GGGGGSEAAAK
14,620
FOAMV_P14350_2mut





GSSPAPGGS
14,621
MLVBM_Q7SVK7_3mut





GSS

SFVCP_Q87040_2mut





EAAAKPAP
14,623
FOAMV_P14350-Pro





EAAAKGGG
14,624
SFV3L_P27401_2mut





GGGEAAAK
14,625
AVIRE_P03360_3mutA





PAPGSSGGG
14,626
WMSV_P03359_3mut





EAAAKGGSPAP
14,627
SFV3L_P27401





GSSGGSGGG
14,628
SFV3L_P27401-Pro_2mutA





GSSGGGEAAAK
14,629
GALV_P21414_3mutA





GGGPAPGSS
14,630
MLVBM_Q7SVK7_3mutA_WS





PAPGGGEAAAK
14,631
FFV_O93209-Pro_2mut





GSSGSSGSSGSS
14,632
SFV1_P23074_2mut





GGSEAAAK
14,633
PERV_Q4VFZ2_3mutA_WS





GGGEAAAKPAP
14,634
SFV3L_P27401_2mut





EAAAKGGGPAP
14,635
SFV3L_P27401_2mut





GGGGSSPAP
14,636
FLV_P10273_3mut





EAAAKPAPGSS
14,637
FFV_O93209_2mut





GGGGSSPAP
14,638
SFV3L_P27401_2mut





GSSGSS
14,639
KORV_Q9TTC1_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGS
14,640
BLVJ_P03361_2mut





GGGGSSGGS
14,641
GALV_P21414_3mutA





EAAAKGGSGSS
14,642
FFV_O93209-Pro





EAAAKPAP
14,643
PERV_Q4VFZ2





GSSGGGEAAAK
14,644
MLVBM_Q7SVK7_3mut





PAPGGSGGG
14,645
BAEVM_P10272





EAAAKGGGPAP
14,646
MLVF5_P26810





GSSGSSGSS
14,647
MLVBM_Q7SVK7_3mut





GSSGGS
14,648
AVIRE_P03360_3mutA





GGSEAAAKGGG
14,649
FOAMV_P14350_2mut





EAAAKGGS
14,650
MLVF5_P26810_3mutA





GGSGSSGGG
14,651
WMSV_P03359_3mut





EAAAK
14,652
SFV1_P23074_2mut





GSSGGSPAP
14,653
SFV3L_P27401-Pro_2mutA





GGGGSSGGS
14,654
KORV_Q9TTC1_3mut





PAPGGSGGG
14,655
FFV_O93209-Pro_2mut





GGGPAPGGS
14,656
SFV3L_P27401_2mutA





GSSPAPEAAAK
14,657
FLV_P10273_3mut





GGSGSSPAP
14,658
SFV3L_P27401_2mut





GSSEAAAKGGS
14,659
SFV3L_P27401_2mut





PAPGGG
14,660
SFV3L_P27401_2mutA





SGSETPGTSESATPES
14,661
KORV_Q9TTC1-Pro_3mut





GGGGS
14,662
SFV1_P23074-Pro_2mutA





GSSGGGEAAAK
14,663
WMSV_P03359





EAAAKGGGGSEAAAK
14,664
MLVF5_P26810_3mutA





GSSEAAAKPAP
14,665
FFV_O93209





GGGGGG
14,666
SFV1_P23074_2mutA





EAAAKEAAAKEAAAK
14,667
MMTVB_P03365-Pro





EAAAKPAPGSS
14,668
MLVBM_Q7SVK7_3mut





GGSGSSEAAAK
14,669
SFV3L_P27401_2mutA





GGSEAAAK
14,670
MLVMS_P03355_3mut





GGSPAPEAAAK
14,671
SFV3L_P27401_2mut





GGGPAPGSS
14,672
SFV1_P23074





GGGGGSEAAAK
14,673
MLVBM_Q7SVK7_3mutA_WS





EAAAKPAPGSS
14,674
KORV_Q9TTC1-Pro





GSSGSSGSSGSS
14,675
SFV3L_P27401_2mut





EAAAKPAP
14,676
SFV3L_P27401_2mut





GGGEAAAK
14,677
PERV_Q4VFZ2_3mut





GGSGGS
14,678
SFV3L_P27401_2mutA





EAAAKGSSGGS
14,679
MMTVB_P03365





SGSETPGTSESATPES
14,680
SFV3L_P27401





EAAAKGSSGGG
14,681
PERV_Q4VFZ2





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
14,682
MMTVB_P03365





GGSGGGPAP
14,683
KORV_Q9TTC1_3mutA





PAPAPAPAP
14,684
SFV3L_P27401





GGGEAAAKGGS
14,685
SFV1_P23074_2mut





GSSGGSGGG
14,686
PERV_Q4VFZ2_3mut





PAPEAAAKGGS
14,687
FOAMV_P14350_2mutA





GGGEAAAKGSS
14,688
SFV3L_P27401_2mut





GGGGSGGGGSGGGGSGGGGS
14,689
MLVBM_Q7SVK7





PAPGSSGGG
14,690
FLV_P10273





GGSGSSGGG
14,691
FFV_O93209





EAAAKPAPGSS
14,692
MLVBM_Q7SVK7





GSSEAAAKGGG
14,693
SFV3L_P27401_2mutA





GGSGGSGGSGGSGGS
14,694
MLVF5_P26810





GGSEAAAKPAP
14,695
SFV3L_P27401-Pro_2mutA





EAAAKGGSPAP
14,696
SFV3L_P27401_2mutA





EAAAKGGGGGS
14,697
SFV3L_P27401_2mut





GSSPAPEAAAK
14,698
SFV3L_P27401_2mutA





PAPAP
14,699
MLVBM_Q7SVK7_3mut





PAPGGSEAAAK
14,700
KORV_Q9TTC1-Pro





GGSGSS
14,701
MLVF5_P26810_3mutA





GGSEAAAKPAP
14,702
FFV_O93209_2mut





GSS

MLVMS_P03355





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
14,704
SFV3L_P27401-Pro





PAPGGGEAAAK
14,705
SFV3L_P27401_2mut





PAPGGGGGS
14,706
SFV3L_P27401-Pro_2mut





PAPGGSGSS
14,707
BAEVM_P10272_3mut





GSSGGGEAAAK
14,708
FFV_O93209





GGSEAAAKPAP
14,709
SFV1_P23074_2mut





GGGGG
14,710
FLV_P10273_3mut





GGGEAAAKGSS
14,711
SFV3L_P27401





GSSGSSGSSGSSGSS
14,712
SFV1_P23074-Pro





SGSETPGTSESATPES
14,713
AVIRE_P03360





PAPGSSGGG
14,714
MLVBM_Q7SVK7_3mut





GGGGSSPAP
14,715
HTL3P_Q4U0X6_2mut





GGGEAAAK
14,716
SFV1_P23074





GGSGGG
14,717
AVIRE_P03360





EAAAKGSSGGG
14,718
SFV3L_P27401_2mutA





GSSPAPEAAAK
14,719
FOAMV_P14350-Pro_2mutA





GGGPAPGSS
14,720
WMSV_P03359





EAAAKGSSGGG
14,721
MLVMS_P03355





GGGGGSEAAAK
14,722
MLVMS_P03355





EAAAKPAPGGS
14,723
SFV3L_P27401





EAAAKGSSPAP
14,724
SFV3L_P27401





GGGGGGG
14,725
FOAMV_P14350_2mutA





EAAAKEAAAKEAAAK
14,726
SFV3L_P27401





GSSPAPGGS
14,727
FFV_O93209_2mutA





GGGGSSEAAAK
14,728
SFV3L_P27401-Pro_2mutA





GGSEAAAKGSS
14,729
GALV_P21414_3mutA





GGSEAAAKGSS
14,730
BAEVM_P10272_3mutA





EAAAKPAPGGG
14,731
MLVCB_P08361





GSSGSSGSSGSSGSSGSS
14,732
SFV1_P23074-Pro





GGGGSEAAAKGGGGS
14,733
FOAMV_P14350_2mut





GSSPAPGGS
14,734
MLVMS_P03355_PLV919





GGGGSGGGGS
14,735
FFV_O93209-Pro





GSSGGSPAP
14,736
KORV_Q9TTC1_3mutA





GGSGGS
14,737
GALV_P21414_3mutA





PAPGSSEAAAK
14,738
WMSV_P03359





PAPGGGGSS
14,739
MMTVB_P03365-Pro





GGGGSSGGS
14,740
PERV_Q4VFZ2_3mutA_WS





GGGGSGGGGS
14,741
FFV_O93209_2mut





GGGGSGGGGSGGGGSGGGGS
14,742
XMRV6_A1Z651





GGSGSSEAAAK
14,743
SFV1_P23074_2mut





GGSGGGGSS
14,744
GALV_P21414_3mutA





GGSEAAAKPAP
14,745
MLVBM_Q7SVK7





EAAAKGGSPAP
14,746
SFV1_P23074_2mutA





PAPAPAPAP
14,747
FFV_O93209





GSSGGSPAP
14,748
MMTVB_P03365-Pro





GGGGGSPAP
14,749
KORV_Q9TTC1_3mutA





EAAAKGGGPAP
14,750
PERV_Q4VFZ2





GSSGGSPAP
14,751
BAEVM_P10272





GGGGG
14,752
FFV_O93209





GGGGGS
14,753
FLV_P10273_3mutA





EAAAKEAAAKEAAAK
14,754
FOAMV_P14350





PAPGGG
14,755
MLVCB_P08361_3mut





GSSGGSEAAAK
14,756
FOAMV_P14350_2mutA





GGSPAPGGG
14,757
FLV_P10273_3mut





GSSGSSGSSGSSGSSGSS
14,758
SFV1_P23074-Pro_2mutA





GGSPAPEAAAK
14,759
SFV3L_P27401





PAPGGGGSS
14,760
HTL3P_Q4U0X6_2mutB





GGGGSSEAAAK
14,761
MMTVB_P03365_2mut_WS





PAPGGS
14,762
MLVRD_P11227_3mut





GGSGGSGGSGGSGGS
14,763
MMTVB_P03365





GSAGSAAGSGEF
14,764
AVIRE_P03360





GSSGGS
14,765
BAEVM_P10272_3mutA





GGSGGGGSS
14,766
MMTVB_P03365





GGSGGGGSS
14,767
WMSV_P03359





PAPEAAAKGSS
14,768
SFV1_P23074





GSSGSSGSSGSS
14,769
SFV1_P23074-Pro_2mutA





PAPAPAPAPAPAP
14,770
SFV3L_P27401





PAPGSSGGG
14,771
FLV_P10273_3mut





GGSGSSPAP
14,772
MLVMS_P03355





GGSGGGPAP
14,773
FOAMV_P14350





PAPGGGGGS
14,774
KORV_Q9TTC1_3mutA





EAAAKGSSPAP
14,775
GALV_P21414_3mutA





GGSGSSPAP
14,776
MLVBM_Q7SVK7_3mut





EAAAKGSS
14,777
SFV3L_P27401_2mut





GGGGGSEAAAK
14,778
WMSV_P03359





GGGGGGGG
14,779
SFV1_P23074-Pro





EAAAKEAAAK
14,780
MLVBM_Q7SVK7





GGGEAAAKGGS
14,781
MLVBM_Q7SVK7





EAAAKGGSPAP
14,782
SFV3L_P27401_2mut





GSSEAAAK
14,783
XMRV6_A1Z651





PAPGGGEAAAK
14,784
MMTVB_P03365_WS





GGSPAP
14,785
GALV_P21414_3mutA





GSSPAPGGG
14,786
MLVBM_Q7SVK7_3mutA_WS





GGSGSSPAP
14,787
SFV1_P23074_2mutA





GGS

HTL32_Q0R5R2_2mut





GGSGGGGSS
14,789
MMTVB_P03365-Pro





GGGGSGGGGSGGGGSGGGGS
14,790
SFVCP_Q87040_2mutA





EAAAKGGGPAP
14,791
FOAMV_P14350_2mut





GSSGGGEAAAK
14,792
MMTVB_P03365





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
14,793
MLVBM_Q7SVK7_3mutA_WS





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
14,794
MMTVB_P03365_WS





EAAAKEAAAK
14,795
FOAMV_P14350-Pro_2mut





GSSPAPEAAAK
14,796
FOAMV_P14350_2mutA





EAAAKPAPGGS
14,797
GALV_P21414_3mutA





GSSGGSPAP
14,798
KORV_Q9TTC1-Pro_3mut





GGGPAPEAAAK
14,799
MLVAV_P03356





GGGEAAAKPAP
14,800
SFV1_P23074-Pro_2mut





GGGGGSEAAAK
14,801
SFV3L_P27401_2mut





GGGPAPGSS
14,802
SFV3L_P27401_2mut





GGSEAAAKPAP
14,803
AVIRE_P03360





GSSGSSGSSGSSGSSGSS
14,804
SFV1_P23074-Pro_2mut





EAAAKGSSGGS
14,805
FOAMV_P14350_2mutA





GGGGGG
14,806
MLVBM_Q7SVK7_3mut





GSSPAPGGS
14,807
PERV_Q4VFZ2





GGSGSSPAP
14,808
GALV_P21414_3mutA





GGGPAPEAAAK
14,809
SFV3L_P27401





GGSGGGEAAAK
14,810
WMSV_P03359





GSAGSAAGSGEF
14,811
SFV1_P23074_2mut





GSSGGGEAAAK
14,812
MLVMS_P03355





GGG

MMTVB_P03365-Pro





PAPGSSGGS
14,814
FOAMV_P14350_2mut





GGGGSSPAP
14,815
FFV_O93209_2mut





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
14,816
MMTVB_P03365_WS





GGGGGGG
14,817
XMRV6_A1Z651





PAPAPAPAPAP
14,818
FOAMV_P14350





GGGGSGGGGSGGGGSGGGGS
14,819
MMTVB_P03365_2mut_WS





GGSGGGPAP
14,820
SFV3L_P27401_2mut





GGGGGG
14,821
SFV1_P23074-Pro





EAAAKPAPGSS
14,822
SFV3L_P27401_2mut





GGGGSSGGS
14,823
HTL3P_Q4U0X6_2mut





PAPGSSEAAAK
14,824
MMTVB_P03365-Pro





GGGGSSPAP
14,825
FOAMV_P14350-Pro_2mut





PAPGSSGGS
14,826
MMTVB_P03365





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
14,827
SRV2_P51517





PAPAPAP
14,828
MMTVB_P03365_2mut_WS





PAPGGGGGS
14,829
MMTVB_P03365_2mutB





GGGGSS
14,830
SFV1_P23074-Pro_2mutA





EAAAKEAAAKEAAAKEAAAK
14,831
SFV3L_P27401-Pro





GGSGGSGGSGGSGGS
14,832
MMTVB_P03365-Pro





GGGGGGG
14,833
SFV3L_P27401_2mut





PAPGGGEAAAK
14,834
SFV3L_P27401





PAPGSS
14,835
FOAMV_P14350_2mutA





GGGGSGGGGS
14,836
SFVCP_Q87040_2mutA





GSSGGSGGG
14,837
XMRV6_A1Z651





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
14,838
MLVBM_Q7SVK7





GSSEAAAKGGG
14,839
FFV_O93209-Pro_2mut





GGSEAAAKPAP
14,840
SFV3L_P27401-Pro





GSSGGSGGG
14,841
SFV1_P23074_2mut





EAAAKGGGGSS
14,842
FOAMV_P14350_2mutA





GGGGGG
14,843
SFV3L_P27401_2mut





GGGGG
14,844
MLVBM_Q7SVK7_3mut





PAPEAAAKGGG
14,845
SFV3L_P27401





EAAAKGGSPAP
14,846
KORV_Q9TTC1_3mutA





GGGEAAAKPAP
14,847
SFV1_P23074_2mut





GSSGSSGSSGSSGSSGSS
14,848
KORV_Q9TTC1-Pro





EAAAKEAAAKEAAAKEAAAK
14,849
SFVCP_Q87040





PAPGSSEAAAK
14,850
MLVBM_Q7SVK7





GSSGSSGSS
14,851
FFV_O93209-Pro_2mut





GSSGGGPAP
14,852
SFV3L_P27401-Pro_2mut





GGGPAPEAAAK
14,853
WMSV_P03359_3mut





GGGEAAAK
14,854
MMTVB_P03365-Pro





GSSGSSGSSGSS
14,855
SFV3L_P27401-Pro_2mutA





PAPAPAPAPAP
14,856
FFV_O93209-Pro





GGSPAPEAAAK
14,857
FFV_O93209-Pro_2mut





GSSGSSGSSGSSGSSGSS
14,858
GALV_P21414





EAAAKEAAAKEAAAKEAAAKEAAAK
14,859
FOAMV_P14350





GGGPAPEAAAK
14,860
MMTVB_P03365-Pro





PAPGGSGGG
14,861
MLVF5_P26810_3mutA





PAPGGSGGG
14,862
FLV_P10273_3mut





GGGEAAAKGGS
14,863
SFV3L_P27401





GSAGSAAGSGEF
14,864
MLVBM_Q7SVK7_3mut





GSSPAPGGG
14,865
MPMV_P07572_2mutB





GSSGSSGSSGSSGSSGSS
14,866
FOAMV_P14350





GGSGGGGSS
14,867
BLVJ_P03361_2mut





PAPEAAAKGSS
14,868
SFV1_P23074-Pro





GGG

FFV_O93209





EAAAKGGGGSS
14,870
SFV1_P23074_2mut





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
14,871
SRV2_P51517





GGGGSGGGGSGGGGSGGGGSGGGGGGGGS
14,872
MMTVB_P03365





GGGEAAAKGGS
14,873
MMTVB_P03365_WS





GSSGSS
14,874
SFV1_P23074





GSSGGGGGS
14,875
SFV3L_P27401





GGGGSSEAAAK
14,876
SFV1_P23074





EAAAKGSSGGS
14,877
HTL1A_P03362_2mutB





GSSEAAAKGGS
14,878
GALV_P21414_3mutA





EAAAKGSSPAP
14,879
SFV1_P23074





EAAAKPAPGSS
14,880
SFV3L_P27401_2mutA





PAPGSSGGG
14,881
SFV3L_P27401-Pro_2mut





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
14,882
SFV3L_P27401-Pro





EAAAKEAAAKEAAAKEAAAKEAAAK
14,883
MMTVB_P03365_WS





GGGGSSEAAAK
14,884
MLVF5_P26810_3mutA





EAAAKGGSPAP
14,885
GALV_P21414





PAPEAAAKGSS
14,886
MMTVB_P03365_WS





GSSGGGGGS
14,887
SFVCP_Q87040_2mut





GGGGSSPAP
14,888
SFV1_P23074





EAAAKGGGGSS
14,889
XMRV6_A1Z651





PAPAPAPAP
14,890
MMTVB_P03365





GGSEAAAKGSS
14,891
SFV3L_P27401_2mutA





GSSPAPGGG
14,892
MMTVB_P03365_WS





GGGGGG
14,893
SFV3L_P27401-Pro





GGSGGSGGS
14,894
FOAMV_P14350-Pro_2mut





PAPAPAPAPAPAP
14,895
WMSV_P03359





GSSPAP
14,896
MLVBM_Q7SVK7





GGGGGSGSS
14,897
MMTVB_P03365_2mut_WS





EAAAKGSSGGS
14,898
MMTVB_P03365_2mutB_WS





EAAAK
14,899
FFV_O93209_2mutA





PAPEAAAK
14,900
SFV1_P23074-Pro





EAAAKGGSGSS
14,901
SFV3L_P27401





GGSGGSGGS
14,902
FFV_O93209-Pro





GSSGGGEAAAK
14,903
MMTVB_P03365





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
14,904
MLVFF_P26809_3mutA





GGSGGSGGSGGSGGSGGS
14,905
HTL1L_P0C211_2mutB





GGGEAAAK
14,906
SFV3L_P27401-Pro_2mutA





GGGGGSGSS
14,907
MMTVB_P03365





GSSPAPGGS
14,908
FOAMV_P14350_2mutA





EAAAKGSS
14,909
MLVMS_P03355





GSSGGSGGG
14,910
FFV_O93209-Pro





GGSGGGGSS
14,911
MMTVB_P03365-Pro_2mut





GGSPAPGSS
14,912
FOAMV_P14350_2mut





GGSGGSGGSGGSGGSGGS
14,913
SFVCP_Q87040-Pro_2mut





GSSEAAAKGGG
14,914
FOAMV_P14350_2mutA





GGSGGSGGS
14,915
MMTVB_P03365-Pro





GSSGSSGSSGSSGSSGSS
14,916
MMTVB_P03365_2mut_WS





GSSGSSGSSGSSGSS
14,917
MMTVB_P03365-Pro





PAPEAAAK
14,918
WDSV_O92815





GSSGSSGSSGSSGSS
14,919
FFV_O93209-Pro_2mut





EAAAKGGGGSEAAAK
14,920
MMTVB_P03365-Pro





GGSPAPEAAAK
14,921
FOAMV_P14350





GSSGSS
14,922
PERV_Q4VFZ2





GGG

MMTVB_P03365-Pro





GGGGSGGGGSGGGGS
14,924
FFV_O93209_2mut





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
14,925
MMTVB_P03365-Pro





GGSGSSPAP
14,926
WMSV_P03359





GGGGGGGG
14,927
SFV3L_P27401_2mut





PAPGSSEAAAK
14,928
FOAMV_P14350-Pro_2mutA





GGGGSSPAP
14,929
FOAMV_P14350_2mut





GSSGGSPAP
14,930
MLVBM_Q7SVK7_3mut





GSSGGGGGS
14,931
GALV_P21414_3mutA





EAAAKEAAAKEAAAKEAAAKEAAAK
14,932
MMTVB_P03365





GSSGGGGGS
14,933
SFV1_P23074_2mut





GGGGSEAAAKGGGGS
14,934
SFV1_P23074





GGGEAAAKPAP
14,935
FFV_O93209





PAPGGGEAAAK
14,936
SFV1_P23074





GGSGGGEAAAK
14,937
PERV_Q4VFZ2_3mutA_WS





GSSGGG
14,938
MMTVB_P03365-Pro





EAAAKGSSGGS
14,939
FFV_O93209_2mut





GGGGG
14,940
SFV1_P23074_2mut





GGGPAP
14,941
SFV3L_P27401





GSSGGSEAAAK
14,942
FFV_O93209





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
14,943
MMTVB_P03365-Pro





GSSGGGEAAAK
14,944
SFV1_P23074_2mutA





GSSGSSGSSGSSGSS
14,945
SFV3L_P27401_2mut





GGSEAAAKPAP
14,946
FLV_P10273





GGGGSGGGGS
14,947
FOAMV_P14350-Pro_2mutA





GSSEAAAKPAP
14,948
SFV3L_P27401





GGGGSEAAAKGGGGS
14,949
MMTVB_P03365-Pro





PAPGSSEAAAK
14,950
MLVF5_P26810_3mut





EAAAKGGSGGG
14,951
SFV3L_P27401





GGGPAPGGS
14,952
SFV3L_P27401





GSSEAAAKGGS
14,953
FOAMV_P14350_2mutA





EAAAKGGSGGG
14,954
HTL1L_P0C211





GSSGGSPAP
14,955
SFV3L_P27401_2mutA





PAPAP
14,956
FFV_O93209





PAPGGSGSS
14,957
MMTVB_P03365_WS





EAAAKGGGGGS
14,958
FOAMV_P14350_2mut





PAPEAAAKGGS
14,959
SFV3L_P27401_2mut





GSSEAAAKPAP
14,960
MMTVB_P03365-Pro





GGSGGS
14,961
PERV_Q4VFZ2_3mut





GSSEAAAKGGG
14,962
FFV_O93209-Pro_2mutA





EAAAK
14,963
HTL1L_P0C211





GSSPAP
14,964
MLVMS_P03355





EAAAKPAPGGG
14,965
FFV_O93209-Pro_2mut





GGGGSEAAAKGGGGS
14,966
SFV1_P23074-Pro_2mut





EAAAKGSSGGS
14,967
SFV3L_P27401





GSAGSAAGSGEF
14,968
FFV_O93209_2mutA





PAPEAAAKGGS
14,969
MMTVB_P03365_2mutB_WS





EAAAKEAAAKEAAAKEAAAKEAAAKEAAAK
14,970
MMTVB_P03365





GGS

MMTVB_P03365





GGSEAAAKPAP
14,972
SFV1_P23074





EAAAKGSSGGG
14,973
HTLV2_P03363_2mut





GGSEAAAKGGG
14,974
MMTVB_P03365_WS





GGSGGS
14,975
FFV_O93209-Pro





GSSEAAAKGGS
14,976
MMTVB_P03365-Pro





PAPAPAPAPAP
14,977
SFV1_P23074_2mutA





GGSEAAAKGGG
14,978
MMTVB_P03365_2mutB_WS





PAPAPAPAP
14,979
MMTVB_P03365_WS





GGGGGGGGSGGGGSGGGGSGGGGS
14,980
HTL3P_Q4U0X6_2mut





PAPGGSEAAAK
14,981
SFV1_P23074-Pro_2mut





GGSGGGPAP
14,982
MMTVB_P03365





GSSGSSGSSGSSGSSGSS
14,983
MMTVB_P03365-Pro





GGSEAAAKPAP
14,984
SFV1_P23074-Pro





GGGEAAAKGSS
14,985
SFV3L_P27401_2mutA





GGGPAPGGS
14,986
AVIRE_P03360





PAPGGG
14,987
MLVRD_P11227





GGSEAAAKGSS
14,988
SFV3L_P27401_2mut





GGGEAAAKGSS
14,989
FOAMV_P14350_2mut





GGGEAAAKGSS
14,990
SFV1_P23074-Pro





EAAAKEAAAKEAAAKEAAAK
14,991
MLVAV_P03356





EAAAKGGGPAP
14,992
JSRV_P31623_2mutB





EAAAKGGGGSS
14,993
FOAMV_P14350_2mut





EAAAKEAAAKEAAAKEAAAKEAAAK
14,994
SRV2_P51517





GSSGGGGGS
14,995
FFV_O93209





PAPAPAP
14,996
FOAMV_P14350_2mutA





GGSGGSGGSGGS
14,997
FOAMV_P14350





GGGEAAAK
14,998
MMTVB_P03365_WS





GGGGGS
14,999
SFV1_P23074_2mutA





GGSGGS
15,000
WMSV_P03359_3mut





EAAAKGGS
15,001
MMTVB_P03365-Pro





GGGGSS
15,002
BLVJ_P03361_2mut





PAPAP
15,003
MMTVB_P03365-Pro_2mut





PAPGGG
15,004
SMRVH_P03364





EAAAKGGGGSS
15,005
SFV3L_P27401





PAPAPAPAPAP
15,006
MMTVB_P03365





GGGPAP
15,007
MMTVB_P03365-Pro





GSSGGSGGG
15,008
MMTVB_P03365





EAAAKGGGPAP
15,009
FOAMV_P14350_2mutA





GSSGSSGSSGSS
15,010
SFV1_P23074





GGGGSGGGGS
15,011
SFV3L_P27401





GSSGGSGGG
15,012
MLVF5_P26810





GGGEAAAKPAP
15,013
MMTVB_P03365-Pro





PAPEAAAK
15,014
HTLV2_P03363_2mut





GSSGSSGSSGSS
15,015
FOAMV_P14350_2mut





GSSEAAAKPAP
15,016
MMTVB_P03365-Pro





PAPEAAAKGGG
15,017
HTL3P_Q4U0X6_2mut





GGSEAAAKGSS
15,018
MMTVB_P03365-Pro





EAAAKPAPGGS
15,019
MMTVB_P03365_2mut_WS





GSSGGSEAAAK
15,020
MLVF5_P26810_3mutA





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
15,021
MLVF5_P26810_3mut





EAAAKGGGGSS
15,022
MMTVB_P03365-Pro





GGGGGSGSS
15,023
HTL1A_P03362_2mutB





PAPAP
15,024
FFV_O93209-Pro_2mut





GGGGGSPAP
15,025
HTL1C_P14078_2mut





GGGPAP
15,026
HTLV2_P03363_2mut





EAAAKGGGGSEAAAK
15,027
SFVCP_Q87040





GGSEAAAKGGG
15,028
FFV_O93209-Pro_2mutA





GSSPAPGGS
15,029
FOAMV_P14350-Pro_2mut





GGGGGGG
15,030
MMTVB_P03365-Pro





EAAAKGSS
15,031
SFV3L_P27401_2mutA





EAAAKGGGGSEAAAK
15,032
MMTVB_P03365-Pro





GGGGSEAAAKGGGGS
15,033
SFV1_P23074-Pro_2mutA





EAAAKGGGGSS
15,034
MMTVB_P03365





GGGEAAAKGGS
15,035
SFV1_P23074





PAPEAAAKGGG
15,036
MLVF5_P26810





GGGGSSGGS
15,037
MMTVB_P03365





GGSGSS
15,038
MMTVB_P03365





PAPAPAPAPAPAP
15,039
KORV_Q9TTC1





EAAAKGGG
15,040
SFV1_P23074-Pro_2mut





PAPAPAPAPAPAP
15,041
SRV2_P51517





GSSGSSGSSGSSGSS
15,042
FFV_O93209-Pro_2mutA





GGGGSS
15,043
FOAMV_P14350_2mut





PAPGGGEAAAK
15,044
MMTVB_P03365_WS





GGSGGGEAAAK
15,045
FFV_O93209-Pro_2mut





PAPAPAPAPAP
15,046
MMTVB_P03365_WS





GGGEAAAKGGS
15,047
MMTVB_P03365-Pro





GGGEAAAKGSS
15,048
MMTVB_P03365_2mutB





GSSPAPEAAAK
15,049
MMTVB_P03365_WS





EAAAKEAAAKEAAAKEAAAKEAAAK
15,050
SFV1_P23074-Pro_2mutA





PAPGGG
15,051
SFV3L_P27401





GSSEAAAKGGG
15,052
MMTVB_P03365_WS





GGGGSSEAAAK
15,053
FOAMV_P14350_2mut





PAPGSSGGS
15,054
SFV1_P23074-Pro_2mut





GSSGSSGSSGSSGSSGSS
15,055
SFV3L_P27401





EAAAKGSSGGG
15,056
MMTVB_P03365





PAPGGGGSS
15,057
WDSV_O92815_2mutA





GGSPAP
15,058
MMTVB_P03365-Pro





GGSGGSGGSGGSGGS
15,059
SFVCP_Q87040-Pro_2mut





PAPAPAPAP
15,060
MMTVB_P03365-Pro





GGGGG
15,061
HTL1A_P03362





GGSGGSGGSGGS
15,062
SFV1_P23074_2mutA





GSSGSSGSSGSSGSS
15,063
FOAMV_P14350-Pro_2mut





PAPGGSEAAAK
15,064
MMTVB_P03365_2mutB_WS





PAPAPAPAP
15,065
SFV1_P23074_2mut





PAPGGGGSS
15,066
MMTVB_P03365





GGSGSS
15,067
SFV3L_P27401_2mut





EAAAKEAAAKEAAAKEAAAK
15,068
MMTVB_P03365_2mut





EAAAKGGSGGG
15,069
HTL3P_Q4U0X6_2mut





PAPGGGGSS
15,070
SFVCP_Q87040-Pro_2mutA





EAAAKGGGGGS
15,071
MLVAV_P03356





GGGGGS
15,072
FOAMV_P14350_2mut





GGGEAAAKGGS
15,073
FFV_O93209-Pro_2mutA





EAAAKPAPGGG
15,074
MMTVB_P03365_2mutB





GGSGGGPAP
15,075
FFV_O93209_2mut





GSSEAAAKPAP
15,076
MMTVB_P03365





PAPAPAPAPAPAP
15,077
SFV1_P23074_2mut





GGSPAPGGG
15,078
MMTVB_P03365-Pro





GGSGGGEAAAK
15,079
MMTVB_P03365





PAPAP
15,080
SFVCP_Q87040





GSSEAAAK
15,081
SFVCP_Q87040





GGGGSGGGGSGGGGS
15,082
MMTVB_P03365-Pro





GSSGSSGSS
15,083
SFV3L_P27401





EAAAKGGSGGG
15,084
MMTVB_P03365-Pro





GSSPAP
15,085
SFV1_P23074_2mut





GGGEAAAK
15,086
SFV1_P23074-Pro





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
15,087
MMTVB_P03365-Pro





PAPGGS
15,088
HTL1C_P14078_2mut





PAPGSSGGS
15,089
SFV1_P23074_2mut





PAPEAAAK
15,090
MMTVB_P03365_WS





PAPAP
15,091
MMTVB_P03365-Pro





EAAAKGGS
15,092
HTL1A_P03362_2mut





GGGGSEAAAKGGGGS
15,093
HTL1C_P14078





EAAAKGSSGGS
15,094
FOAMV_P14350-Pro





PAPGGSGSS
15,095
MMTVB_P03365-Pro





PAPGGSEAAAK
15,096
SFV1_P23074_2mut





PAPGSSEAAAK
15,097
FFV_O93209-Pro_2mut





PAPGSSGGG
15,098
FOAMV_P14350-Pro_2mutA





GSSGGGEAAAK
15,099
AVIRE_P03360





GGGGGG
15,100
SMRVH_P03364_2mut





PAPEAAAKGGG
15,101
MMTVB_P03365-Pro





GGGEAAAKGGS
15,102
SFVCP_Q87040_2mutA





PAPAPAPAPAP
15,103
SRV2_P51517





GSSGSSGSSGSSGSSGSS
15,104
MMTVB_P03365





EAAAKGGGPAP
15,105
MLVAV_P03356





PAPAPAPAPAP
15,106
FOAMV_P14350-Pro_2mutA





PAPGGSEAAAK
15,107
FOAMV_P14350





GSSGGGPAP
15,108
HTL32_Q0R5R2_2mutB





GGGGGSPAP
15,109
HTL3P_Q4U0X6_2mutB





GSSGGSGGG
15,110
MMTVB_P03365-Pro





PAPAP
15,111
SFVCP_Q87040-Pro





GSSGGGPAP
15,112
MMTVB_P03365-Pro





GGSGSS
15,113
MMTVB_P03365-Pro_2mut





GGSPAPEAAAK
15,114
SFV1_P23074-Pro_2mut





EAAAKGGSGGG
15,115
SFV3L_P27401_2mut





GGGGSSEAAAK
15,116
MMTVB_P03365_WS





GGGGGSGSS
15,117
MMTVB_P03365_2mut





GGGGSSGGS
15,118
SFV1_P23074-Pro_2mutA





EAAAKGGGGSEAAAK
15,119
MMTVB_P03365_WS





PAPGGGEAAAK
15,120
SFV1_P23074-Pro





PAPEAAAKGGG
15,121
MMTVB_P03365





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
15,122
MMTVB_P03365





GSSGGSEAAAK
15,123
FOAMV_P14350-Pro_2mut





GGSPAP
15,124
MLVBM_Q7SVK7_3mut





GSSEAAAK
15,125
FOAMV_P14350





GSSEAAAK
15,126
MMTVB_P03365-Pro





EAAAKGSSGGS
15,127
HTL1A_P03362_2mut





GGGEAAAKPAP
15,128
FOAMV_P14350-Pro_2mut





EAAAKGGSPAP
15,129
FOAMV_P14350





GSSEAAAKPAP
15,130
MMTVB_P03365_WS





GSSGSSGSS
15,131
FOAMV_P14350_2mut





EAAAKEAAAKEAAAKEAAAK
15,132
MMTVB_P03365_WS





EAAAK
15,133
MMTVB_P03365





PAPGSS
15,134
BAEVM_P10272





PAPGGS
15,135
FFV_O93209-Pro_2mut





GGSGGS
15,136
SFV1_P23074-Pro_2mutA





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
15,137
HTLV2_P03363_2mut





GGSGGGEAAAK
15,138
MMTVB_P03365_WS





PAPGSSGGG
15,139
HTL1A_P03362





GGSGGS
15,140
SFV3L_P27401-Pro





GSSGSS
15,141
SFV1_P23074-Pro





PAPGGSEAAAK
15,142
MMTVB_P03365





GSAGSAAGSGEF
15,143
MMTVB_P03365-Pro





PAPGGG
15,144
FOAMV_P14350_2mut





EAAAKGGSGSS
15,145
MMTVB_P03365_WS





GSSGGGEAAAK
15,146
SFV3L_P27401-Pro





GGSGGGPAP
15,147
FOAMV_P14350-Pro_2mut





PAPAPAPAPAPAP
15,148
WDSV_O92815





SGSETPGTSESATPES
15,149
SFVCP_Q87040-Pro_2mutA





GGSGGSGGS
15,150
SFV1_P23074





GGGGSS
15,151
SFVCP_Q87040_2mut





GGGGGSEAAAK
15,152
MMTVB_P03365





SGSETPGTSESATPES
15,153
MMTVB_P03365_WS





PAPAPAP
15,154
SFV3L_P27401





PAPEAAAKGSS
15,155
MMTVB_P03365_2mutB_WS





GSSGSSGSSGSSGSS
15,156
SRV2_P51517





GGGPAPGSS
15,157
HTL32_Q0R5R2_2mutB





GGSGGGGSS
15,158
MMTVB_P03365-Pro





SGSETPGTSESATPES
15,159
SRV2_P51517





EAAAKGSSGGS
15,160
MMTVB_P03365-Pro





GSSPAPEAAAK
15,161
MMTVB_P03365-Pro





GSSPAPEAAAK
15,162
SRV2_P51517





GGGGSSPAP
15,163
MMTVB_P03365-Pro





PAPGGGEAAAK
15,164
SFV1_P23074-Pro_2mutA





PAPEAAAKGGS
15,165
MMTVB_P03365





GSSGSSGSSGSSGSSGSS
15,166
FOAMV_P14350-Pro





GGSPAPGSS
15,167
SFV3L_P27401





GGGPAPGGS
15,168
SFV1_P23074-Pro_2mutA





GGGPAPGSS
15,169
MMTVB_P03365-Pro





EAAAKPAP
15,170
MLVBM_Q7SVK7





EAAAKEAAAKEAAAK
15,171
HTL1C_P14078





GSSGGSEAAAK
15,172
SRV2_P51517





PAPGGGGGS
15,173
SRV2_P51517





GGGEAAAK
15,174
FFV_O93209-Pro_2mut





EAAAKGGGPAP
15,175
HTL32_Q0R5R2





GGSGSSGGG
15,176
MMTVB_P03365





PAPEAAAKGSS
15,177
MMTVB_P03365-Pro





PAPGGGGGS
15,178
MMTVB_P03365-Pro





EAAAKGGGGGS
15,179
MMTVB_P03365_WS





GGGGGS
15,180
MMTVB_P03365-Pro





GGGGSGGGGSGGGGSGGGGSGGGGS
15,181
HTL1C_P14078





EAAAKGGSPAP
15,182
MMTVB_P03365





GGGGSSPAP
15,183
FFV_O93209-Pro_2mut





GGGGSSGGS
15,184
MMTVB_P03365-Pro





PAPGSSGGS
15,185
MMTVB_P03365-Pro





GGGGGS
15,186
SRV2_P51517





GGSGSSGGG
15,187
MMTVB_P03365





GSSGGSEAAAK
15,188
MMTVB_P03365-Pro





EAAAKEAAAKEAAAKEAAAK
15,189
GALV_P21414





GGSEAAAKGGG
15,190
MMTVB_P03365-Pro





SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
15,191
MMTVB_P03365-Pro





GSSEAAAKGGS
15,192
MMTVB_P03365





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
15,193
HTL3P_Q4U0X6_2mutB





GGGEAAAK
15,194
MMTVB_P03365-Pro





PAPAPAPAP
15,195
MMTVB_P03365-Pro





PAPGSSGGG
15,196
MMTVB_P03365





GSSGSSGSSGSSGSS
15,197
GALV_P21414





GGSPAP
15,198
MMTVB_P03365_WS





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
15,199
MMTVB_P03365-Pro





PAPEAAAK
15,200
MMTVB_P03365-Pro





PAPGSSGGG
15,201
SFV1_P23074-Pro_2mutA





GGGGGSEAAAK
15,202
MMTVB_P03365_2mutB_WS





PAPAPAPAPAP
15,203
MMTVB_P03365-Pro





EAAAKGGSGSS
15,204
MMTVB_P03365-Pro





EAAAKEAAAKEAAAKEAAAK
15,205
MLVRD_P11227_3mut





PAPAPAPAP
15,206
FOAMV_P14350_2mutA





GGGPAPGSS
15,207
SFVCP_Q87040_2mut





PAPEAAAKGSS
15,208
SFVCP_Q87040_2mut





GGSPAPGGG
15,209
MMTVB_P03365-Pro





GGGGSGGGGSGGGGSGGGGS
15,210
MMTVB_P03365





EAAAKGGS
15,211
HTL3P_Q4U0X6_2mut





PAPGSSGGS
15,212
MMTVB_P03365_WS





GGGGSGGGGS
15,213
MMTVB_P03365





GGSGGS
15,214
FOAMV_P14350





EAAAKGGGGSEAAAK
15,215
SFVCP_Q87040-Pro_2mut





EAAAKEAAAKEAAAKEAAAK
15,216
MMTVB_P03365-Pro_2mutB





PAPGGGEAAAK
15,217
SFVCP_Q87040-Pro





GSSGSS
15,218
JSRV_P31623_2mutB





EAAAKGGGGGS
15,219
MMTVB_P03365_2mut_WS





GSSPAPEAAAK
15,220
MMTVB_P03365-Pro





GGGEAAAK
15,221
HTL1C_P14078





PAPEAAAKGSS
15,222
HTL32_Q0R5R2_2mutB





GGGGSSEAAAK
15,223
MMTVB_P03365-Pro





PAPGSSGGS
15,224
MMTVB_P03365-Pro





EAAAKGGGGGS
15,225
MMTVB_P03365





GGGGSGGGGSGGGGSGGGGS
15,226
MMTVB_P03365





EAAAKGGGGSS
15,227
HTL3P_Q4U0X6_2mut





GGGEAAAKGGS
15,228
SFVCP_Q87040-Pro





GGGGGSPAP
15,229
MMTVB_P03365-Pro_2mutB





GGSGGGEAAAK
15,230
SFV3L_P27401-Pro





PAPGGGGGS
15,231
SFV3L_P27401-Pro





EAAAKGGGGSEAAAK
15,232
MMTVB_P03365





PAPEAAAKGSS
15,233
MMTVB_P03365-Pro





GGSEAAAKGGG
15,234
MMTVB_P03365-Pro





GGSGGSGGSGGSGGS
15,235
SMRVH_P03364_2mutB





GGSGGSGGSGGSGGS
15,236
HTL1L_P0C211_2mut





GGGGGG
15,237
WDSV_O92815





GGGGGSGSS
15,238
MMTVB_P03365-Pro





GGSEAAAKPAP
15,239
SFV3L_P27401-Pro_2mut





GGGPAPGSS
15,240
MMTVB_P03365_2mut_WS





GGGGGS
15,241
MMTVB_P03365_WS





GGSPAPEAAAK
15,242
MMTVB_P03365





PAPEAAAKGGS
15,243
HTL1A_P03362





EAAAKGGSGSS
15,244
MMTVB_P03365_2mut_WS





GGGPAPEAAAK
15,245
SFV3L_P27401-Pro_2mut





PAPGGGGSS
15,246
HTL32_Q0R5R2_2mut





GSSPAPGGG
15,247
HTL3P_Q4U0X6_2mut





GGGGSSGGS
15,248
BLVAU_P25059_2mut





EAAAKGGGGGS
15,249
HTL1L_P0C211





GGSEAAAKGSS
15,250
JSRV_P31623_2mutB





GSSGGG
15,251
JSRV_P31623





GGSGGSGGSGGS
15,252
MMTVB_P03365-Pro





EAAAKPAP
15,253
SFV1_P23074-Pro_2mutA





GGGGSSGGS
15,254
MMTVB_P03365_WS





GGSGGS
15,255
MMTVB_P03365_WS





EAAAKGGGGGS
15,256
MMTVB_P03365-Pro





GGGGSGGGGSGGGGSGGGGSGGGGSGGGGS
15,257
MMTVB_P03365





GGSGGSGGS
15,258
MMTVB_P03365





GGGGGSEAAAK
15,259
MLVBM_Q7SVK7





GGSGSSPAP
15,260
MMTVB_P03365_WS





EAAAKEAAAKEAAAK
15,261
JSRV_P31623





PAPEAAAKGGS
15,262
MMTVB_P03365-Pro





GGSGSSEAAAK
15,263
FOAMV_P14350





GGGGGSGSS
15,264
MMTVB_P03365-Pro_2mut





GGGPAPGGS
15,265
MMTVB_P03365





SGSETPGTSESATPES
15,266
SFVCP_Q87040_2mut





GSSPAPGGS
15,267
SFV1_P23074-Pro_2mutA





GSSGSSGSSGSSGSS
15,268
MMTVB_P03365





EAAAKGGGPAP
15,269
MMTVB_P03365





GSSGGG
15,270
MMTVB_P03365_2mut_WS





GGGEAAAKPAP
15,271
MMTVB_P03365





PAPGGSGGG
15,272
MMTVB_P03365-Pro





GSSGGSGGG
15,273
WDSV_O92815_2mut





GGSGGG
15,274
HTL32_Q0R5R2_2mut





EAAAKGGSPAP
15,275
HTLV2_P03363_2mut





GGSPAPEAAAK
15,276
MMTVB_P03365-Pro





GSSGGSEAAAK
15,277
MMTVB_P03365_2mut





GSAGSAAGSGEF
15,278
MMTVB_P03365_WS





PAPGGSGSS
15,279
FFV_O93209





GGSEAAAKGGG
15,280
MMTVB_P03365





GGSPAPGSS
15,281
MMTVB_P03365-Pro





GSSGGSGGG
15,282
SFV3L_P27401





PAPEAAAKGGG
15,283
HTL1A_P03362_2mutB





GGGEAAAKPAP
15,284
MMTVB_P03365-Pro





GGSEAAAK
15,285
HTL32_Q0R5R2_2mutB





GGGEAAAKGSS
15,286
MPMV_P07572





GGGGGSEAAAK
15,287
MMTVB_P03365-Pro





PAPAPAPAPAP
15,288
SFVCP_Q87040-Pro_2mutA





PAPAPAPAPAP
15,289
HTL1L_P0C211_2mut





GGGGSSGGS
15,290
HTL3P_Q4U0X6





PAPGGSEAAAK
15,291
MMTVB_P03365_2mut_WS





PAPAPAPAPAP
15,292
HTL1A_P03362





EAAAKPAPGGG
15,293
MMTVB_P03365_2mut_WS





GGSEAAAK
15,294
MMTVB_P03365_2mut_WS





GGGEAAAKGSS
15,295
SFV1_P23074-Pro_2mutA





GGSPAPGSS
15,296
MMTVB_P03365-Pro





GGSEAAAKPAP
15,297
MLVBM_Q7SVK7





PAPEAAAKGGG
15,298
MMTVB_P03365_2mut_WS





GSSEAAAKPAP
15,299
MMTVB_P03365-Pro_2mutB





GGGGSEAAAKGGGGS
15,300
MMTVB_P03365-Pro_2mut





GSSEAAAKGGS
15,301
MMTVB_P03365-Pro_2mutB





GSSGSSGSSGSSGSS
15,302
SRV2_P51517_2mutB





GGGGGSPAP
15,303
HTL1L_P0C211_2mut





GGSEAAAK
15,304
MMTVB_P03365





GSSPAPEAAAK
15,305
SMRVH_P03364_2mutB





GGGPAPGGS
15,306
HTL1C_P14078_2mut





GGSPAPEAAAK
15,307
MMTVB_P03365_WS





GGSEAAAKPAP
15,308
HTL1A_P03362_2mut





PAPAPAPAP
15,309
HTLV2_P03363_2mut





GSSPAPGGG
15,310
MMTVB_P03365





GSSGSSGSSGSS
15,311
MMTVB_P03365-Pro





GGSEAAAKGSS
15,312
MMTVB_P03365_WS





GGSGSSGGG
15,313
MMTVB_P03365_2mutB





GSSGSSGSSGSSGSSGSS
15,314
JSRV_P31623_2mutB





GGSEAAAKPAP
15,315
MMTVB_P03365-Pro





GSSGGSGGG
15,316
HTLV2_P03363_2mut





AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA
15,317
WDSV_O92815_2mut





GGSPAPEAAAK
15,318
MMTVB_P03365





GGGGSSEAAAK
15,319
MMTVB_P03365





GGSGGGEAAAK
15,320
SFV1_P23074-Pro_2mutA





GGGGSEAAAKGGGGS
15,321
WDSV_O92815_2mut





GGSGSSEAAAK
15,322
MMTVB_P03365_2mutB_WS





GGSEAAAKPAP
15,323
MMTVB_P03365_WS





GSSGGGEAAAK
15,324
SFVCP_Q87040-Pro





GSSGGS
15,325
SFVCP_Q87040-Pro_2mut





GGSEAAAKPAP
15,326
SFVCP_Q87040_2mut





GSSGGSEAAAK
15,327
SFVCP_Q87040_2mut





GSSPAPEAAAK
15,328
SRV2_P51517_2mutB





GGSGGSGGSGGSGGSGGS
15,329
BLVAU_P25059





GSSGSSGSSGSSGSS
15,330
HTL1C_P14078_2mut





EAAAKGGGGSS
15,331
MMTVB_P03365_2mutB





GGGEAAAKGSS
15,332
SFVCP_Q87040-Pro









Example 4: Quantifying Activity of a Gene Editing Polypeptide and Template for Rewriting the Endogenous FAH Locus Achieved in Primary Mouse Hepatocytes

This example demonstrates the use of a heterologous gene modifying system containing a heterologous gene modifying polypeptide and a template RNA to convert an A nucleotide to a G nucleotide in the endogenous Fah locus in mouse primary hepatocytes derived from a Fah5981SB mouse. The Fah5981SB mouse model harbors a G to A point mutation in the last nucleotide of exon 8 of the Fah gene, leading to aberrant mRNA splicing and subsequent mRNA degradation without the production of Fah protein, and thus serves as a mouse model of hereditary tyrosinemia type I.


In this example, the template RNA contained:

    • (1) a gRNA spacer;
    • (2) a gRNA scaffold;
    • (3) a heterologous object sequence; and
    • (4) a primer binding site (PBS) sequence.
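The four-part layout listed above can be illustrated with a minimal sketch that assembles a template RNA 5′ to 3′ and checks that the PBS is the reverse complement of the spacer region it is expected to anneal to. The sequence is FAH1_R14_P12_Heavy (SEQ ID NO: 30421) from this example with the modification marks removed. The component boundaries are assumptions inferred from a 20-nt spacer and from the construct name (R14 read as a 14-nt heterologous object sequence, P12 as a 12-nt PBS), and the 3-nt nick offset is the usual SpCas9 convention; none of these is stated explicitly in the text.

```python
# Sketch only: component boundaries and the nick offset are assumptions.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


# FAH1_R14_P12 with modification marks stripped, split into assumed parts.
spacer = "GGAUGGUCCUCAUGAACGAC"
scaffold = (
    "GUU"
    "UUAGAGCUAGAAAUAGCAAGUUAA"
    "AAUAAGGCUAGUCCGUUAUCAACUU"
    "GAAAAAGUGGCACCGAGUCGGUGC"
)
heterologous_object = "UUACCGCUCCAGUC"  # assumed 14 nt ("R14")
pbs = "GUUCAUGAGGAC"                   # assumed 12 nt ("P12")

# Order follows the list above: spacer, scaffold, object sequence, PBS.
template_rna = spacer + scaffold + heterologous_object + pbs

# Assumed nick position: 3 nt upstream of the spacer's 3' end.
NICK_OFFSET = 3
nick_index = len(spacer) - NICK_OFFSET
pbs_target = spacer[nick_index - len(pbs):nick_index]

print(len(template_rna))
print(reverse_complement(pbs_target) == pbs)
```

Under these assumptions the PBS pairs with the 12 spacer-matching nucleotides immediately 5′ of the assumed nick, which is consistent with the PBS serving as the primer binding site for reverse transcription.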


More specifically, the template RNA (including chemical modification pattern) comprised the following sequences:









FAH1_R14_P12_Heavy RNACS048-001


(SEQ ID NO: 30421)


mG*mG*mA*rUrGrGrUrCrCrUrCrArUrGrArArCrGrArCrGrUrU





rUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArAr





ArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmU





mGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCr





UrUrArCrCrGrCrUrCrCrArGrUrCrGrUrUrCrArUrGrArG*mG*





mA*mC





FAH1_R15_P10_Heavy RNACS049-001


(SEQ ID NO: 30422)


mG*mG*mA*rUrGrGrUrCrCrUrCrArUrGrArArCrGrArCrGrUrU





rUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArAr





ArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmU





mGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCr





ArUrUrArCrCrGrCrUrCrCrArGrUrCrGrUrUrCrArUrG*mA*mG





*mG





FAH2_R19_P11_MUT_Heavy RNACS052-001


(SEQ ID NO: 30423)


mU*mC*mA*rGrArGrGrArArGrCrUrGrGrGrCrCrArCrCrGrUrU





rUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArAr





ArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmU





mGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCr





UrGrGrArGrCrGrGrUrArArUrGrGrCrUrGrGrUrGrGrCrCrCrA





rGrC*mU*mU*mC





FAH2_R19_P13_MUT_Heavy RNACS053-001


(SEQ ID NO: 30424)


mU*mC*mA*rGrArGrGrArArGrCrUrGrGrGrCrCrArCrCrGrUrU





rUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArAr





ArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmU





mGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCr





UrGrGrArGrCrGrGrUrArArUrGrGrCrUrGrGrUrGrGrCrCrCrA





rGrCrUrU*mC*mC*mU






Additional exemplary template RNAs that could be utilized in this experiment include the following:









FAH1 RNACS050


(SEQ ID NO: 30425)


mG*mG*mA*rUrGrGrUrCrCrUrCrArUrGrArArCrGrArCrGrUrU





rUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArAr





ArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmU





mGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCr





ArGrGrCrArUrUrArCrCrGrCrUrCrCrArGrUrCrGrUrUrCrArU





rGrArG*mG*mA*mC





FAH1 RNACS051


(SEQ ID NO: 30426)


mG*mG*mA*rUrGrGrUrCrCrUrCrArUrGrArArCrGrArCrGrUrU





rUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArAr





ArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmU





mGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCr





ArGrGrCrArUrUrArCrCrGrCrUrCrCrArGrUrCrGrUrUrCrArU





rG*mA*mG*mG






In the sequences above m=2′-O-methyl ribonucleotide, r=ribose and *=phosphorothioate bond.
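As an illustration of the notation defined above, the following minimal sketch tokenizes a modified-RNA string into per-nucleotide records. The function names are hypothetical, and treating an unprefixed base as ribose is an assumption made so that the same parser also accepts strings that omit the "r" prefix.

```python
import re

# One token: optional sugar prefix (m or r), a base, optional trailing "*".
TOKEN = re.compile(r"([mr]?)([ACGU])(\*?)")


def parse_modified_rna(text):
    """Return a list of (base, sugar, phosphorothioate_after) tuples."""
    text = "".join(text.split())  # drop line breaks and spaces
    tokens = []
    pos = 0
    for match in TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError(f"unexpected text at position {pos}")
        prefix, base, star = match.groups()
        # Assumption: no prefix is treated the same as "r" (ribose).
        sugar = "2'-O-methyl" if prefix == "m" else "ribose"
        tokens.append((base, sugar, star == "*"))
        pos = match.end()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return tokens


def plain_sequence(text):
    """Strip all modification marks and return the bare base sequence."""
    return "".join(base for base, _, _ in parse_modified_rna(text))


# 5' end shared by the FAH1 templates and 3' end of FAH2_R19_P11_MUT_Heavy.
five_prime = "mG*mG*mA*rUrGrG"
three_prime = "rGrC*mU*mU*mC"
print(plain_sequence(five_prime))
print(parse_modified_rna(three_prime))
```

The parser raises an error on any character that does not fit the notation, which helps catch transcription slips in a modification prefix.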


The heterologous gene modifying polypeptides tested comprised a sequence of: RNAV209 (nCas9-RT) and RNAV214 (wtCas9-RT). Specifically, the nCas9-RT and the wtCas9-RT had the following amino acid sequences:










nCas9-RT (RNAV209):



(SEQ ID NO: 30427)



MPAAKRVKLDGGDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLI






GALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVE





EDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHF





LIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQL





PGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYAD





LFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYK





EIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNG





SIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKS





EETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVK





YVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRF





NASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKV





MKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKE





DIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAR





ENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMY





VDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKKMKN





YWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMN





TKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIK





KYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRK





RPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKL





IARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNP





IDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYL





ASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR





DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRI





DLSQLGGDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVS





LGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQR





LLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGL





PPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTL





FNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAK





KAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPG





FAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQG





YAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPL





VILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEG





LQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIW





AKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTS





EGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPD





TSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFESPKKKAKVE 





wtCas9-RT (RNAV214-040):


(SEQ ID NO: 30428)



MPAAKRVKLDGGDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLI






GALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVE





EDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHF





LIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQL





PGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYAD





LFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYK





EIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNG





SIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKS





EETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVK





YVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRF





NASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKV





MKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKE





DIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAR





ENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMY





VDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKN





YWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMN





TKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIK





KYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRK





RPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKL





IARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNP





IDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYL





ASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR





DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRI





DLSQLGGDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVS





LGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQR





LLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGL





PPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTL





FNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAK





KAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPG





FAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQG





YAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPL





VILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEG





LQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIW





AKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTS





EGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPD





TSTLLIENSSPSGGSKRTADGSEFEKRTADGSEFESPKKKAKVE







Underlining indicates the residue that differs between the nickase and wild-type sequences.
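Where underlining is not visible, the differing residue can be located by a position-wise comparison. The sketch below compares the one line of each listing in which the nickase and wild-type sequences differ; the function name is hypothetical, and the reported position is relative to the fragment shown, not to the full-length polypeptide.

```python
def differing_positions(seq_a, seq_b):
    """Return (1-based position, residue_a, residue_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [
        (index + 1, a, b)
        for index, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Corresponding lines copied from SEQ ID NO: 30427 and SEQ ID NO: 30428.
ncas9_fragment = "VDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKARGKSDNVPSEEVVKKMKN"
wtcas9_fragment = "VDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKN"

for position, nickase_residue, wild_type_residue in differing_positions(
    ncas9_fragment, wtcas9_fragment
):
    print(position, nickase_residue, wild_type_residue)
```

In this fragment the nickase carries an alanine where the wild-type sequence carries an asparagine.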


The heterologous gene modifying system comprising the heterologous gene modifying polypeptides listed above and the template RNA described above was transfected into primary mouse hepatocytes. The heterologous gene modifying polypeptide and the template RNA were delivered by nucleofection in the RNA format. Specifically, 4 μg of heterologous gene modifying polypeptide mRNA was combined with 10 μg of chemically synthesized template RNA in 5 μL of water. The transfection mix was added to 100,000 mouse primary hepatocytes in Buffer P3 [Lonza], and cells were nucleofected using program DG-138. After nucleofection, cells were grown at 37° C., 5% CO2 for 3 days prior to cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the target insertion site locus were used to amplify across the locus. Amplicons were analyzed via short read sequencing using an Illumina MiSeq. Conversion of the terminal A to G in exon 8 of the Fah gene indicates successful editing.


As shown in FIG. 2, for FAH2 templates, perfect rewrite levels (conversion of A to G with no unwanted mutations detected) of 4-8% were detected with RNAV209 but not with RNAV214. Indel levels of 4.4 to 6.6% were observed with RNAV209. Furthermore, the amount of WT Fah mRNA was measured by quantitative RT-PCR using primers that bind to exons 7 and 8. As shown in FIG. 3, FAH2 templates tested with RNAV209 mRNA resulted in an increase in the abundance of Fah mRNA of up to 12% relative to WT. These results demonstrate the use of a heterologous gene modifying system to reverse a mutation in the Fah gene, resulting in partial restoration of the expression of wild-type Fah mRNA.
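To make the read-out categories concrete, the following minimal sketch shows one way amplicon reads could be binned into perfect rewrite, indel, and unedited classes. It is not the analysis pipeline used here: the sequences are short hypothetical placeholders rather than the Fah amplicon, and calling any length change an indel is a simplification (a real pipeline would align reads to the reference first).

```python
from collections import Counter


def classify_read(read, unedited, edited):
    """Assign one amplicon read to a coarse editing-outcome class."""
    if read == edited:
        return "perfect_rewrite"  # intended change, no other differences
    if read == unedited:
        return "unedited"
    if len(read) != len(unedited):
        return "indel"  # simplification: any length change
    return "other_substitution"


def summarize(reads, unedited, edited):
    """Return the percentage of reads falling in each outcome class."""
    counts = Counter(classify_read(r, unedited, edited) for r in reads)
    total = len(reads)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical placeholder sequences (terminal A converted to G).
UNEDITED = "ACGTACGTA"
EDITED = "ACGTACGTG"
reads = [EDITED] * 2 + [UNEDITED] * 7 + ["ACGTCGTA"]

print(summarize(reads, UNEDITED, EDITED))
```

Requiring an exact match to the edited sequence mirrors the definition of a perfect rewrite given above: a read with the A to G conversion plus any other change is not counted as perfect.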


Example 5: Quantifying Activity of a Gene Editing Polypeptide and Template In Vivo for Rewriting the Endogenous FAH Locus Achieved in Mouse Liver

This example demonstrates the use of a heterologous gene modifying system containing a heterologous gene modifying polypeptide and a template RNA to convert an A nucleotide to a G nucleotide in the endogenous Fah locus in the liver of the Fah5981SB mouse model. The Fah5981SB mouse model harbors a G to A point mutation in the last nucleotide of exon 8 of the Fah gene, leading to aberrant mRNA splicing and subsequent mRNA degradation without the production of Fah protein, and thus serves as a mouse model of hereditary tyrosinemia type I.


In this example, the template RNA contained:

    • (1) a gRNA spacer;
    • (2) a gRNA scaffold;
    • (3) a heterologous object sequence; and
    • (4) a primer binding site (PBS) sequence.


More specifically, the template RNA comprised the following sequences:









FAH1_R14_P12_Heavy RNACS048-001


(SEQ ID NO: 30429)


mG*mG*mA*rUrGrGrUrCrCrUrCrArUrGrArArCrGrArCrGrUrU





rUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArAr





ArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmU





mGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCr





UrUrArCrCrGrCrUrCrCrArGrUrCrGrUrUrCrArUrGrArG*mG*





mA*mC





FAH1_R15_P10_Heavy RNACS049-001


(SEQ ID NO: 30430)


mG*mG*mA*rUrGrGrUrCrCrUrCrArUrGrArArCrGrArCrGrUrU





rUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArAr





ArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmU





mGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCr





ArUrUrArCrCrGrCrUrCrCrArGrUrCrGrUrUrCrArUrG*mA*mG





*mG





FAH2_R19_P11_MUT_Heavy RNACS052-001


(SEQ ID NO: 30431)


mU*mC*mA*rGrArGrGrArArGrCrUrGrGrGrCrCrArCrCrGrUrU





rUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArAr





ArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmU





mGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCr





UrGrGrArGrCrGrGrUrArArUrGrGrCrUrGrGrUrGrGrCrCrCrA





rGrC*mU*mU*mC 





FAH2_R19_P13_MUT_Heavy RNACS053-001


(SEQ ID NO: 30432)


mU*mC*mA*rGrArGrGrArArGrCrUrGrGrGrCrCrArCrCrGrUrU





rUrUrArGrAmGmCmUmAmGmAmAmAmUmAmGmCrArArGrUrUrArAr





ArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrAmAmCmUmU





mGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCr





UrGrGrArGrCrGrGrUrArArUrGrGrCrUrGrGrUrGrGrCrCrCrA





rGrCrUrU*mC*mC*mU






The heterologous gene modifying polypeptides tested comprised a sequence of: RNAV209 and RNAV214, the sequences of which are each provided in Example 4.


The heterologous gene modifying system comprising the heterologous gene modifying polypeptides and the template RNA described above was formulated in LNPs and delivered to mice. Specifically, 2 mg/kg of total RNA equivalent formulated in LNPs, combined at 1:1 (w/w) of template RNA and mRNA, was dosed intravenously in 7- to 9-week-old, mixed-gender Fah5981SB mice. Six hours or 6 days post-dosing, animals were sacrificed, and their livers were collected for analyses. To determine the expression distribution of the heterologous gene modifying polypeptide in the liver, 6-hr liver samples were subjected to immunohistochemistry using an anti-Cas9 antibody. Upon staining, quantification of Cas9-positive hepatocytes was determined by QuPath Markup. As shown in FIG. 4, the expression of the heterologous gene modifying polypeptide was observed in 82-91% of hepatocytes.
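The dosing arithmetic can be illustrated with a minimal sketch that converts a per-kilogram dose into per-animal RNA masses. The 25 g body weight is a hypothetical value chosen for illustration and is not taken from the text; the 2 mg/kg total dose and the 1:1 (w/w) split are from the description above.

```python
def rna_dose_ug(body_weight_g, total_dose_mg_per_kg=2.0, template_fraction=0.5):
    """Return per-animal RNA masses in micrograms.

    mg/kg multiplied by grams of body weight gives micrograms directly,
    because 1 mg/kg equals 1 ug/g.
    """
    total_ug = total_dose_mg_per_kg * body_weight_g
    template_ug = total_ug * template_fraction
    return {
        "total_ug": total_ug,
        "template_rna_ug": template_ug,
        "mrna_ug": total_ug - template_ug,
    }


# Hypothetical 25 g mouse at 2 mg/kg total RNA, 1:1 (w/w) template:mRNA.
print(rna_dose_ug(25.0))
```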


To analyze gene editing activity, primers flanking the target insertion site locus were used to amplify across the locus in the genomic DNA of liver samples collected 6 days post-dosing. Amplicons were analyzed via short read sequencing using an Illumina MiSeq. Conversion of an A nucleotide to a G nucleotide indicates successful editing. As shown in FIG. 5, perfect rewrite levels (conversion of A to G with no unwanted mutations detected) of 0.1%-1.9% were detected across the different groups. Indel levels were in the range of 0.2%-0.4%.


To determine the phenotypic correction caused by the gene editing activity, the restoration of wild-type Fah mRNA was determined by real-time qRT-PCR, and the restoration of Fah protein expression was determined by immunohistochemistry using an anti-Fah antibody. As shown in FIG. 6, wild-type mRNA restoration of 0.1%-6%, relative to littermate heterozygous mice, was detected across the different groups. As shown in FIG. 7, Fah protein was detected in 0.1%-7% of liver cross-sectional area across the different groups. These results demonstrate the use of a heterologous gene modifying system to reverse a mutation in the Fah gene in an in vivo mouse model for hereditary tyrosinemia type I, resulting in partial restoration of expression of wild-type Fah mRNA and Fah protein.


Example 6. Gene Editing at the TTR Locus in an In Vivo Mouse Model

This Example demonstrates successful delivery of an mRNA and a guide RNA, and Cas9-mediated gene editing with the protospacer sequence ACACAAAUACCAGUCCAGCG (SEQ ID NO: 37641) that targets the TTR locus, using a heterologous gene modifying polypeptide and RNA in a C57Blk/6 mouse.


RNAs were prepared as follows. An mRNA encoding a heterologous gene modifying polypeptide having the sequence shown in Table 6A1 below was produced by in vitro transcription and the purified mRNA was dissolved in 1 mM sodium citrate, pH 6, to a final concentration of RNA of 1-2 mg/mL. Similarly, a guide RNA having a sequence shown in Table 6A1 below was produced by chemical synthesis and dissolved in water or aqueous buffer, to a final concentration of RNA of 1-2 mg/mL.









TABLE 6A1







Sequences of Example 6










Name
SEQ ID NO
Nucleic acid sequence





Cas9-RT gene modifying polypeptide
30433
AUGCCUGCGGCUAAGCGGGUAAAAUUGGAUGGUGGGGACAAGAAGUACAGC


AUCGGCCUGGACAUCGGCACCAACUCUGUGGGCUGGGCCGUGAUCACCGACG


AGUACAAGGUGCCCAGCAAGAAAUUCAAGGUGCUGGGCAACACCGACCGGC




ACAGCAUCAAGAAGAACCUGAUCGGAGCCCUGCUGUUCGACAGCGGCGAAA




CAGCCGAGGCCACCCGGCUGAAGAGAACCGCCAGAAGAAGAUACACCAGACG




GAAGAACCGGAUCUGCUAUCUGCAAGAGAUCUUCAGCAACGAGAUGGCCAA




GGUGGACGACAGCUUCUUCCACAGACUGGAAGAGUCCUUCCUGGUGGAAGA




GGAUAAGAAGCACGAGCGGCACCCCAUCUUCGGCAACAUCGUGGACGAGGU




GGCCUACCACGAGAAGUACCCCACCAUCUACCACCUGAGAAAGAAACUGGUG




GACAGCACCGACAAGGCCGACCUGCGGCUGAUCUAUCUGGCCCUGGCCCACA




UGAUCAAGUUCCGGGGCCACUUCCUGAUCGAGGGCGACCUGAACCCCGACAA




CAGCGACGUGGACAAGCUGUUCAUCCAGCUGGUGCAGACCUACAACCAGCUG




UUCGAGGAAAACCCCAUCAACGCCAGCGGCGUGGACGCCAAGGCCAUCCUGU




CUGCCAGACUGAGCAAGAGCAGACGGCUGGAAAAUCUGAUCGCCCAGCUGCC




CGGCGAGAAGAAGAAUGGCCUGUUCGGAAACCUGAUUGCCCUGAGCCUGGG




CCUGACCCCCAACUUCAAGAGCAACUUCGACCUGGCCGAGGAUGCCAAACUG




CAGCUGAGCAAGGACACCUACGACGACGACCUGGACAACCUGCUGGCCCAGA




UCGGCGACCAGUACGCCGACCUGUUUCUGGCCGCCAAGAACCUGUCCGACGC




CAUCCUGCUGAGCGACAUCCUGAGAGUGAACACCGAGAUCACCAAGGCCCCC




CUGAGCGCCUCUAUGAUCAAGAGAUACGACGAGCACCACCAGGACCUGACCC




UGCUGAAAGCUCUCGUGCGGCAGCAGCUGCCUGAGAAGUACAAAGAGAUUU




UCUUCGACCAGAGCAAGAACGGCUACGCCGGCUACAUUGACGGCGGAGCCAG




CCAGGAAGAGUUCUACAAGUUCAUCAAGCCCAUCCUGGAAAAGAUGGACGG




CACCGAGGAACUGCUCGUGAAGCUGAACAGAGAGGACCUGCUGCGGAAGCA




GCGGACCUUCGACAACGGCAGCAUCCCCCACCAGAUCCACCUGGGAGAGCUG




CACGCCAUUCUGCGGCGGCAGGAAGAUUUUUACCCAUUCCUGAAGGACAACC




GGGAAAAGAUCGAGAAGAUCCUGACCUUCCGCAUCCCCUACUACGUGGGCCC




UCUGGCCAGGGGAAACAGCAGAUUCGCCUGGAUGACCAGAAAGAGCGAGGA




AACCAUCACCCCCUGGAACUUCGAGGAAGUGGUGGACAAGGGCGCUUCCGCC




CAGAGCUUCAUCGAGCGGAUGACCAACUUCGAUAAGAACCUGCCCAACGAG




AAGGUGCUGCCCAAGCACAGCCUGCUGUACGAGUACUUCACCGUGUAUAAC




GAGCUGACCAAAGUGAAAUACGUGACCGAGGGAAUGAGAAAGCCCGCCUUC




CUGAGCGGCGAGCAGAAAAAGGCCAUCGUGGACCUGCUGUUCAAGACCAAC




CGGAAAGUGACCGUGAAGCAGCUGAAAGAGGACUACUUCAAGAAAAUCGAG




UGCUUCGACUCCGUGGAAAUCUCCGGCGUGGAAGAUCGGUUCAACGCCUCCC




UGGGCACAUACCACGAUCUGCUGAAAAUUAUCAAGGACAAGGACUUCCUGG




ACAAUGAGGAAAACGAGGACAUUCUGGAAGAUAUCGUGCUGACCCUGACAC




UGUUUGAGGACAGAGAGAUGAUCGAGGAACGGCUGAAAACCUAUGCCCACC




UGUUCGACGACAAAGUGAUGAAGCAGCUGAAGCGGCGGAGAUACACCGGCU




GGGGCAGGCUGAGCCGGAAGCUGAUCAACGGCAUCCGGGACAAGCAGUCCG




GCAAGACAAUCCUGGAUUUCCUGAAGUCCGACGGCUUCGCCAACAGAAACU




UCAUGCAGCUGAUCCACGACGACAGCCUGACCUUUAAAGAGGACAUCCAGA




AAGCCCAGGUGUCCGGCCAGGGCGAUAGCCUGCACGAGCACAUUGCCAAUCU




GGCCGGCAGCCCCGCCAUUAAGAAGGGCAUCCUGCAGACAGUGAAGGUGGU




GGACGAGCUCGUGAAAGUGAUGGGCCGGCACAAGCCCGAGAACAUCGUGAU




CGAAAUGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCG




CGAGAGAAUGAAGCGGAUCGAAGAGGGCAUCAAAGAGCUGGGCAGCCAGAU




CCUGAAAGAACACCCCGUGGAAAACACCCAGCUGCAGAACGAGAAGCUGUAC




CUGUACUACCUGCAGAAUGGGCGGGAUAUGUACGUGGACCAGGAACUGGAC




AUCAACCGGCUGUCCGACUACGAUGUGGACCAUAUCGUGCCUCAGAGCUUUC




UGAAGGACGACUCCAUCGACAACAAGGUGCUGACCAGAAGCGACAAGAAUC




GGGGCAAGAGCGACAACGUGCCCUCCGAAGAGGUCGUGAAGAAGAUGAAGA




ACUACUGGCGGCAGCUGCUGAACGCCAAGCUGAUUACCCAGAGAAAGUUCG




ACAAUCUGACCAAGGCCGAGAGAGGCGGCCUGAGCGAACUGGAUAAGGCCG




GCUUCAUCAAGAGACAGCUGGUGGAAACCCGGCAGAUCACAAAGCACGUGG




CACAGAUCCUGGACUCCCGGAUGAACACUAAGUACGACGAGAAUGACAAGC




UGAUCCGGGAAGUGAAAGUGAUCACCCUGAAGUCCAAGCUGGUGUCCGAUU




UCCGGAAGGAUUUCCAGUUUUACAAAGUGCGCGAGAUCAACAACUACCACC




ACGCCCACGACGCCUACCUGAACGCCGUCGUGGGAACCGCCCUGAUCAAAAA




GUACCCUAAGCUGGAAAGCGAGUUCGUGUACGGCGACUACAAGGUGUACGA




CGUGCGGAAGAUGAUCGCCAAGAGCGAGCAGGAAAUCGGCAAGGCUACCGC




CAAGUACUUCUUCUACAGCAACAUCAUGAACUUUUUCAAGACCGAGAUUAC




CCUGGCCAACGGCGAGAUCCGGAAGCGGCCUCUGAUCGAGACAAACGGCGAA




ACCGGGGAGAUCGUGUGGGAUAAGGGCCGGGAUUUUGCCACCGUGCGGAAA




GUGCUGAGCAUGCCCCAAGUGAAUAUCGUGAAAAAGACCGAGGUGCAGACA




GGCGGCUUCAGCAAAGAGUCUAUCCUGCCCAAGAGGAACAGCGAUAAGCUG




AUCGCCAGAAAGAAGGACUGGGACCCUAAGAAGUACGGCGGCUUCGACAGC




CCCACCGUGGCCUAUUCUGUGCUGGUGGUGGCCAAAGUGGAAAAGGGCAAG




UCCAAGAAACUGAAGAGUGUGAAAGAGCUGCUGGGGAUCACCAUCAUGGAA




AGAAGCAGCUUCGAGAAGAAUCCCAUCGACUUUCUGGAAGCCAAGGGCUAC




AAAGAAGUGAAAAAGGACCUGAUCAUCAAGCUGCCUAAGUACUCCCUGUUC




GAGCUGGAAAACGGCCGGAAGAGAAUGCUGGCCUCUGCCGGCGAACUGCAG




AAGGGAAACGAACUGGCCCUGCCCUCCAAAUAUGUGAACUUCCUGUACCUG




GCCAGCCACUAUGAGAAGCUGAAGGGCUCCCCCGAGGAUAAUGAGCAGAAA




CAGCUGUUUGUGGAACAGCACAAGCACUACCUGGACGAGAUCAUCGAGCAG




AUCAGCGAGUUCUCCAAGAGAGUGAUCCUGGCCGACGCUAAUCUGGACAAA




GUGCUGUCCGCCUACAACAAGCACCGGGAUAAGCCCAUCAGAGAGCAGGCCG




AGAAUAUCAUCCACCUGUUUACCCUGACCAAUCUGGGAGCCCCUGCCGCCUU




CAAGUACUUUGACACCACCAUCGACCGGAAGAGGUACACCAGCACCAAAGAG




GUGCUGGACGCCACCCUGAUCCACCAGAGCAUCACCGGCCUGUACGAGACAC




GGAUCGACCUGUCUCAGCUGGGAGGUGACUCUGGAGGAUCUAGCGGAGGAU




CCUCUGGCAGCGAGACACCAGGAACAAGCGAGUCAGCAACACCAGAGAGCAG




UGGCGGCAGCAGCGGCGGCAGCAGCACCCUAAAUAUAGAAGAUGAGUAUCG




GCUACAUGAGACCUCAAAAGAGCCAGAUGUUUCUCUAGGGUCCACAUGGCU




GUCUGAUUUUCCUCAGGCCUGGGCGGAAACCGGGGGCAUGGGACUGGCAGU




UCGCCAAGCUCCUCUGAUCAUACCUCUGAAAGCAACCUCUACCCCCGUGUCC




AUAAAACAAUACCCCAUGUCACAAGAAGCCAGACUGGGGAUCAAGCCCCACA




UACAGAGACUGUUGGACCAGGGAAUACUGGUACCCUGCCAGUCCCCCUGGA




ACACGCCCCUGCUACCCGUUAAGAAACCAGGGACUAAUGAUUAUAGGCCUG




UCCAGGAUCUGAGAGAAGUCAACAAGCGGGUGGAGGACAUCCACCCCACCG




UGCCCAACCCUUACAACCUCUUGAGCGGGCUCCCACCGUCCCACCAGUGGUA




CACUGUGCUUGAUUUAAAGGAUGCCUUUUUCUGCCUGAGACUCCACCCCACC




AGUCAGCCUCUCUUCGCCUUUGAGUGGAGAGAUCCAGAGAUGGGAAUCUCA




GGACAAUUGACCUGGACCAGACUCCCACAGGGUUUCAAAAACAGUCCCACCC




UGUUUAAUGAGGCACUGCACAGAGACCUAGCAGACUUCCGGAUCCAGCACCC




AGACUUGAUCCUGCUACAGUACGUGGAUGACUUACUGCUGGCCGCCACUUC




UGAGCUAGACUGCCAACAAGGUACUCGGGCCCUGUUACAAACCCUAGGGAA




CCUCGGGUAUCGGGCCUCGGCCAAGAAAGCCCAAAUUUGCCAGAAACAGGUC




AAGUAUCUGGGGUAUCUUCUAAAAGAGGGUCAGAGAUGGCUGACUGAGGCC




AGAAAAGAGACUGUGAUGGGGCAGCCUACUCCGAAGACCCCUCGACAACUA




AGGGAGUUCCUAGGGAAGGCAGGCUUCUGUCGCCUCUUCAUCCCUGGGUUU




GCAGAAAUGGCAGCCCCCCUGUACCCUCUCACCAAACCGGGGACUCUGUUUA




AUUGGGGCCCAGACCAACAAAAGGCCUAUCAAGAAAUCAAGCAAGCCCUUC




UAACUGCCCCAGCCCUGGGGUUGCCAGAUUUGACUAAGCCCUUUGAACUCUU




UGUCGACGAGAAGCAGGGCUACGCCAAAGGUGUCCUAACGCAAAAACUGGG




ACCUUGGCGUCGGCCGGUGGCCUACCUGUCCAAAAAGCUAGACCCAGUAGCA




GCUGGGUGGCCCCCUUGCCUACGGAUGGUAGCAGCCAUUGCCGUACUGACAA




AGGAUGCAGGCAAGCUAACCAUGGGACAGCCACUAGUCAUUCUGGCCCCCCA




UGCAGUAGAGGCACUAGUCAAACAACCCCCCGACCGCUGGCUUUCCAACGCC




CGGAUGACUCACUAUCAGGCCUUGCUUUUGGACACGGACCGGGUCCAGUUC




GGACCGGUGGUAGCCCUGAACCCGGCUACGCUGCUCCCACUGCCUGAGGAAG




GGCUGCAACACAACUGCCUUGAUAUCCUGGCCGAAGCCCACGGAACCCGACC




CGACCUAACGGACCAGCCGCUCCCAGACGCCGACCACACCUGGUACACGGAU




GGAAGCAGUCUCUUACAAGAGGGACAGCGUAAGGCGGGAGCUGCGGUGACC




ACCGAGACCGAGGUAAUCUGGGCUAAAGCCCUGCCAGCCGGGACAUCCGCUC




AGCGGGCUGAACUGAUAGCACUCACCCAGGCCCUAAAGAUGGCAGAAGGUA




AGAAGCUAAAUGUUUAUACUGAUAGCCGUUAUGCUUUUGCUACUGCCCAUA




UCCAUGGAGAAAUAUACAGAAGGCGUGGGUGGCUCACAUCAGAAGGCAAAG




AGAUCAAAAAUAAAGACGAGAUCUUGGCCCUACUAAAAGCCCUCUUUCUGC




CCAAAAGACUUAGCAUAAUCCAUUGUCCAGGACAUCAAAAGGGACACAGCG




CCGAGGCUAGAGGCAACCGGAUGGCUGACCAAGCGGCCCGAAAGGCAGCCAU




CACAGAGACUCCAGACACCUCUACCCUCCUCAUAGAAAAUUCAUCACCCUCU




GGCGGCUCAAAAAGAACCGCCGACGGCAGCGAAUUCGAGAAAAGGACGGCG




GAUGGUAGCGAAUUCGAGAGCCCUAAAAAGAAGGCCAAGGUAGAGUAA





guide RNA
30434
mA*mC*mA*CAAAUACCAGUCCAGCGGUUUUAGAmGmCmUmAmGmAmAmAm




UmAmGmCAAGUUAAAAUAAGGCUAGUCCGUUAUCAmAmCmUmUmGmAmAm




AmAmAmGmUmGmGmCmAmCmCmGmAmGmUmCmGmGmUmGmCmU*mU*mU*




mU




m = 2′-O-methyl, * = phosphorothioate linkage
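As a consistency check between the mRNA in Table 6A1 and the Cas9-RT amino acid sequences given in Example 4, the following minimal sketch translates the first line of the mRNA using the standard genetic code. The helper name is hypothetical, and only the first 51 nucleotides of SEQ ID NO: 30433 are used.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at a stop codon."""
    protein = []
    for start in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First sequence line of SEQ ID NO: 30433 in Table 6A1.
first_line = "AUGCCUGCGGCUAAGCGGGUAAAAUUGGAUGGUGGGGACAAGAAGUACAGC"
print(translate(first_line))
```

The translated residues can be compared against the N-terminus of SEQ ID NO: 30428 to confirm that the mRNA and the polypeptide listing agree.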









Lipid nanoparticle (LNP) components (ionizable lipid, helper lipid, sterol, PEG) were dissolved in 100% ethanol with the lipid component molar ratios of 47:8:43.5:1.5, respectively. RNA (guide and mRNA) was combined in a 1:1 weight ratio and diluted to a concentration of 0.05-0.2 mg/mL in sodium acetate buffer, pH 5. RNA was formulated into distinct LNPs with a lipid amine to total RNA phosphate (N:P) molar ratio of 4.0. The LNPs were formed by microfluidic or turbulent mixing of the lipid and RNA solutions. A 3:1 ratio of aqueous to organic solvent was maintained during mixing using differential flow rates. After mixing, the LNPs were diluted, collected and buffer exchanged into 50 mM Tris, 9% sucrose buffer using tangential flow filtration. Formulations were concentrated to 1.0 mg/mL or higher and then filtered through a 0.2 μm sterile filter. The final LNPs were stored at −80° C. until further use.
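The formulation ratios above can be turned into quantities with a minimal sketch. Two values below are assumptions not given in the text: an average mass of about 330 g/mol per nucleotide (one phosphate each) and one ionizable amine per ionizable lipid molecule. The function names are hypothetical.

```python
# Molar ratios from the text (ionizable : helper : sterol : PEG).
LIPID_MOLAR_RATIO = {"ionizable": 47.0, "helper": 8.0, "sterol": 43.5, "peg": 1.5}


def lipid_nanomoles(
    rna_mass_ug,
    n_to_p=4.0,
    avg_nt_mass_g_per_mol=330.0,   # assumption: average nucleotide mass
    amines_per_ionizable_lipid=1,  # assumption: one amine per lipid
):
    """Return nanomoles of each lipid needed for a given RNA mass."""
    # ug / (g/mol) = umol; multiply by 1000 for nmol of phosphate.
    phosphate_nmol = rna_mass_ug / avg_nt_mass_g_per_mol * 1000.0
    ionizable_nmol = n_to_p * phosphate_nmol / amines_per_ionizable_lipid
    scale = ionizable_nmol / LIPID_MOLAR_RATIO["ionizable"]
    return {name: ratio * scale for name, ratio in LIPID_MOLAR_RATIO.items()}


def split_flow(total_flow, aqueous_to_organic=3.0):
    """Split a total flow rate into (aqueous, organic) streams."""
    aqueous = total_flow * aqueous_to_organic / (aqueous_to_organic + 1.0)
    return aqueous, total_flow - aqueous


print(lipid_nanomoles(100.0))  # lipids for 100 ug of total RNA
print(split_flow(12.0))        # e.g. 12 mL/min total flow
```

Because the four molar ratios sum to 100, each value can also be read directly as a mole percent of total lipid.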


The LNP formulations were delivered intravenously by bolus tail vein injection to C57Blk/6 mice that were approximately 8 weeks old at doses ranging from 0.1 to 1 mg/kg. Expression of the Cas9-RT was measured 6 hours after injection by euthanizing animals and collecting livers during necropsy. To assess gene editing activity at the TTR locus, animals were euthanized 5 days after injection and livers were collected upon necropsy. Expression of the Cas9-RT gene editing polypeptide in liver was measured by Western blot, in which Cas9 was detected by a mouse monoclonal antibody (7A9-3A3, Cell Signaling Technology) and GAPDH (Cell Signaling Technology) was used as a loading control (FIG. 8). Editing of the TTR locus was quantified by Sanger sequencing followed by TIDE analysis of an amplicon of the TTR locus near the binding site of the protospacer. Editing of the TTR locus was observed, as shown in FIG. 9. TTR protein levels in serum were quantified by an ELISA using a standard curve (Aviva Biosciences). TTR protein levels in serum declined in treated animals, as shown in FIG. 10. These experiments demonstrate that the Cas9-RT polypeptide can be expressed in vivo, and can edit the TTR locus, resulting in a decrease in TTR protein levels in serum.


Example 7. Gene Editing at the TTR Locus in an In Vivo Cynomolgus Macaque Model

This Example demonstrates successful delivery of an mRNA and a guide RNA, and Cas9-mediated gene editing with the protospacer sequence ACACAAAUACCAGUCCAGCG (SEQ ID NO: 37641) that targets the TTR locus, using a heterologous gene modifying polypeptide and RNA in a cynomolgus model.


RNAs were prepared as follows. An mRNA encoding a heterologous gene modifying polypeptide having the sequence shown in Table 7A1 below was produced by in vitro transcription and the purified mRNA was dissolved in 1 mM sodium citrate, pH 6, to a final concentration of RNA of 1-2 mg/mL. Similarly, a guide RNA having a sequence shown in Table 7A1 below was produced by chemical synthesis and dissolved in water or aqueous buffer, to a final concentration of RNA of 1-2 mg/mL.









TABLE 7A1







Sequences of Example 7









Name
SEQ ID NO
Nucleic acid sequence





Cas9-RT gene modifying polypeptide
30435
AUGCCUGCGGCUAAGCGGGUAAAAUUGGAUGGUGGGGACAAG


AAGUACAGCAUCGGCCUGGACAUCGGCACCAACUCUGUGGGCU


GGGCCGUGAUCACCGACGAGUACAAGGUGCCCAGCAAGAAAUU




CAAGGUGCUGGGCAACACCGACCGGCACAGCAUCAAGAAGAAC




CUGAUCGGAGCCCUGCUGUUCGACAGCGGCGAAACAGCCGAGG




CCACCCGGCUGAAGAGAACCGCCAGAAGAAGAUACACCAGACG




GAAGAACCGGAUCUGCUAUCUGCAAGAGAUCUUCAGCAACGAG




AUGGCCAAGGUGGACGACAGCUUCUUCCACAGACUGGAAGAGU




CCUUCCUGGUGGAAGAGGAUAAGAAGCACGAGCGGCACCCCAU




CUUCGGCAACAUCGUGGACGAGGUGGCCUACCACGAGAAGUAC




CCCACCAUCUACCACCUGAGAAAGAAACUGGUGGACAGCACCG




ACAAGGCCGACCUGCGGCUGAUCUAUCUGGCCCUGGCCCACAU




GAUCAAGUUCCGGGGCCACUUCCUGAUCGAGGGCGACCUGAAC




CCCGACAACAGCGACGUGGACAAGCUGUUCAUCCAGCUGGUGC




AGACCUACAACCAGCUGUUCGAGGAAAACCCCAUCAACGCCAG




CGGCGUGGACGCCAAGGCCAUCCUGUCUGCCAGACUGAGCAAG




AGCAGACGGCUGGAAAAUCUGAUCGCCCAGCUGCCCGGCGAGA




AGAAGAAUGGCCUGUUCGGAAACCUGAUUGCCCUGAGCCUGGG




CCUGACCCCCAACUUCAAGAGCAACUUCGACCUGGCCGAGGAU




GCCAAACUGCAGCUGAGCAAGGACACCUACGACGACGACCUGG




ACAACCUGCUGGCCCAGAUCGGCGACCAGUACGCCGACCUGUU




UCUGGCCGCCAAGAACCUGUCCGACGCCAUCCUGCUGAGCGAC




AUCCUGAGAGUGAACACCGAGAUCACCAAGGCCCCCCUGAGCG




CCUCUAUGAUCAAGAGAUACGACGAGCACCACCAGGACCUGAC




CCUGCUGAAAGCUCUCGUGCGGCAGCAGCUGCCUGAGAAGUAC




AAAGAGAUUUUCUUCGACCAGAGCAAGAACGGCUACGCCGGCU




ACAUUGACGGCGGAGCCAGCCAGGAAGAGUUCUACAAGUUCAU




CAAGCCCAUCCUGGAAAAGAUGGACGGCACCGAGGAACUGCUC




GUGAAGCUGAACAGAGAGGACCUGCUGCGGAAGCAGCGGACCU




UCGACAACGGCAGCAUCCCCCACCAGAUCCACCUGGGAGAGCU




GCACGCCAUUCUGCGGCGGCAGGAAGAUUUUUACCCAUUCCUG




AAGGACAACCGGGAAAAGAUCGAGAAGAUCCUGACCUUCCGCA




UCCCCUACUACGUGGGCCCUCUGGCCAGGGGAAACAGCAGAUU




CGCCUGGAUGACCAGAAAGAGCGAGGAAACCAUCACCCCCUGG




AACUUCGAGGAAGUGGUGGACAAGGGCGCUUCCGCCCAGAGCU




UCAUCGAGCGGAUGACCAACUUCGAUAAGAACCUGCCCAACGA




GAAGGUGCUGCCCAAGCACAGCCUGCUGUACGAGUACUUCACC




GUGUAUAACGAGCUGACCAAAGUGAAAUACGUGACCGAGGGA




AUGAGAAAGCCCGCCUUCCUGAGCGGCGAGCAGAAAAAGGCCA




UCGUGGACCUGCUGUUCAAGACCAACCGGAAAGUGACCGUGAA




GCAGCUGAAAGAGGACUACUUCAAGAAAAUCGAGUGCUUCGAC




UCCGUGGAAAUCUCCGGCGUGGAAGAUCGGUUCAACGCCUCCC




UGGGCACAUACCACGAUCUGCUGAAAAUUAUCAAGGACAAGGA




CUUCCUGGACAAUGAGGAAAACGAGGACAUUCUGGAAGAUAU




CGUGCUGACCCUGACACUGUUUGAGGACAGAGAGAUGAUCGAG




GAACGGCUGAAAACCUAUGCCCACCUGUUCGACGACAAAGUGA




UGAAGCAGCUGAAGCGGCGGAGAUACACCGGCUGGGGCAGGCU




GAGCCGGAAGCUGAUCAACGGCAUCCGGGACAAGCAGUCCGGC




AAGACAAUCCUGGAUUUCCUGAAGUCCGACGGCUUCGCCAACA




GAAACUUCAUGCAGCUGAUCCACGACGACAGCCUGACCUUUAA




AGAGGACAUCCAGAAAGCCCAGGUGUCCGGCCAGGGCGAUAGC




CUGCACGAGCACAUUGCCAAUCUGGCCGGCAGCCCCGCCAUUA




AGAAGGGCAUCCUGCAGACAGUGAAGGUGGUGGACGAGCUCG




UGAAAGUGAUGGGCCGGCACAAGCCCGAGAACAUCGUGAUCGA




AAUGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAAC




AGCCGCGAGAGAAUGAAGCGGAUCGAAGAGGGCAUCAAAGAG




CUGGGCAGCCAGAUCCUGAAAGAACACCCCGUGGAAAACACCC




AGCUGCAGAACGAGAAGCUGUACCUGUACUACCUGCAGAAUGG




GCGGGAUAUGUACGUGGACCAGGAACUGGACAUCAACCGGCUG




UCCGACUACGAUGUGGACCAUAUCGUGCCUCAGAGCUUUCUGA




AGGACGACUCCAUCGACAACAAGGUGCUGACCAGAAGCGACAA




GAAUCGGGGCAAGAGCGACAACGUGCCCUCCGAAGAGGUCGUG




AAGAAGAUGAAGAACUACUGGCGGCAGCUGCUGAACGCCAAGC




UGAUUACCCAGAGAAAGUUCGACAAUCUGACCAAGGCCGAGAG




AGGCGGCCUGAGCGAACUGGAUAAGGCCGGCUUCAUCAAGAGA




CAGCUGGUGGAAACCCGGCAGAUCACAAAGCACGUGGCACAGA




UCCUGGACUCCCGGAUGAACACUAAGUACGACGAGAAUGACAA




GCUGAUCCGGGAAGUGAAAGUGAUCACCCUGAAGUCCAAGCUG




GUGUCCGAUUUCCGGAAGGAUUUCCAGUUUUACAAAGUGCGCG




AGAUCAACAACUACCACCACGCCCACGACGCCUACCUGAACGC




CGUCGUGGGAACCGCCCUGAUCAAAAAGUACCCUAAGCUGGAA




AGCGAGUUCGUGUACGGCGACUACAAGGUGUACGACGUGCGGA




AGAUGAUCGCCAAGAGCGAGCAGGAAAUCGGCAAGGCUACCGC




CAAGUACUUCUUCUACAGCAACAUCAUGAACUUUUUCAAGACC




GAGAUUACCCUGGCCAACGGCGAGAUCCGGAAGCGGCCUCUGA




UCGAGACAAACGGCGAAACCGGGGAGAUCGUGUGGGAUAAGG




GCCGGGAUUUUGCCACCGUGCGGAAAGUGCUGAGCAUGCCCCA




AGUGAAUAUCGUGAAAAAGACCGAGGUGCAGACAGGCGGCUU




CAGCAAAGAGUCUAUCCUGCCCAAGAGGAACAGCGAUAAGCUG




AUCGCCAGAAAGAAGGACUGGGACCCUAAGAAGUACGGCGGCU




UCGACAGCCCCACCGUGGCCUAUUCUGUGCUGGUGGUGGCCAA




AGUGGAAAAGGGCAAGUCCAAGAAACUGAAGAGUGUGAAAGA




GCUGCUGGGGAUCACCAUCAUGGAAAGAAGCAGCUUCGAGAAG




AAUCCCAUCGACUUUCUGGAAGCCAAGGGCUACAAAGAAGUGA




AAAAGGACCUGAUCAUCAAGCUGCCUAAGUACUCCCUGUUCGA




GCUGGAAAACGGCCGGAAGAGAAUGCUGGCCUCUGCCGGCGAA




CUGCAGAAGGGAAACGAACUGGCCCUGCCCUCCAAAUAUGUGA




ACUUCCUGUACCUGGCCAGCCACUAUGAGAAGCUGAAGGGCUC




CCCCGAGGAUAAUGAGCAGAAACAGCUGUUUGUGGAACAGCAC




AAGCACUACCUGGACGAGAUCAUCGAGCAGAUCAGCGAGUUCU




CCAAGAGAGUGAUCCUGGCCGACGCUAAUCUGGACAAAGUGCU




GUCCGCCUACAACAAGCACCGGGAUAAGCCCAUCAGAGAGCAG




GCCGAGAAUAUCAUCCACCUGUUUACCCUGACCAAUCUGGGAG




CCCCUGCCGCCUUCAAGUACUUUGACACCACCAUCGACCGGAA




GAGGUACACCAGCACCAAAGAGGUGCUGGACGCCACCCUGAUC




CACCAGAGCAUCACCGGCCUGUACGAGACACGGAUCGACCUGU




CUCAGCUGGGAGGUGACUCUGGAGGAUCUAGCGGAGGAUCCUC




UGGCAGCGAGACACCAGGAACAAGCGAGUCAGCAACACCAGAG




AGCAGUGGCGGCAGCAGCGGCGGCAGCAGCACCCUAAAUAUAG




AAGAUGAGUAUCGGCUACAUGAGACCUCAAAAGAGCCAGAUG




UUUCUCUAGGGUCCACAUGGCUGUCUGAUUUUCCUCAGGCCUG




GGCGGAAACCGGGGGCAUGGGACUGGCAGUUCGCCAAGCUCCU




CUGAUCAUACCUCUGAAAGCAACCUCUACCCCCGUGUCCAUAA




AACAAUACCCCAUGUCACAAGAAGCCAGACUGGGGAUCAAGCC




CCACAUACAGAGACUGUUGGACCAGGGAAUACUGGUACCCUGC




CAGUCCCCCUGGAACACGCCCCUGCUACCCGUUAAGAAACCAG




GGACUAAUGAUUAUAGGCCUGUCCAGGAUCUGAGAGAAGUCA




ACAAGCGGGUGGAGGACAUCCACCCCACCGUGCCCAACCCUUA




CAACCUCUUGAGCGGGCUCCCACCGUCCCACCAGUGGUACACU




GUGCUUGAUUUAAAGGAUGCCUUUUUCUGCCUGAGACUCCACC




CCACCAGUCAGCCUCUCUUCGCCUUUGAGUGGAGAGAUCCAGA




GAUGGGAAUCUCAGGACAAUUGACCUGGACCAGACUCCCACAG




GGUUUCAAAAACAGUCCCACCCUGUUUAAUGAGGCACUGCACA




GAGACCUAGCAGACUUCCGGAUCCAGCACCCAGACUUGAUCCU




GCUACAGUACGUGGAUGACUUACUGCUGGCCGCCACUUCUGAG




CUAGACUGCCAACAAGGUACUCGGGCCCUGUUACAAACCCUAG




GGAACCUCGGGUAUCGGGCCUCGGCCAAGAAAGCCCAAAUUUG




CCAGAAACAGGUCAAGUAUCUGGGGUAUCUUCUAAAAGAGGG




UCAGAGAUGGCUGACUGAGGCCAGAAAAGAGACUGUGAUGGG




GCAGCCUACUCCGAAGACCCCUCGACAACUAAGGGAGUUCCUA




GGGAAGGCAGGCUUCUGUCGCCUCUUCAUCCCUGGGUUUGCAG




AAAUGGCAGCCCCCCUGUACCCUCUCACCAAACCGGGGACUCU




GUUUAAUUGGGGCCCAGACCAACAAAAGGCCUAUCAAGAAAUC




AAGCAAGCCCUUCUAACUGCCCCAGCCCUGGGGUUGCCAGAUU




UGACUAAGCCCUUUGAACUCUUUGUCGACGAGAAGCAGGGCUA




CGCCAAAGGUGUCCUAACGCAAAAACUGGGACCUUGGCGUCGG




CCGGUGGCCUACCUGUCCAAAAAGCUAGACCCAGUAGCAGCUG




GGUGGCCCCCUUGCCUACGGAUGGUAGCAGCCAUUGCCGUACU




GACAAAGGAUGCAGGCAAGCUAACCAUGGGACAGCCACUAGUC




AUUCUGGCCCCCCAUGCAGUAGAGGCACUAGUCAAACAACCCC




CCGACCGCUGGCUUUCCAACGCCCGGAUGACUCACUAUCAGGC




CUUGCUUUUGGACACGGACCGGGUCCAGUUCGGACCGGUGGUA




GCCCUGAACCCGGCUACGCUGCUCCCACUGCCUGAGGAAGGGC




UGCAACACAACUGCCUUGAUAUCCUGGCCGAAGCCCACGGAAC




CCGACCCGACCUAACGGACCAGCCGCUCCCAGACGCCGACCACA




CCUGGUACACGGAUGGAAGCAGUCUCUUACAAGAGGGACAGCG




UAAGGCGGGAGCUGCGGUGACCACCGAGACCGAGGUAAUCUGG




GCUAAAGCCCUGCCAGCCGGGACAUCCGCUCAGCGGGCUGAAC




UGAUAGCACUCACCCAGGCCCUAAAGAUGGCAGAAGGUAAGAA




GCUAAAUGUUUAUACUGAUAGCCGUUAUGCUUUUGCUACUGCC




CAUAUCCAUGGAGAAAUAUACAGAAGGCGUGGGUGGCUCACA




UCAGAAGGCAAAGAGAUCAAAAAUAAAGACGAGAUCUUGGCC




CUACUAAAAGCCCUCUUUCUGCCCAAAAGACUUAGCAUAAUCC




AUUGUCCAGGACAUCAAAAGGGACACAGCGCCGAGGCUAGAGG




CAACCGGAUGGCUGACCAAGCGGCCCGAAAGGCAGCCAUCACA




GAGACUCCAGACACCUCUACCCUCCUCAUAGAAAAUUCAUCAC




CCUCUGGCGGCUCAAAAAGAACCGCCGACGGCAGCGAAUUCGA




GAAAAGGACGGCGGAUGGUAGCGAAUUCGAGAGCCCUAAAAA




GAAGGCCAAGGUAGAGUAA





guide RNA
30436
mA*mC*mA*CAAAUACCAGUCCAGCGGUUUUAGAmGmCmUmAm




GmAmAmAmUmAmGmCAAGUUAAAAUAAGGCUAGUCCGUUAUC




AmAmCmUmUmGmAmAmAmAmAmGmUmGmGmCmAmCmCmGmA




mGmUmCmGmGmUmGmCmU*mU*mU*mU




m = 2′-O-methyl, * = phosphorothioate linkage









Lipid nanoparticle (LNP) components (ionizable lipid, helper lipid, sterol, PEG) were dissolved in 100% ethanol with the lipid component molar ratios of 47:8:43.5:1.5, respectively. RNA (guide and mRNA) was combined in a 1:1 weight ratio and diluted to a concentration of 0.05-0.2 mg/mL in sodium acetate buffer, pH 5. RNA was formulated into distinct LNPs with a lipid amine to total RNA phosphate (N:P) molar ratio of 4.0. The LNPs were formed by microfluidic or turbulent mixing of the lipid and RNA solutions. A 3:1 ratio of aqueous to organic solvent was maintained during mixing using differential flow rates. After mixing, the LNPs were diluted, collected, and buffer exchanged into 50 mM Tris, 9% sucrose buffer using tangential flow filtration. Formulations were concentrated to 1.0 mg/mL or higher and then filtered through a 0.2 μm sterile filter. The final LNPs were stored at −80° C. until further use. The LNP formulations were delivered intravenously by infusion over the course of 1 hour at 2 mg/kg, where the volume of the infusion was 5 mL/kg. Cynomolgus macaques from mainland Asia were given a 2 mg/kg bolus of dexamethasone via intramuscular injection 1.5-2 h prior to intravenous infusion using a syringe pump. Animals were monitored after infusion, and the expression of the Cas9-RT was measured in laparoscopic biopsies taken from the liver 8-12 h, 24 h, and 48 h after infusion. Animals were euthanized 14 days after infusion, and the liver was harvested and divided into 8 different segments, in each of which gene editing activity at the TTR locus was assessed. Expression of the Cas9-RT gene editing polypeptide in liver was quantified by capillary electrophoresis western blot using the ProteinSimple Jess system (Bio-Techne), where Cas9 was detected by a mouse monoclonal antibody (7A9-3A3, Cell Signaling Technology). Relative expression of the Cas9-RT gene editing polypeptide was measured by an area under curve analysis, as shown in FIG. 11.
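The formulation and dosing figures above lend themselves to a short arithmetic check. The following is a minimal sketch, not the procedure actually used: the function names are hypothetical, the 2 mg/kg dose is assumed to refer to total RNA, and the 3.0 kg body weight is an illustrative value that does not appear in this example.

```python
# Illustrative arithmetic only; names and the example body weight are assumptions.

def lipid_mole_fractions(molar_ratios):
    """Convert molar ratio parts (e.g. 47:8:43.5:1.5) into mole fractions."""
    total = sum(molar_ratios.values())
    return {name: parts / total for name, parts in molar_ratios.items()}


def infusion_plan(body_weight_kg, dose_mg_per_kg=2.0,
                  volume_ml_per_kg=5.0, duration_h=1.0):
    """Derive per-animal infusion quantities from weight-normalized values."""
    total_dose_mg = dose_mg_per_kg * body_weight_kg
    total_volume_ml = volume_ml_per_kg * body_weight_kg
    return {
        "total_dose_mg": total_dose_mg,
        "total_volume_ml": total_volume_ml,
        # Concentration of the dosing solution implied by dose and volume.
        "dosing_conc_mg_per_ml": total_dose_mg / total_volume_ml,
        # Syringe pump rate needed to deliver the volume in the stated time.
        "pump_rate_ml_per_h": total_volume_ml / duration_h,
    }


if __name__ == "__main__":
    fractions = lipid_mole_fractions({
        "ionizable lipid": 47.0,
        "helper lipid": 8.0,
        "sterol": 43.5,
        "PEG": 1.5,
    })
    for name, fraction in fractions.items():
        print(f"{name}: {fraction:.3f}")

    # Hypothetical 3.0 kg animal, used only to make the numbers concrete.
    for key, value in infusion_plan(3.0).items():
        print(f"{key}: {value:.2f}")
```

Under these assumptions, a dose of 2 mg/kg in 5 mL/kg implies a dosing solution of 0.4 mg/mL regardless of body weight, which is below the 1.0 mg/mL or higher concentration of the stored formulation.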
Editing of the TTR locus was quantified by amplicon-sequencing of the TTR locus near the binding site of the protospacer. Editing of the TTR locus was observed, as shown in FIG. 12. These experiments demonstrate that the Cas9-RT polypeptide can be expressed in vivo in a non-human primate model and can edit the TTR locus.


Example 8: Evaluating INDEL Activity for SpCas9, SpCas9-NG and SpRY Cas9 Spacers at the F508 Deletion CFTR Gene Locus

This example describes the use of exemplary gene modifying systems containing a Cas-RT fusion protein and single guide RNAs comprising a spacer, and quantifies the indel activity of single guide RNAs in the vicinity of the F508 deletion in the CFTR gene in CFF-16HBEge CFTR F508del (M470) cells.


In this example, a single guide RNA contained:

    • a gRNA spacer; and
    • a gRNA scaffold.
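The two-part structure of a single guide RNA can be illustrated by splitting an unmodified sequence into its spacer and scaffold. The sketch below is illustrative only: the function name is hypothetical, and the scaffold constant is inferred from the 3′ portion shared by the SpyCas9 entries of Table E2A rather than stated separately in this example.

```python
# Scaffold inferred from the common 3' portion of the SpyCas9 rows in Table E2A
# (an assumption of this sketch; the example does not list it on its own).
SPY_SCAFFOLD = (
    "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU"
    "UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"
)


def split_sgrna(sgrna, scaffold=SPY_SCAFFOLD):
    """Return (spacer, scaffold) for an sgRNA that ends with the given scaffold."""
    if not sgrna.endswith(scaffold):
        raise ValueError("sgRNA does not end with the expected scaffold")
    return sgrna[: -len(scaffold)], scaffold


if __name__ == "__main__":
    # Unmodified sequence of RNACS1713 as listed in Table E2A.
    rnacs1713 = (
        "ACCAUUAAAGAAAAUAUCAU"
        "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU"
        "UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"
    )
    spacer, _ = split_sgrna(rnacs1713)
    print(spacer, len(spacer))
```

Guides for other Cas9 orthologs in the table (for example SauCas9 or St1Cas9) end in different scaffolds, so a different `scaffold` argument would be needed for those rows.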


In this example, the Cas-RT fusion proteins contained wildtype SpCas9 (PLV9103), SpCas9-NG (PLV9104), or SpRY Cas9 (PLV9105) domains. The amino acid sequences of polypeptides used in this and other Examples are given in Table F3.


Single guide RNAs generated are given in Table E2. Nucleotide modifications are noted as follows: phosphorothioate linkages denoted by an asterisk, 2′-O-methyl groups denoted by an ‘m’ preceding a nucleotide.


CFF-16HBEge CFTR F508del (M470) cells were received from the CF Foundation. The exemplary gene modifying systems comprising the Cas protein and single guide RNA described above were delivered by nucleofection in RNA format to M470 cells. Specifically, 2 μg of mRNA encoding the Cas-RT fusion protein was combined with 8 μg of guide RNAs. The mRNA and guide RNAs were added to 25 μL P3 buffer containing 200,000 M470 cells and were nucleofected using program CM-138. After nucleofection, cells were grown at 37° C., 5% CO2 in Minimum Essential Medium supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin for three days prior to cell lysis and genomic DNA extraction. To analyze indel activity, which is a proxy for the capability of a spacer to direct gene editing activity, primers flanking the F508 mutation site were used to amplify across the locus. Amplicons were analyzed via short read sequencing using an Illumina MiSeq.
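The indel percentage reported from amplicon sequencing can be thought of as the fraction of aligned reads carrying an insertion or deletion. The sketch below is a simplified illustration, not the analysis pipeline used here (which is not specified): it assumes each read has already been aligned to the reference amplicon and is represented by a SAM-style CIGAR string, and the function names are hypothetical.

```python
import re

# Matches an insertion (I) or deletion (D) operation in a SAM-style CIGAR string.
_INDEL_OP = re.compile(r"\d+[ID]")


def has_indel(cigar):
    """True if the alignment contains at least one insertion or deletion."""
    return _INDEL_OP.search(cigar) is not None


def indel_percent(cigars):
    """Percentage of aligned reads that carry an insertion or deletion."""
    if not cigars:
        return 0.0
    indel_reads = sum(1 for cigar in cigars if has_indel(cigar))
    return 100.0 * indel_reads / len(cigars)


if __name__ == "__main__":
    # Four made-up alignments: two unedited, one deletion, one insertion.
    example = ["150M", "70M3D77M", "80M2I68M", "150M"]
    print(indel_percent(example))
```

A practical analysis would typically also restrict counting to a window around the expected cut site and filter low-quality reads; those steps are omitted from this sketch.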



FIG. 13 shows a graph of INDEL % assessed by Amp-Seq for M470 cells nucleofected with select exemplary gene modifying systems containing different F508 spacer guide RNAs and mRNAs encoding different Cas-RT fusion protein variants, where the systems were designed to assess cutting activity with the different Cas9 domains. The results show that a number of different guide RNA spacer sequences for SpCas9, SpCas9-NG, and SpRY Cas9 produced high INDEL %. Cells nucleofected with SpCas9 guides and SpCas9 mRNA showed a maximum INDEL % of upwards of 90%. Nucleofections with SpRY Cas9 guides and SpRY Cas9 mRNA, or with SpCas9-NG guides and SpCas9-NG mRNA, showed moderate INDEL activity, with maximum activity of approximately 45% for SpRY Cas9 and approximately 30% for SpCas9-NG.


INDEL activity was further evaluated in primary HBE cells. FIG. 14 shows a graph of INDEL percentage as assessed by Amp-Seq for primary HBE cells nucleofected with select exemplary gene modifying systems containing select spacer guide RNAs designed to produce cutting activity at an F508del proximal nick site and mRNA encoding different Cas-RT fusion protein variants (SpCas9, SpRY Cas9, and SpCas9-NG). Similar levels of INDEL activity were observed in the primary HBE cells relative to the M470 cells. The maximum INDEL percentage observed was upwards of 80% for an SpCas9 guide coupled with SpCas9 mRNA.


Taken together, the results demonstrate successful identification of active spacers in M470 and primary HBE cells for targeting the F508 locus using the Cas domains in question. Select spacers were then tested in the context of template RNA, as described below.









TABLE E2







Exemplary Template RNAs and Sequences










RNACS
Name
Sequence
SEQ ID NO





RNACS1700
Nme2Cas9_
mU*mC*mU*rGrUrArUrCrUrArUrArUrUrCrArUrCrArUrArGrGrArGrUrUrGrUrArGrCrUrCrCrCrGrAr
18945



chr7_
ArArCrGrUrUrGrCrUrArCrArArUrArArGrGrCrCrGrUrCrUrGrArArArArGrArUrGrUrGrCrCrGrCrArArC




117559598_−
rGrCrUrCrUrGrCrCrCrCrUrUrArArArGrCrUrUrCrUrGrCrUrUrUrArArGrGrGrGrCrArUrC*mG*mU*mU






RNACS1701
SauCas9_
mG*mC*mU*rUrCrUrGrUrArUrCrUrArUrArUrUrCrArUrCrArUrGrUrUrUrUrArGrUrArCrUrCrUrGrGr
18946



chr7_
ArArArCrArGrArArUrCrUrArCrUrArArArArCrArArGrGrCrArArArArUrGrCrCrGrUrGrUrUrUrArUrCr




117559602_−
UrCrGrUrCrArArCrUrUrGrUrUrGrGrCrG*mA*mG*mA






RNACS1702
SauCas9_
mC*mG*mC*rUrUrCrUrGrUrArUrCrUrArUrArUrUrCrArUrCrArGrUrUrUrUrArGrUrArCrUrCrUrGrGr
18947



chr7_
ArArArCrArGrArArUrCrUrArCrUrArArArArCrArArGrGrCrArArArArUrGrCrCrGrUrGrUrUrUrArUrCr




117559603_−
UrCrGrUrCrArArCrUrUrGrUrUrGrGrCrG*mA*mG*mA






RNACS1703
SauCas9KKH_
mA*mU*mA*rUrUrCrArUrCrArUrArGrGrArArArCrArCrCrGrUrUrUrUrArGrUrArCrUrCrUrGrGrArAr
18948



chr7_
ArCrArGrArArUrCrUrArCrUrArArArArCrArArGrGrCrArArArArUrGrCrCrGrUrGrUrUrUrArUrCrUrCr




117559592_−
GrUrCrArArCrUrUrGrUrUrGrGrCrG*mA*mG*mA






RNACS1704
SauCas9KKH_
mG*mG*mC*rArCrCrArUrUrArArArGrArArArArUrArUrCrGrUrUrUrUrArGrUrArCrUrCrUrGrGrArAr
18949



chr7_
ArCrArGrArArUrCrUrArCrUrArArArArCrArArGrGrCrArArArArUrGrCrCrGrUrGrUrUrUrArUrCrUrCr




117559568_+
GrUrCrArArCrUrUrGrUrUrGrGrCrG*mA*mG*mA






RNACS1705
SauCas9KKH_
mU*mC*mU*rArUrArUrUrCrArUrCrArUrArGrGrArArArCrGrUrUrUrUrArGrUrArCrUrCrUrGrGrArAr
18950



chr7_
ArCrArGrArArUrCrUrArCrUrArArArArCrArArGrGrCrArArArArUrGrCrCrGrUrGrUrUrUrArUrCrUrCr




117559595_−
GrUrCrArArCrUrUrGrUrUrGrGrCrG*mA*mG*mA






RNACS1706
SauCas9KKH_
mU*mC*mU*rGrUrArUrCrUrArUrArUrUrCrArUrCrArUrArGrUrUrUrUrArGrUrArCrUrCrUrGrGrArAr
18951



chr7_
ArCrArGrArArUrCrUrArCrUrArArArArCrArArGrGrCrArArArArUrGrCrCrGrUrGrUrUrUrArUrCrUrCr




117559601_−
GrUrCrArArCrUrUrGrUrUrGrGrCrG*mA*mG*mA






RNACS1707
SauCas9KKH_
mU*mU*mC*rUrGrUrArUrCrUrArUrArUrUrCrArUrCrArUrGrUrUrUrUrArGrUrArCrUrCrUrGrGrArAr
18952



chr7_
ArCrArGrArArUrCrUrArCrUrArArArArCrArArGrGrCrArArArArUrGrCrCrGrUrGrUrUrUrArUrCrUrCr




117559602_−
GrUrCrArArCrUrUrGrUrUrGrGrCrG*mA*mG*mA






RNACS1708
SauCas9KKH_
mC*mU*mU*rCrUrGrUrArUrCrUrArUrArUrUrCrArUrCrArGrUrUrUrUrArGrUrArCrUrCrUrGrGrArAr
18953



chr7_
ArCrArGrArArUrCrUrArCrUrArArArArCrArArGrGrCrArArArArUrGrCrCrGrUrGrUrUrUrArUrCrUrCr




117559603_−
GrUrCrArArCrUrUrGrUrUrGrGrCrG*mA*mG*mA






RNACS1709
SauCas9KKH_
mA*mU*mU*rArUrGrCrCrUrGrGrCrArCrCrArUrUrArArArGrUrUrUrUrArGrUrArCrUrCrUrGrGrArAr
18954



chr7_
ArCrArGrArArUrCrUrArCrUrArArArArCrArArGrGrCrArArArArUrGrCrCrGrUrGrUrUrUrArUrCrUrCr




117559559_+
GrUrCrArArCrUrUrGrUrUrGrGrCrG*mA*mG*mA






RNACS1710
SauCas9KKH_
mG*mC*mU*rUrCrUrGrUrArUrCrUrArUrArUrUrCrArUrCrGrUrUrUrUrArGrUrArCrUrCrUrGrGrArAr
18955



chr7_
ArCrArGrArArUrCrUrArCrUrArArArArCrArArGrGrCrArArArArUrGrCrCrGrUrGrUrUrUrArUrCrUrCr




117559604_−
GrUrCrArArCrUrUrGrUrUrGrGrCrG*mA*mG*mA






RNACS1711
SauCas9KKH_
mG*mA*mU*rUrArUrGrCrCrUrGrGrCrArCrCrArUrUrArArGrUrUrUrUrArGrUrArCrUrCrUrGrGrArAr
18956



chr7_
ArCrArGrArArUrCrUrArCrUrArArArArCrArArGrGrCrArArArArUrGrCrCrGrUrGrUrUrUrArUrCrUrCr




117559558_+
GrUrCrArArCrUrUrGrUrUrGrGrCrG*mA*mG*mA






RNACS1712
SauCas9KKH_
mC*mG*mC*rUrUrCrUrGrUrArUrCrUrArUrArUrUrCrArUrGrUrUrUrUrArGrUrArCrUrCrUrGrGrArAr
18957



chr7_
ArCrArGrArArUrCrUrArCrUrArArArArCrArArGrGrCrArArArArUrGrCrCrGrUrGrUrUrUrArUrCrUrCr




117559605_−
GrUrCrArArCrUrUrGrUrUrGrGrCrG*mA*mG*mA






RNACS1713
SpyCas9_
mA*mC*mC*rArUrUrArArArGrArArArArUrArUrCrArUrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18958



chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559571_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1714
SpyCas9_
mU*mC*mU*rGrUrArUrCrUrArUrArUrUrCrArUrCrArUrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18959



chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559602_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1715
SpyCas9_
mC*mA*mG*rUrUrUrUrCrCrUrGrGrArUrUrArUrGrCrCrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18960



chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559547_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1716
SpyCas9-
mU*mU*mC*rArUrCrArUrArGrGrArArArCrArCrCrArArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18961



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559590_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1717
SpyCas9-
mC*mA*mU*rUrArArArGrArArArArUrArUrCrArUrUrGrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18962



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559573_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1718
SpyCas9-
mA*mU*mU*rCrArUrCrArUrArGrGrArArArCrArCrCrArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18963



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559591_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1719
SpyCas9-
mC*mC*mA*rUrUrArArArGrArArArArUrArUrCrArUrUrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18964



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559572_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1720
SpyCas9-
mU*mA*mU*rUrCrArUrCrArUrArGrGrArArArCrArCrCrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18965



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559592_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1721
SpyCas9-
mA*mU*mA*rUrUrCrArUrCrArUrArGrGrArArArCrArCrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18966



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559593_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1722
SpyCas9-
mC*mA*mC*rCrArUrUrArArArGrArArArArUrArUrCrArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18967



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559570_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1723
SpyCas9-
mU*mA*mU*rArUrUrCrArUrCrArUrArGrGrArArArCrArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18968



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559594_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1724
SpyCas9-
mG*mC*mA*rCrCrArUrUrArArArGrArArArArUrArUrCrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18969



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559569_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1725
SpyCas9-
mC*mU*mA*rUrArUrUrCrArUrCrArUrArGrGrArArArCrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18970



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559595_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1726
SpyCas9-
mG*mG*mC*rArCrCrArUrUrArArArGrArArArArUrArUrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18971



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559568_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1727
SpyCas9-
mU*mC*mU*rArUrArUrUrCrArUrCrArUrArGrGrArArArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18972



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559596_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1728
SpyCas9-
mU*mG*mG*rCrArCrCrArUrUrArArArGrArArArArUrArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18973



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559567_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1729
SpyCas9-
mA*mU*mC*rUrArUrArUrUrCrArUrCrArUrArGrGrArArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18974



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559597_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1730
SpyCas9-
mC*mU*mG*rGrCrArCrCrArUrUrArArArGrArArArArUrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18975



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559566_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1731
SpyCas9-
mU*mA*mU*rCrUrArUrArUrUrCrArUrCrArUrArGrGrArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18976



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559598_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1732
SpyCas9-
mC*mC*mU*rGrGrCrArCrCrArUrUrArArArGrArArArArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18977



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559565_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1733
SpyCas9-
mG*mU*mA*rUrCrUrArUrArUrUrCrArUrCrArUrArGrGrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18978



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559599_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1734
SpyCas9-
mG*mC*mC*rUrGrGrCrArCrCrArUrUrArArArGrArArArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18979



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559564_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1735
SpyCas9-
mU*mG*mU*rArUrCrUrArUrArUrUrCrArUrCrArUrArGrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18980



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559600_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1736
SpyCas9-
mU*mG*mC*rCrUrGrGrCrArCrCrArUrUrArArArGrArArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18981



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559563_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1737
SpyCas9-
mC*mU*mG*rUrArUrCrUrArUrArUrUrCrArUrCrArUrArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18982



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559601_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1738
SpyCas9-
mA*mU*mG*rCrCrUrGrGrCrArCrCrArUrUrArArArGrArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18983



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559562_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1739
SpyCas9-
mU*mA*mU*rGrCrCrUrGrGrCrArCrCrArUrUrArArArGrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18984



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559561_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1740
SpyCas9-
mU*mU*mC*rUrGrUrArUrCrUrArUrArUrUrCrArUrCrArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18985



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559603_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1741
SpyCas9-
mU*mU*mA*rUrGrCrCrUrGrGrCrArCrCrArUrUrArArArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18986



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559560_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1742
SpyCas9-
mC*mU*mU*rCrUrGrUrArUrCrUrArUrArUrUrCrArUrCrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18987



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559604_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1743
SpyCas9-
mA*mU*mU*rArUrGrCrCrUrGrGrCrArCrCrArUrUrArArGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18988



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559559_+
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1744
SpyCas9-
mG*mC*mU*rUrCrUrGrUrArUrCrUrArUrArUrUrCrArUrGrUrUrUrUrArGrArGrCrUrArGrArArArUrAr
18989



SpRY_chr7_
GrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGr




117559605_−
UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC






RNACS1745
St1Cas9_
mU*mU*mC*rUrGrUrArUrCrUrArUrArUrUrCrArUrCrArGrUrCrUrUrUrGrUrArCrUrCrUrGrGrUrArCr
18990



chr7_
CrArGrArArGrCrUrArCrArArArGrArUrArArGrGrCrUrUrCrArUrGrCrCrGrArArArUrCrArArCrArCrCrC




117559603_−
rUrGrUrCrArUrUrUrUrArUrGrGrCrArGrGrGrUrGrU*mU*mU*mU






RNACS1746
St1Cas9_
mC*mU*mU*rCrUrGrUrArUrCrUrArUrArUrUrCrArUrCrGrUrCrUrUrUrGrUrArCrUrCrUrGrGrUrArCrC
18991



chr7_
rArGrArArGrCrUrArCrArArArGrArUrArArGrGrCrUrUrCrArUrGrCrCrGrArArArUrCrArArCrArCrCrCr




117559604_−
UrGrUrCrArUrUrUrUrArUrGrGrCrArGrGrGrUrGrU*mU*mU*mU









Table E2A shows the sequences of E2 without chemical modifications. In some embodiments, the sequences of Table E2A may be used without chemical modifications, or with one or more chemical modifications.
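The relationship between the modified notation of Table E2 and the unmodified sequences of Table E2A can be sketched as a simple string transformation. This is an illustration with a hypothetical function name; it assumes that the lowercase `r` prefix in Table E2 marks an unmodified ribonucleotide, which is not stated explicitly in this example, alongside the stated `m` (2′-O-methyl) and `*` (phosphorothioate linkage) markers.

```python
import re


def strip_modifications(annotated):
    """Remove 'm', 'r', and '*' markers, leaving only the base sequence."""
    bare = re.sub(r"[mr*]", "", annotated)
    # Guard against unexpected characters in the annotated input.
    if not re.fullmatch(r"[ACGU]+", bare):
        raise ValueError("unexpected characters after stripping markers")
    return bare


if __name__ == "__main__":
    # 5' and 3' fragments of RNACS1713 as written in Table E2.
    print(strip_modifications("mA*mC*mC*rArUrUrArArArG"))
    print(strip_modifications("UrGrGrCrArCrCrGrArGrUrCrGrG*mU*mG*mC"))
```

Because the base letters are uppercase and the markers are lowercase or punctuation, the same function also handles the notation of Table 7A1, which omits the `r` prefix.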









TABLE E2A







Exemplary Template RNAs and Sequences without Chemical Modifications













SEQ ID


RNACS
Name
Sequence
NO





RNACS
Nme2Cas9_chr7_
UCUGUAUCUAUAUUCAUCAUAGGAGUUGUAGCUCCCGAAACGUUGCUACAAUAAGGCCGUCUGA
19244


1700
117559598_−
AAAGAUGUGCCGCAACGCUCUGCCCCUUAAAGCUUCUGCUUUAAGGGGCAUCGUU






RNACS
SauCas9_chr7_
GCUUCUGUAUCUAUAUUCAUCAUGUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAA
19245


1701
117559602_−
AAUGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGA






RNACS
SauCas9_chr7_
CGCUUCUGUAUCUAUAUUCAUCAGUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAA
19246


1702
117559603_−
AAUGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGA






RNACS
SauCas9KKH_
AUAUUCAUCAUAGGAAACACCGUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAA
19247


1703
chr7_
UGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGA




117559592_−







RNACS
SauCas9KKH_
GGCACCAUUAAAGAAAAUAUCGUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAA
19248


1704
chr7_
UGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGA




117559568_+







RNACS
SauCas9KKH_
UCUAUAUUCAUCAUAGGAAACGUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAA
19249


1705
chr7_
UGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGA




117559595_−







RNACS
SauCas9KKH_
UCUGUAUCUAUAUUCAUCAUAGUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAA
19250


1706
chr7_
UGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGA




117559601_−







RNACS
SauCas9KKH_
UUCUGUAUCUAUAUUCAUCAUGUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAA
19251


1707
chr7_
UGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGA




117559602_−







RNACS
SauCas9KKH_
CUUCUGUAUCUAUAUUCAUCAGUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAA
19252


1708
chr7_
UGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGA




117559603_−







RNACS
SauCas9KKH_
AUUAUGCCUGGCACCAUUAAAGUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAA
19253


1709
chr7_
UGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGA




117559559_+







RNACS
SauCas9KKH_
GCUUCUGUAUCUAUAUUCAUCGUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAA
19254


1710
chr7_
UGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGA




117559604_−







RNACS
SauCas9KKH_
GAUUAUGCCUGGCACCAUUAAGUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAA
19255


1711
chr7_
UGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGA




117559558_+







RNACS
SauCas9KKH_
CGCUUCUGUAUCUAUAUUCAUGUUUUAGUACUCUGGAAACAGAAUCUACUAAAACAAGGCAAAA
19256


1712
chr7_
UGCCGUGUUUAUCUCGUCAACUUGUUGGCGAGA




117559605_−







RNACS
SpyCas9_chr7_
ACCAUUAAAGAAAAUAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19257


1713
117559571_+
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC






RNACS
SpyCas9_chr7_
UCUGUAUCUAUAUUCAUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19258


1714
117559602_−
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC






RNACS
SpyCas9_chr7_
CAGUUUUCCUGGAUUAUGCCGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU
19259


1715
117559547_+
AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC






RNACS
SpyCas9−
UUCAUCAUAGGAAACACCAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU
19260


1716
SpRY_chr7_
AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559590_−







RNACS
SpyCas9−
CAUUAAAGAAAAUAUCAUUGGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19261


1717
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559573_+







RNACS
SpyCas9−
AUUCAUCAUAGGAAACACCAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU
19262


1718
SpRY_chr7_
AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559591_−







RNACS
SpyCas9−
CCAUUAAAGAAAAUAUCAUUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19263


1719
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559572_+







RNACS
SpyCas9−
UAUUCAUCAUAGGAAACACCGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU
19264


1720
SpRY_chr7_
AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559592_−







RNACS
SpyCas9−
AUAUUCAUCAUAGGAAACACGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19265


1721
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559593_−







RNACS
SpyCas9−
CACCAUUAAAGAAAAUAUCAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19266


1722
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559570_+







RNACS
SpyCas9−
UAUAUUCAUCAUAGGAAACAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19267


1723
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559594_−







RNACS
SpyCas9−
GCACCAUUAAAGAAAAUAUCGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19268


1724
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559569_+







RNACS
SpyCas9−
CUAUAUUCAUCAUAGGAAACGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19269


1725
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559595_−







RNACS
SpyCas9−
GGCACCAUUAAAGAAAAUAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19270


1726
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559568_+







RNACS
SpyCas9−
UCUAUAUUCAUCAUAGGAAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19271


1727
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559596_−







RNACS
SpyCas9−
UGGCACCAUUAAAGAAAAUAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19272


1728
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559567_+







RNACS
SpyCas9−
AUCUAUAUUCAUCAUAGGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19273


1729
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559597_−







RNACS
SpyCas9−
CUGGCACCAUUAAAGAAAAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19274


1730
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559566_+







RNACS
SpyCas9−
UAUCUAUAUUCAUCAUAGGAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19275


1731
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559598_−







RNACS
SpyCas9−
CCUGGCACCAUUAAAGAAAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU
19276


1732
SpRY_chr7_
AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559565_+







RNACS
SpyCas9−
GUAUCUAUAUUCAUCAUAGGGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19277


1733
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559599_−







RNACS
SpyCas9−
GCCUGGCACCAUUAAAGAAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU
19278


1734
SpRY_chr7_
AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559564_+







RNACS
SpyCas9−
UGUAUCUAUAUUCAUCAUAGGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19279


1735
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559600_−







RNACS
SpyCas9−
UGCCUGGCACCAUUAAAGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU
19280


1736
SpRY_chr7_
AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559563_+







RNACS
SpyCas9−
CUGUAUCUAUAUUCAUCAUAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19281


1737
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559601_−







RNACS
SpyCas9−
AUGCCUGGCACCAUUAAAGAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU
19282


1738
SpRY_chr7_
AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559562_+







RNACS
SpyCas9−
UAUGCCUGGCACCAUUAAAGGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU
19283


1739
SpRY_chr7_
AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559561_+







RNACS
SpyCas9−
UUCUGUAUCUAUAUUCAUCAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19284


1740
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559603_−







RNACS
SpyCas9−
UUAUGCCUGGCACCAUUAAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU
19285


1741
SpRY_chr7_
AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559560_+







RNACS
SpyCas9−
CUUCUGUAUCUAUAUUCAUCGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU
19286


1742
SpRY_chr7_
AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559604_−







RNACS
SpyCas9−
AUUAUGCCUGGCACCAUUAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU
19287


1743
SpRY_chr7_
AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559559_+







RNACS
SpyCas9−
GCUUCUGUAUCUAUAUUCAUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU
19288


1744
SpRY_chr7_
UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC




117559605_−







RNACS
St1Cas9_chr7_
UUCUGUAUCUAUAUUCAUCAGUCUUUGUACUCUGGUACCAGAAGCUACAAAGAUAAGGCUUCAU
19289


1745
117559603_−
GCCGAAAUCAACACCCUGUCAUUUUAUGGCAGGGUGUUUU






RNACS
St1Cas9_chr7_
CUUCUGUAUCUAUAUUCAUCGUCUUUGUACUCUGGUACCAGAAGCUACAAAGAUAAGGCUUCAU
19290


1746
117559604_−
GCCGAAAUCAACACCCUGUCAUUUUAUGGCAGGGUGUUUU









Example 9: Evaluating Rewriting Activity of Exemplary Human Template RNAs for F508 T to TCTT Insertion

This example describes the use of exemplary gene modifying systems containing a gene modifying polypeptide and various different template RNAs comprising spacer, varied lengths of heterologous object sequences, and PBS sequences to quantify the activity of template RNAs for correction of the F508 deletion in the CFTR gene in CFF-16HBEge CFTR F508del (M470) cells.


In this example, a template RNA contained:

    • a gRNA spacer;
    • a gRNA scaffold;
    • a heterologous object sequence; and
    • a primer binding site (PBS) sequence.
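
The four components listed above can be pictured as a simple concatenation. The following is a minimal Python sketch, assuming the 5′-to-3′ order recited in claim 1 (spacer, scaffold, heterologous object sequence, PBS). The class name `TemplateRNA` is hypothetical, the `N` runs are placeholders rather than real sequences from Table E3, and the scaffold string is simply the 3′ tail shared by the SpyCas9 gRNA entries in the preceding table, used here for illustration only.

```python
from dataclasses import dataclass

RNA_ALPHABET = set("ACGUN")  # N is used here only as a placeholder base


@dataclass(frozen=True)
class TemplateRNA:
    """Hypothetical container for the four template RNA components."""
    spacer: str
    scaffold: str
    heterologous_object: str
    pbs: str

    def __post_init__(self):
        # Reject empty components or characters outside the RNA alphabet.
        for name, seq in self.parts():
            if not seq or not set(seq) <= RNA_ALPHABET:
                raise ValueError(f"{name} must be a non-empty RNA string")

    def parts(self):
        """Return (name, sequence) pairs in 5'-to-3' order."""
        return [
            ("spacer", self.spacer),
            ("scaffold", self.scaffold),
            ("heterologous_object", self.heterologous_object),
            ("pbs", self.pbs),
        ]

    def sequence(self):
        """Concatenate the components 5' to 3'."""
        return "".join(seq for _, seq in self.parts())


# Placeholder lengths only: 20-nt spacer, 13-nt heterologous object
# sequence, and 17-nt PBS (lengths taken from claims 3, 6, and 8).
example = TemplateRNA(
    spacer="N" * 20,
    scaffold=(
        "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU"
        "UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"
    ),
    heterologous_object="N" * 13,
    pbs="N" * 17,
)
print(len(example.sequence()))
```

Keeping the components as separate fields makes it straightforward to vary one element, such as the PBS length, while holding the others fixed.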


In this example, a gene modifying polypeptide contained:

    • an endonuclease and/or DNA binding domain;
    • a peptide linker; and
    • a reverse transcriptase (RT) domain.


Exemplary template RNAs generated are given in Table E3. Nucleotide modifications are noted as follows: phosphorothioate linkages denoted by an asterisk, 2′-O-methyl groups denoted by an ‘m’ preceding a nucleotide. The exemplary gene modifying polypeptide was PLV8279 listed in Table F3.


CFF-16HBEge CFTR F508del (M470) cells were received from the CF Foundation. The gene modifying system, comprising an mRNA encoding a compatible gene modifying polypeptide and a template RNA described above, was delivered by nucleofection in RNA format to M470 cells. Specifically, 2 ng of mRNA encoding gene modifying polypeptide was combined with 8 ng of template RNAs. The mRNA and template RNAs were added to 25 μL P3 buffer containing 200,000 M470 cells and were nucleofected using program CM-138. After nucleofection, cells were grown at 37° C., 5% CO2 in Minimum Essential Medium supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin for three days prior to cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the F508 mutation site were used to amplify across the locus. Amplicons were analyzed via short read sequencing using an Illumina MiSeq.
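
As an illustration of how an editing percentage can be derived from amplicon reads, the following Python sketch classifies reads by exact substring match and reports the fraction carrying the corrected sequence. It is a simplified stand-in and not the analysis pipeline used in this example: the function names, the toy reads, and the assumed sequence context around the F508 site (the F508del allele read as ...ATCATTGGTGTT... and the corrected allele as ...ATCATCTTTGGTGTT...) are all assumptions made for illustration.

```python
# Assumed DNA context around the F508 site (coding strand); illustrative only.
UNEDITED = "ATCATTGGTGTT"      # F508del allele, before the T to TCTT edit
CORRECTED = "ATCATCTTTGGTGTT"  # same context after insertion of CTT


def classify_read(read):
    """Label a read 'corrected', 'unedited', or 'other' by substring match."""
    if CORRECTED in read:
        return "corrected"
    if UNEDITED in read:
        return "unedited"
    return "other"


def percent_editing(reads):
    """Percentage of all supplied reads that carry the corrected sequence."""
    if not reads:
        raise ValueError("no reads supplied")
    corrected = sum(1 for read in reads if classify_read(read) == "corrected")
    return 100.0 * corrected / len(reads)


# Hypothetical toy reads: one corrected, two unedited, one unrelated.
toy_reads = [
    "GAAAATATCATCTTTGGTGTTTCC",
    "GAAAATATCATTGGTGTTTCC",
    "GAAAATATCATTGGTGTTTCC",
    "GGGGGGGGGGGGGGGGGGGG",
]
print(percent_editing(toy_reads))
```

A production analysis would typically align reads to the reference amplicon and account for sequencing errors and indels rather than rely on exact matching.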



FIG. 15 shows a graph of editing percentage as assessed by Amp-Seq for M470 cells nucleofected with exemplary gene modifying systems containing F508 template RNAs designed to produce 3-nucleotide CTT insertions at the nick site to correct the F508 mutation from T to TCTT. The results show that a number of F508 CTT insertion template RNAs facilitated editing activity, with % editing values ranging up to approximately 15%. The best performing tgRNA (RNACS3684) achieved a rewriting rate of ~15%, followed by RNACS2126, RNACS3688, and RNACS3687 with rewriting rates of ~9-11%. Generally, template RNAs containing longer PBS sequences (>15 nt) produced significantly higher rewriting activity than those with shorter PBS sequences.


Editing activity was further evaluated in primary HBE cells. FIG. 16 shows a graph of editing percentage as assessed by Amp-Seq for primary HBE cells nucleofected with exemplary gene modifying systems containing F508 template RNAs, selected from the screen in M470 cells, that were designed to produce 3-nucleotide CTT insertions at the nick site to correct the F508 mutation. The results show that a number of F508 CTT insertion template RNAs facilitated detectable editing activity, at levels lower than those observed in M470 cells (less than 0.6%).


Taken together, the results demonstrate successful precise CTT insertion to correct the F508 mutation with a number of template RNAs in M470 cells, and lower but detectable editing in primary HBE cells.


Example 10: Second Nick gRNA Screen for T to TCTT Edit Increase

This example describes the use of exemplary gene modifying systems containing a gene modifying polypeptide, an exemplary template RNA, and various different second nick gRNAs comprising different spacers to quantify the activity of the combinations of template RNA and second nick gRNAs for correction of the F508 deletion in the CFTR gene in CFF-16HBEge CFTR F508del (M470) cells.


In this example, a template RNA contained:

    • a gRNA spacer;
    • a gRNA scaffold;
    • a heterologous object sequence; and
    • a primer binding site (PBS) sequence.


In this example, a second nick gRNA (ngRNA) contained:

    • a gRNA spacer; and
    • a gRNA scaffold.
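
The spacer-plus-scaffold layout can be checked against entries in the preceding table. The following Python sketch is illustrative only: it assumes a 20-nucleotide spacer at the 5′ end, the function name `split_grna` is hypothetical, and the two sequences are those of the SpyCas9 entries RNACS1736 and RNACS1740, joined from the wrapped lines of the table.

```python
SPACER_LENGTH = 20  # assumption: the first 20 nt of each entry are the spacer

# Full gRNA sequences for two SpyCas9 entries, written as the two
# wrapped pieces that appear in the table and joined here.
GRNAS = {
    "RNACS1736": (
        "UGCCUGGCACCAUUAAAGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUU"
        "AUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"
    ),
    "RNACS1740": (
        "UUCUGUAUCUAUAUUCAUCAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGU"
        "UAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"
    ),
}


def split_grna(grna, spacer_length=SPACER_LENGTH):
    """Split a gRNA into (spacer, scaffold) at a fixed spacer length."""
    if len(grna) <= spacer_length:
        raise ValueError("gRNA is not longer than the assumed spacer length")
    return grna[:spacer_length], grna[spacer_length:]


spacers = {}
scaffolds = set()
for name, grna in GRNAS.items():
    spacer, scaffold = split_grna(grna)
    spacers[name] = spacer
    scaffolds.add(scaffold)

# Both entries should differ in the spacer but share one scaffold.
print(spacers)
print(len(scaffolds))
```

Under the 20-nucleotide assumption, entries that target different sites differ only in the spacer, while the scaffold that binds the Cas domain is shared.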


In this example, a gene modifying polypeptide contained:

    • an endonuclease and/or DNA binding domain;
    • a peptide linker; and
    • a reverse transcriptase (RT) domain.


Exemplary template RNAs generated are given in Table E3 and ngRNAs generated are given in Table G3. Nucleotide modifications are noted as follows: phosphorothioate linkages denoted by an asterisk, 2′-O-methyl groups denoted by an ‘m’ preceding a nucleotide. The exemplary gene modifying polypeptide was PLV8279 listed in Table F3.


CFF-16HBEge CFTR F508del (M470) cells were received from the CF Foundation. The gene modifying system, comprising an mRNA encoding a compatible gene modifying polypeptide, an exemplary template RNA, and various second nick gRNAs described above, was delivered by nucleofection in RNA format to M470 cells. Specifically, 2 μg of mRNA encoding the gene modifying polypeptide was combined with 8 μg of template RNA and 8 μg of second nick gRNA. The mRNA and template RNAs were added to 25 μL P3 buffer containing 200,000 M470 cells and were nucleofected using program CM-138. After nucleofection, cells were grown at 37° C., 5% CO2 in Minimum Essential Medium supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin for three days prior to cell lysis and genomic DNA extraction. To analyze gene editing activity, primers flanking the F508 mutation site were used to amplify across the locus. Amplicons were analyzed via short read sequencing using an Illumina MiSeq.



FIG. 17 shows a graph of rewriting percentage as assessed by amplicon sequencing for M470 cells nucleofected with exemplary gene modifying systems containing an F508 template RNA designed to produce 3-nucleotide CTT insertions at the nick site to correct the F508 mutation from T to TCTT, along with various different second nick gRNAs. The results show that several second nick gRNAs improved rewriting, with an increase in rewriting of up to 2-fold. For example, use of second nick gRNA RNACS2288 increased rewriting activity from 15% to >30% in M470 cells.


Editing activity was further evaluated using various template RNAs combined with the best performing second nick gRNA (RNACS2288) in primary HBE cells. FIG. 18 shows a graph of editing percentage as assessed by Amp-Seq for primary HBE cells nucleofected with exemplary gene modifying systems containing select F508 template RNAs designed to produce a 3-nucleotide CTT insertion at the F508 mutation, along with second nick gRNA RNACS2288. The results show that rewriting activity in primary cells is significantly improved by the addition of a second nick gRNA (compare to levels seen in FIG. 16). For example, without a second nick gRNA, the tgRNAs RNACS3684 and RNACS2126 yielded low rewriting rates (<0.4%). With the addition of RNACS2288, rewriting activity was improved to ~11% in primary HBE cells.
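
The size of the improvement can be worked out from the reported values. The following Python sketch is a simple arithmetic illustration; the function name `fold_change` is hypothetical, and the inputs are the approximate bounds quoted in this example (15% to >30% in M470 cells, and <0.4% to approximately 11% in primary HBE cells), so the results are rough estimates rather than measured fold changes.

```python
def fold_change(baseline_pct, treated_pct):
    """Ratio of the rewriting percentage with versus without a second nick gRNA."""
    if baseline_pct <= 0:
        raise ValueError("baseline percentage must be positive")
    return treated_pct / baseline_pct


# M470 cells: 15% without the second nick gRNA, just over 30% with it.
m470_fold = fold_change(15.0, 30.0)

# Primary HBE cells: under 0.4% without, approximately 11% with.
# Because 0.4% is an upper bound on the baseline, this ratio is a rough
# lower bound on the improvement.
primary_fold = fold_change(0.4, 11.0)

print(f"M470: about {m470_fold:.1f}-fold")
print(f"Primary HBE: at least about {primary_fold:.1f}-fold")
```

On these figures, the relative benefit of the second nick gRNA is much larger in primary HBE cells than in M470 cells, even though the absolute rewriting percentage remains lower.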


Taken together, the results demonstrate successful precise CTT insertion to correct the F508 mutation with a number of template RNAs and ngRNAs in M470 and primary HBE cells. The results further demonstrate that addition of a second nick gRNA can dramatically improve rewriting efficiency relative to a gene modifying system lacking the second nick gRNA, e.g., in primary cells, e.g., primary HBE cells.


It should be understood that for all numerical bounds describing some parameter in this application, such as “about,” “at least,” “less than,” and “more than,” the description also necessarily encompasses any range bounded by the recited values. Accordingly, for example, the description “at least 1, 2, 3, 4, or 5” also describes, inter alia, the ranges 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, and 4-5, et cetera.
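
The range enumeration in the example above can be reproduced mechanically. The following Python sketch is illustrative only; the function name `implied_ranges` is hypothetical, and it covers just the pairwise ranges listed in the text, not the open-ended "et cetera".

```python
from itertools import combinations


def implied_ranges(values):
    """List every range bounded by two distinct recited values."""
    ordered = sorted(set(values))
    return [f"{low}-{high}" for low, high in combinations(ordered, 2)]


print(implied_ranges([1, 2, 3, 4, 5]))
```

For the recited values 1 through 5, this yields the same ten ranges, in the same order, as the list given in the text.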


For all patents, applications, or other references cited herein, such as non-patent literature and reference sequence information, it should be understood that they are incorporated by reference in their entirety for all purposes as well as for the proposition that is recited. Where any conflict exists between a document incorporated by reference and the present application, this application will control. All information associated with reference gene sequences disclosed in this application, such as GeneIDs or accession numbers (typically referencing NCBI accession numbers), including, for example, genomic loci, genomic sequences, functional annotations, allelic variants, and reference mRNA (including, e.g., exon boundaries or response elements) and protein sequences (such as conserved domain structures), as well as chemical references (e.g., PubChem compound, PubChem substance, or PubChem Bioassay entries, including the annotations therein, such as structures and assays, et cetera), are hereby incorporated by reference in their entirety.


Headings used in this application are for convenience only and do not affect the interpretation of this application.










LENGTHY TABLES




The patent application contains a lengthy table section. A copy of the table is available in electronic form from the USPTO web site (). An electronic copy of the table will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).





Claims
  • 1. A template RNA comprising, from 5′ to 3′: a) a gRNA spacer that is complementary to a first portion of the human CFTR gene, wherein the gRNA spacer comprises an RNA sequence according to SEQ ID NO: 19,587; b) gRNA scaffold that binds a SpyCas9 domain; c) a heterologous object sequence comprising a mutation region to correct a mutation in a second portion of the human CFTR gene, wherein the heterologous object sequence comprises a nucleotide sequence of the RT region of SEQ ID NO: 19,479 and d) a primer binding site (PBS) sequence comprising at least 3 bases with 100% identity to a third portion of the human CFTR gene, wherein the PBS sequence comprises a nucleotide sequence of the PBS sequence of SEQ ID NO: 19,479.
  • 2. The template RNA of claim 1, wherein the mutation to be corrected in the human CFTR gene is a F508del mutation.
  • 3. The template RNA of claim 1, wherein the gRNA spacer has a length of 20 nucleotides.
  • 4. The template RNA of claim 1, wherein the heterologous object sequence has a length of 13-16 nucleotides.
  • 5. The template RNA of claim 1, wherein the heterologous object sequence comprises, from 5′ to 3′, a post-edit homology region, a mutation region, and a pre-edit homology region.
  • 6. The template RNA of claim 1, wherein the heterologous object sequence has a length of 13 nucleotides.
  • 7. The template RNA of claim 1, wherein the PBS sequence has a length of 17-20 nucleotides.
  • 8. The template RNA of claim 1, wherein the PBS sequence has a length of 17 nucleotides.
  • 9. The template RNA of claim 1, wherein the gRNA scaffold comprises an RNA sequence having at least 90% identity to SEQ ID NO: 19,588.
  • 10. The template RNA of claim 1, wherein the gRNA scaffold comprises an RNA sequence according to SEQ ID NO: 19,588.
  • 11. The template RNA of claim 1, which comprises an RNA sequence having at least 90% identity to SEQ ID NO: 19,479.
  • 12. The template RNA of claim 1, which comprises an RNA sequence according to SEQ ID NO: 19,479.
  • 13. The template RNA of claim 1, which comprises a region designed to inactivate a PAM sequence.
  • 14. The template RNA of claim 1, which comprises one or more chemically modified nucleotides.
  • 15. The template RNA of claim 14, which comprises the RNA sequence and chemical modifications set out in SEQ ID NO: 19,180.
  • 16. A gene modifying system comprising: a template RNA of claim 1, and a gene modifying polypeptide, or a nucleic acid encoding the gene modifying polypeptide.
  • 17. The gene modifying system of claim 16, which comprises the nucleic acid encoding the gene modifying polypeptide, wherein the nucleic acid comprises RNA.
  • 18. The gene modifying system of claim 16, wherein the gene modifying polypeptide comprises: a reverse transcriptase (RT) domain; a Cas domain; and a linker disposed between the RT domain and the Cas domain.
  • 19. The gene modifying system of claim 18, wherein the Cas domain is a SpyCas9 domain.
  • 20. The gene modifying system of claim 18, wherein the RT domain is an RT domain from a murine leukemia virus (MMLV), a porcine endogenous retrovirus (PERV), avian reticuloendotheliosis virus (AVIRE), a feline leukemia virus (FLV), simian foamy virus (SFV) (e.g., SFV3L), bovine leukemia virus (BLV), Mason-Pfizer monkey virus (MPMV), human foamy virus (HFV), or bovine foamy/syncytial virus (BFV/BSV).
  • 21. The gene modifying system of claim 18, which further comprises a second strand-targeting gRNA spacer that directs a second nick to the second strand of the human CFTR gene.
  • 22. A pharmaceutical composition, comprising the gene modifying system of claim 16 and a pharmaceutically acceptable excipient or carrier.
  • 23. The pharmaceutical composition of claim 22, wherein the pharmaceutically acceptable excipient or carrier is selected from the group consisting of a plasmid vector, a viral vector, a vesicle, and a lipid nanoparticle.
  • 24. A method of making the template RNA of claim 1, the method comprising synthesizing the template RNA by in vitro transcription, solid-phase synthesis, or by introducing a DNA encoding the template RNA into a host cell under conditions that allow for production of the template RNA.
  • 25. A method for modifying a target site in the human CFTR gene in a cell, the method comprising contacting the cell with the gene modifying system of claim 16, or DNA encoding the same, thereby modifying the target site in the human CFTR gene in a cell.
  • 26. A method for treating a subject having a disease or condition associated with a mutation in the human CFTR gene, the method comprising administering to the subject the gene modifying system of claim 16, or DNA encoding the same, thereby treating the subject having a disease or condition associated with a mutation in the human CFTR gene.
  • 27. A template RNA comprising, e.g., from 5′ to 3′: (i) a gRNA spacer that is complementary to a first portion of the human CFTR gene, wherein the gRNA spacer has a sequence comprising the core nucleotides of a gRNA spacer sequence of Table 1, or a sequence having 1, 2, or 3 substitutions thereto, and optionally comprises one or more consecutive nucleotides starting with the 3′ end of the flanking nucleotides of the gRNA spacer, or wherein the gRNA spacer has a sequence of a spacer chosen from Table E2, E2A, E3, or E3A, or a sequence having 1, 2, or 3 substitutions thereto; (ii) a gRNA scaffold that binds a gene modifying polypeptide, (iii) a heterologous object sequence comprising a mutation region to introduce a mutation into a second portion of the human CFTR gene, and (iv) a primer binding site (PBS) sequence comprising at least 3, 4, 5, 6, 7, or 8 bases with 100% identity to a third portion of the human CFTR gene, wherein the gRNA spacer has a sequence other than SEQ ID NO: 19,587, the heterologous object sequence comprises a nucleotide sequence other than the RT region of SEQ ID NO: 19,479, and the PBS sequence comprises a nucleotide sequence other than the PBS sequence of SEQ ID NO: 19,479.
  • 28. A gene modifying system comprising: a template RNA of claim 27, and a gene modifying polypeptide, or a nucleic acid encoding the gene modifying polypeptide.
  • 29. A method for modifying a target site in the human CFTR gene in a cell, the method comprising contacting the cell with the gene modifying system of claim 28, or DNA encoding the same, thereby modifying the target site in the human CFTR gene in a cell.
  • 30. A method for treating a subject having a disease or condition associated with a mutation in the human CFTR gene, the method comprising administering to the subject the gene modifying system of claim 28, or DNA encoding the same, thereby treating the subject having a disease or condition associated with a mutation in the human CFTR gene.
CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/US2022/081316, filed Dec. 9, 2022, which claims the benefit of U.S. Provisional Application No. 63/288,474, filed Dec. 10, 2021, U.S. Provisional Application No. 63/303,941, filed Jan. 27, 2022, and U.S. Provisional Application No. 63/374,832, filed Sep. 7, 2022. The contents of the aforementioned applications are hereby incorporated by reference in their entirety.

Provisional Applications (3)
Number Date Country
63374832 Sep 2022 US
63303941 Jan 2022 US
63288474 Dec 2021 US
Continuations (1)
Number Date Country
Parent PCT/US22/81316 Dec 2022 US
Child 18470687 US