Method for identifying DNA base editing by means of cytosine deaminase

Information

  • Patent Grant
  • 11920151
  • Patent Number
    11,920,151
  • Date Filed
    Wednesday, September 13, 2017
  • Date Issued
    Tuesday, March 5, 2024
Abstract
Provided are: a composition for DNA double-strand breaks (DSBs), comprising (1) a cytosine deaminase and an inactivated target-specific endonuclease, (2) a guide RNA, and (3) a uracil-specific excision reagent (USER); a method for producing DNA double-strand breaks by means of a cytosine deaminase using the composition; a method for analyzing a DNA nucleic acid sequence to which base editing has been introduced by means of a cytosine deaminase; and a method for identifying (or measuring or detecting) base editing, base editing efficiency at an on-target site, an off-target site, and/or target specificity by means of a cytosine deaminase.
Description
TECHNICAL FIELD

Provided are: a composition for DNA double-strand breaks (DSBs), comprising (1) a cytosine deaminase and an inactivated target-specific endonuclease, (2) a guide RNA, and (3) a uracil-specific excision reagent (USER); a method of generating DNA double-strand breaks by means of a cytosine deaminase using the composition; a method for analyzing a DNA nucleic acid sequence to which base editing has been introduced by means of a cytosine deaminase; and a method for identifying (or measuring or detecting) base editing site, base editing efficiency at on-target site, an off-target site, and/or target-specificity, by means of a cytosine deaminase.


BACKGROUND ART

Cas9-linked deaminases enable single-nucleotide conversions in a targeted manner to correct point mutations causing genetic disorders or introduce single-nucleotide variations of interest in human and other eukaryotic cells. Genome-wide target-specificities of these RNA-programmable deaminases, however, remain largely unknown.


Four different classes of programmable deaminases have been reported to date: 1) base editors (BEs) comprising catalytically-deficient Cas9 (dCas9) derived from S. pyogenes or D10A Cas9 nickase (nCas9) and rAPOBEC1, a cytidine deaminase from rat, 2) target-AID (activation-induced cytidine deaminase) comprising dCas9 or nCas9 and PmCDA1, an AID ortholog from sea lamprey, or human AID, 3) CRISPR-X composed of dCas9 and sgRNAs linked to MS2 RNA hairpins to recruit a hyperactive AID variant fused to MS2-binding protein, and 4) zinc-finger proteins or transcription activator-like effectors (TALEs) fused to a cytidine deaminase.


A programmable deaminase, consisting of a DNA binding module and a cytidine deaminase, enables targeted nucleotide substitution or base editing in the genome without generating DNA double strand breaks (DSBs). Unlike programmable nucleases such as CRISPR-Cas9 and ZFNs, which induce small insertions or deletions (indels) at the target site, programmable deaminases are able to convert C to T(U) (or, at a lower frequency, C to G or A) within a window of several nucleotides at a target site. Programmable deaminases can correct point mutations that cause genetic disorders in human cells, animals and plants, or can generate single nucleotide polymorphisms (SNPs).
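The windowed C-to-U conversion described above can be illustrated with a minimal sketch. The function name, the example sequence, and the default window of positions 4 to 8 (counted from the PAM-distal end) are assumptions chosen for illustration only; the actual editing window depends on the deaminase and is not specified in this passage.

```python
def deaminate_window(protospacer, window=(4, 8)):
    """Convert every C inside the (1-based, inclusive) window to U.

    `window` is an illustrative assumption, not a value taken from the text.
    """
    start, end = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        if base == "C" and start <= position <= end:
            edited.append("U")  # deaminated cytosine
        else:
            edited.append(base)
    return "".join(edited)


# Hypothetical 20-nt target sequence on the PAM-containing strand.
example = "AACCGTCCATGCATGCATGC"
print(deaminate_window(example))  # cytosines at positions 4, 7 and 8 become U
```

Cytosines outside the window (for example, position 3 in the toy sequence) are left untouched, which is the behavior the paragraph describes qualitatively.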


Despite broad interest in base editing by programmable deaminases, no means has been developed for analyzing the target-specificity of programmable deaminases across the whole genome. Therefore, technologies are required to analyze the genome-wide target-specificity of programmable deaminases, and thereby the base editing efficiency, off-target sites, and off-target effects of programmable deaminases.


DISCLOSURE
Technical Problem

In this description, provided are technologies for analyzing target-specificity of a programmable deaminase to whole genome, and for analyzing base editing efficiency, off-target site, off-target effect, and the like of a programmable deaminase.


An embodiment provides a composition for DNA double strand breaks (DSBs) comprising (1) a cytosine deaminase and an inactivated target-specific endonuclease, or a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, or a plasmid comprising a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene; (2) a guide RNA; and (3) a uracil-specific excision reagent (USER).


Another embodiment provides a method of generating DNA double strand break, the method comprising:

    • (i) introducing or contacting (a) a cytosine deaminase and an inactivated target-specific endonuclease, or (b) a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, or (c) a plasmid comprising a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, into a cell or with DNA isolated from a cell, together with a guide RNA; and
    • (ii) treating the DNA with a uracil-specific excision reagent (USER).


Another embodiment provides a method of analyzing nucleic acid sequence of DNA in which a base editing is introduced by cytosine deaminase, comprising:

    • (i) introducing or contacting (a) a cytosine deaminase and an inactivated target-specific endonuclease, or (b) a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, or (c) a plasmid comprising a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, into a cell or with DNA isolated from a cell, together with a guide RNA;
    • (ii) treating the DNA with a uracil-specific excision reagent (USER), to generate a double strand break in the DNA; and
    • (iii) analyzing nucleic acid sequence of the cleaved DNA fragment.


Another embodiment provides a method of identifying (or measuring or detecting) a base editing site, a base editing efficiency at on-target site, an off-target site, and/or a target-specificity, of cytosine deaminase, comprising:

    • (i) introducing or contacting (a) a cytosine deaminase and an inactivated target-specific endonuclease, or (b) a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, or (c) a plasmid comprising a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, into a cell or with DNA isolated from a cell, together with a guide RNA;
    • (ii) treating the DNA with a uracil-specific excision reagent (USER), to generate a double strand break in the DNA;
    • (iii) analyzing nucleic acid sequence of the cleaved DNA fragment; and
    • (iv) identifying the double strand break site in the nucleic acid sequence read obtained by said analysis.
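Step (iv) above can be sketched computationally. The following is a minimal illustration, assuming that reads derived from a cleaved site share the same 5′ end coordinate after alignment; the function name, the `min_reads` threshold, and the toy coordinates are hypothetical and are not taken from this description.

```python
from collections import Counter


def find_break_sites(read_starts, min_reads=5):
    """Return (chromosome, position) pairs where at least `min_reads`
    aligned reads share the same 5' end, a simple signature of a break."""
    counts = Counter(read_starts)
    return sorted(site for site, n in counts.items() if n >= min_reads)


# Toy 5'-end coordinates of aligned reads (hypothetical data).
read_starts = (
    [("chr1", 100)] * 6      # many reads start at the same position
    + [("chr1", 137)]        # background reads start at scattered positions
    + [("chr2", 212)] * 2
    + [("chr3", 305)]
)
print(find_break_sites(read_starts, min_reads=5))
```

A real analysis would also need to consider both strands and sequencing depth; this sketch only shows the counting idea.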


Technical Solution

In this description, a modified Digenome-seq is used to assess specificities of a base editor (e.g., Base Editor 3 (BE3), composed of a Cas9 nickase and a deaminase) in the human genome. Genomic DNA is treated with BE3 and a mixture of DNA-modifying enzymes in vitro to produce DNA double-strand breaks (DSBs) at uracil-containing sites. BE3 off-target sites are computationally identified using whole genome sequencing data. BE3 is highly specific, inducing cytosine-to-uracil conversions at just 18±9 sites in the human genome. Digenome-seq is sensitive enough to capture BE3 off-target sites with a substitution frequency of 0.1%. Interestingly, BE3 and Cas9 off-target sites are often different, calling for independent assessments of genome-wide specificities.


First, a technique for generating double strand breaks in DNA using cytosine deaminase that does not induce double strand breakage in DNA, is provided.


An embodiment provides a composition for double strand breaks (DSBs) comprising (1) a cytosine deaminase and an inactivated target-specific endonuclease, or a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, or a plasmid comprising a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene; (2) a guide RNA; and (3) a uracil-specific excision reagent (USER). The composition may be used in inducing DNA double-strand breaks using cytosine deaminase.


The cytosine deaminase refers to any enzyme having activity to convert a cytosine present in a nucleotide (e.g., cytosine present in double stranded DNA or RNA) to uracil (C-to-U conversion activity or C-to-U editing activity). The cytosine deaminase converts cytosine positioned on a strand where a PAM sequence linked to the target sequence is present, to uracil. In an embodiment, the cytosine deaminase may originate from mammals including primates such as humans and monkeys, rodents such as rats and mice, and the like, but is not limited thereto. For example, the cytosine deaminase may be at least one selected from the group consisting of enzymes belonging to the APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like) family, and for example, may be at least one selected from the following group, but is not limited thereto:

    • APOBEC1: Homo sapiens APOBEC1 (Protein: GenBank Accession Nos. NP_001291495.1, NP_001635.2, NP_005880.2, etc.; gene (mRNA or cDNA; described in the order of the above listed corresponding proteins): GenBank Accession Nos. NM_001304566.1, NM_001644.4, NM_005889.3, etc.), Mus musculus APOBEC1 (protein: GenBank Accession Nos. NP_001127863.1, NP_112436.1, etc.; gene: GenBank Accession Nos. NM_001134391.1, NM_031159.3, etc.);
    • APOBEC2: Homo sapiens APOBEC2 (protein: GenBank Accession No. NP_006780.1, etc.; gene: GenBank Accession No. NM_006789.3, etc.), mouse APOBEC2 (protein: GenBank Accession No. NP_033824.1, etc.; gene: GenBank Accession No. NM_009694.3, etc.);
    • APOBEC3B: Homo sapiens APOBEC3B (protein: GenBank Accession Nos. NP_001257340.1, NP_004891.4, etc.; gene: GenBank Accession Nos. NM_001270411.1, NM_004900.4, etc.), Mus musculus APOBEC3B (proteins: GenBank Accession Nos. NP_001153887.1, NP_001333970.1, NP_084531.1, etc.; gene: GenBank Accession Nos. NM_001160415.1, NM_001347041.1, NM_030255.3, etc.);
    • APOBEC3C: Homo sapiens APOBEC3C (protein: GenBank Accession No. NP_055323.2, etc.; gene: GenBank Accession No. NM_014508.2, etc.);
    • APOBEC3D (including APOBEC3E): Homo sapiens APOBEC3D (protein: GenBank Accession No. NP_689639.2, etc.; gene: GenBank Accession No. NM_152426.3, etc.);
    • APOBEC3F: Homo sapiens APOBEC3F (protein: GenBank Accession Nos. NP_660341.2, NP_001006667.1, etc.; gene: GenBank Accession Nos. NM_145298.5, NM_001006666.1, etc.);
    • APOBEC3G: Homo sapiens APOBEC3G (protein: GenBank Accession Nos. NP_068594.1, NP_001336365.1, NP_001336366.1, NP_001336367.1, etc.; gene: GenBank Accession Nos. NM_021822.3, NM_001349436.1, NM_001349437.1, NM_001349438.1, etc.);
    • APOBEC3H: Homo sapiens APOBEC3H (protein: GenBank Accession Nos. NP_001159474.2, NP_001159475.2, NP_001159476.2, NP_861438.3, etc.; gene: GenBank Accession Nos. NM_001166002.2, NM_001166003.2, NM_001166004.2, NM_181773.4, etc.);
    • APOBEC4 (including APOBEC3E): Homo sapiens APOBEC4 (protein: GenBank Accession No. NP_982279.1, etc.; gene: GenBank Accession No. NM_203454.2 etc.); mouse APOBEC4 (protein: GenBank Accession No. NP_001074666.1, etc.; gene: GenBank Accession No. NM_001081197.1, etc.); and
    • Activation-induced cytidine deaminase (AICDA or AID): Homo sapiens AID (protein: GenBank Accession Nos. NP_001317272.1, NP_065712.1, etc.; gene: GenBank Accession Nos. NM_001330343.1, NM_020661.3, etc.); mouse AID (protein: GenBank Accession No. NP_033775.1, etc.; gene: GenBank Accession No. NM_009645.2, etc.), and the like.


As used herein, a target-specific nuclease is also referred to as a programmable nuclease, and refers to all types of endonuclease that are capable of recognizing and cleaving a specific target position on a genomic DNA.


For example, the target-specific nuclease may be at least one selected from the group consisting of all nucleases capable of recognizing a particular sequence of a target gene and having a nucleotide-cleavage activity, thereby inducing insertion and/or deletion (Indel) on the target gene.


For example, the target-specific nuclease may be at least one selected from the group consisting of, but not limited to:

    • a transcription activator-like effector nuclease (TALEN) in which a cleavage domain is fused to a transcription activator-like effector domain that is derived from a plant pathogenic gene and recognizes a specific target sequence on the genome;
    • a zinc-finger nuclease;
    • a meganuclease;
    • a RGEN (RNA-guided engineered nuclease; e.g., Cas9, Cpf1, etc.) derived from microorganism immune system, CRISPR; and
    • an Ago homolog, DNA-guided endonuclease.


According to an embodiment, the target-specific nuclease may be at least one selected from the group consisting of endonucleases involved in type II and/or type V of the CRISPR (Clustered regularly interspaced short palindromic repeats) system, such as Cas protein (e.g., Cas9 protein (CRISPR associated protein 9)), Cpf1 protein (CRISPR from Prevotella and Francisella 1), etc. In this regard, the target-specific nuclease may further comprise a target DNA-specific guide RNA for guiding to an on-target site in genomic DNA. The guide RNA may be one transcribed in vitro, for example, from an oligonucleotide duplex or a plasmid template, but is not limited thereto. The target-specific nuclease and the guide RNA may form a ribonucleic acid-protein complex, to act in the form of ribonucleic acid protein (RNP).


Cas9 protein is a main protein component of the CRISPR/Cas system, which can function as an activated endonuclease or nickase.


Cas9 protein or gene information thereof may be acquired from a well-known database such as the GenBank of NCBI (National Center for Biotechnology Information). For example, the Cas9 protein may be at least one selected from the group consisting of, but not limited to:

    • a Cas9 protein derived from Streptococcus sp., for example, Streptococcus pyogenes (e.g., SwissProt Accession number Q99ZW2(NP_269215.1) (encoding gene: SEQ ID NO: 229));
    • a Cas9 protein derived from Campylobacter sp., for example, Campylobacter jejuni;
    • a Cas9 protein derived from Streptococcus sp., for example, Streptococcus thermophilus or Streptococcus aureus;
    • a Cas9 protein derived from Neisseria meningitidis;
    • a Cas9 protein derived from Pasteurella sp., for example, Pasteurella multocida; and
    • a Cas9 protein derived from Francisella sp., for example, Francisella novicida.


Cpf1 protein, which is an endonuclease of a new CRISPR system distinguished from the CRISPR/Cas system, is small in size compared to Cas9, requires no tracrRNA, and can function with a single guide RNA. In addition, Cpf1 can recognize thymidine-rich PAM (protospacer-adjacent motif) sequences and produces cohesive double-strand breaks (cohesive end).


For example, the Cpf1 protein may be an endonuclease derived from Candidatus spp., Lachnospira spp., Butyrivibrio spp., Peregrinibacteria, Acidaminococcus spp., Porphyromonas spp., Prevotella spp., Francisella spp., Candidatus Methanoplasma, or Eubacterium spp. Examples of the microorganism from which the Cpf1 protein may be derived include, but are not limited to, Parcubacteria bacterium (GWC2011_GWC2_44_17), Lachnospiraceae bacterium (MC2017), Butyrivibrio proteoclasticus, Peregrinibacteria bacterium (GW2011_GWA_33_10), Acidaminococcus sp. (BV3L6), Porphyromonas macacae, Lachnospiraceae bacterium (ND2006), Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi (237), Smithella sp. (SC_KO8D17), Leptospira inadai, Lachnospiraceae bacterium (MA2020), Francisella novicida (U112), Candidatus Methanoplasma termitum, Candidatus Paceibacter, and Eubacterium eligens.


The target-specific endonuclease may be a microorganism-derived protein or an artificial or non-naturally occurring protein obtained by a recombinant or synthesis method. By way of example, the target-specific endonuclease (e.g., Cas9, Cpf1, and the like) may be a recombinant protein produced with a recombinant DNA. As used herein, the term “recombinant DNA (rDNA)” refers to a DNA molecule artificially made by genetic recombination, such as molecular cloning, to include therein heterogenous or homogenous genetic materials derived from various organisms. For instance, when a target-specific endonuclease is produced in vivo or in vitro by expressing a recombinant DNA in an appropriate organism, the recombinant DNA may have a nucleotide sequence reconstituted with codons selected from among codons encoding the protein of interest in order to be optimal for expression in the organism.


The term “inactivated target-specific endonuclease”, as used herein, refers to a target-specific endonuclease that lacks the endonuclease activity of cleaving a DNA duplex. The inactivated target-specific endonuclease may be at least one selected from among inactivated target-specific endonucleases that lack endonuclease activity but retain nickase activity, and inactivated target-specific endonucleases that lack both endonuclease activity and nickase activity. In an embodiment, the inactivated target-specific endonuclease may retain nickase activity. In this case, when a cytosine base is converted to a uracil base, a nick is introduced into the strand on which cytosine-to-uracil conversion occurs, or into the opposite strand, simultaneously or sequentially irrespective of order (for example, a nick is introduced at a position between the third and fourth nucleotides in the direction toward the 5′ end of a PAM sequence on the strand opposite to the strand having the PAM sequence). The modification (mutation) of such target-specific endonucleases may include substitution of a catalytic aspartate residue (for Streptococcus pyogenes-derived Cas9 protein, for example, aspartic acid at position 10 (D10)) with a different amino acid, and the different amino acid may be alanine, but is not limited thereto.


As used herein, the expression “different amino acid” may be intended to refer to an amino acid selected from among alanine, isoleucine, leucine, methionine, phenylalanine, proline, tryptophan, valine, asparagine, cysteine, glutamine, glycine, serine, threonine, tyrosine, aspartic acid, glutamic acid, arginine, histidine, lysine, and all known variants thereof, exclusive of the amino acid having a wild-type protein retained at the original substitution position.


In one embodiment, when the inactivated target-specific endonuclease is a modified Cas9 protein, the Cas9 protein may be at least one selected from the group consisting of modified Cas9 that lacks endonuclease activity and retains nickase activity as a result of introducing a mutation (for example, substitution with a different amino acid) to D10 of Streptococcus pyogenes-derived Cas9 protein (e.g., SwissProt Accession number Q99ZW2(NP_269215.1)), and modified Cas9 protein that lacks both endonuclease activity and nickase activity as a result of introducing mutations (for example, substitution with different amino acids) to both D10 and H840 of Streptococcus pyogenes-derived Cas9 protein. In Cas9 protein, for example, the mutation at D10 may be a D10A mutation (the amino acid D at position 10 in Cas9 protein is substituted with A; below, mutations introduced to Cas9 are expressed in the same manner), and the mutation at H840 may be an H840A mutation.
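The mutation notation used above (wild-type residue, 1-based position, substituted residue, as in D10A) can be made concrete with a short sketch. The function name and the toy peptide are hypothetical; the toy peptide is not the Cas9 sequence and only places an aspartate at position 10 for illustration.

```python
import re


def apply_substitution(protein, mutation):
    """Apply a substitution written like 'D10A' to a protein sequence.

    Raises ValueError if the notation is malformed, the position is out of
    range, or the wild-type residue does not match the sequence.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    wild_type, position, substitute = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[position - 1]}"
        )
    return protein[: position - 1] + substitute + protein[position:]


# Toy peptide (NOT the Cas9 sequence): aspartate (D) at position 10.
toy_peptide = "M" + "A" * 8 + "D" + "A" * 3
print(apply_substitution(toy_peptide, "D10A"))
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which is the main practical pitfall with this notation.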


The cytidine deaminase and the inactivated target-specific endonuclease may be used (or may be contained in the composition) in any of the following forms:

    • a fusion protein in which they are fused to each other directly or via a peptide linker, for example, existing in the order of cytidine deaminase-inactivated target-specific endonuclease in the N- to C-terminus direction (i.e., inactivated target-specific endonuclease fused to the C-terminus of cytidine deaminase) or in the order of inactivated target-specific endonuclease-cytidine deaminase in the N- to C-terminus direction (i.e., cytidine deaminase fused to the C-terminus of inactivated target-specific endonuclease);
    • a mixture of a purified cytidine deaminase or mRNA coding therefor and an inactivated target-specific endonuclease or mRNA coding therefor;
    • a plasmid carrying both a cytidine deaminase-encoding gene and an inactivated target-specific endonuclease-encoding gene (e.g., the two genes arranged to encode the fusion protein described above); or
    • a mixture of a cytidine deaminase expression plasmid and an inactivated target-specific endonuclease expression plasmid which carry a cytidine deaminase-encoding gene and an inactivated target-specific endonuclease-encoding gene, respectively.

In one embodiment, the cytidine deaminase and the inactivated target-specific endonuclease may be in the form of a fusion protein in which they exist in the order of cytidine deaminase-inactivated target-specific endonuclease in the N- to C-terminus direction or in the order of inactivated target-specific endonuclease-cytidine deaminase in the N- to C-terminus direction, or a single plasmid in which a cytidine deaminase-encoding gene and an inactivated target-specific endonuclease-encoding gene are contained to encode the fusion protein.


So long as it carries the cytidine deaminase-encoding gene and/or the inactivated target-specific endonuclease-encoding gene and contains an expression system capable of expressing the gene in a host cell, any plasmid may be used. The plasmid contains elements for expressing a target gene, which include a replication origin, a promoter, an operator, and a terminator, and may further comprise an enzyme site suitable for introduction into the genome of a host cell (e.g., restriction enzyme site), a selection marker for identifying successful introduction into a host cell, a ribosome binding site (RBS) for translation into a protein, and/or a transcriptional regulatory factor. The plasmid may be one used in the art, for example, at least one selected from the group consisting of, but not limited to, pcDNA series, pSC101, pGV1106, pACYC177, ColE1, pKT230, pME290, pBR322, pUC8/9, pUC6, pBD9, pHC79, pIJ61, pLAFR1, pHV14, pGEX series, pET series, and pUC19. The host cell may be selected from among cells into which base editing or a double-strand break is intended to be introduced by the cytidine deaminase (for example, eukaryotic cells including mammalian cells such as human cells) and all cells that can express the cytidine deaminase-encoding gene and/or the inactivated target-specific endonuclease-encoding gene into cytidine deaminase and inactivated target-specific endonuclease, respectively (for example, E. coli, etc.).


The guide RNA, which acts to guide a mixture or a fusion protein of the cytidine deaminase and the inactivated target-specific endonuclease to an on-target site, may be at least one selected from the group consisting of CRISPR RNA (crRNA), trans-activating crRNA (tracrRNA), and single guide RNA (sgRNA), and may be, in detail, a crRNA:tracrRNA duplex in which crRNA and tracrRNA are coupled to each other, or a single-strand guide RNA (sgRNA) in which crRNA or a part thereof is connected to tracrRNA or a part thereof via an oligonucleotide linker.


Concrete sequences of the guide RNA may be appropriately selected, depending on kinds of the target-specific endonucleases used, or origin microorganisms thereof, and are an optional matter which could easily be understood by a person skilled in the art.


When a Streptococcus pyogenes-derived Cas9 protein is used as a target-specific endonuclease, crRNA may be represented by the following General Formula 1:

(General Formula 1)

5′-(Ncas9)l-(GUUUUAGAGCUA)-(Xcas9)m-3′ (SEQ ID NO: 233)

    • wherein,

    • Ncas9 is a targeting sequence, that is, a region determined according to a sequence at an on-target site in a target gene (i.e., a sequence hybridizable with a sequence of an on-target site), and l represents the number of nucleotides included in the targeting sequence and is an integer of 17 to 23 or 18 to 22, for example, 20;

    • the region including 12 consecutive nucleotides (GUUUUAGAGCUA; SEQ ID NO: 230) adjacent to the 3′-terminus of the targeting sequence is essential for crRNA,

    • Xcas9 is a region including m nucleotides present at the 3′-terminal site of crRNA (that is, present adjacent to the 3′-terminus of the essential region), and m may be an integer of 8 to 12, for example, 11 wherein the m nucleotides may be the same or different and are independently selected from the group consisting of A, U, C, and G.





In an embodiment, the Xcas9 may include, but is not limited to, UGCUGUUUUG (SEQ ID NO: 231).
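General Formula 1 can be read as a simple concatenation, sketched below. The essential region and the example Xcas9 region are the sequences given above (SEQ ID NOs: 230 and 231); the function name and the 20-nt targeting sequence are hypothetical, and the length checks follow the ranges stated for l (17 to 23) and m (8 to 12).

```python
CRRNA_ESSENTIAL = "GUUUUAGAGCUA"   # SEQ ID NO: 230
XCAS9_EXAMPLE = "UGCUGUUUUG"       # SEQ ID NO: 231


def build_crrna(targeting, x_region=XCAS9_EXAMPLE):
    """Assemble 5'-(Ncas9)l-(GUUUUAGAGCUA)-(Xcas9)m-3' as an RNA string."""
    targeting = targeting.upper().replace("T", "U")  # accept DNA or RNA letters
    x_region = x_region.upper()
    if set(targeting + x_region) - set("ACGU"):
        raise ValueError("sequences may contain only A, C, G and U")
    if not 17 <= len(targeting) <= 23:
        raise ValueError("targeting sequence must be 17 to 23 nt long")
    if not 8 <= len(x_region) <= 12:
        raise ValueError("Xcas9 region must be 8 to 12 nt long")
    return targeting + CRRNA_ESSENTIAL + x_region


# Hypothetical 20-nt targeting sequence, for illustration only.
crrna = build_crrna("ACGUACGUACGUACGUACGU")
print(crrna, len(crrna))
```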


In addition, the tracrRNA may be represented by the following General Formula 2:

(General Formula 2)

5′-(Ycas9)p-(UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC)-3′ (SEQ ID NO: 234)

    • wherein,

    • the region represented by 60 nucleotides (UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC; SEQ ID NO: 232) is essential for tracrRNA,

    • Ycas9 is a region including p nucleotides present adjacent to the 3′-terminus of the essential region, and p may be an integer of 6 to 20, for example, 8 to 19 wherein the p nucleotides may be the same or different and are independently selected from the group consisting of A, U, C, and G.





Further, sgRNA may form a hairpin structure (stem-loop structure) in which a crRNA moiety including the targeting sequence and the essential region thereof and a tracrRNA moiety including the essential region (60 nucleotides) thereof are connected to each other via an oligonucleotide linker (responsible for the loop structure). In greater detail, the sgRNA may have a hairpin structure in which a crRNA moiety including the targeting sequence and essential region thereof is coupled with the tracrRNA moiety including the essential region thereof to form a double-strand RNA molecule with connection between the 3′ end of the crRNA moiety and the 5′ end of the tracrRNA moiety via an oligonucleotide linker.


In one embodiment, sgRNA may be represented by the following General Formula 3:

(General Formula 3)

5′-(Ncas9)l-(GUUUUAGAGCUA)-(oligonucleotide linker)-(UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC)-3′ (SEQ ID NO: 235)

    • wherein, (Ncas9)l is a targeting sequence defined as in General Formula 1.





The oligonucleotide linker included in the sgRNA may be 3-5 nucleotides long, for example 4 nucleotides long in which the nucleotides may be the same or different and are independently selected from the group consisting of A, U, C, and G.


The crRNA or sgRNA may further contain 1 to 3 guanines (G) at the 5′ end thereof (that is, the 5′ end of the targeting sequence of crRNA).


The tracrRNA or sgRNA may further comprise a terminator inclusive of 5 to 7 uracil (U) residues at the 3′ end of the essential region (60 nt long) of tracrRNA.
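The sgRNA layout of General Formula 3, together with the optional 5′ guanines and 3′ uracil terminator described above, can be sketched as follows. The essential regions are those given in the text (SEQ ID NOs: 230 and 232); the function name, the default 4-nt linker "GAAA", and the targeting sequence are assumptions for illustration, since the text only requires a 3 to 5 nt linker of arbitrary bases.

```python
CRRNA_ESSENTIAL = "GUUUUAGAGCUA"  # SEQ ID NO: 230
TRACR_ESSENTIAL = (               # SEQ ID NO: 232 (60 nt)
    "UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAAC"
    "UUGAAAAAGUGGCACCGAGUCGGUGC"
)


def build_sgrna(targeting, linker="GAAA", extra_5p_g=0, terminator_u=0):
    """Assemble an sgRNA per General Formula 3.

    `linker` defaults to an assumed example; `extra_5p_g` adds 0 to 3 guanines
    at the 5' end; `terminator_u` is 0 (none) or 5 to 7 uracils at the 3' end.
    """
    targeting = targeting.upper().replace("T", "U")
    linker = linker.upper()
    if set(targeting + linker) - set("ACGU"):
        raise ValueError("sequences may contain only A, C, G and U")
    if not 17 <= len(targeting) <= 23:
        raise ValueError("targeting sequence must be 17 to 23 nt long")
    if not 3 <= len(linker) <= 5:
        raise ValueError("linker must be 3 to 5 nt long")
    if not 0 <= extra_5p_g <= 3:
        raise ValueError("at most 3 extra 5' guanines are allowed")
    if terminator_u != 0 and not 5 <= terminator_u <= 7:
        raise ValueError("terminator must have 5 to 7 uracils, or be omitted")
    return (
        "G" * extra_5p_g
        + targeting
        + CRRNA_ESSENTIAL
        + linker
        + TRACR_ESSENTIAL
        + "U" * terminator_u
    )


# Hypothetical targeting sequence with one extra 5' G and a 6-U terminator.
sgrna = build_sgrna("ACGUACGUACGUACGUACGU", extra_5p_g=1, terminator_u=6)
print(len(sgrna))
```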


The target sequence for the guide RNA may be about 17 to about 23 or about 18 to about 22, for example, 20 consecutive nucleotides adjacent to the 5′ end of a PAM (Protospacer Adjacent Motif; for S. pyogenes Cas9, 5′-NGG-3′ (N is A, T, G, or C)) on a target DNA.
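Locating candidate target sequences next to a 5′-NGG-3′ PAM can be sketched with a regular expression. This is a minimal illustration that scans only the strand it is given (a full search would also scan the reverse complement); the function name and the toy DNA are hypothetical.

```python
import re


def find_target_sites(dna, length=20):
    """Return (start, target, pam) for every `length`-nt stretch that is
    immediately followed by an NGG PAM on the given strand (0-based start)."""
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % length)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Toy sequence: 2 nt of flank, a 20-nt target, a TGG PAM, 2 nt of flank.
toy_dna = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
print(find_target_sites(toy_dna))
```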


As used herein, the term “the targeting sequence” of guide RNA hybridizable with the target sequence for the guide RNA refers to a nucleotide sequence having a sequence complementarity of 50% or higher, 60% or higher, 70% or higher, 80% or higher, 90% or higher, 95% or higher, 99% or higher, or 100% to a nucleotide sequence of a complementary strand to a DNA strand on which the target sequence exists (i.e., a DNA strand having a PAM sequence (5′-NGG-3′ (N is A, T, G, or C))) and thus can complementarily couple with a nucleotide sequence of the complementary strand.


In the present specification, a nucleic acid sequence at an on-target site is represented by that of the strand on which a PAM sequence exists among two DNA strands in a region of a target gene. In this regard, the DNA strand to which the guide RNA couples is complementary to a strand on which a PAM sequence exists. Hence, the targeting sequence included in the guide RNA has the same nucleic acid sequence as a sequence at an on-target site, with the exception that U is employed instead of T due to the RNA property. In other words, a targeting sequence of guide RNA and a sequence at the on-target site (or a sequence of a cleavage site) are represented by the same nucleic acid sequence with the exception that T and U are interchanged, in the present specification.
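The convention above (the targeting sequence equals the on-target sequence on the PAM strand with U in place of T, and pairs with the complementary strand) can be checked with a short sketch. The function names and the 20-nt on-target sequence are hypothetical; the percentage computed is a simple position-by-position count, offered as one plausible reading of "sequence complementarity".

```python
RNA_DNA_PAIRS = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary DNA strand, written 5'->3'."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def targeting_sequence(on_target_dna):
    """Targeting sequence of the guide RNA: same letters, with U for T."""
    return on_target_dna.upper().replace("T", "U")


def percent_complementarity(rna, dna_strand):
    """Percent of RNA positions that base-pair with `dna_strand` (5'->3').

    Pairing is antiparallel, so the DNA strand is read 3'->5'.
    """
    dna_3_to_5 = dna_strand.upper()[::-1]
    if len(rna) != len(dna_3_to_5):
        raise ValueError("sequences must have the same length")
    matches = sum(RNA_DNA_PAIRS.get(r) == d for r, d in zip(rna.upper(), dna_3_to_5))
    return 100.0 * matches / len(rna)


# Hypothetical on-target sequence on the PAM-containing strand.
on_target = "GATTACAGATTACAGATTAC"
guide = targeting_sequence(on_target)
bound_strand = reverse_complement(on_target)  # the strand the guide couples to
print(guide, percent_complementarity(guide, bound_strand))
```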


The guide RNA may be used in the form of RNA (or may be contained in the composition) or in the form of a plasmid carrying a DNA coding for the RNA (or may be contained in the composition).


The uracil-specific excision reagent (USER) may include any agent capable of removing uracil that is converted from cytosine by cytosine deaminase and/or introducing DNA cleavage at the position where uracil is removed.


In an embodiment, the uracil-specific excision reagent (USER) may comprise a uracil DNA glycosylase (UDG), endonuclease VIII, or a combination thereof. In an embodiment, the uracil-specific excision reagent may comprise endonuclease VIII, or a combination of uracil DNA glycosylase and endonuclease VIII.


The uracil DNA glycosylase (UDG) may refer to an enzyme that acts to remove uracil (U) present in DNA, thereby preventing mutagenesis of DNA. It may be at least one selected from the group consisting of enzymes that cleave the N-glycosylic bond of uracil to initiate base-excision repair (BER). For example, the uracil DNA glycosylase may be an Escherichia coli uracil DNA glycosylase (e.g., GenBank Accession Nos. ADX49788.1, ACT28166.1, EFN36865.1, BAA10923.1, ACA76764.1, ACX38762.1, EFU59768.1, EFU53885.1, EFJ57281.1, EFU47398.1, EFK71412.1, EFJ92376.1, EFJ79936.1, EF059084.1, EFK47562.1, KXH01728.1, ESE25979.1, ESD99489.1, ESD73882.1, ESD69341.1, etc.), human uracil DNA glycosylase (for example, GenBank Accession Nos. NP_003353.1, NP_550433.1, etc.), mouse uracil DNA glycosylase (for example, GenBank Accession Nos. NP_001035781.1, NP_035807.2, etc.), and the like, but is not limited thereto.


The endonuclease VIII functions to remove the uracil-deleted nucleotides. It may be at least one selected from the group consisting of enzymes having N-glycosylase activity to remove uracil damaged by the uracil DNA glycosylase from double-stranded DNA and AP-lyase activity to cut the 3′ and 5′ ends of the apurinic site (AP site) which is generated by the removal of damaged uracil. For example, the endonuclease VIII may be human endonuclease VIII (e.g., GenBank Accession Nos. BAC06476.1, NP_001339449.1, NP_001243481.1, NP_078884.2, NP_001339448.1, etc.), mouse endonuclease VIII (e.g., GenBank Accession Nos. BAC06477.1, NP_082623.1, etc.), Escherichia coli endonuclease VIII (e.g., GenBank Accession Nos. OBZ49008.1, OBZ43214.1, OBZ42025.1, ANJ41661.1, KYL40995.1, KMV55034.1, KMV53379.1, KMV50038.1, KMV40847.1, AQW72152.1, etc.), but is not limited thereto.


In another embodiment, when an inactivated target-specific endonuclease lacking nickase activity as well as endonuclease activity is used, such as a modified Cas9 generated by introducing both D10A and H840A into Cas9 protein derived from Streptococcus pyogenes, the composition may further comprise, for generating double strand cleavage, an endonuclease capable of specifically degrading a DNA single strand region generated by removing uracil on one of the two strands of DNA (the endonuclease may cleave phosphodiester bonds at both ends of the DNA single strand region). The endonuclease capable of specifically degrading a single strand region of DNA may be at least one selected from the group consisting of S1 nuclease (derived from Aspergillus oryzae; e.g., catalog number M5791 (Promega), etc.), Mung bean nuclease, and the like.


By using a cytosine deaminase, an inactivated target-specific endonuclease, and a uracil-specific excision reagent, a double strand break can be generated at a site where a base conversion (base editing) from cytosine to uracil (C→U) by cytosine deaminase occurs (FIG. 4a). The DNA cleavage fragments generated as above have staggered ends. Thereafter, an end repair process may optionally occur, whereby DNA fragments (double stranded) with blunted ends can be generated (see FIG. 4a).


Another embodiment provides a method of generating double strand break using a cytosine deaminase, the method comprising:

    • (i) introducing or contacting (a) a cytosine deaminase and an inactivated target-specific endonuclease, or (b) a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, or (c) a plasmid comprising a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, into a cell or with DNA isolated from a cell, together with a guide RNA; and
    • (ii) treating a uracil-specific excision reagent (USER).


By generating (or introducing) a double strand break into DNA using cytosine deaminase, a base editing (i.e., conversion from C to U) site, a base editing efficiency by a cytosine deaminase, and the like can be analyzed, thereby identifying (or measuring) a base editing efficiency at on-target site, specificity to on-target sequence, an off-target sequence, etc., of cytosine deaminase.


Another embodiment provides a method of analyzing nucleic acid sequence of DNA in which a base editing is introduced by cytosine deaminase, comprising:

    • (i) introducing or contacting (a) a cytosine deaminase and an inactivated target-specific endonuclease, or (b) a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, or (c) a plasmid comprising a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, into a cell or with DNA isolated from a cell, together with a guide RNA;
    • (ii) treating a uracil-specific excision reagent (USER), to generate double strand break in the DNA; and
    • (iii) analyzing nucleic acid sequence of the cleaved DNA fragment.


Another embodiment provides a method of identifying (or measuring or detecting) a base editing site, a base editing efficiency at on-target site, an off-target site, and/or a target-specificity, of cytosine deaminase, comprising:

    • (i) introducing or contacting (a) a cytosine deaminase and an inactivated target-specific endonuclease, or (b) a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, or (c) a plasmid comprising a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, into a cell or with DNA isolated from a cell, together with a guide RNA;
    • (ii) treating a uracil-specific excision reagent (USER), to generate double strand break in the DNA;
    • (iii) analyzing nucleic acid sequence of the cleaved DNA fragment; and
    • (iv) identifying the double strand break site in the nucleic acid sequence read obtained by said analysis.


The cytosine deaminase, inactivated target-specific endonuclease, plasmid, guide RNA and uracil-specific excision reagent are as described above.


The method may be carried out in a cell or in vitro, for example, it may be carried out in vitro. More specifically, all steps of the method are carried out in vitro; or step (i) is carried out in a cell, and step (ii) and subsequent steps are carried out in vitro using DNA (e.g., genomic DNA) extracted from the cell used in step (i).


Said step (i) comprises transfecting a cell or contacting (e.g., co-incubating) DNA extracted from the cell with a cytosine deaminase and an inactivated target-specific endonuclease (or coding genes thereof) together with a guide RNA, to induce conversion from cytosine to uracil and generation of DNA nick in a target site targeted by the guide RNA. The cell may be selected from all eukaryotic cells which are desired to be introduced with a base editing by cytosine deaminase, and for example, it may be selected from mammalian cells including human cells. The transfection can be carried out by introducing a plasmid containing a gene encoding a cytosine deaminase and an inactivated target-specific endonuclease into a cell by any conventional means. For example, the plasmid may be introduced into a cell by electroporation, lipofection, and the like, but not be limited thereto.


In one embodiment, step (i) may be performed by culturing DNA extracted from a cell (a cell to which base editing (e.g., a base editing site, base editing efficiency, etc.) by a cytosine deaminase is to be examined) together with a cytosine deaminase and an inactivated target-specific endonuclease (e.g., a fusion protein comprising a cytosine deaminase and an inactivated target-specific endonuclease) and a guide RNA (in vitro). The DNA extracted from the cell may be a genomic DNA or a PCR (polymerase chain reaction) amplification product containing a target gene or a target site.


Said step (ii) may comprise removing a base modified to uracil in step (i) to generate a DNA double strand break. More specifically, step (ii) may comprise treating (contacting) the reaction product obtained in step (i) with uracil DNA glycosylase (UDG), endonuclease VIII, or a combination thereof. When both uracil DNA glycosylase and endonuclease VIII are used, they can be applied at the same time or sequentially in any order. The step of treating (contacting) may be carried out by incubating the reaction product obtained in step (i) with uracil DNA glycosylase and/or endonuclease VIII.


When step (i) is carried out in a cell (i.e., when the cell is transfected), the reaction sample of step (ii) may comprise DNA isolated from the transfected cell. When step (i) is carried out in vitro for DNA extracted (separated) from a cell, the reaction sample of step (ii) may comprise isolated DNA treated with a cytosine deaminase and an inactivated target-specific endonuclease and a guide RNA.


In another embodiment, when an inactivated target-specific endonuclease generated by introducing both D10A and H840A into the Cas9 protein derived from Streptococcus pyogenes is used in step (i), since the inactivated target-specific endonuclease lacks nickase activity as well as endonuclease activity, for generating double strand cleavage, the method may further comprise a step (step (ii-1)) of treating with an endonuclease capable of specifically degrading a DNA single strand region generated by removing uracil on one strand among the two strands of DNA (the endonuclease may cleave phosphodiester bonds at both ends of the DNA single strand region), after step (ii) and before step (iii) (FIG. 22(a)). The endonuclease capable of specifically degrading a single strand region of DNA may be S1 nuclease, but is not limited thereto.


Optionally, the method may further comprise a step of removing the cytosine deaminase, inactivated target-specific endonuclease, and/or guide RNA used in step (i), after performing (finishing) step (i) and prior to performing step (ii). The cytidine deaminase and inactivated target-specific endonuclease are used together with a guide RNA, thereby having sequence specificity, and thus, they mostly act on an on-target site; however, if sequences similar to the target sequence of the on-target site are present at an off-target site, they may also act on the off-target site. As used herein, the term “off-target site” may refer to a site that is not an on-target site, but at which the cytidine deaminase and inactivated target-specific endonuclease have activity. That is, the off-target site may refer to a site where base editing and/or cleavage by the cytidine deaminase and inactivated target-specific endonuclease occurs, besides an on-target site. In an embodiment, the term “off-target site” may be used to cover not only sites that are not on-target sites of the cytidine deaminase and inactivated target-specific endonuclease, but also sites that may potentially be off-target sites thereof. The off-target sites may refer to, but are not limited to, any sites that are cleaved by the cytidine deaminase and inactivated target-specific endonuclease in vitro, besides on-target sites.


The activity of cytidine deaminase and inactivated target-specific endonuclease on sites besides an on-target site may be caused by various reasons. For example, a sequence (off-target sequence) other than target sequence having low mismatch level to a target sequence designed for a desired target site and high sequence homology with the target sequence, may act as an on-target sequence of cytidine deaminase and inactivated target-specific endonuclease used. The off-target sequence may be a sequence (gene region) having 1 to 6, 1 to 5, 1 to 4, 1 to 3, 1 to 2, or 1 nucleotide mismatch to a target sequence, but not be limited thereto.
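As a non-limiting illustration, the mismatch criterion described above can be sketched in Python as follows. The function names and the example sequences are hypothetical and are introduced only for this sketch; only the range of 1 to 6 mismatches is taken from the description.

```python
def count_mismatches(target: str, candidate: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(target) != len(candidate):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(target.upper(), candidate.upper()))


def is_candidate_off_target(target: str, candidate: str,
                            max_mismatches: int = 6) -> bool:
    """Return True if the candidate differs from the target by
    1 to max_mismatches nucleotides (0 mismatches is the on-target sequence)."""
    return 1 <= count_mismatches(target, candidate) <= max_mismatches


# Hypothetical 20-nt sequences used only to exercise the functions.
example_target = "ACGTACGTACGTACGTACGT"
example_candidate = "ACGTACGTACGAACGTACGA"  # differs at two positions
example_result = is_candidate_off_target(example_target, example_candidate)
```

A sketch of this kind compares sequences position by position only, so it does not account for DNA or RNA bulges.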


The activity of the deaminase and the inactivated target-specific endonuclease at an off-target site may cause undesirable mutations in a genome, which may lead to a significant problem. Hence, a process of accurately detecting and analyzing an off-target sequence may be as important as the activity of the deaminase and the inactivated target-specific endonuclease at an on-target site. The process may be useful for developing a deaminase and an inactivated target-specific endonuclease which both work specifically only at on-target sites without the off-target effect.


Because the cytidine deaminase and the inactivated target-specific endonuclease have activities in vivo and in vitro for the purpose of the present invention, the enzymes can be used in detecting in vitro an off-target site of DNA (e.g., genomic DNA). When applied in vivo, thus, the enzymes are expected to be active in the same sites (gene loci including off-target sequences) as the detected off-target sites.


Step (iii) is a step of analyzing the nucleic acid sequence of the DNA fragments cleaved in step (ii), and can be performed by any conventional method for analyzing nucleic acid sequences. For example, when the isolated DNA used in step (i) is a genomic DNA, the nucleic acid sequence analysis may be conducted by whole genome sequencing. In contrast to an indirect method in which sequences having homology with the sequence at an on-target site are searched for and predicted to be off-target sites, whole genome sequencing allows for detecting off-target sites actually cleaved by the target-specific nuclease at the level of the entire genome, thereby detecting off-target sites more accurately.


As used herein, the term “whole genome sequencing” (WGS) refers to a method of reading the genome by many multiples such as in 10X, 20X, and 40X formats for whole genome sequencing by next generation sequencing. The term “Next generation sequencing” means a technology that fragments the whole genome or targeted regions of genome in a chip-based and PCR-based paired end format and performs sequencing of the fragments by high throughput on the basis of chemical reaction (hybridization).
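As a non-limiting illustration, the read depths mentioned above (10X, 20X, 40X) can be related to the amount of sequencing data by the commonly used estimate that mean depth equals the number of reads multiplied by the read length and divided by the genome size. The function name and the numerical values below are assumptions chosen only for this sketch and are not taken from the Examples.

```python
def mean_coverage(number_of_reads: int, read_length: int, genome_size: int) -> float:
    """Estimate mean sequencing depth as (reads x read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return number_of_reads * read_length / genome_size


# Hypothetical values: 600 million reads of 150 nt on a 3-Gb genome.
example_depth = mean_coverage(600_000_000, 150, 3_000_000_000)
```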


In step (iv), a DNA cleavage site is identified (or determined) using the base sequence data (sequence reads) obtained in step (iii). By analyzing the sequencing data, an on-target site and an off-target site can simply be detected. The determination of a site at which DNA is cleaved from the base sequence data can be performed by various approaches. In the specification, various reasonable methods are provided for determining the site. However, they are merely illustrative examples that fall within the technical spirit of the present invention, and are not intended to limit the scope of the present invention.


As an example of determining a cleaved site, when the sequence reads obtained by whole genome sequencing are aligned according to sites on a genome, the site at which the 5′ ends are vertically (straightly) aligned may mean the site at which DNA is cleaved. The alignment of the sequence reads according to sites on genomes may be performed using an analysis program (for example, BWA/GATK or ISAAC). As used herein, the term “vertical alignment” refers to an arrangement in which the 5′ ends of two or more sequence reads start at the same site (nucleotide position) on the genome for each of the adjacent Watson strand and Crick strand when the whole genome sequencing results are analyzed with a program such as BWA/GATK or ISAAC. Through this method, the DNA fragments that are cleaved in step (ii) and thus have the same 5′ end are each sequenced.


That is, when the cleavage in step (ii) occurs at on-target sites and off-target sites, the sequence reads are vertically aligned at the commonly cleaved sites because each of the reads starts at the 5′ end generated by the cleavage. At uncleaved sites, however, no such common 5′ end is present, so that the reads are arranged in a staggered manner in the alignment. Accordingly, a vertically aligned site may be regarded as a site cleaved in step (ii), which means an on-target site or off-target site cleaved through the action of the cytosine deaminase and the inactivated target-specific endonuclease.
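As a non-limiting illustration, the counting of 5′ ends that underlies the vertical alignment described above can be sketched in Python as follows. The read representation (1-based start and end coordinates plus a strand symbol), the function name, and the example coordinates are assumptions made for this sketch only; they are not the output format of BWA/GATK or ISAAC.

```python
from collections import Counter


def five_prime_end_counts(reads):
    """Count how many reads share each 5' end position, per strand.

    Each read is a (start, end, strand) tuple in reference coordinates.
    For a forward-strand ("+") read the 5' end is its start coordinate;
    for a reverse-strand ("-") read the 5' end is its end coordinate.
    """
    counts = Counter()
    for start, end, strand in reads:
        if strand not in ("+", "-"):
            raise ValueError("strand must be '+' or '-'")
        position = start if strand == "+" else end
        counts[(position, strand)] += 1
    return counts


# Hypothetical reads around a break between positions 100 and 101.
example_reads = [
    (101, 250, "+"), (101, 230, "+"), (101, 260, "+"),  # same 5' end
    (1, 100, "-"), (20, 100, "-"),                      # same 5' end
    (57, 190, "+"),                                     # staggered read
]
example_counts = five_prime_end_counts(example_reads)
```

In this sketch, positions with high counts on both strands at adjacent coordinates correspond to vertically aligned sites, whereas staggered reads contribute only isolated counts.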


The term “alignment” means mapping sequence reads to a reference genome and then aligning the bases having identical sites in genomes to fit for each site. Accordingly, so long as it can align sequence reads in the same manner as above, any computer program may be employed. The program may be one already known in the pertinent art or may be selected from among programs tailored to the purpose. In one embodiment, alignment is performed using ISAAC, but is not limited thereto.


As a result of the alignment, the site at which the DNA is cleaved by the deaminase and the inactivated target-specific endonuclease can be determined by a method such as finding a site where the 5′ end is vertically aligned as described above, and the cleaved site may be determined as an off-target site if not an on-target site. In other words, a sequence is an on-target site if identical to the base sequence designed as an on-target site of the deaminase and inactivated target-specific endonuclease, and is regarded as an off-target site if not identical to the base sequence. This is obvious according to the definition of an off-target site described above. The off-target site may comprise a sequence having homology with the sequence of the on-target site; in particular, a sequence having at least one nucleotide mismatch with the on-target site; more particularly, a sequence having 1 to 6, 1 to 5, 1 to 4, 1 to 3, 1 to 2, or 1 nucleotide mismatch with the on-target site; however, the off-target site is not limited thereto, but includes any site capable of being cleaved by the cytidine deaminase and the inactivated target-specific endonuclease used.


In another example, in addition to finding a vertically aligned position at the 5′ end, when the double peak pattern is seen in 5′ end plot, the position can be determined as an off-target site if it is not on-target site. When a graph is drawn by counting the number of nucleotides constituting 5′ end having the same base for each site in a genomic DNA, a double peak pattern appears at a specific position. This is because the double peak is caused by each strand of a double strand cleaved by a cytidine deaminase and inactivated target-specific endonuclease.


Therefore, the method of identifying an off-target site may further comprise, after the step (iv), determining the cleaved site as an off-target site when the site is not an on-target site.


In an embodiment, steps (i) and (ii) are conducted with regard to the genomic DNA to induce a double-strand break, and after the whole genome analysis (step (iii)), the DNA reads are aligned with ISAAC to identify alignment patterns for vertical alignment at cleaved sites and staggered alignment at uncleaved sites. A unique pattern of double peaks may appear at the cleavage sites as represented by a 5′ end plot.


Moreover, as a non-limiting example, a site where two or more sequence reads corresponding to each of the Watson strand and the Crick strand are aligned vertically may be determined as an off-target site. In addition, a site where 20% or more of sequence reads are vertically aligned and the number of sequence reads having the same 5′ end in each of the Watson and Crick strands is 10 or more is determined as an off-target site, that is, a cleavage site.
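As a non-limiting illustration, the two thresholds described above (10 or more reads with the same 5′ end on each strand, and 20% or more of reads vertically aligned) can be sketched in Python as follows. The function and parameter names are hypothetical, and the sketch assumes that the 20% fraction is computed per strand as the number of reads sharing the 5′ end divided by the read depth at that position; the description does not specify the denominator.

```python
def is_cleavage_site(watson_same_end: int, crick_same_end: int,
                     watson_depth: int, crick_depth: int,
                     min_reads: int = 10, min_fraction: float = 0.20) -> bool:
    """Apply the read-count and fraction thresholds to one candidate site.

    watson_same_end / crick_same_end: reads sharing the same 5' end per strand.
    watson_depth / crick_depth: total reads covering the position per strand
    (assumed denominator for the 20% criterion).
    """
    if watson_depth <= 0 or crick_depth <= 0:
        return False
    enough_reads = watson_same_end >= min_reads and crick_same_end >= min_reads
    enough_fraction = (watson_same_end / watson_depth >= min_fraction
                       and crick_same_end / crick_depth >= min_fraction)
    return enough_reads and enough_fraction


# Hypothetical counts for a single candidate site.
example_call = is_cleavage_site(12, 15, 30, 40)
```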


The process in steps (iii) and (iv) of the method described above may be Digenome-seq (digested-genome sequencing). For greater details, reference may be made to Korean Patent No. 10-2016-0058703 A (this document is herein incorporated by reference in its entirety).


Base editing sites (i.e., double-strand break sites) of cytidine deaminase, base editing efficiency at on-target sites or target-specificity (i.e., [base editing frequency at on-target sites]/[base editing frequency over entire sequence]), and/or off-target sites (identified as base editing sites of deaminase, but not on-target sites) can be identified (or measured or detected) by the method described above.
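As a non-limiting illustration, the target-specificity ratio defined above can be sketched in Python as follows. The function name and the example frequencies are hypothetical, and the sketch assumes that the "base editing frequency over entire sequence" is the sum of the frequencies measured at all identified sites, including the on-target site.

```python
def target_specificity(on_target_frequency: float, all_site_frequencies) -> float:
    """Return [on-target editing frequency] / [editing frequency over all sites].

    all_site_frequencies must include the on-target site itself.
    """
    total = sum(all_site_frequencies)
    if total <= 0:
        raise ValueError("total editing frequency must be positive")
    return on_target_frequency / total


# Hypothetical frequencies (in %): one on-target site and two off-target sites.
example_specificity = target_specificity(30.0, [30.0, 6.0, 4.0])
```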


The identification (detection) of an off-target site is performed in vitro by treating a genomic DNA with the deaminase and the inactivated target-specific endonuclease. Thus, it can be identified whether off-target effects are actually produced also in vivo in the off-target site detected by this method. However, this is merely an additional verification process, and thus is not a step that is essentially entailed by the scope of the present invention, and is merely a step that can be additionally performed according to the needs.


In the present specification, the term “off-target effect” is intended to mean a level at which base editing and/or double-strand break occurs at an off-target site. The term “indel” (insertion and/or deletion) is a generic term for a mutation in which some bases are inserted or deleted in the middle of a base sequence of DNA.


In another embodiment, a method for identifying (or measuring or detecting) a base editing site, a base editing efficiency at on-target site, an off-target site, and/or target-specificity of a cytosine deaminase can be conducted by a method other than the Digenome-seq method as described above.


In a concrete embodiment, the method for identifying (or measuring or detecting) a base editing site, a base editing efficiency at on-target site, an off-target site, and/or target-specificity of a cytosine deaminase may be conducted by circle-seq method (FIG. 21a). For example, the method may comprise the following steps of:

    • (i) fragmenting and circularizing a genomic DNA extracted from a cell;
    • (ii) treating the circularized DNA fragment with a cytosine deaminase and an inactivated target-specific endonuclease, followed by treating with a uracil-specific excision reagent (USER), to generate a double stranded break in the circularized DNA fragment; and
    • (iii) constructing a library using the DNA fragment in which double-strand break is generated, and performing next-generation genome sequencing (NGS).


The cytosine deaminase and inactivated target-specific endonuclease in step (ii) may be used together with a guide RNA.


In another concrete embodiment, the method for identifying (or measuring or detecting) a base editing site, a base editing efficiency at on-target site, an off-target site, and/or target-specificity of a cytosine deaminase may be conducted by Bless method (FIG. 21b). For example, the method may comprise the following steps of:

    • (i) contacting (a) a cytosine deaminase and an inactivated target-specific endonuclease, or (b) a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, or (c) a plasmid comprising a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, with a cell or DNA isolated from a cell;
    • (ii) treating uracil-specific excision reagent (USER), to generate a double stranded break in DNA;
    • (iii) labeling an end of the cleaved DNA fragment and capturing the labeled DNA fragment; and
    • (iv) amplifying the captured DNA fragment and performing next-generation genome sequencing (NGS).


The cytosine deaminase and inactivated target-specific endonuclease, or a gene encoding the same, or a plasmid comprising the gene in step (i) may be used together with a guide RNA or DNA encoding the guide RNA or a plasmid comprising the DNA.


In another concrete embodiment, the method for identifying (or measuring or detecting) a base editing site, a base editing efficiency at on-target site, an off-target site, and/or target-specificity of a cytosine deaminase may be conducted by DSBCapture method (FIG. 21c). For example, the method may comprise the following steps of:

    • (i) contacting (a) a cytosine deaminase and an inactivated target-specific endonuclease, or (b) a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, or (c) a plasmid comprising a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, with a cell or DNA isolated from a cell;
    • (ii) treating uracil-specific excision reagent (USER), to generate a double stranded break in DNA;
    • (iii) performing an end repair and adaptor ligation for the cleaved DNA fragment; and
    • (iv) amplifying the DNA fragment obtained in step (iii) and performing next-generation genome sequencing (NGS).


The cytosine deaminase and inactivated target-specific endonuclease, or a gene encoding the same, or a plasmid comprising the gene in step (i) may be used together with a guide RNA or DNA encoding the guide RNA or a plasmid comprising the DNA.


Effect of the Invention

The method of generating DNA double-strand break and technologies for analyzing nucleic acid sequence using the method can achieve more accurate and efficient validation of base editing site, a base editing efficiency at on-target site, an off-target site, and/or target-specificity of a cytosine deaminase.





BRIEF DESCRIPTION OF DRAWINGS


FIG. 1a shows the base editing efficiency resulting from BE1 (APOBEC1-dCas9), BE2 (APOBEC1-dCas9-UGI) and BE3 (APOBEC1-nCas9-UGI) (Reference Example 1) at 7 endogenous on-target sites (EMX1, FANCF, HEK2, RNF2, HEK3, HEK4, HBB) in HEK293T cells.



FIG. 1b shows the frequency of Cas9 nuclease-induced mutation measured by targeted deep sequencing at 7 endogenous on-target sites in HEK293T cells.



FIG. 1c is a graph representatively showing base editing efficiency or ranking of indel frequency at 7 endogenous target sites.



FIG. 2a is a graph showing mutation frequency at one of 3 endogenous sites (EMX1) of HEK293T cells which are co-transfected with sgRNA having 0 to 4 mismatches and a plasmid encoding BE3 or Cas9 (wherein the nucleic acid sequences listed are sequentially numbered from SEQ ID NO: 1 to SEQ ID NO: 31 in the downward direction on the graph).



FIG. 2b is a graph showing mutation frequency at one of 3 endogenous sites (HBB) of HEK293T cells which are co-transfected with sgRNA having 0 to 4 mismatches and a plasmid encoding BE3 or Cas9 (wherein the nucleic acid sequences listed are sequentially numbered from SEQ ID NO: 32 to SEQ ID NO: 62 in the downward direction on the graph).



FIG. 2c is a graph showing mutation frequency at one of 3 endogenous sites (RNF2) of HEK293T cells which are co-transfected with sgRNA having 0 to 4 mismatches and a plasmid encoding BE3 or Cas9 (wherein the nucleic acid sequences listed are sequentially numbered from SEQ ID NO: 63 to SEQ ID NO: 93 in the downward direction on the graph).



FIG. 3a is a graph showing Cas9 nuclease associated indel frequency and BE associated base editing frequency at EMX1 site.



FIG. 3b is a graph showing Cas9 nuclease associated indel frequency and BE associated base editing frequency at HBB site.



FIG. 3c is a graph showing Cas9 nuclease associated indel frequency and BE associated base editing frequency at RNF2 site.



FIG. 4a is a schematic view of BE3 Digenome-seq.



FIG. 4b is an electrophoresis image showing the PCR products cleaved by treating BE3 and/or USER.



FIG. 4c is a Sanger sequencing result showing C-to-U conversion by BE3 and DNA cleavage by USER.



FIG. 4d is an IGV image showing straight alignment of the sequence read at on-target site of EMX1.



FIG. 5 is an IGV image showing straight alignment of sequence reads at 6 different on-target sites.



FIGS. 6a (EMX1) and 6b (HBB) are genome-wide Circos plots representing DNA cleavage scores obtained with intact genomic DNA (first layer from the center) and genomic DNA digested with BE3 and USER (second layer from the center) or with Cas9 (third layer from the center, only present in FIG. 6b), where the arrow indicates the on-target site.



FIGS. 6c (EMX1) and 6d (HBB) show sequence logos obtained via WebLogo using DNA sequences at Digenome-capture sites (Tables 2-8) (DNA cleavage score >2.5).



FIGS. 6e (EMX1) and 6f (HBB) represent scatterplots of BE3-mediated substitution frequencies vs Cas9-mediated indel frequencies determined using targeted deep sequencing, wherein circled dots indicate off-target sites validated by BE3 but invalidated by Cas9.



FIGS. 6g (EMX1) and 6h (HBB) show BE3 off-target sites validated in HEK293T cells by targeted deep sequencing, wherein PAM sequences are the last 3 nucleotides at 3′ end, mismatched bases are shown in small letters, and dashes indicate RNA bulges (Error bars indicate s.e.m. (n=3)).



FIG. 7 is a Venn diagram showing the number of sites with DNA cleavage scores 2.5 or higher identified by Digenome-seq of Cas9 nuclease- and Base editor-treated genomic DNA.



FIG. 8 is a graph showing the number of total sites (▪) and the number of PAM-containing sites with ten or fewer mismatches (□) for a range of DNA cleavage scores.



FIG. 9 is a Venn diagram showing the number of PAM-containing homologous sites with DNA cleavage scores of 0.1 or higher identified by Digenome-seq of Cas9 nuclease- and Base editor-treated genomic DNA.



FIG. 10 shows fractions of homologous sites captured by Digenome-seq, wherein bars represent the number of homologous sites that differ from on-target sites by up to 6nt, squares (BE3) and triangles (Cas9) represent the fraction of Digenome-seq captured sites for a range of mismatch numbers.



FIGS. 11a and 11b are graphs showing the significant correlation between the number of BE3- and Cas9-associated sites identified by Digenome 1.0 (11a) and Digenome 2.0 (11b).



FIGS. 12a and 12b are graphs showing the significant correlation between the number of BE3-associated sites identified by Digenome 1.0 (12a) or Digenome 2.0 (12b) and the number of sites with 6 or fewer mismatches.



FIG. 13 shows examples of Digenome-captured off-target sites associated only with Cas9, which contain no cytosines at positions 4-9.



FIGS. 14a-14c show base editing efficiencies at Digenome-captured sites associated only with 3 different Cas9 nucleases.



FIGS. 15a-15c show base editing efficiencies of 3 different BE3 deaminases at Digenome-negative sites.



FIG. 16a is a schematic view showing conventional (gX19 sgRNA), truncated (gX18 or gX17 sgRNA), and extended sgRNAs (gX20 or ggX20 sgRNA).



FIG. 16b shows base-editing frequencies at the HBB on- and off-target sites in HEK293T cells measured by targeted deep sequencing.



FIG. 17 shows the result of reducing BE3 off-target effects using modified sgRNAs, wherein 17a shows a schematic view of conventional sgRNAs (GX19 sgRNA) and modified sgRNAs (GX17 sgRNA, gX18 sgRNA, gX20 sgRNA, and ggX20 sgRNA), and 17b shows base editing efficiencies (frequencies) measured at the EMX1 on- and off-target sites by targeted deep sequencing in HEK293T cells.



FIG. 18a is a cleavage map of plasmid rAPOBEC1-XTEN-dCas9-NLS.



FIG. 18b is a cleavage map of plasmid rAPOBEC1-XTEN-dCas9-UGI-NLS.



FIG. 18c is a cleavage map of plasmid rAPOBEC1-XTEN-Cas9n-UGI-NLS.



FIG. 19 is a cleavage map of Cas9 expression plasmid.



FIG. 20 is a cleavage map of plasmid pET28b-BE1 encoding His6-rAPOBEC1-XTEN-dCas9.



FIGS. 21a to 21c are schematic overviews of genome-wide off-target profiling by a method other than Digenome-seq, wherein FIG. 21a illustrates a method using circle-seq, FIG. 21b illustrates a method using Bless, and FIG. 21c illustrates a method using DSBCapture.



FIG. 22 shows the process and results of BE1 (rAPOBEC1-dCas9)-mediated double strand breaks (DSBs), wherein (a) schematically shows the process of generating DSBs using BE1 (rAPOBEC1-dCas9), USER enzyme, and S1 nuclease, and (b) is an agarose gel electrophoresis image showing BE1-mediated DSB results in PCR amplicons obtained after treatment with BE1/sgRNA, USER enzyme, and S1 nuclease.





DETAILED DESCRIPTION OF THE EMBODIMENTS

Hereinafter, the present invention will be described in more detail with reference to the following examples. However, these are only for illustrating the present invention, and the scope of the present invention is not limited by these examples.


REFERENCE EXAMPLE

1. Cell Culture and Transfection


HEK293T cells (ATCC CRL-11268) were maintained in DMEM (Dulbecco Modified Eagle Medium) supplemented with 10% (w/v) FBS and 1% (w/v) penicillin/streptomycin (Welgene). HEK293T cells (1.5×10⁵) were seeded on 24-well plates and transfected at ~80% confluency with sgRNA plasmid (500 ng) and Base Editor plasmid (Addgene plasmid #73019 (Expresses BE1 with C-terminal NLS in mammalian cells; rAPOBEC1-XTEN-dCas9-NLS; FIG. 18a), #73020 (Expresses BE2 in mammalian cells; rAPOBEC1-XTEN-dCas9-UGI-NLS; FIG. 18b), #73021 (Expresses BE3 in mammalian cells; rAPOBEC1-XTEN-Cas9n-UGI-NLS; FIG. 18c)) (1.5 μg) or Cas9 expression plasmid (Addgene plasmid #43945; FIG. 19), using Lipofectamine 2000 (Invitrogen). Genomic DNA was isolated using DNeasy Blood & Tissue Kit (Qiagen) at 72 hours after transfection. The cells were not tested for mycoplasma contamination. The sgRNA used in the following Examples was constructed by converting T to U over the entire sequence at an on-target site (on-target sequence; see Tables 1-8), except the 3′-terminal PAM sequence (5′-NGG-3′; wherein N is A, T, G, or C), and employing the converted sequence as the targeting sequence ‘(Ncas9)l’ of the following General Formula 3: 5′-(Ncas9)l-(GUUUUAGAGCUA)-(GAAA)-(UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC)-3′ (General Formula 3; oligonucleotide linker: GAAA) (SEQ ID NO: 235).
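As a non-limiting illustration, the construction of an sgRNA sequence according to General Formula 3 can be sketched in Python as follows. The sketch assumes that the PAM (5′-NGG-3′) occupies the last three nucleotides of the listed on-target sequence, as stated for FIGS. 6g and 6h; the function name, constant names, and the example on-target sequence are hypothetical, whereas the three scaffold segments are copied from General Formula 3.

```python
# Segments of General Formula 3 (copied from the description).
CRRNA_SEGMENT = "GUUUUAGAGCUA"
LINKER = "GAAA"
TRACRRNA_SEGMENT = ("UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCA"
                    "CCGAGUCGGUGC")


def build_sgrna(on_target_dna: str, pam_length: int = 3) -> str:
    """Build an sgRNA sequence from an on-target DNA sequence ending in NGG.

    The PAM is removed, T is converted to U in the remaining sequence, and
    the result is used as the targeting sequence of General Formula 3.
    """
    seq = on_target_dna.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("on-target sequence must contain only A, C, G, T")
    if len(seq) <= pam_length or not seq.endswith("GG"):
        raise ValueError("expected a sequence followed by a 3' NGG PAM")
    targeting = seq[:-pam_length].replace("T", "U")
    return targeting + CRRNA_SEGMENT + LINKER + TRACRRNA_SEGMENT


# Hypothetical 20-nt target followed by an AGG PAM (not taken from Tables 1-8).
example_sgrna = build_sgrna("ACGTACGTACGTACGTACGTAGG")
```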


2. Protein Purification


The His6-rAPOBEC1-XTEN-dCas9 protein-coding plasmid (pET28b-BE1; Expresses BE1 with N-terminal His6 tag in E. coli; FIG. 20) was generously given by David Liu (Addgene plasmid #73018). The His6-rAPOBEC1-XTEN-dCas9 protein-coding plasmid pET28b-BE1 was converted into a His6-rAPOBEC1-nCas9 protein (BE3 delta UGI; BE3 variant lacking a UGI domain)-coding plasmid (pET28b-BE3 delta UGI) by site directed mutagenesis for substituting A840 with H840 in the dCas9.


Rosetta expression cells (Novagen, catalog number: 70954-3CN) were transformed with the prepared pET28b-BE1 or pET28b-BE3 delta UGI and cultured overnight in Luria-Bertani (LB) broth containing 100 μg/ml kanamycin and 50 mg/ml carbenicillin at 37° C. Ten ml of the overnight cultures of Rosetta cells containing pET28b-BE1 or pET28b-BE3 delta UGI was inoculated into 400 ml LB broth containing 100 μg/ml kanamycin and 50 mg/ml carbenicillin and cultured at 30° C. until the OD600 reached 0.5-0.6. The cells were cooled to 16° C. for 1 hour, supplemented with 0.5 mM IPTG (Isopropyl β-D-1-thiogalactopyranoside), and cultured for 14-18 hours.


For protein purification, cells were harvested by centrifugation at 5,000×g for 10 min at 4° C. and lysed by sonication in 5 ml lysis buffer (50 mM NaH2PO4, 300 mM NaCl, 1 mM DTT, and 10 mM imidazole, pH 8.0) supplemented with lysozyme (Sigma) and a protease inhibitor (Roche complete, EDTA-free). The soluble lysate obtained after centrifugation of the cell lysis mixture at 13,000 rpm for 30 min at 4° C. was incubated with Ni-NTA agarose resin (Qiagen) for 1 hour at 4° C. The cell lysate/Ni-NTA mixture was applied to a column and washed with a buffer (50 mM NaH2PO4, 300 mM NaCl, and 20 mM imidazole, pH 8.0). The BE3 protein was eluted with an elution buffer (50 mM NaH2PO4, 300 mM NaCl, and 250 mM imidazole, pH 8.0). The eluted protein was buffer-exchanged into a storage buffer (20 mM HEPES-KOH (pH 7.5), 150 mM KCl, 1 mM DTT, and 20% glycerol) and concentrated with centrifugal filter units (Millipore) to give purified rAPOBEC1-XTEN-dCas9 protein and rAPOBEC1-nCas9 protein.


3. Deamination and USER Treatment of PCR Amplification Products


PCR amplification products (10 μg) containing the EMX1 site were incubated with purified rAPOBEC1-nCas9 protein (4 μg) and EMX1-specific sgRNA (3 μg) in a reaction volume of 100 μl for 1 hour at 37° C. The reaction mixtures were then incubated for 30 min at 37° C. with uracil-specific excision reagent (USER; 6 units) (New England Biolabs; containing a mixture of Uracil DNA glycosylase (UDG) and DNA glycosylase-lyase Endonuclease VIII in 50 mM KCl, 5 mM NaCl, 10 mM Tris-HCl (pH 7.4), 0.1 mM EDTA, 1 mM DTT, 175 mg/ml BSA, and 50% (w/v) glycerol) and then subjected to agarose gel electrophoresis.


4. Deamination and USER Treatment of Genomic DNA


Genomic DNA was purified (extracted) from HEK293T cells with a DNeasy Blood & Tissue Kit (Qiagen) according to the manufacturer's instructions. Genomic DNA (10 μg) was incubated with the rAPOBEC1-nCas9 protein (300 nM) purified in Reference Example 2 and an sgRNA (900 nM) in a reaction volume of 500 μL for 8 hours at 37° C. in a buffer (100 mM NaCl, 40 mM Tris-HCl, 10 mM MgCl2, and 100 μg/ml BSA, pH 7.9). After removal of sgRNA using RNase A (50 μg/mL), uracil-containing genomic DNA was purified with a DNeasy Blood & Tissue Kit (Qiagen). The on-target site was amplified by PCR using a SUN-PCR blend and subjected to Sanger sequencing to check BE3-mediated cytosine deamination and USER-mediated DNA cleavage.
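As a quick check on the reaction set-up above, the stated concentrations can be converted to absolute amounts. The following is a minimal sketch, not part of the described protocol; the helper name `picomoles` is hypothetical and introduced only for illustration. It uses only the values given above (300 nM protein, 900 nM sgRNA, 10 μg genomic DNA, 500 μL).

```python
def picomoles(conc_nM: float, volume_uL: float) -> float:
    """Return the amount in pmol of a solute at conc_nM (nmol/L) in volume_uL (µL)."""
    # 1 nM in 1 µL corresponds to 1 fmol, i.e. 1/1000 pmol.
    return conc_nM * volume_uL / 1000.0


volume_uL = 500.0
protein_pmol = picomoles(300.0, volume_uL)   # rAPOBEC1-nCas9 at 300 nM
sgrna_pmol = picomoles(900.0, volume_uL)     # sgRNA at 900 nM
dna_ng_per_uL = 10.0 * 1000.0 / volume_uL    # 10 µg genomic DNA in 500 µL

print(f"protein: {protein_pmol:.0f} pmol, sgRNA: {sgrna_pmol:.0f} pmol")
print(f"sgRNA:protein molar ratio = {sgrna_pmol / protein_pmol:.0f}:1")
print(f"genomic DNA: {dna_ng_per_uL:.0f} ng/µL")
```

Under these values the reaction contains 150 pmol of protein and 450 pmol of sgRNA (a 3:1 molar ratio of sgRNA to protein) with genomic DNA at 20 ng/μL.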


5. Sequencing of Whole Genome and Digenome


Genomic DNA (1 μg) was fragmented to the 400- to 500-bp range using the Covaris system (Life Technologies) and blunt-ended using End Repair Mix (Thermo Fisher). Fragmented DNA was ligated with adapters to produce libraries, which were then subjected to WGS (whole genome sequencing) using a HiSeq X Ten Sequencer (Illumina) at Macrogen.


6. Targeted Deep Sequencing


On-target and potential off-target sites were amplified with a KAPA HiFi HotStart PCR kit (KAPA Biosystems #KK2501) for deep sequencing library generation. Pooled PCR amplicons were sequenced using MiniSeq (Illumina) or MiSeq (Illumina; LAS Inc. Korea) with the TruSeq HT Dual Index system (Illumina).


Example 1. Comparison of BE3-Associated Base Editing Efficiency and Cas9-Associated Indel Frequency in Human Cells

Base editing efficiencies, defined by single-nucleotide substitution frequencies, of three different forms of BEs at seven genomic loci (EMX1, FANCF, HEK2, RNF2, HEK3, HEK4 and HBB) in HEK293T cells were determined and compared with genome editing efficiencies, defined by indel frequencies at target sites, of Cas9 nucleases (FIG. 1a,b). FIG. 1a shows the base editing efficiencies resulting from BE1 (APOBEC1-dCas9), BE2 (APOBEC-dCas9-UGI) and BE3 (APOBEC-nCas9-UGI) (Reference Example 1) at the seven endogenous target sites (EMX1, FANCF, HEK2, RNF2, HEK3, HEK4, HBB) in HEK293T cells. The base editing efficiency was measured by targeted deep sequencing (Reference Example 6). The efficiency of BE3 [APOBEC-nCas9-UGI (uracil DNA glycosylase inhibitor), 29±6%] was superior to that of BE1 (APOBEC1-dCas9, 5±1%) and BE2 (APOBEC-dCas9-UGI, 8±2%). FIG. 1b shows the Cas9 nuclease-induced mutation frequencies measured by targeted deep sequencing at the 7 endogenous target sites in HEK293T cells (the results were obtained by using the Cas9 expression plasmid of Reference Example 1 (Addgene plasmid #43945; FIG. 19)). These results show that BE3 activity is independent of Cas9 nuclease activity. FIG. 1c is a graph representatively showing the ranking of indel frequency or base editing efficiency at the 7 endogenous on-target sites (see Tables 2 to 8). As shown in FIG. 1c, several sgRNAs exhibit low activity when working together with Cas9 but high activity when working together with BE3, while some sgRNAs show the opposite pattern.


Example 2. Tolerance of BE3 and Cas9 to Mismatched sgRNAs

To assess the specificities of BE3 deaminases, it was examined in cells whether BE3 can tolerate mismatches in small guide RNAs (sgRNAs). To this end, plasmids encoding BE3 or Cas9 (Reference Example 1) and sgRNAs with one to four mismatches were co-transfected into HEK293T cells, and mutation frequencies were measured at three endogenous sites (EMX1, HBB, RNF2).


The target sites (including the PAM sequence, which is the last three nucleotides at the 3′ end) of the sgRNAs with 1 to 4 mismatches are summarized in Table 1 below:














TABLE 1

SEQ ID NO: | EMX1 mismatched sgRNAs | SEQ ID NO: | HBB mismatched sgRNAs | SEQ ID NO: | RNF2 mismatched sgRNAs
1 | GgactCGAGCAGAAGAAGAAGGG | 32 | GccatCCCACAGGGCAGTAACGG | 63 | GctgcCTTAGTCATTACCTGAGG
2 | GAGTttagGCAGAAGAAGAAGGG | 33 | GTTGttttACAGGGCAGTAACGG | 64 | GTCActccAGTCATTACCTGAGG
3 | GAGTCCGAatgaAAGAAGAAGGG | 34 | GTTGCCCCgtgaGGCAGTAACGG | 65 | GTCATCTTgactATTACCTGAGG
4 | GAGTCCGAGCAGggagAGAAGGG | 35 | GTTGCCCCACAGaatgGTAACGG | 66 | GTCATCTTAGTCgccgCCTGAGG
5 | GAGTCCGAGCAGAAGAgaggGGG | 36 | GTTGCCCCACAGGGCAacggCGG | 67 | GTCATCTTAGTCATTAttcaAGG
6 | GAactCGAGCAGAAGAAGAAGGG | 37 | GTcatCCCACAGGGCAGTAACGG | 68 | GTtgcCTTAGTCATTACCTGAGG
7 | GAGTCtagGCAGAAGAAGAAGGG | 38 | GTTGCtttACAGGGCAGTAACGG | 69 | GTCATtccAGTCATTACCTGAGG
8 | GAGTCCGAatgGAAGAAGAAGGG | 39 | GTTGCCCCgtgGGGCAGTAACGG | 70 | GTCATCTTgacCATTACCTGAGG
9 | GAGTCCGAGCAaggGAAGAAGGG | 40 | GTTGCCCCACAaaaCAGTAACGG | 71 | GTCATCTTAGTtgcTACCTGAGG
10 | GAGTCCGAGCAGAAaggGAAGGG | 41 | GTTGCCCCACAGGGtgaTAACGG | 72 | GTCATCTTAGTCATcgtCTGAGG
11 | GAGTCCGAGCAGAAGAAaggGGG | 42 | GTTGCCCCACAGGGCAGcggCGG | 73 | GTCATCTTAGTCATTACtcaAGG
12 | GAacCCGAGCAGAAGAAGAAGGG | 43 | GTcaCCCCACAGGGCAGTAACGG | 74 | GTtgTCTTAGTCATTACCTGAGG
13 | GAGTttGAGCAGAAGAAGAAGGG | 44 | GTTGttCCACAGGGCAGTAACGG | 75 | GTCActTTAGTCATTACCTGAGG
14 | GAGTCCagGCAGAAGAAGAAGGG | 45 | GTTGCCttACAGGGCAGTAACGG | 76 | GTCATCccAGTCATTACCTGAGG
15 | GAGTCCGAatAGAAGAAGAAGGG | 46 | GTTGCCCCgtAGGGCAGTAACGG | 77 | GTCATCTTgaTCATTACCTGAGG
16 | GAGTCCGAGCgaAAGAAGAAGGG | 47 | GTTGCCCCACgaGGCAGTAACGG | 78 | GTCATCTTAGctATTACCTGAGG
17 | GAGTCCGAGCAGggGAAGAAGGG | 48 | GTTGCCCCACAGaaCAGTAACGG | 79 | GTCATCTTAGTCgcTACCTGAGG
18 | GAGTCCGAGCAGAAagAGAAGGG | 49 | GTTGCCCCACAGGGtgGTAACGG | 80 | GTCATCTTAGTCATcgCCTGAGG
19 | GAGTCCGAGCAGAAGAgaAAGGG | 50 | GTTGCCCCACAGGGCAacAACGG | 81 | GTCATCTTAGTCATTAttTGAGG
20 | GAGTCCGAGCAGAAGAAGggGGG | 51 | GTTGCCCCACAGGGCAGTggCGG | 82 | GTCATCTTAGTCATTACCcaAGG
21 | GgGTCCGAGCAGAAGAAGAAGGG | 52 | GcTGCCCCACAGGGCAGTAACGG | 83 | GcCATCTTAGTCATTACCTGAGG
22 | GAGcCCGAGCAGAAGAAGAAGGG | 53 | GTTaCCCCACAGGGCAGTAACGG | 84 | GTCgTCTTAGTCATTACCTGAGG
23 | GAGTCtGAGCAGAAGAAGAAGGG | 54 | GTTGCtCCACAGGGCAGTAACGG | 85 | GTCATtTTAGTCATTACCTGAGG
24 | GAGTCCGgGCAGAAGAAGAAGGG | 55 | GTTGCCCtACAGGGCAGTAACGG | 86 | GTCATCTcAGTCATTACCTGAGG
25 | GAGTCCGAGtAGAAGAAGAAGGG | 56 | GTTGCCCCAtAGGGCAGTAACGG | 87 | GTCATCTTAaTCATTACCTGAGG
26 | GAGTCCGAGCAaAAGAAGAAGGG | 57 | GTTGCCCCACAaGGCAGTAACGG | 88 | GTCATCTTAGTtATTACCTGAGG
27 | GAGTCCGAGCAGAgGAAGAAGGG | 58 | GTTGCCCCACAGGaCAGTAACGG | 89 | GTCATCTTAGTCAcTACCTGAGG
28 | GAGTCCGAGCAGAAGgAGAAGGG | 59 | GTTGCCCCACAGGGCgGTAACGG | 90 | GTCATCTTAGTCATTgCCTGAGG
29 | GAGTCCGAGCAGAAGAAaAAGGG | 60 | GTTGCCCCACAGGGCAGcAACGG | 91 | GTCATCTTAGTCATTACtTGAGG
30 | GAGTCCGAGCAGAAGAAGAgGGG | 61 | GTTGCCCCACAGGGCAGTAgCGG | 92 | GTCATCTTAGTCATTACCTaAGG
31 | GAGTCCGAGCAGAAGAAGAAGGG (on target sequence) | 62 | GTTGCCCCACAGGGCAGTAACGG (on target sequence) | 93 | GTCATCTTAGTCATTACCTGAGG (on target sequence)









(In Table 1, the base position in a lower-case letter refers to the mismatched site)
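The lower-case convention of Table 1 can be read programmatically. The following is a minimal sketch, assuming each entry is a 20-nt protospacer followed by a 3-nt PAM; the function name `mismatch_positions` is hypothetical and introduced only for illustration.

```python
def mismatch_positions(target: str, pam_len: int = 3) -> list[int]:
    """Return 1-based positions (from the 5' end) of lower-case, i.e. mismatched, bases.

    Assumption of this sketch: the last `pam_len` characters are the PAM and
    are not scanned for mismatches.
    """
    protospacer = target[:-pam_len]
    return [pos for pos, base in enumerate(protospacer, start=1) if base.islower()]


# Sequences copied from Table 1 (EMX1 column).
four_mismatches = "GgactCGAGCAGAAGAAGAAGGG"   # SEQ ID NO: 1
two_mismatches = "GAGTCCGAGCAGAAGAAGggGGG"    # SEQ ID NO: 20
on_target = "GAGTCCGAGCAGAAGAAGAAGGG"         # SEQ ID NO: 31

for name, seq in [("SEQ ID NO: 1", four_mismatches),
                  ("SEQ ID NO: 20", two_mismatches),
                  ("SEQ ID NO: 31", on_target)]:
    print(name, mismatch_positions(seq))
```

For these three entries the mismatched positions are 2 to 5 (PAM-distal), 19 and 20 (PAM-proximal), and none, respectively.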


The results (indel frequency and cytosine conversion frequency) obtained with the mismatched sequences and the on-target sequences listed in Table 1 are shown in FIGS. 2a to 2c (2a: EMX1, 2b: HBB and 2c: RNF2; error bars indicate s.e.m. (n=3)). In FIGS. 2a to 2c, the bars indicated as ‘Cn’ show the mutation (substitution with another base, or deletion) frequency of the cytosine (C) at the n-th position from the 5′ end of the mismatched sequence or on-target sequence. The indel frequency and the cytosine conversion frequency (base editing frequency) were measured using targeted deep sequencing (Reference Example 6). The primers used for the targeted deep sequencing are as follows:









EMX1
1st PCR
Forward (5′→3′): AGTGTTGAGGCCCCAGTG (SEQ ID NO: 94);
Reverse (5′→3′): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCAGCAGCAAGCAGCACTCT (SEQ ID NO: 95);
2nd PCR
Forward (5′→3′): ACACTCTTTCCCTACACGACGCTCTTCCGATCTGGGCCTCCTGAGTTTCTCAT (SEQ ID NO: 96);
Reverse (5′→3′): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCAGCAGCAAGCAGCACTCT (SEQ ID NO: 97);

HBB
1st PCR
Forward (5′→3′): GGCAGAGAGAGTCAGTGCCTA (SEQ ID NO: 98);
Reverse (5′→3′): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCAGGGCTGGGCATAAAAGT (SEQ ID NO: 99);
2nd PCR
Forward (5′→3′): ACACTCTTTCCCTACACGACGCTCTTCCGATCTGTCTCCACATGCCCAGTTTC (SEQ ID NO: 100);
Reverse (5′→3′): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCAGGGCTGGGCATAAAAGT (SEQ ID NO: 101);

RNF2
1st PCR
Forward (5′→3′): CCATAGCACTTCCCTTCCAA (SEQ ID NO: 102);
Reverse (5′→3′): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCCAACATACAGAAGTCAGGAA (SEQ ID NO: 103);
2nd PCR
Forward (5′→3′): ACACTCTTTCCCTACACGACGCTCTTCCGATCTATTTCCAGCAATGTCTCAGG (SEQ ID NO: 104);
Reverse (5′→3′): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCCAACATACAGAAGTCAGGAA (SEQ ID NO: 105).






In addition, the Cas9 nuclease-associated indel frequency and the BE3-associated base editing frequency at the EMX1 site (FIG. 3a), HBB site (FIG. 3b), and RNF2 site (FIG. 3c) were measured using mismatched sgRNAs (Table 1), and the obtained results are shown in FIGS. 3a to 3c. As shown in FIGS. 3a-3c, there is a statistically significant correlation (R² = 0.70, 0.83, and 0.72 at the three sites, respectively) between the Cas9-induced indel frequency and the BE3-induced substitution frequency.


BE3 deaminases and Cas9 nucleases tolerated one-nucleotide (nt) mismatches at almost every position and 2-nt mismatches in the protospacer-adjacent motif (PAM)-distal region but did not tolerate most of the 3-nt or 4-nt mismatches in either the PAM-proximal or distal regions. We noticed, however, that several sgRNAs (indicated by asterisks in FIG. 2) with two or three mismatches were highly active with BE3 but not with Cas9 or vice versa. For example, BE3 with the fully-matched sgRNA or with a 3-nt mismatched sgRNA induced substitutions at comparable frequencies (33% vs. 14%) at the EMX1 site, whereas Cas9 with the same matched and 3-nt mismatched sgRNAs showed widely different indel frequencies (50% vs. 2%) (FIG. 2a). Conversely, BE3 with two 2-nt mismatched sgRNAs was poorly active (substitution frequencies <1%), whereas Cas9 with the same mismatched sgRNAs was highly active (indel frequencies >10%) (FIG. 2a). These results indicate that the tolerance of Cas9 nucleases and BE3 deaminases for mismatched sgRNAs can differ and imply that BE3 and Cas9 could have separate sets of off-target sites in the genome, calling for a method to profile genome-wide specificities of RNA-programmable deaminases.
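The ‘Cn’ labels used in FIGS. 2a to 2c index cytosines by their position from the 5′ end of the target sequence. As an illustration only, the labels for a given target can be enumerated as follows. The function name `cytosine_labels` is hypothetical, and excluding the 3-nt PAM from the scan is an assumption of this sketch rather than something stated in the text.

```python
def cytosine_labels(target: str, pam_len: int = 3) -> list[str]:
    """Return 'Cn' labels for each cytosine at the n-th position from the 5' end.

    Assumption of this sketch: only the protospacer (the target without its
    3'-terminal PAM) is scanned.
    """
    protospacer = target[:-pam_len].upper()
    return [f"C{pos}" for pos, base in enumerate(protospacer, start=1) if base == "C"]


# On-target sequences from Table 1.
emx1 = "GAGTCCGAGCAGAAGAAGAAGGG"   # SEQ ID NO: 31
rnf2 = "GTCATCTTAGTCATTACCTGAGG"   # SEQ ID NO: 93

print("EMX1:", cytosine_labels(emx1))
print("RNF2:", cytosine_labels(rnf2))
```

For the EMX1 on-target sequence this gives C5, C6 and C10, and for RNF2 it gives C3, C6, C12, C17 and C18.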


Example 3. Digenome-Seq for Identifying BE3 Off-Target Sites in Human Genome

Several different cell-based methods, which include GUIDE-seq (Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nature biotechnology 33, 187-197 (2015)), HTGTS (Frock, R. L. et al. Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nature biotechnology (2014)), BLESS (Ran, F. A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186-191 (2015)), and IDLV capture (Wang, X. et al. Unbiased detection of off-target cleavage by CRISPR-Cas9 and TALENs using integrase-defective lentiviral vectors. Nature biotechnology 33, 175-178 (2015)), have been developed for identifying genome-wide off-target sites at which Cas9 nucleases induce DSBs. None of these methods, at least in their present forms, is suitable for assessing the genome-wide specificities of programmable deaminases, simply because deaminases do not yield DSBs. We reasoned that DSBs could be produced at deaminated, uracil-containing sites in vitro using appropriate enzymes and that these DNA cleavage sites could be identified via Digenome-seq (digested-genome sequencing; Kim, D., Kim, S., Kim, S., Park, J. & Kim, J. S. Genome-wide target-specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq. Genome research 26, 406-415 (2016); Kim, D. et al. Genome-wide analysis reveals specificities of Cpf1 endonucleases in human cells. Nature biotechnology 34, 863-868 (2016); Kim, D. et al. Digenome-seq: genome-wide profiling of CRISPR-Cas9 off-target effects in human cells. Nature methods 12, 237-243, 231 p following 243 (2015)), an in vitro method used for assessing genome-wide specificities of Cas9 and Cpf1 nucleases.


To test this idea, a PCR amplicon containing a target sequence was incubated (1) with the recombinant rAPOBEC1-nCas9 protein (Reference Example 2), a derivative of BE3 with no UGI domain, and its sgRNA in vitro to induce C-to-U conversions and a nick in the Watson and Crick strands, respectively, and then (2) with USER (Uracil-Specific Excision Reagent), a mixture of E. coli Uracil DNA glycosylase (UDG) and DNA glycosylase-lyase Endonuclease VIII, to generate a gap at the location of the uracils, giving rise to a composite DSB (FIG. 4a). Next it was investigated whether Digenome-seq could be used to assess genome-wide target-specificities of BE3 deaminases. Human genomic DNA, purified from HEK293T cells, was incubated with each of 7 different BE3 ribonucleoproteins (RNPs) (300 nM rAPOBEC1-nCas9 protein and 900 nM sgRNA each) for 7 hours three times, and then with USER for 3 hours (FIG. 4a).



FIG. 4a shows an outline of BE3 Digenome-seq, in which uracil-containing sites produced by BE3 are cleaved by USER, a mixture of E. coli Uracil DNA glycosylase (UDG) and DNA glycosylase-lyase Endonuclease VIII. FIG. 4b is an electrophoresis image showing the PCR products after treatment with BE3 and/or USER. As shown in FIG. 4b, the PCR amplicon was cleaved when incubated with both BE3 and USER.


C-to-U conversions induced by BE3 and uracil removal by USER were confirmed by Sanger sequencing (FIG. 4c). FIG. 4c is a Sanger sequencing result showing C-to-U conversion by BE3 and DNA cleavage by USER. Each genomic DNA sample was subjected to whole genome sequencing (WGS) after end repair and adaptor ligation (FIG. 4a).


After sequence alignment to the human reference genome (hg19), we used Integrative Genomics Viewer (IGV) to monitor alignment patterns at each on-target site, and the results are shown in FIGS. 4d and 5. FIG. 4d is an IGV image showing straight alignment of the sequence reads at the on-target site of EMX1, and FIG. 5 is an IGV image showing straight alignment of sequence reads at 6 different on-target sites. As shown in FIGS. 4d and 5, uniform alignments of sequence reads, signature patterns associated with DSBs produced in vitro, were observed at all 7 on-target sites.


Example 4. Genome-Wide BE3 Off-Target Sites Revealed by Digenome-Seq

To identify BE3 off-target sites in the human genome, a DNA cleavage score, based on the number of sequence reads whose 5′ ends aligned at a given position, was assigned to each nt position across the genome, and all the sites with scores over 2.5 were listed. This cutoff value was used for finding off-target sites of Cas9 nucleases with the same set of 7 sgRNAs in the inventors' previous study (Kim, D., Kim, S., Kim, S., Park, J. & Kim, J. S. Genome-wide target-specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq. Genome research 26, 406-415 (2016)) (FIG. 6a-d and Tables 2-8).


The DNA cleavage score at site i of each nucleotide (i.e., the nucleotide position on genomic DNA) was calculated by the following formula:








$$
\text{Score at the } i \text{ site} = \sum_{a=1}^{5} \frac{C\,(F_i - 1)}{D_i} \times \frac{C\,(R_{i-4+a} - 1)}{D_{i-4+a}} \times \left(F_i + R_{i-4+a} - 2\right) + \sum_{a=1}^{5} \frac{C\,(R_{i-1} - 1)}{D_{i-1}} \times \frac{C\,(F_{i-3+a} - 1)}{D_{i-3+a}} \times \left(R_{i-1} + F_{i-3+a} - 2\right)
$$










    • Fi: Number of forward sequence reads starting at the i site

    • Ri: Number of reverse sequence reads starting at the i site

    • Di: Sequencing depth at the i site

    • C: Arbitrary constant





In the above formula, the number of sequence reads means the number of nucleotide sequence reads, the sequencing depth means the number of sequencing reads at a specific site, and the C value is 1.
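The scoring formula can be sketched in code as follows. This is an illustrative implementation under stated assumptions, not the inventors' software: the function name `cleavage_score` is hypothetical, F, R and D are assumed to be per-position sequences indexed by genomic coordinate, the reverse-read term of the second sum is taken to be R at position i−1 in both places it occurs, and terms with zero sequencing depth are skipped to avoid division by zero (a choice made here, not stated in the text).

```python
from typing import Sequence


def cleavage_score(F: Sequence[float], R: Sequence[float], D: Sequence[float],
                   i: int, C: float = 1.0) -> float:
    """DNA cleavage score at position i.

    F[x]: number of forward reads starting at x
    R[x]: number of reverse reads starting at x
    D[x]: sequencing depth at x
    C:    arbitrary constant (1 in the text)
    """
    # The sums touch positions i-3 .. i+2; refuse to wrap around the array ends.
    if i - 3 < 0 or i + 2 >= len(D):
        raise ValueError("position i is too close to the ends of the arrays")

    score = 0.0
    # First sum: forward reads starting at i paired with reverse reads near i.
    for a in range(1, 6):
        j = i - 4 + a
        if D[i] > 0 and D[j] > 0:  # assumption: skip zero-depth terms
            score += (C * (F[i] - 1) / D[i]) * (C * (R[j] - 1) / D[j]) * (F[i] + R[j] - 2)
    # Second sum: reverse reads starting at i-1 paired with forward reads near i.
    for a in range(1, 6):
        k = i - 3 + a
        if D[i - 1] > 0 and D[k] > 0:  # assumption: skip zero-depth terms
            score += (C * (R[i - 1] - 1) / D[i - 1]) * (C * (F[k] - 1) / D[k]) * (R[i - 1] + F[k] - 2)
    return score


# Toy input (not experimental data): one read start per position everywhere,
# except 11 forward reads starting at position 5 and 11 reverse reads at position 4.
F = [1] * 11
R = [1] * 11
D = [10] * 11
F[5] = 11
R[4] = 11

print(cleavage_score(F, R, D, 5))
print(cleavage_score(F, R, D, 8))
```

In the toy input, the pile-up of forward read starts at position 5 and reverse read starts at position 4 gives a score of 40.0 at position 5, whereas a position away from the pile-up (position 8) scores 0.0. This is the behavior the score is designed to capture: uniform read ends on both strands at one position, as expected at an in vitro cleavage site.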


Digenome-captured sites (cleavage site+PAM) and DNA cleavage score are shown in Tables 2 to 8 below:









TABLE 2

(On target sequence: EMX1_4)
EMX1

ID | Chr | Position | DNA cleavage score | DNA sequence at cleavage site | SEQ ID NO | Bulge
EMX1_1 | chr15 | 44109763 | 30.53 | GAGTCtaAGCAGAAGAAGAAGAG | 106 | x
EMX1_2 | chr11 | 62365273 | 26.44 | GAaTCCaAGCAGAAGAAGAgAAG | 107 | x
EMX1_3 | chr5 | 9227162 | 23.66 | aAGTCtGAGCAcAAGAAGAATGG | 108 | x
EMX1_4 | chr2 | 73160998 | 14.55 | GAGTCCGAGCAGAAGAAGAAGGG | 31 | x
EMX1_5 | chr4 | 131662222 | 11.14 | GAaTCCaAG-AGAAGAAGAATGG | 109 | RNA bulge
EMX1_6 | chr8 | 128801258 | 9.60 | GAGTCCtAGCAGgAGAAGAAGAG | 110 | x
EMX1_7 | chr19 | 24250503 | 8.35 | GAGTCCaAGCAGtAGAgGAAGGG | 111 | x
EMX1_8 | chr1 | 4515013 | 8.12 | GtGTCCtAG-AGAAGAAGAAGGG | 112 | RNA bulge
EMX1_9 | chr1 | 23720618 | 5.96 | aAGTCCGAGgAGAgGAAGAAAGG | 113 | x
EMX1_10 | chr2 | 219845072 | 5.47 | GAGgCCGAGCAGAAGAAagACGG | 114 | x
EMX1_11 | chr8 | 102244551 | 4.70 | agtTCCaAGCAGAAGAAGcATGG | 115 | x
EMX1_12 | chr3 | 45605387 | 3.11 | GAGTCCacaCAGAAGAAGAAAGA | 116 | x
EMX1_13 | chr16 | 12321159 | 3.01 | GAGTCCaAG-AGAAGAAGtgAGG | 117 | RNA bulge
EMX1_14 | chr9 | 111348573 | 1.56 | GAGTCCttG-AGAAGAAGgAAGG | 118 | RNA bulge
EMX1_15 | chr3 | 5031614 | 1.50 | GAaTCCaAGCAGgAGAAGAAGGA | 119 | x
EMX1_16 | chr14 | 31216733 | 1.34 | GtacCaGAG-AGAAGAAGAgAGG | 120 | RNA bulge
EMX1_17 | chr14 | 48932119 | 1.16 | GAGTCCcAGCAaAAGAAGAAAAG | 121 | x
EMX1_18 | chr11 | 107812992 | 1.04 | aAGTCCaAGt-GAAGAAGAAAGG | 122 | RNA bulge
EMX1_19 | chr12 | 106646090 | 1.03 | aAGTCCatGCAGAAGAgGAAGGG | 123 | x
EMX1_20 | chr2 | 71969823 | 0.80 | GAGTCCtAG-AGAAGAAaAAGGG | 124 | RNA bulge
EMX1_21 | chr3 | 145057362 | 0.48 | GAGTCCct-CAGgAGAAGAAAGG | 125 | RNA bulge
EMX1_22 | chr6 | 9118799 | 0.45 | acGTCtGAGCAGAAGAAGAATGG | 126 | x
EMX1_23 | chr1 | 59750259 | 0.27 | GAGTtCcAGaAGAAGAAGAAGAG | 127 | x
EMX1_24 | chr11 | 79484079 | 0.22 | GAGTCCtAa-AGAAGAAGcAGGG | 128 | RNA bulge
EMX1_25 | chr9 | 135663403 | 0.21 | cAGTCCaAaCAGAAGAgGAATGG | 129 | x
















TABLE 3

(On target sequence: FANCF_2)
FANCF

ID | Chr | Position | DNA cleavage score | DNA sequence at cleavage site | SEQ ID NO | Bulge
FANCF_1 | chr10 | 73463135 | 13.34 | tGAATCCCaTCTcCAGCACCAGG | 130 | x
FANCF_2 | chr11 | 22647338 | 7.04 | GGAATCCCTTCTGCAGCACCTGG | 131 | x
FANCF_3 | chr10 | 43410030 | 6.53 | GGAgTCCCTcCTaCAGCACCAGG | 132 | x
FANCF_4 | chr10 | 37953199 | 5.67 | GGAgTCCCTcCTaCAGCACCAGG | 133 | x
FANCF_5 | chr11 | 47554037 | 5.13 | GGAATCCCTTCTaCAGCAtCCTG | 134 | x
FANCF_6 | chr16 | 49671025 | 3.00 | GGAgTCCCTcCTGCAGCACCTGA | 135 | x
FANCF_7 | chr18 | 8707528 | 1.26 | GGAAcCCCgTCTGCAGCACCAGG | 136 | x
FANCF_8 | chr7 | 44076496 | 0.95 | GtctcCCCTTCTGCAGCACCAGG | 137 | x
FANCF_9 | chr9 | 113162294 | 0.46 | aaAATCCCTTCcGCAGCACCTAG | 138 | x
FANCF_10 | chr15 | 49119756 | 0.42 | tGtATttCTTCTGCctCAggCTG | 139 | x
FANCF_11 | chr2 | 54853314 | 0.39 | GGAATatCTTCTGCAGCcCCAGG | 140 | x
FANCF_12 | chr8 | 21374810 | 0.37 | GagtgCCCTgaaGCctCAgCTGG | 141 | x
FANCF_13 | chrX | 86355179 | 0.35 | accATCCCTcCTGCAGCACCAGG | 142 | x
FANCF_14 | chr3 | 35113165 | 0.20 | tGAATCCtaaCTGCAGCACCAGG | 143 | x
FANCF_15 | chr10 | 3151994 | 0.13 | ctctgtCCTTCTGCAGCACCTGG | 144 | x
















TABLE 4

(On target sequence: RNF2_1)
RNF2

ID | Chr | Position | DNA cleavage score | DNA sequence at cleavage site | SEQ ID NO | Bulge
RNF2_1 | chr1 | 185056773 | 27.66 | GTCATCTTAGTCATTACCTGAGG | 93 | x
















TABLE 5

(On target sequence: HBB_1)
HBB

ID | Chr | Position | DNA cleavage score | DNA sequence at cleavage site | SEQ ID NO | Bulge
HBB_1 | chr11 | 5248214 | 17.68 | CTTGCCCCACAGGGCAGTAACGG | 145 | x
HBB_2 | chr17 | 8370252 | 13.64 | tTgctCCCACAGGGCAGTAAACG | 146 | x
HBB_3 | chr12 | 124803834 | 10.88 | gcTGCCCCACAGGGCAGcAAAGG | 147 | x
HBB_4 | chrX | 75006256 | 2.34 | gTgGCCCCACAGGGCAGgAATGG | 148 | x
HBB_5 | chr12 | 93549201 | 0.55 | aTTGCCCCACgGGGCAGTgACGG | 149 | x
HBB_6 | chr10 | 95791920 | 0.27 | acTctCCCACAaGGCAGTAAGGG | 150 | x
HBB_7 | chr9 | 104595883 | 0.18 | tcaGCCCCACAGGGCAGTAAGGG | 151 | x
















TABLE 6

(On target sequence: HEK2_2)
HEK2

ID | Chr | Position | DNA cleavage score | DNA sequence at cleavage site | SEQ ID NO | Bulge
HEK2_1 | chr4 | 90522183 | 18.27 | GAACACAAtGCATAGAtTGCCGG | 152 | x
HEK2_2 | chr5 | 87240613 | 7.54 | GAACACAAAGCATAGACTGCGGG | 153 | x
HEK2_3 | chr2 | 19844956 | 0.93 | aActcCAAAGCATAtACTGCTGG | 154 | x
















TABLE 7

(On target sequence: HEK3_2)
HEK3

ID | Chr | Position | DNA cleavage score | DNA sequence at cleavage site | SEQ ID NO | Bulge
HEK3_1 | chr1 | 47005705 | 29.27 | aGCtCAGACTGAGCAaGTGAGGG | 155 | x
HEK3_2 | chr9 | 110184636 | 11.38 | GGCCCAGACTGAGCACGTGATGG | 156 | x
HEK3_3 | chr19 | 882560 | 10.90 | GGCCCAGA--GAGCACGTGtGGG | 157 | RNA bulge
HEK3_4 | chr15 | 79749930 | 3.03 | caCCCAGACTGAGCACGTGcTGG | 158 | x
HEK3_5 | chr17 | 34954539 | 2.10 | GGCCCa-ACTGAGCAaGTGATGG | 159 | RNA bulge
HEK3_6 | chrX | 114764149 | 1.66 | aGaCCAGACTGAGCAaGaGAGGG | 160 | x
HEK3_7 | chr6 | 73097166 | 0.15 | GGCCactcaTGgcCACaTacTGG | 161 | x
















TABLE 8

(On target sequence: HEK4_1)
HEK4

ID | Chr | Position | DNA cleavage score | DNA sequence at cleavage site | SEQ ID NO | Bulge
HEK4_1 | chr20 | 31349772 | 19.26 | GGCACTGCGGCTGGAGGTGGGGG | 162 | x
HEK4_2 | chr6 | 160517881 | 15.45 | GGCACTGCtGCTGGgGGTGGTGG | 163 | x
HEK4_3 | chr6 | 168787137 | 15.37 | GGCACTGCa-CTGGAGGTtGTGG | 164 | RNA bulge
HEK4_4 | chr19 | 33382081 | 13.83 | GGCtCTGCGGCTGGAGGgGGTGG | 165 | x
HEK4_5 | chr20 | 60080553 | 12.71 | aGCACTGCaGaTGGAGGaGGCGG | 166 | x
HEK4_6 | chr5 | 141232853 | 10.87 | GGCACTGCGGCaGGgaGgaGGGG | 167 | x
HEK4_7 | chr20 | 60010562 | 10.51 | tGCACTGCGGCcGGAGGaGGTGG | 168 | x
HEK4_8 | chr13 | 70136736 | 8.76 | GGCACT-gGGCTGaAGGTaGAGG | 169 | RNA bulge
HEK4_9 | chr20 | 1151854 | 8.41 | GGCACTGtGGCTGcAGGTGGAGG | 170 | x
HEK4_10 | chr15 | 71686928 | 7.70 | tGCtCTGCGGCaGGAGGaGGAGG | 171 | x
HEK4_11 | chr7 | 1397398 | 6.71 | aGCACTGCaGCTGGgaGTGGAGG | 172 | x
HEK4_12 | chr20 | 45343010 | 6.57 | GGCACTGaGGgTGGAGGTGGGGG | 173 | x
HEK4_13 | chr8 | 20854500 | 5.57 | GGCACTGgGGCTGGAGacGGGGG | 174 | x
HEK4_14 | chr7 | 54561437 | 5.40 | aGgACTGCGGCTGGgGGTGGTGG | 175 | x
HEK4_15 | chr15 | 60790561 | 5.29 | GGCACTGCaaCTGGAaGTGaTGG | 176 | x
HEK4_16 | chr13 | 27629410 | 4.40 | GGCACTGgGGtTGGAGGTGGGGG | 177 | x
HEK4_17 | chr7 | 110143150 | 3.69 | GcCACTGCaGCTaGAGGTGGAGG | 178 | x
HEK4_18 | chr7 | 139244406 | 3.59 | GcCACTGCGaCTGGAGGaGGGGG | 179 | x
HEK4_19 | chr19 | 2474643 | 3.56 | GGCACTG-GGCTGGAGGcGGGGG | 180 | RNA bulge
HEK4_20 | chr2 | 6961255 | 3.17 | aGCtCTGCGGCaGGAGtTGGAGG | 181 | x
HEK4_21 | chr17 | 75429280 | 2.90 | GaCACcaCGGCTGGAGaTGGTGG | 182 | x
HEK4_22 | chr7 | 17979717 | 2.66 | GcactgGCaGCcGGAGGTGGTGG | 183 | DNA bulge
HEK4_23 | chr9 | 5020590 | 2.64 | tGCACTGCaGCTGcAGGTGGAGG | 184 | x
HEK4_24 | chrX | 122479548 | 2.52 | GGCACTG-GGCTGGAGaTGGAGG | 185 | RNA bulge
HEK4_25 | chr12 | 104739608 | 2.48 | ccttCTGCGGCTGGAaGTGGTGG | 186 | x
HEK4_26 | chr17 | 40693638 | 2.38 | GcactgcaGGCaGGAGGTGaGTG | 187 | DNA bulge
HEK4_27 | chr8 | 144781301 | 2.38 | GaCACTGCaGCTGGAGGTGGGGT | 188 | x
HEK4_28 | chr9 | 74103955 | 2.36 | GGCACTGCaGCaGGgGaTGGGGG | 189 | x
HEK4_29 | chr18 | 37194558 | 2.31 | GGCACTGCGGgTGGAGGcGGGGG | 190 | x
HEK4_30 | chr20 | 60895671 | 2.12 | GGCACaGCaGCTGGAGGTGcTGG | 191 | x
HEK4_31 | chr12 | 113935460 | 1.63 | GGCcCTGCGGCTGGAGaTatGGG | 192 | x
HEK4_32 | chrX | 70597642 | 1.57 | GaCACTGC-tCTGGAGGTGGTGG | 193 | RNA bulge
HEK4_33 | chr15 | 41044242 | 1.31 | GGCgGGAGCTGCGGCgGTGGAGG | 194 | x
HEK4_34 | chr17 | 176302 | 1.18 | tGCACTGtGGCTGGAGaTGGGGG | 195 | x
HEK4_35 | chr10 | 77103119 | 1.15 | GGCAtcaCGGCTGGAGGTGGAGG | 196 | x
HEK4_36 | chr7 | 134872032 | 0.93 | aGCACTGtGGCTGGgGGaGGCGG | 197 | x
HEK4_37 | chr9 | 133039175 | 0.86 | GtCACTGCaGCTGGAGGaGGGGG | 198 | x
HEK4_38 | chr10 | 73435248 | 0.79 | GtaACTGCGGCTGGcGGTGGTGG | 199 | x
HEK4_39 | chr14 | 21993455 | 0.78 | GGtACaGCGGCTGGgGGaGGCGG | 200 | x
HEK4_40 | chr17 | 29815563 | 0.59 | GGCgCTGCGGCcGGAGGTGGGGC | 201 | x
HEK4_41 | chr16 | 50300346 | 0.56 | aGCACTGtGGCTGGgGGaGGGGG | 202 | x
HEK4_42 | chr11 | 78127584 | 0.53 | tGCACTGCaGCTGGAGGcaaCGG | 203 | x
HEK4_43 | chr19 | 1295086 | 0.52 | GaCACTGaGGCaGGAGGTGGGGG | 204 | x
HEK4_44 | chr2 | 162283033 | 0.51 | GGCAtctgGGTGGCTGGgaGGGG | 205 | x
HEK4_45 | chr20 | 24376056 | 0.47 | GGCACTGaGaCcaGAGGTGGTGG | 206 | x
HEK4_46 | chr16 | 1029977 | 0.42 | GGCACTGCaGacGGAGGTGtGGG | 207 | x
HEK4_47 | chr19 | 47503406 | 0.39 | GGCACTG-GGCTGGAGGgGaGAG | 208 | RNA bulge
HEK4_48 | chr2 | 231467380 | 0.39 | GGCACTGCaGCTGGgGGTtGGTG | 209 | x
HEK4_49 | chr10 | 13692636 | 0.38 | GGCACTGgGGCTGGgGGaGGGGG | 210 | x
HEK4_50 | chr1 | 32471659 | 0.34 | GGCACTtCaGCTGGAGGcaGAGG | 211 | x
HEK4_51 | chr17 | 8634933 | 0.33 | GGCACat-GGaTGGAGGTGGAGG | 212 | RNA bulge
HEK4_52 | chr6 | 83388605 | 0.30 | aGCACTGtGG-TGGAGGTGGAGG | 213 | RNA bulge
HEK4_53 | chr10 | 27700491 | 0.29 | GGCACTG-GGtTGGgGGTGGTGG | 214 | RNA bulge
HEK4_54 | chr1 | 143662284 | 0.27 | GGCACat-GGCTGGgGGTGGTGG | 215 | RNA bulge
HEK4_55 | chr16 | 49777696 | 0.22 | tGCACTGCGaCTGGAGGgaGAGG | 216 | x
HEK4_56 | chr19 | 38616186 | 0.19 | GGCACTGaGaCTGGgGGTGGGGG | 217 | x
HEK4_57 | chr10 | 126752487 | 0.18 | GGCACTGCaGCctGgGGgtGGGG | 218 | x
HEK4_58 | chr16 | 28266968 | 0.17 | GGCtCTtCGGCTGGAGGTaGCGG | 219 | x
HEK4_59 | chr2 | 149886210 | 0.15 | GaCACTG-GGCTGGAGGTtGCGG | 220 | RNA bulge
HEK4_60 | chr20 | 37471343 | 0.15 | aGCACTGtGcCTGGgGGTGGGGG | 221 | x
HEK4_61 | chr12 | 53453556 | 0.13 | tGgACTGCGGCTGGAGagGGAGG | 222 | x
HEK4_62 | chr15 | 30501337 | 0.13 | GGCACTG-GGCTGGAtGTGGTGG | 223 | RNA bulge
HEK4_63 | chr5 | 139284047 | 0.12 | GGCACTGaGGCTGcAGGcGGCGG | 224 | x
HEK4_64 | chr8 | 119227145 | 0.12 | GGCACaatGGCTGGAGGTGaAGG | 225 | x
HEK4_65 | chr14 | 95761249 | 0.11 | GGCACTctGGCTGGAGcTGGGGG | 226 | x
HEK4_66 | chr3 | 23651529 | 0.11 | GGCACaGCaGgTGGAGGTGGAGG | 227 | x
HEK4_67 | chr12 | 9287415 | 0.10 | GGCtCTGCaGCcaGgGGTGGAGG | 228 | x









(In Tables 2 to 8, the bases in lower case letters represent mismatched bases)
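To illustrate how the score cutoff of 2.5 partitions the listed sites, the following sketch applies it to the EMX1 scores of Table 2. The variable names are hypothetical; the scores are copied from Table 2, and the strict "greater than" comparison follows the phrase "scores over 2.5" used above.

```python
CUTOFF = 2.5  # DNA cleavage score cutoff stated in Example 4

# DNA cleavage scores for EMX1_1 ... EMX1_25, copied from Table 2.
emx1_scores = [30.53, 26.44, 23.66, 14.55, 11.14, 9.60, 8.35, 8.12, 5.96,
               5.47, 4.70, 3.11, 3.01, 1.56, 1.50, 1.34, 1.16, 1.04, 1.03,
               0.80, 0.48, 0.45, 0.27, 0.22, 0.21]

# Map site IDs to scores, then keep the sites whose score exceeds the cutoff.
sites = {f"EMX1_{n}": score for n, score in enumerate(emx1_scores, start=1)}
above_cutoff = [site for site, score in sites.items() if score > CUTOFF]

print(len(above_cutoff), "of", len(sites), "sites score above", CUTOFF)
print("lowest-scoring site above cutoff:", above_cutoff[-1])
```

With the Table 2 values, 13 of the 25 listed EMX1 sites (EMX1_1 to EMX1_13) have scores over 2.5.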



FIGS. 6a and 6b are genome-wide Circos plots representing DNA cleavage scores obtained with intact genomic DNA (first layer from the center) and genomic DNA digested with BE3 and USER (second layer from the center) or with Cas9 (third layer from the center, only present in FIG. 6b), where the arrow indicates the on-target site. FIGS. 6c and 6d show sequence logos obtained via WebLogo using DNA sequences at Digenome-captured sites (Tables 2-8) (DNA cleavage score >2.5). FIGS. 6e and 6f represent scatterplots of BE3-mediated substitution frequencies vs. Cas9-mediated indel frequencies determined using targeted deep sequencing, wherein circled dots indicate off-target sites validated for BE3 but not for Cas9. FIGS. 6g and 6h show BE3 off-target sites validated in HEK293T cells by targeted deep sequencing, wherein PAM sequences are the last 3 nucleotides at the 3′ end, mismatched bases are shown in lower-case letters, and dashes (-) indicate RNA bulges (error bars indicate s.e.m. (n=3)).
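The notation described above (PAM as the last 3 nucleotides, lower-case letters for mismatches, dashes for RNA bulges) can be decoded with a few lines of code. This is a minimal sketch; the `CapturedSite` class and `parse_site` function are hypothetical names introduced here, mismatches are counted only in the protospacer portion (an assumption of this sketch), and entries marked "DNA bulge" in Tables 2 to 8 are not handled.

```python
from dataclasses import dataclass


@dataclass
class CapturedSite:
    protospacer: str    # everything 5' of the PAM, including any '-' bulge marks
    pam: str            # last 3 nucleotides at the 3' end
    mismatches: int     # number of lower-case bases in the protospacer
    rna_bulge_len: int  # number of '-' characters


def parse_site(seq: str) -> CapturedSite:
    """Split a Digenome-captured site string into protospacer and PAM and count marks."""
    protospacer, pam = seq[:-3], seq[-3:]
    mismatches = sum(1 for base in protospacer if base.islower())
    return CapturedSite(protospacer, pam, mismatches, protospacer.count("-"))


# Sequences copied from Tables 2 and 7.
emx1_5 = parse_site("GAaTCCaAG-AGAAGAAGAATGG")   # EMX1_5, RNA bulge
hek3_3 = parse_site("GGCCCAGA--GAGCACGTGtGGG")   # HEK3_3, RNA bulge

print(emx1_5)
print(hek3_3)
```

For EMX1_5 this yields the PAM TGG with two mismatches and a one-nucleotide RNA bulge, and for HEK3_3 the PAM GGG with one mismatch and a two-nucleotide RNA bulge.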


The primers used in the deep sequencing are summarized in Tables 9 to 15 below:









TABLE 9







EMX1










1st PCR
2nd PCR
















SEQ 
Forward 
SEQ 
Reverse 
SEQ 
Forward 
SEQ 
Reverse 


ID
ID NO:
(5′to3′)
ID NO:
(5′to3′)
ID NO:
(5′to3′)
ID NO:
(5′to3′)





EMX1_1
236
GCCTTTTTCCG
237
GTGACTGGAGT
238
ACACTCTTTCCC
239
GTGACTGGAGT




GACACATAA

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTAT

CTCTTCCGATCT






GCCTCATTATCA

CTCACCTGGGC

GCCTCATTATCA






TCAGTGTTGG

GAGAAAG

TCAGTGTTGG





EMX1_2
240
ACACTCTTTCCC
241
GTCTCTGTGAAT
242
ACACTCTTTCCC
243
GTGACTGGAGT




TACACGACGCT

GGCGTCAC

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTGT



CTTCCGATCTGT

CTCTTCCGATCT




CCCAGACCTTC



CCCAGACCTTC

CACTGTCTGCA




ATCTCCA



ATCTCCA

GGGCTCTCT





EMX1_3
244
ACACTCTTTCCC
245
TCAAATTGTTTA
246
ACACTCTTTCCC
247
GTGACTGGAGT




TACACGACGCT

ATAGCTCTGTTG

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTTT

TT

CTTCCGATCTTT

CTCTTCCGATCT




GGTCCCACAGG



GGTCCCACAGG

TTTTTGGTCAAT




TGAATAAC



TGAATAAC

ATCTGAAAGGTT





EMX1_4
248
AGTGTTGAGGC
249
GTGACTGGAGT
250
ACACTCTTTCCC
251
GTGACTGGAGT


(on

CCCAGTG

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG


target)



CTCTTCCGATCT

CTTCCGATCTG

CTCTTCCGATCT






CAGCAGCAAGC

GGCCTCCTGAG

CAGCAGCAAGC






AGCACTCT

TTTCTCAT

AGCACTCT





EMX1_5
252
ACACTCTTTCCC
253
AAAAGATGTGG
254
ACACTCTTTCCC
255
GTGACTGGAGT




TACACGACGCT

TATATACATACG

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTCT

ATGG

CTTCCGATCTCT

CTCTTCCGATCT




GAAAATTTATGA



GAAAATTTATGA

CAAACAAAGAA




CAATTTACTACC



CAATTTACTACC

GGAAAGTCCTC




A



A

A





EMX1_6
256
ACACTCTTTCCC
257
TGTCTCATTGGC
258
ACACTCTTTCCC
259
GTGACTGGAGT




TACACGACGCT

TTTTTCTTTTC

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTG



CTTCCGATCTG

CTCTTCCGATCT




CTTGCCTGTGT



CTTGCCTGTGT

GCCCAGCTGTG




GACTTGAC



GACTTGAC

CATTCTATC





EMX1_7
260
ACACTCTTTCCC
261
CCCAGCTACAC
262
ACACTCTTTCCC
263
GTGACTGGAGT




TACACGACGCT

GTCACAATG

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTTG



CTTCCGATCTTG

CTCTTCCGATCT




AGCCCTATGAA



AGCCCTATGAA

TAGGGTCCAGG




AAGATTGC



AAGATTGC

CAAGAGAAA





EMX1_8
264
ACACTCTTTCCC
265
TCTGTCTGGCA
266
ACACTCTTTCCC
267
GTGACTGGAGT




TACACGACGCT

GATGATACCC

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTAC



CTTCCGATCTAC

CTCTTCCGATCT




ATTGCTACCCCT



ATTGCTACCCCT

ATCTGCTTCCTC




TGGTGA



TGGTGA

GTGGTCAT





EMX1_9
268
ACACTCTTTCCC
269
GATCTGATCTTA
270
ACACTCTTTCCC
271
GTGACTGGAGT




TACACGACGCT

CCCCAGAAGC

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTC



CTTCCGATCTC

CTCTTCCGATCT




GGTTCCGGTAC



GGTTCCGGTAC

CTGCTACTTGG




TTCATGTC



TTCATGTC

CTGACCACA





EMX1_10
272
CTCCTCCGACC
273
GTGACTGGAGT
274
ACACTCTTTCCC
275
GTGACTGGAGT




AGCAGAG

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTAA

CTCTTCCGATCT






TCCCTCAGCCA

GGAGGTGCAGG

TCCCTCAGCCA






CTTTATTTCA

AGCTAGA

CTTTATTTCA





EMX1_11
276
GGTGCTGTGGG
277
GTGACTGGAGT
278
ACACTCTTTCCC
279
GTGACTGGAGT




GGCATAG

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTCC

CTCTTCCGATCT






ACAGGCGAACA

TTGATTTGGAG

ACAGGCGAACA






GAACAGACA

GGGTCTT

GAACAGACA





EMX1_12
280
CCCTTTCTTAAT
281
GTGACTGGAGT
282
ACACTCTTTCCC
283
GTGACTGGAGT




AAATTACCCAGT

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG




TTC

CTCTTCCGATCT

CTTCCGATCTTG

CTCTTCCGATCT






AAAAAGATAGG

GACTAAAACACT

AAAAAGATAGG






CAAACATAGGA

GCCCAAG

CAAACATAGGA






AAA



AAA





EMX1_13
284
GCTTTTCTGGG
285
GTGACTGGAGT
286
ACACTCTTTCCC
287
GTGACTGGAGT




GACATAGCA

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTAC

CTCTTCCGATCT






AAGAATTCCAG

TTCCCTTGTCAT

AAGAATTCCAG






GCAGTTAACCA

CCCACA

GCAGTTAACCA





EMX1_14
288
CACAGGAATGT
289
GTGACTGGAGT
290
ACACTCTTTCCC
291
GTGACTGGAGT




CTTGGGTCA

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTCT

CTCTTCCGATCT






CTCTTCAATCCA

TAGCCTGGGTC

CTCTTCAATCCA






TCGCCAGT

ATGCACT

TCGCCAGT





EMX1_15
292
ACACTCTTTCCC
293
GCACTTGTTGG
294
ACACTCTTTCCC
295
GTGACTGGAGT




TACACGACGCT

CCATTTGTA

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTTG



CTTCCGATCTTG

CTCTTCCGATCT




AGGAGGCAAAA



AGGAGGCAAAA

TTTTGAATATGT




GGGAATA



GGGAATA

TTTAAATTCTCC










ACA





EMX1_16
296
ACACTCTTTCCC
297
GCACAGAGGGT
298
ACACTCTTTCCC
299
GTGACTGGAGT




TACACGACGCT

TGTTTGCTT

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTAA



CTTCCGATCTAA

CTCTTCCGATCT




GGCTAGCCCAG



GGCTAGCCCAG

TTCATCCTTTTG




AGTCTCC



AGTCTCC

TGGGGTTC





EMX1_17
300
GGAATCAATCAA
301
GTGACTGGAGT
302
ACACTCTTTCCC
303
GTGACTGGAGT




TGAAGTTGAAG

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG




A

CTCTTCCGATCT

CTTCCGATCTG

CTCTTCCGATCT






TTTGCAATTTGC

CAATCTGAAGAA

TTTGCAATTTGC






TTAGTTATTGAA

CAAAGAGCA

TTAGTTATTGAA





EMX1_18
304
ACACTCTTTCCC
305
TCAAGAGACTG
306
ACACTCTTTCCC
307
GTGACTGGAGT




TACACGACGCT

TTGTTTTAGATT

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTTG

GTC

CTTCCGATCTTG

CTCTTCCGATCT




ACATTTGATAGA



ACATTTGATAGA

CCCAGTCCAAT




ACAGATGGGTA



ACAGATGGGTA

GGCTGTAGT





EMX1_19
308
CCCTGCAAATT
309
GTGACTGGAGT
310
ACACTCTTTCCC
311
GTGACTGGAGT




GAGTACGTG

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTTG

CTCTTCCGATCT






GTCCCGAAGTG

GGGGCCATTCT

GTCCCGAAGTG






CTGGAATTA

TTATAGTT

CTGGAATTA





EMX1_20
312
GACAGTCCTGG
313
GTGACTGGAGT
314
ACACTCTTTCCC
315
GTGACTGGAGT




GCTAGGTGA

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTGA

CTCTTCCGATCT






CTCTGGACTCA

GAGTCAGGAGT

CTCTGGACTCA






GCTCCCATC

GCCCAGT

GCTCCCATC





EMX1_21
316
ACACTCTTTCCC
317
AGATGAATGCA
318
ACACTCTTTCCC
319
GTGACTGGAGT




TACACGACGCT

GGGAGCTGT

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTCC

CACCATTG

CTTCCGATCTCC

CTCTTCCGATCT




TCTCATTTCTAC



TCTCATTTCTAC

TTCTGAATTAAA








CACCATTG

AATGGAAAGAA










CTG





EMX1_22
320
ACAATTTCAGTA
321
GTGACTGGAGT
322
ACACTCTTTCCC
323
GTGACTGGAGT




GTAGCATTAAG

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG




GAAT

CTCTTCCGATCT

CTTCCGATCTGA

CTCTTCCGATCT






TTGTGACAAACT

ATGCCAGTTCT

TTGTGACAAACT






GCCCTCTG

GGGTTGT

GCCCTCTG





EMX1_23
324
ACACTCTTTCCC
325
CAAAAATCAACT
326
ACACTCTTTCCC
327
GTGACTGGAGT




TACACGACGCT

CAAGATGGATTA

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTAA

AA

CTTCCGATCTAA

CTCTTCCGATCT




TTTCTGAACCCA



TTTCTGAACCCA

GAGAACCTAGG




AAGACAGG



AAGACAGG

GAAAACTCTTCTG





EMX1_24
328
ACACTCTTTCCC
329
CTTGTGGATCAT
330
ACACTCTTTCCC
331
GTGACTGGAGT




TACACGACGCT

GGGTACTGAG

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTCC



CTTCCGATCTCC

CTCTTCCGATCT




AAGCTATTTAAC



AAGCTATTTAAC

TGGGCCTTGGT




TGGTATGCAC



TGGTATGCAC

ATTAGAGCA





EMX1_25
332
ACACTCTTTCCC
333
TGCTTTTTCACT
334
ACACTCTTTCCC
335
GTGACTGGAGT




TACACGACGCT

TGTCTAGTTTTC

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTTC

TT

CTTCCGATCTTC

CTCTTCCGATCT




AAGGGGGTATA



AAGGGGGTATA

AACAATTTCCCA




TAAAAGGAAGA



TAAAAGGAAGA

CAAAGTCCA
















TABLE 10







FANCF










1st PCR
2nd PCR
















SEQ 
Forward 
SEQ 
Reverse 
SEQ 
Forward 
SEQ 
Reverse 


ID
ID NO:
(5′to3′)
ID NO:
(5′to3′)
ID NO:
(5′to3′)
ID NO:
(5′to3′)





FANCF_1
336
CTGAAGGTGCT
337
GTGACTGGAGT
338
ACACTCTTTCCC
339
GTGACTGGAGT




GGTTTAGGG

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTTG

CTCTTCCGATCT






TGTCTGATTGAG

ACATCCAGGGT

TGTCTGATTGAG






TCCCCACA

TTCAAGTC

TCCCCACA





FANCF_2
340
ACACTCTTTCCC
341
TGACATGCATTT
342
ACACTCTTTCCC
343
GTGACTGGAGT


(on

TACACGACGCT

CGACCAAT

TACACGACGCT

TCAGACGTGTG


target)

CTTCCGATCTAT



CTTCCGATCTAT

CTCTTCCGATCT




GGATGTGGCGC



GGATGTGGCGC

AGCATTGCAGA




AGGTAG



AGGTAG

GAGGCGTAT





FANCF_3
344
CCTCAGGGATG
345
GTGACTGGAGT
346
ACACTCTTTCCC
347
GTGACTGGAGT




GATGAAGTG

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTCC

CTCTTCCGATCT






TCCCAGTGAGA

CTTACCAGATG

TCCCAGTGAGA






CCAGTTTGA

GAGGACA

CCAGTTTGA





FANCF_4
348
CCCTTACCAGAT
349
GTGACTGGAGT
350
ACACTCTTTCCC
351
GTGACTGGAGT




GGAGGACA

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTGT

CTCTTCCGATCT






ACCTTGAGTTTT

GACCCAGGTCC

ACCTTGAGTTTT






GCCCAGTG

AGTGTTT

GCCCAGTG





FANCF_5
352
AGCTTTAAAATG
353
GTGACTGGAGT
354
ACACTCTTTCCC
355
GTGACTGGAGT




GGGAATCCA

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTCT

CTCTTCCGATCT






TTCCCAGCACT

CCAGTACAGGG

TTCCCAGCACT






GTTCTGTTG

GCTTTTG

GTTCTGTTG





FANCF_6
356
ACACAGGGTGC
357
GTGACTGGAGT
358
ACACTCTTTCCC
359
GTGACTGGAGT




AGTGGTACA

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTAG

CTCTTCCGATCT






TGGGGAGTATC

GTGCTTCTGCA

TGGGGAGTATC






CTTGCAATC

GGTCATC

CTTGCAATC





FANCF_7
360
ACGCCAGCACT
361
GTGACTGGAGT
362
ACACTCTTTCCC
363
GTGACTGGAGT




TTCTAAGGA

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTG

CTCTTCCGATCT






CACAGATTGAT

CCTGCTGCACT

CACAGATTGAT






GCCACTGGA

CTCTGAGTA

GCCACTGGA





FANCF_8
364
ACACTCTTTCCC
365
ACACCTCCGAG
366
ACACTCTTTCCC
367
GTGACTGGAGT




TACACGACGCT

GCCTTCT

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTTT



CTTCCGATCTTT

CTCTTCCGATCT




TCCTCAACCTTT



TCCTCAACCTTT

CAGGTCCTCCT




TCTGCTG



TCTGCTG

CTCCCAGTT





FANCF_9
368
ACACTCTTTCCC
369
GCCAGGATTTC
370
ACACTCTTTCCC
371
GTGACTGGAGT




TACACGACGCT

CTCAAACAA

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTCC



CTTCCGATCTCC

CTCTTCCGATCT




TGAATAACTAAA



TGAATAACTAAA

GCCAAGTTCCC




TGACAACATGG



TGACAACATGG

ATAAGCAAA





FANCF_10
372
GCTCTCAAATG
373
GTGACTGGAGT
374
ACACTCTTTCCC
375
GTGACTGGAGT




GCTCCAAAC

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTTC

CTCTTCCGATCT






CAGAGTGGCCT

CTCCATCTCATT

CAGAGTGGCCT






GCTTACAATC

CCCATC

GCTTACAATC





FANCF_11
376
GCCGAGAATTA
377
GTGACTGGAGT
378
ACACTCTTTCCC
379
GTGACTGGAGT




CCACGACAT

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTTC

CTCTTCCGATCT






GGCACACAGCT

ACAGCGAGGAA

GGCACACAGCT









GTACGTAGG

GGACAAT

GTACGTAGG


FANCF_12
380
ACACTCTTTCCC
381
CTCCTCAGTGG
382
ACACTCTTTCCC
383
GTGACTGGAGT




TACACGACGCT

GTGAAGTCC

TACACGACGCT

TCAGACGTGTG




CTTCCGATCTG



CTTCCGATCTG

CTCTTCCGATCT




GAGCTCTCAGT



GAGCTCTCAGT

ACGGAGAGGTC




TGGACTGG



TGGACTGG

ACATGAAGG





FANCF_13
384
TGAAAAGCAGT
385
GTGACTGGAGT
386
ACACTCTTTCCC
387
GTGACTGGAGT




CTAGGACACAA

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG




A

CTCTTCCGATCT

CTTCCGATCTTG

CTCTTCCGATCT






CAACTCTGCCAT

GCAGGCTAGGT

CAACTCTGCCAT






GTGCCTTA

TTAGAGC

GTGCCTTA





FANCF_14
388
CACATATGAAAT
389
GTGACTGGAGT
390
ACACTCTTTCCC
391
GTGACTGGAGT




ATTAAATTTGAA

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG




CCA

CTCTTCCGATCT

CTTCCGATCTTG

CTCTTCCGATCT






GGGAATATAGA

AACCATGTTACC

GGGAATATAGA






AAAATCAAGAGA

TTTTGACC

AAAATCAAGAGA






TGG



TGG





FANCF_15
392
CGTCTTCGCTCT
393
GTGACTGGAGT
394
ACACTCTTTCCC
395
GTGACTGGAGT




TTGGTTTT

TCAGACGTGTG

TACACGACGCT

TCAGACGTGTG






CTCTTCCGATCT

CTTCCGATCTTG

CTCTTCCGATCT






CACCCTGTAGA

TGGCACATAGT

CACCCTGTAGA






TCTCTCTCACG

CGTAACCTC

TCTCTCTCACG
















TABLE 11

RNF2: primers for the 1st and 2nd PCR (all sequences 5′ to 3′)

RNF2_1 (on target)
1st PCR, Forward (SEQ ID NO: 396): CCATAGCACTTCCCTTCCAA
1st PCR, Reverse (SEQ ID NO: 397): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCCAACATACAGAAGTCAGGAA
2nd PCR, Forward (SEQ ID NO: 398): ACACTCTTTCCCTACACGACGCTCTTCCGATCTATTTCCAGCAATGTCTCAGG
2nd PCR, Reverse (SEQ ID NO: 399): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCCAACATACAGAAGTCAGGAA
















TABLE 12

HBB: primers for the 1st and 2nd PCR (all sequences 5′ to 3′)

HBB_1 (on target)
1st PCR, Forward (SEQ ID NO: 400): GGCAGAGAGAGTCAGTGCCTA
1st PCR, Reverse (SEQ ID NO: 401): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCAGGGCTGGGCATAAAAGT
2nd PCR, Forward (SEQ ID NO: 402): ACACTCTTTCCCTACACGACGCTCTTCCGATCTGTCTCCACATGCCCAGTTTC
2nd PCR, Reverse (SEQ ID NO: 403): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCAGGGCTGGGCATAAAAGT

HBB_2
1st PCR, Forward (SEQ ID NO: 404): ACACTCTTTCCCTACACGACGCTCTTCCGATCTCCTACAGCCTGCGAGGAATA
1st PCR, Reverse (SEQ ID NO: 405): GTGGGTGTCCTGGGTTGTT
2nd PCR, Forward (SEQ ID NO: 406): ACACTCTTTCCCTACACGACGCTCTTCCGATCTCCTACAGCCTGCGAGGAATA
2nd PCR, Reverse (SEQ ID NO: 407): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCACCTGGAGGCTAGGCACT

HBB_3
1st PCR, Forward (SEQ ID NO: 408): CCCACACAGGTTTTCTCCTC
1st PCR, Reverse (SEQ ID NO: 409): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCTAGGCCTTCACCTGGAACC
2nd PCR, Forward (SEQ ID NO: 410): ACACTCTTTCCCTACACGACGCTCTTCCGATCTCTTCCCTAGACCTGCCTCCT
2nd PCR, Reverse (SEQ ID NO: 411): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCTAGGCCTTCACCTGGAACC

HBB_4
1st PCR, Forward (SEQ ID NO: 412): ACACTCTTTCCCTACACGACGCTCTTCCGATCTTTGTGTAACAGCCACTCACCA
1st PCR, Reverse (SEQ ID NO: 413): CAGAAAATAAAGCAGCTGACTCAC
2nd PCR, Forward (SEQ ID NO: 414): ACACTCTTTCCCTACACGACGCTCTTCCGATCTTTGTGTAACAGCCACTCACCA
2nd PCR, Reverse (SEQ ID NO: 415): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCCTGGCAAAAGTGTTTGGAT

HBB_5
1st PCR, Forward (SEQ ID NO: 416): TTTGCATTCCTTTTAGCTTCTTTT
1st PCR, Reverse (SEQ ID NO: 417): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTAGCTACCACGGTGACAGTAACA
2nd PCR, Forward (SEQ ID NO: 418): ACACTCTTTCCCTACACGACGCTCTTCCGATCTATGGCTGTTATTCAGGGAAA
2nd PCR, Reverse (SEQ ID NO: 419): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTAGCTACCACGGTGACAGTAACA

HBB_6
1st PCR, Forward (SEQ ID NO: 420): ACACTCTTTCCCTACACGACGCTCTTCCGATCTTCCACTTTGTTAGTCAGGAGATTC
1st PCR, Reverse (SEQ ID NO: 421): AAATGGTAAAAAGAAACTCAAATGC
2nd PCR, Forward (SEQ ID NO: 422): ACACTCTTTCCCTACACGACGCTCTTCCGATCTTCCACTTTGTTAGTCAGGAGATTC
2nd PCR, Reverse (SEQ ID NO: 423): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGGATACCACTGGGCTTCTGA

HBB_7
1st PCR, Forward (SEQ ID NO: 424): TTCAAATCTGGAAAATAATCTATCACC
1st PCR, Reverse (SEQ ID NO: 425): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTATTTCCAGGCTATGCTTCCA
2nd PCR, Forward (SEQ ID NO: 426): ACACTCTTTCCCTACACGACGCTCTTCCGATCTTTTCATACCCTTTCCCGTTC
2nd PCR, Reverse (SEQ ID NO: 427): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTATTTCCAGGCTATGCTTCCA
















TABLE 13

HEK2: primers for the 1st and 2nd PCR (all sequences 5′ to 3′)

HEK2_1
1st PCR, Forward (SEQ ID NO: 428): ACACTCTTTCCCTACACGACGCTCTTCCGATCTCGTACTATGCAAGCCACATTG
1st PCR, Reverse (SEQ ID NO: 429): TTTTCTTGTGAAACAGAAATGTCA
2nd PCR, Forward (SEQ ID NO: 430): ACACTCTTTCCCTACACGACGCTCTTCCGATCTCGTACTATGCAAGCCACATTG
2nd PCR, Reverse (SEQ ID NO: 431): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTAATGCTCCCACACCATTTTT

HEK2_2 (on target)
1st PCR, Forward (SEQ ID NO: 432): ACACTCTTTCCCTACACGACGCTCTTCCGATCTAGGACGTCTGCCCAATATGT
1st PCR, Reverse (SEQ ID NO: 433): TTCCCAAGTGAGAAGCCAGT
2nd PCR, Forward (SEQ ID NO: 434): ACACTCTTTCCCTACACGACGCTCTTCCGATCTAGGACGTCTGCCCAATATGT
2nd PCR, Reverse (SEQ ID NO: 435): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTAAAATTGTCCAGCCCCATCT

HEK2_3
1st PCR, Forward (SEQ ID NO: 436): ATTTACAAAACTTAGGAGAATCAAAGG
1st PCR, Reverse (SEQ ID NO: 437): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCAGCTGCTGTTATCCTTCCTC
2nd PCR, Forward (SEQ ID NO: 438): ACACTCTTTCCCTACACGACGCTCTTCCGATCTTCAAAGGAAAAGCAACGTGA
2nd PCR, Reverse (SEQ ID NO: 439): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCAGCTGCTGTTATCCTTCCTC
















TABLE 14

HEK3: primers for the 1st and 2nd PCR (all sequences 5′ to 3′)

HEK3_1
1st PCR, Forward (SEQ ID NO: 440): GCAGTTGCTTGACTAGAGGTAGC
1st PCR, Reverse (SEQ ID NO: 441): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTAGTGATGTGGGAGGTTCCTG
2nd PCR, Forward (SEQ ID NO: 442): ACACTCTTTCCCTACACGACGCTCTTCCGATCTTCCAGATTCCTGGTCCAAAG
2nd PCR, Reverse (SEQ ID NO: 443): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTAGTGATGTGGGAGGTTCCTG

HEK3_2 (on target)
1st PCR, Forward (SEQ ID NO: 444): AAGGCATGGATGAGAGAAGC
1st PCR, Reverse (SEQ ID NO: 445): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCTCCCTAGGTGCTGGCTTC
2nd PCR, Forward (SEQ ID NO: 446): ACACTCTTTCCCTACACGACGCTCTTCCGATCTAAACGCCCATGCAATTAGTC
2nd PCR, Reverse (SEQ ID NO: 447): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCTCCCTAGGTGCTGGCTTC

HEK3_3
1st PCR, Forward (SEQ ID NO: 448): CTCAGGAGGCTGAGGTAGGA
1st PCR, Reverse (SEQ ID NO: 449): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTACGTGTCTGCGGTTAGCAG
2nd PCR, Forward (SEQ ID NO: 450): ACACTCTTTCCCTACACGACGCTCTTCCGATCTAGGAAGATGAGGCTGCAGTG
2nd PCR, Reverse (SEQ ID NO: 451): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTACGTGTCTGCGGTTAGCAG

HEK3_4
1st PCR, Forward (SEQ ID NO: 452): TTATGCGGCAAAACAAAATG
1st PCR, Reverse (SEQ ID NO: 453): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCGTCGCTGACAATTTCTGA
2nd PCR, Forward (SEQ ID NO: 454): ACACTCTTTCCCTACACGACGCTCTTCCGATCTGATCTCATCCCCTGTTGACC
2nd PCR, Reverse (SEQ ID NO: 455): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCGTCGCTGACAATTTCTGA

HEK3_5
1st PCR, Forward (SEQ ID NO: 456): TGTTATCAACTGGGGGTTGC
1st PCR, Reverse (SEQ ID NO: 457): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCCTTCATGGACTGGTAGGC
2nd PCR, Forward (SEQ ID NO: 458): ACACTCTTTCCCTACACGACGCTCTTCCGATCTAGAGGGGCATCTCGTGTAGA
2nd PCR, Reverse (SEQ ID NO: 459): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCCTTCATGGACTGGTAGGC

HEK3_6
1st PCR, Forward (SEQ ID NO: 460): ACACTCTTTCCCTACACGACGCTCTTCCGATCTTGTGTGCATGGTTCATCTCC
1st PCR, Reverse (SEQ ID NO: 461): AAGCTATGATGTGATGTGACTGG
2nd PCR, Forward (SEQ ID NO: 462): ACACTCTTTCCCTACACGACGCTCTTCCGATCTTGTGTGCATGGTTCATCTCC
2nd PCR, Reverse (SEQ ID NO: 463): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCATGGTGTCTCACCCCTGTA

HEK3_7
1st PCR, Forward (SEQ ID NO: 464): GCCATGATCCTCGTGATTTT
1st PCR, Reverse (SEQ ID NO: 465): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTACTTACCGAAGGCAGGGACT
2nd PCR, Forward (SEQ ID NO: 466): ACACTCTTTCCCTACACGACGCTCTTCCGATCTTCTCATGCTGTCTTGGATAAACA
2nd PCR, Reverse (SEQ ID NO: 467): GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTACTTACCGAAGGCAGGGACT
















TABLE 15







HEK4










1st PCR
2nd PCR
















SEQ 
Forward 
SEQ 
Reverse 
SEQ 
Forward 
SEQ 
Reverse 


ID
ID NO:
(5′to3′)
ID NO:
(5′to3′)
ID NO:
(5′to3′)
ID NO:
(5′to3′)





HEK4_1
468
ACACTCTTTCCC
469
GACGTCCAAAAC
470
ACACTCTTTCCC
471
GTGACTGGAGTT


(on

TACACGACGCTC

CAGACTCC

TACACGACGCTC

CAGACGTGTGCT


target)

TTCCGATCTCTC



TTCCGATCTCTC

CTTCCGATCTAC




CCTTCAAGATGG



CCTTCAAGATGG

TCCTTCTGGGGC




CTGAC



CTGAC

CTTTT





HEK4_2
472
TCCCCAATGTTT
473
GTGACTGGAGTT
474
ACACTCTTTCCC
475
GTGACTGGAGTT




TCTTGTGA

CAGACGTGTGCT

TACACGACGCTC

CAGACGTGTGCT






CTTCCGATCTGA

TTCCGATCTTAG

CTTCCGATCTGA






TTACACAGAGGA

AAGCGGACCCC

TTACACAGAGGA






GGCACCA

ACATAG

GGCACCA





HEK4_3
476
TGAGAGAACATG
477
GTGACTGGAGTT
478
ACACTCTTTCCC
479
GTGACTGGAGTT




GTGCTTTG

CAGACGTGTGCT

TACACGACGCTC

CAGACGTGTGCT






CTTCCGATCTAG

TTCCGATCTGAA

CTTCCGATCTAG






GCTGTGGTAGG

TGTGGACAGCAT

GCTGTGGTAGG






GACTCAC

TGCAT

GACTCAC





HEK4_4
480
ACACTCTTTCCC
481
AACCAACATGGT
482
ACACTCTTTCCC
483
GTGACTGGAGTT




TACACGACGCTC

GGGACACT

TACACGACGCTC

CAGACGTGTGCT




TTCCGATCTCCA



TTCCGATCTCCA

CTTCCGATCTAG




GAAGAGTGTGGT



GAAGAGTGTGGT

GCTGTGGTGAAG




GCAGT



GCAGT

AGGATG





HEK4_5
484
GGAGTTAGGCGT
485
GTGACTGGAGTT
486
ACACTCTTTCCC
487
GTGACTGGAGTT




AGCTTCAGG

CAGACGTGTGCT

TACACGACGCTC

CAGACGTGTGCT






CTTCCGATCTCC

TTCCGATCTAAT

CTTCCGATCTCC






TGGCACAGACCT

CCAATCAATGGG

TGGCACAGACCT






TCCTAA

AGCAT

TCCTAA





HEK4_6
488
ACACTCTTTCCC
489
GCTGGTCATGCA
490
ACACTCTTTCCC
491
GTGACTGGAGTT




TACACGACGCTC

GTGTCTGT

TACACGACGCTC

CAGACGTGTGCT




TTCCGATCTAAA



TTCCGATCTAAA

CTTCCGATCTCC




GCCCAGCTCTGC



GCCCAGCTCTGC

CCATTTCTGCCT




TGATA



TGATA

GATTT





HEK4_7
492
ACACTCTTTCCC
493
TGGGCTCAACCC
494
ACACTCTTTCCC
495
GTGACTGGAGTT




TACACGACGCTC

AGGTGT

TACACGACGCTC

CAGACGTGTGCT




TTCCGATCTGGG



TTCCGATCTGGG

CTTCCGATCTCC




CATGGCTTCTGA



CATGGCTTCTGA

GGATGATTCTCC




GACT



GACT

TACTTCC





HEK4_8
496
ACACTCTTTCCC
497
AGTTGTGGGGTT
498
ACACTCTTTCCC
499
GTGACTGGAGTT




TACACGACGCTC

TTCTGCTG

TACACGACGCTC

CAGACGTGTGCT




TTCCGATCTGCC



TTCCGATCTGCC

CTTCCGATCTAT




AACTAGAGGCAG



AACTAGAGGCAG

TCTGGAGGCAAC




ACAGG



ACAGG

TCCTCA





HEK4_9
500
GGCAAAACCCAT
501
GTGACTGGAGTT
502
ACACTCTTTCCC
503
GTGACTGGAGTT




TCCAGAAG

CAGACGTGTGCT

TACACGACGCTC

CAGACGTGTGCT






CTTCCGATCTTG

TTCCGATCTACC

CTTCCGATCTTG






TTAGGAGCTCCC

ACGTCAGGACTT

TTAGGAGCTCCC






CATCAC

GTGTG

CATCAC





HEK4_10
504
ATGTTAGCCGGG
505
GTGACTGGAGTT
506
ACACTCTTTCCC
507
GTGACTGGAGTT




ATGGTCTA

CAGACGTGTGCT

TACACGACGCTC

CAGACGTGTGCT






CTTCCGATCTTC

TTCCGATCTGAT

CTTCCGATCTTC






CAGGGTATCAGG

CTCTTGACTTGG

CAGGGTATCAGG






AAAGGTT

TGATCCA

AAAGGTT





HEK4_11
508
ACACTCTTTCCC
509
CACAGCCCATCT
510
ACACTCTTTCCC
511
GTGACTGGAGTT




TACACGACGCTC

CTCCACTC

TACACGACGCTC

CAGACGTGTGCT




TTCCGATCTAAA



TTCCGATCTAAA

CTTCCGATCTTG




TCCTCAGCACAC



TCCTCAGCACAC

GGCTCCAACCTC




GACAA



GACAA

TTCTAA





HEK4_12
512
CCCTGGTGAGCA
513
GTGACTGGAGTT
514
ACACTCTTTCCC
515
GTGACTGGAGTT




AACACAC

CAGACGTGTGCT

TACACGACGCTC

CAGACGTGTGCT






CTTCCGATCTCA

TTCCGATCTCCC

CTTCCGATCTCA






GGTCCTGTGCCA

ACGTGGTATTCA

GGTCCTGTGCCA






CCTC

CCTCT

CCTC





HEK4_13
516
GCCATCTAATCA
517
GTGACTGGAGTT
518
ACACTCTTTCCC
519
GTGACTGGAGTT




CAGCCACA

CAGACGTGTGCT

TACACGACGCTC

CAGACGTGTGCT






CTTCCGATCTGC

TTCCGATCTCTC

CTTCCGATCTGC






ATCTTGTCCCTT

CTGGGTGCTCAG

ATCTTGTCCCTT






CTCAGC

ACTTC

CTCAGC





HEK4_14
520
ACACTCTTTCCC
521
CACCATGCCTGG
522
ACACTCTTTCCC
523
GTGACTGGAGTT




TACACGACGCTC

CTAATTTT

TACACGACGCTC

CAGACGTGTGCT




TTCCGATCTGTT



TTCCGATCTGTT

CTTCCGATCTTT




GAGAAGCAGCAA



GAGAAGCAGCAA

AGTAGGGACGG




GGTGA



GGTGA

GGTTTCA





HEK4_15
524
CAGAACCCAAGG
525
GTGACTGGAGTT
526
ACACTCTTTCCC
527
GTGACTGGAGTT




CTCTTGAC

CAGACGTGTGCT

TACACGACGCTC

CAGACGTGTGCT






CTTCCGATCTAT

TTCCGATCTTCC

CTTCCGATCTAT






TTTGCTCAGACC

AAGATGCCTTCT

TTTGCTCAGACC






CAGCAT

GCTCT

CAGCAT





HEK4_16
528
ACACTCTTTCCC
529
TTTCTCACGATG
530
ACACTCTTTCCC
531
GTGACTGGAGTT




TACACGACGCTC

ACATTTTGG

TACACGACGCTC

CAGACGTGTGCT




TTCCGATCTAAC



TTCCGATCTAAC

CTTCCGATCTCG




AGAGCCCTGCA



AGAGCCCTGCA

GAGGAGGTAGAT




GAACAT



GAACAT

TGGAGA





HEK4_17
532
ACACTCTTTCCC
533
TGTTCCTAGAGC
534
ACACTCTTTCCC
535
GTGACTGGAGTT




TACACGACGCTC

AACCTTCACA

TACACGACGCTC

CAGACGTGTGCT




TTCCGATCTCAT



TTCCGATCTCAT

CTTCCGATCTGG




GTATGCAGCTGC



GTATGCAGCTGC

AGAGCCAGAGT




TTTTGA



TTTTGA

GGCTAAA





HEK4_18
536
CTGAAAGAGGGA
537
GTGACTGGAGTT
538
ACACTCTTTCCC
539
GTGACTGGAGTT




GGGGAGAC

CAGACGTGTGCT

TACACGACGCTC

CAGACGTGTGCT






CTTCCGATCTCT

TTCCGATCTCTC

CTTCCGATCTCT






TCGCCAGGTCTT

GGGAGAGAGGA

TCGCCAGGTCTT






CTGTTC

AAGGAC

CTGTTC





HEK4_19
540
ACACTCTTTCCC
541
GACGCATCCCAC
542
ACACTCTTTCCC
543
GTGACTGGAGTT




TACACGACGCTC

CTCCTC

TACACGACGCTC

CAGACGTGTGCT




TTCCGATCTCCC



TTCCGATCTCCC

CTTCCGATCTCT




GGCCGATTTAAC



GGCCGATTTAAC

GGGGCACGAAA




TTTTA



TTTTA

TGTCC





HEK4_20
544
CCAGGAACAGA
545
GTGACTGGAGTT
546
ACACTCTTTCCC
547
GTGACTGGAGTT




GGGACCAT

CAGACGTGTGCT

TACACGACGCTC

CAGACGTGTGCT






CTTCCGATCTCC

TTCCGATCTCCA

CTTCCGATCTCC






TGGTTCCAGTCA

GGTCCAGAGACA

TGGTTCCAGTCA






CCTCTC

AGACG

CCTCTC










FIG. 7 is a Venn diagram showing the number of sites with DNA cleavage scores of 2.5 or higher identified by Digenome-seq of Cas9 nuclease- and base editor-treated genomic DNA.


As can be seen from the above results, seven BE3 deaminases plus USER cleaved human genomic DNA in vitro at just 1-24 (8±3) sites, far fewer than did Cas9 nucleases with the same set of sgRNAs (70±30 sites) in a multiplex Digenome-seq analysis (Kim, D., Kim, S., Kim, S., Park, J. & Kim, J. S. Genome-wide target-specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq. Genome Research 26, 406-415 (2016)) (FIG. 7). This means that BE3 has far fewer potential, though not necessarily genuine, off-target sites than does Cas9. Sequence logos, obtained by comparing Digenome-identified sites, showed that both the PAM-distal and PAM-proximal regions contributed to the specificities of BE3 deaminases (FIGS. 6c and 6d).


The inventors further improved the computer program (the improved version is termed Digenome 2.0) to identify potential off-target sites more comprehensively. The inventors counted the number of positions whose DNA cleavage scores exceeded a cutoff value ranging from 0.0001 to 10 and, among those positions, the number of PAM (5′-NGN-3′ or 5′-NNG-3′)-containing sites with 10 or fewer mismatches compared to the on-target site (FIG. 8). FIG. 8 is a graph showing the number of total sites (▪) and the number of PAM-containing sites with ten or fewer mismatches (□) for a range of DNA cleavage scores. These results were obtained by performing whole genome sequencing (WGS) on intact human genomic DNA (left) and on human genomic DNA cleaved by BE3 and USER (right). A cutoff score of 0.1 was selected because WGS data obtained using intact genomic DNA, which had not been treated with BE3 and USER and thus served as a negative control, did not yield any false-positive sites at this cutoff (FIG. 8). Based on these results, in determining off-target sites by Digenome 2.0, sites that have a DNA cleavage score of 0.1 or more, 10 or fewer mismatches, and a PAM (5′-NGN-3′ or 5′-NNG-3′) are determined to be off-target sites. In determining off-target sites by Digenome 2.0, sites with a DNA cleavage score of 2.5 or more are determined to be off-target sites. On the other hand, in off-target localization by Digenome 1.0, sites with a DNA cleavage score of 2.5 or more are determined to be off-target site candidates.
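The three-part Digenome 2.0 criterion stated above (cleavage score of 0.1 or more, 10 or fewer mismatches, and a 5′-NGN-3′ or 5′-NNG-3′ PAM) can be sketched as a simple filter. This is a minimal illustration only and not the inventors' program: the function name, argument names, and the assumption that the score, mismatch count, and PAM have already been computed for each site are all hypothetical.

```python
import re

# PAM forms named in the text: 5'-NGN-3' or 5'-NNG-3' (N = any base).
_PAM_RE = re.compile(r"^(?:[ACGT]G[ACGT]|[ACGT][ACGT]G)$")


def is_digenome2_candidate(score, mismatches, pam,
                           score_cutoff=0.1, max_mismatches=10):
    """Return True if a site meets the three criteria quoted in the text.

    score          -- DNA cleavage score of the site (assumed precomputed)
    mismatches     -- number of mismatches versus the on-target site
    pam            -- 3-nt PAM sequence next to the site, written 5' to 3'
    score_cutoff   -- 0.1, the cutoff selected in the text
    max_mismatches -- 10, the mismatch limit given in the text
    """
    has_pam = _PAM_RE.match(pam.upper()) is not None
    return score >= score_cutoff and mismatches <= max_mismatches and has_pam


# Hypothetical sites, for illustration only (not data from the document).
print(is_digenome2_candidate(0.5, 3, "AGG"))   # passes all three criteria
print(is_digenome2_candidate(0.05, 0, "AGG"))  # score below the 0.1 cutoff
```

Keeping the cutoff and mismatch limit as parameters makes it easy to repeat the count over a range of cutoffs, as was done for FIG. 8.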


With Digenome 2.0, the inventors were able to identify many additional BE3- and Cas9-associated DNA cleavage sites, including two sites that had been missed in the previous study (Kim, D., Kim, S., Kim, S., Park, J. & Kim, J. S. Genome-wide target-specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq. Genome Research 26, 406-415 (2016)) but had been captured by both HTGTS and GUIDE-seq using EMX1-specific Cas9. FIG. 9 is a Venn diagram showing the number of PAM-containing homologous sites with DNA cleavage scores of 0.1 or higher identified by Digenome-seq of Cas9 nuclease- and base editor-treated genomic DNA. BE3 deaminases induced base conversions in vitro at 1-67 (18±9) sites, whereas Cas9 nucleases cleaved genomic DNA at 30-241 (90±30) sites.


Example 5. Fraction of Homologous Sites Captured by Digenome-Seq

The inventors examined the BE3- and Cas9-associated sites shown in FIGS. 7 and 9. FIG. 10 shows the fractions of homologous sites captured by Digenome-seq, wherein bars represent the number of homologous sites that differ from on-target sites by up to 6 nt, and squares (BE3) and triangles (Cas9) represent the fraction of Digenome-seq-captured sites for a range of mismatch numbers. As shown in FIG. 10, regardless of the number of mismatches, fewer homologous sites were identified by Digenome-seq when BE3 was used than when Cas9 was used.
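The notion of a homologous site that "differs from the on-target site by up to 6 nt" can be illustrated by counting position-wise mismatches between two equal-length sequences. The sketch below is an assumption-laden example rather than the method used for FIG. 10: the function name is hypothetical, bulge-type sites (missing or extra nucleotides) are not handled, and the two sequences are the EMX1 on-target site (SEQ ID NO: 31) and the EMX1_1 site (SEQ ID NO: 106) as listed in Table 16.

```python
def count_mismatches(on_target, site):
    """Count position-wise differences between two equal-length sequences."""
    if len(on_target) != len(site):
        raise ValueError("sequences must have equal length (no bulges handled)")
    return sum(a != b for a, b in zip(on_target.upper(), site.upper()))


# 23-nt sequences (20-nt protospacer + 3-nt PAM) as listed in Table 16.
EMX1_ON_TARGET = "GAGTCCGAGCAGAAGAAGAAGGG"  # SEQ ID NO: 31
EMX1_1 = "GAGTCTAAGCAGAAGAAGAAGAG"          # SEQ ID NO: 106

# Compare only the 20-nt protospacer portion, leaving out the PAM.
print(count_mismatches(EMX1_ON_TARGET[:20], EMX1_1[:20]))  # 2
```

Restricting the comparison to the first 20 nt is a choice made here so that PAM differences are not counted as mismatches.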



FIGS. 11a and 11b are graphs showing the significant correlation between the number of BE3- and Cas9-associated sites identified by Digenome 1.0 (11a) and Digenome 2.0 (11b). As shown in FIGS. 11a and 11b, there was a statistically significant correlation [R²=0.97 (Score >2.5, Digenome 1.0) or 0.86 (Digenome 2.0)] between the number of Cas9- and BE3-associated sites. These results suggest that sgRNAs were the primary determinants of both Cas9 and BE3 specificities.



FIGS. 12a and 12b show the correlation between the number of BE3-associated sites identified by Digenome 1.0 (12a) or Digenome 2.0 (12b) and the number of sites with 6 or fewer mismatches. As shown in FIGS. 12a and 12b, a strong correlation [R²=0.94 (Digenome 1.0) or 0.95 (Digenome 2.0)] was observed between the number of BE3-associated, Digenome-captured sites and the number of homologous sites with 6 or fewer mismatches in the human genome (defined as “orthogonality”). Of particular interest are the sites associated with BE3 alone or Cas9 alone. Interestingly, 69% (=18/26) of sites associated with BE3 alone had missing or extra nucleotides compared to their respective on-target sites, producing, respectively, an RNA or DNA bulge at the DNA-gRNA interface (Table 1). By contrast, these bulge-type off-target sites were rare among Cas9-associated sites: just 4% (=25/647) of sites associated with Cas9 had missing or extra nucleotides.
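The two bulge-site fractions quoted above can be checked with simple arithmetic. The short sketch below only reproduces the rounding of the counts given in the text (18 of 26 and 25 of 647); the helper name is hypothetical.

```python
def percent(part, total):
    """Return part/total as a percentage rounded to the nearest integer."""
    return round(100 * part / total)


print(percent(18, 26))   # BE3-only sites with a bulge: 69
print(percent(25, 647))  # Cas9-associated sites with a bulge: 4
```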



FIG. 13 shows examples of Digenome-captured off-target sites associated only with Cas9, which contain no cytosines at positions 4-9. Thirteen percent (=73/548) of sites associated with Cas9 alone had no cytosines at positions 4-8 (numbered 1-20 in the 5′ to 3′ direction), the window of BE3-mediated deamination.
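Whether a site lacks a cytosine in the deamination window can be checked directly on the 20-nt protospacer sequence. The sketch below is illustrative only: the function name is hypothetical, and because the text mentions both positions 4-9 and positions 4-8, the window bounds are left as parameters (1-based, inclusive, numbered in the 5′ to 3′ direction) rather than fixed.

```python
def lacks_cytosine_in_window(protospacer, start=4, end=8):
    """Return True if no 'C' occurs at positions start..end (1-based, inclusive)."""
    window = protospacer.upper()[start - 1:end]
    return "C" not in window


# EMX1 on-target protospacer (first 20 nt of SEQ ID NO: 31 in Table 16).
EMX1_PROTOSPACER = "GAGTCCGAGCAGAAGAAGAA"
print(lacks_cytosine_in_window(EMX1_PROTOSPACER))  # False: C at positions 5 and 6

# Hypothetical sequence with no cytosine at positions 4-9 (illustration only).
print(lacks_cytosine_in_window("GAGTAAGAGCAGAAGAAGAA", start=4, end=9))  # True
```

A site for which this check returns True offers BE3 no substrate in the window, which is consistent with such sites being associated with Cas9 alone.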


To validate off-target effects at BE3-associated sites identified by Digenome-seq, the inventors performed targeted deep sequencing and measured BE3-induced substitution frequencies and Cas9-induced indel frequencies in HEK293T cells. The results are shown in FIGS. 6e to 6h above and in Table 16 below.









TABLE 16





Mutation frequencies of Cas9 and BE3 in


on-target and off-target sites captured by Digenome-seq







EMX1


Base editing efficiency (%)


































SEQ ID



























NO: 31
G
A
G
T
C
C
G
A
G
C
A
G
A
A
G
A
A
G
A
A
G
G
G





On-target
C-
Untreated




0.04
0.06



0.15















(EMX1_4)
>other
(+)BE1




8.49
4.72



0.08
















bases
(+)BE2




11.08 
10.72 



0.09

















(+)BE3




49.17 
45.06 



0.10

















SEQ ID



























NO: 106
G
A
G
T
C
t
a
A
G
C
A
G
A
A
G
A
A
G
A
A
G
A
G


































EMX1_1
C-
Untreated




0.04




0.05















>other
(+)BE1




3.13




0.05















bases
(+)BE2




0.75




0.05
















(+)BE3




15.57 




0.07





































SEQ ID



























NO: 107
G
A
a
T
C
C
a
A
G
C
A
G
A
A
G
A
A
G
A
g
A
A
G


































EMX1_2
C-
Untreated




0.08
0.08



0.07















>other
(+)BE1




0.65
0.31



0.06















bases
(+)BE2




0.32
0.32



0.07
















(+)BE3




0.84
0.81



0.07





































SEQ ID



























NO: 108
a
A
G
T
C
t
G
A
G
C
A
C
A
A
G
A
A
G
A
A
T
G
G


































EMX1_3
C-
Untreated




0.02




0.07

0.06













>other
(+)BE1




0.02




0.07

0.04













bases
(+)BE2




0.02




0.05

0.05














(+)BE3




0.13




0.07

0.05





































SEQ ID



























NO: 109
G
A
a
T
C
C
a
A
G

A
G
A
A
G
A
A
G
A
A
T
G
G


































EMX1_5
C-
Untreated




0.06
0.10



















>other
(+)BE1




0.63
0.24



















bases
(+)BE2




0.32
0.34




















(+)BE3




0.96
0.96





































SEQ ID



























NO: 110
G
A
G
T
C
C
t
A
G
C
A
G
g
A
G
A
A
G
A
A
G
A
G


































EMX1_6
C-
Untreated




0.02
0.04



0.04















>other
(+)BE1




0.06
0.07



0.07















bases
(+)BE2




0.07
0.08



0.05
















(+)BE3




2.43
12.40



0.04





































SEQ ID



























NO: 111
G
A
G
T
C
C
a
A
G
C
A
G
t
A
G
A
g
G
A
A
G
G
G


































EMX1_7
C-
Untreated




0.03
0.06



0.06















>other
(+)BE1




0.07
0.10



0.07















bases
(+)BE2




0.03
0.06



0.09
















(+)BE3




0.05
0.09



0.07





































SEQ ID



























NO: 112
G
t
G
T
C
C
t
A
G

A
G
A
A
G
A
A
G
A
A
G
G
G


































EMX1_8
C-
Untreated




0.05
0.03



















>other
(+)BE1




0.64
0.57



















bases
(+)BE2




0.54
0.39




















(+)BE3




0.37
0.34





































SEQ ID



























NO: 113
a
A
G
T
C
C
G
A
G
g
A
G
A
g
G
A
A
G
A
A
A
G
G


































EMX1_9
C-
Untreated




0.05
0.16



















>other
(+)BE1




0.06
0.18



















bases
(+)BE2




0.06
0.17




















(+)BE3




0.09
0.25





































SEQ ID



























NO: 114
G
A
G
g
C
C
G
A
G
C
A
G
A
A
G
A
A
a
g
A
C
G
G


































EMX1_10
C-
Untreated




0.14
0.10



0.13















>other
(+)BE1




0.44
0.24



0.16















bases
(+)BE2




0.51
0.48



0.15
















(+)BE3




3.45
3.70



0.17





































SEQ ID



























NO: 115
a
g
t
T
C
C
a
A
G
C
A
G
A
A
G
A
A
G
c
A
T
G
G


































EMX1_11
C-
Untreated




0.06
0.05



0.07








0.06






>other
(+)BE1




1.19
0.44



0.08








0.07






bases
(+)BE2




0.46
0.43



0.05








0.07







(+)BE3




0.74
0.62



0.06








0.07





































SEQ ID



























NO: 116
G
A
G
T
C
C
a
C
a
C
A
G
A
A
G
A
A
G
A
A
A
G
A


































EMX1_12
C-
Untreated




0.08
0.26

0.11

0.11















>other
(+)BE1




0.08
0.24

0.11

0.11















bases
(+)BE2




0.08
0.23

0.10

0.10
















(+)BE3




0.17
0.33

0.17

0.10





































SEQ ID



























NO: 117
G
A
G
T
C
C
a
A
G

A
G
A
A
G
A
A
G
t
g
A
G
G


































EMX1_13
C-
Untreated




0.08
0.12



















>other
(+)BE1




0.07
0.11



















bases
(+)BE2




0.07
0.11




















(+)BE3




0.08
0.13





































SEQ ID



























NO: 118
G
A
G
T
C
C
t
A
G

A
G
A
A
G
A
A
G
g
A
A
G
G


































EMX1_14
C-
Untreated




0.06
0.13



















>other
(+)BE1




0.09
0.17



















bases
(+)BE2




0.05
0.10




















(+)BE3




0.05
0.13





































SEQ ID



























NO: 119
G
A
a
T
C
C
a
A
G
C
A
G
g
A
G
A
A
G
A
A
G
G
A


































EMX1_15
C-
Untreated




0.04
0.07



0.05















>other
(+)BE1




0.03
0.08



0.06















bases
(+)BE2




0.04
0.07



0.06
















(+)BE3




0.14
0.18



0.05





































SEQ ID



























NO: 120
G
t
a
c
C
a
G
A
G

A
G
A
A
G
A
A
G
A
g
A
G
G


































EMX1_16
C-
Untreated



0.06
0.06




















>other
(+)BE1



0.05
0.05




















bases
(+)BE2



0.05
0.05





















(+)BE3



0.05
0.05





































SEQ ID



























NO: 121
G
A
G
T
C
C
C
A
G
C
A
a
A
A
G
A
A
G
A
A
A
A
G


































EMX1_17
C-
Untreated




0.10
0.19
0.09


0.07















>other
(+)BE1




0.13
0.17
0.09


0.05















bases
(+)BE2




0.10
0.20
0.06


0.03
















(+)BE3




0.11
0.20
0.07


0.07





































SEQ ID



























NO: 122
a
A
G
T
C
C
a
A
G
t

G
A
A
G
A
A
G
A
A
A
G
G


































EMX1_18
C-
Untreated




0.05
0.09



















>other
(+)BE1




0.08
0.09



















bases
(+)BE2




0.08
0.10




















(+)BE3




0.09
0.11





































SEQ ID



























NO: 123
a
A
G
T
C
C
a
t
G
C
A
G
A
A
G
A
g
G
A
A
G
G
G


































EMX1_19
C-
Untreated




0.03
0.07



0.10















>other
(+)BE1




0.17
0.10



0.12















bases
(+)BE2




0.09
0.14



0.08
















(+)BE3




0.24
0.30



0.12





































SEQ ID



























NO: 124
G
A
G
T
C
C
t
A
G

A
G
A
A
G
A
A
a
A
A
A
G
G


































EMX1_20
C-
Untreated




0.05
0.12



















>other
(+)BE1




0.28
0.24



















bases
(+)BE2




0.39
0.42




















(+)BE3




0.50
0.57





































SEQ ID



























NO: 125
G
A
G
T
C
C
c
t

C
A
G
g
A
G
A
A
G
A
A
A
G
G


































EMX1_21
C-
Untreated




0.16
0.08
0.07


0.03















>other
(+)BE1




0.15
0.10
0.06


0.04















bases
(+)BE2




0.20
0.13
0.11


0.05
















(+)BE3




0.20
0.12
0.10


0.06





































SEQ ID



























NO: 126
a
c
G
T
C
t
G
A
G
C
A
G
A
A
G
A
A
G
A
A
T
G
G


































EMX1_22
C-
Untreated

0.14


0.04




0.11















>other
(+)BE1

0.17


0.36




0.10















bases
(+)BE2

0.13


0.14




0.11
















(+)BE3

0.15


0.62




0.12





































SEQ ID



























NO: 127
G
A
G
T
t
C
c
A
G
a
A
G
A
A
G
A
A
G
A
A
G
A
G


































EMX1_23
C-
Untreated





0.06
0.08


















>other
(+)BE1





0.09
0.13


















bases
(+)BE2





0.06
0.10



















(+)BE3





0.06
0.09





































SEQ ID



























NO: 128
G
A
G
T
C
C
t
A
a

A
G
A
A
G
A
A
G
c
A
G
G
G


































EMX1_24
C-
Untreated




0.05
0.18












0.11






>other
(+)BE1




0.04
0.18












0.12






bases
(+)BE2




0.05
0.19












0.11







(+)BE3




0.05
0.22












0.12





































SEQ ID



























NO: 129
c
A
G
T
C
C
a
A
a
C
A
G
A
A
G
A
g
G
A
A
T
G
G


































EMX1_25
C-
Untreated
0.11



0.05
0.11



0.11















>other
(+)BE1
0.08



0.10
0.10



0.10















bases
(+)BE2
0.10



0.06
0.10



0.11
















(+)BE3
0.11



0.07
0.13



0.11










FANCF


Base editing efficiency (%)





































SEQ ID



























NO: 131
G
G
A
A
T
C
C
C
T
T
C
T
G
C
A
G
C
A
C
C
T
G
G


































On-target
C-
Untreated





0.06
0.10
0.04


0.03


0.13


0.13

0.05
0.04




(FANCF_2)
>other
(+)BE1





0.81
10.39 
0.42


0.07


0.12


0.13

0.06
0.03





bases
(+)BE2





2.11
12.06 
1.97


0.39


0.14


0.09

0.07
0.02






(+)BE3





10.26 
9.44
9.28


4.12


0.18


0.12

0.05
0.04





































SEQ ID



























NO: 130
t
G
A
A
T
C
C
C
a
T
C
T
c
C
A
G
C
A
C
C
A
G
G


































FANCF_1
C-
Untreated





0.07
0.10
0.08


0.03

0.04
0.04


0.07

0.04
0.07





>other
(+)BE1





0.07
0.10
0.09


0.03

0.05
0.05


0.09

0.03
0.06





bases
(+)BE2





0.10
0.10
0.12


0.03

0.02
0.06


0.07

0.02
0.08






(+)BE3





0.16
0.16
0.18


0.06

0.05
0.07


0.09

0.03
0.07





































SEQ ID



























NO: 132
G
G
A
g
T
C
C
C
T
c
C
T
a
C
A
G
C
A
C
C
A
G
G


































FANCF_3
C-
Untreated





0.06
0.09
0.05

0.08
0.03


0.06


0.07

0.06
0.14





>other
(+)BE1





0.06
0.09
0.06

0.08
0.03


0.06


0.06

0.07
0.13





bases
(+)BE2





0.06
0.10
0.05

0.08
0.04


0.05


0.07

0.07
0.15






(+)BE3





0.20
0.23
0.18

0.16
0.06


0.08


0.06

0.07
0.18





































SEQ ID



























NO: 133
G
G
A
g
T
C
C
C
T
c
C
T
a
C
A
G
C
A
C
C
A
G
G


































FANCF_4
C-
Untreated





0.06
0.05
0.05

0.05
0.06


0.06


0.03

0.03
0.06





>other
(+)BE1





0.06
10.05
0.05

0.02
0.07


0.07


0.03

0.03
0.05





bases
(+)BE2





0.07
10.06
0.03

0.05
0.06


0.07


0.02

0.04
0.06






(+)BE3





0.11
0.09
0.12

0.05
0.06


0.07


0.04

0.04
0.04





































SEQ ID



























NO: 134
G
G
A
A
T
C
C
C
T
T
C
T
a
C
A
G
C
A
t
C
C
T
G


































FANCF_5
C-
Untreated





0.09
10.07
0.05


0.03


0.07


0.03


0.03





>other
(+)BE1





0.07
0.06
0.04


0.03


0.07


0.03


0.03





bases
(+)BE2





0.08
0.05
10.06


0.03


0.06


0.05


0.03






(+)BE3





0.10
10.07
0.05


0.03


0.07


0.03


0.02





































SEQ ID



























NO: 135
G
G
A
g
T
C
C
C
T
c
C
T
G
C
A
G
C
A
C
C
T
G
A


































FANCF_6
C-
Untreated





0.04
0.04
0.04

0.02
0.04


0.09


0.06

0.02
0.04





>other
(+)BE1





0.05
0.05
0.02

0.02
0.04


0.12


0.04

0.05
0.05





bases
(+)BE2





0.04
0.05
0.05

0.03
0.06


0.11


0.06

0.05
0.05






(+)BE3





0.13
0.09
0.09

0.05
0.06


0.12


0.06

0.05
0.03





































SEQ ID



























NO: 136
G
G
A
A
c
C
C
C
g
T
C
T
G
C
A
G
C
A
C
C
A
G
G


































FANCF_7
C-
Untreated




0.03
0.07
0.07
0.06


0.03


0.20


0.05

0.03
0.07





>other
(+)BE1




0.05
0.06
0.04
0.07


0.01


0.21


0.05

0.02
0.05





bases
(+)BE2




0.04
0.08
10.05
0.08


0.02


0.23


0.06

0.02
0.05






(+)BE3




1.06
1.07
1.07
1.02


0.71


0.22


0.07

0.03
0.07





































SEQ ID



























NO: 137
G
t
c
t
c
C
C
C
T
T
c
T
G
C
A
G
C
A
C
C
A
G
G


































FANCF_8
C-
Untreated


0.02

0.03
0.04
0.02
0.05


0.03


0.11


0.03

0.02
0.03





>other
(+)BE1


0.02

0.02
0.03
0.04
0.05


0.03


0.08


0.04

0.02
0.04





bases
(+)BE2


0.01

0.02
0.02
10.05
0.05


0.02


0.09


0.03

0.02
0.04






(+)BE3


0.02

0.02
0.04
0.04
0.08


0.03


0.10


0.03

0.02
0.03





































SEQ ID



























NO: 138
a
a
A
A
T
C
C
C
T
T
C
c
G
C
A
G
C
A
C
C
T
A
G


































FANCF_9
C-
Untreated





0.07
0.02
0.04


0.05
0.06

0.05


0.04

0.05
0.06





>other
(+)BE1





0.08
0.03
0.04


0.04
0.07

0.04


0.04

0.05
0.04





bases
(+)BE2





0.08
0.02
0.03


0.03
0.07

0.04


0.04

0.04
0.06






(+)BE3





0.10
0.04
0.05


0.05
0.06

0.06


0.04

0.05
0.03





































SEQ ID



























NO: 139
t
G
t
A
T
t
t
C
T
T
C
T
G
C
c
t
C
A
g
g
C
T
G


































FANCF_10
C-
Untreated

























>other
(+)BE1

























bases
(+)BE2


























(+)BE3





































SEQ ID



























NO: 140
G
G
A
A
T
a
t
C
T
T
C
T
G
C
A
G
C
C
C
C
A
G
G


































FANCF_11
C-
Untreated







0.03


0.05


0.22


0.03
0.05
0.05
0.10





>other
(+)BE1







0.03


0.04


0.23


0.03
0.06
0.05
0.09





bases
(+)BE2







0.03


0.04


0.21


0.03
0.06
0.07
0.09






(+)BE3







0.04


0.03


0.21


0.02
0.06
0.05
0.09





































SEQ ID



























NO: 141
G
a
g
t
g
C
C
C
T
g
a
a
G
C
c
t
C
A
g
C
T
G
G


































FANCF_12
C-
Untreated

























>other
(+)BE1

























bases
(+)BE2


























(+)BE3





































SEQ ID



























NO: 142
a
c
c
A
T
C
C
C
T
c
C
T
G
C
A
G
C
A
C
C
A
G
G


































FANCF_13
C-
Untreated

0.07
0.06


0.04
0.04
0.04

0.06
0.05


0.10


0.03

0.08
0.04





>other
(+)BE1

0.14
0.07


0.07
0.04
0.05

0.06
0.04


0.10


0.04

0.06
0.05





bases
(+)BE2

0.11
0.07


0.04
0.03
0.04

0.06
0.04


0.12


0.04

0.05
0.05






(+)BE3

0.13
0.08


0.15
0.15
0.14

0.13
0.09


0.10


0.04

0.06
0.04





































SEQ ID



























NO: 143
t
G
A
A
T
C
C
t
a
a
C
T
G
C
A
G
c
A
C
C
A
G
G


































FANCF_14
C-
Untreated





0.09
0.05



0.04


0.09


0.06

0.08
0.06





>other
(+)BE1





0.09
10.04



0.05


0.10


0.06

0.10
0.07





bases
(+)BE2





0.07
0.05



0.04


0.07


0.06

0.09
0.06






(+)BE3





0.10
0.08



0.03


0.11


0.07

0.10
0.07





































SEQ ID



























NO: 144
c
t
c
t
g
t
C
C
T
T
C
T
G
C
A
G
C
A
C
C
T
G
G


































FANCF_15
C-
Untreated
0.03

0.04



0.05
0.02


0.02


0.06


0.02

0.01
0.03





>other
(+)BE1
0.03

0.02



0.04
0.02


0.02


0.06


0.03

0.02
0.04





bases
(+)BE2
0.04

0.03



0.04
0.03


0.02


0.05


0.03

0.02
0.04






(+)BE3
0.03

0.02



0.07
0.03


0.02


0.05


0.03

0.02
0.04










RNF2


Base editing efficiency (%)





































SEQ ID



























NO: 93
G
T
C
A
T
C
T
T
A
G
T
C
A
T
T
A
C
C
T
G
A
G
G


































On-target
C-
Untreated





0.07





0.06




0.03
0.07






(RNF2_1)
>other
(+)BE1





2.90





0.08




0.03
0.07







bases
(+)BE2





3.89





0.62




0.05
0.08








(+)BE3





31.12 





3.45




0.16
0.08










HBB


Base editing efficiency (%)


































SEQ ID



























NO: 145
C
T
T
G
C
C
C
C
A
C
A
G
G
G
C
A
G
T
A
A
C
G
G


































On-target
C-
Untreated
0.05



0.08
0.03
0.05
0.04

0.04




0.08









(HBB_1)
>other
(+)BE1
0.04



0.08
0.14
0.17
0.08

0.05




0.07










bases
(+)BE2
0.08



0.56
0.80
0.83
0.80

0.07




0.06











(+)BE3
0.10



3.01
4.51
4.88
4.64

0.14




0.08





































SEQ ID



























NO: 146
t
T
g
c
t
C
C
C
A
C
A
G
G
G
C
A
G
T
A
A
A
C
G


































HBB_2
C-
Untreated



0.07

0.06
0.04
0.05

0.04




0.06










>other
(+)BE1



0.07

0.09
0.07
0.07

0.04




0.07










bases
(+)BE2



0.14

0.24
0.22
0.22

0.05




0.06











(+)BE3



0.42

0.89
0.84
0.86

0.07




0.06





































SEQ ID



























NO: 147
g
c
T
G
C
C
C
C
A
C
A
G
G
G
C
A
G
C
A
A
A
G
G


































HBB_3
C-
Untreated

0.07


0.06
0.06
0.11
0.03

0.07




0.14


0.09







>other
(+)BE1

0.08


0.06
0.06
0.10
0.03

0.05




0.10


0.08







bases
(+)BE2

0.09


0.13
0.15
0.17
0.09

0.05




0.12


0.09








(+)BE3

0.09


0.80
0.86
0.87
0.75

0.07




0.11


0.09





































SEQ ID



























NO: 148
g
T
g
G
C
C
C
C
A
C
A
G
G
G
C
A
G
g
A
A
T
G
G


































HBB_4
C-
Untreated




0.07
0.13
0.06
0.09

0.04




0.06










>other
(+)BE1




0.09
0.14
0.07
0.08

0.05




0.08










bases
(+)BE2




0.09
0.15
0.08
0.12

0.04




0.07











(+)BE3




0.14
0.20
0.13
0.16

0.07




0.08





































SEQ ID



























NO: 149
a
T
T
G
C
C
C
C
A
C
g
G
G
G
C
A
G
T
g
A
C
G
G


































HBB_5
C-
Untreated




0.12
0.19
0.73
0.40

0.16




0.20










>other
(+)BE1




0.16
0.20
0.76
0.47

0.19




0.25










bases
(+)BE2




0.14
0.16
0.77
0.51

0.17




0.28











(+)BE3




0.36
0.42
0.95
0.73

0.20




0.21





































SEQ ID



























NO: 150
a
c
T
c
t
C
C
C
A
C
A
a
G
G
C
A
G
T
A
A
G
G
G


































HBB_6
C-
Untreated

0.11

0.12
0.08
0.11
0.20
0.08

0.05




0.17










>other
(+)BE1

0.10

0.16
0.10
0.09
0.20
0.10

0.04




0.14










bases
(+)BE2

0.08

0.16
0.11
0.11
0.21
0.10

0.05




0.20











(+)BE3

0.10

0.14
0.13
0.13
0.22
0.09

0.05




0.17





































SEQ ID



























NO: 151
t
c
a
G
C
C
C
C
A
C
A
G
G
G
C
A
G
T
A
A
G
G
G


































HBB_7
C-
Untreated

0.03


0.07
0.07
0.09
0.05

0.05




0.08










>other
(+)BE1

0.14


0.09
0.09
0.11
0.08

0.06




0.14










bases
(+)BE2

0.27


0.09
0.22
0.25
0.19

0.05




0.09











(+)BE3

2.82


0.80
2.89
4.01
4.20

0.14




0.09










HEK2


Base editing efficiency (%)





































SEQ ID
G
A
A
C
A
C
A
A
A
G
C
A
T
A
G
A
CT
G
C
G
G
G
G




NO: 153


































On-target
C-
Untreated



0.04

 0.05




0.04





0.16







(HEK2_2)
>other
(+)BE1



0.65

10.29




0.04





0.18








bases
(+)BE2



7.32

14.69




0.03





0.17









(+)BE3



11.74 

33.30




0.07





0.18





































SEQ ID



























NO: 152
G
A
A
C
A
C
A
A
t
G
C
A
T
A
G
A
t
T
G
C
C
G
G


































HEK2_1
C-
Untreated



0.10

0.00




0.11








0.18





>other
(+)BE1



0.10

0.10




0.13








0.21





bases
(+)BE2



0.13

0.12




0.11








0.16






(+)BE3



0.17

0.21




0.11








0.19





































SEQ ID



























NO: 154
a
A
c
t
C
C
A
A
A
G
c
A
T
A
t
A
C
T
G
C
T
G
G


































HEK2_3
C-
Untreated


0.09

0.09
0.34




0.25








0.09





>other
(+)BE1


0.08

0.07
0.37




0.24








0.08





bases
(+)BE2


0.09

0.07
0.38




0.19








0.08






(+)BE3


0.09

0.07
0.38




0.24








0.07










HEK3


Base editing efficiency (%)





































SEQ ID



























NO: 156
G
G
C
C
C
A
G
A
C
T
G
A
G
C
A
C
G
T
G
A
T
G
G


































On-target
C-
Untreated


0.13
0.46
0.42



0.14




0.10

0.07








(HEK3_2)
>other
(+)BE1


0.38
6.45
8.56



0.59




0.14

0.08









bases
(+)BE2


0.37
6.27
8.17



0.41




0.20

0.06










(+)BE3


1.00
24.71 
31.39 



0.76




0.09

0.10





































SEQ ID



























NO: 155
a
G
C
t
C
A
G
A
C
T
G
A
G
C
A
a
G
T
G
A
G
G
G


































HEK3_1
C-
Untreated


0.12

0.04



0.04




0.14











>other
(+)BE1


0.12

0.04



0.07




0.13











bases
(+)BE2


0.13

0.05



0.08




0.17












(+)BE3


0.13

0.09



0.05




0.13





































SEQ ID



























NO: 157
G
t
g
g
C
c
c
A
g
a
G
A
G
C
A
C
G
T
G
t
G
G
G


































HEK3_3
C-
Untreated




0.07
0.06
0.07






0.12

0.13









>other
(+)BE1




0.08
0.05
0.10






0.09

0.11









bases
(+)BE2




0.08
0.05
0.06






0.11

0.12










(+)BE3




0.07
0.05
0.07






0.10

0.10





































SEQ ID



























NO: 158
c
a
C
C
C
A
G
A
C
T
G
A
G
C
A
C
G
T
G
c
T
G
G


































HEK3_4
C-
Untreated
0.08

0.07
0.07
0.05



0.01




0.14

0.06



0.04





>other
(+)BE1
0.09

0.06
0.08
0.06



0.03




0.13

0.04



0.04





bases
(+)BE2
0.09

0.07
0.07
0.06



0.02




0.10

0.05



0.05






(+)BE3
0.08

0.05
0.08
0.06



0.02




0.13

0.05



0.05





































SEQ ID



























NO: 159
c
G
g
C
C
c
a
A
C
T
G
A
G
C
A
a
G
T
G
A
T
G
G


































HEK3_5
C-
Untreated
0.16


0.08
0.13
0.10


0.06




0.19











>other
(+)BE1
0.19


0.11
0.14
0.07


0.06




0.21











bases
(+)BE2
0.16


0.08
0.13
0.09


0.05




0.16












(+)BE3
0.16


0.08
0.13
0.09


0.05




0.20





































SEQ ID



























NO: 160
a
G
a
C
C
A
G
A
C
T
G
A
G
C
A
a
G
a
G
A
G
G
G


































HEK3_6
C-
Untreated



0.08
0.10



0.06




0.20











>other
(+)BE1



0.09
0.12



0.06




0.19











bases
(+)BE2



0.08
0.12



0.06




0.19












(+)BE3



0.10
0.11



0.05




0.16





































SEQ ID



























NO: 161
G
G
C
C
a
c
t
c
a
T
G
g
c
C
A
C
a
T
a
c
T
G
G


































HEK3_7
C-
Untreated


0.45
0.15

0.05

0.19




0.29
0.26





0.06





>other
(+)BE1


0.45
0.16

0.08

0.19




0.30
0.28





0.06





bases
(+)BE2


0.47
0.17

0.00

0.19




0.31
0.24





0.06






(+)BE3


0.44
0.16

0.08

0.19




0.29
0.26





0.06










HEK4


Base editing efficiency (%)





































SEQ ID



























NO: 162
G
G
C
A
C
T
G
C
G
G
C
T
G
G
A
G
G
T
G
G
G
G
G


































On-target
C-
Untreated


0.16

 0.11


0.20


0.07













(HEK4_1)
>other
(+)BE1


0.17

 6.18


0.25


0.07














bases
(+)BE2


0.65

10.35


0.84


0.06















(+)BE3


2.34

41.18


0.80


0.07





































SEQ ID



























NO: 163
G
G
C
A
C
T
G
C
t
G
C
T
G
G
g
G
G
T
G
G
T
G
G


































HEK4_2
C-
Untreated


0.11

0.05


0.15


0.98














>other
(+)BE1


0.13

0.38


0.14


0.98














bases
(+)BE2


0.16

0.46


0.13


0.93















(+)BE3


0.31

5.93


0.22


1.07





































SEQ ID



























NO: 164
G
G
C
A
C
T
G
C
a

C
T
G
G
A
G
G
T
t
G
T
G
G


































HEK4_3
C-
Untreated


0.08

0.05


0.07


0.05














>other
(+)BE1


0.10

0.22


0.09


0.05














bases
(+)BE2


0.11

0.22


0.07


0.05















(+)BE3


0.09

0.39


0.08


0.03





































SEQ ID



























NO: 165
G
G
C
t
C
T
G
C
G
G
C
T
G
G
A
G
G
g
G
G
T
G
G


































HEK4_4
C-
Untreated


0.04

0.05


0.34


0.13














>other
(+)BE1


0.05

0.26


0.35


0.13














bases
(+)BE2


0.06

0.19


0.35


0.15















(+)BE3


0.07

2.07


0.34


0.17





































SEQ ID



























NO: 166
a
G
C
A
C
T
G
C
a
G
a
T
G
G
A
G
G
a
G
G
C
G
G


































HEK4_5
C-
Untreated


0.08

0.07


0.11

















>other
(+)BE1


0.09

0.11


0.11

















bases
(+)BE2


0.09

0.07


0.10


















(+)BE3


0.10

0.52


0.20





































SEQ ID



























NO: 167
G
G
C
A
C
T
G
C
G
G
C
a
G
G
g
a
G
g
a
G
G
G
G


































HEK4_6
C-
Untreated

























>other
(+)BE1

























bases
(+)BE2


























(+)BE3





































SEQ ID



























NO: 168
t
G
C
A
C
T
G
C
G
G
C
c
G
G
A
G
G
a
G
G
T
G
G


































HEK4_7
C-
Untreated


0.21

0.12


0.36


0.14
0.09













>other
(+)BE1


0.15

0.53


0.31


0.13
0.08













bases
(+)BE2


0.19

1.25


0.32


0.11
0.14














(+)BE3


0.37

10.75


0.41


0.12
0.07





































SEQ ID



























NO: 169
G
G
C
A
C
T

g
G
G
C
T
G
a
A
G
G
T
a
G
A
G
G


































HEK4_8
C-
Untreated


0.09

0.05





0.08














>other
(+)BE1


0.07

0.15





0.05














bases
(+)BE2


0.08

0.17





0.07















(+)BE3


0.07

0.18





0.06





































SEQ ID



























NO: 170
G
G
C
A
C
T
G
t
G
G
C
T
G
c
A
G
G
T
G
G
A
G
G


































HEK4_9
C-
Untreated


0.09

0.03





0.02


0.04











>other
(+)BE1


0.08

0.04





0.04


0.03











bases
(+)BE2


0.12

0.03





0.04


0.03












(+)BE3


0.12

0.02





0.04


0.05





































SEQ ID



























NO: 171
t
G
C
t
C
T
G
C
G
G
C
a
G
G
A
G
G
a
G
G
A
G
G


































HEK4_10
C-
Untreated


0.08

0.17


0.06


0.06














>other
(+)BE1


0.07

0.17


0.06


0.05














bases
(+)BE2


0.08

0.18


0.07


0.07















(+)BE3


0.08

0.19


0.07


0.07





































SEQ ID



























NO: 172
a
G
C
A
C
T
G
C
a
G
C
T
G
G
g
a
G
T
G
G
A
G
G


































HEK4_11
C-
Untreated


0.16

0.05


0.13


0.07














>other
(+)BE1


0.12

0.47


0.12


0.07














bases
(+)BE2


0.13

0.64


0.14


0.08















(+)BE3


0.19

1.83


0.18


0.08





































SEQ ID



























NO: 173
G
G
C
A
C
T
G
a
G
G
g
T
G
G
A
G
G
T
G
G
G
G
G


































HEK4_12
C-
Untreated


0.10

0.03




















>other
(+)BE1


0.07

0.65




















bases
(+)BE2


0.10

0.47





















(+)BE3


0.09

0.99





































SEQ ID



























NO: 174
G
G
C
A
C
T
G
g
G
G
C
T
G
G
A
G
a
c
G
G
G
G
G


































HEK4_13
C-
Untreated


0.13

0.15





0.13






0.23







>other
(+)BE1


0.12

0.14





0.11






0.18







bases
(+)BE2


0.10

0.13





0.09






0.15








(+)BE3


0.11

0.12





0.12






0.18





































SEQ ID



























NO: 175
a
G
g
A
C
T
G
C
G
G
C
T
G
G
g
G
G
T
G
G
T
G
G


































HEK4_14
C-
Untreated




0.06


0.28


0.05














>other
(+)BE1




0.50


0.37


0.03














bases
(+)BE2




0.63


0.38


0.04















(+)BE3




5.20


0.50


0.04





































SEQ ID



























NO: 176
G
G
C
A
C
T
G
C
a
a
C
T
G
G
A
a
G
T
G
a
T
G
G


































HEK4_15
C-
Untreated


0.11

0.06


0.12


0.03














>other
(+)BE1


0.10

0.08


0.07


0.02














bases
(+)BE2


0.08

0.08


0.08


0.02















(+)BE3


0.10

0.26


0.09


0.03





































SEQ ID



























NO: 177
G
G
C
A
C
T
G
g
G
G
t
T
G
G
A
G
G
T
G
G
G
G
G


































HEK4_16
C-
Untreated


0.17

0.16




















>other
(+)BE1


0.14

1.01




















bases
(+)BE2


0.17

0.58





















(+)BE3


0.38

3.41





































SEQ ID



























NO: 178
G
c
C
A
C
T
G
C
a
G
C
T
a
G
A
G
G
T
G
G
A
G
G


































HEK4_17
C-
Untreated

0.14
0.05

0.07


0.20


0.03














>other
(+)BE1

0.10
0.06

0.24


0.13


0.04














bases
(+)BE2

0.09
0.10

0.27


0.14


0.04















(+)BE3

0.12
0.34

3.12


0.22


0.04





































SEQ ID



























NO: 179
G
c
C
A
C
T
G
C
G
a
C
T
G
G
A
G
G
a
G
G
G
G
G


































HEK4_18
C-
Untreated

0.14
0.07

0.06


60.77


0.04














>other
(+)BE1

0.10
0.05

0.08


61.73


0.03














bases
(+)BE2

0.12
0.03

0.05


60.63


0.05















(+)BE3

0.10
0.08

0.12


60.98


0.04





































SEQ ID



























NO: 180
G
G
C
A
C
T
G

G
G
C
T
G
G
A
G
G
c
G
G
G
G
G


































HEK4_19
C-
Untreated


0.06

0.06





0.05






0.12







>other
(+)BE1


0.07

0.04





0.06






0.11







bases
(+)BE2


0.08

0.06





0.04






0.10








(+)BE3


0.08

0.05





0.07






0.09





































SEQ ID



























NO: 181
a
G
c
t
c
T
G
C
G
G
C
a
G
G
A
G
t
T
G
G
A
G
G


































HEK4_20
C-
Untreated


0.24

0.02


0.20


0.12














>other
(+)BE1


0.21

0.03


0.20


0.08














bases
(+)BE2


0.21

0.02


0.17


0.08















(+)BE3


0.23

0.02


0.22


0.11


















EMX1

Site                  Indel frequency (%)       Validation
                      (−) RGEN    (+) RGEN      BE3           Cas9
On-target (EMX1_4)    0.15        61.59         Validated     Validated
EMX1_1                0.29        38.25         Validated     Validated
EMX1_2                0.00        0.01          Validated     Validated
EMX1_3                0.10        3.45          Validated     Validated
EMX1_5                0.01        0.01          Validated     Invalidated
EMX1_6                0.00        8.63          Validated     Validated
EMX1_7                0.01        0.01          Validated     Invalidated
EMX1_8                0.08        0.08          Validated     Invalidated
EMX1_9                0.01        0.23          Validated     Validated
EMX1_10               0.00        7.94          Validated     Validated
EMX1_11               0.00        0.01          Validated     Invalidated
EMX1_12               0.00        0.00          Invalidated   Invalidated
EMX1_13               0.00        0.00          Invalidated   Invalidated
EMX1_14               0.01        0.01          Invalidated   Invalidated
EMX1_15               0.46        0.89          Validated     Validated
EMX1_16               0.00        0.00          Invalidated   Invalidated
EMX1_17               0.01        0.00          Invalidated   Invalidated
EMX1_18               0.01        0.01          Validated     Invalidated
EMX1_19               0.01        0.02          Validated     Invalidated
EMX1_20               0.27        0.25          Validated     Invalidated
EMX1_21               0.00        0.00          Invalidated   Invalidated
EMX1_22               0.02        0.17          Validated     Validated
EMX1_23               0.01        0.01          Invalidated   Invalidated
EMX1_24               0.00        0.00          Invalidated   Invalidated
EMX1_25               1.06        1.04          Validated     Invalidated


















FANCF

Site                  Indel frequency (%)       Validation
                      (−) RGEN    (+) RGEN      BE3           Cas9
On-target (FANCF_2)   0.01        44.48         Validated     Validated
FANCF_1               0.00        0.02          Validated     Validated
FANCF_3               0.01        0.37          Validated     Validated
FANCF_4               0.01        0.22          Validated     Validated
FANCF_5               0.00        0.00          Invalidated   Invalidated
FANCF_6               0.00        0.28          Validated     Validated
FANCF_7               0.01        12.06         Validated     Validated
FANCF_8               0.03        0.05          Invalidated   Invalidated
FANCF_9               0.00        0.08          Validated     Validated
FANCF_10
FANCF_11              0.02        0.03          Invalidated   Invalidated
FANCF_12
FANCF_13              0.01        0.03          Validated     Validated
FANCF_14              0.00        0.00          Invalidated   Invalidated
FANCF_15              0.02        0.00          Invalidated   Invalidated


















RNF2

Site                  Indel frequency (%)       Validation
                      (−) RGEN    (+) RGEN      BE3           Cas9
On-target (RNF2_1)    0.03        66.13         Validated     Validated






















HBB

Site                  Indel frequency (%)       Validation
                      (−) RGEN    (+) RGEN      BE3           Cas9
On-target (HBB_1)     0.02        38.35         Validated     Validated
HBB_2                 0.02        0.01          Validated     Invalidated
HBB_3                 0.01        3.57          Validated     Validated
HBB_4                 0.00        0.70          Validated     Validated
HBB_5                 0.00        0.35          Validated     Validated
HBB_6                 0.02        0.01          Invalidated   Invalidated
HBB_7                 0.00        20.92         Validated     Validated


















HEK2

Site                  Indel frequency (%)       Validation
                      (−) RGEN    (+) RGEN      BE3           Cas9
On-target (HEK2_2)    0.00        43.28         Validated     Validated
HEK2_1                0.00        1.01          Validated     Validated
HEK2_3                0.00        0.00          Invalidated   Invalidated


















HEK3

Site                  Indel frequency (%)       Validation
                      (−) RGEN    (+) RGEN      BE3           Cas9
On-target (HEK3_2)    0.00        60.16         Validated     Validated
HEK3_1                0.00        2.93          Invalidated   Validated
HEK3_3                0.00        0.00          Invalidated   Invalidated
HEK3_4                0.00        4.16          Invalidated   Validated
HEK3_5                0.00        0.00          Invalidated   Invalidated
HEK3_6                0.00        0.02          Invalidated   Invalidated
HEK3_7                0.00        0.00          Invalidated   Invalidated


















HEK4

Site                  Indel frequency (%)       Validation
                      (−) RGEN    (+) RGEN      BE3           Cas9
On-target (HEK4_1)    0.00        59.38         Validated     Validated
HEK4_2                0.02        35.65         Validated     Validated
HEK4_3                0.00        0.00          Validated     Validated
HEK4_4                0.07        29.61         Validated     Validated
HEK4_5                0.00        0.08          Validated     Validated
HEK4_6
HEK4_7                0.02        35.87         Validated     Validated
HEK4_8                0.04        0.04          Validated     Invalidated
HEK4_9                0.02        25.09         Invalidated   Validated
HEK4_10               2.67        3.08          Invalidated   Validated
HEK4_11               0.04        8.97          Validated     Validated
HEK4_12               0.08        10.38         Validated     Validated
HEK4_13               0.11        0.69          Invalidated   Validated
HEK4_14               0.38        46.26         Validated     Validated
HEK4_15               0.01        0.14          Validated     Validated
HEK4_16               0.12        25.87         Validated     Validated
HEK4_17               0.01        2.93          Validated     Validated
HEK4_18               0.16        0.37          Validated     Validated
HEK4_19               0.10        0.11          Invalidated   Invalidated
HEK4_20               0.02        0.07          Invalidated   Validated









The inventors analyzed a total of 75 sites identified using 7 sgRNAs and observed BE3-induced point mutations at 50 of them, including all 7 on-target sites, at frequencies above the noise level caused by sequencing errors (typically in the range of 0.1-2%), giving a validation rate of 67% (=50/75). BE3 may still induce mutations at the remaining BE3-associated, Digenome-positive sites at frequencies below this background noise level. Importantly, the inventors identified BE3 off-target sites at which base editing was detected at a frequency of only 0.1%, demonstrating that Digenome-seq is a highly sensitive method. Cas9 nucleases detectably induced indels at 70% (=44/63) of the sites associated with both Cas9 and BE3 but at none of the 12 sites associated with BE3 alone (Tables 2-8).
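The percentages quoted in the preceding paragraph follow directly from the stated site counts. A minimal Python sketch of that arithmetic is given here for convenience; the function name `validation_rate` is illustrative and is not part of the original disclosure.

```python
def validation_rate(validated: int, total: int) -> float:
    """Return the share of validated sites as a percentage."""
    if total <= 0:
        raise ValueError("total must be a positive number of sites")
    if not 0 <= validated <= total:
        raise ValueError("validated must lie between 0 and total")
    return 100.0 * validated / total


# Counts taken from the text: BE3 point mutations at 50 of 75 analyzed sites,
# Cas9 indels at 44 of the 63 sites associated with both Cas9 and BE3.
be3_rate = validation_rate(50, 75)
cas9_rate = validation_rate(44, 63)

print(f"BE3 validation rate:  {be3_rate:.0f}%")
print(f"Cas9 validation rate: {cas9_rate:.0f}%")
```

Rounded to whole percentages, the two values correspond to the 67% and 70% figures stated in the text.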



FIGS. 14a-14c show base editing efficiencies at Digenome-captured sites associated only with 3 different Cas9 nucleases. As shown in FIGS. 14a-14c, BE3 did not detectably cause substitutions at any of the 24 Digenome-positive sites associated with these 3 Cas9 nucleases alone. Furthermore, FIGS. 15a-15c show base editing efficiencies of 3 different BE3 deaminases at Digenome-negative sites. As shown in FIGS. 15a-15c, the 3 BE3 deaminases did not induce base editing at 28 Digenome-negative sites with ≤3 mismatches, identified using Cas-OFFinder (Bae, S., Park, J. & Kim, J. S. Cas-OFFinder: A fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics (2014)). Frequencies of BE3-induced substitutions correlated well with those of Cas9-mediated indels [R² = 0.92 (EMX1) or 0.89 (HBB)] (FIGS. 6e and 6f). Nevertheless, there were many off-target sites validated with BE3 but not with Cas9. Of these validated, BE3-exclusive off-target sites, 64% (=7/11) had a missing nucleotide compared to their respective on-target sites. These results show that Cas9 and BE3 off-target sites largely overlap with each other but that there are off-target sites associated exclusively with Cas9 or exclusively with BE3 (FIG. 10).
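The R² values reported in the preceding paragraph summarize how closely BE3 substitution frequencies track Cas9 indel frequencies across sites. The following is a hedged sketch only: it assumes that R² is the squared Pearson correlation coefficient of paired per-site frequencies, whereas the text does not state the fitting procedure or any data transformation. The function name `r_squared` and the toy numbers are illustrative and are not measured data.

```python
from math import fsum
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 points")
    mean_x = fsum(x) / len(x)
    mean_y = fsum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return (sxy * sxy) / (sxx * syy)


# Toy, perfectly proportional values (NOT data from the tables above).
toy_substitution = [1.0, 2.0, 3.0]
toy_indel = [2.0, 4.0, 6.0]
toy_r2 = r_squared(toy_substitution, toy_indel)
```

In practice, the two input sequences would hold, for each validated site of one sgRNA, the BE3-induced substitution frequency and the Cas9-induced indel frequency in the same site order.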


Example 6. Reducing BE3 Off-Target Effects Via Modified sgRNAs

To reduce BE3 off-target effects, the inventors replaced conventional sgRNAs (termed gX19 or GX19; “g” and “G” represent, respectively, a mismatched and a matched guanine at the 5′ terminus) with truncated sgRNAs (termed gX18 or gX17) or with extended sgRNAs containing one or two extra guanines at the 5′ terminus (termed gX20 or ggX20), and measured on-target and off-target base-editing frequencies in HEK293T cells. The results are shown in FIGS. 16-17 and Table 17.
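The sgRNA naming scheme described in the preceding paragraph can be illustrated with a short Python sketch. It is written under stated assumptions rather than taken from the disclosure: the Xn portion is assumed to be the n PAM-proximal nucleotides of the 20-nt protospacer, a 5′ guanine is written as “G” when the target carries a G at that position and as “g” otherwise, and the extra guanines of extended guides are written in lower case as in the names gX20 and ggX20. The function name `sgrna_variant` is hypothetical; the EMX1 protospacer is the first 20 nt of SEQ ID NO: 31.

```python
from typing import Tuple


def sgrna_variant(protospacer: str, n: int, extra_g: int = 1) -> Tuple[str, str]:
    """Return (name, 5'->3' guide) keeping the n PAM-proximal nucleotides.

    For n shorter than the protospacer, a single 5' guanine replaces the
    target base just upstream of the kept region. For n equal to the full
    length, extra_g guanines are prepended (extended guide).
    """
    target = protospacer.upper()
    if not 1 <= n <= len(target):
        raise ValueError("n must be between 1 and the protospacer length")
    body = target[-n:]
    if n < len(target):
        # Assumption: upper-case G marks a guanine matching the target base.
        lead = "G" if target[-n - 1] == "G" else "g"
    else:
        if extra_g < 1:
            raise ValueError("an extended guide needs at least one extra guanine")
        lead = "g" * extra_g
    return f"{lead}X{n}", lead + body


EMX1 = "GAGTCCGAGCAGAAGAAGAA"  # 20-nt protospacer, PAM (GGG) not included

for n, extra in [(20, 2), (20, 1), (19, 1), (18, 1), (17, 1)]:
    name, guide = sgrna_variant(EMX1, n, extra)
    print(f"{name:6s} {guide}")
```

For the EMX1 target, this sketch yields the names ggX20, gX20, GX19, gX18 and GX17, which are the row labels used for the on-target site in Table 17.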









TABLE 17





Analysis of BE3 off-target effect via modified sgRNAs







EMX1





































SEQ ID NO: 31
G
A
G
T
C
C
G
A
G
C
A
G
A
A
G
A
A
G
A
A
G
G
G





On-target
C-
Untreated




0.04
0.06



0.15















(EMX1_4)
>other
ggX20




49.01
46.36



0.10
















bases
gX20




54.78
50.04



0.14

















GX19




49.17
45.06



0.10

















gX18




48.68
37.61



0.09

















GX17




48.71
37.70



0.14







SEQ ID NO: 106
G
A
G
T
C
t
a
A
G
C
A
G
A
A
G
A
A
G
A
A
G
A
G





EMX1_1
C-
Untreated




0.04




0.05
















>other
ggX20




1.26




0.05
















bases
gX20




8.50




0.06

















GX19




15.57




0.07

















gX18




0.06




0.05

















GX17




0.07




0.05







SEQ ID NO: 107
G
A
a
T
C
c
a
A
G
C
A
G
A
A
G
A
A
G
A
G
A
A
G





EMX1_2
C-
Untreated




0.08
0.08



0.07
















>other
ggX20




0.40
0.36



0.05
















bases
gX20




0.80
0.75



0.07

















GX19




0.84
0.81



0.07

















gX18




0.22
0.23



0.08

















GX17




0.16
0.17



0.06







SEQ ID NO: 108
a
A
G
T
C
t
G
A
G
c
A
c
A
A
G
A
A
G
A
A
T
G
G







Untreated




0.02




0.07

0.06













EMX1_3
C-
ggX20




0.03




0.06

0.06














>other
gX20




0.03




0.07

0.05














bases
GX19




0.13




0.07

0.05















gX18




0.02




0.07

0.05















GX17




0.02




0.08

0.04







SEQ ID NO: 109
G
A
a
T
C
C
a
A
G

A
G
A
A
G
A
A
G
A
A
T
G
G





EMX1_5
C-
Untreated




0.11
0.05




















>other
ggX20




1.09
1.11




















bases
gX20




2.31
2.27





















GX19




0.96
0.96





















gX18




0.06
0.11





















GX17




0.08
0.12







SEQ ID NO: 110
G
A
G
T
C
C
t
A
G
C
A
G
g
A
G
A
A
G
A
A
G
A
G





EMX1_6
C-
Untreated




0.02
0.04



0.04
















>other
ggX20




0.34
0.35



0.05
















bases
gX20




1.69
1.71



0.05

















GX19




2.43
2.40



0.04

















gX18




0.02
0.02



0.05

















GX17




0.02
0.04



0.04







SEQ ID NO: 111
G
A
G
T
C
C
a
A
G
C
A
G
t
A
G
A
g
G
A
A
G
G
G





EMX1_7
C-
Untreated




0.03
0.06



0.06
















>other
ggX20




0.03
0.06



0.08
















bases
gX20




0.04
0.08



0.08

















GX19




0.07
0.10



0.07

















gX18




0.03
0.05



0.09

















GX17




0.03
0.05



0.08







SEQ ID NO: 112
G
t
G
T
C
C
t
A
G

A
G
A
A
G
A
A
G
A
A
G
G
G





EMX1_8
C-
Untreated




0.05
0.03




















>other
ggX20




0.10
0.09




















bases
gX20




0.38
0.35





















GX19




0.64
0.57





















gX18




0.60
0.56





















GX17




0.62
0.60







SEQ ID NO: 113
a
A
G
T
C
C
G
A
G
g
A
G
A
g
G
A
A
G
A
A
A
G
G





EMX1_9
C-
Untreated




0.05
0.16




















>other
ggX20




0.04
0.14




















bases
gX20




0.05
0.17





















GX19




0.06
0.18





















gX18




0.05
0.18





















GX17




0.05
0.16







SEQ ID NO: 114
G
A
G
g
C
C
G
A
G
C
A
G
A
A
G
A
A
a
g
A
C
G
G





EMX1_10
C-
Untreated




0.14
0.10



0.13
















>other
ggX20




0.26
0.21



0.12
















bases
gX20




0.44
0.39



0.19

















GX19




3.45
3.70



0.17

















gX18




0.17
0.10



0.15

















GX17




0.18
0.09



0.16







SEQ ID NO: 115
a
g
t
T
C
C
a
A
G
C
A
G
A
A
G
A
A
G
c
A
T
G
G





EMX1_11
C-
Untreated




0.06
0.05



0.07








0.06







>other
ggX20




0.27
0.28



0.07








0.07







bases
gX20




0.74
0.70



0.07








0.08








GX19




0.74
0.62



0.06








0.07








gX18




0.06
0.05



0.07








0.06








GX17




0.07
0.06



0.07








0.07







SEQ ID NO: 116
G
A
G
T
C
C
a
C
a
c
A
G
A
A
G
A
A
G
A
A
A
G
A







Untreated




0.08
0.26

0.11

0.11















EMX1_12
C-
ggX20




0.07
0.21

0.12

0.11
















>other
gX20




0.09
0.23

0.09

0.11
















bases
GX19




0.17
0.33

0.17

0.10

















gX18




0.07
0.25

0.10

0.13

















GX17




0.08
0.23

0.11

0.11







SEQ ID NO: 117
G
A
G
T
C
C
a
A
G

A
G
A
A
G
A
A
G
t
g
A
G
G





EMX1_13
C-
Untreated




0.08
0.12




















>other
ggX20




0.06
0.11




















bases
gX20




0.07
0.11





















GX19




0.08
0.13





















gX18




0.08
0.13





















GX17




0.07
0.13







SEQ ID NO: 118
G
A
G
T
C
C
t
A
G

A
G
A
A
G
A
A
G
g
A
A
G
G





EMX1_14
C-
Untreated




0.06
0.13




















>other
ggX20




0.07
0.19




















bases
gX20




0.07
0.17





















GX19




0.05
0.13





















gX18




0.06
0.12





















GX17




0.05
0.14







SEQ ID NO: 119
G
A
a
T
C
C
a
A
G
C
A
G
g
A
G
A
A
G
A
A
G
G
A





EMX1_15
C-
Untreated




0.04
0.07



0.05
















>other
ggX20




0.09
0.15



0.04
















bases
gX20




0.54
0.60



0.08

















GX19




0.14
0.18



0.05

















gX18




0.04
0.07



0.05

















GX17




0.01
0.07



0.06







SEQ ID NO: 120
G
t
a
c
C
a
G
A
G

A
G
A
A
G
A
A
G
A
g
A
G
G





EMX1_16
C-
Untreated



0.06
0.06





















>other
ggX20



0.05
0.05





















bases
gX20



0.06
0.05






















GX19



0.05
0.05






















gX18



0.06
0.04






















GX17



0.06
0.05







SEQ ID NO: 121
G
A
G
T
C
C
C
A
G
C
A
a
A
A
G
A
A
G
A
A
A
A
G





EMX1_17
C-
Untreated




0.10
0.19
0.09


0.07
















>other
ggX20




0.10
0.16
0.10


0.07
















bases
gX20




0.19
0.24
0.13


0.05

















GX19




0.11
0.20
0.07


0.07

















gX18




0.12
0.24
0.09


0.06

















GX17




0.12
0.20
0.07


0.06







SEQ ID NO: 122
a
A
G
T
C
C
a
A
G
t

G
A
A
G
A
A
G
A
A
A
G
G





EMX1_18
C-
Untreated




0.06
0.09




















>other
ggX20




0.05
0.09




















bases
gX20




0.05
0.08





















GX19




0.09
0.11





















gX18




0.05
0.08





















GX17




0.05
0.09







SEQ ID NO: 123
a
A
G
T
C
C
a
t
G
C
A
G
A
A
G
A
g
G
A
A
G
G
G





EMX1_19
C-
Untreated




0.03
0.07



0.10
















>other
ggX20




0.05
0.07



0.09
















bases
gX20




0.03
0.08



0.12

















GX19




0.24
0.30



0.12

















gX18




0.03
0.08



0.10

















GX17




0.05
0.07



0.09







SEQ ID NO: 124
G
A
G
T
C
C
t
A
G

A
G
A
A
G
A
A
a
A
A
G
G
G





EMX1_20
C-
Untreated




0.05
0.12




















>other
ggX20




0.21
0.26




















bases
gX20




0.43
0.50





















GX19




0.50
0.57





















gX18




0.06
0.12





















GX17




0.05
0.12







SEQ ID NO: 125
G
A
G
T
C
C
c
t

C
A
G
g
A
G
A
A
G
A
A
A
G
G





EMX1_21
C-
Untreated




0.16
0.08
0.07


0.03
















>other
ggX20




0.12
0.07
0.06


0.04
















bases
gX20




0.15
0.11
0.08


0.04

















GX19




0.24
0.17
0.16


0.06

















gX18




0.12
0.09
0.07


0.04

















GX17




0.14
0.08
0.06


0.02







SEQ ID NO: 126
a
c
G
T
C
t
G
A
G
C
A
G
A
A
G
A
A
G
A
A
T
G
G





EMX1_22
C-
Untreated

0.14


0.04




0.11
















>other
ggX20

0.14


0.16




0.13
















bases
gX20

0.15


0.20




0.16

















GX19

0.15


0.62




0.12

















gX18

0.22


1.24




0.12

















GX17

0.13


4.49




0.11







SEQ ID NO: 127
G
A
G
T
t
C
c
A
G
a
A
G
A
A
G
A
A
G
A
A
G
A
G





EMX1_23
C-
Untreated





0.06
0.08



















>other
ggX20





0.06
0.09



















bases
gX20





0.09
0.13




















GX19





0.06
0.09




















gX18





0.07
0.11




















GX17





0.07
0.09







SEQ ID NO: 128
G
A
G
T
C
C
t
A
a

A
G
A
A
G
A
A
G
c
A
G
G
G





EMX1_24
C-
Untreated




0.05
0.18












0.11







>other
ggX20




0.06
0.20












0.16







bases
gX20




0.07
0.19












0.12








GX19




0.05
0.22












0.12








gX18




0.07
0.19












0.15








GX17




0.04
0.18












0.12







SEQ ID NO: 129
c
A
G
T
C
C
a
A
a
C
A
G
A
A
G
A
g
G
A
A
T
G
G





EMX1_25
C-
Untreated
0.11



0.05
0.11



0.11
















>other
ggX20
0.11



0.08
0.12



0.09
















bases
gX20
0.10



0.05
0.10



0.10

















GX19
0.11



0.07
0.13



0.11

















gX18
0.13



0.05
0.14



0.13

















GX17
0.10



0.07
0.13



0.12










FANCF





































SEQ ID NO: 131
G
G
A
A
T
C
C
C
T
T
C
T
G
C
A
G
C
A
C
C
T
G
G





On-target
C-
Untreated





0.06
0.10
0.04


0.03


0.13


0.13

0.05
0.04





(FANCF_2)
>other
ggX20





9.20
8.19
7.94


4.25


0.12


0.12

0.06
0.04






bases
gX20





8.12
7.31
6.89


3.01


0.13


0.12

0.05
0.03







GX19





10.26
9.44
9.28


4.12


0.18


0.12

0.05
0.04







GX18





9.74
8.81
8.16


3.14


0.15


0.14

0.06
0.02







gX17





3.36
2.80
2.77


1.14


0.12


0.12

0.05
0.04







SEQ ID NO: 130
t
G
A
A
T
C
C
C
a
T
C
T
c
C
A
G
C
A
C
C
A
G
G





FANCF_1
C-
Untreated





0.07
0.10
0.08


0.03

0.04
0.04


0.07

0.04
0.07






>other
ggX20





0.06
0.11
0.07


0.03

0.03
0.05


0.11

0.03
0.06






bases
gX20





0.09
0.10
0.09


0.04

0.02
0.05


0.09

0.03
0.09







GX19





0.16
0.16
0.18


0.06

0.05
0.07


0.09

0.03
0.07







GX18





0.80
0.79
0.79


0.25

0.13
0.08


0.10

0.03
0.06







gX17





0.08
0.10
0.09


0.02

0.03
0.06


0.09

0.02
0.07







SEQ ID NO: 132
G
G
A
g
T
C
C
C
T
c
C
T
a
C
A
G
C
A
C
C
A
G
G





FANCF_3
C-
Untreated





0.06
0.09
0.05

0.08
0.03


0.06


0.07

0.06
0.14






>other
ggX20





0.06
0.08
0.04

0.07
0.04


0.06


0.05

0.07
0.15






bases
gX20





0.10
0.13
0.08

0.10
0.05


0.07


0.06

0.07
0.15







GX19





0.20
0.23
0.18

0.16
0.06


0.08


0.06

0.07
0.18







GX18





0.05
0.09
0.05

0.08
0.03


0.05


0.06

0.08
0.17







gX17





0.05
0.08
0.04

0.11
0.05


0.06


0.07

0.09
0.15







SEQ ID NO: 133
G
G
A
g
T
C
C
C
T
c
C
T
a
C
A
G
C
A
C
C
A
G
G





FANCF_4
C-
Untreated





0.06
0.05
0.05

0.05
0.06


0.06


0.03

0.03
0.06






>other
ggX20





0.05
0.05
0.05

0.04
0.04


0.07


0.04

0.03
0.04






bases
gX20





0.08
0.07
0.06

0.06
0.08


0.06


0.02

0.02
0.06







GX19





0.11
0.09
0.12

0.05
0.06


0.07


0.04

0.04
0.04







GX18





0.07
0.07
0.06

0.05
0.06


0.07


0.04

0.03
0.07







gX17





0.06
0.05
0.04

0.06
0.04


0.07


0.03

0.02
0.06







SEQ ID NO: 134
G
G
A
A
T
C
C
C
T
T
C
T
a
C
A
G
C
A
t
C
C
T
G





FANCF_5
C-
Untreated





0.09
0.07
0.05


0.03


0.07


0.03


0.03






>other
ggX20





0.07
0.07
0.04


0.04


0.06


0.03


0.02






bases
gX20





0.07
0.05
0.06


0.04


0.05


0.04


0.03







GX19





0.10
0.07
0.05


0.03


0.07


0.03


0.02







GX18





0.08
0.06
0.06


0.03


0.07


0.04


0.03







gX17





0.09
0.05
0.05


0.05


0.08


0.04


0.02







SEQ ID NO: 135
G
G
A
g
T
C
C
C
T
c
C
T
G
C
A
G
C
A
C
C
T
G
A





FANCF_6
C-
Untreated





0.04
0.04
0.04

0.02
0.04


0.09


0.06

0.02
0.04






>other
ggX20





0.03
0.04
0.04

0.02
0.04


0.12


0.05

0.03
0.06






bases
gX20





0.05
0.06
0.05

0.03
0.06


0.11


0.07

0.06
0.04







GX19





0.13
0.09
0.09

0.05
0.06


0.12


0.06

0.05
0.03







GX18





0.06
0.05
0.04

0.03
0.04


0.08


0.07

0.04
0.05







gX17





0.05
0.05
0.04

0.04
0.05


0.14


0.05

0.05
0.04







SEQ ID NO: 136
G
G
A
A
c
C
C
C
g
T
C
T
G
C
A
G
C
A
C
C
A
G
G





FANCF_7
C-
Untreated




0.03
0.07
0.07
0.06


0.03


0.20


0.05

0.03
0.07






>other
ggX20




0.27
0.29
0.28
0.32


0.10


0.21


0.05

0.02
0.07






bases
gX20




1.46
1.50
1.49
1.48


0.80


0.20


0.04

0.04
0.06







GX19




1.06
1.07
1.07
1.02


0.71


0.22


0.07

0.03
0.07







GX18




0.04
0.07
0.05
0.09


0.01


0.17


0.04

0.04
0.06







gX17




0.04
0.06
0.04
0.09


0.01


0.17


0.05

0.03
0.06







SEQ ID NO: 137
G
t
c
t
c
C
C
C
T
T
C
T
G
C
A
G
C
A
C
C
A
G
G





FANCF_8
C-
Untreated


0.02

0.03
0.04
0.02
0.05


0.03


0.11


0.03

0.02
0.03






>other
ggX20


0.02

0.01
0.03
0.05
0.05


0.01


0.08


0.02

0.02
0.04






bases
gX20


0.01

0.02
0.04
0.04
0.04


0.02


0.08


0.02

0.03
0.03







GX19


0.02

0.02
0.04
0.04
0.08


0.03


0.10


0.03

0.02
0.03







GX18


0.04

0.09
0.09
0.10
0.13


0.05


0.10


0.04

0.03
0.03







gX17


0.04

0.09
0.11
0.11
0.13


0.07


0.11


0.05

0.02
0.04







SEQ ID NO: 138
a
a
A
A
T
C
C
C
T
T
C
c
G
C
A
G
C
A
C
C
T
A
G





FANCF_9
C-
Untreated





0.07
0.02
0.04


0.05
0.06

0.05


0.04

0.05
0.06






>other
ggX20





0.08
0.03
0.04


0.03
0.06

0.06


0.05

0.05
0.04






bases
gX20





0.09
0.03
0.04


0.04
0.05

0.05


0.04

0.06
0.04







GX19





0.10
0.04
0.05


0.05
0.06

0.06


0.04

0.05
0.03







GX18





0.10
0.06
0.07


0.06
0.05

0.07


0.03

0.06
0.04







gX17





0.06
0.06
0.06


0.05
0.06

0.04


0.03

0.05
0.06







SEQ ID NO: 139
t
G
t
A
T
t
t
C
T
T
C
T
G
C
c
t
C
A
g
g
C
T
G





FANCF_10
C-
Untreated


























>other
ggX20


























bases
gX20



























GX19



























GX18



























gX17







SEQ ID NO: 140
G
G
A
A
T
a
t
C
T
T
C
T
G
C
A
G
C
c
C
C
A
G
G





FANCF_11
C-
Untreated







0.03


0.05


0.22


0.03
0.05
0.05
0.10






>other
ggX20







0.03


0.05


0.23


0.02
0.06
0.04
0.09







gX20







0.03


0.03


0.23


0.03
0.05
0.05
0.10







GX19







0.04


0.03


0.21


0.02
0.06
0.05
0.09







GX18







0.03


0.04


0.20


0.02
0.04
0.05
0.07







gX17







0.04


0.05


0.24


0.02
0.06
0.06
0.08







SEQ ID NO: 141
G
a
g
t
g
C
C
C
T
g
a
a
G
C
c
t
C
A
g
C
T
G
G





FANCF_12
C-
Untreated


























>other
ggX20


























bases
gX20



























GX19



























GX18



























gX17







SEQ ID NO: 142
a
c
c
A
T
C
C
C
T
c
C
T
G
C
A
G
C
A
C
C
A
G
G





FANCF_13
C-
Untreated

0.07
0.06


0.04
0.04
0.04

0.06
0.05


0.10


0.03

0.08
0.04






>other
ggX20

0.13
0.07


0.05
0.04
0.05

0.05
0.02


0.08


0.04

0.03
0.06






bases
gX20

0.10
0.08


0.04
0.04
0.04

0.06
0.04


0.09


0.04

0.07
0.04







GX19

0.13
0.08


0.15
0.15
0.14

0.13
0.09


0.10


0.04

0.06
0.04







GX18

0.14
0.12


1.03
0.99
0.94

0.40
0.14


0.09


0.04

0.05
0.05







gX17

0.15
0.15


2.04
1.96
1.94

0.75
0.26


0.10


0.04

0.06
0.04







SEQ ID NO: 143
t
G
A
A
T
C
C
t
a
a
C
T
G
C
A
G
C
A
C
C
A
G
G





FANCF_14
C-
Untreated





0.09
0.05



0.04


0.09


0.06

0.08
0.06






>other
ggX20





0.10
0.05



0.04


0.08


0.07

0.11
0.07






bases
gX20





0.08
0.05



0.05


0.12


0.07

0.09
0.06







GX19





0.10
0.08



0.03


0.11


0.07

0.10
0.07







GX18





0.46
0.42



0.04


0.13


0.05

0.08
0.06







gX17





0.10
0.05



0.03


0.11


0.06

0.09
0.07







SEQ ID NO: 144
c
t
c
t
g
t
C
C
T
T
C
T
G
C
A
G
C
A
C
C
T
G
G





FANCF_15
C-
Untreated
0.03

0.04



0.05
0.02


0.02


0.06


0.02

0.01
0.03






>other
ggX20
0.03

0.03



0.04
0.03


0.02


0.07


0.01

0.02
0.04






bases
gX20
0.04

0.03



0.06
0.03


0.02


0.05


0.02

0.03
0.04







GX19
0.03

0.02



0.07
0.03


0.02


0.05


0.03

0.02
0.04







GX18
0.03

0.04



0.05
0.03


0.03


0.05


0.02

0.02
0.02







gX17
0.03

0.02



0.05
0.04


0.03


0.06


0.04

0.02
0.02







SEQ ID NO: 93
G
T
C
A
T
C
T
T
A
G
T
C
A
T
T
A
C
C
T
G
A
G
G





On-target
C-
Untreated


 0.06


 0.07





0.06




0.03
0.07







(RNF2_1)
>other
ggX20


22.35


29.23





3.10




0.10
0.08








bases
gX20


20.82


28.93





3.23




0.10
0.09









GX19


19.23


31.12





3.45




0.16
0.08









gX18


 9.19


19.16





1.61




0.07
0.08









gX17


 2.34


 7.73





0.95




0.06
0.09







SEQ ID NO: 145
C
T
T
G
C
C
C
C
A
C
A
G
G
G
C
A
G
T
A
A
C
G
G





On-target
C-
Untreated
0.05



0.08
0.03
0.05
0.04

0.04




0.08










(HBB_1)
>other
ggX20
0.30



4.68
6.16
6.49
5.84

0.15




0.08











bases
gX20
0.09



2.76
3.27
3.37
3.07

0.11




0.07












gX19
0.10



3.01
4.51
4.88
4.64

0.14




0.08












gX18
0.08



2.20
6.12
6.80
6.30

0.15




0.07












gX17
0.08



0.63
3.27
4.07
3.74

0.10




0.10







SEQ ID NO: 146
t
T
g
c
t
C
C
C
A
C
A
G
G
G
C
A
G
T
A
A
A
C
G





HBB_2
C-
Untreated



0.07

0.06
0.04
0.05

0.04




0.06











>other
ggX20



0.06

0.08
0.04
0.06

0.04




0.09











bases
gX20



0.08

0.09
0.07
0.08

0.03




0.05












gX19



0.42

0.89
0.84
0.86

0.07




0.06












gX18



0.07

0.12
0.10
0.11

0.05




0.06












gX17



0.07

0.08
0.05
0.06

0.05




0.08







SEQ ID NO: 147
g
c
T
G
C
C
C
C
A
C
A
G
G
G
C
A
G
C
A
A
A
G
G





HBB_3
C-
Untreated

0.07


0.06
0.06
0.11
0.03

0.07




0.14


0.09








>other
ggX20

0.10


0.09
0.11
0.14
0.08

0.07




0.15


0.07








bases
gX20

0.10


0.74
0.77
0.79
0.70

0.09




0.13


0.08









gX19

0.09


0.80
0.86
0.87
0.75

0.07




0.11


0.09









gX18

0.12


0.46
0.64
0.64
0.53

0.05




0.11


0.10









gX17

0.09


0.16
0.19
0.24
0.18

0.04




0.14


0.09







SEQ ID NO: 148
g
T
g
G
C
C
C
C
A
C
A
G
G
G
C
A
G
g
A
A
T
G
G





HBB_4
C-
Untreated




0.07
0.13
0.06
0.09

0.04




0.06











>other
ggX20




0.10
0.11
0.06
0.10

0.05




0.04











bases
gX20




0.08
0.12
0.07
0.09

0.04




0.06












gX19




0.14
0.20
0.13
0.16

0.07




0.08












gX18




0.10
0.24
0.17
0.20

0.08




0.06












gX17




0.84
1.61
1.58
1.53

0.16




0.05







SEQ ID NO: 149
a
T
T
G
C
C
C
C
A
C
g
G
G
G
C
A
G
T
g
A
C
G
G





HBB_5
C-
Untreated




0.12
0.19
0.73
0.40

0.16




0.20











>other
ggX20




0.16
0.20
0.73
0.48

0.19




0.25











bases
gX20




0.20
0.23
0.80
0.47

0.14




0.21












gX19




0.36
0.42
0.95
0.73

0.20




0.21












gX18




0.24
0.32
0.89
0.60

0.20




0.24












gX17




0.17
0.20
0.75
0.49

0.20




0.22







SEQ ID NO: 150
a
c
T
c
t
C
C
C
A
C
A
a
G
G
C
A
G
T
A
A
G
G
G





HBB_6
C-
Untreated

0.11

0.12

0.11
0.20
0.08

0.05




0.17











>other
ggX20

0.09

0.14

0.09
0.24
0.09

0.05




0.19











bases
gX20

0.12

0.13

0.13
0.23
0.14

0.04




0.22












gX19

0.10

0.14

0.13
0.22
0.09

0.05




0.17












gX18

0.12

0.15

0.14
0.26
0.11

0.06




0.22












gX17

0.10

0.16

0.11
0.24
0.10

0.04




0.19







SEQ ID NO: 151
t
c
a
G
C
C
C
C
A
C
A
G
G
G
C
A
G
T
A
A
G
G
G





HBB_7
C-
Untreated

0.03


0.07
0.07
0.09
0.05

0.05




0.08











>other
ggX20

1.37


0.17
0.76
0.99
1.08

0.08




0.09











bases
gX20

2.47


0.41
1.72
2.24
2.30

0.15




0.08












gX19

2.82


0.80
2.89
4.01
4.20

0.14




0.09












gX18

3.34


1.71
5.48
7.00
7.65

0.30




0.08












gX17

3.86


1.68
5.97
7.44
7.65

0.15




0.10










HEK2





































SEQ ID NO: 153
G
A
A
C
A
C
A

A
A
G
C
A
T
A
G
A
C
T
G
C
G
G





On-
C-
Untreated



0.05

0.05





0.03





0.03


0.19




target
>other
ggX20



30.30

47.30





0.03





0.14


0.15




(HEK2_2)
bases
gX20



36.76

44.99





0.08





0.13


0.16






GX19



11.89

34.66





0.05





0.27


0.15






gX18



2.02

45.27





0.02





0.03


0.19






gX17



2.77

30.94





0.02





0.03


0.18







SEQ ID NO: 152
G
A
A
C
A
C
A
A
t
G
C
A
T
A
G
A
t
T
G
C
C
G
G





HEK2_1
C-
Untreated



0.11

0.09




0.09








0.16






>other
ggX20



0.12

0.09




0.14








0.18






bases
GX20



0.17

0.14




0.13








0.19







GX19



0.19

0.22




0.12








0.18







gX18



0.12

0.10




0.11








0.20







gX17



0.11

0.09




0.13








0.20







SEQ ID NO: 154
a
A
c
t
c
C
A
A
A
G
C
A
T
A
t
A
C
T
G
C
T
G
G





HEK2_3
C-
Untreated


0.07

0.09
0.37




0.24








0.24






>other
ggX20


0.09

0.08
0.39




0.24








0.30






bases
gX20


0.08

0.08
0.38




0.25








0.28







GX19


0.08

0.08
0.38




0.24








0.27







gX18


0.08

0.08
0.39




0.24








0.30







gX17


0.06

0.06
0.36




0.23








0.28







SEQ ID NO: 156
G
G
C
C
C
A
G
A
C
T
G
A
G
C
A
C
G
T
G
A
T
G
G





On-target
C-
Untreated


0.15
0.47
0.39



0.15




0.08

0.06









(HEK3_2)
>other
ggX20


6.89
25.21
26.19



0.61




0.07

0.05










bases
gX20


6.36
32.68
37.05



1.76




0.06

0.11











GX19


0.93
25.39
32.09



0.75




0.09

0.13











gX18


0.95
14.23
21.59



1.68




0.09

0.10











gX17


0.14
0.65
0.85



0.40




0.10

0.06







SEQ ID NO: 155
a
G
C
t
C
A
G
A
C
T
G
A
G
C
A
a
G
T
G
A
G
G
G





HEK3_1
C-
Untreated


0.13

0.04



0.06




0.12












>other
ggX20


0.13

0.05



0.05




0.14












bases
gX20


0.12

0.04



0.04




0.15













GX19


0.14

0.09



0.04




0.17













gX18


0.14

0.04



0.06




0.13













gX17


0.11

0.04



0.05




0.12







SEQ ID NO: 161
G
G
C
C
a
c
t
c
a
T
G
g
c
C
A
C
a
T
a
c
T
G
G





HEK3_7
c-
Untreated


0.38
0.18

0.06

0.18




0.30
0.30

0.07



0.06






>other
ggX20


0.42
0.15

0.07

0.20




0.28
0.27

0.07



0.05






bases
gX20


0.39
0.14

0.08

0.15




0.28
0.21

0.08



0.06







GX19


0.44
0.15

0.07

0.17




0.28
0.26

0.08



0.06







GX18


0.45
0.14

0.07

0.16




0.25
0.26

0.06



0.05







gX17


0.42
0.14

0.07

0.19




0.26
0.26

0.07



0.04







SEQ ID NO: 162
G
G
C
A
C
T
G
C
G
G
C
T
G
G
A
G
G
T
G
G
G
G
G





On-target
C-
Untreated


0.17

0.08


0.23


0.07














(HEK4_1)
>other
ggX20


1.97

48.84


1.50


0.08















bases
gX20


1.20

44.02


1.39


0.06
















GX19


1.38

41.26


0.50


0.10
















GX18


0.27

39.88


1.43


0.07
















gX17


0.23

5.72


1.10


0.35







SEQ ID NO: 163
G
G
C
A
C
T
G
C
t
G
c
T
G
G
g
G
G
T
G
G
T
G
G





HEK4_2
C-
Untreated


0.14

0.04


0.11


0.91















>other
ggX20


0.17

0.39


0.13


0.93















bases
gX20


0.21

1.86


0.15


1.11
















GX19


0.27

6.55


0.25


0.99
















GX18


0.16

0.11


0.14


0.90
















gX17


0.15

0.06


0.10


0.93







SEQ ID NO: 164
G
G
C
A
C
T
G
c
a

C
T
G
G
A
G
G
T
t
G
T
G
G





HEK4_3
C-
Untreated


0.09

0.05


0.07


0.05















>other
ggX20


0.08

0.10


0.09


0.04















bases
gX20


0.10

0.26


0.09


0.06
















GX19


0.09

0.27


0.09


0.04
















GX18


0.08

0.05


0.06


0.05
















gX17


0.08

0.04


0.07


0.05







SEQ ID NO: 161
G
G
C
C
a
c
t
c
a
T
G
g
c
C
A
C
a
T
a
c
T
G
G





HEK3_7
C-
Untreated


0.38
0.18

0.06

0.18




0.30
0.30

0.07



0.06






>other
ggX20


0.42
0.15

0.07

0.20




0.28
0.27

0.07



0.05






bases
gX20


0.39
0.14

0.08

0.15




0.28
0.21

0.08



0.06







GX19


0.44
0.15

0.07

0.17




0.28
0.26

0.08



0.06







GX18


0.45
0.14

0.07

0.16




0.25
0.25

0.06



0.05







gX17


0.42
0.14

0.07

0.19




0.26
0.26

0.07



0.04










HEK4





































SEQ ID NO: 162
G
G
C
A
C
T
G
C
G
G
C
T
G
G
A
G
G
T
G
G
G
G
G





On-target
C-
Untreated


0.17

0.08


0.23


0.07














(HEK4_1)
>other
ggX20


1.97

48.84


1.50


0.08















bases
gX20


1.20

44.02


1.39


0.06
















GX19


1.38

41.26


0.50


0.10
















GX18


0.27

39.88


1.43


0.07
















gX17


0.23

5.72


1.10


0.35







SEQ ID NO: 163
G
G
C
A
C
T
G
C
t
G
C
T
G
G
g
G
G
T
G
G
T
G
G





HEK4_2
C-
Untreated


0.14

0.04


0.11


0.91















>other
ggX20


0.17

0.39


0.13


0.93















bases
gX20


0.21

1.85


0.15


1.11
















GX19


0.27

6.55


0.25


0.99
















GX18


0.15

0.11


0.14


0.90
















gX17


0.15

0.06


0.10


0.93







SEQ ID NO: 164
G
G
C
A
C
T
G
C
a

C
T
G
G
A
G
G
T
t
G
T
G
G





HEK4_3
C-
Untreated


0.09

0.05


0.07


















>other
ggX20


0.08

0.10


0.09


















bases
gX20


0.10

0.26


0.09



















GX19


0.09

0.27


0.09



















GX18


0.08

0.05


0.06



















gX17


0.08

0.04


0.07







SEQ ID NO: 165
G
G
C
t
C
T
G
C
G
G
C
T
G
G
A
G
G
g
G
G
T
G
G





HEK4_4
C-
Untreated


0.05

0.05


0.29


0.14















>other
ggX20


0.13

2.87


0.34


0.13















bases
gX20


0.11

2.94


0.38


0.14
















GX19


0.10

2.53


0.35


0.15
















GX18


0.04

0.13


0.30


0.12
















gX17


0.05

0.05


0.29


0.13







SEQ ID NO: 166
a
G
C
A
C
T
G
C
a
G
a
T
G
G
A
G
G
a
G
G
C
G
G





HEK4_5
C-
Untreated


0.09

0.03


0.11


















>other
ggX20


0.11

0.03


0.09


















bases
gX20


0.08

0.07


0.14



















GX19


0.15

0.58


0.17



















GX18


0.08

0.03


0.09



















gX17


0.06

0.03


0.10







SEQ ID NO: 167
G
G
C
A
C
T
G
C
G
G
C
a
G
G
g
a
G
g
a
G
G
G
G





HEK4_6
C-
Untreated


























>other
ggX20


























bases
gX20



























GX19



























GX18



























gX17







SEQ ID NO: 168
t
G
C
A
C
T
G
C
G
G
C
c
G
G
A
G
G
a
G
G
T
G
G





HEK4_7
C-
Untreated


0.24

 0.10


0.38


0.14
0.08














>other
ggX20


0.18

 0.38


0.29


0.13
0.05














bases
gX20


0.19

 1.64


0.36


0.14
0.09















GX19


0.43

 9.74


0.32


0.13
0.08















GX18


1.01

11.33


0.56


0.11
0.08















gX17


0.18

 0.16


0.26


0.13
0.08







SEQ ID NO: 169
G
G
C
A
C
T

g
G
G
C
T
G
a
A
G
G
T
a
G
A
G
G





HEK4_8
C-
Untreated


0.08

0.03





0.09















>other
ggX20


0.18

0.64





0.05















bases
gX20


0.18

0.62





0.05
















GX19


0.07

0.16





0.06
















GX18


0.08

0.03





0.08
















gX17


0.07

0.03





0.06







SEQ ID NO: 170
G
G
C
A
C
T
G
t
G
G
C
T
G
c
A
G
G
T
G
G
A
G
G





HEK4_9
C-
Untreated


0.11

0.03





0.04


0.03












>other
ggX20


0.10

0.04





0.02


0.04












bases
gX20


0.12

0.03





0.03


0.03













GX19


0.12

0.02





0.04


0.04













GX18


0.10

0.03





0.03


0.03













gX17


0.06

0.02





0.03


0.03







SEQ ID NO: 171
t
G
C
t
C
T
G
C
G
G
C
a
G
G
A
G
G
a
G
G
A
G
G





HEK4_10
C-
Untreated


0.08

0.18


0.07


0.07















>other
ggX20


0.07

0.17


0.06


0.07















bases
gX20


0.09

0.16


0.06


0.05
















GX19


0.06

0.16


0.09


0.07
















GX18


0.07

0.17


0.07


0.06
















gX17


0.04

0.06


0.02


0.04







SEQ ID NO: 172
a
G
C
A
C
T
G
C
a
G
C
T
G
G
g
a
G
T
G
G
A
G
G





HEK4_11
C-
Untreated


0.16

0.05


0.15


0.08















>other
ggX20


0.11

0.17


0.10


0.08















bases
gX20


0.15

0.35


0.16


0.08
















GX19


0.19

1.78


0.27


0.11
















GX18


0.13

0.33


0.12


0.08
















gX17


0.14

0.07


0.10


0.09







SEQ ID NO: 173
G
G
C
A
C
T
G
a
G
G
g
T
G
G
A
G
G
T
G
G
G
G
G





HEK4_12
C-
Untreated


0.07

0.04





















>other
ggX20


0.27

1.09





















bases
gX20


0.30

1.94






















GX19


0.07

1.09






















GX18


0.07

0.04






















gX17


0.10

0.03







SEQ ID NO: 174
G
G
C
A
C
T
G
g
G
G
C
T
G
G
A
G
a
c
G
G
G
G
G





HEK4_13
C-
Untreated


0.12

0.13





0.12






0.21








>other
ggX20


0.10

0.15





0.10






0.14








bases
gX20


0.12

0.15





0.12






0.20









GX19


0.12

0.19





0.11






0.19









GX18


0.12

0.14





0.13






0.19









gX17


0.12

0.13





0.10






0.18







SEQ ID NO: 175
a
G
g
A
C
T
G
C
G
G
C
T
G
G
g
G
G
T
G
G
T
G
G





HEK4_14
C-
Untreated




0.05


0.29


0.03















>other
ggX20




1.37


0.31


0.04















bases
gX20




1.03


0.44


0.05
















GX19




4.70


0.38


0.06
















GX18




1.67


0.29


0.04
















gX17




6.06


0.88


0.07







SEQ ID NO: 176
G
G
C
A
C
T
G
C
a
a
C
T
G
G
A
a
G
T
G
a
T
G
G





HEK4_15
C-
Untreated


0.11

0.06


0.11


0.02















>other
ggX20


0.10

0.10


0.08


0.02















bases
gX20


0.08

0.16


0.08


0.03
















GX19


0.10

0.32


0.09


0.02
















GX18


0.08

0.06


0.06


0.01
















gX17


0.10

0.04


0.09


0.02







SEQ ID NO: 177
G
G
C
A
C
T
G
g
G
G
t
T
G
G
A
G
G
T
G
G
G
G
G





HEK4_16
C-
Untreated


0.16

0.18





















>other
ggX20


0.69

2.90





















bases
gX20


0.87

3.94






















GX19


0.29

3.17






















GX18


0.18

0.21






















gX17


0.15

0.15







SEQ ID NO: 178
G
c
C
A
C
T
G
C
a
G
C
T
a
G
A
G
G
T
G
G
A
G
G





HEK4_17
C-
Untreated

0.11
0.05

0.05


0.16


0.04















>other
ggX20

0.11
0.10

0.69


0.17


0.04















bases
gX20

0.11
0.16

1.46


0.17


0.04
















GX19

0.11
0.29

3.27


0.28


0.04
















GX18

0.13
0.14

0.69


0.15


0.04
















gX17

0.12
0.12

0.23


0.18


0.03







SEQ ID NO: 179
G
c
C
A
C
T
G
C
G
a
C
T
G
G
A
G
G
a
G
G
G
G
G





HEK4_18
C-
Untreated

0.16
0.06

0.06


61.49


0.05















>other
ggX20

0.12
0.06

0.06


60.75


0.04















bases
gX20

0.10
0.07

0.06


60.11


0.05
















GX19

0.12
0.08

0.11


61.02


0.05
















GX18

0.14
0.08

0.08


60.97


0.03
















gX17

0.12
0.07

0.08


60.12


0.05







SEQ ID NO: 180
G
G
C
A
C
T
G

G
G
C
T
G
G
A
G
G
c
G
G
G
G
G


HEK4_19
C-
Untreated


0.03

0.06





0.05






0.08








>other
ggX20


0.04

0.11





0.08






0.08








bases
gX20


0.04

0.10





0.05






0.11









GX19


0.05

0.05





0.09






0.08









GX18


0.03

0.05





0.07






0.09









gX17


0.01

0.03





0.02






0.06







SEQ ID NO: 181
a
G
C
t
C
T
G
C
G
G
C
a
G
G
A
G
t
T
G
G
A
G
G





HEK4_20
C-
Untreated


0.22

0.03


0.22


0.10















>other
ggX20


0.25

0.02


0.20


0.10















bases
gX20


0.23

0.02


0.21


0.10
















GX19


0.22

0.02


0.20


0.09
















GX18


0.23

0.02


0.16


0.09
















gX17


0.25

0.02


0.23


0.10










FIG. 16a schematically shows a conventional sgRNA (gX19 sgRNA), truncated sgRNAs (gX18 or gX17 sgRNA), and extended sgRNAs (gX20 or ggX20 sgRNA). FIG. 16b shows base-editing frequencies at the HBB on- and off-target sites in HEK293T cells measured by targeted deep sequencing. Specificity ratios were calculated by dividing the base-editing frequency at the on-target site by that at each off-target site. The heatmap represents the specificities of the modified sgRNAs relative to that of the conventional sgRNA.
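The specificity-ratio calculation described for FIG. 16b can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the frequencies are made-up placeholder values rather than measurements from the table, and the normalization to the conventional sgRNA is one plausible reading of "relative specificity".

```python
def specificity_ratio(on_target_freq, off_target_freq):
    """Return on-target editing frequency divided by off-target frequency (both in %)."""
    if off_target_freq <= 0:
        raise ValueError("off-target frequency must be positive")
    return on_target_freq / off_target_freq


def relative_specificity(freqs, reference="GX19"):
    """Normalize each sgRNA's specificity ratio to that of the reference sgRNA.

    freqs maps an sgRNA label to a tuple (on_target_freq, off_target_freq).
    A value above 1 means the sgRNA is more specific than the reference.
    """
    ref_ratio = specificity_ratio(*freqs[reference])
    return {
        name: specificity_ratio(on, off) / ref_ratio
        for name, (on, off) in freqs.items()
    }


# Placeholder frequencies (percent), NOT taken from the table.
example = {
    "GX19": (4.0, 0.8),   # conventional sgRNA
    "gX20": (3.0, 0.25),  # extended sgRNA
    "gX17": (3.0, 1.5),   # truncated sgRNA
}
rel = relative_specificity(example)
```

In this toy data set the extended sgRNA comes out more specific than the conventional one and the truncated sgRNA less specific, which is the kind of contrast a heatmap of these values would display.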



FIG. 17 shows the result of reducing BE3 off-target effects using modified sgRNAs, wherein 17a shows a schematic view of conventional sgRNAs (GX19 sgRNA) and modified sgRNAs (GX17 sgRNA, gX18 sgRNA, gX20 sgRNA, and ggX20 sgRNA), and 17b shows base editing efficiencies (frequencies) measured at the EMX1 on- and off-target sites by targeted deep sequencing in HEK293T cells.
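The sgRNA naming used above (GX19, gX18, GX17, gX20, ggX20) can be illustrated with a short sketch. It assumes, as an interpretation of the labels rather than a statement from the source, that "X" followed by a number is the count of 3′-proximal target-matching nucleotides, that each leading "g" is an added 5′ guanine, and that the guanine is written in uppercase when it happens to match the target at that position. The function name is hypothetical, and sequences are written in the DNA alphabet of the target.

```python
def sgrna_variant(protospacer, n_target, n_extra=1):
    """Return (label, guide) for a guide with n_extra 5' guanines plus the
    last n_target nucleotides of a protospacer given 5'->3' without the PAM."""
    protospacer = protospacer.upper()
    if not 1 <= n_target <= len(protospacer):
        raise ValueError("n_target must be between 1 and the protospacer length")
    core = protospacer[-n_target:]
    prefix = ""
    # Walk the added guanines from the 5'-most to the one adjacent to the core.
    for k in range(n_extra, 0, -1):
        idx = len(protospacer) - n_target - k
        # Uppercase only if the added G matches a known target base at that position.
        matches = idx >= 0 and protospacer[idx] == "G"
        prefix += "G" if matches else "g"
    return f"{prefix}X{n_target}", prefix + core


emx1 = "GAGTCCGAGCAGAAGAAGAA"  # EMX1 on-target protospacer (20 nt)
hbb = "CTTGCCCCACAGGGCAGTAA"   # HBB on-target protospacer (20 nt)

variants = [sgrna_variant(emx1, n, g) for n, g in [(20, 2), (20, 1), (19, 1), (18, 1), (17, 1)]]
```

With the EMX1 protospacer this yields the labels ggX20, gX20, GX19, gX18, and GX17, and with the HBB protospacer (which starts with C) the 19-nt form is labeled gX19, consistent with the row labels in the tables.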


As shown in FIGS. 16a, 16b, 17a, and 17b, truncated sgRNAs reduced off-target effects at many sites but exacerbated them at sites with mismatches at the 5′ terminus (shown by asterisks in FIGS. 16b and 17b). Extended sgRNAs reduced off-target effects at almost every site without sacrificing on-target effects. Interestingly, some extended sgRNAs were more active at on-target sites than conventional sgRNAs (Table 17). Use of attenuated Cas9 variants or delivery of BE3 RNPs rather than plasmids may further improve the genome-wide specificity of base editing.


In summary, the results obtained using mismatched sgRNAs, Digenome-seq, and targeted deep sequencing showed that BE3 deaminases were highly specific, catalyzing C-to-U conversions in vitro and base editing in human cells at a limited number of sites in the human genome. It was also found that BE3 and Cas9 off-target sites did not always coincide, justifying independent assessment of each tool. It is expected that the above results and methods will accelerate broad use of RNA-guided programmable deaminases in research and medicine.


Example 7. BE1 (rAPOBEC1-dCas9)-Mediated Double Strand Breaks (DSBs)

A PCR amplicon containing a target sequence (EMX1 on-target sequence; SEQ ID NO: 31) was incubated with BE1 (rAPOBEC1-dCas9; Example 2) and its sgRNA (sgRNA targeting SEQ ID NO: 31) in vitro to induce cytidine-to-uracil conversions. The uracil produced by rAPOBEC1 was removed with USER (Uracil-Specific Excision Reagent) Enzyme (New England Biolabs). The amplicon was then treated with S1 nuclease (Catalog #M5761; Promega), which cleaves phosphodiester bonds in single-stranded DNA, producing a DSB at the cytosine-deaminated site (FIG. 22(a)).


The treated PCR amplicon was subjected to electrophoresis, which confirmed that it was cleaved by treatment with BE1/sgRNA, USER, and S1 nuclease (FIG. 22(b)).
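The band pattern expected from the three-step treatment in this example can be estimated with a toy model. The sketch below is illustrative only: the function name and the flanking sequences are hypothetical, the deamination window of protospacer positions 4 to 8 is an assumption not stated in this example, and the model simplifies the chemistry to "each deaminated cytosine is removed as a one-nucleotide gap, and S1 nuclease cuts the opposite strand at that gap".

```python
def expected_fragments(amplicon, protospacer, window=(4, 8)):
    """Return (left_len, right_len) fragment sizes for each cytosine in the
    assumed deamination window, modelling the DSB as loss of that one base."""
    start = amplicon.find(protospacer)
    if start < 0:
        raise ValueError("protospacer not found in amplicon")
    first, last = window  # 1-based positions within the protospacer
    fragments = []
    for pos in range(first, last + 1):
        if protospacer[pos - 1] == "C":
            cut = start + pos - 1  # 0-based index of the deaminated cytosine
            fragments.append((cut, len(amplicon) - cut - 1))
    return fragments


# Hypothetical 100-bp amplicon: placeholder flanks around the EMX1 protospacer.
protospacer = "GAGTCCGAGCAGAAGAAGAA"
amplicon = "A" * 30 + protospacer + "T" * 50
fragments = expected_fragments(amplicon, protospacer)
```

For this placeholder amplicon the two cytosines at protospacer positions 5 and 6 give fragment pairs of roughly 34 + 65 and 35 + 64 nucleotides, which would run as two bands of distinct size on a gel if the amplicon is cleaved as described.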


From the above description, it will be understood by those skilled in the art that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. In this regard, it should be understood that the above-described embodiments are to be considered in all respects as illustrative and not restrictive. The scope of the present invention is defined by the appended claims, and all changes or modifications derived from the meaning and scope of the claims should be construed as being included in the scope of the present invention.

Claims
  • 1. A method of analyzing nucleic acid sequence of DNA in which a base editing is introduced by cytosine deaminase, comprising: (i) introducing or contacting (a) a cytosine deaminase and an inactivated target-specific endonuclease, or (b) a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, or (c) a plasmid comprising a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, into a cell or with DNA isolated from a cell, together with a guide RNA; (ii) treating the DNA with a uracil-specific excision reagent (USER) and generating double strand cleavage in DNA; and (iii) analyzing nucleic acid sequence of the cleaved DNA fragment, wherein the DNA isolated from a cell in step (i) is a genomic DNA, and the nucleic acid sequence analysis of step (iii) is performed by whole genome sequencing, wherein the uracil-specific excision reagent (USER) comprises uracil DNA glycosylase (UDG) and endonuclease VIII, and wherein the inactivated target-specific endonuclease is a Cas9 protein derived from Streptococcus pyogenes wherein amino acid residue D10 is substituted with alanine.
  • 2. The method of claim 1, wherein the cytosine deaminase and inactivated target-specific endonuclease are in a form of a fusion protein, or the cytosine deaminase coding gene and inactivated target-specific endonuclease coding gene encode a fusion protein comprising the cytosine deaminase and inactivated target-specific endonuclease.
  • 3. The method of claim 1, wherein amino acid residue H840 of the inactive target-specific endonuclease is substituted with alanine, and wherein generating double strand cleavage in DNA comprises treating the DNA with an endonuclease specifically cleaving a single strand region of DNA.
  • 4. The method of claim 1, wherein the guide RNA is a crRNA:tracrRNA duplex in which crRNA and tracrRNA are coupled to each other, or a single-strand guide RNA (sgRNA).
  • 5. The method of claim 1, which is performed in vitro.
  • 6. A method of identifying a base editing site of cytosine deaminase, comprising: (i) introducing or contacting (a) a cytosine deaminase and an inactivated target-specific endonuclease, or (b) a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, or (c) a plasmid comprising a cytosine deaminase coding gene and an inactivated target-specific endonuclease coding gene, into a cell or with DNA isolated from a cell, together with a guide RNA; (ii) treating the DNA with a uracil-specific excision reagent (USER) and generating double strand cleavage in DNA; (iii) analyzing nucleic acid sequence of the cleaved DNA fragment; and (iv) identifying the double strand cleavage site in the nucleic acid sequence read obtained by the analysis, wherein the DNA isolated from a cell in step (i) is a genomic DNA, and the nucleic acid sequence analysis of step (iii) is performed by whole genome sequencing, wherein the uracil-specific excision reagent (USER) comprises uracil DNA glycosylase (UDG) and endonuclease VIII, and wherein the inactivated target-specific endonuclease is a Cas9 protein derived from Streptococcus pyogenes wherein amino acid residue D10 is substituted with alanine.
  • 7. The method of claim 6, wherein the cytosine deaminase and inactivated target-specific endonuclease are in a form of a fusion protein, or the cytosine deaminase coding gene and inactivated target-specific endonuclease coding gene encode a fusion protein comprising the cytosine deaminase and inactivated target-specific endonuclease.
  • 8. The method of claim 6, wherein amino acid residue H840 of the inactive target-specific endonuclease is substituted with alanine, and wherein generating double strand cleavage in DNA comprises treating the DNA with an endonuclease specifically cleaving a single strand region of DNA.
  • 9. The method of claim 6, wherein the guide RNA is a crRNA:tracrRNA duplex in which crRNA and tracrRNA are coupled to each other, or a single-strand guide RNA (sgRNA).
  • 10. The method of claim 6, which is performed in vitro.
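
Steps (iii) and (iv) of claim 6 involve locating double strand cleavage sites in sequence reads. The following minimal sketch shows one simple way such a site might be flagged: a position at which many forward-strand reads begin and many reverse-strand reads end. The read tuple format `(start, end, strand)`, the function name `find_cleavage_sites`, and the `min_reads` threshold are hypothetical illustrations, not the scoring scheme of this disclosure, and the sketch assumes blunt breaks with 0-based, end-exclusive coordinates.

```python
from collections import Counter


def find_cleavage_sites(reads, min_reads=3):
    """Return positions where read ends pile up on both strands.

    reads: iterable of (start, end, strand) alignments, 0-based, end-exclusive.
    A cut between pos-1 and pos makes forward reads start at pos and
    reverse reads end at pos.
    """
    fwd_starts = Counter()
    rev_ends = Counter()
    for start, end, strand in reads:
        if strand == "+":
            fwd_starts[start] += 1
        else:
            rev_ends[end] += 1
    return sorted(
        pos
        for pos in fwd_starts
        if fwd_starts[pos] >= min_reads and rev_ends[pos] >= min_reads
    )


# Hypothetical alignments: three reads on each strand meet at position 100,
# plus one unrelated read.
example_reads = (
    [(100, 150, "+")] * 3
    + [(50, 100, "-")] * 3
    + [(10, 60, "+")]
)
print(find_cleavage_sites(example_reads))
```

Randomly sheared genomic DNA gives staggered read ends, so a pile-up of identical ends on both strands is what distinguishes an enzymatic cleavage site in this simplified model. Breaks that leave a one-nucleotide gap would shift the two pile-ups by one position, which this sketch does not handle.
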
PCT Information
Filing Document Filing Date Country Kind
PCT/KR2017/010056 9/13/2017 WO
Publishing Document Publishing Date Country Kind
WO2018/052247 3/22/2018 WO A
US Referenced Citations (4)
Number Name Date Kind
20150166980 Liu et al. Jun 2015 A1
20150226671 Huang Aug 2015 A1
20160304846 Liu Oct 2016 A1
20170121693 Liu May 2017 A1
Foreign Referenced Citations (3)
Number Date Country
2015-105928 Jul 2015 WO
2015-138620 Sep 2015 WO
2016-022363 Feb 2016 WO
Non-Patent Literature Citations (26)
Entry
Kim et al., Digenome-seq: genome-wide profiling of CRISPR-Cas9 off-target effects in human cells. Nature Methods (2015) 12(3): 237-243 and Supplemental material (Year: 2015).
Mung Bean Nuclease, NEB, www.neb.com/products/m0250-mung-bean-nuclease (archived from Sep. 18, 2015) [retrieved Sep. 16, 2022] (Year: 2015).
Chaudhry and Weinfeld, Induction of double-strand breaks by S1 nuclease, mung bean nuclease and nuclease P1 in DNA containing abasic sites and nicks. Nucleic Acids Research (1995), 23(19): 3805-3809 (Year: 1995).
Krokan et al., Base Excision Repair. Cold Spring Harbor Perspectives in Biology (2013), 5: a012583 (Year: 2013).
Alberts et al., Molecular Biology of the Cell, 5th Ed. (2008), Chapter 4: DNA, Chromosomes and Genomes (Year: 2008).
Briggs et al., Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Research (2010), 38: e87, 1-12 (Year: 2010).
Daesik Kim et al, “Genome-wide target specificities of CRISPR RNA-guided programmable deaminases”, Nature Biotechnology, vol. 35, No. 5, Apr. 10, 2017, pp. 475-480, XP055383071, ISSN 1087-0156. doi:10.1038/nbt.3852.
EPO, Supplementary European Search Report of EP 17851121.8 dated Mar. 30, 2020.
“User Enzyme”, New England BioLabs, Inc., https://www.neb.com/products/m5505-user-enzyme.
“The new editor-targeted genome engineering in the absence of homology-directed repair”, Cell Death Discovery, vol. 2, pp. 1-2, Jun. 13, 2016.
Alexis C. Komor et al., “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage”, Nature, vol. 533, pp. 420-424 and Methods, May 19, 2016.
Keiji Nishida et al., “Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems”, Science, vol. 353, Issue 6305, Sep. 16, 2016.
Yunqing Ma et al., “Targeted AID-mediated mutagenesis (TAM) enables efficient genomic diversification in mammalian cells”, Nature Methods, vol. 13, No. 12, Dec. 2016.
Gaelen T Hess et al., “Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells”, Nature Methods, vol. 13, No. 12, pp. 1036-1042, Dec. 2016.
Luhan Yang et al., “Engineering and optimising deaminase fusions for genome editing”, Nature communications, vol. 7, 2016.
Hyongbum Kim et al., “A guide to genome engineering with programmable nucleases”, Nature Reviews, vol. 15, pp. 321-334, May 2014.
Shengdar Q Tsai et al., “GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases”, Nature biotechnology, vol. 33, No. 2, pp. 189-197, Feb. 2015.
Richard L Frock et al., “Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases”, Nature biotechnology, vol. 33, No. 2, Feb. 2015.
F. Ann Ran et al., “In vivo genome editing using Staphylococcus aureus Cas9”, Nature, vol. 520, pp. 186-191, Apr. 2015.
Xiaoling Wang et al., “Unbiased detection of off-target cleavage by CRISPR-Cas9 and TALENs using integrase-defective lentiviral vectors”, Nature biotechnology, vol. 33, No. 2, Feb. 2015.
Daesik Kim et al., “Genome-wide target specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq”, Genome research, vol. 26, pp. 406-415, 2016.
Daesik Kim et al., “Genome-wide analysis reveals specificities of Cpf1 endonucleases in human cells”, Nature biotechnology, vol. 34, pp. 863-868, 2016.
Daesik Kim et al., “Digenome-seq: genome-wide profiling of CRISPR-Cas9 off-target effects in human cells”, Nature methods, vol. 12, No. 3, pp. 237-243, 231 p following 243, Mar. 2015.
Sangsu Bae et al., “Cas-OFFinder: A fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases”, Bioinformatics, vol. 30, No. 10, 2014.
Luhan Yang et al., “Genome Editing With Targeted Deaminases”, bioRxiv, 2016.
Stefanie V Lensing et al., “DSBCapture: in situ capture and sequencing of DNA breaks”, Nature methods, vol. 13, No. 10, Oct. 2016.
Related Publications (1)
Number Date Country
20200131536 A1 Apr 2020 US
Provisional Applications (2)
Number Date Country
62445310 Jan 2017 US
62393682 Sep 2016 US