SYSTEM AND METHOD FOR GENOME EDITING

Abstract
The invention relates to the field of genetic engineering. In particular, the present invention relates to a novel eukaryotic genome editing system and method. More specifically, the present invention relates to a CRISPR-Cpf1 system capable of efficiently editing a genome of a eukaryotic cell and the use thereof.
Description
TECHNICAL FIELD

The invention relates to the field of genetic engineering. In particular, the present invention relates to a novel eukaryotic genome editing system and method. More specifically, the present invention relates to a CRISPR-Cpf1 system capable of efficiently editing a genome of a eukaryotic cell and the use thereof.


BACKGROUND

The CRISPR (clustered regularly interspaced short palindromic repeats) system is an immune system that arose during the evolution of bacteria to protect against invasion by foreign genes. Among such systems, the type II CRISPR-Cas9 system cleaves DNA by means of a Cas9 protein guided by two small RNAs (crRNA and tracrRNA) or by one artificially synthesized small RNA (sgRNA), and is the simplest of the three first-discovered CRISPR system types (Type I, II and III). Owing to its ease of operation, the system was successfully engineered in 2013 to achieve eukaryotic genome editing. The CRISPR/Cas9 system quickly became the most popular technology in the life sciences.


In 2015, Zhang et al. discovered a new gene editing system through sequence alignment and systematic analysis, the CRISPR-Cpf1 system, which is different from the CRISPR-Cas9 system. The system requires only one small RNA (crRNA) to mediate genome editing. There are thousands of CRISPR systems in nature, but only a few can successfully implement eukaryotic genome editing.


There is still a need in the art for a CRISPR-Cpf1 system that enables efficient eukaryotic genome editing.


SUMMARY OF THE INVENTION

In one aspect, the present invention provides a genome editing system for site-directed modification of a target sequence in the genome of a cell, comprising at least one of the following i) to v):


i) a Cpf1 protein, and a guide RNA;


ii) an expression construct comprising a nucleotide sequence encoding a Cpf1 protein, and a guide RNA;


iii) a Cpf1 protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;


iv) an expression construct comprising a nucleotide sequence encoding a Cpf1 protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;


v) an expression construct comprising a nucleotide sequence encoding a Cpf1 protein and a nucleotide sequence encoding a guide RNA;


wherein the Cpf1 protein comprises an amino acid sequence of any one of SEQ ID NOs: 1-12 or an amino acid sequence having at least 80% sequence identity to one of SEQ ID NOs: 1-12, and the guide RNA is capable of targeting the Cpf1 protein to a target sequence in the genome of the cell.


In a second aspect, the present invention provides a method of modifying a target sequence in the genome of a cell, comprising introducing the genome editing system of the invention into the cell, whereby the guide RNA targets the Cpf1 protein to the target sequence in the genome of the cell, resulting in substitution, deletion and/or addition of one or more nucleotides in the target sequence.


In a third aspect, the invention provides a method of treating a disease in a subject in need thereof, comprising delivering to the subject an effective amount of the genome editing system of the invention to modify a gene related to the disease.


In a fourth aspect, the invention provides the use of the genome editing system of the invention for the preparation of a pharmaceutical composition for treating a disease in a subject in need thereof, wherein the genome editing system is for modifying a gene related to the disease.


In a fifth aspect, the invention provides a pharmaceutical composition for treating a disease in a subject in need thereof, comprising the genome editing system of the invention and a pharmaceutically acceptable carrier, wherein the genome editing system is for modifying a gene related to the disease.


In a sixth aspect, the invention provides a crRNA comprising a crRNA scaffold sequence corresponding to any one of SEQ ID NOs: 25-33, or a crRNA whose coding sequence comprises a sequence set forth in any one of SEQ ID NOs: 25-33.





DESCRIPTION OF THE DRAWINGS


FIG. 1. Alignment of direct repeat sequences (DR) from 25 new Cpf1 family proteins. The stem duplexes of the 21 mature crRNAs with DR sequences are extremely conserved.



FIG. 2. Immunofluorescence showing that the Cpf1 proteins were localized in the nucleus.



FIG. 3. shows the ability of the Cpf1 proteins to edit human Dnmt1 site 3. A: T7EI assay shows that 8 of the 9 selected Cpf1 proteins were able to achieve insertion/deletion (indel) at human Dnmt1 site 3 in 293T cells; B: sequencing results show representative indel in human Dnmt1 site 3.



FIG. 4. shows the ability of the Cpf1 proteins to edit human Dnmt1 site 1. A: The T7EI assay shows that four Cpf1 proteins were clearly targeted to human Dnmt1 site 1. B: Sequencing results show representative indels in human Dnmt1 site 1, wherein 6 Cpf1 proteins were capable of producing indels, although PsCpf1 and SaCpf1 had no significant bands in the enzyme digestion assay.



FIG. 5. shows the ability of the Cpf1 proteins to edit human Dnmt1 site 2. A: The T7EI assay shows that four Cpf1 proteins were clearly targeted to human Dnmt1 site 2. B: Sequencing results show representative indels in human Dnmt1 site 2, wherein 5 Cpf1 were capable of producing indels, although PsCpf1 had no significant bands in the enzyme digestion assay.



FIG. 6. In vitro assay of PAM sequences of BsCpf1. A: E. coli expression and purification of BsCpf1 protein. B-E: Targeting BsCpf1 using 5′-TTN PAM.



FIG. 7. In vitro assay of PAM sequences of each Cpf1. “+” indicates in vitro digestion activity.



FIG. 8. shows the ability of HkCpf1 to edit 8 sites of the human AAVS1 gene. A: The results of T7EI assay. B: Sequencing results show that HkCpf1 was able to generate mutations at seven sites, and the PAM sequence is 5′-YYN, which is consistent with the results in FIG. 7.



FIGS. 9, 10 and 11 show that the four Cpf1 proteins edited the mouse genome in EpiSC cells.



FIG. 12. Further validation of the editing ability of ArCpf1, BsCpf1, HkCpf1, LpCpf1, PrCpf1 and PxCpf1 for different targets in mice.



FIGS. 13-14 illustrate the optimization of the crRNA scaffold sequence.



FIG. 15. shows the various vector maps as used.





DETAILED DESCRIPTION OF THE INVENTION
1. Definition

In the present invention, the scientific and technical terms used herein have the meaning as commonly understood by a person skilled in the art unless otherwise specified. Also, the protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology, immunology related terms, and laboratory procedures used herein are terms and routine steps that are widely used in the corresponding field. For example, standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are more fully described in the following document: Sambrook, J., Fritsch, E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter referred to as “Sambrook”). In the meantime, in order to better understand the present invention, definitions and explanations of related terms are provided below.


“Cpf1 nuclease”, “Cpf1 protein” and “Cpf1” are used interchangeably herein and refer to an RNA-directed nuclease comprising a Cpf1 protein or a fragment thereof. Cpf1 is a component of the CRISPR-Cpf1 genome editing system that targets and cleaves DNA target sequences to form DNA double-strand breaks (DSBs) under the guidance of a guide RNA (crRNA). DSB can activate the intrinsic repair mechanism in the living cell, non-homologous end joining (NHEJ) and homologous recombination (HR), to repair DNA damage in the cell, during which site-directed editing is achieved to the specific DNA sequence.


“guide RNA” and “gRNA” can be used interchangeably herein. The guide RNA of the CRISPR-Cpf1 genome editing system is typically composed only of crRNA molecules, wherein the crRNA comprises a sequence that is sufficiently identical to the target sequence to hybridize to the complement of the target sequence and direct the complex (Cpf1+crRNA) to sequence-specifically bind to the target sequence.


“Genome” as used herein encompasses not only chromosomal DNA present in the nucleus, but also organellar DNA present in the subcellular components (e.g., mitochondria, plastids) of the cell.


As used herein, “organism” includes any organism that is suitable for genome editing, with eukaryotes being preferred. Examples of organisms include, but are not limited to, mammals such as human, mouse, rat, monkey, dog, pig, sheep, cattle, cat; poultry such as chicken, duck, goose; plants including monocots and dicots such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis and the like.


A “genetically modified organism” or “genetically modified cell” includes an organism or a cell which comprises within its genome an exogenous polynucleotide or a modified gene or expression regulatory sequence. For example, the exogenous polynucleotide is stably integrated within the genome of the organism or the cell such that the polynucleotide is passed on to successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. The modified gene or expression regulatory sequence means that, in the organism genome or the cell genome, said sequence comprises one or more nucleotide substitution, deletion, or addition.


“Exogenous” in reference to a sequence means a sequence from a foreign species, or refers to a sequence in which significant changes in composition and/or locus occur from its native form through deliberate human intervention if from the same species.


“Polynucleotide”, “nucleic acid sequence”, “nucleotide sequence” or “nucleic acid fragment” are used interchangeably and are single-stranded or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single letter names as follows: “A” is adenosine or deoxyadenosine (corresponding to RNA or DNA, respectively), “C” means cytidine or deoxycytidine, “G” means guanosine or deoxyguanosine, “U” represents uridine, “T” means deoxythymidine, “R” means purine (A or G), “Y” means pyrimidine (C or T), “K” means G or T, “H” means A or C or T, “I” means inosine, and “N” means any nucleotide.
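By way of illustration only, the single-letter codes defined above can be captured in a small lookup table. The following Python sketch is not part of the claimed subject matter; the names `IUPAC_DNA` and `matches` are hypothetical, and only the DNA codes listed in this definition are included (the RNA base U and inosine I are omitted for brevity).

```python
# Illustrative lookup of the degenerate DNA codes defined in this section.
# Each code maps to the set of concrete bases it may stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if `sequence` is one concrete instance of the degenerate `pattern`."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC_DNA[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )
```

For instance, under these assumptions `matches("YYN", "CTG")` is true, whereas `matches("YYN", "ATG")` is false because A is not a pyrimidine.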


“Polypeptide,” “peptide,” and “protein” are used interchangeably in the present invention to refer to a polymer of amino acid residues. The terms apply to an amino acid polymer in which one or more amino acid residues are artificial chemical analogues of the corresponding naturally occurring amino acid(s), as well as to a naturally occurring amino acid polymer. The terms “polypeptide,” “peptide,” “amino acid sequence,” and “protein” may also include modified forms including, but not limited to, glycosylation, lipid ligation, sulfation, γ-carboxylation of glutamic acid residues, and ADP-ribosylation.


Sequence “identity” has a recognized meaning in the art, and the percentage of sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using published techniques. Sequence identity can be measured along the entire length of a polynucleotide or polypeptide or along a region of the molecule. (See, for example, Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991). Although there are many methods for measuring the identity between two polynucleotides or polypeptides, the term “identity” is well known to the skilled person (Carrillo, H. & Lipman, D., SIAM J Applied Math 48: 1073 (1988)).
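As a non-limiting illustration of how a percentage of sequence identity may be computed, the following Python sketch assumes that the two sequences have already been aligned to equal length, with gaps written as "-", and counts identical columns over the full alignment length. This is only one of several conventions in use; the function names `percent_identity` and `meets_threshold` are hypothetical and do not correspond to any particular published tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumption: both strings have equal length and use '-' for gaps.
    A column counts as identical only if both residues match and are not gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """Check an identity threshold such as the 'at least 80%' level used herein."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under this convention, a five-residue alignment with one mismatch or one gap yields 80% identity. Measuring identity along a region rather than the entire length would simply pass the corresponding slice of the alignment.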


Suitable conserved amino acid replacements in peptides or proteins are known to those skilled in the art and can generally be carried out without altering the biological activity of the resulting molecule. In general, one skilled in the art recognizes that a single amino acid replacement in a non-essential region of a polypeptide does not substantially alter biological activity (See, for example, Watson et al., Molecular Biology of the Gene, 4th Edition, 1987, The Benjamin/Cummings Pub. co., p. 224).


As used in the present invention, “expression construct” refers to a vector such as a recombinant vector that is suitable for expression of a nucleotide sequence of interest in an organism. “Expression” refers to the production of a functional product. For example, expression of a nucleotide sequence may refer to the transcription of a nucleotide sequence (e.g., transcription to produce mRNA or functional RNA) and/or the translation of an RNA into a precursor or mature protein.


The “expression construct” of the present invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector or, in some embodiments, an RNA that is capable of translation (such as mRNA).


The “expression construct” of the present invention may comprise regulatory sequences and nucleotide sequences of interest from different origins, or regulatory sequences and nucleotide sequences of interest from the same source but arranged in a manner different from that normally occurring in nature.


“Regulatory sequence” and “regulatory element” are used interchangeably to refer to a nucleotide sequence that is located upstream (5′ non-coding sequence), middle or downstream (3′ non-coding sequence) of a coding sequence and affects the transcription, RNA processing or stability or translation of the relevant coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leaders, introns and polyadenylation recognition sequences.


“Promoter” refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the present invention, the promoter is a promoter capable of controlling the transcription of a gene in a cell, whether or not it is derived from the cell. The promoter may be a constitutive promoter or tissue-specific promoter or developmentally-regulated promoter or inducible promoter.


“Constitutive promoter” refers to a promoter that in general causes a gene to be expressed in most cell types under most conditions. “Tissue-specific promoter” and “tissue-preferred promoter” are used interchangeably and refer to a promoter that drives expression primarily, but not necessarily exclusively, in one tissue or organ, or in a specific cell or cell type. “Developmentally-regulated promoter” refers to a promoter whose activity is dictated by developmental events. “Inducible promoter” refers to a promoter that selectively expresses operably linked DNA sequences in response to an endogenous or exogenous stimulus (environment, hormones, chemical signals, etc.).


As used herein, the term “operably linked” refers to the linkage of a regulatory element (e.g., but not limited to, a promoter sequence, a transcription termination sequence, etc.) to a nucleic acid sequence (e.g., a coding sequence or an open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking regulatory element regions to nucleic acid molecules are known in the art.


“Introduction” of a nucleic acid molecule (e.g., plasmid, linear nucleic acid fragment, RNA, etc.) or protein into an organism means that the nucleic acid or protein is used to transform an organism cell such that the nucleic acid or protein is capable of functioning in the cell. As used in the present invention, “transformation” includes both stable and transient transformations.


“Stable transformation” refers to the introduction of exogenous nucleotide sequences into the genome, resulting in the stable inheritance of foreign genes. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any of its successive generations.


“Transient transformation” refers to the introduction of a nucleic acid molecule or protein into a cell, performing a function without the stable inheritance of an exogenous gene. In transient transformation, the exogenous nucleic acid sequences are not integrated into the genome.


2. Efficient Genome Editing System

In one aspect, the invention provides the use, in eukaryotic genome editing, of a Cpf1 protein comprising an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or even 100% sequence identity to one of SEQ ID NOs: 1-12.


In another aspect, the present invention provides a genome editing system for site-directed modification of a target sequence in the genome of a cell, comprising at least one of the following i) to v):


i) a Cpf1 protein, and a guide RNA;


ii) an expression construct comprising a nucleotide sequence encoding a Cpf1 protein, and a guide RNA;


iii) a Cpf1 protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;


iv) an expression construct comprising a nucleotide sequence encoding a Cpf1 protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;


v) an expression construct comprising a nucleotide sequence encoding a Cpf1 protein and a nucleotide sequence encoding a guide RNA;


wherein the Cpf1 protein comprises an amino acid sequence of any one of SEQ ID NOs: 1-12 or an amino acid sequence having at least 80% sequence identity to one of SEQ ID NOs: 1-12, and the guide RNA is capable of targeting the Cpf1 protein to a target sequence in the genome of the cell.


In some embodiments of the methods of the invention, the guide RNA is a crRNA. In some embodiments, the coding sequence of the crRNA comprises a crRNA scaffold sequence set forth in any one of SEQ ID NOs: 25-33. In some preferred embodiments, the crRNA scaffold sequence is SEQ ID NO: 30. In some embodiments, the coding sequence of the crRNA further comprises a sequence that specifically hybridizes to the complement of the target sequence (i.e., a spacer sequence) located 3′ of the crRNA scaffold sequence.


In some embodiments, the crRNA is encoded by a nucleotide sequence selected from the group consisting of:


i) 5′-ATTTCTACtgttGTAGAT(SEQ ID NO: 25)-Nx-3′;


ii) 5′-ATTTCTACtattGTAGAT(SEQ ID NO: 26)-Nx-3′;


iii) 5′-ATTTCTACtactGTAGAT(SEQ ID NO: 27)-Nx-3′;


iv) 5′-ATTTCTACtttgGTAGAT(SEQ ID NO: 28)-Nx-3′;


v) 5′-ATTTCTACtagttGTAGAT(SEQ ID NO: 29)-Nx-3′;


vi) 5′-ATTTCTACTATGGTAGAT(SEQ ID NO: 30)-Nx-3′;


vii) 5′-ATTTCTACTGTCGTAGAT(SEQ ID NO: 31)-Nx-3′;


viii) 5′-ATTTCTACTTGTGTAGAT(SEQ ID NO: 32)-Nx-3′; and


ix) 5′-ATTTCTACTGTGGTAGAT(SEQ ID NO: 33)-Nx-3′,


wherein Nx represents a nucleotide sequence that consists of x consecutive nucleotides, each N being independently selected from A, G, C and T; x is an integer satisfying 18≤x≤35, preferably x=23. In some embodiments, the sequence Nx (spacer sequence) is capable of specifically hybridizing to the complement of the target sequence.
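The arrangement described above, a crRNA scaffold sequence followed at its 3′ side by a spacer Nx, can be sketched in Python as follows. This is a minimal illustration only: the names `CRRNA_SCAFFOLDS` and `build_crrna_coding_sequence` are hypothetical, the default of SEQ ID NO: 30 merely reflects the preferred embodiment stated herein, and the length check reflects the range 18≤x≤35.

```python
# Scaffold sequences as written in this specification, keyed by SEQ ID NO.
CRRNA_SCAFFOLDS = {
    25: "ATTTCTACtgttGTAGAT",
    26: "ATTTCTACtattGTAGAT",
    27: "ATTTCTACtactGTAGAT",
    28: "ATTTCTACtttgGTAGAT",
    29: "ATTTCTACtagttGTAGAT",
    30: "ATTTCTACTATGGTAGAT",
    31: "ATTTCTACTGTCGTAGAT",
    32: "ATTTCTACTTGTGTAGAT",
    33: "ATTTCTACTGTGGTAGAT",
}


def build_crrna_coding_sequence(spacer: str, seq_id_no: int = 30) -> str:
    """Return 5'-scaffold-Nx-3' for a spacer of x nucleotides, 18 <= x <= 35."""
    spacer = spacer.upper()
    if not 18 <= len(spacer) <= 35:
        raise ValueError("spacer length x must satisfy 18 <= x <= 35")
    if set(spacer) - set("ACGT"):
        raise ValueError("spacer may contain only A, C, G and T")
    return CRRNA_SCAFFOLDS[seq_id_no] + spacer
```

For a 23-nucleotide spacer and the 18-nucleotide scaffold of SEQ ID NO: 30, the resulting coding sequence is 41 nucleotides long.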


In some embodiments of the methods of the invention, the cell is a eukaryotic cell, preferably a mammalian cell.


In some embodiments of the methods of the invention, the Cpf1 protein comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or even 100% sequence identity to one of SEQ ID NOs: 1-12. The Cpf1 protein is capable of targeting and/or cleaving target sequences in the genome of the cell through a crRNA.


In some embodiments of the methods of the invention, the Cpf1 protein comprises an amino acid sequence having one or more amino acid residue substitutions, deletions or additions relative to one of SEQ ID NOs: 1-12. For example, the Cpf1 protein comprises an amino acid sequence having 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid residue substitutions, deletions or additions relative to one of SEQ ID NOs: 1-12. In some embodiments, the amino acid substitution is a conservative substitution. The Cpf1 protein is capable of targeting and/or cleaving target sequences in the cell genome through a crRNA.


The Cpf1 protein of the invention may be derived from a species selected from: Agathobacter rectalis, Lachnospira pectinoschiza, Sneathia amnii, Helcococcus kunzii, Arcobacter butzleri, Bacteroidetes oral, Oribacterium sp., Butyrivibrio sp., Proteocatella sphenisci, Candidatus Dojkabacteria, Pseudobutyrivibrio xylanivorans, Pseudobutyrivibrio ruminis.


In some preferred embodiments of the invention, the Cpf1 protein is derived from Agathobacter rectalis (ArCpf1), Butyrivibrio sp. (BsCpf1), Helcococcus kunzii (HkCpf1), Lachnospira pectinoschiza (LpCpf1), Pseudobutyrivibrio ruminis (PrCpf1) or Pseudobutyrivibrio xylanivorans (PxCpf1). In some preferred embodiments of the invention, the Cpf1 protein comprises an amino acid sequence selected from SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 8, SEQ ID NO: 11 and SEQ ID NO: 12.


In some embodiments of the present invention, the Cpf1 protein of the present invention further comprises a nuclear localization sequence (NLS). In general, the one or more NLSs in the Cpf1 protein should have sufficient strength to drive accumulation of the Cpf1 protein in the nucleus of the cell in an amount sufficient to achieve genome editing. In general, the strength of the nuclear localization activity is determined by the number and location of the NLSs, the specific NLS or NLSs used in the Cpf1 protein, or a combination of these factors.


In some embodiments of the present invention, the NLS of the Cpf1 protein of the present invention may be located at the N-terminus and/or C-terminus. In some embodiments, the Cpf1 protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs. In some embodiments, the Cpf1 protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N-terminus. In some embodiments, the Cpf1 protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C-terminus. In some embodiments, the Cpf1 protein comprises a combination of these, such as one or more NLSs at the N-terminus and one or more NLSs at the C-terminus. When there is more than one NLS, each can be selected independently of the others. In some preferred embodiments of the present invention, the Cpf1 protein comprises two NLSs, for example, located at the N-terminus and the C-terminus, respectively.


In general, an NLS consists of one or more short sequences of positively charged lysine or arginine residues exposed on the surface of the protein, but other types of NLS are also known. Non-limiting examples of NLSs include: KKRKV (nucleotide sequence 5′-AAGAAGAGAAAGGTC-3′), PKKKRKV (nucleotide sequence 5′-CCCAAGAAGAAGAGGAAGGTG-3′ or 5′-CCAAAGAAGAAGAGGAAGGTT-3′), or SGGSPKKKRKV (nucleotide sequence 5′-TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG-3′).
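The correspondence between the exemplary NLS peptides and their nucleotide sequences can be checked by translating each coding sequence codon by codon. The Python sketch below is illustrative only: `CODON_TO_AA` is a deliberately partial excerpt of the standard genetic code containing just the codons that occur in these examples, and the names `translate` and `NLS_EXAMPLES` are hypothetical.

```python
# Partial standard genetic code: only the codons occurring in the NLS examples.
CODON_TO_AA = {
    "AAG": "K",
    "AGA": "R", "AGG": "R", "CGG": "R",
    "GTC": "V", "GTG": "V", "GTT": "V",
    "CCC": "P", "CCA": "P",
    "TCG": "S", "AGC": "S",
    "GGG": "G",
}


def translate(dna: str) -> str:
    """Translate a coding sequence read 5'->3' in frame, three bases per residue."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Coding sequences from the specification, each paired with its stated peptide.
NLS_EXAMPLES = {
    "AAGAAGAGAAAGGTC": "KKRKV",
    "CCCAAGAAGAAGAGGAAGGTG": "PKKKRKV",
    "CCAAAGAAGAAGAGGAAGGTT": "PKKKRKV",
    "TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG": "SGGSPKKKRKV",
}
```

Each coding sequence translates to the peptide listed beside it. The two sequences given for PKKKRKV differ only in synonymous codons (CCC/CCA for proline and GTG/GTT for valine).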


Furthermore, depending on the location of the DNA to be edited, the Cpf1 proteins of the present invention may also include other localization sequences, such as cytoplasm localization sequences, chloroplast localization sequences, mitochondrial localization sequences, and the like.


To obtain efficient expression in the target cells, in some embodiments of the present invention, the nucleotide sequence encoding the Cpf1 protein is codon optimized for the organism from which the cell to be genome edited is derived. Codon optimization refers to the replacement of at least one codon (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) of a native sequence by a codon that is used more frequently or most frequently in the gene of the host cell, modifying the nucleic acid sequence while maintaining the native amino acid sequence to enhance expression in the host cell of interest.


Different species show specific preferences for certain codons of a particular amino acid. Codon preference (differences in codon usage between organisms) is often associated with the efficiency of translation of messenger RNA (mRNA), which is believed to depend on the nature of the translated codons and the availability of specific transfer RNA (tRNA) molecules. The predominance of selected tRNAs within a cell generally reflects the codons most frequently used in peptide synthesis. Therefore, genes can be tailored for optimal expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in different ways. See Nakamura, Y. et al., “Codon usage tabulated from the international DNA sequence databases: status for the year 2000”, Nucl. Acids Res. 28: 292 (2000).
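The principle of codon optimization, namely replacing codons with synonymous codons preferred by the host while keeping the encoded amino acid sequence unchanged, can be sketched as follows. This Python example is a simplified illustration under stated assumptions: `CODON_TO_AA` is a partial excerpt of the standard genetic code, and the `PREFERRED` table is a hypothetical placeholder that is not taken from any real codon usage table. In practice the preferred codons would be derived from usage data for the organism of interest.

```python
# Partial standard genetic code (only the codons used in this example).
CODON_TO_AA = {
    "AAA": "K", "AAG": "K",
    "AGA": "R", "AGG": "R", "CGG": "R",
    "GTC": "V", "GTG": "V", "GTT": "V",
    "CCC": "P", "CCA": "P",
}

# HYPOTHETICAL preferred codon per amino acid for an imaginary host.
# Placeholder values for illustration; not real codon usage data.
PREFERRED = {"K": "AAG", "R": "AGG", "V": "GTG", "P": "CCC"}


def codons(dna: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence using the partial table above."""
    return "".join(CODON_TO_AA[c] for c in codons(dna))


def codon_optimize(dna: str, preferred: dict[str, str] = PREFERRED) -> str:
    """Swap every codon for the host-preferred synonymous codon."""
    optimized = "".join(preferred[CODON_TO_AA[c]] for c in codons(dna))
    # The native amino acid sequence must be maintained.
    if translate(optimized) != translate(dna):
        raise ValueError("preferred table is not synonymous with the genetic code")
    return optimized
```

Replacing every codon is the simplest strategy; as noted above, optimization may equally replace only a subset of codons.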


The organism from which the cell to be edited by the method of the present invention is derived is preferably a eukaryote, including but not limited to mammals such as human, mouse, rat, monkey, dog, pig, sheep, cattle, cat; poultry such as chicken, duck, goose; and plants including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis thaliana and so on.


In some embodiments of the invention, the nucleotide sequence encoding the Cpf1 protein is codon optimized for human. In some embodiments, the codon-optimized nucleotide sequence encoding the Cpf1 protein is selected from SEQ ID NOs: 13-24.


In some embodiments of the present invention, the nucleotide sequence encoding the Cpf1 protein and/or the nucleotide sequence encoding the guide RNA is operably linked to an expression regulatory element such as a promoter.


Examples of promoters that can be used in the present invention include, but are not limited to, the polymerase (pol) I, pol II or pol III promoters. Examples of the pol I promoter include the gallus RNA pol I promoter. Examples of the pol II promoters include, but are not limited to, the immediate-early cytomegalovirus (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter, and the immediate-early simian virus 40 (SV40) promoter. Examples of pol III promoters include the U6 and H1 promoters. An inducible promoter such as a metallothionein promoter can be used. Other examples of promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter, and the Sp6 phage promoter, and the like. Promoters that can be used in plants include, but are not limited to, cauliflower mosaic virus 35S promoter, maize Ubi-1 promoter, wheat U6 promoter, rice U3 promoter, maize U3 promoter, rice actin promoter.


In another aspect, the invention provides a crRNA comprising a crRNA scaffold sequence corresponding to any one of SEQ ID NOs: 25-33. In some embodiments, the coding sequence of the crRNA comprises the crRNA scaffold sequence set forth in any one of SEQ ID NOs: 25-33. In some preferred embodiments, the crRNA scaffold sequence is SEQ ID NO: 30. In some embodiments, the coding sequence of the crRNA further comprises a sequence that specifically hybridizes to the complement of the target sequence (i.e., a spacer sequence) located 3′ of the crRNA scaffold sequence.


In some embodiments, the crRNA is encoded by a nucleotide sequence selected from the group consisting of:


i) 5′-ATTTCTACtgttGTAGAT(SEQ ID NO: 25)-Nx-3′;


ii) 5′-ATTTCTACtattGTAGAT(SEQ ID NO: 26)-Nx-3′;


iii) 5′-ATTTCTACtactGTAGAT(SEQ ID NO: 27)-Nx-3′;


iv) 5′-ATTTCTACtttgGTAGAT(SEQ ID NO: 28)-Nx-3′;


v) 5′-ATTTCTACtagttGTAGAT(SEQ ID NO: 29)-Nx-3′;


vi) 5′-ATTTCTACTATGGTAGAT(SEQ ID NO: 30)-Nx-3′;


vii) 5′-ATTTCTACTGTCGTAGAT(SEQ ID NO: 31)-Nx-3′;


viii) 5′-ATTTCTACTTGTGTAGAT(SEQ ID NO: 32)-Nx-3′; and


ix) 5′-ATTTCTACTGTGGTAGAT(SEQ ID NO: 33)-Nx-3′,


wherein Nx represents a nucleotide sequence that consists of x consecutive nucleotides, each N being independently selected from A, G, C and T; x is an integer satisfying 18≤x≤35, preferably x=23. In some embodiments, the sequence Nx (spacer sequence) is capable of specifically hybridizing to the complement of the target sequence.


In some embodiments, the crRNAs of the invention are particularly suitable for use in combination with the Cpf1 protein of the invention for genome editing, particularly for genome editing in eukaryotes such as mammals.


3. Method for Modifying a Target Sequence in the Genome of a Cell

In another aspect, the present invention provides a method of modifying a target sequence in the genome of a cell, comprising introducing the genome editing system of the present invention into the cell, whereby the guide RNA targets the Cpf1 protein to a target sequence in the genome of the cell, resulting in one or more nucleotide substitution, deletion and/or addition in the target sequence.


In another aspect, the present invention provides a method of producing a genetically modified cell, comprising introducing the genome editing system of the present invention into the cell, whereby the guide RNA targets the Cpf1 protein to a target sequence in the genome of the cell, resulting in one or more nucleotide substitution, deletion and/or addition in the target sequence.


In another aspect, the invention also provides a genetically modified organism comprising a genetically modified cell produced by the method of the invention or a progeny cell thereof.


The design of target sequences or crRNA coding sequences that can be recognized and targeted by the Cpf1 protein and guide RNA (i.e., crRNA) complex can be found, for example, in Zhang et al., Cell 163, 1-13, Oct. 22, 2015. In general, the 5′-terminus of the target sequence targeted by the genome editing system of the present invention needs to include a protospacer adjacent motif (PAM) 5′-TTTN, 5′-TTN, 5′-CCN, 5′-TCCN, 5′-TCN or 5′-CTN, wherein N is independently selected from A, G, C and T.


For example, in some embodiments of the present invention, the target sequence has the following structure: 5′-TYYN-Nx-3′ or 5′-YYN-Nx-3′, wherein each N is independently selected from A, G, C and T; Y is selected from C and T; x is an integer satisfying 15≤x≤35; and Nx represents x consecutive nucleotides.
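A target sequence of this structure, a PAM immediately followed by x consecutive nucleotides, can be located in a DNA sequence by a simple pattern search. The following Python sketch is a non-limiting illustration: the function name `find_targets` and the default x=23 are assumptions made for the example, only the strand supplied is scanned (the reverse complement would have to be searched separately), and no assessment of editing efficiency or off-target effects is attempted.

```python
import re

# Regular-expression classes for the degenerate codes used in the PAM.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any nucleotide
}


def find_targets(dna: str, pam: str = "TYYN", x: int = 23):
    """Return (position, PAM, Nx) for every 5'-PAM-Nx-3' site on the given strand.

    `position` is the 0-based index of the first PAM base. Overlapping sites
    are reported because the pattern is wrapped in a lookahead.
    """
    if not 15 <= x <= 35:
        raise ValueError("x must satisfy 15 <= x <= 35")
    pam_regex = "".join(IUPAC_REGEX[code] for code in pam.upper())
    pattern = re.compile(f"(?=({pam_regex})([ACGT]{{{x}}}))")
    return [
        (m.start(), m.group(1), m.group(2))
        for m in pattern.finditer(dna.upper())
    ]
```

For example, in the made-up sequence `"GG" + "TTTA" + "ACGTACGTACGTACGTACGTACG" + "CC"`, the pattern 5′-TYYN-N23 is found once, at position 2 with PAM TTTA, whereas the shorter pattern 5′-YYN-N23 is found at positions 2 and 3.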


In some embodiments, the coding sequence of the crRNA comprises a crRNA scaffold sequence set forth in any one of SEQ ID NOs: 25-33. In some preferred embodiments, the crRNA scaffold sequence is SEQ ID NO: 30. In some embodiments, the coding sequence of the crRNA further comprises a sequence that specifically hybridizes to the complement of the target sequence (i.e., a spacer sequence) located 3′ of the crRNA scaffold sequence.


In some embodiments, the crRNA is encoded by a nucleotide sequence selected from the group consisting of:

i) 5′-ATTTCTACtgttGTAGAT(SEQ ID NO: 25)-Nx-3′;
ii) 5′-ATTTCTACtattGTAGAT(SEQ ID NO: 26)-Nx-3′;
iii) 5′-ATTTCTACtactGTAGAT(SEQ ID NO: 27)-Nx-3′;
iv) 5′-ATTTCTACtttgGTAGAT(SEQ ID NO: 28)-Nx-3′;
v) 5′-ATTTCTACtagttGTAGAT(SEQ ID NO: 29)-Nx-3′;
vi) 5′-ATTTCTACTATGGTAGAT(SEQ ID NO: 30)-Nx-3′;
vii) 5′-ATTTCTACTGTCGTAGAT(SEQ ID NO: 31)-Nx-3′;
viii) 5′-ATTTCTACTTGTGTAGAT(SEQ ID NO: 32)-Nx-3′; and
ix) 5′-ATTTCTACTGTGGTAGAT(SEQ ID NO: 33)-Nx-3′,






wherein Nx represents a nucleotide sequence consisting of x consecutive nucleotides; N is independently selected from A, G, C and T; and x is an integer of 18≤x≤35, preferably x=23. In some embodiments, the sequence Nx (spacer sequence) is capable of specifically hybridizing to the complement of the target sequence.
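By way of a non-limiting sketch, the scaffold-plus-spacer structure described above can be assembled programmatically. The Python code below joins one of the scaffold sequences of SEQ ID NOs: 25-33 to a spacer and enforces the stated length range of 18≤x≤35. The function name `build_crrna_coding_sequence` is hypothetical, and the code assumes for illustration that the spacer is identical to the target sequence (a perfect match), which is one way of satisfying the hybridization requirement.

```python
# Scaffold sequences exactly as listed in the text, keyed by SEQ ID NO.
CRRNA_SCAFFOLDS = {
    25: "ATTTCTACtgttGTAGAT",
    26: "ATTTCTACtattGTAGAT",
    27: "ATTTCTACtactGTAGAT",
    28: "ATTTCTACtttgGTAGAT",
    29: "ATTTCTACtagttGTAGAT",
    30: "ATTTCTACTATGGTAGAT",
    31: "ATTTCTACTGTCGTAGAT",
    32: "ATTTCTACTTGTGTAGAT",
    33: "ATTTCTACTGTGGTAGAT",
}


def build_crrna_coding_sequence(spacer, seq_id_no=30):
    """Return 5'-scaffold-spacer-3' as a DNA coding sequence.

    Assumption for illustration: the spacer equals the target sequence.
    """
    spacer = spacer.upper()
    if not 18 <= len(spacer) <= 35:
        raise ValueError("spacer length x must satisfy 18 <= x <= 35")
    if set(spacer) - set("ACGT"):
        raise ValueError("spacer may contain only A, C, G and T")
    return CRRNA_SCAFFOLDS[seq_id_no] + spacer


# Human Dnmt1 target site 3 from Table 1, used here as the spacer (x = 23).
coding = build_crrna_coding_sequence("CTGATGGTCCATGTCTGTTACTC", seq_id_no=30)
print(coding)
```

The default of SEQ ID NO: 30 simply mirrors the preferred scaffold named in the text; any of the nine scaffolds can be selected through `seq_id_no`.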


In the present invention, the target sequence to be modified may be located at any location in the genome, for example, in a functional gene such as a protein-encoding gene, or in a gene expression regulatory region such as a promoter region or an enhancer region, so that modification of gene function or modification of gene expression can be achieved.


The substitution, deletion and/or addition in the target sequence of the cell can be detected by T7EI, PCR/RE or sequencing methods.


In the method of the present invention, the genome editing system can be introduced into the cell by a variety of methods well known to those skilled in the art.


Methods that can be used to introduce the genome editing system of the present invention into a cell include, but are not limited to, calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (e.g., baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), gene gun method, PEG-mediated protoplast transformation, and Agrobacterium-mediated transformation.


A cell edited by the method of the present invention can be a cell of mammals such as human, mouse, rat, monkey, dog, pig, sheep, cattle, cat; a cell of poultry such as chicken, duck, goose; a cell of plants including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybean, peanut and Arabidopsis thaliana and so on.


In some embodiments, the method of the invention is performed in vitro. For example, the cell is an isolated cell. In some embodiments, the cell is a CAR-T cell. In some embodiments, the cell is an induced embryonic stem cell.


In other embodiments, the method of the invention may also be performed in vivo. For example, the cell is a cell within an organism, and the system of the present invention can be introduced into the cell in vivo by, for example, a virus-mediated method. For example, the cell can be a tumor cell in a patient.


4. Therapeutic Applications

The invention also encompasses the use of the genome editing system of the invention in the treatment of diseases.


By modifying a disease-related gene by the genome editing system of the present invention, it is possible to achieve up-regulation, down-regulation, inactivation, activation, or mutation correction of the disease-related gene, thereby achieving disease prevention and/or treatment. For example, in the present invention, the target sequence may be located in a protein coding region of the disease-related gene, or may be, for example, located in a gene expression regulatory region such as a promoter region or an enhancer region, thereby enabling functional modification of the disease-related gene or expression modification of the disease-related gene.


A “disease-related” gene refers to any gene that produces a transcriptional or translational product at an abnormal level or in an abnormal form in a cell derived from a disease-affected tissue as compared to a non-disease control tissue or cell. When altered expression is associated with the appearance and/or progression of a disease, it may be a gene that is expressed at an abnormally high level, or it may be a gene that is expressed at an abnormally low level. A disease-related gene also refers to a gene having one or more mutations or a genetic variation that is directly responsible for the disease or is in genetic linkage with one or more genes responsible for the etiology of the disease. The transcribed or translated product may be known or unknown, and may be at normal or abnormal levels.


Accordingly, the invention also provides a method of treating a disease in a subject in need thereof, comprising delivering to the subject an effective amount of the genome editing system of the invention to modify a gene related with the disease.


The invention also provides the use of the genome editing system of the invention for the preparation of a pharmaceutical composition for treating a disease in a subject in need thereof, wherein the genome editing system is for modifying a gene related with the disease.


The invention also provides a pharmaceutical composition for treating a disease in a subject in need thereof, comprising the genome editing system of the invention and a pharmaceutically acceptable carrier, wherein the genome editing system is for modifying a gene related with the disease.


In some embodiments, the subject is a mammal, such as a human.


Examples of such diseases include, but are not limited to, tumors, inflammation, Parkinson's disease, cardiovascular disease, Alzheimer's disease, autism, drug addiction, age-related macular degeneration, schizophrenia, hereditary diseases, and the like.


Further examples of diseases and corresponding disease-related genes according to the invention can be found, for example, from Chinese patent application CN201480045703.4.


5. Kit

The scope of the invention also includes a kit for use in the method of the invention, comprising the genome editing system of the invention and instructions for use. The kit generally includes a label indicating the intended use and/or method for use of the contents of the kit. The term label includes any written or recorded material provided on or with the kit or otherwise provided with the kit.


EXAMPLE
Material and Method
Cell Culture and Transfection

HEK293T and HeLa cells were cultured in DMEM medium (Gibco) supplemented with 10% FBS (Gibco), 100 U/ml penicillin and 100 ug/ml streptomycin, and mouse EpiSC cells were cultured in N2B27 medium supplemented with bFGF. 12 h before transfection, the cells were seeded in a 12-well plate at a density of 500,000 per well. After 12 h, the cells had reached about 60%-70% confluence. The medium was replaced with serum-free opti-MEM medium (Gibco), and 1.5 ug of plasmid (Cpf1:crRNA=2:1) was then transfected into the cells using Lipofectamine LTX and PLUS reagents (Invitrogen). The medium was replaced with serum-containing DMEM medium 8-12 h after transfection. 48 h after transfection, GFP-positive cells were sorted by FACS for genotyping.


Genotype Analysis: T7EI Analysis and DNA Sequencing

GFP-positive cells were sorted by FACS, lysed with Buffer L + 1/100 V Protease K (Biotool) at 55° C. for 30 min, inactivated at 95° C. for 5 min, and directly used for PCR detection. Appropriate primers were designed near the Cpf1 targeting site, and the amplified product was purified using the DNA Clean & Concentrator™-5 kit (ZYMO Research). 200 ng of purified PCR product was mixed with 1 uL of NEBuffer2 (NEB) and diluted to 10 uL with ddH2O. Then, heteroduplexes were formed by re-annealing according to the previously reported method (Li, W., et al., Simultaneous generation and germline transmission of multiple gene mutations in rat using CRISPR-Cas systems. Nat Biotechnol, 2013. 31(8): p. 684-6). 0.3 uL of T7EI endonuclease and 1/10 V NEBuffer2 (NEB) were added to the reannealed product, which was then digested at 37° C. for 1 h 30 min. Genotype analysis was performed by 3% TAE gel electrophoresis. Indels were calculated according to the previously reported method (Cong, L., et al., Multiplex genome engineering using CRISPR/Cas systems. Science, 2013. 339(6121): p. 819-23).
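For illustration, the indel estimate of the cited method (Cong et al., 2013) is commonly written as indel (%) = 100 × (1 − sqrt(1 − (b + c)/(a + b + c))), where a is the integrated intensity of the undigested PCR product band and b and c are the integrated intensities of the two cleavage product bands. The Python sketch below implements this formula. The function and argument names are hypothetical, and the band intensities in the usage line are invented values chosen only to show the arithmetic.

```python
import math


def indel_percent(uncut, cut_1, cut_2):
    """Estimate indel frequency (%) from T7EI gel band intensities.

    uncut        : intensity of the undigested PCR product (a)
    cut_1, cut_2 : intensities of the two cleavage products (b, c)
    """
    total = uncut + cut_1 + cut_2
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cut_1 + cut_2) / total
    # The square root corrects for the random re-annealing of mutant and
    # wild-type strands into heteroduplexes.
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Invented intensities: 36% of the signal lies in the cleavage bands.
example = indel_percent(uncut=64.0, cut_1=18.0, cut_2=18.0)
print(round(example, 2))
```

The square-root term reflects that a cleaved heteroduplex needs only one mutant strand, so the cleaved fraction overstates the mutant allele fraction unless corrected.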


The PCR product corresponding to the T7EI-positive sample was ligated into the pEASY-T1 or pEASY-B (Transgen) vector, then transformed, plated, and incubated at 37° C. overnight. An appropriate number of single colonies were picked for Sanger sequencing. Genotypes of mutants were determined by alignment with the wild-type genotype.


Immunofluorescence Staining

The Cpf1 eukaryotic expression vector was transfected into HeLa cells. After 48 hours, the cells were fixed with 4% PFA for 10 min at room temperature, washed three times with PBST (PBS+0.3% Triton X100), blocked with 2% BSA (+0.3% Triton X100) for 15 min at room temperature, incubated with primary antibody Rat anti-HA (Roche, 1:1000) overnight, washed three times with PBST, and incubated with Cy3-labeled fluorescent secondary antibody (Jackson ImmunoResearch, 1:1000) for 2 h at room temperature. Nuclei were stained with DAPI (Sigma, 1:1000) for 10 min, and washed three times with PBST. The slides were mounted with Aqueous Mounting Medium (Abcam) and fixed. A Zeiss LSM780 was used for observation.


Prokaryotic Expression and Purification of Cpf1 Protein

The Cpf1 encoding gene was cloned into the prokaryotic expression vector BPK2103/2104. The 10×His tag fused at the C-terminus was used to purify the protein. The expression vector was transformed into BL21 (DE3) E. coli competent cells (Transgen). Clones showing high IPTG-induced expression were selected and inoculated into 300 mL of CmR+ LB medium, and cultured at 37° C. with shaking to OD600˜0.4. IPTG was added to a final concentration of 1 mM to induce expression at 16° C. for 16 h. The culture was centrifuged at 8000 rpm at 4° C. for 10 min, and the bacterial pellet was collected. The bacteria were lysed in 40 mL of NPI-10 (+1×EDTA-free Protease Inhibitor Cocktail, Roche; +5% glycerol) on an ice bath by ultrasonication: total time 15 min, ultrasonication 2 s, pause 5 s, at a power of 100 W. After lysis, centrifugation was performed at 8000 rpm at 4° C. for 10 min, and the supernatant was collected. 2 mL of His60 Ni Superflow Resin (Takara) was added to the supernatant, and the mixture was shaken at 4° C. for 1 h and purified on Polypropylene Columns (Qiagen). Using gravity flow, the undesired proteins without the fused His10 tag were thoroughly washed away with 20 mL of NPI-20, 20 mL of NPI-40 and 10 mL of NPI-100, successively. Finally, the Cpf1 protein with the fused His10 tag was eluted from the Ni column with 6×0.5 mL of NPI-500. The eluted protein of interest was dialyzed overnight against a dialysate (50 mM Tris-HCl, 300 mM KCl, 1 mM DTT, 20% Glycerol). The protein solution after dialysis was concentrated with a 100 kDa Amicon Ultra-4, PLHK Ultracel-PL ultrafiltration tube (Millipore). Protein concentration was determined using a Micro BCA™ Protein Assay Kit (Thermo Scientific™).


RNA In Vitro Transcription

A crRNA template for in vitro transcription carrying a T7 promoter was synthesized (BGI), and the crRNA was transcribed in vitro using the HiScribe™ T7 Quick High Yield RNA Synthesis Kit (NEB) according to the protocol. After the transcription was completed, the crRNA was purified with Oligo Clean & Concentrator™ (ZYMO Research), its concentration was measured with a NanoDrop (Thermo Scientific™), and it was stored at −80° C.


In Vitro Digestion Analysis

The target sequences with different 5′PAM sequences were synthesized and cloned into the pUC19 or p11-LacY-wtx1 vector, amplified with primers and purified. 200 ng of purified PCR product was reacted with 400 ng of crRNA and 50 nM of Cpf1 protein in the NEBuffer3 (NEB) reaction system for 1 h at 37° C. The reaction products were then analyzed by electrophoresis on a 12% Urea-TBE-PAGE gel or a 2.5% agarose gel.


Target Sequence

The target sequences used in the experiment are shown in Table 1 below:











TABLE 1

Target                     Sequence (5′-3′)         5′PAM (5′-3′)
Human Dnmt1 Target site 1  CCTCACTCCTGCTCGGTGAATTT  TTTC
Human Dnmt1 Target site 2  AGGAGTGTTCAGTCTCCGTGAAC  TTTG
Human Dnmt1 Target site 3  CTGATGGTCCATGTCTGTTACTC  TTTC
Mouse MeCP2 Target site 1  CCTGCCTCTGCTGGCTCTGCAGA  TTTG
Mouse MeCP2 Target site 2  TGATGTTTCTGCTTTGCCTGCCT  TTTC
Mouse MeCP2 Target site 3  GGGGAAGCCGAGGCTTCTGGCAC  TTTG
Mouse MeCP2 Target site 4  GTGTCCAACCTTCAGGCAAGGTG  TTTC
Mouse MeCP2 Target site 5  GCCTGCCTCTGCTGGCTCTGCAG  TTT
Mouse MeCP2 Target site 6  CTGATGTTTCTGCTTTGCCTGCC  TTT
Mouse MeCP2 Target site 7  GGGGGAAGCCGAGGCTTCTGGCA  TTT
Mouse MeCP2 Target site 8  CGTGTCCAACCTTCAGGCAAGGT  TTT
Mouse Tet1 Target site 1   TCGGGTCAGCATCACTGGCTCAG  TTTC
Mouse Tet1 Target site 2   CTGGGAGCAGCCTGAGAACCCTG  TTTG
Mouse Tet1 Target site 3   ACATCAGCTGAGCCAGTGATGCT  TTTG
Mouse Tet1 Target site 4   GATTCTTGCAGTAGGTGCACTCC  TTTC
Mouse Tet1 Target site 5   CTCTTCTTACAGATCTGGTGGCT  TTTC
Mouse Tet1 Target site 6   CTCGGGTCAGCATCACTGGCTCA  TTT
Mouse Tet1 Target site 7   CACAGGCCAGTACCTCTTCTCCC  TTA
Mouse Tet1 Target site 8   CGATTCTTGCAGTAGGTGCACTC  TTT
Human AAVS1 Target site 1  TGAGAATGGTGCGTCCTAGGTGT  TTTG
Human AAVS1 Target site 2  GTGAGAATGGTGCGTCCTAGGTG  TTT
Human AAVS1 Target site 3  ACCAGGTCGTGGCCGCCTCTACT  TTC
Human AAVS1 Target site 4  TGTGGAAAACTCCCTTTGTGAGA  CCG
Human AAVS1 Target site 5  CTACTCCCTTTCTCTTTCTCCAT  CCT
Human AAVS1 Target site 6  TGTCCCCCTTCCTCGTCCACCAT  CCC
Human AAVS1 Target site 7  CCTTTGTGAGAATGGTGCGTCCT  CTC
Human AAVS1 Target site 8  CTACAGGGGTTCCTGGCTCTGCT  TCC
Mouse Apob Target site 12  GTGGGCCCATGGCGGATGGATGG  TTTC
Mouse Nrl Target site 1    CCTCCCAGTCCCTTGGCTATGGA  TTTC
Mouse Nrl Target site 7    GGCTCCACACCATACAGCTCGGT  TTG









Example 1. Identification of Novel Cpf1 Proteins

CRISPR/Cpf1 (Clustered Regularly Interspaced Short Palindromic Repeats from Prevotella and Francisella 1) is an acquired immune mechanism found in Prevotella and Francisella that has been successfully engineered for DNA editing (Zetsche, B., et al., Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell, 2015. 163(3): p. 759-71.). Unlike CRISPR/Cas9, Cpf1 requires only one crRNA as a guide and no tracrRNA, and the Cpf1 protein uses a T-rich PAM sequence.


The present inventors performed a PSI-Blast search in the NCBI database using the already reported AsCpf1, LbCpf1 and FnCpf1, and selected 25 unreported Cpf1 proteins, of which 21 Cpf1 proteins possess a direct repeat (DR) sequence.


By sequence alignment, the sequences and the RNA secondary structures of the 21 DR from different Cpf1 proteins were found to be quite conserved (FIG. 1). Based on the hypothesis that besides the conserved crRNA, the PAM sequences of all Cpf1 proteins were also conserved (i.e., all using T-rich PAM sequences), 12 of the 25 Cpf1 proteins were selected as candidates. The specific information of the selected 12 candidate Cpf1 proteins is shown in Table 2 below:









TABLE 2

Candidate Cpf1 Proteins

Name    Bacteria Source                                   GenBank ID                 Amino Acid Sequence (SEQ ID NO)  Codon optimization (human) nucleotide sequence (SEQ ID NO)
ArCpf1  Agathobacter rectalis strain 2789STDY5834884      CZAJ01000001.1             1                                13
LpCpf1  Lachnospira pectinoschiza strain 2789STDY5834886  CZAK01000004.1             2                                14
SaCpf1  Sneathia amnii strain SN3                         CP011280.1                 3                                15
HkCpf1  Helcococcus kunzii ATCC 51366                     JH601088.1/AGEI01000022.1  4                                16
AbCpf1  Arcobacter butzleri L348                          JAIQ01000039.1             5                                17
BoCpf1  Bacteroidetes oral taxon 274                      NZ_GG774890.1              6                                18
OsCpf1  Oribacterium sp. NK2B42                           NZ_KE384190.1              7                                19
BsCpf1  Butyrivibrio sp. NC30                             NZ_AUKC01000013.1          8                                20
PsCpf1  Proteocatella sphenisci DSM 23131                 NZ_KE384028.1              9                                21
C6Cpf1  Candidatus Dojkabacteria                          LBTH01000007.1             10                               22
PxCpf1  Pseudobutyrivibrio xylanivorans strain DSM 10317  FMWK01000002.1             11                               23
PrCpf1  Pseudobutyrivibrio ruminis                        NZ_KE384121.1              12                               24









Example 2. Verification of the Ability of the Cpf1 Proteins to Edit Human Genome

Codon optimization was performed for expression in humans and 12 selected protein coding sequences were synthesized (BGI), cloned into eukaryotic expression vectors (pCAG-2AeGFP-SV40 and pCAG-2AeGFP-SV40_v4) and prokaryotic expression vectors (BPK2103-ccdB and BPK2014-ccdB).


HeLa cell transfection experiments showed that the selected Cpf1 proteins are clearly expressed in the nucleus (FIG. 2).


Next, by co-transfection with crRNA targeting human Dnmt1 gene into 293T cells and identification by PCR and T7EI digestion, 8 Cpf1 proteins (Ab, Ar, Bo, Bs, Hk, Lp, Ps, Sa) were found to induce mutation at site 3 of the Dnmt1 gene in 293T cells (FIG. 3A). DNA sequencing further confirmed that the gene site did produce a genetic mutation (FIG. 3B).


To further demonstrate that the identified Cpf1 proteins can induce mutations in the mammalian genome, the above experiments were repeated using sites 1 and 2 of the Dnmt1 gene. T7EI digestion results revealed that four proteins, ArCpf1, BsCpf1, HkCpf1 and LpCpf1, produced significant indels at both sites (FIGS. 4A and 5A). DNA sequencing results showed that besides the above four proteins, PsCpf1 and SaCpf1 also caused mutations in the gene (FIGS. 4B and 5B).


Example 3. Identification of PAM Sequences

To determine the PAM sequence of the identified Cpf1 proteins, first, the BsCpf1 protein was selected for detailed study.


BsCpf1 was expressed in E. coli and purified by His tag (FIG. 6A). The purified BsCpf1 protein and the corresponding crRNA and dsDNA fragment (human Dnmt1 target site 2) were incubated at 37° C. for 1 h, and then subjected to TBE denaturing PAGE gel electrophoresis, which proved that the PAM sequence of BsCpf1 was 5′(T)TTN- (FIG. 6B-E).


Then, the PAM sequences of the remaining Cpf1 proteins were identified by a similar method, and the results are shown in FIG. 7. The experimental results show that the selected Cpf1 proteins all have in vitro enzymatic activity. The PAM sequence of 9 of the Cpf1 proteins is 5′(T)TTN-. The PAM sequences of C6Cpf1, HkCpf1 and PsCpf1 are 5′(T)YYN-.


To confirm that the PAM sequence of HkCpf1 for in vivo editing is 5′(T)YYN-, the HkCpf1 vector was co-transfected into 293T cells with crRNA vectors targeting each of 8 sites on the human AAVS1 gene. The experimental results (FIG. 8) show that HkCpf1 can edit 7 of the sites and generate mutations, and that the PAM sequence is 5′(T)YYN- (underlined portion of FIG. 8B).
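As a minimal sketch, the PAM notations used in this example can be checked against the PAMs listed for the AAVS1 sites in Table 1. The Python code below compares the bases adjacent to the target with a degenerate pattern such as TTN or YYN, aligning from the 3′ end so that the optional leading T of 5′(T)YYN- is ignored. The function name `pam_matches` and the dictionary layout are hypothetical. The code only classifies sequences and makes no statement about which site was or was not edited.

```python
# Bases allowed by each IUPAC code used in the PAM notations of the text.
IUPAC_BASES = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}


def pam_matches(pam, pattern):
    """Return True if the 3'-most bases of `pam` fit the degenerate `pattern`."""
    pam, pattern = pam.upper(), pattern.upper()
    if len(pam) < len(pattern):
        return False
    tail = pam[-len(pattern):]  # bases immediately 5' of the target sequence
    return all(base in IUPAC_BASES[code] for base, code in zip(tail, pattern))


# 5' PAMs of human AAVS1 target sites 1-8 as listed in Table 1.
AAVS1_PAMS = {1: "TTTG", 2: "TTT", 3: "TTC", 4: "CCG",
              5: "CCT", 6: "CCC", 7: "CTC", 8: "TCC"}

ttn_sites = [site for site, pam in AAVS1_PAMS.items() if pam_matches(pam, "TTN")]
yyn_sites = [site for site, pam in AAVS1_PAMS.items() if pam_matches(pam, "YYN")]
print("TTN:", ttn_sites)
print("YYN:", yyn_sites)
```

Under these definitions every AAVS1 PAM in Table 1 fits YYN while only some fit TTN, which is why this set of sites can distinguish a YYN-type PAM preference from a TTN-type one.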


Example 4. Genome Editing in Mouse

To extend the applicability of these Cpf1 proteins, a number of transfection experiments were performed in mouse EpiSC cell lines.


First, EpiSC cells were transfected with the BsCpf1 vector and the crRNA vectors (pUC19-crRNA) targeting eight sites on the mouse Tet1 gene, respectively. It was confirmed by T7EI digestion and DNA sequencing that BsCpf1 caused gene mutations at sites 2, 3 and 7 (FIG. 9A). The PAM sequence of site 7 is 5′TTA-, which agrees with the conclusion of Example 3 and indicates that the BsCpf1 protein uses the 5′TTN- PAM sequence.


Next, the DNA editing ability of ArCpf1, BsCpf1, HkCpf1, and PxCpf1 was repeatedly verified. Co-transfection was carried out with the crRNA vectors (pUC19-crRNA) targeting eight sites on the mouse MeCP2 gene. Using T7EI digestion and DNA sequencing, it was demonstrated that ArCpf1 (FIG. 10 D, E), BsCpf1 (FIG. 9B, C), HkCpf1 (FIG. 10 F, G) and PxCpf1 (FIG. 11 H, I) can target and edit the genome of mouse.


In addition, the editing ability of ArCpf1, BsCpf1, HkCpf1, LpCpf1, PrCpf1 and PxCpf1 was further verified by target site 12 of mouse Apob, target site 4 of mouse MeCP2, target site 1 of mouse Nrl and target site 7 of mouse Nrl.


The results are shown in FIG. 12, and all six proteins can edit the genome. The PAM sequence of site Nrl-7 is 5′TTG, indicating that the PAM sequence of BsCpf1 and PrCpf1 is 5′TTN-.


The above experimental results demonstrate that the 12 Cpf1 proteins found in the present invention all have DNA editing ability, wherein ArCpf1 (SEQ ID NO: 1), BsCpf1 (SEQ ID NO: 8), HkCpf1 (SEQ ID NO: 4), PxCpf1 (SEQ ID NO: 11), LpCpf1 (SEQ ID NO: 2) and PrCpf1 (SEQ ID NO: 12) enable efficient mammalian genome editing.


Example 5. Optimizing the crRNA Scaffolds to Improve Editing Efficiency

This example optimizes the crRNA scaffolds of the newly identified Cpf1 proteins that can be used for mammalian genome editing so as to improve the genome editing efficiency of each Cpf1 protein (Table 3).


The experimental results are shown in FIG. 13. In FIG. 13A, cells were transfected with each Cpf1 protein and different crRNA plasmids, and PCR and T7EI analysis demonstrated that different crRNA scaffolds affect the editing efficiency of the Cpf1 proteins.


The inventors then established a library of crRNA mutants, transfected with BsCpf1 or PrCpf1, respectively. Through PCR and T7EI analysis, crRNA mutants that enable Cpf1 to efficiently edit mammalian genome were screened.


The editing efficiency of the crRNA31 mutant was significantly higher than that of the wild-type crRNA scaffold (crRNA2) derived from the genome of the strain. Cells were transfected with Cpf1 and either the crRNA31 or the crRNA2 plasmid, and it was confirmed by PCR and T7EI analysis that crRNA31 can increase the editing efficiency of five Cpf1 proteins (ArCpf1, BsCpf1, HkCpf1, PrCpf1, and PxCpf1) at the site MeCP2-4 by two-fold (FIG. 13B).



FIGS. 14A and B show that analysis of different target sites by T7EI confirms that the screened crRNA scaffolds can improve the editing efficiency of BsCpf1 and PrCpf1. For PrCpf1, in addition to the crRNA31 scaffold, the crRNA77, crRNA129 and crRNA159 scaffolds can also significantly improve the editing efficiency at the site Nrl-1.



FIG. 14C shows analysis of intracellular GFP editing efficiency by flow cytometry. crRNA31 scaffold can improve the editing efficiency of BsCpf1 and PrCpf1, and crRNA77 and crRNA159 can also significantly improve the editing efficiency of PrCpf1.









TABLE 3

Identified crRNA Scaffolds

crRNA Scaffold  crRNA Scaffold Sequence (5′-3′)  SEQ ID NO
crRNA1          ATTTCTACtgttGTAGAT               25
crRNA2          ATTTCTACtattGTAGAT               26
crRNA3          ATTTCTACtactGTAGAT               27
crRNA4          ATTTCTACtttgGTAGAT               28
crRNA5          ATTTCTACtagttGTAGAT              29
crRNA31         ATTTCTACTATGGTAGAT               30
crRNA77         ATTTCTACTGTCGTAGAT               31
crRNA129        ATTTCTACTTGTGTAGAT               32
crRNA159        ATTTCTACTGTGGTAGAT               33



















Sequence Information















>SEQ ID NO: 1 ArCpf1


MNNGTNNFQNFIGISSLQKTLRNALTPTETTQQFIVKNGIIKEDELRGENRQILKDIMDDYYRGFISE


TLSSIDDIDWTSLFEKMEIQLKNGDNKDTLIKEQAEKRKAIYKKFADDDRFKNMFSAKLISDILPEF


VIHNNNYSASEKEEKTQVIKLFSRFATSFKDYFKNRANCFSADDISSSSCHRIVNDNAEIFFSNALVY


RRIVKNLSNDDINKISGDMKDSLKKMSLEKIYSYEKYGEFITQEGISFYNDICGKVNSFMNLYCQKN


KENKNLYKLRKLHKQILCIADTSYEVPYKFESDEEVYQSVNGFLDNISSKHIVERLRKIGDNYNGY


NLDKIYIVSKFYESVSQKTYRDWETINTALEIHYNNILPGNGKSKADKVKKAVKNDLQKSITEINEL


VSNYKLCPDDNIKAETYIHEISHILNNFEAQELKYNPEIHLVESELKASELKNVLDVIMNAFHWCSV


FMTEELVDKDNNFYAELEEIYDEIYPVISLYNLVRNYVTQKPYSTKKIKLNFGIPTLADGWSKSKEY


SNNAIILMRDNLYYLGIFNAKNKPEKKIIEGNTSENKGDYKKMIYNLLPGPNKMIPKVFLSSKTGVE


TYKPSAYILEGYKQNKHLKSSKDFDITFCRDLIDYFKNCIAIHPEWKNFGFDFSDTSTYEDISGFYRE


VELQGYKIDWTYISEKDIDLLQEKGQLYLFQIYNKDFSKKSTGNDNLHTMYLKNLFSEENLKDVV


LKLNGEAEIFFRKSSIKNPIIHKKGSILVNRTYEAEEKDQFGNIQIVRKTIPENIYQELYKYFNDKSDK


ELSDEAAKLKNAVGHHEAATNIVKDYRYTYDKYFLHMPITINFKANKTSFINDRILQYIAKEKDLH


VIGIDRGERNLIYVSVIDTCGNIVEQKSFNIVNGYDYQIKLKQQEGARQIARKEWKEIGKIKEIKEGY


LSLVIHEISKMVIKYNAIIAMEDLSYGFKKGRFKVERQVYQKFETMLINKLNYLVFKDISITENGGL


LKGYQLTYIPEKLKNVGHQCGCIFYVPAAYTSKIDPTTGFVNIFKFKDLTVDAKREFIKKFDSIRYDS


DKNLFCFTFDYNNFITQNTVMSKSSWSVYTYGVRIKRRFVNGRFSNESDTIDITKDMEKTLEMTDI


NWRDGHDLRQDIIDYEIVQHIFEIFKLTVQMRNSLSELEDRNYDRLISPVLNENNIFYDSAKAGDAL


PKDADANGAYCIALKGLYEIKQITENWKEDGKFSRDKLKISNKDWFDFIQNKRYL





>SEQ ID NO: 2 LpCpf1


MFSLDYFSLTLSQRYIDIYNTMIGGNTLADGTKVQGINENINIYRQKNNIDRKNLPTLKPLHKQLLS


DRETLSWIPEAFKTKEEVVGAIEDFYKNNIISFKCCDNIVDITKQFIDIFSLNEDYELNKIFIKNDISITS


ISQDIFKDYRIIKEALWQKHINENPKAAKSKDLTGDKEKYFSRKNSFFSFEEIISSLKLMGRKIDLFSY


FKDNVEYRAHSIETTFIKWQKNKNDKKTTKELLDNILNLQRVLKPLYLKAEVEKDILFYSIFDIYFE


SLNEIVKLYNKVRDFESKKPYSLEKFKLNFQNSTLLSGWDVNKEPDNTSILLKKDGLYYLGIMDKK


HNRVFKNLESSKGGYEKIEYKLLSGPNKMLPKVFFSNKSIGYYNPSPALLEKYKSGVHKKGESFDL


NFCHELIDFFKASIDKHEDWKNFNFKFSDTSEYADISGFYREVEQQGYKITFKNIDEEFINTLINEGK


LYLFQIYNKDFSTFSKGTKNLHTLYWEMIFNEENLKNVVYKLNGEAEIFYRKKSIEYSEDKMKYGH


HYEELKDKFNYPIIKDKRFTMDKFQFHVPITMNFKATGRSYINEEVNDFLRQNSKDVKIIGINRGER


HLIYLTMINAKGEIIQQYSLNEIVNSYNNKNFTVNYNEKLSKKEGERAIARENWGVVENIKELKEG


YLSHAIHTISNLIVENNAIVVLEDLNFEFKRERLKVEKSIYQKFEKMLIDKLNYLVDKKKDINENGG


LLKALQLTNKFESFEKIGKQNGFLFFVNAWNITKICPVTGFVSLFDTRYQSVDKAREFFSKFDSIKY


NEEKEHYEFVFDYSNFTDKAKDTKTKWTVCSYGTRIKTFRNSEKNNNWDNKTVSPTEDLSKLLKS


CDRDIKEFIISQDKKEFFVELLEIFSLIVQMKNSIINSEIDYIISPVANENGEFFDSRFANSSLPKNADAN


AAYNTARKGLMLLEKIRDSEIGKKIDMKITNTEWLNFVQER





>SEQ ID NO: 3 SaCpf1


MNDIEGLKEEFLKISLENFEGIYISNKKLNEISNRKFGDYNSINMMIKQSMNEKGILSKKEINELIPDL


ENINKPKVKSFNLSFIFENLTKEHKELIIDYIRENICNVIENVKITIEKYRNIDNKIEFKNNAEKVSKIK


EMLESINELCKLIKEFNTDEIEKNNEFYNILNKNFEIFESSYKVLNKVRNFVTKKEVIENKMKLNFS


NYQLGNGWHKNKEKDCSIILFRKRNNERWIYYLGILKHGTKIKENDYLSSVDTGFYKMDYYAQNS


LSKMIPKCSITVKNVKNAPEDESVILNDSKKFNEPLEITPEIRKLYGNNEHIKGDKFKKESLVKWIDF


CKEFLLKYKSFEKAKKEILKLKESNLYENLEEFYSDAEEKAYFLEFINIDEDKIKKLVKEKNLYLFQI


YNKDFSAYSTGNKNLHTMYFEELFTDENLKKPVFKLNGNTEVFYRIASSKPKIVHNKGEKLVNKT


YLDDGIIKTIPDSVYEEISEKVKNNEDYSKLLEENNIKNLEIKVATHEIVKDKRYFENKFLFYLPITLN


KKVSNKNTNKNINKNVIDEIKDCNEYNVIGIDRGERNLISLCIINQNGEIILQKEMNIIQSSDKYNVD


YNEKLEIKSKERDNAKKNWSEIGKIKDLKSGYLSAVVHEIVKLAIEYNAVIILEDLNNGFKNSRKKV


DKQIYQKFERALIEKLQFLIFKNYDKNEKGGLRNAFQLTPELKNITKVASQQGIIIYTNPAYTSKIDPT


TGYANIIKKSNNNEESIVKAIDKISYDKEKDMFYFDINLSNSSFNLTVKNVLKKEWRIYTNGERIIYK


DRKYITLNITQEMKDILSKCGIDYLNIDNLKQDILKNKLHKKVYYIFELANKMRNENKDVDYIISPV


LNKDGKFFMTQEINELTPKDADLNGAYNIALKGKLMIDNLNKKEKFVFLSNEDWLNFIQGR





>SEQ ID NO: 4 HkCpf1


MFEKLSNIVSISKTIRFKLIPVGKTLENIEKLGKLEKDFERSDFYPILKNISDDYYRQYIKEKLSDLNL


DWQKLYDAHELLDSSKKESQKNLEMIQAQYRKVLFNILSGELDKSGEKNSKDLIKNNKALYGKLF


KKQFILEVLPDFVNNNDSYSEEDLEGLNLYSKFTTRLKNFWETRKNVFTDKDIVTAIPFRAVNENFG


FYYDNIKIFNKNIEYLENKIPNLENELKEADILDDNRSVKDYFTPNGFNYVITQDGIDVYQAIRGGF


TKENGEKVQGINEILNLTQQQLRRKPETKNVKLGVLTKLRKQILEYSESTSFLIDQIEDDNDLVDRIN


KFNVSFFESTEVSPSLFEQIERLYNALKSIKKEEVYIDARNTQKFSQMLFGQWDVIRRGYTVKITEG


SKEEKKKYKEYLELDETSKAKRYLNIREIEELVNLVEGFEEVDVFSVLLEKFKMNNIERSEFEAPIY


GSPIKLEAIKEYLEKHLEEYHKWKLLLIGNDDLDTDETFYPLLNEVISDYYIIPLYNLTRNYLTRKHS


DKDKIKVNFDFPTLADGWSESKISDNRSIILRKGGYYYLGILIDNKLLINKKNKSKKIYEILIYNQIPE


FSKSIPNYPFTKKVKEHFKNNVSDFQLIDGYVSPLIITKEIYDIKKEKKYKKDFYKDNNTNKNYLYTI


YKWIEFCKQFLYKYKGPNKESYKEMYDFSTLKDTSLYVNLNDFYADVNSCAYRVLFNKIDENTID


NAVEDGKLLLFQIYNKDFSPESKGKKNLHTLYWLSMFSEENLRTRKLKLNGQAEIFYRKKLEKKPII


HKEGSILLNKIDKEGNTIPENIYHECYRYLNKKIGREDLSDEAIALFNKDVLKYKEARFDIIKDRRYS


ESQFFFHVPITFNWDIKTNKNVNQIVQGMIKDGEIKHIIGIDRGERHLLYYSVIDLEGNIVEQGSLNT


LEQNRFDNSTVKVDYQNKLRTREEDRDRARKNWTNINKIKELKDGYLSHVVHKLSRLIIKYEAIVI


MENLNQGFKRGRFKVERQVYQKFELALMNKLSALSFKEKYDERKNLEPSGILNPIQACYPVDAYQ


ELQGQNGIVFYLPAAYTSVIDPVTGFTNLFRLKSINSSKYEEFIKKFKNIYFDNEEEDFKFIFNYKDFA


KANLVILNNIKSKDWKISTRGERISYNSKKKEYFYVQPTEFLINKLKELNIDYENIDIIPLIDNLEEKA


KRKILKALFDTFKYSVQLRNYDFENDYIISPTADDNGNYYNSNEIDIDKTNLPNNGDANGAFNIAR


KGLLLKDRIVNSNESKVDLKIKNEDWINFIIS





>SEQ ID NO: 5 AbCpf1


MFSLDYFSLTLSQRYIDIYNTMIGGNTLADGTKVQGINENINIYRQKNNIDRKNLPTLKPLHKQLLS


DRETLSWIPEAFKTKEEVVGAIEDFYKNNIISFKCCDNIVDITKQFIDIFSLNEDYELNKIFIKNDISITS


ISQDIFKDYRIIKEALWQKHINENPKAAKSKDLTGDKEKYFSRKNSFFSFEEIISSLKLMGRKIDLFSY


FKDNVEYRAHSIETTFIKWQKNKNDKKTTKELLDNILNLQRVLKPLYLKAEVEKDILFYSIFDIYFE


SLNEIVKLYNKVRDFESKKPYSLEKFKLNFQNSTLLSGWDVNKEPDNTSILLKKDGLYYLGIMDKK


HNRVFKNLESSKGGYEKIEYKLLSGPNKMLPKVFFSNKSIGYYNPSPALLEKYKSGVHKKGESFDL


NFCHELIDFFKASIDKHEDWKNFNFKFSDTSEYADISGFYREVEQQGYKITFKNIDEEFINTLINEGK


LYLFQIYNKDFSTFSKGTKNLHTLYWEMIFNEENLKNVVYKLNGEAEIFYRKKSIEYSEDKMKYGH


HYEELKDKFNYPIIKDKRFTMDKFQFHVPITMNFKATGRSYINEEVNDFLRQNSKDVKIIGINRGER


HLIYLTMINAKGEIIQQYSLNEIVNSYNNKNFTVNYNEKLSKKEGERAIARENWGVVENIKELKEG


YLSHAIHTISNLIVENNAIVVLEDLNFEFKRERLKVEKSIYQKFEKMLIDKLNYLVDKKKDINENGG


LLKALQLTNKFESFEKIGKQNGFLFFVNAWNITKICPVTGFVSLFDTRYQSVDKAREFFSKFDSIKY


NEEKEHYEFVFDYSNFTDKAKDTKTKWTVCSYGTRIKTFRNSEKNNNWDNKTVSPTEDLSKLLKS


CDRDIKEFIISQDKKEFFVELLEIFSLIVQMKNSIINSEIDYIISPVANENGEFFDSRFANSSLPKNADAN


AAYNTARKGLMLLEKIRDSEIGKKIDMKITNTEWLNFVQER





>SEQ ID NO: 6 BoCpf1


MRKFNEFVGLYPISKTLRFELKPIGKTLEHIQRNKLLEHDAVRADDYVKVKKIIDKYHKCLIDEALS


GFTFDTEADGRSNNSLSEYYLYYNLKKRNEQEQKTFKTIQNNLRKQIVNKLTQSEKYKRIDKKELI


TTDLPDFLTNESEKELVEKFKNFTTYFTEFHKNRKNMYSKEEKSTAIAFRLINENLPKFVDNIAAFE


KVVSSPLAEKINALYEDFKEYLNVEEISRVFRLDYYDELLTQKQIDLYNAIVGGRTEEDNKIQIKGLN


QYINEYNQQQTDRSNRLPKLKPLYKQILSDRESVSWLPPKFDSDKNLLIKIKECYDALSEKEKVFDK


LESILKSLSTYDLSKIYISNDSQLSYISQKMFGRWDIISKAIREDCAKRNPQKSRESLEKFAERIDKKL


KTIDSISIGDVDECLAQLGETYVKRVEDYFVAMGESEIDDEQTDTTSFKKNIEGAYESVKELLNNAD


NITDNNLMQDKGNVEKIKTLLDAIKDLQRFIKPLLGKGDEADKDGVFYGEFTSLWTKLDQVTPLY


NMVRNYLTSKPYSTKKIKLNFENSTLMDGWDLNKEPDNTTVIFCKDGLYYLGIMGKKYNRVFVD


REDLPHDGECYDKMEYKLLPGANKMLPKVFFSETGIQRFLPSEELLGKYERGTHKKGAGFDLGDC


RALIDFFKKSIERHDDWKKFDFKFSDTSTYQDISEFYREVEQQGYKMSFRKVSVDYIKSLVEEGKL


YLFQIYNKDFSAHSKGTPNMHTLYWKMLFDEENLKDVVYKLNGEAEVFFRKSSITVQSPTHPANS


PIKNKNKDNQKKESKFEYDLIKDRRYTVDKFLFHVPITMNFKSVGGSNINQLVKRHIRSATDLHIIGI


DRGERHLLYLTVIDSRGNIKEQFSLNEIVNEYNGNTYRTDYHELLDTREGERTEARRNWQTIQNIRE


LKEGYLSQVIHKISELAIKYNAVIVLEDLNFGFMRSRQKVEKQVYQKFEKMLIDKLNYLVDKKKPV


AETGGLLRAYQLTGEFESFKTLGKQSGILFYVPAWNTSKIDPVTGFVNLFDTHYENIEKAKVFFDKF


KSIRYNSDKDWFEFVVDDYTRFSPKAEGTRRDWTICTQGKRIQICRNHQRNNEWEGQEIDLTKAFK


EHFEAYGVDISKDLREQINTQNKKEFFEELLRLLRLTLQMRNSMPSSDIDYLISPVANDTGCFFDSRK


QAELKENAVLPMNADANGAYNIARKGLLAIRKMKQEENDSAKISLAISNKEWLKFAQTKPYLED





>SEQ ID NO: 7 OsCpf1


METEILKYDFFEREGKYMYYDGLTKQYALSKTIRNELVPIGKTLDNIKKNRILEADIKRKSDYEHVK


KLMDMYHKKIINEALDNFKLSVLEDAADIYFNKQNDERDIDAFLKIQDKLRKEIVEQLKGHTDYS


KVGNKDFLGLLKAASTEEDRILIESFDNFYTYFTSYNKVRSNLYSAEDKSSTVAYRLINENLPKFFD


NIKAYRTVRNAGVISGDMSIVEQDELFEVDTFNHTLTQYGIDTYNHMIGQLNSAINLYNQKMHGA


GSFKKLPKMKELYKQLLTEREEEFIEEYTDDEVLITSVHNYVSYLIDYLNSDKVESFFDTLRKSDGK


EVFIKNDVSKTTMSNILFDNWSTIDDLINHEYDSAPENVKKTKDDKYFEKRQKDLKKNKSYSLSKI


AALCRDTTILEKYIRRLVDDIEKIYTSNNVFSDIVLSKHDRSKKLSKNTNAVQAIKNMLDSIKDFEH


DVMLINGSGQEIKKNLNVYSEQEALAGILRQVDHIYNLTRNYLTKKPFSTEKIKLNFNRPTFLDGW


DKNKEEANLGILLIKDNRYYLGIMNTSSNKAFVNPPKAISNDIYKKVDYKLLPGPNKMLPKVFFAT


KNIAYYAPSEELLSKYRKGTHKKGDSFSIDDCRNLIDFFKSSINKNTDWSTFGFNFSDTNSYNDISDF


YREVEKQGYKLSFTDIDACYIKDLVDNNELYLFQIYNKDFSPYSKGKLNLHTLYFKMLFDQRNLDN


VVYKLNGEAEVFYRPASIESDEQIIHKSGQNIKNKNQKRSNCKKTSTFDYDIVKDRRYCKDKFMLH


LPITVNFGTNESGKFNELVNNAIRADKDVNVIGIDRGERNLLYVVVVDPCGKIIEQISLNTIVDKEYD


IETDYHQLLDEKEGSRDKARKDWNTIENIKELKEGYLSQVVNIIAKLVLKYDAIICLEDLNFGFKRG


RQKVEKQVYQKFEKMLIDKMNYLVLDKSRKQESPQKPGGALNALQLTSAFKSFKELGKQTGIIYY


VPAYLTSKIDPTTGFANLFYIKYESVDKARDFFSKFDFIRYNQMDNYFEFGFDYKSFTERASGCKSK


WIACTNGERIVKYRNSDKNNSFDDKTVILTDEYRSLFDKYLQNYIDEDDLKDQILQIDSADFYKNLI


KLFQLTLQMRNSSSDGKRDYIISPVKNYREEFFCSEFSDDTFPRDADANGAYNIARKGLWVIKQIRE


TKSGTKINLAMSNSEWLEYAQCNLL





>SEQ ID NO: 8 BsCpf1


MYYQNLTKKYPVSKTIRNELIPIGKTLENIRKNNILESDVKRKQDYEHVKGIMDEYHKQLINEALD


NYMLPSLNQAAEIYLKKHVDVEDREEFKKTQDLLRREVTGRLKEHENYTKIGKKDILDLLEKLPSI


SEEDYNALESFRNFYTYFTSYNKVRENLYSDEEKSSTVAYRLINENLPKFLDNIKSYAFVKAAGVLA


DCIEEEEQDALFMVETFNMTLTQEGIDMYNYQIGKVNSAINLYNQKNHKVEEFKKIPKMKVLYKQI


LSDREEVFIGEFKDDETLLSSIGAYGNVLMTYLKSEKINIFFDALRESEGKNVYVKNDLSKTTMSNI


VFGSWSAFDELLNQEYDLANENKKKDDKYFEKRQKELKKNKSYTLEQMSNLSKEDISPIENYIERI


SEDIEKICIYNGEFEKIVVNEHDSSRKLSKNIKAVKVIKDYLDSIKELEHDIKLINGSGQELEKNLVVY


VGQEEALEQLRPVDSLYNLTRNYLTKKPFSTEKVKLNFNKSTLLNGWDKNKETDNLGILFFKDGK


YYLGIMNTTANKAFVNPPAAKTENVFKKVDYKLLPGSNKMLPKVFFAKSNIGYYNPSTELYSNYK


KGTHKKGPSFSIDDCHNLIDFFKESIKKHEDWSKFGFEFSDTADYRDISEFYREVEKQGYKLTFTDI


DESYINDLIEKNELYLFQIYNKDFSEYSKGKLNLHTLYFMMLFDQRNLDNVVYKLNGEAEVFYRPA


SIAENELVIHKAGEGIKNKNPNRAKVKETSTFSYDIVKDKRYSKYKFTLHIPITMNFGVDEVRRFND


VINNALRTDDNVNVIGIDRGERNLLYVVVINSEGKILEQISLNSIINKEYDIETNYHALLDEREDDRN


KARKDWNTIENIKELKTGYLSQVVNVVAKLVLKYNAIICLEDLNFGFKRGRQKVEKQVYQKFEK


MLIEKLNYLVIDKSREQVSPEKMGGALNALQLTSKFKSFAELGKQSGIIYYVPAYLTSKIDPTTGFVN


LFYIKYENIEKAKQFFDGFDFIRFNKKDDMFEFSFDYKSFTQKACGIRSKWIVYTNGERIIKYPNPEK


NNLFDEKVINVTDEIKGLFKQYRIPYENGEDIKEIIISKAEADFYKRLFRLLHQTLQMRNSTSDGTRD


YIISPVKNDRGEFFCSEFSEGTMPKDADANGAYNIARKGLWVLEQIRQKDEGEKVNLSMTNAEWL


KYAQLHLL





>SEQ ID NO: 9 PsCpf1


MENFKNLYPINKTLRFELRPYGKTLENFKKSGLLEKDAFKANSRRSMQAIIDEKFKETIEERLKYTE


FSECDLGNMTSKDKKITDKAATNLKKQVILSFDDEIFNNYLKPDKNIDALFKNDPSNPVISTFKGFT


TYFVNFFEIRKHIFKGESSGSMAYRIIDENLTTYLNNIEKIKKLPEELKSQLEGIDQIDKLNNYNEFIT


QSGITHYNEIIGGISKSENVKIQGINEGINLYCQKNKVKLPRLTPLYKMILSDRVSNSFVLDTIENDTE


LIEMISDLINKTEISQDVIMSDIQNIFIKYKQLGNLPGISYSSIVNAICSDYDNNFGDGKRKKSYENDR


KKHLETNVYSINYISELLTDTDVSSNIKMRYKELEQNYQVCKENFNATNWMNIKNIKQSEKTNLIK


DLLDILKSIQRFYDLFDIVDEDKNPSAEFYTWLSKNAEKLDFEFNSVYNKSRNYLTRKQYSDKKIK


LNFDSPTLAKGWDANKEIDNSTIIMRKFNNDRGDYDYFLGIWNKSTPANEKIIPLEDNGLFEKMQY


KLYPDPSKMLPKQFLSKIWKAKHPTTPEFDKKYKEGRHKKGPDFEKEFLHELIDCFKHGLVNHDE


KYQDVFGFNLRNTEDYNSYTEFLEDVERCNYNLSFNKIADTSNLINDGKLYVFQIWSKDFSIDSKG


TKNLNTIYFESLFSEENMIEKMFKLSGEAEIFYRPASLNYCEDIIKKGHHHAELKDKFDYPIIKDKRY


SQDKFFFHVPMVINYKSEKLNSKSLNNRTNENLGQFTHIIGIDRGERHLIYLTVVDVSTGEIVEQKH


LDEIINTDTKGVEHKTHYLNKLEEKSKTRDNERKSWEAIETIKELKEGYISHVINEIQKLQEKYNALI


VMENLNYGFKNSRIKVEKQVYQKFETALIKKFNYIIDKKDPETYIHGYQLTNPITTLDKIGNQSGIVL


YIPAWNTSKIDPVTGFVNLLYADDLKYKNQEQAKSFIQKIDNIYFENGEFKFDIDFSKWNNRYSISKT


KWTLTSYGTRIQTFRNPQKNNKWDSAEYDLTEEFKLILNIDGTLKSQDVETYKKFMSLFKLMLQL


RNSVTGTDIDYMISPVTDKTGTHFDSRENIKNLPADADANGAYNIARKGIMAIENIMNGISDPLKIS


NEDYLKYIQNQQE





>SEQ ID NO: 10 C6Cpf1


MKNVFGGFTNLYSLTKTLRFELKPTSKTQKLMKRNNVIQTDEEIDKLYHDEMKPILDEIHRRFINDA


LAQKIFISASLDNFLKVVKNYKVESAKKNIKQNQVKLLQKEITIKTLGLRREVVSGFITVSKKWKD


KYVGLGIKLKGDGYKVLTEQAVLDILKIEFPNKAKYIDKFRGFWTYFSGFNENRKNYYSEEDKATS


IANRIVNENLSRYIDNIIAFEEILQKIPNLKKFKQDLDITSYNYYLNQAGIDKYNKIIGGYIVDKDKKI


QGINEKVNLYTQQTKKKLPKLKFLFKQIGSERKGFGIFEIKEGKEWEQLGDLFKLQRTKINSNGREK


GLFDSLRTMYREFFDEIKRDSNSQARYSLDKIYFNKASVNTISNSWFTNWNKFAELLNIKEDKKNG


EKKIPEQISIEDIKDSLSIIPKENLEELFKLTNREKHDRTRFFGSNAWVTFLNIWQNEIEESFNKLEEKE


KDFKKNAAIKFQKNNLVQKNYIKEVCDRMLAIERMAKYHLPKDSNLSREEDFYWIIDNLSEQREIY


KYYNAFRNYISKKPYNKSKMKLNFENGNLLGGWSDGQERNKAGVILRNGNKYYLGVLINRGIFR


TDKINNEIYRTGSSKWERLILSNLKFQTLAGKGFLGKHGVSYGNMNPEKSVPSLQKFIRENYLKKY


PQLTEVSNTKFLSKKDFDAAIKEALKECFTMNFINIAENKLLEAEDKGDLYLFEITNKDFSGKKSGK


DNIHTIYWKYLFSESNCKSPIIGLNGGAEIFFREGQKDKLHTKLDKKGKKVFDAKRYSEDKLFFHV


SITINYGKPKNIKFRDIINQLITSMNVNIIGIDRGEKHLLYYSVIDSNGIILKQGSLNKIRVGDKEVDFN


KKLTERANEMKKARQSWEQIGNIKNFKEGYLSQAIHEIYQLMIKYNAIIVLEDLNTEFKAKRLSKV


EKSVYKKFELKLARKLNHLILKDRNTNEIGGVLKAYQLTPTIGGGDVSKFEKAKQWGMMFYVRA


NYTSTTDPVTGWRKHLYISNFSNNSVIKSFFDPTNRDTGIEIFYSGKYRSWGFRYVQKETGKKWEL


FATKELERFKYNQTTKLCEKINLYDKFEELFKGIDKSADIYSQLCNVLDFRWKSLVYLWNLLNQIRN


VDKNAEGNKNDFIQSPVYPFFDSRKTDGKTEPINGDANGALNIARKGLMLVERIKNNPEKYEQLIR


DTEWDAWIQNFNKVN





>SEQ ID NO: 11 PxCpf1


MIIGRDFNMYYQNLTKMYPISKTLRNELIPVGKTLENIRKNGILEADIQRKADYEHVKKLMDNYHK


QLINEALQGVHLSDLSDAYDLYFNLSKEKNSVDAFSKCQDKLRKEIVSFLKNHENFPKIGNKEIIKLI


QSLNDNDADNNALDSFSNFYTYFSSYNEVRKNLYSDEEKSSTVAYRLINENLPKSLDNIKAYAIAKK


AGVRAEGLSEEEQDCLFIIETFERTLTQDGIDNYNADIGKLNTAINLYNQQNKKQEGFRKVPQMKCL


YKQILSDREEAFIDEFSDDEDLITNIESFAENMNVFLNSEIITDFKNALVESDGSLVYIKNDVSKTLFS


NIVFGSWNAIDEKLSDEYDLANSKKKKDEKYYEKRQKELKKNKSYDLETIIGLFDDSIDVIGKYIE


KLESDITAIAEAKNDFDEIVLRKHDKNKSLRKNTNAVEAIKSYLDTVKDFERDIKLINGSGQEVEKN


LVVYAEQENILAEIKNVDSLYNMSRNYLTQKPFSTEKFKLNFENPTLLNGWDRNKEKDYLGILFEK


EGMYYLGIINNNHRKIFENEKLCTGKESCFNKIVYKQISNAAKYLSSKQINPQNPPKEIAEILLKRKA


DSSSLSRKETELFIDYLKDDFLVNYPMIINSDGENFFNFHFKQAKDYGSLQEFFKEVEHQAYSLKTR


PIDDSYIYRMIDEGKLYLFQIHNKDFSPYSKGNLNLHTIYLQMLFDQRNLNNVVYKLNGEAEVFYR


PASINDEEVIIHKAGEEIKNKNSKRAVDKPTSKFGYDIIKDRRYSKDKFMLHIPVTMNFGVDETRRF


NDVVNDALRNDEKVRVIGIDRGERNLLYVVVVDTDGTILEQISLNSIINNEYSIETDYHKLLDEKEG


DRDRARKNWTTIENIKELKEGYLSQVVNVIAKLVLKYNAIICLEDLNFGFKRGRQKVEKQVYQKF


EKMLIDKLNYLVIDKSRKQEKPEEFGGALNALQLTSKFTSFKDMGKQTGIIYYVPAYLTSKIDPTTGF


ANLFYVKYENVEKAKEFFSRFDSISYNNESGYFEFAFDYKKFTDRACGARSQWTVCTYGERIIKYR


NADKNNSFDDKTIVLSEEFKELFSIYGISYEDGAELKNKIMSVDEADFFRCLTGLLQKTLQMRNSSN


DGTRDYIISPIMNDRGEFFNSEACDASKPKDADANGAFNIARKGLWVLEQIRNTPSGDKLNLAMSN


AEWLEYAQRNQI





>SEQ ID NO: 12 PrCpf1


MIIGRDFNMYYQNLTKMYPISKTLRNELIPVGKTLENIRKNGILEADIQRKADYEHVKKLMDNYHK


QLINEALQGVHLSDLSDAYDLYFNLSKEKNSVDAFSKCQDKLRKEIVSLLKNHENFPKIGNKEIIKL


LQSLYDNDTDYKALDSFSNFYTYFSSYNEVRKNLYSDEEKSSTVAYRLINENLPKFLDNIKAYAIAK


KAGVRAEGLSEEDQDCLFIIETFERTLTQDGIDNYNAAIGKLNTAINLFNQQNKKQEGFRKVPQMK


CLYKQILSDREEAFIDEFSDDEDLITNIESFAENMNVFLNSEIITDFKIALVESDGSLVYIKNDVSKTSF


SNIVFGSWNAIDEKLSDEYDLANSKKKKDEKYYEKRQKELKKNKSYDLETIIGLFDDNSDVIGKYI


EKLESDITAIAEAKNDFDEIVLRKHDKNKSLRKNTNAVEAIKSYLDTVKDFERDIKLINGSGQEVEK


NLVVYAEQENILAEIKNVDSLYNMSRNYLTQKPFSTEKFKLNFNRATLLNGWDKNKETDNLGILFE


KDGMYYLGIMNTKANKIFVNIPKATSNDVYHKVNYKLLPGPNKMLPKVFFAQSNLDYYKPSEELL


AKYKAGTHKKGDNFSLEDCHALIDFFKASIEKHPDWSSFGFEFSETCTYEDLSGFYREVEKQGYKI


TYTDVDADYITSLVERDELYLFQIYNKDFSPYSKGNLNLHTIYLQMLFDQRNLNNVVYKLNGEAE


VFYRPASINDEEVIIHKAGEEIKNKNSKRAVDKPTSKFGYDIIKDRRYSKDKFMLHIPVTMNFGVDE


TRRFNDVVNDALRNDEKVRVIGIDRGERNLLYVVVVDTDGTILEQISLNSIINNEYSIETDYHKLLD


EKEGDRDRARKNWTTIENIKELKEGYLSQVVNVIAKLVLKYNAIICLEDLNFGFKRGRQKVEKQV


YQKFEKMLIDKLNYLVIDKSRKQDKPEEFGGALNALQLTSKFTSFKDMGKQTGIIYYVPAYLTSKID


PTTGFANLFYVKYENVEKAKEFFSRFDSISYNNESGYFEFAFDYKKFTDRACGARSQWTVCTYGER


IIKFRNTEKNNSFDDKTIVLSEEFKELFSIYGISYEDGAELKNKIMSVDEADFFRSLTRLFQQTMQMR


NSSNDVTRDYIISPIMNDRGEFFNSEACDASKPKDADANGAFNIARKGLWVLEQIRNTPSGDKLNL


AMSNAEWLEYAQRNQI





>SEQ ID NO: 13 ArCpf1 nucleotide sequence


AACAACGGCACAAACAATTTCCAGAACTTCATTGGCATTTCATCACTCCAGAAGACCCTGAGG


AATGCCCTGACACCTACCGAGACAACCCAGCAATTCATTGTGAAGAACGGAATCATTAAGGAG


GACGAACTGAGAGGCGAGAACCGCCAGATCCTGAAAGACATCATGGACGACTACTATCGCGGT


TTCATCTCCGAGACACTCAGCAGCATTGATGATATTGATTGGACATCTCTGTTTGAGAAGATGGA


AATCCAGCTGAAGAATGGCGACAACAAGGACACTCTCATTAAGGAGCAAGCAGAGAAGAGAA


AGGCTATCTACAAGAAGTTTGCAGACGACGATCGCTTCAAGAATATGTTTAGCGCAAAGCTGAT


AAGTGATATTCTCCCAGAGTTTGTGATTCACAACAACAACTATTCTGCCAGCGAGAAGGAAGA


GAAGACACAGGTCATCAAGCTCTTCTCTCGGTTTGCCACCTCATTCAAAGATTACTTCAAGAAT


CGGGCAAATTGTTTCTCTGCCGATGACATCTCATCTTCCTCCTGTCACAGAATCGTTAATGATAA


CGCCGAAATCTTCTTCTCAAATGCCCTGGTGTACCGGAGAATCGTGAAGAATCTGAGTAACGAC


GACATCAATAAGATTAGCGGTGATATGAAGGACTCTCTGAAGAAGATGTCCCTGGAGAAGATAT


ACAGCTATGAGAAGTACGGCGAGTTCATCACTCAAGAGGGAATTAGCTTCTACAACGACATTTG


TGGCAAGGTTAACTCATTCATGAATCTCTACTGCCAGAAGAACAAGGAGAATAAGAATCTGTAC


AAACTGCGCAAACTGCATAAGCAAATCCTGTGTATCGCCGACACCTCTTATGAGGTGCCCTACA


AGTTTGAGTCTGACGAAGAGGTGTACCAGTCCGTGAACGGATTCCTGGACAACATCAGTTCTA


AACACATCGTCGAGCGGCTGAGGAAGATCGGCGACAACTACAATGGCTATAACCTGGACAAGA


TATACATCGTCAGTAAGTTCTATGAAAGTGTGAGTCAGAAGACCTACCGGGACTGGGAGACTAT


CAACACCGCACTCGAAATCCACTACAATAACATCCTGCCTGGCAACGGTAAGAGCAAGGCCGA


CAAGGTGAAGAAGGCCGTCAAGAACGACCTCCAGAAGAGCATCACCGAGATTAACGAACTGG


TGAGTAACTACAAGCTCTGCCCAGACGATAACATTAAGGCAGAAACCTACATTCATGAGATTTC


CCATATACTGAACAACTTTGAAGCTCAAGAGCTGAAATACAATCCCGAGATTCATCTGGTGGAG


AGCGAGCTGAAAGCATCCGAGCTGAAGAACGTTCTCGACGTCATCATGAACGCCTTCCACTGG


TGTAGCGTGTTCATGACTGAGGAGCTGGTCGATAAAGACAACAATTTCTACGCCGAGCTGGAA


GAAATCTACGACGAGATATACCCTGTGATTAGCCTCTACAATCTGGTCCGGAACTATGTGACCCA


GAAGCCCTATTCAACTAAGAAGATCAAGCTGAACTTTGGGATTCCTACCCTGGCCGACGGCTGG


TCCAAGAGCAAAGAGTATTCCAATAACGCAATCATCCTGATGAGAGACAACCTGTATTACCTCG


GTATCTTTAACGCTAAGAATAAGCCCGAGAAGAAGATCATCGAAGGAAATACATCCGAGAACA


AGGGCGACTACAAGAAGATGATCTATAACCTGCTGCCTGGCCCAAACAAGATGATCCCTAAGGT


GTTCCTGAGCAGCAAGACCGGAGTCGAGACTTACAAGCCAAGTGCCTACATACTGGAGGGCTA


TAAGCAGAACAAGCACCTGAAATCTAGCAAAGATTTCGACATCACTTTCTGTCGCGACCTGATC


GACTATTTCAAGAATTGTATTGCCATCCACCCAGAGTGGAAGAATTTCGGATTCGACTTCTCTGA


CACCTCCACATACGAGGACATCAGTGGCTTCTATAGAGAAGTGGAGCTGCAAGGTTACAAGATC


GACTGGACCTACATATCTGAGAAAGACATCGACCTGCTGCAAGAGAAAGGGCAGCTCTACCTC


TTCCAAATCTACAACAAGGACTTTAGTAAGAAGTCTACAGGTAATGACAATCTGCACACTATGT


ACCTGAAGAATCTCTTCTCTGAAGAGAACCTCAAAGACGTGGTGCTGAAACTGAACGGCGAA


GCAGAAATCTTCTTTCGCAAATCATCCATTAAGAATCCTATCATACACAAGAAGGGTAGTATCCT


GGTGAACAGGACATACGAAGCCGAGGAGAAAGATCAGTTCGGCAACATTCAGATTGTGCGCA


AGACTATTCCCGAGAATATCTACCAGGAGCTGTACAAATACTTCAACGATAAGTCTGATAAAGA


GCTGTCAGACGAAGCAGCCAAGCTGAAGAATGCTGTGGGACATCATGAAGCAGCTACTAACAT


CGTGAAAGACTATAGATACACATACGACAAGTATTTCCTGCACATGCCAATTACCATCAACTTCA


AAGCCAATAAGACTTCTTTCATTAACGACCGCATCCTCCAGTACATTGCAAAGGAGAAAGACCT


GCACGTGATCGGGATTGATCGCGGAGAACGGAACCTCATCTACGTTTCAGTCATCGACACATGC


GGTAACATCGTCGAACAGAAGAGCTTCAACATTGTTAATGGGTATGATTACCAGATCAAACTCA


AGCAACAGGAAGGCGCACGGCAGATTGCTCGCAAGGAGTGGAAGGAAATTGGCAAGATCAAG


GAAATCAAGGAAGGATACCTCAGCCTCGTCATTCATGAAATCAGCAAGATGGTGATCAAGTATA


ACGCAATCATCGCTATGGAGGACCTGAGTTATGGCTTCAAGAAAGGCAGATTCAAGGTGGAGC


GGCAAGTCTACCAGAAATTCGAAACAATGCTGATCAACAAGCTGAACTACCTGGTGTTCAAAG


ACATCAGCATAACCGAGAACGGAGGACTCCTGAAAGGCTACCAGCTCACATACATCCCAGAGA


AACTCAAGAATGTGGGCCACCAATGCGGCTGCATCTTCTACGTCCCTGCCGCTTACACCAGCAA


GATAGATCCCACTACAGGATTCGTGAACATATTCAAATTCAAGGATCTGACAGTGGACGCCAAG


AGGGAGTTCATCAAGAAGTTTGATAGTATTCGCTATGACAGCGATAAGAATCTGTTCTGTTTCAC


CTTTGACTACAACAACTTCATTACCCAGAATACCGTTATGTCCAAGTCTAGCTGGAGTGTGTATA


CCTACGGTGTTCGGATCAAGCGGAGGTTTGTCAATGGTAGATTCTCAAACGAAAGCGACACCAT


CGACATCACAAAGGACATGGAGAAGACACTGGAAATGACTGACATAAACTGGAGAGATGGAC


ACGACCTGCGGCAAGACATCATTGACTACGAGATCGTTCAGCACATCTTTGAAATCTTCAAGCT


GACTGTTCAGATGCGGAATAGTCTGAGCGAGCTGGAGGACCGGAATTACGACCGCCTGATCTC


ACCAGTCCTGAACGAGAATAACATCTTCTACGATTCTGCCAAAGCAGGAGATGCCCTGCCAAA


GGACGCTGATGCAAACGGTGCCTACTGCATCGCCCTCAAAGGTCTGTACGAAATCAAGCAGATT


ACCGAGAATTGGAAGGAGGACGGGAAGTTCAGCAGAGACAAGCTCAAGATCAGCAATAAGGA


CTGGTTCGATTTCATTCAGAACAAGCGCTACCTG





>SEQ ID NO: 14 LpCpf1 nucleotide sequence


ATGATCATGAACAACGTGACCGGCGACTTCAGCGAGTTCGTGGCCATCAGCAAGGTGCAGAAG


ACCCTGCGCAACGAGCTGCGCCCCACCCCCCTGACCATGAAGCACATCAAGCAGAAGGGCATC


ATCACCGAGGACGAGTACAAGACCCAGCAGAGCCTGGAGCTGAAGCGCATCGCCGACGGCTA


CTACCGCGACTACATCACCCACAAGCTGAACGACACCAACAACCTGGACTTCCGCAACCTGTT


CGAGGCCATCGAGGAGAAGTACAAGAAGAACGACAAGGACAACCGCGACAAGCTGGACCTG


GTGGAGAAGAGCAAGCGCGGCGAGATCGCCAAGCTGCTGAGCGCCGACGACAACTTCAAGAG


CATGTTCGAGGCCAAGCTGATCACCCAGCTGCTGCCCGTGTACGTGGAGCAGAACTACATCGG


CGAGGACAAGGAGAAGGCCCTGGAGACCATCGCCCTGTTCAAGGGCTTCACCACCTACTTCAC


CGACTACTTCAACATCCGCAAGAACATGTTCAAGGAGAACGGCGGCGCCAGCAGCATCTGCTA


CCGCATCGTGAACGTGAACGCCAGCATCTTCTACGACAACCTGAAGACCTTCATGTGCATCAAG


GAGAAGGCCGAGACCGAGATCGCCCTGATCGAGGAGGAGCTGACCGAGCTGCTGGACAGCTG


GCGCCTGGAGCACATCTTCAGCGAGGACTACTACAACGAGCTGCTGGCCCAGAAGGGCATCGA


CTACTACAACCAGATCTGCGGCGACGTGAACAAGCACATGAACCTGTACTGCCAGCAGAACAA


GCTGAAGGCCAACGTGTTCAAGATGACCAAGCTGCAGAAGCAGATCATGGGCATCAGCGAGA


AGGCCTTCGAGATCCCCCCCATGTACCAGAACGACGAGGAGGTGTACGCCGCCTTCAACGGCT


TCATCAGCCGCCTGGAGGAGGTGAAGCTGATCGACCGCCTGGGCAACGTGCTGCAGAACAGC


AACATCTACGACACCGCCAAGATCTACATCAACGCCCGCTGCTACACCAACGTGAGCAGCTAC


GTGTACGGCGGCTGGGGCGTGATCGAGAGCGCCATCGAGCGCTACTGGTACAACACCATCGCC


GGCAAGGGCCAGAGCAAGGCCAAGAAGATCGAGAAGGCCAAGAAGGACAACAAGTTCATGA


GCGTGAAGGAGCTGGACAGCATCGTGAGCGACTACGAGCCCGACTACTTCAACGCCAGCAACA


TGGACGACGACAACAGCGGCCGCGCCTTCAGCGGCCACGGCGTGCTGGGCTACTTCAACAAG


ATGAGCAAGCTGCTGGCCAACATGAGCCTGCACACCATCACCTACGACAGCGGCGACAGCCTG


ATCGAGAACAAGGAGACCGCCCTGAACATCAAGAAGGACCTGGACGACATCATGAGCATCTAC


CACTGGCTGCAGACCTTCATCATCGACGAGGTGGTGGAGAAGGACAACGCCTTCTACGCCGAG


CTGGAGGACATCTACTACGAGCTGGAGAACGTGGTGACCCTGTACGACCGCATCCGCAACTAC


GTGACCCGCAAGCCCTACAGCACCCAGAAGTTCaagcttAACTTCGCCAGCCCCACCCTGGCCAG


CGGCTGGAGCCGCAGCAAGGAGTTCGACAACAACGCCATCATCCTGCTGCGCAACAACAAGT


ACTACATCGCCATCTTCAACGTGAACAACAAGCCCGACAAGCAGATCATCAAGGGCAGCGAGG


AGCAGCGCCTGAGCACCGACTACAAGAAGATGGTGTACAACCTGCTGCCCGGCCCCAACAAGA


TGCTGCCCTGGGTGTTCATCAAGAGCAACACCGGCAAGCGCGACTACAACCCCAGCAGCTACA


TCCTGGAGGGCTACGAGAAGAACCGCCACATCAAGAGCAGCGGCAACTTCGACATCAACTACT


GCCACGACCTGATCGACTACTACAAGGCCTGCATCAACAAGCACCCCGAGTGGAAGAACTACG


GCTTCAAGTTCAAGGAGACCACCCAGTACAACGACATCGGCCAGTTCTACAAGGACGTGGAGA


AGCAGGGCTACAGCATCAGCTGGGCCTACATCAGCGAGGCCGACATCAACCGCCTGGACGAGG


AGGGCAAGATCTACCTGTTCGAGATCTACAACAAGGACCTGAGCAGCCACAGCACCGGCAAG


GACAACCTGCACACCATGTACCTGAAGAACATCTTCAGCGAGGACAACCTGAAGAACATCTGC


ATCGAGCTGAACGGCAACGCCGAGCTGTTCTACCGCAAGAGCAGCATGAAGCGCAACATCACC


CACAAGAAGGACACCGTGCTGGTGAACAAGACCTACATCAACGAGGCCGGCGTGCGCGTGAG


CCTGACCGACGAGGACTACATCAAGGTGTACAACTACTACAACAACGACTACGTGATCGACGT


GGAGAAGGACAAGAAGCTGGTGGAGATCCTGGAGCGCATCGGCCACCGCAAGAACCCCATCG


ACATCATCAAGGACAAGCGCTACACCGAGGACAAGTACTTCCTGCACTTCCCCATCACCATCAA


CTACGGCGTGGACGACGAGAACATCAACGCCAAGATGATCGAGTACATCGCCAAGCACAACAA


CATGAACGTGATCGGCATCGACCGCGGCGAGCGCAACCTGATCTACATCAGCGTGATCAACAA


CAAGGGCAACATCATCGAGCAGAAGAGCTTCAACCTGGTGAACAACTACGACTACAAGAACA


AGCTGAAGAACATGGAGAAGACCCGCGACAACGCCCGCAAGAACTGGCAGGAGATCGGCAA


GATCAAGGACGTGAAGAACGGCTACCTGAGCGGCGTGATCAGCAAGATCGCCCGCATGGTGGT


GGACTACAACGCCATCATCGTGATGGAGGACCTGAACCGCGGCTTCAAGCGCGGCCGCTTCAA


GGTGGAGCGCCAGGTGTACCAGAAGTTCGAGAACATGCTGATCAGCAAGCTGAACTACCTGGT


GTTCAAGGAGAAGAAGGCCGACGAGAACGGCGGCATCCTGAAGGGCTACCAGCTGACCTACC


TGCCCAAGAGCGCCCTGCAGATCGGCAAGCAGTGCGGCTGCATCTTCTACGTGCCCGCCGCCT


ACACCAGCAAGATCGACCCCGCCACCGGCTTCATCAACATCTTCGACTTCAAGAAGTACAGCG


GCAGCGCCATCAACGCCAAGGTGAAGGACAAGAAGGAGTTCCTGATGAGCATGAACAGCATC


CGCTACGTGAACGAGGGCAGCGCCGAGTACGAGAAGATCGGCCACCGCCAGCTGTTCGCCTTC


AGCTTCGACTACAACAACTTCAAGACCTACAACGTGAGCATCCCCGTGAACGAGTGGACCACC


TACACCTACGGCGAGCGCATCAAGAAGCTGTACAAGGACGGCCGCTGGAGCGGCAGCGAGGT


GCTGAACCTGACCGAGGACCTGATCGAGCTGATGGAGCAGTACGGCATCGAGTACAAGGACGG


CCACGACATCCGCGAGGACATCAGCCACATGGACGAGATGCGCAACGCCGACTTCATCTGCAA


CCTGTTCGAGAAGTTCAAGTACACCGTGCAGCTGCGCAACAGCAAGAGCGAGGCCGAGGGCG


ACGACTACGACCGCCTGGTGAGCCCCGTGCTGAACAGCCACAACGGCTTCTTCGACAGCAGCG


ACTACAAGGAGAACGAGAAGAGCGACGACATCATCGACGACAAGCAGATCATGCCCAAGGAC


GCCGACGCCAACGGCGCCTACTGCATCGCCCTGAAGGGCCTGTACGAGATCAACAAGATCAAG


GAGAACTGGAGCGACGACAAGAAGCTGAAGGAGAGCGAGCTGTACATCGGCGTGACCGAGTG


GCTGGACTACATCCAGAACCGCCGCTTCGAG





>SEQ ID NO: 15 SaCpf1 nucleotide sequence


ATGAACGACATTGAAGGACTGAAGGAGGAATTTCTGAAGATTTCCCTGGAGAACTTTGAGGGT


ATCTACATCAGCAATAAGAAGCTGAATGAGATTTCTAACCGGAAATTCGGCGACTACAACAGCA


TTAACATGATGATCAAGCAGAGCATGAATGAGAAGGGTATTCTGTCCAAGAAGGAGATCAACG


AACTCATACCCGATCTGGAGAACATCAACAAGCCTAAGGTCAAGTCTTTCAATCTGAGTTTCAT


CTTCGAGAACCTCACCAAAGAGCATAAGGAACTGATCATCGACTACATCAGGGAGAACATCTG


CAACGTGATTGAGAATGTCAAGATTACAATAGAGAAATACAGGAACATTGATAACAAGATCGAG


TTTAAGAACAATGCTGAGAAGGTGTCCAAGATTAAGGAAATGCTGGAGAGCATCAACGAGCTG


TGTAAACTGATTAAGGAGTTCAACACAGACGAGATTGAGAAGAACAATGAGTTCTATAACATTC


TCAATAAGAATTTCGAAATCTTCGAATCCAGTTACAAGGTTCTGAATAAGGTCCGGAACTTTGT


GACCAAGAAAGAAGTTATTGAGAATAAGATGAAGCTGAATTTCTCAAACTATCAGCTCGGGAA


CGGCTGGCACAAGAACAAAGAGAAGGACTGTAGCATTATCCTGTTTCGCAAGAGAAACAATGA


GCGCTGGATATACTACCTCGGGATTCTGAAGCATGGTACAAAGATCAAAGAGAACGACTATCTC


TCATCAGTGGACACAGGGTTCTACAAGATGGACTATTACGCACAGAATAGTCTGTCAAAGATGA


TTCCAAAGTGCAGCATTACAGTTAAGAACGTCAAGAACGCTCCAGAGGACGAGTCTGTCATTC


TGAACGATTCCAAGAAGTTCAATGAACCTCTGGAGATCACACCCGAGATAAGAAAGCTGTACG


GCAATAACGAGCACATCAAGGGCGACAAATTCAAGAAAGAGTCCCTGGTCAAGTGGATAGACT


TCTGTAAGGAGTTTCTGCTGAAATACAAGAGTTTCGAGAAGGCCAAGAAGGAAATCCTCAAAC


TCAAAGAATCAAACCTCTATGAGAACCTGGAGGAGTTCTACAGCGATGCCGAGGAGAAAGCCT


ACTTCCTGGAGTTCATCAACATTGATGAGGACAAGATCAAGAAGCTGGTCAAGGAGAAGAATC


TGTACCTCTTCCAGATATACAACAAAGACTTTAGCGCTTATTCAACCGGCAATAAGAACCTCCAT


ACCATGTATTTCGAGGAGCTGTTCACTGATGAGAATCTCAAGAAGCCTGTGTTCAAGCTCAACG


GAAACACAGAAGTGTTCTACAGAATTGCCAGTAGTAAGCCAAAGATCGTGCACAACAAGGGA


GAGAAACTGGTCAACAAGACATACCTCGACGATGGCATCATTAAGACAATTCCAGATTCTGTGT


ACGAAGAGATTTCAGAGAAAGTCAAGAACAACGAGGACTACTCCAAGCTGCTCGAAGAGAAT


AACATCAAGAATCTGGAAATCAAGGTGGCAACTCATGAAATTGTGAAAGATAAACGCTACTTTG


AGAACAAATTCCTGTTCTATCTGCCCATCACACTGAACAAGAAGGTGTCTAACAAGAATACCAA


TAAGAACATCAACAAGAATGTCATCGACGAGATTAAGGATTGTAACGAGTATAACGTCATTGGG


ATTGATAGAGGTGAGCGGAACCTGATTAGCCTGTGCATCATCAATCAGAACGGCGAAATCATAC


TCCAGAAGGAGATGAACATCATTCAATCTAGCGACAAGTATAACGTGGACTATAACGAGAAACT


GGAGATCAAGTCTAAGGAGAGGGACAACGCTAAGAAGAACTGGAGCGAAATCGGGAAGATAA


AGGACCTGAAGAGCGGATACCTCTCCGCTGTGGTGCACGAGATTGTCAAGCTGGCAATCGAAT


ACAACGCTGTGATCATTCTGGAGGACCTCAACAACGGGTTTAAGAACTCAAGGAAGAAGGTTG


ATAAACAGATATACCAGAAATTCGAGAGGGCTCTGATTGAGAAGCTGCAATTTCTCATCTTCAA


GAACTATGACAAGAATGAGAAAGGAGGACTCAGGAATGCTTTCCAGCTGACTCCCGAACTGAA


GAACATCACCAAGGTGGCATCCCAGCAGGGCATCATAATCTACACAAATCCAGCCTATACCAGC


AAGATCGACCCTACCACAGGTTATGCAAACATAATCAAGAAGAGCAACAATAACGAGGAGTCT


ATCGTGAAGGCAATCGACAAGATTTCCTATGACAAAGAGAAAGACATGTTCTACTTCGACATTA


ACCTGTCAAATAGCTCCTTTAACCTGACCGTTAAGAATGTGCTCAAGAAGGAATGGCGCATCTA


CACCAATGGCGAGAGAATCATCTATAAGGATCGCAAGTACATTACACTGAATATCACACAAGAG


ATGAAAGACATCCTGTCAAAGTGCGGCATTGATTATCTGAACATCGACAACCTGAAACAGGACA


TTCTCAAGAACAAACTGCATAAGAAGGTCTACTATATCTTCGAGCTGGCTAACAAGATGCGCAA


CGAGAATAAGGACGTGGATTACATAATCAGCCCAGTGCTGAATAAGGATGGAAAGTTCTTCATG


ACCCAAGAGATCAACGAGCTGACACCAAAGGATGCCGACCTGAACGGAGCCTATAACATAGCT


CTGAAGGGCAAGCTGATGATTGACAACCTGAATAAGAAGGAGAAGTTTGTCTTTCTCAGCAAT


GAAGATTGGCTCAATTTCATCCAGGGTCGG





>SEQ ID NO: 16 HkCpf1 nucleotide sequence


ATGTTCGAGAAGCTGAGCAACATCGTGAGCATCAGCAAGACCATCCGCTTCAAGCTGATCCCC


GTGGGCAAGACCCTGGAGAACATCGAGAAGCTGGGCAAGCTGGAGAAGGACTTCGAGCGCAG


CGACTTCTACCCCATCCTGAAGAACATCAGCGACGACTACTACCGCCAGTACATCAAGGAGAA


GCTGAGCGACCTGAACCTGGACTGGCAGAAGCTGTACGACGCCCACGAGCTGCTGGACAGCA


GCAAGAAGGAGAGCCAGAAGAACCTGGAGATGATCCAGGCCCAGTACCGCAAGGTGCTGTTC


AACATCCTGAGCGGCGAGCTGGACAAGAGCGGCGAGAAGAACAGCAAGGACCTGATCAAGA


ACAACAAGGCCCTGTACGGCAAGCTGTTCAAGAAGCAGTTCATCCTGGAGGTGCTGCCCGACT


TCGTGAACAACAACGACAGCTACAGCGAGGAGGACCTGGAGGGCCTGAACCTGTACAGCAAG


TTCACCACCCGCCTGAAGAACTTCTGGGAGACCCGCAAGAACGTGTTCACCGACAAGGACATC


GTGACCGCCATCCCCTTCCGCGCCGTGAACGAGAACTTCGGCTTCTACTACGACAACATCAAGA


TCTTCAACAAGAACATCGAGTACCTGGAGAACAAGATCCCCAACCTGGAGAACGAGCTGAAG


GAGGCCGACATCCTGGACGACAACCGCAGCGTGAAGGACTACTTCACCCCCAACGGCTTCAAC


TACGTGATCACCCAGGACGGCATCGACGTGTACCAGGCCATCCGCGGCGGCTTCACCAAGGAG


AACGGCGAGAAGGTGCAGGGCATCAACGAGATCCTGAACCTGACCCAGCAGCAGCTGCGCCG


CAAGCCCGAGACCAAGAACGTGAAGCTGGGCGTGCTGACCAAGCTGCGCAAGCAGATCCTGG


AGTACAGCGAGAGCACCAGCTTCCTGATCGACCAGATCGAGGACGACAACGACCTGGTGGAC


CGCATCAACAAGTTCAACGTGAGCTTCTTCGAGAGCACCGAGGTGAGCCCCAGCCTGTTCGAG


CAGATCGAGCGCCTGTACAACGCCCTGAAGAGCATCAAGAAGGAGGAGGTGTACATCGACGCC


CGCAACACCCAGAAGTTCAGCCAGATGCTGTTCGGCCAGTGGGACGTGATCCGCCGCGGCTAC


ACCGTGAAGATCACCGAGGGCAGCAAGGAGGAGAAGAAGAAGTACAAGGAGTACCTGGAGC


TGGACGAGACCAGCAAGGCCAAGCGCTACCTGAACATCCGCGAGATCGAGGAGCTGGTGAAC


CTGGTGGAGGGCTTCGAGGAGGTGGACGTGTTCAGCGTGCTGCTGGAGAAGTTCAAGATGAA


CAACATCGAGCGCAGCGAGTTCGAGGCCCCCATCTACGGCAGCCCCATCAAGCTGGAGGCCAT


CAAGGAGTACCTGGAGAAGCACCTGGAGGAGTACCACAAGTGGAAGCTGCTGCTGATCGGCA


ACGACGACCTGGACACCGACGAGACCTTCTACCCCCTGCTGAACGAGGTGATCAGCGACTACT


ACATCATCCCCCTGTACAACCTGACCCGCAACTACCTGACCCGCAAGCACAGCGACAAGGACA


AGATCAAGGTGAACTTCGACTTCCCCACCCTGGCCGACGGCTGGAGCGAGAGCAAGATCAGC


GACAACCGCAGCATCATCCTGCGCAAGGGCGGCTACTACTACCTGGGCATCCTGATCGACAAC


AAGCTGCTGATCAACAAGAAGAACAAGAGCAAGAAGATCTACGAGATCCTGATCTACAACCAG


ATCCCCGAGTTCAGCAAGAGCATCCCCAACTACCCCTTCACCAAGAAGGTGAAGGAGCACTTC


AAGAACAACGTGAGCGACTTCCAGCTGATCGACGGCTACGTGAGCCCCCTGATCATCACCAAG


GAGATCTACGACATCAAGAAGGAGAAGAAGTACAAGAAGGACTTCTACAAGGACAACAACAC


CAACAAGAACTACCTGTACACCATCTACAAGTGGATCGAGTTCTGCAAGCAGTTCCTGTACAAG


TACAAgggcccCAACAAGGAGAGCTACAAGGAGATGTACGACTTCAGCACCCTGAAGGACACCA


GCCTGTACGTGAACCTGAACGACTTCTACGCCGACGTGAACAGCTGCGCCTACCGCGTGCTGT


TCAACAAGATCGACGAGAACACCATCGACAACGCCGTGGAGGACGGCAAGCTGCTGCTGTTC


CAGATCTACAACAAGGACTTCAGCCCCGAGAGCAAGGGCAAGAAGAACCTGCACACCCTGTA


CTGGCTGAGCATGTTCAGCGAGGAGAACCTGCGCACCCGCAAGCTGAAGCTGAACGGCCAGG


CCGAGATCTTCTACCGCAAGAAGCTGGAGAAGAAGCCCATCATCCACAAGGAGGGCAGCATCC


TGCTGAACAAGATCGACAAGGAGGGCAACACCATCCCCGAGAACATCTACCACGAGTGCTACC


GCTACCTGAACAAGAAGATCGGCCGCGAGGACCTGAGCGACGAGGCCATCGCCCTGTTCAAC


AAGGACGTGCTGAAGTACAAGGAGGCCCGCTTCGACATCATCAAGGACCGCCGCTACAGCGA


GAGCCAGTTCTTCTTCCACGTGCCCATCACCTTCAACTGGGACATCAAGACCAACAAGAACGT


GAACCAGATCGTGCAGGGCATGATCAAGGACGGCGAGATCAAGCACATCATCGGCATCGACCG


CGGCGAGCGCCACCTGCTGTACTACAGCGTGATCGACCTGGAGGGCAACATCGTGGAGCAGGG


CAGCCTGAACACCCTGGAGCAGAACCGCTTCGACAACAGCACCGTGAAGGTGGACTACCAGA


ACAAGCTGCGCACCCGCGAGGAGGACCGCGACCGCGCCCGCAAGAACTGGACCAACATCAAC


AAGATCAAGGAGCTGAAGGACGGCTACCTGAGCCACGTGGTGCACAAGCTGAGCCGCCTGAT


CATCAAGTACGAGGCCATCGTGATCATGGAGAACCTGAACCAGGGCTTCAAGCGCGGCCGCTT


CAAGGTGGAGCGCCAGGTGTACCAGAAGTTCGAGCTGGCCCTGATGAACAAGCTGAGCGCCC


TGAGCTTCAAGGAGAAGTACGACGAGCGCAAGAACCTGGAGCCCAGCGGCATCCTGAACCCC


ATCCAGGCCTGCTACCCCGTGGACGCCTACCAGGAGCTGCAGGGCCAGAACGGCATCGTGTTC


TACCTGCCCGCCGCCTACACCAGCGTGATCGACCCCGTGACCGGCTTCACCAACCTGTTCCGCC


TGAAGAGCATCAACAGCAGCAAGTACGAGGAGTTCATCAAGAAGTTCAAGAACATCTACTTCG


ACAACGAGGAGGAGGACTTCAAGTTCATCTTCAACTACAAGGACTTCGCCAAGGCCAACCTGG


TGATCCTGAACAACATCAAGAGCAAGGACTGGAAGATCAGCACCCGCGGCGAGCGCATCAGCT


ACAACAGCAAGAAGAAGGAGTACTTCTACGTGCAGCCCACCGAGTTCCTGATCAACAAGCTGA


AGGAGCTGAACATCGACTACGAGAACATCGACATCATCCCCCTGATCGACAACCTGGAGGAGA


AGGCCAAGCGCAAGATCCTGAAGGCCCTGTTCGACACCTTCAAGTACAGCGTGCAGCTGCGCA


ACTACGACTTCGAGAACGACTACATCATCAGCCCCACCGCCGACGACAACGGCAACTACTACA


ACAGCAACGAGATCGACATCGACAAGACCAACCTGCCCAACAACGGCGACGCCAACGGCGCC


TTCAACATCGCCCGCAAGGGCCTGCTGCTGAAGGACCGCATCGTGAACAGCAACGAGAGCAA


GGTGGACCTGAAGATCAAGAACGAGGACTGGATCAACTTCATCATCAGC





>SEQ ID NO: 17 AbCpf1 nucleotide sequence


ATGTTCAGCCTGGACTACTTCAGCCTGACCCTGAGCCAGCGCTACATCGACATCTACAACACCA


TGATCGGCGGCAACACCCTGGCCGACGGCACCAAGGTGCAGGGCATCAACGAGAACATCAAC


ATCTACCGCCAGAAGAACAACATCGACCGCAAGAACCTGCCCACCCTGAAGCCCCTGCACAAG


CAGCTGCTGAGCGACCGCGAGACCCTGAGCTGGATCCCCGAGGCCTTCAAGACCAAGGAGGA


GGTGGTGGGCGCCATCGAGGACTTCTACAAGAACAACATCATCAGCTTCAAGTGCTGCGACAA


CATCGTGGACATCACCAAGCAGTTCATCGACATCTTCAGCCTGAACGAGGACTACGAGCTGAA


CAAGATCTTCATCAAGAACGACATCAGCATCACCAGCATCAGCCAGGACATCTTCAAGGACTAC


CGCATCATCAAGGAGGCCCTGTGGCAGAAGCACATCAACGAGAACCCCAAGGCCGCCAAGAG


CAAGGACCTGACCGGCGACAAGGAGAAGTACTTCAGCCGCAAGAACAGCTTCTTCAGCTTCG


AGGAGATCATCAGCAGCCTGAAGCTGATGGGCCGCAAGATCGACCTGTTCAGCTACTTCAAGG


ACAACGTGGAGTACCGCGCCCACAGCATCGAGACCACCTTCATCAAGTGGCAGAAGAACAAG


AACGACAAGAAGACCACCAAGGAGCTGCTGGACAACATCCTGAACCTGCAGCGCGTGCTGAA


GCCCCTGTACCTGAAGGCCGAGGTGGAGAAGGACATCCTGTTCTACAGCATCTTCGACATCTAC


TTCGAGAGCCTGAACGAGATCGTGAAGCTGTACAACAAGGTGCGCGACTTCGAGAGCAAGAA


GCCCTACAGCCTGGAGAAGTTCAAGCTGAACTTCCAGAACAGCACCCTGCTGAGCGGCTGGG


ACGTGAACAAGGAGCCCGACAACACCAGCATCCTGCTGAAGAAGGACGGCCTGTACTACCTG


GGCATCATGGACAAGAAGCACAACCGCGTGTTCAAGAACCTGGAGAGCAGCAAGGGCGGCTA


CGAGAAGATCGAGTACAAGCTGCTGAGCGGCCCCAACAAGATGCTGCCCAAGGTGTTCTTCAG


CAACAAGAGCATCGGCTACTACAACCCCAGCCCCGCCCTGCTGGAGAAGTACAAGAGCGGCGT


GCACAAGAAGGGCGAGAGCTTCGACCTGAACTTCTGCCACGAGCTGATCGACTTCTTCAAGGC


CAGCATCGACAAGCACGAGGACTGGAAGAACTTCAACTTCAAGTTCAGCGACACCAGCGAGT


ACGCCGACATCAGCGGCTTCTACCGCGAGGTGGAGCAGCAGGGCTACAAGATCACCTTCAAGA


ACATCGACGAGGAGTTCATCAACACCCTGATCAACGAGGGCAAGCTGTACCTGTTCCAGATCTA


CAACAAGGACTTCAGCACCTTCAGCAAGGGCACCAAGAACCTGCACACCCTGTACTGGGAGA


TGATCTTCAACGAGGAGAACCTGAAGAACGTGGTGTACAAGCTGAACGGCGAGGCCGAGATC


TTCTACCGCAAGAAGAGCATCGAGTACAGCGAGGACAAGATGAAGTACGGCCACCACTACGAG


GAGCTGAAGGACAAGTTCAACTACCCCATCATCAAGGACAAGCGCTTCAccatggACAAGTTCCA


GTTCCACGTGCCCATCACCATGAACTTCAAGGCCACCGGCCGCAGCTACATCAACGAGGAGGT


GAACGACTTCCTGCGCCAGAACAGCAAGGACGTGAAGATCATCGGCATCAACCGCGGCGAGC


GCCACCTGATCTACCTGACCATGATCAACGCCAAGGGCGAGATCATCCAGCAGTACAGCCTGA


ACGAGATCGTGAACAGCTACAACAACAAGAACTTCACCGTGAACTACAACGAGAAGCTGAGC


AAGAAGGAGGGCGAGCGCGCCATCGCCCGCGAGAACTGGGGCGTGGTGGAGAACATCAAGG


AGCTGAAGGAGGGCTACCTGAGCCACGCCATCCACACCATCAGCAACCTGATCGTGGAGAACA


ACGCCATCGTGGTGCTGGAGGACCTGAACTTCGAGTTCAAGCGCGAGCGCCTGAAGGTGGAG


AAGAGCATCTACCAGAAGTTCGAGAAGATGCTGATCGACAAGCTGAACTACCTGGTGGACAAG


AAGAAGGACATCAACGAGAACGGCGGCCTGCTGAAGGCCCTGCAGCTGACCAACAAGTTCGA


GAGCTTCGAGAAGATCGGCAAGCAGAACGGCTTCCTGTTCTTCGTGAACGCCTGGAACATCAC


CAAGATCTGCCCCGTGACCGGCTTCGTGAGCCTGTTCGACACCCGCTACCAGAGCGTGGACAA


GGCCCGCGAGTTCTTCAGCAAGTTCGACAGCATCAAGTACAACGAGGAGAAGGAGCACTACG


AGTTCGTGTTCGACTACAGCAACTTCACCGACAAGGCCAAGGACACCAAGACCAAGTGGACC


GTGTGCAGCTACGGCACCCGCATCAAGACCTTCCGCAACAGCGAGAAGAACAACAACTGGGA


CAACAAGACCGTGAGCCCCACCGAGGACCTGAGCAAGCTGCTGAAGAGCTGCGACCGCGACA


TCAAGGAGTTCATCATCAGCCAGGACAAGAAGGAGTTCTTCGTGGAGCTGCTGGAGATCTTCA


GCCTGATCGTGCAGATGAAGAACAGCATCATCAACAGCGAGATCGACTACATCATCAGCCCCGT


GGCCAACGAGAACGGCGAGTTCTTCGACAGCCGCTTCGCCAACAGCAGCCTGCCCAAGAACG


CCGACGCCAACGCCGCCTACAACACCGCCCGCAAGGGCCTGATGCTGCTGGAGAAGATCCGCG


ACAGCGAGATCGGCAAGAAGATCGACATGAAGATCACCAACACCGAGTGGCTGAACTTCGTGC


AGGAGCGC





>SEQ ID NO: 18 BoCpf1 nucleotide sequence


AGGAAATTCAATGAGTTCGTGGGTCTGTATCCTATTAGTAAGACCCTCAGGTTCGAGCTGAAAC


CAATCGGCAAGACACTGGAGCATATCCAGAGAAACAAGCTCCTGGAGCATGATGCCGTTCGCG


CTGACGACTATGTCAAAGTGAAGAAGATCATTGACAAATACCATAAGTGTCTGATAGATGAGGC


CCTGTCTGGATTCACCTTCGATACAGAAGCCGATGGGAGAAGCAATAACAGCCTGTCTGAGTAC


TATCTGTACTACAATCTCAAGAAGAGAAATGAGCAGGAACAGAAGACTTTCAAGACAATCCAG


AACAATCTGCGGAAGCAGATTGTCAACAAGCTGACCCAGAGTGAGAAGTATAAGAGAATTGAT


AAGAAAGAACTCATCACCACTGATCTGCCAGACTTCCTGACTAATGAAAGCGAGAAAGAACTG


GTGGAGAAGTTCAAGAACTTTACTACCTACTTTACCGAATTTCACAAGAACCGCAAGAATATGT


ACTCCAAGGAAGAGAAGTCCACCGCAATCGCTTTCCGCCTGATTAACGAGAACCTGCCAAAGT


TTGTCGATAACATCGCTGCTTTCGAGAAAGTTGTGTCCTCACCTCTCGCAGAGAAGATCAATGC


CCTGTACGAGGACTTTAAGGAGTATCTGAATGTGGAAGAAATCTCACGGGTGTTTAGACTCGAC


TATTACGATGAACTGCTGACACAGAAACAGATTGATCTGTACAACGCTATCGTCGGTGGTCGGA


CAGAGGAGGACAACAAGATCCAGATAAAGGGACTGAACCAGTATATCAACGAATACAATCAGC


AGCAGACAGATCGGTCTAATCGGCTGCCAAAGCTGAAACCTCTCTATAAGCAAATTCTCTCCGA


CAGAGAGAGCGTGTCATGGCTGCCTCCCAAGTTTGATAGCGATAAGAATCTGCTGATTAAGATC


AAAGAATGCTACGACGCCCTGTCCGAGAAGGAGAAAGTGTTTGACAAGCTGGAAAGTATTCTC


AAGAGCCTGTCAACCTATGACCTGTCTAAGATATACATTTCTAACGACTCTCAGCTGTCTTACAT


TAGCCAGAAGATGTTTGGACGGTGGGACATCATATCTAAGGCCATCAGGGAGGATTGTGCTAAG


AGGAATCCTCAGAAATCTCGGGAATCCCTGGAGAAGTTCGCCGAGAGGATAGATAAGAAACTC


AAGACCATCGACTCCATCAGCATCGGCGATGTGGATGAGTGCCTGGCCCAGCTGGGTGAAACC


TACGTTAAGCGGGTGGAGGATTACTTTGTGGCAATGGGCGAATCCGAGATCGACGATGAGCAG


ACAGATACCACCTCCTTCAAGAAGAACATAGAGGGAGCATACGAGTCCGTCAAGGAGCTGCTG


AACAACGCTGATAACATTACAGACAATAACCTGATGCAGGACAAGGGCAATGTGGAGAAGATC


AAGACCCTGCTGGATGCAATCAAGGACCTCCAGCGGTTCATTAAGCCACTCCTGGGTAAAGGT


GACGAAGCAGACAAGGACGGCGTGTTCTACGGTGAGTTTACATCCCTGTGGACCAAACTCGAT


CAGGTTACTCCTCTCTATAACATGGTTCGGAATTACCTCACTTCAAAGCCTTATAGTACAAAGAA


GATTAAGCTGAACTTTGAGAACAGCACTCTCATGGATGGATGGGATCTGAATAAGGAGCCAGAT


AACACTACCGTGATATTCTGCAAAGATGGGCTGTATTACCTGGGCATTATGGGTAAGAAGTACAA


TAGAGTGTTTGTCGATAGAGAGGACCTGCCTCACGACGGCGAGTGCTACGACAAGATGGAGTA


CAAACTGCTGCCAGGTGCCAATAAGATGCTCCCTAAAGTGTTCTTCTCCGAAACTGGTATTCAA


CGGTTCCTCCCATCCGAGGAACTCCTGGGCAAGTACGAAAGAGGCACACATAAGAAAGGAGCT


GGGTTTGACCTGGGAGACTGTAGAGCACTGATTGATTTCTTTAAGAAGAGCATTGAAAGGCAC


GATGATTGGAAGAAGTTTGACTTCAAGTTCAGCGACACAAGCACATACCAGGACATAAGTGAG


TTCTATAGAGAAGTGGAGCAGCAGGGCTATAAGATGTCCTTTAGAAAGGTTTCTGTGGACTATAT


CAAGTCTCTGGTGGAAGAAGGTAAGCTGTATCTGTTCCAGATATACAACAAAGACTTCTCCGCA


CATTCCAAAGGGACACCTAACATGCACACTCTCTATTGGAAGATGCTGTTCGATGAGGAGAACC


TGAAGGACGTGGTGTATAAGCTGAATGGAGAAGCTGAGGTGTTCTTCCGGAAATCTAGCATCA


CAGTGCAAAGCCCAACACACCCTGCTAATTCACCTATCAAGAACAAGAACAAGGATAATCAGA


AGAAGGAATCAAAGTTTGAGTACGATCTCATCAAGGACCGCAGGTATACCGTGGACAAGTTCC


TCTTTCACGTGCCTATAACCATGAATTTCAAGTCCGTCGGTGGCTCTAACATCAATCAGCTCGTG


AAGCGGCACATTCGGTCCGCAACCGACCTCCACATCATCGGCATAGATAGAGGAGAGCGGCAT


CTGCTGTACCTGACCGTTATCGACAGCAGAGGTAACATCAAAGAACAGTTCAGTCTGAACGAG


ATAGTGAACGAGTATAACGGGAACACCTATCGGACCGATTACCACGAGCTGCTCGATACCAGAG


AAGGCGAGAGAACAGAAGCTAGACGGAACTGGCAGACTATACAGAACATACGCGAGCTGAAA


GAGGGATACCTCTCCCAGGTGATTCACAAGATCAGCGAGCTGGCTATCAAATACAACGCCGTGA


TCGTGCTGGAGGATCTCAATTTCGGCTTTATGAGGTCACGCCAGAAAGTGGAGAAGCAGGTGT


ATCAGAAATTCGAGAAGATGCTGATCGACAAGCTGAACTACCTGGTCGATAAGAAGAAACCTG


TCGCTGAAACCGGAGGGCTGCTGAGAGCCTACCAGCTGACCGGAGAATTTGAGTCCTTTAAGA


CCCTGGGAAAGCAGAGCGGCATTCTGTTCTACGTTCCCGCTTGGAACACCAGTAAGATTGATCC


TGTGACTGGGTTTGTCAATCTCTTCGATACCCACTATGAGAACATTGAGAAGGCTAAGGTGTTC


TTTGACAAATTCAAGAGTATCAGGTACAATTCCGACAAGGATTGGTTCGAATTTGTCGTGGACG


ACTATACAAGGTTCTCACCTAAGGCAGAGGGCACCAGGAGGGACTGGACTATCTGCACCCAGG


GAAAGCGCATTCAGATATGTCGGAACCACCAGCGCAATAACGAGTGGGAGGGTCAAGAGATTG


ACCTGACCAAAGCATTCAAGGAGCACTTTGAAGCCTATGGCGTTGACATCTCAAAGGACCTGA


GGGAGCAGATCAATACTCAGAACAAGAAAGAGTTCTTCGAAGAACTGCTGCGCCTGCTGCGGC


TCACCCTGCAAATGAGGAACTCCATGCCAAGTTCTGACATCGACTACCTGATCAGCCCAGTCGC


CAACGACACCGGATGCTTCTTCGATTCAAGAAAGCAGGCCGAGCTGAAAGAGAATGCAGTTCT


CCCTATGAACGCTGATGCTAATGGTGCATACAACATCGCTAGAAAGGGACTGCTGGCAATCCGC


AAGATGAAACAAGAAGAGAACGACAGTGCTAAGATCAGCCTCGCTATATCCAACAAGGAGTGG


CTCAAGTTTGCTCAGACTAAGCCATATCTGGAGGAC





>SEQ ID NO: 19 OsCpf1 nucleotide sequence


ATGGAGACCGAGATCCTGAAGTACGACTTCTTCGAGCGCGAGGGCAAGTACATGTACTACGAC


GGCCTGACCAAGCAGTACGCCCTGAGCAAGACCATCCGCAACGAGCTGGTGCCCATCGGCAA


GACCCTGGACAACATCAAGAAGAACCGCATCCTGGAGGCCGACATCAAGCGCAAGAGCGACT


ACGAGCACGTGAAGAAGCTGATGGACATGTACCACAAGAAGATCATCAACGAGGCCCTGGAC


AACTTCAAGCTGAGCGTGCTGGAGGACGCCGCCGACATCTACTTCAACAAGCAGAACGACGA


GCGCGACATCGACGCCTTCCTGAAGATCCAGGACAAGCTGCGCAAGGAGATCGTGGAGCAGCT


GAAGGGCCACACCGACTACAGCAAGGTGGGCAACAAGGACTTCCTGGGCCTGCTGAAGGCCG


CCAGCACCGAGGAGGACCGCATCCTGATCGAGAGCTTCGACAACTTCTACACCTACTTCACCA


GCTACAACAAGGTGCGCAGCAACCTGTACAGCGCCGAGGACAAGAGCAGCACCGTGGCCTAC


CGCCTGATCAACGAGAACCTGCCCAAGTTCTTCGACAACATCAAGGCCTACCGCACCGTGCGC


AACGCCGGCGTGATCAGCGGCGACATGAGCATCGTGGAGCAGGACGAGCTGTTCGAGGTGGA


CACCTTCAACCACACCCTGACCCAGTACGGCATCGACACCTACAACCACATGATCGGCCAGCT


GAACAGCGCCATCAACCTGTACAACCAGAAGATGCACGGCGCCGGCAGCTTCAAGAAGCTGC


CCAAGATGAAGGAGCTGTACAAGCAGCTGCTGACCGAGCGCGAGGAGGAGTTCATCGAGGAG


TACACCGACGACGAGGTGCTGATCACCAGCGTGCACAACTACGTGAGCTACCTGATCGACTAC


CTGAACAGCGACAAGGTGGAGAGCTTCTTCGACACCCTGCGCAAGAGCGACGGCAAGGAGGT


GTTCATCAAGAACGACGTGAGCAAGACCACCATGAGCAACATCCTGTTCGACAACTGGAGCAC


CATCGACGACCTGATCAACCACGAGTACGACAGCGCCCCCGAGAACGTGAAGAAGACCAAGG


ACGACAagtactTCGAGAAGCGCCAGAAGGACCTGAAGAAGAACAAGAGCTACAGCCTGAGCAA


GATCGCCGCCCTGTGCCGCGACACCACCATCCTGGAGAAGTACATCCGCCGCCTGGTGGACGA


CATCGAGAAGATCTACACCAGCAACAACGTGTTCAGCGACATCGTGCTGAGCAAGCACGACCG


CAGCAAGAAGCTGAGCAAGAACACCAACGCCGTGCAGGCCATCAAGAACATGCTGGACAGCA


TCAAGGACTTCGAGCACGACGTGATGCTGATCAACGGCAGCGGCCAGGAGATCAAGAAGAAC


CTGAACGTGTACAGCGAGCAGGAGGCCCTGGCCGGCATCCTGCGCCAGGTGGACCACATCTAC


AACCTGACCCGCAACTACCTGACCAAGAAGCCCTTCAGCACCGAGAAGATCAAGCTGAACTTC


AACCGCCCCACCTTCCTGGACGGCTGGGACAAGAACAAGGAGGAGGCCAACCTGGGCATCCT


GCTGATCAAGGACAACCGCTACTACCTGGGCATCATGAACACCAGCAGCAACAAGGCCTTCGT


GAACCCCCCCAAGGCCATCAGCAACGACATCTACAAGAAGGTGGACTACAAGCTGCTGCCCGG


CCCCAACAAGATGCTGCCCAAGGTGTTCTTCGCCACCAAGAACATCGCCTACTACGCCCCCAG


CGAGGAGCTGCTGAGCAAGTACCGCAAGGGCACCCACAAGAAGGGCGACAGCTTCAGCATCG


ACGACTGCCGCAACCTGATCGACTTCTTCAAGAGCAGCATCAACAAGAACACCGACTGGAGCA


CCTTCGGCTTCAACTTCAGCGACACCAACAGCTACAACGACATCAGCGACTTCTACCGCGAGG


TGGAGAAGCAGGGCTACAAGCTGAGCTTCACCGACATCGACGCCTGCTACATCAAGGACCTGG


TGGACAACAACGAGCTGTACCTGTTCCAGATCTACAACAAGGACTTCAGCCCCTACAGCAAGG


GCAAGCTGAACCTGCACACCCTGTACTTCAAGATGCTGTTCGACCAGCGCAACCTGGACAACG


TGGTGTACAAGCTGAACGGCGAGGCCGAGGTGTTCTACCGCCCCGCCAGCATCGAGAGCGACG


AGCAGATCATCCACAAGAGCGGCCAGAACATCAAGAACAAGAACCAGAAGCGCAGCAACTGC


AAGAAGACCAGCACCTTCGACTACGACATCGTGAAGGACCGCCGCTACTGCAAGGACAAGTTC


ATGCTGCACCTGCCCATCACCGTGAACTTCGGCACCAACGAGAGCGGCAAGTTCAACGAGCTG


GTGAACAACGCCATCCGCGCCGACAAGGACGTGAACGTGATCGGCATCGAccgcggCGAGCGCA


ACCTGCTGTACGTGGTGGTGGTGGACCCCTGCGGCAAGATCATCGAGCAGATCAGCCTGAACA


CCATCGTGGACAAGGAGTACGACATCGAGACCGACTACCACCAGCTGCTGGACGAGAAGGAG


GGCAGCCGCGACAAGGCCCGCAAGGACTGGAACACCATCGAGAACATCAAGGAGCTGAAGGA


GGGCTACCTGAGCCAGGTGGTGAACATCATCGCCAAGCTGGTGCTGAAGTACGACGCCATCAT


CTGCCTGGAGGACCTGAACTTCGGCTTCAAGCGCGGCCGCCAGAAGGTGGAGAAGCAGGTGT


ACCAGAAGTTCGAGAAGATGCTGATCGACAAGATGAACTACCTGGTGCTGGACAAGAGCCGC


AAGCAGGAGAGCCCCCAGAAGCCCGGCGGCGCCCTGAACGCCCTGCAGCTGACCAGCGCCTT


CAAGAGCTTCAAGGAGCTGGGCAAGCAGACCGGCATCATCTACTACGTGCCCGCCTACCTGAC


CAGCAAGATCGACCCCACCACCGGCTTCGCCAACCTGTTCTACATCAAGTACGAGAGCGTGGA


CAAGGCCCGCGACTTCTTCAGCAAGTTCGACTTCATCCGCTACAACCAGATGGACAACTACTTC


GAGTTCGGCTTCGACTACAAGAGCTTCACCGAGCGCGCCAGCGGCTGCAAGAGCAAGTGGAT


CGCCTGCACCAACGGCGAGCGCATCGTGAAGTACCGCAACAGCGACAAGAACAACAGCTTCG


ACGACAAGACCGTGATCCTGACCGACGAGTACCGCAGCCTGTTCGACAAGTACCTGCAGAACT


ACATCGACGAGGACGACCTGAAGGACCAGATCCTGCAGATCGACAGCGCCGACTTCTACAAGA


ACCTGATCAAGCTGTTCCAGCTGACCCTGCAGATGCGCAACAGCAGCAGCGACGGCAAGCGC


GACTACATCATCAGCCCCGTGAAGAACTACCGCGAGGAGTTCTTCTGCAGCGAGTTCAGCGAC


GACACCTTCCCCCGCGACGCCGACGCCAACGGCGCCTACAACATCGCCCGCAAGGGCCTGTGG


GTGATCAAGCAGATCCGCGAGACCAAGAGCGGCACCAAGATCAACCTGGCCATGAGCAACAG


CGAGTGGCTGGAGTACGCCCAGTGCAACCTGCTG





>SEQ ID NO: 20 BsCpf1 nucleotide sequence


ATGTACTACCAGAACCTGACCAAGAAGTACCCCGTGAGCAAGACCATCCGCAACGAGCTGATC


CCCATCGGCAAGACCCTGGAGAACATCCGCAAGAACAACATCCTGGAGAGCGACGTGAAGCG


CAAGCAGGACTACGAGCACGTGAAGGGCATCATGGACGAGTACCACAAGCAGCTGATCAACG


AGGCCCTGGACAACTACATGCTGCCCAGCCTGAACCAGGCCGCCGAGATCTACCTGAAGAAGC


ACGTGGACGTGGAGGACCGCGAGGAGTTCAAGAAGACCCAGGACCTGCTGCGCCGCGAGGTG


ACCGGCCGCCTGAAGGAGCACGAGAACTACACCAAGATCGGCAAGAAGGACATCCTGGACCT


GCTGGAGAAGCTGCCCAGCATCAGCGAGGAGGACTACAACGCCCTGGAGAGCTTCCGCAACT


TCTACACCTACTTCACCAGCTACAACAAGGTGCGCGAGAACCTGTACAGCGACGAGGAGAAGA


GCAGCACCGTGGCCTACCGCCTGATCAACGAGAACCTGCCCAAGTTCCTGGACAACATCAAGA


GCTACGCCTTCGTGAAGGCCGCCGGCGTGCTGGCCGACTGCATCGAGGAGGAGGAGCAGGAC


GCCCTGTTCATGGTGGAGACCTTCAACATGACCCTGACCCAGGAGGGCATCGACATGTACAACT


ACCAGATCGGCAAGGTGAACAGCGCCATCAACCTGTACAACCAGAAGAACCACAAGGTGGAG


GAGTTCAAGAAGATCCCCAAGATGAAGGTGCTGTACAAGCAGATCCTGAGCGACCGCGAGGA


GGTGTTCATCGGCGAGTTCAAGGACGACGAGACCCTGCTGAGCAGCATCGGCGCCTACGGCAA


CGTGCTGATGACCTACCTGAAGAGCGAGAAGATCAACATCTTCTTCGACGCCCTGCGCGAGAG


CGAGGGCAAGAACGTGTACGTGAAGAACGACCTGAGCAAGACCACCATGAGCAACATCGTGT


TCGGCAGCTGGAGCGCCTTCGACGAGCTGCTGAACCAGGAGTACGACCTGGCCAACGAGAAC


AAGAAGAAGGACGACAAGTACTTCGAGAAGCGCCAGAAGGAGCTGAAGAAGAACAAGAGCT


ACACCCTGGAGCAGATGAGCAACCTGAGCAAGGAGGACATCAGCCCCATCGAGAACTACATCG


AGCGCATCAGCGAGGACATCGAGAAGATCTGCATCTACAACGGCGAGTTCGAGAAGATCGTGG


TGAACGAGCACGACAGCAGCCGCAAGCTGAGCAAGAACATCAAGGCCGTGAAGGTGATCAAG


GACTACCTGGACAGCATCAAGGAGCTGGAGCACGACATCAAGCTGATCAACGGCAGCGGCCA


GGAGCTGGAGAAGAACCTGGTGGTGTACGTGGGCCAGGAGGAGGCCCTGGAGCAGCTGCGCC


CCGTGGACAGCCTGTACAACCTGACCCGCAACTACCTGACCAAGAAGCCCTTCAGCACCGAGA


AGGTGAAGCTGAACTTCAACAAGAGCACCCTGCTGAACGGCTGGGACAAGAACAAGGAGACC


GACAACCTGGGCATCCTGTTCTTCAAGGACGGCAAGTACTACCTGGGCATCATGAACACCACC


GCCAACAAGGCCTTCGTGAACCCCCCCGCCGCCAAGACCGAGAACGTGTTCAAGAAGGTGGA


CTACAAGCTGCTGCCCGGCAGCAACAAGATGCTGCCCAAGGTGTTCTTCGCCAAGAGCAACAT


CGGCTACTACAACCCCAGCACCGAGCTGTACAGCAACTACAAGAAGGGCACCCACAAGAAggg


cccCAGCTTCAGCATCGACGACTGCCACAACCTGATCGACTTCTTCAAGGAGAGCATCAAGAAG


CACGAGGACTGGAGCAAGTTCGGCTTCGAGTTCAGCGACACCGCCGACTACCGCGACATCAGC


GAGTTCTACCGCGAGGTGGAGAAGCAGGGCTACAAGCTGACCTTCACCGACATCGACGAGAG


CTACATCAACGACCTGATCGAGAAGAACGAGCTGTACCTGTTCCAGATCTACAACAAGGACTT


CAGCGAGTACAGCAAGGGCAAGCTGAACCTGCACACCCTGTACTTCATGATGCTGTTCGACCA


GCGCAACCTGGACAACGTGGTGTACAAGCTGAACGGCGAGGCCGAGGTGTTCTACCGCCCCG


CCAGCATCGCCGAGAACGAGCTGGTGATCCACAAGGCCGGCGAGGGCATCAAGAACAAGAAC


CCCAACCGCGCCAAGGTGAAGGAGACCAGCACCTTCAGCTACGACATCGTGAAGGACAAGCG


CTACAGCAAGTACAAGTTCACCCTGCACATCCCCATCACCATGAACTTCGGCGTGGACGAGGT


GCGCCGCTTCAACGACGTGATCAACAACGCCCTGCGCACCGACGACAACGTGAACGTGATCGG


CATCGACCGCGGCGAGCGCAACCTGCTGTACGTGGTGGTGATCAACAGCGAGGGCAAGATCCT


GGAGCAGATCAGCCTGAACAGCATCATCAACAAGGAGTACGACATCGAGACCAACTACCACGC


CCTGCTGGACGAGCGCGAGGACGACCGCAACAAGGCCCGCAAGGACTGGAACACCATCGAGA


ACATCAAGGAGCTGAAGACCGGCTACCTGAGCCAGGTGGTGAACGTGGTGGCCAAGCTGGTG


CTGAAGTACAACGCCATCATCTGCCTGGAGGACCTGAACTTCGGCTTCAAGCGCGGCCGCCAG


AAGGTGGAGAAGCAGGTGTACCAGAAGTTCGAGAAGATGCTGATCGAGAAGCTGAACTACCT


GGTGATCGACAAGAGCCGCGAGCAGGTGAGCCCCGAGAAGATGGGCGGCGCCCTGAACGCCC


TGCAGCTGACCAGCAAGTTCAAGAGCTTCGCCGAGCTGGGCAAGCAGAGCGGCATCATCTACT


ACGTGCCCGCCTACCTGACCAGCAAGATCGACCCCACCACCGGCTTCGTGAACCTGTTCTACAT


CAAGTACGAGAACATCGAGAAGGCCAAGCAGTTCTTCGACGGCTTCGACTTCATCCGCTTCAA


CAAGAAGGACGACATGTTCGAGTTCAGCTTCGACTACAAGAGCTTCACCCAGAAGGCCTGCGG


CATCCGCAGCAAGTGGATCGTGTACACCAACGGCGAGCGCATCATCAAGTACCCCAACCCCGA


GAAGAACAACCTGTTCGACGAGAAGGTGATCAACGTGACCGACGAGATCAAGGGCCTGTTCA


AGCAGTACCGCATCCCCTACGAGAACGGCGAGGACATCAAGGAGATCATCATCAGCAAGGCCG


AGGCCGACTTCTACAAGCGCCTGTTCCGCCTGCTGCACCAGACCCTGCAGATGCGCAACAGCA


CCAGCGACGGCACCCGCGACTACATCATCAGCCCCGTGAAGAACGACCGCGGCGAGTTCTTCT


GCAGCGAGTTCAGCGAGGGCACCATGCCCAAGGACGCCGACGCCAACGGCGCCTACAACATC


GCCCGCAAGGGCCTGTGGGTGCTGGAGCAGATCCGCCAGAAGGACGAGGGCGAGAAGGTGA


ACCTGAGCATGACCAACGCCGAGTGGCTGAAGTACGCCCAGCTGCACCTGCTG





>SEQ ID NO: 21 PsCpf1 nucleotide sequence


GAGAACTTCAAGAACCTGTACCCCATCAACAAGACCCTGCGCTTCGAGCTGCGCCCCTACGGC


AAGACCCTGGAGAACTTCAAGAAGAGCGGCCTGCTGGAGAAGGACGCCTTCAAGGCCAACAG


CCGCCGCAGCATGCAGGCCATCATCGACGAGAAGTTCAAGGAGACCATCGAGGAGCGCCTGA


AGTACACCGAGTTCAGCGAGTGCGACCTGGGCAACATGACCAGCAAGGACAAGAAGATCACC


GACAAGGCCGCCACCAACCTGAAGAAGCAGGTGATCCTGAGCTTCGACGACGAGATCTTCAA


CAACTACCTGAAGCCCGACAAGAACATCGACGCCCTGTTCAAGAACGACCCCAGCAACCCCGT


GATCAGCACCTTCAAGGGCTTCACCACCTACTTCGTGAACTTCTTCGAGATCCGCAAGCACATC


TTCAAGGGCGAGAGCAGCGGCAGCATGGCCTACCGCATCATCGACGAGAACCTGACCACCTAC


CTGAACAACATCGAGAAGATCAAGAAGCTGCCCGAGGAGCTGAAGAGCCAGCTGGAGGGCAT


CGACCAGATCGACAAGCTGAACAACTACAACGAGTTCATCACCCAGAGCGGCATCACCCACTA


CAACGAGATCATCGGCGGCATCAGCAAGAGCGAGAACGTGAAGATCCAGGGCATCAACGAGG


GCATCAACCTGTACTGCCAGAAGAACAAGGTGAAGCTGCCCCGCCTGACCCCCCTGTACAAGA


TGATCCTGAGCGACCGCGTGAGCAACAGCTTCGTGCTGGACACCATCGAGAACGACACCGAGC


TGATCGAGATGATCAGCGACCTGATCAACAAGACCGAGATCAGCCAGGACGTGATCATGAGCG


ACATCCAGAACATCTTCATCAAGTACAAGCAGCTGGGCAACCTGCCCGGCATCAGCTACAGCA


GCATCGTGAACGCCATCTGCAGCGACTACGACAACAACTTCGGCGACGGCAAGCGCAAGAAG


AGCTACGAGAACGACCGCAAGAAGCACCTGGAGACCAACGTGTACAGCATCAACTACATCAG


CGAGCTGCTGACCGACACCGACGTGAGCAGCAACATCAAGATGCGCTACAAGGAGCTGGAGC


AGAACTACCAGGTGTGCAAGGAGAACTTCAACGCCACCAACTGGATGAACATCAAGAACATC


AAGCAGAGCGAGAAGACCAACCTGATCAAGGACCTGCTGGACATCCTGAAGAGCATCCAGCG


CTTCTACGACCTGTTCGACATCGTGGACGAGGACAAGAACCCCAGCGCCGAGTTCTACACCTG


GCTGAGCAAGAACGCCGAGAAGCTtGACTTCGAGTTCAACAGCGTGTACAACAAGAGCCGCA


ACTACCTGACCCGCAAGCAGTACAGCGACAAGAAGATCAAGCTGAACTTCGACAGCCCCACCC


TGGCCAAGGGCTGGGACGCCAACAAGGAGATCGACAACAGCACCATCATCATGCGCAAGTTCA


ACAACGACCGCGGCGACTACGACTACTTCCTGGGCATCTGGAACAAGAGCACCCCCGCCAACG


AGAAGATCATCCCCCTGGAGGACAACGGCCTGTTCGAGAAGATGCAGTACAAGCTGTACCCCG


ACCCCAGCAAGATGCTGCCCAAGCAGTTCCTGAGCAAGATCTGGAAGGCCAAGCACCCCACC


ACCCCCGAGTTCGACAAGAAGTACAAGGAGGGCCGCCACAAGAAGGGCCCCGACTTCGAGAA


GGAGTTCCTGCACGAGCTGATCGACTGCTTCAAGCACGGCCTGGTGAACCACGACGAGAAGTA


CCAGGACGTGTTCGGCTTCAACCTGCGCAACACCGAGGACTACAACAGCTACACCGAGTTCCT


GGAGGACGTGGAGCGCTGCAACTACAACCTGAGCTTCAACAAGATCGCCGACACCAGCAACC


TGATCAACGACGGCAAGCTGTACGTGTTCCAGATCTGGAGCAAGGACTTCAGCATCGACAGCA


AGGGCACCAAGAACCTGAACACCATCTACTTCGAGAGCCTGTTCAGCGAGGAGAACATGATCG


AGAAGATGTTCAAGCTGAGCGGCGAGGCCGAGATCTTCTACCGCCCCGCCAGCCTGAACTACT


GCGAGGACATCATCAAGAAGGGCCACCACCACGCCGAGCTGAAGGACAAGTTCGACTACCCC


ATCATCAAGGACAAGCGCTACAGCCAGGACAAGTTCTTCTTCCACGTGCCCATGGTGATCAACT


ACAAGAGCGAGAAGCTGAACAGCAAGAGCCTGAACAACCGCACCAACGAGAACCTGGGCCA


GTTCACCCACATCATCGGCATCGACCGCGGCGAGCGCCACCTGATCTACCTGACCGTGGTGGAC


GTGAGCACCGGCGAGATCGTGGAGCAGAAGCACCTGGACGAGATCATCAACACCGACACCAA


GGGCGTGGAGCACAAGACCCACTACCTGAACAAGCTGGAGGAGAAGAGCAAGACCCGCGAC


AACGAGCGCAAGAGCTGGGAGGCCATCGAGACCATCAAGGAGCTGAAGGAGGGCTACATCAG


CCACGTGATCAACGAGATCCAGAAGCTGCAGGAGAAGTACAACGCCCTGATCGTGATGGAGAA


CCTGAACTACGGCTTCAAGAACAGCCGCATCAAGGTGGAGAAGCAGGTGTACCAGAAGTTCG


AGACCGCCCTGATCAAGAAGTTCAACTACATCATCGACAAGAAGGACCCCGAGACCTACATCC


ACGGCTACCAGCTGACCAACCCCATCACCACCCTGGACAAGATCGGCAACCAGAGCGGCATCG


TGCTGTACATCCCCGCCTGGAACACCAGCAAGATCGACCCCGTGACCGGCTTCGTGAACCTGC


TGTACGCCGACGACCTGAAGTACAAGAACCAGGAGCAGGCCAAGAGCTTCATCCAGAAGATC


GACAACATCTACTTCGAGAACGGCGAGTTCAAGTTCGACATCGACTTCAGCAAGTGGAACAAC


CGCTACAGCATCAGCAAGACCAAGTGGACCCTGACCAGCTACGGCACCCGCATCCAGACCTTC


CGCAACCCCCAGAAGAACAACAAGTGGGACAGCGCCGAGTACGACCTGACCGAGGAGTTCAA


GCTGATCCTGAACATCGACGGCACCCTGAAGAGCCAGGACGTGGAGACCTACAAGAAGTTCAT


GAGCCTGTTCAAGCTGATGCTGCAGCTGCGCAACAGCGTGACCGGCACCGACATCGACTACAT


GATCAGCCCCGTGACCGACAAGACCGGCACCCACTTCGACAGCCGCGAGAACATCAAGAACC


TGCCCGCCGACGCCGACGCCAACGGCGCCTACAACATCGCCCGCAAGGGCATCATGGCCATCG


AGAACATCATGAACGGCATCAGCGACCCCCTGAAGATCAGCAACGAGGACTACCTGAAGTACA


TCCAGAACCAGCAGGAG





>SEQ ID NO: 22 C6Cpf1 nucleotide sequence


ATGAAGAACGTGTTCGGCGGCTTCACCAACCTGTACAGCCTGACCAAGACCCTGCGCTTCGAG


CTGAAGCCCACCAGCAAGACCCAGAAGCTGATGAAGCGCAACAACGTGATCCAGACCGACGA


GGAGATCGACAAGCTGTACCACGACGAGATGAAGCCCATCCTGGACGAGATCCACCGCCGCTT


CATCAACGACGCCCTGGCCCAGAAGATCTTCATCAGCGCCAGCCTGGACAACTTCCTGAAGGT


GGTGAAGAACTACAAGGTGGAGAGCGCCAAGAAGAACATCAAGCAGAACCAGGTGAAGCTG


CTGCAGAAGGAGATCACCATCAAGACCCTGGGCCTGCGCCGCGAGGTGGTGAGCGGCTTCATC


ACCGTGAGCAAGAAGTGGAAGGACAAGTACGTGGGCCTGGGCATCAAGCTGAAGGGCGACGG


CTACAAGGTGCTGACCGAGCAGGCCGTGCTGGACATCCTGAAGATCGAGTTCCCCAACAAGGC


CAAGTACATCGACAAGTTCCGCGGCTTCTGGACCTACTTCAGCGGCTTCAACGAGAACCGCAA


GAACTACTACAGCGAGGAGGACAAGGCCACCAGCATCGCCAACCGCATCGTGAACGAGAACC


TGAGCCGCTACATCGACAACATCATCGCCTTCGAGGAGATCCTGCAGAAGATCCCCAACCTGA


AGAAGTTCAAGCAGGACCTGGACATCACCAGCTACAACTACTACCTGAACCAGGCCGGCATCG


ACAAGTACAACAAGATCATCGGCGGCTACATCGTGGACAAGGACAAGAAGATCCAGGGCATCA


ACGAGAAGGTGAACCTGTACACCCAGCAGACCAAGAAGAAGCTGCCCAAGCTGAAGTTCCTG


TTCAAGCAGATCGGCAGCGAGCGCAAGGGCTTCGGCATCTTCGAGATCAAGGAGGGCAAGGA


GTGGGAGCAGCTGGGCGACCTGTTCAAGCTGCAGCGCACCAAGATCAACAGCAACGGCCGCG


AGAAGGGCCTGTTCGACAGCCTGCGCACCATGTACCGCGAGTTCTTCGACGAGATCAAGCGCG


ACAGCAACAGCCAGGCCCGCTACAGCCTGGACAAGATCTACTTCAACAAGGCCAGCGTGAAC


ACCATCAGCAACAGCTGGTTCACCAACTGGAACAAGTTCGCCGAGCTGCTGAACATCAAGGAG


GACAAGAAGAACGGCGAGAAGAAGATCCCCGAGCAGATCAGCATCGAGGACATCAAGGACAG


CCTGAGCATCATCCCCAAGGAGAACCTGGAGGAGCTGTTCAAGCTGACCAACCGCGAGAAGC


ACGACCGCACCCGCTTCTTCGGCAGCAACGCCTGGGTGACCTTCCTGAACATCTGGCAGAACG


AGATCGAGGAGAGCTTCAACAAGCTGGAGGAGAAGGAGAAGGACTTCAAGAAGAACGCCGC


CATCAAGTTCCAGAAGAACAACCTGGTGCAGAAGAACTACATCAAGGAGGTGTGCGACCgcatgc


TGGCCATCGAGCGCATGGCCAAGTACCACCTGCCCAAGGACAGCAACCTGAGCCGCGAGGAG


GACTTCTACTGGATCATCGACAACCTGAGCGAGCAGCGCGAGATCTACAAGTACTACAACGCCT


TCCGCAACTACATCAGCAAGAAGCCCTACAACAAGAGCAAGATGAAGCTGAACTTCGAGAAC


GGCAACCTGCTGGGCGGCTGGAGCGACGGCCAGGAGCGCAACAAGGCCGGCGTGATCCTGCG


CAACGGCAACAAGTACTACCTGGGCGTGCTGATCAACCGCGGCATCTTCCGCACCGACAAGAT


CAACAACGAGATCTACCGCACCGGCAGCAGCAAGTGGGAGCGCCTGATCCTGAGCAACCTGA


AGTTCCAGACCCTGGCCGGCAAGGGCTTCCTGGGCAAGCACGGCGTGAGCTACGGCAACATG


AACCCCGAGAAGAGCGTGCCCAGCCTGCAGAAGTTCATCCGCGAGAACTACCTGAAGAAGTA


CCCCCAGCTGACCGAGGTGAGCAACACCAAGTTCCTGAGCAAGAAGGACTTCGACGCCGCCA


TCAAGGAGGCCCTGAAGGAGTGCTTCACCATGAACTTCATCAACATCGCCGAGAACAAGCTGC


TGGAGGCCGAGGACAAGGGCGACCTGTACCTGTTCGAGATCACCAACAAGGACTTCAGCGGC


AAGAAGAGCGGCAAGGACAACATCCACACCATCTACTGGAAGTACCTGTTCAGCGAGAGCAA


CTGCAAGAGCCCCATCATCGGCCTGAACGGCGGCGCCGAGATCTTCTTCCGCGAGGGCCAGAA


GGACAAGCTGCACACCAAGCTGGACAAGAAGGGCAAGAAGGTGTTCGACGCCAAGCGCTACA


GCGAGGACAAGCTGTTCTTCCACGTGAGCATCACCATCAACTACGGCAAGCCCAAGAACATCA


AGTTCCGCGACATCATCAACCAGCTGATCACCAGCATGAACGTGAACATCATCGGCATCGACCG


CGGCGAGAAGCACCTGCTGTACTACAGCGTGATCGACAGCAACGGCATCATCCTGAAGCAGGG


CAGCCTGAACAAGATCCGCGTGGGCGACAAGGAGGTGGACTTCAACAAGAAGCTGACCGAGC


GCGCCAACGAGATGAAGAAGGCCCGCCAGAGCTGGGAGCAGATCGGCAACATCAAGAACTTC


AAGGAGGGCTACCTGAGCCAGGCCATCcacgagATCTACCAGCTGATGATCAAGTACAACGCCAT


CATCGTGCTGGAGGACCTGAACACCGAGTTCAAGGCCAAGCGCCTGAGCAAGGTGGAGAAGA


GCGTGTACAAGAAGTTCGAGCTGAAGCTGGCCCGCAAGCTGAACCACCTGATCCTGAAGGAC


CGCAACACCAACGAGATCGGCGGCGTGCTGAAGGCCTACCAGCTGACCCCCACCATCGGCGGC


GGCGACGTGAGCAAGTTCGAGAAGGCCAAGCAGTGGGGCATGATGTTCTACGTGCGCGCCAA


CTACACCAGCACCACCGACCCCGTGACCGGCTGGCGCAAGCACCTGTACATCAGCAACTTCAG


CAACAACAGCGTGATCAAGAGCTTCTTCGACCCCACCAACCGCGACACCGGCATCGAGATCTT


CTACAGCGGCAAGTACCGCAGCTGGGGCTTCCGCTACGTGCAGAAGGAGACCGGCAAGAAGT


GGGAGCTGTTCGCCACCAAGGAGCTGGAGCGCTTCAAGTACAACCAGACCACCAAGCTGTGC


GAGAAGATCAACCTGTACGACAAGTTCGAGGAGCTGTTCAAGGGCATCGACAAGAGCGCCGA


CATCTACAGCCAGCTGTGCAACGTGCTGGACTTCCGCTGGAAGAGCCTGGTGTACCTGTGGAA


CCTGCTGAACCAGATCCGCAACGTGGACAAGAACGCCGAGGGCAACAAGAACGACTTCATCC


AGAGCCCCGTGTACCCCTTCTTCGACAGCCGCAAGACCGACGGCAAGACCGAGCCCATCAACG


GCGACGCCAACGGCGCCCTGAACATCGCCCGCAAGGGCCTGATGCTGGTGGAGCGCATCAAG


AACAACCCCGAGAAGTACGAGCAGCTGATCCGCGACACCGAGTGGGACGCCTGGATCCAGAA


CTTCAACAAGGTGAAC





>SEQ ID NO: 23 PxCpf1 nucleotide sequence


ATGATCATCGGCCGCGACTTCAACATGTACTACCAGAACCTGACCAAGATGTACCCCATCAGCA


AGACCCTGCGCAACGAGCTGATCCCCGTGGGCAAGACCCTGGAGAACATCCGCAAGAACGGC


ATCCTGGAGGCCGACATCCAGCGCAAGGCCGACTACGAGCACGTGAAGAAGCTGATGGACAA


CTACCACAAGCAGCTGATCAACGAGGCCCTGCAGGGCGTGCACCTGAGCGACCTGAGCGACG


CCTACGACCTGTACTTCAACCTGAGCAAGGAGAAGAACAGCGTGGACGCCTTCAGCAAGTGCC


AGGACAAGCTGCGCAAGGAGATCGTGAGCTTCCTGAAGAACCACGAGAACTTCCCCAAGATC


GGCAACAAGGAGATCATCAAGCTGATCCAGAGCCTGAACGACAACGACGCCGACAACAACGC


CCTGGACAGCTTCAGCAACTTCTACACCTACTTCAGCAGCTACAACGAGGTGCGCAAGAACCT


GTACAGCGACGAGGAGAAGAGCAGCACCGTGGCCTACCGCCTGATCAACGAGAACCTGCCCA


AGAGCCTGGACAACATCAAGGCCTACGCCATCGCCAAGAAGGCCGGCGTGCGCGCCGAGGGC


CTGAGCGAGGAGGAGCAGGACTGCCTGTTCATCATCGAGACCTTCGAGCGCACCCTGACCCAG


GACGGCATCGACAACTACAACGCCGACATCGGCAAGCTGAACACCGCCATCAACCTGTACAAC


CAGCAGAACAAGAAGCAGGAGGGCTTCCGCAAGGTGCCCCAGATGAAGTGCCTGTACAAGCA


GATCCTGAGCGACCGCGAGGAGGCCTTCATCGACGAGTTCAGCGACGACGAGGACCTGATCAC


CAACATCGAGAGCTTCGCCGAGAACATGAACGTGTTCCTGAACAGCGAGATCATCACCGACTT


CAAGAACGCCCTGGTGGAGAGCGACGGCAGCCTGGTGTACATCAAGAACGACGTGAGCAAGA


CCCTGTTCAGCAACATCGTGTTCGGCAGCTGGAACGCCATCGACGAGAAGCTGAGCGACGAGT


ACGACCTGGCCAACAGCAAGAAGAAGAAGGACGAGAagtactACGAGAAGCGCCAGAAGGAGC


TGAAGAAGAACAAGAGCTACGACCTGGAGACCATCATCGGCCTGTTCGACGACAGCATCGACG


TGATCGGCAAGTACATCGAGAAGCTGGAGAGCGACATCACCGCCATCGCCGAGGCCAAGAAC


GACTTCGACGAGATCGTGCTGCGCAAGCACGACAAGAACAAGAGCCTGCGCAAGAACACCAA


CGCCGTGGAGGCCATCAAGAGCTACCTGGACACCGTGAAGGACTTCGAGCGCGACATCAAGCT


GATCAACGGCAGCGGCCAGGAGGTGGAGAAGAACCTGGTGGTGTACGCCGAGCAGGAGAACA


TCCTGGCCGAGATCAAGAACGTGGACAGCCTGTACAACATGAGCCGCAACTACCTGACCCAGA


AGCCCTTCAGCACCGAGAAGTTCAAGCTGAACTTCGAGAACCCCACCCTGCTGAACGGCTGG


GACCGCAACAAGGAGAAGGACTACCTGGGCATCCTGTTCGAGAAGGAGGGCATGTACTACCTG


GGCATCATCAACAACAACCACCGCAAGATCTTCGAGAACGAGAAGCTGTGCACCGGCAAGGA


GAGCTGCTTCAACAAGATCGTGTACAAGCAGATCAGCAACGCCGCCAAGTACCTGAGCAGCAA


GCAGATCAACCCCCAGAACCCCCCCAAGGAGATCGCCGAGATCCTGCTGAAGCGCAAGGCCG


ACAGCAGCAGCCTGAGCCGCAAGGAGACCGAGCTGTTCATCGACTACCTGAAGGACGACTTC


CTGGTGAACTACCCCATGATCATCAACAGCGACGGCGAGAACTTCTTCAACTTCCACTTCAAGC


AGGCCAAGGACTACGGCAGCCTGCAGGAGTTCTTCAAGGAGGTGGAGCACCAGGCCTACAGC


CTGAAGACCCGCCCCATCGACGACAGCTACATCTACCGCATGATCGACGAGGGCAAGCTGTAC


CTGTTCCAGATCCACAACAAGGACTTCAGCCCCTACAGCAAGGGCAACCTGAACCTGCACACC


ATCTACCTGCAGATGCTGTTCGACCAGCGCAACCTGAACAACGTGGTGTACAAGCTGAACGGC


GAGGCCGAGGTGTTCTACCGCCCCGCCAGCATCAACGACGAGGAGGTGATCATCCACAAGGCC


GGCGAGGAGATCAAGAACAAGAACAGCAAGCGCGCCGTGGACAAGCCCACCAGCAAGTTCG


GCTACGACATCATCAAGGACCGCCGCTACAGCAAGGACAAGTTCATGCTGCACATCCCCGTGA


CCATGAACTTCGGCGTGGACGAGACCCGCCGCTTCAACGACGTGGTGAACGACGCCCTGCGCA


ACGACGAGAAGGTGCGCGTGATCGGCATCGACCGCGGCGAGCGCAACCTGCTGTACGTGGTG


GTGGTGGACACCGACGGCACCATCCTGGAGCAGATCAGCCTGAACAGCATCATCAACAACGAG


TACAGCATCGAGACCGACTACCACAAGCTGCTGGACGAGAAGGAGGGCGACCGCGACCGCGC


CCGCAAGAACTGGACCACCATCGAGAACATCAAGGAGCTGAAGGAGGGCTACCTGAGCCAGG


TGGTGAACGTGATCGCCAAGCTGGTGCTGAAGTACAACGCCATCATCTGCCTGGAGGACCTGA


ACTTCGGCTTCAAGCgcggccgcCAGAAGGTGGAGAAGCAGGTGTACCAGAAGTTCGAGAAGATG


CTGATCGACAAGCTGAACTACCTGGTGATCGACAAGAGCCGCAAGCAGGAGAAGCCCGAGGA


GTTCGGCGGCGCCCTGAACGCCCTGCAGCTGACCAGCAAGTTCACCAGCTTCAAGGACATGGG


CAAGCAGACCGGCATCATCTACTACGTGCCCGCCTACCTGACCAGCAAGATCGACCCCACCAC


CGGCTTCGCCAACCTGTTCTACGTGAAGTACGAGAACGTGGAGAAGGCCAAGGAGTTCTTCAG


CCGCTTCGACAGCATCAGCTACAACAACGAGAGCGGCTACTTCGAGTTCGCCTTCGACTACAA


GAAGTTCACCGACCGCGCCTGCGGCGCCCGCAGCCAGTGGACCGTGTGCACCTACGGCGAGC


GCATCATCAAGTACCGCAACGCCGACAAGAACAACAGCTTCGACGACAAGACCATCGTGCTGA


GCGAGGAGTTCAAGGAGCTGTTCAGCATCTACGGCATCAGCTACGAGGACGGCGCCGAGCTGA


AGAACAAGATCATGAGCGTGGACGAGGCCGACTTCTTCCGCTGCCTGACCGGCCTGCTGCAGA


AGACCCTGCAGATGCGCAACAGCAGCAACGACGGCACCCGCGACTACATCATCAGCCCCATCA


TGAACGACCGCGGCGAGTTCTTCAACAGCGAGGCCTGCGACGCCAGCAAGCCCAAGGACGCC


GACGCCAACGGCGCCTTCAACATCGCCCGCAAGGGCCTGTGGGTGCTGGAGCAGATCCGCAAC


ACCCCCAGCGGCGACAAGCTGAACCTGGCCATGAGCAACGCCGAGTGGCTGGAGTACGCCCA


GCGCAACCAGATC





>SEQ ID NO: 24 PrCpf1 nucleotide sequence


ATGATCATCGGCCGCGACTTCAACATGTACTACCAGAACCTGACCAAGATGTACCCCATCAGCA


AGACCCTGCGCAACGAGCTGATCCCCGTGGGCAAGACCCTGGAGAACATCCGCAAGAACGGC


ATCCTGGAGGCCGACATCCAGCGCAAGGCCGACTACGAGCACGTGAAGAAGCTGATGGACAA


CTACCACAAGCAGCTGATCAACGAGGCCCTGCAGGGCGTGCACCTGAGCGACCTGAGCGACG


CCTACGACCTGTACTTCAACCTGAGCAAGGAGAAGAACAGCGTGGACGCCTTCAGCAAGTGCC


AGGACAAGCTGCGCAAGGAGATCGTGAGCCTGCTGAAGAACCACGAGAACTTCCCCAAGATC


GGCAACAAGGAGATCATCAAGCTGCTGCAGAGCCTGTACGACAACGACACCGACTACAAGGC


CCTGGACAGCTTCAGCAACTTCTACACCTACTTCAGCAGCTACAACGAGGTGCGCAAGAACCT


GTACAGCGACGAGGAGAAGAGCAGCACCGTGGCCTACCGCCTGATCAACGAGAACCTGCCCA


AGTTCCTGGACAACATCAAGGCCTACGCCATCGCCAAGAAGGCCGGCGTGCGCGCCGAGGGC


CTGAGCGAGGAGGACCAGGACTGCCTGTTCATCATCGAGACCTTCGAGCGCACCCTGACCCAG


GACGGCATCGACAACTACAACGCCGCCATCGGCAAGCTGAACACCGCCATCAACCTGTTCAAC


CAGCAGAACAAGAAGCAGGAGGGCTTCCGCAAGGTGCCCCAGATGAAGTGCCTGTACAAGCA


GATCCTGAGCGACCGCGAGGAGGCCTTCATCGACGAGTTCAGCGACGACGAGGACCTGATCAC


CAACATCGAGAGCTTCGCCGAGAACATGAACGTGTTCCTGAACAGCGAGATCATCACCGACTT


CAAGATCGCCCTGGTGGAGAGCGACGGCAGCCTGGTGTACATCAAGAACGACGTGAGCAAGA


CCAGCTTCAGCAACATCGTGTTCGGCAGCTGGAACGCCATCGACGAGAAGCTGAGCGACGAGT


ACGACCTGGCCAACAGCAAGAAGAAGAAGGACGAGAagtactACGAGAAGCGCCAGAAGGAGC


TGAAGAAGAACAAGAGCTACGACCTGGAGACCATCATCGGCCTGTTCGACGACAACAGCGAC


GTGATCGGCAAGTACATCGAGAAGCTGGAGAGCGACATCACCGCCATCGCCGAGGCCAAGAA


CGACTTCGACGAGATCGTGCTGCGCAAGCACGACAAGAACAAGAGCCTGCGCAAGAACACCA


ACGCCGTGGAGGCCATCAAGAGCTACCTGGACACCGTGAAGGACTTCGAGCGCGACATCAAG


CTGATCAACGGCAGCGGCCAGGAGGTGGAGAAGAACCTGGTGGTGTACGCCGAGCAGGAGAA


CATCCTGGCCGAGATCAAGAACGTGGACAGCCTGTACAACATGAGCCGCAACTACCTGACCCA


GAAGCCCTTCAGCACCGAGAAGTTCAAGCTGAACTTCAACCGCGCCACCCTGCTGAACGGCTG


GGACAAGAACAAGGAGACCGACAACCTGGGCATCCTGTTCGAGAAGGACGGCATGTACTACC


TGGGCATCATGAACACCAAGGCCAACAAGATCTTCGTGAACATCCCCAAGGCCACCAGCAACG


ACGTGTACCACAAGGTGAACTACAAGCTGCTGCCCGGCCCCAACAAGATGCTGCCCAAGGTGT


TCTTCGCCCAGAGCAACCTGGACTACTACAAGCCCAGCGAGGAGCTGCTGGCCAAGTACAAGG


CCGGCACCCACAAGAAGGGCGACAACTTCAGCCTGGAGGACTGCCACGCCCTGATCGACTTCT


TCAAGGCCAGCATCGAGAAGCACCCCGACTGGAGCAGCTTCGGCTTCGAGTTCAGCGAGACCT


GCACCTACGAGGACCTGAGCGGCTTCTACCGCGAGGTGGAGAAGCAGGGCTACAAGATCACCT


ACACCGACGTGGACGCCGACTACATCACCAGCCTGGTGGAGCGCGACGAGCTGTACCTGTTCC


AGATCTACAACAAGGACTTCAGCCCCTACAGCAAGGGCAACCTGAACCTGCACACCATCTACC


TGCAGATGCTGTTCGACCAGCGCAACCTGAACAACGTGGTGTACAAGCTGAACGGCGAGGCC


GAGGTGTTCTACCGCCCCGCCAGCATCAACGACGAGGAGGTGATCATCCACAAGGCCGGCGAG


GAGATCAAGAACAAGAACAGCAAGCGCGCCGTGGACAAGCCCACCAGCAAGTTCGGCTACGA


CATCATCAAGGACCGCCGCTACAGCAAGGACAAGTTCATGCTGCACATCCCCGTGACCATGAA


CTTCGGCGTGGACGAGACCCGCCGCTTCAACGACGTGGTGAACGACGCCCTGCGCAACGACG


AGAAGGTGCGCGTGATCGGCATCGACCGCGGCGAGCGCAACCTGCTGTACGTGGTGGTGGTGG


ACACCGACGGCACCATCCTGGAGCAGATCAGCCTGAACAGCATCATCAACAACGAGTACAGCA


TCGAGACCGACTACCACAAGCTGCTGGACGAGAAGGAGGGCGACCGCGACCGCGCCCGCAAG


AACTGGACCACCATCGAGAACATCAAGGAGCTGAAGGAGGGCTACCTGAGCCAGGTGGTGAA


CGTGATCGCCAAGCTGGTGCTGAAGTACAACGCCATCATCTGCCTGGAGGACCTGAACTTCGG


CTTCAAGCgcggccgcCAGAAGGTGGAGAAGCAGGTGTACCAGAAGTTCGAGAAGATGCTGATCG


ACAAGCTGAACTACCTGGTGATCGACAAGAGCCGCAAGCAGGACAAGCCCGAGGAGTTCGGC


GGCGCCCTGAACGCCCTGCAGCTGACCAGCAAGTTCACCAGCTTCAAGGACATGGGCAAGCA


GACCGGCATCATCTACTACGTGCCCGCCTACCTGACCAGCAAGATCGACCCCACCACCGGCTTC


GCCAACCTGTTCTACGTGAAGTACGAGAACGTGGAGAAGGCCAAGGAGTTCTTCAGCCGCTTC


GACAGCATCAGCTACAACAACGAGAGCGGCTACTTCGAGTTCGCCTTCGACTACAAGAAGTTC


ACCGACCGCGCCTGCGGCGCCCGCAGCCAGTGGACCGTGTGCACCTACGGCGAGCGCATCATC


AAGTTCCGCAACACCGAGAAGAACAACAGCTTCGACGACAAGACCATCGTGCTGAGCGAGGA


GTTCAAGGAGCTGTTCAGCATCTACGGCATCAGCTACGAGGACGGCGCCGAGCTGAAGAACAA


GATCATGAGCGTGGACGAGGCCGACTTCTTCCGCAGCCTGACCCGCCTGTTCCAGCAGACCAT


GCAGATGCGCAACAGCAGCAACGACGTGACCCGCGACTACATCATCAGCCCCATCATGAACGA


CCGCGGCGAGTTCTTCAACAGCGAGGCCTGCGACGCCAGCAAGCCCAAGGACGCCGACGCCA


ACGGCGCCTTCAACATCGCCCGCAAGGGCCTGTGGGTGCTGGAGCAGATCCGCAACACCCCCA


GCGGCGACAAGCTGAACCTGGCCATGAGCAACGCCGAGTGGCTGGAGTACGCCCAGCGCAAC


CAGATC





>SEQ ID NO: 25 crRNA1 scaffold sequence


ATTTCTACtgttGTAGAT





>SEQ ID NO: 26 crRNA2 scaffold sequence


ATTTCTACtattGTAGAT





>SEQ ID NO: 27 crRNA3 scaffold sequence


ATTTCTACtactGTAGAT





>SEQ ID NO: 28 crRNA4 scaffold sequence


ATTTCTACifigGTAGAT





>SEQ ID NO: 29 crRNA5 scaffold sequence


ATTTCTACtagttGTAGAT





>SEQ ID NO: 30 crRNA31 scaffold sequence


ATTTCTACTATGGTAGAT





>SEQ ID NO: 31 crRNA77 scaffold sequence


ATTTCTACTGTCGTAGAT





>SEQ ID NO: 32 crRNA129 scaffold sequence


ATTTCTACTTGTGTAGAT





>SEQ ID NO: 33 crRNA159 scaffold sequence


ATTTCTACTGTGGTAGAT
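The crRNA scaffold sequences listed above share the same leading and trailing segments and differ only in a short internal stretch. The following Python snippet is a minimal sketch of that comparison: it splits a scaffold into its three parts and checks that part of the leading segment is the reverse complement of part of the trailing segment. The flank boundaries (`ATTTCTAC` and `GTAGAT`), the helper names, and the reading of the variable stretch as a loop between paired segments are illustrative assumptions read off the listed sequences, not statements made in the specification. Only a subset of the listed scaffolds is included.

```python
# Illustrative sketch only. The flank boundaries below are an assumption
# inferred from the listed scaffold sequences, not defined by the specification.
LEFT_FLANK = "ATTTCTAC"
RIGHT_FLANK = "GTAGAT"

# A subset of the scaffold sequences, copied from the listing above.
SCAFFOLDS = {
    "SEQ ID NO: 25": "ATTTCTACtgttGTAGAT",
    "SEQ ID NO: 26": "ATTTCTACtattGTAGAT",
    "SEQ ID NO: 29": "ATTTCTACtagttGTAGAT",
    "SEQ ID NO: 30": "ATTTCTACTATGGTAGAT",
    "SEQ ID NO: 33": "ATTTCTACTGTGGTAGAT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def split_scaffold(seq):
    """Split a scaffold into (left flank, variable stretch, right flank)."""
    s = seq.upper()
    if not (s.startswith(LEFT_FLANK) and s.endswith(RIGHT_FLANK)):
        raise ValueError("sequence does not carry the assumed flanks")
    variable = s[len(LEFT_FLANK):len(s) - len(RIGHT_FLANK)]
    return LEFT_FLANK, variable, RIGHT_FLANK


def flanks_can_pair():
    """Check that TCTAC (left flank) is the reverse complement of GTAGA (right flank)."""
    return reverse_complement(LEFT_FLANK[3:]) == RIGHT_FLANK[:5]


if __name__ == "__main__":
    for name, seq in SCAFFOLDS.items():
        _, variable, _ = split_scaffold(seq)
        print(name, "variable stretch:", variable)
    print("flanks complementary:", flanks_can_pair())
```

The sequences are treated as DNA coding sequences, as in the listing; an RNA form would use U in place of T.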








Claims
  • 1. A genome editing system for site-directed modification of a target sequence in the genome of a cell, comprising at least one of the following i) to v): i) a Cpf1 protein, and a guide RNA; ii) an expression construct comprising a nucleotide sequence encoding a Cpf1 protein, and a guide RNA; iii) a Cpf1 protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA; iv) an expression construct comprising a nucleotide sequence encoding a Cpf1 protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA; v) an expression construct comprising a nucleotide sequence encoding a Cpf1 protein and a nucleotide sequence encoding a guide RNA; wherein the Cpf1 protein comprises an amino acid sequence of one of SEQ ID NOs: 1-12 or an amino acid sequence having at least 80% sequence identity to one of SEQ ID NOs: 1-12, and the guide RNA is capable of targeting the Cpf1 protein to a target sequence in the genome of the cell.
  • 2. The system of claim 1, wherein the Cpf1 protein comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or even 100% sequence identity to one of SEQ ID NOs: 1-12.
  • 3. The system of claim 1, wherein the Cpf1 protein comprises an amino acid sequence having one or more amino acid residue substitutions, deletions or additions relative to one of SEQ ID NOs: 1-12, for example, the Cpf1 protein comprises an amino acid sequence having 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acid residue substitutions, deletions or additions relative to one of SEQ ID NOs: 1-12.
  • 4. The system of claim 1, wherein the Cpf1 protein is derived from a species selected from: Agathobacter rectalis, Lachnospira pectinoschiza, Sneathia amnii, Helcococcus kunzii, Arcobacter butzleri, Bacteroidetes oral, Oribacterium sp., Butyrivibrio sp., Proteocatella sphenisci, Candidatus Dojkabacteria, Pseudobutyrivibrio xylanivorans, Pseudobutyrivibrio ruminis.
  • 5. The system of claim 1, wherein the Cpf1 protein comprises an amino acid sequence selected from SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 8, SEQ ID NO: 11, and SEQ ID NO: 12.
  • 6. The system of claim 1, wherein the nucleotide sequence encoding the Cpf1 protein is codon optimized for the organism from which the cell to be genome edited is derived.
  • 7. The system of claim 1, wherein the nucleotide sequence encoding the Cpf1 protein is selected from SEQ ID NOs: 13-24.
  • 8. The system of claim 1, wherein the guide RNA is a crRNA.
  • 9. The system of claim 8, wherein the coding sequence of the crRNA comprises a crRNA scaffold sequence set forth in any one of SEQ ID NOs: 25-33, preferably comprises a crRNA scaffold sequence set forth in SEQ ID NO:30.
  • 10. The system of claim 1, wherein the target sequence has the following structure: 5′-TYYN-Nx-3′ or 5′-YYN-Nx-3′, wherein each N is independently selected from A, G, C and T; each Y is selected from C and T; x is an integer satisfying 15≤x≤35; and Nx represents x consecutive nucleotides.
  • 11. A method of modifying a target sequence in the genome of a cell, comprising introducing the genome editing system of any one of claims 1-10 into the cell, whereby the guide RNA targets the Cpf1 protein to the target sequence in the genome of the cell, resulting in one or more nucleotide substitutions, deletions or additions in the target sequence.
  • 12. The method of claim 11, wherein the cell is from a mammal such as human, mouse, rat, monkey, dog, pig, sheep, cattle, or cat; poultry such as chicken, duck, or goose; or a plant, including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybean, peanut and Arabidopsis thaliana.
  • 13. The method of any one of claims 11-12, wherein the system is introduced into the cell by one of the following methods: calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (e.g., baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), gene gun method, PEG-mediated protoplast transformation, and Agrobacterium-mediated transformation.
  • 14. A method of treating a disease in a subject in need thereof, comprising delivering to the subject an effective amount of the genome editing system of any one of claims 1-10 to modify a gene related to the disease in the subject.
  • 15. Use of the genome editing system of any one of claims 1-10 for the preparation of a pharmaceutical composition for treating a disease in a subject in need thereof, wherein the genome editing system is for modifying a gene related to the disease in the subject.
  • 16. A pharmaceutical composition for treating a disease in a subject in need thereof, comprising the genome editing system of any one of claims 1-10 and a pharmaceutically acceptable carrier, wherein the genome editing system is for modifying a gene related to the disease.
  • 17. The method, use or pharmaceutical composition of any one of claims 14-16, wherein the subject is a mammal such as a human.
  • 18. The method, use or pharmaceutical composition of claim 17, wherein the disease is selected from tumors, inflammation, Parkinson's disease, cardiovascular disease, Alzheimer's disease, autism, drug addiction, age-related macular degeneration, schizophrenia, and hereditary diseases.
  • 19. A crRNA, which comprises a crRNA scaffold sequence corresponding to any one of SEQ ID NOs: 25-33, or the coding sequence of which comprises a sequence set forth in any one of SEQ ID NOs: 25-33.
  • 20. The crRNA of claim 19, wherein the crRNA is encoded by a nucleotide sequence selected from:
Priority Claims (1)
Number Date Country Kind
201710228595.X Apr 2017 CN national
PCT Information
Filing Document Filing Date Country Kind
PCT/CN2018/082446 4/10/2018 WO 00
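
Claim 10 defines the target sequence structure as 5′-TYYN-Nx-3′ or 5′-YYN-Nx-3′, with Y being C or T, N being any nucleotide, and 15≤x≤35. As a minimal sketch of how such sites could be located in a DNA string, the following Python snippet scans one strand with a regular expression. The function name `find_target_sites`, its parameters, and the default x of 23 are hypothetical choices made for illustration. The sketch scans only the strand it is given and does not assess editing efficiency or off-target effects.

```python
import re


def find_target_sites(dna, x=23, require_leading_t=True):
    """Return (start, motif, following_nucleotides) for each site on the given strand.

    The motif is TYYN (or YYN when require_leading_t is False), followed by
    exactly x nucleotides, per the structure stated in claim 10.
    The function name and defaults are illustrative assumptions.
    """
    if not 15 <= x <= 35:
        raise ValueError("x must satisfy 15 <= x <= 35")
    motif = "T[CT][CT][ACGT]" if require_leading_t else "[CT][CT][ACGT]"
    # A lookahead keeps the match zero-width so overlapping sites are all reported.
    pattern = re.compile(rf"(?=({motif})([ACGT]{{{x}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    # Made-up example sequence, not taken from the specification.
    example = "GGTTTA" + "G" * 20
    for start, motif, rest in find_target_sites(example, x=20):
        print(start, motif, rest)
```

To cover the opposite strand as well, the same function could be applied to the reverse complement of the input.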