The present disclosure relates to a CRISPR/Cas9 system, and particularly to Cas9 protein variants. The CRISPR/Cas system is a type of immune system found in prokaryotic organisms and includes a Cas protein and a guide RNA. The configuration of the Cas protein and the guide RNA is described in detail in International Publication No. WO2018/231018.
A Streptococcus pyogenes-derived Cas9 protein is also referred to as a SpCas9 protein, and is one of the orthologs of the Cas9 protein. The SpCas9 protein is known to exhibit double-stranded DNA cleavage activity in cells. However, gene editing using the SpCas9 protein is limited to the vicinity of the PAM sequence 5′-NGG-3′, and research has therefore been conducted to expand the PAM range.
If the SpCas9 protein could be used in methods for gene editing with various types of PAM sequences or regardless of the PAM sequence, gene editing at various sites would be possible. Accordingly, there may be an advantage in that the most efficient gene editing site can be selected from a wider range of sites even in the same gene.
Examples of SpCas9 proteins that have been developed to recognize various PAM sequences include Nureki-NG Cas9, which is capable of recognizing the PAM sequence 5′-NGN-3′, and SpRY Cas9, which is close to PAMless.
The present disclosure discloses a SpCas9 variant capable of recognizing PAM sequences other than 5′-NGG-3′.
The present disclosure discloses a CRISPR/Cas9 composition including the SpCas9 variant.
The present disclosure discloses a method for gene editing using the CRISPR/Cas9 composition.
The present disclosure discloses a method for screening the SpCas9 variant.
One aspect of the present disclosure provides a Streptococcus pyogenes SpCas9 variant, represented by an amino acid sequence with six or more amino acid residue differences compared to SEQ ID NO: 1 which is an amino acid sequence of wild-type SpCas9 protein.
The SpCas9 variant may include one selected from the following mutations compared to the wild-type SpCas9 protein:
The SpCas9 variant including the L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q mutation may include a sequence having at least 80% identity to the amino acid sequence of SEQ ID NO: 3, or a sequence having 100% identity to the amino acid sequence of SEQ ID NO: 3. In this case, the SpCas9 variant may recognize a PAM sequence of 5′-NGN-3′.
The SpCas9 variant including the L1111R/D1135V/G1218Q/E1219Q/A1322R/R1333P/T1337L mutation may include a sequence having at least 80% sequence identity or sequence similarity with the amino acid sequence of SEQ ID NO: 4. In this case, the SpCas9 variant may recognize a PAM sequence of 5′-NNG-3′.
The SpCas9 variant including the L1111R/D1135V/G1218R/E1219F/A1322R/R1333G/R1335H/T1337C mutation may include a sequence having at least 80% sequence identity or sequence similarity with the amino acid sequence of SEQ ID NO: 5. In this case, the SpCas9 variant may be PAMless.
The SpCas9 variant including the L1111R/D1135V/G1218M/E1219T/A1322R/R1333P/R1335Y/T1337L mutation may include a sequence having at least 80% sequence identity or sequence similarity with the amino acid sequence of SEQ ID NO: 6. In this case, the SpCas9 variant may be PAMless.
The present disclosure provides a CRISPR/Cas9 composition.
The CRISPR/Cas9 composition may include the SpCas9 variant, or a nucleic acid encoding the SpCas9 variant; and a guide RNA or a nucleic acid encoding the guide RNA. The guide RNA may include crRNA and tracrRNA. The guide RNA may form a complex by interacting with the SpCas9 variant. The guide RNA may bind to a target sequence of a target gene.
The SpCas9 variant may include the L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q mutation.
The SpCas9 variant may include the L1111R/D1135V/G1218Q/E1219Q/A1322R/R1333P/T1337L mutation.
The SpCas9 variant may include the L1111R/D1135V/G1218R/E1219F/A1322R/R1333G/R1335H/T1337C mutation.
The SpCas9 variant may include the L1111R/D1135V/G1218M/E1219T/A1322R/R1333P/R1335Y/T1337L mutation.
The crRNA may include a guide domain and a direct repeat. The sequence of the direct repeat may be at least 90% identical to SEQ ID NO: 7. The sequence of the tracrRNA may be at least 90% identical to SEQ ID NO: 8.
The CRISPR/Cas9 composition includes the SpCas9 variant and the guide RNA, and the SpCas9 variant and the guide RNA may be in the form of ribonucleoprotein (RNP).
The CRISPR/Cas9 composition may include a vector that includes a nucleic acid encoding the SpCas9 variant and/or a nucleic acid encoding the guide RNA.
The present disclosure provides a method for gene editing, including introducing a CRISPR/Cas9 composition into a target subject for gene editing.
The target subject for gene editing may be a plant, animal, plant tissue, animal tissue, prokaryotic cell, or eukaryotic cell.
The introduction may be performed via an injection, transfusion, implantation, or transplantation.
The introduction may be performed via electroporation, gene gun, sonoporation, magnetofection, temporary cell squeezing, cationic liposome method, lithium acetate-DMSO, lipid-mediated transfection, calcium phosphate precipitation, PEI (polyethyleneimine)-mediated transfection, DEAE-dextran-mediated transfection, or nanoparticle-mediated nucleic acid delivery.
The introduction may be performed via a route selected from subretinal, subcutaneous, intradermal, intraocular, intravitreal, intratumoral, intranodal, intramedullary, intramuscular, intravenous, intralymphatic, and intraperitoneal routes.
The SpCas9 variant provided in the present disclosure can recognize a PAM sequence different from that of a wild-type SpCas9 protein, thereby enabling cleavage of target sequences adjacent to a PAM sequence other than 5′-NGG-3′.
Hereinafter, the content of the disclosure will be described in more detail through specific exemplary embodiments and examples with reference to the accompanying drawings. It should be noted that the accompanying drawings include some exemplary embodiments of the disclosure, but not all exemplary embodiments. The content disclosed by the present disclosure can be implemented variously, and is not limited to specific exemplary embodiments described herein. These embodiments should be construed as being provided to satisfy the legal requirements applicable herein. A person with ordinary skill in the art to which the invention disclosed herein pertains will be able to conceive of many modifications and other exemplary embodiments of the content of the invention disclosed herein. Therefore, it should be understood that the content of the invention disclosed herein is not limited to the specific exemplary embodiments described herein, and modifications thereof and other exemplary embodiments are also within the scope of the claim.
As used herein, the term “about” refers to an amount, a level, a value, a number, a frequency, a percentage, a dimension, a size, a quantity, a weight, or a length that varies to the degree of 30, 25, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1% with respect to a reference amount, level, value, number, frequency, percentage, dimension, size, quantity, weight, or length.
Unless otherwise stated, when describing the sequence of a peptide in the present specification, one-letter notation or three-letter notation of an amino acid is used, and it is described in the direction from the N-terminus to the C-terminus. For example, when expressed as RNVP, it refers to a peptide in which arginine, asparagine, valine, and proline are sequentially linked in the direction from the N-terminus to the C-terminus. As another example, when expressed as Thr-Leu-Lys, it refers to a peptide in which threonine, leucine, and lysine are sequentially linked in the direction from the N-terminus to the C-terminus. In the case of amino acids that cannot be represented by one-letter notation, other letters are used to describe these amino acids, and will be described via additional description.
The notations for the amino acids are as follows: alanine (Ala, A); arginine (Arg, R); asparagine (Asn, N); aspartic acid (Asp, D); cysteine (Cys, C); glutamic acid (Glu, E); glutamine (Gln, Q); glycine (Gly, G); histidine (His, H); isoleucine (Ile, I); leucine (Leu, L); lysine (Lys, K); methionine (Met, M); phenylalanine (Phe, F); proline (Pro, P); serine (Ser, S); threonine (Thr, T); tryptophan (Trp, W); tyrosine (Tyr, Y); and valine (Val, V).
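By way of non-limiting illustration, the correspondence between the one-letter and three-letter notations may be sketched in Python as follows. The function names `to_three_letter` and `to_one_letter` are hypothetical and are introduced only for this example; the sketch assumes that a three-letter peptide is written with hyphens between residues, as in Thr-Leu-Lys.

```python
# One-letter to three-letter codes for the 20 amino acids listed above.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(peptide: str) -> str:
    """Convert one-letter notation (N- to C-terminus) to hyphenated three-letter notation."""
    return "-".join(ONE_TO_THREE[residue] for residue in peptide.upper())


def to_one_letter(peptide: str) -> str:
    """Convert hyphenated three-letter notation to one-letter notation."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in peptide.split("-"))


print(to_three_letter("RNVP"))       # Arg-Asn-Val-Pro
print(to_one_letter("Thr-Leu-Lys"))  # TLK
```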
The symbols A, T, C, G and U used in the present specification are to be interpreted as having the meanings understood by those skilled in the art. Depending on the context and technology, it may be interpreted as a base, nucleoside or nucleotide in DNA or RNA as appropriate. For example, when meaning a base, each may be interpreted as adenine (A), thymine (T), cytosine (C), guanine (G), or uracil (U) itself, when meaning a nucleoside, each may be interpreted as adenosine (A), thymidine (T), cytidine (C), guanosine (G), or uridine (U), and when meaning a nucleotide in a sequence, it should be construed as meaning a nucleotide including each of the above nucleosides.
In the present specification, the symbol N may be appropriately interpreted as a base, nucleoside, or nucleotide in DNA or RNA, depending on context and technology. For example, when meaning a base, each may be interpreted as any one of adenine (A), thymine (T), cytosine (C), guanine (G), and uracil (U), when meaning a nucleoside, each may be interpreted as any one of adenosine (A), thymidine (T), cytidine (C), guanosine (G), and uridine (U), and when meaning a nucleotide in a sequence, it should be construed as meaning a nucleotide including each of the above nucleosides.
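By way of non-limiting illustration, the reading of the symbol N as "any one base" may be sketched in Python as follows. The helper name `matches_motif` is hypothetical, and the sketch assumes that only the symbols A, T, C, G, U, and N appear in a motif.

```python
# Bases that each symbol may stand for; N stands for any single base.
ALLOWED = {
    "A": {"A"}, "T": {"T"}, "C": {"C"}, "G": {"G"}, "U": {"U"},
    "N": {"A", "T", "C", "G", "U"},
}


def matches_motif(sequence: str, motif: str) -> bool:
    """Return True if `sequence` is a valid reading of `motif`, position by position."""
    if len(sequence) != len(motif):
        return False
    return all(
        base in ALLOWED[symbol]
        for base, symbol in zip(sequence.upper(), motif.upper())
    )


print(matches_motif("TGG", "NGG"))  # True
print(matches_motif("TGA", "NGG"))  # False
print(matches_motif("TGA", "NGN"))  # True
```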
As used herein, the term “operably linked” refers to being linked such that in the gene expression technique, a particular configuration is linked to another configuration to allow the particular configuration to function in an intended manner. For example, the case where a promoter sequence is operably linked to a coding sequence means that the promoter is linked so as to affect the intracellular transcription and/or expression of the coding sequence. Further, the term includes all meanings that can be recognized by those skilled in the art, and may be appropriately interpreted according to the context.
As used herein, “target gene” or “target nucleic acid” basically refers to an intracellular gene or nucleic acid that is the target of gene editing. “Target gene” and “target nucleic acid” may be used interchangeably, and may refer to the same subject. “Target gene” or “target nucleic acid” may refer to an endogenous gene or nucleic acid possessed by a target cell, or an exogenous gene or nucleic acid unless otherwise described, and is not particularly limited as long as it may be the target of gene editing. The target gene or target nucleic acid may be single-stranded DNA, double-stranded DNA, and/or RNA. Further, the term includes all meanings that can be recognized by those skilled in the art, and may be appropriately interpreted according to the context.
As used herein, the terms “target strand” and “non-target strand” are used to specify each strand when describing that a CRISPR/Cas9 complex acts on a double-stranded nucleic acid as a target nucleic acid. Basically, “target strand” and “non-target strand” refer to each strand of a double-stranded nucleic acid, and have complementary sequences to each other. Here, “non-target strand” refers to a strand in which a protospacer adjacent motif (PAM) recognized by the Cas9 protein is located, and “target strand” refers to a strand to which a guide RNA binds complementarily. In other words, when the CRISPR/Cas9 complex cleaves a target nucleic acid, 1) the Cas9 protein recognizes a PAM sequence present in a non-target strand, and 2) a portion of the guide RNA that is designed to target a target sequence (a so-called guide domain) forms a duplex by complementarily binding to the target strand, thereby activating the nucleic acid cleavage function of the CRISPR/Cas9 complex.
Further, the term includes all meanings that can be recognized by those skilled in the art, and may be appropriately interpreted according to the context.
As used herein, “target sequence” refers to a particular sequence that a CRISPR/Cas complex recognizes to cleave a target gene or target nucleic acid. The target sequence may be appropriately selected according to the purpose thereof. Specifically, a “target sequence” is a sequence included in a target gene or target nucleic acid sequence, and the term refers to a sequence that is complementary to a guide domain sequence included in a guide RNA or an engineered guide RNA provided in the present specification. In general, the guide domain sequence is determined in consideration of a target gene or target nucleic acid sequence and a PAM sequence recognized by an effector protein of a CRISPR/Cas system. “Target sequence” refers to a sequence included in the target strand which complementarily binds to the guide RNA of the CRISPR/Cas complex.
As used herein, “non-target sequence” refers to a sequence having complementarity to the target sequence. The non-target sequence is a sequence included in the non-target strand, and when present as a double strand, it is generally bound to the target sequence. In addition, the non-target sequence is adjacent to the PAM sequence.
The term includes all meanings that can be recognized by those skilled in the art, and may be appropriately interpreted according to the context.
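By way of non-limiting illustration, the relationship among the non-target sequence, the target sequence, and the guide domain may be sketched in Python as follows. The example sequence and the function names are hypothetical. The sketch relies on the fact that, because the guide domain is complementary to the target sequence, it has the same base order as the non-target sequence, with U in place of T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def guide_domain_for(non_target_sequence: str) -> str:
    """RNA guide domain that binds complementarily to the target sequence."""
    return non_target_sequence.upper().replace("T", "U")


non_target = "GATTACAGATTACAGATTAC"      # hypothetical non-target sequence, 5'->3'
target = reverse_complement(non_target)  # target sequence on the target strand, 5'->3'

print(target)                        # GTAATCTGTAATCTGTAATC
print(guide_domain_for(non_target))  # GAUUACAGAUUACAGAUUAC
```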
As used herein, “vector” collectively refers to all materials capable of carrying a genetic material into a cell, unless otherwise specified. For example, a vector may be a DNA molecule including a genetic material of interest, for example, a nucleic acid encoding a Cas protein of the CRISPR/Cas system, and/or a nucleic acid encoding a guide RNA, but is not limited thereto. The term includes all meanings that can be recognized by those skilled in the art, and may be appropriately interpreted according to the context.
As used herein, “non-homologous end joining (NHEJ)” refers to a method of restoring or repairing a double-stranded break in DNA by joining together both ends of a cleaved double or single strand, and generally, the broken double strand is recovered when two compatible ends formed by a double-stranded break (for example, cleavage) repeatedly come into frequent contact to fully join the two ends. NHEJ is a restoring method that can occur at any stage of the cell cycle, and occurs primarily when there is no homologous gene used as a template in the cell, such as in the G1 phase. The process of repairing damaged genes or nucleic acids using NHEJ may result in the insertion and/or deletion of a partial nucleic acid sequence (indel) at an NHEJ repair site. The term includes all meanings that can be recognized by those skilled in the art, and may be appropriately interpreted according to the context.
As used herein, “homology-directed repair (HDR)” refers to a method by which a damaged gene or nucleic acid can be corrected without error by using a homologous sequence as a template. In general, to repair or restore damaged DNA, that is, to restore the original information that the cell possesses, the damaged DNA is repaired or restored using the information of an unaltered complementary base sequence or of sister chromatids. The most common form of HDR is homologous recombination (HR). HDR is a repair or restoration method that occurs primarily during the S or G2/M phase of actively dividing cells.
To repair or restore damaged DNA using HDR, instead of using complementary base sequences or sister chromatids which cells originally possess, an artificially synthesized DNA template may be used using complementary or homologous base sequence information. That is, the damaged DNA may be repaired or restored by providing a nucleic acid template including a complementary or homologous base sequence to cells. In this case, when damaged DNA is repaired or restored by additionally including a nucleic acid sequence or nucleic acid fragment in the nucleic acid template, the additionally included nucleic acid sequence or nucleic acid fragment may be knocked-in into the damaged DNA. The term includes all meanings that can be recognized by those skilled in the art, and may be appropriately interpreted according to the context.
The CRISPR/Cas9 system has target-specific nucleic acid cleavage activity, but two conditions are required to exhibit such target-specific nucleic acid cleavage activity. First, there must be a base sequence having a certain length, which the Cas9 protein is capable of recognizing in a nucleic acid.
Second, there must be a sequence capable of complementarily binding to a guide domain included in a guide RNA around the base sequence having the certain length. When these two conditions are satisfied to allow 1) the Cas9 protein to recognize the base sequence having a certain length and 2) the guide domain to complementarily bind to the sequence part around the base sequence having a certain length, the nucleic acid cleavage activity is exhibited. In this case, the base sequence having the certain length recognized by the Cas9 protein is called a protospacer adjacent motif (PAM) sequence.
The PAM sequence is a unique sequence determined by the Cas9 protein. When the PAM sequence of the Cas9 protein is known, the PAM sequence may be used to design a CRISPR/Cas9 system which targets nucleic acids of predetermined target sequences surrounding the PAM sequence.
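By way of non-limiting illustration, the use of a known PAM sequence to locate candidate target sites may be sketched in Python as follows. The sketch assumes, as is conventional for SpCas9, that the PAM lies immediately 3′ of the sequence matched by the guide domain on the strand being scanned; the guide length of 20 nt, the example strand, and the function names are assumptions made only for this example. Only the given strand is scanned; the complementary strand would be scanned separately.

```python
import re


def pam_to_regex(pam: str) -> str:
    """Translate a PAM motif such as 'NGG' into a regex; N matches any DNA base."""
    return "".join("[ACGT]" if symbol == "N" else symbol for symbol in pam.upper())


def find_sites(strand: str, pam: str = "NGG", guide_len: int = 20):
    """Yield (start, protospacer, pam_found) for each position on `strand` (5'->3')
    where a `guide_len`-nt stretch is immediately followed by the PAM.
    A lookahead is used so that overlapping sites are all reported.
    """
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})({pam_to_regex(pam)}))")
    for match in pattern.finditer(strand.upper()):
        yield match.start(), match.group(1), match.group(2)


# Hypothetical strand: a 20-nt stretch, then AGG, then TT.
strand = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"

print(list(find_sites(strand, "NGG")))       # one site, PAM AGG
print(len(list(find_sites(strand, "NGN"))))  # 2 (PAMs AGG and GGT)
```

In this toy strand, relaxing the PAM from 5′-NGG-3′ to 5′-NGN-3′ doubles the number of candidate sites, which is the practical benefit the expanded PAM range is intended to provide.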
As used herein, “NLS” refers to a peptide having a certain length, which serves as a type of “tag” by being attached to a protein to be transported, or a sequence thereof, when a material outside the cell nucleus is transported into the nucleus by a nuclear transport action. Specifically, the NLS may be an NLS of SV40 virus large T-antigen having an amino acid sequence PKKKRKV (SEQ ID NO: 10); an NLS from a nucleoplasmin (for example, a nucleoplasmin bipartite NLS having a sequence KRPAATKKAGQAKKKK (SEQ ID NO: 69)); a c-myc NLS having an amino acid sequence PAAKRVKLD (SEQ ID NO: 70) or RQRRNELKRSP (SEQ ID NO: 71); an hRNPA1 M9 NLS having a sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 72); a sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 73) of an IBB domain from importin-alpha; sequences VSRKRPRP (SEQ ID NO: 74) and PPKKARED (SEQ ID NO: 75) of a myoma T protein; a sequence PQPKKKPL (SEQ ID NO: 76) of human p53; a sequence SALIKKKKKMAP (SEQ ID NO: 77) of mouse c-abl IV; sequences DRLRR (SEQ ID NO: 78) and PKQKKRK (SEQ ID NO: 79) of an influenza virus NS1; a sequence RKLKKKIKKL (SEQ ID NO: 80) of a hepatitis virus delta antigen; a sequence REKKKFLKRR (SEQ ID NO: 81) of a mouse Mx1 protein; a sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 82) of a human poly (ADP-ribose) polymerase; or an NLS sequence derived from a sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 83) of a steroid hormone receptor (human) glucocorticoid, but is not limited thereto. As used herein, the term “tag” includes all meanings that can be recognized by those skilled in the art, and may be appropriately interpreted according to the context.
As used herein, “amino acid residue” refers to a collective term for amino acid moieties other than —H and —OH which are removed when a peptide bond is formed through a condensation reaction as a structural unit of a polypeptide. That is, “amino acid residue” refers to a group other than an atomic group removed during bonding. For example, when a protein consists of a total of 1368 amino acids from the N-terminus to the C-terminus, the protein may be expressed as consisting of 1368 amino acid residues. Specifically, the wild-type SpCas9 protein consists of 1,368 amino acid residues.
For convenience of explanation, amino acid residues (amino acid moieties other than —H and —OH) may be described using general amino acid sequence notation. For example, in the wild-type SpCas9 protein, the 1218th amino acid residue in the direction from the N-terminus to the C-terminus may be expressed as glycine (Gly, G).
In the present disclosure, when describing what a specific amino acid residue in the SpCas9 protein is, the amino acid may be described using the position of the specific amino acid residue and the one-letter amino acid notation. For example, when the order of amino acid residues in the direction from the N-terminus to the C-terminus of a protein is expressed with numbers, if the 1218th amino acid residue is glycine (Gly; G), the protein can be said to include the amino acid residue “G1218.”
In the present disclosure, when describing a mutation in a SpCas9 variant related to a wild-type SpCas9 protein, the mutation may be described using the amino acid residue at which the mutation occurs and the amino acid which is substituted for the corresponding position. For example, when a wild-type SpCas9 protein includes a G1218 amino acid residue and the 1218th amino acid in a SpCas9 variant includes lysine (Lys, K), the SpCas9 variant may be described as including a G1218K mutation. That is, a variant in which the 1218th amino acid, glycine, in the amino acid sequence constituting the wild-type SpCas9 protein is substituted with lysine is represented by “G1218K.”
Furthermore, when a SpCas9 variant simultaneously includes the G1218K, E1219V, and R1335Q mutations, the SpCas9 variant may be expressed as including the “G1218K/E1219V/R1335Q” mutation.
Further, the term includes all meanings that can be recognized by those skilled in the art, and may be appropriately interpreted according to the context.
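By way of non-limiting illustration, the mutation notation described above may be parsed and applied to an amino acid sequence as in the following Python sketch. The function names and the five-residue toy peptide are hypothetical and do not represent the SpCas9 sequence; positions are counted from 1 in the direction from the N-terminus to the C-terminus.

```python
import re
from typing import List, Tuple

# Wild-type residue, position, substituted residue, e.g. G1218K.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(notation: str) -> List[Tuple[str, int, str]]:
    """Split 'G1218K/E1219V' into [('G', 1218, 'K'), ('E', 1219, 'V')]."""
    parsed = []
    for token in notation.split("/"):
        match = MUTATION.match(token.strip())
        if match is None:
            raise ValueError(f"not a substitution: {token!r}")
        wild, position, new = match.groups()
        parsed.append((wild, int(position), new))
    return parsed


def apply_mutations(sequence: str, notation: str) -> str:
    """Return a copy of `sequence` with each substitution applied (1-based positions)."""
    residues = list(sequence)
    for wild, position, new in parse_mutations(notation):
        found = residues[position - 1]
        if found != wild:
            raise ValueError(f"expected {wild} at position {position}, found {found}")
        residues[position - 1] = new
    return "".join(residues)


print(parse_mutations("G1218K/E1219V/R1335Q"))
# Toy peptide: G at position 2, E at position 3.
print(apply_mutations("MGERT", "G2K/E3V"))  # MKVRT
```

Checking the wild-type residue before substituting guards against applying a mutation list to a sequence with a different numbering.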
The CRISPR/Cas system is a type of immune system found in prokaryotic organisms and includes a Cas protein and a guide RNA. The configuration of the Cas protein and the guide RNA is described in detail in International Publication No. WO2018/231018. As used herein, the term “Cas protein” is a general term for a nuclease that may be used in the CRISPR/Cas system. Hereinafter, the DNA cleavage process of the most commonly used CRISPR/Cas9 system will be briefly described.
In the CRISPR/Cas9 complex, a protein with nuclease activity that cleaves nucleic acids is called a Cas9 protein. The Cas9 protein corresponds to Class 2, Type II in the CRISPR/Cas system classification, and examples thereof include Cas9 proteins derived from Streptococcus pyogenes, Streptococcus thermophilus, Streptococcus sp., Streptomyces pristinaespiralis, Streptomyces viridochromogenes, Streptosporangium roseum, and the like. The present disclosure discloses a variant of a Streptococcus pyogenes-derived Cas9 protein.
In the CRISPR/Cas9 complex, an RNA having a function of guiding the CRISPR/Cas9 complex to recognize a specific sequence included in a target nucleic acid is referred to as a guide RNA. The guide RNA may be generally described in the art as consisting of crRNA and tracrRNA.
In addition, the component of the guide RNA may be broadly divided functionally into 1) a scaffold component and 2) a guide domain component. Generally, the scaffold component includes tracrRNA and direct repeat, and the guide domain component and a portion of the direct repeat are included in the crRNA. The scaffold component is a part which interacts with the Cas9 protein, and is a part which can interact with the Cas9 protein to form a complex. The scaffold component is determined by the type of microorganism from which the Cas9 protein is derived. The guide domain component is a part capable of complementarily binding to a nucleotide sequence part with a certain length in a target nucleic acid, and may have a length of about 15 to 30 nt. The guide domain component is an artificially variable sequence, and is determined by a target nucleotide sequence of interest.
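By way of non-limiting illustration, the functional division of the guide RNA into a guide domain component and a scaffold component may be sketched in Python as follows. The class name `GuideRNA` and the short scaffold strings are hypothetical placeholders and are not the sequences of SEQ ID NO: 7 or SEQ ID NO: 8. The sketch also assumes the arrangement conventionally described for SpCas9, in which the guide domain precedes the direct repeat in the crRNA.

```python
from dataclasses import dataclass

RNA_BASES = set("ACGU")


@dataclass(frozen=True)
class GuideRNA:
    guide_domain: str   # variable part, chosen according to the target sequence
    direct_repeat: str  # scaffold part carried by the crRNA
    tracrrna: str       # scaffold part supplied by the tracrRNA

    def __post_init__(self):
        for name in ("guide_domain", "direct_repeat", "tracrrna"):
            if not set(getattr(self, name)) <= RNA_BASES:
                raise ValueError(f"{name} must contain only A, C, G, U")
        if not 15 <= len(self.guide_domain) <= 30:
            raise ValueError("guide domain should be about 15 to 30 nt long")

    @property
    def crrna(self) -> str:
        """crRNA written 5'->3': guide domain followed by the direct repeat."""
        return self.guide_domain + self.direct_repeat


# Placeholder scaffold strings, for illustration only.
guide = GuideRNA("GAUUACAGAUUACAGAUUAC", "ACGUACGU", "UGCAUGCA")
print(len(guide.guide_domain))  # 20
print(guide.crrna)
```

Only the guide domain changes from one target to another; the scaffold strings stay fixed for a given Cas9 protein.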
Process in which CRISPR/Cas9 Complex Cleaves Target Nucleic Acid
When the CRISPR/Cas9 complex comes into contact with the target nucleic acid, the Cas9 protein recognizes a nucleotide sequence with a certain length (PAM sequence), and a portion of the guide RNA (the guide domain component) complementarily binds to a target sequence (a part of the double strand of the target nucleic acid, which complementarily binds to a non-target sequence, which is a part adjacent to the PAM sequence), and the target nucleic acid is cleaved by the CRISPR/Cas9 complex. In this case, the nucleotide sequence with a certain length, which the Cas9 protein recognizes, is referred to as a protospacer-adjacent motif (PAM) sequence, which is determined by the type and origin of the Cas9 protein. For example, the Streptococcus pyogenes-derived Cas9 protein may recognize the 5′-NGG-3′ sequence in the target nucleic acid. In this case, N is one of adenosine (A), thymidine (T), cytidine (C), and guanosine (G). In order for the CRISPR/Cas9 complex to cleave the target nucleic acid, the guide domain component of the guide RNA must complementarily bind to the target sequence (a part which complementarily binds to the non-target sequence which is a part adjacent to the PAM sequence in the double strand of the target nucleic acid). Therefore, the guide domain component is designed and used according to the sequence of the target nucleic acid, specifically, the sequence adjacent to the PAM sequence. When the CRISPR/Cas9 complex cleaves the target nucleic acid, any position in a double-stranded region including the PAM sequence part of the target nucleic acid and/or a sequence which complementarily binds to the guide domain is cleaved.
Streptococcus pyogenes-Derived Cas9 Protein
A Streptococcus pyogenes-derived Cas9 protein is also referred to as SpCas9, and is one of the orthologs of the Cas9 protein. The wild-type SpCas9 protein may recognize the 5′-NGG-3′ sequence in the target nucleic acid as a PAM sequence. The amino acid sequence of the wild-type SpCas9 protein is as follows:
The wild-type SpCas9 protein has an advantage of having higher gene editing efficiency than other types of Cas9 proteins. However, since a PAM sequence that wild-type SpCas9 is capable of recognizing is limited to 5′-NGG-3′, nucleic acid sequences cannot be edited at positions which do not have a 5′-NGG-3′ PAM sequence nearby. That is, there is a problem in that the locations where gene editing can be performed using wild-type SpCas9 are limited. To overcome these problems, various attempts have been made in the art to produce a SpCas9 protein capable of recognizing a PAM sequence other than 5′-NGG-3′, and new mutants have been discovered accordingly. The present disclosure discloses a new SpCas9 variant.
The present disclosure discloses a SpCas9 variant. The SpCas9 variant has an amino acid sequence that differs partially from that of the wild-type SpCas9 protein. As a specific example, when the amino acid sequences of the SpCas9 variant and the wild-type SpCas9 protein are compared, the SpCas9 variant and the wild-type SpCas9 protein differ in 6, 7, or 8 amino acid residues.
The SpCas9 variant may recognize a PAM sequence different from that of the wild-type SpCas9 protein. As a specific example, the SpCas9 variant may recognize a PAM sequence other than 5′-NGG-3′.
As an example, the SpCas9 variant may cleave a target sequence adjacent to a 5′-NGN-3′ sequence. As another example, the SpCas9 variant may cleave a target sequence adjacent to a 5′-NNG-3′ sequence. As another example, the SpCas9 variant may be PAMless.
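By way of non-limiting illustration, the breadth of each PAM requirement may be compared by counting how many of the 64 possible three-base sequences satisfy it, as in the following Python sketch. The function names are hypothetical, and the motif NNN is used here only as a stand-in for a PAMless variant. The count treats every three-base sequence equally and therefore says nothing about base composition in a particular genome or about cleavage efficiency at a given site.

```python
from itertools import product


def accepts(pam: str, triplet: str) -> bool:
    """True if `triplet` satisfies `pam`, where N accepts any base."""
    return all(p == "N" or p == base for p, base in zip(pam, triplet))


TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]  # all 64 DNA triplets


def count_accepted(pam: str) -> int:
    """Number of three-base sequences that satisfy the PAM motif."""
    return sum(accepts(pam, triplet) for triplet in TRIPLETS)


for pam in ("NGG", "NGN", "NNG", "NNN"):
    print(pam, count_accepted(pam), "of", len(TRIPLETS))
# NGG 4 of 64
# NGN 16 of 64
# NNG 16 of 64
# NNN 64 of 64
```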
The SpCas9 variant differs in at least one amino acid residue of the G1218, E1219, R1333, R1335, and T1337 amino acid residues when compared to the wild-type SpCas9 protein. The G1218, E1219, R1333, R1335, and T1337 amino acid residues are amino acid residues which are involved in the recognition of the SpCas9 protein to the PAM sequence.
The SpCas9 variant differs in at least one amino acid residue of the G1218 and E1219 amino acid residues when compared to the wild-type SpCas9 protein. The G1218 and E1219 amino acid residues are amino acid residues related to the capability of hydrophobic interactions with a portion of the ribose of the PAM sequence located in the genome.
Furthermore, the SpCas9 variant differs in at least one amino acid residue of the R1333, R1335, and T1337 amino acid residues when compared to the wild-type SpCas9 protein. The R1333, R1335, and T1337 amino acid residues are amino acid residues involved in the capability to directly recognize and bind to the PAM sequence.
Further, the SpCas9 variant differs in L1111, D1135, and A1322 amino acid residues when compared to the wild-type SpCas9 protein. The SpCas9 variant includes the L1111R/D1135V/A1322R mutation when compared to the wild-type SpCas9 protein.
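By way of non-limiting illustration, the four mutation combinations recited in the present disclosure may be checked against the statements above with the following Python sketch. The variable and function names are hypothetical; the mutation strings are taken from the present disclosure.

```python
# The four mutation combinations recited in the present disclosure.
VARIANT_MUTATIONS = [
    "L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q",
    "L1111R/D1135V/G1218Q/E1219Q/A1322R/R1333P/T1337L",
    "L1111R/D1135V/G1218R/E1219F/A1322R/R1333G/R1335H/T1337C",
    "L1111R/D1135V/G1218M/E1219T/A1322R/R1333P/R1335Y/T1337L",
]

mutation_sets = [set(combo.split("/")) for combo in VARIANT_MUTATIONS]


def positions(mutations) -> set:
    """Residue positions touched by a set of substitutions such as {'G1218K'}."""
    return {int(mutation[1:-1]) for mutation in mutations}


# Number of residues differing from the wild type in each variant.
print([len(s) for s in mutation_sets])          # [6, 7, 8, 8]

# Substitutions shared by all four variants.
print(sorted(set.intersection(*mutation_sets)))  # ['A1322R', 'D1135V', 'L1111R']

# Each variant changes at least one of G1218/E1219 and at least one of R1333/R1335/T1337.
for s in mutation_sets:
    print(bool(positions(s) & {1218, 1219}), bool(positions(s) & {1333, 1335, 1337}))
```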
The L1111R/D1135V/A1322R mutation is a mutated portion common with a known mutant Nureki-NG Cas9 protein. The amino acid sequence of the Nureki-NG Cas9 protein is as follows:
Hereinafter, specific examples of the SpCas9 variants will be described in detail.
As an embodiment, the SpCas9 variant may include, when compared to the wild-type SpCas9 protein, L1111R, D1135V, and A1322R mutations, and may include mutations in which the G1218, E1219, and R1335 amino acid residues are substituted with other amino acids. As a specific example, the SpCas9 variant may include L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q mutation.
As a specific example, the SpCas9 variant includes the following substitutions compared to the wild type SpCas9 protein:
The SpCas9 variant including the L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q mutation may recognize a PAM sequence which is 5′-NGN-3′. As a specific example, the SpCas9 variant including the L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q mutation may cleave a non-target sequence and/or a target sequence adjacent to a PAM sequence which is 5′-NGN-3′.
As a specific example, the amino acid sequence of the SpCas9 variant including the L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q mutation may be as follows:
As another specific example, the SpCas9 variant including the L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q mutation may comprise an amino acid sequence having at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, identity to an amino acid sequence of SEQ ID NO: 3.
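By way of non-limiting illustration, the meaning of a percent identity threshold may be worked through with the following Python sketch. The sketch compares two sequences of equal length position by position, which is a simplification; in practice, identity is generally determined after aligning the sequences. The function name and the toy peptides are hypothetical, and the reference length of 1,368 residues is assumed from the length stated for the wild-type SpCas9 protein.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions with identical residues in two equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles non-empty, equal-length sequences")
    same = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * same / len(seq_a)


print(percent_identity("MKVRT", "MGERT"))  # 60.0 (3 of 5 positions identical)

LENGTH = 1368  # assumed reference length, in residues

# Six substitutions relative to a 1,368-residue reference.
print(round(100.0 * (LENGTH - 6) / LENGTH, 2))  # 99.56

# Smallest number of identical positions that still gives at least 80% identity,
# computed with integer arithmetic (ceiling of 0.8 * LENGTH).
min_identical = (80 * LENGTH + 99) // 100
print(min_identical, LENGTH - min_identical)  # 1095 273
```

Under these assumptions, a sequence with at least 80% identity to a 1,368-residue reference may differ from it at up to 273 positions.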
As an embodiment, the SpCas9 variant may include, when compared to the wild-type SpCas9 protein, L1111R, D1135V, and A1322R mutations, and may include mutations in which the G1218, E1219, R1333, and T1337 amino acid residues are substituted with other amino acids. As a specific example, the SpCas9 variant may include L1111R/D1135V/G1218Q/E1219Q/A1322R/R1333P/T1337L mutation.
As a specific example, the SpCas9 variant includes the following substitutions compared to the wild type SpCas9 protein:
The SpCas9 variant including the L1111R/D1135V/G1218Q/E1219Q/A1322R/R1333P/T1337L mutation may recognize a PAM sequence which is 5′-NNG-3′. As a specific example, the SpCas9 variant including the L1111R/D1135V/G1218Q/E1219Q/A1322R/R1333P/T1337L mutation may cleave a non-target sequence and/or a target sequence adjacent to a PAM sequence which is 5′-NNG-3′.
As a specific example, the amino acid sequence of the SpCas9 variant including the L1111R/D1135V/G1218Q/E1219Q/A1322R/R1333P/T1337L mutation may be as follows:
As another specific example, the SpCas9 variant including the L1111R/D1135V/G1218Q/E1219Q/A1322R/R1333P/T1337L mutation may comprise an amino acid sequence having at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with an amino acid sequence of SEQ ID NO: 4.
As an embodiment, the SpCas9 variant may include, when compared to the wild-type SpCas9 protein, L1111R, D1135V, and A1322R mutations, and may include mutations in which the G1218, E1219, R1333, R1335, and T1337 amino acid residues are substituted with other amino acids. As a specific example, the SpCas9 variant may include L1111R/D1135V/G1218R/E1219F/A1322R/R1333G/R1335H/T1337C mutation.
As a specific example, the SpCas9 variant includes the following substitutions compared to the wild type SpCas9 protein:
The SpCas9 variant including the L1111R/D1135V/G1218R/E1219F/A1322R/R1333G/R1335H/T1337C mutation may be PAMless. As a specific example, the SpCas9 variant including the L1111R/D1135V/G1218R/E1219F/A1322R/R1333G/R1335H/T1337C mutation may target and cleave a desired target sequence regardless of the specific PAM sequence.
As a specific example, the amino acid sequence of the SpCas9 variant including the L1111R/D1135V/G1218R/E1219F/A1322R/R1333G/R1335H/T1337C mutation may be as follows:
As another specific example, the SpCas9 variant including the L1111R/D1135V/G1218R/E1219F/A1322R/R1333G/R1335H/T1337C mutation may comprise an amino acid sequence having at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with an amino acid sequence of SEQ ID NO: 5.
As an embodiment, the SpCas9 variant may include, when compared to the wild-type SpCas9 protein, L1111R, D1135V, and A1322R mutations, and may include mutations in which the G1218, E1219, R1333, R1335, and T1337 amino acid residues are substituted with other amino acids. As a specific example, the SpCas9 variant may include L1111R/D1135V/G1218M/E1219T/A1322R/R1333P/R1335Y/T1337L mutation.
As a specific example, the SpCas9 variant includes the following substitutions compared to the wild-type SpCas9 protein:
The SpCas9 variant including the L1111R/D1135V/G1218M/E1219T/A1322R/R1333P/R1335Y/T1337L mutation may be PAMless. As a specific example, the SpCas9 variant including the L1111R/D1135V/G1218M/E1219T/A1322R/R1333P/R1335Y/T1337L mutation may target and cleave a desired target sequence regardless of the specific PAM sequence.
As a specific example, the amino acid sequence of the SpCas9 variant including the L1111R/D1135V/G1218M/E1219T/A1322R/R1333P/R1335Y/T1337L mutation may be as follows:
As another specific example, the SpCas9 variant including the L1111R/D1135V/G1218M/E1219T/A1322R/R1333P/R1335Y/T1337L mutation may comprise an amino acid sequence having at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with an amino acid sequence of SEQ ID NO: 6.
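The substitution notation used for the variants above (for example, L1111R denotes leucine at position 1111 replaced by arginine) can be read mechanically. The following Python sketch is offered for illustration only: it parses a slash-separated list of substitutions and applies it to a sequence given as a string, checking that the stated wild-type residue is present at each position. The function names are hypothetical, and positions are assumed to be 1-based as in the notation itself.

```python
import re
from typing import List, Tuple

# (wild-type residue, 1-based position, substituted residue)
Substitution = Tuple[str, int, str]

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitutions(notation: str) -> List[Substitution]:
    """Parse a string such as 'L1111R/D1135V' into substitution tuples."""
    parsed = []
    for token in notation.split("/"):
        match = _SUBSTITUTION.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


def apply_substitutions(wild_type: str, subs: List[Substitution]) -> str:
    """Return a copy of wild_type with every substitution applied."""
    residues = list(wild_type)
    for original, position, replacement in subs:
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(
                f"expected {original} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)
```

For example, applying parse_substitutions("G3K/E4V") to the toy sequence "MAGE" yields "MAKV". The same calls could be applied to the wild-type sequence of SEQ ID NO: 1 with any of the substitution lists described above.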
As an exemplary embodiment, the SpCas9 variant may further include a nuclear localization sequence (NLS).
As a specific example, an NLS may be fused to the N-terminus of the SpCas9 variant. As another specific example, an NLS may be fused to the C-terminus of the SpCas9 variant. As still another specific example, an NLS may be fused to the N-terminus and C-terminus of the SpCas9 variant. As yet another specific example, the amino acid sequence of the SpCas9 variant may comprise the sequence of the NLS.
In this case, the NLS refers to a peptide of a certain length, or a sequence thereof, which is attached to a protein to be transported and serves as a type of “tag” when a material outside the cell nucleus is transported into the nucleus by nuclear transport. Accordingly, as a specific example, the SpCas9 variant to which the NLS is fused is more likely to be transported from the outside to the inside of the cell nucleus than the SpCas9 variant to which the NLS is not fused.
The NLS may be any one of those exemplified in the NLS paragraph in <<Definitions of terms>>. As a specific example, the amino acid sequence of the NLS may be PKKKRKV (SEQ ID NO: 10).
The present disclosure discloses a CRISPR/Cas9 composition. The CRISPR/Cas9 composition includes 1) the SpCas9 variant or a nucleic acid encoding the same and 2) a guide RNA or a nucleic acid encoding the same. In this case, the CRISPR/Cas9 composition may be used in a method for editing genes. As a specific example, the CRISPR/Cas9 composition may be used when genes are edited by targeting a sequence adjacent to a PAM sequence other than 5′-NGG-3′.
The guide RNA may include crRNA and tracrRNA.
The crRNA may include a guide domain and a direct repeat. The guide domain and the direct repeat may be sequentially linked from 5′ to 3′ of the crRNA.
The guide domain is a part capable of complementarily binding to a nucleotide sequence part with a certain length in a target nucleic acid. The guide domain is an artificially variable sequence, and is determined by a target nucleotide sequence of interest.
The tracrRNA may form a CRISPR/Cas9 complex by interacting with the SpCas9 variant together with the direct repeat of crRNA.
As a specific example, the sequence of the direct repeat may include the following sequence: 5′-GUUUUAGAGCUA-3′ (SEQ ID NO: 7). As a specific example, the direct repeat may include a nucleic acid sequence having at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with a sequence of SEQ ID NO: 7.
As a specific example, the tracrRNA may include the following sequence: 5′-UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC-3′ (SEQ ID NO: 8). As a specific example, the tracrRNA may include a nucleic acid sequence having at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with a sequence of SEQ ID NO: 8.
As a specific example, the guide RNA may include the following sequence: 5′-GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU-3′ (SEQ ID NO: 9). As a specific example, the guide RNA may include a nucleic acid sequence having at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with a sequence of SEQ ID NO: 9.
As a specific example, the guide RNA may be in the form of single guide RNA (sgRNA). In this case, the single guide RNA may be a single guide RNA in which a crRNA and a tracrRNA are linked to each other via a linker (for example, a linker of a 5′-GAAA-3′ or 5′-GA-3′ sequence).
As another specific example, the guide RNA may be one in which the crRNA and the tracrRNA are not linked to each other.
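The arrangement of the single guide RNA described above (guide domain, direct repeat, linker, and tracrRNA, in the 5′ to 3′ direction) can be sketched as a simple concatenation. The following Python example is illustrative only. The direct repeat and tracrRNA strings are those given as SEQ ID NO: 7 and SEQ ID NO: 8, and the default linker is the 5′-GAAA-3′ linker mentioned above; the function name build_sgrna, the optional terminator argument, and the conversion of a DNA-style guide (T) to RNA (U) are assumptions made for the sketch.

```python
DIRECT_REPEAT = "GUUUUAGAGCUA"  # SEQ ID NO: 7
TRACR_RNA = (
    "UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC"
)  # SEQ ID NO: 8
LINKER = "GAAA"  # one of the linkers named in the text


def build_sgrna(guide_domain: str, linker: str = LINKER,
                terminator: str = "") -> str:
    """Concatenate guide domain, direct repeat, linker and tracrRNA (5' to 3').

    The guide may be written with T or U; it is normalized to RNA letters.
    """
    guide = guide_domain.upper().replace("T", "U")
    if not guide or set(guide) - set("ACGU"):
        raise ValueError("guide domain must be a non-empty A/C/G/U(T) string")
    return guide + DIRECT_REPEAT + linker + TRACR_RNA + terminator
```

With the 5′-GAAA-3′ linker and a terminator of six uridines, the portion following the guide domain corresponds to the sequence given as SEQ ID NO: 9.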
As a specific example, the CRISPR/Cas9 composition may include a vector including a nucleic acid encoding the SpCas9 variant and/or a nucleic acid encoding the guide RNA. The vector is described in detail in the following <<Configurational Form of CRISPR/Cas9 Composition-Vector>> paragraph.
As a specific example, the CRISPR/Cas9 composition may include a ribonucleoprotein (RNP) formed by the binding of the SpCas9 variant and a guide RNA. This may mean a form of a CRISPR/Cas9 complex formed by allowing the direct repeat and the tracrRNA of the guide RNA to interact with the SpCas9 variant.
As a specific example, the CRISPR/Cas9 composition may include any one or more of the following configurations 1) to 4): 1) a SpCas9 variant and a guide RNA; 2) a nucleic acid encoding the SpCas9 variant and a guide RNA; 3) a nucleic acid encoding the SpCas9 variant and a nucleic acid encoding the guide RNA; and 4) the SpCas9 variant and a nucleic acid encoding the guide RNA.
As an embodiment, the CRISPR/Cas9 composition may include a SpCas9 variant described in the paragraph <<Example 1 of SpCas9 variant-L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q>> or a nucleic acid encoding the same. The CRISPR/Cas9 composition may include a guide RNA which targets a target sequence which complementarily binds to a non-target sequence adjacent to a PAM sequence which is 5′-NGN-3′ or a nucleic acid encoding the guide RNA.
As a specific example, the guide domain of the guide RNA may include a sequence complementary to a target sequence which complementarily binds to a non-target sequence adjacent to a PAM sequence which is 5′-NGN-3′. As a specific example, the guide domain may complementarily bind to a target sequence which complementarily binds to a non-target sequence adjacent to a PAM sequence which is 5′-NGN-3′.
As a specific example, the guide domain may have a length of 11nt, 12nt, 13nt, 14nt, 15nt, 16nt, 17nt, 18nt, 19nt, 20nt, 21nt, 22nt, 23nt, 24nt, 25nt, 27nt, 28nt, 29nt, or 30nt. As an exemplary embodiment, the guide domain may have a length within a range between any two numbers selected from the immediately preceding sentence. For example, the guide domain may have a length of 18nt to 22nt.
As a specific example, the amino acid sequence of the SpCas9 variant may be a sequence of SEQ ID NO: 3.
As another specific example, the SpCas9 variant may have an amino acid sequence comprising at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with an amino acid sequence of SEQ ID NO: 3.
As an embodiment, the CRISPR/Cas9 composition may include a SpCas9 variant described in the paragraph <<Example 2 of SpCas9 variant-L1111R/D1135V/G1218Q/E1219Q/A1322R/R1333P/T1337L>> or a nucleic acid encoding the same. The CRISPR/Cas9 composition may include a guide RNA which targets a target sequence which complementarily binds to a non-target sequence adjacent to a PAM sequence which is 5′-NNG-3′ or a nucleic acid encoding the guide RNA.
As a specific example, the guide domain of the guide RNA may include a sequence complementary to a target sequence which complementarily binds to a non-target sequence adjacent to a PAM sequence which is 5′-NNG-3′. As a specific example, the guide domain may complementarily bind to a target sequence which complementarily binds to a non-target sequence adjacent to a PAM sequence which is 5′-NNG-3′.
As a specific example, the guide domain may have a length of 11nt, 12nt, 13nt, 14nt, 15nt, 16nt, 17nt, 18nt, 19nt, 20nt, 21nt, 22nt, 23nt, 24nt, 25nt, 27nt, 28nt, 29nt, or 30nt. As an exemplary embodiment, the guide domain may have a length within a range between any two numbers selected from the immediately preceding sentence. For example, the guide domain may have a length of 18nt to 22nt.
As a specific example, the amino acid sequence of the SpCas9 variant may be a sequence of SEQ ID NO: 4.
As another specific example, the SpCas9 variant may have an amino acid sequence comprising at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with an amino acid sequence of SEQ ID NO: 4.
As an embodiment, the CRISPR/Cas9 composition may include a SpCas9 variant described in the paragraph <<Example 3 of SpCas9 variant-L1111R/D1135V/G1218R/E1219F/A1322R/R1333G/R1335H/T1337C>> or a nucleic acid encoding the same. The CRISPR/Cas9 composition may include a guide RNA which targets a target sequence which complementarily binds to a non-target sequence adjacent to a PAM sequence which is 5′-NNN-3′ or a nucleic acid encoding the guide RNA.
As a specific example, the guide domain of the guide RNA may include a sequence complementary to a target sequence which complementarily binds to a non-target sequence adjacent to a PAM sequence which is 5′-NNN-3′. As a specific example, the guide domain may complementarily bind to a target sequence which complementarily binds to a non-target sequence adjacent to a PAM sequence which is 5′-NNN-3′.
As a specific example, the guide domain may have a length of 11nt, 12nt, 13nt, 14nt, 15nt, 16nt, 17nt, 18nt, 19nt, 20nt, 21nt, 22nt, 23nt, 24nt, 25nt, 27nt, 28nt, 29nt, or 30nt. As an exemplary embodiment, the guide domain may have a length within a range between any two numbers selected from the immediately preceding sentence. For example, the guide domain may have a length of 18nt to 22nt.
As a specific example, the amino acid sequence of the SpCas9 variant may be a sequence of SEQ ID NO: 5.
As another specific example, the SpCas9 variant may have an amino acid sequence comprising at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with an amino acid sequence of SEQ ID NO: 5.
As an embodiment, the CRISPR/Cas9 composition may include a SpCas9 variant described in the paragraph <<Example 4 of SpCas9 variant-L1111R/D1135V/G1218M/E1219T/A1322R/R1333P/R1335Y/T1337L>> or a nucleic acid encoding the same. The CRISPR/Cas9 composition may include a guide RNA which targets a target sequence which complementarily binds to a non-target sequence adjacent to a PAM sequence which is 5′-NNN-3′ or a nucleic acid encoding the guide RNA.
As a specific example, the guide domain of the guide RNA may include a sequence complementary to a target sequence which complementarily binds to a non-target sequence adjacent to a PAM sequence which is 5′-NNN-3′. As a specific example, the guide domain may complementarily bind to a target sequence which complementarily binds to a non-target sequence adjacent to a PAM sequence which is 5′-NNN-3′.
As a specific example, the guide domain may have a length of 11nt, 12nt, 13nt, 14nt, 15nt, 16nt, 17nt, 18nt, 19nt, 20nt, 21nt, 22nt, 23nt, 24nt, 25nt, 27nt, 28nt, 29nt, or 30nt. As an exemplary embodiment, the guide domain may have a length within a range between any two numbers selected from the immediately preceding sentence. For example, the guide domain may have a length of 18nt to 22nt.
As a specific example, the amino acid sequence of the SpCas9 variant may be a sequence of SEQ ID NO: 6.
As another specific example, the SpCas9 variant may have an amino acid sequence comprising at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with an amino acid sequence of SEQ ID NO: 6.
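The relationship between the PAM patterns recited above (5′-NGN-3′, 5′-NNG-3′, and 5′-NNN-3′), the non-target sequence, and the guide domain can be illustrated with a short site-search sketch in Python. The sketch rests on assumptions that should be read as such: it takes the non-target strand as input written 5′ to 3′, it assumes that the PAM lies immediately 3′ of the non-target sequence on that strand (the arrangement generally reported for SpCas9), it treats "N" as matching any base, and it scans only the strand supplied. Because the guide domain binds the target sequence, which is complementary to the non-target sequence, the guide domain is written as the RNA form of the non-target sequence. The function names are hypothetical.

```python
from typing import Iterator, Tuple


def pam_matches(pam: str, pattern: str) -> bool:
    """True when pam fits pattern, where 'N' in the pattern matches any base."""
    return len(pam) == len(pattern) and all(
        p == "N" or p == b for p, b in zip(pattern, pam)
    )


def find_target_sites(non_target_strand: str, pam_pattern: str,
                      guide_length: int = 20) -> Iterator[Tuple[int, str, str]]:
    """Yield (start index, guide domain as RNA, PAM) for each candidate site.

    Only the supplied strand is scanned; the opposite strand would need to
    be passed in separately as its reverse complement.
    """
    seq = non_target_strand.upper()
    pattern = pam_pattern.upper()
    k = len(pattern)
    for start in range(0, len(seq) - guide_length - k + 1):
        non_target = seq[start:start + guide_length]
        pam = seq[start + guide_length:start + guide_length + k]
        if pam_matches(pam, pattern):
            yield start, non_target.replace("T", "U"), pam
```

For a variant recognizing 5′-NGN-3′, the call list(find_target_sites(sequence, "NGN")) lists candidate sites with the default 20nt guide domain; the guide_length argument may be set to any of the lengths described above, for example 18 to 22.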
The CRISPR/Cas9 composition may include vectors in various forms. The configurations and forms of the vectors which may be included in the CRISPR/Cas9 composition will be described below.
As an exemplary embodiment, the vector may include a nucleic acid encoding the SpCas9 variant and/or a nucleic acid encoding the guide RNA.
As a specific example, the SpCas9 variant and the guide RNA may be a SpCas9 variant and a guide RNA described in the paragraph <<Component Example 1 of composition-L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q>>. As a specific example, the SpCas9 variant and the guide RNA may be a SpCas9 variant and a guide RNA described in the paragraph <<Example 2 of SpCas9 variant-L1111R/D1135V/G1218Q/E1219Q/A1322R/R1333P/T1337L>>. As a specific example, the SpCas9 variant and the guide RNA may be a SpCas9 variant and a guide RNA described in the paragraph <<Example 3 of SpCas9 variant-L1111R/D1135V/G1218R/E1219F/A1322R/R1333G/R1335H/T1337C>>. As a specific example, the SpCas9 variant and the guide RNA may be a SpCas9 variant and a guide RNA described in the paragraph <<Example 4 of SpCas9 variant-L1111R/D1135V/G1218M/E1219T/A1322R/R1333P/R1335Y/T1337L>>.
As an exemplary embodiment, the vector may include a component for knock-in. In this case, the vector may include a donor. The donor may refer to a nucleic acid sequence that helps restore, through homology-directed repair (HDR), a target gene or a target nucleic acid damaged by a gene editing process. In this case, the donor may include a nucleic acid sequence for insertion into the target gene or nucleic acid.
As a specific example, the donor may include a nucleic acid sequence (homology arm) having homology to a partial base sequence in the 5′ direction (upstream) and/or 3′ direction (downstream) at a position where the nucleic acid sequence is to be inserted, for example, at the cleavage site of the damaged target nucleic acid. In this case, the nucleic acid sequence to be inserted may be located between a nucleic acid sequence having homology to a base sequence in the 5′ direction and a nucleic acid sequence having homology to a base sequence in the 3′ direction, centered on the cleaved site of the target. In this case, a nucleic acid sequence having homology may have at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95% or more homology or may have complete homology to a base sequence(s) in the 5′ direction (upstream) and/or 3′ direction (downstream) of the target nucleic acid. In this case, the size of each homology arm may be designed to any length determined to be appropriate by those skilled in the art.
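The donor layout described above, in which the sequence to be inserted is located between an upstream homology arm and a downstream homology arm centered on the cleavage site, can be sketched as follows. This Python example is illustrative only and makes simplifying assumptions: the arms are copied directly from the reference sequence (that is, complete homology, although lower homology is also contemplated above), the cleavage site is given as a 0-based index between two bases, and the default arm length of 50 is an arbitrary placeholder rather than a recommended value. The function name build_donor is hypothetical.

```python
def build_donor(reference: str, cut_index: int, insert: str,
                arm_length: int = 50) -> str:
    """Return upstream arm + insert + downstream arm around a cleavage site.

    cut_index is 0-based and marks the junction between
    reference[cut_index - 1] and reference[cut_index].
    Arms are truncated if the reference is shorter than arm_length on a side.
    """
    if not 0 <= cut_index <= len(reference):
        raise ValueError("cut_index is outside the reference sequence")
    if arm_length <= 0:
        raise ValueError("arm_length must be positive")
    upstream_arm = reference[max(0, cut_index - arm_length):cut_index]
    downstream_arm = reference[cut_index:cut_index + arm_length]
    return upstream_arm + insert + downstream_arm
```

For example, build_donor("AAAACCCCGGGGTTTT", 8, "ATG", arm_length=4) returns "CCCCATGGGGG", in which "CCCC" and "GGGG" are the homology arms and "ATG" is the sequence to be inserted.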
As a specific example, the vector may additionally include other components required to express the SpCas9 variant and/or guide RNA in cells.
For example, the other additional configurations may include expression control elements, selection elements, and the like.
Examples of the expression control element include a promoter, an enhancer, a polyadenylation signal, a Kozak consensus sequence, an inverted terminal repeat (ITR), a long terminal repeat (LTR), a terminator, an internal ribosome entry site (IRES), a 2A self-cleaving peptide, a replication origin, or the like.
Here, the promoter sequence may be designed differently depending on the corresponding RNA transcription factor or expression environment, and is not limited as long as the promoter sequence can appropriately express the element of the CRISPR/Cas system in the cell. For example, the promoter may be one of an SV40 initial promoter, a mouse mammary tumor virus long terminal repeat (LTR) promoter, an adenovirus major late promoter (Ad MLP), a herpes simplex virus (HSV) promoter, a cytomegalovirus (CMV) promoter such as a CMV immediate early promoter region (CMVIE), a Rous sarcoma virus (RSV) promoter, a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)), an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep. 1; 31 (17)), a human H1 promoter (H1), and 7SK. As a specific example, the vector may include a CMV promoter. In this case, the sequence of the CMV promoter may be 5′-CGATGTACGGGCCAGATATACGCGTTGACATTGATTATTGACTAGTTATTAATAG TAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACAT AACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTG ACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGAC GTCAATGGGTGGACTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGT ATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCT GGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTA CGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGG GCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGT CAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAAC AACTCCGCCCCATTGACGCAAATGGGGGGTAGGCGTGTACGGTGGGAGGTCTA TATAAGCAGAGCTCTCTGGCTAACTAGAGAACCCACTGCTTACTGGCTTATCGA AATT-3′ (SEQ ID NO: 11). As a specific example, the sequence of the CMV promoter may have a nucleic acid sequence having at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with a nucleic acid sequence of SEQ ID NO: 11.
For example, the 2A self-cleaving peptide may be T2A, P2A, E2A, F2A, or the like. The 2A self-cleaving peptide may be located between two or more different proteins to be expressed in the vector.
Further, the replication origin may be an f1 replication origin, an SV40 replication origin, a pMB1 replication origin, an adeno replication origin, an AAV replication origin, and/or a BBV replication origin, but is not limited thereto.
The selection element may be a fluorescent protein gene, a tag, a reporter gene, an antibiotic resistance gene, or the like.
For example, the fluorescent protein gene may be a GFP gene, a YFP gene, an RFP gene, an mCherry gene, or the like.
For example, the tag may be a histidine (His) tag, a V5 tag, a FLAG tag, an influenza hemagglutinin (HA) tag, a Myc tag, a VSV-G tag, a thioredoxin (Trx) tag, or the like.
For example, the reporter gene may be glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, or the like.
For example, the antibiotic resistance gene may be a hygromycin resistance gene, a neomycin resistance gene, a kanamycin resistance gene, a blasticidin resistance gene, a zeocin resistance gene, or the like.
As an exemplary embodiment, the vector may be a viral vector. As a specific example, the viral vector may be one or more selected from the group consisting of a retrovirus, a lentivirus, an adenovirus, an adeno-associated virus (AAV), a vaccinia virus, a poxvirus and a herpes simplex virus. As an exemplary embodiment, the viral vector may be an AAV.
As an exemplary embodiment, the vector may be a non-viral vector. As a specific example, the non-viral vector may be one or more selected from the group consisting of a plasmid, a phage, naked DNA, a DNA complex, and mRNA. As an exemplary embodiment, the plasmid may be selected from the group consisting of the pcDNA series, pS456, p326, pACYC177, ColE1, pKT230, pME290, pBR322, pUC8/9, pUC6, pBD9, pHC79, plJ61, pLAFR1, pHV14, the pGEX series, the pET series, and pUC19. As an exemplary embodiment, the phage may be selected from the group consisting of λgt4AB, λ-Charon, λΔz1, and M13. As an exemplary embodiment, the encoding nucleic acid may be a PCR amplicon.
The present disclosure discloses a method for gene editing using a CRISPR/Cas9 composition. The method for gene editing includes delivering, injecting, and/or introducing (administering) a CRISPR/Cas9 composition to a target subject for gene editing.
The target subject for gene editing may be an individual or a tissue, and may be referred to as a target individual or a target tissue. As an exemplary embodiment, the target individual may be a plant, animal, non-human animal, and/or human. Specifically, the target individual may be a mammal. As an exemplary embodiment, the target tissue may be a non-human animal tissue and/or a human tissue.
The target subject for gene editing may refer to a cell, and may be referred to as a target cell. As an exemplary embodiment, the target cell may be a prokaryotic cell. As another exemplary embodiment, the target cell may be a eukaryotic cell. Specifically, the eukaryotic cell may be a plant cell, animal cell, non-human animal cell and/or human cell.
The delivery, injection, and/or introduction method is not particularly limited as long as it can deliver the SpCas9 variant or a nucleic acid encoding the same, and the guide RNA or a nucleic acid encoding the same as any one of the configurational forms of the compositions to a cell. Those skilled in the art can perform the method by appropriately selecting a known technique.
As a specific example, the delivery, injection, and/or introduction method may be performed by injection, transfusion, implantation, or transplantation.
As a specific example, the delivery, injection, and/or introduction method may be performed via a route selected from subretinal, subcutaneous, intradermal, intraocular, intravitreal, intratumoral, intranodal, intramedullary, intramuscular, intravenous, intralymphatic, or intraperitoneal routes.
As a specific example, the delivery, injection, and/or introduction method may be electroporation, a gene gun, sonoporation, magnetofection, and/or temporary cell compression or squeezing.
As an exemplary embodiment, the delivery, injection, and/or introduction method may be delivering a SpCas9 variant or a nucleic acid encoding the SpCas9 variant and/or a guide RNA or a nucleic acid encoding the guide RNA using nanoparticles. In this case, the delivery method may be a cationic liposome method, a lithium acetate-DMSO method, lipid-mediated transfection, a calcium phosphate precipitation method, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran-mediated transfection, and/or nanoparticle-mediated nucleic acid delivery (see Panyam et al., Adv Drug Deliv Rev. 2012 Sep. 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), but is not limited thereto.
As a specific example, the lipid-mediated transfection may be based on lipid nanoparticles (LNPs) and/or PEG. As a specific example, the LNPs may include protonated ionizable lipids and/or ionizable lipids exhibiting neutrality. As a specific example, the LNPs may further include a phospholipid, cholesterol, or a PEG-linked lipid. In this case, the LNP is a particulate drug delivery vehicle that uses phospholipids, cholesterol, and the like, which are substances present in the body, and therefore has high bioavailability and affinity, enables drug release and control, and is highly stable against degradation by enzymes.
The CRISPR/Cas9 complex derived from the composition introduced into the subject comes into contact with the target nucleic acid, so that the SpCas9 variant recognizes the PAM sequence and the guide domain complementarily binds to a target sequence (a part of the double strand of the target nucleic acid, which complementarily binds to a non-target sequence, which is a part adjacent to the PAM sequence). Moreover, the target nucleic acid is cleaved by the SpCas9 variant of the CRISPR/Cas9 complex.
When the CRISPR/Cas9 complex cleaves the target nucleic acid, any position in the PAM sequence part of the target nucleic acid and/or the sequence part that complementarily binds to the guide domain is cleaved. The CRISPR/Cas9 complex may cause a double-strand break (DSB) part in the target nucleic acid to be repaired through mechanisms such as homology directed repair (HDR) or non-homologous end joining (NHEJ). In this case, when the part is repaired by HDR, the insertion of the donor may occur. In this case, when the part is repaired by NHEJ, this may result in the substitution, insertion or deletion of a short gene fragment, and the knock-out of the corresponding gene may occur.
The method for gene editing may cause an indel to be generated in a target gene or target nucleic acid. In this case, the indel may occur inside and/or outside a target sequence portion. The indel refers to a mutation in the nucleotide sequence of a nucleic acid before gene editing, where some nucleotides are deleted, an arbitrary nucleotide is inserted, and/or a combination of insertion and deletion occurs.
In general, the occurrence of an indel in the target gene or target nucleic acid sequence inactivates the corresponding gene or nucleic acid. In such a case, a protein encoded by the gene may not be expressed or may be expressed as a damaged protein, making it functionally deficient. Such an effect may be referred to as a “gene knock-out.”
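The indel outcomes described above can be illustrated by comparing a reference allele with an edited allele. The following Python sketch is a deliberately simple illustration, not an analysis method of the present disclosure: it classifies an edit only by its net change in length, and it flags a possible reading-frame shift when that net change is not a multiple of three, which is a general property of coding sequences assumed here rather than stated above. The function name classify_indel and the returned field names are hypothetical.

```python
from typing import Dict, Union


def classify_indel(reference: str, edited: str) -> Dict[str, Union[str, int, bool]]:
    """Classify an edited allele by its net length change versus the reference.

    A real analysis would align the two alleles; this sketch looks only at
    the difference in length.
    """
    delta = len(edited) - len(reference)
    if delta > 0:
        kind = "insertion"
    elif delta < 0:
        kind = "deletion"
    elif edited == reference:
        kind = "unchanged"
    else:
        kind = "substitution or length-neutral indel"
    return {
        "kind": kind,
        "net_length_change": delta,
        # Assumption: within a coding sequence, a net change that is not a
        # multiple of three shifts the reading frame.
        "frameshift": delta % 3 != 0,
    }
```

For example, a single-nucleotide deletion is reported as a deletion with a possible frameshift, whereas an in-frame insertion of three nucleotides is reported as an insertion without a frameshift.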
As a result of performing the method for gene editing, base editing may occur in the target gene or target nucleic acid. This means that, unlike an indel, in which any nucleotide in the target gene or target nucleic acid is deleted or added, one or more specific nucleotides in the nucleic acid are changed as intended. In other words, base editing causes a predetermined point mutation at a specific position in the target gene or nucleic acid. As an exemplary embodiment, as a result of performing the method for gene editing, one or more nucleotides in the target gene or target nucleic acid may be substituted with other nucleotides.
As a result of performing the method for gene editing, knock-in may occur in the target gene or target nucleic acid. The knock-in means inserting an additional nucleic acid sequence into the target gene or target nucleic acid sequence. In addition to the CRISPR/Cas9 complex, a donor including the additional nucleic acid sequence is further required for the knock-in to occur. In this case, the donor may be included in the vector described in the <<Vector for knock-in>> table of contents. When the CRISPR/Cas9 complex cleaves a target gene or target nucleic acid in the cell, repair of the cleaved target gene or target nucleic acid will occur due to homology-directed repair (HDR). In this case, the donor is involved in the repair process, so that the additional nucleic acid sequence may be inserted into the target gene or target nucleic acid. For example, the donor includes an exogenous DNA sequence for insertion into a cellular genome, and the donor may induce insertion of the exogenous DNA sequence into the target gene or the target nucleic acid.
As a result of performing the method for gene editing, all or part of the target gene or target nucleic acid sequence may be deleted. The deletion refers to the removal of a partial base sequence (nucleotide sequence) with a certain length or more from the target gene or the target nucleic acid (large deletion). The deletion, compared to the above-described indel effect, may allow the total removal of a specific region of a gene, for example, a first exon region.
As an exemplary embodiment, the method for gene editing may include delivering, injecting, and/or introducing a CRISPR/Cas9 composition described in “Component example 1 of composition-L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q” to a target subject for gene editing. As a specific example, after the CRISPR/Cas9 composition is delivered to the target subject for gene editing, the CRISPR/Cas9 complex comes into contact with the target nucleic acid, so that the SpCas9 variant recognizes a PAM sequence which is 5′-NGN-3′, and the guide domain complementarily binds to the target sequence (a part of the double strand of the target nucleic acid, which complementarily binds to a non-target sequence, which is a part adjacent to the PAM sequence), allowing the target nucleic acid to be cleaved by the CRISPR/Cas9 complex. As a specific example, when the CRISPR/Cas9 complex cleaves the target nucleic acid, any position in the PAM sequence part which is 5′-NGN-3′ of the target nucleic acid and/or the sequence part that complementarily binds to the guide domain may be cleaved. As a specific example, as a result of performing the method for gene editing, indels, base editing, insertions, and/or deletions may occur in the target gene and/or target nucleic acid. As a specific example, as a result of performing the method for gene editing, knock-in and/or knock-out may occur in the target gene and/or target nucleic acid.
As an exemplary embodiment, the method for gene editing may include delivering, injecting, and/or introducing a CRISPR/Cas9 composition described in “Component example 2 of composition-L1111R/D1135V/G1218Q/E1219Q/A1322R/R1333P/T1337L” to a target subject for gene editing. As a specific example, after the CRISPR/Cas9 composition is delivered to the target subject for gene editing, the CRISPR/Cas9 complex comes into contact with the target nucleic acid, so that the SpCas9 variant recognizes PAM sequence which is 5′-NNG-3′, and the guide domain complementarily binds to the target sequence (a part of the double strand of the target nucleic acid, which complementarily binds to a non-target sequence, which is a part adjacent to the PAM sequence), allowing the target nucleic acid to be cleaved by the CRISPR/Cas9 complex. As a specific example, when the CRISPR/Cas9 complex cleaves the target nucleic acid, any position in the PAM sequence part which is 5′-NNG-3′ of the target nucleic acid and/or the sequence part that complementarily binds to the guide domain may be cleaved. As a specific example, as a result of performing the method for gene editing, indels, base editing, insertions, and/or deletions may occur in the target gene and/or target nucleic acid. As a specific example, as a result of performing the method for gene editing, knock-in and/or knock-out may occur in the target gene and/or target nucleic acid.
As an exemplary embodiment, the method for gene editing may include delivering, injecting, and/or introducing a CRISPR/Cas9 composition described in “Component example 3 of composition-L1111R/D1135V/G1218R/E1219F/A1322R/R1333G/R1335H/T1337C” to a target subject for gene editing. As a specific example, after the CRISPR/Cas9 composition is delivered to the target subject for gene editing, the CRISPR/Cas9 complex comes into contact with the target nucleic acid, so that the SpCas9 variant recognizes PAM sequence which is 5′-NNN-3′, and the guide domain complementarily binds to the target sequence (a part of the double strand of the target nucleic acid, which complementarily binds to a non-target sequence, which is a part adjacent to the PAM sequence), allowing the target nucleic acid to be cleaved by the CRISPR/Cas9 complex. As a specific example, when the CRISPR/Cas9 complex cleaves the target nucleic acid, any position in the PAM sequence part which is 5′-NNN-3′ of the target nucleic acid and/or the sequence part that complementarily binds to the guide domain may be cleaved. As a specific example, as a result of performing the method for gene editing, indels, base editing, insertions, and/or deletions may occur in the target gene and/or target nucleic acid. As a specific example, as a result of performing the method for gene editing, knock-in and/or knock-out may occur in the target gene and/or target nucleic acid.
As an exemplary embodiment, the method for gene editing may include delivering, injecting, and/or introducing a CRISPR/Cas9 composition described in “Component example 4 of composition-L1111R/D1135V/G1218M/E1219T/A1322R/R1333P/R1335Y/T1337L” to a target subject for gene editing. As a specific example, after the CRISPR/Cas9 composition is delivered to the target subject for gene editing, the CRISPR/Cas9 complex comes into contact with the target nucleic acid, so that the SpCas9 variant recognizes PAM sequence which is 5′-NNN-3′, and the guide domain complementarily binds to the target sequence (a part of the double strand of the target nucleic acid, which complementarily binds to a non-target sequence, which is a part adjacent to the PAM sequence), allowing the target nucleic acid to be cleaved by the CRISPR/Cas9 complex. As a specific example, when the CRISPR/Cas9 complex cleaves the target nucleic acid, any position in the PAM sequence part which is 5′-NNN-3′ of the target nucleic acid and/or the sequence part that complementarily binds to the guide domain may be cleaved. As a specific example, as a result of performing the method for gene editing, indels, base editing, insertions, and/or deletions may occur in the target gene and/or target nucleic acid. As a specific example, as a result of performing the method for gene editing, knock-in and/or knock-out may occur in the target gene and/or target nucleic acid.
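The PAM patterns named in the embodiments above (5′-NGN-3′, 5′-NNG-3′, and 5′-NNN-3′) use N to denote any base. As a minimal sketch of how candidate target sites could be enumerated for a given pattern, the following Python helper scans one strand of a sequence. The function names, the 20-nt protospacer length, and the toy sequence are illustrative assumptions and are not part of the disclosure.

```python
def pam_matches(pam: str, pattern: str) -> bool:
    """Return True if `pam` fits `pattern`, where N in the pattern means any base."""
    if len(pam) != len(pattern):
        return False
    return all(p == "N" or p == b for b, p in zip(pam.upper(), pattern.upper()))


def find_target_sites(seq: str, pattern: str, guide_len: int = 20):
    """List (start, protospacer, pam) for every site whose 3' flanking bases
    match `pattern`. Only the strand as written is scanned (assumption)."""
    seq = seq.upper()
    sites = []
    for start in range(len(seq) - guide_len - len(pattern) + 1):
        pam_start = start + guide_len
        pam = seq[pam_start:pam_start + len(pattern)]
        if pam_matches(pam, pattern):
            sites.append((start, seq[start:pam_start], pam))
    return sites


if __name__ == "__main__":
    toy = "ACGTACGTACGTACGTACGTTGACCA"  # hypothetical sequence, not from the disclosure
    for pattern in ("NGG", "NGN", "NNG", "NNN"):
        print(pattern, len(find_target_sites(toy, pattern)))
```

A broader pattern admits more candidate sites in the same sequence, which is the practical benefit the variants are intended to provide.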
The present disclosure discloses a method for screening a SpCas9 variant. In this case, the SpCas9 variant is characterized by the ability to recognize a PAM sequence other than 5′-NGG-3′. As a specific example, the method may include 1) a Cas9 cell library production step and/or 2) a mutant protein selection step. In this case, the mutant protein selection step may include a primary selection step and/or a secondary selection step.
As a specific example, the Cas9 cell library production step may include a PiggyBac using step and/or a transposase using step.
The PiggyBac using step is a step of producing a library by substituting n (specifically, n may be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20) amino acid residues with another amino acid (any one of about 20 types) to clone a nucleic acid encoding an SpCas9 protein having 20^n diversity in a PiggyBac-based vector.
The transposase using step is a step of producing a cell library having 20^n diversity by transfecting cells with the library produced in the PiggyBac using step together with a transposase vector to induce integration into the genomic DNA of each cell.
As a specific example, the SpCas9 protein having 20^n diversity may include the L1111R/D1135V/A1322R mutation compared to the wild-type SpCas9 protein.
As a specific example, the residue in which the amino acid is substituted may include at least one of the G1218 and E1219 amino acid residues of the wild-type SpCas9 protein. As a specific example, the residue in which the amino acid is substituted may include at least one of the R1333, R1335, and T1337 amino acid residues of the wild-type SpCas9 protein.
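The slash-separated mutation notation used throughout (for example, L1111R/D1135V/A1322R) can be handled programmatically. The following is a minimal sketch, assuming each token has the form wild-type residue, position, substituted residue; the function names are hypothetical and introduced only for illustration.

```python
import re

BACKBONE = "L1111R/D1135V/A1322R"  # fixed mutations named in the text


def parse_mutations(notation: str):
    """Parse 'L1111R/D1135V' into [(wild_type, position, substituted), ...]."""
    parsed = []
    for token in notation.split("/"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token.strip())
        if match is None:
            raise ValueError(f"unrecognized mutation token: {token!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


def carries_backbone(variant_notation: str, backbone: str = BACKBONE) -> bool:
    """Check that every backbone mutation is present in the variant."""
    return set(parse_mutations(backbone)) <= set(parse_mutations(variant_notation))


if __name__ == "__main__":
    variant = "L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q"
    print(parse_mutations(variant))
    print(carries_backbone(variant))
```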
The primary selection step involves transfecting the produced cell library with various types of sgRNAs targeting the HPRT gene, and then treating the cells with 6-thioguanine (6TG) to allow only cells with mutations in the HPRT gene to survive. In this case, the surviving cells are those in which the SpCas9 protein reacts with sgRNA to generate an indel in the HPRT gene, and the SpCas9 transfected into the surviving cells recognizes a PAM sequence other than 5′-NGG-3′.
As a specific example, the sgRNA targets a target sequence adjacent to a PAM sequence other than 5′-NGG-3′. As a specific example, the PAM sequence other than 5′-NGG-3′ may include at least one sequence of 5′-CC-3′, 5′-TT-3′, 5′-AA-3′, 5′-GC-3′, 5′-GT-3′, and 5′-GA-3′.
The secondary selection step involves transfecting a pool of cells of the same type as the cells that survived the primary selection step (cells carrying the same transfected SpCas9 protein, but without mutations in the HPRT gene) with various types of sgRNAs targeting the HPRT gene, and then treating the cells with 6-thioguanine (6TG) to allow only cells with mutations in the HPRT gene to survive. In this case, the surviving cells are those in which the SpCas9 protein reacts with sgRNA to generate an indel in the HPRT gene, and the SpCas9 transfected into the surviving cells recognizes a PAM sequence other than 5′-NGG-3′.
As a specific example, the sgRNA targets a target sequence adjacent to a PAM sequence other than 5′-NGG-3′. As a specific example, the PAM sequence other than 5′-NGG-3′ may include at least one sequence of 5′-CC-3′, 5′-TT-3′, 5′-AA-3′, 5′-GC-3′, 5′-GT-3′, and 5′-GA-3′. As a specific example, when the sgRNA used in the secondary selection step is compared to the sgRNA used in the primary selection step, the PAM sequences adjacent to the sequences targeted by the sgRNAs may be the same, but the sequences targeted by the sgRNAs are different sequences.
Hereinafter, possible examples of the disclosure are listed. The following examples provided in this paragraph merely correspond to examples of the disclosure. Therefore, the disclosure should not be construed as being limited to the following examples. The brief description described with the example numbers is also for the convenience of dividing the examples, and cannot be construed as a limitation on the disclosure.
A Streptococcus pyogenes Cas9 (SpCas9) variant, represented by an amino acid sequence with six or more amino acid residue differences compared to SEQ ID NO:1 which is an amino acid sequence of wild-type SpCas9 protein.
The SpCas9 variant of Example 1, wherein the SpCas9 variant includes L1111R/D1135V/A1322R mutation compared to the wild-type SpCas9 protein.
The SpCas9 variant of Examples 1 and 2, wherein the SpCas9 variant includes a mutation in which any one or more of the G1218 and E1219 amino acid residues of the wild-type SpCas9 protein are substituted with other amino acids.
The SpCas9 variant of Examples 1 to 3, wherein the SpCas9 variant includes a mutation in which any one or more of the R1333, R1335, and T1337 amino acid residues of the wild-type SpCas9 protein are substituted with other amino acids.
The SpCas9 variant of Examples 1 to 4, wherein the SpCas9 variant includes L1111R/D1135V/G1218K/E1219V/A1322R/R1335Q mutation.
The SpCas9 variant of Example 5, wherein the SpCas9 variant includes a sequence having at least 80% identity to an amino acid sequence of SEQ ID NO: 3 or a sequence having 100% identity to the amino acid sequence of SEQ ID NO: 3.
The SpCas9 variant of Examples 5 and 6, wherein the SpCas9 variant is capable of recognizing PAM sequence which is 5′-NGN-3′.
The SpCas9 variant of Examples 1 to 4, wherein the SpCas9 variant includes L1111R/D1135V/G1218Q/E1219Q/A1322R/R1333P/T1337L mutation.
The SpCas9 variant of Example 8, wherein the SpCas9 variant includes a sequence having at least 80% identity to an amino acid sequence of SEQ ID NO: 4 or a sequence having 100% identity to the amino acid sequence of SEQ ID NO: 4.
The SpCas9 variant of Examples 8 and 9, wherein the SpCas9 variant is capable of recognizing PAM sequence which is 5′-NNG-3′.
The SpCas9 variant of Examples 1 to 4, wherein the SpCas9 variant includes L1111R/D1135V/G1218R/E1219F/A1322R/R1333G/R1335H/T1337C mutation.
The SpCas9 variant of Example 11, wherein the SpCas9 variant includes a sequence having at least 80% identity to an amino acid sequence of SEQ ID NO: 5 or a sequence having 100% identity to the amino acid sequence of SEQ ID NO: 5.
The SpCas9 variant of Examples 11 and 12, wherein the SpCas9 variant is PAMless.
The SpCas9 variant of Examples 1 to 4, wherein the SpCas9 variant includes L1111R/D1135V/G1218M/E1219T/A1322R/R1333P/R1335Y/T1337L mutation.
The SpCas9 variant of Example 14, wherein the SpCas9 variant includes a sequence having at least 80% identity to an amino acid sequence of SEQ ID NO: 6 or a sequence having 100% identity to the amino acid sequence of SEQ ID NO: 6.
The SpCas9 variant of Examples 14 and 15, wherein the SpCas9 variant is PAMless.
The SpCas9 variant of Examples 1 to 16, wherein the SpCas9 variant is capable of forming a CRISPR/Cas9 complex by interacting with the guide RNA.
The SpCas9 variant of Example 17,
The SpCas9 variant of Example 18, wherein the direct repeat includes a nucleic acid sequence having at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with a sequence of SEQ ID NO: 7.
The SpCas9 variant of Examples 18 and 19, wherein the tracrRNA includes a nucleic acid sequence having at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with a sequence of SEQ ID NO: 8.
A CRISPR/Cas9 composition including the SpCas9 variant of any one of Examples 1 to 20 or a nucleic acid encoding the SpCas9 variant.
The CRISPR/Cas9 composition of Example 21, wherein the CRISPR/Cas9 composition further includes a guide RNA or a nucleic acid encoding the guide RNA.
The SpCas9 variant of Example 22,
The CRISPR/Cas9 composition of Example 23, wherein the direct repeat includes a nucleic acid sequence having at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with a sequence of SEQ ID NO: 7.
The CRISPR/Cas9 composition of Example 19, wherein the tracrRNA includes a nucleic acid sequence having at least 80%, for example, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 100%, sequence identity or sequence similarity with a sequence of SEQ ID NO: 8.
The CRISPR/Cas9 composition of any one of Examples 22 to 25, wherein the SpCas9 variant and the guide RNA are capable of forming a CRISPR/Cas9 complex by interacting with each other.
The CRISPR/Cas9 composition of any one of Examples 21 to 26, wherein the SpCas9 variant and the guide RNA are in the form of ribonucleoprotein (RNP).
The CRISPR/Cas9 composition of any one of Examples 21 to 26,
The CRISPR/Cas9 composition of any one of Examples 21 to 28, wherein the CRISPR/Cas9 composition includes a donor.
The CRISPR/Cas9 composition of Example 29, wherein the donor includes a gene for insertion into the target sequence.
The CRISPR/Cas9 composition of Example 30, wherein the donor is in the form of a vector.
The CRISPR/Cas9 composition of any one of Examples 28 to 31, wherein the vector includes one or more of a promoter, an enhancer, an artificial intron, a polyadenylation signal, a Kozak consensus sequence, an internal ribosome entry site (IRES), a splice acceptor, a 2A sequence, and a replication origin.
The CRISPR/Cas9 composition of Example 32, wherein the promoter is one of an SV40 initial promoter, a mouse mammary tumor virus long terminal repeat (LTR) promoter, an adenovirus major late promoter (Ad MLP), a herpes simplex virus (HSV) promoter, a cytomegalovirus (CMV) promoter such as a CMV immediate early promoter region (CMVIE), a Rous sarcoma virus (RSV) promoter, a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)), an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep. 1; 31 (17)), a human H1 promoter (H1), and 7SK.
The CRISPR/Cas9 composition of any one of Examples 28 to 33, wherein the vector is a viral vector.
The CRISPR/Cas9 composition of Example 34, wherein the viral vector is one selected from the group consisting of a retrovirus, a lentivirus, an adenovirus, an adeno-associated virus (AAV), a vaccinia virus, a poxvirus, and a herpes simplex virus.
The CRISPR/Cas9 composition of any one of Examples 28 to 33, wherein the vector is a non-viral vector.
The CRISPR/Cas9 composition of Example 36, wherein the non-viral vector may be one or more selected from the group consisting of a plasmid, a phage, naked DNA, a DNA complex, and mRNA. As an exemplary embodiment, the plasmid is one selected from the group consisting of the pcDNA series, pS456, p326, pACYC177, ColE1, pKT230, pME290, pBR322, pUC8/9, pUC6, pBD9, pHC79, plJ61, pLAFR1, pHV14, the pGEX series, the pET series, and pUC19.
A method for gene editing, wherein the method includes delivering, injecting, and/or introducing (administering) the CRISPR/Cas9 composition of any one of Examples 21 to 37 to a target subject for gene editing.
The method of Example 38, wherein the target subject for gene editing is a target individual, a target tissue, or a target cell.
The method of Example 39, wherein the target individual is a plant, animal, non-human animal, or human.
The method of Example 39, wherein the target tissue is a non-human animal tissue or human tissue.
The method of Example 39, wherein the target cell is a eukaryotic cell or prokaryotic cell.
The method of any one of Examples 38 to 42, wherein the delivery, injection, and/or introduction is performed by one or more of injection, transfusion, implantation, transplantation, electroporation, gene gun, sonoporation, magnetofection, temporary cell squeezing, a cationic liposome method, lithium acetate-DMSO, lipid-mediated transfection, a calcium phosphate precipitation method, lipofection, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran-mediated transfection, and nanoparticle-mediated nucleic acid delivery (see Panyam et al., Adv Drug Deliv Rev. 2012 Sep. 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023).
The method of any one of Examples 38 to 44, wherein, by delivering, injecting, and/or introducing (administering) the CRISPR/Cas9 composition to the target subject for gene editing, the CRISPR/Cas9 complex is capable of cleaving a portion of the target nucleic acid adjacent to PAM sequence other than 5′-NGG-3′.
The method of any one of Examples 38 to 44, wherein, by delivering, injecting, and/or introducing (administering) the CRISPR/Cas9 composition to the target subject for gene editing, indel, base editing, insertion, and/or deletion may occur at a target nucleic acid portion adjacent to PAM sequence other than 5′-NGG-3′.
The method of any one of Examples 44 and 45, wherein the SpCas9 variant includes G1218K/E1219V/R1335Q mutation, wherein the PAM sequence is 5′-NGN-3′.
The method of any one of Examples 44 and 45, wherein the SpCas9 variant includes G1218Q/E1219Q/R1333P/T1337L mutation, wherein the PAM sequence is 5′-NNG-3′.
The method of any one of Examples 44 and 45, wherein the SpCas9 variant includes G1218R/E1219F/R1333G/R1335H/T1337C mutation, wherein the PAM sequence is 5′-NNN-3′.
The method of any one of Examples 44 and 45, wherein the SpCas9 variant includes G1218M/E1219T/R1333P/R1335Y/T1337L mutation, wherein the PAM sequence is 5′-NNN-3′.
A method for screening a SpCas9 variant, which is capable of recognizing a PAM sequence other than 5′-NGG-3′.
The method of Example 50, wherein the screening method includes a Cas9 cell library production step.
The method of Example 51, wherein the Cas9 cell library production step includes a PiggyBac using step,
The method of any one of Examples 51 and 52, wherein the Cas9 cell library production step includes a transposase using step,
The method of any one of Examples 50 to 53, wherein the method includes a mutant protein selection step.
The method of Example 54, wherein the mutant protein selection step includes a primary selection step,
The method of Example 55, wherein the mutant protein selection step includes a secondary selection step,
Hereinafter, the disclosure may be described in more detail through Experimental Examples and Examples. These Examples are only for exemplifying the content disclosed by the present disclosure, and it will be obvious to those skilled in the art that the scope of the content disclosed by the present disclosure should not be interpreted as being limited by these Experimental Examples and Examples.
The configuration of the present disclosure was carried out using the following screening method.
In the sequence of the wild-type SpCas9 protein (SEQ ID NO: 1), a library was produced by substituting five representative amino acid residues (G1218, E1219, R1333, R1335, and T1337) which are known to interact with the PAM sequence with 20 other amino acids to clone a nucleic acid encoding a Cas9 variant with a total of 20^5 diversity into a PiggyBac-based vector. In this case, the Cas9 variant includes L1111R/D1135V/A1322R mutation compared to the wild-type SpCas9 protein.
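To make the library size concrete, the following sketch enumerates the combinatorial space over the five randomized positions. It assumes that each position may take any of the 20 standard amino acids, including the wild-type residue, which gives 20^5 = 3,200,000 members. The names `SITES`, `iter_variants`, and `label` are hypothetical.

```python
from itertools import islice, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard one-letter codes
# (wild-type residue, position) for the five randomized sites named in the text
SITES = [("G", 1218), ("E", 1219), ("R", 1333), ("R", 1335), ("T", 1337)]


def diversity(sites=SITES, alphabet=AMINO_ACIDS) -> int:
    """Number of distinct residue combinations over the randomized sites."""
    return len(alphabet) ** len(sites)


def iter_variants(sites=SITES, alphabet=AMINO_ACIDS):
    """Lazily yield one tuple of residues per library member."""
    yield from product(alphabet, repeat=len(sites))


def label(residues, sites=SITES) -> str:
    """Write only positions that differ from wild type, e.g. 'G1218K/E1219V'."""
    parts = [f"{wt}{pos}{new}" for (wt, pos), new in zip(sites, residues) if new != wt]
    return "/".join(parts) if parts else "wild-type at all randomized sites"


if __name__ == "__main__":
    print(diversity())
    for residues in islice(iter_variants(), 3):
        print(label(residues))
```

Iterating lazily avoids holding all combinations in memory when only a count or a sample is needed.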
A library with 20^5 diversity was produced by transfecting cells with the produced library together with a transposase vector to induce integration into the genomic DNA of each cell (
The produced library was transfected with sgRNA targeting the HPRT gene. The transfected sgRNAs include sgRNA targeting a target sequence that complementarily binds to a non-target sequence adjacent to different PAM sequences. Thereafter, cells were treated with 6-thioguanine (6TG) to allow only cells with a mutation in the HPRT gene to survive. The surviving cells are in a state in which a nucleic acid encoding a Cas9 variant that successfully cleaves the target sequence adjacent to the PAM sequence associated with the transfected sgRNA is integrated into the corresponding cells (
After five positions with amino acid mutations in the Cas9 variant associated with the nucleic acid integrated into the above surviving cells were amplified using PCR amplification, these positions were analyzed using NGS to obtain clones. The PAM sequence of the obtained Hit was later validated by additional experiments (
PCR was performed using an oligo library pool (ordered as a Combinatorial Variant Library product manufactured by Twist Bioscience) as a template. The pool consists of nucleic acids encoding SpCas9 variants in which five amino acid residues were subjected to site saturation mutagenesis (SSM).
A pBLC-based plasmid library was produced by inserting the product of the above PCR (the product obtained using the oligo library pool as a template) into a pBLC vector (purchased from Bioneer) by Gibson assembly (performed at 50° C. overnight). In the corresponding experiment, primers of SEQ ID NOs: 25 to 27 were used.
PCR was performed using the produced pBLC-based plasmid library as a template. A PiggyBac-based Cas9 variants plasmid library was produced by inserting the product of this PCR (the product obtained using the pBLC-based plasmid library as a template) into a PiggyBac vector (purchased from SBI) by Gibson assembly (performed at 50° C. overnight).
HeLa cells were seeded onto 5 dishes (150 mm) at 2×10^6 cells/dish. 24 hours after cell seeding, HeLa cells were co-transfected with the PiggyBac-based Cas9 variants plasmid library and a transposase expression vector using Lipofectamine 2000 (PiggyBac-based Cas9 variants plasmid library: transposase=2 μg: 2 μg) for integration.
The PiggyBac-based Cas9 variants plasmid library includes a puromycin resistance gene. 24 hours after co-transfection, puromycin selection was performed using a medium containing 2 μg/ml puromycin. 96 hours after puromycin selection, subculture was carried out. One week after subculture, cell stocks were prepared to produce a primary Cas9 variants cell library.
qRT-PCR for Copy Number Confirmation
A PiggyBac copy number kit (purchased from SBI) was used. After genomic DNA was extracted from the produced PiggyBac-based Cas9 variants cell library (prep), qRT-PCR was performed using a primer provided in the kit. The obtained Ct values were substituted into a formula (ΔΔCt = 2^-(PBcopy-UCR1), copy number = ΔΔCt/2) to calculate an integration copy number.
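As a hedged illustration of the copy number calculation, the sketch below reads the formula in the preceding paragraph as ΔΔCt = 2^-(Ct of the PiggyBac amplicon minus Ct of the UCR1 reference), followed by copy number = ΔΔCt/2. This reading, the function name, and the Ct values in the usage lines are assumptions; the kit manufacturer's protocol should be treated as authoritative.

```python
def integration_copy_number(ct_pbcopy: float, ct_ucr1: float) -> float:
    """Estimate integration copy number from two Ct values.

    Assumed reading of the formula in the text:
        ddct = 2 ** -(ct_pbcopy - ct_ucr1)
        copy_number = ddct / 2
    """
    ddct = 2.0 ** (-(ct_pbcopy - ct_ucr1))
    return ddct / 2.0


if __name__ == "__main__":
    # Hypothetical Ct values chosen only to show the arithmetic
    print(integration_copy_number(ct_pbcopy=18.0, ct_ucr1=20.0))
    print(integration_copy_number(ct_pbcopy=20.0, ct_ucr1=20.0))
```

Under this reading, a PiggyBac amplicon that crosses threshold two cycles earlier than the reference corresponds to a fourfold higher signal and therefore a copy number of 2.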
A primary Cas9 variants cell library was seeded at 2×10^6 cells onto a 150 mm dish. 20 μg of pRG vectors (HPRT target: CC, TT, AA, GC, GT, GA pam sgRNA) capable of expressing sgRNA for primary screening, which targets (is bound by the guide domain of sgRNA) a sequence adjacent to the PAM sequence (5′-NCC-3′, 5′-NTT-3′, 5′-NAA-3′, 5′-NGC-3′, 5′-NGT-3′, 5′-NGA-3′) other than 5′-NGG-3′ in the HPRT gene were transfected using Lipofectamine 2000. 3 days after transfection, subculture was performed. 7 days after transfection, 6-thioguanine (6TG) selection was started using a medium including 3 μM 6TG simultaneously with subculture. 14 days after the start of 6TG selection, subculture was performed. 17 days after the start of 6TG selection, cells were harvested, and genomic DNA was extracted (prep).
The cell pool obtained through 6TG selection in the primary screening was seeded at 2×10^6 cells onto a 150 mm dish. 20 μg of pRG vectors (HPRT target: CC, TT, AA, GC, GT, GA pam sgRNA) capable of expressing sgRNA for secondary screening, which targets (is bound by the guide domain of sgRNA) a sequence adjacent to the PAM sequence (5′-NCC-3′, 5′-NTT-3′, 5′-NAA-3′, 5′-NGC-3′, 5′-NGT-3′, 5′-NGA-3′) other than 5′-NGG-3′ in the HPRT gene were transfected using Lipofectamine 2000. 3 days after transfection, subculture was performed. 7 days after transfection, 6-TG selection was started using a medium including 3 μM 6-thioguanine simultaneously with subculture. 14 days after the start of 6-TG selection, subculture was performed. 17 days after the start of 6-TG selection, cells were harvested, and genomic DNA was extracted (prep).
ddPCR for Hit Identification
50 ng of genomic DNA was used (conditions for inserting one genomic copy per drop), and ddPCR EvaGreen Supermix (Bio-Rad) was used for amplification. ddPCR Supermix amplification reactions were set up according to the manufacturer's protocol (Bio-Rad). Droplets were generated using DG8 cartridges, DG8 Gaskets, and a QX200™ Droplet generator (Bio-Rad). The generated droplets were transferred to a 96-well plate and heat-sealed using a PX1 PCR plate sealer (Bio-Rad). PCR was run according to the manufacturer's protocol for QX200 ddPCR EvaGreen Supermix, with only the annealing temperature changed to 61° C. Droplets were individually scanned using a QX200™ Droplet Digital™ PCR system (Bio-Rad). After PCR, 20 μl of water was added to break the droplets, the droplets were vortexed, the droplets were then frozen in liquid nitrogen and thawed at room temperature three times, and then the droplets were spun down to separate an aqueous layer and an oil layer. Purification was performed by collecting only the aqueous layer.
Circularization for NGS was performed in the following order:
Primers of SEQ ID NOS: 28 to 41 were used to carry out the PCR.
Transfection of Selected Cas9 Variants into PAM Analysis Cell Libraries for PAM Analysis
A cell library was seeded at 2×10^6 cells/dish onto 5 dishes (150 mm). 24 hours after cell seeding, 20 μg of lenti-based vector candidates were transfected using Lipofectamine 2000. In this case, lenti-based vector candidates are lenti-based vectors that are capable of expressing sgRNA which targets (is bound by the guide domain of sgRNA) a sequence adjacent to the PAM sequence (5′-NNN-3′, where each N is any one of A, C, T, and G, and there are 256 different types of PAM sequences, for a total of 4^4 types) in the HPRT gene. In this case, primers of SEQ ID NOs: 42 to 50 were used to produce a cell library for verification.
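The number of distinct PAM sequences grows as 4^k with the number k of variable positions. The sketch below enumerates them with the standard library; the function name is hypothetical. Note that 4^4 = 256 corresponds to four variable positions, whereas a pattern with three variable positions would give 4^3 = 64.

```python
from itertools import product

BASES = "ACGT"


def all_pams(length: int):
    """Return every sequence of the given length over A, C, G, and T."""
    return ["".join(combo) for combo in product(BASES, repeat=length)]


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, len(all_pams(k)))
```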
24 hours after transfection, blasticidin selection was started using a medium including 20 μg/ml blasticidin. 120 hours after transfection, cells were harvested and genomic DNA was extracted (prep) (1×10^8 cells genomic extraction).
For the primary PCR, a template was used with a coverage of ×1000 for the library scale (assuming 10 μg of genomic DNA per 10^6 cells). The primary PCR was carried out at 2.5 μg/reaction×48 reactions. In the experiments of the present specification, primers of SEQ ID NOS: 51 to 56 were used. All the primary PCR pools were collected and purified, and then barcoded PCR was performed. Finally, Illumina HiSeq was performed.
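The template amounts above imply a simple coverage calculation. The following sketch works through the arithmetic using only the figures stated in the text (10 μg of genomic DNA per 10^6 cells, 2.5 μg per reaction, 48 reactions, ×1000 coverage); the function names are hypothetical, and the result is the largest library size that this amount of template could cover at the stated depth.

```python
UG_PER_MILLION_CELLS = 10  # assumption stated in the text: 10 ug genomic DNA per 10^6 cells


def template_cells(ug_per_reaction: float, n_reactions: int,
                   ug_per_million: float = UG_PER_MILLION_CELLS) -> float:
    """Number of cell genome equivalents represented in the pooled PCR template."""
    total_ug = ug_per_reaction * n_reactions
    return total_ug / ug_per_million * 1_000_000


def max_library_size(cells: float, coverage: int = 1000) -> float:
    """Largest library that `cells` genome equivalents cover at the given depth."""
    return cells / coverage


if __name__ == "__main__":
    cells = template_cells(2.5, 48)   # 120 ug of template in total
    print(cells)                      # genome equivalents
    print(max_library_size(cells))    # library members coverable at x1000
```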
The present inventors predicted that when amino acid residues affecting the recognition of PAM sequence by the Cas9 protein were modified, it would be possible to select a SpCas9 variant capable of recognizing PAM sequence other than 5′-NGG-3′.
Therefore, in Nureki-NG Cas9 (Nureki-NG Cas9 is a Cas9 having L1111R/D1135V/G1218R/E1219F/A1322R/R1335V/T1337R mutation from the wild-type SpCas9 protein.), directed evolution was performed using site saturation mutations on a total of five amino acid residues: G1218 and E1219, which form hydrophobic interactions with a portion of the ribose in the PAM sequence, and R1333, R1335, and T1337, which directly recognize and bind to the PAM sequence (
A Cas9 variants plasmid library with a scale of over 10^6 was produced by the Gibson assembly method using an oligo pool including a nucleic acid encoding a Cas9 variant with site saturation mutation applied to five amino acid residues. First, to improve transfection efficiency, a Cas9 variants plasmid library was produced in a small pBLC vector (2.8 kb) (quality-84%) (
A Cas9 variants cell library was produced using the PiggyBac-based Cas9 variants plasmid library produced above. In this case, a cell library consisting of cells, into which nucleic acids encoding Cas9 variants were integrated, was produced through puromycin selection.
After the produced cell library was transfected with a guide RNA, 6-TG selection was performed to allow only cells in which the Cas9 variant reacted with the guide RNA to edit the HPRT gene to survive. By analyzing the sequence of the Cas9 variant corresponding to the surviving cells, a candidate group of SpCas9 variants recognizing novel PAM was selected.
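Selecting candidates from the surviving cells amounts to counting how often each variant sequence appears among the sequencing reads and ranking by frequency. The following is a minimal sketch of that tally, assuming each read has already been reduced to the five residues at positions 1218, 1219, 1333, 1335, and 1337; the function name and the read counts are hypothetical and do not reproduce the actual analysis pipeline or data.

```python
from collections import Counter


def rank_hits(reads, top: int = 3):
    """Return (variant, read_count, fraction_of_reads) for the most frequent variants."""
    counts = Counter(reads)
    total = sum(counts.values())
    return [(variant, n, n / total) for variant, n in counts.most_common(top)]


if __name__ == "__main__":
    # Hypothetical reads; each string lists residues at 1218/1219/1333/1335/1337
    reads = ["KVRQT"] * 6 + ["QQPRL"] * 3 + ["GERRT"] * 1
    for variant, n, fraction in rank_hits(reads):
        print(variant, n, round(fraction, 2))
```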
HeLa cells highly sensitive to 6-TG selection were used in the screening process. HeLa cells were co-transfected with the PiggyBac-based Cas9 variants plasmid library produced in Experimental Example 2 and a transposase expression vector for integration. Thereafter, a Cas9 variants cell library was produced by allowing only integrated cells to survive through puromycin selection because the PiggyBac-based Cas9 variants plasmid library includes a puromycin resistance gene. The integration copy number of the produced PiggyBac-based Cas9 variants cell library was measured using qRT-PCR (integration copy number=5.6).
As a result of finding cells whose genes are edited by reacting with sgRNAs (nCC pam, nAA pam, nTT pam, nGC pam, nGA pam, and nGT pam5 in
To prove this, sgRNAs (nCC pam, nAA pam, nTT pam, nGC pam, nGA pam, and nGT pam in
In this case, before the screening was started, transfection was performed under each condition using a GFP expression vector to find the optimal transfection conditions in a 150-mm dish, and the results were analyzed by flow cytometry (80.1% transfection efficiency when 20 μg of Lipofectamine 2000 was used) (
A secondary screening was performed to enrich positive hits from a cell pool subjected to primary screening through 6-TG selection. The cell pools screened for different PAM sequences were subjected to a secondary screening in the same manner as the primary screening using sgRNAs (2nd nCC pam, 2nd nAA pam, 2nd nTT pam, 2nd nGC pam, 2nd nGA pam, and 2nd nGT pam in
PCR was performed to search for hits from the obtained genomic DNA. In this case, due to the similar homology between the amplicons, the 1st PCR was performed in two forms: a ddPCR method (using 50 ng of genomic DNA=genomic 1 copy/drop) to prevent shuffling which may occur between hits of the two mutation loci, and a general PCR method which is not the ddPCR method (
Since the distance between hits (350 bp) prevented Illumina sequencing, two mutation loci were located close to each other (
As a result of the NGS data analysis (
In this case, the selected mutations were G1218K/E1219V/R1335Q mutation, G1218Q/E1219Q/R1333P/T1337L mutation, and G1218M/E1219T/R1333P/R1335Y/T1337L mutation.
In the general PCR method performed by assuming shuffling, the 1218/1219 and 1333/1335/1337 parts were ranked separately and analyzed (
The present inventors attempted to confirm which PAM sequences are recognized by the four SpCas9 variants selected in Experimental Example 6. To confirm the PAM sequence, a transfection experiment was performed on a cell library for PAM analysis. Further, an additional experiment was performed in the same manner as in the above experiment to compare with a wild-type SpCas9 protein, Nureki-NG Cas9, and SpRY Cas9.
The selected candidates were individually cloned and transfected into a cell library for PAM analysis (as shown in
As a result of the analysis, it was analyzed through
These results suggest that the method for screening the SpCas9 variant of the present specification may be used to find variants capable of recognizing PAM sequence other than 5′-NGG-3′.
An additional analysis was performed in the same manner as in Experimental Example 7 on the wild-type SpCas9 protein of SEQ ID NO: 1 (
Number | Date | Country | Kind |
---|---|---|---|
10-2022-0010253 | Jan 2022 | KR | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/KR2023/001033 | 1/20/2023 | WO |