The present disclosure relates to compositions, methods and systems for targeted genomic modification and targeted regulation of gene expression in mammalian cells, including in human cells. In particular, type II CRISPR-Cas systems of Cas9 enzymes, guide RNAs and associated specific PAMs are described.
The clustered regularly interspaced short palindromic repeats/CRISPR associated system (CRISPR/Cas) is a microbial adaptive immune system that evolved in bacteria and archaea as a defense against invading genetic material such as viruses and plasmids. The CRISPR system has enormous potential for adaptation for genome editing in humans, animals and other organisms.
The CRISPR system uses RNA-guided nucleases to cleave foreign genetic elements. The CRISPR/Cas system is generally classified into three major divisions known as Type-I, Type-II and Type-III as well as several subdivisions based on the Cas genes (Chylinski, K. et al., Nucleic Acids Research 42(10):6091-105, 2014). The CRISPR nuclease system only requires 3 components, which include the Cas9 protein (a nuclease), CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA) for genome editing (see
DNA cleavage specificity is determined by two parameters: the variable, spacer-derived sequence of crRNA targeting the protospacer sequence (a protospacer is defined as the sequence on the DNA target that is complementary to the spacer of crRNA) and a short sequence, the Protospacer Adjacent Motif (PAM), located immediately downstream of the protospacer on the non-target DNA strand. It is known that a gene-specific 20-nucleotide guide sequence inserted in the gRNA can be sufficient to direct the Cas9 protein to that specific gene's target sequence in the genome to execute the gene-editing process (
Further, successful binding of Cas9 protein to a target DNA sequence requires a sequence that is complementary to the gRNA sequence and must be immediately followed by a Protospacer Adjacent Motif (PAM) sequence at the 3′ end of the 20 bp target sequence (see
With the guidance of gRNA, Cas9 protein promotes genome editing at a pre-defined target sequence by inducing a double stranded break (DSB) in the DNA. Following the DSB induced by Cas9 protein, the target sequence can be repaired by the cellular repair machinery via either non-homologous end-joining (NHEJ) or homology-directed repair (HDR). In the absence of a donor template, NHEJ activates to re-ligate the DSBs, resulting in a genetic scar in the form of insertion/deletion (indel) mutations at the target sequence. This resulting NHEJ scar within the target sequence can cause gene knockouts, as the indels occurring within a coding exon can lead to frameshift mutations and premature stop codons.
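By way of non-limiting illustration only, the relationship between indel length and frameshift described above can be sketched in Python. The function name and its arguments are hypothetical and are not part of the disclosure; the sketch assumes the indel falls entirely within a coding exon and considers only the net change in length.

```python
def indel_causes_frameshift(inserted: int, deleted: int) -> bool:
    """Return True when an indel shifts the reading frame.

    Assumption (illustrative only): the indel lies wholly inside a coding
    exon, so the frame is shifted whenever the net number of nucleotides
    gained or lost is not a multiple of three.
    """
    if inserted < 0 or deleted < 0:
        raise ValueError("inserted and deleted must be non-negative")
    net_change = inserted - deleted
    return net_change % 3 != 0


if __name__ == "__main__":
    # A 1-nt insertion shifts the frame; a 3-nt deletion does not.
    for ins, dele in [(1, 0), (0, 3), (0, 1), (2, 5)]:
        print(ins, dele, indel_causes_frameshift(ins, dele))
```

An in-frame indel (net change divisible by three) may still disrupt the protein, so this check is only a first-pass screen for likely knockouts.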
In the presence of a donor template, HDR is an alternative major DNA repair pathway. HDR can allow precise genetic modifications (mutations or corrections) in the target sequence. However, HDR typically occurs less frequently and is substantially more variable in frequency than NHEJ. The repair template can either be in the form of double-stranded DNA (a PCR product or a linearized plasmid), a targeting construct with homologous arms flanking both sides of an insertion/correction sequence, or a single-stranded DNA oligonucleotide (ssODN). A ssODN provides a simple and effective method for making small gene edits (<50 bp) within the genome, such as the introduction of single-nucleotide mutations for probing causal genetic variations. Unlike NHEJ, HDR is generally active only in dividing cells, and its efficiency can vary extensively depending on the cell type, as well as the genomic locus and the repair template.
A potential limitation of CRISPR nuclease systems is that they can cause off-target mutagenesis. To minimize off-target activity, several mutant forms of Cas9 protein have been developed to maximize introduction of DSBs at the desired target site. For example, a double nicking strategy can be used. The Cas9 protein (e.g., SpCas9) has two functional domains (RuvC and HNH), each cutting a different DNA strand. When both these domains are active, the Cas9 protein causes DSBs in the target DNA. Mutated versions of Cas9 that contain a single active catalytic domain, either RuvC or HNH, are known as nickases. For example, the RuvC domain can be inactivated by a D10A mutation and the HNH domain can be inactivated by an H840A mutation. Cas9 nickase (Cas9n) cuts only one strand of the target genomic DNA rather than both strands as with the wild type Cas9. The single-strand break or nick can be repaired without any indels at the target sequence using high-fidelity base excision repair (BER) pathways rather than NHEJ. While Cas9 protein induces DSB at the target sequence using a single gRNA, either RuvC− or HNH− mutant Cas9n requires a paired gRNA appropriately spaced and oriented to simultaneously introduce the single-stranded nicks on both strands of the target sequence (see
However, available CRISPR systems have several limitations such as low efficiency and low specificity, which leads to a low success rate and undesirable off-target mutagenesis. There is a need for improved CRISPR/Cas systems for genome editing in cells.
It is an object of the present invention to ameliorate at least some of the deficiencies present in the prior art. Embodiments of the present technology have been developed based on the inventors' appreciation that there is a need for improved CRISPR/Cas systems for targeted genomic modification and targeted regulation of gene expression in mammalian cells.
The present disclosure relates broadly to CRISPR/Cas systems having greater efficiency and/or specificity than other known CRISPR/Cas systems, and to uses thereof for targeted genomic modification within a target genome region (TGR) in a mammalian cell. Therapeutic uses of the methods and systems described herein are also provided.
In particular, there are provided modified CRISPR/Cas systems having in some embodiments one or more of the following advantages: higher efficiency of genomic modification; ability to more efficiently and/or safely transfect a CRISPR/Cas system multiple times; reduced off-site or non-specific modification (i.e., higher specificity of genomic modification); a higher efficiency of homology-directed repair (HDR); improved stability of a single-stranded DNA oligonucleotide (ssODN) HDR repair template; and/or other advantages as will become apparent herein.
In general, HDR occurs at a much lower frequency and is therefore less efficient than NHEJ in CRISPR/Cas9 gene editing systems. This low efficiency of HDR presents a major constraint in the execution of precise genetic modifications by the CRISPR/Cas9 system. In some embodiments, we provide herein HDR repair templates having an improved design that can improve stability of the ssODN HDR donor template and/or allow genetic modifications to be made more efficiently at a target site of interest. Modifications in an ssODN HDR repair template include, without limitation: adding a 4-nucleotide repeat (such as the CGCG repeat of phosphorothioate) to improve the stability of the ssODN HDR template; incorporating a peptide nucleic acid at the ssODN end (5′ or 3′) to increase efficiency of the HDR pathway; linking a Cyanine dye; and/or linking a quantum dot at the 5′ end of a ssODN to allow monitoring of its cellular uptake and/or distribution in a cell during genomic modification. Thus, stability, efficiency, and/or traceability of the repair template, e.g., a ssODN, may be improved.
Further, successful genome modification by Cas9 nuclease requires a PAM sequence at the 3′ end of the target genomic DNA, e.g., at the 3′ end of the 20-nucleotide target sequence to which the guide RNA binds. However, if an HDR template has an intact PAM sequence or retains an intact PAM sequence in the donor template after Cas9 modification has occurred, then Cas9 may repeatedly act on the target sequence, potentially leading to an increased chance of mutations and/or DNA degradation, even after the desired modification has been introduced. To avoid these unwanted activities by the CRISPR/Cas9 system, we provide herein, in some embodiments, modified CRISPR/Cas9 systems and repair templates in which the PAM sequence is mutated in the HDR repair template. For example, in the case of the SpCas9 enzyme, the PAM sequence “NGG” in the HDR template can be mutated to NGT, NGC or NGA. Such mutation will prevent binding by the Cas9 enzyme and thus “mask” the PAM sequence. It is noted that, where the HDR template sequence falls within a protein coding region (for example, in an exon or a promoter region), then care is taken to introduce a silent mutation in the PAM sequence to avoid introducing amino acid changes into the coding region. “Masking” a PAM sequence in this way means that an already-modified sequence will not be cut again by Cas9. Multiple transfections of the CRISPR/Cas9 system to increase efficiency are thus possible, as already-modified sequences will be unaffected.
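As a non-limiting sketch of the PAM “masking” described above, the following Python code enumerates the NGT, NGC and NGA variants of an SpCas9 “NGG” PAM in a repair template and, where a reading frame is supplied, keeps only the silent ones. The function names, the `frame` parameter and the example sequences are hypothetical; the sketch assumes the PAM is read on the same strand as the coding sequence and uses the standard genetic code.

```python
# Standard genetic code, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons of seq starting at offset frame."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def masked_pam_variants(template: str, pam_start: int, frame=None) -> list:
    """Return templates in which the final G of an NGG PAM is changed.

    template  -- repair-template sequence (A/C/G/T), hypothetical input
    pam_start -- 0-based index of the first PAM nucleotide
    frame     -- if given, reading-frame offset; only silent variants
                 (unchanged translation) are returned
    """
    template = template.upper()
    pam = template[pam_start:pam_start + 3]
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError("no NGG PAM at pam_start")
    variants = []
    for base in "TCA":  # NGG -> NGT, NGC, NGA
        mutated = template[:pam_start + 2] + base + template[pam_start + 3:]
        if frame is None or translate(mutated, frame) == translate(template, frame):
            variants.append(mutated)
    return variants


if __name__ == "__main__":
    # PAM "AGG" coincides with an arginine codon; only AGA stays silent.
    print(masked_pam_variants("ATGAGGAAA", pam_start=3, frame=0))
```

When no silent variant exists (for example, a PAM coinciding with a TGG tryptophan codon), the function returns an empty list, in which case a different masking position would have to be considered.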
In some embodiments, alternatively or additionally, further modification of an already-modified sequence is prevented by selecting one or more gRNA such that a mutation is introduced in a PAM sequence and/or a target DNA sequence, the introduced mutation preventing further modification by the CRISPR/Cas9 system. In such embodiments the “masking” mutation is introduced by one or more gRNA. The HDR repair template may or may not also introduce a “masking” mutation into the target genome region (TGR).
In a first broad aspect, there is provided a method for targeted genomic modification within a target genome region (TGR) in a mammalian cell, the method comprising providing a CRISPR/Cas9 system and contacting the mammalian cell with the CRISPR/Cas9 system, wherein the CRISPR/Cas9 system comprises: i) a first guide RNA (gRNA) comprising a first CRISPR RNA (crRNA) and a first trans-activating crRNA (tracrRNA) linked together, the first gRNA being capable of binding with sequence specificity to a first target DNA sequence on one strand of the DNA double helix in the TGR, the first target DNA sequence to which the first gRNA binds being adjacent to a first PAM sequence; ii) a second gRNA comprising a second CRISPR RNA (crRNA) and a second trans-activating crRNA (tracrRNA) linked together, the second gRNA being capable of binding with sequence specificity to a second target DNA sequence on the other strand of the DNA double helix in the TGR, the second target DNA sequence to which the second gRNA binds being adjacent to a second PAM sequence, wherein the first and the second target DNA sequence are located within about 100 to about 1000 nucleotides of each other, or within about 100 nucleotides of each other, and are on opposite strands of the DNA double helix; and iii) a Cas9n protein. In some embodiments, the first and the second target DNA sequence are located within about 100 nucleotides of each other. In some embodiments, the first and the second target DNA sequence are located within about 1000 nucleotides of each other. In some embodiments, the first and the second target DNA sequence are located more than 100 nucleotides from each other. The mammalian cell is contacted with the CRISPR/Cas9 system under conditions (sufficient time, etc.) such that the TGR is modified, forming a modified-TGR, the first and/or the second gRNA having been selected in some embodiments such that one or more of the first PAM sequence, the second PAM sequence, the first target DNA sequence and the second target DNA sequence are modified within the modified-TGR so as to prevent further modification of the modified-TGR by the CRISPR/Cas9 system.
In some embodiments, the CRISPR/Cas9 system further comprises: iv) a third gRNA comprising a third CRISPR RNA (crRNA) and a third trans-activating crRNA (tracrRNA) linked together, the third gRNA being capable of binding with sequence specificity to a third DNA sequence on one strand of the DNA double helix in the TGR, the third target DNA sequence to which the third gRNA binds being adjacent to a third PAM sequence; wherein the third target DNA sequence is located either within 100 nucleotides of the first target DNA sequence on the opposite strand of the DNA double helix or within 100 nucleotides of the second target DNA sequence on the opposite strand of the DNA double helix; and wherein the third gRNA is selected such that the CRISPR/Cas9 system can only bind and/or modify the third target DNA sequence if the target genome region comprises a disease-causing modification or a sequence for which modification is desired. By “wherein the third target DNA sequence is located either within 100 nucleotides of the first target DNA sequence on the opposite strand of the DNA double helix or within 100 nucleotides of the second target DNA sequence on the opposite strand of the DNA double helix” is meant that in some embodiments the separation between the first and the second or third target is about 100 nucleotides or is less than 100 nucleotides from each other, however it should be understood that in some embodiments greater separations of more than 100 nucleotides between target DNA sequences are possible.
In general, gRNAs and/or target DNA sequences can be appropriately spaced and oriented so that Cas9 will simultaneously introduce single-stranded nicks on both strands of the target sequence. If the single-stranded nicks on both strands of the target sequence are located sufficiently close together, then the cellular repair machinery will repair the nicks as a double stranded break (DSB) and introduce an indel mutation into the targeted DNA sequence. For example, in this case repair of the nicks via NHEJ will introduce an indel mutation into the targeted DNA sequence. In some embodiments therefore, where repair of DSB is desired, the gRNAs and/or target DNA sequences are selected so that the single-stranded nicks on both strands of the target sequence are located sufficiently close together to induce DSB repair. In some such embodiments, where DSB repair is desired, a first target DNA sequence on one strand of the DNA double helix and a second target DNA sequence on the opposite strand of the DNA double helix are separated by about 100 nucleotides or less than 100 nucleotides from each other. In some such embodiments, the first target DNA sequence on one strand of the DNA double helix and the second target DNA sequence on the opposite strand of the DNA double helix are separated by less than 100 nucleotides, by less than 50 nucleotides, by less than about 20 nucleotides, or by less than about 10 nucleotides. 
In alternate embodiments, where DSB repair is not desired and it is desired instead to have the cellular machinery simply repair the nicks made by Cas9, without introducing an indel mutation, then the first target DNA sequence on one strand of the DNA double helix and the second target DNA sequence on the opposite strand of the DNA double helix are located sufficiently far apart or separated by a sufficient number of nucleotides so that DSB repair does not occur; for example, they may be separated by more than 100 nucleotides from each other, to ensure no DSB repair and no introduction of an indel mutation.
In some embodiments, target DNA sequences on opposite strands of the DNA double helix are selected to be located within a certain distance of each other sufficient to induce double-stranded break (DSB) repair, e.g., to induce an indel mutation in the DNA. In alternate embodiments, target DNA sequences on opposite strands of the double helix are selected to be located within a certain distance of each other sufficient to not induce double-stranded break (DSB) repair, e.g., so that no genomic modification occurs; for example, nicks may be repaired without modifying the starting TGR or target DNA sequence.
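For illustration only, the spacing considerations set out above can be expressed as a simple classification in Python. The `NickSite` type, the function name and the returned labels are hypothetical; the 100-nucleotide cut-off is taken from the example separations given above and is an assumption rather than a fixed rule, since actual repair outcomes depend on the cell and locus.

```python
from dataclasses import dataclass

DSB_LIKE = "DSB-like repair expected"
INDEPENDENT = "independent nick repair expected"
SAME_STRAND = "nicks on same strand"


@dataclass(frozen=True)
class NickSite:
    position: int  # coordinate of the nick within the TGR (hypothetical)
    strand: str    # "+" or "-"


def classify_nick_pair(first: NickSite, second: NickSite,
                       max_dsb_distance: int = 100) -> str:
    """Classify a pair of Cas9n nicks by strand and separation.

    Assumption: nicks on opposite strands separated by at most
    max_dsb_distance nucleotides are treated as a double-stranded
    break; nicks further apart are treated as repaired individually.
    """
    if first.strand == second.strand:
        return SAME_STRAND
    distance = abs(first.position - second.position)
    return DSB_LIKE if distance <= max_dsb_distance else INDEPENDENT


if __name__ == "__main__":
    print(classify_nick_pair(NickSite(1000, "+"), NickSite(1040, "-")))
    print(classify_nick_pair(NickSite(1000, "+"), NickSite(1500, "-")))
```

The threshold is exposed as a parameter so that smaller separations (for example 50, 20 or 10 nucleotides) can be evaluated in the same way.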
In some embodiments, only one gRNA (e.g., only the first gRNA, the second gRNA, or the third gRNA) can bind to its target DNA sequence in a “normal” or non-disease causing TGR; in this case, the CRISPR/Cas9 system will only create a single nick in one strand of the double helix in the TGR. This nick will simply be repaired and will not modify the TGR or target DNA sequence. In some embodiments therefore, the CRISPR/Cas9 system can only bind and/or modify the TGR (e.g., the third target DNA sequence) in the mammalian cell of a patient suffering from a disease or in a mammalian cell where the TGR includes a disease-causing mutation.
In some embodiments, the third target DNA sequence is only adjacent to the third PAM sequence if the target genome region comprises a disease-causing mutation or a sequence for which modification is desired or is in the mammalian cell of a patient suffering from a disease. In some embodiments, one or more of the third PAM sequence and the third target DNA sequence are modified by one or more nucleotide change within the modified-TGR so as to prevent further modification by the CRISPR/Cas9 system, e.g., so that binding by the third gRNA and/or the Cas9n protein is prevented.
In some embodiments, the disease-causing mutation is a repeat expansion, e.g., a trinucleotide expansion, a hexanucleotide expansion, and the like. In some embodiments, the repeat expansion is at least about 30 bp long. For example, the repeat expansion may encompass 5 or more hexanucleotide repeats, 10 or more trinucleotide repeats, etc. In an embodiment, the repeat expansion encompasses more than 3 hexanucleotide repeats, more than 4 hexanucleotide repeats, or more than 5 hexanucleotide repeats. In some embodiments, the disease is a repeat expansion disorder, e.g., a trinucleotide repeat disorder such as without limitation Fragile X Syndrome, Huntington's disease, spinocerebellar ataxia, myotonic dystrophy, myoclonic epilepsy, and/or Friedreich's ataxia; a hexanucleotide repeat disorder, such as without limitation amyotrophic lateral sclerosis (ALS) and frontotemporal dementia; and the like.
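As a non-limiting sketch, a repeat expansion of the kind described above can be recognized by counting consecutive copies of a repeat unit and comparing the total length against the approximately 30 bp figure given above. The function names and the example repeat units in the usage lines are hypothetical inputs, and the 30 bp default is an assumption drawn from the text rather than a diagnostic criterion.

```python
def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of consecutive copies of unit in sequence."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    sequence, unit = sequence.upper(), unit.upper()
    best = 0
    for start in range(len(sequence)):
        count, pos = 0, start
        # Walk forward one repeat unit at a time while the unit matches.
        while sequence.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


def is_repeat_expansion(sequence: str, unit: str, min_length_bp: int = 30) -> bool:
    """Return True if the longest uninterrupted run spans min_length_bp or more."""
    return longest_repeat_run(sequence, unit) * len(unit) >= min_length_bp


if __name__ == "__main__":
    # Five hexanucleotide repeats (30 bp) versus four (24 bp).
    print(is_repeat_expansion("AA" + "GGGGCC" * 5 + "TT", "GGGGCC"))
    print(is_repeat_expansion("GGGGCC" * 4, "GGGGCC"))
```

With the default threshold, five hexanucleotide repeats or ten trinucleotide repeats are the smallest runs that qualify, matching the examples given above.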
In some embodiments, the disease-causing mutation is an amyotrophic lateral sclerosis (ALS)-causing mutation and/or the disease is ALS. In some embodiments, the disease-causing mutation is a Fragile X Syndrome-causing mutation and/or the disease is Fragile X Syndrome.
In some embodiments, where there are multiple gRNAs, the multiple gRNAs may be the same or different. For example, the first gRNA and the second gRNA may be the same or different from each other, and each may be the same or different from the third gRNA. Many such permutations are possible. Similarly, the first, second, and third PAM sequences may be the same or different.
In some embodiments, the CRISPR/Cas9 system may further comprise a repair template for homology-directed repair (HDR). The repair template may or may not comprise one or more nucleotide change in one or more of the first PAM sequence, the second PAM sequence, and the (optional) third PAM sequence, and/or one or more nucleotide change in one or more of the first target DNA sequence, the second target DNA sequence, and the (optional) third target DNA sequence. In some embodiments, the repair template is a single-stranded DNA oligonucleotide (ssODN). In some embodiments, the repair template further comprises a DNA sequence to be inserted or modified in the target genome region. In some embodiments, the repair template is capped at the 5′ end, the 3′ end, or both. The cap may comprise, for example, 4 nucleotides or a peptide linked to the repair template. The repair template may further comprise a tag at the 5′ end, the 3′ end, or both, e.g., a detectable moiety such as without limitation a fluorophore, a cyanine dye, or a quantum dot.
In some embodiments, the CRISPR/Cas9 system further comprises: v) a fourth gRNA comprising a fourth CRISPR RNA (crRNA) and a fourth trans-activating crRNA (tracrRNA) linked together, the fourth gRNA being capable of binding with sequence specificity to a fourth DNA sequence on one strand of the DNA double helix in the TGR, the fourth target DNA sequence to which the fourth gRNA binds being adjacent to a fourth PAM sequence; wherein the fourth target DNA sequence is located either within 100 nucleotides of the first target DNA sequence on the opposite strand of the DNA double helix or within 100 nucleotides of the second target DNA sequence on the opposite strand of the DNA double helix; and wherein the fourth gRNA is selected such that the CRISPR/Cas9 system can only bind and/or modify the fourth target DNA sequence if the target genome region comprises a disease-causing modification or a sequence for which modification is desired. By “wherein the fourth target DNA sequence is located either within 100 nucleotides of the first target DNA sequence on the opposite strand of the DNA double helix or within 100 nucleotides of the second target DNA sequence on the opposite strand of the DNA double helix” is meant that in some embodiments the separation between the first and the second, third or fourth target is about 100 nucleotides or is less than 100 nucleotides from each other, however it should be understood that in some embodiments greater separations of more than 100 nucleotides between target DNA sequences are possible.
In some embodiments, only one gRNA (e.g., only the first gRNA, the second gRNA, the third gRNA, or the fourth gRNA) can bind to its target DNA sequence in a “normal” or non-disease causing TGR; in this case, the CRISPR/Cas9 system will only create a single nick in one strand of the double helix in the TGR. This nick will simply be repaired and will not modify the TGR or target DNA sequence. In some embodiments therefore, the CRISPR/Cas9 system can only bind and/or modify the TGR (e.g., the third or fourth target DNA sequence) in the mammalian cell of a patient suffering from a disease or in a mammalian cell where the TGR includes a disease-causing mutation.
In some embodiments, the third target DNA sequence or the fourth target DNA sequence is only adjacent to the third PAM sequence or the fourth PAM sequence respectively if the target genome region comprises a disease-causing mutation or a sequence for which modification is desired or is in the mammalian cell of a patient suffering from a disease. In some embodiments, one or more of the third PAM sequence and the third target DNA sequence are modified by one or more nucleotide change within the modified-TGR so as to prevent further modification by the CRISPR/Cas9 system, e.g., so that binding by the third gRNA and/or the Cas9n protein is prevented. In some embodiments, one or more of the fourth PAM sequence and the fourth target DNA sequence are modified by one or more nucleotide change within the modified-TGR so as to prevent further modification by the CRISPR/Cas9 system, e.g., so that binding by the fourth gRNA and/or the Cas9n protein is prevented. In some embodiments, one or more of the third PAM sequence, the fourth PAM sequence, the third target DNA sequence, and the fourth target DNA sequence are modified by one or more nucleotide change within the modified-TGR so as to prevent further modification by the CRISPR/Cas9 system, e.g., so that binding by the third gRNA, the fourth gRNA and/or the Cas9n protein is prevented.
In a second broad aspect, there is provided a method for targeted genomic modification within a target genome region (TGR) in a mammalian cell, the method comprising providing a CRISPR/Cas9 system and contacting the mammalian cell with the CRISPR/Cas9 system, wherein the CRISPR/Cas9 system comprises: i) one or more guide RNA (gRNA) comprising a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA) linked together, the one or more gRNA being capable of binding with sequence specificity to a first target DNA sequence and a second target DNA sequence in the TGR, the first target DNA sequence to which the one or more gRNA binds being adjacent to a first PAM sequence, and the second target DNA sequence being adjacent to a second PAM sequence, wherein the first and the second target DNA sequence are located within 100 nucleotides of each other and are on opposite strands of the DNA double helix; ii) a Cas9n protein; and iii) a repair template for homology-directed repair (HDR), wherein the repair template comprises one or more nucleotide change in one or more of the first PAM sequence and the second PAM sequence. By “wherein the first and the second target DNA sequence are located within 100 nucleotides of each other” is meant that in some embodiments the separation between the first and the second target is about 100 nucleotides or is less than 100 nucleotides from each other, however in some embodiments greater separations of more than 100 nucleotides between target DNA sequences are possible. The mammalian cell is contacted with the CRISPR/Cas9 system under conditions such that the TGR is modified, forming a modified-TGR, the repair template having been selected such that one or more of the first PAM sequence, the second PAM sequence, the first target DNA sequence and the second target DNA sequence are modified within the modified-TGR so as to prevent further modification of the modified-TGR by the CRISPR/Cas9 system. 
In some embodiments, the repair template comprises a single nucleotide change in one or more of the first PAM sequence, the second PAM sequence, the first target DNA sequence, and the second target DNA sequence.
As used herein, a “target DNA sequence” is also referred to as a “target genomic DNA (gDNA) region”. These terms are used interchangeably when the target DNA sequence is in the genome of a mammalian cell.
In some embodiments, one or more of the PAM sequences and the target DNA sequences no longer exists in the genome of the mammalian cell, or exists only on one strand of the DNA double helix within the modified-TGR, such that the CRISPR/Cas9 system can no longer bind to and/or modify the modified-TGR. In an embodiment, one or more of the first gRNA, the second gRNA, and the Cas9n protein cannot bind to at least one strand of the DNA double helix in the modified-TGR. In another embodiment, one or more of the first gRNA, the second gRNA, and the Cas9n protein cannot bind to either strand of the DNA double helix in the modified-TGR.
Any combination of the PAM sequences and the target DNA sequences may be modified in the modified-TGR. For example, in some embodiments, only one of the first PAM sequence, the second PAM sequence, the first target DNA sequence, and the second target DNA sequence is modified in the modified-TGR. In other embodiments, two, three, or all of the first PAM sequence, the second PAM sequence, the first target DNA sequence, and the second target DNA sequence are modified in the modified-TGR. The number and location of such modifications is not particularly limited, as long as the CRISPR/Cas9 system can no longer bind to and/or modify the modified-TGR. In this way efficiency can be increased. Further, in some cases, multiple introductions of the CRISPR/Cas9 system into a cell are also made possible, as a modified-TGR in a cell will not be further cut or modified by Cas9.
A PAM sequence (e.g., a first PAM sequence, a second PAM sequence, etc.) may be any PAM sequence appropriate for the particular Cas9 protein being used. PAM sequences associated with the various Cas9 proteins as indicated are shown in Table 1. In an embodiment, the PAM sequence is selected from NGG, NNGRRT, NNGRRN, NNNNGATT, NNAGAAW, NAAAAC, NGG, NAG, NGCG, NGAG, NGAN, NGNG, and NTT, where R is A or G, W is A or T, and N is A, C, G, or T. In an embodiment, the PAM sequence is 3′-GGA-5′ and the modified-TGR comprises a single nucleotide change in the PAM sequence that changes the PAM sequence to 3′-TGA-5′. In an embodiment where a repair template is included, the repair template comprises a single nucleotide change in the PAM sequence that changes the PAM sequence to 3′-TGA-5′.
Streptococcus pyogenes (Sp)*
Staphylococcus aureus (Sa)*
Neisseria meningitidis (Nm)*
Streptococcus thermophilus (St)*
Treponema denticola (Td)*
Francisella novicida (Fn)#
Neisseria cinerea (Nc)*
Campylobacter jejuni (Cj)*
#Cpf1 protein
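By way of illustration only, PAM patterns written with the degenerate symbols defined above (R is A or G, W is A or T, N is A, C, G or T) can be matched against a DNA sequence as sketched below in Python. The function names, the 20-nucleotide default protospacer length and the example sequences are assumptions made for this sketch; only the given strand is scanned, and the PAM is assumed to lie immediately 3′ of the protospacer.

```python
import re

# Degenerate nucleotide symbols used in the PAM list above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "W": "[AT]", "N": "[ACGT]",
}


def pam_to_regex(pam: str) -> str:
    """Convert a PAM such as 'NNGRRT' into a regular-expression body."""
    return "".join(IUPAC[symbol] for symbol in pam.upper())


def find_targets(sequence: str, pam: str, protospacer_len: int = 20) -> list:
    """Return (protospacer_start, protospacer, pam_found) tuples.

    A zero-width lookahead is used so that overlapping PAM occurrences
    are all reported. PAMs too close to the 5' end to be preceded by a
    full-length protospacer are skipped.
    """
    seq = sequence.upper()
    pattern = re.compile("(?=(" + pam_to_regex(pam) + "))")
    hits = []
    for match in pattern.finditer(seq):
        pam_start = match.start()
        if pam_start >= protospacer_len:
            start = pam_start - protospacer_len
            hits.append((start, seq[start:pam_start], match.group(1)))
    return hits


if __name__ == "__main__":
    example = "A" * 20 + "TGG"  # hypothetical sequence ending in an NGG PAM
    print(find_targets(example, "NGG"))
```

Symbols outside the small table above raise a `KeyError`, which keeps the sketch restricted to the degenerate codes actually defined in the text.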
In some embodiments, one or more nucleotide or base change is made in a PAM sequence in the modified-TGR in order to “mask” the PAM sequence so that binding by the Cas9 enzyme is prevented. In some embodiments, the mutation(s) introduced in the PAM sequence prevents binding of the Cas9n protein. In some embodiments, the mutation(s) introduced in the PAM sequence is silent, i.e., does not change the amino acid encoded by the sequence, if the PAM sequence is included in a protein coding region. In some embodiments, masking the PAM sequence by mutating it so that the Cas9 protein can no longer bind prevents modified target DNA sequences from being cut a second time by the Cas9 protein. In this way efficiency can be increased. Further, in some cases, multiple introductions of the CRISPR/Cas9 system into a cell are also made possible, as a modified target DNA sequence will not be further cut or modified by Cas9.
In some embodiments, a PAM sequence is partially or fully located in an intron in the TGR.
In some embodiments, a repair template is a single-stranded DNA oligonucleotide (ssODN). A repair template, e.g., ssODN, typically further comprises a DNA sequence to be inserted or modified in the target DNA sequence. In some embodiments, a repair template, e.g., ssODN, further comprises a cap at the 5′ end, the 3′ end, or both. A cap may be, without limitation, 4 nucleotides (such as a CGCG repeat), a peptide, or a detectable tag (such as a fluorophore or a dye, e.g., a Cyanine dye, or a quantum dot). One or more caps may be present on a repair template. In some embodiments, one or more cap may serve to increase stability of the repair template, increase efficiency of the HDR pathway, and/or allow monitoring of cellular uptake and/or distribution of the repair template in a cell.
In some embodiments the components of the CRISPR/Cas9 system are provided directly as nucleic acid (e.g., RNA, ssODN) and protein. In other embodiments the components are provided in the form of a DNA, such as a vector, that encodes the component, for expression in the mammalian cell. Each component may be encoded by the same DNA or by a different DNA. Further, some components may be provided directly (e.g., as an RNA or protein, etc.) while other components are provided in the form of a DNA. Many such permutations are possible and are not meant to be limited.
In one embodiment, one or more gRNA is provided by an episome (e.g., through an episomal vector) that encodes it. The mammalian cell may be contacted with the episome encoding the gRNA first, prior to contacting the mammalian cell with the Cas9n protein and/or the optional repair template. Without wishing to be bound by theory, it is believed that, in some embodiments, priming a cell with gRNA (e.g., by an episomal vector) in this way may increase efficiency of genomic modification by allowing higher and/or longer expression of the gRNA in the mammalian cell (for example, due to replication of the episome in the cell). It should be understood that the number of episomal vectors is not particularly limited. Multiple gRNAs may be provided on one or on multiple episomes, for example.
In one embodiment, the Cas9n protein is provided directly as an isolated protein. In another embodiment, the Cas9n protein is provided in the form of a nucleic acid, e.g., a DNA plasmid, encoding the Cas9n protein. In another embodiment, the nucleic acid encoding the Cas9n protein is an RNA.
In some embodiments, the CRISPR/Cas9 system is introduced into the mammalian cell via transfection. Other methods for introducing nucleic acids and proteins into cells are known and may be used. The method of introducing the CRISPR/Cas9 system into the cell is not meant to be particularly limited.
In some embodiments, the CRISPR/Cas9 system is introduced, e.g., transfected, into the mammalian cell more than once. Multiple transfections may be performed, as desired, to increase efficiency of genomic modification. For example, multiple transfections may represent two, three, five, ten, or more than ten transfections into the cell, either in vitro or in vivo or both. In some embodiments, the CRISPR/Cas9 system is introduced into the mammalian cell by a) first transfecting an episomal vector encoding the gRNA into the mammalian cell; and b) then transfecting the Cas9n protein or an RNA encoding the Cas9n protein and the repair template (if present) into the mammalian cell, the repair template (if present) being for example a ssODN, as described herein.
For illustrative purposes the Cas9n protein is used in the embodiments described herein. However, the Cas protein is not meant to be particularly limited. It should be expressly understood that any suitable Cas protein may be used in methods and systems provided herein. Further, the term “Cas9” protein is used herein to refer generally to different forms of the protein, such as without limitation Cas9 (wild-type), Cas9n, dCas9, and other appropriate modified versions of Cas9, unless specified otherwise.
The mammalian cell to be genomically-modified is also not particularly limited. Any suitable cell in which a genomic modification is desired may be used in methods provided herein. For example, a mammalian cell may be an embryonic stem cell, a pluripotent stem cell, an induced pluripotent stem cell, a multipotent stem cell, a directly reprogrammed multipotent stem cell, a precursor cell, a progenitor cell, or a somatic cell. A mammalian cell may be a neuronal cell (including neurons, astrocytes and oligodendrocytes), a neural progenitor cell, a neural precursor cell, a neural stem cell, as well as other somatic, precursor, progenitor and stem cells of ectodermal, endodermal and mesodermal lineage such as cells of cardiac lineage, blood lineage (except mature red blood cells that have no nucleus), muscle lineage, adipocyte (fat) lineage, epithelial lineage, endothelial lineage, epidermal lineage, pulmonary lineage, hepatic lineage, pancreatic lineage, as well as kidney and other organ and system lineages, as well as tumour cells and other abnormal cells, and the like. In an embodiment, the mammalian cell is a human cell.
In a third aspect, there is provided a method for treating or preventing ALS in a patient in need thereof, comprising use of the CRISPR/Cas9 systems provided herein to repair mutations in the SOD1 gene in the patient (e.g., to correct an H46R mutation, or to delete repeats). In some embodiments, methods provided herein may be used to repair mutations in the SOD1 gene in ALS patients. For example, the target gDNA region may include or be adjacent to an H46R mutation in the SOD1 gene. In an embodiment, the TGR comprises all or a portion of the DNA sequence set forth in region 31655770-31670821 of NCBI Reference Sequence NC_000021.9; the repair template is a ssODN having the sequence set forth in SEQ ID NO: 6 or 7; and/or one or more gRNA comprises the sequence set forth in SEQ ID NO: 4 or 5.
In some embodiments, one or more gRNA is selected such that the target gDNA region to which it binds is adjacent to the required PAM sequence only in the chromosome of an ALS patient carrying an ALS DNA mutation and not in a normal chromosome (that does not have the mutation, or has already been repaired by an earlier intervention). This ensures that genomic modification will only occur in the ALS patient on genomic DNA regions in need of repair.
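By way of non-limiting illustration, the allele-specific PAM selection described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the method of any particular embodiment: the PAM pattern `NGG`, the 20-nucleotide target length, the function names, and the example sequences are all hypothetical choices made for illustration, and the example sequences are not taken from the SOD1 gene.

```python
import re

# IUPAC codes used in PAM patterns; only a small subset is needed for this sketch.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}


def pam_matches(site: str, pam: str = "NGG") -> bool:
    """Return True if `site` matches the PAM pattern (NGG is an assumed default)."""
    pattern = "".join(IUPAC[base] for base in pam)
    return re.fullmatch(pattern, site.upper()) is not None


def pam_after_target(seq: str, target_start: int, target_len: int = 20, pam: str = "NGG") -> bool:
    """Check for a PAM immediately 3' of the target beginning at `target_start` (0-based)."""
    pam_start = target_start + target_len
    site = seq[pam_start:pam_start + len(pam)]
    return len(site) == len(pam) and pam_matches(site, pam)


def is_allele_specific(normal: str, mutant: str, target_start: int) -> bool:
    """True when the required PAM is present on the mutant allele only."""
    return pam_after_target(mutant, target_start) and not pam_after_target(normal, target_start)


# Made-up sequences: a single-base change (A to G) creates the PAM on the mutant allele.
normal = "ACGTACGTACGTACGTACGT" + "TGA" + "TTT"
mutant = "ACGTACGTACGTACGTACGT" + "TGG" + "TTT"
print(is_allele_specific(normal, mutant, 0))
```

A candidate gRNA would be retained only when `is_allele_specific` returns true, so that a chromosome lacking the mutation (or one already repaired) no longer presents the PAM.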
In a fourth aspect, there is provided a method for treating or preventing HIV infection in a patient in need thereof, comprising use of the CRISPR/Cas9 systems provided herein to introduce mutations (e.g., deletions) in the CCR5 gene in the patient. In some embodiments, methods provided herein may be used to modify the CCR5 gene, e.g., the target gDNA region is in the CCR5 gene, includes the CCR5 gene, or is adjacent to the CCR5 gene. For example, the TGR may comprise all or a portion of the DNA sequence set forth in region 46372903-46373961 of NCBI Reference Sequence NC_000003.12; the gRNA may comprise the sequence set forth in SEQ ID NO: 8 or 9; and/or the repair template may be a ssODN having the sequence set forth in SEQ ID NO: 10 or 11.
In a fifth aspect, there is provided a method for treating cancer in a patient in need thereof, comprising use of the CRISPR/Cas9 systems provided herein to introduce mutations (e.g., deletions or insertions) into a cancer-causing gene (e.g., an oncogene) in the patient. In some embodiments, methods provided herein may be used to modify a cancer-causing gene (i.e., the TGR is in the cancer-causing gene) to correct the gene so it is no longer cancer-causing. For example, the TGR may comprise all or a portion of the DNA sequence set forth in region 21967752-21995301 of NCBI Reference Sequence NC_000009. In other embodiments, methods provided herein may be used to modify one or more gene in a cancer cell, such as one or more gene that results in death of the cell, termination or reduction in proliferation and/or growth of the cell, and/or confers dependence on the presence or introduction (or the lack of presence) of a substance for continued survival of the cell.
In a sixth aspect, there is provided a method for treating or preventing ALS or Frontotemporal Dementia resulting from a mutated C9ORF72 gene in a patient in need thereof, comprising use of the CRISPR/Cas9 systems provided herein. In an embodiment, deletions are introduced in the mutated C9ORF72 gene in the patient. In some embodiments, such deletions are introduced without the use of a repair template, e.g., without the use of a ssODN. In some embodiments, methods provided herein may be used to modify or correct a mutated C9ORF72 gene (i.e., the TGR is in the mutated C9ORF72 gene). In an embodiment, the TGR comprises all or a portion of the DNA sequence set forth in region 27546546-27573866 of NCBI Reference Sequence NC_000009.12; and/or one or more gRNA having the sequence set forth in any one of SEQ ID NO: 1, 2, and 3 is used. In a particular embodiment, three gRNAs having the sequences of SEQ ID NOs: 1, 2, and 3 are used.
In a seventh aspect, there is provided a method for treating or preventing a mitochondrial disease in a patient in need thereof, comprising carrying out targeted genomic modification within a target mitochondrial DNA region in a mammalian cell of the patient using methods provided herein. In an embodiment, the ssODN is conjugated with MSP or TPP and the target mitochondrial DNA region comprises the nt.A12770G mutation.
In an eighth aspect, there is provided a method for treating or preventing cystic fibrosis in a patient in need thereof, comprising carrying out targeted genomic modification within a target genome region (TGR) in a mammalian cell of the patient using methods provided herein. In an embodiment, the TGR comprises the W1282X mutation.
In a ninth aspect, there is provided a method for inactivation of a transgene in a genetically-modified organism (GMO), comprising carrying out targeted genomic modification within a target genome region (TGR) in a cell of the GMO using methods provided herein. The GMO may be a plant or an animal such as, without limitation, an α-interferon transgene-expressing genetically-engineered plant (such as a rice crop), a GFP-expressing transgenic fish, and the like.
In a tenth aspect, there is provided a method of treating a disease listed in Table 5 or Table 6 in a patient in need thereof, comprising carrying out targeted genomic modification within a target genome region (TGR) in a mammalian cell of the patient using methods provided herein.
In another aspect, there is provided a method for targeted genomic modification within a target genome region (TGR) in a mammalian cell, the method comprising: a) providing a CRISPR/Cas9 system comprising: i) a first guide RNA (gRNA) comprising a first CRISPR RNA (crRNA) and a first trans-activating crRNA (tracrRNA) linked together, the first gRNA being capable of binding with sequence specificity to a first target DNA sequence on one strand of the DNA double helix in the TGR, the first target DNA sequence to which the first gRNA binds being adjacent to a first PAM sequence; ii) a second gRNA comprising a second CRISPR RNA (crRNA) and a second trans-activating crRNA (tracrRNA) linked together, the second gRNA being capable of binding with sequence specificity to a second target DNA sequence, the second target DNA sequence to which the second gRNA binds being adjacent to a second PAM sequence, wherein the second target DNA sequence is on the same strand of the DNA double helix as the first target DNA sequence; iii) a third gRNA comprising a third CRISPR RNA (crRNA) and a third trans-activating crRNA (tracrRNA) linked together, the third gRNA being capable of binding with sequence specificity to a third DNA sequence on one strand of the DNA double helix in the TGR, the third target DNA sequence to which the third gRNA binds being adjacent to a third PAM sequence, wherein the third target DNA sequence is on the opposite strand of the DNA double helix from the first and the second target DNA sequences; wherein the first target DNA sequence and the second target DNA sequence are overlapping, such that the first gRNA and the second gRNA compete for binding to their respective target DNA sequences; and wherein at least one of the second gRNA and the third gRNA is selected such that the CRISPR/Cas9 system can only bind and/or modify the second target DNA sequence and/or the third target DNA sequence respectively if the target genome region comprises a disease-causing modification or a sequence for which modification is desired; and iv) a Cas9n protein; and b) contacting the mammalian cell with the CRISPR/Cas9 system such that the TGR is modified, forming a modified-TGR. In some embodiments, the CRISPR/Cas9 system can only bind and/or modify the second and/or the third target DNA sequence in the mammalian cell of a patient suffering from a disease. In some embodiments, this method is particularly useful for treating a disease-causing mutation which is a repeat expansion and/or for treating a repeat expansion disorder.
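By way of non-limiting illustration, the strand and overlap relationships recited above for the three gRNA target sequences can be checked with a short Python sketch. The `Target` class, the 0-based end-exclusive coordinate convention, and the example coordinates are hypothetical and are introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A gRNA target site; coordinates are 0-based, end-exclusive (an assumed convention)."""
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Target, b: Target) -> bool:
    """True if the two sites share at least one nucleotide position."""
    return a.start < b.end and b.start < a.end


def is_valid_triple_layout(first: Target, second: Target, third: Target) -> bool:
    """Check the relationships recited for the first, second and third target sequences."""
    same_strand = first.strand == second.strand       # first and second on the same strand
    third_opposite = third.strand != first.strand     # third on the opposite strand
    return same_strand and third_opposite and overlaps(first, second)


# Hypothetical coordinates: the first and second sites overlap by 10 nucleotides.
first = Target(100, 120, "+")
second = Target(110, 130, "+")
third = Target(200, 220, "-")
print(is_valid_triple_layout(first, second, third))
```

The sketch checks geometry only; whether the second or third gRNA binds solely in the presence of the disease-causing modification would have to be assessed separately.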
In another aspect, there is provided a method for treating or preventing a repeat expansion disorder, comprising carrying out targeted genomic modification within a target genome region (TGR) in a mammalian cell of the patient using methods provided herein. In an embodiment, the target genome region comprises a disease-causing mutation which is a repeat expansion, e.g., a trinucleotide expansion, a hexanucleotide expansion, and the like, and triple gRNA guided excision as described herein is used to remove extra repeats from the TGR. In other embodiments, the target genome region comprises a disease-causing mutation which is a repeat expansion, e.g., a trinucleotide expansion, a hexanucleotide expansion, and the like, and triple gRNA guided excision (e.g., using a gRNA1, a gRNA2 and a gRNA3 and as described herein, for example in
In some embodiments, the repeat expansion is at least about 30 bp long. In some embodiments, the repeat expansion encompasses 5 or more hexanucleotide repeats, 10 or more trinucleotide repeats, more than 3 hexanucleotide repeats, more than 4 hexanucleotide repeats, or more than 5 hexanucleotide repeats. Any repeat expansion disorder may be treated using methods provided herein such as, without limitation, Fragile X Syndrome, Huntington's disease, spinocerebellar ataxia, myotonic dystrophy, myoclonic epilepsy, Friedreich's ataxia, amyotrophic lateral sclerosis (ALS) and/or frontotemporal dementia. In an embodiment, the disease-causing mutation is an amyotrophic lateral sclerosis (ALS)-causing mutation and/or the disease is ALS. In an embodiment, the disease-causing mutation is a Fragile X Syndrome-causing mutation and/or the disease is Fragile X Syndrome.
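By way of non-limiting illustration, the repeat-count and length thresholds mentioned above can be evaluated on a candidate sequence with a short Python sketch. The function name, the example sequence, and the choice of the hexanucleotide unit GGGGCC are assumptions made for illustration only; the thresholds of 5 repeats and 30 bp are taken from the text.

```python
def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` found anywhere in `seq`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    seq, unit = seq.upper(), unit.upper()
    best = 0
    for start in range(len(seq)):
        count = 0
        pos = start
        # Count consecutive copies of the unit beginning at this position.
        while seq.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


# Made-up example: six copies of a hexanucleotide unit flanked by unrelated bases.
unit = "GGGGCC"
sequence = "TT" + unit * 6 + "AA"
copies = longest_tandem_run(sequence, unit)
span_bp = copies * len(unit)

# Thresholds from the text: 5 or more hexanucleotide repeats, at least about 30 bp.
print(copies >= 5, span_bp >= 30)
```

Every start position is scanned so that a run beginning in a different phase of a self-overlapping unit is not missed; this is quadratic in the worst case, which is acceptable for short regions.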
In another aspect, there is provided a method for targeted genomic modification within a target genome region (TGR) in a mammalian cell, the method comprising: a) providing a CRISPR/Cas9 system comprising: i) a first guide RNA (gRNA) comprising a first CRISPR RNA (crRNA) and a first trans-activating crRNA (tracrRNA) linked together, the first gRNA being capable of binding with sequence specificity to a first target DNA sequence on one strand of the DNA double helix in the TGR, the first target DNA sequence to which the first gRNA binds being adjacent to a first PAM sequence; ii) a second gRNA comprising a second CRISPR RNA (crRNA) and a second trans-activating crRNA (tracrRNA) linked together, the second gRNA being capable of binding with sequence specificity to a second target DNA sequence, the second target DNA sequence to which the second gRNA binds being adjacent to a second PAM sequence, wherein the second target DNA sequence is on the same strand of the DNA double helix as the first target DNA sequence; iii) a third gRNA comprising a third CRISPR RNA (crRNA) and a third trans-activating crRNA (tracrRNA) linked together, the third gRNA being capable of binding with sequence specificity to a third DNA sequence on one strand of the DNA double helix in the TGR, the third target DNA sequence to which the third gRNA binds being adjacent to a third PAM sequence, wherein the third target DNA sequence is on the opposite strand of the DNA double helix from the first and the second target DNA sequences; iv) a fourth gRNA comprising a fourth CRISPR RNA (crRNA) and a fourth trans-activating crRNA (tracrRNA) linked together, the fourth gRNA being capable of binding with sequence specificity to a fourth DNA sequence on one strand of the DNA double helix in the TGR, the fourth target DNA sequence to which the fourth gRNA binds being adjacent to a fourth PAM sequence, wherein the fourth target DNA sequence is on the opposite strand of the DNA double helix from the first and the second target DNA sequences; wherein at least one of the first gRNA, the second gRNA, the third gRNA, and the fourth gRNA is selected such that the CRISPR/Cas9 system can only bind and/or modify the respective target DNA sequence if the respective target DNA sequence comprises a disease-causing modification or a sequence for which modification is desired; and v) a Cas9n protein; and b) contacting the mammalian cell with the CRISPR/Cas9 system such that the TGR is modified, forming a modified-TGR. In some embodiments, the CRISPR/Cas9 system can only bind and/or modify the respective target DNA sequence in the mammalian cell of a patient suffering from a disease. In some embodiments, the fourth gRNA is selected such that the CRISPR/Cas9 system can only bind and/or modify the fourth target DNA sequence if the fourth target DNA sequence comprises a disease-causing modification or a sequence for which modification is desired; and the second and the fourth target DNA sequence are located on opposite strands of the DNA double helix and are separated by a number of nucleotides sufficient to induce double stranded break (DSB) repair. In some embodiments, the second and the fourth target DNA sequence are separated by about 100 nucleotides or less than 100 nucleotides from each other, or by about 10 nucleotides or less, about 20 nucleotides or less, or about 50 nucleotides or less from each other. In some embodiments, the DSB repair introduces an indel mutation in the target genome region, e.g., knocking out or silencing the disease-causing modification in the target genome region.
In some embodiments, the third gRNA is also selected such that the CRISPR/Cas9 system can only bind and/or modify the third target DNA sequence if the third target DNA sequence comprises a disease-causing modification or a sequence for which modification is desired; in this case, the first and the third target DNA sequence are located on opposite strands of the DNA double helix and are separated by a number of nucleotides sufficient to induce double stranded break (DSB) repair (e.g., less than 100 nucleotides apart, less than 50 nucleotides apart, less than 20 nucleotides apart, or less than 10 nucleotides apart).
In alternative embodiments, the third gRNA and the first gRNA are selected such that the CRISPR/Cas9 system can bind and/or modify their respective target DNA sequences even if the respective target DNA sequences do not comprise a disease-causing modification; in this case, the first and the third target DNA sequence are located on opposite strands of the DNA double helix and are separated by a number of nucleotides sufficient to not induce double stranded break (DSB) repair (e.g., more than about 100 nucleotides apart).
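By way of non-limiting illustration, the spacing rule described above for a pair of nickase target sites can be expressed as a short Python sketch. The threshold of 100 nucleotides is taken from the "about 100 nucleotides" figure in the text; the function name, the use of a single coordinate per site, and the way separation is measured are assumptions made for illustration.

```python
DSB_MAX_SEPARATION_NT = 100  # from the "about 100 nucleotides" figure in the text


def expected_outcome(site_a: int, site_b: int, opposite_strands: bool) -> str:
    """Classify a pair of nick sites by strand arrangement and separation.

    `site_a` and `site_b` are hypothetical single coordinates for the two sites;
    measuring separation as their absolute difference is an assumption.
    """
    separation = abs(site_a - site_b)
    if opposite_strands and separation <= DSB_MAX_SEPARATION_NT:
        return "paired nicks: DSB repair expected"
    return "independent nicks: DSB repair not expected"


# Hypothetical coordinates on opposite strands.
print(expected_outcome(1000, 1040, opposite_strands=True))   # 40 nt apart
print(expected_outcome(1000, 1250, opposite_strands=True))   # 250 nt apart
```

In the allele-specific arrangement, the closely spaced pair would include the gRNA that binds only the mutated sequence, so that the paired nicks arise only on the chromosome carrying the mutation.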
In some embodiments, this method is particularly useful where the disease-causing mutation is a heterozygous mutation, e.g., a point mutation, e.g., a gain of function mutation, and only one chromosome requires repair. DSB repair is then only desired on the mutated chromosome. In an embodiment, the disease-causing mutation is a mutated SOD1 allele and/or the disease is ALS.
In another aspect, there is provided a method for treating or preventing a disease or condition caused by a heterozygous mutation, comprising carrying out targeted genomic modification within a target genome region (TGR) in a mammalian cell of the patient using methods provided herein. In an embodiment, the disease-causing heterozygous mutation is a point mutation. In some embodiments, the disease-causing heterozygous mutation is disrupted, i.e., removed or corrected. In other embodiments, the disease-causing heterozygous mutation is silenced or knocked out. It should be understood that in many cases where a disease-causing mutation is a heterozygous mutation, knocking out the mutated allele can be as effective at treating the disease as correcting the gene mutation. For example, in many cases the non-mutated copy of the gene provides adequate levels of the gene product (protein or RNA), and the mutated copy of the gene interferes with the function of the wild-type gene product, thereby causing disease symptoms that would not occur if only the wild-type, non-mutated gene product were present. In such cases, knocking out the mutated allele provides a simpler strategy to treat the disease effectively and has a lower chance of introducing new, unwanted gene mutations during the genomic modification than repair of the mutated allele. In some embodiments therefore, there is provided a method for treating or preventing a disease or condition caused by a heterozygous mutation, comprising carrying out targeted genomic modification within a TGR in a mammalian cell of the patient to knock out the heterozygous mutation, using methods provided herein. In an embodiment, a heterozygous mutation is knocked out using a QuadPlex gRNA method as described herein (for example, in
In another aspect, there are provided methods for highly efficient and precise targeted genomic modification within a target genome region (TGR, also referred to as gDNA). In some embodiments, delivery of the components of the targeted genomic modification system (e.g., Cas9, gRNAs, HDR templates if needed, etc.) into the cells carrying the TGR is enhanced by delivering the targeted genomic modification components on a plasmid into the cells, and optionally selecting for transfected cells carrying the plasmid. For example, in some embodiments, Cas9 or Cas9n is delivered as an episomal or non-episomal plasmid expressing the Cas9 or Cas9n mRNA or protein. In some embodiments, CRISPR gRNAs are delivered in combination with Cas9 or Cas9n as episomal or non-episomal plasmids. Further, in some embodiments, gRNAs produced using in vitro transcription (also referred to as “IVT gRNAs”) are pre-loaded onto a Cas9 protein. In some embodiments, cleavage specificity of the CRISPR/Cas9 system is further enhanced by direct introduction into a cell of a pre-complexed Cas9 or Cas9n protein with IVT gRNA to form a Cas9/gRNA ribonucleoprotein complex (Cas9/gRNA RNP). This approach can be fine-tuned to deliver the Cas9/Cas9n protein at the optimum concentration to limit off-target cleavage by utilizing a cell's own endogenous degradation machinery to rapidly degrade the Cas9 or Cas9n protein, once it has completed its activity.
In another aspect, there are provided methods for highly efficient and precise targeted genomic modification within a target genome region in a population of cells comprising introducing components of the targeted genomic modification system into the cells using an episomal plasmid and selecting for transfected cells having the episomal plasmid therein. In some embodiments, transfected cells are selected by screening for truncated proteins and/or tag-epitopes that are encoded by the episomal plasmid encoding the components of the targeted genomic modification system (Cas9, gRNAs, etc.) and expressed at the cell surface of transfected cells carrying the episomal plasmid. For example, in some embodiments an episomal plasmid encoding one or more gRNA and/or Cas9 also encodes a truncated surface protein or a protein that confers specific antibiotic resistance to a cell, allowing for selection and purification of transfected cells carrying the episomal plasmid using, e.g., sorting, magnetic antibody separation, a specific antibody for the truncated surface protein, and the like. Cells having the episomal plasmid are then selected and purified out of the starting cell population, greatly enriching the number of genomic-modified cells in the population. In some embodiments, a completely pure or nearly pure genomic-modified cell population may be obtained using an episomal plasmid. It should be understood that the components of the targeted genomic modification system (Cas9, gRNAs, etc.) are generally encoded by the same episomal plasmid encoding the truncated protein and/or tag-epitope. However, in alternate embodiments, the components of the targeted genomic modification system (Cas9, gRNAs, etc.) and the truncated protein and/or tag epitope may be encoded by separate episomal plasmids which are co-transfected into the cells.
In some embodiments, transfected cells are selected and/or purified without antibiotic selection by transfecting an episomal plasmid encoding a non-immunogenic N- or C-terminal truncated protein that is expressed at the cell surface of transfected cells carrying the episomal plasmid. This approach can be utilized for any cell type, whether cells are adherent or grow in suspension, for rapid antibiotic-free selection of transfected cells, in order to enrich the percentage of genomic-modified cells. In some embodiments, an episomal plasmid encoding a non-immunogenic N- or C-terminal truncated protein may also encode a tag-epitope. Alternatively, an episomal plasmid encoding a tag-epitope may be used. A tag-epitope can be used similarly either in place of or in addition to a truncated surface protein as a selection, tracking and purification tool for transfected cells. In some embodiments, tag-epitopes are inserted between the ends of an outer membrane signal peptide and before the start codon of a truncated protein. It should be understood that any suitable truncated protein and/or tag-epitope may be used.
Exemplary gRNA and repair template sequences are shown in Table 2 and Table 6. In another aspect, there are provided isolated nucleic acids comprising or consisting of any one of the sequences set forth in SEQ ID NOs: 1-103, 112 and 113.
¹ PAM sequences are single underlined; bold and double underlined bases represent modifications introduced into the modified-TGR by the CRISPR/Cas9 system.
Therapeutic application of methods and systems provided herein is not meant to be particularly limited. For example, methods and systems may be used for genetic modification, for gene-editing, to manipulate DNA in a cell, to increase or decrease expression of a particular gene, to correct mutated sequences or base pairs, to correct deletions or insertions of sequences, to cause deletions or insertions of sequences, to induce mutations, to inactivate a transgene in a genetically modified organism, and the like. Methods and systems may thus be used in a wide range of applications, including therapeutically in a patient, e.g., a human patient, or in a cell derived from or isolated from a human patient. Methods and systems provided herein may be used ex vivo or in vivo.
In another aspect, there is provided a repair template for HDR, e.g., a ssODN, as described herein. For example, isolated ssODNs having modifications as described above are provided (e.g., having mutated PAM sequences, capped ends, etc.). In an embodiment, an ssODN having the sequence set forth in SEQ ID NOs: 6, 7, 10, 11, 82, or 83 is provided. DNAs such as recombinant vectors encoding and expressing such ssODNs are also provided.
In yet another aspect, there is provided a guide RNA for treating ALS or HIV, having the sequence set forth in SEQ ID NOs: 1, 2, 3, 4, 5, 8, or 9. DNAs such as recombinant vectors encoding and expressing such gRNAs are also provided.
In still another aspect, there are provided cells comprising repair templates, e.g., ssODNs, and gRNAs described herein, as well as DNAs encoding them. In yet another aspect, there are provided cells comprising a CRISPR/Cas9 system described herein.
In a further aspect, there is provided a kit for genomic modification in a cell. The kit may comprise a repair template, e.g., an ssODN, a gRNA, and/or a Cas9 protein, or nucleic acids encoding them, and instructions for use thereof.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
For a better understanding of the invention and to show more clearly how it may be carried into effect, reference will now be made by way of example to the accompanying drawings, which illustrate aspects and features according to embodiments of the present invention, and in which:
In order to provide a clear and consistent understanding of the terms used in the present specification, a number of definitions are provided below. Moreover, unless defined otherwise, all technical and scientific terms as used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this invention pertains.
The use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one”, but it is also consistent with the meaning of “one or more”, “at least one”, and “one or more than one”. Similarly, the word “another” may mean at least a second or more.
As used in this specification and claim(s), the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “include” and “includes”) or “containing” (and any form of containing, such as “contain” and “contains”), are inclusive or open-ended and do not exclude additional, unrecited elements or process steps.
The term “about” is used to indicate that a value includes an inherent variation of error for the device or the method being employed to determine the value.
The terms “derivative” and “variant” are used interchangeably herein.
The terms “polynucleotide” and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. “Oligonucleotide” generally refers to polynucleotides of between about 5 and about 200, and more generally of less than about 1000, nucleotides of single- or double-stranded DNA, although there is no upper limit to the length of an oligonucleotide. Oligonucleotides are also referred to as “oligomers” or “oligos” and may be isolated from genes, or chemically synthesized by methods known in the art. The terms “polynucleotide” and “nucleic acid” should be understood to include, as applicable to the embodiments being described, single-stranded (such as sense or antisense) and double-stranded polynucleotides.
“Genomic DNA” refers to the DNA of a genome of an organism including, but not limited to, the DNA of the genome of a bacterium, fungus, archaea, plant or animal, including mammals, including humans.
“Manipulating” DNA encompasses binding, nicking one strand, or cleaving (i.e., cutting) both strands of the DNA, or encompasses modifying the DNA or a polypeptide associated with the DNA. Manipulating DNA can (but does not necessarily) silence, activate, or modulate (either increase or decrease) the expression of an RNA or polypeptide encoded by the DNA. Manipulating DNA can (but does not necessarily) alter the amino acid sequence of a polypeptide encoded by the DNA. Such alteration in the amino acid sequence may affect (e.g., increase or decrease) the function, enzymatic activity, and/or stability of the encoded polypeptide.
A “stem-loop structure” refers to a nucleic acid having a secondary structure that includes a region of nucleotides which are known or predicted to form a double strand (stem portion) that is linked on one side by a region of predominantly single-stranded nucleotides (loop portion). The terms “hairpin” and “fold-back” structures are also used herein to refer to stem-loop structures. Such structures are well known in the art and these terms are used consistently with their known meanings in the art. As is known in the art, a stem-loop structure does not require exact base-pairing. Thus, the stem may include one or more base mismatches. Alternatively, the base-pairing may be exact, i.e. not include any mismatches.
By “hybridizable” or “complementary” or “substantially complementary” it is meant that a nucleic acid (e.g., RNA) comprises a sequence of nucleotides that enables it to non-covalently bind, i.e., to form Watson-Crick base pairs and/or G/U base pairs, “anneal”, or “hybridize,” to another nucleic acid in a sequence-specific, antiparallel, manner (i.e., a nucleic acid specifically binds to a complementary nucleic acid) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength. As is known in the art, standard Watson-Crick base-pairing includes: adenine (A) pairing with thymine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C) (DNA, RNA). In addition, it is known in the art that for hybridization between two RNA molecules (e.g., dsRNA), guanine (G) base pairs with uracil (U). For example, G/U base-pairing is partially responsible for the degeneracy (i.e., redundancy) of the genetic code in the context of tRNA anti-codon base-pairing with codons in mRNA. In the context of this disclosure, a guanine (G) of a protein-binding segment (dsRNA duplex) of a guide RNA molecule is considered complementary to a uracil (U), and vice versa. As such, when a G/U base-pair can be made at a given nucleotide position in a protein-binding segment (dsRNA duplex) of a guide RNA molecule, the position is not considered to be non-complementary, but is instead considered to be complementary.
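By way of non-limiting illustration, the treatment of G/U base pairs as complementary, as defined above, can be sketched in Python. The function name, the `allow_gu` parameter, and the example sequences are hypothetical; both strands are assumed to be written 5′ to 3′ and paired in antiparallel orientation without gaps.

```python
# Standard Watson-Crick pairs (DNA and RNA) and the G/U pairs described in the text.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU_PAIRS = {("G", "U"), ("U", "G")}


def is_complementary(strand_a: str, strand_b: str, allow_gu: bool = True) -> bool:
    """Return True if every position pairs when the strands are aligned antiparallel.

    Both strands are given 5'->3'; `strand_b` is reversed to align it with `strand_a`.
    """
    if len(strand_a) != len(strand_b):
        return False
    allowed = (WATSON_CRICK | GU_PAIRS) if allow_gu else WATSON_CRICK
    pairs = zip(strand_a.upper(), reversed(strand_b.upper()))
    return all(pair in allowed for pair in pairs)


# Made-up RNA segments: the second position forms a G/U pair.
print(is_complementary("GGAU", "AUUC", allow_gu=True))
print(is_complementary("GGAU", "AUUC", allow_gu=False))
```

With `allow_gu` enabled the G/U position counts as complementary, consistent with the definition above; disabling it shows how the same duplex would be scored under strict Watson-Crick pairing.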
Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T. Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (1989), particularly Chapter 11 and Table 11.1 therein; and Sambrook, J. and Russell, W., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001). It is well-known that the conditions of temperature and ionic strength determine the “stringency” of the hybridization.
Hybridization requires that two nucleic acids contain complementary sequences, although mismatches between bases are possible. The conditions appropriate for hybridization between two nucleic acids depend on the length of the nucleic acids and the degree of complementarity, variables well-known in the art. The greater the degree of complementarity between two nucleotide sequences, the greater the value of the melting temperature (Tm) for hybrids of nucleic acids having those sequences. For hybridizations between nucleic acids with short stretches of complementarity (e.g., complementarity over 35 or less, 30 or less, 25 or less, 22 or less, 20 or less, or 18 or less nucleotides) the position of mismatches becomes important (see Sambrook et al., supra, 11.7-11.8). Typically, the length for a hybridizable nucleic acid is at least about 10 nucleotides. Illustrative minimum lengths for a hybridizable nucleic acid are: at least about 15 nucleotides; at least about 20 nucleotides; at least about 22 nucleotides; at least about 25 nucleotides; and at least about 30 nucleotides. Furthermore, the skilled artisan will recognize that the temperature and wash solution salt concentration may be adjusted as necessary according to factors such as length of the region of complementarity and the degree of complementarity.
It is understood in the art that the sequence of a polynucleotide need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable or hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a loop structure or hairpin structure). A polynucleotide can comprise at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity to a target region within a target nucleic acid sequence to which it is targeted. For example, an antisense nucleic acid in which 18 of 20 nucleotides of the antisense compound are complementary to a target region, and which would therefore specifically hybridize, would represent 90 percent complementarity. In this example, the remaining non-complementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides. Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined routinely using BLAST programs (basic local alignment search tools) and PowerBLAST programs known in the art (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656) or by using the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison Wis.), using default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489).
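The 18-of-20 arithmetic in the preceding paragraph can be sketched as follows. This is a simplified, ungapped position-by-position count and is not a substitute for the BLAST or Gap programs cited above; the function name and both sequences are hypothetical, and only Watson-Crick pairs are counted (an assumption made for this DNA/RNA example).

```python
# Watson-Crick pairs only; G/U pairs are deliberately not counted in this sketch.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(oligo: str, target_region: str) -> float:
    """Percent of oligo positions that pair with an equal-length target region.

    Both sequences are written 5'->3' and are compared antiparallel,
    without gaps.
    """
    if not oligo or len(oligo) != len(target_region):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        (a, b) in PAIRS
        for a, b in zip(oligo.upper(), reversed(target_region.upper()))
    )
    return 100.0 * paired / len(oligo)


# Hypothetical 20-nucleotide target region and an antisense RNA carrying
# two mismatches (positions 1 and 11 of the antisense strand).
target = "ATGCATGCATGCATGCATGC"
antisense = "ACAUGCAUGCCUGCAUGCAU"
print(percent_complementarity(antisense, target))  # 90.0
```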
The terms “peptide,” “polypeptide,” and “protein” are used interchangeably herein, and refer to a polymeric form of amino acids of any length, which can include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.
“Binding” as used herein (e.g., with reference to an RNA-binding domain of a polypeptide) refers to a non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid). While in a state of non-covalent interaction, the macromolecules are said to be “associated” or “interacting” or “binding” (e.g., when a molecule X is said to interact with a molecule Y, it is meant the molecule X binds to molecule Y in a non-covalent manner). Not all components of a binding interaction need be sequence-specific (e.g., contacts with phosphate residues in a DNA backbone), but some portions of a binding interaction may be sequence-specific. Binding interactions are generally characterized by a dissociation constant (Kd) of less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, less than 10⁻⁹ M, less than 10⁻¹⁰ M, less than 10⁻¹¹ M, less than 10⁻¹² M, less than 10⁻¹³ M, less than 10⁻¹⁴ M, or less than 10⁻¹⁵ M. “Affinity” refers to the strength of binding, increased binding affinity being correlated with a lower Kd. By “binding domain” it is meant a protein domain that is able to bind non-covalently to another molecule. A binding domain can bind to, for example, a DNA molecule (a DNA-binding protein), an RNA molecule (an RNA-binding protein) and/or a protein molecule (a protein-binding protein). In the case of a protein domain-binding protein, it can bind to itself (to form homodimers, homotrimers, etc.), and/or it can bind to one or more molecules of a different protein or proteins.
The term “conservative amino acid substitution” refers to the interchangeability in proteins of amino acid residues having similar side chains. For example, a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide-containing side chains consists of asparagine and glutamine; a group of amino acids having aromatic side chains consists of phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains consists of lysine, arginine, and histidine; a group of amino acids having acidic side chains consists of glutamate and aspartate; and a group of amino acids having sulfur-containing side chains consists of cysteine and methionine. Exemplary conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.
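By way of illustration only, the exemplary conservative amino acid substitution groups listed above can be encoded as a simple lookup. The one-letter residue codes, the function name, and the rule that two different residues are “conservative” when they share at least one exemplary group are assumptions of this sketch rather than a definition taken from the disclosure.

```python
# The exemplary substitution groups from the text, in one-letter codes:
# valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine,
# alanine-valine, asparagine-glutamine.
EXEMPLARY_GROUPS = [
    {"V", "L", "I"},
    {"F", "Y"},
    {"K", "R"},
    {"A", "V"},
    {"N", "Q"},
]


def is_conservative_substitution(original: str, replacement: str) -> bool:
    """True if two different residues share at least one exemplary group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # no substitution took place
    return any(original in g and replacement in g for g in EXEMPLARY_GROUPS)


print(is_conservative_substitution("V", "L"))  # True
print(is_conservative_substitution("A", "L"))  # False (no shared exemplary group)
```

A broader variant could instead use the side-chain classes (aliphatic, aromatic, basic, and so on) enumerated in the same paragraph; which grouping is appropriate depends on the context.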
A polynucleotide or polypeptide has a certain percent “sequence identity” to another polynucleotide or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences. Sequence identity can be determined in a number of different ways. To determine sequence identity, sequences can be aligned using various methods and computer programs (e.g., BLAST, T-COFFEE, MUSCLE, MAFFT, etc.), available over the world wide web at sites including ncbi.nlm.nih.gov/BLAST, ebi.ac.uk/Tools/msa/tcoffee/, ebi.ac.uk/Tools/msa/muscle/, mafft.cbrc.jp/alignment/software/. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10.
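Once two sequences have been aligned by one of the programs named above, percent sequence identity reduces to counting identical aligned positions. The following sketch assumes the alignment is supplied as two equal-length strings with `-` marking gaps, and that gap columns count toward the denominator; alignment programs differ in how they treat gaps, so the denominator chosen here is an assumption, as are the function name and the example alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Columns containing a gap are never counted as matches, but they are
    included in the denominator (an assumption of this sketch).
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(
        x == y and x != gap
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical pairwise alignment: 10 columns, one mismatch, one gap.
print(percent_identity("ACGTAC-TGA", "ACGTTCATGA"))  # 80.0
```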
A DNA sequence that “encodes” a particular RNA is a DNA nucleic acid sequence that is transcribed into RNA. A DNA polynucleotide may encode an RNA (mRNA) that is translated into protein, or a DNA polynucleotide may encode an RNA that is not translated into protein (e.g. tRNA, rRNA, or a guide RNA; also called “non-coding” RNA or “ncRNA”). A “protein coding sequence” or a sequence that encodes a particular protein or polypeptide, is a nucleic acid sequence that is transcribed into mRNA (in the case of DNA) and is translated (in the case of mRNA) into a polypeptide in vitro or in vivo when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a start codon at the 5′ terminus (N-terminus) and a translation stop nonsense codon at the 3′ terminus (C-terminus). A coding sequence can include, but is not limited to, cDNA from prokaryotic or eukaryotic mRNA, genomic DNA sequences from prokaryotic or eukaryotic DNA, and synthetic nucleic acids. A transcription termination sequence will usually be located 3′ to the coding sequence.
As used herein, a “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a downstream (3′ direction) coding or non-coding sequence. For purposes of defining the present invention, the promoter sequence is bounded at its 3′ terminus by the transcription initiation site and extends upstream (5′ direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site, as well as protein binding domains responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain “TATA” boxes and “CAT” boxes. Various promoters, including inducible promoters, the T7 promoter, etc., may be used to drive the various vectors of the present invention.
A promoter can be a constitutively active promoter (i.e., a promoter that is constitutively in an active/“ON” state), an inducible promoter (i.e., a promoter whose state, active/“ON” or inactive/“OFF”, is controlled by an external stimulus, e.g., the presence of a particular temperature, compound, or protein), a spatially restricted promoter (i.e., transcriptional control element, enhancer, etc.; e.g., a tissue specific promoter, a cell type specific promoter, etc.), and/or a temporally restricted promoter (i.e., the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process). Suitable promoters can be derived from viruses and can therefore be referred to as viral promoters, or they can be derived from any organism, including prokaryotic or eukaryotic organisms. Suitable promoters can be used to drive expression by any RNA polymerase (e.g., Pol I, Pol II, or Pol III). Exemplary promoters include, but are not limited to, the SV40 early promoter; mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter; a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE); a Rous sarcoma virus (RSV) promoter; a human U6 small nuclear promoter (U6; Miyagishi et al., Nature Biotechnology 20: 497-500, 2002); an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 31(17), 2003); a human H1 promoter (H1); and the like.
Examples of inducible promoters include, but are not limited to, T7 RNA polymerase promoter, T3 RNA polymerase promoter, Isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, lactose-induced promoter, heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, and the like. Inducible promoters can therefore be regulated by molecules including, but not limited to, doxycycline; RNA polymerase, e.g., T7 RNA polymerase; an estrogen receptor; an estrogen receptor fusion; etc.
In some embodiments, the promoter is a spatially restricted promoter (i.e., cell type specific promoter, tissue specific promoter, etc.) such that in a multi-cellular organism, the promoter is active (i.e., “ON”) in a subset of specific cells. Spatially restricted promoters may also be referred to as enhancers, transcriptional control elements, control sequences, etc. Any convenient spatially restricted promoter may be used and the choice of suitable promoter (e.g., a brain specific promoter, a promoter that drives expression in a subset of neurons, a promoter that drives expression in the germline, a promoter that drives expression in the lungs, a promoter that drives expression in muscles, a promoter that drives expression in islet cells of the pancreas, etc.) will depend on several factors such as the organism. For example, various spatially restricted promoters are known for plants, flies, worms, mammals, mice, etc. Thus, a spatially restricted promoter can be used to regulate the expression of a nucleic acid encoding a site-directed modifying polypeptide in a wide variety of different tissues and cell types, depending on the organism. Some spatially restricted promoters are also temporally restricted such that the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process. Many examples of spatially restricted promoters are known and include, without limitation: neuron-specific promoters, adipocyte-specific promoters, cardiomyocyte-specific promoters, smooth muscle-specific promoters, and photoreceptor-specific promoters.
The terms “DNA regulatory sequences,” “control elements,” and “regulatory elements,” used interchangeably herein, refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate transcription of a non-coding sequence (e.g., a guide RNA) or a coding sequence (e.g., a site-directed modifying polypeptide, or a Cas9 polypeptide) and/or regulate translation of an encoded polypeptide.
The term “naturally-occurring” or “unmodified” as used herein as applied to a nucleic acid, a polypeptide, a cell, or an organism, refers to a nucleic acid, polypeptide, cell, or organism that is found in nature. For example, a polypeptide or polynucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in nature and which has not been intentionally modified by a human in the laboratory is naturally occurring.
“Recombinant,” as used herein, means that a particular nucleic acid (DNA or RNA) or vector is the product of various combinations of cloning, restriction, polymerase chain reaction (PCR) and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems. DNA sequences encoding polypeptides can be assembled from cDNA fragments or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid which is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system. Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit. Sequences of non-translated DNA may be present 5′ or 3′ from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions, and may indeed act to modulate production of a desired product by various mechanisms. Alternatively, DNA sequences encoding RNA (e.g., guide RNA) that is not translated may also be considered recombinant. Thus, e.g., the term “recombinant” nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention. This artificial combination is often accomplished by either chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. Such is usually done to replace a codon with a codon encoding the same amino acid, a conservative amino acid, or a non-conservative amino acid. Alternatively, it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions. 
When a recombinant polynucleotide encodes a polypeptide, the sequence of the encoded polypeptide can be naturally occurring (“wild type”) or can be a variant (e.g., a mutant) of the naturally occurring sequence.
Thus, the term “recombinant” polypeptide does not necessarily refer to a polypeptide whose sequence does not naturally occur. Instead, a “recombinant” polypeptide is encoded by a recombinant DNA sequence, but the sequence of the polypeptide can be naturally occurring (“wild type”) or non-naturally occurring (e.g., a variant, a mutant, etc.). Thus, a “recombinant” polypeptide is the result of human intervention, but may be a naturally occurring amino acid sequence.
A “vector” or “expression vector” is a replicon, such as plasmid, phage, virus, or cosmid, to which another DNA segment, i.e. an “insert”, may be attached so as to bring about the replication of the attached segment in a cell.
An “expression cassette” comprises a DNA coding sequence operably linked to a promoter. “Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For instance, a promoter is operably linked to a coding sequence if the promoter affects its transcription or expression. The terms “recombinant expression vector,” or “DNA construct” are used interchangeably herein to refer to a DNA molecule comprising a vector and at least one insert. Recombinant expression vectors are usually generated for the purpose of expressing and/or propagating the insert(s), or for the construction of other recombinant nucleotide sequences. The nucleic acid(s) may or may not be operably linked to a promoter sequence and may or may not be operably linked to DNA regulatory sequences.
A cell has been “transformed” or “transfected” by exogenous DNA, e.g., a recombinant expression vector, when such DNA has been introduced inside the cell. The presence of the exogenous DNA results in permanent or transient genetic change. The transforming DNA may or may not be integrated (covalently linked) into the genome of the cell.
In prokaryotes, yeast, and mammalian cells for example, the transforming DNA may be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones that comprise a population of daughter cells containing the transforming DNA. A “clone” is a population of cells derived from a single cell or common ancestor by mitosis. A “cell line” is a clone of a primary cell that is capable of stable growth in vitro for many generations.
Suitable methods of transformation include, e.g., viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery, and the like. The choice of method of transformation is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (e.g., in vitro, ex vivo, or in vivo). A general discussion of these methods can be found in Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995.
A “host cell,” as used herein, denotes an in vivo or in vitro eukaryotic cell, a prokaryotic cell (e.g., bacterial or archaeal cell), or a cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which eukaryotic or prokaryotic cells can be, or have been, used as recipients for a nucleic acid, and include the progeny of the original cell which has been transformed by the nucleic acid. It is understood that the progeny of a single cell may not necessarily be completely identical in morphology or in genomic or total DNA complement as the original parent, due to natural, accidental, or deliberate mutation. A “recombinant host cell” (also referred to as a “genetically modified host cell”) is a host cell into which has been introduced a heterologous nucleic acid, e.g., an expression vector. For example, a bacterial host cell is a genetically modified bacterial host cell by virtue of introduction into a suitable bacterial host cell of an exogenous nucleic acid (e.g., a plasmid or recombinant expression vector) and a eukaryotic host cell is a genetically modified eukaryotic host cell (e.g., a mammalian germ cell), by virtue of introduction into a suitable eukaryotic host cell of an exogenous nucleic acid.
A “target DNA” as used herein is a DNA polynucleotide that comprises a “target site” or “target sequence.” The terms “target site,” “target sequence,” “target protospacer DNA,” or “protospacer-like sequence” are used interchangeably herein to refer to a nucleic acid sequence present in a target DNA to which a DNA-targeting segment of a guide RNA will bind, provided sufficient conditions for binding exist. For example, the target site (or target sequence) 5′-GAGCATATC-3′ within a target DNA is targeted by (or is bound by, or hybridizes with, or is complementary to) the RNA sequence 5′-GAUAUGCUC-3′. Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell. Other suitable DNA/RNA binding conditions (e.g., conditions in a cell-free system) are known in the art; see, e.g., Sambrook, supra. The strand of the target DNA that is complementary to and hybridizes with the guide RNA is referred to as the “complementary strand” and the strand of the target DNA that is complementary to the “complementary strand” (and is therefore not complementary to the guide RNA) is referred to as the “noncomplementary strand” or “non-complementary strand.” By “site-directed modifying polypeptide” or “RNA-binding site-directed polypeptide” or “RNA-binding site-directed modifying polypeptide” or “site-directed polypeptide” is meant a polypeptide that binds RNA and is targeted to a specific DNA sequence. A site-directed modifying polypeptide as described herein is targeted to a specific DNA sequence by the RNA molecule to which it is bound. The RNA molecule comprises a sequence that binds, hybridizes to, or is complementary to a target sequence within the target DNA, thus targeting the bound polypeptide to a specific location within the target DNA (the target sequence).
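The relationship between the exemplary target site 5′-GAGCATATC-3′ and the RNA sequence 5′-GAUAUGCUC-3′ given above is that of a reverse complement written in RNA bases. A minimal sketch follows; the function name is hypothetical and the code is offered only to make the example checkable.

```python
# DNA base -> complementary RNA base.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def targeting_rna_for(target_site: str) -> str:
    """Return the RNA (5'->3') complementary to a DNA target site (5'->3').

    The site is read from its 3' end so that the resulting RNA is
    antiparallel to the DNA strand it hybridizes with.
    """
    return "".join(
        DNA_TO_RNA_COMPLEMENT[base] for base in reversed(target_site.upper())
    )


print(targeting_rna_for("GAGCATATC"))  # GAUAUGCUC
```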
By “cleavage” is meant the breakage of the covalent backbone of a DNA molecule. Cleavage can be initiated by a variety of methods including, but not limited to, enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded cleavage and double-stranded cleavage are possible, and double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events. DNA cleavage can result in the production of either blunt ends or staggered ends. In certain embodiments, a complex comprising a guide RNA and a site-directed modifying polypeptide is used for targeted double-stranded DNA cleavage.
“Nuclease” and “endonuclease” are used interchangeably herein to mean an enzyme which possesses endonucleolytic catalytic activity for DNA cleavage.
By “cleavage domain” or “active domain” or “nuclease domain” of a nuclease is meant the polypeptide sequence or domain within the nuclease which possesses the catalytic activity for DNA cleavage. A cleavage domain can be contained in a single polypeptide chain or cleavage activity can result from the association of two (or more) polypeptides. A single nuclease domain may consist of more than one isolated stretch of amino acids within a given polypeptide.
By “site-directed polypeptide” or “RNA-binding site-directed polypeptide” is meant a polypeptide that binds RNA and is targeted to a specific DNA sequence. A site-directed polypeptide as described herein is targeted to a specific DNA sequence by the RNA molecule to which it is bound. The RNA molecule comprises a sequence that is complementary to a target sequence within the target DNA, thus targeting the bound polypeptide to a specific location within the target DNA (the target sequence).
The RNA molecule that binds to the site-directed modifying polypeptide and targets the polypeptide to a specific location within a target DNA is referred to herein as a “guide RNA” or “guide RNA polynucleotide” (also referred to herein as a “gRNA”). A guide RNA typically comprises two segments, a “DNA-targeting segment” and a “protein-binding segment.” By “segment” is meant a segment, section, or region of a molecule, e.g., a contiguous stretch of nucleotides in an RNA. A segment can also mean a region or section of a complex such that a segment may comprise regions of more than one molecule. For example, in some cases the protein-binding segment (described below) of a guide RNA is one RNA molecule and the protein-binding segment therefore comprises a region of that RNA molecule. In other cases, the protein-binding segment (described below) of a guide RNA comprises two separate molecules that are hybridized along a region of complementarity. As an illustrative, non-limiting example, a protein-binding segment of a guide RNA that comprises two separate molecules can comprise (i) base pairs 40-75 of a first RNA molecule that is 100 base pairs in length; and (ii) base pairs 10-25 of a second RNA molecule that is 50 base pairs in length. The definition of “segment,” unless otherwise specifically defined in a particular context, is not limited to a specific number of total base pairs, is not limited to any particular number of base pairs from a given RNA molecule, is not limited to a particular number of separate molecules within a complex, and may include regions of RNA molecules that are of any total length and may or may not include regions with complementarity to other molecules.
The DNA-targeting segment (or “DNA-targeting sequence”) comprises a nucleotide sequence that is complementary to a specific sequence within a target DNA sequence or target genomic DNA (gDNA) region (the complementary strand of the target DNA) designated the “protospacer-like” sequence herein. The protein-binding segment (or “protein-binding sequence”) interacts with a site-directed modifying polypeptide. When the site-directed modifying polypeptide is a Cas9 or Cas9 related polypeptide (described in more detail below), site-specific cleavage of the target DNA occurs at locations determined by both (i) base-pairing complementarity between the guide RNA and the target DNA; and (ii) a short motif (referred to as the protospacer adjacent motif (PAM)) in the target DNA sequence or target gDNA.
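The two determinants of cleavage location named above, base-pairing with the guide RNA and an adjacent PAM, suggest a simple way to enumerate candidate sites on a DNA strand. The sketch below scans one strand (written 5′ to 3′) for stretches that are immediately followed on their 3′ side by a PAM. The PAM pattern `NGG`, the 20-nucleotide protospacer length, and all names are assumptions chosen for illustration; the PAM actually recognized depends on the particular Cas9 polypeptide, and the opposite strand is not scanned here.

```python
# Subset of IUPAC nucleotide codes sufficient for simple PAM patterns.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
}


def matches_pam(site: str, pam: str) -> bool:
    """True if each base of site is permitted by the PAM pattern at that position."""
    return len(site) == len(pam) and all(b in IUPAC[p] for b, p in zip(site, pam))


def find_candidate_sites(strand: str, pam: str = "NGG", protospacer_len: int = 20):
    """List (start, protospacer, pam_found) for every protospacer-length window
    that is immediately followed, on its 3' side, by a sequence matching pam."""
    strand = strand.upper()
    hits = []
    last_start = len(strand) - protospacer_len - len(pam)
    for start in range(last_start + 1):
        pam_start = start + protospacer_len
        found = strand[pam_start:pam_start + len(pam)]
        if matches_pam(found, pam):
            hits.append((start, strand[start:pam_start], found))
    return hits


# Hypothetical strand: 2 flanking bases, a 20-nt protospacer, TGG, 2 flanking bases.
demo = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
print(find_candidate_sites(demo))  # [(2, 'ACGTACGTACGTACGTACGT', 'TGG')]
```

A DNA-targeting segment for a reported site would then be chosen to be complementary to the complementary strand, i.e., to match the reported protospacer with U in place of T.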
As used herein, the terms “target genomic DNA (gDNA)”, “target genomic DNA (gDNA) region”, and “gDNA” are used interchangeably to refer to a target DNA sequence present in the genome of a cell, i.e., a chromosomal target DNA sequence. In some cases, where it is clear that the target DNA sequence is present in the genome of a cell, the terms “target DNA sequence” and “target genomic DNA (gDNA) region” and “gDNA” may be used interchangeably.
The protein-binding segment of a guide RNA comprises, in part, two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex).
In some embodiments, a nucleic acid (e.g., a guide RNA, a nucleic acid comprising a nucleotide sequence encoding a guide RNA; a nucleic acid encoding a site-directed polypeptide; etc.) comprises a modification or sequence that provides for an additional desirable feature (e.g., modified or regulated stability; subcellular targeting; tracking, e.g., a fluorescent label; a binding site for a protein or protein complex; etc.). Non-limiting examples include: a 5′ cap (e.g., a 7-methylguanylate cap (m7G)); a 3′ polyadenylated tail (i.e., a 3′ poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and/or protein complexes); a stability control sequence; a sequence that forms a dsRNA duplex (i.e., a hairpin); a modification or sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and the like); and combinations thereof.
In some embodiments, a guide RNA comprises an additional segment at either the 5′ or 3′ end that provides for any of the features described above. For example, a suitable third segment can comprise a 5′ cap (e.g., a 7-methylguanylate cap (m7G)); a 3′ polyadenylated tail (i.e., a 3′ poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and protein complexes); a stability control sequence; a sequence that forms a dsRNA duplex (i.e., a hairpin); a sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and the like); and combinations thereof.
A guide RNA and a site-directed modifying polypeptide (i.e., site-directed polypeptide) form a complex (i.e., bind via non-covalent interactions). The guide RNA provides target specificity to the complex by comprising a nucleotide sequence that is complementary to a sequence of a target DNA. The site-directed modifying polypeptide of the complex provides the site-specific activity. In other words, the site-directed modifying polypeptide is guided to a target DNA sequence (e.g. a target sequence in a chromosomal nucleic acid, e.g., a genome; a target sequence in an extrachromosomal nucleic acid, e.g. an episomal nucleic acid, a minicircle, etc.; a target sequence in a mitochondrial nucleic acid; a target sequence in a chloroplast nucleic acid; a target sequence in a plasmid; etc.) by virtue of its association with the protein-binding segment of the guide RNA.
In some embodiments, a guide RNA comprises two separate RNA molecules (RNA polynucleotides: a “crRNA” and a “tracrRNA”, see below) and may be referred to herein as a “double-molecule guide RNA” or a “two-molecule guide RNA.” In other embodiments, the guide RNA is a single RNA molecule (single RNA polynucleotide) and may be referred to herein as a “single-molecule guide RNA,” a “single-guide RNA,” or an “sgRNA.” The term “guide RNA” or “gRNA” is inclusive, referring both to double-molecule guide RNAs and to single-molecule guide RNAs (i.e., sgRNAs).
An exemplary double-molecule guide RNA comprises a CRISPR RNA (crRNA or crRNA-like) molecule, which includes a CRISPR repeat or CRISPR repeat-like sequence, and a corresponding trans-activating crRNA (tracrRNA or tracrRNA-like) molecule. A crRNA molecule comprises both the DNA-targeting segment (single stranded) of the guide RNA and a stretch (a duplex-forming segment) of nucleotides that forms one half of the dsRNA duplex of the protein-binding segment of the guide RNA. The corresponding tracrRNA molecule comprises a stretch of nucleotides (a duplex-forming segment) that forms the other half of the dsRNA duplex of the protein-binding segment of the guide RNA. In other words, a stretch of nucleotides of the crRNA molecule is complementary to and hybridizes with a stretch of nucleotides of the tracrRNA molecule to form the dsRNA duplex of the protein-binding segment of the guide RNA. As such, each crRNA molecule can be said to have a corresponding tracrRNA molecule. The crRNA molecule additionally provides the single stranded DNA-targeting segment. Thus, a crRNA and a tracrRNA molecule (as a corresponding pair) hybridize to form a guide RNA. A double-molecule guide RNA can comprise any corresponding crRNA and tracrRNA pair.
A single-molecule guide RNA comprises two stretches of nucleotides (a crRNA and a tracrRNA) that are complementary to one another, are covalently linked (directly, or by intervening nucleotides), and hybridize to form the double stranded RNA duplex (dsRNA duplex) of the protein-binding segment, thus resulting in a stem-loop structure. The crRNA and the tracrRNA can be covalently linked via the 3′ end of the crRNA and the 5′ end of the tracrRNA. Alternatively, crRNA and the tracrRNA can be covalently linked via the 5′ end of the crRNA and the 3′ end of the tracrRNA.
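The two linkage orientations described above can be illustrated at the level of sequence strings. In the sketch below, the function name, the `crrna_first` flag, the `GAAA` linker, and the placeholder sequences are all hypothetical; the linker may equally be empty (direct linkage) or any other intervening nucleotides.

```python
def link_guide(crrna: str, tracrrna: str, linker: str = "GAAA",
               crrna_first: bool = True) -> str:
    """Join a crRNA and a tracrRNA (both written 5'->3') into one RNA string.

    crrna_first=True  : 3' end of the crRNA is joined to the 5' end of the tracrRNA.
    crrna_first=False : 3' end of the tracrRNA is joined to the 5' end of the crRNA.
    """
    first, second = (crrna, tracrrna) if crrna_first else (tracrrna, crrna)
    return first + linker + second


# Placeholder sequences, not taken from any real guide RNA.
print(link_guide("GGCC", "GGCC"))                     # GGCCGAAAGGCC
print(link_guide("AAAA", "UUUU", crrna_first=False))  # UUUUGAAAAAAA
```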
The term “stem cell” is used herein to refer to a cell (e.g., a vertebrate stem cell) that has the ability both to self-renew and to generate a differentiated cell type (see Morrison et al., Cell 88:287-298, 1997). In the context of cell ontogeny, the adjective “differentiated”, or “differentiating” is a relative term. A “differentiated cell” is a cell that has progressed further down the developmental pathway than the cell it is being compared with. Thus, pluripotent stem cells can differentiate into lineage-restricted progenitor cells (e.g., mesodermal stem cells), which in turn can differentiate into cells that are further restricted (e.g., neuron progenitors), which can differentiate into end-stage cells (i.e., terminally differentiated cells, e.g., neurons, cardiomyocytes, etc.), which play a characteristic role in a certain tissue type, and may or may not retain the capacity to proliferate further. Stem cells may be characterized by both the presence of specific markers (e.g., proteins, RNAs, etc.) and the absence of specific markers. Stem cells may also be identified by functional assays both in vitro and in vivo, particularly assays relating to the ability of stem cells to give rise to multiple differentiated progeny.
Stem cells of interest include pluripotent stem cells (PSCs). The term “pluripotent stem cell” or “PSC” is used herein to mean a stem cell capable of producing all cell types of the organism. Therefore, a PSC can give rise to cells of all germ layers of the organism (e.g., the endoderm, mesoderm, and ectoderm of a vertebrate). Pluripotent cells are capable of forming teratomas and of contributing to ectoderm, mesoderm, or endoderm tissues in a living organism.
PSCs of animals can be derived in a number of different ways. For example, embryonic stem cells (ESCs) are derived from the inner cell mass of an embryo (Thomson et al., Science 282: 5391, 1998) whereas induced pluripotent stem cells (iPSCs) are derived from somatic cells (Takahashi et al., Cell 131(5):861-72, 2007; Yu et al., Science 318(5858):1917-20, 2007). Because the term PSC refers to pluripotent stem cells regardless of their derivation, the term PSC encompasses the terms ESC and iPSC, as well as the term embryonic germ stem cells (EGSC), which are another example of a PSC. PSCs may be in the form of an established cell line, they may be obtained directly from primary embryonic tissue, or they may be derived from a somatic cell. PSCs can be target cells of the methods described herein.
By “embryonic stem cell” (ESC) is meant a PSC that was isolated from an embryo, typically from the inner cell mass of the blastocyst. ESC lines are listed in the NIH Human Embryonic Stem Cell Registry and many such lines are known. Stem cells of interest also include embryonic stem cells from other primates, such as Rhesus stem cells and marmoset stem cells. It should be understood that stem cells may be obtained from any mammalian species, e.g., human, equine, bovine, porcine, canine, feline, rodent, e.g. mice, rats, hamsters, primates, etc. In culture, ESCs typically grow as flat colonies with large nucleo-cytoplasmic ratios, defined borders and prominent nucleoli. Examples of methods of generating and characterizing ESCs may be found in, for example, U.S. Pat. Nos. 7,029,913, 5,843,780, and 6,200,806, the disclosures of which are incorporated herein by reference. Methods for proliferating hESCs in undifferentiated form are described in WO 99/20741, WO 01/51616, and WO 03/020920. By “embryonic germ stem cell” (EGSC) or “embryonic germ cell” or “EG cell” is meant a PSC that is derived from germ cells and/or germ cell progenitors, e.g. primordial germ cells, i.e., those that would become sperm and eggs. Embryonic germ cells (EG cells) are thought to have properties similar to embryonic stem cells as described above.
By “induced pluripotent stem cell” or “iPSC” is meant a PSC that is derived from a cell that is not a PSC (i.e., from a cell that is differentiated relative to a PSC). iPSCs can be derived from multiple different cell types, including terminally differentiated cells. iPSCs have an ES cell-like morphology, growing as flat colonies with large nucleo-cytoplasmic ratios, defined borders and prominent nuclei. In addition, iPSCs express one or more key pluripotency markers known by one of ordinary skill in the art, including but not limited to Alkaline Phosphatase, SSEA3, SSEA4, Sox2, Oct3/4, Nanog, TRA160, TRA181, TDGF 1, Dnmt3b, FoxD3, GDF3, Cyp26al, TERT, and zfp42. Examples of methods of generating and characterizing iPSCs may be found in, for example, U.S. Patent Application Publication Nos. US20090047263, US20090068742, US20090191159, US20090227032, US20090246875, and US20090304646, the disclosures of which are incorporated herein by reference. Generally, to generate iPSCs, somatic cells are provided with reprogramming factors (e.g. Oct4, SOX2, KLF4, MYC, Nanog, Lin28, etc.) known in the art to reprogram the somatic cells to become pluripotent stem cells.
By “somatic cell” is meant any cell in an organism that, in the absence of experimental manipulation, does not ordinarily give rise to all types of cells in an organism. In other words, somatic cells are cells that have differentiated sufficiently that they will not naturally generate cells of all three germ layers of the body, i.e., ectoderm, mesoderm and endoderm. For example, somatic cells would include both neurons and neural progenitors, the latter of which may be able to naturally give rise to all or some cell types of the central nervous system but cannot give rise to cells of the mesoderm or endoderm lineages.
By “mitotic cell” it is meant a cell undergoing mitosis. Mitosis is the process by which a eukaryotic cell separates the chromosomes in its nucleus into two identical sets in two separate nuclei. It is generally followed immediately by cytokinesis, which divides the nuclei, cytoplasm, organelles and cell membrane into two cells containing roughly equal shares of these cellular components. By “post-mitotic cell” is meant a cell that has exited from mitosis, i.e., is “quiescent”, i.e. no longer undergoing divisions. This quiescent state may be temporary, i.e. reversible, or it may be permanent. Similarly, by “meiotic cell” is meant a cell that is undergoing meiosis. Meiosis is the process by which a cell divides its nuclear material for the purpose of producing gametes or spores. Unlike mitosis, in meiosis, the chromosomes undergo a recombination step which shuffles genetic material between chromosomes. Additionally, the outcome of meiosis is four (genetically unique) haploid cells, as compared with the two (genetically identical) diploid cells produced from mitosis.
By “recombination” is meant a process of exchange of genetic information between two polynucleotides. As used herein, “homology-directed repair (HDR)” refers to the specialized form of DNA repair that takes place, for example, during repair of double-strand breaks in cells. This process requires nucleotide sequence homology, uses a “donor” molecule as a template for repair of a “target” molecule (i.e., the one that experienced the double-strand break), and leads to the transfer of genetic information from the donor to the target. The donor is also referred to herein as the “donor template” and the “repair template.” Homology-directed repair may result in an alteration of the sequence of the target molecule (e.g., insertion, deletion, mutation), if the donor polynucleotide differs from the target molecule and part or all of the sequence of the donor polynucleotide is incorporated into the target DNA. In some embodiments, the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target DNA.
By “non-homologous end joining (NHEJ)” is meant the repair of double-strand breaks in DNA by direct ligation of the break ends to one another without the need for a homologous template (in contrast to homology-directed repair, which requires a homologous sequence to guide repair). NHEJ often results in the loss (deletion) of nucleotide sequence(s) near the site of the double-strand break.
The terms “treatment”, “treating” and the like are used herein to generally mean obtaining a desired pharmacologic and/or physiologic effect. The effect may be prophylactic in terms of completely or partially preventing a disease or symptom thereof and/or may be therapeutic in terms of a partial or complete cure for a disease and/or adverse effect attributable to the disease. “Treatment” as used herein covers any treatment of a disease or symptom in a mammal, and includes: (a) preventing the disease or symptom from occurring in a subject which may be predisposed to acquiring the disease or symptom but has not yet been diagnosed as having it; (b) inhibiting the disease or symptom, i.e., arresting its development; or (c) relieving the disease, i.e., causing regression of the disease. The therapeutic agent may be administered before, during or after the onset of disease or injury. The treatment of ongoing disease, where the treatment stabilizes or reduces the undesirable clinical symptoms of the patient, is of particular interest. Such treatment is desirably performed prior to complete loss of function in the affected tissues. The therapy will desirably be administered during the symptomatic stage of the disease, and in some cases after the symptomatic stage of the disease.
The terms “individual,” “subject,” “host,” and “patient,” are used interchangeably herein and refer to any mammalian subject for whom diagnosis, treatment, or therapy is desired, particularly humans.
General methods in molecular and cellular biochemistry can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Cold Spring Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998), the disclosures of which are incorporated herein by reference.
Where a range of values is provided, it should be understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
The invention further provides kits for genomic DNA modification in a mammalian cell. Kits may comprise one or more guide RNA, repair template, and Cas9 protein (or nucleic acid encoding a Cas9 protein), and/or instructions for use. A kit may also include reagents, solvents, buffers, etc., required for carrying out the methods described herein. In some embodiments, a kit includes a ssODN as described herein for use as a repair template and one or more guide RNA, or one or more nucleic acid (such as a vector) encoding the ssODN and/or the guide RNA.
It should be appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.
The present invention will be more readily understood by referring to the following examples, which are provided to illustrate the invention and are not to be construed as limiting the scope thereof in any manner.
Unless defined otherwise or the context clearly dictates otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should be understood that any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the invention.
We demonstrate herein that the described CRISPR/Cas systems can be used for several gene editing-based therapeutic applications, such as but not limited to, the removal of extra hexanucleotide repeats within the C9ORF72 gene at chromosome 9 and correction of the H46R mutation in SOD1 gene that causes amyotrophic lateral sclerosis (ALS). In addition, we demonstrate that the aforementioned CRISPR/Cas system-based genetic manipulation strategy can also be used for inactivating the expression of CCR5, a co-receptor for CD4, which is required for viral entry into cells and implicated in human immunodeficiency virus (HIV)'s mode of infection. Other potential therapeutic applications are also illustrated.
It is noted that, among the different types of Cas enzymes, type-II Cas9 protein has been used as the CRISPR nuclease for gene editing-based therapeutic strategies described herein, however the particular Cas enzyme used is not meant to be limited. It should be expressly understood that any suitable Cas enzyme may be used in methods and systems provided herein.
General procedures for construction of CRISPR plasmids for use in the CRISPR/Cas system (e.g., with Cas9 from Streptococcus pyogenes) to make genomic modifications are described below. Genomic modifications that can be made using the CRISPR/Cas system provided herein include, without limitation: induction of a gene mutation; correction of a mutated sequence; deletion or insertion of a sequence; and gene repression or activation. Steps include, without limitation: identification of gRNA target sites; analysis of off-target activities; construction of CRISPR plasmids; and transfection of CRISPR components into cells (e.g., cell lines) of interest. CRISPR plasmids may be all-in-one plasmids or dual plasmids that express Cas9 proteins and gRNAs separately. CRISPR plasmids may be designed for single or multiplex gRNA expression. Non-limiting examples of such plasmids include an “All-in-one” CRISPR plasmid comprising a Promoter-NLS-Cas9n-NLS-2A-reporter-episomal sequence-gRNA cassette; a Cas9n-expressing plasmid comprising Promoter-NLS-Cas9n-NLS for co-transfection with gRNA-expressing plasmids comprising a Promoter-reporter-episomal sequence-gRNA cassette; T7 promoter-driven Cas9- and gRNA-expressing plasmids; and N-terminal 6×His-tagged T7 promoter-driven Cas9 protein-expressing prokaryotic vectors for synthesis of recombinant Cas9 protein (and similar plasmids for expressing Cas9n and dCas9). Table 3 and
For gRNA expression plasmids, the human U6 and H1 promoters can be used to directly drive transcription to produce gRNA with defined start and end points. Single gRNA expression plasmids constructed with either a human U6 promoter or a human H1 promoter and multiplex gRNA expression plasmids using both human U6 and H1 promoters are produced (
The following considerations are utilized in methods and systems described herein: i) For expression of Cas9, Cas9n, dCas9, and reporters, the human EF1a promoter and human growth hormone poly-A signal are used; ii) Cas9 proteins are tagged with a nuclear localization sequence/signal (NLS) for import into the cell nucleus via nuclear transport; iii) A small self-cleaving 2A peptide sequence is used to construct plasmids expressing multiple proteins from a single mRNA/open reading frame (ORF) (e.g., Cas9 proteins and reporters in the same mRNA); and iv) In addition to conventional plasmids, scaffold/matrix attachment regions (S/MAR) are used for authentic and efficient extrachromosomal (plasmid) replication in mammalian cells to stably express the transgene proteins without integration.
We now elaborate further on the methodology to construct the CRISPR plasmids and preparation of other CRISPR/Cas components for successful genome modification.
A. Single gRNA-Expressing Plasmid Construction Protocol.
First, a target sequence is selected. We design sense and anti-sense DNA oligonucleotide sequences towards the target DNA and upstream of the PAM sequence (5′-NGG-3′). “N” in the PAM sequence stands for any nucleotide (A, C, G, or T). The typical length of the target sequence is 20 bp (e.g., 5′-NNNNNNNNNNNNNNNNNNNNNGG-3′), although in some embodiments shorter or longer target sequences may be used. The PAM sequence (5′-NGG-3′) is shown in bold and underlined.
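By way of non-limiting illustration, the target selection step can be sketched in Python as a scan for 20-nucleotide sequences immediately followed by a 5′-NGG-3′ PAM on either strand. The function names and the default length are illustrative assumptions and are not part of the protocol itself; off-target analysis is not addressed by this sketch.

```python
import re

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_ngg_targets(seq: str, length: int = 20):
    """List candidate target sites that lie directly upstream of an NGG PAM.

    Returns tuples of (strand, start, target, pam). For the "-" strand,
    `start` is an index into the reverse-complemented sequence.
    """
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})([ACGT]GG))")
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits
```

Each returned target is the sequence from which the sense and anti-sense oligonucleotides would be designed; the PAM itself is not included in the oligonucleotides.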
Next, gRNA oligonucleotides are designed. Two 5′-phosphorylated DNA oligonucleotides are designed, as shown:
Next, the two phosphorylated DNA oligonucleotides are annealed together. Primers are diluted to 10 μM using Nuclease buffer or NTE buffer. NTE buffer contains 50 mM NaCl, 10 mM Tris pH 7.4, and 1 mM EDTA. The annealing reaction is prepared to generate a duplex as follows: 10 μL of 10 μM top strand oligo and 10 μL of 10 μM bottom strand oligo are mixed together in a total volume of 20 μL and incubated at 95° C. for 5 minutes. After the 5-minute incubation, the oligos are cooled from 95° C. to room temperature at a rate of 1° C./min.
Next, the oligo duplex is ligated into the CRISPR vector. Annealed oligos are cloned into a CRISPR plasmid (e.g., an All-in-one CRISPR plasmid) as follows: 1.0 μL linearized All-in-one CRISPR vector, 3.0 μL annealed oligo mix, 1.0 μL 5× Ligation buffer, and 0.25 μL T4 quick ligase are mixed together in a total volume of approximately 5 μL and incubated under standard conditions. The mix is then transformed into competent cells with appropriate antibiotic selection using standard methods. The constructed plasmids are confirmed using restriction analysis and DNA sequencing. The CRISPR/Cas system is transfected using a standard transfection protocol, e.g., by lipofection (such as with Lipofectamine LTX™ (Invitrogen)) or by electroporation (such as with 4D Nucleofector™ system (Lonza)), and a functional assay using the SURVEYOR™ mutation detection kit (#706020, IDT) is performed to validate the CRISPR-mediated genome editing in the target gDNA sequence. This assay uses enzymes that cleave heteroduplex DNA of an edited sequence and provides specific information on the mutation's location, orientation, and type.
Dual gRNA Expressing-Plasmid Construction.
First, a dual gRNA expression fragment is synthesized. Forward and reverse primers for generating the desired dual gRNA PCR amplicon are designed and made. Primers may also be procured from a commercial source, e.g., from Sigma Genosys. After the correct size of the amplicon is generated and gel-purified, it is then inserted into a suitable linearized vector (e.g., the All-in-one CRISPR vector).
Primers are designed as shown below:
Forward Primer:
5′-AGACACCTTGGATCCNNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAG-3′, where N denotes the gRNA1 sequence.
Reverse Primer:
5′-TTCTAGCTCTAAAACnnnnnnnnnnnnnnnnnnnnGTTTTAGAGCTAGAAATAGCAAG-3′, where n denotes the reverse complement sequence of gRNA2.
U6 gBlock is used for this amplicon generation, as follows: 15.0 μL PCR Master mix (2×), 3.0 μL 10 μM forward primer, 3.0 μL 10 μM reverse primer, 3.0 μL DMSO, 0.5 μL gBlock with human U6 promoter, and 5.5 μL Nuclease Free Water are mixed together in a total volume of 30.0 μL. A PCR reaction program is then run as shown in Table 4.
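By way of non-limiting illustration, the per-reaction volumes listed above can be tabulated and scaled to a master mix for several reactions. The 10% pipetting overage and the function name are assumptions made for this sketch only; the per-reaction volumes are those recited above.

```python
# Per-reaction volumes (in microliters) as recited in the protocol above.
PCR_MIX_UL = {
    "PCR master mix (2x)": 15.0,
    "10 uM forward primer": 3.0,
    "10 uM reverse primer": 3.0,
    "DMSO": 3.0,
    "gBlock with human U6 promoter": 0.5,
    "nuclease-free water": 5.5,
}

def scale_mix(mix: dict, n_reactions: int, overage: float = 0.10) -> dict:
    """Scale per-reaction volumes to n reactions plus a fractional overage.

    The default 10% overage is an assumed allowance for pipetting loss.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in mix.items()}

# The listed components add up to the stated 30.0 uL reaction volume.
TOTAL_UL = sum(PCR_MIX_UL.values())
```

For example, `scale_mix(PCR_MIX_UL, 4, overage=0.0)` gives the volumes for exactly four reactions.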
After completion of PCR amplification, the PCR product (15 μL/well) is resolved by 1.5% agarose gel electrophoresis. If no additional bands are observed, the single PCR amplicon is excised from the gel and eluted using column purification.
The PCR product is then cloned into the appropriate vector. For example, the PCR product may be cloned into a linearized All-in-one CRISPR vector as follows: 1.0 μL linearized All-in-one CRISPR vector, 1.0 μL PCR insert (up to 200 ng), 6.0 μL Nuclease free water, and 2.0 μL 5× fusion master mix are mixed together in a total volume of 10 μL. Volumes of the PCR insert and the nuclease free water are varied depending on the concentration of the insert. The linearized vector and PCR insert are generally mixed in a 1:2 molar ratio. The mixture is transformed into competent cells with appropriate antibiotic selection. The constructed plasmids are confirmed by restriction analysis and DNA sequencing. The CRISPR/Cas system is then transfected into mammalian cells using a standard transfection protocol, e.g., by lipofection (such as with Lipofectamine LTX™ (Invitrogen)) or by electroporation (such as with 4D Nucleofector™ system (Lonza)), and a functional assay using the SURVEYOR™ mutation detection kit (#706020, IDT) is performed to validate the CRISPR-mediated genome editing in the target gDNA sequence. This assay uses enzymes that cleave heteroduplex DNA of an edited sequence and provides specific information on the mutation's location, orientation, and type.
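By way of non-limiting illustration, the mass of PCR insert corresponding to a given vector:insert molar ratio can be estimated from the lengths of the two fragments, since the mass of a double-stranded DNA fragment is approximately proportional to its length. This sketch reads the 1:2 ratio above as two insert molecules per vector molecule, which is an interpretive assumption; the function name and the example fragment sizes are hypothetical.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_per_vector: float = 2.0) -> float:
    """Estimate the insert mass (ng) giving the requested molar ratio.

    Assumes dsDNA mass scales linearly with length, so that
    moles are proportional to mass / length for both fragments.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector
```

For example, with a hypothetical 10,000 bp linearized vector used at 100 ng and a hypothetical 500 bp insert, `insert_mass_ng(100, 10000, 500)` is about 10 ng of insert.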
Triple gRNA Expressing Plasmid Construction.
First, a triple gRNA expression fragment is synthesized. Forward and reverse primers for generating the desired triple gRNA PCR amplicons are designed and synthesized or procured from, e.g., Sigma Genosys. After amplicons of the correct sizes are generated and gel purified, the purified amplicons are inserted using a fusion reaction into a suitable linearized vector, such as the All-in-one CRISPR vector.
Primers are designed as shown below:
Amplicon-1 Forward Primer:
5′-AGACACCTTGGATCCNNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAG-3′, where “N” denotes gRNA1 sequence.
Amplicon-1 Reverse Primer:
5′-nnnnnnnnnnnnnnnnnnnnCGGTGTTTCGTCCTTTCCAC-3′, where “n” denotes reverse complement sequence of gRNA2.
U6 gBlock is used for this amplicon generation.
Amplicon-2 Forward Primer:
5′-NNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAG-3′, where “N” denotes gRNA2 sequence.
Amplicon-2 Reverse Primer:
5′-TTCTAGCTCTAAAACnnnnnnnnnnnnnnnnnnnnGGATCCAAGGTGTCTCATAC-3′, where “n” denotes reverse complement sequence of gRNA3.
H1 gBlock is used for this amplicon generation.
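By way of non-limiting illustration, the four primers for the triple gRNA construct can be assembled programmatically from the three 20-nucleotide gRNA sequences. The flanking sequences are copied from the primers listed above. The choice of the 3′-most 15 nucleotides of gRNA2 for the Amplicon-2 forward primer is an assumption of this sketch (the text specifies only that 15 positions are used), and all function and constant names are hypothetical.

```python
# Flanking sequences copied from the primer templates above.
FWD_5_FLANK = "AGACACCTTGGATCC"
SCAFFOLD_START = "GTTTTAGAGCTAGAAATAGCAAG"
AMP1_REV_3_FLANK = "CGGTGTTTCGTCCTTTCCAC"
REV_5_FLANK = "TTCTAGCTCTAAAAC"
AMP2_REV_3_FLANK = "GGATCCAAGGTGTCTCATAC"

def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def triple_grna_primers(grna1: str, grna2: str, grna3: str) -> dict:
    """Fill the primer templates with three 20-nt gRNA sequences."""
    guides = [g.upper() for g in (grna1, grna2, grna3)]
    for g in guides:
        if len(g) != 20 or set(g) - set("ACGT"):
            raise ValueError("each gRNA must be 20 nt of A/C/G/T")
    g1, g2, g3 = guides
    return {
        "amplicon1_forward": FWD_5_FLANK + g1 + SCAFFOLD_START,
        "amplicon1_reverse": revcomp(g2) + AMP1_REV_3_FLANK,
        # Assumption: the 15 nt adjoining the scaffold are the 3' end of gRNA2.
        "amplicon2_forward": g2[-15:] + SCAFFOLD_START,
        "amplicon2_reverse": REV_5_FLANK + revcomp(g3) + AMP2_REV_3_FLANK,
    }
```

The same pattern could be extended to further amplicons by adding the corresponding templates.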
For the PCR reaction, 15.0 μL PCR Master mix (2×), 3.0 μL 10 μM forward primer, 3.0 μL 10 μM reverse primer, 3.0 μL DMSO, 0.5 μL gBlock with human U6/H1 promoter, and 5.5 μL Nuclease Free Water are mixed together in a total volume of 30.0 μL. A PCR reaction program is then run as shown in Table 4.
After completion of PCR amplification, the PCR product (15 μL/well) is resolved by 1.5% agarose gel electrophoresis. If no additional bands are observed, the PCR amplicon is excised and eluted out using column purification. The PCR product is then cloned into a vector, such as the All-in-one CRISPR vector, as follows: 1.0 μL linearized All-in-one CRISPR vector, 1.0 μL of each PCR insert in a 1:1 ratio, up to 200 ng, 5.0 μL Nuclease free water, and 2.0 μL 5× fusion master mix are mixed together in a total volume of 10.0 μL. Volumes of the PCR inserts and the nuclease free water are varied depending on the concentration of the inserts. The linearized vector and PCR inserts are generally mixed in a 1:2 molar ratio. The mixture is transformed into competent cells with appropriate antibiotic selection. The constructed plasmids are confirmed by restriction analysis and sequencing. The CRISPR/Cas system is transfected using a standard transfection protocol and a functional assay is performed to validate the CRISPR-mediated genome editing, as described above.
Multiplex gRNA-Expressing Plasmid Construction for Expression of 4 gRNAs in the All-in-One CRISPR Plasmid.
First, a quad (4) gRNA expression fragment is synthesized. Forward and reverse primers for generating the desired quad gRNA PCR amplicons are designed and synthesized or procured from, e.g., Sigma Genosys. After amplicons of the correct sizes are generated and gel purified, the purified amplicons are inserted using a fusion reaction into a suitable linearized vector (such as the All-in-one CRISPR vector). The four gRNA-expressing CRISPR plasmid can be used for removal of a defined gene, fragment, or sequence in the genome using Cas9n with high specificity.
Primers are designed as shown:
Amplicon-1 Forward Primer:
5′-AGACACCTTGGATCCNNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAG-3′, where “N” denotes gRNA1 sequence.
Amplicon-1 Reverse Primer:
5′-nnnnnnnnnnnnnnnnnnnnCGGTGTTTCGTCCTTTCCAC-3′, where “n” denotes reverse complement sequence of gRNA2.
U6 gBlock is used for this amplicon generation.
Amplicon-2 Forward Primer:
5′-NNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAG-3′, where “N” denotes 15 bp of gRNA2 sequence.
Amplicon-2 Reverse Primer:
5′-nnnnnnnnnnnnnnnnnnnnGGATCCAAGGTGTCTCATAC-3′, where “n” denotes reverse complement sequence of gRNA3.
H1 gBlock is used for this amplicon generation.
Amplicon-3 Forward Primer:
5′-NNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAG-3′, where “N” denotes 15 bp of gRNA3 sequence.
Amplicon-3 Reverse Primer:
5′-TTCTAGCTCTAAAACnnnnnnnnnnnnnnnnnnnnCGGTGTTTCGTCCTTTCCAC-3′, where “n” denotes reverse complement sequence of gRNA4.
U6 gBlock is used for this amplicon generation.
For the PCR reaction, 15.0 μL PCR Master mix (2×), 3.0 μL 10 μM forward primer, 3.0 μL 10 μM reverse primer, 3.0 μL DMSO, 0.5 μL gBlock with human U6/H1 promoter, and 5.5 μL Nuclease Free Water are mixed together in a total volume of 30.0 μL. A PCR reaction program is then run as shown in Table 4.
After completion of PCR amplification, the PCR product (15 μL/well) is resolved by 1.5% agarose gel electrophoresis. If no additional bands are observed, the PCR amplicon is excised and eluted out using column purification. The PCR product is then cloned into a vector, such as the All-in-one CRISPR vector, as follows: 1.0 μL linearized All-in-one CRISPR vector, 1.0 μL of each PCR insert in a 1:1:1 ratio, up to 200 ng, 4.0 μL Nuclease free water, and 2.0 μL 5× fusion master mix are mixed together in a total volume of 10.0 μL. Volumes of the PCR inserts and the nuclease free water are varied depending on the concentration of the inserts. The linearized vector and PCR inserts are generally mixed in a 1:2 molar ratio. The mixture is transformed into competent cells with appropriate antibiotic selection. The constructed plasmids are confirmed by restriction analysis and sequencing. The CRISPR/Cas system is transfected using a standard transfection protocol and a functional assay is performed to validate the CRISPR-mediated genome editing, as described above.
Preparation of CRISPR Plasmids.
For transfection or microinjection delivery of CRISPR genome editing systems into cells, plasmids must be pure and free from chemical contamination, endotoxins and any animal components. We use an endotoxin-free plasmid DNA maxiprep kit (EndoFree™ Plasmid Maxi Kit, #12362, Qiagen) to isolate plasmid DNA from RecA− and EndA− E. coli cells after overnight culture. Before using the CRISPR plasmids, the isolated plasmids are tested and validated using PCR, restriction analysis and whole plasmid sequencing.
In Vitro Transcription (ivT) for mRNA and gRNA Synthesis.
Plasmids containing T7 promoter-driven gRNA and Cas9 are used in ivT reactions to generate mature Cas9/Cas9n/dCas9 mRNA and gRNA. Transient expression of CRISPR components using ivT RNA is integration-free, and expression decreases as the RNA is degraded within the cell.
In vitro transcription of mRNA encoding a Cas9 protein (such as Cas9, Cas9n, dCas9, etc.) is carried out using the INCOGNITO T7 ARCA™ 5mC- & ψ-RNA transcription kit. The mRNA for the appropriate Cas9 protein is generated in an animal component-free production process using T7 promoter-based transcription and subsequent 5′ capping and 3′ polyadenylation. The introduction of anti-reverse cap analog (ARCA) and modified nucleotides (5-methylcytidine triphosphate (5-mCTP) and pseudouridine triphosphate (ψ-UTP)) results in higher levels of protein translation and induces a lower innate immune response against the resulting RNAs in downstream applications.
For in vitro transcription of gRNA(s), we introduce a target 20-nucleotide sequence under the control of the T7 promoter that has been amplified by PCR. The PCR amplicon contains a T7 promoter + target-specific crRNA + tracrRNA construct, which is used as a template to synthesize gRNA with T7 RNA polymerase. The resulting products are treated with DNase-I to remove the template and quantified using NanoDrop™. Prior to transfection or microinjection, the activity of the newly synthesized gRNA is checked with Cas9 nuclease and a corresponding template.
CRISPR Nuclease Production.
Plasmids and mRNAs for CRISPR nuclease (Cas9, Cas9n, dCas9, etc.) require transcription and translation for use in genomic modification. Unlike plasmids and mRNAs, Cas9 protein works immediately after transfection into cells with gRNAs.
Another advantage of using Cas9 protein is that its efficiency can be checked in in vitro experiments. Humanized Cas9 protein (e.g., Cas9, Cas9n and dCas9) sequences are sub-cloned into a T7-driven E. coli expression vector that contains a nuclear localization signal, an HA epitope, and a 6×His tag at the N-terminus, and that can be induced with IPTG in the BL21 (DE3) strain. Cas9 proteins are purified using Ni-NTA agarose beads, dialyzed and analyzed by SDS-PAGE prior to downstream applications.
Delivery of CRISPR/Cas9 System into Cells.
Methods of delivery are determined based on the target cells and applications desired for genome editing. CRISPR reagents can be delivered by any suitable method such as, without limitation, transfection, nucleofection, and microinjection, using plasmid DNA, RNA, and/or protein.
Design of ssODN.
Homology-directed repair (HDR) enables precise genetic modification ranging from a single nucleotide change to large insertions at a pre-defined target site of the genome. However, HDR needs a donor template, which can be either a double stranded linear DNA or a ssODN containing the desired sequence. The donor template must be introduced into the cells with the Cas9 or Cas9n enzyme and the gRNA(s). Along with the desired modifications, the donor template must contain an additional homologous sequence immediately downstream and upstream (i.e., a homology sequence at both the right and left arms) of the target sequence.
For large modifications (>100 bp insertion/deletion), using a double stranded DNA donor is generally more efficient than a ssODN. However, a ssODN provides a more effective HDR than a dsDNA template when small modifications (<50 bp) in the target sequence are needed. In either case, the orientation and length of the ssODN (desired modifications in the offset plus homology arms on each side) must be optimized for each target in order to achieve a high-performance ssODN. We have observed that ssODNs longer than an optimum length result in decreased HDR efficiency (data not shown).
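By way of non-limiting illustration, a ssODN donor sequence can be assembled as a left homology arm, the desired modification, and a right homology arm taken from the reference sequence flanking the edit. The function name and the 40-nucleotide default arm length are hypothetical placeholders; as noted above, arm length and orientation are to be optimized for each target.

```python
def build_ssodn(reference: str, edit_start: int, edit_end: int,
                replacement: str, arm_len: int = 40) -> str:
    """Assemble left arm + replacement + right arm from a reference sequence.

    reference[edit_start:edit_end] is the span to be replaced (it may be empty
    for a pure insertion). arm_len is an illustrative default, not an optimum.
    """
    if not 0 <= edit_start <= edit_end <= len(reference):
        raise ValueError("edit coordinates fall outside the reference")
    if edit_start < arm_len or len(reference) - edit_end < arm_len:
        raise ValueError("reference is too short for the requested arms")
    left_arm = reference[edit_start - arm_len:edit_start]
    right_arm = reference[edit_end:edit_end + arm_len]
    return left_arm + replacement + right_arm
```

The opposite-strand ssODN, where that orientation is to be tested, is the reverse complement of the returned sequence.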
A schematic diagram of the structure of ssODN donor templates is shown in
Enhancing HDR Efficiency Using Modifications in ssODN and Inhibiting NHEJ.
HDR occurs at a much lower frequency than NHEJ. This low efficiency of HDR presents a major constraint in the execution of precise genetic modifications by the CRISPR/Cas9 system. In addition to optimal HDR template design, we show that certain modifications can be made in the donor template to improve the stability of the ssODN HDR donor template.
For successful genome modification by Cas9 nuclease, a PAM sequence at the 3′ end of the 20-nucleotide target sequence is required. However, if the HDR template has an intact PAM sequence or retains an intact PAM sequence in the donor template after Cas9 modification has occurred, then it may be degraded by Cas9 in the cells and Cas9 may repeatedly act on the target sequence, even after the desired modification has been introduced. To avoid these unwanted activities by the CRISPR/Cas9 system, we mask the PAM sequence in the HDR donor template by mutating the PAM sequence. For example, in the case of the SpCas9 enzyme, the PAM sequence “NGG” in the HDR template can be mutated to NGT, NGC or NGA. It is noted that, if the HDR template falls within the coding region, then a silent mutation strategy should be followed to avoid introducing amino acid changes into the coding region.
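By way of non-limiting illustration, PAM masking in a donor template can be sketched as replacing the final G of an NGG PAM with T, C, or A and, where the PAM lies in a coding region, accepting only a replacement that leaves the encoded amino acid unchanged. The sketch assumes that the donor is given as the PAM-containing strand, that this strand is also the coding strand, and that `cds_start` is the index of the first base of an in-frame codon; the function and parameter names are hypothetical. The standard genetic code is used.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

def mask_pam(donor: str, pam_start: int, cds_start=None) -> str:
    """Mutate the NGG PAM beginning at pam_start to NGT, NGC or NGA.

    If cds_start is None the region is treated as non-coding and NGT is used.
    Otherwise the first candidate that keeps the affected codon synonymous is
    returned, and ValueError is raised if none of the three is silent.
    """
    donor = donor.upper()
    if donor[pam_start + 1:pam_start + 3] != "GG":
        raise ValueError("no NGG PAM at pam_start")
    idx = pam_start + 2  # position of the final G of the PAM
    for base in "TCA":
        edited = donor[:idx] + base + donor[idx + 1:]
        if cds_start is None:
            return edited
        codon_start = cds_start + 3 * ((idx - cds_start) // 3)
        before = CODON_TABLE[donor[codon_start:codon_start + 3]]
        after = CODON_TABLE[edited[codon_start:codon_start + 3]]
        if before == after:
            return edited
    raise ValueError("no silent change available at this PAM position")
```

Where no silent change exists at the final PAM position, a different silent change within the PAM or target sequence would need to be chosen by other means; that case is outside this sketch.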
Once a large cell population is successfully obtained with a reasonable indel rate or HDR-mediated correction or HDR-mediated knock-in, edited cells will be purified using magnetic separation or other suitable methods known in the art such as, e.g., cell sorting. In the case of magnetic separation, the corrected or edited cells have a change in expression of a cell surface marker or intracellular marker that can be specifically recognized by an antibody (or other means) to which a magnetic core (such as iron nanoparticle microbeads) is attached, allowing for magnetic separation using a magnetic field. Alternatively, the non-corrected/non-edited cells express the cell surface marker or intracellular marker that can be specifically recognized by an antibody (or other means) to which a magnetic core (such as iron nanoparticle microbeads) is attached, allowing for magnetic separation from the corrected or edited cells using a magnetic field (e.g., using a column placed in a magnetic field). In this case, the CRISPR-edited cells are purified by either negative selection (e.g., in the case of a knockout) or positive selection (e.g., in the case of knock-in/HDR repair).
We use CCR5 as an example to illustrate the separation of CRISPR/Cas9 edited CCR5 receptor knockout cells by negative selection. CCR5 belongs to a family of G-protein-coupled receptors and spans the plasma membrane seven times in a serpentine manner. CCR5 serves as a receptor for several chemokines including MIP-1α, MIP-1β, and MCP-2. It also functions as the primary co-receptor for macrophage-tropic HIV-1, which binds to CCR5 through gp120. Extracellular domains of CCR5 are important for HIV entry into target cells.
Manipulating a target genome sequence using CRISPR/Cas systems provided herein can have a wide range of therapeutic applications, including without limitation: correction of mutated sequences or base pairs in the genome; deletion or insertion of sequences/bps; and induction of mutations, e.g., for transcriptional activation or repression of a gene of interest. Here we use ALS as a genetic disease model to demonstrate gene-editing strategies for correcting genetic mutations associated with ALS.
ALS is the third most common neuromuscular disease worldwide; it attacks the nerve cells responsible for controlling voluntary muscles. There are currently no definitive diagnoses or effective therapies for ALS. About 90% of ALS cases occur sporadically without clear associated risk factors, and only about 10% of ALS cases have been found to be familial, caused by mutations in more than a dozen genes. The familial form of ALS usually results from a pattern of inheritance that requires only one parent to carry the gene responsible for the disease. About 50% of familial ALS cases result from a defect in genes encoding chromosome 9 open reading frame 72 (C9ORF72), which has unknown gene function, and/or superoxide dismutase 1 (SOD1). In the general population, the C9ORF72 gene typically contains from about 3 to 30 GGGGCC hexanucleotide repeats. This number of repeats is considered to be healthy. ALS is associated with heterozygous GGGGCC hexanucleotide expansion of from about 200 to 4500 repeats in a non-coding region of the C9ORF72 gene. These hexanucleotide repeats are believed to be responsible for the disease.
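For illustration, the repeat ranges stated above can be expressed as a simple check on a sequence. This is a sketch only: the function names and the label strings are hypothetical, the cut-offs are taken from the approximate ranges given in the text (about 3 to 30 and about 200 to 4500 repeats), and counts below 3 are grouped with the typical range as a simplifying assumption.

```python
import re


def max_tandem_repeats(seq, unit="GGGGCC"):
    """Return the longest run of uninterrupted tandem copies of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def classify_repeat_count(n):
    """Label a repeat count using the approximate ranges stated in the text."""
    if n >= 200:
        return "expanded range"
    if n <= 30:
        return "typical range"
    return "between the stated ranges"
```

For example, `classify_repeat_count(max_tandem_repeats(sequence))` labels a sequenced region; a count that falls between the two stated ranges is reported as such rather than being forced into either group.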
We generate and use a CRISPR/Cas9-based gene editing system using the cellular NHEJ pathway to excise the extra GGGGCC hexanucleotide repeats from the genome of an ALS patient. Through competitive binding of gRNAs and the Cas9n nuclease, the genome of an ALS patient having more than 5 hexanucleotide repeats is selectively modified, whereas the CRISPR gene editing system does not act on a healthy genome having 3 or fewer hexanucleotide repeats (
Triple gRNA guided excision of extra GGGGCC hexanucleotide repeats in C9ORF72 is validated by a Loop-mediated isothermal amplification (LAMP) technique and diagnostic ALS PCR. LAMP is an auto-cycling strand displacement DNA synthesis procedure, carried out by Bst DNA polymerase with high strand displacement activity and a set of four specially designed primers that recognize a total of six distinct sequences on the target DNA (Notomi, T. et al., Nucleic Acids Res. 28, e63, 2000). LAMP is therefore expected to amplify the target sequence with high selectivity (Notomi, T. et al., Nucleic Acids Res. 28, e63, 2000; Nagamine, K. et al., Mol. Cell. Probes 16:223-229, 2002). We design a set of four primers that specifically bind to the C9ORF72 gene that has extra GGGGCC hexanucleotide repeats; these primers fail to recognize and amplify normal genomic DNA or the ALS-C9ORF72 gene in which the extra GGGGCC hexanucleotides have been excised. The final LAMP-amplified GGGGCC hexanucleotide-containing C9ORF72 gene products are a mixture of stem-loop DNA of various lengths. LAMP amplified products can be analyzed by real-time PCR (with real-time probes), turbidity, fluorescent assay and agarose gel electrophoresis. When LAMP products are visualized by agarose gel electrophoresis, many bands of different sizes up to the loading well are seen.
PCR amplification of the targeted gDNA region and the endonuclease assay are the conventional methods that have been developed to detect the efficiency of indel mutations induced by CRISPR/Cas9 activity. The extra GGGGCC hexanucleotide repeat-containing C9ORF72 gene amplification is technically challenging due to the presence of a high percentage of GC base pairs. A qualitative PCR method to validate the presence of extra GGGGCC hexanucleotide repeats in the C9ORF72 gene was therefore developed. Briefly, the PCR amplification was carried out using Phusion™ High-Fidelity PCR Master Mix (ThermoFisher, MA, USA). The forward primer NWL-MBPr-664 (5′-GGGTCTAGCAAGAGCAGGTGTGGGTTTAGGAGGTGTGTG-3′) and reverse primer NWL-MBPr-674 (5′-GCCCCGACCACGCCCCGGCCCCGGCCCCGGCCCCTAGCG-3′) were used to amplify a 211-bp PCR product from both the wild type (WT) and extra GGGGCC hexanucleotide repeat-containing C9ORF72 gene. The PCR cycling parameters used with the C1000 Thermal Cycler (Bio-Rad, CA, USA) were as follows: initial denaturation at 98° C. for 3 min, 32 cycles of 98° C. for 30 seconds, 71° C. for 30 seconds, 72° C. for 60 seconds, and final extension at 72° C. for 5 min. After amplification, the PCR products were resolved by 1.5% agarose gel electrophoresis. Visualizing the PCR products by agarose gel electrophoresis showed the following: WT resulted in a high-intensity band, while any extra GGGGCC hexanucleotide repeat-containing C9ORF72 sequence resulted in a low-intensity band. We utilized the band intensity difference in diagnostic ALS PCR to screen the extra GGGGCC hexanucleotide repeat excisions.
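The GC-richness that makes this amplification difficult can be quantified directly from the sequences. The sketch below computes the GC fraction of the two primers listed above and of the hexanucleotide repeat unit; the helper name `gc_fraction` is hypothetical and the calculation is offered only as an illustration.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Primer sequences as given in the text (5' to 3').
FORWARD_PRIMER = "GGGTCTAGCAAGAGCAGGTGTGGGTTTAGGAGGTGTGTG"  # NWL-MBPr-664
REVERSE_PRIMER = "GCCCCGACCACGCCCCGGCCCCGGCCCCGGCCCCTAGCG"  # NWL-MBPr-674
REPEAT_UNIT = "GGGGCC"

if __name__ == "__main__":
    for name, seq in [("forward", FORWARD_PRIMER),
                      ("reverse", REVERSE_PRIMER),
                      ("repeat unit", REPEAT_UNIT)]:
        print(f"{name}: {gc_fraction(seq):.2f}")
```

The repeat unit consists entirely of G and C bases, so each additional repeat raises the GC content of the amplicon.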
Human Neural Stem Like-Cells (NSLC) from an ALS patient were cultured in neural proliferation media (NeuroCult™ proliferation medium, STEMCELL Technologies, Vancouver, BC, Canada) supplemented with EGF (20 ng/ml, Peprotech, QC, Canada) and FGF (20 ng/ml, Peprotech) at 37° C., 5% CO2, 5% O2 until 80% confluency. After reaching the desired confluency, the cells were harvested with TrypLE™ (Life Technologies, CA, USA) by incubating the cells in TrypLE for 3 to 5 minutes at 37° C. The cells were pelleted by centrifugation at 1500 rpm for 5 minutes and the cell pellet was used for transfection experiments. About 1×10⁶ cells were gently re-suspended in 100 μl of P3 solution (Lonza) and combined with 2 μg of pD-Epi723gRNA1 plasmid and 1.5 μg of Cas9n mRNA. This mixture was then transferred to a nucleofection cuvette and the cells were transfected using the program “DS150” in a 4D-Amaxa Nucleofector™ Device (Lonza, Walkersville, Md., USA). The transfected cells were transferred to a laminin-coated 6-well culture plate seeded at a cell density of ~2×10⁵/well and incubated overnight in the neural proliferation media supplemented with EGF (20 ng/ml) and FGF2 (20 ng/ml) at 37° C., 5% CO2, 5% O2. The un-transfected cells were also plated at the same cell density as a negative control for this experiment. After the overnight incubation, the media was replaced with fresh neural proliferation media supplemented with EGF (20 ng/ml) and FGF2 (20 ng/ml). The transfected cells were subsequently re-transfected two more times (3 days apart) with 1.5 μg of Cas9n mRNA using the Lipofectamine™ MessengerMAX (Invitrogen) as per the manufacturer's protocol. The triple transfected cells were further cultured in the neural proliferation media supplemented with EGF (20 ng/ml) and FGF2 (20 ng/ml) for another 48 hours and then collected for diagnostic ALS PCR analysis.
Diagnostic ALS PCR was performed to examine whether the pD-Epi723gRNA1 plasmid/Cas9n mRNA removed the extra GGGGCC hexanucleotide repeats in the ALS-C9ORF72 gene. As shown in
At present, more than 150 different mutations in the SOD1 gene have been identified in ALS patients. H46R, a mutation in which the histidine at codon 46 is changed to arginine, is the most common ALS-causing mutation. H46R causes a profound loss of copper binding to the SOD1 active site, which renders SOD1 enzymatically inactive. We generate a CRISPR/Cas9-based gene editing system using a ssODN-directed HDR pathway to site-specifically correct the H46R mutated codon.
We also design a correcting ssODN template with a histidine codon at the 46th position for HDR targeting of the non-coding strand of the gene. Since the non-coding strand has been shown to be highly specific and does not interfere with transcription and gene expression, the specificity of the ssODN is confirmed to avoid the ssODN HDR template targeting other sequences in the genome. To improve stability of the ssODN HDR template, we incorporate a tag at the 3′ end (such as, but not limited to, a CGCG repeat of phosphorothioate) to increase intracellular stability towards endonuclease and exonuclease. To improve efficiency of the ssODN HDR template, a peptide nucleic acid at the end of the ssODN, consisting of nucleic acid bases attached to an achiral peptide backbone made up of N-2-aminoethyl glycine units, is incorporated. Furthermore, we can incorporate a tracking fluorophore (such as, without limitation, a Cyanine dye or a quantum dot) at the 5′ end of the ssODN to allow monitoring of its cellular uptake and distribution. The PAM sequences in the ssODN HDR donor are also masked silently (without affecting the amino acid sequence) to avoid degradation by Cas9n nuclease and to avoid repeated genetic modification after the desired modification has taken place. Both the above-mentioned gene editing methodologies efficiently correct the genetic disorders and retain target gene function.
We also use similar methods to introduce a CCR5 Δ32 mutation into a cell. CCR5 is a co-receptor that, together with CD4, plays a critical role in the entry of HIV into host CD4+ cells. Cells with a CCR5 Δ32 mutation are known to be resistant to HIV entry (see
Mitochondria are double membrane sub-cellular organelles present in all mammalian nucleated cells. Their main role is to produce cellular ATP through oxidative phosphorylation. Mitochondria have their own DNA, which is distinct from the chromosomal DNA present in the nucleus. Human mitochondrial DNA is a small circular double stranded DNA of about 16.6 kb in size, encoding 13 essential polypeptides, which are critical for oxidative phosphorylation. Mitochondria replicate their DNA by themselves. Harmful mutations in mitochondrial DNA can cause a number of serious diseases. The rate of mitochondrial DNA mutation is 10 to 17-fold higher than nuclear DNA mutation. Unlike mutations in nuclear DNA, which are inherited from both parents, mitochondrial mutations are inherited only from the mother.
Two types of mutations, namely point mutations and rearrangements, occur in mitochondrial DNA. Over 250 harmful mitochondrial mutations have been characterized (MITOMAP: A Human Mitochondrial Genome Database, http://www.mitomap.org, 2009). These mutations disrupt the mitochondria's ability to generate energy efficiently. Mitochondrial dysfunction may lead to several diseases in many organs such as progressive myopathy, cardiomyopathy, retinitis pigmentosa, Leber hereditary optic neuropathy (LHON), progressive brain-stem disorder, diabetes, MELAS (mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes), and so on. Currently there are no effective treatments for the majority of mitochondrial diseases.
Due to the complexity of the mitochondrial DNA location, certain modifications are required in nuclear CRISPR/Cas9-mediated gene editing methodology to precisely edit the mutated mitochondrial DNA, including for efficiently targeting the Cas9 enzyme and HDR donor ssODN towards mitochondrial DNA. These modifications include adding a mitochondrial targeting sequence (MTS), such as the MTS from Ornithine transcarbamylase or cytC, at the N-terminal end to direct the Cas9 enzyme towards the mitochondrial matrix, and modifying the donor ssODN at its 3′ end by adding peptide nucleic acids (PNA) attached with either a mitochondrial signal peptide (MSP) or triphenylphosphonium (TPP). The mechanisms by which MSP and TPP target the ssODN to the mitochondrial matrix are distinct: MSP is recognized by mitochondrial surface receptors, whereas TPP can easily pass through the phospholipid bilayer towards the negatively charged mitochondrial matrix (Yoon, Y. G. et al., Anat. Cell Biol. 43 (2): 97-109, 2010). These properties of the conjugates promote efficient delivery of the ssODN donor to precisely edit the mitochondrial DNA mutations.
Selective repair of mutant mitochondrial DNA without affecting normal or wild type mitochondrial DNA is of great importance. The localized oxidative environment and increased replication in the mitochondria can sometimes make mitochondrial DNA mutation more frequent. In such conditions, mutant mitochondrial DNA co-exists with normal or wild type DNA in various proportions, a state referred to as heteroplasmy. Thus an increase in the proportion of mutant mitochondrial DNA is required for a disease to be expressed or to increase disease severity. To selectively repair the mutant DNA and increase the proportion of wild type mitochondrial DNA for complete recovery from the disease phenotype, an HDR donor ssODN that is linked with either PNA-MSP or PNA-TPP along with Cas9 enzyme and gRNA (Cas9+gRNA)-expressing plasmid can be used. It should be understood that this configuration is used as an example only; many other configurations are also possible, as described herein, and are intended to be encompassed. For example, episomal expression of the gRNA in the cells may be introduced first, followed by the Cas9 protein and the modified ssODN. The gRNA is designed so that it exclusively binds to mutated mitochondrial DNA, but not to wild type DNA. For efficient mitochondrial DNA editing, either co-transfection of donor ssODN and Cas9+gRNA expressing plasmid, or transfection of donor ssODN 24 hours after transfection of Cas9+gRNA-expressing plasmid, can be used.
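The notion of heteroplasmy used above can be made concrete with a small calculation. The following is an illustrative sketch only, assuming that mutant and wild type mitochondrial DNA copy numbers are available from some measurement; the function names and the example figures in the usage note are hypothetical and do not represent experimental data.

```python
def heteroplasmy(mutant_copies, wild_type_copies):
    """Return the fraction of mitochondrial DNA copies that carry the mutation."""
    total = mutant_copies + wild_type_copies
    if total <= 0:
        raise ValueError("total copy number must be positive")
    return mutant_copies / total


def heteroplasmy_after_repair(mutant_copies, wild_type_copies, repaired_copies):
    """Return the mutant fraction after `repaired_copies` mutant copies
    have been converted to the wild type sequence."""
    if not 0 <= repaired_copies <= mutant_copies:
        raise ValueError("repaired copies must be between 0 and the mutant count")
    return heteroplasmy(mutant_copies - repaired_copies,
                        wild_type_copies + repaired_copies)
```

For instance, with 80 mutant and 20 wild type copies, `heteroplasmy(80, 20)` gives 0.8, and converting 30 mutant copies to wild type lowers the mutant fraction to 0.5.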
A schematic diagram illustrating such a system for use in correcting mutated mitochondrial DNA targets is shown in
Correcting mutations via high fidelity HDR pathways with a donor template is generally less efficient than non-homologous end joining (NHEJ), which can repair a DSB by directly rejoining the two ends in a process that does not require any homology repair template. During the repair process, NHEJ introduces indels that can cause frame shift mutations in the target gene and lead to mRNA degradation by nonsense-mediated decay or result in the production of truncated non-functional proteins. Mutations in several proteins cause pathogenic abnormal functionality that could be more virulent than complete knockout of such genes. In such cases, complete deletion of a pathogenic mutated gene may cause less harm than the expression of the pathogenic form of that specific protein.
For illustrative purposes, SOD1 is used here as an example. Mutations in the SOD1 gene have been found in about 12-13% of familial cases of amyotrophic lateral sclerosis (ALS). Currently, about 150 different ALS-causing mutations have been reported in the SOD1 gene. ALS is a protein misfolding disease. Mutations in the SOD1 gene cause an increased propensity to form aggregates that may confer toxicity, especially in motor neurons. Minute amounts of mutated SOD1 aggregates are sufficient to act as prions in transmitting a templated, spreading aggregation of SOD1 that leads to development of fatal ALS (Bidhendi, E. E. et al., J. Clin. Invest. 2016 (in press), doi: 10.1172/JCI84360). It has been confirmed that motor neuron disease caused by the SOD1 protein is due predominantly to the gain of such toxic properties and not through loss of function. Hence, we have developed a gene editing strategy to completely delete the expression of the SOD1 enzyme, instead of merely correcting specific mutations.
Gene editing systems and methods described herein can be used not only for treating diseases caused by genetic mutations, but also for treating infectious diseases and diseases that have both genetic and environmental components. For example, systems and methods described herein can be used to specifically knock out genes required for the survival of pathogenic human infectious agents, such as viruses, bacteria, parasites, yeast and prion proteins. Further, systems and methods can be used to knock out a previously inserted transgene in the germ line of a genetically modified organism (GMO). For illustration, we also describe herein the design of NHEJ-mediated gene editing strategies to knock out (e.g., inactivate) previously inserted transgenes in the germ line of GMOs.
About 4000 human diseases that are caused by gene disorders have been identified (Online Mendelian Inheritance in Man (OMIM): An Online Catalog of Human Genes and Genetic Disorders, http://www.omim.org; The Human Gene Mutation Database (HGMD®): An attempt to collate known (published) gene lesions responsible for human inherited disease, http://www.hgmd.cf.ac.uk/ac/index.php; Piñero, J. et al., Database (Oxford), 2015; 2015:bav028). Among these genetic disorders, current and potential candidates for gene therapy include, without limitation, cancer, cardiovascular diseases, metabolic diseases, AIDS, cystic fibrosis, amyotrophic lateral sclerosis, Parkinson's disease, Alzheimer's disease, and arthritis. The same or similar systems and methods described herein can be used to treat such genetic disorders.
Examples of diseases and their associated infectious agents, along with transgenes, that may be targeted using systems and methods described herein are shown in Table 5, which lists various human disease models and their respective infectious agents. The examples shown in Table 5 are for illustrative purposes only and are not meant to be limiting. Examples are also illustrated in
Additional examples of genetic disorders that may be targeted using systems and methods described herein are shown in Table 6, which also lists exemplary gRNAs. Examples are also illustrated in the Figures as indicated.
Plasmodium falciparum
Bacillus anthracis
Candida albicans
TGATCTTACTGATTCTGAAA
CATTATTTTTCATTCATAGT
TGGCCGATTCAGAATTTTGT
AAATTGGAAGAGTTTGTTTA
CCTGTGGGGCAAAGTGAACG
TTGACAGCAGTCTTCTCCAC
AACGCTACACGTTTCGTGTT
CTGCCCATCCACACTAGGCA
CAGCAACCAGACGGACAAGC
CACGAAGCTCTCCGATGTGT
TATCTCATACCTATCCCTAT
ACAACTGGGGCCAGGTGCGG
CAATTTGGTCAGCTGGGTCT
TATACACAGCATTCTCTGAA
TTAAACATTAACGGAACCCC
TGGCGTGATCTGCGCGCCCC
CTCTCTGAGGTAACAAATTG
TCTCAGAGAGGAGTAAACAC
TCGAGGAGATCTCGAATAGA
CGCCTCTGCTCTGTATCGGG
CAACGATCTGACCGCCACCC
GCCGCGCAGGGGCCCTAGAT
CAGAATTGATACTGACTGTA
AAAGATAGTCATCTTGGGGC
CCCCTATCTTTATTGTGACG
AAGGAAGCTCTATTAGATAC
CCCAAGACACACCATCGATC
CAATAACTTTGCAACAGTGA
AGATGCCAGCAGATCAGCTC
CTCAGCTTTTTCTCACTCTA
CAACGTGCTGGTCTGAGTGC
AGCTGTGGGAGGAAGATAAG
AGCCTCCACCAGCTCCTCCT
ATGCAGGCATCCTCAGCTAC
GCTAGGCCACGCCGAGGTCC
AGTTATGGCGACGAAGGCCG
CATGAACACGGAATCCATGC
TTGGAGATAATACAGCAGGT
GAGTCGCGCGCTAGGGGCCG
CCGGGGCCGGGGCCGGGGCG
CCCGGCCCCGGCCCCGGCCC
TGCCGCACGCCCCCTGGCAG
GGCGGCGGAGGCGGCGGCGG
CCAGGCCCAGGCCCAGGCCC
CCTGCGCCTGCCCTGGGAAC
CCTGGGCCTGGGCCTGGGCC
CCATGTGTACTACATCCACA
AAACTCACTGGAGTTTGACG
AGCGGTAACTAAGATTAGTA
TTCCAACTGTTCATCGGCTG
GTCCGTGCTGTCCGCTCGG
TGGGGCCGCAGGGCGTGGAT
CATAAATTTTACATTTGAGT
GATGATATAACAAATCAATA
GAGATTGATTTTCTAATAAA
CCGCCGATATATTACAGAAC
ACCCCGCCTGTAACACGAGC
TACCACAGAGTCTAGACTCG
GTTTATGTTGAATAGCATTG
AACCGAGTCCGATGAAAAAA
GCTTCGGGCGCTTCTTGCAG
CTGGGGGCAGCCGATACCCG
CCAGGATCCAACTGCCTATG
TGTGGCCATGTGGAGTGACG
CATCTAATTCAACAAGAATT
GGGCAAAAATTCTCTGTCAG
TTGATACTCCTGGGACAAAT
¹ Sequences are shown in the 5′ → 3′ direction. PAM sequences are shown in bold and underlined.
HDR is generally a very precise DNA repair pathway compared to NHEJ; however, NHEJ is generally more active than HDR-mediated DNA repair. This difference in efficiency makes HDR-mediated gene editing more challenging for correcting or editing genes, as well as for use in disease treatment or gene enhancement strategies that require precise gene editing or correction, as compared to gene inactivation by NHEJ. To overcome this difference in efficiency, we have developed an HDR-mediated gene-editing strategy with multiple transfections of HDR-plasmid and a corrective HDR donor in order to target disease-causing mutations or for gene enhancement.
In order to illustrate such methods, an example is shown in
Cancer is a group of diseases involving uncontrolled proliferation of abnormal cells in the body with the potential to invade or spread to other parts of the body. Current cancer treatment strategies generally target all the dividing cells instead of specifically targeting the abnormally proliferating cancer cells. Targeted cancer therapies are expected to be more efficient than older treatment forms and less harmful to the healthy cells. The methods detailed herein allow for a gene editing approach using the cellular NHEJ pathway to knock out oncogenes that disrupt the normal cell cycle (through overexpressing their encoded oncoproteins, leading to cancer development). This CRISPR/Cas9 system mediated gene editing involves the introduction of Cas9n protein (either via a plasmid, mRNA, or protein) and gRNA into cells to correct the cellular dysfunction or to knock out certain genes within the cells to cure or slow cancer progression. Recently, many genes and associated pathways have been identified as being dysregulated in cancer. Examples of such genes are shown in Table 7. Gene editing provides a rationale towards specifically targeting such genes to cure the underlying cancer diseases. As an example, WNT10A acts as an autocrine oncogene both in renal cell carcinogenesis and in its progression by activating the WNT/β-catenin signaling cascade (Hsu et al., 2012 PLOS one 7(10): e47649).
WNT10A gene knockout in Caki-1 cells was used here as an example to demonstrate a quantitative platform to directly test the performance of the knockout of certain oncogenes for inhibiting carcinogenesis and its disease progression. Caki-1 cells (#HTB-46, ATCC) were cultured in McCoy's 5A medium (#SH3020001, Hyclone) supplemented with 10% bovine calf serum (#SH30073.04, Hyclone) at 37° C., 5% CO2. 1×10⁶ cells were harvested using TrypLE Select (#A1285901, Thermofisher), and the harvested cells were re-suspended in 1×PBS and spun down at 200 g for 3 minutes. The pelleted cells were gently re-suspended in 100 μl of SF solution (PBC2-00675, Lonza) and transfected with 1 μg of pD-EpiWe2gRNA1 plasmid (SEQ ID NO: 107) and 1 μg of Cas9n mRNA (SEQ ID NO: 109). The cell suspension was combined with the mix of plasmid DNA and mRNA, transferred to a nucleofection cuvette and transfected using the “DN-100” program in a 4D-Amaxa Nucleofector™ device (Lonza). After transfection, 100 μl of the medium was mixed with the cell suspension in the cuvette and incubated for 10 min at 37° C., 5% CO2. The cells were then transferred to 2 wells of a 6-well plate (#3335, Costar) and incubated at 37° C., 5% CO2. After 72 hours, the cells were harvested with TrypLE Select and plated out over 2 new wells of a 6-well plate. The next day, the cells were transfected with additional Cas9n mRNA. The RNA transfection mix was prepared using MessengerMax (#LMRNA008, Thermofisher) as per the manufacturer's protocol: About 15 μl of MessengerMax was diluted in 250 μl OptiMEM I and incubated for 10 minutes at room temperature. About 5 μg of Cas9n mRNA was diluted in 250 μl OptiMEM I media and mixed with the MessengerMax solution and incubated for 5 minutes at room temperature. After 5 minutes of incubation, 250 μl of the transfection mix was then added to each well containing 2.5 mL medium and 0.5 mL DNA transfection mix.
The cells were incubated with this transfection mix for 4-6 h at 37° C., 5% CO2. The media was then replaced with 3 mL of fresh medium per well, and the cells were incubated at 37° C., 5% CO2. This Cas9n mRNA transfection was repeated two more times 72 h apart. Four days after the final transfection, the cells were harvested for the analysis of CRISPR/Cas9 mediated cleavage efficiency using a T7 endonuclease mutation detection assay. The genomic DNA was isolated from CRISPR/Cas9 system transfected cell lines using Quick-gDNA Miniprep Kit (# D3024, Zymo Research). PCRs were performed using Phusion™ High-Fidelity PCR Master Mix (ThermoFisher, MA, USA). The following primers were used to amplify the gDNA region containing the CRISPR target site: The forward primer NWL-MBPr-1081 (5′-atactgtggccacaagcatg-3′)(SEQ ID NO: 110) and reverse primer NWL-MBPr-1080 (5′-gttccccatcctaaatgtgg-3′)(SEQ ID NO: 111) were used to amplify an 898-bp product from both edited and untransfected Caki-1 cells, with the cleavage site located approximately in the middle. The PCR cycling parameters used with a C1000 Thermal Cycler (Bio-Rad, CA, USA) were as follows: initial denaturation at 98° C. for 3 min, 32 cycles of 98° C. for 30 seconds, 60° C. for 30 seconds, 72° C. for 60 seconds and the final extension at 72° C. for 5 min. After amplification, PCR products were resolved by 1.5% agarose gel electrophoresis and the amplicon purified using Wizard SV gel and PCR cleanup system (#A9281, Promega). Approximately 300 ng of purified PCR product obtained from the untransfected and edited cells were denatured at 95° C. and re-annealed in NEB buffer 2 using C1000 Thermal Cycler (Bio-Rad, CA, USA) as follows: 95° C. 5 min, ramp down to 85° C. at 2° C./second, followed by ramp down to 25° C. at 0.1° C./second and final hold at 4° C. The re-annealed PCR products were digested with 10 U of T7 endonuclease I (#M0302L, NEB) for 30 min at 37° C.
The reaction was stopped by adding 2 μl of 0.5 M EDTA, and PCR products were resolved by electrophoresis on a 2% agarose gel. DNA fragments were stained with SYBR safe (#S33102, Invitrogen) according to manufacturer's procedure, and ImageJ software was used for the quantification of band intensities and gene editing efficiency. The gene editing efficiency was determined to be 20.1%. This efficiency was obtained without optimizing any of the transfection conditions or quantity and stability of the Cas9n mRNA introduced into the cells, and without any selection of the cells. Targeting efficiencies were calculated using the following formula: % gene modification = 100 × (1 − (1 − fraction cleaved)^(1/2)). The fraction cleaved was calculated using the following formula: cleaved band intensity/(uncleaved band intensity + cleaved band intensity).
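The two formulas above can be written as a short calculation. This is a minimal sketch under the assumption that the band intensities have already been quantified (for example, in ImageJ); the function names are hypothetical and no measured intensities from the experiment are reproduced here.

```python
import math


def fraction_cleaved(cleaved_intensity, uncleaved_intensity):
    """Cleaved band intensity / (uncleaved + cleaved band intensity)."""
    total = cleaved_intensity + uncleaved_intensity
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return cleaved_intensity / total


def percent_gene_modification(frac_cleaved):
    """% gene modification = 100 x (1 - (1 - fraction cleaved)^(1/2))."""
    if not 0.0 <= frac_cleaved <= 1.0:
        raise ValueError("fraction cleaved must be between 0 and 1")
    return 100.0 * (1.0 - math.sqrt(1.0 - frac_cleaved))
```

The square root reflects that the assay detects heteroduplexes formed on re-annealing, so the cleaved fraction is not itself the fraction of modified alleles. As a purely arithmetic example, a fraction cleaved of 0.75 corresponds to 50% gene modification.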
Without wishing to be limited by theory, one advantage of using a non-integrating episomal plasmid is that the gRNA and/or Cas9 can be introduced continuously over a sufficient period of time, to help ensure that the gene editing takes place in the cell, thereby increasing efficiency. For example, an episomal plasmid encoding the one or more gRNA can help to ensure that sufficient gRNA is continuously present to allow for multiple introductions of Cas9 (either as the protein or as mRNA or a plasmid) and for precise timing (e.g., at a particular point in the cell cycle or when a donor template/ssDNA is introduced into the cell). This can provide an overall higher efficiency of achieving gene editing in a high proportion of cells in a cell population. Alternatively, an episomal plasmid can encode Cas9 to ensure continuous presence of the Cas9 protein in the cell over a prolonged period of time; the one or more gRNA can then be introduced (optionally together with the donor template) multiple times to allow high efficiency gene editing in a cell population.
In addition, an episomal plasmid encoding the one or more gRNA and/or Cas9 can also encode for a truncated surface protein or a protein that confers specific antibiotic resistance to a cell, to allow for selection and purification of transfected cells carrying the episomal plasmid (e.g., by sorting or by magnetic antibody separation, or using a specific antibody for the truncated surface protein that is expressed). Thus cells having the episomal plasmid can be selected and purified out of the starting cell population; it could then be expected that all or nearly all of the cells in this selected cell population (which in most cases would represent >50% of the starting cell population) would be gene edited, allowing for a completely pure or nearly pure gene-edited cell population, and without any gene integration (except for the optional donor template).
For antibiotic-free selection, a diverse pool of non-immunogenic N- or C-terminal truncated proteins with or without tag-epitopes can be used to enrich the transfected cells using sorting, magnetic microbead-based separation, or other separation methodologies (see, e.g., Table 8 and
Transfection Conditions and CD4 Magnetic Sorting for KG-1 Cells.
CCR5 gene knockout and CD4 truncated protein expression in KG-1 cells were used as an example to demonstrate a quantitative platform to directly test the performance of the truncated protein efficiency in the enrichment of Cas9n mRNA/gRNA plasmid transfected cells and the CD4 truncated protein expression (analyzed using Amnis® FlowSight, a multicolour spectral imaging flow cytometer). KG-1 cells (ATCC CCL-246) were cultured in Iscove's Modified Dulbecco's medium (Thermofisher 12440-053) supplemented with 20% bovine calf serum (Hyclone SH30073.04) at 37° C., 5% CO2, 5% O2. 2×10⁶ cells were centrifuged at 200 g for 2 minutes and re-suspended in PBS and spun down at 200 g for 3 minutes. The cells were gently re-suspended in 100 μL of SF solution (Lonza PBC2-00675) and transfected with 5 μg pD-CCR5gRNA (SEQ ID NO: 108) and 2.5 μg Cas9n mRNA. The cell suspension was combined with the mix of plasmid DNA and mRNA, transferred to a cuvette and transfected with the “FF-100” program in a 4D-Amaxa Nucleofector™ Device (Lonza). Post transfection, 100 μL of medium was mixed with the cell suspension in the cuvette and incubated for 10 min at 37° C., 5% CO2, 5% O2. The cells were then transferred to a well of a 6-well plate (Costar 3335) and incubated at 37° C., 5% CO2, 5% O2 until analysis.
Flow Cytometry and Analysis.
Multicolor spectral imaging flow cytometry data were collected on an Amnis® FlowSight at 20× magnification. For the truncated CD4 expression analysis, the fresh cell pellets from the transfection conditions were tested to determine the relative differences in the expression of truncated CD4 receptors at different time points following transfections with truncated CD4 expression plasmid. Briefly, the fresh cell pellets were first re-suspended into single cell suspension in BD Staining buffer (BSA, #554657, BD Biosciences), and the cells were stained with FITC conjugated anti-CD4 antibody (M-T466, 130-080-501, Miltenyi Biotec) for 15 minutes before rinsing off the unbound antibodies. The stained cells were fixed with BD CytoFix™ Fixation buffer (#554655, BD Biosciences) and rinsed 3 times with 1×PBS. The cells were then re-suspended in 200 μL of 1×PBS, and the truncated CD4 receptor expression was detected using the single color antibody staining. This single color antibody panel was excited with a 488-nm laser at 60 mW (for detecting FITC), and the brightfield and fluorescent images were collected for 50,000 events. The Amnis IDEAS 6.1 software was used to analyze raw image files. The cut-offs for in-focus and single cells were determined manually, and images were screened to remove debris. The relative expression was determined using frequency vs. intensity values for the FITC fluorophore, and the geometric mean of the histoplots was used to determine the magnitude of relative differences in truncated CD4 expression following transfection with truncated CD4 expression plasmid. From the images and histoplots obtained for different time points following transfection with the CRISPR/Cas9n system, the observed peak of truncated CD4 expression in KG-1 cells transfected with the 5 μg pD-CCR5gRNA and 2.5 μg Cas9n mRNA was found to be around 42 hours post transfection, at which point almost half the transfected cells expressed it.
In cases where an episomal plasmid was used instead, the truncated CD4 expression could be maintained for several weeks. This approach can thus be used to isolate and purify cells that have likely undergone gene editing, without the need for stable transfection and/or antibiotic selection.
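The geometric-mean comparison of FITC intensities described above can be illustrated as follows. This is a sketch only, assuming per-cell intensity values have been exported from the analysis software as positive numbers; the function name `relative_expression` is hypothetical, and `statistics.geometric_mean` requires Python 3.8 or later.

```python
from statistics import geometric_mean


def relative_expression(sample_intensities, control_intensities):
    """Return the ratio of geometric mean fluorescence intensities
    (sample over control). All intensities must be positive."""
    return geometric_mean(sample_intensities) / geometric_mean(control_intensities)
```

The geometric mean is commonly preferred over the arithmetic mean for fluorescence data because intensities tend to be spread over several orders of magnitude. For example, `relative_expression([4, 16], [1, 4])` compares geometric means of 8 and 2 and returns approximately 4.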
Although this invention is described in detail with reference to embodiments thereof, these embodiments are offered to illustrate but not to limit the invention. It is possible to make other embodiments that employ the principles of the invention and that fall within its spirit and scope as defined by the claims appended hereto.
The contents of all documents and references cited herein are hereby incorporated by reference in their entirety.
The present application claims priority to U.S. Provisional Patent Application No. 62/351,398 filed on Jun. 17, 2016, the entirety of which is incorporated herein by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2017/053599 | 6/16/2017 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62351398 | Jun 2016 | US |