The present invention relates to the field of genetic engineering, in particular to methods and compositions for correcting genomic mutations that are associated with diseases or other medical disorders. For example, a modified prime editor is used to delete and insert large polynucleotide sequences that are beyond the capability of conventional prime editors. The presently disclosed Cas9 prime editor is catalytically active, whereas conventional prime editors utilize a Cas9 nickase. The improved prime editor permits a therapeutic deletion/insertion event to treat diseases and medical disorders that are beyond the capability of conventional prime editors.
Correction of genetic mutations in vivo is believed to have broad potential therapeutic application for a range of human genetic diseases. Prime editors (PE) composed of a Cas9 nickase and an engineered reverse transcriptase have been reported to result in nucleotide changes, sequence insertions and deletions. Anzalone et al., “Search-and-replace genome editing without double-strand breaks or donor DNA” Nature 576:149-157 (2019). PE does not induce double-stranded DNA breaks and does not require a donor DNA template in conjunction with homology directed repair.
Genomic insertions, duplications, and insertions/deletions (indels) may account for ~14% of human pathogenic mutations. Current gene editing methods cannot accurately or efficiently correct these abnormal genomic rearrangements, especially larger alterations (e.g., >100 bp). Thus, what is needed in the art are compositions and methods that accurately delete large insertions/duplications and repair the deletion junction, thereby improving the scope of gene therapies.
In one embodiment, the present invention contemplates a method, comprising: a) providing; i) a genomic DNA locus comprising a target nucleotide sequence; and ii) a composition comprising a catalytically active Cas9 protein fused to a reverse transcriptase, a first prime editor guide RNA (pegRNA) molecule conjugated to a first reverse transcriptase DNA insertion template and a second prime editor guide RNA molecule conjugated to a second reverse transcriptase DNA insertion template, wherein said first and second reverse transcriptase DNA templates are complementary; b) contacting said catalytically active Cas9 protein with said target nucleotide sequence, wherein said first pegRNA molecule binds to a sense strand of said target nucleotide sequence and said second pegRNA molecule binds to an antisense strand of said target nucleotide sequence; c) creating two double strand breaks in said target nucleotide sequence with said catalytically active Cas9 protein such that said target nucleotide sequence is deleted; and d) incorporating a double stranded insertion nucleotide sequence encoded by said first and second reverse transcriptase insertion templates into said genomic DNA sequence. In one embodiment, the target nucleotide sequence ranges from 1 kb to 10 kb. In one embodiment, the insertion nucleotide sequence has a length of up to 60 bp. In one embodiment, the target nucleotide sequence is linked to a genetic disease. In one embodiment, the genetic disease is tyrosinemia I. In one embodiment, the target nucleotide sequence comprises a FahΔExon 5 mutation.
In one embodiment, the present invention contemplates a method, comprising: a) providing; i) a patient exhibiting at least one symptom of a genetic disease; and ii) a composition comprising a catalytically active Cas9 protein fused to a reverse transcriptase, a first prime editor guide RNA (pegRNA) molecule conjugated to a first reverse transcriptase DNA insertion template and a second prime editor guide RNA molecule conjugated to a second reverse transcriptase DNA insertion template, wherein said first and second reverse transcriptase DNA templates are complementary; b) administering said composition to said patient such that said at least one symptom of said genetic disease is reduced. In one embodiment, the genetic disease is tyrosinemia. In one embodiment, the genetic disease is Huntington disease. In one embodiment, the patient further comprises a gene mutation insertion of between 1 kb and 10 kb. In one embodiment, the administering replaces said gene mutation insertion with an insertion nucleotide sequence that has a length of up to 60 bp.
In one embodiment, the present invention contemplates a composition comprising a catalytically active Cas9 protein fused to a reverse transcriptase, a first prime editor guide RNA (pegRNA) molecule conjugated to a first reverse transcriptase DNA template and a second prime editor guide RNA molecule conjugated to a second reverse transcriptase DNA template, wherein said first and second reverse transcriptase DNA templates are complementary. In one embodiment, the first reverse transcriptase DNA template is conjugated as a 3′ extension to the first pegRNA molecule. In one embodiment, the second reverse transcriptase DNA template is conjugated as a 3′ extension to the second pegRNA molecule. In one embodiment, the first and second reverse transcriptase DNA templates have a length of up to 60 bp.
To facilitate the understanding of this invention, a number of terms are defined below. Terms defined herein have meanings as commonly understood by a person of ordinary skill in the areas relevant to the present invention. Terms such as “a”, “an” and “the” are not intended to refer to only a singular entity but also include plural entities and the general class of which a specific example may be used for illustration. The terminology herein is used to describe specific embodiments of the invention, but its usage does not delimit the invention, except as outlined in the claims.
The term “about” or “approximately” as used herein, in the context of any assay measurements, refers to +/−5% of a given measurement.
As used herein, the term “CRISPRs” or “Clustered Regularly Interspaced Short Palindromic Repeats” refers to an acronym for DNA loci that contain multiple, short, direct repetitions of base sequences. Each repetition contains a series of bases followed by the same series in reverse and then by 30 or so base pairs known as “spacer DNA”. The spacers are short segments of DNA from a virus and may serve as a ‘memory’ of past exposures to facilitate an adaptive defense against future invasions (PMID 25430774).
As used herein, the term “Cas” or “CRISPR-associated (cas)” refers to genes often associated with CRISPR repeat-spacer arrays (PMID 25430774).
As used herein, the term “Cas9” refers to a nuclease from Type II CRISPR systems, an enzyme specialized for generating double-strand breaks in DNA, with two active cutting sites (the HNH and RuvC domains), one for each strand of the double helix. Jinek combined tracrRNA and crRNA (spacer RNA) into a “single-guide RNA” (sgRNA) molecule that, mixed with Cas9, could find and cleave DNA targets through Watson-Crick pairing between the guide sequence within the sgRNA and the target DNA sequence (PMID 22745249). There have been substantial efforts to broaden the targeting specificity of SpyCas9 through mutations that increase the number of PAMs that can be recognized. Two of the most prominent modified versions of Cas9 are xCas9 (Hu et al. 2018 (PMID 29512652)) and Cas9-NG (Nishimasu et al. 2018 (PMID 30166441)), both of which permit targeting some additional PAM elements.
As used herein, the term “guide RNA” refers to an RNA that programs a CRISPR-Cas protein to recognize a target site in the genome. This could be a crRNA, crRNA/tracrRNA, sgRNA or a pegRNA depending on the type of Cas9 protein and the modifications that have been made to the protein to incorporate extra functionality.
As used herein, the term “catalytically active Cas9” refers to an unmodified Cas9 nuclease comprising full nuclease activity.
The term “nickase” as used herein, refers to a nuclease that cleaves only a single DNA strand, either due to its natural function or because it has been engineered to cleave only a single DNA strand. Cas9 nickase variants (e.g. nSpCas9, nCas9) that have either the RuvC or the HNH domain mutated provide control over which DNA strand is cleaved and which remains intact (Jinek, et al. 2012 (PMID 22745249) and Cong, et al. 2013 (PMID 23287718)).
The term “protospacer adjacent motif” (or PAM) as used herein, refers to a DNA sequence that may be required for a Cas9/sgRNA to form an R-loop to interrogate a specific DNA sequence through Watson-Crick pairing of its guide RNA with the genome. The PAM may comprise a trinucleotide sequence having a single G residue (e.g., a single G PAM), or a trinucleotide sequence having two consecutive G residues (e.g., a dual G PAM). The PAM specificity may be a function of the DNA-binding specificity of the Cas9 protein (e.g., a “protospacer adjacent motif recognition domain” at the C-terminus of Cas9).
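By way of non-limiting illustration only, the relationship between a dual G PAM and its adjacent protospacer can be sketched in Python. The sketch assumes the canonical SpyCas9 NGG PAM and a 20-nucleotide protospacer lying immediately 5′ of the PAM on the scanned strand; the function name `find_ngg_protospacers` and the `spacer_len` parameter are hypothetical and are not part of the disclosure.

```python
import re

def find_ngg_protospacers(seq: str, spacer_len: int = 20):
    """Return a list of (protospacer_start, protospacer, pam) tuples
    for every NGG PAM found on the strand given in `seq` (5'->3')."""
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping PAMs (e.g. "AGGG") are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        # Only report PAMs with a full-length protospacer upstream (assumption).
        if pam_start >= spacer_len:
            start = pam_start - spacer_len
            hits.append((start, seq[start:pam_start], match.group(1)))
    return hits
```

Only one strand is scanned here; a candidate site on the opposite strand would be found by scanning the reverse complement of the same sequence.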
As used herein, the term “sgRNA” refers to single guide RNA used in conjunction with CRISPR associated systems (Cas). sgRNAs are a fusion of crRNA and tracrRNA and contain nucleotides of sequence complementary to the desired target site (Jinek, et al. 2012 (PMID 22745249)). Watson-Crick pairing of the sgRNA with the target site permits R-loop formation, which in conjunction with a functional PAM permits DNA cleavage or, in the case of nuclease-deficient Cas9, allows binding to the DNA at that locus.
The term “primer binding site” as used herein, refers to a specific nucleic acid sequence within the pegRNA that is complementary to the 3′ or 5′ end of a cleaved target nucleotide sequence. This allows annealing of the free 3′ end or free 5′ end of the genomic DNA for extension by the reverse transcriptase based on the reverse transcriptase template sequence encoded in the pegRNA.
The term, “prime editing guide RNA molecule” or “pegRNA molecule” as used herein, refers to a Cas9 guide RNA molecule that encodes the crRNA-tracrRNA fused to a primer binding site (PBS) and a reverse transcriptase template (RTT). The primer binding site hybridizes to a desired genomic sequence released by the binding and cleavage of the Cas9 nickase. The 3′ end and/or 5′ end of a genomic sequence is extended by the reverse transcriptase based on the reverse transcriptase template sequence.
The term “prime editing” as used herein, is a genome editing technology by which the genome of living organisms may be modified. Prime editing manipulates the genetic information of a targeted DNA site to essentially “rewrite” the coded sequences.
The term “prime editor” or “PE” as used herein, is a fusion protein comprising a catalytically impaired Cas9 endonuclease (nickase; nCas9) that can nick DNA fused to an engineered reverse transcriptase enzyme, and a prime editing guide RNA (pegRNA). The pegRNA is capable of programming the nCas9 to recognize a target site with the encoded crRNA-tracrRNA. The resulting nicked genomic DNA can be extended by the reverse transcriptase based on the pegRNA template sequence to integrate a new sequence. Once one strand is recoded, cellular DNA repair pathways fill in the other strand to create the new sequence. Such manipulation includes, but is not limited to, insertions, deletions, and base-to-base conversions without the need for double strand breaks (DSBs) or donor DNA templates. For example, such prime editing may be performed by a Cas9 CRISPR platform programmed with a pegRNA, such as a catalytically impaired Cas9 nickase platform with an appropriate reverse transcriptase.
The term “base pairs” as used herein, refers to specific nucleobases (also termed nitrogenous bases) that are the building blocks of nucleotide sequences that form a primary structure of both DNA and RNA. Double stranded DNA may be characterized by specific hydrogen bonding patterns; base pairs may include, but are not limited to, guanine-cytosine and adenine-thymine base pairs.
As used herein, the term “edit”, “editing” or “edited” refers to a method of altering a nucleic acid sequence of a polynucleotide (e.g., a wild type naturally occurring nucleic acid sequence or a mutated naturally occurring sequence) by selective deletion of a specific genomic target, the specific inclusion of new sequence through the use of an exogenously supplied DNA template, or the conversion of one DNA base to another DNA base.
Such a specific genomic target includes, but is not limited to, a chromosomal region, mitochondrial DNA, a gene, a promoter, an open reading frame or any nucleic acid sequence.
The term “effective amount” as used herein, refers to a particular amount of a pharmaceutical composition comprising a therapeutic agent that achieves a clinically beneficial result (i.e., for example, a reduction of symptoms). Toxicity and therapeutic efficacy of such compositions can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index, and it can be expressed as the ratio LD50/ED50. Compounds that exhibit large therapeutic indices are preferred. The data obtained from these cell culture assays and additional animal studies can be used in formulating a range of dosage for human use. The dosage of such compounds lies preferably within a range of circulating concentrations that include the ED50 with little or no toxicity. The dosage varies within this range depending upon the dosage form employed, sensitivity of the patient, and the route of administration.
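By way of non-limiting illustration only, the therapeutic index described above can be computed as the ratio LD50/ED50. The following Python sketch uses a hypothetical function name, and the dose values in the usage note are placeholders chosen for arithmetic convenience rather than measured data.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the therapeutic index, LD50 / ED50.

    Both doses must be positive and expressed in the same units
    (e.g. mg/kg); the ratio itself is dimensionless.
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive doses in the same units")
    return ld50 / ed50
```

For example, a placeholder LD50 of 200 mg/kg and a placeholder ED50 of 10 mg/kg give `therapeutic_index(200.0, 10.0)`, a ratio of 20; a larger ratio corresponds to a wider margin between the therapeutic and the toxic dose.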
The term “symptom”, as used herein, refers to any subjective or objective evidence of disease or physical disturbance observed by the patient. For example, subjective evidence is usually based upon patient self-reporting and may include, but is not limited to, pain, headache, visual disturbances, nausea and/or vomiting. Alternatively, objective evidence is usually a result of medical testing including, but not limited to, body temperature, complete blood count, lipid panels, thyroid panels, blood pressure, heart rate, electrocardiogram, tissue and/or body imaging scans.
The term “associated with” as used herein, refers to an art-accepted causal relationship between a genetic mutation and a medical condition or disease. For example, it is art-accepted that a patient having an HTT gene comprising a tandem CAG repeat expansion mutation has, or is at risk for, Huntington's disease.
The term “disease” or “medical condition”, as used herein, refers to any impairment of the normal state of the living animal or plant body or one of its parts that interrupts or modifies the performance of the vital functions. Typically manifested by distinguishing signs and symptoms, it is usually a response to: i) environmental factors (as malnutrition, industrial hazards, or climate); ii) specific infective agents (as worms, bacteria, or viruses); iii) inherent defects of the organism (as genetic anomalies); and/or iv) combinations of these factors.
The terms “reduce,” “inhibit,” “diminish,” “suppress,” “decrease,” “prevent” and grammatical equivalents (including “lower,” “smaller,” etc.) when in reference to the expression of any symptom in an untreated subject relative to a treated subject, mean that the quantity and/or magnitude of the symptoms in the treated subject is lower than in the untreated subject by any amount that is recognized as clinically relevant by any medically trained personnel. In one embodiment, the quantity and/or magnitude of the symptoms in the treated subject is at least 10% lower than, at least 25% lower than, at least 50% lower than, at least 75% lower than, and/or at least 90% lower than the quantity and/or magnitude of the symptoms in the untreated subject.
The term “administered” or “administering”, as used herein, refers to any method of providing a composition to a patient such that the composition has its intended effect on the patient. An exemplary method of administering is by a direct mechanism such as, local tissue administration (i.e., for example, extravascular placement), oral ingestion, transdermal patch, topical, inhalation, suppository etc.
The term “patient” or “subject”, as used herein, is a human or animal and need not be hospitalized. For example, out-patients and persons in nursing homes are “patients.” A patient may comprise any age of a human or non-human animal and therefore includes both adults and juveniles (i.e., children). It is not intended that the term “patient” connote a need for medical treatment; therefore, a patient may voluntarily or involuntarily be part of experimentation whether clinical or in support of basic science studies.
The term “protein” as used herein, refers to any of numerous naturally occurring extremely complex substances (as an enzyme or antibody) that consist of amino acid residues joined by peptide bonds and contain the elements carbon, hydrogen, nitrogen, oxygen and, usually, sulfur. In general, a protein comprises amino acids having an order of magnitude within the hundreds.
The term “pharmaceutically” or “pharmacologically acceptable”, as used herein, refer to molecular entities and compositions that do not produce adverse, allergic, or other untoward reactions when administered to an animal or a human.
The term, “pharmaceutically acceptable carrier”, as used herein, includes any and all solvents, or a dispersion medium including, but not limited to, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), suitable mixtures thereof, and vegetable oils, coatings, isotonic and absorption delaying agents, liposome, commercially available cleansers, and the like. Supplementary bioactive ingredients also can be incorporated into such carriers.
The terms “Nucleic acid sequence” and “nucleotide sequence” as used herein refer to an oligonucleotide or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin which may be single- or double-stranded, and represent the sense or antisense strand.
The term “antisense strand” as used herein, refers to a non-coding DNA strand of a gene. A cell uses the antisense DNA strand as a template for producing messenger RNA (mRNA) that directs the synthesis of a protein.
The term “sense strand” as used herein, refers to a coding DNA strand of a gene. A cell uses the sense DNA strand to encode the associated amino acid sequence of a protein.
The term “an isolated nucleic acid”, as used herein, refers to any nucleic acid molecule that has been removed from its natural state (e.g., removed from a cell and is, in a preferred embodiment, free of other genomic nucleic acid).
The terms “amino acid sequence” and “polypeptide sequence” as used herein, are interchangeable and refer to a sequence of amino acids.
The term “portion” when used in reference to a nucleotide sequence refers to fragments of that nucleotide sequence. The fragments may range in size from 5 nucleotide residues to the entire nucleotide sequence minus one nucleic acid residue.
A “deletion” is defined as a change in either nucleotide or amino acid sequence in which one or more nucleotides or amino acid residues, respectively, are absent.
An “insertion” or “addition” is that change in a nucleotide or amino acid sequence which has resulted in the addition of one or more nucleotides or amino acid residues.
As used herein, the terms “complementary” or “complementarity” are used in reference to “polynucleotides” and “oligonucleotides” (which are interchangeable terms that refer to a sequence of nucleotides) related by the base-pairing rules. For example, the sequence “C-A-G-T,” is complementary to the sequence “G-T-C-A.” Complementarity can be “partial” or “total.” “Partial” complementarity is where one or more nucleic acid bases is not matched according to the base pairing rules. “Total” or “complete” complementarity between nucleic acids is where each and every nucleic acid base is matched with another base under the base pairing rules. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as detection methods which depend upon binding between nucleic acids.
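By way of non-limiting illustration only, the distinction between “total” and “partial” complementarity can be sketched in Python. The sketch assumes that the two sequences are written so that position i of one strand faces position i of the other, as in the “C-A-G-T” / “G-T-C-A” example above, and that only the standard DNA pairs A-T and G-C count as matches. The function names are hypothetical.

```python
# Standard Watson-Crick DNA base pairs.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def matched_fraction(top: str, bottom: str) -> float:
    """Fraction of positions at which `bottom` base-pairs with `top`.

    Both strings are written so that position i of `top` faces
    position i of `bottom` (assumption stated in the lead-in).
    """
    if len(top) != len(bottom) or not top:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(PAIR.get(t) == b for t, b in zip(top.upper(), bottom.upper()))
    return matches / len(top)

def classify_complementarity(top: str, bottom: str) -> str:
    """Return "total" when every base is matched, otherwise "partial"."""
    return "total" if matched_fraction(top, bottom) == 1.0 else "partial"
```

With these definitions, `classify_complementarity("CAGT", "GTCA")` is total, while changing the last base of the second sequence to C leaves three of four positions matched and the result is partial.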
The terms “homology” and “homologous” as used herein in reference to nucleotide sequences refer to a degree of complementarity with other nucleotide sequences. There may be partial homology or complete homology (i.e., identity). A nucleotide sequence which is partially complementary, i.e., “substantially homologous,” to a nucleic acid sequence is one that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid sequence. The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (i.e., the hybridization) of a completely homologous sequence to a target sequence under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a second target sequence which lacks even a partial degree of complementarity (e.g., less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.
The terms “homology” and “homologous” as used herein in reference to amino acid sequences refer to the degree of identity of the primary structure between two amino acid sequences. Such a degree of identity may be directed to a portion of each amino acid sequence, or to the entire length of the amino acid sequence. Two or more amino acid sequences that are “substantially homologous” may have at least 50% identity, preferably at least 75% identity, more preferably at least 85% identity, most preferably at least 95%, or 100% identity.
An oligonucleotide sequence which is a “homolog” is defined herein as an oligonucleotide sequence which exhibits greater than or equal to 50% identity to a sequence, when sequences having a length of 100 bp or larger are compared.
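By way of non-limiting illustration only, the 50% identity threshold for a “homolog” can be sketched in Python. The sketch assumes the two sequences have already been aligned to equal length without gaps, which is a simplification; in practice identity is computed from a sequence alignment. The function names and default parameters are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions carrying the same base.

    Assumes `a` and `b` are pre-aligned, gap-free and of equal length.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * same / len(a)

def is_homolog(a: str, b: str, min_identity: float = 50.0, min_length: int = 100) -> bool:
    """True when the compared length is at least `min_length` bp and
    identity is greater than or equal to `min_identity` percent."""
    return len(a) >= min_length and percent_identity(a, b) >= min_identity
```

Sequences shorter than 100 bp return False here because the definition above applies only when sequences of 100 bp or larger are compared.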
As used herein, the term “hybridization” is used in reference to the pairing of complementary nucleic acids using any process by which a strand of nucleic acid joins with a complementary strand through base pairing to form a hybridization complex. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are impacted by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the Tm of the formed hybrid, and the G:C ratio within the nucleic acids.
As used herein the term “hybridization complex” refers to a complex formed between two nucleic acid sequences by virtue of the formation of hydrogen bonds between complementary G and C bases and between complementary A and T bases; these hydrogen bonds may be further stabilized by base stacking interactions. The two complementary nucleic acid sequences hydrogen bond in an antiparallel configuration. A hybridization complex may be formed in solution (e.g., C0t or R0t analysis) or between one nucleic acid sequence present in solution and another nucleic acid sequence immobilized to a solid support (e.g., a nylon membrane or a nitrocellulose filter as employed in Southern and Northern blotting, dot blotting or a glass slide as employed in in situ hybridization, including FISH (fluorescent in situ hybridization)).
DNA molecules are said to have “5′ ends” and “3′ ends” because mononucleotides are reacted to make oligonucleotides in a manner such that the 5′ phosphate of one mononucleotide pentose ring is attached to the 3′ oxygen of its neighbor in one direction via a phosphodiester linkage. Therefore, an end of an oligonucleotide is referred to as the “5′ end” if its 5′ phosphate is not linked to the 3′ oxygen of a mononucleotide pentose ring. An end of an oligonucleotide is referred to as the “3′ end” if its 3′ oxygen is not linked to a 5′ phosphate of another mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide, also may be said to have 5′ and 3′ ends. In either a linear or circular DNA molecule, discrete elements are referred to as being “upstream” or 5′ of the “downstream” or 3′ elements. This terminology reflects the fact that transcription proceeds in a 5′ to 3′ fashion along the DNA strand. The promoter and enhancer elements which direct transcription of a linked gene are generally located 5′ or upstream of the coding region. However, enhancer elements can exert their effect even when located 3′ of the promoter element and the coding region. Transcription termination and polyadenylation signals are located 3′ or downstream of the coding region.
As used herein, the term “an oligonucleotide having a nucleotide sequence encoding a gene” means a nucleic acid sequence comprising the coding region of a gene, i.e. the nucleic acid sequence which encodes a gene product. The coding region may be present in a cDNA, genomic DNA or RNA form. When present in a DNA form, the oligonucleotide may be single-stranded (i.e., the sense strand) or double-stranded. Suitable control elements such as enhancers/promoters, splice junctions, polyadenylation signals, etc. may be placed in close proximity to the coding region of the gene if needed to permit proper initiation of transcription and/or correct processing of the primary RNA transcript. Alternatively, the coding region utilized in the expression vectors of the present invention may contain endogenous enhancers/promoters, splice junctions, intervening sequences, polyadenylation signals, etc. or a combination of both endogenous and exogenous control elements.
As used herein, the terms “nucleic acid molecule encoding”, “DNA sequence encoding,” and “DNA encoding” refer to the order or sequence of deoxyribonucleotides along a strand of deoxyribonucleic acid. The order of these deoxyribonucleotides determines the order of amino acids along the polypeptide (protein) chain. The DNA sequence thus codes for the amino acid sequence.
As used herein, the term “gene” means the deoxyribonucleotide sequences comprising the coding region of a structural gene and including sequences located adjacent to the coding region on both the 5′ and 3′ ends for a distance of about 1 kb on either end such that the gene corresponds to the length of the full-length mRNA. The sequences which are located 5′ of the coding region and which are present on the mRNA are referred to as 5′ non-translated sequences. The sequences which are located 3′ or downstream of the coding region and which are present on the mRNA are referred to as 3′ non-translated sequences. The term “gene” encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed “introns” or “intervening regions” or “intervening sequences.” Introns are segments of a gene which are transcribed into heterogeneous nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or “spliced out” from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.
In addition to containing introns, genomic forms of a gene may also include sequences located on both the 5′ and 3′ end of the sequences which are present on the RNA transcript. These sequences are referred to as “flanking” sequences or regions (these flanking sequences are located 5′ or 3′ to the non-translated sequences present on the mRNA transcript). The 5′ flanking region may contain regulatory sequences such as promoters and enhancers which control or influence the transcription of the gene. The 3′ flanking region may contain sequences which direct the termination of transcription, posttranscriptional cleavage and polyadenylation.
The liver sections from untreated FahΔExon5 mice kept on or off NTBC serve as negative controls. Scale bar=100 μm.
In one embodiment, the present invention contemplates a Cas9 prime editor (PECas9) comprising a catalytically active Cas9 nuclease conjugated to a reverse transcriptase and combined with two prime editing guide RNAs (pegRNAs) having complementary reverse transcriptase template nucleotide strands. Although it is not necessary to understand the mechanism of an invention, it is believed that PECas9 can replace a genomic fragment, ranging from ~1 kb to >10 kb, with any desired sequence without requiring an exogenous DNA template.
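By way of non-limiting illustration only, the net outcome of such a replacement can be sketched in Python as a toy string model. The sketch assumes the genome is a single 5′→3′ string, the two double strand breaks are given as string indices, and the two strands of the insertion encoded by the templates are supplied 5′→3′ and must be reverse complements of one another. It does not model pegRNA design, primer binding sites, or DNA repair, and all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def pedar_replace(genome: str, cut_left: int, cut_right: int,
                  insert_top: str, insert_bottom: str) -> str:
    """Replace genome[cut_left:cut_right] with `insert_top` (toy model).

    `insert_top` and `insert_bottom` stand for the two strands of the
    insertion; they must be fully complementary to each other.
    """
    if not 0 <= cut_left <= cut_right <= len(genome):
        raise ValueError("cut positions must lie within the genome, left <= right")
    if reverse_complement(insert_top) != insert_bottom.upper():
        raise ValueError("the two insertion strands must be complementary")
    # Drop the fragment between the two breaks and join the flanks via the insert.
    return genome[:cut_left] + insert_top.upper() + genome[cut_right:]
```

For example, `pedar_replace("AAAA" + "G" * 10 + "TTTT", 4, 14, "CAT", "ATG")` removes the ten-base G tract and joins the two flanks through the three-base insert.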
This system, designated herein as a “PECas9-Based Deletion And Repair” (PEDAR) system has been shown herein to restore mCherry expression through an in-frame deletion of a disrupted green fluorescent protein (GFP) DNA sequence. Further shown is that PEDAR efficiency is enhanced by using pegRNAs with high cleavage activity or increasing transfection efficiency. In tyrosinemia mice, a PEDAR system removed a 1.38-kb pathogenic insertion within the Fah gene and precisely repaired the deletion junction to restore FAH protein expression in liver. These data demonstrate that PECas9 compositions and PEDAR methods can be an efficacious clinical therapy for correcting pathogenic mutations by replacing large nucleotide sequences and/or chromosomal aberrations.
In one embodiment, the present invention contemplates compositions and methods to perform precise genome editing that accurately deletes insertion/duplication mutations of DNA sequences and repairs the disrupted genomic site to treat a wide range of diseases.
Genetic insertions, duplications, and indels (insertion/deletion) account for ~14% of 60,008 known human pathogenic variants. See,
The CRISPR/Cas9 system is a proposed gene editing tool for correcting pervasive pathogenic gene mutations. When using dual single guide RNAs (sgRNA), Cas9 is believed to induce two double-strand breaks (DSBs). The two cut ends can then be ligated through the non-homologous end joining (NHEJ) repair pathway, leading to <5-Mb target fragment deletion in vitro and in vivo. Ran et al., “Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity” Cell 154:1380-1389 (2013); Cong et al., “Multiplex genome engineering using CRISPR/Cas systems” Science 339:819-823 (2013); Kato et al., “Creation of mutant mice with megabase-sized deletions containing custom-designed breakpoints by means of the CRISPR/Cas9 system” Sci Rep 7:59 (2017); Hara et al., “Microinjection-based generation of mutant mice with a double mutation and a 0.5 Mb deletion in their genome by the CRISPR/Cas9 system” J Reprod Dev 62:531-536 (2016); and Wang et al., “Large genomic fragment deletion and functional gene cassette knock-in via Cas9 protein mediated genome editing in one-cell rodent embryos” Sci Rep 5:17517 (2015).
However, the random indels generated by NHEJ lower the editing accuracy of this method. When a donor DNA template is present, CRISPR/Cas9 can insert a desired sequence at the cut site to repair the deletion junction through homology directed repair (HDR). Yeh et al., “Advances in genome editing through control of DNA repair pathways” Nat Cell Biol 21:1468-1478 (2019). This method has been used successfully in precise gene deletion and replacement application. Zheng et al., “Precise gene deletion and replacement using the CRISPR/Cas9 system in human cells” Biotechniques 57:115-124 (2014). Nevertheless, the repair efficiency of CRISPR-mediated HDR is hindered by the exogenous DNA donor and is limited in post-mitotic cells. Cox et al., “Therapeutic genome editing: prospects and challenges” Nature Medicine 21:121-131 (2015); and Liu et al., “Methodologies for Improving HDR Efficiency” Front Genet 9:691 (2018).
To further expand the gene editing toolbox, a CRISPR-associated gene editor—called prime editing (PE)—was developed by conjugating an engineered reverse transcriptase (RT) to a catalytically-impaired Cas9 ‘nickase’ (Cas9H840A) that cleaves only one DNA strand. An extension at the 3′ end of the prime editing guide RNA (pegRNA) encodes an RT template, allowing the nicked site to be precisely repaired. Anzalone et al., “Search-and-replace genome editing without double-strand breaks or donor DNA” Nature 576:149-157 (2019); and Matsoukas, I. G., “Prime Editing: Genome Editing for Rare Genetic Diseases Without Double-Strand Breaks or Donor DNA” Front Genet 11:528 (2020).
Thus, conventional PE complexes can mediate small deletions, small insertions, and limited base editing without creating double stranded DNA breaks or requiring donor DNA. Schene et al., “Prime editing for functional repair in patient-derived disease models” Nat Commun 11:5352 (2020); Jiang et al., “Prime editing efficiently generates W542L and S621I double mutations in two ALS genes in maize” Genome Biology 21:257 (2020); Liu et al., “Improved prime editors enable pathogenic allele correction and cancer modelling in adult mice” bioRxiv, 2020.2012.2015.422970 (2020); Jang, H. et al., “Prime editing enables precise genome editing in mouse liver and retina” bioRxiv 2021.2001.2008.425835 (2021). Yet, conventional PE has not been successfully applied to delete large DNA sequences.
Conventional PE complexes are constructed with a nicking Cas9, one pegRNA and one nicking gRNA. If one of skill would consider using a conventional prime editor complex with two prime editing guide RNAs (pegRNAs), an attempt to replace large genomic DNA sequences might be outlined as follows (see,
However, this theoretical modification of a conventional prime editor complex is considered in the art not to have a reasonable expectation of success because it has been reported that a prime editor Cas9 nickase complex is not effective in mediating larger target deletions with paired guide RNAs. Song et al., “CRISPR-Cas9(D10A) Nickase-Assisted Genome Editing in Lactobacillus casei” Appl Environ Microbiol 83 (2017); and Cho et al., “Analysis of off-target effects of CRISPR/Cas-derived RNA-guided endonucleases and nickases” Genome Res 24:132-141 (2014). Indeed, PE applications reported in the literature are limited to programming deletions of less than 100 bp, raising the concern that PE cannot generate long genomic deletions. Matsoukas I. G., “Prime Editing: Genome Editing for Rare Genetic Diseases Without Double-Strand Breaks or Donor DNA” Front Genet 11:528 (2020).
To achieve accurate and efficient large nucleotide sequence deletion and simultaneous nucleotide sequence insertion, without requiring a DNA template, a conventional prime editing system was improved by using a catalytically active Cas9 nuclease with a pair of pegRNAs (hereafter referred to as pegF and pegR) rather than a nickase Cas9 with one pegRNA and one nicking guide RNA. See,
This newly-engineered system can mediate an accurate deletion/insertion repair through the following exemplary steps: (i) prime editor recognizes the ‘NGG’ PAM sequence, binds, and cleaves both complementary strands of DNA on either side of the large sequence8; (ii) the encoded insertion sequences are then reverse transcribed between the cleavage sites of the complementary strands using the RT template linked to the pegRNAs; (iii) the complementary DNA strands containing the insertion sequence are annealed; (iv) the original DNA strands (i.e., 5′ flaps) are excised; and (v) the DNA is repaired by endogenous DNA repair pathways. See,
Catalytically active Cas9 nuclease has been used to program larger deletions with dual conventional sgRNAs14. In one embodiment, the present invention contemplates a prime editor composition comprising a catalytically active Cas9 nuclease (instead of a conventional PE Cas9 nickase) that is conjugated to a reverse transcriptase (RT) to create “PECas9”. See,
Although it is not necessary to understand the mechanism of an invention, it is believed that when two pegRNAs target both complementary strands of DNA, PECas9 introduces two DSBs and deletes an intervening DNA fragment between the two DSBs. Concurrently, an insertion nucleotide sequence is incorporated at the deletion site using the respective RT templates conjugated as a 3′ extension on each of the two pegRNAs. The two complementary insertion sequences then function as a homologous sequence to induce an endogenous ligation and repair of the deletion junction. See
The efficiencies of PEDAR systems, conventional PE systems, and conventional Cas9 systems were compared for large deletion sequences coupled together with an accurate large insertion sequence at an endogenous HEK3 genomic locus in HEK293T cells. For this comparison, two pegRNAs were designed with an offset of 979 bp (e.g., the distance between the two ‘NGG’ PAM sequences) to program a 991 bp deletion sequence with an 18 bp insertion sequence at the HEK3 site. The 3′ extension RT template of the pegRNAs encoded an I-SceI recognition sequence (18-bp), which was reverse transcribed and integrated into the deletion site. See,
The two pegRNAs were transfected into cells along with a conventional PE, a PECas9, or a conventional Cas9. Delivery of PECas9 with or without a single pegRNA was used as a negative control and the target site was amplified three days post-transfection. The data showed that either PECas9 or a conventional Cas9, but not a conventional PE, led to a ˜450-bp deletion amplicon. This deletion amplicon was ˜1 kb shorter than the amplicon without a deletion. See,
Deletion amplicons from each group were digested with I-SceI endonuclease, and it was observed that only PECas9 showed cut bands of expected size (˜251 bp and ˜199 bp), indicating insertion of the I-SceI recognition sequence. See,
The PEDAR system also generated unintended edits, classified as: (i) other deletions/insertions, including a direct deletion without insertion and imperfect deletion/insertions, and (ii) small indels generated by individual pegRNA at the two cut sites, hereafter referred to as cut site_F and cut site_R. The incidence of these unintended events was measured in total genomic DNA by real-time quantitative PCR, and it was observed that PECas9 and conventional Cas9 generated comparable rates of unintended edits. See,
PECas9-mediated unintended deletion edits with the highest sequencing reads were evaluated. See,
PECas9 or conventional Cas9 also introduced indels at the two cut sites without generating the desired deletion. Sanger sequencing of these amplicons without a deletion reveals no significant difference in small indels caused by either PECas9 or conventional Cas9. See,
Potential repair mechanisms underlying PEDAR-mediated editing were evaluated by delivering PECas9 with one pegRNA and one sgRNA targeting the HEK3 locus and PECas9 with two pegRNAs. See,
After transfecting a cell with a pegRNA_alt and either conventional PE or PECas9, a deletion amplicon of the expected size was identified and insertion of the I-SceI recognition sequence was detected. See,
To investigate a maximum deletion size for a PEDAR system, two sets of paired pegRNAs were designed with either an offset of ˜8 kb or ˜10 kb targeted at the CDC42 locus. See,
A PEDAR system was validated to generate large in-frame deletions and accurately repair genomic coding regions to restore gene expression. A HEK293T traffic light reporter (TLR) cell line was used which contains a green fluorescent protein (GFP) sequence with an insertion and an mCherry sequence separated by a T2A (2A self-cleaving peptide) sequence28,29. The TLR system generates a disrupted GFP sequence that causes a frameshift which prevents mCherry expression. See,
A PEDAR system was tested to restore an mCherry signal by accurately deleting a disrupted GFP and T2A sequence of ˜800 bp in length. Two pegRNAs were designed that targeted the GFP promoter region before the start codon and the site immediately after T2A, respectively. In this approach, part of the Kozak sequence and start codon were unintentionally deleted due to the restriction of the PAM sequence. However, the RT template at the 3′ end of pegRNAs was designed to encode the missing Kozak sequence and start codon to ensure their insertion into the target site by reverse transcription. See,
TLR reporter cells were treated with dual pegRNAs (e.g., pegF+pegR) and either PECas9, conventional PE, or conventional Cas9, and the mCherry signal was assessed by flow cytometry. The frequency of mCherry positive cells was significantly higher in the PECas9-treated group (2.12±0.105%) as compared to either the conventional PE or conventional Cas9 groups. See,
Alternatively, to enhance the editing rate, improving the expression level of gene editing agents in cells was evaluated.30 Co-transfection of cells with a fluorescent protein-expressing plasmid, followed by FACS sorting, has been reported to enrich for cells with high levels of transgene expression31, 32. Thus, a GFP-expressing plasmid was co-transfected with PECas9 and paired pegRNAs into TLR cells as an indicator of transfection efficiency. A ˜1.42-fold increase in mCherry positive cell rate was observed after selection of cells with high GFP expression. See,
To verify that PEDAR systems restore mCherry expression via accurate deletion-insertion, mCherry positive cells were sorted in PECas9-treated groups and the insertion sequences were amplified. The data shows a deletion amplicon that is ˜800-bp shorter than an amplicon in untreated control cells. See,
To test the clinical gene therapy embodiments of PEDAR systems, a Tyrosinemia I mouse model was selected, referred to as FahΔExon5. This Tyrosinemia I mouse model is derived by replacing a 19-bp sequence with a ˜1.3-kb neo expression cassette at exon 5 of the Fah gene33,34. See,
To maintain body weight and survival, FahΔExon5 mice are given water supplemented with NTBC (2-(2-nitro-4-trifluoromethylbenzoyl)-1,3-cyclohexanedione), a tyrosine catabolic pathway inhibitor. A PEDAR system was tested to correct a causative FahΔExon5 mutation by deleting a large mutation insertion and simultaneously inserting a 19-bp sequence back to repair exon 5. See,
It has been reported that gene edited hepatocytes with a corrected FAH protein will gain a growth advantage and eventually repopulate the liver35. Therefore, a PECas9 and two pegRNAs were delivered via hydrodynamic injection to FahΔExon5 mice (n=4) which were subsequently removed from the NTBC water supplement to allow repopulation of the gene edited hepatocytes. Untreated FahΔExon5 mice that were likewise removed from NTBC water supplementation were used as controls. Forty days later, widespread FAH protein patches were observed in PECas9-treated mouse liver sections, and the gene edited hepatocytes showed normal morphology. See,
To understand gene editing events in mouse liver, the insertion nucleotide sequence was amplified by using PCR primers spanning exon 5. A ˜300-bp deletion amplicon was identified in treated mice, indicating deletion of the ˜1.3-kb mutation insertion fragment. See,
In one embodiment, the present invention contemplates a Cas9 prime editor that operates on a PECas9-based deletion and repair (PEDAR) method that can correct mutations caused by large genomic rearrangements. Based on the design of conventional prime editors, the PEDAR system was modified to comprise a catalytically active Cas9 nuclease combined with an RT and paired pegRNAs. In operation, PECas9 couples together the replacement of a deletion nucleotide sequence with an insertion nucleotide sequence to accomplish a desired genome edit.
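The coupled deletion and insertion described above can be pictured with a toy string model. The sketch below is illustrative only: it treats the genome as a plain string, uses hypothetical names (`pedar_edit`, `net_length_change`), and ignores strands, PAM placement, and repair outcomes. It assumes the two cut sites are given as 0-based, half-open coordinates.

```python
def pedar_edit(reference: str, cut_f: int, cut_r: int, insertion: str) -> str:
    """Toy model: drop reference[cut_f:cut_r] and place `insertion` at the junction.

    cut_f and cut_r are hypothetical 0-based cut coordinates (half-open interval).
    """
    if not 0 <= cut_f <= cut_r <= len(reference):
        raise ValueError("require 0 <= cut_f <= cut_r <= len(reference)")
    return reference[:cut_f] + insertion + reference[cut_r:]


def net_length_change(deletion_len: int, insertion_len: int) -> int:
    """Change in sequence length after a deletion coupled with an insertion."""
    return insertion_len - deletion_len


# Invented 18-nt example: remove the 10-nt middle block and insert "CC".
example = "AAAA" + "G" * 10 + "TTTT"
edited = pedar_edit(example, 4, 14, "CC")
```

Applied to the HEK3 design described earlier (a 991-bp deletion with an 18-bp insertion), `net_length_change(991, 18)` gives the expected shortening of the edited amplicon relative to the unedited one.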
The presently disclosed PEDAR system is similar to a recently developed paired prime editing method, called PRIME-Del.36 PRIME-Del, however, utilizes a Cas9 nickase protein (PE2) as opposed to a fully catalytically active Cas9 as in the PEDAR system. As such, unlike the PEDAR system, PRIME-Del is incapable of creating two DSBs for excising and replacing a large deletion sequence in excess of 1-10 kb with an insertion sequence. This difference in catalytic activity confers a distinct advantage of the PEDAR system over PRIME-Del, as the PEDAR system can create >10-kb target deletions simultaneously with up to 60-bp insertions in cells. PRIME-Del can only create 20- to 700-bp target deletions and up to 30-bp insertions. Consequently, the large sequence deletion/insertion capability of the PEDAR system is beyond the capability of either PRIME-Del or other conventional prime editors.17,36 Compared to PRIME-Del, PEDAR seems to be more error-prone, introducing higher fractions of direct deletion and imperfect deletion-insertion. See,
Despite the relative editing efficiency and accuracy of PECas9 being higher than conventional PE and conventional Cas9 gene editing, PECas9 activity can be further improved using multiple pegRNA sequences with distinct spacer sequences, PBSs, or RT templates. Furthermore, MMEJ or SSA enhancers could further improve the efficiency of PEDAR editing.38, 39
In one embodiment, the present invention contemplates a PEDAR system for correcting genome duplications. See,
One such genome duplication of high clinical significance is the trinucleotide CAG repeat expansion in the HTT gene, believed to result in Huntington disease43. In one embodiment, the present invention contemplates a method comprising a PEDAR system that accurately removes an HTT gene CAG repeat expansion to reduce CAG repeat length and reduce the symptoms of Huntington disease.
Thus, the PEDAR system is a clinical platform for gene therapy. The significance of PEDAR also extends to basic biology, where it could be used for protein function studies. See,
Human embryonic kidney (HEK293T) cells (ATCC) and HEK293T-TLR cells24, 25 were maintained in Dulbecco's Modified Eagle's Medium (Corning) supplemented with 10% fetal bovine serum (Gibco) and 1% Penicillin/Streptomycin (Gibco).28,29 Cells were seeded at 70% confluence in 12-well cell culture plates one day before transfection. 1.5 μg PE-Cas9 and 1 μg paired pegRNAs (0.5 μg each) were transfected with Lipofectamine 3000 reagent (Invitrogen).
Plasmids expressing pegRNAs were constructed by Gibson assembly using BsaI-digested acceptor plasmid (Addgene #132777) as vector. See, Table 2.
All animal study protocols were approved by the UMass Medical School IACUC. FahΔExon5 mice were kept on 10 mg/L NTBC water. Grompe et al., “Loss of fumarylacetoacetate hydrolase is responsible for the neonatal hepatic dysfunction phenotype of lethal albino mice” Genes & Development 7:2298-2307 (1993).
30 μg PE-Cas9 or Cas9 plasmid and 15 μg paired pegRNA expressing plasmids were injected into 9-week-old mice. One week later, NTBC supplemented water was replaced with normal water, and mouse weight was measured every two days. When a mouse lost 20% of its body weight relative to the first day of measurement (the day when NTBC water was removed), the mouse was supplemented with NTBC water until its body weight returned to the original body weight. After 40 days, mice were euthanized.
Portions of livers were fixed with 4% formalin, embedded in paraffin, sectioned at 5 μm and stained with hematoxylin and eosin (H&E) for pathology. Liver sections were de-waxed, rehydrated, and stained using standard immunohistochemistry protocols. Xue et al., “Response and resistance to NF-kappaB inhibitors in mouse models of lung adenocarcinoma” Cancer Discovery 1:236-247 (2011).
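The weight-based NTBC rule in the preceding paragraph can be written as a small decision function. This is a minimal sketch with hypothetical names (`needs_ntbc`, `loss_threshold`). It assumes the baseline is the weight on the day NTBC water was removed and that a mouse already back on NTBC stays on it until it regains that baseline.

```python
def needs_ntbc(baseline_g: float, weight_g: float, on_ntbc: bool,
               loss_threshold: float = 0.20) -> bool:
    """Return True if the mouse should be on NTBC water at this weighing.

    baseline_g: weight on the day NTBC water was removed.
    on_ntbc:    whether the mouse is currently back on NTBC water.
    """
    if baseline_g <= 0:
        raise ValueError("baseline weight must be positive")
    if on_ntbc:
        # Stay on NTBC until body weight is back to the original weight.
        return weight_g < baseline_g
    # Off NTBC: resume once the loss relative to baseline reaches the threshold.
    loss_fraction = (baseline_g - weight_g) / baseline_g
    return loss_fraction >= loss_threshold
```

For example, with a 25 g baseline, a 19 g reading (24% loss) triggers NTBC supplementation, while a 22 g reading (12% loss) does not.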
The following antibody was used: anti-FAH (Abcam, 1:400). The images were captured using Leica DMi8 microscopy.
To extract genomic DNA, HEK293T cells (3 days post transfection) were washed with PBS, pelleted, and lysed with 50 μl Quick extraction buffer (Epicenter) and incubated in a thermocycler (65° C. for 15 min, and 98° C. for 5 min). PureLink Genomic DNA Mini Kit (Thermo Fisher) was used to extract genomic DNA from two different liver lobes (˜10 mg each) per mouse.
Target sequences were amplified using Phusion Flash PCR Master Mix (Thermo Fisher) with the primers listed in Table 3.
PCR products were analyzed by electrophoresis in a 1% agarose gel, and target amplicons were extracted using DNA extraction kit (Qiagen).
10 ng of purified PCR products were incubated with I-SceI endonuclease (NEB) according to the manufacturer's instructions. After a one-hour incubation, the product was visualized and analyzed by electrophoresis in a 4-20% TBE gel (Thermo).
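The band sizes expected from such a digest can be estimated in silico. The following is a minimal sketch, not the procedure used here. It assumes the canonical 18-bp I-SceI recognition sequence, assumes a top-strand cut after the ninth base of the site, and handles only the first site found. The amplicon in the example is synthetic, built only to mirror the ˜450-bp deletion amplicon and the ˜251-bp and ˜199-bp bands reported above.

```python
ISCEI_SITE = "TAGGGATAACAGGGTAAT"  # canonical 18-bp I-SceI recognition sequence
TOP_STRAND_CUT_OFFSET = 9          # assumption: top strand is cut after base 9 of the site


def iscei_fragment_lengths(amplicon: str) -> list[int]:
    """Return top-strand fragment lengths after digestion at the first I-SceI site.

    If no site is present, the amplicon is returned uncut as a single length.
    """
    position = amplicon.upper().find(ISCEI_SITE)
    if position == -1:
        return [len(amplicon)]
    cut = position + TOP_STRAND_CUT_OFFSET
    return [cut, len(amplicon) - cut]


# Synthetic 450-bp amplicon carrying one I-SceI site (invented for illustration).
synthetic_amplicon = "A" * 242 + ISCEI_SITE + "C" * 190
fragments = iscei_fragment_lengths(synthetic_amplicon)
```

An amplicon without the inserted site returns a single uncut length, which corresponds to the absence of cut bands in the gel.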
The sequences around the two cut sites of the target locus were amplified using Phusion Flash PCR Master Mix (Thermo Fisher) with the primers as listed in Table 2 (supra). Sanger sequencing was performed to sequence the purified PCR products, and the trace sequences were analyzed using TIDE software (tide.nki.nl). The alignment window of left boundary was set at 10-bp.
Real-time quantitative PCR (qPCR) was used to calculate the absolute editing rate in total genomic DNA at the HEK3 locus. Quantitative PCR was performed with SsoFast EvaGreen Supermix (Bio-rad). Primers within the deletion region (P1 and P2), spanning the deletion region (P3 and P4), or across the deletion/insertion junction (P5 and P6) were designed. See,
The absolute rates of each type of editing introduced by PEDAR were calculated as follows:
(1) Accurate deletion-insertion editing rate = (copy number of DNA with accurate deletion-insertion) / (copy number of DNA with and without deletion).
(2) Other deletion-insertion rate = (copy number of DNA with deletion − copy number of DNA with accurate deletion-insertion) / (copy number of DNA with and without deletion).
(3) Absolute rate of small indels at two cut sites = (copy number of DNA without deletion × indel rate at distinct cut site calculated by TIDE) / (copy number of DNA with and without deletion).
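The three rate definitions above reduce to simple arithmetic on qPCR copy numbers. The sketch below uses hypothetical function and parameter names and takes the total as the sum of copies with and without the deletion. It assumes the copy numbers were already derived from the qPCR data and that the TIDE indel rate is supplied as a fraction between 0 and 1.

```python
def pedar_editing_rates(copies_with_deletion: float,
                        copies_accurate: float,
                        copies_without_deletion: float,
                        tide_indel_rate: float) -> dict[str, float]:
    """Compute the three absolute editing rates from copy numbers.

    tide_indel_rate is the indel fraction (0 to 1) at one cut site, as estimated
    by TIDE on amplicons without the deletion.
    """
    total = copies_with_deletion + copies_without_deletion
    if total <= 0:
        raise ValueError("total copy number must be positive")
    return {
        # (1) accurate deletion-insertion
        "accurate": copies_accurate / total,
        # (2) other deletions/insertions: deletion events that are not accurate
        "other": (copies_with_deletion - copies_accurate) / total,
        # (3) small indels at a cut site on alleles without the deletion
        "small_indels": copies_without_deletion * tide_indel_rate / total,
    }


# Invented numbers, for illustration only.
rates = pedar_editing_rates(300.0, 100.0, 700.0, 0.10)
```

With these invented inputs, 10% of alleles carry the accurate edit, 20% carry other deletions or insertions, and 7% carry a small indel at the chosen cut site.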
To assess mCherry recovery rate, post-editing HEK293T-TLR cells were trypsinized and analyzed using the MACSQuant VYB Flow Cytometer. Untreated HEK293T-TLR cells were used as a negative control for gating. All data were analyzed by FlowJo10.0 software.
Genomic sites of interest were amplified from genomic DNA using specific primers containing Illumina forward and reverse adaptors. See, Table 2. To quantify the percentage of target deletion-insertion by PE-Cas9 or Cas9, an amplification was performed on the fragment containing deletions (˜200 bp in length) from total genomic DNA to exclude length-dependent bias during PCR amplification.
20 μL PCR1 reactions were performed with 0.5 μM each of forward and reverse primer, 1 μL of genomic DNA extract or 300 ng purified genomic DNA, and 10 μL of Phusion Flash PCR Master Mix (Thermo Fisher). PCR reactions were carried out as follows: 98° C. for 10 s, then 20 cycles of [98° C. for 1 s, 55° C. for 5 s, and 72° C. for 10 s], followed by a final 72° C. extension for 3 min.
After the first round of PCR, a unique Illumina barcoding reverse primer was added to each sample in a secondary PCR reaction (PCR 2). Specifically, 20 μL of a PCR reaction contained 0.5 μM of a unique reverse Illumina barcoding primer and 0.5 μM common forward Illumina barcoding primer, 1 μL of unpurified PCR 1 reaction mixture, and 10 μL of Phusion Flash PCR Master Mix. The barcoding PCR 2 reactions were carried out as follows: 98° C. for 10 s, then 20 cycles of [98° C. for 1 s, 60° C. for 5 s, and 72° C. for 10 s], followed by a final 72° C. extension for 3 min. PCR 2 products were purified by 1% agarose gel using a QIAquick Gel Extraction Kit (Qiagen), eluting with 15 μL of Elution Buffer.
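The two cycling programs can be captured as structured records, which makes the single difference between them (annealing temperature) explicit. This is a minimal sketch with hypothetical names. The computed time is the sum of programmed hold steps only and excludes thermocycler ramp time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    initial_denature_s: int   # 98 C initial denaturation
    cycles: int
    denature_s: int           # 98 C step within each cycle
    anneal_s: int
    anneal_temp_c: int
    extend_s: int             # 72 C step within each cycle
    final_extend_s: int       # 72 C final extension

    def hold_time_s(self) -> int:
        """Sum of programmed hold times in seconds (ramp time not included)."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.initial_denature_s + self.cycles * per_cycle + self.final_extend_s


# Values transcribed from the PCR 1 and PCR 2 descriptions above.
PCR1 = PcrProgram(initial_denature_s=10, cycles=20, denature_s=1, anneal_s=5,
                  anneal_temp_c=55, extend_s=10, final_extend_s=180)
PCR2 = PcrProgram(initial_denature_s=10, cycles=20, denature_s=1, anneal_s=5,
                  anneal_temp_c=60, extend_s=10, final_extend_s=180)
```

Both programs have the same programmed hold time, so the rounds differ only in the annealing temperature.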
DNA concentration was measured by Bioanalyzer, and the samples were sequenced on an Illumina MiSeq instrument (150 bp, paired-end) according to the manufacturer's protocols. Paired-end reads were merged with FLASh41 with maximum overlap length equal to 150 bp. Alignment of amplicon sequence to the reference sequence was performed using CRISPResso242.
To quantify accurate deletion-insertion edits, CRISPResso2 was run in HDR mode using the sequence with desired deletion/insertion editing as the reference sequence. The editing window was set to 10 bp. Editing yield was calculated as: [# of HDR aligned reads]÷[total reads]. For all experiments, indel yields were calculated as: [# of indel-containing reads]÷[total reads].
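The yield calculations above are read here as fractions of total reads, that is, a read count divided by the total. The helper below is a minimal sketch under that reading. The function name is hypothetical, and the read counts are assumed to be taken from the alignment summary rather than computed here.

```python
def read_fraction(matching_reads: int, total_reads: int) -> float:
    """Fraction of reads in a category (e.g. HDR-aligned or indel-containing)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= matching_reads <= total_reads:
        raise ValueError("matching_reads must lie between 0 and total_reads")
    return matching_reads / total_reads


# Invented counts, for illustration only.
editing_yield = read_fraction(250, 1000)
indel_yield = read_fraction(125, 1000)
```

Multiplying the returned fraction by 100 gives the yield as a percentage.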
The ClinVar variant summary was obtained from NCBI ClinVar database (accessed Dec. 31, 2020). Variants with pathogenic significance were filtered by allele ID to remove duplicates. All pathogenic variants were categorized according to mutation type. The fractions of distinct mutation types were calculated using GraphPad Prism8.
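The filtering and categorization steps can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis actually performed. The field names (`allele_id`, `variant_type`, `clinical_significance`) are hypothetical stand-ins for the corresponding columns of the variant summary, and the exact-match test for "Pathogenic" is an assumed filter.

```python
from collections import Counter
from typing import Iterable, Mapping


def pathogenic_type_fractions(rows: Iterable[Mapping[str, str]]) -> dict[str, float]:
    """Deduplicate pathogenic variants by allele ID and return the fraction per type."""
    seen_alleles: set[str] = set()
    counts: Counter[str] = Counter()
    for row in rows:
        # Assumed filter: keep only records labeled exactly "Pathogenic".
        if row["clinical_significance"].strip().lower() != "pathogenic":
            continue
        # Count each allele ID once, however many records mention it.
        if row["allele_id"] in seen_alleles:
            continue
        seen_alleles.add(row["allele_id"])
        counts[row["variant_type"]] += 1
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


# Invented records, for illustration only.
demo_rows = [
    {"allele_id": "1", "variant_type": "single nucleotide variant", "clinical_significance": "Pathogenic"},
    {"allele_id": "1", "variant_type": "single nucleotide variant", "clinical_significance": "Pathogenic"},
    {"allele_id": "2", "variant_type": "single nucleotide variant", "clinical_significance": "Pathogenic"},
    {"allele_id": "3", "variant_type": "Deletion", "clinical_significance": "Pathogenic"},
    {"allele_id": "4", "variant_type": "Duplication", "clinical_significance": "Pathogenic"},
    {"allele_id": "5", "variant_type": "Insertion", "clinical_significance": "Benign"},
]
fractions = pathogenic_type_fractions(demo_rows)
```

In the invented data, the duplicate record for allele 1 and the benign record are both excluded before the fractions are computed.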
This invention was made with government support under HL137167, HL131471, HL147367, and HL153940 awarded by The National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US22/20392 | 3/15/2022 | WO | |

Number | Date | Country | |
---|---|---|---|
63165487 | Mar 2021 | US | |