SYNTHETIC RNAS AND METHODS OF USE

REFERENCE TO A “SEQUENCE LISTING,” A TABLE, OR A COMPUTER PROGRAM LISTING APPENDIX SUBMITTED AS AN ASCII FILE

The Sequence Listing written in file PAT057679-WO-PCT_SL.TXT, created Oct. 29, 2018, 29,326 bytes in size, machine format IBM-PC, MS-Windows operating system, is hereby incorporated by reference.

FIELD OF THE INVENTION

The invention relates generally to a process of using an enzyme to synthesize nucleic acids, particularly to in vitro transcription, and, e.g., to the in vitro transcription of guide RNAs for use in Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) technologies.

BACKGROUND OF THE INVENTION

A CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) system is a combination of protein and ribonucleic acid (“RNA”) that can alter the genetic sequence of an organism. In their natural environments, CRISPR systems protect bacteria against infection by viruses. CRISPR systems are now being developed as powerful tools to modify specific deoxyribonucleic acid (DNA) sequences in the genomes of other organisms, from plants to animals.

A Type II CRISPR-Cas system comprises three components: (1) a CRISPR RNA (crRNA) molecule, which is also called a “guide sequence” in PCT patent publication WO 2014/093661 (The Broad Institute, Inc., Massachusetts Institute of Technology) and a “targeter-RNA” in WO 2013/176772 A1 (The Regents of the University of California, University of Vienna, Jennifer A. Doudna); (2) a trans-activating crRNA (tracrRNA), which is called an “activator-RNA” in WO 2013/176772 A1; and (3) a nuclease or other effector protein, for example, a protein called Cas9 (formerly CSN1). The crRNA and the tracrRNA can be joined as a single polynucleotide known as a single guide RNA (sgRNA). To alter a DNA molecule, a Type II CRISPR-Cas system achieves three interactions: (1) crRNA binding by specific base pairing to a specific sequence in the DNA of interest (target DNA); (2) crRNA binding by specific base pairing at another sequence to a tracrRNA; and (3) portions of the gRNA interacting with a Cas9 protein, which then cuts the target DNA at the specific site. These interactions are illustrated in FIG. 2 of Jennifer A. Doudna and Emmanuelle Charpentier, Science, 28 Nov. 2014, which shows a double-stranded target DNA sequence that is bound to a crRNA (as indicated by the vertical black lines showing nucleic acid base pairing). A different part of the crRNA is bound to a tracrRNA. The tracrRNA interacts with a Cas9 protein that cuts the target DNA in a site-specific manner. By linking a DNA-cutting enzyme to a specific site on the target DNA, the CRISPR-Cas9 system achieves specific, targeted manipulation of DNA.

Because of the power of CRISPR systems as biotechnological methods, use of CRISPR systems is expected to grow. A problem with this growth is that there is currently not a satisfactory method for large-scale production of high-quality sgRNA. Current solid-phase chemical synthesis methods are not expected to meet the demand, for several reasons described in the specification below.

Thus, there is a need in the biotechnological art for a method for large-scale production of high-quality RNA molecules, for example, mRNA fragments, interfering RNAs, RNA aptamers, and gRNAs such as, for example, sgRNA.

SUMMARY OF THE INVENTION

Provided herein is a DNA template (an IVT cassette) for making a ribonucleic acid (RNA) transcript having a length of about 20-200 bases, where the DNA template includes (a) a first deoxyribonucleic acid (DNA) sequence comprising an RNA transcription initiation site; (b) a polymerase promoter upstream from the RNA transcription initiation site; (c) a second DNA sequence encoding the RNA transcript having a length of about 20-200 bases disposed downstream of the RNA transcription initiation site; and (d) a linearization site downstream from the RNA transcription initiation site.

In some embodiments, the DNA template is part of a DNA plasmid.

In some embodiments, the polymerase promoter is selected from the group consisting of a T7 polymerase promoter, a T3 polymerase promoter, an SP6 polymerase promoter, a Syn5 polymerase promoter, and an E. coli RNA polymerase promoter.

In some embodiments, the linearization site is a restriction endonuclease site.

In some embodiments, the restriction endonuclease site is selected from the group consisting of DraI, BspQI, SapI and BbsI.

In some embodiments, the DNA template has been linearized.

In some embodiments, the DNA template further includes a ribozyme sequence, e.g., downstream from the RNA transcription initiation site and upstream of the linearization site.

In some embodiments, the ribozyme sequence is selected from the group consisting of hammerhead, hairpin, hepatitis delta virus and Varkud satellite ribozyme.

In some embodiments, the DNA template further includes a T7 terminator sequence, e.g., downstream from the RNA transcription initiation site and upstream of the linearization site.

In some embodiments, the DNA template further includes a promoter enhancing sequence upstream from the RNA transcription initiation site.

In some embodiments, the RNA transcript having a length of about 20-200 bases comprises a single guide RNA (sgRNA) sequence.

In some embodiments, the sgRNA sequence is about 50 bases to 150 bases in length.

Also provided herein is a double stranded DNA (dsDNA) template for making a ribonucleic acid (RNA) transcript having a length of about 20-200 bases, where the dsDNA template includes (a) a first DNA sequence comprising an RNA transcription initiation site; (b) a polymerase promoter upstream from the RNA transcription initiation site; (c) a second DNA sequence encoding the RNA transcript having a length of about 20-200 bases disposed downstream of the RNA transcription initiation site; and (d) one or more modified nucleotides at the 5′ end of the antisense strand of the dsDNA template.

In some embodiments, the dsDNA template includes a transcriptional enhancer sequence upstream of the polymerase promoter.

In some embodiments, the modified nucleotide comprises 2′-O-alkyl modification.

In some embodiments, the modified nucleotide is 2′-O-methyl modified nucleotide or 2′-O-(2-methoxyethyl) modified nucleotide.

In some embodiments, the linearization site is a restriction endonuclease site.

In some embodiments, the restriction endonuclease site is selected from the group consisting of DraI, BspQI, SapI and BbsI.

In some embodiments, the RNA transcript having a length of about 20-200 bases comprises a sgRNA sequence.

In some embodiments, the sgRNA sequence is about 50 bases to 150 bases in length.

Further provided herein is a partially single stranded DNA (ssDNA) template for making a ribonucleic acid (RNA) transcript having a length of about 20-200 bases, where the ssDNA template includes (a) a first DNA sequence comprising an RNA transcription initiation site; (b) a polymerase promoter upstream from the RNA transcription initiation site; (c) a second DNA sequence encoding the RNA transcript having a length of about 20-200 bases disposed downstream of the RNA transcription initiation site; and (d) one or more modified nucleotides at the 5′ end of the antisense strand of the dsDNA template.

In some embodiments, the partially ssDNA template includes a transcriptional enhancer sequence upstream of the polymerase promoter.

In some embodiments, the modified nucleotide comprises 2′-O-alkyl modification.

In some embodiments, the modified nucleotide is 2′-O-methyl modified nucleotide or 2′-O-(2-methoxyethyl) modified nucleotide.

In some embodiments, the single stranded DNA is complementary to all or a portion of the polymerase promoter.

In some embodiments, the RNA transcript having a length of about 20-200 bases comprises a sgRNA sequence.

In some embodiments, the sgRNA sequence is about 50 bases to 150 bases in length.

Also provided herein is a method of making a ribonucleic acid (RNA) having a length of about 20-200 bases by in vitro transcription (IVT), including the steps of (a) obtaining a DNA template described herein, and (b) making the RNA transcript by in vitro transcription.

In some embodiments, the method includes the step of amplifying the DNA template using PCR.

In some embodiments, the method further includes the step of purifying the produced RNA transcript by reverse-phase chromatography.

In some embodiments, the method further includes the step of testing the purified produced RNA transcript for the presence of immune stimulating moieties by an immunogenicity assay.

In some embodiments, the produced RNA transcript is substantially free of any immune stimulating moieties.

In some embodiments, the produced RNA transcript is substantially free of n+x variants (e.g., where x=1).

In some embodiments, the produced RNA transcript is substantially free of n−x variants (e.g., where x=1).

In some embodiments, the RNA transcript comprises a sgRNA.

In some embodiments, the sgRNA is about 50 bases to 150 bases in length.

Also provided herein is a composition including a ribonucleic acid (RNA) transcript having a length of about 20-200 bases, made by the process described herein, where (a) the composition comprising the RNA transcript is substantially free of immune stimulating moieties, and/or (b) the composition is substantially free of RNA transcripts having n−1 variants and/or n+1 variants.

In some embodiments, the RNA comprises pseudouridine (ψ), or 5-methylcytidine (m⁵C), or both ψ and m⁵C.

In some embodiments, the RNA transcript in the composition is about 50 bases to 150 bases in length.

In some embodiments, the RNA transcript is dephosphorylated or capped at the 5′ end, at the 3′ end, or at the 5′ and 3′ ends.

In some embodiments, the RNA transcript comprises a sgRNA transcript.

Also provided herein is a pharmaceutical composition, including the composition described herein, and a pharmaceutically acceptable carrier.

Further provided herein is a composition including an IVT-made polynucleotide having a length of about 20-200 bases, where the composition is substantially free of immune stimulating moieties and/or is substantially free of n−1 or n+1 variants.

In some embodiments, the IVT-made polynucleotide comprises pseudouridine (ψ), or 5-methylcytidine (m⁵C), or both ψ and m⁵C.

In some embodiments, the IVT-made polynucleotide is about 50 bases to 150 bases in length.

In some embodiments, the IVT-made polynucleotide is dephosphorylated or capped at the 5′ end, at the 3′ end, or at the 5′ and 3′ ends.

In some embodiments, the IVT-made polynucleotide is a sgRNA sequence.

In some embodiments, the sgRNA sequence is about 50 bases to 150 bases in length.

Also included herein is a cell comprising a composition or a pharmaceutical composition described herein.

In some embodiments, the cell further includes an RNA-guided DNA endonuclease enzyme.

Also provided herein is a method of altering gene expression in a cell, the method including introducing into the cell a composition or a pharmaceutical composition described herein.

In some embodiments, the method further includes introducing to the cell an RNA-guided DNA endonuclease enzyme.

In some embodiments, the RNA-guided DNA endonuclease enzyme is Cas9 or Cpf1 or a Class II CRISPR endonuclease or a variant thereof.

In some embodiments, the cell is an animal cell.

In some embodiments, the cell is a mammalian, primate, or human cell.

In some embodiments, the cell is a hematopoietic stem or progenitor cell (HSPC).

Also provided herein is a cell, altered by the method described herein.

Also provided herein is a cell, obtainable by the method described herein.

Also provided herein is the composition or the pharmaceutical composition described herein for use in altering gene expression in a cell.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic representation of one design of a DNA template for IVT production of sgRNA. The sgRNA sequence is shown as comprising crRNA and optionally tracrRNA elements.

FIG. 2 is a schematic drawing of a plasmid-based template for making a sgRNA.

FIG. 3 is an image of an agarose gel showing electrophoresis of linearized plasmid DNA template and circular plasmid DNA template. The left lane is a molecular weight ladder. The middle lane (1) shows linearized DNA. The right lane (2) shows circular DNA.

FIG. 4 shows a PCR approach to generate a dsDNA template with modified ends for IVT production of sgRNA.

FIG. 5 shows a PCR approach to generate a partially ssDNA template with modified ends for IVT production of sgRNA.

FIG. 6 shows a comparison of in vitro transcribed RNA using either natural or chemically modified nucleotides in the sgRNA. Incorporation of pseudouridine (ψ), or a combination of pseudouridine (ψ) and 5-methylcytidine (m⁵C), into the in vitro sgRNA transcript does not affect activity of sgRNA in an in vitro Cas9 assay.

FIG. 7 is a capillary electrophoresis of an in vitro RNA transcript. The left lane is a molecular weight ladder. The right lane (1) shows an in vitro transcript of sgRNA.

FIG. 8 is an image of a gel electrophoresis assay showing the homogeneity of sgRNAs produced by in vitro transcription and by solid-phase chemical synthesis by commercial vendors.

FIG. 9A shows a 100 mer sgRNA produced by in vitro transcription (IVT) from PCR template and measured by LC-MS. The figure shows no n+x entities.

FIG. 9B shows a 100 mer sgRNA produced by in vitro transcription (IVT) from PCR template and measured by LC-MS. The figure shows minor n−x (“N minus”) and n+x (“N plus”) entities.

FIG. 10 shows a 100 mer sgRNA produced by solid-phase chemical synthesis performed by a commercial vendor and measured by LC-MS. The figure shows both n+x entities and n−1 entities, as well as side-products resulting from incomplete deprotection of the chemically synthesized sgRNA product.

FIG. 11 is a gel electrophoresis showing the results of an in vitro Cas9 assay. The figure shows that sgRNA produced by in vitro transcription has comparable activity to sgRNA produced by solid-phase chemical synthesis.

FIG. 12 is a gel-electrophoresis analysis of sgRNA1 and sgRNA2 PCR templates.

FIG. 13A is an overlaid comparison of UV260 nm chromatograms of IVT product and chemical synthesis product.

FIG. 13B is a UV260 nm chromatogram of IVT product.

FIG. 13C is a UV260 nm chromatogram of chemical synthesis product.

FIG. 14 is a FACS result of a series of transfected cells. MB-CD34 and HSC cells were electroporated with respective sgRNA and Cas9 ribonucleoprotein (RNP) and were later harvested and stained with B2M-FITC antibody. FACS analysis was then conducted. A comparison of the Cas9 activity complexed with either chemically synthesized sgRNA3 or IVT-derived sgRNA3 is shown. IVT-derived sgRNA3 was also compared as 5′ triphosphate or 5′ hydroxyl. The results indicated that all sgRNAs prepared via IVT worked either equally well or better than the one that was chemically synthesized.

DETAILED DESCRIPTION OF THE INVENTION

Each of the patents, patent publications, and patent applications, and all documents cited herein are hereby incorporated herein by reference, and can be used in the practice of the invention.

The recitation of a listing of elements in any definition of a variable herein includes definitions of that variable as any single element, combination or sub-combination of listed elements. The recitation of an embodiment herein includes that embodiment as any single embodiment or in combination with any other embodiments described herein.

Definitions

Provided below are definitions of some of the terms. Additional definitions are set forth throughout the specification. Unless otherwise defined herein, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the art.

“5-methylcytidine” (m⁵C) is a modified nucleoside derived from 5-methylcytosine. 5-Methylcytosine is a methylated form of the DNA base cytosine that may be involved in the regulation of gene transcription. See, e.g., WO 2013/052523.

“About” means approximately the value stated. The term “about” reflects the inherent uncertainty in any scientific measurement, i.e., repeated measurements of the same property will not yield exactly the same result due to the limitations of accuracy and precision associated with measurement and testing techniques.

“Analogs” include polynucleotide variants which differ by one or more modifications, e.g., substitutions, additions or deletions of nucleotide residues that still maintain one or more of the properties of the parent or starting polynucleotide.

The term “alter,” “altering,” “alteration of” or “altered” gene expression used herein refers to any action or process that is capable of modulating (interchangeably used with “altering,” “regulating,” “modifying,” “controlling” and “changing”) transcription and/or translation of a sequence of interest (e.g. a gene). Therefore, in one example, the alteration of gene expression includes any transcriptional regulation such as transcriptional activation (interchangeably used with “promotion,” “enhancement,” “increase” or “upregulation” of transcription) and transcriptional repression (interchangeably used with “reduction,” “decrease,” “inhibition” or “suppression” of transcription). In another example, the alteration of gene expression includes translational activation (interchangeably used with “promotion,” “enhancement,” “increase” or “upregulation” of translation) and translational repression (interchangeably used with “reduction,” “decrease,” “inhibition” or “suppression” of translation). In embodiments, the alteration of gene expression includes edition of nucleic acid sequence in genomic DNA. Thus, in embodiments the edition of nucleic acid sequence includes genome edition. In embodiments, the edition of nucleic acid sequence includes editing the sequence of non-genomic DNA or RNA (e.g. mRNA). In embodiments, the edition of nucleic acid sequence is done by mutating and/or deleting one or more nucleic acids from the sequence of interest (e.g. a genomic DNA sequence, non-genomic DNA sequence or RNA sequence), or inserting additional nucleic acid(s) into the sequence of interest.

The term “genome edition” or “editing genome” used herein refers to alteration of DNA sequence in a genome. The alteration of genome can be done by deletion of part of genomic DNA sequence, insertion of an additional DNA sequence into the genome and/or replacement of part of genome with a different DNA sequence. In embodiments, the edition of genome is permanent such that a daughter cell derived from the original cell that has the edited genome will have the same, altered (or modified) genome.

“Cas” refers to “CRISPR-associated” genes and proteins. CRISPR-Cas systems can be divided into two classes, Class 1 and Class 2, according to the configuration of their effector modules. CRISPR systems that may be used vary greatly. These systems will generally have the functional activities of being able to form a complex having a protein and a gRNA sequence, where the complex recognizes a second nucleic acid. CRISPR systems can be a type I, a type II, or a type III system. Non-limiting examples of suitable CRISPR proteins include Cas3, Cas4, Cas5, Cas5e (or CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9, Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (or CasA), Cse2 (or CasB), Cse3 (or CasE), Cse4 (or CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csz1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966.

“Cas9” molecule refers to a protein that can interact with a sgRNA molecule (e.g., sequence of a domain of a tracr) and, in concert with the sgRNA molecule, localize (“target” or “home”) to a site that comprises a target sequence and PAM sequence. Cas9 molecules of, derived from, or based on the Cas9 proteins of a variety of species can be used in the methods and compositions described in this specification. A “CRISPR associated protein 9,” “Cas9,” “Csn1” or “Cas9 protein” as referred to herein includes any of the recombinant or naturally-occurring forms of the Cas9 endonuclease or variants or homologs thereof that maintain Cas9 endonuclease enzyme activity (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to Cas9). In some embodiments, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring Cas9 protein. In embodiments, the Cas9 protein is substantially identical to the protein identified by the UniProt reference number Q99ZW2 or a variant or homolog having substantial identity thereto. Cas9 refers to the protein also known in the art as “nickase”. In embodiments, Cas9 is an RNA-guided DNA endonuclease enzyme that binds a CRISPR (clustered regularly interspaced short palindromic repeats) nucleic acid sequence. In embodiments, the CRISPR nucleic acid sequence is a prokaryotic nucleic acid sequence. In embodiments, the Cas9 nuclease from Streptococcus pyogenes is targeted to genomic DNA by a synthetic guide RNA consisting of a 20-nt guide sequence and a scaffold. The guide sequence base-pairs with the DNA target, directly upstream of a requisite 5′-NGG protospacer adjacent motif (PAM), and Cas9 mediates a double-stranded break (DSB) about 3-base pair upstream of the PAM. 
In embodiments, the CRISPR nuclease from Staphylococcus aureus is targeted to genomic DNA by a synthetic guide RNA consisting of a 21-23-nt guide sequence and a scaffold. The guide sequence base-pairs with the DNA target, directly upstream of a requisite 5′-NNGRRT protospacer adjacent motif (PAM), and Cas9 mediates a double-stranded break (DSB) about 3-base pair upstream of the PAM.
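The targeting rule described above (a guide sequence lying directly upstream of a PAM, with the break about 3 base pairs upstream of the PAM) can be illustrated with a minimal Python sketch. The function name `find_targets`, the example sequences, and the restriction to a single strand are assumptions made for illustration only; they are not part of the disclosed methods.

```python
import re

# IUPAC codes needed for the two PAMs quoted in the text (NGG and NNGRRT).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}


def find_targets(dna, pam="NGG", guide_len=20):
    """Return (guide, pam, cut_index) for each candidate site on the given strand.

    cut_index is the 0-based position of the first base after the predicted
    break, taken here as exactly 3 bases upstream of the PAM (an approximation
    of the "about 3 base pairs" stated in the text).
    """
    dna = dna.upper()
    pam_regex = "".join(IUPAC[base] for base in pam)
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})({pam_regex}))")
    hits = []
    for match in pattern.finditer(dna):
        pam_start = match.start() + guide_len
        hits.append((match.group(1), match.group(2), pam_start - 3))
    return hits
```

For example, `find_targets(seq)` scans for 20-nt guides followed by 5′-NGG, while `find_targets(seq, pam="NNGRRT", guide_len=21)` scans for 21-nt guides followed by 5′-NNGRRT. A practical tool would also scan the reverse complement strand, which this sketch omits.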

The term “Cas9 variant” refers to proteins that have at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a functional portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to wild-type Cas9 protein and have one or more mutations that increase its binding specificity to PAM compared to wild-type Cas9 protein.

“Class 2” CRISPR systems use a large single-component Cas protein in conjunction with crRNAs to mediate interference. A class 2 CRISPR-Cas system can use Cas9. A class 2 CRISPR-Cas system can alternatively use Cpf1. See, e.g., Zetsche et al. (2015) Cell 163: 759-771. The term “Class II CRISPR endonuclease” refers to endonucleases that have similar endonuclease activity as Cas9 and participate in a Class II CRISPR system. An example Class II CRISPR system is the type II CRISPR locus from Streptococcus pyogenes SF370, which contains a cluster of four genes Cas9, Cas1, Cas2, and Csn1, as well as two non-coding RNA elements, tracrRNA and a characteristic array of repetitive sequences (direct repeats) interspaced by short stretches of non-repetitive sequences (spacers, about 30 bp each).

“Cpf1” is an RNA-guided endonuclease of a class II CRISPR/Cas system found in Prevotella and Francisella bacteria. “CRISPR/Cpf1” is a DNA-editing technology analogous to the CRISPR/Cas9 system. Cpf1 is a smaller and simpler endonuclease than Cas9, overcoming some of the CRISPR/Cas9 system limitations. The term Cpf1 includes all orthologs, and variants that can be used in a CRISPR system. A “Cpf1” or “Cpf1 protein” as referred to herein includes any of the recombinant or naturally-occurring forms of the Cpf1 (Clustered Regularly Interspaced Short Palindromic Repeats from Prevotella and Francisella 1 or CRISPR/Cpf1) endonuclease or variants or homologs thereof that maintain Cpf1 endonuclease enzyme activity (e.g. within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% activity compared to Cpf1). In some embodiments, the variants or homologs have at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity across the whole sequence or a portion of the sequence (e.g. a 50, 100, 150 or 200 continuous amino acid portion) compared to a naturally occurring Cpf1 protein.

“CRISPR system” or “CRISPR-Cas system” comprises the transcripts and other elements involved in the activity of CRISPR-associated (Cas) genes, including sequences encoding a Cas gene or the Cas protein itself or both, a tracrRNA, a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system); RNAs (e.g., RNAs to guide Cas9, e.g. crRNA and tracrRNA or a single guide RNA (sgRNA) (chimeric RNA)); or other sequences and transcripts from a CRISPR locus. See, WO 2014/093622 A2 (The Broad Institute, Inc., Massachusetts Institute Of Technology). In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). One of skill in the biotechnological art can identify direct repeats in silico by searching for repetitive motifs that fulfill any or all of the following criteria: (1) found in a 2 kb window of genomic sequence flanking the type II CRISPR locus; (2) span from 20 to 50 bp; and (3) interspaced by 20 to 50 bp. Two of these criteria can be used, e.g., 1 and 2, 2 and 3, or 1 and 3. Alternatively, all three criteria can be used. It might be preferred in a CRISPR complex that the tracr sequence has one or more hairpins and is 30 or more nucleotides in length, 40 or more nucleotides in length, or 50 or more nucleotides in length; the guide sequence is between 10 and 30 nucleotides in length; and the CRISPR/Cas enzyme is a Type II Cas9 enzyme.
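The in silico search for direct repeats described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the caller supplies the 2 kb flanking window (criterion 1), only exact repeats are considered, and the function name `find_direct_repeats` and its parameters are hypothetical. Real CRISPR repeat finders tolerate mismatches and are considerably more elaborate.

```python
def find_direct_repeats(window, min_len=20, max_len=50, min_gap=20, max_gap=50):
    """Return (repeat, first_start, second_start) for exact repeats in `window`.

    A hit satisfies criterion 2 (repeat spans min_len..max_len bp) and
    criterion 3 (the two copies are interspaced by min_gap..max_gap bp).
    Longer repeats are searched first, and shorter repeats lying entirely
    inside an already reported first copy are skipped.
    """
    window = window.upper()
    hits = []
    covered = []  # (start, end) intervals of reported first copies
    for length in range(max_len, min_len - 1, -1):
        for i in range(len(window) - length + 1):
            if any(s <= i and i + length <= e for s, e in covered):
                continue
            unit = window[i:i + length]
            for gap in range(min_gap, max_gap + 1):
                j = i + length + gap
                if window[j:j + length] == unit:
                    hits.append((unit, i, j))
                    covered.append((i, i + length))
                    break
    return hits
```

Applied to a window containing two copies of a 36 bp repeat separated by a 30 bp spacer, the function returns a single hit giving the repeat and the start positions of both copies.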

“CRISPR” refers to a set of Clustered Regularly Interspaced Short Palindromic repeats, or a system comprising such a set of repeats. Naturally occurring CRISPR systems confer resistance to foreign genetic elements, e.g., plasmids and phages. Naturally occurring CRISPR systems provide a form of acquired immunity. The CRISPR system is used in gene editing (silencing, enhancing or changing specific genes) in eukaryotes, e.g., mice, primates and humans, by, e.g., introducing into the eukaryotic cell one or more vectors encoding a specifically engineered guide RNA and one or more appropriate RNA-guided nucleases, e.g., Cas proteins. See, Wiedenheft et al. (2012) Nature 482: 331-8. In some prokaryotes, Cse (Cas subtype, Escherichia coli) proteins (e.g., CasA) form a functional complex, Cascade, which processes CRISPR RNA transcripts into spacer-repeat units that Cascade retains. Brouns et al. (2008) Science 321: 960-964. In other prokaryotes, Cas6 processes the CRISPR transcript. In Escherichia coli, CRISPR-based phage inactivation requires Cascade and Cas3, but not Cas1 or Cas2. In Pyrococcus furiosus and other prokaryotes, Cmr (Cas RAMP module) proteins form a functional complex with small CRISPR RNAs that recognizes and cleaves complementary target RNAs. A simpler CRISPR system relies on the protein Cas9, which is a nuclease with two active cutting sites, one for each strand of the double helix. Combining Cas9 and modified CRISPR locus RNA has been used in a system for gene editing. Pennisi (2013) Science 341: 833-836.

“Downstream” refers to the 5′ to 3′ direction in which RNA transcription takes place, so downstream is toward the 3′ end of an RNA molecule.

“E. coli RNA polymerase” is an RNA polymerase. The core enzyme consists of 5 subunits designated α, α, β′, β, and ω. The core enzyme is free of sigma factor and does not recognize any specific bacterial or phage DNA promoters, and so retains the ability to transcribe RNA from nonspecific initiation sequences. The holoenzyme is the core enzyme saturated with the addition of a sigma factor, which allows the enzyme to initiate RNA synthesis from specific bacterial and phage promoters.

“HDV ribozyme” is a self-cleaving RNA sequence derived from the hepatitis delta virus, having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO. 5.

“In vitro transcription (IVT) cassette” includes an RNA polymerase promoter upstream from a transcription initiation nucleotide of an RNA sequence having a length of about 20-200 bases. The IVT cassette can include one or more of a linearization sequence, a ribozyme sequence, an RNA polymerase termination sequence, and one or more modified nucleotides.
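The ordering of elements in an IVT cassette can be illustrated with a short Python sketch. The class `IVTCassette`, its method names, and the example sequences are assumptions made for illustration: the promoter shown is a commonly cited T7 promoter consensus rather than a sequence from the Sequence Listing, and the 20-200 base check treats "about" as an exact bound for simplicity.

```python
from dataclasses import dataclass

# Assumption: commonly cited T7 promoter consensus (bases -17 to -1);
# illustrative only, not taken from the Sequence Listing.
T7_PROMOTER = "TAATACGACTCACTATA"
DRAI_SITE = "TTTAAA"  # DraI recognition sequence


@dataclass
class IVTCassette:
    promoter: str
    coding: str  # sense-strand DNA; first base is the initiation nucleotide
    linearization_site: str = ""

    def sequence(self) -> str:
        """Promoter upstream, coding region next, linearization site downstream."""
        return self.promoter + self.coding + self.linearization_site

    def transcript(self) -> str:
        """RNA encoded by the coding region (T replaced by U)."""
        return self.coding.upper().replace("T", "U")

    def problems(self) -> list:
        """List simple design issues; an empty list means none were found."""
        issues = []
        if not 20 <= len(self.coding) <= 200:
            issues.append("coding region outside 20-200 bases")
        if not self.coding.upper().startswith("G"):
            issues.append("initiating nucleotide is not G")
        return issues
```

For instance, `IVTCassette(T7_PROMOTER, "G" + "ACGT" * 5, DRAI_SITE).problems()` returns an empty list, whereas a 12-base coding region starting with adenosine is flagged on both checks. The check for an initiating G reflects the preference of an unmodified T7 promoter only.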

“In vitro transcription” (IVT) is RNA transcription in vitro. Many kits for in vitro transcription are commercially available. New England Biolabs (Beverly, Mass., USA) sells the HiScribe™ T7 High Yield RNA Synthesis Kit.

“Initiation site” is the initiation site for RNA transcription. The initiation nucleotide can be selected to provide transcription with a selected RNA polymerase. For example, T7 polymerase transcribes best when the initiating nucleotide is guanosine. Transcription from a modified T7 polymerase promoter can also begin with adenosine.

“Immune stimulating moiety” is a substance that potentiates and/or modulates the immune responses to an antigen to improve them.

“Linearization site” or “linearization sequence” can be recognition sites for restriction endonucleases (e.g. BspQI, DraI, SapI, BbsI, etc.).

“n+x product” (or “n+x mutation,” “n+x variant,” “n+x fragment”), when referring to an RNA transcript sample, describes the difference between the expected and the actual number of ribonucleotides in an RNA transcript. The “n” is the number of nucleotides in the transcript as expected from the DNA-coding region, while “x” is the additional number of non-template nucleotides in the actual, measured RNA transcript.

“n−x product” (or “n−x mutation,” “n−x variant,” “n−x fragment”), when referring to an RNA transcript sample, describes the difference between the expected and the actual number of ribonucleotides in an RNA transcript. The “n” is the number of nucleotides in the transcript as expected from the DNA-coding region, while “x” is the number of nucleotides missing from the actual, measured RNA transcript.
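The n+x and n−x nomenclature can be illustrated with a short Python sketch that labels transcripts by the difference between observed and expected length. The function names `classify_length` and `summarize` are hypothetical, and the sketch assumes that transcript lengths have already been determined (for example, from mass measurements); it does not model the analytical method itself.

```python
from collections import Counter


def classify_length(expected_len, observed_len):
    """Label a transcript as "n", "n+x", or "n-x" from its length."""
    x = observed_len - expected_len
    if x == 0:
        return "n"
    return f"n+{x}" if x > 0 else f"n-{-x}"


def summarize(expected_len, observed_lengths):
    """Return the fraction of transcripts in each length class."""
    counts = Counter(classify_length(expected_len, n) for n in observed_lengths)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```

For an expected 100 mer, observed lengths of 100, 100, 101, and 99 are summarized as one half n, one quarter n+1, and one quarter n-1.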

“Nucleic acid” refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single-, double- or multiple-stranded form, or complements thereof. The term “polynucleotide” refers to a linear sequence of nucleotides. The term “nucleotide” typically refers to a single unit of a polynucleotide, i.e., a monomer. Nucleotides can be ribonucleotides, deoxyribonucleotides, or modified versions thereof. Examples of polynucleotides contemplated herein include single and double stranded DNA, single and double stranded RNA (including siRNA), and hybrid molecules having mixtures of single and double stranded DNA and RNA. Nucleic acids can be linear or branched. For example, nucleic acids can be a linear chain of nucleotides or the nucleic acids can be branched, e.g., such that the nucleic acids comprise one or more arms or branches of nucleotides. Optionally, the branched nucleic acids are repetitively branched to form higher ordered structures such as dendrimers and the like. The terms also encompass nucleic acids containing known nucleotide analogues or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogues include, without limitation, phosphodiester derivatives including, e.g., phosphoramidate, phosphorodiamidate, phosphorothioate (also known as phosphothioate), phosphorodithioate, phosphonocarboxylic acids, phosphonocarboxylates, phosphonoacetic acid, phosphonoformic acid, methyl phosphonate, boron phosphonate, or O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press); and peptide nucleic acid backbones and linkages. Other analogue nucleic acids include those with positive backbones; non-ionic backbones, modified sugars, and non-ribose backbones (e.g. 
phosphorodiamidate morpholino oligos or locked nucleic acids (LNA)), including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ACS Symposium Series 580, Carbohydrate Modifications in Antisense Research, Sanghvi & Cook, eds. Nucleic acids containing one or more carbocyclic sugars are also included within one definition of nucleic acids. Modifications of the ribose-phosphate backbone may be done for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments or as probes on a biochip. Mixtures of naturally occurring nucleic acids and analogues can be made; alternatively, mixtures of different nucleic acid analogues, and mixtures of naturally occurring nucleic acids and analogues may be made. In embodiments, the internucleotide linkages in DNA are phosphodiester, phosphodiester derivatives, or a combination of both. In some embodiments, modified nucleotides or nucleosides include chemical modifications such as a chemical substitution at a sugar position, a phosphate position, and/or a base position of the nucleic acid including, for example, incorporation of a modified nucleotide, incorporation of a capping moiety (e.g., 3′ capping), conjugation to a high molecular weight, non-immunogenic compound (e.g., polyethylene glycol (PEG)), conjugation to a lipophilic compound, or substitutions in the phosphate backbone. Base modifications may include 5-position pyrimidine modifications, modifications at exocyclic amines, substitution of 4-thiouridine, substitution of 5-bromo- or 5-iodo-uracil, and backbone modifications. Sugar modifications may include 2′-amine nucleotides (2′-NH₂), 2′-fluoro nucleotides (2′-F), and 2′-O-alkyl nucleotides (e.g., 2′-O-methyl (2′-OMe) nucleotides or 2′-O-(2-methoxyethyl) nucleotides). 2′-substituted nucleosides include 2′-fluoro, 2′-deoxy, 2′-O-methyl, 2′-O-β-methoxyethyl, 2′-O-allyl ribonucleosides, 2′-amino, locked nucleic acid (LNA) monomers and the like. A wide range of nucleotide, nucleoside, base and phosphate modifications are known to those of ordinary skill in the art, e.g., as described in Eaton et al., Bioorganic & Medicinal Chemistry, Vol. 5, No. 6, pp. 1087-1096, 1997.

The term “nucleotide” typically refers to a compound containing a nucleoside or a nucleoside analogue and at least one phosphate group or a modified phosphate group linked to it by a covalent bond. Exemplary covalent bonds include, without limitation, an ester bond between the 3′, 2′ or 5′ hydroxyl group of a nucleoside and a phosphate group.

The term “nucleoside” refers to a compound containing a sugar part and a nucleobase, e.g., a pyrimidine or purine base. Exemplary sugars include, without limitation, ribose, 2-deoxyribose, arabinose and the like. Exemplary nucleobases include, without limitation, thymine, uracil, cytosine, adenine, and guanine.

“Partially ssDNA oligo template” includes a dsDNA portion and a single-stranded portion. The double-stranded portion can encode all or a portion of the sgRNA. The single-stranded portion can be complementary to the sequence encoding all or a portion of an RNA polymerase promoter enhancing sequence and/or an RNA polymerase promoter.

“Plasmid based template” consists of an IVT cassette inserted into an appropriate vector for amplification of plasmid DNA.

“Polynucleotide variant” refers to molecules that differ in their nucleotide sequence from a native or reference sequence, which can possess substitutions, deletions, or insertions at certain positions within the encoded amino acid sequence, as shown in WO 2015/006747 A2.

“Polynucleotide” includes any compound or substance that comprises a polymer of nucleotides, as shown in WO 2015/006747 A2.

“Pseudouridine” (P) is an isomer of the nucleoside uridine in which the uracil is attached via a carbon-carbon instead of a nitrogen-carbon glycosidic bond. See, WO 2013/052523 A1.

“Purity” or “purified” refers to the level of contaminants (undesired product, e.g., residual DNA, n+x product, n−x product) in the final product/composition prepared according to the methods or processes described herein as being less than 5% by weight, less than 4% by weight, less than 3% by weight, less than 2% by weight, less than 1% by weight, less than 0.5% by weight, less than 0.1% by weight, less than 0.05% by weight or less than 0.01% by weight. Purity can be measured by any appropriate method known in the art. In some embodiments, the purity is determined from chromatograms recorded at UV 260 nm.

“Ribozyme” and “ribozyme sequence” refer to a self-cleaving RNA sequence that is inserted after the end of the RNA sequence. Upon transcription, the ribozyme sequence cleaves itself off, leaving a precise end to the RNA. This method is particularly useful if no unique restriction sites are available for linearization. One example of a ribozyme is a hepatitis delta (HDV) ribozyme of SEQ ID NO: 5.

“RNA polymerase promoter” can be, but is not limited to, a T7 promoter, a T3 promoter, a SP6 promoter, a promoter recognized by cyanophage Syn5 RNA polymerase, or a promoter recognized by E. coli RNA polymerase, as described in WO 2015/024017 A2. Those of skill in the biotechnological arts will know the nucleotide sequences of other RNA polymerase promoters.

The terms “guide RNA,” “guide RNA molecule,” “gRNA molecule” or “gRNA” are used interchangeably, and refer to a set of nucleic acid molecules that promote the specific directing of a RNA-guided nuclease or other effector molecule (typically in complex with the gRNA molecule) to a target sequence. In some embodiments, said directing is accomplished through hybridization of a portion of the gRNA to DNA (e.g., through the gRNA targeting domain), and by binding of a portion of the gRNA molecule to the RNA-guided nuclease or other effector molecule (e.g., through at least the gRNA tracr). In embodiments, a gRNA molecule consists of a single contiguous polynucleotide molecule, referred to herein as a “single guide RNA” or “sgRNA” and the like. In embodiments, sgRNA includes the crRNA sequence and optionally the tracrRNA sequence. In embodiments, sgRNA includes the crRNA sequence. In embodiments, sgRNA includes the crRNA sequence and the tracrRNA sequence. The term “targeting domain” as the term is used in connection with a gRNA, is the portion of the gRNA molecule that recognizes, e.g., is complementary to, a target sequence, e.g., a target sequence within the nucleic acid of a cell, e.g., within a gene. The term “crRNA” as the term is used in connection with a gRNA molecule, is a portion of the gRNA molecule that comprises a targeting domain and a region that interacts with a tracr to form a flagpole region. The term “flagpole” as used herein in connection with a gRNA molecule, refers to the portion of the gRNA where the crRNA and the tracr bind to, or hybridize to, one another.

In some embodiments, the degree of complementarity between a targeting domain and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). The term “complementary” as used in connection with nucleic acid, refers to the pairing of bases, A with T or U, and G with C. The term complementary refers to nucleic acid molecules that are completely complementary, that is, form A to T or U pairs and G to C pairs across the entire reference sequence, as well as molecules that are at least 80%, 85%, 90%, 95%, or 99% complementary.
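By way of illustration only, the degree of complementarity described above can be sketched for the simplest case, in which the targeting domain and the target sequence have equal length and are compared without gaps. The function name `percent_complementarity` and the example sequences are hypothetical and are not taken from the Sequence Listing; an actual determination may instead rely on any of the alignment algorithms named above.

```python
# Watson-Crick pairs as defined above: A with T or U, and G with C.
PAIRS = {("A", "T"), ("A", "U"), ("T", "A"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(targeting_domain: str, target: str) -> float:
    """Percent of positions that base-pair, assuming an ungapped comparison.

    Both sequences are written 5'->3'. Because the strands pair in
    antiparallel orientation, the target is read in reverse.
    """
    guide = targeting_domain.upper()
    strand = target.upper()
    if not guide or len(guide) != len(strand):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((g, t) in PAIRS for g, t in zip(guide, reversed(strand)))
    return 100.0 * paired / len(guide)


# Hypothetical 20-nucleotide targeting domain and a fully complementary DNA target.
example_guide = "GGAUCCAUGCAUUACGGAUC"
example_target = "GATCCGTAATGCATGGATCC"
example_percent = percent_complementarity(example_guide, example_target)
```

A returned value can then be compared against any of the thresholds recited above, e.g., 80%, 90% or 95%.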

In embodiments, the length of sgRNA sequence is 50-150 bases (e.g., 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, or 150 bases).

In embodiments, the length of sgRNA sequence is 50-120 bases (e.g., 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, or 120 bases).

In embodiments, the length of sgRNA sequence is 60-120 bases (e.g., 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, or 120 bases).

In one embodiment, the sgRNA sequence comprises a tracrRNA sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 5. In another embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 6. In another embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 7. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 33. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 34. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 35. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 36. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 37. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 38. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 39. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 40. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 41. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 42. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 43. 
In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 44. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 45. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 46. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 47. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 48. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 49. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 50. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 51.

In some embodiments, the sgRNA may comprise, from 5′ to 3′, disposed 3′ to the targeting domain:

a)

(SEQ ID NO: 52)

GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAAC

UUGAAAAAGUGGCACCGAGUCGGUGC;

b)

(SEQ ID NO: 53)

GUUUAAGAGCUAGAAAUAGCAAGUUUAAAUAAGGCUAGUCCGUUAUCAAC

UUGAAAAAGUGGCACCGAGUCGGUGC;

c)

(SEQ ID NO: 54)

GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUC

CGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC;

d)

(SEQ ID NO: 55)

GUUUAAGAGCUAUGCUGGAAACAGCAUAGCAAGUUUAAAUAAGGCUAGUC

CGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC;

- e) any of a) to d), above, further comprising, at the 3′ end, at least 1, 2, 3, 4, 5, 6 or 7 uracil (U) nucleotides, e.g., 1, 2, 3, 4, 5, 6, or 7 uracil (U) nucleotides;
- f) any of a) to d), above, further comprising, at the 3′ end, at least 1, 2, 3, 4, 5, 6 or 7 adenine (A) nucleotides, e.g., 1, 2, 3, 4, 5, 6, or 7 adenine (A) nucleotides; or
- g) any of a) to f), above, further comprising, at the 5′ end (e.g., at the 5′ terminus, e.g., 5′ to the targeting domain), at least 1, 2, 3, 4, 5, 6 or 7 adenine (A) nucleotides, e.g., 1, 2, 3, 4, 5, 6, or 7 adenine (A) nucleotides. In embodiments, any of a) to g) above is disposed directly 3′ to the targeting domain.
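As a non-limiting illustration of options a) through g), the following sketch concatenates a targeting domain, the sequence of SEQ ID NO: 52 as printed above, and optional terminal nucleotides. The function `assemble_sgrna` and its parameter names are hypothetical and are introduced here only for this example; the sketch performs string concatenation and does not model chemical modifications.

```python
# Sequence of option a) (SEQ ID NO: 52), copied from the text above.
SCAFFOLD_SEQ_ID_NO_52 = (
    "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAAC"
    "UUGAAAAAGUGGCACCGAGUCGGUGC"
)


def assemble_sgrna(
    targeting_domain: str,
    scaffold: str = SCAFFOLD_SEQ_ID_NO_52,
    tail_base: str = "U",
    tail_length: int = 0,
    five_prime_a: int = 0,
) -> str:
    """Return an sgRNA string written 5'->3'.

    tail_base / tail_length model options e) (U) and f) (A) at the 3' end;
    five_prime_a models option g), adenines placed 5' to the targeting domain.
    """
    domain = targeting_domain.upper()
    if not domain or set(domain) - set("ACGUN"):
        raise ValueError("targeting domain must use only A, C, G, U or N")
    if tail_base not in ("U", "A"):
        raise ValueError("tail_base must be 'U' or 'A'")
    if tail_length < 0 or five_prime_a < 0:
        raise ValueError("nucleotide counts must be non-negative")
    return "A" * five_prime_a + domain + scaffold + tail_base * tail_length


# Placeholder targeting domain of 20 unspecified residues with four 3' uracils.
example_sgrna = assemble_sgrna("N" * 20, tail_length=4)
```

With a 20-residue targeting domain and four 3′ uracil nucleotides, the assembled string has 100 residues.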

In an embodiment, a sgRNA comprises, e.g., consists of, from 5′ to 3′: [targeting domain]—

(SEQ ID NO: 56)

GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAAC

UUGAAAAAGUGGCACCGAGUCGGUGCUUUU.

In an embodiment, a sgRNA described herein comprises, e.g., consists of, from 5′ to 3′: [targeting domain]—

(SEQ ID NO: 57)

GUUUAAGAGCUAUGCUGGAAACAGCAUAGCAAGUUUAAAUAAGGCUAGUC

CGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU.

In embodiments, a sgRNA described herein comprises, e.g., consists of, a ribonucleic acid having the sequence:

(SEQ ID NO: 7)

NNNNNNNNNNNNNNNNNNNGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUA

AGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC,

where the N's refer to the residues of the targeting domain.

In an embodiment, a sgRNA described herein comprises, e.g., consists of:

(SEQ ID NO: 58)

NNNNNNNNNNNNNNNNNNNNGUUUUAGAGCUAGAAAUAGCAAGUUAAAAU

AAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUU

U,

where m indicates a base with 2′O-Methyl modification, * indicates a phosphorothioate bond, and N's indicate the residues of the targeting domain, e.g., as described herein, (optionally with an inverted abasic residue at the 5′ and/or 3′ terminus).

Other exemplary sgRNA molecules and their sequences can be found in WO2017115268 and WO2018142364, the contents of which are incorporated herein.

In some embodiments, a crRNA comprises, from 5′ to 3′, preferably disposed directly 3′ to the targeting domain:

(SEQ ID NO: 59)

a) GUUUUAGAGCUA;

(SEQ ID NO: 60)

b) GUUUAAGAGCUA;

(SEQ ID NO: 61)

c) GUUUUAGAGCUAUGCUG;

(SEQ ID NO: 62)

d) GUUUAAGAGCUAUGCUG;

(SEQ ID NO: 63)

e) GUUUUAGAGCUAUGCUGUUUUG;

(SEQ ID NO: 64)

f) GUUUAAGAGCUAUGCUGUUUUG;

or

(SEQ ID NO: 65)

g) GUUUUAGAGCUAUGCU.

In some embodiments, a tracr comprises, from 5′ to 3′:

a)

(SEQ ID NO: 66)

UAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACC

GAGUCGGUGC;

b)

(SEQ ID NO: 67)

UAGCAAGUUUAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACC

GAGUCGGUGC;

c)

(SEQ ID NO: 68)

CAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG

GCACCGAGUCGGUGC;

d)

(SEQ ID NO: 69)

CAGCAUAGCAAGUUUAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG

GCACCGAGUCGGUGC;

e)

(SEQ ID NO: 70)

AACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAG

UGGCACCGAGUCGGUGCUUUUUUU;

f)

(SEQ ID NO: 71)

AACAGCAUAGCAAGUUUAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAG

UGGCACCGAGUCGGUGCUUUUUUU;

g)

(SEQ ID NO: 72)

AACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAG

UGGCACCGAGUCGGUGC;

h)

(SEQ ID NO: 73)

GUUUAAGAGCUAUGCUGGAAACAGCAUAGCAAGUUUAAAUAAGGCUAGUC

CGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU;

i)

(SEQ ID NO: 74)

AGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGG

CACCGAGUCGGUGCUUU;

j)

(SEQ ID NO: 75)

GUUGGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUU

AUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUU;

- k) any of a) to j), above, further comprising, at the 3′ end, at least 1, 2, 3, 4, 5, 6 or 7 uracil (U) nucleotides, e.g., 1, 2, 3, 4, 5, 6, or 7 uracil (U) nucleotides;
- l) any of a) to j), above, further comprising, at the 3′ end, at least 1, 2, 3, 4, 5, 6 or 7 adenine (A) nucleotides, e.g., 1, 2, 3, 4, 5, 6, or 7 adenine (A) nucleotides; or
- m) any of a) to l), above, further comprising, at the 5′ end (e.g., at the 5′ terminus), at least 1, 2, 3, 4, 5, 6 or 7 adenine (A) nucleotides, e.g., 1, 2, 3, 4, 5, 6, or 7 adenine (A) nucleotides.

In an embodiment, the sequence of k), above, comprises the 3′ sequence UUUUUU, e.g., if a U6 promoter is used for transcription. In an embodiment, the sequence of k), above, comprises the 3′ sequence UUUU, e.g., if an H1 promoter is used for transcription. In an embodiment, the sequence of k), above, comprises variable numbers of 3′ U's depending, e.g., on the termination signal of the pol-III promoter used. In an embodiment, the sequence of k), above, comprises variable 3′ sequence derived from the DNA template if a T7 promoter is used. In an embodiment, the sequence of k), above, comprises variable 3′ sequence derived from the DNA template, e.g., if in vitro transcription is used to generate the RNA molecule. In an embodiment, the sequence of k), above, comprises variable 3′ sequence derived from the DNA template, e.g., if a pol-II promoter is used to drive transcription.

Other exemplary gRNA (crRNA and/or tracrRNA), sgRNA molecules and their sequences can be found in WO2017115268 and WO2018142364, the contents of which are incorporated herein.

“Sequence identity”. Percent identity of two amino acid sequences, or of two nucleic acid sequences, is defined as the percentage of amino acid residues or nucleotides in a candidate sequence that are identical with the amino acid residues or nucleotides in a reference polypeptide or nucleic acid sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent amino acid or nucleic acid sequence identity can be achieved in various conventional ways, for instance, using publicly available computer software including the GCG program package (Devereux et al., Nucleic Acids Research 12(1): 387, 1984), BLASTP, BLASTN, and FASTA (Altschul et al. J. Mol. Biol. 215: 403-410, 1990). The BLAST X program is publicly available from NCBI and other sources (BLAST Manual, Altschul et al. NCBI NLM NIH Bethesda, Md. 20894; Altschul et al. J. Mol. Biol. 215: 403-410, 1990). Skilled artisans can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared. Methods to determine identity and similarity are codified in publicly available computer programs.
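For illustration, percent identity as defined above can be sketched with a basic global (Needleman-Wunsch style) alignment. The scoring values (match +1, mismatch −1, gap −1), the function name `percent_identity`, and the choice of the candidate length as the denominator are assumptions made for this example; the programs named above apply their own scoring schemes and may report identity over the aligned length instead.

```python
def percent_identity(candidate: str, reference: str,
                     match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent of candidate residues identical to the reference after global alignment."""
    a, b = candidate.upper(), reference.upper()
    if not a or not b:
        raise ValueError("sequences must be non-empty")

    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back one optimal alignment and count identical aligned positions.
    i, j, identical = len(a), len(b), 0
    while i > 0 and j > 0:
        diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diagonal:
            if a[i - 1] == b[j - 1]:
                identical += 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap introduced in the reference
        else:
            j -= 1  # gap introduced in the candidate
    return 100.0 * identical / len(a)


# Compare the 12-residue sequences printed above as SEQ ID NO: 59 and SEQ ID NO: 60.
example_identity = percent_identity("GUUUUAGAGCUA", "GUUUAAGAGCUA")
meets_90_percent = example_identity >= 90.0
```

The two sequences differ at a single position, so eleven of the twelve candidate residues are identical under this sketch.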

“SP6 promoter” is a polynucleotide sequence for a SP6 RNA polymerase to begin transcription, preferably with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 12. Transcription initiates on the first nucleotide following the promoter sequence (typically guanosine).

A “surface coated” substrate is a substrate that is coated with a reagent that binds to a nonradiolabeled tagged probe. The substrate of the surface coated substrate can be magnetic beads. For example, Oligo dT magnetic beads are commercially available.

“Syn5 promoter” is a polynucleotide sequence for the marine cyanophage Syn5 RNA polymerase to begin transcription, preferably with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 13. See, US 2016/0369248 A1 (President and Fellows of Harvard College). See also, Zhu et al. (1 Feb. 2013) J. Biol. Chem. 288(5): 3545-3552.

“Solid-phase chemical synthesis” is a method in which molecules are bound, attached or adhered on a solid support, e.g., a bead, and synthesized step-by-step in a reactant solution; compared with normal synthesis in a liquid state, it is easier to remove excess reactant or byproduct from the product. In this method, building blocks are protected at all reactive functional groups. The two functional groups that are able to participate in the desired reaction between building blocks in the solution and on the bead can be controlled by the order of deprotection. Solid-phase chemical synthesis of relatively short fragments of nucleic acids with defined chemical structure (sequence) is useful in current laboratory practice because it provides rapid and inexpensive access to custom-made oligonucleotides of the desired sequence. See, Sanghvi (2011) Curr. Protoc. Nucleic Acid Chem. 46 (16): 4.1.1-4.1.22. Some companies providing commercial solid-phase synthesis include Axolabs (Kulmbach, Germany), Integrated DNA Technologies (IDT) (Coralville, Iowa, USA) and Biospring (Frankfurt, Germany).

As used herein, the term “substantially free” means that the undesired component (e.g., residual DNA, n+x product or n−x product, or immune stimulating moieties) is present in the composition described herein in an amount less than 5% by weight, less than 4% by weight, less than 3% by weight, less than 2% by weight, less than 1% by weight, less than 0.5% by weight, less than 0.1% by weight, less than 0.05% by weight, or less than 0.01% by weight.

“T3 RNA polymerase promoter” is a polynucleotide sequence for a T3 RNA polymerase to begin transcription, preferably with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO. 11. Transcription initiates on the first nucleotide following the promoter sequence (usually guanosine).

“T7 RNA polymerase promoter upstream enhancer sequence” is an enhancer polynucleotide sequence upstream from the T7 RNA polymerase promoter, which helps to increase the yield of RNA in an IVT reaction, preferably with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 6.

“T7 RNA polymerase promoter” is a polynucleotide sequence for a T7 RNA polymerase to begin transcription, preferably with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO. 1. Transcription initiates on the first nucleotide following the promoter sequence (typically guanosine).

“Target DNA” is the DNA of interest that comprises a nucleotide sequence (the target sequence) to which the crRNA binds by Watson-Crick base pairing.

“Target sequence” refers to a sequence to which a guide sequence (e.g., a gRNA targeting domain) is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. A target sequence can comprise any polynucleotide, such as DNA or RNA polynucleotides. A target sequence can be located in the nucleus or cytoplasm of a cell.

“tracrRNA” (trans-activating CRISPR RNA) is the portion of sgRNA that binds to Cas9. tracrRNA is called an “activator-RNA” in WO 2013/176772 A1. The portion of sgRNA that binds to Cas9 is constant.

“Transcription initiation nucleotide” is the first nucleotide from which transcription begins. A transcription initiation nucleotide could be A, T, C or G, depending on promoter and RNA polymerase chosen for specific transcript.

“Transcript” used herein refers to a polynucleotide of ribonucleotides having a length of about 20-200 bases (e.g., 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, or 200 bases), which is transcribed from a DNA template described herein through the process/method (e.g., IVT) described herein. In an embodiment, “transcript” is also referred as IVT-made transcript or IVT-made polynucleotide or IVT-made RNA. In an embodiment, transcript described herein is an IVT-made gRNA (crRNA or tracrRNA). In an embodiment, transcript described herein is an IVT-made sgRNA.

“Upstream” is defined relative to the 5′ to 3′ direction in which RNA transcription takes place, so upstream is toward the 5′ end of an RNA molecule and downstream is toward the 3′ end.

IVT Cassettes, Compositions and Methods

The disclosure is directed to polynucleotides and methods of generating, characterizing and analyzing polynucleotides (e.g., RNAs having a length of about 20-200 bases, for example, guide RNAs (gRNAs) and single guide RNAs (sgRNAs)). The polynucleotides, e.g., RNAs having a length of about 20-200 bases, for example, gRNA and/or sgRNA, can be used to modulate transcription, e.g., in clinical or research settings. The disclosure provides an improvement in the manufacturing and quality of RNAs having a length of about 20-200 bases. By practicing the methods described herein, the variety of contaminants in a composition of full-length product (FLP) RNA transcript produced by in vitro transcription (IVT) is less than that in the corresponding composition of transcript produced by solid-phase chemical synthesis.

In solid-phase chemical synthesis of long ~100 mer RNA oligonucleotides, as shown in FIG. 25 of FLUOROUS CHEMISTRY, EDITORS: HORVÁTH, ISTVÁN T. (ED.), the variety of oligonucleotide impurities that can occur is much greater than from IVT synthesis of RNA. Impurities can originate from incomplete addition of nucleotides, forming so-called “n−x truncated” fragments (also referred to herein as “n−x variants”), whose synthesis has been prematurely terminated. Also, an inefficient capping of sequences that have failed to incorporate a nucleotide results in the formation of oligonucleotides with internal deletions, which are also n−x fragments. Moreover, inefficient detritylation can result in other n−x fragments. Additional side-reactions in solid-phase chemical synthesis can occur because of the repeated exposure of the growing oligonucleotide chain to chemicals. Premature detritylation during coupling results in n+x fragments (also referred to herein as “n+x variants”) that have duplicated nucleotides in the sequence. Depurination during the detritylation step results in the formation of oligonucleotide products with abasic sites, which are later cleaved by ammonia during the deprotection stage. Minimizing undesired side reactions during chemical oligonucleotide synthesis requires protecting groups attached to the nucleosides during the chain elongation. Upon the completion of the oligonucleotide chain assembly, the protecting groups are removed to yield the desired oligonucleotides. Thus, other side products such as oligomers carrying residual protecting groups arising from incomplete deprotection, acrylamide adducts, bicyclic products, etc. can occur. These side products have previously been problematic to remove from the composition of the desired RNA transcript. In general, the longer the RNA chain, the more challenging the solid-phase synthesis becomes. In fact, even in cases of high coupling efficiencies (>99%), the percentage of side-products, generated with every nucleotide addition, accumulates drastically when the oligomer length grows beyond 50 mer. The general relationship between full-length product (FLP) yield, oligonucleotide length, and various coupling efficiencies is that small decreases in coupling efficiency (≤1%) result in large decreases in full-length product (FLP) yield, most notably for long oligonucleotides. Because these various side-products are difficult (if not impossible) to remove, there is a risk that corresponding RNA compositions trigger unwanted off-targeting effects caused by the impurities contained in RNA sequence in compositions generated by chemical synthesis. The biggest risks are mutations in the crRNA region.
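The relationship between coupling efficiency, oligonucleotide length and full-length product yield can be illustrated with a commonly used approximation, which is an assumption of this example and is not stated in the disclosure: each of the n−1 coupling steps needed to build an n-mer is taken to succeed independently with the same efficiency, so that the theoretical FLP fraction is the efficiency raised to the power n−1. The function name `flp_yield` is hypothetical.

```python
def flp_yield(length: int, coupling_efficiency: float) -> float:
    """Theoretical fraction of full-length product for an oligonucleotide.

    Assumes (length - 1) independent coupling steps, each succeeding with
    the same probability `coupling_efficiency` (a value between 0 and 1).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length - 1)


# Compare a short and a long oligonucleotide at two coupling efficiencies.
yield_table = {
    (length, efficiency): flp_yield(length, efficiency)
    for length in (20, 50, 100)
    for efficiency in (0.99, 0.98)
}
```

Under this approximation, a one-percentage-point drop in coupling efficiency lowers the theoretical FLP fraction of a 100 mer far more than that of a 20 mer, consistent with the trend described above.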

Also, because the chemical synthesis of long oligonucleotides has a very low yield, the overall cost of chemical synthesis will be higher than that of IVT.

In addition, it has been described in the art that IVT is not recommended for generating gRNA, allegedly for three main reasons: low purity, variable efficiency and high cost (see, e.g., www.synthego.com/resources/3-Reasons-to-Stop-Using-IVT).

The compositions and methods described herein, therefore, provide unexpected solutions to some of the problems of chemical synthesis and other problems known in the art.

The present disclosure overcomes some of the deficiencies of chemical synthesis by allowing production of a composition of polynucleotides (e.g., RNAs having a length of about 20-200 bases, such as gRNA, sgRNA) having less than 6%, 5%, 4%, 3%, 2%, 1% or no detectable n−x fragments, preferably less than 4%, 3%, 2%, 1% or no detectable n−x fragments. n−x fragments can be detected by any methods known in the art, for example, by LC-MS or Next generation sequencing (NGS), ion exchange chromatography, reversed phase chromatography, or electrophoresis.
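As a simplified illustration of how n−x and n+x fragments might be tallied, the sketch below classifies measured transcript lengths (for example, read lengths from NGS) against the expected length n. The function name `classify_lengths` is hypothetical, and the sketch makes two assumptions: percentages are computed by molecule count rather than by weight, and classification is by length alone, so a transcript of the expected length that carries a substitution would still be counted as full-length product.

```python
from collections import Counter
from typing import Dict, Iterable


def classify_lengths(observed_lengths: Iterable[int], expected_length: int) -> Dict[str, float]:
    """Return the percentage of FLP, n-x and n+x species by molecule count."""
    counts = Counter()
    for length in observed_lengths:
        if length == expected_length:
            counts["FLP"] += 1
        elif length < expected_length:
            counts["n-x"] += 1  # shorter than expected
        else:
            counts["n+x"] += 1  # longer than expected
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations supplied")
    return {label: 100.0 * counts[label] / total for label in ("FLP", "n-x", "n+x")}


# Hypothetical measurement of 100 transcripts with an expected length of 100 bases.
example_lengths = [100] * 96 + [99, 98] + [101, 102]
example_profile = classify_lengths(example_lengths, expected_length=100)
below_4_percent_n_minus_x = example_profile["n-x"] < 4.0
```

The resulting n−x percentage can be compared against the limits recited above, e.g., less than 4%.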

In embodiments, the percentage of desired product (e.g., RNA molecules having a length of about 20-200 bases, for example, gRNAs, sgRNAs, RNA aptamers, RNAi molecules, etc.) among IVT product is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 150%, 200% or higher than the percentage of desired product among the chemically synthesized product. In other words, in embodiments, the purity of IVT product described herein is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 150%, 200% or higher than the purity of the chemically synthesized product (see, e.g., FIG. 14).

In one aspect, the disclosure features a DNA template for making a ribonucleic acid (RNA) transcript having a length of about 20-200 bases by in vitro transcription (IVT). The DNA template comprises an IVT cassette, which comprises a first DNA sequence including an RNA transcription initiation site, a polymerase promoter upstream from the RNA transcription initiation site, a second DNA sequence encoding the RNA transcript having a length of about 20-200 bases disposed downstream of the RNA transcription initiation site, and a linearization site downstream from the transcription initiation site (e.g., downstream from the second DNA sequence). In some embodiments, the RNA transcript having a length of about 20-200 bases comprises a gRNA. In some embodiments, the gRNA is about 20-150 bases in length. In some embodiments, the RNA transcript having a length of about 20-200 bases comprises a sgRNA. In some embodiments, the sgRNA is about 50-150 bases in length. In some embodiments, the sgRNA sequence encodes a fusion transcript, which comprises crRNA and optionally tracrRNA. In some embodiments, the sgRNA sequence starts with a transcription initiation nucleotide. FIG. 1 shows a drawing of an exemplary IVT cassette, comprising a DNA sequence encoding the two sgRNA elements, crRNA and optionally tracrRNA. In some embodiments, the linearization site is immediately downstream of the second DNA sequence encoding the RNA transcript having a length of about 20-200 bases (e.g., the sgRNA sequence), near or at the end of the second DNA sequence, to keep the resulting RNA transcript at a desired length.
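The ordering of the cassette elements described above can be sketched as a simple data structure. All names in the sketch (`IVTCassette`, `rna_to_coding_dna`, `expected_transcript`) are hypothetical. The promoter string is a widely cited T7 consensus used here only as a placeholder and is not asserted to be SEQ ID NO: 1; the linearization site is shown as unspecified residues; and the transcript is idealized as ending exactly at the end of the coding sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IVTCassette:
    """Cassette elements written 5'->3' on the coding strand."""
    promoter: str            # polymerase promoter, upstream of the initiation site
    coding_dna: str          # DNA encoding the transcript; first base is the initiation nucleotide
    linearization_site: str  # site immediately downstream of the coding sequence

    @property
    def sequence(self) -> str:
        """Full cassette: promoter, then coding sequence, then linearization site."""
        return self.promoter + self.coding_dna + self.linearization_site

    @property
    def initiation_index(self) -> int:
        """0-based position of the transcription initiation nucleotide."""
        return len(self.promoter)

    def expected_transcript(self) -> str:
        """Idealized run-off transcript: the coding sequence with T replaced by U."""
        return self.coding_dna.upper().replace("T", "U")


def rna_to_coding_dna(rna: str) -> str:
    """Convert an RNA sequence into the DNA coding-strand sequence that encodes it."""
    return rna.upper().replace("U", "T")


# Placeholder values for illustration only (not taken from the Sequence Listing).
example_cassette = IVTCassette(
    promoter="TAATACGACTCACTATA",
    coding_dna=rna_to_coding_dna("GGAUCCAUGC"),
    linearization_site="NNNNNN",
)
initiating_nucleotide = example_cassette.sequence[example_cassette.initiation_index]
```

Keeping the linearization site as a separate field mirrors the description above, in which its position directly after the coding sequence determines the length of the resulting transcript.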

In one embodiment, the DNA template is part of a DNA plasmid, which comprises the IVT cassette and an appropriate vector for amplification of DNA, e.g., so that the plasmid can be amplified by growing in bacteria, e.g., Escherichia coli. See, FIG. 2.

In one embodiment, the promoter is an RNA polymerase promoter, e.g., selected from a T7 promoter, a T3 promoter, an SP6 promoter, a Syn5 promoter, a phi 2.5 overlapping promoter, an AC15/C26 mutA promoter, an A6/B1 mutA promoter, and a phi 9 (A-15C) promoter. In one embodiment, the promoter is a T7 promoter, e.g., having a sequence with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 1. In another embodiment, the promoter is a T3 promoter, e.g., having a sequence with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 2. In another embodiment, the promoter is an SP6 promoter, e.g., having a sequence with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 3. In yet another embodiment, the promoter is a Syn5 promoter, e.g., having a sequence with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 4. In yet another embodiment, the promoter is a phi 2.5 overlapping promoter, e.g., having a sequence with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 27. In yet another embodiment, the promoter is an AC15/C26 mutA promoter, e.g., having a sequence with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 28. In yet another embodiment, the promoter is an A6/B1 mutA promoter, e.g., having a sequence with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 29. In yet another embodiment, the promoter is a phi 9 (A-15C) promoter, e.g., having a sequence with at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 30. The nucleotide sequences of other RNA polymerase promoters (e.g., promoters for E. coli RNA polymerase) are known in the art.
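
For illustration, a sequence identity threshold such as those recited above can be checked with a simple position-by-position comparison. The following minimal sketch assumes two sequences of equal length that are already aligned without gaps; in practice, sequence identity is typically determined with an alignment algorithm, which this sketch does not implement. The function name and the example sequences are hypothetical and are not SEQ ID NOs of this disclosure.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Percent of positions at which two equal-length sequences match.

    Assumes the sequences are pre-aligned with no gaps; comparison is
    case-insensitive.
    """
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


# Arbitrary 20-base placeholder sequences differing at one position.
reference = "ACGTACGTACGTACGTACGT"
candidate = "ACGTACGTACGTACGTACGA"
identity = percent_identity(candidate, reference)
print(identity, identity >= 90.0)
```

In this example, one mismatch in 20 positions gives 95% identity, which would satisfy the "at least 90%" and "at least 95%" thresholds but not "at least 98%".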

In one embodiment, the RNA transcription initiation site has adenosine as the initiating nucleotide. In one embodiment, where the RNA polymerase promoter is a T7 promoter, the initiation site has adenosine as the initiating nucleotide. In another embodiment, the RNA transcription initiation site has guanosine as the initiating nucleotide. In one embodiment, where the RNA polymerase promoter is a T7 promoter, the initiation site has guanosine as the initiating nucleotide.

In one embodiment, the sgRNA sequence comprises a tracrRNA sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 5. In another embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 6. In another embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 7. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 33. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 34. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 35. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 36. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 37. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 38. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 39. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 40. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 41. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 42. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 43. 
In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 44. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 45. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 46. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 47. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 48. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 49. In one embodiment, the sgRNA sequence comprises a sequence having at least 90%, 95%, 98%, 99% or 100% sequence identity to SEQ ID NO: 51.

In some embodiments, the sgRNA may comprise, from 5′ to 3′, disposed 3′ to the targeting domain:

- e) any of a) to d), above, further comprising, at the 3′ end, at least 1, 2, 3, 4, 5, 6 or 7 uracil (U) nucleotides, e.g., 1, 2, 3, 4, 5, 6, or 7 uracil (U) nucleotides;
- f) any of a) to d), above, further comprising, at the 3′ end, at least 1, 2, 3, 4, 5, 6 or 7 adenine (A) nucleotides, e.g., 1, 2, 3, 4, 5, 6, or 7 adenine (A) nucleotides; or
- g) any of a) to f), above, further comprising, at the 5′ end (e.g., at the 5′ terminus, e.g., 5′ to the targeting domain), at least 1, 2, 3, 4, 5, 6 or 7 adenine (A) nucleotides, e.g., 1, 2, 3, 4, 5, 6, or 7 adenine (A) nucleotides. In embodiments, any of a) to g) above is disposed directly 3′ to the targeting domain.
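
As a non-limiting sketch of options e) to g), the terminal additions can be modeled as string operations on an RNA written 5′ to 3′: an optional run of adenine nucleotides at the 5′ terminus, the targeting domain, the sequence disposed directly 3′ to the targeting domain, and an optional 3′ run of either uracil or adenine nucleotides. The function name, parameter names, and sequences below are hypothetical placeholders and do not correspond to any SEQ ID NO of this disclosure.

```python
from typing import Optional


def build_sgrna(
    targeting_domain: str,
    scaffold: str,
    tail_base: Optional[str] = None,
    tail_count: int = 0,
    five_prime_a: int = 0,
) -> str:
    """Assemble an sgRNA string, 5' to 3', with optional terminal additions.

    tail_base:    "U" (option e) or "A" (option f) for the 3' addition, or None.
    tail_count:   number of nucleotides in the 3' addition.
    five_prime_a: number of A nucleotides placed 5' to the targeting domain
                  (option g).
    """
    if tail_base not in (None, "U", "A"):
        raise ValueError("tail_base must be 'U', 'A', or None")
    if tail_count < 0 or five_prime_a < 0:
        raise ValueError("counts must be non-negative")
    if tail_base is None and tail_count:
        raise ValueError("tail_count given without tail_base")
    tail = (tail_base or "") * tail_count
    return "A" * five_prime_a + targeting_domain + scaffold + tail


# Arbitrary placeholder sequences for illustration only.
targeting = "ACGU" * 5   # 20-base placeholder targeting domain
scaffold = "GGCCAAUU"    # placeholder for the sequence 3' to the targeting domain
print(build_sgrna(targeting, scaffold, tail_base="U", tail_count=4, five_prime_a=1))
```

The example combines a four-uracil 3′ addition with a single 5′ adenine, which corresponds to option g) applied to option e).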