The disclosure relates to novel RNA constructs encoding foreign proteins or functional RNAs, with a circularization system based on group I introns, which are capable of self-circularizing with high efficiency without introducing extraneous fragments, as well as to methods of using the constructs to make circular RNAs.
Messenger RNA (mRNA) is a type of single-stranded RNA involved in protein synthesis. In vitro transcribed (IVT) mRNAs have recently attracted much attention as novel agents with great therapeutic potential. In particular, the successful use of mRNA vaccines for COVID-19 has demonstrated the safety and efficacy of mRNA therapeutic agents in vivo. Because of their short development cycle, flexibility in design, and strong immune activation, mRNA vaccines have been rapidly validated for their safety and efficacy in combating infectious diseases such as COVID-19. However, the use of mRNA in non-vaccine therapies such as protein replacement is limited by several factors, including mRNA instability, poor persistence of expression in vivo, immunogenicity, and the limited range of cell types in which it is expressed. Linear single-stranded mRNA requires a 5′ cap and a 3′ poly(A) tail, and in some cases modified nucleotides such as N1-methylpseudouridine (m1Ψ), to ensure stability and adequate expression levels in vivo while reducing the risk of unwanted immunogenicity. Moreover, even with these modifications, mRNA is susceptible to exonuclease digestion, resulting in a short half-life both in vitro and in vivo.
Circular RNA (circRNA) is a type of single-stranded RNA which forms a 3′-5′ covalently closed loop. CircRNAs are created by a non-canonical splicing process termed “backsplicing”, whereby the spliceosome fuses a splice donor site in a downstream exon (5′ splice site) to a splice acceptor site in an upstream exon (3′ splice site). Unlike linear mRNAs, circRNAs do not require a 5′ cap or 3′ poly(A) tail for their stability. The closed ring structure of circRNAs protects them from exonuclease-mediated degradation, rendering them resistant to several mechanisms of RNA turnover and giving them a 2.5-fold longer half-life than their linear mRNA counterparts. Moreover, circRNAs have beneficial features not shared by mRNAs, such as reduced immunogenicity and extended translation duration. For these reasons, circRNAs have been explored as therapeutic agents.
CircRNAs are generally noncoding, as they lack the 5′-cap structure, but several studies have provided evidence that some circRNAs can be translated into proteins. Engineered circRNA with cap-independent translation elements such as internal ribosome entry sites (IRES) or N6-methyladenosine (m6A) modifications can also facilitate protein translation in vivo. Like mRNAs, circRNAs can also be delivered via lipid nanoparticles (LNPs) to provide in vivo expression, which may be more sustained than linear mRNAs.
CircRNAs can be generated post-transcriptionally in living cells by plasmids carrying minigene sequences. Since spliceosome-mediated backsplicing is a major mechanism of circularization in vivo, most circRNA minigenes have at least exonic regions containing the sequence to be circularized, as well as 5′ and 3′ flanking intronic sequences containing splicing motifs. However, this vector transcription-dependent circularization can still produce variable amounts of unwanted heterologous by-products that cannot be easily identified or purified in vivo. In addition, this approach requires plasmid vectors to be efficiently delivered into the nucleus, making technical development difficult, while double-stranded DNAs also carry the risk of integrating into the genome.
Protein ligase-based and ribozyme-based methods are commonly used for in vitro preparation of circRNAs. Enzymatic ligation-mediated circularization usually requires a complementary splint (a DNA or RNA oligonucleotide) to bring the two ends of the RNA molecule together, followed by catalysis by one of several enzymes from bacteriophage T4, including T4 DNA ligase, T4 RNA ligase 1, and T4 RNA ligase 2. However, all of these ligase-mediated circularizations are relatively inefficient, especially for large RNA molecules. In addition, intermolecular end-joining by-products of the ligation reaction cannot be avoided entirely, which complicates system optimization and hinders production scale-up.
Ribozyme-mediated RNA circularizations can also be performed by the permuted intron and exon (PIE) method based on the group I intron or group II intron self-splicing system.
The group I introns are naturally occurring cis-splicing ribozymes that can splice an RNA transcript and remove themselves from the primary transcript by autocatalyzing two consecutive trans-esterification reactions and joining the two flanking exons (see
Helices P1 to P9 (and the intervening junctions and loops) assemble to form the catalytic core of group I introns. In general, helix P1 comprises at least 4-6 base pairs from the 5′ intron and 5′ exon, ending with a conserved G-U wobble base pair (5′-GNNNNN-3′ in intron or 5′-NNNNNU-3′ in exon), which contributes to 5′ splice site recognition. In addition, the P1 extension region (or “P1ex”) is important for the 5′ splicing reaction rate and splicing site recognition. The sequence ‘GNNNNN’ is also known as the internal guide sequence (IGS). For some group I introns, helix P10 is formed after the first step of splicing and involves base pairing between the 3′ intron and 3′ exon. The 3′ splice site is partially recognized through a conserved guanine at the 3′ end of the intron, termed Omega G (ωG). In some cases, the 3′ splice site accuracy can be improved by introducing or enhancing P9.0 or even P9.2 structures.
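The pairing between the internal guide sequence and the exon sequence in helix P1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `design_igs` is hypothetical, the 3′-terminal U (or C) of the target is assumed to pair with the 5′-terminal G of the IGS, and every other position is assumed to pair strictly by Watson-Crick rules. Real P1 helices may tolerate additional wobble pairs and depend on tertiary context that the sketch ignores.

```python
# Watson-Crick partners for RNA bases.
WC = {"A": "U", "U": "A", "G": "C", "C": "G"}


def design_igs(target_site: str) -> str:
    """Return a candidate IGS (5'->3') for a target site of the form NNNNNU.

    Assumptions (not taken from the disclosure):
    - the target site is given 5'->3' with concrete bases (no 'N');
    - its 3'-terminal U (or C) pairs with the 5'-terminal G of the IGS;
    - all remaining positions pair by Watson-Crick rules.
    """
    site = target_site.upper().replace("T", "U")
    if len(site) < 2 or site[-1] not in "UC":
        raise ValueError("target site is expected to end in U (or C)")
    upstream = site[:-1]
    # The helix is antiparallel, so read the upstream bases 3'->5'
    # and complement each one.
    return "G" + "".join(WC[base] for base in reversed(upstream))


if __name__ == "__main__":
    print(design_igs("ACGUAU"))  # GUACGU
```

In this example the leading G of the returned sequence sits opposite the terminal U of the target site, giving the G-U wobble pair that closes P1.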
Previous studies provided a permuted intron-exon (PIE) splicing strategy using a modified group I intron, including placement of the 5′ half of the group I intron to the tail of the exon and transferring the remaining 3′ half to the head of the same exon. See, e.g., R. A. Wesselhoeft, P. S. Kowalski, D. G. Anderson, Engineering circular RNA for potent and stable translation in eukaryotic cells. Nat Commun 9, 2629 (2018); M. Puttaraju, M. D. Been, Group I permuted intron-exon (PIE) sequences self-splice to produce circular exons. Nucleic Acids Res 20, 5357-5364 (1992). This method achieves RNA circularization by a regular group I intron self-splicing reaction that includes two transesterifications at defined splice sites. Attack of the 5′ splice site by free GTP leads to the release of the 3′ end sequence (5′ half intron) of the PIE construct (first transesterification). The free 3′-OH group of the newly generated 3′ half exon attacks the 3′ splice site in the second transesterification reaction. This leads to the release of circRNA and 3′ half intron. Compared with enzymatic ligation, the PIE method can be used to circularize larger linear RNA precursors, it does not require additional protein ligase, and the reaction conditions and purification methods are easier to develop and optimize.
Circular RNAs encoding foreign proteins synthesized by the PIE method have been validated both in vitro and in vivo and retain the characteristics of low immunogenicity and longer translation duration, which broaden their applications. Based on these advantages, the PIE system is currently the most studied and widely used method for RNA circularization. Although the PIE system can achieve circularization of long fragments more efficiently than ligase-mediated methods, the splicing reaction introduces additional fragments (E1, E2, and spacer) from phage or Anabaena exons that may activate immune responses.
A PIE-group II intron system can achieve scarless circularization by optimizing exon binding sites (EBS) sequences to match the intron binding sites (IBS). However, the alteration of EBS greatly impacts splicing efficiency; complicated optimization and testing are required to guarantee efficient splicing, and the EBS-IBS pairs may in some cases be incompatible. The PIE system splits the ribozyme into two parts placed at the RNA construct's 3′ and 5′ terminals, which requires that the ribozyme fragments at both ends are correctly folded and spatially brought closer to form the complete ribozyme catalytic domain. However, the structure of the internal sequences may interfere with the ribozyme structure at both ends, which requires additional spacer sequences to separate the internal sequences and the ribozyme fragments at both ends.
There is a need for ribozyme-mediated circularization approaches that are simpler, faster, safer, more accurate, and more efficient than conventional processes.
In an aspect, the disclosure provides novel RNA constructs (also referred to as “circular RNA precursors”) encoding foreign proteins or functional RNAs, with a circularization system based on group I introns, different from the PIE constructs, e.g., in having an intact ribozyme core. The RNA constructs are capable of self-circularizing with high efficiency without introducing extraneous fragments.
The novel RNA construct comprising,
This design retains the complete core domain of the ribozyme, which is more conducive to correct folding of the ribozyme than the traditional PIE system. The method can also circularize the nucleotide sequence of interest without including exogenous sequence residues, by mimicking the formation of a P1 duplex. An arbitrary sequence (for example, ‘nnnnnu’ or ‘nnnnnc’) within the nucleotide sequence to be circularized is selected as the target site sequence, the only requirement being that the target site sequence is unique in the RNA construct. The sequence from immediately downstream of the target site to the 3′ end of the nucleotide sequence to be circularized is placed at the 5′ region of the GOI, the sequence from the 5′ end of the nucleotide sequence to be circularized to the target site is placed at the 3′ region of the GOI, and a corresponding IGS is then designed.
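The rearrangement of the sequence to be circularized around a chosen target site can be illustrated with a short Python sketch. The function names and the example sequence are hypothetical. The uniqueness check here covers only the sequence of interest, whereas a real design would check the whole RNA construct, including the ribozyme core and recognizer sequences.

```python
from typing import Tuple


def permute_for_circularization(seq: str, target_site: str) -> Tuple[str, str]:
    """Split `seq` at the 3' end of a unique target site.

    Returns (five_prime_part, three_prime_part) of the rearranged insert:
    the sequence downstream of the target site comes first, and the
    sequence from the original 5' end through the target site comes last,
    so that the rearranged insert ends with the target site.
    """
    seq, target_site = seq.upper(), target_site.upper()
    first = seq.find(target_site)
    # Reject missing sites and repeated (including overlapping) occurrences.
    if first == -1 or seq.find(target_site, first + 1) != -1:
        raise ValueError("target site must occur exactly once")
    cut = first + len(target_site)
    return seq[cut:], seq[:cut]


def is_rotation(a: str, b: str) -> bool:
    """True if circular sequences a and b are identical up to rotation."""
    return len(a) == len(b) and a in b + b


if __name__ == "__main__":
    original = "AUGGCCACGUAUCCGUAA"   # illustrative sequence only
    five, three = permute_for_circularization(original, "ACGUAU")
    print(five, three)                # CCGUAA AUGGCCACGUAU
    print(is_rotation(five + three, original))  # True
```

Joining the 3′ end of the rearranged insert (the last nucleotide of the target site) to its 5′ end restores the original adjacency at the cut, so the closed circle carries the original sequence with no added residues. That is what the rotation check confirms.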
In some embodiments, the RNA construct comprises, from 5′ end to 3′ end,
The disclosure further provides an RNA construct comprising, from 5′ end to 3′ end,
In some embodiments, ωN is guanine (ωG).
In some embodiments, the ribozyme core sequence comprises a nucleotide sequence encoding the scaffold domain and catalytic domain of a group I intron; preferably, the ribozyme core sequence comprises or consists of the sequence from the IGS end to the sequence before the 5′ half of P9.0 duplex of a group I intron.
In some embodiments, the ribozyme core sequence is derived from a group IC1 (e.g., from Tetrahymena sp. (e.g., T. thermophila, T. cosmopolitanis, T. hyperangularis, T. malaccensis or T. pigmentosa) or Pneumocystis sp. (e.g., Pneumocystis carinii)), IC2, IC3 (e.g., from Anabaena sp. PCC7120 or Azoarcus sp. BH72) or IA2 (e.g., from Bacteriophage Twort) intron.
The disclosure also provides an RNA construct comprising, from 5′ end to 3′ end,
The disclosure also provides DNA constructs encoding the novel RNA constructs and methods of making circular RNAs using the novel constructs.
Further areas of applicability of the present disclosure will become apparent from the detailed description provided hereinafter. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the disclosure, are intended for purposes of illustration only and are not intended to limit the scope of the disclosure.
The following description of the preferred embodiment(s) is merely exemplary in nature and is in no way intended to limit the disclosure, its application, or uses.
As used throughout, ranges are used as shorthand for describing each and every value that is within the range. Any value within the range can be selected as the terminus of the range. In addition, all references cited herein are hereby incorporated by reference in their entireties. In the event of a conflict between a definition in the present disclosure and that of a cited reference, the present disclosure controls.
Unless indicated otherwise, the scientific and technological terminologies used herein have the meanings commonly understood by a person skilled in the art. Also, the terminologies and experimental procedures used herein relating to protein and nucleotide chemistry, molecular biology, cell and tissue cultivation, microbiology, and immunology all belong to the terminologies and conventional methods generally used in the art. For example, the standard DNA recombination and molecular cloning technologies used herein are well known to a person skilled in the art, and are described in detail in the following reference: Sambrook, J., Fritsch, E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989. In addition, in order to better understand the present invention, definitions and explanations for the relevant terminologies are provided below.
Unless otherwise indicated, the term “at least” preceding a series of elements is to be understood to refer to every element in the series. Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the present application. Such equivalents are intended to be encompassed by the present application.
As used herein, the expression “comprising”, “including”, “containing” or “having” are open-ended, and do not exclude additional unrecited elements, steps, or ingredients. The expression “consisting of” excludes any element, step, or ingredient not designated. The expression “consisting essentially of” means that the scope is limited to the designated elements, steps or ingredients, plus elements, steps or ingredients that are optionally present that do not substantially affect the essential and novel characteristics of the claimed subject matter. It should be understood that the expression “comprising” encompasses the expressions “consisting essentially of” and “consisting of”.
It must be noted that as used herein and in the appended claims, the singular forms “a”, “an” and “the” include plural reference unless the context clearly dictates otherwise.
Unless otherwise stated, any numerical values, such as a concentration or a concentration range described herein, are to be understood as being modified in all instances by the term “about”. Thus, a numerical value typically includes ±10% of the recited value. For example, a concentration of 1 mg/mL includes 0.9 mg/mL to 1.1 mg/mL. Likewise, a concentration range of 1% to 10% (w/v) includes 0.9% (w/v) to 11% (w/v). As used herein, the use of a numerical range expressly includes all possible subranges, all individual numerical values within that range, including integers within such ranges and fractions of the values unless the context clearly indicates otherwise.
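The arithmetic behind the “about” convention can be written out as a small Python sketch. The function names are hypothetical and the default tolerance of 10% simply mirrors the paragraph above.

```python
from typing import Tuple


def about(value: float, tolerance: float = 0.10) -> Tuple[float, float]:
    """Return the (low, high) interval covered by 'about value'."""
    return value * (1 - tolerance), value * (1 + tolerance)


def about_range(low: float, high: float,
                tolerance: float = 0.10) -> Tuple[float, float]:
    """Widen a recited range by the tolerance at both of its termini."""
    return low * (1 - tolerance), high * (1 + tolerance)


if __name__ == "__main__":
    print(about(1.0))           # a concentration of 1 mg/mL
    print(about_range(1, 10))   # a range of 1% to 10% (w/v)
```

Because the results are floating-point values, they may differ from 0.9, 1.1 and 11 by rounding error in the last digits.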
As used herein, the term “and/or” encompasses all combinations of items connected by the term, and each combination should be regarded as individually listed herein. For example, “A and/or B” covers “A”, “A and B”, and “B”. For example, “A, B, and/or C” covers “A”, “B”, “C”, “A and B”, “A and C”, “B and C”, and “A and B and C”.
The terms “polynucleotide”, “nucleic acid sequence”, “nucleotide sequence”, and “nucleic acid fragment” are used interchangeably to refer to a polymer of RNA or DNA that is single- or double-stranded, optionally containing synthetic, non-natural or modified nucleotides.
As used herein, a nucleotide in a nucleotide sequence is referred to by the single letter designation of its nucleobase as follows: “A (a)” for adenine or deoxyadenine (for RNA or DNA, respectively), “C (c)” for cytosine or deoxycytosine, “G (g)” for guanine or deoxyguanine, “U (u)” for uracil, “T (t)” for deoxythymine, “R” for purines (A or G), “Y” for pyrimidines (C or T), “I” for hypoxanthine, and “N” or “n” for any nucleotide. Although a nucleotide sequence may be represented as a DNA sequence (comprising T(s)), when referring to RNA, one skilled in the art can readily determine the corresponding RNA sequence (i.e., replacing T with U), and vice versa.
As used herein, “operably linked”, when referring to a first nucleotide sequence that is operably linked with a second nucleotide sequence, means a situation when the first nucleotide sequence is placed in a functional relationship with the second nucleotide sequence.
As used herein, the term “cis-splicing” refers to splicing from the same nucleic acid strand.
As used herein, the term “back-splicing site” or “backsplicing site”, when used with reference to a circular RNA, refers to a dinucleotide serving as the point of reconnection during the back-splicing process, resulting in the two ends of a linear nucleotide sequence joining to form the circular RNA.
As used herein, the term “splice site” refers to a dinucleotide between which a phosphodiester bond is cleaved during RNA circularization.
As used herein, the terms “native” and “naturally-occurring” mean the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms. For example, a naturally occurring group I intron or native nucleotide sequence of a group I intron may be present in and isolated from a natural source, and is not intentionally modified by human manipulation.
As used herein, the first nucleotide starting from the 5′ end of a nucleotide sequence is designated as the 5′ end nucleotide and is numbered as nucleotide 1 of the nucleotide sequence. Similarly, the last nucleotide starting from the 5′ end of a nucleotide sequence is designated as the 3′ end nucleotide of the nucleotide sequence.
As is understood by those skilled in the art, “upstream” is toward the 5′ direction of a nucleotide sequence and “downstream” is toward the 3′ direction of a nucleotide sequence.
Unless indicated otherwise, the expression “from 5′ end to 3′ end” means that the listed elements of a nucleotide sequence are present in a 5′ to 3′ direction and does not limit the length of the nucleotide sequence and elements therein. Thus, such an expression does not exclude any other elements located upstream of, downstream of and/or in between the listed elements.
Unless indicated otherwise, the statement that a first nucleotide sequence (or a nucleotide) is “at the 5′ end” or “at the 3′ end” of a second nucleotide sequence refers to the terminal position of the first nucleotide sequence (or the nucleotide) within the second nucleotide sequence. In contrast, the statement that a first nucleotide sequence (or a nucleotide) is “in the 5′ region” or “in the 3′ region” of a second nucleotide sequence, or a similar expression, means that the first nucleotide sequence (or the nucleotide) is located at a position adjacent to the 5′ or 3′ end nucleotide of the second nucleotide sequence but not necessarily at the 5′ or 3′ end of the second nucleotide sequence.
Secondary structures of RNAs can be predicted and/or determined from the nucleotide sequence by RNA structure prediction tools such as RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) or RNAstructure (https://rna.urmc.rochester.edu/RNAstructureWeb/index.html).
As used herein, the term “pair with” means that two nucleic acid strands, or two regions on the same nucleic acid strand, form a duplex-containing structure through Watson-Crick base pairing and/or non-Watson-Crick base pairing. The expressions “form”, “can form”, “may form” and the like are open-ended and inclusive, and do not exclude additional unrecited structures. For example, the expression “the 5′ and 3′ flanking sequences can pair with each other to form a double-stranded region” means that a double-stranded region is formed through base pairing between at least a portion of the nucleotides in the 5′ and 3′ flanking sequences, but does not exclude any other structure that may be formed by the 5′ flanking sequence and the 3′ flanking sequence alone or in combination.
As used herein, the term “complementary” refers to Watson-Crick base pairing and/or non-Watson-Crick base pairing. As used herein, the term “reverse complementary” means that base pairing is formed between a first nucleotide sequence in the 5′ to 3′ direction and a second nucleotide sequence in the 3′ to 5′ direction. Base pairings between two reverse complementary nucleotide sequences include Watson-Crick base pairing and/or non-Watson-Crick base pairing. Preferably, all base pairings between two reverse complementary nucleotide sequences are Watson-Crick base pairings. A “reverse complement” of a given nucleotide sequence can be obtained by reversing the order of all the nucleotides in the nucleotide sequence and then replacing all the nucleotides with their respective Watson-Crick complementary nucleotides.
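The two-step recipe for a reverse complement (reverse the order, then substitute each nucleotide with its Watson-Crick partner) can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes an RNA alphabet, converting any T in the input to U.

```python
# Translation table mapping each RNA base to its Watson-Crick partner.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq: str) -> str:
    """Return the RNA reverse complement of `seq`, written 5'->3'."""
    rna = seq.upper().replace("T", "U")
    # Step 1: reverse the order; step 2: complement every base.
    return rna[::-1].translate(RNA_COMPLEMENT)


if __name__ == "__main__":
    print(reverse_complement("AUGC"))    # GCAU
    print(reverse_complement("GGAUCC"))  # GGAUCC (self-complementary)
```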
The degree of complementarity between two nucleotide sequences can be indicated by the percentage of nucleotides in a first nucleotide sequence which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleotide sequence (e.g., about 50%, about 60%, about 70%, about 80%, about 90%, and 100% complementary). Two nucleotide sequences are “reverse complementary” or “perfectly complementary” if all the contiguous nucleotides of a first nucleotide sequence form hydrogen bonds with the same number of contiguous nucleotides in a second nucleotide sequence.
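As a rough illustration of how such a percentage might be computed, the Python sketch below aligns two equal-length sequences antiparallel and counts the positions that can pair. The function name, the equal-length requirement, and the optional `allow_wobble` flag are assumptions made for the example. Gapped or offset alignments and hybridization thermodynamics are not modeled.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU_WOBBLE = {("G", "U"), ("U", "G")}


def percent_complementarity(first: str, second: str,
                            allow_wobble: bool = False) -> float:
    """Percentage of positions in `first` that pair with `second`.

    Both sequences are given 5'->3'. The second one is read 3'->5'
    so that the two strands are compared in antiparallel orientation.
    """
    first = first.upper().replace("T", "U")
    second = second.upper().replace("T", "U")
    if not first or len(first) != len(second):
        raise ValueError("sequences must be non-empty and of equal length")
    allowed = WATSON_CRICK | GU_WOBBLE if allow_wobble else WATSON_CRICK
    paired = sum((a, b) in allowed for a, b in zip(first, reversed(second)))
    return 100.0 * paired / len(first)


if __name__ == "__main__":
    print(percent_complementarity("GGAUCCAA", "UUGGAUCC"))  # 100.0
    print(percent_complementarity("GGAUCCAA", "UUGGAUCA"))  # 87.5
```

In the second call, one mismatch in eight positions gives 87.5%, which would still satisfy an "at least 50%" threshold over a region of at least 8 nucleotides.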
As used herein, the term “at least partially (reverse) complementary” or “substantially complementary” means that at least about 50% (e.g., at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, about 100%) of the nucleotides of a nucleotide sequence (e.g., a 5′ homology arm sequence) can form base pairs with another nucleotide sequence (e.g., a 3′ homology arm sequence). Two substantially complementary nucleotide sequences (for example, two homology arm sequences) may share a sufficient level of sequence identity to one another's reverse complement to allow hybridization to occur. Two nucleotide sequences (for example, two homology arm sequences) are “substantially complementary” or “at least partially complementary” if the two nucleotide sequences are at least 50% (e.g., at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%) complementary over a region of at least 8 nucleotides (e.g., at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, or more nucleotides), or if the two nucleotide sequences hybridize under at least moderate, or, in some embodiments, high stringency conditions. Exemplary moderate stringency conditions include overnight incubation at 37° C. in a solution comprising 20% formamide, 5×SSC (750 mM NaCl, 75 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5×Denhardt's solution, 10% dextran sulfate, and 20 mg/ml denatured sheared salmon sperm DNA, followed by washing the filters in 1×SSC at about 37-50° C., or substantially similar conditions, e.g., the moderately stringent conditions described in Sambrook, J., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press; 4th edition (Jun. 15, 2012). High stringency conditions are conditions that, for example, (1) use low ionic strength and high temperature for washing, such as 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate (SDS) at 50° C., (2) employ a denaturing agent during hybridization, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin (BSA)/0.1% Ficoll/0.1% polyvinylpyrrolidone (PVP)/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride and 75 mM sodium citrate at 42° C., or (3) employ 50% formamide, 5×SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5×Denhardt's solution, sonicated salmon sperm DNA (50 μg/ml), 0.1% SDS, and 10% dextran sulfate at 42° C., with washes at (i) 42° C. in 0.2×SSC, (ii) 55° C. in 50% formamide, and (iii) 55° C. in 0.1×SSC (optionally in combination with EDTA). Additional details and an explanation of stringency of hybridization reactions are provided in, e.g., Sambrook, supra; and Ausubel et al., eds., Short Protocols in Molecular Biology, 5th ed., John Wiley & Sons, Inc., Hoboken, N.J. (2002).
As used herein, two “homology arm sequences” or “homology arms” complement, or are complementary to, one another when the two regions share a sufficient level of sequence identity to one another's reverse complement to allow hybridization to occur. As used herein, a “homology arm sequence” is any contiguous sequence that can form base pairs with preferably at least about 50% (e.g., at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, about 100%) of another sequence (another homology arm sequence) in the RNA construct.
As used herein, a “spacer” refers to a nucleotide sequence separating two other elements (segments) along a polynucleotide sequence. A spacer may be of any length. For example, a spacer may be 1-100 nucleotides, preferably 2-50 nucleotides, in length. A spacer may comprise a defined or random nucleotide sequence.
As used herein, the term “Watson-Crick base pairing” refers to hydrogen-bond pairing that occurs between adenine and thymine (A-T) (DNA) or uracil (A-U) (RNA), or between guanine and cytosine (G-C).
As used herein, the term “wobble base pairing” refers to a type of non-Watson-Crick base pairing. Wobble base pairing may be formed between, for example and without limitation, hypoxanthine and uracil (I-U, where I denotes inosine), guanine and uracil (G-U), adenine and cytosine (A-C), hypoxanthine and adenine (I-A), or hypoxanthine and cytosine (I-C).
As used herein, the term “base pair” or “base pairing” refers to two nitrogenous bases that are connected by hydrogen bonds. A base pair can be a Watson-Crick base pair or a non-Watson-Crick base pair. Examples of non-Watson-Crick base pairs include, but are not limited to, wobble base pairs and Hoogsteen base pairs. Among the most frequent wobble base pairs are G-T (U) and A-C base pairs. Other non-Watson-Crick base pairs include, but are not limited to, C-U, A-G (or I) and A-A.
As used herein, the term “stem-loop”, also known as a “hairpin”, refers to a secondary structure that can occur in single-stranded nucleic acids. The stem-loop may occur when two regions of the same strand pair with each other to form a double-stranded region that ends in an unpaired loop.
As used herein, the terms “duplex”, “double-stranded region” and “helix” are used interchangeably to refer to a double-stranded structure comprising at least one base pair. A duplex may comprise a Watson-Crick base pair, a non-Watson-Crick base pair or a combination thereof.
As used herein, the term “duplex mimic” refers to a double-stranded structure that functionally mimics the native duplex structure of a group I intron ribozyme. A duplex mimic may comprise at least one base pair. A duplex mimic may comprise a Watson-Crick base pair, a non-Watson-Crick base pair or a combination thereof. The sequences forming a duplex mimic are preferably, but not limited to, the corresponding native ribozyme sequences, and can be truncated or replaced by alternative designed sequences.
As used herein, the term “free energy” refers to the energy released by folding an unfolded nucleic acid molecule (e.g., RNA or DNA, etc.), or, conversely, the amount of energy that must be added in order to unfold a folded nucleic acid molecule (e.g., RNA or DNA, etc.). The “minimum free energy (MFE)” of a nucleic acid molecule (e.g., DNA, RNA, etc.) describes the lowest value of free energy observed for the nucleic acid molecule when assessed for various secondary structures thereof. The more negative free energy a structure has, the more likely is its formation.
As used herein, the term “melting temperature (Tm)” refers to the temperature at which about 50% of double-stranded nucleic acid structures (e.g., DNA/DNA, DNA/RNA, or RNA/RNA duplexes) denature and dissociate to single-stranded structures. The melting temperature of a particular nucleic acid molecule can be determined using thermodynamic analyses and algorithms described herein and known in the art (see, e.g., Kibbe W. A., Nucleic Acids Res., 35 (Web Server issue): W43-W46 (2007). doi: 10.1093/nar/gkm234; and Dumousseau et al., BMC Bioinformatics, 13:101 (2012). doi.org/10.1186/1471-2105-13-101).
When referring to a nucleotide sequence or protein sequence, the term “identity” is used to denote similarity between two sequences. Sequence similarity or identity may be determined using standard techniques known in the art, including, but not limited to, the local sequence identity algorithm of Smith & Waterman, Adv. Appl. Math. 2, 482 (1981), the sequence identity alignment algorithm of Needleman & Wunsch, J Mol. Biol. 48, 443 (1970), the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85, 2444 (1988), computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Drive, Madison, Wis.), the Best Fit sequence program described by Devereux et al., Nucl. Acid Res. 12, 387-395 (1984), or inspection. Another algorithm is the BLAST algorithm, described in Altschul et al., J Mol. Biol. 215, 403-410 (1990) and Karlin et al., Proc. Natl. Acad. Sci. USA 90, 5873-5877 (1993). A particularly useful BLAST program is the WU-BLAST-2 program which was obtained from Altschul et al., Methods in Enzymology, 266, 460-480 (1996); blast.wustl/edu/blast/README.html. WU-BLAST-2 uses several search parameters, which are optionally set to the default values. The parameters are dynamic values and are established by the program itself depending upon the composition of the particular sequence and composition of the particular database against which the sequence of interest is being searched; however, the values may be adjusted to increase sensitivity. Further, an additional useful algorithm is gapped BLAST as reported by Altschul et al., (1997) Nucleic Acids Res. 25, 3389-3402. Unless otherwise indicated, percent identity is determined herein using the algorithm available at the internet address: blast.ncbi.nlm.nih.gov/Blast.cgi.
As used herein, a “recombinant” nucleic acid (e.g., a recombinant group I intron) refers to a non-naturally occurring nucleic acid resulting from artificial modifications, such as mutagenesis, that is distinguishable from naturally occurring nucleic acids found in natural sources.
As used herein, definitions of IGS and the paired regions, for example, P1, P2, P4-P6, P3-P9, P9.0, P9a, P9b, P9.1, P9.1a, P9.2 and P10 duplexes and P1 extension, of a group I intron are known in the art and can be determined, for example, with reference to the following documents: Burke J. M. et al., Structural conventions for group I introns. Nucleic Acids Res. 1987 September; 15 (18): 7217-21; Stahley M. R. and Strobel S. A., RNA splicing: group I intron crystal structures reveal the basis of splice site selection and metal ion catalysis, Current Opinion in Structural Biology, 2006 16 (3): 319-326; and Woodson S. A. Structure and assembly of group I introns. Curr Opin Struct Biol. 2005 15 (3): 324-30. Representative sequences and secondary structures of group I introns are available, for example, on website https://crw2-comparative-rna-web.org/group-i-introns/.
The essential sequence elements of the novel cis-splicing mediated RNA circularization system based on group I introns are shown in
The nucleotide sequence of interest comprises a target site (e.g., ‘NNNNNU’) that can pair with the internal guide sequence (IGS) (e.g., ‘GNNNNN’) to determine the 5′ splice site. The 5′ recognizer sequence (R1) comprises a first pairing sequence and a 3′ end nucleotide ‘N’ (also referred to as “ωN”). In some particular embodiments, the 3′ end nucleotide ‘N’ is guanine (also referred to as “ωG”). The 3′ recognizer sequence (R2) comprises a second pairing sequence that can pair with the first pairing sequence to form a duplex which helps to determine the 3′ splice site downstream of the ωN. In some particular embodiments, the ribozyme core is capable of catalyzing the formation of a circular RNA comprising the nucleotide sequence of interest by joining the nucleotide immediately downstream of the ωN (i.e., the nucleotide at the ωN+1 position in the RNA construct) and the 3′ end nucleotide of the target site (e.g., the 3′ end ‘U’ if the target site is ‘NNNNNU’).
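To make the role of the IGS-target pairing concrete, the following Python sketch scans a sequence for windows that can pair, antiparallel, with a given IGS and reports the position of the 3′ end nucleotide of each candidate target site. It is a simplified, sequence-only model under stated assumptions: the function name is hypothetical, Watson-Crick and G-U pairs are the only pairs allowed, and RNA folding, P1 extension effects and the rest of the construct are ignored.

```python
from typing import List

# Pairs treated as compatible in this sketch: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
         ("G", "U"), ("U", "G")}


def find_target_sites(seq: str, igs: str) -> List[int]:
    """Return 0-based indices of the 3'-terminal nucleotide of every
    window in `seq` that can pair with `igs` (both written 5'->3')."""
    seq = seq.upper().replace("T", "U")
    igs = igs.upper().replace("T", "U")
    width = len(igs)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Antiparallel pairing: IGS position i faces window position -1-i.
        if all((g, w) in PAIRS for g, w in zip(igs, reversed(window))):
            hits.append(start + width - 1)
    return hits


if __name__ == "__main__":
    insert = "CCGUAAAUGGCCACGUAU"  # illustrative rearranged insert
    print(find_target_sites(insert, "GUACGU"))  # [17]
```

A single hit at the last position of the insert corresponds to a unique target site whose 3′ end ‘U’ would be joined to the nucleotide at the ωN+1 position. More than one hit would flag a target site that is not unique under this simplified pairing model.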
R1 may further comprise a 5′ flanking sequence. R2 may further comprise a 3′ flanking sequence. The 5′ and 3′ flanking sequences may pair with each other to form a double-stranded region which brings the 5′ and 3′ ends of the RNA construct into proximity, thereby helping to form the duplex required for determining the 3′ splice site.
In one aspect, the disclosure provides an RNA construct (Construct 1) comprising, from 5′ end to 3′ end:
In some embodiments, the RNA construct comprises, from 5′ end to 3′ end,
In some embodiments, ωN is guanine (ωG).
Group I introns may be categorized into 14 subgroups including IA1-3, IB1-4, IC1-3, ID and IE1-3. The self-splicing group I intron useful in the present disclosure may be obtained or derived from any organism, such as, for example, fungi, bacteria, bacteriophages, and eukaryotic viruses. Examples of group I introns useful in the present disclosure include, but are not limited to, group I introns derived from the following organisms: Enterobacteria phage T4, Bacteriophage Twort, Bacteriophage SPO1, Bacteriophage S3b, Bacillus anthracis, Clostridium botulinum, Tetrahymena thermophila (e.g., Ttch.L1925), Didymium iridis (e.g., Dir.S956-2), Diderma niveum, Dunaliella parva, Pneumocystis carinii, Physarum polycephalum (e.g., Ppo.L1925), Anabaena sp. PCC7120, Scytonema hofmanni, Agrobacterium tumefaciens, Synechocystis sp. PCC 6803, Synechococcus elongatus PCC 6301, Neurospora crassa, Candida albicans, Scytalidium cerradiumydiaces, Scytalidium dimidiatum, Pediadiaces Chlamydomonas nivalis, Chlorella vulgaris, Amoebidium parasiticum, Pediastrum biradiatum, Emericella nidulans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Azoarcus sp. BH72, Neochloris aquatica, and Symkania negevensis. See, e.g., Vicens Q. et al., Toward predicting self-splicing and protein-facilitated splicing of group I introns. RNA. 2008 October; 14 (10): 2013-29; Tanner M. and Cech T., Activity and thermostability of the small self-splicing group I intron in the pre-tRNA(Ile) of the purple bacterium Azoarcus. RNA. 1996 January; 2 (1): 74-83; Vicens Q. and Cech T. R., Atomic level architecture of group I introns revealed. Trends Biochem Sci. 2006 January; 31 (1): 41-51; Hedberg A. and Johansen S. D., Nuclear group I introns in self-splicing and beyond. Mob DNA. 2013 Jun. 5; 4 (1): 17. A group I intron can be a naturally occurring or a recombinant group I intron.
A recombinant group I intron can be obtained, for example, by deleting, inserting and/or substituting one or more nucleotides of a naturally occurring group I intron, as long as the self-splicing activity is retained.
The expression “the ribozyme core has the catalytic activity of a group I intron ribozyme” means that the ribozyme core is able to catalyze self-splicing of the RNA construct in the manner of a group I intron ribozyme, with the IGS/target site pairing determining the 5′ splice site, and the 5′ recognizer sequence (R1)/3′ recognizer sequence (R2) pairing together with the ωN (e.g., ωG in some embodiments) determining the 3′ splice site. In some embodiments, the resulting circular RNA is formed by connecting the 3′ end nucleotide of the target site and the nucleotide immediately downstream of the ωN (e.g., ωG in some embodiments).
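The junction described above can be expressed in coordinates. The following Python sketch is an illustration under stated assumptions: the function name and the 1-based position arguments are hypothetical, and the construct is assumed to carry the ωN upstream of the target site. It returns the segment that would be joined end to end in the circular RNA, i.e., from the ωN+1 position through the 3′ end nucleotide of the target site.

```python
def circular_product(construct: str, omega_pos: int, target_end_pos: int) -> str:
    """Return the segment closed into a circle, written as a linear string.

    Positions are 1-based (hypothetical convention for this sketch):
      omega_pos       position of the omega-N (3' end nucleotide of R1)
      target_end_pos  position of the 3' end nucleotide of the target site
    The segment runs from omega_pos + 1 to target_end_pos, inclusive; its
    last nucleotide is joined to its first nucleotide in the circle.
    """
    if not (1 <= omega_pos < target_end_pos <= len(construct)):
        raise ValueError("expected 1 <= omega_pos < target_end_pos <= len(construct)")
    # 1-based position omega_pos + 1 is 0-based index omega_pos.
    return construct[omega_pos:target_end_pos]
```

For a toy construct assembled as R1 + sequence of interest + downstream elements, where the sequence of interest ends with the target site, the returned segment is exactly the sequence of interest, so no R1 or ribozyme nucleotides remain in the circle.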
In some embodiments, the ribozyme core sequence comprises a nucleotide sequence encoding the scaffold domain and catalytic domain of a group I intron. The scaffold domain may comprise P4-P6 (P4, P5 and P6) and the catalytic domain may comprise P3-P8 (P3, P7 and P8). In some embodiments, the ribozyme core sequence comprises or consists of the sequence from the IGS end (e.g., starting from a nucleotide downstream (e.g., immediately downstream) of the 3′ end nucleotide of the IGS) to the sequence before the P9.0 duplex (i.e., before the 5′ half of P9.0 duplex) of a group I intron.
In some embodiments, the ribozyme core sequence does not comprise the sequence for the P9.0 duplex of a group I intron. In some embodiments, the ribozyme core sequence does not comprise the sequence from the 5′ half of P9.0 duplex to the 3′ end nucleotide of a group I intron. In some embodiments, the ribozyme core sequence comprises the complete sequence between P1-P9.0 duplexes of a group I intron, excluding the sequences for the P1 and P9.0 duplexes. For example, in embodiments wherein the ribozyme core sequence is derived from a Pneumocystis carinii or Tetrahymena sp. group I intron, the ribozyme core sequence may comprise or consist of the sequence from the IGS end to the sequence before the P9.0 duplex (i.e., before the 5′ half of P9.0 duplex) of the group I intron.
The ribozyme core sequence may be derived from any group I intron, including but not limited to the group I introns as described above. In some embodiments, the ribozyme core sequence is derived from a group IC1 (e.g., from Tetrahymena sp. (e.g., T. thermophila, T. cosmopolitanis (e.g., GenBank accession number: X03107), T. hyperangularis (e.g., GenBank accession number: X03106), T. malaccensis (e.g., GenBank accession number: X03105) or T. pigmentosa (e.g., GenBank accession number: J01210)) or Pneumocystis sp. (e.g., Pneumocystis carinii)), IC2, IC3 (e.g., from Anabaena sp. PCC7120 (e.g., GenBank accession number: M38692) or Azoarcus sp. BH72 (e.g., GenBank accession number: X66221)) or IA2 (e.g., from Bacteriophage Twort) intron. The group I intron from which the ribozyme core sequence is derived can be a naturally occurring group I intron or a recombinant group I intron. In some embodiments, the ribozyme core sequence comprises a nucleotide sequence having at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the nucleotide sequence of a naturally occurring group I intron.
In some embodiments, the ribozyme core sequence is derived from a Pneumocystis sp. group I intron, e.g., a Pneumocystis sp. group I intron comprising a nucleotide sequence selected from SEQ ID NOs: 32-36. In a particular embodiment, the ribozyme core sequence is derived from a Pneumocystis carinii group I intron comprising the nucleotide sequence of SEQ ID NO: 32. In a particular embodiment, the ribozyme core sequence comprises or consists of the nucleotide sequence of SEQ ID NO: 19 or a nucleotide sequence having at least 95% sequence identity thereto.
In some embodiments, the ribozyme core sequence is derived from a Tetrahymena sp. group I intron, e.g., a Tetrahymena thermophila group I intron comprising the sequence of SEQ ID NO: 12. In a particular embodiment, the ribozyme core sequence comprises or consists of the nucleotide sequence of SEQ ID NO: 17 or a nucleotide sequence having at least 95% sequence identity thereto.
In some embodiments, the ribozyme core sequence is derived from an Anabaena sp. group I intron; for example, the ribozyme core sequence comprises or consists of the nucleotide sequence of SEQ ID NO: 48 or a nucleotide sequence having at least 95% sequence identity thereto.
The first and second pairing sequences can pair with each other to form a duplex-containing structure upstream of the ωN to define a 3′ splice site downstream of the ωN. The duplex-containing structure may comprise at least one base pair and have a minimum free energy (MFE) of less than −18.9 kJ/mol and a melting temperature of at least 35.0° C. The free energy parameters can be determined using any method known in the art, for example, an RNA secondary structure prediction tool such as RNAfold or RNAstructure, or by experimental methods such as optical melting experiments, in conjunction with NMR or crystallography. Algorithms for determining MFE are further described in, e.g., Hajiaghayi et al., BMC Bioinformatics, 13:22 (2012); Mathews, D. H., Bioinformatics, Volume 21, Issue 10:2246-2253 (2005); and Doshi et al., BMC Bioinformatics, 5:105 (2004), doi: 10.1186/1471-2105-5-105. Alternatively, the formation of a duplex-containing structure between the first and second pairing sequences can be predicted by determining the optimal secondary structure of the RNA construct of the present disclosure.
The most commonly used software programs, employed to predict the secondary RNA or DNA structures by MFE algorithms, make use of the so-called nearest-neighbor energy model. This model uses free energy rules based on empirical thermodynamic parameters (Mathews et al., J Mol Biol, 288:911-940 (1999); and Mathews et al., Proc Natl Acad Sci USA, 101:7287-7292 (2004)) and computes the overall stability of an RNA or DNA structure by adding independent contributions of local free energy interactions due to adjacent base pairs and loop regions.
In some embodiments, the duplex-containing structure may have a minimum free energy (MFE) of less than about −18.9 kJ/mol (e.g., less than about −17 kJ/mol, less than about −18 kJ/mol, less than about −18.9 kJ/mol, less than about −19 kJ/mol, less than about −20 kJ/mol, less than about −30 kJ/mol, less than about −40 kJ/mol). In some embodiments, the MFE is greater than about −90 kJ/mol (e.g., greater than about −85 kJ/mol, greater than about −80 kJ/mol, greater than about −70 kJ/mol, greater than about −60 kJ/mol, greater than about −50 kJ/mol, greater than about −40 kJ/mol). In some embodiments, the duplex-containing structure has a minimum free energy (MFE) of about −18.9 kJ/mol or less. In some embodiments, the duplex-containing structure has an MFE in the range of about −18.9 kJ/mol to about −90 kJ/mol.
In some embodiments, the duplex-containing structure has a melting temperature of at least 35.0° C. In some embodiments, the duplex-containing structure has a melting temperature of at least 35.0° C., but not more than about 85° C. In some embodiments, the RNA secondary structure has a melting temperature of at least 35° C., 36° C., 37° C., 38° C., 39° C., 40° C., 41° C., 42° C., 43° C., 44° C., 45° C., 46° C., 47° C., 48° C., 49° C. or greater. In some embodiments, the melting temperature is no more than about 85° C., no more than about 75° C., no more than about 70° C., no more than about 65° C., no more than about 60° C., no more than about 55° C., no more than about 50° C. or less.
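Structure prediction tools commonly report free energies in kcal/mol, whereas the thresholds above are stated in kJ/mol, so a unit conversion is needed before comparison (−18.9 kJ/mol corresponds to roughly −4.52 kcal/mol). The following Python sketch illustrates such a check. The function names and the treatment of the “about” bounds as exact cut-offs are assumptions made for illustration; the MFE and melting temperature values themselves must come from a prediction tool or from experiment.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie


def kcal_to_kj(energy_kcal_per_mol: float) -> float:
    """Convert a free energy from kcal/mol to kJ/mol."""
    return energy_kcal_per_mol * KJ_PER_KCAL


def duplex_meets_thresholds(
    mfe_kcal_per_mol: float,
    tm_celsius: float,
    mfe_max_kj: float = -18.9,  # MFE of about -18.9 kJ/mol or less
    mfe_min_kj: float = -90.0,  # MFE greater than about -90 kJ/mol
    tm_min: float = 35.0,       # melting temperature of at least 35.0 C
    tm_max: float = 85.0,       # but not more than about 85 C
) -> bool:
    """Check an MFE (given in kcal/mol) and a Tm against one embodiment's ranges.

    The bounds are treated as exact cut-offs here, which is a simplification
    of the 'about' language used in the text.
    """
    mfe_kj = kcal_to_kj(mfe_kcal_per_mol)
    return (mfe_min_kj < mfe_kj <= mfe_max_kj) and (tm_min <= tm_celsius <= tm_max)
```

For instance, a predicted MFE of −5.0 kcal/mol converts to about −20.9 kJ/mol and passes the energy criterion, while −4.0 kcal/mol (about −16.7 kJ/mol) does not.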
The duplex-containing structure may comprise one or more base pairs, e.g., 1-200, 1-50, 5-45, 10-40, 15-35, 15-20, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 base pairs, consecutive or interrupted by one or more mismatches. In some embodiments, the duplex-containing structure comprises at least two base pairs. In some preferable embodiments, the duplex-containing structure comprises at least two consecutive base pairs. For example, the duplex-containing structure may comprise 2-100, 3-80, 5-60, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45 or 50 consecutive base pairs. In some embodiments, at least one base pair is located immediately upstream of the ωN. In some preferable embodiments, 2-6 consecutive base pairs of the duplex-containing structure are located immediately upstream of the ωN. Examples of duplex-containing structures may include but are not limited to stem structures, stem-loop structures and stem-loop alternating structures.
The duplex-containing structure may comprise a Watson-Crick base pair, a non-Watson-Crick base pair or a combination thereof. The duplex-containing structure may comprise a base pair selected from A-U, G-C, G-A, A-A, U-U, A-C, G-U and a combination thereof. The duplex-containing structure may optionally comprise one or more structures selected from a bulge loop, an interior loop and a hairpin loop.
The first and second pairing sequences may independently comprise 1-100 nucleotides, for example, 2-90, 5-90, 10-80, 20-60, 30-50, 40-45, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 18, 20, or 25 nucleotides. In some embodiments, the first pairing sequence comprises 2-20, 2-12, 4-10, 6, 7 or 8 nucleotides. In some embodiments, the second pairing sequence comprises 2-100, 5-80, 8-60, 10-50, 15, 20, 30, 40, 50, 60, 70, 80, 90 or 100, preferably 5-80 or 8-60 nucleotides. The first and second pairing sequences may share a sufficient level of sequence identity to one another's reverse complement to allow the 5′ and 3′ ends of the RNA construct to form the duplex-containing structure. The percent sequence identity can be any percent of sequence identity that allows for hybridization to occur. In some embodiments, at least 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% nucleotides of the first pairing sequence form base pairs with the second pairing sequence. In some embodiments, the first pairing sequence is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% complementary to the second pairing sequence.
In some embodiments, the first pairing sequence comprises a sequence of at least 2 contiguous nucleotides, for example, a sequence of 2-100 contiguous nucleotides which is reverse complementary to a sequence of the same number of contiguous nucleotides in the second pairing sequence. In some preferable embodiments, the first pairing sequence comprises a sequence of 2-6 contiguous nucleotides, which is reverse complementary to a sequence of the same number of contiguous nucleotides in the second pairing sequence.
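Whether two candidate pairing sequences contain a reverse complementary stretch of the required length can be checked computationally. The Python sketch below is a minimal illustration: the function names are hypothetical, only Watson-Crick complementarity is considered (the wobble and other non-Watson-Crick pairs mentioned elsewhere are ignored), and the sequences are assumed to be upper-case RNA written 5′ to 3′.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def longest_reverse_complementary_run(first: str, second: str) -> int:
    """Length of the longest contiguous stretch of `first` that is reverse
    complementary to a contiguous stretch of `second`.

    Implemented as a longest-common-substring search between `first` and the
    reverse complement of `second`.
    """
    rc = reverse_complement(second)
    best = 0
    prev = [0] * (len(rc) + 1)
    for a in first:
        cur = [0] * (len(rc) + 1)
        for j, b in enumerate(rc, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

A returned value of 2 to 6 would correspond to the preferable embodiments above in which 2-6 contiguous nucleotides of the first pairing sequence are reverse complementary to the second pairing sequence.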
The base pairs formed between the first and second pairing sequences may be located anywhere upstream of the ωN, preferably upstream and adjacent to the ωN, for example, immediately upstream of the ωN (for example, at least one base pair is located at the ωN−1 position in the RNA construct), or located a few (e.g., 1-50, 10-40, 20-30, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20) nucleotides upstream of the ωN (for example, at least one base pair is located at the ωN−2, ωN−3, ωN−4, ωN−5, ωN−6, ωN−7, ωN−8, ωN−9 or ωN−10 position in the RNA construct). In some embodiments, ωN is guanine (ωG). As demonstrated in some embodiments of the present application, one or more base pairs formed in close proximity of the ωN, mimicking the P9.0 duplex in a native group I intron, are essential for higher circularization efficiency and more accurate splicing. Accordingly, in some preferable embodiments, the relative location of the duplex formed to the ωN in the RNA construct is substantially identical to that of the P9.0 duplex to the ωG in the group I intron from which the ribozyme core sequence is derived.
In some preferable embodiments, the first and second pairing sequences form at least one base pair upstream of and adjacent to the ωN, such that base pairing between the first and second pairing sequences simulates the formation of a P9.0 duplex upstream of the ωG in the native group I intron during the circularization reaction. The duplex formed adjacent to the ωN may also be referred to as a “P9.0 duplex mimic”.
The P9.0 duplex mimic may comprise at least one base pair. For example, the P9.0 duplex mimic may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base pairs. Preferably, the P9.0 duplex mimic may comprise 2-6 consecutive base pairs. Preferably, the P9.0 duplex mimic comprises a substantially identical number of base pairs to that of the P9.0 duplex of the group I intron from which the ribozyme core sequence is derived. Those skilled in the art would be able to determine essential features for a P9.0 duplex mimic in view of the present disclosure and the prior art.
The P9.0 duplex mimic may comprise a Watson-Crick base pair, a non-Watson-Crick base pair or a combination thereof. In some embodiments, the non-Watson-Crick base pair is a wobble base pair. A preferable example of a non-Watson-Crick base pair may be a G-U wobble base pair. In a particular embodiment wherein the ribozyme core sequence is derived from a Pneumocystis carinii group I intron, the P9.0 duplex mimic may comprise a G-U wobble base pair.
In some embodiments, the first pairing sequence comprises a nucleotide ‘N1’ that is able to form a base pair with a nucleotide ‘n1’ of the second pairing sequence, wherein ‘N1’ is located at an ωN-i position in the RNA construct and i is an integer of 1-21. In some particular embodiments, i is an integer of 1-11. In some preferable embodiments, i is 1 or 2.
In some embodiments, ‘N1’ is the 3′ end nucleotide of a first contiguous sequence of 2-6 nucleotides in the first pairing sequence, ‘n1’ is the 5′ end nucleotide of a second contiguous sequence in the second pairing sequence, wherein the first contiguous sequence is reverse complementary to the second contiguous sequence.
In some embodiments, ‘N1’ is the 3′ end nucleotide of a first contiguous sequence of 2-6 nucleotides in the first pairing sequence, ‘n1’ is the 5′ end nucleotide of a second contiguous sequence in the second pairing sequence, wherein the first contiguous sequence is reverse complementary to the second contiguous sequence, and i is an integer of 1-21. In some particular embodiments, i is an integer of 1-11. In some preferable embodiments, i is 1 or 2.
In some embodiments, R1 comprises a nucleotide sequence ‘(Nx)s(Ny)t(ωN)’ at its 3′ end, and R2 comprises a nucleotide sequence ‘(nx)w’; wherein, ωN, ‘Nx’, ‘nx’ and ‘Ny’ are each independently any naturally occurring or modified nucleotide, t is an integer of 0-20, s and w are each independently an integer of 1-200, and ‘(Nx)s’ and ‘(nx)w’ are substantially complementary to form a duplex-containing structure upstream of the ωN to define the 3′ splice site.
In some embodiments, ωN is guanine (ωG).
In some embodiments, t is an integer of 0-10. In some preferable embodiments, t is 0. In some other embodiments, t is 1.
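The ‘(Nx)s(Ny)t(ωN)’ notation can be made concrete by splitting the 3′ end of a candidate R1 sequence into its three parts for given values of s and t. The Python sketch below is illustrative only; the function and type names are hypothetical, and the example sequence in the usage note is a toy sequence rather than a sequence of the disclosure.

```python
from typing import NamedTuple


class R1ThreePrimeEnd(NamedTuple):
    nx: str     # (Nx)s: the s nucleotides that pair with (nx)w of R2
    ny: str     # (Ny)t: the t spacer nucleotides between the duplex and omega-N
    omega: str  # omega-N: the 3' end nucleotide of R1


def split_r1_three_prime_end(r1: str, s: int, t: int) -> R1ThreePrimeEnd:
    """Split the 3' end of R1 according to the pattern (Nx)s(Ny)t(omega-N)."""
    if s < 1 or t < 0:
        raise ValueError("s must be >= 1 and t must be >= 0")
    if len(r1) < s + t + 1:
        raise ValueError("R1 is shorter than s + t + 1 nucleotides")
    last = len(r1) - 1  # 0-based index of omega-N
    omega = r1[last]
    ny = r1[last - t:last]
    nx = r1[last - t - s:last - t]
    return R1ThreePrimeEnd(nx, ny, omega)
```

For the toy input `split_r1_three_prime_end("CCGACUAG", 4, 1)`, the parts are ‘GACU’, ‘A’ and ‘G’; with t set to 0 the spacer is empty and the paired stretch ends immediately upstream of the ωN.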
In some embodiments, each of s and w is an integer of h which is selected from 2-6, ‘(Nx)h’ and ‘(nx)h’ are reverse complementary, and t is 0-20. In some particular embodiments, the ribozyme core sequence is derived from a group IC1 intron, for example, a Tetrahymena sp. (e.g., T. thermophila) or Pneumocystis sp. group I intron, and t is an integer of 0-20. In some particular embodiments, the ribozyme core sequence comprises or consists of the nucleotide sequence of SEQ ID NO: 17 or SEQ ID NO: 19 or a nucleotide sequence having at least 95% sequence identity thereto, and t is 0.
In some embodiments, R1 comprises a nucleotide sequence ‘N1(Ny)tG’ at its 3′ end, and R2 comprises a nucleotide ‘n1’, wherein ‘G’ is the ωG, ‘N1’, ‘n1’ and ‘Ny’ are each independently any naturally occurring or modified nucleotide, t is an integer of 0-20, and ‘N1’ and ‘n1’ form a base pair. In some embodiments, t is 0. In some embodiments, the ribozyme core sequence is derived from a group IC1 intron, for example, a Tetrahymena sp. (e.g., T. thermophila) or Pneumocystis sp. group I intron, and t is 0.
In some other embodiments, t is an integer of 1-20, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20. In some particular embodiments, for example, wherein the ribozyme core sequence is derived from a group IC3 intron, for example, an Azoarcus sp. group I intron (e.g., derived from Azoarcus sp. strain BH72), t is 1. In some particular embodiments, the ribozyme core sequence is derived from a group IC1 intron, for example, a Tetrahymena sp. (e.g., T. thermophila) group I intron, and t is an integer of 1-10, preferably 1.
In some embodiments, R1 comprises a nucleotide sequence ‘N2N1(Ny)tG’ at its 3′ end, and R2 comprises a nucleotide sequence ‘n1n2’, wherein ‘G’ is the ωG, ‘N1’, ‘n1’, ‘N2’, ‘n2’ and ‘Ny’ are each independently any naturally occurring or modified nucleotide, t is an integer of 0-20, ‘N1’ and ‘n1’ form a first base pair, and ‘N2’ and ‘n2’ form a second base pair. In some embodiments, t is 0 and R1 comprises a nucleotide sequence ‘N2N1G’ at its 3′ end. In some other embodiments, for example, wherein the ribozyme core sequence is derived from a group IC3 intron (e.g., an Azoarcus sp. or Annona cherimola group I intron), t is 1 and R1 comprises a nucleotide sequence ‘N2N1NyG’, wherein ‘Ny’ is any naturally occurring or modified nucleotide; for example, ‘Ny’ is ‘G’, ‘U’ or ‘A’. In some embodiments, the first and second base pairs are each selected from A-U, G-C, G-A, A-A, U-U, A-C, G-U and a combination thereof.
In some embodiments, the 5′ recognizer sequence (R1) may further comprise a 5′ flanking sequence located upstream of the first pairing sequence. In some embodiments, the 3′ recognizer sequence (R2) may further comprise a 3′ flanking sequence located downstream of the second pairing sequence. The 5′ flanking sequence and 3′ flanking sequence may pair with each other to form at least one RNA secondary structure that brings the 5′ and 3′ ends of the RNA construct into proximity. The at least one RNA secondary structure may comprise a double-stranded region formed by base pairing between the 5′ and 3′ flanking sequences, and optionally one or more structures selected from a bulge loop, an interior loop and a hairpin loop. Examples of such RNA secondary structures include but are not limited to stem structures, stem-loop structures and stem-loop alternating structures. The 5′ and 3′ flanking sequences each may independently comprise 1-500 nucleotides, for example, 10-500, 20-400, 30-300, 40-200, 50-100, 60-90 or 70-80 nucleotides. In some embodiments, the 5′ and 3′ flanking sequences each independently comprise 3-400, 4-200, 5-150, 10-100 or 20-50 nucleotides.
The double-stranded region may comprise one or more base pairs, e.g., about 2-500, about 5-100, about 2-50, about 10-50 or about 20-30 base pairs, consecutive or interrupted by one or more mismatches. Preferably, the double-stranded region comprises 2-50 base pairs, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 or 50 base pairs. Preferable examples of the 5′ and 3′ flanking sequences may be homology arm sequences. For example, a double-stranded region can be formed by two homology arm sequences that are substantially reverse complementary.
In some embodiments, the 5′ flanking sequence comprises a 5′ homology arm sequence, and the 3′ flanking sequence comprises a 3′ homology arm sequence, and the 5′ and 3′ homology arm sequences are substantially complementary. In some embodiments, R1 further comprises a 5′ homology arm sequence located upstream of the first pairing sequence and R2 further comprises a 3′ homology arm sequence located downstream of the second pairing sequence, wherein the 5′ and 3′ homology arm sequences are substantially complementary. The 5′ and 3′ homology arm sequences each may independently comprise 5-50 nucleotides, for example, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45 or 50 nucleotides. In an embodiment, the 5′ and 3′ homology arm sequences are reverse complementary. In another embodiment, the 5′ and 3′ homology arm sequences are partially reverse complementary, for example, at least 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or 99% of the nucleotides of the 5′ and 3′ homology arm sequences form base pairs. Preferably, the 5′ and 3′ homology arm sequences share a higher percent of identity to one another's reverse complement than they do to a sequence located within the GOI and/or the ribozyme core sequence, such that formation of a double-stranded region between the 5′ and 3′ homology arm sequences is prioritized.
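The percentage of paired nucleotides between two homology arms can be estimated with a simple ungapped comparison. The Python sketch below is an illustration under simplifying assumptions: the function name is hypothetical, the two arms are required to have equal length, only Watson-Crick pairs are counted, and no bulges or interior loops are modeled. A structure prediction tool would be needed for a realistic assessment.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_paired(arm_5p: str, arm_3p: str) -> float:
    """Percentage of positions that pair when the two arms are aligned
    antiparallel without gaps (both arms written 5'->3')."""
    if not arm_5p or len(arm_5p) != len(arm_3p):
        raise ValueError("arms must be non-empty and of equal length")
    paired = sum(
        (a, b) in WATSON_CRICK for a, b in zip(arm_5p, reversed(arm_3p))
    )
    return 100.0 * paired / len(arm_5p)
```

Fully reverse complementary arms give 100%, and each mismatched position lowers the value proportionally, which can be compared against the percentages listed above.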
In some embodiments, the 5′ and 3′ flanking sequences, alone or in combination, may form one or more structures mimicking the native structures of the group I intron ribozyme. For example, in embodiments wherein the ribozyme core sequence is derived from a Tetrahymena sp. group I intron, the 5′ and 3′ flanking sequences, alone or in combination, may form one or more structures mimicking the native P9 (P9a/9b), P9.1, P9.1a or P9.2 duplex of the group I intron or a combination thereof. Preferably, the 5′ and 3′ flanking sequences in combination form a structure mimicking the P9.2 duplex of the group I intron.
The RNA construct according to the present disclosure can be derived from a group I intron by inserting a nucleotide sequence of interest between a 3′ fragment (corresponding to R1) and a 5′ fragment (corresponding to Ribozyme core-R2) of a group I intron, wherein the 3′ fragment and 5′ fragment in combination retain the self-splicing ability of the group I intron. Upon further investigation, the present inventor unexpectedly discovered that a 3′ end portion (e.g., a sequence from the 5′ half of P9.0 duplex to the 3′ end nucleotide) of a group I intron could be deleted and modified without disrupting the catalytic activity of the group I intron, and that the formation of a duplex-containing structure, which may comprise any sequence, between the 5′ and 3′ ends of the RNA construct is all that is required to facilitate circularization through the self-splicing activity of the ribozyme core.
Accordingly, the present disclosure provides, in another aspect, an RNA construct (Construct 2) comprising, from 5′ end to 3′ end,
The group I intron can be a group I intron as described above. Preferably, the group I intron is a group IC1 (e.g., from Tetrahymena sp. (e.g., T. thermophila, T. cosmopolitanis, T. hyperangularis, T. malaccensis or T. pigmentosa) or Pneumocystis sp. (e.g., Pneumocystis carinii)), IC2, IC3 (e.g., from Anabaena sp. PCC7120 or Azoarcus sp. BH72) or IA2 (e.g., from Bacteriophage Twort) intron. In an embodiment, the group I intron is a Pneumocystis sp. group I intron comprising a nucleotide sequence selected from SEQ ID NOs: 32-36. In another embodiment, the group I intron is a Tetrahymena thermophila group I intron comprising the nucleotide sequence of SEQ ID NO: 12. In another embodiment, the group I intron is an Anabaena sp. group I intron comprising the nucleotide sequence of SEQ ID NO: 49.
‘Np’ and ‘Nq’ are selected such that a P9.0 duplex mimic can be formed between R1 and R2. The first and second nucleotide sequences in combination retain the self-splicing ability of the group I intron, but do not necessarily constitute the full length of the group I intron. For example, the first and second nucleotide sequences in combination may lack one or more duplexes, other than the P9.0 duplex, in the P9 domain of the group I intron. For example, the first and second nucleotide sequences in combination may lack one or more duplexes selected from a P9a duplex, a P9b duplex, a P9.1 duplex, a P9.1a duplex and a P9.2 duplex, when applicable. Preferably, the first and second nucleotide sequences in combination comprise at least one duplex selected from a P9a duplex, a P9b duplex, a P9.1 duplex, a P9.1a duplex and a P9.2 duplex, when applicable.
In some embodiments, the group I intron is a Pneumocystis carinii group I intron comprising the nucleotide sequence of SEQ ID NO: 32, and ‘Np’ and ‘Nq’ are independently selected from any nucleotide from nucleotide 316 (U316) to nucleotide 342 (G342) of SEQ ID NO: 32. In some embodiments, the group I intron is a Tetrahymena thermophila group I intron comprising the nucleotide sequence of SEQ ID NO: 12, and ‘Np’ and ‘Nq’ are independently selected from any nucleotide from nucleotide 313 (A313) to nucleotide 411 (U411) of SEQ ID NO: 12. In some embodiments, the group I intron is an Anabaena sp. group I intron comprising the nucleotide sequence of SEQ ID NO: 49, and ‘Np’ and ‘Nq’ are independently selected from any nucleotide from nucleotide 212 (C212) to nucleotide 243 (G243) of SEQ ID NO: 49.
‘Np’ may be located at any position upstream of ‘Nq’ in the group I intron. In some embodiments, ‘Np’ is located immediately upstream of or adjacent to ‘Nq’ in the group I intron. In some embodiments, ‘Np’ is located immediately upstream of ‘Nq’ in the group I intron. In some other embodiments, ‘Np’ is located several nucleotides (for example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides) upstream of ‘Nq’ in the group I intron.
In embodiments wherein the group I intron does not have a P9.2 duplex, ‘Np’ can be the 3′ end nucleotide of the 5′ half of P9.0 duplex of the group I intron, and ‘Nq’ can be the 5′ end nucleotide of the 3′ half of P9.0 duplex of the group I intron. For example, for a Pneumocystis carinii group I intron comprising the nucleotide sequence of SEQ ID NO: 32, ‘Np’ and ‘Nq’ can be nucleotide 316 (U316) and nucleotide 342 (G342) of SEQ ID NO: 32, respectively. For example, for an Anabaena sp. group I intron comprising the nucleotide sequence of SEQ ID NO: 49, ‘Np’ and ‘Nq’ can be nucleotide 212 (C212) and nucleotide 243 (G243) of SEQ ID NO: 49, respectively.
In some embodiments, ‘Np’ and ‘Nq’ can be independently selected from any nucleotide from the 3′ end nucleotide of the 5′ half to the 5′ end nucleotide of the 3′ half of a duplex of the group I intron, wherein the duplex is not a P9.0 duplex. For example, the duplex can be a P9a, P9b, P9.1, P9.1a or P9.2 duplex. In a preferable embodiment, the duplex is a P9.2 duplex.
In preferable embodiments, ‘Np’ and ‘Nq’ are located within the region connecting the 5′ half and 3′ half of a duplex, wherein the duplex is not a P9.0 duplex. For example, ‘Np’ and ‘Nq’ can be located within the apical loop of a P9a/9b, P9.1, P9.1a or P9.2 duplex. In an embodiment, the group I intron is a Pneumocystis carinii group I intron comprising the nucleotide sequence of SEQ ID NO: 32, and ‘Np’ and ‘Nq’ are independently selected from any nucleotide from nucleotide 325 (G325) to nucleotide 328 (A328) of SEQ ID NO: 32. In another embodiment, the group I intron is a Tetrahymena thermophila group I intron comprising the nucleotide sequence of SEQ ID NO: 12, and ‘Np’ and ‘Nq’ are independently selected from any nucleotide from nucleotide 383 (G383) to nucleotide 386 (A386) of SEQ ID NO: 12. In yet another embodiment, the group I intron is an Anabaena sp. group I intron comprising the nucleotide sequence of SEQ ID NO: 49, and ‘Np’ and ‘Nq’ are independently selected from any nucleotide from nucleotide 219 (A219) to nucleotide 222 (A222) of SEQ ID NO: 49; or ‘Np’ and ‘Nq’ are independently selected from any nucleotide from nucleotide 232 (G232) to nucleotide 235 (A235) of SEQ ID NO: 49.
In yet another embodiment, ‘Np’ is the 3′ end nucleotide of the 5′ half of a duplex and ‘Nq’ is the 5′ end nucleotide of the 3′ half of a duplex, wherein the duplex is not a P9.0 duplex. For example, for a Pneumocystis carinii group I intron comprising the nucleotide sequence of SEQ ID NO: 32, ‘Np’ and ‘Nq’ can be nucleotide 324 (C324) and nucleotide 329 (G329) of SEQ ID NO: 32, respectively. For example, for a Tetrahymena thermophila group I intron comprising the nucleotide sequence of SEQ ID NO: 12, ‘Np’ and ‘Nq’ can be nucleotide 375 (U375) and nucleotide 394 (G394) of SEQ ID NO: 12, respectively; or ‘Np’ and ‘Nq’ can be nucleotide 382 (C382) and nucleotide 387 (G387) of SEQ ID NO: 12, respectively. For example, for an Anabaena sp. group I intron comprising the nucleotide sequence of SEQ ID NO: 49, ‘Np’ and ‘Nq’ can be nucleotide 218 (C218) and nucleotide 223 (G223) of SEQ ID NO: 49, respectively; or ‘Np’ and ‘Nq’ can be nucleotide 231 (C231) and nucleotide 236 (G236) of SEQ ID NO: 49, respectively.
The IGS end of a group I intron can be readily identified by those skilled in the art in view of the present disclosure and the prior art. The second nucleotide sequence (corresponding to Ribozyme core-R2) may comprise a nucleotide sequence lacking the IGS of the group I intron. For example, the second nucleotide sequence may comprise a nucleotide sequence starting from the nucleotide immediately downstream of the 3′ end nucleotide of the IGS of a group I intron.
In some embodiments, the group I intron is a Pneumocystis carinii group I intron comprising the nucleotide sequence of SEQ ID NO: 32. In an embodiment, the second nucleotide sequence comprises a nucleotide sequence starting from nucleotide 18 (G18) to nucleotide 316 (U316) of SEQ ID NO: 32, and the first nucleotide sequence may comprise a nucleotide sequence starting from any nucleotide selected from nucleotide 317 (C317) to nucleotide 342 (G342) to the 3′ end of SEQ ID NO: 32. In another embodiment, the second nucleotide sequence comprises a nucleotide sequence starting from nucleotide 18 (G18) to any nucleotide selected from nucleotide 316 (U316) to nucleotide 341 (U341) of SEQ ID NO: 32, and the first nucleotide sequence may comprise a nucleotide sequence starting from nucleotide 342 (G342) to the 3′ end of SEQ ID NO: 32.
In some embodiments, the group I intron is a Tetrahymena thermophila group I intron comprising the nucleotide sequence of SEQ ID NO: 12. In an embodiment, the second nucleotide sequence may comprise a nucleotide sequence starting from nucleotide 27 (A27) to nucleotide 313 (A313) of SEQ ID NO: 12, and the first nucleotide sequence may comprise a nucleotide sequence starting from any nucleotide selected from nucleotide 314 (C314) to nucleotide 411 (U411) to the 3′ end of SEQ ID NO: 12. In another embodiment, the second nucleotide sequence may comprise a nucleotide sequence starting from nucleotide 27 (A27) to any nucleotide selected from nucleotide 313 (A313) to nucleotide 410 (C410) of SEQ ID NO: 12, and the first nucleotide sequence may comprise a nucleotide sequence starting from nucleotide 411 (U411) to the 3′ end of SEQ ID NO: 12.
In some embodiments, the group I intron is an Anabaena sp. group I intron comprising the nucleotide sequence of SEQ ID NO: 49. In an embodiment, the second nucleotide sequence may comprise a nucleotide sequence starting from nucleotide 12 (C12) to nucleotide 212 (C212) of SEQ ID NO: 49, and the first nucleotide sequence may comprise a nucleotide sequence starting from any nucleotide selected from nucleotide 213 (A213) to nucleotide 243 (G243) to the 3′ end of SEQ ID NO: 49. In another embodiment, the second nucleotide sequence may comprise a nucleotide sequence starting from nucleotide 12 (C12) to any nucleotide selected from nucleotide 212 (C212) to nucleotide 242 (A242) of SEQ ID NO: 49, and the first nucleotide sequence may comprise a nucleotide sequence starting from nucleotide 243 (G243) to the 3′ end of SEQ ID NO: 49.
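By way of a non-limiting illustration, the coordinate ranges recited above can be read as 1-based, inclusive positions on the intron sequence. The following minimal Python sketch shows one way such a split might be computed. The function name `split_intron` and the placeholder sequence are hypothetical; the actual sequences of SEQ ID NOs: 12, 32 and 49 are not reproduced here.

```python
def split_intron(intron: str, second_start: int, split_after: int):
    """Split an intron into (first_nt_seq, second_nt_seq).

    Coordinates are 1-based and inclusive, as in the text:
      second nucleotide sequence = second_start .. split_after
      first nucleotide sequence  = split_after + 1 .. 3' end
    """
    if not (1 <= second_start <= split_after < len(intron)):
        raise ValueError("coordinates fall outside the intron sequence")
    second_nt_seq = intron[second_start - 1:split_after]
    first_nt_seq = intron[split_after:]
    return first_nt_seq, second_nt_seq


# Placeholder only: a 400-nt toy string, NOT an actual group I intron.
toy_intron = "ACGU" * 100

# Same coordinate pattern as the Pneumocystis carinii example
# (second sequence: nucleotides 18-316; first sequence: 317 to the 3' end).
first_seq, second_seq = split_intron(toy_intron, second_start=18, split_after=316)
print(len(second_seq), len(first_seq))
```

Choosing a larger `split_after` value (for example, any position up to 341 in the Pneumocystis carinii example) corresponds to the alternative embodiments in which the second nucleotide sequence extends further toward the 3′ end.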
In some embodiments, the RNA construct further comprises a 5′ homology arm sequence located upstream of the first nucleotide sequence and a 3′ homology arm sequence located downstream of the second nucleotide sequence, wherein the 5′ and 3′ homology arm sequences are as described above.
In some embodiments, the 5′ homology arm sequence comprises the nucleotide sequence of SEQ ID NO: 13, and the 3′ homology arm sequence comprises the nucleotide sequence of SEQ ID NO: 14. In some embodiments, the 5′ homology arm sequence comprises the nucleotide sequence of SEQ ID NO: 15 and the 3′ homology arm sequence comprises the nucleotide sequence of SEQ ID NO: 16. In some embodiments, the 5′ homology arm sequence comprises the nucleotide sequence of SEQ ID NO: 37 and the 3′ homology arm sequence comprises the nucleotide sequence of SEQ ID NO: 38. In some embodiments, the 5′ homology arm sequence comprises the nucleotide sequence of SEQ ID NO: 52 and the 3′ homology arm sequence comprises the nucleotide sequence of SEQ ID NO: 53.
However, the present inventor unexpectedly discovered that an RNA construct having a pair of homology arm sequences located at opposite ends of the RNA construct may achieve a high circularization efficiency comparable to an RNA construct counterpart preserving the native 3′ end sequence of a group I intron. That is, according to some embodiments of the present application, a 3′ end portion of the group I intron (e.g., a sequence from the 5′ half of the P9.0 duplex to the ωG) may be entirely replaced by a pair of homology arm sequences that are placed upstream of the GOI and downstream of the ribozyme core sequence, respectively, without affecting the circularization efficiency. Using homology arm sequences to replace the natural partial sequences of a group I intron offers several advantages, including design simplicity and flexibility. When replacing the 3′ end portion of the group I intron (e.g., a sequence from the 5′ half of the P9.0 duplex to the ωG) with a pair of homology arms, there is no need to add 5′ and 3′ spacers in the GOI region to ensure proper folding of the intron fragments at both ends. Furthermore, changes in the internal GOI sequence do not affect the circularization efficiency or interfere with the structure of the intron fragments at both ends. From a purification standpoint, the increased length difference between the 5′ and 3′ fragments generated after the splicing reaction facilitates their separation and purification. In summary, using homology arm sequences for replacement of a 3′ end portion of a group I intron simplifies design, maintains structural integrity, and enhances purification efficiency.
The RNA construct may further comprise additional nucleotide sequences, for example, a nucleotide sequence useful for replication, transcription, translation and/or purification of the RNA construct, for example, inserted between two elements of the RNA construct as a spacer, or extending at the 5′ and/or 3′ ends of the RNA construct, as long as the self-splicing activity is maintained. Such nucleotide sequences may be conventionally selected by those skilled in the art as needed. In some embodiments, a spacer may be inserted between the 5′ homology arm and the first pairing sequence and/or between the 3′ homology arm and the second pairing sequence. In some embodiments, a spacer may be inserted between the 5′ homology arm and the first nucleotide sequence and/or between the 3′ homology arm and the second nucleotide sequence. In some embodiments, the 3′ end of the RNA construct can be extended with a sequence that will not pair to form a stable secondary structure such as a stem (referred to as “Tail element” in the present disclosure). Such sequences may include but are not limited to a polyadenine (polyA) or polyadenine/cytosine (polyAC) sequence of, for example, 10-200, 20-180, 30-150, 40-120, or 50-100 nucleotides in length. In some embodiments, the RNA construct further comprises a polyA sequence at its 3′ end. The polyA sequence may comprise 10 to 150, preferably more than 20 and less than 100, and more preferably 40 to 70 consecutive adenines. This design can facilitate RNase R digestion of the precursor and can also increase the precursor's length difference versus the circRNA in favor of detection and purification (e.g.,
The nucleotide sequence of interest (GOI) can include but is not limited to the structure elements shown in
In one aspect, the present application provides an RNA construct that may achieve circularization of the nucleotide sequence of interest without inclusion of an exogenous exon fragment, for example, by mimicking the formation of a P1 duplex (P1 duplex mimic). Accordingly, advantageous effects of the present invention may at least include simplicity in design, a broad target sequence compatibility and/or a lower immunogenicity in a host while maintaining a high circularization efficiency. In some embodiments, the circular RNA does not comprise an exogenous exon fragment. For example, both the 3′ and the 5′ ends of the GOI do not comprise a natural exon fragment flanking the group I intron from which the ribozyme core sequence is derived. In some embodiments, the ribozyme core sequence is derived from a Tetrahymena sp. group I intron. In some embodiments, the ribozyme core sequence is derived from a Tetrahymena thermophila group I intron comprising the nucleotide sequence of SEQ ID NO: 12. In some embodiments, the ribozyme core sequence comprises or consists of the nucleotide sequence of SEQ ID NO: 17 or a nucleotide sequence having at least 95% sequence identity thereto. In some embodiments, the ribozyme core sequence is derived from a Pneumocystis sp. group I intron. In some embodiments, the ribozyme core sequence is derived from a Pneumocystis sp. group I intron comprising a nucleotide sequence selected from SEQ ID NOs: 32-36. In some embodiments, the ribozyme core sequence comprises or consists of the nucleotide sequence of SEQ ID NO: 19 or a nucleotide sequence having at least 95% sequence identity thereto.
However, the present inventor unexpectedly discovered that in the cis-splicing system of the present application, for a ribozyme core sequence derived from, for example, an Anabaena sp. group I intron, a natural exon fragment flanking the group I intron may be desirable for a high circularization efficiency. In such cases, optimizing the 3′ end and/or 5′ end sequence of the GOI may be desirable to avoid the introduction of an exogenous exon sequence. This may be achieved by designing the backsplicing site in a non-coding region or by codon optimization of a region in the nucleotide sequence to be circularized that is substantially homologous to an exon-exon junction fragment. In some embodiments, a 5′ end portion of the GOI (that is, a sequence that is downstream and adjacent to the ωN) may be designed to include a sequence that is substantially homologous, for example, at least 80%, 85%, 90%, 95%, 99% or 100% identical to a 5′ end portion of the 3′ exon (downstream exon) of the group I intron. In some embodiments, a 3′ end portion of the GOI (that is, a part of or the entire sequence of the target site and optionally its upstream sequence) may be designed to include a sequence that is substantially homologous, for example, at least 80%, 85%, 90%, 95%, 99% or 100% identical to a 3′ end portion of the 5′ exon (upstream exon) of the group I intron. In certain embodiments, the structure formed by the 5′ and 3′ termini of the GOI resembles the exon sequence structure found on both sides of the natural group I intron, where the 5′ and 3′ termini of the GOI can form an internal duplex. This structure may be introduced independently or integrated with the homologous sequences in the GOI. See for example, Chu-Xiao Liu et al., 2022, Molecular Cell, 82 (2): 420-434, for further description.
The present inventor further unexpectedly discovered that for a ribozyme core sequence derived from, for example, a Tetrahymena sp. group I intron or a Pneumocystis sp. group I intron, a high circularization efficiency may be achieved without the incorporation of a natural exon fragment. Accordingly, in some embodiments of the present application, a ribozyme core sequence derived from a Tetrahymena sp. group I intron or a Pneumocystis sp. group I intron as described herein may be preferable.
The backsplicing site can theoretically be set at any matching position of a nucleotide sequence to be circularized. In some embodiments, the backsplicing site can be designed inside the IRES (e.g., a sequence of ‘nnnnnu’ or ‘nnnnnc’ inside the IRES can be selected as the target site sequence). After circularization, IRES fragments at both ends of GOI can be reconnected to form a complete IRES sequence, as shown in
In some embodiments, the IGS and the target site form a P1 duplex mimic. The P1 duplex mimic may comprise a Watson-Crick base pair, a wobble base pair or a combination thereof. The P1 duplex mimic may comprise at least one base pair. For example, the P1 duplex mimic may comprise 1-20 base pairs, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base pairs. Preferably, the P1 duplex mimic comprises a substantially identical number of base pairs to that of the P1 duplex of the group I intron from which the ribozyme core sequence is derived. Those skilled in the art would be able to determine essential features for a P1 duplex mimic in view of the present disclosure and the prior art.
In some embodiments, the IGS has the structure of 5′-X(N)m-3′, and the target site has the structure of 5′-(n)mx-3′, wherein
In some embodiments, m is an integer of 3-6. In some embodiments, m is an integer of 4-5. In a particular embodiment, m is 5.
In some embodiments, the base pairs formed between 5′-(N)m-3′ and 5′-(n)m-3′ comprise a Watson-Crick base pair, a wobble base pair or a combination thereof. In some embodiments, 5′-(N)m-3′ and 5′-(n)m-3′ are reverse complementary.
In some embodiments, the IGS comprises a sequence ‘GNNNNN’ and the target site comprises a sequence ‘nnnnnu’. In some embodiments, the IGS comprises a sequence ‘ANNNNN’ and the target site comprises a sequence ‘nnnnnc’. In some embodiments, ‘NNNNN’ and ‘nnnnn’ are reverse complementary.
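As a non-limiting illustration of the pairing rule described above, an IGS may be derived computationally from a chosen target site. The sketch below assumes the embodiment in which ‘NNNNN’ and ‘nnnnn’ are strictly reverse complementary, and uses the two X/x combinations recited above (G with u, and A with c). The function names and the example target site are hypothetical.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# 5' nucleotide X of the IGS for each 3' nucleotide x of the target site,
# following the 'GNNNNN'/'nnnnnu' and 'ANNNNN'/'nnnnnc' embodiments.
X_FOR_TARGET_END = {"U": "G", "C": "A"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def design_igs(target_site: str) -> str:
    """Derive an IGS of the form 5'-X(N)m-3' from a target site 5'-(n)m x-3'."""
    site = target_site.upper()
    body, last = site[:-1], site[-1]
    if last not in X_FOR_TARGET_END:
        raise ValueError("target site must end in 'u' or 'c' in this sketch")
    return X_FOR_TARGET_END[last] + reverse_complement(body)


print(design_igs("acggau"))  # GUCCGU
print(design_igs("acggac"))  # AUCCGU
```

Embodiments in which the P1 duplex mimic contains wobble pairs within ‘NNNNN’/‘nnnnn’ are not covered by this strict reverse-complement sketch.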
In some embodiments, the RNA construct may further comprise a linker sequence located between the target site and IGS. The linker sequence can include, but is not limited to, the sequence elements as shown in
In some embodiments, the linker sequence comprises an unpaired sequence. The unpaired sequence may form a loop structure between the target site and the IGS. In some embodiments, the linker sequence comprises an unpaired sequence, wherein the target site, the linker sequence and the IGS form a stem-loop structure. In some embodiments, the stem portion of the stem-loop structure may comprise at least two base pairs, for example, 2-20 base pairs, for example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 base pairs or more. In some embodiments, the loop portion of the stem-loop structure may comprise at least 3 nucleotides, for example, 3-50 nucleotides, for example, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25-50, 30-45 or 35-40 nucleotides. The stem-loop structure may also have on either side of the stem one or more bulges (mismatches). The unpaired sequence may comprise at least 3 nucleotides, for example, 3-50 nucleotides, for example, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25-50, 30-45 or 35-40 nucleotides. Examples of an unpaired sequence may be a polyA or polyU sequence.
The IGS (e.g., ‘GNNNNN’ or ‘ANNNNN’) can extend 1 to 3 nucleotides at the 5′ end and form a P1 extension (P1-ex) mimic with 1 to 3 nucleotides adjacent to the target site (e.g., ‘nnnnnu’ or ‘nnnnnc’, respectively) at the 3′ end of GOI. In some embodiments, the linker sequence comprises, from 5′ to 3′ end, a third pairing sequence, a loop sequence, and a fourth pairing sequence, wherein the third and fourth pairing sequences form a P1 extension mimic. The P1 extension mimic may comprise a Watson-Crick base pair, a wobble base pair or a combination thereof. The P1 extension mimic may comprise 1, 2, 3, 4, 5, 6, or more base pairs, preferably 1-3 base pairs. In some embodiments, the P1 extension mimic comprises 1-3 reverse complementary base pairs. In some embodiments, the third pairing sequence comprises a sequence of 1-3 contiguous nucleotides, which is reverse complementary to a sequence of the same number of contiguous nucleotides in the fourth pairing sequence to form a P1 extension mimic.
The RNA construct may further comprise a fifth pairing sequence which can pair with a sequence in the GOI which is adjacent to the ωN (e.g., ωG in some embodiments) to simulate the formation of a P10 duplex (also referred to as a “P10 duplex mimic”). In some embodiments, the linker sequence comprises a fifth pairing sequence which can pair with a sixth pairing sequence in the 5′ region of the nucleotide sequence of interest to form a P10 duplex mimic. The P10 duplex mimic may comprise a Watson-Crick base pair, a wobble base pair or a combination thereof. The P10 duplex mimic may comprise at least two consecutive base pairs, for example, 3-10 base pairs, preferably 3, 4, 5, 6, 7, or 8 base pairs. In some embodiments, the P10 duplex mimic comprises 3-10 reverse complementary base pairs. In some embodiments, the fifth pairing sequence comprises a sequence of 3-10 contiguous nucleotides, which is reverse complementary to a sequence of the same number of contiguous nucleotides in the sixth pairing sequence to form a P10 duplex mimic.
The sixth pairing sequence may be located adjacent to the 5′ end of the nucleotide sequence of interest, for example, starting from the nucleotide immediately downstream of the ωN (i.e., starting from nucleotide 1 of the nucleotide sequence of interest, which is the nucleotide at the ωN+1 position in the RNA construct) or starting from a few nucleotides downstream of the ωN (for example, starting from nucleotide 2 or 3 of the nucleotide sequence of interest, which is the nucleotide at the ωN+2 or ωN+3 position in the RNA construct). In some embodiments, the sixth pairing sequence starts from a nucleotide at a ωN+r position in the RNA construct, wherein r is an integer greater than or equal to 1, for example, r is an integer of 1-50, 10-40, or 20-30, for example, r is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20. In some preferable embodiments, the sixth pairing sequence starts from the nucleotide at the ωN+1 position in the RNA construct. In some embodiments, ωN is guanine.
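For illustration only, the relationship between the ωN+r position and the pairing sequences of a P10 duplex mimic may be sketched as follows. The sketch assumes the embodiment in which the fifth pairing sequence is strictly reverse complementary to the sixth pairing sequence. The function name `p10_pairing` and the example GOI fragment are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def p10_pairing(goi: str, r: int = 1, length: int = 5):
    """Return (fifth_pairing_seq, sixth_pairing_seq) for a P10 duplex mimic.

    goi    : nucleotide sequence of interest, 5' to 3'; its nucleotide 1
             sits at the wN+1 position of the RNA construct.
    r      : the sixth pairing sequence starts at the wN+r position (r >= 1).
    length : number of contiguous base pairs (at least 2 per the text).
    """
    goi = goi.upper()
    if r < 1 or length < 2 or r - 1 + length > len(goi):
        raise ValueError("r/length incompatible with the GOI")
    sixth = goi[r - 1:r - 1 + length]
    fifth = "".join(COMPLEMENT[base] for base in reversed(sixth))
    return fifth, sixth


# Hypothetical GOI 5' region; r = 1 corresponds to the wN+1 position.
print(p10_pairing("AUGGCUAGCAA", r=1, length=5))  # ('GCCAU', 'AUGGC')
```

The returned fifth pairing sequence would be placed in the linker sequence, for example at a 3′ end portion of the loop sequence, as described in the embodiments that follow.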
In some embodiments, the RNA construct comprises sequences for a P1 extension mimic but not a P10 duplex mimic. In some embodiments, the linker sequence comprises, from 5′ to 3′ end, a third pairing sequence, a loop sequence, and a fourth pairing sequence, wherein the third and fourth pairing sequences form a P1 extension mimic, and part or all of a 3′ end portion of the linker sequence does not pair with a sequence in the 5′ region of the GOI.
In some embodiments, the RNA construct comprises sequences for a P10 duplex mimic but not a P1 extension mimic. In some embodiments, the linker sequence comprises a loop sequence, and a 3′ end portion of the loop sequence constitutes a fifth pairing sequence which can pair with a sixth pairing sequence in the 5′ region of the GOI to form a P10 duplex mimic.
In some embodiments, the RNA construct comprises sequences for a P1 extension mimic and a P10 duplex mimic. In some embodiments, the fifth pairing sequence for the P10 duplex mimic and the fourth pairing sequence for the P1 extension mimic partially overlap. In some embodiments, the linker sequence comprises, from 5′ to 3′ end, a third pairing sequence, a loop sequence and a fourth pairing sequence, wherein the third and fourth pairing sequences form a P1 extension mimic, and a 3′ end portion of the loop sequence and a 5′ end portion or the entirety of the fourth pairing sequence constitute a fifth pairing sequence which can pair with a sixth pairing sequence in the 5′ region of the GOI to form a P10 duplex mimic.
In some embodiments, the RNA construct has the structure of the following:
In some embodiments, t is an integer of 0-10.
In some embodiments, t is 0 and the RNA construct has the structure of the following:
In some embodiments, the IGS comprises a sequence ‘GNNNNN’ and the target site comprises a sequence ‘nnnnnu’. In some embodiments, the IGS comprises a sequence ‘ANNNNN’ and the target site comprises a sequence ‘nnnnnc’. In some embodiments, ‘NNNNN’ and ‘nnnnn’ are reverse complementary.
The nucleotide sequence to be circularized can be split into a 5′ fragment ended with the selected target site and a 3′ fragment comprising the remaining sequence. The nucleotide sequence of interest may be formed by placing the 3′ fragment at the 5′ region and the 5′ fragment at the 3′ region of the GOI. In some embodiments, the circular RNA is formed by connecting the nucleotide immediately downstream of the ωN (i.e., the nucleotide at the ωN+1 position in the RNA construct) and the 3′ end nucleotide of the target site through the self-splicing of the RNA construct. Accordingly, in some embodiments, the circular RNA may substantially consist of the nucleotide sequence of interest. In some embodiments, the circular RNA is formed by connecting the nucleotide at ωN+1 position and the 3′ end nucleotide of the target site in the RNA construct. In some embodiments, ωN is guanine.
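The permutation described above may be illustrated with the following minimal sketch, which splits a sequence to be circularized at the 3′ end of a selected target site, places the 3′ fragment ahead of the 5′ fragment, and confirms that joining the ends of the resulting GOI restores the original sequence as a circle. The function names and the toy sequence are hypothetical and are not taken from the sequence listing.

```python
def permute_for_circularization(seq: str, site_end: int, site_len: int = 6):
    """Build the GOI by swapping the fragments around a target site.

    seq      : nucleotide sequence to be circularized, 5' to 3'.
    site_end : 1-based position of the 3' end nucleotide of the target site.
    site_len : length of the target site (6 for 'nnnnnu' / 'nnnnnc').
    Returns (goi, target_site).
    """
    seq = seq.upper()
    if not (site_len <= site_end < len(seq)):
        raise ValueError("target site must lie inside the sequence")
    target_site = seq[site_end - site_len:site_end]
    if target_site[-1] not in "UC":
        raise ValueError("this sketch expects a site ending in 'u' or 'c'")
    five_fragment = seq[:site_end]    # ends with the target site
    three_fragment = seq[site_end:]   # remaining sequence
    return three_fragment + five_fragment, target_site


def same_circle(a: str, b: str) -> bool:
    """True if a and b are rotations of one another (same circular RNA)."""
    return len(a) == len(b) and a in b + b


toy = "GGGAAACCCUUUAGC"  # placeholder sequence for illustration only
goi, site = permute_for_circularization(toy, site_end=12)
print(goi, site, same_circle(toy, goi))  # AGCGGGAAACCCUUU CCCUUU True
```

In this sketch the first nucleotide of `goi` corresponds to the ωN+1 position and the last nucleotide of `goi` is the 3′ end nucleotide of the target site, which are the two nucleotides connected by self-splicing.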
In some embodiments, the circular RNA comprises a noncoding sequence having a biological activity. Examples of a noncoding sequence having a biological activity include, but are not limited to, a micro RNA and a long non-coding (lnc) RNA. In some embodiments, the circular RNA comprises a protein-coding sequence. The protein-coding sequence may encode any protein, for example, a protein for therapeutic or diagnostic use. In some embodiments, the protein-coding sequence encodes an antibody.
When the circular RNA comprises a protein-coding sequence, the circular RNA may further comprise sequences necessary for translation, e.g., an internal ribosomal entry site (IRES) sequence upstream of the protein-coding sequence. In some embodiments, the IRES sequence is intact within the nucleotide sequence of interest. In some embodiments, the IRES sequence is split to the 5′ and 3′ ends of the nucleotide sequence of interest and connected after circularization (e.g.,
The nucleotide sequence of interest may comprise at least two protein-coding regions such that at least two different proteins can be expressed from the circular RNA. For example, a 2A or 2A-like sequence may be included between two protein-coding sequences to mediate co-translation of two proteins (also referred to as “Stop-Carry On” or “StopGo” translation). See, for example, de Lima JGS, Lanza DCF. 2A and 2A-like Sequences: Distribution in Different Virus Species and Applications in Biotechnology. Viruses. 2021 Oct. 26; 13 (11): 2160. Alternatively, two or more different IRES sequences may be used to drive the expression of two or more different protein-coding regions.
The RNA construct may comprise unmodified or modified nucleotides. In some embodiments, the RNA construct does not comprise uridine, but comprises nucleosides selected from pseudouridine (Ψ), 1-methylpseudouridine (1 mΨ), 5-methoxyuridine (5 moU), 2-thiouridine, or 4-thiouridine in place of uridine. In some embodiments, the RNA construct comprises 10%-100%, for example, 10%-90%, 20-80%, 30%-70%, 40-60%, or 50%-60% modified uridine in place of uridine, wherein the modified uridine is selected from pseudouridine (Ψ), 1-methylpseudouridine (1 mΨ), 5-methoxyuridine (5 moU), 2-thiouridine, or 4-thiouridine.
The circular RNA may be of any length. For example, the circular RNA may comprise about 200-10,000 nucleotides (e.g., about 300, about 400, about 500, about 600, about 700, about 800, about 900, about 1,000, about 2,000, about 3,000, about 4,000, about 5,000, about 6,000, about 7,000, about 8,000, or about 9,000 nucleotides, or a range defined by any two of the foregoing values). In some embodiments, the circular RNA comprises about 500-6,000 nucleotides (e.g., about 550, about 650, about 750, about 850, about 950, about 1,100, about 1,200, about 1,300, about 1,400, about 1,500, about 1,600, about 1,700, about 1,800, about 1,900, about 2,100, about 2,200, about 2,300, about 2,400, about 2,500, about 2,600, about 2,700, about 2,800, about 2,900, about 3,100, about 3,300, about 3,500, about 3,700, about 3,800, about 3,900, about 4,100, about 4,300, about 4,500, about 4,700, about 4,900, about 5,100, about 5,300, about 5,500, about 5,700, or about 5,900 nucleotides, or a range defined by any two of the foregoing values).
The disclosure provides, in a first aspect, an RNA construct (Construct 1) comprising,
For example, the present disclosure provides:
The present disclosure provides, in a second aspect, an RNA construct (Construct 2) comprising, from 5′ end to 3′ end,
For example, the present disclosure provides:
The present disclosure further provides:
In a particular embodiment, the RNA construct of the present disclosure has a sequence selected from:
The RNA construct of the present disclosure may be synthesized in vivo or in vitro by transcription of a template DNA. For example, the DNA template may comprise a promoter upstream of the region that encodes the RNA construct. The promoter may be selected to enable transcription of the RNA construct in prokaryotic or eukaryotic cells. The promoter is recognized by an RNA polymerase, for example a T7 promoter, which is recognized by T7 virus RNA polymerase. In some embodiments, the promoter is a T7 promoter and the RNA polymerase is a T7 virus RNA polymerase; or the promoter is a T6 promoter, and the polymerase is a T6 virus RNA polymerase; or the promoter is an SP6 virus RNA polymerase promoter and the polymerase is SP6 virus RNA polymerase; or the promoter is T3 virus RNA polymerase promoter and the polymerase is T3 virus RNA polymerase; or the promoter is T4 virus RNA polymerase promoter and the polymerase is T4 virus RNA polymerase. In certain embodiments, the RNA polymerase promoter is a T7 virus RNA polymerase promoter and the polymerase is a T7 virus RNA polymerase. Other examples of promoters may include but are not limited to cytomegalovirus (CMV) immediate early promoter, eukaryotic translation elongation factor 1 α (EF-1α) promoter, simian virus 40 (SV40) promoter, U6 promoter, H1 promoter, chicken β-actin (CBA) promoter and human phosphoglycerate kinase 1 (hPGK) promoter.
The template DNA may be linear or circular. In some embodiments, the template DNA is prepared by linearizing a DNA plasmid, e.g., by a restriction enzyme. In other embodiments, the template is circular (e.g., a DNA plasmid). The template DNA may comprise an RNA polymerase terminator sequence element downstream of the region that encodes the RNA construct, especially when the template DNA is circular.
The template DNA comprises a sequence encoding the RNA construct, which as described above, is a linear RNA molecule that can self-splice, thereby producing a circular RNA (circRNA). The RNA construct contains the circRNA sequence plus splicing sequences (e.g., ribozyme core sequence and 5′ and 3′ recognizer sequences) necessary to circularize the RNA. These splicing sequences are removed from the RNA construct during the circularization, leaving a circRNA comprising the nucleotide sequence of interest. In some embodiments, the nucleoside moieties in the RNA construct are naturally occurring nucleosides, e.g., adenosine, guanosine, cytidine and uridine. In other embodiments, the nucleoside moieties in the RNA construct comprise nucleosides in addition to or in place of adenosine, guanosine, cytidine and uridine; for example the nucleosides comprise pseudouridine (Ψ), 1-methylpseudouridine (1 mΨ), 2-thiouridine, 4-thiouridine, 5-methoxyuridine (5 moU), 5-methylcytidine, N6-methyladenosine, inosine or a combination thereof, for example where uridine is replaced with pseudouridine, 1-methylpseudouridine, 2-thiouridine, 4-thiouridine or 5-methoxyuridine (5 moU), and/or cytidine is replaced with 5-methylcytidine, and/or adenosine is replaced with N6-methyladenosine, and/or guanosine is replaced with inosine.
In some embodiments, the DNA template comprises a promoter recognized by an RNA polymerase operably linked to a sequence encoding an RNA construct as described above. As used herein, the phrase “operably linked” means that the elements are positioned on the DNA template such that the RNA construct can be synthesized by in vitro or in vivo transcription of the template DNA. The RNA construct can then form the desired circRNA, e.g., using the methods disclosed herein.
The disclosure thus further provides a DNA construct, e.g., a plasmid, comprising a sequence encoding the RNA construct of the present disclosure, operably linked to a promoter.
The disclosure further provides methods for production of a circRNA by (i) in vitro transcription of a DNA construct, e.g., a plasmid, comprising a sequence encoding the RNA construct of the present disclosure, and (ii) circularization (i.e., self-splicing) of the RNA construct thus transcribed, in a buffered reaction solution comprising magnesium and ingredients required for in vitro transcription, e.g., an RNA polymerase, an RNase inhibitor, ATP, GTP, CTP, UTP, DTT, and a monovalent cation (Na+ or K+). Optionally, this method is carried out in one step, without a need to purify the RNA construct before allowing the RNA construct to self-splice. In other words, the in vitro transcription and the circularization occur in the same reaction solution at the same reaction conditions (e.g., temperature). Therefore, the reaction solution and reaction conditions must be optimized for the efficiency of both in vitro transcription and circularization.
As is shown in the examples below, efficient self-splicing and release of the circRNA require an optimal concentration of magnesium ion. In some embodiments, the reaction solution comprises Mg2+ at a concentration greater than 26 mM, e.g., greater than 30 mM or greater than 35 mM. In some embodiments, the concentration of Mg2+ in the solution is from 30 mM to 100 mM, e.g., from 30 mM to 90 mM, from 30 mM to 80 mM, from 30 mM to 70 mM, from 30 mM to 60 mM, from 30 mM to 50 mM, from 30 mM to 40 mM, from 35 mM to 100 mM, from 35 mM to 90 mM, from 35 mM to 80 mM, from 35 mM to 70 mM, from 35 mM to 60 mM, from 35 mM to 50 mM, from 35 mM to 40 mM, from 38 mM to 66 mM, e.g., about 38 mM. In certain embodiments, the concentration of Mg2+ in the solution is from 38 mM to 66 mM.
In some embodiments, the reaction solution comprises a pyrophosphatase at a concentration of from 1 U/ml to 5 U/ml, e.g., from 1 U/ml to 4 U/ml, from 1.5 U/ml to 3 U/ml, from 1.5 U/ml to 2.5 U/ml, about 1 U/ml, about 2 U/ml, or about 4 U/ml. As used herein, 1 U (unit) of pyrophosphatase is defined as the amount of enzyme that generates 1 μmol of phosphate per minute from inorganic pyrophosphate under standard reaction conditions (a 10-minute reaction at 25° C. in 20 mM Tris-HCl, pH 8.0, 2 mM MgCl2 and 2 mM PPi).
The reaction solution further comprises ingredients required for in vitro transcription. In some embodiments, the reaction solution comprises an RNA polymerase, an RNase inhibitor, ATP, GTP, CTP, UTP, DTT, and a monovalent cation (Na+ or K+). In certain embodiments, the reaction solution comprises about 5 U/μl RNA polymerase, about 1 U/μl RNase inhibitor, about 10 mM ATP, about 10 mM GTP, about 10 mM CTP, about 10 mM UTP, about 10 mM DTT, and 5 mM monovalent cation (Na+ or K+). The reaction solution may comprise a buffer. The pH of the reaction solution may be from 6 to 8, e.g., from 7 to 8, or about 7.5.
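As a worked, non-limiting example of preparing such a reaction solution, the volume of each stock solution needed to reach a stated final concentration follows from the dilution relation C1·V1 = C2·V2. In the sketch below, the final concentrations are taken from the embodiments above (with Mg2+ at about 38 mM), whereas the stock concentrations and the 20 μl reaction volume are assumptions chosen purely for illustration. Enzyme components expressed in U/μl are omitted.

```python
def stock_volume_ul(final_mM: float, stock_mM: float, reaction_ul: float) -> float:
    """Volume of stock (in ul) giving final_mM in a reaction of reaction_ul."""
    if stock_mM <= final_mM:
        raise ValueError("stock must be more concentrated than the final value")
    return final_mM * reaction_ul / stock_mM


# Final concentrations (mM) taken from the embodiments described above.
FINAL_mM = {"ATP": 10, "GTP": 10, "CTP": 10, "UTP": 10,
            "DTT": 10, "MgCl2": 38, "NaCl": 5}

# ASSUMED stock concentrations (mM); not specified in the disclosure.
STOCK_mM = {"ATP": 100, "GTP": 100, "CTP": 100, "UTP": 100,
            "DTT": 100, "MgCl2": 1000, "NaCl": 500}

REACTION_UL = 20.0  # ASSUMED reaction volume

volumes = {name: stock_volume_ul(FINAL_mM[name], STOCK_mM[name], REACTION_UL)
           for name in FINAL_mM}

for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} ul")
```

With these assumed stocks, each NTP and DTT would be added at 2 μl and MgCl2 at 0.76 μl per 20 μl reaction; other stock concentrations scale the volumes proportionally.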
The RNA construct may be unmodified, partially modified or completely modified. In some embodiments, the RNA construct is unmodified, i.e., contains only naturally occurring nucleotides. In other embodiments, the RNA construct is partially modified or completely modified. A part or all of at least one ribonucleoside triphosphate in the reaction solution may be replaced with a modified nucleoside triphosphate in order to synthesize partially modified or completely modified RNA construct. Examples of modified nucleoside triphosphate include, but are not limited to, pseudouridine-5′-triphosphate, 1-methylpseudouridine-5′-triphosphate, 2-thiouridine-5′-triphosphate, 4-thiouridine-5′-triphosphate and 5-methylcytidine-5′-triphosphate.
RNA polymerase used for in vitro transcription may be chosen based on the RNA polymerase promoter in the DNA template. For example, if the RNA polymerase promoter in the DNA template is a T7 virus RNA polymerase promoter, the reaction solution may comprise a T7 RNA polymerase. In some embodiments, the reaction solution comprises an RNA polymerase selected from T7 virus RNA polymerase, T6 virus RNA polymerase, SP6 virus RNA polymerase, T3 virus RNA polymerase, or T4 virus RNA polymerase. In certain embodiments, the RNA polymerase promoter in the DNA template is a T7 virus RNA polymerase promoter and the reaction solution comprises a T7 virus RNA polymerase.
In some embodiments, the in vitro transcription of the template DNA and the circularization (i.e., self-splicing) of the RNA construct are carried out at a temperature of from 37° C. to 55° C., e.g., from 39° C. to 55° C., from 41° C. to 55° C., from 43° C. to 55° C., from 37° C. to 50° C., from 39° C. to 50° C., from 41° C. to 50° C., from 43° C. to 50° C., from 37° C. to 47° C., from 39° C. to 47° C., from 41° C. to 47° C., from 43° C. to 47° C., from 47° C. to 55° C., from 50° C. to 55° C., from 39° C. to 43° C., about 37° C., about 39° C., about 41° C., about 43° C., about 47° C., about 53° C., or about 55° C. It has been found that the production of a major by-product, dsRNA, is reduced with increasing temperature. dsRNA can be recognized by cytosolic sensors such as RIG-I and MDA5 and then activate the innate immune system (Wu et al., 2020, “Synthesis of low immunogenicity RNA with high-temperature in vitro transcription,” RNA 26, 345-360; Olejniczak, 2010, “Sequence-non-specific effects of RNA interference triggers and microRNA regulators,” Nucleic Acids Res 38, 1-16). Since dsRNA production should be reduced as much as possible, a temperature higher than 37° C. is preferred. In some embodiments, the in vitro transcription of the template DNA and the circularization (i.e., self-splicing) of the RNA construct are carried out at a temperature higher than 37° C., e.g., from 39° C. to 55° C., from 41° C. to 55° C., from 43° C. to 55° C., from 39° C. to 50° C., from 41° C. to 50° C., from 43° C. to 50° C., from 39° C. to 47° C., from 41° C. to 47° C., from 43° C. to 47° C., from 47° C. to 55° C., from 50° C. to 55° C., from 39° C. to 43° C., about 39° C., about 41° C., about 43° C., about 47° C., about 53° C., or about 55° C.
A genetically modified RNA polymerase exhibiting increased thermostability (e.g., T7 Toyobo) may be preferred if the in vitro transcription of the template DNA and the circularization (i.e., self-splicing) of the RNA construct are carried out at a high temperature. In some embodiments, the in vitro transcription of the template DNA and the circularization (i.e., self-splicing) of the RNA construct are carried out at a temperature of from 47° C. to 55° C., e.g., from 50° C. to 55° C., about 47° C., about 53° C., or about 55° C., and the RNA polymerase is a thermostable polymerase (e.g., T7 Toyobo).
The in vitro transcription of the template DNA and the circularization (i.e., self-splicing) of the RNA construct may be carried out for at least 1 hour, e.g., at least 1.5 hours, at least 2.5 hours, at least 3 hours, from 1 hour to 3 hours, from 1.5 hours to 3 hours, from 2 hours to 3 hours, or from 2.5 hours to 3 hours. A reaction time of no less than 1.5 hours is preferred to ensure sufficient circularization. On the other hand, prolonging the reaction time may increase by-products. Therefore, the optimal reaction duration of the one-step process may be 2.5-3 hours. In a preferred embodiment, the in vitro transcription of the template DNA and the circularization (i.e., self-splicing) of the RNA construct are carried out for 2.5-3 hours.
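The temperature and duration windows described above can be summarized as a small check. The following is only an illustrative sketch: the function name `assess_conditions` and the returned keys are hypothetical, and the thresholds are taken directly from the ranges stated in the preceding paragraphs (37° C. to 55° C. overall, above 37° C. preferred, 47° C. to 55° C. with a thermostable polymerase, at least 1.5 hours, 2.5-3 hours optimal).

```python
def assess_conditions(temp_c: float, hours: float) -> dict:
    """Compare a one-step IVT/circularization setup with the disclosed windows.

    Hypothetical helper: the thresholds mirror the text, not a validated protocol.
    """
    return {
        # Overall disclosed temperature range: 37-55 degrees C.
        "within_disclosed_temperature_range": 37.0 <= temp_c <= 55.0,
        # Temperatures above 37 degrees C are preferred to reduce dsRNA.
        "above_37C_preferred": 37.0 < temp_c <= 55.0,
        # At 47-55 degrees C a thermostable polymerase may be preferred.
        "thermostable_polymerase_suggested": 47.0 <= temp_c <= 55.0,
        # At least 1.5 hours is preferred for sufficient circularization.
        "meets_minimum_time": hours >= 1.5,
        # 2.5-3 hours is described as the optimal duration.
        "in_optimal_time_window": 2.5 <= hours <= 3.0,
    }


if __name__ == "__main__":
    for temp_c, hours in [(37.0, 3.0), (50.0, 2.75), (43.0, 1.0)]:
        print(temp_c, hours, assess_conditions(temp_c, hours))
```

The check only restates the numeric windows; it does not model yield or by-product formation.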
In some embodiments, the method further comprises a step of removing the DNA template after the self-splicing of the RNA construct. The DNA template may be removed by adding DNase I, e.g., for 30 min at 37° C.
In some embodiments, the method further comprises a step of purifying the circular RNA after the self-splicing of the RNA construct or after the step of removing the DNA template, if the method comprises a step of removing the DNA template. In some embodiments, the purification step is selected from a precipitation step, a tangential flow filtration step, a chromatographic step, and a combination thereof. The precipitation step may be an alcoholic precipitation step or a LiCl precipitation step. The tangential flow filtration step may be a diafiltration step using tangential flow filtration and/or a concentration step using tangential flow filtration. The chromatographic step may be selected from HPLC, anion exchange chromatography, affinity chromatography, hydroxyapatite chromatography, magnetic bead chromatography and core bead chromatography. In some embodiments, the purification step comprises a precipitation step, e.g., LiCl precipitation. In other embodiments, the purification step comprises a chromatographic step, e.g., magnetic bead chromatography.
The disclosure thus provides, in an aspect, a method of preparing a circular RNA (Method 1), comprising (i) providing a template DNA, wherein the template DNA comprises a sequence encoding the RNA construct of the present disclosure, operably linked to a promoter, in a reaction solution, thereby allowing synthesis of the RNA construct by in vitro transcription of the template DNA and allowing the RNA construct to self-splice, to produce a circular RNA, and (ii) recovering the circular RNA thus produced.
For example, the invention includes:
The disclosure further provides a circular RNA obtained by Method 1, et seq.
The disclosure further provides a pharmaceutical composition comprising a circular RNA obtained by any of Methods 1, et seq., e.g., a lipid nanoparticle comprising a circular RNA obtained by Method 1, et seq.
The disclosure further provides a pharmaceutical composition comprising a vector containing DNA expressing the RNA construct of the present disclosure.
The disclosure provides the following exemplary embodiments.
Embodiment 1. An RNA construct comprising, from 5′ end to 3′ end,
Embodiment 2. The RNA construct according to embodiment 1, wherein the ribozyme core comprises a nucleotide sequence encoding the scaffold domain and catalytic domain of a group I intron; preferably, the ribozyme core comprises or consists of the sequence from the IGS end to the sequence before the P9.0 duplex of a group I intron.
Embodiment 3. The RNA construct according to embodiment 1 or 2, wherein the ribozyme core is derived from a group IC1 (e.g., from Tetrahymena sp. (e.g., T. thermophila, T. cosmopolitanis, T. hyperangularis, T. malaccensis or T. pigmentosa) or Pneumocystis sp. (e.g., Pneumocystis carinii)), IC2, IC3 (e.g., from Anabaena sp. PCC7120 or Azoarcus sp. BH72) or IA2 (e.g., from Bacteriophage Twort) intron.
Embodiment 4. The RNA construct according to embodiment 1 or 2, wherein the ribozyme core is derived from a Pneumocystis sp. group I intron; for example, a Pneumocystis sp. group I intron comprising a nucleotide sequence selected from SEQ ID NOs: 32-36; preferably, the ribozyme core is derived from a Pneumocystis carinii group I intron comprising the nucleotide sequence of SEQ ID NO: 32, for example, the ribozyme core comprises or consists of the nucleotide sequence of SEQ ID NO: 19 or a nucleotide sequence having at least 95% sequence identity thereto.
Embodiment 5. The RNA construct according to embodiment 1 or 2, wherein the ribozyme core is derived from a Tetrahymena sp. group I intron; for example, a Tetrahymena thermophila group I intron comprising the nucleotide sequence of SEQ ID NO: 12, for example, the ribozyme core comprises or consists of the nucleotide sequence of SEQ ID NO: 17 or a nucleotide sequence having at least 95% sequence identity thereto.
Embodiment 6. The RNA construct according to any one of embodiments 1-5, wherein the duplex-containing structure comprises one or more base pairs.
Embodiment 7. The RNA construct according to any one of embodiments 1-6, wherein the first and second pairing sequences each independently comprises 2-100 nucleotides; for example, the first pairing sequence comprises 2-20, 2-12, 4-10, 6, 7 or 8 nucleotides; and/or the second pairing sequence comprises 2-100, 5-80, 8-60, 10-50, 15, 20, 30, 40, 50, 60, 70, 80, 90 or 100, preferably 5-80 or 8-60 nucleotides.
Embodiment 8. The RNA construct according to any one of embodiments 1-6, wherein the RNA construct further comprises a 5′ homology arm sequence located upstream of the first pairing sequence and a 3′ homology arm sequence located downstream of the second pairing sequence, and the 5′ and 3′ homology arm sequences are at least partially reverse complementary.
Embodiment 9. An RNA construct comprising, from 5′ end to 3′ end,
Embodiment 10. The RNA construct according to embodiment 9, wherein the ribozyme core is as defined in any one of embodiments 2-5.
Embodiment 11. The RNA construct according to embodiment 9 or 10, wherein t is 0 or 1; for example, wherein t is 0, ‘(Nx)s(Ny)tG’ is ‘N2N1G’; or t is 1, ‘(Nx)s(Ny)tG’ is ‘N2N1NyG’; and ‘(nx)w’ is ‘n1n2’; wherein ‘N1’, ‘n1’, ‘N2’, ‘n2’ and ‘Ny’ are each independently any naturally occurring or modified nucleotide, ‘N1’ and ‘n1’ form a first base pair, and ‘N2’ and ‘n2’ form a second base pair.
Embodiment 12. The RNA construct according to embodiment 9 or 10, wherein the RNA construct further comprises a 5′ homology arm sequence located upstream of the ‘(Nx)s(Ny)tG’, and a 3′ homology arm sequence located downstream of the ‘(nx)w’, wherein the 5′ and 3′ homology arm sequences are at least partially reverse complementary.
Embodiment 13. An RNA construct comprising, from 5′ end to 3′ end,
Embodiment 14. The RNA construct according to embodiment 13, wherein the group I intron is a group IC1 (e.g., from Tetrahymena sp. (e.g., T. thermophila, T. cosmopolitanis, T. hyperangularis, T. malaccensis or T. pigmentosa) or Pneumocystis carinii), IC2, IC3 (e.g., from Anabaena sp. PCC7120 or Azoarcus sp. BH72) or IA2 (e.g., from Bacteriophage Twort) intron; preferably, the group I intron is a group IC1 intron, for example, a Pneumocystis sp. or Tetrahymena sp. group I intron, more preferably, the group I intron comprises a nucleotide sequence selected from SEQ ID NOs: 32-36 and SEQ ID NO: 12.
Embodiment 15. The RNA construct according to embodiment 13, wherein the group I intron is a Pneumocystis carinii group I intron comprising the nucleotide sequence of SEQ ID NO: 32, ‘Np’ and ‘Nq’ are independently selected from any nucleotide from nucleotide 316 to nucleotide 342 of SEQ ID NO: 32; or the group I intron is a Tetrahymena thermophila group I intron comprising the nucleotide sequence of SEQ ID NO: 12, ‘Np’ and ‘Nq’ are independently selected from any nucleotide from nucleotide 313 to nucleotide 411 of SEQ ID NO: 12.
Embodiment 16. The RNA construct according to embodiment 13 or 14, wherein ‘Np’ and ‘Nq’ are independently selected from any nucleotide from the 3′ end nucleotide of the 5′ half to the 5′ end nucleotide of the 3′ half of a duplex of the group I intron, wherein the duplex is not a P9.0 duplex; for example, the duplex is a P9a/9b, P9.1, P9.1a or P9.2 duplex, preferably a P9.2 duplex.
Embodiment 17. The RNA construct according to embodiment 16, wherein ‘Np’ and ‘Nq’ are located within the region connecting the 5′ half and 3′ half of the duplex; or ‘Np’ is the 3′ end nucleotide of the 5′ half of the duplex and ‘Nq’ is the 5′ end nucleotide of the 3′ half of the duplex.
Embodiment 18. The RNA construct according to any one of embodiments 13-17, wherein the RNA construct further comprises a 5′ homology arm sequence located upstream of the first nucleotide sequence and a 3′ homology arm sequence located downstream of the second nucleotide sequence; wherein the 5′ and 3′ homology arm sequences are at least partially reverse complementary.
Embodiment 19. The RNA construct according to any one of embodiments 1-18, wherein the non-Watson-Crick base pair formed between the 5′ end nucleotide of the IGS and the 3′ end nucleotide of the target site is
Embodiment 20. The RNA construct according to any one of embodiments 1-17, wherein the IGS and the target site form a P1 duplex mimic.
Embodiment 21. The RNA construct according to any one of embodiments 1-20, wherein
Embodiment 22. The RNA construct according to any one of embodiments 1-21, wherein the IGS comprises a sequence ‘GNNNNN’ and the target site comprises a sequence ‘nnnnnu’; or the IGS comprises a sequence ‘ANNNNN’ and the target site comprises a sequence ‘nnnnnc’; wherein ‘NNNNN’ and ‘nnnnn’ are reverse complementary.
Embodiment 23. The RNA construct according to any one of embodiments 1-22, wherein the RNA construct further comprises a linker sequence located between the target site and IGS.
Embodiment 24. The RNA construct according to embodiment 23, wherein the linker sequence comprises an unpaired sequence, wherein the target site, the linker sequence and the IGS form a stem-loop structure.
Embodiment 25. The RNA construct according to embodiment 23, wherein the linker sequence comprises, from 5′ end to 3′ end, a third pairing sequence, a loop sequence and a fourth pairing sequence, wherein the third and fourth pairing sequences form a P1 extension mimic; preferably, the P1 extension mimic comprises 1-3 reverse complementary base pairs.
Embodiment 26. The RNA construct according to any one of embodiments 23-25, wherein the linker sequence comprises a fifth pairing sequence which can pair with a sixth pairing sequence at the 5′ region of the nucleotide sequence of interest to form a P10 duplex mimic; preferably, the P10 duplex mimic comprises 3-10 base pairs.
Embodiment 27. The RNA construct according to embodiment 1, having the structure of:
Embodiment 28. A DNA construct comprising a sequence encoding an RNA construct according to any one of embodiments 1-27.
Embodiment 29. A method of preparing a circular RNA comprising (i) providing a DNA construct according to embodiment 28 in a reaction solution, thereby allowing synthesis of the RNA construct by in vitro transcription of the DNA construct and allowing the RNA construct to self-splice, to produce a circular RNA, and (ii) recovering the circular RNA thus produced.
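The IGS/target-site relationship recited in embodiment 22 can be illustrated with a short check. This is a minimal sketch under stated assumptions: the function names are hypothetical, sequences are written 5′ to 3′, only strict Watson-Crick pairing is accepted for the ‘NNNNN’/‘nnnnn’ portion, and the example sequences are made up for illustration rather than taken from the sequence listing.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Terminal non-Watson-Crick pairs from embodiment 22:
# (5' end nucleotide of IGS, 3' end nucleotide of target site).
TERMINAL_PAIRS = {("G", "U"), ("A", "C")}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def igs_matches_target(igs: str, target: str) -> bool:
    """Check the 'GNNNNN'/'nnnnnu' or 'ANNNNN'/'nnnnnc' pattern.

    Hypothetical helper: assumes both sequences are 6 nt long, as in the
    patterns recited in embodiment 22.
    """
    igs, target = igs.upper(), target.upper()
    if len(igs) != 6 or len(target) != 6:
        return False
    # The 5' end of the IGS and the 3' end of the target site form G-U or A-C.
    if (igs[0], target[-1]) not in TERMINAL_PAIRS:
        return False
    # The remaining five nucleotides must be reverse complementary.
    return reverse_complement(igs[1:]) == target[:-1]


if __name__ == "__main__":
    print(igs_matches_target("GGAGGG", "CCCUCU"))  # G-U terminal pair
    print(igs_matches_target("AGAGGG", "CCCUCC"))  # A-C terminal pair
    print(igs_matches_target("GGAGGG", "CUCUCU"))  # internal mismatch under strict pairing
```

Restricting the internal positions to Watson-Crick pairs is a simplification chosen for the sketch; a design tolerating internal wobble pairs would need a looser comparison.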
Circular RNA is prepared as follows:
The DNA sequence encoding a circRNA precursor (precursor sequence, SEQ ID NO: 1) based on a Tetrahymena thermophila group I intron comprising the nucleotide sequence of SEQ ID NO: 12 (hereafter referred to as ribozyme T, comprising a ribozyme core sequence of SEQ ID NO: 17) was chemically synthesized and cloned into an expression vector (Genscript) containing a T7 promoter to generate the template plasmid for in vitro transcription (IVT) of the circRNA precursor. The nucleotide sequence to be circularized (SEQ ID NO: 50) comprises a 5′ UTR comprising an IRES sequence from Human rhinovirus B, an open reading frame (ORF) sequence encoding the green fluorescent protein (GFP) and a 3′ UTR. The backsplicing site is designed inside the ORF (corresponding to
The plasmid linearized by BsaI enzymatic digestion is used as a template for the IVT reaction. A single reaction system (20 μL in total) is prepared as follows: 1 U/μL RNase Inhibitor (Novoprotein E125), 6.67 mM ATP, 20 mM GTP, 6.67 mM CTP, 6.67 mM UTP, 1×Transcription buffer (Novoprotein GMP-EB121; containing 6 mM MgCl2), 10 mM DTT (Sigma 43816), 4 U/mL Inorganic Pyrophosphatase (Novoprotein GMP-M036), 5 mM NaCl (Invitrogen AM9760G), 18 mM MgCl2 (Invitrogen M1028), 5 U/μL T7 RNA polymerase (Novoprotein GMP-E121), 25 ng/μL linearized plasmid. IVT is carried out at 37° C. for 3 hours, and the products are then treated with DNase I (Novoprotein GMP-E127) for 30 min at 37° C. to remove DNA templates. The RNA construct is purified by precipitation with 7.5 M LiCl or column purification using a Monarch RNA cleanup kit (NEB). A fragment analyzer is used to evaluate the products.
The plasmid linearized by BsaI enzymatic digestion was used as a template for the IVT reaction. A single reaction system (20 μL in total) was prepared as follows: 1 U/μL RNase Inhibitor (Novoprotein E125), 10 mM ATP, 10 mM GTP, 10 mM CTP, 10 mM UTP, 1×Transcription buffer (Novoprotein GMP-EB121; containing 6 mM MgCl2), 10 mM DTT (Sigma 43816), 4 U/mL Inorganic Pyrophosphatases (Novoprotein GMP-M036), 5 mM NaCl (Invitrogen AM9760G), MgCl2 (Invitrogen M1028) ranging from 30 mM to 50 mM, 5 U/μL T7 RNA polymerase (KactusBio GMP-T7P-EE101-12), 25 ng/μL linearized plasmid. The reaction was carried out at 37° C. for 3 hours; IVT products were treated with DNase I (Novoprotein GMP-E127) for 30 min at 37° C. to remove DNA templates. RNAs were purified by 7.5 M LiCl precipitation or column purification using a Monarch RNA cleanup kit (NEB).
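The relationship between the supplemented MgCl2 and the final Mg2+ concentration in the reaction above is simple arithmetic, sketched below. The function names are hypothetical, and the 1 M (1000 mM) MgCl2 stock concentration is an assumption made for illustration; only the 6 mM contributed by the 1× transcription buffer, the 30 mM to 50 mM supplement range and the 20 μL reaction volume come from the text.

```python
BUFFER_MG_MM = 6.0   # Mg2+ contributed by the 1x transcription buffer (from the text)
REACTION_UL = 20.0   # single reaction volume (from the text)
STOCK_MG_MM = 1000.0 # ASSUMPTION: a 1 M MgCl2 stock solution


def total_mg_mM(added_mM: float, buffer_mM: float = BUFFER_MG_MM) -> float:
    """Final Mg2+ concentration = buffer contribution + supplemented MgCl2."""
    return buffer_mM + added_mM


def stock_volume_uL(final_mM: float, reaction_uL: float = REACTION_UL,
                    stock_mM: float = STOCK_MG_MM) -> float:
    """Volume of stock needed, from C1 * V1 = C2 * V2."""
    return final_mM * reaction_uL / stock_mM


if __name__ == "__main__":
    for added in (30.0, 40.0, 50.0):
        print(f"added {added:.0f} mM -> total {total_mg_mM(added):.0f} mM, "
              f"stock volume {stock_volume_uL(added):.2f} uL")
```

With these inputs, supplementing 30 mM to 50 mM MgCl2 corresponds to a total of 36 mM to 56 mM Mg2+, which matches the range quoted for the circularization results.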
A fragment analyzer (FA) was used to evaluate the products. Specifically, in the RNA mode, purified circular RNAs were further analyzed by capillary electrophoresis with an Agilent 5200 or 5300 Bioanalyzer. Samples were diluted to an appropriate concentration and analyzed according to the manufacturer's instructions (Agilent DNF-471 RNA Kit, 15 nt). Agilent ProSize Data Analysis Software was used to analyze the results. The Smear analysis module was applied to identify the peak range corresponding to the circular RNA component. As FA cannot distinguish between circRNA and nicked RNA, both components appeared as a single peak before the precursor peak, as shown in
Some samples from the 1.2 and 1.3 reactions were treated with RNase R to verify the generation of circular RNA. A single reaction system (50 μL in total) was prepared as follows: 5 μL of 10×RNase R reaction buffer (10×: 0.2 M Tris-HCl pH 8.0, 1 M KCl, 1 mM MgCl2) and 30 units of RNase R were added to IVT RNAs with a total amount of 10 μg to 50 μg, and the volume was adjusted to 50 μL with water. After incubation at 37° C. for 20 minutes, the products were purified using the Monarch RNA cleanup kit (NEB). In all cases, 150 ng RNA per sample was diluted 1:1 in volume with 2×GLB II (gel loading buffer II, Thermofisher) to a final volume of 20 μl/well, heated to 75° C. for at least 2 min, and cooled on ice for at least 3 min. RNA was then separated on a precast 2% E-Gel EX Agarose Gel (Invitrogen) on an E-Gel Power Snap Electrophoresis System (Invitrogen) using the E-Gel EX 1%-2% program; ssRNA ladder (NEB) was used as a standard. Bands were visualized using blue light transillumination.
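The volumes and final concentrations implied by the RNase R reaction and the gel sample preparation above can be worked through as follows. This is a sketch only: the helper names are hypothetical, and the RNA and enzyme volumes in the usage example are assumed values, since the text specifies amounts (10 μg to 50 μg RNA, 30 units RNase R) rather than volumes.

```python
# 10x RNase R reaction buffer composition from the text, in mM.
TEN_X_BUFFER_MM = {"Tris-HCl pH 8.0": 200.0, "KCl": 1000.0, "MgCl2": 1.0}


def final_buffer_mM(stock_mM: dict, stock_uL: float = 5.0,
                    total_uL: float = 50.0) -> dict:
    """Final concentration of each buffer component after dilution."""
    return {name: conc * stock_uL / total_uL for name, conc in stock_mM.items()}


def water_uL(rna_uL: float, enzyme_uL: float, buffer_uL: float = 5.0,
             total_uL: float = 50.0) -> float:
    """Water needed to bring the reaction to the total volume."""
    water = total_uL - buffer_uL - rna_uL - enzyme_uL
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    return water


def gel_input_ng_per_uL(rna_ng: float = 150.0, final_uL: float = 20.0) -> float:
    """RNA concentration required before 1:1 dilution with 2x loading buffer."""
    sample_uL = final_uL / 2.0  # half the well volume is RNA sample
    return rna_ng / sample_uL


if __name__ == "__main__":
    print(final_buffer_mM(TEN_X_BUFFER_MM))
    # ASSUMED volumes: 30 uL of RNA solution and 1.5 uL of enzyme.
    print(water_uL(rna_uL=30.0, enzyme_uL=1.5))
    print(gel_input_ng_per_uL())
```

Diluting 5 μL of the 10× buffer into 50 μL gives the 1× working concentrations, and loading 150 ng in a 20 μL well after 1:1 dilution implies a 10 μL RNA aliquot.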
The results show that the RNA precursor construct can self-splice and circularize directly in the IVT system when the final Mg2+ concentration is adjusted to greater than 26 mM, for example to a range including but not limited to 36 mM to 56 mM (
Previous studies reported that the preparation process of circular RNAs is accompanied by the generation of nicked RNAs, which cannot be separated from circular RNAs with equivalent molecular size by traditional electrophoretic methods such as capillary electrophoresis or agarose gel but can be separated and detected by the E-Gel EX system. E-Gel shows a band of nicked RNA under the corresponding band of the precursor (
To examine the cellular expression of the circularized RNA with a complete ORF region, the circularization products, including the RNase R-treated samples, were transfected into HEK293 cells with the precursor as a control. Specifically, 50000 cells were seeded per well of a 96-well plate, 100 ng RNA sample was transfected into cells per well using transfection reagent (TransIT, Mirus), and reporter gene expression was detected by flow cytometry 48 h later. The results show that the circularization products and the RNase R-digested products effectively expressed GFP, whereas the precursor RNA construct did not (
A circRNA precursor (precursor sequence, SEQ ID NO: 2) based on ribozyme T was generated and purified through the same processes described in Examples 1.1 and 1.2. The backsplicing site was designed inside the IRES (
The results show that adjusting magnesium ion concentration, including but not limited to 36 mM to 56 mM, can prompt the precursor in the IVT system to directly undergo a self-splicing reaction (
A circRNA precursor (precursor sequence, SEQ ID NO: 3) based on a Pneumocystis carinii group I intron (hereinafter referred to as ribozyme P, comprising a ribozyme core sequence of SEQ ID NO: 19) was generated and purified through the same processes described in Examples 1.1 and 1.2 (
E-gel shows that ribozyme P can catalyze the self-cleavage of the precursor in the IVT reaction (tested Mg2+ concentration was 56 mM), and the splicing products were subjected to RNase R digestion to confirm the generation of circular RNA (
Different from the structures mentioned above (the sequence determining the 3′ splice site was split to the 5′ and 3′ regions of the precursor), in this Example, the sequence determining the 5′ splice site was split to the 5′ and 3′ regions of the precursor (
The results show that circularization of the precursor is triggered when the Mg2+ concentration of the IVT reaction is adjusted to a concentration including but not limited to 46 mM. RNase R digestion could remove most of the linear components (such as the precursor) and enrich the circular RNA, as shown in E-Gel (
However, compared to the design of splitting the 3′ splice site sequence (
P9.0 duplex is essential for the recognition of the 3′ splice site. In addition to Watson-Crick base pairing in P9.0 as tested in Example 3 (precursor sequence, SEQ ID NO: 3), this Example tested a P9.0 containing a wobble base pair, G-U (
E-Gel shows that a P9.0 containing wobble base pairs could also be compatible with the self-splicing reaction of ribozyme P (tested Mg2+ concentrations were from 36 mM to 56 mM) (
P9.2 duplex facilitates 3′ site splicing. In this Example, the sequences for the 3′ and 5′ halves of P9.2 were removed to study the effects of P9.2 on circularization. The circRNA precursor (precursor sequence, SEQ ID NO: 7;
E-Gel shows that after the removal of P9.2, ribozyme T could still catalyze the self-splicing reaction (
Previous studies have found that wobble base pairs other than U-G can also effectively promote the splicing reaction of ribozymes, although the reaction efficiency varies (Dana A. B. et al., Molecular Recognition in a Trans Excision-Splicing Ribozyme: Non-Watson-Crick Base Pairs at the 5′ Splice Site and ωG at the 3′ Splice Site Can Play a Role in Determining the Binding Register of Reaction Substrates, Biochemistry 2005 44 (3), 1067-1077). In this study, the effect of the C-A wobble base pair on circularization was investigated. A circRNA precursor (precursor sequence, SEQ ID NO: 8;
E-gel shows that P1 using C-A base pair could still be compatible with the self-splicing reaction of ribozyme T (tested Mg2+ concentrations were from 36 mM to 56 mM), and the splicing products were subjected to RNase R digestion to confirm the generation of circular RNA (
The presence of R1 and R2 is crucial for the efficient formation of P9.0, enabling ribozyme T to mediate the complete splicing reactions. The spatial formation of homology arms from R1 and R2, along with P9.2, in a stem-like structure facilitates the proximity of the precursor's termini, thereby promoting splicing. To further investigate the necessity of homology arm sequences and P9.2, they were removed from SEQ ID NO: 1 to generate a circRNA precursor comprising the sequence of SEQ ID NO: 9. In SEQ ID NO: 9, R2 comprises the sequence from the 5′ half of P9.0 to the sequence before the 5′ half of P9.2, and R1 comprises the sequence from the end of P9.2 to ωG.
The circRNA precursor (SEQ ID NO: 9;
Surprisingly, the resulting precursor lacking homology arms and P9.2 still exhibited self-splicing activity, as evidenced by E-Gel analysis, and the digestion by RNase R further confirmed the generation of circular RNA (
Based on the results of Examples 1 (SEQ ID NO: 1), 6 (SEQ ID NO: 7) and 8 (SEQ ID NO: 9), we hypothesize that the formation of a double-stranded region through the 5′ and 3′ homology arm sequences between R1 and R2 in proximity of the ωG is essential for a higher circularization efficiency. To test this hypothesis, the sequence connecting the 3′ half of P9.2 and the 3′ half of P9.0 (i.e., Spacer 1), and the sequence connecting the 5′ half of P9.0 and the 5′ half of P9.2 (i.e., Spacer 2) were removed from SEQ ID NO: 1 to generate a circRNA precursor comprising the sequence of SEQ ID NO: 10. In the absence of Spacer 1 and Spacer 2, the sequence for 3′ half of P9.2 and the sequence for 5′ half of P9.2 in SEQ ID NO: 10 can be simply regarded as 5′ and 3′ homology arm sequences, respectively.
The circRNA precursor (SEQ ID NO: 10;
Results from E-Gel revealed that precursor molecules lacking both spacers can still undergo self-splicing circularization and RNase R digestion effectively enriched circular RNA (
Based on the previous results (for example, Example 9), the 5′ and 3′ homology arm sequences are essential for high efficiency recognition and splicing of the 3′ splice site. To further confirm the necessity of the 5′ and 3′ homology arm sequences, they were directly removed from SEQ ID NO: 6, leaving only two bases for pairing similar to P9.0 (
E-Gel results demonstrate that the precursor could undergo self-splicing and circularization without the external homology arms (
Based on previous results (e.g., Example 9), the recognition and splicing of the 3′ splicing site can be achieved by partially pairing the two ends of the precursor to form a duplex similar to P9.0. To further confirm the necessity of the 5′ and 3′ homology arm sequences for ribozyme T, they were directly removed from SEQ ID NO: 10, leaving only two bases for pairing similar to P9.0, to generate a circRNA precursor comprising the sequence of SEQ ID NO: 39 (
The circRNA precursor (SEQ ID NO: 39) was generated and purified through the same processes described in Examples 1.1 and 1.2. The backsplicing site was designed inside the ORF (
Results from E-Gel demonstrated that the precursor could undergo self-splicing and circularization without the external homology arms (
To further validate the significance of the paired structure formed between the 5′ and 3′ ends of the precursor, R1 and R2 were designed to form unpaired structures, like loops, to generate a circRNA precursor comprising the sequence of SEQ ID NO: 40 (
The circRNA precursor (SEQ ID NO: 40) was generated and purified through the same processes described in Examples 1.1 and 1.2. The back-splicing site was designed inside the ORF (
The results demonstrated that forming a paired structure between the two ends of the precursor molecule is crucial for completing the two-step self-splicing reaction. Without this paired structure, the circularization efficiency was significantly reduced (
Based on the results obtained in Example 12, it was observed that absence of a paired structure between R1 and R2 led to a significant inhibition of the circularization reaction. To further validate the necessity of paired structure formation at both ends of the precursor, homology arms were reintroduced to generate a circRNA precursor comprising the sequence of SEQ ID NO: 41 (
The circRNA precursor (SEQ ID NO: 41) was generated and purified through the same processes described in Examples 1.1 and 1.2. The backsplicing site was designed inside the ORF (
E-Gel results demonstrated that reintroduction of this paired structure resulted in a substantial improvement in circularization efficiency (
Examples 12 and 13 have demonstrated that incorporating complementary pairing structures between R1 and R2 is crucial for effective circularization. To further investigate the flexibility of pairing design, the base pairs within the P9.0 duplex at the 5′ and 3′ positions were swapped while still maintaining the complementary design. The resultant circRNA precursor has the sequence of SEQ ID NO: 44 (
The circRNA precursor (SEQ ID NO: 44) based on ribozyme T was generated and purified through the same processes described in Examples 1.1 and 1.2. The back-splicing site was designed inside the ORF (
The results indicated that there were no restrictions on the order (from 5′ to 3′) of base pairs on the structure of the P9.0 duplex to complete the splicing reaction (
This study applied the cis-splicing circularization system proposed in the present invention to other group I introns, specifically the Anabaena pre-tRNA group I intron.
The circRNA precursor (based on Anabaena (sp. strain PCC 7120)-hereafter referred to as “ribozyme A”) was generated and purified through the same processes described in Examples 1.1 and 1.2 (SEQ ID NO: 45) (
E-gel results demonstrated that such a design included only homology arms sequences in R1 and R2, along with the P9.0 duplex (see
To examine the cellular expression of the circularized RNA, RNase R-treated circularization samples were transfected into A549 cells. A mock transfection served as a negative control. At 24 hours after transfection, 100 μl of reagent (ONE-Glo™ Luciferase Assay, Promega) was added to the cells grown in 100 μl of culture medium (for 96-well plates), and the cells were lysed by rocking and pipetting for roughly 3 minutes at room temperature. Then, the plate was read on a TECAN M1000 Infinite Pro microplate reader using i-control 1.10 software with an integration time of 1,000 ms. The results demonstrated that the resulting circular RNA was enriched by RNase R, leading to increased expression of the reporter gene (luciferase) in cells (
In this Example, the natural exon sequence flanking ribozyme A was deleted and replaced with a sequence from within the GOI (e.g., the sequence after ωG, see SEQ ID NO: 45), although the replacement sequence may be partially homologous to the natural exon sequence. For ribozyme A, the product of the cis-splicing reaction needed to be further enriched by RNase R so that the band corresponding to circRNA could be detected more prominently in the E-Gel (
It is reported that the P10 duplex is formed following the initial step of the splicing reaction and is closely associated with the subsequent step of splicing in the self-excision of some group I introns, including ribozyme T. To study the roles of the P10 duplex and P1 extension in the cis-circularization of the RNA construct of the present application, the entire sequences for P10-2 (including P1-ex-2) as well as P1-ex-1 were removed from the linker sequence (i.e., ‘CACAUUUUACA’) (see the linker sequence in SEQ ID NO: 10) to generate a circRNA precursor comprising the nucleotide sequence of SEQ ID NO: 46 (
E-Gel results demonstrated that the complete removal of the P10 duplex and P1 extension did not prevent the precursor from undergoing the two-step splicing reaction and circularization (
To further study the role of P10 duplex and P1 extension, the 5′ end portion of P10-2 and P1-ex-1 were removed from the linker sequence (‘CACAUUUUACAAUG’) (see the linker sequence in SEQ ID NO: 10) to generate a circRNA precursor comprising the nucleotide sequence of SEQ ID NO: 47 (
E-Gel results demonstrate that even a shorter P10 duplex and P1 extension significantly improved two-step splicing efficiency (
While the disclosure has been described with respect to specific examples including presently preferred modes of carrying out the disclosure, those skilled in the art will appreciate that there are numerous variations and permutations of the above described systems and techniques. It is to be understood that other embodiments may be utilized and structural and functional modifications may be made without departing from the scope of the present disclosure. Thus, the scope of the disclosure should be construed broadly as set forth in the appended claims.
CAU
-
Spacer 1-
-
sequence (including 5′ portion of P10-2)-P1-ex-2 (3′ portion of P10-2); ; Tetrahymena ribozyme core: SEQ ID NO: 17; -Spacer 2- -Arm I; Tail]
AAAAGUUAUCAGGCAUGCACCUGGUAGCUAGUCU
UUAAACCAAUAGAUUGCAUCGGUUUAAAAGGCAAGACCGUCA
AAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUC
AGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAA
GCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUC
AACAGAUCUUCUGUUGAUAUGGAUGCAGUUCACAGACUAAAU
GUCGGUCGGGGAAGAUGUAUUCUUCUCAUAAGAUAUAGUCG
CUCCUUAAUGGGAGCUAGCGGAUGAAGUGAUGCAACAC
UGGAGCCGCUGGGAACUAAUU
ACCAGUGGACAAUCGACGGAUA
ACCGUCGAUUGUCCACUGGUC
UUA
-
UAGACAUGGUGUGAAGACUCGCAUGUGCUUGGUUGUGAUUCCUC
Spacer 1-
-
;
sequence (including 5′ portion of P10-2)-P1-ex-2 (3′ portion of P10-2); ; Tetrahymena ribozyme core: SEQ ID NO: 17; -Spacer 2- -Arm I; Tail]
AAAAGUUAUCAGGCAUGCACCUGGUAGCUAGUCUUUAA
ACCAAUAGAUUGCAUCGGUUUAAAAGGCAAGACCGUCAAAUU
GCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGG
GAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUG
ACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACA
GAUCUUCUGUUGAUAUGGAUGCAGUUCACAGACUAAAUGUCG
GUCGGGGAAGAUGUAUUCUUCUCAUAAGAUAUAGUCGGACCU
CUCCUUAAUGGGAGCUAGCGGAUGAAGUGAUGCAACACUGGA
GCCGCUGG
GAACUAAUU
ACCAGUGGACAAUCGACGGAUAACAGC
ACCGUCGAUUGUCCACUGGUCGCCUUG
UUAUAGACAUGGUG
-
sequence (including 5′ portion of P10-2)-P1-ex-2 (3′ portion of P10-2); ; Pneumocystis carinii ribozyme core: SEQ ID -Arm III;
GAAAGCGGCGUG
AAAACGUUAGCUAGUGAUCUGGAAUAAAUUCAGAUUGCGACA
CUGUCAAAUUGCGGGGAAGCCCUAAAUAUUCAACUACUAAGC
AGUUUGUGGAAACACAGCUGUGGCCGAGUUAAUAGCCCUGGG
UAUAGUAACAAUGUUGAAUAUGAAUCUUUUGGGAGAUGAAAU
GGGUGAUCCGCAGCCAAGUCCUAAGGGCAUUUUUGUCUAUGG
AUGCAGUUCAACGACUAGAUGGCAGUGGGUAUUGUAAGGAAU
UGCAGUUUUCUUGCAGUGCUUAAGGUAUAGUCU
CAGUGGACAAUCGACGGAUAACAGCAUAUCUAGAAAAAAAAAA
ACCGUCGAUUGUCCACUGGUC
UUACA
UCUA
UA
AAAA
GUUAUCAGGCAUGCACCUGGUAGCUAGUCUUUAAACCAAUAG
Linker sequence including 5′
AUUGCAUCGGUUUAAAAGGCAAGACCGUCAAAUUGCGGGAAA
portion of P10-2 and P1-ex-1 (3′
GGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUU
portion of P10-2); ;
GAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAU
Tetrahymena
ribozyme core:
GGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCU
GUUGAUAUGGAUGCAGUUCACAGACUAAAUGUCGGUCGGGGA
AGAUGUAUUCUUCUCAUAAGAUAUAGUCGGACCUCUCCUUAA
UGGGAGCUAGCGGAUGAAGUGAUGCAACACUGGAGCCGCUGG
GAACUAAUUUGUAUGCGAAAGUAUAUUGAUUAGUUUUGGAGU
P1-ex-2
;
ACUCG
UUAUAGACAUGGUGUGAAGACUCGCAUGUGCUUGGUUGU
Arm I; Tail]
CAGUGGACAAUCGACGGAUAACAGCAUAUCUAGAAAAAAAAAA
GGACGGGCAAG
UUACA
UCUA
UAA
AAAAGUUAUCAGGC
AUGCACCUGGUAGCUAGUCUUUAAACCAAUAGAUUGCAUCGG
Linker sequence including
UUUAAAAGGCAAGACCGUCAAAUUGCGGGAAAGGGGUCAACA
5' portion of P10-2 and P1-ex-1 (3′
GCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCU
portion of P10-2); ;
UGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACC
Tetrahymena
ribozyme core:
ACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGG
AUGCAGUUCACAGACUAAAUGUCGGUCGGGGAAGAUGUAUUC
UUCUCAUAAGAUAUAGUCGGACCUCUCCUUAAUGGGAGCUAG
CGGAUGAAGUGAUGCAACACUGGAGCCGCUGGGAACUAAUUU
GUAUGCGAAAGUAUAUUGAUUAGUUUUGGAGUACUCGU
UAUA
P1-ex-2;
GACAUGGUGUGAAGACUCGCAUGUGCUUGGUUGUGAUUCCUCCG
Arm II; Tail]
ACCGUCGAUUGUCCACUGGUCGCCUUG
UUAUAGACAUGGUG
-
target site
at the 3′ end; Linker
sequence (including 5′ portion of
P10-2)-P1-ex-2 (3′ portion of
; Pneumocystis
carinii
ribozyme core: SEQ ID
-Arm III;
GAAAGCGGCGUG
AAAACGUUAGCUAGUGAUCUGGAAUAAAUUCAGAUUGCGACA
CUGUCAAAUUGCGGGGAAGCCCUAAAUAUUCAACUACUAAGC
AGUUUGUGGAAACACAGCUGUGGCCGAGUUAAUAGCCCUGGG
UAUAGUAACAAUGUUGAAUAUGAAUCUUUUGGGAGAUGAAAU
GGGUGAUCCGCAGCCAAGUCCUAAGGGCAUUUUUGUCUAUGG
AUGCAGUUCAACGACUAGAUGGCAGUGGGUAUUGUAAGGAAU
UGCAGUUUUCUUGCAGUGCUUAAGGUAUAGUCU
CAGUGGACAAUCGACGGAUAACAGCAUAUCUAGAAAAAAAAAA
ACCGUCGAUUGUCCACUGGUC
UUAUAGACAUGG
-
sequence (including 5′ portion of
P10-2)-P1-ex-2 (3′ portion of
P10-2
); ; Tetrahymena
ribozyme core: SEQ ID NO: 17;
-Spacer 2-
Arm I; Tail]
AA
AAGUUAUCAGGCAUGCACCUGGUAGCUAGUCUUUAAACCAAU
AGAUUGCAUCGGUUUAAAAGGCAAGACCGUCAAAUUGCGGGA
AAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACU
UUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGAC
AUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUU
CUGUUGAUAUGGAUGCAGUUCACAGACUAAAUGUCGGUCGGG
GAAGAUGUAUUCUUCUCAUAAGAUAUAGUCG
CUCCUUA
AUGGGAGCUAGCGGAUGAAGUGAUGCAACACUGGAGCCGCUG
G
ACCAGUGGACAAUCGACGGAUAACAGCAUAUCUAGAAAAAAA
UUA
-
UAGACAUGGUGUGAAGACUCGCAUGUGCUUGGUUGUGAUUCCUC
Spacer 1
- -
sequence (including
5′ portion of
P10-2)-P1-ex-2 (3′ portion of
P10-2)
; ; Tetrahymena
ribozyme core: SEQ ID NO: 17;
-Spacer 2-
-Arm I; Tail]
AAAAGUUAUCAGGCAUGCACCUGGUAGCUAGUCUUUAA
ACCAAUAGAUUGCAUCGGUUUAAAAGGCAAGACCGUCAAAUU
GCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGG
GAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUG
ACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACA
GAUCUUCUGUUGAUAUGGAUGCAGUUCACAGACUAAAUGUCG
GUCGGGGAAGAUGUAUUCUUCUCAUAAGAUAUAGUCG
CUCCUUAAUGGGAGCUAGCGGAUGAAGUGAUGCAACACUGGA
GCCGCUGG
GAACUAAUU
ACCAGUGGACAAUCGACGGAUAACAGC
UGGAGUAC
CAUGGCCGACAAGCAGAAGAACGGCAUCAAGGC
-
sequence (including
5′ portion of
P10-2)-P1-ex-2 (3′ portion of
P10-2)
; ; Tetrahymena
ribozyme core: SEQ ID NO: 17;
-Spacer 2-
UUUUACAGGCCAUG
AAAAGUUAUCAGGCAUGCACCUG
GUAGCUAGUCUUUAAACCAAUAGAUUGCAUCGGUUUAAAAGG
CAAGACCGUCAAAUUGCGGGAAAGGGGUCAACAGCCGUUCAG
UACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGG
UAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCA
AGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC
ACAGACUAAAUGUCGGUCGGGGAAGAUGUAUUCUUCUCAUAA
GAUAUAGUCG
CCUCUCCUUAAUGGGAGCUAGCGGAUGAAG
UGAUGCAACACUGGAGCCGCUGG
UAACAGCAUAUCUAGAAAAA
-
-
sequence (including
5′ portion of
P10-2)-P1-ex-2 (3′ portion of
P10-2)
; ; Tetrahymena
ribozyme core: SEQ ID NO: 17;
-
-Arm I; Tail]
AA
AAGUUAUCAGGCAUGCACCUGGUAGCUAGUCUUUAAACCAAU
AGAUUGCAUCGGUUUAAAAGGCAAGACCGUCAAAUUGCGGGA
AAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACU
UUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGAC
AUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUU
CUGUUGAUAUGGAUGCAGUUCACAGACUAAAUGUCGGUCGGG
GAAGAUGUAUUCUUCUCAUAAGAUAUAGUCG
GAACUAAUU
A
CCAGUGGACAAUCGACGGAUAACAGCAUAUCUAGAAAAAAAAAA
UUAUAGACAUGGUGUGAAGACUCGCAUGUGCUUGGUUGUGA
-
sequence (including
5′ portion of
P10-2)-P1-ex-2 (3′ portion of
P10-2)
; ; Pneumocystis
carinii
ribozyme core: SEQ ID
; Tail]
GAAAGCGGCGUGAAAACGUUAGCUAGUGAUCUGGAAUAA
AUUCAGAUUGCGACACUGUCAAAUUGCGGGGAAGCCCUAAAU
AUUCAACUACUAAGCAGUUUGUGGAAACACAGCUGUGGCCGA
GUUAAUAGCCCUGGGUAUAGUAACAAUGUUGAAUAUGAAUCU
UUUGGGAGAUGAAAUGGGUGAUCCGCAGCCAAGUCCUAAGGG
CAUUUUUGUCUAUGGAUGCAGUUCAACGACUAGAUGGCAGUG
GGUAUUGUAAGGAAUUGCAGUUUUCUUGCAGUGCUUAAGGUA
UAGUCU
UAACAGCAUAUCUAGAAAAAAAAAAAAAAAAAAAA
thermophila group I intron)
IGS-5′ half of P9.0-Spacer 2-
5′ half of P9.2
-Spacer 3-3′
half of P9.2
-Spacer 1-3′ half
of P9.0-
ωG
CACUGGAGCCGCUGGGAACUAAUUUGUAUGCGAAAGUAUAUUG
AUUAGUUUUGGAGUACUCG
thermophila group I intron, from
thermophila group I intron
carinii group I intron, from IGS
Pneumocystis carinii group I
are marked)
Pneumocystis carinii group I
Pneumocystis wakefieldiae group I
Pneumocystis carinii group I
carinii group I intron)
CAUGGCCGACAAGCAGAAGAACGGCAUCAAGGCGAACUUCA
-
sequence (including
5′ portion of
P10-2)-P1-ex-2 (3′ portion of
P10-2)
; ; Tetrahymena
ribozyme core: SEQ ID NO: 17;
; Tail]
GGCC
AUG
AAAAGUUAUCAGGCAUGCACCUGGUAGCUA
GUCUUUAAACCAAUAGAUUGCAUCGGUUUAAAAGGCAAGACC
GUCAAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAG
UCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUA
AUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUA
AGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUCACAGACU
AAAUGUCGGUCGGGGAAGAUGUAUUCUUCUCAUAAGAUAUAG
UCG
UAACAGCAUAUCUAGAAAAAAAAAAAAAAAAAAAAAAA
AACACCAA
CAUGGCCGACAAGCAGAAGAACGGCAUCAAGGCGA
sequence (including
5′ portion of
P10-2)-P1-ex-2 (3′ portion of
P10-2)
; ; Tetrahymena
ribozyme core: SEQ ID NO: 17;
UACA
GGCC
AUG
AAAAGUUAUCAGGCAUGCACCUGGUA
GCUAGUCUUUAAACCAAUAGAUUGCAUCGGUUUAAAAGGCAA
GACCGUCAAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUAC
CAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAU
GGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGU
CCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUCACA
GACUAAAUGUCGGUCGGGGAAGAUGUAUUCUUCUCAUAAGAU
AUAGUCG
AACCACAAUAACAGCAUAUCUAGAAAAAAAAAAAAA
ACCGUCGAUUGUCCACUGGUCGAUUAGUUUAACACCAAGCAUGG
-
Loop 1
-
sequence (including
5′ portion of
P10-2)-P1-ex-2 (3′ portion of
P10-2)
; ; Tetrahymena
ribozyme core: SEQ ID NO: 17;
Arm I; Tail]
G
AAAAGUUAUCAGGCAUGCACCUGGUAGCUAGUCUU
UAAACCAAUAGAUUGCAUCGGUUUAAAAGGCAAGACCGUCAA
AUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCA
GGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAG
CUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCA
ACAGAUCUUCUGUUGAUAUGGAUGCAGUUCACAGACUAAAUG
UCGGUCGGGGAAGAUGUAUUCUUCUCAUAAGAUAUAGUCG
AA
CCACAAGAACUAAUUACCAGUGGACAAUCGACGGAUAACAGCAU
ACCGUCGAUUGUCCACUGGUCGAUUAGUUUUGGAGUAC
CAU
-
GGCCGACAAGCAGAAGAACGGCAUCAAGGCGAACUUCAAGAUCC
Spacer 1
- -
sequence (including
5′ portion of
P10-2)-P1-ex-2 (3′ portion of
P10-2)
; ; Tetrahymena
ribozyme core: SEQ ID NO: 17;
-Spacer 2-
5′ half of P9.2
-Arm I; Tail]
UG
AAAAGUUAUCAGGCAUGCACCUGGUAGCUAGUCU
UUAAACCAAUAGAUUGCAUCGGUUUAAAAGGCAAGACCGUCA
AAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUC
AGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAA
GCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUC
AACAGAUCUUCUGUUGAUAUGGAUGCAGUUCACAGACUAAAU
GUCGGUCGGGGAAGAUGUAUUCUUCUCAUAAGAUAUAGUCG
CCUCUCCUUAAUGGGAGCUAGCGGAUGAAGUGAUGCAACAC
UGGAGCCGCUGG
GAACUAAUU
ACCAGUGGACAAUCGACGGAUA
CCGUCGAUUGUCCACUGGUCAAA
AAAAACAAAAAACA
-
Loop sequence-P1-ex-2; ;
Anabaena
ribozyme core: SEQ
-Spacer 2-
Arm IV; Tail]
CCUUAAAGAAGAAAUUC
UUUAAGUGGAUGCUCUCAAACUCAGGGAAACCUAAAUCUAGU
UAUAGACAAGGCAAUCCUGAGCCAAGCCGAAGUAGUAAUUAG
UAAGUCAACAAUAGAUGACUUACAACUAAUCGGAAGGUGCAG
AGACUCGACGGGAGCUACCCUAACGUCAAGACGAGGGUAAAG
AGAGAGUCCA
AAA
GACCAGUGGACAAUCGACGGAUAA
CCGUCGAUUGUCCACUGGUCGAUUAGUUU
CAUGGCCGACAA
-
-
sequence; ; Tetrahymena
ribozyme core: SEQ ID NO: 17;
-
5′ half of P9.2
-Arm IV; Tail]
AAAAGUUAUCAG
GCAUGCACCUGGUAGCUAGUCUUUAAACCAAUAGAUUGCAUC
GGUUUAAAAGGCAAGACCGUCAAAUUGCGGGAAAGGGGUCAA
CAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGC
CUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAA
CCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAU
GGAUGCAGUUCACAGACUAAAUGUCGGUCGGGGAAGAUGUAU
UCUUCUCAUAAGAUAUAGUCG
GAACUAAUU
ACCAGUGGACA
AUCGACGGAUAACAGCAUAUCUAGAAAAAAAAAAAAAAAAAAA
CCGUCGAUUGUCCACUGGUCGAUUAGUUU
CAUGGCCGACAA
-
-
sequence (including 5′ portion
of P10-2)-P1-ex-2 (3′ portion of
; Tetrahymena
ribozyme core: SEQ ID NO: 17;
-5′ half of
P9.2
-Arm IV; Tail]
half of P9.0
and
are marked)
according to
encoding GFP with a
TTATAG
AGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGG
ACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCTGGCGAGG
GCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCA
TCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGT
GACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCC
GACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCG
AAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGG
CAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACAC
CCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGA
GGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAA
CAGCCACAACGT
CATGGCCGACAAGCAGAAGAACGGC
ATCAAGGCGAACTTCAAGATCCGCCACAACATCGAGGACGGCA
GCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCG
GCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCA
CCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATC
ACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCT
CGGCATGGACGAGCTGTACAAGTGATAAACCGGTGCTGGAGCC
)
AAGGGACCTGCCCCGTTTTACCCTTTGGAGGACGGTACAGCAG
GAGAACAGCTCCACAAGGCGATGAAACGCTACGCCCTGGTCC
CCGGAACGATTGCGTTTACCGATGCACATATTGAGGTAGACAT
CACATACGCAGAATACTTCGAAATGTCGGTGAGGCTGGCGGAA
GCGATGAAGAGATATGGTCTTAACACTAATCACCGCATCGTGG
TGTGTTCGGAGAACTCATTGCAGTTTTTCATGCCGGTCCTTGG
AGCACTTTTCATCGGGGTCGCAGTCGCGCCAGCGAACGACATC
TACAATGAGCGGGAACTCTTGAATAGCATGGGAATCTCCCAGC
CGACGGTCGTGTTTGTCTCCAAAAAGGGGCTGCAGAAAATCCT
CAACGTGCAGAAGAAGCTCCCCATTATTCAAAAGATCATCATT
ATGGATAGCAAGACAGATTACCAAGGGTTCCAGTCGATGTATA
CCTTTGTGACATCGCATTTGCCGCCAGGGTTTAACGAGTATGA
CTTCGTCCCCGAGTCATTTGACAGAGATAAAACCATCGCGCTG
ATTATGAACTCCTCGGGTAGCACCGGTTTGCCAAAGGGGGTGG
CGTTGCCCCACCGCACTGCTTGTGTGCGGTTCTCGCACGCTAG
GGACCCTATCTTTGGTAATCAGATCATTCCCGACACAGCAATC
CTGTCCGTGGTACCTTTTCATCACGGTTTTGGCATGTTCACGA
CTCTCGGCTATTTGATTTGCGGTTTCAGGGTCGTACTTATGTAT
CGGTTCGAGGAAGAGCTATTTTTGAGATCCTTGCAAGATTACA
AGATCCAGTCGGCCCTCCTTGTGCCAACGCTTTTCTCATTCTTT
GCGAAATCGACACTTATTGATAAGTATGACCTTTCCAATCTGC
ATGAGATTGCCTCAGGGGGAGCGCCGCTTAGCAAGGAAGTCG
GGGAGGCAGTGGCCAAGCGCTTCCACCTTCCCGGAATCCGGC
AGGGATACGGGCTCACGGAGACAACATCCGCGATCCTTATCAC
GCCCGAGGGTGACGATAAGCCGGGAGCCGTCGGAAAAGTGGT
CCCCTTCTTTGAAGCCAAGGTCGTAGACCTCGACACGGGAAAA
ACCCTCGGAGTGAACCAGAGGGGCGAGCTCTGCGTGAGAGGG
CCGATGATCATGTCAGGTTACGTGAATAACCCTGAAGCGACGA
ATGCGCTGATCGACAAGGATGGGTGGTTGCATTCGGGAGACA
TTGCCTATTGGGATGAGGATGAGCACTTCTTTATCGTAGATCG
ACTTAAGAGCTTGATCAAATACAAAGGCTATCAGGTAGCGCCT
GCCGAGCTCGAGTCAATCCTGCTCCAGCACCCCAACATTTTCG
ACGCCGGAGTGGCCGGGTTGCCCGATGACGACGCGGGTGAGC
TGCCAGCGGCCGTGGTAGTCCTCGAACATGGGAAAACAATGA
CCGAAAAGGAGATCGTGGACTACGTAGCATCACAAGTGACGA
CTGCGAAGAAACTGAGGGGAGGGGTAGTCTTTGTGGACGAGG
TCCCGAAAGGCTTGACTGGGAAGCTTGACGCTCGCAAAATCCG
GGAAATCCTGATTAAGGCAAAGAAAGGCGGGAAAATCGCTGT
CTGATAAAAAAAAACAAAAAAACAAAACAAACAAAAACAAA
Number | Date | Country | Kind |
---|---|---|---|
PCT/CN2022/143232 | Dec 2022 | WO | international |
PCT/CN2023/085331 | Mar 2023 | WO | international |
PCT/CN2023/116485 | Sep 2023 | WO | international |
The present application claims priority to International Application Nos. PCT/CN2022/143232, filed on Dec. 29, 2022; PCT/CN2023/085331, filed on Mar. 31, 2023; and PCT/CN2023/116485, filed on Sep. 1, 2023. The contents of the above-referenced applications are hereby incorporated by reference in their entireties.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2023/143083 | 12/29/2023 | WO |