The present application is based upon and claims the benefit of a priority of Chinese Patent Application No. 202111185073.9, filed on Oct. 12, 2021, and a priority of Chinese Patent Application No. 202111435528.8, filed on Nov. 29, 2021, the entire contents of which are incorporated herein by reference.
This application contains a sequence listing that has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML file is named 53596-0007001_SL_ST26.xml. The XML file, created on Oct. 11, 2022, is 964,919 bytes in size.
The disclosure belongs to the technical field of bioinformatics and bioengineering, and specifically, the disclosure relates to use of a polynucleotide in initiating translation of a circular nucleic acid molecule, a polynucleotide having an activity of initiating translation of a circular nucleic acid molecule, a circular nucleic acid molecule including the polynucleotide, a cyclization precursor nucleic acid molecule, a recombinant nucleic acid molecule, a recombinant expression vector, a recombinant host cell, and uses thereof.
Messenger ribonucleic acid (mRNA) is transcribed from DNA and provides the genetic information required for subsequent protein translation. When mRNA encoding an antigenic protein is injected into the human body, the antigenic protein can be synthesized in the body, thereby inducing intense cellular and humoral immune responses and exhibiting a self-adjuvanting characteristic, which makes the mRNA an excellent vaccine means. In addition, the mRNA has many other advantages as a vaccine or for production of a therapeutic protein. For example, compared with a DNA vector, the mRNA is transiently expressed in cells, without a risk of integration into a genome or dependence on a cell cycle, and therefore, the mRNA is much safer; compared with a viral vector, the mRNA is not subject to immune resistance caused by the vector itself, and therefore, protein is easier to express; and compared with a recombinant protein, a virus, and the like, a cell-free system is used during the production process of the mRNA, which only involves an in vitro enzyme-catalyzed reaction, resulting in a simpler and more controllable production process with lower costs. Currently, the mRNA shows wide application potential in serving as a vaccine, producing a therapeutic protein, serving as a means of gene therapy, and the like.
Currently, mRNAs for clinical or preclinical use are mainly linear mRNAs, and the structure of a linear mRNA includes a 5′ cap structure, a 3′ polyadenosine tail (PolyA tail), a 5′ untranslated region (5′ UTR), a 3′ untranslated region (3′ UTR), an open reading frame (ORF), and the like. The 5′ cap structure is an essential feature of eukaryotic mRNA and is obtained by adding N7-methylguanosine to the 5′ end of the mRNA. Studies have shown that the 5′ cap structure binds the translation initiation factor eIF4E to promote mRNA translation, and can effectively prevent mRNA degradation and reduce immunogenicity of the mRNA. A main function of the 3′ polyadenosine tail is to bind polyA binding protein (PABP), which interacts with eIF4G and eIF4E to mediate circularization of the mRNA, promote translation, and prevent mRNA degradation. The 5′ and 3′ untranslated regions, such as those derived from beta-globin, can effectively prevent mRNA degradation and promote translation of the mRNA into protein.
Circular RNAs (circRNAs) are a common type of RNAs in eukaryotes. Natural circRNAs are mainly produced through a molecular mechanism referred to as “back splicing” in cells. Currently, it has been found that eukaryotic circRNAs have a variety of molecular and cellular regulatory functions. For example, the circular RNA can be bound to microRNAs (miRNAs) to regulate expression of target genes; and the circular RNA can be directly bound to a target protein to regulate gene expression, and the like. Currently identified circular RNAs mainly function as non-coding RNAs. However, circular RNAs capable of encoding proteins also exist in nature, namely, circular mRNAs. The circular mRNAs tend to have a longer half-life due to their circular properties, and therefore, it is speculated that the circular mRNAs may be more stable. Methods of forming the circular RNA in vitro include a chemical method, a protease catalysis method, a ribozyme catalysis method, and the like.
An internal ribosome entry site (IRES) is a cis-acting RNA sequence capable of recruiting ribosomal subunits to a translation initiation site of the mRNA independently of the 5′ cap structure, to mediate translation processes of viruses, some eukaryotes, and the like. The circular RNAs have a closed ring structure and lack typical translation initiation elements, but the circular RNAs can still implement a translation function by mediating the binding of ribosomes to the mRNAs by using the IRESs. Compared with linear mRNA, circular mRNA molecules have high stability and have important application prospects in protein expression and clinical treatment. A protein expression level of the circular mRNA molecules is affected by the translation initiation element. Therefore, finding more IRES elements that can initiate translation of the circular mRNA molecules is of great significance for improvement of the protein expression level of the circular mRNA molecules and expansion of application of the circular mRNA molecules to clinical and industrial production.
Currently, the confirmation of IRESs in sequences, as well as studies of their mechanism of action and structure, relies mainly on experimental verification, and screening active IRES sequences out of a large number of sequences with unknown functions takes considerable time and cost. As a result, only a few IRESs have been discovered and verified, which limits the application of circular RNA molecules in protein expression, clinical treatment, and the like.
The prior art has problems: for example, screening sequences containing an IRES is time-consuming and costly, so only a small number of IRES sequences have been verified at present, which limits the application of circular mRNA molecules in protein expression, clinical treatment, etc. To address these problems, the disclosure provides a Levenshtein distance-based IRES screening method, which can efficiently and rapidly identify to-be-predicted sequences that contain an IRES with accurate screening results, which is conducive to the discovery of new IRES sequences.
In some embodiments, the disclosure provides a polynucleotide including any one nucleotide sequence shown in (i), where the polynucleotide is capable of initiating a translation process of a circular nucleic acid molecule, has high IRES activity, and is capable of improving the protein expression level of the circular nucleic acid molecule, which provides abundant translation initiation elements for the further application of the circular nucleic acid molecule.
According to a first aspect, the disclosure provides a Levenshtein distance-based IRES screening method, including the following steps:
(1) selecting n sequences including an IRES as sample sequences, where n≥1 and n is a natural number;
(2) subjecting the sample sequences and to-be-predicted sequences to one-hot encoding respectively, where categorical variables are A, T, C, and G;
(3) traversing the sample sequences, and calculating a Levenshtein distance between each sample sequence and the to-be-predicted sequence;
(4) calculating an average of Levenshtein distances between all sample sequences and the to-be-predicted sequences; and
(5) determining, based on the average, whether the to-be-predicted sequences include the IRES.
In some embodiments, according to the Levenshtein distance-based IRES screening method in the disclosure, in the step (5), if the average is not less than a set prediction threshold, it is determined that the to-be-predicted sequence includes the IRES, otherwise it is determined that the to-be-predicted sequence includes no IRES.
In some embodiments, according to the Levenshtein distance-based IRES screening method in the disclosure, the prediction threshold is not less than 0.5.
In some embodiments, according to the Levenshtein distance-based IRES screening method in the disclosure, the prediction threshold is 0.75.
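Steps (1) to (5), together with the threshold decision, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the disclosure. Because the decision rule treats an average that is not less than a threshold such as 0.75 as indicating an IRES, the sketch assumes the per-pair value is a normalized similarity, 1 − d / max(len), derived from the raw Levenshtein distance d. That normalization, the function names, and the toy sequences are all hypothetical.

```python
from statistics import mean

# One-hot vectors for the four categorical variables A, T, C, G.
ONE_HOT = {"A": (1, 0, 0, 0), "T": (0, 1, 0, 0), "C": (0, 0, 1, 0), "G": (0, 0, 0, 1)}


def encode(seq):
    """Step (2): one-hot encode a nucleotide sequence (raises KeyError on other symbols)."""
    return [ONE_HOT[base] for base in seq.upper()]


def levenshtein(a, b):
    """Raw edit distance between two sequences, using two rolling DP rows."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """ASSUMPTION: normalize the distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def includes_ires(candidate, samples, threshold=0.75):
    """Steps (3) to (5): average the per-sample scores and compare with the threshold."""
    if not samples:
        raise ValueError("at least one sample sequence is required (n >= 1)")
    encoded_candidate = encode(candidate)
    scores = [similarity(encode(s), encoded_candidate) for s in samples]
    return mean(scores) >= threshold


if __name__ == "__main__":
    toy_samples = ["ACGTACGT", "ACGTACGA"]  # hypothetical, not real IRES sequences
    print(includes_ires("ACGTACGT", toy_samples))
    print(includes_ires("TTTTTTTT", toy_samples))
```

Comparing one-hot vectors for equality gives the same distance as comparing the underlying bases, so the encoding step mainly fixes the alphabet and rejects unexpected symbols.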
In some embodiments, according to the Levenshtein distance-based IRES screening method in the disclosure, the method further includes the following step of:
traversing sample sequences if the to-be-predicted sequence is determined to include the IRES to separately find a longest common substring of each sample sequence and the to-be-predicted sequence.
In some embodiments, according to the Levenshtein distance-based IRES screening method in the disclosure, the method further includes the following steps of: predicting a secondary structure of the to-be-predicted sequence determined to include the IRES, and determining a position of the IRES in the to-be-predicted sequence in combination with the longest common substring.
In some embodiments, according to the Levenshtein distance-based IRES screening method in the disclosure, predicting the secondary structure of the to-be-predicted sequence determined to include the IRES includes: predicting, by using at least one of RNAfold, Mfold, RNAfold WebServer, and Vienna RNA software, the secondary structure of the to-be-predicted sequence determined to include the IRES.
In some embodiments, according to the Levenshtein distance-based IRES screening method in the disclosure, the secondary structure of the to-be-predicted sequence determined to include the IRES is predicted by using RNAfold software.
In some embodiments, according to the Levenshtein distance-based IRES screening method in the disclosure, the method further includes the following steps: subjecting the to-be-predicted sequence determined to include the IRES to experimental verification to determine the IRES activity of the to-be-predicted sequence.
In some embodiments, according to the Levenshtein distance-based IRES screening method in the disclosure, the experimental verification includes the steps of:
constructing a circular nucleic acid molecule by using the to-be-predicted sequence determined to include the IRES, where in the circular nucleic acid molecule, the to-be-predicted sequence is operably linked to a nucleotide sequence encoding a fluorescent protein; and
obtaining a fluorescence signal released by the circular nucleic acid molecule, and determining the IRES activity of the to-be-predicted sequence based on the fluorescence signal.
According to a second aspect, the disclosure provides a polynucleotide, where the polynucleotide is selected from at least one of the group consisting of (i) to (iv):
(i) including a nucleotide sequence shown in any one of SEQ ID NOs: 1, 2, 3, 4, 9, 10, 11, 13, 14, 15, 17, 18, 19, 20, 25, 26, 27, 28, 41, 42, 45, 46, 51, 56, 59, 72, 79, 91, 98, 101, 104, 106, 107, 110, 115, 116, 117, 118, 119, 122, 123, 125, 127, 129, 130, 135, 139, 165, 179, 180, 183, 186, 188, 198, 200, 215, 216, 217, 218, 219, 220, 221, 222, 223, 225, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 239, 240, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 272, 273, 274, 275, 276, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 291, 293, 294, 296, 298, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 314, 315, 317, 318, 319, 321, 322, 323, 324, 326, 329, 331, 332, 333, 334, 335, 336, 348, 385, 387, 389, 392, 393, 394, 395, 406, 436, 438, 439, 441, 445, 457, 460, 496, 504, 507, 509, 511, 514, and 534;
(ii) a mutant sequence of any one nucleotide sequence shown in (i), where the mutant sequence has a mutant nucleotide at one or more positions of any corresponding nucleotide sequence shown in (i), and the mutant sequence has an activity of initiating translation of a circular nucleic acid molecule;
(iii) a nucleotide sequence that can be reversely complementary to a hybridized sequence of the nucleotide sequence shown in (i) or (ii) under a highly stringent hybridization condition or a very highly stringent hybridization condition and that has an activity of initiating translation of a circular nucleic acid molecule; and
(iv) a nucleotide sequence having at least 70%, optionally at least 80%, preferably at least 90%, more preferably at least 95%, most preferably at least 98% sequence identity with the nucleotide sequence shown in any one of (i) or (ii) and having an activity of initiating translation of a circular nucleic acid molecule.
Preferably, the polynucleotide includes a nucleotide sequence shown in any of the following sequences:
In some embodiments, according to the polynucleotide in the disclosure, the polynucleotide is a polynucleotide including the IRES that is screened by the method according to any one of claims 1 to 9.
In some embodiments, provided is use of the polynucleotide according to the disclosure in at least one of (a1)-(a2):
(a1) initiating translation of a circular nucleic acid molecule, or preparing a product for initiating translation of a circular nucleic acid molecule; and
(a2) increasing a protein expression level of a circular nucleic acid molecule, or preparing a product for increasing a protein expression level of a circular nucleic acid molecule.
According to a third aspect, the disclosure provides a circular nucleic acid molecule, where the circular nucleic acid molecule includes the polynucleotide according to the second aspect;
preferably, the circular nucleic acid molecule further includes a coding region encoding a polypeptide of interest, and the coding region is operably linked to the polynucleotide; and
optionally, the circular nucleic acid molecule further includes one or more of the following elements: a 5′ spacer region, a 3′ spacer region, a second exon, and a first exon.
In some embodiments, according to the circular nucleic acid molecule provided by the disclosure, the 5′ spacer region includes a sequence shown in any one of (b1)-(b2):
(b1) a nucleotide sequence shown in any one of SEQ ID NOs: 549-550; and
(b2) a sequence having at least 90%, optionally at least 95%, preferably at least 97%, more preferably at least 98%, most preferably at least 99% sequence identity with the nucleotide sequence shown in (b1).
In some embodiments, according to the circular nucleic acid molecule provided by the disclosure, the 3′ spacer region includes a sequence shown in any one of (c1)-(c2):
(c1) a nucleotide sequence shown in any one of SEQ ID NOs: 551-553; and
(c2) a sequence having at least 90%, optionally at least 95%, preferably at least 97%, more preferably at least 98%, most preferably at least 99% sequence identity with the nucleotide sequence shown in (c1).
In some embodiments, according to the circular nucleic acid molecule provided by the disclosure, the second exon includes a sequence shown in any one of (d1)-(d2):
(d1) a nucleotide sequence shown in SEQ ID NO: 555; and
(d2) a sequence having at least 90%, optionally at least 95%, preferably at least 97%, more preferably at least 98%, most preferably at least 99% sequence identity with the nucleotide sequence shown in (d1).
In some embodiments, according to the circular nucleic acid molecule provided by the disclosure, the first exon includes a sequence shown in any one of (e1)-(e2):
(e1) a nucleotide sequence shown in SEQ ID NO: 554; and
(e2) a sequence having at least 90%, optionally at least 95%, preferably at least 97%, more preferably at least 98%, most preferably at least 99% sequence identity with the nucleotide sequence shown in (e1).
According to a fourth aspect, the disclosure provides a cyclization precursor nucleic acid molecule, where the cyclization precursor nucleic acid molecule is cyclized to form the circular nucleic acid molecule according to the third aspect; and
optionally, the cyclization precursor nucleic acid molecule further includes one or more of the following elements:
a 5′ homology arm, a 3′ intron, a second exon, a 5′ spacer region, a coding region, a 3′ spacer region, a first exon, a 5′ intron and a 3′ homology arm.
In some embodiments, according to the cyclization precursor nucleic acid molecule provided by the disclosure, the 5′ homology arm includes a sequence shown in any one of (g1)-(g2):
(g1) a nucleotide sequence shown in any one of SEQ ID NOs: 558-559; and
(g2) a sequence having at least 90%, optionally at least 95%, preferably at least 97%, more preferably at least 98%, most preferably at least 99% sequence identity with the nucleotide sequence shown in (g1).
In some embodiments, according to the cyclization precursor nucleic acid molecule provided by the disclosure, the 3′ homology arm includes a sequence shown in any one of (h1)-(h2):
(h1) a nucleotide sequence shown in any one of SEQ ID NOs: 560-561; and
(h2) a sequence having at least 90%, optionally at least 95%, preferably at least 97%, more preferably at least 98%, most preferably at least 99% sequence identity with the nucleotide sequence shown in (h1).
In some embodiments, according to the cyclization precursor nucleic acid molecule provided by the disclosure, the 5′ intron includes a sequence shown in any one of (j1)-(j2):
(j1) a nucleotide sequence shown in SEQ ID NO: 556; and
(j2) a sequence having at least 90%, optionally at least 95%, preferably at least 97%, more preferably at least 98%, most preferably at least 99% sequence identity with the nucleotide sequence shown in (j1).
In some embodiments, according to the cyclization precursor nucleic acid molecule provided by the disclosure, the 3′ intron includes a sequence shown in any one of (k1)-(k2):
(k1) a nucleotide sequence shown in SEQ ID NO: 557; and
(k2) a sequence having at least 90%, optionally at least 95%, preferably at least 97%, more preferably at least 98%, most preferably at least 99% sequence identity with the nucleotide sequence shown in (k1).
According to a fifth aspect, the disclosure provides a recombinant nucleic acid molecule, where the recombinant nucleic acid molecule is selected from any one of (f1)-(f2):
(f1) including the polynucleotide according to the second aspect; and
(f2) being transcribed to form the cyclization precursor nucleic acid molecule according to the fourth aspect.
According to a sixth aspect, the disclosure provides a recombinant expression vector, where the recombinant expression vector includes the recombinant nucleic acid molecule according to the fifth aspect.
According to a seventh aspect, the disclosure provides a recombinant host cell, where the recombinant host cell includes the polynucleotide according to the second aspect, the circular nucleic acid molecule according to the third aspect, the cyclization precursor nucleic acid molecule according to the fourth aspect, the recombinant nucleic acid molecule according to the fifth aspect, or the recombinant expression vector according to the sixth aspect.
According to an eighth aspect, the disclosure provides a method for preparing a circular nucleic acid molecule with an improved protein expression level, where the method includes a step of operably linking the polynucleotide according to the second aspect to a coding region of the circular nucleic acid molecule.
According to a ninth aspect, the disclosure provides use of the circular nucleic acid molecule according to the third aspect, the cyclization precursor nucleic acid molecule according to the fourth aspect, the recombinant nucleic acid molecule according to the fifth aspect, or the recombinant expression vector according to the sixth aspect in at least one of (g1) to (g3):
(g1) expressing a protein, or preparing a product for expressing a protein;
(g2) expressing a polypeptide, or preparing a product for expressing a polypeptide; and
(g3) serving as or preparing a nucleic acid vaccine;
optionally, the protein or the polypeptide is one or more selected from: an antigen, an antibody, an antigen-binding fragment, a channel protein, a receptor, a cytokine, and an immune checkpoint inhibitor.
In some embodiments, through the Levenshtein distance-based IRES screening method provided by the disclosure, whether there is an IRES in the to-be-predicted sequence can be efficiently and accurately determined. If there is an IRES in the to-be-predicted sequence, the position of the IRES can be further predicted and determined by predicting the secondary structure of the to-be-predicted sequence in combination with the longest common substring of the to-be-predicted sequence and the sample sequence, so as to screen out a possible IRES core sequence from the sequences. This provides technical support for screening of highly active IRESs, facilitates discovery of new IRES sequences, and helps a researcher to selectively perform experimental verification on an RNA sequence with a higher probability of containing an IRES sequence, thereby improving the efficiency of experimental verification and avoiding wasted time and costs.
In some embodiments, the polynucleotide shown in any sequence of SEQ ID NOs: 1 to 548 is screened by the method provided by the disclosure. In the disclosure, through experimental verification, it is found that the polynucleotide shown in any sequence of SEQ ID NOs: 1 to 548 has the activity of initiating translation of the circular nucleic acid molecule, which indicates that the screening method provided in the disclosure has an advantage of high accuracy.
In some embodiments, comparison in the disclosure shows that the IRES activity of a polynucleotide that includes any nucleotide sequence shown in (i) and is screened according to the method of the present disclosure exceeds that of the CVB3 IRES element, which has high translation initiation activity among the IRES elements found so far. Such a polynucleotide can significantly increase the protein expression level of the circular nucleic acid molecule, thereby providing abundant translation initiation elements for application of the circular nucleic acid molecule in preparing a protein, serving as a vaccine, producing a therapeutic protein, or serving as a means of gene therapy, etc.
In some embodiments, the disclosure provides the circular nucleic acid molecule, including the polynucleotide that includes the nucleotide sequence shown in (i), which can achieve a high expression level of a polypeptide of interest and a protein of interest, thereby further expanding the application of the circular nucleic acid molecule in the fields of protein production, prevention or treatment of clinical diseases, etc.
In some embodiments, in the disclosure, the polynucleotide shown in any sequence in (i) is operably linked to the coding region of the circular nucleic acid molecule, providing a good basis for efficient expression of the protein of interest by the circular nucleic acid molecule.
When used in combination with the term “include” in the claims and/or description, the word “a” or “an” may refer to “one”, but may also refer to “one or more”, “at least one” and “one or more than one”.
As used in the claims and description, the word “include”, “have”, “comprise” or “contain” is meant to be inclusive or open-ended without exclusion of additional unrecited elements or method steps.
Throughout this application document, the term “about” means that one value includes a standard deviation of an error of a device or method used for measuring the value.
Although the disclosure supports a definition of the term “or” as referring both to alternatives only and to “and/or”, the term “or” in the claims means “and/or” unless it is explicitly stated that it refers to alternatives only or the alternatives are mutually exclusive.
The term “one-hot encoding”, also known as one-bit valid encoding, mainly means encoding N states by using an N-bit state register, where each state has its own register bit, and only one bit is valid at any time. The one-hot encoding is a representation of a categorical variable as a binary vector. First, a categorical value needs to be mapped to an integer value. Then, each integer value is expressed as a binary vector, which is zero-valued except for an index of an integer, which is denoted as 1.
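The two-step description above (categorical value to integer, then integer to binary vector) can be illustrated with a short Python sketch. The category order A, T, C, G follows the categorical variables named in the screening method; the function names are hypothetical and chosen only for illustration.

```python
CATEGORIES = ("A", "T", "C", "G")  # N = 4 states, so each vector has 4 bits
INDEX = {base: i for i, base in enumerate(CATEGORIES)}


def one_hot_encode(sequence):
    """Return one 4-bit vector per base, with exactly one bit set to 1."""
    encoded = []
    for base in sequence.upper():
        idx = INDEX[base]                # step 1: categorical value -> integer
        vector = [0] * len(CATEGORIES)   # step 2: all-zero vector ...
        vector[idx] = 1                  # ... except at the integer's index
        encoded.append(vector)
    return encoded


def one_hot_decode(encoded):
    """Invert the encoding by locating the single 1 in each vector."""
    return "".join(CATEGORIES[vector.index(1)] for vector in encoded)


if __name__ == "__main__":
    print(one_hot_encode("ATCG"))
```

Symbols outside the four categories raise a `KeyError` in this sketch; how such symbols (for example, U in an RNA sequence) should be handled is not specified here and would need to be decided separately.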
The term “sample sequence traversing” is based on the view that the sample sequences are objects (or elements) arranged in a series, in which each element comes either before or after the other elements, so the order of the elements matters. Sample sequence traversing means accessing each element sequentially along a certain search route once and only once. The operation performed when accessing an element depends on the specific application problem. Traversal of this kind is often used for tree search and graph search in data structures.
The term “Levenshtein distance” refers to a measure of the distance between two strings. Formally, the Levenshtein distance between two strings is the minimum number of single-character edits (deletions, insertions, and substitutions) required to transform one string into the other. The Levenshtein distance is also known as an edit distance; although it is only one type of edit distance, it is closely related to pairwise string alignment. Mathematically, the Levenshtein distance between two strings a and b is $\operatorname{lev}_{a,b}(|a|, |b|)$, where

$$
\operatorname{lev}_{a,b}(i,j) =
\begin{cases}
\max(i,j) & \text{if } \min(i,j) = 0, \\
\min
\begin{cases}
\operatorname{lev}_{a,b}(i-1,j) + 1 \\
\operatorname{lev}_{a,b}(i,j-1) + 1 \\
\operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)}
\end{cases}
& \text{otherwise,}
\end{cases}
$$

and $1_{(a_i \neq b_j)}$ is an indicator function whose value is 1 when $a_i \neq b_j$ and 0 otherwise. It should be noted that, within the minimum, the first term corresponds to a deletion operation (from a to b), the second term corresponds to an insertion operation, and the third term corresponds to a substitution operation.
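As a worked illustration of the recurrence, consider the hypothetical strings a = ACGT and b = AGT (chosen only for this example). Filling in $\operatorname{lev}_{a,b}(i,j)$ row by row, with rows indexed by the prefix of a and columns by the prefix of b, gives:

$$
\begin{array}{c|cccc}
 & j=0 & j=1\ (\mathrm{A}) & j=2\ (\mathrm{G}) & j=3\ (\mathrm{T}) \\ \hline
i=0 & 0 & 1 & 2 & 3 \\
i=1\ (\mathrm{A}) & 1 & 0 & 1 & 2 \\
i=2\ (\mathrm{C}) & 2 & 1 & 1 & 2 \\
i=3\ (\mathrm{G}) & 3 & 2 & 1 & 2 \\
i=4\ (\mathrm{T}) & 4 & 3 & 2 & 1
\end{array}
$$

The last entry follows from the third term of the minimum, since $a_4 = b_3 = \mathrm{T}$:

$$
\operatorname{lev}_{a,b}(4,3) = \min\bigl(\operatorname{lev}_{a,b}(3,3) + 1,\ \operatorname{lev}_{a,b}(4,2) + 1,\ \operatorname{lev}_{a,b}(3,2) + 0\bigr) = \min(3, 3, 1) = 1 .
$$

The distance is therefore 1, corresponding to the single deletion of C from a.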
The term “longest common substring” refers to the longest string that is a substring of each of two or more given strings. A difference between a longest common substring and a longest common subsequence is that a subsequence does not have to be continuous, but a substring must be continuous.
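The distinction between a substring and a subsequence can be made concrete with a small Python sketch. Both functions use standard dynamic programming; the names and the example strings are hypothetical and are not taken from the disclosure.

```python
def longest_common_substring(a, b):
    """Return (substring, start_in_a, start_in_b) for the longest contiguous match.

    If several matches share the maximum length, the first one found is returned.
    """
    best_len, best_end_a, best_end_b = 0, 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: common-suffix length of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the contiguous run
                if curr[j] > best_len:
                    best_len, best_end_a, best_end_b = curr[j], i, j
        prev = curr
    start_a = best_end_a - best_len
    start_b = best_end_b - best_len
    return a[start_a:best_end_a], start_a, start_b


def longest_common_subsequence_length(a, b):
    """Length of the longest common subsequence (gaps between characters allowed)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j in range(1, len(b) + 1):
            if ch == b[j - 1]:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(longest_common_substring("ACGT", "ATCTGT"))
    print(longest_common_subsequence_length("ACGT", "ATCTGT"))
```

For the strings ACGT and ATCTGT, the whole of ACGT is a common subsequence, whereas the longest contiguous match is only GT. The start offsets returned by the substring function indicate where the shared region lies in each string.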
The terms “polypeptide,” “peptide,” and “protein” are used interchangeably herein and are amino acid polymers of any length. The polymer can be linear or branched, can contain modified amino acids, and can be interrupted by non-amino acids. The term also includes amino acid polymers that have been subjected to modification (for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other treatment, such as conjugation with a labeling component).
The term “polynucleotide” or “nucleic acid molecule” refers to a polymer consisting of nucleotides. The polynucleotide may be in a form of an individual fragment or a component of a larger nucleotide sequence structure, derived from nucleotide sequences that have been isolated at least once in quantity or concentration, and sequences and their component nucleotide sequences can be identified, manipulated, and recovered by a standard molecular biological method (for example, by using a cloning vector). When one nucleotide sequence is expressed by one DNA sequence (namely, A, T, G, C), this also indicates inclusion of one RNA sequence (namely, A, U, G, C) where “U” substitutes for “T”. In other words, “polynucleotide” refers to a nucleotide polymer removed from other nucleotides (the individual fragment or entire fragment), or may be a component or constituent of the larger nucleotide structure, such as an expression vector or a polycistronic sequence. The polynucleotides include DNA, RNA and cDNA sequences.
The term “circular nucleic acid molecule” refers to a nucleic acid molecule in a closed ring. In some specific embodiments, the circular nucleic acid molecule is a circular RNA molecule. More specifically, the circular nucleic acid molecule is a circular mRNA molecule.
In some embodiments, the circular RNA molecule in the disclosure is formed by linking the 5′ end of a linear RNA molecule to its 3′ end to form a closed ring. The circular RNA molecule in the disclosure is formed by subjecting a cyclization precursor RNA molecule to cleavage and a cyclization reaction.
The term “linear RNA” refers to an RNA precursor that can be cyclized to form circular RNA, which is usually transcribed from a linear DNA molecule.
The term “linear mRNA” refers to RNA with a translation function that includes a 5′ cap structure, a 3′ polyadenosine tail (PolyA tail), a 5′ untranslated region (5′ UTR), a 3′ untranslated region (3′ UTR), an open reading frame (ORF), and the like.
The term “translation initiation element” refers to any sequence element capable of recruiting ribosomes and initiating a translation process of an RNA molecule. For example, the translation initiation element is an IRES element, an m6A modified sequence, a rolling circle translation initiation sequence, or the like.
The term “IRES” is also known as an internal ribosome entry site, and the “internal ribosome entry site” (IRES) belongs to a translation control sequence, is usually located at a 5′ end of a gene of interest, and enables translation of RNA in a cap-independent manner. A transcribed IRES can be directly bound to a ribosomal subunit, so that an mRNA initiation codon is properly oriented in the ribosome for translation. The IRES sequence is usually located in the 5′UTR (just upstream of the initiation codon) of the mRNA. The IRES functionally replaces a requirement for various protein factors that interact with a translation mechanism of eukaryotes.
The term “coding region” refers to a gene sequence capable of transcribing a messenger RNA and finally translating the messenger RNA into a polypeptide or protein of interest.
The term “expression” includes any step involved in production of a polypeptide, which includes, but is not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
The terms “sequence identity” and “percent identity” refer to a percentage of same (that is, identical) nucleotides or amino acids of two or more polynucleotides or polypeptides. Sequence identity of two or more polynucleotides or polypeptides can be measured by the following method: aligning nucleotide or amino acid sequences of the polynucleotides or polypeptides, scoring the number of positions containing same nucleotide or amino acid residues in the aligned polynucleotides or polypeptides, and comparing the number of positions containing same nucleotide or amino acid residues in the aligned polynucleotides or polypeptides with the number of positions containing different nucleotide or amino acid residues in the aligned polynucleotides or polypeptides. Polynucleotides can differ at one position, for example, by inclusion of different nucleotides (that is, substitution or mutation) or by absence of a nucleotide (that is, insertion of a nucleotide into, or deletion of a nucleotide from, one or both polynucleotides). Polypeptides can differ at one position, for example, by inclusion of different amino acids (that is, substitution or mutation) or by absence of an amino acid (that is, insertion of an amino acid into, or deletion of an amino acid from, one or both polypeptides). The sequence identity can be calculated by dividing the number of the positions containing same nucleotide or amino acid residues by a total number of nucleotide or amino acid residues in the polynucleotides or polypeptides. For example, the percent identity can be calculated by dividing the number of the positions containing same nucleotide or amino acid residues by a total number of nucleotide or amino acid residues in the polynucleotides or polypeptides, and multiplying by 100.
For example, when compared and aligned with maximum correspondence by using a sequence comparison algorithm or measuring via visual inspection, two or more sequences or subsequences have at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% “sequence identity” or “percent identity” of nucleotides. In some embodiments, overall lengths of sequences in any one or two compared biopolymers (for example, polynucleotides) are substantially identical.
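The percent identity calculation described above can be sketched in Python as follows. The sketch assumes the two sequences have already been aligned by some external tool, that gaps are written as "-", and that the denominator is the alignment length; the text does not fix these details, so they are assumptions, as is the function name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    ASSUMPTIONS: gaps are marked with `gap`, gap positions never count as
    identical, and the denominator is the total alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical alignment: one gap and one substitution in ten columns.
    print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))
```

Other conventions, such as dividing by the length of the shorter sequence or excluding gap columns from the denominator, give different percentages for the same alignment, so the convention should be stated whenever an identity threshold is applied.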
The term “recombinant nucleic acid molecule” refers to a polynucleotide having sequences which are not linked together in nature. A recombinant polynucleotide can be included in a proper vector, and the vector can be used for transformation into a proper host cell. The polynucleotide is then expressed in a recombinant host cell to produce, for example, a “recombinant polypeptide”, a “recombinant protein”, a “fusion protein”, and the like.
The term “recombinant expression vector” refers to a DNA structure for expressing, for example, a polynucleotide encoding a required polypeptide. The recombinant expression vector may include, for example: (i) a set of genetic elements having a regulatory effect on gene expression, such as a promoter and an enhancer; (ii) a structure or coding sequence capable of being transcribed into mRNA and translated into protein; and (iii) appropriate transcription and translation initiation and termination sequences. The recombinant expression vector can be constructed by any appropriate method. The nature of the vector is not critical, and any vector, including a plasmid, a virus, a phage, and a transposon, can be used. Possible vectors used in the disclosure include, but are not limited to, chromosomal, non-chromosomal, and synthetic DNA sequences, such as a viral plasmid, a bacterial plasmid, a phage DNA, a yeast plasmid, and a vector derived from a combination of plasmid and phage DNA, as well as DNAs from viruses such as lentivirus, retrovirus, vaccinia, adenovirus, fowlpox, baculovirus, SV40, and pseudorabies.
The term “host cell” refers to a cell into which an exogenous polynucleotide has been introduced, and includes a progeny of such cell. Host cells include “transformants” and “transformed cells,” namely, primary transformed cells and progenies derived therefrom. The host cell is any type of cellular system that can be used to produce an antibody molecule in the present invention, including a eukaryotic cell such as a mammalian cell, an insect cell, and a yeast cell; and a prokaryotic cell such as an Escherichia coli cell. The host cells include cultured cells, and also include cells within transgenic animals, transgenic plants, or cultured plant or animal tissue. The term “recombinant host cell” includes a host cell that differs from a parental cell after introduction of a circular nucleic acid molecule, a cyclization precursor nucleic acid molecule, a recombinant nucleic acid molecule or a recombinant expression vector, and the recombinant host cell is obtained specifically via transformation. The host cell in the disclosure may be a prokaryotic cell or a eukaryotic cell, as long as the host cell is a cell into which the circular nucleic acid molecule, the cyclization precursor nucleic acid molecule, the recombinant nucleic acid molecule, or the recombinant expression vector in the disclosure can be introduced.
The term “highly stringent condition” means subjecting probes of at least 100 nucleotides in length to prehybridization and hybridization for 12 to 24 hours at 42° C. in 5×SSPE (saline sodium phosphate EDTA), 0.3% SDS, 200 μg/mL sheared and denatured salmon sperm DNA, and 50% formamide according to a standard Southern blotting procedure for DNA, and finally washing the carrier material three times with 2×SSC and 0.2% SDS at 65° C., each wash lasting 15 minutes.
As used in the disclosure, the term “very highly stringent condition” means subjecting probes of at least 100 nucleotides in length to prehybridization and hybridization for 12 to 24 hours at 42° C. in 5×SSPE (saline sodium phosphate EDTA), 0.3% SDS, 200 μg/mL sheared and denatured salmon sperm DNA, and 50% formamide according to a standard Southern blotting procedure for DNA, and finally washing the carrier material three times with 2×SSC and 0.2% SDS at 70° C., each wash lasting 15 minutes.
Unless otherwise defined or clearly indicated in this context, all technical and scientific terms in the disclosure have the same meaning as commonly understood by a person of ordinary skill in the art to which the disclosure belongs.
In the technical solution in the disclosure, numbers in nucleotide and amino acid sequence listings in the description represent the following meanings:
Sequences shown in SEQ ID NOs: 1 to 548, and 562 to 564 are polynucleotide sequences having an activity of initiating translation of circular nucleic acid molecules;
A sequence shown in SEQ ID NO: 549 is a nucleotide sequence of a 5′ spacer sequence 1;
A sequence shown in SEQ ID NO: 550 is a nucleotide sequence of a 5′ spacer sequence 2;
A sequence shown in SEQ ID NO: 551 is a nucleotide sequence of a 3′ spacer sequence 1;
A sequence shown in SEQ ID NO: 552 is a nucleotide sequence of a 3′ spacer sequence 2;
A sequence shown in SEQ ID NO: 553 is a nucleotide sequence of a 3′ spacer sequence 3;
A sequence shown in SEQ ID NO: 554 is a nucleotide sequence of an exon element 1 (E1) of a class I PIE system;
A sequence shown in SEQ ID NO: 555 is a nucleotide sequence of an exon element 2 (E2) of a class I PIE system;
A sequence shown in SEQ ID NO: 556 is a nucleotide sequence of a 5′ intron of a class I PIE system;
A sequence shown in SEQ ID NO: 557 is a nucleotide sequence of a 3′ intron of a class I PIE system;
A sequence shown in SEQ ID NO: 558 is a nucleotide sequence of a 5′ homology arm sequence 1 (H1);
A sequence shown in SEQ ID NO: 559 is a nucleotide sequence of a 5′ homology arm sequence 2 (H2);
A sequence shown in SEQ ID NO: 560 is a nucleotide sequence of a 3′ homology arm sequence 1; and
A sequence shown in SEQ ID NO: 561 is a nucleotide sequence of a 3′ homology arm sequence 2.
The Levenshtein distance-based IRES screening method in the disclosure includes the following steps:
(1) selecting n sequences including an IRES as sample sequences, where n≥1 and n is a natural number;
(2) subjecting the sample sequences and to-be-predicted sequences to one-hot encoding respectively, where categorical variables are A, T, C, and G;
(3) traversing the sample sequences, and calculating a Levenshtein distance between each sample sequence and the to-be-predicted sequence;
(4) calculating an average of Levenshtein distances between all sample sequences and the to-be-predicted sequences; and
(5) determining, based on the average, whether the to-be-predicted sequences include the IRES.
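By way of a non-limiting illustration, the one-hot encoding of step (2) can be sketched as follows. The column order A, T, C, G follows the order in which the categorical variables are listed above; the constant `ALPHABET`, the function name `one_hot_encode`, and the handling of unexpected characters are assumptions made only for this sketch.

```python
ALPHABET = "ATCG"  # categorical variables, in the order listed in step (2)


def one_hot_encode(sequence: str) -> list[list[int]]:
    """Convert a nucleotide string into a list of 4-element binary vectors."""
    index = {base: k for k, base in enumerate(ALPHABET)}
    encoded = []
    for position, base in enumerate(sequence.upper(), start=1):
        if base not in index:
            # Assumption: ambiguous or non-ATCG characters are rejected.
            raise ValueError(f"unexpected character {base!r} at position {position}")
        row = [0] * len(ALPHABET)
        row[index[base]] = 1  # exactly one feature is set per nucleotide
        encoded.append(row)
    return encoded


for base, vector in zip("ATCG", one_hot_encode("ATCG")):
    print(base, vector)
```

Each nucleotide therefore becomes a vector with a single 1, so that sequence letter information is represented as digital information.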
According to the screening method provided by the disclosure, the Levenshtein distance is used for the first time to screen and determine IRESs for a large number of to-be-predicted sequence samples, which helps the researchers to selectively perform experimental verification on the to-be-predicted sequence samples with a high probability of the presence of the IRES, thereby effectively reducing time and costs for IRES sequence screening. Compared with an existing IRES prediction method, the screening method in the disclosure has advantages of accurate results and high efficiency.
In some embodiments, in the step (5), if the average is not less than a set prediction threshold, it is determined that the to-be-predicted sequence includes the IRES, otherwise it is determined that the to-be-predicted sequence includes no IRES.
In some specific embodiments, the prediction threshold is not less than 0.5. When the prediction threshold is not less than 0.5, there is a high probability that the to-be-predicted sequence includes the IRES. In some preferable embodiments, the prediction threshold is 0.75. When the prediction threshold is 0.75, the to-be-predicted sequences generally include the IRES.
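By way of a non-limiting illustration, the averaging and threshold comparison of steps (4) and (5) can be sketched as follows. The function name `contains_ires` is hypothetical, the default threshold of 0.75 is the preferable value given above, and the two per-sample values in the usage lines are those reported for the human poliovirus 1 example elsewhere in this description.

```python
from statistics import fmean


def contains_ires(scores: list[float], threshold: float = 0.75) -> tuple[float, bool]:
    """Average the per-sample values and compare the average with the threshold.

    Returns the average and True when the average is not less than the threshold.
    """
    if not scores:
        raise ValueError("at least one sample sequence is required (n >= 1)")
    average = fmean(scores)
    return average, average >= threshold


# Values reported for the CVB3 and E29 sample sequences in the worked example.
average, flag = contains_ires([0.79028, 0.79380])
print(round(average, 5), flag)
```

Raising the threshold makes the screen more selective, at the cost of excluding more candidate sequences from experimental verification.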
In some specific embodiments, the Levenshtein distance is calculated as follows. For two strings a and b, the Levenshtein distance between the first i characters of a and the first j characters of b, $\operatorname{lev}_{a,b}(i,j)$, satisfies:

$$
\operatorname{lev}_{a,b}(i,j)=
\begin{cases}
\max(i,j), & \text{if } \min(i,j)=0,\\[4pt]
\min\left\{
\begin{array}{l}
\operatorname{lev}_{a,b}(i-1,j)+1,\\
\operatorname{lev}_{a,b}(i,j-1)+1,\\
\operatorname{lev}_{a,b}(i-1,j-1)+1_{(a_i\neq b_j)}
\end{array}
\right. & \text{otherwise,}
\end{cases}
$$

where $1_{(a_i\neq b_j)}$ is an indicator function whose value is 1 when $a_i\neq b_j$ and 0 otherwise. It should be noted that, in the minimum, the first term corresponds to a deletion operation (from a to b), the second term corresponds to an insertion operation, and the third term corresponds to a substitution operation.
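By way of a non-limiting illustration, the recurrence above can be evaluated row by row as sketched below. The function `levenshtein` follows the recurrence directly. The helper `normalized_similarity` is an assumption: the description reports values between 0 and 1 and compares them with a threshold, but does not state how the raw edit count is scaled, so the scaling shown here (one minus the distance divided by the longer length) is only one common choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed row by row."""
    previous = list(range(len(b) + 1))  # lev(0, j) = j
    for i, char_a in enumerate(a, start=1):
        current = [i]  # lev(i, 0) = i
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (indicator)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Hypothetical scaling to [0, 1]; not specified in the description."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("ACGT", "AGT"))            # one deletion
print(normalized_similarity("ACGT", "AGT"))
```

Only two rows of the dynamic-programming table are kept in memory, which matters when full-length untranslated regions are compared against many sample sequences.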
In some embodiments, the method further includes the following steps: predicting a secondary structure of the to-be-predicted sequence determined to include the IRES, and determining a position of the IRES in the to-be-predicted sequence in combination with the longest common substring.
Further, predicting the secondary structure of the to-be-predicted sequence determined to include the IRES includes: predicting, by using at least one of RNAfold, Mfold, RNAfold WebServer, and Vienna RNA software, the secondary structure of the to-be-predicted sequence determined to include the IRES.
In combination with IRES analysis software such as RNAfold, the position of IRES in the to-be-predicted sequence containing IRES can be further analyzed and located, which facilitates the discovery of new IRES sequences.
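By way of a non-limiting illustration, the longest common substring used to help locate the IRES can be found by dynamic programming as sketched below. The function name `longest_common_substring` is hypothetical, and the two short strings in the usage lines are toy inputs constructed for this sketch, not sequences from the sequence listing.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous string shared by a and b.

    When several substrings share the maximum length, the first one
    encountered in a is returned.
    """
    best_length = 0
    best_end = 0  # end index (exclusive) of the best match within a
    previous = [0] * (len(b) + 1)
    for i, char_a in enumerate(a, start=1):
        current = [0] * (len(b) + 1)
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                # Extend the common run that ended at a[i-2], b[j-2].
                current[j] = previous[j - 1] + 1
                if current[j] > best_length:
                    best_length = current[j]
                    best_end = i
        previous = current
    return a[best_end - best_length:best_end]


# Toy inputs sharing one contiguous run.
print(longest_common_substring("TTGCGGAACCGAA", "CCGCGGAACCGTT"))
```

The position of the returned substring in the to-be-predicted sequence can then be compared with the predicted secondary structure when locating the IRES.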
In some embodiments, the method further includes the following step of: subjecting the to-be-predicted sequence determined to include the IRES to experimental verification to determine an IRES activity of the to-be-predicted sequence.
In some embodiments, the experimental verification includes the steps of:
constructing a circular nucleic acid molecule by using the to-be-predicted sequence determined to include the IRES, where in the circular nucleic acid molecule, the to-be-predicted sequence is operably linked to a nucleotide sequence encoding a fluorescent protein; and
obtaining a fluorescence signal released by the circular nucleic acid molecule, and determining the IRES activity of the to-be-predicted sequence based on the fluorescence signal.
In some specific embodiments, the disclosed human poliovirus 1 strain Mahoney_CDC 5′ UTR (the sequence shown in SEQ ID NO: 564), which has IRES activity, is taken as an example of a to-be-predicted sequence. The process of determining, by the method in the disclosure, whether the sequence shown in SEQ ID NO: 564 contains the IRES is as follows:
(1) selection of a sample sequence: a highly active human Coxsackievirus B3 (CVB3) virus IRES sequence (SEQ ID NO: 562) and a highly active human Echovirus 29 strain JV-10 (E29) virus IRES sequence (SEQ ID NO: 563) that have been experimentally verified are selected as sample sequences;
(2) one-hot encoding: as shown in Tables 1-3 below, to-be-encoded objects are determined as the sample sequence and the to-be-predicted sequence, where the categorical variables are A, T, C, and G; and each sample has 4 features, and the features are converted into binary vectors for representation, thereby converting sequence letter information into digital information;
(3) the sample sequences are traversed, and a Levenshtein distance between each sample sequence and the to-be-predicted sequence is calculated: wherein a represents the sample sequence, b represents the to-be-predicted sequence, i and j respectively represent a row and a column in Tables 1-3, and based on a calculation formula of the Levenshtein distance, a Levenshtein distance between the human poliovirus 1 strain Mahoney_CDC 5′UTR sequence and the human Coxsackievirus B3 (CVB3) virus IRES sequence is calculated to be 0.79028, and a Levenshtein distance between the human poliovirus 1 strain Mahoney_CDC 5′UTR sequence and the human Echovirus 29 strain JV-10 (E29) virus IRES sequence is calculated to be 0.79380;
(4) a prediction threshold is set to be 0.75, and an average of Levenshtein distances between 2 sample sequences and the to-be-predicted sequence is calculated to be 0.79204, where the average is greater than the prediction threshold of 0.75, and therefore, the to-be-predicted sequence, human poliovirus 1 strain Mahoney_CDC 5′UTR sequence, can be determined as the IRES-containing sequence;
(5) the sample sequences are traversed, and the longest common substrings of each sample sequence and the to-be-predicted sequence are separately searched, where the longest common substring of the to-be-predicted sequence, the human poliovirus 1 strain Mahoney_CDC 5′UTR sequence, and the sample sequence, the human Coxsackievirus B3 (CVB3) virus IRES sequence, is GCGGAACCGACTACTTTGGGTGTCCGTGTTTC, and the longest common substring of the to-be-predicted sequence, the human poliovirus 1 strain Mahoney_CDC 5′UTR sequence, and the sample sequence, the human Echovirus 29 strain JV-10 (E29) virus IRES sequence, is TCCTCCGGCCCCTGAATGCGGCTAATCCCAAC; and
(6) a secondary structure of the human poliovirus 1 strain Mahoney_CDC 5′UTR sequence is predicted by using RNAfold software, where as shown in
As shown in
Further, by the foregoing method, 548 nucleotide sequences containing the IRES are found via screening in the disclosure, and during further experimental verification, in the disclosure, it is found that a nucleotide sequence shown in any one of SEQ ID NOs: 1 to 548 has the IRES activity and can initiate the expression of a protein of interest in the circular nucleic acid molecule, indicating that the screening method provided by the disclosure has the advantages of high accuracy and high efficiency.
It should be noted that CVB3 IRES is a currently discovered IRES element having high IRES activity and capable of initiating protein expression of the circular nucleic acid molecule to high extent (Wesselhoeft R A, Kowalski P S, Anderson D G. Engineering circular RNA for potent and stable translation in eukaryotic cells. Nat Commun. 2018 Jul. 6; 9(1): 2629. doi: 10.1038/s41467-018-05096-6). In some specific embodiments, in the disclosure, by using the currently discovered CVB3 IRES having high IRES activity as a control, it is found that the polynucleotides of sequences shown below (SEQ ID NOs: 1, 2, 3, 4, 9, 10, 11, 13, 14, 15, 17, 18, 19, 20, 25, 26, 27, 28, 41, 42, 45, 46, 51, 56, 59, 72, 79, 91, 98, 101, 104, 106, 107, 110, 115, 116, 117, 118, 119, 122, 123, 125, 127, 129, 130, 135, 139, 165, 179, 180, 183, 186, 188, 198, 200, 215, 216, 217, 218, 219, 220, 221, 222, 223, 225, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 239, 240, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 272, 273, 274, 275, 276, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 291, 293, 294, 296, 298, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 314, 315, 317, 318, 319, 321, 322, 323, 324, 326, 329, 331, 332, 333, 334, 335, 336, 348, 385, 387, 389, 392, 393, 394, 395, 406, 436, 438, 439, 441, 445, 457, 460, 496, 504, 507, 509, 511, 514, and 534) in the disclosure have a higher capability of initiating the protein expression of the circular mRNA molecule compared with CVB3 IRES, indicating that a large number of nucleotide sequences of interest having extremely high IRES activity can be screened by the method in the disclosure, which lays a foundation for improving the level of the protein of interest expressed by the circular nucleic acid molecule.
Currently, although IRES elements capable of initiating a protein translation process have been found in some species (such as viruses), homology of viral IRES sequences of different species is low, and currently there is a lack of definite standards for determining the IRES sequences. Therefore, further research and identification are needed for the IRES sequences having the activity of initiating translation of the circular nucleic acid molecules.
To resolve the foregoing problem, the disclosure provides polynucleotides derived from different types of viruses as follows:
Echovirus E1 (strain Farouk/ATCC VR-1038), Echovirus E2 (strain USA/2013-19511), Echovirus E3 (isolate JSev001), Echovirus E3 (strain 61246-70294), Echovirus E3 (strain 61247-622), Echovirus E3 (strain 61245-2710), Echovirus E3 (strain 63038-1131), Echovirus E3 (strain 63040-70881), Echovirus E3 (isolate HNWY-01), Echovirus E3 (isolate ECHO3_INMI1), Echovirus E3 (isolate Env_2016_Sep_E-3), Echovirus E3 (strain Sakhalin-11.293), Echovirus E3 (strain HAI/2016-23067A), Echovirus E3 (strain HAI/2016-23066), Echovirus E3 (strain HAI/2016-23065A), Echovirus E3 (strain HAI/2016-23061), Echovirus E3 (strain HAI/2016-23056), Echovirus E3 (strain HAI/2016-23051A), Echovirus E3 (strain HAI/2016-23050), Echovirus E3 (isolate 123-R2), Echovirus E3 (strain Sakhalin/10_DU145), Echovirus E3 (strain Sakhalin/10_RD), Echovirus E3 (isolate E3/TO/BR/018), Echovirus E4 (strain 2F5), Echovirus 4 (strain AUS250G), Echovirus E4 (strain Pesacek), Echovirus E5, Echovirus E6, Echovirus 9 (strain Barty), Echovirus 9 (strain Hill), Echovirus E11, Echovirus E12, Echovirus E13 (strain HAI/2017-23078B), Echovirus E13 (strain HAI/2016-23072), Echovirus E13 (strain HAI/2016-23073), Echovirus E13 (strain HAI/2016-23075), Echovirus E13 (strain HAI/2017-23082B), Echovirus E14 (strain RO-81-1-79), Echovirus E14 (isolate ETH_P19/E14_2016), Echovirus E14 (isolate NSW-V04-2012-ECHO14), Echovirus E14 (isolate E14/P843/2013/China), Echovirus E14 (isolate E14/P968/2013/China), Echovirus E15 (strain CH 96-51), Echovirus E16 (isolate ETH_P4/E16_2016), Echovirus E16 (isolate E16/P85/2013/China), Echovirus E16 (strain Harrington), Echovirus 17 (strain CHHE-29), Echovirus E18 (isolate PC06/JS/CHN/2019), Echovirus E18 (strain E18/JXY2-2/2019), Echovirus E18 (isolate QD9/SD/CHN/2019), Echovirus E18 (isolate LJ/0530/2019), Echovirus E18 (strain 12J3), Echovirus E18 (strain USA/2015/CA-RGDS-1049), Echovirus E18 (isolate E18-221/HeB/CHN/2015), Echovirus E18 (strain 12G5), Echovirus E18 (isolate E18-393/HeB/CHN/2015), 
Echovirus E18 (isolate E18-398/HeB/CHN/2015), Echovirus E18 (isolate E18-HeB15-54462/HeB/CHN/2015), Echovirus E18 (isolate E18-HeB15-54498/HeB/CHN/2015), Echovirus E18 (isolate ETH_P12/E18_2016), Echovirus E18 (isolate NSW-V13A-2008-ECHO18), Echovirus E18 (strain A83/YN/CHN/2016), Echovirus E18 (strain A86/YN/CHN/2016), Echovirus E18 (isolate Jena/ST9524/10), Echovirus E18 (isolate Jena/VI10227/10), Echovirus E18 (isolate Kor05-ECV18-054cn), Echovirus E19 (strain HAI/2016-23039B), Echovirus E19 (strain HAI/2016-23036D), Echovirus E19 (strain HAI/2016-23037D), Echovirus E19 (strain HAI/2016-23037E), Echovirus E19 (strain HAI/2016-23042B), Echovirus E19 (strain HAI/2016-23046B), Echovirus E19 (strain HAI/2016-23047), Echovirus E19 (strain HAI/2016-23054), Echovirus E19 (strain HAI/2016-23052), Echovirus E19 (strain HAI/2016-23053), Echovirus E19 (strain HAI/2016-23062D), Echovirus E19 (strain HAI/2016-23063B), Echovirus E19 (strain HAI/2016-23064B), Echovirus E19 (strain HAI/2016-23067B), Echovirus E19 (strain HAI/2016-23070B), Echovirus E19 (strain HAI/2017-23079), Echovirus E19 (strain HAI/2017-23081A), Echovirus E19 (isolate ETH_P3/E19_2016), Echovirus E19 (strain NGR_2014), Echovirus E19 (isolate PDV_BLR_IN), Echovirus E19 (strain Burke), Echovirus E19 (strain K/542/81), Echovirus E20 (isolate E20/TO/BR/016), Echovirus E20 (strain HAI/2016-23038B), Echovirus E20 (strain HAI/2016-23041B), Echovirus E20 (strain HAI/2016-23085B), Echovirus E20 (strain HAI/2016-23065C), Echovirus E20 (strain HAI/2016-23068B), Echovirus E20 (strain HAI/2016-23069), Echovirus E20 (strain HAI/2017-23080B), Echovirus E20 (strain HAI/2017-23081B), Echovirus E20 (HAI/2016-23077B), Echovirus E20 (strain HAI/2017-23083C), Echovirus E20 (strain KM-EV20-2010), Echovirus E20 (strain JV-1), Echovirus E21 (strain 553/YN/CHN/2013), Echovirus E21 (strain Farina), Echovirus E24 (strain VEN/2018-23086), Echovirus E24 (isolate PZ18G/JS/20120703), Echovirus E24 (strain DeCamp), Echovirus E25 (strain 
USA/2016-19521), Echovirus E25 (strain USA/2018-23126), Echovirus E25 (strain 10-4339-2), Echovirus E25 (strain USA/CA/RGDS-2017-1010), Echovirus E25 (isolate NSW-V07-2007-ECHO25), Echovirus E25 (isolate NSW-V08-2008-ECHO25), Echovirus E25 (isolate NSW-V09-2008-ECHO25), Echovirus E25 (isolate NSW-V58-2010-ECHO25), Echovirus E25 (strain 61241-70868), Echovirus E25 (strain E25/ZE-wly/Zhejiang/CHN/2005), Echovirus E25 (isolate Jena/AN1380/10), Echovirus E25 (strain XM0297), Echovirus E25 (strain E25/2010/CHN/BJ), Echovirus E25 (isolate E25SD2010CHN), Echovirus E25 (strain HN-2), Echovirus E25 (strain JV-4), Echovirus E26 (strain Coronel), Echovirus E27 (isolate ETH_P8/E27_2016), Echovirus E27 (strain Bacon), Echovirus E29 (strain HAI/2016-23048B), Echovirus E29 (strain JV-10), Echovirus E30 (isolate E30/TO/BR/032), Echovirus E30 (isolate TL12C/NM/CHN/2016), Echovirus E30 (isolate TL7C/NM/CHN/2016), Echovirus E30 (strain USA/2018-23125), Echovirus E30 (Echo30/Hokkaido.JPN/21208/2017), Echovirus E30 (strain USA/2015/CA-RGDS-1046), Echovirus E30 (strain USA/2017/CA-RGDS-1048), Echovirus E30 (isolate B001/USA/2016), Echovirus E30 (strain 16-110), Echovirus E30 (strain 1-B4-TW), Echovirus E30 (strain 2002-59), Echovirus E30 (strain KM/A363/09), Echovirus E30 (isolate 1-MRS2013), Echovirus E30 (isolate 3-MRS2013), Echovirus E30 (isolate 4-MRS2013), Echovirus E30 (isolate 2012EM161), Echovirus E30 (isolate E30SD2010CHN), Echovirus E30 (isolate ECV30/GX10/05), Echovirus E30 (strain Kor08-ECV30), Echovirus E30 (isolate FDJS03_84), Echovirus 30 (strain Bastianni), Echovirus 31 (strain Caldwell), Echovirus 32 (strain PR-10), Echovirus E33 (strain YNK35/CHN/2013), Echovirus E33 (strain YNA12/CHN/2013), Human poliovirus 1 (isolate CHN-Hainan/93-2), Human poliovirus 1 (isolate RUS39223), Human poliovirus 1 (isolate Pak-1), Human poliovirus 1 (isolate TJK35363 clone 6), Human poliovirus 1 (strain 3788ALB96), Human poliovirus 1 (isolate CHN15115/Xinjiang/CHN/2011), Human poliovirus 1 
(isolate 29690_c1), Human poliovirus 1 (strain NIE1018316), Human poliovirus 1 (isolate EGY1218587), Human poliovirus 1 (isolate 558/BRA-PE/88), Human poliovirus 2 (isolate Env2008_E2450), Human poliovirus 2 (strain CHA1218985), Human poliovirus 2 (isolate Env2008_E3218), Human poliovirus 2 (strain MAD-2593-11), Human poliovirus 3 (strain PAK1019536), Human poliovirus 3 (isolate Env08_E2886), Human poliovirus 3 (strain SWI10947), Human poliovirus 3 (strain FIN84-2493), Human poliovirus 3 (strain USOL-D-bac), Enterovirus A71 (isolate 2019-EV-A71-R398), Enterovirus A71 (strain USA/2018-23296), Enterovirus A71 (strain 16L), Enterovirus A76 (strain 10-3291-2), Human enterovirus A76 (AY697458), Enterovirus A89 (strain KSYPH-TRMH22F/XJ/CHN/2011), Human enterovirus A89 (AY697459.1), Enterovirus A90 (strain 10-2879-1), Enterovirus A90 (isolate SCHO5F/XJ/CHN/2011), Human enterovirus A90 (isolate 01336/SD/CHN/EV90), Human enterovirus A90 (AB192877.1), Human enterovirus A90 (isolate F950027), Human enterovirus 91 (AY697461.1), Human enterovirus A92 (strain RJG7), Simian enterovirus SV19 (strain NOLA-2), Simian enterovirus SV19 (isolate cg4006), Simian enterovirus SV19 (strain M19s (P2)), Simian enterovirus SV43 (strain OM112t (P12)), Simian enterovirus SV46 (isolate cg5400), Simian enterovirus SV46 (strain RNM5), Enterovirus B69 (strain Toluca-1), Enterovirus B69 (isolate 15_491), Enterovirus B73 (isolate 088/SD/CHN/04), Human enterovirus B73 (isolate 2776-82), Human enterovirus 74 (strain Rikaze-136/XZ/CHN/2010), Enterovirus B75 (isolate Y16/XZ/CHN/2007), Enterovirus B75 (isolate 102/SD/CHN/97), Enterovirus B75 (strain USA/OK85-10362), Human enterovirus B77 (strain USA/TX97-10394), Human enterovirus B77 (strain CF496-99), Human enterovirus B79 (strain 17-2255-1_E79), Human enterovirus B79 (AB426610.1), Human enterovirus B79 (strain USA/CA79-10384), Enterovirus B80 (isolate HT-LYKH2O3F/XJ/CHN/2011), Human enterovirus B80 (isolate HZ01/SD/CHN/2004), Enterovirus B81 (isolate 
99279/XZ/CHN/1999), Human enterovirus B81 (strain USA/CA68-10389), Human enterovirus B82 (strain USA/CA64-10390), Human enterovirus B83 (strain USA/CA76-10392), Enterovirus B83 (isolate 99245/XZ/CHN/1999), Enterovirus B83 (isolate AFP341-GD-CHN-2001), Enterovirus B83 (isolate 246/YN/CHN/08), Enterovirus B84 (strain GHA:BAR:TES/2017), Enterovirus B84 (isolate AFP452/GD/CHN/2004), Human enterovirus B84 (isolate CIV2003-10603), Human enterovirus B85 (strain HTPS-MKLH04F/XJ/CHN/2011), Human enterovirus B85 (strain BAN00-10353), Human enterovirus B86 (strain BAN00-10354), Enterovirus B87 (isolate LY02/SD/CHN/2000), Enterovirus B88 (strain 11-4644-1), Human enterovirus B88 (strain BAN01-10398), Enterovirus B93 (isolate 99052/XZ/CHN/1999), Enterovirus B93 (isolate 38-03), Human enterovirus B97 (strain 99188/SD/CHN/1999/EV97), Human enterovirus B97 (strain DT94-0227), Human enterovirus B97 (strain BAN99-10355), Human enterovirus B98 (strain: T92-1499), Human enterovirus B100 (isolate BAN2000-10500), Human enterovirus B101 (strain CIV03-10361), Enterovirus B106 (isolate AKS-AWT-AFP2F/XJ/CHN/2011), Human enterovirus 106 (isolate 148/YN/CHN/12), Enterovirus C96 (strain VEN/2018-23123A), Enterovirus C96 (isolate 127/SD/CHN/1991), Enterovirus C96 (clone V13C), Enterovirus C99 (strain 10L1), Human enterovirus C104 (isolate kvv585-16-TS), Human enterovirus C105 (strain USA/OK/2014-19362), Human enterovirus C116 (strain 126), Enterovirus C117 (strain JX-C117-40-2017), Human enterovirus C118 (isolate CQ5185), Human enterovirus D68 (strain Fermon), Enterovirus D68 (TBp-13-Ph209), Enterovirus D70 (strain JPN/1989-23292), Enterovirus D94 (strain ANG/2010-23293), Human enterovirus D94 (isolate 19/04), Enterovirus D111 (strain ANG/2010-23294), Enterovirus D111 (isolate D111-NGR-KAT-1263), Simian enterovirus J103 (isolate cg8227), Coxsackievirus A2 (isolate HN202009), Coxsackievirus A2 (isolate 16027), Coxsackievirus A2 (isolate CVA2-1388-M14/XY/CHN/2017), Coxsackievirus A2 (isolate 
CVA2/Shenzhen50/CHN/2012), Coxsackievirus A2 (strain 2260165), Coxsackievirus A4 (strain CA4/JX2204/2014), Coxsackievirus A4 (isolate HK458564/2016), Coxsackievirus A5 (isolate CV-A5-3487-M14-XY-CHN-2017), Coxsackievirus A5 (strain CVA5/13164/HUN/2015), Coxsackievirus A6 (isolate DN1501), Coxsackievirus A6 (strain RYN-A1205), Coxsackievirus A7 (strain MAD-3101-11), Coxsackievirus A8 (isolate 13-467/GS/CHN/2013), Coxsackievirus A8 (isolate C177/CHW/AUS/2017), Coxsackievirus A8 (isolate CV-A8/P82/2013/China), Human coxsackievirus A8 (strain Donovan), Coxsackievirus A10 (isolate TA111R), Coxsackievirus A10 (strain CA10/JX2545/2017), Coxsackievirus A12 (isolate D89), Coxsackievirus A12 (strain QD-LXH535/SD/CHN/2009), Coxsackievirus A14 (strain MAD-72-07), Coxsackievirus A14 (isolate SEN-14-254), Human coxsackievirus A14 (strain G-14), Coxsackievirus A16 (isolate AH17-18/AH/East/CHN/2017-02-12), Coxsackievirus A16 (isolate CV-A16/HVN08.039_HA_GIANGVNM/2008), Coxsackievirus B1 (strain RO-98-1-74), Coxsackievirus B1 (strain CVB1/XM0108), Coxsackievirus B1 (strain B1/Groningen/2011), Coxsackievirus B2 (strain 13-2380-2_B2), Coxsackievirus B2 (strain 14L), Coxsackievirus B2 (strain 08-749-Shimane08-JPN), Coxsackievirus B2 (strain RW41-2/YN/CHN/2012), Coxsackievirus B2 (isolate BCH314), Coxsackievirus B3 (isolate B307), Coxsackievirus B3 (isolate 2001-5), Coxsackievirus B3 (isolate DHO9Y/JS/2012), Coxsackievirus B4 (isolate B401), Coxsackievirus B4 (isolate CV-B4/P11/2013/China), Coxsackievirus B4 (isolate Edwards CB4), Coxsackievirus B5 (isolate B501), Coxsackievirus B5 (strain USA/MI/2009-23030), Coxsackievirus B6 (isolate 99148/XZ/CHN/1999), Coxsackievirus B6 (strain LEV15), Coxsackievirus A9 (strain A744/YN/CHN/2009), Coxsackievirus A9 (isolate 2-MRS2013), Coxsackievirus A1 (clone V18A), Coxsackievirus A1 (isolate KS-ZPHO1F/XJ/CHN/2011), Coxsackievirus A11 (isolate CV-A11_66122), Coxsackievirus A13 (clone V4B), Coxsackievirus A13 (strain BAN01-10637), Coxsackievirus A19 
(strain 2019103106/XX/CHN/2019), Coxsackievirus A19 (strain 8663), Coxsackievirus A20 (strain CAM1976), Coxsackievirus A21 (isolate 12MYKLU412), Coxsackievirus A21 (strain NIV17-608-2), Coxsackievirus A22 (strain 438913), Coxsackievirus A24 (strain 20693_84_CV-A24), Coxsackievirus A15 (strain G-9), Coxsackievirus A18 (strain CAM1972), Human rhinovirus A2 (strain 12L4), Human rhinovirus A2 (strain USA/2018/CA-RGDS-1062), Human rhinovirus A2 (X02316), Human rhinovirus A7 (strain ATCC VR-1117), Human rhinovirus A8 (strain ATCC VR-1118), Human rhinovirus A9 (isolate F01), Human rhinovirus A9 (isolate F02), Human rhinovirus A9 (strain ATCC VR-489), Human rhinovirus A10 (strain ATCC VR-1120), Human rhinovirus A11 (strain RvA11/USA/2021/XHZLKL), Human rhinovirus A11 (strain SCH-107), Human rhinovirus A11 (EF173414), Human rhinovirus A12 (isolate p211), Human rhinovirus A12 (EF173415), Human rhinovirus A13 (strain ATCC VR-1123), Human rhinovirus A13 (isolate F03), Human rhinovirus A15 (isolate 7002), Human rhinovirus A15 (DQ473493), Human rhinovirus A16 (isolate KC939), Human rhinovirus A16 (HRVPP), Human rhinovirus A18 (strain HRVA18/03/ZJ/CHN/2017), Human rhinovirus 18 (strain ATCC VR-1128), Human rhinovirus 19 (strain ATCC VR-1129), Human rhinovirus A20 (strain RvA20/USA/2021/B4Q4QT), Human rhinovirus A22 (strain RvA22/USA/2021/WBLGNP), Human Rhinovirus A23 (strain RvA23/USA/2021/JZHYZ6), Human rhinovirus A24 (strain RvA24/USA/2021/QZ8RX3), Human Rhinovirus A25 (strain RvA25/USA/2021/A8F6KW), Human Rhinovirus A28 (strain RvA28/USA/2021/ADMJHA), Human Rhinovirus A29 (strain RvA29/USA/2021/273658-4), Human rhinovirus A30 (strain MCL-18-H-1135), Human rhinovirus A31 (strain RvA31/USA/2021/273760-4), Human rhinovirus A32 (strain ATCC VR-1142), Human rhinovirus A33 (strain ATCC VR-330), Human rhinovirus A34 (strain ATCC VR-1144), Human rhinovirus A36 (DQ473505.1), Human rhinovirus A38 (strain ATCC VR-1148), Human rhinovirus A39 (strain ATCC VR-340), Human rhinovirus A40 
(strain 7D5), Human rhinovirus A41 (strain SC9861), Human rhinovirus A43 (strain ATCC VR-1153), Human rhinovirus A44 (DQ473499), Human rhinovirus A45 (strain ATCC VR-1155), Human rhinovirus A46 (strain RvA46/USA/2021/6EEDHN), Human rhinovirus A47 (strain ATCC VR-1157), Human rhinovirus A49 (isolate F04), Human rhinovirus A50 (strain ATCC VR-517), Human rhinovirus A51 (strain ATCC VR-1161), Human rhinovirus A53 (DQ473507), Human rhinovirus A54 (strain ATCC VR-1164), Human rhinovirus A55 (DQ473511), Human rhinovirus A56 (strain ATCC VR-1166), Human rhinovirus A57 (isolate fs ship #1-hrv-57), Human rhinovirus A58 (strain ATCC VR-1168), Human rhinovirus A59 (strain 16-J2), Human rhinovirus A60 (strain ATCC VR-1473), Human rhinovirus A61 (strain SCH-99), Human rhinovirus A62 (strain ATCC VR-1172), Human rhinovirus A63 (strain ATCC VR-1173), Human rhinovirus A64 (strain ATCC VR-1174), Human rhinovirus A65 (strain ATCC VR-1175), Human rhinovirus A66 (strain ATCC VR-1176), Human rhinovirus A67 (strain ATCC VR-1177), Human rhinovirus A68 (strain ATCC VR-1178), Human rhinovirus A71 (strain ATCC VR-1181), Human rhinovirus A74 (DQ473494), Human rhinovirus A75 (DQ473510), Human rhinovirus A76 (strain ATCC VR-1186), Human rhinovirus A77 (strain ATCC VR-1187), Human Rhinovirus A78 (strain RvA78/USA/2021/177499), Human rhinovirus A80 (strain ATCC VR-1190), Human rhinovirus A81 (isolate F06), Human rhinovirus A82 (strain ATCC VR-1192), Human rhinovirus A85 (strain RvA85/USA/2021/AR424A), Human rhinovirus A88 (DQ473504.1), Human rhinovirus A90 (strain ATCC VR-1291), Human rhinovirus A94 (strain ATCC VR-1295), Human rhinovirus A95 (strain ATCC VR-1301), Human rhinovirus A96 (strain ATCC VR-1296), Human rhinovirus A98 (strain RvA98/USA/2021/W58KP8), Human rhinovirus A100 (strain ATCC VR-1300), Human rhinovirus A101 (strain SC1124), Human rhinovirus A103 (strain MCL-18-H-1122), Human rhinovirus B3 (NC_038312.1), Human rhinovirus B4 (DQ473490.1), Human rhinovirus B5 (strain ATCC 
VR-485), Human rhinovirus B6 (DQ473486.1), Human rhinovirus B17 (EF173420), Human rhinovirus B26 (strain ATCC VR-1136), Human rhinovirus B35 (strain ATCC VR-1145), Human rhinovirus B37 (EF173423), Human rhinovirus B42 (strain ATCC VR-338), Human rhinovirus B48 (DQ473488), Human rhinovirus B52 (isolate F10), Human rhinovirus B69 (strain ATCC VR-1179), Human rhinovirus B70 (DQ473489), Human rhinovirus B72 (strain ATCC VR-1182), Human rhinovirus B79 (isolate ZB/CHN/18), Human rhinovirus B83 (strain ATCC VR-1193), Human rhinovirus B84 (strain ATCC VR-1194), Human rhinovirus B86 (strain ATCC VR-1196), Human rhinovirus B91 (strain RvB91/USA/2021/95333), Human rhinovirus B92 (strain ATCC VR-1293), Human rhinovirus B93 (EF173425), Human rhinovirus B97 (strain ATCC VR-1297), Human rhinovirus B99 (strain ATCC VR-1299), Human rhinovirus C2 (isolate 470389), Human rhinovirus C6 (strain RvC6/USA/2021/LCP8K8), Human rhinovirus C8 (strain RvC8/USA/2021/7N6PM0), Human rhinovirus C9 (strain RvC9/USA/2021/96D92H), Human rhinovirus C10 (strain QCE), Human rhinovirus C11 (strain SC9849), Human rhinovirus C12 (strain RvC12/USA/2021/044858), Human rhinovirus C15 (strain RvC15/USA/2021/SUSM75), Human rhinovirus C17 (strain RvC17/USA/2021/T3RVH2), Human rhinovirus C23 (strain RvC23/USA/2021/ULVLFU), Human rhinovirus C30 (strain USA/2015/CA-RGDS-1045), Human rhinovirus C31 (strain RvC31/USA/2021/B8JUE1), Human rhinovirus C32 (strain USA/CA/RGDS-2016-1008), Human rhinovirus C34 (strain RvC34/USA/2021/BYRST7), Human rhinovirus C35 (strain RvC35/USA/2021/70881), Human rhinovirus C36 (strain RvC36/USA/2021/PEXCU4), Human rhinovirus C39 (strain RvC39/USA/2021/71206), Human rhinovirus C40 (strain RvC40/USA/2021/70389), Human rhinovirus C41 (strain USA/CA/2016-RGDS-1006), Human rhinovirus C42 (strain RvC42/USA/2021/278730), Human rhinovirus C43 (strain SC174), Human rhinovirus C47 (isolate CA-RGDS-1001), Human rhinovirus C50 (strain human/Australia/SG1/2008), Human rhinovirus C51 (isolate LZ508), 
Human rhinovirus C54 (isolate D3490), Human rhinovirus C56 (strain RvC56/USA/2021/466615), Enterovirus E (isolate HeN-A2), Enterovirus F (isolate HeN-B62), Enterovirus G (EV-G/Pig/JPN/Kana-Uchi13/2019/G1_PL-CP), Enterovirus I Dromedary camel enterovirus (strain 19CC), Bovine enterovirus GX20-1, Goat enterovirus (isolate NMG-F37), Aimelvirus 1 (strain gpai001), Ampivirus A1 (strain NEWT/2013/HUN), Equine rhinitis A virus (strain PERV-1), Foot-and-mouth disease virus - type A (isolate A/BR19-16_08 dpi_CB-RF), Foot-and-mouth disease virus - type Asia 1 (isolate Mazbi/QOL-UVAS-Pak/2006), Foot-and-mouth disease virus - type C (isolate KEN/1/2004), Foot-and-mouth disease virus O (isolate o6pirbright iso58), Foot-and-mouth disease virus - type SAT 1 (isolate TAN/3/80), Duck hepatitis A virus 1 (strain R85952), Turkey avisivirus (isolate USA-IN1), Bopivirus sp (strain bovine/TV-9682/2019-HUN), Encephalomyocarditis virus (ZM12/14), Human TMEV-like cardiovirus (NC_010810), Saffold virus 3 (NGT07-987), Human cosavirus A (strain AM326/BRA-AM/2017), Cosavirus F (strain NGR_2017_NHP_CV), Canine picodicistrovirus (strain 209), Equine rhinitis B virus 1, Simian hepatitis A virus, Hepatovirus D2 (isolate KS111230Crimig2011), Rodent hepatovirus (KEF121Sigmas2012), Hepatovirus G2 (isolate FO1AF48Rhilan2010), Loch Leven virus (isolate MW12_1o), Hunnivirus 05VZ (isolate 05VZ-75-RAT099), Melegrivirus A (NC_023858), Canine picornavirus, Turdivirus 3, Pasivirus A3 (strain swine/Zsana1/2013/HUN), Passerivirus (sp. strain waxbill/DB01/HUN/2014), Wenling sharpspine skate picornavirus (strain DHBYCGS18742), Picornaviridae (sp. rodent/RL/PicoV/FJ2015), Avian sapelovirus, Marmot sapelovirus 2 (strain HT6), Bat picornavirus (isolate BtPV/13585-58/M.dau/DK/2014), Bat picornavirus LMA6 (isolate DesRot/Peru/LMA6_F_DrPicoV), Sicinivirus A1 (isolate JSY), Sicinivirus A5 (strain RS/BR/2015/1), Sicinivirus (sp. 
isolate Environment/NLD/2019NE_7 picoma_3), Porcine teschovirus 10 (strain Vir 460/88), Tremovirus A (isolate GDs29), Yili Teratoscincus roborowskii picornavirus 1 (strain LPWC175499), Canine kobuvirus (US-PC0082), Feline kobuvirus (strain FK-13), Feline kobuvirus (strain WHJ-1), Kobuvirus (dog/AN211D/USA/2009), Murine kobuvirus 1 (isolate MKV1/NYC/2014/M014/0146), Kobuvirus sewage Kathmandu (isolate KoV-SewK™), Bovine kobuvirus (strain IL35164), Kobuvirus cattle/Kagoshima-1-22-KoV/2014/JPN (Kagoshima-1-22-KoV/2014/JPN), Caprine kobuvirus (isolate MN1/2018), Ferret kobuvirus (isolate MpKoV38), Grey squirrel kobuvirus (isolate UK 2010), Marmot kobuvirus (strain HT9), Ovine kobuvirus (isolate SKoV-China/SWUN/AB18/2019), Human parechovirus type 1 (PicoBank/HPeV1/a virus p123), Human parechovirus 3 (strain CAU14/2015/KR), Human parechovirus 4 (isolate 1(251176-02), Human parechovirus 5 (strain CT86-6760), Human parechovirus 5 (4112/SapporoC/July/2018), Human parechovirus 6 (strain: NI1561-2000), Human parechovirus 6 (isolate AFW), Human parechovirus 7, Human parechovirus 14 (clone V3C), Human parechovirus 17 (isolate 157Chzj058), Human parechovirus 18 (isolate 11Chzj207), Human parechovirus 19 (isolate 67Chzj11), Ljungan virus strain 145SL (isolate 145SLG), Ljungan virus M1146, Ljungan virus 64-7855, Rattus tanezumi parechovirus (strain Wencheng-Rt386-3), Parechovirus (sp. 
strain Parchzj-6), Baskerville virus, Bemisia tabaci picorna-like virus 1 (isolate CAU-Q1), British Admiral virus (isolate MW13_1o), Carfax virus, Chicken picornavirus 4 (isolate 5C), Chicken picornavirus 5 (isolate 27C), Chicken proventriculitis virus (isolate CPV/Korea/03), Zebrafish picornavirus-1 (strain NCSZCF/ZfPV/2015/North Carolina/USA), Duck picornavirus (duck/FC22/China/2017), Eotetranychus kankitus picorna-like virus (strain EKPLV.abc9), Falcon picornavirus, Feline picornavirus (strain 661F), French Guiana picornavirus (isolate French_Guiana Picornavirus), Leveillula taurica associated picorna-like virus 1 (isolate PM-A DN31116), Moran virus, Mus musculus picornavirus (strain Wencheng-Mm283), Ovine picornavirus, Pigeon mesivirus 2 (strain pigeon/GALII5-PiMeV/2011/HUN), Red-necked stint Picornavirus B-like, Sphenigellan virus, Sphenimaju virus, Washington bat picornavirus, Waterwitch virus (isolate MW03_1o), Aphid lethal paralysis virus, Cricket paralysis virus, Drosophila C virus (strain EB), Homalodisca coagulata virus-1, Antheraea pernyi iflavirus (isolate LnApIV-02), Isla virus (strain Cx 1773-5), Chaetoceros socialis f. radians RNA virus, and Apple latent spherical virus.
The polynucleotides provided by the disclosure have the activity of initiating translation of the circular nucleic acid molecule and can mediate expression of a protein encoded by the circular nucleic acid molecule, which achieves highly efficient translation and expression of the protein and provides a sound basis for applications of the circular nucleic acid molecule.
In some embodiments, the disclosure provides a polynucleotide (i) having the activity of initiating translation of a circular nucleic acid molecule, where the polynucleotide includes a nucleotide sequence shown in any one of SEQ ID NOs: 1 to 548. Preferably, the polynucleotide includes a nucleotide sequence shown in SEQ ID NOs: 1, 2, 3, 4, 9, 10, 11, 13, 14, 15, 17, 18, 19, 20, 25, 26, 27, 28, 41, 42, 45, 46, 51, 56, 59, 72, 79, 91, 98, 101, 104, 106, 107, 110, 115, 116, 117, 118, 119, 122, 123, 125, 127, 129, 130, 135, 139, 165, 179, 180, 183, 186, 188, 198, 200, 215, 216, 217, 218, 219, 220, 221, 222, 223, 225, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 239, 240, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 272, 273, 274, 275, 276, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 289, 291, 293, 294, 296, 298, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 314, 315, 317, 318, 319, 321, 322, 323, 324, 326, 329, 331, 332, 333, 334, 335, 336, 348, 385, 387, 389, 392, 393, 394, 395, 406, 436, 438, 439, 441, 445, 457, 460, 496, 504, 507, 509, 511, 514, and 534.
A polynucleotide shown in any sequence of SEQ ID NOs: 1 to 548, obtained via screening in the disclosure, can recruit a ribosome to the circular nucleic acid molecule to initiate translation of the circular nucleic acid molecule. A polynucleotide shown in a preferred sequence yields a protein expression level from the circular nucleic acid molecule that is significantly higher than that obtained with the CVB3 IRES, which can improve the expression level of the polypeptide and protein of interest, thereby providing abundant translation initiation elements for use of the circular nucleic acid molecule in preparing a protein, serving as a vaccine, producing a therapeutic protein, serving as a means of gene therapy, etc.
Although the circular nucleic acid molecule has extremely high application potential in protein expression and prevention or treatment of clinical diseases, the sequences that can be used to initiate translation of circular nucleic acid molecules have not been found in large numbers. The screening method provided by the disclosure provides abundant translation initiation sequences for circular nucleic acid molecules, and has an important value for broadening industrial and clinical application of the circular nucleic acid molecule.
In some embodiments, the polynucleotide further includes a mutant sequence (ii) of any nucleotide sequence shown in (i), where the mutant sequence has a mutant nucleotide at one or more positions of any corresponding sequence shown in (i), and the mutant sequence has the activity of initiating translation of the circular nucleic acid molecule.
In the disclosure, the mutant sequence refers to a polynucleotide that contains a change (that is, substitution, insertion and/or deletion) at one or more (for example, several) positions relative to a “wild-type” or “comparative” nucleotide sequence, where the substitution means substituting a different nucleotide for a nucleotide occupying a position. Deletion refers to removal of a nucleotide occupying a certain position. Insertion refers to addition of a nucleotide at a position adjacent to and immediately following a nucleotide occupying a position.
In some specific embodiments, the mutant sequence includes one or more nucleotides deleted from or added to a 5′ end of any corresponding nucleotide sequence shown in (i). In some specific embodiments, the mutant sequence includes one or more nucleotides deleted from or added to a 3′ end of any corresponding nucleotide sequence shown in (i). In some specific embodiments, the mutant sequence includes one or more nucleotides deleted, added and/or substituted inside any corresponding nucleotide sequence shown in (i).
In the disclosure, the mutant sequence may have an increased activity of initiating translation of the circular nucleic acid molecule, or retained or at least partially retained activity of initiating translation of the circular nucleic acid molecule compared with a non-mutated nucleotide sequence. Specifically, as long as the mutated nucleotide does not cause loss of the mutant sequence's activity of initiating translation of the circular nucleic acid molecule, the mutant sequence falls within the scope of the disclosure.
In some embodiments, the polynucleotide having the activity of initiating translation of the circular nucleic acid molecule further includes: a nucleotide sequence that is reverse complementary to a sequence capable of hybridizing with the nucleotide sequence shown in (i) or (ii) under a highly stringent hybridization condition or a very highly stringent hybridization condition, and that has the activity of initiating translation of the circular nucleic acid molecule.
In some embodiments, the polynucleotide having the activity of initiating translation of the circular nucleic acid molecule further includes a nucleotide sequence having at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% (including all ranges and percentages between these values) sequence identity with the nucleotide sequence shown in any one of (i) or (ii) and having the activity of initiating translation of the circular nucleic acid molecule.
In some embodiments, the disclosure provides use of the polynucleotide in at least one of (a1)-(a2):
(a1) initiating translation of a circular nucleic acid molecule, or preparing a product for initiating translation of a circular nucleic acid molecule; and
(a2) increasing a protein expression level of a circular nucleic acid molecule, or preparing a product for increasing a protein expression level of a circular nucleic acid molecule.
The polynucleotide provided by the disclosure is used for initiating protein translation of the circular nucleic acid molecule, and has high translation activity, thereby implementing stable and efficient expression of the protein of interest.
The circular nucleic acid molecule provided by the disclosure includes the polynucleotide shown in any sequence in (i). The circular nucleic acid molecule has high protein expression efficiency and has great application potential in fields such as industrial protein production, nucleic acid vaccines, expression of therapeutic proteins, and gene therapies.
In some embodiments, the circular nucleic acid molecule is a circular RNA molecule. More specifically, the circular nucleic acid molecule is a circular mRNA molecule including a coding region encoding a polypeptide of interest. The coding region of the circular mRNA molecule is operably linked to the polynucleotide having the activity of initiating translation of the circular nucleic acid molecule, thereby initiating the protein translation process of the circular mRNA molecule.
In some embodiments, the circular mRNA molecule further includes one or more of the following elements: a 5′ spacer region, a 3′ spacer region, a second exon, and a first exon.
In some preferred embodiments, the circular mRNA molecule includes the following sequentially linked elements: a second exon E2, a 5′ spacer region, the polynucleotide having the activity of initiating translation of the circular nucleic acid molecule, a coding region, a 3′ spacer region, and a first exon E1. In the disclosure, it is found that the circular mRNA molecule with this structure has an increased protein expression level after insertion of the polynucleotide provided by the disclosure.
In the disclosure, the coding region may contain a nucleotide sequence encoding any protein. The sequence of the coding region is not specifically limited in the present disclosure and is chosen according to the type of protein of interest to be expressed.
In some specific embodiments, the 5′ spacer region includes a nucleotide sequence shown in any one of SEQ ID NOs: 549-550, or a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence shown in any one of SEQ ID NOs: 549-550.
In some specific embodiments, the 3′ spacer region includes a nucleotide sequence shown in any one of SEQ ID NOs: 551-553, or a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence shown in any one of SEQ ID NOs: 551-553.
In some specific embodiments, the first exon E1 includes a nucleotide sequence shown in SEQ ID NO: 554, or a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence shown in SEQ ID NO: 554.
In some specific embodiments, the second exon E2 includes a nucleotide sequence shown in SEQ ID NO: 555, or a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence shown in SEQ ID NO: 555.
The disclosure finds that nucleotide sequences of the foregoing elements can further promote a protein translation process of the circular mRNA molecule mediated by the polynucleotide, and improve the activity of initiating protein translation by the polynucleotide.
In some other embodiments, the circular nucleic acid molecule may also include other types of elements or element sequences, which is not specifically limited in the disclosure, as long as the polynucleotides shown in SEQ ID NOs: 1 to 548 in the disclosure can initiate protein translation of the circular nucleic acid molecule to achieve high-level expression of the protein.
In some embodiments, the disclosure provides a cyclization precursor nucleic acid molecule, which can be cyclized to form the circular nucleic acid molecule described above. Further, the cyclization precursor nucleic acid molecule is a cyclization precursor mRNA molecule.
In some specific embodiments, the cyclization precursor mRNA molecule further includes one or more of the following elements: a 5′ homology arm, a 3′ intron, a second exon, a 5′ spacer region, a coding region, a 3′ spacer region, a first exon, a 5′ intron and a 3′ homology arm.
In some specific embodiments, the cyclization precursor mRNA molecule includes the following sequentially linked elements:
a 5′ homology arm, a 3′ intron, a second exon, a 5′ spacer region, the polynucleotide having the activity of initiating translation of the circular nucleic acid molecule, a coding region, a 3′ spacer region, a first exon, a 5′ intron and a 3′ homology arm.
The cyclization precursor mRNA molecule is cyclized by the following process: through the ribozyme activity of the intron, triggered by GTP, the junction between the 5′ intron and the first exon is cleaved; the cleaved end of the first exon then attacks the junction between the 3′ intron and the second exon, breaking that junction; the 3′ intron is released; and the first exon and the second exon are joined to form the circular mRNA molecule.
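The relationship between the precursor layout and the circular product can be illustrated with a minimal Python sketch. It only tracks element names, not sequences or chemistry; the element labels and the helper `circularize` are hypothetical names introduced here for illustration, and the sketch assumes that splicing discards everything upstream of the second exon and downstream of the first exon, as the process described above implies.

```python
# Element order of the cyclization precursor, 5' to 3', as listed in the text.
PRECURSOR_ORDER = [
    "5' homology arm",
    "3' intron",
    "second exon E2",
    "5' spacer region",
    "IRES polynucleotide",
    "coding region",
    "3' spacer region",
    "first exon E1",
    "5' intron",
    "3' homology arm",
]


def circularize(precursor):
    """Return the elements retained in the circular product (hypothetical helper).

    Assumption: the homology arms and both intron halves are released, so only
    the span from the second exon through the first exon remains. The last
    element (E1) is understood to be joined back to the first element (E2).
    """
    start = precursor.index("second exon E2")
    end = precursor.index("first exon E1")
    return precursor[start:end + 1]


circular = circularize(PRECURSOR_ORDER)
for element in circular:
    print(element)
```

The retained order (E2, 5′ spacer region, polynucleotide, coding region, 3′ spacer region, E1) matches the preferred circular mRNA layout described earlier in this section.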
In some specific embodiments, the 5′ homology arm includes a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with a nucleotide sequence shown in any one of SEQ ID NOs: 558-559.
In some specific embodiments, the 3′ homology arm includes a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with a nucleotide sequence shown in any one of SEQ ID NOs: 560-561.
In some specific embodiments, the 5′ intron includes a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence shown in SEQ ID NO: 556.
In some specific embodiments, the 3′ intron includes a sequence having at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence shown in SEQ ID NO: 557.
In some embodiments, the disclosure provides a recombinant nucleic acid molecule capable of being transcribed to form the cyclization precursor mRNA molecule described above. To enable transcription of the recombinant nucleic acid molecule to form the mRNA molecule, the recombinant nucleic acid molecule may also contain a regulatory sequence. For example, the regulatory sequence is a T7 promoter linked upstream of the 5′ homology arm.
In some embodiments, the disclosure provides a recombinant expression vector including the recombinant nucleic acid molecule described above. The vector carrying the recombinant nucleic acid molecule can be any of the various types of vectors commonly used in the art, for example, a pUC57 plasmid. Further, the recombinant nucleic acid molecule contains a restriction site, so that a linearized vector suitable for transcription is obtained after the recombinant expression vector is digested with the corresponding restriction enzyme.
In some embodiments, the disclosure provides a recombinant host cell, including at least one of the circular mRNA molecule, the cyclization precursor mRNA molecule, the recombinant nucleic acid molecule, and the recombinant expression vector.
Other objectives, features and advantages of the disclosure will become obvious from the following detailed description. However, it should be understood that the detailed description and specific examples (while showing specific embodiments of the disclosure) are provided for explanatory purposes only, because various changes and modifications made within the spirit and scope of the disclosure will become obvious to those skilled in the art after reading the detailed description.
The experimental techniques and methods used in this example are conventional technical methods unless otherwise specified. For example, the experimental methods in which specific conditions are not specified in the following examples are usually performed according to conventional conditions, for example, conditions described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or conditions recommended by a manufacturer. The materials, reagents, and the like used in the examples are commercially available unless otherwise specified.
(1) Nucleotide sequences derived from different species of viruses were obtained and used as a set of to-be-predicted sequences.
(2) A set of 583 sample IRES sequences whose activity had been experimentally verified was downloaded from the IRESite database (http://www.iresite.org).
(3) One-hot encoding: to-be-encoded objects were determined as (1) a set of obtained to-be-predicted sequences, and (2) a set of selected IRES sequences, wherein the categorical variables were A, T, C, and G; and each sample had 4 features, and the features were converted into binary vectors for representation. Taking SEQ ID NO: 1 as an example, details are shown in Table 4 below:
(4) Calculation of Levenshtein distances: Levenshtein distances between each to-be-predicted sequence and the selected 583 sample IRES sequences were calculated, and an average was taken. Mathematically, the Levenshtein distance between two strings a and b is lev_{a,b}(|a|, |b|), where
$$\operatorname{lev}_{a,b}(i,j)=\begin{cases}\max(i,j), & \text{if } \min(i,j)=0,\\[4pt] \min\begin{cases}\operatorname{lev}_{a,b}(i-1,j)+1\\ \operatorname{lev}_{a,b}(i,j-1)+1\\ \operatorname{lev}_{a,b}(i-1,j-1)+1_{(a_i\neq b_j)}\end{cases} & \text{otherwise,}\end{cases}$$
and 1_{(a_i ≠ b_j)} is an indicator function whose value is 1 when a_i ≠ b_j and 0 otherwise. It should be noted that, in the minimum, the first term corresponds to a deletion operation (from a to b), the second term corresponds to an insertion operation, and the third term corresponds to a substitution operation. The average of the Levenshtein distances between each to-be-predicted sequence and the 583 sample IRES sequences was calculated. The maximum average was 1.0. If the average was greater than 0.5, it could be preliminarily determined that the to-be-predicted sequence could contain the IRES; if the average was greater than 0.75, it was determined that the to-be-predicted sequence highly likely contained the IRES. The averages of the Levenshtein distances are shown in Table 5 below.
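Steps (3) and (4) can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used in the disclosure. The function names are hypothetical. The text reports averages bounded by 1.0 but does not state how the raw edit distance was normalized, so the `similarity` function below assumes a normalized score of 1 minus the distance divided by the longer sequence length; that normalization is an assumption.

```python
from statistics import mean

ALPHABET = "ATCG"  # the four categorical variables named in step (3)


def one_hot(seq):
    """Encode each base as a 4-element binary vector in A, T, C, G order."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


def levenshtein(a, b):
    """Edit distance from the recurrence in step (4), computed row by row."""
    prev = list(range(len(b) + 1))          # lev(0, j) = j
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # lev(i, 0) = i
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion
                curr[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),   # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """ASSUMPTION: normalized score in [0, 1]; not specified in the text."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def mean_score(candidate, references):
    """Average the normalized score of one candidate over all references."""
    return mean(similarity(candidate, ref) for ref in references)


# Toy sequences for illustration only; these are not sequences from the listing.
print(one_hot("AT"))
print(levenshtein("ACGT", "ACGA"))
print(mean_score("ACGT", ["ACGT", "ACGA"]))
```

Under this assumed normalization, identical sequences score 1.0, which is consistent with the stated maximum average of 1.0 and with higher averages indicating a more likely IRES.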
Plasmids containing different IRES elements and the eGFP coding gene were constructed; gene synthesis and cloning were entrusted to Nanjing Genscript Biotech Corporation. The DNA vector constructed for the circular RNA included a T7 promoter, a 5′ homology arm (SEQ ID NO: 558), a 3′ intron (SEQ ID NO: 557), a second exon E2 (SEQ ID NO: 555), a 5′ spacer region (SEQ ID NO: 549), an IRES element, an eGFP protein coding region sequence, a 3′ spacer region (SEQ ID NO: 551), a first exon E1 (SEQ ID NO: 554), a 5′ intron (SEQ ID NO: 556), a 3′ homology arm (SEQ ID NO: 560), and an XbaI restriction site that can be used for plasmid linearization. The obtained gene fragment was ligated into a pUC57 vector.
(1) Stab culture bacteria synthesized in vitro were activated at 37° C. and 220 rpm for 3 to 4 hours.
(2) The activated bacterial solution was taken for amplification culture at 37° C. and 220 rpm overnight.
(3) The plasmid was extracted (Tiangen endotoxin-free plasmid Midiprep Kit), and an OD value was measured.
The plasmid prepared in the foregoing step 2.2.1 was linearized by single digestion with XbaI.
Enzyme digestion was conducted at 37° C. overnight. A universal DNA gel extraction kit (Tiangen Biotech (Beijing) Co., Ltd.) was used to recover the enzyme-digested product, the OD value was measured, and the enzyme-digested product was identified via 1% agarose gel electrophoresis. The purified linear plasmid template was used for in vitro transcription.
2.2.3 Preparation of mRNA Via In Vitro Transcription
2.2.3.1 Preparation of Circular mRNA Via One-Step Transcription and Cyclization
1) An in vitro transcription reaction was conducted, and the system was as follows:
Incubation was carried out at 37° C. for 2 to 4 hours, and then 2 μL of DNase I was added for digestion at 37° C. for 15 minutes.
2) Purification of transcript mRNA
The foregoing obtained transcript was purified via a silica spin column method (Thermo, GeneJET RNA Purification Kit), and the OD value was measured and 1% denatured agarose gel electrophoresis was used to identify an RNA size (
2.2.4 Transfection of 293T Cells with Circular mRNA Encoding EGFP and Measurement of Fluorescence Intensity
2.2.4.1 Cell culture: 293T cells were inoculated in a DMEM high-glucose medium containing 10% fetal bovine serum and 1% penicillin-streptomycin, and incubated at 37° C. in a 5% CO2 incubator. The cells were subcultured every 2-3 days.
2.2.4.2 Cell transfection: before transfection, the 293T cells were seeded in a 24-well plate at 1×10⁵ cells/well, and incubated at 37° C. in a 5% CO2 incubator. After the confluence of the cells reached 70% to 90%, the transfection reagent Lipofectamine MessengerMAX (Invitrogen) was used to transfect the 293T cells at 500 ng of mRNA per well. Detailed operations were as follows:
Incubation was carried out by standing at room temperature for 10 minutes after dilution and mixing.
2) Dilution of mRNA
3) Mixing of Diluted MessengerMAX™ Reagent and Diluted mRNA (1:1)
Incubation was carried out by standing at room temperature for 5 minutes after dilution and mixing.
4) 50 μL of the above mixed solution was pipetted and slowly added along the well wall of the 24-well plate, and incubation was carried out at 37° C. in the 5% CO2 incubator.
1) Cell fluorescence observation: expression of EGFP was observed in the 293T cells 24 hours after transfection under a fluorescence microscope.
2) Test of average fluorescence intensity of cells via flow cytometry: the average fluorescence intensity of the 293T cells was measured by using a flow cytometer 24 hours after transfection.
No active IRES sequence was added to the circular mRNA molecule in control 1, and a coxsackievirus B3 (CVB3) sequence (SEQ ID NO: 562) with high IRES activity was added to the circular mRNA molecule in control 2. The test results are shown in the table below. If the expression level of EGFP was greater than 0 and less than or equal to 10000, it indicated that the to-be-predicted sequence mediated the expression of the circular RNA and contained the IRES sequence; if the expression level of EGFP was greater than 10000, it indicated that the IRES contained in the to-be-predicted sequence had extremely good activity.
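The interpretation rule above can be expressed as a small Python function. This is a sketch for illustration only; the function name and the returned labels are hypothetical, and the text does not say how a level of 0 or below is interpreted, so the final branch is an assumption.

```python
def classify_egfp(level):
    """Map a measured EGFP expression level to the categories in the text."""
    if level > 10000:
        return "IRES with extremely good activity"
    if level > 0:
        # 0 < level <= 10000
        return "contains IRES"
    # ASSUMPTION: the text does not define this case explicitly.
    return "no IRES-mediated expression detected"


# Made-up example values, not measured data from the disclosure.
for value in (0, 2500, 10000, 10001):
    print(value, classify_egfp(value))
```

Note that the boundary value 10000 falls in the lower category, because the text states "less than or equal to 10000" for that range.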
As can be seen from Table 11 above, the polynucleotides of the sequences shown in SEQ ID NOs: 1 to 548 in the disclosure all had the activity of initiating protein translation of the circular mRNA molecule, and could be used as the IRES element to construct a circular mRNA molecule having protein and polypeptide translation activity. In some preferred embodiments, the EGFP expression level of the circular mRNA molecules constructed by using the polynucleotide in the disclosure was higher than that of the circular nucleic acid molecule constructed by using the Coxsackievirus B3 (CVB3) IRES (shown in SEQ ID NO: 562), indicating that the IRES activity of the polynucleotide provided by the disclosure was further improved compared with the existing highly active IRES sequence, which is of great significance for improving the levels at which the circular nucleic acid molecule expresses the protein of interest and the polypeptide of interest.
All technical features disclosed in this specification can be combined in any manner. Each feature disclosed in this specification may also be replaced with other features having the same, equivalent or similar function. Therefore, unless otherwise specified, each disclosed feature is only an instance of a series of equivalent or similar features.
In addition, from the foregoing descriptions, a person skilled in the art can easily learn a key feature of the present invention, and can make many modifications to the invention to adapt to various use purposes and conditions without departing from the spirit and scope of the present invention. Therefore, such modifications are also intended to fall within the scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
202111185073.9 | Oct 2021 | CN | national |
202111435528.8 | Nov 2021 | CN | national |