The present disclosure relates to a method of screening a regulatory element for enhancing mRNA translation, a novel regulatory element resulting from the method, and uses thereof.
Viruses have evolved diverse mechanisms to hijack cellular gene expression machinery, and research in this area has contributed greatly to advances in RNA biology and biotechnology. For instance, the 7-methyl guanosine cap, internal ribosome entry site, and RNA triple helix were first discovered in reovirus, poliovirus, and Kaposi's sarcoma-associated herpesvirus, respectively. Human immunodeficiency virus (HIV) is known to utilize the transactivation response region (TAR) and the rev-response element (RRE) to recruit cellular factors for viral transcription and RNA export, respectively (Vaishnav, et al., New Biol., 1991, 3, 142-150; Dingwall, et al., EMBO J., 1990, 9, 4145-4153). Hepatitis B virus (HBV) relies on its post-transcriptional regulatory element (PRE) to recruit host nucleotidyl transferases, which stabilize viral transcripts (Kim, et al., Nat. Struct. Mol. Biol., 2020, 27, 581-588; Huang, et al., Mol. Cell. Biol., 1993, 13, 7476-7486).
However, these discoveries were made through low-throughput analyses of pathogenic viruses, which represent only a small fraction of the entire virome. To date, 6,828 viral species have been named, and the NCBI Genome database contains 14,775 complete viral genome sequences (O'Leary, et al., Nucleic Acids Res., 2016, 44, D733-D745). Recent metagenomics studies based on deep sequencing have detected hundreds of thousands of additional viral sequences from environmental and animal samples (Neri, et al., Cell, 2022, 185, 4023-4037). Despite the vast number of available sequences, those without clinical or industrial relevance remain largely unexplored. Therefore, the rapidly growing collection of viral sequences presents a significant challenge for functional annotation, demanding more effective strategies to interpret viral sequence data.
The present inventors developed a method of screening regulatory elements for enhancing mRNA translation using viral sequence data, and used this method to discover novel regulatory elements and uses thereof.
An objective of the present disclosure is to provide a method of screening a regulatory element for enhancing RNA stability and/or mRNA translation.
Another objective of the present disclosure is to provide a regulatory element for enhancing RNA stability and/or mRNA translation.
Another objective of the present disclosure is to provide a construct, vector, or recombinant host cell, which includes a gene of a target protein and the regulatory element, preferably located in a 3′ UTR of the gene.
Another objective of the present disclosure is to provide a composition including the construct, vector, or recombinant host cell.
Another objective of the present disclosure is to provide a method of preparing a target protein, the method including: culturing the recombinant host cell; and separating a target protein.
Another objective of the present disclosure is to provide a method of preparing an mRNA construct, the method including: in vitro transcribing a construct by using the construct or vector as a template; and recovering a transcribed mRNA construct.
Another objective of the present disclosure is to provide a use of the construct, vector, recombinant host cell, or composition for enhancing RNA stability and/or mRNA translation.
Another objective of the present disclosure is to provide a use of the construct, vector, recombinant host cell, or composition for preventing or treating a disease.
Another objective of the present disclosure is to provide a use of the construct, vector, recombinant host cell, or composition for preparing an mRNA construct or a target protein.
Through the screening method of the present disclosure, a novel regulatory element capable of enhancing mRNA translation may be obtained. Furthermore, the novel regulatory element may increase the expression of a target protein and, as such, may be applied to various fields depending on the intended use of the target protein.
Data are normalized against the value of the wild-type sample and represented as mean±SEM (n=3). * indicates p<0.05, with a two-sided Student's t-test performed. (L) shows the results where FLAG-tagged ZCCHC2 proteins (F-ZCCHC2) were transiently expressed in HeLa ZCCHC2 knockout cells, immunoprecipitated with an anti-FLAG antibody, and analyzed by western blotting. Full-length ZCCHC2 protein and its truncated mutants (ΔC, ΔN) were compared for their ability to interact with TENT4 proteins. TENT4A and GAPDH were detected on the same gel, whereas the other proteins were analyzed on separate gels with the same amounts of samples. Cross-reacting bands are indicated by asterisks.
Each description and embodiment disclosed in the present application may be applied to other descriptions and embodiments presented herein. In other words, all combinations of the various elements disclosed herein fall within the scope of the present application. The scope of the present application shall not be considered limited by any specific descriptions provided below. Moreover, a person of ordinary skill in the art would be able to recognize or identify, using no more than routine experimentation, numerous equivalents to the specific aspects of the present application. Such equivalents are intended to be encompassed within the scope of the present application.
An aspect of the present disclosure relates to a method of screening a regulatory element for enhancing RNA stability and/or mRNA translation. The screened regulatory element may enhance RNA stability and/or mRNA translation, thereby increasing the expression of a target protein.
Specifically, the method may be a method of screening a regulatory element for enhancing RNA stability and/or mRNA translation and include:
The viral genomes used in the present application may be obtained from known databases (e.g., NCBI).
The tiling in process (a) may be a method used in the art to analyze genomic characteristics, which involves dividing the genomic sequence into segments of a certain size (sliding window) to generate a plurality of segments, wherein the window is shifted by a specific displacement (shift) size from the first position of the previous segment to create each subsequent segment. For example, the size of the sliding window may be 100 nt to 500 nt, and the displacement may be 1 nt to 500 nt, but these are not limited thereto. The sizes of the sliding window and the displacement may be appropriately selected by those skilled in the art.
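The tiling described above can be sketched as follows. This is a minimal illustration, not the inventors' implementation: the function name `tile_sequence` is hypothetical, the defaults of 130 nt and 65 nt are the window and shift reported later in the examples, and keeping a shorter final segment at the end of the sequence is an assumption.

```python
def tile_sequence(seq, window=130, shift=65):
    """Split `seq` into overlapping segments.

    Each segment is `window` nt long and starts `shift` nt after the
    start of the previous one. Returns (start_index, segment) tuples.
    Assumption: the last segment is kept even if shorter than `window`.
    """
    if window <= 0 or shift <= 0:
        raise ValueError("window and shift must be positive")
    segments = []
    start = 0
    while start < len(seq):
        segments.append((start, seq[start:start + window]))
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(seq):
            break
        start += shift
    return segments


# Toy 300-nt sequence (not a real viral genome).
toy_genome = "ACGT" * 75
tiles = tile_sequence(toy_genome)
starts = [start for start, _ in tiles]
```

With a 300-nt input, the segments start at positions 0, 65, 130, and 195, and the final segment is truncated to 105 nt.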
One or more barcode sequences may be added to the plurality of segments. Specifically, by adding one barcode sequence from each of two or more different types downstream of a single segment, two or more oligonucleotides may be generated per segment. That is, in the present disclosure, the plurality of oligonucleotides may include one or more barcode sequences.
In process (b), the plurality of oligonucleotides may be individually introduced into a vector, thereby producing a plurality of vectors (i.e., a pool of vectors). At this stage, the oligonucleotides may be introduced into the 3′ UTR of the reporter gene within the vector.
In the present disclosure, the reporter may be luciferase, a fluorescent protein, β-galactosidase, chloramphenicol acetyltransferase, or aequorin, but is not limited thereto.
In the present disclosure, methods for introducing vectors into cells encompass any method of introducing nucleic acids into cells (e.g., transfection or transformation) and may be performed by selecting appropriate standard techniques known in the art depending on the cell type. For example, methods such as electroporation, calcium phosphate (CaPO4) precipitation, calcium chloride (CaCl2) precipitation, microinjection, polyethylene glycol (PEG) method, DEAE-dextran method, cationic liposome method, and lithium acetate-DMSO method may be used, without being limited thereto.
In process (d), a method of isolating and fractionating polysomes from the cell into which a vector has been introduced may be performed by selecting an appropriate standard technique known in the art.
In an embodiment, process (d) may include lysing the cell into which a vector has been introduced, and fractionating polysomes by centrifugation into free mRNA, monosome, LP, MP, and HP, but is not limited thereto.
Additionally, RNA may be extracted from each of the free mRNA, monosome, LP, MP, and HP fractions and sequenced to obtain read values for each fraction, and based on the obtained read values, the values of Equation (1) and MRL may be determined for each oligonucleotide (i.e., each segment of the viral genome).
An oligonucleotide for which the calculated value of Equation (1) exceeds 0.2 and the value of the MRL exceeds 4.5 may be selected as a regulatory element for enhancing mRNA translation.
In addition, if the regulatory element for enhancing mRNA translation of the present disclosure also meets the condition that the value of Equation (2) exceeds 0.5, the regulatory element may further enhance RNA stability:
In this case, the RNA/DNA ratio refers to the ratio of RNA and DNA isolated and/or sequenced from the cell into which a vector has been introduced in the process (d) (for example, a sequencing read ratio).
Under these circumstances, it is possible to screen for a regulatory element that enhances both RNA stability and mRNA translation. Specifically, the screening method may further include: (d)′ isolating DNA and RNA from the cell into which a vector has been introduced in process (c), and calculating the value of Equation (2) for each oligonucleotide; and (e)′ selecting, as a regulatory element for enhancing RNA stability, an oligonucleotide for which the value of Equation (2) exceeds 0.5. At this stage, processes (d)′ and (e)′ may be performed simultaneously with processes (d) and (e), respectively, or may be performed as processes separate from processes (d) and (e).
Additionally, in an embodiment, process (d)′ may include extracting and isolating DNA and RNA and/or treating the isolated RNA with DNase I to remove vector DNA, but is not limited thereto.
Additionally, based on the isolated DNA and RNA, the value of Equation (2) may be determined for each oligonucleotide (i.e., for each segment of the viral genome). For example, after reverse-transcribing the isolated RNA to obtain cDNA, amplifying the DNA, cDNA, and the original vector pool by PCR, and then performing sequencing, the value of Equation (2) for each oligonucleotide may be determined, but is not limited thereto.
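The selection criteria stated above (Equation (1) above 0.2, MRL above 4.5, and optionally Equation (2) above 0.5) can be applied as a simple filter once the per-oligonucleotide values are available. The sketch below assumes those values have already been computed elsewhere; it does not reproduce Equation (1), Equation (2), or the MRL calculation, and the names `OligoScore` and `select_elements` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OligoScore:
    oligo_id: str
    eq1: float                   # precomputed value of Equation (1)
    mrl: float                   # precomputed MRL
    eq2: Optional[float] = None  # precomputed value of Equation (2), if measured


def select_elements(
    scores: List[OligoScore],
    eq1_min: float = 0.2,
    mrl_min: float = 4.5,
    eq2_min: float = 0.5,
) -> Tuple[List[OligoScore], List[OligoScore]]:
    """Return (translation enhancers, enhancers that also pass Equation (2)).

    Thresholds are strict ("exceeds"), as stated in the text.
    """
    translation = [s for s in scores if s.eq1 > eq1_min and s.mrl > mrl_min]
    both = [s for s in translation if s.eq2 is not None and s.eq2 > eq2_min]
    return translation, both


# Made-up values for illustration only.
example = [
    OligoScore("seg_A", eq1=0.35, mrl=5.1, eq2=0.8),
    OligoScore("seg_B", eq1=0.25, mrl=4.6, eq2=0.1),
    OligoScore("seg_C", eq1=0.20, mrl=6.0, eq2=0.9),  # eq1 does not exceed 0.2
]
translation_hits, dual_hits = select_elements(example)
```

In this toy input, `seg_A` and `seg_B` pass the translation criteria, and only `seg_A` also passes the RNA stability criterion.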
Another aspect of the present disclosure relates to a regulatory element for enhancing mRNA translation that has been screened by the aforementioned screening method. This regulatory element may additionally enhance RNA stability and may enhance protein expression by enhancing RNA stability and/or mRNA translation.
Specifically, the regulatory element for enhancing mRNA translation may be a regulatory element for which the value of Equation (1) exceeds 0.2 and the MRL exceeds 4.5, but is not limited thereto. For example, the value of Equation (1) and the MRL for the regulatory element may be obtained through a method including:
In an embodiment, the regulatory element for enhancing mRNA translation further meets the condition that the value of Equation (2) exceeds 0.5, and may thereby further enhance RNA stability, but is not limited thereto. In this case, the value of Equation (2) may be obtained through a method including:
In an embodiment, the regulatory element of the present disclosure may include (i) a nucleotide sequence of any one of SEQ ID NOs: 20 and 79 to 93 (K5, K1-K4, K6-K16) or an RNA nucleotide sequence thereof; or (ii) a nucleotide sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% homology or identity thereto, but is not limited thereto.
In an embodiment, the regulatory element of the present disclosure may include: (i) the nucleotide sequence of a segment of the Saffold virus genome (NCBI Reference Sequence: NC_009448.2) or an RNA nucleotide sequence thereof; (ii) a nucleotide sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% homology or identity thereto; or (iii) a homolog thereof.
In the present disclosure, the segment may include more than 120 and up to 190, 130 to 180, 130, or 180 consecutive nucleotides in the 5′ direction from the nucleotide at position 8060 within the Saffold virus genome, but is not limited thereto. For example, the segment may consist of the nucleotide sequence of SEQ ID NO: 82 (K4).
Additionally, the homolog may include a nucleotide sequence within the 3′ UTR of a cardiovirus genus and having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% homology or identity to nucleotides 7952 to 7988 of the Saffold virus genome. For example, the homolog may include the nucleotide sequence of SEQ ID NO: 187 or an RNA nucleotide sequence thereof; or a nucleotide sequence having at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% homology or identity thereto, but is not limited thereto.
The nucleotide sequence within the 3′ UTR of a cardiovirus genus may be obtained from known databases (e.g., NCBI, etc.).
However, in the present disclosure, even when a ‘regulatory element comprising/including the nucleotide sequence of a specific sequence number’ or a ‘regulatory element having the nucleotide sequence of a specific sequence number’ is described, it is apparent that if regulatory elements, in which some sequences are deleted, modified, substituted, or added with respect to the nucleotide sequence of the specific sequence number, possess the same or equivalent function as the regulatory element with the specific sequence number, they can also be used in this application.
For example, it is apparent that if regulatory elements with non-functional sequences added to the internal or terminal regions of a sequence of the regulatory element with the specific sequence number, or with some sequences deleted from the internal or terminal regions of the sequence of the regulatory element with the specific sequence number, have the same or equivalent function as the regulatory element with the specific sequence number, they also fall within the scope of this application.
Homology and identity refer to the degree of relatedness between two given nucleotide sequences and can be expressed as a percentage. The terms homology and identity can often be used interchangeably.
Whether any two sequences have homology, similarity, or identity can be determined, for example, by using known computer algorithms such as the “FASTA” program with default parameters, as in Pearson et al. (1988) [Proc. Natl. Acad. Sci. USA 85: 2444]. Alternatively, such determination can be made using the Needleman-Wunsch algorithm, as performed by the Needleman program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277) (version 5.0.0 or later), or other tools such as the GCG program package (Devereux et al., Nucleic Acids Research 12: 387 (1984)), BLASTP, BLASTN, FASTA (Altschul et al., J. Mol. Biol. 215: 403 (1990); Guide to Huge Computers, Martin J. Bishop, Ed., Academic Press, San Diego, 1994; and Carillo et al., SIAM J. Applied Math 48: 1073 (1988)). For example, homology, similarity, or identity of sequences can be determined using BLAST from the National Center for Biotechnology Information, or ClustalW.
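As a rough illustration of how a percent identity can be derived from a Needleman-Wunsch global alignment, consider the following sketch. It is not the EMBOSS implementation: the scoring values (match +1, mismatch -1, linear gap -1) and the choice of dividing by the alignment length are assumptions made for simplicity, so results may differ from those of the tools named above.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of the alignment length."""
    aligned_a, aligned_b = global_align(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)
```

For example, `percent_identity("ACGT", "ACGA")` evaluates to 75.0 under these scoring assumptions, since three of four aligned positions are identical.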
In addition, the nucleic acid sequence described in (ii) may include a sequence of any one of SEQ ID NO: 20 and SEQ ID NOs: 79 to 93, incorporating one or more substitutions, deletions, or a combination thereof, or an RNA nucleotide sequence thereof, but is not limited thereto. For example, the altered nucleotide may be one or more nucleotides among nucleotides 1 through 14.
The regulatory element of the present disclosure, by interacting with TENT4, may induce poly(A) tail elongation, poly(A) tail stability increase via mixed tailing, or both.
Another aspect of the present disclosure relates to a construct including a gene of a target protein and the regulatory element of the present disclosure, preferably located in a 3′ UTR of the gene. In detail, the construct may be a DNA construct or an mRNA construct.
In the present disclosure, the target protein is not limited as long as RNA stability and/or mRNA translation can be enhanced by the regulatory element of the present disclosure, but may be selected from a reporter, a bioactive peptide, an antigen, or an antibody or a fragment thereof.
In the present disclosure, the bioactive polypeptide may be selected from a hormone, a cytokine, a cytokine-binding protein, an enzyme, a growth factor, or an insulin, but is not limited thereto.
In the present disclosure, the antigen may be selected from a vaccine antigen, a tumor-associated antigen, or an allergy antigen, but is not limited thereto.
In an embodiment, the construct of the present disclosure may further include one or more barcode sequences, forward adapter sequences, reverse adapter sequences, poly(A) tail sequences, or a combination thereof, but is not limited thereto.
In an embodiment, the construct of the present disclosure may further include a promoter sequence, wherein the target protein may be operably linked to the promoter sequence, but is not limited thereto.
In an embodiment, the construct of the present disclosure may further include 5′ terminal repeat sequences and 3′ terminal repeat sequences from a virus selected from the group consisting of adeno-associated virus, adenovirus, alphavirus, retrovirus (e.g., gamma retrovirus and lentivirus), parvovirus, herpesvirus, and SV40, but is not limited thereto.
In an embodiment, the mRNA construct of the present disclosure may further include a 5′ UTR, a 3′ UTR, a poly(A) tail sequence, or a combination thereof, but is not limited thereto.
Another aspect of the present disclosure relates to a vector including the construct or a pool of the vector.
In the present disclosure, the term “vector” refers to a genetic construct containing a nucleotide sequence that encodes a target protein operably linked to appropriate regulatory sequences, enabling the expression of the target protein in a suitable host. The regulatory sequences may include a promoter capable of initiating transcription, any operator sequences for regulating such transcription, a sequence encoding an appropriate mRNA ribosome-binding site, and a sequence regulating the termination of transcription and translation, but are not limited thereto. The vector, once introduced into an appropriate host cell, may be replicated or function independently of the host genome, or may be integrated into the genome itself.
In the present disclosure, the vector is not particularly limited as long as it can be expressed in a host cell, and may be introduced into a host cell using any vector known in the art. Examples of commonly used vectors include a plasmid, a cosmid, a virus, and a bacteriophage, whether in their natural states or recombinant forms.
In addition, the term “operably linked” as used herein means that a promoter sequence that initiates and mediates the transcription of a gene encoding a target protein is functionally linked to the sequence of the gene.
Another aspect of the present disclosure relates to a recombinant host cell including the construct or vector.
In the present disclosure, the host cell includes any cell capable of expressing a target protein and encompasses cells that have undergone a natural or artificial genetic modification. In addition, the host cell includes eukaryotic and prokaryotic cells and may specifically be a eukaryotic cell or a cell derived from a mammal (e.g., human), but is not limited thereto.
Another aspect of the present disclosure relates to a composition including the construct, vector, or recombinant host cell. In the present disclosure, the construct, vector, recombinant host cell, or a composition including the same may express a target protein in vitro, in vivo, or ex vivo.
In an embodiment, the composition, when administered to an individual, may provide a target protein to the individual by the construct, vector, or recombinant host cell, and depending on the use of the target protein provided, may exhibit a preventative or therapeutic effect for a disease (e.g., infectious disease). Therefore, the composition may be a pharmaceutical composition but is not limited thereto.
In addition, in an embodiment, using the construct, vector, or recombinant host cell, the mRNA construct or target protein of the present disclosure may be prepared in vitro or ex vivo. Therefore, the composition may be a composition for preparing the mRNA construct or target protein of the present disclosure, but is not limited thereto.
For example, if the target protein is a vaccine antigen, the construct, vector, recombinant host cell, or the composition itself may be used as a vaccine, or may be used to prepare a vaccine antigen.
In an embodiment, the construct or vector of the present disclosure may further include a gene encoding TENT4, or the recombinant host cell or composition of the present disclosure may further include TENT4, a gene encoding the same, or a combination thereof, so as to induce poly(A) tail elongation, poly(A) tail stability increase, or both, through interactions with TENT4, thereby enhancing RNA stability or mRNA translation, but are not limited thereto.
Another aspect of the present disclosure relates to a composition including TENT4 interacting with the regulatory element, or a gene encoding the same.
The TENT4 may induce poly(A) tail elongation, poly(A) tail stability increase via mixed tailing, or both, through interactions with the regulatory element, thereby enhancing RNA stability or mRNA translation. Therefore, the composition may increase the expression of the target protein of the present disclosure in vitro, in vivo, or ex vivo.
In an embodiment, to express the target protein, the composition may further include the construct, vector, and/or recombinant host cell of the present disclosure, or TENT4 or a gene encoding the same may be included in the construct, vector, and/or recombinant host cell of the present disclosure.
In an embodiment, depending on the use of a target protein whose in vivo expression is enhanced by the composition, the composition may exhibit a preventative or therapeutic effect for a disease. Therefore, the composition may be a pharmaceutical composition but is not limited thereto.
Further, in an embodiment, the composition may be used to prepare the mRNA construct or target protein of the present disclosure in vitro or ex vivo. Therefore, the composition may be a composition for preparing the mRNA or target protein of the present disclosure, but is not limited thereto.
For example, if the target protein is a vaccine antigen, the composition may increase the expression of the vaccine antigen in vivo, allowing the composition to be used as a vaccine composition, or the composition may be used to produce a vaccine antigen in vitro or ex vivo.
Another aspect of the present disclosure relates to a method for preparing a target protein, the method including: culturing the recombinant host cell; and recovering the target protein.
In the present disclosure, the method of preparing a target protein by using the recombinant host cell may be carried out using a method widely known in the art. In detail, the culturing may be carried out as a batch process, a continuous process, a fed-batch process, or a repeated fed-batch process, but is not limited thereto. The medium used for culturing may be appropriately selected by a person skilled in the art, depending on the host cell. In detail, the recombinant host cell of the present disclosure may be cultured under aerobic or anaerobic conditions in a conventional medium containing an appropriate carbon source, nitrogen source, phosphorus source, inorganic compound, amino acid, and/or vitamin, with adjustments to temperature, pH, and the like.
The method of preparing a target protein may further include an additional process after the culturing. The additional process may be appropriately selected depending on the use of the target protein.
In detail, the method of preparing a target protein may include, after the culturing: recovering the target protein from one or more materials selected from the recombinant host cell, a dried material of the recombinant host cell, an extract of the recombinant host cell, a culture of the recombinant host cell, a supernatant of the culture, or a lysate of the recombinant host cell.
The method may further include lysing the recombinant host cell prior to or simultaneously with the recovering. The lysis of the recombinant host cell may be carried out by a method commonly used in the technical field to which the present disclosure pertains, such as lysis buffer, sonication, heat treatment, or French press. In addition, the lysing may include an enzymatic reaction, which involves cell wall/cell membrane degrading enzymes, nucleases, nucleic acid transferases, and/or proteases, etc., but is not limited thereto.
In the present disclosure, dried material of the recombinant host cell may be prepared by drying cells that have accumulated a target substance, but is not limited thereto.
In the present disclosure, extract of the recombinant host cell may refer to a remaining substance after separating the cell wall/cell membrane from the cell. In detail, the extract of the recombinant host cell may refer to the components obtained by lysing the cell, excluding the cell wall/cell membrane. The cell extract contains the target protein and may also contain, other than the target protein, one or more components from proteins, carbohydrates, nucleic acids, and fibers from the cell, but is not limited thereto.
In the present disclosure, the recovering may recover the target protein using an appropriate method known in the art (e.g., centrifugation, filtration, anion exchange chromatography, crystallization, and HPLC).
In the present disclosure, the recovering may include a purification process. The purification process may involve isolating only the target protein from the cell and purifying the target protein. Through the purification process, the purified target protein may be prepared.
Another aspect of the present disclosure relates to a method of preparing an mRNA construct, the method including: in vitro transcribing the construct or vector; and recovering a transcribed mRNA construct.
The transcription and recovery methods may employ suitable methods known in the art.
In an embodiment, the method may further include treating with DNase I after transcription to remove the DNA of the construct or vector used as a template; and/or washing, but is not limited thereto.
Another aspect of the present disclosure relates to a use of the construct, vector, recombinant host cell, or composition for enhancing RNA stability and/or mRNA translation.
Another aspect of the present disclosure relates to a use of the construct, vector, recombinant host cell, or composition for preventing or treating a disease.
Another aspect of the present disclosure relates to a use of the construct, vector, recombinant host cell, or composition for preparing a target protein.
Mode for Disclosure
Hereinbelow, the present invention will be described in greater detail with reference to experimental examples and examples. These examples are provided only to illustrate the present invention and therefore should not be construed as limiting the scope of the present invention.
All cell lines used in the present disclosure tested mycoplasma-negative. HeLa cells (gift from C.-H. Chung at Seoul National University and authenticated by ATCC (STR profiling)), Lenti-X 293T cells (Clontech, 632180), and 293AAV cells (Cell Biolabs, AAV-100) were cultured in DMEM containing 10% FBS (Welgene, S001-01). HCT116 cells (ATCC, CCL-247) were cultured in McCoy's 5A (Welgene, LM 005-01) containing 10% FBS.
Genomic sequences of viruses that can infect humans were retrieved from the NCBI Virus Genome Browser (retrieved 2020-01-10; 804 sequences, 504 viruses). Additional information on each virus was retrieved from the corresponding GenBank file in NCBI Nucleotide. Based on sequence similarity and virus classification, 143 representative viral species were selected, and woodchuck hepatitis virus was added as a control. For RNA viruses, the whole genome in positive-sense orientation was used for tiling. For DNA viruses, the 3′ UTR sequences of coding transcripts and the whole sequences of non-coding RNAs were used for oligo design. If the UTR was not annotated, it was predicted based on the poly(A) signal (PAS) annotation. If the PAS was not annotated, it was predicted using Dragon PolyA Spotter ver. 1.2 within 800 bp of the stop codon. If the PAS could not be predicted, the 390-bp region downstream of the stop codon was taken for tiling. After the genomic region for tiling was determined, oligos were designed with 130-nt sliding windows and a 65-nt shift. When a window contained a SacI or NotI restriction site, which was later used for cloning, the window was made to end at the restriction site, thereby creating a shorter segment. The next segment started at the restriction site, thereby preventing cleavage of the segment by SacI or NotI during plasmid construction. Thus, the screen may miss some viral elements that contain these restriction site sequences. The design may also miss some elements that are longer than 65 nt. For instance, an element of 100 nt has an approximately 50% probability of being missed.
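The stated miss probability for a 100-nt element can be checked with a short calculation. The sketch below assumes a simplified model that is not described in the source: the element starts at a uniformly random position relative to the tiling grid, lies far from the ends of the tiled region, and counts as captured only if it fits entirely within a single window. The function name `fraction_fully_covered` is hypothetical.

```python
def fraction_fully_covered(element_len, window=130, shift=65):
    """Fraction of grid offsets at which an element fits inside one window.

    `offset` is the element's start position relative to the start of the
    nearest window beginning at or before it (0 <= offset < shift). That
    window extends furthest to the right among candidate windows, so the
    element is fully contained iff offset + element_len <= window.
    """
    if element_len > window:
        return 0.0
    covered = sum(1 for offset in range(shift) if offset + element_len <= window)
    return covered / shift


covered_100 = fraction_fully_covered(100)  # 31 of 65 offsets
missed_100 = 1.0 - covered_100
```

Under this model a 100-nt element is fully contained for 31 of 65 offsets, giving a miss probability of about 52%, which is in line with the approximately 50% stated above, while elements of 65 nt or shorter are always contained in at least one window.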
Three barcodes of 7-bp random sequences with a Hamming distance of at least 3 were added to each oligo sequence. As controls, the 1E segments and their stem-loop mutants were added to the library. In addition, human hepatitis B virus PRE and its corresponding stem-loop mutants were included as controls. Positive and negative controls were tiled separately. In total, 30,367 segments and 91,101 oligos were designed.
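One way to draw random 7-nt barcodes that differ from one another at three or more positions is rejection sampling, sketched below. The source does not state how the barcodes were generated, so this procedure, the function names, and the fixed random seed are all assumptions. Because only 4^7 = 16,384 distinct 7-nt sequences exist, fewer than the 91,101 oligos, the sketch further assumes that the distance constraint applies among the barcodes assigned to the same segment.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def make_barcodes(n=3, length=7, min_dist=3, seed=0, max_tries=100_000):
    """Draw `n` random barcodes with pairwise Hamming distance >= `min_dist`."""
    rng = random.Random(seed)  # fixed seed only to make the example reproducible
    barcodes = []
    tries = 0
    while len(barcodes) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not find enough barcodes")
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        # Keep the candidate only if it is far enough from all accepted barcodes.
        if all(hamming(candidate, b) >= min_dist for b in barcodes):
            barcodes.append(candidate)
    return barcodes


segment_barcodes = make_barcodes()
```

Each call returns three barcodes for one segment; a different seed per segment would be needed to vary the barcodes across the library.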
For the secondary screening, five classes of K5 mutants were designed. (1) For single-nucleotide substitution, the base at each position was converted into each of the other three bases throughout K5. (2) For single-nucleotide deletion, the base at each position was removed. (3) For two-nucleotide deletion, two consecutive nucleotides were deleted at every position. (4) To examine the significance of base-pairing, the secondary structure was predicted using six different RNA secondary structure prediction programs, and 38 predicted base pairs were collected and mutated (AT/TA/GC/CG/GU/UG/del) in a way that preserved the base pair. (5) Two bases randomly selected in predicted loops were mutated to create different combinations. In addition, the homologs of K5 were screened by including 88 homologous elements from other picornaviruses (including 45 from the genus Kobuvirus). When the homology was ambiguous, the 3′-most 130 nt were used for oligo design. In total, the library for the secondary screening included 1,288 elements with 3 barcodes each, generating a total of 3,864 oligos.
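Mutant classes (1) to (3) can be enumerated mechanically, as in the sketch below. The function names and the variant labels are hypothetical, and a toy sequence is used because the K5 sequence is not reproduced here. Classes (4) and (5) depend on predicted secondary structures and are not covered.

```python
def single_substitutions(seq):
    """Class (1): replace each base with each of the other three bases."""
    variants = []
    for i, base in enumerate(seq):
        for alt in "ACGT":
            if alt != base:
                label = f"sub_{i + 1}{base}>{alt}"  # 1-based position
                variants.append((label, seq[:i] + alt + seq[i + 1:]))
    return variants


def deletions(seq, size):
    """Classes (2) and (3): delete `size` consecutive bases at every position."""
    return [
        (f"del_{i + 1}_{i + size}", seq[:i] + seq[i + size:])
        for i in range(len(seq) - size + 1)
    ]


toy = "ACGTA"  # placeholder, not the K5 sequence
subs = single_substitutions(toy)
del1 = deletions(toy, 1)
del2 = deletions(toy, 2)
```

A sequence of length n yields 3n substitutions, n single-nucleotide deletions, and n-1 two-nucleotide deletions. Deletions within a run of identical bases produce identical sequences, so a real design would likely deduplicate them.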
Oligos of 170 nt in length (containing the forward adaptor sequence of 16 nt, the reverse adaptor sequence of 17 nt, and the barcode sequence of 7 nt) were synthesized by Synbio Technologies. NotI and SacI restriction sites were added by 6 cycles of PCR using Q5 High-Fidelity 2× Master Mix (NEB, M0492) and primers Sac-univ-F and NotI-univ-R. The amplified product was purified on a 6% native PAGE gel with SYBR Gold (Invitrogen, S11494) staining. The purified amplified product and the pmirGLO-3XmiR-1 vector were digested with SacI-HF (NEB, R3156S) and NotI-HF (NEB, R3189S), and the product was cloned into the 3′ UTR of the firefly luciferase gene using T4 DNA ligase (NEB, M0202M). The ligation product was purified with the Zymo Oligo Clean & Concentrator kit (Zymo Research, #D4061) and transformed into Endura ElectroCompetent cells (Lucigen, LU60242-2). Transformed bacteria were recovered at 37° C. for 1 hour and then cultured with shaking at 30° C. for 14 hours. The colony count was confirmed to be approximately 1E7. The primer sequences used are provided in Table 1.
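The oligo length follows from its parts: a 130-nt segment, a 16-nt forward adaptor, a 17-nt reverse adaptor, and a 7-nt barcode sum to 170 nt. The sketch below assembles a placeholder oligo to make this explicit. The adaptor sequences are not given in this section, so runs of `N` stand in for them, and the element order (forward adaptor, segment, barcode, reverse adaptor) is an assumption based on the barcode being added downstream of the segment.

```python
FWD_ADAPTOR = "N" * 16  # placeholder; actual adaptor sequence not given here
REV_ADAPTOR = "N" * 17  # placeholder; actual adaptor sequence not given here


def assemble_oligo(segment, barcode, fwd=FWD_ADAPTOR, rev=REV_ADAPTOR):
    """Concatenate adaptors, segment, and barcode (assumed order)."""
    return fwd + segment + barcode + rev


toy_segment = "ACGTACGTAC" * 13  # 130-nt toy segment
toy_barcode = "ACGTTGA"          # 7-nt toy barcode
oligo = assemble_oligo(toy_segment, toy_barcode)
oligo_length = len(oligo)
```

Segments shortened at a restriction site would give correspondingly shorter oligos under this layout.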
For RNA stability screening, 4E5 HCT116 cells were seeded one day before transfection. 1.5 μg of the plasmid pool was transfected using Lipofectamine 3000 (Invitrogen, L3000001) and P3000 reagent. RNA and DNA were extracted 48 hours post-transfection using the Allprep RNA/DNA Mini Kit (Qiagen, 80004), and RNA was treated with Recombinant DNase I (RNase-free) (TAKARA, 2270A) to remove remaining plasmid DNA. RNAs were reverse-transcribed using SSIV reverse transcriptase (Invitrogen, 18090010). The extracted DNA, cDNA obtained from RNA, and the original plasmid pool were amplified by 14 cycles of PCR, using mixed primers MPRAlib_N/NN/NNN_F and MPRAlib_N/NN/NNN_R (Table 1). A second PCR of 6 cycles was performed using Illumina index primers. The PCR amplicons were sequenced by next-generation sequencing on the Illumina Novaseq 6000 platform.
For nuclear/cytoplasmic fractionation screening, the cytoplasm was obtained using cytosol lysis buffer (0.15 μg/μl digitonin [Merck, D141], 150 mM NaCl, 50 mM HEPES [pH 7.0-7.6], 20 U/ml RNase inhibitor [Ambion, AM2696], 1× protease inhibitor [Calbiochem, 535140], 1× phosphatase inhibitor [Merck, P0044]). The library preparation steps were performed in the same manner as the RNA stability screening.
For polysome fractionation screening, a 10-50% sucrose gradient was prepared using Gradient Master™ (Biocomp, B108-2). HCT116 cells, at three times the scale of RNA stability screening, were treated with 100 μg/ml cycloheximide for 1 minute at 37° C., then lysed with 150 μl of PEB (20 mM Tris-Cl pH 7.5, 100 mM KCl, 5 mM MgCl2, 0.5% NP-40 [Merck, 74385]) containing 100 U/ml RNase inhibitor, 1× protease inhibitor, and 1× phosphatase inhibitor on ice for 10 minutes, and then centrifuged. The supernatant was layered onto the sucrose gradient and centrifuged at 36,000 rpm for 2 hours at 4° C. using an SW41Ti rotor and a Beckman Coulter Ultracentrifuge Optima XE. Samples were collected in 0.25 ml fractions using a Biologic LP system coupled with a Model 2110 fraction collector (Bio-Rad, 7318303) and a Model EM-1 Econo UV detector (Bio-Rad). 0.75 ml of TRIzol™ LS Reagent (Life Technologies) was immediately added to each fraction. Free mRNA, monosome, light polysome (LP; 2-3 ribosomes), medium polysome (MP; 4-8 ribosomes), and heavy polysome (HP; 9 or more ribosomes) were separated based on the 254 nm absorbance trend and extracted using the Direct-Zol RNA Miniprep kit (Zymo Research, R2052).
The subsequent library preparation steps were performed in the same manner as the RNA stability screening. The sequencing data are available in the Zenodo database under the following DOI identifiers: https://doi.org/10.5281/zenodo.6777910 (Stability), https://doi.org/10.5281/zenodo.6717932 (Polysome), https://doi.org/10.5281/zenodo.6696870 (Secondary screening), https://doi.org/10.5281/zenodo.7773943 (Nuclear/cytoplasmic fractionation).
For all samples, reads were aligned to oligos using Bowtie 2 (v2.2.6) with the --local parameter. Aligned reads were filtered to ensure a strict, unique match to the barcode. Statistical tests were performed with MPRAnalyze using the mpralm function. Technical performance was assessed using the Spearman correlation coefficient from the scipy module and histogram plots. Normalized counts were used for visualization. For polysome analysis, after variance stabilizing transformation using DESeq2, the relative distance of each fraction was calculated by subtracting the mean of the five fractions. The relative distance of each fraction was used to perform hierarchical clustering with the scipy module. As an additional measure of translation, Mean Ribosome Load (MRL) was calculated as follows:
For mRNA stability cutoff, Log2FC<−1 and adjusted p-value<0.001 were used for negatively regulated elements, and Log2FC>0.5 and adjusted p-value<0.05 were used for positively regulated elements. Log2(heavy polysome/free mRNA)>0.2 and/or MRL >4.5 were used for the translational activating element cutoff, and Log2(heavy polysome/free mRNA)<−0.2 and/or MRL<3.5 were used for the translational downregulating element cutoff.
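The cutoffs above can be expressed as simple classification rules. The following Python sketch is illustrative only; the function names and category labels are hypothetical, the "and/or" in the translational cutoffs is read here as a logical OR (an assumption), and the "ambiguous" label for elements that meet both translational criteria is likewise an assumption not stated in the disclosure.

```python
def classify_stability(log2fc, padj):
    """Classify an element by its effect on mRNA abundance.

    log2fc: Log2 fold change; padj: adjusted p-value.
    """
    if log2fc > 0.5 and padj < 0.05:
        return "positive"
    if log2fc < -1 and padj < 0.001:
        return "negative"
    return "neutral"


def classify_translation(log2_hp_over_free, mrl):
    """Classify an element by its effect on translation.

    log2_hp_over_free: Log2(heavy polysome/free mRNA); mrl: Mean Ribosome Load.
    """
    activating = log2_hp_over_free > 0.2 or mrl > 4.5
    downregulating = log2_hp_over_free < -0.2 or mrl < 3.5
    if activating and downregulating:
        return "ambiguous"  # assumption: conflicting criteria are flagged
    if activating:
        return "activating"
    if downregulating:
        return "downregulating"
    return "neutral"
```

Note that the thresholds are asymmetric for mRNA stability: the negative class requires both a larger effect size and a stricter adjusted p-value than the positive class.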
For the second screening substitution data, the base-identity score of substitution and deletion was calculated as follows:
The base-pairing score for substitution data was calculated as follows.
The pair-deletion score for deletion data was calculated as follows.
For the tree construction of picornaviruses, virus sequences retrieved from NCBI were aligned using ClustalOmega and visualized using FigTree v1.4.4. The conservation score was calculated as the number of identical nucleotides with the K5 element after multiple sequence alignment across the top 33 species. For RNA structure visualization, the structure was predicted using RNAfold and visualized using forna.
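One way to read the conservation score is as a per-position count of aligned sequences that carry the same nucleotide as K5. The Python sketch below follows that reading, which is an assumption; the function name, the gap character, and the toy alignment are hypothetical and are not taken from the actual alignment.

```python
def conservation_scores(reference, homologs, gap="-"):
    """Count, for each reference position, the homologs with an identical base.

    reference: aligned reference sequence (e.g. the K5 row of the alignment).
    homologs: aligned homolog sequences, all of the same length as reference.
    Alignment columns where the reference has a gap are skipped.
    """
    if any(len(h) != len(reference) for h in homologs):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col, ref_base in enumerate(reference):
        if ref_base == gap:
            continue
        scores.append(sum(1 for h in homologs if h[col] == ref_base))
    return scores


if __name__ == "__main__":
    # Toy alignment for illustration only
    print(conservation_scores("AC-GU", ["ACAGU", "AG-GA", "UC-GU"]))
```

With 33 aligned species, the score at each position would range from 0 (no homolog matches K5) to 33 (fully conserved), if K5 itself is excluded from the homolog list.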
For the validation experiments, the selected elements were PCR-amplified from the plasmid library pool and cloned into the 3′ UTR of the firefly luciferase gene in the pmirGLO-3XmiR-1 vector. For the luciferase constructs, the K5 element (8122-8251: NC_001918.1) was amplified from the plasmid pool library, and an additional 55 bp and 110 bp were added by PCR amplification to create the eK5 element (8067-8251: NC_001918.1) and the full UTR (8012-8251: NC_001918.1), respectively. The 120-K5 element (8132-8251: NC_001918.1), 110-K5 element (8142-8251: NC_001918.1), and K5m element (8122-8251, 8185ΔG: NC_001918.1) were amplified from the pmirGLO-3XmiR-1 K5 plasmid, and the eK5m element (8067-8251, 8185ΔG: NC_001918.1) was amplified from the pmirGLO-3XmiR-1 eK5 plasmid. The K4 element (7931-8060: NC_009448.2) was amplified from the plasmid pool library, and an additional 50 bp was added by PCR amplification to make the eK4 element (7881-8060: NC_009448.2). The 1E element (414-463: RNA2.7) was amplified from the pmirGLO-3XmiR-1 1E vector.
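The element lengths implied by these genomic coordinates can be checked with simple arithmetic. The sketch below is only a sanity check using the coordinates stated above; the dictionary and function names are hypothetical.

```python
# 1-based, inclusive genomic coordinates as stated in the text
ELEMENTS = {
    "K5": (8122, 8251),        # NC_001918.1
    "eK5": (8067, 8251),       # NC_001918.1
    "full UTR": (8012, 8251),  # NC_001918.1
    "120-K5": (8132, 8251),    # NC_001918.1
    "110-K5": (8142, 8251),    # NC_001918.1
    "K4": (7931, 8060),        # NC_009448.2
    "eK4": (7881, 8060),       # NC_009448.2
}


def element_length(start, end):
    """Length in nucleotides of a 1-based, inclusive interval."""
    return end - start + 1


if __name__ == "__main__":
    for name, (start, end) in ELEMENTS.items():
        print(f"{name}: {element_length(start, end)} nt")
```

The resulting lengths (130 nt for K5 and K4, 185 nt for eK5, 240 nt for the full UTR, and 180 nt for eK4) agree with the 55 bp, 110 bp, and 50 bp extensions described above.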
For AAV production, the pAAV-CAG-GFP plasmid (Addgene, Plasmid #37825) was used as a template. The K5 element (8122-8251: NC_001918.1), K5m element (8122-8251, 8185ΔG: NC_001918.1), eK5 element (8067-8251: NC_001918.1), and eK5m element (8067-8251, 8185ΔG: NC_001918.1) were amplified from the pmirGLO-3XmiR-1 eK5 and eK5m plasmids and used to replace the WPRE sequence in the pAAV-CAG-GFP plasmid by Gibson assembly. For the control plasmid, the WPRE sequence in the 3′ UTR of the GFP gene in pAAV-CAG-GFP was eliminated by PCR-based amplification.
For d2EGFP plasmid construction, firefly luciferase gene from pmirGLO-3XmiR-1 vector was replaced by GBA 5′ UTR, d2EGFP CDS, and GBA 3′ UTR to make control plasmid. UTRs from luciferase constructs were amplified and inserted into this d2EGFP control vector.
For tethering and rescue constructs, pmirGLO-3xBoxB was generated from the pmirGLO-3xmir1-5xBoxB vector, and for the pGK-ZCCHC2 construct, ZCCHC2 amplified from HCT116 cDNA was subcloned into the pGK vector. Tethering constructs, including the ZCCHC2 ΔC (a.a. 1-375) and ZCCHC2 ΔN (a.a. 201-1,178) constructs, were generated by subcloning ZCCHC2 into pGK-TEV-HA-λN. To generate the zinc-finger mutant of ZCCHC2, the first and second cysteines of the zinc finger (CX2CX3GHX4C) were replaced with serine by mutagenesis PCR. For the TNRC6B C-term constructs, the C-terminal region (a.a. 716-1,028) of the TNRC6B gene was amplified from HCT116 cDNA and subcloned into the pGK and pGK-TEV-HA-λN vectors by Gibson assembly.
For RaPID experiment, EGFP CDS, 3xBoxB sequence, and eK5 sequence were amplified from d2EGFP, pmirGLO-3xBoxB, and pmirGLO-3xmir-1-eK5 plasmids, respectively, and subcloned into the pCK vector by Gibson assembly.
The list of plasmids generated by this method is shown in Table 1.
The luciferase assay was performed as follows. For the luciferase reporter assay with Lipofectamine 3000, 2E5 HeLa or HCT116 cells on a 24-well plate were transfected with 100 ng of pmirGLO-3XmiR-1 plasmid on Day 0 and harvested on Day 2. For the knockdown experiments, 100 ng of the pmirGLO-3XmiR-1 K5 plasmid and 40 nM of siRNAs (Dharmacon siRNA smartpool) were co-transfected using Lipofectamine 3000 for each target gene. For the ZCCHC2 structure experiments, 50 ng of the pmirGLO-3XmiR-1 plasmid and 60 ng of the pGK-null, pGK-ZCCHC2, or pGK-ZCCHC2 zinc-finger mutant construct were co-transfected. For the tethering experiments, 50 ng of the pmirGLO-3xBoxB plasmid and 60 ng of the pGK-ZCCHC2 wild-type or mutant constructs, with or without the λN-HA-TEV tag, were co-transfected. For the luciferase assay, cells were lysed and analyzed using the Dual-luciferase reporter assay system (Promega) according to the manufacturer's instructions.
RNA was extracted by RNeasy Mini Kit (Qiagen, 74106), treated with DNase (Qiagen, 79254), and reverse-transcribed with Primescript RTmix (Takara, RR036A). mRNA levels were measured with SYBR Green assays (Life Technologies, 4367659) and StepOnePlus Real-Time PCR System (Applied Biosystems) or QuantStudio 3 (Applied Biosystems). The list of RT-qPCR primers is shown in Table 1.
AAV generation and purification were performed as follows. 293 AAV cell lines (Cell Biolabs, #AAV-100) were cultured in DMEM with 10% FBS, 0.1 mM MEM Non-essential Amino Acids (NEAA), and 2 mM L-glutamine. To produce AAVs carrying the GFP gene, the 293 AAV cells were seeded overnight in a 150-mm petri dish, and when the confluence reached 70%, pAAV-CAG-GFP plasmid variants (Addgene, 37825) along with pAdDelta6F6 (Addgene, 112867) and pAAVDJ (Cell Biolabs, VPK-420-DJ) plasmids were co-transfected using Lipofectamine 3000 and P3000 reagent. At 72 hours after transfection, the cells were harvested and resuspended in 2.5 ml of serum-free DMEM. Cell lysis was then performed through 4 rounds of freezing/thawing (in each cycle, 30-min freezing in ethanol/dry ice and 15-min thawing in a 37° C. water bath). AAV supernatants were collected after centrifugation at 10,000×g for 10 minutes at 4° C. After purifying the AAVs using the ViraBind™ AAV Purification Kit (Cell Biolabs), viral titers were measured using the QuickTiter™ AAV Quantitation Kit (Cell Biolabs) according to the manufacturer's instructions. For transduction, HeLa cells were seeded in a 12-well plate and infected with AAV at an MOI of 2,000 or 10,000, with mock infection with PBS as a control. At 5 days after infection, the GFP signal was detected using a flow cytometer (BD Accuri C6 Plus).
For in vitro transcribed RNAs, DNA templates were prepared by PCR using a forward primer (T7 promoter+gene_specific_F) and a reverse primer (T120+gene_specific_R, with two nucleotides of 2′-O-Methylated deoxyuridine at the 5′ end). 250 ng of DNA template was in vitro transcribed using the mMESSAGE mMACHINE™ T7 Transcription Kit (Invitrogen, AM1344) and additional components (7.5 mM ATP/CTP/UTP [NEB, N0450S] each, 1.5 mM GTP, and 6 mM CleanCap® Reagent AG (3′ OMe) [TriLink Biotechnologies]). The DNA templates were removed using Recombinant DNase I (RNase-free), and the RNA was cleaned up using the RNeasy MinElute Cleanup Kit (Qiagen, 74204). The primers used for in vitro transcription template preparation are shown in Table 1.
11. Preparation and Analysis of mRNA Transfected Samples
2E5 HeLa cells on a 12-well plate were transfected with in vitro transcribed RNAs using Lipofectamine MessengerMax. For samples transfected with luciferase mRNA, the cells were lysed and analyzed with the Dual-luciferase reporter assay system according to the manufacturer's instructions. For d2EGFP samples, the cells were lysed in RIPA lysis and extraction buffer (Thermo, 89901) containing 1× protease inhibitor and 1× phosphatase inhibitor on ice for 10 minutes and then centrifuged. The samples were boiled with 5×SDS buffer and loaded on a Novex SDS-PAGE gel (10-20%) together with the ladder (Thermo, 26616). The proteins were transferred from the gel to a methanol-activated PVDF membrane (Millipore), which was then blocked with PBS-T containing 5% skim milk, probed with primary antibodies, and washed three times with PBS-T. Anti-EGFP (1:3,000, CAB4211, Invitrogen) and anti-alpha-TUBULIN (1:300, Abcam, ab52866) were used as the primary antibodies. Anti-mouse or anti-rabbit HRP-conjugated secondary antibodies (Jackson ImmunoResearch Laboratories) were incubated for 1 hour and washed 3 times with PBS-T. Chemiluminescence was conducted with West Pico or Femto Luminol reagents (Thermo), and the signals were detected with the ChemiDoc XRS+ System (Bio-Rad). For d2EGFP samples, the GFP signals were also measured with a flow cytometer (BD Accuri C6 Plus).
Hire-PAT assay and signal processing of capillary electrophoresis data were performed as described in the literature (Kim et al., Nat. Struct. Mol. Biol., 2020, 27, 581-588). Poly(A) site of the firefly luciferase gene was used as confirmed by Sanger sequencing in the referenced literature, and forward PCR primers for the poly(A) site are listed in Table 1.
To measure the poly(A) tail length distribution upon RG7834 treatment, HeLa cells were transfected with the pmirGLO-3XmiR-1 plasmid containing the K5 element in the 3′ UTR of firefly luciferase, treated with R00321 (Glixx Laboratories Inc, GLXC-11004) or RG7834 (Glixx Laboratories Inc, GLXC-221188), and harvested within two days. To compare the poly(A) tail length distribution between parental cells and ZCCHC2 knockout cells, both were prepared in the same way as the RG7834-treated sample. To perform gene-specific TAIL-seq, rRNA-depleted total RNAs (TruSeq Stranded Total RNA LP Gold, Illumina, 20020599) were ligated to the 3′ adapter and partially fragmented by RNase T1 (Ambion). After purification on a Urea-PAGE gel (300-1500 nt), the RNA was reverse transcribed and amplified by PCR. For PCR amplification of the firefly luciferase gene, GS-TAIL-seq-FireflyLuc-F was used as the forward primer. The libraries were sequenced on the Illumina platform (Miseq) using the PhiX control library v.2 (Illumina) containing a spike-in mixture, with a paired-end run (51×251 cycles). The TAIL-seq sequencing data have been deposited in the Zenodo database with the identifier DOI:10.5281/zenodo.6786179.
The TAIL-seq data were analyzed using Tailseeker v.3.1.5. For each transcript, genes were identified by mapping read 1 to the firefly luciferase construct sequence and the human transcriptome using Bowtie 2 (v2.2.6). Next, the corresponding poly(A) tail length and modifications at the 3′ end were extracted using read 2. The mixed tailing ratio was calculated from transcripts with poly(A) tails longer than 50 nt.
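The mixed tailing ratio can be illustrated with a short Python sketch. This is not the Tailseeker workflow; it assumes, for illustration only, that each tail is available as a plain sequence string and that a tail counts as mixed when a non-adenosine residue occurs within its last two positions (terminal or penultimate). The function name and the `window` parameter are hypothetical.

```python
def mixed_tailing_ratio(tails, min_length=50, window=2):
    """Fraction of eligible tails carrying a non-A residue near the 3' end.

    tails: iterable of tail sequences written 5' to 3'.
    min_length: only tails strictly longer than this are considered.
    window: number of 3'-terminal positions inspected (assumption: 2).
    Returns None when no tail passes the length filter.
    """
    eligible = [t for t in tails if len(t) > min_length]
    if not eligible:
        return None
    mixed = sum(1 for t in eligible if any(b != "A" for b in t[-window:]))
    return mixed / len(eligible)


if __name__ == "__main__":
    toy_tails = [
        "A" * 60,         # pure poly(A) tail
        "A" * 59 + "G",   # non-A at the terminal position
        "A" * 58 + "GA",  # non-A at the penultimate position
        "A" * 30 + "G",   # too short, excluded by the length filter
    ]
    print(mixed_tailing_ratio(toy_tails))
```

Restricting the calculation to tails longer than 50 nt, as stated above, excludes short tails from both the numerator and the denominator.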
TENT4 dKO cells were prepared using the same method as described in the literature by Kim et al. In addition, ZCCHC2 and ZCCHC14 knockout cell lines were also prepared according to the method described in the literature by Kim et al. HeLa cells in a 6-well plate and HCT116 cells in a 24-well plate were transfected with 300 ng of the pSpCas9(BB)-2A-GFP-px458 plasmid (Addgene #48138) containing sgRNA targeting ZCCHC2 (ACCTCAGGACGGACTTACCG, PAM sequence: TGG) and sgRNA targeting ZCCHC14 (CAAGTGGGCAGCGCGCGCCGCC [SEQ ID NO: 97], PAM sequence: CGG), respectively, using Metafectene (Biontex, T020). After single-cell screening, knockout strains were confirmed by Sanger sequencing and western blot analysis. The parental and modified genome sequences are listed in Table 1, with the inserted sequences highlighted in red.
The RaPID (RNA-protein interaction detection) assay was performed as follows. A BASU-expressing stable HeLa cell line was generated by transducing lentiviral delivery constructs produced from Lenti-X 293T (Clontech, 632180) and the BASU RaPID plasmid (Addgene #107250). 1E7 cells from a 150 mm plate were transfected with 40 μg of the RNA synthesized above, using Lipofectamine MessengerMAX (Life Technologies, LMRNA015). After 16 hours, the cells were treated with 200 μM biotin (Sigma, B4639) for 1 hour. The treated cells were lysed on ice for 10 minutes using RIPA lysis and extraction buffer (Thermo, 89901) containing 1× protease inhibitor and 1× phosphatase inhibitor, followed by centrifugation. The lysate was incubated with Pierce streptavidin beads (Thermo, 88816) at 4° C. overnight with rotation. The beads were washed three times with wash buffer 1 (1% SDS containing 1 mM DTT, protease, and phosphatase inhibitor cocktails), once with wash buffer 2 (0.1% Na-DOC, 1% Triton X-100, 0.5 M NaCl, 50 mM HEPES pH 7.5, 1 mM DTT, 1 μM EDTA containing protease and phosphatase inhibitor cocktails), and once with wash buffer 3 (0.5% Na-DOC, 150 mM NaCl, 0.5% NP-40, 10 mM Tris-HCl, 1 mM DTT, 1 μM EDTA containing protease and phosphatase inhibitor cocktails).
For western blot, proteins were eluted using Elution buffer (1.5× Laemmli sample buffer, 0.02 mM DTT, 4 mM Biotin) and analyzed by western blot using anti-ZCCHC2 (1:250, Atlas Antibodies, HPA040943), anti-TENT4A (1:500, Atlas Antibodies, HPA045487), anti-alpha-TUBULIN (1:300, Abcam, ab52866), anti-HA (1:2000, Invitrogen, 715500) primary antibodies. For LC-MS/MS analysis, the samples were washed six times with digestion buffer (50 mM Tris, pH 8.0) at 37° C. for 1 minute. After washing, the protein-bound beads were incubated at 37° C. for 1 hour in 180 μL of digestion buffer containing 2 μL of 1 M DTT, followed by the addition of 16 μL of 0.5 M IAA and further incubation at 37° C. for 1 hour. Then, 2 μL of 0.1 g/L trypsin was added, and the resulting mixture was incubated overnight at 37° C. The remaining detergents were removed using HiPPR (Thermo, 88305) and washed with ZipTip C18 resin (Millipore, ZTC18S960) prior to LC-MS/MS analysis.
LC-MS/MS analysis was carried out using an Orbitrap Eclipse Tribrid (Thermo) coupled with a nanoAcquity system (Waters). The capillary analytical column (75 μm i.d.×100 cm) and trap column (150 μm i.d.×3 cm) were packed with 3 μm Jupiter C18 particles (Phenomenex). The LC flow was set to 300 nL/min with a 60-minute linear gradient ranging from 95% solvent A (0.1% formic acid (Merck)) to 35% solvent B (100% acetonitrile, 0.1% formic acid). Full MS scans (m/z 300-1,800) were acquired at 120 k resolution (m/z 200). Higher-energy collisional dissociation (HCD) fragmentation was performed at 30% normalized collision energy (NCE) with a 1.4 Th precursor isolation window. MS2 scans were acquired at a resolution of 30 k.
MS/MS raw data were analyzed using MSFragger (v3.7), IonQuant (v1.8.10), and Philosopher (v4.8.1) integrated into FragPipe (v18.0). For label-free protein identification and quantification, a built-in FragPipe workflow (LFQ-MBR) was used with trypsin specified as the enzyme. The target-decoy database (including contaminants) was generated using FragPipe from the Swiss-Prot human database (October 2022). The combined_protein.tsv file was used for further analysis. For the enrichment cutoff, a Log2FC greater than 1, based on at least two replicate experiments, was used.
For the co-IP experiments, parental cells and TENT4 dKO cells on a 150 mm plate were lysed on ice for 20 minutes using Buffer A (100 mM KCl, 0.1 mM EDTA, 20 mM HEPES [pH 7.5], 0.4% NP-40, 10% glycerol) containing 1 mM DL-Dithiothreitol (DTT), 1× protease inhibitor, and RNase A (Thermo, EN0531), and then centrifuged. For immunoprecipitation, 12.5 μg of antibody (NMG, anti-TENT4A, and anti-TENT4B) conjugated to protein A and G sepharose beads (1:1 mixture, total 20 μl) was used with 1 mg of the lysates. After incubation at 4° C. for 2 hours, the beads were washed, boiled in 20 μl of 2×SDS buffer, and loaded onto a 4-12% (Novex) SDS-PAGE gel with the ladder (Thermo, 26616 and 26619). For the domain co-IP experiments, full-length ZCCHC2, truncated constructs of ZCCHC2, and a negative control construct, each carrying a FLAG tag, were transfected into ZCCHC2 KO cells, and the cells were lysed within 2 days. 10 μl of ANTI-FLAG® M2 Affinity Gel (Merck, A2220-10ML) was added to 1 mg of the lysates, and immunoprecipitation was performed by incubation at 4° C. for 2 hours. For the input sample, 50 μg of cell lysate was used. After transfer of the proteins from the gel to a methanol-activated PVDF membrane (Millipore), the membrane was blocked with PBS-T containing 5% skim milk, probed with primary antibodies, and washed three times with PBS-T. Anti-ZCCHC2 (1:250, Atlas HPA040943), anti-ZCCHC14 (1:1,000, Bethyl Laboratories, A303-096A), anti-TENT4A (1:500, Atlas Antibodies, HPA045487), anti-TENT4B (1:500, lab-made), anti-GAPDH (1:1,000, Santa Cruz, sc-32233), and anti-FLAG (1:1,000, Abcam, ab1162) were used as the primary antibodies. Anti-mouse or anti-rabbit HRP-conjugated secondary antibodies (Jackson ImmunoResearch Laboratories) were incubated for 1 hour and washed 3 times with PBS-T. Chemiluminescence was conducted with West Pico or Femto Luminol reagents (Thermo, 34580 and 34095), and the signals were detected with the ChemiDoc XRS+ System (Bio-Rad).
MS/MS data were processed using MaxQuant v.1.5.3.30 with default settings and the human Swiss-Prot database v.12/5/2018, applying a 0.8% FDR cutoff at the protein level.
Among the MaxQuant output files, MaxLFQ intensity values were extracted from the proteinGroups.txt file. After adding a pseudo-value of 10,000 to the MaxLFQ intensity values, differential analysis was performed with limma, and significant genes were filtered by Log2FC>0.8 and FDR<0.1.
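The effect of the pseudo-value on the fold-change calculation can be sketched as follows. This Python example does not reproduce the limma analysis (limma is an R package, and the FDR is taken here as a given input); the function names are hypothetical, and the FDR threshold of 0.1 and the log2 transformation after adding the pseudo-value are assumptions made for illustration.

```python
import math

PSEUDO_VALUE = 10_000  # pseudo-value added to MaxLFQ intensities


def log2_fold_change(bait_intensity, control_intensity, pseudo=PSEUDO_VALUE):
    """Log2 fold change of bait over control after adding the pseudo-value."""
    return math.log2(bait_intensity + pseudo) - math.log2(control_intensity + pseudo)


def passes_cutoff(log2fc, fdr, min_log2fc=0.8, max_fdr=0.1):
    """Apply the Log2FC and FDR filters (FDR threshold of 0.1 is assumed)."""
    return log2fc > min_log2fc and fdr < max_fdr


if __name__ == "__main__":
    # A protein undetected in the control (intensity 0) still yields a finite value
    print(log2_fold_change(30_000, 0))
```

Adding the pseudo-value keeps the fold change finite for proteins with zero intensity in one condition and dampens fold changes driven by very low intensities.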
Using the UniProt Align tool, ZCCHC2 (Q9C0B9), ZCCHC14 (A0A590UJW6), and GLS-1 (Q814M5) were aligned, and conservation scores for the three proteins were calculated.
For ZCCHC2 immunoprecipitation, a stable HeLa cell line expressing EGFP with the K5 element in the 3′ UTR was generated by transducing lentiviral vectors produced from Lenti-X 293T (Clontech, 632180) cells according to the constructs. The cells were lysed on ice for 30 minutes with lysis buffer (20 mM HEPES pH 7.6 [Ambion, AM9851 and AM9856], 0.4% NP-40, 100 mM KCl, 0.1 mM EDTA, 10% glycerol, 1 mM DTT, 1× Protease inhibitor [Calbiochem, 535140]), followed by centrifugation to obtain the cell lysate. As a negative control, 10 μg of normal rabbit IgG (Cell Signaling, 2729S) was used, and for ZCCHC2 immunoprecipitation, 10 μg of ZCCHC2 antibody (Atlas, HPA040943) was used. After the antibodies were conjugated to protein A magnetic beads (Life Technologies, 10002D), 1 mg of cell lysate was incubated with the antibody-conjugated beads for 2 hours, and the beads were then washed with wash buffer (the same as the lysis buffer but with 0.2% NP-40). After adding 5 ng of firefly luciferase mRNA to each sample as a spike-in for normalization, RNAs were purified with TRIzol reagent (Life Technologies) and used for RT-qPCR. The RT-qPCR primers are shown in Table 1.
Subcellular fractionation was conducted as follows. In detail, to obtain cytoplasmic fraction, cells were lysed in 200 μl of cytoplasmic lysis buffer (0.2 μg/μl digitonin [Merck, D141], 150 mM NaCl, 50 mM HEPES [pH 7.0-7.6], 0.1 mM EDTA, 1 mM DTT, 20 U/ml RNase inhibitor, 1× Protease inhibitor, 1× Phosphatase inhibitor). For the membrane and nuclear fractions, a subcellular protein fractionation kit (Thermo Scientific, 78840) was used according to the manufacturer's instructions. Anti-GM130 (1:500, BD Bioscience, 610822) and anti-Histone (1:2000, Cell Signaling, 4499) were used as the primary antibodies.
The reagents and resources used in the experimental examples of the present disclosure are shown in Table 2 below.
To build a library of viral RNA elements, a two-step approach was used due to the technical limitations of oligo synthesis: the initial screens were performed with human viruses, followed by expanding the secondary screen to include other related species. To identify viruses that can infect humans, the NCBI database, which currently annotates 502 human viral species that belong to 114 genera and 40 families, was used.
As shown in
Viral genera labeled in the figure: MASTADENOVIRUS, CYTOMEGALOVIRUS, LYMPHOCRYPTOVIRUS, RHADINOVIRUS, ROSEOLOVIRUS, SIMPLEXVIRUS, VARICELLOVIRUS, MEGALOCYTIVIRUS, ALPHAPAPILLOMAVIRUS, BETAPAPILLOMAVIRUS, GAMMAPAPILLOMAVIRUS, MUPAPILLOMAVIRUS, NUPAPILLOMAVIRUS, ALPHAPOLYOMAVIRUS, BETAPOLYOMAVIRUS, DELTAPOLYOMAVIRUS, CENTAPOXVIRUS, MOLLUSCIPOXVIRUS, ORTHOPOXVIRUS, PARAPOXVIRUS, YATAPOXVIRUS, HUCHISMACOVIRUS, PORPRISMACOVIRUS, ALPHATORQUEVIRUS, BETATORQUEVIRUS, GAMMATORQUEVIRUS, GYROVIRUS, CIRCOVIRUS, CYCLOVIRUS, GEMYCIRCULARVIRUS, BOCAPARVOVIRUS, DEPENDOPARVOVIRUS, ERYTHROPARVOVIRUS, UNCLASSIFIED PARVOVIRINAE, PROTOPARVOVIRUS, TETRAPARVOVIRUS, PICOBIRNAVIRUS, ORBIVIRUS, ORTHOREOVIRUS, ROTAVIRUS, SEADORNAVIRUS, TOTIVIRIDAE, MAMASTROVIRUS, ASTROVIRIDAE, NOROVIRUS, SAPOVIRUS, VESIVIRUS, ALPHACORONAVIRUS, BETACORONAVIRUS, FLAVIVIRUS, HEPACIVIRUS, PEGIVIRUS, PESTIVIRUS, ORTHOHEPEVIRUS, RUBIVIRUS, HUSAVIRUS, CARDIOVIRUS, COSAVIRUS, ENTEROVIRUS, HEPATOVIRUS, KOBUVIRUS, PARECHOVIRUS, ROSAVIRUS, SALIVIRUS, TOROVIRUS, ALPHAVIRUS, MAMMARENAVIRUS, ORTHOBORNAVIRUS, EBOLAVIRUS, MARBURGVIRUS, ORTHOHANTAVIRUS, DELTAVIRUS, ORTHONAIROVIRUS, ALPHAINFLUENZAVIRUS, BETAINFLUENZAVIRUS, GAMMAINFLUENZAVIRUS, THOGOTOVIRUS, HENIPAVIRUS, MORBILLIVIRUS, ORTHORUBULAVIRUS, PARARUBULAVIRUS, RESPIROVIRUS, ORTHOBUNYAVIRUS, BANDAVIRUS, PHLEBOVIRUS, METAPNEUMOVIRUS, ORTHOPNEUMOVIRUS, LEDANTEVIRUS, LYSSAVIRUS, TIBROVIRUS, VESICULOVIRUS, BETARETROVIRUS, DELTARETROVIRUS, GAMMARETROVIRUS, LENTIVIRUS, UNCLASSIFIED RETROVIRIDAE, SPUMAVIRUS, ORTHOHEPADNAVIRUS.
As shown in
For functional assessment, the plasmid pool was transfected into the human colon cancer cell line (HCT116) to quantify the impact of each element on gene expression (
To determine the effect of 30,302 viral segments (30,190 segments with all three barcodes detected) on mRNA abundance, the following experiment was conducted. The experiment results were reproducible between quadruplicate experiments and between barcodes. In detail, the positive controls spanning 1E and WPRE increased mRNA levels relative to the 1E mutants (
Thus, segments that stabilize RNA (Log2(RNA/DNA)>0.5, p-value<0.05) or destabilize RNA (Log2(RNA/DNA)<−1, p-value<0.001) were effectively identified through this experiment (Tables 4 and 5). The 50 segments in Table 4 were found to exhibit excellent RNA abundance, with Log2(RNA/DNA) values similar to or higher than those of the positive controls WPRE or HCMV 1E (
Segments that Stabilize RNA
Segments that Destabilize RNA
HUMAN_BETAHERPESVIRUS_6B_(HHV-6B)
HUMAN_BETAHERPESVIRUS_6B_(HHV-6B)
HUMAN_GAMMAHERPESVIRUS_4_(EPSTEIN-
HUMAN_ALPHAHERPESVIRUS_2_(HERPES_SIMPLEX_VIRUS_2)
AICHI_VIRUS_1
SALIVIRUS_A
HUMAN_GAMMAHERPESVIRUS_8_(KAPOSI′S_
HUMAN_ALPHAHERPESVIRUS_2_(HERPES_SIMPLEX_VIRUS_2)
HUMAN_GAMMAHERPESVIRUS_4_(EPSTEIN-
HUMAN_ALPHAHERPESVIRUS_1_(HERPES_SIMPLEX_VIRUS_1)
HUMAN_GAMMAHERPESVIRUS_4_(EPSTEIN-
HUMAN_GAMMAHERPESVIRUS_8_(KAPOSI′S_
HUMAN_GAMMAHERPESVIRUS_4_(EPSTEIN-
HUMAN_BETAHERPESVIRUS_5_(HHV-
MOLLUSCUM_CONTAGIOSUM_VIRUS_
HUMAN_BETAHERPESVIRUS_5_(HHV-
MOLLUSCUM_CONTAGIOSUM_VIRUS_
PEGIVIRUS_A
Also, the translational effects of 30,155 segments (29,786 segments with all three barcodes detected) were assessed using the polysome profiling-sequencing data (
The very weak correlation between the estimated mRNA abundance and translational efficiency suggests that most viral elements influence either mRNA abundance or translation. Nevertheless, some segments were found to affect both aspects. For validation, 16 candidates, not previously studied, which enhanced both RNA abundance and translation were selected (
The K4 element from the 3′ UTR of Saffold virus (GenBank: NC_009448.2, 7,931-8,060) and the K5 element from the 3′ UTR of Aichi virus 1 (AiV-1) (GenBank: NC_001918.1, 8,122-8,251) were further investigated (
Saffold virus and AiV-1 belong to the genus Cardiovirus and genus Kobuvirus, respectively, and are broadly distributed and poorly investigated viruses that cause relatively mild symptoms, including gastroenteritis.
To map the boundaries of the elements, the extended or truncated segments of K4 and K5 were examined. The extended 180-nt segment of K4 covering the entire 3′ UTR of Saffold virus (“eK4,” 7,881-8,060) showed similar effects to the original K4 segment, confirming that the 3′ terminal 130 nt is sufficient to convey the activity of K4. However, the extended form of K5 (“eK5,” 8,067-8,251, 185 nt) further enhanced luciferase expression, outperforming other elements, including the original K5, K4, and the extended K4 (eK4) (
To characterize K5 in more detail, a second round of high-throughput assay was performed on K5 mutants and homologs (
As shown in
To investigate the phylogenetic distribution of K5, the 3′ UTR segments from 88 picornavirus species (K5 and 87 other picornavirus elements) were included in the secondary screen. Among these picornaviruses, 43 kobuvirus segments (Table 8; with at least 59% homology to K5) upregulated mRNA levels more than the nonfunctional control K5m, which has a deletion in the G bulge in the second hairpin (
Kobuvirus sp. strain 16317 × 87
Kobuvirus sewage Aichi gene for
Aichivirus A strain Wencheng-Rt386-2
Kobuvirus SZAL6-KoV/2011/HUN,
Kobuvirus sp. strain 20724 × 43
Aichivirus A strain rat08/rAiA/HUN,
Kobuvirus sewage Kathmandu isolate
Aichi virus 1 strain PAK585 polyprotein
Kobuvirus dog/AN211D/USA/2009
Aichivirus A strain FSS693 polyprotein
Kobuvirus sp. strain 20724 × 41
Aichivirus A7 isolate RtMruf-
Aichi virus strain D/VI2244/2004
Aichi virus isolate Chshc7, complete
Aichi virus isolate
Aichi virus strain D/VI2321/2004
Aichi virus strain kvgh99012632/2010
Aichi virus strain D/VI2287/2004
Aichi virus isolate BAY/1/03/DEU from
Outside the Kobuvirus genus, most picornaviral 3′ UTRs failed to increase mRNA abundance (
GTACGCGGCCGTTCTGACGTTGGAATTCTGTAGATGAAAGTTAGCTAGGA
GTGTACGCGGTCATCGGGGACCCCTCCTGGCCTTTGGTTTATTGGTGAAT
5. Enhancement of Gene Expression from Vectors and Synthetic mRNAs by K5
To test whether K5 can function in other molecular contexts, a vector system based on adeno-associated virus (AAV), a single-stranded DNA virus belonging to the Parvoviridae family that enables efficient gene delivery with low toxicity for human gene therapy, was used. As shown in
Minimal K5 (120 nt) or eK5 (185 nt) sequences, along with inactive mutants (K5m and eK5m) and WPRE, were evaluated as controls. These segments were inserted downstream of the EGFP-coding sequences within AAV vectors, and their impact on gene expression was measured (
In addition, the above experiment was repeated using a lentiviral vector. As a result, it was confirmed that, similar to AAV vectors, eK5 also increased GFP expression when using the lentiviral vector (
In vitro transcribed (IVT) mRNA represents another important platform for gene transfer, as exemplified by the COVID-19 vaccines. To test the effect of K5 on IVT mRNAs, luciferase-encoding mRNAs were synthesized with or without functional eK5, as shown in
A similar observation was made with another set of IVT mRNAs containing the GFP coding sequences (d2EGFP) and the alpha-globin 3′ UTR (GBA), widely used to stabilize mRNAs. As shown in
In the time-course experiment using synthetic mRNA transfection, the prolonged protein expression (
To test the possibility that this change involves tail extension catalyzed by terminal nucleotidyl transferases (TENTs), TENTs were depleted, and luciferase assays were performed with K5 reporter constructs. As shown in
TENT4A (also known as PAPD7, TRF4-1, and TUT5) and TENT4B (also known as PAPD5, TRF4-2, and TUT3) extend poly(A) tails with the occasional incorporation of non-adenosine residues, a process known as “mixed tailing”. The resulting mixed tail effectively impedes deadenylation and thereby stabilizes the transcript, because the main deadenylase complex, CCR4-NOT, has a preference for adenosine residues. To investigate the direct involvement of mixed tailing by measuring the frequency of mixed tails, a modified version of TAIL-seq, named “gene-specific TAIL-seq” (GS-TAIL-seq), was developed. In detail, RNA was ligated to the 3′ adapter conjugated with biotin and partially fragmented. The 3′ end fragments were enriched using streptavidin beads, reverse transcribed with primers binding to the adapter, and then amplified by PCR with a gene-specific forward primer. The sequencing data show that K5 reporter mRNA has non-adenosine residues mainly at the terminal and penultimate positions, as expected for mixed tails. As shown in
Moreover, as shown in
Interestingly, however, it was observed that K5 remains fully active in the absence of ZCCHC14, an adapter protein known to recruit TENT4 to viral RNAs. As shown in
To identify the potential K5 adapters, the ‘RNA-protein interaction detection (RaPID)’ method was performed. As shown in
Orthogonally, the TENT4 complex recovered by in vitro RNA-pulldown experiments using the HCMV 1E stem-loop (SL2.7) as bait was examined. As a result, in addition to TENT4A, TENT4B, ZCCHC14, SAMD4A, and K0355, which are known to interact with 1E, ZCCHC2 was also found (
To validate the interaction between ZCCHC2 and eK5, western blotting was performed following the RaPID experiment, which detected ZCCHC2 associated with the eK5 bait (
ZCCHC2 is a poorly characterized protein of 126 kDa with long intrinsically disordered regions, a PX domain, and a CCHC-type zinc finger (ZnF) domain (
To test if ZCCHC2 binds to TENT4, co-immunoprecipitation experiments were conducted. As shown in
Next, to investigate the function of ZCCHC2 in K5-mediated regulation, the ZCCHC2 gene in HeLa cells was ablated with CRISPR-Cas9. Using this KO, Hire-PAT assays were conducted to examine poly(A) tail length distribution. As shown in
Consistently, luciferase assays and RT-qPCR using the eK5 reporters revealed that eK5 can no longer enhance reporter expression in the absence of ZCCHC2. This result was confirmed using the longer eK5 constructs. As shown in
To verify the role of ZCCHC2, rescue experiments were performed by transfecting the ZCCHC2-expression plasmid into ZCCHC2 KO cells. As shown in
To further confirm the direct activity of ZCCHC2 on the target RNA, tethering experiments were conducted by utilizing a luciferase reporter containing BoxB elements, instead of K5. As shown in
Next, the specific region of ZCCHC2 responsible for TENT4 recruitment was identified. As shown in
Based on these results, it was confirmed that ZCCHC2 uses its N terminus and C terminus to interact with TENT4 and K5, respectively. As shown in
To identify the minimal range required for K4 element functionality, the regulatory element was truncated and a dual-luciferase assay was performed as follows. The original 130-nt K4 element was successfully reduced to the region spanning nucleotides 11-70 (K4 min) without loss of activity. Further truncations of the K4 min region, however, led to a decrease in luciferase activity (
Systematic mutagenesis was used to investigate both the sequence and structural characteristics necessary for K4 element function. In the mutagenesis library, we introduced single-nucleotide substitutions, as well as single and two-consecutive-nucleotide deletions, across the entire K4 element. Paired mutations in the K4 min region were designed to preserve the overall secondary structure (
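As an illustration of how such a library can be enumerated, the following minimal sketch generates every single-nucleotide substitution, single-nucleotide deletion, and two-consecutive-nucleotide deletion for an input RNA sequence. The function name and the variant naming scheme are hypothetical, and the structure-preserving paired mutations are not covered because they depend on a secondary-structure model that is not reproduced here.

```python
def mutagenesis_library(seq: str, alphabet: str = "ACGU") -> dict:
    """Enumerate substitution and deletion variants of `seq`.

    Returns a dict mapping a variant name to the mutated sequence.
    Positions in names are 1-based. Different names may map to the
    same sequence (for example, deleting either of two identical
    neighboring bases).
    """
    seq = seq.upper()
    variants = {}

    # Single-nucleotide substitutions: 3 per position.
    for i, ref in enumerate(seq):
        for alt in alphabet:
            if alt != ref:
                variants[f"sub_{ref}{i + 1}{alt}"] = seq[:i] + alt + seq[i + 1:]

    # Single-nucleotide deletions: 1 per position.
    for i in range(len(seq)):
        variants[f"del1_{i + 1}"] = seq[:i] + seq[i + 1:]

    # Deletions of two consecutive nucleotides: 1 per adjacent pair.
    for i in range(len(seq) - 1):
        variants[f"del2_{i + 1}-{i + 2}"] = seq[:i] + seq[i + 2:]

    return variants
```

For a sequence of length n this yields 3n substitutions, n single deletions, and n - 1 double deletions, so the library size grows linearly with the length of the element.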
The oligo pool was cloned into an integrase-site GFP-containing plasmid, which was subsequently integrated into the genome of HEK293T cells. Cells were sorted into four bins via FACS (
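One common way to turn sorted-bin sequencing counts into a per-variant expression value is a depth-normalized weighted mean over the bins. The sketch below shows that calculation under stated assumptions: the bin values (1 to 4), the normalization by per-bin read totals, and the function name are illustrative and are not taken from the disclosure, which may have used a different scoring scheme.

```python
def expression_score(bin_counts, bin_totals, bin_values=(1.0, 2.0, 3.0, 4.0)):
    """Weighted mean bin value for one variant.

    bin_counts: reads of this variant in each FACS bin.
    bin_totals: total reads sequenced in each bin (for depth normalization).
    bin_values: value assigned to each bin (assumed: 1 = lowest GFP).
    Returns None when the variant was not observed in any bin.
    """
    if not (len(bin_counts) == len(bin_totals) == len(bin_values)):
        raise ValueError("bin_counts, bin_totals and bin_values must match in length")

    # Convert raw counts to within-bin frequencies so that deeper
    # sequencing of one bin does not bias the score.
    freqs = [c / t if t else 0.0 for c, t in zip(bin_counts, bin_totals)]
    total = sum(freqs)
    if total == 0:
        return None
    return sum(f * v for f, v in zip(freqs, bin_values)) / total
```

A variant found only in the brightest bin scores 4.0, while one split evenly between the dimmest and brightest bins scores 2.5.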
As anticipated, mutations outside the truncated K4 min region did not significantly affect activity, affirming that the functional truncated versions of the K4 element retain the essential features required for stability enhancement (
To evaluate the effects of each substitution, the mean expression was calculated for each nucleotide and mapped across the structure. The (G/A)NNCCA loop was required, and the overall stem was important for expression. Additionally, we calculated the ΔExpression of paired versus unpaired bases from the compensatory mutations to assess the necessity of base-pairing in the stem region (
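The loop consensus and the paired-versus-unpaired comparison can be expressed compactly. The sketch below checks a six-nucleotide loop against the (G/A)NNCCA pattern and computes a ΔExpression value; defining ΔExpression as the mean score of base-paired variants minus the mean score of unpaired variants is an assumption made for illustration, and the function names are hypothetical.

```python
import re

# (G/A)NNCCA written as a regular expression over the RNA alphabet.
LOOP_MOTIF = re.compile(r"[GA][ACGU]{2}CCA")


def has_loop_motif(loop_seq: str) -> bool:
    """True if the whole loop sequence matches (G/A)NNCCA.

    DNA-style input is accepted by converting T to U first.
    """
    rna = loop_seq.upper().replace("T", "U")
    return LOOP_MOTIF.fullmatch(rna) is not None


def delta_expression(paired_scores, unpaired_scores) -> float:
    """Mean expression of paired variants minus that of unpaired variants.

    Assumption: a positive value indicates that restoring the base pair
    (compensatory mutation) recovers expression.
    """
    if not paired_scores or not unpaired_scores:
        raise ValueError("both score lists must be non-empty")
    mean_paired = sum(paired_scores) / len(paired_scores)
    mean_unpaired = sum(unpaired_scores) / len(unpaired_scores)
    return mean_paired - mean_unpaired
```

A clearly positive ΔExpression at a stem position would be consistent with base-pairing, rather than the identity of the individual nucleotides, being the feature that matters at that position.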
10. Practical Applications in mRNA Therapeutics
To evaluate the therapeutic potential of the K4 element in mRNA-based treatments, we tested its effect on in vitro transcribed (IVT) mRNAs. IVT mRNAs, with and without the K4 element, were transfected into HCT116 cells using a lipid nanoparticle (LNP) formulation. The K4 element demonstrated a significant impact on IVT mRNAs, increasing expression levels up to 10-fold compared to controls at 96 hours post-transfection (
Additionally, we conducted a mouse immunization study using IVT mRNAs (
To further explore practical applications in mRNA therapeutics, we tested m1ψ-modified IVT mRNAs with various combinations of known stabilizing elements, including K4, 1E, and K3 (
We next assessed in vivo luciferase expression for one of the combinations, K3m2K4, by encapsulating the mRNA in lipid nanoparticles (LNPs) and administering it by intravenous (IV) injection. We observed a substantial increase in luciferase expression, particularly on Day 3 post-injection (
From the foregoing description, it will be apparent to those skilled in the art that the present invention may be implemented in various specific forms without altering its technical concept or essential features. The experimental examples and embodiments described above should therefore be considered illustrative and not restrictive in any way. The scope of the present invention should be interpreted to encompass all modifications and variations that fall within the meaning and scope of the appended claims and their equivalents, rather than being limited to the detailed description provided above.
Number | Date | Country | Kind |
---|---|---|---|
10-2022-0080073 | Jun 2022 | KR | national |
This application is a continuation-in-part application of International Application No. PCT/KR2023/009153 filed on Jun. 29, 2023, which claims priority to Korean Patent Application No. 10-2022-0080073 filed on Jun. 29, 2022, the entire contents of which are incorporated herein by reference.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/KR2023/009153 | Jun 2023 | WO |
Child | 19003992 | | US |