This patent application claims priority to Korean Patent Application No. 10-2020-0144787 filed with the Korean Intellectual Property Office on Nov. 2, 2020, the disclosure of which is incorporated herein by reference.
The present invention relates to an RNA interactome capture protocol and an antiviral composition discovered using the same.
Coronaviruses (CoVs) are a group of enveloped viruses with nonsegmented, single-stranded, positive-sense (+) RNA genomes, which belong to the order Nidovirales, family Coronaviridae, and subfamily Coronavirinae (Lai and Cavanagh, 1997). They are classified into four genera: Alphacoronavirus and Betacoronavirus, which exclusively infect mammals, and Gammacoronavirus and Deltacoronavirus, which primarily infect birds (Woo et al., 2012). Human CoVs such as Alphacoronavirus HCoV-229E and Betacoronavirus HCoV-OC43 have been known since the 1960s (Hamre and Procknow, 1966) as etiologic agents of the common cold. However, since the beginning of the 21st century, five novel human coronavirus species have emerged, including the highly pathogenic Severe acute respiratory syndrome coronavirus (SARS-CoV) in 2002 (Peiris et al., 2003), Middle East respiratory syndrome coronavirus (MERS-CoV) in 2012 (de Groot et al., 2013), and SARS-CoV-2 in 2019 (Zhou et al., 2020).
At the core of the coronavirus particle, the RNA genome is encapsulated by the nucleocapsid (N) protein and surrounded by the viral membrane, which contains the spike (S), membrane (M), and envelope (E) proteins (Lai and Cavanagh, 1997). The coronaviral RNA genome is ~30 kb, the longest among RNA viruses, and contains a 5′-cap structure and a 3′ poly(A) tail (Bouvet et al., 2010; Lai and Stohlman, 1981). Upon cell entry, the genomic RNA (gRNA) acts as an mRNA to produce the nonstructural proteins (nsps) required for viral RNA production (Perlman and Netland, 2009). ORF1a encodes polypeptide 1a (pp1a, 440-500 kDa), which is cleaved into 11 nsps. A −1 ribosomal frameshift occurs immediately upstream of the ORF1a stop codon, allowing translation of the downstream ORF1b and yielding a large polypeptide (pp1ab, 740-810 kDa) that is cleaved into 15 nsps. Together, 16 different nsps are generated; they construct double-membrane vesicles (DMVs) and mediate the subsequent steps of viral RNA synthesis.
After this initial stage of viral translation, the gRNA serves as the template for the synthesis of negative-strand (−) RNA intermediates, which in turn serve as templates for positive-sense (+) RNA synthesis (Snijder et al., 2016; Sola et al., 2015). Ten different canonical (+) RNA species are produced from the SARS-CoV-2 genome: one full-length gRNA and nine subgenomic RNAs (sgRNAs) (Kim et al., 2020a). All canonical viral (+) RNAs share a common 5′-end sequence, called the leader sequence, and common 3′-end sequences. The sgRNAs are generated by discontinuous transcription, which fuses the 5′ leader sequence to “body” segments containing the downstream open reading frames (Sola et al., 2015); these encode the structural proteins (S, E, M, and N) and the accessory proteins (3a, 3c, 6, 7a, 7b, 8, and 9b) (Kim et al., 2020a).
To carry out this program of gene expression, coronaviruses employ unique strategies to evade, modulate, and utilize the host machinery (Fung and Liu, 2019). For example, gRNA molecules must be kept in an intricate balance between translation, transcription, and encapsulation by recruiting the appropriate host RNA-binding proteins (RBPs) and forming specific ribonucleoprotein (RNP) complexes. Because host cells counteract infection by deploying RBPs such as RIG-I, MDA5, and Toll-like receptors (TLRs) to recognize and eliminate viral RNAs, the virus must use its own components to evade the immune system and win the arms race between virus and host. How such stealth mechanisms are encoded in this compact RNA genome remains to be explored (Snijder et al., 2016). Thus, identifying the RBPs that bind to viral transcripts (the SARS-CoV-2 RNA interactome) is key to uncovering the molecular rewiring of viral gene regulation and the activation of antiviral defense systems.
Biochemical techniques for studying RNA-protein interactions have been developed (Ramanathan et al., 2019), with advances in protein-centric methods such as CLIP-seq (crosslinking immunoprecipitation followed by sequencing) (Ule et al., 2018). In CLIP-seq experiments, RNP complexes are crosslinked within cells by UV irradiation to capture direct RNA-protein interactions, and the protein of interest is immunoprecipitated to identify the associated RNAs (Lee and Ule, 2018; Van Nostrand et al., 2020). More recently, RNA-centric methods have also been developed to profile the mRNA interactome and RNP complexes (Roth and Diederichs, 2015). In these methods, after UV irradiation, the RNA of interest is purified with oligonucleotide probes, and the crosslinked proteins are identified by mass spectrometry. For example, RAP-MS yields high-confidence profiles of the proteins that bind a specific RNA, owing to its combination of long hybridization probes and harsh denaturing conditions (Engreitz et al., 2013; McHugh et al., 2015).
Leading to the present disclosure, the present inventors conducted intensive and thorough research to develop a process for analyzing the binding of viral and/or host proteins to the RNA of RNA viruses, more specifically, to the genomic RNA of coronaviruses. As a result, the inventors developed a powerful RNP capture protocol that defines the repertoire of viral and host proteins associated with the coronavirus transcriptome, and used it to identify new targets for suppressing coronavirus.
Therefore, an aspect of the present disclosure is to provide an antiviral composition.
Another aspect of the present invention is to provide a pharmaceutical composition for preventing or treating coronavirus infection.
Another aspect of the present invention is to provide a method for screening coronavirus inhibitors.
Another aspect of the present invention is to provide a method for establishing an RNA interactome.
According to an aspect thereof, the present disclosure provides an antiviral composition comprising one or more proteins selected from the group consisting of LARP1, FUBP3, FUBP1, FAM120A, FAM120C, EIF4H, RPS3, RPS9, SND1, CELF1, RALY, CNBP, TRIM25, ZC3HAV1, PARP12, PPP1CA, HDLBP, CNBP, TIAL, UPF1, and SHFL as an active ingredient.
The present inventors have made intensive research efforts to develop a process for analyzing the binding of viral and/or host proteins to the RNA of RNA viruses, more specifically, to the genomic RNA of coronaviruses. As a result, the inventors developed a powerful RNP capture protocol that defines the repertoire of viral and host proteins associated with the coronavirus transcriptome, and used it to identify new targets for suppressing coronavirus.
One embodiment of the RNP capture protocol of the present invention, designed to identify components having antiviral activity and components having proviral activity, includes the following steps: (a) inducing RNA-protein crosslinking by irradiating a sample containing the target RNA and proteins with UV light; (b) performing denaturation with DNase treatment; (c) capturing the denatured RNP complexes in a sequence-specific manner using a pool of biotinylated antisense oligonucleotide probes (wherein the biotinylated antisense oligonucleotides in the probe pool are designed consecutively along the target RNA sequence, all have the same length, ranging from 45 to 135 nt, preferably from 60 to 120 nt, more preferably from 75 to 105 nt or from 80 to 100 nt, or 90 nt, and have starting positions spaced at regular intervals so that adjacent oligonucleotides partially overlap); (d) performing on-bead trypsin digestion; and (e) obtaining interactome information by analyzing the digestion products.
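Step (c) describes a tiling design: equal-length antisense probes whose start positions advance by a fixed step, so that neighbouring probes overlap. The Python sketch below illustrates that arithmetic under stated assumptions. The text does not give the step size, so the 45-nt default and the function names are hypothetical, and practical probe design would weigh further factors (for example, GC content and repeats) that are not modelled here.

```python
# Sketch of tiled antisense probe design; step size and names are hypothetical.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of an RNA or DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_antisense_probes(target: str, probe_len: int = 90, step: int = 45):
    """Return (start, probe) pairs of equal-length antisense probes.

    `target` is the sense-strand sequence written 5'->3'. Probe start
    positions are spaced `step` nt apart (hypothetical default of 45 nt),
    so adjacent probes overlap by probe_len - step nt.
    """
    if not 0 < step <= probe_len:
        raise ValueError("step must satisfy 0 < step <= probe_len")
    probes = []
    for start in range(0, len(target) - probe_len + 1, step):
        window = target[start:start + probe_len]
        # Each probe is complementary (antisense) to the RNA it captures.
        probes.append((start, reverse_complement(window)))
    return probes


if __name__ == "__main__":
    demo_target = "AUGGCUACGUUAGCCAUGGA" * 10  # made-up 200-nt sequence
    for start, probe in tile_antisense_probes(demo_target):
        print(start, probe[:20] + "...")
```

With strictly regular spacing, a 3′-terminal stretch shorter than the step may not receive a probe of its own; whether to add an end-anchored probe is a design choice the text does not address.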
Through the RNP capture protocol including the above steps, the present inventors obtained detailed information on the interactome, that is, the set of proteins that crosslink with the genomic RNA (gRNA) of the coronavirus. The term “interactome” in the present invention refers to the collective set of proteins that bind to a target RNA sequence, specifically to a genomic RNA sequence of the coronavirus. For example, the “SARS-CoV-2 RNA interactome” refers to the set of 109 proteins identified through the RNP capture protocol.
The antiviral composition of the present disclosure comprises one or more proteins selected from the group consisting of LARP1, FUBP3, FUBP1, FAM120A, FAM120C, EIF4H, RPS3, RPS9, SND1, CELF1, RALY, CNBP, TRIM25, ZC3HAV1, PARP12, PPP1CA, HDLBP, CNBP, TIAL, UPF1, and SHFL as an active ingredient. The present inventors revealed through knock-down experiments that depletion of the above-described proteins leads to substantial upregulation of viral RNA, indicating that one or more proteins selected from the above-described protein group possess antiviral functions.
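The inference from the knock-down experiments can be stated as a simple rule: if depleting a host protein raises viral RNA, the protein restrains the virus (antiviral); if depletion lowers viral RNA, the protein supports it (proviral). A minimal Python sketch, assuming normalized viral RNA measurements for knock-down and control cells and a hypothetical two-fold cutoff that the text does not specify:

```python
import math


def classify_knockdown(viral_rna_knockdown: float,
                       viral_rna_control: float,
                       threshold_log2: float = 1.0) -> str:
    """Label a host factor from viral RNA levels measured after its knockdown.

    Both inputs are normalized viral RNA levels. threshold_log2=1.0 means a
    two-fold change, an illustrative cutoff rather than one from the source.
    """
    if viral_rna_knockdown <= 0 or viral_rna_control <= 0:
        raise ValueError("RNA levels must be positive")
    log2_fc = math.log2(viral_rna_knockdown / viral_rna_control)
    if log2_fc >= threshold_log2:
        return "antiviral"   # depletion releases viral RNA
    if log2_fc <= -threshold_log2:
        return "proviral"    # depletion reduces viral RNA
    return "no clear effect"
```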
“LARP1” in the present disclosure is known to stabilize 5′ TOP mRNAs, which contain a 5′ terminal oligopyrimidine (5′ TOP) motif in the 5′ UTR and encode ribosomal proteins and translation factors, and to inhibit the translation of 5′ TOP mRNAs in response to metabolic stress in an mTORC1-dependent manner.
“FUBP3” in the present disclosure is known as a nuclear protein with four KH domains that binds to the 3′ UTR of cellular mRNAs and regulates mRNA localization; its connection to the lifecycle of coronavirus is unknown.
“FUBP1” in the present disclosure is a single-stranded DNA-binding protein that binds to various DNA elements, including the far upstream element (FUSE) located upstream of c-myc. However, its connection to the lifecycle of coronaviruses is not known.
“FAM120A” in the present disclosure is known to be involved in mRNA transport in the cytoplasm and to act as a scaffolding protein that activates src family kinases, allowing them to phosphorylate and activate PI3-kinase.
“FAM120C” in the present disclosure is a potential transmembrane protein known to be encoded by a gene located in a region associated with intellectual disability and autism.
“EIF4H” of the present disclosure is known to act as a cofactor of the RNA helicase EIF4A together with EIF4B, and its depletion is known to lead to the formation of RNA granules.
“RPS3” of the present disclosure is a component of the 40S ribosomal subunit in eukaryotes, and it is known to be a multifunctional protein involved in DNA repair, apoptosis, and regulation of innate immune response to bacterial infections. However, its connection to viral inhibition is not known.
“RPS9” in the present disclosure is known as a component of the 40S ribosomal subunit in eukaryotes. Varying expression levels of RPS9 have been reported in colorectal cancer, but a correlation between its expression level and disease severity has not been established. Additionally, its connection to viral inhibition is not known.
“SND1” of the present disclosure is known to act as an oncogene in the progression of various carcinomas; in particular, it is known to promote tumor angiogenesis in human hepatocellular carcinoma through a novel pathway involving NF-kappaB and miR-221.
“CELF1” of the present disclosure is an RNA-binding protein known to enhance cell migration, invasion, and chemoresistance by targeting ETS2 in colon cancer, but its connection to virus inhibition is not known.
“RALY” of the present disclosure is known as an RNA-binding protein and a member of the heterogeneous nuclear ribonucleoprotein gene family. Infectious mononucleosis elicits anti-EBNA-1 antibodies that cross-react with several normal human proteins, and RALY is one such antigen.
“CNBP” of the present disclosure is found in many tissues of the body and is most abundant in the heart and in the skeletal muscles used for movement. It regulates the activity of other genes and plays an essential role in normal development before birth, especially the normal development of muscles.
“TRIM25” of the present disclosure is known as a protein that binds to the CARD domain of RIG-I.
“ZC3HAV1 (ZAP/PARP13)” of the present disclosure is a protein that has been reported to degrade HIV RNA by recognizing CpG and recruiting decay factors.
“PARP12” of the present disclosure belongs to the ADP-ribosyltransferase family, and its function has not been clearly identified.
“PPP1CA” of the present disclosure is a protein that associates with over 200 regulatory proteins to form holoenzymes which dephosphorylate their biological targets with high specificity.
“HDLBP” of the present disclosure is a conserved protein that contains 14 KH domains and has been implicated in the translation of dengue virus RNA.
“CNBP” of the present disclosure is a CCHC-type zinc finger nucleic acid-binding protein known to bind both DNA and RNA. It is known to act in the cap-dependent translation of ornithine decarboxylase mRNA and to function in sterol-mediated transcriptional regulation.
“TIAL” of the present disclosure is a type of RNA-binding protein that contains three RNA recognition motifs (RRMs). It is known to bind to adenine- and uridine-rich elements in mRNA and pre-mRNA.
“UPF1” of the present disclosure is a protein that is part of a post-splicing multiprotein complex, associated with the exon junction complex, that is involved in the nuclear export of mRNA.
“SHFL” of the present disclosure is a protein known to be associated with Dengue virus and autoimmune peripheral neuropathy.
The above-mentioned proteins are not known to be associated with the inhibition or lifecycle of the coronavirus.
According to another aspect thereof, the present disclosure provides an antiviral composition comprising nucleic acids encoding one or more proteins selected from the group consisting of LARP1, FUBP3, FUBP1, FAM120A, FAM120C, EIF4H, RPS3, RPS9, SND1, CELF1, RALY, CNBP, TRIM25, ZC3HAV1, PARP12, PPP1CA, HDLBP, CNBP, TIAL, UPF1, and SHFL as an active ingredient.
The nucleotide sequence of the above-mentioned nucleic acid molecule encoding the proteins of the present disclosure is understood to be any nucleotide sequence encoding the amino acid sequence constituting the protein, and is not limited to a specific nucleotide sequence. This is evident to a person skilled in the art because a variation in the nucleotide sequence does not necessarily change the sequence of the expressed protein, a phenomenon known as codon degeneracy. Therefore, the nucleotide sequence includes nucleotide sequences containing functionally equivalent codons, codons encoding the same amino acid (for example, owing to codon degeneracy, there are six codons each for arginine and serine), or codons encoding biologically equivalent amino acids.
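The codon degeneracy described above can be checked against the standard genetic code. The sketch below assumes NCBI translation table 1 (the standard code). It counts synonymous codons and shows that two different DNA sequences encode the same dipeptide. The sequences are illustrative only and are not taken from the proteins of the present disclosure.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in
# TCAG order: TTT, TTC, TTA, TTG, TCT, ... ("*" marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codons(amino_acid: str) -> list:
    """Return all codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence codon by codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    print("Arg:", synonymous_codons("R"))
    print("Ser:", synonymous_codons("S"))
    # Two different nucleotide sequences encoding the same dipeptide (Arg-Ser).
    print(translate("CGTTCT"), translate("AGAAGC"))
```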
As used herein, the term “nucleic acid” comprehensively includes DNA (gDNA and cDNA) and RNA molecules, and the nucleotides, which are the basic constituent units of nucleic acid molecules, include naturally occurring nucleotides as well as analogues with modified sugars or bases (Scheit, Nucleotide Analogs, John Wiley, New York (1980); and Uhlman & Peyman, Chemical Reviews, 90:543-584(1990)).
Considering the above-described mutations having biologically equivalent activity, the nucleic acid molecules of the present disclosure encoding the amino acid sequences constituting the proteins should be construed to also include sequences showing substantial identity therewith. Substantial identity refers to at least 60%, more preferably at least 70%, still more preferably at least 80%, still more preferably at least 90%, and most preferably at least 95% nucleotide sequence identity when the sequence of the present invention and any other sequence are aligned to correspond as closely as possible and the aligned sequences are analyzed using an algorithm commonly used in the art. Methods of alignment for sequence comparison are known in the art. Various methods and algorithms for alignment are disclosed in Smith and Waterman, Adv. Appl. Math. 2:482(1981); Needleman and Wunsch, J. Mol. Bio. 48:443(1970); Pearson and Lipman, Methods in Mol. Biol. 24: 307-31(1988); Higgins and Sharp, Gene 73:237-44(1988); Higgins and Sharp, CABIOS; Corpet et al., Nuc. Acids Res. 16:10881-90(1988); Huang et al., Comp. Appl. BioSci. 8:155-65(1992); and Pearson et al., Meth. Mol. Biol. 24:307-31(1994), but are not limited thereto.
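As one illustration of how percent identity can be computed, the sketch below performs a Needleman-Wunsch global alignment (the Needleman and Wunsch method cited above) with a simple linear gap penalty. It reports identical aligned positions divided by alignment length. The scoring values and the choice of denominator are assumptions. Conventions differ between tools (some divide by the length of the shorter sequence, for example), so the numbers may not match those of a particular software package.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions divided by alignment length, times 100."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    x, y = global_align(a.upper(), b.upper())
    identical = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * identical / len(x)
```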
In an embodiment of the present disclosure, the virus is a coronavirus.
As used herein, the term “coronavirus” is a collective term for RNA viruses belonging to the subfamily Coronavirinae in the family Coronaviridae. Coronaviruses cause respiratory and gastrointestinal infections in humans and animals and are readily transmitted through mucosal contact and respiratory droplets. Recently, they have emerged as prominent causes of severe infections in humans. Coronaviruses are known to be responsible for a significant portion of common colds in humans and can lead to direct viral pneumonia, secondary bacterial pneumonia, viral bronchitis, or secondary bacterial bronchitis.
In an embodiment of the present disclosure, the coronavirus is selected from the group consisting of 229E, OC43, NL63, HKU1, MERS-CoV, SARS-CoV and SARS-CoV-2, but is not limited thereto. These coronaviruses are known to infect humans and to share a significant portion of the genetic information in their gRNA. By applying the antiviral composition of the present invention, the gRNA replication and protein synthesis of these coronaviruses can be inhibited, thereby suppressing viral proliferation and activation.
According to another aspect thereof, the present disclosure provides an antiviral composition comprising a recombinant vector that contains nucleic acids encoding one or more proteins selected from the group consisting of LARP1, FUBP3, FUBP1, FAM120A, FAM120C, EIF4H, RPS3, RPS9, SND1, CELF1, RALY, CNBP, TRIM25, ZC3HAV1, PARP12, PPP1CA, HDLBP, CNBP, TIAL, UPF1, and SHFL as an active ingredient.
As used herein, the term “vector” refers to a means for expressing a gene of interest in a host cell, including plasmid vectors, cosmid vectors, and viral vectors such as bacteriophage vectors, adenoviral vectors, retroviral vectors, and adeno-associated viral vectors.
In an embodiment of the present disclosure, in the vector of the present invention, the nucleic acid molecule encoding one or more proteins selected from the group consisting of LARP1, FUBP3, FUBP1, FAM120A, FAM120C, EIF4H, RPS3, RPS9, SND1, CELF1, RALY, CNBP, TRIM25, ZC3HAV1, PARP12, PPP1CA, HDLBP, CNBP, TIAL, UPF1 and SHFL is operatively linked to the promoter of the vector.
As used herein, the term “operatively linked” refers to a functional linkage between a nucleic acid expression regulatory sequence (e.g., a promoter, a signal sequence, or an array of transcription factor binding sites) and another nucleic acid sequence, whereby the regulatory sequence controls the transcription and/or translation of the other nucleic acid sequence.
The recombinant vector system of the present invention can be constructed by various methods known in the art, and a specific method thereof is disclosed in Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press (2001), which is incorporated herein by reference.
The vector of the present invention may be typically constructed as a vector for cloning or a vector for expression. In addition, the vector of the present invention may be constructed by using a prokaryotic or eukaryotic cell as a host.
For example, when the vector of the present invention is an expression vector and a eukaryotic cell is used as the host cell, a promoter derived from the genome of a mammalian cell (e.g., metallothionein promoter, β-actin promoter, human hemoglobin promoter, or human muscle creatine promoter) or a promoter derived from a mammalian virus (e.g., adenovirus late promoter, vaccinia virus 7.5K promoter, SV40 promoter, cytomegalovirus promoter, tk promoter of HSV, mouse mammary tumor virus (MMTV) promoter, LTR promoter of HIV, or a promoter of Moloney virus, Epstein-Barr virus (EBV), or Rous sarcoma virus (RSV)) may be used, and a polyadenylation sequence is commonly used as the transcription termination sequence.
The vector of the present invention may be fused with other sequences to facilitate purification of the protein expressed therefrom. Examples of the fusion sequence include glutathione S-transferase (Pharmacia, USA), maltose binding protein (NEB, USA), FLAG™ (IBI, USA), and 6×His (hexahistidine (SEQ ID NO: 23); QIAGEN™, USA).
Meanwhile, the expression vector of the present invention includes, as a selectable marker, an antibiotic resistance gene commonly used in the art, and may include genes conferring resistance to ampicillin, gentamycin, carbenicillin, chloramphenicol, streptomycin, kanamycin, geneticin, neomycin, or tetracycline.
According to another aspect thereof, the present disclosure provides an antiviral composition comprising an expression inhibitor or activity inhibitor of one or more proteins selected from the group consisting of EIF3A, EIF3D, CSDE1, HNRNPM, YBX3, FMR1, RTCB, and NUFIP2.
“EIF3A” of the present disclosure is an RNA-binding component of the mammalian EIF3 complex and is an evolutionarily conserved protein along with EIF3B and EIF3C.
“EIF3D” of the present disclosure is known to interact with the mRNA cap and is a protein required for translation initiation.
“CSDE1(Unr)” of the present disclosure is known to be required for IRES-dependent translation in human rhinovirus (Picornaviridae) and poliovirus (Picornaviridae).
“HNRNPM” of the present disclosure is a protein involved in the spliceosome mechanism through interaction with the CDC5L/PLRG1 spliceosome subcomplex, and the HNRNPM complex with nuclear IMP-3 is known to play an important role in efficient synthesis of CCND1, D3 and G1, and in the proliferation of human cancer cells.
“YBX3” of the present disclosure is known to inhibit influenza A virus by interacting with the viral ribonucleoprotein complex and impairing its function. However, its connection to coronavirus suppression has not been identified.
“FMR1” of the present disclosure is an RNA-binding protein that associates with polysomes and is known to be involved in mRNA transport from the nucleus to the cytoplasm.
The “RTCB” of the present disclosure is an RNA ligase component of the E. coli RNA repair operon, and is a protein known to be able to catalyze in vivo tRNA splicing and HAC1 mRNA splicing.
“NUFIP2” of the present disclosure is an RNA-binding protein that exhibits cell cycle-dependent intracellular localization.
In an embodiment of the present disclosure, the expression inhibitor of the proteins is selected from the group consisting of miRNA, siRNA, shRNA, antisense oligonucleotide, CRISPRi, and combinations thereof that complementarily bind to the nucleic acid sequences encoding the proteins.
As used herein, the term “miRNA, siRNA or shRNA” of the present disclosure refers to a nucleic acid molecule that binds to mRNA transcribed from a target gene and inhibits translation of the mRNA in order to mediate RNA interference or gene silencing. Since the siRNA or shRNA can inhibit the expression of a target gene, it can be used in an efficient gene knockdown method or gene therapy method, and for the purpose of the present invention, it can be used to inhibit the expression of any one or more proteins selected from the group consisting of EIF3A, EIF3D, CSDE1, HNRNPM, YBX3, FMR1, RTCB and NUFIP2.
In an embodiment of the present disclosure, specific examples of siRNAs for EIF3A, EIF3D, CSDE1, HNRNPM, YBX3, FMR1, RTCB, and NUFIP2 of the present invention are as follows, and one or more siRNAs selected from the following siRNA sequences can be used to inhibit protein expression:
As used herein, the term “antisense oligonucleotide” of the present invention refers to DNA, RNA, or derivatives thereof containing a nucleic acid sequence complementary to a specific mRNA sequence, which binds to a complementary sequence in mRNA and inhibits translation of mRNA into protein. For the purpose of the present invention, it can be used to inhibit the expression of any one or more proteins selected from the group consisting of EIF3A, EIF3D, CSDE1, HNRNPM, YBX3, FMR1, RTCB, and NUFIP2.
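Sequence complementarity is what allows an antisense oligonucleotide (or the guide strand of an siRNA) to find its target. A minimal sketch follows, assuming perfect Watson-Crick pairing only. It uses made-up sequences rather than any of the siRNAs referred to above, and the function names are hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Normalize a DNA or RNA string to uppercase RNA letters."""
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement in RNA letters (Watson-Crick pairs only)."""
    return "".join(PAIR[base] for base in reversed(to_rna(seq)))


def find_target_sites(mrna: str, antisense: str) -> list:
    """Return 0-based mRNA positions where the antisense oligo pairs perfectly.

    The oligo is written 5'->3'; it pairs with the mRNA segment equal to its
    reverse complement. Mismatches and G-U wobble pairs are not considered.
    """
    site = reverse_complement_rna(antisense)
    mrna = to_rna(mrna)
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]
```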
As used herein, the term “CRISPRi (Clustered regularly interspaced short palindromic repeats interference)” of the present disclosure refers to a system that induces gene suppression at a target site using dCas, a Cas variant whose DNA cleavage activity has been removed, so that the CRISPR-Cas system can be used to regulate gene expression. The CRISPR-Cas system, part of the adaptive immune system of Eubacteria and Archaea, has been used as a gene editing tool that can easily and quickly introduce insertions and modifications at a target site using Cas and an sgRNA. However, because of the potential for off-target mutations, CRISPRi was developed to regulate gene expression at the target site, rather than edit it, by using dCas, which lacks DNA cleavage activity. CRISPRi can therefore suppress gene expression at specific target locations without inducing mutations, allowing precise gene regulation without genetic modification. CRISPRi is used as an efficient tool for DNA sequence-specific regulation of gene expression in various organisms and, for the purposes of the present invention, can be used to inhibit the expression of one or more proteins selected from the group consisting of EIF3A, EIF3D, CSDE1, HNRNPM, YBX3, FMR1, RTCB, and NUFIP2.
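CRISPRi targeting depends on locating protospacers adjacent to a PAM. The sketch below assumes the commonly used S. pyogenes dCas9, whose PAM is NGG, together with a 20-nt spacer. The text refers only to "dCas", so both of these are assumptions, and the function name is hypothetical. It scans one strand; the opposite strand can be scanned by passing its reverse complement.

```python
import re


def find_protospacers(dna: str, spacer_len: int = 20):
    """Return (start, spacer, pam) for each spacer followed by an NGG PAM.

    Scans the given strand only. The zero-width lookahead lets overlapping
    candidate sites be reported.
    """
    dna = dna.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]
```

In practice, CRISPRi guides are usually chosen near the transcription start site of the target gene and checked for off-target matches, which this sketch does not attempt.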
In an embodiment of the present disclosure, the activity inhibitor of the proteins is selected from the group consisting of antibodies, aptamers, small molecule compounds, and combinations thereof that specifically bind to the proteins.
As used herein, the term “antibody” in the present disclosure refers to a proteinaceous molecule capable of specifically binding to an antigenic epitope of a protein or peptide molecule. The antibodies can be prepared by cloning each gene into an expression vector according to a conventional method to obtain the protein encoded by the gene, and then producing antibodies against the obtained protein, used as an immunogen, by a conventional method.
In the present disclosure, the antibody can be utilized as a means capable of inhibiting the activity of the protein, by binding to the activated EIF3A, EIF3D, CSDE1, HNRNPM, YBX3, FMR1, RTCB, or NUFIP2 protein.
As specific examples, the antibodies of the present invention may include polyclonal antibodies, monoclonal antibodies, or antigen-binding molecules that can specifically bind to EIF3A, EIF3D, CSDE1, HNRNPM, YBX3, FMR1, RTCB, or NUFIP2, including any portion thereof having antigen-binding properties. The antibodies encompass all immunoglobulin antibodies as well as special antibodies such as humanized antibodies. In addition, the antibody may be a complete form having two full-length light chains and two full-length heavy chains, or a form including a functional fragment of the antibody molecule. A functional fragment of an antibody molecule refers to a fragment having at least an antigen-binding function, and may include Fab, F(ab′), F(ab′)2, and Fv.
As used herein, the term “aptamer” of the present disclosure refers to a nucleic acid molecule having binding activity to a specific target molecule. The aptamer may be RNA, DNA, modified nucleic acid, or a mixture thereof, and may be linear or cyclic. In general, shorter aptamers are known to be easier to synthesize chemically and to mass-produce, more cost-effective, easier to modify chemically, more stable in vivo, and less toxic. In the present disclosure, the aptamer binds to the activated EIF3A, EIF3D, CSDE1, HNRNPM, YBX3, FMR1, RTCB, or NUFIP2 protein and can be used as a means of inhibiting the activity of the protein.
As used herein, the term “small molecules” of the present disclosure refers to organic compounds having a small molecular weight, which bind to and regulate the function of biological polymers such as proteins. It may be of natural origin or artificially synthesized, and may inhibit protein functions or interfere with protein-protein interactions, but is not limited thereto.
For the purpose of the present invention, the small molecule compound includes, without limitation, any molecule that inhibits the activity of the activated EIF3A, EIF3D, CSDE1, HNRNPM, YBX3, FMR1, RTCB, or NUFIP2 protein, and specifically includes a molecule that binds to and inhibits the activity of activated EIF3A, EIF3D, CSDE1, HNRNPM, YBX3, FMR1, RTCB, or NUFIP2, but is not limited thereto.
According to another aspect thereof, the present disclosure provides a pharmaceutical composition for the prevention or treatment of coronavirus infection comprising any one of the antiviral compositions described above.
As used herein, the term “coronavirus infection” refers to a symptom or disease state caused by infection with a coronavirus selected from the group consisting of 229E, OC43, NL63, HKU1, MERS-CoV, SARS-CoV and SARS-CoV-2. For example, coronavirus infection is known to cause a significant portion of common colds in humans and can cause direct viral pneumonia, secondary bacterial pneumonia, viral bronchitis, secondary bacterial bronchitis, and the like; preventing coronavirus infection can therefore prevent the development of these diseases. In addition, a cold, viral pneumonia, or viral bronchitis caused directly by a coronavirus can be treated by suppressing viral growth.
As used herein, the term “prevention” in the present specification refers to any activity that suppresses or delays viral infection itself or the onset of diseases associated with viral infection by administration of the pharmaceutical composition of the present invention.
As used herein, the term “treatment” refers to all activities that inhibit viral growth by administering the pharmaceutical composition of the present invention, thereby improving or beneficially altering symptoms and diseases caused by viral infection.
The pharmaceutical compositions of the present disclosure comprise a pharmaceutically acceptable carrier in addition to the active ingredient. The pharmaceutically acceptable carriers included in the pharmaceutical compositions of the disclosure are those commonly utilized in pharmaceutical formulations, such as lactose, dextrose, sucrose, sorbitol, mannitol, starch, acacia gum, calcium phosphate, alginate, gelatin, calcium silicate, microcrystalline cellulose, polyvinylpyrrolidone, cellulose, water, syrup, methyl cellulose, methylhydroxybenzoate, propylhydroxybenzoate, talc, magnesium stearate, and mineral oil. In addition to the above components, the pharmaceutical compositions of the present disclosure may further comprise lubricants, wetting agents, sweeteners, flavoring agents, emulsifiers, suspending agents, preservatives, and the like. Suitable pharmaceutically acceptable carriers and formulations are described in detail in Remington's Pharmaceutical Sciences (19th ed., 1995).
A suitable dosage of the pharmaceutical composition of the present disclosure varies depending on factors such as formulation method, administration method, the patient's age, weight, sex, medical condition and diet, administration time, administration route, excretion rate, and response sensitivity. In one aspect, the dosage of the pharmaceutical composition of the present disclosure is preferably 0.001-1000 mg/kg (body weight) per day.
The pharmaceutical composition of the present disclosure can be administered orally or parenterally, and in the case of parenteral administration, it can be administered by topical application to the skin, intravenous injection, subcutaneous injection, intramuscular injection, intraperitoneal injection, transdermal administration, etc.
Considering that coronaviruses are transmitted via mucosal contact and respiratory droplets and damage the bronchi, lungs, and the like, the pharmaceutical compositions of the present disclosure are preferably administered to a patient by topical application to mucosal sites, oral administration, intravenous injection, subcutaneous injection, and the like.
The pharmaceutical composition of the present disclosure may be prepared in unit dosage form by formulation with a pharmaceutically acceptable carrier and/or excipient according to a method readily performed by those skilled in the art, or it may be prepared by incorporation into a multi-dose container. The formulation may be in the form of a solution, suspension or emulsion in an oil or aqueous medium, or in the form of an extract, powder, granule, tablet or capsule, and may additionally contain a dispersing agent or stabilizer.
According to another aspect of the present disclosure, the present disclosure provides a screening method for a coronavirus inhibitor comprising the following steps:
Since the EIF3A, EIF3D, CSDE1, HNRNPM, YBX3, FMR1, RTCB and NUFIP2 proteins of the present disclosure have proviral activity, they can serve as targets for newly discovered pharmaceutical compositions. Accordingly, a coronavirus inhibitor can be identified by treating coronavirus-infected cells with a drug candidate and measuring the expression level of any one or more of the above-mentioned proteins.
According to another aspect of the present disclosure, the present disclosure provides a method for establishing an RNA interactome comprising the following steps:
In one embodiment of the present disclosure, the probe pool comprises a first probe pool that specifically hybridizes with gRNA molecules and a second probe pool that hybridizes with both gRNA and sgRNA molecules.
The features and advantages of the present disclosure are summarized as follows:
Hereinafter, the present disclosure will be described in more detail through examples. These examples are provided only to explain the present disclosure in more detail, and it will be apparent to those skilled in the art that, in accordance with the gist of the present disclosure, its scope is not limited by these examples.
By scanning the genomic RNAs of SARS-CoV-2 (NCBI RefSeq accession NC_045512.2) and HCoV-OC43 (GenBank accession AY391777.1) from head to tail, partially overlapping 90-nt tiles were enumerated. Tiles were spaced 30 nt apart, so adjacent tiles share a 60-nt subsequence. To avoid ambiguous targeting, tiles were aligned to the human transcriptome (version of Oct. 14, 2019) using bowtie 2 (Langmead and Salzberg, 2012), and multi-mapped sequences were discarded. To prepare biotinylated antisense oligonucleotides (ASOs) in bulk, sequence elements for in vitro transcription (IVT), reverse transcription (RT) and PCR were added to the 90-nt tiles. The T7 promoter (5′-TAA TAC GAC TCA CTA TAG GG-3′) and a pad for RT priming (5′-TGG AAT TCT CGG GTG CCA AGG-3′) were added to the head and tail of each tile, respectively. We grouped ASOs into two sets for each viral genome: “Probe I” targets the region unique to genomic RNA ([265:21553] of NC_045512.2; [:21506] of AY391777.1), and “Probe II” targets both genomic and subgenomic RNAs ([21562:] of NC_045512.2; [21506:] of AY391777.1). The templates of the four ASO groups carry distinct PCR primer-binding sites at both ends, so each ASO set can be selectively amplified from a single mixture. The final ASO templates (167 nt) were prepared via the oligo pool synthesis service of Genscript and stored at −80° C. The ASO templates used in this study are listed in Table 2.
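The tiling rule above (90-nt tiles, 30-nt step, 60-nt overlap) and the addition of the IVT and RT elements can be sketched in Python as follows. This is a minimal illustration under stated assumptions: tiles are cut from the genome strand as written, coordinates follow Python's 0-based, end-exclusive slice convention (matching the `[265:21553]` notation), and the bowtie 2 multi-mapping filter, strand handling and pool-specific PCR primer sites are not modeled. All function names are hypothetical.

```python
# Minimal sketch of the probe-tiling step; function names are hypothetical.
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # 20 nt, 5' T7 promoter from the text
RT_PAD = "TGGAATTCTCGGGTGCCAAGG"      # 21 nt, 3' RT priming pad from the text


def tile_starts(region_length: int, tile_len: int = 90, step: int = 30) -> list[int]:
    """0-based start offsets of every full-length tile inside a region."""
    if region_length < tile_len:
        return []
    return list(range(0, region_length - tile_len + 1, step))


def enumerate_tiles(genome: str, start: int, end: int,
                    tile_len: int = 90, step: int = 30) -> list[str]:
    """Cut genome[start:end] (0-based, end-exclusive) into overlapping tiles."""
    region = genome[start:end]
    return [region[s:s + tile_len] for s in tile_starts(len(region), tile_len, step)]


def add_ivt_rt_elements(tile: str) -> str:
    """Attach the T7 promoter (head) and RT pad (tail); PCR primer sites are omitted."""
    return T7_PROMOTER + tile + RT_PAD


# The Probe I region of NC_045512.2, [265:21553], is 21,288 nt long.
print(len(tile_starts(21553 - 265)))  # 707
```

Before any multi-mapping filter, the Probe I region yields 707 tiles, the same number as the Probe I oligos reported in the Examples. The 20 + 90 + 21 = 131 nt produced by `add_ivt_rt_elements` is 36 nt shorter than the 167-nt final template; that difference presumably corresponds to the PCR primer-binding sites, whose sequences this section does not give.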
ASO templates were amplified using KAPA HiFi HotStart ReadyMix (Roche) and the PCR primers for each ASO pool. PCR products were purified with the QIAquick PCR purification kit (QIAGEN). RNA intermediates were then transcribed using the 5× MEGAscript T7 transcription kit (Invitrogen), and DNA templates were degraded with TURBO DNase (Invitrogen). To remove enzymes and other reagents, 1.8 reaction volumes of AMPure XP beads (Beckman) were applied, and polyethylene glycol was added to a final concentration of 20%. Size selection was carried out according to the manufacturer's protocol. Biotinylated ASOs were synthesized with RevertAid Reverse Transcriptase (Thermo Scientific) and a 5′ biotin-TEG primer. RNA intermediates were then hydrolyzed in 0.1 M NaOH and neutralized with acetic acid. Finally, ASOs were purified in the same manner as in the IVT RNA size selection. The primer sequences used for PCR and reverse transcription are listed in Table 6.
The UniProt reference proteome sets for human (UP000005640, canonical, Swiss-Prot) and African green monkey (Chlorocebus sabaeus; UP000029965, canonical, Swiss-Prot and TrEMBL) were used to identify host proteins in each mass spectrometry experiment (version 03/21/2020) (UniProt Consortium, 2019). The reference proteome set for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was manually curated, largely based on the NCBI Reference Sequence (NC_045512.2) and the literature on other accessory proteins (e.g. ORF3b, ORF9b and ORF9c). The reference proteome set for human coronavirus OC43 (HCoV-OC43) was compiled from the UniProt Swiss-Prot proteins for HCoV-OC43 (taxonomy:31631), except that HCoV-OC43 Protein I was separated into Protein Ia and Protein Ib (or N2) (Vijgen et al., 2005).
Virus experiments were carried out in accordance with the biosafety guidelines of the Korea Centers for Disease Control & Prevention (KCDC). The Institutional Biosafety Committee of Seoul National University (SNUIBC) approved the protocols used in this study, including SNUIBC-R200331-1-1 for BL2 experiments and SNUIBC-200508-1 for BL3 experiments. Vero and HCT-8 cells were maintained in DMEM (Welgene) and RPMI 1640 (Welgene), respectively, each supplemented with 1× Antibiotic-Antimycotic (Gibco) and 10% FBS (Gibco), and cultured at 37° C. in a 5% CO2 incubator. For SARS-CoV-2 infection, 7×10⁶ Vero cells were plated in T-175 flasks 24 hours before infection. Cells were washed with serum-free medium and incubated for 30 minutes with 5 mL of medium containing virus at an MOI of 0.1, as determined by plaque assay. After infection, the virus-containing medium was replaced with reduced-serum medium (2% FBS), and cells were cultured until harvest. For HCoV-OC43 infection, a similar protocol was used except that the incubation temperature was lowered to 33° C. For siRNA transfection, 3.5×10⁵ Calu-3 cells, maintained in DMEM with 1× Antibiotic-Antimycotic and 10% FBS at 37° C. in a 5% CO2 incubator, were plated in 12-well plates and reverse-transfected with ON-TARGETplus SMARTpool siRNAs (Horizon Discovery) at a final concentration of 50 nM using Lipofectamine RNAiMAX (Invitrogen). To measure cell viability after siRNA knockdown, 1/100th of the cells was taken from uninfected wells 48 hours after transfection and split into 96-well plates in triplicate, and cell number was measured by MTT assay (Promega) 4 hours after addition of the tetrazolium dye.
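The MOI arithmetic behind the infection step can be made explicit with a short sketch. It assumes, for illustration only, that the number of cells at infection equals the 7×10⁶ cells plated (in practice cells divide during the 24 hours before infection), and the stock titer used below is a hypothetical value; only the MOI, the plated cell number and the 5 mL inoculum volume come from the text.

```python
def inoculum_pfu(moi: float, cell_count: float) -> float:
    """Plaque-forming units (PFU) required to infect cell_count cells at the given MOI."""
    return moi * cell_count


def stock_volume_ml(moi: float, cell_count: float, stock_titer_pfu_per_ml: float) -> float:
    """Volume of virus stock (mL) that supplies the required PFU."""
    return inoculum_pfu(moi, cell_count) / stock_titer_pfu_per_ml


cells = 7e6                          # Vero cells plated per T-175 flask (assumed unchanged at infection)
pfu = inoculum_pfu(0.1, cells)       # 7e5 PFU in total
print(pfu / 5.0)                     # PFU per mL of the 5 mL inoculum: 1.4e5
print(stock_volume_ml(0.1, cells, 1e7))  # hypothetical 1e7 PFU/mL stock: about 0.07 mL
```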
For total RNA purification from virus-infected cells, 1 mL of TRIzol LS (Invitrogen) was added per well of a 12-well plate after removal of the medium, followed by on-column DNA digestion and purification (Zymo Research). For RNA purification from RNP capture samples, bead-captured RNAs were digested with 100 ng Proteinase K (PCR grade, Roche) at 37° C. for 1 hour, followed by RNA isolation with TRIzol LS and GlycoBlue (Invitrogen). For reverse transcription, 1-5 μg of RNA was used with RevertAid transcriptase (Thermo Scientific) and random hexamers. qPCR was performed with the primer pairs listed in Table 6 and Power SYBR Green (Applied Biosystems) and analyzed on a QuantStudio 5 system (Thermo Scientific).
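This section does not state how the qPCR signals were normalized. One widely used option is the comparative Ct (2^−ΔΔCt) method, sketched below under the assumption of roughly 100% amplification efficiency for both primer pairs; the choice of reference transcript and calibrator sample (for example, a non-targeting siRNA control) is hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_calibrator: float, ct_reference_calibrator: float) -> float:
    """Fold change of a target transcript by the 2^-ddCt method.

    Assumes ~100% PCR efficiency (one doubling per cycle) for target and reference.
    """
    delta_sample = ct_target - ct_reference          # normalize sample to reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta = delta_sample - delta_calibrator    # compare with calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical Cts: viral RNA crosses the threshold 2 cycles earlier in a knockdown
# sample than in the control, with an unchanged reference -> 4-fold more viral RNA.
print(relative_expression(20.0, 18.0, 22.0, 18.0))  # 4.0
```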
Virus-infected cells were detached from culture vessels with trypsin, and cell pellets were resuspended in ice-cold PBS. Cell suspensions (12 mL) were dispersed in 150 mm dishes and irradiated with 254 nm UV at 2.5 J/cm2 using a BIO-LINK BLX-254 for SARS-CoV-2, or at 0.8 J/cm2 using a Spectrolinker XL-1500 for HCoV-OC43. UV-crosslinked cells were pelleted by centrifugation, resuspended in TURBO DNase solution (150 units per flask), and incubated at 37° C. for 30 minutes. DNase-treated cells were supplemented with an equal volume of pre-heated 2× lysis buffer (40 mM Tris-Cl at pH 7.5, 1 M LiCl, 1% LDS, 2 mM EDTA, 10 mM DTT and 8 M urea) and denatured by incubation at 68° C. for minutes. Per replicate, cell lysate from one T-175 flask was mixed with 20 μg of biotinylated probe pool (Probe I or Probe II) and hybridized at 68° C. for 30 minutes in a final volume of 1 mL. The biotin-labeled RNP lysates were supplemented with streptavidin beads (1 mL per replicate, New England Biolabs) and captured by rotation at room temperature overnight. Probe-enriched RNP beads were washed twice with 1× lysis buffer and transferred to fresh tubes, followed by a final wash with detergent-free wash buffer (20 mM Tris-Cl at pH 7.5, 0.5 M LiCl, 1 mM EDTA). One tenth of the beads was set aside for assessment of RNA content by RT-qPCR, and another tenth was used for silver staining (KOMA biotech). The remaining eight tenths of the beads were digested with 100 units of Benzonase nuclease (Millipore) at 37° C. for 1 hour. For on-bead tryptic digestion, nuclease-treated beads were resuspended in 8 M urea (final concentration), reduced with 10 mM dithiothreitol (DTT) and alkylated with 40 mM iodoacetamide (IAA) for 1 hour each at 37° C., and diluted with 50 mM ammonium bicarbonate (ABC) to a final urea concentration of 1 M. These bead suspensions were supplemented with 300 ng of trypsin (Thermo Scientific, MS grade) and 1 mM CaCl2 and digested overnight at 37° C.
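The buffer steps in this protocol follow simple C1V1 = C2V2 dilution arithmetic, sketched below. The sketch ignores volume changes from the DTT, IAA, trypsin and CaCl2 additions, and the 100 μL bead-suspension volume in the example is hypothetical.

```python
def final_concentration(c_initial: float, v_initial: float, v_added: float) -> float:
    """Concentration after adding v_added of a diluent lacking the solute (C1V1 = C2V2)."""
    return c_initial * v_initial / (v_initial + v_added)


def diluent_volume(c_initial: float, v_initial: float, c_final: float) -> float:
    """Diluent volume needed to bring c_initial down to c_final."""
    if not 0 < c_final <= c_initial:
        raise ValueError("c_final must be positive and no greater than c_initial")
    return c_initial * v_initial / c_final - v_initial


# Mixing DNase-treated cells with an equal volume of 2x lysis buffer halves every
# buffer component, e.g. 8 M urea in the 2x buffer becomes 4 M in the lysate.
print(final_concentration(8.0, 1.0, 1.0))  # 4.0
# Diluting an 8 M urea bead suspension to 1 M with 50 mM ABC needs seven volumes of diluent.
print(diluent_volume(8.0, 100.0, 1.0))     # 700.0 (uL, for a hypothetical 100 uL suspension)
```

The same halving gives the composition of the 1× lysis buffer used for the bead washes: 20 mM Tris-Cl, 0.5 M LiCl, 0.5% LDS, 1 mM EDTA, 5 mM DTT and 4 M urea.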
Peptide solutions were separated from the magnetic beads, further processed with HiPPR detergent removal spin columns (Thermo Scientific), and desalted with reverse-phase C18 ZipTips (Millipore). After cleanup and drying, the samples were reconstituted in 20 μL of 25 mM ammonium bicarbonate buffer for LC-MS/MS analysis.
LC-MS/MS analysis was carried out using an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific) coupled with a nanoAcquity UPLC system (Waters). Both the analytical capillary columns (100 cm×75 μm i.d.) and the trap columns (3 cm×150 μm i.d.) were packed in-house with 3 μm Jupiter C18 particles (Phenomenex, Torrance). The long analytical column was placed in a column heater (Analytical Sales and Services) regulated at 45° C. The LC flow rate was 300 nL/min, and the 100-min linear gradient ran from 95% solvent A (H2O with 0.1% formic acid (Merck)) to 40% solvent B (100% acetonitrile with 0.1% formic acid). Precursor ions were acquired at a resolving power of 120 K at m/z 200, and precursors were isolated for MS/MS analysis with a 1.4 Th isolation window. Higher-energy collisional dissociation (HCD) at 30% collision energy was used for sequencing, with an automatic gain control target of 1E5 ions. The resolving power for MS2 spectra was set to 30 K at m/z 200 with a maximum injection time of 120 ms.
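Read as a binary A/B system, the gradient above starts at 5% solvent B (95% A) and rises linearly to 40% B over 100 minutes. The sketch below interpolates the nominal %B at any time point; column equilibration, wash steps and instrument dwell volume are not described in the text and are ignored here.

```python
def percent_b(t_min: float, start_b: float = 5.0, end_b: float = 40.0,
              duration_min: float = 100.0) -> float:
    """Nominal % solvent B at time t_min of a linear gradient (clamped to its end points)."""
    if t_min <= 0:
        return start_b
    if t_min >= duration_min:
        return end_b
    return start_b + (end_b - start_b) * t_min / duration_min


flow_nl_per_min = 300.0
print(percent_b(50.0))                   # 22.5 (% B at mid-gradient)
print(flow_nl_per_min * 100.0 / 1000.0)  # 30.0 uL of mobile phase over the 100-min gradient
```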
Mass spectrometric raw data files were processed for label-free quantification (LFQ) with MaxQuant (version 1.6.15.0) (Cox and Mann, 2008) using the built-in Andromeda search engine (Cox et al., 2011) at default settings, with a few exceptions. Briefly, for the peptide-spectrum match (PSM) search, cysteine carbamidomethylation was set as a fixed modification, and methionine oxidation and N-terminal acetylation were set as variable modifications. Mass tolerances for the first and main PSM searches were 20 and 4.5 ppm, respectively. Peptides from common contaminant proteins were identified using the contaminant database provided with MaxQuant. An FDR threshold of 1% was applied at both the peptide and protein levels. The match-between-runs option was enabled with default parameters. Finally, LFQ was performed with a minimum ratio count of 1.
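The 20 ppm and 4.5 ppm tolerances are relative, so the absolute window widens with mass. The sketch below converts a ppm tolerance into an m/z window and checks a match; it is an illustration only and does not model MaxQuant's internal handling, such as the mass recalibration it performs between the first and main searches.

```python
def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """(low, high) bounds of a symmetric tolerance window of +/- ppm around mz."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def within_ppm(observed: float, theoretical: float, ppm: float) -> bool:
    """True if observed deviates from theoretical by at most ppm parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm


for tol in (20.0, 4.5):  # first and main search tolerances from the text
    low, high = ppm_window(1000.0, tol)
    print(tol, round(high - 1000.0, 6))  # half-width: 0.02 at 20 ppm, 0.0045 at 4.5 ppm
```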
To identify host and viral proteins that interact with a particular RNA species of interest (e.g. sgRNA or gRNA), we used the results of the “bead only” and “probe only” samples as technical backgrounds. Specifically, the “bead only” (or no-probe) experiment in infected cells was used to account for non-specific interactors and biotin-containing carboxylases (e.g. PCCA, ACACA, and ACACB) and to determine the set of host and viral proteins that bind to the RNA in a broad sense, which we call Probe I/II “binding” proteins. The probe experiment in uninfected cells (i.e. “probe only”) was then used as the technical background against target RNA-independent interactors to determine the set of host proteins that are enriched for the target RNA, which we call Probe I/II “enriched” proteins.
To accomplish this, we treated the protein spectral count data as a multinomial distribution and applied a statistical test for spectral count enrichment. Specifically, let Np be the number of spectral counts identified for protein group p in the case experiment (e.g. the Probe I experiment in infected cells), and Mp be the corresponding count in the control experiment (e.g. the no-probe experiment in infected cells). For each protein group p with Np≥1, the statistical significance of enrichment is:
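As an illustrative sketch only, and not the authors' own formula, one way to test spectral count enrichment uses the fact that each protein's count in a multinomial sample is marginally binomial: under the null hypothesis, protein group p takes the same share of case spectra (N = ΣNp) as it does of control spectra, and an upper-tail binomial p-value is computed for Np. The pseudocount for proteins absent from the control, and the use of Benjamini-Hochberg correction for the FDR thresholds applied to these counts, are assumptions.

```python
import math


def binom_sf(k: int, n: int, q: float) -> float:
    """P(X >= k) for X ~ Binomial(n, q), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    log_q, log_1mq = math.log(q), math.log1p(-q)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_q + (n - j) * log_1mq)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def enrichment_pvalues(case: dict[str, int], control: dict[str, int],
                       pseudocount: float = 1.0) -> dict[str, float]:
    """One-sided p-value per protein group p with N_p >= 1 (hypothetical binomial test).

    Null hypothesis: p's share of case spectra equals its pseudocount-smoothed
    share of control spectra.
    """
    proteins = sorted(set(case) | set(control))
    n_total = sum(case.values())
    m_total = sum(control.values()) + pseudocount * len(proteins)
    pvals = {}
    for p in proteins:
        n_p = case.get(p, 0)
        if n_p < 1:
            continue  # only protein groups with N_p >= 1 are tested
        q_p = (control.get(p, 0) + pseudocount) / m_total
        pvals[p] = binom_sf(n_p, n_total, q_p)
    return pvals


def benjamini_hochberg(pvals: dict[str, float]) -> dict[str, float]:
    """Benjamini-Hochberg adjusted p-values."""
    ranked = sorted(pvals.items(), key=lambda item: item[1])
    m = len(ranked)
    adjusted, running_min = {}, 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


case = {"P1": 50, "P2": 5, "P3": 5}     # hypothetical spectral counts (case experiment)
control = {"P1": 0, "P2": 5, "P3": 6}   # hypothetical spectral counts (control experiment)
fdr = benjamini_hochberg(enrichment_pvalues(case, control))
print(sorted(name for name, q in fdr.items() if q < 0.10))  # ['P1']
```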
We conducted enrichment analyses of Gene Ontology (GO) terms (Gene Ontology Consortium, 2001) to summarize the functions of the tens of host proteins identified in the RNP capture experiments. In general, Fisher's exact test is used to estimate the statistical significance of the association (i.e. contingency) between a particular GO term and the gene set of interest.
To improve the explanatory power of this analysis, we used the weight01 algorithm (Alexa et al., 2006) from the topGO R package which accounts for the GO graph structure and reduces local dependencies between GO terms.
Detailed Gene Ontology information was obtained from the GO.db R package (version 3.8.2), and GO gene annotations from the org.Hs.eg.db R package (version 3.8.2).
We integrated protein-protein interaction data from the BioGRID database (Release 3.5.187) (Stark et al., 2006) to retrieve additional proteins that do not necessarily bind to SARS-CoV-2 RNA but form transient or stable physical interactions with the host proteins identified in the RNP capture experiments. Specifically, we considered only human protein-protein interactions that were (1) detected by at least two different types of experiments and (2) reported in at least three publications, which resulted in a total of 65,625 interactions covering 12,143 human proteins. Physical interactions between SARS-CoV-2 proteins and human proteins were taken from affinity capture and mass spectrometry of SARS-CoV-2 protein-expressing cells (Gordon et al., 2020). The network R package and the ggnet2 function of the GGally R package were used for graph visualization.
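The two evidence thresholds for human protein-protein interactions can be expressed as a small filter. The sketch assumes the records have already been restricted to human-human physical interactions and uses a hypothetical record layout (not BioGRID's own column names); it reads "types of experiments" as distinct experimental-system labels and counts publications by identifier, which is one reasonable interpretation of the criteria.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Interaction(NamedTuple):
    """One evidence record; the field layout is hypothetical."""
    protein_a: str
    protein_b: str
    experiment_type: str
    publication: str


def filter_interactions(records: Iterable[Interaction], min_experiment_types: int = 2,
                        min_publications: int = 3) -> set[frozenset[str]]:
    """Keep undirected pairs supported by enough distinct experiment types and publications."""
    experiment_types: dict[frozenset[str], set[str]] = defaultdict(set)
    publications: dict[frozenset[str], set[str]] = defaultdict(set)
    for rec in records:
        pair = frozenset((rec.protein_a, rec.protein_b))  # A-B and B-A are the same interaction
        experiment_types[pair].add(rec.experiment_type)
        publications[pair].add(rec.publication)
    return {pair for pair in experiment_types
            if len(experiment_types[pair]) >= min_experiment_types
            and len(publications[pair]) >= min_publications}


records = [  # hypothetical evidence records
    Interaction("A", "B", "Affinity Capture-MS", "pub1"),
    Interaction("B", "A", "Two-hybrid", "pub2"),
    Interaction("A", "B", "Affinity Capture-MS", "pub3"),
    Interaction("A", "C", "Affinity Capture-MS", "pub1"),
    Interaction("A", "C", "Affinity Capture-MS", "pub2"),
    Interaction("A", "C", "Affinity Capture-MS", "pub4"),
]
kept = filter_interactions(records)
print(kept)  # only the A-B pair passes both thresholds
```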
The Pfam database (version 33.1) (El-Gebali et al., 2019) was used for protein domain enrichment analysis. Taxon 9606 (human) and Taxon 60711 (green monkey) protein domain annotations were used to analyze the RNP capture results for HCoV-OC43 and SARS-CoV-2, respectively. A one-sided Fisher's exact test was applied to estimate the statistical enrichment of a particular protein domain in a specific gene set (e.g. SARS-CoV-2 Probe I binding proteins). We used the set of all proteins identified in the RNP capture experiments, and all protein domains annotated to those proteins, as the statistical background of the enrichment analysis.
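The one-sided Fisher's exact test described above is the upper tail of a hypergeometric distribution, with all proteins identified in the RNP capture experiments as the population. A minimal stdlib-only sketch follows; the data structures and function names are hypothetical, and no multiple-testing correction across domains is shown because the text does not specify one.

```python
from math import comb


def fisher_greater(k: int, n_set: int, k_background: int, n_background: int) -> float:
    """One-sided (over-representation) Fisher's exact test via the hypergeometric tail.

    k            : gene-set proteins carrying the domain
    n_set        : size of the gene set
    k_background : background proteins carrying the domain
    n_background : size of the background
    """
    upper = min(n_set, k_background)
    numerator = sum(comb(k_background, x) * comb(n_background - k_background, n_set - x)
                    for x in range(k, upper + 1))
    return numerator / comb(n_background, n_set)  # exact integer ratio, converted once


def domain_enrichment(gene_set: set[str], background: dict[str, set[str]]) -> dict[str, float]:
    """p-value per domain; background maps each identified protein to its Pfam domains."""
    genes = gene_set & set(background)  # only proteins in the background are counted
    result = {}
    for domain in sorted(set().union(*background.values())):
        annotated = {g for g, domains in background.items() if domain in domains}
        result[domain] = fisher_greater(len(genes & annotated), len(genes),
                                        len(annotated), len(background))
    return result


# Hypothetical background of 10 proteins: 4 with a KH domain, 6 with an RRM domain.
background = {f"P{i}": ({"KH"} if i < 4 else {"RRM"}) for i in range(10)}
print(domain_enrichment({"P0", "P1", "P2"}, background))  # KH: 4/120, RRM: 1.0
```

For cross-checking, `scipy.stats.fisher_exact` with `alternative="greater"` applied to the corresponding 2×2 table should give the same p-value.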
To investigate the subcellular localization of the SARS-CoV-2 interactome, we used the protein subcellular localization information from the Human cell map database v1 (Go et al., 2019). Predictions from the SAFE algorithm were used primarily and were supplemented by predictions from the NMF algorithm for proteins with “no prediction” or “-” localizations. Localization terms of the NMF algorithm were generally matched to terms of the SAFE algorithm, but a few were mapped to the broader SAFE term. For example, the “cell junction” term of the NMF algorithm was merged into the “cell junction, plasma membrane” term of the SAFE algorithm.
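The precedence rule above (SAFE first, NMF only for unresolved proteins, with NMF terms mapped onto SAFE terms) can be sketched as follows. Only the “cell junction” mapping comes from the text; the input dictionaries and the function name are hypothetical, and the sketch assumes that NMF labels without an explicit mapping already match SAFE terms.

```python
from typing import Optional

UNRESOLVED = {"no prediction", "-"}  # placeholder labels named in the text

# NMF terms mapped onto SAFE terms; only the example given in the text is filled in.
NMF_TO_SAFE = {"cell junction": "cell junction, plasma membrane"}


def merged_localization(protein: str, safe: dict[str, str], nmf: dict[str, str],
                        nmf_to_safe: dict[str, str] = NMF_TO_SAFE) -> Optional[str]:
    """Prefer the SAFE call; fall back to a SAFE-compatible NMF term when SAFE is unresolved."""
    primary = safe.get(protein)
    if primary is not None and primary not in UNRESOLVED:
        return primary
    fallback = nmf.get(protein)
    if fallback is None or fallback in UNRESOLVED:
        return None
    return nmf_to_safe.get(fallback, fallback)  # unmapped NMF terms assumed to match SAFE names


safe_calls = {"P1": "nucleolus", "P2": "no prediction", "P3": "-"}  # hypothetical
nmf_calls = {"P2": "cell junction", "P3": "-"}                        # hypothetical
for protein in ("P1", "P2", "P3"):
    print(protein, merged_localization(protein, safe_calls, nmf_calls))
```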
To identify the viral and host proteins that directly interact with the genomic and subgenomic RNAs of SARS-CoV-2, we modified the RNA antisense purification coupled with mass spectrometry (RAP-MS) protocol (McHugh and Guttman, 2018), which was developed to profile the proteins that interact with a particular RNA species.
Briefly, cells were first detached from culture vessels and then irradiated with 254 nm UV to induce RNA-protein crosslinks while preserving RNA integrity. Crosslinked cells were treated with DNase and lysed under optimized, strongly denaturing buffer conditions (high concentrations of urea and LDS) to homogenize the lysate and denature proteins. Massive pools of biotinylated antisense probes were used to capture the denatured RNP complexes in a sequence-specific manner. After stringent washing and detergent removal, the RNP complexes were released and digested by sequential benzonase and on-bead trypsin treatment. These modifications to the RAP-MS protocol enabled robust and sensitive identification of proteins directly bound to the RNA target of interest.
We designed two separate pools of densely overlapping 90-nt antisense probes to obtain an unbiased view of the SARS-CoV-2 RNA interactome.
The SARS-CoV-2 transcriptome consists of (1) a genomic RNA (gRNA) encoding 16 nonstructural proteins (nsps) and (2) multiple subgenomic RNAs (sgRNAs) that encode structural and accessory proteins (Sola et al., 2015). The sgRNAs are more abundant than the gRNA (Kim et al., 2020a). The first pool (“Probe I”) consists of 707 oligonucleotides tiled every 30 nucleotides across the ORF1ab region (266:21553, NC_045512.2) and thus hybridizes specifically with gRNA molecules.
To first check whether our method specifically captures viral RNP complexes, we compared the material purified with either Probe I or Probe II from Vero cells infected with SARS-CoV-2 (BetaCoV/Korea/KCDC03/2020) at an MOI of 0.1 for 24 hours (Kim et al., 2020b). As negative controls, we performed pull-downs without probes (“no probe” control) or with control probes (for either 18S or 28S rRNA). The protein composition of each RNP sample was distinct, as shown by silver staining and western blotting.
We conducted label-free quantification (LFQ) by liquid chromatography with tandem mass spectrometry (LC-MS/MS) and identified 429 host proteins and 9 viral proteins in total.
[Table: protein accessions and enrichment p-values from the SARS-CoV-2 RNP capture experiments; data missing or illegible when filed]
As for viral proteins, the N protein was the most strongly enriched, as expected.
Nsp12, S, M, and nsp9 were detected at higher levels with Probe I than with Probe II. Coronavirus nsp9 is a single-stranded RNA-binding protein (Egloff et al., 2004; Sutton et al., 2004) essential for viral replication (Miknis et al., 2009). Nsp1 is one of the major virulence factors and suppresses host translation by binding to the 40S ribosomal subunit (Thoms et al., 2020). While nsp1 has mostly been studied in the context of host gene expression (Narayanan et al., 2008), our result hints at a direct role for nsp1 on SARS-CoV-2 transcripts.
To delineate the host proteins that are enriched in the SARS-CoV-2 RNP complex, we performed an additional negative control experiment with uninfected cells.
To investigate the evolutionary conservation of RNA-protein interactions in coronaviruses, we conducted RNP capture on HCoV-OC43, which belongs to lineage A of the genus Betacoronavirus. HCoV-OC43 shows 54.2% nucleotide homology to SARS-CoV-2, which belongs to lineage B. We profiled the HCoV-OC43 RNP complexes at multiple time points: 12, 24, 36, and 48 hours post-infection.
Unweighted spectral count analysis against the no-probe control revealed 133, 167, 192, and 160 proteins that are overrepresented in the OC43 Probe I experiment at 12, 24, 36, and 48 hours post-infection (hpi), respectively (FDR<10%, Table 3).
[Table 3: proteins overrepresented in the HCoV-OC43 Probe I experiments at 12, 24, 36, and 48 hpi; data missing or illegible when filed]
For the OC43 Probe II experiment, 119, 189, 194, and 185 proteins were overrepresented at each respective infection time point of 12, 24, 36, and 48 hpi (FDR<10%, Table 4).
[Table 4: proteins overrepresented in the HCoV-OC43 Probe II experiments at 12, 24, 36, and 48 hpi; data missing or illegible when filed]
The analysis of all eight RNP capture experiments revealed enrichment of proteins containing canonical RNA-binding domains such as the RRM domain and the KH domain.
Fourteen viral proteins were detected within the HCoV-OC43 RNP complexes.
Next, the inventors compared the host factors that form the viral RNA interactomes of HCoV-OC43 and SARS-CoV-2. All 107 proteins of the SARS-CoV-2 interactome except RBMS1 and DDX3Y were also detected in the HCoV-OC43 interactome across multiple infection time points.
To determine the core host factors that are conserved in coronavirus RNA interactomes, we applied our spectral count analysis to the HCoV-OC43 experiment at 36 hpi.
The inventors identified 67 and 70 host proteins for the HCoV-OC43 Probe I and Probe II experiments, respectively (Table 5).
[Table 5: host proteins identified in the HCoV-OC43 Probe I and Probe II experiments at 36 hpi; data missing or illegible when filed]
Thirty-eight proteins were statistically enriched in both probe sets. GO term enrichment analysis revealed that these 38 host proteins are involved in transcriptional regulation, RNA processing, and RNA stability control.
To understand the regulatory significance of the SARS-CoV-2 RNA interactome, we compiled a list of “neighboring” proteins that are known to physically interact with the factors identified in our study (see Methods for details). In particular, we generated a physical interaction network centered on (or seeded by) the core SARS-CoV-2 interactome.
To achieve a more in-depth functional perspective of the RNA interactome, we reconstructed the physical interaction network from the SARS-CoV-2 RNA interactome, excluding ribosomal proteins and EIF3 proteins.
Another way to gauge the regulatory potential of the SARS-CoV-2 RNA interactome is to examine it in the context of the transcriptional response to viral infection. For example, infected cells recognize non-cellular RNAs through a number of cytosolic sensors such as RIG-I (DDX58) and MDA5 (IFIH1) and ultimately induce interferons, which in turn upregulate interferon-stimulated genes (ISGs) through the JAK-STAT pathway (Sa Ribero et al., 2020). Multiple studies have reported unusually low type I/III interferon responses in cell and animal model systems of SARS-CoV-2 infection (Blanco-Melo et al., 2020) and in blood samples from COVID-19 patients (Hadjadj et al., 2020), indicating active immune evasion by SARS-CoV-2 and supporting the therapeutic potential of timely interferon treatment (Sa Ribero et al., 2020).
To investigate the regulation of the SARS-CoV-2 RNA interactome, we used published transcriptome data from SARS-CoV-2-infected cells (Blanco-Melo et al., 2020). Transcriptome analysis of ACE2-expressing A549 cells revealed host factors of the SARS-CoV-2 RNA interactome that are differentially expressed after infection.
Interferon-beta (IFNβ) treatment of normal human bronchial epithelial (NHBE) cells induces PARP12, SHFL, and TRIM25.
5. Host Factors and Functional Modules that Regulate the SARS-CoV-2 RNAs
To measure the impact of these host proteins on coronavirus RNAs, we conducted knockdown experiments and infected Calu-3 cells with SARS-CoV-2 (FIGS. 5a and 5b). Calu-3 cells are human lung epithelial cells and are often used as a model system for coronavirus infection (Sims et al., 2008). Strategically, we selected a subset of the SARS-CoV-2 RNA interactome that covers a broad range of the functional modules identified above: JAK-STAT signaling, mRNA transport, mRNA stability, and translation. Knockdown of host factors that are induced by SARS-CoV-2 infection or IFNβ treatment, namely PARP12, TRIM25, ZC3HAV1, CELF1, and SHFL, led to increased viral RNA levels.
ZC3HAV1 (ZAP/PARP13) is an ISG known to restrict the replication of many RNA viruses such as HIV-1 (Retroviridae), Sindbis virus (Togaviridae), and Ebola virus (Filoviridae) (Goodier et al., 2015). ZC3HAV1 was previously reported to recognize CpG dinucleotides and recruit decay factors to degrade HIV RNAs (Takata et al., 2017). ZC3HAV1 acts as a cofactor for TRIM25, an E3 ubiquitin ligase that promotes antiviral signaling mainly through RIG-I (Choudhury et al., 2020; Gack et al., 2007). Our knockdown results indicate that both ZC3HAV1 and TRIM25 may act as antiviral factors against SARS-CoV-2.
PARP12, a cytoplasmic mono-ADP-ribosylation (MARylation) enzyme, is also known to have broad antiviral activity against RNA viruses such as Venezuelan equine encephalitis virus (Togaviridae), vesicular stomatitis virus (Rhabdoviridae), Rift Valley fever virus (Phenuiviridae), encephalomyocarditis virus (Picornaviridae), and Zika virus (Flaviviridae) through multiple mechanisms, including blocking cellular RNA translation (Atasheva et al., 2014; Welsby et al., 2014) or triggering proteasome-mediated destabilization of viral proteins (Li et al., 2018). ADP-ribosyltransferases are evolutionarily ancient tools used in host-pathogen interactions (Fehr et al., 2020). Of note, coronavirus nsp3 carries a conserved macrodomain that can remove ADP-ribose to reverse the activity of PARP enzymes (Fehr et al., 2015). Knockdown of PARP12 and PARP14 was shown to increase the replication of a macrodomain-deficient mouse hepatitis virus (MHV), which belongs to lineage A of the genus Betacoronavirus (Grunewald et al., 2019); this is consistent with our knockdown results.
Other interferon-stimulated RBPs may also be involved in host defense against SARS-CoV-2. CELF1 (CUGBP Elav-like family member 1) mediates alternative splicing and controls mRNA stability and translation (Konieczny et al., 2014). CELF1 is required for IFNβ-mediated suppression of simian immunodeficiency virus (Retroviridae) (Dudaronek et al., 2007), but its involvement in other viral infections is unknown. SHFL (Shiftless/RyDEN) was prominently induced upon viral infection and interferon treatment, and suppressed by a JAK inhibitor.
Apart from the above RBPs, we identified multiple host factors that have not previously been described in the context of viral infection. In particular, LARP1 depletion resulted in a substantial upregulation of viral RNAs.
In fact, the SARS-CoV-2 RNA interactome includes specific components of the 40S and 60S ribosomal subunits and translation initiation factors.
Other translation factors, EIF3A, EIF3D, and CSDE1, exhibited proviral effects.
EIF3A is the RNA-binding component of the mammalian EIF3 complex and is evolutionarily conserved along with EIF3B and EIF3C (Masutani et al., 2007). EIF3D is known to interact with the mRNA cap and is required for specialized translation initiation (Lee et al., 2016). CSDE1 (Unr) is required for IRES-dependent translation of human rhinovirus (Picornaviridae) and poliovirus (Picornaviridae) (Anderson et al., 2007; Boussadia et al., 2003). Taken together, our findings suggest that SARS-CoV-2 may recruit EIF3D and CSDE1 to regulate cap-dependent and IRES-dependent translation initiation (Lee et al., 2017), respectively, of SARS-CoV-2 gRNA and sgRNAs.
Lastly, the coronaviral RNA interactomes are enriched in RBPs with KH domains.
Our current study reveals a broad spectrum of antiviral factors such as TRIM25, ZC3HAV1, PARP12, and SHFL, as well as many RBPs whose functions in viral infection are unknown, such as LARP1, FUBP3, FAM120A/C, EIF4H, RPS3, RPS9, SND1, CELF1, RALY, CNBP, EIF3A, EIF3D, and CSDE1. This list of proteins reflects constant host-pathogen interactions and opens new avenues to explore unknown mechanisms of the viral life cycle and immune evasion.
Along with proteins regulating RNAs, it would also be interesting to consider the possibility of ‘riboregulation’ (Hentze et al., 2018), in which RNA controls its interacting proteins. Dengue virus, for example, uses its subgenomic RNA, called sfRNA, to sequester TRIM25 (Chapman et al., 2014). The sgRNA/gRNA ratio is a critical determinant of the epidemic potential of dengue virus (Manokaran et al., 2015). Notably, coronaviruses including SARS-CoV-2 produce substantial amounts of noncanonical sgRNAs that may serve as noncoding decoys that interact with host RBPs to modulate host immune responses (Kim et al., 2020a).
There were over 3,200 human clinical studies listed for the treatment of COVID-19 as of September 2020 (ClinicalTrials.gov), yet there is no effective antiviral drug or vaccine available to stop the ongoing pandemic. This unmet medical need highlights our substantial lack of knowledge of SARS-CoV-2. Thus, antiviral strategies should be reconsidered beyond expeditious drug repurposing efforts. So far, large and multidisciplinary datasets from viral transcriptome analysis (Kim et al., 2020a), host transcriptional response (Blanco-Melo et al., 2020), ribosome profiling (Finkel et al., 2020), whole proteomics (Bojkova et al., 2020), protein-protein interactions by co-IP (Gordon et al., 2020) and proximity labeling (St-Germain et al., 2020), phosphoproteomics (Bouhaddou et al., 2020), RNA structure (Lan et al., 2020), genome-wide CRISPR screens (Wei et al., 2020), and off-label drug screening (Chen et al., 2020) have all provided invaluable insights into the underlying biology of this novel human coronavirus. In line with these efforts, we here present the SARS-CoV-2 RNA interactome, which offers insights into the host-virus interactions that regulate the coronavirus life cycle. Interpreting the data in the context of publicly available orthogonal information has enabled the identification of proviral and antiviral protein candidates. We expect that further efforts to generate and integrate system-level data will elucidate the pathogenicity of SARS-CoV-2 and introduce new strategies to combat COVID-19.
Atasheva, S., Frolova, E. I., and Frolov, I. (2014). Interferon-stimulated poly(ADP-Ribose) polymerases are potent inhibitors of cellular translation and virus replication. J. Virol. 88, 2116-2130.
Balinsky, C. A., Schmeisser, H., Wells, A. I., Ganesan, S., Jin, T., Singh, K., and Zoon, K. C. (2017). IRAV (FLJ11286), an Interferon-Stimulated Gene with Antiviral Activity against Dengue Virus, Interacts with MOV10. J. Virol. 91.
Number | Date | Country | Kind |
---|---|---|---|
10-2020-0144787 | Nov 2020 | KR | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/KR2021/015742 | 11/2/2021 | WO |