The invention generally relates to compositions (including constructs, fusion proteins, vectors, and cells) and methods of using such compositions for enhancing gene expression and viral replication. More specifically, the invention relates to use of m6A sequences and/or YTHDF polypeptides to enhance gene expression or viral replication.
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product such as a protein. The central dogma of molecular biology dictates that information is generally transferred from DNA to RNA to protein, although exceptions have been well documented.
In the biotechnology industry, it is often desirable to enhance RNA expression. For example, recombinant DNA technology is widely used in the biotechnology industry to produce proteins that may be used as research tools, industrial enzymes, or active ingredients in therapeutics. Generally, in such applications, recombinant DNA technology involves the cloning of a gene encoding a desired polypeptide into a suitable expression vector. The expression vector encoding the desired polypeptide is then transfected into a host cell, which is cultured to produce RNA encoding the polypeptide. The RNA is translated to produce the polypeptide. The polypeptide may then be purified either by lysing the cells or, in the case of a secreted polypeptide, purified from the supernatant of the cell culture. Given this methodology, one potential way of increasing the productivity of a cell line producing a desired polypeptide is by increasing the steady-state levels of mRNA encoding the polypeptide in a cell.
In addition to the recombinant production of commercially valuable proteins, enhanced RNA expression may also be beneficial in other applications such as gene therapy, RNA-based therapeutics (e.g., mRNA-based therapeutics), and virus production. There remains, however, a need in the art for new strategies and mechanisms for enhancing RNA expression in a particular cell or cell line.
The invention generally relates to compositions (including polynucleotides, constructs, fusion proteins, vectors, and cells) and methods of using such compositions for enhancing gene expression, protein production and viral replication. More specifically, the invention relates to use of m6A sequences and/or YTHDF polypeptides to enhance gene expression or viral replication.
In one aspect, polynucleotides are provided. The polynucleotides may include at least one m6A sequence such as, for example, one engineered m6A sequence. In another aspect, constructs are provided. The constructs may include a promoter operably connected to any one of the polynucleotides described herein. Alternatively, the constructs may include an insert site, and a UTR sequence including at least one m6A sequence. The insert site may or may not include a heterologous coding sequence encoding a heterologous polypeptide.
In another aspect, vectors including any of the polynucleotides or constructs described herein are provided. In another aspect, cells including any of the polynucleotides, constructs, or vectors described herein are provided.
In another aspect, methods for producing a heterologous polypeptide in a cell are provided. The methods may include introducing any of the polynucleotides, constructs, or vectors described herein into the cell.
In a further aspect, provided herein are fusion proteins (and constructs encoding such fusion proteins) including a YTHDF polypeptide and a RNA-binding polypeptide. Constructs including (i) a heterologous coding sequence encoding a heterologous polypeptide, and (ii) a UTR sequence including at least one RNA-binding polypeptide recognition sequence are also provided as are cells including such fusion proteins and constructs.
In another aspect, methods for producing a heterologous polypeptide in a cell are provided. The methods may include introducing or expressing the fusion proteins (or constructs encoding such fusion proteins) described herein in the cell and introducing or expressing constructs including (i) a heterologous coding sequence encoding a heterologous polypeptide, and (ii) a UTR sequence including at least one RNA-binding polypeptide recognition sequence.
In a still further aspect, cells engineered to overexpress a YTHDF polypeptide as well as methods of using such cells to produce a virus containing at least one m6A sequence in a cell are also provided.
The present invention generally relates to the inventors' discovery that m6A sequences strongly enhance RNA expression in cis. Without being limited by theory, the inventors have found that m6A sequences strongly enhance RNA expression by recruiting cellular YTHDF m6A “reader” proteins. As a result, inhibition of YTHDF expression was found to inhibit RNA expression and viral replication, while YTHDF overexpression enhanced RNA expression and viral replication.
Like proteins and DNA, RNA is subject to a number of covalent modifications that can impact its function, and post-transcriptionally modified nucleotides have indeed been detected on eukaryotic RNAs. Of these, the N6-methyladenosine (m6A) modification is the most common, with an average of ~3 m6A addition sites per mRNA and with ~25% of all cellular mRNAs generally containing multiple m6A residues. The importance of m6A is underlined by the fact that this modification is evolutionarily conserved from fungi to plants and animals, and that global inhibition of m6A addition is embryonic lethal in plants, insects and mammals.
The post-transcriptional addition of m6A to mRNAs occurs predominantly in the nucleus and is mediated by a heterotrimeric protein complex consisting of the two methyltransferase-like (METTL) enzymes METTL3 and METTL14 and their co-factor Wilms tumor 1-associated protein (WTAP). This complex specifically methylates A residues in the consensus sequence (G/A/U)(G>A) m6AC (U/C/A). In addition to these m6A “writers”, mammals also encode two RNA demethylases or “erasers” called ALKBH5 (α-ketoglutarate-dependent dioxygenase homologue 5) and FTO (fat mass and obesity associated), which are found predominantly in the nucleus or cytoplasm, respectively. Finally, the function of m6A residues on RNAs is thought to be primarily mediated by three related cytoplasmic “reader” proteins called YTH-domain containing family 1 (YTHDF1), YTHDF2 and YTHDF3. The three YTHDF proteins all contain a conserved carboxy-terminal YTH domain that binds m6A and a more variable amino-terminal effector domain of unclear function.
Although m6A editing of viral mRNAs was first reported 40 years ago, the role of this post-transcriptional modification is only beginning to be elucidated. Here, in part, the inventors have discovered that substitution of m6A sequences or m6A-deficient forms within a UTR region of a RNA transcript including an indicator gene will significantly affect its expression. Furthermore, the inventors observe that viruses such as influenza A encode numerous m6A sequences within viral open reading frames that may affect viral expression levels. This effect, which was observed in several cell types, was equivalent at both the protein and RNA level, suggesting that m6A sequences stabilize edited RNAs. Moreover, this effect was not specific for m6A sequences of viral origin as m6A sequences derived from human RNAs also exerted a similar positive effect on RNA and protein expression. Furthermore, the inventors were able to phenocopy the observed enhancement in RNA and protein expression induced by UTR m6A sequences by recruiting YTHDF proteins to the UTR of an indicator gene by fusion to an RNA-binding polypeptide recognition sequence derived from a protein-RNA tethering system, suggesting, without being limited by theory, that m6A sequences exert their effect by recruiting YTHDF proteins.
The inventors also observed that several of the m6A sequences mapped to the areas of a viral genome required for viral replication. Given their evidence that m6A sequences primarily act to recruit YTHDF proteins, they asked if overexpression or knockdown of YTHDF proteins would induce the predicted up or down regulation of viral replication and gene expression. The inventors did indeed observe a striking increase in viral replication when a YTHDF protein was overexpressed and a marked decline in viral replication in cells in which a YTHDF gene had been inactivated by DNA editing. Together, these data reveal that m6A sequences of either viral or cellular origin enhance gene expression in cis and demonstrate that the level of expression of cellular YTHDF proteins impacts the level of viral gene expression and replication in several types of cells in culture, as predicted if the role of m6A sequences is to recruit YTHDF proteins to the RNA.
In one aspect of the present invention, polynucleotides are provided. As used herein, the terms “polynucleotide,” “polynucleotide sequence,” “nucleic acid” and “nucleic acid sequence” refer to a nucleotide, oligonucleotide, polynucleotide (which terms may be used interchangeably), or any fragment thereof. These phrases also refer to DNA or RNA of natural or synthetic origin (which may be single-stranded or double-stranded and may represent the sense or the antisense strand).
Regarding polynucleotide sequences, the terms “percent identity” and “% identity” and “% sequence identity” refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences. Percent sequence identity for a polynucleotide may be determined as understood in the art. (See, e.g., U.S. Pat. No. 7,396,664, which is incorporated herein by reference in its entirety). A suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST), which is available from several sources, including the NCBI, Bethesda, Md., at its website. The BLAST software suite includes various sequence analysis programs including “blastn,” that is used to align a known polynucleotide sequence with other polynucleotide sequences from a variety of databases. Also available is a tool called “BLAST 2 Sequences” that is used for direct pairwise comparison of two nucleotide sequences. “BLAST 2 Sequences” can be accessed and used interactively at the NCBI website. The “BLAST 2 Sequences” tool can be used for both blastn and blastp (discussed above).
Regarding polynucleotide sequences, percent identity may be measured over the length of an entire defined polynucleotide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 2, at least 3, at least 10, at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in the tables, figures, or Sequence Listing, may be used to describe a length over which percentage identity may be measured.
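By way of illustration only, the calculation of percent identity over a defined length can be sketched in Python as follows. The sketch assumes that the two sequences have already been aligned (for example, by an external tool) and that gaps are written as "-". The function name percent_identity and the convention of dividing matches by the number of aligned columns are assumptions made for this example; they are not the blastn implementation and may differ from it in detail.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes the inputs are already aligned and therefore of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns in which both sequences carry a gap; they hold no residue pair.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only when both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Identity over the full aligned length (hypothetical toy sequences).
full = percent_identity("GGACUAAGCU", "GGACUAAGGU")
# Identity over a shorter fragment: here, the first five aligned columns.
fragment = percent_identity("GGACUAAGCU"[:5], "GGACUAAGGU"[:5])
```

Measuring identity over a fragment, as in the last line, simply restricts the comparison to the chosen window of the alignment.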
Regarding polynucleotide sequences, “variant,” “mutant,” or “derivative” may be defined as a polynucleotide sequence having at least 50% sequence identity to the particular polynucleotide over a certain length of one of the polynucleotide sequences using blastn with the “BLAST 2 Sequences” tool available at the National Center for Biotechnology Information's website. (See Tatiana A. Tatusova, Thomas L. Madden (1999), “Blast 2 sequences—a new tool for comparing protein and nucleotide sequences”, FEMS Microbiol Lett. 174:247-250). Such a pair of polynucleotides may show, for example, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length.
Isolated polynucleotides homologous to the polynucleotides described herein are also provided. Those of skill in the art also understand the degeneracy of the genetic code and that a variety of polynucleotides can encode the same polypeptide. In some embodiments, the polynucleotides may be codon-optimized for expression in a particular cell. While particular polynucleotide sequences which are found in viruses and humans are disclosed herein, any polynucleotide sequences may be used which encode a desired form of the substituted polypeptides described herein. Thus, non-naturally occurring sequences may be used. These may be desirable, for example, to enhance expression in heterologous expression systems of polypeptides or proteins. Computer programs for generating degenerate coding sequences are available and can be used for this purpose. Pencil, paper, the genetic code, and a human hand can also be used to generate degenerate coding sequences.
The polynucleotides may include at least one m6A sequence. The N6-methyladenosine (m6A) modification of RNA is one of the most common post-transcriptional modifications detected in RNAs. As used herein, an “m6A sequence” is an RNA sequence that (1) includes the consensus sequence (G/A/U)(G>A) m6AC (U/C/A) and (2) is methylated within the central adenosine nucleotide of the consensus sequence in a cell. Both requirements are needed because, as known in the art, not every RNA sequence including the consensus sequence will necessarily be methylated in a cell. In other words, the consensus sequence is necessary but not sufficient for being an “m6A sequence.” The methylation of the consensus sequence may be detected by determining, for example and without limitation, whether the m6A sequence is bound sufficiently by an m6A specific antibody and/or a YTHDF polypeptide to indicate that the central adenosine of the consensus sequence is methylated. As used herein, an m6A sequence may also be a DNA sequence encoding such an RNA sequence. In some embodiments, the m6A sequence may include the central adenosine that is methylated in a cell and have 40%, 60%, or 80% sequence identity with the remaining nucleotides in the (G/A/U)(G>A) m6AC (U/C/A) consensus sequence. In some embodiments, the m6A sequence may require additional surrounding sequences to allow for methylation and concomitant increased gene expression, protein production and viral replication when used recombinantly. These surrounding sequences may be 5′ or 3′ to the m6A consensus sequence and may be at least 5, 6, 8, 10, 15, 20, 25, 30, 35, 40 or more nucleotides in length. In some embodiments, the m6A sequence comprises any one of SEQ ID NOS: 16-33. In some embodiments, the polynucleotide may include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or more m6A sequences.
Within the polynucleotide sequence, the m6A sequences may be separated by at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, or more bases.
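By way of illustration only, candidate sites matching the consensus sequence (G/A/U)(G>A) m6AC (U/C/A) can be located in a sequence with a short Python sketch such as the following. The names find_consensus_sites and spacings are assumptions made for this example, as is the choice to measure separation between the central adenosines of consecutive candidates. Because the consensus sequence is necessary but not sufficient, the sketch identifies candidates only and does not indicate whether a site is methylated in a cell.

```python
import re
from typing import List

# Five-nucleotide consensus from the text, written for RNA:
# (G/A/U), then (G or A), then the central A, then C, then (U/C/A).
# The lookahead allows overlapping candidates to be reported.
CONSENSUS = re.compile(r"(?=([GAU][GA]AC[UCA]))")


def find_consensus_sites(sequence: str) -> List[int]:
    """Return 0-based positions of the central adenosine of each candidate site.

    DNA input is accepted and read as the encoded RNA (T is treated as U).
    """
    rna = sequence.upper().replace("T", "U")
    return [match.start() + 2 for match in CONSENSUS.finditer(rna)]


def spacings(sites: List[int]) -> List[int]:
    """Distances, in bases, between central adenosines of consecutive candidates."""
    return [later - earlier for earlier, later in zip(sites, sites[1:])]


# Hypothetical toy sequence containing two overlapping candidate sites.
sites = find_consensus_sites("GGACAGACU")
gaps = spacings(sites)
```

A minimum-separation requirement can then be checked by comparing each value returned by spacings against the chosen threshold.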
The m6A sequences described herein may be either “engineered m6A sequence(s)” or “native m6A sequence(s).” As used herein, “engineered m6A sequences” are m6A sequences that are not found naturally in a given polynucleotide but rather are introduced into the polynucleotide using laboratory methods. “Native m6A sequences” are m6A sequences that are found naturally in a given polynucleotide.
In some embodiments, the polynucleotides may encode a heterologous polypeptide. In some embodiments it is envisioned that at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or more engineered m6A sequences are incorporated into the polynucleotide encoding the heterologous polypeptide that may change or not change the amino acid sequence of the heterologous polypeptide. For example, similar to how polynucleotides are often “codon-optimized” for expression in a particular cell, it is contemplated that one or more engineered m6A sequences may be incorporated into a polynucleotide encoding a heterologous polypeptide which do not alter the amino acid sequence of the polypeptide but increase the expression of the heterologous polypeptide in a cell.
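By way of illustration only, the following Python sketch checks whether a modified coding sequence encodes the same amino acid sequence as the original while containing more matches to the consensus sequence. The standard genetic code is assumed; the names translate, count_consensus and is_synonymous_gain and the toy sequences are hypothetical. A new consensus match is only a candidate, since not every sequence including the consensus will necessarily be methylated in a cell.

```python
import re
from itertools import product

# Standard genetic code, built in TCAG order (DNA alphabet); "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Consensus (G/A/U)(G>A) A C (U/C/A), written for DNA; lookahead allows overlaps.
CONSENSUS = re.compile(r"(?=[GAT][GA]AC[TCA])")


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence (DNA or RNA letters) to amino acids."""
    dna = cds.upper().replace("U", "T")
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_consensus(cds: str) -> int:
    """Count candidate consensus matches in the sequence, in any reading frame."""
    return len(CONSENSUS.findall(cds.upper().replace("U", "T")))


def is_synonymous_gain(original: str, engineered: str) -> bool:
    """True if the protein is unchanged and the consensus match count increased."""
    return (
        translate(original) == translate(engineered)
        and count_consensus(engineered) > count_consensus(original)
    )


# Changing glycine codon GGC to GGG creates the candidate GGACT across two codons.
ok = is_synonymous_gain("ATGGGCACTTAA", "ATGGGGACTTAA")
```

In this toy case both sequences encode Met-Gly-Thr followed by a stop codon, so the substitution is silent at the protein level.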
In some embodiments, the polynucleotides may encode a regulatory RNA. Regulatory RNAs may include, without limitation, antisense RNAs, CRISPR RNAs, guide RNAs, long noncoding RNAs, microRNAs, and siRNAs. In some embodiments it is envisioned that at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or more engineered m6A sequences are incorporated into the polynucleotide encoding the regulatory RNA. As with polynucleotides encoding polypeptides, it is contemplated that engineered m6A sequences may be incorporated into a polynucleotide encoding a regulatory RNA to increase the expression of the regulatory RNA in a cell.
In some embodiments, the polynucleotides may encode a UTR sequence. A “UTR sequence” is a polynucleotide sequence that when expressed in a cell may, when DNA, be transcribed but, when RNA, is not typically translated. The UTR sequence may be a 3′ UTR sequence or a 5′ UTR sequence. The UTR sequence forms part of a RNA transcript that is not translated (i.e., outside the coding region for the polypeptide). In some embodiments, the UTR sequence may comprise any one of SEQ ID NOS: 1-3, 7-15, variants of SEQ ID NOS: 1-3, 7-15 having at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity to SEQ ID NOS: 1-3, 7-15, or fragments of SEQ ID NOS: 1-3, 7-15.
In some embodiments, the polynucleotides may be included within a virus. Suitable viruses are described further below.
The polynucleotides or polypeptides provided herein may be prepared by methods available to those of skill in the art. Generally, the nomenclature used herein and the laboratory procedures utilized in the present invention include molecular, biochemical, microbiological and recombinant DNA techniques that are well known and commonly employed in the art. Standard techniques available to those skilled in the art may be used for cloning, DNA and RNA isolation, amplification and purification. Such techniques are thoroughly explained in the literature.
In a further aspect of the present invention, constructs are provided. Notably each of the constructs claimed are recombinant molecules and as such do not occur in nature. As used herein, the term “construct” refers to recombinant polynucleotides including, without limitation, DNA and RNA, which may be single-stranded or double-stranded and may represent the sense or the antisense strand. Recombinant polynucleotides are polynucleotides formed by laboratory methods that include polynucleotide sequences derived from at least two different natural sources or they may be synthetic.
The constructs of the present invention may include a promoter operably connected to any one of the polynucleotides described herein. Such embodiments may further include a polyA site. Optionally, the construct may include in the 5′ to 3′ direction of at least one strand of the construct the promoter, the polynucleotide including at least one m6A site, and the polyA site.
Alternatively, the constructs of the present invention may also include an insert site, and any one of the polynucleotides encoding UTR sequence described herein. The UTR sequence may be either 5′ or 3′ to the insert site. Such embodiments may optionally further include a polyA site. The construct may include in the 5′ to 3′ direction of at least one strand of the construct the insert site, the UTR sequence, and the polyA site. In some embodiments, the construct further includes a promoter operably connected to the insert site.
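By way of illustration only, one of the arrangements described above (the insert site, the UTR sequence, and the polyA site in the 5′ to 3′ direction, optionally preceded by a promoter) can be modeled with a short Python sketch. The Element class, the in_order function, the element labels and the placeholder sequences are all hypothetical and are used only to show how the stated order could be checked. Other arrangements, such as a UTR sequence that is 5′ to the insert site, would use a different expected order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    kind: str      # hypothetical label: "promoter", "insert_site", "utr" or "polyA"
    sequence: str  # placeholder sequence, not a real construct element


def in_order(elements: List[Element], expected: List[str]) -> bool:
    """True if the expected kinds occur in the given 5'-to-3' order.

    Other elements may lie between them; only relative order is checked.
    """
    kinds = iter(element.kind for element in elements)
    # Each search resumes where the previous one stopped, which enforces the order.
    return all(any(kind == wanted for kind in kinds) for wanted in expected)


# A construct listed 5' to 3' with placeholder sequences.
construct = [
    Element("promoter", "N" * 10),
    Element("insert_site", "N" * 10),
    Element("utr", "N" * 10),
    Element("polyA", "A" * 10),
]
valid = in_order(construct, ["insert_site", "utr", "polyA"])
```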
As used herein, an “insert site” is a polynucleotide sequence that allows the incorporation of another polynucleotide of interest. Exemplary insert sites may include, without limitation, polynucleotides including sequences recognized by one or more restriction enzymes (e.g., a multiple cloning site (MCS)), polynucleotides including sequences recognized by site-specific recombination systems such as the λ phage recombination system (e.g., Gateway Cloning technology), the FLP/FRT system, and the Cre/lox system, or polynucleotides including sequences that may be targeted by the CRISPR/Cas system. The insert site may comprise a heterologous coding sequence encoding a heterologous polypeptide or may include any one of the polynucleotides encoding a heterologous polypeptide or a regulatory RNA described herein.
As used herein, a “polyA site” or “polyA sequence” is a polynucleotide sequence that includes 5 or more adenosine bases or a DNA sequence that encodes such a string of adenosine bases in at least one strand, or may be a polynucleotide sequence that signals the addition of a polyA tail to an RNA transcript. Common polyA sequences are known in the art and may include, without limitation, polyA sequences derived from the SV40 virus, from HIV-1 or from the human or rat insulin genomic gene, the human growth hormone gene or any other mammalian mRNA encoding gene. Synthetic poly(A) addition sequences, generally consisting of the sequence 5′-AAUAAA-3′ linked to a 3′ G/U rich sequence, can also be used.
As used herein, the terms “promoter,” “promoter region,” or “promoter sequence” refer generally to transcriptional regulatory regions of a gene or regulatory RNA (i.e., promoters, enhancers, or both), which may be found at the 5′ or 3′ side of the gene or regulatory RNA, or within the coding region of a gene or regulatory RNA, or within introns. Typically, a promoter is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3′ direction) coding sequence. The typical 5′ promoter sequence is bounded at its 3′ terminus by the transcription initiation site and extends upstream (5′ direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence is a transcription initiation site (conveniently defined by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase.
As used herein, a polynucleotide is “operably connected” or “operably linked” when it is placed into a functional relationship with a second polynucleotide sequence. For instance, a promoter is operably linked to an insert site or heterologous coding sequence within the insert site if the promoter is connected to the coding sequence or insert site such that it may effect transcription of the coding sequence. In various embodiments, the polynucleotides may be operably linked to at least 1, at least 2, at least 3, at least 4, at least 5, or at least 10 promoters.
Promoters useful in the practice of the present invention include, but are not limited to, constitutive, inducible, temporally-regulated, developmentally regulated, chemically regulated, tissue-preferred and tissue-specific promoters. Suitable promoters for expression in plants include, without limitation, the 35S promoter of the cauliflower mosaic virus, the ubiquitin promoter, the tCUP cryptic constitutive promoter, the Rsyn7 promoter, pathogen-inducible promoters, the maize In2-2 promoter, the tobacco PR-1a promoter, glucocorticoid-inducible promoters, estrogen-inducible promoters and tetracycline-inducible and tetracycline-repressible promoters. Other promoters include the T3, T7 and SP6 promoter sequences, which are often used for in vitro transcription of RNA. In mammalian cells, typical promoters include, without limitation, promoters for Rous sarcoma virus (RSV), human immunodeficiency virus (HIV-1), cytomegalovirus (CMV), SV40 virus, and the like as well as the translational elongation factor EF-1α promoter or ubiquitin promoter. Those of skill in the art are familiar with a wide variety of additional promoters for use in various cell types.
The constructs of the present invention may include a heterologous coding sequence encoding a heterologous polypeptide within the insert site. The heterologous coding sequence thus may be 3′ or 5′ to the UTR sequence. In some embodiments, the expression of the constructs of the present invention in a cell produces a transcript including the heterologous coding sequence and the UTR sequence. A “heterologous coding sequence” is a region of a construct that is an identifiable segment (or segments) that is not found in association with the larger construct in nature. When the heterologous coding region encodes a gene or a portion of a gene, the gene may be flanked by DNA that does not flank the genetic DNA in the genome of the source organism. In another example, a heterologous coding region is a construct where the coding sequence itself is not found in nature.
As used herein, a “heterologous polypeptide,” “polypeptide,” “protein,” or “peptide” may be used interchangeably to refer to a polymer of amino acids. A “polypeptide” as contemplated herein typically comprises a polymer of naturally occurring amino acids (e.g., alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, and valine).
In some embodiments, the heterologous polypeptide may be a therapeutic polypeptide, industrial enzyme or other useful protein product. Exemplary therapeutic polypeptides are summarized in, for example, Leader et al., Nature Reviews Drug Discovery 7:21-39 (2008). Therapeutic polypeptides include but are not limited to enzymes, antibodies, hormones, cytokines, ligands, protein antigens, and competitive inhibitors and can be naturally occurring or engineered polypeptides. The therapeutic polypeptides may include, without limitation, Insulin, Pramlintide acetate, Growth hormone (GH), somatotropin, Mecasermin, Mecasermin rinfabate, Factor VIII, Factor IX, Antithrombin III (AT-III), Protein C, beta-Gluco-cerebrosidase, Alglucosidase-alpha, Laronidase, Idursulphase, Galsulphase, Agalsidase-beta, alpha-1-Proteinase inhibitor, Lactase, Pancreatic enzymes (lipase, amylase, protease), Adenosine deaminase, immunoglobulins, Human albumin, Erythropoietin, Darbepoetin-alpha, Filgrastim, Pegfilgrastim, Sargramostim, Oprelvekin, Human follicle-stimulating hormone (FSH), Human chorionic gonadotropin (HCG), Lutropin-alpha, Type I alpha-interferon, Interferon-alpha2a, Interferon-alpha2b, Interferon-alpha-n3, Interferon-beta1a, Interferon-beta1b, Interferon-gamma1b, Aldesleukin, Alteplase, Reteplase, Tenecteplase, Urokinase, Factor VIIa, Drotrecogin-alpha, Salmon calcitonin, Teriparatide, Exenatide, Octreotide, Dibotermin-alpha, Recombinant human bone morphogenic protein 7 (rhBMP7), Histrelin acetate, Palifermin, Becaplermin, Trypsin, Nesiritide, Botulinum toxin type A, Botulinum toxin type B, Collagenase, Human deoxy-ribonuclease I, dornase-alpha, Hyaluronidase (bovine, ovine), Hyaluronidase (recombinant human), Papain, L-Asparaginase, Rasburicase, Lepirudin, Bivalirudin, Streptokinase, Anistreplase, Bevacizumab, Cetuximab, Panitumumab, Alemtuzumab, Rituximab, Trastuzumab, Abatacept, Anakinra, Adalimumab, Etanercept, Infliximab, Alefacept, Efalizumab, Natalizumab, Eculizumab, Antithymocyte globulin (rabbit), Basiliximab, Daclizumab, Muromonab-CD3, Omalizumab, Palivizumab, Enfuvirtide, Abciximab, Pegvisomant, Crotalidae polyvalent immune Fab (ovine), Digoxin immune serum Fab (ovine), Ranibizumab, Denileukin diftitox, Ibritumomab tiuxetan, Gemtuzumab ozogamicin, Tositumomab, Hepatitis B surface antigen (HBsAg), HPV vaccine, OspA, Anti-Rhesus (Rh) immunoglobulin G98 Rhophylac, Recombinant purified protein derivative (DPPD), Glucagon, Growth hormone releasing hormone (GHRH), Secretin, Thyroid stimulating hormone (TSH), thyrotropin, Capromab pendetide, Satumomab pendetide, Arcitumomab, Nofetumomab, Apcitide, Imciromab pentetate, Technetium fanolesomab, HIV antigens, and Hepatitis C antigens.
The heterologous polypeptide may also be a Cas protein including, without limitation, Cas9. The Cas9 proteins may be derived from any bacterial genome including, without limitation, Cas9 proteins derived from Streptococcus pyogenes and Staphylococcus aureus.
Vectors including any of the polynucleotides or constructs described herein are provided. The term “vector” is intended to refer to a polynucleotide capable of transporting another polynucleotide to which it has been linked. In some embodiments, the vector may be a “plasmid,” which refers to a circular double-stranded DNA loop into which additional DNA segments may be ligated. Another type of vector is a viral vector (e.g., replication defective retroviruses, herpes simplex virus, lentiviruses, adenoviruses and adeno-associated viruses), where additional polynucleotide segments may be ligated into the viral genome. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors can be integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome, such as some viral vectors or transposons. Yeast and bacterial artificial chromosomes are also included as vectors.
Cells including any of the polynucleotides, constructs, or vectors described herein are provided. Suitable “cells” include eukaryotic cells. Suitable eukaryotic cells include, without limitation, plant cells, fungal cells, and animal cells such as cells from popular model organisms including, but not limited to, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Rattus norvegicus. In some embodiments, the cell may be a mammalian cell, a chicken cell, or an insect cell. Suitable mammalian cells include, without limitation, a mouse cell, a rat cell, a hamster cell, or a human cell. Suitable chicken cells include, without limitation, primary chicken cells such as chick embryo fibroblasts, chicken cell lines, or cells within an embryonated chicken egg. The cell may be a cell line typically used to recombinantly produce polypeptides including, without limitation, insect cell lines infected by baculovirus, yeast cell lines, and mammalian cell lines such as A549 cells, CHO cells, HEK293 cells, HEK293T cells, HeLa cells, NS0 cells, Sp2/0 cells, COS cells, BK cells, MDCK cells, a T cell such as a CD4 T cell, and Vero cells. Cell lines typically used for protein production are described elsewhere, for example, in Khan et al., Advanced Pharmaceutical Bulletin 3(2): 257-263 (2013). In some embodiments, the cell may be a cell line used to produce viruses including, without limitation, insect cells, chicken cells, HEK293 cells, HEK293T cells, A549 cells and Vero cells.
In some embodiments, the cell may overexpress a YTHDF polypeptide. As used herein, “overexpressing” or “expressing” a polynucleotide or polypeptide in a cell refers to transcribing or translating a polynucleotide or polypeptide that has been introduced into the cell using laboratory methods. For example, the polypeptide may be expressed from a polynucleotide present in a vector for propagating the polynucleotide or the polypeptide may be expressed from a polynucleotide that is integrated into the genome of the cell. Overexpressing also includes increasing production of the native polypeptide by altering expression of the native polypeptide. Overexpression of the native polypeptide may be accomplished by any means available to those skilled in the art, including adding enhancers, altering the promoter, supplying a trans activating factor or any other means.
The function of m6A residues on RNAs is thought to be primarily mediated by three related cytoplasmic “reader” proteins called YTH-domain containing family 1 (YTHDF1), YTHDF2 and YTHDF3. As used herein, a “YTHDF polypeptide” may refer to YTHDF1, YTHDF2 or YTHDF3 polypeptides from any eukaryote. Suitably, the YTHDF polypeptide may be from the organism from which the cell is derived or in which the cell resides. In some embodiments, the YTHDF polypeptide is the polypeptide of SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, or a polypeptide having at least 60%, 70%, 75%, 80%, 85%, 90%, 95%, or 98% sequence identity to SEQ ID NO: 4, SEQ ID NO: 5, or SEQ ID NO: 6. In mammals, YTHDF polypeptides have been well conserved throughout evolution, showing greater than 90% sequence identity. See, e.g., SEQ ID NOs: 34-40.
The polypeptides contemplated herein may be further modified in vitro or in vivo to include non-amino acid moieties. These modifications may include but are not limited to acylation (e.g., O-acylation (esters), N-acylation (amides), S-acylation (thioesters)), acetylation (e.g., the addition of an acetyl group, either at the N-terminus of the protein or at lysine residues), formylation, lipoylation (e.g., attachment of a lipoate, a C8 functional group), myristoylation (e.g., attachment of myristate, a C14 saturated acid), palmitoylation (e.g., attachment of palmitate, a C16 saturated acid), alkylation (e.g., the addition of an alkyl group, such as a methyl at a lysine or arginine residue), isoprenylation or prenylation (e.g., the addition of an isoprenoid group such as farnesol or geranylgeraniol), amidation at the C-terminus, glycosylation (e.g., the addition of a glycosyl group to either asparagine, hydroxylysine, serine, or threonine, resulting in a glycoprotein, as distinct from glycation, which is regarded as a nonenzymatic attachment of sugars), polysialylation (e.g., the addition of polysialic acid), glypiation (e.g., glycosylphosphatidylinositol (GPI) anchor formation), hydroxylation, iodination (e.g., of thyroid hormones), and phosphorylation (e.g., the addition of a phosphate group, usually to serine, tyrosine, threonine or histidine). Other enzymatic attachments are also encompassed.
The polypeptides disclosed herein may include “mutant” polypeptides, variants, and derivatives thereof. As used herein the term “wild-type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms. As used herein, a “variant,” “mutant,” or “derivative” refers to a polypeptide molecule having an amino acid sequence that differs from a reference protein or polypeptide molecule. A variant or mutant may have one or more insertions, deletions, or substitutions of an amino acid residue relative to a reference molecule. A variant or mutant may include a fragment of a reference molecule. For example, a YTHDF mutant or variant polypeptide may have one or more insertions, deletions, or substitutions of at least one amino acid residue relative to the YTHDF “wild-type” polypeptide. The polypeptide sequences of the “wild-type” YTHDF1, YTHDF2, and YTHDF3 polypeptides from humans are presented as SEQ ID NO: 4, SEQ ID NO: 5, and SEQ ID NO: 6, respectively. These sequences may be used as reference sequences.
SEQ ID NOs: 34-40 are the YTHDF2 proteins from Goat, Cat, Zebu, Gray Mouse, Beaver, Rat and a consensus sequence. Based on alignment of these sequences it becomes immediately apparent to a person of ordinary skill in the art that various amino acid residues may be altered (e.g., substituted or deleted) in, for example, human YTHDF2 (SEQ ID NO: 5), without substantially affecting the activity of the polypeptide. For example, a person of ordinary skill in the art would appreciate that substitutions in a reference YTHDF2 (e.g., human) could be based on alternative amino acid residues that occur at the corresponding position in YTHDF2 polypeptides from other species. For example, the human YTHDF2 polypeptide has an asparagine amino acid residue at position 174 while some of the other polypeptides have a serine amino acid at this position in the alignment. Thus, one exemplary modification that is apparent from the sequence alignment of these sequences is an N174S substitution in the human YTHDF2 polypeptide (SEQ ID NO: 5). Similar modifications could be made at each position of the sequence alignments of the various YTHDF sequences provided herein. Additionally, a person of ordinary skill in the art could easily align other YTHDF2 polypeptides with the polypeptide sequences shown here to determine what additional variants could be made to YTHDF2 polypeptides.
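By way of illustration only, the alignment-based reasoning described above can be sketched in a few lines of Python. The function name and the toy aligned rows below are hypothetical and are not taken from SEQ ID NO: 5 or SEQ ID NOs: 34-40; the sketch assumes that the two rows come from the same alignment, that gaps are written as "-", and that positions are numbered along the ungapped reference sequence.

```python
def candidate_substitutions(reference: str, homolog: str) -> list[str]:
    """List substitutions (e.g. 'N3S') suggested by one aligned homolog.

    Both arguments are rows of the same alignment, with '-' marking gaps.
    Positions are 1-based and counted along the ungapped reference.
    """
    if len(reference) != len(homolog):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    position = 0  # index into the ungapped reference sequence
    for ref_residue, alt_residue in zip(reference, homolog):
        if ref_residue == "-":
            continue  # insertion relative to the reference: no position to report
        position += 1
        if alt_residue != "-" and alt_residue != ref_residue:
            substitutions.append(f"{ref_residue}{position}{alt_residue}")
    return substitutions


# Toy aligned rows (hypothetical sequences, for illustration only).
print(candidate_substitutions("MKN-LT", "MKSALT"))
```

Applied to a real alignment of human YTHDF2 against an ortholog, a routine of this kind would be expected to report entries in the same notation as the N174S example, although whether any given substitution preserves activity would still need to be confirmed experimentally.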
The polypeptides provided herein may be full-length polypeptides or may be fragments of the full-length polypeptide. As used herein, a “fragment” is a portion of an amino acid sequence which is identical in sequence to but shorter in length than a reference sequence. A fragment may comprise up to the entire length of the reference sequence, minus at least one amino acid residue. For example, a fragment may comprise from 5 to 1000 contiguous amino acid residues of a reference polypeptide. In some embodiments, a fragment may comprise at least 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 250, or 500 contiguous amino acid residues of a reference polypeptide. Fragments may be preferentially selected from certain regions of a molecule. The term “at least a fragment” encompasses the full length polypeptide. A fragment of a YTHDF polypeptide may comprise or consist essentially of a contiguous portion of an amino acid sequence of the full-length YTHDF polypeptide (SEQ ID NOS: 4, 5, or 6). A fragment may include an N-terminal truncation, a C-terminal truncation, or both truncations relative to the full-length YTHDF polypeptide. Preferably, a fragment of a YTHDF polypeptide includes amino acid residues required for the m6A reader function.
A “deletion” in a polypeptide refers to a change in the amino acid sequence resulting in the absence of one or more amino acid residues. A deletion may remove at least 1, 2, 3, 4, 5, 10, 20, 50, 100, 200, or more amino acid residues. A deletion may include an internal deletion and/or a terminal deletion (e.g., an N-terminal truncation, a C-terminal truncation or both of a reference polypeptide).
“Insertions” and “additions” in a polypeptide refer to changes in an amino acid sequence resulting in the addition of one or more amino acid residues. An insertion or addition may refer to 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, or more amino acid residues. A variant of a YTHDF polypeptide may have N-terminal insertions, C-terminal insertions, internal insertions, or any combination of N-terminal insertions, C-terminal insertions, and internal insertions.
Regarding polypeptides, the phrases “percent identity,” “% identity,” and “% sequence identity” refer to the percentage of residue matches between at least two amino acid sequences aligned using a standardized algorithm. Methods of amino acid sequence alignment are well-known. Some alignment methods take into account conservative amino acid substitutions. Such conservative substitutions, explained in more detail below, generally preserve the charge and hydrophobicity at the site of substitution, thus preserving the structure (and therefore function) of the polypeptide. Percent identity for amino acid sequences may be determined as understood in the art. (See, e.g., U.S. Pat. No. 7,396,664). A suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST), which is available from several sources, including the NCBI, Bethesda, Md., at its website. The BLAST software suite includes various sequence analysis programs including “blastp,” which is used to align a known amino acid sequence with other amino acid sequences from a variety of databases. As described herein, variants, mutants, or fragments (e.g., a YTHDF polypeptide variant, mutant, or fragment thereof) may have 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 80%, 70%, 60%, or 50% amino acid sequence identity relative to a reference molecule (e.g., relative to the YTHDF full-length polypeptide (SEQ ID NO: 2)).
Polypeptide sequence identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in the tables, figures or Sequence Listing, may be used to describe a length over which percentage identity may be measured.
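As a non-limiting illustration, percent identity over a pre-computed alignment may be calculated as sketched below in Python. This is a minimal sketch and not the BLAST implementation: the function name is hypothetical, the input rows are assumed to be already aligned with "-" marking gaps, and the denominator is assumed to be the number of alignment columns in which at least one sequence has a residue. Other conventions for the denominator exist and will give somewhat different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared alignment columns holding identical residues.

    Assumption: columns where both rows are gaps are ignored, and columns
    where exactly one row has a gap count as compared but not matching.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column carries no residue from either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no residues to compare")
    return 100.0 * matches / compared


# Toy aligned rows (hypothetical sequences, for illustration only).
row_a = "MKN-LTGQ"
row_b = "MKSALTGQ"
print(percent_identity(row_a, row_b))            # over the whole alignment
print(percent_identity(row_a[4:8], row_b[4:8]))  # over a shorter window
```

Slicing both rows with the same column range, as in the last line, corresponds to measuring identity over the length of a fragment rather than over the entire defined sequence.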
The amino acid sequences of the polypeptide variants, mutants, or derivatives as contemplated herein may include conservative amino acid substitutions relative to a reference amino acid sequence. For example, a variant, mutant, or derivative polypeptide may include conservative amino acid substitutions relative to a reference molecule. “Conservative amino acid substitutions” are those substitutions that are a substitution of an amino acid for a different amino acid where the substitution is predicted to interfere least with the properties of the reference polypeptide. In other words, conservative amino acid substitutions substantially conserve the structure and the function of the reference polypeptide. Conservative amino acid substitutions generally maintain (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or hydrophobicity of the molecule at the site of the substitution, and/or (c) the bulk of the side chain.
Methods for producing a heterologous polypeptide in a cell are provided. The methods may include introducing any of the polynucleotides, constructs, or vectors described herein into the cell. Suitably, the polynucleotides, constructs, and vectors include a heterologous coding sequence encoding a heterologous polypeptide.
As used herein, “introducing” describes a process by which exogenous polynucleotides (e.g., DNA or RNA) or protein are introduced into a recipient cell. Methods of introducing polynucleotides and proteins into a cell are known in the art and may include, without limitation, microinjection, transformation, and transfection methods. Transformation or transfection may occur under natural or artificial conditions according to various methods well known in the art, and may rely on any known method for the insertion of foreign nucleic acid sequences into a host cell. The method for transformation or transfection is selected based on the type of host cell being transformed and may include, but is not limited to, bacteriophage or viral infection, electroporation, heat shock, lipofection, and particle bombardment. Microinjection of polynucleotides and/or proteins may also be used to introduce polynucleotides and/or proteins into cells.
The polynucleotides, constructs, and vectors of the present invention may also be formulated for delivery into a human subject. For example, it is envisioned that mRNAs produced using the constructs described herein either in cells or in an in vitro transcription system may be delivered to human cells using mRNA delivery platforms like those developed by, for example, Moderna Therapeutics.
Conventional viral and non-viral based gene transfer methods can be used to introduce polynucleotides into cells or target tissues. Non-viral vector delivery systems include DNA plasmids, RNA, naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424; WO 91/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration).
In some embodiments, the methods may further include expressing a YTHDF polypeptide in the cell.
The methods may also further include additional steps used in producing polypeptides recombinantly. For example, the methods may include purifying the heterologous polypeptide from the cell. The term “purifying” is used to refer to the process of ensuring that the heterologous polypeptide is substantially or essentially free from cellular components and other impurities. Purification of polypeptides is typically performed using molecular biology and analytical chemistry techniques such as polyacrylamide gel electrophoresis or high performance liquid chromatography. Methods of purifying protein are well known to those skilled in the art. A “purified” heterologous polypeptide means that the heterologous polypeptide is at least 85% pure, more preferably at least 95% pure, and most preferably at least 99% pure.
The methods may also include the step of formulating the heterologous polypeptide into a therapeutic for administration to a subject.
As used herein, the terms “subject” and “patient” are used interchangeably and refer to both human and nonhuman animals. The term “nonhuman animals” of the disclosure includes all vertebrates, e.g., mammals and non-mammals, such as nonhuman primates, sheep, dog, cat, horse, cow, mice, chickens, amphibians, reptiles, and the like. Preferably, the subject is a human patient. More preferably, the subject is a human patient in need of a heterologous polypeptide or a vaccine.
Fusion proteins, constructs encoding these fusion proteins and cells including these constructs or capable of expressing these fusion proteins are also provided. The fusion protein may include a YTHDF polypeptide and an RNA-binding polypeptide. The terms “fusion protein” and “fusion polypeptide” may be used to refer to a single polypeptide comprising two functional segments, e.g., a YTHDF polypeptide segment and an RNA-binding polypeptide. The fusion proteins may be any size, and the single polypeptide of the fusion protein may exist in a multimeric form in its functional state, e.g., by cysteine disulfide connection of two monomers of the single polypeptide. A polypeptide segment may be a synthetic polypeptide or a naturally occurring polypeptide. Such polypeptides may be a portion of a polypeptide or may comprise one or more mutations. The two polypeptide segments of the fusion proteins can be linked directly or indirectly. For instance, the two segments may be linked directly through, e.g., a peptide bond or chemical cross-linking, or indirectly, through, e.g., a linker segment or linker polypeptide. The peptide linker may be any length and may include traditional or non-traditional amino acids. For example, the peptide linker may be 1-100 amino acids long; suitably, it is 5, 10, 15, 20, 25 or more amino acids long such that the YTHDF portion of the fusion polypeptide can mediate its m6A reader function and the RNA-binding polypeptide can bind its recognition sequence.
An “RNA-binding polypeptide” may be any of the RNA-binding polypeptides commonly employed in protein-RNA tethering systems. Protein-RNA tethering systems have been summarized in, for example, Coller and Wickens, Methods in Enzymology 429:299-(2007). In choosing which RNA-binding polypeptide to use as the tether, it is necessary to consider the affinity and specificity for the RNA-binding polypeptide recognition sequence, subcellular localization, and impact of the tether on the activity of the YTHDF polypeptide. The most common RNA-binding polypeptide, and the RNA-binding polypeptide used in the Examples, is the bacteriophage MS2 coat protein. However, the iron response element binding protein (IRP), a derivative of bacteriophage λN-protein, and the spliceosomal U1A protein have also been used successfully. Therefore, in some embodiments, the RNA-binding polypeptide may include, without limitation, an MS2 polypeptide, a lambda N polypeptide, an iron response element binding polypeptide, or a U1A polypeptide.
The constructs of the present invention may also include (i) a heterologous coding sequence encoding a heterologous polypeptide, and (ii) a UTR sequence including at least one RNA-binding polypeptide recognition sequence. The UTR sequence may be located either 5′ or 3′ to the heterologous coding sequence. In one embodiment, the UTR sequence is 3′ to the heterologous coding sequence. The RNA-binding polypeptide recognition sequence is a polynucleotide sequence recognized and bound by the RNA-binding polypeptide. The recognition sequences often form a stem loop structure. Suitable RNA-binding polypeptide recognition sequences for the MS2, lambda N, iron response element binding, and U1A RNA-binding polypeptides are known in the art. In some embodiments, the UTR sequence may include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or more RNA-binding polypeptide recognition sequences. Within the UTR sequence, the RNA-binding polypeptide recognition sequences may be separated by at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, or more bases. Cells including any of the constructs including RNA-binding polypeptide recognition sequences are also provided.
Methods for producing a heterologous polypeptide in a cell are also provided. The methods may include (a) introducing or expressing any of the fusion proteins described herein or the constructs encoding such fusion proteins into the cell, and (b) introducing or expressing any of the constructs including RNA-binding polypeptide recognition sequences described herein. Such methods may also further include additional steps used in producing therapeutic polypeptides recombinantly. For example, the methods may include purifying the heterologous polypeptide from the cell. Such methods may also include the step of formulating the heterologous polypeptide into a therapeutic for administration to a subject as described more fully above.
Cells engineered to overexpress a YTHDF polypeptide are provided. Optionally, the YTHDF overexpressing cells may include a virus comprising at least one m6A sequence. As used herein, a “virus” may include any virus or viral vector including at least one m6A sequence or any of the viruses including the polynucleotides described herein. In some embodiments, the virus may be a nucleovirus that replicates in the nucleus of a cell. In some embodiments, the virus may be a virus used to make vaccines such as, without limitation, a measles virus, a mumps virus, a rubella virus, an influenza virus, a varicella-zoster virus, a polio virus, a rotavirus, a yellow fever virus, a rabies virus, or other viruses that may be used in the production of a vaccine or for making viral stocks for use in research or other applications. An influenza virus includes, but is not limited to an influenza A, B, or C virus. In some embodiments, the virus may be a live-attenuated virus or a live virus. In some embodiments, the virus may be a virus or viral vector used in gene therapy applications such as, without limitation, a retrovirus, an adenovirus such as AAV, or a Herpes simplex virus. In some embodiments, the retrovirus may be a lentivirus. In some embodiments, the virus may include, without limitation, an Adeno Associated Virus (AAV), influenza viruses (types A-C), Human Immunodeficiency Viruses (HIV) or other viruses that may be used in the production of viral vectors (for example, for gene therapy). In some embodiments, the virus may include viruses expressed from an engineered plasmid system such as a YAC or BAC or may be native viruses.
Methods of producing a virus in a cell are provided. The methods may include (a) introducing the virus into the cell, wherein the virus comprises at least one m6A sequence and (b) introducing or expressing a YTHDF polypeptide in the cell. Such methods may also further include additional steps used in producing vaccines. For example, the methods may include purifying the virus from the cell. In some embodiments, the virus may be killed following purification from the cell. Such methods may also include the step of formulating the virus (whether live-attenuated or killed) into a vaccine for administration to a subject.
The present disclosure is not limited to the specific details of construction, arrangement of components, or method steps set forth herein. The compositions and methods disclosed herein are capable of being made, practiced, used, carried out and/or formed in various ways that will be apparent to one of skill in the art in light of the disclosure that follows. The phraseology and terminology used herein is for the purpose of description only and should not be regarded as limiting to the scope of the claims. Ordinal indicators, such as first, second, and third, as used in the description and the claims to refer to various structures or method steps, are not meant to be construed to indicate any specific structures or steps, or any particular order or configuration to such structures or steps. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to facilitate the disclosure and does not imply any limitation on the scope of the disclosure unless otherwise claimed. No language in the specification, and no structures shown in the drawings, should be construed as indicating that any non-claimed element is essential to the practice of the disclosed subject matter. The use herein of the terms “including,” “comprising,” or “having,” and variations thereof, is meant to encompass the elements listed thereafter and equivalents thereof, as well as additional elements. Embodiments recited as “including,” “comprising,” or “having” certain elements are also contemplated as “consisting essentially of” and “consisting of” those certain elements.
Recitation of ranges of values herein are merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. For example, if a concentration range is stated as 1% to 50%, it is intended that values such as 2% to 40%, 10% to 30%, or 1% to 3%, etc., are expressly enumerated in this specification. These are only examples of what is specifically intended, and all possible combinations of numerical values between and including the lowest value and the highest value enumerated are to be considered to be expressly stated in this disclosure. Use of the word “about” to describe a particular recited amount or range of amounts is meant to indicate that values very near to the recited amount are included in that amount, such as values that could or naturally would be accounted for due to manufacturing tolerances, instrument and human error in forming measurements, and the like. All percentages referring to amounts are by weight unless indicated otherwise.
No admission is made that any reference, including any non-patent or patent document cited in this specification, constitutes prior art. In particular, it will be understood that, unless otherwise stated, reference to any document herein does not constitute an admission that any of these documents forms part of the common general knowledge in the art in the United States or in any other country. Any discussion of the references states what their authors assert, and the applicant reserves the right to challenge the accuracy and pertinence of any of the documents cited herein. All references cited herein are fully incorporated by reference in their entirety, unless explicitly indicated otherwise. The present disclosure shall control in the event there are any disparities between any definitions and/or description found in the cited references.
Unless otherwise specified or indicated by context, the terms “a”, “an”, and “the” mean “one or more.” For example, “a protein” or “an RNA” should be interpreted to mean “one or more proteins” or “one or more RNAs,” respectively.
As used herein, “about,” “approximately,” “substantially,” and “significantly” will be understood by persons of ordinary skill in the art and will vary to some extent on the context in which they are used. If there are uses of these terms which are not clear to persons of ordinary skill in the art given the context in which they are used, “about” and “approximately” will mean plus or minus ≤10% of the particular term and “substantially” and “significantly” will mean plus or minus >10% of the particular term.
The following examples are meant only to be illustrative and are not meant as limitations on the scope of the invention or of the appended claims.
While the presence of multiple m6A editing sites on a range of viral RNAs was reported starting almost 40 years ago, how m6A editing affects virus replication has remained unknown. Here, we precisely map several m6A editing sites on the HIV-1 genome and show that these cluster in the HIV-1 3′ untranslated region (3′UTR). We demonstrate that these viral 3′UTR m6A sites, or analogous cellular m6A sites, strongly enhance mRNA expression in cis by recruiting the cellular YTHDF m6A “reader” proteins. As a result, inhibition of YTHDF expression was found to inhibit HIV-1 replication, while YTHDF overexpression enhanced HIV-1 replication. These data identify m6A editing, and the resultant recruitment of YTHDF proteins, as major positive regulators of HIV-1 mRNA expression.
Like proteins and DNA, RNA is subject to a number of covalent modifications that can impact its function and post-transcriptionally modified nucleotides have indeed been detected on eukaryotic mRNAs (Carlile et al., 2014; Dominissini et al., 2012; Dominissini et al., 2016; Meyer et al., 2012; Schwartz et al., 2014; Squires et al., 2012). Of these, the N6-methyladenosine (m6A) modification is the most common, with an average of ˜3 m6A addition sites per mRNA and with ˜25% of all cellular mRNAs containing generally multiple m6A residues (Desrosiers et al., 1975; Dominissini et al., 2012; Meyer et al., 2012). The importance of m6A is underlined by the fact that this modification is evolutionarily conserved from fungi to plants and animals, and that global inhibition of m6A addition is embryonic lethal in plants, insects and mammals (Meyer and Jaffrey, 2014; Yue et al., 2015).
The post-transcriptional addition of m6A to mRNAs occurs predominantly in the nucleus and is mediated by a heterotrimeric protein complex consisting of the two methyltransferase-like (METTL) enzymes METTL3 and METTL14 and their co-factor Wilms tumor 1-associated protein (WTAP) (Liu et al., 2014; Meyer and Jaffrey, 2014; Yue et al., 2015). This complex specifically methylates A residues in the consensus sequence (G/A/U)(G>A) m6AC (U/C/A), although only ˜15% of sites that have this consensus are actually modified and the level of modification at any given site can vary significantly. In addition to these m6A “writers”, mammals also encode two RNA demethylases or “erasers” called ALKBH5 (α-ketoglutarate-dependent dioxygenase homologue 5) and FTO (fat mass and obesity associated), which are found predominantly in the nucleus or cytoplasm, respectively (Jia et al., 2011; Zheng et al., 2013). Finally, the function of m6A residues on mRNAs is thought to be primarily mediated by three related cytoplasmic “reader” proteins called YTH-domain containing family 1 (YTHDF1), YTHDF2 and YTHDF3 (Meyer and Jaffrey, 2014; Wang et al., 2014; Wang et al., 2015; Yue et al., 2015). The three YTHDF proteins all contain a conserved carboxy-terminal YTH domain that binds m6A and a more variable amino-terminal effector domain of unclear function.
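For illustration, the consensus sequence given above can be used to list candidate m6A addition sites in an RNA sequence. The Python sketch below is not part of the mapping methods used in the Examples; the function name and the test sequence are hypothetical, and the preference for G over A at the second position is deliberately ignored. Because only a minority of consensus sites are actually modified, the output should be read as a list of candidates rather than as a prediction of methylation.

```python
import re

# Consensus (G/A/U)(G or A) A C (U/C/A), written for RNA. The lookahead
# lets overlapping motif occurrences be reported as well.
M6A_CONSENSUS = re.compile(r"(?=([GAU][GA]AC[UCA]))")


def candidate_m6a_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (index of the candidate A, matched 5-mer) for each motif hit.

    Indices are 0-based. DNA input is accepted and converted to RNA.
    """
    rna = sequence.upper().replace("T", "U")
    return [(m.start() + 2, m.group(1)) for m in M6A_CONSENSUS.finditer(rna)]


# Hypothetical test sequence, for illustration only.
print(candidate_m6a_sites("CCGGACUCC"))
```

Candidate sites produced this way could then be compared against experimentally mapped clusters to see which consensus positions are actually occupied.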
While the m6A modification of mRNAs is therefore well established and has been suggested to modulate several aspects of RNA metabolism (Meyer and Jaffrey, 2014; Yue et al., 2015), exactly how m6A editing regulates mRNA function remains largely unclear. Importantly, m6A modifications appear to be ubiquitous on mRNAs expressed by viruses that replicate in the nucleus, including SV40, the related retroviruses avian sarcoma virus and Rous sarcoma virus (RSV), adenovirus and influenza A virus (IAV) (Dimock and Stoltzfus, 1977; Kane and Beemon, 1985; Krug et al., 1976; Lavi and Shatkin, 1975; Sommer et al., 1976). As viruses invariably rapidly evolve to maximize their replication potential, and given that it would be simple to select for viral mutants that lack consensus m6A modification sites, this implies that the m6A modification of viral mRNAs enhances viral replication by enhancing some aspect(s) of mRNA function.
Despite the fact that the identification of m6A on viral mRNAs dates back over 40 years, no report has shown that m6A affects any aspect of viral mRNA function. Here, we first precisely map m6A modification sites on the RNA genome of human immunodeficiency virus 1 (HIV-1) and show that different HIV-1 isolates contain from four to six m6A clusters at the extreme 3′ end of the viral genome, i.e., primarily in the 3′ untranslated regions (3′UTRs) of the various HIV-1 mRNAs. We further present evidence that these 3′UTR m6A residues enhance HIV-1 gene expression and replication by increasing the steady state level of viral mRNA expression. Finally, we show that HIV-1 is sensitive to the level of YTHDF2 expression in infected T cells, demonstrating enhanced replication when YTHDF2 was overexpressed and strongly reduced replication when the YTHDF2 gene was knocked out by DNA editing. These data demonstrate that the m6A modification of HIV-1 plays a key role in promoting its replication and identifies this RNA modification as a potential novel target for antiviral drug development.
Modification of adenosines to m6A on viral mRNAs has been reported for a range of viruses that replicate in the nucleus; however, with the exception of RSV, where seven m6A addition sites were mapped using biochemical approaches (Kane and Beemon, 1985), the location of individual m6A residues has remained unknown. To map m6A modifications in HIV-1, we used the previously described photo-crosslinking-assisted m6A sequencing (PA-m6A-seq) technique (Chen et al., 2015) to identify m6A residues on the HIV-1 genome in infected human CD4+ CEM-SS T cells. For this experiment, we pulsed HIV-1 infected T-cells with the nucleoside 4-thiouridine (4SU), isolated total poly(A)+ RNA (
The function of m6A sites is primarily mediated by the cytoplasmic YTHDF proteins, though other potential nuclear or cytoplasmic m6A binding proteins have been reported (Meyer and Jaffrey, 2014; Meyer et al., 2015). To determine whether any of the m6A sites on the HIV-1 genome mapped using PA-m6A-seq are bound by one or more of the three YTHDF proteins in living cells, and hence likely to be functionally relevant, we generated clones of the human cell line 293T engineered to express FLAG-tagged versions of green fluorescent protein (GFP), YTHDF1, YTHDF2 or YTHDF3 (
Analysis of recovered reads detected T to C mutations, which are characteristic of crosslinked 4SU residues that have been subjected to reverse transcription, in 45-60% of all viral reads obtained from the three FLAG-YTHDF expressing clones but in <5% of the reads obtained from the clone expressing FLAG-GFP (
To determine if the m6A modification sites mapped on the NL4-3 laboratory isolate were conserved in primary HIV-1 isolates BaL and JR-CSF (Cann et al., 1990; Hwang et al., 1991), we repeated the PAR-CLIP analysis in the 293T clones expressing either FLAG-YTHDF1 or FLAG-YTHDF2 using pseudotyped stocks of BaL or JR-CSF. Analysis of the BaL isolate showed that all four clusters identified in NL4-3, in the env/rev overlap, in nef, in U3 and in TAR (
The analysis of m6A editing sites in JR-CSF produced a similar result. Specifically, both YTHDF1 and YTHDF2 bound to the four m6A clusters previously identified in NL4-3 (
The Introduction of m6A into 3′UTRs Enhances mRNA Function
While the YTHDF and m6A antibody-specific binding clusters detected in the NL4-3 strain of HIV-1 are <40 nt each (
While we have mapped the m6A editing sites on the NL4-3 genome in the context of a virus infection (
Previously, m6A editing has been proposed to either enhance or decrease mRNA stability (Dominissini et al., 2012; Wang et al., 2014) raising the possibility that m6A editing might exert different effects dependent on, for example, RNA sequence context. To address the generalizability of the data shown in
While the data shown in
As noted above, m6A sites are thought to function by recruiting one or more of the YTHDF proteins to the mRNA. Currently, it remains unclear whether these three proteins are functionally distinct, though our data indicate that all three YTHDFs are recruited to each of the m6A editing sites identified on the HIV-1 genome (
While the experiments presented in
As an alternative approach, we therefore asked whether overexpression of YTHDF1, YTHDF2 or YTHDF3 might enhance HIV-1 gene expression, presumably by facilitating the recruitment of these proteins to viral m6A editing sites. As shown in
We next tested whether overexpression or reduced expression of the YTHDF proteins would affect HIV-1 replication in CD4+ T cells. Western analysis of the expression of these three proteins showed a readily detectable level of expression of YTHDF2, low expression of YTHDF1 and no detectable expression of YTHDF3 in the CD4+ T-cell line CEM-SS (data not shown) and we therefore focused our attention on YTHDF2.
To examine how YTHDF2 affects HIV-1 replication in culture, we generated two subclones of CEM-SS, one in which the endogenous YTHDF2 gene was mutationally inactivated using CRISPR/Cas (Shalem et al., 2014) (Y2-KO) and a second cell line that overexpresses YTHDF2 by ~2-fold after transduction with a lentiviral YTHDF2 expression vector (Y2-OE). Analysis of these two cell lines, and a control CEM-SS cell line transduced with a GFP-expressing lentivirus, revealed comparable levels of CD4 and CXCR4 expression on their cell surface (
Although m6A editing of viral mRNAs was first reported 40 years ago (Krug et al., 1976), the ability to precisely map these editing sites has only recently been achieved. Here, we have focused on the pathogenic human lentivirus HIV-1. We first used an in vitro technique, PA-m6A-seq (Chen et al., 2015), to map m6A editing sites on the genome of the HIV-1 laboratory strain NL4-3 using an m6A-specific antibody (
Because all the m6A editing sites identified on the HIV-1 genome were located proximal to the viral polyadenylation site, in the 3′UTR region of many or all viral mRNAs, we next asked whether substitution of wildtype or m6A-deficient forms of the HIV-1 3′ UTR downstream of an indicator gene would affect its expression. As shown in
Several of the m6A editing sites mapped to the HIV-1 RNA genome were localized to sequences that are required for HIV-1 replication for other, unrelated reasons, including the overlap between the env gene and the second coding exon of rev, the LTR NF-κB binding sites and TAR, and mutational perturbation of any one of these would therefore be likely to reduce viral replication, thus making interpretation of any loss of viral fitness upon mutation of the viral m6A sites difficult. To circumvent this problem, and given our evidence that m6A sites primarily act to recruit YTHDF proteins (
If the m6A editing sites in the HIV-1 genome are important for maximizing virus replication, then one would predict that these would be conserved. The four m6A editing sites identified in the NL4-3 laboratory strain of HIV-1 were indeed found to be conserved in the primary isolates BaL and JR-CSF, though interestingly these also contained one or two additional m6A editing sites not seen in NL4-3 (
The TAR RNA hairpin forms part of the HIV-1 “R” region and is therefore present at both ends of the viral RNA genome (Hauber and Cullen, 1988). Many of the reads obtained during the YTHDF protein PAR-CLIP experiments extend past the R region into U3, thus demonstrating that the 3′ TAR is m6A edited (
Our observation that m6A editing in 3′ UTRs, and the direct recruitment of the YTHDF proteins to 3′UTRs, can significantly enhance the level of mRNA expression and, hence, protein production contrasts with a previous paper arguing that YTHDF2 can destabilize bound mRNAs (Wang et al., 2015). We note, however, that earlier work had suggested that loss of m6A correlates with the reduced expression of edited transcripts (Dominissini et al., 2012), which is consistent with our data. While the location of m6A residues on a given mRNA, or perhaps their sequence context, could certainly regulate how m6A affects mRNA function, we do not believe that the positive effect of m6A residues present in the 3′UTR is a unique attribute of HIV-1, as several cellular m6A editing sites exerted a similar positive effect (
If m6A is indeed important for viral replication, then the question arises whether a drug that inhibits m6A editing in HIV-1, or indeed other viruses, could act as an effective antiviral. Such a drug does in fact exist. Specifically, 3-deazaadenosine (DAA) has been shown to block m6A addition to mRNA substrates by blocking the hydrolysis of S-adenosylhomocysteine, a competitive inhibitor of S-adenosylmethionine, the methyl donor used by the METTL3/METTL14/WTAP complex (Chiang, 1998). Interestingly, DAA has also been reported to inhibit the replication of a range of viruses, including RSV, IAV and HIV-1, all of which display extensive m6A editing, though the mechanism of inhibition by DAA has remained uncertain (Bader et al., 1978; Fischer et al., 1990; Flexner et al., 1992). As shown in
Western blots used the following primary antibodies: HIV-1 p24 (AIDS Reagent Program-3517), YTHDF2 (SC-162427, Santa Cruz), Actin (SC-47778, Santa Cruz), FLAG (F1804, Sigma) and HIV-1 Nef (AIDS Reagent Program-2949). ELISAs utilized an HIV-1 p24 antigen capture kit (ABL Catalog #5421 and 5447). Total poly(A)+ RNA was purified using Ambion Poly(A)Purist MAG kits.
cDNAs encoding full-length, FLAG-tagged forms of the three YTHDF proteins were obtained by PCR from a human cDNA library and were then used to generate pLEX-based lentiviral vectors. For the YTHDF-MS2 coat protein fusions, pcDNA3 was modified to express pcGFP/MS2, pcYTHDF1/MS2, pcYTHDF2/MS2 and pcYTHDF3/MS2 chimeric proteins using the same YTHDF templates. The coordinates of the included N-terminal YTHDF segments are as follows: YTHDF1 (1-382), YTHDF2 (1-401), and YTHDF3 (1-409). The open reading frame for the MS2 bacteriophage coat protein was PCR amplified from pMS2-p65-HSF (Addgene, #61426). Four copies of the MS2 RNA aptamer were inserted into psiCHECK2 (Promega) to generate the psiCHECK2-4XMS2 reporter plasmid. For the m6A site indicator plasmids, inserts were synthesized with the predicted methyl acceptor adenosines mutated to guanosine. These m6A site mutant inserts, and the analogous WT inserts, were then cloned into psiCHECK2 (Promega) via the XhoI and NotI sites. The HIV-1 U3/NF-κB/TAR insert starts 34 bp 5′ of NF-κB site II in U3 and spans the entire R region, including TAR, before terminating 26 bp into the LTR U5 region. The 3′UTR construct has an identical 3′ terminus and initiates at the BamHI site in pNL4-3. All cellular m6A indicator constructs were generated by insertion of oligonucleotides encompassing full-length cellular m6A acceptor sites, in their wildtype or mutated form, into the 3′UTR of RLuc in psiCHECK2.
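The design of the m6A site mutant inserts described above, in which each predicted methyl acceptor adenosine is replaced by guanosine, can be illustrated with a minimal sketch. The function name, the example sequence and the positions below are hypothetical and chosen only for illustration; they are not the actual HIV-1 or cellular inserts, and the source does not state how the mutant sequences were derived computationally, if at all.

```python
def mutate_acceptor_adenosines(insert: str, positions: list[int]) -> str:
    """Return a copy of `insert` with the A at each 0-based position changed to G.

    `positions` is assumed to hold the predicted methyl acceptor adenosines;
    a ValueError is raised if a listed position is not an adenosine.
    """
    bases = list(insert.upper())
    for pos in positions:
        if bases[pos] != "A":
            raise ValueError(f"position {pos} is {bases[pos]!r}, not an adenosine")
        bases[pos] = "G"
    return "".join(bases)


# Hypothetical 16-nt insert with assumed acceptor adenosines at positions 4 and 11.
wt_insert = "TTGGACTAAGGACATT"
mutant_insert = mutate_acceptor_adenosines(wt_insert, [4, 11])
print(wt_insert)      # wildtype insert
print(mutant_insert)  # same insert with the two listed adenosines changed to G
```

Raising an error when a listed position is not an adenosine guards against off-by-one mistakes in the site coordinates before an insert is ordered for synthesis.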
293T cells were cultured in Dulbecco's Modified Eagle Medium (DMEM) containing 10% fetal bovine serum (FBS) and antibiotics. CEM-SS cells were cultured in RPMI 1640 containing 10% FBS and antibiotics. HIV-1 was produced by transfection of 293T cells with the pNL4-3 molecular clone; at 72 h post-transfection, supernatant media were harvested, clarified by centrifugation and then filtered through a 0.45 μm filter (PALL). To prepare vesicular stomatitis virus glycoprotein (VSV-G) pseudotyped virus, pVSV-G was transfected at a 1:10 ratio relative to an HIV-1 proviral expression vector encoding NL4-3, BaL or JR-CSF. The supernatant media were harvested 72 h later, as described above. 293T cells were infected with the HIV-1 virus stock overnight, and fresh media were added the next morning. CEM-SS sub-clones were infected with HIV-1 overnight, then washed with PBS and resuspended in fresh media the next morning. Samples for p24 ELISA and Western analysis were collected over time from 6 ml infections per condition/biological replicate.
Clonal YTHDF expressing 293T cell lines were produced by transduction with a constitutive lentiviral YTHDF expression vector followed by selection for the encoded puromycin resistance marker. Resistant cells were then sub-cloned by limiting dilution. CEM-SS (NIH AIDS Reagent Program catalog #776) overexpressing YTHDF2 were also obtained by lentiviral transduction, and puromycin resistant cells then sub-cloned by limiting dilution. YTHDF2 overexpression was confirmed by Western. YTHDF2 knockout CEM-SS cells were obtained by transduction with lentiCRISPRv2, with the sgRNA sequence 5′-GGAACCTTACTTGAGTCCAC-3′, obtained from a published library (Shalem et al., 2014), and were cloned by limiting dilution. The control for these cell lines was a puromycin selected GFP-expressing CEM-SS sub-clone.
PAR-CLIP was performed as described (Hafner et al., 2010). The three clonal 293T cell lines expressing FLAG-YTHDF proteins, or FLAG-GFP as a control, were infected with HIV-1 NL4-3 pseudotyped with VSV-G, incubated for 48 h and then pulsed with 100 μM 4SU in fresh media for 16 h. The cells were then harvested and the PAR-CLIP protocol performed. JR-CSF and BaL infections were conducted similarly. CEM-SS cells were infected with HIV-1, 4SU pulsed, total poly(A)+ RNA purified, and the rest of the PA-m6A-Seq protocol performed as described using an m6A specific polyclonal antibody (SySy). For the indicator plasmid PAR-CLIP experiment shown in
PAR-CLIP libraries were sequenced on a HiSeq 2000; base calling was performed with CASAVA, and reads were processed with the FASTX-Toolkit (available at hannonlab.cshl.edu/fastx_toolkit). Reads >14 bp in length were used for bioinformatic analysis. All alignments were performed with Bowtie (Langmead et al., 2009). Reads were initially aligned to the human genome build hg19 allowing up to 1 mismatch, and unaligned reads were then aligned to the HIV-1 genome of interest, again with 1 mismatch. The HIV-1 aligned reads exhibited a substantial enrichment of reads containing T>C mutations when derived from cells expressing one of the YTHDF proteins (
The raw sequencing data obtained from small RNA deep sequencing have been submitted to the NCBI Gene Expression Omnibus (GEO) and are available through accession number GSE77890.
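The read-length filter and the T>C screen described above can be illustrated with a small, self-contained sketch. It is not the pipeline used in this work, which relied on the FASTX-Toolkit and Bowtie; it assumes ungapped alignments given as (read, 0-based start) pairs, and the reference and reads below are made-up toy sequences. All function names are hypothetical.

```python
MIN_READ_LENGTH = 15  # reads >14 bp in length are retained


def passes_length_filter(read: str) -> bool:
    """Keep only reads longer than 14 bases."""
    return len(read) >= MIN_READ_LENGTH


def t_to_c_positions(read: str, reference: str, start: int) -> list[int]:
    """Return 0-based reference positions where the reference has T and the read has C.

    Assumes an ungapped alignment of `read` beginning at `start`.
    """
    window = reference[start:start + len(read)]
    return [
        start + i
        for i, (ref_base, read_base) in enumerate(zip(window, read))
        if ref_base == "T" and read_base == "C"
    ]


def fraction_with_t_to_c(alignments: list[tuple[str, int]], reference: str) -> float:
    """Fraction of length-filtered reads that carry at least one T>C mismatch."""
    kept = [(read, start) for read, start in alignments if passes_length_filter(read)]
    if not kept:
        return 0.0
    hits = sum(1 for read, start in kept if t_to_c_positions(read, reference, start))
    return hits / len(kept)


# Toy data (hypothetical): one read with a T>C change, one exact match, one short read.
reference = "ACGTTGACCTAGGATCCATG"
alignments = [
    ("ACGCTGACCTAGGAT", 0),  # T>C at reference position 3
    ("GTTGACCTAGGATCC", 2),  # exact match
    ("ACGC", 0),             # too short, removed by the length filter
]
print(fraction_with_t_to_c(alignments, reference))
```

Comparing this fraction between libraries, for example a FLAG-YTHDF clone against the FLAG-GFP control, is one way to express the T>C enrichment that marks crosslinked 4SU residues.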
HIV-1 based indicators were transfected into 293T or CEM-SS cells utilizing the polyethylenimine (PEI) and Lipofectamine LTX (Invitrogen) transfection methods, respectively. Cells were harvested 48 h later and either lysed in Passive Lysis Buffer (PLB; Promega Dual Luciferase Kit) for protein extraction or extracted with TRIzol for total RNA. Protein lysates were analyzed for RLuc and FLuc levels using a Dual Luciferase Assay Kit (Promega). Total RNA was reverse-transcribed using a SuperScript III kit (Invitrogen) followed by SYBR green qPCR of cDNAs utilizing RLuc, FLuc, and GAPDH mRNA specific primers. RLuc mRNA abundance was determined by normalizing first to the endogenous GAPDH mRNA and then to the control FLuc mRNA. For the tethering assays, 293T cells were transfected with 50 ng psiCHECK2 or the psiCHECK2-4XMS2 reporter and 500 ng pcGFP/MS2, pcYTHDF1/MS2, pcYTHDF2/MS2 or pcYTHDF3/MS2 using PEI. Cells were harvested 72 h post-transfection and analyzed for RLuc (reporter) and FLuc (internal control) activity using the Dual-Luciferase Assay.
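The two-step normalization of RLuc mRNA described above can be sketched as follows. The sketch assumes quantification by the 2^-ΔCt method with roughly 100% primer efficiency, which is an assumption and is not stated in this description; the function names and Ct values are hypothetical.

```python
def relative_abundance(ct_target: float, ct_reference: float) -> float:
    """Abundance of a target relative to a reference, assuming 2^-(delta Ct)."""
    return 2.0 ** -(ct_target - ct_reference)


def normalized_rluc_mrna(ct_rluc: float, ct_fluc: float, ct_gapdh: float) -> float:
    """Normalize RLuc mRNA first to endogenous GAPDH, then to the control FLuc mRNA."""
    rluc_vs_gapdh = relative_abundance(ct_rluc, ct_gapdh)
    fluc_vs_gapdh = relative_abundance(ct_fluc, ct_gapdh)
    return rluc_vs_gapdh / fluc_vs_gapdh


# Hypothetical Ct values for a wildtype and an m6A site mutant indicator.
wt = normalized_rluc_mrna(ct_rluc=20.0, ct_fluc=21.0, ct_gapdh=18.0)
mutant = normalized_rluc_mrna(ct_rluc=21.0, ct_fluc=21.0, ct_gapdh=18.0)
print(wt / mutant)  # fold difference in normalized RLuc mRNA, wildtype over mutant
```

Because FLuc is expressed from the same psiCHECK2 plasmid as RLuc, dividing by the FLuc value corrects for differences in transfection efficiency, while the GAPDH step corrects for differences in RNA input.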
While studying the role of methylation of adenosine at the N6 position (m6A), we noted that over-expression of the human protein YTHDF2 in the human cell line 293T, and to a lesser extent overexpression of the related human proteins YTHDF1 and YTHDF3, substantially enhanced the production of HIV-1 viral proteins and mRNAs in this cell line (See, e.g.,
To address the unlikely possibility that the greatly increased expression of IAV proteins seen in A549 cells expressing ectopic human YTHDF1 (
The present application claims the benefit of priority to U.S. Provisional Patent Application No. 62/318,868, filed on Apr. 6, 2016, and U.S. Provisional Patent Application No. 62/361,282, filed on Jul. 12, 2016, the contents of which are incorporated herein by reference in their entireties.
This invention was made with government support under grant number R01-AI117780 awarded by the National Institutes of Health. The United States government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US17/26391 | 4/6/2017 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62318868 | Apr 2016 | US | |
62361282 | Jul 2016 | US |