The Sequence Listing written in file “Sequence Listing for 81906-907996 (217410US).txt”, created on Dec. 22, 2014 and containing 3,336 bytes, machine format IBM-PC, MS-Windows operating system is hereby incorporated by reference in its entirety for all purposes.
Small noncoding RNAs (sncRNAs) mediate a variety of cellular functions in animals and plants. It has been discovered using deep sequencing that sncRNAs circulate in the blood of humans and other mammals. The most abundant types of circulating sncRNAs are microRNAs (miRNAs), 5′ transfer RNA (tRNA) halves, and YRNA fragments, with minute amounts of other types. It has been suggested that some sncRNAs are specifically processed and secreted as macromolecular complexes to protect the non-coding RNAs from degradation.
Properties of circulating sncRNAs are consistent with a possible role as signaling molecules. For instance, it has been shown that circulating miRNAs can enter cells and regulate cellular functions.
5′ tRNA halves are derived from a small subset of tRNAs, implying that they are produced by tRNA type-specific biogenesis and/or release. The 5′ tRNA halves are not in exosomes or microvesicles, but circulate as particles of 100-300 kDa. The size of these particles suggests that the 5′ tRNA halves are a component of a macromolecular complex; this is supported by the loss of 5′ tRNA halves from serum or plasma treated with EDTA, a chelating agent, but their retention in plasma anticoagulated with heparin or citrate. A survey of somatic tissues reveals that 5′ tRNA halves are concentrated within blood cells and hematopoietic tissues, but scant in other tissues, suggesting that they may be produced by blood cells.
Full-length YRNAs are small (84-112 nt) RNAs with poorly characterized functions, best known because they make up part of the Ro ribonucleoprotein autoantigens in connective tissue diseases. The present inventors have discovered YRNA fragments of lengths 27 nt and 30-33 nt, derived from the 5′ ends of specific YRNAs, and generated by cleavage within a predicted internal loop. These 5′ YRNA fragments make up a large proportion of all small RNAs (including miRNAs) present in human serum. They are also present in plasma, are not present in exosomes or microvesicles, and circulate as part of a complex with a mass between 100 and 300 kDa.
Studies have also shown that sncRNAs may serve as markers of health and disease states. For example, serum levels of specific sncRNAs such as 5′ tRNA halves change markedly with age. Additionally, caloric restriction can mitigate these age-related changes, thereby indicating that sncRNA levels are under physiologic control. The inventors have discovered that levels of circulating tRNA-derived and YRNA-derived fragments correlate to the presence of breast cancer.
There is a need in the pertinent field for non-invasive methods for detection of healthy and disease states, including various types of cancer, such as breast cancer. There is also a need for measuring circulating small noncoding RNAs. The present invention satisfies these needs and provides related advantages as well.
The present invention is based, in part, on the discovery of two types of small noncoding RNA molecules (5′ tRNA halves and YRNA fragments) found in the circulating blood (e.g., serum or plasma) of a mammal (e.g., human). The 5′ tRNA halves are derived from the 5′ end of a subset of tRNAs and correspond to the first 27, 28, 29, 30, 31, 32, 33, 34, or even 35 nucleotides of a tRNA gene sequence (e.g., any one of those named in Table 3 or Table 4). They are found in serum as particles of about 100-300 kDa, being a part of a macromolecular structure (e.g., in complex with one or more proteins) but not in exosomes or microvesicles. These 5′ tRNA halves are also found within blood cells and hematopoietic tissues, indicating their origin as being produced by blood cells. The inventors observed that the serum levels of these small RNAs change markedly, either increase or decrease, with age (see, e.g., Table 3 or Table 4), and that such change can be mitigated by calorie restrictions.
The second type of small noncoding RNA molecules identified by the inventors are YRNA fragments. Full-length YRNAs are small (84-112 nt) RNAs that make up part of the Ro ribonucleoprotein autoantigens in connective tissue diseases; the fragments correspond to the first 27 or 30-33 nucleotides of a YRNA gene sequence (e.g., any one of those named in Table 5 or provided herein). In surveying small RNAs present in the serum of healthy adult humans, the inventors have discovered YRNA fragments that are derived from the 5′ ends of specific YRNAs which were previously either annotated as pseudogenes or predicted informatically. These fragments are generated by cleavage within a predicted internal loop. The 5′ YRNA fragments provided herein make up a large proportion of all small RNAs (including miRNAs) present in human serum. They are also present in plasma, but are not in exosomes or microvesicles. Like 5′ tRNA halves, YRNA fragments circulate as part of a complex with a mass between 100 and 300 kDa.
The inventors have observed that the serum levels of these small RNAs can increase or decrease with the presence of cancer such as breast cancer, aging and caloric restriction. As such, the present invention provides novel markers and non-invasive means for monitoring an individual's health status such as aging, potential longevity, and presence/risk of disease such as cancer, infectious diseases, cardiovascular diseases, neurodegenerative disorders including Alzheimer's disease, Huntington's disease, etc., especially in comparison with one or more other individuals with known health/aging/caloric intake status.
In the first aspect, the present invention provides novel polynucleotides (e.g., small RNA molecules) that each corresponds to a section of a tRNA having the polynucleotide sequence of the first 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides starting from the 5′ end of the tRNA sequence, or a complement thereof. Tables 3 and 4 provide a list of these tRNAs. The invention also provides polynucleotide sequences that are complementary to the small RNA sequences, as such complementary sequences can be useful for detecting these small RNA molecules.
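The relationship between a parent sequence, its 5′ fragments, and their complements can be illustrated with a minimal Python sketch. The function names and the 40-nt placeholder sequence are hypothetical and chosen only for illustration; the placeholder is not a sequence from Tables 3-5 or the Sequence Listing.

```python
# Watson-Crick partners for the four standard RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def five_prime_fragments(rna, lengths=range(27, 36)):
    """Map each length (27-35 nt by default) to the first `length` nucleotides
    of `rna`, read from the 5' end. Lengths longer than `rna` are skipped."""
    rna = rna.upper()
    return {n: rna[:n] for n in lengths if n <= len(rna)}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence, i.e. the strand that
    would base-pair with it (useful as a detection probe)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


# Hypothetical 40-nt placeholder; NOT a real tRNA or YRNA sequence.
PLACEHOLDER = "ACGU" * 10

fragments = five_prime_fragments(PLACEHOLDER)
probe = reverse_complement(fragments[30])  # complement of the 30-nt fragment
```

Under these assumptions, `fragments` holds nine candidate fragments (one per length from 27 to 35 nt), and `probe` is the complementary sequence for the 30-nt fragment.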
In the second aspect, the present invention provides novel polynucleotides (e.g., small RNA molecules) that each corresponds to a section of a YRNA having the polynucleotide sequence of the first 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides starting from the 5′ end of the YRNA sequence, or a complement thereof. Table 5 provides a list of these YRNAs. The invention also provides polynucleotide sequences that are complementary to the small RNA sequences, as such complementary sequences can be useful for detecting these small RNA molecules.
In the third aspect, the present invention provides a polynucleotide probe including a tRNA half or YRNA fragment described herein and a detectable moiety. The sncRNA can be conjugated (e.g., linked) to the detectable label.
In the fourth aspect, the present invention provides a kit for detecting a polynucleotide having the nucleotide sequence corresponding to the first 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides starting from the 5′ end of a tRNA provided in Table 3 or Table 4, or a YRNA provided in Table 5, or the complement thereof. The kit in some cases includes appropriate primers for amplifying a tRNA half or YRNA fragment as described herein. The kit may also contain a control that provides a sample of the polynucleotide or a complement thereof and the polynucleotide probe described above. As the kit may be used for diagnostic purposes as described herein, in some embodiments, the kit may further include a standard control in which the target tRNA half or YRNA fragment of the kit is at a concentration of a known state of health/age/caloric intake.
In another aspect, the present invention provides an expression cassette (e.g., expression vector) that includes a promoter, e.g., a heterologous promoter, that is operably linked to the polynucleotide described herein. The expression cassette can be introduced (e.g., transformed or transfected) into a host cell such as a eukaryotic cell or a prokaryotic cell. Alternatively, the expression cassette can be introduced into a stable cell line. In some embodiments, the expression cassette is introduced into a human cell.
In yet another aspect, the present invention provides a method for quantitating a polynucleotide having a nucleotide sequence corresponding to the first 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides of a tRNA provided in Tables 3 or 4, or a YRNA provided in Table 5; or a complement thereof. The method includes extracting nucleic acids (e.g., RNA) from a biological sample and measuring the level of the polynucleotide in the extract. In some cases, the step of measuring comprises an amplification reaction. In other cases, the step of measuring comprises sequencing. The biological sample can be whole blood, serum, plasma, saliva, mucus, urine, cerebrospinal fluid, nipple fluid, or another bodily fluid. Optionally, the biological sample can be a tissue sample such as breast tissue, hematopoietic tissue and lymphoid tissue, from, e.g., a biopsy.
In another aspect, the present invention provides a method for determining or monitoring the health status of a mammal based on the level of at least one polynucleotide or complement thereof in a biological sample taken from the mammal (e.g., a human patient). The method includes quantitating at least one polynucleotide that has a nucleotide sequence corresponding to the first 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides of a tRNA provided in Table 3 or Table 4 or a YRNA gene provided in Table 5; or a complement thereof in the sample. The method also includes comparing the level to that of a control sample and concluding that the health status of the mammal is better or worse than the control if the level of the polynucleotide(s) is greater than that of the control sample. In some cases, the mammal is a human being. In some cases, the health status is aging status and/or predicted longevity. In some cases, the health status is caloric intake, especially in relation with caloric consumption by the mammal (e.g., after subtraction of the number of calories consumed due to physical/physiological activity during the same time period). In other instances, the health status is the risk or presence of breast cancer. In some cases, the health status is the presence or risk of certain diseases, for example, various types of cancer, infectious diseases, cardiovascular diseases, neurodegenerative disorders including but not limited to Alzheimer's disease, Huntington's disease, etc. In some cases, the biological sample is blood, serum, or plasma. In other cases, the biological sample is blood cells and hematopoietic tissues (e.g., leukocytes). Depending on the specific small RNA marker, as shown in Table 3, an increase or decrease can indicate a relatively better/improved health status or more restricted calorie intake. An increase or a decrease in the level of the specific small RNA marker, as shown in Tables 4 and 5, can indicate the presence of breast cancer.
Once a diagnosis is made that a subject being tested has or is at risk of later developing a disorder among those named above, the subject should be given treatment for the disorder or regularly monitored for the onset of the disorder such that preventive and/or therapeutic measures can be taken as appropriate.
Typically, the determining and monitoring is based on comparing the level of one or more small RNA molecules found in a biological sample taken from a mammal (e.g., a human) with the level of the same small RNA marker(s) found in the same type of tissue or cell sample taken from another mammal (i.e., a control subject of the same species, often the same gender, with known age and health status, such as predicted longevity, presence/absence/risk of certain diseases, and caloric intake over consumption) to establish a comparison in terms of an increased or decreased level, which in turn provides indication of more or less advanced aging process, better or worse disease state/risk, in relation to the control subject. In some cases, the monitoring is achieved by comparing the levels of one or more small RNA marker(s) in the same individual's samples taken at two or more different times to establish a comparison, and the detected increase/decrease (or lack thereof) will indicate the changes (or lack thereof) in the individual's health status during the period marked by the times when the samples were taken. Once a conclusion is reached regarding the individual's health status, either comparing with a control subject or comparing with the individual him/herself at an earlier time, additional steps in terms of therapeutic and preventive measures may be taken to remedy any undesirable effects, such as by changing caloric intake/consumption, changing lifestyle to prevent/minimize risk of certain diseases, starting treatment for conditions such as cancer or neurodegenerative diseases, or maintaining a routine of regular medical examination for early detection and intervention of any relevant medical conditions.
In yet another aspect, the present invention provides a kit for determining or monitoring the health status of a mammal (e.g., a human). The kit contains agents for detecting one or more small RNA markers (e.g., those having a nucleotide sequence corresponding to the first 30-35 nucleotides of the tRNA listed in Tables 3 and 4, or those having a nucleotide sequence corresponding to the first 27-35 nucleotides of the YRNA listed in Table 5), such as by performing an amplification reaction (e.g., polymerase chain reaction or PCR and reverse transcription polymerase chain reaction or RT-PCR) to identify the RNA marker. In some cases, the agent for detection may include the polynucleotide probe described above. The kit may also contain a standard control sample, which provides the standard value(s) of the marker(s) from a particular tissue/cell sample from a subject of known health status such as aging, disease presence/risk, and caloric intake (in relation to caloric consumption). Optionally, an instruction manual is also provided in the kit.
As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
In this disclosure the term “or” is generally employed in its sense including “and/or” unless the content clearly dictates otherwise.
The term “nucleic acid” or “polynucleotide” refers to deoxyribonucleic acids (DNA) or ribonucleic acids (RNA) and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, SNPs, and complementary sequences as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)).
The term “gene” means the segment of DNA involved in producing an RNA or polypeptide chain. It may include regions preceding and following the non-coding region. It may also include regions preceding and following the coding region (leader and trailer) as well as intervening sequences (introns) between individual coding segments (exons).
The term “nucleotide” covers naturally occurring nucleotides as well as nonnaturally occurring nucleotides. It should be clear to the person skilled in the art that various nucleotides which previously have been considered “non-naturally occurring” have subsequently been found in nature. Thus, “nucleotides” includes not only the known purine and pyrimidine heterocycles-containing molecules, but also heterocyclic analogues and tautomers thereof. Illustrative examples of other types of nucleotides are molecules containing adenine, guanine, thymine, cytosine, uracil, purine, xanthine, diaminopurine, 8-oxo-N6-methyladenine, 7-deazaxanthine, 7-deazaguanine, N4,N4-ethanocytosin, N6,N6-ethano-2,6-diaminopurine, 5-methylcytosine, 5-(C3-C6)-alkynylcytosine, 5-fluorouracil, 5-bromouracil, pseudoisocytosine, 2-hydroxy-5-methyl-4-triazolopyridin, isocytosine, isoguanin, inosine and the “non-naturally occurring” nucleotides described in U.S. Pat. No. 5,432,272. The term “nucleotide” is intended to cover every and all of these examples as well as analogues and tautomers thereof. Especially interesting nucleotides are those containing adenine, guanine, thymine, cytosine, and uracil, which are considered as the naturally occurring nucleotides in relation to therapeutic and diagnostic application in humans. Nucleotides include the natural 2′-deoxy and 2′-hydroxyl sugars, e.g., as described in Kornberg and Baker, DNA Replication, 2nd Ed. (Freeman, San Francisco, 1992) as well as their analogs.
In this disclosure the term “isolated” nucleic acid molecule means a nucleic acid molecule that is separated from other nucleic acid molecules that are usually associated with the isolated nucleic acid molecule. Thus, an “isolated” nucleic acid molecule includes, without limitation, a nucleic acid molecule that is free of nucleotide sequences that naturally flank one or both ends of the nucleic acid in the genome of the organism from which the isolated nucleic acid is derived (e.g., a cDNA or genomic DNA fragment produced by PCR or restriction endonuclease digestion). Such an isolated nucleic acid molecule is generally introduced into a vector (e.g., a cloning vector or an expression vector) for convenience of manipulation or to generate a fusion nucleic acid molecule. In addition, an isolated nucleic acid molecule can include an engineered nucleic acid molecule such as a recombinant or a synthetic nucleic acid molecule. A nucleic acid molecule existing among hundreds to millions of other nucleic acid molecules within, for example, a nucleic acid library (e.g., a cDNA or genomic library) or a gel (e.g., agarose, or polyacrylamide) containing restriction-digested genomic DNA, is not an “isolated” nucleic acid.
“Purified polynucleotide” or “isolated polynucleotide” refers to a polynucleotide of interest or fragment thereof which is essentially free, e.g., contains less than about 50%, preferably less than about 70%, and more preferably less than about 90%, of the protein with which the polynucleotide is naturally associated. Techniques for purifying polynucleotides of interest are well-known in the art and include, for example, disruption of the cell containing the polynucleotide with a chaotropic agent and separation of the polynucleotide(s) and proteins by ion-exchange chromatography, affinity chromatography and sedimentation according to density.
“Analogs” in reference to nucleotides includes synthetic nucleotides having modified base moieties and/or modified sugar moieties. Such analogs include synthetic nucleotides designed to enhance binding properties, e.g., duplex or triplex stability, specificity, or the like.
“Complementary,” as used herein, refers to the capacity for precise pairing between two nucleotides on one or two oligomeric strands. For example, if a nucleobase at a certain position of an antisense compound is capable of hydrogen bonding with a nucleobase at a certain position of a target nucleic acid, said target nucleic acid being a DNA, RNA, or oligonucleotide molecule, then the position of hydrogen bonding between the oligonucleotide and the target nucleic acid is considered to be a complementary position. The oligomeric compound and the further DNA, RNA, or oligonucleotide molecule are complementary to each other when a sufficient number of complementary positions in each molecule are occupied by nucleotides which can hydrogen bond with each other. Thus, “specifically hybridizable” and “complementary” are terms which are used to indicate a sufficient degree of precise pairing or complementarity over a sufficient number of nucleotides such that stable and specific binding occurs between the oligomeric compound and a target nucleic acid.
“Percentage of sequence identity” is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (e.g., a polypeptide of the invention), which does not comprise additions or deletions, for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
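The calculation just described can be sketched in Python as follows. This is a minimal illustration that assumes the two sequences have already been optimally aligned and padded with `-` for gaps, so that both strings span the same comparison window; the function name and example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over a comparison window of two pre-aligned sequences.

    Matched positions are columns where both sequences carry the same residue
    (gap columns never count as matches). The count is divided by the total
    number of positions in the window and multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matched / len(aligned_a)


# Hypothetical 10-column window: 8 matching columns, 1 gap, 1 mismatch.
identity = percent_identity("ACGU-ACGUA", "ACGUUACGAA")  # 80.0
```

Producing the optimal alignment itself is a separate step, typically delegated to an alignment program; this sketch only covers the counting step.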
The terms “identical” or percent “identity,” in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same sequences. Two sequences are “substantially identical” if two sequences have a specified percentage of amino acid residues or nucleotides that are the same (i.e., 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity over a specified region, or, when not specified, over the entire sequence of a reference sequence), when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection. Optionally, the identity exists over a region that is at least about 10, 15, 25 or 50 nucleotides in length, or over the full length of the reference sequence.
For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.
Two examples of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1997) Nuc. Acids Res. 25:3389-3402, and Altschul et al. (1990) J. Mol. Biol. 215:403-410, respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=−4 and a comparison of both strands.
For amino acid sequences, the BLASTP program uses as defaults a wordlength of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. USA 89:10915), alignments (B) of 50, expectation (E) of 10, M=5, N=−4, and a comparison of both strands.
The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
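The three halting conditions for extending a nucleotide word hit can be illustrated with a simplified Python sketch of a rightward, ungapped extension. It is not the BLAST implementation: it uses the M=5 and N=−4 scores given above, but the function name and the X value of 20 are assumptions made only for this example.

```python
def extend_right(query, subject, q_start, s_start,
                 match=5, mismatch=-4, x_drop=20):
    """Extend an ungapped alignment rightward from (q_start, s_start).

    Returns (best_score, best_length): the maximum cumulative score reached
    and the number of aligned positions at which it was reached.
    `x_drop` is an assumed illustrative value, not a documented default.
    """
    score = best_score = 0
    best_length = 0
    i = 0
    # Halting condition 3: the end of either sequence is reached.
    while q_start + i < len(query) and s_start + i < len(subject):
        same = query[q_start + i] == subject[s_start + i]
        score += match if same else mismatch
        i += 1
        if score > best_score:
            best_score, best_length = score, i
        # Halting condition 2: the cumulative score goes to zero or below.
        # Halting condition 1: the score falls off by X from its maximum.
        if score <= 0 or best_score - score >= x_drop:
            break
    return best_score, best_length


# Eight identical positions: 8 x 5 = 40, extension stops at the sequence end.
full = extend_right("ACGUACGU", "ACGUACGU", 0, 0)
# Four matches (score 20), then mismatches; stops once the drop reaches X=8.
partial = extend_right("ACGUAAAA", "ACGUCCCC", 0, 0, x_drop=8)
```

A leftward extension would mirror this loop with decreasing indices, and the two best scores would be added to the seed score to give the HSP score.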
The term “variant” refers to biologically active derivatives of the reference molecule that retain desired activity. In general, the term “variant” refers to molecules (e.g., small non-coding RNAs, microRNAs, tRNAs, YRNAs) having a native sequence and structure with one or more additions, substitutions (generally conservative in nature) and/or deletions, relative to the native molecule, so long as the modifications do not destroy biological activity and which are “substantially homologous” to the reference molecule. In general, the sequences of such variants will have a high degree of sequence homology to the reference sequence, e.g., sequence homology of more than 50%, generally more than 60%-70%, even more particularly 80%-85% or more, such as at least 90%-95% or more, when the two sequences are aligned.
“Recombinant” as used herein to describe a nucleic acid molecule means a polynucleotide of genomic, cDNA, viral, semisynthetic, or synthetic origin which, by virtue of its origin or manipulation, is not associated with all or a portion of the polynucleotide with which it is associated in nature. The term “recombinant” as used with respect to a protein or polypeptide means a polypeptide produced by expression of a recombinant polynucleotide. In general, the gene of interest is cloned and then expressed in transformed organisms, as described further below. The host organism expresses the foreign gene to produce the protein under expression conditions.
The term “transformation” refers to the insertion of an exogenous polynucleotide into a host cell, irrespective of the method used for the insertion. For example, direct uptake, transduction or f-mating are included. The exogenous polynucleotide may be maintained as a non-integrated vector, for example, a plasmid, or alternatively, may be integrated into the host genome.
An “expression vector” or “expression cassette” is capable of transferring nucleic acid sequences to target cells (e.g., viral vectors, non-viral vectors, particulate carriers, and liposomes). Typically, “vector expression cassette” and “expression vector” refer to any nucleic acid construct capable of directing the expression of a nucleic acid of interest and which can transfer nucleic acid sequences to target cells. Thus, the term includes cloning and expression vehicles, as well as viral vectors.
“Recombinant host cells”, “host cells,” “cells”, “cell lines,” “cell cultures”, and other such terms denoting microorganisms or higher eukaryotic cell lines cultured as unicellular entities refer to cells which can be, or have been, used as recipients for a recombinant vector or other transferred DNA, and include the progeny of the original cell which has been transfected.
“Operably linked” refers to an arrangement of elements wherein the components so described are configured so as to perform their usual function. Thus, a given promoter operably linked to a coding or non-coding sequence is capable of effecting the expression of the coding or non-coding sequence when the proper enzymes are present. Expression is meant to include the transcription of any one or more of a small non-coding RNA (e.g., microRNA, siRNA, piRNA, snRNA, and lncRNA), an antisense nucleic acid, or an mRNA from a DNA or RNA template, and can further include translation of a protein from an mRNA template. The promoter need not be contiguous with the coding or non-coding sequence, so long as it functions to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between the promoter sequence and the coding or non-coding sequence and the promoter sequence can still be considered “operably linked” to the coding or non-coding sequence.
The phrase “differentially expressed” refers to differences in the quantity and/or the frequency of a biomarker present in a sample taken from patients having, for example, cancer, caloric restriction, or age-related disease, as compared to a control subject. For example, a biomarker can be a YRNA-derived fragment which is present at an elevated level or at a decreased level in samples of patients with breast cancer compared to samples of control subjects. Alternatively, a biomarker can be a YRNA-derived fragment which is detected at a higher frequency or at a lower frequency in samples of patients with cancer compared to samples of control subjects or control tissues. A biomarker can be differentially present in terms of quantity, frequency or both.
The terms “subject,” “individual,” and “patient,” are used interchangeably herein and refer to any mammalian subject for whom diagnosis, prognosis, treatment, or therapy is desired, particularly humans. Other subjects may include cattle, dogs, cats, guinea pigs, rabbits, rats, mice, horses, and so on. In some cases, the methods of the invention find use in experimental animals, in veterinary application, and in the development of animal models for disease, including, but not limited to, rodents including mice, rats, and hamsters; primates, and transgenic animals.
As used herein, a “biological sample” refers to a sample of tissue or fluid isolated from a subject, including but not limited to, for example, urine, blood, plasma, serum, fecal matter, bone marrow, bile, spinal fluid, lymph fluid, samples of the skin, external secretions of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, milk, blood cells, organs, biopsies, and also samples containing cells or tissues derived from the subject and grown in culture, and in vitro cell culture constituents, including but not limited to, conditioned media resulting from the growth of cells and tissues in culture, recombinant cells, stem cells, and cell components.
A “polynucleotide hybridization method” as used herein refers to a method for detecting the presence and/or quantity of a pre-determined polynucleotide sequence based on its ability to form Watson-Crick base-pairing, under appropriate hybridization conditions, with a polynucleotide probe of a known sequence. Examples of such hybridization methods include Southern blot, Northern blot, and in situ hybridization.
A “label,” “detectable label,” or “detectable moiety” is a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means. For example, useful labels include 32P, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins that can be made detectable, e.g., by incorporating a radioactive component into the peptide or used to detect antibodies specifically reactive with the peptide. Typically a detectable label is attached to a probe or a molecule with defined binding characteristics (e.g., a polypeptide with a known binding specificity or a polynucleotide), so as to allow the presence of the probe (and therefore its binding target) to be readily detectable.
The term “caloric restriction” refers to a diet in which the amount of calories is reduced in comparison to a normal diet without malnutrition. Typically, a caloric restricted diet constitutes about 90% or 85%, often 80%, 75%, 70%, 65%, 60%, 55%, or 50% of a normal diet for a subject. As appreciated by one of skill in the art, a normal diet is determined with respect to factors such as age, sex, height and body frame, and the like.
The term “biomarker of caloric restriction” refers to a nucleic acid sequence that is differentially expressed in a caloric-restricted subject. Caloric-restricted biomarkers include those that are up-regulated (i.e., expressed at a higher level) in caloric-restriction, as well as those that are down-regulated (i.e., expressed at a lower level).
The term “up-regulation” means that the ratio of the level of product in treated vs. control is greater than one. Often, the ratio is 1.1, 1.3, 1.5, 2.0 or greater. As appreciated by those in the art, statistical analysis is typically performed to evaluate significance.
The term “down-regulation” as used herein means that the ratio of the level of product in treated vs. control is less than one. Often the ratio is 0.75, 0.5, 0.25 or less. As appreciated by those in the art, statistical analysis is typically performed to evaluate significance.
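As a minimal illustrative sketch, and not as part of the disclosed methods, the ratio-based definitions of up-regulation and down-regulation given above can be expressed in Python. The function name `classify_regulation` and its arguments are hypothetical, and the statistical analysis that is typically performed to evaluate significance is not shown.

```python
def classify_regulation(treated_level: float, control_level: float) -> str:
    """Classify a biomarker by the ratio of its level in treated vs. control.

    Hypothetical helper: a ratio greater than one is "up-regulated",
    a ratio less than one is "down-regulated".
    """
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    ratio = treated_level / control_level
    if ratio > 1:
        return "up-regulated"
    if ratio < 1:
        return "down-regulated"
    return "unchanged"
```

In practice a threshold on the ratio (for example 1.5 or 0.5, as mentioned above) and a significance test would be applied before a biomarker is called differentially expressed.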
Practicing this invention utilizes routine techniques in the field of molecular biology. Basic texts disclosing the general methods of use in this invention include Sambrook and Russell, Molecular Cloning, A Laboratory Manual (3rd ed. 2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 1994).
As disclosed above, the small non-coding RNAs used herein refer to 5′ tRNA halves that are derived from specific tRNAs (e.g., as in Tables 3 and 4). For instance, the 5′ tRNA halves can have a nucleic acid sequence corresponding to the first 27-35 (e.g., 27, 28, 29, 30, 31, 32, 33, 34, or 35) nucleotides of a tRNA gene. The sncRNAs also refer to the YRNA fragments derived from specific YRNAs (e.g., as in Table 5). For instance, the YRNA fragments can have a nucleic acid sequence corresponding to the first 27-35 (e.g., 27, 28, 29, 30, 31, 32, 33, 34, or 35) nucleotides of a YRNA gene (or pseudogene).
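By way of illustration only, a fragment corresponding to the first 27-35 positions of a parent sequence can be obtained by simple prefix slicing. The sketch below is a hypothetical Python helper (`five_prime_fragment` is not a name used elsewhere in this disclosure), and the input used in the usage note is an arbitrary placeholder rather than a sequence from Tables 3-5.

```python
def five_prime_fragment(parent_sequence: str, length: int) -> str:
    """Return the first `length` positions of a parent tRNA or YRNA sequence.

    Hypothetical helper: `length` is restricted to the 27-35 range
    described in the text.
    """
    if not 27 <= length <= 35:
        raise ValueError("length must be between 27 and 35")
    if len(parent_sequence) < length:
        raise ValueError("parent sequence is shorter than the requested fragment")
    return parent_sequence[:length]
```

For example, `five_prime_fragment("ACGU" * 20, 30)` returns the first 30 characters of the 80-character placeholder string.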
The 5′ tRNA half can be generated from the specific tRNA from which it is derived. For example, a tRNA can be cleaved by, e.g., an in vitro cleavage reaction, to generate its cognate 5′ tRNA half. Similarly, a YRNA fragment can be produced from its cognate YRNA by cleavage.
The source of the sncRNA can be naturally-occurring or synthetic. In some embodiments, a synthetic sncRNA can have a sequence that is different from a naturally-occurring sncRNA and effectively mimic the naturally-occurring sncRNA. For example, the synthetic sncRNA can have at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or greater sequence similarity to the naturally-occurring sncRNA.
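One simplified way to quantify how closely a synthetic sequence matches a naturally-occurring one is position-by-position percent identity over two sequences of equal length. The following Python sketch is an assumption offered for illustration; the disclosure does not specify how sequence similarity is computed, and alignment-based measures that handle insertions and deletions may be more appropriate in practice. The name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions at which two equal-length sequences match.

    Simplified, ungapped comparison; not an alignment algorithm.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)
```

Two 10-nucleotide sequences that differ at a single position would score 90% under this measure.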
Synthetic polynucleotides or oligonucleotides can be generated by, e.g., using H-phosphonate or phosphoramidite chemistries (Froehler et al., Nucleic Acids Res. 14:5399-5407 (1986); McBride et al., Tetrahedron Lett. 24:246-248 (1983)). Synthetic sequences are typically between about 10 and about 500 bases in length, more typically between about 20 and about 100 bases, and most preferably between about 40 and about 70 bases in length. In some embodiments, synthetic nucleic acids include non-natural bases, such as, but by no means limited to, inosine. As noted above, nucleic acid analogues may be used as binding sites for hybridization. An example of a suitable nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et al., Nature 363:566-568 (1993); U.S. Pat. No. 5,539,083).
In some embodiments, an expression vector that comprises a heterologous promoter and a polynucleotide sequence for a tRNA or YRNA (e.g., as provided in Tables 3-5) is generated and introduced into a host cell (e.g., a eukaryotic cell, a prokaryotic cell, a human cell, and a cell line). Examples of promoters include, but are not limited to, inducible promoters, constitutive promoters, enhancers, and other regulatory elements. In some embodiments, the promoter is an elongation factor 1α (EF1α) promoter, a U6 promoter, or a CMV promoter. In addition to the tRNA or YRNA sequence and the promoter to which it is operably linked, the expression cassette may contain one or more additional components, including, but not limited to, regulatory elements such as enhancers. In some embodiments, the sncRNA sequence is optionally associated with a regulatory element that directs the expression of the sncRNA sequence in a target cell.
In some embodiments, the expression vector can replicate and direct expression of a sncRNA in the target cell. Various expression vectors that can be used herein include, but are not limited to, expression vectors that can be used for nucleic acid expression in prokaryotic and/or eukaryotic cells. Non-limiting examples of expression vectors for use in prokaryotic cells include pUC8, pUC9, pBR322 and pBR329 available from BioRad Laboratories (Richmond, Calif.), and pPL and pKK223 available from Pharmacia (Piscataway, N.J.). Non-limiting examples of expression vectors for use in eukaryotic cells include pSVL and pKSV-10 available from Pharmacia; pBPV-1/pML2d (International Biotechnologies, Inc.); pcDNA and pTDT1 (ATCC, #31255); viral vectors based on vaccinia virus, poliovirus, adenovirus, adeno-associated virus, herpes simplex virus, or a lentivirus; vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus; and the like. Additional examples of suitable eukaryotic vectors include bovine papilloma virus-based vectors, Epstein-Barr virus-based vectors, SV40, 2-micron circle, pcDNA3.1, pcDNA3.1/GS, pYES2/GS, pMT, pIND, pIND(Sp1), pVgRXR (Invitrogen), and the like, or their derivatives.
In some embodiments, the expression vectors disclosed herein can include one or more coding regions that encode a polypeptide (a “marker”) that allows for detection and/or selection of the genetically modified host cell comprising the expression vectors. The marker can be a drug resistance protein such as neomycin phosphotransferase, aminoglycoside phosphotransferase (APH); a toxin; or fluorescence. Various selection systems that are well known in the art can be used herein. The selectable marker can optionally be present on a separate plasmid and introduced by co-transfection.
Skilled artisans will appreciate that any methods, expression vectors, and target cells suitable for adaptation to the expression of a 5′ tRNA or YRNA in target cells can be used herein and can be readily adapted to the specific circumstances.
In certain embodiments, the disclosure relates to methods of analyzing samples for expression of sncRNA or RNA disclosed herein. Typical methods are based on hybridization analysis of polynucleotides, and sequencing of polynucleotides. The most commonly used methods known in the art for the quantification of RNA expression in a sample include northern blotting and in situ hybridization; RNAse protection assays; and reverse transcription polymerase chain reaction (RT-PCR). Alternatively, antibodies may be employed that can recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes. Representative methods for sequencing-based gene expression analysis include Serial Analysis of Gene Expression (SAGE), and gene expression analysis by massively parallel signature sequencing (MPSS). In certain embodiments, a sncRNA detection agent such as a complementary nucleotide sequence can be labeled to allow detection in an imaging system, such as a positron emission tomography (PET) scan, single-photon emission computed tomography (SPECT) or a similar type of scan by administering the labeled detection agent to the subject and then scanning the brain of the subject for binding. In those instances the detection agent may be labeled so as to only emit signal if bound to the sncRNA.
Reverse Transcriptase PCR (RT-PCR) may be used to compare sncRNA levels in different sample populations, in normal and disease samples, with or without drug treatment, to characterize patterns of sncRNA levels, to discriminate between closely related sncRNAs, and to analyze RNA structure. This method typically employs isolation of sncRNA from a target sample, e.g., blood, serum, plasma or other bodily fluid.
General methods for nucleic acid (e.g., RNA) extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons (1997). Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56:A67 (1987), and De Andres et al., BioTechniques 18:42-44 (1995). In particular, RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions. For example, total RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns. RNA may also be isolated, for example, by cesium chloride density gradient centrifugation.
RT-PCR can be performed using commercially available equipment, such as the ABI PRISM 7700™ Sequence Detection System™. Differential RNA expression can also be identified, or confirmed using the microarray technique.
In addition, methods of measuring sncRNA include contacting a sample from a subject with a probe, which can be a nucleic acid-containing compound. Such nucleic acid-containing compound can be complementary to at least a portion, including at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or at least 11 or more nucleic acids of the sncRNA sequence. The probe can also be complementary to at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or more of the sncRNA sequence. The probe can itself emit a signal or be linked to or bind to a compound that emits a signal that can be measured, or can be used in a method of measurement such as during a PCR-based technique.
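As an illustrative sketch of probe design by Watson-Crick complementarity, a DNA oligonucleotide that is fully complementary to an RNA target is the reverse complement of the target sequence. The Python helper below, `antisense_dna_probe`, is hypothetical and assumes an unmodified RNA target written 5′ to 3′ using only the letters A, C, G, and U; probe length, labeling, and hybridization conditions are not addressed.

```python
# Watson-Crick partners for an RNA target read against a DNA probe.
_RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def antisense_dna_probe(rna_target: str) -> str:
    """Return the DNA reverse complement (5' to 3') of an RNA target (5' to 3')."""
    try:
        return "".join(
            _RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna_target.upper())
        )
    except KeyError as exc:
        raise ValueError(f"unexpected base in RNA target: {exc.args[0]}") from None
```

For the short placeholder target `GCAUU`, the function returns `AATGC`.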
The present invention relates to assaying sncRNA (e.g., 5′ tRNA halves and YRNA fragments) to determine or monitor an individual's health status, e.g., aging and/or caloric restriction. The present invention also relates to the use of sncRNA biomarkers to detect cancer, e.g., breast cancer. More specifically, the biomarkers of the present invention can be used in diagnostic tests to determine, characterize, qualify, and/or assess cancer status, for example, to diagnose cancer, in an individual, subject or patient.
In some embodiments, the presence or level of one or more 5′ tRNA halves, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50 or more 5′ tRNA halves, is used to determine a subject's health status. In some cases, the 5′ tRNA halves are selected from those disclosed in Tables 3 and 4 and Dhahbi et al., BMC Genomics, 2013, 14:298, the disclosure of which is herein incorporated by reference in its entirety for all purposes.
In some embodiments, the presence or level of one or more YRNA fragments, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50 or more YRNA fragments, is used to determine a subject's health status. In some cases, the 5′ YRNA fragments are selected from those disclosed in Table 5 and Dhahbi et al., Physiol Genomics, 2013, 45(21):990-998, the disclosure of which is herein incorporated by reference in its entirety for all purposes.
Detection and quantification of RNA expression can be achieved by any one of a number of methods well known in the art, including those described above. For instance, using the known sequences for the sncRNA biomarkers, specific probes and primers can be designed for use in the detection methods described herein as appropriate.
In some cases, the RNA detection method requires isolation of nucleic acid from a sample, such as a cell or tissue sample. Nucleic acids, including RNA and specifically sncRNAs, can be isolated using any suitable technique known in the art. For example, phenol-based extraction is a common method for isolation of RNA. Phenol-based reagents contain a combination of denaturants and RNase inhibitors for cell and tissue disruption and subsequent separation of RNA from contaminants. Phenol-based isolation procedures can recover RNA species in the 10-200-nucleotide range (e.g., sncRNAs). In addition, extraction procedures such as those using TRIZOL™ or TRI REAGENT™ will purify all RNAs, large and small, and are efficient methods for isolating total RNA from biological samples that contain small non-coding RNAs.
For use in diagnostic, research and therapeutic applications suggested above, kits are also provided by the invention. In the diagnostic and research applications such kits may include any or all of the following: assay reagents, buffers, hybridization probes and/or primers, control small non-coding RNAs, etc. A therapeutic product may include sterile saline or another pharmaceutically acceptable emulsion and suspension base.
The kits may include instructional materials containing directions (i.e., protocols) for the practice of the methods of this invention. While the instructional materials typically comprise written or printed materials they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated by this invention. Such media include, but are not limited to electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), digital media, and the like. Such media may include addresses to internet sites that provide such instructional materials.
A wide variety of kits and components can be prepared according to the present invention, depending upon the intended user of the kit and the particular needs of the user.
The following examples are offered to illustrate, but not to limit the claimed invention.
Small RNAs complex with proteins to mediate a variety of functions in animals and plants. Some small RNAs, particularly miRNAs, circulate in mammalian blood and may carry out a signaling function by entering target cells and modulating gene expression. The subject of this study is a set of circulating 30-33 nt RNAs that are processed derivatives of the 5′ ends of a small subset of tRNA genes, and closely resemble cellular tRNA derivatives (tRFs, tiRNAs, half-tRNAs, 5′ tRNA halves) previously shown to inhibit translation initiation in response to stress in cultured cells.
In sequencing small RNAs extracted from mouse serum, we identified abundant 5′ tRNA halves derived from a small subset of tRNAs, implying that they are produced by tRNA type-specific biogenesis and/or release. The 5′ tRNA halves are not in exosomes or microvesicles, but circulate as particles of 100-300 kDa. The size of these particles suggests that the 5′ tRNA halves are a component of a macromolecular complex; this is supported by the loss of 5′ tRNA halves from serum or plasma treated with EDTA, a chelating agent, but their retention in plasma anticoagulated with heparin or citrate. A survey of somatic tissues reveals that 5′ tRNA halves are concentrated within blood cells and hematopoietic tissues, but scant in other tissues, suggesting that they may be produced by blood cells. Serum levels of specific subtypes of 5′ tRNA halves change markedly with age, either up or down, and these changes can be prevented by calorie restriction.
We demonstrate that 5′ tRNA halves circulate in the blood in a stable form, most likely as part of a nucleoprotein complex, and their serum levels are subject to regulation by age and calorie restriction. They may be produced by blood cells, but their cellular targets are not yet known. The characteristics of these circulating molecules, and their known function in suppression of translation initiation, suggest that they are a novel form of signaling molecule.
Several classes of small RNAs have been found to mediate biological functions in animals and plants [1-5]. miRNAs, siRNAs, piRNAs, and others are bound by Argonaute proteins, and have the common property of directing protein complexes to nucleic acids with sequence complementarity, where they may cleave or otherwise alter the target [6]. In both plants and animals, some small RNAs are able to travel between tissues within an organism, thus transferring their functions to other cells. In vertebrates, there has been much recent interest in the presence of specific miRNAs in the plasma and serum; there is some evidence that these can be taken up by cells and alter gene expression, and there is also interest in the possibility that they can be markers of specific disease states, including cancer [7-9].
There is also evidence for processing of non-coding RNAs into smaller RNAs, many with as yet poorly understood functions [10, 11]. Many of the non-coding RNAs that appear to undergo processing into smaller RNAs have well studied functions, although their smaller derivatives often do not. In particular, tRNA is processed into shorter forms termed tRNA fragments (tRFs) [12, 13]. The subject of this report is a tRNA fragment created by cleavage of tRNA near the anticodon loop to create a “5′ tRNA half” (the term we will use here). Previous reports have described 5′ tRNA halves as intracellular molecules interacting with components of the translation initiation complex. 5′ tRNA halves have been shown to be induced by the ribonuclease angiogenin in response to stress in cultured cells, to promote assembly of stress granules carrying stalled preinitiation complexes, and to inhibit mRNA translation [14, 15]; little more is known about their function.
We have sequenced small RNAs present in mouse serum; when multiple reportable alignments of the sequencing reads to the mouse genome were allowed, we noted the presence of a class of tRNA-derived 30-33 nt fragments that closely resemble the 5′ tRNA halves previously described in stressed cell cultures. Investigation of these 5′ tRNA halves reveals a novel class of circulating small RNAs whose characteristics, including changes with age that are antagonized by calorie restriction, strongly suggest physiologic regulation and function.
While investigating the effects of aging and calorie restriction (CR) on the profiles of cell-free small RNAs circulating in the bloodstream, we used small RNA-Seq (Illumina reads of 50 nt) to compare the serum levels of small RNAs from young and old control mice, and old mice subjected to CR. A combined total of 196,083,881 pre-processed sequencing reads obtained from 9 different serum samples, were mapped to the mouse genome with bowtie using parameters that align reads according to a policy similar to Maq's default policy [16]. Alignment of the combined 196,083,881 pre-processed sequencing reads generated a dataset of 163,078,230 mapped reads (83.2%), ranging from 5 to 48 nt. The size distribution of the mapped reads revealed an expected peak at 20-24 nt consistent with the size of miRNAs (
Only if multiple reportable alignments are allowed during bowtie mapping does an unfamiliar second peak emerge at 30-33 nt (
Annotation analysis of the mapped sequencing reads revealed that the 30-33 nt peak consists of reads mapping to tRNA genes (
Characterization of Circulating Small RNAs Derived from tRNAs
Since the 86,343,437 reads that align to tRNA genes are only 30 to 33 nt, and thus do not represent full length tRNAs, we examined the tRNA end distribution of the reads, and annotated the reads based on their overlap with 5′ or 3′ ends of tRNAs. More than 99% of the tRNA-derived reads align with the 5′ end of a tRNA; this is exemplified in
23%, 17%, 35%, and 26% of the sequencing reads that map to tRNAs are 30, 31, 32, and 33 nucleotides in length, respectively (Table 1), indicating that full length tRNAs are cleaved in the anticodon loop at more than one site and at varying rates to generate the 5′ tRNA halves found in serum. As an example,
It is unlikely that this result is a sequencing artifact: the full length of most tRNAs is 75-90 nt, and the sequencing runs used to generate these data were 50 cycles while the reads occupy a narrow size range of 30-33 nt. This pattern suggests that the tRNA reads were derived from processed fragments of full length tRNAs; the remainder of the tRNA was not significantly detected in the serum small RNA libraries. In support of this conclusion, tRNAs have been shown to undergo cleavage within anticodon loops to produce tRNA-derived stress-induced fragments (tiRNAs) when cultured cells are subjected to stresses such as arsenite, heat shock, or ultraviolet irradiation [17, 18]. Such cleavage of the anticodon loop does not seem to be part of a tRNA degradation process, because the generated 5′ tRNA fragments are stable in the cell. Our findings indicate that tRNA fragments highly similar to tiRNAs are present under normal (unstressed) conditions, and can remain stable even after they are released into the peripheral blood. 5′ but not 3′ tRNA fragments inhibit mRNA translation initiation in cultured cell lines [18].
The individual 5′ tRNA halves present in serum are derived from a small subset of tRNAs (
This implies a tRNA type-specific biogenesis and/or release of the circulating 5′ tRNA halves.
Presence in Circulating Mouse Blood of Particles Containing Stable Cell-Free 5′ tRNA Halves
To obtain an independent validation of the sequencing results, we used Northern blotting to analyze small RNAs circulating in the mouse serum. As a positive control for detection of tRNA halves by Northern blotting, we included RNA from U2OS cells cultured in the absence or presence of sodium arsenite, which is known to generate tRNA halves in these cells [18]. We probed RNA from mouse serum with oligonucleotides complementary to 5′ or 3′ ends of specific tRNAs. Probes specific for the 5′ ends of tRNA-Gly-GCC or tRNA-Val-CAC detected a band migrating near the 30 nt RNA marker (
We also probed RNA from mouse serum with a probe complementary to the 5′ end of tRNA-Asn-GTT to confirm the low abundance of circulating tRNA halves derived from tRNAs that were barely detected in the sequencing data. A 5-day exposure to X-ray film showed a very weak signal from tRNA-Asn-GTT probe compared to the strong signal from the tRNA-Gly-GCC probe obtained after a short (25 minute) exposure (
We next asked if the 5′ tRNA halves are contained within circulating exosomes or microvesicles. We Northern blotted RNA extracted from pellet or supernatant after ultracentrifugation of mouse serum at 110,000 g for 2 hours. A probe for the 5′ end of tRNA-Gly-GCC detected an ˜30 nt band present mainly in the supernatant and visible only as a trace in the pellet (
Because the tRNA halves we observe are stable in circulation but not encapsulated in exosomes, they are most likely complexed to carrying factors (e.g., proteins that protect them from degradation). To determine the size range of the putative complexes carrying the 5′ tRNA halves in the serum, we Northern blotted RNA extracted from concentrate or filtrate fractions after ultrafiltration of mouse serum samples through Vivaspin 2 columns with 30, 100, or 300 kDa MW cut-off. A probe for the 5′ end of tRNA-Gly-GCC detected a ˜30 nt band in the concentrates of 30 and 100 kDa MW cut-off, and in the filtrate of 300 kDa MW cut-off (
Thus 5′ tRNA halves circulate as part of 100-300 kDa complexes, while the 5′ tRNA halves themselves are only ˜10 kDa. This is reminiscent of reports that miRNAs can circulate in the bloodstream as components of RNA-protein/lipoprotein complexes. Stable argonaute 2-miRNA complexes that are not part of microvesicles were recovered from plasma and serum, and high-density lipoprotein has been reported to carry and deliver miRNAs to recipient cells [20-22].
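The ˜10 kDa figure for the free 5′ tRNA halves can be checked with a rough calculation. The Python sketch below assumes an average mass of about 330 Da per RNA nucleotide, which is an approximation introduced here for illustration and not a value taken from this study; the constant and function names are hypothetical.

```python
# Assumed rough average mass of one RNA nucleotide residue, in daltons.
AVG_NT_MASS_DA = 330.0


def approx_rna_mass_kda(n_nucleotides: int) -> float:
    """Approximate mass in kDa of a single-stranded RNA of the given length."""
    if n_nucleotides <= 0:
        raise ValueError("n_nucleotides must be positive")
    return n_nucleotides * AVG_NT_MASS_DA / 1000.0
```

Under this assumption a 30 nt fragment is about 9.9 kDa and a 33 nt fragment about 10.9 kDa, roughly one tenth or less of the 100-300 kDa size range observed for the circulating particles.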
5′ tRNA Halves are Concentrated in Hematopoietic and Lymphoid Tissues
To investigate whether 5′ tRNA halves are present in tissues, we extracted total RNA from liver, spleen, and testes, and performed Northern blots with probes complementary to 5′ and 3′ ends of tRNAs. We detected tRNA halves with a probe complementary to the 5′ end of tRNA-Gly-GCC in the spleen, but not in the liver and testes; a probe for the 3′ end of tRNAs detected only full length tRNAs in all 3 tissues (
More extensive studies will establish if 5′ tRNA halves are concentrated in particular blood cell types, although the very high levels in lymph nodes point to lymphocytes as one such type. The evidence does not establish whether the 5′ tRNA halves are concentrated in hematopoietic cells because they are produced there, or because they are preferentially taken up from the blood: neither the origin nor the destinations of the 5′ tRNA halves is certain. The low levels of 5′ tRNA halves present in non-hematopoietic tissues may indicate low levels in those tissues, but they may also be derived from residual blood cells in those tissues.
A Chelating Agent Destabilizes Circulating 5′ tRNA Halves
Because clotting has the potential to release particles that are not present in circulating blood, we asked if 5′ tRNA halves circulating in the mouse serum are also present in mouse plasma. Northern blotting with a 5′ tRNA half probe gave a very weak band in a plasma sample when compared to the band derived from an equal volume of serum from the same mouse (
This result could suggest that 5′ tRNA halves are an artifact of blood clotting, but could also be an effect of EDTA, a chelating agent that depletes ions required for clotting. To assess the effects of EDTA on 5′ tRNA halves, we used Northern blotting to analyze a sample of serum that was incubated with EDTA for 15 min before RNA extraction. We also analyzed a sample of plasma extracted from blood collected with heparin, a nonchelating anticoagulant. This analysis showed that treatment of serum with EDTA significantly decreased the signal corresponding to the 5′ tRNA halves, while 5′ tRNA halves are abundant in heparinized plasma (
Calorie Restriction Offsets Age-Associated Changes in Levels of Specific Circulating 5′ tRNA Halves
Calorie restriction (CR) can delay, prevent, or reverse many age-associated changes in physiologic parameters. We used aging and CR as model physiologic states to explore the possibility that they are associated with changes in the levels of circulating 5′ tRNA halves. We performed pairwise comparisons between young and old control groups to measure the differential abundance in circulating 5′ tRNA halves associated with old age, and between old control and old CR groups to determine whether CR has an effect on any age-associated changes.
This analysis revealed that aging is associated with alterations, either increase or decrease, in the circulating levels of 5′ tRNA halves derived from specific tRNA isoacceptors (Table 3). Notably, CR mitigated most of these age-related changes (Table 3), although it did not completely prevent them. CR has been shown to oppose the molecular and biological markers of aging including alterations in gene expression [24]. A causal relationship between circulating 5′ tRNA halves and the manifestations of aging is not established by this study, but it does indicate that levels are regulated in an age-associated fashion.
Deep sequencing of small RNAs extracted from mouse serum identifies a population of tRNA-derived molecules, termed 5′ tRNA halves, previously described only as stress-induced inhibitors of translation initiation in cultured cells. 5′ tRNA halves are more abundant than miRNAs in mouse serum, and are derived from a distinct subset of tRNAs by cleavage near the anticodon loop; the 3′ portion of the tRNA molecule is present in serum only in trace quantities. Ultracentrifugation and size fractionation establish that the 5′ tRNA halves circulate as part of a larger complex, but are not contained in exosomes or microvesicles; their sensitivity to the chelating agent EDTA provides further evidence that they exist as circulating nucleoprotein complexes. They are concentrated in hematopoietic and lymphoid tissues, and present in other tissues at very low levels that may reflect residual blood cells. The origin of the serum particles, and their destinations, are uncertain; however, their concentration in blood cells suggests that they may be produced by these cells. Levels of serum 5′ tRNA halves are distinctly changed in aged mice, and calorie restriction inhibits these changes, indicating that they are subject to physiologic regulation. Taken together with the extant evidence that 5′ tRNA halves can regulate mRNA translation, the characteristics of the circulating 5′ tRNA halves we have discovered suggest that they function as signaling molecules with as yet unknown physiologic roles.
To date, the only known function of 5′ tRNA halves is inhibition of translation in cultured cells subjected to a variety of stressors; transfection of 5′ tRNA halves inhibits global translation in U2OS cells [14, 18]. A study published while this paper was in preparation reported induction of 5′ tRNA halves in human airway epithelial cells upon infection with respiratory syncytial virus (RSV). Induction involves cleavage at the tRNA anticodon loop by angiogenin, and at least one type, the 5′ tRNA-Glu-CTC half, promotes RSV replication [25]. Our findings indicate that 5′ tRNA halves function on an organismal rather than merely a cellular level. Furthermore they are likely to function in a context much broader than cellular stress or infection: we find 5′ tRNA halves in unstressed conditions. Changes in their expression (either increased or decreased) with age are also consistent with a broader physiologic role, and it is particularly interesting that these changes are partially mitigated by calorie restriction.
The most extensively studied cellular tRNA halves are generated under stress conditions by angiogenin, which cleaves mature tRNAs within the anticodon loops [26]. The stress-induced tRNA halves target the translation initiation machinery to reprogram protein translation in order to promote cell survival during stress [14, 26]. Pull-down and mass spectrometry analyses of RNA-protein complexes have identified several cellular proteins (YB-1, FXR-1, and PABP1) bound to intracellular 5′ tRNA halves [14]. The nature of the proteins and/or other factors that bind and stabilize the extracellular form of 5′ tRNA halves has yet to be elucidated. Understanding of the origin, composition, and destinations of these complexes will provide insights into their role in organismal physiology.
Male mice of the long-lived B6C3F1 strain were fed either control or calorie-restricted (CR) diet (40% fewer calories than the control). Three mice were studied from each of three groups: young (7-month) and old (27-month) mice fed the control diet, and old (27-month) mice fed the CR diet. Total RNA including small RNA was isolated from each serum sample with miRNeasy kit (Qiagen) and used to construct indexed sequencing libraries with the Illumina TruSeq Small RNA Sample Prep Kit. The libraries were pooled and sequenced on an Illumina HiSeq 2000 instrument to generate 50 base reads.
Mice and diets. One-month-old male mice of the long-lived B6C3F1 strain were purchased from Harlan (Indianapolis, Ind.). One week after arrival, mice were individually housed and randomly assigned to one of two groups, control or calorie restricted (CR). Control mice were fed 93 kcal/wk of a defined control diet (AIN-93M, diet no. F05312, BIO-SERV). CR mice were fed 52.2 kcal/wk of a defined CR diet (AIN-93M 40% Restricted, diet no. F05314, BIO-SERV). The CR mice consumed ~40% fewer calories than the control group. The CR diet was enriched so that the CR mice consumed approximately the same amount of protein, vitamins, and minerals per gram of body weight as the control mice. All mice had free access to water. Mice were maintained at 20-24° C. and 50-60% humidity with lights on from 0600 to 1800 h. Sentinel mice were kept in the same room as the experimental mice, and serum samples were screened every 6 months for titers against 11 common pathogens. No positive titers were found during these studies. At 27 months of age, mice were euthanized, and blood was collected through cardiac puncture and processed immediately. A group of control mice was euthanized at 7 months of age and used as a young control group. Each group consisted of 3 mice. The Institutional Animal Care and Use Committee of the University of California, Riverside, approved animal protocols.
RNA isolation, and small RNA library construction. Immediately after collection, blood was transferred to BD Microtainer tubes (Becton, Dickinson and Company), incubated for 30 min at room temperature to allow blood clotting, and centrifuged at 5,000 g for 10 min. The serum supernatant was transferred to new tubes, centrifuged at 16,000 g for 15 min to remove any residual cells and cell-debris, and stored at −80° C. before use. Isolation of total RNA including small RNA was performed with miRNeasy kit (Qiagen) according to the manufacturer's protocol with the exceptions of mixing 2 mL of Qiazol reagent with 0.4 mL serum, loading the entire aqueous phase onto a single column from the MinElute Cleanup Kit (Qiagen), and eluting the RNA in 20 μL of RNase-free water.
One fourth (5 μL) of the RNA isolated from each serum sample was used to construct sequencing libraries with the Illumina TruSeq Small RNA Sample Prep Kit, following the manufacturer's protocol. Briefly, 3′ and 5′ adapters were sequentially ligated to small RNA molecules and the obtained ligation products were subjected to a reverse transcription reaction to create single stranded cDNA. To selectively enrich those fragments that have adapter molecules on both ends, the cDNA was amplified with 15 PCR cycles using a common primer and a primer containing an index tag; this allows multiplexing and sequencing different samples in a single lane of a flowcell. The amplified cDNA constructs were gel purified, and validated by checking the size, purity, and concentration of the amplicons on the Agilent Bioanalyzer High Sensitivity DNA chip. The libraries were pooled in equimolar amounts, and sequenced on an Illumina HiSeq 2000 instrument to generate 50 base reads. Image deconvolution and quality values calculation were performed using the modules of the Illumina pipeline.
RNA extraction from mouse tissues, stressed U2OS cells, fractionated mouse serum and plasma for Northern blot analysis. For stress induction, U2OS cells were cultured in McCoy's 5A Medium supplemented with 10% fetal calf serum and 1% of penicillin/streptomycin, and treated with 500 μM of sodium arsenite (Sigma) for 2 hours before RNA extraction. Tissues and sera were collected from one-year-old mice fed control diet. Tissues were flash frozen in liquid nitrogen. Serum samples were centrifuged at 110,000 g for 2 hrs, and supernatant and pellet fractions were separated. Samples of 0.2 ml serum mixed with 1.8 ml PBS were subjected to ultrafiltration through Vivaspin 2 columns (GE Healthcare) with 30, 100, or 300 kDa MW cut-off, and concentrate and filtrate fractions were collected. All samples were stored at −80° C. before RNA extraction. For plasma preparation, mouse blood samples were mixed with 0.5 M EDTA (10 μl/ml) or sodium heparin (5.5 mg/ml) and centrifuged at 10,000 g for 10 min. The plasma supernatant was transferred to new tubes, centrifuged at 16,000 g for 15 min to remove any residual cells and cell-debris, and stored at −80° C. before use. Total RNA including small RNA was isolated from tissue samples, cell pellets or serum fractions with miRNeasy kit (Qiagen).
Collection of human blood and RNA extraction from serum and plasma. Human blood samples were collected with Institutional Review Board approval after obtaining informed consent. Blood was collected from one young adult male in BD Vacutainer Venous Blood Collection Tubes (BD Diagnostics): K2 EDTA Spray-Dried (BD-366643) or Spray-Coated Sodium Heparin (BD-367874). Blood was transferred to Leucosep Centrifuge Tubes (Greiner Bio-One #227290P) and centrifuged at 800 g for 15 min at room temperature. The plasma supernatant was transferred to fresh tubes, centrifuged at 16,000 g for 15 min to remove any residual cells and cell debris, and stored at −80° C. before use. Total RNA including small RNA was isolated from plasma or serum with miRNeasy kit (Qiagen).
Preparation of leukocytes from mouse and human blood and RNA extraction. Blood was collected on EDTA and centrifuged at 1000 g for 15 minutes to separate the plasma and blood cells. The buffy coat was collected, incubated in erythrocyte lysis buffer (Qiagen), and washed with PBS. Leukocyte pellets were flash frozen in liquid nitrogen, and stored at −80° C. before use. Total RNA including small RNA was isolated from leukocyte pellets with miRNeasy kit (Qiagen).
Sequencing reads were pre-processed with FASTX-Toolkit (hannonlab.cshl.edu) to trim the adaptor sequences, and discard low quality reads. The obtained clean reads were mapped to the mouse reference genome (GRCm38/mm10) with bowtie version 0.12.8 [16] using different combinations of alignment and reporting options. We used the options “-n 0 -l 14” to align the sequencing reads according to a policy similar to Maq's default policy and requiring no mismatches in the first 14 bases (the high-quality end of the read). In addition, this mode of alignment was combined with options that define which and how many alignments should be reported; the options “-k 1 --best” instructed bowtie to report only the best alignment if more than one valid alignment exists, while the option “-m 1” instructed bowtie to refrain from reporting any alignments for reads having multiple reportable alignments. The “-k 1 --best” and “-m 1” modes of alignment reporting were also used in combination with the end-to-end k-difference (-v) alignment mode. Varying the alignment and reporting modes allowed the differential detection of two predominant peak sizes of sequencing reads as described in the results section.
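The combinations of alignment and reporting modes described above can be written out as bowtie 1 command lines. The following Python sketch is illustrative only: the index basename (`mm10`), the read file name (`clean_reads.fastq`), the output file names, and the mismatch count passed to `-v` are assumptions, since the text does not specify them for this analysis. The function assembles argument lists and does not run bowtie.

```python
from itertools import product

# Alignment modes named in the text, as bowtie 1 flags.
ALIGNMENT_MODES = {
    "seed": ["-n", "0", "-l", "14"],  # no mismatches in the first 14 bases
    "end_to_end": ["-v", "2"],        # the value 2 is an assumption
}

# Reporting modes named in the text.
REPORTING_MODES = {
    "best": ["-k", "1", "--best"],    # report only the best alignment
    "unique": ["-m", "1"],            # suppress reads with multiple alignments
}


def bowtie_commands(index="mm10", reads="clean_reads.fastq"):
    """Return one bowtie argument list per alignment/reporting combination.

    `index` and `reads` are hypothetical placeholder names.
    """
    commands = {}
    for (a_name, a_opts), (r_name, r_opts) in product(
        ALIGNMENT_MODES.items(), REPORTING_MODES.items()
    ):
        output = f"{a_name}_{r_name}.sam"
        # -S asks bowtie to write SAM output.
        commands[(a_name, r_name)] = [
            "bowtie", *a_opts, *r_opts, "-S", index, reads, output
        ]
    return commands
```

Each returned list could be passed to `subprocess.run` on a system where bowtie and a matching index are installed.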
Annotation analysis of the mapped sequencing reads was performed with bedtools [27] using the following databases: the Genomic tRNA Database [19] (gtrnadb.ucsc.edu), miRBase 18 (mirbase.org), and rRNA, snRNA, scRNA, and srpRNA which were extracted from the RepeatMasker track (genome.ucsc.edu; mm10).
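The annotation step amounts to asking which annotated feature, if any, each mapped read overlaps. The Python sketch below illustrates that interval-overlap logic on toy data; it is not the bedtools implementation, and the coordinates, feature list, and "first overlapping feature wins" rule are assumptions made for illustration.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def annotate_reads(reads, features):
    """Count reads per RNA class.

    reads:    iterable of (chrom, start, end)
    features: iterable of (chrom, start, end, rna_class)
    A read takes the class of the first feature it overlaps; reads that
    overlap nothing are counted as "unannotated".
    """
    counts = Counter()
    for chrom, start, end in reads:
        label = "unannotated"
        for f_chrom, f_start, f_end, rna_class in features:
            if chrom == f_chrom and overlaps(start, end, f_start, f_end):
                label = rna_class
                break
        counts[label] += 1
    return counts


# Hypothetical coordinates, for illustration only.
toy_features = [("chr1", 90, 165, "tRNA"), ("chr1", 495, 560, "miRNA")]
toy_reads = [("chr1", 100, 130), ("chr1", 500, 522), ("chr2", 10, 40)]
toy_counts = annotate_reads(toy_reads, toy_features)
```

For genome-scale data a sorted or indexed interval structure would be preferable to this nested loop; the sketch favors readability.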
Analysis of Differentially Abundant Circulating tRNA Halves
The bowtie alignment files generated above from the young and old control and old CR serum sequencing samples were analyzed with bedtools [27] to obtain the coverage of the tRNA genes included in the Genomic tRNA Database [19] (gtrnadb.ucsc.edu), and to determine the read count for each tRNA in the database. The tRNA read counts were further analyzed with the Bioconductor package edgeR [28] to detect the changes in the levels of circulating 5′ tRNA halves in the different experimental groups. The algorithm of edgeR fits a negative binomial model to the count data, estimates dispersion, and measures differences using the generalized linear model likelihood ratio test which is recommended for experiments with multiple factors, such as the simultaneous analysis of age and diet in our study. The fitted count data was analyzed by performing pairwise comparisons between the different experimental groups: young and old control groups were compared to measure the differential abundance in circulating 5′ tRNA halves associated with old age; old control and old CR groups were compared to determine whether CR has an effect on any age-associated changes. The results were further filtered to keep only 5′ tRNA halves that achieved a minimum of 500 counts per million (cpm) in at least one of the 3 experimental groups.
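The final filtering step, keeping only 5′ tRNA halves that reach 500 cpm in at least one experimental group, can be sketched in plain Python. This is a simplified illustration and not the edgeR computation: it scales by raw library size without edgeR's normalization, and it assumes one pooled count table per group. The group names and read counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw read counts so that they sum to one million."""
    total = sum(counts.values())
    return {name: 1e6 * n / total for name, n in counts.items()}


def passes_cpm_filter(group_cpm, threshold=500.0):
    """Return sorted names reaching `threshold` cpm in at least one group.

    group_cpm: {group_name: {tRNA_name: cpm}}
    """
    names = set()
    for cpm in group_cpm.values():
        names.update(cpm)
    return sorted(
        name
        for name in names
        if any(cpm.get(name, 0.0) >= threshold for cpm in group_cpm.values())
    )


# Hypothetical read counts, for illustration only.
toy_counts = {
    "young_control": {"tRNA-Gly-GCC": 9000, "tRNA-Val-CAC": 999, "tRNA-Asn-GTT": 1},
    "old_control": {"tRNA-Gly-GCC": 5000, "tRNA-Val-CAC": 4998, "tRNA-Asn-GTT": 2},
}
toy_cpm = {group: counts_per_million(c) for group, c in toy_counts.items()}
kept = passes_cpm_filter(toy_cpm)
```

In this toy table tRNA-Asn-GTT reaches at most 200 cpm and is dropped, while the other two species are kept.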
RNAs analyzed with Northern blots were extracted from normal or sodium arsenite-treated U2OS cells and from a variety of tissues and sera harvested from one-year-old mice fed control diet. Before RNA extraction, some serum samples were centrifuged at 110,000×g for 2 hrs, and supernatant and pellet fractions were separated, or were separated into concentrate and filtrate fractions by ultrafiltration through Vivaspin 2 columns with 30, 100, or 300 kDa MW cut-off. RNAs were separated on 15% denaturing polyacrylamide gels, transferred and fixed to a membrane by chemical cross-linking [29], and hybridized with probes complementary to 5′ and 3′ ends of tRNAs.
RNAs extracted from tissue samples, cell pellets or serum fractions as described above were separated on 15% polyacrylamide Criterion TBE-Urea gels (Bio-Rad), transferred to a Hybond NX membrane (GE life sciences), and fixed to the membrane by chemical cross-linking (1). Blots were hybridized overnight at 42° C. in ULTRAhyb-Oligo Buffer (Invitrogen) with the following 32P-5′-end labeled oligonucleotide probes against the 5′ end of tRNA-Gly-GCC (5′-GGCGAGAATTCTACCACTGAACCACCAA; SEQ ID NO:3), the 3′ end of tRNA-Gly-GCC (5′-TGCATTGGCCGGGAACCGAACCCGGGCCTCCCGCG; SEQ ID NO:4), the 5′ end of tRNA-Val-CAC (5′-AGGCGAACGTGATAACCACTACACTACGGA; SEQ ID NO:5), the 3′ end of tRNA-Val-CAC (5′-TGTTTCCGCCCGGTTTCGAACCGGGGACCTTTCGCG; SEQ ID NO:6), or the 5′ end of tRNA-Asn-GTT (5′-CGAACGCGCTAACCGATTGCGCCACAGA; SEQ ID NO:7). Membranes were washed twice with 2×SSC, 0.1% SDS solution for 30 minutes, and exposed to X-ray films for detection of signals.
Real Time Quantitative PCR (qPCR)
For qPCR assays, 10 fmoles of the synthetic C. elegans cel-miR-39 (Qiagen #MSY0000010) were spiked into 0.2 ml of serum or plasma before RNA extraction to account for variations during RNA extraction, cDNA synthesis, and real-time PCR. One fourth of total RNA extracted from 0.2 ml serum or plasma was reverse transcribed using the miScript Reverse Transcription Kit (Qiagen) according to the manufacturer's protocol. The obtained reverse transcription product was amplified using the following Qiagen reagents: SYBR Green PCR Master Mix, Universal Primer, and miScript Primer Assays for miR-16, miR-24, and miR-Cel-39. Real-time qPCR was carried out on a Bio-Rad CFX96 thermocycler.
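The text does not state how the spike-in was used in the calculation. One common approach, shown here only as an assumed sketch, is to express each target relative to the cel-miR-39 spike-in by the difference in quantification cycle (Cq), assuming equal amplification efficiency for target and spike-in. The Cq values in the example are hypothetical.

```python
def relative_to_spike_in(cq_target, cq_spike, efficiency=2.0):
    """Level of a target relative to the spike-in control.

    Assumes both assays amplify with the same per-cycle `efficiency`
    (2.0 means perfect doubling). A target that crosses threshold two
    cycles later than the spike-in is reported as 2 ** -2 = 0.25.
    """
    return efficiency ** (cq_spike - cq_target)


# Hypothetical Cq values for one serum sample.
mir16_level = relative_to_spike_in(cq_target=22.0, cq_spike=20.0)
```

Because the same amount of spike-in is added to every sample before extraction, dividing by it in this way compensates for sample-to-sample differences in recovery and reaction efficiency.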
Small noncoding RNAs carry out a variety of functions in eukaryotic cells, and in multiple species they can travel between cells, thus serving as signaling molecules. In mammals multiple small RNAs have been found to circulate in the blood, although in most cases the targets of these RNAs, and even their functions, are not well understood. YRNAs are small (84-112 nt) RNAs with poorly characterized functions, best known because they make up part of the Ro ribonucleoprotein autoantigens in connective tissue diseases. In surveying small RNAs present in the serum of healthy adult humans, we have found YRNA fragments of lengths 27 nt and 30-33 nt, derived from the 5′ ends of specific YRNAs and generated by cleavage within a predicted internal loop. Many of the YRNAs from which these fragments are derived were previously annotated only as pseudogenes, or predicted informatically. These 5′ YRNA fragments make up a large proportion of all small RNAs (including miRNAs) present in human serum. They are also present in plasma, are not present in exosomes or microvesicles, and circulate as part of a complex with a mass between 100 and 300 kDa. Mouse serum contains far fewer 5′ YRNA fragments, possibly reflecting the much greater copy number of YRNA genes and pseudogenes in humans. The processing and secretion of specific YRNAs to produce 5′ end fragments that circulate in stable complexes are consistent with a signaling function.
Small noncoding regulatory RNAs, including miRNAs, siRNAs, piRNAs, and others, have been the focus of much recent interest, not only because they are crucial for a wide range of biological functions, but also because they are involved in the pathology of cancer and many other human diseases (Esteller M., Nature reviews Genetics 12: 861-874, 2011; Joshua-Tor L, and Hannon G J., Cold Spring Harbor perspectives in biology 3: a003772, 2011; Martens-Uzunova et al., Cancer letters 2013; Okamura K., Wiley interdisciplinary reviews RNA 3: 351-368, 2012; Wery et al., Wiley interdisciplinary reviews Systems biology and medicine 3: 728-738, 2011; Zhang C., Current opinion in molecular therapeutics 11: 641-651, 2009; Zhang et al., Plant physiology, 150: 378-387, 2009). Although miRNAs in particular have been found to have broad biological roles, next generation sequencing has revealed new small RNA types with uncertain functions. Well-described small noncoding RNAs such as tRNAs, snoRNAs, and YRNAs have been found to give rise to smaller RNA species (Dhahbi et al., BMC genomics 14: 298, 2013; Kapranov et al., Science, 316: 1484-1488, 2007; Rother and Meister, Biochimie, 93: 1905-1915, 2011; Tuck and Tollervey, Trends in Genetics, 27: 422-432, 2011); although in many cases the functions of the noncoding RNAs that undergo processing into smaller RNAs are known, the functions of their smaller derivatives remain poorly understood. Intracellular 5′ tRNA halves have been shown to be cleaved by the ribonuclease angiogenin in response to stress and infections; the generated 5′ tRNA halves promote assembly of stress granules carrying stalled preinitiation complexes, and inhibit mRNA translation (Gong et al., BMC infectious diseases, 13: 285, 2013; Ivanov et al., Molecular Cell, 43: 613-623, 2011; Saikia et al., The Journal of biological chemistry, 2012). 
Some snoRNA-derived RNAs exhibited miRNA-like regulatory activity, while the expression levels of other snoRNA-derived RNAs are altered in cancer (Martens-Uzunova et al., Oncogene 31: 978-991, 2012). It has been proposed that snoRNA-derived RNAs may act as tumor suppressors and oncogenes (Martens-Uzunova et al, Cancer letters, 2013). Human YRNA-derived fragments were first detected in cells exposed to apoptotic stimuli (Rutjes et al., The Journal of biological chemistry, 274: 24799-24807, 1999). They were later observed in solid tumors (Meiri et al., Nucleic acids research, 38: 6234-6246, 2010; Schotte et al., Leukemia, 23: 313-322, 2009) and in cultured cells as a response to the chemical stressor poly(I:C) (Nicolas et al., FEBS letters, 586: 1226-1230, 2012).
In both plants and animals, some small RNAs are able to travel between tissues within an organism, thus transferring their functions to other cells. There has been much recent interest in specific miRNAs circulating in the plasma and serum, and some evidence that these can be taken up by cells and alter gene expression; there is also interest in the possibility that they can be markers of specific disease states, particularly cancer (Allegra et al., International journal of oncology, 41: 1897-1912, 2012; Etheridge et al., Mutation Research 717: 85-90, 2011; Zen K, and Zhang C Y, Medicinal research reviews, 32: 326-348, 2012). Using deep sequencing, we recently demonstrated that the levels of many miRNAs circulating in the mouse are increased with age, and that these increases can be antagonized by calorie restriction (Dhahbi et al, Aging, 5: 130-141, 2013). The genes targeted by this set of age-modulated miRNAs are predicted to regulate biological processes directly relevant to the manifestations of aging, and the miRNAs themselves have been linked to diseases associated with old age.
We recently reported a novel class of circulating small RNAs, 5′ tRNA halves, which prior to our report were described only as stress-induced inhibitors of translation initiation in cultured cells (Dhahbi et al., BMC genomics, 14: 298, 2013). We found that the 5′ tRNA halves are concentrated in hematopoietic and lymphoid tissues, and present in other tissues at very low levels, suggesting that they may be processed in blood cells and released into the blood. Our findings imply that 5′ tRNA halves function on an organismal rather than merely a cellular level. Moreover, they likely function in a context much broader than cellular stress or infection: we find circulating 5′ tRNA halves in unstressed conditions. Changes in their expression with age are also consistent with a broader physiologic role, and it is particularly interesting that these changes are partially mitigated by calorie restriction.
The subject of this report is yet another derivative of a known class of small noncoding RNAs, the YRNAs. They are a largely unexplored noncoding RNA species that are transcribed by RNA polymerase III from four YRNA genes in man (hY1, hY3, hY4 and hY5), and two genes in mice (mY1 and mY3) (Wolin and Steitz, Cell, 32: 735-744, 1983). The sizes of the human YRNAs are 112 nt (hY1), 101 nt (hY3), 98 nt (hY4), and 84 nt (hY5). In addition to the annotated genes, the human genome carries a very large number of YRNA sequences that have been annotated as pseudogenes, while the mouse has few or none (Perreault et al., Nucleic acids research, 33: 2032-2041, 2005; Perreault et al., Molecular biology and evolution, 24: 1678-1689, 2007). YRNAs are components of Ro ribonucleoproteins (Ro RNPs), which are clinically significant autoantigens that are recognized by antibodies in patients with connective tissue diseases (Bouffard et al., The Journal of rheumatology, 23: 1838-1841, 1996; Lerner et al., Science, 211: 400-402, 1981; Reed et al., J Immunol, 191: 110-116, 2013). Although YRNAs are reported to function in chromosomal DNA replication and quality control of noncoding RNA (Sim and Wolin, Wiley interdisciplinary reviews RNA, 2: 686-699, 2011), the function of YRNA-derived fragments has yet to be elucidated. Here we report the presence of abundant cell-free YRNA-derived fragments circulating as large complexes in human serum and plasma. These fragments are derived mostly from the 5′ ends of YRNAs and seem to arise from cleavage of YRNAs at a predicted internal loop to produce what we term “5′ YRNA fragments.”
Blood samples were collected from 5 adult women between 30 and 57 years of age, after obtaining informed consent. To obtain serum samples, blood was collected in BD Vacutainer SST tubes (#367985, BD, Franklin Lakes, N.J.), incubated for 30 min at room temperature to allow coagulation, and centrifuged at 5,000 g for 10 min. To obtain plasma samples, blood was collected in BD K2 EDTA Spray-Dried tubes (#366643, BD, Franklin Lakes, N.J.) or in tubes containing sodium heparin (5.5 mg/ml), transferred to Leucosep tubes (#227290P, Greiner Bio-One, Monroe, N.C.) and centrifuged at 800 g for 15 min at room temperature. Serum and plasma supernatants were transferred to new tubes, centrifuged at 16,000 g for 15 min to remove any residual cells and cell debris, and stored at −80° C. before use. Blood samples were also collected from 5 male B6C3F1 mice (Charles River Laboratories) at 7 months of age. Immediately after collection, blood was transferred to BD Microtainer tubes (#365967, BD, Franklin Lakes, N.J.), and processed as described above for the human blood samples to prepare mouse serum.
Isolation of total RNA, including small RNA, was performed with the miRNeasy kit (#217004, Qiagen, Hilden, Germany) according to the manufacturer's protocol except for the following alterations: 1 mL of Qiazol reagent was mixed with 0.2 mL serum or plasma, the entire aqueous phase was loaded onto a single column from the MinElute Cleanup Kit (#74204, Qiagen, Hilden, Germany), and RNA was eluted in 20 μL of RNase-free water. One fourth (5 μL) of the RNA isolated from each serum or plasma sample was used to construct sequencing libraries with the Illumina TruSeq Small RNA Sample Prep Kit (#RS-200-0012, Illumina, San Diego, Calif.), following the manufacturer's protocol. Briefly, 3′ and 5′ adapters were sequentially ligated to small RNA molecules and the obtained ligation products were subjected to a reverse transcription reaction to create single stranded cDNA. To selectively enrich those fragments that have adapter molecules on both ends, the cDNA was amplified with 15 PCR cycles using a common primer and a primer containing an index tag to allow sample multiplexing. The amplified cDNA constructs were gel purified, and validated by checking the size, purity, and concentration of the amplicons on the Agilent Bioanalyzer High Sensitivity DNA chip (#5067-4626, Genomics Agilent, Santa Clara, Calif.). The libraries were pooled in equimolar amounts, and sequenced on an Illumina HiSeq 2000 instrument to generate 50 base reads.
Sequencing reads were pre-processed with FASTX-Toolkit (hannonlab.cshl.edu) to trim the adaptor sequences, and discard low quality reads. The filtered reads were mapped to the human (hg19) or mouse (mm10) genomes with Bowtie version 0.12.8 (Langmead et al., Genome biology 10: R25, 2009) using the “end-to-end k-difference (-v)” alignment mode and allowing 2 or 0 mismatches. In addition, this mode of alignment was combined with options that define which and how many alignments should be reported: the options “-k 1 --best” instructed Bowtie to report only the best alignment if more than one valid alignment exists, while the option “-m 1” instructed Bowtie to refrain from reporting any alignments for reads having multiple reportable alignments. Annotation of the mapped sequencing reads was performed with BEDTools (Quinlan et al., Bioinformatics 26: 841-842, 2010) using noncoding RNAs from Ensembl GRCh37 release 70, miRNAs from miRBase and tRNAs from Genomic tRNA Database (Chan and Lowe, Nucleic acids research 37: D93-97, 2009).
RNAs analyzed with Northern blots were extracted from normal or UV-irradiated U2OS cells (# HTB-96, ATCC, Manassas, Va.) and from fractionated human serum. Before RNA extraction, some serum samples were centrifuged at 110,000 g for 2 hours, followed by separation of supernatant and pellet fractions, and others were separated into concentrate and filtrate fractions by ultrafiltration through Vivaspin 2 columns (GE Healthcare) with 100 or 300 kDa MW cut-off. Total RNA including small RNA was isolated from cell pellets or serum fractions with the miRNeasy kit (Qiagen). RNAs were separated on 15% denaturing polyacrylamide gels, transferred, and fixed to a membrane by chemical cross-linking (Pall GS, and Hamilton A J., Nature protocols 3: 1077-1084, 2008). Blots were hybridized overnight at 42° C. in ULTRAhyb-Oligo Buffer (Invitrogen) with the following 32P-5′-end labeled oligonucleotide probes against the 5′ end (5′-AGTTCTGATAACCCACTACCATCGGACCAGCC; SEQ ID NO:8), or 3′ end (5′-AGCCAGTCAAATTTAGCAGTGGGGGGTTGTAT; SEQ ID NO:9) of RNY4. Membranes were washed twice with 2×SSC, 0.1% SDS at 42° C. for 30 minutes, and exposed to X-ray films for detection of signals.
We used RNA-Seq (Illumina reads of 50 nt) to characterize small RNAs circulating in human serum, using indexed libraries to distinguish reads from each serum sample. A combined total of 58,203,901 pre-processed sequencing reads was obtained from five human serum samples. The pooled sequencing reads were mapped to the human genome with Bowtie using parameters that align reads according to the end-to-end k-difference policy, allowing two mismatches and reporting only the best alignment if more than one valid alignment exists (Langmead et al., Genome biology 10: R25, 2009). This analysis generated a dataset of 51,887,820 mapped reads (89.15%), ranging in size from 18 to 49 nt. When reads with more than one alignment were discarded, the size distribution of the mapped reads revealed an expected peak at 20-24 nt consistent with the size of miRNAs (
Annotation of the mapped sequencing reads revealed that, as expected, reads in the 20-24 nt peak were derived from miRNAs (
Most circulating small RNAs that align to YRNA genes are derived from the RNY4 gene and its pseudogenes (
Characterization of Circulating Small RNAs that Align to YRNAs.
The serum-derived sequencing reads that align to YRNA genes are either 27 nt or 30-33 nt in size, while the size of full length YRNAs is 84-112 nt. We asked if the YRNA reads were the products of random fragmentation of full-length YRNAs, or alternatively show evidence of processing to produce specific fragments. We examined the alignment of YRNA reads to the genes from which they were transcribed, and annotated them based on their overlap with 5′ or 3′ ends of the genes. This analysis revealed that >95% of the YRNA-derived reads align with the 5′ end of YRNA; this is exemplified in
Northern blotting confirms the presence of 5′ YRNA fragments in human serum and plasma. A probe specific for the 5′ end of RNY4 detected a major band migrating near the 30 nt RNA marker, and a minor band at ~27 nt (
As a positive control for detection of YRNA fragments by Northern blotting, we included RNA from U2OS cells exposed to UV irradiation, which is known to strongly induce apoptosis. Cleavage of YRNAs, with generation of stable cellular YRNA-derived fragments, has been observed after exposure of cells to apoptotic stimuli (30). RNA extracted from U2OS cells treated with UV produced the same two bands present in serum (
We next asked if the 5′ YRNA fragments are free, or contained within circulating exosomes or microvesicles. We Northern blotted RNA extracted from pellet and from supernatant after ultracentrifugation of human serum at 110,000 g for 2 hours. A probe for the 5′ end of RNY4 detected bands (at ˜30 nt and ˜27 nt) present in the supernatant and visible only as a trace in the pellet (
Human Serum and EDTA Plasma have Similar Profiles of Circulating 5′ YRNA Fragments.
We asked if the same 5′ YRNA fragments are present in both serum and plasma. We prepared serum and plasma from blood collected from the same individual at the same time; plasma was prepared from blood treated with the anticoagulant EDTA, and serum from coagulated blood. Sequencing of small RNAs extracted from equal amounts of serum and EDTA plasma shows that plasma displays the same peak pattern (20-24 nt, 27 nt and 30-33 nt peaks) found in serum, with the exception that reads of 30 nt are significantly under-represented in the EDTA plasma when compared to serum (
Comparison of the annotations of the sequencing reads revealed that miRNAs map to the 20-24 nt peak approximately equally in serum and EDTA plasma (
5′ YRNA Fragments are Much More Abundant in Human than in Mouse Serum.
We sequenced five mouse serum samples to obtain a combined total of 71,725,136 pre-processed sequencing reads. Alignment to the mm10 mouse genome with Bowtie using the end-to-end k-difference policy while allowing two mismatches and reporting only the best alignment if more than one valid alignment exists (16), generated a dataset of 62,111,449 mapped reads (86.6%), ranging from 18 to 49 nt. Comparison of the length distribution revealed that both human and mouse serum display 20-24 nt and 30-33 nt peaks (
While surveying the profiles of cell-free small RNAs circulating in human blood, we identified abundant small RNAs derived from YRNAs, a class of small noncoding RNAs which complex with Ro protein in the cytoplasm, but as yet have incompletely characterized functions. We obtained 45,890,222 sequencing reads aligning to known small RNAs and found that 33% of these reads were annotated as YRNAs (
The serum YRNAs are derived from a subset of YRNA genes, many of them previously annotated as pseudogenes. While 27% of all sequencing reads that align with YRNAs were derived from RNY4, only 2% mapped to RNY1, RNY3, and RNY5 combined (
More interestingly, 42% of the sequencing reads that align with YRNAs map to pseudogenes arising from RNY4, while only 0.02% map to the pseudogenes of RNY1, RNY3, and RNY5 combined (
The YRNA reads represent fragments processed from full length (84-112 nt) YRNAs: the sequencing runs used to generate these reads were 50 cycles, yet only reads of length 27 nt or 30-33 nt are recovered and longer species were not present (
Because clotting has the potential to release cellular components that are not present in circulating blood, we asked if the same peak pattern of small RNAs in the human serum is also present in human plasma. Sequencing analysis of small RNAs extracted from serum and EDTA plasma samples prepared from the same person revealed that YRNAs are present in equivalent amounts and types in serum and EDTA plasma (
This study points out a puzzling feature of circulating small RNAs: 5′ YRNA fragments are abundant in human serum, but scarce in the mouse (
Secreted miRNAs, the most extensively studied circulating small RNAs, circulate in the blood as part of microvesicles, exosomes, or apoptotic bodies, and also in association with the lipoproteins HDL and LDL, Argonaute proteins, nucleophosmin-1, and ribosomal proteins L10a and L5 (Arroyo et al., Proceedings of the National Academy of Sciences of the United States of America, 5003-5008, 2011; Turchinovich A, and Burwinkel B., RNA biology 9: 2012; Turchinovich et al., Nucleic acids research 39: 7223-7233, 2011; Vickers et al., Nature Cell Biology 13: 423-433, 2011; Wang et al., Nucleic acids research 38: 7248-7259, 2010; Zernecke et al., Science Signaling 2: ra81, 2009). Nothing is currently known about the packaging of circulating small RNAs other than miRNAs, nor is it known how small RNAs, including miRNAs, make their way out of the cell into the extracellular space. Our Northern blot analysis of RNA extracted from pellet or supernatant after ultracentrifugation of human serum indicates that circulating 5′ RNY4 fragments are not included in exosomes or microvesicles (
Currently the tissues/cells of origin of circulating small RNAs, the mechanisms by which they are delivered, and their functions in recipient cells, remain largely unknown. However, information about the properties of one type of circulating small RNAs, i.e., miRNAs, has been emerging. Vickers et al. demonstrated that circulating miRNA/HDL complexes from atherosclerotic subjects, when delivered into cultured hepatocytes, altered expression of genes with functions related to lipid metabolism, inflammation, and atherosclerosis (Vickers et al., Nature Cell Biology 13: 423-433, 2011). Extracellular miRNAs secreted by endothelial cells are reported to alter gene expression in recipient cells. miR-126 triggered the production of the chemokine CXCL12 in recipient vascular cells (Zernecke et al., Science Signaling 2: ra81, 2009) while miR-143/145 altered gene expression in co-cultured smooth muscle cells to reduce the formation of atherosclerotic lesions in the aorta of ApoE(−/−) mice (Hergenreider et al., Nature cell biology 14: 249-256, 2012). Similarly, miR-150 secreted by human blood cells and cultured monocytic THP-1 cells, reduced c-Myb expression and enhanced cell migration after delivery into HMEC-1 cells (Zhang et al., Molecular cell 39: 133-144, 2010). Thus, there is evidence that extracellular miRNAs can enter target cells and alter gene expression with significant functional consequences. This suggests that other circulating small RNAs, such as 5′ YRNA fragments and 5′ tRNA halves, may also be capable of crossing the membranes of target cells and modulating cellular functions.
Reports of YRNA-derived fragments in cells or tissues are scant. Human YRNA-derived fragments were first observed in cells exposed to apoptotic stimuli (Rutjes et al., The Journal of biological chemistry 274: 24799-24807, 1999). The apoptosis-induced YRNA fragments have small (22-25 nt) and large (27-36 nt) sizes, and remain bound to Ro after they are cleaved. However, whether these fragments are derived from the 5′ or 3′ ends, or both, was not determined; Rutjes and colleagues (Rutjes, supra) used a non-specified mixture of probes for the four human YRNAs during Northern blot analysis. The same study also showed that the cleavage of YRNA is caspase-dependent. This suggests that the nucleases that cleave YRNAs might be caspase-activated nucleases also involved in inter-nucleosomal cleavage of chromatin that results in the DNA ladder during apoptosis. Whether the 5′ YRNA fragments abundantly circulating in the bloodstream of healthy human subjects can be linked to such an apoptotic cleavage remains to be investigated.
Production of 3′ end fragments of human RNY5 was observed upon treatment of cancerous and non-cancerous cell lines with the stressor poly(I:C), a double-stranded RNA mimic immunostimulant chemical (Nicolas et al., FEBS Letters 586: 1226-1230, 2012). The same study reported the presence of 3′ end fragments of human RNY5 RNA in non-stressed MCF-7 mammary adenocarcinoma cells (Nicolas, supra). Only a human RNY5 3′ end probe was used in the Northern blotting analysis in this study, and so it is not known whether 5′ end fragments of human RNY5 RNA were also present in these cells. Likewise, two 25-nt fragments derived from RNY1 and RNY3 RNAs were detected in solid tumors and in normal serum (Meiri et al., Nucleic Acids Research 38: 6234-6246, 2010; Schotte et al., Leukemia 23: 313-322, 2009). These two small RNAs were initially classified as miRNAs, but were subsequently removed from miRBase because they lacked gene regulatory activity. Larger (27-36 nt) fragments derived from YRNAs, similar to the ones reported here, were not reported in solid tumors or in normal serum, most likely because in these studies sequences whose length exceeded 17-25 nt were systematically discarded (Meiri et al., Nucleic Acids Research 38: 6234-6246, 2010). In another study, 28-nt YRNA fragments were found in vesicles released by immune cells, along with full-length YRNAs, and full-length forms and derivatives of SRP-RNA and vault-RNA (Nolte-'t Hoen et al., Nucleic Acids Research 40: 9272-9285, 2012). The vesicular small RNAs were enriched relative to cellular RNA, suggesting their selective release into the extracellular space and potential regulatory functions in target cells.
In this study, we have identified an abundant (comparable to miRNA) class of small RNA circulating in human blood, derived largely from genomic sequences annotated as YRNA pseudogenes. Taken together, the evidence discussed here indicates a potential for a variety of functions for 5′ YRNA fragments.
The development of non-invasive specific biomarkers for early detection of cancer is key for effective therapeutic and preventive approaches to confront the worldwide morbidity and mortality of cancer and its rising financial burden. Circulating miRNAs are emerging as novel blood-based markers for the detection of human cancers, especially at an early stage. More recently, other small non-coding RNAs were detected in plasma and serum, offering potential as a new class of biomarkers for diseases. Non-coding RNAs with well-known functions undergo processing into smaller RNAs; in particular, tRNA is processed into tRNA fragments, which were shown to function as inhibitors of translation initiation in response to stress in cultured cells. We recently reported the presence of tRNA- and YRNA-derived fragments in serum/plasma, where they circulate as a component of a stable macromolecular complex. We found that the abundance of 5′ tRNA halves in the serum changes with age and calorie restriction, strongly suggesting that they are a novel form of signaling molecule and thus could serve as markers of health and disease states. YRNA-derived fragments were detected in MCF-7 mammary adenocarcinoma cells and were found to be significantly induced upon treatment of cancerous and non-cancerous cell lines with the stressor poly(I:C), a double-stranded RNA mimic immunostimulant chemical.
Here, we used high-throughput sequencing of small RNAs to perform genome-wide measurements of the serum levels of tRNA and YRNA fragments from 5 healthy female controls and 5 female patients with breast cancer. The analysis revealed that breast cancer is associated with significant differences in the abundance of circulating noncoding small RNAs derived from tRNAs and YRNAs (Tables 4 and 5). The observed differences in the levels of the circulating YRNA- and tRNA-derived fragments are linked to the presence of cancer. Thus, the profile of these fragments in serum, plasma, and other body fluids can be used as new minimally invasive cancer markers.
Table 4 footnotes:
¹ tRNA isoacceptor identity with corresponding genomic positions in the human hg19 genome.
² Average tRNA read count for the indicated experimental group, reported as counts per million (cpm) reads in the sequenced library.
³ Fold change calculated by edgeR from comparison between the normal and breast cancer serum samples.
Table 5 footnotes:
¹ YRNA identity with corresponding genomic positions in the human hg19 genome.
² Average YRNA read count for the indicated experimental group, reported as counts per million (cpm) reads in the sequenced library.
³ Fold change calculated by edgeR from comparison between the normal and breast cancer serum samples.
⁴ Indicates whether the sequencing reads map to the 5′ or 3′ end of the YRNA.
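The counts-per-million normalization and fold change named in the table footnotes can be illustrated with a minimal sketch. It is not the edgeR procedure used for the tables: edgeR fits a negative binomial model with library normalization factors, whereas the sketch below simply takes a ratio of group-mean cpm values. The function names, the pseudocount, and all read counts and library sizes are hypothetical and are not data from this study.

```python
import math


def counts_per_million(count, library_size):
    """Scale a raw read count to counts per million (cpm) reads in the library."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return count * 1_000_000 / library_size


def mean_cpm(counts, library_sizes):
    """Average cpm of one fragment across the samples of one group."""
    values = [counts_per_million(c, n) for c, n in zip(counts, library_sizes)]
    return sum(values) / len(values)


def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """log2 ratio of group B over group A.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when a fragment is absent from one group.
    """
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Hypothetical raw counts for one fragment in 5 control and 5 cancer sera.
normal_counts = [1200, 900, 1500, 1100, 1300]
normal_libraries = [2_000_000, 1_500_000, 2_500_000, 2_000_000, 2_000_000]
cancer_counts = [400, 350, 500, 450, 300]
cancer_libraries = [2_000_000, 1_800_000, 2_200_000, 2_000_000, 1_900_000]

normal_mean = mean_cpm(normal_counts, normal_libraries)
cancer_mean = mean_cpm(cancer_counts, cancer_libraries)
print(f"normal: {normal_mean:.1f} cpm, cancer: {cancer_mean:.1f} cpm")
print(f"log2 fold change: {log2_fold_change(normal_mean, cancer_mean):.2f}")
```

A negative log2 fold change in this sketch means the fragment is less abundant in the cancer group than in the normal group. Assessing statistical significance, as edgeR does, requires modeling the dispersion across samples and is outside the scope of this illustration.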
It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.
This application claims the benefit of U.S. Provisional Application No. 61/818,869, filed May 2, 2013, the contents of which are hereby incorporated by reference in their entirety for all purposes.
Number | Date | Country
---|---|---
61818869 | May 2013 | US