The present invention relates generally to the field of molecular biology. More particularly, it concerns methods and compositions for detecting, evaluating, sequencing, and/or mapping modified adenosines.
A central question of biology is how the flow of genetic information from DNA to RNA to protein is regulated. While transcriptional regulation—the production of messenger RNA (mRNA)—plays major roles, and has been extensively studied, protein expression ultimately determines biological phenotypes. Protein production is augmented by various post-transcriptional regulations such as mRNA structure, microRNA, and mRNA translation; each of these processes fundamentally affects the protein levels and localizations that eventually impact every biological process.
Reversible and dynamic mRNA and long non-coding RNA (lncRNA) modifications were recently discovered as being a fundamental mechanism that broadly controls protein expression at the post-transcriptional level (Jia G, et al., Nat Chem Biol. 2011; 7(12):885-7; Liu J, et al., Nat Chem Biol. 2014; 10(2):93-5; Wang X, et al., Nature. 2014; 505(7481):117-20; Zheng G, et al., Mol Cell. 2013; 49(1):18-29; and Fu Y, et al., Nat Rev Genet. 2014; 15(5):293-306).
Since then, there has been extensive research interest in profiling various mRNA/lncRNA modifications such as N6-methyladenosine (m6A) based on antibodies or pseudouridine (Ψ) based on a chemical reaction (Carlile T M, et al., Nature. 2014; 515(7525):143-6; Schwartz S, et al., Cell. 2014; 159(1):148-62; Dominissini D, et al., Nature. 2012; 485(7397):201-6; and Meyer K, Cell. 2012; 149(7):1635-46). These studies have identified the presence of a very large number of modification sites, leading to the current high interest in the epitranscriptome field. Functional explorations of RNA modifications in various biological processes have so far uncovered several new gene expression regulatory mechanisms (Liu J, et al., Nat Chem Biol. 2014; 10(2):93-5; Wang X, et al., Nature. 2014; 505(7481):117-20; Zheng G, et al., Mol Cell. 2013; 49(1):18-29; Batista P J, et al., Cell Stem Cell. 2014; 15(6):707-19; Chen T, et al., Cell Stem Cell. 2015; 16(3):289-301; Geula S, et al., Science. 2015; 347(6225):1002-6; Ping X-L, et al., Cell Res. 2014; 24(2):177-89; Schwartz S, et al., Cell. 2013; 155(6):1409-21; and Wang Y, et al., Nat Cell Biol. 2014; 16(2):191-8). RNA modification is a highly fertile ground where additional regulatory mechanisms will be discovered. In particular, mRNA/lncRNA modifications are expected to be increasingly associated with human health and diseases as the field progresses.
Despite their functional significance and potential associations with human diseases, mRNA/lncRNA modifications have been studied with methods confined to either antibody-based immunoprecipitations or applications of decades-old chemical approaches; all of these methods are significantly limited in resolution and sensitivity. Therefore, there is a need in the art for new methods of detecting RNA modifications.
The current disclosure addresses the aforementioned need in the art and describes a new generation of sequencing technology that can be applied generally in order to obtain highly sensitive and single-base-resolution mapping of different RNA modifications. Described herein is a method for detecting modified adenosine in a target ribonucleic acid (RNA) comprising contacting the target RNA with an adenosine deaminase enzyme to generate a target RNA with deaminated adenosines and sequencing the target RNA with deaminated adenosines; wherein any modified adenosine in the target RNA is read as an adenosine in the sequencing of the target RNA with deaminated adenosines. In some embodiments of any of the methods, kits, and compositions described herein, the adenosine deaminase enzyme is ADAR (adenosine deaminase, RNA-specific). In some embodiments, the adenosine deaminase enzyme is RNA specific. In some embodiments, the adenosine deaminase enzyme lacks significant sequence specificity or is non-sequence specific. In some embodiments, the adenosine deaminase enzyme works on double-stranded nucleic acids.
Adenosine deaminase enzymes include, for example, adenosine deaminases from any organism, such as humans (ADA, GenBank Accession: NP_000013), mouse (ADA, GenBank Accession: NP_001258981), and Drosophila melanogaster (SEQ ID NO:1); and RNA adenosine deaminases from humans (GenBank Accession: AAB97118), cows (GenBank Accession: XP_010801274.1), and rat (GenBank Accession: EDM00617). The sequences associated with each of these are herein incorporated by reference in their entirety.
In some embodiments, the method further comprises contacting target RNA with a demethylating enzyme; contacting the demethylated target RNA with the adenosine deaminase enzyme to produce control RNA; and sequencing the control RNA; wherein any modified adenosine in the target RNA is read as a guanosine in the sequencing of the control RNA with deaminated adenosines. In further embodiments, the demethylated target RNA is made by methods and steps described herein for generating controls. In some embodiments, the method further comprises comparing the sequence of the target RNA with deaminated adenosines to the sequence of the demethylated target RNA. In some embodiments, the demethylating enzyme is an N6-methyladenosine-specific demethylating enzyme. In some embodiments, the demethylating enzyme is ALKBH5 or FTO. FTO (fat mass and obesity associated) and ALKBH5 (alkB homolog 5, RNA demethylase or alkB, alkylation repair homolog 5) are demethylating enzymes specific for N6-methyladenosine. FTO and ALKBH5 are known in the art. The enzyme may be recombinantly made or synthetic, and may be from any species. In some embodiments, the enzyme is the mammalian enzyme. The human ALKBH5 is represented by GenBank Accession Nos.: NM_017758.3 (mRNA) and NP_060228.3 (protein). The mouse ALKBH5 is represented by GenBank Accession Nos.: NM_172943.4 (mRNA) and NP_766531.2 (protein). The human FTO is represented by GenBank Accession Nos.: XM_011523313.1 (mRNA), XP_011521615.1 (protein), XM_011523316.1 (mRNA), XP_011521618.1 (protein), XM_011523314.1 (mRNA), XP_011521616.1 (protein), XM_011523315.1 (mRNA), and XP_011521617.1 (protein). In some embodiments, the demethylating enzyme is ALKBH5. The sequences associated with each of these GenBank accession numbers are herein incorporated by reference for all purposes. In some embodiments, the demethylating enzyme is from insects. In some embodiments, the demethylating enzyme is from Drosophila melanogaster.
In some embodiments, the target and/or demethylated target RNA is in a duplex with a complementary strand of RNA or DNA. In some embodiments, the complementary strand is DNA. In some embodiments, the complementary strand comprises modified adenosine. The modified adenosine may be one known in the art and/or described herein. In some embodiments, the modified adenosine is N6-methyladenosine or N1-methyladenosine.
ADAR encodes the enzyme responsible for RNA editing by site-specific deamination of adenosines. This enzyme destabilizes double-stranded RNA through conversion of adenosine to inosine. The ADAR may be from any organism, such as human, mouse, insect, etc. In some embodiments, the ADAR is from Drosophila melanogaster, which is abbreviated dADAR. In some embodiments, the ADAR is from insects. The ADAR may be synthetically made or recombinantly made. Methods of producing and purifying enzymes are known in the art.
The target RNA may be any type of RNA in a cell. In some embodiments, the target RNA is mRNA, lncRNA, pri-microRNA, pre-piRNA, rRNA, tRNA, snoRNA, or snRNA. In some embodiments, the target RNA is mRNA or lncRNA. In some embodiments, the method further comprises isolating RNA. In some embodiments, the method comprises isolating a specific RNA. The term isolating, in this context, refers to the separation of one type of RNA from other types of RNA. Therefore, the isolated RNA fraction may contain at least, at most, or exactly about 75, 80, 85, 90, 95, 96, 97, 98, 99, or 100% (or any derivable range therein) of a specific RNA type. In some embodiments, the term isolated also refers to something that is separated from cellular components and/or is free of cellular materials.
In some embodiments, the method further comprises generating a nucleic acid strand that is complementary with the target and/or demethylated target RNA and hybridizing the complementary nucleic acid strand with the target RNA and/or demethylated target RNA. Methods of generating complementary nucleic acid strands are known in the art and described herein. In some embodiments, generating the nucleic acid strand that is complementary with the target and/or demethylated target RNA comprises synthesis of a nucleic acid strand complementary to the target and/or demethylated target RNA. In some embodiments, the synthesis of the nucleic acid strand comprises a synthesis reaction composition comprising adenosine triphosphate, wherein all of the adenosine triphosphate is modified adenosine triphosphate. In some embodiments, the complementary nucleic acid strand comprises modified adenosines.
When the term “modified adenosines” is used herein, the modified adenosine may be any modified adenosine known in the art and/or described herein. In some embodiments of the methods, compositions, and kits of the disclosure, the modified adenosine is N6-methyladenosine. In some embodiments, the modified adenosine is N1-methyladenosine. In some embodiments, the nucleic acid may comprise more than one type of adenosine modification. In some embodiments, the method further comprises determining the type of modification in a type of RNA.
In some embodiments, the complementary nucleic acid strand is RNA. In some embodiments, the complementary nucleic acid strand is DNA.
In some embodiments, the target and/or demethylated target RNA is contacted with an RNA polymerase to synthesize a complementary RNA strand. The RNA polymerase may be any RNA polymerase known in the art. In some embodiments, the method further comprises contacting the target and/or demethylated target RNA with an RNA replicase to synthesize a complementary RNA strand. In some embodiments, the RNA replicase is Phi6. In further embodiments, the RNA replicase is one known in the art. In some embodiments, the method comprises synthesis of a complementary nucleic acid from a cDNA library. In some embodiments, the complementary nucleic acid strand is RNA.
In some embodiments, the target RNA is immobilized on a solid support. Solid supports are known in the art and include, for example, glass, plastics, polymers, metals, metalloids, ceramics, organics, beads, agarose, cellulose, dextran (commercially available as, e.g., Sephadex, Sepharose), carboxymethyl cellulose, polystyrene, polyethylene glycol (PEG), filter paper, nitrocellulose, ion exchange resins, plastic films, polyaminemethylvinylether maleic acid copolymer, glass beads, amino acid copolymer, ethylene-maleic acid copolymer, nylon, silk, etc.
In some embodiments, the target RNA strand is labeled. As used herein, the term “label” intends a directly or indirectly detectable compound or reactable functional group useful for attachment of nucleic acids to solid supports. The label may also be conjugated directly or indirectly to the composition to be detected, e.g., polynucleotide or protein such as an antibody so as to generate a “labeled” composition. The term also includes sequences conjugated to the polynucleotide that will provide a signal upon expression of the inserted sequences, such as green fluorescent protein (GFP) and the like. The label may be detectable by itself (e.g., radioisotope labels or fluorescent labels) or, in the case of an enzymatic label, may catalyze chemical alteration of a substrate compound or composition which is detectable. The labels can be suitable for small scale detection or more suitable for high-throughput screening. As such, suitable labels include, but are not limited to, radioisotopes, fluorochromes, chemiluminescent compounds, dyes, and proteins, including enzymes. The label may be simply detected or it may be quantified. A response that is simply detected generally comprises a response whose existence merely is confirmed, whereas a response that is quantified generally comprises a response having a quantifiable (e.g., numerically reportable) value such as an intensity, polarization, and/or other property. In luminescence or fluorescence assays, the detectable response may be generated directly using a luminophore or fluorophore associated with an assay component actually involved in binding, or indirectly using a luminophore or fluorophore associated with another (e.g., reporter or indicator) component.
In some embodiments, the label is a compound or reactable functional group useful for attachment of a nucleic acid to a solid support. In some embodiments, the label is a phosphorothioate group. In some embodiments, the target RNA is immobilized by reaction of the phosphorothioate group with a thiol-reactive group. In some embodiments, the thiol-reactive group is iodoacetamide, maleimide, or methanethiosulfonate.
In some embodiments, the target RNA is fragmented prior to immobilization on the solid support. In some embodiments, the RNA is fragmented into RNA molecules 50-300 nucleotides in length. In some embodiments, the RNA is fragmented into RNA molecules 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 225, 250, 275, 300, 350, 400, 450, or 500 to 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 225, 250, 275, 300, 350, 400, 450, 500, 1000, 1500, or 2000 (or any range derivable therein) nucleotides in length. In some embodiments, the average RNA molecule fragment is about 25, 50, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 220, 240, 260, 280, 300, 325, 350, 375, 400, 450, 500, or 600 nucleotides in length.
In some embodiments, the method comprises one or more of the following steps: fragmenting the target RNA; contacting the target and/or demethylated target RNA with RNA replicase to synthesize a complementary RNA strand that forms a duplex with the target RNA or demethylated target RNA; denaturing the RNA duplex to create mobilized complementary RNA; removing the mobilized complementary RNA; and/or mobilizing the target RNA by separating it from the solid support. In some embodiments, mobilizing the target RNA is done by silver (I)-mediated cleavage (for phosphorothioate linkage). In some embodiments, mobilizing the target RNA further comprises proteinase K digestion and/or biotin competition (for biotin-based linkage). In some embodiments, sequencing of the target RNA comprises construction of a library of nucleic acid molecules comprising the target RNA sequence.
In some embodiments, the method comprises the steps of a) immobilizing the target RNA; b) contacting the target and/or demethylated target RNA with RNA replicase to synthesize a complementary RNA strand that forms a duplex with the target RNA or demethylated target RNA; c) contacting the target RNA with an adenosine deaminase enzyme to generate a target RNA with deaminated adenosines; d) denaturing the RNA duplex to create mobilized complementary RNA; e) removing the mobilized complementary RNA; and f) sequencing the target RNA. In some embodiments, the steps a-f are sequential. In some embodiments, step a is performed followed by iterative rounds of the ordered steps of b-e, followed by step f. In some embodiments, steps b-e are repeated in iterative cycles one or more times prior to step f. For example, one method includes three iterative rounds of steps b-e, which includes performing the following steps in the following order: step a, b, c, d, e, b, c, d, e, b, c, d, e, f. In some embodiments, steps b-e are repeated at least, at most, or exactly 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 50, 75, or 100 times (or any derivable range therein).
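By way of non-limiting illustration, the ordering of steps for a chosen number of iterative rounds may be generated with a short Python sketch. The function name `step_order` is hypothetical; the step letters correspond to steps a-f recited above.

```python
from typing import List


def step_order(rounds: int) -> List[str]:
    """Return the ordered step letters: a, then (b, c, d, e) per round, then f."""
    if rounds < 1:
        raise ValueError("at least one round of steps b-e is required")
    return ["a"] + ["b", "c", "d", "e"] * rounds + ["f"]


three_rounds = step_order(3)
print(", ".join(three_rounds))
```

With three rounds, the sketch reproduces the order given in the example above: immobilization once, three passes through duplex synthesis, deamination, denaturation, and removal, and then sequencing once.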
In some embodiments, the method further comprises contacting the target and/or demethylated target RNA with reverse transcriptase to synthesize a complementary DNA strand.
The target and/or demethylated target RNA may be of any length. In some embodiments, the target RNA is 10-1000 nucleotides in length. In some embodiments the target RNA is at least, at most, or exactly about 5, 10, 20, 30, 40, 50, 60, 70, 80, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 4000, 5000, 7500, or 10000 nucleotides in length, or any derivable range therein. In some embodiments, nucleic acid molecules may be DNA, RNA, or a combination of both. Nucleic acids may be recombinant, genomic, or synthesized. In additional embodiments, methods involve nucleic acid molecules that are isolated and/or purified. The nucleic acid may be isolated from a cell or biological sample in some embodiments. Certain embodiments involve isolating nucleic acids from a eukaryotic, mammalian, or human cell. In some cases, they are isolated from non-nucleic acids. In some embodiments, the nucleic acid molecule is eukaryotic; in some cases, the nucleic acid is mammalian, which may be human. This means the nucleic acid molecule is isolated from a human cell and/or has a sequence that identifies it as human. In particular embodiments, it is contemplated that the nucleic acid molecule is not a prokaryotic nucleic acid, such as a bacterial nucleic acid molecule. In additional embodiments, isolated nucleic acid molecules are on an array. In particular cases, the array is a microarray. In some cases, a nucleic acid is isolated by any technique known to those of skill in the art, including, but not limited to, using a gel, column, matrix or filter to isolate the nucleic acids. In some embodiments, the gel is a polyacrylamide or agarose gel.
In some embodiments, the sequence of the target and/or demethylated target RNA is known. The term sequence as used herein refers to the nucleotide sequence such as “A” for adenine, “G” for guanine, “C” for cytosine, “T” for thymine, and “U” for uracil. Even though the sequence is known, it may not be known whether the nucleic acid bases are modified or unmodified.
In some embodiments, the method further comprises comparing the known sequence of the target RNA with the sequence of the target RNA with deaminated adenosines.
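By way of non-limiting illustration, this comparison may be sketched in Python. The sketch assumes that the read from the deaminated target RNA has already been aligned to the known sequence without gaps, that deaminated adenosines (inosines) are reported as "G" after reverse transcription and sequencing, and that positions are 0-based. The function name `call_modified_adenosines` and the three output classes are hypothetical and are not part of the disclosure.

```python
from typing import Dict, List


def call_modified_adenosines(reference: str, deaminated_read: str) -> Dict[str, List[int]]:
    """Classify each reference adenosine by the base read after deamination."""
    # Normalize RNA and cDNA alphabets so that U and T compare equal.
    ref = reference.upper().replace("U", "T")
    read = deaminated_read.upper().replace("U", "T")
    if len(ref) != len(read):
        raise ValueError("sequences must be pre-aligned to equal length")

    calls: Dict[str, List[int]] = {"modified": [], "unmodified": [], "ambiguous": []}
    for pos, (ref_base, read_base) in enumerate(zip(ref, read)):
        if ref_base != "A":
            continue
        if read_base == "A":
            calls["modified"].append(pos)    # resisted deamination
        elif read_base == "G":
            calls["unmodified"].append(pos)  # deaminated to inosine, read as G
        else:
            calls["ambiguous"].append(pos)   # e.g., sequencing error or variant
    return calls


example = call_modified_adenosines("GACUAGA", "GGCTAGG")
```

Here the adenosine at position 4 is still read as "A" and is therefore called as modified, while the adenosines at positions 1 and 6 are read as "G" and are called as unmodified. In practice, calls would be made from many reads per position rather than from a single read.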
In some embodiments, contacting the target and/or demethylated target RNA with ADAR is done in the presence of GTP. In some embodiments, the GTP is in a concentration of 0.5-5 mM. In some embodiments the concentration of GTP in the solution is at least, at most, or exactly about 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 8.0, 9.0, 10, 15, 20, 30, 40, 50, 100, 200, 300, 400, or 500 mM (or any derivable range therein).
In some embodiments, the target RNA molecule is in a duplex with a DNA molecule. In some embodiments, the method comprises contacting the RNA-DNA duplex with a DNA digesting agent prior to sequencing the RNA. The DNA digesting enzyme may be sequence specific or non-specific. In some embodiments, the DNA digesting enzyme is DNase.
In some embodiments, the method further comprises the steps of: a) generating a DNA strand that is complementary to the target and/or demethylated target RNA and hybridizing the complementary DNA strand with the target and/or demethylated target RNA to generate an RNA-DNA duplex; b) contacting the RNA-DNA duplex with the adenosine deaminase enzyme; and c) contacting the RNA-DNA duplex with a DNA digesting agent; wherein all the steps are done prior to sequencing the target RNA. In some embodiments, the method further comprises repeating steps a, b, and/or c one or more times. In some embodiments, the steps are repeated at least two more times. In some embodiments, the steps a, b, and/or c are repeated at least, at most, or exactly 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 more times, or any derivable range therein. In some embodiments, the method further comprises a step of strand separation after step b). In some embodiments, the method further comprises purification of the target and/or demethylated target RNA after step c) and prior to sequencing the target RNA.
In some embodiments, the method further comprises providing a quantification RNA control comprising a known percentage of modified adenosine; contacting the quantification control RNA with the adenosine deaminase enzyme; and sequencing the deaminated quantification control RNA. In some embodiments, 0, 25, 50, 75, or 100% of the adenosine in the quantification RNA control is modified. In some embodiments, the method comprises at least, at most, or exactly 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 quantification controls (or any derivable range therein). In each of the quantification controls, at least, at most, or exactly 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% (or any derivable range therein) of the adenosines may be modified.
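By way of non-limiting illustration, the effect of repeating the deamination rounds may be explored with a simple Python model. The model is an assumption made for illustration only and is not data from the disclosure: it treats deamination as a first-order process, treats every round as equivalent, and scales the rate for modified adenosines by a fold reduction such as the over 50-fold lower activity toward modified adenosines reported elsewhere in this disclosure. The function name `fraction_read_as_a` and its parameters are hypothetical.

```python
import math


def fraction_read_as_a(per_round_conversion: float, rounds: int, fold_lower: float = 1.0) -> float:
    """Fraction of adenosines still read as A after repeated deamination rounds.

    per_round_conversion: assumed fraction of unmodified A converted in one round.
    fold_lower: assumed fold reduction in deaminase activity for this substrate
                (1.0 for unmodified A; e.g., 50.0 for a modified A).
    """
    if not 0.0 <= per_round_conversion < 1.0:
        raise ValueError("per_round_conversion must be in [0, 1)")
    if rounds < 0 or fold_lower <= 0:
        raise ValueError("rounds must be >= 0 and fold_lower must be > 0")
    # First-order exposure (rate x time) implied by one round on unmodified A.
    exposure = -math.log(1.0 - per_round_conversion)
    # A slower substrate experiences the same exposure divided by fold_lower.
    surviving_per_round = math.exp(-exposure / fold_lower)
    return surviving_per_round ** rounds


unmodified_left = fraction_read_as_a(0.9, rounds=3)
modified_left = fraction_read_as_a(0.9, rounds=3, fold_lower=50.0)
```

Under these assumed values (90% conversion of unmodified adenosine per round, three rounds), about 0.1% of unmodified adenosines would still be read as A, whereas about 87% of modified adenosines would. The model suggests a trade-off: additional rounds reduce false positives from unconverted adenosines but also gradually erode the signal from true modified sites.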
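By way of non-limiting illustration, the quantification controls may be used as a calibration curve, as sketched in Python below. The sketch assumes that, for each control, the fraction of reads reporting "A" at the adenosine positions has been measured, that this fraction increases monotonically with the known percentage of modified adenosine, and that linear interpolation between neighboring controls is adequate. The function name `estimate_modified_percent` is hypothetical, and the numerical values in the example are invented for illustration and are not measured data.

```python
from typing import Sequence


def estimate_modified_percent(
    known_percent: Sequence[float],
    observed_a_fraction: Sequence[float],
    sample_a_fraction: float,
) -> float:
    """Interpolate a sample's percent modification from quantification controls."""
    if len(known_percent) != len(observed_a_fraction) or len(known_percent) < 2:
        raise ValueError("need at least two controls with matching lengths")
    # Sort calibration points by the observed A fraction.
    points = sorted(zip(observed_a_fraction, known_percent))
    # Clamp to the calibrated range rather than extrapolating.
    if sample_a_fraction <= points[0][0]:
        return points[0][1]
    if sample_a_fraction >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= sample_a_fraction <= x1:
            if x1 == x0:
                return y0
            return y0 + (y1 - y0) * (sample_a_fraction - x0) / (x1 - x0)
    raise RuntimeError("unreachable for sorted calibration points")


# Hypothetical controls at 0, 25, 50, 75, and 100% modified adenosine.
controls_percent = [0, 25, 50, 75, 100]
controls_a_fraction = [0.02, 0.25, 0.48, 0.70, 0.93]  # invented values
estimate = estimate_modified_percent(controls_percent, controls_a_fraction, 0.59)
```

With these invented calibration values, a site at which 59% of reads report "A" falls midway between the 50% and 75% controls and is estimated at about 62.5% modified.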
Further aspects of the disclosure relate to a kit comprising: an adenosine deaminase enzyme and instructions for detecting modified adenosines in target RNA. In some embodiments, the kit further comprises a control nucleic acid. The control nucleic acid may be any embodiment described herein. In some embodiments, the control nucleic acid comprises a non-naturally occurring nucleic acid or non-widely present nucleic acid. In some embodiments, the control nucleic acids are a non-naturally occurring nucleic acid sequence. In some embodiments, the control nucleic acid comprises RNA or DNA comprising modified adenosines. In some embodiments, the percentage of adenosines that are modified in the control nucleic acid is known. In some embodiments, at least, at most, or exactly 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% (or any derivable range therein) of the adenosines in the control nucleic acid are modified. In some embodiments, the control nucleic acid comprises a duplex with a complementary nucleic acid strand. In some embodiments, the complementary nucleic acid strand is DNA. In some embodiments, the complementary nucleic acid strand is RNA with modified adenosines. In some embodiments, the kit further comprises an adenosine demethylase. The adenosine demethylase may be one known in the art and/or described herein. In some embodiments, the adenosine demethylase is specific for N6-methyladenosine. In some embodiments, the adenosine demethylase is specific for N1-methyladenosine. In some embodiments, the kit further comprises an adenosine deaminase enzyme reaction composition comprising GTP. In some embodiments, the kit further comprises a DNase. In some embodiments, the kit further comprises a reverse transcriptase. In some embodiments, the kit further comprises a molecule or embodiment described herein in the methods and compositions.
In some embodiments, the kit further comprises reagents to perform the method steps described herein, such as immobilization reagents, enzymes, and buffers described throughout the disclosure.
Further aspects relate to a method for detecting modified adenosine in a target ribonucleic acid (RNA) comprising contacting a double-stranded nucleic acid molecule comprising the target RNA with the adenosine deaminase enzyme to generate a target RNA with deaminated adenosines and sequencing the target RNA with deaminated adenosines; wherein the modified adenosine is detected when the sequence of the target RNA with deaminated adenosines is adenosine.
A further aspect relates to a method for detecting modified adenosine in a target ribonucleic acid (RNA) comprising: a) providing a target RNA; b) generating a DNA strand that is complementary with the target RNA and hybridizing the complementary DNA strand with the target RNA to generate an RNA-DNA duplex comprising the target RNA; c) contacting the RNA-DNA duplex with the adenosine deaminase enzyme to generate target RNA with deaminated adenosines; d) contacting the RNA-DNA duplex with a DNA digesting agent; and e) sequencing the target RNA with the deaminated adenosines.
A further aspect relates to a method for detecting modified adenosine in a target ribonucleic acid (RNA) comprising: a) providing a target RNA; b) generating a DNA strand that is complementary with the target RNA and hybridizing the complementary DNA strand with the target RNA to generate an RNA-DNA duplex comprising the target RNA; c) contacting the RNA-DNA duplex with the adenosine deaminase enzyme to generate target RNA with deaminated adenosines; d) contacting the RNA-DNA duplex with a DNA digesting agent; e) repeating steps b, c, and d one or more times; and f) sequencing the target RNA with the deaminated adenosines.
A further aspect relates to a method for detecting modified adenosine in a target ribonucleic acid (RNA) comprising: a) providing a target RNA; b) generating an RNA strand with modified adenosine that is complementary with the target RNA to generate an RNA-RNA duplex comprising the target RNA; c) contacting the RNA-RNA duplex with the adenosine deaminase enzyme to generate target RNA with deaminated adenosines; and d) sequencing the target RNA with the deaminated adenosines.
In certain embodiments, the enzymes and/or nucleic acids used in the methods, kits, and compositions described herein may comprise one or more detectable moieties and/or modifications. A detectable moiety refers to a chemical compound or element that is capable of being detected. In certain embodiments, a detectable moiety is fluorescent, radioactive, enzymatic, electrochemical, or colorimetric. In some embodiments, the detectable moiety is a fluorophore or quantum dot. In some embodiments, a modification moiety may be a linker that allows one or more functional or detectable moieties or isolation tags to be attached to the molecules. In some embodiments the linker is an azide linker or a thiol linker. In further embodiments, the modification moiety may be an isolation tag, which means the tag can be used to isolate a molecule that is attached to the tag. In certain embodiments, the isolation tag is biotin, Flag, or a histidine tag. In some cases, the tag is modified, such as with a detectable moiety. It is contemplated that the linker allows for other chemical compounds or substances to be attached to the molecule.
Methods and compositions may also involve one or more enzymes. In some embodiments, the enzyme is a restriction enzyme or a polymerase. In certain cases, embodiments involve a restriction enzyme. The restriction enzyme may be methylation-insensitive. In other embodiments, the enzyme is polymerase.
Methods may involve identifying adenosine modifications in the nucleic acids by comparing modified nucleic acids with unmodified nucleic acids or to nucleic acids whose modification state is already known. Detection of the modification can involve a wide variety of recombinant nucleic acid techniques. In some embodiments, a modified nucleic acid molecule is incubated with polymerase, at least one primer, and one or more nucleotides under conditions to allow polymerization of the modified nucleic acid. In additional embodiments, methods may involve sequencing a modified nucleic acid molecule. In other embodiments, a modified nucleic acid is used in a primer extension assay.
Methods and compositions may involve a control nucleic acid. In addition to the controls described herein, a control may also be used to evaluate whether modification or other enzymatic or chemical reactions are occurring. Alternatively, the control may be used to compare modification states. The control may be a negative control or it may be a positive control. It may be a control that was not incubated with one or more reagents in the modification reaction. Alternatively, a control nucleic acid may be a reference nucleic acid, which means its modification state (based on qualitative and/or quantitative information related to modification at adenosines, or the absence thereof) is used for comparing to a nucleic acid being evaluated. In some embodiments, multiple nucleic acids from different sources provide the basis for a control nucleic acid. In some embodiments, the control is a pool of target RNA that has undergone demethylation. Moreover, in some cases, the control nucleic acid is from a normal sample with respect to a particular attribute, such as a disease or condition, or other phenotype. In some embodiments, the control sample is from a different patient population, a different cell type or organ type, a different disease state, a different phase or severity of a disease state, a different prognosis, a different developmental stage, etc.
Embodiments also concern kits, which may be in a suitable container, that can be used to achieve the described methods. In further embodiments, a kit may include one or more buffers, such as buffers for nucleic acids or for reactions involving nucleic acids. Other enzymes may be included in kits in addition to the adenosine deaminase enzyme. In some embodiments, an enzyme is a polymerase. Kits may also include nucleotides for use with the polymerase. In some cases, a restriction enzyme (e.g., DNase) is included in addition to or instead of a polymerase.
Other embodiments also concern an array or microarray containing nucleic acid molecules that have been modified at adenosines.
The use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.”
It is contemplated that any embodiment discussed herein can be implemented with respect to any method or composition of the invention, and vice versa. Furthermore, compositions and kits of the invention can be used to achieve methods of the invention.
Throughout this application, the term “about” is used to indicate that a value includes the standard deviation of error for the device or method being employed to determine the value.
The use of the term “or” in the claims is used to mean “and/or” unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and “and/or.” It is also contemplated that anything listed using the term “or” may also be specifically excluded.
As used in this specification and claim(s), the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited elements or method steps.
Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating specific embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
Certain embodiments are directed to methods and compositions for detecting modified adenosine in a target ribonucleic acid (RNA) by contacting the target RNA with an adenosine deaminase enzyme. In some embodiments, the adenosine deaminase enzyme is ADAR (adenosine deaminase, RNA-specific). The ADAR enzyme deaminates unmodified adenosines, converting them to inosines. ADAR enzymes, such as those expressed and purified from insect cells, are highly active against unmodified adenosines but possess over 50-fold lower activity toward modified adenosines, such as m6A, than toward unmodified adenosine (Véliz E A, et al., Journal of the American Chemical Society. 2003; 125(36):10867-76). Therefore, after selective deamination of the unmodified adenosines in the transcriptome and subsequent reverse transcriptase (RT)-PCR followed by sequencing, only modified adenosines will be read as “A”.
Dynamic chemical modifications of DNA and histone proteins represent fundamental mechanisms of biological regulation. Post-transcriptional modifications are also ubiquitous in RNA. To date, over 100 different RNA modifications have been identified with a wide variety of chemical diversities. Examples of such modifications can be found on the world wide web at rna-mdb.cas.albany.edu/RNAmods/, and the contents of this website publication are herein incorporated by reference. Unlike genomic DNA that tends to have limited variation of chemical modifications, the wide variety of RNA modifications appears to be a strategy used by nature to entail and facilitate a much greater diversity of structures and cellular functions for different RNA species. The m6A modification in mRNA/lncRNA alone is known to modulate the affinity for RNA-binding proteins, control subcellular localization, lifetime, storage, transport, and translation of mRNAs, switch secondary structures of RNAs, as well as to affect innate immune response.
The explosive discoveries of functional RNAs in the last decade have changed the current views on the functions of RNA and biological mechanisms that control RNA. However, the exact roles of most RNA modifications remain unknown. Certain RNA modifications are essential for life, with defects of RNA-modifying enzymes known to be associated with diverse human diseases. Most studies before 2011 on RNA modifications were limited to the abundant RNAs such as rRNA, snRNA, or tRNA. The positions of these modifications can be studied with RNA digestion followed by traditional liquid chromatography separation coupled with mass spectrometry or thin-layer chromatography due to their high abundances. The limit to these methods is their sensitivity; they cannot be applied to map modifications in low abundant mRNAs and lncRNAs, most of which appear to play critical roles in regulating gene expression. The methods of the current disclosure provide new opportunities to investigate distributions of not only mRNA and lncRNA modifications but also dynamic modifications on tRNA, snRNA, and rRNA that could not be effectively probed transcriptome-wide in the past. The methods and compositions of the disclosure can be readily applied to any class of RNA. The ability to identify all adenosine modifications at single-base resolution will significantly advance the frontier of epitranscriptomics and enable transcriptome-wide investigations that associate genomic variations and mutations with human health and diseases.
Mammalian mRNA and lncRNA can be modified at tens of thousands of sites. Many of these modifications are conserved in almost all eukaryotes. We have measured the relative abundances of all known mammalian internal mRNA modifications that include N6-methyladenosine (m6A), pseudouridine (Ψ), 5-methylcytosine (m5C), N1-methyladenosine (m1A), and 2′O-methylation (Nm) (
A. N6-methyladenosine (m6A)
m6A occurs at a high frequency in RNA (e.g. mRNA or lncRNA). It is also reversible and dynamically regulated. The m6A modification appears to affect almost every phase of mRNA metabolism and function, thereby impacting diverse biological processes. Therefore, m6A studies so far embody the concept of “epitranscriptome”; its functional significance and implementation are exerted by three groups of proteins: “writers” that install, “erasers” that remove, and “readers” that bind or recognize m6A in order to determine the cellular fate of the modified mRNA/lncRNA (
In mammals, m6A is installed by a three-protein core complex composed of two catalytic subunits, METTL3 and METTL14, and an accessory factor, WTAP. Depletion of METTL3 homologs readily leads to developmental arrest or defects in gametogenesis in yeast, flies, and plants. In zebrafish, knockdown of METTL3 leads to smaller head, eyes, and brain ventricle, and curved notochord. The phenotypes in mammals are more severe. Both methyltransferases METTL3 and METTL14 are essential in mammals. m6A is a critical regulator in the differentiation of mouse embryonic stem cells (mESCs).
The m6A modification plays a key role in facilitating transition of mESCs from the naïve state to the primed state upon differentiation. Mettl3-depleted mESCs preserved their “naïve” pluripotent identity, but failed to proceed into the “primed” EpiSC-like state, hence blocking the subsequent differentiation; they also failed to differentiate into normal embryoid bodies (EBs) and mature neurons upon corresponding inductions. In naive ESCs and primed EBs, the m6A modification was detected in 80% of transcripts of naive pluripotency genes (e.g., Nanog, Klf4, Sox2, Esrrb), as well as multiple lineage commitment regulators (e.g., Foxa2, Sox17). In general, m6A deposition in mESCs decreases the expression of methylated transcripts and directly reduces their stability. Therefore, m6A loss increased the abundance and prolonged the lifetimes of naïve pluripotency transcripts. Silencing METTL3 also leads to mRNA processing delay and circadian period elongation. m6A depletion prolongs nuclear retention and delays nuclear exit of mature mRNAs of clock genes Per2 and Arntl. This result reveals an important physiological function of m6A in setting the pace of the circadian cycle and determining the clock speed and stability by controlling nuclear RNA export. A recent study described that microRNAs (miRNAs) could regulate m6A modification via a sequence-pairing mechanism in order to modulate the binding of METTL3 to mRNA substrates. Another recent study uncovered that m6A on pri-miRNA plays critical roles in miRNA maturation. The inhibition of pri-miRNA methylation induced delayed maturation of 70% of all miRNA, resulting in >30% changes of mature miRNA levels.
m6A on mRNA can be reversed by two RNA demethylases, FTO and ALKBH5. Defects of FTO and ALKBH5 lead to altered metabolism, neural development retardation, and compromised spermatogenesis. A common variant of the FTO gene (an intron mutation) has been shown to generate a predisposition to obesity. Knockout mouse models revealed that FTO is important to development: most knockout mice die at the embryonic stage or within the first month after birth; those that survived tended to lose body weight and were smaller compared to the control mice. A mutation of the human FTO coding region has also been linked to mental retardation. The Alkbh5 knockout male mice exhibit significant spermatogenesis defects with compromised fertility. The fact that FTO and ALKBH5 show noticeable but very different phenotypes in humans or mice strongly indicates that reversible m6A RNA methylation plays important roles in biological regulation.
m6A is recognized by “reader” proteins to exhibit biological functions, just like the interplay between DNA cytosine-methylation and methyl-CpG-binding proteins that regulate gene expression through binding to methylated cytosines. Applicants have identified several m6A-specific binding proteins in humans that belong to the YTH family: YTHDF1, YTHDF2, and YTHDC1. All these proteins bind the m6A-containing RNA selectively over unmethylated RNA through direct accommodation of the methyl group in their structures. Functional characterizations revealed that YTHDF2 affects cytoplasmic localization and mediates the decay of methylated mRNA, YTHDF1 promotes translation of methylated mRNA through facilitating translation initiation, and YTHDC1 affects the nuclear export of methylated mRNA. At the organismal level, knockout of Ythdc1 or Ythdf2 is embryonically lethal in mouse.
Applicants have shown that m6A methylation can significantly affect the mRNA and lncRNA structure transcriptome-wide. The m6A effect on mRNA/lncRNA structure, termed “m6A-switch,” can dramatically affect protein-RNA interactions to impact mRNA abundance and alternative splicing of the methylated RNA. Therefore, m6A exerts its functions not only through being directly “read,” but also through RNA structural remodeling.
Using an antibody known to recognize m6A to enrich m6A-containing RNA fragments, transcriptome-wide profiling has been performed in human cells and mouse tissues. Both studies revealed tens of thousands of m6A-containing segments in mRNA and lncRNA. The m6A-immunoprecipitation (IP) approach uncovers “m6A-peaks” that are on average 100-200 bases wide in the mRNA/lncRNA. Subsequent SCARLET studies by Applicants at a dozen m6A sites at single-base resolution revealed sub-stoichiometry at all modification sites in mRNA/lncRNA investigated. Transcriptome-wide profiling of m6A during yeast meiosis and sporulation has also uncovered profound mRNA methylation, suggesting functional roles.
These global mapping studies have unveiled conserved, widespread, and dynamic mRNA methylation in eukaryotes. Three salient features of the m6A methylome are conserved in mammals: i) m6A sites are mainly confined in the consensus motif Pu[G>A] m6AC[U>A>C], consistent with early studies. The overall cellular m6A methylation accounts for at most ˜15% of all consensus sequences that can be modified; ii) m6A marks are not equally distributed across the transcriptome; rather, they are preferentially enriched in a subset of consensus sequences near stop codons, in 3′ UTRs, and within long internal exons (
Applicants have performed m6A profiling in mRNA from three individuals in each of human, chimpanzee, and rhesus monkey. Results indicate that the newly evolved m6A-modified transcripts are noticeably enriched in human disease pathways. In addition, results also indicate the functional significances of m6A in mRNA in neurogenesis and neurodevelopment.
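By way of non-limiting illustration, sites matching the consensus motif Pu[G>A] m6AC[U>A>C] described above can be located computationally. The following Python sketch assumes that the motif can be encoded as the pattern [AG][AG]AC[ACU]; the function name candidate_m6a_sites is hypothetical and is not part of the disclosure. A match identifies only a candidate position, since only a fraction of consensus sequences are reported to be methylated.

```python
import re

# Consensus from the text: Pu [G>A] m6A C [U>A>C].
# Encoded here (an assumption of this sketch) as [AG][AG]AC[ACU].
# The lookahead allows overlapping motif instances to be reported.
MOTIF = re.compile(r"(?=([AG][AG]AC[ACU]))")


def candidate_m6a_sites(rna: str) -> list:
    """Return 0-based positions of the central A in each consensus match."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [m.start() + 2 for m in MOTIF.finditer(rna)]


if __name__ == "__main__":
    print(candidate_m6a_sites("AAGGACUAAACA"))
```

The preference order within the motif (for example G over A at the second position) is not modeled; every permitted base is treated equally.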
B. N1-methyladenosine (m1A)
The most abundant eukaryotic rRNA and tRNA modifications are pseudouridine (Ψ), 2′O-methyls (Nm), N1-methyladenosine (m1A), and 5-methylcytidine (m5C). These are installed either through the use of guide RNAs bound with protein factors (snoRNP) or through designated protein enzymes. SnoRNA deletion and mutations in snoRNP proteins or rRNA/tRNA modification enzymes have been associated with human diseases including neurodegeneration, diabetes, and cancer. It is unclear in many cases, however, whether these disease phenotypes are truly derived from defects in rRNA/tRNA modifications or other yet-to-be-discovered mechanisms.
m1A not only blocks Watson-Crick base pairing but also introduces an extra positive charge under physiological conditions (
C. Other Adenosine Modifications
The methods and compositions of the disclosure are useful in the detection of adenosine modifications. It is contemplated that the methods and compositions may be useful for detecting the adenosine modifications listed below.
To summarize, mammalian mRNA and lncRNA contain many internal modifications with abundances ranging from 0.2-3 modified nucleotides per mRNA. This range of abundance suggests the presence of hundreds to tens of thousands of modified sites for each modification type in mammalian transcriptomes. Further, m6A and m1A are known to be reversible and undergo dynamic regulation. m6A is the most abundant and has been best studied with broad and fundamental roles uncovered so far. Other modifications could provide additional tuning of mRNA metabolism and function. The lack of highly sensitive, selective, and robust sequencing approaches for all these modifications presents current technology barriers that significantly hinder biological investigations. Development of single-base resolution and highly sensitive methods will be required in order to move the field forward and also to enable new discoveries on the functions of RNA modifications and their associations with human diseases.
The current disclosure provides methods and compositions comprising enzyme-mediated deamination for base-resolution sequencing of modified adenosines (e.g. m6A) in RNA. The current disclosure is based on a method that converts only unmodified A in RNA into a different base, leaving modified A untouched, and thereby allowing differentiation of A from m6A in sequencing (
A. ADAR (adenosine deaminase, RNA-Specific)
As detailed in the Examples of the application, RNA editing ADAR enzymes that deaminate A to inosine (I) in mRNA using guide RNAs that form duplexes with all target sites can be used. The ADAR enzyme is highly active against A but possesses over 50-fold lower activity towards m6A compared to unmodified A. Therefore, after selective deamination of all A in the transcriptome and subsequent RT-PCR followed by sequencing, only m6A will be read as A (
The ADAR from Drosophila melanogaster (also known as Dmel_CG12598, ADAR, ADAR1, CG12598, Dmel\CG12598, EG:BACN35H14.1, adar, adr, cg12598, dADAR, dAdar, and hypnos-2) is represented by the following GenBank accession Nos:
The sequence associated with each of the GenBank accession numbers is herein incorporated by reference for all purposes. In some embodiments, the ADAR has the sequence associated with NP_001284791.1:
B. Protein Preparation
A variety of proteins can be purified using methods known in the art. Protein purification is a series of processes intended to isolate a single type of protein from a complex mixture. Protein purification is vital for the characterization of the function, structure and interactions of the protein of interest. The starting material is usually a biological tissue or a microbial culture. The various steps in the purification process may free the protein from a matrix that confines it, separate the protein and non-protein parts of the mixture, and finally separate the desired protein from all other proteins. Separation of one protein from all others is typically the most laborious aspect of protein purification. Separation steps exploit differences in protein size, physico-chemical properties and binding affinity.
Evaluating purification yield. The most general method to monitor the purification process is to run SDS-PAGE on samples from the different steps. This method only gives a rough measure of the amounts of different proteins in the mixture, and it is not able to distinguish between proteins with similar molecular weight. If the protein has a distinguishing spectroscopic feature or an enzymatic activity, this property can be used to detect and quantify the specific protein, and thus to select the fractions of the separation that contain the protein. If antibodies against the protein are available, then western blotting and ELISA can specifically detect and quantify the amount of desired protein. Some proteins function as receptors and can be detected during purification steps by a ligand binding assay, often using a radioactive ligand.
In order to evaluate the process of multistep purification, the amount of the specific protein has to be compared to the amount of total protein. The latter can be determined by the Bradford total protein assay or by absorbance of light at 280 nm; however, some reagents used during the purification process may interfere with the quantification. For example, imidazole (commonly used for purification of polyhistidine-tagged recombinant proteins) is an amino acid analogue and at low concentrations will interfere with the bicinchoninic acid (BCA) assay for total protein quantification. Impurities in low-grade imidazole will also absorb at 280 nm, resulting in an inaccurate reading of protein concentration from UV absorbance.
Another method to be considered is Surface Plasmon Resonance (SPR). SPR can detect binding of label-free molecules on the surface of a chip. If the desired protein is an antibody, binding can be translated directly to the activity of the protein. One can express the active concentration of the protein as the percent of the total protein. SPR can be a powerful method for quickly determining protein activity and overall yield, although it requires a dedicated instrument to perform.
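By way of non-limiting illustration, the comparison of specific protein to total protein across purification steps is commonly summarized as a purification table. The following Python sketch assumes that the target protein is quantified by an activity assay (in arbitrary units) and that total protein is measured in milligrams; the class name, function name, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    total_protein_mg: float      # e.g. from a Bradford assay or A280 reading
    total_activity_units: float  # from an assay specific to the target protein


def purification_table(steps):
    """Compute specific activity, fold purification and yield for each step.

    All values are expressed relative to the first step (the crude extract).
    """
    first = steps[0]
    base_specific_activity = first.total_activity_units / first.total_protein_mg
    rows = []
    for s in steps:
        specific_activity = s.total_activity_units / s.total_protein_mg
        rows.append({
            "step": s.name,
            "specific_activity": specific_activity,
            "fold_purification": specific_activity / base_specific_activity,
            "yield_percent": 100.0 * s.total_activity_units
                             / first.total_activity_units,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    for row in purification_table([
        Step("crude lysate", 1000.0, 5000.0),
        Step("affinity eluate", 10.0, 4000.0),
    ]):
        print(row)
```

Fold purification rises as contaminating protein is removed, while yield falls as target protein is lost; tracking both shows whether a step is worth its losses.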
Methods of protein purification. The methods used in protein purification can roughly be divided into analytical and preparative methods. The distinction is not exact, but the deciding factor is the amount of protein that can practically be purified with that method. Analytical methods aim to detect and identify a protein in a mixture, whereas preparative methods aim to produce large quantities of the protein for other purposes, such as structural biology or industrial use.
Depending on the source, the protein has to be brought into solution by breaking the tissue or cells containing it. There are several methods to achieve this: Repeated freezing and thawing, sonication, homogenization by high pressure, filtration (either via cellulose-based depth filters or cross-flow filtration), or permeabilization by organic solvents. The method of choice depends on how fragile the protein is and how sturdy the cells are. After this extraction process soluble proteins will be in the solvent, and can be separated from cell membranes, DNA etc. by centrifugation. The extraction process also extracts proteases, which will start digesting the proteins in the solution. If the protein is sensitive to proteolysis, it is usually desirable to proceed quickly, and keep the extract cooled, to slow down proteolysis.
In bulk protein purification, a common first step to isolate proteins is precipitation with ammonium sulfate (NH4)2SO4. This is performed by adding increasing amounts of ammonium sulfate and collecting the different fractions of precipitate protein. One advantage of this method is that it can be performed inexpensively with very large volumes.
The first proteins to be purified are water-soluble proteins. Purification of integral membrane proteins requires disruption of the cell membrane in order to isolate any one particular protein from others that are in the same membrane compartment. Sometimes a particular membrane fraction can be isolated first, such as isolating mitochondria from cells before purifying a protein located in a mitochondrial membrane. A detergent such as sodium dodecyl sulfate (SDS) can be used to dissolve cell membranes and keep membrane proteins in solution during purification; however, because SDS causes denaturation, milder detergents such as Triton X-100 or CHAPS can be used to retain the protein's native conformation during complete purification.
Centrifugation is a process that uses centrifugal force to separate mixtures of particles of varying masses or densities suspended in a liquid. When a vessel (typically a tube or bottle) containing a mixture of proteins or other particulate matter, such as bacterial cells, is rotated at high speeds, the rotation yields an outward force on each particle that is proportional to its mass. The tendency of a given particle to move through the liquid because of this force is offset by the resistance the liquid exerts on the particle. The net effect of “spinning” the sample in a centrifuge is that massive, small, and dense particles move outward faster than less massive particles or particles with more “drag” in the liquid. When suspensions of particles are “spun” in a centrifuge, a “pellet” may form at the bottom of the vessel that is enriched for the most massive particles with low drag in the liquid. Non-compacted particles still remaining mostly in the liquid are called the “supernatant” and can be removed from the vessel to separate the supernatant from the pellet. The rate of centrifugation is specified by the acceleration applied to the sample, typically expressed as a multiple of the acceleration of gravity (g). If samples are centrifuged long enough, the particles in the vessel will reach equilibrium wherein the particles accumulate specifically at a point in the vessel where their buoyant density is balanced with centrifugal force. Such an “equilibrium” centrifugation can allow extensive purification of a given particle.
In sucrose gradient centrifugation, a linear concentration gradient of sugar (typically sucrose, glycerol, or a silica-based density gradient medium, like Percoll™) is generated in a tube such that the highest concentration is on the bottom and the lowest on top. A protein sample is then layered on top of the gradient and spun at high speeds in an ultracentrifuge. This causes heavy macromolecules to migrate towards the bottom of the tube faster than lighter material. After separating the protein/particles, the gradient is then fractionated and collected.
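By way of non-limiting illustration, the acceleration applied to a sample relative to g (the relative centrifugal force, RCF) follows from the rotor speed and the radius at which the sample sits. The following Python sketch uses the standard relation RCF = ω²r/g with ω = 2π·rpm/60; the function names and the example rotor radius are assumptions made for illustration.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given speed and radius."""
    omega = 2.0 * math.pi * rpm / 60.0          # angular velocity in rad/s
    return omega ** 2 * (radius_cm / 100.0) / STANDARD_GRAVITY


def rpm_for_rcf(target_rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a target RCF at a given radius."""
    omega = math.sqrt(target_rcf * STANDARD_GRAVITY / (radius_cm / 100.0))
    return omega * 60.0 / (2.0 * math.pi)


if __name__ == "__main__":
    # Hypothetical rotor with a 10 cm radius.
    print(round(rcf(10000, 10.0)))
    print(round(rpm_for_rcf(20000, 10.0)))
```

Because RCF depends on the radius, the same rpm setting gives different forces in different rotors, which is why protocols are better specified in multiples of g than in rpm.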
Usually a protein purification protocol contains one or more chromatographic steps. The basic procedure in chromatography is to flow the solution containing the protein through a column packed with various materials. Different proteins interact differently with the column material, and can thus be separated by the time required to pass the column, or the conditions required to elute the protein from the column. Usually proteins are detected as they are coming off the column by their absorbance at 280 nm. Many different chromatographic methods exist:
Chromatography can be used to separate proteins in solution or under denaturing conditions by using porous gels. This technique is known as size exclusion chromatography. The principle is that smaller molecules have to traverse a larger volume in a porous matrix. Consequently, proteins of a certain range in size will require a variable volume of eluent (solvent) before being collected at the other end of the column of gel.
In the context of protein purification, the eluant is usually pooled in different test tubes. All test tubes containing no measurable trace of the protein to purify are discarded. The remaining solution is thus made of the protein to purify and any other similarly-sized proteins.
Ion exchange chromatography separates compounds according to the nature and degree of their ionic charge. The column to be used is selected according to its type and strength of charge. Anion exchange resins have a positive charge and are used to retain and separate negatively charged compounds, while cation exchange resins have a negative charge and are used to separate positively charged molecules. Before the separation begins a buffer is pumped through the column to equilibrate the opposing charged ions. Upon injection of the sample, solute molecules will exchange with the buffer ions as each competes for the binding sites on the resin. The length of retention for each solute depends upon the strength of its charge. The most weakly charged compounds will elute first, followed by those with successively stronger charges. Because of the nature of the separating mechanism, pH, buffer type, buffer concentration, and temperature all play important roles in controlling the separation.
Affinity Chromatography is a separation technique based upon molecular conformation, which frequently utilizes application specific resins. These resins have ligands attached to their surfaces which are specific for the compounds to be separated. Most frequently, these ligands function in a fashion similar to that of antibody-antigen interactions. This “lock and key” fit between the ligand and its target compound makes it highly specific, frequently generating a single peak, while all else in the sample is unretained.
Many membrane proteins are glycoproteins and can be purified by lectin affinity chromatography. Detergent-solubilized proteins can be allowed to bind to a chromatography resin that has been modified to have a covalently attached lectin. Proteins that do not bind to the lectin are washed away and then specifically bound glycoproteins can be eluted by adding a high concentration of a sugar that competes with the bound glycoproteins at the lectin binding site. Some lectins have high affinity binding to oligosaccharides of glycoproteins that is hard to compete with sugars, and bound glycoproteins need to be released by denaturing the lectin.
A common technique involves engineering a sequence of 6 to 8 histidines into the N- or C-terminal of the protein. The polyhistidine binds strongly to divalent metal ions such as nickel and cobalt. The protein can be passed through a column containing immobilized nickel ions, which binds the polyhistidine tag. All untagged proteins pass through the column. The protein can be eluted with imidazole, which competes with the polyhistidine tag for binding to the column, or by a decrease in pH (typically to 4.5), which decreases the affinity of the tag for the resin. While this procedure is generally used for the purification of recombinant proteins with an engineered affinity tag (such as a 6×His tag or Clontech's HAT tag), it can also be used for natural proteins with an inherent affinity for divalent cations.
Immunoaffinity chromatography uses the specific binding of an antibody to the target protein to selectively purify the protein. The procedure involves immobilizing an antibody to a column material, which then selectively binds the protein, while everything else flows through. The protein can be eluted by changing the pH or the salinity. Because this method does not involve engineering in a tag, it can be used for proteins from natural sources.
Another way to tag proteins is to engineer an antigen peptide tag onto the protein, and then purify the protein on a column or by incubating with a loose resin that is coated with an immobilized antibody. This particular procedure is known as immunoprecipitation. Immunoprecipitation is quite capable of generating an extremely specific interaction which usually results in binding only the desired protein. The purified tagged proteins can then easily be separated from the other proteins in solution and later eluted back into clean solution. Tags can be cleaved by use of a protease. This often involves engineering a protease cleavage site between the tag and the protein.
High performance liquid chromatography or high pressure liquid chromatography (HPLC) is a form of chromatography applying high pressure to drive the solutes through the column faster. This means that the diffusion is limited and the resolution is improved. The most common form is “reversed phase” HPLC, where the column material is hydrophobic. The proteins are eluted by a gradient of increasing amounts of an organic solvent, such as acetonitrile. The proteins elute according to their hydrophobicity. After purification by HPLC the protein is in a solution that only contains volatile compounds, and can easily be lyophilized. HPLC purification frequently results in denaturation of the purified proteins and is thus not applicable to proteins that do not spontaneously refold.
At the end of a protein purification, the protein often has to be concentrated. Different methods exist. If the solution does not contain any soluble component other than the protein in question, the protein can be lyophilized (dried). This is commonly done after an HPLC run. This simply removes all volatile components, leaving the proteins behind.
Ultrafiltration concentrates a protein solution using selective permeable membranes. The function of the membrane is to let the water and small molecules pass through while retaining the protein. The solution is forced against the membrane by mechanical pump or gas pressure or centrifugation.
Gel electrophoresis is a common laboratory technique that can be used both as preparative and analytical method. The principle of electrophoresis relies on the movement of a charged ion in an electric field. In practice, the proteins are denatured in a solution containing a detergent (SDS). In these conditions, the proteins are unfolded and coated with negatively charged detergent molecules. The proteins in SDS-PAGE are separated on the sole basis of their size.
In analytical methods, the proteins migrate as bands based on size. Each band can be detected using stains such as Coomassie blue dye or silver stain. Preparative methods to purify large amounts of protein require the extraction of the protein from the electrophoretic gel. This extraction may involve excision of the gel containing a band, or eluting the band directly off the gel as it runs off the end of the gel.
In the context of a purification strategy, denaturing condition electrophoresis provides an improved resolution over size exclusion chromatography, but does not scale to large quantity of proteins in a sample as well as the late chromatography columns.
Methods of the disclosure may involve purification of proteins by any combination of methods known in the art and/or discussed herein. In some embodiments, the protein is purified by a combination of one or more of affinity chromatography, ion exchange chromatography, and gel filtration chromatography. In some embodiments, the affinity chromatography is anti-FLAG. In some embodiments, the ion exchange chromatography is heparin.
Nucleic acid analysis and evaluation includes various methods of amplifying, fragmenting, and/or hybridizing nucleic acids that have or have not been modified.
Methodologies are available for large scale sequence analysis. In certain aspects, the methods described exploit these genomic analysis methodologies and adapt them for uses incorporating the methodologies described herein. In certain instances the methods can be used to perform high resolution adenosine modification analysis on modified adenosines in RNA. Therefore, methods are directed to analysis of the adenosine modification status of an RNA sample, comprising one or more of the steps: (a) contacting the target RNA with an adenosine deaminase enzyme to generate a target RNA with deaminated adenosines, (b) sequencing the target RNA with deaminated adenosines; wherein the modified adenosine is detected when the sequenced nucleotide at that position is adenosine. In some embodiments, the method further comprises steps such as sequencing a control RNA; comparing the sequence of the target RNA with deaminated adenosines to the sequence of the control RNA; generating a nucleic acid strand that is complementary with the target and/or control RNA and hybridizing the complementary nucleic acid strand with the target RNA; comparing the known sequence of the target RNA with the sequence of the target RNA with deaminated adenosines.
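By way of non-limiting illustration, the step of comparing the known sequence of the target RNA with the sequence obtained after deamination can be sketched in Python as follows. The sketch assumes that both sequences are already aligned and of equal length, that both are given in the sense of the RNA, and that inosine is reported as G by the sequencing workflow; the function name and example sequences are hypothetical.

```python
def call_adenosines(reference: str, treated: str) -> dict:
    """Classify each reference adenosine by how it reads after deamination.

    retained_A : still read as A, so a candidate modified adenosine
    converted  : read as G, consistent with deamination to inosine
    other      : any other base call (e.g. sequencing error or variant)
    """
    if len(reference) != len(treated):
        raise ValueError("sequences must be aligned and of equal length")
    calls = {"retained_A": [], "converted": [], "other": []}
    for pos, (ref_base, read_base) in enumerate(
            zip(reference.upper(), treated.upper())):
        if ref_base != "A":
            continue  # only adenosine positions are informative
        if read_base == "A":
            calls["retained_A"].append(pos)
        elif read_base == "G":
            calls["converted"].append(pos)
        else:
            calls["other"].append(pos)
    return calls


if __name__ == "__main__":
    # Hypothetical sequences: the A at 0-based position 3 resists deamination.
    print(call_adenosines("GACAUA", "GGCAUG"))
```

A retained A in a single read is only a candidate; in practice many reads per site would be aggregated and compared against a control, as discussed under Controls.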
In some embodiments, the assay comprises a cycle of steps as demonstrated in
A. RNA Isolation
RNA may be isolated from an organism of interest, including, but not limited to, eukaryotic organisms and prokaryotic organisms, preferably mammalian organisms, such as humans. RNA may be isolated from cells grown in vitro or from cells in vivo. RNA may be isolated from tissues such as bone, blood, liver, heart, etc. Furthermore, the RNA may be isolated and fractionated by techniques that separate different RNA types. Therefore, in some embodiments, RNA is purified, and the purified RNA comprises at least 80, 85, 90, 95, 96, 97, 98, 99, or 100% of a particular RNA such as mRNA, lncRNA, non-coding RNA, microRNA, pri-microRNA, pre-piRNA, rRNA, tRNA, snoRNA, or snRNA.
B. Sequencing
The sequencing may be done by known methods of sequencing nucleic acids. In certain embodiments, the target nucleic acid molecules are sequenced using any suitable sequencing technique known in the art. In one example, the sequencing is single-molecule sequencing-by-synthesis. Single-molecule sequencing is shown for example in U.S. Pat. Nos. 7,169,560, 6,818,395, 7,282,337, the contents of each of which are incorporated by reference herein in their entirety. Other examples of sequencing nucleic acids may include Maxam-Gilbert techniques, Sanger type techniques, Sequencing by Synthesis methods (SBS), Sequencing by Hybridization (SBH), Sequencing by Ligation (SBL), Sequencing by Incorporation (SBI) techniques, massively parallel signature sequencing (MPSS), polony sequencing techniques, nanopore, waveguide and other single molecule detection techniques, reversible terminator techniques, or other sequencing techniques now known or that may be developed in the future.
In one embodiment, the sequencing is Illumina sequencing. Illumina sequencing is based on the amplification of DNA on a solid surface using fold-back PCR and anchored primers. Genomic DNA is fragmented, and adapters are added to the 5′ and 3′ ends of the fragments. DNA fragments that are attached to the surface of flow cell channels are extended and bridge amplified. The fragments become double stranded, and the double stranded molecules are denatured. Multiple cycles of the solid-phase amplification followed by denaturation can create several million clusters of approximately 1,000 copies of single-stranded DNA molecules of the same template in each channel of the flow cell. Primers, DNA polymerase and four fluorophore-labeled, reversibly terminating nucleotides are used to perform sequential sequencing. After nucleotide incorporation, a laser is used to excite the fluorophores, and an image is captured and the identity of the first base is recorded. The 3′ terminators and fluorophores from each incorporated base are removed and the incorporation, detection and identification steps are repeated.
In another embodiment, Ion Torrent sequencing can be used. (See, e.g., U.S. patent application numbers 2009/0026082, 2009/0127589, 2010/0035252, 2010/0137143, 2010/0188073, 2010/0197507, 2010/0282617, 2010/0300559, 2010/0300895, 2010/0301398, and 2010/0304982, the content of each of which is incorporated by reference herein in its entirety.) Oligonucleotide adaptors are ligated to the ends of target nucleic acid molecules. The adaptors serve as primers for amplification and sequencing of the fragments. The fragments can be attached to a surface at a resolution such that the fragments are individually resolvable. Addition of one or more nucleotides releases a proton (H+), and this signal is detected and recorded in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated.
In some embodiments, sequencing the target RNA comprises creating a complementary DNA (cDNA) from the target RNA. In some embodiments, sequencing the target RNA comprises reverse transcription. In some embodiments, sequencing the target RNA comprises contacting the target RNA with an enzyme capable of transcribing DNA using the target RNA as a template (e.g. reverse transcriptase). In some embodiments, a cDNA of the target RNA is sequenced. The sequence of the cDNA is determined, and the cDNA sequence is used to determine the sequence of the target RNA. In some embodiments, the target RNA is determined to have a modified adenosine at the corresponding position of the cDNA that is determined by sequencing to be thymine.
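By way of non-limiting illustration, the inference from the cDNA sequence back to the target RNA can be sketched as follows. The sketch assumes a first-strand cDNA of the same length as the RNA reference with no insertions or deletions; the function names and status labels are hypothetical and are not part of the disclosed methods. An adenosine that resisted deamination pairs with thymine in the cDNA, whereas a deaminated adenosine (inosine) pairs with cytosine and is therefore read as G in the RNA sense.

```python
# Illustrative sketch only: names and labels below are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def classify_adenosines(rna_reference: str, first_strand_cdna: str) -> dict:
    """Classify every A in the RNA reference from the first-strand cDNA.

    rna_reference     : expected RNA sequence, 5'->3' (U or T accepted).
    first_strand_cdna : sequenced first-strand cDNA, 5'->3' (antiparallel to RNA).
    Assumes an ungapped, full-length cDNA (a simplifying assumption).
    """
    reference = rna_reference.upper().replace("U", "T")
    sense_read = reverse_complement(first_strand_cdna)  # RNA-sense view of the cDNA
    if len(sense_read) != len(reference):
        raise ValueError("sketch assumes cDNA and reference have equal length")

    calls = {}
    for position, (ref_base, read_base) in enumerate(zip(reference, sense_read)):
        if ref_base != "A":
            continue
        if read_base == "A":      # cDNA base is T: adenosine was not deaminated
            calls[position] = "protected"
        elif read_base == "G":    # cDNA base is C: adenosine was deaminated to I
            calls[position] = "deaminated"
        else:
            calls[position] = "ambiguous"
    return calls


# RNA 5'-GAUCA-3' in which only the last A was deaminated gives cDNA 5'-CGATC-3'.
print(classify_adenosines("GAUCA", "CGATC"))
```

A "protected" call alone does not establish a modification, because deamination of unmodified adenosine may be incomplete; the control samples described herein address that point.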
In some embodiments, sequencing the target RNA comprises amplification of nucleic acids. Amplification can be done by techniques known in the art, such as PCR, which uses primers, polymerase, deoxynucleoside triphosphates, buffers, and bivalent and monovalent cations in a reaction that generates copies of a target DNA sequence from a single or few copies of the target DNA sequence.
C. Controls
In some embodiments, the methods described herein further comprise control samples. The intrinsically present RNA editing product should not influence the in vitro deamination (A has already been converted to I). However, an accurate assignment of the methylation fraction at each modification site could be affected if the same site is also intrinsically deaminated to some extent. In addition, the ADAR enzyme may exhibit certain sequence and/or structure biases that need to be corrected. Therefore, a control sample with minimum methylation is required to correct these factors.
In one embodiment, the control sample is a control transcript with minimum adenosine modification. In one embodiment, the control sample is a control transcript with minimum m6A methylation (
In some embodiments, the method comprises a control sample. To construct a control sample, the RNA preparation comprising the target RNA may be separated into multiple, such as at least, at most, or exactly 2, 3, 4, 5, 6, or more portions (or any derivable range therein). One portion may be subjected to modification-specific demethylation to remove the modification on the RNA. In some embodiments, one portion is contacted with ALKBH5 to catalyze m6A demethylation to remove most m6A on RNA. Next, both portions of the mRNA samples (after forming duplex RNA) will be subjected to the enzyme-mediated deamination, RT-PCR amplification, and high-throughput sequencing. Because the modified adenosine is resistant to deamination, it will be read as A in the sample without modification-specific demethylation (e.g. ALKBH5 treatment). In the demethylated (e.g. ALKBH5-treated) control, m6A is converted to A, which is deaminated to inosine in the deamination step and will be read as G. A comparison of the two parallel sequencing data sets will accurately reveal specific modification sites (i.e. m6A when ALKBH5 is used as the demethylase) at base resolution and eliminate potential RNA editing at the modification site (unmodified A-to-I) and potential biases of the deamination step.
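The comparison of the two portions can be sketched, purely as an illustration, as a per-site calculation on read counts. The sketch assumes that m6A is fully resistant to deamination, that demethylation of the control portion is complete, and that the per-site deamination efficiency is the same in both portions; these are simplifying assumptions, and the function names are hypothetical.

```python
# Illustrative sketch only: a per-site estimate under stated assumptions.
def g_fraction(a_reads: int, g_reads: int) -> float:
    """Fraction of reads at an A site that were read as G (A-to-G ratio)."""
    total = a_reads + g_reads
    if total == 0:
        raise ValueError("site has no coverage")
    return g_reads / total


def estimate_m6a_fraction(untreated, demethylated_control):
    """Estimate the methylated fraction at one site.

    Each argument is a tuple (A reads, G reads) at the same site.
    In the control, every copy is assumed unmodified, so its A-to-G ratio
    approximates the local deamination efficiency e. In the untreated
    portion only the unmodified copies (1 - m) can convert, so the
    observed ratio is about e * (1 - m), giving m = 1 - ratio / e.
    """
    ratio_untreated = g_fraction(*untreated)
    efficiency = g_fraction(*demethylated_control)
    if efficiency == 0:
        return None  # no deamination observed in the control: site is uninformative
    estimate = 1.0 - ratio_untreated / efficiency
    return min(1.0, max(0.0, estimate))  # clamp sampling noise into [0, 1]


# Hypothetical counts: 30% G without demethylation, 60% G in the control.
print(estimate_m6a_fraction((70, 30), (40, 60)))
```

Because the control ratio is measured at the same site, this ratio-based comparison also absorbs sequence or structure biases of the deaminase at that site.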
The invention additionally provides kits for detecting modified adenosines in a target RNA. Each kit may also include additional components that are useful for amplifying the nucleic acid, or sequencing the nucleic acid, or other applications of the present invention as described herein. The kit may optionally provide additional components that are useful in the procedure. These optional components include buffers, capture reagents, developing reagents, labels, reacting surfaces, means for detection, control samples, instructions, and interpretive information. The kit may also include reagents for RNA isolation and/or purification.
The following examples are given for the purpose of illustrating various embodiments of the invention and are not meant to limit the present invention in any fashion. One skilled in the art will appreciate readily that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those objects, ends and advantages inherent herein. The present examples, along with the methods described herein are presently representative of certain embodiments, are provided as an example, and are not intended as limitations on the scope of the invention. Changes therein and other uses which are encompassed within the spirit of the invention as defined by the scope of the claims will occur to those skilled in the art.
Because the m6A modification on mRNA/lncRNA plays the most significant and broad roles in gene expression regulation, the focus of this Example is on sequencing of this modification. However, one skilled in the art would readily understand that parallel techniques could be applied for sequencing other adenosine modifications.
ADAR-mediated adenosine deamination sequencing (Deam-m6A-seq)
Applicants contemplated that adenosine deamination would be useful in methods for obtaining base-resolution sequencing of m6A in RNA (
The proposed use of ADAR for Deam-m6A-seq of m6A is not straightforward. Several major challenges must be resolved: 1. ADAR only works on duplex RNA. We must generate anti-sense RNA and form duplexes with sense RNA in order for the deamination reaction to occur. 2. ADAR-catalyzed deamination of duplex RNA can hardly reach 30% conversion because the deamination reaction converts A in the duplex RNA to I, which progressively weakens the duplex stability until dissociation of the two RNA strands leading to termination of the reaction. 3. The need for a control to correct various factors that may affect the deamination reaction. The intrinsically present RNA editing product should not influence the in vitro deamination (A has already been converted to I). However, an accurate assignment of the methylation fraction at each modification site could be affected if the same site is also intrinsically deaminated to some extent. In addition, the ADAR enzyme may exhibit certain sequence and/or structure biases that need to be corrected. Therefore, a control sample with minimum methylation is required in order to correct these factors.
Several strategies exist with regard to generating duplex RNA. The third challenge can be addressed with a demodification strategy, also detailed above in the “controls” section. The first and second challenges are most critical to overcome.
Challenge 1, Generating double-stranded RNA: Because ADAR only works on duplex RNA, one solution is to generate a transcriptome-wide anti-sense RNA strand pool in order to form dsRNA for the application of Deam-m6A-seq. To do this, one can use a cDNA library to prepare anti-sense RNA fragments and then anneal them to the transcripts. Well-established procedures to clone mammalian or other cDNA into plasmids can be used. The anti-sense RNAs can be transcribed in vitro using T7 RNA polymerase and are annealed to the methylated and the demethylated transcriptomes; the resulting dsRNAs can serve as the substrates for the dADAR-mediated deamination.
Alternatively, the 3′ primer-free Phi6 RNA replicase can be utilized to generate dsRNA using purified transcripts as templates. Phi6 RNA replicase is a primer-independent viral RNA replicase. It can generate dsRNA directly from ssRNA templates without primers. This approach serves as an alternative to produce dsRNA for dADAR-mediated deamination.
Challenge 2, Inefficient deamination: To improve the deamination efficiency of ADAR on the sense strand of dsRNA, Applicants devised a strategy to inhibit adenosine deamination of the anti-sense strand of dsRNA. Applicants employed N6-methyladenosine triphosphate (m6ATP or me-ATP) in the in vitro transcription reaction to generate the complementary strand, replacing all A with m6A on the anti-sense RNA strand in order to prevent the deamination reaction and increase the stability of dsRNA. Applicants cloned recombinant dADAR into an insect cell expression system with a FLAG tag on the N terminus and a polyhistidine tag on the C terminus. The protein was expressed and purified to high purity via anti-FLAG affinity resin and heparin column. A target RNA probe was generated through transcription and purified as a model RNA (
Challenge 3, quantification controls: In order to generate a control sample, Applicants further propose a demethylation approach to provide control transcripts with minimum m6A methylation (
Applicants tested whether the A-to-G mutation ratio changes with and without demethylation reaction as proposed. A- and m6A-containing target strands were synthesized and mixed together in 1:1 ratio. The ssRNA substrate was split into two portions, a control portion and a portion subjected to ALKBH5-mediated demethylation (
Applicants have applied Deam-m6A-seq to isolated mRNA from HeLa cells by using Phi6 RNA replicase to generate duplex RNA (with m6A on the opposite anti-sense strand). A shallow sequencing was performed with promising results. In MALAT1 lncRNA as an example, we observed 20-60% A-to-G transition mutations at most A sites.
One main concern is that the ADAR-mediated deamination of unmodified A is not stoichiometric. But with m6A in the complementary strand to extend deamination of the target strand and a new buffer system that stabilizes the duplex and promotes dADAR activity, the deamination efficiency can be improved up to ˜75%. To provide quantitative information on the modification percentage at each site, spike-in controls that contain different percentages of m6A/A can be used. Generating a calibration curve will enable more precise measurements of the methylation fraction at each site. The use of spike-in controls and generation of a calibration curve provided more accurate estimates of the modification fractions.
Adenosine deamination in RNA-DNA Duplex
Applicants have recently discovered a new buffer system (Example 2) that stabilizes duplex and significantly promotes deamination of unmodified A in duplex RNA (with m6A on the complementary strand) by dADAR to ˜75%. Under the same conditions dADAR is active on RNA-DNA duplex substrates. We observed ˜25% A-to-G mutations for unmodified A in RNA of an RNA-DNA model substrate after treating with dADAR (
With the RNA-DNA hybrid duplex as the substrate, Applicants can selectively digest DNA after the strand separation (because of deamination) and perform reverse transcription to generate the new DNA strands complementary to the deaminated and un-deaminated RNAs for another round of deamination. In principle, iterative rounds of this procedure will allow for much higher deamination efficiency. An ALKBH5-treated control sample with minimum m6A will be processed in parallel. Comparison of these two samples could accurately reveal m6A with base-resolution accuracy. Potential deamination of m6A can be monitored in order to ensure minimum interference. This approach offers one of the best current solutions for m6A sequencing at base resolution.
In RNA technologies, RNA processing and modification are important features, closely related to the biological functions of RNA. In particular, N6-methyladenosine (m6A) is a widely present modification found within eukaryotic messenger RNA and various nuclear non-coding RNAs. Recent discoveries have revealed that methylation of adenosine in mRNA is a dynamic and reversible process. m6A formation in the nucleus is catalyzed by a complex containing methyltransferase like 3 (METTL3), methyltransferase like 14 (METTL14), and Wilms' tumor 1-associating protein (WTAP). Two human AlkB family proteins, fat mass and obesity-associated protein (FTO) and ALKBH5, serve as RNA demethylases to remove m6A in mammalian polyA-tailed RNA. The “reader” protein YTHDF2, an m6A-specific binding protein that has been shown to interact with thousands of mRNA targets, directs its substrates to methylation-dependent mRNA decay, demonstrating a significant role of methylation in mRNA metabolism. The latest work on METTL3 knockout embryonic stem cells also indicates the critical role of m6A in cell differentiation and its regulatory relationship with microRNA.
Currently, transcriptome-wide m6A detection is based on antibody-specific enrichment followed by high-throughput sequencing (m6A-seq, or MeRIP-seq). Even though a photo-crosslinking-assisted strategy has been developed to significantly improve the resolution of m6A-seq (PA-m6A-seq), the lack of a direct conversion tool to differentiate methylated from unmethylated adenosine still hinders single-nucleoside resolution detection and quantitative analyses of methylation across the entire transcriptome, which are of great importance for further investigating the biological significance of the methylation modification.
Described herein is an enzyme-based approach, ADAR-mediated adenosine deamination sequencing (Deam-m6A-seq), to map m6A in a transcriptome-wide manner at single-nucleoside resolution.
Design and Innovation: Drosophila Adenosine Deaminase Acting on RNA (dADAR) was chosen as the tool enzyme to catalyze the deamination process (
The following describes exemplary method steps useful in the methods and compositions of the disclosure. Certain embodiments may comprise the reagents and methods described below.
I. Step 1: dADAR Substrate Preparation:
A. Formation of dsRNA by Employing Phi6 RNA replicase and N6-methyl-ATP to Replace Normal ATP
Phi6 RNA replicase is a highly efficient viral RNA replicase that is able to generate a complementary anti-sense RNA strand without the assistance of a primer. Phi6 RNA replicase recognizes the free 3′ hydroxyl group of an RNA molecule to initiate synthesis, typically producing the full-length dsRNA for the RNA/DNA template.
To avoid unproductive deamination of the complementary RNA strand, which destabilizes the dsRNA and also consumes the catalytic turnover capacity of dADAR, Applicants proposed to incorporate m6A instead of A in the complementary strand by utilizing N6-methyl-ATP, which had not been tested before. Surprisingly, the incorporation efficiency of N6-methyl-ATP by Phi6 RNA replicase is similar to that of normal ATP, ensuring the productivity of m6A-incorporated dsRNA for further deamination treatment (
B. Formation of RNA-DNA hybrid as dADAR substrate.
dADAR does not recognize RNA-DNA hybrid in typical buffer system. dATP and GTP-containing reaction buffers were tested to investigate whether the deamination activity could be improved. It was eventually found that dADAR could react with RNA-DNA hybrid with GTP in the reaction buffer (
II. Step 2: deamination control setup
A. Demethylated Control Group
The deamination activity of dADAR is only slightly affected by the sequence of the substrate. However, even small differences in deamination reactivity toward adenosines in different sequence contexts greatly reduce the power to differentiate methylated from unmethylated adenosine. Thus, setting up an effective control group for further data analyses is essential.
In theory, comparing deamination results of methylated and unmethylated transcripts is the best way to 1) rule out any context biases on unmethylated adenosine and 2) amplify the deamination conversion effect on methylated adenosine. Applicants considered the use of m6A demethylases to get rid of m6A modification transcriptome-wide. Applicants tested both FTO and ALKBH5, finding that ALKBH5 affords the better efficiency of removing methyl group of adenosine (
B. Relationship Between Deamination Efficiency and Methylation Level Demonstrated by Calibration Curve
Even though we have already optimized the entire deamination reaction from substrate to reaction conditions, it is still unlikely that all unmethylated adenosine is converted to inosine in the deaminase treatment. Considering the relatively low level of methylation, a cutoff criterion is required for confidently detecting m6A and quantifying methylation at each site. Therefore, a relationship between deamination efficiency and methylation level at each adenosine is necessary for quality control and data analyses.
T7 and T3 RNA polymerases are compatible with several modified nucleoside triphosphates and are able to generate RNA fragments containing modified nucleosides. We found that N6-methyl-ATP can be used to form the RNA strand. By using this strategy, we can prepare a set of samples with different methylation levels, with which a calibration curve, demonstrating the relationship between deamination efficiency (indicated by the A-to-G transition ratio in high-throughput sequencing) and methylation level, can be prepared.
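As a non-limiting sketch of how such a calibration curve might be used, the following fits a straight line to spike-in samples of known methylation level and inverts it to estimate the methylation level at a site from its observed A-to-G transition ratio. A linear relationship is an assumption made for illustration, the spike-in values shown are invented placeholders rather than measured data, and the function names are hypothetical.

```python
# Illustrative sketch only: linear calibration with placeholder spike-in values.
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("calibration points must span more than one methylation level")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def methylation_from_ratio(a_to_g_ratio, slope, intercept):
    """Invert the calibration line and clamp the estimate into [0, 1]."""
    if slope == 0:
        raise ValueError("flat calibration line cannot be inverted")
    estimate = (a_to_g_ratio - intercept) / slope
    return min(1.0, max(0.0, estimate))


# Placeholder spike-ins: known m6A fraction -> observed A-to-G ratio (hypothetical).
known_methylation = [0.0, 0.25, 0.5, 0.75, 1.0]
observed_ratio = [0.75, 0.5625, 0.375, 0.1875, 0.0]

slope, intercept = fit_line(known_methylation, observed_ratio)
print(methylation_from_ratio(0.375, slope, intercept))
```

If real spike-in data deviate from a straight line, a different functional form would be needed; the inversion step stays the same in principle.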
III. Step 3: Deamination Condition Optimization
The best buffer system for deamination reaction of dADAR (on dsRNA and RNA-DNA hybrid) is shown below:
The reactivity under two different concentrations of GTP in the buffer system was tested. With 4 mM GTP in the buffer, the reactivity was enhanced significantly (
The observation that dADAR works effectively on RNA-DNA duplex in the presence of GTP opens a possibility to further increase the deamination efficiency for A (
Through regular RNase H-minus RT, Applicants can readily generate RNA-DNA hybrid duplexes from a pool of isolated mRNA. dADAR does not work on the DNA strand. Therefore, its activity will be confined to the RNA strand. The dADAR-mediated deamination leads to progressive weakening of the duplex until the two strands separate. Applicants will then treat the mixture with DNase to digest all DNA, purify the RNA, and perform another round of RT to generate another pool of RNA-DNA hybrid duplexes that form perfectly complementary strands, now with the deaminated RNA product from step 1. The dADAR-mediated deamination will be performed, and then the procedure goes back to step 1 for iterative rounds of duplex generation and RNA deamination. After several rounds of RT-deamination-DNA digestion-RT, the deamination of unmodified A could in principle become stoichiometric. A library will then be constructed for subsequent sequencing.
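The expected benefit of iterative rounds can be illustrated with simple arithmetic. The sketch below assumes, as a simplification not established by the data herein, that each round independently converts the same fraction p of the unmodified adenosines that remain, so that the cumulative conversion after n rounds is 1 − (1 − p)^n. The function names are hypothetical.

```python
# Illustrative sketch only: idealized model of iterative deamination rounds.
import math


def cumulative_conversion(per_round: float, rounds: int) -> float:
    """Fraction of unmodified A converted after the given number of rounds,
    assuming each round converts a fixed fraction of what remains."""
    return 1.0 - (1.0 - per_round) ** rounds


def rounds_needed(per_round: float, target: float) -> int:
    """Smallest number of rounds whose cumulative conversion reaches the target."""
    if not 0.0 < per_round < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("per_round and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - per_round))


# Using a per-round efficiency of 25% purely as an example value.
for n in (1, 2, 4, 8):
    print(n, round(cumulative_conversion(0.25, n), 4))
print(rounds_needed(0.25, 0.95))
```

Under this idealized model, a lower per-round efficiency mainly increases the number of rounds required rather than capping the achievable conversion.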
With the RNA-DNA hybrid duplex as the substrate, Applicants can selectively digest DNA after the strand separation (because of deamination) and perform reverse transcription to generate the new DNA strands complementary to the deaminated and un-deaminated RNAs for another round of deamination. In principle, iterative rounds of this procedure will allow us to achieve much higher deamination efficiency. An ALKBH5-treated control sample with minimum m6A will be processed in parallel. Comparison of these two samples could accurately reveal m6A with base-resolution accuracy. Applicants can monitor potential deamination of m6A in order to ensure minimum interference. This approach offers one of the best current solutions for m6A sequencing at base resolution.
The deamination process converts unmethylated adenosine to inosine and generates inosine-uracil/thymine mismatch in dsRNA/RNA-DNA hybrid, which significantly destabilizes the duplex structure. However, the presence of the duplex structure is critical for achieving high transition ratio through effective deamination.
To overcome this drawback, Applicants propose to iteratively perform the deamination reaction on the target strand to convert as many unmethylated adenosines to inosine as possible. In dsRNA Deam-seq, the introduction of the anti-sense complementary strand is critical for the deamination reaction; however, the addition of the complementary strand also interferes with the iterative deamination reaction because, after several rounds of dsRNA formation, the original target strand, which is the “real” transcriptome, is diluted by artificially synthesized strands.
The best approach to differentiate the original target strand from artificially synthesized complementary strand is to selectively label the original one and immobilize it on a solid support. Therefore, we have designed the RNA immobilization and iterative Deam-seq strategy (
An RNA fragment is first labeled with a 5′-phosphothioate group by using T4 polynucleotide kinase and ATP-γS. Then the labeled RNA fragment reacts with a commercially available solid support bearing iodoacetamide (or other functional groups that react with phosphothioate). The immobilized RNA serves as the template for the formation of the dsRNA/RNA-DNA duplex and is applied to the deamination reaction system.
After as many adenosines as possible have been converted to inosine during deamination, the artificial anti-sense strand no longer stably interacts with the target strand and can be denatured and washed out. Then, a second round of dsRNA/RNA-DNA duplex formation and subsequent deamination follows (
Besides reacting ATP-γS with iodoacetamide, other 5′ end labeling and immobilization strategies can be applied. For instance, after 5′-phosphothioate transfer, labeled RNA/DNA can also react with maleimide derivatives (for example, maleimide biotin) and then be immobilized on streptavidin beads.
The high reactivity of phosphothioate transfer and high yield of solid phase capture may also enable our strategy to enrich cell free oligonucleotides (RNA or DNA) for clinical diagnostics. Current enrichment approaches are generally based on direct extraction and adsorption through electrostatic or hydrophobic interactions, which are either less efficient or yield material that is difficult to process in further experiments. Our approach anchors oligonucleotides at the 5′ terminus, which has little effect on most biochemical treatments; also, the solid phase separation allows iterative flow-through and clean-up steps, leading to high efficiency and low background noise.
The Phi6 RNA replicase gene was synthesized at GeneArt (Thermo Fisher). The gene was directly cloned into the pMCSG19 vector by using the Gibson assembly method. The construct was verified by Sanger sequencing and then transformed into PRK1037 competent cells.
For protein expression and purification, the transformant was grown at 37° C. overnight as a starter culture. The next day it was used to inoculate LB media and grown at 37° C. to an absorbance at 600 nm of 0.8. Then, the culture was cooled down to 16° C. and induced by adding 1 mM isopropyl-β-D-thiogalactopyranoside at 16° C. for 18 hrs. The bacterial cells were pelleted by centrifugation and homogenized in lysis buffer containing 20 mM Tris-HCl pH 8.0, 200 mM NaCl and 1 mM phenylmethanesulfonyl-fluoride (PMSF) by a cell homogenizer. The supernatant was subjected to Ni-NTA columns for affinity purification. The protein was eluted in 20 mM Tris-HCl, pH 8.0, 200 mM NaCl and 500 mM imidazole, then directly subjected to a Heparin column (GE Healthcare). Elution of the bound protein was performed with a 150 mM to 1 M NaCl gradient buffered with 50 mM Tris-HCl, pH 8.0 and 1 mM EDTA. Fractions containing Phi6 RNA replicase were further purified by a Source Q column (GE Healthcare) with a 100 mM to 1 M NaCl gradient. The fractions were pooled together and concentrated for storage.
Besides maleimide sulfur reaction, an alternative approach was also developed. The reaction between the phosphothioate group at the 5′ terminal of RNA and methanethiosulfonate biotin (MTSEA-biotin) labels the RNA molecule with biotin via the formation of disulfide bond, which can be cleaved by DTT treatment.
The advantages of this methanethiosulfonate disulfide formation strategy are: 1) the high activity of MTSEA-biotin makes the labeling more efficient under milder conditions (incubation at room temperature for 20 min); 2) the DTT cleavage step provides a high recovery yield of selectively pulled-down RNA molecules from the streptavidin beads, enabling the application of Deam-seq to samples of limited amount.
Because Deam-seq reads contain multiple specific mutations, much as bisulfite sequencing reads do, the inventors chose to adapt Bismark, a widely used bisulfite sequencing analysis tool, to Deam-seq analysis by performing an A-to-G transition instead of its original C-to-T transition. The modified Bismark script is able to uniquely map the raw data back to the human transcriptome and report both the transition sites and the transition frequency of each site as analysis output, which can be used for further statistical tests.
With the adapted Bismark tool, the pipeline to analyze Deam-seq data and translate the raw data into transition frequencies is as follows:
Step 1: use a homemade script to convert the raw data .fastq file to its reverse complementary sequence .fastq.rc. The Illumina TruSeq stranded mRNA library preparation kit specifically labels the transcriptome to effectively distinguish the first-strand cDNA from the second strand, keeping the strand information in the raw data. Given the great likelihood that Deam-seq libraries contain both “sense” (the transcriptome) and “anti-sense” (the artificial “reverse complement-ome”) reads, it is necessary to process all reads so that they are in the same orientation as annotated in the genome.
Step 2: apply the adapted Bismark tool to map the converted raw data .fastq.rc file back to the reference genome. In this step, both the raw reads from Step 1 and the reference genome are first transformed to replace all A sites with G sites; two parallel alignment instances between the original and converted versions of the raw data and reference then enable one to precisely and uniquely map the reads. Meanwhile, the conversion status of each A site is also recorded based on the parallel alignment and is further reported as the A-to-G transition frequency.
Step 3: employ Bedtools intersect to extract the transcriptome-oriented A-to-G transition frequency for further statistical analysis. Even though the reverse complementary strand in dsRNA could be deaminated in the treatment, only the strand with the same direction as the transcriptome is the target for m6A analysis. Using the transcriptome-based reference, the Bedtools intersect tool efficiently extracts the “on-transcript-strand” part and reports it in a separate file recording the conversion status of each A site covered by the raw data. An example report is shown in Table 1.
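The core operations named in Steps 1 and 2 can be sketched, for illustration only, as three small functions: reverse-complementing a FASTQ record, converting all A to G in silico, and recording the conversion status at reference A sites for a read whose position is already known. This is not the homemade script or the adapted Bismark tool themselves; the function names are hypothetical, and the sketch assumes ungapped alignment at a given offset in place of a real aligner.

```python
# Illustrative sketch only: not the adapted Bismark implementation.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement_record(header: str, seq: str, qual: str):
    """Step 1: reverse-complement the bases and reverse the quality string."""
    return header, seq.upper().translate(_COMPLEMENT)[::-1], qual[::-1]


def a_to_g_convert(seq: str) -> str:
    """Step 2 (in silico conversion): replace every A with G so that
    deaminated and non-deaminated reads align to the same converted reference."""
    return seq.upper().replace("A", "G")


def conversion_calls(reference: str, read: str, offset: int):
    """Record the status of each reference A covered by an ungapped read.

    Returns (reference position, base) pairs, where the base is 'A'
    (not converted) or 'G' (converted); other bases are skipped.
    """
    if offset < 0 or offset + len(read) > len(reference):
        raise ValueError("read does not fit on the reference at this offset")
    calls = []
    for i, base in enumerate(read.upper()):
        position = offset + i
        if reference[position].upper() == "A" and base in "AG":
            calls.append((position, base))
    return calls


reference = "CATGAAC"
_, read, _ = reverse_complement_record("@r1", "TCCAT", "IIIHG")

# After conversion the read matches the converted reference window exactly.
print(a_to_g_convert(read) == a_to_g_convert(reference[1:6]))
print(conversion_calls(reference, read, 1))
```

Aligning in the converted space and calling status in the original space is what lets A-to-G transitions be counted rather than penalized as mismatches.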
(a) A->A only Events: on A sites which do not have A-to-G mutation detected, the covered A reads counted as “events”.
(b) A->A only Sites: the A sites which do not have A-to-G mutation detected.
(c) A->G only Events: on A sites which have all A read as G, the covered G reads counted as “events”.
(d) A->G only Sites: the A sites which have all A read as G.
(e) A&G both A->A Events: on A sites which have both A and G detected, the covered A reads counted as “events”.
(f) A&G both A->G Events: on A sites which have both A and G detected, the covered G reads counted as “events”.
(g) A&G both Sites: the A sites which have both A and G detected.
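The report categories defined in the notes above can be tallied from per-site read counts as in the following sketch. The input layout (a mapping from a site identifier to counts of A reads and G reads) and the function name are assumptions made for illustration; sites without coverage are ignored.

```python
# Illustrative sketch only: tally report categories from per-site A/G read counts.
def summarize_sites(site_counts):
    """site_counts maps a site identifier to a tuple (A reads, G reads)."""
    summary = {
        "A->A only Events": 0,
        "A->A only Sites": 0,
        "A->G only Events": 0,
        "A->G only Sites": 0,
        "A&G both A->A Events": 0,
        "A&G both A->G Events": 0,
        "A&G both Sites": 0,
    }
    for a_reads, g_reads in site_counts.values():
        if a_reads > 0 and g_reads == 0:      # no A-to-G mutation detected
            summary["A->A only Events"] += a_reads
            summary["A->A only Sites"] += 1
        elif g_reads > 0 and a_reads == 0:    # every covering read shows G
            summary["A->G only Events"] += g_reads
            summary["A->G only Sites"] += 1
        elif a_reads > 0 and g_reads > 0:     # both A and G detected
            summary["A&G both A->A Events"] += a_reads
            summary["A&G both A->G Events"] += g_reads
            summary["A&G both Sites"] += 1
    return summary


# Hypothetical counts for four sites; the last one has no coverage.
example = {"site1": (10, 0), "site2": (0, 4), "site3": (6, 2), "site4": (0, 0)}
for name, value in summarize_sites(example).items():
    print(name, value)
```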
All of the methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims. All publications described herein are specifically incorporated by reference for all purposes.
The following references, to the extent that they provide exemplary procedural or other details supplementary to those set forth herein, are specifically incorporated herein by reference.
Horber F, Potoczna N, Hercberg S, Le Stunff C, Bougneres P, Kovacs P, Marre M, Balkau B, Cauchi S, Chevre J-C, Froguel P. Variation in FTO contributes to childhood obesity and severe adult obesity. Nat Genet. 2007; 39(6):724-6. PMC pending.
Fernández I S, Ng C L, Kelley A C, Wu G, Yu Y-T, Ramakrishnan V. Unusual base pairing during the decoding of a stop codon by the ribosome. Nature. 2013; 500(7460):107-10. PMC3732562.
Trewick S C, Henshaw T F, Hausinger R P, Lindahl T, Sedgwick B. Oxidative demethylation by Escherichia coli AlkB directly reverts DNA base damage. Nature. 2002; 419(6903):174-8. PMC pending.
This application claims the benefit of priority of U.S. Provisional Patent Application No. 62/211,948, filed Aug. 31, 2015, which is hereby incorporated by reference in its entirety.
This invention was made with government support under contract R01 HG006827 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US16/49407 | 8/30/2016 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62211948 | Aug 2015 | US |