A METHOD FOR DIAGNOSING A DISEASE BY DETECTION OF circRNA IN BODILY FLUIDS

Information

  • Patent Application
  • 20180282809
  • Publication Number
    20180282809
  • Date Filed
    September 29, 2016
  • Date Published
    October 04, 2018
Abstract
The present application relates to a method for diagnosing a disease of a subject, comprising the step of determining the presence or absence of one or more circular RNA (circRNA) in a sample of a bodily fluid of said subject; wherein the presence or absence of said one or more circRNA is indicative for the disease. In particular, the application relates to a method for diagnosing a neurodegenerative disease, preferably Alzheimer's disease, in a subject, comprising the steps of (i) determining the level of one or more circRNA in a sample of a bodily fluid of said subject; and (ii) comparing the determined level to a control level of said one or more circRNA; wherein differing levels between the determined and the control level are indicative for the disease. Furthermore, the application relates to means for detecting circRNAs that are biomarkers for a neurodegenerative disease, and to kits and arrays comprising nucleic acid probes for detecting exon-exon junctions in a head-to-tail arrangement of these circRNAs.
Description
TECHNICAL FIELD OF THE INVENTION

The present invention relates to the field of medicine and RNA biology. In particular, it relates to the diagnosis of a disease using circRNAs, and more particularly to the diagnosis of a neurodegenerative disease, e.g. Alzheimer's disease.


BACKGROUND OF THE INVENTION

Many diseases are associated with deregulation of gene expression. Such deregulation is in many cases detectable at early stages of the disease. In fact, detection of deregulated expression often serves as a biomarker for a disease, or for the risk of acquiring a disease, before the disease manifests in terms of symptoms. However, diagnosis is often restricted to samples of the diseased tissue. In some cases, diagnosis also aims at the detection of biomarkers in easily accessible samples, such as blood. However, these methods are restricted to protein biomarkers and require cumbersome preparation of the samples, e.g. serum or plasma samples. The direct readout of expression, i.e. RNA, is in most cases not feasible in blood samples, as RNAs are prone to degradation. Hence, RNA is currently a poor biomarker in blood, in particular for diseases manifesting in tissues other than blood.


Regulatory RNAs such as microRNAs (miRNAs) or long non-coding RNAs (lncRNAs) have been implicated in many biological processes and human diseases such as cancer (reviewed in Batista P J, Chang H Y. Long Noncoding RNAs: Cellular Address Codes in Development and Disease. Cell. 2013; 152(6):1298-1307; and Cech T R, Steitz J A. The Noncoding RNA Revolution Trashing Old Rules to Forge New Ones. Cell. 2014; 157(1):77-94). Recent studies have drawn attention to a new class of RNA that is endogenously expressed as single-stranded, covalently closed circular molecules (circRNA, reviewed in Jeck W R, Sharpless N E. Detecting and characterizing circular RNAs. Nature Biotechnology. 2014; 32(5):453-461). Most circRNAs are probably products of a ‘back-splice’ reaction that joins a splice donor site with an upstream splice acceptor site (see Ashwal-Fluss R, Meyer M, Pamudurti N R, et al. circRNA Biogenesis Competes with Pre-mRNA Splicing. MOLCEL. 2014; 1-12; and Starke S, Jost I, Rossbach O, et al. Exon Circularization Requires Canonical Splice Signals. Cell Reports. 2015; 10(1):103-111). Circular RNA is known for several decades from viroids, viruses and plants, but until recently only few mammalian circRNAs were reported. Sequencing based studies lately revealed that circRNAs are abundantly and prevalently expressed across life, oftentimes in a tissue and developmental-stage specific manner (see Danan M, Schwartz S, Edelheit S, Sorek R. Transcriptome-wide discovery of circular RNAs in Archaea. Nucleic Acids Res. 2012; 40(7):3131-3142; Salzman J, Gawad C, Wang P L, Lacayo N, Brown P O. Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS ONE. 2012; 7(2):e30733; Jeck W R, Sorrentino J A, Wang K, et al. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. 2013; 19(2):141-157; Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 
2013; 495(7441):333-338; Salzman J, Chen R E, Olsen M N, Wang P L, Brown P O. Cell-Type Specific Features of Circular RNA Expression. PLoS Genetics. 2013; 9(9):e1003777; Wang P L, Bao Y, Yee M-C, et al. Circular RNA is expressed across the eukaryotic tree of life. PLoS ONE. 2014; 9(3):e90859; Guo J U, Agarwal V, Guo H, Bartel D P. Expanded identification and characterization of mammalian circular RNAs. Genome Biology. 2014; 1-14; and You X, Vlatkovic I, Babic A, et al. Neural circular RNAs are derived from synaptic genes and regulated by development and plasticity. Nat Neurosci. 20117-25). The vast majority of circRNAs consists of 2-4 exons of protein coding genes, but they can also derive from intronic, non-coding, antisense, 5′ or 3′ untranslated or intergenic genomic regions (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338; and Zhang Y, Zhang X-O, Chen T, et al. Circular Intronic Long Noncoding RNAs. MOLCEL. 2013; 1-15). Although not fully understood, the biogenesis of many mammalian circRNAs depends on complementary sequences within flanking introns (see Ashwal-Fluss R, Meyer M, Pamudurti N R, et al. circRNA Biogenesis Competes with Pre-mRNA Splicing. MOLCEL. 2014; 1-12; Rybak-Wolf A, Stottmeister C, Glažar P, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. MOLCEL. 2015; 1-17; Zhang X-O, Wang H-B, Zhang Y, et al. Complementary Sequence-Mediated Exon Circularization. Cell. 2014; Liang D, Wilusz J E. Short intronic repeat sequences facilitate circular RNA production. Genes and Development. 2014; Conn S J, Pillman K A, Toubia J, et al. The RNA Binding Protein Quaking Regulates Formation of circRNAs. Cell. 2015; 160(6):1125-1134; and Ivanov A, Memczak S, Wyler E, et al. Analysis of Intron Sequences Reveals Hallmarks of Circular RNA Biogenesis in Animals. CellReports. 
2015; 10(2):170-177) and their expression can be modulated by antagonistic or activating trans-acting factors such as ADAR and Quaking (see Ivanov A, Memczak S, Wyler E, et al. Analysis of Intron Sequences Reveals Hallmarks of Circular RNA Biogenesis in Animals. Cell Reports. 2015; 10(2):170-177; and Conn S J, Pillman K A, Toubia J, et al. The RNA Binding Protein Quaking Regulates Formation of circRNAs. Cell. 2015; 160(6):1125-1134; respectively). Although the function of animal circRNAs is largely unknown, it was demonstrated that the circRNAs CDR1as (ciRS-7) and SRY can act as antagonists of specific miRNAs by functioning as miRNA sponges (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338; and Hansen T B, Jensen T I, Clausen B H, et al. Natural RNA circles function as efficient microRNA sponges. Nature. 2013; 495(7441):384-388). Moreover, stable knockdown of CDR1as caused a migration defect in cell culture and a circRNA produced from the muscleblind transcript can bind muscleblind protein and likely regulate its expression levels (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338; and Ashwal-Fluss R, Meyer M, Pamudurti N R, et al. circRNA Biogenesis Competes with Pre-mRNA Splicing. MOLCEL. 2014; 1-12). Besides these specific functions for the few in-depth analyzed circRNAs, a recent study uncovered a putatively more general competition mechanism between linear RNA splicing and co-transcriptional circular RNA splicing (see Ashwal-Fluss R, Meyer M, Pamudurti N R, et al. circRNA Biogenesis Competes with Pre-mRNA Splicing. MOLCEL. 2014; 1-12). The lack of free ends, i.e. its circularity, renders circRNAs resistant to exonucleolytic activities within cells and in extracellular environments. 
Thus, circRNAs are stable molecules, as demonstrated by their long half-lives in cells, a feature that distinguishes them from canonical linear RNA isoforms (see Cocquerelle C, Daubersies P, Majérus M A, Kerckaert J P, Bailleul B. Splicing with inverted order of exons occurs proximal to large introns. EMBO J. 1992; 11(3):1095-1098; Jeck W R, Sorrentino J A, Wang K, et al. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. 2013; 19(2):141-157; and Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338).


The inventors now show for the first time the presence of tens of thousands of circRNAs in standard clinical whole blood specimens of diseased subjects and thereby show that circRNAs function as biomarkers in human disorders, in particular neurodegenerative disorders, as exemplified by Alzheimer's disease. Strikingly, the mRNA transcripts which give rise to circRNAs were in hundreds of cases barely detectable while the corresponding circRNAs were highly expressed, underlining the significance of circRNAs as novel biomarkers. Attempts have been made to detect circRNAs as biomarkers in blood. However, these approaches use processed blood. Blood exosomes have been postulated to comprise circRNAs (Li, Yan et al. (2015); Circular RNA is enriched and stable in exosomes: a promising biomarker for cancer diagnosis; Cell Res, 25(8): 981-984). However, blood exosomes are difficult to obtain, and the cumbersome isolation procedure is susceptible to errors, which renders the significance of the circRNA levels so obtained questionable. The present inventors, however, showed that circRNAs are detectable in unprocessed samples, e.g. whole blood, and are surprisingly well suited as biomarkers.


Alzheimer's disease (also referred to as “AD”) is the cause of 60% to 70% of cases of dementia. It is a chronic neurodegenerative disease that starts slowly and gets worse over time. One of the first symptoms is short-term memory loss. As the disease advances, symptoms can include problems with language, disorientation (including easily getting lost), mood swings, loss of motivation, not managing self-care, and behavioural issues. As a person's condition declines, she or he often withdraws from family and society. Gradually, bodily functions are lost, ultimately leading to death. Although the speed of progression can vary, the average life expectancy following diagnosis is three to nine years. The cause of Alzheimer's disease is poorly understood. About 70% of the risk is believed to be genetic, with many genes usually involved. Other risk factors include a history of head injuries, depression, or hypertension. The disease process is associated with plaques and tangles in the brain.


Currently, the diagnosis of Alzheimer's disease is based on the history of the illness and cognitive testing. These tests are often supplemented by medical imaging and blood tests to rule out other possible causes. Initial symptoms are often mistaken for normal ageing. Today, examination of brain tissue is needed for a definite diagnosis. However, brain tissue is not easily accessible and the surgical intervention carries severe risks. Mental and physical exercise, and avoiding obesity, may decrease the risk of AD.


In 2010, there were between 21 and 35 million people worldwide with AD. It most often begins in people over 65 years of age, although 4% to 5% of cases are early-onset Alzheimer's, which begins before this age. It affects about 6% of people 65 years and older. In 2010, dementia resulted in about 486,000 deaths. In developed countries, AD is one of the most financially costly diseases. There is a long-felt need for a direct, easy and reliable diagnosis of Alzheimer's disease to allow intervention and prevention of adverse effects of the beginning or progressing mental degeneration.


The molecular underpinnings of AD remain controversial, although substantial research efforts have been made and are currently ongoing (annual budget for AD of the NIH in 2015 is $566 million). In particular, there is an urgent need for biomarkers for AD, since it is believed that the molecular alterations involved in the disease precede the symptoms by years or even decades, which hinders therapeutic intervention.


SUMMARY OF THE INVENTION

The present invention provides a method that overcomes the drawbacks outlined above. The inventors have found that it is possible to detect circRNAs in large amounts in samples of a bodily fluid. The inventors have further proven that the circRNAs are indicative for a disease that is not a disease of the bodily fluid. Thereby, a tool is provided to directly diagnose a disease by determining the presence or absence of one or more circRNAs in a bodily fluid. The invention therefore provides a new class of biomarkers in bodily fluids, e.g. blood.


Hence, the present invention relates to a method for diagnosing a disease of a subject, comprising the step of:

    • determining the presence or absence of one or more circular RNA (circRNA) in a sample of a bodily fluid of said subject;


wherein the presence or absence of said one or more circRNA is indicative for the disease. Preferably, said disease is not a disease of said bodily fluid.


As has been shown by the inventors, certain circRNAs may be present in samples of a diseased subject at differing levels as compared to samples from healthy subjects. Hence, it may be desirable to decide on “presence” or “absence” of a circRNA when compared to a control level. Hence, in a preferred embodiment of the method according to the present invention the determination step comprises:

    • determining the level of said one or more circRNA;
    • comparing the determined level to a control level of said one or more circRNA;


wherein differing levels between the determined and the control level are indicative for the disease. Hence, the invention also relates to a method for diagnosing a disease of a subject, comprising the step of:

    • determining the level of said one or more circRNA;
    • comparing the determined level to a control level of said one or more circRNA;


wherein differing levels between the determined and the control level are indicative for the disease.
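The comparison step above can be illustrated with a minimal sketch. The function name `levels_differ`, the pseudocount and the two-fold cutoff are assumptions made purely for illustration; the application does not state how large a difference from the control level must be.

```python
import math


def levels_differ(sample_level, control_level, min_abs_log2_fc=1.0, pseudocount=1.0):
    """Return True if the sample level differs from the control level.

    The cutoff (default: two-fold in either direction) and the pseudocount
    are hypothetical illustration values, not taken from the application.
    """
    if sample_level < 0 or control_level < 0:
        raise ValueError("levels must be non-negative")
    # The pseudocount avoids division by zero for undetected circRNAs.
    log2_fc = math.log2((sample_level + pseudocount) / (control_level + pseudocount))
    return abs(log2_fc) >= min_abs_log2_fc
```

A level that is higher or lower than the control by at least the chosen factor is flagged as differing; the direction of the change is deliberately ignored here, since the text only requires that the levels differ.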


In a preferred embodiment of the present invention, the method is a method for diagnosing a neurodegenerative disease in a subject. Hence, the invention also relates to a method for diagnosing the neurodegenerative disease, preferably Alzheimer's disease, in a subject comprising the steps of:

    • determining the level of one or more circRNA in a sample of a bodily fluid of said subject;
    • comparing the determined level to a control level of said one or more circRNA;


wherein differing levels between the determined and the control level are indicative for the disease. A particularly preferred neurodegenerative disease is Alzheimer's disease.


The inventors, furthermore, identified circRNAs that were not previously known to be present in blood. These novel blood circRNAs are listed in Table 1. Hence the present invention also relates to a nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to 910, and a RNA sequence encoded by a sequence of SEQ ID NO:1 to 910, or specifically hybridizing to a reverse complement sequence thereof; preferably specifically hybridizing to a sequence selected from the group consisting of SEQ ID NO:1 to 910, and a RNA sequence encoded by a sequence of SEQ ID NO:1 to 910, or specifically hybridizing to a reverse complement sequence thereof.
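As a sketch of how the probe target region described above could be derived from a listed sequence, the following helper functions extract nucleotides 11 to 30 (1-based, inclusive) and compute a reverse complement. The function names are hypothetical, and the assumption that this window spans the head-to-tail junction is an inference from the abstract rather than something stated in this passage.

```python
# Complement table for DNA or RNA input; the result is written as DNA.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def junction_target(seq, start=11, end=30):
    """Return nucleotides start..end (1-based, inclusive) of seq."""
    if start < 1 or end < start:
        raise ValueError("invalid window")
    if len(seq) < end:
        raise ValueError("sequence is shorter than the requested window")
    return seq[start - 1:end]


def reverse_complement(seq):
    """Return the reverse complement of seq as a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]
```

A probe complementary to the circRNA would correspond to `reverse_complement(junction_target(seq))`, where `seq` stands for one of the listed sequences (none of which are reproduced in this section).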


Furthermore, the invention relates to a kit for diagnosing a neurodegenerative disease, comprising means for specifically detecting one or more nucleic acid sequence encoded by a sequence selected from the group consisting of SEQ ID NO:1 to 910 or SEQ ID NO:911 to SEQ ID NO:1820.


The invention, furthermore, provides an array for determining the presence or level of a plurality of nucleic acids, comprising a plurality of probes, wherein the plurality of probes specifically hybridize to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and a RNA sequence encoded by a sequence of SEQ ID NO:1 to 910, or to a reverse complement sequence thereof; preferably the plurality of probes comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of at least 100, preferably at least 150 more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and a RNA sequence encoded by a sequence of SEQ ID NO:1 to 910; or specifically hybridize to the reverse complement sequences thereof, preferably the plurality of probes comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of SEQ ID NO:1 to SEQ ID NO:200, or a RNA sequence encoded by a sequence of SEQ ID NO:1 to 200, or the reverse complement sequences thereof.


The invention in particular relates to the use of a nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910; and a RNA sequence encoded by a sequence of SEQ ID NO:1 to 910, or hybridizing to the reverse complement thereof, a kit according to the invention, or an array according to the invention for the diagnosis of a neurodegenerative disease, preferably for the diagnosis of Alzheimer's disease.





FIGURE LEGENDS


FIG. 1: Thousands of circRNAs are reproducibly detected in human blood. (A) Total RNA was extracted from human whole blood samples and rRNA was depleted. cDNA libraries were synthesized using random primers and subjected to sequencing. Raw sequencing reads were used for circRNA detection as previously described (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338). Sequencing reads that map continuously to the human reference genome were disregarded. From unmapped reads, anchors were extracted and independently mapped. Anchors that align consecutively indicate linear splicing events (1), whereas alignment in reverse orientation indicates head-to-tail splicing as observed for circular RNAs (2). After filtering of linear splicing events and circRNA candidates (see Methods in the Example section), the genomic coordinates and additional information such as read count, alignment quality and annotation are documented (Table 3). (B) circRNA candidate expression in human whole blood samples from two donors; ECDF=empirical cumulative distribution function. circRNA candidates tested in this study are annotated as numbers. Right panel: mRNA and long non-coding RNA (lncRNA) (n=17,282) expression per gene in two blood samples in transcripts per million (TPM); RNAs with putative circular isoforms (n=2,523) are highlighted in blue; R-values: Spearman correlation for RNAs found in both samples. (C) ENSEMBL genome annotation for reproducibly detected circRNA candidates (see also FIG. 5). Number of circRNAs with at least one splice site in each category is given. (D) Number of distinct circRNA candidates per gene. y-axis=log2(circRNA frequency+1). Gene names with the highest numbers are highlighted. (E) Expression level of top 8 circRNA candidates measured with sequencing (left panel) and divergent primers in qPCR (right); Ct=cycle threshold; linear control genes VCL and TFRC were measured with convergent primers.
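The anchor logic described in panel (A) can be sketched as follows. This is a strongly simplified illustration and not the published detection pipeline: it assumes both anchors of a read map uniquely to the same chromosome and strand, and the function name and return labels are hypothetical.

```python
def classify_anchor_pair(five_prime_pos, three_prime_pos, strand="+"):
    """Classify a read from the mapping positions of its two end anchors.

    five_prime_pos:  genomic start of the anchor cut from the read's 5' end
    three_prime_pos: genomic start of the anchor cut from the read's 3' end
    Anchors in transcript order suggest a linear splice event; anchors in
    reversed order suggest a head-to-tail (back-splice) junction.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if five_prime_pos == three_prime_pos:
        return "ambiguous"
    in_order = five_prime_pos < three_prime_pos
    if strand == "-":
        # On the minus strand, transcript order runs towards lower coordinates.
        in_order = not in_order
    return "linear" if in_order else "head-to-tail"
```

For example, on the plus strand a read whose 3' anchor maps upstream of its 5' anchor is labeled `"head-to-tail"`, which corresponds to case (2) in the legend.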



FIG. 2: Top expressed blood circRNAs dominate over linear RNA isoforms. (A) Example for the read coverage of a top expressed blood circRNA produced from the PCNT gene locus (http://genome.ucsc.edu; see Kent W J, Sugnet C W, Furey T S, et al. The human genome browser at UCSC. Genome Research. 2002; 12(6):996-1006). Data are shown for the human HEK293 cell line (see Ivanov A, Memczak S, Wyler E, et al. Analysis of Intron Sequences Reveals Hallmarks of Circular RNA Biogenesis in Animals. Cell Reports. 2015; 10(2):170-177) and two biologically independent blood RNA preparations. (B, C) Relative expression and raw Ct values of top expressed blood circRNAs and corresponding linear isoforms in HEK293 cells (B) and whole blood (C).



FIG. 3: Circular to linear RNA isoform expression is high in blood compared to other tissues. (A) Comparison of circular to linear RNA isoforms in blood. circRNAs were measured by head-to-tail spanning reads. As a proxy for linear RNA expression, median linear splice site spanning reads were counted. Data are shown for one replicate each of blood (A), cerebellum (B) and liver (C). The relative fraction of circRNA candidates with >4× higher expression than linear isoforms is given as inset; eight tested candidates are indicated by numbers; circRNAs derived from hemoglobin are marked in (A). (D) Mean circular-to-linear RNA expression ratio for the same samples, in two biologically independent replicates. Error bars indicate the standard error of the mean; *** denotes P<0.001, permutation test on pooled replicate data (see Method section in the Examples). (A-D) represent expression datasets for one replicate per sample (FIG. 15).
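The comparison summarized in the insets can be sketched as below, assuming that each candidate is represented by its head-to-tail read count together with the read counts of its linear splice junctions. The data layout and the function names are hypothetical; only the use of the median as a linear proxy and the 4× cutoff come from the legend.

```python
from statistics import median


def linear_proxy(junction_read_counts):
    """Median read count over the linear splice junctions of one candidate."""
    return median(junction_read_counts)


def fraction_circ_dominant(candidates, fold=4):
    """Fraction of candidates whose circRNA count exceeds fold x the linear proxy.

    candidates: list of (head_to_tail_reads, [linear junction read counts]).
    """
    if not candidates:
        return 0.0
    hits = sum(
        1
        for circ_reads, linear_counts in candidates
        if circ_reads > fold * linear_proxy(linear_counts)
    )
    return hits / len(candidates)
```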



FIG. 4: Comparative analysis of blood circRNA expression in Alzheimer's disease patients and controls. (A) Principal Component Analysis (PCA) of circRNA expression for 5 control (H) and 5 Alzheimer's disease (AD) patients. A circRNA subset comprising the top 910 (out of 20,969) detected circRNAs was analyzed (see Results section in the Examples). (B) Analysis as in (A) for the corresponding linear RNA isoforms measured by median read count of linear splice junctions. (C) The 200 circRNA candidates with the highest weight in PC2 (see A) were used for unsupervised clustering of their expression (Spearman's rank correlation as distance metric, see Method section in the Examples). PC2 represents the diseased/healthy principal component. Histograms show expression distribution. Patient details are given underneath each patient ID. (D) Analysis as in (C) but for linear RNAs of the corresponding genes (n=167).



FIG. 5: Reproducibility of circRNA candidate detection. The overlap of 2,442 circRNAs found with at least 2 read counts in both samples is considered as reproducibly detected circRNA set.
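The reproducibility criterion can be sketched as a simple set operation, assuming that each sample is represented as a mapping from a circRNA identifier to its read count. The identifiers and the function name are hypothetical.

```python
def reproducible_set(counts_a, counts_b, min_reads=2):
    """Return circRNA ids found with at least min_reads reads in both samples.

    counts_a, counts_b: dicts mapping a circRNA identifier to its read count.
    """
    return {
        circ_id
        for circ_id, n_reads in counts_a.items()
        if n_reads >= min_reads and counts_b.get(circ_id, 0) >= min_reads
    }
```

A candidate seen with a single read in either sample, or missing from one sample entirely, is excluded from the reproducibly detected set.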



FIG. 6: Technical reproducibility of circRNA candidate detection. A library of blood sample 1 (H_1) was sequenced twice (see Table 2).



FIG. 7: GO annotation of circRNAs and linear RNAs in blood. Significantly enriched GO terms (p<0.05) for circRNAs found in both samples (n=2,442) and for the same number of top expressed linear RNAs.



FIG. 8: Predicted circRNA length. Predicted spliced circRNA length distributions for circRNA candidates detected in liver, cerebellum and blood.



FIG. 9: circRNA candidate validation. (A) Top circRNA candidate expression was measured in qPCR using divergent primers on mock or RNase R treated total RNA preparations. 7 of 8 candidates were successfully amplified, while candidate 7 did not yield specific PCR products and is therefore excluded from further analysis. Linear RNAs and previously described circRNAs are shown as controls. (B) PCR amplicons for divergent and convergent primer sets (c=circular, l=linear) of the tested candidates; end point analysis after 40 cycles. (C) PCR amplicons were subjected to Sanger sequencing and checked for the presence of a head-to-tail junction; a representative example result is shown.



FIG. 10: Comparison of circRNA candidates in blood to liver and cerebellum. (A) Comparison of circular RNA candidates detected in blood (sample 1) and cerebellum shown for the whole expression range. (B) fraction of circRNA candidates that overlap between the two samples binned by blood expression level. (C, D) analysis as before but for liver circRNA candidates.



FIG. 11: Correlation of linear RNAs in cerebellum and blood and liver and blood. Number of detected transcripts: blood=29,908; cerebellum=38,192; liver=27,880; TPM=transcripts per million.



FIG. 12: Comparison of circ-to-linear expression by RNA-Seq and qPCR. Raw Ct values (cycle threshold) and median linear splice junction spanning read counts are given for the respective RNA isoform.



FIG. 13: Histogram of principal components. Principal components (PC) were calculated from the analysis shown in FIG. 4 (A, B).



FIG. 14: Number of exons per circRNA in blood. Histogram of number of exons per circRNA. Reproducibly detected set (n=2,442) without intergenic circRNAs (n=27); median exon number: 2, mean exon number: 2.8.



FIG. 15: List of circRNAs detected in human blood. Genomic location, ENSEMBL gene identifier and symbols and gene biotype are given together in Table 1, infra. Here, raw read counts for each circRNA candidate in each sample of healthy subjects (H_1 to H_5) and subjects suffering from Alzheimer's disease (AD_1 to AD_5) are given.





DETAILED DESCRIPTION OF THE INVENTION

As outlined herein and exemplified in the Examples, the inventors for the first time provide evidence that circular RNA (circRNA) is present in whole blood in large amounts and is suited as a biomarker for diseases in a subject. Hence, the invention relates to a method for diagnosing a disease of a subject, comprising the step of determining the presence or absence of one or more circular RNA (circRNA) in a sample of a bodily fluid of said subject; wherein the presence or absence of said one or more circRNA is indicative for the disease.


It was unexpectedly possible to show a correlation of differing levels of RNAs and a disease in a tissue other than the sample tissue, i.e. other than the bodily fluid tested. Hence, in a preferred embodiment said disease is not a disease of said bodily fluid. The gist of the present invention is that circRNAs in bodily fluids like blood are unexpectedly suited as biomarkers.


The term “biomarker” (biological marker) was introduced in 1989 as a Medical Subject Heading (MeSH) term: “measurable and quantifiable biological parameters (e.g., specific enzyme concentration, specific hormone concentration, specific gene phenotype distribution in a population, presence of biological substances) which serve as indices for health- and physiology-related assessments, such as disease risk, psychiatric disorders, environmental exposure and its effects, disease diagnosis, metabolic processes, substance abuse, pregnancy, cell line development, epidemiologic studies, etc.” In 2001, an NIH working group standardized the definition of a biomarker as “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention” and defined types of biomarkers. A biomarker may be measured on a biosample (as a blood, urine, or tissue test), it may be a recording obtained from a person (blood pressure, ECG, or Holter), or it may be an imaging test (echocardiogram or CT scan). Biomarkers can indicate a variety of health or disease characteristics, including the level or type of exposure to an environmental factor, genetic susceptibility, genetic responses to exposures, markers of subclinical or clinical disease, or indicators of response to therapy. Thus, a simplistic way to think of biomarkers is as indicators of disease trait (risk factor or risk marker), disease state (preclinical or clinical), or disease rate (progression). Accordingly, biomarkers can be classified as antecedent biomarkers (identifying the risk of developing an illness), screening biomarkers (screening for subclinical disease), diagnostic biomarkers (recognizing overt disease), staging biomarkers (categorizing disease severity), or prognostic biomarkers (predicting future disease course, including recurrence and response to therapy, and monitoring efficacy of therapy). 
The biomarkers of the present invention are preferably antecedent or screening biomarkers. Hence, the methods are methods for diagnosing the presence or the risk for acquiring a disease.


Biomarkers may also serve as surrogate end points. Although there is limited consensus on this issue, a surrogate end point is one that can be used as an outcome in clinical trials to evaluate safety and effectiveness of therapies in lieu of measurement of the true outcome of interest. The underlying principle is that alterations in the surrogate end point track closely with changes in the outcome of interest. Surrogate end points have the advantage that they may be gathered in a shorter time frame and with less expense than end points such as morbidity and mortality, which require large clinical trials for evaluation. Additional values of surrogate end points include the fact that they are closer to the exposure/intervention of interest and may be easier to relate causally than more distant clinical events. An important disadvantage of surrogate end points is that if the clinical outcome of interest is influenced by numerous factors (in addition to the surrogate end point), residual confounding may reduce the validity of the surrogate end point. It has been suggested that the validity of a surrogate end point is greater if it can explain at least 50% of the effect of an exposure or intervention on the outcome of interest.


A “sample” in the meaning of the invention can be any biological fluid of the subject, such as lymph, saliva, urine, cerebrospinal fluid or blood. The sample is collected from the patient and subjected to the diagnosis according to the invention. The sample of the bodily fluid is in a preferred embodiment selected from the group consisting of blood, cerebrospinal fluid, saliva, serum, plasma, and semen; the most preferred embodiment of the sample is a whole blood sample. A “sample” in the meaning of the invention may also be a sample originating from a biochemical or chemical reaction, such as the product of an amplification reaction. Liquid samples may be subjected to one or more pre-treatments prior to use in the present invention. Such pre-treatments include, but are not limited to, dilution, filtration, centrifugation, concentration, sedimentation, precipitation or dialysis. Pre-treatments may also include the addition of chemical or biochemical substances to the solution, e.g. in order to stabilize the sample and the contained nucleic acids, in particular the circRNAs. Such chemical or biochemical substances include acids, bases, buffers, salts, solvents, reactive dyes, detergents, emulsifiers, or chelators, like EDTA. The sample may for instance be taken and directly mixed with such substances. In a particularly preferred embodiment of the invention the sample is a whole blood sample. The whole blood sample is preferably not pre-treated by means of dilution, filtration, centrifugation, concentration, sedimentation, precipitation or dialysis. It is, however, preferred that substances are added to the sample in order to stabilize the sample until onset of analysis. “Stabilizing” in this context means prevention of degradation of the circRNAs to be determined. Preferred stabilizers in this context are EDTA, e.g. K2EDTA, RNase inhibitors, alcohols, e.g. ethanol and isopropanol, and agents used to salt out proteins (such as RNAlater).


“Whole blood” is a venous, arterial or capillary blood sample in which the concentrations and properties of cellular and extra-cellular constituents remain relatively unaltered when compared with their in vivo state. In some embodiments, anticoagulation in vitro stabilizes the constituents in a whole blood sample.


In a preferred embodiment the sample comprises a nucleic acid or nucleic acids. The term “nucleic acid” is here used in its broadest sense and comprises ribonucleic acids (RNA) and deoxyribonucleic acids (DNA) from all possible sources, in all lengths and configurations, such as double stranded, single stranded, circular, linear or branched. All sub-units and sub-types are also comprised, such as monomeric nucleotides, oligomers, plasmids, viral and bacterial nucleic acids, as well as genomic and non-genomic DNA and RNA from the subject, circular RNA (circRNA), messenger RNA (mRNA) in processed and unprocessed form, transfer RNA (tRNA), heterogeneous nuclear RNA (hn-RNA), ribosomal RNA (rRNA), complementary DNA (cDNA) as well as all other conceivable nucleic acids. However, in the most preferred embodiment the sample comprises circRNAs.


“Presence” or “absence” of a circRNA in connection with the present invention means that the circRNA is present at levels above a certain threshold or below a certain threshold, respectively. In case the threshold is “0” this would mean that “presence” is the actual presence of circRNA in the sample and “absence” is the actual absence. However, “presence” in context with the present invention may also mean that the respective circRNA is present at a level above a threshold, e.g. the levels determined in a control. “absence” in this context then means that the level of the circRNA is at or below the certain threshold. Hence, it is preferred that the method of the present invention comprises determining of the level of one or more circRNA and comparing it to a control level of said one or more circRNA. In a preferred embodiment of the invention the determination step comprises: (i) determining the level of said one or more circRNA; and (ii) comparing the determined level to a control level of said one or more circRNA; wherein differing levels between the determined and the control level are indicative for the disease. In other words, the invention relates to a method for diagnosing a disease of a subject, comprising the step of (i) determining the level of said one or more circRNA; and (ii) comparing the determined level to a control level of said one or more circRNA; wherein differing levels between the determined and the control level are indicative for the disease.


The term “control level” relates to a level to which the determined level is compared in order to allow the distinction between “presence” and “absence” of the circRNA. The control level is preferably the level which is determinant for the deductive step of making the actual diagnosis. Control level in a preferred embodiment relates to the level of the respective circRNA in a healthy subject or a population of healthy subjects, i.e. a subject not having the disease to be diagnosed, e.g. not having a neurodegenerative disease, such as Alzheimer's disease. The skilled person with the disclosure of the present application is in the position to determine suitable control levels using common statistical methods.


In the context of the present invention, the levels of the one or more circRNA may be analyzed in a number of fashions well known to a person skilled in the art. For example, each assay result obtained may be compared to a “normal” or “control” value, or a value indicating a particular disease or outcome. A particular diagnosis/prognosis may depend upon the comparison of each assay result to such a value, which may be referred to as a diagnostic or prognostic “threshold”. In certain embodiments, assays for one or more diagnostic or prognostic indicators are correlated to a condition or disease by merely the presence or absence of the circRNAs in the assay. For example, an assay can be designed so that a positive signal only occurs above a particular threshold level of interest, and below which level the assay provides no signal above background.


The sensitivity and specificity of a diagnostic and/or prognostic test depend on more than just the analytical “quality” of the test; they also depend on the definition of what constitutes an abnormal result, i.e. when a level may be regarded as differing from a control level. In practice, Receiver Operating Characteristic curves (ROC curves) are typically calculated by plotting the value of a variable versus its relative frequency in “normal” (i.e. apparently healthy individuals not having the disease to be diagnosed) and “disease” populations. For any particular marker, a distribution of marker levels for subjects with and without a disease will likely overlap. Under such conditions, a test does not absolutely distinguish normal from disease with 100% accuracy, and the area of overlap indicates where the test cannot distinguish normal from disease. A threshold is selected, below which the test is considered to be abnormal and above which the test is considered to be normal. The area under the ROC curve is a measure of the probability that the perceived measurement will allow correct identification of a condition. ROC curves can be used even when test results don't necessarily give an accurate number. As long as one can rank results, one can create a ROC curve. For example, results of a test on “disease” samples might be ranked according to degree (e.g. 1=low, 2=normal, and 3=high). This ranking can be correlated to results in the “normal” or “control” population, and a ROC curve created. These methods are well known in the art. See, e.g., Hanley et al. 1982. Radiology 143: 29-36. Preferably, a threshold is selected to provide a ROC curve area of greater than about 0.5, more preferably greater than about 0.7, still more preferably greater than about 0.8, even more preferably greater than about 0.85, and most preferably greater than about 0.9. The term “about” in this context refers to +/−5% of a given measurement.


The horizontal axis of the ROC curve represents (1-specificity), which increases with the rate of false positives. The vertical axis of the curve represents sensitivity, which increases with the rate of true positives. Thus, for a particular cut-off selected, the value of (1-specificity) may be determined, and a corresponding sensitivity may be obtained. The area under the ROC curve is a measure of the probability that the measured marker level will allow correct identification of a disease or condition. Thus, the area under the ROC curve can be used to determine the effectiveness of the test.
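
By way of illustration only, the construction of a ROC curve and of its area can be sketched as follows. The sketch assumes that higher circRNA levels indicate disease (for a marker that is decreased in disease, the levels would be negated first); the function names `roc_points` and `auc` and the example values are hypothetical and not taken from the examples of the present application.

```python
def roc_points(levels, has_disease):
    """Return (1 - specificity, sensitivity) pairs, one per cut-off.

    Assumption: a sample is called positive when its level is at or
    above the cut-off. Both groups must contain at least one sample.
    """
    positives = sum(1 for d in has_disease if d)
    negatives = len(has_disease) - positives
    points = [(0.0, 0.0)]
    # Lower the cut-off step by step through every observed level.
    for cutoff in sorted(set(levels), reverse=True):
        tp = sum(1 for x, d in zip(levels, has_disease) if d and x >= cutoff)
        fp = sum(1 for x, d in zip(levels, has_disease) if not d and x >= cutoff)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical marker levels for three healthy and three diseased subjects.
example_levels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
example_status = [False, False, True, False, True, True]
example_auc = auc(roc_points(example_levels, example_status))
```

Each point of the returned list corresponds to one cut-off, so the sensitivity and (1-specificity) obtained for any particular cut-off can be read directly from it.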


In other embodiments, a positive likelihood ratio, negative likelihood ratio, odds ratio, or hazard ratio is used as a measure of a test's ability to predict risk or diagnose a disease. In the case of a positive likelihood ratio, a value of 1 indicates that a positive result is equally likely among subjects in both the “diseased” and “control” groups; a value greater than 1 indicates that a positive result is more likely in the diseased group; and a value less than 1 indicates that a positive result is more likely in the control group. In the case of a negative likelihood ratio, a value of 1 indicates that a negative result is equally likely among subjects in both the “diseased” and “control” groups; a value greater than 1 indicates that a negative result is more likely in the diseased group; and a value less than 1 indicates that a negative result is more likely in the control group.
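
As a non-limiting sketch, the positive and negative likelihood ratios may be derived from the counts of true and false test results. The standard definitions (sensitivity divided by one minus specificity, and one minus sensitivity divided by specificity) are used here; the function name `likelihood_ratios` and the counts in the usage line are hypothetical.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (positive LR, negative LR) from a 2x2 table of test results.

    tp/fn: diseased subjects testing positive/negative.
    fp/tn: control subjects testing positive/negative.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # A perfectly specific test has an unbounded positive likelihood ratio.
    lr_pos = float("inf") if specificity == 1.0 else sensitivity / (1.0 - specificity)
    # A test with zero specificity has an unbounded negative likelihood ratio.
    lr_neg = float("inf") if specificity == 0.0 else (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical counts: 50 diseased subjects and 50 controls.
lr_pos, lr_neg = likelihood_ratios(tp=45, fp=10, fn=5, tn=40)
```

A positive likelihood ratio above 1 together with a negative likelihood ratio below 1, as in this hypothetical case, corresponds to a test whose positive results are more likely in the diseased group and whose negative results are more likely in the control group.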


In the case of an odds ratio, a value of 1 indicates that a positive result is equally likely among subjects in both the “diseased” and “control” groups; a value greater than 1 indicates that a positive result is more likely in the diseased group; and a value less than 1 indicates that a positive result is more likely in the control group.


In the case of a hazard ratio, a value of 1 indicates that the relative risk of an endpoint (e.g., death) is equal in both the “diseased” and “control” groups; a value greater than 1 indicates that the risk is greater in the diseased group; and a value less than 1 indicates that the risk is greater in the control group.


The skilled artisan will understand that associating a diagnostic or prognostic indicator, with a diagnosis or with a prognostic risk of a future clinical outcome is a statistical analysis. For example, a marker level of lower than X may signal that a patient is more likely to suffer from an adverse outcome than patients with a level more than or equal to X, as determined by a level of statistical significance. For another marker, a marker level of higher than X may signal that a patient is more likely to suffer from an adverse outcome than patients with a level less than or equal to X, as determined by a level of statistical significance. Additionally, a change in marker concentration from baseline levels may be reflective of patient prognosis, and the degree of change in marker level may be related to the severity of adverse events. Statistical significance is often determined by comparing two or more populations, and determining a confidence interval and/or a p value. See, e.g., Dowdy and Wearden, Statistics for Research, John Wiley & Sons, New York, 1983. Preferred confidence intervals of the invention are 90%, 95%, 97.5%, 98%, 99%, 99.5%, 99.9% and 99.99%, while preferred p values are 0.1, 0.05, 0.025, 0.02, 0.01, 0.005, 0.001, and 0.0001.


Suitable threshold levels for the diagnosis of the disease can be determined for certain combinations of circRNAs. This can e.g. be done by grouping a reference population of patients according to their level of circRNAs into certain quantiles, e.g. quartiles, quintiles or even according to suitable percentiles. For each of the quantiles or groups above and below certain percentiles, hazard ratios can be calculated comparing the risk for an adverse outcome, i.e. a “disease” or “Alzheimer's disease”, between those patients who have a certain disease and those who have not. In such a scenario, a hazard ratio (HR) above 1 indicates a higher risk for an adverse outcome for the patients. A HR below 1 indicates beneficial effects of a certain treatment in the group of patients. A HR around 1 (e.g. +/−0.1) indicates no elevated risk for the particular group of patients. By comparison of the HR between certain quantiles of patients with each other and with the HR of the overall population of patients, it is possible to identify those quantiles of patients who have an elevated risk and those who benefit from medication and thereby stratify subjects according to the present invention.


In some cases presence of the disease will not affect patients with levels (e.g. in the fifth quintile) of a circRNA different from the “control level”, while in other cases patients with levels similar to the control level will be affected (e.g. in the first quintile). However, with the above explanations and his common knowledge, a skilled person is able to identify those groups of patients having a disease, e.g. a neurodegenerative disease such as Alzheimer's disease. Exemplarily, some combinations of levels of circRNAs are listed for Alzheimer's disease in the appended examples. In another embodiment of the invention, the diagnosis is determined by relating the patient's individual level of the circRNA to certain percentiles (e.g. the 97.5th percentile (in case increased levels are indicative for a disease) or the 2.5th percentile (in case decreased levels are indicative for a disease)) of a healthy population.
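
The percentile-based comparison described above may, purely as an illustrative sketch, be implemented as follows. The linear-interpolation percentile used here is one of several common conventions and is an assumption, as are the function names `percentile` and `flag_against_healthy` and the reference values.

```python
def percentile(values, q):
    """Percentile (q between 0 and 100) with linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("at least one reference value is required")
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def flag_against_healthy(level, healthy_levels, low_q=2.5, high_q=97.5):
    """Relate one patient's level to percentiles of a healthy population."""
    if level > percentile(healthy_levels, high_q):
        return "increased"
    if level < percentile(healthy_levels, low_q):
        return "decreased"
    return "within reference range"


# Hypothetical reference population with levels 1, 2, ..., 41.
healthy_reference = [float(v) for v in range(1, 42)]
```

With such a function, a level above the 97.5th percentile of the healthy population is flagged as increased and a level below the 2.5th percentile as decreased; which of the two is indicative for the disease depends on the circRNA in question.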


Kaplan-Meier estimators may be used for the assessment or prediction of the outcome or risk (e.g. diagnosis, relapse, progression or morbidity) of a patient.


“Equal” level in context with the present invention means that the levels differ by not more than ±10%, preferably by not more than ±5%, more preferably by not more than ±2%. “Decreased” or “increased” level in the context of the present invention means that the levels differ by more than 10%, preferably by more than 15%, preferably more than 20%.
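
The above definitions can be illustrated by a minimal sketch that classifies a determined level relative to a control level. The 10% tolerance corresponds to the broadest definition given above; the function name `compare_to_control` is hypothetical, and the difference is assumed to be expressed relative to the control level.

```python
def compare_to_control(level, control_level, tolerance=0.10):
    """Classify a level as 'equal', 'increased' or 'decreased'.

    Assumption: the difference is expressed relative to the control level.
    """
    if control_level <= 0:
        raise ValueError("control level must be positive")
    relative_change = (level - control_level) / control_level
    if relative_change > tolerance:
        return "increased"
    if relative_change < -tolerance:
        return "decreased"
    return "equal"
```

Stricter embodiments may be obtained by passing a smaller tolerance for “equal” levels, e.g. 0.05 or 0.02, or a larger one for “increased” and “decreased” levels, e.g. 0.15 or 0.20.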


The term “subject” relates to a subject to be diagnosed, preferably a subject suspected to have or to have a risk for acquiring a disease, preferably a neurodegenerative disease, more preferably a subject suspected to have or to have a risk for acquiring Alzheimer's disease. The subject is preferably an animal, more preferably a mammal, most preferably a human.


The inventors have found that differential abundance of circRNA in samples of a bodily fluid is suited as a biomarker. It has been found that differing levels of circRNAs correlate with a disease. This has been proven for Alzheimer's disease, a disease of neuronal tissue. Without being bound by theory, the correlation may be due to a passage of the circRNAs through the blood-brain-barrier, i.e. the circRNAs detected in the method according to the present invention are differentially expressed in the tissue of the disease, i.e. the tissue of interest. In case of the neurodegenerative disease (e.g. Alzheimer's disease) the tissue of interest is neuronal tissue, e.g. the brain. Hence, in one embodiment of the present invention said one or more circRNA, the differing levels of which in the bodily fluid are attributed to the presence of the disease, is differentially expressed between the diseased and non-diseased state in the tissue of interest. Alternatively, the circRNA levels may be of secondary nature, e.g. arise due to an immune response in the bodily fluid. Hence, in a further embodiment said one or more circRNA, the differing levels of which in the bodily fluid are attributed to the presence of the disease, is not differentially expressed between the diseased and non-diseased state in the tissue of interest.


“Circular RNA” (circRNA) has been previously described, however not in connection with its detection in a bodily fluid, e.g. blood. The skilled person is able to determine whether a detected RNA is a circular RNA. In particular, a circRNA does not contain a free 3′-end or a free 5′-end, i.e. the entire nucleic acid is circularized. The circRNA is preferably a circularized, single stranded RNA molecule. Furthermore, the circRNA according to the present invention is a result of a head-to-tail splicing event that results in a discontinuous sequence with respect to the genomic sequence encoding the RNA. This means that, where a first sequence is present 5′-upstream of a second sequence in the genomic context, on the circRNA said first sequence is linked at its 5′-end to the 3′-end of said second sequence, thereby closing the circle. The consequence of this arrangement is that at the junction where the 5′-end of said first sequence is linked to the 3′-end of said second sequence a unique sequence is built that is present neither in the genomic context nor in the normally transcribed RNA, e.g. mRNA. It has been found by the inventors that these junctions in all identified circRNAs, in the genomic context, are flanked by the canonical splice sequence, the GT/AG splice signal known by the skilled person. Hence, the circRNAs according to the present invention preferably contain an exon-exon junction in a head-to-tail arrangement, as visualized in FIG. 1A as a result of a back-splicing reaction. The skilled person will recognize that a usual mRNA transcript contains exon-exon junctions in a tail-to-head arrangement, i.e. the 3′ end (tail) of the exon being upstream in the genomic context is linked to the 5′ end (head) of the exon being downstream in the genomic context. The actual junction, i.e. the point at which the one exon is linked to the other, is also referred to herein as “breakpoint”.
In a preferred embodiment the presence or absence of a circRNA or the level of a circRNA is determined by detection of an exon-exon-junction in a head-to-tail arrangement. One possible approach is exemplified in the enclosed examples. The detection of circular RNA has been previously described in Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338, which is incorporated herein by reference in particular as relates to the detection and annotation of circRNAs. The biogenesis of many mammalian circRNAs depends on complementary sequences within flanking introns (see Ashwal-Fluss R, Meyer M, Pamudurti N R, et al. circRNA Biogenesis Competes with Pre-mRNA Splicing. MOLCEL. 2014; 1-12; Rybak-Wolf A, Stottmeister C, Glažar P, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. MOLCEL. 2015; 1-17; Zhang X-O, Wang H-B, Zhang Y, et al. Complementary Sequence-Mediated Exon Circularization. Cell. 2014; Liang D, Wilusz J E. Short intronic repeat sequences facilitate circular RNA production. Genes and Development. 2014; Conn S J, Pillman K A, Toubia J, et al. The RNA Binding Protein Quaking Regulates Formation of circRNAs. Cell. 2015; 160(6):1125-1134; and Ivanov A, Memczak S, Wyler E, et al. Analysis of Intron Sequences Reveals Hallmarks of Circular RNA Biogenesis in Animals. CellReports. 2015; 10(2):170-177). Hence, in one embodiment the two introns upstream and downstream of and direct adjacent in the genomic context to the exons of the exon-exon junction (i.e. forming the exon-exon junction) in a head to tail arrangement often contain complementary sequences, e.g. a complementary sequence stretch of at least 15 nucleotides, preferably 500 nucleotides, more preferably 1000 nucleotides. For detection of circRNA in principle, the RNA of a sample is sequenced after reverse transcription and library preparation. 
Afterwards, the sequences are analyzed for the presence of exon-exon junctions in a head-to-tail arrangement. For instance, the sequenced RNA can be mapped to a reference genome using common mapping programs and software, e.g. bowtie2 (version 2.1.0; see Langmead B, Salzberg S L. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012; 9(4):357-359). Human reference genomes are known to the skilled person and include the human reference genome hg19 (February 2009, GRCh37; downloadable from the UCSC genome browser; see Kent W J, Sugnet C W, Furey T S, et al. The human genome browser at UCSC. Genome Research. 2002; 12(6):996-1006). Although circRNA detection in blood is possible without any preprocessing of the total RNA sample, it is preferred to deplete ribosomal RNAs (rRNA), preferably the majority of rRNA, to increase the sensitivity of circRNA detection, in particular when using RNA sequencing approaches. To this end, the content of rRNA in the sample should be depleted to less than 20%, preferably less than 10%, more preferably less than 2% with respect to the total RNA content. The rRNA depletion may be performed as known in the art, e.g. it may be facilitated by commercially available kits (e.g. Ribominus, Thermo Scientific) or enzymatic methods (Xian Adiconis et al. Comprehensive comparative analysis of RNA sequencing methods for degraded or low input samples. Nat Methods. 2013 July; 10(7): 10.1038/nmeth.2483.).


Further, in a preferred embodiment RNA sequences which map continuously to the genome by aligning without any trimming (end-to-end mode) are neglected. Reads not mapping continuously to the genome are preferably used for circRNA candidate detection. The terminal sequences (anchors) from the sequences, e.g. 20 nt or more, may be extracted and re-aligned independently to the genome. From this alignment the sequences may be extended until the full circRNA sequence is covered, i.e. aligned. Consecutively aligning anchors indicate linear splicing events whereas alignment in reverse orientation indicates head-to-tail splicing as observed in circRNAs (see FIG. 1A). The splicing events so identified are filtered using the following criteria: 1) a GT/AG signal flanking the splice sites in the genomic context; 2) the breakpoint, i.e. the exon-exon-junction, can be unambiguously detected; and 3) no more than 100 kilobases distance between the two splice sites in the genomic context. Further optional criteria may be used, depending on the method chosen, e.g. a maximum of two mismatches when extending the anchor alignments; a breakpoint no more than two nucleotides inside the alignment of the anchors; at least two independent reads supporting the head-to-tail splice junction; and/or a minimum difference of 35 in the bowtie2 alignment score between the first and the second best alignment of each anchor.
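
The anchor-based distinction between linear and head-to-tail splicing and the subsequent filtering can be sketched, in simplified form, as follows. This is not the pipeline used in the examples: the record types `AnchorPair` and `Candidate`, their fields, and the function names are hypothetical, and it is assumed that both anchors of a read have already been aligned to the same chromosome and strand by an external aligner.

```python
from dataclasses import dataclass


@dataclass
class AnchorPair:
    chrom: str
    strand: str       # "+" or "-"
    head_start: int   # genomic start of the anchor taken from the read's 5' end
    tail_start: int   # genomic start of the anchor taken from the read's 3' end


@dataclass
class Candidate:
    donor_dinucleotide: str      # genomic dinucleotide following the donor exon
    acceptor_dinucleotide: str   # genomic dinucleotide preceding the acceptor exon
    span: int                    # distance between the two splice sites in nt
    supporting_reads: int        # independent reads covering the junction
    breakpoint_unambiguous: bool


def splice_type(pair):
    """Anchors in transcript order indicate linear splicing, reverse order back-splicing."""
    if pair.strand == "+":
        in_order = pair.head_start < pair.tail_start
    else:
        in_order = pair.head_start > pair.tail_start
    return "linear" if in_order else "head_to_tail"


def passes_filters(candidate, max_span=100_000, min_reads=2):
    """Apply criteria 1) to 3) plus the optional minimum read support."""
    return (
        candidate.donor_dinucleotide == "GT"
        and candidate.acceptor_dinucleotide == "AG"
        and candidate.breakpoint_unambiguous
        and candidate.span <= max_span
        and candidate.supporting_reads >= min_reads
    )
```

The remaining optional criteria (mismatch count, breakpoint position within the anchors, alignment score difference) depend on aligner output and are therefore omitted from this sketch.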


The circRNAs according to the present invention may be detected using different techniques. As outlined herein, the exon-exon junction in a head-to-tail arrangement is unique to the circRNAs. Hence, the detection of these is preferred. Nucleic acid detection methods are commonly known to the skilled person and include probe hybridization based methods, nucleic acid amplification based methods, and nucleic acid sequencing, or combinations thereof. Hence, in a preferred embodiment of the present invention circRNA is detected using a method selected from the group consisting of probe hybridization based methods, nucleic acid amplification based methods, and nucleic acid sequencing.


Probe hybridization based methods employ the feature of nucleic acids to specifically hybridize to a complementary strand. To this end nucleic acid probes may be employed that specifically hybridize to the exon-exon junction in a head-to-tail arrangement of the circRNA, i.e. to a sequence spanning the exon-exon junction, preferably to the region extending from 10 nt upstream to 10 nt downstream of the exon-exon junction, preferably to the region from 20 nt upstream to 20 nt downstream of the exon-exon junction, or even a greater region spanning the exon-exon junction. The skilled person will recognize that hybridization probes specifically hybridizing to the respective sequence of the circRNA may be used, as well as hybridization probes specifically hybridizing to the reverse complement sequence thereof, e.g. in case the circRNA is previously reverse transcribed to cDNA and/or amplified.
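
As a simplified illustration, the target region spanning a head-to-tail junction and its reverse complement may be derived from the exon sequences as follows. The sketch assumes DNA-alphabet exon sequences given in transcript orientation; the function names and the short example sequences are hypothetical and do not correspond to any circRNA of the examples.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def junction_sequence(upstream_exon, downstream_exon, flank=20):
    """Sequence spanning the head-to-tail junction.

    In the circRNA the 3' end (tail) of the exon lying downstream in the
    genomic context is joined to the 5' end (head) of the upstream exon.
    """
    if len(upstream_exon) < flank or len(downstream_exon) < flank:
        raise ValueError("exons must be at least as long as the flank")
    return downstream_exon[-flank:] + upstream_exon[:flank]


def reverse_complement(sequence):
    """Reverse complement of a DNA-alphabet sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


# Hypothetical exons and a 4 nt flank, chosen only to keep the example short.
target = junction_sequence("ACGTACGTAA", "GGCCTTAGCA", flank=4)
probe = reverse_complement(target)
```

Here `target` corresponds to the junction-spanning sequence and `probe` to a sequence complementary to it; which of the two is used as hybridization probe depends on whether the circRNA itself or a cDNA or amplification product thereof is detected.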


Hybridization can also be used as a measure of homology between two nucleic acid sequences. A nucleic acid sequence hybridizing specifically to an exon-exon junction in a head-to-tail arrangement according to the present invention may be used as a hybridization probe according to standard hybridization techniques. The hybridization of the probe to DNA or RNA from a test source (e.g., the bodily fluid, like whole blood, or amplified nucleic acids from the sample of the bodily fluid) is an indication of the presence of the relevant circRNA in the test source.


Hybridization conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., 6.3.1-6.3.6, 1991. Preferably, specific hybridization refers to hybridization under stringent conditions. “Stringent conditions” are defined as equivalent to hybridization in 6× sodium chloride/sodium citrate (SSC) at 45° C., followed by a wash in 0.2×SSC, 0.1% SDS at 65° C.; or as equivalent to hybridization in commercially available hybridization buffers (e.g. ULTRAHyb, ThermoScientific) for blotting techniques and 5×SSC 0.5% SDS (750 mM NaCl, 75 mM sodium citrate, 0.5% sodiumdodecylsulfate, pH 7.0) for array based detection methods at 65° C.


The means and methods of the present invention preferably comprise the use of nucleic acid probes. A nucleic acid probe according to the present invention is an oligonucleotide, nucleic acid or a fragment thereof, which is substantially complementary to a specific nucleic acid sequence. “Substantially complementary” refers to the ability to hybridize to the specific nucleic acid sequence under stringent conditions.


The skilled person knows means and methods to determine the levels of nucleic acids in a sample and compare them to control levels. Such methods may employ labeled nucleic acid probes according to the invention. “Labels” include fluorescent or enzymatically active labels as further defined herein below. Such methods include real-time PCR methods and microarray methods, like Affymetrix®, NanoString and the like.


The circRNAs or their level may also be determined using sequencing techniques. The skilled person is able to use sequencing techniques in connection with the present invention. Sequencing techniques include but are not limited to Maxam-Gilbert sequencing, Sanger sequencing (chain-termination method using ddNTPs), and next generation sequencing methods, like massively parallel signature sequencing (MPSS), polony sequencing, 454 pyrosequencing, Illumina (Solexa) sequencing, SOLiD sequencing, ion torrent semiconductor sequencing or single molecule, real-time technology sequencing (SMRT).


The detection/determination of the circRNAs and the respective level may also employ nucleic acid amplification method alone or in combination with the sequencing and/or hybridization method. Nucleic acid amplification may be used to amplify the sequence of interest prior to detection. It may however also be used for quantifying a nucleic acid, e.g. by real-time PCR methods. Such methods are commonly known to the skilled person. Nucleic acid amplification methods for example include rolling circle amplification (such as in Liu, et al., “Rolling circle DNA synthesis: Small circular oligonucleotides as efficient templates for DNA polymerases,” J. Am. Chem. Soc. 118:1587-1594 (1996).), isothermal amplification (such as in Walker, et al., “Strand displacement amplification—an isothermal, in vitro DNA amplification technique,” Nucleic Acids Res. 20(7):1691-6 (1992)), ligase chain reaction (such as in Landegren, et al., “A Ligase-Mediated Gene Detection Technique,” Science 241:1077-1080, 1988, or, in Wiedmann, et al., “Ligase Chain Reaction (LCR)—Overview and Applications,” PCR Methods and Applications (Cold Spring Harbor Laboratory Press, Cold Spring Harbor Laboratory, N Y, 1994) pp. S51-S64.)). Nucleic-acid amplification can be accomplished by any of the various nucleic-acid amplification methods known in the art, including but not limited to the polymerase chain reaction (PCR), ligase chain reaction (LCR), transcription-based amplification system (TAS), nucleic acid sequence based amplification (NASBA), rolling circle amplification (RCA), transcription-mediated amplification (TMA), self-sustaining sequence replication (3SR) and Qβ amplification. It will be readily understood that the amplification of the circRNA may start with a reverse transcription of the RNA into complementary DNA (cDNA), optionally followed by amplification of the so produced cDNA.


It may be desirable to reduce or diminish non-circRNA prior to the determination of the presence or level of the circRNAs. To this end RNA degrading agents may be added to the sample and/or the isolated total nucleic acids, e.g. total RNA, thereof, wherein said RNA degrading agent does not degrade circRNAs or does degrade circRNAs only at lower rates as compared to linear RNAs. One such agent is RNase R. RNase R is a 3′-5′ exoribonuclease closely related to RNase II, which has been shown to be involved in selective mRNA degradation, particularly of non stop mRNAs in bacteria (see Cheng; Deutscher, M P et al. (2005). “An important role for RNase R in mRNA decay”. Molecular Cell 17(2):313-318; Venkataraman, K; Guja, K E; Garcia-Diaz, M; Karzai, A W (2014). “Non-stop mRNA decay: a special attribute of trans-translation mediated ribosome rescue.”; Frontiers in microbiology 5:93; and Suzuki H, Zuo Y, Wang J, Zhang M Q, Malhotra A, Mayeda A; Characterization of RNase R-digested cellular RNA source that consists of lariat and circular RNAs from pre-mRNA splicing; Nucleic Acids Res. 2006 May 8; 34(8):e63.). RNase R has homologues in many other organisms. When a part of another larger protein has a domain that is very similar to RNase R, this is called an RNase R domain. Hence, in a preferred embodiment the sample is treated with RNase R before determination of the circRNA to deplete linear RNA isoforms from the total RNA preparation and thereby increase detection sensitivity.


As outlined herein, the diagnostic or prognostic value of a single circRNA may not be sufficient in order to allow a diagnosis or prognosis with a reliable result. In such case it may be desirable to determine the presence or level of more than one circRNA in the sample and optionally comparing them to the respective control level. The skilled person will acknowledge that these more than one circRNAs may be chosen from a predetermined panel of circRNAs. Such panel usually includes the minimum number of circRNAs necessary to allow a reliable diagnosis or prognosis. The number of circRNAs of the panel may vary depending on the desired reliability and/or the prognostic or diagnostic value of the included circRNAs, e.g. when determined alone. Hence, the method according to the present invention in a preferred embodiment determines more than one circRNA from a panel of circRNAs, e.g. their presence or absence, or level, respectively.


The panel for obtaining the desired reliability may be chosen according to the needs. In particular the skilled person may apply statistical approaches as outlined herein in order to validate the diagnostic and/or prognostic significance of a certain panel. The inventors have herein shown for a neurodegenerative disease the development of a certain panel of circRNAs giving a reasonable degree of certainty. The skilled person may apply common statistical techniques in order to develop a panel of circRNAs. Such statistical techniques include cluster analysis (e.g. hierarchical or k-means clustering), principal component analysis or factor analysis.


In principle, the statistical methods aim at the identification of circRNAs or panels of circRNAs that exhibit differing presence and/or levels in samples of diseased and healthy/normal subjects. As outlined, the panel is preferably a panel of more than one circRNA, i.e. a plurality. In a preferred embodiment of the invention said panel comprises a plurality of circRNAs that have been identified as being present at differing levels in bodily fluid samples of patients having the disease and patients not having the disease. The panel of circRNAs has been preferably identified by principal component analysis or clustering.


The “principal component analysis” (PCA) (as also exemplified herein) concerns the analysis of factors differing between diseased and healthy subjects. PCA is known to the skilled person (see Pearson K., “On lines and planes of closest fit to systems of points in space”, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2.11 559-572 (1901), and Hotelling H., “Analysis of a complex of statistical variables into principal components” Journal of educational psychology 24.6 417 (1933)). The circRNAs to be chosen for the principal component analysis may be those previously determined in samples of healthy and/or diseased subjects. Thresholds may be applied in order to consider a circRNA for further analysis; in a preferred embodiment, only circRNAs having an expression value of at least 6.7 after variance stabilizing transformation of raw read counts in one of the samples are considered. PCA may be performed on circRNAs included in the analysis using the prcomp function of the standard package “stats” of the “R” programming language (R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0). Depending on the circRNAs chosen, the disease and other factors, the weights can vary. However, the skilled artisan will acknowledge that those circRNAs with the highest absolute weight as regards the principal component of interest, i.e. disease/healthy state, shall be chosen in order to obtain the circRNAs with the highest predictive value. PCA is used to visualize and measure the amount of variation in a data set. Mathematically, PCA is an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data lies on the first coordinate, which is called Principal Component 1 (PC1), the second greatest variance on the second coordinate, and so on. The aforementioned calculated weight represents the distance of each circular RNA to this specific projection. Thus, the higher the absolute value of the weight, the more relevant the circRNA is for this projection.
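The weighting described above can be illustrated with a minimal sketch. It is written in plain Python rather than with the R prcomp function named in the text, it approximates only PC1 by power iteration on the covariance matrix, and the function names, circRNA names and expression values are hypothetical placeholders, not data from the application.

```python
import math

def pc1_weights(samples, iterations=200):
    """Return the PC1 weights (loadings), one per circRNA column.

    `samples` is a list of rows; each row holds the transformed
    expression values of all circRNAs in one sample.
    """
    n, p = len(samples), len(samples[0])
    # Center each circRNA on its mean (prcomp also centers by default).
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix between circRNAs.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to the direction of greatest variance (PC1).
    vec = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

def rank_by_weight(names, weights):
    """Sort circRNA names by decreasing absolute PC1 weight."""
    pairs = sorted(zip(names, weights), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in pairs]

# Hypothetical values: two "diseased" samples followed by two "healthy" ones.
names = ["circ_A", "circ_B", "circ_C"]
samples = [
    [10.0, 7.0, 8.0],
    [10.2, 6.9, 8.1],
    [7.0, 7.5, 8.0],
    [6.8, 7.6, 8.1],
]
weights = pc1_weights(samples)
ranking = rank_by_weight(names, weights)
print(ranking)
```

In this toy data set the hypothetical circ_A separates the two groups most strongly and therefore carries the largest absolute weight on PC1. The sign of a weight is arbitrary, which is why the ranking uses absolute values.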


“Hierarchical Clustering” (also referred to herein as “clustering”) may be performed as known in the art (reviewed in Murtagh, F and Contreras, P “Methods of hierarchical clustering” arXiv preprint arXiv:1105.0121 (2011)). Samples may be clustered on log2 transformed normalized circRNA expression profiles (log2(ni+1)). Hierarchical, agglomerative clustering may be performed with complete linkage and optionally by further using Spearman's rank correlation as distance metric (1−{corr [log(ni+1)]}). “The goal of cluster analysis is to partition observations (here circRNA expression) into groups (“clusters”) so that the pairwise dissimilarities between those assigned to the same cluster tend to be smaller than those in different clusters” (see Friedman J, Hastie T, and Tibshirani R, “The elements of statistical learning”, Vol. 1. Springer, Berlin: Springer series in statistics (2001)). Here, the measure of dissimilarity is based on Spearman's rank correlation, i.e. one minus the correlation as given above. A visualization and complete description of the hierarchical clustering is provided by a dendrogram.
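The clustering described above can be sketched as follows. This is a minimal illustration in plain Python, assuming a small dictionary of normalized counts per sample; the sample names, counts and function names are hypothetical, and the returned merge list stands in for a dendrogram.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def complete_linkage(profiles):
    """Agglomerative clustering of samples; returns the list of merges."""
    logged = {s: [math.log2(n + 1) for n in counts] for s, counts in profiles.items()}
    names = list(logged)
    # Dissimilarity between two samples: 1 - Spearman correlation.
    dist = {(a, b): 1 - spearman(logged[a], logged[b]) for a in names for b in names}
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance of the two most dissimilar members.
                d = max(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical normalized counts of five circRNAs in four samples.
profiles = {
    "AD1": [50, 5, 30, 2, 10],
    "AD2": [60, 4, 35, 1, 12],
    "CTRL1": [3, 40, 8, 25, 10],
    "CTRL2": [2, 45, 9, 30, 11],
}
merges = complete_linkage(profiles)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

With these placeholder counts the two hypothetical patient samples merge first, the two control samples second, and the two groups are joined last at a large dissimilarity.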


The inventors have exemplified the method outlined above for a neurodegenerative disease, in particular Alzheimer's disease. A neurodegenerative disease in context with the present invention is to be understood as a disease associated with neurodegeneration. Neurodegeneration means a progressive loss of structure or function of neurons, including death of neurons. Many neurodegenerative diseases, including ALS, Parkinson's, Alzheimer's, and Huntington's, occur as a result of neurodegenerative processes. Many similarities are now known that relate these diseases to one another on a sub-cellular level. There are many parallels between different neurodegenerative disorders, including atypical protein assemblies (protein misfolding and/or agglomeration) as well as induced cell death. Neurodegeneration can be found at many different levels of neuronal circuitry, ranging from molecular to systemic. Hence, in a preferred embodiment of the present invention the disease is a neurodegenerative disease, preferably selected from the group of Alzheimer's, ALS, Parkinson's, and Huntington's.


In a particularly preferred embodiment the disease is Alzheimer's disease. Alzheimer's disease has been identified as a protein misfolding disease (proteopathy), causing plaque accumulation of abnormally folded amyloid beta protein, and tau protein in the brain. Plaques are made up of small peptides, 39-43 amino acids in length, called amyloid beta (Aβ). Aβ is a fragment from the larger amyloid precursor protein (APP). APP is a transmembrane protein that penetrates through the neuron's membrane. APP is critical to neuron growth, survival, and post-injury repair. In Alzheimer's disease, an unknown enzyme in a proteolytic process causes APP to be divided into smaller fragments. One of these fragments gives rise to fibrils of amyloid beta, which then form clumps that deposit outside neurons in dense formations known as senile plaques. AD is also considered a tauopathy due to abnormal aggregation of the tau protein. In AD, tau undergoes chemical changes, becoming hyperphosphorylated; it then begins to pair with other threads, creating neurofibrillary tangles and disintegrating the neuron's transport system. A patient is classified as having Alzheimer's disease according to the criteria as set by the National Institute of Neurological and Communicative Disorders and Stroke (NINCDS) and the Alzheimer's Disease and Related Disorders Association (ADRDA, now known as the Alzheimer's Association), the NINCDS-ADRDA Alzheimer's Criteria for diagnosis in 1984, extensively updated in 2007 (see McKhann G, Drachman D, Folstein M, et al. Clinical Diagnosis of Alzheimer's disease: Report of the NINCDS-ADRDA Work Group under the Auspices of Department of Health and Human Services Task Force on Alzheimer's disease. Neurology. 1984; 34(7):939-44; and Dubois B, Feldman H H, Jacova C, et al. Research Criteria for the Diagnosis of Alzheimer's disease: Revising the NINCDS-ADRDA Criteria. Lancet Neurology. 2007; 6(8):734-46).
These criteria require that the presence of cognitive impairment, and a suspected dementia syndrome, be confirmed by neuropsychological testing for a clinical diagnosis of possible or probable Alzheimer's disease. A histopathologic confirmation including a microscopic examination of brain tissue is required for a definitive diagnosis. Good statistical reliability and validity have been shown between the diagnostic criteria and definitive histopathological confirmation (see Blacker D, Albert M S, Bassett S S, et al. Reliability and validity of NINCDS-ADRDA criteria for Alzheimer's disease. The National Institute of Mental Health Genetics Initiative. Archives of Neurology. 1994; 51(12):1198-204). Eight cognitive domains are most commonly impaired in AD: memory, language, perceptual skills, attention, constructive abilities, orientation, problem solving and functional abilities. These domains are equivalent to the NINCDS-ADRDA Alzheimer's Criteria as listed in the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR) published by the American Psychiatric Association.


In a particularly preferred embodiment, the present invention relates to a method for diagnosing a neurodegenerative disease in a subject, comprising the steps of:

    • determining the level of one or more circRNA in a sample of a bodily fluid of said subject;
    • comparing the determined level to a control level of said one or more circRNA;


wherein differing levels between the determined and the control level are indicative for the disease. The neurodegenerative disease is most preferably Alzheimer's disease.


The inventors have identified specific circRNAs that have a predictive or diagnostic value as regards the neurodegenerative disease. In particular, 910 highly expressed circRNAs have been identified that are differentially present in samples of patients with a neurodegenerative disease as compared to healthy controls. These 910 circRNAs are particularly characterized by their exon-exon junction in a head-to-tail arrangement, as outlined herein above. The sequences encoding the 20 nucleotides upstream and 20 nucleotides downstream of said exon-exon junction in the respective circRNAs are given in SEQ ID NOs: 1 to 910. However, it may be sufficient to determine only 10 nucleotides upstream and 10 nucleotides downstream of the junction in order to detect the circRNAs specifically. Hence, in a preferred embodiment said one or more circRNA in the method for diagnosing the neurodegenerative disease comprises a sequence encoded by a sequence selected from the group consisting of nucleotides 11 to 30 of any of the sequences of SEQ ID NO:1 to SEQ ID NO:910. The circRNA may for instance be detected through determining the presence or levels of RNA comprising the respective sequences, e.g. by hybridization, sequencing and/or amplification methods as outlined herein. SEQ ID NO:1 to SEQ ID NO:1820 list the DNA sequences encoding the sequences of the exon-exon junctions or the complete sequences of the circRNAs of the present invention. “Encoded” in this regard means that the RNA encoded by the DNA sequence has the sequence of nucleotides as set out in the DNA sequence with the thymidines “T” being exchanged by uracils “U”, the backbone being ribonucleic acid instead of deoxyribonucleic acid.


The inventors found that the circRNAs are indicative for the presence or the risk of acquiring a neurodegenerative disease when present at increased or decreased levels. Whether the presence of the specific circRNA at decreased or increased levels is indicative for the neurodegenerative disease is given in Table 1. Hence, in a particularly preferred embodiment the presence of increased or decreased levels as defined in Table 1 under “diseased” for the circRNA comprising the respectively encoded sequence is indicative for the presence of or risk of acquiring a neurodegenerative disease, preferably for Alzheimer's disease. As outlined in the Table's legend, “+” denotes that increased levels and/or the presence of the respective circRNA are indicative for the presence or risk of acquiring Alzheimer's disease, while “−” denotes that decreased levels and/or the absence of the respective circRNA are indicative for the presence or risk of acquiring Alzheimer's disease.
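The reading of the “diseased” column can be illustrated with a minimal sketch. It assumes a plain greater-than or less-than comparison between the determined level and the control level, without any statistical test or threshold; the function name and the numeric levels are hypothetical.

```python
def consistent_with_disease(direction, measured_level, control_level):
    """Check whether a measured level deviates from control in the listed direction.

    direction: "+" (increased levels indicative) or "-"/"−" (decreased levels
    indicative), following the legend of Table 1.
    """
    if direction == "+":
        return measured_level > control_level
    if direction in ("-", "\u2212"):
        return measured_level < control_level
    raise ValueError("direction must be '+' or '-'")

# Hypothetical levels for two circRNAs of one subject versus control.
print(consistent_with_disease("+", 12.0, 8.0))   # increased, listed as "+"
print(consistent_with_disease("-", 9.5, 8.0))    # increased, but listed as "-"
```

In practice a panel of several circRNAs would be evaluated together, as described for the statistical methods above, rather than a single circRNA in isolation.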


As mentioned, the circRNAs may be detected through the unique sequences occurring at the exon-exon junction in the head-to-tail arrangement. However, in one embodiment the circRNA may be detected through detection of a larger portion of its sequence. In one embodiment of the method for diagnosing a neurodegenerative disease, preferably Alzheimer's disease, said one or more circRNA has a sequence as encoded by a sequence selected from the group consisting of SEQ ID NO:911 to SEQ ID NO:1820. Preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for the circRNA having the respective encoded sequence is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease.









TABLE 1

Preferred circRNAs in connection with the diagnosis of a neurodegenerative disease, preferably Alzheimer's disease:

SEQ ID NO junction | SEQ ID NO full length | circID “#” | chr | start | stop | gene | gene_name | score | diseased
1 | 911 | 09611 | chr17 | 29130918 | 29131126 | ENSG00000176390 | CRLF3 | −3.022350822 |
2 | 912 | 04805 | chr11 | 85718584 | 85742653 | ENSG00000073921 | PICALM | 2.660787274 | +
3 | 913 | 06983 | chr14 | 35020919 | 35024118 | NA | NA | −2.59734763 |
4 | 914 | 11279 | chr19 | 37916769 | 37917280 | ENSG00000196437 | ZNF569 | −2.257101936 |
5 | 915 | 09640 | chr17 | 30315338 | 30315516 | ENSG00000178691 | SUZ12 | −2.204418987 |
6 | 916 | 00725 | chr1 | 1756835 | 1770677 | ENSG00000078369 | GNB1 | −2.182283795 |
7 | 917 | 17120 | chr4 | 54249939 | 54256040 | ENSG00000145216 | FIP1L1 | −2.176395303 |
8 | 918 | 00155 | chr1 | 114450630 | 114450813 | ENSG00000118655 | DCLRE1B | −2.150719181 |
9 | 919 | 18778 | chr6 | 111583459 | 111585149 | ENSG00000173214 | KIAA1919 | −2.150503499 |
10 | 920 | 09544 | chr17 | 27778472 | 27778698 | ENSG00000160551 | TAOK1 | −2.023829904 |
11 | 921 | 15325 | chr3 | 169840378 | 169840532 | ENSG00000173889 | PHC3 | −2.009445974 |
12 | 922 | 06048 | chr12 | 938227 | 939110 | ENSG00000060237 | WNK1 | −1.981641605 |
13 | 923 | 17070 | chr4 | 48503639 | 48507632 | ENSG00000075539 | FRYL | −1.892671504 |
14 | 924 | 05238 | chr12 | 121220457 | 121222396 | ENSG00000157837 | SPPL3 | −1.85677938 |
15 | 925 | 17655 | chr5 | 132227855 | 132228810 | ENSG00000072364 | AFF4 | −1.850977839 |
16 | 926 | 01651 | chr1 | 29313942 | 29323831 | ENSG00000159023 | EPB41 | −1.8320179 |
17 | 927 | 11479 | chr19 | 8520288 | 8528570 | ENSG00000099783 | HNRNPM | −1.802172301 |
18 | 928 | 17787 | chr5 | 142434003 | 142437312 | ENSG00000145819 | ARHGAP26 | −1.799997822 |
19 | 929 | 20940 | chr8 | 101299728 | 101300495 | ENSG00000034677 | RNF19A | −1.790427008 |
20 | 930 | 10338 | chr17 | 65916130 | 65919106 | ENSG00000171634 | BPTF | −1.790398051 |
21 | 931 | 14397 | chr22 | 28306951 | 28310335 | ENSG00000180957 | PITPNB | 1.774695607 | +
22 | 932 | 09432 | chr17 | 18208425 | 18210280 | ENSG00000177302 | TOP3A | −1.759757408 |
23 | 933 | 01741 | chr1 | 31810021 | 31811895 | ENSG00000121766 | ZCCHC17 | −1.757819625 |
24 | 934 | 15327 | chr3 | 169840378 | 169843795 | ENSG00000173889 | PHC3 | −1.733435542 |
25 | 935 | 01556 | chr1 | 25553932 | 25554726 | ENSG00000117614 | SYF2 | −1.731052005 |
26 | 936 | 18274 | chr5 | 56542126 | 56545403 | ENSG00000062194 | GPBP1 | −1.697565149 |
27 | 937 | 20658 | chr7 | 72879768 | 72880731 | ENSG00000009954 | BAZ1B | −1.683472553 |
28 | 938 | 20552 | chr7 | 44739739 | 44741214 | ENSG00000105953 | OGDH | −1.662553273 |
29 | 939 | 03293 | chr10 | 70190192 | 70190417 | ENSG00000138346 | DNA2 | −1.660365542 |
30 | 940 | 22764 | chr9 | 96233422 | 96261168 | ENSG00000048828 | FAM120A | −1.651553618 |
31 | 941 | 15662 | chr3 | 196118683 | 196120490 | ENSG00000163960 | UBXN7 | −1.647242297 |
32 | 942 | 04222 | chr11 | 3752620 | 3752808 | ENSG00000110713 | NUP98 | −1.638985251 |
33 | 943 | 08121 | chr15 | 59204761 | 59205895 | ENSG00000137776 | SLTM | −1.610197017 |
34 | 944 | 13514 | chr2 | 99786012 | 99787892 | ENSG00000158411 | MITD1 | −1.603567451 |
35 | 945 | 22065 | chr9 | 126519981 | 126531842 | ENSG00000119522 | DENND1A | −1.603263457 |
36 | 946 | 21136 | chr8 | 131302246 | 131370389 | ENSG00000153317 | ASAP1 | −1.594570417 |
37 | 947 | 07290 | chr14 | 56114742 | 56115588 | ENSG00000126777 | KTN1 | −1.593658506 |
38 | 948 | 21493 | chr8 | 42914234 | 42919358 | ENSG00000168522 | FNTA | −1.591101491 |
39 | 949 | 02565 | chr10 | 105767934 | 105778666 | ENSG00000065613 | SLK | −1.571637751 |
40 | 950 | 13738 | chr20 | 35672488 | 35672653 | ENSG00000080839 | RBL1 | −1.533251478 |
41 | 951 | 09181 | chr16 | 69729038 | 69729282 | ENSG00000102908 | NFAT5 | −1.510313578 |
42 | 952 | 10581 | chr18 | 20516716 | 20529676 | ENSG00000101773 | RBBP8 | −1.491646626 |
43 | 953 | 16418 | chr4 | 110412482 | 110416012 | ENSG00000138802 | SEC24B | −1.48101895 |
44 | 954 | 00480 | chr1 | 15964801 | 15970145 | ENSG00000197312 | DDI2 | −1.471661901 |
45 | 955 | 07128 | chr14 | 50136241 | 50141145 | ENSG00000100479 | POLE2 | −1.461703503 |
46 | 956 | 09687 | chr17 | 34937773 | 34937968 | ENSG00000005955 | GGNBP2 | −1.455641381 |
47 | 957 | 04374 | chr11 | 5247878 | 5248265 | ENSG00000244734 | HBB | 1.44536348 | +
48 | 958 | 11059 | chr19 | 10286215 | 10288043 | ENSG00000130816 | DNMT1 | −1.420159643 |
49 | 959 | 21189 | chr8 | 141595217 | 141616013 | ENSG00000123908 | AGO2 | −1.420022161 |
50 | 960 | 20250 | chr7 | 151946960 | 151948051 | ENSG00000055609 | KMT2C | −1.415607268 |
51 | 961 | 20208 | chr7 | 148516069 | 148516779 | ENSG00000106462 | EZH2 | −1.408045344 |
52 | 962 | 02272 | chr1 | 78200031 | 78201843 | ENSG00000077254 | USP33 | −1.398984454 |
53 | 963 | 00711 | chr1 | 1747194 | 1756938 | ENSG00000078369 | GNB1 | −1.381859545 |
54 | 964 | 19568 | chr6 | 56989531 | 56999668 | ENSG00000112200 | ZNF451 | −1.374430798 |
55 | 965 | 01141 | chr1 | 212218001 | 212220759 | ENSG00000143476 | DTL | −1.37418011 |
56 | 966 | 02693 | chr10 | 120797749 | 120797951 | ENSG00000107581 | EIF3A | −1.371606407 |
57 | 967 | 14089 | chr21 | 17135209 | 17138460 | ENSG00000155313 | USP25 | −1.370800154 |
58 | 968 | 08653 | chr15 | 93540186 | 93541851 | ENSG00000173575 | CHD2 | −1.363982557 |
59 | 969 | 04712 | chr11 | 77402203 | 77404656 | ENSG00000048649 | RSF1 | −1.347797823 |
60 | 970 | 06433 | chr13 | 41517087 | 41518061 | ENSG00000120690 | ELF1 | −1.329949423 |
61 | 971 | 04231 | chr11 | 3789810 | 3789974 | ENSG00000110713 | NUP98 | −1.328958073 |
62 | 972 | 15257 | chr3 | 160131260 | 160132305 | ENSG00000113810 | SMC4 | −1.327632403 |
63 | 973 | 08264 | chr15 | 64791491 | 64792365 | ENSG00000180357 | ZNF609 | −1.323996695 |
64 | 974 | 19854 | chr7 | 101870646 | 101870949 | ENSG00000257923 | CUX1 | −1.323311249 |
65 | 975 | 17166 | chr4 | 56877577 | 56878151 | ENSG00000174799 | CEP135 | −1.312060145 |
66 | 976 | 04708 | chr11 | 77394754 | 77396204 | ENSG00000048649 | RSF1 | −1.308996537 |
67 | 977 | 06000 | chr12 | 89853414 | 89866052 | ENSG00000139323 | POC1B | −1.305211994 |
68 | 978 | 19065 | chr6 | 150092297 | 150094305 | ENSG00000120265 | PCMT1 | −1.285715752 |
69 | 979 | 01612 | chr1 | 28362054 | 28384605 | ENSG00000158161 | EYA3 | −1.283006706 |
70 | 980 | 07213 | chr14 | 50952295 | 50952912 | ENSG00000012983 | MAP4K5 | −1.277430489 |
71 | 981 | 02010 | chr1 | 52293467 | 52299842 | ENSG00000078618 | NRD1 | −1.265951283 |
72 | 982 | 10749 | chr18 | 39607406 | 39629569 | ENSG00000078142 | PIK3C3 | −1.244972909 |
73 | 983 | 12347 | chr2 | 203921149 | 203922176 | ENSG00000144426 | NBEAL1 | −1.238573412 |
74 | 984 | 10987 | chr18 | 9195548 | 9221997 | ENSG00000101745 | ANKRD12 | 1.23352635 | +
75 | 985 | 01864 | chr1 | 41536266 | 41541123 | ENSG00000010803 | SCMH1 | −1.230203935 |
76 | 986 | 20724 | chr7 | 7826418 | 7841374 | ENSG00000219545 | RPA3-AS1 | −1.228982871 |
77 | 987 | 22400 | chr9 | 33960823 | 33963789 | ENSG00000137073 | UBAP2 | −1.228711515 |
78 | 988 | 02152 | chr1 | 63944434 | 63955889 | ENSG00000142856 | ITGB3BP | −1.22179497 |
79 | 989 | 16992 | chr4 | 39839475 | 39843676 | ENSG00000121892 | PDS5A | −1.219666561 |
80 | 990 | 04329 | chr11 | 47774467 | 47776216 | ENSG00000109920 | FNBP4 | −1.21886247 |
81 | 991 | 10286 | chr17 | 61841375 | 61842207 | ENSG00000108588 | CCDC47 | −1.215198124 |
82 | 992 | 13162 | chr2 | 61340904 | 61345251 | ENSG00000162929 | KIAA1841 | −1.213144359 |
83 | 993 | 01070 | chr1 | 205585605 | 205593019 | ENSG00000158711 | ELK4 | −1.208469007 |
84 | 994 | 11293 | chr19 | 39943995 | 39944161 | ENSG00000196235 | SUPT5H | −1.201547806 |
85 | 995 | 02313 | chr1 | 87181406 | 87185318 | ENSG00000097033 | SH3GLB1 | −1.194484292 |
86 | 996 | 11596 | chr2 | 113057425 | 113057606 | ENSG00000188177 | ZC3H6 | −1.192203498 |
87 | 997 | 15955 | chr3 | 44835708 | 44835918 | ENSG00000163808 | KIF15 | −1.18998281 |
88 | 998 | 07534 | chr14 | 81297486 | 81307112 | ENSG00000100629 | CEP128 | −1.187560558 |
89 | 999 | 00616 | chr1 | 171537385 | 171544267 | ENSG00000117523 | PRRC2C | −1.187049294 |
90 | 1000 | 01909 | chr1 | 46105881 | 46108171 | ENSG00000159592 | GPBP1L1 | −1.185137589 |
91 | 1001 | 04787 | chr11 | 85707868 | 85742653 | ENSG00000073921 | PICALM | 1.180129801 | +
92 | 1002 | 11447 | chr19 | 57967020 | 57967550 | ENSG00000268163 | AC004076.9 | −1.173577839 |
93 | 1003 | 16169 | chr3 | 56661064 | 56662642 | ENSG00000163946 | FAM208A | −1.172753372 |
94 | 1004 | 09079 | chr16 | 58594115 | 58594266 | ENSG00000125107 | CNOT1 | −1.17086243 |
95 | 1005 | 00804 | chr1 | 180366650 | 180382606 | ENSG00000135847 | ACBD6 | −1.16836055 |
96 | 1006 | 13521 | chr2 | 99985854 | 99988193 | ENSG00000158417 | EIF5B | −1.167808075 |
97 | 1007 | 02559 | chr10 | 105197771 | 105198565 | ENSG00000148843 | PDCD11 | −1.166398697 |
98 | 1008 | 06318 | chr13 | 28155433 | 28155940 | ENSG00000139517 | LNX2 | −1.162993126 |
99 | 1009 | 11563 | chr2 | 109067483 | 109068922 | ENSG00000135968 | GCC2 | −1.162334985 |
100 | 1010 | 02763 | chr10 | 126631025 | 126631876 | ENSG00000249456 | RP11-298J20.4 | 1.159634216 | +
101 | 1011 | 10253 | chr17 | 600657 | 602620 | ENSG00000141252 | VPS53 | −1.155561006 |
102 | 1012 | 14888 | chr3 | 129599151 | 129599402 | ENSG00000172765 | TMCC1 | −1.151076826 |
103 | 1013 | 00354 | chr1 | 154207066 | 154207767 | ENSG00000143569 | UBAP2L | −1.150469707 |
104 | 1014 | 22404 | chr9 | 33971648 | 33973235 | ENSG00000137073 | UBAP2 | −1.146045034 |
105 | 1015 | 05176 | chr12 | 116668337 | 116669961 | ENSG00000123066 | MED13L | −1.143723652 |
106 | 1016 | 01113 | chr1 | 21083658 | 21100103 | ENSG00000127483 | HP1BP3 | −1.142758789 |
107 | 1017 | 13008 | chr2 | 44729827 | 44732869 | ENSG00000143919 | CAMKMT | −1.14151883 |
108 | 1018 | 16696 | chr4 | 153874650 | 153875471 | ENSG00000137460 | FHDC1 | −1.13092216 |
109 | 1019 | 17750 | chr5 | 138699447 | 138700432 | ENSG00000120727 | PAIP2 | −1.125871509 |
110 | 1020 | 19832 | chr6 | 99912479 | 99916494 | ENSG00000123552 | USP45 | 1.125162796 | +
111 | 1021 | 22387 | chr9 | 33948371 | 33948585 | ENSG00000137073 | UBAP2 | −1.118516346 |
112 | 1022 | 01075 | chr1 | 205696827 | 205698749 | ENSG00000069275 | NUCKS1 | −1.112909385 |
113 | 1023 | 22554 | chr9 | 5944870 | 5954095 | ENSG00000183354 | KIAA2026 | −1.112044692 |
114 | 1024 | 05081 | chr12 | 112116954 | 112121111 | ENSG00000089234 | BRAP | −1.108105518 |
115 | 1025 | 07080 | chr14 | 45587230 | 45599993 | ENSG00000100442 | FKBP3 | −1.107688517 |
116 | 1026 | 06866 | chr14 | 23378691 | 23380612 | ENSG00000100461 | RBM23 | −1.10744325 |
117 | 1027 | 12525 | chr2 | 227729319 | 227732034 | ENSG00000144468 | RHBDD1 | −1.099283508 |
118 | 1028 | 12560 | chr2 | 231307651 | 231314970 | ENSG00000067066 | SP100 | −1.092112695 |
119 | 1029 | 15719 | chr3 | 197592293 | 197602646 | ENSG00000186001 | LRCH3 | −1.089664619 |
120 | 1030 | 00593 | chr1 | 169767997 | 169770112 | ENSG00000000460 | C1orf112 | −1.08419511 |
121 | 1031 | 07174 | chr14 | 50616725 | 50616948 | ENSG00000100485 | SOS2 | −1.081425815 |
122 | 1032 | 18626 | chr5 | 93964515 | 93966448 | ENSG00000133302 | ANKRD32 | 1.073184987 | +
123 | 1033 | 19248 | chr6 | 20781375 | 20846409 | ENSG00000145996 | CDKAL1 | −1.072369586 |
124 | 1034 | 04914 | chr12 | 102107867 | 102110590 | ENSG00000111666 | CHPT1 | −1.062669426 |
125 | 1035 | 04376 | chr11 | 5247941 | 5248230 | ENSG00000244734 | HBB | 1.059317293 | +
126 | 1036 | 15734 | chr3 | 20113075 | 20113951 | ENSG00000114166 | KAT2B | −1.059120544 |
127 | 1037 | 16067 | chr3 | 49372452 | 49373029 | ENSG00000114316 | USP4 | −1.057601335 |
128 | 1038 | 17988 | chr5 | 176402396 | 176409624 | ENSG00000087206 | UIMC1 | −1.057060246 |
129 | 1039 | 00386 | chr1 | 155408117 | 155429689 | ENSG00000116539 | ASH1L | −1.056345461 |
130 | 1040 | 20264 | chr7 | 155457868 | 155473602 | ENSG00000184863 | RBM33 | −1.052207723 |
131 | 1041 | 20916 | chr8 | 100515063 | 100523740 | ENSG00000132549 | VPS13B | −1.050303405 |
132 | 1042 | 02796 | chr10 | 13214375 | 13214765 | ENSG00000065328 | MCM10 | −1.042082127 |
133 | 1043 | 20002 | chr7 | 117825700 | 117828459 | ENSG00000128534 | NAA38 | −1.041810719 |
134 | 1044 | 12270 | chr2 | 202010100 | 202014558 | ENSG00000003402 | CFLAR | −1.036011434 |
135 | 1045 | 23007 | chrX | 149983334 | 149984551 | ENSG00000102181 | CD99L2 | −1.033978093 |
136 | 1046 | 09230 | chr16 | 71712657 | 71715808 | ENSG00000040199 | PHLPP2 | −1.031262858 |
137 | 1047 | 22401 | chr9 | 33960823 | 33973235 | ENSG00000137073 | UBAP2 | −1.027473832 |
138 | 1048 | 07006 | chr14 | 35331249 | 35331528 | ENSG00000198604 | BAZ1A | −1.022185742 |
139 | 1049 | 14241 | chr21 | 38792600 | 38794168 | ENSG00000157540 | DYRK1A | −1.021673056 |
140 | 1050 | 16836 | chr4 | 178274461 | 178274882 | ENSG00000109674 | NEIL3 | −1.016081081 |
141 | 1051 | 10887 | chr18 | 60206913 | 60217693 | ENSG00000141664 | ZCCHC2 | −1.009357969 |
142 | 1052 | 22986 | chrX | 13684435 | 13698717 | ENSG00000176896 | TCEANC | 1.007062962 | +
143 | 1053 | 06929 | chr14 | 31404368 | 31425448 | ENSG00000196792 | STRN3 | −1.003076508 |
144 | 1054 | 07056 | chr14 | 39627488 | 39628754 | ENSG00000182400 | TRAPPC6B | −1.001831479 |
145 | 1055 | 19098 | chr6 | 155095122 | 155099179 | ENSG00000213079 | SCAF8 | −1.000738262 |
146 | 1056 | 03734 | chr11 | 108098321 | 108100050 | ENSG00000149311 | ATM | −0.995255586 |
147 | 1057 | 02126 | chr1 | 62907158 | 62907970 | ENSG00000162607 | USP1 | −0.994146093 |
148 | 1058 | 21287 | chr8 | 21835280 | 21837714 | ENSG00000130227 | XPO7 | −0.993409908 |
149 | 1059 | 15062 | chr3 | 142116170 | 142123918 | ENSG00000114127 | XRN1 | −0.991091493 |
150 | 1060 | 20509 | chr7 | 36450122 | 36450775 | ENSG00000011426 | ANLN | −0.985382926 |
151 | 1061 | 10756 | chr18 | 39623696 | 39629569 | ENSG00000078142 | PIK3C3 | −0.980136493 |
152 | 1062 | 20561 | chr7 | 48541721 | 48542148 | ENSG00000179869 | ABCA13 | 0.978360837 | +
153 | 1063 | 08654 | chr15 | 93540186 | 93545547 | ENSG00000173575 | CHD2 | 0.978292944 | +
154 | 1064 | 08778 | chr16 | 148142 | 150507 | ENSG00000103148 | NPRL3 | 0.975426406 | +
155 | 1065 | 02929 | chr10 | 22896855 | 22898646 | ENSG00000150867 | PIP4K2A | −0.9722031 |
156 | 1066 | 17110 | chr4 | 52729602 | 52758017 | ENSG00000109184 | DCUN1D4 | −0.966161676 |
157 | 1067 | 09420 | chr17 | 1746096 | 1747980 | ENSG00000132383 | RPA1 | −0.962147895 |
158 | 1068 | 04015 | chr11 | 16205431 | 16256217 | ENSG00000110693 | SOX6 | −0.961209315 |
159 | 1069 | 21439 | chr8 | 41518947 | 41519459 | ENSG00000029534 | ANK1 | −0.957238337 |
160 | 1070 | 19011 | chr6 | 146209155 | 146216113 | ENSG00000146414 | SHPRH | 0.956350194 | +
161 | 1071 | 08356 | chr15 | 66044716 | 66053776 | ENSG00000174485 | DENND4A | −0.953555637 |
162 | 1072 | 09604 | chr17 | 29112936 | 29113049 | ENSG00000176390 | CRLF3 | −0.952318815 |
163 | 1073 | 13634 | chr20 | 32619327 | 32621124 | ENSG00000125970 | RALY | −0.952059983 |
164 | 1074 | 20447 | chr7 | 27824781 | 27825108 | ENSG00000106052 | TAX1BP1 | −0.951551903 |
165 | 1075 | 09322 | chr16 | 85667519 | 85667738 | ENSG00000131149 | GSE1 | −0.950544843 |
166 | 1076 | 20382 | chr7 | 23224688 | 23224917 | ENSG00000136243 | NUPL2 | −0.948456687 |
167 | 1077 | 01675 | chr1 | 29319841 | 29320054 | ENSG00000159023 | EPB41 | −0.947371967 |
168 | 1078 | 01264 | chr1 | 222897433 | 222898897 | ENSG00000162819 | BROX | −0.944435092 |
169 | 1079 | 09473 | chr17 | 20107645 | 20109225 | ENSG00000128487 | SPECC1 | 0.943075221 | +
170 | 1080 | 04559 | chr11 | 68318588 | 68331900 | ENSG00000110075 | PPP6R3 | −0.940969479 |
171 | 1081 | 19673 | chr6 | 76357446 | 76369123 | ENSG00000112701 | SENP6 | −0.93979093 |
172 | 1082 | 15335 | chr3 | 169854206 | 169867032 | ENSG00000173889 | PHC3 | −0.93370794 |
173 | 1083 | 14366 | chr22 | 22160138 | 22162135 | ENSG00000100030 | MAPK1 | −0.932412536 |
174 | 1084 | 01252 | chr1 | 22047528 | 22048257 | ENSG00000090686 | USP48 | −0.92708516 |
175 | 1085 | 22466 | chr9 | 36375930 | 36376124 | ENSG00000137075 | RNF38 | −0.912116226 |
176 | 1086 | 01676 | chr1 | 29319841 | 29323831 | ENSG00000159023 | EPB41 | −0.907484412 |
177 | 1087 | 05433 | chr12 | 1863423 | 1863680 | ENSG00000006831 | ADIPOR2 | −0.906782889 |
178 | 1088 | 16984 | chr4 | 39739039 | 39757359 | ENSG00000078140 | UBE2K | −0.906655604 |
179 | 1089 | 18946 | chr6 | 13632601 | 13644961 | ENSG00000010017 | RANBP9 | −0.905997471 |
180 | 1090 | 22377 | chr9 | 33932559 | 33933626 | ENSG00000137073 | UBAP2 | −0.904013554 |
181 | 1091 | 15300 | chr3 | 169693395 | 169706147 | ENSG00000008952 | SEC62 | −0.890692839 |
182 | 1092 | 08660 | chr15 | 93543741 | 93558139 | ENSG00000173575 | CHD2 | −0.889550329 |
183 | 1093 | 01562 | chr1 | 25666964 | 25669564 | ENSG00000183726 | TMEM50A | −0.887890585 |
184 | 1094 | 09996 | chr17 | 48828107 | 48828723 | ENSG00000108848 | LUC7L3 | −0.887692142 |
185 | 1095 | 10763 | chr18 | 43319127 | 43319627 | ENSG00000141469 | SLC14A1 | −0.887285198 |
186 | 1096 | 05073 | chr12 | 112096539 | 112097149 | ENSG00000089234 | BRAP | −0.886768771 |
187 | 1097 | 20330 | chr7 | 16298014 | 16317851 | ENSG00000214960 | ISPD | −0.884134047 |
188 | 1098 | 01568 | chr1 | 25678116 | 25679465 | ENSG00000183726 | TMEM50A | −0.882062038 |
189 | 1099 | 04367 | chr11 | 5246893 | 5247941 | ENSG00000244734 | HBB | −0.881298453 |
190 | 1100 | 21442 | chr8 | 41519318 | 41521260 | ENSG00000029534 | ANK1 | −0.878728778 |
191 | 1101 | 15809 | chr3 | 3178943 | 3186394 | ENSG00000072756 | TRNT1 | −0.873695283 |
192 | 1102 | 08775 | chr16 | 14720962 | 14721193 | ENSG00000140694 | PARN | −0.873513356 |
193 | 1103 | 01596 | chr1 | 28116072 | 28120143 | ENSG00000117758 | STX12 | −0.873353301 |
194 | 1104 | 15618 | chr3 | 195791179 | 195796439 | ENSG00000072274 | TFRC | 0.865977004 | +
195 | 1105 | 04371 | chr11 | 5247806 | 5254322 | ENSG00000244734 | HBB | −0.861493375 |
196 | 1106 | 17540 | chr5 | 112321531 | 112339774 | ENSG00000172795 | DCP2 | −0.859873285 |
197 | 1107 | 02864 | chr10 | 17645558 | 17646046 | ENSG00000165996 | PTPLA | −0.856912854 |
198 | 1108 | 11369 | chr19 | 48660270 | 48660397 | ENSG00000105486 | LIG1 | −0.855477996 |
199 | 1109 | 19205 | chr6 | 17646297 | 17649531 | ENSG00000124789 | NUP153 | −0.851513363 |
200 | 1110 | 17131 | chr4 | 54280781 | 54294350 | ENSG00000145216 | FIP1L1 | −0.850581558 |
201 | 1111 | 14060 | chr20 | 6011930 | 6012726 | ENSG00000088766 | CRLS1 | −0.849745579 |
202 | 1112 | 04999 | chr12 | 109046047 | 109048186 | ENSG00000110880 | CORO1C | −0.845189707 |
203 | 1113 | 09815 | chr17 | 40652724 | 40653322 | ENSG00000033627 | ATP6V0A1 | −0.843836528 |
204 | 1114 | 07981 | chr15 | 50592985 | 50593565 | ENSG00000104064 | GABPB1 | −0.836914748 |
205 | 1115 | 17411 | chr4 | 89859238 | 89870589 | ENSG00000138640 | FAM13A | −0.834816404 |
206 | 1116 | 01550 | chr1 | 24840803 | 24841057 | ENSG00000117602 | RCAN3 | −0.833705319 |
207 | 1117 | 04237 | chr11 | 3794861 | 3797251 | ENSG00000110713 | NUP98 | −0.833473306 |
208 | 1118 | 06459 | chr13 | 42040958 | 42042974 | ENSG00000102760 | RGCC | −0.833444657 |
209 | 1119 | 22388 | chr9 | 33948371 | 33953472 | ENSG00000137073 | UBAP2 | −0.832464951 |
210 | 1120 | 19582 | chr6 | 57017018 | 57025950 | ENSG00000112200 | ZNF451 | 0.832178254 | +
211 | 1121 | 02407 | chr1 | 93683294 | 93692006 | ENSG00000122483 | CCDC18 | −0.830332399 |
212 | 1122 | 23137 | chrX | 53641494 | 53642796 | ENSG00000086758 | HUWE1 | −0.828040183 |
213 | 1123 | 13589 | chr20 | 2944917 | 2945848 | ENSG00000132670 | PTPRA | −0.825864753 |
214 | 1124 | 14585 | chr22 | 41531816 | 41536261 | ENSG00000100393 | EP300 | −0.82395501 |
215 | 1125 | 02974 | chr10 | 27453992 | 27454468 | ENSG00000120539 | MASTL | −0.822978236 |
216 | 1126 | 13436 | chr2 | 8910799 | 8917022 | ENSG00000134313 | KIDINS220 | 0.822565477 | +
217 | 1127 | 12752 | chr2 | 26321530 | 26332775 | ENSG00000084733 | RAB10 | −0.821885553 |
218 | 1128 | 21087 | chr8 | 131092147 | 131104389 | ENSG00000153317 | ASAP1 | −0.82162602 |
219 | 1129 | 02599 | chr10 | 1125950 | 1126416 | ENSG00000047056 | WDR37 | −0.821547428 |
220 | 1130 | 02264 | chr1 | 78177431 | 78178966 | ENSG00000077254 | USP33 | −0.821465245 |
221 | 1131 | 06099 | chr12 | 96692646 | 96694138 | ENSG00000059758 | CDK17 | −0.820285575 |
222 | 1132 | 03159 | chr10 | 49609654 | 49618211 | ENSG00000107643 | MAPK8 | 0.818623607 | +
223 | 1133 | 06939 | chr14 | 31420068 | 31425448 | ENSG00000196792 | STRN3 | −0.818620248 |
224 | 1134 | 17748 | chr5 | 138614015 | 138614818 | ENSG00000015479 | MATR3 | −0.81818695 |
225 | 1135 | 06781 | chr14 | 103871412 | 103871604 | ENSG00000075413 | MARK3 | −0.817182929 |
226 | 1136 | 09698 | chr17 | 35800605 | 35800763 | ENSG00000108264 | TADA2A | −0.81705738 |
227 | 1137 | 12901 | chr2 | 37426846 | 37428869 | ENSG00000218739 | CEBPZ-AS1 | −0.816983939 |
228 | 1138 | 11110 | chr19 | 13039155 | 13039661 | ENSG00000179115 | FARSA | −0.815139146 |
229 | 1139 | 15689 | chr3 | 196876613 | 196888609 | ENSG00000075711 | DLG1 | −0.812904778 |
230 | 1140 | 00858 | chr1 | 185143503 | 185144245 | ENSG00000116668 | SWT1 | −0.812024383 |
231 | 1141 | 10248 | chr17 | 60061531 | 60062451 | ENSG00000108510 | MED13 | −0.811297795 |
232 | 1142 | 16838 | chr4 | 178274461 | 178281831 | ENSG00000109674 | NEIL3 | −0.810885238 |
233 | 1143 | 06571 | chr13 | 50642232 | 50649789 | ENSG00000231607 | DLEU2 | −0.810759762 |
234 | 1144 | 01959 | chr1 | 50956259 | 51001129 | ENSG00000185104 | FAF1 | −0.806879972 |
235 | 1145 | 05880 | chr12 | 69107644 | 69108533 | ENSG00000111581 | NUP107 | −0.803562204 |
236 | 1146 | 13163 | chr2 | 61343113 | 61345251 | ENSG00000162929 | KIAA1841 | −0.802619965 |
237 | 1147 | 07836 | chr15 | 41961025 | 41962156 | ENSG00000174197 | MGA | −0.801902039 |
238 | 1148 | 19213 | chr6 | 17665469 | 17669259 | ENSG00000124789 | NUP153 | −0.800253252 |
239 | 1149 | 04007 | chr11 | 16133348 | 16208501 | ENSG00000110693 | SOX6 | −0.796576061 |
240 | 1150 | 06028 | chr12 | 93192667 | 93192862 | ENSG00000102189 | EEA1 | −0.795690495 |
241 | 1151 | 20626 | chr7 | 65592690 | 65599361 | ENSG00000241258 | CRCP | −0.795534527 |
242 | 1152 | 10962 | chr18 | 76953182 | 76974038 | ENSG00000166377 | ATP9B | −0.792877388 |
243 | 1153 | 13660 | chr20 | 33065576 | 33067594 | ENSG00000078747 | ITCH | −0.790880218 |
244 | 1154 | 15338 | chr3 | 169863210 | 169867032 | ENSG00000173889 | PHC3 | −0.789693985 |
245 | 1155 | 07828 | chr15 | 41667909 | 41669502 | ENSG00000137804 | NUSAP1 | −0.788962664 |
246 | 1156 | 17980 | chr5 | 176370335 | 176385155 | ENSG00000087206 | UIMC1 | −0.788756125 |
247 | 1157 | 14085 | chr21 | 16386664 | 16415895 | ENSG00000180530 | NRIP1 | −0.78857849 |
248 | 1158 | 21017 | chr8 | 124089350 | 124117704 | ENSG00000156787 | TBC1D31 | −0.787817422 |
249 | 1159 | 15597 | chr3 | 195785154 | 195785503 | ENSG00000072274 | TFRC | −0.787214727 |
250 | 1160 | 17066 | chr4 | 48371865 | 48385801 | ENSG00000109171 | SLAIN2 | −0.785855689 |
251 | 1161 | 10120 | chr17 | 57274904 | 57275150 | ENSG00000068489 | PRR11 | −0.78426978 |
252 | 1162 | 13524 | chr20 | 10536878 | 10541468 | ENSG00000149346 | SLX4IP | −0.782760423 |
253 | 1163 | 21766 | chr8 | 95549330 | 95550574 | ENSG00000164944 | KIAA1429 | −0.781355288 |
254 | 1164 | 13231 | chr2 | 61749745 | 61753656 | ENSG00000082898 | XPO1 | −0.781098538 |
255 | 1165 | 14042 | chr20 | 57014000 | 57016139 | ENSG00000124164 | VAPB | −0.780718257 |
256 | 1166 | 16116 | chr3 | 52446826 | 52448603 | ENSG00000010318 | PHF7 | −0.780597733 |
257 | 1167 | 21279 | chr8 | 21832180 | 21835354 | ENSG00000130227 | XPO7 | −0.778597658 |
258 | 1168 | 03652 | chr10 | 98618048 | 98667504 | ENSG00000196233 | LCOR | −0.775538046 |
259 | 1169 | 11792 | chr2 | 144966169 | 144969146 | ENSG00000121964 | GTDC1 | −0.771184033 |
260 | 1170 | 03028 | chr10 | 31749965 | 31750166 | ENSG00000148516 | ZEB1 | −0.771152714 |
261 | 1171 | 11627 | chr2 | 11426664 | 11427862 | ENSG00000134318 | ROCK2 | −0.768728393 |
262 | 1172 | 09942 | chr17 | 45479497 | 45492285 | ENSG00000178852 | EFCAB13 | 0.768679343 | +
263 | 1173 | 10858 | chr18 | 55278868 | 55283207 | ENSG00000134440 | NARS | −0.767365826 |
264 | 1174 | 07253 | chr14 | 55423751 | 55424353 | ENSG00000198554 | WDHD1 | −0.766525169 |
265 | 1175 | 21408 | chr8 | 37623043 | 37623873 | ENSG00000147471 | PROSC | −0.765299565 |
266 | 1176 | 01316 | chr1 | 224599128 | 224601037 | ENSG00000162923 | WDR26 | −0.765061353 |
267 | 1177 | 06007 | chr12 | 89860546 | 89866052 | ENSG00000139323 | POC1B | −0.759596201 |
268 | 1178 | 07054 | chr14 | 39623414 | 39627606 | ENSG00000182400 | TRAPPC6B | −0.757655748 |
269 | 1179 | 18441 | chr5 | 72311452 | 72333042 | ENSG00000157107 | FCHO2 | 0.755970771 | +
270 | 1180 | 00400 | chr1 | 155695172 | 155695810 | ENSG00000132676 | DAP3 | −0.755813418 |
271 | 1181 | 02388 | chr1 | 93648916 | 93659301 | ENSG00000122483 | CCDC18 | −0.751924076 |
272 | 1182 | 00200 | chr1 | 117944807 | 117984947 | ENSG00000198162 | MAN1A2 | −0.74921634 |
273 | 1183 | 05156 | chr12 | 112798183 | 112798315 | ENSG00000173064 | HECTD4 | 0.748188995 | +
274 | 1184 | 02650 | chr10 | 11639629 | 11643979 | ENSG00000148429 | USP6NL | −0.748046374 |
275 | 1185 | 05799 | chr12 | 62743001 | 62749256 | ENSG00000135655 | USP15 | −0.748037916 |
276 | 1186 | 09092 | chr16 | 66764014 | 66766408 | ENSG00000135720 | DYNC1LI2 | −0.745511259 |
277 | 1187 | 10929 | chr18 | 74561481 | 74563895 | ENSG00000130856 | ZNF236 | −0.744223622 |
278 | 1188 | 21056 | chr8 | 124392768 | 124392917 | ENSG00000156802 | ATAD2 | −0.73775191 |
279 | 1189 | 01937 | chr1 | 47745912 | 47748131 | ENSG00000123473 | STIL | −0.737462925 |
280 | 1190 | 08702 | chr16 | 11114049 | 11154879 | ENSG00000038532 | CLEC16A | −0.733977011 |
281 | 1191 | 00443 | chr1 | 158624600 | 158624741 | ENSG00000163554 | SPTA1 | −0.733664113 |
282 | 1192 | 19873 | chr7 | 102962378 | 102963241 | ENSG00000105821 | DNAJC2 | −0.733541673 |
283 | 1193 | 02286 | chr1 | 7837219 | 7838229 | ENSG00000049245 | VAMP3 | −0.731067513 |
284 | 1194 | 07308 | chr14 | 58690339 | 58690574 | ENSG00000131966 | ACTR10 | −0.72935942 |
285 | 1195 | 20485 | chr7 | 32672154 | 32678977 | ENSG00000229358 | DPY19L1P1 | 0.72562903 | +
286 | 1196 | 00417 | chr1 | 156303337 | 156304709 | ENSG00000163468 | CCT3 | −0.724202091 |
287 | 1197 | 22727 | chr9 | 88920106 | 88924932 | ENSG00000083223 | ZCCHC6 | −0.722435269 |
288 | 1198 | 07807 | chr15 | 41361767 | 41362745 | ENSG00000128908 | INO80 | −0.722342802 |
289 | 1199 | 15663 | chr3 | 196118683 | 196129890 | ENSG00000163960 | UBXN7 | −0.718770738 |
290 | 1200 | 11682 | chr2 | 122514815 | 122516382 | ENSG00000211460 | TSN | −0.717149492 |
291 | 1201 | 18087 | chr5 | 32135677 | 32143986 | ENSG00000113384 | GOLPH3 | −0.71686075 |
292 | 1202 | 15886 | chr3 | 37170553 | 37190529 | ENSG00000093167 | LRRFIP2 | −0.7148495 |
293 | 1203 | 15146 | chr3 | 149563797 | 149639014 | ENSG00000082996 | RNF13 | −0.712343388 |
294 | 1204 | 23244 | chrX | 77270158 | 77275895 | ENSG00000165240 | ATP7A | 0.712089225 | +
295 | 1205 | 07646 | chr14 | 96986391 | 96987409 | ENSG00000090060 | PAPOLA | −0.711582141 |
296 | 1206 | 09875 | chr17 | 4186092 | 4200109 | ENSG00000132388 | UBE2G1 | −0.71126388 |
297 | 1207 | 01638 | chr1 | 28907071 | 28907741 | ENSG00000197989 | SNHG12 | −0.710805666 |
298 | 1208 | 20189 | chr7 | 141752583 | 141778270 | ENSG00000257335 | MGAM | 0.710063558 | +
299 | 1209 | 00276 | chr1 | 150198939 | 150201570 | ENSG00000143401 | ANP32E | −0.710010473 |
300 | 1210 | 08220 | chr15 | 63988322 | 64008672 | ENSG00000103657 | HERC1 | −0.709650893 |
301
1211
10863
chr18
55833019
55919286
ENSG00000049759
NEDD4L
−0.709240345



302
1212
13077
chr2
54278094
54284497
ENSG00000170634
ACYP2
−0.708784218



303
1213
04820
chr11
85733409
85742653
ENSG00000073921
PICALM
0.707847663
+


304
1214
17594
chr5
122881110
122893258
ENSG00000151292
CSNK1G3
0.70648759
+


305
1215
07062
chr14
39746137
39748741
ENSG00000258941
RP11-407N17.3
−0.705740708



306
1216
16701
chr4
154315413
154318485
ENSG00000121211
MND1
−0.705435194



307
1217
08004
chr15
50875285
50878685
ENSG00000092439
TRPM7
−0.704828246



308
1218
16681
chr4
153332454
153333681
ENSG00000109670
FBXW7
0.704685694
+


309
1219
17907
chr5
162909647
162911251
ENSG00000072571
HMMR
−0.702946274



310
1220
05911
chr12
69983264
69985939
ENSG00000166226
CCT2
−0.702361818



311
1221
00169
chr1
115005725
115007010
ENSG00000197323
TRIM33
−0.700608549



312
1222
14223
chr21
37734480
37736557
ENSG00000159256
MORC3
−0.697854368



313
1223
22527
chr9
4860124
4860901
ENSG00000120158
RCL1
−0.695160044



314
1224
09138
chr16
68300495
68300624
ENSG00000103064
SLC7A6
0.694891988
+


315
1225
18377
chr5
68470703
68471364
ENSG00000134057
CCNB1
−0.694429979



316
1226
18506
chr5
75993811
75997038
ENSG00000145703
IQGAP2
−0.693808924



317
1227
10787
chr18
45391429
45423180
ENSG00000175387
SMAD2
0.691627662
+


318
1228
04075
chr11
17167214
17167489
ENSG00000011405
PIK3C2A
−0.690859029



319
1229
19214
chr6
17665469
17669777
ENSG00000124789
NUP153
−0.690761984



320
1230
19533
chr6
52935854
52941341
ENSG00000112146
FBXO9
−0.688778917



321
1231
10179
chr17
58346810
58348842
ENSG00000170832
USP32
−0.688400663



322
1232
09309
chr16
81058319
81060243
ENSG00000166451
CENPN
−0.683869042



323
1233
16985
chr4
39739039
39776553
ENSG00000078140
UBE2K
0.683584317
+


324
1234
06320
chr13
28748408
28752072
ENSG00000152520
PAN3
−0.683323431



325
1235
15439
chr3
180651121
180653019
ENSG00000114416
FXR1
−0.677850731



326
1236
18715
chr6
10703637
10705077
ENSG00000111845
PAK1IP1
−0.677107932



327
1237
20193
chr7
141760110
141786128
ENSG00000257335
MGAM
−0.674929803



328
1238
02184
chr1
67356836
67371058
ENSG00000152763
WDR78
−0.674140838



329
1239
00394
chr1
155646338
155649303
ENSG00000163374
YY1AP1
−0.674066047



330
1240
21181
chr8
141582910
141595410
ENSG00000123908
AGO2
−0.6713091



331
1241
13696
chr20
34317233
34320057
ENSG00000131051
RBM39
−0.662798279



332
1242
17137
chr4
54292038
54310270
ENSG00000145216
FIP1L1
−0.662175886



333
1243
02329
chr1
89206670
89237562
ENSG00000065243
PKN2
−0.660194584



334
1244
08087
chr15
56686362
56687032
ENSG00000151575
TEX9
−0.657395416



335
1245
22808
chrM
678
946
NA
NA
0.656447182
+


336
1246
06741
chr13
99890680
99896878
ENSG00000134882
UBAC2
−0.655751379



337
1247
15306
chr3
169694733
169706147
ENSG00000008952
SEC62
−0.65538327



338
1248
21571
chr8
61137095
61139494
ENSG00000178538
CA8
−0.650767459



339
1249
13002
chr2
44717924
44719593
ENSG00000143919
CAMKMT
−0.649894452



340
1250
02797
chr10
13233298
13234568
ENSG00000065328
MCM10
−0.649679006



341
1251
01565
chr1
25666964
25683344
ENSG00000183726
TMEM50A
−0.649491962



342
1252
02928
chr10
22880557
22898646
ENSG00000150867
PIP4K2A
−0.649420792



343
1253
03884
chr11
120345268
120348235
ENSG00000196914
ARHGEF12
−0.649153985



344
1254
04534
chr11
66372959
66373063
ENSG00000173992
CCS
0.647602152
+


345
1255
13232
chr2
61749745
61761038
ENSG00000082898
XPO1
−0.646529922



346
1256
05898
chr12
69644908
69656342
ENSG00000111605
CPSF6
0.644341581
+


347
1257
14988
chr3
138289159
138291774
ENSG00000114107
CEP70
−0.643339219



348
1258
22417
chr9
33996220
33998862
ENSG00000137073
UBAP2
−0.642592519



349
1259
06760
chr14
102659799
102661457
ENSG00000140153
WDR20
−0.642496523



350
1260
02652
chr10
11643343
11643979
ENSG00000148429
USP6NL
−0.642008103



351
1261
13125
chr2
58449076
58459247
ENSG00000115392
FANCL
−0.641155337



352
1262
07060
chr14
39648294
39648666
ENSG00000100941
PNN
−0.640545333



353
1263
18429
chr5
72285253
72286691
ENSG00000157107
FCHO2
−0.640018867



354
1264
07683
chr14
99924615
99932150
ENSG00000183576
SETD3
−0.639034091



355
1265
20868
chr7
99621041
99621930
ENSG00000106261
ZKSCAN1
−0.637712203



356
1266
02331
chr1
89206670
89251896
ENSG00000065243
PKN2
−0.635932753



357
1267
04405
chr11
61133516
61135470
ENSG00000149483
TMEM138
−0.634569229



358
1268
16983
chr4
39739039
39747430
ENSG00000078140
UBE2K
−0.63344921



359
1269
16811
chr4
170523158
170523829
ENSG00000137601
NEK1
−0.632606144



360
1270
08802
chr16
15973660
15978062
ENSG00000133393
FOPNL
−0.631706784



361
1271
05515
chr12
28378727
28412375
ENSG00000123106
CCDC91
−0.63053228



362
1272
00483
chr1
15964801
15978390
ENSG00000197312
DDI2
−0.630382539



363
1273
03851
chr11
120276826
120278532
ENSG00000196914
ARHGEF12
−0.62989944



364
1274
19220
chr6
17669523
17675264
ENSG00000124789
NUP153
−0.629886281



365
1275
16618
chr4
144464661
144465125
ENSG00000153147
SMARCA5
−0.628964515



366
1276
06199
chr13
114265310
114277601
ENSG00000198176
TFDP1
−0.626535807



367
1277
13881
chr20
45874751
45875261
ENSG00000101040
ZMYND8
−0.625387432



368
1278
15830
chr3
32757716
32758729
ENSG00000182973
CNOT10
−0.620191841



369
1279
01339
chr1
226453233
226454033
ENSG00000183814
LIN9
−0.619533975



370
1280
21917
chr9
114840817
114842445
ENSG00000106868
SUSD1
−0.618533169



371
1281
17404
chr4
89827529
89859392
ENSG00000138640
FAM13A
−0.613111444



372
1282
10799
chr18
47017995
47018203
ENSG00000265681
RPL17
−0.61125224



373
1283
08584
chr15
90431752
90432372
ENSG00000157823
AP3S2
−0.608684683



374
1284
09579
chr17
28003837
28004759
ENSG00000141298
SSH2
0.608248578
+


375
1285
01693
chr1
29362337
29365938
ENSG00000159023
EPB41
−0.607275382



376
1286
20192
chr7
141759271
141784444
ENSG00000257335
MGAM
0.604473759
+


377
1287
03072
chr10
32832227
32873232
ENSG00000216937
CCDC7
−0.601944609



378
1288
21122
chr8
131226801
131249240
ENSG00000153317
ASAP1
0.596477189
+


379
1289
11808
chr2
148730307
148739650
ENSG00000115947
ORC4
−0.595902755



380
1290
18633
chr5
93978977
93990457
ENSG00000133302
ANKRD32
0.595223761
+


381
1291
06793
chr14
103918254
103923549
ENSG00000075413
MARK3
−0.593329884



382
1292
02441
chr1
95609446
95616975
ENSG00000152078
TMEM56
−0.592118741



383
1293
00317
chr1
151139409
151139890
ENSG00000163156
SCNM1
−0.59143984



384
1294
05925
chr12
70193988
70195501
ENSG00000127328
RAB3IP
−0.590453589



385
1295
09920
chr17
45247282
45249430
ENSG00000004897
CDC27
−0.590090956



386
1296
18485
chr5
74842834
74848416
ENSG00000122008
POLK
−0.589212654



387
1297
06721
chr13
95886863
95900007
ENSG00000125257
ABCC4
0.58894702
+


388
1298
00393
chr1
155644800
155649303
ENSG00000163374
YY1AP1
−0.588932673



389
1299
01882
chr1
43293959
43295969
ENSG00000164010
ERMAP
−0.588844379



390
1300
03896
chr11
120916382
120930794
ENSG00000154114
TBCEL
−0.586343839



391
1301
06549
chr13
50025688
50026045
ENSG00000136169
SETDB2
−0.585761927



392
1302
10785
chr18
45391429
45396935
ENSG00000175387
SMAD2
0.583937093
+


393
1303
06941
chr14
31424825
31425448
ENSG00000196792
STRN3
−0.583201151



394
1304
06310
chr13
26974589
26975761
ENSG00000132964
CDK8
−0.581890878



395
1305
09968
chr17
47388673
47389404
ENSG00000198740
ZNF652
−0.581062739



396
1306
16163
chr3
56626997
56628056
ENSG00000180376
CCDC66
−0.578668491



397
1307
07694
chr15
101104896
101105470
ENSG00000140471
LINS
−0.578351997



398
1308
02708
chr10
121275020
121286936
ENSG00000148908
RGS10
0.577572918
+


399
1309
05496
chr12
27107077
27110676
ENSG00000111790
FGFR1OP2
0.576980459
+


400
1310
14614
chr22
41568502
41569788
ENSG00000100393
EP300
−0.576686603



401
1311
12064
chr2
175976295
175986268
ENSG00000115966
ATF2
−0.575720234



402
1312
03039
chr10
32309949
32310215
ENSG00000170759
KIF5B
−0.575295577



403
1313
07693
chr15
101104892
101105470
ENSG00000140471
LINS
−0.574250823



404
1314
10271
chr17
60111147
60112969
ENSG00000108510
MED13
−0.574056275



405
1315
10970
chr18
77668145
77668309
ENSG00000122490
PQLC1
−0.571239552



406
1316
04710
chr11
77394754
77404656
ENSG00000048649
RSF1
0.571177955
+


407
1317
23217
chrX
76907603
76912143
ENSG00000085224
ATRX
−0.569592962



408
1318
12082
chr2
178096616
178098999
ENSG00000116044
NFE2L2
−0.56851133



409
1319
00788
chr1
179972308
179975702
ENSG00000135837
CEP350
−0.568320418



410
1320
05414
chr12
14609494
14610229
ENSG00000171681
ATF7IP
−0.567149617



411
1321
20442
chr7
27668989
27672064
ENSG00000106049
HIBADH
−0.566428216



412
1322
21415
chr8
37967896
37968351
ENSG00000129691
ASH2L
−0.565206772



413
1323
16141
chr3
52771601
52775515
ENSG00000114904
NEK4
−0.56479756



414
1324
15039
chr3
141811902
141820683
ENSG00000114126
TFDP2
−0.563649191



415
1325
01723
chr1
29481207
29481422
ENSG00000116350
SRSF4
−0.56351081



416
1326
12722
chr2
24787163
24816590
ENSG00000084676
NCOA1
−0.562536917



417
1327
01585
chr1
26594973
26596105
ENSG00000130695
CEP85
−0.556978262



418
1328
04814
chr11
85723323
85742653
ENSG00000073921
PICALM
0.555693534
+


419
1329
06795
chr14
103918254
103928798
ENSG00000075413
MARK3
−0.555161705



420
1330
18206
chr5
52899281
52900725
ENSG00000164258
NDUFS4
−0.55332775



421
1331
08277
chr15
65471271
65472542
ENSG00000166855
CLPX
−0.553060093



422
1332
15888
chr3
37170553
37196215
ENSG00000093167
LRRFIP2
−0.552719212



423
1333
14215
chr21
37716876
37721706
ENSG00000159256
MORC3
0.552495821
+


424
1334
21103
chr8
131164981
131193126
ENSG00000153317
ASAP1
−0.552048479



425
1335
03209
chr10
5838725
5842668
ENSG00000057608
GDI2
−0.550992097



426
1336
20435
chr7
26724354
26729981
ENSG00000005020
SKAP2
−0.550537566



427
1337
18379
chr5
68487621
68492936
ENSG00000153044
CENPH
−0.549245895



428
1338
07964
chr15
49528047
49531564
ENSG00000156958
GALK2
−0.548755689



429
1339
09370
chr16
9009110
9011013
ENSG00000187555
USP7
−0.548099255



430
1340
20041
chr7
129760588
129762042
ENSG00000128607
KLHDC10
−0.547563163



431
1341
11772
chr2
136432901
136437894
ENSG00000048991
R3HDM1
−0.547466106



432
1342
12227
chr2
201721404
201721708
ENSG00000013441
CLK1
−0.546383174



433
1343
21187
chr8
141595217
141595410
ENSG00000123908
AGO2
−0.54617497



434
1344
06474
chr13
43528083
43544806
ENSG00000133106
EPSTI1
0.546089365
+


435
1345
07887
chr15
43627142
43628024
ENSG00000168803
ADAL
−0.545213157



436
1346
04380
chr11
5248159
5255443
ENSG00000244734
HBB
−0.544981515



437
1347
20067
chr7
131071878
131073731
ENSG00000128585
MKLN1
−0.544816911



438
1348
01978
chr1
51868106
51874004
ENSG00000085832
EPS15
−0.54474945



439
1349
05796
chr12
62715244
62749256
ENSG00000135655
USP15
−0.542734982



440
1350
06458
chr13
41943225
41946966
ENSG00000172766
NAA16
−0.542699921



441
1351
12631
chr2
24046127
24046439
ENSG00000119778
ATAD2B
−0.542019024



442
1352
16247
chr3
69077050
69077446
ENSG00000144747
TMF1
−0.541249355



443
1353
16972
chr4
39328182
39329376
ENSG00000035928
RFC1
0.539983221
+


444
1354
12660
chr2
24103508
24108699
ENSG00000119778
ATAD2B
0.537763436
+


445
1355
03207
chr10
5836847
5842668
ENSG00000057608
GDI2
0.536487079
+


446
1356
14164
chr21
34804483
34805178
ENSG00000159128
IFNGR2
−0.535037721



447
1357
03806
chr11
117150623
117150975
ENSG00000167257
RNF214
−0.533600834



448
1358
18891
chr6
131481198
131490413
ENSG00000118507
AKAP7
−0.530137447



449
1359
10683
chr18
29412046
29419420
ENSG00000153339
TRAPPC8
−0.52543663



450
1360
12223
chr2
201718625
201719809
ENSG00000013441
CLK1
−0.522633343



451
1361
22413
chr9
33986757
33998862
ENSG00000137073
UBAP2
0.521045501
+


452
1362
17718
chr5
137320945
137324004
ENSG00000031003
FAM13B
−0.520881586



453
1363
18147
chr5
38971978
38978752
ENSG00000164327
RICTOR
−0.520750421



454
1364
01647
chr1
29313942
29314417
ENSG00000159023
EPB41
−0.52004965



455
1365
18058
chr5
179665331
179668155
ENSG00000050748
MAPK9
−0.519526036



456
1366
16094
chr3
50145502
50145737
ENSG00000003756
RBM5
−0.518700066



457
1367
03636
chr10
98312403
98312816
ENSG00000077147
TM9SF3
−0.518334594



458
1368
07659
chr14
97026985
97029230
ENSG00000090060
PAPOLA
−0.516218178



459
1369
01488
chr1
24112164
24112913
ENSG00000057757
PITHD1
−0.515901988



460
1370
18726
chr6
108243000
108250718
ENSG00000025796
SEC63
−0.512704372



461
1371
21934
chr9
115013208
115015068
ENSG00000119314
PTBP3
0.511483471
+


462
1372
11181
chr19
19603114
19603521
ENSG00000167491
GATAD2A
−0.506490329



463
1373
17532
chr5
112128142
112128674
ENSG00000134982
APC
−0.505576237



464
1374
00404
chr1
155823066
155823597
ENSG00000116580
GON4L
−0.504015347



465
1375
16639
chr4
147227077
147230127
ENSG00000120519
SLC10A7
−0.501165826



466
1376
11038
chr19
10273342
10277361
ENSG00000130816
DNMT1
−0.499037453



467
1377
12628
chr2
24042616
24046439
ENSG00000119778
ATAD2B
−0.49858657



468
1378
20317
chr7
158580694
158591763
ENSG00000117868
ESYT2
−0.497702626



469
1379
07790
chr15
40938035
40939272
ENSG00000137812
CASC5
−0.497560503



470
1380
03959
chr11
130130750
130131824
ENSG00000196323
ZBTB44
−0.497284677



471
1381
08659
chr15
93543741
93552553
ENSG00000173575
CHD2
−0.497214572



472
1382
21499
chr8
48308935
48320523
ENSG00000164808
SPIDR
0.497091896
+


473
1383
14410
chr22
29120964
29121355
ENSG00000183765
CHEK2
−0.496970217



474
1384
05272
chr12
122995655
122999774
ENSG00000111011
RSRC2
−0.496602757



475
1385
20253
chr7
152007050
152012423
ENSG00000055609
KMT2C
0.49641096
+


476
1386
12867
chr2
33442618
33447218
ENSG00000049323
LTBP1
−0.494203398



477
1387
18343
chr5
65307876
65310553
ENSG00000112851
ERBB2IP
−0.490100764



478
1388
16491
chr4
128995614
128999117
ENSG00000138709
LARP1B
−0.489563987



479
1389
23279
chrY
22749909
22751461
ENSG00000198692
EIF1AY
−0.488676777



480
1390
04762
chr11
85692171
85692271
ENSG00000073921
PICALM
0.488490499
+


481
1391
03534
chr10
93711159
93713630
ENSG00000095564
BTAF1
−0.486905705



482
1392
12274
chr2
202163467
202164023
ENSG00000155749
ALS2CR12
−0.486743671



483
1393
17490
chr5
109049220
109065214
ENSG00000112893
MAN2A1
−0.486367479



484
1394
07235
chr14
53003436
53011089
ENSG00000087301
TXNDC16
−0.485655501



485
1395
00199
chr1
117944807
117963271
ENSG00000198162
MAN1A2
−0.484302881



486
1396
01097
chr1
207896962
207898053
ENSG00000197721
CR1L
−0.48202167



487
1397
08154
chr15
62299506
62306191
ENSG00000129003
VPS13C
−0.480880375



488
1398
01423
chr1
23356961
23377013
ENSG00000004487
KDM1A
−0.48077223



489
1399
18273
chr5
56542126
56543042
ENSG00000062194
GPBP1
−0.480747801



490
1400
02982
chr10
27821435
27822923
ENSG00000099246
RAB18
−0.479223363



491
1401
09356
chr16
89824984
89828430
ENSG00000187741
FANCA
−0.477120997



492
1402
02496
chr10
103221737
103239214
ENSG00000166167
BTRC
−0.473303249



493
1403
02437
chr1
95603830
95616975
ENSG00000231992
RP11-57H12.2
−0.471456403



494
1404
14948
chr3
136323150
136323315
ENSG00000118007
STAG1
−0.470361517



495
1405
16191
chr3
57618991
57627474
ENSG00000174839
DENND6A
0.469381446
+


496
1406
00019
chr1
100889777
100908552
ENSG00000079335
CDC14A
−0.467831947



497
1407
17405
chr4
89827529
89870589
ENSG00000138640
FAM13A
−0.467278089



498
1408
05428
chr12
1812051
1863680
ENSG00000006831
ADIPOR2
−0.467232674



499
1409
15760
chr3
20178433
20181856
ENSG00000114166
KAT2B
−0.466896651



500
1410
09205
chr16
70601313
70601439
ENSG00000189091
SF3B3
−0.464994622



501
1411
14782
chr3
119219541
119222868
ENSG00000113845
TIMMDC1
0.463424571
+


502
1412
17376
chr4
88116475
88116842
ENSG00000145332
KLHL8
−0.462383046



503
1413
22074
chr9
127670655
127674305
ENSG00000136935
GOLGA1
−0.462041586



504
1414
09111
chr16
67662272
67663436
ENSG00000102974
CTCF
−0.461139143



505
1415
12648
chr2
240929490
240946787
ENSG00000130414
NDUFA10
−0.459774784



506
1416
11683
chr2
122514815
122519100
ENSG00000211460
TSN
−0.458085496



507
1417
04141
chr11
33307958
33309057
ENSG00000110422
HIPK3
−0.457031488



508
1418
02328
chr1
89206670
89226059
ENSG00000065243
PKN2
−0.456886353



509
1419
17863
chr5
153413350
153414527
ENSG00000055147
FAM114A2
−0.456304188



510
1420
16668
chr4
151719232
151738409
ENSG00000198589
LRBA
−0.456090855



511
1421
23105
chrX
44941820
44942034
ENSG00000147050
KDM6A
0.456030497
+


512
1422
01954
chr1
47834140
47840965
ENSG00000162368
CMPK1
0.455682466
+


513
1423
08564
chr15
89656955
89659752
ENSG00000140526
ABHD2
−0.454247399



514
1424
20352
chr7
17929985
17937069
ENSG00000071189
SNX13
−0.449868927



515
1425
11008
chr18
9524591
9525849
ENSG00000017797
RALBP1
0.447213595
+


516
1426
22402
chr9
33960823
33989124
ENSG00000137073
UBAP2
0.447213595
+


517
1427
21830
chr9
100756912
100760960
ENSG00000136938
ANP32B
−0.447213595



518
1428
12329
chr2
203162101
203162629
ENSG00000055044
NOP58
−0.447213595



519
1429
15468
chr3
182679013
182683541
ENSG00000043093
DCUN1D1
−0.447213595



520
1430
04828
chr11
85961337
85963282
ENSG00000074266
EED
−0.447213595



521
1431
16490
chr4
128995614
128996148
ENSG00000138709
LARP1B
−0.444473314



522
1432
01030
chr1
200583445
200584737
ENSG00000118193
KIF14
−0.443117581



523
1433
19420
chr6
42559888
42562042
ENSG00000024048
UBR2
−0.439246455



524
1434
14185
chr21
37619814
37620866
ENSG00000142197
DOPEY2
−0.438906244



525
1435
03718
chr11
108046972
108047817
ENSG00000149308
NPAT
−0.435269889



526
1436
09613
chr17
29170930
29171934
ENSG00000176208
ATAD5
−0.434866756



527
1437
06750
chr14
102368055
102372866
ENSG00000078304
PPP2R5C
0.434853393
+


528
1438
01018
chr1
200550328
200561368
ENSG00000118193
KIF14
−0.434615126



529
1439
16470
chr4
123977541
123978443
ENSG00000145375
SPATA5
−0.434279453



530
1440
09778
chr17
38547757
38548989
ENSG00000131747
TOP2A
−0.434135915



531
1441
05143
chr12
11273608
11276786
ENSG00000111215
PRR4
−0.433592133



532
1442
06260
chr13
21729831
21732264
ENSG00000165480
SKA3
−0.432989827



533
1443
22228
chr9
139115852
139118720
ENSG00000165661
QSOX2
−0.432509181



534
1444
13448
chr2
9048750
9098771
ENSG00000143797
MBOAT2
−0.432354985



535
1445
15357
chr3
171965322
171969331
ENSG00000075420
FNDC3B
0.43092904
+


536
1446
09002
chr16
47581343
47581459
ENSG00000102893
PHKB
−0.427842792



537
1447
06393
chr13
33091993
33101669
ENSG00000244754
N4BP2L2
−0.427642991



538
1448
13458
chr2
9083315
9102747
ENSG00000143797
MBOAT2
−0.427573656



539
1449
16825
chr4
17816475
17816981
ENSG00000109805
NCAPG
−0.426826527



540
1450
23126
chrX
53430497
53430825
ENSG00000072501
SMC1A
−0.424399643



541
1451
20269
chr7
155465560
155473602
ENSG00000184863
RBM33
0.424359885
+


542
1452
09112
chr16
67663300
67663436
ENSG00000102974
CTCF
−0.423521981



543
1453
06268
chr13
21742126
21742538
ENSG00000165480
SKA3
−0.421756846



544
1454
16155
chr3
56600621
56601081
ENSG00000180376
CCDC66
−0.420550846



545
1455
17913
chr5
167915606
167921655
ENSG00000113643
RARS
−0.42049953



546
1456
04638
chr11
73843888
73844602
ENSG00000168014
C2CD3
−0.418953178



547
1457
02840
chr10
15858833
15889942
ENSG00000148481
FAM188A
0.418323531
+


548
1458
07667
chr14
97299803
97327072
ENSG00000100749
VRK1
−0.417060256



549
1459
02854
chr10
16773475
16776063
ENSG00000148484
RSU1
−0.413151582



550
1460
19975
chr7
111027029
111030750
ENSG00000184903
IMMP2L
−0.412214025



551
1461
03181
chr10
52279590
52350007
ENSG00000198964
SGMS1
−0.411306293



552
1462
15801
chr3
31617887
31621588
ENSG00000163527
STT3B
0.410299463
+


553
1463
20062
chr7
131060182
131073731
ENSG00000128585
MKLN1
−0.407576284



554
1464
09831
chr17
40879652
40882936
ENSG00000108799
EZH1
−0.407486821



555
1465
03697
chr11
107260799
107263621
ENSG00000152404
CWF19L2
−0.405295156



556
1466
15598
chr3
195785154
195787118
ENSG00000072274
TFRC
−0.403678077



557
1467
06663
chr13
79209244
79219132
ENSG00000152193
RNF219
−0.402902809



558
1468
22473
chr9
3647337
3651867
ENSG00000237359
RP11-509J21.2
−0.401011186



559
1469
15073
chr3
142144063
142145683
ENSG00000114127
XRN1
−0.400992543



560
1470
00198
chr1
117944807
117957453
ENSG00000198162
MAN1A2
−0.399491844



561
1471
13616
chr20
30954186
30956926
ENSG00000171456
ASXL1
0.396909813
+


562
1472
00912
chr1
187272597
187298192
ENSG00000236030
LINC01036
0.395630665
+


563
1473
17336
chr4
83891479
83900159
ENSG00000189308
LIN54
0.395430875
+


564
1474
09814
chr17
40650941
40653322
ENSG00000033627
ATP6V0A1
−0.394907679



565
1475
21876
chr9
110062421
110074018
ENSG00000119318
RAD23B
−0.393198961



566
1476
21373
chr8
29959413
29962002
ENSG00000104660
LEPROTL1
−0.392842587



567
1477
19233
chr6
18236682
18258636
ENSG00000124795
DEK
−0.3917531



568
1478
11803
chr2
148653869
148657467
ENSG00000121989
ACVR2A
0.391156997
+


569
1479
03038
chr10
32308785
32310215
ENSG00000170759
KIF5B
−0.390203461



570
1480
22414
chr9
33986757
34017187
ENSG00000137073
UBAP2
0.384924931
+


571
1481
00555
chr1
167921037
167944253
ENSG00000143164
DCAF6
0.384256353
+


572
1482
12529
chr2
227729319
227779067
ENSG00000144468
RHBDD1
−0.384024802



573
1483
08578
chr15
89856134
89857938
ENSG00000140525
FANCI
−0.383669546



574
1484
05946
chr12
72051305
72054207
ENSG00000133858
ZFC3H1
−0.378677648



575
1485
10124
chr17
57430575
57430887
ENSG00000175155
YPEL2
−0.374464606



576
1486
18352
chr5
65349233
65350779
ENSG00000112851
ERBB2IP
−0.368836498



577
1487
18457
chr5
72354259
72373320
ENSG00000157107
FCHO2
0.36829492
+


578
1488
12608
chr2
239090705
239093928
ENSG00000132323
ILKAP
−0.36754271



579
1489
20565
chr7
50358643
50367353
ENSG00000185811
IKZF1
−0.367385234



580
1490
18886
chr6
131466424
131490413
ENSG00000118507
AKAP7
0.365789382
+


581
1491
06726
chr13
96409897
96416207
ENSG00000102580
DNAJC3
0.365346639
+


582
1492
11894
chr2
15691616
15698758
ENSG00000151779
NBAS
−0.363763869



583
1493
08083
chr15
56680669
56687032
ENSG00000151575
TEX9
−0.363417974



584
1494
01398
chr1
230798886
230800333
ENSG00000135775
COG2
−0.362987252



585
1495
00433
chr1
15860731
15863309
ENSG00000116138
DNAJC16
−0.362155352



586
1496
15074
chr3
142151502
142151735
ENSG00000114127
XRN1
0.361833765
+


587
1497
16004
chr3
47139444
47147610
ENSG00000181555
SETD2
0.361080779
+


588
1498
18658
chr5
95091099
95099324
ENSG00000164292
RHOBTB3
−0.360131826



589
1499
14519
chr22
38895404
38897285
ENSG00000100201
DDX17
−0.358942195



590
1500
13180
chr2
61505299
61508377
ENSG00000115464
USP34
−0.357526515



591
1501
00915
chr1
187296052
187298192
ENSG00000236030
LINC01036
−0.357165013



592
1502
10135
chr17
57808781
57816308
ENSG00000062716
VMP1
0.357082198
+


593
1503
11374
chr19
48744218
48744320
ENSG00000105483
CARD8
−0.356101853



594
1504
18196
chr5
43675612
43677908
ENSG00000112992
NNT
−0.354388888



595
1505
12283
chr2
202195192
202195556
ENSG00000155749
ALS2CR12
−0.352185007



596
1506
01405
chr1
231090078
231097049
ENSG00000143643
TTC13
−0.348760252



597
1507
03492
chr10
89268092
89280926
ENSG00000107789
MINPP1
0.348651579
+


598
1508
03880
chr11
120335945
120338017
ENSG00000196914
ARHGEF12
−0.348069091



599
1509
15092
chr3
143704384
143708679
ENSG00000181744
C3orf58
−0.342079566



600
1510
14257
chr21
40578033
40584633
ENSG00000185658
BRWD1
0.3413697
+


601
1511
01241
chr1
220179447
220180680
ENSG00000136628
EPRS
−0.341221548



602
1512
20408
chr7
24663284
24690331
ENSG00000105926
MPP6
−0.340271383



603
1513
18316
chr5
64824278
64847463
ENSG00000123219
CENPK
0.339380192
+


604
1514
20268
chr7
155465560
155465982
ENSG00000184863
RBM33
−0.336928551



605
1515
02266
chr1
78177431
78181553
ENSG00000077254
USP33
0.334726691
+


606
1516
05793
chr12
62708570
62749256
ENSG00000135655
USP15
−0.333244067



607
1517
05179
chr12
116668337
116675510
ENSG00000123066
MED13L
0.333078903
+


608
1518
05174
chr12
116668237
116675510
ENSG00000123066
MED13L
−0.332850418



609
1519
22708
chr9
88284399
88327481
ENSG00000135049
AGTPBP1
0.332274421
+


610
1520
20391
chr7
23650789
23651172
ENSG00000169193
CCDC126
0.329914132
+


611
1521
11362
chr19
47767859
47768203
ENSG00000105321
CCDC9
−0.327502934



612
1522
15121
chr3
148303908
148310052
NA
NA
−0.326264404



613
1523
02629
chr10
11523768
11527910
ENSG00000148429
USP6NL
−0.32531565



614
1524
15583
chr3
195101737
195112876
ENSG00000114331
ACAP2
−0.323229025



615
1525
07647
chr14
96986391
96991728
ENSG00000090060
PAPOLA
−0.322155377



616
1526
12697
chr2
24357988
24369956
ENSG00000219626
FAM228B
−0.322096001



617
1527
09124
chr16
68155889
68157024
ENSG00000072736
NFATC3
−0.319198223



618
1528
01967
chr1
51204534
51210447
ENSG00000185104
FAF1
−0.318899562



619
1529
09438
chr17
18768781
18769265
ENSG00000141127
PRPSAP2
−0.316240039



620
1530
06838
chr14
20811305
20811436
ENSG00000259001
RPPH1
−0.314587253



621
1531
01695
chr1
29362337
29391670
ENSG00000159023
EPB41
−0.314217162



622
1532
06864
chr14
23375403
23380612
ENSG00000100461
RBM23
−0.314204415



623
1533
00645
chr1
172520651
172526934
ENSG00000094975
SUCO
0.312814513
+


624
1534
02169
chr1
65830317
65831879
ENSG00000116675
DNAJC6
−0.312716499



625
1535
01429
chr1
23397717
23398690
ENSG00000004487
KDM1A
−0.312036199



626
1536
22699
chr9
88257741
88261333
ENSG00000135049
AGTPBP1
−0.309641223



627
1537
05351
chr12
124071293
124074996
ENSG00000086598
TMED2
0.308784196
+


628
1538
18039
chr5
179050037
179050165
ENSG00000169045
HNRNPH1
−0.30753603



629
1539
13457
chr2
9083315
9098771
ENSG00000143797
MBOAT2
−0.307407548



630
1540
07460
chr14
73614502
73614802
ENSG00000080815
PSEN1
0.307005543
+


631
1541
06725
chr13
96375495
96377506
ENSG00000102580
DNAJC3
−0.306943155



632
1542
20409
chr7
24663284
24708279
ENSG00000105926
MPP6
0.304556407
+


633
1543
16349
chr4
103635594
103647840
ENSG00000109323
MANBA
0.302316399
+


634
1544
12976
chr2
44436348
44436466
ENSG00000138032
PPM1B
−0.302057933



635
1545
05296
chr12
123064451
123065217
ENSG00000184445
KNTC1
−0.30064316



636
1546
13459
chr2
9083315
9114564
ENSG00000143797
MBOAT2
−0.300355128



637
1547
06911
chr14
31185129
31204064
ENSG00000092108
SCFD1
−0.300074829



638
1548
02265
chr1
78177431
78180468
ENSG00000077254
USP33
−0.299632707



639
1549
12279
chr2
202172241
202173973
ENSG00000155749
ALS2CR12
−0.298172541



640
1550
00882
chr1
185183638
185200840
ENSG00000116668
SWT1
−0.296383051



641
1551
21702
chr8
86253827
86254037
ENSG00000133742
CA1
−0.29494585



642
1552
06790
chr14
103915255
103923549
ENSG00000075413
MARK3
0.293796256
+


643
1553
06499
chr13
46577273
46594692
ENSG00000123200
ZC3H13
−0.293181418



644
1554
19179
chr6
163876310
163899928
ENSG00000112531
QKI
−0.293026578



645
1555
15650
chr3
195800800
195802231
ENSG00000072274
TFRC
0.2929012
+


646
1556
20947
chr8
103372298
103373854
ENSG00000104517
UBR5
0.291370225
+


647
1557
04013
chr11
16205431
16208501
ENSG00000110693
SOX6
−0.289491559



648
1558
06935
chr14
31416295
31425448
ENSG00000196792
STRN3
−0.287741171



649
1559
11737
chr2
128944256
128945188
ENSG00000136731
UGGT1
−0.286640606



650
1560
02879
chr10
17746429
17747740
ENSG00000136738
STAM
−0.285602456



651
1561
20719
chr7
77407654
77408131
ENSG00000187257
RSBN1L
−0.285228365



652
1562
17108
chr4
52729602
52744020
ENSG00000109184
DCUN1D4
−0.283579414



653
1563
21844
chr9
102722198
102722437
ENSG00000136874
STX17
−0.283108654



654
1564
12598
chr2
234296902
234299129
ENSG00000077044
DGKD
0.28152101
+


655
1565
00556
chr1
167935866
167944253
ENSG00000143164
DCAF6
−0.281376187



656
1566
19977
chr7
111926927
111927129
ENSG00000198839
ZNF277
−0.281018353



657
1567
02334
chr1
89236034
89237562
ENSG00000065243
PKN2
−0.280668473



658
1568
17522
chr5
111611022
111643187
ENSG00000129595
EPB41L4A
−0.280356159



659
1569
06933
chr14
31416295
31420150
ENSG00000196792
STRN3
−0.279320036



660
1570
05228
chr12
120995084
120995485
ENSG00000022840
RNF10
−0.278784866



661
1571
19200
chr6
170855190
170858201
ENSG00000008018
PSMB1
−0.276150872



662
1572
20976
chr8
109462051
109468159
ENSG00000104412
EMC2
−0.275735656



663
1573
21101
chr8
131164981
131181313
ENSG00000153317
ASAP1
−0.272436284



664
1574
20069
chr7
131071878
131084192
ENSG00000128585
MKLN1
−0.271776986



665
1575
14335
chr21
47819503
47822397
ENSG00000160299
PCNT
0.270707487
+


666
1576
04811
chr11
85722072
85742653
ENSG00000073921
PICALM
0.270078678
+


667
1577
11938
chr2
162036124
162061304
ENSG00000136560
TANK
−0.267208245



668
1578
12288
chr2
202208892
202216174
ENSG00000155749
ALS2CR12
−0.266117708



669
1579
15165
chr3
150834124
150845771
ENSG00000144893
MED12L
0.265293687
+


670
1580
20975
chr8
109462051
109462721
ENSG00000104412
EMC2
−0.261534879



671
1581
08355
chr15
66044716
66048810
ENSG00000174485
DENND4A
−0.259935274



672
1582
09860
chr17
41256138
41256973
ENSG00000012048
BRCA1
−0.259776938



673
1583
05579
chr12
32751430
32764217
ENSG00000139132
FGD4
0.25903045
+


674
1584
01085
chr1
207820661
207828620
ENSG00000244703
CD46P1
−0.258975588



675
1585
02595
chr10
112356155
112358048
ENSG00000108055
SMC3
−0.256162842



676
1586
06256
chr13
21305979
21306260
ENSG00000150456
N6AMT2
−0.254616892



677
1587
04491
chr11
65267990
65268121
ENSG00000251562
MALAT1
−0.254540106



678
1588
21280
chr8
21832180
21837714
ENSG00000130227
XPO7
−0.252240898



679
1589
19230
chr6
18236682
18237747
ENSG00000124795
DEK
−0.25029879



680
1590
02255
chr1
77672324
77676174
ENSG00000142892
PIGK
−0.249922354



681
1591
10689
chr18
29432408
29432626
ENSG00000153339
TRAPPC8
0.249647437
+


682
1592
08145
chr15
60734614
60737990
ENSG00000128915
NARG2
−0.24428689



683
1593
16950
chr4
37633006
37640126
ENSG00000181826
RELL1
−0.244144099



684
1594
16304
chr3
8977554
8983488
ENSG00000070950
RAD18
0.243189302
+


685
1595
16003
chr3
47139444
47144913
ENSG00000181555
SETD2
−0.241271464



686
1596
23019
chrX
154528097
154528458
ENSG00000155962
CLIC2
−0.240340585



687
1597
20241
chr7
151181822
151195266
ENSG00000106615
RHEB
−0.239763518



688
1598
12807
chr2
32312560
32314674
ENSG00000021574
SPAST
−0.238945495



689
1599
23171
chrX
67731690
67742759
ENSG00000181704
YIPF6
0.237642343
+


690
1600
12531
chr2
227771508
227779067
ENSG00000144468
RHBDD1
0.237572322
+


691
1601
18517
chr5
76758919
76760634
ENSG00000164253
WDR41
−0.237516399



692
1602
15527
chr3
185638891
185639914
ENSG00000136527
TRA2B
−0.233742808



693
1603
15305
chr3
169694733
169703653
ENSG00000008952
SEC62
−0.231918034



694
1604
22668
chr9
86297865
86301070
ENSG00000135018
UBQLN1
−0.231396117



695
1605
21527
chr8
52758220
52773806
ENSG00000168300
PCMTD1
0.22970906
+


696
1606
04781
chr11
85707868
85714494
ENSG00000073921
PICALM
0.229644698
+


697
1607
15556
chr3
193374868
193385069
ENSG00000198836
OPA1
−0.229630322



698
1608
16517
chr4
129857809
129891623
ENSG00000151466
SCLT1
−0.228977881



699
1609
08315
chr15
65994642
65995346
ENSG00000174485
DENND4A
−0.227421726



700
1610
19101
chr6
155095122
155116273
ENSG00000213079
SCAF8
0.227418193
+


701
1611
21658
chr8
71071739
71075089
ENSG00000140396
NCOA2
0.226746325
+


702
1612
09532
chr17
27160969
27161344
ENSG00000173065
FAM222B
0.226375798
+


703
1613
16574
chr4
140046317
140060651
ENSG00000109381
ELF2
0.225090987
+


704
1614
18512
chr5
76342171
76344097
ENSG00000164252
AGGF1
0.223464328
+


705
1615
14107
chr21
17205666
17214859
ENSG00000155313
USP25
0.223298494
+


706
1616
21420
chr8
37971709
37976881
ENSG00000129691
ASH2L
0.222854832
+


707
1617
22434
chr9
3488775
3490345
ENSG00000080298
RFX3
−0.222562079



708
1618
03355
chr10
70719561
70720005
ENSG00000165732
DDX21
0.219375254
+


709
1619
10521
chr18
13037235
13040955
ENSG00000101639
CEP192
−0.219207234



710
1620
16492
chr4
128995614
129003460
ENSG00000138709
LARP1B
−0.218474838



711
1621
20383
chr7
23224688
23226765
ENSG00000136243
NUPL2
0.217804105
+


712
1622
07724
chr15
31266516
31269158
ENSG00000166912
MTMR10
−0.215163364



713
1623
21531
chr8
52773404
52773806
ENSG00000168300
PCMTD1
−0.21410755



714
1624
07279
chr14
55647930
55650471
ENSG00000126787
DLGAP5
−0.212145422



715
1625
19458
chr6
42630995
42633983
ENSG00000024048
UBR2
−0.209515648



716
1626
22716
chr9
88307603
88327481
ENSG00000135049
AGTPBP1
−0.209103823



717
1627
13376
chr2
74300675
74307718
ENSG00000187605
TET3
−0.208832195



718
1628
09780
chr17
38551700
38552717
ENSG00000131747
TOP2A
−0.207376107



719
1629
20641
chr7
66458203
66459328
ENSG00000126524
SBDS
0.206299772
+


720
1630
13248
chr2
64083439
64085070
ENSG00000169764
UGP2
0.205403768
+


721
1631
02469
chr1
9991948
9994918
ENSG00000162441
LZIC
−0.203608619



722
1632
03025
chr10
31661946
31676195
ENSG00000148516
ZEB1
0.201711818
+


723
1633
03075
chr10
32854485
32873232
ENSG00000150076
C10ORF68
−0.200304746



724
1634
19518
chr6
4891946
4892613
ENSG00000153046
CDYL
−0.199527565



725
1635
03513
chr10
91511102
91522592
ENSG00000138182
KIF20B
−0.199045211



726
1636
17257
chr4
77065301
77065626
ENSG00000138750
NUP54
−0.196900723



727
1637
03656
chr10
98667021
98667504
ENSG00000196233
LCOR
−0.195915307



728
1638
15717
chr3
197592293
197593090
ENSG00000186001
LRCH3
−0.195801766



729
1639
05517
chr12
28408513
28412375
ENSG00000123106
CCDC91
−0.195185653



730
1640
20627
chr7
65595730
65599361
ENSG00000241258
CRCP
0.194915841
+


731
1641
15459
chr3
182602540
182605501
ENSG00000058063
ATP11B
0.194190746
+


732
1642
07824
chr15
41648236
41669502
ENSG00000137804
NUSAP1
−0.192452614



733
1643
19559
chr6
56915571
56920595
ENSG00000168116
KIAA1586
−0.191962941



734
1644
17227
chr4
73956383
73958017
ENSG00000132466
ANKRD17
−0.191278185



735
1645
08494
chr15
77657504
77681144
ENSG00000173517
PEAK1
0.190411294
+


736
1646
20064
chr7
131060182
131084192
ENSG00000128585
MKLN1
−0.190090712



737
1647
06896
chr14
31139461
31144271
ENSG00000092108
SCFD1
−0.189779123



738
1648
13805
chr20
39721111
39729993
ENSG00000198900
TOP1
0.188631066
+


739
1649
15234
chr3
157839891
157841780
ENSG00000174891
RSRC1
−0.188143907



740
1650
18114
chr5
36982266
36986403
ENSG00000164190
NIPBL
0.187975589
+


741
1651
22270
chr9
17330629
17342442
ENSG00000044459
CNTLN
−0.187514331



742
1652
09250
chr16
72122885
72124685
ENSG00000140830
TXNL4B
0.187490707
+


743
1653
18068
chr5
179976930
179980471
ENSG00000113300
CNOT6
0.18705281
+


744
1654
21532
chr8
52773420
52773806
ENSG00000168300
PCMTD1
0.184831559
+


745
1655
10651
chr18
2718155
2718432
ENSG00000101596
SMCHD1
0.184800168
+


746
1656
13115
chr2
58311223
58316858
ENSG00000028116
VRK2
−0.183814521



747
1657
18933
chr6
13579682
13584457
ENSG00000124523
SIRT5
−0.182083653



748
1658
22782
chr9
98740342
98766983
ENSG00000182150
ERCC6L2
−0.181217059



749
1659
22389
chr9
33948371
33956144
ENSG00000137073
UBAP2
−0.180446169



750
1660
16786
chr4
170428187
170429482
ENSG00000137601
NEK1
−0.180057374



751
1661
17237
chr4
74852759
74852887
ENSG00000163736
PPBP
0.179149328
+


752
1662
12672
chr2
242099746
242102816
ENSG00000115685
PPP1R7
0.178794715
+


753
1663
06705
chr13
95813442
95840796
ENSG00000125257
ABCC4
−0.176068897



754
1664
18425
chr5
72157634
72161556
ENSG00000083312
TNPO1
0.175732494
+


755
1665
17203
chr4
6995910
7002978
ENSG00000132405
TBC1D14
−0.17425856



756
1666
21660
chr8
71126137
71128999
ENSG00000140396
NCOA2
−0.174239515



757
1667
00651
chr1
172525008
172526934
ENSG00000094975
SUCO
0.173485924
+


758
1668
17135
chr4
54292038
54294350
ENSG00000145216
FIP1L1
−0.172407695



759
1669
06702
chr13
95813442
95822882
ENSG00000125257
ABCC4
−0.171000298



760
1670
20191
chr7
141755799
141782010
ENSG00000257335
MGAM
0.17013547
+


761
1671
09521
chr17
26490568
26499644
ENSG00000087095
NLK
−0.169494931



762
1672
01122
chr1
21097422
21100103
ENSG00000127483
HP1BP3
−0.168563183



763
1673
12330
chr2
203329531
203332412
ENSG00000204217
BMPR2
−0.166786705



764
1674
05912
chr12
69983264
69987393
ENSG00000166226
CCT2
−0.166663778



765
1675
02086
chr1
61577042
61578015
ENSG00000162599
NFIA
0.166193266
+


766
1676
10343
chr17
65941524
65944422
ENSG00000171634
BPTF
−0.164705882



767
1677
20291
chr7
156619298
156629579
ENSG00000105983
LMBR1
−0.160626296



768
1678
14199
chr21
37711076
37717005
ENSG00000159256
MORC3
0.154418454
+


769
1679
21045
chr8
124349864
124351686
ENSG00000156802
ATAD2
0.153641031
+


770
1680
14325
chr21
47768925
47769734
ENSG00000160299
PCNT
0.153108926
+


771
1681
19798
chr6
90556280
90566918
ENSG00000118412
CASP8AP2
−0.153053021



772
1682
02845
chr10
15875628
15889942
ENSG00000148481
FAM188A
−0.152980029



773
1683
03993
chr11
16117541
16119234
ENSG00000110693
SOX6
−0.151781747



774
1684
16545
chr4
129913321
129925031
ENSG00000151466
SCLT1
0.151559153
+


775
1685
06880
chr14
31050069
31050322
ENSG00000092140
G2E3
−0.149358981



776
1686
18314
chr5
64824278
64825026
ENSG00000123219
CENPK
0.148684367
+


777
1687
20221
chr7
148543561
148544397
ENSG00000106462
EZH2
−0.147611287



778
1688
07570
chr14
90397884
90398971
ENSG00000140025
EFCAB11
0.14601488
+


779
1689
02091
chr1
61577042
61624827
ENSG00000270742
RP4-802A10.1
−0.144497254



780
1690
17014
chr4
39915230
39927553
ENSG00000121892
PDS5A
0.143294254
+


781
1691
13755
chr20
35695126
35696589
ENSG00000080839
RBL1
0.140612985
+


782
1692
07121
chr14
50130032
50141145
ENSG00000100479
POLE2
0.140484442
+


783
1693
20223
chr7
148543588
148544397
ENSG00000106462
EZH2
−0.140309562



784
1694
00933
chr1
193044949
193046180
ENSG00000116747
TROVE2
−0.139596873



785
1695
07159
chr14
50292584
50298079
ENSG00000165525
NEMF
−0.138077345



786
1696
10346
chr17
65941524
65972074
ENSG00000171634
BPTF
−0.13780517



787
1697
21643
chr8
68044185
68049838
ENSG00000104218
CSPP1
−0.136932953



788
1698
13454
chr2
9079949
9098771
ENSG00000143797
MBOAT2
0.133919127
+


789
1699
18540
chr5
78914469
78915906
ENSG00000164329
PAPD4
0.132799362
+


790
1700
18111
chr5
36953719
36976504
ENSG00000164190
NIPBL
0.132583843
+


791
1701
10507
chr18
12370847
12371690
ENSG00000141385
AFG3L2
0.132359086
+


792
1702
18863
chr6
126196034
126199516
ENSG00000111912
NCOA7
0.132255998
+


793
1703
00144
chr1
114372213
114377061
ENSG00000134242
PTPN22
−0.131778107



794
1704
03188
chr10
5741487
5756170
ENSG00000108021
FAM208B
−0.131204074



795
1705
18463
chr5
72370568
72373320
ENSG00000157107
FCHO2
−0.130937396



796
1706
12665
chr2
24181170
24199945
ENSG00000173960
UBXN2A
0.129857265
+


797
1707
22491
chr9
37126308
37126939
ENSG00000147905
ZCCHC7
−0.129829873



798
1708
08822
chr16
1859238
1859834
ENSG00000063854
HAGH
−0.128958795



799
1709
09342
chr16
89291126
89292039
ENSG00000170100
ZNF778
0.124901078
+


800
1710
21674
chr8
74585341
74601048
ENSG00000040341
STAU2
0.124142904
+


801
1711
03883
chr11
120343758
120348235
ENSG00000196914
ARHGEF12
−0.123923895



802
1712
19740
chr6
84894904
84896341
ENSG00000135315
KIAA1009
0.123773636
+


803
1713
20697
chr7
77210743
77212967
ENSG00000127947
PTPN12
−0.123256841



804
1714
05413
chr12
14599921
14610229
ENSG00000171681
ATF7IP
0.123186507
+


805
1715
02968
chr10
27431315
27434519
ENSG00000136758
YME1L1
−0.121868499



806
1716
00185
chr1
1158623
1159348
ENSG00000078808
SDF4
−0.119575165



807
1717
22996
chrX
147743428
147744289
ENSG00000155966
AFF2
−0.118456414



808
1718
02080
chr1
61575449
61578015
ENSG00000162599
NFIA
−0.118244481



809
1719
04138
chr11
33127112
33127610
ENSG00000176102
CSTF3
−0.114563088



810
1720
13631
chr20
32617574
32619410
ENSG00000125970
RALY
−0.113556294



811
1721
21883
chr9
111812562
111812972
ENSG00000106771
TMEM245
0.110008883
+


812
1722
10514
chr18
12999419
13019205
ENSG00000101639
CEP192
−0.108819051



813
1723
15292
chr3
167754623
167759262
ENSG00000173905
GOLIM4
0.107484468
+


814
1724
09077
chr16
58593707
58594266
ENSG00000125107
CNOT1
−0.105014078



815
1725
20343
chr7
17885217
17890587
ENSG00000071189
SNX13
0.10426073
+


816
1726
06102
chr12
96717725
96728643
ENSG00000059758
CDK17
−0.102490008



817
1727
21911
chr9
114676884
114678116
ENSG00000148154
UGCG
−0.10000879



818
1728
16667
chr4
151719232
151729550
ENSG00000198589
LRBA
0.098635477
+


819
1729
08476
chr15
76580186
76585041
ENSG00000140374
ETFA
0.095064522
+


820
1730
02444
chr1
95609446
95639445
ENSG00000152078
TMEM56
0.09412163
+


821
1731
17034
chr4
42024854
42025401
ENSG00000014824
SLC30A9
−0.093848466



822
1732
00281
chr1
150202905
150204264
ENSG00000143401
ANP32E
0.092891301
+


823
1733
03999
chr11
16117541
16208501
ENSG00000110693
SOX6
−0.091724724



824
1734
22667
chr9
86294689
86301070
ENSG00000135018
UBQLN1
0.091306013
+


825
1735
22261
chr9
17226200
17236586
ENSG00000044459
CNTLN
−0.091125079



826
1736
10952
chr18
76886266
76914555
ENSG00000166377
ATP9B
−0.089171438



827
1737
19236
chr6
18256591
18258636
ENSG00000124795
DEK
−0.088063183



828
1738
13408
chr2
85875052
85875976
ENSG00000168883
USP39
−0.088017214



829
1739
07053
chr14
39620949
39628754
ENSG00000182400
TRAPPC6B
−0.086392887



830
1740
04125
chr11
2991032
2993473
ENSG00000205531
NAP1L4
−0.085668582



831
1741
09176
chr16
69404385
69406258
ENSG00000132604
TERF2
0.084750698
+


832
1742
20988
chr8
117668094
117671219
ENSG00000147677
EIF3H
0.084610875
+


833
1743
18339
chr5
65284462
65290692
ENSG00000112851
ERBB2IP
−0.080397524



834
1744
16176
chr3
56694758
56707753
ENSG00000163946
FAM208A
0.079486106
+


835
1745
11220
chr19
30476129
30477324
ENSG00000105176
URI1
−0.078233761



836
1746
14276
chr21
40600425
40601362
ENSG00000185658
BRWD1
−0.077947077



837
1747
11806
chr2
148730307
148733544
ENSG00000115947
ORC4
−0.073487662



838
1748
07606
chr14
92473983
92477416
ENSG00000100815
TRIP11
0.071182875
+


839
1749
17751
chr5
138979956
138994551
ENSG00000131508
UBE2D2
−0.070241902



840
1750
07232
chr14
52977957
53011089
ENSG00000087301
TXNDC16
−0.067999092



841
1751
00197
chr1
117944807
117948267
ENSG00000198162
MAN1A2
−0.0648967



842
1752
18265
chr5
56160560
56161804
ENSG00000095015
MAP3K1
−0.064020699



843
1753
04703
chr11
77336007
77336863
ENSG00000074201
CLNS1A
−0.063683386



844
1754
22665
chr9
86293355
86301070
ENSG00000135018
UBQLN1
0.063068015
+


845
1755
19216
chr6
17669205
17669777
ENSG00000124789
NUP153
−0.062863001



846
1756
17361
chr4
87967317
87968746
ENSG00000172493
AFF1
−0.062349721



847
1757
09584
chr17
28011580
28030080
ENSG00000141298
SSH2
−0.062209042



848
1758
06777
chr14
103865287
103871604
ENSG00000075413
MARK3
0.061116967
+


849
1759
21042
chr8
124346117
124348772
ENSG00000156802
ATAD2
0.060568881
+


850
1760
15108
chr3
148164052
148173318
NA
NA
0.057927058
+


851
1761
06177
chr13
113219424
113223573
ENSG00000126216
TUBGCP3
−0.0579005



852
1762
10983
chr18
9182379
9221997
ENSG00000101745
ANKRD12
−0.057876407



853
1763
10604
chr18
21087948
21089243
ENSG00000141452
C18orf8
−0.055418355



854
1764
08542
chr15
85223943
85234875
ENSG00000140612
SEC11A
0.054970161
+


855
1765
02224
chr1
70758070
70781249
ENSG00000118454
ANKRD13C
0.054606996
+


856
1766
01705
chr1
29386933
29424447
ENSG00000159023
EPB41
0.054384837
+


857
1767
07102
chr14
45705016
45706924
ENSG00000129534
MIS18BP1
−0.054279736



858
1768
04624
chr11
73418464
73429763
ENSG00000175582
RAB6A
−0.053950256



859
1769
06763
chr14
102661274
102664184
ENSG00000140153
WDR20
0.053904813
+


860
1770
00399
chr1
155691307
155695810
ENSG00000132676
DAP3
−0.053660957



861
1771
10224
chr17
59853761
59857762
ENSG00000136492
BRIP1
0.05142855
+


862
1772
16576
chr4
140058783
140060651
ENSG00000109381
ELF2
0.051075392
+


863
1773
16378
chr4
105439733
105440611
ENSG00000245384
AC004053.1
−0.050390264



864
1774
08827
chr16
18809246
18810156
ENSG00000170540
ARL6IP1
−0.050301092



865
1775
09125
chr16
68155889
68160513
ENSG00000072736
NFATC3
0.049811818
+


866
1776
08521
chr15
80412669
80415142
ENSG00000086666
ZFAND6
0.04971732
+


867
1777
22418
chr9
33996220
34017187
ENSG00000137073
UBAP2
0.048875187
+


868
1778
17441
chr4
99495607
99496056
ENSG00000168785
TSPAN5
0.046088362
+


869
1779
04651
chr11
74500670
74528759
ENSG00000166439
RNF169
0.04506624
+


870
1780
10064
chr17
53478829
53481229
ENSG00000108960
MMD
−0.043505886



871
1781
11243
chr19
33604672
33605325
ENSG00000076650
GPATCH1
−0.041554229



872
1782
13939
chr20
47685251
47686834
ENSG00000124207
CSE1L
−0.041440338



873
1783
20310
chr7
158552176
158557544
ENSG00000117868
ESYT2
−0.039332637



874
1784
12027
chr2
172782046
172809519
ENSG00000128708
HAT1
0.038890112
+


875
1785
05780
chr12
58340777
58347472
ENSG00000166896
XRCC6BP1
0.036777678
+


876
1786
23021
chrX
154645235
154649223
NA
NA
0.036352385
+


877
1787
14612
chr22
41566409
41569788
ENSG00000100393
EP300
−0.035726557



878
1788
12275
chr2
202163960
202173973
ENSG00000155749
ALS2CR12
0.033395502
+


879
1789
00568
chr1
168007608
168014465
ENSG00000143164
DCAF6
0.033275143
+


880
1790
08122
chr15
59204761
59209198
ENSG00000137776
SLTM
−0.032521344



881
1791
01196
chr1
213251037
213290752
ENSG00000136643
RPS6KC1
−0.032028145



882
1792
08360
chr15
66641397
66641775
ENSG00000075131
TIPIN
0.030872075
+


883
1793
16383
chr4
106155053
106158508
ENSG00000168769
TET2
−0.030773502



884
1794
11998
chr2
171884848
171902872
ENSG00000198586
TLK1
−0.030704957



885
1795
06802
chr14
103923478
103928798
ENSG00000075413
MARK3
0.03038484
+


886
1796
08341
chr15
66021409
66031213
ENSG00000174485
DENND4A
0.029114553
+


887
1797
15592
chr3
195780288
195803993
ENSG00000072274
TFRC
0.027473612
+


888
1798
02669
chr10
119100490
119104960
ENSG00000165650
PDZD8
−0.026742711



889
1799
22521
chr9
4823547
4827033
ENSG00000120158
RCL1
0.025254641
+


890
1800
14172
chr21
35138178
35140132
ENSG00000205726
ITSN1
−0.023931854



891
1801
22227
chr9
139115608
139118720
ENSG00000165661
QSOX2
0.019159719
+


892
1802
21772
chr8
95897294
95897786
ENSG00000175305
CCNE2
−0.017276448



893
1803
12762
chr2
26505712
26505919
ENSG00000138029
HADHB
0.016059531
+


894
1804
08619
chr15
93467550
93472321
ENSG00000173575
CHD2
0.015937506
+


895
1805
01109
chr1
21076215
21100103
ENSG00000127483
HP1BP3
0.015119097
+


896
1806
18951
chr6
13639794
13644961
ENSG00000010017
RANBP9
0.014679425
+


897
1807
01465
chr1
235963619
235964397
ENSG00000143669
LYST
−0.013840453



898
1808
03064
chr10
32759991
32762951
ENSG00000216937
CCDC7
0.012950478
+


899
1809
19408
chr6
41839301
41859613
ENSG00000164663
USP49
−0.012058437



900
1810
21222
chr8
142264087
142264728
ENSG00000022567
SLC45A4
0.011653698
+


901
1811
21894
chr9
114148656
114154104
ENSG00000136813
KIAA0368
0.011310291
+


902
1812
17147
chr4
56277780
56284152
ENSG00000134851
TMEM165
0.010589367
+


903
1813
12555
chr2
231222519
231226412
ENSG00000185404
SP140L
−0.010261584



904
1814
03671
chr10
99196173
99197507
ENSG00000171311
EXOSC1
−0.008914606



905
1815
15853
chr3
33725850
33738425
ENSG00000163539
CLASP2
0.007974047
+


906
1816
15700
chr3
197541778
197547301
ENSG00000186001
LRCH3
0.007744757
+


907
1817
03351
chr10
70547683
70548085
ENSG00000060339
CCAR1
0.006380515
+


908
1818
11660
chr2
120684173
120692534
ENSG00000088179
PTPN4
−0.003481975



909
1819
00498
chr1
160293220
160302347
ENSG00000122218
COPA
−0.003328713



910
1820
11490
chr19
8538547
8539128
ENSG00000099783
HNRNPM
+0.0001735361
+





“SEQ ID NO: junction” refers to the SEQ ID NO: encoding the sequence surrounding the exon-exon junction in a head-to-tail arrangement, i.e. 20 nucleotides upstream and 20 nucleotides downstream of the actual junction.


“SEQ ID NO: full length” refers to the SEQ ID NO: encoding the entire sequence identified for the respective circRNA. Importantly, these sequences do not comprise any intronic sequences which are assumed to be spliced out during circRNA biogenesis.
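The relation between the two kinds of sequences defined above can be illustrated with a minimal Python sketch. It assumes that the full-length sequence is written linearly from the first nucleotide of the 5′-most exon to the last nucleotide of the 3′-most exon, so that the head-to-tail junction joins the end of the string back to its start; this orientation, the function name `junction_window` and the toy input are illustrative assumptions, not taken from the Sequence Listing.

```python
def junction_window(full_length: str, flank: int = 20) -> str:
    """Return the sequence spanning the head-to-tail junction.

    The window consists of `flank` nucleotides upstream of the junction
    (the 3' end of the linear representation) followed by `flank`
    nucleotides downstream of it (the 5' end of the linear representation).
    """
    if flank <= 0:
        raise ValueError("flank must be positive")
    if len(full_length) < flank:
        raise ValueError("sequence is shorter than the requested flank")
    return full_length[-flank:] + full_length[:flank]


if __name__ == "__main__":
    # Toy sequence, not a sequence from the Sequence Listing.
    toy = "AAAAA" + "C" * 30 + "GGGGG"
    print(junction_window(toy, flank=5))
```

With the default `flank=20` the result has 40 nucleotides, which corresponds to the 20 nucleotides upstream and 20 nucleotides downstream of the junction described for the junction sequences.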


“circID” indicates the internal reference bloodCirc_# of the inventors.


“chr” denotes the chromosome the circRNA is stemming from (chrM is the mitochondrial chromosome).


“start” and “stop” indicate where on the respective chromosome the start and the stop of the circRNA encoding sequence is found. The reference sequence is hg19 downloaded from the UCSC genome browser (see Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al.; The human genome browser at UCSC. Genome Research. 2002 June; 12(6): 996-1006).


“gene” and “gene_name” denote the gene as annotated in the UCSC genome browser and the commonly used name, respectively.


“NA” indicates that the gene is not yet annotated.


The “score” is calculated by subtracting the mean values of each circRNA in the two groups (healthy and diseased (Alzheimer's)) and dividing by the highest standard deviation (cp. FIG. 15). Negative score = decreased levels or absence of the respective circRNA found in samples of diseased subjects. Positive score = increased levels and/or presence of the respective circRNA found in samples of diseased subjects.
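The score calculation can be sketched in Python as follows. The sketch rests on two assumptions that are not spelled out in the definition: the difference is taken as diseased minus healthy, so that a positive score corresponds to increased levels in diseased subjects, and the “highest standard deviation” is read as the larger of the two sample standard deviations of the groups. The function names and the example values are hypothetical.

```python
from statistics import mean, stdev


def circ_score(healthy: list[float], diseased: list[float]) -> float:
    """Difference of group means divided by the larger group standard deviation."""
    # Assumption: the larger of the two per-group sample standard deviations.
    largest_sd = max(stdev(healthy), stdev(diseased))
    if largest_sd == 0:
        raise ValueError("both groups have zero variance; score is undefined")
    # Assumption: diseased minus healthy, so positive means increased in disease.
    return (mean(diseased) - mean(healthy)) / largest_sd


def diseased_flag(score: float) -> str:
    """Map the sign of the score to the '+' / '-' notation of the table."""
    return "+" if score > 0 else "-"


if __name__ == "__main__":
    # Hypothetical normalized expression values for one circRNA.
    s = circ_score(healthy=[1.0, 2.0, 3.0], diseased=[3.0, 4.0, 5.0])
    print(s, diseased_flag(s))
```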


“diseased” denotes whether increased levels or presence of the respective circRNA are indicative for the presence of the neurodegenerative disease (“+”), or whether decreased levels or absence of the respective circRNA are indicative for the presence of a neurodegenerative disease (“−”).


The sequences in the Sequence Listing are DNA sequences encoding the actual circRNA. Hence, the actual circRNA is the listed sequence with “T” being exchanged by an “U”.
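As a minimal illustration of this conversion, the following Python sketch replaces every “T” of a listed DNA sequence by “U”. The function name `to_rna` and the input are hypothetical; the input is not a sequence from the Sequence Listing.

```python
def to_rna(dna: str) -> str:
    """Convert a DNA sequence into the RNA sequence it encodes (T -> U)."""
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    print(to_rna("ATGCT"))
```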


The 910 circRNAs listed here resulted from an expression cut-off on all detected circRNAs in the sample set. This cut-off was chosen such that a Principal Component Analysis (PCA, see above) is not affected by expression noise.






As outlined herein, it may be desirable to determine the presence or absence, or the level, of more than one circRNA in order to increase the diagnostic significance of the method according to the present invention. Hence, in a preferred embodiment of the method for diagnosing a neurodegenerative disease the levels of more than one circRNA comprising a sequence encoded by a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, or a sequence having at least 70% identity thereto, are determined and compared to the respective control level. The identity is preferably at least 80%, more preferably at least 90%, more preferably at least 95%. In a preferred embodiment the levels of at least 100 circRNAs comprising a sequence encoded by a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, or a sequence having at least 70% identity thereto, are determined in a sample of a bodily fluid of said subject and compared to the respective control level; preferably the levels of at least 150 circRNAs and more preferably the levels of at least 200 circRNAs comprising a sequence encoded by a sequence selected from the group consisting of nt 11 to 30 of any one of SEQ ID NO:1 to SEQ ID NO:910, or a sequence having at least 70% identity thereto, are determined in a sample of a bodily fluid of said subject and compared to the respective control level. The identity is preferably at least 80%, more preferably at least 90%, more preferably at least 95% to the respective sequences in the SEQ ID NO:.


In one embodiment the circRNAs comprising a sequence encoded by SEQ ID NOs:1 to 910 have the sequence as determined by the inventors, i.e. have a sequence encoded by the sequence of any of SEQ ID NOs: 911 to 1820. Hence, in one embodiment of the method for diagnosing a neurodegenerative disease the levels of more than one circRNA having a sequence being at least 70% identical to any of the sequences as encoded by SEQ ID NO: 911 to 1820 are determined in a sample of a bodily fluid of said subject and compared to the respective control level. Particularly preferred, the levels of at least 100 circRNAs having a sequence being at least 70% identical to any of the sequences as encoded by SEQ ID NOs: 911 to 1820 are determined in a sample of a bodily fluid of said subject and compared to the respective control level, preferably the levels of at least 150, and more preferably the levels of at least 200 circRNAs having a sequence being at least 70% identical to any of the sequences as encoded by SEQ ID NOs: 911 to 1820 are determined in a sample of a bodily fluid of said subject and compared to the respective control level. In more preferred embodiments said sequence identity is at least 80%, preferably at least 90%, more preferably at least 95%, yet more preferably at least 99% to the outlined sequences. In a further preferred embodiment the circRNAs have the sequences as encoded by any one of SEQ ID NOs: 911 to 1820. The levels are preferably detected by using hybridization probes specifically hybridizing to the sequences of nt 11 to 30 of SEQ ID NO:1 to 910 or specifically hybridizing to the sequences of SEQ ID NO:1 to 910, or an RNA sequence encoded by these sequences, or the respective reverse complements thereof.


The inventors found that the first 200 circRNAs as encoded by SEQ ID NO:1 to 910 have particularly suitable predictive and diagnostic value. Hence, in a preferred embodiment of the method for diagnosing a neurodegenerative disease said at least 100, preferably at least 150, more preferably at least 200 circRNAs comprise a sequence as encoded by any of SEQ ID NO:1 to SEQ ID NO:200, or a sequence having at least 70% identity thereto, or the circRNAs have a sequence as encoded by any of SEQ ID NO: 911 to 1110, or a sequence having at least 70% identity thereto. The identity is preferably at least 80%, more preferably at least 90%, more preferably at least 95%. In a preferred embodiment of the method for diagnosing a neurodegenerative disease the levels of more than one circRNA comprising a sequence encoded by a sequence being at least 70% identical to a sequence selected from the group consisting of the sequence of nt 11 to nt 30 of any one of SEQ ID NO: 1 to 200 are determined in a sample of a bodily fluid of said subject and compared to the respective control level. Particularly preferred, the levels of at least 100 circRNAs comprising a sequence encoded by a sequence being at least 70% identical to a sequence selected from the group consisting of the sequence of nt 11 to nt 30 of any one of SEQ ID NO: 1 to 200 are determined in a sample of a bodily fluid of said subject and compared to the respective control level, preferably of at least 150 and more preferably the levels of all 200 circRNAs comprising a sequence encoded by a sequence being at least 70% identical to a sequence selected from the group consisting of the sequence of nt 11 to nt 30 of any one of SEQ ID NO: 1 to 200 are determined in a sample of a bodily fluid of said subject and compared to the respective control level. The identity is preferably at least 80%, more preferably at least 90%, more preferably at least 95%, yet more preferred 100%. The levels are preferably detected by using hybridization probes specifically hybridizing to the sequences of nt 11 to 30 of SEQ ID NO:1 to 200 or specifically hybridizing to the sequences of SEQ ID NO:1 to 200, or an RNA sequence encoded by these sequences, or the reverse complements thereof.


In one embodiment the circRNAs comprising a sequence encoded by any of the SEQ ID NOs:1 to 200 have the sequence as determined by the inventors, i.e. a sequence encoded by any of the SEQ ID NOs: 911 to 1110. Hence, in one embodiment of the method for diagnosing a neurodegenerative disease the levels of more than one circRNA having a sequence encoded by a sequence being at least 70% identical to any of the sequences of SEQ ID NO: 911 to 1110 are determined in a sample of a bodily fluid of said subject and compared to the respective control level. Particularly preferred, the levels of at least 100 circRNAs having a sequence encoded by a sequence being at least 70% identical to any of the sequences of SEQ ID NOs: 911 to 1110 are determined in a sample of a bodily fluid of said subject and compared to the respective control level, preferably the levels of at least 150, and more preferably the levels of at least 200 circRNAs having a sequence encoded by a sequence being at least 70% identical to any of the sequences of SEQ ID NOs: 911 to 1110 are determined in a sample of a bodily fluid of said subject and compared to the respective control level. In more preferred embodiments said sequence identity is at least 80%, preferably at least 90%, more preferably at least 95%, yet more preferably at least 99% to the outlined sequences. In a further preferred embodiment the circRNAs have the sequences encoded by the sequences as set out in any one of SEQ ID NOs: 911 to 1110. The levels are preferably detected by using hybridization probes specifically hybridizing to the sequences of nt 11 to 30 of SEQ ID NO:1 to 200 or specifically hybridizing to the sequences of SEQ ID NO:1 to 200, or an RNA sequence encoded by these sequences, or the reverse complements thereof.


The determination of percent identity between two sequences is accomplished using the mathematical algorithm of Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90: 5873-5877. Such an algorithm is incorporated into the BLASTN and BLASTP programs of Altschul et al. (1990) J. Mol. Biol. 215: 403-410. BLAST nucleotide searches are performed with the BLASTN program, score=100, wordlength=12, to obtain nucleotide sequences homologous to the nucleic acid sequences outlined herein. BLAST protein searches are performed with the BLASTP program, score=50, wordlength=3, to obtain amino acid sequences homologous to the EPO variant polypeptide, respectively. To obtain gapped alignments for comparative purposes, Gapped BLAST is utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs are used.
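For illustration only, the identity thresholds used herein (at least 70%, 80%, 90%, 95%) can be checked on two sequences that have already been aligned, as in the following Python sketch. This is a deliberately simplified stand-in and does not implement the Karlin and Altschul statistics or BLAST; the choice of the alignment length as denominator, the use of “-” as gap character and the function names are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Both inputs must be pre-aligned strings of equal length; '-' marks a gap.
    Gap columns count towards the alignment length but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the two aligned sequences reach the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy sequences differing in one of ten positions.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    print(meets_identity("ACGTACGTAC", "ACGTACGTAA", threshold=95.0))
```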


In order to improve the diagnostic value, it may be desirable to determine the number of circRNAs showing increased or decreased levels being indicative for the neurodegenerative disease as outlined in Table 1. In a preferred embodiment the number of circRNAs showing increased or decreased levels being indicative for the neurodegenerative disease as outlined in Table 1 is above the 80th percentile of a control population, more preferably above the 90th percentile, yet more preferred above the 95th percentile. In a preferred embodiment of the method for diagnosing a neurodegenerative disease the presence of increased or decreased levels as defined in Table 1 under “diseased” for the respective circRNA for at least 10, preferably at least 50, more preferably at least 100 circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease.
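The counting approach described above can be sketched in Python as follows. The sketch assumes that each circRNA has a single control level, that the direction (“+” or “-”) is taken from Table 1, and that the percentile of the control population is computed by the nearest-rank method; the data structures, function names and example values are hypothetical.

```python
import math


def count_indicative(levels: dict, control_levels: dict, direction: dict) -> int:
    """Count circRNAs whose level deviates from control in the tabulated direction."""
    count = 0
    for circ_id, sign in direction.items():
        if sign == "+" and levels[circ_id] > control_levels[circ_id]:
            count += 1
        elif sign == "-" and levels[circ_id] < control_levels[circ_id]:
            count += 1
    return count


def exceeds_percentile(count: int, control_counts: list, percentile: float = 80.0) -> bool:
    """True if `count` lies above the given percentile of the control population.

    Assumption: nearest-rank percentile of the counts observed in controls.
    """
    if not control_counts:
        raise ValueError("control population must not be empty")
    ordered = sorted(control_counts)
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    return count > ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical data for three circRNAs and ten control subjects.
    direction = {"a": "+", "b": "-", "c": "+"}
    levels = {"a": 5.0, "b": 1.0, "c": 2.0}
    control = {"a": 3.0, "b": 2.0, "c": 2.0}
    n = count_indicative(levels, control, direction)
    print(n, exceeds_percentile(n, [0, 1, 1, 2, 0, 1, 0, 1, 1, 0]))
```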


In one particular embodiment of the method for diagnosing a neurodegenerative disease the levels of all circRNAs comprising a sequence encoded by a sequence selected from the group consisting of nt 11 to 30 of any of SEQ ID NO:1 to SEQ ID NO:200, or a sequence having at least 70% identity thereto, are determined; wherein preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease. Preferably, said method comprises the determination of the levels of all circRNAs comprising a sequence encoded by a sequence selected from the group consisting of any of SEQ ID NO:1 to SEQ ID NO:200 or a sequence having at least 70% identity thereto, wherein preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease. Further preferred, all circRNAs with a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:911 to SEQ ID NO:1110 are detected, wherein preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease. In more preferred embodiments said sequence identity is at least 80%, preferably at least 90%, more preferably at least 95%, yet more preferably at least 99% to the outlined sequences, yet more preferred the identity is 100%.


In one particular embodiment of the method for diagnosing a neurodegenerative disease the levels of all circRNAs comprising a sequence encoded by a sequence selected from the group consisting of nt 11 to 30 of any of SEQ ID NO:1 to SEQ ID NO:910, or a sequence having at least 70% identity thereto, are determined; wherein preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease. Preferably, said method comprises the determination of the levels of all circRNAs comprising a sequence encoded by a sequence selected from the group consisting of any of SEQ ID NO:1 to SEQ ID NO:910 or a sequence having at least 70% identity thereto, wherein preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease. Further preferred, all circRNAs with a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:911 to SEQ ID NO:1820 are detected, wherein preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease. In more preferred embodiments said sequence identity is at least 80%, preferably at least 90%, more preferably at least 95%, yet more preferably at least 99% to the outlined sequences, yet more preferred the identity is 100%.


As outlined herein above, the circRNAs may be specifically detected through their unique sequence at the exon-exon junction in the head-to-tail arrangement. Hence, the invention also relates to a nucleic acid probe specifically hybridizing to a sequence of nucleotide (nt) 11 to nt 30 of any of the sequences of SEQ ID NO:1 to 910, or specifically hybridizing to an RNA sequence encoded by these sequences, or specifically hybridizing to a reverse complement sequence thereof, preferably specifically binding to any of the sequences of SEQ ID NO:1 to 910, or specifically hybridizing to an RNA sequence encoded by these sequences, or specifically hybridizing to a reverse complement sequence thereof. In a very preferred embodiment the nucleic acid probe spans the sequence of nt 15 to nt 35 of the respective SEQ ID NO: 1 to 910, or an RNA sequence encoded by these sequences, or the reverse complement sequences thereof.
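As a minimal sketch of how a probe sequence complementary to the junction region could be derived, the following Python code takes a 40-nucleotide junction sequence given as DNA, extracts nt 11 to nt 30 (1-based, inclusive) and returns the reverse complement. The function name, the default positions as parameters and the toy input are illustrative assumptions; actual probe design would additionally consider melting temperature and specificity, which are not addressed here.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def probe_for_junction(junction_dna: str, start: int = 11, end: int = 30) -> str:
    """Return the reverse complement of nt `start`..`end` (1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError("invalid nucleotide range")
    if len(junction_dna) < end:
        raise ValueError("junction sequence is shorter than the requested range")
    target = junction_dna.upper()[start - 1:end]
    return target.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Toy 40-nt junction sequence, not a sequence from the Sequence Listing.
    toy_junction = "A" * 10 + "ACGTACGTAC" + "GGGGGTTTTT" + "C" * 10
    print(probe_for_junction(toy_junction))
```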


Nucleic acid probes may be prepared using any suitable method, such as, for example, the phosphotriester and phosphodiester methods or automated embodiments thereof. In one such automated embodiment diethylphosphoramidites are used as starting materials and may be synthesized as described by Beaucage et al., Tetrahedron Letters, 22:1859-1862 (1981), which is hereby incorporated by reference. One method for synthesizing oligonucleotides on a modified solid support is described in U.S. Pat. No. 4,458,006, which is hereby incorporated by reference. It is also possible to use a nucleic acid probe which has been isolated from a biological source (such as a restriction endonuclease digest). Preferred nucleic acid probes have a length of from about 15 to 500, more preferably about 20 to 200, most preferably about 25 to 60 bases.


The nucleic acid probe according to the present invention may be used as a hybridization probe or as a primer for amplification reactions. In both cases the nucleic acid probe may comprise fluorescent dyes. Such fluorescent dyes may for example be FAM (5- or 6-carboxyfluorescein), VIC, NED, fluorescein, FITC, IRD-700/800, CY3, CY5, CY3.5, CY5.5, HEX, TET, TAMRA, JOE, ROX, BODIPY TMR, Oregon Green, Rhodamine Green, Rhodamine Red, Texas Red, Yakima Yellow, Alexa Fluor, PET and the like (see e.g. https://www.micro-shop.zeiss.com/us/us_en/spektral.php). In the context of the present invention, fluorescent dyes may for example be FAM (5- or 6-carboxyfluorescein), VIC, NED, fluorescein, fluorescein isothiocyanate (FITC), IRD-700/800, cyanine dyes, such as CY3, CY5, CY3.5, CY5.5, Cy7, xanthene, 6-carboxy-2′,4′,7′,4,7-hexachlorofluorescein (HEX), TET, 6-carboxy-4′,5′-dichloro-2′,7′-dimethoxyfluorescein (JOE), N,N,N′,N′-Tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-rhodamine (ROX), 5-Carboxyrhodamine-6G (R6G5), 6-carboxyrhodamine-6G (RG6), Rhodamine, Rhodamine Green, Rhodamine Red, Rhodamine 110, BODIPY dyes, such as BODIPY TMR, Oregon Green, coumarins such as Umbelliferone, benzimides, such as Hoechst 33258; phenanthridines, such as Texas Red, Yakima Yellow, Alexa Fluor, PET, ethidium bromide, acridinium dyes, carbazole dyes, phenoxazine dyes, porphyrin dyes, polymethine dyes, and the like.


In a preferred embodiment, the nucleic acid probe specifically hybridizes to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of the sequences listed in Table 1, or specifically hybridizes to an RNA sequence encoded by these sequences, or specifically hybridizes to a reverse complement sequence thereof; preferably it specifically hybridizes to a sequence selected from the group consisting of the sequences listed in Table 1, or to an RNA sequence encoded by these sequences, or to a reverse complement sequence thereof.


The invention furthermore relates to a kit for specifically detecting one or more, preferably more than one, nucleic acids comprising a sequence selected from the group consisting of nt 11 to nt 30 of any one of SEQ ID NO: 1 to 910, or a sequence selected from the group consisting of SEQ ID NO:1 to 910, or SEQ ID NO:911 to SEQ ID NO:1820, or an RNA sequence encoded by any of these sequences. The kit is preferably a kit for diagnosing a neurodegenerative disease, comprising means for specifically detecting one or more nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to 910 or SEQ ID NO:911 to SEQ ID NO:1820, or an RNA sequence encoded by any of these sequences. In a preferred embodiment the kit comprises means for specifically detecting at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910 or SEQ ID NO:911 to SEQ ID NO:1820, or an RNA sequence encoded by any of these sequences.


The means for detecting preferably are one or more of the nucleic acid probes according to the invention. Hence, in one embodiment the kit comprises one or more nucleic acid probes, preferably more than one nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, or an RNA sequence encoded by these sequences, or the reverse complements thereof; preferably the kit comprises a plurality of nucleic acid probes specifically hybridizing to the sequence of nucleotide 11 to 30 of at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, or a RNA sequence encoded by these sequences, or the reverse complements thereof. In a particular preferred embodiment the kit comprises a plurality of nucleic acid probes hybridizing to the sequence of nucleotide 11 to 30 of at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to 200, or the RNA sequences encoded by these sequences, or the reverse complements thereof; preferably the kit comprises a plurality of nucleic acid probes hybridizing to the sequence of nucleotide 11 to 30 of all of the sequences of SEQ ID NO:1 to 200, or the RNA sequences encoded by these sequences, or the reverse complements thereof.


In a further particular preferred embodiment the kit comprises a plurality of nucleic acid probes hybridizing to the sequence of at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to 200, or an RNA sequence encoded by these sequences, or the reverse complements thereof; preferably the kit comprises a plurality of nucleic acid probes hybridizing to the sequence of all of the sequences of SEQ ID NO:1 to 200, or the RNA sequences encoded by these sequences, or the reverse complements thereof.


The kit may further comprise means for handling and/or preparation of a bodily fluid sample, preferably for cerebrospinal fluid or whole blood. In a preferred embodiment the kit comprises a container for collecting whole blood, said container comprising stabilizing agents, preferably selected from the group consisting of chelating agents, EDTA, K2EDTA, formulations like RNAlater (Qiagen) or such, or combinations thereof. In a particular preferred embodiment the kit comprises a K2EDTA coated container.


As used herein, a kit is a packaged combination optionally including instructions for use of the combination and/or other reactions and components for such use.


Furthermore, the invention relates to an array for determining the presence or level of a plurality of nucleic acids, said array comprising a plurality of probes, wherein the plurality of probes specifically hybridize to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, or an RNA sequence encoded by these sequences, or the reverse complements thereof, preferably the plurality of probes comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of at least 100, preferably at least 150 more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, or the RNA sequences encoded by these sequences, or the reverse complements thereof. Preferably the plurality of probes comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of SEQ ID NO:1 to SEQ ID NO:200, or the RNA sequences encoded by these sequences, or the reverse complements thereof.


Herein an “array” is a solid support comprising one or more nucleic acids attached thereto. Arrays, such as microarrays (e.g. from Affymetrix®), are known in the art (see Schena M, Shalon D, Davis R W, Brown P O (1995); Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270(5235):467-70). The solid support may be made of different materials including, but not limited to, plastics, resins, polysaccharides, silica or silica-based materials, functionalized glass, modified silicon, carbon, metals, inorganic glasses, membranes, nylon, natural fibers such as silk, wool and cotton, and polymers. In some embodiments, the material comprising the solid support has reactive groups such as carboxy, amino, hydroxy, etc., which are used for attachment of, e.g. nucleic acid probes. Polymers are preferred, and suitable polymers include, but are not limited to, polystyrene, polyethylene glycol tetraphthalate, polyvinyl acetate, polyvinyl chloride, polyvinyl pyrrolidone, polyacrylonitrile, polymethyl methacrylate, polytetrafluoroethylene, butyl rubber, styrenebutadiene rubber, natural rubber, polyethylene, polypropylene, (poly)tetrafluoroethylene, (poly)vinylidenefluoride, polycarbonate and polymethylpentene. Preferred polymers include those outlined in U.S. Pat. No. 5,427,779, hereby expressly incorporated by reference. The nucleic acid probes are preferably covalently attached to the solid support of the array. Attachment may be performed as described below. As will be appreciated by those in the art, either the 5′ or 3′ terminus may be attached to the support using techniques known in the art. The arrays of the invention comprise at least two different covalently attached nucleic acid probes, with more than two being preferred.
By “different” oligonucleotide herein is meant an oligonucleotide that has a nucleotide sequence that differs in at least one position from the sequence of a second oligonucleotide; that is, at least a single base is different, preferably their hybridization specificity is as outlined herein above.


Furthermore, the invention particularly relates to the use of a nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by these sequences, or hybridizing to the reverse complement thereof for the diagnosis of a neurodegenerative disease, preferably for the diagnosis of Alzheimer's disease. The invention also relates to the use of a kit according to the invention for the diagnosis of a neurodegenerative disease, preferably for the diagnosis of Alzheimer's disease. Also encompassed by the invention is the use of an array according to the invention for the diagnosis of a neurodegenerative disease, preferably for the diagnosis of Alzheimer's disease.


The present invention also relates to the following items:

  • 1. A method for diagnosing a disease of a subject, comprising the step of:
    • determining the presence or absence of one or more circular RNA (circRNA) in a sample of a bodily fluid of said subject;
    • wherein the presence or absence of said one or more circRNA is indicative of the disease.
  • 2. The method according to item 1, wherein said disease is not a disease of said bodily fluid.
  • 3. The method according to item 1 or 2, wherein said bodily fluid is blood or cerebrospinal fluid, most preferred whole blood.
  • 4. The method according to any one of items 1 to 3, wherein the determination step comprises:
    • determining the level of said one or more circRNA;
    • comparing the determined level to a control level of said one or more circRNA;
    • wherein differing levels between the determined and the control level are indicative of the disease.
  • 5. The method according to item 4, wherein said one or more circRNA is differentially expressed between the diseased and non-diseased state in the tissue of interest.
  • 6. The method according to any one of items 1 to 5, wherein the circRNA is detected by detection of an exon-exon-junction in a head-to-tail arrangement.
  • 7. The method according to item 6, wherein circRNA is detected using a method selected from the group consisting of probe hybridization based methods, nucleic acid amplification based methods, and nucleic acid sequencing.
  • 8. The method according to any one of items 1 to 7, wherein the sample is treated with RNase R before determination of the circRNA.
  • 9. The method according to any one of items 1 to 8, wherein more than one circRNAs from a panel of circRNAs are determined.
  • 10. The method according to item 9, wherein said panel comprises a plurality of circRNAs that have been identified as being present at differing levels in bodily fluid samples of patients having the disease and patients not having the disease, preferably identified by principal component analysis or clustering.
  • 11. The method according to any one of items 4 to 10, wherein the disease is a neurodegenerative disease, preferably Alzheimer's disease.
  • 12. The method according to item 11, wherein the method for diagnosing the neurodegenerative disease, preferably Alzheimer's disease, in a subject comprises the steps of:
    • determining the level of one or more circRNA in a sample of a bodily fluid of said subject;
    • comparing the determined level to a control level of said one or more circRNA;
    • wherein differing levels between the determined and the control level are indicative of the disease.
  • 13. The method according to item 12, wherein said one or more circRNA comprises a sequence encoded by a sequence being at least 70% identical to a sequence selected from the group consisting of nucleotides 11 to 30 of any of the sequences of SEQ ID NO:1 to SEQ ID NO:910, preferably wherein the presence of increased or decreased levels as defined in Table 1 under “disease” for the circRNA comprising the respective sequence are indicative of the presence of a neurodegenerative disease, preferably Alzheimer's disease.
  • 14. The method according to item 13, wherein said one or more circRNA has a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:911 to SEQ ID NO:1820, preferably wherein the presence of increased or decreased levels as defined in Table 1 under “disease” for the circRNA having the respective sequence are indicative of the presence of a neurodegenerative disease, preferably Alzheimer's disease.
  • 15. The method according to any of items 12 to 13, wherein the levels of at least 100, preferably at least 150 and more preferably at least 200 circRNAs comprising a sequence encoded by a sequence being at least 70% identical to a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910 are determined in a sample of a bodily fluid of said subject and compared to the respective control level.
  • 16. The method according to item 15, wherein said at least 100, preferably at least 150 and more preferably at least 200 circRNAs have a sequence encoded by a sequence selected from the group consisting of SEQ ID NO:911 to 1820, or a sequence being at least 70% identical thereto.
  • 17. The method according to item 15 or 16, wherein said at least 100, preferably at least 150 more preferably at least 200 circRNAs comprise a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:200, or said circRNAs have a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO: 911 to 1110.
  • 18. The method according to item 15, wherein the presence of increased or decreased levels as defined in Table 1 under “disease” for the respective circRNA for at least 10, preferably at least 50, more preferably at least 100 of said at least 100, preferably at least 150 and more preferably at least 200 circRNAs are indicative of the presence of a neurodegenerative disease, preferably Alzheimer's disease.
  • 19. The method according to any one of items 13 to 16, wherein the levels of all circRNAs comprising a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910 or all circRNAs having a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:911 to SEQ ID NO:1820 are detected, wherein preferably the presence of increased or decreased levels as defined in Table 1 under “disease” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs are indicative of the presence of a neurodegenerative disease, preferably Alzheimer's disease.
  • 20. A nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of the sequences listed in Table 1, or specifically hybridizing to a reverse complement sequence thereof; preferably specifically hybridizing to a sequence selected from the group consisting of the sequences listed in Table 1, or specifically hybridizing to a reverse complement sequence thereof.
  • 21. A kit for diagnosing a neurodegenerative disease, comprising means for specifically detecting one or more nucleic acid sequence encoded by a sequence selected from the group consisting of SEQ ID NO:1 to 910 or SEQ ID NO:911 to SEQ ID NO:1820, or a sequence being at least 70% identical to the recited sequences.
  • 22. The kit of item 21, wherein the kit comprises means for specifically detecting at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910 or SEQ ID NO:911 to SEQ ID NO:1820.
  • 23. The kit of item 21 or 22, wherein the kit comprises one or more nucleic acid probes specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:910, or hybridizing to the reverse complements thereof, preferably the kit comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:910, or hybridizing to the reverse complements thereof.
  • 24. The kit according to any one of items 21 to 23, further comprising means for handling and/or preparation of a bodily fluid sample, preferably for cerebrospinal fluid or whole blood.
  • 25. An array for determining the presence or level of a plurality of nucleic acids, comprising a plurality of probes, wherein the plurality of probes specifically hybridize to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:910, or hybridizing to the reverse complements thereof, preferably the plurality of probes comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:910, or specifically hybridizing to the reverse complement sequences thereof, preferably the plurality of probes comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of SEQ ID NO:1 to SEQ ID NO:200, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:200, or hybridizing to the reverse complement sequences thereof.
  • 26. Use of a nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:910, or hybridizing to the reverse complement thereof, a kit according to items 21 to 24, or an array according to item 25 for the diagnosis of a neurodegenerative disease, preferably for the diagnosis of Alzheimer's disease.


It will be apparent that the methods and components of the present invention, as well as the uses as substantially described herein or illustrated in the description and the examples, are also subject of the present invention and claimed herewith. In this respect, it is also understood that the embodiments as described in the description and/or any one of the examples, can be independently used and combined with any one of the embodiments described hereinbefore and claimed in the appended claims set. Thus, these and other embodiments are disclosed and encompassed by the description and examples of the present invention.


The invention is further illustrated by the following non-limiting Examples and Figures.


EXAMPLES

1. Methods


1.1 Whole Blood Sample Collection


Blood sampling was approved by the Charité ethics committee, registration number EA4/078/14, and all participants gave written informed consent. 5 mL of blood was drawn from each subject by venipuncture, collected in a K2EDTA-coated Vacutainer (BD, #368841) and stored on ice until used for RNA preparation. For downstream RNA analysis by the sequencing or qPCR assays presented here, 100 μL blood (>1 μg total RNA) is sufficient.


1.2 RNA Isolation and RNase R Treatment


Total RNA was isolated from fresh whole blood samples. Blood was diluted 1:3 in PBS and 250 μL of the dilution were used for RNA preparation using 750 μL Trizol LS reagent (Life Technologies). Samples were homogenized by gentle vortexing and 200 μL chloroform was added. After centrifugation at 4° C. for 15 min at full speed in a table top centrifuge, the aqueous phase was transferred to a new tube (typically 400 μL). RNA was precipitated by adding an equal volume of cold isopropanol and incubating for ≥1 hour at −80° C. RNA pellets were recovered by spinning at 4° C. for 30 min at full speed in a table top centrifuge. RNA pellets were washed with 1 mL 80% EtOH and subsequently air dried at room temperature for 5 min. The RNA was resuspended in 20 μL RNase-free water and treated with DNase I (Promega) for 15 min at 37° C. with subsequent heat inactivation for 10 min at 65° C. HEK293 total RNA was prepared in the same way but using 1 mL Trizol on cell pellets. For sequencing experiments the RNA preparations were additionally subjected to two rounds of ribosomal RNA depletion using a RiboMinus Kit (Life Technologies K1550-02 and A15020). Total RNA integrity and rRNA depletion were monitored using a Bioanalyzer 2100 (Agilent Technologies). For qPCR analysis the samples were treated with RNase R (Epicentre) for 15 min at 37° C. at a concentration of 3 U/μg RNA. After treatment 5% C. elegans total RNA was spiked in, followed by phenol-chloroform extraction of the RNA mixture. For controls the RNA was mock treated without the enzyme.


1.3 cDNA Library Preparation for Deep Sequencing


cDNA libraries were generated according to the Illumina TruSeq protocol. Sample RNA was fragmented, adaptor ligated, amplified and sequenced on an Illumina HiSeq2000 in 1×100 cycle runs.


1.4 Quantitative PCR (qPCR)


Total RNA was reverse transcribed using Maxima reverse transcriptase (Thermo Scientific) according to the manufacturer's protocol. qPCR reactions were performed using Maxima SYBR Green/Rox (Thermo Scientific) on a StepOne Plus System (Applied Biosystems). Primer sequences are available in Table 7. RNase R assays were normalized to the C. elegans spike-in RNA.


1.5 Sanger Sequencing


PCR products were size separated by agarose gel electrophoresis, amplicons were extracted from gels and Sanger sequenced by standard methods (Eurofins).


1.6 Detection and Annotation of circRNAs


The detection of circular RNA was based on a previously published method (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338) with the following details. Human reference genome hg19 (February 2009, GRCh37) was downloaded from the UCSC genome browser (see Kent W J, Sugnet C W, Furey T S, et al. The human genome browser at UCSC. Genome Research. 2002; 12(6):996-1006) and was used for all subsequent analysis. bowtie2 version 2.1.0 (see Langmead B, Salzberg S L. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012; 9(4):357-359) was employed for mapping of RNA sequencing reads. Reads were first mapped to ribosomal RNA sequence data downloaded from the UCSC genome browser. Reads that did not map to rRNA were extracted for further processing. In a second step, all reads that mapped to the genome by aligning the whole read without any trimming (end-to-end mode) were discarded. Reads not mapping continuously to the genome were used for circRNA candidate detection. From those, 20-nucleotide terminal sequences (anchors) were extracted and re-aligned independently to the genome. The anchor alignments were then extended until the full read sequence was covered. Consecutively aligning anchors indicate linear splicing events, whereas alignment in reverse orientation indicates head-to-tail splicing as observed in circRNAs (FIG. 1A).
The resulting splicing events were filtered using the following criteria: 1) GT/AG signal flanking the splice sites; 2) unambiguous breakpoint detection; 3) maximum of two mismatches when extending the anchor alignments; 4) breakpoint no more than two nucleotides inside the alignment of the anchors; 5) at least two independent reads supporting the head-to-tail splice junction; 6) a minimum difference of 35 in the bowtie2 alignment score between the first and the second best alignment of each anchor; 7) no more than 100 kilobases distance between the two splice sites.
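The seven filter criteria can be expressed as a single predicate. The following is a minimal sketch only: the record type, its field names and the way the per-candidate values are obtained are hypothetical, and the code does not reproduce the published pipeline.

```python
from dataclasses import dataclass


@dataclass
class JunctionCandidate:
    """Hypothetical summary of one candidate head-to-tail splicing event."""
    donor_dinucleotide: str           # dinucleotide flanking the donor site
    acceptor_dinucleotide: str        # dinucleotide flanking the acceptor site
    unambiguous_breakpoint: bool      # exactly one possible breakpoint position
    extension_mismatches: int         # mismatches when extending the anchors
    breakpoint_offset_in_anchor: int  # nt the breakpoint lies inside an anchor
    supporting_reads: int             # independent reads spanning the junction
    anchor_score_margin: int          # best minus second-best alignment score,
                                      # taken as the smaller value of both anchors
    splice_site_distance: int         # genomic distance between splice sites (nt)


def passes_filters(c: JunctionCandidate) -> bool:
    """Apply criteria 1) to 7) in the order given in the text."""
    return (
        (c.donor_dinucleotide, c.acceptor_dinucleotide) == ("GT", "AG")  # 1
        and c.unambiguous_breakpoint                                     # 2
        and c.extension_mismatches <= 2                                  # 3
        and c.breakpoint_offset_in_anchor <= 2                           # 4
        and c.supporting_reads >= 2                                      # 5
        and c.anchor_score_margin >= 35                                  # 6
        and c.splice_site_distance <= 100_000                            # 7
    )
```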


1.7 circRNA Annotation


Genomic coordinates of circRNA candidates were intersected with published gene models (ENSEMBL, release 75 containing 22,827 protein coding genes, 7484 lincRNAs and 3411 miRNAs). circRNAs were annotated and exon-intron structure predicted as previously described (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338). Known introns in circRNAs were assumed to be spliced out. Each circRNA was counted to a gene structure category if it overlaps fully or partially with the respective ENSEMBL feature (FIG. 1C, Table 1).


1.8 Published RNA Data Sets


In this study, rRNA depleted RNA-seq data from whole blood samples (own data), fetal cerebellum (ENCODE accession: ENCSR000AEW), fetal liver (ENCODE accession: ENCSR000AFB) and HEK293 (Table 1; see Ivanov A, Memczak S, Wyler E, et al. Analysis of Intron Sequences Reveals Hallmarks of Circular RNA Biogenesis in Animals. CellReports. 2015; 10(2):170-177) were used. Expression values, coordinates and other details of the circRNAs reported here and all associated scripts will be made available at www.circbase.org (see Glažar P, Papavasileiou P, Rajewsky N. circBase: a database for circular RNAs. RNA. 2014; 20(11):1666-1670).


1.9 Quantification of circRNA and Host Gene Expression


The number of reads that span a particular head-to-tail junction was used as a measure for circRNA expression. To allow comparison of expression between samples, raw read counts were normalized to sequencing depth by dividing by the number of reads that map to protein coding gene regions and multiplying by 1,000,000 (FIG. 1B left, FIG. 4 C-D, FIGS. 6 and 10 A,C). To estimate host gene expression, RNA-seq data were first mapped to the reference genome with STAR (see Dobin A, Davis C A, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013; 29(1):15-21). htseq-count (see Anders S, Pyl P T, Huber W. HTSeq - A Python framework to work with high-throughput sequencing data. 2014) was employed to count hits on genomic features of ENSEMBL gene models. The measure transcripts per million (TPM) was calculated for each transcript and sample in order to compare total host gene expression between samples (FIG. 1B, right).
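The two expression measures named above can be written out as follows. This is a minimal sketch: the function names and example inputs are hypothetical, and the TPM function uses the commonly used definition (length-normalized counts rescaled to sum to one million), which the text does not spell out.

```python
from typing import Dict


def normalize_to_depth(junction_reads: int, protein_coding_reads: int) -> float:
    """Head-to-tail reads per million reads mapping to protein coding gene regions."""
    if protein_coding_reads <= 0:
        raise ValueError("protein_coding_reads must be positive")
    return junction_reads / protein_coding_reads * 1_000_000


def tpm(counts: Dict[str, float], lengths_kb: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million from read counts and transcript lengths in kb."""
    rates = {t: counts[t] / lengths_kb[t] for t in counts}  # reads per kilobase
    total = sum(rates.values())
    if total == 0:
        raise ValueError("all counts are zero")
    return {t: rate / total * 1_000_000 for t, rate in rates.items()}
```

For example, 50 junction-spanning reads in a library with 25,000,000 protein-coding reads correspond to a normalized value of about 2.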


Circular-to-linear ratios were calculated for each circRNA by dividing raw head-to-tail read counts by the median number of reads that span linear spliced junctions of the respective host gene. For both measures one pseudo count was added to avoid division by zero. CircRNAs from host genes without annotated splice junctions according to the ENSEMBL gene annotation were not considered in this analysis.
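A worked form of this ratio, as a minimal sketch with a hypothetical function name and invented read counts:

```python
from statistics import median
from typing import Optional, Sequence


def circular_to_linear_ratio(head_to_tail_reads: int,
                             linear_junction_reads: Sequence[int],
                             pseudocount: int = 1) -> Optional[float]:
    """(head-to-tail reads + 1) / (median linear junction reads + 1).

    Returns None when the host gene has no annotated linear splice junctions,
    mirroring the exclusion of such circRNAs from the analysis.
    """
    if not linear_junction_reads:
        return None
    return (head_to_tail_reads + pseudocount) / (
        median(linear_junction_reads) + pseudocount
    )
```

With 59 head-to-tail reads and linear junction counts of 1, 3 and 5, the median is 3 and the ratio is 60/4 = 15.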


For analysis in FIG. 3D a permutation test with 1000 Monte-Carlo replications was performed on pooled biological replicate data to approximate the exact conditional distribution. To adjust for different dataset sizes the respective larger data set of each comparison was randomly subsampled.
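A Monte-Carlo permutation test with subsampling of the larger data set might be sketched as below. The test statistic is not stated in the text; the absolute difference in means used here, the two-sided form, the add-one p-value correction and the fixed random seed are all assumptions made for illustration.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(a: Sequence[float], b: Sequence[float],
                     replications: int = 1000, seed: int = 0) -> float:
    """Approximate two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    # Subsample so that both groups have the size of the smaller one.
    n = min(len(a), len(b))
    group_a = rng.sample(list(a), n)
    group_b = rng.sample(list(b), n)
    observed = abs(mean(group_a) - mean(group_b))

    pooled = group_a + group_b
    hits = 0
    for _ in range(replications):
        rng.shuffle(pooled)  # random relabeling of the pooled values
        diff = abs(mean(pooled[:n]) - mean(pooled[n:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (replications + 1)
```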


1.10 Principal Component Analysis and Clustering


To perform principal component analysis (PCA) of circRNA expression in whole blood samples of different donors, variance stabilizing transformation was first performed on raw head-to-tail spliced read counts using the R package DESeq2 (see Love M I, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014; 15(12):550). Only circRNAs with a transformed expression value of at least 6.7 (n=910, FIG. 4A) in one of the samples were considered for the analysis. PCA was performed on all remaining circRNAs using the prcomp function of R's stats package. All genes that give rise to these circRNAs and have at least one known splice junction were considered for PCA of the linear host gene expression. The same procedure was used for PCA using the median number of linear spliced reads as a proxy for linear expression. The 200 circRNAs with the highest weight in PC2 were considered for clustering. Raw head-to-tail spliced read counts for each circRNA (n_i) were normalized to sequencing depth by dividing by the number of reads that map to protein coding gene regions and multiplying by 1,000,000. Whole blood samples of different donors were clustered on log2 transformed normalized circRNA expression profiles (log2(n_i + 1)). Hierarchical, agglomerative clustering was performed with complete linkage and by using Spearman's rank correlation as distance metric (1 − corr[log2(n_i + 1)]). The same procedure was used for linear host gene expression using the median number of linear spliced reads for all genes that give rise to these 200 circRNAs and have at least one known splice site.
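The clustering step can be illustrated with a small pure-Python sketch of the distance 1 minus Spearman's rank correlation on log2(n + 1) profiles, combined with naive complete-linkage agglomeration. All function names and the sample labels in the usage note are hypothetical; the analysis described above was carried out in R, and this sketch makes no claim to reproduce it.

```python
from math import log2, sqrt
from typing import Dict, List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - Spearman correlation of log2(n + 1) transformed profiles."""
    la = [log2(v + 1) for v in a]
    lb = [log2(v + 1) for v in b]
    return 1 - pearson(ranks(la), ranks(lb))


def complete_linkage(profiles: Dict[str, Sequence[float]]
                     ) -> List[Tuple[tuple, tuple, float]]:
    """Agglomerative clustering; returns merges as (cluster, cluster, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []

    def dist(c1: tuple, c2: tuple) -> float:
        # Complete linkage: the largest pairwise distance between members.
        return max(spearman_distance(profiles[p], profiles[q])
                   for p in c1 for q in c2)

    while len(clusters) > 1:
        d, i, j = min((dist(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

Called on a dictionary such as `{"donor_1": [...], "donor_2": [...]}` of normalized counts per circRNA, `complete_linkage` returns the merge order. Because the log2 transform is monotonic, it does not change the ranks; it is kept only to mirror the description.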


2. Results


2.1 Thousands of circRNAs are Reproducibly Detected in Human Peripheral Whole Blood


First it was determined whether circRNAs are present in standard clinical blood specimens. To this end, total RNA was prepared from two biologically independent human peripheral whole blood samples and depleted of ribosomal RNAs (see Methods, supra). The samples were reverse transcribed using random primers to allow for circRNA detection and sequencing libraries were produced (FIG. 1A). The raw reads were fed into our in silico circRNA detection pipeline (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338). In short, the program filters out reads that map continuously to the genome but saves unmapped reads. From those, terminal 20-mer anchors are extracted and independently aligned to the genome. If the anchors map in reverse orientation and can be extended to cover the whole read sequence, they are flagged as head-to-tail junction spanning, i.e. indicative of circRNAs. Anchors that aligned consecutively were used to determine linear splicing as an internal library quality control and to assess linear RNA isoform expression (Table 2).
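The anchor logic described in this paragraph can be sketched in a few lines. This is a simplified illustration, not the published pipeline: it assumes both anchors already align to the same chromosome and strand, takes their genomic start coordinates as input, and ignores alignment extension and all quality filters. The function names are hypothetical.

```python
from typing import Tuple


def terminal_anchors(read: str, anchor_len: int = 20) -> Tuple[str, str]:
    """Return the first and last anchor_len nucleotides of a read."""
    if len(read) < 2 * anchor_len:
        raise ValueError("read too short for two non-overlapping anchors")
    return read[:anchor_len], read[-anchor_len:]


def classify_junction(first_anchor_start: int, last_anchor_start: int,
                      strand: str = "+") -> str:
    """Classify a split read from the genomic order of its two anchors.

    On the plus strand, a read from a linear transcript has its 5' anchor
    upstream of its 3' anchor; the reverse order points to a head-to-tail
    junction. On the minus strand the genomic order is mirrored.
    """
    if strand == "-":
        first_anchor_start, last_anchor_start = last_anchor_start, first_anchor_start
    return "linear" if first_anchor_start < last_anchor_start else "head-to-tail"
```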


From the RNA of two human donors we identified 4550 and 4105 unique circRNA candidates, respectively, by at least two independent reads spanning a head-to-tail splice junction (FIG. 1B). In both datasets the numbers of total reads and linear splicing events were similar, indicating reproducible sample preparation (Table 2, Table 5). When considering RNAs found in both samples, we observed a high correlation of expression for both linear RNAs (R=0.98) and circRNAs (R=0.80, FIG. 1B). Between the two samples 1265 circRNAs (55%) with more than 5 reads overlap and 2442 (39%) circRNAs supported by at least 2 reads are shared (Table 1, FIG. 15, FIG. 5, technical reproducibility is shown in FIG. 6). The latter set will be considered as reproducibly detected circRNAs in the following analysis. CircRNA candidates are derived from genes covering the whole dynamic range of RNA expression (FIG. 1B, right panel). As observed in other human samples, we find that most circRNAs are derived from protein coding exonic regions or 5′ UTR sequences (FIG. 1C; see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338, and Rybak-Wolf A, Stottmeister C, Glažar P, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. MOLCEL. 2015; 1-17). GO term enrichment analysis on reproducibly detected, top expressed circRNAs and the same number of top linear RNAs showed significant enrichment of different biological function annotations (FIG. 7). Together with the broad expression spectrum of corresponding host genes this finding argues that circRNA expression levels are largely independent of linear RNA isoform abundance.


The predicted spliced length of blood circRNAs of 200-800 nt (median=343 nt) is similar to that in liver or cerebellum (median=394/448 nt) and previous observations in HEK293 cell cultures and other human samples (FIG. 8 and see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338). However, we observed a high number of circRNAs per gene, with 23 genes giving rise to more than 10 circRNAs (‘circRNA hotspots’, FIG. 1D).
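Counting circRNAs per host gene, as used for the 'circRNA hotspots' above, can be sketched in a few lines. The input format, the function name and the gene identifiers are hypothetical; only the cut-off of more than 10 circRNAs per gene is taken from the text.

```python
from collections import Counter


def circrna_hotspots(candidates, threshold=10):
    """Return {host_gene: n_circRNAs} for genes with more than `threshold` circRNAs.

    `candidates` is an iterable of (circRNA_id, host_gene) pairs (hypothetical format).
    """
    per_gene = Counter(gene for _circ_id, gene in candidates)
    return {gene: n for gene, n in per_gene.items() if n > threshold}


# Toy input: GENE_X hosts 12 candidates, GENE_Y hosts 3 (made-up identifiers).
toy = [(f"circ_X{i}", "GENE_X") for i in range(12)]
toy += [(f"circ_Y{i}", "GENE_Y") for i in range(3)]
hotspots = circrna_hotspots(toy)
print(hotspots)  # {'GENE_X': 12}
```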


To assess the reproducibility of the sequencing results, we designed divergent, circRNA-specific primers and measured the relative abundances of the top eight expressed circRNAs compared to linear control genes by qPCR (FIG. 1E). circRNA candidate 8 could not be unambiguously amplified from cDNA, most likely due to overlapping RNA isoforms, and was therefore excluded from further analysis. For the remaining seven circRNA candidates, we tested circularity using previously established assays: 1) resistance to the 3′-5′ exonuclease RNase R and 2) Sanger sequencing of PCR amplicons to confirm the sequence of predicted head-to-tail splice junctions. With these assays we validated 7/7 tested candidates, suggesting that the overall false positive rate in our data sets is low (FIG. 9). Interestingly, these circRNAs are expressed from gene loci that so far have not been shown to have a specific blood-related function (Table 3), yet they show expression levels that by far exceed those of housekeeping genes such as VCL or TFRC (10-100 fold, FIG. 1E).
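A relative abundance from qPCR is commonly derived from the difference in threshold cycles (Ct) between the target and a reference gene. The sketch below assumes this ΔCt approach with perfect doubling per cycle; the quantification actually used is described in the Methods and is not restated here. The function name and the Ct values are hypothetical.

```python
def relative_abundance(ct_target, ct_reference, efficiency=2.0):
    """Fold abundance of a target relative to a reference gene.

    Uses efficiency ** (Ct_reference - Ct_target); an efficiency of 2.0
    corresponds to an ideal doubling of the amplicon in every cycle (assumption).
    """
    return efficiency ** (ct_reference - ct_target)


# Made-up Ct values: the circRNA amplicon crosses the threshold 5 cycles
# earlier than the housekeeping gene amplicon.
fold = relative_abundance(ct_target=20.0, ct_reference=25.0)
print(fold)  # 32.0
```

Under these assumptions, a difference of about 3.3 to 6.6 cycles corresponds to the 10-100 fold range mentioned above.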


2.2 Circular-to-Linear RNA Expression is High in Blood


When inspecting the read coverage in blood sequencing data, it was noticed that the expression of circularized exons was often strikingly high compared to the coverage of neighboring exons expressed in linear RNA isoforms of the same gene. For example, the two exons of circRNA candidate 5, which is a product of the PCNT locus, were densely covered with sequencing reads in the blood samples, while the upstream and downstream exons were barely detected (FIG. 2A). This particular expression pattern was not observed in HEK293 cells, where all exons were equally covered. This observation was further investigated by qPCR, comparing linear to circular RNA expression with isoform-specific primer sets in HEK293 and whole blood samples (FIG. 2B, C). This independent assay confirmed dominant expression of the tested circRNA candidates, which was found to be at least 30-fold higher than that of the cognate linear isoforms. In contrast, this circRNA dominance was not found in HEK293 cells, where the same RNAs were probed, which argues for a tissue-specific pattern.


Thereafter, the blood data were compared to published ENCODE project datasets from cerebellum, representative of neuronal tissues that in general have high circRNA expression (see Rybak-Wolf A, Stottmeister C, Glažar P, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. MOLCEL. 2015; 1-17), and from a non-neuronal primary tissue, liver (Table 2). Approx. 30% of blood circRNAs are also found in cerebellum, while this fraction was around 10% for liver, with higher fractions in both cases when constraining the analysis to highly expressed blood circRNAs (FIG. 10 A-D; comparison between total RNAs in FIG. 11). In summary, circRNAs found in human whole blood in part overlap with circRNAs expressed in cerebellum or liver, but also include hundreds of other circRNAs.


The relative circular-to-linear RNA isoform abundance was then analyzed on a transcriptome-wide scale. To this end, the read counts that span head-to-tail junctions, and are therefore indicative of circRNAs, were compared to the median number of read counts on linear splice junctions of the same gene, the latter serving as a proxy for linear RNA expression (see Methods, supra). We observed that many blood circRNAs are highly expressed while the corresponding linear RNAs show average or low abundances (FIG. 3A), a finding that was recapitulated by qPCR assays, validating our approach (FIG. 12). For the control samples cerebellum and liver this pattern was not observed (FIG. 3B, C), as revealed by comparing the mean circular-to-linear RNA ratio, which we found to be significantly higher in blood than in the tested control tissues (FIG. 3D). In summary, blood has an outstanding general tendency to contain circRNAs at high levels while the corresponding linear transcripts are expressed at much lower levels. This tendency was also found, to a much lower extent, in cerebellum, but not in liver RNA or in RNA from many other tissues or cell lines that we have analyzed.
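The comparison described above can be sketched as a ratio of the head-to-tail read count to the median read count on the linear splice junctions of the same gene. The function name and the read counts are hypothetical, and the pseudocount that guards against division by zero is an assumption of this sketch, not a detail taken from the Methods.

```python
from statistics import median


def circ_to_linear_ratio(head_to_tail_reads, linear_junction_reads, pseudocount=1.0):
    """Ratio of head-to-tail reads to the median linear junction read count.

    `linear_junction_reads` holds the read counts on the linear splice
    junctions of the same gene; their median is the proxy for linear expression.
    """
    linear_proxy = median(linear_junction_reads) if linear_junction_reads else 0.0
    return (head_to_tail_reads + pseudocount) / (linear_proxy + pseudocount)


# Made-up counts: 300 head-to-tail reads versus three linear junctions.
ratio = circ_to_linear_ratio(300, [5, 9, 7])
print(ratio)  # 37.625
```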


2.3 circRNAs are Putative Biomarkers in Alzheimer's Disease


The results show that circRNAs are reproducibly and easily detected in clinical standard blood samples and are therefore well suited to serve as a new class of biomarkers for human diseases, such as neurodegenerative diseases. Taking into account the high expression of circRNAs in neuronal tissues (see Rybak-Wolf A, Stottmeister C, Glažar P, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. MOLCEL. 2015; 1-17) and the urgent need for biomarkers in neurological diseases, circular RNA expression in blood samples from Alzheimer's disease patients and control subjects (see Methods, supra, Table 2, Table 4) was investigated. To this end, sequencing libraries from whole blood RNA from five individuals of each group were generated. In total, 22,644 distinct circRNAs were detected in all samples combined. Then, putative disease-specific circRNA expression was identified. To this end, subsets of all detected circRNA candidates were defined and used in a principal component analysis (PCA) to detect expression differences between the two groups. All circRNAs were sorted by expression, and the sets for PCA were defined by increasing expression cut-offs. In a range of the top 500 to top 900 circRNAs, a clear separation of control and diseased subjects was detected (FIG. 4A, FIG. 13). Interestingly, this is not observed when analyzing the corresponding linear RNAs, suggesting that there might be disease-relevant information specifically encoded in the circular blood transcriptome (FIG. 4B). When the circRNAs from this analysis were sorted by their weight in principal component 2 (PC2) and the data were subjected to unsupervised clustering, controls and Alzheimer's patients were again distinguishable. Importantly, the two main clusters do not reflect the gender or age of the subjects (see also Table 6).
The findings of this analysis show that circRNA expression patterns in blood have a diagnostic value that is not revealed by analyzing the expression of their cognate linear RNA isoforms.
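The definition of circRNA subsets by increasing expression cut-offs can be sketched as follows. The sketch only covers the ranking and the construction of a samples-by-circRNAs matrix for each subset; the PCA and the clustering themselves are not shown. Ranking by the summed expression across samples is an assumption, and the function names and toy values are hypothetical.

```python
def top_expressed_subsets(expression, cutoffs):
    """Rank circRNAs by summed expression and return {cutoff: [circRNA IDs]}.

    `expression` maps a circRNA ID to its per-sample expression values.
    """
    ranked = sorted(expression, key=lambda circ_id: sum(expression[circ_id]), reverse=True)
    return {n: ranked[:n] for n in cutoffs}


def subset_matrix(expression, circ_ids):
    """Build a samples x circRNAs matrix (list of rows) for one subset."""
    n_samples = len(next(iter(expression.values())))
    return [[expression[circ_id][s] for circ_id in circ_ids] for s in range(n_samples)]


# Toy data for two samples (made-up identifiers and values).
expression = {
    "circ_A": [10, 12],
    "circ_B": [1, 0],
    "circ_C": [5, 7],
    "circ_D": [3, 3],
}
subsets = top_expressed_subsets(expression, cutoffs=(2, 3))
print(subsets[2])                             # ['circ_A', 'circ_C']
print(subset_matrix(expression, subsets[2]))  # [[10, 5], [12, 7]]
```

With real data, the cut-offs would correspond to the range of the top 500 to top 900 circRNAs mentioned above, and each matrix would be passed to a PCA routine.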


3. Discussion


Recent publications show that circRNAs can be detected in plasma and saliva samples (see Koh W, Pan W, Gawad C, et al. Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proceedings of the National Academy of Sciences. 2014; 111(20):7361-7366; and Bahn J H, Zhang Q, Li F, et al. The Landscape of MicroRNA, Piwi-Interacting RNA, and Circular RNA in Human Saliva. Clinical Chemistry. 2014). However, in both specimens only few (10-70) circular RNAs with canonical splice sites were reported, which dramatically limits any further analysis. The circular transcriptome of whole blood presented here demonstrates that the search for putative circRNA biomarkers in peripheral blood is much more likely to yield informative results. RNA-Seq of clinical standard samples showed reproducible detection of around 2400 circRNA candidates that are present in human whole blood. It will be interesting to determine the origin of blood circRNAs. Accumulating evidence suggests that circRNAs are expressed in a developmental stage- and tissue-specific manner, rather than being merely byproducts of splicing reactions (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338; and Rybak-Wolf A, Stottmeister C, Glažar P, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. MOLCEL. 2015; 1-17). Previously analyzed circRNAs from neutrophils, B-cells and hematopoietic stem cells suggest that many circRNAs are constituents of hematocytes (see Salzman J, Gawad C, Wang P L, Lacayo N, Brown P O. Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS ONE. 2012; 7(2):e30733). However, there is also the intriguing possibility of circRNA excretion into the extracellular space, e.g. by vesicles such as exosomes.
Likewise, aberrant circRNA expression in disease may reflect either a condition-specific transcriptome change in the blood cells themselves or a direct consequence of active or passive release of circRNA from diseased tissue.


Further, we demonstrated that many circRNAs have a high expression compared to linear RNA isoforms from the same locus, a feature that distinguishes blood from other primary tissues such as cerebellum or liver. Considering that this was observed for hundreds of blood circRNA candidates (FIG. 3A, Table 1, FIG. 15) and that the experimental setup was further restricted to standard samples and preparation procedures, gene products that are dominated by circRNAs, which typically comprise 2-4 exons (example in FIG. 2, FIG. 14), will also dominate signals for the specific gene of interest in array assays, Northern blots or qPCR experiments if the expression of the circularized exons is measured.


After reproducibly detecting thousands of often highly expressed circRNAs in blood, it was asked whether these might be instrumental in the diagnosis of human disease. Therefore, putatively specific circRNA abundances were measured in Alzheimer's disease and control samples.


It was observed that analyzing specific subsets of blood circular RNAs allows distinguishing Alzheimer's disease from control samples in a principal component analysis and unsupervised clustering. This distinction was not possible when analyzing linear RNA isoforms from the same genomic loci, demonstrating that circRNA expression data bear specific information.


Given the urgent need for non-invasive biomarker detection for many disease states, these findings show a way for biomarker detection in easily accessible bodily fluid samples, such as whole blood and cerebrospinal fluid. The findings are not limited to Alzheimer's disease or neurological conditions in general, since blood circRNA expression might be specifically altered in many disorders and might therefore be exploitable as a diagnostic tool in human diseases.









TABLE 2

Sequencing statistics of analyzed libraries

Sample | number of total reads (millions) | % map to rRNA | number of reads that map continuously to genome (millions) | number of reads used for circRNA detection (millions) | number of linear splicing events | reads mapping to protein coding genes (millions) | number of circRNA candidates | Ensembl accession ID
H_1 | 57.85 | 11.52 | 41.75 | 9.45 | 77.367 | 8.47 | 4.550 |
rep_H_1 | 169.86 | 11.43 | 122.18 | 28.27 | 107.996 | 24.73 | 9.996 |
H_2 | 48.04 | 6.32 | 37.13 | 7.88 | 74.676 | 7.81 | 4.105 |
H_3 | 164.93 | 15.44 | 115.02 | 24.44 | 108.870 | 21.25 | 11.113 |
H_4 | 171.76 | 47.99 | 75.43 | 13.91 | 94.811 | 13.17 | 5.739 |
H_5 | 170.20 | 10.52 | 123.28 | 29.02 | 107.573 | 24.26 | 10.002 |
AD_1 | 110.76 | 18.74 | 74.07 | 15.93 | 88.932 | 13.85 | 5.837 |
AD_2 | 132.73 | 11.27 | 100.26 | 17.51 | 98.956 | 16.70 | 7.513 |
AD_3 | 131.62 | 15.78 | 93.46 | 17.39 | 91.823 | 17.22 | 6.867 |
AD_4 | 140.48 | 19.81 | 95.07 | 17.57 | 98.952 | 16.92 | 8.016 |
AD_5 | 122.04 | 13.88 | 88.91 | 16.18 | 96.942 | 17.41 | 6.404 |
Cerebellum_1 | 87.22 | 0.49 | 76.60 | 18.19 | 113.99 | 22.01 | 6.792 | ENCSR000AEW, ENCFF001ROL
Cerebellum_2 | 122.00 | 0.14 | 109.82 | 13.18 | 122.375 | 24.63 | 5.786 | ENCSR000AEW, ENCFF001RPH
Liver_1 | 86.12 | 3.35 | 72.72 | 10.52 | 101.147 | 30.14 | 839 | ENCSR000AFB, ENCFF001RNR
Liver_2 | 103.53 | 6.95 | 72.01 | 17.41 | 106.969 | 55.07 | 1.557 | ENCSR000AFB, ENCFF001RNX

Summary of RNA-Sequencing Results

Sequencing results for blood RNA from five controls (H), five Alzheimer patients (AD), cerebellum and liver control RNA samples. If not noted otherwise, sample datasets were produced for this study.









TABLE 3

Details on top expressed circRNA candidates.

candidate | host gene | host gene annotation | function | spliced length in nt | head-to-tail read counts | circBase ID
1 | MBOAT2 | membrane bound O-acyltransferase domain containing 2 | acyltransferase | 226 | 1367 | hsa_circ_0007334
2 | TMEM56 | Transmembrane Protein 56 | unknown | 264 | 676 | hsa_circ_0000095
3 | DNAJC6 | DnaJ Chaperone Homolog, Subfamily C, Member 6 | regulates chaperone activity | 302 | 513 | hsa_circ_0002454
4 | UBXN7 | UBX domain protein 7 | Ubiquitin-binding adapter | 183 | 485 | hsa_circ_0001380
5 | PCNT1 | Pericentrin-13 | component of the nuclear pore complex | 315 | 344 | hsa_circ_0002903
6 | MORC3 | MORC family CW-type zinc finger 3 | unknown | 249 | 333 | hsa_circ_0001189
7 | XPO1 | Exportin 1 | nuclear export of protein and RNAs | 207 | 333 | hsa_circ_0001017
8 | GSE1 | Coiled-Coil Protein Genetic Suppressor Element | unknown | 219 | 326 | hsa_circ_0000722
















TABLE 4

Patients overview.

subject | age | diagnosis | stage | MMSE | sex
H_1 | 31 | control | | | m
H_2 | 27 | control | | | m
H_3 | 71 | control | | | f
H_4 | 65 | control | | | f
H_5 | 62 | control | | | m
AD_1 | 75 | Alzheimer's Disease | mild | 24 | m
AD_2 | 81 | Alzheimer's Disease | mild | 23 | m
AD_3 | 73 | Alzheimer's Disease | mild | 25 | f
AD_4 | 69 | Alzheimer's Disease | medium | 18 | f
AD_5 | 68 | Alzheimer's Disease | severe | 8 | m

MMSE: Mini Mental State Examination













TABLE 5

Raw reads mapping to hemoglobin genes for blood samples 1 and 2.

 | sample 1 | sample 2
total reads | 57,853,921 | 48,035,915
HBA1 | 2,429,755 | 2,016,794
HBA2 | 3,330,428 | 1,891,626
HBB | 3,964,389 | 3,703,140
HBD | 1,016 | 936
sum | 9,725,588 | 7,612,496
% of total | 16.81 | 15.85

















TABLE 6

Rate of reproducibility after sub-sampling circRNAs in FIG. 4A 1000 times.

% circRNAs | % clustering reproduced | % linear RNAs | % main cluster as in FIG. 4A
90 | 52.2 | 90 | 0
70 | 31.4 | 70 | 0
50 | 17.9 | 50 | 0
















TABLE 7

List of oligonucleotides used in the Examples

SEQ ID NO: | Name | Sequence
1821 | hsaRTvinculinfwd | CTCGTCCGGGTTGGAAAAGAG
1822 | hsaRTvinculinrev | AGTAAGGGTCTGACTGAAGCAT
1823 | hsaTFRCfwd | ACCATTGTCATATACCCGGTTCA
1824 | hsaTFRCrev | CAATAGCCCAAGTAGCCAATCAT
1825 | celegansEIF3D_fwd | CGCCTTGAACATGGATAACTGCTGGG
1826 | celegansEIF3D_rev | GATCGTCATCCGAGTTCTCCTCGTCG
1827 | hsaMBOAT2div_fwd | AGTGCAAGATAAAGGCCCAAA
1828 | hsaMBOAT2div_rev | TGATCATCATAGGAGTGGAGAACA
1829 | hsaMBOAT2con_fwd | TACTCCACAGGTAATGTTGTAC
1830 | hsaMBOAT2con_rev | ACTTTCATTGAAGGCAGATCATACCA
1831 | hsaDNAJC6div_fwd | CCAGACATCTTGACCACTACACA
1832 | hsaDNAJC6div_rev | ATGTGTCTTTGAGGGTGTCTTT
1833 | hsaDNAJC6con_fwd | TCTCTACTCTACTCCTGGCCCAG
1834 | hsaDNAJC6con_rev | GTAGGTCACACATATAGCCCAGGT
1835 | hsaTMEM56div_fwd | CATCATTGTGCGTCCCTGTATG
1836 | hsaTMEM56div_rev | GCTGAGACTATTGAAACCTGGAGA
1837 | hsaTMEM56con_fwd | GCTGGCATACATTGGGAATTT
1838 | hsaTMEM56con_rev | CAATCCGCACGATGAAGAATAC
1839 | hsaUBXN7div_fwd | ACCAGTATTTCCTGCTTTTGAGG
1840 | hsaUBXN7div_rev | CTACCCTTGCAGATCTATTCCGG
1841 | hsaUBXN7con_fwd | AGAAATCCCGTCACTTGGTCCAA
1842 | hsaUBXN7con_rev | TGACAGTGAGGAAGGTCAGAGA
1843 | hsaGSE1div_fwd | CATCCTCCAGCTTTGCCGCCG
1844 | hsaGSE1div_rev | CTGGTCGCGGTGGAAAGCATC
1845 | hsaGSE1con_fwd | AGCTCAGTTGTGCAGGATTC
1846 | hsaGSE1con_rev | CTTCTCAGGTAGTCCTCGGT
1847 | hsaMORC3div_fwd | CATCCTACGTGGACAGAAAGTGAA
1848 | hsaMORC3div_rev | CTGTTCCGTGGAAAACAGAGAAT
1849 | hsaMORC3con_fwd | CAGTGCAGTTGCTGAATTAATAG
1850 | hsaMORC3con_rev | TCCCATTGTCGGTGAATGTC
1851 | hsaPCNT1div_fwd | CCGGTGTTTAGAAGACTTGGAGTT
1852 | hsaPCNT1div_rev | TGCAGACAGTTCTTTGCGTAGATT
1853 | hsaPCNT1con_fwd | TTGCCATTACTGACCTGGAGAGC
1854 | hsaPCNT1con_rev | CCGTCAATGCCGTCTCCTTCTC
1855 | hsaXPO1div_fwd | TGAAATCAAGCAGCTGACGA
1856 | hsaXPO1div_rev | AGATTCTTCCAAGGAACCAGTG
1857 | hsaXPO1con_fwd | GCCAGGGACAGACATTTGA
1858 | hsaXPO1con_rev | GCTCAAGTAAAGCTCTTTGTGAC








Claims
  • 1-22. (canceled)
  • 23. A method for diagnosing a disease of a subject, comprising the step of: determining the presence or absence of one or more circular RNA (circRNA) in a sample of a bodily fluid of said subject; wherein the presence or absence of said one or more circRNA is indicative of the disease, wherein the disease is a neurodegenerative disease and wherein said bodily fluid is blood or cerebrospinal fluid.
  • 24. The method according to claim 23, wherein the determination step comprises: determining the level of said one or more circRNA; and comparing the determined level to a control level of said one or more circRNA; wherein differing levels between the determined and the control level are indicative of the disease.
  • 25. The method according to claim 24, wherein said one or more circRNA is differentially expressed between the diseased and non-diseased state.
  • 26. The method according to claim 23, wherein the circRNA is detected by detection of an exon-exon-junction in a head-to-tail arrangement.
  • 27. The method according to claim 26, wherein circRNA is detected using a method selected from the group consisting of probe hybridization based methods, nucleic acid amplification based methods, and nucleic acid sequencing.
  • 28. The method according to claim 23, wherein the sample is treated with RNase R before determination of the circRNA.
  • 29. The method according to claim 23, wherein more than one circRNAs from a panel of circRNAs are determined.
  • 30. The method according to claim 29, wherein said panel comprises a plurality of circRNAs that have been identified as being present at differing levels in bodily fluid samples of patients having the disease and patients not having the disease, preferably identified by principal component analysis or clustering.
  • 31. The method according to claim 24, wherein said one or more circRNA comprises a sequence encoded by a sequence being at least 70% identical to a sequence selected from the group consisting of nucleotides 11 to 30 of any of the sequences of SEQ ID NO:1 to SEQ ID NO:910, preferably wherein the presence of increased or decreased levels as defined in Table 1 under “disease” for the circRNA comprising the respective sequence are indicative of the presence of a neurodegenerative disease.
  • 32. The method according to claim 31, wherein said one or more circRNA has a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:911 to SEQ ID NO:1820, preferably wherein the presence of increased or decreased levels as defined in Table 1 under “disease” for the circRNA having the respective sequence are indicative of the presence of a neurodegenerative disease.
  • 33. The method according to claim 24, wherein the levels of at least 100 circRNAs comprising a sequence encoded by a sequence being at least 70% identical to a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910 are determined in said sample of a bodily fluid of said subject and controlled to the respective control level.
  • 34. The method according to claim 33, wherein said at least 100 circRNAs have a sequence encoded by a sequence selected from the group consisting of SEQ ID NO:911 to 1820, or a sequence being at least 70% identical thereto.
  • 35. The method according to claim 33, wherein said at least 100 circRNAs comprise a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:200, or said circRNAs have a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO: 911 to 1110.
  • 36. The method according to claim 33, wherein the presence of increased or decreased levels as defined in Table 1 under “disease” for the respective circRNA for at least 10 circRNAs are indicative of the presence of a neurodegenerative disease.
  • 37. The method according to claim 31, wherein the levels of all circRNAs comprising a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910 or all circRNAs having a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:911 to SEQ ID NO:1820 are detected, wherein the presence of increased or decreased levels as defined in Table 1 under “disease” for at least 10 of the respective circRNAs are indicative of the presence of a neurodegenerative disease.
  • 38. The method according to claim 23, wherein the neurodegenerative disease is Alzheimer's disease.
  • 39. A nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of the sequences listed in Table 1, or specifically hybridizing to a reverse complement sequence thereof.
  • 40. A kit for diagnosing a neurodegenerative disease, comprising means for specifically detecting one or more nucleic acid sequence encoded by a sequence selected from the group consisting of SEQ ID NO:1 to 910 or SEQ ID NO:911 to SEQ ID NO:1820, or a sequence being at least 70% identical to the recited sequences.
  • 41. The kit according to claim 40, wherein the kit comprises means for specifically detecting at least 100 nucleic acid sequences encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910 or SEQ ID NO:911 to SEQ ID NO:1820.
  • 42. The kit according to claim 40, wherein the kit comprises one or more nucleic acid probes specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:910, or hybridizing the reverse complements thereof.
  • 43. The kit according to claim 40, further comprising means for handling and/or preparation of a bodily fluid sample, wherein the bodily fluid is blood or cerebrospinal fluid.
  • 44. An array for determining the presence or level of a plurality of nucleic acids, comprising a plurality of probes, wherein the plurality of probes specifically hybridize to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:910, or hybridizing the reverse complements thereof.
Priority Claims (1)
Number Date Country Kind
15187446.8 Sep 2015 EP regional
PCT Information
Filing Document Filing Date Country Kind
PCT/EP2016/073321 9/29/2016 WO 00