A METHOD FOR DIAGNOSING A DISEASE BY DETECTION OF circRNA IN BODILY FLUIDS

TECHNICAL FIELD OF THE INVENTION

The present invention relates to the field of medicine and RNA biology. In particular, it relates to the diagnosis of a disease using circRNAs; more particularly, the present invention relates to the diagnosis of a neurodegenerative disease, e.g. Alzheimer's disease.

BACKGROUND OF THE INVENTION

Many diseases are associated with deregulation of gene expression. Such deregulation is in many cases detectable at early stages of the disease. In fact, detection of deregulated expression often serves as a biomarker for a disease or the risk of acquiring a disease before the disease manifests in terms of symptoms. However, diagnosis is often restricted to samples of the diseased tissue. In some cases, diagnosis also aims at the detection of biomarkers in easily accessible samples, such as blood. However, these methods are restricted to protein biomarkers and require cumbersome preparation of the samples, e.g. serum or plasma samples. The direct readout of expression, i.e. RNA, is in most cases not feasible in blood samples, as RNAs are prone to degradation. Hence, RNA is currently a poor biomarker in blood, in particular for diseases manifesting in tissues other than blood.

Regulatory RNAs such as microRNAs (miRNAs) or long non-coding RNAs (lncRNAs) have been implicated in many biological processes and human diseases such as cancer (reviewed in Batista P J, Chang H Y. Long Noncoding RNAs: Cellular Address Codes in Development and Disease. Cell. 2013; 152(6):1298-1307; and Cech T R, Steitz J A. The Noncoding RNA Revolution Trashing Old Rules to Forge New Ones. Cell. 2014; 157(1):77-94). Recent studies have drawn attention to a new class of RNA that is endogenously expressed as single-stranded, covalently closed circular molecules (circRNA, reviewed in Jeck W R, Sharpless N E. Detecting and characterizing circular RNAs. Nature Biotechnology. 2014; 32(5):453-461). Most circRNAs are probably products of a ‘back-splice’ reaction that joins a splice donor site with an upstream splice acceptor site (see Ashwal-Fluss R, Meyer M, Pamudurti N R, et al. circRNA Biogenesis Competes with Pre-mRNA Splicing. MOLCEL. 2014; 1-12; and Starke S, Jost I, Rossbach O, et al. Exon Circularization Requires Canonical Splice Signals. Cell Reports. 2015; 10(1):103-111). Circular RNA is known for several decades from viroids, viruses and plants, but until recently only few mammalian circRNAs were reported. Sequencing based studies lately revealed that circRNAs are abundantly and prevalently expressed across life, oftentimes in a tissue and developmental-stage specific manner (see Danan M, Schwartz S, Edelheit S, Sorek R. Transcriptome-wide discovery of circular RNAs in Archaea. Nucleic Acids Res. 2012; 40(7):3131-3142; Salzman J, Gawad C, Wang P L, Lacayo N, Brown P O. Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS ONE. 2012; 7(2):e30733; Jeck W R, Sorrentino J A, Wang K, et al. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. 2013; 19(2):141-157; Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 
2013; 495(7441):333-338; Salzman J, Chen R E, Olsen M N, Wang P L, Brown P O. Cell-Type Specific Features of Circular RNA Expression. PLoS Genetics. 2013; 9(9):e1003777; Wang P L, Bao Y, Yee M-C, et al. Circular RNA is expressed across the eukaryotic tree of life. PLoS ONE. 2014; 9(3):e90859; Guo J U, Agarwal V, Guo H, Bartel D P. Expanded identification and characterization of mammalian circular RNAs. Genome Biology. 2014; 1-14; and You X, Vlatkovic I, Babic A, et al. Neural circular RNAs are derived from synaptic genes and regulated by development and plasticity. Nat Neurosci. 20117-25). The vast majority of circRNAs consists of 2-4 exons of protein coding genes, but they can also derive from intronic, non-coding, antisense, 5′ or 3′ untranslated or intergenic genomic regions (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338; and Zhang Y, Zhang X-O, Chen T, et al. Circular Intronic Long Noncoding RNAs. MOLCEL. 2013; 1-15). Although not fully understood, the biogenesis of many mammalian circRNAs depends on complementary sequences within flanking introns (see Ashwal-Fluss R, Meyer M, Pamudurti N R, et al. circRNA Biogenesis Competes with Pre-mRNA Splicing. MOLCEL. 2014; 1-12; Rybak-Wolf A, Stottmeister C, Glažar P, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. MOLCEL. 2015; 1-17; Zhang X-O, Wang H-B, Zhang Y, et al. Complementary Sequence-Mediated Exon Circularization. Cell. 2014; Liang D, Wilusz J E. Short intronic repeat sequences facilitate circular RNA production. Genes and Development. 2014; Conn S J, Pillman K A, Toubia J, et al. The RNA Binding Protein Quaking Regulates Formation of circRNAs. Cell. 2015; 160(6):1125-1134; and Ivanov A, Memczak S, Wyler E, et al. Analysis of Intron Sequences Reveals Hallmarks of Circular RNA Biogenesis in Animals. CellReports. 
2015; 10(2):170-177) and their expression can be modulated by antagonistic or activating trans-acting factors such as ADAR and Quaking (see Ivanov A, Memczak S, Wyler E, et al. Analysis of Intron Sequences Reveals Hallmarks of Circular RNA Biogenesis in Animals. Cell Reports. 2015; 10(2):170-177; and Conn S J, Pillman K A, Toubia J, et al. The RNA Binding Protein Quaking Regulates Formation of circRNAs. Cell. 2015; 160(6):1125-1134; respectively). Although the function of animal circRNAs is largely unknown, it was demonstrated that the circRNAs CDR1as (ciRS-7) and SRY can act as antagonists of specific miRNAs by functioning as miRNA sponges (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338; and Hansen T B, Jensen T I, Clausen B H, et al. Natural RNA circles function as efficient microRNA sponges. Nature. 2013; 495(7441):384-388). Moreover, stable knockdown of CDR1as caused a migration defect in cell culture and a circRNA produced from the muscleblind transcript can bind muscleblind protein and likely regulate its expression levels (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338; and Ashwal-Fluss R, Meyer M, Pamudurti N R, et al. circRNA Biogenesis Competes with Pre-mRNA Splicing. MOLCEL. 2014; 1-12). Besides these specific functions for the few in-depth analyzed circRNAs, a recent study uncovered a putatively more general competition mechanism between linear RNA splicing and co-transcriptional circular RNA splicing (see Ashwal-Fluss R, Meyer M, Pamudurti N R, et al. circRNA Biogenesis Competes with Pre-mRNA Splicing. MOLCEL. 2014; 1-12). The lack of free ends, i.e. its circularity, renders circRNAs resistant to exonucleolytic activities within cells and in extracellular environments. 
Thus, circRNAs are stable molecules, as demonstrated by their long half-lives in cells, a feature that distinguishes them from canonical linear RNA isoforms (see Cocquerelle C, Daubersies P, Majérus MA, Kerckaert J P, Bailleul B. Splicing with inverted order of exons occurs proximal to large introns. EMBO J 1992; 11(3):1095-1098; Jeck W R, Sorrentino J A, Wang K, et al. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. 2013; 19(2):141-157; and Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338).

The inventors now show for the first time the presence of tens of thousands of circRNAs in standard clinical whole blood specimens of diseased subjects and thereby show that circRNAs function as biomarkers in human disorders, in particular neurodegenerative disorders, as exemplified by Alzheimer's disease. Strikingly, in hundreds of cases the mRNA transcripts which give rise to circRNAs were almost undetectable while the corresponding circRNAs were highly expressed, underlining the significance of circRNAs as novel biomarkers. Attempts have been made to detect circRNAs as biomarkers in blood. However, these approaches use processed blood. Blood exosomes have been postulated to comprise circRNAs (Li, Yan et al. (2015); Circular RNA is enriched and stable in exosomes: a promising biomarker for cancer diagnosis; Cell Res, 25(8): 981-984). However, blood exosomes are difficult to obtain, and the cumbersome procedure is susceptible to errors and renders the significance of the circRNA levels so obtained questionable. The present inventors, however, showed that circRNAs are detectable in unprocessed samples, e.g. whole blood, and are surprisingly well suited as biomarkers.

Alzheimer's disease (also referred to as “AD”) is the cause of 60% to 70% of cases of dementia. It is a chronic neurodegenerative disease that starts slowly and worsens over time. One of the first symptoms is short-term memory loss. As the disease advances, symptoms can include problems with language, disorientation (including easily getting lost), mood swings, loss of motivation, not managing self care, and behavioural issues. As a person's condition declines, she or he often withdraws from family and society. Gradually, bodily functions are lost, ultimately leading to death. Although the speed of progression can vary, the average life expectancy following diagnosis is three to nine years. The cause of Alzheimer's disease is poorly understood. About 70% of the risk is believed to be genetic, with many genes usually involved. Other risk factors include a history of head injuries, depression, or hypertension. The disease process is associated with plaques and tangles in the brain.

Currently, the diagnosis of Alzheimer's disease is based on the history of the illness and cognitive testing. These tests are often supplemented by medical imaging and blood tests to rule out other possible causes. Initial symptoms are often mistaken for normal ageing. Today, examination of brain tissue is needed for a definite diagnosis. However, brain tissue is not easily accessible and the surgical intervention carries severe risks. Mental and physical exercise, and avoiding obesity, may decrease the risk of AD.

In 2010, there were between 21 and 35 million people worldwide with AD. It most often begins in people over 65 years of age, although 4% to 5% of cases are early-onset Alzheimer's, which begins before this age. It affects about 6% of people 65 years and older. In 2010, dementia resulted in about 486,000 deaths. In developed countries, AD is one of the most financially costly diseases. There is a long-felt need for a direct, easy and reliable diagnosis of Alzheimer's disease to allow intervention and prevention of adverse effects of the beginning or progressing mental degeneration.

The molecular underpinnings of AD are controversially debated, although substantial research efforts have been made and are ongoing (the annual NIH budget for AD in 2015 was $566 million). In particular, there is an urgent need for biomarkers for AD, since it is believed that the molecular alterations involved in the disease precede the symptoms by years or even decades, hindering therapeutic interventions.

SUMMARY OF THE INVENTION

The present invention provides a method that overcomes the drawbacks outlined above. The inventors have found that circRNAs can be detected in large amounts in samples of a bodily fluid. The inventors have further proven that the circRNAs are indicative for a disease that is not a disease of the bodily fluid. Thereby, a tool is provided to directly diagnose a disease by determining the presence or absence of one or more circRNAs in a bodily fluid. The invention therefore provides a new class of biomarkers in bodily fluids, e.g. blood.

Hence, the present invention relates to a method for diagnosing a disease of a subject, comprising the step of:

- determining the presence or absence of one or more circular RNA (circRNA) in a sample of a bodily fluid of said subject;

wherein the presence or absence of said one or more circRNA is indicative for the disease. Preferably, said disease is not a disease of said bodily fluid.

As has been shown by the inventors, certain circRNAs may be present in samples of a diseased subject at differing levels as compared to samples from healthy subjects. Hence, it may be desirable to decide on “presence” or “absence” of a circRNA when compared to a control level. Hence, in a preferred embodiment of the method according to the present invention the determination step comprises:

- determining the level of said one or more circRNA;
- comparing the determined level to a control level of said one or more circRNA;

wherein differing levels between the determined and the control level are indicative for the disease. Hence, the invention also relates to a method for diagnosing a disease of a subject, comprising the steps of:

- determining the level of said one or more circRNA;
- comparing the determined level to a control level of said one or more circRNA;

wherein differing levels between the determined and the control level are indicative for the disease.

In a preferred embodiment of the present invention, the method is a method for diagnosing a neurodegenerative disease in a subject. Hence, the invention also relates to a method for diagnosing a neurodegenerative disease, preferably Alzheimer's disease, in a subject, comprising the steps of:

- determining the level of one or more circRNA in a sample of a bodily fluid of said subject;
- comparing the determined level to a control level of said one or more circRNA;

wherein differing levels between the determined and the control level are indicative for the disease. A particularly preferred neurodegenerative disease is Alzheimer's disease.

The inventors, furthermore, identified circRNAs that were not previously known to be present in blood. These novel blood circRNAs are listed in Table 1. Hence the present invention also relates to a nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to 910, and a RNA sequence encoded by a sequence of SEQ ID NO:1 to 910, or specifically hybridizing to a reverse complement sequence thereof; preferably specifically hybridizing to a sequence selected from the group consisting of SEQ ID NO:1 to 910, and a RNA sequence encoded by a sequence of SEQ ID NO:1 to 910, or specifically hybridizing to a reverse complement sequence thereof.

Furthermore, the invention relates to a kit for diagnosing a neurodegenerative disease, comprising means for specifically detecting one or more nucleic acid sequence encoded by a sequence selected from the group consisting of SEQ ID NO:1 to 910 or SEQ ID NO:911 to SEQ ID NO:1820.

The invention, furthermore, provides an array for determining the presence or level of a plurality of nucleic acids, comprising a plurality of probes, wherein the plurality of probes specifically hybridize to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and a RNA sequence encoded by a sequence of SEQ ID NO:1 to 910, or to a reverse complement sequence thereof; preferably the plurality of probes comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of at least 100, preferably at least 150 more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and a RNA sequence encoded by a sequence of SEQ ID NO:1 to 910; or specifically hybridize to the reverse complement sequences thereof, preferably the plurality of probes comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of SEQ ID NO:1 to SEQ ID NO:200, or a RNA sequence encoded by a sequence of SEQ ID NO:1 to 200, or the reverse complement sequences thereof.

The invention in particular relates to the use of a nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910; and a RNA sequence encoded by a sequence of SEQ ID NO:1 to 910, or hybridizing to the reverse complement thereof, a kit according to the invention, or an array according to the invention for the diagnosis of a neurodegenerative disease, preferably for the diagnosis of Alzheimer's disease.

FIGURE LEGENDS

FIG. 1: Thousands of circRNAs are reproducibly detected in human blood. (A) Total RNA was extracted from human whole blood samples and rRNA was depleted. cDNA libraries were synthesized using random primers and subjected to sequencing. Raw sequencing reads were used for circRNA detection as previously described (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338). Sequencing reads that map continuously to the human reference genome were disregarded. From unmapped reads, anchors were extracted and independently mapped. Anchors that align consecutively indicate linear splicing events (1), whereas alignment in reverse orientation indicates head-to-tail splicing as observed for circular RNAs (2). After filtering of linear splicing events and circRNA candidates (see Methods in the Example section), the genomic coordinates and additional information such as read count, alignment quality and annotation are documented (Table 3). (B) circRNA candidate expression in human whole blood samples from two donors; ECDF=empirical cumulative distribution function. circRNA candidates tested in this study are annotated as numbers. Right panel: mRNA and long non-coding RNA (lncRNA) (n=17,282) expression per gene in two blood samples in transcripts per million (TPM); RNAs with putative circular isoforms (n=2,523) are highlighted in blue; R-values: Spearman correlation for RNAs found in both samples. (C) ENSEMBL genome annotation for reproducibly detected circRNA candidates (see also FIG. 5). Number of circRNAs with at least one splice site in each category is given. (D) Number of distinct circRNA candidates per gene. y-axis=log₂(circRNA frequency+1). Gene names with the highest numbers are highlighted. (E) Expression level of top 8 circRNA candidates measured with sequencing (left panel) and divergent primers in qPCR (right panel); Ct=cycle threshold; linear control genes VCL and TFRC were measured with convergent primers.

FIG. 2: Top expressed blood circRNAs dominate over linear RNA isoforms. (A) Example for the read coverage of a top expressed blood circRNA produced from the PCNT gene locus (http://genome.ucsc.edu; see Kent W J, Sugnet C W, Furey T S, et al. The human genome browser at UCSC. Genome Research. 2002; 12(6):996-1006). Data are shown for the human HEK293 cell line (see Ivanov A, Memczak S, Wyler E, et al. Analysis of Intron Sequences Reveals Hallmarks of Circular RNA Biogenesis in Animals. Cell Reports. 2015; 10(2):170-177) and two biologically independent blood RNA preparations. (B, C) Relative expression and raw Ct values of top expressed blood circRNAs and corresponding linear isoforms in HEK293 cells (B) and whole blood (C).

FIG. 3: Circular to linear RNA isoform expression is high in blood compared to other tissues. (A) Comparison of circular to linear RNA isoforms in blood. circRNAs were measured by head-to-tail spanning reads. As a proxy for linear RNA expression, median linear splice site spanning reads were counted. Data are shown for one replicate each of blood (A), cerebellum (B) and liver (C). The relative fraction of circRNA candidates with >4× higher expression than linear isoforms is given as inset; eight tested candidates are indicated by numbers; circRNAs derived from hemoglobin are marked in (A). (D) Mean circular-to-linear RNA expression ratio for the same samples, in two biologically independent replicates. Error bars indicate the standard error of the mean; *** denotes P<0.001, permutation test on pooled replicate data (see Method section in the Examples). (A-D) represent expression datasets for one replicate per sample (FIG. 15).

FIG. 4: Comparative analysis of blood circRNA expression in Alzheimer's disease patients and controls. (A) Principal Component Analysis (PCA) of circRNA expression for 5 control (H) and 5 Alzheimer's disease (AD) patients. A circRNA subset comprising the top 910 (out of 20,969) detected circRNAs was analyzed (see Results section in the Examples). (B) Analysis as in (A) for the corresponding linear RNA isoforms measured by median read count of linear splice junctions. (C) Expression values of the 200 circRNA candidates with highest weight in PC2 (see A) were used for unsupervised clustering (Spearman's rank correlation as distance metric, see Method section in the Examples). PC2 represents the diseased/healthy principal component. Histograms show expression distribution. Patient details are given underneath each patient ID. (D) Analysis as in (C) but for linear RNAs of the corresponding genes (n=167).

FIG. 5: Reproducibility of circRNA candidate detection. The overlap of 2,442 circRNAs found with at least 2 read counts in both samples is considered as reproducibly detected circRNA set.

FIG. 6: Technical reproducibility of circRNA candidate detection. A library of blood sample 1 (H_1) was sequenced twice (see Table 2).

FIG. 7: GO annotation of circRNAs and linear RNAs in blood. Significantly enriched GO terms (p<0.05) for circRNAs found in both samples (n=2,442) and for the same number of top expressed linear RNAs.

FIG. 8: Predicted circRNA length. Predicted spliced circRNA length distributions for circRNA candidates detected in liver, cerebellum and blood.

FIG. 9: circRNA candidate validation. (A) Top circRNA candidate expression was measured in qPCR using divergent primers on mock or RNase R treated total RNA preparations. 7 of 8 candidates were successfully amplified, while candidate 7 did not yield specific PCR products and is therefore excluded from further analysis. Linear RNAs and previously described circRNAs are shown as controls. (B) PCR amplicons for divergent and convergent primer sets (c = circular, l = linear) of the tested candidates; end point analysis after 40 cycles. (C) PCR amplicons were subjected to Sanger sequencing and checked for the presence of a head-to-tail junction; a representative example result is shown.

FIG. 10: Comparison of circRNA candidates in blood to liver and cerebellum. (A) Comparison of circular RNA candidates detected in blood (sample 1) and cerebellum shown for the whole expression range. (B) fraction of circRNA candidates that overlap between the two samples binned by blood expression level. (C, D) analysis as before but for liver circRNA candidates.

FIG. 11: Correlation of linear RNAs in cerebellum and blood and liver and blood. Number of detected transcripts: blood=29,908; cerebellum=38,192; liver=27,880; TPM=transcripts per million.

FIG. 12: Comparison of circ-to-linear expression by RNA-Seq and qPCR. Raw Ct values (cycle threshold) and median linear splice junction spanning read counts are given for the respective RNA isoform.

FIG. 13: Histogram of principal components. Principal components (PC) were calculated from the analysis shown in FIG. 4 (A, B).

FIG. 14: Number of exons per circRNA in blood. Histogram of number of exons per circRNA. Reproducibly detected set (2,442) without intergenic circRNAs (n=27); median exon number: 2, mean exon number: 2.8.

FIG. 15: List of circRNAs detected in human blood. Genomic location, ENSEMBL gene identifier and symbols and gene biotype are given together in Table 1, infra. Here, raw read counts for each circRNA candidate in each sample of healthy subjects (H_1 to H_5) and subjects suffering from Alzheimer's disease (AD_1 to AD_5) are given.

DETAILED DESCRIPTION OF THE INVENTION

As outlined herein and exemplified in the Examples, the inventors for the first time provide evidence that circular RNAs (circRNAs) are present in whole blood in large amounts and are suited as biomarkers for diseases in a subject. Hence, the invention relates to a method for diagnosing a disease of a subject, comprising the step of determining the presence or absence of one or more circular RNA (circRNA) in a sample of a bodily fluid of said subject; wherein the presence or absence of said one or more circRNA is indicative for the disease.

It was unexpectedly possible to show a correlation of differing levels of RNAs and a disease in a tissue other than the sample tissue, i.e. other than the bodily fluid tested. Hence, in a preferred embodiment said disease is not a disease of said bodily fluid. The gist of the present invention is that circRNAs in bodily fluids like blood are unexpectedly suited as biomarkers.

The term “biomarker” (biological marker) was introduced in 1989 as a Medical Subject Heading (MeSH) term: “measurable and quantifiable biological parameters (e.g., specific enzyme concentration, specific hormone concentration, specific gene phenotype distribution in a population, presence of biological substances) which serve as indices for health- and physiology-related assessments, such as disease risk, psychiatric disorders, environmental exposure and its effects, disease diagnosis, metabolic processes, substance abuse, pregnancy, cell line development, epidemiologic studies, etc.” In 2001, an NIH working group standardized the definition of a biomarker as “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention” and defined types of biomarkers. A biomarker may be measured on a biosample (as a blood, urine, or tissue test), it may be a recording obtained from a person (blood pressure, ECG, or Holter), or it may be an imaging test (echocardiogram or CT scan). Biomarkers can indicate a variety of health or disease characteristics, including the level or type of exposure to an environmental factor, genetic susceptibility, genetic responses to exposures, markers of subclinical or clinical disease, or indicators of response to therapy. Thus, a simplistic way to think of biomarkers is as indicators of disease trait (risk factor or risk marker), disease state (preclinical or clinical), or disease rate (progression). Accordingly, biomarkers can be classified as antecedent biomarkers (identifying the risk of developing an illness), screening biomarkers (screening for subclinical disease), diagnostic biomarkers (recognizing overt disease), staging biomarkers (categorizing disease severity), or prognostic biomarkers (predicting future disease course, including recurrence and response to therapy, and monitoring efficacy of therapy). 
The biomarkers of the present invention are preferably antecedent or screening biomarkers. Hence, the methods are methods for diagnosing the presence or the risk for acquiring a disease.

Biomarkers may also serve as surrogate end points. Although there is limited consensus on this issue, a surrogate end point is one that can be used as an outcome in clinical trials to evaluate safety and effectiveness of therapies in lieu of measurement of the true outcome of interest. The underlying principle is that alterations in the surrogate end point track closely with changes in the outcome of interest. Surrogate end points have the advantage that they may be gathered in a shorter time frame and with less expense than end points such as morbidity and mortality, which require large clinical trials for evaluation. Additional values of surrogate end points include the fact that they are closer to the exposure/intervention of interest and may be easier to relate causally than more distant clinical events. An important disadvantage of surrogate end points is that if the clinical outcome of interest is influenced by numerous factors (in addition to the surrogate end point), residual confounding may reduce the validity of the surrogate end point. It has been suggested that the validity of a surrogate end point is greater if it can explain at least 50% of the effect of an exposure or intervention on the outcome of interest.

A “sample” in the meaning of the invention can be any biological fluid of the subject, such as lymph, saliva, urine, cerebrospinal fluid or blood. The sample is collected from the patient and subjected to the diagnosis according to the invention. The sample of the bodily fluid is in a preferred embodiment selected from the group consisting of blood, cerebrospinal fluid, saliva, serum, plasma, and semen; in the most preferred embodiment the sample is a whole blood sample. A “sample” in the meaning of the invention may also be a sample originating from a biochemical or chemical reaction, such as the product of an amplification reaction. Liquid samples may be subjected to one or more pre-treatments prior to use in the present invention. Such pre-treatments include, but are not limited to, dilution, filtration, centrifugation, concentration, sedimentation, precipitation or dialysis. Pre-treatments may also include the addition of chemical or biochemical substances to the solution, e.g. in order to stabilize the sample and the contained nucleic acids, in particular the circRNAs. Such chemical or biochemical substances include acids, bases, buffers, salts, solvents, reactive dyes, detergents, emulsifiers, or chelators, like EDTA. The sample may for instance be taken and directly mixed with such substances. In a particularly preferred embodiment of the invention the sample is a whole blood sample. The whole blood sample is preferably not pre-treated by means of dilution, filtration, centrifugation, concentration, sedimentation, precipitation or dialysis. It is, however, preferred that substances are added to the sample in order to stabilize the sample until onset of analysis. “Stabilizing” in this context means prevention of degradation of the circRNAs to be determined. Preferred stabilizers in this context are EDTA, e.g. K₂EDTA, RNase inhibitors, alcohols, e.g. ethanol and isopropanol, and agents used to salt out proteins (such as RNAlater).

“Whole blood” is a venous, arterial or capillary blood sample in which the concentrations and properties of cellular and extra-cellular constituents remain relatively unaltered when compared with their in vivo state. In some embodiments, anticoagulation in vitro stabilizes the constituents in a whole blood sample.

In a preferred embodiment the sample comprises a nucleic acid or nucleic acids. The term “nucleic acid” is here used in its broadest sense and comprises ribonucleic acids (RNA) and deoxyribonucleic acids (DNA) from all possible sources, in all lengths and configurations, such as double stranded, single stranded, circular, linear or branched. All sub-units and sub-types are also comprised, such as monomeric nucleotides, oligomers, plasmids, viral and bacterial nucleic acids, as well as genomic and non-genomic DNA and RNA from the subject, circular RNA (circRNA), messenger RNA (mRNA) in processed and unprocessed form, transfer RNA (tRNA), heterogeneous nuclear RNA (hn-RNA), ribosomal RNA (rRNA), complementary DNA (cDNA) as well as all other conceivable nucleic acids. However, in the most preferred embodiment the sample comprises circRNAs.

“Presence” or “absence” of a circRNA in connection with the present invention means that the circRNA is present at levels above a certain threshold or below a certain threshold, respectively. In case the threshold is “0” this would mean that “presence” is the actual presence of circRNA in the sample and “absence” is the actual absence. However, “presence” in context with the present invention may also mean that the respective circRNA is present at a level above a threshold, e.g. the levels determined in a control. “absence” in this context then means that the level of the circRNA is at or below the certain threshold. Hence, it is preferred that the method of the present invention comprises determining of the level of one or more circRNA and comparing it to a control level of said one or more circRNA. In a preferred embodiment of the invention the determination step comprises: (i) determining the level of said one or more circRNA; and (ii) comparing the determined level to a control level of said one or more circRNA; wherein differing levels between the determined and the control level are indicative for the disease. In other words, the invention relates to a method for diagnosing a disease of a subject, comprising the step of (i) determining the level of said one or more circRNA; and (ii) comparing the determined level to a control level of said one or more circRNA; wherein differing levels between the determined and the control level are indicative for the disease.
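Purely as an illustration, and not as part of the claimed method, the threshold-based meaning of “presence” and “absence” given above may be sketched in Python as follows. The function name call_presence, the numeric levels and the threshold values are hypothetical and serve only to show the comparison; they are not measured data.

```python
def call_presence(level: float, threshold: float = 0.0) -> str:
    """Return "presence" if the circRNA level is above the threshold,
    otherwise "absence" (level at or below the threshold)."""
    return "presence" if level > threshold else "absence"


# Hypothetical circRNA levels (e.g. normalized read counts); illustration only.
print(call_presence(12.0))                  # threshold 0: actual presence
print(call_presence(0.0))                   # threshold 0: actual absence
print(call_presence(12.0, threshold=20.0))  # level below a control-derived threshold
```

With the default threshold of 0 the call reduces to actual presence or absence; with a threshold taken from a control, the same level may be called “absence”.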

The term “control level” relates to a level to which the determined level is compared in order to allow the distinction between “presence” or “absence” of the circRNA. The control level is preferably the level which is determinant for the deductive step of making the actual diagnosis. Control level in a preferred embodiment relates to the level of the respective circRNA in a healthy subject or a population of healthy subjects, i.e. a subject not having the disease to be diagnosed, e.g. not having a neurodegenerative disease, such as Alzheimer's disease. The skilled person with the disclosure of the present application is in the position to determine suitable control levels using common statistical methods.

In the context of the present invention, the levels of the one or more circRNA may be analyzed in a number of fashions well known to a person skilled in the art. For example, each assay result obtained may be compared to a “normal” or “control” value, or a value indicating a particular disease or outcome. A particular diagnosis/prognosis may depend upon the comparison of each assay result to such a value, which may be referred to as a diagnostic or prognostic “threshold”. In certain embodiments, assays for one or more diagnostic or prognostic indicators are correlated to a condition or disease by merely the presence or absence of the circRNAs in the assay. For example, an assay can be designed so that a positive signal only occurs above a particular threshold level of interest, and below which level the assay provides no signal above background.

The sensitivity and specificity of a diagnostic and/or prognostic test depend on more than just the analytical “quality” of the test; they also depend on the definition of what constitutes an abnormal result, i.e. when a level may be regarded as differing from a control level. In practice, Receiver Operating Characteristic curves (ROC curves) are typically calculated by plotting the value of a variable versus its relative frequency in “normal” (i.e. apparently healthy individuals not having the disease to be diagnosed) and “disease” populations. For any particular marker, a distribution of marker levels for subjects with and without a disease will likely overlap. Under such conditions, a test does not absolutely distinguish normal from disease with 100% accuracy, and the area of overlap indicates where the test cannot distinguish normal from disease. A threshold is selected, below which the test is considered to be abnormal and above which the test is considered to be normal. The area under the ROC curve is a measure of the probability that the perceived measurement will allow correct identification of a condition. ROC curves can be used even when test results do not necessarily give an accurate number. As long as one can rank results, one can create a ROC curve. For example, results of a test on “disease” samples might be ranked according to degree (e.g. 1=low, 2=normal, and 3=high). This ranking can be correlated to results in the “normal” or “control” population, and a ROC curve created. These methods are well known in the art. See, e.g., Hanley et al. 1982. Radiology 143: 29-36. Preferably, a threshold is selected to provide a ROC curve area of greater than about 0.5, more preferably greater than about 0.7, still more preferably greater than about 0.8, even more preferably greater than about 0.85, and most preferably greater than about 0.9. The term “about” in this context refers to +/−5% of a given measurement.

The horizontal axis of the ROC curve represents (1-specificity), which increases with the rate of false positives. The vertical axis of the curve represents sensitivity, which increases with the rate of true positives. Thus, for a particular cut-off selected, the value of (1-specificity) may be determined, and a corresponding sensitivity may be obtained. The area under the ROC curve is a measure of the probability that the measured marker level will allow correct identification of a disease or condition. Thus, the area under the ROC curve can be used to determine the effectiveness of the test.
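By way of illustration only, the relationship between cut-off, (1-specificity), sensitivity and the area under the ROC curve may be sketched as follows. The function names `roc_points` and `auc`, the example levels, and the assumption that higher levels indicate disease are hypothetical and not taken from the examples of the present application.

```python
def roc_points(control_levels, disease_levels):
    """Return (1 - specificity, sensitivity) pairs, one per cut-off.

    Assumption for this sketch: a sample is called positive when its
    level is greater than or equal to the cut-off.
    """
    cutoffs = sorted(set(control_levels) | set(disease_levels), reverse=True)
    points = [(0.0, 0.0)]  # cut-off above every observed level
    for cutoff in cutoffs:
        true_pos = sum(1 for x in disease_levels if x >= cutoff)
        false_pos = sum(1 for x in control_levels if x >= cutoff)
        points.append((false_pos / len(control_levels),
                       true_pos / len(disease_levels)))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical circRNA levels in a control and a disease population.
control = [1.0, 2.0, 3.0, 4.0]
disease = [3.5, 5.0, 6.0, 7.0]
curve = roc_points(control, disease)
area = auc(curve)
```

Each point of `curve` corresponds to one candidate cut-off, so a threshold may be chosen by inspecting the trade-off between the two coordinates.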

In other embodiments, a positive likelihood ratio, negative likelihood ratio, odds ratio, or hazard ratio is used as a measure of a test's ability to predict risk or diagnose a disease. In the case of a positive likelihood ratio, a value of 1 indicates that a positive result is equally likely among subjects in both the “diseased” and “control” groups; a value greater than 1 indicates that a positive result is more likely in the diseased group; and a value less than 1 indicates that a positive result is more likely in the control group. In the case of a negative likelihood ratio, a value of 1 indicates that a negative result is equally likely among subjects in both the “diseased” and “control” groups; a value greater than 1 indicates that a negative result is more likely in the diseased group; and a value less than 1 indicates that a negative result is more likely in the control group.

In the case of an odds ratio, a value of 1 indicates that a positive result is equally likely among subjects in both the “diseased” and “control” groups; a value greater than 1 indicates that a positive result is more likely in the diseased group; and a value less than 1 indicates that a positive result is more likely in the control group.
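As a non-limiting sketch, the likelihood ratios and the odds ratio described above may be computed from the counts of a two-by-two table of test results. The function name `diagnostic_ratios` and the example counts are hypothetical; the formulas are the conventional definitions and the sketch assumes that none of the denominators is zero.

```python
def diagnostic_ratios(true_pos, false_pos, false_neg, true_neg):
    """Likelihood ratios and odds ratio from a 2x2 table.

    true_pos / false_neg: diseased subjects with a positive / negative test.
    false_pos / true_neg: control subjects with a positive / negative test.
    Assumption: no denominator below is zero.
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return {
        # positive result: diseased group relative to control group
        "positive_lr": sensitivity / (1.0 - specificity),
        # negative result: diseased group relative to control group
        "negative_lr": (1.0 - sensitivity) / specificity,
        "odds_ratio": (true_pos * true_neg) / (false_pos * false_neg),
    }


# Hypothetical counts: 50 diseased and 50 control subjects.
ratios = diagnostic_ratios(true_pos=40, false_pos=5, false_neg=10, true_neg=45)
```

With these hypothetical counts the positive likelihood ratio is greater than 1 and the negative likelihood ratio is less than 1, i.e. a positive result is more likely in the diseased group and a negative result is more likely in the control group.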

In the case of a hazard ratio, a value of 1 indicates that the relative risk of an endpoint (e.g., death) is equal in both the “diseased” and “control” groups; a value greater than 1 indicates that the risk is greater in the diseased group; and a value less than 1 indicates that the risk is greater in the control group.

The skilled artisan will understand that associating a diagnostic or prognostic indicator, with a diagnosis or with a prognostic risk of a future clinical outcome is a statistical analysis. For example, a marker level of lower than X may signal that a patient is more likely to suffer from an adverse outcome than patients with a level more than or equal to X, as determined by a level of statistical significance. For another marker, a marker level of higher than X may signal that a patient is more likely to suffer from an adverse outcome than patients with a level less than or equal to X, as determined by a level of statistical significance. Additionally, a change in marker concentration from baseline levels may be reflective of patient prognosis, and the degree of change in marker level may be related to the severity of adverse events. Statistical significance is often determined by comparing two or more populations, and determining a confidence interval and/or a p value. See, e.g., Dowdy and Wearden, Statistics for Research, John Wiley & Sons, New York, 1983. Preferred confidence intervals of the invention are 90%, 95%, 97.5%, 98%, 99%, 99.5%, 99.9% and 99.99%, while preferred p values are 0.1, 0.05, 0.025, 0.02, 0.01, 0.005, 0.001, and 0.0001.

Suitable threshold levels for the diagnosis of the disease can be determined for certain combinations of circRNAs. This can e.g. be done by grouping a reference population of patients according to their level of circRNAs into certain quantiles, e.g. quartiles, quintiles or even according to suitable percentiles. For each of the quantiles or groups above and below certain percentiles, hazard ratios can be calculated comparing the risk for an adverse outcome, i.e. a “disease” or “Alzheimer's disease”, between those patients who have a certain disease and those who have not. In such a scenario, a hazard ratio (HR) above 1 indicates a higher risk for an adverse outcome for the patients. A HR below 1 indicates beneficial effects of a certain treatment in the group of patients. A HR around 1 (e.g. +/−0.1) indicates no elevated risk for the particular group of patients. By comparison of the HR between certain quantiles of patients with each other and with the HR of the overall population of patients, it is possible to identify those quantiles of patients who have an elevated risk and those who benefit from medication and thereby stratify subjects according to the present invention.

In some cases presence of the disease will not affect patients with levels (e.g. in the fifth quintile) of a circRNA different from the “control level”, while in other cases patients with levels similar to the control level will be affected (e.g. in the first quintile). However, with the above explanations and his common knowledge, a skilled person is able to identify those groups of patients having a disease, e.g. a neurodegenerative disease such as Alzheimer's disease. Exemplarily, some combinations of levels of circRNAs are listed for Alzheimer's disease in the appended examples. In another embodiment of the invention, the diagnosis is determined by relating the patient's individual level of the circRNA to certain percentiles (e.g. the 97.5th percentile (in case increased levels are indicative for a disease) or the 2.5th percentile (in case decreased levels are indicative for a disease)) of a healthy population.
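The percentile-based comparison may, purely as an illustration, be sketched as follows. The helper names `percentile` and `flag_against_healthy`, the linear interpolation between order statistics, and the example population are assumptions of this sketch and are not prescribed by the present application.

```python
def percentile(values, q):
    """q-th percentile (0 <= q <= 100), linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def flag_against_healthy(level, healthy_levels, increased_is_disease=True):
    """Compare one patient's level to the healthy reference population."""
    if increased_is_disease:
        return level > percentile(healthy_levels, 97.5)
    return level < percentile(healthy_levels, 2.5)


# Hypothetical healthy reference population with levels 1.0 .. 101.0.
healthy = [float(x) for x in range(1, 102)]
upper_cutoff = percentile(healthy, 97.5)
lower_cutoff = percentile(healthy, 2.5)
```

Other percentile conventions exist and give slightly different cut-offs for small reference populations, so the convention used should be fixed before thresholds are compared.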

Kaplan-Meier estimators may be used for the assessment or prediction of the outcome or risk (e.g. diagnosis, relapse, progression or morbidity) of a patient.

“Equal” level in context with the present invention means that the levels differ by not more than ±10%, preferably by not more than ±5%, more preferably by not more than ±2%. “Decreased” or “increased” level in the context of the present invention means that the levels differ by more than 10%, preferably by more than 15%, more preferably by more than 20%.
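A minimal sketch of this classification, assuming that the difference is expressed relative to the control level and using the broadest tolerance of 10%, could read as follows. The function name `classify_level` and its parameters are hypothetical.

```python
def classify_level(determined, control, tolerance=0.10):
    """Classify a determined level as 'equal', 'increased' or 'decreased'.

    Assumption: the difference is measured relative to the control level,
    and levels within +/- tolerance of the control count as 'equal'.
    """
    if control <= 0:
        raise ValueError("control level must be positive")
    relative_change = (determined - control) / control
    if relative_change > tolerance:
        return "increased"
    if relative_change < -tolerance:
        return "decreased"
    return "equal"


# Hypothetical levels compared to a control level of 100.
calls = [classify_level(level, 100.0) for level in (125.0, 95.0, 80.0)]
```

The preferred narrower definitions of “equal” may be applied by passing `tolerance=0.05` or `tolerance=0.02`.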

The term “subject” relates to a subject to be diagnosed, preferably a subject suspected to have or to have a risk for acquiring a disease, preferably a neurodegenerative disease, more preferably a subject suspected to have or to have a risk for acquiring Alzheimer's disease. The subject is preferably an animal, more preferably a mammal, most preferably a human.

The inventors have found that differential abundance of circRNA in samples of a bodily fluid is suited as a biomarker. It has been found that differing levels of circRNAs correlate with a disease. This has been proven for Alzheimer's disease, a disease of neuronal tissue. Without being bound by theory, the correlation may be due to a passage of the circRNAs through the blood-brain barrier, i.e. the circRNAs detected in the method according to the present invention are differentially expressed in the tissue of the disease, i.e. the tissue of interest. In case of the neurodegenerative disease (e.g. Alzheimer's disease) the tissue of interest is neuronal tissue, e.g. the brain. Hence, in one embodiment of the present invention said one or more circRNA, the differing levels of which in the bodily fluid are attributed to the presence of the disease, is differentially expressed between the diseased and non-diseased state in the tissue of interest. Alternatively, the circRNA levels may be of secondary nature, e.g. arise due to an immune response in the bodily fluid. Hence, in a further embodiment said one or more circRNA, the differing levels of which in the bodily fluid are attributed to the presence of the disease, is not differentially expressed between the diseased and non-diseased state in the tissue of interest.

“Circular RNA” (circRNA) has been previously described, however not in connection with its detection in a bodily fluid, e.g. blood. The skilled person is able to determine whether a detected RNA is a circular RNA. In particular, a circRNA does not contain a free 3′-end or a free 5′-end, i.e. the entire nucleic acid is circularized. The circRNA is preferably a circularized, single stranded RNA molecule. Furthermore, the circRNA according to the present invention is a result of a head-to-tail splicing event that results in a discontinuous sequence with respect to the genomic sequence encoding the RNA. This means that, for a first sequence being present 5′-upstream of a second sequence in the genomic context, on the circRNA said first sequence is linked at its 5′-end to the 3′-end of said second sequence, thereby closing the circle. The consequence of this arrangement is that at the junction where the 5′-end of said first sequence is linked to the 3′-end of said second sequence a unique sequence is built that is present neither in the genomic context nor in the normally transcribed RNA, e.g. mRNA. It has been found by the inventors that these junctions in all identified circRNAs, in the genomic context, are flanked by the canonical splice sequence, the GT/AG splice signal known by the skilled person. Hence, the circRNAs according to the present invention preferably contain an exon-exon junction in a head-to-tail arrangement, as visualized in FIG. 1A as a result of a back-splicing reaction. The skilled person will recognize that a usual mRNA transcript contains exon-exon junctions in a tail-to-head arrangement, i.e. the 3′-end (tail) of the exon being upstream in the genomic context is linked to the 5′-end (head) of the exon being downstream in the genomic context. The actual junction, i.e. the point at which the one exon is linked to the other, is also referred to herein as “breakpoint”.
In a preferred embodiment the presence or absence of a circRNA or the level of a circRNA is determined by detection of an exon-exon-junction in a head-to-tail arrangement. One possible approach is exemplified in the enclosed examples. The detection of circular RNA has been previously described in Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338, which is incorporated herein by reference in particular as relates to the detection and annotation of circRNAs. The biogenesis of many mammalian circRNAs depends on complementary sequences within flanking introns (see Ashwal-Fluss R, Meyer M, Pamudurti N R, et al. circRNA Biogenesis Competes with Pre-mRNA Splicing. MOLCEL. 2014; 1-12; Rybak-Wolf A, Stottmeister C, Glažar P, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. MOLCEL. 2015; 1-17; Zhang X-O, Wang H-B, Zhang Y, et al. Complementary Sequence-Mediated Exon Circularization. Cell. 2014; Liang D, Wilusz J E. Short intronic repeat sequences facilitate circular RNA production. Genes and Development. 2014; Conn S J, Pillman K A, Toubia J, et al. The RNA Binding Protein Quaking Regulates Formation of circRNAs. Cell. 2015; 160(6):1125-1134; and Ivanov A, Memczak S, Wyler E, et al. Analysis of Intron Sequences Reveals Hallmarks of Circular RNA Biogenesis in Animals. CellReports. 2015; 10(2):170-177). Hence, in one embodiment the two introns upstream and downstream of and directly adjacent in the genomic context to the exons of the exon-exon junction (i.e. forming the exon-exon junction) in a head-to-tail arrangement often contain complementary sequences, e.g. a complementary sequence stretch of at least 15 nucleotides, preferably 500 nucleotides, more preferably 1000 nucleotides. In principle, for detection of circRNA, the RNA of a sample is sequenced after reverse transcription and library preparation.
Afterwards, the sequences are analyzed for the presence of exon-exon junctions in a head-to-tail arrangement. For instance, the sequenced RNA can be mapped to a reference genome using common mapping programs and software, e.g. bowtie2 (version 2.1.0; see Langmead B, Salzberg S L. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012; 9(4):357-359). Human reference genomes are known to the skilled person and include the human reference genome hg19 (February 2009, GRCh37; downloadable from the UCSC genome browser; see Kent W J, Sugnet C W, Furey T S, et al. The human genome browser at UCSC. Genome Research. 2002; 12(6):996-1006). Although circRNA detection in blood is possible without any preprocessing of the total RNA sample, it is preferred to deplete ribosomal RNAs (rRNA), preferably the majority of rRNA, to increase the sensitivity of circRNA detection, in particular when using RNA sequencing approaches. To this end, the content of rRNA in the sample should be depleted to less than 20%, preferably less than 10%, more preferably less than 2% with respect to the total RNA content. The rRNA depletion may be performed as known in the art, e.g. it may be facilitated by commercially available kits (e.g. Ribominus, Thermo Scientific) or enzymatic methods (Xian Adiconis et al. Comprehensive comparative analysis of RNA sequencing methods for degraded or low input samples. Nat Methods. 2013 July; 10(7): 10.1038/nmeth.2483).

Further, in a preferred embodiment RNA sequences which map continuously to the genome by aligning without any trimming (end-to-end mode) are neglected. Reads not mapping continuously to the genome are preferably used for circRNA candidate detection. The terminal sequences (anchors) from the sequences, e.g. 20 nt or more, may be extracted and re-aligned independently to the genome. From this alignment the sequences may be extended until the full circRNA sequence is covered, i.e. aligned. Consecutively aligning anchors indicate linear splicing events whereas alignment in reverse orientation indicates head-to-tail splicing as observed in circRNAs (see FIG. 1A). The splicing events so identified are filtered using the following criteria: 1) GT/AG signal flanking the splice sites in the genomic context; 2) the breakpoint, i.e. the exon-exon-junction, can be unambiguously detected; and 3) no more than 100 kilobases distance between the two splice sites in the genomic context. Further optional criteria may be used, depending on the method chosen, e.g. a maximum of two mismatches when extending the anchor alignments; a breakpoint no more than two nucleotides inside the alignment of the anchors; at least two independent reads supporting the head-to-tail splice junction; and/or a minimum difference of 35 in the bowtie2 alignment score between the first and the second best alignment of each anchor.
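The anchor-based classification and the filtering criteria above may be illustrated by the following simplified sketch. It is not the pipeline used in the examples; the record types `Anchor` and `Candidate`, their field names, and the functions `splice_type` and `passes_filters` are hypothetical, and the anchor alignments are assumed to have been produced beforehand by a mapping program.

```python
from collections import namedtuple

# Hypothetical records: one aligned anchor, and one candidate splice event.
Anchor = namedtuple("Anchor", "chrom strand start end")
Candidate = namedtuple(
    "Candidate", "donor_signal acceptor_signal unambiguous distance reads")


def splice_type(head, tail):
    """Classify a read from its 5' (head) and 3' (tail) anchor alignments."""
    if head.chrom != tail.chrom or head.strand != tail.strand:
        return "other"
    if head.strand == "+":
        in_order = head.end <= tail.start
        reversed_order = tail.end <= head.start
    else:  # on the minus strand the transcript runs towards lower coordinates
        in_order = tail.end <= head.start
        reversed_order = head.end <= tail.start
    if in_order:
        return "linear"        # consecutively aligning anchors
    if reversed_order:
        return "head-to-tail"  # anchors align in reverse order
    return "other"


def passes_filters(candidate, max_distance=100_000, min_reads=2):
    """Criteria 1) to 3) plus the optional read-support criterion."""
    return (candidate.donor_signal == "GT"
            and candidate.acceptor_signal == "AG"
            and candidate.unambiguous
            and candidate.distance <= max_distance
            and candidate.reads >= min_reads)


upstream = Anchor("chr1", "+", 1000, 1020)
downstream = Anchor("chr1", "+", 5000, 5020)
linear_call = splice_type(upstream, downstream)
circular_call = splice_type(downstream, upstream)
kept = passes_filters(Candidate("GT", "AG", True, 4020, 3))
```

The remaining optional criteria (mismatches, breakpoint position and alignment score difference) depend on the output of the mapping program and are therefore omitted from this sketch.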

The circRNAs according to the present invention may be detected using different techniques. As outlined herein, the exon-exon junction in a head-to-tail arrangement is unique to the circRNAs. Hence, the detection of these is preferred. Nucleic acid detection methods are commonly known to the skilled person and include probe hybridization based methods, nucleic acid amplification based methods, and nucleic acid sequencing, or combinations thereof. Hence, in a preferred embodiment of the present invention circRNA is detected using a method selected from the group consisting of probe hybridization based methods, nucleic acid amplification based methods, and nucleic acid sequencing.

Probe hybridization based methods employ the feature of nucleic acids to specifically hybridize to a complementary strand. To this end nucleic acid probes may be employed that specifically hybridize to the exon-exon junction in a head-to-tail arrangement of the circRNA, i.e. to a sequence spanning the exon-exon junction, preferably to the region extending from 10 nt upstream to 10 nt downstream of the exon-exon junction, more preferably to the region from 20 nt upstream to 20 nt downstream of the exon-exon junction, or even a greater region spanning the exon-exon junction. The skilled person will recognize that hybridization probes specifically hybridizing to the respective sequence of the circRNA may be used, as well as hybridization probes specifically hybridizing to the reverse complement sequence thereof, e.g. in case the circRNA is previously reverse transcribed to cDNA and/or amplified.
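As an illustrative sketch only, the junction-spanning target region and its reverse complement may be derived from the two exon sequences forming the head-to-tail junction. The function names `junction_target` and `reverse_complement` and the toy sequences are hypothetical; sequences are written in the DNA alphabet, as for cDNA.

```python
def junction_target(upstream_exon, downstream_exon, flank=10):
    """Sequence spanning a head-to-tail junction, read 5' to 3'.

    upstream_exon / downstream_exon refer to the order in the genomic
    context. In the circRNA the 3' end of the downstream exon is linked
    to the 5' end of the upstream exon.
    """
    if len(upstream_exon) < flank or len(downstream_exon) < flank:
        raise ValueError("exons must be at least `flank` nucleotides long")
    return downstream_exon[-flank:] + upstream_exon[:flank]


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Toy exon sequences, chosen only to make the junction easy to read.
target = junction_target("AAAAACCCCCGGGGG", "TTTTTGGGGGCCCCC", flank=5)
probe = reverse_complement(target)
```

A probe with the sequence of `probe` would be complementary to `target`; for a 20 nt flank on either side the same call may be made with `flank=20`.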

Hybridization can also be used as a measure of homology between two nucleic acid sequences. A nucleic acid sequence hybridizing specifically to an exon-exon junction in a head-to-tail arrangement according to the present invention may be used as a hybridization probe according to standard hybridization techniques. The hybridization of the probe to DNA or RNA from a test source (e.g., the bodily fluid, like whole blood, or amplified nucleic acids from the sample of the bodily fluid) is an indication of the presence of the relevant circRNA in the test source.

Hybridization conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., 6.3.1-6.3.6, 1991. Preferably, specific hybridization refers to hybridization under stringent conditions. “Stringent conditions” are defined as equivalent to hybridization in 6× sodium chloride/sodium citrate (SSC) at 45° C., followed by a wash in 0.2×SSC, 0.1% SDS at 65° C.; or as equivalent to hybridization in commercially available hybridization buffers (e.g. ULTRAHyb, ThermoScientific) for blotting techniques and 5×SSC, 0.5% SDS (750 mM NaCl, 75 mM sodium citrate, 0.5% sodium dodecyl sulfate, pH 7.0) for array based detection methods at 65° C.

The means and methods of the present invention preferably comprise the use of nucleic acid probes. A nucleic acid probe according to the present invention is an oligonucleotide, nucleic acid or a fragment thereof, which is substantially complementary to a specific nucleic acid sequence. “Substantially complementary” refers to the ability to hybridize to the specific nucleic acid sequence under stringent conditions.

The skilled person knows means and methods to determine the levels of nucleic acids in a sample and compare them to control levels. Such methods may employ labeled nucleic acid probes according to the invention. “Labels” include fluorescent or enzymatically active labels as further defined herein below. Such methods include real-time PCR methods and microarray methods, like Affymetrix®, NanoString and the like.

The circRNAs or their level may also be determined using sequencing techniques. The skilled person is able to use sequencing techniques in connection with the present invention. Sequencing techniques include but are not limited to Maxam-Gilbert sequencing, Sanger sequencing (chain-termination method using ddNTPs), and next generation sequencing methods, like massively parallel signature sequencing (MPSS), polony sequencing, 454 pyrosequencing, Illumina (Solexa) sequencing, SOLiD sequencing, ion torrent semiconductor sequencing or single molecule, real-time technology sequencing (SMRT).

The detection/determination of the circRNAs and the respective level may also employ nucleic acid amplification methods alone or in combination with the sequencing and/or hybridization method. Nucleic acid amplification may be used to amplify the sequence of interest prior to detection. It may however also be used for quantifying a nucleic acid, e.g. by real-time PCR methods. Such methods are commonly known to the skilled person. Nucleic acid amplification methods for example include rolling circle amplification (such as in Liu, et al., “Rolling circle DNA synthesis: Small circular oligonucleotides as efficient templates for DNA polymerases,” J. Am. Chem. Soc. 118:1587-1594 (1996)), isothermal amplification (such as in Walker, et al., “Strand displacement amplification—an isothermal, in vitro DNA amplification technique,” Nucleic Acids Res. 20(7):1691-6 (1992)), and ligase chain reaction (such as in Landegren, et al., “A Ligase-Mediated Gene Detection Technique,” Science 241:1077-1080, 1988, or in Wiedmann, et al., “Ligase Chain Reaction (LCR)—Overview and Applications,” PCR Methods and Applications (Cold Spring Harbor Laboratory Press, Cold Spring Harbor Laboratory, N Y, 1994) pp. S51-S64). Nucleic acid amplification can be accomplished by any of the various nucleic acid amplification methods known in the art, including but not limited to the polymerase chain reaction (PCR), ligase chain reaction (LCR), transcription-based amplification system (TAS), nucleic acid sequence based amplification (NASBA), rolling circle amplification (RCA), transcription-mediated amplification (TMA), self-sustaining sequence replication (3SR) and Qβ amplification. It will be readily understood that the amplification of the circRNA may start with a reverse transcription of the RNA into complementary DNA (cDNA), optionally followed by amplification of the cDNA so produced.

It may be desirable to reduce or diminish non-circRNA prior to the determination of the presence or level of the circRNAs. To this end RNA degrading agents may be added to the sample and/or the isolated total nucleic acids, e.g. total RNA, thereof, wherein said RNA degrading agent does not degrade circRNAs or degrades circRNAs only at lower rates as compared to linear RNAs. One such agent is RNase R. RNase R is a 3′-5′ exoribonuclease closely related to RNase II, which has been shown to be involved in selective mRNA degradation, particularly of non-stop mRNAs in bacteria (see Cheng; Deutscher, M P et al. (2005). “An important role for RNase R in mRNA decay”. Molecular Cell 17(2):313-318; Venkataraman, K; Guja, K E; Garcia-Diaz, M; Karzai, A W (2014). “Non-stop mRNA decay: a special attribute of trans-translation mediated ribosome rescue.” Frontiers in microbiology 5:93; and Suzuki H, Zuo Y, Wang J, Zhang M Q, Malhotra A, Mayeda A; Characterization of RNase R-digested cellular RNA source that consists of lariat and circular RNAs from pre-mRNA splicing; Nucleic Acids Res. 2006 May 8; 34(8):e63). RNase R has homologues in many other organisms. When a part of another larger protein has a domain that is very similar to RNase R, this is called an RNase R domain. Hence, in a preferred embodiment the sample is treated with RNase R before determination of the circRNA to deplete linear RNA isoforms from the total RNA preparation and thereby increase detection sensitivity.

As outlined herein, the diagnostic or prognostic value of a single circRNA may not be sufficient to allow a diagnosis or prognosis with a reliable result. In such a case it may be desirable to determine the presence or level of more than one circRNA in the sample and optionally to compare them to the respective control level. The skilled person will acknowledge that these more than one circRNAs may be chosen from a predetermined panel of circRNAs. Such a panel usually includes the minimum number of circRNAs necessary to allow a reliable diagnosis or prognosis. The number of circRNAs of the panel may vary depending on the desired reliability and/or the prognostic or diagnostic value of the included circRNAs, e.g. when determined alone. Hence, the method according to the present invention in a preferred embodiment determines more than one circRNA from a panel of circRNAs, e.g. their presence or absence, or level, respectively.

The panel for obtaining the desired reliability may be chosen according to the needs. In particular the skilled person may apply statistical approaches as outlined herein in order to validate the diagnostic and/or prognostic significance of a certain panel. The inventors have herein shown for a neurodegenerative disease the development of a certain panel of circRNAs giving a reasonable degree of certainty. The skilled person may apply common statistical techniques in order to develop a panel of circRNAs. Such statistical techniques include cluster analysis (e.g. hierarchical or k-means clustering), principal component analysis or factor analysis.

In principle, the statistical methods aim at the identification of circRNAs or panels of circRNAs that exhibit differing presence and/or levels in samples of diseased and healthy/normal subjects. As outlined, the panel is preferably a panel of more than one circRNA, i.e. a plurality. In a preferred embodiment of the invention said panel comprises a plurality of circRNAs that have been identified as being present at differing levels in bodily fluid samples of patients having the disease and patients not having the disease. The panel of circRNAs has been preferably identified by principal component analysis or clustering.

The “principal component analysis” (PCA) (as also exemplified herein) regards the analysis of factors differing between diseased and healthy subjects. PCA is known to the skilled person (see Pearson K., “On lines and planes of closest fit to systems of points in space”, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2.11 559-572 (1901), and Hotelling H., “Analysis of a complex of statistical variables into principal components” Journal of educational psychology 24.6 417 (1933)). The circRNAs to be chosen for the principal component analysis may be those previously determined in samples of healthy and/or diseased subjects. Thresholds may be incorporated in order to consider a circRNA for further analysis; in a preferred embodiment only circRNAs having an expression value of at least 6.7 after variance stabilizing transformation of raw read counts in one of the samples are considered. PCA may be performed on circRNAs included in the analysis using the prcomp function of the standard package “stats” of the “R” programming language (R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0). Depending on the circRNAs chosen, the disease and other factors, the weights can vary. However, the skilled artisan will acknowledge that those circRNAs with the highest weight as regards the principal component of interest, i.e. disease/healthy state, shall be chosen in order to obtain the circRNAs with the highest predictive absolute values. PCA is used to visualize and measure the amount of variation in a data set. Mathematically, PCA is an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some projection of the data lies on the first coordinate, which is called Principal Component 1 (PC1), and so on.
The aforementioned calculated weight represents the distance of each circular RNA to this specific projection. Thus, the higher the absolute value of the weight, the more relevant the circRNA is for this projection.
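To illustrate how such weights may be obtained and ranked, the following sketch computes the weights of the first principal component by power iteration on the covariance matrix of centred data. This is a simplified stand-in and not the prcomp function mentioned above; the function names `first_component_weights` and `rank_by_weight`, the fixed start vector and the toy expression values are assumptions of the sketch.

```python
import math


def first_component_weights(samples, iterations=200):
    """Weights (loadings) of PC1; rows are samples, columns are circRNAs.

    Columns are centred but not scaled. The sign of the result is
    arbitrary, so only absolute values should be interpreted.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in samples]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Assumption: this start vector is not orthogonal to PC1.
    vec = [1.0 + j for j in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


def rank_by_weight(weights):
    """Column indices ordered by decreasing absolute weight."""
    return sorted(range(len(weights)), key=lambda j: -abs(weights[j]))


# Toy data: circRNA 0 differs strongly between the first and last two samples.
expression = [[10.0, 5.0, 3.0], [10.2, 5.1, 3.0],
              [2.0, 5.0, 3.1], [1.8, 5.1, 3.1]]
weights = first_component_weights(expression)
ranking = rank_by_weight(weights)
```

In this toy data set the first circRNA receives by far the largest absolute weight and would therefore be ranked first for inclusion in a panel.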

“Hierarchical Clustering” (also referred to herein as “clustering”) may be performed as known in the art (reviewed in Murtagh, F and Contreras, P “Methods of hierarchical clustering” arXiv preprint arXiv:1105.0121 (2011)). Samples may be clustered on log₂-transformed normalized circRNA expression profiles (log₂(n_i+1)). Hierarchical, agglomerative clustering may be performed with complete linkage and optionally by further using Spearman's rank correlation as distance metric (1−{corr [log₂(n_i+1)]}). “The goal of cluster analysis is to partition observations (here circRNA expression) into groups (“clusters”) so that the pairwise dissimilarities between those assigned to the same cluster tend to be smaller than those in different clusters” (see Friedman J, Hastie T, and Tibshirani R, “The elements of statistical learning”, Vol. 1. Springer, Berlin: Springer series in statistics (2001)). Here, the measure for dissimilarity is defined as one minus the Spearman's rank correlation. A visualization and complete description of the hierarchical clustering is provided by a dendrogram.
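A minimal sketch of such a clustering, assuming normalized counts per sample and using one minus Spearman's rank correlation as distance together with complete linkage, is given below. The function names, the sample labels and the toy counts are hypothetical, and the sketch assumes that no profile is constant.

```python
import math


def ranks(values):
    """Average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x)
                    * sum((b - mean_y) ** 2 for b in y))
    return num / den  # assumption: neither profile is constant


def spearman_distance(counts_a, counts_b):
    """1 - Spearman correlation of the log2(n + 1) transformed counts."""
    log_a = [math.log2(n + 1) for n in counts_a]
    log_b = [math.log2(n + 1) for n in counts_b]
    return 1.0 - pearson(ranks(log_a), ranks(log_b))


def complete_linkage(profiles):
    """Agglomerative clustering; returns the merges as (left, right, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        def linkage(i, j):
            return max(spearman_distance(profiles[a], profiles[b])
                       for a in clusters[i] for b in clusters[j])
        distance, i, j = min((linkage(i, j), i, j)
                             for i in range(len(clusters))
                             for j in range(i + 1, len(clusters)))
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Toy normalized counts of four circRNAs in four hypothetical samples.
profiles = {"AD1": [100, 50, 10, 0], "AD2": [90, 60, 12, 1],
            "H1": [0, 10, 50, 100], "H2": [2, 8, 60, 95]}
merges = complete_linkage(profiles)
```

The list of merges, read from first to last, contains the information that a dendrogram displays; in the toy data the two hypothetical disease samples and the two hypothetical healthy samples are joined first, and the two groups are joined last.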

The inventors have exemplified the method outlined above for a neurodegenerative disease, in particular Alzheimer's disease. A neurodegenerative disease in context with the present invention is to be understood as a disease associated with neurodegeneration. Neurodegeneration means a progressive loss of structure or function of neurons, including death of neurons. Many neurodegenerative diseases including ALS, Parkinson's, Alzheimer's, and Huntington's occur as a result of neurodegenerative processes. Nowadays, many similarities exist that relate these diseases to one another on a sub-cellular level. There are many parallels between different neurodegenerative disorders including atypical protein assemblies (protein misfolding and/or agglomeration) as well as induced cell death. Neurodegeneration can be found in many different levels of neuronal circuitry ranging from molecular to systemic. Hence, in a preferred embodiment of the present invention the disease is a neurodegenerative disease, preferably selected from the group of Alzheimer's, ALS, Parkinson's, and Huntington's.

In a particularly preferred embodiment the disease is Alzheimer's disease (AD). Alzheimer's disease has been identified as a protein misfolding disease (proteopathy), causing plaque accumulation of abnormally folded amyloid beta protein and tau protein in the brain. Plaques are made up of small peptides, 39-43 amino acids in length, called amyloid beta (Aβ). Aβ is a fragment from the larger amyloid precursor protein (APP). APP is a transmembrane protein that penetrates through the neuron's membrane. APP is critical to neuron growth, survival, and post-injury repair. In Alzheimer's disease, an unknown enzyme in a proteolytic process causes APP to be divided into smaller fragments. One of these fragments gives rise to fibrils of amyloid beta, which then form clumps that deposit outside neurons in dense formations known as senile plaques. AD is also considered a tauopathy due to abnormal aggregation of the tau protein. In AD, tau undergoes chemical changes, becoming hyperphosphorylated; it then begins to pair with other threads, creating neurofibrillary tangles and disintegrating the neuron's transport system. A patient is classified as having Alzheimer's disease according to the criteria set by the National Institute of Neurological and Communicative Disorders and Stroke (NINCDS) and the Alzheimer's Disease and Related Disorders Association (ADRDA, now known as the Alzheimer's Association), i.e. the NINCDS-ADRDA Alzheimer's Criteria for diagnosis of 1984, extensively updated in 2007 (see McKhann G, Drachman D, Folstein M, et al. Clinical Diagnosis of Alzheimer's disease: Report of the NINCDS-ADRDA Work Group under the Auspices of Department of Health and Human Services Task Force on Alzheimer's disease. Neurology. 1984; 34(7):939-44; and Dubois B, Feldman H H, Jacova C, et al. Research Criteria for the Diagnosis of Alzheimer's disease: Revising the NINCDS-ADRDA Criteria. Lancet Neurology. 2007; 6(8):734-746).
These criteria require that the presence of cognitive impairment, and a suspected dementia syndrome, be confirmed by neuropsychological testing for a clinical diagnosis of possible or probable Alzheimer's disease. A histopathologic confirmation including a microscopic examination of brain tissue is required for a definitive diagnosis. Good statistical reliability and validity have been shown between the diagnostic criteria and definitive histopathological confirmation (see Blacker D, Albert M S, Bassett S S, et al. Reliability and validity of NINCDS-ADRDA criteria for Alzheimer's disease. The National Institute of Mental Health Genetics Initiative. Archives of Neurology. 1994; 51(12):1198-204). Eight cognitive domains are most commonly impaired in AD: memory, language, perceptual skills, attention, constructive abilities, orientation, problem solving and functional abilities. These domains are equivalent to the NINCDS-ADRDA Alzheimer's Criteria as listed in the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR) published by the American Psychiatric Association.

In a particularly preferred embodiment, the present invention relates to a method for diagnosing a neurodegenerative disease in a subject, comprising the steps of:

- determining the level of one or more circRNA in a sample of a bodily fluid of said subject;
- comparing the determined level to a control level of said one or more circRNA;

wherein differing levels between the determined and the control level are indicative for the disease. The neurodegenerative disease is most preferably Alzheimer's disease.

The inventors have identified specific circRNAs that have a predictive or diagnostic value as regards the neurodegenerative disease. In particular, 910 highly expressed circRNAs have been identified that are differentially present in samples of patients with a neurodegenerative disease as compared to the healthy controls. These 910 circRNAs are particularly characterized by their exon-exon junction in a head-to-tail arrangement, as outlined herein above. The sequences encoding the 20 nucleotides upstream and 20 nucleotides downstream of said exon-exon junction in the respective circRNAs are given in SEQ ID NOs: 1 to 910. However, it may be sufficient to determine only 10 nucleotides upstream and 10 nucleotides downstream of the junction in order to detect the circRNAs specifically. Hence, in a preferred embodiment said one or more circRNA in the method for diagnosing the neurodegenerative disease comprises a sequence encoded by a sequence selected from the group consisting of nucleotides 11 to 30 of any of the sequences of SEQ ID NO:1 to SEQ ID NO:910. The circRNA may for instance be detected through determining the presence or levels of RNA comprising the respective sequences, e.g. by hybridization, sequencing and/or amplification methods as outlined herein. SEQ ID NO:1 to 1820 list the DNA sequences encoding the sequences of the exon-exon junctions or the complete sequences of the circRNAs of the present invention. “Encoded” in this regard means that the RNA encoded by the DNA sequence has the sequence of nucleotides as set out in the DNA sequence with the thymidines “T” being exchanged by uracils “U”, and the backbone being ribonucleic acid instead of deoxyribonucleic acid.
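As an illustration of the relation between a 40-nucleotide junction sequence, its nucleotides 11 to 30, and the RNA it encodes, the following sketch may be helpful. The function name and the example sequence are hypothetical; the example sequence is not one of the sequences of the sequence listing.

```python
def junction_core_rna(junction_dna, flank=10):
    """Return the RNA encoded by `flank` nt on each side of the junction.

    `junction_dna` is assumed to hold equally many nucleotides upstream and
    downstream of the exon-exon junction (e.g. 20 + 20 = 40 nucleotides).
    """
    if len(junction_dna) % 2 != 0:
        raise ValueError("expected equal upstream and downstream flanks")
    mid = len(junction_dna) // 2
    if flank > mid:
        raise ValueError("flank is longer than the available sequence")
    # For a 40-nt sequence and flank=10 this slice is nucleotides 11 to 30
    # (1-based, inclusive).
    core = junction_dna[mid - flank:mid + flank]
    # "Encoded": thymidines (T) are exchanged by uracils (U).
    return core.upper().replace("T", "U")


# Hypothetical 40-nt junction sequence, for illustration only.
example_junction = "ACGT" * 10
print(junction_core_rna(example_junction))
```

With the default `flank=10` the function returns a 20-nucleotide RNA sequence spanning the junction; with `flank=20` it returns the RNA encoded by the full 40-nucleotide sequence.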

The inventors found that the circRNAs are indicative for the presence or the risk of acquiring a neurodegenerative disease when present at increased or decreased levels. Whether the presence of the specific circRNA at decreased or increased levels is indicative for the neurodegenerative disease is given in Table 1. Hence, in a particularly preferred embodiment the presence of increased or decreased levels as defined in Table 1 under “diseased” for the circRNA comprising the respectively encoded sequence is indicative for the presence of or risk of acquiring a neurodegenerative disease, preferably for Alzheimer's disease. As outlined in the Table's legend, “+” denotes that increased levels and/or the presence of the respective circRNA are indicative for the presence or risk of acquiring Alzheimer's disease, while “−” denotes that decreased levels and/or the absence of the respective circRNA are indicative for the presence or risk of acquiring Alzheimer's disease.
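The comparison of a determined level with a control level, read together with the “+” or “−” entry of Table 1, can be sketched as follows. This is an illustration only: the pseudocount of 1, the log₂ ratio, and the threshold `min_abs_log2_ratio` are assumptions made for the example and are not prescribed by this passage, and the levels used are hypothetical.

```python
from math import log2

# Symbols used in the "diseased" column of Table 1.
INCREASED, DECREASED = "+", "−"


def is_indicative(determined_level, control_level, direction,
                  min_abs_log2_ratio=1.0):
    """Return True if the level differs from control in the given direction.

    `min_abs_log2_ratio` is a hypothetical cut-off chosen for illustration.
    """
    if direction not in (INCREASED, DECREASED):
        raise ValueError("direction must be '+' or '−'")
    # A pseudocount of 1 avoids division by zero for absent circRNAs.
    log2_ratio = log2((determined_level + 1) / (control_level + 1))
    if direction == INCREASED:
        return log2_ratio >= min_abs_log2_ratio
    return log2_ratio <= -min_abs_log2_ratio


# Hypothetical normalized levels of one circRNA marked "−" in Table 1.
print(is_indicative(determined_level=7, control_level=31, direction=DECREASED))
```

A circRNA marked “−” is thus only counted as indicative when its level is lower than the control level by at least the chosen margin, and a circRNA marked “+” only when it is higher by at least that margin.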

As mentioned, the circRNAs may be detected through the unique sequences occurring at the exon-exon junction in the head-to-tail arrangement. However, in one embodiment the circRNA may be detected through detection of a larger portion of their sequence. In one embodiment of the method for diagnosing a neurodegenerative disease, preferably Alzheimer's disease, said one or more circRNA has a sequence as encoded by a sequence selected from the group consisting of SEQ ID NO:911 to SEQ ID NO:1820. Preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for the circRNA having the respective encoded sequence is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease.

TABLE 1

Preferred circRNAs in connection with the diagnosis of a neurodegenerative disease, preferably Alzheimer's disease:

SEQ ID NO junction
SEQ ID NO full length
circID “#”
chr
start
stop
gene
gene_name
score
diseased

1
911
09611
chr17
29130918
29131126
ENSG00000176390
CRLF3
−3.022350822
−

2
912
04805
chr11
85718584
85742653
ENSG00000073921
PICALM
2.660787274
+

3
913
06983
chr14
35020919
35024118
NA
NA
−2.59734763
−

4
914
11279
chr19
37916769
37917280
ENSG00000196437
ZNF569
−2.257101936
−

5
915
09640
chr17
30315338
30315516
ENSG00000178691
SUZ12
−2.204418987
−

6
916
00725
chr1
1756835
1770677
ENSG00000078369
GNB1
−2.182283795
−

7
917
17120
chr4
54249939
54256040
ENSG00000145216
FIP1L1
−2.176395303
−

8
918
00155
chr1
114450630
114450813
ENSG00000118655
DCLRE1B
−2.150719181
−

9
919
18778
chr6
111583459
111585149
ENSG00000173214
KIAA1919
−2.150503499
−

10
920
09544
chr17
27778472
27778698
ENSG00000160551
TAOK1
−2.023829904
−

11
921
15325
chr3
169840378
169840532
ENSG00000173889
PHC3
−2.009445974
−

12
922
06048
chr12
938227
939110
ENSG00000060237
WNK1
−1.981641605
−

13
923
17070
chr4
48503639
48507632
ENSG00000075539
FRYL
−1.892671504
−

14
924
05238
chr12
121220457
121222396
ENSG00000157837
SPPL3
−1.85677938
−

15
925
17655
chr5
132227855
132228810
ENSG00000072364
AFF4
−1.850977839
−

16
926
01651
chr1
29313942
29323831
ENSG00000159023
EPB41
−1.8320179
−

17
927
11479
chr19
8520288
8528570
ENSG00000099783
HNRNPM
−1.802172301
−

18
928
17787
chr5
142434003
142437312
ENSG00000145819
ARHGAP26
−1.799997822
−

19
929
20940
chr8
101299728
101300495
ENSG00000034677
RNF19A
−1.790427008
−

20
930
10338
chr17
65916130
65919106
ENSG00000171634
BPTF
−1.790398051
−

21
931
14397
chr22
28306951
28310335
ENSG00000180957
PITPNB
1.774695607
+

22
932
09432
chr17
18208425
18210280
ENSG00000177302
TOP3A
−1.759757408
−

23
933
01741
chr1
31810021
31811895
ENSG00000121766
ZCCHC17
−1.757819625
−

24
934
15327
chr3
169840378
169843795
ENSG00000173889
PHC3
−1.733435542
−

25
935
01556
chr1
25553932
25554726
ENSG00000117614
SYF2
−1.731052005
−

26
936
18274
chr5
56542126
56545403
ENSG00000062194
GPBP1
−1.697565149
−

27
937
20658
chr7
72879768
72880731
ENSG00000009954
BAZ1B
−1.683472553
−

28
938
20552
chr7
44739739
44741214
ENSG00000105953
OGDH
−1.662553273
−

29
939
03293
chr10
70190192
70190417
ENSG00000138346
DNA2
−1.660365542
−

30
940
22764
chr9
96233422
96261168
ENSG00000048828
FAM120A
−1.651553618
−

31
941
15662
chr3
196118683
196120490
ENSG00000163960
UBXN7
−1.647242297
−

32
942
04222
chr11
3752620
3752808
ENSG00000110713
NUP98
−1.638985251
−

33
943
08121
chr15
59204761
59205895
ENSG00000137776
SLTM
−1.610197017
−

34
944
13514
chr2
99786012
99787892
ENSG00000158411
MITD1
−1.603567451
−

35
945
22065
chr9
126519981
126531842
ENSG00000119522
DENND1A
−1.603263457
−

36
946
21136
chr8
131302246
131370389
ENSG00000153317
ASAP1
−1.594570417
−

37
947
07290
chr14
56114742
56115588
ENSG00000126777
KTN1
−1.593658506
−

38
948
21493
chr8
42914234
42919358
ENSG00000168522
FNTA
−1.591101491
−

39
949
02565
chr10
105767934
105778666
ENSG00000065613
SLK
−1.571637751
−

40
950
13738
chr20
35672488
35672653
ENSG00000080839
RBL1
−1.533251478
−

41
951
09181
chr16
69729038
69729282
ENSG00000102908
NFAT5
−1.510313578
−

42
952
10581
chr18
20516716
20529676
ENSG00000101773
RBBP8
−1.491646626
−

43
953
16418
chr4
110412482
110416012
ENSG00000138802
SEC24B
−1.48101895
−

44
954
00480
chr1
15964801
15970145
ENSG00000197312
DDI2
−1.471661901
−

45
955
07128
chr14
50136241
50141145
ENSG00000100479
POLE2
−1.461703503
−

46
956
09687
chr17
34937773
34937968
ENSG00000005955
GGNBP2
−1.455641381
−

47
957
04374
chr11
5247878
5248265
ENSG00000244734
HBB
1.44536348
+

48
958
11059
chr19
10286215
10288043
ENSG00000130816
DNMT1
−1.420159643
−

49
959
21189
chr8
141595217
141616013
ENSG00000123908
AGO2
−1.420022161
−

50
960
20250
chr7
151946960
151948051
ENSG00000055609
KMT2C
−1.415607268
−

51
961
20208
chr7
148516069
148516779
ENSG00000106462
EZH2
−1.408045344
−

52
962
02272
chr1
78200031
78201843
ENSG00000077254
USP33
−1.398984454
−

53
963
00711
chr1
1747194
1756938
ENSG00000078369
GNB1
−1.381859545
−

54
964
19568
chr6
56989531
56999668
ENSG00000112200
ZNF451
−1.374430798
−

55
965
01141
chr1
212218001
212220759
ENSG00000143476
DTL
−1.37418011
−

56
966
02693
chr10
120797749
120797951
ENSG00000107581
EIF3A
−1.371606407
−

57
967
14089
chr21
17135209
17138460
ENSG00000155313
USP25
−1.370800154
−

58
968
08653
chr15
93540186
93541851
ENSG00000173575
CHD2
−1.363982557
−

59
969
04712
chr11
77402203
77404656
ENSG00000048649
RSF1
−1.347797823
−

60
970
06433
chr13
41517087
41518061
ENSG00000120690
ELF1
−1.329949423
−

61
971
04231
chr11
3789810
3789974
ENSG00000110713
NUP98
−1.328958073
−

62
972
15257
chr3
160131260
160132305
ENSG00000113810
SMC4
−1.327632403
−

63
973
08264
chr15
64791491
64792365
ENSG00000180357
ZNF609
−1.323996695
−

64
974
19854
chr7
101870646
101870949
ENSG00000257923
CUX1
−1.323311249
−

65
975
17166
chr4
56877577
56878151
ENSG00000174799
CEP135
−1.312060145
−

66
976
04708
chr11
77394754
77396204
ENSG00000048649
RSF1
−1.308996537
−

67
977
06000
chr12
89853414
89866052
ENSG00000139323
POC1B
−1.305211994
−

68
978
19065
chr6
150092297
150094305
ENSG00000120265
PCMT1
−1.285715752
−

69
979
01612
chr1
28362054
28384605
ENSG00000158161
EYA3
−1.283006706
−

70
980
07213
chr14
50952295
50952912
ENSG00000012983
MAP4K5
−1.277430489
−

71
981
02010
chr1
52293467
52299842
ENSG00000078618
NRD1
−1.265951283
−

72
982
10749
chr18
39607406
39629569
ENSG00000078142
PIK3C3
−1.244972909
−

73
983
12347
chr2
203921149
203922176
ENSG00000144426
NBEAL1
−1.238573412
−

74
984
10987
chr18
9195548
9221997
ENSG00000101745
ANKRD12
1.23352635
+

75
985
01864
chr1
41536266
41541123
ENSG00000010803
SCMH1
−1.230203935
−

76
986
20724
chr7
7826418
7841374
ENSG00000219545
RPA3-AS1
−1.228982871
−

77
987
22400
chr9
33960823
33963789
ENSG00000137073
UBAP2
−1.228711515
−

78
988
02152
chr1
63944434
63955889
ENSG00000142856
ITGB3BP
−1.22179497
−

79
989
16992
chr4
39839475
39843676
ENSG00000121892
PDS5A
−1.219666561
−

80
990
04329
chr11
47774467
47776216
ENSG00000109920
FNBP4
−1.21886247
−

81
991
10286
chr17
61841375
61842207
ENSG00000108588
CCDC47
−1.215198124
−

82
992
13162
chr2
61340904
61345251
ENSG00000162929
KIAA1841
−1.213144359
−

83
993
01070
chr1
205585605
205593019
ENSG00000158711
ELK4
−1.208469007
−

84
994
11293
chr19
39943995
39944161
ENSG00000196235
SUPT5H
−1.201547806
−

85
995
02313
chr1
87181406
87185318
ENSG00000097033
SH3GLB1
−1.194484292
−

86
996
11596
chr2
113057425
113057606
ENSG00000188177
ZC3H6
−1.192203498
−

87
997
15955
chr3
44835708
44835918
ENSG00000163808
KIF15
−1.18998281
−

88
998
07534
chr14
81297486
81307112
ENSG00000100629
CEP128
−1.187560558
−

89
999
00616
chr1
171537385
171544267
ENSG00000117523
PRRC2C
−1.187049294
−

90
1000
01909
chr1
46105881
46108171
ENSG00000159592
GPBP1L1
−1.185137589
−

91
1001
04787
chr11
85707868
85742653
ENSG00000073921
PICALM
1.180129801
+

92
1002
11447
chr19
57967020
57967550
ENSG00000268163
AC004076.9
−1.173577839
−

93
1003
16169
chr3
56661064
56662642
ENSG00000163946
FAM208A
−1.172753372
−

94
1004
09079
chr16
58594115
58594266
ENSG00000125107
CNOT1
−1.17086243
−

95
1005
00804
chr1
180366650
180382606
ENSG00000135847
ACBD6
−1.16836055
−

96
1006
13521
chr2
99985854
99988193
ENSG00000158417
EIF5B
−1.167808075
−

97
1007
02559
chr10
105197771
105198565
ENSG00000148843
PDCD11
−1.166398697
−

98
1008
06318
chr13
28155433
28155940
ENSG00000139517
LNX2
−1.162993126
−

99
1009
11563
chr2
109067483
109068922
ENSG00000135968
GCC2
−1.162334985
−

100
1010
02763
chr10
126631025
126631876
ENSG00000249456
RP11-298J20.4
1.159634216
+

101
1011
10253
chr17
600657
602620
ENSG00000141252
VPS53
−1.155561006
−

102
1012
14888
chr3
129599151
129599402
ENSG00000172765
TMCC1
−1.151076826
−

103
1013
00354
chr1
154207066
154207767
ENSG00000143569
UBAP2L
−1.150469707
−

104
1014
22404
chr9
33971648
33973235
ENSG00000137073
UBAP2
−1.146045034
−

105
1015
05176
chr12
116668337
116669961
ENSG00000123066
MED13L
−1.143723652
−

106
1016
01113
chr1
21083658
21100103
ENSG00000127483
HP1BP3
−1.142758789
−

107
1017
13008
chr2
44729827
44732869
ENSG00000143919
CAMKMT
−1.14151883
−

108
1018
16696
chr4
153874650
153875471
ENSG00000137460
FHDC1
−1.13092216
−

109
1019
17750
chr5
138699447
138700432
ENSG00000120727
PAIP2
−1.125871509
−

110
1020
19832
chr6
99912479
99916494
ENSG00000123552
USP45
1.125162796
+

111
1021
22387
chr9
33948371
33948585
ENSG00000137073
UBAP2
−1.118516346
−

112
1022
01075
chr1
205696827
205698749
ENSG00000069275
NUCKS1
−1.112909385
−

113
1023
22554
chr9
5944870
5954095
ENSG00000183354
KIAA2026
−1.112044692
−

114
1024
05081
chr12
112116954
112121111
ENSG00000089234
BRAP
−1.108105518
−

115
1025
07080
chr14
45587230
45599993
ENSG00000100442
FKBP3
−1.107688517
−

116
1026
06866
chr14
23378691
23380612
ENSG00000100461
RBM23
−1.10744325
−

117
1027
12525
chr2
227729319
227732034
ENSG00000144468
RHBDD1
−1.099283508
−

118
1028
12560
chr2
231307651
231314970
ENSG00000067066
SP100
−1.092112695
−

119
1029
15719
chr3
197592293
197602646
ENSG00000186001
LRCH3
−1.089664619
−

120
1030
00593
chr1
169767997
169770112
ENSG00000000460
C1orf112
−1.08419511
−

121
1031
07174
chr14
50616725
50616948
ENSG00000100485
SOS2
−1.081425815
−

122
1032
18626
chr5
93964515
93966448
ENSG00000133302
ANKRD32
1.073184987
+

123
1033
19248
chr6
20781375
20846409
ENSG00000145996
CDKAL1
−1.072369586
−

124
1034
04914
chr12
102107867
102110590
ENSG00000111666
CHPT1
−1.062669426
−

125
1035
04376
chr11
5247941
5248230
ENSG00000244734
HBB
1.059317293
+

126
1036
15734
chr3
20113075
20113951
ENSG00000114166
KAT2B
−1.059120544
−

127
1037
16067
chr3
49372452
49373029
ENSG00000114316
USP4
−1.057601335
−

128
1038
17988
chr5
176402396
176409624
ENSG00000087206
UIMC1
−1.057060246
−

129
1039
00386
chr1
155408117
155429689
ENSG00000116539
ASH1L
−1.056345461
−

130
1040
20264
chr7
155457868
155473602
ENSG00000184863
RBM33
−1.052207723
−

131
1041
20916
chr8
100515063
100523740
ENSG00000132549
VPS13B
−1.050303405
−

132
1042
02796
chr10
13214375
13214765
ENSG00000065328
MCM10
−1.042082127
−

133
1043
20002
chr7
117825700
117828459
ENSG00000128534
NAA38
−1.041810719
−

134
1044
12270
chr2
202010100
202014558
ENSG00000003402
CFLAR
−1.036011434
−

135
1045
23007
chrX
149983334
149984551
ENSG00000102181
CD99L2
−1.033978093
−

136
1046
09230
chr16
71712657
71715808
ENSG00000040199
PHLPP2
−1.031262858
−

137
1047
22401
chr9
33960823
33973235
ENSG00000137073
UBAP2
−1.027473832
−

138
1048
07006
chr14
35331249
35331528
ENSG00000198604
BAZ1A
−1.022185742
−

139
1049
14241
chr21
38792600
38794168
ENSG00000157540
DYRK1A
−1.021673056
−

140
1050
16836
chr4
178274461
178274882
ENSG00000109674
NEIL3
−1.016081081
−

141
1051
10887
chr18
60206913
60217693
ENSG00000141664
ZCCHC2
−1.009357969
−

142
1052
22986
chrX
13684435
13698717
ENSG00000176896
TCEANC
1.007062962
+

143
1053
06929
chr14
31404368
31425448
ENSG00000196792
STRN3
−1.003076508
−

144
1054
07056
chr14
39627488
39628754
ENSG00000182400
TRAPPC6B
−1.001831479
−

145
1055
19098
chr6
155095122
155099179
ENSG00000213079
SCAF8
−1.000738262
−

146
1056
03734
chr11
108098321
108100050
ENSG00000149311
ATM
−0.995255586
−

147
1057
02126
chr1
62907158
62907970
ENSG00000162607
USP1
−0.994146093
−

148
1058
21287
chr8
21835280
21837714
ENSG00000130227
XPO7
−0.993409908
−

149
1059
15062
chr3
142116170
142123918
ENSG00000114127
XRN1
−0.991091493
−

150
1060
20509
chr7
36450122
36450775
ENSG00000011426
ANLN
−0.985382926
−

151
1061
10756
chr18
39623696
39629569
ENSG00000078142
PIK3C3
−0.980136493
−

152
1062
20561
chr7
48541721
48542148
ENSG00000179869
ABCA13
0.978360837
+

153
1063
08654
chr15
93540186
93545547
ENSG00000173575
CHD2
0.978292944
+

154
1064
08778
chr16
148142
150507
ENSG00000103148
NPRL3
0.975426406
+

155
1065
02929
chr10
22896855
22898646
ENSG00000150867
PIP4K2A
−0.9722031
−

156
1066
17110
chr4
52729602
52758017
ENSG00000109184
DCUN1D4
−0.966161676
−

157
1067
09420
chr17
1746096
1747980
ENSG00000132383
RPA1
−0.962147895
−

158
1068
04015
chr11
16205431
16256217
ENSG00000110693
SOX6
−0.961209315
−

159
1069
21439
chr8
41518947
41519459
ENSG00000029534
ANK1
−0.957238337
−

160
1070
19011
chr6
146209155
146216113
ENSG00000146414
SHPRH
0.956350194
+

161
1071
08356
chr15
66044716
66053776
ENSG00000174485
DENND4A
−0.953555637
−

162
1072
09604
chr17
29112936
29113049
ENSG00000176390
CRLF3
−0.952318815
−

163
1073
13634
chr20
32619327
32621124
ENSG00000125970
RALY
−0.952059983
−

164
1074
20447
chr7
27824781
27825108
ENSG00000106052
TAX1BP1
−0.951551903
−

165
1075
09322
chr16
85667519
85667738
ENSG00000131149
GSE1
−0.950544843
−

166
1076
20382
chr7
23224688
23224917
ENSG00000136243
NUPL2
−0.948456687
−

167
1077
01675
chr1
29319841
29320054
ENSG00000159023
EPB41
−0.947371967
−

168
1078
01264
chr1
222897433
222898897
ENSG00000162819
BROX
−0.944435092
−

169
1079
09473
chr17
20107645
20109225
ENSG00000128487
SPECC1
0.943075221
+

170
1080
04559
chr11
68318588
68331900
ENSG00000110075
PPP6R3
−0.940969479
−

171
1081
19673
chr6
76357446
76369123
ENSG00000112701
SENP6
−0.93979093
−

172
1082
15335
chr3
169854206
169867032
ENSG00000173889
PHC3
−0.93370794
−

173
1083
14366
chr22
22160138
22162135
ENSG00000100030
MAPK1
−0.932412536
−

174
1084
01252
chr1
22047528
22048257
ENSG00000090686
USP48
−0.92708516
−

175
1085
22466
chr9
36375930
36376124
ENSG00000137075
RNF38
−0.912116226
−

176
1086
01676
chr1
29319841
29323831
ENSG00000159023
EPB41
−0.907484412
−

177
1087
05433
chr12
1863423
1863680
ENSG00000006831
ADIPOR2
−0.906782889
−

178
1088
16984
chr4
39739039
39757359
ENSG00000078140
UBE2K
−0.906655604
−

179
1089
18946
chr6
13632601
13644961
ENSG00000010017
RANBP9
−0.905997471
−

180
1090
22377
chr9
33932559
33933626
ENSG00000137073
UBAP2
−0.904013554
−

181
1091
15300
chr3
169693395
169706147
ENSG00000008952
SEC62
−0.890692839
−

182
1092
08660
chr15
93543741
93558139
ENSG00000173575
CHD2
−0.889550329
−

183
1093
01562
chr1
25666964
25669564
ENSG00000183726
TMEM50A
−0.887890585
−

184
1094
09996
chr17
48828107
48828723
ENSG00000108848
LUC7L3
−0.887692142
−

185
1095
10763
chr18
43319127
43319627
ENSG00000141469
SLC14A1
−0.887285198
−

186
1096
05073
chr12
112096539
112097149
ENSG00000089234
BRAP
−0.886768771
−

187
1097
20330
chr7
16298014
16317851
ENSG00000214960
ISPD
−0.884134047
−

188
1098
01568
chr1
25678116
25679465
ENSG00000183726
TMEM50A
−0.882062038
−

189
1099
04367
chr11
5246893
5247941
ENSG00000244734
HBB
−0.881298453
−

190
1100
21442
chr8
41519318
41521260
ENSG00000029534
ANK1
−0.878728778
−

191
1101
15809
chr3
3178943
3186394
ENSG00000072756
TRNT1
−0.873695283
−

192
1102
08775
chr16
14720962
14721193
ENSG00000140694
PARN
−0.873513356
−

193
1103
01596
chr1
28116072
28120143
ENSG00000117758
STX12
−0.873353301
−

194
1104
15618
chr3
195791179
195796439
ENSG00000072274
TFRC
0.865977004
+

195
1105
04371
chr11
5247806
5254322
ENSG00000244734
HBB
−0.861493375
−

196
1106
17540
chr5
112321531
112339774
ENSG00000172795
DCP2
−0.859873285
−

197
1107
02864
chr10
17645558
17646046
ENSG00000165996
PTPLA
−0.856912854
−

198
1108
11369
chr19
48660270
48660397
ENSG00000105486
LIG1
−0.855477996
−

199
1109
19205
chr6
17646297
17649531
ENSG00000124789
NUP153
−0.851513363
−

200
1110
17131
chr4
54280781
54294350
ENSG00000145216
FIP1L1
−0.850581558
−

201
1111
14060
chr20
6011930
6012726
ENSG00000088766
CRLS1
−0.849745579
−

202
1112
04999
chr12
109046047
109048186
ENSG00000110880
CORO1C
−0.845189707
−

203
1113
09815
chr17
40652724
40653322
ENSG00000033627
ATP6V0A1
−0.843836528
−

204
1114
07981
chr15
50592985
50593565
ENSG00000104064
GABPB1
−0.836914748
−

205
1115
17411
chr4
89859238
89870589
ENSG00000138640
FAM13A
−0.834816404
−

206
1116
01550
chr1
24840803
24841057
ENSG00000117602
RCAN3
−0.833705319
−

207
1117
04237
chr11
3794861
3797251
ENSG00000110713
NUP98
−0.833473306
−

208
1118
06459
chr13
42040958
42042974
ENSG00000102760
RGCC
−0.833444657
−

209
1119
22388
chr9
33948371
33953472
ENSG00000137073
UBAP2
−0.832464951
−

210
1120
19582
chr6
57017018
57025950
ENSG00000112200
ZNF451
0.832178254
+

211
1121
02407
chr1
93683294
93692006
ENSG00000122483
CCDC18
−0.830332399
−

212
1122
23137
chrX
53641494
53642796
ENSG00000086758
HUWE1
−0.828040183
−

213
1123
13589
chr20
2944917
2945848
ENSG00000132670
PTPRA
−0.825864753
−

214
1124
14585
chr22
41531816
41536261
ENSG00000100393
EP300
−0.82395501
−

215
1125
02974
chr10
27453992
27454468
ENSG00000120539
MASTL
−0.822978236
−

216
1126
13436
chr2
8910799
8917022
ENSG00000134313
KIDINS220
0.822565477
+

217
1127
12752
chr2
26321530
26332775
ENSG00000084733
RAB10
−0.821885553
−

218
1128
21087
chr8
131092147
131104389
ENSG00000153317
ASAP1
−0.82162602
−

219
1129
02599
chr10
1125950
1126416
ENSG00000047056
WDR37
−0.821547428
−

220
1130
02264
chr1
78177431
78178966
ENSG00000077254
USP33
−0.821465245
−

221
1131
06099
chr12
96692646
96694138
ENSG00000059758
CDK17
−0.820285575
−

222
1132
03159
chr10
49609654
49618211
ENSG00000107643
MAPK8
0.818623607
+

223
1133
06939
chr14
31420068
31425448
ENSG00000196792
STRN3
−0.818620248
−

224
1134
17748
chr5
138614015
138614818
ENSG00000015479
MATR3
−0.81818695
−

225
1135
06781
chr14
103871412
103871604
ENSG00000075413
MARK3
−0.817182929
−

226
1136
09698
chr17
35800605
35800763
ENSG00000108264
TADA2A
−0.81705738
−

227
1137
12901
chr2
37426846
37428869
ENSG00000218739
CEBPZ-AS1
−0.816983939
−

228
1138
11110
chr19
13039155
13039661
ENSG00000179115
FARSA
−0.815139146
−

229
1139
15689
chr3
196876613
196888609
ENSG00000075711
DLG1
−0.812904778
−

230
1140
00858
chr1
185143503
185144245
ENSG00000116668
SWT1
−0.812024383
−

231
1141
10248
chr17
60061531
60062451
ENSG00000108510
MED13
−0.811297795
−

232
1142
16838
chr4
178274461
178281831
ENSG00000109674
NEIL3
−0.810885238
−

233
1143
06571
chr13
50642232
50649789
ENSG00000231607
DLEU2
−0.810759762
−

234
1144
01959
chr1
50956259
51001129
ENSG00000185104
FAF1
−0.806879972
−

235
1145
05880
chr12
69107644
69108533
ENSG00000111581
NUP107
−0.803562204
−

236
1146
13163
chr2
61343113
61345251
ENSG00000162929
KIAA1841
−0.802619965
−

237
1147
07836
chr15
41961025
41962156
ENSG00000174197
MGA
−0.801902039
−

238
1148
19213
chr6
17665469
17669259
ENSG00000124789
NUP153
−0.800253252
−

239
1149
04007
chr11
16133348
16208501
ENSG00000110693
SOX6
−0.796576061
−

240
1150
06028
chr12
93192667
93192862
ENSG00000102189
EEA1
−0.795690495
−

241
1151
20626
chr7
65592690
65599361
ENSG00000241258
CRCP
−0.795534527
−

242
1152
10962
chr18
76953182
76974038
ENSG00000166377
ATP9B
−0.792877388
−

243
1153
13660
chr20
33065576
33067594
ENSG00000078747
ITCH
−0.790880218
−

244
1154
15338
chr3
169863210
169867032
ENSG00000173889
PHC3
−0.789693985
−

245
1155
07828
chr15
41667909
41669502
ENSG00000137804
NUSAP1
−0.788962664
−

246
1156
17980
chr5
176370335
176385155
ENSG00000087206
UIMC1
−0.788756125
−

247
1157
14085
chr21
16386664
16415895
ENSG00000180530
NRIP1
−0.78857849
−

248
1158
21017
chr8
124089350
124117704
ENSG00000156787
TBC1D31
−0.787817422
−

249
1159
15597
chr3
195785154
195785503
ENSG00000072274
TFRC
−0.787214727
−

250
1160
17066
chr4
48371865
48385801
ENSG00000109171
SLAIN2
−0.785855689
−

251
1161
10120
chr17
57274904
57275150
ENSG00000068489
PRR11
−0.78426978
−

252
1162
13524
chr20
10536878
10541468
ENSG00000149346
SLX4IP
−0.782760423
−

253
1163
21766
chr8
95549330
95550574
ENSG00000164944
KIAA1429
−0.781355288
−

254
1164
13231
chr2
61749745
61753656
ENSG00000082898
XPO1
−0.781098538
−

255
1165
14042
chr20
57014000
57016139
ENSG00000124164
VAPB
−0.780718257
−

256
1166
16116
chr3
52446826
52448603
ENSG00000010318
PHF7
−0.780597733
−

257
1167
21279
chr8
21832180
21835354
ENSG00000130227
XPO7
−0.778597658
−

258
1168
03652
chr10
98618048
98667504
ENSG00000196233
LCOR
−0.775538046
−

259
1169
11792
chr2
144966169
144969146
ENSG00000121964
GTDC1
−0.771184033
−

260
1170
03028
chr10
31749965
31750166
ENSG00000148516
ZEB1
−0.771152714
−

261
1171
11627
chr2
11426664
11427862
ENSG00000134318
ROCK2
−0.768728393
−

262
1172
09942
chr17
45479497
45492285
ENSG00000178852
EFCAB13
0.768679343
+

263
1173
10858
chr18
55278868
55283207
ENSG00000134440
NARS
−0.767365826
−

264
1174
07253
chr14
55423751
55424353
ENSG00000198554
WDHD1
−0.766525169
−

265
1175
21408
chr8
37623043
37623873
ENSG00000147471
PROSC
−0.765299565
−

266
1176
01316
chr1
224599128
224601037
ENSG00000162923
WDR26
−0.765061353
−

267
1177
06007
chr12
89860546
89866052
ENSG00000139323
POC1B
−0.759596201
−

268
1178
07054
chr14
39623414
39627606
ENSG00000182400
TRAPPC6B
−0.757655748
−

269
1179
18441
chr5
72311452
72333042
ENSG00000157107
FCHO2
0.755970771
+

270
1180
00400
chr1
155695172
155695810
ENSG00000132676
DAP3
−0.755813418
−

271
1181
02388
chr1
93648916
93659301
ENSG00000122483
CCDC18
−0.751924076
−

272
1182
00200
chr1
117944807
117984947
ENSG00000198162
MAN1A2
−0.74921634
−

273
1183
05156
chr12
112798183
112798315
ENSG00000173064
HECTD4
0.748188995
+

274
1184
02650
chr10
11639629
11643979
ENSG00000148429
USP6NL
−0.748046374
−

275
1185
05799
chr12
62743001
62749256
ENSG00000135655
USP15
−0.748037916
−

276
1186
09092
chr16
66764014
66766408
ENSG00000135720
DYNC1LI2
−0.745511259
−

277
1187
10929
chr18
74561481
74563895
ENSG00000130856
ZNF236
−0.744223622
−

278
1188
21056
chr8
124392768
124392917
ENSG00000156802
ATAD2
−0.73775191
−

279
1189
01937
chr1
47745912
47748131
ENSG00000123473
STIL
−0.737462925
−

280
1190
08702
chr16
11114049
11154879
ENSG00000038532
CLEC16A
−0.733977011
−

281
1191
00443
chr1
158624600
158624741
ENSG00000163554
SPTA1
−0.733664113
−

282
1192
19873
chr7
102962378
102963241
ENSG00000105821
DNAJC2
−0.733541673
−

283
1193
02286
chr1
7837219
7838229
ENSG00000049245
VAMP3
−0.731067513
−

284
1194
07308
chr14
58690339
58690574
ENSG00000131966
ACTR10
−0.72935942
−

285
1195
20485
chr7
32672154
32678977
ENSG00000229358
DPY19L1P1
0.72562903
+

286
1196
00417
chr1
156303337
156304709
ENSG00000163468
CCT3
−0.724202091
−

287
1197
22727
chr9
88920106
88924932
ENSG00000083223
ZCCHC6
−0.722435269
−

288
1198
07807
chr15
41361767
41362745
ENSG00000128908
INO80
−0.722342802
−

289
1199
15663
chr3
196118683
196129890
ENSG00000163960
UBXN7
−0.718770738
−

290
1200
11682
chr2
122514815
122516382
ENSG00000211460
TSN
−0.717149492
−

291
1201
18087
chr5
32135677
32143986
ENSG00000113384
GOLPH3
−0.71686075
−

292
1202
15886
chr3
37170553
37190529
ENSG00000093167
LRRFIP2
−0.7148495
−

293
1203
15146
chr3
149563797
149639014
ENSG00000082996
RNF13
−0.712343388
−

294
1204
23244
chrX
77270158
77275895
ENSG00000165240
ATP7A
0.712089225
+

295
1205
07646
chr14
96986391
96987409
ENSG00000090060
PAPOLA
−0.711582141
−

296
1206
09875
chr17
4186092
4200109
ENSG00000132388
UBE2G1
−0.71126388
−

297
1207
01638
chr1
28907071
28907741
ENSG00000197989
SNHG12
−0.710805666
−

298
1208
20189
chr7
141752583
141778270
ENSG00000257335
MGAM
0.710063558
+

299
1209
00276
chr1
150198939
150201570
ENSG00000143401
ANP32E
−0.710010473
−

300
1210
08220
chr15
63988322
64008672
ENSG00000103657
HERC1
−0.709650893
−

301
1211
10863
chr18
55833019
55919286
ENSG00000049759
NEDD4L
−0.709240345
−

302
1212
13077
chr2
54278094
54284497
ENSG00000170634
ACYP2
−0.708784218
−

303
1213
04820
chr11
85733409
85742653
ENSG00000073921
PICALM
0.707847663
+

304
1214
17594
chr5
122881110
122893258
ENSG00000151292
CSNK1G3
0.70648759
+

305
1215
07062
chr14
39746137
39748741
ENSG00000258941
RP11-407N17.3
−0.705740708
−

306
1216
16701
chr4
154315413
154318485
ENSG00000121211
MND1
−0.705435194
−

307
1217
08004
chr15
50875285
50878685
ENSG00000092439
TRPM7
−0.704828246
−

308
1218
16681
chr4
153332454
153333681
ENSG00000109670
FBXW7
0.704685694
+

309
1219
17907
chr5
162909647
162911251
ENSG00000072571
HMMR
−0.702946274
−

310
1220
05911
chr12
69983264
69985939
ENSG00000166226
CCT2
−0.702361818
−

311
1221
00169
chr1
115005725
115007010
ENSG00000197323
TRIM33
−0.700608549
−

312
1222
14223
chr21
37734480
37736557
ENSG00000159256
MORC3
−0.697854368
−

313
1223
22527
chr9
4860124
4860901
ENSG00000120158
RCL1
−0.695160044
−

314
1224
09138
chr16
68300495
68300624
ENSG00000103064
SLC7A6
0.694891988
+

315
1225
18377
chr5
68470703
68471364
ENSG00000134057
CCNB1
−0.694429979
−

316
1226
18506
chr5
75993811
75997038
ENSG00000145703
IQGAP2
−0.693808924
−

317
1227
10787
chr18
45391429
45423180
ENSG00000175387
SMAD2
0.691627662
+

318
1228
04075
chr11
17167214
17167489
ENSG00000011405
PIK3C2A
−0.690859029
−

319
1229
19214
chr6
17665469
17669777
ENSG00000124789
NUP153
−0.690761984
−

320
1230
19533
chr6
52935854
52941341
ENSG00000112146
FBXO9
−0.688778917
−

321
1231
10179
chr17
58346810
58348842
ENSG00000170832
USP32
−0.688400663
−

322
1232
09309
chr16
81058319
81060243
ENSG00000166451
CENPN
−0.683869042
−

323
1233
16985
chr4
39739039
39776553
ENSG00000078140
UBE2K
0.683584317
+

324
1234
06320
chr13
28748408
28752072
ENSG00000152520
PAN3
−0.683323431
−

325
1235
15439
chr3
180651121
180653019
ENSG00000114416
FXR1
−0.677850731
−

326
1236
18715
chr6
10703637
10705077
ENSG00000111845
PAK1IP1
−0.677107932
−

327
1237
20193
chr7
141760110
141786128
ENSG00000257335
MGAM
−0.674929803
−

328
1238
02184
chr1
67356836
67371058
ENSG00000152763
WDR78
−0.674140838
−

329
1239
00394
chr1
155646338
155649303
ENSG00000163374
YY1AP1
−0.674066047
−

330
1240
21181
chr8
141582910
141595410
ENSG00000123908
AGO2
−0.6713091
−

331
1241
13696
chr20
34317233
34320057
ENSG00000131051
RBM39
−0.662798279
−

332
1242
17137
chr4
54292038
54310270
ENSG00000145216
FIP1L1
−0.662175886
−

333
1243
02329
chr1
89206670
89237562
ENSG00000065243
PKN2
−0.660194584
−

334
1244
08087
chr15
56686362
56687032
ENSG00000151575
TEX9
−0.657395416
−

335
1245
22808
chrM
678
946
NA
NA
0.656447182
+

336
1246
06741
chr13
99890680
99896878
ENSG00000134882
UBAC2
−0.655751379
−

337
1247
15306
chr3
169694733
169706147
ENSG00000008952
SEC62
−0.65538327
−

338
1248
21571
chr8
61137095
61139494
ENSG00000178538
CA8
−0.650767459
−

339
1249
13002
chr2
44717924
44719593
ENSG00000143919
CAMKMT
−0.649894452
−

340
1250
02797
chr10
13233298
13234568
ENSG00000065328
MCM10
−0.649679006
−

341
1251
01565
chr1
25666964
25683344
ENSG00000183726
TMEM50A
−0.649491962
−

342
1252
02928
chr10
22880557
22898646
ENSG00000150867
PIP4K2A
−0.649420792
−

343
1253
03884
chr11
120345268
120348235
ENSG00000196914
ARHGEF12
−0.649153985
−

344
1254
04534
chr11
66372959
66373063
ENSG00000173992
CCS
0.647602152
+

345
1255
13232
chr2
61749745
61761038
ENSG00000082898
XPO1
−0.646529922
−

346
1256
05898
chr12
69644908
69656342
ENSG00000111605
CPSF6
0.644341581
+

347
1257
14988
chr3
138289159
138291774
ENSG00000114107
CEP70
−0.643339219
−

348
1258
22417
chr9
33996220
33998862
ENSG00000137073
UBAP2
−0.642592519
−

349
1259
06760
chr14
102659799
102661457
ENSG00000140153
WDR20
−0.642496523
−

350
1260
02652
chr10
11643343
11643979
ENSG00000148429
USP6NL
−0.642008103
−

351
1261
13125
chr2
58449076
58459247
ENSG00000115392
FANCL
−0.641155337
−

352
1262
07060
chr14
39648294
39648666
ENSG00000100941
PNN
−0.640545333
−

353
1263
18429
chr5
72285253
72286691
ENSG00000157107
FCHO2
−0.640018867
−

354
1264
07683
chr14
99924615
99932150
ENSG00000183576
SETD3
−0.639034091
−

355
1265
20868
chr7
99621041
99621930
ENSG00000106261
ZKSCAN1
−0.637712203
−

356
1266
02331
chr1
89206670
89251896
ENSG00000065243
PKN2
−0.635932753
−

357
1267
04405
chr11
61133516
61135470
ENSG00000149483
TMEM138
−0.634569229
−

358
1268
16983
chr4
39739039
39747430
ENSG00000078140
UBE2K
−0.63344921
−

359
1269
16811
chr4
170523158
170523829
ENSG00000137601
NEK1
−0.632606144
−

360
1270
08802
chr16
15973660
15978062
ENSG00000133393
FOPNL
−0.631706784
−

361
1271
05515
chr12
28378727
28412375
ENSG00000123106
CCDC91
−0.63053228
−

362
1272
00483
chr1
15964801
15978390
ENSG00000197312
DDI2
−0.630382539
−

363
1273
03851
chr11
120276826
120278532
ENSG00000196914
ARHGEF12
−0.62989944
−

364
1274
19220
chr6
17669523
17675264
ENSG00000124789
NUP153
−0.629886281
−

365
1275
16618
chr4
144464661
144465125
ENSG00000153147
SMARCA5
−0.628964515
−

366
1276
06199
chr13
114265310
114277601
ENSG00000198176
TFDP1
−0.626535807
−

367
1277
13881
chr20
45874751
45875261
ENSG00000101040
ZMYND8
−0.625387432
−

368
1278
15830
chr3
32757716
32758729
ENSG00000182973
CNOT10
−0.620191841
−

369
1279
01339
chr1
226453233
226454033
ENSG00000183814
LIN9
−0.619533975
−

370
1280
21917
chr9
114840817
114842445
ENSG00000106868
SUSD1
−0.618533169
−

371
1281
17404
chr4
89827529
89859392
ENSG00000138640
FAM13A
−0.613111444
−

372
1282
10799
chr18
47017995
47018203
ENSG00000265681
RPL17
−0.61125224
−

373
1283
08584
chr15
90431752
90432372
ENSG00000157823
AP3S2
−0.608684683
−

374
1284
09579
chr17
28003837
28004759
ENSG00000141298
SSH2
0.608248578
+

375
1285
01693
chr1
29362337
29365938
ENSG00000159023
EPB41
−0.607275382
−

376
1286
20192
chr7
141759271
141784444
ENSG00000257335
MGAM
0.604473759
+

377
1287
03072
chr10
32832227
32873232
ENSG00000216937
CCDC7
−0.601944609
−

378
1288
21122
chr8
131226801
131249240
ENSG00000153317
ASAP1
0.596477189
+

379
1289
11808
chr2
148730307
148739650
ENSG00000115947
ORC4
−0.595902755
−

380
1290
18633
chr5
93978977
93990457
ENSG00000133302
ANKRD32
0.595223761
+

381
1291
06793
chr14
103918254
103923549
ENSG00000075413
MARK3
−0.593329884
−

382
1292
02441
chr1
95609446
95616975
ENSG00000152078
TMEM56
−0.592118741
−

383
1293
00317
chr1
151139409
151139890
ENSG00000163156
SCNM1
−0.59143984
−

384
1294
05925
chr12
70193988
70195501
ENSG00000127328
RAB3IP
−0.590453589
−

385
1295
09920
chr17
45247282
45249430
ENSG00000004897
CDC27
−0.590090956
−

386
1296
18485
chr5
74842834
74848416
ENSG00000122008
POLK
−0.589212654
−

387
1297
06721
chr13
95886863
95900007
ENSG00000125257
ABCC4
0.58894702
+

388
1298
00393
chr1
155644800
155649303
ENSG00000163374
YY1AP1
−0.588932673
−

389
1299
01882
chr1
43293959
43295969
ENSG00000164010
ERMAP
−0.588844379
−

390
1300
03896
chr11
120916382
120930794
ENSG00000154114
TBCEL
−0.586343839
−

391
1301
06549
chr13
50025688
50026045
ENSG00000136169
SETDB2
−0.585761927
−

392
1302
10785
chr18
45391429
45396935
ENSG00000175387
SMAD2
0.583937093
+

393
1303
06941
chr14
31424825
31425448
ENSG00000196792
STRN3
−0.583201151
−

394
1304
06310
chr13
26974589
26975761
ENSG00000132964
CDK8
−0.581890878
−

395
1305
09968
chr17
47388673
47389404
ENSG00000198740
ZNF652
−0.581062739
−

396
1306
16163
chr3
56626997
56628056
ENSG00000180376
CCDC66
−0.578668491
−

397
1307
07694
chr15
101104896
101105470
ENSG00000140471
LINS
−0.578351997
−

398
1308
02708
chr10
121275020
121286936
ENSG00000148908
RGS10
0.577572918
+

399
1309
05496
chr12
27107077
27110676
ENSG00000111790
FGFR1OP2
0.576980459
+

400
1310
14614
chr22
41568502
41569788
ENSG00000100393
EP300
−0.576686603
−

401
1311
12064
chr2
175976295
175986268
ENSG00000115966
ATF2
−0.575720234
−

402
1312
03039
chr10
32309949
32310215
ENSG00000170759
KIF5B
−0.575295577
−

403
1313
07693
chr15
101104892
101105470
ENSG00000140471
LINS
−0.574250823
−

404
1314
10271
chr17
60111147
60112969
ENSG00000108510
MED13
−0.574056275
−

405
1315
10970
chr18
77668145
77668309
ENSG00000122490
PQLC1
−0.571239552
−

406
1316
04710
chr11
77394754
77404656
ENSG00000048649
RSF1
0.571177955
+

407
1317
23217
chrX
76907603
76912143
ENSG00000085224
ATRX
−0.569592962
−

408
1318
12082
chr2
178096616
178098999
ENSG00000116044
NFE2L2
−0.56851133
−

409
1319
00788
chr1
179972308
179975702
ENSG00000135837
CEP350
−0.568320418
−

410
1320
05414
chr12
14609494
14610229
ENSG00000171681
ATF7IP
−0.567149617
−

411
1321
20442
chr7
27668989
27672064
ENSG00000106049
HIBADH
−0.566428216
−

412
1322
21415
chr8
37967896
37968351
ENSG00000129691
ASH2L
−0.565206772
−

413
1323
16141
chr3
52771601
52775515
ENSG00000114904
NEK4
−0.56479756
−

414
1324
15039
chr3
141811902
141820683
ENSG00000114126
TFDP2
−0.563649191
−

415
1325
01723
chr1
29481207
29481422
ENSG00000116350
SRSF4
−0.56351081
−

416
1326
12722
chr2
24787163
24816590
ENSG00000084676
NCOA1
−0.562536917
−

417
1327
01585
chr1
26594973
26596105
ENSG00000130695
CEP85
−0.556978262
−

418
1328
04814
chr11
85723323
85742653
ENSG00000073921
PICALM
0.555693534
+

419
1329
06795
chr14
103918254
103928798
ENSG00000075413
MARK3
−0.555161705
−

420
1330
18206
chr5
52899281
52900725
ENSG00000164258
NDUFS4
−0.55332775
−

421
1331
08277
chr15
65471271
65472542
ENSG00000166855
CLPX
−0.553060093
−

422
1332
15888
chr3
37170553
37196215
ENSG00000093167
LRRFIP2
−0.552719212
−

423
1333
14215
chr21
37716876
37721706
ENSG00000159256
MORC3
0.552495821
+

424
1334
21103
chr8
131164981
131193126
ENSG00000153317
ASAP1
−0.552048479
−

425
1335
03209
chr10
5838725
5842668
ENSG00000057608
GDI2
−0.550992097
−

426
1336
20435
chr7
26724354
26729981
ENSG00000005020
SKAP2
−0.550537566
−

427
1337
18379
chr5
68487621
68492936
ENSG00000153044
CENPH
−0.549245895
−

428
1338
07964
chr15
49528047
49531564
ENSG00000156958
GALK2
−0.548755689
−

429
1339
09370
chr16
9009110
9011013
ENSG00000187555
USP7
−0.548099255
−

430
1340
20041
chr7
129760588
129762042
ENSG00000128607
KLHDC10
−0.547563163
−

431
1341
11772
chr2
136432901
136437894
ENSG00000048991
R3HDM1
−0.547466106
−

432
1342
12227
chr2
201721404
201721708
ENSG00000013441
CLK1
−0.546383174
−

433
1343
21187
chr8
141595217
141595410
ENSG00000123908
AGO2
−0.54617497
−

434
1344
06474
chr13
43528083
43544806
ENSG00000133106
EPSTI1
0.546089365
+

435
1345
07887
chr15
43627142
43628024
ENSG00000168803
ADAL
−0.545213157
−

436
1346
04380
chr11
5248159
5255443
ENSG00000244734
HBB
−0.544981515
−

437
1347
20067
chr7
131071878
131073731
ENSG00000128585
MKLN1
−0.544816911
−

438
1348
01978
chr1
51868106
51874004
ENSG00000085832
EPS15
−0.54474945
−

439
1349
05796
chr12
62715244
62749256
ENSG00000135655
USP15
−0.542734982
−

440
1350
06458
chr13
41943225
41946966
ENSG00000172766
NAA16
−0.542699921
−

441
1351
12631
chr2
24046127
24046439
ENSG00000119778
ATAD2B
−0.542019024
−

442
1352
16247
chr3
69077050
69077446
ENSG00000144747
TMF1
−0.541249355
−

443
1353
16972
chr4
39328182
39329376
ENSG00000035928
RFC1
0.539983221
+

444
1354
12660
chr2
24103508
24108699
ENSG00000119778
ATAD2B
0.537763436
+

445
1355
03207
chr10
5836847
5842668
ENSG00000057608
GDI2
0.536487079
+

446
1356
14164
chr21
34804483
34805178
ENSG00000159128
IFNGR2
−0.535037721
−

447
1357
03806
chr11
117150623
117150975
ENSG00000167257
RNF214
−0.533600834
−

448
1358
18891
chr6
131481198
131490413
ENSG00000118507
AKAP7
−0.530137447
−

449
1359
10683
chr18
29412046
29419420
ENSG00000153339
TRAPPC8
−0.52543663
−

450
1360
12223
chr2
201718625
201719809
ENSG00000013441
CLK1
−0.522633343
−

451
1361
22413
chr9
33986757
33998862
ENSG00000137073
UBAP2
0.521045501
+

452
1362
17718
chr5
137320945
137324004
ENSG00000031003
FAM13B
−0.520881586
−

453
1363
18147
chr5
38971978
38978752
ENSG00000164327
RICTOR
−0.520750421
−

454
1364
01647
chr1
29313942
29314417
ENSG00000159023
EPB41
−0.52004965
−

455
1365
18058
chr5
179665331
179668155
ENSG00000050748
MAPK9
−0.519526036
−

456
1366
16094
chr3
50145502
50145737
ENSG00000003756
RBM5
−0.518700066
−

457
1367
03636
chr10
98312403
98312816
ENSG00000077147
TM9SF3
−0.518334594
−

458
1368
07659
chr14
97026985
97029230
ENSG00000090060
PAPOLA
−0.516218178
−

459
1369
01488
chr1
24112164
24112913
ENSG00000057757
PITHD1
−0.515901988
−

460
1370
18726
chr6
108243000
108250718
ENSG00000025796
SEC63
−0.512704372
−

461
1371
21934
chr9
115013208
115015068
ENSG00000119314
PTBP3
0.511483471
+

462
1372
11181
chr19
19603114
19603521
ENSG00000167491
GATAD2A
−0.506490329
−

463
1373
17532
chr5
112128142
112128674
ENSG00000134982
APC
−0.505576237
−

464
1374
00404
chr1
155823066
155823597
ENSG00000116580
GON4L
−0.504015347
−

465
1375
16639
chr4
147227077
147230127
ENSG00000120519
SLC10A7
−0.501165826
−

466
1376
11038
chr19
10273342
10277361
ENSG00000130816
DNMT1
−0.499037453
−

467
1377
12628
chr2
24042616
24046439
ENSG00000119778
ATAD2B
−0.49858657
−

468
1378
20317
chr7
158580694
158591763
ENSG00000117868
ESYT2
−0.497702626
−

469
1379
07790
chr15
40938035
40939272
ENSG00000137812
CASC5
−0.497560503
−

470
1380
03959
chr11
130130750
130131824
ENSG00000196323
ZBTB44
−0.497284677
−

471
1381
08659
chr15
93543741
93552553
ENSG00000173575
CHD2
−0.497214572
−

472
1382
21499
chr8
48308935
48320523
ENSG00000164808
SPIDR
0.497091896
+

473
1383
14410
chr22
29120964
29121355
ENSG00000183765
CHEK2
−0.496970217
−

474
1384
05272
chr12
122995655
122999774
ENSG00000111011
RSRC2
−0.496602757
−

475
1385
20253
chr7
152007050
152012423
ENSG00000055609
KMT2C
0.49641096
+

476
1386
12867
chr2
33442618
33447218
ENSG00000049323
LTBP1
−0.494203398
−

477
1387
18343
chr5
65307876
65310553
ENSG00000112851
ERBB2IP
−0.490100764
−

478
1388
16491
chr4
128995614
128999117
ENSG00000138709
LARP1B
−0.489563987
−

479
1389
23279
chrY
22749909
22751461
ENSG00000198692
EIF1AY
−0.488676777
−

480
1390
04762
chr11
85692171
85692271
ENSG00000073921
PICALM
0.488490499
+

481
1391
03534
chr10
93711159
93713630
ENSG00000095564
BTAF1
−0.486905705
−

482
1392
12274
chr2
202163467
202164023
ENSG00000155749
ALS2CR12
−0.486743671
−

483
1393
17490
chr5
109049220
109065214
ENSG00000112893
MAN2A1
−0.486367479
−

484
1394
07235
chr14
53003436
53011089
ENSG00000087301
TXNDC16
−0.485655501
−

485
1395
00199
chr1
117944807
117963271
ENSG00000198162
MAN1A2
−0.484302881
−

486
1396
01097
chr1
207896962
207898053
ENSG00000197721
CR1L
−0.48202167
−

487
1397
08154
chr15
62299506
62306191
ENSG00000129003
VPS13C
−0.480880375
−

488
1398
01423
chr1
23356961
23377013
ENSG00000004487
KDM1A
−0.48077223
−

489
1399
18273
chr5
56542126
56543042
ENSG00000062194
GPBP1
−0.480747801
−

490
1400
02982
chr10
27821435
27822923
ENSG00000099246
RAB18
−0.479223363
−

491
1401
09356
chr16
89824984
89828430
ENSG00000187741
FANCA
−0.477120997
−

492
1402
02496
chr10
103221737
103239214
ENSG00000166167
BTRC
−0.473303249
−

493
1403
02437
chr1
95603830
95616975
ENSG00000231992
RP11-57H12.2
−0.471456403
−

494
1404
14948
chr3
136323150
136323315
ENSG00000118007
STAG1
−0.470361517
−

495
1405
16191
chr3
57618991
57627474
ENSG00000174839
DENND6A
0.469381446
+

496
1406
00019
chr1
100889777
100908552
ENSG00000079335
CDC14A
−0.467831947
−

497
1407
17405
chr4
89827529
89870589
ENSG00000138640
FAM13A
−0.467278089
−

498
1408
05428
chr12
1812051
1863680
ENSG00000006831
ADIPOR2
−0.467232674
−

499
1409
15760
chr3
20178433
20181856
ENSG00000114166
KAT2B
−0.466896651
−

500
1410
09205
chr16
70601313
70601439
ENSG00000189091
SF3B3
−0.464994622
−

501
1411
14782
chr3
119219541
119222868
ENSG00000113845
TIMMDC1
0.463424571
+

502
1412
17376
chr4
88116475
88116842
ENSG00000145332
KLHL8
−0.462383046
−

503
1413
22074
chr9
127670655
127674305
ENSG00000136935
GOLGA1
−0.462041586
−

504
1414
09111
chr16
67662272
67663436
ENSG00000102974
CTCF
−0.461139143
−

505
1415
12648
chr2
240929490
240946787
ENSG00000130414
NDUFA10
−0.459774784
−

506
1416
11683
chr2
122514815
122519100
ENSG00000211460
TSN
−0.458085496
−

507
1417
04141
chr11
33307958
33309057
ENSG00000110422
HIPK3
−0.457031488
−

508
1418
02328
chr1
89206670
89226059
ENSG00000065243
PKN2
−0.456886353
−

509
1419
17863
chr5
153413350
153414527
ENSG00000055147
FAM114A2
−0.456304188
−

510
1420
16668
chr4
151719232
151738409
ENSG00000198589
LRBA
−0.456090855
−

511
1421
23105
chrX
44941820
44942034
ENSG00000147050
KDM6A
0.456030497
+

512
1422
01954
chr1
47834140
47840965
ENSG00000162368
CMPK1
0.455682466
+

513
1423
08564
chr15
89656955
89659752
ENSG00000140526
ABHD2
−0.454247399
−

514
1424
20352
chr7
17929985
17937069
ENSG00000071189
SNX13
−0.449868927
−

515
1425
11008
chr18
9524591
9525849
ENSG00000017797
RALBP1
0.447213595
+

516
1426
22402
chr9
33960823
33989124
ENSG00000137073
UBAP2
0.447213595
+

517
1427
21830
chr9
100756912
100760960
ENSG00000136938
ANP32B
−0.447213595
−

518
1428
12329
chr2
203162101
203162629
ENSG00000055044
NOP58
−0.447213595
−

519
1429
15468
chr3
182679013
182683541
ENSG00000043093
DCUN1D1
−0.447213595
−

520
1430
04828
chr11
85961337
85963282
ENSG00000074266
EED
−0.447213595
−

521
1431
16490
chr4
128995614
128996148
ENSG00000138709
LARP1B
−0.444473314
−

522
1432
01030
chr1
200583445
200584737
ENSG00000118193
KIF14
−0.443117581
−

523
1433
19420
chr6
42559888
42562042
ENSG00000024048
UBR2
−0.439246455
−

524
1434
14185
chr21
37619814
37620866
ENSG00000142197
DOPEY2
−0.438906244
−

525
1435
03718
chr11
108046972
108047817
ENSG00000149308
NPAT
−0.435269889
−

526
1436
09613
chr17
29170930
29171934
ENSG00000176208
ATAD5
−0.434866756
−

527
1437
06750
chr14
102368055
102372866
ENSG00000078304
PPP2R5C
0.434853393
+

528
1438
01018
chr1
200550328
200561368
ENSG00000118193
KIF14
−0.434615126
−

529
1439
16470
chr4
123977541
123978443
ENSG00000145375
SPATA5
−0.434279453
−

530
1440
09778
chr17
38547757
38548989
ENSG00000131747
TOP2A
−0.434135915
−

531
1441
05143
chr12
11273608
11276786
ENSG00000111215
PRR4
−0.433592133
−

532
1442
06260
chr13
21729831
21732264
ENSG00000165480
SKA3
−0.432989827
−

533
1443
22228
chr9
139115852
139118720
ENSG00000165661
QSOX2
−0.432509181
−

534
1444
13448
chr2
9048750
9098771
ENSG00000143797
MBOAT2
−0.432354985
−

535
1445
15357
chr3
171965322
171969331
ENSG00000075420
FNDC3B
0.43092904
+

536
1446
09002
chr16
47581343
47581459
ENSG00000102893
PHKB
−0.427842792
−

537
1447
06393
chr13
33091993
33101669
ENSG00000244754
N4BP2L2
−0.427642991
−

538
1448
13458
chr2
9083315
9102747
ENSG00000143797
MBOAT2
−0.427573656
−

539
1449
16825
chr4
17816475
17816981
ENSG00000109805
NCAPG
−0.426826527
−

540
1450
23126
chrX
53430497
53430825
ENSG00000072501
SMC1A
−0.424399643
−

541
1451
20269
chr7
155465560
155473602
ENSG00000184863
RBM33
0.424359885
+

542
1452
09112
chr16
67663300
67663436
ENSG00000102974
CTCF
−0.423521981
−

543
1453
06268
chr13
21742126
21742538
ENSG00000165480
SKA3
−0.421756846
−

544
1454
16155
chr3
56600621
56601081
ENSG00000180376
CCDC66
−0.420550846
−

545
1455
17913
chr5
167915606
167921655
ENSG00000113643
RARS
−0.42049953
−

546
1456
04638
chr11
73843888
73844602
ENSG00000168014
C2CD3
−0.418953178
−

547
1457
02840
chr10
15858833
15889942
ENSG00000148481
FAM188A
0.418323531
+

548
1458
07667
chr14
97299803
97327072
ENSG00000100749
VRK1
−0.417060256
−

549
1459
02854
chr10
16773475
16776063
ENSG00000148484
RSU1
−0.413151582
−

550
1460
19975
chr7
111027029
111030750
ENSG00000184903
IMMP2L
−0.412214025
−

551
1461
03181
chr10
52279590
52350007
ENSG00000198964
SGMS1
−0.411306293
−

552
1462
15801
chr3
31617887
31621588
ENSG00000163527
STT3B
0.410299463
+

553
1463
20062
chr7
131060182
131073731
ENSG00000128585
MKLN1
−0.407576284
−

554
1464
09831
chr17
40879652
40882936
ENSG00000108799
EZH1
−0.407486821
−

555
1465
03697
chr11
107260799
107263621
ENSG00000152404
CWF19L2
−0.405295156
−

556
1466
15598
chr3
195785154
195787118
ENSG00000072274
TFRC
−0.403678077
−

557
1467
06663
chr13
79209244
79219132
ENSG00000152193
RNF219
−0.402902809
−

558
1468
22473
chr9
3647337
3651867
ENSG00000237359
RP11-509J21.2
−0.401011186
−

559
1469
15073
chr3
142144063
142145683
ENSG00000114127
XRN1
−0.400992543
−

560
1470
00198
chr1
117944807
117957453
ENSG00000198162
MAN1A2
−0.399491844
−

561
1471
13616
chr20
30954186
30956926
ENSG00000171456
ASXL1
0.396909813
+

562
1472
00912
chr1
187272597
187298192
ENSG00000236030
LINC01036
0.395630665
+

563
1473
17336
chr4
83891479
83900159
ENSG00000189308
LIN54
0.395430875
+

564
1474
09814
chr17
40650941
40653322
ENSG00000033627
ATP6V0A1
−0.394907679
−

565
1475
21876
chr9
110062421
110074018
ENSG00000119318
RAD23B
−0.393198961
−

566
1476
21373
chr8
29959413
29962002
ENSG00000104660
LEPROTL1
−0.392842587
−

567
1477
19233
chr6
18236682
18258636
ENSG00000124795
DEK
−0.3917531
−

568
1478
11803
chr2
148653869
148657467
ENSG00000121989
ACVR2A
0.391156997
+

569
1479
03038
chr10
32308785
32310215
ENSG00000170759
KIF5B
−0.390203461
−

570
1480
22414
chr9
33986757
34017187
ENSG00000137073
UBAP2
0.384924931
+

571
1481
00555
chr1
167921037
167944253
ENSG00000143164
DCAF6
0.384256353
+

572
1482
12529
chr2
227729319
227779067
ENSG00000144468
RHBDD1
−0.384024802
−

573
1483
08578
chr15
89856134
89857938
ENSG00000140525
FANCI
−0.383669546
−

574
1484
05946
chr12
72051305
72054207
ENSG00000133858
ZFC3H1
−0.378677648
−

575
1485
10124
chr17
57430575
57430887
ENSG00000175155
YPEL2
−0.374464606
−

576
1486
18352
chr5
65349233
65350779
ENSG00000112851
ERBB2IP
−0.368836498
−

577
1487
18457
chr5
72354259
72373320
ENSG00000157107
FCHO2
0.36829492
+

578
1488
12608
chr2
239090705
239093928
ENSG00000132323
ILKAP
−0.36754271
−

579
1489
20565
chr7
50358643
50367353
ENSG00000185811
IKZF1
−0.367385234
−

580
1490
18886
chr6
131466424
131490413
ENSG00000118507
AKAP7
0.365789382
+

581
1491
06726
chr13
96409897
96416207
ENSG00000102580
DNAJC3
0.365346639
+

582
1492
11894
chr2
15691616
15698758
ENSG00000151779
NBAS
−0.363763869
−

583
1493
08083
chr15
56680669
56687032
ENSG00000151575
TEX9
−0.363417974
−

584
1494
01398
chr1
230798886
230800333
ENSG00000135775
COG2
−0.362987252
−

585
1495
00433
chr1
15860731
15863309
ENSG00000116138
DNAJC16
−0.362155352
−

586
1496
15074
chr3
142151502
142151735
ENSG00000114127
XRN1
0.361833765
+

587
1497
16004
chr3
47139444
47147610
ENSG00000181555
SETD2
0.361080779
+

588
1498
18658
chr5
95091099
95099324
ENSG00000164292
RHOBTB3
−0.360131826
−

589
1499
14519
chr22
38895404
38897285
ENSG00000100201
DDX17
−0.358942195
−

590
1500
13180
chr2
61505299
61508377
ENSG00000115464
USP34
−0.357526515
−

591
1501
00915
chr1
187296052
187298192
ENSG00000236030
LINC01036
−0.357165013
−

592
1502
10135
chr17
57808781
57816308
ENSG00000062716
VMP1
0.357082198
+

593
1503
11374
chr19
48744218
48744320
ENSG00000105483
CARD8
−0.356101853
−

594
1504
18196
chr5
43675612
43677908
ENSG00000112992
NNT
−0.354388888
−

595
1505
12283
chr2
202195192
202195556
ENSG00000155749
ALS2CR12
−0.352185007
−

596
1506
01405
chr1
231090078
231097049
ENSG00000143643
TTC13
−0.348760252
−

597
1507
03492
chr10
89268092
89280926
ENSG00000107789
MINPP1
0.348651579
+

598
1508
03880
chr11
120335945
120338017
ENSG00000196914
ARHGEF12
−0.348069091
−

599
1509
15092
chr3
143704384
143708679
ENSG00000181744
C3orf58
−0.342079566
−

600
1510
14257
chr21
40578033
40584633
ENSG00000185658
BRWD1
0.3413697
+

601
1511
01241
chr1
220179447
220180680
ENSG00000136628
EPRS
−0.341221548
−

602
1512
20408
chr7
24663284
24690331
ENSG00000105926
MPP6
−0.340271383
−

603
1513
18316
chr5
64824278
64847463
ENSG00000123219
CENPK
0.339380192
+

604
1514
20268
chr7
155465560
155465982
ENSG00000184863
RBM33
−0.336928551
−

605
1515
02266
chr1
78177431
78181553
ENSG00000077254
USP33
0.334726691
+

606
1516
05793
chr12
62708570
62749256
ENSG00000135655
USP15
−0.333244067
−

607
1517
05179
chr12
116668337
116675510
ENSG00000123066
MED13L
0.333078903
+

608
1518
05174
chr12
116668237
116675510
ENSG00000123066
MED13L
−0.332850418
−

609
1519
22708
chr9
88284399
88327481
ENSG00000135049
AGTPBP1
0.332274421
+

610
1520
20391
chr7
23650789
23651172
ENSG00000169193
CCDC126
0.329914132
+

611
1521
11362
chr19
47767859
47768203
ENSG00000105321
CCDC9
−0.327502934
−

612
1522
15121
chr3
148303908
148310052
NA
NA
−0.326264404
−

613
1523
02629
chr10
11523768
11527910
ENSG00000148429
USP6NL
−0.32531565
−

614
1524
15583
chr3
195101737
195112876
ENSG00000114331
ACAP2
−0.323229025
−

615
1525
07647
chr14
96986391
96991728
ENSG00000090060
PAPOLA
−0.322155377
−

616
1526
12697
chr2
24357988
24369956
ENSG00000219626
FAM228B
−0.322096001
−

617
1527
09124
chr16
68155889
68157024
ENSG00000072736
NFATC3
−0.319198223
−

618
1528
01967
chr1
51204534
51210447
ENSG00000185104
FAF1
−0.318899562
−

619
1529
09438
chr17
18768781
18769265
ENSG00000141127
PRPSAP2
−0.316240039
−

620
1530
06838
chr14
20811305
20811436
ENSG00000259001
RPPH1
−0.314587253
−

621
1531
01695
chr1
29362337
29391670
ENSG00000159023
EPB41
−0.314217162
−

622
1532
06864
chr14
23375403
23380612
ENSG00000100461
RBM23
−0.314204415
−

623
1533
00645
chr1
172520651
172526934
ENSG00000094975
SUCO
0.312814513
+

624
1534
02169
chr1
65830317
65831879
ENSG00000116675
DNAJC6
−0.312716499
−

625
1535
01429
chr1
23397717
23398690
ENSG00000004487
KDM1A
−0.312036199
−

626
1536
22699
chr9
88257741
88261333
ENSG00000135049
AGTPBP1
−0.309641223
−

627
1537
05351
chr12
124071293
124074996
ENSG00000086598
TMED2
0.308784196
+

628
1538
18039
chr5
179050037
179050165
ENSG00000169045
HNRNPH1
−0.30753603
−

629
1539
13457
chr2
9083315
9098771
ENSG00000143797
MBOAT2
−0.307407548
−

630
1540
07460
chr14
73614502
73614802
ENSG00000080815
PSEN1
0.307005543
+

631
1541
06725
chr13
96375495
96377506
ENSG00000102580
DNAJC3
−0.306943155
−

632
1542
20409
chr7
24663284
24708279
ENSG00000105926
MPP6
0.304556407
+

633
1543
16349
chr4
103635594
103647840
ENSG00000109323
MANBA
0.302316399
+

634
1544
12976
chr2
44436348
44436466
ENSG00000138032
PPM1B
−0.302057933
−

635
1545
05296
chr12
123064451
123065217
ENSG00000184445
KNTC1
−0.30064316
−

636
1546
13459
chr2
9083315
9114564
ENSG00000143797
MBOAT2
−0.300355128
−

637
1547
06911
chr14
31185129
31204064
ENSG00000092108
SCFD1
−0.300074829
−

638
1548
02265
chr1
78177431
78180468
ENSG00000077254
USP33
−0.299632707
−

639
1549
12279
chr2
202172241
202173973
ENSG00000155749
ALS2CR12
−0.298172541
−

640
1550
00882
chr1
185183638
185200840
ENSG00000116668
SWT1
−0.296383051
−

641
1551
21702
chr8
86253827
86254037
ENSG00000133742
CA1
−0.29494585
−

642
1552
06790
chr14
103915255
103923549
ENSG00000075413
MARK3
0.293796256
+

643
1553
06499
chr13
46577273
46594692
ENSG00000123200
ZC3H13
−0.293181418
−

644
1554
19179
chr6
163876310
163899928
ENSG00000112531
QKI
−0.293026578
−

645
1555
15650
chr3
195800800
195802231
ENSG00000072274
TFRC
0.2929012
+

646
1556
20947
chr8
103372298
103373854
ENSG00000104517
UBR5
0.291370225
+

647
1557
04013
chr11
16205431
16208501
ENSG00000110693
SOX6
−0.289491559
−

648
1558
06935
chr14
31416295
31425448
ENSG00000196792
STRN3
−0.287741171
−

649
1559
11737
chr2
128944256
128945188
ENSG00000136731
UGGT1
−0.286640606
−

650
1560
02879
chr10
17746429
17747740
ENSG00000136738
STAM
−0.285602456
−

651
1561
20719
chr7
77407654
77408131
ENSG00000187257
RSBN1L
−0.285228365
−

652
1562
17108
chr4
52729602
52744020
ENSG00000109184
DCUN1D4
−0.283579414
−

653
1563
21844
chr9
102722198
102722437
ENSG00000136874
STX17
−0.283108654
−

654
1564
12598
chr2
234296902
234299129
ENSG00000077044
DGKD
0.28152101
+

655
1565
00556
chr1
167935866
167944253
ENSG00000143164
DCAF6
−0.281376187
−

656
1566
19977
chr7
111926927
111927129
ENSG00000198839
ZNF277
−0.281018353
−

657
1567
02334
chr1
89236034
89237562
ENSG00000065243
PKN2
−0.280668473
−

658
1568
17522
chr5
111611022
111643187
ENSG00000129595
EPB41L4A
−0.280356159
−

659
1569
06933
chr14
31416295
31420150
ENSG00000196792
STRN3
−0.279320036
−

660
1570
05228
chr12
120995084
120995485
ENSG00000022840
RNF10
−0.278784866
−

661
1571
19200
chr6
170855190
170858201
ENSG00000008018
PSMB1
−0.276150872
−

662
1572
20976
chr8
109462051
109468159
ENSG00000104412
EMC2
−0.275735656
−

663
1573
21101
chr8
131164981
131181313
ENSG00000153317
ASAP1
−0.272436284
−

664
1574
20069
chr7
131071878
131084192
ENSG00000128585
MKLN1
−0.271776986
−

665
1575
14335
chr21
47819503
47822397
ENSG00000160299
PCNT
0.270707487
+

666
1576
04811
chr11
85722072
85742653
ENSG00000073921
PICALM
0.270078678
+

667
1577
11938
chr2
162036124
162061304
ENSG00000136560
TANK
−0.267208245
−

668
1578
12288
chr2
202208892
202216174
ENSG00000155749
ALS2CR12
−0.266117708
−

669
1579
15165
chr3
150834124
150845771
ENSG00000144893
MED12L
0.265293687
+

670
1580
20975
chr8
109462051
109462721
ENSG00000104412
EMC2
−0.261534879
−

671
1581
08355
chr15
66044716
66048810
ENSG00000174485
DENND4A
−0.259935274
−

672
1582
09860
chr17
41256138
41256973
ENSG00000012048
BRCA1
−0.259776938
−

673
1583
05579
chr12
32751430
32764217
ENSG00000139132
FGD4
0.25903045
+

674
1584
01085
chr1
207820661
207828620
ENSG00000244703
CD46P1
−0.258975588
−

675
1585
02595
chr10
112356155
112358048
ENSG00000108055
SMC3
−0.256162842
−

676
1586
06256
chr13
21305979
21306260
ENSG00000150456
N6AMT2
−0.254616892
−

677
1587
04491
chr11
65267990
65268121
ENSG00000251562
MALAT1
−0.254540106
−

678
1588
21280
chr8
21832180
21837714
ENSG00000130227
XPO7
−0.252240898
−

679
1589
19230
chr6
18236682
18237747
ENSG00000124795
DEK
−0.25029879
−

680
1590
02255
chr1
77672324
77676174
ENSG00000142892
PIGK
−0.249922354
−

681
1591
10689
chr18
29432408
29432626
ENSG00000153339
TRAPPC8
0.249647437
+

682
1592
08145
chr15
60734614
60737990
ENSG00000128915
NARG2
−0.24428689
−

683
1593
16950
chr4
37633006
37640126
ENSG00000181826
RELL1
−0.244144099
−

684
1594
16304
chr3
8977554
8983488
ENSG00000070950
RAD18
0.243189302
+

685
1595
16003
chr3
47139444
47144913
ENSG00000181555
SETD2
−0.241271464
−

686
1596
23019
chrX
154528097
154528458
ENSG00000155962
CLIC2
−0.240340585
−

687
1597
20241
chr7
151181822
151195266
ENSG00000106615
RHEB
−0.239763518
−

688
1598
12807
chr2
32312560
32314674
ENSG00000021574
SPAST
−0.238945495
−

689
1599
23171
chrX
67731690
67742759
ENSG00000181704
YIPF6
0.237642343
+

690
1600
12531
chr2
227771508
227779067
ENSG00000144468
RHBDD1
0.237572322
+

691
1601
18517
chr5
76758919
76760634
ENSG00000164253
WDR41
−0.237516399
−

692
1602
15527
chr3
185638891
185639914
ENSG00000136527
TRA2B
−0.233742808
−

693
1603
15305
chr3
169694733
169703653
ENSG00000008952
SEC62
−0.231918034
−

694
1604
22668
chr9
86297865
86301070
ENSG00000135018
UBQLN1
−0.231396117
−

695
1605
21527
chr8
52758220
52773806
ENSG00000168300
PCMTD1
0.22970906
+

696
1606
04781
chr11
85707868
85714494
ENSG00000073921
PICALM
0.229644698
+

697
1607
15556
chr3
193374868
193385069
ENSG00000198836
OPA1
−0.229630322
−

698
1608
16517
chr4
129857809
129891623
ENSG00000151466
SCLT1
−0.228977881
−

699
1609
08315
chr15
65994642
65995346
ENSG00000174485
DENND4A
−0.227421726
−

700
1610
19101
chr6
155095122
155116273
ENSG00000213079
SCAF8
0.227418193
+

701
1611
21658
chr8
71071739
71075089
ENSG00000140396
NCOA2
0.226746325
+

702
1612
09532
chr17
27160969
27161344
ENSG00000173065
FAM222B
0.226375798
+

703
1613
16574
chr4
140046317
140060651
ENSG00000109381
ELF2
0.225090987
+

704
1614
18512
chr5
76342171
76344097
ENSG00000164252
AGGF1
0.223464328
+

705
1615
14107
chr21
17205666
17214859
ENSG00000155313
USP25
0.223298494
+

706
1616
21420
chr8
37971709
37976881
ENSG00000129691
ASH2L
0.222854832
+

707
1617
22434
chr9
3488775
3490345
ENSG00000080298
RFX3
−0.222562079
−

708
1618
03355
chr10
70719561
70720005
ENSG00000165732
DDX21
0.219375254
+

709
1619
10521
chr18
13037235
13040955
ENSG00000101639
CEP192
−0.219207234
−

710
1620
16492
chr4
128995614
129003460
ENSG00000138709
LARP1B
−0.218474838
−

711
1621
20383
chr7
23224688
23226765
ENSG00000136243
NUPL2
0.217804105
+

712
1622
07724
chr15
31266516
31269158
ENSG00000166912
MTMR10
−0.215163364
−

713
1623
21531
chr8
52773404
52773806
ENSG00000168300
PCMTD1
−0.21410755
−

714
1624
07279
chr14
55647930
55650471
ENSG00000126787
DLGAP5
−0.212145422
−

715
1625
19458
chr6
42630995
42633983
ENSG00000024048
UBR2
−0.209515648
−

716
1626
22716
chr9
88307603
88327481
ENSG00000135049
AGTPBP1
−0.209103823
−

717
1627
13376
chr2
74300675
74307718
ENSG00000187605
TET3
−0.208832195
−

718
1628
09780
chr17
38551700
38552717
ENSG00000131747
TOP2A
−0.207376107
−

719
1629
20641
chr7
66458203
66459328
ENSG00000126524
SBDS
0.206299772
+

720
1630
13248
chr2
64083439
64085070
ENSG00000169764
UGP2
0.205403768
+

721
1631
02469
chr1
9991948
9994918
ENSG00000162441
LZIC
−0.203608619
−

722
1632
03025
chr10
31661946
31676195
ENSG00000148516
ZEB1
0.201711818
+

723
1633
03075
chr10
32854485
32873232
ENSG00000150076
C10ORF68
−0.200304746
−

724
1634
19518
chr6
4891946
4892613
ENSG00000153046
CDYL
−0.199527565
−

725
1635
03513
chr10
91511102
91522592
ENSG00000138182
KIF20B
−0.199045211
−

726
1636
17257
chr4
77065301
77065626
ENSG00000138750
NUP54
−0.196900723
−

727
1637
03656
chr10
98667021
98667504
ENSG00000196233
LCOR
−0.195915307
−

728
1638
15717
chr3
197592293
197593090
ENSG00000186001
LRCH3
−0.195801766
−

729
1639
05517
chr12
28408513
28412375
ENSG00000123106
CCDC91
−0.195185653
−

730
1640
20627
chr7
65595730
65599361
ENSG00000241258
CRCP
0.194915841
+

731
1641
15459
chr3
182602540
182605501
ENSG00000058063
ATP11B
0.194190746
+

732
1642
07824
chr15
41648236
41669502
ENSG00000137804
NUSAP1
−0.192452614
−

733
1643
19559
chr6
56915571
56920595
ENSG00000168116
KIAA1586
−0.191962941
−

734
1644
17227
chr4
73956383
73958017
ENSG00000132466
ANKRD17
−0.191278185
−

735
1645
08494
chr15
77657504
77681144
ENSG00000173517
PEAK1
0.190411294
+

736
1646
20064
chr7
131060182
131084192
ENSG00000128585
MKLN1
−0.190090712
−

737
1647
06896
chr14
31139461
31144271
ENSG00000092108
SCFD1
−0.189779123
−

738
1648
13805
chr20
39721111
39729993
ENSG00000198900
TOP1
0.188631066
+

739
1649
15234
chr3
157839891
157841780
ENSG00000174891
RSRC1
−0.188143907
−

740
1650
18114
chr5
36982266
36986403
ENSG00000164190
NIPBL
0.187975589
+

741
1651
22270
chr9
17330629
17342442
ENSG00000044459
CNTLN
−0.187514331
−

742
1652
09250
chr16
72122885
72124685
ENSG00000140830
TXNL4B
0.187490707
+

743
1653
18068
chr5
179976930
179980471
ENSG00000113300
CNOT6
0.18705281
+

744
1654
21532
chr8
52773420
52773806
ENSG00000168300
PCMTD1
0.184831559
+

745
1655
10651
chr18
2718155
2718432
ENSG00000101596
SMCHD1
0.184800168
+

746
1656
13115
chr2
58311223
58316858
ENSG00000028116
VRK2
−0.183814521
−

747
1657
18933
chr6
13579682
13584457
ENSG00000124523
SIRT5
−0.182083653
−

748
1658
22782
chr9
98740342
98766983
ENSG00000182150
ERCC6L2
−0.181217059
−

749
1659
22389
chr9
33948371
33956144
ENSG00000137073
UBAP2
−0.180446169
−

750
1660
16786
chr4
170428187
170429482
ENSG00000137601
NEK1
−0.180057374
−

751
1661
17237
chr4
74852759
74852887
ENSG00000163736
PPBP
0.179149328
+

752
1662
12672
chr2
242099746
242102816
ENSG00000115685
PPP1R7
0.178794715
+

753
1663
06705
chr13
95813442
95840796
ENSG00000125257
ABCC4
−0.176068897
−

754
1664
18425
chr5
72157634
72161556
ENSG00000083312
TNPO1
0.175732494
+

755
1665
17203
chr4
6995910
7002978
ENSG00000132405
TBC1D14
−0.17425856
−

756
1666
21660
chr8
71126137
71128999
ENSG00000140396
NCOA2
−0.174239515
−

757
1667
00651
chr1
172525008
172526934
ENSG00000094975
SUCO
0.173485924
+

758
1668
17135
chr4
54292038
54294350
ENSG00000145216
FIP1L1
−0.172407695
−

759
1669
06702
chr13
95813442
95822882
ENSG00000125257
ABCC4
−0.171000298
−

760
1670
20191
chr7
141755799
141782010
ENSG00000257335
MGAM
0.17013547
+

761
1671
09521
chr17
26490568
26499644
ENSG00000087095
NLK
−0.169494931
−

762
1672
01122
chr1
21097422
21100103
ENSG00000127483
HP1BP3
−0.168563183
−

763
1673
12330
chr2
203329531
203332412
ENSG00000204217
BMPR2
−0.166786705
−

764
1674
05912
chr12
69983264
69987393
ENSG00000166226
CCT2
−0.166663778
−

765
1675
02086
chr1
61577042
61578015
ENSG00000162599
NFIA
0.166193266
+

766
1676
10343
chr17
65941524
65944422
ENSG00000171634
BPTF
−0.164705882
−

767
1677
20291
chr7
156619298
156629579
ENSG00000105983
LMBR1
−0.160626296
−

768
1678
14199
chr21
37711076
37717005
ENSG00000159256
MORC3
0.154418454
+

769
1679
21045
chr8
124349864
124351686
ENSG00000156802
ATAD2
0.153641031
+

770
1680
14325
chr21
47768925
47769734
ENSG00000160299
PCNT
0.153108926
+

771
1681
19798
chr6
90556280
90566918
ENSG00000118412
CASP8AP2
−0.153053021
−

772
1682
02845
chr10
15875628
15889942
ENSG00000148481
FAM188A
−0.152980029
−

773
1683
03993
chr11
16117541
16119234
ENSG00000110693
SOX6
−0.151781747
−

774
1684
16545
chr4
129913321
129925031
ENSG00000151466
SCLT1
0.151559153
+

775
1685
06880
chr14
31050069
31050322
ENSG00000092140
G2E3
−0.149358981
−

776
1686
18314
chr5
64824278
64825026
ENSG00000123219
CENPK
0.148684367
+

777
1687
20221
chr7
148543561
148544397
ENSG00000106462
EZH2
−0.147611287
−

778
1688
07570
chr14
90397884
90398971
ENSG00000140025
EFCAB11
0.14601488
+

779
1689
02091
chr1
61577042
61624827
ENSG00000270742
RP4-802A10.1
−0.144497254
−

780
1690
17014
chr4
39915230
39927553
ENSG00000121892
PDS5A
0.143294254
+

781
1691
13755
chr20
35695126
35696589
ENSG00000080839
RBL1
0.140612985
+

782
1692
07121
chr14
50130032
50141145
ENSG00000100479
POLE2
0.140484442
+

783
1693
20223
chr7
148543588
148544397
ENSG00000106462
EZH2
−0.140309562
−

784
1694
00933
chr1
193044949
193046180
ENSG00000116747
TROVE2
−0.139596873
−

785
1695
07159
chr14
50292584
50298079
ENSG00000165525
NEMF
−0.138077345
−

786
1696
10346
chr17
65941524
65972074
ENSG00000171634
BPTF
−0.13780517
−

787
1697
21643
chr8
68044185
68049838
ENSG00000104218
CSPP1
−0.136932953
−

788
1698
13454
chr2
9079949
9098771
ENSG00000143797
MBOAT2
0.133919127
+

789
1699
18540
chr5
78914469
78915906
ENSG00000164329
PAPD4
0.132799362
+

790
1700
18111
chr5
36953719
36976504
ENSG00000164190
NIPBL
0.132583843
+

791
1701
10507
chr18
12370847
12371690
ENSG00000141385
AFG3L2
0.132359086
+

792
1702
18863
chr6
126196034
126199516
ENSG00000111912
NCOA7
0.132255998
+

793
1703
00144
chr1
114372213
114377061
ENSG00000134242
PTPN22
−0.131778107
−

794
1704
03188
chr10
5741487
5756170
ENSG00000108021
FAM208B
−0.131204074
−

795
1705
18463
chr5
72370568
72373320
ENSG00000157107
FCHO2
−0.130937396
−

796
1706
12665
chr2
24181170
24199945
ENSG00000173960
UBXN2A
0.129857265
+

797
1707
22491
chr9
37126308
37126939
ENSG00000147905
ZCCHC7
−0.129829873
−

798
1708
08822
chr16
1859238
1859834
ENSG00000063854
HAGH
−0.128958795
−

799
1709
09342
chr16
89291126
89292039
ENSG00000170100
ZNF778
0.124901078
+

800
1710
21674
chr8
74585341
74601048
ENSG00000040341
STAU2
0.124142904
+

801
1711
03883
chr11
120343758
120348235
ENSG00000196914
ARHGEF12
−0.123923895
−

802
1712
19740
chr6
84894904
84896341
ENSG00000135315
KIAA1009
0.123773636
+

803
1713
20697
chr7
77210743
77212967
ENSG00000127947
PTPN12
−0.123256841
−

804
1714
05413
chr12
14599921
14610229
ENSG00000171681
ATF7IP
0.123186507
+

805
1715
02968
chr10
27431315
27434519
ENSG00000136758
YME1L1
−0.121868499
−

806
1716
00185
chr1
1158623
1159348
ENSG00000078808
SDF4
−0.119575165
−

807
1717
22996
chrX
147743428
147744289
ENSG00000155966
AFF2
−0.118456414
−

808
1718
02080
chr1
61575449
61578015
ENSG00000162599
NFIA
−0.118244481
−

809
1719
04138
chr11
33127112
33127610
ENSG00000176102
CSTF3
−0.114563088
−

810
1720
13631
chr20
32617574
32619410
ENSG00000125970
RALY
−0.113556294
−

811
1721
21883
chr9
111812562
111812972
ENSG00000106771
TMEM245
0.110008883
+

812
1722
10514
chr18
12999419
13019205
ENSG00000101639
CEP192
−0.108819051
−

813
1723
15292
chr3
167754623
167759262
ENSG00000173905
GOLIM4
0.107484468
+

814
1724
09077
chr16
58593707
58594266
ENSG00000125107
CNOT1
−0.105014078
−

815
1725
20343
chr7
17885217
17890587
ENSG00000071189
SNX13
0.10426073
+

816
1726
06102
chr12
96717725
96728643
ENSG00000059758
CDK17
−0.102490008
−

817
1727
21911
chr9
114676884
114678116
ENSG00000148154
UGCG
−0.10000879
−

818
1728
16667
chr4
151719232
151729550
ENSG00000198589
LRBA
0.098635477
+

819
1729
08476
chr15
76580186
76585041
ENSG00000140374
ETFA
0.095064522
+

820
1730
02444
chr1
95609446
95639445
ENSG00000152078
TMEM56
0.09412163
+

821
1731
17034
chr4
42024854
42025401
ENSG00000014824
SLC30A9
−0.093848466
−

822
1732
00281
chr1
150202905
150204264
ENSG00000143401
ANP32E
0.092891301
+

823
1733
03999
chr11
16117541
16208501
ENSG00000110693
SOX6
−0.091724724
−

824
1734
22667
chr9
86294689
86301070
ENSG00000135018
UBQLN1
0.091306013
+

825
1735
22261
chr9
17226200
17236586
ENSG00000044459
CNTLN
−0.091125079
−

826
1736
10952
chr18
76886266
76914555
ENSG00000166377
ATP9B
−0.089171438
−

827
1737
19236
chr6
18256591
18258636
ENSG00000124795
DEK
−0.088063183
−

828
1738
13408
chr2
85875052
85875976
ENSG00000168883
USP39
−0.088017214
−

829
1739
07053
chr14
39620949
39628754
ENSG00000182400
TRAPPC6B
−0.086392887
−

830
1740
04125
chr11
2991032
2993473
ENSG00000205531
NAP1L4
−0.085668582
−

831
1741
09176
chr16
69404385
69406258
ENSG00000132604
TERF2
0.084750698
+

832
1742
20988
chr8
117668094
117671219
ENSG00000147677
EIF3H
0.084610875
+

833
1743
18339
chr5
65284462
65290692
ENSG00000112851
ERBB2IP
−0.080397524
−

834
1744
16176
chr3
56694758
56707753
ENSG00000163946
FAM208A
0.079486106
+

835
1745
11220
chr19
30476129
30477324
ENSG00000105176
URI1
−0.078233761
−

836
1746
14276
chr21
40600425
40601362
ENSG00000185658
BRWD1
−0.077947077
−

837
1747
11806
chr2
148730307
148733544
ENSG00000115947
ORC4
−0.073487662
−

838
1748
07606
chr14
92473983
92477416
ENSG00000100815
TRIP11
0.071182875
+

839
1749
17751
chr5
138979956
138994551
ENSG00000131508
UBE2D2
−0.070241902
−

840
1750
07232
chr14
52977957
53011089
ENSG00000087301
TXNDC16
−0.067999092
−

841
1751
00197
chr1
117944807
117948267
ENSG00000198162
MAN1A2
−0.0648967
−

842
1752
18265
chr5
56160560
56161804
ENSG00000095015
MAP3K1
−0.064020699
−

843
1753
04703
chr11
77336007
77336863
ENSG00000074201
CLNS1A
−0.063683386
−

844
1754
22665
chr9
86293355
86301070
ENSG00000135018
UBQLN1
0.063068015
+

845
1755
19216
chr6
17669205
17669777
ENSG00000124789
NUP153
−0.062863001
−

846
1756
17361
chr4
87967317
87968746
ENSG00000172493
AFF1
−0.062349721
−

847
1757
09584
chr17
28011580
28030080
ENSG00000141298
SSH2
−0.062209042
−

848
1758
06777
chr14
103865287
103871604
ENSG00000075413
MARK3
0.061116967
+

849
1759
21042
chr8
124346117
124348772
ENSG00000156802
ATAD2
0.060568881
+

850
1760
15108
chr3
148164052
148173318
NA
NA
0.057927058
+

851
1761
06177
chr13
113219424
113223573
ENSG00000126216
TUBGCP3
−0.0579005
−

852
1762
10983
chr18
9182379
9221997
ENSG00000101745
ANKRD12
−0.057876407
−

853
1763
10604
chr18
21087948
21089243
ENSG00000141452
C18orf8
−0.055418355
−

854
1764
08542
chr15
85223943
85234875
ENSG00000140612
SEC11A
0.054970161
+

855
1765
02224
chr1
70758070
70781249
ENSG00000118454
ANKRD13C
0.054606996
+

856
1766
01705
chr1
29386933
29424447
ENSG00000159023
EPB41
0.054384837
+

857
1767
07102
chr14
45705016
45706924
ENSG00000129534
MIS18BP1
−0.054279736
−

858
1768
04624
chr11
73418464
73429763
ENSG00000175582
RAB6A
−0.053950256
−

859
1769
06763
chr14
102661274
102664184
ENSG00000140153
WDR20
0.053904813
+

860
1770
00399
chr1
155691307
155695810
ENSG00000132676
DAP3
−0.053660957
−

861
1771
10224
chr17
59853761
59857762
ENSG00000136492
BRIP1
0.05142855
+

862
1772
16576
chr4
140058783
140060651
ENSG00000109381
ELF2
0.051075392
+

863
1773
16378
chr4
105439733
105440611
ENSG00000245384
AC004053.1
−0.050390264
−

864
1774
08827
chr16
18809246
18810156
ENSG00000170540
ARL6IP1
−0.050301092
−

865
1775
09125
chr16
68155889
68160513
ENSG00000072736
NFATC3
0.049811818
+

866
1776
08521
chr15
80412669
80415142
ENSG00000086666
ZFAND6
0.04971732
+

867
1777
22418
chr9
33996220
34017187
ENSG00000137073
UBAP2
0.048875187
+

868
1778
17441
chr4
99495607
99496056
ENSG00000168785
TSPAN5
0.046088362
+

869
1779
04651
chr11
74500670
74528759
ENSG00000166439
RNF169
0.04506624
+

870
1780
10064
chr17
53478829
53481229
ENSG00000108960
MMD
−0.043505886
−

871
1781
11243
chr19
33604672
33605325
ENSG00000076650
GPATCH1
−0.041554229
−

872
1782
13939
chr20
47685251
47686834
ENSG00000124207
CSE1L
−0.041440338
−

873
1783
20310
chr7
158552176
158557544
ENSG00000117868
ESYT2
−0.039332637
−

874
1784
12027
chr2
172782046
172809519
ENSG00000128708
HAT1
0.038890112
+

875
1785
05780
chr12
58340777
58347472
ENSG00000166896
XRCC6BP1
0.036777678
+

876
1786
23021
chrX
154645235
154649223
NA
NA
0.036352385
+

877
1787
14612
chr22
41566409
41569788
ENSG00000100393
EP300
−0.035726557
−

878
1788
12275
chr2
202163960
202173973
ENSG00000155749
ALS2CR12
0.033395502
+

879
1789
00568
chr1
168007608
168014465
ENSG00000143164
DCAF6
0.033275143
+

880
1790
08122
chr15
59204761
59209198
ENSG00000137776
SLTM
−0.032521344
−

881
1791
01196
chr1
213251037
213290752
ENSG00000136643
RPS6KC1
−0.032028145
−

882
1792
08360
chr15
66641397
66641775
ENSG00000075131
TIPIN
0.030872075
+

883
1793
16383
chr4
106155053
106158508
ENSG00000168769
TET2
−0.030773502
−

884
1794
11998
chr2
171884848
171902872
ENSG00000198586
TLK1
−0.030704957
−

885
1795
06802
chr14
103923478
103928798
ENSG00000075413
MARK3
0.03038484
+

886
1796
08341
chr15
66021409
66031213
ENSG00000174485
DENND4A
0.029114553
+

887
1797
15592
chr3
195780288
195803993
ENSG00000072274
TFRC
0.027473612
+

888
1798
02669
chr10
119100490
119104960
ENSG00000165650
PDZD8
−0.026742711
−

889
1799
22521
chr9
4823547
4827033
ENSG00000120158
RCL1
0.025254641
+

890
1800
14172
chr21
35138178
35140132
ENSG00000205726
ITSN1
−0.023931854
−

891
1801
22227
chr9
139115608
139118720
ENSG00000165661
QSOX2
0.019159719
+

892
1802
21772
chr8
95897294
95897786
ENSG00000175305
CCNE2
−0.017276448
−

893
1803
12762
chr2
26505712
26505919
ENSG00000138029
HADHB
0.016059531
+

894
1804
08619
chr15
93467550
93472321
ENSG00000173575
CHD2
0.015937506
+

895
1805
01109
chr1
21076215
21100103
ENSG00000127483
HP1BP3
0.015119097
+

896
1806
18951
chr6
13639794
13644961
ENSG00000010017
RANBP9
0.014679425
+

897
1807
01465
chr1
235963619
235964397
ENSG00000143669
LYST
−0.013840453
−

898
1808
03064
chr10
32759991
32762951
ENSG00000216937
CCDC7
0.012950478
+

899
1809
19408
chr6
41839301
41859613
ENSG00000164663
USP49
−0.012058437
−

900
1810
21222
chr8
142264087
142264728
ENSG00000022567
SLC45A4
0.011653698
+

901
1811
21894
chr9
114148656
114154104
ENSG00000136813
KIAA0368
0.011310291
+

902
1812
17147
chr4
56277780
56284152
ENSG00000134851
TMEM165
0.010589367
+

903
1813
12555
chr2
231222519
231226412
ENSG00000185404
SP140L
−0.010261584
−

904
1814
03671
chr10
99196173
99197507
ENSG00000171311
EXOSC1
−0.008914606
−

905
1815
15853
chr3
33725850
33738425
ENSG00000163539
CLASP2
0.007974047
+

906
1816
15700
chr3
197541778
197547301
ENSG00000186001
LRCH3
0.007744757
+

907
1817
03351
chr10
70547683
70548085
ENSG00000060339
CCAR1
0.006380515
+

908
1818
11660
chr2
120684173
120692534
ENSG00000088179
PTPN4
−0.003481975
−

909
1819
00498
chr1
160293220
160302347
ENSG00000122218
COPA
−0.003328713
−

910
1820
11490
chr19
8538547
8539128
ENSG00000099783
HNRNPM
+0.0001735361
+

“SEQ ID NO: junction” refers to the SEQ ID NO: encoding the sequence surrounding the exon-exon junction in a head-to-tail arrangement, i.e. 20 nucleotides upstream and 20 nucleotides downstream of the actual junction.

“SEQ ID NO: full length” refers to the SEQ ID NO: encoding the entire sequence identified for the respective circRNA. Importantly, these sequences do not comprise any intronic sequences which are assumed to be spliced out during circRNA biogenesis.

“circID” indicates the internal reference bloodCirc_# of the inventors.

“chr” denotes the chromosome the circRNA is stemming from (chrM is the mitochondrial chromosome).

“start” and “stop” indicate where on the respective chromosome the start and the stop of the circRNA encoding sequence is found. The reference sequence is hg19 downloaded from the UCSC genome browser (see Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al.; The human genome browser at UCSC. Genome Research. 2002 June; 12(6): 996-1006).

“gene” and “gene_name” denote the gene as annotated in the UCSC genome browser and the commonly used name, respectively.

“NA” indicates that the gene is not yet annotated.

The “score” is calculated by subtracting the mean values of each circRNA in the two groups (healthy and diseased (Alzheimer's)) and dividing by the highest standard deviation (cp. FIG. 15). Negative score = decreased levels or absence of the respective circRNA found in samples of diseased subjects. Positive score = increased levels and/or presence of the respective circRNA found in samples of diseased subjects.
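For illustration only, the score described above may be sketched as follows. The sketch assumes that the healthy mean is subtracted from the diseased mean (so that a positive score corresponds to increased levels in diseased subjects), that “the highest standard deviation” means the larger of the two group standard deviations, and that the sample standard deviation is used; none of these details is stated explicitly in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev


def circ_score(healthy_levels, diseased_levels):
    """Difference of group means divided by the larger group standard deviation.

    Assumption: diseased minus healthy, sample standard deviation (n - 1).
    """
    largest_sd = max(stdev(healthy_levels), stdev(diseased_levels))
    if largest_sd == 0:
        raise ValueError("both groups have zero spread; score is undefined")
    return (mean(diseased_levels) - mean(healthy_levels)) / largest_sd


def diseased_flag(score):
    """Map the sign of the score to the '+' / '-' entry of the 'diseased' column.

    A score of exactly zero is not covered by the description; None is returned.
    """
    if score > 0:
        return "+"
    if score < 0:
        return "-"
    return None
```

Each group must contain at least two measurements, since the sample standard deviation is otherwise undefined.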

“diseased” denotes whether increased levels or presence of the respective circRNA are indicative for the presence of the neurodegenerative disease (“+”), or whether decreased levels or absence of the respective circRNA are indicative for the presence of a neurodegenerative disease (“−”).

The sequences in the Sequence Listing are DNA sequences encoding the actual circRNA. Hence, the actual circRNA is the listed sequence with each “T” replaced by a “U”.
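As a minimal sketch, the conversion from a listed DNA sequence to the corresponding circRNA sequence may be written as shown below. The function name is hypothetical and the example input is not taken from the Sequence Listing.

```python
def dna_to_rna(dna_sequence):
    """Return the RNA sequence encoded by a listed DNA sequence (T -> U).

    Letter case is preserved; all other characters are left unchanged.
    """
    return dna_sequence.replace("T", "U").replace("t", "u")


# Made-up example input, for illustration only.
print(dna_to_rna("ACGTTGCA"))
```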

The 910 circRNAs listed here resulted from an expression cut-off on all detected circRNAs in the sample set. This cut-off was chosen such that a Principal Component Analysis (PCA, see above) is not affected by expression noise.

As outlined herein, it may be desirable to determine the presence or absence, or the level, of more than one circRNA in order to increase the diagnostic significance of the method according to the present invention. Hence, in a preferred embodiment of the method for diagnosing a neurodegenerative disease the levels of more than one circRNA comprising a sequence encoded by a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, or a sequence having at least 70% identity thereto, are determined and compared to the respective control level. The identity is preferably at least 80%, more preferably at least 90%, more preferably at least 95%. In a preferred embodiment the levels of at least 100 circRNAs comprising a sequence encoded by a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, or a sequence having at least 70% identity thereto, are determined in a sample of a bodily fluid of said subject and compared to the respective control level; preferably the levels of at least 150 circRNAs and more preferably the levels of at least 200 circRNAs comprising a sequence encoded by a sequence selected from the group consisting of nt 11 to 30 of any one of SEQ ID NO:1 to SEQ ID NO:910, or a sequence having at least 70% identity thereto, are determined in a sample of a bodily fluid of said subject and compared to the respective control level. The identity is preferably at least 80%, more preferably at least 90%, more preferably at least 95% to the respective sequences of the SEQ ID NOs.

In one embodiment the circRNAs comprising a sequence encoded by SEQ ID NOs:1 to 910 have the sequence as determined by the inventors, i.e. have a sequence encoded by the sequence of any of SEQ ID NOs: 911 to 1820. Hence, in one embodiment of the method for diagnosing a neurodegenerative disease the levels of more than one circRNA having a sequence being at least 70% identical to any of the sequences as encoded by SEQ ID NOs: 911 to 1820 are determined in a sample of a bodily fluid of said subject and compared to the respective control level. Particularly preferably, the levels of at least 100 circRNAs having a sequence being at least 70% identical to any of the sequences as encoded by SEQ ID NOs: 911 to 1820 are determined in a sample of a bodily fluid of said subject and compared to the respective control level, preferably the levels of at least 150, and more preferably the levels of at least 200 circRNAs having a sequence being at least 70% identical to any of the sequences as encoded by SEQ ID NOs: 911 to 1820 are determined in a sample of a bodily fluid of said subject and compared to the respective control level. In more preferred embodiments said sequence identity is at least 80%, preferably at least 90%, more preferably at least 95%, yet more preferably at least 99% to the outlined sequences. In a further preferred embodiment the circRNAs have the sequences as encoded by any one of SEQ ID NOs: 911 to 1820. The levels are preferably detected by using hybridization probes specifically hybridizing to the sequences of nt 11 to 30 of SEQ ID NO:1 to 910 or specifically hybridizing to the sequences of SEQ ID NO:1 to 910, or an RNA sequence encoded by these sequences, or the respective reverse complements thereof.

The inventors found that the first 200 circRNAs as encoded by SEQ ID NO:1 to 910 have particularly suitable predictive and diagnostic values. Hence, in a preferred embodiment of the method for diagnosing a neurodegenerative disease said at least 100, preferably at least 150, more preferably at least 200 circRNAs comprise a sequence as encoded by any of SEQ ID NO:1 to SEQ ID NO:200, or a sequence having at least 70% identity thereto, or the circRNAs have a sequence as encoded by any of SEQ ID NO: 911 to 1110, or a sequence having at least 70% identity thereto. The identity is preferably at least 80%, more preferably at least 90%, more preferably at least 95%. In a preferred embodiment of the method for diagnosing a neurodegenerative disease the levels of more than one circRNA comprising a sequence encoded by a sequence being at least 70% identical to a sequence selected from the group consisting of the sequence of nt 11 to nt 30 of any one of SEQ ID NO: 1 to 200 are determined in a sample of a bodily fluid of said subject and compared to the respective control level. Particularly preferably, the levels of at least 100 circRNAs comprising a sequence encoded by a sequence being at least 70% identical to a sequence selected from the group consisting of the sequence of nt 11 to nt 30 of any one of SEQ ID NO: 1 to 200 are determined in a sample of a bodily fluid of said subject and compared to the respective control level, preferably of at least 150 and more preferably the levels of all 200 circRNAs comprising a sequence encoded by a sequence being at least 70% identical to a sequence selected from the group consisting of the sequence of nt 11 to nt 30 of any one of SEQ ID NO: 1 to 200 are determined in a sample of a bodily fluid of said subject and compared to the respective control level. The identity is preferably at least 80%, more preferably at least 90%, more preferably at least 95%, yet more preferred 100%. The levels are preferably detected by using hybridization probes specifically hybridizing to the sequences of nt 11 to 30 of SEQ ID NO:1 to 200 or specifically hybridizing to the sequences of SEQ ID NO:1 to 200, or an RNA sequence encoded by these sequences, or the reverse complements thereof.

In one embodiment the circRNAs comprising a sequence encoded by any of the SEQ ID NOs:1 to 200 have the sequence as determined by the inventors, i.e. a sequence encoded by any of the SEQ ID NOs: 911 to 1110. Hence, in one embodiment of the method for diagnosing a neurodegenerative disease the levels of more than one circRNA having a sequence encoded by a sequence being at least 70% identical to any of the sequences of SEQ ID NO: 911 to 1110 are determined in a sample of a bodily fluid of said subject and compared to the respective control level. Particularly preferably, the levels of at least 100 circRNAs having a sequence encoded by a sequence being at least 70% identical to any of the sequences of SEQ ID NOs: 911 to 1110 are determined in a sample of a bodily fluid of said subject and compared to the respective control level, preferably the levels of at least 150, and more preferably the levels of at least 200 circRNAs having a sequence encoded by a sequence being at least 70% identical to any of the sequences of SEQ ID NOs: 911 to 1110 are determined in a sample of a bodily fluid of said subject and compared to the respective control level. In more preferred embodiments said sequence identity is at least 80%, preferably at least 90%, more preferably at least 95%, yet more preferably at least 99% to the outlined sequences. In a further preferred embodiment the circRNAs have the sequences encoded by the sequences as set out in any one of SEQ ID NOs: 911 to 1110. The levels are preferably detected by using hybridization probes specifically hybridizing to the sequences of nt 11 to 30 of SEQ ID NO:1 to 200 or specifically hybridizing to the sequences of SEQ ID NO:1 to 200, or an RNA sequence encoded by these sequences, or the reverse complements thereof.

The determination of percent identity between two sequences is accomplished using the mathematical algorithm of Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90: 5873-5877. Such an algorithm is incorporated into the BLASTN and BLASTP programs of Altschul et al. (1990) J. Mol. Biol. 215: 403-410. BLAST nucleotide searches are performed with the BLASTN program, score=100, word length=12, to obtain nucleotide sequences homologous to the nucleic acid sequences outlined herein. BLAST protein searches are performed with the BLASTP program, score=50, wordlength=3, to obtain amino acid sequences homologous to the EPO variant polypeptide, respectively. To obtain gapped alignments for comparative purposes, Gapped BLAST is utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs are used.
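The following sketch illustrates only the final arithmetic of a percent-identity value for two sequences that have already been aligned (for example by BLAST). It is not an implementation of the Karlin and Altschul algorithm or of BLAST itself. The function name and the choice of the alignment length as denominator are assumptions made for illustration; other conventions for the denominator exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumption: both inputs are rows of one pairwise alignment (equal length,
    gaps written as '-'), and identity is reported relative to alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the pair reaches the given identity threshold (70% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```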

In order to improve the diagnostic value, it may be desirable to determine the number of circRNAs showing increased or decreased levels being indicative for the neurodegenerative disease as outlined in Table 1. In a preferred embodiment the number of circRNAs showing increased or decreased levels being indicative for the neurodegenerative disease as outlined in Table 1 is above the 80th percentile of a control population, more preferably above the 90th percentile, yet more preferred above the 95th percentile. In a preferred embodiment of the method for diagnosing a neurodegenerative disease the presence of increased or decreased levels as defined in Table 1 under “diseased” for the respective circRNA for at least 10, preferably at least 50, more preferably at least 100 circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease.
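The counting step described above may be sketched as follows, for illustration only. The sketch assumes that each circRNA level of the subject is compared against a single control level, that the expected direction is taken from the “diseased” column of Table 1, and that the percentile of the control population is computed by the nearest-rank method. The function names, the dictionary layout and the percentile method are hypothetical and are not specified in the text.

```python
def concordant_count(subject_levels, control_levels, expected_direction):
    """Count circRNAs whose change matches the 'diseased' column of Table 1.

    All arguments are dicts keyed by circID. expected_direction maps a circID
    to '+' (increased in disease) or '-' / '−' (decreased in disease).
    """
    count = 0
    for circ_id, direction in expected_direction.items():
        if circ_id not in subject_levels or circ_id not in control_levels:
            continue  # circRNA not measured; skipped in this sketch
        level, control = subject_levels[circ_id], control_levels[circ_id]
        if direction == "+" and level > control:
            count += 1
        elif direction in ("-", "−") and level < control:
            count += 1
    return count


def exceeds_control_percentile(count, control_population_counts, percentile=80):
    """True if count lies above the given percentile (nearest rank) of the
    concordant counts observed in a control population."""
    ordered = sorted(control_population_counts)
    if not ordered:
        raise ValueError("control population is empty")
    rank = max(1, (percentile * len(ordered) + 99) // 100)  # integer ceiling
    return count > ordered[rank - 1]
```

With such a helper, the thresholds of at least 10, 50 or 100 concordant circRNAs named above reduce to a simple comparison against the returned count.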

In one particular embodiment of the method for diagnosing a neurodegenerative disease the levels of all circRNAs comprising a sequence encoded by a sequence selected from the group consisting of nt 11 to 30 of any of SEQ ID NO:1 to SEQ ID NO:200, or a sequence having at least 70% identity thereto, are determined; wherein preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease. Preferably, said method comprises the determination of the levels of all circRNAs comprising a sequence encoded by a sequence selected from the group consisting of any of SEQ ID NO:1 to SEQ ID NO:200 or a sequence having at least 70% identity thereto, wherein preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease. Further preferred, all circRNAs with a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:911 to SEQ ID NO:1110 are detected, wherein preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease. In more preferred embodiments said sequence identity is at least 80%, preferably at least 90%, more preferably at least 95%, yet more preferably at least 99% to the outlined sequences, yet more preferred the identity is 100%.

In one particular embodiment of the method for diagnosing a neurodegenerative disease the levels of all circRNAs comprising a sequence encoded by a sequence selected from the group consisting of nt 11 to 30 of any of SEQ ID NO:1 to SEQ ID NO:910, or a sequence having at least 70% identity thereto, are determined; wherein preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease. Preferably, said method comprises the determination of the levels of all circRNAs comprising a sequence encoded by a sequence selected from the group consisting of any of SEQ ID NO:1 to SEQ ID NO:910 or a sequence having at least 70% identity thereto, wherein preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease. Further preferred, all circRNAs with a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:911 to SEQ ID NO:1820 are detected, wherein preferably the presence of increased or decreased levels as defined in Table 1 under “diseased” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs is indicative for the presence of a neurodegenerative disease, preferably Alzheimer's disease. In more preferred embodiments said sequence identity is at least 80%, preferably at least 90%, more preferably at least 95%, yet more preferably at least 99% to the outlined sequences, yet more preferred the identity is 100%.

As outlined herein above, the circRNAs may be specifically detected through their unique sequence at the exon-exon junction in the head-to-tail arrangement. Hence, the invention also relates to a nucleic acid probe specifically hybridizing to a sequence of nucleotide (nt) 11 to nt 30 of any of the sequences of SEQ ID NO:1 to 910, or specifically hybridizing to an RNA sequence encoded by these sequences, or specifically hybridizing to a reverse complement sequence thereof, preferably specifically binding to any of the sequences of SEQ ID NO:1 to 910, or specifically hybridizing to an RNA sequence encoded by these sequences, or specifically hybridizing to a reverse complement sequence thereof. In a very preferred embodiment the nucleic acid probe spans the sequence of nt 15 to nt 35 of the respective SEQ ID NO: 1 to 910, or the RNA sequence encoded by these sequences, or the reverse complement sequences thereof.
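By way of a non-limiting sketch, the target window of nt 11 to nt 30 of a 40-nucleotide junction sequence and its reverse complement may be derived as shown below. The function names are hypothetical, positions are treated as 1-based and inclusive, and the example sequence is made up and does not correspond to any entry of the Sequence Listing.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def junction_window(junction_dna, first_nt=11, last_nt=30):
    """Return nt first_nt..last_nt (1-based, inclusive) of a junction sequence."""
    if not 1 <= first_nt <= last_nt <= len(junction_dna):
        raise ValueError("window lies outside the junction sequence")
    return junction_dna[first_nt - 1:last_nt]


def reverse_complement(dna):
    """Reverse complement of an upper- or lower-case DNA sequence (A, C, G, T)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


# Made-up 40-nt junction sequence: 20 nt upstream and 20 nt downstream.
example_junction = "ACGTACGTACGTACGTACGT" + "TTGGCCAATTGGCCAATTGG"
target = junction_window(example_junction)      # 20 nt centred on the junction
probe_candidate = reverse_complement(target)    # sequence complementary to it
```

Passing `first_nt=15, last_nt=35` yields the window of nt 15 to nt 35 mentioned above.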

Nucleic acid probes may be prepared using any suitable method, such as, for example, the phosphotriester and phosphodiester methods or automated embodiments thereof. In one such automated embodiment diethylphosphoramidites are used as starting materials and may be synthesized as described by Beaucage et al., Tetrahedron Letters, 22:1859-1862 (1981), which is hereby incorporated by reference. One method for synthesizing oligonucleotides on a modified solid support is described in U.S. Pat. No. 4,458,006, which is hereby incorporated by reference. It is also possible to use a nucleic acid probe which has been isolated from a biological source (such as a restriction endonuclease digest). Preferred nucleic acid probes have a length of from about 15 to 500, more preferably about 20 to 200, most preferably about 25 to 60 bases.

The nucleic acid probe according to the present invention may be used as a hybridization probe or as a primer for amplification reactions. In both cases the nucleic acid probe may comprise fluorescent dyes. Such fluorescent dyes may for example be FAM (5- or 6-carboxyfluorescein), VIC, NED, fluorescein, FITC, IRD-700/800, CY3, CY5, CY3.5, CY5.5, HEX, TET, TAMRA, JOE, ROX, BODIPY TMR, Oregon Green, Rhodamine Green, Rhodamine Red, Texas Red, Yakima Yellow, Alexa Fluor, PET and the like (see e.g. https://www.micro-shop.zeiss.com/us/us_en/spektral.php). In the context of the present invention, fluorescent dyes may for example be FAM (5- or 6-carboxyfluorescein), VIC, NED, fluorescein, fluorescein isothiocyanate (FITC), IRD-700/800, cyanine dyes, such as CY3, CY5, CY3.5, CY5.5, Cy7, xanthene, 6-carboxy-2′,4′,7′,4,7-hexachlorofluorescein (HEX), TET, 6-carboxy-4′,5′-dichloro-2′,7′-dimethoxyfluorescein (JOE), N,N,N′,N′-Tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-rhodamine (ROX), 5-Carboxyrhodamine-6G (R6G5), 6-carboxyrhodamine-6G (RG6), Rhodamine, Rhodamine Green, Rhodamine Red, Rhodamine 110, BODIPY dyes, such as BODIPY TMR, Oregon Green, coumarins such as Umbelliferone, benzimides, such as Hoechst 33258; phenanthridines, such as Texas Red, Yakima Yellow, Alexa Fluor, PET, ethidium bromide, acridinium dyes, carbazole dyes, phenoxazine dyes, porphyrin dyes, polymethine dyes, and the like.

In a preferred embodiment, the nucleic acid probe specifically hybridizes to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of the sequences listed in Table 1, or specifically hybridizes to an RNA sequence encoded by these sequences, or specifically hybridizes to a reverse complement sequence thereof; preferably it specifically hybridizes to a sequence selected from the group consisting of the sequences listed in Table 1, or specifically hybridizes to an RNA sequence encoded by these sequences, or specifically hybridizes to a reverse complement sequence thereof.

The invention furthermore relates to a kit for specifically detecting one or more, preferably more than one, nucleic acids comprising a sequence selected from the group consisting of nt 11 to nt 30 of any one of SEQ ID NO: 1 to 910, or a sequence selected from the group consisting of SEQ ID NO:1 to 910, or SEQ ID NO:911 to SEQ ID NO:1820, or an RNA sequence encoded by any of these sequences. The kit is preferably a kit for diagnosing a neurodegenerative disease, comprising means for specifically detecting one or more nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to 910 or SEQ ID NO:911 to SEQ ID NO:1820, or an RNA sequence encoded by any of these sequences. In a preferred embodiment the kit comprises means for specifically detecting at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910 or SEQ ID NO:911 to SEQ ID NO:1820, or an RNA sequence encoded by any of these sequences.

The means for detecting preferably are one or more of the nucleic acid probes according to the invention. Hence, in one embodiment the kit comprises one or more nucleic acid probes, preferably more than one nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, or an RNA sequence encoded by these sequences, or the reverse complements thereof; preferably the kit comprises a plurality of nucleic acid probes specifically hybridizing to the sequence of nucleotide 11 to 30 of at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, or a RNA sequence encoded by these sequences, or the reverse complements thereof. In a particular preferred embodiment the kit comprises a plurality of nucleic acid probes hybridizing to the sequence of nucleotide 11 to 30 of at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to 200, or the RNA sequences encoded by these sequences, or the reverse complements thereof; preferably the kit comprises a plurality of nucleic acid probes hybridizing to the sequence of nucleotide 11 to 30 of all of the sequences of SEQ ID NO:1 to 200, or the RNA sequences encoded by these sequences, or the reverse complements thereof.

In a further particular preferred embodiment the kit comprises a plurality of nucleic acid probes hybridizing to the sequence of at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to 200, or an RNA sequence encoded by these sequences, or the reverse complements thereof; preferably the kit comprises a plurality of nucleic acid probes hybridizing to the sequence of all of the sequences of SEQ ID NO:1 to 200, or the RNA sequences encoded by these sequences, or the reverse complements thereof.

The kit may further comprise means for handling and/or preparation of a bodily fluid sample, preferably for cerebrospinal fluid or whole blood. In a preferred embodiment the kit comprises a container for collecting whole blood, said container comprising stabilizing agents, preferably selected from the group consisting of chelating agents, EDTA, K₂EDTA, formulations like RNAlater (Qiagen) or such, or combinations thereof. In a particular preferred embodiment the kit comprises a K₂EDTA coated container.

As used herein, a kit is a packaged combination optionally including instructions for use of the combination and/or other reactions and components for such use.

Furthermore, the invention relates to an array for determining the presence or level of a plurality of nucleic acids, said array comprising a plurality of probes, wherein the plurality of probes specifically hybridize to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, or an RNA sequence encoded by these sequences, or the reverse complements thereof; preferably the plurality of probes comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, or the RNA sequences encoded by these sequences, or the reverse complements thereof. Preferably the plurality of probes comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of SEQ ID NO:1 to SEQ ID NO:200, or the RNA sequences encoded by these sequences, or the reverse complements thereof.

Herein an “array” is a solid support comprising one or more nucleic acids attached thereto. Arrays, such as microarrays (e.g. from Affymetrix®), are known in the art (see Schena M, Shalon D, Davis R W, Brown P O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995; 270(5235):467-70). The solid support may be made of different materials including, but not limited to, plastics, resins, polysaccharides, silica or silica-based materials, functionalized glass, modified silicon, carbon, metals, inorganic glasses, membranes, nylon, natural fibers such as silk, wool and cotton, and polymers. In some embodiments, the material comprising the solid support has reactive groups such as carboxy, amino, hydroxy, etc., which are used for attachment of, e.g. nucleic acid probes. Polymers are preferred, and suitable polymers include, but are not limited to, polystyrene, polyethylene glycol tetraphthalate, polyvinyl acetate, polyvinyl chloride, polyvinyl pyrrolidone, polyacrylonitrile, polymethyl methacrylate, polytetrafluoroethylene, butyl rubber, styrenebutadiene rubber, natural rubber, polyethylene, polypropylene, (poly)tetrafluoroethylene, (poly)vinylidenefluoride, polycarbonate and polymethylpentene. Preferred polymers include those outlined in U.S. Pat. No. 5,427,779, hereby expressly incorporated by reference. The nucleic acid probes are preferably covalently attached to the solid support of the array. Attachment may be performed as described below. As will be appreciated by those in the art, either the 5′ or 3′ terminus may be attached to the support using techniques known in the art. The arrays of the invention comprise at least two different covalently attached nucleic acid probes, with more than two being preferred.
By “different” oligonucleotide herein is meant an oligonucleotide that has a nucleotide sequence that differs in at least one position from the sequence of a second oligonucleotide; that is, at least a single base is different, preferably their hybridization specificity is as outlined herein above.

Furthermore, the invention particularly relates to the use of a nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by these sequences, or hybridizing to the reverse complement thereof for the diagnosis of a neurodegenerative disease, preferably for the diagnosis of Alzheimer's disease. The invention also relates to the use of a kit according to the invention for the diagnosis of a neurodegenerative disease, preferably for the diagnosis of Alzheimer's disease. Also encompassed by the invention is the use of an array according to the invention for the diagnosis of a neurodegenerative disease, preferably for the diagnosis of Alzheimer's disease.

The present invention also relates to the following items:

1. A method for diagnosing a disease of a subject, comprising the step of:
- determining the presence or absence of one or more circular RNA (circRNA) in a sample of a bodily fluid of said subject;
- wherein the presence or absence of said one or more circRNA is indicative of the disease.
2. The method according to item 1, wherein said disease is not a disease of said bodily fluid.
3. The method according to item 1 or 2, wherein said bodily fluid is blood or cerebrospinal fluid, most preferred whole blood.
4. The method according to any one of items 1 to 3, wherein the determination step comprises:
- determining the level of said one or more circRNA;
- comparing the determined level to a control level of said one or more circRNA;
- wherein differing levels between the determined and the control level are indicative of the disease.
5. The method according to item 4, wherein said one or more circRNA is differentially expressed between the diseased and non-diseased state in the tissue of interest.
6. The method according to any one of items 1 to 5, wherein the circRNA is detected by detection of an exon-exon-junction in a head-to-tail arrangement.
7. The method according to item 6, wherein circRNA is detected using a method selected from the group consisting of probe hybridization based methods, nucleic acid amplification based methods, and nucleic acid sequencing.
8. The method according to any one of items 1 to 7, wherein the sample is treated with RNase R before determination of the circRNA.
9. The method according to any one of items 1 to 8, wherein more than one circRNAs from a panel of circRNAs are determined.
10. The method according to item 9, wherein said panel comprises a plurality of circRNAs that have been identified as being present at differing levels in bodily fluid samples of patients having the disease and patients not having the disease, preferably identified by principal component analysis or clustering.
11. The method according to any one of items 4 to 10, wherein the disease is a neurodegenerative disease, preferably Alzheimer's disease.
12. The method according to item 11, wherein the method for diagnosing the neurodegenerative disease, preferably Alzheimer's disease, in a subject comprises the steps of:
- determining the level of one or more circRNA in a sample of a bodily fluid of said subject;
- comparing the determined level to a control level of said one or more circRNA;
- wherein differing levels between the determined and the control level are indicative of the disease.
13. The method according to item 12, wherein said one or more circRNA comprises a sequence encoded by a sequence being at least 70% identical to a sequence selected from the group consisting of nucleotides 11 to 30 of any of the sequences of SEQ ID NO:1 to SEQ ID NO:910, preferably wherein the presence of increased or decreased levels as defined in Table 1 under “disease” for the circRNA comprising the respective sequence are indicative of the presence of a neurodegenerative disease, preferably Alzheimer's disease.
14. The method according to item 13, wherein said one or more circRNA has a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:911 to SEQ ID NO:1820, preferably wherein the presence of increased or decreased levels as defined in Table 1 under “disease” for the circRNA having the respective sequence are indicative of the presence of a neurodegenerative disease, preferably Alzheimer's disease.
15. The method according to item 12 or 13, wherein the levels of at least 100, preferably at least 150 and more preferably at least 200 circRNAs comprising a sequence encoded by a sequence being at least 70% identical to a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910 are determined in a sample of a bodily fluid of said subject and compared to the respective control level.
16. The method according to item 15, wherein said at least 100, preferably at least 150 and more preferably at least 200 circRNAs have a sequence encoded by a sequence selected from the group consisting of SEQ ID NO:911 to 1820, or a sequence being at least 70% identical thereto.
17. The method according to item 15 or 16, wherein said at least 100, preferably at least 150 more preferably at least 200 circRNAs comprise a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:200, or said circRNAs have a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO: 911 to 1110.
18. The method according to item 15, wherein the presence of increased or decreased levels as defined in Table 1 under “disease” for the respective circRNA for at least 10, preferably at least 50, more preferably at least 100 of said at least 100, preferably at least 150 and more preferably at least 200 circRNAs are indicative of the presence of a neurodegenerative disease, preferably Alzheimer's disease.
19. The method according to any one of items 13 to 16, wherein the levels of all circRNAs comprising a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910 or all circRNAs having a sequence encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:911 to SEQ ID NO:1820 are detected, wherein preferably the presence of increased or decreased levels as defined in Table 1 under “disease” for at least 10, preferably at least 50, more preferably at least 100 of the respective circRNAs are indicative of the presence of a neurodegenerative disease, preferably Alzheimer's disease.
20. A nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of the sequences listed in Table 1, or specifically hybridizing to a reverse complement sequence thereof; preferably specifically hybridizing to a sequence selected from the group consisting of the sequences listed in Table 1, or specifically hybridizing to a reverse complement sequence thereof.
21. A kit for diagnosing a neurodegenerative disease, comprising means for specifically detecting one or more nucleic acid sequence encoded by a sequence selected from the group consisting of SEQ ID NO:1 to 910 or SEQ ID NO:911 to SEQ ID NO:1820, or a sequence being at least 70% identical to the recited sequences.
22. The kit of item 21, wherein the kit comprises means for specifically detecting at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences encoded by a sequence having at least 70% identity to a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910 or SEQ ID NO:911 to SEQ ID NO:1820.
23. The kit of item 21 or 22, wherein the kit comprises one or more nucleic acid probes specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:910, or hybridizing to the reverse complements thereof, preferably the kit comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:910, or hybridizing to the reverse complements thereof.
24. The kit according to any one of items 21 to 23, further comprising means for handling and/or preparation of a bodily fluid sample, preferably for cerebrospinal fluid or whole blood.
25. An array for determining the presence or level of a plurality of nucleic acids, comprising a plurality of probes, wherein the plurality of probes specifically hybridize to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:910, or hybridizing to the reverse complements thereof, preferably the plurality of probes comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of at least 100, preferably at least 150, more preferably at least 200 nucleic acid sequences selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:910, or specifically hybridizing to the reverse complement sequences thereof, preferably the plurality of probes comprises probes specifically hybridizing to the sequence of nucleotide 11 to 30 of SEQ ID NO:1 to SEQ ID NO:200, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:200, or hybridizing to the reverse complement sequences thereof.
26. Use of a nucleic acid probe specifically hybridizing to the sequence of nucleotide 11 to 30 of a sequence selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:910, and the RNA sequences encoded by a sequence of SEQ ID NO:1 to SEQ ID NO:910, or hybridizing to the reverse complement thereof, a kit according to items 21 to 24, or an array according to item 25 for the diagnosis of a neurodegenerative disease, preferably for the diagnosis of Alzheimer's disease.

It will be apparent that the methods and components of the present invention, as well as the uses as substantially described herein or illustrated in the description and the examples, are also subject of the present invention and claimed herewith. In this respect, it is also understood that the embodiments as described in the description and/or any one of the examples, can be independently used and combined with any one of the embodiments described hereinbefore and claimed in the appended claims set. Thus, these and other embodiments are disclosed and encompassed by the description and examples of the present invention.

The invention is further illustrated by the following non-limiting Examples and Figures.

EXAMPLES

1. Methods

1.1 Whole Blood Sample Collection

Blood sampling was approved by the Charité ethics committee, registration number EA4/078/14 and all participants gave written informed consent. 5 mL blood were drawn from subjects by venipuncture and collected in K₂EDTA coated Vacutainer (BD, #368841) and stored on ice until used for RNA preparation. For downstream RNA analysis by sequencing or qPCR assays presented here, 100 μL blood (>1 μg total RNA) is sufficient.

1.2 RNA Isolation and RNase R Treatment

Total RNA was isolated from fresh whole blood samples. Blood was diluted 1:3 in PBS and 250 μL of the dilution were used for RNA preparation using 750 μL Trizol LS reagent (Life Technologies). Samples were homogenized by gentle vortexing and 200 μL chloroform was added. After centrifugation at 4° C., 15 min at full speed in a table top centrifuge, the aqueous phase was transferred to a new tube (typically 400 μL). RNA was precipitated by adding an equal volume of cold isopropanol and incubation for ≥1 hour at −80° C. RNA pellets were recovered by spinning at 4° C., 30 min at full speed in a table top centrifuge. RNA pellets were washed with 1 mL 80% EtOH and subsequently air dried at room temperature for 5 min. The RNA was resuspended in 20 μL RNase-free water and treated with DNase I (Promega) for 15 min at 37° C. with subsequent heat inactivation for 10 min at 65° C. HEK293 total RNA was prepared in the same way but using 1 mL Trizol on cell pellets. For sequencing experiments the RNA preparations were additionally subjected to two rounds of ribosomal RNA depletion using a RiboMinus Kit (Life Technologies K1550-02 and A15020). Total RNA integrity and rRNA depletion were monitored using a Bioanalyzer 2100 (Agilent Technologies). For qPCR analysis the samples were treated with RNase R (Epicentre) for 15 min at 37° C. at a concentration of 3 U/μg RNA. After treatment 5% C. elegans total RNA was spiked-in followed by phenol-chloroform extraction of the RNA mixture. For controls the RNA was mock treated without the enzyme.

1.3 cDNA Library Preparation for Deep Sequencing

cDNA libraries were generated according to the Illumina TruSeq protocol. Sample RNA was fragmented, adaptor ligated, amplified and sequenced on an Illumina HiSeq2000 in 1×100 cycle runs.

1.4 Quantitative PCR (qPCR)

Total RNA was reverse transcribed using Maxima reverse transcriptase (Thermo Scientific) according to the manufacturer's protocol. qPCR reactions were performed using Maxima SYBR Green/Rox (Thermo Scientific) on a StepOne Plus System (Applied Biosystems). Primer sequences are available in Table 7. RNase R assays were normalized to C. elegans spike-in RNA.

1.5 Sanger Sequencing

PCR products were size separated by agarose gel electrophoresis, amplicons were extracted from gels and Sanger sequenced by standard methods (Eurofins).

1.6 Detection and Annotation of circRNAs

The detection of circular RNA was based on a previously published method (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338) with the following details. Human reference genome hg19 (February 2009, GRCh37) was downloaded from the UCSC genome browser (see Kent W J, Sugnet C W, Furey T S, et al. The human genome browser at UCSC. Genome Research. 2002; 12(6):996-1006) and was used for all subsequent analysis. bowtie2 (version 2.1.0; see Langmead B, Salzberg S L. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012; 9(4):357-359) was employed for mapping of RNA sequencing reads. Reads were first mapped to ribosomal RNA sequence data downloaded from the UCSC genome browser. Reads that did not map to rRNA were extracted for further processing. In a second step, all reads that mapped to the genome by aligning the whole read without any trimming (end-to-end mode) were discarded. Reads not mapping continuously to the genome were used for circRNA candidate detection. From those, 20 nucleotide terminal sequences (anchors) were extracted and re-aligned independently to the genome. The anchor alignments were then extended until the full read sequence was covered. Consecutively aligning anchors indicate linear splicing events whereas alignment in reverse orientation indicates head-to-tail splicing as observed in circRNAs (FIG. 1A).
The resulting splicing events were filtered using the following criteria: 1) GT/AG signal flanking the splice sites; 2) unambiguous breakpoint detection; 3) maximum of two mismatches when extending the anchor alignments; 4) breakpoint no more than two nucleotides inside the alignment of the anchors; 5) at least two independent reads supporting the head-to-tail splice junction; 6) a minimum difference of 35 in the bowtie2 alignment score between the first and the second best alignment of each anchor; 7) no more than 100 kilobases distance between the two splice sites.
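The seven filter criteria can be expressed as a single predicate over a candidate junction. The following is a minimal sketch and not the published pipeline; the record type `JunctionCandidate` and all of its field names are hypothetical, and only the thresholds are taken from the text above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class JunctionCandidate:
    """Hypothetical record describing one candidate head-to-tail junction."""
    flanking_signal: str                   # dinucleotides flanking the splice sites, e.g. "GT/AG"
    unambiguous_breakpoint: bool           # breakpoint position could be placed uniquely
    extension_mismatches: int              # mismatches seen while extending the anchors
    breakpoint_offset_in_anchor: int       # nt by which the breakpoint lies inside an anchor
    supporting_reads: int                  # independent reads spanning the junction
    anchor_score_margins: Tuple[int, int]  # best minus second-best alignment score, per anchor
    splice_site_distance: int              # genomic distance between the two splice sites (nt)


def passes_filters(c: JunctionCandidate) -> bool:
    """Apply criteria 1) to 7) in the order listed in the text."""
    return (
        c.flanking_signal == "GT/AG"              # 1) canonical splice signal
        and c.unambiguous_breakpoint              # 2) unambiguous breakpoint
        and c.extension_mismatches <= 2           # 3) at most two mismatches
        and c.breakpoint_offset_in_anchor <= 2    # 4) at most two nt inside the anchors
        and c.supporting_reads >= 2               # 5) at least two independent reads
        and min(c.anchor_score_margins) >= 35     # 6) score margin for each anchor
        and c.splice_site_distance <= 100_000     # 7) at most 100 kilobases apart
    )


good = JunctionCandidate("GT/AG", True, 1, 0, 4, (40, 52), 12_500)
weak = JunctionCandidate("GT/AG", True, 1, 0, 1, (40, 52), 12_500)
kept = [c for c in (good, weak) if passes_filters(c)]
```

Writing the criteria as one conjunction keeps each threshold visible next to the criterion number it implements.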

1.7 circRNA Annotation

Genomic coordinates of circRNA candidates were intersected with published gene models (ENSEMBL, release 75 containing 22,827 protein coding genes, 7484 lincRNAs and 3411 miRNAs). circRNAs were annotated and exon-intron structure predicted as previously described (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338). Known introns in circRNAs were assumed to be spliced out. Each circRNA was counted to a gene structure category if it overlaps fully or partially with the respective ENSEMBL feature (FIG. 1C, Table 1).

1.8 Published RNA Data Sets

In this study rRNA depleted RNA-seq data from whole blood samples (own data), fetal cerebellum (ENCODE accession: ENCSR000AEW), fetal liver (ENCODE accession: ENCSR000AFB) and HEK293 (Table 1; see Ivanov A, Memczak S, Wyler E, et al. Analysis of Intron Sequences Reveals Hallmarks of Circular RNA Biogenesis in Animals. CellReports. 2015; 10(2):170-177) were used. Expression values, coordinates and other details of the circRNAs reported here and all associated scripts will be made available at www.circbase.org (see Glažar P, Papavasileiou P, Rajewsky N. circBase: a database for circular RNAs. RNA. 2014; 20(11):1666-1670).

1.9 Quantification of circRNA and Host Gene Expression

The number of reads that span a particular head-to-tail junction was used as a measure for circRNA expression. To allow comparison of expression between samples, raw read counts were normalized to sequencing depth by dividing by the number of reads that map to protein coding gene regions and multiplying by 1,000,000 (FIG. 1B left, FIG. 4 C-D, FIGS. 6 and 10 A,C). To estimate host gene expression, RNA-seq data were first mapped to the reference genome with STAR (see Dobin A, Davis C A, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013; 29(1):15-21). htseq-count (see Anders S, Pyl P T, Huber W. HTSeq—A Python framework to work with high-throughput sequencing data. 2014) was employed to count hits on genomic features of ENSEMBL gene models. The measure transcripts per million (TPM) was calculated for each transcript and sample in order to compare total host gene expression between samples (FIG. 1B, right).
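The two expression measures described above can be sketched as follows. This is an illustrative example only: the function names, identifiers and numbers are hypothetical, and the TPM function uses the standard definition (length-normalized counts rescaled to sum to one million), which is assumed here because the text does not spell out the formula.

```python
def normalize_junction_counts(raw_counts, protein_coding_reads):
    """Scale head-to-tail read counts to reads per million protein coding reads."""
    if protein_coding_reads <= 0:
        raise ValueError("protein_coding_reads must be positive")
    return {
        circ_id: count * 1_000_000 / protein_coding_reads
        for circ_id, count in raw_counts.items()
    }


def transcripts_per_million(counts, lengths):
    """Standard TPM: count/length per transcript, rescaled to sum to 1,000,000."""
    rates = {tid: counts[tid] / lengths[tid] for tid in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    return {tid: rate / total * 1_000_000 for tid, rate in rates.items()}


# Made-up numbers for illustration.
normalized = normalize_junction_counts({"circ_A": 50, "circ_B": 5}, 25_000_000)
tpm = transcripts_per_million({"tx1": 10, "tx2": 20}, {"tx1": 1000, "tx2": 1000})
```

Dividing by the protein coding read count rather than by the total library size follows the text; it makes samples with different amounts of residual rRNA comparable.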

Circular-to-linear ratios were calculated for each circRNA by dividing raw head-to-tail read counts by the median number of reads that span linear spliced junctions of the respective host gene. For both measures one pseudo count was added to avoid division by zero. CircRNAs from host genes without annotated splice junctions according to the ENSEMBL gene annotation were not considered in this analysis.
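The ratio described above amounts to a short calculation. The sketch below is illustrative; the function name and the example counts are hypothetical.

```python
from statistics import median


def circular_to_linear_ratio(head_to_tail_reads, linear_junction_reads, pseudocount=1):
    """Ratio of head-to-tail reads to the median linear junction read count.

    One pseudo count is added to both measures to avoid division by zero.
    Returns None when the host gene has no annotated splice junctions,
    since such circRNAs are not considered in the analysis.
    """
    if not linear_junction_reads:
        return None
    linear = median(linear_junction_reads)
    return (head_to_tail_reads + pseudocount) / (linear + pseudocount)


# Made-up counts: 9 head-to-tail reads, linear junctions covered by 1, 3 and 5 reads.
ratio = circular_to_linear_ratio(9, [1, 3, 5])  # (9 + 1) / (3 + 1)
```

Using the median over all linear junctions of the host gene makes the denominator robust against a single unusually well or poorly covered junction.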

For analysis in FIG. 3D a permutation test with 1000 Monte-Carlo replications was performed on pooled biological replicate data to approximate the exact conditional distribution. To adjust for different dataset sizes the respective larger data set of each comparison was randomly subsampled.
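A Monte-Carlo permutation test of this kind can be sketched as follows. The test statistic (absolute difference of means), the fixed random seed and the function name are assumptions made for illustration; the text specifies only the number of replications and the subsampling of the larger data set.

```python
import random
from statistics import mean


def permutation_test(x, y, n_replications=1000, seed=0):
    """Approximate a two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)

    # Adjust for different data set sizes by subsampling the larger one.
    size = min(len(x), len(y))
    x = rng.sample(list(x), size) if len(x) > size else list(x)
    y = rng.sample(list(y), size) if len(y) > size else list(y)

    observed = abs(mean(x) - mean(y))
    pooled = x + y
    at_least_as_extreme = 0
    for _ in range(n_replications):
        rng.shuffle(pooled)  # random relabeling of the pooled values
        diff = abs(mean(pooled[:size]) - mean(pooled[size:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction so that the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_replications + 1)


# Made-up ratios for two tissues.
p_separated = permutation_test(
    [float(v) for v in range(100, 130)],
    [float(v) for v in range(0, 50)],
)
```

With 1000 replications the smallest attainable p-value under this correction is 1/1001, which bounds the resolution of the test.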

1.10 Principal Component Analysis and Clustering

To perform principal component analysis (PCA) of circRNA expression in whole blood samples of different donors, variance stabilizing transformation was first performed on raw head-to-tail spliced read counts using the R package DESeq2 (see Love M I, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014; 15(12):550). Only circRNAs with a transformed expression value of at least 6.7 (n=910, FIG. 4A) in one of the samples were considered for the analysis. PCA was performed on all remaining circRNAs using the prcomp function of R's stats package. All genes that give rise to these circRNAs and have at least one known splice junction were considered for PCA of the linear host gene expression. The same procedure was used for PCA using the median number of linear spliced reads as a proxy for linear expression. The 200 circRNAs with the highest weight in PC2 were considered for clustering. Raw head-to-tail spliced read counts for each circRNA (n_i) were normalized to sequencing depth by dividing by the number of reads that map to protein coding gene regions and multiplying by 1,000,000. Whole blood samples of different donors were clustered on log₂-transformed normalized circRNA expression profiles (log₂(n_i+1)). Hierarchical, agglomerative clustering was performed with complete linkage and by using Spearman's rank correlation as distance metric (1 − corr[log₂(n_i+1)]). The same procedure was used for linear host gene expression using the median number of linear spliced reads for all genes that give rise to these 200 circRNAs and have at least one known splice site.
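The clustering step (not the PCA) can be illustrated with a small pure-Python sketch. It is an assumption-laden toy, not the analysis code used for the figures: the function names and the three sample profiles are hypothetical, and profiles must not be constant, otherwise the correlation is undefined.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_distance(profile_a, profile_b):
    """1 - Spearman correlation of the log2(n_i + 1) profiles."""
    # The log transform is monotone, so it leaves the ranks unchanged;
    # it is kept here to mirror the description in the text.
    log_a = [math.log2(n + 1) for n in profile_a]
    log_b = [math.log2(n + 1) for n in profile_b]
    return 1 - pearson(average_ranks(log_a), average_ranks(log_b))


def complete_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster, cluster, distance)."""
    names = sorted(profiles)
    dist = {
        frozenset((a, b)): spearman_distance(profiles[a], profiles[b])
        for a in names for b in names if a < b
    }
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance of the two farthest members.
                d = max(dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Made-up normalized counts for three samples and four circRNAs.
toy_profiles = {"s1": [1, 5, 10, 50], "s2": [2, 6, 11, 60], "s3": [50, 10, 5, 1]}
toy_merges = complete_linkage(toy_profiles)
```

In the toy data, samples s1 and s2 have identical rank order and are therefore merged first at distance 0, while s3 joins last at the maximum distance of 2.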

2. Results

2.1 Thousands of circRNAs are Reproducibly Detected in Human Peripheral Whole Blood

First it was determined whether circRNAs are present in standard clinical blood specimens. To this end, total RNA was prepared from two biologically independent human peripheral whole blood samples and depleted of ribosomal RNAs (see Methods, supra). The samples were reverse transcribed using random primers to allow for circRNA detection and sequencing libraries were produced (FIG. 1A). The raw reads were fed into our in silico circRNA detection pipeline (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338). In short, the program filters out reads that map continuously to the genome but retains unmapped reads. From those, terminal 20-mer anchors are extracted and independently aligned to the genome. If the anchors map in reverse orientation and can be extended to cover the whole read sequence, they are flagged as head-to-tail junction spanning, i.e. indicative of circRNAs. Anchors that aligned consecutively were used to determine linear splicing as an internal library quality control and to assess linear RNA isoform expression (Table 2).
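The anchor logic summarized above can be illustrated with a deliberately simplified sketch. It assumes a gene on the plus strand and a read in sense orientation, and it takes the anchor alignment positions as given instead of calling an aligner; the function names are hypothetical, and the anchor extension step is not modeled.

```python
ANCHOR_LENGTH = 20  # terminal 20-mer anchors, as described in the text


def extract_anchors(read, anchor_length=ANCHOR_LENGTH):
    """Return the 5' and 3' terminal anchors of a read."""
    if len(read) < 2 * anchor_length:
        raise ValueError("read is too short to yield two non-overlapping anchors")
    return read[:anchor_length], read[-anchor_length:]


def classify_anchor_pair(head_anchor_pos, tail_anchor_pos, same_chrom_and_strand=True):
    """Classify a read from the genomic start positions of its two anchors.

    Simplifying assumption: plus-strand gene, read in sense orientation.
    head_anchor_pos: alignment start of the 5' anchor
    tail_anchor_pos: alignment start of the 3' anchor
    """
    if not same_chrom_and_strand or head_anchor_pos == tail_anchor_pos:
        return "unclassified"
    if head_anchor_pos < tail_anchor_pos:
        return "linear"        # anchors align consecutively
    return "head-to-tail"      # anchors align in reverse order (back-splice)


# Made-up 50-nt read for illustration.
toy_read = "A" * 20 + "C" * 10 + "G" * 20
head_anchor, tail_anchor = extract_anchors(toy_read)
```

A read whose 5' anchor aligns downstream of its 3' anchor cannot come from a linear transcript of the same locus, which is why the reversed order is taken as evidence for a head-to-tail junction.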

From the RNA of two human donors we identified 4550 and 4105 unique circRNA candidates, respectively, by at least two independent reads spanning a head-to-tail splice junction (FIG. 1B). In both datasets the numbers of total reads and linear splicing events were similar, indicating reproducible sample preparation (Table 2, Table 5). When considering RNAs found in both samples, we observed a high correlation of expression for both linear RNAs (R=0.98) and circRNAs (R=0.80, FIG. 1B). Between the two samples 1265 circRNAs (55%) with more than 5 reads overlap and 2442 (39%) circRNAs supported by at least 2 reads are shared (Table 1, FIG. 15, FIG. 5, technical reproducibility is shown in FIG. 6). The latter set will be considered as reproducibly detected circRNAs in the following analysis. CircRNA candidates are derived from genes covering the whole dynamic range of RNA expression (FIG. 1B, right panel). As observed in other human samples, we find that most circRNAs are derived from protein coding exonic regions or 5′ UTR sequences (FIG. 1C; see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338, and Rybak-Wolf A, Stottmeister C, Glažar P, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. MOLCEL. 2015; 1-17). GO term enrichment analysis on reproducibly detected, top expressed circRNAs and the same number of top linear RNAs showed significant enrichment of different biological function annotations (FIG. 7). Together with the broad expression spectrum of corresponding host genes this finding argues that circRNA expression levels are largely independent of linear RNA isoform abundance.

The predicted spliced length of blood circRNAs of 200-800 nt (median=343 nt) is similar to that in liver or cerebellum (median=394/448 nt) and previous observations in HEK293 cell cultures and other human samples (FIG. 8 and see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338). However, we observed a high number of circRNAs per gene, with 23 genes giving rise to more than 10 circRNAs (‘circRNA hotspots’, FIG. 1D).

To assess the reproducibility of the sequencing results we designed divergent, circRNA specific primers and measured relative abundances of the top eight expressed circRNAs compared to linear control genes in qPCR (FIG. 1E). circRNA candidate 8 could not be unambiguously amplified from cDNA, most likely due to overlapping RNA isoforms and was therefore excluded from further analysis. For the remaining seven circRNA candidates, we tested circularity using previously established assays: 1) resistance to the 3′-5′ exonuclease RNase R and 2) Sanger sequencing of PCR amplicons to confirm the sequence of predicted head-to-tail splice junctions. With these assays we validated 7/7 tested candidates suggesting that the overall false positive rate in our data sets is low (FIG. 9). Interestingly, these circRNAs are expressed from gene loci that so far were not shown to have a specific blood related function (Table 3) but show expression levels that by far exceed expression of housekeeping genes such as VCL or TFRC (10-100 fold, FIG. 1E).

2.2 Circular-to-Linear RNA Expression is High in Blood

When inspecting the read coverage in blood sequencing data, it was noticed that oftentimes the expression of circularized exons was outstandingly high compared to the coverage of neighboring exons expressed in linear RNA isoforms of the same gene. For example, it was observed that the two exons of circRNA candidate 5, which is a product of the PCNT locus, were densely covered with sequencing reads in the blood samples, while the upstream and downstream exons were barely detected (FIG. 2A). This particular expression pattern was not observed in HEK293 cells, where all exons were equally covered. This observation was further investigated by qPCR, comparing linear to circular RNA expression with isoform specific primer sets in HEK293 and whole blood samples (FIG. 2B, C). This independent assay confirmed dominant expression of the tested circRNA candidates, which was found to be at least 30-fold higher than that of the cognate linear isoforms. In contrast, this circRNA dominance was not found in HEK293 cells where the same RNAs were probed, which argues for a tissue-specific pattern.

Thereafter, the blood data were compared to published ENCODE project datasets from cerebellum, representative of neuronal tissues that in general have high circRNA expression (see Rybak-Wolf A, Stottmeister C, Glažar P, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. MOLCEL. 2015; 1-17), and from a non-neuronal primary tissue, liver (Table 2). Approximately 30% of blood circRNAs are also found in cerebellum, while this fraction was around 10% for liver, with higher fractions in both cases when constraining the analysis to highly expressed blood circRNAs (FIG. 10 A-D, comparison between total RNAs in FIG. 11). In summary, circRNAs found in human whole blood in part overlap with circRNAs expressed in cerebellum or liver, but also include hundreds of other circRNAs.
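
The overlap comparison described above amounts to a set intersection, optionally restricted to highly expressed blood circRNAs. The following is a minimal sketch under stated assumptions: the circRNA identifiers, read counts and the `min_reads` cut-off are hypothetical and serve only to show the calculation.

```python
def overlap_fraction(blood_counts, tissue_ids, min_reads=0):
    """Fraction of blood circRNAs (with at least `min_reads` junction reads)
    that are also detected in another tissue."""
    selected = {cid for cid, reads in blood_counts.items() if reads >= min_reads}
    if not selected:
        return 0.0
    return len(selected & set(tissue_ids)) / len(selected)


# Hypothetical head-to-tail read counts per circRNA in blood.
blood_counts = {"circA": 120, "circB": 45, "circC": 3, "circD": 2}
# Hypothetical set of circRNAs detected in a second tissue.
cerebellum_ids = {"circA", "circB", "circX"}

all_fraction = overlap_fraction(blood_counts, cerebellum_ids)
high_fraction = overlap_fraction(blood_counts, cerebellum_ids, min_reads=40)
print(all_fraction, high_fraction)
```

In this toy data the shared fraction rises when only highly expressed blood circRNAs are considered, mirroring the direction of the effect described in the text.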

The relative circular-to-linear RNA isoform abundance on a transcriptome-wide scale was then analyzed. To this end, counts of reads that span head-to-tail junctions and are therefore indicative of circRNAs were compared to the median number of reads on linear splice junctions of the same gene, the latter serving as a proxy for linear RNA expression (see Methods, supra). We observed that many blood circRNAs are highly expressed while the corresponding linear RNAs show average or low abundances (FIG. 3A), a finding that was recapitulated by qPCR assays validating our approach (FIG. 12). For the control samples cerebellum and liver this pattern was not observed (FIG. 3B, C), as revealed by comparing the mean circular-to-linear RNA ratio, which we found to be significantly higher in blood than in the tested control tissues (FIG. 3D). In summary, blood has an outstanding general tendency to contain circRNAs at high levels while the corresponding linear transcripts are expressed at much lower levels. This tendency was found, to a much lower extent, in cerebellum, but not in liver RNA or in RNA from many other tissues or cell lines that we have analyzed.
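
The circular-to-linear comparison described above can be sketched as follows. Head-to-tail junction reads are divided by the median read count over the linear splice junctions of the same gene. The pseudocount, which avoids division by zero, is an assumption of this sketch and not taken from the Methods; all read counts are hypothetical.

```python
from statistics import median


def circ_to_linear_ratio(head_to_tail_reads, linear_junction_reads, pseudocount=1.0):
    """Ratio of head-to-tail junction reads to the median linear junction
    read count of the host gene (proxy for linear RNA expression)."""
    linear_proxy = median(linear_junction_reads) if linear_junction_reads else 0.0
    return (head_to_tail_reads + pseudocount) / (linear_proxy + pseudocount)


# Hypothetical gene: 299 reads span the head-to-tail junction, while the
# linear splice junctions of the same gene carry only a few reads each.
ratio = circ_to_linear_ratio(299, [4, 10, 6, 12, 8])
print(round(ratio, 2))
```

Using the median rather than the mean keeps a single highly covered linear junction from dominating the proxy for linear expression.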

2.3 circRNAs are Putative Biomarkers in Alzheimer's Disease

The results show that circRNAs are reproducibly and easily detected in clinical standard blood samples and are therefore well suited to serve as a new class of biomarkers for human diseases, such as neurodegenerative diseases. Taking into account the high expression of circRNAs in neuronal tissues (see Rybak-Wolf A, Stottmeister C, Glažar P, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. MOLCEL. 2015; 1-17) and the urgent need for biomarkers in neurological diseases, circular RNA expression in blood samples from Alzheimer's disease patients and control subjects (see Methods, supra, Table 2, Table 4) was investigated. To this end, sequencing libraries from whole blood RNA from five individuals of each group were generated. In total, 22,644 distinct circRNAs were detected in all samples combined. Putative disease-specific circRNA expression was then identified. For this purpose, subsets of all detected circRNA candidates were defined and used in a principal component analysis (PCA) to detect expression differences between the two groups. All circRNAs were sorted by expression, and the sets for PCA were defined by increasing expression cut-offs. In a range of the top 500 to top 900 circRNAs, a clear separation of control and diseased subjects was detected (FIG. 4A, FIG. 13). Interestingly, this is not observed when analyzing the corresponding linear RNAs, suggesting that there might be disease-relevant information specifically encoded in the circular blood transcriptome (FIG. 4B). When the circRNAs from this analysis were sorted by their weight in principal component 2 (PC2) and the data were subjected to unsupervised clustering, controls and Alzheimer's patients were again distinguishable. Importantly, the two main clusters do not reflect the gender or age of the subjects (see also Table 6).
The findings of this analysis show that circRNA expression patterns in blood have a diagnostic value that is not revealed by analyzing the expression of their cognate linear RNA isoforms.
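
The selection of the most highly expressed circRNAs followed by a principal component analysis can be illustrated with a small sketch. It is not the analysis pipeline used here: the log2 transform with a pseudocount of 1, the SVD-based PCA and the toy expression matrix are assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def top_n_pca(expression, n_top, n_components=2):
    """PCA on the n_top most highly expressed circRNAs.

    expression: array of shape (n_samples, n_circRNAs) with normalized counts.
    Returns (scores, selected), where scores has shape (n_samples, n_components)
    and selected holds the column indices of the circRNAs that were used.
    """
    expression = np.asarray(expression, dtype=float)
    # Rank circRNAs by summed expression over all samples, highest first.
    order = np.argsort(expression.sum(axis=0))[::-1]
    selected = order[:n_top]
    # Log-transform (pseudocount 1 is an assumption) and center each circRNA.
    x = np.log2(expression[:, selected] + 1.0)
    x = x - x.mean(axis=0)
    # PCA via singular value decomposition of the centered matrix.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    return scores, selected


# Hypothetical data: rows 0-2 are controls, rows 3-5 are patients.
toy = [
    [100, 50, 20, 5, 1],
    [110, 55, 18, 6, 0],
    [90, 45, 22, 4, 2],
    [100, 5, 200, 5, 1],
    [105, 6, 180, 6, 0],
    [95, 4, 220, 4, 2],
]
scores, selected = top_n_pca(toy, n_top=3)
pc1 = scores[:, 0]
print(selected.tolist(), np.round(pc1, 2))
```

In this toy matrix the two groups fall on opposite sides of the first principal component. The sign of a component is arbitrary, so only the separation is meaningful, not which group is positive.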

3. Discussion

Recent publications show that circRNAs can be detected in plasma and saliva samples (see Koh W, Pan W, Gawad C, et al. Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proceedings of the National Academy of Sciences. 2014; 111(20):7361-7366; and Bahn J H, Zhang Q, Li F, et al. The Landscape of MicroRNA, Piwi-Interacting RNA, and Circular RNA in Human Saliva. Clinical Chemistry. 2014). However, in both specimens only few (10-70) circular RNAs with canonical splice sites were reported, which dramatically limits any further analysis. The circular transcriptome of whole blood presented here demonstrates that the search for putative circRNA biomarkers in peripheral blood is much more likely to yield informative results. RNA-Seq of clinical standard samples showed reproducible detection of around 2400 circRNA candidates that are present in human whole blood. It will be interesting to determine the origin of blood circRNAs. Accumulating evidence suggests that circRNAs are specifically expressed in a developmental stage- and tissue-specific manner, rather than being merely byproducts of splicing reactions (see Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013; 495(7441):333-338; and Rybak-Wolf A, Stottmeister C, Glažar P, et al. Circular RNAs in the Mammalian Brain Are Highly Abundant, Conserved, and Dynamically Expressed. MOLCEL. 2015; 1-17). Previously analyzed circRNAs from neutrophils, B-cells and hematopoietic stem cells suggest that many circRNAs are constituents of hematocytes (see Salzman J, Gawad C, Wang P L, Lacayo N, Brown P O. Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS ONE. 2012; 7(2):e30733). However, there is also the intriguing possibility of circRNA excretion into the extracellular space, e.g. by vesicles such as exosomes.
Likewise, aberrant circRNA expression in disease may reflect either a condition-specific transcriptome change in blood cells themselves or a direct consequence of active or passive release of circRNA from diseased tissue.

Further, we demonstrated that many circRNAs are highly expressed compared to linear RNA isoforms from the same locus, a feature that distinguishes blood from other primary tissues such as cerebellum or liver. This was observed for hundreds of blood circRNA candidates (FIG. 3A, Table 1, FIG. 15) in an experimental setup that was restricted to standard samples and preparation procedures. Gene products that are dominated by circRNAs, which typically comprise 2-4 exons (example in FIG. 2, FIG. 14), will also dominate signals for the specific gene of interest in array assays, Northern blots or qPCR experiments if the expression of the circularized exons is measured.

After reproducibly detecting thousands of often highly expressed circRNAs in blood, it was asked whether these might be instrumental in the diagnosis of human disease. Therefore, putatively specific circRNA abundances were measured in Alzheimer's disease and control samples.

It was observed that analyzing specific subsets of blood circular RNAs allows distinguishing Alzheimer's disease from control samples in a principal component analysis and unsupervised clustering. This distinction was not possible when analyzing linear RNA isoforms from the same genomic loci, demonstrating that circRNA expression data bear specific information.

Given the urgent need for non-invasive biomarker detection for many disease states, these findings show a way for biomarker detection in easily accessible bodily fluid samples, such as whole blood and cerebrospinal fluid. This approach is not limited to Alzheimer's disease or neurological conditions in general, since blood circRNA expression might be specifically altered in many disorders and may therefore be exploitable as a diagnostic tool in human diseases.

TABLE 2

Sequencing statistics of analyzed libraries

| Sample | number of total reads (millions) | % map to rRNA | number of reads that map continuously to genome (millions) | number of reads used for circRNA detection (millions) | number of linear splicing events | reads mapping to protein coding genes (millions) | number of circRNA candidates | Ensembl accession ID |
|---|---|---|---|---|---|---|---|---|
| H_1 | 57.85 | 11.52 | 41.75 | 9.45 | 77.367 | 8.47 | 4.550 | |
| rep_H_1 | 169.86 | 11.43 | 122.18 | 28.27 | 107.996 | 24.73 | 9.996 | |
| H_2 | 48.04 | 6.32 | 37.13 | 7.88 | 74.676 | 7.81 | 4.105 | |
| H_3 | 164.93 | 15.44 | 115.02 | 24.44 | 108.870 | 21.25 | 11.113 | |
| H_4 | 171.76 | 47.99 | 75.43 | 13.91 | 94.811 | 13.17 | 5.739 | |
| H_5 | 170.20 | 10.52 | 123.28 | 29.02 | 107.573 | 24.26 | 10.002 | |
| AD_1 | 110.76 | 18.74 | 74.07 | 15.93 | 88.932 | 13.85 | 5.837 | |
| AD_2 | 132.73 | 11.27 | 100.26 | 17.51 | 98.956 | 16.70 | 7.513 | |
| AD_3 | 131.62 | 15.78 | 93.46 | 17.39 | 91.823 | 17.22 | 6.867 | |
| AD_4 | 140.48 | 19.81 | 95.07 | 17.57 | 98.952 | 16.92 | 8.016 | |
| AD_5 | 122.04 | 13.88 | 88.91 | 16.18 | 96.942 | 17.41 | 6.404 | |
| Cerebellum_1 | 87.22 | 0.49 | 76.60 | 18.19 | 113.99 | 22.01 | 6.792 | ENCSR000AEW, ENCFF001ROL |
| Cerebellum_2 | 122.00 | 0.14 | 109.82 | 13.18 | 122.375 | 24.63 | 5.786 | ENCSR000AEW, ENCFF001RPH |
| Liver_1 | 86.12 | 3.35 | 72.72 | 10.52 | 101.147 | 30.14 | 839 | ENCSR000AFB, ENCFF001RNR |
| Liver_2 | 103.53 | 6.95 | 72.01 | 17.41 | 106.969 | 55.07 | 1.557 | ENCSR000AFB, ENCFF001RNX |

Summary of RNA-Sequencing Results

Sequencing results for blood RNA from five controls (H), five Alzheimer patients (AD), cerebellum and liver control RNA samples. If not noted otherwise sample datasets were produced for this study.

TABLE 3

Details on top expressed circRNA candidates.

| candidate | host gene | annotation | function | spliced length in nt | head-to-tail read counts | circBase ID |
|---|---|---|---|---|---|---|
| 1 | MBOAT2 | membrane bound O-acyltransferase domain containing 2 | acyltransferase | 226 | 1367 | hsa_circ_0007334 |
| 2 | TMEM56 | Transmembrane Protein 56 | unknown | 264 | 676 | hsa_circ_0000095 |
| 3 | DNAJC6 | DnaJ Chaperone Homolog, Subfamily C, Member 6 | regulates chaperone activity | 302 | 513 | hsa_circ_0002454 |
| 4 | UBXN7 | UBX domain protein 7 | Ubiquitin-binding adapter | 183 | 485 | hsa_circ_0001380 |
| 5 | PCNT1 | Pericentrin-13 | component of the nuclear pore complex | 315 | 344 | hsa_circ_0002903 |
| 6 | MORC3 | MORC family CW-type zinc finger 3 | unknown | 249 | 333 | hsa_circ_0001189 |
| 7 | XPO1 | Exportin 1 | nuclear export of protein and RNAs | 207 | 333 | hsa_circ_0001017 |
| 8 | GSE1 | Coiled-Coil Protein Genetic Suppressor Element | unknown | 219 | 326 | hsa_circ_0000722 |

TABLE 4

Patients overview.

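| subject | age | diagnosis | stage | MMSE | sex |
|---|---|---|---|---|---|
| H_1 | 31 | control | | | m |
| H_2 | 27 | control | | | m |
| H_3 | 71 | control | | | f |
| H_4 | 65 | control | | | f |
| H_5 | 62 | control | | | m |
| AD_1 | 75 | Alzheimer's Disease | mild | 24 | m |
| AD_2 | 81 | Alzheimer's Disease | mild | 23 | m |
| AD_3 | 73 | Alzheimer's Disease | mild | 25 | f |
| AD_4 | 69 | Alzheimer's Disease | medium | 18 | f |
| AD_5 | 68 | Alzheimer's Disease | severe | 8 | m |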

MMSE: Mini Mental State Examination

TABLE 5

Raw reads mapping to hemoglobin genes for blood sample 1 and 2.

| | sample 1 | sample 2 |
|---|---|---|
| total reads | 57,853,921 | 48,035,915 |
| HBA1 | 2,429,755 | 2,016,794 |
| HBA2 | 3,330,428 | 1,891,626 |
| HBB | 3,964,389 | 3,703,140 |
| HBD | 1,016 | 936 |
| sum | 9,725,588 | 7,612,496 |
| % of total | 16.81 | 15.85 |
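
The percentages in Table 5 follow directly from the read counts. As a worked check, the short sketch below recomputes the hemoglobin read sum and its share of total reads from the values listed in the table; the variable and function names are illustrative only.

```python
# Read counts copied from Table 5.
total_reads = {"sample 1": 57_853_921, "sample 2": 48_035_915}
hemoglobin_reads = {
    "sample 1": {"HBA1": 2_429_755, "HBA2": 3_330_428, "HBB": 3_964_389, "HBD": 1_016},
    "sample 2": {"HBA1": 2_016_794, "HBA2": 1_891_626, "HBB": 3_703_140, "HBD": 936},
}


def hemoglobin_share(gene_reads, total):
    """Return (sum of hemoglobin reads, percentage of total reads)."""
    hb_sum = sum(gene_reads.values())
    return hb_sum, 100.0 * hb_sum / total


shares = {
    sample: hemoglobin_share(genes, total_reads[sample])
    for sample, genes in hemoglobin_reads.items()
}
for sample, (hb_sum, percent) in shares.items():
    print(f"{sample}: {hb_sum:,} hemoglobin reads ({percent:.2f}% of total)")
```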

TABLE 6

Rate of reproducibility after sub-sampling circRNAs in FIG. 4A 1000 times.

| % circRNAs | % clustering reproduced | % linear RNAs | % main cluster as in FIG. 4A |
|---|---|---|---|
| 90 | 52.2 | 90 | 0 |
| 70 | 31.4 | 70 | 0 |
| 50 | 17.9 | 50 | 0 |
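
The sub-sampling test summarized in Table 6 can be sketched as follows: a fraction of the features is drawn repeatedly, the samples are re-clustered, and the share of iterations that reproduce the reference grouping is reported. The clustering method is not restated in this section, so the sketch substitutes a simple two-seed nearest-neighbour split as a stand-in. That split, the function names and the toy matrix are all assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def two_group_split(x):
    """Stand-in clustering: seed two groups with the two most distant
    samples and assign every sample to the nearer seed."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    return frozenset(np.flatnonzero(d[:, i] <= d[:, j]).tolist())


def reproducibility(x, fraction, n_iter=1000, seed=0):
    """Percentage of sub-sampling rounds that reproduce the grouping
    obtained with all features."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    reference = two_group_split(x)
    complement = frozenset(range(x.shape[0])) - reference
    n_keep = max(1, int(round(fraction * x.shape[1])))
    hits = 0
    for _ in range(n_iter):
        # Draw a random subset of features (circRNAs) without replacement.
        cols = rng.choice(x.shape[1], size=n_keep, replace=False)
        group = two_group_split(x[:, cols])
        # Group labels are arbitrary, so accept either orientation.
        if group in (reference, complement):
            hits += 1
    return 100.0 * hits / n_iter


# Hypothetical, well-separated toy data: rows 0-2 versus rows 3-5.
toy = [
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [10, 11, 10, 11, 10, 11],
    [11, 10, 11, 10, 11, 10],
    [10, 10, 11, 11, 10, 10],
]
rate = reproducibility(toy, fraction=0.5, n_iter=200)
print(rate)
```

Because the toy groups differ on every feature, any feature subset reproduces the grouping. With real expression data the rate would drop as fewer features are retained, which is the trend Table 6 reports.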

TABLE 7

List of oligonucleotides used in the Examples

| SEQ ID NO: | Name | Sequence |
|---|---|---|
| 1821 | hsaRTvinculinfwd | CTCGTCCGGGTTGGAAAAGAG |
| 1822 | hsaRTvinculinrev | AGTAAGGGTCTGACTGAAGCAT |
| 1823 | hsaTFRCfwd | ACCATTGTCATATACCCGGTTCA |
| 1824 | hsaTFRCrev | CAATAGCCCAAGTAGCCAATCAT |
| 1825 | celegansEIF3D_fwd | CGCCTTGAACATGGATAACTGCTGGG |
| 1826 | celegansEIF3D_rev | GATCGTCATCCGAGTTCTCCTCGTCG |
| 1827 | hsaMBOAT2div_fwd | AGTGCAAGATAAAGGCCCAAA |
| 1828 | hsaMBOAT2div_rev | TGATCATCATAGGAGTGGAGAACA |
| 1829 | hsaMBOAT2con_fwd | TACTCCACAGGTAATGTTGTAC |
| 1830 | hsaMBOAT2con_rev | ACTTTCATTGAAGGCAGATCATACCA |
| 1831 | hsaDNAJC6div_fwd | CCAGACATCTTGACCACTACACA |
| 1832 | hsaDNAJC6div_rev | ATGTGTCTTTGAGGGTGTCTTT |
| 1833 | hsaDNAJC6con_fwd | TCTCTACTCTACTCCTGGCCCAG |
| 1834 | hsaDNAJC6con_rev | GTAGGTCACACATATAGCCCAGGT |
| 1835 | hsaTMEM56div_fwd | CATCATTGTGCGTCCCTGTATG |
| 1836 | hsaTMEM56div_rev | GCTGAGACTATTGAAACCTGGAGA |
| 1837 | hsaTMEM56con_fwd | GCTGGCATACATTGGGAATTT |
| 1838 | hsaTMEM56con_rev | CAATCCGCACGATGAAGAATAC |
| 1839 | hsaUBXN7div_fwd | ACCAGTATTTCCTGCTTTTGAGG |
| 1840 | hsaUBXN7div_rev | CTACCCTTGCAGATCTATTCCGG |
| 1841 | hsaUBXN7con_fwd | AGAAATCCCGTCACTTGGTCCAA |
| 1842 | hsaUBXN7con_rev | TGACAGTGAGGAAGGTCAGAGA |
| 1843 | hsaGSE1div_fwd | CATCCTCCAGCTTTGCCGCCG |
| 1844 | hsaGSE1div_rev | CTGGTCGCGGTGGAAAGCATC |
| 1845 | hsaGSE1con_fwd | AGCTCAGTTGTGCAGGATTC |
| 1846 | hsaGSE1con_rev | CTTCTCAGGTAGTCCTCGGT |
| 1847 | hsaMORC3div_fwd | CATCCTACGTGGACAGAAAGTGAA |
| 1848 | hsaMORC3div_rev | CTGTTCCGTGGAAAACAGAGAAT |
| 1849 | hsaMORC3con_fwd | CAGTGCAGTTGCTGAATTAATAG |
| 1850 | hsaMORC3con_rev | TCCCATTGTCGGTGAATGTC |
| 1851 | hsaPCNT1div_fwd | CCGGTGTTTAGAAGACTTGGAGTT |
| 1852 | hsaPCNT1div_rev | TGCAGACAGTTCTTTGCGTAGATT |
| 1853 | hsaPCNT1con_fwd | TTGCCATTACTGACCTGGAGAGC |
| 1854 | hsaPCNT1con_rev | CCGTCAATGCCGTCTCCTTCTC |
| 1855 | hsaXPO1div_fwd | TGAAATCAAGCAGCTGACGA |
| 1856 | hsaXPO1div_rev | AGATTCTTCCAAGGAACCAGTG |
| 1857 | hsaXPO1con_fwd | GCCAGGGACAGACATTTGA |
| 1858 | hsaXPO1con_rev | GCTCAAGTAAAGCTCTTTGTGAC |
